Duke’s Legacy: Video Game Source Disc Preservation at the Library of Congress

The following is a guest post from David Gibson, a moving image technician in the Library of Congress. He was previously interviewed about the Library of Congress video games collection.

The discovery of that which has been lost or previously unattainable is one of the driving forces behind the archival profession and one of the passions the profession shares with the gaming community. Video game enthusiasts have long been fascinated by unreleased games and “lost levels,” gameplay levels which are partially developed but left out of the final release of the game. Discovery is, of course, a key component to gameplay. Players revel in the thrill of unlocking the secret door or uncovering Easter eggs hidden in the game by developers. In many ways, the fascination with obtaining access to unreleased games or levels brings this thrill of discovery into the real world. In a recent article written for The Atlantic, Heidi Kemps discusses the joy in obtaining online access to playable lost levels from the 1992 Sega Genesis game, Sonic The Hedgehog 2, reveling in the fact that access to these levels gave her a glimpse into how this beloved game was made.

Original source disc as it was received by the Library of Congress.

Original source disc as it was received by the Library of Congress.

Since 2006, the Moving Image section of the Library of Congress has served as the custodial unit for video games. In this capacity, we receive roughly 400 video games per year through the Copyright registration process, about 99% of which are physically published console games. In addition to the games themselves we sometimes receive ancillary materials, such as printed descriptions of the game, DVDs or VHS cassettes featuring excerpts of gameplay, or the occasional printed source code excerpt. These materials are useful, primarily for their contextual value, in helping to tell the story of video game development in this country and are retained along with the games in the collection.

Several months ago, while performing an inventory of recently acquired video games, I happened upon a DVD-R labeled Duke Nukem: Critical Mass (PSP). My first assumption was that the disc, like so many others we have received, was a DVD-R of gameplay. However, a line of text on the Copyright database record for the item intrigued me. It reads: Authorship: Entire video game; computer code; artwork; and music. I placed the disc into my computer’s DVD drive to discover that the DVD-R did not contain video, but instead a file directory, including every asset used to make up the game in a wide variety of proprietary formats. Upon further research, I discovered that the Playstation Portable version of Duke Nukem: Critical Mass was never actually released commercially and was in fact a very different beast than the Nintendo DS version of the game which did see release. I realized then that in my computer was the source disc used to author the UMD for an unreleased PlayStation Portable game. I could feel the lump in my throat. I felt as though I had solved the wizard’s riddle and unlocked the secret door.

Excerpt of code from boot.bin including game text.

Excerpt of code from boot.bin including game text.

The first challenge involved finding a way to access the proprietary Sony file formats contained within the disc, including, but not limited to, graphics files in .gim format and audio files in .AT3 format. I enlisted the aid of Packard Campus Software Developer Matt Derby and we were able to pull the files off of the disc and get a clearer sense of the file structure contained within. Through some research on various PSP homebrew sites we discovered Noesis, a program that would allow us to access the .gim and .gmo files which contain the 3D models and textures used to create the game’s characters and 3D environments. With this program we were able to view a complete 3D view of Duke Nukem himself, soaring through the air on his jetpack and a pre-composite 3D model of one of the game’s nemeses, the Pig Cops. Additionally, we employed Mediacoder and VLC in order to convert the Sony .AT3 (ATRAC3) audio files to MP3 in order to have access to the game’s many music cues.

duke_jp_gmo

3D model for Duke Nukem equipped with jetpack. View an animated gif of the model here.

Perhaps the most exciting discovery came when we used a hex editor to access the ASCII text held in the boot.bin folder in the disc’s system directory. Here we located the full text and credit information for the game along with a large chunk of un-obfuscated software code. However, much of what is contained in this folder was presented as compiled binaries. It is my hope that access to both the compiled binaries and ASCII code will allow us to explore future preservation options for video games. Such information becomes even more vital in the case of games such as this Duke Nukem title which were never released for public consumption. In many ways, this source disc can serve as an exemplary case as we work to define preferred format requirements for software received by the Library of Congress. Ultimately, I feel that access to the game assets and source code will prove to be invaluable both to researchers who are interested in game design and mechanics and to any preservation efforts the Library may undertake.

Providing access to the disc’s content to researchers will, unfortunately, remain a challenge. As mentioned above, it was difficult enough for Library of Congress staff to view the proprietary formats found on the disc before seeking help from the homebrew community. The legal and logistical hurdles related to providing access to licensed software will continue to present themselves as we move forward but I hope that increased focus on the tremendous research value of such digital assets will allow for these items to be more accessible in the future. For now the assets and code will be stored in our digital archive at the Packard Campus in Culpeper and the physical disc will be stored in temperature-controlled vaults.

The source disc for the PSP version of Duke Nukem: Critical Mass stands out in the video game collection of the Library of Congress as a true digital rarity. In Doug Reside’s recent article “File Not Found: Rarity in the Age of Digital Plenty” (pdf), he explores the notion of source code as manuscript and the concept of digital palimpsests that are created through the various layers that make up a Photoshop document or which are present in the various saved “layers” of a Microsoft Word document. The ability to view the pre-compiled assets for this unreleased game provides a similar opportunity to view the game as a work-in-progress, or at the very least to see the inner workings and multiple layers of a work of software beyond what is presented to us in the final, published version. In my mind, receiving the source disc for an unreleased game directly from the developer is analogous to receiving the original camera negative for an unreleased film, along with all of the separate production elements used to make the film. The disc is a valuable evidentiary artifact and I hope we will see more of its kind as we continue to define and develop our software preservation efforts.

The staff of the Moving Image section would love the opportunity to work with more source materials for games and I hope that game developers who are interested in preserving their legacy will be willing to submit these kinds of materials to us in the future. Though source discs are not currently a requirement for copyright, they are absolutely invaluable in contributing to our efforts towards stewardship and long term access to the documentation of these creative works.

Special thanks to Matt Derby for his assistance with this project and input for this post.

Making Scanned Content Accessible Using Full-text Search and OCR

This following is a guest post by Chris Adams from the Repository Development Center at the Library of Congress, the technical lead for the World Digital Library. We live in an age of cheap bits: scanning objects en masse has never been easier, storage has never been cheaper and large-scale digitization has become routine for […]

Web Archiving and Preserving the Performing Arts in the Digital Age

The following is a guest post from Gavin Frome, an intern for the Web Archiving Team at the Library of Congress. Performing artists are by necessity a traveling people. They journey far and wide in the pursuit of their respective crafts, working, learning and weaving a fabric of loose cultural connections that help bind people […]

Recommended Format Specifications from the Library of Congress: An Interview with Ted Westervelt

Continuing the NDSA Insights interview series, I am thrilled to talk about the new Library of Congress Recommended Format Specifications with Ted Westervelt, head of acquisitions and cataloging for U.S. Serials – Arts, Humanities & Sciences at the Library of Congress. Ted has been overseeing the development of the Recommended Format Specifications. While the specifications […]

National Digital Stewardship Residents Conclude Program with Capstone Meeting

The following is a guest post by Kris Nelson, Program Management Specialist at the Library of Congress and Program Coordinator of the National Digital Stewardship Residency. A version of this article was originally published in the Library of Congress weekly staff newspaper The Gazette. The National Digital Stewardship Residency concluded the inaugural year of the […]

Saving the Date: Exploring Calendar and Scheduling Formats

Each January, my family picks out a new wall calendar to hang in our kitchen. Its main appeal these days is nostalgic decoration since we no longer use it to write down our appointments or important dates. Like many people, we now rely on electronic calendar and scheduling tools built into personal information manager software […]

Shaking the Email Format Family Tree

Recently, we’ve started to add email formats to the Sustainability of Digital Formats website. Eventually, when we get a more robust collection, we’d like to split them out into a separate content category but for now, they (mostly) are categorized with their closest cousin, the Textual Content family.  Our genealogical research is still very much […]