As a follow-up to the “Preserving Digital Culture” panel from DigitalPreservation 2012, I’ve been interviewing panelists who haven’t yet been featured on The Signal. Megan Winget was previously interviewed, discussing the challenges of preserving new media. This week’s post features Doug Reside, Digital Curator with the New York Public Library for the Performing Arts. During the panel session last month, Doug told the fascinating story of his collaboration with the Library of Congress on the Jonathan Larson collection, and how he was able to reveal previously hidden information from born digital materials in the collection. Here, he talks about his background, a bit more about this project, and offers more thoughts on working with born digital collections.
Susan: Tell us a little about your background and how you became a Digital Curator.
Doug: As an undergraduate I double majored in English and Computer Science because I enjoyed both writing text and code as two related forms of creative expression. While working on my master’s degree, I wrote a program that compared the frequency of non-Latin characters in Old English manuscripts while taking a course on Beowulf, and in the process discovered humanities computing (as Digital Humanities was then called). For my Ph.D., I went to the University of Kentucky to join the digital medieval work there and ended up writing a dissertation that included an electronic edition of a 20th century musical theater text.
After graduating, I joined the staff of the Maryland Institute for Technology in the Humanities at the University of Maryland in College Park, first as assistant director and eventually as associate director. While at MITH, I worked on several projects involving digital preservation and curation, including helping to preserve and emulate William Gibson’s electronic poem, Agrippa; an NEH-funded white paper on literary born digital collections; and the CLIR report on using digital forensics in archives (and, of course, the Jonathan Larson research I’ll discuss later). The position of digital curator at New York Public Library actually emerged out of discussions I had with leadership at NYPL in the summer of 2010, and when the position was created I was honored to accept an offer to become the first to occupy it.
Susan: At the DigitalPreservation 2012 meeting, you described your recent project working with the Jonathan Larson papers at the Library of Congress. How did this come about?
Doug: I was working on a paper for a conference on the life and musicals of Stephen Sondheim and was interested in the way in which emerging digital composition practices might have affected his work. I asked Library of Congress Senior Music Specialist Mark Eden Horowitz if the library had yet received any born digital material as part of Sondheim’s papers (which had recently been promised to the Music Division). Mark told me there was nothing from Sondheim, but that the Jonathan Larson collection did include some floppy disks. He worked with me to develop an agreement between the Library, the Larson estate, and myself to permit me to study them.
Susan: Describe your basic process for uncovering Larson’s early drafts from all the floppy disks.
Doug: Early in my research, I discovered that about 80% of the disks were 800K Double Density disks and so could not be easily imaged with my USB floppy drive. I ended up finding a PowerBook G3 Wallstreet edition in the supply closet at MITH which had a drive capable of reading the media.
I then used disk imaging tools to create full disk images of the write-protected disks; this process generated exact bit-for-bit copies of all of the information on the disks. I then used a combination of plain text editors, hex editors, and emulators (software that simulates old computers) to read the drafts of RENT.
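[Editor’s note: For readers curious what a “bit-for-bit copy” involves, here is a minimal Python sketch of the general idea. It is not the tooling Doug used, which the interview does not name. It assumes the disk is already exposed to the computer as a readable file or device path, and the function names (`image_disk`, `file_sha256`, `printable_runs`) are hypothetical names chosen for illustration. The sketch copies every byte, records a checksum so the copy can be verified later, and pulls out runs of readable text, roughly what one looks for when scanning an image in a hex editor.]

```python
import hashlib
import re

CHUNK_SIZE = 512  # bytes per read; chosen for illustration only


def image_disk(source: str, destination: str, chunk_size: int = CHUNK_SIZE) -> str:
    """Copy every byte of `source` to `destination`.

    The source is opened read-only, so it is never modified.
    Returns the SHA-256 hex digest of the bytes that were read.
    """
    digest = hashlib.sha256()
    with open(source, "rb") as src, open(destination, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # an empty read means end of the source
                break
            dst.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()


def file_sha256(path: str, chunk_size: int = CHUNK_SIZE) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def printable_runs(data: bytes, min_length: int = 4) -> list[str]:
    """Return runs of at least `min_length` printable ASCII characters.

    This is a crude way to surface text stored inside an otherwise
    binary file or disk image.
    """
    pattern = re.compile(rb"[\x20-\x7e]{" + str(min_length).encode("ascii") + rb",}")
    return [match.decode("ascii") for match in pattern.findall(data)]
```

[In use, one would call `image_disk` with the path to the disk and a path for the new image file, then confirm that `file_sha256` on the image returns the same digest. Matching digests indicate that the image holds exactly the bytes that were read. Text recovered by `printable_runs` is only a starting point; files saved in proprietary formats generally still need the original software or an emulator to be read properly.]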
Susan: What was the biggest challenge in trying to capture this information?
Doug: Larson used a lot of old software which saved data in file formats that are difficult to open using commercially available tools. Unfortunately, the companies that produced the original software (if they still exist) have little incentive to either support their old products or release them into the public domain. Of course, some of this old software can be obtained through unofficial (and possibly illegal) channels on the Internet–many old video games are available today only because fans were willing to break the law to preserve them–but specialized programs (like early 1990s music programs) are often difficult to locate using any means. I’ve still been unable to locate a working copy of the version of Mark of the Unicorn’s Performer that Jonathan Larson used.
Susan: This project involved a pretty unique kind of collaboration here at the Library. Any other thoughts on that?
Doug: My work with the Larson collection was made possible entirely because Mark Horowitz, the Larson estate, and the Music Division at the Library of Congress were willing to take an unconventional (though still responsibly careful) approach to serving an unprocessed born digital collection. I think sometimes we who work in libraries and archives practice our role as guardians of material more fiercely than we practice our role as collaborators in research. Mark and his colleagues, on the other hand, were willing to extend themselves beyond established practice to help me get the access I needed. This is a great model of librarianship!
It’s also important to note that once I had migrated the data to the servers, it was up to me, as a researcher, to make sense of it. I think often we worry too much about doing research for our readers. Over the last decade or so we’ve come to understand that “more product, less process” is a better approach for paper collections, but I still hear a lot of fretting about how we will process and serve born digital collections if we, as library staff, don’t know how to access or emulate the files ourselves. My feeling is that our role is simply to give the researchers what they need and get out of the way.
Susan: What are some other projects/collections you are currently working on at NYPL?
Doug: I’m currently working on a system for serving born digital and digitized video of dance performance to our reading rooms and, where rights permit, to the wider world. I’m also curating a monthly blog series called “Musical of the Month,” which each month makes available the libretto of an out-of-copyright musical in TEI/XML, PDF, and various eBook formats. In addition, I’m working with the staff here to begin acquiring new born digital performing arts collections for the Library for the Performing Arts.