Metaphors for Understanding Born Digital Collection Access: Part I

The following is a guest post by Senior Archivist Kathleen O’Neill. Kathleen and her colleague Chad Conrady are currently working on a project called Born Digital Access Now! as the 2020 Staff Innovators in LC Labs. Their first blog post introduces the project, which aims to provide greater access to born digital materials held in the Manuscript Division. Today’s post is the first in a series of three blog posts in which Kathleen will discuss different challenges or barriers to born digital collection access through the lens of three different metaphors. Up first is: “Media Format, or, Have Fun Storming the Castle!”

One of the joys of processing a paper collection is the initial review — opening the boxes, noting the condition of the papers, getting clues from the folder headings, documenting any evident organization, dates, and types of materials. It feels like the beginning of an adventure or mystery to unlock, and when a folder heading sparks interest, you simply pick up the folder, open it, and dive in. While there are times paper materials need mold remediation, de-acidification, or stabilization before an archivist can begin processing, for the most part, paper materials allow immediate access to their contents.

For born digital material, the joy described above is not as immediate and is usually hard won. Yes, a review of the media formats and their labels can provide clues to the content and age of the materials. But labels are often incorrect and there is no way to know the number or formats of files by looking at the label. The barriers to overcome include media format, acquiring tools, and in some cases, obsolete software and operating systems. The experience can feel like falling down a rabbit hole, setting off on a mission, translating a foreign language, unlocking a code, or revealing a buried treasure. This work requires patience, trial and error, multiple tools, and most importantly, mixed metaphors. Join me as I walk you through the challenges of accessing born digital materials and, in the process, introduce you to some of our collections.

Media Format or “Have fun storming the castle!”

The Library of Congress Manuscript Division born-digital holdings include content from the 1980s to the present day. The media formats in our collections reflect the myriad ways people captured and stored information during that date range. So yes, we have computer tape, 8”, 5.25”, and 3.5” floppies, CDs and DVDs, hard drives, Bernoulli drives, USB drives, and content from proprietary online services and applications. Each has its inherent challenges.

For the physical media, the most basic question is: do you have the drives to access the media? Working with online services and apps entails questions about passwords, permissions, and concerns about altered metadata. We learn from our collections. I do not just mean learning subject matter and history, but we learn how to be archivists from the opportunities our collections provide us. This, of course, is true of working with paper materials, but born-digital materials are a relatively new format and the archival profession is still developing workflows, standard practices, and tools to access and preserve the digital content. I included the above line of dialogue from The Princess Bride because I think it perfectly captures my mood as I set off to tackle the digital content in the Seth MacFarlane Collection of the Carl Sagan and Ann Druyan Archive. Like the motley group off to save Princess Buttercup, I had a sense of mission and responsibility, a hard deadline to meet, and a blissful ignorance of the difficulties that lay ahead.

Seth MacFarlane collection of the Carl Sagan and Ann Druyan archive

The Manuscript Division received the Seth MacFarlane collection of the Carl Sagan and Ann Druyan archive in 2012, when our born-digital workflow was relatively new and still developing. Carl Sagan did not use a computer, so we expected some but not a great deal of digital content in the collection. The first pass on the paper materials uncovered two boxes of almost 200 storage media, a combination of 3.5” and 5.25” floppy disks. As the processing team worked over the next 18 months, more and still more media was found. In the end, the collection contained over 730 pieces of digital storage media, and it remains one of the Division’s largest collections in terms of the number of media.

When I inform Sagan admirers that he did not use a computer, they always look surprised and a bit disappointed. He used an array of technology in his fascinating creative process, which involved dictating sections of book drafts that were then transcribed for him. His writing process is described in more detail here: // Together, the physical and digital parts of the collection document not only his creative process, but an overlay of technologies including the adoption and usage of various digital storage media and file formats.

four 3.5” floppy disks, three 5.25” floppy disks, and a piece of paper with file listing.

(Fig. 1) 3.5” and 5.25” floppy disks from the Seth MacFarlane Collection of the Carl Sagan and Ann Druyan Archive

After I got over the shock of the amount and diversity of digital media, I realized that, fortunately, the collection contained primarily 3.5” floppy disks. Why fortunately? Well, since it was 2012, our work computers still had 3.5” floppy drives. We got to answer, in the affirmative, that first basic question: do you have the drives to access the media? Additionally, 3.5” floppies are surprisingly stable. If you have a 3.5” floppy drive, it is relatively easy to copy files off the media. These files were largely from the mid-1980s to 1997 and therefore tended to be small and uncomplicated, with simple or no hierarchies.

There were some bumps in the road recovering data from the 3.5” floppies. Some of the disks were damaged or the files were corrupt. We were able to recover content from the damaged and corrupt disks by using Forensic Toolkit (FTK) Imager to create disk images and then exporting the files from the disk images.
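
For readers curious about what creating a disk image involves, here is a minimal Python sketch of the general idea. It is not how FTK Imager works internally and it is not the workflow we used; it only illustrates a common approach in which media is copied one sector at a time and any unreadable sector is filled with zeros so the rest of the image stays aligned. The function name `image_media`, the 512-byte sector size, and the in-memory stand-ins for a disk are all assumptions made for illustration.

```python
import io

SECTOR_SIZE = 512  # assumed sector size, typical for PC-formatted floppies


def image_media(source, target, total_sectors, sector_size=SECTOR_SIZE):
    """Copy `source` to `target` one sector at a time (hypothetical helper).

    Sectors that cannot be read in full are padded with zero bytes so that
    every later sector lands at the correct offset in the image.
    Returns the list of sector numbers that were not fully readable.
    """
    bad_sectors = []
    for sector in range(total_sectors):
        try:
            source.seek(sector * sector_size)
            data = source.read(sector_size)
        except OSError:
            # Treat a read error as a damaged sector and keep going.
            data = b""
        if len(data) < sector_size:
            bad_sectors.append(sector)
            data = data.ljust(sector_size, b"\x00")
        target.write(data)
    return bad_sectors


if __name__ == "__main__":
    # In-memory stand-in for a two-sector disk with no damage.
    source = io.BytesIO(b"A" * (2 * SECTOR_SIZE))
    image = io.BytesIO()
    print(image_media(source, image, total_sectors=2))  # prints []
```

Keeping a list of the sectors that failed gives a record of where an image is incomplete, which matters when deciding whether files exported from that image can be trusted.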

In the end, we were able to recover the content from 420 out of 498 of the 3.5” floppy disks, comprising over 19,000 files (242.6 MB). These were largely text files. The lead archivist, Connie Cartledge, estimated that approximately 95% of the digital content was printed out and could be found in the paper portion of the collection.

There still remained significant media format challenges. We were unable to process the remaining 78 of the 3.5” floppy disks. Most of these were either double-sided double-density disks or Mac-formatted, which our computers could not read. In addition, we had not yet developed the 5.25” workflow and did not have the drives to read the 8” floppy disks and Bernoulli drives. The 8” floppy disks and Bernoulli drives remain inaccessible.

Digital processing lesson: The proper hardware in the form of floppy drive readers does not guarantee access to digital content.

Where does that leave us with the “storming the castle” metaphor? We’ve made progress, rescued significant content. We’ve scaled the wall and reached the courtyard, only to discover the door to the dungeon is locked and we’ve brought the wrong key.

Next week, join us for Legacy File Formats and Operating Systems or “Lost in Translation” when the Walter Sullivan papers teach me what happens when obsolete file formats meet modern day operating systems.


Note: this post has been slightly edited for clarity.
