Accessing Our Digital Past in the Manuscript Division Reading Room

In this guest post, Kathleen O’Neill, Senior Archives Specialist in the Manuscript Division, announces the launch of a new workstation in the Library’s Manuscript Division reading room. The workstation is equipped with specialized software that allows members of the public to examine complex and legacy born-digital files in the Division’s collections and represents the fruition of the 2020 Staff Innovator project, Born Digital Access Now!

Signal Blog readers may remember Kathleen and her colleague Chad Conrady joining the LC Labs team for the 120-day Staff Innovator detail in 2020 and 2021. During that time they researched the file formats in the Manuscript Division’s collections, identified existing tools and technology for born-digital access, and developed recommendations for a workstation that met user needs as well as the Library’s technical requirements and rights restrictions.

This post was originally shared on the Library’s Unfolding History blog, and readers are encouraged to reach out directly to the Manuscript Division’s reference librarians for more information on how to use the workstation.

The Manuscript Division, located in the Library’s James Madison Building in Washington, D.C., is excited to announce the launch of a born-digital access workstation in the Manuscript Reading Room.

Born-digital collection materials are files created and maintained in digital form. Unlike digitized content, born-digital files are not surrogates for physical materials; their original format is digital. Word processing documents, websites, email, digital photographs, and databases are all examples of born-digital files. The division’s born-digital holdings span the late 1970s to the present and encompass many of the media formats, file formats, software, and operating systems in use during this period. Files in obsolete formats (e.g., WordPerfect for DOS) or files created with obsolete operating systems and software (e.g., Mac OS 9 and MacDraw Pro v.1.5) require specialized tools for access, and that is where the born-digital access workstation comes in. An estimated 85% of the division’s born-digital material is accessible using modern software and file viewers. However, many files created before 2001, particularly Mac files, cannot be rendered using modern software.

Obsolete operating systems and file format issues particularly affect access to science and journalism collections. For example, up to 40% of the born-digital files in the papers of molecular biologist Nina V. Fedoroff are inaccessible without specialized tools capable of running outdated software. One approach to providing access is emulation, a process by which modern software imitates legacy operating systems and software. Emulators have been around since the mid-1960s but were popularized more recently by the video gaming community. Since the mid-1990s, video game enthusiasts have used emulators to preserve the look and feel of classic video games such as Nintendo’s Excitebike. The tools installed on the Manuscript Division’s born-digital access workstation can emulate a range of Apple and DOS operating systems. These emulation tools include Basilisk II (Mac OS 0.x through 8.1), SheepShaver (Mac OS 7.5.3 to 9.0.4), QEMU (Mac OS 9.2 to 10.4), and DOSBox (MS-DOS).

In practical terms, these emulation tools make thousands of previously inaccessible files available to researchers in the Manuscript Division’s reading room. Fedoroff, for instance, initially used paper index cards to document the hybridization of “jumping genes” in corn. Around 1988, she stopped using the paper index cards and began using a Mac-based HyperCard program called HyperMaize to record this information. With the new workstation, researchers can now access and interact with more than 1,000 HyperMaize files in her papers.

Black, white, and gray image of a single HyperMaize card, showing multiple fields documenting hybridized corn genetic data.

Fig. 1. Emulated HyperMaize Card. Digital ID: mss85579_042_006_disk_image_ver01, HyperMaize/Ear information/EAR CARDS, June 15, 1995, 7:13 am, Nina V. Fedoroff Papers, Manuscript Division, Library of Congress, Washington, D.C.

Similarly, the papers of chaos theorist Edward N. Lorenz contain several DOS-based data visualization programs created for Lorenz attractor data. Without the DOSBox emulator, researchers could only look at the program and data files as text files (Fig. 2) or watch a screen capture of the data visualization running (Fig. 3). Now, researchers can interact with the software by changing inputs or even using their own data with the program.

Two side-by-side images. The left-hand image shows a black and white directory listing of the files for the data visualization. The files date from 1990 to 1992. The right-hand image shows a screen capture of the visualization program running. The image has a black background with fine blue lines forming the distinctive butterfly shape.

Left: Fig. 2. File list for chaos water wheel. Right: Fig. 3. Still image of chaos water wheel visualization. Digital ID: mss85426_060_003, “Chaos Water Wheel,” software by Page, Mike & Jim Holsapple, 1990-1992. Edward N. Lorenz Papers, Manuscript Division, Library of Congress, Washington, D.C.

The HyperMaize cards and the Chaos Water Wheel software are examples of file formats that are completely inaccessible without emulation. Many more files are only partially accessible using modern software, but emulation enables us to access the full information and formatting of a file. The example below, again from the Fedoroff Papers, shows an image rendered with modern software on the left and a fully emulated version on the right.

Two side-by-side images. The left-hand image shows black rectangles, arrows, and text on a gray background. The image takes up the top half of the page, and the remaining space is blank. The right-hand image is in full color, with a bright blue background and light blue, purple, orange, green, and red rectangles. The right-hand image shows additional information not visible in the left-hand image, including an extra row of rectangles and title information.

Left: Fig. 4. 01.Ds transposon slide with modern file viewer. Right: Fig. 5. 01.Ds transposon slide. Emulated with SheepShaver in Mac OS 9 and opened in MacDraw. Digital ID: mss85579_042_021_ver01\data\Slides 9_96\Transposon tagging\01.Ds transposon slide-Preservation. Nina V. Fedoroff Papers, Manuscript Division, Library of Congress, Washington, D.C.

The newly installed tools on the workstation focus on supporting access: they allow researchers to view and copy files and extract metadata. Researchers do not have permission to copy software. Additional tools to support digital humanities research methods, including tools for text analysis, topic modeling, and data visualization, will be installed soon.

The workstation represents the fruition of the 2020 Staff Innovator project, Born Digital Access Now!, a 120-day detail focused on investigating barriers to born-digital access. The project enabled staff innovators Chad Conrady and Kathleen O’Neill, senior archives specialists in the Manuscript Division, to identify existing tools and technology to support access to the full range of the division’s born-digital holdings. The Manuscript Division would like to thank the Digital Strategy Directorate and LC Labs for creating, supporting, and hosting the Staff Innovator project and, in particular, Eileen Jakeway, for her tenacity and creativity as project lead. The project was essential for the Manuscript Division to reach its goal of improving access to and engaging researchers with born-digital material.

The Manuscript Division looks forward to welcoming researchers interested in working with our born-digital collections. The division holds more than 120 processed and described collections containing born-digital files, and new finding aids are added monthly. Only processed, open collections are accessible. To find out more about what materials are currently available, please browse or search the division’s finding aids or ask a librarian in the Manuscript Division. When searching the Finding Aids database, type the search term “digital ID” in the search box, select “All Words” from the drop-down menu, and limit the search to “Manuscript” in the Collections drop-down list. You may also call (202) 707-5387 to set up an appointment to use the workstation in the reading room. Please share widely!

Do you want more stories like this? Then subscribe to Unfolding History – it’s free!