Centralized Digital Accessioning at Yale University

This is a guest post from Alice Prael, Digital Accessioning Archivist for Yale Special Collections at the Beinecke Rare Book & Manuscript Library at Yale University.

Photograph of the digital lab, with computers and scan station. Photo by Alice Prael.

As digital storage technology progresses, many archivists are left with boxes of obsolete storage media, such as floppy disks and ZIP disks. These physical media plague archives that struggle to find the time and technology to access and rescue the content trapped in metal and plastic. The Digital Accessioning Service was created to address this problem across Yale University Libraries and Museums for special collections units, which house unique and rare materials that require specialized preservation and supervised access. Nine of Yale’s special collections units participate in the Digital Accessioning Service.

The goal of the Service is to centralize the capture of content off physical media and eliminate the backlog of born-digital material that has not yet been captured for preservation. Until now, this work was completed in an ad hoc fashion (often in response to researcher requests), which has led to a large backlog of disks that may have been described and separated from the collection but have never been fully processed. By centralizing digital accessioning, Yale Libraries leverages its hardware, software and expertise to make the Service available to special collections units that may lack the resources to capture and preserve born digital collections.

The Di Bonaventura Digital Archaeology and Preservation Lab, shared by the Beinecke Rare Book and Manuscript Library and the YUL Preservation Department, hosts the Digital Accessioning workstations. Two custom-built computers capture content from storage media such as floppy disks, CDs and hard drives. One non-networked computer scans media for viruses prior to content capture; it is kept off the network so that any virus encountered cannot spread to other systems. Another computer runs specialized software to scan captured content for private information and to handle other in-depth processing tasks. These machines form the technological base of the Digital Accessioning Service.

The Service is mainly staffed by me (with guidance from Gabby Redwine, Beinecke’s Digital Archivist) and the Born Digital Archives Working Group, which is made up of practitioners from across Yale University Libraries and Museums. The Service also employs student assistants to help with disk imaging and data entry.

Before Drafting Policies and Procedures
Before we drafted policies and procedures, the Digital Archivist and I met with the participating special collections units and talked with each unit about the collections they hold and their expectations for future born-digital acquisitions. (It’s important that the Service be able to handle all the major media types within our collections.) Prior to the creation of the Service, we completed an informal environmental scan to determine which media types the Service should be ready for and how much storage would be necessary to preserve all the captured content. Once the challenges began to take shape, I consulted with the Born Digital Archives Working Group and began building workflows and testing tools.

The Service uses a variety of software and hardware tools, including KryoFlux, Forensic Toolkit, IsoBuster and the BitCurator environment. More details about our usage of these tools are available in the Disk Imaging and Content Capture Manual on our Digital Accessioning Service LibGuide. I tested the workflow with dummy media, mostly software-installation disks. In an effort to stay as transparent as possible to special collections units and the larger digital-archives community, I published much of the Service’s documentation, including workflows, manuals and frequently asked questions, on the Born Digital Archives Working Group LibGuide.

The main steps of the workflow are:

  1. Complete a submission form (done by the special collections unit) and deliver media securely to the Lab
  2. Confirm that the boxes of media that arrived match the content described by the special collections unit
  3. Photograph the disks
  4. Scan the disks for viruses
  5. Connect to writeblockers (which block archivists from making any changes — accidentally or deliberately — to the original disk) and attempt to create an exact copy, called a disk image, of the content
  6. If disk-image creation fails, attempt to transfer files off storage media
  7. Scan captured content for personally identifiable information
  8. Package all captured content, photographs and associated metadata files for ingest into the preservation system.
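Step 5 above hinges on capturing a sector-exact copy of the disk, rather than copying individual files, and recording a checksum so the copy's integrity can be verified later. A minimal sketch of that idea in Python follows; the function name and chunk size are illustrative, and the Service itself relies on purpose-built tools such as KryoFlux and Forensic Toolkit rather than hand-rolled scripts:

```python
import hashlib

def image_disk(source_path, image_path, chunk_size=64 * 1024):
    """Copy a device or file byte-for-byte and return its SHA-256 digest.

    In the Service's workflow a hardware write blocker sits between the
    drive and the workstation, so the source media cannot be altered.
    """
    sha256 = hashlib.sha256()
    with open(source_path, "rb") as src, open(image_path, "wb") as img:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            img.write(chunk)       # the disk image: an exact copy
            sha256.update(chunk)   # checksum for later fixity checks
    return sha256.hexdigest()
```

Recording the digest alongside the image lets future audits confirm that the captured content has not changed since accessioning.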

Some record creators use every inch of their labels, leaving little room for archivists to apply their own naming conventions. Photo by Alice Prael.

In creating the Service, I encountered some unexpected challenges, many of which I documented on the Saving Digital Stuff blog. One challenge was determining a standard method for labeling the storage media. Labels are important for correctly identifying content and ensuring that the description remains permanently associated with the storage media, and each special collections unit labels its storage media prior to submission to the Service. We faced challenges labeling media already covered with text from the original record creator, as well as fragile media such as CDs and DVDs. Another challenge was the need for different tools to handle Compact Disc Digital Audio (CD-DA) discs, which have a higher error rate than CDs that contain other data. The Service ultimately decided to use Exact Audio Copy, a software tool created for capturing content from CD-DAs.

The Digital Accessioning Service is only one piece of a larger digital preservation and processing environment. The Service requires that special collections units provide a minimum level of description via spreadsheets that are imported into ArchivesSpace, the archival description management system adopted at Yale University Libraries. However, not all of the special collections units have fully implemented ArchivesSpace yet. By using the spreadsheets as an intermediate step, the Service can accommodate all special collections units regardless of their current stage of ArchivesSpace implementation.
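To illustrate the idea of the intermediate spreadsheet, here is a hedged sketch in Python. The column names below are hypothetical, not the Service's actual template, but they show how item-level rows for each disk can be written as CSV and later mapped into ArchivesSpace description:

```python
import csv

# Hypothetical item-level rows; the real template's columns may differ.
rows = [
    {"local_id": "disk_0001", "media_type": "3.5-inch floppy",
     "label_text": "Drafts, 1994", "collection": "Example Papers"},
    {"local_id": "disk_0002", "media_type": "CD-R",
     "label_text": "Photos", "collection": "Example Papers"},
]

# Write one row per disk so each item can become a record in ArchivesSpace.
with open("accession_manifest.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
```

Because the spreadsheet is plain CSV, units that have not yet adopted ArchivesSpace can still supply description in the same form as those that have.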

Once the Service’s disk processing is complete, the disk image, photographs, log files and other associated files get moved into the Library’s digital-preservation system, Preservica. Yale University Libraries’ implementation of Preservica is integrated with ArchivesSpace descriptions, which will aid future archivists in locating digital material described in our finding aids. Content from each disk is ingested into Preservica and listed as a digital object in ArchivesSpace, associated with the item-level description for the disk.

After Drafting Policies and Procedures
After drafting and revising the policies and procedures in collaboration with the Born Digital Archives Working Group, the Digital Archivist and I returned to the special collections units to make sure that our workflows would be sufficient for their materials.

One concern involved the immediate ingest of material into Preservica. Since many special collections units do not have the hardware to preview disks prior to submission for accessioning, the files themselves have not yet been appraised to determine their archival value. Once content is ingested for preservation, deletion is possible but much more onerous. For special collections units that require appraisal after accessioning, the Service decided to use the SIP Creator tool, developed by Preservica, to package content and maintain the integrity of the files, and then move the packaged content onto a shared network storage folder. Special collections units may then access and appraise their content prior to ingest for long-term preservation.
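The underlying principle of such packaging can be sketched briefly. This is not Preservica's SIP Creator format, just a minimal BagIt-style stand-in: gather the captured files into one directory and record a checksum manifest so integrity can be verified before and after ingest. The function name and layout are assumptions for illustration:

```python
import hashlib
import shutil
from pathlib import Path

def package_content(files, package_dir):
    """Copy captured files into package_dir and write a SHA-256 manifest.

    A simplified stand-in for a submission package; a real SIP carries
    richer descriptive and technical metadata than a bare manifest.
    """
    package = Path(package_dir)
    (package / "data").mkdir(parents=True, exist_ok=True)
    lines = []
    for f in map(Path, files):
        dest = package / "data" / f.name
        shutil.copy2(f, dest)  # preserve timestamps along with content
        digest = hashlib.sha256(dest.read_bytes()).hexdigest()
        lines.append(f"{digest}  data/{f.name}")
    (package / "manifest-sha256.txt").write_text("\n".join(lines) + "\n")
    return package
```

Recomputing the digests against the manifest at appraisal time confirms that nothing changed while the package sat on shared network storage.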

The focus of the Service at this point is to address the significant backlog of material that has been acquired but not yet captured for preservation. The Service is currently funded as a two-year project. As we approach the eight-month mark, we are using this time to determine the ongoing needs for special collections units at Yale. I hope that, as the backlog is diminished, the existence of the Service will aid in future born-digital collection development. Some special collections units have noted that in the past they were hesitant to accept certain donated material because they could not ensure the capture and preservation of the content. By removing this barrier, I hope that donors, curators and archivists across Yale University will be more comfortable working with born-digital material.
