Centralized Digital Accessioning at Yale University

This is a guest post from Alice Prael, Digital Accessioning Archivist for Yale Special Collections at the Beinecke Rare Book & Manuscript Library at Yale University.

Photograph of the digital lab, with computers and a scan station. Photo by Alice Prael.

As digital storage technology progresses, many archivists are left with boxes of obsolete storage media, such as floppy disks and ZIP disks. These physical storage media plague archives that struggle to find the time and technology to access and rescue the content trapped in the metal and plastic. The Digital Accessioning Service was created to address this problem across Yale University Libraries and Museums for special collections units that house unique and rare materials and require specialized preservation and supervised access. Nine of Yale’s special collections units participate in the Digital Accessioning Service.

The goal of the Service is to centralize the capture of content off physical media and eliminate the backlog of born-digital material that has not yet been captured for preservation. Until now, this work was completed in an ad hoc fashion (often in response to researcher requests), which has led to a large backlog of disks that may have been described and separated from the collection but have never been fully processed. By centralizing digital accessioning, Yale Libraries leverages its hardware, software and expertise to make the Service available to special collections units that may lack the resources to capture and preserve born-digital collections.

The Di Bonaventura Digital Archaeology and Preservation Lab, shared by the Beinecke Rare Book and Manuscript Library and the YUL Preservation Department, hosts the Digital Accessioning workstations. There are two custom-built computers created to capture content from storage media such as floppy disks, CDs and hard drives. One non-networked computer is used to scan media for viruses prior to capturing the content (it is disconnected from the network so that viruses cannot get loose and spread throughout the network). Another computer has specialized software for scanning the content for private information and for other in-depth processing tasks. These machines form the technological base of the Digital Accessioning Service.

The Service is mainly staffed by me (with guidance from Gabby Redwine, Beinecke’s Digital Archivist) and the Born Digital Archives Working Group, which is made up of practitioners from across Yale University Libraries and Museums. The Service also employs student assistants to help with disk imaging and data entry.

Before Drafting Policies and Procedures
Before we drafted policies and procedures, the Digital Archivist and I met with the participating special collections units and talked with each unit about the collections they hold and their expectations for future born-digital acquisitions. (It’s important that the Service be able to handle all the major media types within our collections.) We completed an informal environmental scan prior to the creation of the Service to determine what media types the Service should be ready for and how much storage would be necessary to preserve all the captured content. Once the challenges began to take shape, I consulted with the Born Digital Archives Working Group and began building workflows and testing tools.

The Service uses a variety of software and hardware tools, including Kryoflux, Forensic Toolkit, IsoBuster and the BitCurator environment. More details about our usage of these tools are available in the Disk Imaging and Content Capture Manual on our Digital Accessioning Service Libguide. I tested the workflow with dummy media, mostly using software-installation disks. In an effort to stay as transparent as possible to special collections units and the larger digital-archives community, I published much of the Service’s documentation — including workflows, manuals and frequently asked questions — on the Born Digital Archives Working Group Libguide.

The main steps of the workflow are:

  1. Complete a submission form (done by the special collections unit) and deliver media securely to the Lab
  2. Confirm that the boxes of media that arrived match the content described by the special collections unit
  3. Photograph the disks
  4. Scan the disks for viruses
  5. Connect the media to write blockers (which prevent archivists from making any changes, accidentally or deliberately, to the original disk) and attempt to create an exact copy, called a disk image, of the content
  6. If disk-image creation fails, attempt to transfer files off storage media
  7. Scan captured content for personally identifiable information
  8. Package all captured content, photographs and associated metadata files for ingest into the preservation system.
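
As a rough illustration of steps 5 and 6 above, the "try a disk image first, then fall back to a file transfer" logic might be sketched as follows. This is not the Service's actual tooling: `capture_media`, `CaptureResult` and the two callables passed in are hypothetical names, standing in for whatever imaging and file-copy tools a lab uses.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CaptureResult:
    media_id: str
    method: str  # "disk_image", "file_transfer", or "failed"
    notes: List[str] = field(default_factory=list)


def capture_media(
    media_id: str,
    create_image: Callable[[str], None],    # hypothetical wrapper around an imaging tool
    transfer_files: Callable[[str], None],  # hypothetical wrapper around a file copy
) -> CaptureResult:
    """Attempt a disk image first; if that fails, attempt a file transfer."""
    notes: List[str] = []
    try:
        create_image(media_id)
        return CaptureResult(media_id, "disk_image", notes)
    except OSError as err:
        # Record why imaging failed so the reason travels with the metadata.
        notes.append(f"disk image failed: {err}")
    try:
        transfer_files(media_id)
        return CaptureResult(media_id, "file_transfer", notes)
    except OSError as err:
        notes.append(f"file transfer failed: {err}")
    return CaptureResult(media_id, "failed", notes)
```

Keeping the failure notes alongside the result means the packaged metadata can record which capture method succeeded for each disk and why an earlier attempt did not.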

Photo of a floppy disk. Some record creators use every inch of their labels, leaving little room for archivists to apply their own naming conventions. Photo by Alice Prael.

In creating the Service, I encountered some unexpected challenges, many of which I documented on the Saving Digital Stuff blog. One challenge was determining a standard method for labeling the storage media. It is important that media be labeled in order to correctly identify content and ensure that the description is permanently associated with the storage media. Each special collections unit labels storage media prior to submission to the Service. We had challenges in labeling media that were already covered with text from the original record creator. We also faced difficulties labeling fragile media such as CDs and DVDs. Another challenge was the need for different tools for handling Compact Disc Digital Audio discs, or CD-DAs, which have a higher error rate than CDs that contain other data. The Service ultimately decided to use Exact Audio Copy, a software tool created for capturing content from CD-DAs.

The Digital Accessioning Service is only one piece of a larger digital preservation and processing environment. The Service requires that special collections units provide a minimum level of description via spreadsheets that get imported into ArchivesSpace, the archival description management system adopted at Yale University Libraries. However, not all of the special collections units have fully implemented ArchivesSpace yet. By using the spreadsheets as an intermediate step, the Service can accommodate all special collections units’ needs regardless of their current stage of ArchivesSpace implementation.
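
To make the idea of a "minimum level of description" concrete, here is a minimal sketch of checking a submission spreadsheet (exported as CSV) for blank required fields before import. The column names in `REQUIRED_COLUMNS` are assumptions chosen for illustration; the Service's real spreadsheet template is not described in this post.

```python
import csv
import io

# Hypothetical required columns; the real template may differ.
REQUIRED_COLUMNS = ("collection_id", "media_label", "media_type")


def find_incomplete_rows(csv_text):
    """Return (row_number, missing_columns) for each row lacking a required value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    problems = []
    # Row 1 is the header, so data rows start at 2.
    for row_number, row in enumerate(reader, start=2):
        missing = [
            column
            for column in REQUIRED_COLUMNS
            if not (row.get(column) or "").strip()
        ]
        if missing:
            problems.append((row_number, missing))
    return problems
```

A check like this could run before anything is sent to ArchivesSpace, so that incomplete rows go back to the submitting unit rather than failing during import.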

Once the Service’s disk processing is complete, the disk image, photographs, log files and other associated files get moved into the Library’s digital-preservation system, Preservica. Yale University Libraries’ implementation of Preservica is integrated with ArchivesSpace descriptions, which will aid future archivists in locating digital material described in our finding aids. Content from each disk is ingested into Preservica and listed as a digital object in ArchivesSpace, associated with the item-level description for the disk.

After Drafting Policies and Procedures
After drafting and revising the policies and procedures in collaboration with the Born Digital Archives Working Group, the Digital Archivist and I returned to the special collections units to make sure that our workflows would be sufficient for their materials.

One concern involved the immediate ingest of material into Preservica. Since many special collections units do not have the hardware to preview disks prior to submission for accessioning, the files themselves have not yet been appraised to determine their archival value. Once content is ingested for preservation, deletion is possible but much more onerous. For special collections units that require appraisal post-accessioning, the Service decided to use the SIP Creator tool, developed by Preservica, to package content and maintain the integrity of the files, then move the packaged content onto a shared network storage folder. Special collections units may then access and appraise their content prior to ingest for long-term preservation.
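
The general idea of maintaining file integrity while content waits on shared storage can be illustrated with a checksum manifest. The sketch below is a generic fixity check written with Python's standard library. It is not how SIP Creator works internally, which this post does not describe, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute a file's SHA-256 digest, reading in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(package_dir):
    """Map each file's path (relative to the package) to its checksum."""
    root = Path(package_dir)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(package_dir, manifest):
    """List files that were added, removed, or altered since the manifest was made."""
    current = build_manifest(package_dir)
    names = set(manifest) | set(current)
    return sorted(name for name in names if manifest.get(name) != current.get(name))
```

A manifest built at packaging time and rechecked before ingest would show whether anything changed while the content sat on the shared folder during appraisal.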

The focus of the Service at this point is to address the significant backlog of material that has been acquired but not yet captured for preservation. The Service is currently funded as a two-year project. As we approach the eight-month mark, we are using this time to determine the ongoing needs for special collections units at Yale. I hope that, as the backlog is diminished, the existence of the Service will aid in future born-digital collection development. Some special collections units have noted that in the past they were hesitant to accept certain donated material because they could not ensure the capture and preservation of the content. By removing this barrier, I hope that donors, curators and archivists across Yale University will be more comfortable working with born-digital material.
