Developing a Digital Preservation Infrastructure at Georgetown University Library

This is a guest post by Joe Carrano, a resident in the National Digital Stewardship Residency program.

Preliminary sketch of Lauinger Library by John Carl Warnecke. Photo courtesy of Georgetown University Library.

The Joseph Mark Lauinger Memorial Library is at home among the many Brutalist-style buildings in and around Washington, D.C. This granite-chip aggregate structure, the main library at Georgetown University, houses a moderate-sized staff that meets critical information needs and provides assistance to the University community. Within that staff is the library IT department, half of which is focused on programming, web development and support for computers and systems; the other half is focused on digital services such as digital publishing, digitization and digital preservation. These functions overlap and support each other, which creates a need to work together.

Mingling librarians and IT professionals within the same department is a little different from the way in which many libraries and archives handle the division of services. In some other organizations, the two services are in separate departments and the relationship of librarians and archivists with IT can be dysfunctional. At Georgetown, both types of professionals work closely together, fostering better communication and making it easier to get things done. Often it is invaluable to have people with a depth of knowledge from many different areas working together in the same department. For instance, it’s nice to have people around who really understand computer hardware when you’re trying to transfer data off of obsolete media. They may even have an old collection of floppy disks to donate for testing.

While digital preservation and IT are centered in one department, the preservation files for digitized and born-digital material are spread throughout the library, in different systems and on different storage media. My NDSR project focuses in part on bringing together these materials and documenting the workflows for putting them into Academic Preservation Trust (APTrust), a digital preservation repository.

Georgetown’s decision to use APTrust stemmed from a digital-preservation working group at the library. Group members identified strategies the library could adopt to improve digital preservation management and the steps it should take to reach those goals. Four of the six “next steps” identified in the group’s plan helped form my NDSR project:

1. Implement preservation infrastructure, including a digital-preservation repository
2. Develop and document digital-preservation workflows and procedures
3. Develop a training program and documentation to help build specialized skills for librarians and staff
4. Explore and expand collaborations with internal (university-wide) and external partners to enhance and extend a sustainable infrastructure and to further the library’s involvement at the regional and national levels in digital-preservation strategies.

These goals build upon each other to create a sustainable digital-preservation framework. Membership in — and use of — APTrust fulfilled the first and fourth of these goals. In addition, the difficulty and cost associated with the creation of our own trusted digital repository led the staff to choose this option, which could meet our needs just as well.

APTrust is a distributed, dark, digital-preservation repository, which stores member institutions’ digital materials in Amazon Web Services’ cloud storage (Amazon S3 and Glacier), in two geographic regions – Virginia and Oregon – and in three different “availability zones” within each of those regions. Digital content is ingested and duplicated six times (once in each zone), so the base 10TB of storage per member institution actually amounts to a total of 60TB. Along with the other preservation actions performed by APTrust, this distribution should help ensure long-term preservation of Georgetown’s digital materials.
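
The storage figure above is simple multiplication, and a short sketch makes the count explicit. It assumes, as the six-copy total implies, three availability zones in each of the two regions; the constant and function names are illustrative and are not part of any APTrust interface.

```python
# Illustrative arithmetic only; the figures come from the paragraph above.
# Assumption: three availability zones in EACH region, which is what the
# six-copy total implies.
REGIONS = ["Virginia", "Oregon"]
ZONES_PER_REGION = 3
BASE_ALLOCATION_TB = 10


def replicated_storage_tb(base_tb, regions, zones_per_region):
    """Return (number of copies, total TB stored) for one copy per zone."""
    copies = len(regions) * zones_per_region
    return copies, base_tb * copies


copies, total_tb = replicated_storage_tb(BASE_ALLOCATION_TB, REGIONS, ZONES_PER_REGION)
print(f"{copies} copies, {total_tb} TB stored in total")
```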

As well as being a digital preservation repository, APTrust is also a consortium of higher-education institutions that are a part of the governance and development of the repository. For instance, I and other Georgetown staff members are in the Bagging Best Practices working group, which determines member institutions’ needs relating to the BagIt specification and how BagIt is used for packaging and transfer of material into APTrust.
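
For readers who have not seen BagIt, a bag is a directory holding a `bagit.txt` declaration, a `data/` payload folder and a manifest of checksums. The following is a minimal sketch of that layout using only Python’s standard library. It is not Georgetown’s tooling, and it does not model any APTrust-specific bag requirements, which this post does not describe; `make_bag` and the sample file names are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def make_bag(source_dir, bag_dir):
    """Copy source_dir into bag_dir/data and write bagit.txt plus a SHA-256 manifest.

    Hypothetical helper for illustration; real workflows would normally use
    an established BagIt library or tool.
    """
    source_dir, bag_dir = Path(source_dir), Path(bag_dir)
    data_dir = bag_dir / "data"
    shutil.copytree(source_dir, data_dir)  # also creates bag_dir

    # Bag declaration required by the BagIt specification.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # Payload manifest: one "<checksum>  <path relative to bag root>" line per file.
    lines = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.relative_to(bag_dir).as_posix()}")
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(lines) + "\n", encoding="utf-8"
    )
    return bag_dir


# Usage: bag a throwaway folder containing a single file.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "scans"
    source.mkdir()
    (source / "page1.txt").write_text("hello", encoding="utf-8")
    bag = make_bag(source, Path(tmp) / "example_bag")
    print((bag / "manifest-sha256.txt").read_text(encoding="utf-8"))
```

The manifest is what lets a receiving repository confirm that every payload file arrived intact.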

So, while Georgetown gets a hosted digital-preservation repository, it also gets to guide the repository’s development and participate regionally and nationally in the digital-preservation community. (If you’re interested in the guiding principles of APTrust, check out this Signal interview from 2015.)

Joe Carrano. Photo by Mike Matason, Georgetown University Library.

By implementing APTrust, we are also able to fulfill, in part, steps 2 and 3 mentioned above. Georgetown’s migration of materials into APTrust depends on the creation of tools to manage and upload them. This is where the close working relationship with the developer in our department, Terry Brady, has been essential, allowing us to draw on each other’s expertise to create custom automated solutions that fit Georgetown’s needs. The code for Terry’s BagIt tool and upload verification tool is available on GitHub.
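
To give a sense of what a verification step can involve, here is a minimal sketch of one common check: recomputing checksums and comparing them with a bag’s manifest. This is an assumption about the general technique, not a description of Terry’s tools, whose behavior is not covered in this post; `verify_manifest` and the sample file names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def verify_manifest(bag_dir, manifest_name="manifest-sha256.txt"):
    """Return the payload paths that are missing or whose SHA-256 differs
    from the value recorded in the manifest (hypothetical helper)."""
    bag_dir = Path(bag_dir)
    problems = []
    manifest = (bag_dir / manifest_name).read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file():
            problems.append(rel_path)
        elif hashlib.sha256(target.read_bytes()).hexdigest() != expected:
            problems.append(rel_path)
    return problems


# Usage: build a tiny bag by hand, check it, then corrupt the payload.
with tempfile.TemporaryDirectory() as tmp:
    bag = Path(tmp) / "example_bag"
    (bag / "data").mkdir(parents=True)
    payload = bag / "data" / "letter.txt"
    payload.write_text("original text", encoding="utf-8")
    digest = hashlib.sha256(payload.read_bytes()).hexdigest()
    (bag / "manifest-sha256.txt").write_text(
        f"{digest}  data/letter.txt\n", encoding="utf-8"
    )

    intact = verify_manifest(bag)         # no problems expected
    payload.write_text("changed text", encoding="utf-8")
    after_change = verify_manifest(bag)   # the altered file is reported
    print(intact, after_change)
```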

So far, we’ve completed workflows and ingest for all content that had preservation copies in our DSpace digital repository, DigitalGeorgetown. I am also developing documentation and workflows so that any staff member can sit down and upload materials into APTrust without much training.

I’ve begun training librarians and archivists in other departments to ensure the sustainability of the project’s outcome. Digital curation and preservation tasks are becoming more and more commonplace and we believe that these skills need to be dispersed throughout our institution rather than performed by only a few people. Other staff here have been smoothly integrated into this process, thanks to our thorough documentation. Their new skills help speed up our ingest rate. This documentation will be open to the public when complete; we hope that it will be useful to the wider library and archival community.

Currently, we’re working on ingesting materials into APTrust whose preservation copies are on network or external storage and whose metadata is in DigitalGeorgetown. This is less automated because we have to retrieve preservation copies from external storage rather than from our DSpace servers. After that, we will move on to items with metadata and/or files from other systems such as ArchivesSpace, embARK (art collections) and the library catalog.

By the end of this process we hope to have all our preservation copies transferred and the infrastructure in place to keep digital preservation sustainable at Georgetown.