Developing a Digital Preservation Infrastructure at Georgetown University Library

This is a guest post by Joe Carrano, a resident in the National Digital Stewardship Residency program.

Preliminary sketch of Lauinger Library by John Carl Warnecke. Photo courtesy of Georgetown University Library.


The Joseph Mark Lauinger Memorial Library is at home among the many Brutalist-style buildings in and around Washington, D.C. This granite-chip aggregate structure, the main library at Georgetown University, houses a moderate-sized staff that meets critical information needs and provides assistance to the University community. Within that staff is the library IT department, half of which is focused on programming, web development and support for computers and systems; the other half is focused on digital services such as digital publishing, digitization and digital preservation. These functions overlap and support each other, which creates a need to work together.

Mingling librarians and IT professionals within the same department is a little different from the way in which many libraries and archives handle the division of services. In some other organizations, the two services are in separate departments and the relationship of librarians and archivists with IT can be dysfunctional. At Georgetown, both types of professionals work closely together, fostering better communication and making it easier to get things done. Often it is invaluable to have people with a depth of knowledge from many different areas working together in the same department. For instance, it’s nice to have people around who really understand computer hardware when you’re trying to transfer data off obsolete media. They may even have an old collection of floppy disks to donate for testing.

While digital preservation and IT are centered in one department, the preservation files for digitized and born-digital material are spread throughout the library, in different systems and on different storage media. My NDSR project focuses in part on bringing together these materials and documenting the workflows for putting them into Academic Preservation Trust (APTrust), a digital preservation repository.

Georgetown’s decision to use APTrust stemmed from a digital-preservation working group at the library. Group members identified strategies the library could adopt to improve digital preservation management and methods it should use to implement its goals. Four of the six “next steps” identified in this plan helped form my NDSR project:

1. Implement preservation infrastructure, including a digital-preservation repository
2. Develop and document digital-preservation workflows and procedures
3. Develop a training program and documentation to help build specialized skills for librarians and staff
4. Explore and expand collaborations with internal (university-wide) and external partners to enhance and extend a sustainable infrastructure and to further the library’s involvement at the regional and national levels in digital-preservation strategies.

These goals build upon each other to create a sustainable digital-preservation framework. Membership in — and use of — APTrust fulfilled the first and fourth of these goals. In addition, the difficulty and cost associated with the creation of our own trusted digital repository led the staff to choose this option, which could meet our needs just as well.

APTrust is a distributed, dark, digital-preservation repository, which stores member institutions’ digital materials in Amazon Web Services’ cloud storage (Amazon S3 and Glacier), in two geographic regions – Virginia and Oregon – and in three different “availability zones” within those regions. Digital content is ingested and duplicated six times (once in each zone), so the base 10TB of storage per member institution actually amounts to a total of 60TB. Along with the other preservation actions performed by APTrust, this distribution should help ensure long-term preservation of Georgetown’s digital materials.
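The storage arithmetic above can be spelled out in a few lines. This is only a sketch of the numbers quoted in this post; the variable and function names are my own illustration and not part of any APTrust tool.

```python
# Figures taken from the paragraph above; all names here are illustrative.
REGIONS = ["Virginia", "Oregon"]
ZONES_PER_REGION = 3
BASE_ALLOCATION_TB = 10


def total_copies(regions, zones_per_region):
    """One copy per availability zone, across every region."""
    return len(regions) * zones_per_region


def raw_storage_tb(base_tb, copies):
    """Raw storage consumed when each deposited terabyte is held `copies` times."""
    return base_tb * copies


copies = total_copies(REGIONS, ZONES_PER_REGION)
print(copies, raw_storage_tb(BASE_ALLOCATION_TB, copies))  # prints: 6 60
```

Two regions with three zones each give six copies, which is how a 10TB allocation becomes 60TB of stored data.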

As well as being a digital preservation repository, APTrust is also a consortium of higher-education institutions that take part in the governance and development of the repository. For instance, I and other Georgetown staff members are in the Bagging Best Practices working group, which determines member institutions’ needs relating to the BagIt specification [watch the video] and how BagIt is used for packaging and transfer of material into APTrust.
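For readers who have not seen a BagIt "bag," here is a minimal sketch of what packaging involves: a `data/` directory holding the payload, a `bagit.txt` declaration, and a manifest listing a checksum for each payload file. It uses only the Python standard library. The function name `make_bag`, the choice of SHA-256, and the version string 0.97 are assumptions made for illustration. This is not Georgetown's tool, and it does not include the additional tag files or naming rules that APTrust may require.

```python
import hashlib
import shutil
from pathlib import Path


def make_bag(source_dir, bag_dir):
    """Copy source_dir into bag_dir/data and write the two basic tag files.

    Hypothetical helper for illustration; a real deposit would likely need
    extra metadata files that are not shown here.
    """
    source = Path(source_dir)
    bag = Path(bag_dir)
    payload = bag / "data"
    # copytree creates bag_dir if needed and fails if data/ already exists.
    shutil.copytree(source, payload)

    # Bag declaration: BagIt version and the encoding of the tag files.
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # Payload manifest: one "<checksum>  <relative path>" line per file.
    lines = []
    for path in sorted(p for p in payload.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.relative_to(bag).as_posix()}\n")
    (bag / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    return bag
```

Calling `make_bag("scans/item_001", "bags/item_001")` (both paths hypothetical) would produce a directory that can be checked later by recomputing each checksum and comparing it with the manifest.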

So, while Georgetown gets a hosted digital-preservation repository, it also gets to help guide APTrust’s efforts and participate regionally and nationally in the digital-preservation community. (If you’re interested in the guiding principles of APTrust, check out this Signal interview from 2015.)

Joe Carrano. Photo by Mike Matason, Georgetown University Library.


Through implementing APTrust, we are also able to fulfill, in part, steps 2 and 3 mentioned above. Georgetown’s migration of materials into APTrust depends on the creation of tools to manage and upload them. This is where the close working relationship with the developer in our department, Terry Brady, has been essential, allowing us to draw on each other’s expertise to create custom automated solutions to fit Georgetown’s needs. The code for Terry’s BagIt tool and upload verification tool is available on GitHub.
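To give a sense of the kind of check that verification relies on, here is a minimal sketch that recomputes checksums for a bag on disk and compares them with its manifest. It is not Terry's tool and makes no calls to APTrust. The function name `verify_manifest` and the manifest file name `manifest-sha256.txt` are assumptions for illustration only.

```python
import hashlib
from pathlib import Path


def verify_manifest(bag_dir, manifest_name="manifest-sha256.txt"):
    """Return payload paths that are missing or whose checksum has changed.

    Hypothetical helper: assumes each manifest line looks like
    "<sha256 hex digest>  <path relative to the bag directory>".
    """
    bag = Path(bag_dir)
    mismatches = []
    manifest_text = (bag / manifest_name).read_text(encoding="utf-8")
    for line in manifest_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, relative_path = line.split(maxsplit=1)
        target = bag / relative_path
        if not target.is_file():
            mismatches.append(relative_path)
            continue
        actual = hashlib.sha256(target.read_bytes()).hexdigest()
        if actual != expected.lower():
            mismatches.append(relative_path)
    return mismatches
```

An empty list means every file listed in the manifest is present and unchanged; anything else names the files that need a second look before or after transfer.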

So far, we’ve completed workflows and ingest for all content that had preservation copies in our DSpace digital repository, DigitalGeorgetown. I am also developing documentation and workflows so that any staff member can sit down and upload materials into APTrust without much training.

I’ve begun training librarians and archivists in other departments to ensure the sustainability of the project’s outcome. Digital curation and preservation tasks are becoming more and more commonplace and we believe that these skills need to be dispersed throughout our institution rather than performed by only a few people. Other staff here have been smoothly integrated into this process, thanks to our thorough documentation. Their new skills help speed up our ingest rate. This documentation will be open to the public when complete; we hope that it will be useful to the wider library and archival community.

Currently, we’re working on ingesting materials into APTrust that have their preservation copy on network or external storage and metadata in DigitalGeorgetown. This process is less automated because we have to retrieve preservation copies from external storage rather than from our DSpace servers. Next, we will move on to items with metadata and/or files in other systems such as ArchivesSpace, embARK (art collections) and the library catalog.

By the end of this process we hope to have all our preservation copies transferred and the infrastructure in place to keep digital preservation sustainable at Georgetown.
