Beyond Us and Them: Designing Storage Architectures for Digital Collections 2014

The following post was authored by Erin Engle, Michelle Gallinger, Butch Lazorchak, Jane Mandelbaum and Trevor Owens from the Library of Congress.

The Library of Congress held the 10th annual Designing Storage Architectures for Digital Collections meeting September 22-23, 2014. This meeting is an annual opportunity for invited technical industry experts, IT  professionals, digital collections and strategic planning staff and digital preservation practitioners to discuss the challenges of digital storage and to help inform decision-making in the future. Participants come from a variety of government agencies, cultural heritage institutions and academic and research organizations.

The DSA Meeting. Photo credit: Peter Krogh/DAM Useful Publishing.

The DSA Meeting. Photo credit: Peter Krogh/DAM Useful Publishing.

Throughout the two days of the meeting the speakers took the participants back in time and then forward again. The meeting kicked-off with a review of the origins of the DSA meeting. It started ten years ago with a gathering of Library of Congress and external experts who discussed requirements for digital storage architectures for the Library’s Packard Campus of the National Audio-Visual Conservation Center. Now, ten years later, the speakers included representatives from Facebook and Amazon Web Services, both of which manage significant amounts of content and neither of which existed in 2004 when the DSA meeting started.

The theme of time passing continued with presentations by strategic technical experts from the storage industry who began with an overview of the capacity and cost trends in storage media over the past years. Two of the storage media being tracked weren’t on anyone’s radar in 2004, but loom large for the future – flash memory and Blu-ray disks. Moving from the past quickly to the future, the experts then offered predictions, with the caveat that predictions beyond a few years are predictably unpredictable in the storage world.

Another facet of time – “back to the future” – came up in a series of discussions on the emergence of object storage in up-and-coming hardware and software products.  With object storage, hardware and software can deal with data objects (like files), rather than physical blocks of data.  This is a concept familiar to those in the digital curation world, and it turns out that it was also familiar to long-time experts in the computer architecture world, because the original design for this was done ten years ago. Here are some of the key meeting presentations on object storage:

Several speakers talked about the impact of the passage of time on existing digital storage collections in their institutions and the need to perform migrations of content from one set of hardware or software to another as time passes.  The lessons of this were made particularly vivid by one speaker’s analogy, which compared the process to the travails of someone trying to manage the physical contents of a car over one’s lifetime.

Even more vivid was the “Cost of Inaction” calculator, which provides black-and-white evidence of the costs of not preserving analog media over time, starting with the undeniable fact that you have to start with an actual date in the future for the “doomsday” when all your analog media will be unreadable.

The DSA Meeting. Photo Credit: Trevor Owens

The DSA Meeting. Photo Credit: Trevor Owens

Several persistent time-related themes engaged the participants in lively interactive discussions during the meeting.  One topic was the practical methods for checking the data integrity of content  in digital collections.  This concept, called fixity, has been a common topic of interest in the digital preservation community. Similarly, a thread of discussion on predicting and dealing with failure and data loss over time touched on a number of interesting concepts, including “anti-entropy,” a type of computer “gossip” protocol designed to query, detect and correct damaged distributed digital files. Participants agreed it would be useful to find a practical approach to identifying and quantifying types of failures.  Are the failures relatively regular but small enough that the content can be reconstructed? Or are the data failures highly irregular but catastrophic in nature?

Another common theme that arose is how to test and predict the lifetime of storage media.  For example, how would one test the lifetime of media projected to last 1000 years without having a time-travel machine available?  Participants agreed to continue the discussions of these themes over the next year with the goal of developing practical requirements for communication with storage and service providers.

The meeting closed with presentations from vendors working on the cutting edge of new archival media technologies.  One speaker dealt with questions about the lifetime of media by serenading the group with accompaniment from a 32-year-old audio CD copy of Pink Floyd’s “Dark Side of the Moon.” The song “Us and Them” underscored how the DSA meeting strives to bridge the boundaries placed between IT conceptions of storage systems and architectures and the practices, perspectives and values of storage and preservation in the cultural heritage sector. The song playing back from three decade old media on a contemporary device was a fitting symbol of the objectives of the meeting.

Background reading (PDF) was circulated prior to the meeting and the meeting agenda and copies of the presentations are available at http://www.digitalpreservation.gov/meetings/storage14.html.

Library to Launch 2015 Class of NDSR

The Library of Congress Office of Strategic Initiatives, in partnership with the Institute of Museum and Library Services, has recently announced the 2015 National Digital Stewardship Residency program, which will be held in the Washington, DC area starting in June 2015. As you may know (NDSR was well represented on the blog last year), this […]

Hybrid Born-Digital and Analog Special Collecting: Megan Halsband on the SPX Comics Collection

Every year, The Small Press Expo in Bethesda, Md brings together a community of alternative comic creators and independent publishers. With a significant history of collecting comics, it made sense for the Library of Congress’ Serial and Government Publications Division and the Prints & Photographs Division to partner with SPX to build a collection documenting […]

Upgrading Image Thumbnails… Or How to Fill a Large Display Without Your Content Team Quitting

The following is a guest post by Chris Adams from the Repository Development Center at the Library of Congress, the technical lead for the World Digital Library. Preservation is usually about maintaining as much information as possible for the future but access requires us to balance factors like image quality against file size and design […]

Duke’s Legacy: Video Game Source Disc Preservation at the Library of Congress

The following is a guest post from David Gibson, a moving image technician in the Library of Congress. He was previously interviewed about the Library of Congress video games collection. The discovery of that which has been lost or previously unattainable is one of the driving forces behind the archival profession and one of the […]

Making Scanned Content Accessible Using Full-text Search and OCR

The following is a guest post by Chris Adams from the Repository Development Center at the Library of Congress, the technical lead for the World Digital Library. We live in an age of cheap bits: scanning objects en masse has never been easier, storage has never been cheaper and large-scale digitization has become routine for […]

Web Archiving and Preserving the Performing Arts in the Digital Age

The following is a guest post from Gavin Frome, an intern for the Web Archiving Team at the Library of Congress. Performing artists are by necessity a traveling people. They journey far and wide in the pursuit of their respective crafts, working, learning and weaving a fabric of loose cultural connections that help bind people […]

Recommended Format Specifications from the Library of Congress: An Interview with Ted Westervelt

Continuing the NDSA Insights interview series, I am thrilled to talk about the new Library of Congress Recommended Format Specifications with Ted Westervelt, head of acquisitions and cataloging for U.S. Serials – Arts, Humanities & Sciences at the Library of Congress. Ted has been overseeing the development of the Recommended Format Specifications. While the specifications […]