The following post was authored by Erin Engle, Michelle Gallinger, Butch Lazorchak, Jane Mandelbaum and Trevor Owens from the Library of Congress.
The Library of Congress held the 10th annual Designing Storage Architectures for Digital Collections meeting September 22-23, 2014. This meeting is an annual opportunity for invited technical industry experts, IT professionals, digital collections and strategic planning staff, and digital preservation practitioners to discuss the challenges of digital storage and to help inform future decision-making. Participants come from a variety of government agencies, cultural heritage institutions and academic and research organizations.
Throughout the two days of the meeting the speakers took the participants back in time and then forward again. The meeting kicked off with a review of the origins of the DSA meeting. It started ten years ago with a gathering of Library of Congress staff and external experts who discussed requirements for digital storage architectures for the Library’s Packard Campus of the National Audio-Visual Conservation Center. Now, ten years later, the speakers included representatives from Facebook and Amazon Web Services, both of which manage significant amounts of content and neither of which existed in 2004 when the DSA meeting started.
The theme of time passing continued with presentations by strategic technical experts from the storage industry, who began with an overview of capacity and cost trends in storage media in recent years. Two of the storage media being tracked weren’t on anyone’s radar in 2004 but loom large for the future – flash memory and Blu-ray discs. Moving quickly from the past to the future, the experts then offered predictions, with the caveat that predictions beyond a few years are predictably unpredictable in the storage world.
Another facet of time – “back to the future” – came up in a series of discussions on the emergence of object storage in up-and-coming hardware and software products. With object storage, hardware and software deal in whole data objects (like files), addressed by name or key, rather than in physical blocks of data. This is a concept familiar to those in the digital curation world, and it turns out that it was also familiar to long-time experts in the computer architecture world, because the original designs for object storage date back roughly ten years (a minimal sketch of the object-versus-block distinction follows the list below). Here are some of the key meeting presentations on object storage:
- Henry Newman – Instrumental, Inc., “Object Storage Developments” (PDF)
- David Anderson – Seagate, “Introduction to Kinetic Key Value Storage” (PDF)
- Sage Weil – Ceph, “Digital Preservation with Open Source” (PDF)
- Chris MacGown – Piston Cloud, “Object Storage: An update in brief” (PDF)
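To make the object-versus-block distinction concrete, here is a minimal sketch in Python. It is purely illustrative – the class names and methods are hypothetical, not any vendor’s API from the presentations above – but it shows the basic shift from addressing fixed-size blocks by number to addressing whole objects by key, with metadata attached.

```python
class BlockDevice:
    """Block storage: fixed-size blocks addressed by number."""
    def __init__(self, block_size=4096, num_blocks=1024):
        self.block_size = block_size
        self.blocks = [b"\x00" * block_size for _ in range(num_blocks)]

    def write_block(self, index, data):
        # The caller must track which blocks belong to which file.
        assert len(data) == self.block_size
        self.blocks[index] = data

    def read_block(self, index):
        return self.blocks[index]


class ObjectStore:
    """Object storage: whole objects addressed by key, with metadata."""
    def __init__(self):
        self._objects = {}

    def put(self, key, data, metadata=None):
        self._objects[key] = (bytes(data), dict(metadata or {}))

    def get(self, key):
        data, metadata = self._objects[key]
        return data, metadata


# The storage system, not the caller, manages physical placement;
# the caller thinks in terms of named objects and their metadata.
store = ObjectStore()
store.put("master/tape-0042.wav", b"...audio bytes...",
          {"checksum": "sha256:...", "format": "WAV"})
data, meta = store.get("master/tape-0042.wav")
```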
Several speakers talked about the impact of the passage of time on existing digital storage collections in their institutions and the need to migrate content from one set of hardware or software to another as time passes. These lessons were made particularly vivid by one speaker’s analogy, which compared the process to the travails of managing the physical contents of a car over a lifetime.
Even more vivid was the “Cost of Inaction” calculator, which puts the costs of not preserving analog media over time in black and white, beginning with the sobering requirement that you pick an actual future date – the “doomsday” when all your analog media will be unreadable.
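The post does not describe the calculator’s internal model, but as a purely hypothetical illustration of the idea, a sketch like the following shows how a fixed doomsday date can drive a loss estimate (the function, its parameters and the linear-loss assumption here are all invented for illustration):

```python
from datetime import date

def cost_of_inaction(items, cost_per_item, doomsday, today=None):
    """Hypothetical estimate: assume items become unreadable at a
    constant rate between today and the doomsday date, and return
    the estimated value lost per year of inaction."""
    today = today or date.today()
    years_left = max((doomsday - today).days / 365.25, 0)
    if years_left == 0:
        return items * cost_per_item  # everything is already at risk
    items_lost_per_year = items / years_left
    return items_lost_per_year * cost_per_item

# e.g. 10,000 tapes at $50 each to replace or digitize,
# all assumed unreadable by 2040:
print(cost_of_inaction(10_000, 50.0, date(2040, 1, 1)))
```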
Several persistent time-related themes engaged the participants in lively interactive discussions during the meeting. One topic was practical methods for checking the data integrity of content in digital collections. This concept, called fixity, has been a common topic of interest in the digital preservation community. Similarly, a thread of discussion on predicting and dealing with failure and data loss over time touched on a number of interesting concepts, including “anti-entropy,” a type of computer “gossip” protocol designed to compare replicas of distributed digital files and to detect and repair damage. Participants agreed it would be useful to find a practical approach to identifying and quantifying types of failures. Are the failures relatively regular but small enough that the content can be reconstructed? Or are the data failures highly irregular but catastrophic in nature?
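In practice, fixity checking usually means recording a cryptographic digest for each file at ingest and periodically recomputing it. Here is a minimal sketch in Python; the manifest format is illustrative, not a particular standard such as BagIt:

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest of a file, reading in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def check_fixity(manifest):
    """manifest: dict mapping file path -> expected hex digest.
    Returns a list of (path, expected, actual) for any mismatches."""
    failures = []
    for path, expected in manifest.items():
        actual = sha256_of(path)
        if actual != expected:
            failures.append((path, expected, actual))
    return failures

# Usage: record digests at ingest, re-run this check on a schedule,
# and investigate any files whose digests no longer match.
```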
Another common theme that arose was how to test and predict the lifetime of storage media. For example, how would one test the lifetime of media projected to last 1,000 years without a time machine? Participants agreed to continue discussing these themes over the next year, with the goal of developing practical requirements for communicating with storage and service providers.
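One standard laboratory answer to that question, not detailed in the post, is accelerated aging: stress media at elevated temperature and extrapolate back to storage conditions, typically with the Arrhenius model. A sketch, with illustrative parameter values:

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant, eV/K

def acceleration_factor(e_a_ev, t_use_c, t_stress_c):
    """How much faster failures occur at the stress temperature
    than at the use temperature, per the Arrhenius model."""
    t_use = t_use_c + 273.15      # convert Celsius to Kelvin
    t_stress = t_stress_c + 273.15
    return math.exp((e_a_ev / K_BOLTZMANN_EV) * (1 / t_use - 1 / t_stress))

# e.g. an assumed activation energy of 0.9 eV, storage at 20 C,
# oven testing at 85 C:
af = acceleration_factor(0.9, 20.0, 85.0)
print(f"Acceleration factor: {af:.0f}x")
# A 1000-year claim would correspond to roughly 1000/af years in the oven.
```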
The meeting closed with presentations from vendors working on the cutting edge of new archival media technologies. One speaker dealt with questions about the lifetime of media by serenading the group with accompaniment from a 32-year-old audio CD copy of Pink Floyd’s “Dark Side of the Moon.” The song “Us and Them” underscored how the DSA meeting strives to bridge the boundaries between IT conceptions of storage systems and architectures and the practices, perspectives and values of storage and preservation in the cultural heritage sector. The song playing back from three-decade-old media on a contemporary device was a fitting symbol of the objectives of the meeting.
Background reading (PDF) was circulated prior to the meeting, and the meeting agenda and copies of the presentations are available at http://www.digitalpreservation.gov/meetings/storage14.html.