A couple of weeks ago, I had the opportunity to attend the Designing Storage Architectures for Digital Collections meeting, or the storage meeting for short.
Why is this annual meeting a highlight for many of my colleagues here at the Library?
Because it zeros in on the critical data storage challenge cultural heritage organizations face in managing digital collections. The meeting is unique in that it brings together the data storage community and the digital preservation community to focus on that challenge. Libraries, archives and museums depend on storage systems to archive and preserve cultural digital content. While our interest in storage media is driven by our organizations’ missions, commercial vendors have strong financial incentives when investing in technology for new storage media. The two don’t always intersect.
This year’s meeting featured panels and presentations from various storage vendors discussing data storage applications, cloud and tiered storage, standards development (or lack thereof), scalability of systems, and durability and longevity of “tape vs. disk.” Vendors were asked to specifically address these points:
- How would you store 5 petabytes (PB), 20 PB and 50 PB of data in a given year between now and 2018, based on where technology is going?
- What is the future market for tape usage?
- What is the future market for what we now call hierarchical storage management?
The discussions following the presentations were engaging and (at times) pointedly direct. The presentations are available here; discussion notes will be available within the next couple of weeks. For those of you looking for the meat of the technical discussions, check back tomorrow for Leslie Johnston’s post.
As someone who doesn’t have a strong technical background, I found the discussions about current and future technologies interesting, albeit highly descriptive and scientific regarding hardware and system components. Lots of acronyms and phrases were thrown around (“REST S3 is the new SCSI” and “HSM appliance”). LTFS (Linear Tape File System) was discussed at length. I searched for these and other terms on my laptop many, many times during the meeting.
The discussions about what the archival community expects from vendors resonated more with me and my work. Aside from technical requirements for data storage, there are preservation strategies and policy decisions that influence how cultural heritage institutions manage their digital collections. Libraries and archives can’t afford to lose any content under their stewardship, but there is also the realistic tension over how much loss is acceptable. So institutions base decisions about their storage media on factors beyond the cost and scalability of systems.
For one, institutions consider how storage systems manage metadata and provenance. Can the system authenticate the data? Are policies on privacy and legal issues built into the system? For another, institutions rely on various levels of file accessibility. What level of effort is needed to retrieve or correct corrupt files? Does the system allow for migration and seamless access? Can data stewards, and not just IT managers, easily use the system?
These, among others, are the factors digital preservation practitioners consider for their storage and management systems. Convening helps both communities identify common areas of interest and possible solutions.