The Library of Congress hosts a small annual meeting on preservation storage that brings vendors and the preservation community together to share points of view. The 2012 Designing Storage Architectures meeting was held on September 20-21 and, as usual, it was enlightening and exciting.
Two forms of large-scale storage hold the largest share of the market: spinning disk and tape. Tape is often erroneously referred to as “archival storage,” when what storage architects really mean is a tape archive, a large tape storage array. It is better to think of tape storage as “long-term” storage. But even that is a misnomer, because tape requires media migration just like spinning disk.
The discussions of tape storage were at times reminiscent of a verbal Doomsday cage match, or maybe a seminar on the technological implications of the philosophy of Friedrich Nietzsche.
“Tape is dead.” “Tape is not dead.” “Why is tape dead?” “We killed tape and it remains dead.” “With very large collections no one can afford not to use tape–it’s so not dead.”
The first three statements could be argued for years. The last is the crux of the matter. When you have a digital collection that is north of 5 petabytes in size and you want, as a preservation strategy, multiple copies on different media in geographically distributed locations, tape is still cost effective.
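As a rough illustration of why tape stays in the picture at that scale, here is a back-of-envelope cost comparison. The per-terabyte prices are purely hypothetical placeholders for the sake of the arithmetic, not figures from the meeting; the point is only that media cost gets multiplied by collection size and copy count.

```python
# Hypothetical per-TB raw media costs (placeholders, not real prices).
DISK_COST_PER_TB = 100.0
TAPE_COST_PER_TB = 30.0

def media_cost(petabytes: float, copies: int, cost_per_tb: float) -> float:
    """Raw media cost for N copies of a collection, ignoring power, floor space, etc."""
    return petabytes * 1000 * copies * cost_per_tb

# A 5 PB collection with three geographically distributed copies.
disk = media_cost(5, 3, DISK_COST_PER_TB)
tape = media_cost(5, 3, TAPE_COST_PER_TB)
print(f"disk: ${disk:,.0f}  tape: ${tape:,.0f}")
# → disk: $1,500,000  tape: $450,000
```

Whatever the real prices in any given year, the multiplication by copies and petabytes is what makes the media cost gap matter at preservation scale.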
While the storage technology community is not all gloom and doom, there were some cautionary tales from technology manufacturers related to performance. More processor cores are needed for many preservation activities, such as fixity checking or format migration, and memory performance has not scaled along with CPU performance; these are hard limits imposed by the laws of physics. Media reliability is stable, but not necessarily improving. Storage densities are increasing, but there is a cost associated with migrating to denser media, which can take years depending on the scale of the collection. And the question of scale, for both operating systems and file systems, is critical when an organization has billions of files.
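Fixity checking, one of the CPU-hungry preservation activities mentioned above, amounts to recomputing a checksum and comparing it to one recorded at ingest. A minimal sketch, with a hypothetical file and manifest value standing in for a real repository:

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in 1 MB chunks so memory use stays flat for large files."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

def check_fixity(path: Path, expected: str) -> bool:
    """Recompute the digest and compare it to the one recorded at ingest."""
    return sha256_of(path) == expected

# Hypothetical usage: record a digest at ingest time, audit it later.
with tempfile.TemporaryDirectory() as d:
    p = Path(d) / "example.bin"
    p.write_bytes(b"preservation payload")
    recorded = sha256_of(p)                # stored in a manifest at ingest
    unchanged = check_fixity(p, recorded)  # True if the bytes survived intact
```

Multiply this hashing work across billions of files and the appetite for more cores, and for faster memory to feed them, becomes concrete.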
There are interesting storage technologies that provide potential areas of improvement. Solid state drives (SSDs) have no moving mechanical parts (which makes them quieter and less susceptible to physical shock) and have faster access times, but are more expensive than either traditional spinning disk or tape.
The Library of Congress is beginning to introduce SSDs into our tiered storage architecture where high-speed data access is needed. If you are interested in what the Library is doing, look at the presentations by Carl Watts and Scott Rife.
I mentioned tiered storage architecture in passing, and it seemed to be accepted by many or most in the room as the most common practice at large scale. The management of large-scale tiered storage architectures was another verging-on-cage-match-smackdown discussion. The methods of implementing this Hierarchical Storage Management (HSM) model are myriad. What are the industry standards for HSM? Are new standards developing? Are these software or appliance implementations, or both? Where do preservation standards fit into these HSM implementations? How is data integrity ensured across SSDs, spinning disk, tape and the cloud?
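The core idea behind any HSM implementation can be sketched as a policy that routes data to a tier based on how recently it was touched. This is a toy illustration with made-up tier names and thresholds, not a description of any particular product discussed at the meeting:

```python
from dataclasses import dataclass

@dataclass
class FileRecord:
    path: str
    days_since_access: int

def choose_tier(record: FileRecord) -> str:
    """Route hot data to SSD, warm data to disk, cold data to tape.

    The 7- and 90-day thresholds are hypothetical; real HSM policies
    weigh access patterns, file size, cost and service-level targets.
    """
    if record.days_since_access <= 7:
        return "ssd"
    if record.days_since_access <= 90:
        return "disk"
    return "tape"

files = [FileRecord("a.tif", 2), FileRecord("b.tif", 30), FileRecord("c.tif", 400)]
placement = {f.path: choose_tier(f) for f in files}
# → {'a.tif': 'ssd', 'b.tif': 'disk', 'c.tif': 'tape'}
```

The hard questions raised in the room, standards, integrity across tiers, software versus appliance, all live in how policies like this are expressed, enforced and verified across vendors.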
The meeting included detailed presentations of storage architectures in place for large-scale preservation initiatives, such as HathiTrust, Chronopolis, and the emerging Digital Preservation Network for academic institutions. The National Digital Stewardship Alliance presented the results of its survey on preservation storage.
All the presentations are available on the web site, and I strongly recommend reviewing them for insight into where our community is heading and what it needs.