Protect Your Data: Storage and Geographic Location

This post is about row one column one, the first box, in the NDSA levels of digital preservation.

The NDSA levels of digital preservation are useful in providing a high-level, at-a-glance overview of tiered guidance for planning for digital preservation. One of the most common requests received by the NDSA group working on this is that we provide more in-depth information on the issues discussed in each cell.

To that end, we are excited to start a new series of posts, set up to help you and your organization think through how to go about working your way through the cells on each level.

There are 20 cells across the four levels, so there is much to discuss. We intend to work our way through each cell while expounding on the issues inherent in that level. We will define some terms, identify key considerations and point to some secondary resources. If you want an overall explanation of the levels, take a look at The NDSA Levels of Digital Preservation: An Explanation and Uses.

Let’s start with row one cell one, Protect Your Data: Storage and Geographic Location.

The Two Requirements of Row One Column One

There are only two requirements in the first cell, but there is actually a good bit of practical logic tucked away inside the reasoning for those two requirements.

Two complete copies that are not collocated

For starters, you want to have more than one copy, and you want to keep those copies in different places. The difference between having a single point of failure and two points of failure is huge. For someone working at a small house museum that has a set of digital recordings of oral history interviews, this might be as simple as making a second copy of all of the recordings on an external hard drive, taking that drive home and tucking it away somewhere. If you only have one copy, you are one spilt cup of coffee, one dropped drive, or one massive power surge or fire away from having no copies. While you could meet this requirement literally by making any type of copy of your data and taking it home, it will become clear that this alone is not a tenable way to move further up the levels in the long run. The point of the levels is to start somewhere and make progress.

With this said, it’s important to note that not all storage media are created equal. The difference in error rates between something like a flash drive on your key chain and an enterprise hard disk or tape is gigantic. So gigantic, in fact, that on error rate alone you would likely be better off with one copy on a far better quality piece of media than with two copies on something like two cheap flash drives. Remember, though, that the hard error rate of the storage devices is not the only factor you should be worried about. In many cases, human error is likely to be the biggest factor that would result in data loss, particularly when you have a small (or no) system in place.

“Complete” copies are an important factor here. Defining “completeness” is something worth thinking through. For example, a “complete copy” may be defined in terms of the integrity of the digital file or files that make up your source and your target. At the most basic level, when you make copies you want to do a quick check to make sure that the file size or sizes in the copy are the same as the size of the original files. Ideally, you would run a fixity check, comparing, for instance, the MD5 hash value of each file in the first copy with the MD5 hash value of the corresponding file in the second copy. The important point here is that “trying” to make a copy is not the same thing as actually having succeeded in making a copy. You are going to want to do at least a spot check to make sure that you really have created an accurate copy.
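The size check and fixity check described above can be sketched in a few lines of Python. This is only an illustration, not a recommended tool: the function names (md5_of, compare_copies) and the directory layout are assumptions made for this example, and it presumes both copies are reachable as ordinary folders at the same time.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, reading in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_copies(original_dir, copy_dir):
    """Return (relative_path, problem) pairs for files that fail the check.

    Hypothetical helper: walks every file under original_dir and looks for
    the same relative path under copy_dir.
    """
    original_dir, copy_dir = Path(original_dir), Path(copy_dir)
    problems = []
    for source in sorted(original_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(original_dir)
        target = copy_dir / relative
        if not target.is_file():
            problems.append((relative.as_posix(), "missing from copy"))
        elif source.stat().st_size != target.stat().st_size:
            # Cheap first check: a size mismatch already proves the copy is bad.
            problems.append((relative.as_posix(), "size differs"))
        elif md5_of(source) != md5_of(target):
            # Fixity check: same size, but the contents are not identical.
            problems.append((relative.as_posix(), "MD5 differs"))
    return problems
```

An empty list from compare_copies means every file in the original was found in the copy with the same size and MD5 value. Note that this sketch does not report extra files that exist only in the copy.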

For data on heterogeneous media (optical discs, hard drives, etc.) get the content off the media and into your storage system

A recording artist ships a box full of CDs and hard disks to their label for production of their next release. A famous writer offers an archive her personal papers and includes two of her old laptops, a handful of 5.25 inch floppies, and a few keychain quality flash drives. An organization’s records management division is given a crate full of rewritable CDs from the accounting department. In each of these cases, a set of heterogeneous digital media has ended up on the doorstep of a steward, often with little or no preliminary communication. Getting the bits off that media is a critical first step. None of these storage media is intended for the long term; in many cases things like flash drives and rewritable CDs are not intended to function, even in optimal conditions, for more than a few years.

So, get the bits off their original media. But where exactly are you supposed to put them? The requirement in this cell suggests you should put them in your “storage system.” But what exactly is that supposed to mean? It’s intentionally vague in this chart in order to account for different types of organizations, resource levels and overall departmental goals. With that said, the general idea is that you want to focus on good quality media (designed for longer rather than shorter life), for example “enterprise quality” spinning disk or magnetic tape (or some combination of the two), and a way of managing what you have. For the first cell here, the focus is on the quality of the media. However, as requirements move further along, it is going to become increasingly important to be able to check and validate your data. Thus, easy ways to manage the data on all of your copies become a critical component of your storage strategy. For example, a library of “good” quality CDs could serve as a kind of storage system. However, managing all of those individual pieces of media would itself become a threat to maintaining access to that content. In addition, when you inevitably need to migrate forward to future media, the need to individually transfer everything off of that collection of CDs would become a significant bottleneck. In short, the design and architecture of your storage system is a whole other problem space, one not really directly covered by the NDSA Levels of Digital Preservation.
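One modest way to start “managing what you have” is to keep an inventory of the files in a copy, so that the copy can be checked and validated later. The Python sketch below is one possible approach under stated assumptions, not something the NDSA levels prescribe: the CSV layout (path, bytes, md5) and the function names write_manifest and verify_manifest are invented for this example.

```python
import csv
import hashlib
from pathlib import Path


def file_md5(path):
    """Return the MD5 hex digest of a file, reading one megabyte at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(storage_dir, manifest_path):
    """Record relative path, size in bytes, and MD5 for every file under storage_dir.

    Returns the number of files listed. Keep the manifest outside storage_dir
    so that it does not end up listing itself on a later run.
    """
    storage_dir = Path(storage_dir)
    rows = []
    for item in sorted(storage_dir.rglob("*")):
        if item.is_file():
            relative = item.relative_to(storage_dir).as_posix()
            rows.append([relative, item.stat().st_size, file_md5(item)])
    with open(manifest_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["path", "bytes", "md5"])
        writer.writerows(rows)
    return len(rows)


def verify_manifest(storage_dir, manifest_path):
    """Return the manifest paths that are missing or whose MD5 no longer matches."""
    storage_dir = Path(storage_dir)
    failures = []
    with open(manifest_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            target = storage_dir / row["path"]
            if not target.is_file() or file_md5(target) != row["md5"]:
                failures.append(row["path"])
    return failures
```

Writing the manifest once when content arrives, and running verify_manifest against each copy from time to time, gives a single list to check instead of a shelf of individual discs to inspect by hand.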

Related Resources

You’ve Got to Walk Before You Can Run: First Steps for Managing Born-Digital Content, Ricky Erway, 2012

The NDSA Levels of Digital Preservation: An Explanation and Uses, Megan Phillips, Jefferson Bailey, Andrea Goethals, Trevor Owens

How Long Will Digital Storage Media Last?, Personal Digital Archiving Series from The Library of Congress
