Prognosticating Digital Preservation Infrastructure: Final Results from the NDSA Storage Survey

The following is a guest post by Jefferson Bailey, Fellow at the Library of Congress’s Office of Strategic Initiatives.

Over the last few months, we have been reporting results from the storage survey conducted by the NDSA Infrastructure Working Group, one of the five working groups of the National Digital Stewardship Alliance. See the note at bottom for details on the survey. Previous posts have discussed the role of access requirements, cloud and distributed storage, and fixity checking in building and maintaining digital preservation infrastructure. In this post, our last before we release a summary report of the survey results, we examine some of the findings of the survey related to strategic planning for digital preservation storage infrastructure.

Before we examine those topics, however, there are a number of survey questions that didn’t fit thematically in earlier posts but will be of interest to readers – specifically, the storage media currently being used by survey respondents, how many preservation copies of digital assets institutions are keeping, and the number of members that have documented requirements for storage systems.

Number of Copies, Storage Media, and Documented Requirements

The chart below shows the number of preservation copies institutions are keeping, with 45% (25 of 56) keeping 3 or more copies of their digital assets.

As far as storage media being used for preservation storage, members are using the following, with some members using multiple kinds of media:

  • 62% (36 of 58) of members are using spinning disk – local or network-attached storage
  • 55% (32 of 58) of members are using spinning disk – storage area network
  • 53% (31 of 58) of members are using magnetic tape
  • 7% (4 of 58) of members are using optical disks
  • 5% (3 of 58) of members are using cloud storage
  • 3% (2 of 58) of members are using non-networked drives
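
Because respondents could select more than one kind of media, the percentages in the list above add up to well over 100%. The following minimal Python sketch recomputes each share from the counts reported above and totals the selections; the variable and function names are illustrative and are not part of the survey.

```python
# Counts copied from the list above; 58 organizations answered this question.
RESPONDENTS = 58
media_counts = {
    "spinning disk (local or network-attached storage)": 36,
    "spinning disk (storage area network)": 32,
    "magnetic tape": 31,
    "optical disks": 4,
    "cloud storage": 3,
    "non-networked drives": 2,
}


def percent(count, total):
    """Share of respondents, rounded to the nearest whole percent."""
    return round(100 * count / total)


shares = {medium: percent(n, RESPONDENTS) for medium, n in media_counts.items()}

# Multi-select question: total selections can exceed the number of respondents.
total_selections = sum(media_counts.values())
media_per_respondent = total_selections / RESPONDENTS

for medium, share in shares.items():
    print(f"{share:>3}%  {medium}")
print(f"{total_selections} selections from {RESPONDENTS} respondents "
      f"(about {media_per_respondent:.1f} media types each)")
```

On these counts, the 58 respondents made 108 selections in total, or a little under two kinds of media per organization on average.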

The question “does your organization have specific documented requirements for your storage systems” elicited a wide range of responses across the different types of requirements. Forty-nine of the fifty-eight organizations that responded to the survey reported currently having some form of requirements, or planning to develop requirements in the next year.

  • 43% (21 of 49) have documented functional requirements
  • 37% (18 of 49) have documented security requirements
  • 35% (17 of 49) have documented general performance requirements
  • 29% (14 of 49) plan to develop requirements within one year
  • 18% (9 of 49) have other documented requirements
  • 16% (8 of 49) have documented performance requirements for ingest
  • 12% (6 of 49) have documented performance requirements for migration to new technology or other one-time intensive operations

For the 18% claiming “other documented requirements,” these were most often client-specific or content-specific requirements.

Storage Usage and Expectations

One fundamental consideration when planning digital preservation infrastructure needs is the amount of storage space required. The survey queried participants both on the amount of storage space they were currently using for all copies of their digital content and the amount they expect to need three years from now:

Charting out these numbers shows the expected growth of storage needs in the next three years, especially in the upper ranges of storage amount. The chart shows many of the member organizations moving out of the less than ten terabytes category and moving up into the bigger brackets. Notably, the 1000+ TB (1 PB) category is likely to see the largest increase, more than doubling from 5 members to 11.

When averaged out between the two questions, the disparity in the amount of storage currently used and expected to be needed in three years becomes even more apparent. Current usage averaged out to 492 TB per institution whereas anticipated need in three years more than doubled, averaging out to 1107 TB per institution.
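
As a rough illustration of what that jump implies, the sketch below computes the overall growth factor from the two averages quoted above and the equivalent yearly growth rate. The steady compound-growth model is an assumption made here for illustration only; the survey did not ask about year-by-year growth.

```python
# Averages quoted in the paragraph above (terabytes per institution).
current_avg_tb = 492
expected_avg_tb = 1107
years = 3

# Overall growth across the three-year window.
growth_factor = expected_avg_tb / current_avg_tb

# ASSUMPTION: growth compounds at a constant annual rate. The survey reports
# only the two endpoints, so this is one possible reading, not a finding.
annual_rate = growth_factor ** (1 / years) - 1

print(f"Growth factor over {years} years: {growth_factor:.2f}x")
print(f"Implied compound annual growth: {annual_rate:.1%}")
```

Under that assumption, the averages correspond to a 2.25-fold increase, or roughly 31% growth per year.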

Predicting Future Storage Needs

A number of the survey questions asked members to estimate other aspects of digital preservation storage needs three years from now. While cost modeling for digital preservation has been getting increased research scrutiny lately (here are links to three recent resources), the Storage Survey polled members on issues of strategic planning and administration of infrastructure including expectations on technology changes, available resources, organizational plans, and audit and certification as a trustworthy repository.

The speed of technological change and its impact on digital preservation is nowhere more evident than in the fact that 64% (37 of 58) of respondents agree or strongly agree that their organization plans to make significant changes in technologies in their preservation storage architecture within the next three years. At the same time, survey participants remained confident of their ability to meet these challenges, with 83% (48 of 58) agreeing or strongly agreeing their institution will have adequate resources to meet projected preservation storage requirements over the next three years.

As is evident in the table, the statistics on expectations of adequate resources and proper organizational planning are very similar. The positivity reflected in these numbers is a good sign for the future of digital preservation. Another positive result revolved around expectations for meeting the requirements for the recently approved ISO Standard 16363, better known to many as TRAC or Trustworthy Repositories Audit & Certification (PDF). The fact that over half of survey respondents (60%) plan on complying with the rigorous TRAC standards within three years signals an increased acknowledgement of the importance of these requirements in certifying digital preservation repository standards.

Conclusion

Predicting the future is more an art than a science. The storage survey revealed an inherent optimism in addressing future digital preservation storage infrastructure issues even as anticipated storage needs rise dramatically and technology changes often. The results also revealed the complexity of digital preservation storage planning, especially given the large number of preservation copies being maintained and the diversity of media used and requirements documented. One question not included in the survey, but that should be a part of all future storage surveys, is whether institutions plan to maintain the same number of file copies into the future or whether redundancy policies are flexible in response to infrastructure limitations or forecasting. As the size of each digital item increases, and as size-intensive formats like audio and video become a larger percentage of preserved collections, keeping multiple copies will have an increased impact on storage capacity needs. Other potential areas of investigation for a future survey could revolve around the role that formats, compression, and means of access play in determining storage infrastructure. As institutions plan for their future storage needs, the knowledge sharing and collaboration of the NDSA will offer guidance as they make digital preservation infrastructure decisions.

Note on survey data

The NDSA Infrastructure Survey, conducted between August 2011 and November 2011, received responses from 58 members of the 74 NDSA member organizations who are preserving digital content. This represents a 78% response rate. The goal of this survey was to get a snapshot of current storage practices within the organizations of the National Digital Stewardship Alliance. The original survey was sent out to the then 98 members of the NDSA (current membership is 116 institutions). We confirmed that 24 members do not have content they are actively involved in preserving. These organizations include consortia groups, professional organizations, university departments, funders and vendors. There were 16 organizations that neither responded to the survey nor indicated that they were not preserving digital collections. The 16 non-respondents are distributed across the different kinds of organizations in the NDSA, including state archives, service providers, federal agencies, universities and media producers.
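
The figures in this note fit together as simple arithmetic, which the short Python sketch below retraces using only the counts given above. The variable names are illustrative.

```python
# Counts taken from the note above.
members_surveyed = 98   # NDSA members when the survey was sent
not_preserving = 24     # confirmed not to be actively preserving content
respondents = 58        # organizations that answered the survey

# Members who are preserving digital content (the survey's target population).
eligible = members_surveyed - not_preserving

# Eligible members who neither answered nor said they do not preserve content.
non_respondents = eligible - respondents

# Response rate among eligible members, rounded to a whole percent.
response_rate = round(100 * respondents / eligible)

print(f"Eligible members: {eligible}")
print(f"Non-respondents: {non_respondents}")
print(f"Response rate: {response_rate}%")
```

This reproduces the 74 eligible members, 16 non-respondents, and 78% response rate stated in the note.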

 

3 Comments

  1. Ed Summers
    August 3, 2012 at 12:59 pm

    Thanks for these blog posts about the survey. Are you planning on releasing the survey data in some form?

    One thing that struck me about the survey results is the apparent lack of requirements for the cost of storage, at least in the types of requirements that responders identified.

    There also seemed to be a bit of a disconnect between the anticipated needs for storage, current storage capacity, and the needs for digital preservation. Maybe I did the math wrong, but it looks like 83% of institutions feel they have adequate resources for their storage needs, and 65% of institutions have less than 110 TB of storage currently. Yet according to the Blue Ribbon Taskforce Report on Sustainable Digital Preservation and Access there are an estimated 2,500,000 petabytes of data in the world that need preservation.

    I realize that NDSA institutions aren’t responsible for archiving all the world’s data. But adding up the numbers from the Storage Usage table above it appears that NDSA institutions collectively only have capacity to store .0008% of the extant data in need of preservation, and only account for .02% of the existing storage that is available.

    Maybe I’m just looking at the glass half full, but it seems that NDSA institutions might be overly optimistic about their abilities to preserve the ever increasing amount of digital content being generated, aren’t planning on doing much archiving, or are perhaps naive about the magnitude of the digital deluge they are facing.

    In my opinion the concern that David Rosenthal voiced at the recent Digital Preservation 2012 meeting is right: we really need to have more financial support (grants, etc) for exploring cost-effective storage solutions so that we can meet this challenge with eyes wide open.

  2. Trevor
    August 3, 2012 at 2:35 pm

    I think a key point here is that the organizations feel optimistic about doing what they need to do to meet their organization’s current and future storage needs. To your point, “There also seemed to be a bit of a disconnect between the anticipated needs for storage, current storage capacity, and the needs for digital preservation.” I don’t think there is a disconnect here as much as these are very different things.

    Part of this is that organizations set the marks for what they can do with the resources they have not what they would like to have. As I see it, only taking on what you have the ability to handle is a critical component of good stewardship.

    I’m with you about the important message David had at the meeting. There are good reasons to be concerned about the deluge of data and the need for systems that can work with that kind of data. At the same time, I think the need to focus on making the best use of resources at hand by being selective in deciding on what to collect and making sure that you only take in what you can manage is also critical.

  3. Glen McAninch
    September 10, 2012 at 8:45 am

    I believe that a question about use of commercial cloud storage services would also be in order along with the cost factors mentioned above.
