The following is a guest post by Jefferson Bailey, Fellow at the Library of Congress’s Office of Strategic Initiatives.
Over the last few months, we have been reporting results from the storage survey conducted by the NDSA Infrastructure Working Group, one of the five working groups of the National Digital Stewardship Alliance. See the note at bottom for details on the survey. Previous posts have discussed the role of access requirements, cloud and distributed storage, and fixity checking in building and maintaining digital preservation infrastructure. In this post, our last before we release a summary report of the survey results, we examine some of the findings of the survey related to strategic planning for digital preservation storage infrastructure.
Before we examine those topics, however, there are a number of survey questions that didn’t fit thematically in earlier posts but will be of interest to readers: specifically, the storage media currently being used by survey respondents, how many preservation copies of digital assets institutions are keeping, and the number of members that have documented requirements for storage systems.
Number of Copies, Storage Media, and Documented Requirements
The chart below shows the number of preservation copies institutions are keeping, with 45% (25 of 56) keeping 3 or more copies of their digital assets.
Members also reported the storage media they currently use:
- 62% (36 of 58) of members are using spinning disk – locally or network attached storage
- 55% (32 of 58) of members are using spinning disk – storage area network
- 53% (31 of 58) of members are using magnetic tape
- 7% (4 of 58) of members are using optical disks
- 5% (3 of 58) of members are using cloud storage
- 3% (2 of 58) of members are using non-networked drives
The question “does your organization have specific documented requirements for your storage systems” elicited a wide range of responses across the different types of requirements. Forty-nine of the fifty-eight organizations that responded to the survey reported currently having some form of requirements, or planning to develop requirements in the next year.
- 43% (21 of 49) have documented functional requirements
- 37% (18 of 49) have documented security requirements
- 35% (17 of 49) have documented general performance requirements
- 29% (14 of 49) plan to develop requirements within one year
- 18% (9 of 49) have other documented requirements
- 16% (8 of 49) have documented performance requirements for ingest
- 12% (6 of 49) have documented performance requirements for migration to new technology or other one-time intensive operations
Among the 18% reporting “other documented requirements,” these were most often client-specific or content-specific requirements.
Storage Usage and Expectations
One fundamental consideration when planning digital preservation infrastructure needs is the amount of storage space required. The survey queried participants both on the amount of storage space they were currently using for all copies of their digital content and the amount they expect to need three years from now:
Charting out these numbers shows the expected growth of storage needs in the next three years, especially in the upper ranges of storage amount. The chart shows many of the member organizations moving out of the less than ten terabytes category and up into the bigger brackets. Notably, the 1000+ TB (1 PB) category is likely to see the largest increase, more than doubling from 5 members to 11.
When averaged out between the two questions, the disparity in the amount of storage currently used and expected to be needed in three years becomes even more apparent. Current usage averaged out to 492 TB per institution whereas anticipated need in three years more than doubled, averaging out to 1107 TB per institution.
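The two averages above can be turned into an overall growth factor and, if one assumes steady compounding, a yearly growth rate. The survey does not state a growth model, so the compounding assumption and the function name below are illustrative only. A minimal Python sketch:

```python
def implied_growth(current_tb: float, future_tb: float, years: int) -> tuple[float, float]:
    """Return (overall growth factor, compound annual growth rate)."""
    factor = future_tb / current_tb
    # Assumption: growth compounds evenly each year; the survey does not say this.
    annual_rate = factor ** (1 / years) - 1
    return factor, annual_rate


# Averages reported in the survey: 492 TB now, 1107 TB expected in three years.
factor, annual_rate = implied_growth(492, 1107, 3)
print(f"growth factor: {factor:.2f}x")
print(f"implied annual growth: {annual_rate:.0%}")
```

With the survey's figures this works out to a factor of 2.25 over three years, or roughly 31% per year under the compounding assumption.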
Predicting Future Storage Needs
A number of the survey questions asked members to estimate other aspects of digital preservation storage needs three years from now. While cost modeling for digital preservation has been getting increased research scrutiny lately (here are links to three recent resources), the Storage Survey polled members on issues of strategic planning and administration of infrastructure including expectations on technology changes, available resources, organizational plans, and audit and certification as a trustworthy repository.
The speed of technological change and its impact on digital preservation is nowhere more evident than in the fact that 64% (37 of 58) of respondents agree or strongly agree that their organization plans to make significant changes in technologies in their preservation storage architecture within the next three years. At the same time, survey participants remained confident of their ability to meet these challenges, with 83% (48 of 58) agreeing or strongly agreeing their institution will have adequate resources to meet projected preservation storage requirements over the next three years.
As is evident in the table, the statistics on adequate resources expectations and proper organizational planning are very similar. The positivity reflected in these numbers is a good sign for the future of digital preservation. Another positive result revolved around expectations for meeting the requirements for the recently approved ISO Standard 16363, better known to many as TRAC or Trustworthy Repositories Audit & Certification (PDF). The fact that over half of survey respondents (60%) plan on complying with the rigorous TRAC standards within three years signals an increased acknowledgement of the importance of these requirements in certifying digital preservation repository standards.
Predicting the future is more an art than a science. The storage survey revealed an inherent optimism in addressing future digital preservation storage infrastructure issues even as anticipated storage needs rise dramatically and technology changes often. The results also revealed the complexity of digital preservation storage planning, especially given the large number of preservation copies being maintained and the diversity of media used and requirements documented. One question not included in the survey, but that should be a part of all future storage surveys, is whether institutions plan to maintain the same number of file copies into the future or whether redundancy policies are flexible in response to infrastructure limitations or forecasting. As the size of each digital item increases, and as size-intensive formats like audio and video become a larger percentage of preserved collections, keeping multiple copies will have an increased impact on storage capacity needs. Other potential areas of investigation for a future survey could revolve around the fickle role that formats, compression, and means of access play in determining storage infrastructure. As institutions plan for their future storage needs, the knowledge sharing and collaboration of the NDSA will offer guidance as they make digital preservation infrastructure decisions.
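To make the point about redundancy concrete, the sketch below shows how the number of preservation copies multiplies raw capacity needs. The 100 TB collection size and the function name are hypothetical and not drawn from the survey; the model also ignores compression, deduplication, and overhead, which real planning would need to consider.

```python
def raw_capacity_tb(unique_tb: float, copies: int) -> float:
    """Total storage needed when every item is kept `copies` times."""
    return unique_tb * copies


# Hypothetical collection of 100 TB of unique content (not a survey figure).
for copies in (1, 2, 3, 4):
    print(f"{copies} copies -> {raw_capacity_tb(100, copies):.0f} TB")
```

Under this simple model, any growth in unique content is multiplied by the copy count, which is why a fixed redundancy policy weighs more heavily as collections shift toward audio and video.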
Note on survey data
The NDSA Infrastructure Survey, conducted between August 2011 and November 2011, received responses from 58 members of the 74 NDSA member organizations who are preserving digital content. This represents a 78% response rate. The goal of this survey was to get a snapshot of current storage practices within the organizations of the National Digital Stewardship Alliance. The original survey was sent out to the then 98 members of the NDSA (current membership is 116 institutions). We confirmed that 24 members do not have content they are actively involved in preserving. These organizations include consortia groups, professional organizations, university departments, funders and vendors. There were 16 organizations that neither responded to the survey nor indicated that they were not preserving digital collections. The 16 non-respondents are distributed across the different kinds of organizations in the NDSA, including state archives, service providers, federal agencies, universities and media producers.
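For readers who want to check the arithmetic behind the response rate, the figures in this note can be reproduced with a few lines of Python. The variable names are ours; the numbers are those stated above.

```python
surveyed = 98        # NDSA members when the survey was sent out
not_preserving = 24  # members confirmed not to be actively preserving content
responded = 58       # members that completed the survey

eligible = surveyed - not_preserving      # members preserving digital content
non_respondents = eligible - responded    # members that did not answer
response_rate = responded / eligible

print(f"eligible members: {eligible}")
print(f"non-respondents: {non_respondents}")
print(f"response rate: {response_rate:.0%}")
```

This yields 74 eligible members, 16 non-respondents, and a 78% response rate, matching the figures reported in the note.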