This is a guest post by Abbie Grotke, Library of Congress Web Archiving Team Lead and Co-Chair of the National Digital Stewardship Alliance Content Working Group.
You may have seen the news on this blog and elsewhere that the National Digital Stewardship Alliance launched the first ever National Agenda for Digital Stewardship last July. One major section of that document addresses digital content areas. Here’s an excerpt:
Both born‐digital and digitized content present a multitude of challenges to stewards tasked with preservation: the size of data requiring preservation, the selection of content when the totality cannot be preserved, and the selection of modes of both content storage and format migration to ensure long‐term preservation.
Digital stewardship planning must go beyond a focus on content we already have and technology already in use. Even in the near term, a number of trends are evident. Given the ever growing quantity of digital content being produced, scalability is an immediate concern. More and more people globally have access to tools and technologies to create digital content, increasingly with mobile devices equipped with cameras and apps developed specifically for the generation and dissemination of digital content. Moreover, the web continues to be a publishing mechanism for individuals, organizations, and governments, as publishing tools become easier to use. In light of these trends, the question of how to deal with “big data” is a major concern for digital preservation communities.
Selection is increasingly a concern with digital content. With so much data, how do we decide what to preserve? Again, from the agenda:
Content selection policies vary widely depending on the organization and its mission, and when addressing its collections, each organization must discuss and decide upon approaches to many questions. While selection policies for traditional content are most often topically organized, digital content categories, described here, present specific challenges. In the first place, there is the challenge of countering the public expectation that everything digital can be captured and preserved ‐‐ stewards must educate the stakeholders on the necessity of selection. Then there are the general organizational questions that apply to all digital preservation collections. For example, how to determine the long‐term value of content?
Audiences increasingly desire not only access, but enhanced use options and tools for engaging with digital content. Usability is increasingly a fundamental driver of support for preservation, particularly for ongoing monetary support. Which stakeholders should be involved and represented in these determinations? Of the content that is of interest to stakeholders, what is at risk and must be preserved? What are appropriate deselection policies? What editions/versions, expressions and manifestations (e.g. items in different formats) should be selected?
Members of the NDSA’s Content Working Group contributed to the 2014 Agenda by discussing what content was particularly challenging to them. Report writers then drafted sections of the Agenda to focus on the particular challenges of each of the four identified content areas:
- Electronic Records
- Research Data
- Web and Social Media
- Moving Image and Recorded Sound
One simple thing we are doing within the NDSA Content Working Group is holding dedicated meetings focusing on each of the four areas listed above, so that members can learn more and share information about specific challenges, tools in use or being developed and so forth.
The first of these meetings was held December 4, 2013 and focused on web and social media. I provided an overview of web archiving: why web and social media are being archived, who is doing what, and what challenges we face, whether social and ethical, legal, or technical. A PDF of my slides is here. Kris Carpenter from the Internet Archive followed and spoke about the “Challenges of Collecting and Preserving the Social Web.” A PDF of her slides is here.
In January we’ll be focusing on electronic records, and later this spring we’ll have sessions on moving image and recorded sound as well as research data. If you’d like to get in on those conversations, join us in the NDSA!
We don’t claim that the issues surrounding any of the four content types will all be solved over the course of the year, or that these are the only content areas that our members and the broader digital preservation community are dealing with. Who knows what the 2015 Agenda will bring us! But we do hope that drawing more attention to the challenges we are facing will encourage more research, tools development and related efforts that help advance the work of stewards charged with caring for these digital content areas.