Dodge that Memory Hole: Saving Digital News

Newspapers are some of the most-used collections at libraries. They have been carefully selected and preserved and represent what is often referred to as “the first draft of history.” Digitized historical newspapers provide broad and rich access to a community’s past, enabling new kinds of inquiry and research. However, these kinds of resources are at risk of being lost to future users. Networked digital technologies have changed how we communicate with each other and have rapidly changed how information is disseminated. These changes have had a drastic effect on the news industry, disrupting delivery mechanisms, upending business models and dispersing resources across the World Wide Web.

Current library acquisition and preservation methods for news are closely linked to the physical newspaper. Ensuring that the new modes of journalism, which are moving toward a “digital- and mobile-first” model, are captured and preserved at libraries and other memory institutions is the main goal of the Dodging the Memory Hole series of events. The first was organized in November 2014 by the Reynolds Journalism Institute at the University of Missouri.  The most recent took place in May of 2015 and was organized by the Educopia Institute at the Charlotte Mecklenburg Public Library in Charlotte, NC.

Hong Kong, 31st day of the Umbrella Revolution, taken October 28, 2014 by Pasu Au Yeung

I had the opportunity to close out the May meeting and highlight areas where continued work would have an impact in helping libraries collect, preserve and provide access to born-digital news. A (slightly longer but hopefully clearer) version of my talk (pdf) is below.

I want to start with a photograph from last year’s protest in Hong Kong known as the Umbrella Revolution. The picture speaks to the complexity of the problem we face in capturing and preserving the news of today. The protest was unique in that it was one of the first protests in China organized, sustained and broadcast via social media. Capturing a diverse set of materials about this news event would mean capturing the stories from established media companies and the writings and images from individual blogs and other social media. This is especially important in the case of the Umbrella Revolution because official media outlets (and social media accounts) in China are often censored. This protest was also an example of how activism in general has adapted due to networked digital technologies. Future researchers studying social and political movements happening right now would never get the whole story without access to the social media.

The role of journalists is to get the story out and, just like other publishers in the digital age, they’ve had to adapt to stay relevant. Digital storytelling is becoming more dynamic, exemplified by publications like Highline, a new long-form product from Huffington Post which is richly illustrated with audio and visual elements and is translated into a variety of languages. We can expect that in the pursuit of getting the story out and advancing storytelling, news content will come from more sources, be more dynamic and continue using all kinds of formats and distribution mechanisms.

Memory hole.

Libraries have also been transformed by digital technologies. There are a large number of digitized collections; we are creating vast and rich resources and, I think, providing great access and good stewardship to a large amount of this digitized content. Chronicling America and the Digital Public Library of America are great examples of this. However, there are gaps, or holes, in our collections, especially in the born-digital content about contemporary events. Libraries haven’t broadly adopted collecting practices that are relevant to the current publishing environment, which today is dominated by the web.

Several people at this meeting mentioned the study done by Andy Jackson (ppt) at the British Library. I have his permission to share these slides, which he presented at the recent General Assembly of the International Internet Preservation Consortium. It is a simple but powerful study of ten years’ worth (2004-2014) of content from the UK Web Archive. It aims to find out what they have in their archive that is not on the live web anymore. He looked at a sample of URLs per year and analyzed the content to determine if the content at the URL in the archive was still at the same URL on the live web. He broke down and color coded the URLs according to a percentage scale expressing whether the content was moved, changed, missing or gone. He found that after one year half of the content was either gone or had been changed so much as to be unrecognizable. After ten years almost no content still resides at its original URL. This analysis was done across all domains, but it is reasonable to assume that news content wouldn’t fare any better if subjected to the same type of analysis.

Fifty percent of URLs in the UK Web Archive have lost or missing content after one year. After ten years nearly all content is moved, changed, missing or gone. Credit: Taken from a presentation given by Andy Jackson at the IIPC GA Apr 27, 2015. The full presentation available at
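
For readers who want to try this kind of comparison on their own collections, here is a minimal sketch in Python of the idea behind the study: compare an archived copy of a page with what the same URL returns today, then report the share of each outcome. It is not Andy Jackson's actual method. The function names, the similarity measure (Python's `difflib`) and the 0.5 threshold are all assumptions made for illustration, and the sketch merges "missing" and "gone" into a single label.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical cut-off, not taken from the study: below this similarity the
# live page is treated as unrecognizable compared with the archived copy.
SIMILARITY_THRESHOLD = 0.5


def classify(archived_text: str, live_text: Optional[str], redirected: bool) -> str:
    """Label one URL by comparing its archived copy with the live web.

    live_text is None when the live URL no longer returns a page.
    redirected is True when the live fetch ended at a different URL.
    """
    if live_text is None:
        return "gone"
    similarity = SequenceMatcher(None, archived_text, live_text).ratio()
    if similarity < SIMILARITY_THRESHOLD:
        return "changed"
    if redirected:
        return "moved"
    return "same"


def summarize(samples):
    """Return the percentage of each label for a list of
    (archived_text, live_text, redirected) tuples."""
    counts = Counter(classify(*sample) for sample in samples)
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}
```

Running `summarize` separately on each year's sample of URLs would give a year-by-year breakdown along the lines of the chart above. Fetching the live pages and reading the archived copies are left out on purpose, since how that is done depends on the archive being studied.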

We have clear data that if content is not captured from the web soon after its creation, it is at risk. Which brings me to where I think our main challenge is with collecting born-digital news: library acquisition policies and practices. Libraries collect the majority of their content by buying something–a newspaper subscription, a standing order for a serial publication, a package of titles from a publisher, an access license from an aggregator, etc. The news content that’s available for purchase and printed in a newspaper is a small subset of the content that’s created and available online. Videos, interactive graphs, comments and other user-generated data are almost exclusively available online. The absence of an acquisition stream for this content puts it at risk of being lost to future library and archives users.

Establishing relationships (and eventually agreements) with the organizations that create, distribute and own news content is one of the more promising strategies for libraries to collect digital news content. Brian Hocker from KXAS-TV, an NBC affiliate in the Dallas area, shared the story of how KXAS partnered with the University of North Texas Libraries to digitize, share and ultimately preserve their station’s video archives as part of the Portal for Texas History. Jim Kroll from the Denver Public Library also shared his story of acquiring the archives of the Rocky Mountain News after the newspaper ceased publication. Both stories emphasized the importance of establishing lasting relationships with decision-makers from news outlets in their respective communities. They also each created donor agreements that provided community access to the news archives, and these can serve as models for future agreements.

The relationships that enabled these agreements were the result of what I think of as entrepreneurial collection development in the model of acquiring special collections. The archives were pursued actively and over time. They represented a new type of content, required a new type of relationship with a donor and were a good fit, both geographically and topically, with existing collections at UNT and DPL.

Web archiving is another promising strategy to capture and preserve born-digital news. The Library of Congress recently announced its effort to save news websites, specifically those not affiliated with traditional news companies. Ben Walsh, creator of PastPages, announced that his service is now Memento-compliant, which will allow the archived front pages of websites from major-market newspapers that PastPages collects to be available in a Memento search. These projects will capture content at a national level, but the hyper-local news sites, citizen journalism and other niche blogs (news that used to be published as community newsletters or pamphlets) are most likely not being captured. Internet Archive’s Archive-It service is a mechanism for smaller libraries to engage in web archiving and capture some of this unique content. Capturing the social media around news events continues to be challenging, but tools have been developed to capture tweets, and collections of tweets around news events are being captured and shared.

The Dodging the Memory Hole events have thus far been excellent opportunities to bring librarians, archivists, the news industry and technologists together to help save news content for future generations. Look for more from this group on awareness raising, studies on what news content has already been lost, collaborations with the developers of news content management systems, and more guidance on developing donation agreements. To read more about the event, check out Trevor Owens’ report on the IMLS blog.
