Preserving Born Digital News at Digital Preservation 2014

The following is a guest post from Anne Wootton, CEO of Pop Up Archive, which makes tools for preserving and creating access to digital spoken word; Edward McCain, the Digital Curator of Journalism at the Donald W. Reynolds Journalism Institute at the University of Missouri; Leslie Johnston, Director of Digital Preservation at the National Archives and Records Administration; and Aurelia Moser, a librarian-journalist and current Knight-Mozilla OpenNews fellow.

As everyone gears up for the annual Digital Preservation 2014 conference next week, we’re excited to crash the party with a bunch of journalists.

In all seriousness: we proposed our session for DigiPres 2014 in reaction to growing alarm among journalists that as news is increasingly digital in nature, news organizations don’t have the expertise or resources to properly archive or preserve their digital work for future generations — or even to preserve their work for a year from now.

This is true at small regional newspapers struggling to maintain a web presence and keep digital backups of their articles, images and videos. It’s also true at the biggest national news organizations building interactive news apps and data-driven journalism projects.

Important initiatives like the National Digital Newspaper Program (a partnership of the Library of Congress and the National Endowment for the Humanities) are helping save the legacy of print journalism for future generations. Educopia published a guide to digital newspaper preservation for libraries and cultural heritage institutions this year.

But what about the digital news being created today, especially given its state of continual flux and evolution?

One of us, Edward McCain, has written: “The problems surrounding preservation of and access to digital news archives stem from a combination of frequently changing factors. …[Losing digital] news, birth announcements, obituaries and feature stories about the happenings in any community represents a loss of cultural heritage and identity. It also has an effect on the news ecosystem, since reporters often depend on the ‘morgue’ – newspaper parlance for their library – to add background and context to their stories.”

For example, at a gathering of journalists last fall, Scott Klein of ProPublica told the story of Adrian Holovaty’s Chicago Crime, a groundbreaking news application that is now lost to the world. “What do we want to know in 2014 about that app? What would we want to know in 2034? It’s not just the code that Adrian wrote or the map itself … We want to know about his process. We want to know the infrastructure on which he built the app … We want to know about how it was designed, how the user interactions worked. We want to know the impact it had and who responded to it.”

In our Digital Preservation 2014 session on Wednesday, July 23, we will share stories from our varied experiences in digital news production and archiving. We come from different backgrounds, but we share a desire to better unite journalists and news organizations with archivists and the archival communities best equipped to provide them with resources. For starters, we thought we’d kick off the conversation with early results from a born-digital news archiving survey conducted by Edward and RJI.

RJI’s Journalism Digital News Archive initiative was launched in 2013 with the mission of finding and implementing viable solutions for saving news content originally produced in a digital format. There are a number of challenges implicit in this undertaking, not the least of which is that we don’t know a lot about the current state of born-digital news archives.

A recent RJI/JDNA survey addresses this knowledge deficit about the policies and practices of news organizations when it comes to born-digital content. Of the 476 news organizations surveyed, the larger group, with 406 respondents, was classified as “Hybrid” enterprises: print newspapers with an online platform. The smaller group of 70 respondents was classified as “Online Only”: organizations that publish their content exclusively via the World Wide Web.

The survey asks about content production: what kinds of digital objects are these news organizations creating? Edward and his team asked about the basics such as text and images, but also about video, interactive, mobile-only and other formats, starting with: “Does your news organization produce born-digital text content?” (Respondents were given a working definition of born-digital content: “materials that originate in a digital form, not scanned from other media. Examples include digital photographs, digital documents, harvested web content, digital manuscripts, electronic records, and etc.”)

It should come as no surprise that the vast majority of news organizations create digital content in text format (Figure 1). But what does it mean that 6% of Hybrid enterprises report that they don’t produce text? It could be that they are engaged in some unorthodox business model, or perhaps they didn’t really understand the question. If those 6% of survey respondents really found the language or concepts behind the question about born-digital text formats unclear, it may indicate a more general disconnect between journalistic and digital preservation cultures.

Figure 1. Does your news organization produce born-digital text content?

RJI’s survey also asked about the use of Content Management Systems for storage and retrieval, as well as what other kinds of technology were currently in use for providing access to digital news archives (Figure 2). The share of Online Only organizations that store and retrieve their digital content on their own CMS (47 percent) is more than twice that of Hybrid news producers (20 percent). As with many of the responses, the difference here may be due to the age and size of each type of organization.

Figure 2. Do you store and retrieve born-digital content using a CMS?

Edward and his team also asked organizations about the completeness of their born-digital archives, how far back they have access to those files, and whether or not they work with a memory institution such as a library, museum or archive to preserve their electronic news assets. The survey asked about the value of born-digital news archives from different perspectives: historical content creation, audience engagement, quality journalism and return on investment. It also asked about loss of digital content as well as perceived threats in areas such as media failure, technical obsolescence, policy and resources. And since librarians – not journalists – have been the driving force behind the creation and preservation of news archives, the survey asked organizations if they employed a news librarian or equivalent position.

Please join us at Digital Preservation 2014 from 10:45 a.m. to noon on Wednesday, July 23rd in the West End Ballroom Salon C to hear the answers to these questions and to share your ideas about how digital preservationists can join forces with journalists and other stakeholders to improve preservation and access for digital news content.

Our 2014 Digital Preservation Born Digital News Archiving panel is a continuation of initiatives that many of us are already involved in with colleagues from the Reynolds Journalism Institute, the Newseum, the Mozilla Foundation, the NYTimes, ProPublica, and the Washington Post.

Please leave a comment below with thoughts on the topic or ideas in advance of the meeting. We welcome all questions and suggestions.
