D is for data

Newspapers across the U.S.: The Rural American West Initiative

Part of a continuing series of alphabetically chosen digital preservation topics.

I believe a “picture is worth a thousand words,” especially when masses of digits form a new shape that presents fresh insights. The Library and the National Endowment for the Humanities have been working with partners for several years to build a digital archive of historic newspapers. The project has worked hard to enable text searching and article viewing of these newspapers. But recently I was delighted to see an interactive map showing the spread of newspapers across the American continent over the last 300 years. The Rural American West Initiative created the visualization from the data about 140,000 newspapers embodied in Chronicling America.

The fresh insight is that data sets are not just scientific and business tables and spreadsheets; our cultural heritage collections are now considered data as well. They are the digital building blocks for interpretation and new discoveries, which can transform them into entities that we may not recognize as cultural heritage information. Researchers use algorithms to mine the rich information and tools to create pictures that translate that information into knowledge.

Until a few years ago, I did not think of digital collections as data. I thought data were gathered from satellites or collected during scientific experiments. On some level, I had the idea that digital libraries would be used online much as they were used in their analog forms. I did not think of them being used as data. Now we encounter more and more researchers who want to use collections as a whole, mining and organizing the information in novel ways. When we began archiving election web sites, we imagined users browsing through the web pages, studying the graphics or the use of phrases or links. When our first researchers came to the Library, they did want to know about all those topics, but they used computers and scripts to look for them and sort them into categories. They were not much interested in reading web pages.
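To give a feel for what such a script might look like, here is a minimal Python sketch. Everything in it is invented for illustration: the sample pages, the page identifiers, the category names and the phrases are assumptions, not material from the Library's election web archive or from any researcher's actual workflow.

```python
import re
from collections import defaultdict

# Hypothetical sample: three tiny "archived pages" keyed by made-up identifiers.
PAGES = {
    "page-a": '<p>Lower taxes now. <a href="http://example.org/donate">Donate</a></p>',
    "page-b": "<p>Protect health care. Health care for all.</p>",
    "page-c": '<p>Lower taxes and better health care. '
              '<a href="http://example.org/a">A</a> '
              '<a href="http://example.org/b">B</a></p>',
}

# Hypothetical categories, each defined by one phrase to look for.
CATEGORIES = {"taxes": "lower taxes", "health": "health care"}

# Deliberately simple pattern for double-quoted href attributes.
LINK_PATTERN = re.compile(r'href="([^"]+)"', re.IGNORECASE)


def categorize(pages, categories):
    """Map each category name to the sorted ids of pages containing its phrase."""
    result = defaultdict(list)
    for page_id, html in pages.items():
        text = html.lower()  # case-insensitive phrase matching
        for name, phrase in categories.items():
            if phrase in text:
                result[name].append(page_id)
    return {name: sorted(ids) for name, ids in result.items()}


def count_links(pages):
    """Count the href attributes found on each page."""
    return {page_id: len(LINK_PATTERN.findall(html)) for page_id, html in pages.items()}


if __name__ == "__main__":
    print(categorize(PAGES, CATEGORIES))
    print(count_links(PAGES))
```

A real study would work over many thousands of pages and would likely use a proper HTML parser rather than a regular expression, but the shape is the same: the script, not a human reader, looks through the collection and sorts it into categories.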

If you need more evidence of this trend toward data, check out the Digging into Data Challenge. The repositories available for research include not only scientific information (astronomy, geology, physics, biology, social science surveys) but also images, film, sound, newspapers, maps, art, archaeology, architecture and government records. The second round of awards, sponsored by eight international research funders representing Canada, the Netherlands, the United Kingdom and the United States, will be announced in December.

Guidelines for Data Seal of Approval: Data Archiving and Networked Services

So in terms of digital preservation practice, cultural heritage collections benefit from being thought of as data. In 2005, two Dutch science organizations joined together to form the Data Archiving and Networked Services. Their work, although directed at scientific communities, is applicable to cultural heritage archives. The Data Seal of Approval distills many methods, practices and standards into a manageable set of guidelines that address A) the quality of the data, B) the quality of the data repository and C) the quality of access to and use of the data. The brief guidelines document provides a clear roadmap for preserving digital information.

I think that we, as digital preservationists, can use the Data Seal of Approval guidelines to A) assess our stewardship of digital information, B) engage with researchers to learn more about how they think of using our digital libraries and C) engage with producers to foster good practices around the creation of the data.
