Part of a continuing series of alphabetically chosen digital preservation topics.
I believe a “picture is worth a thousand words,” especially when masses of digits take on a new shape that yields fresh insights. The Library and the National Endowment for the Humanities have been working with partners for several years to build a digital archive of historic newspapers. The project has worked hard to enable text searching and article viewing of these newspapers. But recently I was delighted to see an interactive map showing the spread of newspapers across the American continent over the last 300 years. The Rural American West Initiative created the visualization from the data describing 140,000 newspapers in Chronicling America.
The fresh insight is that data sets are not just scientific and business tables and spreadsheets; our cultural heritage collections are now considered data too. They are the digital building blocks for interpretation and for new discoveries that may transform them into things we would not recognize as cultural heritage information. Researchers use algorithms to mine this rich information, and visualization tools to create pictures that turn that information into knowledge.
Until a few years ago, I did not think of digital collections as data. I thought data were gathered from satellites or collected during scientific experiments. On some level, I assumed that digital libraries would be used online much as their analog counterparts had been used. Now we encounter more and more researchers who want to use collections as a whole, mining and organizing the information in novel ways. When we began archiving election web sites, we imagined users browsing through the web pages, studying the graphics, the use of phrases, or the links. When our first researchers came to the Library, they did want to study all those topics, but they used computers and scripts to find them and sort them into categories. They were not very interested in reading web pages.
If you need more evidence of this trend toward data, check out the Digging into Data Challenge. The repositories available for research include not only scientific information (astronomy, geology, physics, biology, social science surveys) but also images, film, sound, newspapers, maps, art, archaeology, architecture and government records. The second round of awards, sponsored by eight international research funders representing Canada, the Netherlands, the United Kingdom and the United States, will be announced in December.
So in terms of digital preservation practice, cultural heritage collections benefit from being thought of as data. In 2005, two Dutch science organizations joined together to form Data Archiving and Networked Services (DANS). Its work, although directed at scientific communities, is applicable to cultural heritage archives. The Data Seal of Approval distills many methods, practices and standards into a manageable set of guidelines that address A) the quality of the data, B) the quality of the data repository and C) the quality of access to and use of the data. The brief guidelines document provides a clear roadmap for preserving digital information.
I think that we, as digital preservationists, can use the Data Seal of Approval guidelines to A) assess our stewardship of digital information, B) engage with researchers to learn more about how they intend to use our digital libraries and C) engage with producers to foster good practices in the creation of data.