A Collection of Collections: Learning More

If you are a frequent visitor to digitalpreservation.gov you may have come across our Collections section. In it we list the digital collections that have been selected for preservation by our NDIIPP partners. We are using Viewshare to make the list more interactive and to help us learn more about this collection of collections. We’ve just revised the view to include digital collections preserved by NDSA partners, upping the number to 1406, and we’ve included additional views that reveal some broad characteristics of this group of digital collections.

Updated Collections View

Key outcomes of the NDIIPP program are to identify priorities for born digital collections, to engage organizations committed to preserving digital content, and to preserve at-risk digital materials of high importance for the research, scholarship and cultural heritage communities. (See the NDIIPP 2010 report for more details.) Essential to achieving these outcomes is understanding the characteristics of the born digital materials that libraries and other institutions collect; Viewshare helps us do that by pulling out common elements for comparison and analysis.

To create this view we collected data from our partners, compiled it into a spreadsheet and loaded it into Viewshare. NDIIPP partners are a diverse group that includes libraries large and small, archives, museums, state governments and federal agencies. We asked for collection data in whatever form they had it. Working with Viewshare made normalizing these differing data sets far less painful than it might have been. We were able to see where the gaps were and where it made sense to put effort into refining the data.
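For readers who want to try something similar with their own spreadsheet, here is a minimal sketch of the kind of normalizing and gap-finding described above. It is not how Viewshare works internally; the field names, the alias table and the list of required fields are all assumptions made up for illustration.

```python
# Hypothetical mapping from partner-specific column names to shared ones.
# None of these names come from the actual collections data.
FIELD_ALIASES = {
    "title": "title",
    "collection name": "title",
    "subject": "subject",
    "topic": "subject",
    "institution": "institution",
    "organization": "institution",
}

# Fields we assume every record should end up with.
REQUIRED_FIELDS = ("title", "subject", "institution")


def normalize(record):
    """Rename known columns to shared names and drop empty values."""
    clean = {}
    for key, value in record.items():
        target = FIELD_ALIASES.get(key.strip().lower())
        text = str(value).strip() if value is not None else ""
        if target and text:
            clean[target] = text
    return clean


def missing_fields(record):
    """List the required fields a normalized record still lacks."""
    return [field for field in REQUIRED_FIELDS if field not in record]
```

Running `missing_fields(normalize(row))` over every row gives a quick picture of where the gaps are before the data is loaded anywhere.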

Our goal for the view is to showcase the collections and to understand what has been preserved and what makes born digital collections unique. A couple of notes about the data in the view: We are presenting collections, but each institution defines a collection differently. For example, the collections from the DATAPass project are each defined by a single survey or poll, its responses and supporting material. Our archives partners have collections of papers and manuscripts by an individual or organization; libraries define collections by subject area, creator or publisher. The Subjects and Types assigned to the collections in this view are also admittedly imprecise. Our goal is not to describe the collections in detail; this has been done by the collecting institution. We want to know about the broad landscape of digital collections and to use the data we have.

Because Viewshare provides easy ways to map data, we filled in geographic information both for the collections and for the collecting institutions. Currently, Viewshare can only map a single point, so we had to figure out where to place a pin for a collection that is not about a particular place. What if a collection is about a region, like Central America? We decided we wanted to show all the collections on the map, so for regions we picked the center and mapped collections there. For global collections we picked the latitude and longitude of 0,0 (off the coast of West Africa). Again, this is imprecise, but we think it is still useful to see how the collections are spread across the world.
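The placement rule above can be sketched in a few lines of Python. This is only an illustration of the rule as we described it, not code we ran: the `Collection` class, the `scope` values and the coordinates given for the center of Central America are assumptions, and the region center in particular is a rough guess.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical region centers as (latitude, longitude); rough values only.
REGION_CENTERS = {
    "Central America": (12.8, -85.6),
}

# Global collections are pinned at latitude 0, longitude 0.
GLOBAL_POINT = (0.0, 0.0)


@dataclass
class Collection:
    title: str
    scope: str  # assumed values: "place", "region" or "global"
    region: Optional[str] = None
    point: Optional[Tuple[float, float]] = None


def map_point(collection: Collection) -> Tuple[float, float]:
    """Return the single point at which to pin a collection."""
    if collection.scope == "global":
        return GLOBAL_POINT
    if collection.scope == "region":
        return REGION_CENTERS[collection.region]
    if collection.point is None:
        raise ValueError(f"No coordinates for {collection.title!r}")
    return collection.point
```

A collection about one city keeps its own coordinates, a regional collection lands on the region's center, and everything global stacks up at 0,0.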

Looking at the COLLECTIONS map view, you can see fairly good coverage of most of the globe. However, when you switch to the COLLECTING INSTITUTIONS view, you see that the institutions most engaged in digital collecting are concentrated in Western Europe and North America, a signal that more partnerships could be pursued. Is this a situation unique to born digital content? What does this say about the heritage and technical infrastructure of today?

Type of content collection - detail of Government

The CONTENT TYPE view shows what types of materials are being collected. The large majority of collections are Text and/or Images. This category includes the many collections of digitized books and journals that our partners are preserving. What would this picture look like in 10 years? Will digitized materials still be the majority? Clicking through the Subjects facet could give hints for the future. In many subject areas Web sites make up a larger share of collections; the Government, Law and Politics area is especially notable. What if we were charting file size rather than number of collections? How would this chart look different, and what would it look like in 10 years?
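To see why charting file size could change the picture, consider a small sketch that tallies the same records two ways. The records below are entirely made up for illustration and do not come from the view; the `size_gb` field is an assumption, since the view counts collections and does not chart size.

```python
from collections import Counter, defaultdict

# Made-up example records; not taken from the actual collections data.
collections = [
    {"type": "Text and/or Images", "size_gb": 40},
    {"type": "Text and/or Images", "size_gb": 25},
    {"type": "Web sites", "size_gb": 900},
]


def count_by_type(records):
    """Number of collections per content type."""
    return Counter(record["type"] for record in records)


def size_by_type(records):
    """Total size in gigabytes per content type."""
    totals = defaultdict(int)
    for record in records:
        totals[record["type"]] += record["size_gb"]
    return dict(totals)
```

In this invented sample, Text and/or Images leads when you count collections, while Web sites leads when you add up size. Whether the real collections behave this way is exactly the open question.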

Use the scissors tool to extract data from the view.

These are just a few of the questions we are grappling with as we identify and preserve digital collections. Viewshare has been extremely useful for our purposes. The NDSA Content Working Group is also thinking about how data and views like this could help the community declare that preservation action has been taken for certain collections. We also want to explore how data and views like this could help researchers use our digital collections in different ways and help our content reach more audiences.

If you are interested in looking at or using the underlying data behind this interface, click on the scissors icon (on the view) and copy and paste the data into your own spreadsheet. If you want to add a collection to this view, email [email protected].

A note of thanks to Nick Krabbenhoeft, our intern from the University of Michigan School of Information, who worked on this view.
