This is part of a series that explores the topic of digital preservation in an alphabetical way. Each post will use a word or phrase as a device to explore a concept and point to a useful resource for understanding specific aspects of the practice of digital preservation.
Almost every week, I encounter comments or discussions about whether the word “collection” is appropriate for various bodies of digital content. Some say a collection must contain more than one thing; others state that items in a collection must share the same provenance. I always think they are right, BUT….
Advantages for digital stewardship. The basic definition of a collection is a group of items gathered by a person or an organization (and maybe even a machine!). In cultural heritage practice, collections are assembled according to an organizing principle, and in digital preservation the collection is usually the target of preservation actions. That organizing principle gives stewards a way to manage large volumes of digital content. For example, decisions about description, storage and migration can be made collection by collection rather than item by item.
Throughout NDIIPP, our partners have described their work as collection-based. However, those collections were diverse, and each project used the grouping concept to tackle specific areas of exploration and development. The Program viewed digital content through a technical lens, in four groups: text and image, audio and video, geospatial, and web sites. Grouped this way, the projects were able to share expertise about particular forms of digital content. But this was not the only organizing principle at work.
Flexible concept. The collection concept is elastic and can accommodate many organizing principles. One group of partners worked with social science datasets, a type of collection unfamiliar to many libraries but of growing interest in the expanding information environment. These collections were often built around a specific study or set of survey questions, and the data resulting from the responses became the collection. Another project worked with political web sites, organized by topic and by distribution source. That project dealt primarily with the identification, selection, collection and preservation of, and access to, web sites created by government or citizen advocacy groups.
Sometimes a single web site may constitute a collection, especially if it links to many other web sites on the same topic. Geospatial collections are organized around geographic locations or around the organization collecting and collating the data. One of the digital television projects was organized around international sources; another was organized around specific public broadcasting outlets. A single movie with all its production files could be a collection. So could a music studio's production list, all the recordings of a single artist, or all jazz recordings…. You get the point.
Collection concept and access. As useful as collections are for stewardship organizations, the concept is not particularly helpful to users. Collections named for very well-known figures, such as George Washington, are understandable; collections named for philanthropists may mean little to most users. A finding aid may provide wonderful context for a collection, but users may bypass that description just to see an image or read a specific text. In the digital realm, a single item can belong to many collections because it does not need to occupy a specific spot on a shelf. The technical approach that helps us plan the management of data means little to a student looking for information to finish her homework.
In the age of social media, many people want to build their own collections from content available across the web and share them with their friends. We have seen journalists, bloggers and web site producers emerge as curators of content, aggregating selections from the vast array of available material to tell a story. At the same time, there is growing interest in data mining, which treats collections as data rather than as sets of individually browsable objects.
The objective of digital preservation is to enable long-term access to digital content. How are we bridging the gap between the conventions that aid our preservation practices and the interests of students, scholars, life-long learners and any curious person?