From AIP to Zettabyte: Comparing Digital Preservation Glossaries

The following is a guest post by Emily Reynolds, a 2012 Junior Fellow.

As we mentioned in our introductory post last month, the OSI Junior Fellows are working on a project involving a draft digital preservation policy framework. One component of our work is revising a glossary that accompanies the framework. We’ve spent the last two weeks poring over more than two dozen glossaries relating to digital preservation concepts to locate and refine definitions to fit the terms used in the document.

The Library: Encyclopedias, 1964, by LSE Library, on Flickr


We looked at dictionaries from well-established archival entities like the Society of American Archivists, as well as more strictly technical organizations like the Internet Engineering Task Force. While some glossaries took a traditional archival approach, others were more technical; we consulted documents primarily focusing on electronic records, archives, digital storage and other relevant fields. Because of influential frameworks like the OAIS Reference Model, some terms were defined similarly across the glossaries that we looked at. But the variety in the definitions for other terms points to the range of practitioners discussing digital preservation issues, and highlights the need for a common vocabulary. Based on what we found, that vocabulary will have to be broadly drawn and flexible to meet different kinds of requirements.

Digital preservation happens at the intersection of diverse fields, each of which has a point of view that must be taken into account. This became very clear for some of the terms we were trying to define. One of the hardest terms to define usefully and succinctly was “authenticity.” Authenticity in a digital context is a complex issue, encompassing both IT security principles and traditional archival ideology, and the definitions we looked at certainly reflected that.

Genuine Authentic Legit Proven Official Bona Fide Real True Valid Tested, by daemonsquire, on Flickr


The most useful definitions were those that acknowledged the multiple viewpoints at play, allowing for a more holistic understanding of these terms in a digital preservation environment. For example, the Storage Networking Industry Association dictionary defines authenticity from three perspectives: data management, data security and legal demands. Each of the definitions is relevant to requirements for repositories managing (and authenticating) digital content. Archives New Zealand incorporates several viewpoints in their glossary as well, framing the term in a digital preservation context, an IT context and a recordkeeping context. Again, each of these meanings is essential to the effective management of digital content in a repository.

While each definition centers on similar ideas, the specific mechanisms that make an object authentic differ from each perspective. Is it verifying the creator and chain of custody of an object? Is it assuring that the metadata and file formats are what were expected? Is it a matter of making sure that checksum values match? For paper materials, such questions are more straightforward; with digital content, both defining and assessing authenticity encompass a range of different attributes. Many of the definitions that we reviewed demonstrate the range of voices present in the digital preservation community, as well as the many purposes that any single term can serve.
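
Of the questions above, the checksum one is the easiest to make concrete. Here is a minimal sketch in Python, assuming a repository records a SHA-256 value when it first receives a file and recomputes it later. The function names and the sample content are hypothetical, chosen only for illustration, and SHA-256 is just one possible algorithm.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def fixity_matches(data: bytes, recorded_checksum: str) -> bool:
    """Report whether data still hashes to the checksum recorded earlier.

    Hypothetical helper: the recorded value is normalized (whitespace
    stripped, lowercased) before it is compared.
    """
    return sha256_hex(data) == recorded_checksum.strip().lower()


# Hypothetical content standing in for a deposited file.
original = b"Minutes of the library committee"
recorded = sha256_hex(original)  # value stored at the time of deposit

unchanged = fixity_matches(original, recorded)
altered = fixity_matches(original + b" (edited)", recorded)
```

A match only shows that the bits are the same as when the checksum was recorded. It says nothing about who created the object or how it was handled before deposit, which is why the other perspectives still matter.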

5 Comments

  1. Mark Evans
    July 11, 2012 at 3:18 pm

    Great article on one of the biggest challenges in digital preservation – Semantics. Thanks for posting.

    I really appreciate your reference to one of the most challenging terms to define – Authenticity

    It has taken me many years to fully appreciate that this single term can mean very different things to very different audiences.

    I would also add “Legal” to your list of considered viewpoints, as they can often be a key influencer of any digital preservation policy and process.

  2. Nick
    July 11, 2012 at 3:44 pm

    I’m reading a lot about OAIS right now, particularly David Giaretta’s Advanced Digital Preservation, and I’m realizing the importance of the designated community. Are there any definitions of authenticity that frame it in that context? Whether future users of an object believe it to be the correct object should be an important consideration for archives.

  3. Emily Reynolds
    July 12, 2012 at 11:40 am

    Thanks for your comment! The SNIA definition linked in the post does mention a legal perspective.

    I agree that it’s a very important part of digital preservation, but you’re right to point out that it certainly wasn’t a common point of view across the definitions we reviewed.

  4. Elisa Lanzi
    January 11, 2013 at 12:29 pm

    I too am struggling with finding the right glossary(ies) that will be referenced in our digital preservation policy. I’m attempting to include a list of top 10 terms with definitions within the policy and then link out to an external glossary. Our policy is being drafted as a college-wide policy (led by the libraries, but applying to digital resources across the college). One of my challenges is in how we use terms such as: digital asset, digital object, digital content, digital resources, digital materials, etc. While some of these terms have strict definitions, some are a bit more slippery.
    I will be watching your progress, Emily and Junior Fellows. Thanks for doing this important work.

  5. Butch Lazorchak
    January 11, 2013 at 12:46 pm

    The NDSA is working on a glossary as part of its “Levels of Preservation” work (see blog post at //blogs.loc.gov/digitalpreservation/2012/11/ndsa-levels-of-digital-preservation-release-candidate-one/).

    An early draft of the glossary is up at http://www.digitalpreservation.gov/ndsa/ndsa-glossary.html.
