The Release of the NDSA Digital Stewardship Glossary

Glossaries are important. If you can’t agree on the definitions of terms, how will you know what you’re arguing about? And if you don’t know what you’re arguing about, you’ll never come to an agreement on anything!

In this picture V. Valta Parma, Curator of the Rare Book Division at Congressional Library, is showing Ethel Hearn the first and present Webster dictionaries, 3/21/38. Photo from the Library of Congress Collections.

With that in mind, today we’re announcing the release of a National Digital Stewardship Alliance glossary to support the work being done in the NDSA on the Levels of Preservation. The Levels of Preservation activity is working to provide basic digital preservation guidance on how an organization should prioritize its resource allocation. The NDSA glossary strives to provide a common language for NDSA members to communicate about the levels work and should also be useful as a general digital stewardship glossary.

This glossary evolved out of the work of the Library of Congress Digital Preservation Working Group, which has spent several years identifying current practices and policies for the preservation of digital material at the Library. The glossary was drafted by our 2012 Junior Fellows as an internal document to define terms in working group progress reports, and thus was somewhat limited in scope. It was never an official Library of Congress product. However, it did provide a solid basis for future work.

The Junior Fellows extensively referenced the available literature to identify pertinent sources, including glossaries developed by the Society of American Archivists, the Digital Curation Centre, Jisc and many others (a list of the referenced works is available here (PDF)). And while the existing sourced glossaries have much to recommend them, there was still a need to develop a glossary that succinctly defined terms of special value to the NDSA and its extended digital stewardship community.

With that in mind, we’re sharing the first version for all to use with the idea that it is a living document and that members of the NDSA and the wider digital stewardship community will contribute to its ongoing development. Feel free to drop suggestions in the comments below or send an email to ndsa@loc.gov with thoughts on future terms and definitions.

3 Comments

  1. Paul Wheatley
    February 12, 2013 at 10:39 am

    Hi,

    After reading somewhat negative comments about this new glossary on Twitter, I thought I’d have a look for myself and see what all the fuss was about. I have to say, I’m a little unsure of why we need another glossary. As you point out, there are already plenty of existing glossaries (http://blogs.loc.gov/digitalpreservation/files/2013/02/NDSA_glossary_references021113.pdf) and some of those provide some pretty solid sources of generally agreed terminology. I do accept that there are some terms that remain ambiguous and/or have different meanings for different groups – although obviously this is a tough nut to crack.

    The only strong reason I can think of for having *another* glossary is if this glossary can make some kind of serious attempt to amalgamate previous work in a useful manner (and indeed there is a strong parallel with my case for bringing together the many competing tool registries http://openplanetsfoundation.org/blogs/2013-01-08-creating-community-owned-digital-preservation-tool-registry-coptr).

    This work has the potential to meet this very useful if lofty aim, but I don’t think it nails this target as it stands. I think that this work would have much greater impact and value to NDSA and the wider world with some pretty small changes to the way it is presented and developed further.

    A critical issue for me seems to be context. Listing existing glossaries and sources of terms in a separate document is a starting point, but what we really need to see is some much more comprehensive linking to existing sources of definitions (where they are applicable) from the actual glossary. Even better would be a few more table columns showing corresponding definitions from other key sources such as OAIS (someone must have done this before?). This would be a way of improving understanding across other standards/glossaries/disciplines as well as an excellent way of validating this glossary. I won’t lecture on provenance, but clearly we must practice what we preach.

    Secondly, a more collaborative approach to this document’s development would take advantage of the great deal of community interest and enthusiasm for the excellent Levels of Preservation document. This fabulous community interest and buy-in should be encouraged and supported. The best way to do this, in my opinion, would be to open up these works to a more collaborative form of editing and development, rather than doing it behind closed doors.

    Butch said “…it is a living document and that members of the NDSA and the wider digital stewardship community will contribute to its ongoing development. Feel free to drop suggestions in the comments below or send an email to ndsa@loc.gov with thoughts on future terms and definitions.”

    The first half of this quote seems to embody the community engagement I’m excited about. The second half of the quote puts a hidden, human bottleneck between community feedback and future development. Could we do this more openly? [cough] wiki? I’m not currently feeling the Levels of Preservation document is very alive, but I’d like to help make it so. Currently that’s difficult.

    In the spirit of contributing something, regardless of my moaning about the general approach, here are a few specific comments on some of the entries:

    Archival Original/Received Version/Preservation Copy: There’s something about these terms that makes me nervous. The complete omission of the very widely used and understood SIP/AIP/DIP from OAIS is I think part of the problem.

    Authenticity: I’m not an Archivist (and would not want to even consider speaking for archivists) but I suspect a few of them might not be happy with this definition. And what does this “mechanical” term mean? Explain/define or remove.

    Backup: Is there a distinction to be made here between backup of current business critical type information (eg. the typical daily incrementals and weekly full backups) and the multi copy redundancy found in comprehensive digital repository implementations (putting multiple copies in geographically separate locations and then checking them for integrity/consistency)? This should perhaps be two interlinked definitions. The wording is tricky but the distinction is I think important, if somewhat fuzzy.

    Bagger: This is the only tool defined. Either define all preservation tools or none of them.

    Canonical: This isn’t a definition of “Canonical”. This needs a serious look.

    Digital Content / Digital materials: What is an “Item”? I don’t like these terms. This needs a rethink and some clearer definitions that reference the “digital object” term in a manner that makes sense.

    Digital Preservation: This seems to be a very slight change on an otherwise word for word quote from the ALA definition (http://www.ala.org/alcts/resources/preserv/defdigpres0408). Why change it? Why is it not referenced?

    Emulation: Emulation isn’t always a means of overcoming technological obsolescence, or indeed isn’t always a means of imitating obsolete systems on future computers.

    File Format: Gulp. This is a hard term to get a good definition for, but this is wide of the mark. I clumsily reference some of the important points here: http://libraries.stackexchange.com/questions/1117/converting-invalid-pdfs-or-not-for-digital-preservation/1389#1389

    Format Migration: Is not used for the most part to overcome technological obsolescence. It’s used to create Derivatives.

    Ingest: What is a “managed environment”? I think the universal term for this is a digital repository? Either way, there needs to be a definition for this, and it needs to be used consistently.

    Instance: “Any particular instantiation of a digital file, object or collection.” I’m not really sure this is a useful term to have in the glossary. Firstly, what is a “file” or “collection” – these aren’t in the glossary. Secondly, why not just say “This is an instance of a digital object”? Hence no specific DP definition is required.

    Lifecycle: There are a lot of terms in this definition that aren’t defined anywhere else. This definition would be much more useful with links to a few of the top lifecycle models.

    Metadata: I’m not going to get into this, but it needs to refer to existing standards and definitions, otherwise new or unreferenced definitions just muddy the murky waters of metadata even further.

    Package N/V: This needs to key into OAIS.

    Received version: Is “the closest surviving copy” really still the “received version”? “Library”??? Define “Repository” and use that.

    Unique Identifier: Seems to be missing some wider concepts, which would be useful for the reader (domains, resolving, etc)

    Storage Migration: I’ve never heard it called this before. I’d guess others might have concerns?

    Validation: I think it would be useful to provide a careful definition of “Format Validation” (as used by JHOVE) as it’s an often misunderstood concept, leading to often very poor preservation action decisions.

    Verify: This is fuzzy and misleading. Remove or be more specific about what is being verified in the term.

    That’s my two-penneth but as I say, mapping these to other key definitions would I think help to massively improve the quality of this document.

    Cheers

    Paul

  2. Butch Lazorchak
    February 13, 2013 at 5:21 pm

    Paul,

    These are great comments and your points are well taken.

    While all can judge the utility of “just another glossary,” this one is designed more as a “meta glossary,” pulling in appropriate terms and definitions from a wide range of existing glossaries, including domains outside of libraries and archives.

    Your idea of including provenance information is a very good one. We have some of that information available in different forms as part of the information gathering so we’ll see if we can address that in a future version.

    The NDSA will also explore ways to make the process more transparent and inclusive.

    “Digital stewardship” touches on so many disparate fields that it’s useful to have a set of terms we can call our own that define what we mean when we talk about our processes.

    Will we ever come to full agreement on terms and definitions even within our own community? I hope not! It would make evenings out so much less interesting!

  3. Paul Wheatley
    February 15, 2013 at 10:55 am

    Thanks for that response, Butch! I hope it’s possible to move things towards a bit more of an open process, and I’d love to contribute more in future. I do realise this isn’t always easy, and the last thing you want is to start a definition war that no one ever wins…

    I like your take on (not) reaching full agreement! *:-)
