Check Yourself: How and When to Check Fixity

How do I know if a digital file/object has been corrupted, changed or altered? Further, how can I prove that I know what I have? How can I be confident that the content I am providing is in good condition, complete, or reasonably complete? How do I verify that a file/object has not changed over time or during transfer processes?

Please consider reading and commenting on this draft document.

In digital preservation, a key part of answering these questions comes through establishing and checking the “fixity” or stability of digital content. At this point, many in the preservation community know they should be checking the fixity of their content, but how, when and how often?
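As a concrete illustration of what a fixity check involves, here is a minimal sketch in Python. It assumes a workflow in which a digest is recorded when content is first received and compared against a fresh digest later. The function names `file_digest` and `check_fixity`, the SHA-256 default, and the chunk size are illustrative choices, not anything prescribed by the draft document.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in chunks so that
    large files do not have to fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b""
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def check_fixity(path, recorded_digest, algorithm="sha256"):
    """Return True if the file still matches the digest recorded earlier."""
    return file_digest(path, algorithm) == recorded_digest.lower()
```

In use, one would call `file_digest` at ingest, store the result alongside the object, and call `check_fixity` after each transfer or on whatever schedule suits the storage system. A `False` result only says that something changed. It does not say what changed or why.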

A team from the NDSA Infrastructure & Standards working groups has developed Checking Your Digital Content: How, What and When to Check Fixity? to help stewards answer these questions in a way that makes sense for their organization, based on its needs and resources. We are excited to share this draft document publicly for broader open discussion and review here on The Signal. We welcome comments and questions; please post them at the bottom of this post for the working group to review.

Not Best Practices, but Guidance for Making Best Use of Resources at Hand

In keeping with work on the NDSA Levels of Digital Preservation, this document is not a benchmark or requirement. It is instead intended as a tool to help organizations develop a plan that fits resource constraints. Different systems and different collections are going to require different fixity checking approaches, and our hope is that this document can help.

Connection to National Agenda for Digital Stewardship

This guidance was developed in direct response to the need articulated in the infrastructure section of the inaugural National Agenda for Digital Stewardship. I’ll quote it below at length for context.

Fixity checking is of particular concern in ensuring content integrity. Abstract requirements for fixity checking can be useful as principles, but when applied universally can actually be detrimental to some digital preservation system architectures. The digital preservation community needs to establish best practices for fixity strategies for different system configurations. For example, if an organization were keeping multiple copies of material on magnetic tape and wanted to check fixity of content on a monthly basis, they might end up continuously reading their tape and thereby very rapidly push their tape systems to the limit of reads for the lifetime of the medium.

There is a clear need for use-case driven examples of best practices for fixity in particular system designs and configurations established to meet particular preservation requirements. This would likely include description of fixity strategies for all spinning disk systems, largely tape-based systems, as well as hierarchical storage management systems. A chart documenting the benefits of fixity checks for certain kinds of digital preservation activities would bring clarity and offer guidance to the entire community. A document modeled after the NDSA Levels of Digital Preservation would be a particularly useful way to provide guidance and information about fixity checks based on storage systems in use, as well as other preservation choices.

Again, please share your comments here, and consider forwarding this post to others who you think might have comments to share with us.

4 Comments

  1. Nicholas Taylor
    February 8, 2014 at 5:31 pm

    Hi Trevor, thanks for sharing this great resource. Regarding assurances from third-party storage providers, would it be worth mentioning cryptographic nonces?

  2. Jonathan McCabe
    February 9, 2014 at 5:25 pm

    If we are really worried about bit level corruption, why don’t we use error correcting codes rather than just error detecting hashes or checksums? A robust code would allow recovery from a high level of damage, using fixity checks means that a single bit-flip invalidates the file.

  3. Alexander Duryee
    February 10, 2014 at 12:43 pm

    When selecting an algorithm, cryptographic security is a non-trivial criterion. While there have been no known attacks on archival data, it’s not beyond possibility that a bad actor could alter data and have it pass a weak (SHA1 and below) fixity check. As such, it’s difficult to trust weaker checksum algorithms – how do I know that the objects weren’t deliberately changed so that the MD5 hash was unchanged?

    Intentional damage is one of the threat vectors that Rosenthal et al list as a risk for digital preservation (http://www.dlib.org/dlib/november05/rosenthal/11rosenthal.html). It is certainly something to be mindful of when selecting an algorithm to use for verifying assets.

  4. Andy Jackson
    February 11, 2014 at 8:29 am

    Cryptographically strong checksums won’t save you from malicious behaviour on their own. If an attacker can change your content, they can also change your checksums to match! To counter intentional damage you need to do more, like publish and/or chain and/or cryptographically sign your checksums. I believe LOCKSS does this kind of thing.

    To flip that around: if you’re only really worried about accidental damage, MD5/SHA1 is totally fine.

    @Jonathan using checksums to repair mirrored copies of a bitstream is effectively an error correction code system (albeit a rather big and clumsy one). Having said that, it would be nice to see error correction codes taken a bit further, e.g. using parity file a la Parchive (https://en.wikipedia.org/wiki/Parchive).
