Check Yourself: How and When to Check Fixity

How do I know if a digital file or object has been corrupted, changed, or altered? Further, how can I prove that I know what I have? How can I be confident that the content I am providing is in good condition and complete, or reasonably complete? How do I verify that a file or object has not changed over time or during transfer processes?

Please consider reading and commenting on this draft document.

In digital preservation, a key part of answering these questions comes through establishing and checking the “fixity” or stability of digital content. At this point, many in the preservation community know they should be checking the fixity of their content, but how, when and how often?
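In practice, establishing fixity usually means recording a checksum for each file at ingest and recomputing it later to confirm nothing has changed. A minimal sketch in Python of that two-step process (the function names here are illustrative, not taken from the NDSA document):

```python
import hashlib

def compute_fixity(path, algorithm="sha256", chunk_size=1 << 20):
    """Compute a checksum for a file, reading in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def check_fixity(path, recorded_checksum, algorithm="sha256"):
    """Recompute the file's checksum and compare it to the value recorded
    at ingest; a mismatch signals corruption or alteration."""
    return compute_fixity(path, algorithm) == recorded_checksum
```

The "when and how often" questions the document addresses are then about scheduling calls like `check_fixity` against your storage without overtaxing it.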

A team of individuals from the NDSA Infrastructure & Standards working groups has developed Checking Your Digital Content: How, What and When to Check Fixity? to help stewards answer these questions in a way that makes sense for their organizations, based on their needs and resources. We are excited to share this draft publicly for broader open discussion and review here on The Signal. We welcome comments and questions; please post them at the bottom of this post for the working group to review.

Not Best Practices, but Guidance for Making Best Use of Resources at Hand

In keeping with work on the NDSA Levels of Digital Preservation, this document is not a benchmark or requirement. It is instead intended as a tool to help organizations develop a plan that fits resource constraints. Different systems and different collections are going to require different fixity checking approaches, and our hope is that this document can help.

Connection to National Agenda for Digital Stewardship

This guidance was developed as a direct response to the need articulated in the infrastructure section of the inaugural National Agenda for Digital Stewardship. I'll quote it below at length for context.

Fixity checking is of particular concern in ensuring content integrity. Abstract requirements for fixity checking can be useful as principles, but when applied universally can actually be detrimental to some digital preservation system architectures. The digital preservation community needs to establish best practices for fixity strategies for different system configurations. For example, if an organization were keeping multiple copies of material on magnetic tape and wanted to check fixity of content on a monthly basis, they might end up continuously reading their tape and thereby very rapidly push their tape systems to the read limit for the lifetime of the medium.

There is a clear need for use-case-driven examples of best practices for fixity in particular system designs and configurations established to meet particular preservation requirements. This would likely include description of fixity strategies for all-spinning-disk systems, largely tape-based systems, as well as hierarchical storage management systems. A chart documenting the benefits of fixity checks for certain kinds of digital preservation activities would bring clarity and offer guidance to the entire community. A document modeled after the NDSA Levels of Digital Preservation would be a particularly useful way to provide guidance and information about fixity checks based on storage systems in use, as well as other preservation choices.

Again, please share your comments on this here, and consider forwarding this on to others who you think might have comments to share with us.

Comments (4)

  1. Hi Trevor, thanks for sharing this great resource. Regarding assurances from third-party storage providers, would it be worth mentioning cryptographic nonces?

  2. If we are really worried about bit-level corruption, why don’t we use error-correcting codes rather than just error-detecting hashes or checksums? A robust code would allow recovery from a high level of damage, whereas with fixity checks a single bit-flip invalidates the file.

  3. When selecting an algorithm, cryptographic security is a non-trivial criterion. While there have been no known attacks on archival data, it’s not beyond possibility that a bad actor could alter data and have it pass a weak (SHA-1 and below) fixity check. As such, it’s difficult to trust weaker checksum algorithms – how do I know that the objects weren’t deliberately changed in a way that left the MD5 hash unchanged?

    Intentional damage is one of the threat vectors that Rosenthal et al list as a risk for digital preservation (http://www.dlib.org/dlib/november05/rosenthal/11rosenthal.html). It is certainly something to be mindful of when selecting an algorithm to use for verifying assets.

  4. Cryptographically strong checksums won’t save you from malicious behaviour on their own. If an attacker can change your content, they can also change your checksums to match! To counter intentional damage you need to do more, like publish and/or chain and/or cryptographically sign your checksums. I believe LOCKSS does this kind of thing.

    To flip that around: if you’re only really worried about accidental damage, MD5/SHA1 is totally fine.

    @Jonathan using checksums to repair mirrored copies of a bitstream is effectively an error correction code system (albeit a rather big and clumsy one). Having said that, it would be nice to see error correction codes taken a bit further, e.g. using parity files à la Parchive (https://en.wikipedia.org/wiki/Parchive).
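The chaining idea raised in the comments above can be sketched briefly. In this hypothetical manifest format (not the LOCKSS mechanism itself), each link hashes a file's record together with the previous link, so altering any earlier record changes every later link, and publishing only the final link commits you to the whole manifest:

```python
import hashlib

def chain_manifest(entries):
    """Build a tamper-evident chain over (filename, checksum) pairs.

    Each link is the SHA-256 of the previous link plus the current record,
    so an attacker who silently edits one record must also rewrite every
    subsequent link -- and cannot fix a chain head published elsewhere.
    """
    chain = []
    prev = ""  # empty seed for the first link
    for name, checksum in entries:
        link = hashlib.sha256(f"{prev}|{name}|{checksum}".encode()).hexdigest()
        chain.append((name, checksum, link))
        prev = link
    return chain
```

Signing the final link (or the whole manifest) with a key the attacker doesn't hold would close the remaining gap the commenter points out.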
