
Tool Time, or a Discussion on Picking the Right Digital Preservation Tools for Your Program: An NDSR Project Update


The following is a guest post by John Caldwell, National Digital Stewardship Resident at the United States Senate Historical Office.

Who remembers Home Improvement? Tim the “Tool Man” Taylor was always trying to show the “Tool Time” audience how to build things, make repairs and, of course, demo new tools made by the show’s sponsor, Binford. In true sitcom fashion, he broke more things than he fixed, thanks to his “more power” mantra. But I’m reminded of the episode “Be True To Your Tool,” where Tim tested a new saw, took it apart to analyze its construction and refused to endorse it on the show because it wasn’t good enough.

I see this as an object lesson for the digital preservation community. There are lots of tools out there, from checksum validators to digital forensics suites and wholesale preservation solutions. Many people feel it’s important to have the latest and greatest (as someone who regularly upgrades his cell phone, I can sympathize), but in this impulse for the new and the now, we sometimes forget to ask the big question: For your institution, is tool X good enough? Or, to put it another way, is this the right tool for you?

I’m trying to answer that question right now in the U.S. Senate Historical Office.

Commercial Break: My NDSR Project

44 USC §2118 and Senate Standing Rules XI and XXVI require the Secretary of the Senate to transfer non-current Senate committee records to NARA’s Center for Legislative Archives for long-term storage and preservation. The Senate Historical Office, specifically the Senate Archivist and her team, takes on the task of working with Senate offices and their archivists to collect, describe and prepare records for transfer to NARA, which assumes responsibility for long-term preservation and storage.

Since the beginning of the 111th Congress in 2009, Senate archivists have transferred nearly 12 TB of Senate Committee digital records to NARA, focusing primarily on gathering digital records and describing their informational content. In 2015, now that the archivists are more experienced in managing electronic records, it’s time to better align the Senate with the digital preservation best practices that have developed over the last six years.

And this is where I come in. My project (PDF) is to help the Senate archivists by:

  1. studying current Senate workflows;
  2. benchmarking current policies against best practices;
  3. reviewing and testing potential digital curation applications;
  4. proposing sustainable workflows that align with current digital curation standards; and
  5. producing a white paper to sum up current processes and propose next steps.

So, where are we now in this project? I’m starting step number 3, “reviewing and testing potential digital curation applications.” In other words… it’s tool time!

Back to our Tool Talk

In order to determine the right tool, or tool box, for the Senate archivists, we’re following a modified version of Regine Heberlein’s “Gospel of Metadata,” presented at the Introduction to Metadata Power Tools for the Curious Beginner pop-up session during the 2015 Society of American Archivists Annual Meeting in Cleveland. The session presented case studies from archivists with limited IT experience, sharing the tools and techniques they’ve found successful in processing existing collections and “cleaning up” digital object metadata. Though the tools demoed during the session were designed to assist archivists managing existing data, Heberlein’s process can also inform the selection of tools used to generate new descriptive and preservation metadata.

The first step in answering that question is to know your records. Here, that meant learning more about how electronic records are being managed in the Senate committees, how committee archivists are processing electronic records, and what NARA does with the materials when they receive the transferred records.

We decided that surveying the archivists, IT staff, and committee clerks would be the best way to ask about electronic records management and electronic records archiving in various office environments. I spent the first few months meeting with committee staff and learning about their particular processes. I also met with the staff of the Center for Legislative Archives to find out what happens when our bytes leave the Hill.

Once we learned about our records, we addressed the second step in Heberlein’s gospel: deciding what we want the end result to be. For the last six years, there have been conscious efforts to increase the variety and quality of metadata that accompanies records transferred from the Senate to NARA. To date, the focus has been on better content and contextual description, and these efforts have reaped real benefits, especially for committees needing to recall records. The Center’s staff doesn’t have the time to augment records description, so it falls to the Senate archivists to generate the descriptive metadata that accompanies the records. There is an automatic 20-year closure on all Senate committee records (and 50 years for investigations, nominations, and records with PII), so it will be at least 2035 before today’s work product may become accessible to researchers; the more that can be done up front, the more discoverable the content will be later.

One of the many things we’re trying to do now is add preservation metadata to the records, establishing their integrity as early in the lifecycle as possible. A lot can happen to a piece of paper in 20 years’ time, but for a digital file, 20 years of benign neglect is tantamount to destruction through technological obsolescence. Two aspects of integrity that have been identified are file format identification and fixity in the form of cryptographic hashes. Other tasks we’re hoping automated tools will improve include identifying PII, getting more accurate volume information on transfers, de-duplication, and anything to help conquer “the email problem.”
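As a concrete illustration of the fixity piece, here is a minimal sketch in Python (with hypothetical paths, not our actual directories) of the kind of manifest a checksum tool produces: walk a directory of records and capture each file’s path, size, and SHA-256 hash. Recomputing the hashes later and comparing them against the manifest is what lets you demonstrate that nothing changed in transit or storage.

```python
import csv
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large records don't exhaust memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def write_manifest(records_dir: str, manifest_csv: str) -> None:
    """Record path, size, and SHA-256 for every file under records_dir."""
    with open(manifest_csv, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["path", "bytes", "sha256"])
        for path in sorted(Path(records_dir).rglob("*")):
            if path.is_file():
                writer.writerow([str(path), path.stat().st_size, sha256_of(path)])

# Hypothetical example, not an actual Senate transfer:
# write_manifest("committee_records/113th", "fixity_manifest.csv")
```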

Once you know what you want, you need to find the tool for the task. This brings our conversation full circle: do we get the tool we need, or do we find something for the tool to do?

A lot goes into trying to find that perfect fit:

  • Placement: Where does the tool fit into your process?
  • Purpose: What does the tool actually do? Is it replacing a process (making it more efficient) or are we using it for a new process?
  • Utility: How easy is the tool to use and does its output make sense?
  • Viability: Is the tool a long-term solution or a quick fix for today?

These are the questions I’m in the middle of answering right now. Here’s what I have so far:

Placement: Where does the tool fit into your process?

Since there are nine archivists working in eight different offices, there is no single process. I created a workflow diagram documenting how each of the archivists processes electronic records. With these workflows, we can identify specific processes that can be automated, where and when to incorporate tools, and how to make their integration as seamless as possible.

Electronic Records Processing Workflow. Credit: John Caldwell

Purpose: What does the tool actually do?

This is where research skills come into play: identifying all the possible tools you think will get the job done, learning what they do best, hearing about other professionals’ experiences, reading the manuals to see if they’re actually usable, and winnowing that list down to three or four that seem, on the surface, to be a good fit. (This is very reminiscent of my college search, actually.) If you want to test multiple tools that perform the same primary function, an important question to consider is: what else do they do? For example, NARA File Analyzer and DROID are both principally designed to examine digital files and identify the file type, but they can also generate checksums at the same time; Karen’s Directory Printer, on the other hand, only generates checksums. The workflow analysis matters here too: knowing which steps in the process each tool affects will help you decide whether, and how best, to test it.
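To see why that “what else does it do” question matters, here is a toy Python sketch of the dual-function approach: a single pass over each file that sniffs a few leading magic numbers for a format guess while feeding the same bytes into a hash. The signature table is a deliberately tiny stand-in; real tools like DROID match signatures against the full PRONOM registry.

```python
import hashlib
from pathlib import Path

# Toy signature table for illustration only; DROID matches against
# the full PRONOM registry, not a handful of magic numbers.
SIGNATURES = {
    b"%PDF": "PDF document",
    b"PK\x03\x04": "ZIP container (incl. .docx/.xlsx)",
    b"\x89PNG": "PNG image",
    b"\xd0\xcf\x11\xe0": "OLE2 container (legacy Office formats)",
}

def identify_and_hash(path: Path) -> tuple[str, str]:
    """One read of the file yields both a format guess and a checksum."""
    digest = hashlib.sha256()
    fmt = "unknown"
    with path.open("rb") as f:
        head = f.read(8)
        digest.update(head)
        for magic, name in SIGNATURES.items():
            if head.startswith(magic):
                fmt = name
                break
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return fmt, digest.hexdigest()
```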

Utility: How easy is the tool to use and does its output make sense?

If the tool is too complicated to use or the output too complex, then the tool is unusable. A tool that is easy to use but tries to do too many things, or is too specialized, may be just as impractical. Using our earlier example of fixity and format identification, is it better to use a tool only for its intended purpose (e.g., DROID only for format identification) and add multiple tools to the process, each doing a specific task (such as Karen’s Directory Printer for checksums)? Or do you try to take advantage of a tool’s multiple functions (NARA File Analyzer’s dual fixity/format identification features), even if that makes the resulting data more complicated to use long term? Do you install a full suite of programs (e.g., BitCurator) because one or two of its individual tools look promising (Bulk Extractor, to identify PII in large data sets)? Or do you try to isolate just the specific tools you think you want? Even if there isn’t a strong argument now for installing and learning the full BitCurator environment, might there be a situation in a year or two where its full functionality will be useful?

Screenshots of various tools for the digital preservation toolbox. Credit: John Caldwell
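If you go the other way and stack single-purpose tools, their separate reports eventually have to be stitched back together. Below is a hedged sketch of that stitching step, assuming both tools can export CSV keyed by file path; the column names are my own assumptions for illustration, not any real tool’s actual headers. Files that appear in one report but not the other surface as blank cells, which is itself a useful check on the tools.

```python
import csv

def merge_reports(format_csv: str, checksum_csv: str, merged_csv: str) -> None:
    """Join a format-identification report and a checksum report on file path."""
    # Load the checksum report into a lookup table keyed by path.
    checksums = {}
    with open(checksum_csv, newline="") as f:
        for row in csv.DictReader(f):
            checksums[row["path"]] = row["sha256"]

    # Walk the format report and emit one merged row per file.
    with open(format_csv, newline="") as f, open(merged_csv, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["path", "format", "sha256"])
        for row in csv.DictReader(f):
            writer.writerow([row["path"], row["format"], checksums.get(row["path"], "")])
```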

These are just some of the issues that we are confronting as we enter the testing phase of tool selection. The seemingly straightforward question of utility is fundamentally tied to the question of purpose, and also the viability question: is the tool a long-term solution or a quick fix for today?

As the testing phase gets underway, we’re developing a procedure that can be replicated with every potential tool for each specific purpose, identifying the essential criteria, and working out the logistics of implementation in a production environment. It will be some time before we can make final selections, but the process itself is invaluable, both for the digital archivists and for the institution as a whole.
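One way to keep that procedure replicable is to score every candidate tool against the same weighted criteria, the four questions above. The sketch below is purely illustrative: the weights and scores are invented placeholders, not our actual evaluations of any tool.

```python
# Illustrative weights for the four criteria discussed above; the real
# rubric would be set by the Senate archivists, not hard-coded here.
CRITERIA = {"placement": 0.2, "purpose": 0.3, "utility": 0.3, "viability": 0.2}

def weighted_score(scores: dict[str, int]) -> float:
    """Combine per-criterion scores (0-5 scale) into one weighted number."""
    return sum(CRITERIA[c] * scores.get(c, 0) for c in CRITERIA)

# Placeholder scores, NOT real assessments of real tools.
candidates = {
    "Tool A": {"placement": 4, "purpose": 5, "utility": 3, "viability": 4},
    "Tool B": {"placement": 4, "purpose": 4, "utility": 4, "viability": 3},
}

for name, scores in sorted(candidates.items(), key=lambda kv: -weighted_score(kv[1])):
    print(f"{name}: {weighted_score(scores):.2f}")
```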

Diving into this process has given me a new appreciation for the Tool Man. Maybe if he had taken his time in every episode instead of rushing ahead, “More Power” might have led to better results. But then, that would make for boring TV. Fortunately, my project is anything but boring!

Comments

  1. I like the reference to Tool Time as a metaphor for this issue. If you are going to build a house, do you start with, “What tool do I use?” Do you start by asking what tools other people used to build their houses? Or do you start with an architectural design that details how you want the house laid out, what the different areas of the house are for functionally, and how you want those to work together?

    Just like in the home building scenario, the tool you use to build a digital preservation system is the least important part. What matters is that the digital preservation system does what you want and expect it to do. If you have failed to articulate those requirements and use cases, how can you expect a tool to solve your needs? You don’t even know what they are!

    Digital preservation systems are precisely that: systems. Systems are a complex set of elements (people, technologies) and the connections between them (policies, procedures). Without all of these pieces, there really isn’t a system. There is just a tool.

    A hammer isn’t a house, just as a tool isn’t a digital preservation system.
