Tool Time, or a Discussion on Picking the Right Digital Preservation Tools for Your Program: An NDSR Project Update

The following is a guest post by John Caldwell, National Digital Stewardship Resident at the United States Senate Historical Office.

Who remembers Home Improvement? Tim the “Tool Man” Taylor was always trying to show the “Tool Time” audience how to build things, make repairs and, of course, demo new tools made by the show’s sponsor, Binford. In true sitcom fashion, he broke more things than he fixed, thanks to his “more power” mantra. But I’m reminded of the episode “Be True To Your Tool,” in which Tim tested a new saw, took it apart to analyze its construction and refused to endorse it on the show because it wasn’t good enough.

I see this as an object lesson for the digital preservation community. There are lots of tools out there, from checksum validators to digital forensics suites and wholesale preservation solutions. Many people feel it’s important to have the latest and greatest (as someone who regularly upgrades his cell phone, I can sympathize), but in this impulse for the new and the now, we sometimes forget to ask the big question: For your institution, is tool X good enough? Or, to put it another way, is this the right tool for you?

I’m trying to answer that question right now in the U.S. Senate Historical Office.

Commercial Break: My NDSR Project

44 USC §2118 and Senate Standing Rules XI and XXVI require the Secretary of the Senate to transfer non-current Senate committee records to NARA’s Center for Legislative Archives for long-term storage and preservation. The Senate Historical Office, specifically the Senate Archivist and her team, takes on the task of working with Senate offices and their archivists to collect, describe and prepare records for transfer to NARA, who takes on the responsibility of long-term preservation and storage.

Since the beginning of the 111th Congress in 2009, Senate archivists have transferred nearly 12 TB of Senate Committee digital records to NARA, focusing primarily on gathering digital records and describing their informational content. In 2015, now that the archivists are more experienced in managing electronic records, it’s time to better align the Senate with the digital preservation best practices that have developed over the last six years.

And this is where I come in. My project (PDF) is to help the Senate archivists by:

  1. studying current Senate workflows;
  2. benchmarking current policies against best practices;
  3. reviewing and testing potential digital curation applications;
  4. proposing sustainable workflows that align with current digital curation standards; and
  5. producing a white paper to sum up current processes and propose next steps.

So, where are we now in this project? I’m starting step number 3, “reviewing and testing potential digital curation applications.” In other words… it’s tool time!

Back to our Tool Talk

In order to determine what the right tool, or tool box, is for the Senate archivists, we’re following a modified version of Regine Heberlein’s “Gospel of Metadata” presented at the Introduction to Metadata Power Tools for the Curious Beginner pop-up session during the 2015 Society of American Archivists Annual Meeting in Cleveland. The session was designed to present case studies of archivists with limited IT experience sharing the tools and techniques they’ve found successful in processing existing collections and “cleaning up” digital object metadata. Though the tools demoed during the session were designed to assist archivists managing existing data, Heberlein’s process can be applied to help inform the selection of tools used to generate unique descriptive and preservation metadata.

The first step in answering that question is to know your records. Here, that meant learning more about how electronic records are being managed in the Senate committees, how committee archivists are processing electronic records, and what NARA does with the materials when they receive the transferred records.

We decided that surveying the archivists, IT staff, and committee clerks would be the best way to ask about electronic records management and electronic records archiving in various office environments. I spent the first few months meeting with committee staff and learning about their particular processes. I also met with the staff of the Center for Legislative Archives to find out what happens when our bytes leave the Hill.

Once we learned about our records, we addressed the second step in Heberlein’s gospel: deciding what we want the end result to be. For the last six years, there have been conscious efforts made to increase the variety and quality of metadata that accompanies records transferred from the Senate to NARA. To date, the focus has been on better content and contextual description, and these efforts have reaped much benefit, especially for committees needing to recall records. The Center’s staff doesn’t have the time to augment records description, so it falls to the Senate archivists to generate the descriptive metadata that accompanies the records. There is an automatic 20-year closure on all Senate committee records (and 50 years for investigations, nominations, and records with PII), so it will be at least 2035 before today’s work product may become accessible to researchers; the more that can be done up front, the more discoverable content will be later.

One of the many things we’re trying to do now is add preservation metadata to the records, establishing their integrity as early in the lifecycle as possible. A lot can happen to a piece of paper in 20 years’ time, but for a digital file, 20 years of benign neglect is tantamount to destruction due to technological obsolescence. Two aspects of integrity that have been identified are file format identification and fixity in the form of cryptographic hashes. Other tasks we’re hoping automated tools will improve include identifying PII, getting more accurate volume information on transfers, de-duplication, and anything to help conquer “the email problem.”
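To make the fixity idea concrete, here is a minimal Python sketch (illustrative only, not any of the tools discussed in this post): it walks a directory, computes a SHA-256 checksum for each file, and, as a side effect of grouping files by digest, flags byte-identical duplicates. The function names are my own.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def sha256_fixity(path):
    """Compute a SHA-256 checksum, reading in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def fixity_manifest(root):
    """Map each digest to the files that produced it; any digest with more
    than one path is a set of byte-identical duplicates."""
    by_digest = defaultdict(list)
    for p in sorted(Path(root).rglob("*")):
        if p.is_file():
            by_digest[sha256_fixity(p)].append(str(p))
    return by_digest
```

Recording these digests at the point of transfer is what lets an archivist, years later, verify that a file is still bit-for-bit the one that left the Hill.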

Once you know what you want, you need to find the tool for the task. This brings our conversation full circle: do we get the tool we need, or do we find something for the tool to do?

A lot goes into trying to find that perfect fit:

  • Placement: Where does the tool fit into your process?
  • Purpose: What does the tool actually do? Is it replacing a process (making it more efficient) or are we using if for a new process?
  • Utility: How easy is the tool to use and does its output make sense?
  • Viability: Is the tool a long-term solution or a quick fix for today?

These are the questions I’m in the middle of answering right now. Here’s what I have so far:

Placement: Where does the tool fit into your process?

Since there are nine archivists working in eight different offices, there is no single process. I created a workflow for all of the archivists to document how they process electronic records. With the workflows, we can identify specific processes that can be automated, where and when to incorporate tools, and how to make their integration as seamless as possible.

Electronic Records Processing Workflow. Credit: John Caldwell

Purpose: What does the tool actually do?

This is where research skills come into play: identifying all the possible tools you think will get the job done, learning what they do best, hearing about other professionals’ experiences, reading the manuals to see if they’re actually usable, and winnowing down that list to three or four that seem, on the surface, to be a good fit. (This is very reminiscent of my college search, actually.) If you want to test multiple tools that perform the same primary function, an important question to consider is: what else do they do? For example, NARA File Analyzer and DROID are both principally designed to examine digital files and identify the file type, but they also have the ability to generate checksums at the same time; on the other hand, Karen’s Directory Printer only generates checksums. The workflow analysis is also important. Knowing which steps in the process each tool affects will help you decide whether, and how best, to test a tool.
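The appeal of a dual-function tool is that format identification and checksumming can share a single read of each file. The Python sketch below illustrates that one-pass pattern; the tiny magic-byte table is a toy stand-in (real tools like DROID match files against the much larger PRONOM signature registry), and the function name is my own.

```python
import hashlib

# Toy signature table: a few well-known magic bytes. A real identifier
# would consult the PRONOM registry, as DROID does.
MAGIC = {
    b"%PDF": "PDF document",
    b"PK\x03\x04": "ZIP-based container (e.g. DOCX/XLSX)",
    b"\x89PNG": "PNG image",
}

def identify_and_checksum(path):
    """Read the file once, returning (format guess, SHA-256 digest)."""
    h = hashlib.sha256()
    fmt = "unknown"
    with open(path, "rb") as f:
        head = f.read(8)
        for signature, name in MAGIC.items():
            if head.startswith(signature):
                fmt = name
                break
        h.update(head)  # the header bytes count toward the checksum too
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return fmt, h.hexdigest()
```

Whether that efficiency outweighs the simplicity of one tool per task is exactly the utility question taken up below.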

Utility: How easy is the tool to use and does its output make sense?

If the tool is too complicated to use or the output too complex, then the tool is unusable. A tool that is easy to use but tries to do too many things or is too specialized may be just as impractical. Using our earlier example of fixity and format identification, is it better to use a tool only for its intended purpose (e.g. DROID only for format identification) and add multiple tools to the process, each doing a specific task (such as Karen’s Directory Printer for checksums)? Or do you try to maximize the multiple functionality of a tool (NARA File Analyzer’s dual fixity/format identification features), even if it makes the resulting data more complicated to use long term? Do you install a full suite of programs (e.g. BitCurator) because there are one or two individual tools that look promising (Bulk Extractor to identify PII in large data sets)? Or do you try to isolate the specific tools you think you want? Even if there isn’t a strong argument now for installing and learning how to use the full BitCurator environment, might there be a situation in a year or two where its full functionality will be useful?

Screenshots of various tools for the digital preservation toolbox. Credit: John Caldwell

These are just some of the issues that we are confronting as we enter the testing phase of tool selection. The seemingly straightforward question of utility is fundamentally tied to the question of purpose, and also the viability question: is the tool a long-term solution or a quick fix for today?

As the testing phase gets underway, we’re developing a procedure that can be replicated with every potential tool for each specific purpose, identifying the essential criteria, and figuring out the logistics of implementation in a production environment. It will be some time before we can hope to make final selections, but we’re following a necessary sequence of steps that will benefit the digital archivists and the institution as a whole.

Diving into this process has given me a new appreciation for the Tool Man. Maybe if he had taken his time in every episode, instead of just rushing ahead, “More Power” might have led to better results. But then, that can make for boring TV. Fortunately, my project is anything but boring!
