The following is a guest post from Meg Phillips, Electronic Records Lifecycle Coordinator for the National Archives and Records Administration.
“What’s the bare minimum I can responsibly do with my electronic stuff?” was one of the central questions on the table at CURATEcamp Processing. The unconference, focused on Processing Data / Processing Collections, was a great way for a group of thought leaders and practitioners to surface issues keeping them up at night, compare notes, and start charting a path forward. The theme for this CURATEcamp framed a series of discussions on how archivists and librarians think about processing digital collections compared to the ways programmers, software developers, and engineers think about processing data. We worked on a lot of different issues, but I found one particularly interesting: what do recent discussions in the archival community about minimal processing mean for digital materials?
The CURATEcamp format allows all participants to propose, and then collectively select and organize, the sessions the group is most interested in discussing. One of the sessions that resulted from this process focused on how the archival principles of “More Product, Less Process” (MPLP), described by Mark Greene and Dennis Meissner in “More Product, Less Process: Revamping Traditional Archival Processing,” apply to the processing of digital materials.
The participants in this session wanted to explore whether we could reach a professional consensus around what must be done to all digital files. This question would also reveal what the community considers intensive processing that might be applied to only some collections or files. We wanted to benefit from MPLP’s rational approach to allocating resources. If we could figure out how to apply these concepts, the maximum number of collections would be usable by the greatest number of people in the electronic realm as well as the physical.
There are clearly some differences in managing paper and electronic objects, but there are similarities, too. One important difference is that some processing steps for digital objects can be automated. Even if the actions are performed at the item (or file) level, as is often the case in the electronic world, this kind of data processing is not necessarily a bottleneck that creates backlogs. Similarly, content searching offers ways of locating electronic items that simply aren’t available for physical items. On the other hand, as with physical records, processing steps done by humans, or steps that require a great deal of analysis, can create processing backlogs even if they can be applied to many files at a time.
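To make the automation point concrete, here is a minimal sketch of the kind of item-level content search that is trivial for electronic materials but impossible for paper. The function name and approach are my own illustration, not a tool discussed at the camp; a production workflow would use a proper full-text index rather than a linear scan.

```python
from pathlib import Path

def search_collection(root: Path, term: str) -> list[Path]:
    """Return files in the collection whose text content contains the term.

    Files that cannot be read are skipped rather than halting the scan,
    since a collection may mix text with binary formats.
    """
    hits = []
    for p in sorted(root.rglob("*")):
        if not p.is_file():
            continue
        try:
            text = p.read_text(errors="ignore")
        except OSError:
            continue
        if term.lower() in text.lower():
            hits.append(p)
    return hits
```

Even this naive version runs at the item level across thousands of files without human effort, which is exactly why automated steps need not create the backlogs that manual, item-level work does.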
The participants in the session agreed that the minimum elements may not always be the same for all institutions. Some institutions have legal environments or strong researcher expectations that, for example, the original media will be preserved, or restricted information exempt from Freedom of Information Act requests must not be released to researchers. (There was another session at the CURATEcamp specifically about using automated tools to speed the review of collections for restricted information, an intriguing related topic.)
In spite of the importance of each institution’s particular environment, by the end of the session participants were able to sketch out a preliminary list of minimal processing steps, which follows:
- Establish fixity, for example through hash values, so changes to files can be detected.
- Make a backup copy to reduce the risk of loss.
- Use write-blockers to ensure that files can’t be changed accidentally or intentionally.
- Document the chain of custody and provenance, and provide some archival context for the materials.
- Provide some way of discovering that the materials exist and of finding materials within the collection.
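The first two items on the list lend themselves to a small amount of code. As a hedged sketch (the function names and the simple path-to-hash manifest are illustrative assumptions, not a tool endorsed at the session), establishing fixity with SHA-256 and later re-verifying it might look like this:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Compute the SHA-256 hash of a file, reading in chunks to bound memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(collection_dir: Path) -> dict[str, str]:
    """Record a fixity value for every file in the collection."""
    return {
        str(p.relative_to(collection_dir)): sha256_of(p)
        for p in sorted(collection_dir.rglob("*"))
        if p.is_file()
    }

def verify_manifest(collection_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths of files whose contents changed or vanished."""
    problems = []
    for rel, expected in manifest.items():
        p = collection_dir / rel
        if not p.is_file() or sha256_of(p) != expected:
            problems.append(rel)
    return problems
```

In practice the manifest would be stored alongside the backup copy, for example in a BagIt-style payload manifest, so that either copy can be checked against it over time.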
One interesting thing about this list is what it doesn’t include. The list does not include identifying the format of the files, validating that files are well-formed, or migrating files to more researcher-friendly formats. The first two didn’t even come up. Someone did suggest that providing files to researchers in formats they could use might be essential, but others believed that format migration and emulation were so complicated that they should be considered intensive processing, not minimal processing. However, as MPLP reminds us, minimal processing may not be sufficient for all collections.
The hour available to generate this list at the CURATEcamp went by in a flash, and I was impressed that we were able to generate even a first draft like this. However, we didn’t have time to systematically poke holes in the ideas reflected here or debate a lot of other options. I would be thrilled to hear from blog readers about their thoughts and reactions.
What do you think?
Would it be possible, or even desirable, to have a community definition of minimally acceptable processing for born digital archival content?
What are your opinions about the items we came up with at this session?
What would you add or subtract from the list, and why?