A Day Camp for Digital Preservation

On July 26, 2012, the Library of Congress hosted CURATEcamp Processing: Processing Data/Processing Collections.

The idea to hold a CurateCamp had been percolating for some time, but the event really came about through a fortuitously timed conversation with our colleague Meg Phillips at the National Archives and Records Administration, and the interest of our colleague Mark Matienzo, a digital archivist at Yale University Library.  Throw in a lot of enthusiasm from my Library of Congress colleagues Trevor Owens and Jefferson Bailey, and CurateCamp Processing 2012 was on!

Suggesting topics and setting the schedule at CurateCamp Processing. Photo by Leslie Johnston


This was an “unconference,” a meeting where a theme is announced beforehand but the sessions and schedule are set collaboratively by the participants at the meeting.  The Library hosted a small unconference before, one of the series of CRIG RepoCamps in 2008, but this is the first unconference that the Library has organized.

And by organized, I mean threw open the doors.

Of course, it’s not strictly speaking true that this took no organizing. We identified a theme, set up a section on the CurateCamp wiki, suggested topics, promoted the event, and asked that anyone signing up think about topics and write about them in their registration.

Once we all arrived, anyone with a session idea wrote down a title and a short description on a piece of paper and taped it to a schedule grid on the wall. These were then reviewed, combined where appropriate, rearranged and horse-traded until there was agreement on a full final schedule.  More than half the sessions on the schedule have links through to notes from the session (I confess I am behind on getting my notes uploaded).  Lunchtime was dedicated to lightning talks.

Many associate unconferences with software development or geeky topics, and might be afraid to attend.  But CurateCamp?  It was first and foremost a good old-fashioned exploration of issues and ideas, such as:

  • What is appraisal for born-digital collections?
  • What is the minimum processing needed for born-digital collections to make them available to researchers as quickly as possible?
  • What are the most useful/usable formats for those materials for researchers?  Do we give them authenticated ISO disk images? Or migrate formats for usability?  Do we let them work with the original media?
  • If we have one workflow for processing the analog items in a collection and a different one for the born-digital items, how do we pull together the description and discovery of the various components?
  • What are the core significant properties that we need to document to preserve born-digital collections? How do we extract them from files?
  • And how can we identify and extract key “entity” information such as names and events/dates and places?
  • And for places, how do we track changes in place names (and boundaries) over time?  And use that data to provide more map-based interfaces to collections?
  • How can we automate the discovery of Personal Identifying Information in born-digital collections and redact it, if necessary?
  • How do we provide counts for these collections? Storage size? Files? Items?

Breakout session at CurateCamp Processing. Photo by Leslie Johnston


And discuss these issues we did.

In every session I was in, there was a lively discussion that involved archivists and technologists.  People talked about projects they had attempted, both successes and failures. People discussed the relative merits of different tools they had worked with.  Technologists heard about the requirements and concerns of archivists about processing and making born-digital collections available.  And archivists heard what might or might not be technically feasible. Check out the linked notes from the sessions on the wiki to read more.

No code was written (that I know of), but there was a lot of exchange of ideas in a group that included archivists, museum curators and registrars, librarians, and technologists.  I heard a lot of “Have you tried…?” and “Do you know about…?” and “How about if we try this?” and “I didn’t know about this before…” and “Thanks for sharing your work.” And of course, “It was so great to meet you.”

And there were lightning talks.

  • Trevor Owens reporting on Tim Sherratt’s project “The Real Face of White Australia”
  • Monique Politowski on the NARA/Ancestry.com records digitization partnership
  • Cal Lee on the BitCurator project
  • Brett Abrams discussing issues in accessioning databases
  • Jane Zhang on the issues of providing archival context for digital content
  • Jeanne Kramer-Smyth on rescuing content from de-commissioned systems
  • Michael Levy on the Blacklight implementation at the U.S. Holocaust Memorial Museum
  • Mark Matienzo on integrating tools for analyzing legacy media/filesystems
  • Mitch Brodsky on the New York Philharmonic Archive discovery interface

And that’s what CurateCamps are about.  Not necessarily all about the coding. It’s about participation and conversation.  And collaboration. And developing a community.

I have to share the great writeup that Jeanne Kramer-Smyth has on her Spellbound blog.  Thanks for your participation that day!
