
CurateCamp Processing: Processing Data/Processing Collections


Alongside this year’s NDSA/NDIIPP conference, DigitalPreservation 2012, we are excited to try out another kind of meeting: an unconference. In conjunction with DigitalPreservation 2012, we are going to play host to a CurateCamp. For those unfamiliar with unconferences, the key idea is that the participants define the agenda and that there are no spectators. Everyone who comes should plan on actively participating in and helping to lead discussions, and should come ready to work.

“Data Processing Machine” from Cushing Memorial Library and Archives, Flickr

We are focusing this camp on the idea of processing, bringing together the computational sense of the word with the archival sense of it. We are particularly excited about bringing together archivists and curators with software developers and engineers to do some creative thinking and tinkering. You can read up on the topic below. We will be opening up registration for the camp and posting information about where exactly in the DC metro area we will be hosting the event, but we wanted to make sure those interested could put it on their calendars now. The camp will be held on the last day of DigitalPreservation 2012, July 26th. It is being facilitated by me and Leslie Johnston from the Library of Congress; Meg Phillips, Electronic Records Lifecycle Coordinator at the National Archives and Records Administration; and Mark Matienzo, Digital Archivist at Yale University.

If you are interested in participating, or just have ideas for things you would love to see campers engage with, take a minute to comment on this post with an idea for a session. Consider posing some questions you would like the group to tackle in the sessions.

Processing Data/Processing Collections

Processing means different things to an archivist and a software developer. To the former, processing is about taking custody of collections, preserving context, and providing arrangement, description, and accessibility. Processing, in its analog archival sense, also includes a lot of preservation (stabilization, preliminary conservation assessment, and the dreaded “re-housing”). To the latter, processing is about computer processing and has to do with how one automates a range of tasks through computation. When a cultural heritage organization’s work is organized around processing digital objects, these two notions of processing intermingle. This CurateCamp unconference is intended to put these two notions of processing together in whatever ways can be imagined by the curators, archivists, librarians, scholars, software developers, computer engineers, and others who attend.

“This is what Archivist’s Do” from dolescum on Flickr.

Potential topics and considerations could include:

  • Automated inventorying and file characterization
  • Computational determination of hierarchical arrangement
  • Format validation & migrations
  • Automated metadata extraction
  • Potential roles for entity extraction in subject cataloging
  • Dynamically generated description
  • Malware scanning
  • Pattern & fuzzy searching for PII (SSNs, etc.)
  • Automated access restrictions
  • Generating visualizations and using them as access tools
  • Human computation’s potential role in cultural heritage collections
  • Machine learning and digital collections
  • Using name authority linked data
  • Processes for geo-referencing
  • Potential uses of facial recognition tools for identifying individuals in collection images
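To give a flavor of the tinkering we have in mind, here is a minimal Python sketch of two of the topics above: automated inventorying with basic file characterization, and pattern searching for SSN-like strings. It is only an illustration under stated assumptions. The function names `inventory` and `find_ssn_like` are hypothetical, the type guess relies on file extensions only (a real workflow would more likely use a format identification tool), and the pattern catches just the hyphenated 123-45-6789 form, so any hits are candidates for human review rather than confirmed PII.

```python
import hashlib
import mimetypes
import re
from pathlib import Path

# Assumption: SSN-like strings appear in the hyphenated form 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def inventory(root):
    """Return one record per file under root: relative path, size, SHA-256, guessed type."""
    records = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # Reads each file fully into memory; fine for a sketch, not for huge files.
        data = path.read_bytes()
        # Extension-based guess only; this is not true format identification.
        guessed_type, _ = mimetypes.guess_type(path.name)
        records.append({
            "path": path.relative_to(root).as_posix(),
            "bytes": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
            "type": guessed_type or "unknown",
        })
    return records


def find_ssn_like(text):
    """Return SSN-like strings found in text, as candidates for human review."""
    return SSN_PATTERN.findall(text)
```

Calling `inventory("some/accession/folder")` returns a list of dictionaries that could be written out as CSV, and `find_ssn_like` could be run over any text extracted from those files.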

What do you think we should discuss and tinker with at the camp?


Comments (16)

  1. Sounds like fun! The openness of it offers a lot of potential, so it’s kind of hard brainstorming topics. Perhaps a topic could be methods for indexing unstructured digitally-scanned material (with a little help from OCR). This may fall under automated metadata extraction.

  2. Awesome news, I’m really looking forward to this! I hope we can make time to discuss compression and packaging of archival information packages (AIPs). We’re putting a lot of thought into this right now for the Archivematica project (feel free to take a peek at our draft requirements: https://www.archivematica.org/wiki/AIP_packaging_and_compression).

    Basically, we’re looking at:

    1- Whether compression and packaging should be a processing decision option.
    2- Which tools
    3- Which formats

    Cheers,
    Courtney
    Archivematica Community Manager, Artefactual Systems, Inc.

  3. The Augmented Processing Project (APT) at the School of Information at The University of Texas at Austin speaks to your focus on the intersection of computation and processing. Specifically, we are looking at processing digitized objects using large interactive surfaces. A description of the first prototype will appear shortly in the proceedings of Theory and Practice of Digital Libraries 2012 [TPDL]. http://www.tpdl2012.org. In addition we will be giving a presentation about the project at SAA [session 105 – “A Meeting of the Minds: HCI and the Re-engineering of Archival Processing”].

  4. Any details on location?

  5. Thanks for sharing these great ideas everyone, keep ’em coming! To answer Mark’s question, we are still finalizing a venue, but whichever venue we go with, it will be accessible via DC’s metro system. Stay tuned for more details on the exact location.

  6. Not sure I can make this one… any plans to have a Camp alongside iPres at the beginning of October?

  7. Running a CurateCamp alongside iPres is a great idea. You might think about contacting someone from the program committee about the idea. Feel free to copy the concept of this whole cloth or work up something else. There is also info on running a CurateCamp on their website.

  8. Trevor, done. I’m in communication with the organizing committee.

  9. I wish I could be there. I’ve been working around these sorts of topics for some time.

    If you haven’t seen it, check out The Visible Archive project (http://visiblearchive.blogspot.com.au/) for some great collection visualisations.

    Some stuff of mine that might be of interest:

    Facial recognition: http://discontents.com.au/shoebox/archives-shoebox/the-real-face-of-white-australia

    Topic modelling: http://discontents.com.au/shed/experiments/topic-modelling-in-the-archives

    Machine processing / crowdsourcing / geolocation: http://discontents.com.au/words/articles/local-heroes

    Text analysis: http://labs.nma.gov.au/blog/category/seeing-the-collection/

    Have fun and tweet often!

  10. I have been struggling with automating the flow of metadata out of and back into images. The ideal would be to generate lists of metadata for large collections by extracting the metadata from the files, then edit these lists (checking metadata, making changes), and then push the metadata back to the files. I have had some limited success with smaller collections, but not with collections of over 200-300k files that contain TIFF, JPEG, and DNG files.

  11. I’m really interested in hearing about any methods or tools, other than DROID, in place to automate the appraisal process for born-digital records.

    Millard’s post also brought to mind that I would like to find a way to extract multiple types of metadata (IPTC, Exif, XMP, Dublin Core) from batches of digital images and export them into a single file or at least a single file per image.

    Also, I’m always interested in discussing streamlined processing approaches to born-digital records.

  12. Another little experiment that might be of interest. It also illustrates how you can use Userscripts to play around with collection interfaces without having to get to the back end.

    Basically I took portrait images that I’d extracted from documents using a facial detection script (see the link above) and then created a Userscript to insert these at appropriate points in the collection database to show ‘The people inside’:

    http://storify.com/wragge/the-people-inside

    Script: http://userscripts.org/scripts/show/138111

  13. Good post

  14. Nice post

