Emerging Collaborations for Accessing and Preserving Email

The following is a guest post by Chris Prom, Assistant University Archivist and Professor, University of Illinois at Urbana-Champaign.

I’ll never forget one lesson from my historical methods class at Marquette University. Ronald Zupko, a natural showman famous for his lecture on the bubonic plague, was expounding on what it means to interrogate primary sources: to cast a skeptical eye on every source, to see each one as a mere thread of evidence in a larger story, and to remember that every event can, and must, tell many different stories.

He asked us to name a few documentary genres, along with our opinions as to their relative value. We shot back: “Photographs, diaries, reports, scrapbooks, newspaper articles,” along with the type of ill-informed comments graduate students are prone to make. As our class rattled off responses, we gradually came to realize that each document reflected the particular viewpoint of its creator, and that the information a source conveyed was constrained by documentary conventions and other social factors inherent to the medium underlying the expression. Settling into the comfortable role of skeptics, we noted the biases each format reflected. Finally, one student said: “What about correspondence?” Dr. Zupko erupted: “There is the real meat of history! But, you need to be careful!”

Dangerous Inbox by Recrea HQ. Photo courtesy of Flickr through a CC BY-NC-SA 2.0 license.

Letters, memos, telegrams, postcards: such items have long been the stock-in-trade for archives.  Historians and researchers of all types, while mindful of the challenges in using correspondence, value it as a source for the insider perspective it provides on real-time events.   For this reason, the library and archives community must find effective ways to identify, preserve and provide access to email and other forms of electronic correspondence.

After I researched and wrote a guide to email preservation for the Digital Preservation Coalition’s Technology Watch Report series, I concluded that the challenges are mostly cultural and administrative.

I have no doubt that with the right tools, archivists could do what we do best: build the relationships that underlie every successful archival acquisition.  Engaging records creators and donors in their digital spaces, we can help them preserve access to the records that are so sorely needed for those who will write histories.  But we need the tools, and a plan for how to use them.  Otherwise, our promises are mere words.

For this reason, I’m so pleased to report on the results of a recent online meeting organized by the National Digital Stewardship Alliance’s Standards and Practices Working Group.  On August 25, a group of fifty-plus experts from more than a dozen institutions informally shared the work they are doing to preserve email.

For me, the best part of the meeting was that it represented the diverse range of institutions (in terms of size and institutional focus) that are interested in this critical work. Email preservation is of interest not only to large government archives or small collecting repositories, but to every repository in between. That said, the representatives displayed a surprisingly similar vision for how email preservation can be made effective.

Robert Spangler, Lisa Haralampus, Ken Hawkins and Kevin DeVorsey described challenges that the National Archives and Records Administration has faced in controlling and providing access to large bodies of email. Concluding that traditional records management practices are not sufficient to the task, NARA has developed the Capstone approach, which identifies particular accounts whose contents must be preserved as a record series, and is currently revising its transfer guidance. Later in the meeting, Mark Conrad described the particular challenge of preserving email from the Executive Office of the President, highlighting the point that “scale matters,” a theme that resonated across the board.

The whole-account approach that NARA advocates meshes well with activities described by other presenters. For example, Kelly Eubank from the North Carolina State Archives and the EMCAP project discussed the need for software tools to ingest and process email records, while Linda Reib from the Arizona State Library noted that the PeDALS Project is seeking to continue its work, focusing on account-level preservation of key state government accounts.

Functional comparison of selected email archives tools/services. Courtesy Wendy Gogel.

Ricc Ferrante and Lynda Schmitz Fuhrig from the Smithsonian Institution Archives discussed the CERP project, whose deliverables included an XML schema for email objects, developed in conjunction with the EMCAP project. Kate Murray from the Library of Congress reviewed the new descriptions of email and related calendaring formats on the Sustainability of Digital Formats website.

Harvard University was up next.  Andrea Goethels and Wendy Gogel shared information about Harvard’s Electronic Archiving Service.  EAS includes tools for normalizing email from an account into EML format (conforming to the Internet Engineering Task Force RFC 2822), then packaging it for deposit into Harvard’s digital repository.
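To make the normalize-then-package step a little more concrete, here is a minimal Python sketch. It is not Harvard's EAS code, and it assumes the account has already been exported as an mbox file. It uses the standard library's `mailbox` and `email` modules to write each message out as a standalone `.eml` file with CRLF line endings (as RFC 2822 requires), then bundles those files with a simple SHA-256 manifest into a ZIP package. The function name `normalize_mbox_to_eml`, the file-naming scheme and the manifest layout are all hypothetical; a real deposit would follow the receiving repository's own packaging rules.

```python
import email
import email.policy
import hashlib
import mailbox
import zipfile


def normalize_mbox_to_eml(mbox_path, package_path):
    """Hypothetical helper: store each mbox message as a .eml file in a ZIP package.

    Returns the list of .eml member names written to the package.
    """
    box = mailbox.mbox(mbox_path, create=False)
    manifest_lines = []
    names = []
    with zipfile.ZipFile(package_path, "w", compression=zipfile.ZIP_DEFLATED) as pkg:
        for index, key in enumerate(box.keys(), start=1):
            # Raw message bytes, excluding the mbox "From " separator line.
            raw = box.get_bytes(key)
            msg = email.message_from_bytes(raw, policy=email.policy.default)
            # The SMTP policy serializes with CRLF line endings.
            eml_bytes = msg.as_bytes(policy=email.policy.SMTP)
            name = f"messages/{index:06d}.eml"  # illustrative naming scheme
            pkg.writestr(name, eml_bytes)
            digest = hashlib.sha256(eml_bytes).hexdigest()
            manifest_lines.append(f"{digest}  {name}")
            names.append(name)
        # A simple checksum manifest so fixity can be checked after deposit.
        pkg.writestr("manifest-sha256.txt", "\n".join(manifest_lines) + "\n")
    box.close()
    return names
```

Writing one file per message keeps each item independently addressable in the repository. The manifest is only a lightweight stand-in for the fixity metadata a formal submission package (for example, a BagIt bag) would carry.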

One of the most exciting presentations was provided by Peter Chan and Glynn Edwards from Stanford University. With generous funding from the National Historical Publications and Records Commission, as well as some internal support, the ePADD Project (“Email: Process, Appraise, Discover, Deliver”) is using natural language processing and entity extraction tools to build an application that will allow archivists and records creators to review email, then process it for search, display and retrieval. Best of all, the web-based application will include a built-in discovery interface, and users will be able to define a lexicon and view visual representations of the results. Many participants in the meeting commented that the ePADD tools may provide a meaningful focus for additional collaborations. A beta version is due out next spring.
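As a rough illustration of what a user-defined lexicon can do during review, the Python sketch below tags messages whose text matches any term in a set of named categories and counts matches per category, the kind of tally a simple chart could display. This is not ePADD's implementation, which the project describes in terms of natural language processing and entity extraction. The functions `build_lexicon` and `tag_messages` and the example categories are hypothetical, and plain whole-word regular-expression matching is a deliberately simple stand-in.

```python
import re
from collections import Counter


def build_lexicon(categories):
    """Compile each category's terms into one case-insensitive, whole-word regex.

    `categories` maps a category name to a list of words or phrases (hypothetical data).
    """
    compiled = {}
    for name, terms in categories.items():
        if not terms:
            continue  # an empty alternation would match everywhere
        alternatives = "|".join(re.escape(term) for term in terms)
        compiled[name] = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    return compiled


def tag_messages(messages, lexicon):
    """Return the set of matching categories for each message,
    plus the number of messages that matched each category."""
    tags = []
    totals = Counter()
    for text in messages:
        hits = {name for name, pattern in lexicon.items() if pattern.search(text)}
        tags.append(hits)
        totals.update(hits)
    return tags, totals
```

For example, a lexicon with `"health": ["diagnosis", "hospital"]` and `"finance": ["budget", "salary"]` would flag a message reading "The hospital budget is due" under both categories, letting an archivist quickly locate potentially sensitive material before deciding what to open to researchers.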

In the discussion that followed the informal presentations, several presenters congratulated the Harvard team on a slide Wendy Gogel shared, comparing the functions provided by various tools and services (reproduced above).

As is apparent from even a cursory glance at the chart, repositories are doing wonderful work, and much yet remains to be done.

Collaboration is the way forward. At the end of the discussion, participants agreed to take three specific steps to drive email preservation initiatives to the next level: (1) providing tool demo sessions; (2) developing use cases; and (3) working together.

The bottom line: I’m more hopeful about the ability of the digital preservation community to develop an effective approach toward email preservation than I have been in years.  Stay tuned for future developments!