Close Reading, Distant Reading: Should Archival Appraisal Adjust?

From time to time, co-chairs of the National Digital Stewardship Alliance Arts and Humanities Content Working Group will bring you guest posts addressing the future of research and development for digital cultural heritage as a follow-up to a dynamic forum held at the 2014 Digital Preservation Conference.  

The following is a guest post from Meg Phillips, External Affairs Liaison, National Archives and Records Administration. Opinions expressed are those of the author and do not necessarily represent positions of the National Archives and Records Administration.

Meg Phillips, External Affairs Liaison at the National Archives and Records Administration and member of the NDSA Coordinating Committee.

Digital humanists and digital historians are employing research methods that most of us did not anticipate when we were learning to be archivists.  Do new types of research mean archivists should re-examine the way we learned to do appraisal?

The new types of researchers are experimenting with methods beyond the scholarly tradition of “close reading.”  When paper archives were the only game in town, close reading was all a researcher could do – it’s what we generally mean by “reading.”  Researchers studied individual records, extracting meaning and context from the information contained in each document.  Now, however, digital humanists are using born-digital or digitized collections to explore the benefits of computational analysis techniques, or “distant reading.” They are using computer programs to analyze patterns and find meaning in entire corpora of records without a human ever reading any individual record at all.

I have been interested in digital scholarship and its implications for archives for a while, but I hadn’t heard the phrase “distant reading” until seeing Franco Moretti’s book “Distant Reading” reviewed earlier this year. (See  “What is Distant Reading?” in the New York Times and “In Praise of Overstating the Case: A review of Franco Moretti, Distant Reading” in Digital Humanities Quarterly for a taste of the debate over the book.)  The phrase stuck with me as provocative shorthand for a new way of using records, and I started thinking about what distant reading might mean for archival appraisal.

Our traditions of archival appraisal are based on locating records that reward close reading.  A series appraised as permanent contains individual records that contain historically valuable information.  Both appraisal itself and the culling that happens during transfer or processing focus on removing records that do not contain permanently valuable information.

Now, however, it is possible to ask and answer entirely new kinds of questions with born-digital or digitized records. What did the network of influence in an organization look like?  How did communication flow? Was the chief executive interacting with a particular vendor unusually often? When did a new concept or term first appear and how quickly did use of the new term spread?  How did a disease spread through a community?  Not only is it possible, but early adopters are now teaching these research methods to a new generation of students.  For example, Professor Matthew Connelly is teaching a seminar at the London School of Economics called Hacking the Archives.  The course challenges students of international history to explore the new kinds of questions computational research allows.  These are questions whose answers emerge not from deep reading of individual records but from analysis of patterns in  large bodies of records.

The National Archives from user silbersam on Flickr (https://flic.kr/p/9BQvkr).

The interesting thing about these questions is that the answers may rely on the presence of records that would clearly be temporary if judged on their individual merits. Consider email messages like “Really sick today – not coming in” or a message from the executive of a  regulated company saying “Want to meet for lunch?” to a government policymaker. In the aggregate, the patterns of these messages  may paint a picture of disease spread or the inner workings of access and influence in government.  Those are exactly the kinds of messages traditional archival practice would try to cull. In these cases, appraising an entire corpus of records as permanent would support distant reading much better.  The informational value of the whole corpus cannot be captured by selecting just the records with individual value.
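To make the point concrete, here is a minimal sketch in Python of how an aggregate pattern can depend on records that look disposable one at a time. Everything in it is an assumption made for illustration: the messages are invented toy data, and `looks_individually_valuable` is a crude, hypothetical stand-in for item-level appraisal, not a description of any archive's actual practice.

```python
from collections import Counter
from datetime import date

# Hypothetical toy corpus: (date sent, subject line). Invented for illustration only.
MESSAGES = [
    (date(2014, 1, 6), "Budget memo: final decision on FY15 allocations"),
    (date(2014, 1, 6), "Really sick today - not coming in"),
    (date(2014, 1, 7), "Really sick today - not coming in"),
    (date(2014, 1, 7), "Out sick, back tomorrow"),
    (date(2014, 1, 7), "Want to meet for lunch?"),
    (date(2014, 1, 8), "Sick again - staying home"),
    (date(2014, 1, 8), "Out sick today"),
    (date(2014, 1, 8), "Still sick, not coming in"),
    (date(2014, 1, 8), "Policy directive: signed version attached"),
]


def looks_individually_valuable(subject):
    """Crude stand-in for item-level appraisal (an assumption, not a real rule)."""
    keywords = ("memo", "directive", "decision")
    return any(word in subject.lower() for word in keywords)


def sick_notes_per_day(messages):
    """Count the messages that mention 'sick' on each day, in date order."""
    counts = Counter()
    for sent, subject in messages:
        if "sick" in subject.lower():
            counts[sent] += 1
    return dict(sorted(counts.items()))


# Distant reading over the whole corpus: a rising daily count emerges.
full_corpus = sick_notes_per_day(MESSAGES)

# The same question asked after culling records with no individual value.
culled = [m for m in MESSAGES if looks_individually_valuable(m[1])]
culled_corpus = sick_notes_per_day(culled)

for day, n in full_corpus.items():
    print(day.isoformat(), "#" * n)
print("after culling:", culled_corpus)
```

In this toy example the full corpus shows sick notes climbing day by day, while the culled set keeps only the memo and the directive and the pattern disappears entirely.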

If we adjusted practice to support more distant reading, archivists would still do appraisal, deciding what is worth permanent preservation.  We would just be doing it at a different level of granularity – appraising the research value of an entire email system, SharePoint site or social media account, for example.

Incidentally, on a practical level, appraising at this coarser granularity might also lead to disposition instructions that are easier for creating offices to carry out.

Figuring out how to do appraisal to support both distant reading and close reading would be an excellent project for the archival and digital preservation fields.  What questions would we want to answer?  We could start with some questions like these:

  • How many researchers are actually engaged in distant reading?  What fields do they work in?  Are their numbers increasing?
  • Do they want to apply computational techniques to archival materials, for example Federal records in the National Archives, or in any other environment?  Perhaps they are getting their source material somewhere else, bypassing archives.
  • To what extent do their research methods rely on having a complete set of the records created rather than a subset of the most permanently valuable records?
  • Do current definitions of a record and current recordkeeping regulations support a change to appraisal of entire corpora of records?
  • How would we know which corpora of records were most useful to researchers?
  • Is the benefit of distant reading worth the cost and risk of retaining more material that could have personal privacy or other protected content?
  • Is there a meaningful difference between trying to support computational research and actually just keeping everything?  (Perhaps this whole discussion is just the modern version of the old tension between historians who want to save everything and archivists who are trying to put their resources toward the most important materials.)

Staff at the National Archives and other institutions are starting to create opportunities for archivists to discuss questions like these.  For example, Josh Sternfeld of NEH, Jordan Steele of Johns Hopkins and Paul Wester and I from NARA will be holding a panel discussion of these issues at the Fall 2014 Mid-Atlantic Regional Archives Conference meeting in Baltimore.  Paul and I will also be speaking with Matthew Connelly and others on an American Historical Association panel at the 2015 annual meeting in New York City, “Are We Losing History? Capturing Archival Records for a New ERA of Research.”

However, we need to create even more opportunities for archivists to explore these issues with digital humanists. A forum that pulled together digital researchers, archivists, librarians and technologists could be a great opportunity for us all to learn from each other. Such an event could also spread the word about the exciting new things that can be done with digital primary sources and the rich collections of digital resources that are now available in archives and libraries.

Of course, we can also blog about the issues and hope that the community leaps into the fray!

In that spirit, do you think archival appraisal needs to change, and if so, how?
