Close Reading, Distant Reading: Should Archival Appraisal Adjust?

From time to time, co-chairs of the National Digital Stewardship Alliance Arts and Humanities Content Working Group will bring you guest posts addressing the future of research and development for digital cultural heritage as a follow-up to a dynamic forum held at the 2014 Digital Preservation Conference.  

The following is a guest post from Meg Phillips, External Affairs Liaison, National Archives and Records Administration. Opinions expressed are those of the author and do not necessarily represent positions of the National Archives and Records Administration.

Meg Phillips, External Affairs Liaison at the National Archives and Records Administration and member of the NDSA Coordinating Committee.

Digital humanists and digital historians are employing research methods that most of us did not anticipate when we were learning to be archivists.  Do new types of research mean archivists should re-examine the way we learned to do appraisal?

The new types of researchers are experimenting with methods beyond the scholarly tradition of “close reading.”  When paper archives were the only game in town, close reading was all a researcher could do – it’s what we generally mean by “reading.”  Researchers studied individual records, extracting meaning and context from the information contained in each document.  Now, however, digital humanists are using born-digital or digitized collections to explore the benefits of computational analysis techniques, or “distant reading.” They are using computer programs to analyze patterns and find meaning in entire corpora of records without a human ever reading any individual record at all.

I have been interested in digital scholarship and its implications for archives for a while, but I hadn’t heard the phrase “distant reading” until seeing Franco Moretti’s book “Distant Reading” reviewed earlier this year. (See  “What is Distant Reading?” in the New York Times and “In Praise of Overstating the Case: A review of Franco Moretti, Distant Reading” in Digital Humanities Quarterly for a taste of the debate over the book.)  The phrase stuck with me as provocative shorthand for a new way of using records, and I started thinking about what distant reading might mean for archival appraisal.

Our traditions of archival appraisal are based on locating records that reward close reading.  A series appraised as permanent contains individual records that contain historically valuable information.  Both appraisal itself and the culling that happens during transfer or processing focus on removing records that do not contain permanently valuable information.

Now, however, it is possible to ask and answer entirely new kinds of questions with born-digital or digitized records. What did the network of influence in an organization look like?  How did communication flow? Was the chief executive interacting with a particular vendor unusually often? When did a new concept or term first appear and how quickly did use of the new term spread?  How did a disease spread through a community?  Not only is it possible, but early adopters are now teaching these research methods to a new generation of students.  For example, Professor Matthew Connelly is teaching a seminar at the London School of Economics called Hacking the Archives.  The course challenges students of international history to explore the new kinds of questions computational research allows.  These are questions whose answers emerge not from deep reading of individual records but from analysis of patterns in  large bodies of records.

The National Archives from user silbersam on Flickr (https://flic.kr/p/9BQvkr).

The interesting thing about these questions is that the answers may rely on the presence of records that would clearly be temporary if judged on their individual merits. Consider email messages like “Really sick today – not coming in” or a message from the executive of a  regulated company saying “Want to meet for lunch?” to a government policymaker. In the aggregate, the patterns of these messages  may paint a picture of disease spread or the inner workings of access and influence in government.  Those are exactly the kinds of messages traditional archival practice would try to cull. In these cases, appraising an entire corpus of records as permanent would support distant reading much better.  The informational value of the whole corpus cannot be captured by selecting just the records with individual value.
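To make the point concrete, here is a minimal sketch of the kind of aggregate question a researcher might ask of a retained corpus. It assumes a tiny, invented set of messages (the `MESSAGES` list and the `weekly_counts` helper are hypothetical, not drawn from any real collection or tool) and simply counts, week by week, how many subject lines mention a keyword. No single message is interesting, but the weekly totals form a pattern.

```python
from collections import Counter
from datetime import date

# Hypothetical toy corpus of (date sent, subject line) pairs.
# Every entry is invented for illustration only.
MESSAGES = [
    (date(2014, 1, 6), "Really sick today - not coming in"),
    (date(2014, 1, 7), "Budget memo attached"),
    (date(2014, 1, 14), "Out sick"),
    (date(2014, 1, 15), "Sick again, working from home"),
    (date(2014, 1, 16), "Want to meet for lunch?"),
    (date(2014, 1, 21), "sick today"),
    (date(2014, 1, 22), "Home sick"),
    (date(2014, 1, 23), "Still sick"),
]


def weekly_counts(messages, keyword):
    """Count messages per ISO (year, week) whose subject contains keyword.

    Matching is a plain case-insensitive substring test, which is crude:
    it would also match unrelated words that happen to contain the keyword.
    """
    counts = Counter()
    needle = keyword.lower()
    for sent, subject in messages:
        if needle in subject.lower():
            iso = sent.isocalendar()  # (ISO year, ISO week, ISO weekday)
            counts[(iso[0], iso[1])] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    for (year, week), n in weekly_counts(MESSAGES, "sick").items():
        print(f"{year}-W{week:02d}: {n}")
```

If the "sick today" messages had been culled as temporary, the week-over-week rise in this toy example would be invisible. Real research would of course need far more careful text matching and much larger corpora than this sketch suggests.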

If we adjusted practice to support more distant reading, archivists would still do appraisal, deciding what is worth permanent preservation.  We would just be doing it at a different level of granularity – appraising the research value of an entire email system, SharePoint site or social media account, for example.

Incidentally, on a practical level, appraising at this coarser granularity might also lead to disposition instructions that are easier for creating offices to carry out.

Figuring out how to do appraisal to support both distant reading and close reading would be an excellent project for the archival and digital preservation fields.  What questions would we want to answer?  We could start with some questions like these:

  • How many researchers are actually engaged in distant reading?  What fields do they work in?  Are their numbers increasing?
  • Do they want to apply computational techniques to archival materials, such as Federal records held in the National Archives or elsewhere?  Perhaps they are getting their source material somewhere else, bypassing archives.
  • To what extent do their research methods rely on having a complete set of the records created rather than a subset of the most permanently valuable records?
  • Do current definitions of a record and current recordkeeping regulations support a change to appraisal of entire corpora of records?
  • How would we know which corpora of records were most useful to researchers?
  • Is the benefit of distant reading worth the cost and risk of retaining more material that could have personal privacy or other protected content?
  • Is there a meaningful difference between trying to support computational research and actually just keeping everything?  (Perhaps this whole discussion is just the modern version of the old tension between historians who want to save everything and archivists who are trying to put their resources toward the most important materials.)

Staff at the National Archives and other institutions are starting to create opportunities for archivists to discuss questions like these.  For example, Josh Sternfeld of NEH, Jordan Steele of Johns Hopkins, and Paul Wester and I from NARA will be holding a panel discussion of these issues at the Fall 2014 Mid-Atlantic Regional Archives Conference meeting in Baltimore.  Paul and I will also be speaking with Matthew Connelly and others on an American Historical Association panel at the 2015 annual meeting in New York City, “Are We Losing History? Capturing Archival Records for a New ERA of Research.”

However, we need to create even more opportunities for archivists to explore these issues with digital humanists. A forum that pulled together digital researchers, archivists, librarians and technologists could be a great opportunity for us all to learn from each other. Such an event could also spread the word about the exciting new things that can be done with digital primary sources and the rich collections of digital resources that are now available in archives and libraries.

Of course, we can also blog about the issues and hope that the community leaps into the fray!

In that spirit, do you think archival appraisal needs to change, and if so, how?
