Web Archive Management at NYARC: An NDSR Project Update

The following is a guest post by Karl-Rainer Blumenthal, National Digital Stewardship Resident at the New York Art Resources Consortium (NYARC).

A tipping point from traditional to emergent digital technologies in the regular conduct of art historical scholarship threatens to leave unprepared institutions and their researchers alike in a “digital black hole.” NYARC–the partnership of the Frick Art Reference Library, the Museum of Modern Art Library and the Brooklyn Museum Library & Archives–seeks to institute permanent and precedent-setting collecting programs for born-digital primary source materials that make this black hole significantly more gray.

Since receiving a 2013 grant from the Andrew W. Mellon Foundation, for instance, NYARC has archived the web presences of its partner museums and those of prominent galleries, auction houses, artists, provenance researchers and others within their traditional collecting scopes. While working to define description standards and to integrate access points with those of traditional resources, NYARC has further leveraged this leadership opportunity by designing the current National Digital Stewardship Residency project, which is to concurrently prepare their nascent collections for long-term management and preservation.

Archiving MoMA's many exhibit sites preserves them for future art historians, but only if critical elements aren't lost in the process.

Stewarding web archives to the future generations that will learn from them requires careful planning and policymaking. Sensitive preservation description and reliable storage and backup routines will ultimately determine the accessibility of these benchmarks of our online culture for future librarians, archivists, researchers and students. Before we can plan and prepare for the long term, however, it is incumbent upon those of us with responsibility to steward especially visually rich and complex cultural artifacts to assure their integrity at the point of collection–to assure their faithful rendition of the extent, behavior and appearance of visual information transmitted over this uniquely visual medium.

Quality assurance (QA)–the process of verifying and/or making the interventions necessary to improve the accuracy and integrity of archived web-based resources at the point of their collection–was therefore the logical place to begin defining long-term stewardship needs. As I quickly discovered, though, it also happens to be one of the slipperiest issues for even experienced web archivists. Like putting together a jigsaw puzzle, its success begins with having all of the right pieces, then requires fitting those pieces together in the correct sequence, and ultimately hinges on the degree to which our final product’s ‘look and feel’ resembles that of our original vision.
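The first of those steps, checking that all of the right pieces are present, can be illustrated with a short script. What follows is only a minimal sketch under stated assumptions, not NYARC's or Archive-It's actual tooling: the function `missing_resources`, the class `EmbeddedResourceParser`, the sample page and the `example.org` URLs are all hypothetical. It lists the images, scripts and stylesheets that a page references and reports any that are absent from a given list of captured URLs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class EmbeddedResourceParser(HTMLParser):
    """Collect the URLs of images, scripts and stylesheets that a page embeds."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            # Resolve relative references against the page's own URL.
            self.resources.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            rel_values = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel_values:
                self.resources.append(urljoin(self.base_url, attrs["href"]))


def missing_resources(page_html, page_url, captured_urls):
    """Return embedded resource URLs that do not appear in captured_urls."""
    parser = EmbeddedResourceParser(page_url)
    parser.feed(page_html)
    captured = set(captured_urls)
    return [url for url in parser.resources if url not in captured]


# Hypothetical page and capture list, for illustration only.
sample_html = """
<html><head><link rel="stylesheet" href="/css/site.css"></head>
<body><img src="images/poster.jpg">
<script src="https://cdn.example.org/viewer.js"></script></body></html>
"""
captured = [
    "https://example.org/exhibit/index.html",
    "https://example.org/css/site.css",
]
missing = missing_resources(
    sample_html, "https://example.org/exhibit/index.html", captured
)
for url in missing:
    print("not captured:", url)
```

A check like this addresses only the first part of the puzzle. Whether the captured pieces fit together and reproduce the original ‘look and feel’ still calls for a person to view the archived page.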

Unless and until the technologies that we use to crawl and capture content from the live web can simply replicate every conceivable experience that any human browser may have online, we are compelled to decide which specific properties of equally sprawling and ephemeral web presences are of primary significance to our respective missions and patrons, and which therefore demand our most assiduous and resource-intensive pursuit.

Determining those priority areas and then finding the requisite time and manpower to do them justice is challenging enough for any web archiving operation. For a multi-institutional partnership sharing responsibility for aesthetically diverse but equally rich and complex web designs, it’s enough to stop you right in your tracks. To keep NYARC’s small army of graduate student QA technicians all moving in the same direction as efficiently as possible, and to sustain a model of their work beyond the end of their grant-funded terms, I’ve therefore spent the bulk of this first phase of my NDSR project building towards the following procedural reference guide. I now welcome the broader web archiving community to review, discuss and adapt this to their own use:

This living document will be updated to reflect technical and practical developments throughout and beyond the remainder of my residency. In the meantime, it will provide NYARC’s decision-makers, and others who are designing permanent web archiving programs, an executive summary of the principles and technologies that influence the potential scopes of QA work. Its procedural guidelines walk our QA technicians through their regular assessment and documentation process. Perhaps most importantly, this roadmap directs them to the areas where they may make meaningful interventions, indicates where they must instead rely on help from our software service provider, Archive-It, and flags where improvement must await future technical development. Finally, it inventories the major problem areas and improvement strategies presently known to NYARC to make or break the whole process.

This iteration of NYARC’s documentation is the product of expansive literature review, hands-on QA work, regular consultation and problem solving with interns and professional staff, and the generous advice of colleagues throughout the community. As such, it has prepared me not only for upcoming NDSR project phases focused on preservation metadata and archival storage, but also for a much longer career in digital preservation.

As any such project must, it ties the success of any rapidly acquired technical knowledge or expertise to equally effective project management, communication and open documentation–skill sets that every emergent professional must cultivate in order to have a permanent role in the stewardship of our always tumultuous digital culture. I’m sure that this small documentation effort will provide NYARC, and similar partners in the field, with the tools to improve the quality of their web archives. I also sincerely hope that it provides a model of practice to sustain such improvements through radical and unforeseen technological changes–that it makes the digital black hole just a little more gray.

One Comment

  1. Doreva Belfiore
    January 16, 2015 at 3:28 pm

    Excellent work Karl! This is a wonderful resource for web archiving. Thank you for your and NYARC’s hard work on this.
