
22 Opportunities in Web Archiving! A Harvard Library Report


The following is a guest post by Andrea Goethals, Digital Preservation and Repository Manager at Harvard Library.

It’s St. Patrick’s Day, so I wanted to have a catchy Irish saying for the title but, believe it or not, Irish sayings about web archiving or even the web are hard to find. I did find some great phrases though, especially “You must take the little potato with the big potato.” Potatoes seem to be a common theme in Irish sayings, along with rain.

In the last couple of years within Harvard Library, when we haven’t been thinking about our own frequently inclement weather, we have been thinking a lot about web archiving and what our strategy should be for scaling up our web archiving activities. We wanted to know more about the current practices, needs and expectations of other institutions that are either actively engaged in web archiving or would like to be, and whether our institutions had common needs that might be addressed by collaborative efforts.

With the generous support of the Arcadia Fund, my colleague Abigail Bordeaux and I worked closely with Gail Truman of Truman Technologies to conduct a five-month environmental scan of web archiving programs, practices, tools and research. The final report is now available from Harvard’s open access repository, DASH.

The heart of the study was a series of interviews with web archiving practitioners from archives, museums and libraries worldwide; web archiving service providers; and researchers who use web archives. The interviewees were selected from the membership of the International Internet Preservation Consortium, the Web Archiving Roundtable at the Society of American Archivists, the Internet Archive’s Archive-It Partner Community, the Ivy Plus institutions, Working with Internet archives for REsearch (Rutgers/WIRE Group), and the Research infrastructure for the Study of Archived Web materials (RESAW).

The interviews with web archiving practitioners covered a wide range of areas, from how each institution maintains its web archiving infrastructure (e.g., outsourcing, staffing, location in the organization) to how it is (or isn’t) integrating its web archives with its other collections. From this data, profiles were created for 23 institutions, and the data was aggregated and analyzed to look for common themes, challenges and opportunities.

In the end, the environmental scan revealed 22 opportunities for future research and development. These opportunities are listed in Table 1 (below) and described in more detail in the report. At a high level, these opportunities fall under four themes: (1) increase communication and collaboration, (2) focus on “smart” technical development, (3) focus on training and skills development and (4) build local capacity.

Table 1: The 22 opportunities for further research and development that emerged from the environmental scan

One of the biggest takeaways is that the first theme, the need to radically increase communication and collaboration among all individuals and organizations involved in some way in web archiving, was the most prevalent. Thirteen of the 22 opportunities fell under this theme. Clearly, much more communication and collaboration is needed, not only among those collecting web content but also between those who are collecting it and the researchers who would like to use it.

This environmental scan has given us a great deal of insight into how other institutions are approaching web archiving, which will inform our own web archiving strategy at Harvard Library in the coming years. We hope that it has also highlighted key areas for research and development that need to be addressed if we are to build efficient and sustainable web archiving programs that result in complementary and rich collections that are truly useful to researchers.


  1. The Gaelic form of the mission statement of the Irish National Archives:
    Ráiteas Misin

    Taifead poiblí na hÉireann a bhailiú, a riar agus a chaomhnú agus a chinntiú go mbíonn sé ar fáil, mar acmhainn agus chun cearta an tsaoránaigh a chosaint
    Buanchoimeád an taifid phoiblí a chinntiú, agus cur dá réir leis an saol cultúrtha agus leis an gcuimhne ag muintir na hÉireann
