The following is a guest post by Andrea Goethals, Digital Preservation and Repository Manager at Harvard Library.
It’s St. Patrick’s Day, so I wanted to have a catchy Irish saying for the title but, believe it or not, Irish sayings about web archiving or even the web are hard to find. I did find some great phrases though, especially “You must take the little potato with the big potato.” Potatoes seem to be a common theme in Irish sayings, along with rain.
In the last couple of years within Harvard Library, when we haven’t been thinking about our own frequently inclement weather, we have been thinking a lot about web archiving and what our strategy should be for scaling up our web archiving activities. We wanted to know more about the current practices, needs and expectations of other institutions that are either actively engaged in web archiving or would like to be, and whether our institutions had common needs that might be addressed by collaborative efforts.
With the generous support of the Arcadia Fund, my colleague Abigail Bordeaux and I worked closely with Gail Truman of Truman Technologies to conduct a five-month environmental scan of web archiving programs, practices, tools and research. The final report is now available from Harvard’s open access repository, DASH.
The heart of the study was a series of interviews with web archiving practitioners from archives, museums and libraries worldwide; web archiving service providers; and researchers who use web archives. The interviewees were selected from the membership of the International Internet Preservation Consortium, the Web Archiving Roundtable at the Society of American Archivists, the Internet Archive’s Archive-It Partner Community, the Ivy Plus institutions, Working with Internet archives for REsearch (Rutgers/WIRE Group), and the Research infrastructure for the Study of Archived Web materials (RESAW).
The interviews of web archiving practitioners covered a wide range of areas, from how institutions are maintaining their web archiving infrastructure (e.g. outsourcing, staffing, location in the organization) to how they are (or aren’t) integrating their web archives with their other collections. From this data, profiles were created for 23 institutions, and the data was aggregated and analyzed to look for common themes, challenges and opportunities.
In the end, the environmental scan revealed 22 opportunities for future research and development. These opportunities are listed in Table 1 (below) and described in more detail in the report. At a high level, these opportunities fall under four themes: (1) increase communication and collaboration, (2) focus on “smart” technical development, (3) focus on training and skills development and (4) build local capacity.
One of the biggest takeaways is that the first theme, the need to radically increase communication and collaboration among all individuals and organizations involved in some way in web archiving, was the most prevalent. Thirteen of the 22 opportunities fell under this theme. Clearly, much more communication and collaboration is needed, not only among those collecting web content but also between those who are collecting it and researchers who would like to use it.
This environmental scan has given us a great deal of insight into how other institutions are approaching web archiving, which will inform our own web archiving strategy at Harvard Library in the coming years. We hope that it has also highlighted key areas for research and development that need to be addressed if we are to build efficient and sustainable web archiving programs that result in complementary and rich collections that are truly useful to researchers.