
It Takes a Village…to Archive the Internet


The following is a guest post by Abbie Grotke, Web Archiving Team Lead at the Library of Congress.

In an earlier post I talked about the types of Web content that the Library of Congress archives. Despite the tremendous amount we’ve preserved, we know we can’t do it alone. We often collaborate to build web archives with other libraries, archives and organizations in the United States and around the globe. We do this when events unfold quickly on the Internet and the Library can’t react as quickly as we’d like for whatever reason, but also when the scope of a collection is so big that we must work with others to ensure that breadth of content is preserved.

Crowdvoice: Tracking Voices of Protest, archived February 3, 2011, North Africa & the Middle East 2011 collection.

We find many of our partners in the membership of the International Internet Preservation Consortium (IIPC). Members work together not only to develop tools and solve issues concerning harvesting, access and preservation, but also on collection-building projects. For instance, we recently began developing an IIPC-sponsored archive of 2012 Olympics-related websites; each participating institution will contribute URLs from its country to create a combined archive with a global perspective.

IIPC members also join together to preserve web materials related to events unfolding around the world. Some content is particularly at risk, for instance sites or social media that crop up during what we call “spontaneous events”: those we can’t plan for but must react to quickly in order to preserve them. Examples include partnerships with Internet Archive and the National Library of France to document the recent events in North Africa and the Middle East. For the Jasmine Revolution in Tunisia, subject experts from our African and Middle East Division helped identify websites, YouTube videos and social media content to be preserved.

Save the Library. http://45/, archived April 14, 2011, Japan Earthquake collection.

Our Asian Division subject experts have contributed to a Japanese Earthquake collection, a collaboration of Virginia Tech, Internet Archive, LC, the National Diet Library in Japan and Harvard University. And a few years ago, we worked with a variety of partners to archive content related to the earthquake in Haiti. In each of these cases, LC is not archiving content itself but contributing valuable subject and language expertise.

Here in the United States, we partnered with organizations including Internet Archive and the Pew Internet & American Life Project to build the September 11 Web Archive, an early example of collaborative archiving.

American Red Cross, archived September 15, 2001, September 11 Web Archive.

Since then, we’ve worked with others to document events such as Hurricanes Katrina and Rita. Another big project we collaborated on in recent years was the End of Term Government Web Archive. For this project, we worked with Internet Archive, California Digital Library, the University of North Texas Libraries, and the Government Printing Office to archive the United States government web during the transition from President Bush’s administration to President Obama’s. The team is already planning for the next End of Term archive and a related project to build a more comprehensive Election 2012 archive.

So while each institution has its own collection policies to follow, there is an obvious recognition by web archivists that the Internet is a global place, not neatly wrapped by physical boundaries. Personally, I value these collaborations and know that my colleagues do too. If we work together, we can ensure that more of the Web is preserved, and often we can act more quickly, particularly when the risk of loss of content is so great.
