Eight Years of Web Archiving, One Week in The Hague

Session attendees, taking it all in.  Photo credit: Jos Uljee

The following is a guest post by Abbey Potter, Program Officer, NDIIPP.  She is also Communications Officer for the IIPC.

The Internet is a vast utility shared across borders and cultures, a resource like no other. It presents information from governments, news outlets, corporations, nonprofits and cultural heritage institutions alongside the thoughts, feelings and everyday outputs of individuals, in hundreds of languages and with vivid media. Anyone can share information on the Internet and much of it is free to use and reuse. It is invaluable as a communications and publication tool today and it will serve as a unique resource for researchers tomorrow.

However, many things published on the Internet quickly vanish: new information replaces old, whole web sites are abandoned, technologies and standards change.  Beginning in the late 1990s the Internet Archive and several national libraries, the Library of Congress included, started to explore how to capture and preserve web sites for posterity. Laws were passed in several countries, Sweden being the first, permitting national libraries to collect the entire country domain through their respective legal deposit regimes.

Session panel. Photo credit: Jos Uljee.

It was around this time that the International Internet Preservation Consortium began to form. In February of 2000 the Nordic national libraries of Iceland, Denmark, Sweden, Norway and Finland began to cooperate in co-developing better tools to capture the Internet.

Early in 2003 the National Library of France organized a meeting in Paris with the Nordic group, the Library of Congress, the British Library and the Internet Archive to combine efforts. It was at that meeting that the Internet Archive developed the first bits of code for the first archival web crawler, Heritrix. Later in 2003 the IIPC was officially chartered, adding Canada, Australia and Italy for a total of 12 founding members.

Today there are 39 member institutions across 5 continents in the IIPC, and last month the 8th annual General Assembly took place at the National Library of the Netherlands in The Hague. The first years of the IIPC were focused on tools and standards development, which has made web archiving programs operational in libraries across the world. The biggest challenges facing members now are how to select and preserve quality collections amidst an ever-growing Internet and, even more challenging, how to provide meaningful access to web archives.

Helen Hockx-Yu, British Library. Photo credit: Jos Uljee

This was the focus of the public event: “Out of the Box: Building and Using Web Archives.” Members showed the contents of their collections through stories, illustrations and demonstrations. Users of web archives discussed their needs when approaching web archives and demonstrated the tools and methodologies they use to answer research questions with archived web data. Inge Angevaar of the NCDD in The Netherlands wrote a wonderful summary of the event at her blog and all the presentations are also available at the IIPC web site.

The General Assembly filled the rest of the week in The Hague with more successes, challenges and inspiration. The multilingualization of NutchWax was completed by the National Diet Library of Japan; the lack of and need for automated quality assurance tools was discussed; plans for format validators and outreach projects were developed; and new ways to access archives were demonstrated. The week ended with a data mining and analysis workshop led by the Internet Archive. Over 100 people registered for this year’s GA, the biggest crowd ever at an IIPC meeting.

An informal session.

Question: How can a diverse set of people from all over the world spend 8 years working on web archiving together, let alone one week in the same library?

Answer: Mutual admiration and the occasional happy hour. Members of the IIPC are web archiving pioneers and experts and they work together because they learn from each other and are able to do more together than they can do alone. That last sentence was stolen from a new video the IIPC has produced. You can get the same warm fuzzy feeling when you watch it (link coming soon)!

Update: The video is now live! Check it out on the IIPC website.

One Comment

  1. scott phillips
    June 8, 2011 at 11:44 am

    I love this! Very helpful. I am trying to figure out how to archive my weekly updates for my family from my family tree software and my blog (http://onwardtoourpast.blogspot.com).

    A very challenging issue and one that deserves much attention!

    Thanks and keep up the great work!

    Scott
