Capturing and Preserving the Olympic Spirit Via Web Archiving

Image from the IIPC website

Every two years there is a fresh opportunity for excitement in following the Olympic Games, not only for the thrill of the sports themselves and rooting for hometown heroes, but for the fascination and variety of all the international culture in one place. Now there is also an effort going on behind the scenes to capture the highlights, the competition, and the general cultural history surrounding the Games: a project to archive the 2014 Olympics web sites. This effort may not be well known, but the resulting archive will be invaluable for researchers in the future.

This web archiving project is being produced by the International Internet Preservation Consortium. The IIPC has been around since 2003. It is a collaborative organization dedicated to improving the tools, standards and best practices of web archiving while promoting international collaboration and the broad access and use of web archives for research and cultural heritage. Membership in the IIPC currently includes almost 50 organizations: libraries (including the Library of Congress), archives, museums and other cultural heritage organizations, representing over 25 countries. This Olympics project is being coordinated through the IIPC Access Working Group, and the project leaders are Nicola Bingham and Helen Hockx-Yu, both of the British Library.

The IIPC has produced similar projects before: archives of the 2010 Winter Olympics in Vancouver and of the 2012 Summer Olympics and Paralympics in London. The current effort aims to preserve a range of web sites relating to the 2014 Olympics in Sochi, Russia.

A little bit about the process: each IIPC member institution contributes its own list of suggested web sites (referred to as “seeds”) for inclusion in the collection. With so many member organizations around the world, the aim is to include Olympics-related sites from many countries, in a variety of languages and from a variety of viewpoints.
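For readers curious how seed lists from many institutions might be combined, here is a minimal sketch in Python. It is not the IIPC's nomination tool. The function names, the institution labels and the example.org URLs are all hypothetical, and the URL normalization is a deliberate simplification. It merges the lists and records which institutions nominated each site, so a site suggested twice appears only once.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Build a comparison key for a seed URL (hypothetical helper).

    Simplification: the scheme, a leading "www.", and a trailing slash are
    ignored, so http://www.example.org/a/ and https://example.org/a match.
    """
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def merge_seeds(nominations: dict[str, list[str]]) -> dict[str, dict]:
    """Merge per-institution seed lists, tracking who nominated each seed."""
    merged: dict[str, dict] = {}
    for institution, urls in nominations.items():
        for url in urls:
            key = normalize(url)
            # Keep the first spelling of the URL that was seen for this key.
            entry = merged.setdefault(key, {"url": url.strip(), "nominated_by": []})
            if institution not in entry["nominated_by"]:
                entry["nominated_by"].append(institution)
    return merged


# Hypothetical nominations from two imaginary member institutions.
nominations = {
    "Library A": ["http://www.example.org/olympics/", "http://team.example.com"],
    "Library B": ["http://example.org/olympics"],
}
merged = merge_seeds(nominations)
for key, entry in merged.items():
    print(key, "<-", ", ".join(entry["nominated_by"]))
```

A real nomination workflow would also carry metadata such as language, country and subject category for each seed. The sketch leaves those out to keep the merging step easy to follow.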

A torch relay map from the 2012 site collection

The previous IIPC project to capture the 2012 Olympics in London included many British sites that provide an overall view of the host country's preparations. These archived sites are not available yet, but they include the official London 2012 Olympic and Paralympic Games sites, the British Olympic Association site, which includes details of the Olympics bid, and a local council’s 2012 Olympic and Paralympic website. The collection also includes the Hidden London site showing the building stages of the Olympic stadium, as well as blogs and commentaries related to arts and culture, featuring such things as a torch from the 1948 London Olympics acquired by the Victoria and Albert Museum.

For this current 2014 Olympics project, the IIPC member institutions are each recommending their own list of websites to be included. For example, the Library of Congress has recommended 131 web sites. As described by Michael Neubert, Supervisory Digital Projects Specialist here at the Library: “The selection of most sites for such collections is mechanical, in that we know we want sites for the various US teams – each team sport has its own site, for example, then along with that site there will be various social media sites/channels.  In order to optimize the crawls, we nominate the social media separately. In addition to the team sites, we also chose a limited number of news media sites where the coverage of the Olympics seemed segregated from the rest of the site.”

New Zealand Olympic committee site for 2014

Nicola Bingham of the British Library, one of the project coordinators, emphasizes additional contributions to this project. “The IIPC 2014 Winter Olympics project is being supported by the Internet Archive who are crawling the seeds (sites) and the University of North Texas who are supporting the nomination tool. A common subject scheme is being used to categorize websites according to producer type and Olympic sport. Crawling began in mid December 2013, and to-date 745 seeds have been nominated by 17 IIPC member institutions.”

“The Internet Archive has taken on the role of crawling, without which the project would have been much more difficult. Many other IIPC members would not have been able to perform the crawling, not necessarily for technical reasons but due to legal and/or political considerations.”  For more about the web archiving process, see the IIPC “About Archiving” page.

As stated on the group’s Access Working Group page, “It is hoped that the project will enable institutions to continue to experiment with tools and processes that facilitate collaborative definition, collection and accessibility of web data.”

Over the next year or so, the IIPC will be working on creating wider access to all these Olympic archives.  For the latest updates on this and other IIPC projects, follow @netpreserve.

See other Olympics-related blog posts here at the Library, from Poetry and Literature, Teaching, and the Law Library.
