Every two years the Olympic Games offer a fresh opportunity for excitement, not only for the thrill of the sports themselves and rooting for hometown heroes, but for the fascination and variety of all the international culture in one place. Now there is an effort going on behind the scenes to capture the highlights, the competition, and the general cultural history surrounding the Games: a project to archive the 2014 Olympics web sites. This effort may not be well known, but the resulting archive will be invaluable for researchers in the future.
This web archiving project is being produced by the International Internet Preservation Consortium (IIPC). The IIPC has been around since 2003, and it’s a collaborative organization dedicated to improving the tools, standards and best practices of web archiving while promoting international collaboration and the broad access and use of web archives for research and cultural heritage. Membership in the IIPC currently includes almost 50 organizations: libraries (including the Library of Congress), archives, museums and other cultural heritage organizations, representing over 25 countries. This Olympics project is being coordinated through the IIPC Access Working Group, and the project leaders are Nicola Bingham and Helen Hockx-Yu, both of the British Library.
The IIPC has produced similar projects before: there are archives of the 2010 Winter Olympics in Vancouver and the 2012 Summer Olympics and Paralympics in London. This current effort aims to preserve a range of web sites relating to the 2014 Olympics in Sochi, Russia.
A little bit about the process: each IIPC member institution contributes its own list of suggested web sites (referred to as “seeds”) for inclusion in the collection. With so many member organizations around the world, the aim is to include Olympics-related sites from many countries, in a variety of languages and from a variety of viewpoints.
The previous IIPC project to capture the 2012 Olympics in London included many British sites that provide an overall view of the host country preparations. These archived sites are not available yet, but include the official London 2012 Olympic and Paralympic Games sites as well as the British Olympic Association which includes details of the Olympics bid, and a local council’s 2012 Olympic and Paralympic website. It also includes the Hidden London site showing the building stages of the Olympic stadium, as well as blogs and commentaries related to arts and culture, featuring such things as a torch from the 1948 London Olympics acquired by the Victoria and Albert Museum.
For this current 2014 Olympics project, the various IIPC member institutions are all recommending their own list of websites to be included. For example, the Library of Congress has recommended 131 web sites. As described by Michael Neubert, Supervisory Digital Projects Specialist here at the Library: “The selection of most sites for such collections is mechanical, in that we know we want sites for the various US teams – each team sport has its own site, for example, then along with that site there will be various social media sites/channels. In order to optimize the crawls, we nominate the social media separately. In addition to the team sites, we also chose a limited number of news media sites where the coverage of the Olympics seemed segregated from the rest of the site.”
Nicola Bingham of the British Library, one of the project coordinators, emphasizes additional contributions to this project. “The IIPC 2014 Winter Olympics project is being supported by the Internet Archive who are crawling the seeds (sites) and the University of North Texas who are supporting the nomination tool. A common subject scheme is being used to categorize websites according to producer type and Olympic sport. Crawling began in mid December 2013, and to-date 745 seeds have been nominated by 17 IIPC member institutions.”
“The Internet Archive has taken on the role of crawling, without which the project would have been much more difficult. Many other IIPC members would not have been able to perform the crawling, not necessarily for technical reasons but due to legal and/or political considerations.” For more about the web archiving process, see the IIPC “About Archiving” page.
As stated on the group’s Access Working Group page, “It is hoped that the project will enable institutions to continue to experiment with tools and processes that facilitate collaborative definition, collection and accessibility of web data.”
Over the next year or so, the IIPC will be working on creating wider access to all these Olympic archives. For the latest updates on this and other IIPC projects, follow @netpreserve.