IIPC @10

Late in April, in Ljubljana, Slovenia, the International Internet Preservation Consortium gathered for its annual General Assembly. This year is the 10th anniversary of the organization, and we marked the milestone by reflecting on our past accomplishments and thinking about how the members could work together to make positive and lasting impacts on the field of web archiving.

Members of the IIPC at the 2013 General Assembly

The mission of the IIPC is “to acquire, preserve and make accessible knowledge and information from the Internet for future generations everywhere, promoting global exchange and international relations.” The original vision included collections built with common tools and practices that would enable interoperable access across systems and borders. This vision is still at the core of what the IIPC is trying to accomplish, and many strides have been made toward this goal.

At this year’s GA several discussions centered on how the IIPC can keep making significant contributions to cultural heritage and remain relevant in a fast-changing web world. In addition to noting achievements, the members suggested to the Steering Committee how to guide the organization over the next 10 years. Below are the major themes that came out of these discussions during the week of the General Assembly.

Support Common Tools & Standards

The most obvious achievements of the IIPC are open source tools for harvesting, processing and navigating web archives: Heritrix, Wayback and WARC Tools. These are the foundational tools used by most of the members and by a larger worldwide community for commercial and preservation purposes. Members depend on these tools, and web archiving practice has been built around them. The tools have always been open source, but they are currently being moved to GitHub so a broader base of developers can contribute to, maintain and improve the code. The IIPC also developed the ISO preservation standard for web archives, WARC.

Looking toward the next 10 years, members recognize that the current tools need to evolve or change to capture what will be the future of the web. The IIPC must continue to support new tools like the Memento Aggregator and the Live HTTP Proxy harvester. A diversity of tools to collect and preserve web sites will be advantageous to future users because different tools will be able to collect different parts of the web. Some thinking about the need for evolving tools was presented at the in-depth discussion about the future of web capture hosted by David Rosenthal of Stanford University and Kris Carpenter of the Internet Archive.

Build the Collection

The collection built by IIPC members is truly massive and global. Over a petabyte of web sites has been captured and indexed. Entire domains from many countries have been preserved; the newest country to have the legal authority to preserve its whole domain is the United Kingdom. This content will be of great interest to current and future researchers.

IIPC members touring the National and University Library of Slovenia

During the GA an open conference was held titled “Scholarly Use of Web Archives: Progress Requirements and Challenges,” where researchers who currently use web archives in their work shared with members their methods, interests and (sometimes) frustrations. No two researchers seem to want the same thing out of web archives.

Sophie Gebeil of Aix-Marseille-Université presented her research into how African immigration is discussed in the French web domain, analyzing the contents of web sites much like other documentary evidence. Megan Dougherty of the Loyola University of Chicago, by contrast, explained that she is most interested in specific features and design elements on web pages and how those change over time, not necessarily the intellectual content on the site. Ditte Laursen, a researcher in Denmark, explained that she needed to capture second-by-second changes on social media and television web sites for her work.

IIPC members have web archive content that is of great interest to researchers, but providing access to those collections is often a challenge. Two important researcher-led initiatives were shared, one to build access tools and one to establish methodologies for building research corpora. Both hold great promise in bridging the gap between those who collect and preserve web archives and those who use them.

Build the Community

The IIPC began in 2003 with 12 members. Tool development was the original focus. Standards and best practices for web archiving grew with the organization. The IIPC has grown to include 44 members, all willing to share best practices and develop tools and resources for the global cultural heritage community. It is the primary resource for organizations that are just starting web archiving programs, and it is a venue for organizations with mature web archiving programs that want to advance the field. The work has been truly collaborative, as members worked on projects and tools that met their local needs while contributing to the tools, standards and practices of the web archiving domain. The unique quality of the collaboration has been the focus on shared practice rather than organizational differences. That focus has resulted in an international resource of expertise in web archiving.

In recent years the IIPC established an Education and Training program to fund professional development workshops and to sponsor a PhD student at the University of North Texas Information School for special studies in web archiving. This effort, coupled with the outreach and awareness projects the IIPC has taken on, is important to building the community for web archiving and maturing the field. The IIPC as an organization is also maturing with the addition of its first full-time employee, Mary Pitt of the British Library, who will serve as both the Program and Communication officer.


The National and University Library of Slovenia was a perfect host for the 2013 IIPC General Assembly. If you want to know more details about what was shared over the week, see the detailed summary of the GA by Ahmed AlSum of Old Dominion University. Rosalie Lack of the California Digital Library also shared her impressions.

All presentations will be posted on the IIPC website.

Comments (2)

  1. Building research corpses? Dr. Frankenstein in the library? 🙂

    • Nope! Though I’m sure Dr. Frankenstein would have a fascinating web archive. Sorry about that auto-correct error. Corpses should be corpuses or corpora, depending on your preference.
