IIPC @10

Late in April, in Ljubljana, Slovenia, the International Internet Preservation Consortium gathered for its annual General Assembly. This year is the 10th anniversary of the organization, and we marked the milestone by reflecting on our past accomplishments and thinking about how the members could work together to make positive and lasting impacts on the field of web archiving.

Members of the IIPC at the 2013 General Assembly

The mission of the IIPC is “to acquire, preserve and make accessible knowledge and information from the Internet for future generations everywhere, promoting global exchange and international relations.” The original vision was of collections built with common tools and practices that would enable interoperable access across systems and borders. This vision is still at the core of what the IIPC is trying to accomplish, and many strides have been made toward this goal.

At this year’s GA, several discussions centered on how the IIPC can keep making significant contributions to cultural heritage and remain relevant in a fast-changing web world. In addition to noting achievements, the members suggested how the Steering Committee might guide the organization over the next 10 years. Below are the major themes that emerged from these discussions over the week of the General Assembly.

Support Common Tools & Standards

The most obvious achievements of the IIPC are the open source tools for harvesting, processing and navigating web archives: Heritrix, Wayback and WARC Tools. These are the foundational tools used by most of the members, and by a larger worldwide community, for both commercial and preservation purposes. Members absolutely depend on these tools, and web archiving practice has been built around them. The tools have always been open source, but they are now being moved to GitHub so that a broader base of developers can contribute to, maintain and improve the code. The IIPC also developed WARC, the ISO preservation standard for web archives.

Looking toward the next 10 years, members recognize that the current tools need to evolve or change to capture the web as it develops. The IIPC must continue to support new tools such as the Memento Aggregator and the Live HTTP Proxy harvester. A diversity of tools for collecting and preserving web sites will be advantageous for future uses, because different tools will be able to collect different parts of the web. Some thinking about the need for evolving tools was presented at an in-depth discussion on the future of web capture, hosted by David Rosenthal of Stanford University and Kris Carpenter of the Internet Archive.

Build the Collection

The collection built by IIPC members is truly massive and global. Over a petabyte of web sites has been captured and indexed. Entire domains from many countries have been preserved; the newest country to have the legal authority to preserve its whole domain is the United Kingdom. This content will be of great interest to current and future researchers.

IIPC members touring the National and University Library of Slovenia

During the GA, an open conference titled “Scholarly Use of Web Archives: Progress, Requirements and Challenges” was held, where researchers who currently use web archives in their work shared their methods, interests and (sometimes) frustrations with members. No two researchers seem to want the same thing out of web archives.

Sophie Gebeil of Aix-Marseille Université presented her research into how African immigration is discussed in the French web domain, analyzing the contents of web sites much like any other documentary evidence. Megan Dougherty of Loyola University Chicago, by contrast, explained that she is most interested in specific features and design elements on web pages and how those change over time, not necessarily in the intellectual content of the site. Ditte Laursen, a researcher in Denmark, explained that she needed to capture second-by-second changes on social media and television web sites for her work.

IIPC members hold web archive content that is of great interest to researchers, but providing access to those collections is often a challenge. Two important researcher-led initiatives to build access tools and establish methodologies for building research corpora were presented; both hold great promise for bridging the gap between those who collect and preserve web archives and those who use them.

Build the Community

The IIPC began in 2003 with 12 members, and tool development was its original focus. Standards and best practices for web archiving grew with the organization. The IIPC has grown to include 44 members, all willing to share best practices and to develop tools and resources for the global cultural heritage community. It is the primary resource for organizations that are just starting web archiving programs, and it is a venue for organizations with mature programs that want to advance the field. The work has been truly collaborative: members have worked on projects and tools that met their local needs while contributing to the tools, standards and practices of the web archiving domain. The unique quality of the collaboration has been its focus on shared practice rather than organizational differences, and this focus has produced an international resource of expertise in web archiving.

In recent years, the IIPC established an Education and Training program that funds professional development workshops and sponsors a PhD student at the University of North Texas Information School for special studies in web archiving. This effort, together with the IIPC’s outreach and awareness projects, is important to building the web archiving community and maturing the field. The IIPC as an organization is also maturing with the addition of its first full-time employee, Mary Pitt of the British Library, who will serve as its Program and Communications Officer.

***

The National and University Library of Slovenia was a perfect host for the 2013 IIPC General Assembly. For more details about what was shared over the week, see the detailed summary of the GA by Ahmed AlSum of Old Dominion University. Rosalie Lack of the California Digital Library also shared her impressions.

All presentations will be posted on the IIPC website.

What Do Researchers Want From Institutions that Preserve Digital Content?

A smart-alecky way to answer the question in the title above would be: “Why, everything, of course.” But we don’t traffic in snark here, at least not intentionally. User expectations influence so much of what stewardship organizations do. We collect and preserve all content primarily to support use, but the issue is especially important in …

Reality Check: What Most People Actually Do with Their Personal Digital Archives

While Noah Lenstra was working on a website about African-American history in Champaign-Urbana, Illinois, many of the people he met at local public libraries, churches and businesses told him they had personal and family memorabilia they wanted to digitize, or they had digital stuff that they didn’t know what to do with. Lenstra, a PhD student …

Open Data and Preservation

Yesterday, May 9, 2013, the U.S. government issued an executive order and an open data policy mandating that federal agencies collect and publish new datasets in open, machine-readable, and, whenever possible, non-proprietary formats.  The new policy gives agencies six months to create an inventory of all the government-produced datasets they collect and maintain; a list …

Fifty Digital Preservation Activities You Can Do

The following is a guest post by Tess Webre, former intern with NDIIPP at the Library of Congress. Preservation Week 2013 might be over, but digital preservation must go on every week of the year. In truth, preservation is an ongoing, long-lasting process that requires active management. Don’t despair, though. I have some helpful suggestions to …

Historicizing the Digital for Digital Preservation Education: An Interview with Alison Langmead and Brian Beaton

In this installment of the NDSA innovation working group’s ongoing series of innovation interviews I talk with Alison Langmead and Brian Beaton about the approach they are taking to teaching Digital Preservation at the University of Pittsburgh. Alison holds a joint appointment in the Department of the History of Art and Architecture and the School …

New Video: Digital Preservation at the Library of Congress’s Packard Campus for Audio Visual Conservation

We produce occasional short videos related to digital preservation. These videos address such topics as personal digital archiving, adding descriptions to digital photographs and the K-12 Web Archiving program, to name a few. Our newest video profiles one of the Library of Congress’s most magnificent treasures: the Packard Campus for Audio Visual Conservation, located in …

Before You Were Born: Image Digitization, a Personal Reminiscence

Image scanning of one sort or another has been in common usage in some industries since the 1920s. Yes, really, the 1920s. The news wire services used telephotography — where images are captured using photo cells and transmitted over phone lines — well into the 1990s.  Scanners and digital cameras like those we are familiar …

The Content Matters Interview Series: Dr. Sylvia Chou of the National Cancer Institute

The following is a guest post by Christie Moffatt, Manager, Digital Manuscripts Program, History of Medicine Division, National Library of Medicine. In this installment of the “Content Matters” series of the National Digital Stewardship Alliance Content Working Group, I interview Dr. Sylvia Chou, PhD, MPH, Program Director of the National Cancer Institute’s Health Communication and …