Happy Birthday to LCWA! Celebrating the 20th Anniversary of Web Archiving at the Library of Congress.

Today’s guest post is from Abbie Grotke, who is Lead Librarian, Web Archiving Team in the Digital Content Management Section of the Library of Congress.  

2020 marks a special occasion for the Library of Congress – our anniversary of 20 years of web archiving!

Remember the year 2000? Back when we all breathed a collective sigh of relief that Y2K didn’t crash our systems? I remember it well, partly because I was working in nearly the same office as I do now (well, not at the moment, since I’m working from home). I was in my early years at the Library of Congress, transitioning from digitization projects to other types of digital library work. Little did I know that an activity was brewing in another part of the Library that would ultimately change the course of my career when I joined the project team a few years later.

Figure 1. MINERVA: Mapping the Internet Electronic Resources Virtual Archive.

It was in 2000 that the Library of Congress embarked on a web preservation pilot project, which eventually became the Library’s web archiving program. An acronym for the pilot program was coined to align with a beautiful mosaic in the Jefferson Building: “MINERVA: Mapping the Internet Electronic Resources Virtual Archive” (you can see an early capture in figure 1). From our records, the pilot program activities began around this time in 2000, and after some early test crawls, the first collection, related to Election 2000, began in August 2000. As at many national libraries, an election archive was a natural first collection, since many campaign websites tend to disappear.

The early pilot efforts (figure 2) were well documented in two reports written in 2001 by project consultant William Arms, who at the time was at Cornell University: an interim report issued in January and a final report in September. The reports reflected the project team’s experiences, outlined progress and outcomes of the pilot, and included recommendations for the Library to consider as it embarked on collecting this new form of content. The reports go into detail about topics familiar to seasoned and just-getting-started web archivists alike: selection and collection policies, potential uses for scholarship and research, information discovery, copyright and legal issues, long-term preservation, and suggestions regarding the development of a production system to accomplish the work.

Figure 2. Early capture of the Web Preservation Project Pilot.

Digging around for links in our archive to the reports, I was also reminded of some articles describing the early efforts that were posted on an early version of our website: “Election 2000, as It Happened: Library and Alexa Announce Election Web Archives” from the Library of Congress Information Bulletin (July–August 2001). This RLG News article was written by Arms and members of the pilot team: “Collecting and Preserving the Web: The Minerva Prototype” (bonus: you can see evidence of our inability to get all of the images in those early days). This presentation (downloadable PPT file) about the pilot is also fun to flip through given what we know now in 2020.

We have packed a lot to be proud of into the past 20 years. While we were hoping to have a big event and party this spring to celebrate (that is on hold for obvious reasons), we are moving to a virtual celebration of the program for now. Stay tuned on the Signal for highlights, stories, and people who were involved in the early years of the program, and some of the accomplishments of our first 20 years. We’ll also be on Twitter: look for special #WebArchiveWednesday tweets on @librarycongress and @LC_Labs that will feature content and tales from the archive.

Figure 3. Over the last 20 years, web archiving has grown dramatically: the program now holds more than 2,000 TB (over 2 petabytes) of data.

Comments (8)

  1. congratulations – virtual champagne all round. (Does it have to be alcohol-free until the program turns 21 next year?)

  2. Congratulations to the entire Web Archiving Team (WAT)!! You all are amazing and your work is appreciated!

  3. Congratulations! We all appreciate your dedication and determination!

  4. Congratulations Abbie and team! You definitely have a lot to be proud of.

  5. I remember attending the first meetings at the start of this program and being fascinated. It has been awe-inspiring watching the project grow and mature. Congrats Abbie and all of the wonderful people who have been involved.

  6. Thank you all! It has certainly been a team effort with many wonderful staff involved over the years.

    Danna, we may need to interview you to get some of your early memories recorded!

  7. Let me join the chorus of praise for the orderly development of a complex and valuable endeavor, building a collection that will benefit researchers for . . . um, well, forever. Abbie and her colleagues did and continue to do a bang-up job!

  8. Here’s to our fabulous web archiving team. It must be gratifying to have contributed so much to a new and exciting field.
