Happy Birthday to LCWA! Celebrating the 20th Anniversary of Web Archiving at the Library of Congress.

Today’s guest post is from Abbie Grotke, who is Lead Librarian, Web Archiving Team in the Digital Content Management Section of the Library of Congress.  

2020 marks a special occasion for the Library of Congress: 20 years of web archiving!

Remember the year 2000? Back when we all breathed a collective sigh of relief that our systems didn't crash because of Y2K? I remember it well, partly because I was working in nearly the same office I work in today (well, not at the moment, since I'm working from home). I was in my early years at the Library of Congress, transitioning from digitization projects to other types of digital library work. Little did I know that an activity was brewing in another part of the Library that would ultimately change the course of my career when I joined the project team a few years later.

Figure 1. MINERVA: Mapping the Internet Electronic Resources Virtual Archive.

It was in 2000 that the Library of Congress embarked on a web preservation pilot project, which eventually became the Library's web archiving program. An acronym describing the pilot program was coined to match a beautiful mosaic in the Jefferson Building: "MINERVA: Mapping the Internet Electronic Resources Virtual Archive" (you can see an early capture in figure 1). According to our records, the pilot program activities began around this time in 2000, and after some early test crawls, the first collection, related to Election 2000, began in August 2000. As for many national libraries, elections were a natural first collection, since campaign websites tend to disappear.

The early pilot efforts (figure 2) were well documented in two reports written in 2001 by project consultant William Arms, who at the time was at Cornell University: an interim report issued in January and a final report in September. The reports reflected the project team's experiences, outlined the progress and outcomes of the pilot, and included recommendations for the Library to consider as it embarked on collecting this new form of content. The reports go into detail about topics familiar to web archivists, seasoned and just getting started alike: selection and collection policies, potential uses for scholarship and research, information discovery, copyright and legal issues, long-term preservation, and suggestions for developing a production system to accomplish the work.

Figure 2. Early capture of the Web Preservation Project Pilot.

While digging around in our archive for links to the reports, I was also reminded of some articles about the early efforts that were posted on an early version of our website, including "Election 2000, as It Happened: Library and Alexa Announce Election Web Archives" from the Library of Congress Information Bulletin (July–August 2001). This RLG News article was written by Arms and members of the pilot team: "Collecting and Preserving the Web: The Minerva Prototype" (bonus: you can see evidence of our inability to capture all of the images in those early days). This presentation (downloadable PPT file) about the pilot is also fun to flip through, given what we know now in 2020.

We have packed a lot to be proud of into the past 20 years. We were hoping to celebrate with a big event and party this spring, but since that is on hold for obvious reasons, we are moving to a virtual celebration of the program for now. Stay tuned to The Signal for highlights, stories about the people who were involved in the early years of the program, and some of the accomplishments of our first 20 years. We'll also be on Twitter: look for special #WebArchiveWednesday tweets on @librarycongress and @LC_Labs featuring content and tales from the archive.

Figure 3. Web archiving has grown dramatically over the last 20 years; the program now holds more than 2,000 TB (over 2 petabytes) of data.

More Open eBooks: Routinizing Open Access eBook Workflows

This is a guest post by Kristy Darby, a Digital Collections Specialist in the Digital Content Management Section in Library Services. We are excited to share that anyone anywhere can now access a growing online collection of contemporary open access eBooks from the Library of Congress website. For example, you can now directly access books […]

PDF is Here to Stay: Archiving with the Portable Document Format

Today’s guest post is from Kate Murray (Digital Projects Coordinator, Digital Collections Management and Services Division, Library of Congress), Duff Johnson (Executive Director, PDF Association / ISO Project Leader, ISO 32000), and Kevin De Vorsey (Senior Electronic Records Policy Analyst, Records Management Policy and Standards, National Archives and Records Administration). PDF in the Federal Archiving […]

New Collaboration between LC Labs, British Library, and the Zooniverse

Announcing preliminary details for an Arts & Humanities Research Council UK-US Partnership Development Grant awarded jointly to the British Library, the Zooniverse, and the Library of Congress. The project is titled "From crowdsourcing to digitally-enabled participation: the state of the art in collaboration, access, and inclusion for cultural heritage institutions." Several opportunities to participate are described.

Machine Learning + Libraries Summit: Event Summary now live!

The Machine Learning + Libraries Summit Event Summary is now available as a downloadable report on labs.loc.gov. This document includes more detailed information about the conference proceedings. It broadly summarizes recurring themes of discussion and compiles the outputs of the small group activities.

Computing Cultural Heritage in the Cloud Quarterly Update

This is a guest post from LC Labs Senior Innovation Specialist Laurie Allen. This is the second post in a series where we are sharing experiences from the Andrew W. Mellon-funded Computing Cultural Heritage in the Cloud. The series began with an introductory post.  Learn about the grant on the experiments page, and see the […]

LC Labs Letter January 2020

LC Labs Letter: A Monthly Roundup of News and Thoughts from the Library of Congress Labs Team. The Computing Cultural Heritage in the Cloud Project is HIRING! Come join the Mellon-funded Computing Cultural Heritage in the Cloud Project as one of two digital scholarship specialists! The positions will be funded for three years and will […]