Library’s Web Archiving: COVID-19 Challenges

This is a guest post by Joe Puccio, a collection development officer in the Collection Development Office.

The COVID-19 pandemic has presented challenges to the Library’s web archiving program not seen since the terrorist attacks against the U.S. on Sept. 11, 2001. The program had just begun in 2000, and the Library rushed to pull together online material from all across the country after the attacks. The resulting archive is part of the Library’s permanent collection.

Since then, the web archiving program has collected an enormous amount of material (more than two petabytes of data and over 21 billion files), primarily in event- or theme-based collections that are proposed, approved and set up in a process that can take several weeks to complete.

In mid-March, the Library’s reading rooms were closed and practically all staff began teleworking because of COVID-19. By that point, the Library was already capturing pandemic web content, although no formal collection plan was in place. In addition, because the Library is a member of the International Internet Preservation Consortium, our staff nominated sites for the consortium’s global Novel Coronavirus collection. By May, those same staff had also recommended a substantial number of sites to be harvested for the Library’s own collection, more than 75 percent of them from outside the U.S.

Several things then became obvious.

First, we were rapidly expanding the number of sites to be collected, but still without a full collecting plan, as events were moving so fast. Second, the Library was approaching its crawling capacity under its current web harvesting contract. Third, there was growing uncertainty regarding the Library’s next annual budget, which had not yet been appropriated but was set to take effect on Oct. 1. Fourth, many other groups across the U.S. had already started COVID-19 archiving projects.

The Collection Development Office and the Web Archiving Team of the Digital Collections Management and Services Division developed a collecting proposal that took into account both the scope and funding of the project. Robin Dale, the associate librarian for Library Services, approved the plan in mid-June.

The plan has three primary objectives that are being carried out by a collection team led by subject experts from the Science, Technology and Business Division. The first objective is to fill major gaps in our pandemic web collection. The second is to determine high-priority sub-topics within the U.S. The final objective is to better identify and organize material we’ve already collected.

The team has been highly selective regarding new nominations, with a primary focus on the U.S. The team is also planning for the eventual public launch of the collection, which has a working title of the “Coronavirus Web Archive.” Since the Library’s web archives program observes a one-year embargo on harvested content, that collection will likely be made fully available in the latter half of 2021. Small parts of it will be available before the full launch.

The goal is to have a well-balanced collection of archived pandemic-related websites that will be preserved and made accessible to the Library’s users. Subject areas will include government information, social and cultural impacts, scientific material, personal narratives and everyday life. Examples of sites being collected:

Eventually, the Library will have materials in its collections on this world-changing pandemic in a variety of formats, both physical and digital. Web archives will be prominent among them. If you’d like to submit your own collection of COVID-19-related photographs, you can apply to this Library program.


Johannes Kepler and COVID-19: 400 Years of Mathematical Modeling

In 1619, German astronomer Johannes Kepler wrote “Harmonices Mundi” (“Harmony of the Worlds”), a book that explored, among other subjects, the geometry of polyhedral designs. Four centuries later, the same designs are seen in the building blocks of the virus that causes COVID-19. The Library has copies of Kepler’s work in the Rare Books and Special Collections Division.