Library’s Web Archiving: COVID-19 Challenges

This is a guest post by Joe Puccio, a collection development officer in the Collection Development Office.

The COVID-19 pandemic has presented challenges to the Library’s web archiving program not seen since the terrorist attacks against the U.S. on Sept. 11, 2001. The program had just begun in 2000, and the Library rushed to pull together online material from all across the country after the attacks. The resulting archive is part of the Library’s permanent collection.

Since then, the web archiving program has collected an enormous amount of material (more than two petabytes of data and over 21 billion files), primarily in event- or theme-based collections that are proposed, approved and set up in a process that can take several weeks to complete.

In mid-March, the Library’s reading rooms were closed and practically all staff began teleworking because of COVID-19. By that point, the Library was already capturing pandemic web content, even though no formal collection plan was in place. In addition, because the Library is a member of the International Internet Preservation Consortium, our staff nominated sites for the consortium’s global Novel Coronavirus collection. By May, those same staff had also recommended a substantial number of sites to be harvested for the Library’s own collection, with more than 75 percent from outside the U.S.

Several things then became obvious.

First, we were rapidly expanding the number of sites to be collected, but still without a full collecting plan, as events were moving so fast. Second, the Library was approaching its crawling capacity under its current web harvesting contract. Third, there was growing uncertainty regarding the Library’s next annual budget, which had not yet been appropriated but was set to take effect on Oct. 1. Fourth, many other groups across the U.S. had already started COVID-19 archiving projects.

The Collection Development Office and the Web Archiving Team of the Digital Collections Management and Services Division developed a collecting proposal that took into account both the scope and funding of the project. Robin Dale, the associate librarian for Library Services, approved the plan in mid-June.

The plan has three primary objectives that are being carried out by a collection team led by subject experts from the Science, Technology and Business Division. The first objective is to fill major gaps in our pandemic web collection. The second is to determine high-priority sub-topics within the U.S. The final objective is to better identify and organize material we’ve already collected.

The team has been highly selective regarding new nominations, with a primary focus on the U.S. The team is also planning for the eventual public launch of the collection, which has a working title of the “Coronavirus Web Archive.” Since the Library’s web archives program observes a one-year embargo on harvested content, that collection will likely be made fully available in the latter half of 2021. Small parts of it will be available before the full launch.

The goal is to have a well-balanced collection of archived pandemic-related websites that will be preserved and made accessible to the Library’s users. Subject areas will include government information, social and cultural impacts, scientific material, personal narratives and everyday life. Examples of sites being collected:

Eventually, the Library will have materials in its collections on this world-changing pandemic in a variety of formats, both physical and digital. Web archives will be prominent among them. If you’d like to submit your own collection of COVID-19-related photographs, you can apply to this Library program.

