Library’s Web Archiving: COVID-19 Challenges

This is a guest post by Joe Puccio, a collection development officer in the Collection Development Office.

The COVID-19 pandemic has presented challenges to the Library’s web archiving program not seen since the terrorist attacks against the U.S. on Sept. 11, 2001. The program had just begun in 2000, and the Library rushed to pull together online material from all across the country after the attacks. The resulting archive is part of the Library’s permanent collection.

Since then, the web archiving program has collected an enormous amount of material (more than two petabytes of data and over 21 billion files), primarily in event- or theme-based collections that are proposed, approved and set up in a process that can take several weeks to complete.

In mid-March, the Library’s reading rooms were closed and practically all staff began teleworking because of COVID-19. By that point, the Library was already capturing pandemic web content, even though no formal collection plan was in place. In addition, since the Library is a member of the International Internet Preservation Consortium, there was a desire to suggest sites for its global Novel Coronavirus collection. Our staff nominated sites for that effort. By May, those same staff had also recommended a substantial number of sites to be harvested for the Library’s collection, with more than 75 percent from outside the U.S.

Several things then became obvious.

First, we were rapidly expanding the number of sites to be collected, but still without a full collecting plan, as events were moving so fast. Second, the Library was approaching its crawling capacity under its current web harvesting contract. Third, there was growing uncertainty regarding the Library’s next annual budget, which had not yet been appropriated but was set to take effect on Oct. 1. Fourth, many other groups across the U.S. had already started COVID-19 archiving projects.

The Collection Development Office and the Web Archiving Team of the Digital Collections Management and Services Division developed a collecting proposal that took into account both the scope and funding of the project. Robin Dale, the associate librarian for Library Services, approved the plan in mid-June.

The plan has three primary objectives that are being carried out by a collection team led by subject experts from the Science, Technology and Business Division. The first objective is to fill major gaps in our pandemic web collection. The second is to determine high-priority sub-topics within the U.S. The final objective is to better identify and organize material we’ve already collected.

The team has been highly selective regarding new nominations, with a primary focus on the U.S. The team is also planning for the eventual public launch of the collection, which has a working title of the “Coronavirus Web Archive.” Since the Library’s web archives program observes a one-year embargo on harvested content, that collection will likely be made fully available in the latter half of 2021. Small parts of it will be available before the full launch.

The goal is to have a well-balanced collection of archived pandemic-related websites that will be preserved and made accessible to the Library’s users. Subject areas will include government information, social and cultural impacts, scientific material, personal narratives and everyday life. Examples of sites being collected:

Eventually, the Library will have materials in its collections on this world-changing pandemic in a variety of formats, both physical and digital. Web archives will be prominent among them. If you’d like to submit your own collection of COVID-related photographs, you can apply to this Library program.

Comments (4)

  1. The Library and its staff are to be commended for preserving this valuable historic data. Thank you for your tireless efforts.

  2. I fully agree with the note from the previous comment-writer Cassy Ammen. From the start, the LC web-archiving effort has sought to find the best possible balance between “harvest everything” and “carefully selected sites.” In this, I think, it is a great echo and continuation of LC’s long-term collections development approach for printed matter, manuscripts, maps, and all the rest. It’s just that there is more water from this digital firehose and, alas, it is at risk of drying up faster. Bravo!

  3. Pandemic Relief… One Plié at a Time
    Spring, Fall, Summer, Winter, I am now well into the fourth season of teaching community dance classes in my backyard. A year ago last February, my world was upended by a raging pandemic. My job of forty years at Blues Alley Jazz Club vanished. Planned trips, social interactions, daily visits to the gym, family get-togethers and holiday celebrations were crossed off my calendar. I was a little lost and anxious, but still grateful for my many blessings. The need to retreat into something to regain control and help dissipate stress was evident. As I immersed myself in healthy cooking and working out, my background in dance served me well for this journey. I diligently gave myself a 45-minute dance class daily. The mention of it to a couple of my friends sparked their interest and desire for something like that, as they were missing their gyms. Luckily I have a rather large court in my backyard, and I offered for them to follow along, remaining socially distant. We labeled the class Covid Yoga, although there was very little yoga involved. I had spent 25 years teaching creative movement to preschoolers, but had very little experience teaching adults. I ventured into a hybrid class focusing on strength, balance and flexibility. Word of mouth spread, and the class quickly grew. Adirondack chairs were purchased to provide each participant with a personal ballet barre. We were quite the sight to behold under the canopy of trees lining our makeshift studio. My iPad housed a Covid Yoga playlist of memory-provoking music supplying underlying rhythms for the various movement phrases. Before each class, I would diligently use the leaf blower to clear the court of any debris or puddles. Chairs were positioned at least 6 feet apart, and yoga mats and music put in place. We were ready to go.
    Last March, when this journey began, we were quite the motley crew of middle-aged women. Most everyone in class engaged in various forms of physical fitness, but dance experience was nary to be had. My combinations striving for strength, agility and grace were met with sighs, groans and dirty looks. I cringed at the abuse of the carefully crafted relationship between music and movement. Classes commenced with our chairs pulled into a vast circle, so we could share our pandemic fears, anxiety over our political atmosphere, and our longing to spend time with our children, grandchildren, friends and family. This time together became a lifeline to normalcy and salvation.
    Honestly, I began this venture with the idea of providing a community service for my friends and neighbors. Little did I know, I would be the greatest beneficiary of these efforts. I truly had forgotten how much I loved teaching, learning, and dancing. My passion for dance had been rekindled and retooled. One year later, amidst demonstrating choreography to my willing participants, I have to stop and smile. Balances are steadied, posture is poised, limbs are graceful, and rhythms are no longer butchered. Their bodies speak back to me with confidence and strength. We are a community of women, lifting each other up, providing our own small piece of pandemic relief.

  4. The above information about the COVID-19 consortium and its challenges is very useful, as it encloses the computing potential from some of the most powerful and advanced computers in the world.
