Exploring Web Archiving at the Library of Congress

The following is a guest post by Samantha Abrams, an intern for the Web Archiving Team at the Library of Congress.

Madison, Wisconsin’s Lake Mendota, where the iSchool at UW-Madison sits. Credit: Samantha Abrams

As a library school graduate student, I developed an interest in archives and born-digital objects (content pulled from floppy disks, web pages, Tweets, and so on), but I lacked practical, professional experience working with these materials. After my time interning with the Web Archiving Team at the Library of Congress, I am confident in my exposure to a subset of digital materials and to the professional world of web archives: its relationships, its openness, its complexities.

The Library’s Web Archiving Team works to manage and preserve at-risk digital content born from the web: web pages and, yes, social media, among other content. The team considers the task of archiving the web from every angle: working with software, like OpenWayback, and developing tools to assist with crawls; considering copyright issues; and building collections that help paint a comprehensive picture of the web as it stands today (or, as it stood yesterday).

Abrams’ “RAD!” notes from a web archiving meeting. Credit: Samantha Abrams

In four weeks, I have learned about the ins and outs of what web archiving really is (and what it can be). At a recent meeting, we discussed the look, and feel, and design of the collections: how can we keep users focused as they interact with the massive collection, yet allow them to discover both related and unrelated content while introducing them to the web of the past? I have spent time cleaning up data in preparation for migration to a new curator tool. And, in what will be my final project with the Library, I have helped lay the groundwork for a Business in America Web Archive. It has been a process of learning and asking questions: web archiving is an emerging and changing field, and the way professionals consider its quirks and processes requires constant readjustment and creative thinking. To be on a team so interested in following those changes as they occur has been as challenging as it has been rewarding.

I have also spent time at the Library getting to know the archival profession on an individual level: person to person, process to process, idea to idea. Early on in my time here, I reached out to archivist Kathleen O’Neill and asked her if she would be willing to explain the way the Manuscript Division handles the acquisition and processing of born-digital materials. She introduced me to software the Division uses to access content on tangible media, and spoke about the ethical questions this process often raises. For instance: how do archivists handle uncovering once-deleted files stored on tangible media? I’ve also spoken with Andrew Cassidy-Amstutz, an archivist with the Veterans History Project, and he spoke openly about the Project’s goals: reaching out to veterans, and seeking very specific content, which, in turn, leads to a workflow focused on processing digital items in bulk, and pulling as much content as possible, as quickly as possible, from the media donated to the Project. All of my questions have been answered eagerly, with thoughtful recommendations including: You know who you should talk to next about this? You know what I once read about this exact question? Have you heard of this archivist, with this institution? You should reach out to them. And so on.

And this, I have realized, has been the most rewarding experience of my time with the Library. I have been introduced to an institution filled with connected, passionate individuals, eager to share their knowledge with those interested in asking about it. The people I have met here have helped introduce me to the archival world as a whole: the way we stand connected, bound by our interest in the same field, in its materials, and in its people. And just like the rest of the Library, the Web Archiving Team is composed of talented individuals, interested in sharing what they know. And these individuals, in turn, contribute to an archival profession that is vast, far-reaching, and eager to share.
