U.S. Government, Elections, and Web Archiving, Oh My!

The following is a guest post by Abbie Grotke, Web Archiving Team Lead.

I continue to be reminded that we’re extremely lucky in the digital preservation community to have a wide range of partners and collaborators from a diverse set of organizations to work with, both in the United States and globally. With web archiving particularly, because this field is still relatively new in the big scheme of things, and the web is so very large, we often work together to accomplish projects that could not otherwise be done by just one institution. The various skills and interests that each partner brings to the table benefit us all, and while we have many LC-only web archive projects going on, working on these collaborative efforts is increasingly a part of my job these days.

Beta site of the End of Term 2008-2009 archive.

After participating in a productive meeting last week with the Coordinating Committee and fellow co-chairs of working groups of the National Digital Stewardship Alliance, I attended the CNI Spring meeting this week in Baltimore, Maryland. Not only did I hear about some exciting projects and activities, I gave a status report on the End of Term Government Web Archive. The slides from this presentation are here.

In those slides, you’ll see that Kathleen Murray from the University of North Texas Libraries co-presented with me, speaking about an IMLS-funded project that uses some of the government websites we archived. The project is titled “Classification of the End-of-Term Archive: Extending Collection Development to Web Archives.” Kathleen talked about their classification of the EOT Archive, which involves both structural analysis and human analysis. The work is further described on their project website.

SuDoc Classification Findings from Murray's presentation at CNI.

In terms of the public access side of the EOT archive, our project team still has some work to do. The 2008-2009 archive is still in beta, but feel free to take a look. Come summertime, we’ll be drafting volunteers to help us build the 2012 archive. And if you happen to be a member of ALA GODORT and have received the Spring 2012 (Vol. 40, No. 1) issue of DttP: Documents to the People, look for the just-published article about the End of Term project. It was jointly written by yours truly, Tracy Seneca from California Digital Library, Cathy Hartman from University of North Texas Libraries, and Kris Carpenter from Internet Archive. (Note: It’s unfortunately not currently available online for non-members.)

Talking about End of Term also gives me a chance to plug another collaborative web archive, related in that similar partners are involved. In a previous post, I described the kick-off of our U.S. Election archiving project here at the Library of Congress. We are still busily archiving presidential campaign websites, and are starting to add congressional sites now that more of the primaries are on the calendar.

We’re excited to be collaborating with some of our fellow International Internet Preservation Consortium members to select and archive content related to the elections, beyond what we’re doing with campaign sites here at LC. This collaborative Election 2012 project focuses on websites produced by non-profits, academic institutions, and individuals, including blogs and fact-checking sites.

So far, subject experts at the Harvard Kennedy School in areas such as political science and public policy are identifying relevant websites for long-term preservation. Other experts at the University of North Texas and California Digital Library may also contribute URLs to this archive. The Internet Archive is performing weekly crawls on this content. To date, over 120 URLs have been identified for collection in the collaborative project.

If you were trying to document the 2012 Election on the web for future researchers, what type of sites would you include?

 
