
Bar graph and line graph
Campaign type and date distribution of target websites in metadata.csv, a file that lists all of the candidate websites that have been collected as part of the United States Elections Web Archive and which are expected to be indexed in this data package’s CDX index files. Links to these specific data sources can be found at the bottom of the post.

New U.S. Elections Web Archive Data Resources Available


This blog post was guest-authored by Rachel Trent, Senior Digital Collections Data Librarian.


For nearly twenty-five years, the Library of Congress has been archiving campaign websites for Presidential, Congressional, and gubernatorial elections. Back in 2022, we released a dataset of index files for the United States Elections Web Archive, and we are happy to announce that this dataset is being relaunched as a data package on data.labs.loc.gov/packages/, with new resources to help researchers understand and use the data.

The new data package includes enhanced documentation explaining the contents of the dataset and how it was created, as well as metadata for candidate campaign sites extracted from the United States Elections Web Archive. The general election seasons from 2000-2016 are currently available, with more recent data to be added later. As before, the data includes index files (CDX file format) rather than the archived web content itself. These index files list archived document URLs and help users automatically construct URLs for fetching the archived web documents. For help getting started, a Python notebook demonstrates the basics of using the dataset, including how to query the metadata, filter and download CDX files, and analyze the text.
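To give a rough sense of how an index line can become a fetchable URL, here is a minimal sketch in Python. It is not taken from the notebook. It assumes the common space-delimited, 11-column CDX layout and a replay prefix of `https://webarchive.loc.gov/all`; both are assumptions to check against the data package's documentation. The sample line and the helper names `parse_cdx_line` and `build_replay_url` are made up for illustration.

```python
# Assumed column order for a space-delimited CDX line (the common 11-column
# layout). Check the header line of the actual CDX files before relying on it.
CDX_FIELDS = [
    "urlkey", "timestamp", "original", "mimetype", "statuscode",
    "digest", "redirect", "metatags", "length", "offset", "filename",
]

# Assumed replay prefix for archived documents; confirm it in the dataset docs.
REPLAY_PREFIX = "https://webarchive.loc.gov/all"


def parse_cdx_line(line):
    """Split one CDX line into a dict keyed by the assumed field names."""
    values = line.strip().split(" ")
    if len(values) != len(CDX_FIELDS):
        raise ValueError(
            f"expected {len(CDX_FIELDS)} fields, got {len(values)}"
        )
    return dict(zip(CDX_FIELDS, values))


def build_replay_url(record, prefix=REPLAY_PREFIX):
    """Join the prefix, capture timestamp, and original URL into one URL."""
    return f"{prefix}/{record['timestamp']}/{record['original']}"


# A made-up sample line, not a real record from the dataset.
sample = (
    "com,example)/ 20001107120000 http://www.example.com/ text/html 200 "
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567 - - 1234 5678 example.warc.gz"
)

record = parse_cdx_line(sample)
print(build_replay_url(record))
```

In practice you would likely filter records first, for example by `mimetype` or `statuscode`, and only then build URLs for the documents you want to fetch.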

We love hearing how people are using our datasets! If you want to tell us about what you’re working on, ask a question, or send feedback, contact the Web Archiving Program at [email protected] or submit a ticket on GitHub!

 

The data visualizations at the top of the page were created from two United States Elections Web Archive data .csv files: one for date distribution and the other for campaign type.
