This is a guest post from Nathan Yarasavage, a digital projects specialist in the Serial and Government Publications Division.
This week we celebrate an exciting milestone. Chronicling America, the online searchable database of historic U.S. newspapers, now includes more than 15 million pages! To mark the occasion, we are throwing a #ChronAmParty on Twitter and unveiling a set of interactive data visualizations that help reveal the variety of content available in a corpus of 15 million digitized newspaper pages.
Libraries, historical societies, and other institutions throughout the country have contributed newspapers from their collections to Chronicling America since 2005. This process is part of the National Digital Newspaper Program (NDNP), a collaborative program sponsored by the National Endowment for the Humanities (NEH) and the Library of Congress. The goal of NDNP is to digitize newspapers from all states and U.S. territories. To date, we have more than 2,800 newspapers from 46 states, Puerto Rico, and the District of Columbia.
Every year, NDNP partners select newspapers to digitize based on available microfilm and occasional surviving print copies in their own collections. Some focus their selections on a few major city titles, while others have digitized a diverse collection of smaller community newspapers. Ultimately, all state partners follow recommended selection guidelines while striving to include newspapers that represent diverse viewpoints in the retelling of their state histories.
Nearly every week we receive new newspaper pages from our partners to add to Chronicling America. Because of this ongoing expansion of the collection, our users have often asked questions about the coverage of newspapers available in the database at a given time. Knowing the scope of any digital collection helps set expectations and can help clarify questions that arise when analyzing and interpreting search results. Here we debut several different types of data visualizations describing the newspapers’ locations, dates, subjects, languages, and quantities. We hope these graphs, charts, and maps will help users better understand what is and what is not in Chronicling America. We will preview a few of the visualizations below, but head to our new Chronicling America Data Visualizations web page to read more.
Chronicling America Coverage by Time
Chronicling America Coverage by State
Each state partner approaches newspaper selection in a different manner. It is therefore no surprise when users ask questions such as, “What titles are available in my state?” While users of Chronicling America can already consult a dynamic list of titles by state, the ever-growing nature of this list makes it challenging for individuals to assess the collection’s geographic coverage. In another visualization, we extracted the information about the newspapers from the aforementioned list of titles and plotted it on an interactive map interface. As shown in the screenshot below, each dot on the map represents a location where newspapers currently available in Chronicling America were published. Users can zoom in (using the +/- controls or the mouse scroll wheel) and click on the dots to see more information and link to the digitized newspapers.
Another common question we receive is, “Do you have newspapers from [insert date] available from my state?” To help break down Chronicling America’s temporal coverage by state, we added a map to the temporal coverage visualization mentioned earlier. This allows a user to filter the temporal coverage area graph from above to show the specific dates available at a state or territory level. This information is useful in setting user expectations by showing gaps in coverage. State partners can also use these visualizations to help steer their future selection decisions (for example, by targeting gaps in the collection) if they like. The screenshot below shows coverage details for papers published in Ohio. Click on the map to see what years of newspapers are available in your state.
Chronicling America Coverage by Language and Ethnic Press
Did you know that Chronicling America also features newspapers from nineteenth- and early twentieth-century immigrant communities, including many non-English-speaking groups? As explained in a 2014 blog post by NEH, “for decades, Germans were the largest non-English-speaking immigrant group in America … the group established a pattern that other immigrant groups followed later.” This pattern is apparent when we plot non-English language page counts in a packed bubble visualization (pictured to the right). As demonstrated by the size of the bubbles, German, Spanish, and Polish papers currently have the most representation in the database, but the 18 languages currently included in the corpus also cover recent additions of papers in Arabic, Cherokee, Czech, Lithuanian, and Icelandic.
Where were these communities located? Plotting our newspaper metadata to a map interface allows users to explore language and ethnicity coverage by location. Did you know Chronicling America has Finnish newspapers from Oregon, an Arabic newspaper from Michigan, and African American publications from over 20 states? Exploring our interactive map visualizations (one pictured to the left), users can zoom in and click on areas of interest to see details about the ethnic press and non-English titles in Chronicling America.
How we did it
There are a number of tools and technologies available to create similar data demonstrations. We used Tableau Public software for creating the visualizations from open data we extracted directly from Chronicling America. Every newspaper digitized for Chronicling America has an associated MARC XML catalog record as part of the U.S. Newspaper Directory. All digitized titles contain date and location information, and some titles have special subject headings to indicate a newspaper’s intended audience and ethnic press affiliation.
Extracting this data was easy, thanks to the well-designed Chronicling America API, structured views of the data, and expert assistance from our LC colleague Chris Ehrman. Generally, we extracted the data from Chronicling America and converted it to .csv or .xlsx formats, both of which work seamlessly in Tableau Public software. We provide access to this data via the download link alongside each visualization. More information about the visualizations and links to the scripts Chris created can be found on our Chronicling America Data Visualizations web page.
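For readers who want to try something similar, here is a minimal Python sketch of the conversion step: flattening title records into a .csv file that a tool such as Tableau Public can read. It is not the script Chris wrote. The column names, the record layout, and the sample record are all hypothetical stand-ins, and the sketch assumes the title metadata has already been downloaded and parsed into simple dictionaries.

```python
import csv
import io
from typing import Iterable, Mapping, TextIO

# Hypothetical column names chosen for illustration; a real export
# from Chronicling America may use different fields.
FIELDS = ["lccn", "title", "state", "city", "start_year", "end_year", "language"]


def titles_to_csv(records: Iterable[Mapping[str, object]], out: TextIO) -> int:
    """Write one CSV row per title record and return the number of rows written."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for record in records:
        # Keep only the known columns; missing values become empty cells.
        writer.writerow({field: record.get(field, "") for field in FIELDS})
        count += 1
    return count


if __name__ == "__main__":
    # Made-up placeholder record, not real catalog data.
    sample = [
        {
            "lccn": "sn00000000",
            "title": "Example Gazette",
            "state": "Ohio",
            "city": "Exampleville",
            "start_year": 1900,
            "end_year": 1910,
            "language": "German",
        }
    ]
    buffer = io.StringIO()
    titles_to_csv(sample, buffer)
    print(buffer.getvalue())
```

Writing to any open text file instead of the in-memory buffer produces a file ready to load into a visualization tool. Keeping the column list in one place makes it easy to add fields, such as ethnic press subject headings, if your own extract includes them.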
Celebrate with Us!
To help celebrate these exciting accomplishments, please join our #15MillionPages #ChronAmParty on May 21st. Throughout the day, NDNP partners will be tweeting what is sure to be an eclectic assortment of content on the theme of #15MillionPages. Follow along and retweet our finds to your own followers, or tweet your own discoveries! Just include #ChronAmParty #15MillionPages to join in the fun. Throughout the year, we’ll continue the #ChronAmParty with a changing theme every third Tuesday of the month. We’re looking forward to celebrating with you!