Visualizing Chronicling America Data: 15 million pages of digitized historical newspapers

This is a guest post from Nathan Yarasavage, a digital projects specialist in the Serial and Government Publications Division.

This week we celebrate an exciting milestone. Chronicling America, the online searchable database of historic U.S. newspapers, now includes more than 15 million pages! To mark the occasion, we are throwing a #ChronAmParty on Twitter and unveiling a set of interactive data visualizations that help reveal the variety of content available in a corpus of 15 million digitized newspaper pages.

Screenshot of Chronicling America website homepage. — 15 million pages and counting on the Chronicling America homepage (as of 20 May 2019)

Libraries, historical societies, and other institutions throughout the country have contributed newspapers from their collections to Chronicling America since 2005. This process is part of the National Digital Newspaper Program (NDNP), a collaborative program sponsored by the National Endowment for the Humanities (NEH) and the Library of Congress. The goal of NDNP is to digitize newspapers from all states and U.S. territories. To date, we have more than 2,800 newspapers from 46 states, Puerto Rico, and the District of Columbia.

Every year, NDNP partners select newspapers to digitize based on available microfilm and occasional surviving print copies in their own collections. Some focus their selections on a few major city titles, while others have digitized a diverse collection of smaller community newspapers. Ultimately, all state partners follow recommended selection guidelines while striving to include newspapers that represent diverse viewpoints in the retelling of their state histories.

Map of the United States demonstrating participation in the National Digital Newspaper Program. — The National Digital Newspaper Program, 2008-2015

Nearly every week we receive new newspaper pages from our partners to add to Chronicling America. Because of this ongoing expansion of the collection, our users have often asked questions about the coverage of newspapers available in the database at a given time. Knowing the scope of any digital collection helps set expectations and can help clarify questions that arise when analyzing and interpreting search results. Here we debut several different types of data visualizations describing the newspapers’ locations, dates, subjects, languages, and quantities. We hope these graphs, charts, and maps will help users better understand what is and what is not in Chronicling America. We will preview a few of the visualizations below, but head to our new Chronicling America Data Visualizations web page to read more.

Chronicling America Coverage by Time

Currently, Chronicling America provides access to newspapers published between 1789 and 1963. However, because of the way Chronicling America’s collection scope has changed since 2005, the vast majority of pages currently available represent news from a narrower period spanning 1836-1922. As shown in the orange area chart (pictured below), newspaper coverage in Chronicling America is highest between 1908 and 1911. Fewer issues are available from before 1840, which is in line with trends in U.S. newspaper publishing. What may come as a surprise to some users is the dramatic drop in newspaper issues in 1923. Until a few years ago, newspapers published after 1922 were not in scope for inclusion in Chronicling America. The expansion of Chronicling America’s scope in 2016 opened the doors for new content from before 1836 and after 1922 (for newspapers not under copyright). We’ll be adding more content in the later and earlier periods as our partners continue selecting and digitizing newspapers published in these important decades.

Orange area chart of Chronicling America coverage.

Chronicling America Coverage by State

Each state partner approaches newspaper selection in a different manner. It is therefore no surprise when users ask questions such as, “what titles are available in my state?” While users of Chronicling America can already consult a dynamic list of titles by state, the ever-growing nature of this list makes it challenging for individuals to assess the collection’s geographic coverage. In another visualization, we extracted the information about the newspapers from that list of titles and plotted it on an interactive map. As shown in the screenshot below, each dot on the map represents a location where newspapers currently available in Chronicling America were published. Users can zoom in (using the +/- controls or the mouse scroll wheel) and click on the dots to see more information and link to the digitized newspapers.

Map of U.S. demonstrating newspaper titles in Chronicling America.

Another common question we receive is, “do you have newspapers from [insert date] available from my state?” To help break down Chronicling America’s temporal coverage by state, we added a map to the temporal coverage visualization mentioned earlier. This allows a user to filter the temporal coverage area chart to show the specific dates available at the state or territory level. This information is useful in setting user expectations by showing gaps in coverage. State partners can also use these visualizations to help steer their future selection decisions (for example, by targeting gaps in the collection) if they like. The screenshot below shows coverage details for papers published in Ohio. Click on the map to see what years of newspapers are available in your state.

Map and chart of coverage of newspaper issues by year of publication.

Chronicling America Coverage by Language and Ethnic Press

Bubble chart of non-English Language page counts.

Did you know that Chronicling America also features newspapers from nineteenth and early twentieth century immigrant communities, including many non-English speaking groups? As explained in a 2014 blog post by NEH, “for decades, Germans were the largest non-English-speaking immigrant group in America … the group established a pattern that other immigrant groups followed later.” This pattern is apparent when we plot non-English language page counts in a packed bubble visualization (pictured). As demonstrated by the size of the bubbles, German, Spanish, and Polish papers currently have the most representation in the database. Recent additions include papers in Arabic, Cherokee, Czech, Lithuanian, and Icelandic, which are among the 18 languages currently included in the corpus.

Where were these communities located? Plotting our newspaper metadata on a map allows users to explore language and ethnicity coverage by location. Did you know Chronicling America has Finnish newspapers from Oregon, an Arabic newspaper from Michigan, and African American publications from over 20 states? In our interactive map visualizations (one is pictured here), users can zoom in and click on areas of interest to see details about the ethnic press and non-English titles in Chronicling America.

Map of United States demonstrating Ethnic Press coverage.

How we did it

There are a number of tools and technologies available to create similar data visualizations. We used Tableau Public software to create ours from open data we extracted directly from Chronicling America. Every newspaper digitized for Chronicling America has an associated MARC XML catalog record as part of the U.S. Newspaper Directory. All digitized titles contain date and location information, and some titles have special subject headings to indicate a newspaper’s intended audience and ethnic press affiliation.
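
For readers who want to experiment with the catalog records themselves, here is a minimal sketch in Python of how date, location, and subject information might be read from a MARC XML record. It is not the script used for the visualizations. The sample record is invented for illustration, and the choice of fields (008 for dates, 752 for place, 650 for subject headings) is an assumption about where a given record stores this information; real records may differ.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARC XML ("MARC21 slim") records.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical record, made up for this example. It is not a real title.
SAMPLE_RECORD = """
<record xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="008">750824d18541972ohudr n       0   a0eng d</controlfield>
  <datafield tag="245" ind1="0" ind2="4">
    <subfield code="a">The example gazette.</subfield>
  </datafield>
  <datafield tag="752" ind1=" " ind2=" ">
    <subfield code="a">United States</subfield>
    <subfield code="b">Ohio</subfield>
    <subfield code="c">Franklin</subfield>
    <subfield code="d">Columbus</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">German Americans</subfield>
    <subfield code="z">Ohio</subfield>
    <subfield code="v">Newspapers.</subfield>
  </datafield>
</record>
"""


def subfield(datafield, code):
    """Return the text of the first subfield with this code, or None."""
    node = datafield.find(f"marc:subfield[@code='{code}']", MARC_NS)
    return node.text.strip() if node is not None and node.text else None


def summarize_record(record_xml):
    """Pull publication years, places, and subject headings from one record."""
    record = ET.fromstring(record_xml.strip())

    # In field 008, characters 07-10 hold Date 1 and 11-14 hold Date 2.
    fixed = record.find("marc:controlfield[@tag='008']", MARC_NS)
    fixed_text = fixed.text if fixed is not None and fixed.text else ""

    # Field 752 is a hierarchical place name: $b state, $c county, $d city.
    places = [
        {
            "state": subfield(field, "b"),
            "county": subfield(field, "c"),
            "city": subfield(field, "d"),
        }
        for field in record.findall("marc:datafield[@tag='752']", MARC_NS)
    ]

    # Field 650 carries topical subject headings; $a is the main term.
    subjects = [
        subfield(field, "a")
        for field in record.findall("marc:datafield[@tag='650']", MARC_NS)
    ]

    return {
        "start_year": fixed_text[7:11] or None,
        "end_year": fixed_text[11:15] or None,
        "places": places,
        "subjects": [s for s in subjects if s],
    }


if __name__ == "__main__":
    print(summarize_record(SAMPLE_RECORD))
```

A record can list more than one place of publication, which is why the sketch returns a list of places rather than a single value.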

Extracting this data was easy, thanks to the well-designed Chronicling America API, structured views of the data, and expert assistance from our LC colleague Chris Ehrman. Generally, we extracted the data from Chronicling America and converted it to .csv or .xlsx formats, both of which work seamlessly in Tableau Public software. We provide access to this data via the download link alongside each visualization. More information about the visualizations and the links to the scripts Chris created can be found on our Chronicling America Data Visualizations web page.
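
As a rough illustration of the extract-and-convert step described above, the following Python sketch turns a JSON list of titles into .csv rows that a tool such as Tableau Public could load. It is not one of the scripts Chris created. The field names (lccn, state, title) and the top-level "newspapers" key are assumptions about the shape of the title list, and the sample records are invented; check the actual API response before relying on them.

```python
import csv
import io
import json

# Columns to keep in the CSV. These names are assumed, not confirmed.
FIELDS = ["lccn", "state", "title"]

# Hypothetical payload standing in for a downloaded title list.
SAMPLE_JSON = """
{"newspapers": [
  {"lccn": "sn00000001", "state": "Ohio", "title": "The example gazette.",
   "url": "https://example.org/lccn/sn00000001.json"},
  {"lccn": "sn00000002", "state": "Oregon", "title": "The sample herald.",
   "url": "https://example.org/lccn/sn00000002.json"}
]}
"""


def titles_to_csv(payload, out):
    """Write one CSV row per title to the file-like object `out`.

    Returns the number of title rows written (the header is not counted).
    """
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    rows = payload.get("newspapers", [])
    for row in rows:
        # Keep only the chosen columns; a missing value becomes an empty cell.
        writer.writerow({name: row.get(name, "") for name in FIELDS})
    return len(rows)


if __name__ == "__main__":
    buffer = io.StringIO()
    count = titles_to_csv(json.loads(SAMPLE_JSON), buffer)
    print(f"wrote {count} rows")
    print(buffer.getvalue())
```

To write a file instead of an in-memory buffer, open it with `open("titles.csv", "w", newline="", encoding="utf-8")` and pass the file object as `out`.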

Celebrate with Us!

To help celebrate these exciting accomplishments, please join our #15MillionPages #ChronAmParty on May 21st. All throughout the day, NDNP partners will be tweeting what is sure to be an eclectic assortment of content on the theme of #15MillionPages. Follow along and retweet our finds to your own followers or tweet your own discoveries! Just include #ChronAmParty #15MillionPages to join in the fun. Throughout the year, we’ll continue the #ChronAmParty with a changing theme every third Tuesday of the month. We’re looking forward to celebrating with you!

Questions about NDNP or Chronicling America? Contact [email protected]. You can also subscribe to our recent additions feed for more content updates.

Comments

Theresa says:
May 21, 2019 at 8:06 pm

This was a great explanation of the state of LoC’s efforts. I love Chronicling America and refer it frequently to my friends exploring their family histories.
