Top of page

Newspaper Navigator Surfaces Treasure Trove of Historic Images – Get a Sneak Peek at Upcoming Data Jam!

Share this post:

Eileen Jakeway, an innovation specialist on the LC Labs team, first posted this piece to The Signal blog.  In this post, Innovator in Residence Ben Lee talks about his aspirations for engaging the American public with the millions of images he extracted from Chronicling America.  The Chronicling America historic newspapers online collection is a product of the National Digital Newspaper Program and jointly sponsored by the Library of Congress and the National Endowment for the Humanities.

Although Library of Congress buildings are closed to the public, projects like Newspaper Navigator are busy unlocking even more digital content for members of the public to access from home!

On May 7th at 2pm EST, Ben will host a virtual data jam to experiment and play with thousands of images—including maps, advertisements, comics, and more!—from historical newspapers dating all the way back to the 1800s. If you’re interested in getting a sneak peek of the headlines, photographs, illustrations, maps, comics, editorial cartoons, and advertisements contained in the 16 MILLION pages of historical newspapers held by the Library of Congress, sign up to attend Ben’s remote data jam. And now, here’s Ben.

Looking through the historic newspapers in Chronicling America, I’m always struck by how visually rich the pages are: beautiful illustrations, fascinating political cartoons, and prominent headlines abound. During my time as an Innovator in Residence at the Library of Congress, I’m developing a project called Newspaper Navigator, the goal of which is to re-imagine how we can explore wonderfully rich visual content in Chronicling America.

Library of Congress Innovator in Residence Benjamin Lee, February 27, 2020. Photo by Shawn Miller/Library of Congress.
Note: Privacy and publicity rights for individuals depicted may apply.

Having just finished the first phase of Newspaper Navigator, which involved using machine learning to extract the visual content in Chronicling America, I am more amazed than ever at the diversity of images preserved in these pages. To give a sense of this, included below is a visualization which contains all of the Civil War maps identified by Newspaper Navigator. Zoom in to see the maps in more detail! What excites me most about these maps is the diversity of questions we can ask. What can we learn about cartographic history? What do the patterns of use and re-use of the same maps tell us about the historical relationships between newspaper writers and editors? Many scholars have been exploring these kinds of research questions for a long time reads (for example the Viral Texts project) but I think these images also have a lot to offer anyone interested in learning more about American history.

Ben’s visualization of Civil War-era maps

My interest in using computer science to help identify and analyze images in digitized cultural heritage collections stems from my time as a Digital Humanities Associate Fellow at the United States Holocaust Memorial Museum (USHMM). My grandmother is a survivor of Auschwitz-Birkenau concentration camp, and when I first visited the USHMM with her in 2007 as a teenager, a reference librarian told us about the International Tracing Service, one of the largest repositories of documents pertaining to the Holocaust. A decade later, while a fellow at the USHMM, I learned about the idiosyncrasies and difficulties of manually searching the digital archive. As a result, I developed a project to aid the search process by using machine learning to identify different types of documents in the archive (if you are interested, you can read more about this work here). The idea of facilitating access to meaningful materials has stuck with me. It has been a large motivation in pursuing this line of research with cultural heritage collections with both the Library of Congress and the University of Washington, where I am currently a Ph.D. student in Computer Science and Engineering.

The 16+ million historical newspapers within Chronicling America are fascinating to me on so many levels. They are a portal back in time and reveal the rich history of the United States in a way that is unique to historic newspapers, from local histories to fun advertisements. But what excites me most about Chronicling America is how it reaches such a wide range of the American public, including school groups, genealogists, journalists, local historians, researchers, and even people looking to recreate old cooking recipes! With Newspaper Navigator, I have been fortunate to be able to build on the work of Beyond Words volunteers who identified visual content in World War I-era newspapers; using the annotations volunteers created, I have trained a machine learning model to identify visual content in historic newspapers. In the first phase of Newspaper Navigator, I have used this model to find and extract all of the visual content included in over 16 million pages in Chronicling America. The resulting dataset, which I am calling the Newspaper Navigator dataset, includes headlines, photographs, illustrations, maps, comics, editorial cartoons, and advertisements.

A compilation of extracted images from the Newspaper Navigator dataset

With the Newspaper Navigator dataset, I hope to enable Chronicling America users to search the collection in entirely new ways. What are some of these ways?

That’s where the Newspaper Navigator “data jam” comes in! Before the public website for interactive browsing goes live, I am holding an event called a “data jam” to provide a sneak peek of the Newspaper Navigator dataset. The goal of the data jam is to release specific subsets of these images grouped by category, topic, and publication date – such as all of the Civil War maps shown above – in order to see what people can do with them. That goes for any skill or interest level – no coding necessary! I’m imagining some people may just want to browse, for example, all of the advertisements in New York newspapers from 1920 to 1930. Or there may be people with even more specific research questions looking for maps of the shifting front lines during the Civil War. At the other end of the spectrum, there may be programmers looking to study the headlines using emerging techniques from natural language processing or create visualizations of the extracted photographs. There will be something for everyone–the only requirement to join is an interest in the dynamic images contained within this treasure trove of newspapers past and present.

In any case, I think this project comes at a time when a lot of people are asking what the roles of museums and libraries are in providing access to interesting, culturally relevant, and intellectually enriching digital content to the American public. I personally feel that the Library of Congress has an enormous amount to offer in this area. In the spirit of providing free and open access to digital content, the Newspaper Navigator dataset, the Newspaper Navigator application, and all of the code written to produce them will be dedicated to the public domain.

Lastly, I think projects like these create more of an open dialogue between Library digital users and the stewards of this content! To me, an ideal outcome of this project would be to have a whole sub-community of people using this content – in their research, in their schoolwork, or just as a way of staying entertained while working from home. I would love nothing more than to meet and work with people who are just as excited about these possibilities as I am!

If you have any questions, feel free to reach out to me by email at [email protected].

To be notified when the dynamic Newspaper Navigator search interface goes live, subscribe to the LC Labs Letter.

Add a Comment

This blog is governed by the general rules of respectful civil discourse. You are fully responsible for everything that you post. The content of all comments is released into the public domain unless clearly stated otherwise. The Library of Congress does not control the content posted. Nevertheless, the Library of Congress may monitor any user-generated content as it chooses and reserves the right to remove content for any reason whatever, without consent. Gratuitous links to sites are viewed as spam and may result in removed comments. We further reserve the right, in our sole discretion, to remove a user's privilege to post content on the Library site. Read our Comment and Posting Policy.


Required fields are indicated with an * asterisk.