Credit: Extracted from Newspaper Navigator.

Humanistic Research and Machine Learning: Exploring Editorial Cartoons with Newspaper Navigator

This is a guest post by Sylvi Rose Stein, Benjamin Charles Germain Lee, and Brandon Webb.

Sylvi Stein is an undergraduate at Columbia University and a research assistant at the Kluge Center. She is a recipient of the Laidlaw Research Fellowship.

Benjamin Charles Germain Lee is an Assistant Professor in the University of Washington’s Information School. He was a 2023-24 Kluge Fellow in Digital Studies. Previously, he served as an Innovator in Residence at the Library of Congress.

Brandon Webb is a Kluge Fellow (2024). His current book project is a cultural and labor history of American political cartooning in the 20th century. Based in Montreal, he completed his PhD in 2022 and is the co-editor of Concordia University at 50: A Collective History (Concordia University Press, 2024).

Newspapers are a unique portal into American life. Produced by the Library of Congress and the National Endowment for the Humanities, the online database Chronicling America has made it possible for scholars, genealogists, and members of the public to access and search tens of millions of digitized pages of historic American newspapers. Text search within Chronicling America is robust, allowing end-users to search for keywords and phrases within articles. But how does one begin to search the visual culture within the digitized newspaper pages?

Five years ago, Ben Lee and LC Labs began a project called Newspaper Navigator through the Library of Congress’s Innovator-in-Residence program. Newspaper Navigator explored the possibilities for applying machine learning to Chronicling America in order to identify the visual culture within the 16 million pages of digitized historic American newspapers available at the time (the database now contains over 21 million pages).

In particular, Lee trained a machine learning model on thousands of crowdsourced annotations of visual culture from the Beyond Words crowdsourcing project. Lee then built out a pipeline to process those 16 million pages and extract seven different types of visual features: photographs, illustrations, maps, comics, editorial cartoons, headlines, and advertisements. Lee and LC Labs released the publicly available Newspaper Navigator dataset in May 2020, enabling users to explore the visual culture in new ways. In September 2020, Lee and LC Labs launched a search application for over 1.5 million photos in the dataset (more about the Newspaper Navigator project can be found in the two Signal blog posts available here and here, as well as Lee’s data archaeology).

Newspaper Navigator represents just one of many projects that have adopted the use of machine learning and AI within cultural heritage, and humanists are considering how these new approaches and methodologies may provide new opportunities for scholarship. Returning to the library this past year as a Kluge Fellow in Digital Studies, Lee met Kluge Fellow Brandon Webb, and the two became interested in exploring the possibilities for navigating the editorial cartoons within the Newspaper Navigator dataset. As a student research intern at the Kluge Center, Sylvi Stein assisted with testing the technology in a research setting. In particular, we wanted to explore the following questions:

  • What might the experience of using Newspaper Navigator teach us about the challenges in combining humanistic methodological frameworks with tools in digital humanities?
  • What limits and possibilities do these tools present?
  • How can these tools reshape our understanding of the past?

Webb’s research provided a readymade topic through which to explore these questions. Specifically, his Library of Congress research and current book project have expanded the post-World War II temporal focus of his dissertation to the broader scope of twentieth-century American political cartooning. His work looks at how political cartooning became “a visual medium that doubles as a mode of communication and social critique” and one that visualized changing notions of citizenship, nationhood, and institutions.[1]

Political cartoons offer historians a window into how history was being interpreted and represented as it was happening.

Stein joined this collaboration as a research assistant, bringing her acute eye for archival finds to a digital repository that had no shortage of images to flesh out Webb’s tentative arguments about the transformations that upended the medium as cartoonists became integrated into an expanding commercial press in the early 20th century.

Using Newspaper Navigator was a transformative experience, in that it quite literally transformed the style of research undertaken by Stein. Instead of combing through archives manually, flipping dusty pages, she scrolled through neatly packed folders of information delivered digitally to her computer screen.

Screenshot example of how the folder displays multiple reprinted copies of the same image. Credit: Extracted from Newspaper Navigator.

Newspaper Navigator draws on the archives of Chronicling America, a digital database of millions of newspaper pages. Stein used this tool to investigate political cartoons in the twentieth century before World War I. The goal was to explore the changing portrayal of a figure known as “Mr. Moneybags,” a plump man with a waistcoat and top hat, who is symbolically associated with capitalism. Mr. Moneybags appears, over the years, as various trusts, as sectors of the government, and even as specific politicians.

Although he largely disappears from the world of editorial cartoons in the post-WWII period, the caricature left his mark on American visual culture (see, for example, Mr. Monopoly, the mascot of the popular board game). Mr. Moneybags was a staple of early twentieth-century American dailies; conveniently, with Newspaper Navigator, Stein could download all the images categorized as “cartoons” for a given year. She could even download a random sample of 1000 cartoons from that year. This tool greatly reduced the time that would otherwise have been spent categorizing, sorting, and puzzling out the years; with a few clicks, one could call up thumbnails for a thousand cartoons for any given year.

Credit: Extracted from Newspaper Navigator.

Despite the ease and convenience offered by this tool, Stein still ran across some issues as she scoured these images for the information she needed. As an AI tool, Newspaper Navigator has been trained on data to differentiate between a political cartoon and, say, a photograph. However, it often made errors; it could be tricked by hand-drawn ads or by illustrations that accompanied a story. Around a third of the images in each folder were not actually political cartoons. This mislabeling is also telling, however, since the graphic art from this era, whether political cartoons, comic strips, or advertisements, was often drawn by the same artists employed to perform various illustrative tasks for a newspaper.

Another issue was that the images themselves were occasionally illegible; because of poor scan quality, she had to guess at the labels assigned to some figures. Newspaper Navigator would also occasionally show the same cartoon several times, probably because it was printed in more than one paper in the database and each copy was treated as a separate image. However, while flipping through an actual newspaper would allow someone to quickly discover the differences and similarities between the contexts in which these cartoons were printed, Stein was left observing the digital versions in a vacuum. There is a spreadsheet associated with the tool that one can call up to find out in which paper a cartoon was printed, as well as on what date, but this information is still not nearly as easily accessible as it would be to someone working directly with the archival material.
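To make the spreadsheet lookup described above concrete, here is a minimal sketch of how one might recover the paper and date for each downloaded image. The column names (`filename`, `newspaper`, `date`) and the sample rows are assumptions made purely for illustration; they are not the actual headers or contents of the Newspaper Navigator spreadsheet.

```python
import csv
import io
from collections import defaultdict

# Hypothetical metadata export. The column names and rows below are
# illustrative assumptions, not the real Newspaper Navigator spreadsheet.
SAMPLE_CSV = """filename,newspaper,date
cartoon_001.jpg,Example Daily Tribune,1908-03-14
cartoon_002.jpg,Example Evening Star,1908-03-15
cartoon_003.jpg,Example Daily Tribune,1908-04-02
"""


def load_context(csv_text):
    """Map each image filename to the (newspaper, date) it was printed in."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["filename"]: (row["newspaper"], row["date"]) for row in reader}


def group_by_paper(context):
    """Group images by newspaper, sorted by date, to browse them in context."""
    papers = defaultdict(list)
    for filename, (newspaper, printed_on) in context.items():
        papers[newspaper].append((printed_on, filename))
    return {paper: sorted(items) for paper, items in papers.items()}


context = load_context(SAMPLE_CSV)
by_paper = group_by_paper(context)
```

Grouping by paper is one simple way to put an image back alongside its neighbors; it does not by itself detect reprints of the same cartoon, which would require comparing the images themselves.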

 

Credit: Extracted from Newspaper Navigator.

These limitations direct us to larger questions about the ambiguities that transfer over from in-person to digital archives, namely the dangers of decontextualizing visual content. Given the general drift in digital research toward favoring aggregate searches over more granular readings, it is important to ward off decontextualizing tendencies that are inherent to researching large datasets. At bottom, studying the past asks us to contextualize primary sources by placing them in conversation with other types of sources from the same period. When applied to the visual material compiled in Newspaper Navigator, however, things get tricky.

Analyzing a selection of cartoons without reference to how readers encountered them – through newspaper readership – ignores the commercial aspect of the mass circulating dailies that published and circulated these images widely. Attention to the economics of print capitalism reminds us that producing newspapers was, above all, big business, and one that was fiercely competitive.

In this way, it is possible to read the impressive array of newspapers housed in Chronicling America as an archive of competition and concentration of capital that created a national print and visual culture. For example, in the early twentieth century the rise of syndication and newspaper chains provided a commercial model for circulating and sharing content between large and small circulating dailies. Cartoonists who worked for urban-based newspapers saw their images distributed to small dailies across North America. Scholars of print media note that syndication helped to create a national visual culture mediated through newspaper reading long before television superseded newspapers as the dominant medium for news and entertainment. This evolving commercial context cannot be grasped by looking at cartoons on their own, but it can be glimpsed, as Stein’s research into the Mr. Moneybags character demonstrated.

Reading the Mr. Moneybags motif that cartoonists used to explain the social upheavals that accompanied early twentieth-century American capitalism helps avoid viewing this sample base as a collection of static, stand-alone images divorced from the politics of its day. As visual texts, political cartoons are highly contextual and, as such, encourage situational readings. Aesthetically centered readings tend to miss this salient point by treating cartoonists as isolated creators rather than the embedded print workers they were for much of the medium’s history. These potential pitfalls, we found, can be mitigated through reference to the other categories of visual content collated by Newspaper Navigator’s algorithms. In this way, research methods, following established points in the scholarly literature, provided a means of searching through large datasets in a way that deepened an appreciation for historiography’s crowded interpretive field of competing and complementary claims.

Credit: Extracted from Newspaper Navigator.

In keeping with historians of political cartoons who try to interpret their sources in relation to the text and images found elsewhere in a newspaper, we analyzed our sample base in relation to news articles, headlines, photographs and editorials. Newspaper Navigator, helpfully, collates these categories and therefore provides tools to combat the decontextualizing tendencies of digital research. For instance, pairing searches of editorial cartoons with searches of headlines within a given timeframe combines research into two categories of visual content extracted from the Chronicling America database. This approach can be generalized, thus giving a more holistic view of print cultures of the past. After conducting searches, researchers can also circle back to the newspaper in question and delve into its textual contents to better gauge the context of a specific cartoon. Returning to political cartoons, such a method may yield insights into what a cartoonist may have been responding to while opening up possibilities for reconstructing how readers encountered this content.
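The pairing of cartoons with headlines from the same timeframe can be sketched in a few lines. This is only an illustration under stated assumptions: the records, field names (`image`, `text`, `newspaper`, `date`), and the `headlines_near` helper are all hypothetical, and the three-day window is an arbitrary choice rather than anything prescribed by Newspaper Navigator.

```python
from datetime import date

# Hypothetical records standing in for extracted cartoons and headlines.
# Field names and values are illustrative assumptions only.
cartoons = [
    {"image": "cartoon_001.jpg", "newspaper": "Example Daily Tribune",
     "date": date(1908, 3, 14)},
]
headlines = [
    {"text": "TRUST HEARING OPENS", "newspaper": "Example Daily Tribune",
     "date": date(1908, 3, 13)},
    {"text": "RIVER FLOODS TOWN", "newspaper": "Example Daily Tribune",
     "date": date(1908, 5, 1)},
    {"text": "SENATE DEBATES TARIFF", "newspaper": "Example Evening Star",
     "date": date(1908, 3, 14)},
]


def headlines_near(cartoon, headlines, window_days=3, same_paper=True):
    """Return headline texts printed within window_days of the cartoon.

    With same_paper=True, only headlines from the cartoon's own newspaper
    are kept; with False, headlines from any paper in the window are kept.
    """
    matches = []
    for headline in headlines:
        if same_paper and headline["newspaper"] != cartoon["newspaper"]:
            continue
        if abs((headline["date"] - cartoon["date"]).days) <= window_days:
            matches.append(headline["text"])
    return matches


nearby_same_paper = headlines_near(cartoons[0], headlines)
nearby_any_paper = headlines_near(cartoons[0], headlines, same_paper=False)
```

Restricting to the same paper approximates what a reader of that issue would have seen, while widening to any paper gives a rough sense of the news a syndicated cartoon may have been responding to.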

The above reflections exemplify, we hope, some of the problems noted in the digital humanities literature more broadly. In his blending of critical theory and computational science, James E. Dobson has argued for a critical digital humanities perspective that not only goes beyond our own disciplinary frameworks in the humanities but also considers how the methods we deploy in archival practice are (re)shaped by new technologies. The “search for a methodology,” as Dobson suggests, begins not with answers, but questions.[2] Similarly, it is important to recall that the advent of machine learning research tools has historical parallels. As the editors of a special journal issue on digital cultures have described it: “[just as] the modern industrial era reshaped the nature of human and political subjectivity, the digital information era is reshaping communication and social relations across multiple platforms.”[3]

Like any archival practice, navigating Newspaper Navigator highlighted how crucial it is for scholars who engage with digital tools to reflect on the limits of historical research. Part of “reading sources against the grain” is to acknowledge that the historical record, as it were, is by its very nature incomplete. No effort to fill in its gaps, however much it is aided by technology, can resolve all its absences, silences, or ambiguities. It is within these cracks that interpretation reemerges in its essential role. Moreover, since this research into early 20th-century print culture is difficult to replicate on a human scale, digital tools aid these interpretative efforts. But it is humans, we should remember, who design these tools, collect sources, tag images, and so on. The experience of exploring the visual content collated by Newspaper Navigator underscores that writing about historical sources is an ongoing conversation about the past that has no fixed endpoint. As with digital archives, the possibilities for further reflection and new questions are endless.

[1] Brandon Webb, “Laughter Louder than Bombs? Apocalyptic Graphic Satire in Cold War Cartooning, 1946-1959,” American Quarterly 70, no. 2 (2018): pp. 235-266.

[2] See, for example, James E. Dobson, Critical Digital Humanities: The Search for a Methodology (University of Illinois Press, 2019).

[3] Editors’ introduction to special journal issue, “Radical Histories in Digital Cultures” in Radical History Review, issue 117 (Fall 2013), p. 1. Edited by Lyell Davies, Conor McGrady, and Elena Razlogova.
