Do volunteer transcriptions improve search and discovery in loc.gov?

Today’s guest post is from Abby Shelton, a Digital Collections Specialist and By the People Community Manager in the Digital Content Management Section at the Library of Congress.


How do people use crowdsourced transcriptions? Do they drive increased traffic and engagement to our digital collections? What kinds of activity do transcriptions of handwritten documents facilitate?

These are some of the big questions that the By the People team is asking this year. We know our volunteers are motivated by making collections accessible and useful for all. Our volunteers have completed an incredible number of transcriptions, over 580,000 of them. We have integrated over 146,000 of those back into their source collections and are continuing to add more as our volunteers complete campaigns. To better understand our program’s reach and communicate to our volunteers the real-world impact of their work, we have started looking into the impact of transcriptions from a few different angles. This post will focus on search and discovery, which is one of the ways we know that transcriptions make the collections of the Library of Congress more accessible to all.

We frequently talk about the ways that transcriptions can facilitate better search and discovery of documents in the Library’s collections. And we have heard stories from researchers (see here and here) and Library of Congress curators that using transcription data to search across a collection enables new discoveries and connections to emerge. To test how transcriptions aid web users in their search for relevant materials, we turned to web analytics. The Library of Congress collects various pieces of information about patrons’ behavior on our websites (for more information, see our privacy policy) including the search terms that people use on the Library’s website and where they end up as a result of those terms. Our team wanted to assess whether there was an observable difference in the number of search terms leading to collections before and after transcriptions were added to loc.gov.

To evaluate search terms and usage, we identified terms used in loc.gov that resulted in a user landing on an item from three collections: the papers of Branch Rickey, Carrie Chapman Catt, and Rosa Parks. We then created custom date ranges for each collection, based on when its transcriptions were added to loc.gov, so that we could compare pre- and post-transcription data.
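The before-and-after comparison described above can be sketched in a few lines of Python. This is only an illustration, not the team's actual analytics workflow: the function names, the `(visit_date, search_term)` record shape, the publication date, and the 90-day window are all assumptions, and the sample visits are made up.

```python
from datetime import date, timedelta

def percent_change(before, after):
    """Percent change from `before` to `after`; None when there is no baseline."""
    if before == 0:
        return None
    return (after - before) / before * 100

def compare_windows(visits, published, window_days):
    """Compare equal-length windows before and after `published`.

    `visits` is an iterable of (visit_date, search_term) pairs for one
    collection (a hypothetical record shape, not a real loc.gov export).
    """
    window = timedelta(days=window_days)
    pre = [(d, t) for d, t in visits if published - window <= d < published]
    post = [(d, t) for d, t in visits if published <= d < published + window]
    # Count distinct search terms case-insensitively.
    pre_terms = {t.casefold() for _, t in pre}
    post_terms = {t.casefold() for _, t in post}
    return {
        "terms_change": percent_change(len(pre_terms), len(post_terms)),
        "visits_change": percent_change(len(pre), len(post)),
    }

# Made-up sample data: two visits before publication, three after.
sample_visits = [
    (date(2021, 12, 1), "baseball"),
    (date(2021, 12, 15), "Baseball"),
    (date(2022, 1, 5), "baseball"),
    (date(2022, 1, 10), "eddie stanky"),
    (date(2022, 2, 1), "scouting report"),
]
result = compare_windows(sample_visits, published=date(2022, 1, 1), window_days=90)
print(result)
```

Using windows of equal length on either side of the publication date keeps the two counts comparable, although it does not account for seasonal swings in traffic.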

Record for the Branch Rickey Papers, as it appears on loc.gov.

A search result from the Branch Rickey Papers for the term “Eddie Stanky.” In this scouting report, Rickey describes minor league player John Covaleski as an “18 year old Eddie Stanky.”

We found that, as expected, adding transcription data to these collections increased the number of times a user found items from the collections in their search results. In the year after the Branch Rickey transcriptions were integrated into loc.gov, the digital collection saw an 86% increase in the number of search terms and a 93% increase in user visits where a search led patrons to a Rickey item. Similarly, in the six months after the Catt transcriptions were published, there was a 47% increase in the number of terms and a 43% increase in user visits leading patrons to discover items from the collection. The Rosa Parks collection showed only a modest 3% increase in search terms leading to Parks and a 23% increase in user visits after transcriptions were added to the collection. One theory about why this might be the case is that the Rosa Parks collection is one of the most popular collections at the Library of Congress and receives more traffic per year than either of the other test collections. Against such a high baseline of traffic, the transcriptions made only a slight difference in users finding their way to the digital collection via keyword search.

Next, we were curious to know what kinds of terms led patrons to these collections due to the transcription data. This required checking search terms against transcription content in loc.gov. And we found all kinds of interesting terms that people all over the world have used to access our transcribed collections.

Place names, thematic terms, and historical events dominate the list but a number of terms illuminate the networks that surrounded the collections. For instance, loc.gov users frequently searched for figures in Carrie Chapman Catt’s network of suffragists, activists, and reformers. Jessie Haver Butler (one of the first women professional lobbyists in Washington), Ella A. Boole (president of the Women’s Christian Temperance Union), and María Abella de Ramírez (founder of the National Women’s League in Argentina) all appear as part of the transcriptions. Without the effort of By the People volunteers, a search of loc.gov for the names of these and many other Catt correspondents would skip over important materials in the collection. These are terms found only in the transcriptions, not in the titles or other metadata associated with the item.
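One way to picture the check for terms found only in the transcriptions is the short Python sketch below. It is a simplified illustration under stated assumptions: the `metadata` and `transcription` fields, the function name, and the sample text are invented for the example, and plain substring matching stands in for whatever matching the loc.gov search index actually performs.

```python
def transcription_only_terms(search_terms, items):
    """Return the terms that match transcription text but no item metadata.

    `items` is a list of dicts with 'metadata' and 'transcription' strings
    (a hypothetical structure used only for this illustration).
    """
    found = set()
    for term in search_terms:
        needle = term.casefold()
        in_metadata = any(needle in item["metadata"].casefold() for item in items)
        in_transcription = any(
            needle in item["transcription"].casefold() for item in items
        )
        if in_transcription and not in_metadata:
            found.add(term)
    return found

# Invented placeholder text, not quoted from any real item.
items = [
    {
        "metadata": "Carrie Chapman Catt Papers: General Correspondence",
        "transcription": "A placeholder letter signed by Ella A. Boole.",
    },
]
terms = ["Ella A. Boole", "Carrie Chapman Catt", "Eddie Stanky"]
print(transcription_only_terms(terms, items))
```

In this toy case only "Ella A. Boole" qualifies: "Carrie Chapman Catt" already appears in the metadata, and "Eddie Stanky" appears nowhere in the sample item.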

A letter from Ella Boole, president of the Women’s Christian Temperance Union, to Carrie Chapman Catt from November 10, 1930. //www.loc.gov/resource/mss15404.mss15404-005_00170_00175/?sp=5?loclr=blogsig

Similarly, a majority of the search terms used to find the Rosa Parks papers revolved around Civil Rights figures and events, including a large group of Black women’s names. Many of these terms came from the programs and newspaper clippings reporting on events where Rosa Parks was honored or gave a lecture. For example, the author of this article from the March 1991 edition of Jet Magazine listed the luminaries who gathered at the National Gallery of Art for the unveiling of a statue of Parks, including Coretta Scott King, C. Delores Tucker, John Lewis, and Cicely Tyson. The transcriptions of these programs and news clippings give us a better sense of the networks that Rosa Parks inhabited: who she spoke with and attended events alongside. And if not for our volunteers’ transcriptions, a patron searching for one of these names in the Rosa Parks papers wouldn’t find any of the textual materials related to these figures in the collection.

Jet magazine article with title, "Sculpture of Civil Rights Heroine Rosa Parks Unveiled"

An article in Jet Magazine from 1991 reporting on the unveiling of the Rosa Parks sculpture in the Smithsonian. //www.loc.gov/resource/mss85943.002001/?sp=7?loclr=blogsig

This is just the beginning of the impact transcriptions could have on search and discoverability. Have you used By the People transcriptions? Let us know. We would love to expand our understanding of how these resources are being used!

 
