
Do volunteer transcriptions improve search and discovery in loc.gov?


Today’s guest post is from Abby Shelton, a Digital Collections Specialist and By the People Community Manager in the Digital Content Management Section at the Library of Congress.


How do people use crowdsourced transcriptions? Do they drive increased traffic and engagement to our digital collections? What kinds of activity do transcriptions of handwritten documents facilitate?

These are some of the big questions the By the People team is asking this year. We know our volunteers are motivated by making collections accessible and useful for all. Our volunteers have completed an incredible number of transcriptions: over 580,000. We have integrated over 146,000 of those back into their source collections and continue to add more as volunteers complete campaigns. To better understand our program’s reach and to show volunteers the real-world impact of their work, we have started examining the impact of transcriptions from several angles. This post focuses on search and discovery, one of the ways we know transcriptions make the collections of the Library of Congress more accessible to all.

We frequently talk about the ways that transcriptions can facilitate better search and discovery of documents in the Library’s collections. And we have heard stories from researchers (see here and here) and Library of Congress curators that using transcription data to search across a collection enables new discoveries and connections to emerge. To test how transcriptions aid web users in their search for relevant materials, we turned to web analytics. The Library of Congress collects various pieces of information about patrons’ behavior on our websites (for more information, see our privacy policy) including the search terms that people use on the Library’s website and where they end up as a result of those terms. Our team wanted to assess whether there was an observable difference in the number of search terms leading to collections before and after transcriptions were added to loc.gov.

To evaluate search terms and usage, we identified terms entered on loc.gov that led a user to an item in one of three collections: the papers of Branch Rickey, Carrie Chapman Catt, and Rosa Parks. We also created custom date ranges for each collection, based on when its transcriptions were added to loc.gov, so that we could compare pre- and post-transcription data.
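The post doesn't spell out the exact calculation, but the pre/post comparison it describes can be sketched in Python. This is a minimal sketch under stated assumptions: the analytics export is assumed to be a list of hypothetical `(day, search_term, visits)` rows for one collection, the functions `window_stats`, `percent_change`, and `compare_around` are illustrative names rather than Library of Congress tooling, and the toy data is invented, not the figures reported in this post.

```python
from datetime import date, timedelta

# Hypothetical analytics rows: (day, search_term, visits). Each row is a search
# term entered on loc.gov that led visitors to an item in one collection that day.

def window_stats(records, start, end):
    """Distinct search terms and total visits for rows with start <= day < end."""
    terms = set()
    visits = 0
    for day, term, n in records:
        if start <= day < end:
            terms.add(term.strip().lower())  # assumption: case variants count as one term
            visits += n
    return len(terms), visits

def percent_change(before, after):
    """Percent change from before to after; None when there is no baseline."""
    if before == 0:
        return None
    return (after - before) / before * 100

def compare_around(records, published, days):
    """Compare equal-length windows on either side of the publication date."""
    window = timedelta(days=days)
    terms_before, visits_before = window_stats(records, published - window, published)
    terms_after, visits_after = window_stats(records, published, published + window)
    return {
        "terms_pct": percent_change(terms_before, terms_after),
        "visits_pct": percent_change(visits_before, visits_after),
    }

# Invented toy data for illustration only; these are not Library of Congress figures.
toy = [
    (date(2021, 1, 10), "Eddie Stanky", 2),
    (date(2021, 2, 1), "scouting report", 1),
    (date(2021, 4, 5), "eddie stanky", 3),
    (date(2021, 4, 6), "Brooklyn Dodgers", 2),
    (date(2021, 4, 20), "scouting report", 1),
]
print(compare_around(toy, published=date(2021, 3, 1), days=90))
# {'terms_pct': 50.0, 'visits_pct': 100.0}
```

Using equal-length windows on either side of the publication date keeps the comparison fair: measuring a year of post-transcription traffic against six months of baseline would inflate the increase.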

Record for the Branch Rickey Papers, as it appears on loc.gov.
A search result from the Branch Rickey Papers for the term “Eddie Stanky.” In this scouting report, Rickey describes minor league player John Covaleski as an “18 year old Eddie Stanky.”

We found that, as expected, adding transcription data to these collections increased the number of times users found items from them in their search results. In the year after the Branch Rickey transcriptions were integrated into loc.gov, the digital collection saw an 86% increase in the number of search terms and a 93% increase in user visits where a search led patrons to a Rickey item. Similarly, in the six months after the Catt transcriptions were published, there was a 47% increase in the number of terms and a 43% increase in user visits leading patrons to items from the collection. The Rosa Parks collection showed a more modest change after transcriptions were added: a 3% increase in search terms leading to the collection and a 23% increase in user visits. One theory is that the Rosa Parks collection is one of the most popular at the Library of Congress and receives more traffic each year than either of the other test collections. Because it already draws so much traffic, the transcriptions made a comparatively small difference in how many users found their way to the digital collection via keyword search.

Next, we were curious to know what kinds of terms led patrons to these collections due to the transcription data. This required checking search terms against transcription content in loc.gov. And we found all kinds of interesting terms that people all over the world have used to access our transcribed collections.
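One way to approximate that check is to flag search terms that appear in an item's transcription but nowhere in its title or other metadata. The Python sketch below is hypothetical: the function names, the whole-phrase matching rule, and the sample text are assumptions for illustration, not the team's actual procedure.

```python
import re

def normalize(text):
    """Lowercase and collapse runs of whitespace so matching ignores case and spacing."""
    return re.sub(r"\s+", " ", text).strip().lower()

def contains_phrase(haystack, phrase):
    """True if phrase occurs in haystack as a whole word or phrase, not a fragment."""
    pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
    return re.search(pattern, haystack) is not None

def transcription_only_terms(terms, metadata_text, transcription_text):
    """Search terms found in the transcription text but absent from the metadata."""
    meta = normalize(metadata_text)
    trans = normalize(transcription_text)
    hits = []
    for term in terms:
        t = normalize(term)
        if t and contains_phrase(trans, t) and not contains_phrase(meta, t):
            hits.append(term)
    return hits

# Invented example text, loosely modeled on an item of Catt correspondence.
metadata = "Carrie Chapman Catt Papers: General Correspondence, 1930"
transcription = "Letter to Carrie Chapman Catt from Ella A. Boole, November 10, 1930"
print(transcription_only_terms(["Ella A. Boole", "Carrie Chapman Catt", "suffrage"],
                               metadata, transcription))
# ['Ella A. Boole']
```

Matching on whole phrases (the `(?<!\w)` and `(?!\w)` lookarounds) keeps a short name such as "Boole" from counting as a hit inside an unrelated longer word; a real analysis would also need to handle spelling variants and abbreviations in both the search terms and the transcriptions.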

Place names, thematic terms, and historical events dominate the list, but a number of terms illuminate the networks surrounding the collections. For instance, loc.gov users frequently searched for figures in Carrie Chapman Catt’s network of suffragists, activists, and reformers. Jessie Haver Butler (one of the first women professional lobbyists in Washington), Ella A. Boole (president of the Women’s Christian Temperance Union), and María Abella de Ramírez (founder of the National Women’s League in Argentina) all appear in the transcriptions. These names are found only in the transcriptions, not in the titles or other metadata associated with the items. Without the effort of By the People volunteers, a loc.gov search for these and many other Catt correspondents would skip over important materials in the collection.

A letter from Ella Boole, president of the Women’s Christian Temperance Union, to Carrie Chapman Catt, November 10, 1930. https://www.loc.gov/resource/mss15404.mss15404-005_00170_00175/?sp=5?loclr=blogsig

Similarly, most of the search terms used to find the Rosa Parks papers revolved around civil rights figures and events, including a large group of Black women’s names. Many of these terms came from programs and newspaper clippings reporting on events where Rosa Parks was honored or gave a lecture. For example, the author of this article from the March 1991 edition of Jet Magazine listed the luminaries who gathered at the National Gallery of Art for the unveiling of a sculpture of Parks, including Coretta Scott King, C. Delores Tucker, John Lewis, and Cicely Tyson. The transcriptions of these programs and news clippings give us a better sense of the networks Rosa Parks inhabited: the people she spoke with and attended events alongside. If not for our volunteers’ transcriptions, a patron searching the Rosa Parks papers for one of these names wouldn’t find any of the textual materials related to these figures.

Jet magazine article with title, "Sculpture of Civil Rights Heroine Rosa Parks Unveiled"
An article in Jet Magazine from 1991 reporting on the unveiling of the Rosa Parks sculpture in the Smithsonian. https://www.loc.gov/resource/mss85943.002001/?sp=7?loclr=blogsig

This is just the beginning of the impact transcriptions could have on search and discoverability. Have you used By the People transcriptions? Let us know; we would love to expand our understanding of how these resources are being used!


Comments (4)

  1. Transcribing items from the LoC holdings is good, but how can people from the outside find what is available? Is there a search engine? Please share, as I am always looking to learn new things about early aviation.
    Simine Short

    • This response is from Abby Shelton, By the People Community Manager and the author of this post.

      Hi Simine! There are a couple of ways you can find out what the Library has related to your topic of interest. You can search https://loc.gov/ to return results from across the Library’s collections and web content. You can also reach out to our reference librarians through the Ask-a-Librarian service (https://ask.loc.gov/) with a question about your topic. The Library also has research guides arranged by topic, which provide a selected overview of the Library’s resources on a variety of subjects: https://guides.loc.gov/. Best of luck in your research!

  2. Were the searches done in the main search box at loc.gov? Or were they done at the digital collections page? Do the transcriptions connect to the catalog page at all? (catalog.loc.gov)

    • This response is from Abby Shelton, By the People Community Manager and the author of this post.

      Hi Joanna! Thank you for your great questions. We’ve looked at searches conducted both in the main search box and at the collection level. For this post, we decided to feature main search box searches, since we hypothesize that more users conduct broad searches from the loc.gov homepage than at the collection level.

      The transcriptions are connected at the page level to the digital collections on loc.gov. For instance, here’s a transcribed page from the Carrie Chapman Catt collection: https://www.loc.gov/resource/mss15404.mss15404-007_00229_00243/?sp=2&st=text. You can see the text appears alongside a scan of the original. Most digital collections include a link to a catalog record and/or archival finding aid. For instance, on the page linked above, you can find a link to the finding aid below the image, on the right side of the page. Hope that answers your question!
