Black and white campaign images for Susan B. Anthony Papers and the Elizabeth Cady Stanton Papers.
Both the "Elizabeth Cady Stanton Papers" and "Susan B. Anthony Papers" campaigns were among those retired from the By the People site, crowd.loc.gov, in May 2023.

The crowdsourced transcription lifecycle – from conception to retirement

Today’s guest post is from Lauren Algee, a Senior Collections Specialist in the Digital Content Management Section and a By the People community manager.


After years of service from thousands of volunteers, some of the oldest completed By the People crowdsourced transcription campaigns are celebrating their retirement!

This week, the By the People team retired 10 campaigns completed by volunteers between 2019 and 2022. The transcriptions from retired campaigns can now be explored and searched on the Library’s main website (loc.gov). This means we no longer need to keep duplicate images and transcriptions on the By the People website and the files can be retired! Retired campaigns will keep their landing pages on By the People, which contain contextual information, basic statistics, and links to where to find the campaign’s transcriptions and datasets on loc.gov (see below).

Landing page for the retired Susan B. Anthony Papers campaign on the By the People website.

To explain what retirement means and how transcriptions get there, we’ll take a quick walk through the lifecycle of a transcription campaign – from idea, to completion, and readiness for retirement.

The Life of a Transcription

Campaigns start with the Library’s collections specialists and staff who work in divisions all over the institution. Staff propose groups of historic texts from the Library’s collections for transcription. In By the People-speak, we call these groups “campaigns.” Campaigns usually correspond to a specific Library collection, but may also bring together related materials from across collections. Candidate materials need to already be digitized, free of intellectual property constraints and privacy concerns, and if printed, not broadly amenable to OCR (Optical Character Recognition) technology.

Once we approve a proposal, we work on bringing the images and web content into Concordia, the Library’s open source web-based transcription platform. The Library’s IT Design and Development Directorate developed Concordia in-house and it powers most of what you see on the By the People website. The By the People team creates and uploads spreadsheets that Concordia uses to pull images and metadata from the loc.gov API into the system. Library collection staff are responsible for writing narrative context, selecting representative images and helpful links, and collaborating on outreach.

Once a campaign is live on crowd.loc.gov, volunteers get to work! Transcriptions are completed through a consensus-based model. It takes at least two volunteers to finish a page (one to transcribe and one to review), but a more complicated transcription could take many more passes back and forth through review. We think of this process as a safety net that gives volunteers the chance to learn and grow. It can take from two days to over two years to complete a campaign depending on the amount of material, the interest level of volunteers, and the difficulty. Once volunteers complete all of the transcriptions in a campaign, staff do some light review before Library developers publish the transcriptions to loc.gov.
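For readers who like to see a process spelled out, here is a minimal Python sketch of the transcribe-then-review loop described above. It is an illustration only, not Concordia's actual code: the `Page` class, the status names, and the rule that the reviewer must be a different person than the most recent transcriber are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    # Hypothetical status names for this sketch
    NOT_STARTED = "not started"
    IN_PROGRESS = "in progress"
    NEEDS_REVIEW = "needs review"
    COMPLETED = "completed"


@dataclass
class Page:
    text: str = ""
    status: Status = Status.NOT_STARTED
    last_transcriber: Optional[str] = None
    contributors: set = field(default_factory=set)

    def submit(self, user: str, text: str) -> None:
        """A volunteer saves a transcription and sends it to review."""
        if self.status is Status.COMPLETED:
            raise ValueError("page is already completed")
        self.text = text
        self.last_transcriber = user
        self.contributors.add(user)
        self.status = Status.NEEDS_REVIEW

    def review(self, user: str, accept: bool) -> None:
        """A second volunteer accepts the page or sends it back for edits."""
        if self.status is not Status.NEEDS_REVIEW:
            raise ValueError("page is not waiting for review")
        if user == self.last_transcriber:
            # Assumed rule: nobody reviews their own transcription
            raise PermissionError("reviewer must differ from transcriber")
        self.contributors.add(user)
        self.status = Status.COMPLETED if accept else Status.IN_PROGRESS
```

In this sketch a page can bounce between "in progress" and "needs review" any number of times, which mirrors how a difficult page may take many passes before it is completed.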

Screenshot of a letter from Georgia O’Keeffe to Henwar Rodakiewicz, December 1932, published alongside a transcription created by By the People volunteers. Letter is from the Georgia O’Keeffe and Alfred Stieglitz Correspondence and Related Material collection, Manuscript Division.

What data results from this process? Concordia can export a TXT file for each transcription, as well as a CSV file for a campaign that contains every transcription and some metadata. We work with colleagues in the Library’s Digital Content Management Section and the loc.gov team to add both forms of transcriptions to loc.gov. The TXT files are matched up to the original digital images in the loc.gov digital collection. When published, these individual transcriptions facilitate image-level keyword search, as well as improved readability and accessibility. We package a campaign’s bulk CSV file with documentation and publish it as a dataset in the Library of Congress Selected Datasets collection for computational use.
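As a rough idea of what "computational use" of a campaign CSV could look like, the Python sketch below loads a bulk file into per-page transcriptions and runs a simple keyword search. The column names `asset_id` and `transcription` are placeholders chosen for this example, not the documented schema of the published datasets, so check each dataset's documentation for the real headers.

```python
import csv
import io


def transcriptions_by_page(csv_text, id_column="asset_id", text_column="transcription"):
    """Map each page identifier to its transcription text.

    The column names are assumptions for this sketch; pass the real
    header names from the dataset you are working with.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[id_column]: row[text_column] for row in reader}


def search(pages, keyword):
    """Return the identifiers of pages whose text contains the keyword."""
    needle = keyword.lower()
    return [page_id for page_id, text in pages.items() if needle in text.lower()]
```

A researcher could read a downloaded dataset file into a string, pass it to `transcriptions_by_page`, and then call `search(pages, "convention")` to list every page that mentions the word.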

All transcriptions include an attribution acknowledging volunteer efforts. To date, 643,000 transcriptions have been completed, of which more than 147,000 are published on loc.gov. Additionally, 11 campaigns have been published as datasets, and you can keep track of our latest numbers on our website.

A retirement well-earned

By the People was always intended as a passthrough site, where transcriptions are created and housed temporarily before being returned to loc.gov. As By the People has grown rapidly in the last four years and our database of images and transcriptions has grown larger and larger, the website’s performance has started to slow. So we decided it was finally time to retire long-completed and already-published images and transcriptions.

By the People community managers worked with the Concordia development team to develop retirement functionality. Once community managers have assessed a campaign and decided it is ready for retirement, they enable the new functionality in the back end of Concordia, which removes the campaign's images and transcriptions from the database while retaining user metrics and other campaign data needed for reporting. Registered volunteers won’t see any change to the contribution counts that display on their profile pages – you’ll still know just how many pages you’ve transcribed and reviewed. And most importantly, all completed transcriptions for retiring campaigns are already safely preserved in the Library of Congress digital collections repository.
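The idea behind retirement can be pictured with a small Python sketch: drop the bulky page content, keep the numbers. This is a simplified illustration under stated assumptions, not the real Concordia implementation. The `Campaign` class, its fields, and the `published_on_loc_gov` check are all hypothetical names invented for the example.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Campaign:
    title: str
    # Hypothetical storage: page id -> (image file name, transcription text)
    pages: dict = field(default_factory=dict)
    # Pages transcribed or reviewed per volunteer, kept for reporting
    contribution_counts: Counter = field(default_factory=Counter)
    retired: bool = False


def retire(campaign: Campaign, published_on_loc_gov: bool) -> None:
    """Remove images and transcriptions but keep volunteer metrics."""
    if not published_on_loc_gov:
        # Assumed safeguard: only retire what is already preserved elsewhere
        raise ValueError("publish transcriptions before retiring a campaign")
    campaign.pages.clear()
    campaign.retired = True
```

After `retire` runs in this sketch, the campaign's title and `contribution_counts` are untouched, which is why a volunteer's profile totals would look the same before and after.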

Retirement will improve By the People site speed and functionality and has the added benefit of encouraging researchers to view and use transcriptions within the loc.gov digital collections where they are full-text searchable and sit in context with other Library of Congress resources.

Campaigns retired this week are:

Please join us in congratulating the many volunteers who transcribed these collections on a job well done! Let’s send these campaigns off to a very happy retirement!
