Introducing Beyond Words

As part of the Library of Congress Labs release last week, the National Digital Initiatives team launched Beyond Words. This pilot crowdsourcing application was created in collaboration with the Serial and Government Publications Division and the Office of the Chief Information Officer (OCIO) at the Library of Congress. In our first week and a half, we’ve hosted nearly 1,300 volunteers, who have marked over 30,000 pictures in historic newspaper pages. In this post, we explore the goals, background, workflows, and possibilities of Beyond Words, and share our progress so far.

Beyond Words Goals and Background

You’ll find Beyond Words in the Experiments section of our recently launched labs.loc.gov. The main goal of the pilot is to identify and caption pictures in newspaper pages to create public domain data for researchers to use. The crowdsourced data generated collaboratively in Beyond Words are released into the public domain and are available for download as JSON and for exploration in a public gallery.

Screenshot of editorial cartoon in the Beyond Words Picture Gallery

Beyond Words Picture Gallery – Search and Filter

Our secondary goal is to generate feedback about the workflow, instructions, and resulting data. Beyond Words may change quickly and will continue to serve as an experimental application. The pilot is also an opportunity to continue to learn from and apply lessons from other cultural heritage institutions with established transcription programs such as the U.S. National Archives and Records Administration Citizen Archivist and the Smithsonian Institution Transcription Center, as well as examples from the Library including Flickr Commons. Beyond Words further allows us to observe activity and pain points as we begin the design of our forthcoming transcription and tagging platform.

Building Beyond Words

Beyond Words is a web-based application that was developed as an Innovator-in-Residence project by Library of Congress OCIO developer Tong Wang. Beyond Words is an open source crowdsourcing pilot built as an instance of Scribe, the NEH-funded collaboration between the New York Public Library and Zooniverse. You can learn more about our implementation of Scribe on GitHub and watch for updates.

The newspaper pages that are marked and transcribed in Beyond Words are selected from Chronicling America. Chronicling America is a dynamic project that currently supports over 12 million newspaper pages from 40 states, with new papers added every day. Since we designed Beyond Words as a pilot, we needed to home in on a focused set of newspapers. We targeted the centennial commemoration of World War I and limited our range to the U.S. declaration of war through the cessation of hostilities, 06 April 1917 to 11 November 1918. Because new pages are added each day, we also limited our data set to the pages in that date range that were available in Chronicling America as of 14 September 2017.
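The selection rule above can be sketched in a few lines of Python. This is only an illustration of the two date limits described in this post, not the code used to build the pilot set; the function name and the idea of an "added on" date per page are assumptions made for the example.

```python
from datetime import date

# Date limits stated in this post.
PILOT_START = date(1917, 4, 6)       # U.S. declaration of war
PILOT_END = date(1918, 11, 11)       # cessation of hostilities
SNAPSHOT_CUTOFF = date(2017, 9, 14)  # pages available as of this date


def in_pilot_set(issue_date: date, added_on: date) -> bool:
    """Return True if a page meets both date limits (hypothetical helper)."""
    in_war_range = PILOT_START <= issue_date <= PILOT_END
    available_in_time = added_on <= SNAPSHOT_CUTOFF
    return in_war_range and available_in_time


# Example: a page published 05 January 1918 and added before the cutoff.
print(in_pilot_set(date(1918, 1, 5), date(2016, 3, 1)))
```

A page published inside the war-time range but added after 14 September 2017 would fall outside the pilot set under this rule.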

Jumping in: Tasks & Tips 

How does Beyond Words work? First: no login required! Second, you’ll need to know what we’re seeking. We ask that you mark pictures and transcribe the title, caption, and cutline when present; you’ll also categorize the picture type and make a note of the artist, if present. We use the word “pictures” in the instructions to include photographs, illustrations, editorial cartoons, comics, and maps. However, we are excluding advertisements in this pilot newspaper set, despite their interesting and lasting content.

On Beyond Words, you can get started right away by selecting one of three steps: mark, transcribe, verify. At least two people must agree on their task in each step; matching marks and transcriptions skip the verify step. If inconsistencies emerge, the best transcription, category, and artist (if present) are selected by volunteers in the verify step. Our tutorial shows how to break out the title, caption, and cutline; watch for all three, plus category AND artist, as you verify.
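As a rough illustration of the agreement rule described above, here is a minimal Python sketch. It is not Scribe's actual implementation, whose thresholds and matching rules may differ; the function name, the status labels, and the whitespace normalization are all assumptions made for the example.

```python
def next_step(submissions: list[str], min_agree: int = 2) -> str:
    """Decide what happens to one field after volunteers submit values.

    Hypothetical rule, based only on this post: at least two people must
    agree; matching submissions skip verify; mismatches go to verify.
    """
    # Collapse runs of whitespace so trivial spacing differences still match.
    cleaned = [" ".join(s.split()) for s in submissions]
    if len(cleaned) < min_agree:
        return "needs_more_input"
    if len(set(cleaned)) == 1:
        return "accepted"
    return "verify"


print(next_step(["Women at work"]))
print(next_step(["Women at  work", "Women at work"]))
print(next_step(["Women at work", "Woman at work"]))
```

The three calls show a field still waiting for a second volunteer, a matching pair that skips the verify step, and a mismatch that is sent to volunteers for verification.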

View of transcription window and photograph of Captain Wickersham

Transcribing Captain Wickersham’s Promotion

We ask that you take your time as you work to carefully identify the pictures. Pages without images should be marked “Done.” Some of the older photographs may look like illustrations; watch for mix-ups between illustrations and maps. Also keep in mind that the artist is often included in very small print. Common photographers include Underwood & Underwood and Harris & Ewing. You’ll see comics from A.D. Condo, Hop, and W. R. Allman.

Want additional hints? This application works best on a desktop or laptop with a mouse. Zoom in using your keyboard or the zoom tool. You can also begin your Beyond Words activity with newspapers from a preferred state by choosing it on the home page. Reminders of instructions are found in the “View A Tutorial” section, as well as the FAQ. Want to transcribe a picture right after you mark it? Select “Transcribe this page now!” And at any point in any of the three steps, you can view the original page in Chronicling America.

We invite you to have fun and do your best; the newspapers are fascinating, but marking and transcription aren’t always easy. Remember to take breaks and send us feedback! If you are inspired by what you are learning while using Beyond Words, you can explore Library of Congress World War I collections.

Doors to Discoveries

What might a volunteer discover while marking, transcribing, and verifying newspaper pictures? Certainly many social and cultural changes that marked the Great War era. On 05 January 1918, you’ll see “Women Performing Hard Tasks of Men in Big Chemical Plants” and “Capable Women and their doings” in Ogden, Utah. Another page reveals a significant victory of Florence Ellinwood Allen: successfully defending a women’s suffrage amendment to the charter of East Cleveland before the Supreme Court of Ohio.

Verifying window and photograph of Miss Florence Allen

Verifying Miss Allen’s Victory before Ohio Supreme Court

There are also views into African American papers like the Nashville Globe, established in response to the extension of Jim Crow to Nashville’s city transportation system; the paper began as a means of documenting black business owners and their attempts to establish an alternate streetcar system. The Nashville Globe ran from 1906 to 1960.

Conclusion

We’re continuing to seek and receive feedback on Beyond Words, including on formatting text, improving accessibility, extending the volunteer experience, identifying artists with greater precision, and more. We hope that educators, researchers, and artists will take advantage of the ability to group image collections by time frame, such as identifying all historic cartoons appearing in World War I era newspapers. If you create something with the data set, tweet us and use the hashtag #BuiltwithLC.
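For anyone who wants to try that kind of grouping with the downloadable JSON, a minimal Python sketch might look like the following. The field names ("category", "date", "caption") and the sample records are invented for illustration and are not the actual Beyond Words schema; check the downloaded file and adjust the keys accordingly.

```python
import json
from datetime import date

# Invented sample data standing in for the downloaded JSON file.
SAMPLE_JSON = """
[
  {"category": "Editorial Cartoon", "date": "1918-01-05", "caption": "Sample cartoon A"},
  {"category": "Photograph", "date": "1918-01-05", "caption": "Sample photograph"},
  {"category": "Editorial Cartoon", "date": "1917-05-20", "caption": "Sample cartoon B"}
]
"""


def filter_pictures(records, category, start, end):
    """Keep records of one category whose date falls within [start, end]."""
    return [
        r for r in records
        if r["category"] == category
        and start <= date.fromisoformat(r["date"]) <= end
    ]


records = json.loads(SAMPLE_JSON)
cartoons_1918 = filter_pictures(
    records, "Editorial Cartoon", date(1918, 1, 1), date(1918, 11, 11)
)
for r in cartoons_1918:
    print(r["date"], r["caption"])
```

With real data, the same function could be pointed at any category and time frame of interest.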

With over 1,200 images waiting to be verified, we could use your help! Thanks in advance for joining us and for your feedback; we’ll share what we’re learning again soon.

Hack-to-Learn at the Library of Congress

When hosting workshops, such as Software Carpentry, or events, such as Collections As Data, our National Digital Initiatives team made a discovery: there is an appetite among librarians for hands-on computational experience. That’s why we created an inclusive hackathon, or a “hack-to-learn,” taking advantage of the skills librarians already have and pairing them with programmers to […]

Automating Digital Archival Processing at Johns Hopkins University

This is a guest post from Elizabeth England, National Digital Stewardship Resident, and Eric Hanson, Digital Content Metadata Specialist, at Johns Hopkins University.  Elizabeth: In my National Digital Stewardship Residency at Johns Hopkins University’s Sheridan Libraries, I am responsible for a digital preservation project addressing a large backlog (about 50 terabytes) of photographs documenting the university’s […]

Recommendations for Enabling Digital Scholarship

Mass digitization — coupled with new media, technology and distribution networks — has transformed what’s possible for libraries and their users. The Library of Congress makes millions of items freely available on loc.gov and other public sites like HathiTrust and DPLA. Incredible resources — like digitized historic newspapers from across the United States, the personal papers […]

Using Three-Dimensional Modeling to Preserve Cultural Heritage

This is a guest post by Elizabeth England, a resident in the National Digital Stewardship Residency program. In recent years, a few news stories focused on the use of digital tools in preserving cultural heritage three-dimensional objects, stories such as the printed reconstruction of the Arch of Triumph in Palmyra, Syria and the construction of a […]

The Keepers Registry: Ensuring the Future of the Digital Scholarly Record

This is a guest post by Ted Westervelt, section head in the Library of Congress’s US Arts, Sciences & Humanities Division. Strange as it now seems, it was not that long ago that scholarship was not digital. Writing a dissertation in the 1990s was done on a computer and took full advantage of the latest […]

The TriCollege Libraries Consortium and Digital Content

This is a guest post from Stefanie Ramsay, a Digital Collections Librarian at Swarthmore College, which is part of the TriCollege Libraries consortium. Consortium arrangements among libraries and archives are an increasingly popular strategy for managing the large amount of digital content they produce and for providing increased access to these important materials. Luckily for […]

“Volun-peers” Help Liberate Smithsonian Digital Collections

The Smithsonian Transcription Center creates indexed, searchable text by means of crowdsourcing…or as Meghan Ferriter, project coordinator at the TC describes it, “harnessing the endless curiosity and goodwill of the public.” As of the end of the current fiscal year, 7,060 volunteers at the TC have transcribed 208,659 pages. The scope, planning and execution of the […]

Wisdom is Learned: An Interview with Applications Developer Ashley Blewer

  Ashley Blewer is an archivist, moving image specialist and developer who works at the New York Public Library. In her spare time she helps develop open source AV file conformance and QC software as well as standards such as Matroska and FFV1. She’s a three time Association of American Moving Image Archivists’ AV Hack […]