“Volun-peers” Help Liberate Smithsonian Digital Collections

Scan of an herb.

Scan of Chamaenerion Latifolium. US National Herbarium, Smithsonian.

The Smithsonian Transcription Center (TC) creates indexed, searchable text by means of crowdsourcing or, as Meghan Ferriter, project coordinator at the TC, describes it, “harnessing the endless curiosity and goodwill of the public.” As of the end of the current fiscal year, 7,060 volunteers at the TC have transcribed 208,659 pages.

The scope, planning and execution of the TC’s work, both the in-house coordination among the Smithsonian’s units and the external coordination of volunteers, are staggering to think about. The Smithsonian Institution is composed of 19 museums, archives, galleries and libraries; nine research centers; and a zoo. Fifteen of the Smithsonian units have collections in the TC, which is run by Ching-hsien Wang, Libraries and Archives System Support Branch manager with the Smithsonian Institution Office of the Chief Information Officer.

Ferriter said, “To manage a project of this scope, one must understand and troubleshoot the system and unit workflows as well as work with unit representatives as they select content and set objectives for their projects. Neither simply building a tool nor merely inviting participation is enough to sustain and grow a digital project, whatever the scale.”

The TC benefits from the Smithsonian’s online collections. Though individual units may have their own databases, they all link to a central repository, the Smithsonian’s “Enterprise Digital Asset Network,” or EDAN, which is searchable from the Smithsonian’s Collections Search Center. The TC leverages the capabilities of EDAN and builds on the foundation of data and collections-management systems supported by the Office of the Chief Information Officer. In some cases, for example, a unit may have digitized a collection and the TC arranges for volunteers to add metadata.

Photo of Ching-hsien Wang.

Ching-hsien Wang.

Each unit has a different goal for its digital collections. The goal for one project might be to transcribe handwritten notes; the goal for another project might be to key in text from a scanned document. A project might call for geotagging or adding metadata from controlled vocabularies (pre-set tags, used to avoid ambiguities or sloppy mistakes). But the source for each TC project is always a collection of digital files that a volunteer can access online.
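As a rough illustration of how a controlled vocabulary keeps ambiguous or misspelled tags out of the metadata, here is a minimal Python sketch. The vocabulary terms and the function name validate_tags are hypothetical and chosen only for this example; they are not drawn from the TC’s actual systems.

```python
# Hypothetical controlled vocabulary: a fixed set of approved tags.
# A real project would define its own terms.
CONTROLLED_VOCABULARY = {"botany", "correspondence", "field notes", "specimen label"}


def validate_tags(tags):
    """Split volunteer-supplied tags into accepted and rejected lists.

    Accepted tags are normalized (trimmed, lowercased) so that " Botany "
    and "botany" are stored identically. Rejected tags are returned
    unchanged so a reviewer can see exactly what was entered.
    """
    accepted, rejected = [], []
    for tag in tags:
        normalized = tag.strip().lower()
        if normalized in CONTROLLED_VOCABULARY:
            accepted.append(normalized)
        else:
            rejected.append(tag)
    return accepted, rejected


if __name__ == "__main__":
    ok, bad = validate_tags([" Botany ", "botanny", "Field Notes"])
    print("accepted:", ok)
    print("rejected:", bad)
```

Because only pre-set terms are accepted, a typo such as “botanny” is flagged for correction instead of quietly becoming a new tag.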

Sharing data across the Smithsonian’s back end is an impressive technological feat but it’s only half of this story. The other half is about the relationship between the TC and the volunteers. And the pivotal component that enables the two sides to engage effectively: trust.

The TC’s role at the Smithsonian is as an aggregator, making bulk data available for volunteers to process and directing the flow of volunteer-processed data to the main repository. So, more than just trafficking in data, the TC nurtures its relationships with volunteers by means of technical fail-safe resources and down-to-earth, sincere human engagement.

Ferriter shows her respect for the volunteers when she refers to them as “volunpeers.” Ferriter said, “‘Volunpeers’ indicates the ways unit administrators and Smithsonian staff experience the TC along with volunteers. ‘Volunpeers’ underscores the values articulated by volunteers describing their activities and personal goals on the TC, including to learn, to help and to give back to something bigger….Establishing a collaborative space that uses peer-review resources brings to the foreground what is being done together rather than exclusively highlighting what is being done by particular individuals.”

TC staff made a crucial discovery when they figured out that what motivated people to volunteer was a sincere desire to help. Wang said, “Volunteers feel privileged and take the responsibility seriously. And they like that the Smithsonian values what they do.”

Photo of Meghan Ferriter.

Meghan Ferriter.

Ferriter said, “Volunteers indicated they were seeking increased behind-the-scenes access as a reward for participating, rather than receiving discounts or merchandise from Smithsonian vendors.” So TC staff developed a close relationship with the volunteers, and they remain in constant contact by means of social media.

“Communicating in an authentic way is central to my strategy,” Ferriter said. “Being authentic includes being vulnerable and expressing real enthusiasm. It also entails revealing my lack of knowledge while learning alongside volunteers. My strategy incorporates an inclusive attitude with the intent of shortening the distance of institutional authority and public positioning.”

Institutional authority, or the perception of it, can be an obstacle to finding volunteers. Wang said the Smithsonian, like other staid old institutions, was perceived several years ago to have an image problem. She said that research indicated, “People think it’s nothing but old white men scientists.” Wang and Ferriter do not suggest that the solution is for the TC to appear young and hip and “with it.” Rather, the TC demonstrates its inclusiveness in a very real and sincere way: by reaching out to any and all volunteers and treating them with appreciation and respect.

Volunteers are always publicly credited for their work. They can download and review PDFs of what they’ve done once a project is finished. Ferriter said, “I advise Smithsonian staff members who want to be part of the Transcription Center, ‘You need to understand that there is a commitment that you’re making to participate in this project, which requires you to be involved with communicating with the public, to answer their questions, to tell them specific details about projects, to be prepared to provide a behind-the-scenes tour.’”

Scan of a handwritten letter.

Scan of handwritten document from “The Legend of Sgenhadishon.” National Anthropological Archives, the Smithsonian.

Each project includes three steps: transcription, review and approval. One of the remarkable results of the TC/volunteer relationship is that the review process has become so thorough and consistently reliable, and volunteers behave so professionally and responsibly, that there is often little change required during the approval phase. This trust in the reviewers, which they earn and deserve, saves a significant amount of staff time for the Smithsonian in the approval phase.
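The three-step flow can be pictured as a simple one-way state machine. The Python sketch below is only an illustration under stated assumptions: the Page class, the Stage names and the final “complete” state are hypothetical and do not describe how the TC’s software is actually built.

```python
from enum import Enum


class Stage(Enum):
    TRANSCRIPTION = "transcription"
    REVIEW = "review"
    APPROVAL = "approval"
    COMPLETE = "complete"  # assumed end state, not named in the article


# Each stage hands off to exactly one next stage.
NEXT_STAGE = {
    Stage.TRANSCRIPTION: Stage.REVIEW,
    Stage.REVIEW: Stage.APPROVAL,
    Stage.APPROVAL: Stage.COMPLETE,
}


class Page:
    """A single digitized page moving through the workflow (hypothetical)."""

    def __init__(self, page_id):
        self.page_id = page_id
        self.stage = Stage.TRANSCRIPTION

    def advance(self):
        """Move the page to the next stage, or fail if it is already complete."""
        if self.stage not in NEXT_STAGE:
            raise ValueError(f"page {self.page_id} is already complete")
        self.stage = NEXT_STAGE[self.stage]
        return self.stage


if __name__ == "__main__":
    page = Page("example-001")
    while page.stage is not Stage.COMPLETE:
        print(page.page_id, "is in", page.stage.value)
        page.advance()
    print(page.page_id, "is", page.stage.value)
```

In this picture, a thorough volunteer review means the approval step rarely has to change anything, which is where the staff time is saved.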

Another remarkable result of the volunteers’ dedication is that TC staff have found the volunteers’ manual transcriptions to be statistically far superior to OCR output, which tends to be “dirty” and requires additional time and labor to correct.

Ferriter said that as successful as the Transcription Center is, as evidenced by the number of digital collections it has made keyword searchable, there remain further opportunities to look at the larger picture of inter-related data. “The story may be more than merely what is contained within the TC project,” Ferriter said. “There are opportunities to connect the project to its significance in history, science and other related SI and cultural heritage collections.”

When those opportunities arise, the volunpeers will no doubt help make the connections happen.
