Linking chatbots to collections for place-based storytelling

The following is a guest post from Library of Congress Labs Innovation Intern, Charlie Moffett. In the course of crafting data-driven narratives with digital collections, he created @govislandbot and an open-source mapping tutorial. Below he shares his process, some of the challenges he encountered, and the code.

I started my remote internship with LC Labs expecting to build a Twitterbot to support the Library of Congress Baseball Americana engagement with MLB All-Star Week 2018. Around the same time, I was ramping up an unrelated school project to build a chatbot prototype that would connect visitors on Governors Island with stories about its historic district. My aim in that undertaking was to use the physical context of the island to seamlessly engage New Yorkers with interesting, localized digital content using an AI tool I hadn't much explored up to that point. What I soon realized, however, was that if my aim was to connect Governors Island history with its context in space, I first needed to connect more meaningfully with the digital collections I was drawing that history from. I didn't feel right asserting that this 'place-based' approach to storytelling would be worth exploring without first establishing an appreciation for the digital collections I'd be serving up. It dawned on me that my internship with the Library could be the perfect opportunity to meet with and learn from the staff behind these collections, as well as other members of LC Labs who were leveraging the newly launched JSON API to provide machine-readable access to the collections for a variety of apps and purposes.
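For readers curious what that machine-readable access looks like in practice, here is a minimal sketch of querying the loc.gov JSON API, which returns JSON when the `fo=json` parameter is appended to a request. The search term and the result fields printed here are illustrative, not taken from the bot's actual code.

```python
from urllib.parse import urlencode

LOC_SEARCH = "https://www.loc.gov/search/"

def build_loc_search_url(query, page=1):
    """Build a loc.gov search URL that asks for a JSON response.

    The 'fo=json' parameter tells loc.gov to return machine-readable
    JSON instead of the HTML page; 'sp' selects the results page.
    """
    params = {"q": query, "fo": "json", "sp": page}
    return LOC_SEARCH + "?" + urlencode(params)

if __name__ == "__main__":
    # Fetching requires network access; shown for illustration only.
    import json
    import urllib.request

    with urllib.request.urlopen(build_loc_search_url("governors island")) as resp:
        data = json.load(resp)
    for result in data.get("results", [])[:5]:
        print(result.get("title"))
```

Keeping the URL construction in its own small function makes the query logic easy to test without touching the network.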

As I pivoted my primary objective for the internship, one of the first decisions I made about the chatbot prototype was to focus on just a select few primary sources for the stories I'd be crafting for the bot to tell. During an in-person visit early in my internship, I met with key folks from Chronicling America (ChronAm), of the Serial and Government Publications Division, and the Veterans History Project (VHP), of the American Folklife Center.

In meeting with Robin Butterhof of ChronAm, I picked up intricacies of how historical newspapers are collected and stored that later became critical for the prototype as I figured out how to programmatically access, manipulate, and deliver newspaper content within the Facebook Messenger environment. Chris Ehrman (also ChronAm) was kind enough to drill into the data extraction process with me through a number of his own Python scripts for a Beyond Words bot he was building: retrieving relevant attributes, downloading content in multiple formats, and visualizing summary statistics about the publications I was pulling from.
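Chris's scripts aren't reproduced here, but a sketch in the same spirit might search Chronicling America's public API and pull a few attributes from each result. The endpoint and parameter names (`andtext`, `format=json`) follow ChronAm's documented page-search interface; the particular fields selected in `summarize_results` are illustrative.

```python
from urllib.parse import urlencode

CHRONAM_SEARCH = "https://chroniclingamerica.loc.gov/search/pages/results/"

def build_chronam_search_url(term, page=1, rows=20):
    """Build a Chronicling America page-search URL that returns JSON."""
    params = {"andtext": term, "page": page, "rows": rows, "format": "json"}
    return CHRONAM_SEARCH + "?" + urlencode(params)

def summarize_results(payload):
    """Pull a few attributes from a parsed search response (a dict)."""
    return [
        {"title": item.get("title"), "date": item.get("date"), "page": item.get("page")}
        for item in payload.get("items", [])
    ]

if __name__ == "__main__":
    # Network call shown for illustration only.
    import json
    import urllib.request

    with urllib.request.urlopen(build_chronam_search_url("Governors Island")) as resp:
        for row in summarize_results(json.load(resp)):
            print(row)
```

From a summary like this, counting results per title or per year is a short step toward the kind of summary statistics Chris was visualizing.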

Connecting with Megan Harris and Jeanine Nault (Reference Specialist and Digital Assets Specialist with VHP, respectively) helped me imagine what it might look like to include multimedia stories in the chatbot experience, namely the sounds and transcripts of interviews with veterans who had served on Governors Island. During the same visit, I also met with Robert Brammer, a Legal Reference Librarian with the Law Library of Congress, to learn about their process of building the Law Library chatbot, including challenges and successes the chatbot had seen thus far.

Two men and a woman standing in front of a seated crowd, presenting images on a projection screen

Demo Day on Governors Island (image credit: @nycmedialab on Twitter)

These meetings and the follow-up research I performed as a result added entirely new and invaluable elements to my prototyping process. Laura Wrubel, Software Development Librarian at George Washington University Libraries, helped me to synthesize a lot of what I had picked up by then about the possibilities of the Library’s APIs and data. Listening to her talks, reading her documentation and meeting to review my own progress greatly helped my team back in New York to get going with digital collections. The work I was doing there with my peers from school and industry mentors from NYC Media Lab on the chatbot project centered primarily around user experience design and how we planned to pitch our stakeholders on the value of the project. My internship with LC Labs, on the other hand, allowed me to wade into the richness of the collections and brainstorm directly with the stewards of the material. That experience helped me to not only build a better product but, more importantly, understand and appreciate a much larger portion of the data and software lifecycle.

Because we were making a prototype, we moved fast and made content decisions to prove out UX concepts. The intent behind the bot was never to construct an authoritative history of the island, but rather to present historical materials in their own voices, with the added context of physical proximity to the subject at hand.

Side by side view of one historic newspaper page and view of a chatbot window with map and clipping from the same newspaper

Sections of historic NY Tribune papers were common targets for our chatbot prototype

Instead of just showing images of the newspaper pages, we copied OCR data from the PDF versions of the pages into text bubbles for the viewer to digest within Messenger. This introduced yet another moment of curation in our process, not only because we were selectively choosing which chunks of each article to include in our stories, but also through spell-checking and correcting the OCR data before cementing it into the bot. I imagined various configurations in these moments – using raw OCR, including articles in their entirety for users to sift through, using only imagery to preserve nuances of the material – but ultimately our impetus to craft the right user experience for our perceived audience reigned supreme, and we took more than a few quick-and-dirty shortcuts to get to a presentable prototype. But while we may not have necessarily engaged in "deep" storytelling, my team and I still had to contend with the "value" of the story for the audience and convey the right elements of the material as determined by the various groups whose feedback we solicited throughout the project.
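Raw OCR from newspaper scans tends to arrive with hyphenated line breaks and ragged whitespace, so a cleanup-and-chunking step along these lines (a sketch, not the team's actual code) could prepare it for Messenger's per-message character limit, assumed here to be 2,000 characters:

```python
import re

def clean_ocr(text):
    """Collapse the line breaks and stray whitespace typical of raw OCR."""
    text = text.replace("-\n", "")    # rejoin words hyphenated across lines
    text = re.sub(r"\s+", " ", text)  # collapse runs of whitespace
    return text.strip()

def chunk_for_messenger(text, limit=2000):
    """Split cleaned text into bubbles under a per-message character limit,
    breaking on sentence boundaries where possible."""
    chunks, current = [], ""
    for sentence in re.split(r"(?<=[.!?]) ", text):
        if current and len(current) + 1 + len(sentence) > limit:
            chunks.append(current)
            current = sentence
        else:
            current = (current + " " + sentence).strip()
    if current:
        chunks.append(current)
    return chunks
```

Automated cleanup like this only goes so far – OCR misreads still needed the manual spell-checking described above before the text went into the bot.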

In the end, I found that I was able to strike a balance between partially overlapping but distinct agendas to demo our proof of concept to the stakeholders in the fellowship, and at the same time create a space where one could reflect on this process of data collection, curation, and stewardship to inform current and future product designs. I’m particularly grateful to have had the opportunity in this project to flesh out an “emerging technology” prototyping experiment with a different sort of context than might otherwise have been obvious. It’s clear to me that these tools, platforms and workflows can enable greater access to stores of wonderful data, but the environments that those data are embedded in and depend on for sustenance are perhaps even more worthwhile to examine. As a next step, I hope to share my learnings with my peers and encourage other budding data scientists to spend time with the collections and imagine their own new and exciting applications of our Library.

My documentation for the chatbot prototype includes additional notes, screenshots, and open-source code, all of which can be accessed on the Open Science Framework.
