Linking chatbots to collections for place-based storytelling

The following is a guest post from Library of Congress Labs Innovation Intern, Charlie Moffett. In the course of crafting data-driven narratives with digital collections, he created @govislandbot and an open-source mapping tutorial. Below he shares his process, some of the challenges he encountered, and the code.

I started my remote internship with LC Labs expecting to build a Twitterbot to support the Library of Congress Baseball Americana engagement with MLB All-Star Week 2018. Around the same time, I was ramping up an unrelated school project to build a chatbot prototype that would connect visitors on Governors Island with stories about its historic district. My aim in that undertaking was to use the physical context of the island to seamlessly engage New Yorkers with interesting, localized digital content using an AI tool I hadn't much explored up to that point. What I soon realized, however, was that if my aim was to connect Governors Island history with its context in space, I first needed to connect more meaningfully with the digital collections I was channeling that history from. I didn't feel right asserting that this 'place-based' approach to storytelling would be worth exploring without first establishing an appreciation for the digital collections I'd be serving up. It dawned on me that my internship with the Library could be the perfect opportunity to meet with and learn from the staff behind these collections, as well as other members of LC Labs who were leveraging the newly launched loc.gov JSON API to provide machine-readable access to the collections for a variety of apps and purposes.
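For readers curious what that machine-readable access looks like in practice, here is a minimal sketch of querying the loc.gov JSON API. The `fo=json` parameter requests JSON output from loc.gov search pages; the helper function names here are my own, not the Library's, and field names in the response are read defensively:

```python
import json
import urllib.parse
import urllib.request

BASE = "https://www.loc.gov/search/"

def build_search_url(query, count=10):
    """Build a loc.gov search URL; `fo=json` asks for machine-readable results."""
    params = {"q": query, "fo": "json", "c": count}
    return BASE + "?" + urllib.parse.urlencode(params)

def search_titles(query):
    """Fetch search results from the loc.gov JSON API (requires network access)
    and return the title of each result."""
    with urllib.request.urlopen(build_search_url(query)) as resp:
        data = json.load(resp)
    return [item.get("title") for item in data.get("results", [])]

print(build_search_url("governors island"))
```

The same `fo=json` trick works on collection and item pages, which is what makes the site itself double as an API.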

As I pivoted my primary objective for the internship, one of the first decisions I made for the chatbot prototype was to focus on just a select few primary sources as the basis for the stories I'd be crafting for the bot to tell. During an in-person visit early in my internship, I met with key folks from Chronicling America (ChronAm), in the Serial and Government Publications Division, and the Veterans History Project (VHP), in the American Folklife Center.

In meeting with Robin Butterhof of ChronAm, I picked up on intricacies of how historical newspapers are collected and stored that later became critical for the prototype as I figured out how to programmatically access, manipulate, and deliver newspaper content within the Facebook Messenger environment. Chris Ehrman (also of ChronAm) was kind enough to drill into the data extraction process with me through a number of his own Python scripts for a Beyond Words bot he was building: retrieving relevant attributes, downloading content in multiple formats, and visualizing summary statistics about the publications I was pulling from.
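As an illustration of the kind of programmatic access those scripts relied on, here is a sketch against ChronAm's public search API. The `andtext` and `format=json` parameters and the per-page `ocr.txt` endpoint are part of the documented Chronicling America API; the helper names and the exact page id I pass are assumptions for this example:

```python
import urllib.parse

SEARCH = "https://chroniclingamerica.loc.gov/search/pages/results/"

def chronam_search_url(text, rows=5):
    """Full-text search of digitized newspaper pages, asking for JSON output."""
    params = {"andtext": text, "rows": rows, "format": "json"}
    return SEARCH + "?" + urllib.parse.urlencode(params)

def page_formats(page_id):
    """Search hits carry an id like '/lccn/sn83030214/1903-05-01/ed-1/seq-2/';
    the same path serves the page in several formats (JSON metadata, raw OCR
    text, and image/PDF derivatives at .jp2 and .pdf)."""
    base = "https://chroniclingamerica.loc.gov/" + page_id.strip("/")
    return {"json": base + ".json", "ocr_text": base + "/ocr.txt"}

print(chronam_search_url("governors island"))
print(page_formats("/lccn/sn83030214/1903-05-01/ed-1/seq-2/")["ocr_text"])
```

Fetching the `ocr_text` URL returns the plain OCR transcription of the page, which is what made pulling newspaper text into Messenger bubbles feasible in the first place.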

Connecting with Megan Harris and Jeanine Nault (Reference Specialist and Digital Assets Specialist with VHP, respectively) helped me imagine what it might look like to include multimedia stories in the chatbot experience, namely the sounds and transcripts of interviews with veterans who had served on Governors Island. During the same visit, I also met with Robert Brammer, a Legal Reference Librarian with the Law Library of Congress, to learn about their process of building the Law Library chatbot, including challenges and successes the chatbot had seen thus far.


Demo Day on Governors Island (image credit: @nycmedialab on Twitter)

These meetings and the follow-up research I performed as a result added entirely new and invaluable elements to my prototyping process. Laura Wrubel, Software Development Librarian at George Washington University Libraries, helped me to synthesize a lot of what I had picked up by then about the possibilities of the Library’s APIs and data. Listening to her talks, reading her documentation and meeting to review my own progress greatly helped my team back in New York to get going with digital collections. The work I was doing there with my peers from school and industry mentors from NYC Media Lab on the chatbot project centered primarily around user experience design and how we planned to pitch our stakeholders on the value of the project. My internship with LC Labs, on the other hand, allowed me to wade into the richness of the collections and brainstorm directly with the stewards of the material. That experience helped me to not only build a better product but, more importantly, understand and appreciate a much larger portion of the data and software lifecycle.

Because we were making a prototype, we moved fast and made content decisions to prove out UX concepts. The intent behind the bot was never to construct an authoritative history of the island, but rather to present historical materials in their own voices with the added context of physical proximity to the subject at hand.


Sections of historic NY Tribune papers were common targets for our chatbot prototype

Instead of just showing images of the newspaper pages, we copied OCR data from the PDF versions of the pages into text bubbles for the viewer to digest within Messenger. This introduced yet another moment of curation in our process, not only because we were selectively choosing which chunks of the article to include in our stories, but also through spell-checking and correcting OCR data before cementing it into the bot. I imagined various configurations in these moments – using raw OCR, including the entirety of articles for users to sift through, using only imagery to preserve nuances of the material – but ultimately our impetus to craft the right user experience for our perceived audience reigned supreme, and we took more than a few quick-and-dirty shortcuts to get to a presentable prototype. But while we may not have necessarily engaged in "deep" storytelling, my team and I still had to contend with the "value" of the story for the audience and convey the right elements of the material as determined by the various groups whose feedback we solicited throughout the project.
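The spell-checking and correcting itself was manual, but a first mechanical pass over raw OCR can go a long way before a human touches it. The sketch below is illustrative only, not the exact cleanup we used: it rejoins words that OCR hyphenated across line breaks and collapses stray whitespace:

```python
import re

def clean_ocr(text):
    """First-pass cleanup of raw OCR text: rejoin words hyphenated across
    line breaks, then collapse runs of whitespace into single spaces."""
    # "Gover-\nnors" -> "Governors"
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # Collapse newlines and repeated spaces into single spaces.
    text = re.sub(r"\s+", " ", text)
    return text.strip()

raw = "The ferry to Gover-\nnors  Island\nleaves hourly."
print(clean_ocr(raw))  # → "The ferry to Governors Island leaves hourly."
```

Anything a pass like this can't fix – misread characters, column bleed-through – still needs the kind of hand-correction we did before the text went into the bot.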

In the end, I found that I was able to strike a balance between partially overlapping but distinct agendas to demo our proof of concept to the stakeholders in the fellowship, and at the same time create a space where one could reflect on this process of data collection, curation, and stewardship to inform current and future product designs. I’m particularly grateful to have had the opportunity in this project to flesh out an “emerging technology” prototyping experiment with a different sort of context than might otherwise have been obvious. It’s clear to me that these tools, platforms and workflows can enable greater access to stores of wonderful data, but the environments that those data are embedded in and depend on for sustenance are perhaps even more worthwhile to examine. As a next step, I hope to share my learnings with my peers and encourage other budding data scientists to spend time with the collections and imagine their own new and exciting applications of our Library.

My documentation for the chatbot prototype includes additional notes, screenshots, and open-source code, all of which can be accessed on the Open Science Framework.
