Lowering barriers to using collections in an NDSR workshop with Shawn Averkamp

This is a guest post by Charlotte Kostelic, National Digital Stewardship Resident with the Library of Congress and Royal Collection Trust for the Georgian Papers Programme. Her project focuses on exploring ways to optimize access and use among related digital collections held at separate institutions. This work has included a comparative analysis of international metadata standards and a series of user interviews in order to determine how current practices meet user needs. A final report on this project will be released at the end of December.

On November 13th, the Library of Congress hosted Computation in Conversation: Fostering New Fluencies in Collections as Data, a lecture and workshop led by Shawn Averkamp, Manager of Metadata Services at New York Public Library. I hosted the event as an enrichment session for the National Digital Stewardship Residency (NDSR) program, and I wanted to invite someone who could highlight how collections as data can be accessible to all library users, regardless of their level of technical expertise. Within NDSR, I am working on a project that focuses on user needs for digital collections used in research. In the context of my project, research could mean the work an elementary school student does for a class project or the work an academic does in preparing a journal article. What these and all other users have in common is a need for data to be made accessible in formats that they can use and understand.

Averkamp’s presentation focused on how librarians in every department of a library – from public services and outreach to cataloging and systems – can contribute their particular expertise to the development of collections as data. It also explored how librarians can present collections as data in ways that lower the barrier to use. In preparing for the workshop and presentation, Averkamp met with librarians across NYPL to ask them about the types of data users request and how users work with the data sets NYPL has made public. Institutions such as the Library of Congress and NYPL have made significant amounts of data accessible to users. However, Averkamp pointed out that users may not know what to do with the data, or may assume that computational work is not for them.

Venn diagram describing "collections as data" as the overlap of people who can code and people who care about collections. Slide from Shawn Averkamp’s “Computation in Conversation” workshop on 13 November. Photo by Meghan Ferriter.

While the term “computational use” might suggest that one needs to be able to write code, it can also mean working with data in a spreadsheet. The barriers to using computational methods with collections as data can be even higher when the labor that goes into transforming data sets into visualizations or other digital projects is not made visible in the end product. Averkamp’s workshop aimed to lower this barrier by presenting simple, free tools that can be used to make a data set ready for computational use. Using just Google Sheets and Timeline.js, workshop attendees standardized dates and loaded the data into a template so that individual objects from collections held by the Library of Congress and NYPL could be presented in a timeline. Anyone can view Averkamp’s slides or try the workshop on their own by following her guide: https://github.com/saverkamp/loc-talk-2017.
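To make the date-standardization step concrete, here is a rough Python sketch of the same idea. It is not taken from Averkamp's guide, which worked in Google Sheets. It pulls a four-digit year out of free-text catalog dates and writes rows that could be pasted into a spreadsheet. The sample records are hypothetical, and the `Year` and `Headline` column names are assumed to match the Timeline.js spreadsheet template; check the template itself before relying on them.

```python
import csv
import io
import re
from typing import Dict, Iterable, List, Tuple

# Matches a standalone four-digit year from 1000 to 2099. The lookarounds
# (rather than \b) also catch years glued to letters, such as "c1855".
YEAR_PATTERN = re.compile(r"(?<!\d)(1\d{3}|20\d{2})(?!\d)")


def standardize_year(raw_date: str) -> str:
    """Return the first four-digit year in a free-text date, or '' if none."""
    match = YEAR_PATTERN.search(raw_date)
    return match.group(1) if match else ""


def to_timeline_rows(records: Iterable[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Turn (title, raw_date) pairs into rows keyed by assumed template columns.

    Records without a recognizable year (for example "184-?") are skipped,
    because a timeline entry needs at least a year to be placed.
    """
    rows = []
    for title, raw_date in records:
        year = standardize_year(raw_date)
        if year:
            rows.append({"Year": year, "Headline": title})
    return rows


def rows_to_csv(rows: List[Dict[str, str]]) -> str:
    """Serialize rows as CSV text that could be pasted into a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["Year", "Headline"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical sample records, not real catalog data.
    sample = [
        ("Example pamphlet A", "[ca. 1850]"),
        ("Example pamphlet B", "c1855."),
        ("Example pamphlet C", "184-?"),
    ]
    print(rows_to_csv(to_timeline_rows(sample)))
```

Note that the third sample record is silently dropped. Choices like this are easy to make during cleanup and invisible in the finished timeline, which is one reason the transformation steps are worth writing down.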

Example Timeline.js timeline, created during Averkamp's workshop, showing titles of publications relating to women's suffrage from 1835 to 1880 held by the Library of Congress and the New York Public Library.

By walking through each of the steps it takes to transform a data set into a timeline, Averkamp also showed how context can be lost with each change made to the data. She discussed this loss of context in relation to Caroline Sinders’ concept of the data ethnographer: Sinders stresses the need for data ethnographers who can describe the social and cultural contexts in which a data set was created. For library collections as data, this could mean publishing a library’s cataloging guidelines, the date the data was created, and the transformations made to the data before it was released publicly. Averkamp’s workshop demonstrated that by documenting the context in which their data sets were created, and by providing simple tools for working with collections as data, librarians and libraries can lower the barrier to using collections as data.
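One lightweight way to capture that context is to keep a running record of each transformation and publish it next to the data. The Python sketch below is a hypothetical illustration, not a format proposed by Averkamp or Sinders; the `DatasetContext` class, its field names, and the placeholder values are all assumptions.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class DatasetContext:
    """Hypothetical record of the context to publish alongside a data set."""

    title: str
    created: str                # date the data was created (ISO 8601 string)
    cataloging_guidelines: str  # where the governing guidelines are published
    transformations: List[str] = field(default_factory=list)

    def log(self, step: str) -> None:
        """Record, in plain language, one change made to the data."""
        self.transformations.append(step)

    def to_json(self) -> str:
        """Serialize the context so it can be published with the data."""
        return json.dumps(asdict(self), indent=2)


if __name__ == "__main__":
    # All values are placeholders for illustration.
    context = DatasetContext(
        title="Example collection export",
        created="2017-01-01",
        cataloging_guidelines="https://example.org/cataloging-guidelines",
    )
    context.log("Extracted a four-digit year from free-text date fields")
    context.log("Dropped records with no recognizable year")
    print(context.to_json())
```

Keeping the log as plain-language sentences, rather than code or machine-only metadata, is a deliberate choice here: the record is meant for readers of any technical background.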

You can follow Charlotte Kostelic and Shawn Averkamp on Twitter and find Averkamp’s workshop notes on GitHub.

Welcoming Laura Wrubel and exploring digital scholarship at the Library of Congress

In November, the LC Labs team welcomed Laura Wrubel as she kicked off her research leave in residence with the Library of Congress. Over the next 3 months, she’ll explore digital scholarship with our team and how it might be best supported. We checked in with her to learn more about her goals, background, and […]

October Innovator-in-Residence Update

Library of Congress Innovator-in-Residence, Jer Thorp, has started diving into the collections at the Library. We’ve rounded up some of his activities in October and how he is sharing his process in this post. Jer has created a “text-based exploration of Library of Congress @librarycongress‘ MARC records, specifically of ~9M books & the names of […]

Welcoming Jer Thorp as Innovator-in-Residence

Starting this week, acclaimed data artist Jer Thorp began his tenure as the 2017 Library of Congress Innovator-in-Residence. He will spend six months with the National Digital Initiatives team exploring the Library’s digital collections and creating an art piece that will be displayed in the Library’s public spaces. Jer Thorp is an artist and educator […]

Hack-to-Learn at the Library of Congress

When hosting workshops, such as Software Carpentry, or events, such as Collections As Data, our National Digital Initiatives team made a discovery: there is an appetite among librarians for hands-on computational experience. That’s why we created an inclusive hackathon, or a “hack-to-learn,” taking advantage of the skills librarians already have and pairing them with programmers to […]

Developing a Digital Preservation Infrastructure at Georgetown University Library

This is a guest post by Joe Carrano, a resident in the National Digital Stewardship Residency program. The Joseph Mark Lauinger Memorial Library is at home among the many Brutalist-style buildings in and around Washington, D.C. This granite-chip aggregate structure, the main library at Georgetown University, houses a moderate-sized staff that provides critical information needs […]

IEEE Big Data Conference 2016: Computational Archival Science

This is a guest post by Meredith Claire Broadway, a consultant for the World Bank. Computational Archival Science can be regarded as the intersection between the archival profession and “hard” technical fields, such as computer science and engineering. CAS applies computational methods and resources to large-scale records and archives processing, analysis, storage, long-term preservation and access. […]

The University of Richmond’s Digital Scholarship Lab

In November 2016, staff from the Library of Congress’s National Digital Initiatives division visited the University of Richmond’s Digital Scholarship Lab as part of NDI’s efforts to explore data librarianship, computational research and digital scholarship at other libraries and cultural institutions. Like many university digital labs, the DSL is based in the library, which DSL […]