In conversation: LC Labs staff and Einstein Educator Fellow discuss library data, STEM education, and primary source analysis

I sat down with Peter DeCraene, the 2020-2022 Albert Einstein Distinguished Educator Fellow at the Library of Congress, to reflect on how messy, yet rewarding, it is to think of library collections as data in a classroom setting.

Eileen J. Manchester (EJM): Hi, Peter. First of all, thank you so much for spending the last two years as a Fellow at the Library of Congress. During your fellowship, you’ve written many insightful posts about bringing mathematics to life with historical sources and, vice versa, the history of mathematics education.

We started working together because of our shared interest in datasets. My team, LC Labs, investigates creative and computational ways that researchers and artists can use library data and metadata. This approach of treating digital library information as data for computation is broadly described as “collections as data.” To make people aware of these computational resources, LC Labs has convened conferences, hackathons, software carpentries, data jams, data challenges—you name it!

Collections as Data graphic created by Natalie Buda Smith, Library of Congress

But when you and I started to explore the idea of using datasets as primary sources, K-12 teachers were our audience, not computer scientists. To kick off our conversation, can you tell me more about how K-12 educators currently teach data? What sorts of activities would your students undertake when working with data?

Peter DeCraene (PD): Data analysis and representation are becoming big topics in school mathematics. At my home school, the number of students taking statistics has at least tripled in the last ten years, and we just added a data science course as well. Both of these involve understanding, modeling with, and representing data. Additionally, I teach a computer science principles class, where we discuss and analyze data visualizations, and learn how to create representations from large data files.

EJM: Are there datasets in the Library’s collection that work well as primary sources for the STEM classroom?

PD: The datasets generated during the transcription process in the By the People program provide some really interesting opportunities for cross-curricular connections between math and humanities. Performing some basic sentiment analysis techniques on the data in a math or computer science class may lead to new insights into events studied in a history class. The Grand Comics Database has some qualitative and quantitative data that is interesting to look at, and other data files are filled with numeric data from which trends and patterns can be found and visualized.

The difficult thing about many of these files is that they may not be “classroom ready” for a specific task or project. These are real-life files with messy data, possibly little documentation, and sizes that may take a very long time to download. The files are ripe for analysis, but in their current form, many are probably more useful to researchers than they would be for students at the K-12 level. I don’t say that to discourage teachers from using them, but to make them aware that these files are not as immediately student-friendly as a photograph or newspaper article.
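
As a rough illustration of the "basic sentiment analysis" mentioned above, here is a minimal sketch of a word-counting approach that a math or computer science class might try on transcribed text. Everything in it is an assumption made for illustration: the two word lists are invented and far too small for real use, the sample sentences are made up rather than taken from any Library collection, and the function names are hypothetical. It is one simple way to score text, not the method used by By the People or LC Labs.

```python
import re

# Hypothetical mini-lexicons, invented for this sketch. A real lesson would
# use a published word list and discuss who chose the words and why.
POSITIVE = {"good", "glad", "happy", "hope", "love", "kind", "peace"}
NEGATIVE = {"bad", "sad", "fear", "war", "loss", "grief", "sick"}


def tokenize(text):
    """Lowercase the text and return its words (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Return (positive words - negative words) / total words.

    The result ranges from -1.0 (all negative) to 1.0 (all positive).
    Text with no words scores 0.0.
    """
    words = tokenize(text)
    if not words:
        return 0.0
    positive = sum(1 for word in words if word in POSITIVE)
    negative = sum(1 for word in words if word in NEGATIVE)
    return (positive - negative) / len(words)


# Made-up sample sentences standing in for lines of a transcription file.
samples = [
    "I am glad and full of hope for peace.",
    "The war brought fear and grief.",
]
for sentence in samples:
    print(round(sentiment_score(sentence), 2), sentence)
```

A score like this is crude on purpose. Students can see exactly which words moved it, which makes it a natural starting point for asking who compiled the word list and what it misses.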

EJM: How would you describe the various kinds of datasets that the Library of Congress makes available? Which of these do you still have questions about?

PD: (laughing) Huge and varied, and all of them! There’s everything from samples of government PowerPoint presentations to transcriptions of baseball executive Branch Rickey’s notes. There are PDFs with United States Geological Survey reports on water and spreadsheets with information about subsidized housing.

The biggest question I have about the Selected Datasets Collection as a whole is the same one I have for most primary sources: “Why these?” I am really curious about why these items were created, how they came to be in the Library’s custodianship and, more specifically, in this particular online collection.

EJM: The notion of “data” is almost as old as the notion of research and scientific inquiry itself. During your fellowship, did you come across any information related to the history of science or the history of math? That is, historical examples of people conducting quantitative research with data?

PD: There are actually lots of items in the Library’s collections related to data, in addition to those in the Selected Datasets collection. For example:

EJM: How does this shape your thinking about modern computer science and data science education?

PD: When I studied computer science in college, it was all about algorithms, programming, and hardware. More recently, the wide availability of lots of different kinds of data really demands that we look beyond the technical aspects to the human aspects – who gathered the data, who created the visualization, and for what purpose. Of course, these have always been important questions, and seeing some of these historical items really drives home that human connection. That connection and context will certainly color how and what I teach next year. And everyone needs to understand that data is a human, not an objective, phenomenon.

EJM: If you had one piece of advice for a digital librarian, what would it be? What about for a K-12 teacher?

PD: As a teacher in a large high school, I know how easy it is to get stuck in our departmental silos, and I have heard many teachers (including myself at one time) say “I teach math,” with the unexamined implication that the subject is objective. STEM subjects are still human subjects, and finding connections outside of that silo is really important. I’m sure that’s something people in any field, whether digital librarian, computer scientist, or STEM teacher, should keep in mind. If we don’t reach out to others outside our own narrow field, we risk cutting ourselves off from the connections, expertise, and imagination of everybody else – we would stunt our own growth and the growth of our fields.
