Datasets as Primary Sources, Part II

This post was written by Peter DeCraene, a 2021-22 Albert Einstein Distinguished Educator Fellow at the Library of Congress. This is part 2 of an ongoing occasional series about using datasets as primary sources. We thank Peter and the Teaching with LC team for allowing us to cross-post his writing for Signal Blog readers! 

Part 1 of this series looked at the transcription records of the Rosa Parks Papers. Other datasets in the Library’s Selected Datasets online collection include the diary of Samuel J. Gibson, a Union soldier held in a Confederate prisoner-of-war camp, and the papers of Susan B. Anthony. Each of these data files contains information about historical people and their views of the world around them at the time. The collection also includes datasets as diverse as U.S. Geological Survey reports on water use and the Grand Comics Database.

Webpage for Selected Datasets collection

The files in the Selected Datasets collection vary widely and require different approaches to teaching and learning, and the collection is always growing. One of the well-documented and readily accessible datasets is the “Dataset from a picture of subsidized households: 2008.” For computer science students interested in learning to access, clean, and analyze complex data, this dataset offers a wealth of information about the state of public housing in the U.S. at the beginning of the 21st century. It also includes a detailed document describing the information in the data files. Digging into this data would make a great cross-curricular project with social studies students researching the history and current state of public housing.

For teachers and students who want to bypass some of the more technical aspects of accessing items in the Selected Datasets collection, treating the Chronicling America collection as a dataset and using its advanced search features can also yield interesting information. For example, students might count the newspapers in Virginia with the words “free” and “independent” on their front pages in the years leading up to the Civil War, and compare that count with the results of the same search of newspapers in California, Alabama, or Ohio. In what contexts are those words used in each state? Running the search one year at a time, from the start of James Buchanan’s presidency in 1857 through the end of the war in 1865, might also reveal some interesting trends. Or students could search for those words on the second page and discuss why the words might show up more or less frequently there. Determining the search parameters, and then analyzing and representing the results, would make a good collaboration between students in math and social studies classes.
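For the year-by-year search described above, students end up with one hit count per state per year, read off each results page. The following is a minimal sketch, assuming those counts are typed in by hand as `(state, year, hits)` tuples, of one way to keep them organized in Python. The function names `build_table`, `share_by_state`, and `print_table` are hypothetical, and the code does not query Chronicling America itself.

```python
"""Tabulate year-by-year hit counts recorded by hand from the Chronicling
America advanced search (one search per state per year)."""

from collections import defaultdict

# Start of Buchanan's presidency (1857) through the end of the war (1865).
YEARS = range(1857, 1866)


def build_table(records):
    """records: iterable of (state, year, hits) tuples entered by students.

    Returns {state: [hits for each year in YEARS]}. Years that were never
    searched stay None, so gaps remain visible instead of turning into 0.
    """
    table = defaultdict(lambda: [None] * len(YEARS))
    for state, year, hits in records:
        if year not in YEARS:
            raise ValueError(f"{year} is outside {YEARS.start}-{YEARS.stop - 1}")
        table[state][year - YEARS.start] = hits
    return dict(table)


def share_by_state(table, year):
    """Each state's share of all recorded hits in one year.

    States with no count for that year are left out; returns {} if the
    year has no hits at all.
    """
    idx = year - YEARS.start
    counts = {s: row[idx] for s, row in table.items() if row[idx] is not None}
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()} if total else {}


def print_table(table):
    """Print one row per state, one column per year; '-' marks a missing search."""
    print("state".ljust(12) + "".join(str(y).rjust(6) for y in YEARS))
    for state, row in sorted(table.items()):
        cells = ("-" if h is None else str(h) for h in row)
        print(state.ljust(12) + "".join(c.rjust(6) for c in cells))
```

Students would fill `records` with the numbers they read from the search results; the counts come from the search, not from this code. Keep in mind that raw counts also depend on how many pages from each state have been digitized for a given year, so a share or trend computed from them reflects the collection as well as the newspapers.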

Chronicling America advanced search features

Data scientists perform this type of frequency analysis all the time on data gathered from many sources: polling information, website usage, or social media accounts, for example. In addition to the typical questions we ask about primary sources (Who created this item, why was it created, who was the audience?), this type of primary source analysis also raises other questions: What might be missing from the data? How might this data have been used or misused? Would different data representations lead to different interpretations? The connections across school subjects and to current cultural practices make analysis of datasets as primary sources a vital and engaging part of our lessons.
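A frequency analysis can also be run on the page text itself rather than on search-result counts. The following is a minimal sketch, assuming the OCR text of each newspaper page has already been saved as a plain string; the function names `word_frequencies` and `contexts` are hypothetical. It counts whole-word matches and pulls short snippets so students can read each use in context.

```python
"""Count chosen words across a set of texts (for example, OCR text saved
from newspaper pages) and pull short snippets showing each use."""

import re
from collections import Counter

# Runs of letters; splitting on anything else means "free-soil" yields
# "free" and "soil", so hyphenated compounds count toward "free".
WORD = re.compile(r"[a-z]+")


def word_frequencies(texts, targets):
    """Return a Counter of how often each target word occurs across texts.

    Matching is case-insensitive and on whole words only, so "freedom"
    is not counted as "free".
    """
    targets = {t.lower() for t in targets}
    counts = Counter({t: 0 for t in targets})
    for text in texts:
        for word in WORD.findall(text.lower()):
            if word in targets:
                counts[word] += 1
    return counts


def contexts(text, target, width=30):
    """Yield up to `width` characters on each side of every whole-word match."""
    pattern = rf"\b{re.escape(target)}\b"
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        start = max(0, m.start() - width)
        end = min(len(text), m.end() + width)
        yield text[start:end].replace("\n", " ")
```

OCR text of historic newspapers often contains recognition errors, so a garbled “free” is silently skipped by an exact match like this one. That gives students one concrete answer to the question of what might be missing from the data.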

Annotation as Aesthetic: A Closing Interview with Innovator in Residence Courtney McClellan

2021 Innovator in Residence Courtney McClellan created Speculative Annotation, an experimental browser-based application that encourages students and teachers to have conversations with historic Library of Congress items through annotation and mark-making. McClellan is a research-based artist who lives in Atlanta, Georgia. With a subject focus on speech and civic engagement, McClellan works in a range […]

It’s a bird, it’s a plane, it’s a…derivative dataset!

This post describes a collaboration between LC Labs member Eileen J. Manchester and Peter DeCraene, the Albert Einstein Distinguished Educator Fellow, to answer the question: “What would it mean to treat a dataset as a primary source?”

Sparking the Datamagination: 2021 Digital Strategy Summer Intern Design Sprint part II

This is an interview with Maria Capecchi, Abigail Tick, and Joshua Ortiz Baco, three of the seven students who joined our team during the summer of 2021. As a small group, they worked together to better understand the Newspaper Navigator dataset with the needs of undergraduate students in mind.

Next Slide Please: 2021 Digital Strategy Summer Intern Design Sprint part I

This is an interview with Emily Zerrenner, Jodanna Domond, Luke Borland, and Darshni Patel, four of the seven students who joined our team during the summer of 2021. As a small group, they worked together to better understand the Library’s Web Archives with the needs of researchers and data visualization artists in mind.