Summer 2020 Junior Fellow Reveals Hispanic Digital Collections

(The following is a guest post by Hispanic Division Junior Fellow Matthew Bova.)

My assignment as a Junior Fellow was to make a visualization showing the digitized data related to the Hispanic Division that is available via the Library’s website. During my first few days, I found the LOC API (Application Programming Interface). This API was developed by LOC Labs and allows computer programs to access data about all the items found in a search of the Library’s website. You can try this for yourself: search for something like “dogs” on the Library’s website and add “&fo=json” to the end of the URL of the results page. The browser then displays most of the data you would find on that webpage in a format called JSON, which computers can interpret far more easily than a normal webpage.
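For readers who would rather do this from a script than from the address bar, here is a minimal Python sketch of the same trick. The base URL and the `q` parameter are assumptions based on what a loc.gov search URL looks like in a browser, and the function names are made up for illustration. Only `fo=json` comes from the post itself.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base address of a site-wide search on loc.gov.
SEARCH_BASE = "https://www.loc.gov/search/"


def build_search_url(query: str) -> str:
    """Return a search URL that asks for JSON instead of a normal webpage."""
    # "q" is assumed to be the search-term parameter; "fo=json" is the
    # switch described in the post.
    return SEARCH_BASE + "?" + urlencode({"q": query, "fo": "json"})


def fetch_search(query: str) -> dict:
    """Download one page of search results and decode the JSON body."""
    with urlopen(build_search_url(query), timeout=30) as response:
        return json.load(response)
```

Calling `fetch_search("dogs")` needs a network connection and returns a Python dictionary. Printing its keys is a quick way to see what the API exposes.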

I found I was able to count and display the results of my searches, and eventually discovered a method for counting every item by format (books, audio, manuscripts, etc.). I then came up with a method of finding every digitized item with Hispanic metadata, meaning it contained a subject, location or language that could be considered “Hispanic.” (Within the Library of Congress, the Hispanic Division recommends the acquisition of materials and provides reference services related to Spain, Portugal, Latin America, the Caribbean, and US Latinx communities.) This allowed me to create a simple visualization (shown below), using open-source graphing software.
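The two ideas in this paragraph, counting by format and flagging items with Hispanic metadata, can be sketched in a few lines of Python. This is an illustration, not the code used for the project. The field names (`subject`, `location`, `language`, `original_format`) are assumptions about what an item record looks like, and the term list is a tiny, hypothetical stand-in for the real one, which the post does not give.

```python
from collections import Counter

# Hypothetical, deliberately short term list for illustration only.
HISPANIC_TERMS = {"spain", "portugal", "mexico", "spanish", "portuguese"}

# Assumed names of the metadata fields that get checked.
METADATA_FIELDS = ("subject", "location", "language")


def as_list(value):
    """Normalize a missing, single, or multi-valued field to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def has_hispanic_metadata(item: dict) -> bool:
    """True if any subject, location or language matches the term list."""
    for field in METADATA_FIELDS:
        for value in as_list(item.get(field)):
            if str(value).lower() in HISPANIC_TERMS:
                return True
    return False


def count_by_format(items) -> Counter:
    """Count items per format; items without a format count as 'unknown'."""
    counts = Counter()
    for item in items:
        counts.update(as_list(item.get("original_format")) or ["unknown"])
    return counts


# Made-up sample records, not real catalog data.
sample = [
    {"original_format": ["book"], "language": ["spanish"]},
    {"original_format": ["map"], "location": ["Mexico"]},
    {"original_format": ["book"], "subject": ["dogs"]},
]
hispanic = [item for item in sample if has_hispanic_metadata(item)]
print(count_by_format(sample), count_by_format(hispanic))
```

Exact, case-insensitive matching keeps the sketch simple. A real term list would need more care with spelling variants and multi-word subjects.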

An Analysis of LOC API Metadata. Visualization by Junior Fellow Matthew Bova.

This method has its limitations. I was only extracting the counts themselves rather than the underlying item records: my computer grabbed only what it needed to make this chart, which limited how much I could tell from the graph. For example, my dataset can’t tell you how many of these “Hispanic” books were published before 1900. To do that, you would need a dataset containing every single item in the Library, and I set out to create that dataset.

This proved to be more difficult than expected. The API isn’t designed for mass data collection, and only returns the first 100,000 results of a search. To work around this, I split each format search into individual years, so that each chunk stayed under that limit, and downloaded the results as .csv files, a very simple tabular data format. Using this method, I sorted out the items with Hispanic metadata and was able to make a chart that shows the growth of the Library’s digital catalog, both in total and for items with Hispanic metadata.
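A rough Python sketch of the year-by-year chunking and the .csv export might look like the following. The `fa=original-format:...` facet filter and the `dates=YYYY/YYYY` range are assumptions about how a loc.gov search URL is written, and the function and column names are invented for this example. It is not the script used for the project.

```python
import csv
import io
from urllib.parse import urlencode

# Assumed base address of a site-wide search on loc.gov.
SEARCH_BASE = "https://www.loc.gov/search/"


def yearly_urls(original_format: str, first_year: int, last_year: int) -> list:
    """Build one JSON search URL per year so each chunk stays small."""
    urls = []
    for year in range(first_year, last_year + 1):
        params = {
            "fa": f"original-format:{original_format}",  # assumed facet syntax
            "dates": f"{year}/{year}",                    # assumed date-range syntax
            "fo": "json",
        }
        urls.append(SEARCH_BASE + "?" + urlencode(params))
    return urls


def items_to_csv(items, columns) -> str:
    """Flatten item records into CSV text, joining multi-valued fields."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    for item in items:
        row = {}
        for column in columns:
            value = item.get(column, "")
            if isinstance(value, list):
                value = "; ".join(str(v) for v in value)
            row[column] = value
        writer.writerow(row)
    return buffer.getvalue()
```

Each URL from `yearly_urls("book", 1800, 1900)` could then be downloaded in turn, and the collected records passed to `items_to_csv` before writing the text to a file. A year with more than 100,000 results would still need to be split further.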

Digitized materials with Hispanic Metadata. Visualization by Junior Fellow Matthew Bova.

This chart allows the Hispanic Division staff to see the Hispanic resources that are accessible through a loc.gov search and to use that data to plan for the future.

Additional Resources:

  • Junior Fellows Program Overview
  • Watch a webcast of Matt talking about his Junior Fellows’ project.
  • Meet all the 2020 Junior Fellows and read about their wide-ranging virtual explorations of the LOC collections.
