APIs: How Machines Share and Expose Digital Collections

By DLR German Aerospace Center (Zwei Roboterfreunde / Two robot friends) [CC BY 2.0 (http://creativecommons.org/licenses/by/2.0)], via Wikimedia Commons

Kim Milai, a retired school teacher, was searching on ancestry.com for information about her great-grandfather, Amohamed Milai, when her browser turned up something she had not expected: a page from the Library of Congress’s Chronicling America site displaying a scan of the Harrisburg Telegraph newspaper from March 13, 1919. On that page was a story with the headline, “Prof. Amohamed Milai to Speak at Second Baptist.” The article was indeed about her great-grandfather, who was an enigmatic figure within her family, but… “Professor!?” Milai said. “He was not a professor. He exaggerated.” Whether it was the truth or an exaggeration, it was a rare bit of documentation about him, so Milai printed it out and added a colorful piece to the mosaic of her family history. But she might never have found that piece if it weren’t for ancestry.com’s access to Chronicling America’s collections via an API.

Application Programming Interfaces (APIs) are not new. API-based interactions are part of the backdrop of modern life. For example, your browser, an application program, interfaces with web servers. Another example is an ATM screen that enables you to interact with a financial system. When you search online for a flight, the experience involves multiple API relationships: your travel site or app communicates with individual airlines’ sites which, in turn, query their systems and pass their schedules and prices back to your travel site or app. When you book the flight, your credit card system gets involved. But all you see during the process are a few screens, while in the background, at each point of machine-to-machine interaction, servers rapidly communicate with each other, across their boundaries, via APIs. But what exactly are they?

Chris Adams, an information technology specialist at the Library of Congress, explained that an API can be considered a protocol, or a set of rules governing the format of data exchanged between applications. Rules of engagement, so to speak. This allows any application in the exchange to be modified (for, say, an application redesign) as long as it continues to follow the same rules for data exchange.

World Digital Library, Library of Congress.

Adams created the APIs for the World Digital Library, an international project among approximately 190 libraries, archives and museums. The World Digital Library’s API documentation describes what to expect from the API and explains how to build tools to access the WDL’s collections. Adams said, “The APIs declare that we publish all of our data in a certain format, at a certain location and ‘here’s how you can interact with it.’ ”

Adams also said that an institution’s digital-collections systems can and should evolve over time, but their APIs should remain stable in order to provide reliable access points to the underlying data.
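To make that idea concrete, here is a minimal sketch in Python. It is not code from the World Digital Library or any other real system; the class names, the `get_item` function and the sample record are all hypothetical. It only illustrates how the storage behind an API can be redesigned while the data format the API promises stays the same.

```python
import json


class DictBackend:
    """Hypothetical first version: records kept in a plain dictionary."""

    def __init__(self):
        self._items = {"item-1": ("Sample map", "en")}

    def lookup(self, item_id):
        title, language = self._items[item_id]
        return {"id": item_id, "title": title, "language": language}


class RowBackend:
    """Hypothetical redesign: the same record stored as database-style rows."""

    def __init__(self):
        self._rows = [{"pk": "item-1", "name": "Sample map", "lang": "en"}]

    def lookup(self, item_id):
        for row in self._rows:
            if row["pk"] == item_id:
                # Translate internal column names into the published format.
                return {"id": row["pk"], "title": row["name"], "language": row["lang"]}
        raise KeyError(item_id)


def get_item(backend, item_id):
    """The 'API' in this sketch: always returns JSON with id, title, language."""
    return json.dumps(backend.lookup(item_id), sort_keys=True)


if __name__ == "__main__":
    # Both backends answer with identical JSON, so a client cannot tell
    # that the system behind the API was rebuilt.
    print(get_item(DictBackend(), "item-1"))
    print(get_item(RowBackend(), "item-1"))
```

A client written against `get_item` keeps working after the switch from one backend to the other, which is the kind of stability Adams describes.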

HathiTrust Digital Library. Hathitrust.org.

So, for us consumers, the experience of booking a flight or buying a book online just seems like the way things ought to be. And libraries, museums, government agencies and other institutions are coming around to “the way things ought to be” and beginning to implement APIs to share their digital collections in ways that consumers have come to expect.

Another example of API implementation, similar to the WDL’s, is how HathiTrust accesses shared collections. Tom Burton-West, information retrieval programmer at the University of Michigan Library, said, “HathiTrust searches an index of the HathiTrust repository, which contains approximately 13.9 million digitized works contributed by HathiTrust members.” The Library of Congress is among them. The search results include a few million items, which you can filter by Media, Language, Country and a variety of other filters.

Ultimately it may not matter to you which institutions you got your items from; what matters is that you got an abundance of good results for your search. To many online researchers, it’s the stuff that matters, not so much which institution hosts the collection.

That doesn’t mean that the online collaboration of cultural institutions might diminish the eminence of any individual institution. Each object in the search results — of HathiTrust, WDL and similar resources — is clearly tagged with metadata and information about where the original material object resides, and so the importance of each institution’s collections becomes more widely publicized. APIs help cultural institutions increase their value — and their web traffic — by exposing more of their collections and sharing more of their content with the world.

The increasing use of APIs does not mean that institutions that want them are required to write the code themselves. David Brunton, a supervisory IT specialist at the Library of Congress, said that most people are using time-tested APIs instead of writing their own, and, as a result, standardized APIs are emerging. Brunton said, “Other people have already written the code, so it’s less work to reuse it. And most people don’t have infinite programming resources to throw at something.”

Example 1. Adding the Library of Congress search engine to Firefox.

Brunton cites OpenSearch as an example of a widely used, standardized API. OpenSearch helps search engines and clients communicate, by means of a common set of formats, to perform search requests and publish results for syndication and aggregation. He gave an example of how to view it in action by adding a Library of Congress search engine to the Firefox browser.

“In Firefox, go to www.loc.gov and look in the little search box at the top of the browser,” Brunton said. “A green plus sign (+) pops up next to ‘Search.’ If you click on the little green Plus sign, one of the things you see in the menu is ‘Add the Library of Congress search.’ [Example 1.] When you click on that, the Library’s search engine gets added into your browser and you can search the Library’s site from a non-Library page.”
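Behind that menu item is an OpenSearch description document, a small XML file in which a site publishes a URL template for its search. The Python sketch below shows roughly how a client could turn such a document into a search link. The sample document, the `library.example.org` address and the `build_search_url` function are made up for illustration and are not the Library’s actual description file; only the namespace and the `{searchTerms}` placeholder come from the OpenSearch 1.1 format.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote_plus

# Namespace used by OpenSearch 1.1 description documents.
OPENSEARCH_NS = {"os": "http://a9.com/-/spec/opensearch/1.1/"}

# Hypothetical description document for an imaginary library site.
SAMPLE_DESCRIPTION = """<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>Example Library</ShortName>
  <Description>Search the Example Library collections</Description>
  <Url type="text/html" template="https://library.example.org/search?q={searchTerms}"/>
</OpenSearchDescription>
"""


def build_search_url(description_xml, terms, url_type="text/html"):
    """Fill the {searchTerms} placeholder of the first matching Url template.

    This sketch ignores the optional template parameters (such as paging)
    that the OpenSearch format also allows.
    """
    root = ET.fromstring(description_xml)
    for url in root.findall("os:Url", OPENSEARCH_NS):
        if url.get("type") == url_type:
            return url.get("template").replace("{searchTerms}", quote_plus(terms))
    raise ValueError("no Url element of type " + url_type)


if __name__ == "__main__":
    print(build_search_url(SAMPLE_DESCRIPTION, "Civil War"))
    # prints https://library.example.org/search?q=Civil+War
```

Because every site that follows the format publishes the same kind of template, the same few lines can work against any of them, which is the reuse Brunton is pointing to.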

As institutions open up more and more of their online digital collections, Chris Adams sees great potential in using another API, the International Image Interoperability Framework (IIIF), as a research tool. IIIF enables users to, among other things, compare and annotate digital objects from participating institutions side by side, without the need for each institution to run the same applications or specifically enable each tool used to view the items. Adams points to the Mirador image viewer as an example of how it works. Here is a demonstration:

  1. Go to http://iiif.github.io/mirador/ and, at the top right of the page, click “Demo.” The subsequent page, once it loads, should display two graphics side by side – “Self-Portrait Dedicated to Paul Gauguin” in the left window and “Buddhist Triad: Amitabha Buddha Seated” in the right window. [Example 2.]

    Example 2. Mirador image viewer demo.

  2. Click on the thumbnails at the bottom of each window to change the graphic in the main windows.
  3. In the left window, select the grid symbol in the upper left corner and, in the drop-down menu, select “New Object.” [Example 3.]

    Example 3. Select New Object.

  4. The subsequent page should display thumbnails of sample objects from different collections at Harvard, Yale, Stanford, BnF, the National Library of Wales and e-codices. [Example 4.]

    Example 4. Thumbnails from collections.

  5. Double-click a new object and it will appear in the left image viewer window.
  6. Repeat the process for the right viewer window.

To see how it could work with the WDL collections:

  1. Go to http://iiif.github.io/mirador/ and click “Demo” at the top right of the page. The subsequent page will display the page with the two graphics.
  2. Open a separate browser window or tab.
  3. Open “The Sanmai-bashi Bridges in Ueno.”
  4. Scroll to the bottom of the page and copy the link displayed under “IIIF Manifest.” The link URL is http://www.wdl.org/en/item/11849/manifest
  5. Go back to the Mirador graphics page and, in the left window, select the grid symbol and, in the drop-down menu, select “New Object.”
  6. In the subsequent page, in the field that says “Add new object from URL…” paste the IIIF Manifest URL. [Example 5.]

    Example 5. “Add new object from URL…”

  7. Click “enter/return” on your computer keyboard. “The Sanmai-bashi Bridges in Ueno” should appear at the top of the list of collections. Double-click one of the three thumbnails to add it to the left graphics viewer window.
  8. For the right window in the graphics viewer page use another sample from WDL, “The Old People Mill,” and copy its IIIF Manifest URL from the bottom of the page (http://www.wdl.org/en/item/11628/manifest).
  9. Return to the graphics viewer page and, in the right window, select the grid symbol and, in the drop-down menu, select “New Object.”
  10. In the subsequent page, in the field that says “Add new object from URL…,” paste the IIIF Manifest URL and click the “enter/return” key. “The Old People Mill” should appear at the top of the list of collections. Double-click to add it to the right graphics viewer window.

This process can be repeated using any tool which supports IIIF, such as the Universal Viewer, and new tools can be built by anyone without needing to learn a separate convention for each of the many digital libraries in the world which support IIIF.
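As a rough illustration of what such a tool does with a manifest, the Python sketch below walks a manifest and lists each page’s label and image address. It assumes the layout of the IIIF Presentation API 2.x (sequences containing canvases containing images); the sample manifest, its `example.org` addresses and the `list_canvas_images` function are invented for this sketch and are not taken from the WDL. A real tool would first download the manifest from a URL like the ones copied in the steps above.

```python
# Hypothetical, heavily trimmed manifest in the IIIF Presentation API 2.x layout.
SAMPLE_MANIFEST = {
    "@context": "http://iiif.io/api/presentation/2/context.json",
    "@id": "https://library.example.org/item-1/manifest",
    "@type": "sc:Manifest",
    "label": "Sample item",
    "sequences": [
        {
            "@type": "sc:Sequence",
            "canvases": [
                {
                    "@id": "https://library.example.org/item-1/canvas/1",
                    "@type": "sc:Canvas",
                    "label": "Page 1",
                    "images": [
                        {
                            "@type": "oa:Annotation",
                            "resource": {
                                "@id": "https://images.example.org/item-1-p1.jpg",
                                "@type": "dctypes:Image",
                            },
                        }
                    ],
                },
                {
                    "@id": "https://library.example.org/item-1/canvas/2",
                    "@type": "sc:Canvas",
                    "label": "Page 2",
                    "images": [
                        {
                            "@type": "oa:Annotation",
                            "resource": {
                                "@id": "https://images.example.org/item-1-p2.jpg",
                                "@type": "dctypes:Image",
                            },
                        }
                    ],
                },
            ],
        }
    ],
}


def list_canvas_images(manifest):
    """Return (canvas label, image URL) pairs in reading order."""
    pairs = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                pairs.append((canvas.get("label"), image["resource"]["@id"]))
    return pairs


if __name__ == "__main__":
    for label, url in list_canvas_images(SAMPLE_MANIFEST):
        print(label, url)
```

The function never needs to know which institution published the manifest. Any manifest that follows the same layout can be read the same way, which is why one viewer can show objects from many libraries side by side.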

Adams said that implementing an API encourages good software design and data management practices. “The process of developing an API can encourage you to better design your own site,” Adams said. “It forces you to think about how you would split responsibilities.” As programmers rush to meet deadlines, they often face the temptation of solving a problem in the simplest way possible at the expense of future flexibility; an API provides a natural point to reconsider those decisions. This encourages code which is easier to develop and test, and makes it cheaper to expand server capacity as the collections grow and user traffic increases.

Meanwhile, the APIs themselves should remain unchanged, clarifying expectations on both sides, essentially declaring, “I will do this. You must do that. And then it will work.”

APIs enable a website like the HathiTrust, Digital Public Library of America or Europeana to display a vast collection of digital objects without having to host them all. APIs enable a website like Chronicling America or the World Digital Library to open up its collections to automated access by anyone. In short, APIs enable digital collections to become part of a collective, networked system where they can be enjoyed — and used — by a vast international audience of patrons.

“Offering an API allows other people to reuse your content in ways that you didn’t anticipate or couldn’t afford to do yourself,” said Adams. “That’s what I would like for the library world, those things that let other people re-use your data in ways you didn’t even think about.”

One Comment

  1. Tom Burton-West
    January 8, 2016 at 5:32 pm


    I work on HathiTrust search for the University of Michigan Library.

    Regarding the statement:

    “HathiTrust uses APIs among shared collections….a search of HathiTrust for the term “Civil War” queries the collections of all of their 110 or so consortium partners”

    This is not quite accurate. When you search HathiTrust, it is not sending out a search to other libraries’ collections using APIs. It is searching an index of the HathiTrust repository, which contains approximately 13.9 million digitized works contributed by HathiTrust members.

    More details are here:

    Tom Burton-West
    Information Retrieval Programmer
    Digital Library Production Service
    University of Michigan Library
