Yes, The Library of Congress Develops Lots of Software Tools

When I meet new people and tell them that I work at the Library of Congress, the first question is usually “What do you do there?”  When I explain that part of my job is to oversee a group that develops software, the reaction is often one of surprise: “What? Libraries develop software?”

Yes.  Yes they do.

There are a number of groups at the Library of Congress that develop and customize software: the ILS Office, the U.S. Copyright Office, the Network Development and MARC Standards Office and the central Library IT Systems department.  I have the distinct pleasure of working with the Repository Development Center (RDC) at the Library, which is responsible for the development of a number of web applications and tools that support the acquisition, management, preservation and delivery of digital collections.  The RDC is busy on many fronts.

Front page of the August 8, 1911 Washington Times from Chronicling America, Library of Congress collection

Chronicling America is the website that manages and delivers historic newspaper content for the National Endowment for the Humanities-supported National Digital Newspaper Program.  The RDC created and maintains a robust web application, which handles the ingest and indexing of the newspaper page images, optical character recognition results and metadata.  The application provides search functionality, including machine access through various web services.  Of special note is the availability of the entire newspaper data set as linked open data.  We are also responsible for the Digital Viewer and Validator, which is made available to the partner organizations to ensure that files meet the standards set for the program.  In 2011, the program released the LC Newspaper Viewer, the Web-based delivery application that powers the Chronicling America site, as open source software.

The RDC team is working on a suite of solutions, known as Content Transfer Services, that focuses on digital content life cycle actions that are undertaken primarily at the bit level, including transferring, moving and inventorying files, as well as verifying that files have not changed over time.  To this end we are working on the Library’s Inventory System to record life cycle events and support auditing; the BagIt Specification for the packaging of content; the Bagger desktop application and BIL Java Library to support the use of BagIt; and workflow tools that leverage both to ensure repeatable, documented, audited processes.  The team has developed these tools for use at the Library, but many have now been released as open source.
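To make "verifying that files have not changed over time" concrete, here is a minimal sketch in Python of a fixity check against a BagIt-style manifest. It is an illustration only, not the Library's Bagger or BIL code. It assumes a bag directory containing a `manifest-sha256.txt` file in which each line holds a checksum followed by a file path relative to the bag; the function names `file_digest` and `verify_manifest` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(bag_dir, algorithm="sha256"):
    """Compare each manifest entry against the file on disk.

    Hypothetical helper: returns a dict mapping each problem path to
    "missing" or "changed". An empty dict means every listed file matched.
    """
    bag_dir = Path(bag_dir)
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    problems = {}
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Each entry is "<checksum> <relative path>", separated by whitespace.
        expected, rel_path = line.strip().split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file():
            problems[rel_path] = "missing"
        elif file_digest(target, algorithm) != expected.lower():
            problems[rel_path] = "changed"
    return problems
```

Calling `verify_manifest("path/to/bag")` on a schedule and recording the result is one simple way to turn a fixity check into an auditable event. A full BagIt validator does more than this sketch, such as checking the bag declaration and tag files.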

Japanese Ukiyo-e print from the World Digital Library, Kuniyoshi Utagawa, 1850, Library of Congress collection

The World Digital Library is a joint project of the Library and the United Nations Educational, Scientific and Cultural Organization.  WDL makes it possible to discover, study and enjoy cultural treasures from around the world on one site, in a variety of ways. Information may easily be browsed by place, time, topic, type of item and contributing institution, or can be located by an open-ended search. Navigation tools and content descriptions are provided in Arabic, Chinese, English, French, Portuguese, Russian and Spanish. Many more languages are represented in the actual books, manuscripts, maps, photographs and other primary materials, which are provided in their original languages.  The RDC team created the web application, produced a cataloging application to manage the collation and enrichment of metadata from partners, and developed a variety of tools for content production workflows.

Another team new to the RDC focuses on support for the web archiving activities at the Library.  This team has led the development of the DigiBoard, a tool that allows nominators to select websites to be archived and streamlines permissions tracking, quality review processes and reporting for the web archives.  The team members are also active participants in the Preservation Working Group of the International Internet Preservation Consortium.

The RDC is also developing tools for the acquisition and processing of content in connection with the Library’s acquisition of the Twitter archive.  We also work with the U.S. Copyright Office and our colleagues in Library Services on developing the tools needed for processing electronic journals added to the Library’s collection through electronic deposit.  Other activities include developing tools to provide a Digital Conversion Automation Framework for the quality review of newly digitized Library collections and records, as well as collaborating with colleagues at other institutions on developing a RESTful Bag Server Specification.  And we enthusiastically participate in numerous other initiatives, including JHOVE2, the development of World Wide Web Consortium semantic web standards and the Federal Agencies Digitization Guidelines Initiative.

The entire team never avoids a challenge and often seeks one out.  It is a point of pride for us all to have a role in both helping to preserve digital collections and making them accessible.

 

Comments (3)

  1. I have an automotive history website, and it’s a real struggle trying to catalog and present materials and collections of objects on the web. I was curious about the software used for the main page on the LOC. Is it custom made for the Library, or customized from an existing platform? Is this system available to the general public for installation on private webservers? I do have full permissions on my own webserver.

  2. To research inventions and access to inventions.

  3. I am looking for a repository of computer program source code that is in the public domain
    (not open source code, or free source code like Linux and its many spinoffs).
    Public domain source code is unrestricted programming code that is no longer copyright protected
    or copyleft protected, as in Linux.
