When I meet new people and tell them that I work at the Library of Congress, the first question is usually “What do you do there?” When I explain that part of my job is to oversee a group that develops software, the reaction is often one of surprise: “What? Libraries develop software?”
Yes. Yes they do.
There are a number of groups at the Library of Congress that develop and customize software: the ILS Office, the U.S. Copyright Office, the Network Development and MARC Standards Office and the central Library IT Systems department. I have the distinct pleasure of working with the Repository Development Center (RDC) at the Library, which is responsible for the development of a number of web applications and tools that support the acquisition, management, preservation and delivery of digital collections. The RDC is busy on many fronts.
Chronicling America is the website that manages and delivers historic newspaper content for the National Endowment for the Humanities-supported National Digital Newspaper Program. The RDC created and maintains a robust web application, which handles the ingest and indexing of the newspaper page images, optical character recognition results and metadata. The application provides search functionality, including machine access through various web services. Of special note is the availability of the entire newspaper data set as linked open data. We are also responsible for the Digital Viewer and Validator, which is made available to the partner organizations to ensure that files meet the standards set for the program. In 2011, the program released the LC Newspaper Viewer, the web-based delivery application that powers the Chronicling America site, as open source software.
The RDC team is working on a suite of solutions, known as Content Transfer Services, that focuses on digital content life cycle actions that are undertaken primarily at the bit level, including transferring, moving and inventorying files, as well as verifying that files have not changed over time. To this end we are working on the Library’s Inventory System to record life cycle events and support auditing; the BagIt Specification for the packaging of content; the Bagger desktop application and BIL Java Library to support the use of BagIt; and workflow tools that leverage both to ensure repeatable, documented, audited processes. The team has developed these tools for use at the Library, but many have now been released as open source.
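To make the idea of verifying that files have not changed over time a little more concrete, here is a minimal Python sketch of a checksum-based check. It is only an illustration and not the code behind the Inventory System, Bagger or BIL: the function names (`sha256_of`, `build_manifest`, `verify_manifest`) are hypothetical, the choice of SHA-256 is an assumption, and the in-memory manifest only loosely mirrors the checksum-and-path pairing that a BagIt manifest records.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Record a checksum for every file under root, keyed by relative path."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Compare the files under root with a stored manifest.

    Returns one message per file that is missing or whose checksum differs.
    Files added after the manifest was built are not reported.
    """
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file():
            problems.append(f"missing: {rel_path}")
        elif sha256_of(target) != expected:
            problems.append(f"changed: {rel_path}")
    return problems
```

In this sketch, a manifest built when content is received can be stored alongside it and passed to `verify_manifest` after each transfer or at regular intervals; an empty list means every recorded file is still present with the same checksum.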
The World Digital Library is a joint project of the Library and the United Nations Educational, Scientific and Cultural Organization. WDL makes it possible to discover, study and enjoy cultural treasures from around the world on one site, in a variety of ways. Information may easily be browsed by place, time, topic, type of item and contributing institution, or can be located by an open-ended search. Navigation tools and content descriptions are provided in Arabic, Chinese, English, French, Portuguese, Russian and Spanish. Many more languages are represented in the actual books, manuscripts, maps, photographs and other primary materials, which are provided in their original languages. The RDC team created the web application, produced a cataloging application to manage the collation and enrichment of metadata from partners and developed a variety of tools for content production workflows.
Another team new to the RDC focuses on support for the web archiving activities at the Library. This team has led the development of the DigiBoard, a tool that allows nominators to select websites to be archived and that streamlines permissions tracking, quality review processes and reporting for the web archives. The team members are also active participants in the Preservation Working Group of the International Internet Preservation Consortium.
The RDC also has the distinct pleasure of developing tools for the acquisition and processing of content in connection with the Library’s acquisition of the Twitter archive. We also work with the U.S. Copyright Office and our colleagues in Library Services on developing the tools needed for the processing of electronic journals added to the Library’s collection through electronic deposit. Other activities include developing tools to provide a Digital Conversion Automation Framework for the quality review of newly digitized Library collections and records, as well as collaborating with colleagues at other institutions on developing a RESTful Bag Server Specification. And we enthusiastically participate in numerous other initiatives, including JHOVE2, the development of World Wide Web Consortium semantic web standards and the Federal Agencies Digitization Guidelines Initiative.
The entire team never avoids a challenge and often seeks one out. It is a point of pride for us all to have a role in both helping to preserve digital collections and making them accessible.