More Open eBooks: Routinizing Open Access eBook Workflows

This is a guest post by Kristy Darby, a Digital Collections Specialist in the Digital Content Management Section in Library Services.

Figure 1. Youjeong Oh’s Pop City: Korean Popular Culture and the Selling of Place, one of the open access books available from the Library of Congress collections.

We are excited to share that anyone, anywhere can now access a growing online collection of contemporary open access eBooks from the Library of Congress website. For example, you can now directly access books such as Cory Doctorow’s Little Brother, Yochai Benkler’s The Wealth of Networks, and Youjeong Oh’s Pop City: Korean Popular Culture and the Selling of Place. All of these books have been made broadly available online in keeping with the intent of their creators and publishers, who chose to publish these works under open access licenses.

A key objective of the Library of Congress digital collecting plan is the development and implementation of an acquisitions program for openly available content. We have previously discussed a number of open access book projects, including open access Latin American books and open access children’s books. Significantly, the Library of Congress has long been receiving print copies of open access books through multiple routine acquisition streams. These openly licensed works can be made much more broadly accessible in their digital form.

These books are the result of a pilot effort of the Digital Content Management Section (DCM). DCM staff, in collaboration with the Collection Development Office (CDO), identified books available through the Directory of Open Access Books (DOAB) of which the Library already holds a copy in print. DOAB is a digital directory that provides access to academic peer-reviewed books available under open access licenses.

While all the books in DOAB could potentially be considered for addition to the Library of Congress collections, all books added to the collection go through a selection process whereby subject matter experts determine which works are in scope based on the collection policy statements. By identifying matches in DOAB to print holdings of the Library of Congress, we could identify a set of works for which a selection decision had already been made.

Analyzing DOAB Data

Identifying books to include in this pilot project required some data crunching. DOAB provides metadata about all eBooks available through its service, so staff compared ISBNs from DOAB books against ISBNs for books in the Library’s catalog. The comparison produced a list of books that the Library holds in print and that are also included in DOAB, which became the working list for the pilot. DCM staff carefully inspected each book to ensure that the creator had licensed it under an open access license, such as Creative Commons.
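The ISBN comparison described above can be sketched roughly as follows. This is a minimal illustration, not the script DCM staff used: the function names, the row layout, and the sample titles and record IDs are all hypothetical, and it assumes ISBN-10s are converted to ISBN-13 so that both forms of the same number compare as equal.

```python
import re
from typing import Optional


def to_isbn13(raw: str) -> Optional[str]:
    """Normalize an ISBN-10 or ISBN-13 string to 13 bare digits, or None."""
    # Take the leading run of ISBN characters, ignoring trailing qualifiers
    # such as "(pbk.)" that often follow the number in catalog data.
    found = re.match(r"[0-9Xx-]+", raw.strip())
    if not found:
        return None
    chars = found.group(0).replace("-", "").upper()
    if len(chars) == 13 and chars.isdigit():
        return chars
    if len(chars) == 10 and chars[:9].isdigit():
        # Drop the ISBN-10 check character, prefix 978, recompute the check digit.
        core = "978" + chars[:9]
        total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
        return core + str((10 - total % 10) % 10)
    return None


def match_isbns(doab_rows, catalog_rows):
    """Return (doab_title, catalog_record_id) pairs that share an ISBN.

    doab_rows:    iterable of (title, isbn) tuples      (assumed layout)
    catalog_rows: iterable of (record_id, isbn) tuples  (assumed layout)
    """
    catalog_index = {}
    for record_id, isbn in catalog_rows:
        key = to_isbn13(isbn)
        if key:
            catalog_index.setdefault(key, record_id)
    matches = []
    for title, isbn in doab_rows:
        key = to_isbn13(isbn)
        if key in catalog_index:
            matches.append((title, catalog_index[key]))
    return matches


# Made-up sample data for illustration only.
doab = [("Example Title A", "978-0-306-40615-7"), ("Example Title B", "9780000000002")]
catalog = [("rec-001", "0306406152 (pbk.)"), ("rec-002", "9781111111113")]
print(match_isbns(doab, catalog))
```

A list of matches like this would only be a starting point. As noted above, each candidate still needs a manual check of its license.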

Processing Open Access eBooks

Figure 2. The ILS record for Pop City : Korean Popular Culture and the Selling of Place which now includes a direct link to the copy of the work in the Library’s digital collections.

DCM staff established strategies for taking each book from the very beginning (identifying titles to process) to the end, which is full and open access on the Library’s website. Because the Library already holds the print books as part of its collection, the Library’s catalog includes a MARC bibliographic record for each book. DCM staff, with the help of staff from the Integrated Library Systems Program Office, developed a method of cloning and then transforming the metadata to create new records for the eBook counterparts, making them discoverable in the catalog.

These new eBook records state the terms of the open license under which each work is provided in the MARC 540 field. Staff made necessary changes to the records to ensure that the books and the accompanying metadata would display correctly on the Library’s website. DCM staff downloaded the eBook files from DOAB and processed the files for presentation on the website as well as for long-term preservation. The DOAB eBooks were made available to the public once processing was complete and the content and metadata were live on the website.
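As a rough illustration of the clone-and-transform idea, the sketch below copies a print record and adds a license note and a link. It is not the Library’s actual method: the dictionary layout is a simplified stand-in for a MARC record, the function name and sample values are hypothetical, and the use of 540 $a/$u for the license and 856 $u for the link is an assumption about how such a record might be built.

```python
import copy


def make_ebook_record(print_record, license_text, license_url, ebook_url):
    """Clone a print record and add license and link fields for the eBook.

    The record is a plain dict with a "fields" list; each field has a MARC
    tag and a dict of subfields. This is a simplified, hypothetical layout.
    """
    ebook = copy.deepcopy(print_record)  # leave the print record untouched
    # 540: terms governing use ($a = terms, $u = URI of the license).
    ebook["fields"].append(
        {"tag": "540", "subfields": {"a": license_text, "u": license_url}}
    )
    # 856: electronic location ($u = link to the digital copy).
    ebook["fields"].append({"tag": "856", "subfields": {"u": ebook_url}})
    # Keep fields in tag order; the sort is stable, so repeats keep their order.
    ebook["fields"].sort(key=lambda field: field["tag"])
    return ebook


# Made-up sample record and placeholder URLs for illustration only.
print_record = {
    "fields": [
        {"tag": "020", "subfields": {"a": "9780306406157"}},
        {"tag": "245", "subfields": {"a": "Example Title A"}},
    ]
}
ebook_record = make_ebook_record(
    print_record,
    "Creative Commons license (example wording)",
    "https://creativecommons.org/licenses/by-nc-nd/4.0/",
    "https://example.org/ebook/example-title-a",
)
print([field["tag"] for field in ebook_record["fields"]])
```

Working on a deep copy means the print record is never modified, which matches the idea of creating a separate record for the eBook rather than editing the existing one.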

Expanding Access and Enhancing Resilience in the Commons

Figure 3. A view of some of the book covers for open access titles now available through the Library of Congress digital collections.

The books added to the collection through the DOAB pilot are digital versions of print books already held by the Library. The print books are only available to researchers who visit one of the reading rooms at the Library of Congress in Washington, DC and sign up for a reader card. The eBooks are openly available on the Library’s website without any restrictions. There is no travel, registration, or authentication necessary.

These eBooks are available to anyone in the world with an internet connection. Also, by collecting the eBooks in addition to the print books, the Library commits to preserving the digital content and providing lasting access to it. While it would be possible to simply link to copies of these books hosted elsewhere, as many libraries do, the Library of Congress is invested in the long-term preservation of content added to its collection. By acquiring the digital files for these works, the Library is helping to support enduring access to them for communities around the world.

To explore how these workflows and processes would scale, DCM staff processed and provided access to over three hundred OA eBooks recorded in DOAB over a three-month period. The workflows created, codified, and documented during the pilot project will now be used to support routine OA eBook processing, not only for DOAB but for any OA eBook projects in which DCM is involved.

We are excited to continue to refine and improve this process. You can find books like these alongside other open access books in the Library’s digital collections.