
Information or Artifact: Digitizing a Book, Part 2


The following is a guest post by Carl Fleischhauer, a Digital Initiatives Project Manager in NDIIPP.

Yesterday, I blogged about the digital reformatting of historical books and other documents.  I reported that virtually all digitization projects in memory institutions present the information from the pages in the form of a searchable text.  I also noted the variation in the types of images that are typically employed to represent the book as an artifact or, at minimum, to compensate for the inaccuracies in the automated OCR transcriptions of the text by presenting a facsimile image of each page.

Although it is not typical of memory institution reformatting programs, a special form of scanning is worth spotlighting: perhaps the maximal expression of artifactual imaging, it documents the physical structure of the inks, the paper or parchment, and other aspects of the original.  This form of imaging is intended to support scientific study and the careful work of object conservation.  As an example, the Library of Congress Preservation Directorate has carried out a “hyperspectral” look at the historic Waldseemueller map.  (See also the team’s discussion of the technology.)

Another example of scientific imaging is from the Archimedes Palimpsest project, an examination of a medieval manuscript on parchment on deposit at The Walters Art Museum in Baltimore.  Unlike paper, parchment is sufficiently durable that you can take a knife, scrape off the text, and then overwrite it with a new text.  The pages in the Archimedes Palimpsest came from five older books that medieval scribes had taken apart, scraped, and reinscribed to rebind as a prayerbook.  The science team at the Walters used multispectral imaging to see the hidden writing (and diagrams) under the new text.

Figure 1. At left is a normal-color scan of a segment of two pages in the Archimedes Palimpsest, showing the most recent text (a prayerbook) more or less as the eye would see it. The gutter from the binding runs across the middle of the image. At right is the image that resulted from multispectral imaging and post-processing. Twelve images were created for each page, each exposed to a different wavelength of light, from ultraviolet through infrared. This image set was post-processed in a variety of ways, including one that "peel(ed) off the prayerbook text completely and reveal(ed) the undertext alone." The drawings are important "because the Archimedes Palimpsest is a unique source for the diagrams that Archimedes himself drew in the sand, in Syracuse, in the third century BC." From http://www.archimedespalimpsest.org/imaging_imageprocessing1.html; The Imaging of the Archimedes Palimpsest: Image Processing.
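The caption describes a stack of twelve page images, one per wavelength, that was then post-processed to pull the undertext apart from the prayerbook text. The Walters team's actual processing is not described here, so the sketch below swaps in one generic, widely used technique for decorrelating a multispectral stack: principal component analysis (PCA) across the bands. The function name and the synthetic two-layer "page" are hypothetical, invented for illustration; the point is only that when two inks respond differently across wavelengths, a few components capture nearly all of the variation and can be viewed as separate images.

```python
import numpy as np


def principal_component_images(stack: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Project a (bands, height, width) stack onto its principal components.

    Hypothetical helper. Returns (components, explained_variance_ratio);
    components has the same shape as the input, and components[0] is the
    image carrying the most variance across all bands.
    """
    bands, height, width = stack.shape
    # Treat each pixel as one observation with `bands` features.
    pixels = stack.reshape(bands, -1).T.astype(np.float64)
    pixels -= pixels.mean(axis=0)
    # Eigen-decomposition of the band-by-band covariance matrix.
    cov = np.cov(pixels, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]  # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    projected = pixels @ eigvecs  # shape (pixels, bands)
    components = projected.T.reshape(bands, height, width)
    ratio = eigvals / eigvals.sum()
    return components, ratio


# Synthetic, hypothetical page: two ink layers whose visibility varies by band.
rng = np.random.default_rng(0)
h, w, n_bands = 64, 64, 12
overtext = (rng.random((h, w)) < 0.2).astype(float)
undertext = (rng.random((h, w)) < 0.1).astype(float)
over_weight = np.linspace(1.0, 0.8, n_bands)   # assumed band response of overtext
under_weight = np.linspace(0.1, 0.9, n_bands)  # assumed band response of undertext
stack = (over_weight[:, None, None] * overtext
         + under_weight[:, None, None] * undertext
         + rng.normal(0.0, 0.01, (n_bands, h, w)))  # small sensor noise

components, ratio = principal_component_images(stack)
print(components.shape)
print(ratio[:2].sum())
```

With two independent layers and little noise, the first two components account for almost all of the variance, and the remaining ten are essentially noise. PCA only decorrelates the bands; it does not guarantee that any single component shows the undertext alone, which is why real projects combine several processing methods.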

Meanwhile, at the informational end of the spectrum, we have a multiyear project to scan the Copyright Office card catalog, where the words on the card are of paramount importance.  The 45 million cards in this catalog provide an index to copyright registrations and transfers of ownership in the United States from 1870 to 1977, offering a window into the literary, musical, artistic, and scientific production of the United States and foreign countries.  These cards are an important supplement to the Library’s main catalog because only some of the works deposited for copyright are selected for inclusion in the Library’s collections and we do not always fully catalog the works we select.

Figure 2.  Scanned image of a copyright catalog card that records the 1974 assignment of a copyright to a set of musical selections from one party to another.

The copyright card scanning project started a little over a year ago, and it has produced relatively high-quality uncompressed master files.  The planners are certain that very good OCR results can be obtained from these images.  Meanwhile, some in the planning group believe that equally good results could be obtained from a lossy compressed variant of the JPEG 2000 format, and there has been some informal discussion of producing future master files in that format.
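One way to test the planning group's question, sketched here as an assumption rather than the Library's procedure, is to OCR the same cards from both the uncompressed master and a lossy JPEG 2000 derivative, then score each transcription against a hand-keyed reference using character error rate (CER): the edit distance between the two strings divided by the length of the reference. The OCR step itself is out of scope, and the function names and sample strings below are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the length of the reference transcription."""
    if not reference:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical transcriptions of one card: hand-keyed, OCR of the uncompressed
# master, and OCR of a lossy JPEG 2000 derivative ("m" misread as "rn").
ground_truth = "Assignment of copyright, 1974"
ocr_master = "Assignment of copyright, 1974"
ocr_jp2 = "Assignrnent of copyright, 1974"

for label, text in [("uncompressed", ocr_master), ("JPEG 2000", ocr_jp2)]:
    print(label, round(character_error_rate(ground_truth, text), 3))
```

Run over a representative sample of cards rather than a single one, a comparison like this would show whether the lossy derivative's error rate stays close enough to the master's for the project's purposes.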

Why do we care about these varying imaging specifications?  They are all on the docket for the Still Image Working Group in the Federal Agencies Digitization Guidelines Initiative, in which the Library is a key player.  The Working Group continues to refine its guidelines for still-image scanning, recognizing that recommended imaging specifications will vary according to a given project’s objectives.  For books (and other textual materials), every project seeks to get informational content.  But as reported above, there is considerable variation in the degree of importance assigned to artifactual values, which accounts for the wide range of image types: from something that looks like a Xerox copy to something that looks like an art museum poster to scientific representations of a page’s microstructure.

The Working Group guidelines will also cover pictorial materials, notably photographs.  The group’s consideration of the imaging of photographic negatives and transparencies distinguishes between informational and artifactual in a slightly different way.  That will be the subject of my next blog post.

Comments (2)

  1. Carl, thank you for these wonderful overviews.

  2. Fascinating about the palimpsest, thanks for the article! Great image of the two scans side-by-side.
