Where are we now? — Project accomplishments so far

A detailed analysis of the Copyright records has been completed, and test scanning has been done to determine the best digitization parameters for the several formats of the records. For optimal preservation, the records will be scanned in uncompressed tagged image file format (TIFF) at a minimum of 300 pixels per inch (ppi) in 24-bit color. For routine access to the digitized records, derivative files will be created in high-quality JPEG and JPEG 2000 formats at 50:1 compression. Production scanning has begun on three sets of records.
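To give a rough sense of what these parameters mean for file sizes, here is a minimal sketch in Python. It assumes a 3 x 5 inch card purely for illustration (the actual dimensions of the records vary and are not stated here), ignores TIFF header overhead, and treats the 50:1 ratio as relative to the uncompressed master. The function names are hypothetical.

```python
def uncompressed_bytes(width_in, height_in, ppi=300, bits_per_pixel=24):
    """Approximate pixel-data size of an uncompressed scan, in bytes.

    Ignores file header and metadata overhead.
    """
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    return width_px * height_px * bits_per_pixel // 8


def derivative_bytes(master_bytes, ratio=50):
    """Approximate size of an access derivative at a given compression ratio."""
    return master_bytes / ratio


if __name__ == "__main__":
    # Assumed card size: 3 x 5 inches (illustrative only).
    master = uncompressed_bytes(3, 5)
    access = derivative_bytes(master)
    print(f"Master:     {master / 1_000_000:.2f} MB")
    print(f"Derivative: {access / 1_000:.0f} KB")
```

Under those assumptions, one card image comes to roughly 4 MB as a preservation master and roughly 80 KB as an access derivative. Larger originals, such as bound volume pages, would scale in proportion to their area.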

First, the 2.5 million catalog cards that constitute the indexes to assignments and transfers of copyrights from 1870 to 1977 have been digitized, and the images have been placed in archival storage in our data center and at a backup site. Four Metadata Specialists have been working part-time to capture index terms from the assignment and transfer card images in local databases, as a prototype for capturing the data needed to create a publicly available online index to the digitized records. Data has been captured from more than 126,000 images.

Second, digitization of the 7.7 million registration catalog cards from 1971 to 1977 has been completed, and over 2.4 million cards from the 1955 to 1970 period have been digitized so far. Card scanning is continuing.

Third, the 660 bound volumes of the Catalogs of Copyright Entries are being scanned at the Internet Archive center in the Library of Congress Adams Building. This is the same center and process being used to scan works from the Library’s collections. So far, 417 volumes have been digitized, including registrations and renewals from 1936 to 1977. They are available at http://www.archive.org/details/copyrightrecords/ with a limited search capability based on the results of optical character recognition (OCR) of the scanned text. This work will require another year to complete.

The focus now is on how to index the records and make them widely available via the web. This is the central purpose of this blog. I and others will periodically post information about what we’re working on and seek input from you about how you think we should proceed. Your input is very important to us so that we can build the very best system to meet your needs for copyright information. If you have an interest in copyrights and in seeing the records made more accessible, then please follow our posts and provide us with your feedback and comments.

5 Comments

  1. Philip Merrill
    December 21, 2011 at 5:33 pm

    This is an exciting and incredibly worthwhile project. Kudos to all of you working on it!

  2. Carlos Leyva
    December 22, 2011 at 8:53 am

    Agree with Philip, this is a high-value add to the copyright community and an example of our tax dollars well spent. Great job!

  3. John Mark Ockerbloom
    January 8, 2012 at 3:32 pm

    I’m very glad to see this, and I’m now in the process of indexing these volumes (including statistics and links directly to renewal sections) on my Catalog of Copyright Records site at

    http://onlinebooks.library.upenn.edu/cce/

    I’m hoping that you’ll be putting up additional volumes as well, at least as far back as 1923 (the earliest copyrights that might still be in force).

    I’m going through the volumes now, and did find one with some pages missing. Specifically, the copy you have on the Internet Archive of the 1974 Commercial prints and labels volume skips from the front cover pages to page 7. I now link to a HathiTrust scan that has the missing pages; you might want to see about including or rescanning those pages in your copy.

    Many, many thanks for this valuable project!

  4. Mike Burke
    January 9, 2012 at 3:07 pm

    John,

    Thank you for pointing out the missing pages in the 1974 volume. Indeed the volume was missing those pages but our other copy has the pages and we will be rescanning them.

    We plan to continue scanning all of the CCEs back to their origin in 1891.

    Mike Burke

  5. Claudio Benincase
    July 25, 2012 at 1:39 pm

    Cheers discussing your notions for this web website.
