Copyright Records: Short-term strategies for making them more accessible

The number of non-digital copyright records (70 million) and the constraints on funding make the digitization of copyright records a long-term project. But that doesn’t mean we can’t make some records available sooner rather than later. We’re looking at several strategies and are eager for your feedback and ideas.

First, we can demonstrate what’s possible, engage users, and make some records available online through a search and retrieval pilot. The pilot would use a small but complete subset of the records, indexed by multiple fields, with links to images of the original copyright records. One option is the set of records of transfers and assignments of copyrights. The 2.5 million catalog cards with indexes to approximately 350,000 documents recorded between 1870 and 1977 have already been digitized, and PDF copies of the documents also exist. Transfer and assignment records must be consulted to determine the complete ownership history of any copyright, so their availability online would complement the Catalog of Copyright Entries (CCE) from 1891 to 1977, which is being digitized onsite here at the Library and made available through the Internet Archive website. That effort is nearly two-thirds complete, and CCE records are now available online back to 1936. Another option is the set of records of prints and labels registered between 1922 and 1940. This set is much smaller, about 43,000 registrations, and could be done sooner, but it may be too confined to serve as a model for the 16 million records referring to many other types of copyrightable material. A third option is the set of registration records from 1971 to 1977. This is a much larger set of 7.7 million catalog cards with indexes to 2.8 million registrations and would require considerably more time to complete. We would like your comments on which of these three options would be most useful to you.

Second, as an interim measure while full record indexing is underway, we are considering making the catalog card images available online through a virtual card catalog organized hierarchically by type of record, time period, drawer name, and card image number. This could be done after digitization of each set and would enable online searching of these records in a manner that mimics searching the actual cards. While this would require a few more steps to search for a particular term, it would enable viewing surrounding records, a feature considered useful by some users.
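To make the hierarchy concrete, here is a minimal sketch of how a card image might be addressed by type of record, time period, drawer name, and card image number, and how the surrounding cards in a drawer could be listed. The class name, the path layout, the zero-padded numbering, and the drawer label "A-Ab" are all assumptions made for illustration; nothing here describes an actual system design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CardRef:
    """Hypothetical address of one card image in a virtual card catalog."""
    record_type: str   # e.g. "assignments" (illustrative label)
    period: str        # e.g. "1870-1977"
    drawer: str        # drawer name as labeled on the physical drawer
    card_number: int   # 1-based position of the card image in the drawer

    def path(self) -> str:
        # Assumed layout: type / period / drawer / zero-padded card number.
        return f"{self.record_type}/{self.period}/{self.drawer}/{self.card_number:04d}"


def neighbors(card: CardRef, drawer_size: int, radius: int = 2) -> list[CardRef]:
    """Return the card itself plus up to `radius` cards on either side,
    clipped to the first and last card in the drawer."""
    first = max(1, card.card_number - radius)
    last = min(drawer_size, card.card_number + radius)
    return [
        CardRef(card.record_type, card.period, card.drawer, n)
        for n in range(first, last + 1)
    ]
```

Browsing then amounts to walking the hierarchy down to a drawer and paging through card numbers, much as one would flip through the physical cards.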

Third, we are exploring the feasibility, costs, and benefits of optical character recognition (OCR) and double-blind data capture as possible options for extracting data from copyright records. Indexing 70 million records is a daunting task and well beyond present staff resources. At the same time, the accuracy and integrity of the records are of paramount importance. Through prototyping, piloting, and your feedback, we plan to find the optimal approach that will capture the necessary information correctly and completely. Whether the data is captured through keyboarding or OCR, there must be a second pass of the data for verification. In concert with this, we are considering how we might use crowd-sourcing to engage large numbers of interested persons to help with the data capture and verification.
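As a rough illustration of the verification idea, the sketch below compares two independent captures of the same card field by field, accepts the values that agree, and flags the rest for a further review pass. It assumes that "double-blind" here means two captures made without sight of each other; the function name and the field names are hypothetical, and a real workflow would likely need normalization rules beyond trimming whitespace.

```python
def compare_captures(
    first: dict[str, str], second: dict[str, str]
) -> tuple[dict[str, str], list[str]]:
    """Compare two independent transcriptions of one record.

    Returns the fields on which both captures agree, and a sorted list
    of field names that differ or are missing from either capture.
    """
    accepted: dict[str, str] = {}
    disputed: list[str] = []
    for field in sorted(first.keys() | second.keys()):
        a = first.get(field, "").strip()
        b = second.get(field, "").strip()
        if a and a == b:
            accepted[field] = a      # both captures agree on a non-empty value
        else:
            disputed.append(field)   # mismatch or missing: needs another look
    return accepted, disputed


# Example with made-up values: the two captures disagree on one field.
capture_a = {"title": "Sample Work", "claimant": "J. Doe", "year": "1925"}
capture_b = {"title": "Sample Work", "claimant": "J. Doc", "year": "1925"}
accepted, disputed = compare_captures(capture_a, capture_b)
```

The same comparison would apply whether the two captures come from two keyboard operators, from OCR plus one operator, or from crowd-sourced volunteers.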

Fourth, we are going to publicize the project through the Copyright website and other media such as this blog to generate excitement, seek input, and garner support for the project.

As always, your feedback and comments are most important and most welcome.


  1. Greg Cram
    December 30, 2011 at 10:53 am

    As the Rights Clearance Analyst for the New York Public Library, I use CCEs (first on microfilm and now on IA) very frequently to check for renewals. Having the transfers and assignments records available would be helpful for us in our efforts to track down current rights holders.

    It would also be nice if a search engine could be built to search across non-book CCEs without having to search year by year. Stanford’s renewal database is wonderful for books, but we’re missing a similar database for all of the other types of works.


  2. Sharad Shah
    January 3, 2012 at 12:22 pm

    Tough call on the three options. However, for purposes of research and retrieval, I believe the 1970-77 set, while large, may prove most beneficial. It’s a case where the original claimants, their relatives, and others may all seek information on the records. More people would benefit from it, I believe.

    As for data extraction from the copyright records, I was talking this over with someone who evidently knows her stuff, and she directed me towards reCAPTCHA.

  3. Sharad Shah
    January 3, 2012 at 12:24 pm

    Looking at the site, it says 99.5% accuracy. Not sure how it holds up against handwritten information, but definitely worth a look.