Of the estimated 40 million cards in the Copyright Card Catalog, 24.9 million have been digitized and quality checked, and their image files have been placed in secure Library storage. By September 30 we expect to have completed over 30 million. In addition to the cards, all 667 volumes of the published Catalog of Copyright Entries from 1891 to 1978 have been digitized and are available online at the Internet Archive: www.archive.org/details/copyrightrecords/.
While this progress satisfies the preservation goal of the project, we’ve also come a long way in figuring out how to make the records available online. We continue to pursue a two-stage approach. The first stage is a near-term virtual card catalog, in which card images could be searched in a way that mimics searching the actual cards (see earlier post http://blogs.loc.gov/copyrightdigitization/?p=823). The second is a longer-term (because of cost) solution based on converting the card content from the images into online database records. I mentioned in my last post that a group of Copyright Office staff was studying the cards to define patterns and characteristics that can facilitate parsing the data into designated fields for indexing. That study has been completed for the 1971 to 1977 registration cards, and we’re using the detailed information to get a better idea of the cost of conversion and indexing.
Two requests for information (RFIs) have been posted on the Federal Business Opportunities website operated by the U.S. General Services Administration (www.fbo.gov). One describes the content, characteristics and patterns found in the 1971 to 1977 registration cards (solicitation number COP20130027). The other similarly describes the 1870 to 1977 assignment and transfer cards (solicitation number COP20130026). If you are interested and have the experience and resources to capture, verify, parse and organize data from a high volume of document images, I encourage you to visit the FBO site and look at the two RFIs. All of the content in the Copyright catalog cards is public information. The Copyright Office will consider all viable approaches, including crowdsourcing.