The following is a guest post by Maria A. Pallante, Register of Copyrights and Director of the U.S. Copyright Office.
Help Wanted: Have you ever attempted to build an electronic index and searchable database of a complex and diverse collection of 70 million imaged historical records? Neither have we.
Current records dating back to 1978 are available online and searchable at www.copyright.gov/records. The Office’s records date back to 1870, however, and many pertain to works still under copyright protection. These older records are the focus of our current digitization efforts. This is an ambitious project that I announced recently as one of several priorities and special projects the Copyright Office is undertaking. To date, nearly 13 million index cards from our card catalog and over half of the 660-volume Catalog of Copyright Entries have been scanned, and the images have been processed through quality assurance and moved to long-term managed storage.
So, back to the earlier question: How do we go about creating a searchable database comprising 70 million digital objects? For that matter, how do we create metadata for such a large volume of records? Assuming we would like to achieve full-level indexing, how do we do so on a rudimentary indexing budget? What technologies and creative approaches can we profitably employ to get this work done? We welcome your ideas and suggestions on these and many other questions related to this project.
The Copyright Office historical catalog serves as the mint record of American creativity, and there are great benefits to making the collection accessible online. We know that working collaboratively will ensure that the final product best meets the needs of the widest audience of users. I hope you will subscribe to our project blog at http://blogs.loc.gov/copyrightdigitization/ and visit our project web page at www.copyright.gov/digitization from time to time. Most of all, I hope that you will be an active partner in this important effort.