The following is a guest post by Maria A. Pallante, Register of Copyrights and Director of the U.S. Copyright Office. See the new U.S. Copyright Office blog at http://blogs.loc.gov/copyrightdigitization/
Help Wanted: Have you ever attempted to build an electronic index and searchable database of a complex and diverse collection of 70 million imaged historical records? Neither have we.
Current records dating back to 1978 are available online and searchable at www.copyright.gov/records. The office’s records date back to 1870, however, and many pertain to works still under copyright protection. These records are the focus of our current digitization efforts. This is an ambitious project that I announced recently as one of several priorities and special projects the U.S. Copyright Office is undertaking. To date, nearly 13 million index cards from our card catalog and over half of the 660-volume Catalog of Copyright Entries have been scanned, and the images have been processed through quality assurance and moved to long-term managed storage.
So, back to the earlier question: How do we go about creating a searchable database composed of 70 million digital objects? For that matter, how do we create metadata for such a large volume of records? Assuming we would like to achieve full-level indexing, how do we do so on a rudimentary indexing budget? What technologies and creative approaches can we profitably employ to get this work done? We welcome your ideas and suggestions on these and many other questions related to this project.
The Copyright Office historical catalog serves as the mint record of American creativity, and there are great benefits to making the collection accessible online. We know that working collaboratively will ensure that the final product best meets the needs of the widest audience of users. I hope you will subscribe to our project blog at http://blogs.loc.gov/copyrightdigitization/ and visit our project web page at www.copyright.gov/digitization from time to time. Most of all, I hope that you will be an active partner in this important effort.
Comments (3)
The article notes: “… 1870 … and many pertain to works still under copyright protection.”
Well, how does a work lose its copyright protection to begin with?
Is copyright protection time-limited?
Thanks if anyone knows the answers.
I created an electronic index of roughly 800 images from a trip to New Zealand (unsearchable, mind you) and was exhausted and nauseous at completion. 70 million? What does that number even mean?
This is a big but important project. With digital technology and OCR capabilities, I would imagine that this process will develop momentum and be completed faster than we might expect. It reminds me of the DNA sequencing problem. It was thought that the sequencing of DNA would take years; however, with technology and computers, the time frame was dramatically shortened.