<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Copyright Matters: Digitization and Public Access</title>
	<atom:link href="http://blogs.loc.gov/copyrightdigitization/feed/" rel="self" type="application/rss+xml" />
	<link>http://blogs.loc.gov/copyrightdigitization</link>
	<description>Engage interested parties in the digitization and accessibility of non-digital Copyright Office records.</description>
	<lastBuildDate>Fri, 05 Apr 2013 21:26:56 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4.1</generator>
		<item>
		<title>Cumulative Motion Pictures and Dramas</title>
		<link>http://blogs.loc.gov/copyrightdigitization/2013/04/cumulative-motion-pictures-and-dramas/</link>
		<comments>http://blogs.loc.gov/copyrightdigitization/2013/04/cumulative-motion-pictures-and-dramas/#comments</comments>
		<pubDate>Fri, 05 Apr 2013 21:26:56 +0000</pubDate>
		<dc:creator>Mike Burke</dc:creator>
				<category><![CDATA[About the Project and the Records]]></category>
		<category><![CDATA[Search Strategies]]></category>

		<guid isPermaLink="false">http://blogs.loc.gov/copyrightdigitization/?p=991</guid>
		<description><![CDATA[In previous posts, I&#8217;ve mentioned the digitization of the Catalog of Copyright Entries from 1891 to 1978 and their availability on the Internet Archive website at: www.archive.org/details/copyrightrecords/.   Seven more related volumes have just been added to the collection which you may find useful if searching for motion pictures or dramatic compositions.  The volumes are a [...]]]></description>
			<content:encoded><![CDATA[<p>In previous posts, I&#8217;ve mentioned the digitization of the Catalog of Copyright Entries from 1891 to 1978 and their availability on the Internet Archive website at: <a href="http://www.archive.org/details/copyrightrecords/">www.archive.org/details/copyrightrecords/</a>.   Seven more related volumes have just been added to the collection which you may find useful if searching for motion pictures or dramatic compositions.  The volumes are a cumulative series of these works, published at various times by the Copyright Office.</p>
<div id="attachment_989" class="wp-caption alignright" style="width: 239px"><a href="http://blogs.loc.gov/copyrightdigitization/files/2013/04/motionpict19121939librrich_0007.jpg"><img class="size-medium wp-image-989" title="motionpict19121939librrich_0007" src="http://blogs.loc.gov/copyrightdigitization/files/2013/04/motionpict19121939librrich_0007-229x300.jpg" alt="Cumulative Motion Pictures" width="229" height="300" /></a><p class="wp-caption-text">Cumulative Motion Pictures</p></div>
<ul>
<li>Cumulative motion pictures 1894 to 1912   <a href="http://archive.org/details/Motionpict18941912librrich0013">http://archive.org/details/Motionpict18941912librrich0013</a></li>
<li>Cumulative motion pictures 1912 to 1939      <a href="http://archive.org/details/Motionpict19121939librrich0010">http://archive.org/details/Motionpict19121939librrich0010</a></li>
<li>Cumulative motion pictures 1940 to 1949      <a href="http://archive.org/details/Motionpict19401949librrich0010">http://archive.org/details/Motionpict19401949librrich0010</a></li>
<li>Cumulative motion pictures 1950 to 1959      <a href="http://archive.org/details/Motionpict19591960librrich0008">http://archive.org/details/Motionpict19591960librrich0008</a></li>
<li>Cumulative motion pictures 1960 to 1969      <a href="http://archive.org/details/Motionpict19601969librrich0013">http://archive.org/details/Motionpict19601969librrich0013</a></li>
</ul>
<div id="attachment_990" class="wp-caption alignright" style="width: 238px"><a href="http://blogs.loc.gov/copyrightdigitization/files/2013/04/dramaticcomposit01libr_0005.jpg"><img class=" wp-image-990" title="dramaticcomposit01libr_0005" src="http://blogs.loc.gov/copyrightdigitization/files/2013/04/dramaticcomposit01libr_0005-206x300.jpg" alt="Dramatic Compositions 1870 to 1916" width="228" height="328" /></a><p class="wp-caption-text">Dramatic Compositions 1870 to 1916</p></div>
<ul>
<li>Dramatic compositions 1870 to 1916 Volume 1      <a href="http://archive.org/details/Dramaticcomposit01libr0012_201303">http://archive.org/details/Dramaticcomposit01libr0012_201303</a></li>
<li>Dramatic compositions 1870 to 1916 Volume 2      <a href="http://archive.org/details/Dramaticcomposit02libr0012">http://archive.org/details/Dramaticcomposit02libr0012</a></li>
</ul>
<p>We found that the volumes were already available on the IA website but in other collections.  The motion picture volumes had been digitized by the Prelinger Library and the dramatic composition volumes by the Boston Public Library.  Both institutions concurred with our adding copies of their image files to the Copyright Records collection and thereby avoiding the cost of digitizing our own copies.</p>
<p>The need to include these volumes in the collection came as a result of discussions with frequent users of Copyright records.  The cumulative motion picture volumes are a convenient alternative to searching multiple annual CCE volumes.  The catalog of dramatic compositions includes all such titles registered from July 21, 1870 to December 31, 1916 inclusive, approximately 60,000 entries.  Both types of works had at times been registered in different classes and these cumulative volumes bring them together.</p>
<p>In two of the motion picture volumes, 1912 to 1939 and 1940 to 1949, the registration numbers as shown include a series code of ‘U’ for unpublished works or ‘P’ for published works.  This series code was included based on information found in the application for copyright; however, such codes were not included in the actual assigned registration numbers until 1947.  Prior to that year the number consisted of simply the class code, L or M, followed by a sequential number.  Knowing this can be important when searching for related records such as renewals or assignments which may reflect the registration number without the series code.</p>
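<p>For anyone matching numbers from these volumes against renewal or assignment records, the idea can be sketched in a few lines of Python.  This is only an illustration: the function name search_variants and the assumed shape of a number (class code L or M, an optional series code P or U, then digits) are assumptions made for the example, not a description of any Copyright Office system.</p>

```python
import re

# Assumed shape: class code (L or M), optional series code (P or U), then digits.
_REG_NO = re.compile(r"^([LM])([PU]?)(\d+)$")


def search_variants(reg_no):
    """Return the number as printed plus, if it has a series code, the form without it."""
    # Drop spaces and hyphens so "LP 1234" and "LP-1234" are treated alike.
    cleaned = re.sub(r"[\s\-]", "", reg_no.upper())
    match = _REG_NO.match(cleaned)
    if match is None:
        # Not in the assumed shape: hand back the cleaned input unchanged.
        return [cleaned]
    class_code, series_code, digits = match.groups()
    variants = [class_code + series_code + digits]
    if series_code:
        # Related records before 1947 may show only the class code and number.
        variants.append(class_code + digits)
    return variants
```

<p>Under these assumptions, search_variants("LP 1234") gives both "LP1234" and "L1234", so a search can try each form in turn.</p>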
<p>We hope that you find the volumes in the Copyright Records collection useful while we continue to work towards online data records.  Any thoughts or feedback are most welcome.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.loc.gov/copyrightdigitization/2013/04/cumulative-motion-pictures-and-dramas/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Copyright digitization:  Moving right along!</title>
		<link>http://blogs.loc.gov/copyrightdigitization/2013/03/copyright-digitization-moving-right-along/</link>
		<comments>http://blogs.loc.gov/copyrightdigitization/2013/03/copyright-digitization-moving-right-along/#comments</comments>
		<pubDate>Fri, 22 Mar 2013 15:12:28 +0000</pubDate>
		<dc:creator>Mike Burke</dc:creator>
				<category><![CDATA[About the Project and the Records]]></category>
		<category><![CDATA[Crowdsourcing]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[Metadata]]></category>
		<category><![CDATA[Partners and Collaboration]]></category>

		<guid isPermaLink="false">http://blogs.loc.gov/copyrightdigitization/?p=967</guid>
		<description><![CDATA[24.9 million cards out of the estimated 40 million in the Copyright Card Catalog have been digitized, quality checked and image files placed in secure Library storage.  By September 30 we expect to have completed over 30 million.  In addition to the cards, all 667 volumes of the published Catalog of Copyright Entries from 1891 [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_12" class="wp-caption alignleft" style="width: 310px"><a href="http://blogs.loc.gov/copyrightdigitization/files/2011/12/card_catalog.jpg"><img class="size-medium wp-image-12" title="Copyright Card Catalog" src="http://blogs.loc.gov/copyrightdigitization/files/2011/12/card_catalog-300x192.jpg" alt="" width="300" height="192" /></a><p class="wp-caption-text">Copyright Card Catalog</p></div>
<p>24.9 million cards out of the estimated 40 million in the Copyright Card Catalog have been digitized, quality checked and image files placed in secure Library storage.  By September 30 we expect to have completed over 30 million.  In addition to the cards, all 667 volumes of the published Catalog of Copyright Entries from 1891 to 1978 have been digitized and are available online at the Internet Archive: <a href="http://www.archive.org/details/copyrightrecords/">www.archive.org/details/copyrightrecords/</a>.</p>
<p>While this progress is satisfying the preservation goal of the project, we’ve also come a long way in figuring out how to make the records available online.  We continue to pursue a two-stage approach: a near-term virtual card catalog through which card images could be searched in a way that mimics searching the actual cards (see earlier post <a href="http://blogs.loc.gov/copyrightdigitization/?p=823">http://blogs.loc.gov/copyrightdigitization/?p=823</a>), and a longer-term (because of cost) solution based on converting the card content from the images into online database records.  In my last post I mentioned that a group of Copyright Office staff was studying the cards to define patterns and characteristics that can facilitate parsing the data into designated fields for indexing.  That study has been completed for the 1971 to 1977 registration cards, and we’re using the detailed results to get a better idea of the cost of conversion and indexing.</p>
<p>Two requests for information (RFIs) have been posted on the Federal Business Opportunities website operated by the U.S. General Services Administration (<a href="http://www.fbo.gov/">www.fbo.gov</a>).  One describes the content, characteristics and patterns found in the 1971 to 1977 registration cards (solicitation number COP20130027).  The other similarly describes the 1870 to 1977 assignment and transfer cards (solicitation number COP20130026).  If you are interested and have experience and the resources to capture, verify, parse and organize data from a high volume of document images, I encourage you to visit the FBO site and look at the two RFIs.  All of the content in the Copyright catalog cards is public information.  The Copyright Office will consider all viable approaches including crowdsourcing.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.loc.gov/copyrightdigitization/2013/03/copyright-digitization-moving-right-along/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Getting ready for data capture:  Sorting out the details in the catalog cards</title>
		<link>http://blogs.loc.gov/copyrightdigitization/2012/12/getting-ready-for-data-capture-sorting-out-the-details-in-the-catalog-cards/</link>
		<comments>http://blogs.loc.gov/copyrightdigitization/2012/12/getting-ready-for-data-capture-sorting-out-the-details-in-the-catalog-cards/#comments</comments>
		<pubDate>Wed, 19 Dec 2012 20:35:56 +0000</pubDate>
		<dc:creator>Mike Burke</dc:creator>
				<category><![CDATA[About the Project and the Records]]></category>
		<category><![CDATA[Crowdsourcing]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[Metadata]]></category>

		<guid isPermaLink="false">http://blogs.loc.gov/copyrightdigitization/?p=935</guid>
		<description><![CDATA[Data capture and indexing of the pre-1978 Copyright records will be by all accounts a challenging task.  But some volunteer work recently done by Copyright staff may facilitate capturing the data and organizing it for effective searching.  Focusing on the 1971 to 1977 time period within the Copyright Card Catalog, they have identified over 30 [...]]]></description>
			<content:encoded><![CDATA[<p>Data capture and indexing of the pre-1978 Copyright records will be by all accounts a challenging task.  But some volunteer work recently done by Copyright staff may facilitate capturing the data and organizing it for effective searching.  Focusing on the 1971 to 1977 time period within the Copyright Card Catalog, they have identified over 30 characteristics and patterns in the free form textual data that should make it easier to convert it into searchable online records.</p>
<p>The Copyright Card Catalog is considered the most up-to-date index to pre-1978 copyright registrations.  It has been updated over time to reflect corrections and changes sometimes with handwritten annotations and sometimes with new cards.  And so it is also considered the best source for information to build an online searchable index.  The part of the catalog for registered works is divided into six time periods, the most recent being 1971 to 1977. There are 7.8 million cards for this time period representing 2.8 million registrations arranged in a single alphabetical index of names and titles. Each card has a heading that’s either a name or a title under which the card is filed.   For most registrations the heading is followed by a text paragraph that starts with the title of the work and includes the author and claimant names, the registration number assigned and the effective date of registration as well as other facts pertinent to the registration.  For renewal registrations, the original registration number and date are also included just after the copyright notice symbol.</p>
<div id="attachment_936" class="wp-caption aligncenter" style="width: 499px"><a href="http://blogs.loc.gov/copyrightdigitization/files/2012/12/CC19711977ACHIE-ACKERMAN_V.0004.jpg"><img class=" wp-image-936" title="CC19711977ACHIE-ACKERMAN_V.0004" src="http://blogs.loc.gov/copyrightdigitization/files/2012/12/CC19711977ACHIE-ACKERMAN_V.0004-1024x637.jpg" alt="" width="489" height="296" /></a><p class="wp-caption-text">(Typical catalog card from the 1971 to 1977 time period)</p></div>
<p>The information in the registration cards is not tagged nor is it in specific fields.  But the patterns found in the cards will enable a comparison of a card heading to data strings in the text paragraph and from that determine whether it’s a title, author name or claimant name.</p>
<p>Here’s a sample of the key patterns and characteristics found:</p>
<ul>
<li>99.6% of the cards had the title of the work at the beginning of the text paragraph;</li>
<li>76.6% of the cards had the registration number at the end of the text paragraph;</li>
<li>94.6% of the cards contained a copyright notice symbol © or Ⓟ;</li>
<li>Approximately 79% of the cards had the claimant name right after the © or Ⓟ;</li>
<li>Approximately 7% of the cards were for renewal registrations;</li>
<li>Author names often have “markers” such as the word “by” or the letters “w” or “m” for words or music indicating their role in the work;</li>
<li>Registration numbers are recognized by the distinct class prefix and by the location on the card image;</li>
<li>Date of registration has a consistent format and is almost always just before the registration number.</li>
</ul>
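<p>To give a feel for how patterns like these might be used, here is a minimal Python sketch.  It is not the Office's method.  The function name parse_card_text, the use of a period to end the title, the semicolon separators and the sample date format are all assumptions made for the example; real cards vary, and names containing periods would be cut short by this simple approach.</p>

```python
import re

NOTICE_SYMBOLS = ("\u00a9", "\u24c5")  # the © and Ⓟ notice symbols


def parse_card_text(text):
    """Split a card's text paragraph into rough candidate fields."""
    fields = {"title": None, "claimant": None, "registration_number": None}
    text = text.strip()

    # Title: the paragraph usually starts with it; assume it ends at the first period.
    head, sep, _ = text.partition(".")
    if sep:
        fields["title"] = head.strip()

    # Claimant: the text right after the first notice symbol, up to ";" or ".".
    positions = [text.find(symbol) for symbol in NOTICE_SYMBOLS if symbol in text]
    if positions:
        after = text[min(positions) + 1:]
        fields["claimant"] = re.split(r"[;.]", after, maxsplit=1)[0].strip()

    # Registration number: a short letter prefix plus digits at the very end.
    match = re.search(r"([A-Z]{1,3}\d+)\.?$", text)
    if match:
        fields["registration_number"] = match.group(1)
    return fields
```

<p>Given a made-up paragraph such as "Sample song title. By Jane Doe. © Jane Doe; 5Jan72; EU123456." the sketch returns the title "Sample song title", the claimant "Jane Doe" and the registration number "EU123456".  Cards that do not fit the assumed layout simply leave the corresponding field empty for a person to review.</p>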
<p>The patterns and characteristics identified will be valuable input for the development of workflows and tasks used for data capture through crowd sourcing, a process looking more and more feasible for Copyright data.  Our research shows that workflows can be designed to capture and verify the data through keyboarding or they can incorporate OCR followed by cleansing and parsing using the crowd.  The data can be indexed in the appropriate fields and any of the information in the text paragraph can be made accessible through keyword searching.</p>
<p>Good progress has been made on the preservation front of the project with more than 22 million cards digitized and all volumes of the published Catalog of Copyright Entries scanned and available online through the Internet Archive.  Now we’re ready to move ahead on making the records more accessible online.  The Office will soon issue a Request for Quotation containing sufficient details about the cards, and the patterns and characteristics found, to obtain cost estimates for capturing the data contained in the 1971 to 1977 catalog cards.  I’ll keep you posted on progress and as always your input is most welcome and appreciated.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.loc.gov/copyrightdigitization/2012/12/getting-ready-for-data-capture-sorting-out-the-details-in-the-catalog-cards/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Current Copyright search capability:  Tell us what you think.</title>
		<link>http://blogs.loc.gov/copyrightdigitization/2012/08/current-copyright-search-capability-tell-us-what-you-think/</link>
		<comments>http://blogs.loc.gov/copyrightdigitization/2012/08/current-copyright-search-capability-tell-us-what-you-think/#comments</comments>
		<pubDate>Thu, 23 Aug 2012 21:10:17 +0000</pubDate>
		<dc:creator>Mike Burke</dc:creator>
				<category><![CDATA[Indexing]]></category>
		<category><![CDATA[Metadata]]></category>
		<category><![CDATA[Search Strategies]]></category>

		<guid isPermaLink="false">http://blogs.loc.gov/copyrightdigitization/?p=888</guid>
		<description><![CDATA[As we digitize, capture and index the pre-1978 Copyright records, a goal is that they be searchable in combination with the existing post-1977 records eventually resulting in a search capability that spans the full realm of creativity and copyright ownership from 1870 to the present.  More than 18 million records from 1978 to the present [...]]]></description>
			<content:encoded><![CDATA[<p>As we digitize, capture and index the pre-1978 Copyright records, a goal is that they be searchable in combination with the existing post-1977 records eventually resulting in a search capability that spans the full realm of creativity and copyright ownership from 1870 to the present.  More than 18 million records from 1978 to the present are already available online and first thoughts are to add the pre-1978 records to that same database.  However, before we go further with that idea, we’d like to know what you like and what you don’t like about the existing online search functionality for Copyright records.  It’s available at the following address:  <a href="http://cocatalog.loc.gov/">http://cocatalog.loc.gov/</a>.  The basic search page looks like this:</p>
<div id="attachment_889" class="wp-caption aligncenter" style="width: 310px"><a href="http://blogs.loc.gov/copyrightdigitization/files/2012/08/OPAC-screen.jpg"><img class="size-medium wp-image-889" src="http://blogs.loc.gov/copyrightdigitization/files/2012/08/OPAC-screen-300x194.jpg" alt="" width="300" height="194" /></a><p class="wp-caption-text">Copyright basic search page</p></div>
<p>Before 2007, when the current search functionality was installed, the post-1977 Copyright records were maintained in three separate files, one each for monograph registrations, serial registrations, and transfer/assignment documents.  The index files for searching were similarly kept in separate files although some combined searching of the files was possible.  Only left-anchored searching (matching from the beginning of a title) was available, a disadvantage when one didn&#8217;t know the exact title they were searching for or at least how the title began.  The old system had been developed at the Library in the 1970s and by 2003 the time had come to replace it.  A decision was made to use the same software for Copyright records that was already in place for the Library’s bibliographic records and which is still used today.  Several benefits derived from this.  It meant that the same tool would be used for both Copyright and bibliographic records with a very similar look and feel, a benefit for users of both sets of records.  It avoided the cost of buying or developing and maintaining new software just for Copyright records.  It leveraged the knowledge and experience that the Library had gained since implementing the software several years earlier.  It entailed a conversion of the Copyright data resulting in records that are cleaner, more consistent, and better organized and which enable more portability of the data.  And all records are stored in one database.  The new software also supported improved indexing and keyword searching.</p>
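<p>The difference between the two kinds of searching is easy to show with a small Python sketch.  The function names and sample titles are invented for the illustration and say nothing about how either system is actually built.</p>

```python
def left_anchored_search(titles, query):
    """Match only titles that begin with the query text."""
    wanted = query.casefold()
    return [title for title in titles if title.casefold().startswith(wanted)]


def keyword_search(titles, query):
    """Match titles that contain every query word, anywhere in the title."""
    words = query.casefold().split()
    return [
        title
        for title in titles
        if all(word in title.casefold().split() for word in words)
    ]
```

<p>With the invented titles "The quiet harbor", "Harbor lights" and "Night on the harbor", a left-anchored search for "harbor" finds only "Harbor lights", while a keyword search finds all three.</p>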
<p> As with any tool, some users had developed expertise in using the old system and were sad to see it retired.  Moving to the new software was a good decision at the time but before we add 16 million more records, we’d like to hear what you think of the present system for searching Copyright records.  The Copyright Office is currently conducting an online survey that&#8217;s available when one searches the records at the link given above.  The survey can only be completed once and requires that your browser pop-up blocker be turned off.  If you haven&#8217;t already completed the survey, please take a few minutes to use it to provide us with feedback about what you like or don&#8217;t like about this search capability or your thoughts on adding the pre-1978 records.  Question 6 of the survey has a text box in which you can give us your comments.  If you&#8217;ve already completed the survey but have additional comments please add them to this post.   Whatever feedback or comments you provide based on your experience in searching Copyright records online will be most appreciated and will be taken into account in deciding how to organize, index and make available the pre-1978 records.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.loc.gov/copyrightdigitization/2012/08/current-copyright-search-capability-tell-us-what-you-think/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Quiet but not quiescent: Steady progress on several fronts</title>
		<link>http://blogs.loc.gov/copyrightdigitization/2012/06/quiet-but-not-quiescent-steady-progress-on-several-fronts/</link>
		<comments>http://blogs.loc.gov/copyrightdigitization/2012/06/quiet-but-not-quiescent-steady-progress-on-several-fronts/#comments</comments>
		<pubDate>Thu, 07 Jun 2012 18:26:19 +0000</pubDate>
		<dc:creator>Mike Burke</dc:creator>
				<category><![CDATA[About the Project and the Records]]></category>
		<category><![CDATA[Crowdsourcing]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[Metadata]]></category>

		<guid isPermaLink="false">http://blogs.loc.gov/copyrightdigitization/?p=870</guid>
		<description><![CDATA[ It’s been a few weeks since my last post but that doesn&#8217;t mean we haven’t been making progress.  On the contrary we&#8217;ve been quite busy with several irons in the fire.  Our principal focus remains very much on how to achieve the public access goal of the project and we’re looking at all of the options.  [...]]]></description>
			<content:encoded><![CDATA[<p> It’s been a few weeks since my last post but that doesn&#8217;t mean we haven’t been making progress.  On the contrary we&#8217;ve been quite busy with several irons in the fire.</p>
<p> Our principal focus remains very much on how to achieve the public access goal of the project and we’re looking at all of the options.  We’re fortunate to have several staff from all parts of the Copyright Office helping on a voluntary part time basis with the analysis of the Copyright catalog cards.  These cards remain the best source of information to build online indexes to the pre-1978 records.  But like typical catalog cards the content is not labeled, which presents a challenge in extracting and identifying the types of index terms.  The volunteers are studying the cards to identify patterns that could allow programmatic parsing of the data.  For instance, the class codes will allow us to recognize registration numbers, and the copyright notice symbol should allow us to recognize the claimant name text string. Also, the relative location of other text strings in the card content, when compared with the card header information, should allow us to recognize some if not all of the other index terms.  The easy approach would be to invert all of the text in a card and provide general word searching, but we’re making the extra effort to try to index these older records in the same way as the post-1977 records.</p>
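<p>As a rough illustration of comparing a card heading with the strings found in the card text, consider the Python sketch below.  It assumes the title, claimant and author strings have already been pulled out of the text, and that a name heading may be filed in inverted order ("Doe, Jane").  The function names and the word-set comparison are assumptions for the example only, not the volunteers' actual rules.</p>

```python
def _words(text):
    """Lowercased set of words, ignoring commas and periods."""
    cleaned = text.casefold().replace(",", " ").replace(".", " ")
    return frozenset(cleaned.split())


def classify_heading(heading, title, claimant, authors=()):
    """Guess which part of the card text a heading corresponds to."""
    head = _words(heading)
    if not head:
        return "unknown"
    if head == _words(title):
        return "title"
    # A name that is both claimant and author is reported as claimant here.
    if head == _words(claimant):
        return "claimant"
    if any(head == _words(author) for author in authors):
        return "author"
    return "unknown"
```

<p>Comparing word sets means that "Doe, Jane" in a heading still matches "Jane Doe" in the text.  Anything that matches none of the strings is returned as "unknown" so that it can be looked at by a person.</p>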
<p> On the data capture front, we published a request for information to learn what skills, experience and technology exist in the marketplace to support the capture of information through crowdsourcing.  We&#8217;ve got a lot of data to capture but very limited resources to get the work done. We found companies that build work-flow processes to capture and verify data through keyboarding from displayed images.  The processes are made available to interested persons through online service providers to bring together those with data needing capture and those who are willing to spend some time contributing towards meeting that need.  Some of our data capture requirements appear to lend themselves to crowdsourcing so we are planning to invite those who responded to the RFI to see the actual records and to discuss what’s feasible. </p>
<p>We published a similar request for information about building a virtual card catalog (see the March 22<sup>nd</sup> and April 5<sup>th</sup> blog posts) and were likewise encouraged by the responses about its feasibility as an interim approach to making the records available online.  We’ll continue to explore it as a way to share the card images with you online pending a full search capability.</p>
<p>In the course of scanning the published Catalogs of Copyright Entries for preservation purposes, we captured OCR output of the content.  For some volumes, particularly those typeset in a font of at least 8 points, the OCR output is relatively good and may allow us to avoid keyboarding all of the records.  Some CCEs published in the 1970s used computer line printing followed by photo reduction, which produced less distinct characters at about 6 points.  While readable by eye, that content is less than clear to the OCR engine, and the output attests to that.</p>
<p> We’re also refining the estimates of how many cards exist in the catalog and finding that there are fewer cards than originally estimated.  Card thickness varied over the years and so estimates based on numbers of cards per inch are not uniform across the 108 year span of the catalog.  Based on the new estimates, we believe that all of the cards in the catalog could be imaged by the end of FY2014.</p>
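<p>The arithmetic behind that refinement can be sketched in Python.  The function names and every figure used with them are made up for the illustration; they are not measurements of the catalog.</p>

```python
def estimate_cards(sections):
    """Add up cards period by period: drawer inches times that period's cards per inch."""
    total = 0
    for inches, cards_per_inch in sections:
        total += round(inches * cards_per_inch)
    return total


def uniform_estimate(sections, cards_per_inch):
    """Estimate using one cards-per-inch figure for the whole catalog."""
    total_inches = sum(inches for inches, _ in sections)
    return round(total_inches * cards_per_inch)
```

<p>With two invented periods of 100 inches each, one at 90 cards per inch and one at 110, the per-period estimate is 20,000 cards, while applying 110 cards per inch to everything gives 22,000.  A single rate taken from thinner cards overstates the total in just this way.</p>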
<p>So there’s lots going on and the knowledge we’re gaining about the records is helping us plan the most efficient and shortest road to making them available online for your use.  This blog is a way to keep you informed about the project and the progress we’re making, but it’s also a means for you to provide feedback about what we can do to best meet your needs when it comes to Copyright records.  Your comments and suggestions are most important and always welcome.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.loc.gov/copyrightdigitization/2012/06/quiet-but-not-quiescent-steady-progress-on-several-fronts/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>A brief status update on the digitization and public access project</title>
		<link>http://blogs.loc.gov/copyrightdigitization/2012/04/a-brief-status-update-on-the-digitization-and-public-access-project/</link>
		<comments>http://blogs.loc.gov/copyrightdigitization/2012/04/a-brief-status-update-on-the-digitization-and-public-access-project/#comments</comments>
		<pubDate>Thu, 26 Apr 2012 21:15:23 +0000</pubDate>
		<dc:creator>Mike Burke</dc:creator>
				<category><![CDATA[About the Project and the Records]]></category>

		<guid isPermaLink="false">http://blogs.loc.gov/copyrightdigitization/?p=837</guid>
		<description><![CDATA[We are nearing completion of the digitization of the Catalog of Copyright Entries with online availability through the Internet Archive.   645 CCE volumes are now available at http://www.archive.org/details/copyrightrecords/ ranging from the very first publication in 1891 up to and including 1978 and these cover all classes of works and all renewals.  A few volumes are still in process [...]]]></description>
			<content:encoded><![CDATA[<p>We are nearing completion of the digitization of the Catalog of Copyright Entries with online availability through the Internet Archive.   645 CCE volumes are now available at <a href="http://www.archive.org/details/copyrightrecords/">http://www.archive.org/details/copyrightrecords/</a> ranging from the very first publication in 1891 up to and including 1978 and these cover <span style="text-decoration: underline">all</span> classes of works and <span style="text-decoration: underline">all</span> renewals.  A few volumes are still in process due to their size which will require additional preparation prior to scanning.</p>
<p>The 1978 volumes have been included because they contain entries for pre-1978 registrations that were not complete at the time of publication of the 1977 CCE&#8217;s.  These entries appear in a separate section at the end of each volume.  All registrations made under the Copyright Law that went into effect on January 1, 1978 are available online at the Copyright Office website:  <a href="http://www.copyright.gov/records/">http://www.copyright.gov/records/</a>.</p>
<p>Scanning of the cards in the Copyright Card Catalog is continuing, with completion of cards back to 1955 expected in a few months.  Based on the positive feedback to the recent posts about a virtual card catalog, we are also researching how to best construct such a catalog using derivative images from these scans.</p>
<p>Study and testing of the use of optical character recognition (OCR) to capture data from the card images is proceeding as well as the feasibility of using crowd sourcing.  Some of the records will lend themselves to these processes and some probably won&#8217;t.  We&#8217;ve also recently engaged several staff from all parts of the Copyright Office to assist on a volunteer basis with analysis of the card formats to determine ways to parse out the titles and the author and claimant names in order to produce index terms for a full online search capability.  It remains our ultimate goal to provide a search capability that spans both pre-1978 and post-1977 records and that supports searching by title and name with means to narrow the results to the particular item of interest.  Work is proceeding on a demonstration model of such a database and we hope to provide you with access to that model in the not too distant future to obtain your feedback and comments.</p>
<p>Progress is being made on both the preservation and the access fronts and we&#8217;ll keep you posted on new developments.  In the meantime your input is most welcome and most appreciated.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.loc.gov/copyrightdigitization/2012/04/a-brief-status-update-on-the-digitization-and-public-access-project/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>A virtual Copyright card catalog?  Maybe not a bad idea.</title>
		<link>http://blogs.loc.gov/copyrightdigitization/2012/04/a-virtual-copyright-card-catalog-maybe-not-a-bad-idea/</link>
		<comments>http://blogs.loc.gov/copyrightdigitization/2012/04/a-virtual-copyright-card-catalog-maybe-not-a-bad-idea/#comments</comments>
		<pubDate>Thu, 05 Apr 2012 21:14:40 +0000</pubDate>
		<dc:creator>Mike Burke</dc:creator>
				<category><![CDATA[Crowdsourcing]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[Search Strategies]]></category>

		<guid isPermaLink="false">http://blogs.loc.gov/copyrightdigitization/?p=823</guid>
		<description><![CDATA[This is a follow-up to my post of two weeks ago about making images of the pre-1978 Copyright catalog cards available online for searching just as one would search the physical cards.  More comments came in about that post than any previous one and the overall reaction was very positive.  We very much appreciate the feedback.  [...]]]></description>
			<content:encoded><![CDATA[<p>This is a follow-up to my post of two weeks ago about making images of the pre-1978 Copyright catalog cards available online for searching just as one would search the physical cards.  More comments came in about that post than any previous one and the overall reaction was very positive.  We very much appreciate the feedback.  It&#8217;s the principal purpose of this blog to let you know what we&#8217;re doing and what we have in mind and get your thoughts and recommendations.  We want to stay in step with your expectations and avoid hearing &#8220;What were they thinking?&#8221; after the fact.</p>
<p>Because of the positive response, we intend to pursue this option.  It seems to be a good interim step while we figure out how to muster the resources, maybe through crowdsourcing, to achieve the eventual goal of robust word searching of titles and names.  Some of the comments included specific suggestions, such as the ability to skip some preset number of cards and to display only the top half inch of the cards in the scrollable search panel with a full card display in an adjoining panel.  These suggestions are most helpful and most welcome.</p>
<p>While the overall card catalog is large, it&#8217;s divided into six time periods and a seventh set for assignments and transfers.  Two of these seven sets have already been digitized and the completion of a third one is near.  The sets could be made available as they are scanned; no need to wait until they&#8217;re all done.  Good performance will be fundamental to the efficacy of a virtual card catalog and that will be a key factor in selecting the type and size of derivative images to be displayed as well as how they are organized.</p>
<p>We&#8217;ve already begun market research to find out who has done this before and to benefit from their experience.  There are probably pitfalls to be avoided and we don&#8217;t want to reinvent the wheel.  We&#8217;ll reach out to organizations and particularly libraries that have similar online catalogs.  The Internet Archive was mentioned in a couple of comments.  We&#8217;ve been working with them for the past two years on the scanning of the published Catalog of Copyright Entries and will consult with them on this initiative as well.  The Princeton University Library has a supplementary online catalog that is somewhat like what we have in mind and we&#8217;ll seek input on their experience.  If you are aware of others that have put virtual card catalogs online, please let us know.</p>
<p>I&#8217;ll report on plans and progress on this initiative in future posts.  Thank you again for your feedback and I look forward to receiving more of your comments.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.loc.gov/copyrightdigitization/2012/04/a-virtual-copyright-card-catalog-maybe-not-a-bad-idea/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>A virtual Copyright card catalog?  Tell us what you think.</title>
		<link>http://blogs.loc.gov/copyrightdigitization/2012/03/a-virtual-copyright-card-catalog-tell-us-what-you-think/</link>
		<comments>http://blogs.loc.gov/copyrightdigitization/2012/03/a-virtual-copyright-card-catalog-tell-us-what-you-think/#comments</comments>
		<pubDate>Thu, 22 Mar 2012 19:56:51 +0000</pubDate>
		<dc:creator>Mike Burke</dc:creator>
				<category><![CDATA[About the Project and the Records]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[Search Strategies]]></category>

		<guid isPermaLink="false">http://blogs.loc.gov/copyrightdigitization/?p=768</guid>
		<description><![CDATA[Of the 25,723 drawers in the Copyright Card Catalog, more than 12,000 have already been scanned, resulting in more than 17 million card images safely tucked away in Library storage.  The long-term plan is to capture index terms from the card images using OCR and keyboarding and to build indexes for online searching.  But [...]]]></description>
			<content:encoded><![CDATA[<p>Of the 25,723 drawers in the Copyright Card Catalog, more than 12,000 have already been scanned, resulting in more than 17 million card images safely tucked away in Library storage.  The long-term plan is to capture index terms from the card images using OCR and keyboarding and to build indexes for online searching.  But this will require significant time and money to achieve.  Must we wait to share these images with you?  Maybe not.</p>
<p>As an interim step, the Copyright Office is considering making the images of the cards in the catalog available online through a hierarchical structure that would mimic the way a researcher would approach and use the physical card catalog. We’re calling this a virtual card catalog.  While it would not provide the full record-level indexing that remains a principal goal, it would make the information available as the scanning proceeds and would be as searchable as the physical cards.</p>
<p>The card images have been organized by drawer, each in its own folder.  The image file names contain the time period, the drawer label, a sequential four-digit number starting with 0001, and occasionally an alphabetic suffix when information exists on a verso or there are multiple card images for a single entry.  So there’s already a hierarchical organization of the images that could enable a virtual card catalog.</p>
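<p>To illustrate how software might read that naming scheme, here is a minimal sketch in Python.  The post does not give the exact layout, so the underscore separators, the <code>.jpg</code> extension, the lowercase suffix letter, and the names <code>CardImage</code> and <code>parse_card_name</code> are all assumptions made for the example, not the Library&#8217;s actual convention.</p>

```python
import re
from typing import NamedTuple, Optional

# Hypothetical layout: PERIOD_DRAWERLABEL_NNNN[suffix].jpg
# The real separator and file extension are not stated in the post.
CARD_NAME = re.compile(r"^([^_]+)_(.+)_(\d{4})([a-z]?)\.jpg$")


class CardImage(NamedTuple):
    period: str    # catalog time period, e.g. "1971-1977"
    drawer: str    # drawer label
    sequence: int  # position within the drawer, starting at 1
    suffix: str    # "" or a letter marking a verso or an extra image


def parse_card_name(file_name: str) -> Optional[CardImage]:
    """Split a card image file name into its parts, or return None."""
    match = CARD_NAME.match(file_name)
    if match is None:
        return None
    period, drawer, seq, suffix = match.groups()
    return CardImage(period, drawer, int(seq), suffix)
```

<p>Under these assumptions, a name such as <code>1971-1977_ABBOT-ADAMS_0002a.jpg</code> would yield the period, the drawer label, card number 2, and the suffix &#8220;a&#8221;, while a name that does not fit the pattern would simply be reported as unrecognized.</p>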
<p>But how would this virtual card catalog look and operate?  A search would probably begin at the top of the hierarchy with the selection of a catalog segment (e.g., Registrations from 1971 to 1977), perhaps from a drop-down list.  This would be followed by the entry of a search term (i.e., a name or a title).  The software would step down to the next level of the hierarchy within the selected catalog segment and find the “virtual drawer” folder that, by alphabetical order within the segment, should contain the term.  It would then display that drawer label along with the labels of some number of drawers immediately preceding and following it.  The researcher could select any one of the drawers displayed or return to the initial search screen.  For a selected drawer, the software could display small scrollable images in one panel and a full card image in another panel.  One could scroll through the smaller images to different points in the virtual drawer, select and display a specific card, and navigate to the next card, the previous card, the beginning of the drawer, or the end of the drawer.  The software could support a return to the list of drawers, forward and backward navigation at the drawer level, and a return to the initial search screen.  The following is a mock-up of how the card images might be displayed.  Click on the image to see a larger display.</p>
<div id="attachment_809" class="wp-caption aligncenter" style="width: 451px"><a href="http://blogs.loc.gov/copyrightdigitization/files/2012/03/CVCC-display-mock-up-20120322.jpg"><img class="size-medium wp-image-809" src="http://blogs.loc.gov/copyrightdigitization/files/2012/03/CVCC-display-mock-up-20120322-300x172.jpg" alt="" width="441" height="267" /></a><p class="wp-caption-text">Mock-up of a virtual card catalog display</p></div>
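<p>The drawer lookup step described above could be sketched roughly as follows.  This is only an illustration: it assumes each drawer is identified by the first heading it contains, that those headings are already in alphabetical order within the segment, and that a simple case-insensitive comparison is good enough.  The function name <code>find_drawers</code> and its parameters are hypothetical.</p>

```python
from bisect import bisect_right
from typing import List, Tuple


def find_drawers(first_entries: List[str], term: str, context: int = 2) -> Tuple[int, List[str]]:
    """Return the index of the drawer that should hold `term`, plus nearby labels.

    `first_entries` is assumed to be the alphabetically sorted list of each
    drawer's first heading (a hypothetical stand-in for real drawer labels).
    """
    keys = [label.casefold() for label in first_entries]
    # The target drawer is the last one whose first heading does not come after the term.
    index = max(bisect_right(keys, term.casefold()) - 1, 0)
    # Also show up to `context` drawers before and after the target.
    start = max(index - context, 0)
    return index, first_entries[start:index + context + 1]
```

<p>Because the lookup is a binary search over the sorted drawer labels, it would stay fast even across the roughly 25,000 drawers in the catalog; the <code>context</code> value controls how many neighboring drawers are listed on either side.</p>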
<p>We are exploring multiple ways of making the Copyright records available online sooner rather than later.  The notion of a virtual card catalog is one example, and one that could probably be done at a modest cost.  It sounds good to us, but we want to hear what you think of it.  While not the optimal solution, would it nevertheless be useful to you as an interim step?  Do you know of other organizations that have done something similar and done it well?  Please take a moment to consider this option and let us know what you think.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.loc.gov/copyrightdigitization/2012/03/a-virtual-copyright-card-catalog-tell-us-what-you-think/feed/</wfw:commentRss>
		<slash:comments>26</slash:comments>
		</item>
		<item>
		<title>Did Grandma write songs? &#8211; The personal side of Copyright records</title>
		<link>http://blogs.loc.gov/copyrightdigitization/2012/03/did-grandma-write-songs-the-personal-side-of-copyright-records/</link>
		<comments>http://blogs.loc.gov/copyrightdigitization/2012/03/did-grandma-write-songs-the-personal-side-of-copyright-records/#comments</comments>
		<pubDate>Fri, 09 Mar 2012 21:54:43 +0000</pubDate>
		<dc:creator>Mike Burke</dc:creator>
				<category><![CDATA[About the Project and the Records]]></category>

		<guid isPermaLink="false">http://blogs.loc.gov/copyrightdigitization/?p=751</guid>
		<description><![CDATA[A recent comment on the blog from Barbara tells of a musical work that she registered in the Copyright Office in the early 1970s and the disappointment that the record was not available online and might never be seen by her grandchildren.  Last December, a post on the Library’s Performing Arts blog In the Muse told [...]]]></description>
			<content:encoded><![CDATA[<p>A recent comment on the blog from Barbara tells of a musical work that she registered in the Copyright Office in the early 1970s and the disappointment that the record was not available online and might never be seen by her grandchildren. </p>
<p>Last December, a post on the Library’s Performing Arts blog <em>In the Muse</em> told the story of a niece who found among the Copyright records 42 songs registered in the 1920s and 1930s by a great uncle, most still unpublished and hidden away for over 70 years (<a href="http://blogs.loc.gov/music/2011/12/pic-of-the-week-uncle-bennie-edition/">http://blogs.loc.gov/music/2011/12/pic-of-the-week-uncle-bennie-edition/</a>). </p>
<p>These are touching stories about creative accomplishments that bring out the personal side of Copyright records, records that have meaning not only to the creators but also to their families.  The pride of a grandmother showing her grandchildren the records of songs that she wrote and the thrill for a niece finding records of a great uncle’s long lost songs stoke the fire of a passion that I and my colleagues share to make these records available online sooner rather than later.</p>
<p>Beyond being a source of family pride, the records also show where copyright may still persist for specific works. The amendment of the Copyright Law in 1992 made renewal automatic for works still in their first term of protection and made renewal registration optional for works originally copyrighted between January 1, 1964 and December 31, 1977.  In the case of the song registered in the early 1970s, the copyright very likely has not expired and would persist under the present law until at least 2065.  For the songs registered in the 20s and 30s, a search of the records for renewals would tell about their status.  So the song, book, or other work that has lain silent perhaps for decades could be the subject for the next hit tune or blockbuster motion picture, carrying with it all the benefits that can accrue to a copyright owner.</p>
<p>Copyright records reflect ownership of intellectual property that can have significant commercial value. They are a treasure trove for those doing research on the cultural development of our great nation.  But they also tell the personal stories of creative accomplishments of everyday folks that can be the inspiration for us and others in the future.</p>
<p>So Barbara, that song of so many years ago is still under Copyright protection and your grandchildren could one day inherit those rights.  And we remain committed to the task of making the records available and easily searchable online.  In the meantime, you might want to look in the volumes of the Catalog of Copyright Entries (CCE) that we recently had scanned and that are now available online at <a href="http://www.archive.org/details/copyrightrecords/">http://www.archive.org/details/copyrightrecords/</a>.  The 489 volumes that have been scanned so far contain about 12.6 million registrations dating from 1924 to 1977 in all classes, including music, motion pictures, works of art, prints, photos, pamphlets, periodicals, maps, books and renewal registrations.  The CCE volumes are organized by year, cataloging period, and class of material.  Word searching of the online volumes is available, and there are indexes included in each volume or in an accompanying volume.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.loc.gov/copyrightdigitization/2012/03/did-grandma-write-songs-the-personal-side-of-copyright-records/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Leaves of history:  Some interesting stuff.</title>
		<link>http://blogs.loc.gov/copyrightdigitization/2012/02/leaves-of-history-some-interesting-stuff/</link>
		<comments>http://blogs.loc.gov/copyrightdigitization/2012/02/leaves-of-history-some-interesting-stuff/#comments</comments>
		<pubDate>Fri, 17 Feb 2012 14:43:57 +0000</pubDate>
		<dc:creator>Mike Burke</dc:creator>
				<category><![CDATA[About the Project and the Records]]></category>

		<guid isPermaLink="false">http://blogs.loc.gov/copyrightdigitization/?p=590</guid>
		<description><![CDATA[The image to the left shows a page from the 1855 record books of the U.S. District Court, Southern District of New York containing the original registration for Walt Whitman&#8217;s Leaves of Grass.  Whitman provided the title to the court on May 15, 1855.  The record shows his claim as author and proprietor and it was [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_586" class="wp-caption alignleft" style="width: 166px"><a href="http://blogs.loc.gov/copyrightdigitization/files/2012/02/RB1855books_A178.jpg"><img class="size-medium wp-image-586 " src="http://blogs.loc.gov/copyrightdigitization/files/2012/02/RB1855books_A178-187x300.jpg" alt="" width="156" height="259" /></a><p class="wp-caption-text">Page from the District Court 1855 Record Books</p></div>
<p>The image to the left shows a page from the 1855 record books of the U.S. District Court, Southern District of New York containing the original registration for Walt Whitman&#8217;s <em>Leaves of Grass</em>.  Whitman provided the title to the court on May 15, 1855.  The record shows his claim as author and proprietor and it was signed by the Clerk of the Southern District, George F. Betts.</p>
<p>Until 1870 when copyright functions were centralized in the Library of Congress, claims in copyright were recorded by the Clerks of the U.S. district courts.  The district court record books are now located in the Rare Book and Special Collections Division of the Library of Congress.  Microfilm copies of the books are available to the public in the Copyright Office.</p>
<div id="attachment_588" class="wp-caption alignright" style="width: 166px"><a href="http://blogs.loc.gov/copyrightdigitization/files/2012/02/RB1883records_O4808b.jpg"><img class="size-medium wp-image-588" src="http://blogs.loc.gov/copyrightdigitization/files/2012/02/RB1883records_O4808b-176x300.jpg" alt="" width="156" height="259" /></a><p class="wp-caption-text">Page from the Library of Congress 1883 Record Books</p></div>
<p>Under the Copyright Law of 1831, Whitman&#8217;s original registration of <em>Leaves of Grass</em> had a term of 28 years. On March 15, 1883, two months before the end of the term, Whitman renewed the registration for an additional 14 years in accordance with the statute. The image to the right shows the page of the Library of Congress record book containing the renewal record, signed by the Librarian of Congress at the time, Ainsworth Rand Spofford.  The record shows Whitman&#8217;s street address in Camden, New Jersey.</p>
<p>Whitman continually updated <em>Leaves of Grass</em> until his death in 1892 and registered other editions.  Original registrations for two of these are found in the 1876 and 1881 record books.  Images of those registration records are shown below.</p>
<div id="attachment_587" class="wp-caption alignleft" style="width: 166px"><a href="http://blogs.loc.gov/copyrightdigitization/files/2012/02/RB1876records_G1585b.jpg"><img class="size-medium wp-image-587 " src="http://blogs.loc.gov/copyrightdigitization/files/2012/02/RB1876records_G1585b-187x300.jpg" alt="" width="156" height="259" /></a><p class="wp-caption-text">Page from the Library of Congress 1876 Record Books</p></div>
<div id="attachment_589" class="wp-caption alignleft" style="width: 166px"><a href="http://blogs.loc.gov/copyrightdigitization/files/2012/02/RB1881records_M15514b.jpg"><img class="size-medium wp-image-589 " src="http://blogs.loc.gov/copyrightdigitization/files/2012/02/RB1881records_M15514b-220x300.jpg" alt="" width="156" height="259" /></a><p class="wp-caption-text">Page from the Library of Congress 1881 Record Books</p></div>
<p>&nbsp;</p>
<p>These four records are just a glimpse of the interesting stories found within the leaves of the Copyright record books.  The principal purpose of the Copyright records is to identify ownership of intellectual property, but collectively they tell the much larger story of creativity in literature, music and art since the founding of our nation.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.loc.gov/copyrightdigitization/2012/02/leaves-of-history-some-interesting-stuff/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
