6 Comments

  1. Steven B. Tuttle
    December 23, 2012 at 5:06 pm

    I hope I can volunteer in a work-at-home capacity, using my own computer, for perhaps 2 to 4 hours per day, 3 days per week.
    Thank you in advance for considering this.
    Could I get the OCR data side by side with a scan or picture of the card, so that I can verify that the card has been OCR’d correctly?

  2. David Hayes
    January 3, 2013 at 11:02 am

    Steven B. Tuttle’s remark suggests that the OCR data appear side-by-side on the screen with an image of the card that had been scanned to produce the data. Might I suggest a better idea?

    Why not have the imaging program automatically expand the space between lines of text on the card image until it is large enough to fit the OCR text? The OCR program knows where one line of text reaches its bottom and where the next line has its top-most position. Having the OCR text directly above the matching line of the image makes comparisons much easier and lets difficult-to-spot differences stand out, because any letter or number in the middle of a line sits adjacent to its supposed counterpart.
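A minimal sketch of the layout arithmetic behind this interleaving idea. It assumes the OCR engine reports a top and bottom pixel row for each line; the names `OcrLine`, `interleave_layout`, and `text_height` are hypothetical and are not taken from any particular OCR package.

```python
from dataclasses import dataclass


@dataclass
class OcrLine:
    text: str    # text the OCR engine recognized for this line
    top: int     # first pixel row of the line in the original scan
    bottom: int  # last pixel row of the line in the original scan


def interleave_layout(lines, image_height, text_height):
    """Work out where each OCR string and each line of the scan would go.

    Returns (new_height, placements). Each placement is a tuple
    (text, text_y, image_y): the OCR text is drawn at row text_y, and the
    matching line of the scan, pushed down, now starts at row image_y.
    """
    placements = []
    shift = 0  # how far the scan has been pushed down so far
    for line in sorted(lines, key=lambda item: item.top):
        text_y = line.top + shift   # the opened gap starts where the line used to start
        shift += text_height        # this line and everything below it move down
        image_y = line.top + shift  # new top row of the scanned line
        placements.append((line.text, text_y, image_y))
    return image_height + shift, placements


# Made-up coordinates for a two-line card.
card_lines = [
    OcrLine("MARY STEVENS, M. D.", top=10, bottom=30),
    OcrLine("W. B. Pictures, Inc.", top=40, bottom=60),
]
new_height, placements = interleave_layout(card_lines, image_height=100, text_height=20)
```

An imaging library would then copy each strip of the scan to its `image_y` row and draw the OCR string at `text_y`, so every recognized character sits directly above the pixels it came from.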

  3. David Hayes
    January 3, 2013 at 11:23 am

    The blog post mentions the planned use of automated procedures to parse the data derived from the catalog cards, and suggests that some percentage of the parses will be misidentified by the computer program because some cards have data which is positioned or formatted differently than is normal for the cards. Some human checking of the results will be necessary.

    The human examination of the program’s decisions about where to parse and how to assign data fields can be made easier by having the program place a bullet (of a type never seen on the catalog cards) wherever it decided to parse, and by color-coding its data-field decisions. Thus, for example, the data identified by the program as the registration number might get a light-yellow background. The date might get a light-green background. (The date is almost fool-proof to identify, because it will have one of twelve abbreviations for a month sandwiched between two groups of numbers.) The title might get an orange underline. In this way, a person can most easily spot an error made by the computer.
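A rough sketch of the date rule and the color-coding described above. The exact date layout on the cards is an assumption here (one or two digits, a month abbreviation, then two to four digits), and the names `MARKER`, `STYLES`, `find_date`, and `highlight` are hypothetical. The output is simple HTML so that a checker could view it in a browser.

```python
import html
import re

# Twelve month abbreviations sandwiched between two groups of digits.
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
DATE_RE = re.compile(rf"\b\d{{1,2}}\s?(?:{MONTHS})[a-z]*\.?\s?\d{{2,4}}\b")

MARKER = "\u25c6"  # a symbol assumed never to appear on the cards

STYLES = {
    "registration": "background: lightyellow",
    "date": "background: lightgreen",
    "title": "border-bottom: 2px solid orange",
}


def find_date(text):
    """Return (start, end) of the first date-like run in text, or None."""
    match = DATE_RE.search(text)
    return (match.start(), match.end()) if match else None


def highlight(text, fields):
    """Wrap each field in a colored span, preceded by the parse marker.

    fields maps a field name from STYLES to a (start, end) slice of text;
    the slices are assumed not to overlap.
    """
    parts = []
    position = 0
    for name, (start, end) in sorted(fields.items(), key=lambda item: item[1]):
        parts.append(html.escape(text[position:start]))
        parts.append(MARKER)
        parts.append(f'<span style="{STYLES[name]}">{html.escape(text[start:end])}</span>')
        position = end
    parts.append(html.escape(text[position:]))
    return "".join(parts)


# Invented sample line; a real card may be laid out differently.
sample = "MARY STEVENS, M. D. 27Jan33"
date_span = find_date(sample)
marked_up = highlight(sample, {"date": date_span})
```

The marker and the colors only expose the program's decisions; a person still judges whether each one is right.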

    A computer program may well be written with enough intricacy that a paragraph block which begins “MARY STEVENS, M. D. W. B. Pictures, Inc.” would be recognized as consisting of a title which ends on “M. D.” (a common abbreviation in English) and an author name which begins with “W. B.” (some human authors give their first and middle names as initials and spell out only the surname). However, some decisions will be made incorrectly by a program because human judgment can ferret out practices too uncommon to be thought of during the programming phase. This is where having different color underlinings or backgrounds would assist the person tasked with identifying poor decisions by the computer as to where to parse and assign text into data fields.
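One way such a title/author rule might look, as a sketch only. It assumes a hand-made list of abbreviations (`KNOWN_ABBREVIATIONS`) and treats a lone capital letter followed by a period as a name initial; both the list and the function name `split_title_author` are hypothetical, and unusual cards will still defeat it, which is the point made above.

```python
import re

# Assumed list of abbreviations that may end a title; a real list would be longer.
KNOWN_ABBREVIATIONS = ("M. D.", "PH. D.")


def _protected_spans(block):
    """Character ranges covered by a known abbreviation."""
    spans = []
    for abbreviation in KNOWN_ABBREVIATIONS:
        for match in re.finditer(re.escape(abbreviation), block):
            spans.append((match.start(), match.end()))
    return spans


def split_title_author(block):
    """Split a paragraph block into (title, author) at the first plausible period."""
    spans = _protected_spans(block)
    for match in re.finditer(r"\. ", block):
        cut = match.start() + 1  # index just after the period
        # Never cut in the middle of a known abbreviation such as "M. D."
        if any(start < cut < end for start, end in spans):
            continue
        head = block[:cut]
        ends_with_abbreviation = head.endswith(KNOWN_ABBREVIATIONS)
        last_word = head.split()[-1]
        is_initial = re.fullmatch(r"[A-Z]\.", last_word) is not None
        # A lone initial ("W.") belongs to a name, so the title cannot end here,
        # unless that initial completes a known abbreviation.
        if is_initial and not ends_with_abbreviation:
            continue
        return head, block[cut:].lstrip()
    return block, ""  # no plausible split found; leave it for a human


title, author = split_title_author("MARY STEVENS, M. D. W. B. Pictures, Inc.")
```

Blocks that return an empty author, or that split in a surprising place, are the ones worth routing to a human checker first.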

  4. Steven B. Tuttle
    March 13, 2013 at 5:24 pm

    I was thinking that the computer could scan and hold a bitmap image alongside the OCR information, side by side, so that human checkers can check the information more easily, including over the internet. I believe this would open up the possibility of many more volunteers who would not normally volunteer because of transportation, job, time, and cost issues.
    Transportation: working at home eliminates the daily commute.
    Job: volunteering may interfere with a current job; working at home during a free period would reduce this.
    Time: less time is wasted commuting, and volunteers can schedule days off and fit the work into their own schedules.
    Cost: the cost of commuting would be reduced for local volunteers, as would the cost of travel to Washington, D.C., hotels, and meals.
    I hope this helps.

  5. chatlines
    March 16, 2013 at 10:35 pm

    I wanted to draft you that little bit of remark so as to thank you so much over again on your wonderful information you have provided here. It’s certainly surprisingly generous of people like you to offer extensively all a few individuals would have distributed as an e book in order to make some bucks for themselves, chiefly considering the fact that you could possibly have done it if you decided. Those principles additionally served as a great way to realize that the rest have similar interest similar to mine to know a good deal more around this issue. Certainly there are numerous more pleasant sessions ahead for folks who check out your blog post.

  6. Chat teen
    March 16, 2013 at 11:54 pm

    Does your website have a contact page? I’m having a tough time locating it but, I’d like to shoot you an email. I’ve got some suggestions for your blog you might be interested in hearing. Either way, great site and I look forward to seeing it expand over time.

This blog is governed by the general rules of respectful civil discourse. You are fully responsible for everything that you post. The content of all comments is released into the public domain unless clearly stated otherwise. Your submission may be subject to disclosure under the Freedom of Information Act (FOIA). The Library of Congress does not control the content posted. Nevertheless, the Library of Congress may monitor any user-generated content as it chooses and reserves the right to remove content for any reason whatever, without consent. Gratuitous links to sites are viewed as spam and may result in removed comments. We further reserve the right, in our sole discretion, to remove a user's privilege to post content on the Library site. Read our Comment and Posting Policy.