VA-HIST Archives

Discussion of research and writing about Virginia history

VA-HIST@LISTLVA.LIB.VA.US

Subject:
From:
Randy Cabell <[log in to unmask]>
Reply To:
Discussion of research and writing about Virginia history <[log in to unmask]>
Date:
Wed, 20 Mar 2002 13:54:12 -0500
Is there any rule of thumb, or are there any guidelines, for OCR versus retyping of old documents? I am looking into converting the minute books of The Cabell Foundation from 1955 to 2002 into editable (searchable) text. OCR came to mind first, since I have had great success using it on contemporary minutes of Boards of Supervisors and School Boards.

But the early Cabell minutes were typed on a typewriter that produced very poor characters: many were not closed, and the downstrokes were faint or missing on characters such as "m" and "p". Using OmniPage to OCR a page was a complete disaster. I had to intervene in about 40 cases, and it still missed roughly 70 to 100 words on the page entirely because it did not recognize the characters. And, of course, the higher the intervention and error rates, the more time it takes to proofread the final copy to make sure nothing was missed.

At the moment, it looks to me as though, if a page requires more than a dozen or so 'interventions' during the OCR process, one is better off simply retyping the whole page from the start.

Any experience out there to share?

Randy Cabell

To subscribe, change options, or unsubscribe, please see the instructions
at http://listlva.lib.va.us/archives/va-hist.html
