Date: Wed, 20 Mar 2002 13:54:12 -0500
Is there any rule of thumb, or are there any guidelines, for OCR versus retyping of old documents? I am looking into converting the minutes books of The Cabell Foundation from 1955 - 2002 to editable (searchable) text. OCR came to mind first, since I have been very successful with it on contemporary minutes of Boards of Supervisors and School Boards.
But the early Cabell minutes were typed on a typewriter that formed very poor characters: many are not closed, and downstrokes are faint or missing on characters like "m" and "p". Using OmniPage to OCR a page was a complete disaster. I had to intervene in about 40 cases, and it still missed 70-100 or so words on the page completely because it did not recognize the characters. And of course, the higher the intervention and error rates, the more time is required to proof the final copy to make sure nothing was missed.
At the moment, it looks to me as though, if a page requires more than a dozen or so 'interventions' during the OCR process, one is better off just retyping everything from the start.
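One way to put rough numbers on that threshold is a per-page time estimate. The sketch below is only illustrative: the function names and every timing parameter (seconds per intervention, seconds per missed word, proofing time per word, typing speed) are assumptions to be replaced with one's own measurements, not figures from my tests.

```python
def ocr_seconds(words, interventions, missed_words,
                sec_per_intervention, sec_per_missed_word,
                proof_sec_per_word):
    """Estimated time to OCR one page and clean it up.

    All timing parameters are hypothetical; measure them on a sample page.
    """
    fix_time = (interventions * sec_per_intervention
                + missed_words * sec_per_missed_word)
    proof_time = words * proof_sec_per_word
    return fix_time + proof_time


def retype_seconds(words, typing_wpm, proof_sec_per_word):
    """Estimated time to retype one page from scratch and proof it."""
    typing_time = words / typing_wpm * 60.0
    proof_time = words * proof_sec_per_word
    return typing_time + proof_time


def retyping_is_faster(words, interventions, missed_words,
                       sec_per_intervention, sec_per_missed_word,
                       proof_sec_per_word, typing_wpm):
    """True if retyping the page is estimated to take less time than OCR."""
    ocr = ocr_seconds(words, interventions, missed_words,
                      sec_per_intervention, sec_per_missed_word,
                      proof_sec_per_word)
    retype = retype_seconds(words, typing_wpm, proof_sec_per_word)
    return retype < ocr
```

Feeding in the counts from a sample page (for example 40 interventions and 85 missed words) together with one's own timings shows on which side of the break-even point that page falls. The sketch uses the same proofing rate for both routes; a poorly recognized page will likely need slower proofing, which would tilt the comparison further toward retyping.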
Any experience out there to share?
Randy Cabell
To subscribe, change options, or unsubscribe, please see the instructions
at http://listlva.lib.va.us/archives/va-hist.html