Yes, I failed at a similar task, but for different reasons. All my documents were in outline form, with several successive levels of indentation per page. All the indentation was lost or dislocated, but dropping the scanned, OCR'd document into Word for spellchecking often gave a good guess for "misspelled" words. Had the document's format been preserved, I think scanning, then OCR, then a spellchecker would have been successful enough. Regardless, it is a lot of work.

I have bought CDs of typed information (genealogical information) where each page is an image of the typed paper and not the OCR result of scanning. The provider created an index that locates the image page where the information resides. That may be less work than a complete retyping, but the result is only searchable to the extent that the indexing is good. I could have done this as well, and made an image of the author's index, but 600 pages was a bit much.

By the way, getting images via a very good digital camera is, of course, much faster than scanning. My camera does an excellent job of making images of pages, but it was a costly machine.

I do not know of a really good solution for that chore.

----- Original Message -----
From: "Randy Cabell" <[log in to unmask]>
To: <[log in to unmask]>
Sent: Wednesday, March 20, 2002 1:54 PM
Subject: Capturing old Text via OCR

Is there any rule of thumb, or are there any guidelines, for OCR vs. retyping of old documents? I am looking into converting minutes books of The Cabell Foundation from 1955 - 2002 to editable (searchable) text. OCR came to mind first, since I have been very successful doing contemporary minutes of Boards of Supervisors and School Boards. But the early Cabell minutes were typed with a typewriter which formed very poor characters: many not closed, downstrokes faint or missing on characters like "m" and "p", etc.

Using OmniPage to OCR a page was a complete disaster.
I had to intervene in about 40 cases, but it missed 70-100 or so words on the page completely because it did not recognize characters. And of course, the higher the intervention and error rates, the more time is required to proof the final copy to make sure it did not miss anything.

At the moment, it looks to me as though a page that requires more than a dozen or so 'interventions' during the OCR process is better off simply being retyped from the start.

Any experience out there to share?

Randy Cabell

To subscribe, change options, or unsubscribe, please see the instructions at http://listlva.lib.va.us/archives/va-hist.html