VA-HIST Archives

Discussion of research and writing about Virginia history

VA-HIST@LISTLVA.LIB.VA.US

From: Randy Cabell
Date: Fri, 5 Oct 2007 09:34:08 -0400

Over the last couple of months, many of you have responded to my questions about digitizing.  I have learned more about that than I ever knew existed, so as partial payback here are some rules of thumb that I have come up with from proposals and sizings.

COST PER PAGE - Includes scanning to TIF images, OCR of any text, and production of PDF files. 
   SCRAPBOOKS  10x12 inch pages.  $2.50 to $3.50 per page.  The more items per page and the more color, the more $$
   NEWSLETTERS 8.5x11 inch pages.  $1.10 to $1.70 per page.  The more drawings and non-text content, the more $$$

EQUIPMENT
   After your suggestions, and some research, I settled on a Canon 1210C copier which I got from CDW online for about $380.  It scans at up to 600 PEL, has a 35-sheet Automatic Document Feeder with speed up to 12 ppm, and comes with driver software and ADOBE 7.0, which is really required (or something like that) to glom image files together into a single document.  Installation is pretty easy, although it did take me about 3 hours to get comfortable and accustomed to its little eccentricities.  It can produce a number of outputs including TIF and JPG, but for me the PDF was what I needed.  It is really amazing.  I did a newsletter test, and it fed the pages at about 8 ppm, and that included OCR, conversion to PDF, and writing the PDF file.  Egad.  That's 480 pages an hour!!!!!  OCR is about 90-95% satisfactory -- it errs on the side of just ignoring things it cannot figure out.  The manual says I can throw the recognized text over into Microsoft WORD, but figuring that out is the task for this weekend.

From a do-it-yourself standpoint: assume that because of between-batch paper shuffling, the average speed of your project through the ADF drops to 4 pages per minute.  That's 240 per hour.  Assume the costs of an operator, overhead, and amortization of the scanner TOTAL $60 an hour; that means you are producing scanned PDF text/files for about 25 cents a page ($60 divided by 240 pages).  Not bad.
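
As a rough check on the do-it-yourself arithmetic, here is a minimal Python sketch.  The function names are made up for illustration and have nothing to do with the scanner's software; the only inputs are the figures stated above (8 ppm in the test, 4 ppm after paper shuffling, $60 an hour all-in).  With those inputs the division comes out to 25 cents a page.

```python
def pages_per_hour(pages_per_minute: float) -> float:
    """Convert a feeder speed in pages per minute to pages per hour."""
    return pages_per_minute * 60


def cost_per_page(hourly_cost: float, pages_per_minute: float) -> float:
    """All-in cost per scanned page, in dollars.

    hourly_cost is the assumed total of operator, overhead, and
    scanner amortization per hour.
    """
    if pages_per_minute <= 0:
        raise ValueError("pages_per_minute must be positive")
    return hourly_cost / pages_per_hour(pages_per_minute)


if __name__ == "__main__":
    # Figures from the post: 8 ppm in the newsletter test,
    # 4 ppm once between-batch paper shuffling is counted.
    for ppm in (8, 4):
        print(f"{ppm} ppm: {pages_per_hour(ppm):.0f} pages/hour, "
              f"${cost_per_page(60, ppm):.3f} per page at $60/hour")
```

Plug in your own hourly cost and realistic feed speed to compare against the per-page quotes listed under COST PER PAGE.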

I experimented by reprocessing the PDF files through my Omniscan 11 software, which gives you an opportunity to assist in recognition.  It was not good.  I did not tell it to ignore images, or otherwise tell it where to OCR and where not to.  It took about 3 minutes per page, and the image PDF file it produced did not really look like the original.  There is more work for me to do here, but for the moment, my conclusion is that if you can live with the 90-95% (it is probably closer to 95%) OCR recognition, then it is not going to be worth your time to try to tweak it up.

I hope this helps somebody out there.  It would have moved me ahead a couple of weeks if I had known it 2 months ago.  I welcome any off-line questions.

Randy Cabell
