Book People Archive

Re: RE: OmniPage 11 and Epson scanners



OP11 (OmniPage 11) is similar to TextBridge Pro 98 (TBP98) and
ABBYY FineReader (AFR) in allowing two-page spreads to be
scanned in landscape mode.
In Tools/Options/Scanner/Scanner there are options for
Page length and orientation.  If you choose "Landscape"
and have the book oriented properly (I think it is the
opposite of OP10, but the same as TBP98) then the resulting
scan will be oriented correctly for zoning.  If you create
two zones for the side-by-side pages, it works just as in TBP98.
Autozoning and autorecognition ought to work okay, though I admit
that in OP11 the gutter came out too black.  You might
try adjusting the contrast and brightness.
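Splitting a spread into two pages, which is what the two-zone
approach does inside the OCR program, is a fairly simple operation.
Here is a minimal sketch in Python (standard library only) of one
way a stand-alone splitter might do it.  It assumes the scan is
already loaded as rows of 8-bit grayscale values and is upright,
and that the gutter is the darkest vertical strip near the middle
of the spread (as in the too-black gutter above).  The function
names are hypothetical; this is not how OP11, TBP98 or AFR work
internally.

```python
from typing import List, Tuple

# Hypothetical representation: rows of 8-bit grayscale pixels (0 = black, 255 = white).
Image = List[List[int]]


def rotate_clockwise(image: Image) -> Image:
    """Rotate a scan 90 degrees clockwise, e.g. a spread that came out sideways."""
    return [list(row) for row in zip(*image[::-1])]


def find_gutter(image: Image, band: float = 0.2) -> int:
    """Return the darkest column inside a central band of the image.

    `band` is the fraction of the width, centred on the middle,
    that is searched for the gutter (an assumed default).
    """
    width = len(image[0])
    centre = width // 2
    half = max(1, int(width * band / 2))
    lo, hi = max(0, centre - half), min(width, centre + half + 1)

    def column_mean(x: int) -> float:
        # Average brightness of column x over all rows.
        return sum(row[x] for row in image) / len(image)

    return min(range(lo, hi), key=column_mean)


def split_spread(image: Image, band: float = 0.2) -> Tuple[Image, Image]:
    """Cut a two-page spread at the gutter, dropping the gutter column itself."""
    gutter = find_gutter(image, band)
    left = [row[:gutter] for row in image]
    right = [row[gutter + 1:] for row in image]
    return left, right
```

Each half could then be saved and fed to the OCR engine as a
separate page.  A real gutter is usually a wide, shaded band rather
than a single dark column, so adjusting contrast and brightness
still helps.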

I believe ABBYY FineReader is much easier and better at
doing this.  OP11 did not allow me to scan anything larger
than "letter" size, while with ABBYY I was able to get a
full scan using the "default" page length setting.
(I use Epson's TWAIN5 driver interface.)

I did an extensive test comparing the three programs
(while watching the Red Sox lose again--it was no fun).

ABBYY FineReader is the champ: it was the fastest, most
accurate, and easiest to use of the three.  I
recommend setting the program to use its own interface
rather than the scanner's.  You can do a "scan and read
multiple pages" and it will recognize each page just as
fast as you can turn the pages on your scanner bed.
Press "stop" when you finish the book.  Then select the
first page and proofread.  You can move from page to page
by pressing ALT-down arrow and edit the text directly.
Or you can jump from one suspect word to the next, just as in
TBP98 (though the image of the suspect word appears at the
bottom rather than directly over the suggested correction).

Here is where a big monitor can really be useful.  I
have a 19-inch monitor and for this use I set it at
1600 by 1200 (usually text is too small to read at that
resolution, but not with ABBYY, and then you can see the
whole page).

OP11 can do a timed scan, say every 10 seconds, so it is
similar to FineReader, but I found that it was significantly
slower at recognition: it often took a few more
seconds to catch up before scanning the next page.

Scanning and proofing a 130-page book took two hours with TBP98,
1:45 with OP11, and 1:15 with AFR.  AFR's output had virtually
no errors, so it took very little time to proofread (though I
later discovered two minor errors caused by speckles amid the
text that were not flagged during the online proofing stage).
OP11 was more accurate and took less time to proofread than
TBP98.

If you use PDF, then OP11 might be a good choice.  (AFR can
save to PDF as well.)  I understand you can even open a
PDF file in OP11 and save it in another format, but I
haven't tried that yet.  The PDFs I made from OP11 and
AFR were very good and certainly take a lot less time
than careful HTML.  But the file sizes are quite large.
(I seem to remember that OP11's PDFs were smaller than
AFR's.)

The HTML filters for each are much improved over TBP98's,
but still fill the code with junk that I have to get rid of.
So I usually save to RTF and then use MS Word 97 to save
that to HTML, then strip the junk out of that with the
HTML Tidy program.  The pictures and much of the
fine HTML coding details I have to do by hand.  That takes
a lot longer than 2 hours, but I get a chance to read the
text and proofread it at the same time.  You also have to
be careful of hyphenation and em dashes.
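For the hyphenation and em dash problems, a hypothetical first pass
in Python (standard library only) might rejoin words split across a
line break and turn typewriter-style double hyphens into the HTML
em dash entity.  The function names are made up for illustration,
and this is an assumption about one way to do it, not part of the
workflow described above.

```python
import re


def rejoin_hyphenated(text: str) -> str:
    """Join a word split by a hyphen at the end of a line: 'exam-\\nple' -> 'example'.

    Caution: a real compound (e.g. 'well-known') that happens to break
    at its own hyphen is joined too, so the result still needs proofreading.
    """
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)


def em_dashes_to_html(text: str) -> str:
    """Replace double hyphens with the HTML em dash entity."""
    return text.replace("--", "&mdash;")


if __name__ == "__main__":
    sample = "the exam-\nple shows--as usual--the problem"
    print(em_dashes_to_html(rejoin_hyphenated(sample)))
```

Running the hyphen pass first matters: a double hyphen at the end of
a line is left alone by the regular expression, because the character
before the final hyphen is not a word character.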

I also have a big disk now so I save a TIFF file from AFR
(select all the pages and do File/Save image).  That
way I can re-read pages as needed later.  OP11 has a
feature to save the whole project in an .opd file--that
is a good idea.  In TBP98 you had to handle these things
separately, choosing at the beginning whether to save to
PDF or to RTF, and whether to scan images only or to recognize
each page without saving an image first.  OP11 is better.
But AFR is still best.

If anyone is interested I will post the TIFF and work files
for each program for this book.  Please read the result at
http://www.eldritchpress.org/clt/ig.html

(I have donated it to the Celebration of Women Writers,
so it may not always be accessible at the above location.)

I appreciate having members of this list point out to me the
excellent program ABBYY FineReader.  OmniPage 11 looks almost
identical in its user interface.  Perhaps if they listen to
us then ScanSoft will make it work as well.  Competition is
good for them and cooperation is good for us!

On Thu, Jun 28, 2001 at 02:41:22PM -0700, Shawn Redford wrote:
> There seem to be very few places on the Net where you can collaborate with
> others who deal specifically with scanning and OCR of BOOKS.
>
> Prior posts on Finereader are very interesting. I have Textbridge 98,
> Millennium and the new Omnipage 11. Omnipage looks really good overall, but
> my greatest frustration is that it lacks the Book/Dual-Page input feature
> which would break apart scans of opposing book pages and then rotate each
> page for optimal OCR orientation. I realize that the Textbridge to Omnipage
> upgrade is the reason for this, but it makes this seem like a downgrade as
> much as an upgrade.
>
> If anyone knows of a scanning utility that would split apart opposing book
> pages which could then be fed into an OCR engine, I would love to know about
> it.

--
nom:"Eric"  Eric Eldred  Eldritch Press
mailto:Eldred@[redacted]
vCard3.0:http://www.eldritchpress.org/EricEldred.vcf