Book People Archive

RE: FineReader OCR



Hello Dean and Book People,
I would judge that OmniPage 10 is EQUAL to FineReader 5.0
in its uncorrected OCR ability. They both made similar errors.
http://www.scansoft.com/products/omnipage/pro/      $99.95 "upgrade"
http://www.finereader.com/products/fine/index.htm     $99.95
(might be better for non English languages and may have forms ability)
http://docmorph.nlm.nih.gov/docmorph/       free single page OCR.
I do not have any foreign-language pages to test.
If you are not upgrading from anything, FineReader has the better price,
but the upgrade price for OmniPage 10 is the same as the FineReader 5 full price.
You can upgrade from anything, and download or have a box delivered;
there seems to be no upgrade verification,
so I do not know why they bother to have a "full" OmniPage 10 price.

Linked below is an OmniPage 10 OCR example with no corrections.

First, here is the page saved by OmniPage 10 as HTML.
I've made NO corrections or changes, but
I did add the green highlighting as you would see it in OmniPage 10,
which is intended for operator corrections.
I also added red highlighting for the mistakes that OmniPage 10 did not question.
The errors are very similar to those made by FineReader 5.0
(e.g. quote marks, Bs and Hs, "lie" for "he", etc.)
http://www.usigs.org/library/books/ma/Scituate1831Dean/sctd0004.htm   * * * * * * * *
Also, here is how OmniPage 10 saves the OCR file as text without line breaks
(but with paragraph breaks):
http://www.usigs.org/library/books/ma/Scituate1831Dean/sctd0004.txt
and as text with line breaks:
http://www.usigs.org/library/books/ma/Scituate1831Dean/sctd0004c.txt
OmniPage 10 can save in MANY formats.
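
For anyone who only has the version with line breaks, the lines can be rejoined into paragraphs afterwards. Below is a rough Python sketch of that idea. It is only an illustration, not how OmniPage produces its output: the function name unwrap_paragraphs is made up, and it assumes paragraphs are separated by blank lines and that a hyphen at the end of a line always marks a split word (so a genuine hyphenated compound broken across lines would be joined wrongly).

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines into one line per paragraph.

    Assumptions (not taken from any OCR product):
    - paragraphs are separated by one or more blank lines
    - a hyphen directly before a line break marks a word split across lines
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    joined = []
    for para in paragraphs:
        # Rejoin words split by a hyphen at the end of a line: "how-\never" -> "however"
        para = re.sub(r"(\w)-\n(\w)", r"\1\2", para)
        # Turn every remaining line break into a single space
        para = re.sub(r"\s*\n\s*", " ", para)
        joined.append(para)
    return "\n\n".join(joined)


sample = "with one exception, how-\never, which we will\nhere notice.\n\nThe N. W. line"
print(unwrap_paragraphs(sample))
```

Going the other way (text with line breaks from text without) loses nothing, so keeping the unwrapped file as the master copy is probably the safer choice.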

Notice that we are OCRing from this 200dpi G4 TIFF image,
a mediocre xerographic copy of a book page
from the History of Scituate, Massachusetts, published 1831:
http://www.usigs.org/library/books/ma/Scituate1831Dean/SCTD0004.TIF
OCRing from an original unbound book at 300dpi would be better,
and that is what I did with the History of Marblehead and of Portland books.

See these screen captures (IrfanView 3.33, about 250KB each)
of the OmniPage 10 operator edit process:
http://www.usigs.org/library/books/ma/Scituate1831Dean/OmniPage1000.jpg
http://www.usigs.org/library/books/ma/Scituate1831Dean/OmniPage1001.jpg
http://www.usigs.org/library/books/ma/Scituate1831Dean/OmniPage1002.jpg
http://www.usigs.org/library/books/ma/Scituate1831Dean/OmniPage1003.jpg
http://www.usigs.org/library/books/ma/Scituate1831Dean/OmniPage1004.jpg
...
http://www.usigs.org/library/books/ma/Scituate1831Dean/OmniPage1009.jpg

I agree that DocMorph did poorly with the footnotes in the smaller font,
http://docmorph.nlm.nih.gov/docmorph/
but DocMorph's normal-font OCR seemed equal to both FineReader and OmniPage.

Does FineReader 5.0 highlight questionable OCR text and offer the operator
the opportunity to make corrections like OmniPage 10 does?
OmniPage 10 actually goes through all its questionable OCR words and asks
the operator to make corrections and offers alternate OCR words.
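
A review pass of this general kind can be imitated very roughly after the fact on plain OCR text. The Python sketch below is only an illustration and is not OmniPage's actual method, whose internals are not described here: it assumes a word list of known-good words (the tiny lexicon shown is hypothetical), flags every word not in that list, and uses the standard library's difflib.get_close_matches to offer alternates. The function name flag_questionable is made up.

```python
import difflib
import string


def flag_questionable(words, lexicon, n=3, cutoff=0.6):
    """Return (word, suggestions) for every word not found in the lexicon.

    'lexicon' is any iterable of known-good words; this is an assumption
    of the sketch, a real tool would use a full dictionary plus the
    recognizer's own confidence scores.
    """
    known = sorted({w.lower() for w in lexicon})
    known_set = set(known)
    flagged = []
    for word in words:
        key = word.strip(string.punctuation).lower()
        if not key or key in known_set:
            continue  # nothing to check, or the word is known
        suggestions = difflib.get_close_matches(key, known, n=n, cutoff=cutoff)
        flagged.append((word, suggestions))
    return flagged


# Hypothetical mini-lexicon; the sample words come from the FineReader footnote text
lexicon = ["of", "Benjamin", "Bass", "Braintree", "Hanover"]
for word, alternates in flag_questionable("llenjamin llass of Hraintree".split(), lexicon):
    print(word, "->", alternates)
```

A pass like this only catches non-words. A mistake such as "lie" for "He" is a real word and would slip through, which matches the unquestioned errors marked in red on the HTML page.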

I would add that I have used OmniPage 8 and it WAS much worse than OmniPage 10.
OmniPage 8 had an error rate twice that of OmniPage 10 or FineReader 5.0,
and many of the older OCR efforts in the library below show this.
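
An error rate like this can be put on a number by comparing the raw OCR text against a hand-corrected copy of the same page. Below is a minimal Python sketch, assuming such a corrected reference is available (none is included in this post). The function names are hypothetical, and the measure used is plain character-level edit distance divided by the reference length, which is one common choice and not necessarily how the figures above were judged.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character inserts,
    deletes, or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


def char_error_rate(reference: str, ocr_text: str) -> float:
    """Edit distance from the corrected reference to the OCR text,
    divided by the length of the reference."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, ocr_text) / len(reference)


# "He" misread as "lie" costs two edits over a 16-character reference
print(char_error_rate("He was descended", "lie was descended"))  # 0.125
```

Running the same reference against each program's output for a page gives directly comparable numbers, as long as differences in line breaks and spacing are normalized first.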

Best Regards
David Blackwell
Groveland, Mass.
NE LHG Free Books Online EFFORT coordinator, and USIGS librarian.
http://www.usigs.org/library/books/books.html
http://www.usigs.org/library/books/buk.shtml
http://www.usigs.org/library/books/uk/ Derbyshire(DBY) and few UK books

http://www.usigs.org/library/books/ USIGS Library
http://www.usigs.org/library/books/ma/ Massachusetts Collection about 1.5 Gigabytes
http://www.usigs.org/library/books/me/ Maine just 2 books now, 3 more waiting to be scanned
http://www.usigs.org/library/books/ri/ Rhode Island just a few
http://www.usigs.org/library/books/families/ 600 MegaBytes maybe 30 books.
http://www.usigs.org/library/books/buk.shtml DBY and UK books
PS any link you see to genweb.net should be to the above usigs directories.
(genweb.net disappeared without explanation)


Below are Dean's DocMorph (free) and FineReader 5.0 OCR text files of the same page.

At 10:54 AM 3/27/2001 -0500, you wrote:
>Well, I tried DocMorph. Below are the results of OCR - first from
>DocMorph and second from FineReader. These are both of the file
>SCTD0004.TIF. Looks like a big win for FineReader to me. The footnote
>text is particularly striking.
>
>-- Dean
>
>------ DocMorph version ---------
>BOUNDARIES.
>comprehend the whole easterly line - with one exception, how-
>ever, which we will here notice. In 1636 we find the following
>entry in the Col. Rec. Mr Hatherly in behalf of the Church
>at Scituate, complained that the place was too straite for them,
>the landes adjacent being stoney, and not convenient to plant
>upon." The Court passed the following order 11 that they have
>liberty to seeke out a convenient place for their residing within
>the Colonic, or that some other lands be layed to them for
>more comfortable subsistence." This matter was in agitation
>nearly four years, for we find the settlers of Scituate were not
>satisfied until 1640, when a grant was made to them "of two
>miles in length and one mile in breadth on the easterly side of
>the N. River." We mention this here as an exception to the
>boundaries above; we shall notice the territory called ,The
>Two Miles" hereafter. The boundaries continued as above
>until A. D. 1727, when that part of the town on the southerly
>side of the third Herring brook, was incorporated by the name
>of Hanover.* In this form it continued until 1788, when the
>oTwo Miles" was ceded to Marshfield. The Town is now
>bounded N. W. by Hingham and Cohasset, N1. E. by Massa-
>chusetts Bay, S. E. by N. River which separates it from
>Marshfield and Pembroke, and S. W. by Hanover and Abing-
>ton.
>The N. W. line of Scituate, being also the Colony line, was
>long a subject of tedious controversy. It may be proper here
>to subjoin a brief history of the transactions relative to that line.
>As early as 1636, there was found to be a want of a definitive
>settlement of the line. Hingham which then included Cohasset,
>claimed a part of the marshes on the East side of 11 Conihassett
>Gulph." The plea of Scituate was that the gulph was a good
>natural boundary, and therefore the proper boundary between
>the two patents. Hingham on the other hand pleaded, that
>-
>'The first Minister of Ifanover was IvIr Benjamin Bass of l1raintree,
>11. C.
>fie has descendants in Hanover, The second
>171~-ordainold "Cc, 1728.
>Minister was Mr Samuel Baidwin, 11, C- 1752-ord. 1757-mar. Hannsh,
>daughter of Chief justice John Cushing, 1758, 'File wife ()f Mr Robert
>~od-
>men of Hanover is his daughter. I It was descended from Henry Baldwin,
>who cattle front Devonshire, Zog. and settled at Woburn, j65o. The son
>of
>Henry, was Henry, cold the son of the latter was David, the father of
>Rev.
>Samuel, of Hanover. (Farmer.) The third Minister was Mr John Mellen,
>If. C. 1740, Minister of Stcrling, 1744-holralled at 11430ver, 178z.
>[Its
>solls,wete Rev John, 1I C- 1770, andinhoster of Barnstable-Ileary, hsq.
>of Dwer, N, H - if, C. 1784, counsellor at law, and Ifou. Prentiss
>Mellen,
>H- C 1784, now Chief Justice of 'Maine. Rev. Sinnott Mellen died at
>Reading, 1807, aged 85. Ile was succ"ded by Rev. Calvin Chadwi~k, Dart.
>Col. 1766. to vown, succeeded Rev, Seth Chapin, i8if- 11. U. i8o& Rev.
>Lthan Smith is the preseot pastor.
>[snip]
>
>---------- FineReader Version ------------------
2 BOUNDARIES.
comprehend the whole easterly line - with one exception, however, which
we will here notice. In 1636 we find the following entry in the Col.
Rec. " Mr Hatherly in behalf of the Church at Scituate, complained that
the place was too straite for them, the landes adjacent being stoney,
and not convenient to plant upon." The Court passed the following order
"that they have liberty to seeke out a convenient place for their
residing within the Colonie, or that some other lands be layed to them
for more comfortable subsistence." This matter was in agitation nearly
four years, for we find the settlers of Scituate were not satisfied
until 1640, when a grant was made to them "of two miles in length and
one mile in breadth on the easterly side of the N. River." We mention
this here as an exception to the boundaries above; we shall notice the
territory called "The Two Miles" hereafter. The boundaries continued as
above until A. D. 1727, when that part of the town on the southerly side
of the third Herring brook, was incorporated by the name of Hanover.* In
this form it continued until 1788, when the 41 Two Miles" was ceded to
Marshfield. The Town is now bounded N. W, by Hingham and Cohasset, N. E.
by Massachusetts Bay, S. E, by N. River which separates it from
Marshfield and Pembroke, and S. W. by Hanover and Abing-ton.
The N. W. line of Scituate, being also the Colony line, was long a
subject of tedious controversy. It may be proper here to subjoin a brief
history of the transactions relative to that line. As early as 1636,
there was found to be a want of a definitive settlement of the line.
Hingham which then included Cohasset, claimed a part of the marshes on
the East side of " Conihassett Gulph," The plea of Scituate was that the
gulph was a good natural boundary, and therefore the proper boundary
between the two patents. Hingham on the other hand pleaded, that
* The first Minister of Hanover was Mr llenjamin llass of Hraintree, H.
C. 1715 - ordained Dec, 1728. He has descendants in Hanover. The second
Minister was Mr Samuel Baldwin, H, C. 1752 - ord. 1757 - mar. Hannah,
daughter of Chief Justice John Gushing, 1758, The wife of Mr Robert
Salmon of Hanover is his daughter, lie was descended from Henry Baldwin,
who came from Devonshire, ling, and settled at Woburn, 1650. The son of
Henry, was Henry, and the son of the latter was David, the father of
Rev. Samuel, of Hanover. (Farmer.) The third Minister was Mr John
Mellen, H. C. 1740, Minister of Sterling, 1744 - installed at Hanover,
ijHt. His sons were Rev. John, H. C. 1770, and minister of Uarnstable-
Henry, Ksq. of Dover, N, H,, It, C. 1784, counsellor at law, and Hon.
Premiss Mellen, II. C. 1784, now Chief Justice of Maine. Rev. Samuel
Mellen died at Reading, 1807, aged 85. lie was succeeded by Rev. Calvin
Chadwiuk, Uart. Col, 1786. To whom succeeded Rev, Seth Chapin, 1816-IS.
U. 1808. Uev. Ethan Smith is the present pastor.

[snip]