Book People Archive

Re: Why proofed and formatted digital text?



Bowerbird wrote:

> in the near future, machines will do almost _all_ the work
> of morphing "raw o.c.r. text" to a state of near-perfection
> in regard to both proofing and formatting, so this whole
> line of questioning will fade to total meaninglessness...

I disagree, and I believe most who work on texts, including those who
build OCR clean-up tools, will disagree with Bowerbird. Of course, it
depends upon what Bowerbird means by his imprecise marketing term of
"near-perfection", and by "formatting". His "clean up tools" will
never reach the level of accuracy and proper text formatting which
many in DP now produce.

And all one needs is one example to show the difficulties of
all-machine processing: e.g., "The Adventures of Tom Sawyer."

Obviously tools help, and work by many (including the interesting
research by Bill Janssen, et al.) will improve the accuracy of post-OCR
machine processing. OCR itself will improve, but it has limits, too.

(I believe that, for machine processing, only sentient-level
intelligence is capable of perfectly identifying all the glyphs in
typography and properly formatting the text with full identification
of all the document structures. This is because fully understanding
document structures from all cultures and all languages requires
sentient understanding of the text, including an intelligent
understanding of the cultural and historical contexts underlying the
texts. In other words, a "Commander Data". When will we have
sentient-level artificial intelligence? Who knows, but AI research
has been going on for many decades.)


> (the future will also prove that heavy markup has high costs
> which do not return benefits sufficient for the investment...)

Or, the future will prove your prediction to be wrong. <smile/>

Anyway, define "heavy".

And define "markup".


> p.s.   no project i know of has said "the scans are enough".
> all of them are doing o.c.r.   otherwise, they couldn't _search_,
> and users would look at them like they were absolutely crazy.
> which is why this whole issue is a no-brainer from the start...

Oh? Well, you need to get out of your house and start talking with people.
One reason for asking the question as I have is because I have been
out and about talking with people who are at the decision-making level
on text projects. They want more than to be told by Bowerbird that
"this is a no-brainer."

Jon