Book People Archive

the good, the bad, and the ugly



it is perhaps fitting that nicholas would send a message about his
400th book on the same day i was ending my umichigan series.

the workflow nicholas has developed can be described as _good_,
while google's scans are _bad_, and the umichigan text is _ugly_...

i encourage you to download one of the .pdfs that nicholas has
created, each one assembled from the scans he made of a book.

what you will see is a scan-set that has been done _correctly_...
the scans were straightened, and then cropped to the page area,
so they are of a uniform size without any wasteful margin areas.
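for anyone curious what "straightened, and then cropped" can look like
in code, here is a minimal sketch. it is _not_ nicholas's tool; it uses
a common projection-profile method, and it assumes the page has already
been thresholded into a boolean "ink" array (that step isn't shown).
the function names are hypothetical. for small angles it approximates
rotation with a vertical shear, which is good enough to show the idea.

```python
import numpy as np


def estimate_skew(ink, max_deg=3.0, step_deg=0.1):
    """Return the skew angle (degrees) that makes the text rows sharpest.

    ink: 2-D boolean array, True where a pixel is dark.
    Positive angles mean text lines slope downward to the right.
    Assumption: small angles, so rotation is approximated by a shear
    that moves column x up by round(x * tan(angle)) rows.
    """
    ys, xs = np.nonzero(ink)
    if ys.size == 0:
        return 0.0
    best_angle, best_score = 0.0, -1.0
    for deg in np.arange(-max_deg, max_deg + step_deg / 2, step_deg):
        rows = ys - np.round(xs * np.tan(np.radians(deg))).astype(int)
        # Horizontal projection profile: ink count per (sheared) row.
        profile = np.bincount(rows - rows.min()).astype(float)
        # Sum of squares is largest when ink piles up in few rows,
        # i.e. when the text lines have been made horizontal.
        score = float(np.sum(profile ** 2))
        if score > best_score:
            best_angle, best_score = float(deg), score
    return best_angle


def deskew_shear(ink, deg):
    """Undo a small skew by shifting each column; off-page pixels are dropped."""
    h, _ = ink.shape
    out = np.zeros_like(ink)
    ys, xs = np.nonzero(ink)
    rows = ys - np.round(xs * np.tan(np.radians(deg))).astype(int)
    keep = (rows >= 0) & (rows < h)
    out[rows[keep], xs[keep]] = True
    return out


def crop_to_ink(ink, margin=10):
    """Crop to the bounding box of the ink plus a small margin."""
    ys, xs = np.nonzero(ink)
    if ys.size == 0:
        return ink
    top = max(ys.min() - margin, 0)
    bottom = min(ys.max() + margin + 1, ink.shape[0])
    left = max(xs.min() - margin, 0)
    right = min(xs.max() + margin + 1, ink.shape[1])
    return ink[top:bottom, left:right]
```

usage would be something like `page = crop_to_ink(deskew_shear(ink, estimate_skew(ink)))`.
note that cropping to the ink gives each page its own size; to get the
_uniform_ size described above, you'd also pad every page of the set
out to one common size, which this sketch leaves out.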

then compare this to a scan-set .pdf made by google or the o.c.a.,
where -- as you thumb through them -- the scans jump around
on the page area and/or tilt back and forth like a drunken sailor.

this kind of attention to detail doesn't just make the scans pleasant to read, either.
it also pays a _huge_ dividend when it comes to o.c.r. accuracy.
o.c.r. apps work _much_ better when the scans are deskewed,
because the letters are then much closer to their "ideal" form...

i always laugh at the people who complain about o.c.r. quality
when i look at the scans that they are using.   well, of _course_
you're gonna get horrendous o.c.r. quality out of a shitty scan.
what did you expect?

the truth of the matter is that even a _slightly_ skewed image
can decrease your o.c.r. quality much more than you'd expect.
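a little arithmetic shows why "slightly" can still hurt. the page
geometry here is an assumption picked for illustration: 5-inch text
lines, 13-point leading, scanned at 300 dpi. under those numbers, a
skew of just 1 degree makes a line drift about 26 pixels from one end
to the other, roughly half of the ~54-pixel spacing between lines.

```python
import math


def skew_drift_px(line_width_in: float, dpi: int, skew_deg: float) -> float:
    """Vertical drift in pixels between the two ends of one text line."""
    return line_width_in * dpi * math.tan(math.radians(skew_deg))


def line_pitch_px(leading_pt: float, dpi: int) -> float:
    """Baseline-to-baseline distance in pixels (1 point = 1/72 inch)."""
    return leading_pt / 72.0 * dpi


if __name__ == "__main__":
    # Assumed geometry: 5-inch text lines, 13-pt leading, 300 dpi scan.
    pitch = line_pitch_px(13, 300)
    for deg in (0.25, 0.5, 1.0, 2.0):
        drift = skew_drift_px(5.0, 300, deg)
        print(f"{deg:4.2f} deg: drift {drift:5.1f} px = {drift / pitch:.0%} of a line pitch")
```

at 2 degrees the drift is close to a full line pitch, so the end of one
line sits about where the next line should be, which makes it harder
for the o.c.r. app to even figure out which letters belong together.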

so nicholas has a head start on accuracy right from the outset.

but he doesn't stop there.   no sir.   nicholas has programmed
a set of routines that enable him to pinpoint any problems...

further, he takes the additional step of creating an audio file,
using text-to-speech, and then _listens_ to the book as well.
this helps him identify problems with the text that might be
much more difficult to isolate otherwise, like stealth scannos.
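a "stealth scanno" is an o.c.r. error that still spells a real word
(like "arid" for "and", or "modem" for "modern"), so a spellchecker
sails right past it. here's a minimal, hypothetical sketch of one way
a routine could flag candidates for a human to check. it is not one of
nicholas's routines, and the suspect list is a small illustrative sample,
not a vetted one; it will also flag legitimate uses, which is why a
person (or a listening ear) still makes the final call.

```python
import re

# Hypothetical, illustrative list: misreads that still spell real words,
# mapped to the word that was most likely intended.
STEALTH_SUSPECTS = {
    "arid": "and",
    "modem": "modern",
    "tum": "turn",
    "bad": "had",
}


def flag_stealth_scannos(text, suspects=STEALTH_SUSPECTS, context=30):
    """Return (offset, word, likely_word, snippet) for each suspect word."""
    hits = []
    for m in re.finditer(r"[A-Za-z]+", text):
        word = m.group(0).lower()
        if word in suspects:
            start = max(m.start() - context, 0)
            snippet = text[start:m.end() + context].replace("\n", " ")
            hits.append((m.start(), m.group(0), suspects[word], snippet))
    return hits
```

usage: `for pos, word, likely, snippet in flag_stealth_scannos(page_text): print(pos, word, likely, snippet)`.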

as a result, the books that nicholas has digitized are simply
phenomenal in their accuracy.

nicholas is also tenacious in his pursuit of this great accuracy.
when he comes across an idea that allows him to write a new
routine to increase his quality, he will go and run that routine
against his completed books, just to see what it dredges up...
as a result, his books are on a relentless march to perfection.

nicholas also applies a good deal of elbow grease to the task.
he works hard -- much too hard for someone of his age! --
and applies his immense intelligence to getting great results.

and that shows in the fact that he has digitized _400_ books!
and he's done them from scanning start to posted perfection!

but in addition to his great work ethic, and his high standards,
nicholas is aided to a large degree by the quality of the _tools_
that he has created to help him do the job.   tools are important!

the bottom line is that i am pleased as punch that brewster kahle
has recognized the tremendous job being done by nicholas, and
asked nicholas to help formulate policies for the internet archive.
that move, all by itself, gives me a ray of hope for the o.c.a. effort.

thank you nicholas, for everything you do.   you are an inspiration.
and especially considering the time of year, you're truly a saint...
:+)

-bowerbird