Book People Archive

Re: another year here on bookpeople



>> We do OCR for 2 reasons: 1) it greatly improves access for
>> keyword and phrase searching, and 2) we can do it cheaply.

>It sounds like you don't particularly care about the formatting or
>readability of the OCR you output (or at least not enough to invest
>many resources in it at this point), as long as it recognizes the
>words well enough to provide decent full-text search at low cost
>of preparation. Is that correct?

That sums it up pretty well.

>You do display the two final text pages of
>_A Pair of Patient Lovers_ when Google does not, even though they appear
>to be the same scan (and the two pages you have even say "Digitized by
>Google" on the bottom!)  Did you have to do anything special to recover
>these two pages, or were they just as you expected?

We put online what they send us, so they have the page images somewhere. The
pages aren't showing up online in GBS, but I don't know why.

>Also, even if Google or Michigan doesn't care much about good quality
>transcriptions,

That's a little strong. We do care about the quality of the OCR.

>there are other people and groups that do, and that might
>like to work with content you now manage.  For instance, if they
>had access to your archive-quality images, they might take the
>time to do a better OCR and formatting process along the lines
>of what folks like Bowerbird and Nick Hodson are recommending, to
>produce better etext versions (which you could then reuse yourself
>if you want and have the time to ingest them). I realize you may
>have some contractual restrictions against exporting some
>Google-produced material, but what about MOA and the other
>non-Google collections you've done on your own?  There may
>be some folks on this list you could partner with if interested,
>if you aren't already doing so.

I'd be interested in discussing this more. There are potential impacts on
print-on-demand and other services that we'll need to consider here at U-M.
But how would you see this working? Thanks,

Perry Willett
Head, Digital Library Production Service
300 Hatcher North
University of Michigan
Ann Arbor MI 48109-1205
Ph: 734-764-8074
Fax: 734-647-6897
Email: pwillett@[redacted]