Re: More About Google's Deal with U of Cal
- From: John Mark Ockerbloom <ockerblo@[redacted]>
- Subject: Re: More About Google's Deal with U of Cal
- Date: Wed, 30 Aug 2006 11:45:28 -0400 (EDT)
On Wed, 30 Aug 2006 goopfic@[redacted] wrote:
> It would seem that, as of this morning, Google has added a "Download PDF"
> link to their public domain texts. What you get seems to be a much
> higher-resolution set of scans than the "zoom=3" version available through
> the browser interface, with a page of usage guidelines prepended to the
> file. The PDFs are not encrypted, and they make no effort to prevent
> printing or content extraction.
Yes, these could be quite useful: you can download a book in one go
to read offline (or to use for proofreading) instead of having to click
through the book one page at a time.
The PDF files appear to be page images only; no searchable text included.
There may still be a few bugs in the system. In particular, their PDF
generator may need some adjustment to conform to the PDF standard: my
experiments so far indicate that at least some of the files don't work
in several common PDF viewers. To see an example, look at
the download for _The Slipper Point Mystery_, which I recently listed at
http://books.google.com/books?id=vohkC25OUsQC&jtp=1
The "Download" button appears in the right column.
Clicking on it downloads a PDF that I can view in a recent Adobe
browser plugin on Windows, but not in the other PDF viewers I've tried,
all of which should be able to cope with PDF 1.4 files (the version of
PDF the file claims to be). In particular:
-- Older Adobe browser plugin, Apple OS 9:
Only alternate pages visible; others blank
-- Preview, Apple OS X:
Only initial Google cover page visible; others blank
-- Xpdf, Solaris Unix:
Viewer crashes after initial Google cover page
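For anyone reproducing this, one quick check is to confirm which PDF
version a downloaded file declares in its header. Here is a minimal
Python sketch; the function name and the sample filename are my own
placeholders, and it only reads the header, so it says nothing about
whether the rest of the file actually conforms to that version:

```python
import re

def declared_pdf_version(data: bytes):
    """Return the version from a '%PDF-x.y' header, or None if absent."""
    # The header normally sits at the very start of the file. Some
    # viewers accept it a little further in, so search the first 1024
    # bytes rather than insisting on offset 0.
    match = re.search(rb"%PDF-(\d+\.\d+)", data[:1024])
    return match.group(1).decode("ascii") if match else None

# Usage (hypothetical filename):
# with open("slipper_point_mystery.pdf", "rb") as f:
#     print(declared_pdf_version(f.read(1024)))
```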
One thing that I notice about this book is that, for whatever reason,
the left-hand and right-hand pages alternate between color and black
and white.
(They seem to have been scanned either on different passes or at different
effective lighting levels, which may have caused different treatment in later
processing.) If the Google PDF generator has encoded the color pages
in a way that ordinary PDF viewers don't expect, that could explain
the odd behavior of the various viewers above. (I haven't eliminated
the possibility that they're generating correct but hard-to-process
PDF, but the fact that three different viewers have problems with it
makes it likely that they're generating nonstandard PDF that the recent
Adobe plugin happens to be able to cope with nonetheless, rather than
standard PDF that the other viewers simply can't handle.)
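One rough way to look into the encoding hypothesis is to tally which
stream filters a file names, since color and black-and-white page
images are often stored with different filters. The sketch below is
only a crude byte scan under my own assumptions, not a PDF parser: the
function name is hypothetical, it can be fooled by matching bytes
inside compressed stream data, and it does not tell you which page
uses which filter. Note that JPXDecode (JPEG 2000) was introduced in
PDF 1.5, so finding it in a file that declares itself 1.4 would be one
thing worth a closer look.

```python
import re
from collections import Counter

# Standard stream filter names defined by the PDF specification.
FILTER_NAMES = (
    b"DCTDecode", b"JPXDecode", b"CCITTFaxDecode", b"JBIG2Decode",
    b"FlateDecode", b"LZWDecode", b"RunLengthDecode",
    b"ASCII85Decode", b"ASCIIHexDecode",
)

def count_filters(data: bytes) -> Counter:
    """Count occurrences of each '/<FilterName>' token in raw PDF bytes."""
    pattern = re.compile(rb"/(" + b"|".join(FILTER_NAMES) + rb")\b")
    return Counter(m.group(1).decode("ascii") for m in pattern.finditer(data))

# Usage (hypothetical filename):
# with open("slipper_point_mystery.pdf", "rb") as f:
#     for name, n in count_filters(f.read()).most_common():
#         print(name, n)
```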
Still, it's a promising start, and if they can fine-tune their PDF
creator to make more portable PDFs (or if others can post-process them
to make them more portable) these could enable lots of people to carry
around and share their public domain books. Many thanks to the Google team!
John