Re: Where to put scraped Google Book Search OCRs
- From: Mike Barlow <mike@[redacted]>
- Subject: Re: Where to put scraped Google Book Search OCRs
- Date: Fri, 10 Feb 2006 17:23:58 GMT
I'm far too intellectually challenged to work out how to download the
images en masse. Is there such a thing as a CGI script, executable,
etc. that will run on a Windows machine to do the trick?
I tried downloading single images, but 40 pages into a 740-page
tome I gave it up as a complete waste of life ;o)
Best,
Mike
[Moderator: To minimize the time going back over old ground, I'll note
that Google prohibits automated downloading of their books pages
(see http://books.google.com/robots.txt ) and has used countermeasures
against automated downloaders, including CAPTCHAs and outright blocking
(see, e.g., http://www.zuhause.org/dp/gprint.html ).
However, there are already some downloaded images that one can
transcribe from (such as a few in zip files at
http://steinbeck.ucs.indiana.edu/novels/author.html ).
And for downloading other book images, the long discussion thread
we had on Google Book Search in late 2005 included some tips on
efficient manual downloading, which does not violate the robots.txt
rules. There's also been some urging that downloads be coordinated so
that books don't get scraped more than once by the community of folks
working on transcriptions. For an example post covering both points, see
http://onlinebooks.library.upenn.edu/webbin/bparchive/?year=2005&post=2005-12-15,4
- JMO]