Book People Archive

Re: Where to put scraped Google Book Search OCRs

From: Mike Barlow <mike@[redacted]>
Subject: Re: Where to put scraped Google Book Search OCRs
Date: Fri, 10 Feb 2006 17:23:58 GMT

I'm far too intellectually challenged to work out how to download the 
images en masse. Is there such a thing as a CGI script, executable, 
etc. that will run on a Windows machine to do the trick?

I tried downloading single images, but 40 pages into a 
740-page tome I gave it up as a complete waste of life ;o)

Best,

Mike

[Moderator: To minimize the time going back over old ground, I'll note
 that Google prohibits automated downloading of their books pages
 (see http://books.google.com/robots.txt ) and has used countermeasures
 against automated downloaders, including CAPTCHAs and outright blocking
 (see, e.g., http://www.zuhause.org/dp/gprint.html ).

 However, there are already some downloaded images that one can 
 transcribe from (such as a few in zip files at
 http://steinbeck.ucs.indiana.edu/novels/author.html ).

 And for downloading other book images, the long discussion thread
 we had on Google Book Search in late 2005 included some tips on
 efficient manual downloading, which does not violate the robots.txt
 rules.  There's also been some urging that downloads be coordinated so
 that books don't get scraped more than once by the community of folks
 working on transcriptions.  For an example post covering both points, see
 http://onlinebooks.library.upenn.edu/webbin/bparchive/?year=2005&post=2005-12-15,4

    - JMO]