Book People Archive

Re: (BP) Working through the maze of PDA formats for etexts...



> Thanks for the info, Bill.  I tried it but wasn't impressed:  too buggy and 
> difficult to use for my taste.  I'll keep playing around with it though.

I assume you mean the parser/distiller.  Yeah, it's overly configurable.

Here's what I do if I want to create an ebook version of the Stanford
Encyclopedia of Philosophy's brief biography of Ralph Waldo Emerson at
http://plato.stanford.edu/entries/emerson/.

% plucker-build -p /tmp -f Emerson -N "Ralph Waldo Emerson" \
  -H http://plato.stanford.edu/entries/emerson/  -M 1 --bpp=2 \
  --zlib-compression --no-backup --category=Biography

Pluckerdir is '/tmp'...
---- 0 collected, 1 to do ----
Processing http://plato.stanford.edu/entries/emerson/...
  Retrieved ok.
  Parsed ok; added 1 document link and 1 image.
---- 1 collected, 2 to do ----
Processing mailto:rgoodman%40unm%2eedu...
  Retrieved ok.
  Parsed ok.
---- 2 collected, 1 to do ----
Processing http://plato.stanford.edu/entries/emerson/emerson.jpg...
  Retrieved ok.
  Parsed ok.
---- all 3 pages retrieved and parsed ----

Writing out collected data...
Writing document 'Ralph Waldo Emerson' to file /tmp/Emerson.plkr
Converting mailto:rgoodman%40unm%2eedu...
Converting http://plato.stanford.edu/entries/emerson/emerson.jpg?width=150&height=193&depth=2...
Converting http://plato.stanford.edu/entries/emerson/...
Wrote 1 <= plucker:/~special~/index
Wrote 2 <= plucker:/~special~/metadata
Wrote 3 <= plucker:/~special~/pluckerlinks
Wrote 4 <= plucker:/~special~/category
Wrote 5 <= http://plato.stanford.edu/entries/emerson/
Wrote 11 <= http://plato.stanford.edu/entries/emerson/emerson.jpg?width=150&height=193&depth=2
Wrote 12 <= mailto:rgoodman%40unm%2eedu
Wrote 13 <= plucker:/~parts~/http%3a%2f%2fplato.stanford.edu%2fentries%2femerson%2f/1
Wrote 58 <= plucker:/~special~/links1
Done!
%

The ebook winds up in /tmp/Emerson.plkr.  Because of "-M 1", it pulls
only that one page and the things it includes (like images).  Links
off the page are kept, but the pages they point to aren't included.
Links internal to the page (it starts with a TOC) are kept and are
active.  The images are two-bit grayscale (use --bpp=8 for paletted
8-bit color, and --bpp=16 for non-paletted 16-bit color).  You can
specify "-q" if you don't want to see all this output.  Because of
"--category=Biography", the ebook winds up in the "Biography" section
of Plucker's little library browser.
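
If you do this often, the command above can be wrapped in a small
shell function.  What follows is only a sketch: the function name
pluck_page and the PLUCKER_BUILD and PLUCKER_DIR environment variables
are made up for illustration and are not part of Plucker.  It passes
only the plucker-build flags already shown in this message.

```shell
#!/bin/sh
# Hypothetical helper, not part of Plucker: turn a single web page into
# a .plkr file using only the plucker-build flags shown in this message.
#
# Usage: pluck_page URL NAME TITLE CATEGORY [BPP]
#   BPP defaults to 2 (grayscale); 8 and 16 are the color depths
#   mentioned above.
pluck_page() {
    if [ "$#" -lt 4 ]; then
        echo "usage: pluck_page URL NAME TITLE CATEGORY [BPP]" >&2
        return 2
    fi
    url=$1
    name=$2
    title=$3
    category=$4
    bpp=${5:-2}

    # PLUCKER_BUILD and PLUCKER_DIR are assumed overrides, not Plucker
    # settings.  "-M 1" keeps the build to the one page plus its images,
    # and "-q" suppresses the progress output.
    "${PLUCKER_BUILD:-plucker-build}" \
        -p "${PLUCKER_DIR:-/tmp}" -f "$name" -N "$title" \
        -H "$url" -M 1 --bpp="$bpp" \
        --zlib-compression --no-backup --category="$category" -q
}
```

With that function loaded, the Emerson example becomes:

% pluck_page http://plato.stanford.edu/entries/emerson/ Emerson \
  "Ralph Waldo Emerson" Biography

Setting PLUCKER_BUILD=echo first prints the arguments instead of
running the build, which is a cheap way to check them.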

Bill