DjVu all over again (Was Re: Browser-friendly text image compression
- From: Eric Eldred <eldred@[redacted]>
- Subject: DjVu all over again (Was Re: Browser-friendly text image compression
- Date: Tue, 18 Sep 2001 17:43:27 -0400
On Tue, Sep 04, 2001 at 11:01:31PM +0700, Doug Cooper wrote:
> ....
> a semi-public entry,
> DjVu. This was developed at AT&T, and is being commercially
> pushed by LizardTech ( http://www.lizardtech.com ). The decoding
> plug-in is free, as is a non-commercial encoder from LizardTech, and
> a small collection of public-domain software.
As an experiment, I scanned two books and saved them as DjVu files. Here they are:
http://www.eldritchpress.org/pvo/bezette_stad.djvu
http://www.eldritchpress.org/jhf/fabre_book_insects.djvu
(File size each about 1.5MB.)
(By the way, you can now reach eldritchpress.org on the default
port 80 as well as port 8080, as long as at&t/@[redacted] allows it.)
To view these files you should download the browser plug-in
appropriate for your machine from http://www.lizardtech.com
The first book I scanned with DjVu Solo 3.1 in MS Windows 98SE
(the only operating system for this utility). This required
using the scanner's TWAIN user interface and saving each page
to a new file. It was possible to set some pages to b&w only
and others to color. I converted each file separately to djvu
format. It took a bit of manual work but was simple. Then,
starting from the first page, I added all the remaining pages
at once to "bundle" the book.
The second book I scanned with ABBYY FineReader 3.0 (ABFR), using
the Scan button (not Scan and Read), and setting the scanner
control in ABFR to color at 300 dpi. One page image I rotated
in ABFR. I then selected all the pages in the thumbnail view
(click on the first, scroll to the last, shift-click it), then
chose File/Save image as (F12) and picked an uncompressed b&w
TIFF format. I then scrolled to the 13 color pages and saved
those as uncompressed color TIFF files. In DjVu I then added an
image page after the appropriate text page, selected the color
TIFF file, then deleted the b&w placeholder page. I saved the
result to a djvu file.
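For anyone who would rather plan the page substitutions before
clicking through them, here is a minimal sketch in Python of the
bookkeeping described above: every page has a b&w TIFF, and a
color TIFF replaces it wherever one was saved. The function name,
the file names, and the page numbers are all hypothetical; they
are not anything FineReader or DjVu produces by itself.

```python
def build_page_plan(bw_pages, color_pages):
    """Return (page number, file name, mode) in page order.

    bw_pages and color_pages map page number -> file name.
    A color scan, when present, replaces the b&w scan of that page.
    """
    plan = []
    for number in sorted(bw_pages):
        if number in color_pages:
            plan.append((number, color_pages[number], "color"))
        else:
            plan.append((number, bw_pages[number], "bw"))
    return plan


# Hypothetical example: five pages, page 3 also saved in color.
bw = {n: f"bw/page{n:03d}.tif" for n in range(1, 6)}
color = {3: "color/page003.tif"}

for number, name, mode in build_page_plan(bw, color):
    print(number, name, mode)
```

The printed list is just a checklist of which file to load for
each page; the actual adding and deleting still happens by hand.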
The only problem I had was trying to work with compressed
TIFF files--not a good idea, it turns out, since it led to
inexplicable crashes. A "bundled" file, as my second one is,
is said to compress better than a set of pages that are each
converted to djvu first.
Anybody who wants can have these fine books (first published
in 1921). I still intend to shut the site and move off the
WWW.
DjVu is excellent, I think. I have been trying to figure
out a way to present these books for a long time. Bezette Stad
is a great example of Dada and de Stijl book publication. I would like
to see some translation of the Flemish and other text, but
it is almost impossible to present the text linearly as in
an HTML file. The illustrations in the Fabre book are
incomparable. I was able to enlarge them 700% and still get
a usable image--that would not be likely with JPEG. I don't
have the plugin for OCRing djvu, but a reader who has it could
extract the text from the files that way and send it to a braille
printer or speech synthesizer, I guess.
For color images along with text, DjVu is a great solution
and I recommend learning to use it. It is very fast and
increases production considerably, since there is no
proofreading to speak of. ABBYY ought to consider making
it easier to save from FineReader into djvu format, saving
some of the manual steps I had to take. I didn't try saving
to PDF and then using DjVu tools to convert to djvu, but if
anyone has done this please tell us your results. And,
thanks, Doug, for the tip!