Book People Archive

another fantastic "digital-reprint" from jose menendez

From: Bowerbird@[redacted]
Subject: another fantastic "digital-reprint" from jose menendez
Date: Tue, 28 Feb 2006 17:07:50 EST
jose menendez has created another one of his
fantastic "digital-reprints" of a scanned p-book.

this time it's for "books and culture", the book
that was google's first public-domain example.

>   http://www.ibiblio.org/ebooks/Mabie/

it turns out that i have worked rather extensively
with both of the books jose has "reprinted" now
-- "books and culture" and also "my antonia" --
so i've seen and reformatted 'em many ways, but
jose's files are undoubtedly the most interesting...

inside the arena of "scans versus digital text" --
which, yes, is largely a false dilemma, but still...
-- these digital-reprints are quite fascinating...

first of all, the digital-reprint replicates the "look"
of the p-book rather faithfully (except the fonts).

we need to remind ourselves that page-images
of any particular title are an "idealistic rendering"
that is meant to symbolize "the book itself", and
almost never that particular _copy_ of the book...

except in the cases where a book is freakishly rare,
it's not a fetish over the particular copy we scanned;
any other copy would've served equally well as "ideal".

indeed, as "exemplar", any particular copy we scan
has flaws -- broken and faded type, maybe a word
a reader underlined years ago, and so on and so forth...

the digital-reprint, on the other hand, is a clean copy.

none of the characters are broken; they are all sharp.

each line of text -- located with computer precision --
has leading that is exact and consistent across a page,
rather than the usually-close-but-certainly-not-exact
leading we get from those 20th-century typographers,
who were literally putting thin sheets of physical metal
between physical rows of type on their physical presses.

and, once again because of the precision of computers,
the baseline for each and every line in the digital-reprint
is a perfectly straight line!   not so with the page-scans...

these factors, and many more, make the digital-reprint
a _better_ "exemplar" of the "ideal" version of the book
than any set of scans of a paper-copy could _ever_ be...

moreover, of course, a digital-reprint gives to users
all the _power_ of digitized text that scans cannot --
copying, searching, reflowability, you know the drill,
including maybe the most important -- small filesize.

and don't give me that "disk-space is cheap" rot, either.
i'm not talking just "expense", but also "convenience",
and not for a single book, but for an immense library.

jose's digital-reprint of "books and culture" is <1-meg.
take that as the average size of books on a 6-gig d.v.d.,
and the d.v.d. holds 6,000 books.   that's a good number,
and means a stack of 100 d.v.d.'s lets you carry 600,000
books with one hand.

the page-scans, on the other hand, need about 20 megs
for "books and culture", depending on the image-format,
so we could put _300_ books of that size onto our d.v.d.

um, not as good.   20 times not-as-good.   20 times more
expensive, which -- if we produced 10 million d.v.d.'s --
is not a trivial cost, not at all.   but maybe more important,
the headaches of dealing with 20-times-as-many d.v.d.'s
(storing, shipping, etc.) are an order-of-magnitude greater.
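
for anyone who wants to check the arithmetic, here is a small
back-of-envelope sketch.   the sizes are just the round numbers
used above (a "6-gig" d.v.d. taken as 6,000 megs, 1 meg for a
digital-reprint, 20 megs for a set of page-scans); the names
are made up for illustration.

```python
import math

# round numbers from the text above, not measured values
DVD_MB = 6_000       # a "6-gig" d.v.d., taken as 6,000 megs
REPRINT_MB = 1       # digital-reprint, rounded up from "<1-meg"
SCANS_MB = 20        # page-scans of the same book

def books_per_disc(disc_mb, book_mb):
    """how many books of a given size fit on one disc."""
    return disc_mb // book_mb

def discs_needed(n_books, per_disc):
    """how many discs it takes to hold n_books."""
    return math.ceil(n_books / per_disc)

reprints_per_disc = books_per_disc(DVD_MB, REPRINT_MB)
scans_per_disc = books_per_disc(DVD_MB, SCANS_MB)
ratio = reprints_per_disc // scans_per_disc

library = 600_000    # the "carry it with one hand" library
reprint_discs = discs_needed(library, reprints_per_disc)
scan_discs = discs_needed(library, scans_per_disc)

print(reprints_per_disc, scans_per_disc, ratio)
print(reprint_discs, scan_discs)
```

with these assumptions the same 600,000-book library is a stack
of 100 discs as digital-reprints, versus 2,000 discs as scans.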

if we can hand each third-world village a _briefcase_ with
the entire contents of the library at stanford -- rather than
a _pallet_ that would require a forklift -- that's a big plus,
in terms of _both_ expense _and_ convenience.   a big plus.

so yes, somewhere in cyberspace, we should keep a set
of the page-scans of "books and culture" and "my antonia",
and all the other books in our global cyberspace library,
one that anyone can access if they fetishize "the original"...

but as the "working version" of the "ideal" of those books,
jose's digital-reprints are a _much_better_ version for use.

and once we recognize the "scans versus digital text" choice
as a false one, and admit that the scans need a digital mirror,
if only so we can run searches on it to pull up the desired scan,
then we will realize that we absolutely _need_ a digital version
that _is_ totally faithful to the _look_ of the scans themselves.
so it is a matter of _necessity_ that we create digital-reprints.
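
the "digital mirror" idea can be sketched in a few lines.   this
is only an illustration, assuming the digital-reprint is stored
page-by-page (keyed by pagenumber) and the scans sit at some
predictable address; the url pattern and the function names
here are hypothetical, not anything jose or google actually uses.

```python
# hypothetical address pattern for the page-scans
SCAN_URL = "https://example.org/scans/page-{:04d}.png"

def normalize(text):
    """lowercase and collapse whitespace, so a phrase that
    spans a linebreak in the reprint can still be matched."""
    return " ".join(text.lower().split())

def find_pages(reprint_pages, phrase):
    """reprint_pages maps pagenumber -> text of that page.
    returns the pagenumbers whose text contains the phrase."""
    needle = normalize(phrase)
    return [n for n, text in sorted(reprint_pages.items())
            if needle in normalize(text)]

def scan_links(reprint_pages, phrase):
    """search the text, then hand back the matching scans."""
    return [SCAN_URL.format(n) for n in find_pages(reprint_pages, phrase)]
```

because each page of the reprint matches one page of the scans,
a hit in the text tells you exactly which image to pull up.
note that this simple version will miss a phrase that is split
by an end-of-line hyphenate or by a pagebreak.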

pay attention that i am _not_ suggesting the "digital-reprint"
of the scans should be our "final working version", not at all.
there are artifacts in the digital-reprint, namely the linebreaks
and the pagebreaks, end-of-line hyphenates, running heads,
pagenumbers, etc.   indeed, the _whole_point_ of the "reprint"
is to include these artifacts.   however, in most of our dealings,
we'll actively want to eliminate these artifacts.

so after making this "digital-reprint" that mimics the scans,
we should go on to create the even-more-ideal version that
strips away those artifacts of the p-book, so we have a "pure"
digitized version of the text, to be our major working copy...
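
here is a rough sketch of what that stripping step might look
like.   it assumes a made-up page layout (the first line of each
page is the running head, and a line holding only digits is the
pagenumber), and it rejoins end-of-line hyphenates naively, so a
genuinely hyphenated word that happens to break at a line-end
would be joined wrongly.   a real workflow would need a word-list
or human review for those cases.

```python
import re

def strip_artifacts(pages):
    """pages: list of strings, one per printed page.
    assumed layout (hypothetical): line 1 is the running head,
    and any digits-only line is a pagenumber.
    returns the text with linebreaks, pagebreaks, running heads,
    pagenumbers and end-of-line hyphenates removed; blank lines
    are kept as paragraph breaks."""
    lines = []
    for page in pages:
        body = page.split("\n")[1:]              # drop the running head
        for line in body:
            if re.fullmatch(r"\s*\d+\s*", line):  # drop the pagenumber
                continue
            lines.append(line.strip())

    text = ""
    for line in lines:
        if not line:                             # blank line = paragraph break
            if text and not text.endswith("\n\n"):
                text = text.rstrip() + "\n\n"
        elif text.endswith("-"):                 # naive hyphenate rejoin
            text = text[:-1] + line
        elif text and not text.endswith("\n\n"): # ordinary linebreak
            text += " " + line
        else:                                    # start of text or paragraph
            text += line
    return text.strip()
```

the point of keeping the digital-reprint around is that this
step can always be re-run, or improved, without going back to
the scans.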

but in the "chain" of versions that we create, starting from the
page-images, the "digital-reprint" step is an important one to
execute and to keep for future use.   as i just said up above,
if we want to use the scans at _all_, we'll need a digital-reprint.

so i encourage people to take a good hard look at jose's files,
and compare them with the scans.   it's particularly easy with
his "books and culture" because each pagenumber in the .pdf
is a link that will summon up that page-scan in your browser.
(the links are pointing to the page-scans housed by google.)

i also encourage you to ruminate on what "digital-reprints"
mean relative to the crafting of our ultimate workflows and
the output of our production-process for a global e-library.

if your rumination doesn't enlighten you, you need to do more.
this is very powerful stuff.   thank you, jose, for leading the way.

-bowerbird