Book People Archive

"eBooks" vs "Digital Picture Books"




This is something I wrote overnight at the request of
a newspaper doing a workup on the recent announcement
reviving "The Million Book Project/Universal Library"
I was once affiliated with at Carnegie Mellon.

I figured it might be suitable for today's closing of
"The Book People," and I am still hoping to make just
a personal comment. . . .

Thanks!!!

Michael S. Hart
Founder
Project Gutenberg



eBooks Versus Digital Picture Books


by Michael S. Hart
Inventor of eBooks
Project Gutenberg,
& World eBook Fair


eBooks as I invented them are what you would get if you
sat at your computer and typed in a book the same way a
person types in anything else.  However, most of the
news items that claim a million books are talking about
pictures of book pages rather than actual computer
characters.

Here is the difference:

When you hear about a megabyte, that is a bunch of one
million computer characters. . .just what we would get
if we typed in a million characters like these.

A gigabyte is a billion characters, or bytes.

A terabyte is a trillion characters. . .etc.

This month a terabyte drive has been about $200, very cheap.

A million eBooks with a million characters each makes a
library of eBooks that just fills up a terabyte as long
as these are real computer characters, such as the ones
you and I type in to create these eMails.

Footnote:  if you use .zip compression you can fit 2.5
million eBooks per terabyte.  This does not work on the
digital picture books explained below.
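The storage arithmetic above can be checked in a few lines of Python.  This is a minimal sketch using the essay's round numbers plus a few assumptions: decimal units (1 TB = 10**12 bytes), one byte per character (true for plain ASCII text), and the 2.5:1 .zip compression ratio implied by the footnote.

```python
# Storage arithmetic with the essay's round numbers.
# Assumptions: decimal units, one byte per (ASCII) character,
# and a 2.5:1 .zip compression ratio for plain text.
TERABYTE = 10**12        # bytes
CHARS_PER_BOOK = 10**6   # "a million characters each"
ZIP_RATIO = 2.5          # assumed, per the footnote

def books_per_drive(drive_bytes: int, chars_per_book: int,
                    ratio: float = 1.0) -> int:
    """How many plain-text eBooks fit on a drive, optionally compressed."""
    bytes_per_book = chars_per_book / ratio
    return int(drive_bytes // bytes_per_book)

print(books_per_drive(TERABYTE, CHARS_PER_BOOK))             # 1000000
print(books_per_drive(TERABYTE, CHARS_PER_BOOK, ZIP_RATIO))  # 2500000
```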

You can search every character of these eBooks, and put
quotes from them into your eMail, your word processors,
or any other text-oriented programs in the world.  Your
programs can do spell checking, indexing, concordancing
and any number of other things usually associated with
the way people use books.
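Concordancing is a good example of what real characters make possible.  Here is a minimal sketch, assuming nothing more than a plain-text string, that builds a keyword-in-context concordance; the function name concordance and the context width are illustrative choices, not part of any Project Gutenberg tool.

```python
import re
from collections import defaultdict

def concordance(text: str, width: int = 20) -> dict[str, list[str]]:
    """Map each lowercased word to keyword-in-context snippets,
    with up to `width` characters of context on each side."""
    index: dict[str, list[str]] = defaultdict(list)
    for match in re.finditer(r"[A-Za-z']+", text):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        snippet = text[start:end].replace("\n", " ")
        index[match.group().lower()].append(snippet)
    return dict(index)

sample = "To be, or not to be, that is the question."
for snippet in concordance(sample, width=8)["be"]:
    print(snippet)
```

Run over a whole eBook, the same function indexes every word in one pass; nothing comparable is possible on a page image until someone turns it back into characters.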

BUT!

Not so with digital picture books.


The Million Book Project, Google and all the others who
tout having a million books are talking about pictures,
pictures of books, but NOT characters, NOT what we type
here and now creating our eMails, but what you would get
if you were to take a digital photograph of characters,
of books, or of anything else.

A picture is harder for computers to use than letters.

It is just that simple.

Each page has to have its own picture in its own file.

Way too many files.

The better the file looks, the bigger it is.

Small picture files look "pixelated," and it is obvious
that they are computer pictures.

Big picture files look better but take so much longer
to download and to process that "flipping pages" isn't
really in the cards. . .but with real eBook pages your
pages can go by as fast as you like, and all the pages
can easily be in one single file.

One file. . .versus. . .250 files. . .per book.

If you've ever opened a directory with many thousands
of files in it, or even fewer, you know what is coming....

It takes a computer an enormous effort to manage a
seriously large number of files.

The more files, the more of the computer is wasted.

Let's take the 1.5 million books mentioned in the news
being covered at this very moment.

Presuming 200 pages per book that's 300 million pages!

Can you grasp what your computer would be like if your
1.5 million books took up 300 million files?!?!?!?!

Unless you had a much better computer than most people,
your computer would be seriously dragging under the load!
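The file-count arithmetic, as a small sketch.  It assumes, as the essay does, 200 pages per book and one image file per page, against one plain-text file per eBook.

```python
# File counts, assuming 200 pages per book and one image file per page,
# versus one plain-text file per eBook.
BOOKS = 1_500_000
PAGES_PER_BOOK = 200

def file_counts(books: int, pages_per_book: int) -> tuple[int, int]:
    """Return (eBook files, page-image files) for a collection."""
    return books, books * pages_per_book

ebook_files, image_files = file_counts(BOOKS, PAGES_PER_BOOK)
print(f"{ebook_files:,} eBook files vs {image_files:,} page-image files")
# 1,500,000 eBook files vs 300,000,000 page-image files
```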

Just try downloading a book from Project Gutenberg and
a book from The Million Book Project and the difference
will be obvious. . .be sure to use a stopwatch.

Project Gutenberg sends out over one eBook per second,
and your eBook shouldn't take any more time than your
own network delay. . .if you have a very fast Internet
connection, you can pick a collection of eBooks and get
them at just about one book per second. . .though most
connections, sadly, have so much lag that you would not
usually notice how fast the eBooks went out.

Then try downloading a Million Book Project entry.

See what I mean?

It's only obvious if you try.

You could literally download 1.5 million eBooks of the
Project Gutenberg variety, zipped, to a terabyte drive.

No problem.

Plenty of space to spare for another million books, if
your fancy should be tickled.

Now let's consider The Million Book Project, Google or
any other of the "digital pictures of books" variety.

Each file is going to be sent separately.

They don't really go out of their way to make it easy,
because they don't really want you to OWN their books;
what they really want is to get you to read over their
shoulders, so to speak. . .they keep the books and you
never get to actually own them.

Not so with Project Gutenberg.

You download 100,000 of our eBooks. . .you own them!!!

And it doesn't take all your computer power to do it!



300 Million Pages!!!

Even at about 10 pages per second it takes a year!!!

Huh???

Think about it. . . .

Your computer network is probably up for about 30 million
of the roughly 31.5 million seconds in a year, presuming
a few percent downtime for normal maintenance, upgrades,
network outages, etc.

IT WOULD TAKE ONE ENTIRE YEAR TO GET 1.5 MILLION BOOKS

If these books were "digital picture books."

Get the "picture?"
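The same download arithmetic as a sketch.  It assumes a 365-day year, 95% uptime (our reading of "a few percent downtime"), 10 page images per second for picture books, and the essay's "about one book per second" for plain-text eBooks; the function name is hypothetical.

```python
# Download-time arithmetic. Assumptions: a 365-day year, 95% uptime,
# 10 page images per second, about one plain-text eBook per second.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000
UPTIME = 0.95

def years_to_download(items: int, items_per_second: float) -> float:
    """Years of network time needed to fetch `items` at a steady rate."""
    usable_seconds = SECONDS_PER_YEAR * UPTIME
    return items / items_per_second / usable_seconds

print(round(years_to_download(300_000_000, 10), 2))  # page images: 1.0
print(round(years_to_download(1_500_000, 1), 2))     # plain-text eBooks: 0.05
```

At those rates the 1.5 million plain-text eBooks take a little under three weeks of network time; the 300 million page images take the whole year.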


Then think about how to store 300 million page files.

Get the picture?


Then think that you can't cut and paste quotations.

Or even search for quotations.

Or eMail the quotations.

Or correct typos in the books.

Or use them in Microsoft Word when writing.

ALL you can do is READ them. . . .

Which is certainly nice. . . .

But it is no advantage over paper books other than in
the sense of reading at a distance.

You still have to type in all your notes, quotations,
and anything else.

Let's say you were staging Romeo and Juliet in a theater.

You want to make up scripts for your cast.

You want to individualize the scripts for your play--
each player only needs certain portions.

Not every edition includes every portion; editions do
differ about this.

You can tailor make your script from a dozen editions
simply by cutting and pasting the parts you want.

Then you can cut and paste up each player's script.

Daily changes, corrections, stage instructions can be
included in each printout.

Try THAT with a "digital picture book."
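Here is a minimal sketch of the script-tailoring idea.  It assumes a hypothetical plain-text layout in which each speech is a paragraph separated by a blank line and begins with the speaker's name in capitals followed by a period; real editions lay out speeches differently, so the parsing would have to be adapted.  For each of the chosen player's speeches it also keeps the last line of the previous speech as a cue, the way actors' "sides" traditionally do.  The scene is abridged and typed in for illustration.

```python
def actor_sides(play: str, speaker: str) -> str:
    """Collect one player's speeches, each preceded by a cue line.

    Assumes a hypothetical layout: speeches separated by blank lines,
    each starting with the speaker's name in capitals and a period.
    """
    play = play.replace("\r\n", "\n")
    speeches = [b.strip() for b in play.split("\n\n") if b.strip()]
    tag = speaker.upper() + "."
    sides = []
    for i, speech in enumerate(speeches):
        if speech.startswith(tag):
            if i > 0:
                # The last line of the previous speech serves as the cue.
                sides.append("[cue] ..." + speeches[i - 1].splitlines()[-1])
            sides.append(speech)
    return "\n\n".join(sides)

scene = """ROMEO. But soft, what light through yonder window breaks?

JULIET. O Romeo, Romeo, wherefore art thou Romeo?

ROMEO. Shall I hear more, or shall I speak at this?"""

print(actor_sides(scene, "Juliet"))
```

Merging a dozen editions first and then running this once per player gives each cast member a tailored script that can be regenerated after every day's changes.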


Hopefully your imagination has been sufficient to see
what I have been trying to explain here, but you will
find me easily available to clarify, if needed.



Part II


We have actually tried The Million Book Project's eBooks
a bit since the big announcement came out November 29,
and have a few more things to report as a result.

First, we should report that this is a renamed effort
started at Carnegie Mellon back in the heyday of what
turned out to be the beginning of major interest in
the eBook world by world-class universities.

Sadly, this project never fulfilled the dreams of its
creators in either the number or the quality of its
eBook collection; hence every few years announcements
of a "new" version of this project came out, trying to
generate new interest, or financial support, etc.  Even
so, most of the books we looked at for our sample had
missing pages, fuzzy pages [the computers and humans
both have trouble reading these], or pages that were
not photographed straight [which causes the same
reading problems as above].

It would be an overstatement to say that 1.5 million
books, or even close to it, were available to download,
at least in the sense of what you would get from your
local bookstores, libraries, etc.

It is one thing to say an eBook [computer characters]
is 99.99% accurate to the original and that this will
move to 99.999% and then 99.9999% in the future, but
it is something else entirely to say that a collection
of books has 99% of its pages in decent form while 1%
of the pages are missing or unreadable.
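To put numbers on that comparison, here is a sketch that assumes the essay's round figures of a 1,000,000-character book of 200 pages, or about 5,000 characters per page.  At 99.99% accuracy an eBook has about 100 wrong characters scattered through it, each easy to fix; losing 1% of the pages of the same book means two whole pages, about 10,000 characters, simply gone.

```python
# Errors per book, using the essay's round numbers (assumed): a
# 1,000,000-character book of 200 pages, about 5,000 characters a page.
CHARS = 1_000_000
PAGES = 200
CHARS_PER_PAGE = CHARS // PAGES

def wrong_characters(accuracy: float) -> int:
    """Characters in error for an eBook typed at the given accuracy."""
    return round(CHARS * (1 - accuracy))

def lost_characters(missing_fraction: float) -> int:
    """Characters lost when a fraction of a picture book's pages is missing."""
    return round(PAGES * missing_fraction) * CHARS_PER_PAGE

print(wrong_characters(0.9999))   # 100, scattered and easy to fix
print(lost_characters(0.01))      # 10000, two whole pages gone
```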

Obviously these comments are only from small samples,
but it appears from such samples that the process had
been totally automated and that no human being looked
to see that each page had actually been scanned in an
accurate enough way to read, or scanned at all.

However, it is also obvious that LATER someone looked
and marked the pages as missing.

Why this couldn't have been arranged for at scanning,
and then corrected while the book was at the scanner,
is beyond me.

[Note: we are not experts in Chinese, though we quite
literally are just starting a few eBook projects from
Chinese books, so we are just guessing that the notes
we found where pages are missing were telling us the
pages were missing.]

While no one expects perfection from new projects, the
project in question is not new, though its history was
barely referenced in the press release, and then only
by an old name for the project.  It will be difficult
in the extreme, too difficult to imagine being done
well or soon, for someone to dig up the missing and
unfocused pages; in fact, it might take as much effort
as scanning the original books. . .a 100% reinvestment
of time.

With eBooks it is trivial to correct most mistakes in
the books: you just put the book in a word processor,
correct the error, and save the new file.

Anyone on any computer can correct eBook errors.
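Correction can even be done by a program.  A minimal sketch, assuming a UTF-8 plain-text eBook file; correct_text and correct_file are hypothetical helper names.  A blind replace also changes any correct word that happens to contain the misspelling, so it is worth checking the count correct_text returns before writing the file.

```python
from pathlib import Path

def correct_text(text: str, wrong: str, right: str) -> tuple[str, int]:
    """Return the corrected text and how many occurrences were replaced."""
    return text.replace(wrong, right), text.count(wrong)

def correct_file(path: Path, wrong: str, right: str) -> int:
    """Fix a typo in a UTF-8 plain-text eBook in place; return the count."""
    fixed, count = correct_text(path.read_text(encoding="utf-8"), wrong, right)
    if count:
        path.write_text(fixed, encoding="utf-8")
    return count

fixed, count = correct_text("Some yeers ago, never mind how long", "yeers", "years")
print(count, fixed)
```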

But errors in "digital picture books" are hard to fix
in that sense; you would actually have to "photoshop"
the page the way they do for movies to fix an error--
very time consuming and labor intensive, and not
something most people could do on their own computers.

Thus we should not expect much improvement in digital
picture books, while we should expect improvements of
a significant nature in eBooks.

Just another of the differences that put these books,
often confused with each other, into separate camps.



Summary


An eBook and a digital picture book are different.


eBooks are small files, needing only 2.5 megabytes to
store a 2.5 million character book. . .or one megabyte
if you use compressed files such as .zip files.

Digital picture books are large, one file per page to
start with, and each file does not compress well with
tools such as .zip.  So many files are not easy for
most computers to handle; they waste a lot of the hard
drive space, clutter the directory structure, and slow
down flipping through the pages.
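A quick way to see the compression difference, sketched with Python's standard zlib module (DEFLATE, the method .zip files normally use).  Random bytes stand in for an already-compressed page image such as a JPEG, which is an assumption for illustration; and because the sample text repeats one passage, its ratio comes out far higher than the essay's 2.5:1 figure for real prose.

```python
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Original size divided by DEFLATE-compressed size (higher is better)."""
    return len(data) / len(zlib.compress(data, 9))

text = ("It was the best of times, it was the worst of times, "
        "it was the age of wisdom, it was the age of foolishness, ") * 200
page_image = os.urandom(len(text))  # stand-in for already-compressed JPEG data

print(f"text:  {compression_ratio(text.encode('ascii')):.1f}x")
print(f"image: {compression_ratio(page_image):.2f}x")
```

The image stand-in comes out at about 1x, or slightly worse: data that is already compressed has nothing left for .zip to squeeze out.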


eBooks are trivial to correct.  Just bring them up in
your favorite word processor, any will do, and fix up
whatever errors you find.  You can even include notes
about the errors, or about anything else, in seconds.

Digital picture books are, for the average computer
owner, impossible to correct.  It takes expertise with
"Photoshop" or similar programs, and we do not have
that expertise en masse to do corrections in the same
way the average computer user can fix any error they
find in a regular eBook.


The time it takes to download eBooks is short.

The time it takes to download picture books is long.


It takes one terabyte to store a million eBooks.
[2.5 million if you use .zip files]  All for $200.

It would take more hard drives than anyone has that I
know of to store a million digital picture books of a
variety such as The Million Book Project or Google.

We are talking about a seriously major investment.



Personal Computers As Personal Libraries


The average personal computer today is under $500.

Adding a terabyte might add an average of $200.

That easily gives you space for a million eBooks.

Want two million?  Just add another terabyte.

They are out there, free for the downloading.


If you simply presume "digital picture books" are ten
times the size in disk usage, it moves the picture to
something only 1% of computer users might consider to
be possible, given either their physical space or the
limitations of their budgets.

If you presumed the digital picture books were larger
by a factor of 100 times, then a personal library can
be said to be the stuff only of the elite.
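A sketch of the cost side, using the essay's own figures (a million plain-text eBooks per terabyte, about $200 per terabyte) and its hypothetical 10x and 100x size factors for picture books.  It ignores rounding up to whole drives.

```python
# Cost of a million-book library, from the essay's figures (assumed):
# 1 TB holds a million plain-text eBooks and costs about $200.
DOLLARS_PER_TB = 200
EBOOKS_PER_TB = 1_000_000

def library_cost(books: int, size_multiplier: float = 1.0) -> float:
    """Drive cost in dollars, scaled by how much larger each book is."""
    terabytes = books / EBOOKS_PER_TB * size_multiplier
    return terabytes * DOLLARS_PER_TB

for m in (1, 10, 100):
    print(f"{m:>3}x size: ${library_cost(1_000_000, m):,.0f}")
# $200, $2,000 and $20,000 respectively
```

Set against a personal computer under $500, the 10x and 100x rows show why picture-book libraries drift out of reach.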


eBooks were designed to be for everyone to own.


Digital picture books were designed for major players
to own, not for the rest of us to do more than read
over their shoulders now and then.


"eBooks" are designed to be incorporated into eMails,
school papers, research papers, or new editions: just
make your changes to the old editions, and you should
be ready to publish your own edition of any one of an
assortment of millions of free eBooks out there.

Try that with a "digital picture book!"


eBooks were designed to be searchable in seconds.

You can download The Complete Shakespeare in a minute
and then search it as many times as you like, seconds
or less for each search.


Try that with a "digital picture book."
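A minimal search sketch.  It does a case-insensitive substring search line by line and returns the line number of each hit; a phrase broken across two lines would be missed, which a fuller version could handle by searching the whole text at once.  The sample lines are typed in for illustration.

```python
def search(text: str, phrase: str) -> list[tuple[int, str]]:
    """Case-insensitive search; returns (line number, line) for every hit."""
    needle = phrase.lower()
    return [(n, line.strip())
            for n, line in enumerate(text.splitlines(), start=1)
            if needle in line.lower()]

sample = """HAMLET. To be, or not to be, that is the question:
Whether 'tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,"""

for number, line in search(sample, "to be"):
    print(number, line)
```

With a local copy of The Complete Shakespeare saved as, say, shakespeare.txt (a hypothetical filename), you would pass Path("shakespeare.txt").read_text(encoding="utf-8") to search() instead of the sample.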


The difference should be clear. . . .