Book People Archive

Re: feedback to umichigan on "books and culture", part 7



On Jan. 2, 2007, Bowerbird wrote:

> jose said:

>> The original book's running heads have a period at the end.
>> Those periods are missing from the running heads in your version. 
> 
> that's not an "error".   i remove them intentionally.


Just because something is done "intentionally" doesn't mean it isn't 
an error. Whether it is or not often depends on the circumstances or 
the context. Let me remind you of the context of your claim of a 
"perfect text." In the post I replied to,

http://onlinebooks.library.upenn.edu/webbin/bparchive?year=2006&post=2006-12-22,1

you had a list of nine "suggestions to umichigan (and to google too)." 
At the end of the list you had this:

>>  suggestion9:  find a way to incorporate corrections by end-users. 
> 
> that last one should be fairly obvious.  i've got a perfect text now.

Why don't we ask Perry Willett if the University of Michigan Library 
would consider a text that's missing all those periods "perfect," 
whether they were removed intentionally or not? (There's also a much 
more serious error in your ebook that I describe in about the last 
fourth of this message.)


> i don't subscribe to the position that i should be an "archivist" who 
> simply records whatever the original typographer put on the page.


You can do what you like with your copy. That's one of the beauties of 
the public domain. Heck, if you--or anyone else--wanted to, you could 
add a running commentary to the book. But it's different when you 
offer your copy to an *actual archive* under the guise of a "perfect 
text."


> i am an active republisher moving a 19th-century paper-book into 
> the 21st-century environment where books are universally available.


You're moving that 19th-century book into the 21st-century 
environment? Wasn't I the first person to move the *digital text* 
(instead of scans) of Mabie's "Books and Culture" onto the Internet, 
where it's freely and universally available?


> as i make clear (in a very colorful way) on this page:
>    http://www.greatamericannovel.com/mabie/mabiec001.html
> i'm doing "a 2006 electronic remix of a 19th-century book"...


And you consider a "remix" a "perfect text" of a specific edition--for 
an archive? What if UM also has a different edition of "Books and 
Culture"? Would your remix be a "perfect text" for that edition too? 
Or should they use someone else's remix for that one?


> you'll also find i removed the periods from all chapter heads, both 
> the one after the chapter number and the one after the chapter title;


I consider those errors as well in the context of a "perfect text" for 
an archive. And you not only removed the periods, but you placed the 
chapter number and chapter title, which were on separate lines in the 
original book, on the same line, with an extraneous em dash (actually 
two hyphens) between them. For example, you turned this:

         Chapter I.

    Material and Method.

into this:

    Chapter I -- Material and Method

Now, before you claim that I'm just nitpicking because it's your 
version, here's an excerpt from an email I sent Jon Noring back on 
Jan. 19, 2006 about his "My Antonia" demo project:

> You did introduce some characters that aren't in the original. For 
> example, the titles of the major sections look like this in the 
> original book: 
> 
> 
>    BOOK II
> 
> THE HIRED GIRLS
> 
> 
> In yours, you changed them like this:
> 
> 
> Book II -- The Hired Girls
> 
> 
> with an extra em dash that didn't appear in the original.

(I used an actual em dash in my email to Jon, but changed it to two 
hyphens to save our moderator the trouble of doing it.) :)


> runheads and chapters simply don't need any terminal punctuation. 
> indeed, without it, it's much easier to recognize 'em as what they are. 
> (it's easier for the _computer_ to recognize 'em.   humans don't care.)


If a computer has trouble recognizing them because of a period at the 
end, I think its routines could use some work. Also, in the 
overwhelming majority of books I've seen, the running heads were set 
in all capital letters. "Books and Culture," on the other hand, had 
mixed-case running heads like the example I used in my last post: 
"Liberation through Ideas."

If you're so concerned about making it "easier for the _computer_ to 
recognize 'em," why didn't you convert those mixed-case running heads 
to all caps?


> it's pretty simple -- just put a "blockquote" tag before and after 
> the block of text that is prefaced with those "/tab\" indicators, 
> which are then removed.   but i hadn't written it yet.   sorry... 
> :+) 


You know what surprised me the most when I saw all those lines 
beginning with "/tab\"? In HTML, for a simple blockquote, you'd only 
need a <blockquote> before and a </blockquote> after the quotation. 
You don't need a <blockquote> in front of every line in the quote. So 
why with ZML, which, I believe, is supposed to be less obtrusive than 
HTML, do you need a "/tab\" before every single line in the quotation? 
I even checked your mabie.zml file

http://www.greatamericannovel.com/mabie/mabie.zml

and saw all those lines preceded by "/tab\" in it too. Can't your ZML 
viewer app recognize a blockquote with just a "/tab\" before and, say, 
a "\tab/" after the block of text?
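
To show how little is involved, here is a minimal sketch in Python of the conversion described above: each run of consecutive lines starting with "/tab\" becomes a single blockquote, and the markers are removed. The function name wrap_blockquotes is hypothetical, and I'm assuming the marker always sits at the very start of a line. It is not the ZML viewer's actual routine.

```python
TAB_MARK = "/tab\\"  # the per-line marker seen in mabie.zml


def wrap_blockquotes(lines):
    """Wrap each run of consecutive marked lines in one <blockquote> element.

    Assumption: a marked line starts with TAB_MARK at column 0.
    """
    out = []
    in_quote = False
    for line in lines:
        marked = line.startswith(TAB_MARK)
        if marked and not in_quote:
            out.append("<blockquote>")   # open once, at the start of the run
            in_quote = True
        elif not marked and in_quote:
            out.append("</blockquote>")  # close once, at the end of the run
            in_quote = False
        # Strip the marker from marked lines; pass other lines through as-is.
        out.append(line[len(TAB_MARK):] if marked else line)
    if in_quote:
        out.append("</blockquote>")      # the quote ran to the end of the input
    return out


sample = [
    "Before.",
    "/tab\\first quoted line",
    "/tab\\second quoted line",
    "After.",
]
print("\n".join(wrap_blockquotes(sample)))
```

The same loop would work just as well with a single opening marker and a single closing one, which is the less obtrusive markup I'm asking about.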


> when i get the routine written, yes, they'll be formatted correctly. 
> ;+)


I'm glad to hear it. :)


>> I should warn you that the only way you could make a PDF that 
>> reproduced the original line breaks--with justified text and 
>> uniform left and right margins--would be to set those quoted 
>> sections in a smaller font, just like the original book and just like 
>> my PDF digital reprint.
> 
> right.   and the reason is because those lines have more characters.
> 
> because, due to the smaller font used in the p-book, more characters 
> fit on a line.   so -- if you use the same linebreaks -- you too must then 
> use a smaller font in order to fit those "more characters" into the space.


That's why I used a smaller font size for those blockquotes in my PDF 
digital reprint. :)


> which, to give away one of my "secrets", is how a computer routine 
> can tell (from an o.c.r. file) that it has a blockquote, or a footnote, 
> or any "semantic structure" that's typically printed in a smaller font; 
> the lines of that structure will have more characters than is typical; 
> in fact, simply by scrolling down an o.c.r. file, your eye immediately 
> locates such structures because the long lines will jump out at you.


That's one of your "secrets"? I guess it wasn't working very well at 
the time you were doing your previous "feedback to umichigan" posts. 
For example, in this one, dated Oct. 3, 2006,

http://onlinebooks.library.upenn.edu/webbin/bparchive?year=2006&post=2006-10-03,2

you wrote this:

> likewise, i reintroduced the block-quote indentation in the 
> one place in this book where it exists.  easy enough for me, 
> on this book, but a pain in the patooty on other books, so 
> _please_, google/umichigan, _fix_this_.  indentation is vital!

"A pain in the patooty"? I guess your computer routine still needed 
some work back then, because you said that you "reintroduced the 
block-quote indentation in the *one place* in this book where it 
exists." (Emphasis mine.)

There are actually *two* lengthy blockquotes in "Books and Culture." 
The first one starts a bit more than halfway down page 104, extends 
all the way through page 105, and ends about a third of the way down 
page 106. See:

http://www.greatamericannovel.com/mabie/mabiep104.html
http://www.greatamericannovel.com/mabie/mabiep105.html
http://www.greatamericannovel.com/mabie/mabiep106.html

The second one starts about a third of the way down page 107 and ends 
a bit more than halfway down page 108. See:

http://www.greatamericannovel.com/mabie/mabiep107.html
http://www.greatamericannovel.com/mabie/mabiep108.html

It's a good thing you didn't replace the original text of your 
"continuous proofreading" demo, which you copied almost entirely (and 
imperfectly) from my versions, with the text of your one-hour demo, or 
you would have lost one of the blockquotes.

What's really odd is that you apparently didn't notice that there are 
two blockquotes while you were doing your "page-by-page visual 
inspection." Of course, you also managed to miss that italicized 
"Kourotrophos" on the scan of page 105.


> indeed, most "semantic structures" have typographical renderings 
> that make them fairly easy to identify, not just to us human beings 
> (to whom these things are blatant), but to computer routines too...


So your computer routine wouldn't have any trouble recognizing the 
blockquote that begins on page vi of the preface in Frederick 
Douglass' autobiography, "My Bondage and My Freedom"?

http://books.google.com/books?id=rKkJAAAAIAAJ&pg=RA2-PR6

Would it have any trouble recognizing the blockquote on page xviii of 
the introduction?

http://books.google.com/books?id=rKkJAAAAIAAJ&pg=RA2-PR18

Oh, wait a minute. Despite the much smaller type (and more lines) on 
that page than on a page in the main part of the book, e.g. page 123,

http://books.google.com/books?id=rKkJAAAAIAAJ&pg=RA2-PA123

that isn't a blockquote. If we go back a page to the start of the 
introduction,

http://books.google.com/books?id=rKkJAAAAIAAJ&pg=RA2-PR17

we'll see that the whole introduction was set in that very small type, 
smaller even, perhaps, than the type that was used for that blockquote 
in the preface. (The introduction is so long, perhaps the publisher 
wanted to save paper.)

By the way, the whole appendix was set in very small type too:

http://books.google.com/books?id=rKkJAAAAIAAJ&pg=RA3-PA407

But I'm sure your computer routine wouldn't have any problem 
recognizing that neither it nor the introduction is a blockquote.
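
To make the concern concrete, here is a minimal sketch in Python of a long-line heuristic of the kind described above. It is only a guess at how such a routine might look, not anyone's actual code: the function name flag_long_lines and the 1.15 ratio are assumptions, and the sample "pages" are made-up strings that stand in for OCR lines of 60 and 75 characters.

```python
from statistics import median


def flag_long_lines(lines, ratio=1.15):
    """Return the indexes of lines noticeably longer than the median line.

    Assumption: text set in a smaller font shows up in the OCR as lines
    with more characters than is typical for the file.
    """
    lengths = [len(line) for line in lines if line.strip()]
    if not lengths:
        return []
    typical = median(lengths)
    return [i for i, line in enumerate(lines) if len(line) > typical * ratio]


body = ["x" * 60] * 10          # main text, normal type
quote = ["y" * 75] * 3          # a short passage in smaller type
introduction = ["z" * 75] * 8   # a whole section set in smaller type

print(flag_long_lines(body + quote))         # flags only the 3 quote lines
print(flag_long_lines(introduction + body))  # flags all 8 introduction lines
```

The first call picks out the quoted passage, as intended. The second flags every line of the introduction, so a routine relying on line length alone would need some further test, such as how long the run of long lines is, before calling anything a blockquote.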


> (oh, and jose, my .pdf will also indent a blockquote -- like this one -- 
> even if it wasn't indented in the original p-book, because i'm like that, 
> which means that the fontsize will have to be dropped even further...)


I'm sure readers with poor eyesight will be duly appreciative. Of 
course, using HTML and CSS, one can easily reproduce a paper book's 
formatting of blockquotes, e.g. no indentation on either side, 
indented on both sides, indented on the left side but not on the 
right, etc.


>> In the 3rd line from the bottom of the page, you'll see the name 
>> "Kourotrophos." It's italicized in the scan, but it isn't in your HTML 
>> version. 
> 
> you finally found a _real_ mistake!   good for you!   and thank you!


"Finally"? Actually, I knew about that mistake from the first day I 
looked at your demo. You see, I was eager to see how your ZML managed 
to handle the blockquotes. And there on page 105, I saw the 
"Kourotrophos" without underscores (the way you indicate italics).

Which just goes to show that no matter what kind of error-reporting 
system one puts in place, if a reader isn't interested in reporting 
the errors he sees, he won't. I only revealed the error a few days ago 
because you made the claim that you have a "perfect text."


> too bad you didn't report it using the error-report form on that page. 
> i might have paid you a "finders fee" for locating it (knuth pays $2.56 
> for every error that's reported to him) if you'd reported it "before" me.


I did report it before you--where it counts--here on the BP List where 
you made your "perfect text" claim.


>>    There are a few other errors
> 
> let's hear them!
> 
> i _promise_ i won't call you names for pointing out my mistakes...


I wouldn't care if you did call me names, but I think our moderator 
definitely would. :)


> (but report them on-site first, ok?)               ;+)


No thanks. I'm one of those people who hate filling out forms. :)

Well, let's see. There are several mistakes left, but I'll start with 
the biggest. Let's take a look at your page 6:

http://www.greatamericannovel.com/mabie/mabiep006.html

On the left, we'll see your text of the page with a list of "Books by 
Mr. Mabie," and on the right there's a somewhat doctored image of that 
page from the book, but it'll suffice for this. Now, if we count how 
many book titles are in each list, we'll get 19 for the list in the 
scan, but only 18 for your list. Here are the first 4 items in *your* 
version of the list:


My Study Fire
Under the Trees and Elsewhere
Short Studies in Literature
My Study Fire, Second Series


And here are the first *5* in the scan's list:


My Study Fire
Under the Trees and Elsewhere
Short Studies in Literature
Essays in Literary Interpretation
My Study Fire, Second Series


When you made your version of the list, Bowerbird, you missed one: 
"Essays in Literary Interpretation." Interestingly, you made the 
mistake on just about the only part of the book that you couldn't copy 
from my versions.
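
A dropped list item like this is easy to catch mechanically. Here is a minimal sketch in Python, using the titles quoted above; the function name missing_titles is hypothetical, and comparing case-insensitively is my own assumption, made so that an all-caps OCR list can be checked against a mixed-case one.

```python
def missing_titles(scan_titles, ebook_titles):
    """Return titles that are in the scan's list but not in the ebook's list.

    The comparison ignores case; results keep the scan's order and spelling.
    """
    present = {title.casefold() for title in ebook_titles}
    return [t for t in scan_titles if t.casefold() not in present]


scan = [
    "My Study Fire",
    "Under the Trees and Elsewhere",
    "Short Studies in Literature",
    "Essays in Literary Interpretation",
    "My Study Fire, Second Series",
]
ebook = [
    "My Study Fire",
    "Under the Trees and Elsewhere",
    "Short Studies in Literature",
    "My Study Fire, Second Series",
]

print(missing_titles(scan, ebook))  # ['Essays in Literary Interpretation']
```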

(For those who've never downloaded either my HTML or PDF ebooks of 
"Books and Culture," I didn't include that page with the list of 
"Books by Mr. Mabie." My ebooks start with the title page, which comes 
right after the page with the list.

http://www.ibiblio.org/ebooks/Mabie/ )

What's really odd, Bowerbird, about your missing "Essays in Literary 
Interpretation" in that list is that you told us many times during 
your "feedback to umichigan" series of posts that you'd started from 
Google's OCR on the UM website. Well, if we look at the OCR of the 
page with that list at the UM website, we'll see that it includes 
"ESSAYS IN LITERARY INTERPRETATION" where it should be (4th on the list).

http://mdp.lib.umich.edu/cgi/m/mdp/pt?view=text;size=100;id=39015016881628;seq=6;page=root

(Anyone interested in seeing the undoctored scan of that page can 
click on the "view page as image" link on the left side of the screen.)

And in this BP post dated Dec. 22, 2006,

http://onlinebooks.library.upenn.edu/webbin/bparchive?year=2006&post=2006-12-22,1

you told us this:


> by the way, in an interesting turn, re-doing this book led me 
> to discover two errors in the version that i had put up online, 
> which i _had_ thought was error-free, on pages 6 and 247...
> 
> and, with my continuous proofreading interface, i was able to 
> make a note about each error right on the page where it was, 
> which will serve as reminders so that i don't forget about them.


But on your page 6,

http://www.greatamericannovel.com/mabie/mabiep006.html

there's no note about missing "Essays in Literary Interpretation" nor 
even about just missing an unspecified title in the list. There is, 
however, this error report on that page:


name: bowerbird
email:
old:
new:
comment: old> A Child of ANature (Illustrated)
new> A Child of Nature (Illustrated)
filenameis: mabie/mabiep006.html


So you found a typo but didn't spot that missing title in the list. I 
wonder how you managed to lose it again, when it was included in the 
Google OCR you said you started from.


Jose Menendez


P.S. Also, note the seven "Illustrated"'s in the list. You enclosed 
them in parentheses, instead of italicizing them the way they are in 
the original book.