Book People Archive

Re: PDF, DRM, and "open" formats, part 1



jon said:
>    The document structure represented by the bunch of text 
>    is lost in a machine readable sense

that depends on how smart your machine is.


>    although humans, when they understand the language 
>    of the text, are pretty good at inferring what's what.

it's not so much language that helps us "infer what's what",
but typographic conventions.


>    (For example, how do we know what is a header? 

headers -- by their very nature -- are quite easy to recognize,
because they are _created_ so as to be as obvious as possible,
meaning they're usually big, and bold, with lots of white space.
you don't need to "understand the language" to recognize 'em.
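a minimal sketch of that point, assuming a layout extractor already hands us each text block's font size, boldness, and the white space above and below it (the `Block` record and the 1.2x and 1.5x thresholds are hypothetical, picked for illustration). the function never looks at the words at all:

```python
from dataclasses import dataclass


@dataclass
class Block:
    # hypothetical record of one text block, as a layout extractor might report it
    text: str
    font_size: float    # points
    bold: bool
    space_above: float  # points of white space before the block
    space_below: float  # points of white space after the block


def looks_like_header(block, body_size, body_gap):
    """guess 'header' from typography alone, without reading block.text.

    body_size and body_gap are the typical font size and inter-block gap
    of ordinary body text on the same page (assumed to be known already).
    """
    bigger = block.font_size >= 1.2 * body_size  # assumed threshold
    roomy = min(block.space_above, block.space_below) >= 1.5 * body_gap  # assumed
    # big, bold, lots of white space: any two of the three cues count as a header
    return sum([bigger, block.bold, roomy]) >= 2


title = Block("Chapter One", 18.0, True, 36.0, 24.0)
body = Block("It was a dark and stormy night...", 11.0, False, 6.0, 6.0)
print(looks_like_header(title, 11.0, 6.0), looks_like_header(body, 11.0, 6.0))
```

the "two out of three" rule is just one way to tolerate headers that skip a cue (say, big but not bold).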


>    What is an ordinary paragraph, 

it usually starts with an indented-and-justified line,
continues with nonindented-and-justified lines,
and ends with a nonindented-nonjustified line...

again, you don't need to "understand the language"
in order to recognize one.


>    what is verse, 

lines indented-and-ragged.

easy enough to recognize, even 
without understanding the language.
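the paragraph and verse cues above can be sketched as a rule over line geometry only. this assumes we know each line's left and right edge relative to the column, plus the column width ("measure"); the `Line` record, the 6-point indent, and the 2-point tolerance are hypothetical values chosen for illustration:

```python
from dataclasses import dataclass


@dataclass
class Line:
    # hypothetical per-line geometry, in points from the left margin of the column
    left: float
    right: float


def classify_block(lines, measure, indent_min=6.0, tol=2.0):
    """label a block 'paragraph', 'verse', or 'unknown' from line shape alone.

    a line counts as justified if its right edge reaches the measure
    within tol points; it counts as indented if it starts at least
    indent_min points in from the margin (both thresholds are assumptions).
    """
    if not lines:
        return "unknown"
    indented = [ln.left >= indent_min for ln in lines]
    justified = [ln.right >= measure - tol for ln in lines]
    # paragraph: indented first line, flush-left lines after it,
    # every line justified except a short last line
    if (indented[0] and not any(indented[1:])
            and all(justified[:-1]) and not justified[-1]):
        return "paragraph"
    # verse: every line indented, right edges ragged
    if all(indented) and not all(justified):
        return "verse"
    return "unknown"


prose = [Line(18, 400), Line(0, 400), Line(0, 400), Line(0, 250)]
poem = [Line(36, 220), Line(36, 260), Line(54, 200)]
print(classify_block(prose, 400), classify_block(poem, 400))
```

a one-line paragraph comes back "unknown" here, since it can't be both justified and short; a real tool would need a fallback for that case.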


>    what is a figure caption, 

small type, usually located right underneath a figure.

no need to understand the language.

photo-credits, on the other hand, usually run perpendicular,
up one side of the photo, and are in even-smaller type.

but again, no need to understand the language.
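same idea for captions and photo credits, sketched under the assumption that we have bounding boxes (y growing down the page), font sizes, and text rotation for each run. the `Box`/`TextRun` records, the 24-point gap, and the 0.8x "even-smaller" threshold are all hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Box:
    # hypothetical bounding box in page coordinates; y grows downward
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass
class TextRun:
    box: Box
    font_size: float
    rotation: int  # degrees: 0 = horizontal, 90 or 270 = running up/down the page


def classify_near_figure(run, figure, body_size, gap=24.0):
    """return 'credit', 'caption', or 'other' for a text run near a figure box."""
    r, f = run.box, figure
    small = run.font_size < body_size
    smaller = run.font_size < 0.8 * body_size  # assumed "even-smaller" cutoff
    # credit: tiny rotated type alongside the figure's left or right edge
    overlaps_vertically = r.y0 < f.y1 and r.y1 > f.y0
    beside = abs(r.x1 - f.x0) <= gap or abs(r.x0 - f.x1) <= gap
    if smaller and run.rotation in (90, 270) and overlaps_vertically and beside:
        return "credit"
    # caption: small horizontal type starting just below the figure
    overlaps_horizontally = r.x0 < f.x1 and r.x1 > f.x0
    below = 0 <= r.y0 - f.y1 <= gap
    if small and run.rotation == 0 and overlaps_horizontally and below:
        return "caption"
    return "other"


photo = Box(100, 100, 400, 300)
caption = TextRun(Box(100, 310, 400, 330), 8.0, 0)
credit = TextRun(Box(402, 150, 410, 300), 6.0, 90)
text = TextRun(Box(100, 500, 400, 520), 10.0, 0)
print([classify_near_figure(t, photo, 10.0) for t in (caption, credit, text)])
```

the credit check runs first because a credit is also "small type near a figure", so it has to be ruled out before the looser caption test.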

>    etc.?)

etc.


>    Very complex PDF layouts (the "space shuttle cockpit" 
>    type of layouts -- such as the "eye candy" complex textbooks 
>    with myriads of sidebars, footnotes, dual columns, 
>    lots of figures and figure captions, varying typography, etc.
>    -- all to keep 21st century kiddies happy) require a lot of 
>    human intervention to properly untangle the whole mess.

these layouts might be _busy_, but they're not "complex".

our "21st-century kiddies" can sort out all of your features,
precisely because those kids are now visually-sophisticated,
even though far too many of them _cannot_even_read_...

your analysis is just plain wrong.

-bowerbird