Re: PDF, DRM, and "open" formats, part 1
- From: Bowerbird@[redacted]
- Subject: Re: PDF, DRM, and "open" formats, part 1
- Date: Fri, 12 May 2006 12:24:29 EDT
jon said:
> The document structure represented by the bunch of text
> is lost in a machine readable sense
that depends on how smart your machine is.
> although humans, when they understand the language
> of the text, are pretty good at inferring what's what.
it's not so much language that helps us "infer what's what"
as typographic conventions.
> (For example, how do we know what is a header?
headers -- by their very nature -- are quite easy to recognize,
because they are _created_ so as to be as obvious as possible,
meaning they're usually big, and bold, with lots of white space.
you don't need to "understand the language" to recognize 'em.
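a rough sketch of that point in python, assuming some earlier step has already measured font size, boldness, and surrounding white space for each line. the `Line` fields and the thresholds are made up for illustration; they don't come from any real pdf library.

```python
from dataclasses import dataclass


@dataclass
class Line:
    text: str           # carried along, but never inspected below
    font_size: float    # in points
    bold: bool
    space_above: float  # white space above the line, in points
    space_below: float  # white space below the line, in points


def looks_like_header(line: Line, body_size: float = 10.0) -> bool:
    """guess 'header' from typography alone, without reading the words."""
    big = line.font_size >= 1.2 * body_size
    roomy = (line.space_above >= body_size
             and line.space_below >= 0.5 * body_size)
    # hypothetical rule: any two of the three cues (big, bold, roomy)
    return sum([big, line.bold, roomy]) >= 2
```

the text could be in any language, or gibberish, and the answer would be the same.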
> What is an ordinary paragraph,
it usually starts with an indented-and-justified line,
continues with nonindented-and-justified lines,
and ends with a nonindented-nonjustified line...
again, you don't need to "understand the language"
in order to recognize one.
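the same rule as a small python sketch, assuming each line has been reduced to two measurements: its left indent, and whether it runs all the way to the right margin. the `LineShape` name and that representation are assumptions for the example.

```python
from typing import List, Tuple

# (left_indent, reaches_right_margin) for one line of text
LineShape = Tuple[float, bool]


def split_paragraphs(lines: List[LineShape]) -> List[List[int]]:
    """group line indexes into paragraphs using indent and justification only."""
    paragraphs: List[List[int]] = []
    current: List[int] = []
    for i, (indent, full) in enumerate(lines):
        # an indented line opens a new paragraph
        if indent > 0 and current:
            paragraphs.append(current)
            current = []
        current.append(i)
        # a line that stops short of the right margin closes the paragraph
        if not full:
            paragraphs.append(current)
            current = []
    if current:
        paragraphs.append(current)
    return paragraphs
```

checking both cues covers the case where a last line happens to fill the measure; the next indent still marks the break.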
> what is verse,
lines indented-and-ragged.
easy enough to recognize, even
without understanding the language.
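as a python sketch, assuming a block of lines where each line is a pair of (left indent, reaches the right margin). the `min_ragged` cutoff is an arbitrary illustrative value.

```python
from typing import List, Tuple


def looks_like_verse(lines: List[Tuple[float, bool]],
                     min_ragged: float = 0.8) -> bool:
    """verse: every line indented, and nearly all of them ragged on the right."""
    if not lines:
        return False
    all_indented = all(indent > 0 for indent, _ in lines)
    ragged = sum(1 for _, full in lines if not full)
    return all_indented and ragged / len(lines) >= min_ragged
```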
> what is a figure caption,
small type, usually located right underneath a figure.
no need to understand the language.
photo-credits, on the other hand, usually run perpendicular,
up one side of the photo, and are in even-smaller type.
but again, no need to understand the language.
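both cases in one python sketch, assuming a layout step has already recorded each text box's type size, rotation, and position relative to the nearest figure. the `TextBox` fields are hypothetical, and the "even-smaller" type of a credit is only approximated here as "smaller than body text".

```python
from dataclasses import dataclass


@dataclass
class TextBox:
    font_size: float     # in points
    rotation: int        # degrees; 0 is upright, 90 or 270 runs along a side
    below_figure: bool   # sits directly underneath a figure
    beside_figure: bool  # sits along the left or right edge of a figure


def classify_near_figure(box: TextBox, body_size: float = 10.0) -> str:
    """label a text box by size, rotation, and position; the words are never read."""
    small = box.font_size < body_size
    if small and box.rotation in (90, 270) and box.beside_figure:
        return "photo-credit"
    if small and box.rotation == 0 and box.below_figure:
        return "caption"
    return "other"
```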
> etc.?)
etc.
> Very complex PDF layouts (the "space shuttle cockpit"
> type of layouts -- such as the "eye candy" complex textbooks
> with myriads of sidebars, footnotes, dual columns,
> lots of figures and figure captions, varying typography, etc.
> -- all to keep 21st century kiddies happy) require a lot of
> human intervention to properly untangle the whole mess.
these layouts might be _busy_, but they're not "complex".
our "21st-century kiddies" can sort out all of your features,
precisely because those kids are now visually-sophisticated,
even though far too many of them _cannot_even_read_...
your analysis is just plain wrong.
-bowerbird