Re: Why proofed and formatted digital text?
- From: "David Starner" <prosfilaes@[redacted]>
- Subject: Re: Why proofed and formatted digital text?
- Date: Mon, 27 Feb 2006 17:22:38 CST
On 2/25/06, David Feuer <david.feuer@[redacted]> wrote:
> Images don't have OCR and proofreading errors, and don't miss any subtle
> points.
Look at the HTML edition of <http://www.gutenberg.org/etext/17719>;
it's full of errors in the non-English text that have been noted and
corrected in the text edition.
> If the
> e-text is marked carefully with information about font, type size,
> hyphenation, and significant spacing, it's easy for someone to come in
> later and format the text in various ways,
No, not really. It's very hard to turn that type of information into
something useful for reformatting a text. In particular, why worry
about hyphenation? Unless you want a text version that looks exactly
like the original (and most people who want that prefer the actual
scans), it's not important.
> My philosophy is that the primary
> goal of PG should be creating texts that precisely record every
> significant element of each text,
It's not. And I wonder about your definition of significant. I'm much
more concerned about marking sections as foreign and what language
they're in than about copying the exact typography of the original.
It's more important to mark up speakers and stage directions in a play
than to note details about the fonts used, and only the speaker markup
will let you reformat it in a useful way.
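A minimal sketch of the point about speaker markup, assuming a made-up
in-memory representation. The `Line` type, the `play` sample, and both
renderers are hypothetical illustrations, not any PG or DP format. Once
speeches and stage directions are tagged, the same data can be laid out
in more than one way:

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Line:
    """One unit of a play (hypothetical structure, for illustration only)."""
    kind: str          # "speech" or "stage"
    text: str
    speaker: str = ""  # only meaningful when kind == "speech"


def render_plain(lines):
    """Plain-text layout: speaker in capitals, stage directions in brackets."""
    out = []
    for ln in lines:
        if ln.kind == "speech":
            out.append(f"{ln.speaker.upper()}. {ln.text}")
        else:
            out.append(f"[{ln.text}]")
    return "\n".join(out)


def render_html(lines):
    """HTML layout: bold speaker, italic stage directions."""
    out = []
    for ln in lines:
        if ln.kind == "speech":
            out.append(f"<p><b>{escape(ln.speaker)}.</b> {escape(ln.text)}</p>")
        else:
            out.append(f'<p class="stage"><i>{escape(ln.text)}</i></p>')
    return "\n".join(out)


# Sample data, invented for this example.
play = [
    Line("stage", "Enter Hamlet."),
    Line("speech", "To be, or not to be.", "Hamlet"),
]

print(render_plain(play))
print(render_html(play))
```

Nothing here depends on knowing which font or type size the printed
edition used; the structural tags alone drive both layouts.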
> and its secondary goal should be
> producing e-texts to read.
That is PG's primary goal. I don't think you'll get many people
involved in a project that has a more abstract goal.
> DP asks proofreaders to join hyphenated
> words when seemingly appropriate: I think that task (as well as page
> joining) should be left to post-processing, leaving the original
> formatting available.
But the pre-post-processed text isn't really available anywhere. And I
don't see the point of putting more weight on the post-processors, for
the one or two scholars who study hyphenation and might care where the
hyphens lay in the original.
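
For illustration only, here is a rough sketch of what rejoining
end-of-line hyphens in post-processing could look like while still
recording where each hyphen was. The function name `rejoin_hyphens` and
the sample lines are assumptions made for this example, not DP's actual
tooling. It joins every end-of-line hyphen blindly, so a genuinely
hyphenated word such as "well-" / "known" would be wrongly merged; that
judgment is what "when seemingly appropriate" leaves to a human.

```python
import re


def rejoin_hyphens(lines):
    """Join words split by an end-of-line hyphen.

    Returns (new_lines, breaks), where breaks is a list of
    (line_index, column) pairs giving the position in the joined line
    where the original line break fell.
    """
    lines = list(lines)
    breaks = []
    for i in range(len(lines) - 1):
        # A line ending in a word character followed by a hyphen.
        if re.search(r"\w-$", lines[i]):
            # Move the first word of the next line up to this one.
            head, _, rest = lines[i + 1].lstrip().partition(" ")
            lines[i] = lines[i][:-1] + head
            lines[i + 1] = rest
            breaks.append((i, len(lines[i]) - len(head)))
    return lines, breaks


# Sample input, invented for this example.
sample = ["It is easy to refor-", "mat the text later."]
joined, breaks = rejoin_hyphens(sample)
print(joined)
print(breaks)
```

Keeping the `breaks` list alongside the joined text is one way the
original hyphen positions could stay recoverable without making readers
see them.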