Re: on duplications
- From: Jon Noring <jon@[redacted]>
- Subject: Re: on duplications
- Date: Thu, 11 May 2006 15:37:54 MDT
Michael Hart wrote:
As usual, it's amazing how well Michael can right justify his plain
text messages!
> For those who think someone is bringing up a new topic here, the
> fact is that even before Project Gutenberg had 100 entries I had
> addressed the topic of including multiple editions and said that
> by the time we had 10,000 titles, I expected to have handfuls of
> editions of the greatest works, and dozen by the time of 100,000
> entries in our catalogs. Obviously we started quite early in an
> approach of this nature, with multiple editions of John Milton's
> Paradise Lost, two different sources of this great book were the
> entries numbered 20 and 26, not to mention Darwin, Shakespeare's
> Complete Works, Dante's Divine Comedy, etc., etc., etc.
I find the FRBR system of classification, what I call the "WEMI
System" (rhymes with "hemi") to be an excellent way to understand the
world of texts. WEMI is the easy-to-remember acronym for
Work-Expression-Manifestation-Item. For further info on FRBR, see:
http://www.ifla.org/VII/s13/frbr/frbr.pdf
Clearly, for a given Work, there can be multiple Expressions, and
each of these Expressions, even though representative of the Work, is
different enough from the others to have its own identity (e.g.,
different English language translations of some original Work). So
having multiple Expressions in a corpus is not duplication. But having
multiple Manifestations of the same Expression *is* duplication.
(Certainly, at times the boundary between Expression and Manifestation
is fuzzy, but then life is rarely black and white; it is a whole
rainbow of greys or colors -- we still speak of red, blue and green,
despite an infinity of colors in real life.)
How does the WEMI system work? Let's look at the classic example: Mary
Shelley's "Frankenstein." Mary Shelley wrote essentially two different
Expressions of the Work known as "Frankenstein" (some say three), each
with a quite distinct ending. The Work (an abstract concept) is the
same, but two different Expressions of that Work exist (to be clear,
a lot of Works have only one Expression).
In turn, each Expression has been printed at different times by
different publishers and maybe with a few minor text edits (such
as changing of punctuation to conform to the conventions of the
time) -- so we can have a plethora of Manifestations. Of course,
libraries may own one or more different printings on the shelves --
each actual book is an Item, and itself is unique in a physical
sense (e.g., one may have coffee stains on some pages.)
Work:           Frankenstein
Expressions:    1st Edition
                2nd Edition (different ending than 1st Edition)
Manifestations: Lots of different printings for each Expression
Item:           A particular book printing sitting on the library shelf.
(In the digital world, Item tends to lose its significance.)
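
For those who think better in code, here is a minimal Python sketch of
the Work-Expression-Manifestation hierarchy and of the duplication rule
described above (more than one Manifestation of the same Expression
counts as duplication; more than one Expression of a Work does not).
The class names, the duplicated_expressions() helper, and the
"printing A/B/C" labels are my own illustrative assumptions; they are
not part of FRBR or of any PG catalog format. Item is left out, since
it matters little for etexts.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Manifestation:
    """One concrete printing (or etext) of an Expression."""
    label: str


@dataclass
class Expression:
    """One distinct version of a Work, e.g. an edition or a translation."""
    label: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """The abstract Work, which exists only through its Expressions."""
    title: str
    expressions: List[Expression] = field(default_factory=list)


def duplicated_expressions(work: Work) -> List[str]:
    """Return labels of Expressions held in more than one Manifestation."""
    return [e.label for e in work.expressions if len(e.manifestations) > 1]


# Hypothetical holdings: the printing labels are placeholders, not real
# catalog entries.
frankenstein = Work(
    title="Frankenstein",
    expressions=[
        Expression("1st Edition",
                   [Manifestation("printing A"), Manifestation("printing B")]),
        Expression("2nd Edition", [Manifestation("printing C")]),
    ],
)

print(duplicated_expressions(frankenstein))
```

In this sketch, holding both editions is not flagged at all; only the
1st Edition is reported, because it is present in two printings.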
> Our policy has always been to list the same books from different
> sources as individual editions and then often to reconcile those
> editions into what is known as a "critical edition" of sorts.
Well, this is where you and I part company on what PG should be
doing. In my opinion, PG should focus on making etexts
authentic to the original paper copy, whatever it is. (Certainly, the
concept of "authenticity to the original" is itself somewhat fuzzy,
but that's a different thread.) Now, if someone wants to make a modern
"composite" edition by combining these various sources and also submit
that to PG, great! But then it should be identified in metadata for
what it is -- a modern composite of several sources -- and sit
alongside the other more authentic issues, rather than replacing them.
Fortunately, Distributed Proofreaders, the #1 supplier of new etexts
to PG, does believe in authentic transcriptions from the source
document.
Jon Noring