Book People Archive

Re: Duplicate File Names



jrusk@[redacted] wrote:
> 
> Considering the number of etexts in Project Gutenberg it is
> not a surprise that there are some duplicate file names. In
> not a surprise that there are some duplicate files names. In
> order to help others who may be creating indexes to the
> Gutenberg etexts, here are the duplicates I've found:

I'm one of the people who maintain an index of Gutenberg etexts,
and I don't see a problem with Gutenberg reusing the same "leaf"
filename (that is, one that's the same except for the directory it's in),
as long as they're filed in different years, as they are
in all the examples given in your note.  Different years go in
different directories in most Gutenberg archives, so there's no
conflict.

Basically, I consider the year to be part of the file identifier,
so for example,

   etext00/lteng10

is the file identifier for Voltaire's "Letters on England", and

   etext99/lteng10

is the file identifier for Bancroft's "Letters from England".

(Actually, the main identifier I use for indexing
is the etext number, rather than the filename.)  
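For anyone building an index, here is a minimal sketch of treating the year directory as part of the identifier. It assumes paths always have the form "<year directory>/<leaf>", as in the two examples above. The function name file_identifier and the dictionary layout are hypothetical illustrations, not anything Gutenberg provides. Because the key is the (year directory, leaf) pair, both lteng10 files coexist without conflict:

```python
from pathlib import PurePosixPath

def file_identifier(path: str) -> tuple[str, str]:
    """Split a path like 'etext00/lteng10' into (year directory, leaf name)."""
    p = PurePosixPath(path)
    return (p.parent.name, p.name)

# Hypothetical index keyed by the composite (year directory, leaf) identifier,
# so the same leaf name in different years maps to different entries.
index = {
    file_identifier("etext00/lteng10"): 'Voltaire, "Letters on England"',
    file_identifier("etext99/lteng10"): 'Bancroft, "Letters from England"',
}

for key, title in sorted(index.items()):
    print(key, title)
```

Keying on the etext number instead, as described above, sidesteps the question entirely; the composite key only matters when the filename itself is used for lookup.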

For me, it only becomes a problem if updates to these etexts are placed
in different years from the original etexts (which occasionally
happens, though not yet in a way that's caused a collision), or if
the same "leaf" filename is used twice in the same year for two
different etexts.
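The same-year case can be checked mechanically. The sketch below is hypothetical. It assumes a catalog of (etext number, path) pairs with paths of the form "<year directory>/<leaf>", and the etext numbers and the leaf "abcde10" are made up for illustration. It groups entries by (year directory, leaf) and reports any identifier shared by more than one etext number. It does not detect the other problem case, an update filed under a different year from its original.

```python
from collections import defaultdict
from pathlib import PurePosixPath

def find_collisions(records):
    """Return {(year_dir, leaf): sorted etext numbers} for identifiers used by 2+ etexts."""
    seen = defaultdict(set)
    for etext_no, path in records:
        p = PurePosixPath(path)
        seen[(p.parent.name, p.name)].add(etext_no)
    # Keep only identifiers that point at more than one distinct etext.
    return {key: sorted(nums) for key, nums in seen.items() if len(nums) > 1}

# Hypothetical catalog entries: (etext number, path). Numbers are made up.
records = [
    (1001, "etext99/lteng10"),
    (2002, "etext00/lteng10"),  # same leaf, different year: no conflict
    (3003, "etext00/abcde10"),
    (3004, "etext00/abcde10"),  # same leaf, same year: a real collision
]

print(find_collisions(records))
```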

Mind you, if Gutenberg intends the part after the year to be
unique, then there is a consistency problem. But I haven't seen
them say that; if they do, I'd be interested in knowing where.

The only duplication I've had problems with from Gutenberg is when
they occasionally issue two versions of the exact same work, without
any clear distinction intended (or noted) between source editions.
But that's not very common.

John