Re: Paul Duguid article: "Limits of self-organization: Peer
- From: John Mark Ockerbloom <ockerblo@[redacted]>
- Subject: Re: Paul Duguid article: "Limits of self-organization: Peer
- Date: Tue, 12 Dec 2006 14:34:32 -0500
Bowerbird@[redacted] wrote:
> john said:
>
>> If someone can provide me page-addressable interactive scans of
>> a substantial part of the Gutenberg collection (say, 1000 titles or more,
>> on a reasonable trajectory to growing with the PG collection), and
>
> distributed proofreaders could do that.
>
> they're probably the only entity that can.
Well, I suppose someone could go through the books, see which ones are
credited to outside projects (MOA, Canadiana Online) and send me a file of
links to the page images there. It wouldn't support your demo (unless you
then sucked in images from those projects) but it would meet my requirements
if there were enough of them.
As another possibility, I suppose one could find another useful big
collection with OCR-quality images and work with that. To date,
I've been thinking in terms of Gutenberg, but they're not the only
project that one might do interesting alternative interfaces for.
> the reason we're talking past each other, john, is because
> you're saying "i've got pointers to books, and i'd like to add
> to each one a pointer to its scanset". fine, but my counter is
> "a pointer to the book should _be_ a pointer to the scan-set."
Well, that's what some folks want, but not others. In my experience,
folks want a variety of formats for online and offline reading. What I'd
like to do with my Gutenberg cover pages is give more options. Integrated
text-and-scansets are a nice option, but not the one everyone will pick.
(And for good reason, particularly if they want to read them offline without
waiting for your offline viewer program to become generally available
and supported on whatever device they're using to read books.)
One of the reasons I have
Gutenberg cover pages in the first place instead of just linking to a
particular Gutenberg format is that I found out that people wanted choice
in formats, and it was relatively easy for me to give them that choice, given
the regularity in Gutenberg ebook filenames and a little programming on my
end. (It also provides some redundancy in case one of the Gutenberg servers
goes offline; I have links to four others besides the main gutenberg.org
servers. That is useful in itself, since gutenberg.org still sometimes
refuses connections under high traffic.)
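
To illustrate the point about filename regularity, here is a minimal sketch of
how cover-page links might be generated for several formats across several
mirrors. It is not the actual code behind the cover pages. The mirror hosts are
placeholders, and the directory layout and format suffixes are simplified
assumptions about how Gutenberg files are named; the function names are
hypothetical.

```python
# Placeholder mirror hosts (assumptions, not real Gutenberg mirrors).
MIRRORS = [
    "https://mirror-a.example.org/gutenberg",
    "https://mirror-b.example.org/gutenberg",
]

# Assumed filename suffix for each format, appended to the ebook number.
FORMATS = {
    "Plain text (ASCII)": ".txt",
    "Plain text (8-bit)": "-8.txt",
    "HTML": "-h.htm",
}


def ebook_dir(number: int) -> str:
    """Return the assumed directory for an ebook number.

    Assumed layout: every digit except the last becomes a directory level,
    followed by the full number, e.g. 12345 -> "1/2/3/4/12345".
    Single-digit numbers are assumed to live under "0/".
    """
    digits = str(number)
    if len(digits) == 1:
        return f"0/{digits}"
    return "/".join(digits[:-1]) + f"/{digits}"


def cover_page_links(number: int, mirrors=MIRRORS, formats=FORMATS) -> dict:
    """Map each format label to one candidate URL per mirror."""
    base = ebook_dir(number)
    return {
        label: [f"{mirror}/{base}/{number}{suffix}" for mirror in mirrors]
        for label, suffix in formats.items()
    }


if __name__ == "__main__":
    for label, urls in cover_page_links(12345).items():
        print(label)
        for url in urls:
            print("   ", url)
```

Because every link is derived from the ebook number alone, adding another
mirror or format only means adding one entry to a table, which is what makes
offering both choice and redundancy cheap.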
There's also evidence that lots of folks like the additional converted
formats that Manybooks provides for many Gutenberg texts. (I say that because
I often see links now to the Manybooks editions rather than the Gutenberg
originals.) So I'd like to add them too, if there's an easy way to do that.
(I've dropped a note to the site maintainer; I'll see what comes of that.)
> my idea of the _architecture_of_the_infrastructure_ is that
> the text and the scan it came from should be intertwined...
It's an interesting idea, and I hope we'll get the chance to try it out.
But, to continue the architectural metaphor, I think I'll start by
accommodating an addition to the existing architecture, rather than
tearing down the existing house in the hopes a better one will be provided
to replace it. And we'll see over time whether the addition is the part
that holds up the best.
John