Re: Paul Duguid article: "Limits of self-organization: Peer production and quality"
- From: Bowerbird@[redacted]
- Date: Fri, 8 Dec 2006 16:48:02 EST
john said:
> If folks see something suspicious
> but don't have the time or skills to
> download and unzip a big bunch of tiffs,
> they could just hit the "report error"
> link from the Gutenberg cover page.
> Then someone at the other end who
> *did* know how to open and use the scan set
> would be able to find it via the "see scan set" link
> and either make a correction or leave it as is.
any "system" that requires _anybody_ (average end-user or
p.g. bureaucrat) to "download and unzip a big bunch of tiffs"
is a massively bad failure of architecture from the word "go".
why download a whole book of scans to look at one page?
every one of the scans needs to be publicly addressable
with its own u.r.l., which should be easily determinable.
i really hate to keep pointing to the same set of files, but
darn it, i have already prototyped a model system for this:
> http://www.greatamericannovel.com/mabie/mabiep123.html
> http://www.greatamericannovel.com/myant/myantp123.html
> http://www.greatamericannovel.com/sgfhb/sgfhbp123.html
> http://www.greatamericannovel.com/tolbk/tolbkp023.html
> http://www.greatamericannovel.com/ahmmw/ahmmwp023.html
an important part of the developing web is the open a.p.i.,
the notion that you make it easy for others to access you...
surely in something like a _library_ -- where _access_ is
one of the key elements in the first place -- this practice
must be firmly and deeply embedded in the cornerstone.
make all the scans _available_ -- on an individual basis --
so people can point to them, grab them, remix them, etc.
public domain materials need to be given to the public!
if someone wanted to point to page 123 of "books and culture"
by hamilton wright mabie, here's the u.r.l. i would tell 'em to use:
> http://snowy.arsc.alaska.edu/bowerbird/mabie/mabiep123.jpg
likewise, pages 73 and 241 would be:
> http://snowy.arsc.alaska.edu/bowerbird/mabie/mabiep073.jpg
> http://snowy.arsc.alaska.edu/bowerbird/mabie/mabiep241.jpg
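the point of that naming scheme is that anyone can work out a page's u.r.l. without asking a server first. a minimal sketch in python, assuming the pattern visible in the examples above (book code, the letter "p", a page number zero-padded to three digits); the function names are hypothetical, and the padding rule is inferred from the sample u.r.l.s rather than documented anywhere:

```python
# base locations are copied from the example u.r.l.s in this message.
# the helper names and the three-digit padding rule are assumptions
# inferred from those examples, not a documented specification.
IMAGE_BASE = "http://snowy.arsc.alaska.edu/bowerbird"
HTML_BASE = "http://www.greatamericannovel.com"


def page_name(book: str, page: int) -> str:
    """build the shared file stem, e.g. ("mabie", 73) -> "mabiep073"."""
    if not 0 <= page <= 999:
        raise ValueError("the sample scheme only shows three-digit page numbers")
    return f"{book}p{page:03d}"


def page_image_url(book: str, page: int) -> str:
    """u.r.l. of the bare page scan."""
    return f"{IMAGE_BASE}/{book}/{page_name(book, page)}.jpg"


def page_html_url(book: str, page: int) -> str:
    """u.r.l. of the page as shown in the prototype."""
    return f"{HTML_BASE}/{book}/{page_name(book, page)}.html"


for page in (73, 123, 241):
    print(page_image_url("mabie", page))
```

given nothing but the book code and the printed page number, the loop reproduces the three image u.r.l.s listed above.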
in contrast, here's the u.r.l. you'd use for page 123 at umichigan:
> http://mdp.lib.umich.edu/cgi/m/mdp/pt?seq=129;id=39015016881628
never mind the b.s. that the "sequence" number is 129, and not 123
-- extrapolate from that to see how to get page 73 and page 241 --
because i have already documented exhaustively how stupid _that_ is.
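for comparison, a sketch of what the same lookup takes under the "sequence" scheme. the offset of 6 comes from the single data point above (seq 129 for page 123); that it stays constant through the whole book is purely an assumption, since unnumbered plates or inserts would shift it, and the function name is hypothetical:

```python
# viewer address copied from the umichigan example above.
UMICH_VIEWER = "http://mdp.lib.umich.edu/cgi/m/mdp/pt"


def umich_page_url(volume_id: str, page: int, offset: int) -> str:
    """guess the viewer u.r.l. for a printed page number.

    `offset` is the gap between the scan sequence number and the
    printed page number. it must be found by hand for each book,
    and is assumed (not known) to be constant within a book.
    """
    seq = page + offset
    return f"{UMICH_VIEWER}?seq={seq};id={volume_id}"


# 129 - 123 = 6, from the one example given above
MABIE_OFFSET = 129 - 123

for page in (73, 123, 241):
    print(umich_page_url("39015016881628", page, MABIE_OFFSET))
```

the extra `offset` argument is the whole problem: the caller has to know something about the book that the printed page number alone does not supply.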
even worse, this shows the _image_, but also the umichigan _interface_
that displays the image, along with a wide variety of other interface stuff.
that's not going to be acceptable to someone who wants just the _image_.
now, if you dig through the source .html for that page,
you will find that the _image_ is being called from here:
> http://mdp.lib.umich.edu/cache/39015/0/1/6/39015016881628/00000129.tif.50.2.png
and indeed, when i copied that u.r.l. and pasted it into a new browser window,
i _did_ get the image. but there's a gotcha. if you look at that u.r.l.
closely, you'll notice it contains "cache". the image is being generated
on-the-fly and then stored in a cache. that's because you can ask for
different image-sizes and rotations, which is a neat capability to have.
(you might notice that u.r.l. returns a half-size upside-down version of
the page.) but the upshot is that an end-user cannot just point to the
"cache" u.r.l. and expect that there will be an image there.
the reason i got the image for page 123 was because i had just generated it
and it was still in the cache. but _tomorrow_, or next week, it might not
still be there. likewise, when i changed the "129" in that u.r.l. to
another number, without first having generated the image on-the-fly by
going to its umichigan interface page, i got a 404-not-found error.
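the cache u.r.l. can at least be taken apart to see what it encodes. a rough sketch, with the caveat that the field meanings are guesses: "50" is read as a scale in percent and "2" as a rotation in quarter-turns, only because the image comes back half-size and upside-down. none of this is documented by umichigan, and the names `CachedPage` and `parse_cache_url` are made up for the example:

```python
import re
from dataclasses import dataclass
from typing import Optional

# pattern inferred from the single cache u.r.l. quoted above:
#   /cache/<directory digits>/.../<volume id>/<seq>.tif.<scale>.<rotation>.png
CACHE_PATTERN = re.compile(
    r"/cache/(?:\d+/)+(?P<volume_id>\d+)/(?P<seq>\d+)"
    r"\.tif\.(?P<scale>\d+)\.(?P<rotation>\d+)\.png$"
)


@dataclass
class CachedPage:
    volume_id: str
    seq: int        # scan sequence number, not the printed page number
    scale: int      # guessed: percent of full size
    rotation: int   # guessed: number of 90-degree turns


def parse_cache_url(url: str) -> Optional[CachedPage]:
    """return the fields of a cache u.r.l., or None if it does not fit."""
    match = CACHE_PATTERN.search(url)
    if match is None:
        return None
    return CachedPage(
        volume_id=match.group("volume_id"),
        seq=int(match.group("seq")),
        scale=int(match.group("scale")),
        rotation=int(match.group("rotation")),
    )


print(parse_cache_url(
    "http://mdp.lib.umich.edu/cache/39015/0/1/6/39015016881628/00000129.tif.50.2.png"
))
```

being able to parse the address does not make it stable: the file only exists after the viewer has generated it, so a constructed cache u.r.l. can still come back 404.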
now there's certainly got to be some way to reference the image all by itself
-- from the u.r.l., i'd guess there's a "master" 129.tif floating around
somewhere -- but umichigan hasn't made it clear to us how we might go
about doing that.
once again, the sine qua non of a library is _to_provide_access_.
if you've built an infrastructure that doesn't give it, you've failed.
and when you fail on a library that contains "billions of pages",
you have failed on a very grand scale indeed.
***
anyway...
as i said, i developed a prototype of a system.
if there's something that could be improved in my system,
let's talk about it and thrash it out. but for heaven's sake,
if not, then let's stop jabbering and simply _implement_it_.
-bowerbird