Re: Paul Duguid article: "Limits of self-organization: Peer production and quality"
- From: Bowerbird@[redacted]
- Subject: Re: Paul Duguid article: "Limits of self-organization: Peer production and quality"
- Date: Wed, 29 Nov 2006 15:21:56 EST
john said:
> Paul Duguid has an interesting article in last month's First Monday,
> looking at quality control issues in peer production systems.
> The specific examples he looked at were Gracenote's CDDB,
> Wikipedia, and Project Gutenberg.
yes, it's a well-done article.
> The article makes the point well that peer production
> in itself does not guarantee either high quality or even
> progressively increasing quality, for various reasons.
duguid makes the point "well" because those "various reasons"
are very convincing ones, which is an important thing to state...
> The article talks about the problem of granularity with respect
> to Distributed Proofreaders, where proofing is done page by page,
> and says that doesn't work particularly well for editorial decisions
> that need to be applied uniformly across a book.
that's a good summation of that particular point, yes.
> He seems to have missed that DP also includes rounds where
> one person reviews the entire text for issues like formatting.
> (At least I think people look at the entire book in the formatting
> rounds; I'm having a hard time finding that said explicitly.)
you won't find that said explicitly, especially in regard to "rounds",
as the person who "looks at the entire book" is the "post-processor",
a job which is done _after_ the book is all finished with the rounds...
it's one of the "non-distributed" tasks at d.p., so it fits duguid's point
that such non-distributed, non-granular treatment is necessary too.
("content-providing" is the other main d.p. "non-distributed" task...)
in the "formatting rounds", it's still a page-at-a-time process, though
it's highly likely that on some books (and perhaps even many of them)
one person might indeed format a large number of consecutive pages,
occasionally even a whole book. but its design is still page-at-a-time.
after (one or) two "formatting rounds", in steps the "post-processor",
whose job it is to resolve any "book-wide" inconsistencies that remain.
it is because of this that many people over at distributed proofreaders
categorically rejected what duguid had to say, even though if you look
at what he has to say from an unbiased viewpoint, you'll see it is sound.
the detailed example he used -- "tristram shandy" -- is a book where
the digitizer(s) needed to know the _content_ of the book in order to
make the _editorial_ decisions that bore upon the task of digitization.
but even a "look-at-the-whole-book" mode won't help in that regard;
that is, the post-processor is _not_ expected to _read_ the book in full.
that falls to the (optional and rarely exercised) task of "smoothreader".
my model of "continuous proofreading" puts the _final_burden_ on
readers from the general public, and i think this is a better approach.
some errors can only be caught by a person who's reading for content;
so we need to make that an _explicit_ part of our bargain with the users.
but that's generally considered "outside the realm" of most d.p. work.
their attitude is that once they are "finished" with a book, it is _done_...
we should note that d.p. didn't actually digitize that "tristram shandy".
i'm not sure who did, but it was probably one person. so that example
doesn't have a _whole_ lot of bearing on the points duguid was making
in terms of "the laws of quality", which he _should_ surround in "quotes",
because he is disputing (or at least questioning) that they actually exist.
but _that_ is where duguid's _strongest_ point is. he argues that
our belief that "more eyeballs can solve the problems" blinds us to
doing a close examination on whether this "law of quality" is true...
he's saying that we need to entertain the possibility we are wrong.
and although you wouldn't think that is a controversial thing to say,
you'd be wrong in regard to the people at distributed proofreaders.
the way they have stuck their heads in the sand concerning this article
is all the proof you need that they are not just blind, but willfully so...
duguid makes other points -- how distributing a task might cause
difficulty in the communications arena, for instance, or uncertainty
surrounding tasks that are not clear-cut as to their responsibility,
or that there is no check for errors introduced late in the process --
that someone who is familiar with d.p. can see obviously playing out.
(feel free to ask me for specific examples if you're really interested;
please state explicitly whether you prefer brief or "extensive" ones.)
but since these points have already been made by other observers
(e.g., in the book titled "the mythical man-month", as one example),
i think his argument about a general blind-spot is his most potent.
he's reminding us to question our assumptions, always good advice.
but that's advice that distributed proofreaders has already forgotten.
> It might be worth giving more publicity and community discussion
> to the process of overall review of books, as that's an important part
> of ensuring quality.
it would be great to have such "publicity and community discussion",
except nobody seems to have any inclination to have that discussion.
i've tried to raise this issue here many times in the past, unsuccessfully,
primarily because the d.p. people here embroil me in mud when i try...
my guess is that very few people -- if indeed anyone -- will respond
with any intelligent commentary on this listserve. i'd like to be wrong
-- oh crap, i would _love_ to be wrong -- but i really doubt that i am.
the simple fact is that nobody seems to care. oh well...
> (Even if DP folks commonly discuss this internally
i have tried, on the gutvol-d listserve, to engage the d.p. people on
the points raised by this article, but they are uniformly uninterested.
it was discussed -- for a day -- in their own forums, where duguid was
attacked -- unfairly -- until they felt they'd discredited him sufficiently.
perhaps the title of the thread -- "bad press for p.g." -- is enough to
tell you how intently they processed the points duguid made so well...
> http://www.pgdp.net/phpBB2/viewtopic.php?t=22839
i see that juliet has now "replied" to this listserve, by pointing to her
summary comment in that forum thread. do please go and read it.
but her two-paragraph abstract here (which basically boils down to
(1) duguid's points do not apply and (2) we are all about the quality
already so nobody needs to concern themselves with our processes)
is an excellent encapsulation of the d.p. attitude toward "outsiders",
especially those who do not fawn over their bake-sale cookies (to use
a phrase -- and very sharp insight -- that duguid himself introduces).
and ya know what? i could even be comfortable with that antagonism,
except _too_many_ people seem to think of distributed proofreaders as
_the_ example of _how_ text digitization should be done. i'll grant you
that there are lots of appealing aspects to d.p., but there are also some
_huge_ flaws that it would simply be _tragic_ to see emulated elsewhere.
the idea that d.p. is indicative of "best practices" is just plain laughable...
(even in their _current_ state, and they were even worse in years past.)
in a nutshell, d.p. is _extremely_ inefficient, a fact that is largely masked
because of the very large number of volunteers contributing to its effort.
i've said before that if those volunteers knew exactly how much of their
time and energy is being wasted by an inefficient workflow, they'd leave.
and i'd certainly hope that anyone who might build a digitization effort
will have the good sense to conduct a range of efficiency experiments!
(and no, i do not think that a more efficient workflow would reduce the
commitment of volunteers. on the contrary, efficiency would boost it.
who wants to feel their contribution to a good cause is squandered?)
so make up your mind, d.p. if you want to be left alone, that's _fine_.
but then don't hold yourself up as a shining example of "best practice".
if, on the other hand, you want to be seen as a model of digitization,
then you're gonna have to get used to proving your claim empirically,
and be subjected to close scrutiny by outsiders who think otherwise...
> (Even if DP folks commonly discuss this internally,
> I suspect it's not really well known among outsiders,
> which may have an impact on the perception of quality.
i suspect that the low quality of some p.g. e-texts exerts a
much more significant "impact on the perception of quality"
than any article ever could.
> It may also be that the process of doing the whole-book review,
> and of certifying whole-book reviewers, could be improved
> as a result of a wider discussion.)
d.p. doesn't like to have "a wider discussion" of any of its processes.
the only talk they seem willing to do is inside their forums, where a
cult-like "aren't we all wonderful people doing great work!" prevails.
that's great for "bonding", but it sure doesn't improve their workflow.
(this cultish sense is also why "critical outsiders" get treated so badly.)
> I suspect that it would be useful to consider how high-quality
> information can be built not just in the scope of a single project, but
> in an environment where multiple projects stage off one another's work.
a vitally important rule in terms of that kind of workflow is that the
earlier stages not throw out information important to later stages...
here again, though, distributed proofreaders has a bad track record.
i have asked them in the past to retain the original p-book linebreaks,
as just one example. but they are unwilling to do so, even though they
themselves keep the linebreaks during their _own_ workflow, because
doing so makes their proofing task much easier. yet nonetheless,
they seem unable to imagine someone else would want the linebreaks.
as another example, i have asked them to include the _filename_ of
the graphic when they note an "[illustration]" in the plain-ascii e-text;
they actually do include the filename in the .html version, of course,
in the "img" tag, so they could just as easily keep it in the ascii version.
but instead, they toss it out. and that just seems bloody stupid to me,
since my viewer-program for the ascii file could show the illustration
if it simply _knew_ the name of the file that contains it. bloody stupid.
i'm not saying that d.p. is _always_ bad at "playing well with others".
here on this very list, for instance, last year, i asked how to download
scans from a d.p. project, and i was not only given that information,
but told how to do it in one easy step (rather than a few easy steps).
i do want to be fair and give d.p. all of the credit that it deserves,
even while i dish out the criticism that it also (fairly) deserves.
> I think there's an opportunity for careful thought here;
> designing projects to build off others' work, and be reused
> in yet more work, may well produce better results in the long run
> than designing projects to be completely self-contained,
> or in leaving inter-project interactions to chance.
i think it is _highly_ironic_ that you would say this, john, when --
in the course of some backchannel "discussions" we had recently
-- you discounted entirely what i had to say along this same line.
(although you never were able to meet a simple challenge i made.)
i think every project should do its utmost to make itself transparent
in regard to an end-user's ability to _remix_every_bit_ of its content,
and that most projects are falling short at present, sometimes _badly_.
(and i'm looking right at _you_, university of michigan, when i say that.)
so john, are you saying that you now agree with me?
-bowerbird
p.s. i've got a lot of specific reactions to duguid's article, and will be
making a series of posts to the gutvol-d listserve in the near future...
whether the d.p. people care to hear them or not. meanwhile, in the
(highly unlikely) event that d.p. people would like to discuss it _here_,
please let me know and i will be more than happy to be more specific
-- much more specific, with lots and lots of details -- than i have been
with the general thrust i have instead made in this particular message.
oh, and to be considerate of all the lurkers here, if any of you would
_object_ to this highly-detailed treatment (which can be "intricate" and
perhaps excruciatingly picayune, even to most obsessive-compulsives),
do please feel free to register your preference not to hear such a thread.
(but if you don't have the balls to say it publicly, don't whine backchannel
to the moderator, since i just tell him to tell you where your delete key is.)
[Moderator: Just for the record, I don't recall getting backchannel complaints
from any threads in recent memory concerning too detailed treatments.
We continue to make moderation decisions as we think best, and in our
list policies we've noted some qualities of submissions that can count
against their being accepted, but to-the-point details are definitely
not one of those qualities (assuming we're not talking huge data dumps).
Quite the opposite, in fact. - JMO]