Book People Archive

Distributed proofreading at very fine grain



A computer science project in my old department at CMU uses
an interesting twist on CAPTCHA technology to tackle two
problems at once: fighting spam and improving the quality
of OCR'd online books.  Both problems call for humans to do
things that machines can't easily do by themselves: in this case,
confirming the correct transcription of text that the OCR
software was unsure about.

The project, known as reCAPTCHA, is working with text
captures from archive.org's books project.  It uses a voting
mechanism to deal with text that humans might (accidentally
or deliberately) misread.
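
To make the voting idea concrete, here is a rough Python sketch of
how several human readings of the same scanned word might be
reconciled.  This is only my illustration, not the project's actual
code: the function names, the thresholds (min_votes, min_share), and
the case-folding step are all assumptions made for the example.

```python
from collections import Counter


def normalize(answer: str) -> str:
    # Assumption: fold case and collapse whitespace so that trivial
    # differences do not split the vote.  A real system may well
    # want to preserve case.
    return " ".join(answer.split()).lower()


def consensus(answers, min_votes=3, min_share=0.6):
    """Return the agreed reading, or None if there is no clear winner.

    min_votes and min_share are hypothetical thresholds: the leading
    reading needs at least min_votes votes and at least min_share of
    all non-empty votes cast.
    """
    votes = Counter(normalize(a) for a in answers if a.strip())
    if not votes:
        return None
    best, count = votes.most_common(1)[0]
    total = sum(votes.values())
    if count >= min_votes and count / total >= min_share:
        return best
    return None


if __name__ == "__main__":
    # Three of four readers agree, so the reading is accepted.
    print(consensus(["morning", "Morning", "moming", "morning"]))
    # Too few votes so far, so the word stays undecided.
    print(consensus(["upon", "upou"]))
```

Requiring both a minimum vote count and a minimum share means one
careless or malicious reader cannot settle a word alone; an undecided
word would simply be shown to more people.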

Read about it (and some comments and critiques from readers)
on project member Ben Maurer's blog at

    http://bmaurer.blogspot.com/2007/05/recaptcha-new-way-to-fight-spam.html

John