reCAPTCHA
April 19, 2008 | 4:58 am
Unless you haven’t used the internet in the past 3 years or so, you are probably well acquainted with CAPTCHAs already: they are the distorted letters or numbers you see on blogs and signup forms that are supposed to prevent spam. They are a product of research by the “human-based computation” / cryptography researcher Luis von Ahn at Carnegie Mellon University.
CAPTCHAs solve a simple problem: spammers can write small programs that scan for input forms and fill them out automatically and repeatedly. If you run your own blog, you have probably noticed this (spam usually isn’t written by people). The answer is also simple: if spammers use programs to propagate spam automatically, make part of the form something that computers cannot recognize. So a CAPTCHA generates a sequence of characters and distorts it so that even the program that generated it could not read the resulting image (and hence other programs should not be able to read it either), and puts it on the form. We, being mostly human, are able to read that distorted text, and by solving these little problems we assure the web application receiving the form that we are indeed humans (not spam applications).
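To make the flow above concrete, here is a minimal sketch of the server side in Python. It is not how any particular CAPTCHA library works: the function names `new_challenge` and `is_human` are made up for illustration, and the step that actually distorts the text into an image is left out.

```python
import hmac
import secrets
import string

# Characters the challenge is drawn from (an arbitrary choice for this sketch).
ALPHABET = string.ascii_uppercase + string.digits


def new_challenge(length=6):
    """Pick a random string; the server remembers it and shows a distorted image of it."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def is_human(expected, submitted):
    """Accept the form only if the typed text matches the remembered challenge."""
    typed = submitted.strip().upper()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.upper().encode(), typed.encode())
```

The server would keep the string from `new_challenge()` in the visitor's session, render it as a distorted image, and call `is_human()` when the form comes back.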
Needless to say, the idea of CAPTCHAs was a big hit. These things sprang up all over the web and are now used everywhere. Recent studies have estimated how many seconds the average web surfer spends each day solving CAPTCHAs. Although the amount is small for a single person, aggregating it over all internet users gives a huge number of wasted “human cycles”. So Luis decided to put that time to use.
As many of you know, there is a huge effort to digitize old books for which no electronic copies exist. Digitizing these books relies on a process called OCR (optical character recognition), which also ships with your average home scanner. Unfortunately, OCR is a bit error-prone and some words are not recognized correctly. However, given the awesome power of recognition that people possess, we can read words that OCR programs cannot. But who would be willing to sit in an office all day “recognizing” words? This is a mundane job, to say the least. Can we somehow outsource this job to the masses (without paying them)?
Introducing reCAPTCHA
Connecting the dots seems easy now: re-engineer the CAPTCHA, and out comes reCAPTCHA (again by Luis and his team). A reCAPTCHA is just like your ordinary CAPTCHA, but instead of one distorted word you now have two. The server already knows the answer for one of these words; the other is a word the OCR program could not recognize. The thought is: if the entity filling out the form knows the answer for the word that I already know, then it is highly likely that it also knows the answer for the other word that I don’t know.
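The two-word check can be sketched in a few lines of Python. This is only an illustration of the idea as described here, not the actual reCAPTCHA service: the names `check_recaptcha`, `best_guess` and the in-memory `guesses` store are hypothetical, and tallying several people's answers for the same unknown word is my own assumption about how one might guard against typos.

```python
from collections import Counter, defaultdict

# Hypothetical in-memory store: unknown-word id -> tally of typed transcriptions.
guesses = defaultdict(Counter)


def normalize(text):
    """Ignore case and surrounding whitespace when comparing words."""
    return text.strip().lower()


def check_recaptcha(control_answer, unknown_id, typed_control, typed_unknown):
    """Pass the form only if the known (control) word was typed correctly.

    When it was, trust the same person's reading of the unknown word
    enough to record it as one vote.
    """
    if normalize(typed_control) != normalize(control_answer):
        return False
    guesses[unknown_id][normalize(typed_unknown)] += 1
    return True


def best_guess(unknown_id):
    """Return the most frequently typed transcription so far, or None."""
    tally = guesses.get(unknown_id)
    if not tally:
        return None
    return tally.most_common(1)[0][0]
```

For example, if three visitors pass the control word and type "overlooks", "overlooks" and "overlocks" for the same unknown word, `best_guess` returns "overlooks".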
So now, every time you fill out one of these reCAPTCHA forms (or comment on this blog), you are helping digitize one word. If this spreads widely on the internet, we can have our old libraries digitized in no time.
Anyway, after this long introduction and motivation, I would just like to announce that I am adding reCAPTCHA to my blog. There is a ready-made WordPress plugin for it, so I urge you all to add it if you can.











