Tuesday, May 29, 2007

"ReCAPTCHA" your forms and email while helping the world to read


Computer scientists from the School of Computer Science at Carnegie Mellon University, with support from Intel, Novell and the MacArthur Foundation, have put together a really nifty service that kills two birds with one click of the submit button.

reCAPTCHA is a free web service that anyone can use to stop spammers and bots from posting to your web forms, or from scraping the mailto links on your pages to send you spam. They do a really nice job of explaining their service and the whole subject of CAPTCHAs in general, but I'll try to sum it up.

To submit a form or reveal an email link on a web page, you must enter two words presented as images that have been slightly obfuscated so that only a human brain is likely to make them out. One of the words is known to the system, and if you enter it correctly, the submission is considered to have been made by a human and not some spambot. The other word is not known to the system, but if you entered the known word correctly, the system makes the reasonable assumption that your entry for the other word is also likely correct (that is, it was entered by someone with a brain) and compares it with other people's entries for that same word to determine what the unknown word really is.
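A toy Python sketch of the logic just described: accept a submission only when the control word matches, and only then count the guess for the unknown word as a vote. The names `KNOWN_WORDS`, `check_challenge` and `consensus`, and the "most common guess with at least three votes wins" rule, are all hypothetical; the real reCAPTCHA backend may store and weigh votes quite differently.

```python
from collections import Counter, defaultdict

# Hypothetical in-memory stores, for illustration only.
KNOWN_WORDS = {"c1": "morning"}   # challenge id -> the control word's answer
votes = defaultdict(Counter)      # unknown-word id -> tally of human guesses


def check_challenge(challenge_id, unknown_id, control_guess, unknown_guess):
    """Accept the submission only if the control word matches; in that case
    record the guess for the unknown word as one vote."""
    expected = KNOWN_WORDS[challenge_id]
    if control_guess.strip().lower() != expected:
        return False  # probably a bot (or a typo): reject and record nothing
    votes[unknown_id][unknown_guess.strip().lower()] += 1
    return True


def consensus(unknown_id, min_votes=3):
    """Return the most common guess once it has at least min_votes, else None."""
    tally = votes[unknown_id]
    if not tally:
        return None
    word, count = tally.most_common(1)[0]
    return word if count >= min_votes else None
```

A wrong control word (say "mourning") rejects the submission and leaves the tally untouched, so a bot guessing blindly can't pollute the vote for the unknown word.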

OK, well, so what? Well, those words are very important. They come from texts that have been scanned and run through OCR (optical character recognition) software but that the software could not confidently recognize and match to their text equivalents. So by using the reCAPTCHA service, you are helping to digitize books from the Internet Archive. With each accurate challenge response entered by a human (or a very lucky monkey), a hard-to-decipher piece of text from a book gets that much closer to being transcribed.

Simple but clever solution, eh? Turing would be proud.

It's sort of like SETI@home except instead of sharing processors to hunt for alien messages from space, they're farming human neurons to decode messages - in the form of books - from our own earthbound species. It's a real-time human neural net, a collectively conscious form of web-based wetware!

Some important things to note about how this service differs from many others:
  • The widget offers a reload button so if you can't identify the two words you can try another two.
  • The widget provides an audio CAPTCHA, so it's accessible to the visually impaired. A series of numbers is read out with a slight fuzzing noise in the background (an aural equivalent of the visual obfuscation in the image version), and you enter the numbers instead of the words.
  • The same widget GUI is available in a bonus feature, reCAPTCHA Mailhide, that lets you hide your mailto links via a popup window linked to the obfuscated email address.
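
Mailhide's trick is to show only part of the address, with the rest hidden behind a link that opens the CAPTCHA popup. Here's a rough Python sketch of that display side only. The reveal URL and the prefix-length rule are hypothetical placeholders; the real Mailhide service generates its own encrypted link, which this sketch does not reproduce.

```python
import html

# Hypothetical reveal URL; the real Mailhide service builds its own encrypted link.
REVEAL_URL = "https://example.com/reveal?id={token}"


def obfuscated_mailto(email, token):
    """Show a short prefix of the user name plus the domain. The "..." link
    opens a popup where a visitor would solve a CAPTCHA to see the address."""
    user, domain = email.split("@", 1)
    # Assumed rule: reveal 1 character of short names, 3 of longer ones.
    prefix = user[:1] if len(user) <= 4 else user[:3]
    url = REVEAL_URL.format(token=token)
    return (
        '{prefix}<a href="{url}" '
        "onclick=\"window.open(this.href, '', 'width=500,height=300'); return false;\" "
        'title="Reveal this e-mail address">...</a>@{domain}'
    ).format(prefix=html.escape(prefix), url=html.escape(url),
             domain=html.escape(domain))
```

So `obfuscated_mailto("johnsmith@example.com", "abc123")` renders as "joh...@example.com" on the page, and the full user name never appears in the HTML for a scraper to harvest.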
Most of the software on the backend is FOSS, heavy on the Python side. But there are plugins and libraries for wiring up PHP, Python, Ruby, Rails, Perl and some of the more popular bulletin board, blogging, and publishing tools out there, like WordPress, phpBB, and Movable Type, to name just a few. I'm sure the list will grow.

In fact, I wanted support in both Rails and Django for controlling the widget's CSS theme and tabindex, per the "Look and Feel" section of the API, so I contributed a little code of my own to Jason L. Perry's most excellent Rails plugin and Ben Maurer's handy Python library. (The functionality is already in the Rails plugin and will probably be available shortly in the Python version.)
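The "Look and Feel" options work by declaring a JavaScript `RecaptchaOptions` object before the widget's own script tag. A minimal Python sketch of a helper that emits that snippet for a theme and tabindex; `recaptcha_options_script` is a hypothetical name, not a function in Ben's library or the Rails plugin, and other options are left out.

```python
import json


def recaptcha_options_script(theme=None, tabindex=None):
    """Build the <script> that declares RecaptchaOptions. It has to appear
    before the reCAPTCHA widget's script tag to take effect."""
    options = {}
    if theme is not None:
        options["theme"] = theme
    if tabindex is not None:
        options["tabindex"] = int(tabindex)
    if not options:
        return ""  # nothing to customize: the widget uses its defaults
    return ('<script type="text/javascript">\n'
            "var RecaptchaOptions = %s;\n"
            "</script>" % json.dumps(options, sort_keys=True))
```

For example, `recaptcha_options_script(theme='white', tabindex=5)` declares `{"tabindex": 5, "theme": "white"}`. Serializing with `json.dumps` keeps the values correctly quoted for JavaScript instead of hand-building the string.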

There is a reCAPTCHA Google Group for questions and support, and you can check out some of the plugin and library code with Subversion from Google Code.

Kudos to the folks at CMU for such a nice implementation and to all the developers who have been contributing plugins for it. I'm going to recommend this solution to my clients, as it's the best I've seen to date.

Please help spread the word about this great new service and contribute code to help support their effort.

Death to spam!!!

[UPDATE]
I've just checked in the same widget styling functionality to McClain Looney's recaptcha Ruby gem. The nice thing about McClain's solution is that it can also be used outside of Rails. It also supports Mailhide nicely.

(In the process McClain introduced me to Mercurial, a lightweight, distributed SCM written in Python. There's even a TextMate bundle for Mercurial.)

p.s. Here are the Wired article and Ben Maurer's blog post that first turned me on to this.

p.p.s. For you Django developers, here's an example of a contact form view (using newforms) built on Ben's library with my changes for CSS themes and tabindex.

settings.py:
...
# your reCAPTCHA API keys here...
RECAPTCHA_PUBLIC_KEY = 'xxxxxxxxxxxxxxxxxxxxxxxxx'
RECAPTCHA_PRIVATE_KEY = 'yyyyyyyyyyyyyyyyyyyyyyyyyyy'
...

forms.py:

from django import newforms as forms

class ContactForm(forms.Form):
    subject = forms.CharField(max_length=100)
    message = forms.CharField()
    sender = forms.EmailField()
    cc_myself = forms.BooleanField(required=False)

views.py:

from django.conf import settings
from django.http import HttpResponseRedirect
from django.shortcuts import render_to_response
from recaptcha.client import captcha
from citizencoder.blog.forms import ContactForm

def contact(request):
    captcha_error = ''
    # The second positional argument is use_ssl; theme and tabindex
    # come from my CSS theme/tabindex changes to Ben's library.
    captcha_html = captcha.displayhtml(settings.RECAPTCHA_PUBLIC_KEY,
                                       True, theme='white', tabindex=5)
    if request.method == 'POST':
        form = ContactForm(request.POST, auto_id=True)
        # .get() avoids a KeyError if the widget's fields are missing.
        captcha_response = captcha.submit(
            request.POST.get('recaptcha_challenge_field', ''),
            request.POST.get('recaptcha_response_field', ''),
            settings.RECAPTCHA_PRIVATE_KEY,
            request.META['REMOTE_ADDR'])
        if captcha_response.is_valid:
            if form.is_valid():
                # Form processing here...
                return HttpResponseRedirect('/blog/thanks/')
            else:
                captcha_error = ("Captcha was correct but you will need to "
                                 "reenter it because other form fields "
                                 "weren't correct.")
        else:
            #captcha_error = captcha_response.error_code
            # TODO: map captcha error_codes to different messages...
            captcha_error = "Invalid captcha entry. Please try again."
    else:
        form = ContactForm(auto_id=True)

    return render_to_response('blog/contact.html',
                              {'form': form,
                               'captcha_html': captcha_html,
                               'captcha_error': captcha_error})

contact.html:

{% extends "base.html" %}
{% block title %} Contact Me {% endblock %}
{% block content %}
...
<form method="post">
  <table>
    {{ form }}
  </table>
  {% if captcha_error %}
    <p>{{ captcha_error }}</p>
  {% endif %}
  <p>{{ captcha_html }}</p>
  <input type="submit" value="submit" />
</form>
{% endblock %}
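Under the hood, `captcha.submit` posts the private key, the visitor's IP, the challenge and the user's response to reCAPTCHA's verify server and reads back a short plain-text reply: `true` on the first line, or `false` followed by an error code on the second. The sketch below mimics that exchange without touching the network and fills in the view's TODO with one possible error-code-to-message table. The function names and messages are hypothetical; the field names and error codes are the ones I understand the verify API to use, so check them against the current API docs.

```python
from urllib.parse import urlencode

# Hypothetical friendly messages for the verify API's error codes.
ERROR_MESSAGES = {
    "incorrect-captcha-sol": "Invalid captcha entry. Please try again.",
    "invalid-site-private-key": "The captcha is misconfigured on this site.",
    "invalid-request-cookie": "The captcha challenge expired. Please try again.",
    "verify-params-incorrect": "The captcha request was incomplete. Please try again.",
    "recaptcha-not-reachable": "The captcha service is unreachable. Please try later.",
}
DEFAULT_MESSAGE = "Captcha verification failed. Please try again."


def build_verify_body(private_key, remote_ip, challenge, response):
    """URL-encode the four fields sent in the verify request's POST body."""
    return urlencode({
        "privatekey": private_key,
        "remoteip": remote_ip,
        "challenge": challenge,
        "response": response,
    })


def parse_verify_reply(body):
    """Return (is_valid, error_code) from the verify server's plain-text reply."""
    lines = body.splitlines()
    if lines and lines[0].strip() == "true":
        return True, None
    # Assumption: a reply with no second line yields no error code.
    error_code = lines[1].strip() if len(lines) > 1 else None
    return False, error_code


def error_message(error_code):
    """Map an error code to a user-facing message, with a generic fallback."""
    return ERROR_MESSAGES.get(error_code, DEFAULT_MESSAGE)
```

In the view, the TODO branch could then become `captcha_error = error_message(captcha_response.error_code)`, keeping the generic message for any code the table doesn't know about.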

1 Comment:

Rocco Pellegrini said...

Good!
It seems not real but only you made an example about recaptcha in python.
Thanks
Rocco Pellegrini

9:53 AM  
