Monthly Archive: November 2006



Blog 24 Nov 2006 05:49 pm

Using WordPress…

I switched my personal blog over to WordPress. Drupal was a bit heavyweight, and I wanted to get some experience with WordPress. Hopefully all the posts moved over correctly (I know I still have some formatting tweaks to do).

Coding 08 Nov 2006 11:09 am

On Comment Spam and CAPTCHAs

This article on alternatives to graphical CAPTCHAs is pretty brief, but it reminded me of my own ideas for battling comment spam.

This Coding Horror entry inspired me to come up with some ways of doing validation that are easy for humans but potentially hard for bots.

  • Simple Word - print the validation word in plain text next to an input box, and have the user type it in. The word could come from a dictionary, be random, or be some combination of the two. I don’t think this is much less effective than an image serving the same function. If you’re worried that having the word right next to the input is a problem, use stylesheets to separate them randomly. A real human can still copy and paste the text, so it stays really simple. Alternatively, do the same thing with an image that is easier to read.
  • Randomized Form IDs - the actual field name (variable name) could be randomized for each user, which would prevent automated bots from simply submitting blind posts.
  • Randomized Submit - I imagine a grid of buttons, all of which can submit the form (though most may be hidden), and only one of which is shown to the user and valid. Put that button at a random position in the grid, with a random ID (as above), and I can’t imagine a good way of automating a submission. Buttons of type=image are very powerful here: the hidden buttons use a transparent image, or one matching the background, and the shown button gets a unique image. Using some URL rewrites to randomize those image names helps too (where only certain ranges or checksum+salt combinations are valid).
  • Simple Logic - any number of single-digit arithmetic questions could be asked using the Simple Word method above. Questions as simple as 2+2=? or 8-3=? would get you pretty far, and depending on your audience, you could make them more complicated.
  • Word Manipulation - using the Simple Word method above, ask the user to manipulate the word (for example, type it backwards, or if multiple words are shown, type them in swapped order).
  • Time Limit - embed the time the form was served, encrypted (or even just hashed with a secret salt so it can’t be tampered with). If the form comes back faster than a human could plausibly fill it in (this depends on the application, obviously), then it’s probably a bot.
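The text-based challenges above (Simple Word, Simple Logic and Word Manipulation) all boil down to generating a prompt plus an expected answer. Here is a minimal Python sketch of that idea; the word list, the function names and the answer normalization are my own assumptions for illustration, not code from any existing CAPTCHA library.

```python
import random

# Tiny hypothetical word list; a real site would load a dictionary file.
WORDS = ["apple", "river", "orange", "stone", "cloud", "garden"]

def simple_word(rng):
    word = rng.choice(WORDS)
    return f"Type the word: {word}", word

def simple_logic(rng):
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    if rng.random() < 0.5:
        return f"What is {a} + {b}?", str(a + b)
    a, b = max(a, b), min(a, b)  # keep subtraction results non-negative
    return f"What is {a} - {b}?", str(a - b)

def word_backwards(rng):
    word = rng.choice(WORDS)
    return f"Type '{word}' backwards:", word[::-1]

def words_swapped(rng):
    first, second = rng.sample(WORDS, 2)
    return f"Type these two words in swapped order: {first} {second}", f"{second} {first}"

GENERATORS = [simple_word, simple_logic, word_backwards, words_swapped]

def make_challenge(rng=None):
    """Pick a challenge type at random and return (prompt, expected_answer)."""
    if rng is None:
        rng = random.SystemRandom()  # OS-backed randomness for real use
    return rng.choice(GENERATORS)(rng)

def check_answer(submitted, expected):
    # Collapse whitespace and ignore case so humans aren't tripped up by trivia.
    return " ".join(submitted.split()).lower() == expected.lower()
```

The prompt goes on the page, the expected answer stays on the server, and `check_answer` compares the two when the comment comes back.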
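For the Randomized Form IDs and Randomized Submit ideas, one way to get per-user names without storing a table of them is to derive each name from the session ID with a keyed hash (HMAC). This is a rough sketch: `SERVER_KEY`, the image paths and all helper names are hypothetical, and the real button’s index would still need to be remembered server-side.

```python
import hashlib
import hmac
import secrets

# Hypothetical per-deployment secret; generated at startup for this sketch.
SERVER_KEY = secrets.token_bytes(32)

def field_name(session_id, logical_name):
    """Derive an opaque field name that differs for every session."""
    msg = f"{session_id}:{logical_name}".encode()
    return "f" + hmac.new(SERVER_KEY, msg, hashlib.sha256).hexdigest()[:12]

def button_grid(session_id, size=9):
    """Return (names, real_index): one real submit button hidden among decoys."""
    names = [field_name(session_id, f"button-{i}") for i in range(size)]
    return names, secrets.randbelow(size)

def render_grid(names, real_index):
    # Decoys use a blank image matching the background; the paths are made up.
    buttons = []
    for i, name in enumerate(names):
        if i == real_index:
            buttons.append(f'<input type="image" name="{name}" src="/img/post.png" alt="Post">')
        else:
            buttons.append(f'<input type="image" name="{name}" src="/img/blank.png" alt="">')
    return "\n".join(buttons)

def pressed_button(form, names):
    """Return the single image button that was clicked, or None if zero or several were."""
    pressed = [n for n in names if f"{n}.x" in form]
    return pressed[0] if len(pressed) == 1 else None
```

Browsers submit a clicked type=image button as `name.x` and `name.y` click coordinates, which is why `pressed_button` looks for the `.x` key. A bot that posts every button, or none, gets `None`, and the server then only has to compare the result against the name at the remembered index.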
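The Time Limit check could look something like the sketch below. Instead of encryption, it signs the timestamp with an HMAC, which is enough to stop tampering since the time itself doesn’t need to be secret. The thresholds (5 seconds minimum, one hour maximum) and the names are assumptions; tune them to the form.

```python
import hashlib
import hmac
import secrets
import time

SECRET = secrets.token_bytes(32)  # hypothetical server-side key
MIN_SECONDS = 5       # assumed: faster than this is probably a bot
MAX_SECONDS = 3600    # assumed: very stale forms are rejected too

def _sign(ts):
    return hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()

def issue_timestamp(now=None):
    """Value for a hidden form field: the issue time plus its signature."""
    ts = str(int(time.time() if now is None else now))
    return f"{ts}.{_sign(ts)}"

def timing_ok(token, now=None):
    try:
        ts, sig = token.split(".", 1)
        issued = int(ts)
    except ValueError:
        return False  # malformed hidden field
    if not hmac.compare_digest(sig, _sign(ts)):
        return False  # tampered or forged timestamp
    elapsed = (time.time() if now is None else now) - issued
    return MIN_SECONDS <= elapsed <= MAX_SECONDS
```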

All of these methods can be implemented without JavaScript and, at most, require some of the simpler stylesheet features on the client side. On the server side, they all require some session persistence, so cookies are required, unless you encrypt the valid data into the form (which isn’t terrible).
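As a sketch of the "encrypt the valid data into the form" alternative: rather than storing the expected answer in a session, put an HMAC of the answer plus an expiry time in a hidden field. Without the server key, the hidden value reveals nothing useful about the answer. The key handling, token format and 10-minute lifetime here are my assumptions, and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets
import time

KEY = secrets.token_bytes(32)  # hypothetical server secret, never sent to the client

def _mac(answer, expires):
    msg = f"{answer.strip().lower()}|{expires}".encode()
    return hmac.new(KEY, msg, hashlib.sha256).hexdigest()

def make_token(answer, ttl=600, now=None):
    """Hidden-field value that lets the server check the answer later without a session."""
    expires = int((time.time() if now is None else now) + ttl)
    return f"{expires}:{_mac(answer, expires)}"

def verify(token, submitted, now=None):
    try:
        expires_s, mac = token.split(":", 1)
        expires = int(expires_s)
    except ValueError:
        return False  # malformed token
    if (time.time() if now is None else now) > expires:
        return False  # token has expired
    return hmac.compare_digest(mac, _mac(submitted, expires))
```

One caveat: a token like this can be replayed with the same answer until it expires, so a site that cares would also keep a short list of already-used tokens (which brings back a little server-side state).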

As with most things, combining several of the above methods is preferable, with some of the choices randomized (word vs. logic, etc.).
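One simple way to combine methods is to require every check to pass before accepting a post. This hypothetical sketch assumes the server recorded the answer field name, the expected answer (lowercase), the real and decoy button names, and how long the form was out; the names and the `state` layout are illustrative only.

```python
def answer_ok(form, state):
    # state["answer"] is assumed to be stored lowercase.
    given = form.get(state["answer_field"], "")
    return given.strip().lower() == state["answer"]

def not_too_fast(form, state):
    return state["elapsed"] >= state["min_seconds"]

def right_button(form, state):
    # The visible button must be present; any decoy in the post means a bot.
    return state["button"] in form and not any(d in form for d in state["decoys"])

CHECKS = [answer_ok, not_too_fast, right_button]

def looks_human(form, state):
    """Accept the submission only if every individual check passes."""
    return all(check(form, state) for check in CHECKS)
```

Adding or swapping a method is then just a matter of editing `CHECKS`.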

I’m going to look into how Drupal implements its CAPTCHA and see if I can build a library to do some of these things. One of the sites I have waiting in the wings will use these methods heavily (the previous version was overrun by bots within a week).