Saturday, January 24th, 2009
Everybody’s favourite glass shield for protecting web apps is the CAPTCHA: the distorted characters displayed on a page that a user has to enter before gaining access or sending off a form. CAPTCHAs annoy normal users, are largely inaccessible to blind or dyslexic users, and are not as safe as we think they are. PWNtcha continually reports successful cracks of various CAPTCHAs on the web using OCR algorithms and backend systems.
As John Resig explains in his analysis of the script, there’s some pretty nifty work going on:
- The HTML 5 Canvas getImageData API is used to get at the pixel data from the Captcha image. Canvas gives you the ability to embed an image into a canvas (from which you can later extract the pixel data back out again).
- The pixel data, extracted from the image using Canvas, is fed into the neural network in an attempt to divine the exact characters being used – in a sort of crude form of Optical Character Recognition (OCR).
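To make those two steps concrete, here is a rough sketch of how they could fit together. It is not the code from the script Resig analysed: the function names, the brightness threshold of 128 and the single-layer network with hand-supplied weights are all assumptions made for illustration. Only `getContext`, `drawImage` and `getImageData` are real Canvas API calls.

```javascript
// Browser-only step: draw the image onto a canvas and read its pixels back.
// getContext, drawImage and getImageData are standard Canvas 2D calls.
function readPixels(img) {
  const canvas = document.createElement('canvas');
  canvas.width = img.width;
  canvas.height = img.height;
  const ctx = canvas.getContext('2d');
  ctx.drawImage(img, 0, 0);
  return ctx.getImageData(0, 0, canvas.width, canvas.height);
}

// Turn RGBA pixel data into one input per pixel:
// 1 for dark "ink", 0 for light background.
// The threshold of 128 is an assumption, not a value from the original script.
function toInputs(imageData, threshold = 128) {
  const { data } = imageData;
  const inputs = [];
  for (let i = 0; i < data.length; i += 4) {
    const grey = (data[i] + data[i + 1] + data[i + 2]) / 3;
    inputs.push(grey < threshold ? 1 : 0);
  }
  return inputs;
}

// Hypothetical single-layer network: one row of weights per candidate character.
// Returns the label whose sigmoid activation is highest.
function classify(inputs, weights, biases, labels) {
  let best = 0;
  let bestScore = -Infinity;
  weights.forEach((row, k) => {
    const sum = row.reduce((acc, w, j) => acc + w * inputs[j], biases[k]);
    const score = 1 / (1 + Math.exp(-sum));
    if (score > bestScore) {
      bestScore = score;
      best = k;
    }
  });
  return labels[best];
}
```

In a browser you would chain the three, along the lines of `classify(toInputs(readPixels(img)), weights, biases, labels)`, with weights that come from training on known characters. Note that `getImageData` only works when the image is served from the same origin as the page (or with suitable CORS headers); otherwise the canvas is tainted and the call throws a security error.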
Posted by Chris Heilmann at 5:42 am