Captcha: Presentation Transcript

  • Are you Human? (Sorry, I had to ask) Ecaterina Valică
  • Agenda
    • What is CAPTCHA?
    • Types of CAPTCHA
    • Where to use CAPTCHAs?
    • Guidelines when making a CAPTCHA
    • Ways to break CAPTCHAs
    • reCAPTCHA
    • Human Computation Games
  • Example: Filling out a form. Google uses CAPTCHA for Gmail accounts:
  • Beginnings
    • Completely Automated Public Turing test to tell Computers and Humans Apart
    • Created in 2000 for Yahoo to prevent automated e-mail account registration, by Luis von Ahn, Manuel Blum, Nicholas Hopper and John Langford of Carnegie Mellon University.
  • Inventor: Luis von Ahn (born 1978). Photograph by Mike McGregor.
  • What is CAPTCHA?
    • A program that can tell whether its user is a human or a computer.
    • It uses a challenge-response test to determine whether the response was generated by a human rather than a computer.
  • Turing Test
    • "Standard interpretation":
    • player C, the interrogator, is tasked with trying to determine which player (A or B) is a computer and which is a human.
  • Reverse Turing Test
    • A CAPTCHA is sometimes described as a reverse Turing test, because it is administered by a machine and targeted to a human.
  • So, CAPTCHA is…
    • A program that can generate and grade tests that:
      • most humans can pass;
      • current computer programs cannot pass.
  • Making a CAPTCHA
    • Pick a random string of characters (or words), e.g. ifhkfp;
    • Render it into a distorted image.
  • Making a CAPTCHA
    • …and the program generates a test:
    • "Type the characters that appear in the image."
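The generate-and-grade steps above can be sketched in Python. This is a minimal illustration, not how any particular CAPTCHA service works: `make_challenge` and `grade` are hypothetical names, the lowercase alphabet and six-character length are assumptions, and the distortion step is omitted because it needs an imaging library.

```python
import secrets
import string

# Assumption: lowercase Latin letters only; real systems often also drop
# look-alike characters, which this sketch does not do.
ALPHABET = string.ascii_lowercase


def make_challenge(length: int = 6) -> str:
    """Pick a random string of characters, like "ifhkfp" on the slide."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def grade(expected: str, response: str) -> bool:
    """Grade a response, ignoring case and surrounding whitespace."""
    # Compare as bytes so non-ASCII input cannot raise an error.
    return secrets.compare_digest(
        expected.lower().encode("utf-8"), response.strip().lower().encode("utf-8")
    )


# Rendering the string into a distorted image is left out on purpose:
# that is the step that makes the test hard for programs.
answer = make_challenge()
print(grade(answer, answer.upper() + " "))  # True
```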
  • Outperform the computers
    • In many simple tasks, a typical 5-year-old can outperform the most powerful computers.
    • Easier for computers:
      • medical diagnosis;
      • playing chess.
    • Hard for computers:
      • operations requiring vision, hearing, language or motor control.
  • Type: Early CAPTCHAs
    • Generated by the EZ-Gimpy program;
    • Used previously on Yahoo!
  • Type: Improved CAPTCHA
    • high contrast for human readability;
    • medium, per-character perturbation;
    • random fonts per character;
    • low background noise.
  • Type: A modern CAPTCHA
    • Rather than relying on a distorted background and heavy warping of the text, it focuses on making segmentation difficult by adding an angled line.
  • Type: A modern CAPTCHA
    • Another way to make segmentation difficult is to crowd the symbols together;
    • humans can still read the result, but bots find it hard to segment.
  • Other Types of CAPTCHA
    • Animated CAPTCHAs
    • 3D CAPTCHA
    • ASCII art
    • Reverse CAPTCHA: "Leave this field blank"
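A "Leave this field blank" reverse CAPTCHA is commonly built as a hidden form field (a "honeypot"): humans never see it and leave it empty, while a naive bot that fills in every field gives itself away. A minimal sketch, assuming a hypothetical field name `website` and a hypothetical helper `looks_like_bot`:

```python
from typing import Mapping

# Hypothetical name of the hidden field; the page would hide it from humans
# (for example with CSS) and label it "Leave this field blank".
HONEYPOT_FIELD = "website"


def looks_like_bot(form: Mapping[str, str]) -> bool:
    """Reverse CAPTCHA: flag submissions where the hidden field was filled in."""
    return form.get(HONEYPOT_FIELD, "").strip() != ""


print(looks_like_bot({"name": "Ana", "website": ""}))  # False
print(looks_like_bot({"name": "spam", "website": "http://example.com"}))  # True
```

This only stops bots that blindly fill every field; a bot written for the specific site can simply skip the field.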
  • Other: Cognitive Puzzles
    • Distinguish pictures of dogs from cats
    • Choose a word that relates to all the images
    • Trivia questions
    • Math and word problems
    • 3D Object CAPTCHA
    • Solve failed OCR inputs
  • Other: Distinguish pictures
    • Microsoft Asirra (Animal Species Image Recognition for Restricting Access);
    • KittenAuth project.
  • Other: Mathematical CAPTCHA
  • Other: 3D Object CAPTCHA
    • You must enter them in the exact sequence listed:
    • The Head of the Walking Man,
    • The Vase,
    • The Back of the Chair.
  • Other: Jumble Game
  • Other: Drupal Examples
  • Other: Tests
    • "Common sense" questions:
      • "What is 3 + 5?"
      • "What color is the sky?"
    • Type the word 'orange';
    • Require a valid email address to approve;
    • These approaches violate CAPTCHA principles:
      • they cannot be automatically generated;
      • they can be easily cracked given the state of AI.
  • Where to use CAPTCHAs?
    • Data Collection
    • Worms and Spam
    • Preventing Comment Spam in Blogs
    • Protecting Email Addresses From Scrapers
    • Online Polls
    • Protecting Website Registration
    • Preventing Dictionary Attacks
    • Search Engine Bots
  • Where to use CAPTCHAs?
    • Preventing Comment Spam in Blogs.
    • Protecting Email Addresses From Scrapers. To hide your email address, require users to solve a CAPTCHA before it is shown.
    • Online Polls. You cannot trust the results of an online poll, because anybody could write a program to vote for their favorite option thousands of times.
  • Where to use CAPTCHAs?
    • Protecting Website Registration (e-mail services: Yahoo, Microsoft, Google).
    • Preventing Dictionary Attacks (in password systems). Prevent a computer from iterating through the entire space of passwords by requiring a CAPTCHA to be solved after a certain number of unsuccessful logins.
    • Search Engine Bots. It is sometimes desirable to keep webpages unindexed to prevent others from finding them easily.
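The dictionary-attack defense can be sketched as a per-account failure counter. This is an illustration under assumptions: the class name `LoginGuard`, the threshold of three failures, and keying by username alone are hypothetical choices (a real system would likely also track IP addresses and expire old counts).

```python
from typing import Dict

# Assumed threshold; the slide only says "a certain number".
MAX_FREE_FAILURES = 3


class LoginGuard:
    """Require a CAPTCHA once an account has too many failed logins."""

    def __init__(self, max_free_failures: int = MAX_FREE_FAILURES) -> None:
        self.max_free_failures = max_free_failures
        self.failures: Dict[str, int] = {}

    def captcha_required(self, username: str) -> bool:
        return self.failures.get(username, 0) >= self.max_free_failures

    def record_failure(self, username: str) -> None:
        self.failures[username] = self.failures.get(username, 0) + 1

    def record_success(self, username: str) -> None:
        self.failures.pop(username, None)  # reset the count after a good login


guard = LoginGuard()
for _ in range(3):
    guard.record_failure("alice")
print(guard.captcha_required("alice"))  # True
print(guard.captcha_required("bob"))  # False
```

A bot iterating through passwords now has to solve a CAPTCHA for every guess after the third, which removes the speed advantage of automation.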
  • Guidelines
    • Image Security. Images of text should be distorted randomly before being presented to the user.
    • Script-Level Security. Common insecurities:
      • Systems that pass the answer in plain text;
      • Systems where a solution to the same CAPTCHA can be used multiple times ("replay attacks").
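Both script-level insecurities can be avoided by keeping the answer on the server and giving the client only a random, single-use token. A minimal in-memory sketch; `ChallengeStore` is a hypothetical name, and expiry of unused tokens is left out.

```python
import secrets
from typing import Dict


class ChallengeStore:
    """Keep CAPTCHA answers server-side; the client only ever sees an opaque token."""

    def __init__(self) -> None:
        self._answers: Dict[str, str] = {}

    def issue(self, answer: str) -> str:
        # A random token reveals nothing about the answer (unlike plain text).
        token = secrets.token_urlsafe(16)
        self._answers[token] = answer
        return token

    def verify(self, token: str, response: str) -> bool:
        # pop() makes each token single-use: a solved CAPTCHA cannot be
        # replayed, and a wrong guess cannot be retried on the same challenge.
        answer = self._answers.pop(token, None)
        if answer is None:
            return False
        return secrets.compare_digest(
            answer.lower().encode("utf-8"), response.strip().lower().encode("utf-8")
        )


store = ChallengeStore()
token = store.issue("ifhkfp")
print(store.verify(token, "IFHKFP"))  # True
print(store.verify(token, "ifhkfp"))  # False: the token was already used
```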
  • Guidelines
    • Security Even After Wide-Spread Adoption. There are CAPTCHAs that would be insecure if a significant number of sites started using them.
      • Example: text-based questions;
      • A parser could easily be written that would allow bots to bypass the test;
      • Such “CAPTCHAs” rely on the fact that few sites use them, and thus that a bot author has no incentive to program their bot to solve that challenge.
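To see how little protection a text-based question gives once bots target it, here is a sketch of such a parser for arithmetic questions of the form "What is 3 + 5?". The question format and the name `solve` are assumptions; a real bot would handle more variants, which is exactly the point of the guideline.

```python
import operator
import re
from typing import Optional

# Assumed question format, e.g. "What is 3 + 5?"
QUESTION = re.compile(r"what is\s+(\d+)\s*([-+*])\s*(\d+)\s*\?", re.IGNORECASE)
OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def solve(question: str) -> Optional[str]:
    """Answer a simple arithmetic question, or return None if it doesn't parse."""
    match = QUESTION.search(question)
    if match is None:
        return None
    left, symbol, right = match.groups()
    return str(OPERATORS[symbol](int(left), int(right)))


print(solve("What is 3 + 5?"))  # 8
print(solve("What color is the sky?"))  # None
```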
  • Guidelines
    • Accessibility.
      • CAPTCHAs can prevent visually impaired users, and anyone who finds the distorted text difficult to read, from accessing the protected resource;
      • Blind users rely on screen readers, which, on reaching an image, can only read out its caption;
      • Solution: let users opt for an audio (sound) CAPTCHA.
  • Guidelines: Accessibility. Hard-to-read CAPTCHAs:
  • Guidelines: Accessibility. Worst CAPTCHAs:
  • Ways to break CAPTCHAs
    • Exploiting bugs in the implementation that allow the attacker to completely bypass the CAPTCHA;
    • Improving character recognition software (OCR – Optical Character Recognition);
    • Using cheap human labor to process the tests (sweatshops).
  • Break: Insecure implementation
    • Re-using the session ID of a known CAPTCHA image.
    • Other CAPTCHAs pass a hash of the solution to the client as the validation key; the solution is often short enough that the hash can be cracked.
    • Other implementations use only a small fixed pool of CAPTCHA images (Asirra: about 3 million images).
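Why a hash of a short solution is weak: the attacker can hash every possible answer and compare. A sketch, assuming an unsalted SHA-256 hex digest of a lowercase answer (the slide does not say which hash is used); `crack` is a hypothetical name. A three-letter answer has 26³ = 17,576 candidates; even six letters give only 308,915,776, which is within reach of a brute-force search.

```python
import hashlib
import itertools
import string
from typing import Optional


def crack(solution_hash: str, length: int,
          alphabet: str = string.ascii_lowercase) -> Optional[str]:
    """Try every candidate answer until one hashes to the value sent to the client."""
    for letters in itertools.product(alphabet, repeat=length):
        candidate = "".join(letters)
        if hashlib.sha256(candidate.encode("utf-8")).hexdigest() == solution_hash:
            return candidate
    return None


leaked = hashlib.sha256(b"kfp").hexdigest()  # what an insecure page might embed
print(crack(leaked, 3))  # kfp
```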
  • Break : Character Recognition
    • Programs that have the following functions:
      • Extraction of the image from the web page
      • Removal of background clutter, for example with color filters and detection of thin lines;
      • Segmentation, i.e. splitting the image into regions each containing a single letter;
      • Identifying the letter for each region.
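The background-removal and segmentation steps can be sketched on a tiny grayscale image stored as nested lists. This is a deliberately naive illustration, not any published attack: a fixed threshold stands in for color filtering, thin-line detection is omitted, and letters are split wherever a column contains no ink. It also shows why crowding symbols together defeats this kind of segmentation: touching letters leave no blank column between them.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # grayscale rows: 0 = black, 255 = white


def binarize(image: Image, threshold: int = 128) -> List[List[bool]]:
    """Crude background removal: keep only dark pixels as ink."""
    return [[pixel < threshold for pixel in row] for row in image]


def segment_columns(ink: List[List[bool]]) -> List[Tuple[int, int]]:
    """Split at blank columns; return (start, end) column ranges, end exclusive."""
    width = len(ink[0])
    has_ink = [any(row[x] for row in ink) for x in range(width)]
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for x, filled in enumerate(has_ink):
        if filled and start is None:
            start = x
        elif not filled and start is not None:
            regions.append((start, x))
            start = None
    if start is not None:
        regions.append((start, width))
    return regions


rows = ["#..##.#",
        "#..#..#",
        "#..##.#"]
image = [[0 if ch == "#" else 255 for ch in row] for row in rows]
print(segment_columns(binarize(image)))  # [(0, 1), (3, 5), (6, 7)]
```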
  • Attacks – EZ-Gimpy 2000
    • Yahoo's early CAPTCHA, called "EZ-Gimpy";
    • The program picks a word from a dictionary and produces a distorted, noisy image of the word;
    • Algorithm for breaking EZ-Gimpy (92% success rate):
      • Locate possible letters at various locations;
      • Construct graph of consistent letters;
      • Look for plausible words in the graph.
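The three steps of the EZ-Gimpy attack can be sketched as a search through a graph of letter candidates. This is a simplified illustration of the idea, not the exact published algorithm: it assumes a hypothetical detector has already produced `(letter, start_x, end_x)` candidates, treats two letters as consistent when one ends where the next begins, and keeps every path across the image that spells a dictionary word. `plausible_words` is a hypothetical name.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical detector output: (letter, start_x, end_x), with end_x > start_x.
Candidate = Tuple[str, int, int]


def plausible_words(candidates: List[Candidate], width: int,
                    dictionary: Set[str]) -> List[str]:
    """Return dictionary words spelled by chains of letters spanning the image."""
    # Graph of consistent letters: a letter ending at x links to letters starting at x.
    starting_at: Dict[int, List[Candidate]] = {}
    for candidate in candidates:
        starting_at.setdefault(candidate[1], []).append(candidate)

    found: Set[str] = set()

    def walk(x: int, prefix: str) -> None:
        if x == width:  # a complete path across the image
            if prefix in dictionary:
                found.add(prefix)
            return
        for letter, _start, end in starting_at.get(x, []):
            walk(end, prefix + letter)

    walk(0, "")
    return sorted(found)


candidates = [("c", 0, 2), ("o", 0, 2), ("a", 2, 4),
              ("u", 2, 4), ("t", 4, 6), ("l", 4, 6)]
print(plausible_words(candidates, 6, {"cat", "cut", "oat", "dog"}))  # ['cat', 'cut', 'oat']
```

A real attack would then score the surviving words (for example by detector confidence) and submit the best one.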
  • Attacks – EZ-Gimpy 2000
    • Figure panels: EZ-Gimpy image, possible letters, graph of letters, plausible words
  • Attacks – SimpleOCR Engine 2002
  • Attacks – Jan/Feb 2008 (reported success rates)
    • Google (Jan 17): 20%
    • Hotmail (Feb 6): 30-35%
    • Yahoo (Feb 22): 30-35%
  • Attacks – Projects
    • Several CAPTCHA-breaking projects:
      • http:// / captcha /
  • Break: Human solvers
    • Attacks that use humans to solve the puzzles;
    • Approaches:
      • relaying the puzzles to a group of human operators who can solve CAPTCHAs;
      • copying the CAPTCHA images and using them as CAPTCHAs for a high-traffic site owned by the attacker.
  • CAPTCHA Sweatshops
    • A computer fills out a form and when it reaches a CAPTCHA, it gives it to the operator to solve.
    • Weakness for Asirra:
      • if the database of cat and dog photos can be downloaded,
      • then paying workers $0.01 to classify each photo
      • means that almost the entire database of photos can be labeled for $30,000.
    • Once an IP address has misclassified a challenge, a human only needs to solve two Asirras in a row from the same browser session.
  • CAPTCHA Sweatshops
    • Not economically viable
    • A typical spam run of 1 million messages per day would cost $14,000 per day and require 116 people working 24/7.
    • Assumptions: $2.50/h for each human; 720 CAPTCHAs per hour per human; 1/3 cent per account.
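The per-human figures on the slide are consistent with each other: $2.50 / 720 ≈ $0.0035, about 1/3 cent per CAPTCHA, and at $0.01 per photo a pool of about 3 million Asirra images costs about $30,000 to label. The daily totals for a spam run rest on assumptions the slide doesn't spell out (for example, how many CAPTCHAs each message needs and how shifts are counted), so this small cost model leaves volume as a parameter. The function names are hypothetical.

```python
def cost_per_captcha(wage_per_hour: float, captchas_per_hour: float) -> float:
    """Labor cost of one solved CAPTCHA."""
    return wage_per_hour / captchas_per_hour


def human_hours_needed(captchas: float, captchas_per_hour: float) -> float:
    """Human-hours of solving work needed for a given volume of CAPTCHAs."""
    return captchas / captchas_per_hour


def labeling_cost(images: int, price_per_image: float) -> float:
    """Cost of paying workers to classify every image in a downloaded pool."""
    return images * price_per_image


print(f"${cost_per_captcha(2.50, 720):.4f} per CAPTCHA")  # $0.0035, about 1/3 cent
print(f"${labeling_cost(3_000_000, 0.01):,.0f} to label 3 million Asirra photos")  # $30,000
```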
  • Porn Companies (October 2007)
    • They write a program that fills out the entire registration form (e.g. Yahoo);
    • When the program gets to the CAPTCHA, it can’t solve it;
    • So it copies the CAPTCHA to the porn page;
    • A visitor there sees a screen saying that to see the next picture, they have to type the word shown in the CAPTCHA.
  • Next CAPTCHA Generation
    • CAPTCHAs can be made stronger, but they are already too hard for a large percentage of Internet users;
    • making them harder turns CAPTCHA from a simple human reading test into an intelligence or acuity test.
  • reCAPTCHA (2007)
    • A new form of CAPTCHA that also helps digitize books;
    • The words displayed to the user come directly from old books that are being digitized;
    • These are words that OCR could not identify;
    • Each unknown word is paired with a known one;
    • Both are distorted, a line is drawn through them, and they are sent out to users, who in effect proofread them;
    • Respondent answers both elements:
      • half of effort validates the challenge;
      • the other half is captured as work.
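The known/unknown word pairing can be sketched as follows. This is an illustration of the idea, not reCAPTCHA's implementation: `WordPairGrader` is a hypothetical name, the word identifiers are made up, and the rule that an unknown word is accepted once three verified users typed the same thing is an assumption.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Optional

# Assumption: an unknown word is accepted once this many verified users agree.
AGREEMENT_NEEDED = 3


def normalize(text: str) -> str:
    return text.strip().lower()


class WordPairGrader:
    """Grade on the known (control) word; collect readings of the unknown word."""

    def __init__(self, agreement_needed: int = AGREEMENT_NEEDED) -> None:
        self.agreement_needed = agreement_needed
        self.readings: DefaultDict[str, Counter] = defaultdict(Counter)

    def check(self, control_word: str, typed_control: str,
              unknown_id: str, typed_unknown: str) -> bool:
        passed = normalize(typed_control) == normalize(control_word)
        if passed:
            # Only users who got the known word right are trusted on the unknown one.
            self.readings[unknown_id][normalize(typed_unknown)] += 1
        return passed

    def transcription(self, unknown_id: str) -> Optional[str]:
        counts = self.readings.get(unknown_id)
        if not counts:
            return None
        word, votes = counts.most_common(1)[0]
        return word if votes >= self.agreement_needed else None


grader = WordPairGrader()
for typed in ["Morning", "morning", "moming", "morning"]:
    grader.check("upon", "upon", "scan-42/word-7", typed)
print(grader.transcription("scan-42/word-7"))  # morning
```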
  • Time spent
    • Roughly 60 million CAPTCHAs are solved each day;
    • On average, it takes about 10 seconds to solve a CAPTCHA;
    • People around the world waste more than 150,000 hours a day solving CAPTCHAs;
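The daily total follows from the two figures above, treating 10 seconds as the average per CAPTCHA; a fifth of 150,000 hours is the 30,000 man-hours quoted on the next slide.

```latex
\[
60 \times 10^{6}\ \tfrac{\text{CAPTCHAs}}{\text{day}}
\times 10\ \tfrac{\text{s}}{\text{CAPTCHA}}
= 6 \times 10^{8}\ \tfrac{\text{s}}{\text{day}}
= \frac{6 \times 10^{8}}{3600}\ \tfrac{\text{h}}{\text{day}}
\approx 166{,}667\ \tfrac{\text{h}}{\text{day}} > 150{,}000\ \tfrac{\text{h}}{\text{day}}
\]
\[
\tfrac{1}{5} \times 150{,}000\ \tfrac{\text{h}}{\text{day}} = 30{,}000\ \tfrac{\text{h}}{\text{day}}
\]
```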
  • Time spent
    • A fifth of that effort would give 30,000 man-hours of work per day;
    • It would constitute the world's fastest and most accurate character-recognition computer, processing 10 million words a day.
    • Recreating the books – word by word
  • Time spent
    • 9 billion human-hours of Solitaire were played in 2003
    • Empire State Building: 7 million human-hours (6.8 hours of Solitaire)
    • Panama Canal: 20 million human-hours (less than a day of Solitaire)
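The Solitaire comparisons follow from spreading the 2003 total evenly over the year:

```latex
\[
\frac{9 \times 10^{9}\ \text{human-hours}}{365 \times 24\ \text{h}}
\approx 1.027 \times 10^{6}\ \text{human-hours of Solitaire per hour}
\]
\[
\text{Empire State Building: } \frac{7 \times 10^{6}}{1.027 \times 10^{6}} \approx 6.8\ \text{h},
\qquad
\text{Panama Canal: } \frac{20 \times 10^{6}}{1.027 \times 10^{6}} \approx 19.5\ \text{h} < 24\ \text{h}
\]
```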
  • Wasted human cycles
    • If the world's computer Solitaire players could be coaxed into enjoying a game that contributed to solving a computing problem, von Ahn calculates, it would produce billions of man-hours of labor each year.
    • "Make all of humanity more efficient by exploiting the human cycles that get wasted"
  • Wasted human cycles
    • People will contribute their brainpower, but only if they're given an enjoyable, time-killing experience in exchange.
    • Most projects that harness human processing power rely on a different motivator: money.
    • Which produces better results: a small group of experts or a huge mob of amateurs?
  • Human Computation
    • Things that we humans can do and computers cannot, like:
      • labeling images with words;
      • picking out a voice in a noisy room;
    • Humans have trouble remembering long, random strings of characters, yet they excel at remembering faces and objects.
  • Symbiotic relationship
    • One in which humans solve some problems and computers solve others;
    • Image search: a way for every image on the Web to get an accurate textual description;
  • The ESP Game
    • Two-player online game;
    • Partners don’t know each other and can’t communicate;
    • Object of the game: type the same word;
    • The only thing they have in common is an image.
  • The ESP Game (example round)
    • Player 1 guesses: CAR, HAT, KID
    • Player 2 guesses: BOY, CAR
    • Success! You agree on CAR
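The agreement rule can be sketched as a function over a stream of guesses. The interleaving of the two players' guesses below is an assumption (the slide shows each player's list separately), `first_agreement` is a hypothetical name, and other rules of the real game are not modeled here.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def first_agreement(guesses: Iterable[Tuple[int, str]]) -> Optional[str]:
    """Return the first word both players have typed, or None if they never match."""
    typed: Dict[int, Set[str]] = {1: set(), 2: set()}
    for player, raw_word in guesses:
        word = raw_word.strip().upper()
        other = 2 if player == 1 else 1
        if word in typed[other]:  # the partner already typed this word
            return word
        typed[player].add(word)
    return None


# The slide's round, interleaved in one assumed order.
round_ = [(1, "CAR"), (2, "BOY"), (1, "HAT"), (1, "KID"), (2, "CAR")]
print(first_agreement(round_))  # CAR
```

The agreed word becomes a label for the image; because the partners cannot communicate, matching on a word is evidence that it describes the image.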
  • The ESP Game
    • The ESP Game was licensed by Google (2006) in the form of the Google Image Labeler, and is used to improve the accuracy of Google Image Search.
    • "5000 people playing simultaneously can label all images on Google in 30 days!"
  • Games: ESP, Tag a Tune, Matchin
  • Games: Squigl, Verbosity
  • Future Games
    • Language translation. A game could challenge two players who don’t speak the same language to translate text from one language to the other.
    • Monitoring of security cameras. Players could monitor security cameras and alert authorities about suspected illegal activity.
  • Future Games
    • Improving Web search. People have varying degrees of skill at searching for information on the Web. A game could be designed in which the players perform searches for other people.
    • Text summarization. Imagine a game in which people summarize important documents for the rest of the world.
  • Still not thinking big enough
    • "If we have that many people all doing some little part, we could do something insanely huge for humanity."
    • "We'll never run out of things to digitize"
  • Bibliography
    • Site: Luis von Ahn Website (2006)
    • Site: reCAPTCHA (2007)
    • Site: CAPTCHA (2007)
    • Site: Gwap (2008)
    • Interview: "Using 'captchas' to digitize books" (2007)
    • Interview: "For Certain Tasks, the Cortex Still Beats the CPU" (2007)
    • Video: Wired, "Human Computation" (2007)
    • Video: Google TechTalks, "Human Computation" (2006)
    • Paper: "Games With a Purpose" (2006)
    • Paper: "How Lazy Cryptographers do AI" (2004)
    • Paper: "CAPTCHA: Using Hard AI Problems for Security" (2003)
    • Article: "CAPTCHA is Dead, Long Live CAPTCHA!" (2008)
    • Article: "Yahoo's CAPTCHA Security Reportedly Broken" (2008)
    • Article: "Anti-CAPTCHA operations on Microsoft Mail" (2008)
    • Article: "Google's CAPTCHA busted in recent spammer tactics" (2008)
    • Paper: "Recognizing Objects in Adversarial Clutter" (2002)
    • Article: Wikipedia, "CAPTCHA" (2008)
    • Article: "CAPTCHA Effectiveness" (2006)
    • Article: "Breaking a Visual CAPTCHA" (2002)
    • Article: "Human or Computer? Take This Test" (2002)
    • Site: XKCD (2008)
    • Thank you!