563.10.3 captcha


    • 563.10.3 CAPTCHA. Presented by: Sari Louis. SPAM Group: Marc Gagnon, Sari Louis, Steve White. University of Illinois, Spring 2006
    • Agenda
      • Definition
      • Background
      • Applications
      • Types of CAPTCHAs
      • Breaking CAPTCHAs
      • Proposed Approach
      • Conclusion
    • Definition
      • CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart
      • A.K.A. Reverse Turing Test, Human Interaction Proof
      • The challenge: develop a software program that can create and grade challenges most humans can pass but computers cannot
    • Background
      • First used by AltaVista in 1997
        • Reduced spam add-URL submissions by over 95%
      • CMU/Yahoo!
        • Automated the creating and grading of challenges
      • PARC
        • Relies on document image degradation to prevent successful OCR
        • Conducted user-focused studies to assess the effectiveness of CAPTCHAs
    • Background
      • CAPTCHAs are based on open AI problems
      • Breaking CAPTCHAs helps advance AI by solving these open problems
      • Improving CAPTCHAs helps tell computers and humans apart
      • Win-win situation
    • Background - Papers
      • Pessimal Print: A Reverse Turing Test. Allison L. Coates, Henry S. Baird, Richard J. Fateman
      • Telling Humans and Computers Apart Automatically. Luis von Ahn, Manuel Blum, and John Langford
      • CAPTCHA: Using Hard AI Problems for Security. Luis von Ahn, Manuel Blum, Nicholas J. Hopper, and John Langford
      • Using Machine Learning to Break Visual Human Interaction Proofs (HIPs). Kumar Chellapilla, Patrice Y. Simard
    • Applications
      • Free email services
      • Online polls
      • Dictionary attacks
      • Newsgroups, Blogs, etc…
      • SPAM
    • Types of CAPTCHAs
      • Text based
        • Gimpy, ez-gimpy
        • Gimpy-r, Google CAPTCHA
        • Simard’s HIP (MSN)
      • Graphic based
        • Bongo
        • Pix
      • Audio based
    • Text Based CAPTCHAs
      • Gimpy, ez-gimpy
        • Pick a word or words from a small dictionary
        • Distort them and add noise and background
      • Gimpy-r, Google’s CAPTCHA
        • Pick random letters
        • Distort them, add noise and background
      • Simard’s HIP
        • Pick random letters and numbers
        • Distort them and add arcs
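The text-selection step that separates these three families can be sketched as below. This is a minimal sketch, not any vendor's implementation: the word list, the string lengths, and the function names are assumptions made for illustration, and the distortion, noise, and arc rendering steps are omitted because they require an imaging library.

```python
import random
import string

# Hypothetical small dictionary; the slides do not give the real word list.
SMALL_DICTIONARY = ["apple", "river", "chair", "stone", "cloud"]


def gimpy_text(rng, words=SMALL_DICTIONARY):
    """Gimpy / ez-gimpy style: pick a word from a small dictionary."""
    return rng.choice(words)


def gimpy_r_text(rng, length=4):
    """Gimpy-r style: pick random letters (length is an assumption)."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def simard_hip_text(rng, length=8):
    """Simard's HIP style: pick random letters and digits (length is an assumption)."""
    alphabet = string.ascii_uppercase + string.digits
    return "".join(rng.choice(alphabet) for _ in range(length))


def grade(expected, answer):
    """Grade a typed answer, ignoring case and surrounding whitespace."""
    return answer.strip().lower() == expected.lower()
```

Passing the random source in as `rng` keeps the sketch testable with a seeded `random.Random`; a deployed system would more plausibly pass `random.SystemRandom()` so the challenge text is not predictable.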
    • Text Based CAPTCHAs
    • Graphic Based CAPTCHAs
      • Bongo
        • Display two series of blocks
        • User must find the characteristic that sets the two series apart
        • User is asked to determine which series each of four single blocks belongs to
        • Difference? thick vs. thin lines
    • Graphic Based CAPTCHAs
      • PIX
        • Create a large database of labeled images
        • Pick a concrete object
        • Pick four images of the object from the images database
        • Distort the images
        • Ask the user to pick the object from a list of words
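The PIX steps above can be sketched as follows. The image database, its file names, and the function names are hypothetical placeholders, and the image distortion step is left out; the sketch only shows how a challenge might be assembled and graded.

```python
import random

# Hypothetical labeled image database: object label -> image identifiers.
IMAGE_DB = {
    "dog": ["dog1.jpg", "dog2.jpg", "dog3.jpg", "dog4.jpg", "dog5.jpg"],
    "pool": ["pool1.jpg", "pool2.jpg", "pool3.jpg", "pool4.jpg", "pool5.jpg"],
    "chair": ["chair1.jpg", "chair2.jpg", "chair3.jpg", "chair4.jpg"],
}


def make_pix_challenge(rng, db=IMAGE_DB, images_per_challenge=4):
    """Pick a concrete object, then four of its images, then the word choices."""
    label = rng.choice(sorted(db))
    images = rng.sample(db[label], images_per_challenge)
    choices = sorted(db)
    rng.shuffle(choices)  # avoid leaking the answer through list position
    return {"answer": label, "images": images, "choices": choices}


def grade_pix(challenge, picked):
    """The user passes only by picking the object the images depict."""
    return picked == challenge["answer"]
```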
    • Graphic Based CAPTCHAs
      • Dog
      • Pool
    • Audio Based CAPTCHAs
      • Pick a word or a sequence of numbers at random
      • Render them into an audio clip using text-to-speech (TTS) software
      • Distort the audio clip
      • Ask the user to identify and type the word or numbers
    • Breaking CAPTCHAs
      • Most text based CAPTCHAs have been broken by software
        • OCR
        • Segmentation
      • Other CAPTCHAs were broken by streaming the tests to unsuspecting users to solve
    • Proposed Approach
      • Very similar to PIX
      • Pick a concrete object
      • Get 6 images at random from images.google.com that match the object
      • Distort the images
      • Build a list of 100 words: 90 from a full dictionary, 10 from the objects dictionary
      • Prompt the user to pick the object from the list of words
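The 100-word answer list could be built roughly as shown here. The slides do not say whether the correct object counts as one of the 10 object words; this sketch assumes it does, and the function and parameter names are invented for illustration.

```python
import random


def build_word_list(rng, target, full_dictionary, object_dictionary,
                    n_full=90, n_objects=10):
    """Return a shuffled list: n_full ordinary words plus n_objects object words.

    Assumption: the target object is one of the n_objects object words,
    and both dictionaries contain unique entries.
    """
    other_objects = [w for w in object_dictionary if w != target]
    object_decoys = rng.sample(other_objects, n_objects - 1)
    used = set(object_decoys) | {target}
    filler_pool = [w for w in full_dictionary if w not in used]
    fillers = rng.sample(filler_pool, n_full)
    words = [target] + object_decoys + fillers
    rng.shuffle(words)  # the position of the answer must not be predictable
    return words
```

Mixing in other object words matters because an attacker who knows the objects dictionary could otherwise discard the 90 ordinary words and find the answer by elimination.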
    • Proposed Approach - Technical
      • Make an HTTP call to images.google.com and search for the object
      • Screen scrape the result of 2-3 pages to get the list of images
      • Pick 6 images at random
      • Randomly distort both the images and their URLs before displaying them
      • Expire the CAPTCHA in 30-45 seconds
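The selection and expiry steps might look like the sketch below. It deliberately leaves out the HTTP call and screen scraping and takes the scraped URLs as an input list. Drawing a random lifetime between 30 and 45 seconds is one reading of the slide, not something it states, and all names here are hypothetical.

```python
import random
import time


def new_challenge(rng, target, scraped_image_urls, now=None, n_images=6):
    """Pick n_images at random and stamp the challenge with an expiry time."""
    now = time.time() if now is None else now
    lifetime = rng.uniform(30, 45)  # assumed: random lifetime in the 30-45 s range
    return {
        "answer": target,
        "images": rng.sample(scraped_image_urls, n_images),
        "expires_at": now + lifetime,
    }


def check_answer(challenge, picked, now=None):
    """Reject late answers first, then compare against the expected object."""
    now = time.time() if now is None else now
    if now > challenge["expires_at"]:
        return False
    return picked == challenge["answer"]
```

Checking the expiry on the server side, before the answer is compared, is what limits the time available to relay a challenge to a human solver elsewhere.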
    • Proposed Approach - Benefits
      • The database already exists and is public
      • The database is constantly being updated and maintained
      • Adding “concrete objects” to the dictionary is virtually instantaneous
      • Distortion prevents caching hacks
      • Quick expiration limits streaming hacks
    • Proposed Approach - Drawbacks
      • Not accessible to people with disabilities (as is the case with most CAPTCHAs)
      • Relies on Google’s infrastructure
      • Unlike CAPTCHAs using random letters and numbers, the number of challenge words is limited
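The last drawback can be made concrete with a quick calculation of how likely a blind guess is to succeed. The 100-word list comes from the proposed approach; the 6-character alphanumeric string is an assumed point of comparison, not a figure from the slides.

```python
def blind_guess_probability(num_possible_answers):
    """Chance that a single uniform random guess is correct."""
    return 1 / num_possible_answers


# Proposed approach: the user picks one word out of a 100-word list.
word_list_guess = blind_guess_probability(100)

# Assumed comparison: a 6-character string over 26 letters and 10 digits.
random_text_answers = 36 ** 6
random_text_guess = blind_guess_probability(random_text_answers)
```

Under these assumptions a bot guessing blindly passes the word-list challenge about 1% of the time, which is far more often than it would pass a random-text challenge.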