CAPTCHA
Vivek Maskara
Vaibhav Goyal
Khoshow
AGENDA
 DEFINITION
 BACKGROUND
 TYPES
 APPLICATIONS
 CONSTRUCTING CAPTCHA
 BREAKING CAPTCHA
 ISSUES WITH CAPTCHA
 CONCLUSION
INTRODUCTION
CAPTCHA – Completely Automated
Public Turing test to tell Computers &
Humans Apart.
Invented at CMU by Luis von Ahn,
Manuel Blum, et al.
It is a challenge-response test
program that separates humans
from computer programs.
Generic CAPTCHAs distort letters &
numbers -
Distorted characters are presented to
the user.
User has to recognize the distorted
letters.
If the guessed letters are correct, the
user is inferred to be a human &
allowed access.
Contd…
Humans can read the distorted &
noisy text.
Current OCR (Optical Character
Recognition) programs struggle to read them.
BACKGROUND
Why was CAPTCHA needed?
 Sabotage of online polls.
 Spam e-mails.
 Abuse of free online accounts.
 Tampering with rankings on
recommendation systems (like
eBay, Amazon)
What is the TURING TEST?
 Proposed by Alan Turing.
 To test a machine’s level of intelligence.
 Human judge asks questions to two participants,
one is a machine & the other human.
 The judge doesn’t know which is which.
 After hearing the answers, if the judge fails to
recognize which one is the machine, then the
machine passes the test.
Contd…
CAPTCHA employs a Reverse Turing
Test.
Judge = CAPTCHA program,
participant = user
If the user passes the CAPTCHA, the user is
judged human; otherwise, a machine.
Types of CAPTCHAs
1. Text-based-
 Simple, everyday questions:
 What is the sum of three & thirty-five?
 If today is Saturday, what is the day after
tomorrow?
 Which of mango, table & water is a fruit?
Very effective, but needs a large question bank.
Cognitively challenged users find it hard.
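A minimal sketch of how such questions could be generated & checked. The names (sum_question, day_question, make_question, check_answer) and the two templates are illustrative assumptions, not taken from any real CAPTCHA library.

```python
import random

# Hypothetical lookup tables for the two question templates.
NUMBER_WORDS = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five",
                6: "six", 7: "seven", 8: "eight", 9: "nine", 10: "ten"}
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

def sum_question(rng):
    """Return (question, answer) for an addition question spelled in words."""
    a, b = rng.randint(1, 10), rng.randint(1, 10)
    question = f"What is the sum of {NUMBER_WORDS[a]} and {NUMBER_WORDS[b]}?"
    return question, str(a + b)

def day_question(rng):
    """Return (question, answer) for a 'day after tomorrow' question."""
    i = rng.randrange(7)
    question = f"If today is {DAYS[i]}, what is the day after tomorrow?"
    return question, DAYS[(i + 2) % 7]

def make_question(rng=None):
    """Pick one template at random and build a question from it."""
    rng = rng or random.Random()
    template = rng.choice([sum_question, day_question])
    return template(rng)

def check_answer(expected, given):
    """Compare answers, ignoring case and surrounding whitespace."""
    return given.strip().lower() == expected.lower()
```

Two templates repeat quickly, which is why a real deployment would need the large question bank mentioned above.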
2. Gimpy-
 Designed by Yahoo & CMU (Carnegie Mellon
University)
 Picks 10 random words from a dictionary, distorts them &
fills the image with noise.
 User has to recognize at least 3 words.
 If the user is correct, access is granted.
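The "at least 3 of 10 words" rule can be sketched as below. This covers only word selection and answer checking; image rendering & distortion are omitted. The names pick_words and gimpy_pass are hypothetical.

```python
import random

def pick_words(dictionary, rng=None, count=10):
    """Choose `count` distinct words from a word list (assumed to be a list)."""
    rng = rng or random.Random()
    return rng.sample(dictionary, count)

def gimpy_pass(shown_words, typed_text, required=3):
    """Pass if the user typed at least `required` distinct shown words."""
    shown = {w.lower() for w in shown_words}
    typed = {w.lower() for w in typed_text.split()}
    # Sets are used so that typing the same word three times does not count.
    return len(shown & typed) >= required
```

For example, gimpy_pass(words, "river apple stone") passes only if all three typed words are among the displayed ones.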
3. EZ-Gimpy-
 A modified version of Gimpy.
 Yahoo used this version in Messenger.
 Has only 1 random string of characters.
 Not a dictionary word, so not prone to dictionary
attack.
 Not a good implementation; already broken by
OCR (Optical Character Recognition).
4. MSN Passport service CAPTCHAs-
 Provided for Microsoft’s MSN services.
 Uses 8 characters.
 Warping is used to distort.
 Very strong implementation, hasn’t been broken.
 It is segmentation-resistant.
5. Graphic based CAPTCHAs-
 1. BONGO-
 Named after M. M. Bongard, a pattern recognition expert.
 User has to solve a pattern recognition problem.
 User has to identify the characteristic that distinguishes
two sets of figures.
 Then say which set a given figure belongs to.
Contd…
 2. PIX-
 Uses a large database of labelled images.
 It shows a set of images; the user has to recognize
the feature common to them.
 E.g. the common characteristic of the
following 4 pictures = “aeroplane”.
6. Audio CAPTCHAs-
 Consists of a downloadable audio clip.
 User listens & enters the spoken word.
 Helps visually impaired users.
 Below is Google’s audio-enabled CAPTCHA-
Applications
 Protect Online polls.
 Prevent web registration abuse, protect
passwords from brute-force attack.
 Prevent comment spam & spam e-mails.
 Prevent scalping in e-ticketing.
Contd…
 Verify digitized books: “reCAPTCHA”
 Used in the Google Books project.
 Two words are shown; the program knows only the first word.
 If the user enters the first word correctly, the program assumes that
the second, unknown word was also entered correctly.
 The second word then becomes “known”.
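A rough sketch of the two-word logic, under stated assumptions: the class name WordVerifier, its method names & the votes_needed parameter are hypothetical. With the default of 1, a single correct control word is enough, as described above; a higher value (an assumption beyond these slides) would require several users to agree.

```python
from collections import Counter, defaultdict

class WordVerifier:
    def __init__(self, votes_needed=1):
        self.votes_needed = votes_needed      # assumed agreement threshold
        self.votes = defaultdict(Counter)     # unknown word id -> guess counts
        self.known = {}                       # unknown word id -> accepted text

    def submit(self, control_answer, control_typed, unknown_id, unknown_typed):
        """Return True if the user passes; record a vote for the unknown word."""
        if control_typed.strip().lower() != control_answer.lower():
            return False                      # control word wrong: discard both
        guess = unknown_typed.strip().lower()
        self.votes[unknown_id][guess] += 1
        if self.votes[unknown_id][guess] >= self.votes_needed:
            self.known[unknown_id] = guess    # the unknown word becomes "known"
        return True
```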
Constructing CAPTCHAs
 Things to keep in mind:
 Don’t store the CAPTCHA solution in the web page’s
metadata.
 A text CAPTCHA without distortion is easy for OCR to read.
 Need a large database of different CAPTCHA
questions.
 Avoid repeating questions.
CAPTCHA logic
 Generate the question
 Persist the correct answer
 Present the question to the user
 Evaluate the answer; if incorrect, start again
with a different CAPTCHA
 If correct, allow the user access
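The steps above can be sketched as a small server-side class. This is an illustration only: CaptchaSession and its methods are hypothetical names, and the in-memory dict stands in for whatever session store a real site uses. The answer stays on the server; the client sees only a random token & the question.

```python
import secrets

class CaptchaSession:
    def __init__(self, make_question):
        self._make_question = make_question   # callable returning (question, answer)
        self._answers = {}                    # token -> answer, never sent to the client

    def new_challenge(self):
        """Generate a question, persist its answer, return what the user sees."""
        question, answer = self._make_question()
        token = secrets.token_hex(16)
        self._answers[token] = answer
        return token, question

    def evaluate(self, token, response):
        """Return (True, None) on success, else (False, a fresh challenge)."""
        expected = self._answers.pop(token, None)   # each token is single-use
        if expected is not None and response.strip().lower() == expected.lower():
            return True, None
        return False, self.new_challenge()
```

Popping the token on every attempt means a failed or replayed answer always leads to a different CAPTCHA.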
Breaking CAPTCHAs
 Cracking CAPTCHAs through programs –
 Convert the CAPTCHA image to greyscale.
 Detect patterns in the image
corresponding to the characters
 Greg Mori & Jitendra Malik have broken text
CAPTCHAs
E.g. EZ-Gimpy
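A toy illustration of the first steps, in plain Python on nested lists of (R, G, B) tuples. It converts to greyscale, thresholds to black/white & splits the image into column ranges that contain ink. This naive column split is a stand-in chosen for brevity; it is not the method used by Mori & Malik, and it fails on touching or overlapping characters. The function names and the threshold of 128 are assumptions.

```python
def to_grayscale(rgb_image):
    """Weighted luminance (ITU-R BT.601 weights) for each pixel."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]

def binarize(gray, threshold=128):
    """1 = dark pixel (ink), 0 = light pixel (background)."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]

def column_segments(binary):
    """Return (first_col, last_col) for each run of columns containing ink."""
    width = len(binary[0])
    has_ink = [any(row[x] for row in binary) for x in range(width)]
    segments, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                          # a run of ink columns begins
        elif not ink and start is not None:
            segments.append((start, x - 1))    # the run ended at the previous column
            start = None
    if start is not None:
        segments.append((start, width - 1))    # run reaches the right edge
    return segments
```

Each segment would then be passed to a character recognizer, which is the hard part & is not shown here.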
Contd…
 Social engineering to break CAPTCHAs –
 Spammer encounters a CAPTCHA
 That CAPTCHA is copied to another site
 Humans are baited, e.g. with free MP3s, free wallpapers, etc.
 To get those MP3s or wallpapers, users are told to solve
the copied CAPTCHA.
 Then the solution is routed back to the spammer.
Solution – Fix a time-to-live period for each question, so that a relayed answer arrives too late.
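A minimal sketch of the time-to-live idea, assuming answers are kept server-side. ExpiringChallenges, its methods & the 120-second default are hypothetical; the clock is injectable so the expiry logic can be tested without waiting.

```python
import time

class ExpiringChallenges:
    def __init__(self, ttl_seconds=120, clock=time.monotonic):
        self._ttl = ttl_seconds      # assumed lifetime of one question
        self._clock = clock
        self._store = {}             # token -> (answer, expiry time)

    def issue(self, token, answer):
        """Remember the answer together with its expiry time."""
        self._store[token] = (answer, self._clock() + self._ttl)

    def verify(self, token, response):
        """Accept only a correct answer that arrives before expiry."""
        entry = self._store.pop(token, None)   # single use, even on failure
        if entry is None:
            return False
        answer, expires_at = entry
        if self._clock() > expires_at:
            return False                       # too late: likely relayed
        return response.strip().lower() == answer.lower()
```

The period is a trade-off: too short & slow human readers fail, too long & the relay attack still fits inside the window.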
Issues with CAPTCHAs
 Usability issue –
 W3C guidelines call for the web to be accessible to all
people.
 Some CAPTCHAs are inaccessible to visually
impaired or cognitively challenged people.
 Compatibility issue –
 JavaScript may need to be enabled in the
browser.
 Some may need the Adobe Flash plugin.
SUMMARY
CAPTCHAs are an effective way to
counter bots & reduce spam.
They help advance AI: breaking a
CAPTCHA means solving a hard AI problem.
Usability & compatibility issues in current
implementations remain challenges
for future improvement.
THANK YOU

Captcha and Recaptcha Seminar
