Published in: Travel, Technology
  • Captcha

    1. Are you Human? (Sorry, I had to ask) Ecaterina Valică
    2. Agenda <ul><li>What is CAPTCHA? </li></ul><ul><li>Types of CAPTCHA </li></ul><ul><li>Where to use CAPTCHAs? </li></ul><ul><li>Guidelines when making a CAPTCHA </li></ul><ul><li>Ways to break CAPTCHAs </li></ul><ul><li>reCAPTCHA </li></ul><ul><li>Human Computation Games </li></ul>
    3. Example: Filling out a form. Google uses a CAPTCHA for Gmail account sign-up:
    4. Beginnings <ul><li>Completely Automated Public Turing test to tell Computers and Humans Apart </li></ul><ul><li>Created in 2000 for Yahoo to prevent automated e-mail account registration; </li></ul><ul><li>by Luis von Ahn, Manuel Blum, Nicholas Hopper and John Langford, Carnegie Mellon University. </li></ul>
    5. Inventor: Luis von Ahn (b. 1978). Photograph by Mike McGregor
    6. What is CAPTCHA? <ul><li>A program that can tell whether its user is a human or a computer. </li></ul><ul><li>It uses a type of challenge-response test to determine that the response is not generated by a computer. </li></ul>
    7. Turing Test <ul><li>&quot;Standard interpretation&quot;: </li></ul><ul><li>player C, the interrogator, </li></ul><ul><li>is tasked with trying to determine which player </li></ul><ul><li>(A or B) is a computer and which is a human. </li></ul>
    8. Reverse Turing Test <ul><li>A CAPTCHA is sometimes described as a reverse Turing test, because it is </li></ul><ul><li>administered by a machine and </li></ul><ul><li>targeted at a human. </li></ul>
    9. So, CAPTCHA is… <ul><li>A program that can generate and grade tests that: </li></ul><ul><li>Most humans can pass; </li></ul><ul><li>Current computer programs cannot pass. </li></ul>
    10. Making a CAPTCHA <ul><li>Pick a random string of characters (or words), e.g. ifhkfp; </li></ul><ul><li>Render it into a distorted image. </li></ul>
    11. Making a CAPTCHA <ul><li>… and the program generates a test: </li></ul><ul><li>Type the characters that appear in the image </li></ul>
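The generate-and-grade loop from the two slides above can be sketched in Python. This is a minimal sketch, assuming a lowercase alphabet and a six-character string like the slide's `ifhkfp`; the names `make_challenge` and `grade` are hypothetical, and the image-distortion step is omitted because it needs an imaging library.

```python
import secrets
import string

# Assumption: lowercase letters only, six characters, as in the "ifhkfp" example.
ALPHABET = string.ascii_lowercase
LENGTH = 6


def make_challenge(length: int = LENGTH) -> str:
    """Pick a random string using a cryptographically secure generator."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def grade(expected: str, response: str) -> bool:
    """Grade the answer, ignoring surrounding spaces and letter case.
    compare_digest runs in constant time, so timing does not leak the answer."""
    normalized = response.strip().lower()
    return secrets.compare_digest(expected.encode(), normalized.encode())


challenge = make_challenge()
# A real system would render `challenge` into a distorted image at this point.
print("Type the characters that appear in the image")
print(grade(challenge, challenge.upper()))  # True
```

Only the grader ever needs the plain answer; the user sees nothing but the rendered image.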
    12. Outperform the computers <ul><li>In many simple tasks, a typical 5-year-old can outperform the most powerful computers. </li></ul><ul><li>Easy for computers: </li></ul><ul><ul><li>medical diagnosis, </li></ul></ul><ul><ul><li>playing chess; </li></ul></ul><ul><li>Hard for computers: </li></ul><ul><ul><li>operations requiring vision, hearing, language or motor control. </li></ul></ul>
    13. Type: Early CAPTCHAs <ul><li>Generated by the EZ-Gimpy program; </li></ul><ul><li>Used previously on Yahoo! </li></ul>
    14. Type: Improved CAPTCHA <ul><li>high contrast for human readability; </li></ul><ul><li>medium, per-character perturbation; </li></ul><ul><li>random fonts per character; </li></ul><ul><li>low background noise; </li></ul>
    15. Type: A modern CAPTCHA <ul><li>rather than relying on a distorted background and heavy warping of the text, </li></ul><ul><li>it focuses on making segmentation difficult by adding an angled line; </li></ul>
    16. Type: A modern CAPTCHA <ul><li>another way to make segmentation difficult is to crowd symbols together; </li></ul><ul><li>humans can still read the result, but bots have great difficulty segmenting it; </li></ul>
    17. Other Types of CAPTCHA <ul><li>Animated CAPTCHAs </li></ul><ul><li>3D CAPTCHA </li></ul><ul><li>ASCII art </li></ul><ul><li>Reverse CAPTCHA &quot;Leave this field blank&quot; </li></ul>
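The reverse CAPTCHA ("Leave this field blank") works because naive bots fill in every form field. A minimal server-side check might look like this; the field name `website` and the dict-shaped form data are assumptions for illustration, not taken from any particular framework.

```python
# Hypothetical honeypot field, hidden from humans with CSS so only bots fill it.
HONEYPOT_FIELD = "website"


def looks_like_bot(form: dict[str, str]) -> bool:
    """Return True if the hidden 'leave this field blank' input was filled in."""
    return form.get(HONEYPOT_FIELD, "").strip() != ""


print(looks_like_bot({"name": "Ana", "website": ""}))                   # False
print(looks_like_bot({"name": "x", "website": "http://spam.example"}))  # True
```

The check is cheap and invisible to humans, but a bot written for the specific site can simply leave the field empty.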
    18. Other: Cognitive Puzzles <ul><li>Distinguish pictures of dogs from cats </li></ul><ul><li>Choose a word that relates to all the images </li></ul><ul><li>Trivia questions </li></ul><ul><li>Math and word problems </li></ul><ul><li>3D Object CAPTCHA </li></ul><ul><li>Solve failed OCR inputs </li></ul>
    19. Other: Distinguish pictures <ul><li>Microsoft Asirra (Animal Species Image Recognition for Restricting Access); </li></ul><ul><li>KittenAuth Project. </li></ul>
    20. Other: Mathematical CAPTCHA
    21. Other: Mathematical CAPTCHA
    22. Other: 3D Object CAPTCHA <ul><li>You must enter them in the exact sequence listed: </li></ul><ul><li>The Head of the Walking Man, </li></ul><ul><li>The Vase, </li></ul><ul><li>The Back of the Chair. </li></ul>
    23. Other: Jumble Game
    24. Other: Drupal Examples
    25. Other: Tests <ul><li>&quot;Common sense&quot; questions: </li></ul><ul><ul><li>&quot;What is 3 + 5?&quot; </li></ul></ul><ul><ul><li>&quot;What color is the sky?&quot; </li></ul></ul><ul><li>Type the word 'orange'; </li></ul><ul><li>Require a valid email to approve; </li></ul><ul><li>These attempts violate CAPTCHA principles: </li></ul><ul><ul><li>they cannot be automatically generated; </li></ul></ul><ul><ul><li>they can be easily cracked given the state of AI. </li></ul></ul>
    26. Where to use CAPTCHAs? <ul><li>Data Collection </li></ul><ul><li>Worms and Spam </li></ul><ul><li>Preventing Comment Spam in Blogs </li></ul><ul><li>Protecting Email Addresses From Scrapers </li></ul><ul><li>Online Polls </li></ul><ul><li>Protecting Website Registration </li></ul><ul><li>Preventing Dictionary Attacks </li></ul><ul><li>Search Engine Bots </li></ul>
    27. Where to use CAPTCHAs? <ul><li>Preventing Comment Spam in Blogs. </li></ul><ul><li>Protecting Email Addresses From Scrapers. To hide your email address, require users to solve a CAPTCHA before the address is shown. </li></ul><ul><li>Online Polls. You cannot trust the results of an online poll, because anybody could write a program that votes for their favorite option thousands of times. </li></ul>
    28. Where to use CAPTCHAs? <ul><li>Protecting Website Registration (e-mail services: Yahoo, Microsoft, Google). </li></ul><ul><li>Preventing Dictionary Attacks (in password systems). Prevent a computer from iterating through the entire space of passwords by requiring it to solve a CAPTCHA after a certain number of unsuccessful logins. </li></ul><ul><li>Search Engine Bots. It is sometimes desirable to keep webpages unindexed to prevent others from finding them easily. </li></ul>
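The dictionary-attack defence (require a CAPTCHA after a certain number of unsuccessful logins) can be sketched as a per-account failure counter. The threshold of 3 and the class name `LoginGuard` are assumptions; a real deployment would also persist the counters and expire them over time.

```python
# Assumed threshold: after this many consecutive failures, demand a CAPTCHA.
MAX_FAILURES = 3


class LoginGuard:
    """Tracks failed logins per account (in memory only, for illustration)."""

    def __init__(self, max_failures: int = MAX_FAILURES) -> None:
        self.max_failures = max_failures
        self.failures: dict[str, int] = {}

    def captcha_required(self, username: str) -> bool:
        # Check this before verifying the password, so a bot cannot keep
        # guessing without solving a CAPTCHA on every further attempt.
        return self.failures.get(username, 0) >= self.max_failures

    def record(self, username: str, success: bool) -> None:
        if success:
            self.failures.pop(username, None)  # a correct password resets the count
        else:
            self.failures[username] = self.failures.get(username, 0) + 1


guard = LoginGuard()
for _ in range(3):
    guard.record("alice", success=False)
print(guard.captcha_required("alice"))  # True
guard.record("alice", success=True)
print(guard.captcha_required("alice"))  # False
```

Legitimate users who mistype once or twice never see a CAPTCHA, while a program walking the password space must solve one per guess.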
    29. Guidelines <ul><li>Image Security. Images of text should be distorted randomly before being presented to the user. </li></ul><ul><li>Script Level Security. Common insecurities: </li></ul><ul><ul><li>systems that pass the answer to the client in plain text; </li></ul></ul><ul><ul><li>systems where a solution to the same CAPTCHA can be used multiple times (&quot;replay attacks&quot;). </li></ul></ul>
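Both script-level insecurities can be avoided by keeping the answer on the server and deleting it on first use. This is a minimal in-memory sketch; the `ChallengeStore` class and its method names are hypothetical, and a real system would use a shared store with expiry.

```python
import secrets


class ChallengeStore:
    """Server-side store: the client only ever sees an opaque random token."""

    def __init__(self) -> None:
        self._answers: dict[str, str] = {}

    def issue(self, answer: str) -> str:
        # The token reveals nothing about the answer, unlike an answer
        # (or a hash of it) sent to the client.
        token = secrets.token_urlsafe(16)
        self._answers[token] = answer.lower()
        return token

    def verify(self, token: str, response: str) -> bool:
        # pop() deletes the entry, so each challenge can be checked only once:
        # a replayed (token, answer) pair fails, and a wrong guess also burns
        # the challenge, forcing the client to fetch a new one.
        answer = self._answers.pop(token, None)
        if answer is None:
            return False
        return secrets.compare_digest(answer.encode(), response.strip().lower().encode())


store = ChallengeStore()
token = store.issue("ifhkfp")
print(store.verify(token, "ifhkfp"))  # True
print(store.verify(token, "ifhkfp"))  # False (replay)
```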
    30. Guidelines <ul><li>Security Even After Widespread Adoption. Some CAPTCHAs would become insecure if a significant number of sites started using them. </li></ul><ul><ul><li>Example: text-based questions; </li></ul></ul><ul><ul><li>a parser could easily be written that would let bots bypass the test; </li></ul></ul><ul><ul><li>such “CAPTCHAs” rely on the fact that few sites use them, so a bot author has no incentive to program a bot to solve that challenge. </li></ul></ul>
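To make the point concrete, here is roughly how little code a bot needs to answer "What is 3 + 5?" style questions once enough sites rely on them. This hypothetical sketch handles only `+`, `-` and `*` on integers written as digits; anything else returns `None`.

```python
import operator
import re
from typing import Optional

# Only the phrasing "What is <int> <op> <int>?" is assumed here.
PATTERN = re.compile(r"what is\s+(-?\d+)\s*([+\-*])\s*(-?\d+)\s*\?", re.IGNORECASE)
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def solve(question: str) -> Optional[int]:
    """Return the answer to a simple arithmetic question, or None if unrecognized."""
    match = PATTERN.search(question)
    if match is None:
        return None
    left, op, right = match.groups()
    return OPS[op](int(left), int(right))


print(solve("What is 3 + 5?"))  # 8
```

Each new phrasing a site invents costs the bot author a few more lines, which is why such tests only survive while they stay obscure.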
    31. Guidelines <ul><li>Accessibility. </li></ul><ul><ul><li>Visual CAPTCHAs can prevent visually impaired users (whether because of a disability or simply because the image is hard to read) from accessing the protected resource; </li></ul></ul><ul><ul><li>Such users rely on screen readers, and when a screen reader reaches an image, all it can do is read the image's caption; </li></ul></ul><ul><ul><li>Solution: let users opt for an audio CAPTCHA. </li></ul></ul>
    32. Guidelines: Accessibility. Hard-to-read CAPTCHAs:
    33. Guidelines: Accessibility. Worst CAPTCHAs:
    34. Ways to break CAPTCHAs <ul><li>Exploiting bugs in the implementation that allow the attacker to bypass the CAPTCHA completely; </li></ul><ul><li>Improving character recognition software (OCR, Optical Character Recognition); </li></ul><ul><li>Using cheap human labor to solve the tests (sweatshops). </li></ul>
    35. Break: Insecure implementation <ul><li>Re-using the session ID of a known CAPTCHA image. </li></ul><ul><li>Other CAPTCHAs pass a hash of the solution to the client as a validation key. The solution space is often small enough that the hash can be cracked. </li></ul><ul><li>Other implementations use only a limited, fixed pool of CAPTCHA images (Asirra: 3 million). </li></ul>
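Why a hashed solution is weak: if the answer is a short lowercase string, an attacker can hash every candidate until one matches. This sketch assumes an unsalted MD5 of the lowercase answer; the choice of hash and alphabet is illustrative, not a description of any specific CAPTCHA product.

```python
import hashlib
import itertools
import string
from typing import Optional


def crack(target_hex: str, length: int, alphabet: str = string.ascii_lowercase) -> Optional[str]:
    """Try every string of the given length until its MD5 digest matches."""
    for combo in itertools.product(alphabet, repeat=length):
        candidate = "".join(combo)
        if hashlib.md5(candidate.encode()).hexdigest() == target_hex:
            return candidate
    return None


# The key the client would receive for the answer "cat":
leaked_key = hashlib.md5(b"cat").hexdigest()
print(crack(leaked_key, 3))  # cat
```

With 26 letters, even a 6-character answer has only 26^6 ≈ 309 million candidates, which is well within reach of a single machine. A slower or salted hash raises the cost somewhat, but the robust fix is to never send anything derived from the answer to the client.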
    36. Break: Character Recognition <ul><li>Programs that perform the following steps: </li></ul><ul><ul><li>extraction of the image from the web page; </li></ul></ul><ul><ul><li>removal of background clutter, for example with color filters and detection of thin lines; </li></ul></ul><ul><ul><li>segmentation, i.e. splitting the image into regions that each contain a single letter; </li></ul></ul><ul><ul><li>identification of the letter in each region. </li></ul></ul>
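Segmentation is often the hardest of these steps. A naive approach, sketched here on a tiny binary image (1 = ink), splits the image wherever a column contains no ink. The list-of-rows image format and the name `segment_columns` are assumptions for illustration. The example also shows why crowding symbols together (slide 16) defeats this approach: touching letters leave no empty column.

```python
def segment_columns(image: list[list[int]]) -> list[tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, for each ink region."""
    width = len(image[0])
    # Vertical projection: does any row have ink in this column?
    has_ink = [any(row[x] for row in image) for x in range(width)]
    segments = []
    start = None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x  # a new region begins
        elif not ink and start is not None:
            segments.append((start, x))  # region ended at an empty column
            start = None
    if start is not None:
        segments.append((start, width))  # region touching the right edge
    return segments


# Two glyphs separated by an empty column, then similar glyphs crowded together.
separated = [
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 1],
]
crowded = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
]
print(segment_columns(separated))  # [(0, 2), (3, 5)]
print(segment_columns(crowded))    # [(0, 4)]
```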
    37. Attacks – EZ-Gimpy 2000 <ul><li>Yahoo's early CAPTCHA, called &quot;EZ-Gimpy&quot;; </li></ul><ul><li>The program picks a word from a dictionary and produces a distorted, noisy image of the word; </li></ul><ul><li>Algorithm for breaking EZ-Gimpy (92% success rate): </li></ul><ul><ul><li>locate possible letters at various locations; </li></ul></ul><ul><ul><li>construct a graph of consistent letters; </li></ul></ul><ul><ul><li>look for plausible words in the graph. </li></ul></ul>
    38. Attacks – EZ-Gimpy 2000 <ul><li>EZ-Gimpy </li></ul><ul><li>Possible Letters </li></ul><ul><li>Graph of Letters </li></ul><ul><li>Plausible Words </li></ul>
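The final step (look for plausible words) can be approximated by checking each dictionary word against the candidate letters found at each position. This is a much simplified sketch, not the published algorithm: it assumes the letter detector has already produced a set of candidate letters per position, ignores the spatial graph, and uses a tiny hypothetical word list.

```python
def plausible_words(candidates: list[set[str]], dictionary: list[str]) -> list[str]:
    """Keep dictionary words whose i-th letter is among the candidates at position i."""
    return [
        word
        for word in dictionary
        if len(word) == len(candidates)
        and all(letter in options for letter, options in zip(word, candidates))
    ]


# Hypothetical detector output for a 4-letter word: noisy, ambiguous letters.
candidates = [{"h", "b", "n"}, {"o", "a", "e"}, {"r", "n"}, {"s", "e"}]
words = ["horse", "bars", "bone", "here", "nose", "hens"]
print(plausible_words(candidates, words))  # ['bars', 'bone', 'here', 'hens']
```

Several words usually survive, so a real attack has to rank them, for example by how confident the letter detector was at each position; this sketch only lists them.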
    39. Attacks – SimpleOCR Engine 2002
    40. Attacks – Jan/Feb 2008 <ul><li>Google (Jan 17): 20% </li></ul><ul><li>Hotmail (Feb 6): 30-35% </li></ul><ul><li>Yahoo (Feb 22): 30-35% </li></ul>
    41. Attacks – Projects <ul><li>Several CAPTCHA-breaking projects: </li></ul><ul><ul><li>http:// / captcha / </li></ul></ul>
    42. Break: Human solvers <ul><li>Attacks that use humans to solve the puzzles; </li></ul><ul><li>Approaches: </li></ul><ul><ul><li>relaying the puzzles to a group of human operators who solve CAPTCHAs; </li></ul></ul><ul><ul><li>copying the CAPTCHA images and using them as CAPTCHAs on a high-traffic site owned by the attacker. </li></ul></ul>
    43. CAPTCHA Sweatshops <ul><li>A computer fills out a form, and when it reaches a CAPTCHA, it passes it to a human operator to solve. </li></ul><ul><li>Weakness of Asirra: </li></ul><ul><ul><li>if the database of cat and dog photos can be downloaded, </li></ul></ul><ul><ul><li>then paying workers $0.01 to classify each photo </li></ul></ul><ul><ul><li>means that almost the entire database of photos can be labeled for $30,000. </li></ul></ul><ul><li>Once an IP address has misclassified a challenge, a human needs only to solve two Asirras in a row from the same browser session. </li></ul>
    44. CAPTCHA Sweatshops <ul><li>Not Economically Viable </li></ul><ul><li>A typical spam run of 1 million messages per day would cost $14,000 per day and require 116 people working 24/7. </li></ul><ul><li>$2.50/h for each human; 720 CAPTCHAs per hour per human; 1/3 cent per account. </li></ul>
    45. Porn Companies (October 2007) <ul><li>They write a program that fills out an entire registration form (e.g. Yahoo's); </li></ul><ul><li>When the program reaches the CAPTCHA, it can’t solve it; </li></ul><ul><li>So it copies the CAPTCHA to the porn page; </li></ul><ul><li>A visitor sees a screen saying that to see the next picture, they have to type the word shown in the CAPTCHA. </li></ul>
    46. Porn Companies (October 2007)
    47. Next CAPTCHA Generation <ul><li>CAPTCHAs can be made stronger, but they are already too difficult for a large percentage of Internet users; </li></ul><ul><li>Made any harder, a CAPTCHA devolves from a simple human reading test into an intelligence test or an acuity test. </li></ul>
    48. reCAPTCHA (2007) <ul><li>A new form of CAPTCHA that also helps digitize books; </li></ul><ul><li>The words displayed to the user come directly from old books that are being digitized; </li></ul><ul><li>specifically, words that OCR could not identify; </li></ul>
    49. reCAPTCHA <ul><li>Pairs an unknown word with a known one; </li></ul><ul><li>Distorts them both, puts a line through them, and sends them to users to be proofread; </li></ul><ul><li>The respondent answers both elements: </li></ul><ul><ul><li>half of the effort validates the challenge; </li></ul></ul><ul><ul><li>the other half is captured as work. </li></ul></ul>
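The known-word/unknown-word split can be sketched as follows. This is a simplification under stated assumptions: the user passes if the known (control) word is right, the answer for the unknown word is only recorded as a vote, and the agreement threshold of 3 votes (`VOTES_NEEDED`) is an assumption, not reCAPTCHA's actual rule.

```python
from collections import Counter, defaultdict
from typing import Optional

VOTES_NEEDED = 3  # assumed number of matching answers before a reading is trusted

# unknown-word id -> tally of the answers users typed for it
votes = defaultdict(Counter)


def submit(control_expected: str, control_answer: str,
           unknown_id: str, unknown_answer: str) -> bool:
    """Grade one response; return True if the user passes the CAPTCHA."""
    if control_answer.strip().lower() != control_expected.lower():
        return False  # known word wrong: reject and discard the unknown-word answer
    # Known word right: the answer for the unknown word counts as one vote.
    votes[unknown_id][unknown_answer.strip().lower()] += 1
    return True


def transcription(unknown_id: str) -> Optional[str]:
    """Return the agreed reading of an unknown word, or None if not settled yet."""
    tally = votes.get(unknown_id)
    if not tally:
        return None
    word, count = tally.most_common(1)[0]
    return word if count >= VOTES_NEEDED else None


for _ in range(3):
    submit("upon", "upon", "w17", "morrow")
print(transcription("w17"))                   # morrow
print(submit("upon", "apon", "w17", "junk"))  # False
```

The user is graded only on the word whose answer is already known; the unknown word's reading is accepted once enough independent users agree on it.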
    50. reCAPTCHA
    51. Time spent <ul><li>Roughly 60 million CAPTCHAs are solved each day; </li></ul><ul><li>On average, solving a CAPTCHA takes 10 seconds; </li></ul><ul><li>People around the world waste more than 150,000 hours every day solving CAPTCHAs; </li></ul>
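A quick check of these figures, assuming the 10-second average applies to all 60 million daily CAPTCHAs:

```latex
60 \times 10^{6}\ \tfrac{\text{CAPTCHAs}}{\text{day}} \times 10\ \tfrac{\text{s}}{\text{CAPTCHA}}
= 6 \times 10^{8}\ \tfrac{\text{s}}{\text{day}}
= \frac{6 \times 10^{8}}{3600}\ \tfrac{\text{h}}{\text{day}}
\approx 166{,}700\ \tfrac{\text{h}}{\text{day}}
```

This is consistent with "more than 150,000 hours" per day.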
    52. Time spent <ul><li>A fifth of those users represents </li></ul><ul><li>30,000 man-hours of work every day; </li></ul><ul><li>Put to use, it would constitute the world's fastest and most accurate character-recognition computer, processing 10 million words a day; </li></ul><ul><li>Recreating the books, word by word. </li></ul>
    53. Time spent <ul><li>9 billion human-hours of Solitaire were played in 2003; </li></ul><ul><li>Empire State Building: 7 million human-hours (6.8 hours of Solitaire); </li></ul><ul><li>Panama Canal: 20 million human-hours (less than a day of Solitaire). </li></ul>
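The Solitaire comparisons follow from spreading the 9 billion hours evenly over the 8,760 hours of 2003 (not a leap year):

```latex
\frac{9 \times 10^{9}\ \text{human-hours}}{8760\ \text{h}} \approx 1.03 \times 10^{6}\ \text{human-hours of Solitaire per hour}

\frac{7 \times 10^{6}}{1.03 \times 10^{6}} \approx 6.8\ \text{h}
\qquad
\frac{20 \times 10^{6}}{1.03 \times 10^{6}} \approx 19.5\ \text{h} < 24\ \text{h}
```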
    54. Wasted human cycles <ul><li>If the world's computer Solitaire players could be coaxed into enjoying a game that contributed to solving a computing problem, von Ahn calculates, it would produce billions of man-hours of labor each year. </li></ul><ul><li>&quot;Make all of humanity more efficient by exploiting the human cycles that get wasted&quot; </li></ul>
    55. Wasted human cycles <ul><li>People will contribute their brainpower, but only if they're given an enjoyable, time-killing experience in exchange. </li></ul><ul><li>Most projects that harness human processing power rely on a different motivator: money. </li></ul><ul><li>Which produces better results: a small group of experts or a huge mob of amateurs? </li></ul>
    56. Human Computation <ul><li>Things that we humans can do and computers cannot, like: </li></ul><ul><ul><li>labeling images with words; </li></ul></ul><ul><ul><li>picking out a voice in a noisy room; </li></ul></ul><ul><li>Humans have trouble remembering long, random strings of characters, yet they excel at remembering faces and objects. </li></ul>
    57. Symbiotic relationship <ul><li>One in which humans solve some problems and computers solve others; </li></ul><ul><li>Image search: a method by which every image on the Web could be given an accurate textual description; </li></ul>
    58. The ESP Game <ul><li>Two-player online game; </li></ul><ul><li>Partners don’t know each other and can’t communicate; </li></ul><ul><li>Object of the game: type the same word; </li></ul><ul><li>The only thing the partners have in common is an image; </li></ul>
    59. The ESP Game. Player 1 guesses: CAR, HAT, KID. Player 2 guesses: BOY, CAR. Success! You agree on CAR.
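The matching rule in this example can be sketched as: process both players' guesses in the order they were typed, and stop as soon as one player types a word the other has already typed. The event-list format and the name `first_agreement` are assumptions for illustration; the real game had further rules not modelled here.

```python
from typing import Optional


def first_agreement(events: list[tuple[int, str]]) -> Optional[str]:
    """events: (player, word) pairs in the order typed; player is 1 or 2."""
    guessed: dict[int, set[str]] = {1: set(), 2: set()}
    for player, word in events:
        word = word.lower()
        other = 2 if player == 1 else 1
        if word in guessed[other]:
            return word  # both players have now typed this word
        guessed[player].add(word)
    return None


# One possible ordering of the guesses from the slide.
events = [(1, "CAR"), (2, "BOY"), (1, "HAT"), (1, "KID"), (2, "CAR")]
print(first_agreement(events))  # car
```

Because the partners cannot communicate, the easiest way to agree is to type words that actually describe the image, which is what makes the agreed words useful as labels.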
    60. The ESP Game <ul><li>In 2006 Google licensed the ESP Game as the Google Image Labeler, which is used to improve the accuracy of Google Image Search. </li></ul><ul><li>“5000 people playing simultaneously can </li></ul><ul><li>label all images on Google in 30 days!” </li></ul>
    62. ESP, Tag a Tune, Matchin
    63. Squigl, Verbosity
    64. Future Games <ul><li>Language translation. A game could challenge two players who don’t speak the same language to translate text from one language to the other. </li></ul><ul><li>Monitoring of security cameras. Players could monitor security cameras and alert authorities about suspected illegal activity. </li></ul>
    65. Future Games <ul><li>Improving Web search. People have varying degrees of skill at searching for information on the Web. A game could be designed in which the players perform searches for other people. </li></ul><ul><li>Text summarization. Imagine a game in which people summarize important documents for the rest of the world. </li></ul>
    66. Still not thinking big enough <ul><li>&quot;If we have that many people all doing some little part, we could do something insanely huge for humanity.&quot; </li></ul><ul><li>&quot;We'll never run out of things to digitize&quot; </li></ul>
    72. <ul><li>Thank you! </li></ul>