    • 1. Are you Human? (Sorry, I had to ask) Ecaterina Valică http://students.info.uaic.ro/~evalica/
    • 2. Agenda
      • What is CAPTCHA?
      • Types of CAPTCHA
      • Where to use CAPTCHAs?
      • Guidelines when making a CAPTCHA
      • Ways to break CAPTCHAs
      • reCAPTCHA
      • Human Computation Games
    • 3. Example: Filling out a form. Google uses CAPTCHA for Gmail accounts:
    • 4. Beginnings
      • Completely Automated Public Turing test to tell Computers and Humans Apart
      • Created in 2000 for Yahoo, to prevent automated e-mail account registration, by Luis von Ahn, Manuel Blum, Nicholas Hopper and John Langford of Carnegie Mellon University.
    • 5. Inventor: Luis von Ahn (born 1978). Photograph by Mike McGregor
    • 6. What is CAPTCHA?
      • A program that can tell whether its user is a human or a computer.
      • It uses a type of challenge-response test to determine that the response is not generated by a computer.
    • 7. Turing Test
      • “Standard interpretation”: player C, the interrogator, is tasked with determining which player, A or B, is a computer and which is a human.
    • 8. Reverse Turing Test
      • A CAPTCHA is sometimes described as a reverse Turing test, because it is administered by a machine and targeted to a human.
    • 9. So, CAPTCHA is…
      • A program that can generate and grade tests that:
      • Most humans can pass;
      • Current computer programs cannot pass.
    • 10. Making a CAPTCHA
      • Pick a random string of characters (or words), e.g. ifhkfp;
      • Render it into a distorted image.
    • 11. Making a CAPTCHA
      • … and the program generates a test:
      • Type the characters that appear in the image.
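The two generation steps above can be sketched as follows. This is a minimal illustration, assuming a lowercase-only alphabet like the "ifhkfp" example; `random_challenge` and `render_jittered` are hypothetical names, and the text-mode "rendering" (each character dropped onto a random row) merely stands in for the real image distortion, which would need an imaging library.

```python
import secrets
import string

# Assumption: lowercase letters only, as in the "ifhkfp" example.
ALPHABET = string.ascii_lowercase


def random_challenge(length: int = 6) -> str:
    """Pick a random string of characters with a cryptographically secure RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def render_jittered(text: str, height: int = 3, gap: int = 1) -> list[str]:
    """Toy text-mode 'rendering': each character lands on a randomly chosen row.

    A crude stand-in for the per-character perturbation of a real image CAPTCHA.
    """
    step = 1 + gap
    grid = [[" "] * (len(text) * step) for _ in range(height)]
    for i, ch in enumerate(text):
        grid[secrets.randbelow(height)][i * step] = ch
    return ["".join(row).rstrip() for row in grid]


challenge = random_challenge()
for line in render_jittered(challenge):
    print(line)
print("Type the characters that appear in the image")
```

The server keeps `challenge` as the expected answer; only the rendered form is shown to the user.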
    • 12. Outperform the computers
      • In many simple tasks, a typical 5-year-old can outperform the most powerful computers.
      • Easier for computers:
        • medical diagnosis,
        • playing chess;
      • Hard for computers:
        • operations requiring vision, hearing, language or motor control.
    • 13. Type: Early CAPTCHAs
      • Generated by the EZ-Gimpy program;
      • Used previously on Yahoo!
    • 14. Type: Improved CAPTCHA
      • high contrast for human readability;
      • medium, per-character perturbation;
      • random fonts per character;
      • low background noise;
    • 15. Type: A modern CAPTCHA
      • Rather than relying on a distorted background and heavy warping of the text, it focuses on making segmentation difficult by adding an angled line;
    • 16. Type: A modern CAPTCHA
      • Another way to make segmentation difficult is to crowd symbols together;
      • the result can still be read by humans but is hard for bots to segment;
    • 17. Other Types of CAPTCHA
      • Animated CAPTCHAs
      • 3D CAPTCHA
      • ASCII art
      • Reverse CAPTCHA: “Leave this field blank”
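The reverse CAPTCHA ("leave this field blank", often called a honeypot field) is simple enough to sketch in a few lines. This is a hedged illustration: the field name `website` and the function `looks_like_bot` are hypothetical, and the idea assumes that naive form-filling bots fill in every input they find while humans obey the label (or never see the field if it is hidden with CSS).

```python
def looks_like_bot(form: dict[str, str], honeypot_field: str = "website") -> bool:
    """Flag a submission that filled in the 'leave this field blank' input.

    "website" is a hypothetical field name chosen to look attractive to bots.
    """
    return form.get(honeypot_field, "").strip() != ""


print(looks_like_bot({"name": "Ana", "website": ""}))           # False
print(looks_like_bot({"name": "x", "website": "http://spam"}))  # True
```

A bot written specifically for the site simply leaves the field empty, so this only stops untargeted bots.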
    • 18. Other: Cognitive Puzzles
      • Distinguish pictures of dogs from cats
      • Choose a word that relates to all the images
      • Trivia questions
      • Math and word problems
      • 3D Object CAPTCHA
      • Solve failed OCR inputs
    • 19. Other: Distinguish pictures
      • Microsoft Asirra (Animal Species Image Recognition for Restricting Access);
      • KittenAuth Project.
    • 20. Other: Mathematical CAPTCHA
    • 21. Other: Mathematical CAPTCHA
    • 22. Other: 3D Object CAPTCHA
      • You must enter them in the exact sequence listed:
      • The Head of the Walking Man,
      • The Vase,
      • The Back of the Chair.
    • 23. Other: Jumble Game
    • 24. Other: Drupal Examples
    • 25. Other: Tests
      • “Common sense” questions:
        • “What is 3 + 5?”
        • “What color is the sky?”
      • Type the word 'orange';
      • Require a valid email to approve;
      • These attempts violate CAPTCHA principles:
        • they cannot be automatically generated;
        • they can be easily cracked given the state of AI.
    • 26. Where to use CAPTCHAs?
      • Data Collection
      • Worms and Spam
      • Preventing Comment Spam in Blogs
      • Protecting Email Addresses From Scrapers
      • Online Polls
      • Protecting Website Registration
      • Preventing Dictionary Attacks
      • Search Engine Bots
    • 27. Where to use CAPTCHAs?
      • Preventing Comment Spam in Blogs.
      • Protecting Email Addresses From Scrapers. A mechanism for hiding your email address: require users to solve a CAPTCHA before the address is shown.
      • Online Polls. You cannot trust the results of an online poll, because anybody could write a program to vote for their favorite option thousands of times.
    • 28. Where to use CAPTCHAs?
      • Protecting Website Registration (e-mail services: Yahoo, Microsoft, Google).
      • Preventing Dictionary Attacks (in password systems). Prevent a computer from iterating through the entire space of passwords by requiring a CAPTCHA after a certain number of unsuccessful logins.
      • Search Engine Bots. It is sometimes desirable to keep webpages unindexed to prevent others from finding them easily.
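The dictionary-attack defense (require a CAPTCHA after a certain number of failed logins) can be sketched as a small counter. The threshold of 3, per-username counting, and the class name `LoginGuard` are illustrative assumptions; real deployments may also count failures per IP address.

```python
from collections import defaultdict


class LoginGuard:
    """Require a CAPTCHA once an account has `threshold` consecutive failed logins."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.failures = defaultdict(int)  # username -> consecutive failed attempts

    def captcha_required(self, username: str) -> bool:
        return self.failures[username] >= self.threshold

    def record_attempt(self, username: str, success: bool) -> None:
        if success:
            self.failures[username] = 0  # a successful login resets the counter
        else:
            self.failures[username] += 1


guard = LoginGuard(threshold=3)
for _ in range(3):
    guard.record_attempt("ana", success=False)
print(guard.captcha_required("ana"))  # True
```

The login handler would check `captcha_required` before checking the password, so a program iterating through passwords must solve a CAPTCHA for every guess past the threshold.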
    • 29. Guidelines
      • Image Security. Images of text should be distorted randomly before being presented to the user.
      • Script-level security. Common insecurities:
        • systems that pass the answer to the client in plain text;
        • systems where a solution to the same CAPTCHA can be used multiple times ("replay attacks").
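Both script-level insecurities can be avoided by keeping answers on the server and making each challenge single-use. A minimal sketch, assuming an in-memory dictionary (a real site would need shared storage and expiry, which are omitted here); `ChallengeStore` is a hypothetical name.

```python
import hmac
import secrets


class ChallengeStore:
    """Keeps CAPTCHA answers server-side; the client only receives an opaque token."""

    def __init__(self) -> None:
        self._answers: dict[str, str] = {}

    def issue(self, answer: str) -> str:
        token = secrets.token_urlsafe(16)  # random, unguessable challenge id
        self._answers[token] = answer.lower()
        return token

    def verify(self, token: str, response: str) -> bool:
        # pop() makes every token single-use, so a solved CAPTCHA cannot be replayed;
        # a wrong answer also consumes the token, forcing a fresh challenge.
        expected = self._answers.pop(token, None)
        if expected is None:
            return False
        return hmac.compare_digest(expected.encode(), response.strip().lower().encode())


store = ChallengeStore()
token = store.issue("ifhkfp")
print(store.verify(token, "IFHKFP"))  # True
print(store.verify(token, "ifhkfp"))  # False: replay of an already-used token
```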
    • 30. Guidelines
      • Security even after widespread adoption. Some CAPTCHAs would be insecure if a significant number of sites started using them.
        • Example: text-based questions;
        • A parser could easily be written that would allow bots to bypass the test;
        • Such “CAPTCHAs” rely on the fact that few sites use them, and thus that a bot author has no incentive to program their bot to solve that challenge.
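To show how little effort such a parser takes, here is a sketch that answers "What is 3 + 5?"-style questions with one regular expression. The accepted question format is an assumption based on the example on slide 25; `solve` is a hypothetical name.

```python
import re
from typing import Optional

# Hypothetical question format: "What is 3 + 5?"
QUESTION = re.compile(r"what is\s+(\d+)\s*([-+*])\s*(\d+)\s*\?", re.IGNORECASE)


def solve(question: str) -> Optional[int]:
    """Answer a simple arithmetic question, or return None if it is not recognized."""
    match = QUESTION.search(question)
    if match is None:
        return None
    a, op, b = int(match.group(1)), match.group(2), int(match.group(3))
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    return a * b


print(solve("What is 3 + 5?"))  # 8
```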
    • 31. Guidelines
      • Accessibility.
        • CAPTCHAs can prevent visually impaired users (for example, people with a disability, or anyone who finds the distorted text too difficult to read) from accessing the protected resource;
        • such users rely on screen readers, which, on reaching an image, can only read that image's caption;
        • Solution: let users opt for an audio CAPTCHA instead.
    • 32. Guidelines: Accessibility. Hard-to-read CAPTCHAs:
    • 33. Guidelines: Accessibility. Worst CAPTCHAs:
    • 34. Ways to break CAPTCHAs
      • Exploiting bugs in the implementation that allow the attacker to completely bypass the CAPTCHA;
      • Improving character recognition software (OCR, Optical Character Recognition);
      • Using cheap human labor to process the tests (sweatshops).
    • 35. Break: Insecure implementation
      • Re-using the session ID of a known CAPTCHA image.
      • Other CAPTCHAs pass a hash of the solution to the client as a validation key; the solution is often short enough that the hash can be cracked by brute force.
      • Other implementations use only a small, fixed pool of CAPTCHA images (Asirra: 3 million).
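Why a hashed solution sent to the client is weak: the attacker simply hashes every possible answer. A sketch under stated assumptions: SHA-256 as the (hypothetical) hash and short lowercase answers; `leaked_hash` and `crack` are illustrative names. Even a strong hash does not help, because the answer space is tiny.

```python
import hashlib
import itertools
import string
from typing import Optional


def leaked_hash(solution: str) -> str:
    """The insecure pattern: the page ships a hash of the answer for client-side checks."""
    return hashlib.sha256(solution.encode()).hexdigest()


def crack(target: str, length: int, alphabet: str = string.ascii_lowercase) -> Optional[str]:
    """Try every string of `length` symbols until one hashes to `target`."""
    for combo in itertools.product(alphabet, repeat=length):
        guess = "".join(combo)
        if hashlib.sha256(guess.encode()).hexdigest() == target:
            return guess
    return None


print(crack(leaked_hash("cat"), 3))  # cat
```

Six lowercase letters give 26^6 (about 309 million) candidates, which is still well within reach of an offline brute-force search; the fix is to validate on the server and never send the answer or its hash to the client.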
    • 36. Break : Character Recognition
      • Programs that have the following functions:
        • Extraction of the image from the web page
        • Removal of background clutter, for example with color filters and detection of thin lines;
        • Segmentation, i.e. splitting the image into regions each containing a single letter;
        • Identifying the letter for each region.
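The clutter-removal and segmentation steps can be illustrated with the simplest possible technique: threshold the image to keep dark "ink" pixels, then split it at columns that contain no ink (a vertical projection). This is a toy sketch on a grayscale image given as nested lists; `binarize`, `segment_columns` and the threshold of 128 are assumptions, not the method of any specific attack.

```python
def binarize(gray: list[list[int]], threshold: int = 128) -> list[list[int]]:
    """Crudest clutter removal: keep only dark pixels (1 = ink, 0 = background)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def segment_columns(image: list[list[int]]) -> list[tuple[int, int]]:
    """Split a binary image into [start, end) column ranges separated by ink-free columns."""
    width = len(image[0])
    ink = [any(row[x] for row in image) for x in range(width)]
    regions: list[tuple[int, int]] = []
    start = None
    for x, has_ink in enumerate(ink):
        if has_ink and start is None:
            start = x  # entering a character
        elif not has_ink and start is not None:
            regions.append((start, x))  # leaving a character
            start = None
    if start is not None:
        regions.append((start, width))
    return regions


gray = [[0, 0, 255, 255, 0, 255],
        [0, 255, 255, 0, 0, 255]]
print(segment_columns(binarize(gray)))  # [(0, 2), (3, 5)]
```

This also shows why crowding symbols together (slide 16) works: when touching characters leave no empty column, the whole word comes back as a single region.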
    • 37. Attacks – EZ-Gimpy 2000
      • Yahoo's early CAPTCHA, called “EZ-Gimpy”;
      • The program picks a word from a dictionary and produces a distorted, noisy image of the word;
      • Algorithm for breaking EZ-Gimpy (92% success):
        • Locate possible letters at various locations;
        • Construct graph of consistent letters;
        • Look for plausible words in the graph.
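Steps two and three of this algorithm can be sketched as a path search: candidate letters are nodes, two candidates are "consistent" when the second starts at or after the point where the first ends, and a dictionary word is plausible if some left-to-right chain of consistent candidates spells it. This is a simplified illustration, assuming a hypothetical step-one detector that outputs `(x_start, x_end, letter)` tuples; a real attack would also have to rank competing words, which this sketch does not do.

```python
# (x_start, x_end, letter): output of a hypothetical step-one letter detector
Candidate = tuple[int, int, str]


def plausible_words(candidates: list[Candidate], dictionary: list[str]) -> list[str]:
    """Keep dictionary words spelled by a left-to-right chain of consistent candidates."""

    def spells(word: str, i: int, min_x: int) -> bool:
        if i == len(word):
            return True
        # Follow an edge to any candidate for the next letter that starts after min_x.
        return any(
            letter == word[i] and start >= min_x and spells(word, i + 1, end)
            for start, end, letter in candidates
        )

    return [w for w in dictionary if spells(w, 0, 0)]


cands = [(0, 5, "c"), (0, 5, "o"), (6, 10, "a"), (6, 10, "o"), (11, 15, "t"), (11, 15, "l")]
print(plausible_words(cands, ["cat", "col", "dog", "cot"]))  # ['cat', 'col', 'cot']
```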
    • 38. Attacks – EZ-Gimpy 2000
      • Figure: EZ-Gimpy image, possible letters, graph of letters, plausible words
    • 39. Attacks – SimpleOCR Engine 2002
    • 40. Attacks – Jan/Feb 2008
      • Reported success rates of automated attacks:
        • Google (Jan 17): 20%
        • Hotmail (Feb 6): 30-35%
        • Yahoo (Feb 22): 30-35%
    • 41. Attacks – Projects
      • Several CAPTCHA-breaking projects:
        • http://libcaca.zoy.org/wiki/PWNtcha
        • http://www.lafdc.com/captcha/
    • 42. Break: Human solvers
      • Attacks that use humans to solve the puzzles;
      • Approaches:
        • relaying the puzzles to a group of human operators who can solve CAPTCHAs;
        • copying the CAPTCHA images and using them as CAPTCHAs for a high-traffic site owned by the attacker.
    • 43. CAPTCHA Sweatshops
      • A computer fills out a form and when it reaches a CAPTCHA, it gives it to the operator to solve.
      • Weakness of Asirra: if the database of cat and dog photos can be downloaded, then paying workers $0.01 to classify each photo means that almost the entire database can be deciphered for $30,000.
      • Once an IP address has misclassified a challenge, a human only needs to solve two Asirras in a row from the same browser session.
    • 44. CAPTCHA Sweatshops
      • Not economically viable:
      • A typical spam run of 1 million messages per day would cost $14,000 per day and require 116 people working 24/7.
      • $2.50/h for each human; 720 CAPTCHAs per hour per human; 1/3 cent per account.
    • 45. Porn Companies (October 2007)
      • They write a program that fills out the entire registration form (e.g. Yahoo's);
      • When the program gets to the CAPTCHA, it can't solve it,
      • so it copies the CAPTCHA to the porn page;
      • A visitor gets a screen saying that, to see the next picture, they have to type the word shown in the CAPTCHA.
    • 46. Porn Companies (October 2007)
    • 47. Next CAPTCHA Generation
      • CAPTCHAs can be made stronger, but they are already too advanced for a large percentage of Internet users;
      • CAPTCHA devolves from a simple human reading test into an intelligence test or an acuity test.
    • 48. reCAPTCHA (2007)
      • New form of CAPTCHA that also helps digitize books;
      • The words displayed to the user come directly from old books that are being digitized:
      • words that OCR software could not identify.
    • 49. reCAPTCHA
      • Pairs an unknown word with a known one;
      • Distorts them both, puts a line through them and then sends them to users to be proofread;
      • The respondent answers both:
        • half of the effort validates the challenge;
        • the other half is captured as work.
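The two halves of a reCAPTCHA answer can be sketched as follows: the known (control) word decides whether the user passes, and only then is the reading of the unknown word recorded as a vote. The agreement threshold `votes_needed` and the class `WordPairCaptcha` are hypothetical; this is not reCAPTCHA's actual acceptance rule.

```python
from collections import Counter, defaultdict


class WordPairCaptcha:
    """Grade a (known word, unknown word) pair and collect readings of the unknown word."""

    def __init__(self, known: dict[str, str], votes_needed: int = 3) -> None:
        self.known = known                   # image id -> verified text of control word
        self.votes = defaultdict(Counter)    # unknown image id -> readings seen so far
        self.votes_needed = votes_needed     # hypothetical agreement threshold
        self.digitized: dict[str, str] = {}  # unknown image id -> accepted reading

    def grade(self, control_id: str, control_answer: str,
              unknown_id: str, unknown_answer: str) -> bool:
        if control_answer.strip().lower() != self.known[control_id].lower():
            return False  # failed the known word: reject and ignore the other answer
        reading = unknown_answer.strip().lower()
        self.votes[unknown_id][reading] += 1  # the half "captured as work"
        if self.votes[unknown_id][reading] >= self.votes_needed:
            self.digitized[unknown_id] = reading
        return True


captcha = WordPairCaptcha({"k1": "morning"}, votes_needed=2)
captcha.grade("k1", "morning", "u1", "castle")
captcha.grade("k1", "Morning", "u1", "Castle")
print(captcha.digitized)  # {'u1': 'castle'}
```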
    • 50. reCAPTCHA
    • 51. Time spent
      • Roughly 60 million CAPTCHAs are solved each day;
      • About 10 seconds on average to solve a CAPTCHA;
      • People around the world waste more than 150,000 hours a day solving CAPTCHAs;
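The daily total follows directly from the two figures above; a quick check using only the slide's numbers:

```python
CAPTCHAS_PER_DAY = 60_000_000  # "roughly 60 million" per day
SECONDS_PER_CAPTCHA = 10       # about 10 seconds each

hours_per_day = CAPTCHAS_PER_DAY * SECONDS_PER_CAPTCHA / 3600
print(round(hours_per_day))  # 166667
```

That is consistent with "more than 150,000 hours"; a fifth of the rounded 150,000-hour figure gives the 30,000 daily man-hours on the next slide.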
    • 52. Time spent
      • A fifth of those users would give 30,000 man-hours of work daily;
      • Together they would constitute the world's fastest and most accurate character-recognition computer, processing 10 million words a day.
      • Recreating the books, word by word
    • 53. Time spent
      • 9 billion human-hours of Solitaire were played in 2003
      • Empire State Building: 7 million human-hours (6.8 hours of Solitaire)
      • Panama Canal: 20 million human-hours (less than a day of Solitaire)
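The two comparisons can be checked from the 9 billion figure alone, assuming Solitaire play is spread evenly over the 365 days of 2003:

```python
SOLITAIRE_HOURS_2003 = 9e9

per_hour = SOLITAIRE_HOURS_2003 / (365 * 24)  # human-hours of Solitaire per wall-clock hour
per_day = SOLITAIRE_HOURS_2003 / 365          # human-hours of Solitaire per day

print(round(7e6 / per_hour, 1))  # 6.8  (Empire State Building, in hours of Solitaire)
print(round(20e6 / per_day, 2))  # 0.81 (Panama Canal, in days of Solitaire)
```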
    • 54. Wasted human cycles
      • If the world's computer Solitaire players could be coaxed into enjoying a game that contributed to solving a computing problem, von Ahn calculates, it would produce billions of man-hours of labor each year.
      • “Make all of humanity more efficient by exploiting the human cycles that get wasted”
    • 55. Wasted human cycles
      • People will contribute their brainpower, but only if they're given an enjoyable, time-killing experience in exchange.
      • Most projects that harness human processing power rely on a different motivator: money.
      • Which produces better results: a small group of experts or a huge mob of amateurs?
    • 56. Human Computation
      • Things that we humans can do and computers cannot, like:
        • labeling images with words;
        • picking out a voice in a noisy room;
      • Humans have trouble remembering long, random strings of characters, yet they excel at remembering faces and objects.
    • 57. Symbiotic relationship
      • One in which humans solve some problems and computers solve others;
      • Image search: a method by which every image on the Web could be given an accurate textual description;
    • 58. The ESP Game
      • Two-player online game;
      • Partners don’t know each other and can’t communicate;
      • Object of the game: type the same word;
      • The only thing they have in common is an image;
    • 59. The ESP Game
      • Player 1 guesses: CAR, HAT, KID
      • Player 2 guesses: BOY, CAR
      • Success! You agree on CAR
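The matching rule of a round can be sketched in a few lines: the round ends as soon as one player types a word the partner has already typed, and that word becomes a label for the image. The interleaving of the two players' guesses below is an assumption (the slide lists each player's guesses separately), and `esp_round` is a hypothetical name.

```python
from typing import Iterable, Optional


def esp_round(events: Iterable[tuple[int, str]]) -> Optional[str]:
    """Return the first word typed by one player that the partner already typed."""
    guessed: dict[int, set[str]] = {1: set(), 2: set()}
    for player, word in events:
        word = word.strip().lower()
        partner = 2 if player == 1 else 1
        if word in guessed[partner]:
            return word  # agreement: this becomes a label for the image
        guessed[player].add(word)
    return None  # no agreement in this round


# The slide's guesses; the order in which the two players typed them is assumed.
timeline = [(1, "CAR"), (1, "HAT"), (2, "BOY"), (1, "KID"), (2, "CAR")]
print(esp_round(timeline))  # car
```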
    • 60. The ESP Game
      • The ESP Game has been licensed (2006) by Google in the form of the Google Image Labeler, and is used to improve the accuracy of Google Image Search.
      • “5000 people playing simultaneously can label all images on Google in 30 days!”
    • 61. http://gwap.com/gwap/
    • 62. http://gwap.com/gwap/: ESP, Tag a Tune, Matchin
    • 63. http://gwap.com/gwap/: Squigl, Verbosity
    • 64. Future Games
      • Language translation. A game could challenge two players who don’t speak the same language to translate text from one language to the other.
      • Monitoring of security cameras. Players could monitor security cameras and alert authorities about suspected illegal activity.
    • 65. Future Games
      • Improving Web search. People have varying degrees of skill at searching for information on the Web. A game could be designed in which the players perform searches for other people.
      • Text summarization. Imagine a game in which people summarize important documents for the rest of the world.
    • 66. Still not thinking big enough
      • "If we have that many people all doing some little part, we could do something insanely huge for humanity."
      • "We'll never run out of things to digitize"
    • 67.  
    • 72.
      • Thank you!