DevilTyper: A Game for CAPTCHA Usability Evaluation


CAPTCHA is an effective and widely used solution for preventing computer programs (i.e., bots) from performing automated but often malicious actions, such as registering thousands of free email accounts or posting advertisements on blogs. To make CAPTCHAs robust against automatic character recognition techniques, the text in the tests is often distorted, blurred, and obscured. At the same time, such robust tests may prevent genuine users from reading the text easily, and thus distribute the cost of crime prevention among all users. We therefore face a dilemma: a CAPTCHA should be robust enough that programs cannot break it, yet easy enough that users need not retake tests repeatedly because of wrong guesses.
In this article, we attempt to resolve the dilemma by proposing a human computation game for quantifying the usability of CAPTCHAs. In our game, DevilTyper, players try to defeat as many devils as possible by solving CAPTCHAs, and each player's behavior in completing a CAPTCHA is recorded at the same time. We can therefore evaluate CAPTCHA usability by analyzing the collected player inputs. Because DevilTyper is entertaining in itself, we can conduct a large-scale study of CAPTCHA usability without the resource overhead required by traditional survey-based studies. In addition, we propose a consistent and reliable metric for assessing usability. Our evaluation results show that DevilTyper provides a fun and efficient platform for CAPTCHA designers to assess the usability of their CAPTCHAs and thus improve CAPTCHA design.
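
A minimal sketch of how logged player behavior could be turned into per-scheme usability figures. The record layout (`Attempt`), the outcome labels, and the function names are assumptions made for illustration, not DevilTyper's actual data format. The behaviors themselves (finish time, typing errors, give-ups, timeouts) and the 0-to-1 normalization come from the slides.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Attempt:
    scheme: str      # CAPTCHA scheme name, e.g. "SecurImage"
    seconds: float   # time the player spent on this CAPTCHA
    outcome: str     # assumed labels: "solved", "wrong", "gave_up", "timeout"


def summarize(attempts):
    """Aggregate per-scheme rates and mean finish time from raw attempts."""
    by_scheme = defaultdict(list)
    for attempt in attempts:
        by_scheme[attempt.scheme].append(attempt)

    summary = {}
    for scheme, rows in by_scheme.items():
        total = len(rows)
        solved_times = [r.seconds for r in rows if r.outcome == "solved"]

        def rate(label):
            # Fraction of this scheme's attempts that ended with `label`.
            return sum(r.outcome == label for r in rows) / total

        summary[scheme] = {
            "error_rate": rate("wrong"),
            "give_up_rate": rate("gave_up"),
            "timeout_rate": rate("timeout"),
            # Finish time is only meaningful for solved attempts.
            "mean_finish_time": mean(solved_times) if solved_times else None,
        }
    return summary


def min_max(values):
    """Rescale a {scheme: value} mapping to the range 0 to 1."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {key: 0.0 for key in values}
    return {key: (value - lo) / (hi - lo) for key, value in values.items()}
```

Normalizing each metric separately puts quantities with different units, such as seconds and error rates, on one scale so that schemes can be compared across metrics.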

  • To ensure that the response is not generated by a computer
  • The common procedures to generate such images often include distortion, overlapping, clipping, and noise addition. These procedures are performed to make image recognition algorithms unable to resolve the text in the images. However, the distortion of the text should be controlled to a reasonable level so that humans can still read the text clearly.
  • The most intuitive way to assess the usability of CAPTCHAs is to ask numerous human subjects to solve assigned CAPTCHAs repeatedly. However, such surveys are cost-prohibitive if a large-scale study is required and the investigated CAPTCHAs are constantly updated. For example, investigating how different background noises affect user perception would require a large number of user inputs, which in turn requires significant monetary investment in user studies.
  • DevilTyper provides an open platform for evaluating CAPTCHA usability
  • (a) AuthImage (b) Captcher (c) Kiranvj (d) SecurImage (e) Plain Text (f) CoolCAPTCHA (g) TgCAPTCHA
  • Character distance stands for the distance between characters. In our experiment, we randomly set the character distance between 0.8 and 1.3, where a larger value corresponds to a tighter character arrangement. X-axis wave controls the degree of sine-wave distortion of characters along the x-axis. In the experiment, this parameter is randomly set within the range from 0.5 to 1.2, where a larger magnitude corresponds to stronger distortion. The x-axis wave distortion has no systematic influence on users' error rate, which implies that this type of distortion does not harm the CAPTCHA's usability. Y-axis wave controls the degree of sine-wave distortion of characters along the y-axis, which we set within the range from 0.5 to 1.2 in our experiments. The y-axis distortions have a much more significant impact on CAPTCHA usability than x-axis distortions. Therefore, CAPTCHA designers should be careful in choosing the appropriate degree of this type of distortion when adopting such CAPTCHAs in real use.
  • We use TgCAPTCHA, which is similar to the previous Microsoft CAPTCHA scheme, to demonstrate how such analysis is done using the traces produced by DevilTyper.
    Long Arcs: The long arcs parameter controls the number of long arcs overlaid on the image, where the position, length, and curvature of the arcs are randomly chosen. In the experiment, we set this parameter between 0 and 5. We can see that the long arcs do not influence the usability of the CAPTCHAs significantly, even when 5 long arcs were added.
    Short Arcs: Similar to long arcs, the short arcs parameter controls the number of short arcs overlaid on the image. In our experiment, the number of short arcs is randomly drawn from the range 0 to 20. Interestingly, while long arcs do not impact the CAPTCHA's usability, short arcs do, as shown in Figure 13(b). We believe this is because the length of short arcs is similar to that of the character strokes, so short arcs are more likely to interfere with the distorted text and increase the difficulty of text recognition.
    Short Lines: The short lines parameter controls the number of short lines overlaid on the rendered CAPTCHA. As with long and short arcs, the position, length, and direction of each segment are randomly decided. Our results show that users' average error rates increase slightly but steadily with more short lines, as shown in Figure 13(c). However, the impact of short lines is slightly less than that of short arcs, which is reasonable because arcs are more like the strokes of distorted text and therefore interfere more with readers' recognition.
  • Each CAPTCHA scheme has its own obscuration algorithm to distort the text, which may have different impacts on the recognition difficulty of different characters. We believe such results provide helpful information when designing and applying CAPTCHAs. One obvious application is that, if a user correctly solves all the characters except a 'C' character with the SecurImage scheme, we may allow the user to pass the test, as the 'C' character is really difficult to recognize with that scheme.
  • DevilTyper: A Game for CAPTCHA Usability Evaluation
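
The relaxed acceptance rule suggested in the last note (forgiving a mistake on a character known to be hard in a given scheme) might be sketched as follows. The function name, the `max_forgiven` limit, and the case-insensitive comparison are assumptions made for illustration. Only the 'C' in SecurImage example comes from the notes.

```python
def relaxed_match(expected, typed, hard_chars, max_forgiven=1):
    """Accept an answer whose only mistakes fall on known-hard characters.

    expected     -- the text rendered in the CAPTCHA
    typed        -- the text the user entered
    hard_chars   -- characters considered hard to read in this scheme
    max_forgiven -- assumed cap on how many hard characters may be wrong
    """
    if len(expected) != len(typed):
        return False

    forgiven = 0
    for want, got in zip(expected.upper(), typed.upper()):
        if want == got:
            continue
        if want not in hard_chars:
            return False  # a mistake on an ordinary character always fails
        forgiven += 1
        if forgiven > max_forgiven:
            return False
    return True


# Hypothetical per-scheme table; only the SecurImage entry is from the notes.
HARD_CHARS = {"SecurImage": {"C"}}
```

For example, `relaxed_match("ABCD", "ABXD", HARD_CHARS["SecurImage"])` accepts the answer, while a mistake on 'B' is still rejected. Capping the number of forgiven characters keeps the relaxation from weakening the test more than intended.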

    1. Chien-Ju Ho (1), Chen-Chi Wu (2), Kuan-Ta Chen (1), Chin-Kuang Lai (2). Presenter: Derec Wu (1). (1) Institute of Information Science, Academia Sinica; (2) Department of Electrical Engineering, National Taiwan University
    2. • Acronym for Completely Automated Public Turing test to tell Computers and Humans Apart
       • Challenge-response test
       • Requires users to type letters or digits from a distorted image to distinguish humans from computers
    3. • CAPTCHA tests must be
       • Secure
         ▪ Hard for computers
         ▪ Prevent computer programs from performing automated malicious tasks
       • Usable
         ▪ Easy for human beings?
    4. Human Usability vs. Computational Challenges
    5. • Determine the difficulty of the CAPTCHA test for human beings
       • Traditional approach: human survey
         ▪ Costs a lot of money
         ▪ Difficult to scale up
    6. • A human computation game for CAPTCHA usability evaluation
       • Players are engaged to solve the problem for us while having fun themselves
       • Lower monetary cost and easier to scale up
    7. • Overview
       • CAPTCHA
       • Why DevilTyper
       • DevilTyper Design
       • Experiment
       • Experiment setup
       • Results
       • Conclusion
    8. • Each devil is attached to a CAPTCHA test
       • Players are required to solve the test correctly to win the game
       • Player behaviors are recorded and used to evaluate the CAPTCHA usability
    9. • Players must solve the CAPTCHA before the devil from the top reaches the bottom
       • Get scores by solving CAPTCHAs
       • Lose HP if the devil reaches the bottom
    10. • High score lists are maintained to encourage players to play more
    11.
    12. • The following player behaviors for solving each CAPTCHA test are collected
        • Finish time
        • Rate of typing error
        • Rate of giving up the test
        • Rate of repeated typing
        • Rate of failing to solve the test within the time limit
    13. • Overview
        • CAPTCHA
        • Why DevilTyper
        • DevilTyper Design
        • Experiment
        • Experiment setup
        • Results
        • Conclusion
    14. • We announced the game on PTT, a popular social network, and held a four-week campaign
        • Total cost: US$30
        • Total number of games played: 6,500
        • Total CAPTCHAs solved: 1,407,055
    15. CAPTCHA Types
    16. • The results of different metrics are consistent
        * A-F: different types of CAPTCHAs
        * The results are normalized to the range 0 to 1 for comparison
    17. • The DevilTyper results are consistent with the traditional survey method (Mechanical Turk)
        * A-F: different types of CAPTCHAs
        * The results are normalized to the range 0 to 1 for comparison
        DevilTyper provides an open platform for evaluating CAPTCHA usability
    18. Design factors analysis using DevilTyper
    19. A: B: C: D: E: F: G: Plain Text
    20. • Three strategies for text distortion in CoolCAPTCHA
        • Character Distance
        • X-Axis Wave
        • Y-Axis Wave
    21. • Three strategies for noise addition in TgCAPTCHA
        • Long Arcs Noise
        • Short Arcs Noise
        • Short Line Noise
    22. • The difficulty of recognizing each character in different CAPTCHA types can be determined
        • "i" is hardly recognizable in TgCAPTCHA
        • "i" is easier to recognize in CoolCAPTCHA
        Q VCT
    23. • We proposed a human computation game, DevilTyper, for evaluating CAPTCHA usability
        • Monetary cost is much lower than traditional surveys
        • Evaluation is easier to scale up
        • We show how this open platform can help CAPTCHA designers design more user-friendly CAPTCHAs
    24. Thank You
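
The design-factor analysis on the CoolCAPTCHA and TgCAPTCHA slides amounts to grouping attempts by the value of one randomly drawn parameter and comparing error rates across groups. A minimal sketch of that grouping follows, assuming each logged attempt carries the parameter value and an error flag. The function names and record layout are hypothetical, and any data passed in is the caller's own rather than the study's results.

```python
from collections import defaultdict


def bin_index(value, lo, hi, bins):
    """Map a continuous parameter value in [lo, hi] to a bin number 0..bins-1."""
    position = (value - lo) / (hi - lo) * bins
    return min(int(position), bins - 1)  # the upper bound falls in the last bin


def error_rate_by_level(records):
    """Compute the error rate for each parameter level.

    records -- iterable of (level, is_error) pairs, where level is either a
               count (e.g. number of short arcs) or a bin index.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for level, is_error in records:
        totals[level] += 1
        errors[level] += bool(is_error)
    return {level: errors[level] / totals[level] for level in sorted(totals)}
```

Integer parameters such as the number of short arcs (0 to 20) can be used as levels directly. Continuous ones, such as the y-axis wave (0.5 to 1.2), would first go through `bin_index`. An error rate that rises steadily across levels suggests the parameter affects usability, while a flat profile suggests it does not.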