Decaptcha

Breaking Text Based Captchas

Published in: Education

Transcript

  • 1. Overview: Introduction, Corpus, Classifiers, Segmentation, Design Principles, Decaptcha, Future Work, Conclusion, References
  • 2. Strengths and Weaknesses of Captcha. Introduction: What is Captcha?
      “A CAPTCHA (an acronym for "Completely Automated Public Turing test to tell Computers and Humans Apart") is a type of challenge-response test used in computing to determine whether or not the user is human.” – Wikipedia
      “A CAPTCHA is a program that protects websites against bots by generating and grading tests that humans can pass but current computer programs cannot. For example, humans can read distorted text as the one shown in the next slide, but current computer programs can't:” – www.captcha.net
      O GUNESHWOR SINGH, PESIT, Dept. of CSE, 2013
  • 3. Introduction: Background
      Measuring attack effectiveness
        • Coverage is the fraction of Captchas that the solver attempts to answer.
        • Precision is the fraction of Captchas answered correctly.
      Attacking Captchas
        • Pre-processing
        • Segmentation
        • Classification
      The Captcha design goal is that “automatic scripts should not be more successful than 1 in 10,000” attempts (i.e. a precision of 0.01%).
      Fig: This CAPTCHA of "smwm" obscures its message from computer interpretation by twisting the letters and adding a background color gradient.
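The two metrics on the slide above can be sketched in a few lines of Python. This is only an illustration: the function name and the convention that `None` marks a Captcha the solver skipped are assumptions, and precision is computed over the whole test set, following the slide's wording.

```python
from typing import Optional, Sequence, Tuple


def coverage_and_precision(
    answers: Sequence[Optional[str]], truth: Sequence[str]
) -> Tuple[float, float]:
    """Return (coverage, precision) for a batch of solver answers.

    answers[i] is None when the solver declined to answer Captcha i
    (a convention assumed for this sketch).
    """
    if len(answers) != len(truth):
        raise ValueError("answers and truth must have the same length")
    total = len(truth)
    if total == 0:
        return 0.0, 0.0
    attempted = sum(a is not None for a in answers)
    correct = sum(a is not None and a == t for a, t in zip(answers, truth))
    # Both fractions are taken over the whole test set (assumption).
    return attempted / total, correct / total


cov, prec = coverage_and_precision(
    ["smwm", None, "abcd", "xyz"], ["smwm", "qqqq", "abcd", "xyy"]
)
```

Under the design goal quoted on the slide, a scheme would be considered safe only if `prec` stayed at or below 0.0001.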
  • 4. Corpus: Popular Real World Captchas
      The anti-recognition techniques considered are:
        1. Multi-fonts: using multiple fonts or font-faces.
        2. Charset: which charset the scheme uses.
        3. Font size: using variable font size.
        4. Distortion: distorting the Captcha globally using attractor fields.
        5. Blurring: blurring letters.
        6. Tilting: rotating characters with various angles.
        7. Waving: rotating the characters in a wave fashion.
  • 5. Corpus
      The anti-segmentation techniques considered are:
        1. Complex background: hiding the text in a complex background to "confuse" the solver.
        2. Lines: adding extra lines to prevent the solver from knowing which are the real character segments.
        3. Collapsing: removing the space between characters to prevent segmentation.
      Synthetic corpus
        • A synthetic corpus is generated and annotated using Amazon Mechanical Turk.
  • 6. Corpus
  • 7. Classifiers
      SVM (Support Vector Machines)
        • Known to almost always yield very good performance regardless of the problem.
        • Does better on distortion (61% vs. 50%).
      KNN (K Nearest Neighbors)
        • The fastest classifier, with stability properties that make it very reliable.
        • To remove the burden of setting the number of neighbors (K) by hand, a heuristic computes the optimal K value, which is often 1, by cross-validating on the training set.
        • Performs better with the mix of five complex fonts (62% vs. 59%).
  • 8. Classifiers
      Fig: Effectiveness of classifiers on various anti-recognition features. These graphs depict how fast each classifier's precision improves as more examples are added to the training set.
  • 9. Segmentation: Background Blending
      Techniques that try to prevent segmentation by “blending” the Captcha text with the background.
      Complex background
        • The idea behind a complex background is that the lines and shapes “inside it” will be confused with the real text and thus prevent the breaker from isolating and segmenting the Captcha.
        • Anti-pattern: for each possible font color, remove everything from the Captcha that is not close to this color and test whether you get a reasonable number of clusters (letters) with the right amount of pixels.
  • 10. Segmentation: Background Blending
      Color similarity
        • Uses colors that humans perceive as very different but that are in reality very close in the RGB spectrum.
        • To break it, binarize the Captcha using a threshold based on the hue or the saturation.
      Fig: Example of the Skyrock pipeline
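Thresholding on saturation, as described on the slide above, might look like the following minimal sketch. It assumes the image is a list of rows of 8-bit `(r, g, b)` tuples and that the text is more saturated than the background; both the function name and the 0.5 threshold are illustrative choices, not values from the slides.

```python
import colorsys


def binarize_by_saturation(pixels, threshold=0.5):
    """Map each RGB pixel to 1 (kept as text) or 0 (dropped as background).

    pixels: list of rows, each row a list of (r, g, b) tuples in 0..255.
    threshold: hypothetical saturation cut-off in 0..1.
    """
    binary = []
    for row in pixels:
        out_row = []
        for r, g, b in row:
            # colorsys works on floats in 0..1 and returns (h, s, v).
            _, saturation, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            out_row.append(1 if saturation >= threshold else 0)
        binary.append(out_row)
    return binary


sample = [[(255, 0, 0), (200, 200, 200), (255, 200, 200)]]
mask = binarize_by_saturation(sample)
```

A hue-based variant would compare the first component of the returned tuple against a target hue range instead.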
  • 11. Segmentation: Background Blending
      Noise
        • The most efficient technique for confusing segmentation is to add random noise to the image.
        • MRF (Markov Random Field), also known as the Gibbs algorithm: an iterative algorithm that computes the energy of each pixel based on its surroundings and removes pixels whose energy is below a certain threshold.
      Fig: Example of the Captcha.net pipeline
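The iterative idea on the slide above can be illustrated with a deliberately simplified stand-in: here the "energy" of a set pixel is just the number of set pixels among its eight neighbors. This is an assumption made for the sketch, not the energy function of a full Markov Random Field, and the names and thresholds are hypothetical.

```python
def remove_noise(grid, min_energy=2, max_iters=10):
    """Iteratively clear set pixels whose neighborhood energy is too low.

    grid: binary image as a list of rows of 0/1 values.
    Energy is the count of set 8-neighbors (simplification of the MRF energy).
    """
    height = len(grid)
    width = len(grid[0]) if height else 0
    current = [row[:] for row in grid]
    for _ in range(max_iters):
        updated = [row[:] for row in current]
        changed = False
        for y in range(height):
            for x in range(width):
                if not current[y][x]:
                    continue
                energy = sum(
                    current[ny][nx]
                    for ny in range(max(0, y - 1), min(height, y + 2))
                    for nx in range(max(0, x - 1), min(width, x + 2))
                    if (ny, nx) != (y, x)
                )
                if energy < min_energy:
                    updated[y][x] = 0
                    changed = True
        current = updated
        if not changed:
            break  # converged: no pixel was removed in this pass
    return current


noisy = [
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
]
cleaned = remove_noise(noisy)
```

The isolated pixel in the top-left corner is cleared, while the dense 3×3 block survives.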
  • 12. Segmentation: Using Lines
      Another approach to prevent segmentation is to use lines that cross multiple characters. This approach is used by Digg and Slashdot, for instance.
      Small lines
        • Small lines are drawn across the characters to prevent the Captcha from being segmented. This is the strategy used by Digg.
        • Histogram-based segmentation: the regions where the characters are have a denser pixel count and therefore create peaks in the histogram, which can be used to locate the segments.
      Fig: Example of the Digg pipeline
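A minimal sketch of histogram-based segmentation, assuming a binarized image stored as rows of 0/1 values and a vertical projection (one count per column). The `min_density` cut-off is a hypothetical parameter: columns crossed only by a thin line fall below it and are treated as gaps.

```python
def histogram_segments(grid, min_density=2):
    """Return (start, end) column ranges whose pixel count forms a peak.

    grid: binary image as a list of rows of 0/1 values.
    Ranges are half-open: columns start..end-1 belong to the segment.
    """
    if not grid:
        return []
    width = len(grid[0])
    # Vertical projection: number of set pixels in each column.
    histogram = [sum(row[x] for row in grid) for x in range(width)]
    segments = []
    start = None
    for x, count in enumerate(histogram):
        dense = count >= min_density
        if dense and start is None:
            start = x
        elif not dense and start is not None:
            segments.append((start, x))
            start = None
    if start is not None:
        segments.append((start, width))
    return segments


image = [
    [1, 1, 0, 0, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 1],  # a thin line crossing both characters
    [1, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
cuts = histogram_segments(image)
```

Here the crossing line contributes only one pixel per column, so the two character blocks are still separated.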
  • 13. Segmentation: Using Lines
      Big lines
        • Uses lines that have the same “width” as the character segments.
        • Susceptible to line-finding algorithms, such as Canny edge detection and the Hough Transform, because the lines cross the entire Captcha.
      Fig: Example of the Slashdot pipeline
  • 14. Segmentation: Collapsing
      Collapsing is considered by far the most secure anti-segmentation technique. Two cases arise: one where the attacker can exploit a design flaw to predict the characters’ segmentation despite the collapsing, and one where there is no flaw and the attacker is forced to “brute force” the Captcha.
      Predictable collapsing
        • An attacker can make an educated guess about where the cuts are likely to occur if the width of the letters is too regular and/or the number of letters is known in advance.
      Fig: Example of the eBay pipeline
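The "educated guess" for predictable collapsing can be sketched as evenly spaced cuts, assuming the number of letters is known and all letters have roughly the same width. The function name and the half-open column convention are assumptions of this example.

```python
def equal_width_cuts(left, right, n_letters):
    """Guess letter boundaries for a collapsed Captcha.

    left, right: first column and one-past-last column of the text block.
    n_letters: number of letters, assumed known in advance.
    Returns a list of (start, end) half-open column ranges.
    """
    if n_letters <= 0 or right <= left:
        raise ValueError("need a non-empty block and at least one letter")
    width = (right - left) / n_letters
    bounds = [left + round(i * width) for i in range(n_letters + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


guesses = equal_width_cuts(10, 70, 6)
```

Randomizing the Captcha length and character size, as the design principles later recommend, is what defeats this kind of guess.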
  • 15. Segmentation: Collapsing
      Unpredictable collapsing
        • When the number of characters is unknown and the average size of each character is unpredictable, the only option is to try to recognize each letter of the Captcha directly without segmenting it.
        • One solution might be to train on character templates segmented by hand and then use a space displacement neural network to recognize the characters without segmenting first.
  • 16. Design Principles for a Secure Captcha
      Core feature principles
        1. Randomize the Captcha length
        2. Randomize the character size
        3. Wave the Captcha
      Anti-recognition
        1. Use anti-recognition techniques as a means of strengthening Captcha security
        2. Don’t use a complex charset
      Anti-segmentation
        1. Use collapsing or lines
        2. Create alternative schemes
  • 17. Decaptcha: Pipeline
      Decaptcha uses a five-stage pipeline:
        1. Preprocessing: the background is removed using several algorithms and the Captcha is binarized.
        2. Segmentation: the Captcha is segmented using various techniques, the most common being CFS (Color Filling Segmentation), which uses a paint-bucket flood-filling algorithm.
        3. Post-segmentation: the segments’ sizes are always normalized for easier recognition.
        4. Recognition: in training mode, this stage teaches the classifier what each letter looks like after the Captcha has been segmented. In testing mode, the classifier is used in predictive mode to recognize each character.
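The flood-filling idea behind CFS can be sketched as connected-component labeling on a binarized image. This is an illustrative version only: the 4-connectivity, the breadth-first fill, and the left-to-right ordering are assumptions, not details taken from Decaptcha itself.

```python
from collections import deque


def color_filling_segments(grid):
    """Group set pixels into connected segments with a flood fill.

    grid: binary image as a list of rows of 0/1 values.
    Returns a list of segments, each a list of (row, col) pixels,
    ordered by their leftmost column.
    """
    height = len(grid)
    width = len(grid[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    segments = []
    for y in range(height):
        for x in range(width):
            if not grid[y][x] or seen[y][x]:
                continue
            # "Paint bucket": fill everything reachable from this pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < height and 0 <= nx < width \
                            and grid[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            segments.append(pixels)
    segments.sort(key=lambda seg: min(col for _, col in seg))
    return segments


letters = color_filling_segments([
    [1, 0, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 1],
])
```

Because the fill follows touching pixels, this approach only separates characters that do not overlap, which is why collapsing is such an effective defense.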
  • 18. Decaptcha: Pipeline
        5. Post-processing: the classifier’s output is improved when possible. For instance, spell checking is performed on the classifier’s output for Slashdot. Using spell checking increases the precision on Slashdot from 24% to 35%.
      Fig: Pipeline flow: Captcha → Preprocessing → image matrix → Segmentation → segment matrices → Post-segmentation → segment matrices → Recognition → potential answer → Post-processing → final answer (e.g. k356fs)
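A spell-checking post-processing step could be approximated with Python's standard `difflib`, as in the sketch below. The word list, the function name, and the 0.6 similarity cut-off are hypothetical; the slides do not say which dictionary or matching method Decaptcha uses.

```python
import difflib


def spell_correct(guess, dictionary, cutoff=0.6):
    """Replace the classifier's guess with the closest dictionary word.

    Returns the guess unchanged when no word is similar enough.
    """
    matches = difflib.get_close_matches(guess, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else guess


words = ["apple", "orange", "banana"]  # stand-in word list for illustration
fixed = spell_correct("appie", words)
unchanged = spell_correct("k356fs", words)
```

This only helps for schemes whose answers are dictionary words; a random string such as the second example passes through untouched.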
  • 19. Future Relevant Work
      Recognition algorithms
        • KNN and SVM
        • Neural networks
      Machine vision algorithms
        • Canny detection
        • Hough Transform
        • Markov Random Field (Gibbs)
        • SIFT and SURF
      Captcha
        • How efficient statistical classifiers are at recognizing Captcha characters
        • Other forms of Captcha: Gotcha
  • 20. Conclusion
        • Evaluated various automated methods on real-world Captchas and on synthetically generated Captchas that use anti-segmentation and anti-recognition techniques.
        • Measured the efficiency of the tool Decaptcha against real Captchas from Authorize, Baidu, Blizzard, Captcha.net, CNN, Digg, eBay, Google, Megaupload, NIH, Recaptcha, Reddit, Skyrock, Slashdot, and Wikipedia. On these 15 Captchas, it had a 1%-10% success rate on two (Baidu, Skyrock), 10%-24% on two (CNN, Digg), 25%-49% on four (eBay, Reddit, Slashdot, Wikipedia), and 50% or greater on five (Authorize, Blizzard, Captcha.net, Megaupload, NIH).
        • Only Google and Recaptcha resisted Decaptcha’s attack attempts.
        • Led to a series of recommendations for Captcha designers as well as for hackers.
  • 21. References
      [1] E. Bursztein, M. Martin, and J. Mitchell. Text-based CAPTCHA strengths and weaknesses. In Proceedings of the 18th ACM Conference on Computer and Communications Security, 2011.
      [2] E. Bursztein and S. Bethard. DeCAPTCHA: breaking 75% of eBay audio CAPTCHAs. In Proceedings of the 3rd USENIX Workshop on Offensive Technologies, 2009.
      [3] Wikipedia. Flood fill algorithm. http://en.wikipedia.org/wiki/Flood_fill
      [4] Wikipedia. HSL and HSV color representation. http://en.wikipedia.org/wiki/HSL_and_HSV
      [5] E. Bursztein, S. Bethard, J. C. Mitchell, D. Jurafsky, and C. Fabry. How good are humans at solving Captchas? A large scale evaluation. In Security and Privacy, 2010.
      [6] C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.
  • 22. THANK YOU