The document discusses the evolution of CAPTCHAs from first generation distorted text to reCAPTCHAs that helped digitize books by using words humans could read but computers could not. It then discusses how NoCAPTCHA reCAPTCHA was developed to address issues like accessibility for those with disabilities. The document also summarizes a student project that used deep learning methods like CNNs and transfer learning to break single character CAPTCHAs with high accuracy, showing the need for more advanced CAPTCHAs.
Breaking CAPTCHAs using ML
1. Introduction Evolution Method
CPS 205 : Introduction to Cybersecurity
Breaking CAPTCHAs using ML
Jishnu Jaykumar P
jishnujayakumar.github.io
Robert Bosch Centre for Cyber-Physical Systems
Indian Institute of Science Bangalore
March 5, 2018
Jishnu Jaykumar P CPS205 : Breaking CAPTCHAs using ML
4. Introduction Evolution Method
Introduction
CAPTCHA stands for
Completely
Automated
Public
Turing Test to Tell
Computers and
Humans
Apart.
The term CAPTCHA was coined in 2003 by Luis von
Ahn, Manuel Blum, Nicholas Hopper and John
Langford of Carnegie Mellon University.
Find the paper here [1].
[1] CAPTCHA: Using Hard AI Problems for Security
6. Introduction Evolution Method
What is a CAPTCHA?
A CAPTCHA is a program that protects websites
against bots by generating and grading tests that
humans can pass but current computer programs cannot.
For example, humans can read distorted text like the one
shown below, but current computer programs can't:
Figure: Source - https://fakecaptcha.com
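The "generating and grading" loop in the definition above can be sketched roughly as follows. This is a minimal illustration, not how any real CAPTCHA service works: the names SECRET_KEY, generate_challenge and grade_response are hypothetical, and the step that renders the answer as a distorted image is left out.

```python
import hashlib
import hmac
import secrets
import string

# Hypothetical server-side secret; a real service would store this securely.
SECRET_KEY = secrets.token_bytes(32)


def generate_challenge(length=5):
    """Return (answer, token). The answer would be shown as a distorted image."""
    answer = "".join(secrets.choice(string.ascii_uppercase) for _ in range(length))
    # The token lets the server grade later without storing the answer.
    token = hmac.new(SECRET_KEY, answer.encode(), hashlib.sha256).hexdigest()
    return answer, token


def grade_response(response, token):
    """Return True if the typed response matches the challenge behind the token."""
    normalized = response.strip().upper()
    expected = hmac.new(SECRET_KEY, normalized.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, token)
```

A production system would also expire tokens and reject replays; the sketch only shows the generate/grade split.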
13. Introduction Evolution Method
CAPTCHAs have several applications for practical
security, including (but not limited to):
Preventing Comment Spam in Blogs.
Protecting Website Registration.
Protecting Email Addresses From Scrapers.
Online Polls (CMU-MIT bot race for best CS university
ranking, 1999).
Preventing Dictionary Attacks.
Search Engine Bots.
Worms and Spam.
15. Introduction Evolution Method
First Generation CAPTCHA
Distorted pieces of text that would help stop spam on
the internet.
They worked because humans could read the text but
computers and bots couldn't.
Figure: An example of First Gen CAPTCHA
19. Introduction Evolution Method
reCAPTCHA - Stop Spam, Read Books
Fast forward: millions of CAPTCHAs were being solved
daily.
So Luis von Ahn started to think: can we use this
brain power to do something useful?
The answer was yes, and that gave birth to
reCAPTCHA.
They decided to use this brain power to digitize every
single physical book that we have.
20. Introduction Evolution Method
reCAPTCHA - Stop Spam, Read Books
Figure: First take real physical books and scan them.
21. Introduction Evolution Method
reCAPTCHA - Stop Spam, Read Books
Figure: Some errors while converting scanned copies to digital
text.
24. Introduction Evolution Method
reCAPTCHA - Stop Spam, Read Books
The reCAPTCHA team added the words that the OCR
found difficult to decipher to the reCAPTCHA
database.
So now, instead of using distorted text, they started to
show words from books that computers couldn't understand.
25. Introduction Evolution Method
reCAPTCHA - Stop Spam, Read Books
Figure: When enough people on the internet solving these CAPTCHAs
wrote the same word for a piece of text shown, it would be uploaded
to the E-Books database.
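The agreement rule in the caption above could be sketched as a simple vote over the answers typed for one word image. The thresholds min_votes and min_share and the function name consensus_word are assumptions for illustration; the slides do not say how much agreement reCAPTCHA actually required.

```python
from collections import Counter


def consensus_word(transcriptions, min_votes=3, min_share=0.75):
    """Return the agreed transcription, or None if agreement is too weak.

    min_votes and min_share are hypothetical thresholds.
    """
    cleaned = [t.strip().lower() for t in transcriptions if t.strip()]
    if not cleaned:
        return None
    word, votes = Counter(cleaned).most_common(1)[0]
    if votes >= min_votes and votes / len(cleaned) >= min_share:
        return word
    return None
```

For example, three matching answers out of four would be accepted under these thresholds, while two conflicting answers would leave the word undecided.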
29. Introduction Evolution Method
reCAPTCHA - Stop Spam, Read Books
About 100 million reCAPTCHAs were being solved
every day.
Equivalent to 2.5 million books/year.
Hence, in 2009, Google acquired reCAPTCHA.
Google used the brain power to digitize all of the
New York Times article archives since 1851 and
Google Books.
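As a rough check on how the two figures above relate, the ratio they imply can be worked out directly. The slides do not state how many words a book contains or how many people must solve each word, so the result below is only the implied number of solved reCAPTCHAs per digitized book, not a statement about either quantity.

```python
SOLVED_PER_DAY = 100_000_000   # from the slide
BOOKS_PER_YEAR = 2_500_000     # from the slide

# Solved challenges per year, assuming the daily rate holds for 365 days.
solved_per_year = SOLVED_PER_DAY * 365

# Implied number of solved challenges spent on each digitized book.
solves_per_book = solved_per_year / BOOKS_PER_YEAR

print(f"{solved_per_year:,} solved per year")
print(f"{solves_per_book:,.0f} solved per book")
```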
30. Introduction Evolution Method
reCAPTCHA - Stop Spam, Read Books
Figure: When Google ran out of NYT articles and Google Books,
they started showing street numbers from Street View images, which helped
label Google Maps.
36. Introduction Evolution Method
reCAPTCHA - Stop Spam, Read Books
Seems like a good solution, right?
NO!!!
What about blind people?
What about people with dyslexia, poor eyesight or poor
hearing?
On the other hand, computer vision algorithms were
becoming powerful and were outperforming humans at
solving these problems (an example is shown towards the
end).
40. Introduction Evolution Method
NoCAPTCHA reCAPTCHA
Figure: When you click it, it sends a whole bunch of information
to Google.
41. Introduction Evolution Method
NoCAPTCHA reCAPTCHA
Figure: If the Google reCAPTCHA risk analysis engine is still confused,
then it pops up a task box. If you pass it, then chances are the
next time you click it, it will automatically allow you to pass without
the task box challenge.
42. Introduction Evolution Method
Overview
Final year project by Stanford students Nathan Zhao,
Yi Liu and Yijun Jiang, Autumn 2017. [2]
They proposed the following algorithms:
Single-letter CAPTCHA recognition.
Multi-CAPTCHA recognition algorithm.
[2] http://cs229.stanford.edu/proj2017/final-reports/5239112.pdf
47. Introduction Evolution Method
Dataset
PyCaptcha, a Python package for CAPTCHA generation,
was used to make a custom CAPTCHA image
dataset.
This package offers several degrees of freedom such as
font style, distortion and noise, which can be exploited
to increase the diversity of the data and the difficulty of
the recognition task.
Single-letter CAPTCHA images (40-by-60 pixels) were
created by feeding PyCaptcha with uppercase letters
ranging from A to Z from a restricted set of fonts.
The resulting images were labelled by the corresponding
letters.
This gave a supervised classification problem with
26 classes.
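To make the 26-class setup concrete, here is a small sketch of how the labels and inputs might be encoded. It does not use PyCaptcha; the helper names are hypothetical, and treating the image as 60 rows by 40 columns is an assumption, since the slides only say "40-by-60 pixels".

```python
import string

CLASSES = string.ascii_uppercase   # 'A'..'Z', one class per letter
WIDTH, HEIGHT = 40, 60             # image size from the slide; orientation assumed


def label_to_index(letter):
    """Map a letter label to its class index (A -> 0, ..., Z -> 25)."""
    return CLASSES.index(letter)


def one_hot(letter):
    """Encode a letter as a 26-element one-hot target vector."""
    vec = [0] * len(CLASSES)
    vec[label_to_index(letter)] = 1
    return vec


def flatten(image):
    """Flatten an image given as a list of pixel rows into one feature vector."""
    return [pixel for row in image for pixel in row]


# A blank placeholder image, only to show the feature-vector length.
blank = [[0] * WIDTH for _ in range(HEIGHT)]
features = flatten(blank)
chance_accuracy = 1 / len(CLASSES)   # accuracy of uniform random guessing
```

With balanced classes, random guessing would be right about 3.8% of the time, which is the baseline any classifier on this dataset has to beat.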
48. Introduction Evolution Method
Sample CAPTCHA generated by PyCaptcha
Figure: A typical CAPTCHA, which is a distorted image of the string
ADMD
50. Introduction Evolution Method
K-Means clustering results
Figure: Clustering after dimensionality reduction from 40x60 (2400)
dimensions to 2D.
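For readers unfamiliar with K-Means, a minimal pure-Python sketch of the standard algorithm (Lloyd's iterations) on 2D points is given here. It assumes the points have already been reduced to 2D; the slides do not say which reduction method was used, and the function name and defaults are illustrative only.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Cluster 2D points with Lloyd's algorithm; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # start from k distinct data points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, assignments
```

Calling kmeans(points, k=26) on the 2D-projected letter images would give one cluster index per image, which can then be compared against the true letters.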
51. Introduction Evolution Method
Method: CNN
Figure: Proposed structure of convolutional neural network.
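Since the figure itself is not reproduced here, the sketch below only illustrates how feature-map sizes shrink through a small convolutional network on a 60x40 input. The layer stack in LAYERS is entirely hypothetical and is not the structure proposed in the project; it uses the standard output-size formula for convolution and pooling.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one axis for a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical stack: (kind, kernel, stride, padding, output channels).
LAYERS = [
    ("conv", 3, 1, 1, 32),
    ("pool", 2, 2, 0, None),
    ("conv", 3, 1, 1, 64),
    ("pool", 2, 2, 0, None),
]


def trace_shapes(height, width, channels, layers):
    """Return the (height, width, channels) shape after each layer."""
    shapes = [(height, width, channels)]
    for kind, kernel, stride, padding, out_channels in layers:
        height = conv_out(height, kernel, stride, padding)
        width = conv_out(width, kernel, stride, padding)
        if kind == "conv":
            channels = out_channels   # pooling keeps the channel count
        shapes.append((height, width, channels))
    return shapes


shapes = trace_shapes(60, 40, 1, LAYERS)   # grayscale 60x40 input assumed
h, w, c = shapes[-1]
flattened = h * w * c   # features fed to a final 26-way classifier layer
```

Under these assumed layers the final feature map would be flattened and passed to a dense layer with 26 outputs, one per letter.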
52. Introduction Evolution Method
Method: VGG-19
Figure: Structure of VGG-19 and freezing of many last convolutional
layers.
55. Introduction Evolution Method
Related Work
As CAPTCHAs are actively used by many websites to
protect traffic, major corporations have already invested
significant resources in breaking CAPTCHAs to assess
the strengths and shortcomings of these techniques.
A noteworthy mention is Google's Street View team.
They applied their algorithms for recognizing signs
in images to the CAPTCHA problem, achieving
99.8% [3] success on particular types of difficult-to-read
CAPTCHAs.
[3] Goodfellow, I.J., Bulatov, Y., Ibarz, J., Arnoud, S., Shet, V. (2013).
Multi-digit Number Recognition from Street View Imagery using Deep
Convolutional Neural Networks. arXiv preprint.
56. Introduction Evolution Method
Google Street View team's dataset
Examples of incorrectly transcribed street numbers from the large internal dataset
(transcription vs. ground truth). Note that for some of these, the "ground truth" is also
incorrect. The ground truth labels in this dataset are quite noisy, as is common in real world
settings. [4]
[4] Goodfellow et al. (2013). Multi-digit Number Recognition from Street View Imagery using
Deep Convolutional Neural Networks. arXiv preprint.
57. Introduction Evolution Method
Hard CAPTCHA puzzles dataset
Examples of images from the hard CAPTCHA puzzles
dataset. [5]
[5] Goodfellow, I.J., Bulatov, Y., Ibarz, J., Arnoud, S., Shet, V. (2013).
Multi-digit Number Recognition from Street View Imagery using Deep
Convolutional Neural Networks. arXiv preprint.
59. Introduction Evolution Method
The NSA and Israel wrote Stuxnet
together. - Edward Snowden
Thank You.