Hacking Google reCaptcha with Google Voice Recognition... and Google Chrome in a Google ChromeBook

Automatic solving of Google reCAPTCHA v2
Authors: Ioseba Palop, Óscar Bralo, Álvaro Núñez
Redaction: Carmen Torrano
1 Abstract
CAPTCHAs are designed to distinguish between machines and human beings. Since
automatically solving CAPTCHAs implies that a bot can impersonate a human being, it is very
important to guarantee the effectiveness of CAPTCHAs. This paper presents a mechanism to
automatically solve Google reCAPTCHA v2. In particular, it automatically solves
the audio challenge available for visually impaired individuals.
Although this reCAPTCHA is considered the hardest to break, the presented solution achieves a
92% success rate. This shows that Google reCAPTCHA v2 is not secure. Thus, the problem of
distinguishing humans from bots is still not properly solved.
2 Context
Ever since Alan Turing first proposed his famous Turing test in 1950, the problem of
distinguishing between people and robots has been a challenge in the field of artificial
intelligence. One of the methods proposed for automating such tests is the CAPTCHA
(Completely Automated Public Turing test to tell Computers and Humans Apart). The term
CAPTCHA was coined in 2000 by Luis von Ahn, Manuel Blum, Nicholas Hopper and John Langford
from Carnegie Mellon University [1].
CAPTCHAs [2] are automated tests designed to tell computers and humans apart by presenting
users with a problem that humans can solve but current computer programs cannot [1].
They are frequently used to prevent automated abuse of online services and to secure different
applications, for example by preventing bots from voting continuously in online polls,
automatically registering millions of spam email accounts, or automatically purchasing tickets
to buy out an event.
A CAPTCHA usually consists of a visual challenge: an image that the user must interpret, for
example by deciphering distorted characters or answering questions about the image shown (such
as identifying a house in a given set). However, since visual challenges limit access for
millions of visually impaired individuals, audio challenges were created. In this case, a
set of words, sentences or digits must be recognized from the audio.
Audio challenges are less frequent than their visual counterparts. It is estimated that nearly 1%
of all CAPTCHAs are delivered as audio [5]. Additionally, Bursztein et al. [5] report that audio
challenges are harder to solve than image ones.
Von Ahn et al. [3] provide an estimate of the effort that humans spend solving CAPTCHAs.
Their results indicate that humans around the world type more than 100 million CAPTCHAs
every day. The authors proposed making this large amount of effort productive by
employing it for useful tasks, such as digitizing books. This is the core philosophy behind the
reCAPTCHA project, which is implemented by more than 40,000 Web sites.
Google acquired reCAPTCHA in September 2009 and later announced that current Artificial
Intelligence technology can solve even the most difficult variant of distorted text with 99.8%
accuracy. Consequently, in 2014 Google launched a new version of reCAPTCHA [4]. Its main
novelty is that machines and humans are distinguished with a single click. That is why this new
version of CAPTCHA is also known as “No CAPTCHA reCAPTCHA”. This distinction relies on
several security considerations introduced in the design. Some of them are further detailed
in Sec. “Security measures of the reCAPTCHA v2 audio challenge”.
Its creators claim that it is designed to provide anti-bot protection; in fact, the slogan is
“tough on bots, easy on humans”. Google also states that reCAPTCHA is the most widely used
CAPTCHA in the world, with users such as Snapchat and WordPress, among others.
3 Solving Google reCAPTCHA v2 using audio challenge
In order to solve reCAPTCHA, the following steps are required:
1. Clicking the “I am not a robot” checkbox. In some cases, the reCAPTCHA is solved just by
clicking this checkbox. To the user this behavior appears random; it is decided by Google's
algorithms.
Fig. 1. Example of a form with the Google reCAPTCHA v2.
2. If the click has not been classified as human behavior, an image appears (visual
challenge). However, in order to make it accessible (for blind people, for example), the user
is presented with a headphones button to get an audio challenge.
Fig. 2. Example of the visual challenge. The headphones button is located in the bottom left
corner.
3. After clicking on the headphones, the user can click the “Play” button and the browser
plays the audio challenge. Alternatively, it is also possible to download the audio file as
MP3 by clicking the download button.
Fig. 3. Example of the screen corresponding to the audio challenge.
4. When playing the audio, only five digits are pronounced, by different people, always in
English, with different intonations, different accents and different pauses.
5. The user is supposed to type the digits heard into the box. If the digits are entered
correctly, reCAPTCHA considers that the challenge has been solved by a human. For each audio
challenge there is only one chance to solve it.
Fig. 4. Example of validated reCAPTCHA.
4 Security measures of the reCAPTCHA v2 audio challenge
Bursztein et al. [5] compare in their study the features of different CAPTCHAs. According to
their results, in 2010 the Google audio challenge had the following characteristics: male voice,
length of 5 to 15 digits, single-digit charset [0-9], average duration of 37.1 seconds, sample
rate of 8000 Hz, no beep and no repeat.
As of this writing, there is no official information about the security mechanisms implemented in
Google reCAPTCHA v2. However, six main measures have been deduced:
- It detects when a click is simulated, distinguishing it from a real mouse click by a
user.
- The audio clips are recorded by different speakers, who vary in pitch, intensity and accent.
- The pauses between digits vary.
- The timing when typing is monitored: if the digits are typed too quickly, this is flagged
as machine behavior.
- Google monitors the time taken to click the Verify button. If it is clicked too quickly,
for example before the audio track has finished playing, this is considered bot
behavior.
- If a bot is suspected of trying to automatically solve Google reCAPTCHA, the IP
address is banned for a certain period of time.

5 Related Work
In March 2016, Suphannee Sivakorn et al. [17] presented at Black Hat Asia 2016 their paper “I’m
not a human: Breaking the Google reCAPTCHA”, which describes the automatic resolution of Google
reCAPTCHA using the image challenge. They achieved a 70% success rate by feeding their system
beforehand, storing and tagging all the images for future resolutions.
No other automatic solutions have been reported to date.
6 Solution details
The solution has been designed as a client-backend service architecture. The client consists of a
Chrome extension developed in JavaScript and the backend service has been developed in the
.NET Framework.
The extension is designed so that it is enabled automatically when it detects an instance of
reCAPTCHA in the web page the user is currently visiting.
The proposed technique for automatically solving Google reCAPTCHA takes advantage of the
accessibility option, bypassing the audio reCAPTCHA.
The steps to get to the audio challenge and solve it were explained in Sec. “Solving Google
reCAPTCHA v2 using audio challenge”.
The goal is to reproduce the steps that a human being would take without being detected as a
machine.
6.1 Steps
1. To trigger a click on the “I am not a robot” checkbox, the extension detects the
coordinates where the reCAPTCHA iframe is located. To obtain those coordinates,
reCAPTCHA must appear in the visible part of the DOM. This is expected since
human behavior is being simulated, and a human needs to actually see the corresponding
checkbox. Once reCAPTCHA is inside the visible DOM, the Chrome extension is able to get the
coordinates correctly regardless of the window size and position. Then a call to the
backend service is made in order to perform the click on the checkbox coordinates.
2. The backend service triggers a click event in the specified position.
3. When the iframe with the image challenge appears, the extension gets the coordinates
where the headphones button is located, and makes another call to the backend service
with the headphones position.
4. The backend triggers a new click event on the headphones button.
5. As soon as the last iframe is loaded, the Chrome extension is able to obtain the URL
corresponding to the audio file, together with the other coordinates needed (textbox,
Verify button) in order to perform the last step. Then, this information is sent to the
backend service.
6. The backend service then processes the audio in order to get the numbers that can be
heard from that audio using Google Speech API. The audio file can have one of two
contents. If the behavior is judged as machine-like, the audio will play something similar
to this: “We are sorry, but we have detected that your computer is sending automatic
requests and to protect our users …”. In this case, the process stops. Otherwise, the
audio contains five digits and the process continues with the next steps. The audio
processing details are explained in Sec. “Voice recognition”.

7. The backend triggers a click event on the textbox and writes the digits. In order to bypass
the protection mechanism related to the typing speed and avoid being detected
because of typing too quickly, our solution waits for a random time between 0.5 and 1
second after typing each digit. This strategy is enough to deceive this protection
mechanism and make the reCAPTCHA algorithms classify this behavior as human-like.
8. Finally, the backend service triggers a click event on the Verify button, the request for
solving reCAPTCHA is sent and Google replies whether it has been correctly solved.
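The per-digit pause described in step 7 can be sketched as follows. This is only an illustration: the paper gives the range (0.5 to 1 second) but not the distribution, so a uniform draw is assumed here, and the function name `typing_delays` is hypothetical rather than part of the .NET backend described above.

```python
import random
from typing import List, Optional


def typing_delays(n_digits: int, rng: Optional[random.Random] = None) -> List[float]:
    """Return one pause in seconds per digit, each drawn from [0.5, 1.0]."""
    rng = rng or random.Random()
    # Assumption: uniform distribution; the source only states the range.
    return [rng.uniform(0.5, 1.0) for _ in range(n_digits)]


delays = typing_delays(5, random.Random(0))
total = sum(delays)  # always between 2.5 and 5.0 seconds for five digits
```

For a five-digit challenge these pauses alone add between 2.5 and 5 seconds to the overall resolution time.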
6.2 Voice recognition
The Google speech recognition API allows the definition of the set of words expected in the
audio file. This contributes to the effectiveness of the recognition when phonetically similar
words appear, for example because the pronunciation of the speaker is not clear enough. Since
Google reCAPTCHA only uses digits, it is enough to specify a list of numbers from zero to nine.
To start working with the Google Speech API, the audio file first has to be converted from MP3
to FLAC, because this is one of the formats the API accepts.
The backend service sends three parallel requests to the Google Speech API (one using an
unaltered version of the audio file, one with the silences between digits reduced and one with
the audio slowed down) in order to improve the success rate. Then, it stores the results to decide
which one should be used. The winner among these three recognition results is chosen by
the number of digits each of them recognized: the higher the number of
digits recognized, the better the method is considered. In case of a tie (same number of digits
recognized), the first result after ordering the alternatives is taken.
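The paper does not say how the silences between digits are reduced. As one possible illustration, the sketch below caps every run of quiet samples at a fixed length. The function name, the amplitude threshold and the maximum run length are all assumptions, and the input is taken to be a plain sequence of already decoded PCM sample values.

```python
from typing import List, Sequence


def shorten_silences(samples: Sequence[int], threshold: int, max_run: int) -> List[int]:
    """Cap every run of quiet samples (|s| <= threshold) at max_run samples."""
    out: List[int] = []
    quiet_run = 0
    for s in samples:
        if abs(s) <= threshold:
            quiet_run += 1
            if quiet_run > max_run:
                continue  # drop the excess silence
        else:
            quiet_run = 0  # a loud sample ends the quiet run
        out.append(s)
    return out


shortened = shorten_silences([5, 0, 0, 0, 0, 7, 0, 9], threshold=1, max_run=2)
```

Loud samples are never removed, so the spoken digits stay intact and only the gaps between them shrink.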
We found that entering only three correct digits (not even consecutive, but three digits in any
position) is enough to solve reCAPTCHA. We therefore sort the results by the count of digits
recognized: first five, then four and then three. Any count under three is discarded. If any
result has five digits, it is used; if not, a four-digit result is used, and so on.
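The selection rule above can be sketched in a few lines. This is a minimal illustration rather than the authors' .NET code: the name `pick_best` is hypothetical, the transcripts are assumed to arrive as plain strings, and discarding results with more than five digits is an assumption, since the paper only mentions counts of five, four and three.

```python
from typing import Optional, Sequence

MIN_DIGITS = 3  # counts under three are discarded
MAX_DIGITS = 5  # the audio challenge contains five digits (upper bound is an assumption)


def pick_best(transcripts: Sequence[str]) -> Optional[str]:
    """Return the digit string with the most recognized digits, or None."""
    candidates = []
    for text in transcripts:
        digits = "".join(ch for ch in text if ch in "0123456789")
        if MIN_DIGITS <= len(digits) <= MAX_DIGITS:
            candidates.append(digits)
    if not candidates:
        return None
    # max() returns the first maximal item, so a tie goes to the earliest result.
    return max(candidates, key=len)


best = pick_best(["1 2 3", "12345", "1234"])
```

Here `best` is the five-digit result; if every transcript had fewer than three digits, the function would return `None`, which corresponds to the Incomplete and Fail cases reported in the experiments.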
6.3 Technical anti bot considerations
One of the security measures of Google reCAPTCHA is that the user needs to make real clicks.
Thus, in order to bypass this protection, and after trying a wide variety of
programmatic solutions, we decided to simulate real clicks by making calls to the
Windows API. This makes it possible to trigger a mouse event that is exactly the same as the
one produced by a human being handling a mouse.
In a random (but small) percentage of cases, instead of launching an audio challenge when the
user clicks on the headphones, Google reCAPTCHA launches a text challenge, where the user
should choose between different proposed words. In this case, our system is not able to
automatically solve the reCAPTCHA, since its aim is solving audio reCAPTCHAs. This happens
randomly based on the machine learning algorithms within the reCAPTCHA. The solution to this
situation is to reload the page or ask for a new audio challenge. The resolution time of the
proposed solution is approximately 20 seconds, with a 92% success rate.
6.4 Experiments and Results
For the experiments, a set of 1172 audio files was collected. Of these, 328 were solved
automatically when clicking the “I am not a robot” checkbox.
We studied the effectiveness of the proposed solution for the remaining 844 cases.

Table 1 shows the results obtained. Four cases are possible:
- Solved means that the proposed solution has solved the CAPTCHA by recognizing three or
more digits from the audio and Google verified it.
- Not Solved refers to two different cases. One is where the speech recognition API is able
to recognize three or more digits but these are wrong and Google did not verify the
whole number. The second case is when reCAPTCHA detects bot-like behavior.
- Incomplete is where fewer than three digits are recognized from the audio file.
- Fail occurs when zero digits are obtained from the voice system or any error happens.
Result      | Recognized digit count | Partial (%) | Total (%)
Solved      | 3                      | 11.01       | 92.06
            | 4                      | 32.58       |
            | 5                      | 48.47       |
Not Solved  |                        |             | 2.84
Incomplete  |                        |             | 4.74
Fail        |                        |             | 0.36
Table 1. Performance results.
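As a quick consistency check on Table 1, the three partial percentages of the Solved row should add up to its total, and the four totals should add up to 100%. The sketch below uses exact decimal arithmetic and only the figures from the table; the variable names are illustrative.

```python
from decimal import Decimal

# Percentages copied from Table 1.
solved_partials = {3: Decimal("11.01"), 4: Decimal("32.58"), 5: Decimal("48.47")}
other_totals = {
    "Not Solved": Decimal("2.84"),
    "Incomplete": Decimal("4.74"),
    "Fail": Decimal("0.36"),
}

solved_total = sum(solved_partials.values())
grand_total = solved_total + sum(other_totals.values())

print(solved_total)  # 92.06
print(grand_total)   # 100.00
```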
Table 1 shows that the proposed solution is able to automatically solve Google reCAPTCHA with
a 92.06% success rate, broken down by the count of digits recognized. This means that 92% of the
time it was able to impersonate a human being. Considering that CAPTCHAs are used to protect
services against abuse, this has serious consequences.
For the cases where the reCAPTCHA was automatically solved, we studied the effectiveness
of each audio processing technique. In 46.98% of cases the winning variant was the audio with
silence processing, in 38.47% it was the raw audio and in 14.29% it was the audio
with speed processing.
We would like to mention that experiments cannot be repeated using the same audio twice, as
the verification process can only happen once.
7 Recommendations for strong audio challenges
Since one of the weakest points of the Google reCAPTCHA audio challenge is that it uses only
five digits, the recommendation is to use longer sequences. Furthermore, numbers do not need to
be restricted to a single digit (for example, numbers from 0 to 999 could be used).
Additionally, the whole alphabet (not only numbers) could be used in order to enlarge the search
space. Increasing the number of possibilities makes it harder for bots to break the CAPTCHA. Even
complete words could be introduced and mixed with letters and numbers.
Furthermore, the experiments reveal that getting only three out of five digits right is enough
to solve the CAPTCHA. This decreases the search space to one thousand possibilities (three
digits). This again brings us to the well-known security principle: a system is only as secure
as its weakest link.
Another recommendation would be to introduce distortions in the audio. This would make the
audio harder for machines to understand. Background noise could be another
useful addition.
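The effect of these recommendations on the search space can be made concrete with a small calculation. The sketch follows the paper's simple counting model (number of options raised to the sequence length); the ten-item sequence and the 36-symbol alphabet (26 letters plus 10 digits) are illustrative assumptions, not values from the paper.

```python
def search_space(symbols: int, length: int) -> int:
    """Number of distinct sequences of `length` items drawn from `symbols` options."""
    return symbols ** length


five_digits = search_space(10, 5)           # current challenge: 100000
three_digits = search_space(10, 3)          # three accepted digits: 1000
ten_digits = search_space(10, 10)           # assumed longer sequence: 10000000000
five_numbers_0_999 = search_space(1000, 5)  # five numbers from 0 to 999: 10**15
five_alphanumeric = search_space(36, 5)     # assumed 36-symbol alphabet: 60466176
```

Accepting three digits instead of five shrinks the space by a factor of 100, while drawing each item from 0 to 999 enlarges it by ten orders of magnitude over the current five-digit challenge.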

Conclusions
Although Google reCAPTCHA was designed to be easy on humans and hard on bots, this paper
shows that it is not secure. In fact, the proposed solution is able to break it in 92% of the
cases. This shows that the challenge of distinguishing humans from bots is still an open
problem.
The proposed solution takes advantage of the audio challenge available for visually
impaired individuals. One of the weaknesses of Google reCAPTCHA v2 is that the audio
challenge asks for five digits only. Furthermore, guessing any three of the five digits is
enough to solve the challenge. This reduces the search space to only 10^3 possibilities, which
is far from being considered secure.
This paper shows that it is possible to automatically solve Google reCAPTCHA v2 92%
of the time. Considering that this is the strongest CAPTCHA, the situation is alarming. It
implies being able to impersonate people in scenarios such as e-voting, spam in mail accounts,
denial of service attacks and so on. To achieve a more secure digital world,
stronger CAPTCHAs have to be designed.
Attendee Takeaways
- Notions about CAPTCHA and reCAPTCHA.
- State of the art in CAPTCHAs.
- Description of the Google reCAPTCHA v2 and some security measures it applies.
- Technique to automatically solve Google reCAPTCHA v2, and the voice recognition
processing techniques used.
- Recommendations for designing strong audio challenges.
What’s new?
The proposed solution is a fully automated reCAPTCHA solver that takes advantage of the Google
Speech API, and it is the only known solution that is entirely based on the audio challenge. It
achieves a 92% success rate, the highest among existing solutions, with no prior
learning and no data storage needed. Since reCAPTCHA is owned by Google, the present
proof of concept breaks a Google service by using another Google service.
Why Black Hat?
The consequences of bypassing CAPTCHAs can be severe, since they are designed to
distinguish between humans and bots in actions such as voting continuously in online polls,
automatically registering millions of spam email accounts, automatically purchasing tickets
to buy out an event, etc. Furthermore, the number of users that interact with reCAPTCHAs is
extremely high.
We consider it vital to protect such scenarios and to offer security that prevents these kinds
of abuse.
Given the popularity of Black Hat and its audience, we consider it the
perfect venue for presenting our research. Given the importance of the consequences of these
attacks and the volume of users affected, we think that it should be presented at a conference
such as Black Hat.
With this talk, we also expect to raise awareness of the importance of designing
strong CAPTCHAs, which is an unsolved challenge nowadays, and we hope that this helps in
creating a more secure and trustworthy society and world.
References
[1] http://www.CAPTCHA.net
[2] L. von Ahn, M. Blum, and J. Langford. “Telling Humans and Computers Apart Automatically,”
Communications of the ACM, vol. 47, no. 2, pp. 57-60, Feb. 2004.
[3] Von Ahn, L., Maurer, B., McMillen, C., Abraham, D., & Blum, M. (2008). reCAPTCHA: Human-
based character recognition via web security measures. Science, 321(5895), 1465-1468.
[4] https://www.google.com/reCAPTCHA/intro/index.html
[5] Bursztein, E., Bethard, S., Fabry, C., Mitchell, J. C., & Jurafsky, D. (2010, May). How Good Are
Humans at Solving CAPTCHAs? A Large Scale Evaluation. In IEEE Symposium on Security and
Privacy (pp. 399-413).
[6] Tam, J., Simsa, J., Hyde, S., & Ahn, L. V. (2008). Breaking audio CAPTCHAs. In Advances in
Neural Information Processing Systems (pp. 1625-1632).
[7] Wilkins, J. (2010). Strong CAPTCHA guidelines.
[8] Tam J, Huggins-Daines JD, von Ahn L, Blum M. (2008, July). Improving audio CAPTCHAs. In
Proceedings of the 2008 symposium on accessible privacy and security (SOAPS 2008), USA.
[9] Houck, C., Lee, J. (2010, August). Decoding reCAPTCHA. DEF CON 18 Hacking Conference.
[10] Adam, C-P, Jeffball. (2012 May). Codename Stiltwalker. Layer ONE hacker conference, USA.
[11] Cruz-Perez, C., Starostenko, O., Uceda-Ponga, F., Alarcon-Aquino, V., Reyes-Cabrera, L.
(2012, June) Breaking reCAPTCHAs with Unpredictable Collapse: Heuristic Character
Segmentation and Recognition, Volume 7329 of the series Lecture Notes in Computer Science
pp 155-165
[12] Chellapilla, K., and Simard, P. (2004). Using Machine Learning to Break Visual Human
Interaction Proofs (HIPs). In Advances in Neural Information Processing Systems 17, Neural
Information Processing Systems (NIPS). MIT Press.
[13] Chellapilla, K., Larson, K., Simard, P., and Czerwinski, M. (2005). Building Segmentation
Based Human-friendly Human Interaction Proofs. In 2nd Int’l Workshop on Human Interaction
Proofs, Springer-Verlag. LNCS 3517.
[14] Bursztein, E., Martin, M., and Mitchell, J. C. (2011). Text-based CAPTCHA strengths and
weaknesses. In Proceedings of the 18th ACM conference on Computer and communications
security. ACM.
[15] El Ahmad, A. S., Yan, J., and Tayara, M. (2011). The robustness of Google CAPTCHAs.
Computing Science, Newcastle University.
[16] Yan, J. and El Ahmad, A. S. (2008, October). A Low-cost Attack on a Microsoft CAPTCHA. In
15th ACM Conference on Computer and Communications Security (CCS’08). Virginia, USA. ACM
Press. pp. 543-554.

[17] Suphannee Sivakorn, Jason Polakis, and Angelos D. Keromytis. (2016). I’m not a human:
Breaking the Google reCAPTCHA.
