Major Project- Security analysis by recognition of captcha


Published in: Engineering

  2. CAPTCHAS
  3. HOW DOES IT WORK? CAPTCHA works on a simple principle: it should be solvable only by humans. It relies on the assumption that computers cannot reliably read the distorted characters in the image, while a human can easily read the CAPTCHA text. It therefore became a widely used scheme in which a user has to enter the characters in order to proceed on a website.
  4. While there are many types of CAPTCHA, the most common is the text-based CAPTCHA, in which a random combination of characters of varying length is rendered as a distorted image. The assumption is that the image cannot be processed and solved by a computer script but can be read and understood by a human. Once the user enters the CAPTCHA characters, the input is compared at the backend with the known solution, and if it matches exactly, the user can proceed. Cracking CAPTCHAs has been a challenge for the AI research community, and to date no system has been developed that achieves 100% accuracy and efficiency.
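The backend check described above can be sketched in a few lines. This is a minimal illustration, not the scheme any particular CAPTCHA library uses: the function name `captcha_matches` is hypothetical, and treating the comparison as case-sensitive is an assumption.

```python
import hmac


def captcha_matches(expected: str, submitted: str) -> bool:
    """Return True only when the submitted text equals the stored solution exactly.

    Assumption: the comparison is case-sensitive and whitespace-sensitive.
    """
    # compare_digest avoids leaking, through timing, where the strings differ.
    return hmac.compare_digest(expected.encode("utf-8"), submitted.encode("utf-8"))


if __name__ == "__main__":
    print(captcha_matches("aB3x", "aB3x"))  # exact match
    print(captcha_matches("aB3x", "ab3x"))  # one character differs in case
```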
  5. CAPTCHAs have practical security applications such as:
     • Preventing comment spam in blogs: Spam bots flood blog comments with keywords intended to raise a page's ranking in search engines. Requiring a CAPTCHA before a comment is accepted keeps these bots out.
     • Protecting website registration: Everyone uses email, and several websites offer sign-ups that are meant for humans. With registration bots, several email services and sign-up websites found that they had gained millions of accounts overnight, all of them fake and generated by bots.
     • Protecting email addresses from scrapers: Spammers crawl the Web in search of email addresses posted in clear text. CAPTCHAs provide an effective mechanism to hide your email address from Web scrapers. The idea is to require users to solve a CAPTCHA before showing your email address.
  6. • Preventing dictionary attacks: One way to break into someone's email or registration account is to try millions of password combinations against the right user ID. A CAPTCHA prevents this by appearing after a number of failed login attempts. Since a bot cannot solve the CAPTCHA, no further attempts are possible and the account is not affected in any way.
     • Search engine bots: It is sometimes desirable to keep web pages unindexed to prevent others from finding them easily. There is an HTML tag to prevent search engine bots from reading web pages. The tag, however, doesn't guarantee that bots won't read a web page; it only serves to say "no bots, please." Search engine bots, since they usually belong to large companies, respect web pages that don't want to allow them in. However, in order to truly guarantee that bots won't enter a web site, CAPTCHAs are needed.
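The dictionary-attack defence above can be illustrated with a small counter of failed logins. This is only a sketch under stated assumptions: the class `LoginGuard`, its method names, and the threshold of three failures are all hypothetical and do not come from the slides.

```python
from collections import defaultdict

MAX_FAILED_ATTEMPTS = 3  # assumed threshold; the slides only say "a number of" misses


class LoginGuard:
    """Tracks failed logins per user and demands a CAPTCHA after too many misses."""

    def __init__(self, max_failed: int = MAX_FAILED_ATTEMPTS) -> None:
        self.max_failed = max_failed
        self._failed = defaultdict(int)

    def captcha_required(self, user_id: str) -> bool:
        return self._failed[user_id] >= self.max_failed

    def record_attempt(self, user_id: str, password_ok: bool, captcha_ok: bool = False) -> bool:
        """Return True if the login is allowed."""
        if self.captcha_required(user_id) and not captcha_ok:
            # Blocked until a CAPTCHA is solved; the password is not even considered.
            return False
        if password_ok:
            self._failed[user_id] = 0  # a successful login resets the counter
            return True
        self._failed[user_id] += 1
        return False
```

A bot that cannot solve the CAPTCHA stays locked out after the third miss, even if a later guess happens to be the right password, while a human can solve the CAPTCHA and log in normally.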
  7. GOALS TO ACHIEVE
     • Web interface for the CAPTCHA system: Given a web page, we construct a plug-in so that, at the click of a button, the CAPTCHA is captured and passed to a recognizer, and the result is returned and filled into the CAPTCHA text box. The result is then checked to see whether the CAPTCHA was filled in correctly. If so, we record the CAPTCHA and the answer in a database for future research. The recognition rate is also calculated for analysis.
     • Segmentation Engine: The JCAPTCHA image is segmented here, using one of several segmentation modes. The segmentation algorithms are based on invariants observed across hundreds of JCAPTCHA images.
     • Recognition Engine: Build a recognition engine that identifies the best possible answer for the segmented JCAPTCHA characters.
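As a rough illustration of the recognition-rate calculation mentioned in the goals, the sketch below counts the fraction of attempts that the website accepted. The `Attempt` record and its field names are hypothetical, and the slides do not say how the project actually stores its results.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """One recognition attempt (hypothetical record layout)."""
    captcha_file: str  # saved CAPTCHA image
    answer: str        # text produced by the recognizer
    accepted: bool     # whether the website accepted the answer


def recognition_rate(attempts: list) -> float:
    """Fraction of attempts in which the whole CAPTCHA was answered correctly."""
    if not attempts:
        return 0.0
    return sum(1 for a in attempts if a.accepted) / len(attempts)


def correct_answers(attempts: list) -> dict:
    """Only accepted answers are kept, mirroring the 'record if correct' goal."""
    return {a.captcha_file: a.answer for a in attempts if a.accepted}
```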
  8. A BRIEF FLOW: A CAPTCHA recognition framework consists of three main components:
     • The front-end plug-in, which detects the CAPTCHA on the webpage.
     • The segmentation engine, which segments the characters of the CAPTCHA.
     • The recognizer, which identifies each segmented character.
  9. The diagram below demonstrates the framework for CAPTCHA recognition:
  10. JCAPTCHA Recognizer Engine
      • The Recognizer Engine forms the core of the JCAP
      1. Collecting files and removing artifacts: We observed that the JCAPTCHA image file saved by the plugin had a 2-pixel blue border. This border was not in the original image and was an artifact created when the plugin software iMacros selected the image to take a screenshot. This border is cropped off the image, and the new image is saved in the Recognizer folder.
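The border removal described above amounts to dropping a fixed number of pixels from every edge. The sketch below assumes the image is held as a list of pixel rows; the function name `crop_border` is hypothetical, and the project's actual image library is not named in the slides.

```python
def crop_border(pixels: list, border: int = 2) -> list:
    """Return a copy of the image with `border` pixels removed from every edge.

    `pixels` is a list of rows, each row a list of pixel values.
    The default of 2 matches the 2-pixel border described in the slides.
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    if height <= 2 * border or width <= 2 * border:
        raise ValueError("image is too small to crop the requested border")
    return [row[border:width - border] for row in pixels[border:height - border]]
```

With an imaging library such as Pillow, the same step would typically be a single crop call on the bounding box shrunk by two pixels on each side.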
  11. 2. Segmentation
      • There are three modes of segmentation, configurable by the user:
        1. Fast Pixel Array mode
        2. Slow Pixel Array mode
        3. Connected Components mode
      3. Recognition
      • As introduced in the theory, our approach to character recognition is based on template matching. Although the implementation of the OCR closely follows the explanation given in the theory, I’d like to walk you through the flow of the code and discuss some of the challenges I experienced building each function.
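To make the last two stages concrete, here is a minimal sketch of connected-components segmentation followed by template matching on a binarized image (1 = ink, 0 = background). It is not the project's code: the two pixel-array modes are not described in the slides and are not shown, and the 8-connectivity, the pixel-agreement score, and all function names are assumptions made for illustration.

```python
from collections import deque


def connected_components(image):
    """Group 8-connected ink pixels; return components ordered left to right."""
    h = len(image)
    w = len(image[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    components = []
    for r in range(h):
        for c in range(w):
            if image[r][c] != 1 or seen[r][c]:
                continue
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:  # breadth-first flood fill from the seed pixel
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and image[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    components.sort(key=lambda px: min(x for _, x in px))
    return components


def to_glyph(pixels):
    """Crop one component to its bounding box as a 0/1 grid."""
    top = min(y for y, _ in pixels)
    left = min(x for _, x in pixels)
    height = max(y for y, _ in pixels) - top + 1
    width = max(x for _, x in pixels) - left + 1
    glyph = [[0] * width for _ in range(height)]
    for y, x in pixels:
        glyph[y - top][x - left] = 1
    return glyph


def match_score(glyph, template):
    """Fraction of agreeing pixels after padding both grids with background."""
    h = max(len(glyph), len(template))
    w = max(len(glyph[0]), len(template[0]))

    def at(grid, y, x):
        return grid[y][x] if y < len(grid) and x < len(grid[0]) else 0

    agree = sum(at(glyph, y, x) == at(template, y, x) for y in range(h) for x in range(w))
    return agree / (h * w)


def recognize(image, templates):
    """Segment the image, then label each glyph with its best-scoring template."""
    text = []
    for pixels in connected_components(image):
        glyph = to_glyph(pixels)
        text.append(max(templates, key=lambda ch: match_score(glyph, templates[ch])))
    return "".join(text)
```

Connected components only separates characters that do not touch, so a sketch like this would merge overlapping glyphs into one segment; real CAPTCHA images would also need size normalization before the templates are compared.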
  12. Screenshots 1. Image extraction using iMacros
  13. 2. Extracted CAPTCHA in the specified folder 3. Pre-processed images
  14. 4. Segmentation
  15. THANK YOU!