Machine Learning Methods
 for CAPTCHA Recognition
       Rachel Shadoan
       Zachery Tidwell, II
CAPTCHA
Completely Automated Public Turing Test to tell Computers and Humans Apart


Why are they interesting?
  o Harder than normal text recognition
         On par with handwriting recognition,
         reading damaged text
  o Techniques translate well to other problems
         Facial recognition (Gonzaga, 2002)
         Weed identification (Yang, 2000)
  o Near infinite data sets
         Easier to avoid over-fitting
Hypothesis

CAPTCHA recognition can be
 accomplished to a high degree
 of accuracy using machine
 learning methods with minimal
 preprocessing of inputs.
Methods
           Tools
              o JCaptcha
              o Image Processing

Learning Methods
  o Feed-forward Neural Nets
  o Self-Organizing Maps
  o K-Means
  o Cluster Classification
Segmentation Methods
  o Overlapping
  o Whitespace
  o K-Means
JCaptcha

o Open-source CAPTCHA
  generation software
o Highly configurable
   Can produce CAPTCHAs of
   many levels of difficulty

o Check it out at:
  http://jcaptcha.sourceforge.net
Image Processing
Sparse Image
  Represents images as an unbounded set of pixels
  Each pixel is a value between 0 and 1 plus a
    coordinate pair
  Center each image before turning it into a matrix of
    0s and 1s




         Original          After Transformation
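The centering step above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the (x, y, value) tuple layout, the 20 x 20 grid, the 0.5 threshold, and centering by bounding box are all assumptions.

```python
def to_centered_matrix(pixels, size=20, threshold=0.5):
    """Turn sparse (x, y, value) pixels into a size x size matrix of 0s and 1s.

    Assumptions (not from the slides): values >= threshold count as ink,
    and "centering" means centering the bounding box of the ink.
    """
    ink = [(x, y) for x, y, v in pixels if v >= threshold]
    matrix = [[0] * size for _ in range(size)]
    if not ink:
        return matrix
    xs = [x for x, _ in ink]
    ys = [y for _, y in ink]
    # Offset that moves the middle of the bounding box to the middle of the grid.
    dx = (size - 1) // 2 - (min(xs) + max(xs)) // 2
    dy = (size - 1) // 2 - (min(ys) + max(ys)) // 2
    for x, y in ink:
        cx, cy = x + dx, y + dy
        # Ink that falls outside the grid after shifting is dropped.
        if 0 <= cx < size and 0 <= cy < size:
            matrix[cy][cx] = 1
    return matrix
```

Because the input is a set of coordinates rather than a fixed array, the same call works wherever the character sits in the original image.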
Feed-Forward Neural Nets




      As covered in class
Self-Organizing Maps
Training
    Initialize N buckets to random values
    For each input
       Find the bucket that is “closest” to the input
       Adjust the “closest” bucket to more closely match the input
       using an exponential average
Collection
    For many inputs
       Sort each input into the bucket it most closely matches
    For each bucket and each character
       Calculate the probability of that character going into that bucket
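The two phases above can be sketched in a few lines. This is a hedged sketch under stated assumptions, not the authors' implementation: "closest" is taken to mean squared Euclidean distance, the learning rate `alpha` is a made-up value, and the collection step is read as estimating P(character | bucket). As on the slide, only the winning bucket is updated.

```python
import random
from collections import Counter, defaultdict

def closest(buckets, x):
    """Index of the bucket with the smallest squared Euclidean distance to x."""
    return min(range(len(buckets)),
               key=lambda i: sum((b - v) ** 2 for b, v in zip(buckets[i], x)))

def train(inputs, n_buckets, alpha=0.1, seed=0):
    """Training phase: random buckets, then pull each winner towards its input."""
    rng = random.Random(seed)
    dim = len(inputs[0])
    buckets = [[rng.random() for _ in range(dim)] for _ in range(n_buckets)]
    for x in inputs:
        i = closest(buckets, x)
        # Exponential average: move the winner a fraction alpha towards x.
        buckets[i] = [(1 - alpha) * b + alpha * v for b, v in zip(buckets[i], x)]
    return buckets

def collect(buckets, labelled_inputs):
    """Collection phase: {bucket index: {char: estimated P(char | bucket)}}."""
    counts = defaultdict(Counter)
    for x, ch in labelled_inputs:
        counts[closest(buckets, x)][ch] += 1
    return {i: {ch: n / sum(c.values()) for ch, n in c.items()}
            for i, c in counts.items()}
```

At recognition time, a new chunk would be sorted into its closest bucket and labelled with that bucket's most probable character.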
K-Means
• Very similar to Self-Organizing Maps (SOMs)
• Can use the same classifying mechanism as used for SOM
Overlapping Segmentation
• Divide image into
  fixed number of
  overlapping tiles of
  the same size
• In our case, 20 x 20
  pixels with a 50%
  overlap
• Discard chunks under a certain size and chunks that are all white
Note: This is a B with part of it cut off, not an E. Therein lies the rub.
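A minimal sketch of the tiling described above, assuming the image is a list of rows with 1 for ink and 0 for white. The function name and the `min_ink` parameter are hypothetical; "under a certain size" is read here as "fewer than `min_ink` ink pixels", which also covers all-white tiles.

```python
def overlapping_tiles(image, tile=20, overlap=0.5, min_ink=1):
    """Return (left, top, rows) for every tile holding at least min_ink ink pixels.

    image: list of equal-length rows, 1 = ink, 0 = white (assumed encoding).
    """
    step = max(1, int(tile * (1 - overlap)))  # 20 px tiles, 50% overlap -> step 10
    height, width = len(image), len(image[0])
    tiles = []
    for top in range(0, max(height - tile, 0) + 1, step):
        for left in range(0, max(width - tile, 0) + 1, step):
            rows = [row[left:left + tile] for row in image[top:top + tile]]
            # Discard tiles that are all white or carry too little ink.
            if sum(map(sum, rows)) >= min_ink:
                tiles.append((left, top, rows))
    return tiles
```

With a 50% overlap every ink pixel away from the border lands in two horizontally adjacent tiles, which is why a tile can hold a cut-off letter like the B in the note above.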
Whitespace Segmentation
• Iterate through the
  image from left to
  right—segment
  when a full column
  of whitespace is
  encountered
• Works perfectly for
  well-spaced text
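The left-to-right scan can be sketched like this, assuming the image is a list of rows with 1 for ink and 0 for white (the function name is hypothetical):

```python
def whitespace_segments(image):
    """Return (start, end) column ranges, end exclusive, split at all-white columns."""
    width = len(image[0])
    has_ink = [any(row[x] for row in image) for x in range(width)]
    segments, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # first inked column of a new segment
        elif not ink and start is not None:
            segments.append((start, x))    # a full white column closes the segment
            start = None
    if start is not None:                  # segment running to the right edge
        segments.append((start, width))
    return segments
```

If two letters touch, no all-white column separates them and they come back as one segment, which is the failure case for closely packed text.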
K-Means Segmentation
• Performs better
  than heuristic
  segmentation on
  closely-packed
  inputs
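The slide does not say how K-Means is applied to segmentation. One plausible reading, sketched below, clusters the coordinates of the ink pixels into k groups, with k set to the known number of characters. The function name, the evenly spaced initial centroids, and the fixed iteration count are all assumptions.

```python
def kmeans_segment(ink, k, iterations=20):
    """Cluster ink pixel coordinates (x, y) into up to k groups, left to right."""
    xs = [x for x, _ in ink]
    mean_y = sum(y for _, y in ink) / len(ink)
    lo, hi = min(xs), max(xs)
    # Assumed initialisation: centroids spread evenly across the x range.
    centroids = [(lo + (hi - lo) * (i + 0.5) / k, mean_y) for i in range(k)]
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for x, y in ink:
            j = min(range(k), key=lambda c: (x - centroids[c][0]) ** 2
                                            + (y - centroids[c][1]) ** 2)
            groups[j].append((x, y))
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g))
            if g else centroids[j]
            for j, g in enumerate(groups)
        ]
    return sorted((g for g in groups if g),
                  key=lambda g: sum(x for x, _ in g) / len(g))
```

Unlike the whitespace scan, this splits two touching blobs even when no white column lies between them.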
Segmentation Comparison
[Figure: two sets of panels comparing Even-width, Whitespace, and K-Means segmentation]
Experiment 1
Machine Learning Method:
  Self-Organizing Map
Topology:
  200 buckets, initialized randomly
Inputs:
  3-letter CAPTCHAs
  Random fonts
  Letters A-G
  “Chunked” using overlapping segmentation
Experiment 1 Results
Buckets fell into three primary categories:

  Distinguishable
  letters


  Chunks with halves
  of two letters

  Indistinguishable
  noise
Experiment 2
ML Method:
  Neural Net
Topology:
  Fully connected
  400 inputs
  50 node hidden layer
  7 outputs, one per letter A-G; each answers “Contains …?” with 0 or 1
Inputs:
  Single-letter CAPTCHAs
  Random fonts
  Letters A-G
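To make the 400-50-7 shape concrete, here is a sketch of a forward pass through such a network. The slides do not give the activation function, weight initialisation, or training procedure, so the sigmoid units, the small random weights, and all names below are assumptions; the network shown is untrained.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Weights for one fully connected layer; the last entry of each row is the bias."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)] for _ in range(n_out)]

def forward(layer, inputs):
    """Sigmoid activations of one fully connected layer (sigmoid is an assumption)."""
    return [1 / (1 + math.exp(-(sum(w * v for w, v in zip(ws, inputs)) + ws[-1])))
            for ws in layer]

rng = random.Random(0)
hidden = make_layer(400, 50, rng)   # 400 inputs: a flattened 20 x 20 matrix of 0s and 1s
output = make_layer(50, 7, rng)     # 7 outputs: one per letter A-G

def predict(pixels):
    """Map 400 pixel values to a score in (0, 1) per letter."""
    scores = forward(output, forward(hidden, pixels))
    return dict(zip("ABCDEFG", scores))
```

Each output is independent of the others, so a chunk holding halves of two letters can score high on both.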
Experiment 2 Results
[Figure: Neural Net Learning Curve]
Experiment 2 Results
Past a certain number of nodes in the hidden layer, the topology ceases
to have a huge impact on accuracy.
[Figure: Neural Net Accuracy vs. Size of Hidden Layer]
Experiment 3
ML Method:
  SOM
Topology:
  500 buckets
ML Method:
  Neural Net
Topology:
  Fully connected
  400 inputs
  1000 node hidden layer
  7 outputs
Inputs:
  4-letter CAPTCHAs
  Random fonts
  Letters A-G
Experiment 3
[Figure: Neural Net vs. SOM on CAPTCHAs of length 4, letters A-G]
Experiment 4
ML Method:
  SOM
Topology:
  500 buckets
ML Method:
  Neural Net
Topology:
  Fully connected
  400 inputs
  1000 node hidden layer
  7 outputs
Inputs:
  4-letter CAPTCHAs
  Random fonts
  Letters A-Z
Experiment 4
[Figure: Neural Net vs. SOM on CAPTCHAs of length 4, letters A-Z]
Experiment 5
ML Method:
  SOM
Topology:
  500 buckets
ML Method:
  Neural Net
Topology:
  Fully connected
  400 inputs
  1000 node hidden layer
  7 outputs
Inputs:
  5-letter CAPTCHAs
  Random fonts
  Letters A-Z
Experiment 5
[Figure: Neural Net vs. SOM on CAPTCHAs of length 5, letters A-Z]
What it all means
• Increasing number of characters
  dramatically decreases total accuracy
  because segmentation quality decreases
• True positive rate goes down when
  segmentation quality decreases
• Hence, better segmentation is the key
Future Work
Improved Segmentation
   o Wirescreen segmentation
   o Ensemble techniques
Improved True Positive Rates with Current
  System
   o Ensemble techniques
New problems
   o Handwriting recognition
   o Bot net of doom
Questions?
