26. Possible security measures
Funky background image
− can usually be removed with basic preprocessing
Text distortions
− modern OCR techniques can beat it
Anti-segmentation measures
28. Beating segmentation
If each character's signature can be extracted from
the vertical signature alone (the per-column pixel
counts), character segmentation becomes trivial
A Low-cost Attack on a Microsoft CAPTCHA - Jeff Yan, Ahmad Salah El Ahmad
School of Computing Science, Newcastle University, UK
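A minimal sketch of the idea, assuming the "vertical signature" means the number of black pixels in each column of a binary image (1 = black, 0 = white). It only splits at completely empty columns. This is a simplification for illustration, not a reproduction of the attack in the cited paper, and touching or overlapping characters would defeat it. The function names are hypothetical.

```python
from typing import List, Tuple

# A binary image is a list of rows; 1 = black (ink), 0 = white (background).
Image = List[List[int]]


def vertical_signature(image: Image) -> List[int]:
    """Count the black pixels in each column (the vertical projection)."""
    return [sum(col) for col in zip(*image)]


def segment_columns(image: Image) -> List[Tuple[int, int]]:
    """Split at empty columns; return (start, end) column ranges, end exclusive."""
    segments = []
    start = None
    for x, count in enumerate(vertical_signature(image)):
        if count > 0 and start is None:
            start = x                      # a character begins
        elif count == 0 and start is not None:
            segments.append((start, x))    # the character ends at an empty column
            start = None
    if start is not None:                  # a character touching the right border
        segments.append((start, len(image[0])))
    return segments


image = [
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 1, 1, 0],
    [1, 1, 0, 0, 1, 1],
]
print(vertical_signature(image))  # [3, 2, 0, 1, 3, 1]
print(segment_columns(image))     # [(0, 2), (3, 6)]
```

Once the column ranges are known, each slice can be passed to a per-character classifier independently.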
30. Beating segmentation
Otherwise, we can ignore segmentation altogether!
The following slides describe an experiment
with this approach
31. A Monte-Carlo experiment
Note: for the performance tests, the variation
between character images was kept to a minimum
f(x) → y
x ∈ {0, 1}^3000 (a 3000-bit binary image: 2^3000 possible inputs)
y: one of 10^6 possible 6-digit strings
32. Training:
− Select one character image at random
− Select N of its black pixels ("spots")
− Sort the points so that each point set has one
canonical order
− Subtract the first point from all the others for
position independence
− Assign the pattern a 'weight' for each character:
weight(c) = matched samples of c / sample size
− Assign it a 'score' (indicates classification quality):
score = weight(selected digit) / (1 + sum of the other digits' weights)
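The training steps can be sketched in Python as below. This is an illustration under assumptions the slides do not state: images are stored as sets of (row, col) black-pixel coordinates grouped by character label, "sample size" is the number of samples of each character, and a pattern "matches" an image if it occurs at some offset anchored on one of the image's black pixels. The names make_pattern, matches, weights, score and train are hypothetical.

```python
import random
from typing import Dict, FrozenSet, List, Tuple

Point = Tuple[int, int]                      # (row, col) of a black pixel
Pattern = Tuple[Point, ...]                  # offsets relative to the first point
Samples = Dict[str, List[FrozenSet[Point]]]  # character label -> sample images


def make_pattern(black: List[Point], n: int, rng: random.Random) -> Pattern:
    """Pick n black pixels at random, sort them, and shift them so the first is (0, 0)."""
    pts = sorted(rng.sample(black, n))  # sorting gives each point set one canonical order
    r0, c0 = pts[0]
    return tuple((r - r0, c - c0) for r, c in pts)


def matches(pattern: Pattern, black: FrozenSet[Point]) -> bool:
    """True if the pattern occurs at some offset, anchored on any black pixel."""
    return any(all((r + dr, c + dc) in black for dr, dc in pattern) for r, c in black)


def weights(pattern: Pattern, samples: Samples) -> Dict[str, float]:
    """weight(c) = matching samples of c / number of samples of c."""
    return {ch: sum(matches(pattern, img) for img in imgs) / len(imgs)
            for ch, imgs in samples.items()}


def score(ws: Dict[str, float], digit: str) -> float:
    """score = weight(digit) / (1 + sum of the other digits' weights)."""
    others = sum(w for ch, w in ws.items() if ch != digit)
    return ws[digit] / (1 + others)


def train(samples: Samples, num_patterns: int, n: int,
          rng: random.Random) -> List[Tuple[Pattern, str, float]]:
    """Monte-Carlo loop: draw random patterns and keep each with its digit and score."""
    pool = [(ch, img) for ch in sorted(samples) for img in samples[ch]]
    model = []
    for _ in range(num_patterns):
        digit, image = rng.choice(pool)   # one character image at random
        if len(image) < n:
            continue                      # not enough black pixels for a pattern
        pattern = make_pattern(sorted(image), n, rng)
        model.append((pattern, digit, score(weights(pattern, samples), digit)))
    return model


samples = {
    "1": [frozenset({(0, 1), (1, 1), (2, 1)}), frozenset({(0, 2), (1, 2), (2, 2)})],
    "7": [frozenset({(0, 0), (0, 1), (0, 2), (1, 2), (2, 1)})],
}
model = train(samples, num_patterns=5, n=2, rng=random.Random(0))
```

Each random draw is one Monte-Carlo sample; a score close to 1 means the pattern fires on its own digit and rarely on the others, so high-scoring patterns are the useful ones to keep.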
33. Recognition:
− Build a score map over all points
− Select the highest-scoring character for each
column
− Reduce the resulting string to a 6-digit string
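A sketch of the recognition steps, taking (pattern, digit, score) triples as produced by the training procedure; the types are redefined so the block stands alone. Assumptions not stated in the slides: each pattern match adds its score to the column of its anchor point, and the final reduction step, which the slides do not describe, is approximated by a hypothetical collapse that merges runs of the same digit separated by empty columns. It does not enforce exactly 6 digits.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Optional, Tuple

Point = Tuple[int, int]                    # (row, col) of a black pixel
Pattern = Tuple[Point, ...]                # offsets relative to the first point
Model = List[Tuple[Pattern, str, float]]   # (pattern, digit, score) triples


def score_map(model: Model, black: FrozenSet[Point], width: int) -> List[Dict[str, float]]:
    """For each column, sum the scores of the patterns whose anchor lands in it."""
    cols: List[Dict[str, float]] = [defaultdict(float) for _ in range(width)]
    for pattern, digit, s in model:
        for r, c in black:  # try every black pixel as the anchor (first point)
            if all((r + dr, c + dc) in black for dr, dc in pattern):
                cols[c][digit] += s
    return cols


def best_per_column(cols: List[Dict[str, float]]) -> List[Optional[str]]:
    """Highest-scoring digit per column, or None where nothing matched."""
    return [max(d, key=d.get) if d else None for d in cols]


def collapse(labels: List[Optional[str]]) -> str:
    """Hypothetical post-processing: merge runs of one digit; None separates runs."""
    out, prev = [], None
    for lab in labels:
        if lab is not None and lab != prev:
            out.append(lab)
        prev = lab
    return "".join(out)


model = [
    (((0, 0), (1, 0), (2, 0)), "1", 1.0),  # vertical stroke
    (((0, 0), (0, 1), (0, 2)), "7", 0.8),  # horizontal bar
]
black = frozenset({(0, 0), (1, 0), (2, 0), (0, 2), (0, 3), (0, 4), (1, 4), (2, 3)})
labels = best_per_column(score_map(model, black, width=5))
print(labels)            # ['1', None, '7', None, None]
print(collapse(labels))  # 17
```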
37. An equivalent model
The OCR == this network:
input layer
linear hidden layer (feature layer), without zero penalty
threshold layers (no biases for the first layer;
avoids the 2*binary - 1 effect)
softmax layer
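To illustrate the equivalence, here is the pattern matcher written as network layers. This is a sketch assuming a flattened 0/1 image and detectors at fixed positions (a translation-invariant version would repeat each detector at every offset, like a convolution). The linear unit has no bias and weight 1 only on the pattern's pixels, so white pixels cost nothing; the threshold fires only when all N pixels are black; a softmax turns the per-digit score sums into probabilities. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def linear_feature(x: Sequence[int], pixels: Sequence[int]) -> int:
    """Linear unit without bias: weight 1 on the pattern's pixels, 0 elsewhere.

    White pixels (0) add nothing, so they are never penalized."""
    return sum(x[i] for i in pixels)


def threshold(feature: int, n: int) -> int:
    """Fires only when all n pattern pixels are black."""
    return 1 if feature >= n else 0


def softmax(z: List[float]) -> List[float]:
    m = max(z)                       # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def classify(x: Sequence[int], detectors: List[Tuple[List[int], int, float]],
             num_classes: int) -> List[float]:
    """detectors: (pixel indices, class index, score) triples."""
    logits = [0.0] * num_classes
    for pixels, cls, s in detectors:
        logits[cls] += s * threshold(linear_feature(x, pixels), len(pixels))
    return softmax(logits)


x = [1, 1, 0, 1]
detectors = [([0, 1], 0, 2.0), ([2, 3], 1, 2.0)]
probs = classify(x, detectors, num_classes=2)
print([round(p, 3) for p in probs])  # [0.881, 0.119]
```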
38. Hacking the OCR:
To negate the effect of the biases, we add random
noise to the white areas of each image
This greatly improves recognition on noisy
images
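A minimal sketch of the noise step, assuming independent noise: each white pixel is flipped to black with a hypothetical probability p, and black pixels are left untouched. The slides do not give the noise distribution or rate.

```python
import random
from typing import List


def add_white_noise(image: List[List[int]], p: float, rng: random.Random) -> List[List[int]]:
    """Return a copy where each white pixel (0) turns black (1) with probability p.

    Black pixels are never changed, and the input image is not modified."""
    return [[1 if px == 0 and rng.random() < p else px for px in row] for row in image]


rng = random.Random(0)
image = [[1, 0, 0], [0, 1, 0]]
noisy = add_white_noise(image, p=0.3, rng=rng)
```

Because only white pixels change, every pattern that matched the clean image still matches the noisy one.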
39. A more powerful model
input layer
Hacked OCR layer
Score map
output layer