End-to-End Text Recognition with
Convolutional Neural Networks
Tao Wang*, David J. Wu*, Adam Coates, Andrew Y. Ng
Computer Science Department
Stanford University
* Denotes equal contribution
Tao Wang 2
Scene Text Recognition Overview
• Text “in the wild” is hard to recognize
• Wide range of variations in backgrounds,
textures, fonts, and lighting conditions
Street View Text Dataset
K.Wang et al., 2011
ICDAR 2003 Dataset
S. Lucas et al., 2003
Tao Wang 3
Detection/Classification High-level Inference
“HOTEL”
Two-Stage Framework
Tao Wang 4
Works                     Classification and detection    High-level inference
Neumann and Matas, 2012   MSER + SVM with RBF Kernel      Exhaustive Graph Search
Mishra et al., 2012       HOG + SVM with RBF Kernel       CRF + N-gram model
K. Wang et al., 2011      HOG + Random Ferns              Pictorial Structure
Weinman et al., 2008      Appearance + Geometry           Semi-Markov CRF
Tao Wang 5
                        Classification and detection                        High-level inference
Most other approaches   Hand-designed features + off-the-shelf classifier   Graph-based inference models
Our approach            Learnt features + 2-layer CNN                       Simple off-the-shelf heuristics
Tao Wang 6
Various Benchmarks
Detection/Classification → End-to-end system after high-level inference
• ICDAR 62-way cropped character classification: SOTA
• ICDAR and SVT cropped word recognition (lexicon): SOTA on ICDAR
• ICDAR and SVT end-to-end text recognition: SOTA
Tao Wang 7
Unsupervised Feature Learning
Contrast Normalization + ZCA whitening
K-Means
Coates et al., 2011
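The pipeline on this slide (contrast normalization, ZCA whitening, K-means) can be sketched as below. This is a minimal NumPy illustration, not the authors' code: the function names, the regularization constants, and the dot-product K-means variant with unit-length centroids are all assumptions made for the example.

```python
import numpy as np

def contrast_normalize(patches, eps=10.0):
    # Per patch: subtract the mean and divide by the (regularized) std.
    # eps is an assumed constant, not taken from the slides.
    mean = patches.mean(axis=1, keepdims=True)
    var = patches.var(axis=1, keepdims=True)
    return (patches - mean) / np.sqrt(var + eps)

def zca_whiten(patches, eps=0.1):
    # Rotate into the eigenbasis of the covariance, rescale, rotate back.
    mu = patches.mean(axis=0)
    cov = np.cov(patches - mu, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return (patches - mu) @ W, mu, W

def kmeans_filters(X, k, iters=10, seed=0):
    # K-means with unit-length centroids; each row of D is one learnt filter.
    rng = np.random.default_rng(seed)
    D = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        D /= np.linalg.norm(D, axis=1, keepdims=True) + 1e-8
        assign = np.argmax(X @ D.T, axis=1)   # most similar centroid per patch
        for j in range(k):
            members = X[assign == j]
            if len(members):
                D[j] = members.sum(axis=0)    # empty clusters keep their centroid
    D /= np.linalg.norm(D, axis=1, keepdims=True) + 1e-8
    return D
```

Usage, with random data standing in for image patches: `Xw, mu, W = zca_whiten(contrast_normalize(patches))`, then `D = kmeans_filters(Xw, k=96)`.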
Tao Wang 8
1st layer (96): Convolution → Spatial Pooling
2nd layer (256): Convolution → Spatial Pooling
L2-SVM Classifier: √ Text × Non-Text
Backpropagation
Large representation but not enough data. Overfitting?
~10K parameters for detection
~50K parameters for classification
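A rough sketch of the two-layer forward pass and the L2-SVM objective named on this slide. The 96 and 256 map counts come from the slide; the 32x32 input, the filter and pooling sizes, the rectifier nonlinearity, and all function names are assumptions chosen only to make the example concrete.

```python
import numpy as np

def convolve_valid(x, filters):
    # x: (C, H, W) input maps; filters: (K, C, h, w).
    # "Valid" sliding-window responses followed by a rectifier
    # (the rectifier is an assumed nonlinearity).
    K, C, h, w = filters.shape
    _, H, W = x.shape
    out = np.empty((K, H - h + 1, W - w + 1))
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            window = x[:, i:i + h, j:j + w]
            out[:, i, j] = np.tensordot(filters, window, axes=3)
    return np.maximum(out, 0.0)

def average_pool(maps, size):
    # Non-overlapping spatial average pooling over size x size blocks.
    K, H, W = maps.shape
    H2, W2 = H // size, W // size
    trimmed = maps[:, :H2 * size, :W2 * size]
    return trimmed.reshape(K, H2, size, W2, size).mean(axis=(2, 4))

def l2_svm_loss(w, b, features, labels, C=1.0):
    # Squared hinge loss with L2 regularization; labels are +1 / -1.
    margins = np.maximum(0.0, 1.0 - labels * (features @ w + b))
    return 0.5 * w @ w + C * np.sum(margins ** 2)
```

With a 1x32x32 input, 8x8 first-layer filters, and 5x5 pooling, the first layer yields 96x5x5 maps; 2x2 second-layer filters and 2x2 pooling then give a 256x2x2 representation that is flattened for the classifier.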
Tao Wang 9
Synthetic Data
Color Statistics
Synthetic “hard negatives”
Real Synthetic
Unrealistic Synthetic Data
Real Data
Java.Font + Natural backgrounds
Tao Wang 10
Detector Performance
Tao Wang 11
Text Line Bounding boxes
Candidate spaces
Tao Wang 12
Classifier Performance
62-way classification accuracy (%) on ICDAR cropped characters; higher is better
Yokobayashi et al., 2006: 81.4
Coates et al., 2011: 81.7
K. Wang et al., 2011: 64
Our approach: 83.9
Human (on ICDAR-Sample characters): 89
Tao Wang 13
Tao Wang 14
Char
Class
Sliding window position
Tao Wang 15
Word Recognition
Lexicon:
…
MAKE
SERIES
ESTATE
POKER
…
S E R I E S -5.45
7.82
-1.74
-9.02
max ∑
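The "max ∑" step can be read as scoring each lexicon word by its best left-to-right alignment with the character classifier responses and keeping the top-scoring word. Below is a simplified dynamic-programming sketch of that idea. The 26-letter alphabet, the score matrix layout (one row per character class, one column per sliding window position), and the function names are assumptions; the slide does not specify these details.

```python
import numpy as np

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # assumed: upper-case letters only

def alignment_score(word, M):
    # M[c, p]: classifier response for character class c at window position p.
    # Returns the maximum over strictly increasing positions of the summed
    # responses for the characters of `word`.
    n_pos = M.shape[1]
    rows = [ALPHABET.index(ch) for ch in word]
    # best[p]: best score of the prefix with its last character at position <= p
    best = np.maximum.accumulate(M[rows[0]])
    for r in rows[1:]:
        cur = np.full(n_pos, -np.inf)
        cur[1:] = best[:-1] + M[r, 1:]   # place this character at position p
        best = np.maximum.accumulate(cur)
    return best[-1]

def recognize(lexicon, M):
    # Pick the lexicon word with the highest alignment score.
    return max(lexicon, key=lambda word: alignment_score(word, M))
```

For example, `recognize(["MAKE", "SERIES", "ESTATE", "POKER"], M)` returns whichever of the four words aligns best with the responses in `M`.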
Tao Wang 16
Cropped Words Benchmarks
Cropped word recognition accuracy (%); higher is better

Method                 ICDAR-WD-50   ICDAR-WD-FULL   SVT-WD
K. Wang et al., 2011   76            62              57
Mishra et al., 2012    82            -               73
Our approach           90            84              70
Tao Wang 17
Candidate spaces generated by detector
S = \max_j M_{\mathrm{Seg}_j}
Tao Wang 18
Tao Wang 19
End-to-end text recognition results
End-to-end Benchmarks
F-score; higher is better

Method                 ICDAR-5   ICDAR-20   ICDAR-50   ICDAR-FULL   SVT
K. Wang et al., 2011   0.72      0.70       0.68       0.51         0.38
Our approach           0.76      0.74       0.72       0.67         0.46
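For reference, the F-score used in these benchmarks is conventionally the harmonic mean of precision and recall. The sketch below assumes that standard definition; the criteria for counting a detected word as correct are not stated on the slide, so the counts here are hypothetical inputs.

```python
def f_score(precision, recall):
    # Harmonic mean of precision and recall; 0 when both are 0.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def f_score_from_counts(true_pos, false_pos, false_neg):
    # precision = TP / (TP + FP), recall = TP / (TP + FN)
    detected = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / actual if actual else 0.0
    return f_score(precision, recall)
```

For instance, 3 correct words out of 4 predicted, with 6 words in the ground truth, gives precision 0.75 and recall 0.5.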
Tao Wang 20
Sample Output
Images from SVT
Tao Wang 21
Sample Output Images
from ICDAR-FULL
Tao Wang 22
m_n: “confidence margin”, computed from the maxima of the classifier confidences c_n
PEOSTEL, PEOST, POST, POS → Hunspell → Suggested Words (POSE, POST, PEOPLE, PISTOL, …) → LEXICON
Our F-score: 0.38
Neumann and Matas, 2010: 0.40
Tao Wang 23
• Learnt features + 2-layer CNN for character detection and classification
• Simple heuristics to build an end-to-end scene text recognition system
• State-of-the-art performance on
- ICDAR cropped character classification
- ICDAR cropped word recognition
- Lexicon based end-to-end recognition on ICDAR and SVT
• Extensible to more general lexicon with off-the-shelf spelling checker
Conclusion
Tao Wang 24

ICPR2012_slides.ppt
