End-to-End Text Recognition with
Convolutional Neural Networks
Tao Wang*, David J. Wu*, Adam Coates, Andrew Y. Ng
Computer Science Department
Stanford University
* Denotes equal contribution
Tao Wang 2
Scene Text Recognition Overview
• Text “in the wild” is hard to recognize
• Wide range of variations in backgrounds,
textures, fonts, and lighting conditions
Street View Text Dataset (K. Wang et al., 2011)
ICDAR 2003 Dataset (S. Lucas et al., 2003)
Two-Stage Framework
Stage 1: Detection/Classification
Stage 2: High-level Inference
Example output: “HOTEL”
Works | Classification and detection | High-level inference
Neumann and Matas, 2012 | MSER + SVM with RBF Kernel | Exhaustive Graph Search
Mishra et al., 2012 | HOG + SVM with RBF Kernel | CRF + N-gram model
K. Wang et al., 2011 | HOG + Random Ferns | Pictorial Structure
Weinman et al., 2008 | Appearance + Geometry | Semi-Markov CRF
Approach | Classification and detection | High-level inference
Most other approaches | Hand-designed features + off-the-shelf classifier | Graph-based inference models
Our approach | Learnt features + 2-layer CNN | Simple off-the-shelf heuristics
Various Benchmarks
Detection/Classification:
- ICDAR 62-way cropped character classification (SOTA)
End-to-end system after high-level inference (with lexicon):
- ICDAR and SVT cropped word recognition (SOTA on ICDAR)
- ICDAR and SVT end-to-end text recognition (SOTA)
Unsupervised Feature Learning
Contrast Normalization + ZCA whitening
K-Means
Coates et al., 2011
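The three steps named on this slide can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: the patch size (8x8, flattened to 64 values), the regularization constants `eps`, the iteration count, and the function names are all assumptions made for the example. The K-means variant keeps unit-norm centroids and assigns each patch to the centroid with the largest absolute dot product, in the spirit of Coates et al., 2011.

```python
import numpy as np

def contrast_normalize(patches, eps=10.0):
    """Per-patch normalization: subtract the patch mean, divide by its std."""
    centered = patches - patches.mean(axis=1, keepdims=True)
    return centered / np.sqrt(centered.var(axis=1, keepdims=True) + eps)

def zca_whiten(patches, eps=0.1):
    """ZCA whitening; returns whitened patches, the data mean, and the whitener."""
    mean = patches.mean(axis=0)
    centered = patches - mean
    cov = centered.T @ centered / centered.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Rotate into the eigenbasis, rescale, and rotate back (the "ZCA" part).
    whitener = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return centered @ whitener, mean, whitener

def learn_filters(patches, n_filters, n_iters=10, seed=0):
    """K-means with unit-norm centroids; each centroid becomes one filter."""
    rng = np.random.default_rng(seed)
    centroids = rng.standard_normal((n_filters, patches.shape[1]))
    centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)
    rows = np.arange(patches.shape[0])
    for _ in range(n_iters):
        sims = patches @ centroids.T              # (n_patches, n_filters)
        assign = np.abs(sims).argmax(axis=1)      # closest centroid per patch
        weights = sims[rows, assign]              # signed projection onto it
        update = np.zeros_like(centroids)
        np.add.at(update, assign, weights[:, None] * patches)
        update += centroids                       # keeps empty clusters alive
        norms = np.linalg.norm(update, axis=1, keepdims=True)
        centroids = update / np.maximum(norms, 1e-8)
    return centroids
```

In use, the patches would be sampled from training images, passed through `contrast_normalize` and `zca_whiten`, and the rows returned by `learn_filters` reshaped into first-layer filters.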
1st layer (96 filters): Convolution, then Spatial Pooling
2nd layer (256 filters): Convolution, then Spatial Pooling
L2-SVM Classifier: √ Text / × Non-Text
Training: Backpropagation
Large representation but not enough data. Overfitting?
~10K parameters for detection
~50K parameters for classification
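A forward pass through two convolution and spatial pooling stages might look like the sketch below. Everything numeric here is an assumption for illustration: the 32x32 input, the 8x8 and 2x2 filter sizes, the pooling sizes, average pooling itself, and the thresholded absolute-value activation with `alpha`. The slide only fixes the layer order, and the function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def convolve(maps, filters):
    """Valid correlation: maps (C, H, W) with filters (K, C, h, w) -> (K, H-h+1, W-w+1)."""
    h, w = filters.shape[2:]
    windows = sliding_window_view(maps, (h, w), axis=(1, 2))  # (C, H', W', h, w)
    return np.einsum("cyxhw,kchw->kyx", windows, filters)

def activate(responses, alpha=0.5):
    """Assumed nonlinearity: thresholded absolute value."""
    return np.maximum(0.0, np.abs(responses) - alpha)

def average_pool(maps, size):
    """Non-overlapping average pooling over size x size blocks."""
    k, h, w = maps.shape
    h2, w2 = h // size, w // size
    trimmed = maps[:, :h2 * size, :w2 * size]
    return trimmed.reshape(k, h2, size, w2, size).mean(axis=(2, 4))

def extract_features(image, filters1, filters2, pool1=5, pool2=2, alpha=0.5):
    """Two conv + pooling stages; the flat output would feed a linear classifier."""
    x = image[None, :, :]                                   # add a channel axis
    x = average_pool(activate(convolve(x, filters1), alpha), pool1)
    x = average_pool(activate(convolve(x, filters2), alpha), pool2)
    return x.ravel()
```

With a 32x32 input, 8x8 first-layer filters, and 2x2 second-layer filters, the output has `4 * K2` values, where `K2` is the number of second-layer filters.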
Synthetic Data
Color Statistics
Synthetic “hard negatives”
Real Synthetic
Unrealistic Synthetic Data
Real Data
Java.Font + Natural backgrounds
Detector Performance
Text Line Bounding boxes
Candidate spaces
Classifier Performance
62-way classification accuracy on ICDAR cropped characters, Accuracy (%), higher is better
Yokobayashi et al., 2006 | 81.4
Coates et al., 2011 | 81.7
K. Wang et al., 2011 | 64
Our Approach | 83.9
Human | 89
Chart annotation: (on ICDAR-Sample characters)
[Figure: character class responses plotted against sliding window position]
Word Recognition
Lexicon: …, MAKE, SERIES, ESTATE, POKER, …
Example alignment: S E R I E S
Scores shown: -5.45, 7.82, -1.74, -9.02
Selection rule: max ∑ (take the maximum summed score)
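One way to read the max ∑ rule is as a dynamic program: for each lexicon word, pick strictly increasing sliding window positions that maximize the sum of that word's character responses, then keep the best-scoring word. The sketch below is a simplified illustration under that assumption, not the authors' algorithm. The 26-letter alphabet, the `responses` matrix layout (character class by window position), and the function names are hypothetical.

```python
import numpy as np

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # assumed alphabet for this example

def alignment_score(word, responses):
    """Best summed response over strictly increasing window positions.

    responses[c, p] is the classifier response for character class c
    at sliding window position p. The word must be non-empty.
    """
    n_pos = responses.shape[1]
    rows = [ALPHABET.index(ch) for ch in word]
    best = responses[rows[0]].astype(float).copy()
    for row in rows[1:]:
        # prev_max[p] = best prefix score ending strictly before position p
        prev_max = np.full(n_pos, -np.inf)
        prev_max[1:] = np.maximum.accumulate(best)[:-1]
        best = prev_max + responses[row]
    return float(best.max())

def recognize(responses, lexicon):
    """Score every lexicon word and return the winner plus all scores."""
    scores = {word: alignment_score(word, responses) for word in lexicon}
    return max(scores, key=scores.get), scores
```

A word with more characters than window positions scores negative infinity, so it can never be selected. Spacing or geometric consistency between adjacent characters is not modeled here.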
Cropped Word Recognition Accuracy
Cropped Words Benchmarks, Accuracy (%), higher is better
Method | ICDAR-WD-50 | ICDAR-WD-FULL | SVT-WD
K. Wang et al., 2011 | 76 | 62 | 57
Mishra et al., 2012 | 82 | n/a | 73
Our approach | 90 | 84 | 70
Candidate spaces generated by detector
[Equation garbled in extraction: a max( · ) over index j involving M, Seg, and S]
End-to-end text recognition results
End-to-end Benchmarks, F-Score, higher is better
Method | ICDAR-5 | ICDAR-20 | ICDAR-50 | ICDAR-FULL | SVT
K. Wang et al., 2011 | 0.72 | 0.70 | 0.68 | 0.51 | 0.38
Our approach | 0.76 | 0.74 | 0.72 | 0.67 | 0.46
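For reference, an F-score of this kind is commonly computed as the harmonic mean of precision and recall over matched words. The helper below is a generic sketch under that assumption. The function name and the counts in the usage note are hypothetical, and the benchmark's exact matching criteria are not stated on the slide.

```python
def f_score(n_correct, n_predicted, n_ground_truth):
    """Harmonic mean of precision and recall from raw counts."""
    precision = n_correct / n_predicted if n_predicted else 0.0
    recall = n_correct / n_ground_truth if n_ground_truth else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)
```

For example, 6 correct words out of 8 predictions against 10 ground-truth words gives precision 0.75 and recall 0.6.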
Sample Output Images from SVT
Sample Output Images from ICDAR-FULL
[Equation garbled in extraction: definition of the “confidence margin” c from max( · ) terms indexed by n and m]
Candidate strings: PEOSTEL, PEOST, POST, POS
Hunspell suggested words, used as the lexicon: POSE, POST, PEOPLE, PISTOL, …
Our F-score: 0.38
Neumann and Matas, 2010: 0.40
Conclusion
• Learnt features + 2-layer CNN for character detection and classification
• Simple heuristics to build an end-to-end scene text recognition system
• State-of-the-art performance on
- ICDAR cropped character classification
- ICDAR cropped word recognition
- Lexicon-based end-to-end recognition on ICDAR and SVT
• Extensible to a more general lexicon with an off-the-shelf spelling checker
