IIIT
Hyderabad
Optical Character Recognition as
Sequence Mapping
Devendra Kumar Sahu
Centre for Visual Information and Technology
IIIT Hyderabad
Advisor: Prof. C.V. Jawahar
Outline
Task and problem motivation
Unsupervised Feature Learning for Printed
Text
Sequence to Sequence Learning
Extensions and Future Work
Task
Segmentation-free word prediction
Detection and isolated character classification are the performance bottleneck
Segmentation / alignment data is missing
Problem 1 motivation
•Learning representations from data
•Fast adaptation to new scripts
•Data-dependent representations
•Learn global structures in data, such as partial characters
•Less time and effort than designing hand-engineered features
•Use models which don't need aligned data (RBM, RNN)
Problem 2 motivation
•Segmentation-free sequence prediction using a recurrent encoder-decoder framework
•Learning a compact fixed-dimensional representation
•Standard RNNs don't have a fixed-dimensional representation
•A fixed-dimensional representation enables fast retrieval with approximate nearest neighbors
(Figure: a recurrent encoder compresses the input sequence into a fixed vector z; a recurrent decoder expands z into the output sequence.)
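The retrieval claim can be made concrete: once every word image is encoded as one fixed-dimensional vector z, retrieval reduces to nearest-neighbor search over those vectors. A minimal exact-search sketch (the toy 2-D vectors are illustrative only; an approximate-NN index would replace the linear scan at scale):

```python
import numpy as np

def nearest_neighbors(query, database, k=5):
    """Indices of the k database vectors closest to `query` under
    cosine similarity. Exact linear scan; approximate-NN libraries
    give sub-linear queries at a small cost in accuracy."""
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    sims = db @ q                      # one similarity per database vector
    return np.argsort(-sims)[:k]

# toy 2-D "fixed-dimensional representations" of four word images
db = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
print(nearest_neighbors(np.array([1.0, 0.05]), db, k=2))  # indices 0 and 1
```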
Plan
Task and problem motivation
Unsupervised Feature Learning for Printed
Text
Sequence to Sequence Learning
Extensions and Future Work
Related Work
Feature learning: linear, non-linear, hierarchical
•Turk et al. Eigenfaces for recognition
•Belhumeur et al. Eigenfaces vs. Fisherfaces
•Khambhatla et al. Dimension reduction with local PCA
•Yang et al. Face recognition using kernel eigenfaces
•S. Chandra et al. Learning Multiple Non-Linear Sub-Spaces using K-RBMs
•Gary B. Huang et al. Learning Hierarchical Representations for Face Verification with Convolutional Deep Belief Networks
Related Work
Optical Character Recognition
Y. N. Hammerla et al. Towards Feature Learning for
HMM-based Offline Handwriting Recognition
Breuel et al. High-Performance OCR for Printed
English and Fraktur using LSTM Networks.
P. Krishnan et al. Towards a robust OCR system for
Indic scripts
OCR for Indic Scripts
N. Sankaran et al.
•Segmentation free
•Hand-engineered features such as profiles
•LSTM-based sequence transcription
Goal
Learn features from data in unsupervised
setting
Design goals
Investigate the possibility of learning features from data
Demonstrate limitations of hand-engineered profile features
Extend profile features with deep learning (deep profiles)
Use a combination of learned features and RNNs to perform optical character recognition.
Profile features
Profiles (Rath & Manmatha):
•Upper Profile (F1)
•Lower Profile (F2)
•Ink Transition Profile (F3)
•Projection Profile (F4)
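For reference, all four profiles are per-column statistics of a binarized word image (ink = 1). A hand-rolled sketch; the edge conventions (e.g. the value assigned to ink-free columns) are assumptions, not necessarily those of Rath & Manmatha:

```python
import numpy as np

def profile_features(img):
    """Per-column profile features of a binary word image (ink == 1).
    Columns with no ink get h (the image height) for the two
    positional profiles; that convention is an assumption."""
    h, w = img.shape
    has_ink = img.any(axis=0)
    first = np.where(has_ink, img.argmax(axis=0), h)               # F1: upper profile
    last = np.where(has_ink, h - 1 - img[::-1].argmax(axis=0), h)  # F2: lower profile
    transitions = (np.diff(img, axis=0) == 1).sum(axis=0)          # F3: 0->1 transitions per column
    projection = img.sum(axis=0)                                   # F4: ink count per column
    return np.stack([first, last, transitions, projection])

img = np.array([[0, 1, 0],
                [1, 1, 0],
                [1, 0, 0]])
print(profile_features(img))  # rows are F1, F2, F3, F4
```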
Deep Profile Features
Deep profiles:
•Natural extension of projection profiles: 2D convolution of a learnt network N_{h x ww} with the image I_{h x w}
•Learn projection profiles from data with dense coverage (learn many features F1, F2, ..., Fn)
•Each hidden unit is sensitive to a pattern
Projection profiles:
•Special case of deep projection profiles: F_{1 x w} is the 2D convolution of a 1_{h x 1} filter with I_{h x w}
Proposed Pipeline
•Binarization: Otsu thresholding
•Sliding window extraction: windows of height w_h and width w_w, step size s
•Feature learning from data: an L-layered stacked RBM learnt on the data from sliding-window extraction
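The sliding-window step can be sketched as follows; `win_w` and `step` stand in for w_w and s, and each flattened strip becomes one training vector for the stacked RBM (an illustrative sketch, not the thesis code):

```python
import numpy as np

def sliding_windows(img, win_w, step):
    """Cut a (height x width) word image into overlapping vertical
    strips of width win_w, stepping `step` columns at a time, and
    flatten each strip into a vector. Window height is the full
    image height; trailing columns that don't fill a window drop."""
    h, w = img.shape
    starts = range(0, w - win_w + 1, step)
    return np.stack([img[:, s:s + win_w].reshape(-1) for s in starts])

img = np.arange(4 * 10).reshape(4, 10)   # toy 4x10 "image"
X = sliding_windows(img, win_w=4, step=2)
print(X.shape)  # (4, 16): 4 windows, each flattened to h * win_w = 16 dims
```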
Proposed Pipeline
•Latent representation of sequence: each sequence Xi is projected to a sequence Zi with the stacked RBM learnt in the previous step
•RNN with CTC output layer: sequence Zi is mapped to predictions Yi; the CTC layer aligns predictions and ground truth so the error at the output layer can be computed
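At test time, the CTC layer's standard best-path decoding rule (merge repeated labels, then drop blanks) is simple enough to sketch:

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse a per-frame best-path labelling into an output label
    sequence: merge runs of repeated labels, then drop blanks -- the
    standard CTC best-path decoding rule."""
    out, prev = [], None
    for label in frame_labels:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out

# frames: blank, a, a, blank, b, b  ->  [a, b]
print(ctc_greedy_decode([0, 1, 1, 0, 2, 2]))  # [1, 2]
# a blank between repeats keeps both: a, blank, a -> [a, a]
print(ctc_greedy_decode([1, 0, 1]))           # [1, 1]
```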
Visualization
Linear combination of previous units
Sampling:
•Build a Deep Belief Network (DBN) with j layers
•Clamp h_ij = 1 and run a Gibbs chain for k steps on layers j and j-1
•Perform ancestral top-down sampling from layer j-1 to the input layer
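The clamped Gibbs chain can be sketched on a single binary RBM layer (random toy weights here, purely illustrative; a real run would use the trained DBN's weights followed by the top-down ancestral pass):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, b_h, b_v, clamp=None):
    """One Gibbs sweep on a binary RBM: sample h given v, then v
    given h. `clamp` fixes one hidden unit to 1, as in the
    visualization procedure above. Shapes: v (n_v,), W (n_v, n_h)."""
    h = (rng.random(b_h.shape) < sigmoid(v @ W + b_h)).astype(float)
    if clamp is not None:
        h[clamp] = 1.0
    v = (rng.random(b_v.shape) < sigmoid(h @ W.T + b_v)).astype(float)
    return v, h

# toy RBM: 6 visible, 3 hidden units, small random weights
W = rng.normal(scale=0.1, size=(6, 3))
v = rng.integers(0, 2, size=6).astype(float)
for _ in range(50):                      # k Gibbs steps with unit 0 clamped
    v, h = gibbs_step(v, W, np.zeros(3), np.zeros(6), clamp=0)
# v is now a sample of inputs the clamped hidden unit is sensitive to
```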
Visualization Results
(Figures: filters visualized by sampling and by linear combination of previous units.)
Experiments
Performance measures:
Label Error Rate: edit distance normalized by ground-truth length.
Sequence Error Rate: % of samples incorrectly classified.
Datasets:

Language    Number of Words
English     295K
Kannada     171K
Malayalam   65K
Marathi     135K
Telugu      137K
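The Label Error Rate is just Levenshtein edit distance divided by ground-truth length; a straightforward dynamic-programming sketch:

```python
def label_error_rate(pred, truth):
    """Levenshtein edit distance between predicted and ground-truth
    label sequences, normalized by the ground-truth length."""
    m, n = len(pred), len(truth)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                      # deletions only
    for j in range(n + 1):
        d[0][j] = j                      # insertions only
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if pred[i - 1] == truth[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[m][n] / n

print(label_error_rate("kitten", "sitting"))  # 3 edits / 7 characters
```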
Significance testing of performance gain
Significance level = 0.05
Populations generated by training on 1.0, 0.9, 0.8, 0.6 of the training data.
Results are statistically significant:

Language    Mean Gain (%)   Std. Dev. (%)   T-statistic   P-value
English     0.75            0.0603          25.05         1.39e-04
Kannada     2.62            0.4340          12.08         1.20e-03
Malayalam   0.80            0.1433          11.16         1.50e-03
Marathi     2.98            0.5289          11.27         1.50e-03
Telugu      3.28            0.4542          14.45         7.17e-04
Results
(Figure: histogram of number of sequences per label-error bin, comparing Profile and RBM features.)
Most errors are due to a 1-2 character mismatch.
Convergence Results
(Figures: convergence plots for Kannada, Malayalam, Marathi, and Telugu.)
Plan
Task and problem motivation
Unsupervised Feature Learning for Printed
Text
Sequence to Sequence Learning
Extensions and Future Work
Related Work
Caption Generation (Vinyals et al.)
Learning to execute (Zaremba et al.)
Language translation (Sutskever et al.)
Neural Conversational Model (Vinyals et al.)
Goal
Optical Character Recognition task in recurrent
encoder-decoder framework
Design goals
•Learn a compact fixed-dimensional representation from word images as sequences, for recognition
•Investigate its usefulness in a retrieval setting
•What kind of structure do the learnt representations have?
(Figure: recurrent encoder-decoder with intermediate fixed-dimensional representation z.)
Sequence to Sequence architecture
Dataset
Annotated books from DLI
295K annotated English word images from 7 books
60% training, 20% validation and remaining 20% for
testing
Results
Model                  Label Error (%)
ABBYY                  1.84
TESSERACT              35.80
TESSERACT              16.95
RNN Encoder-Decoder    35.57
LSTM-CTC               0.84
LSTM Encoder-Decoder   0.84
Feature            mAP-100
h1-h2              0.7239
c1-c2              0.8548
h1-h2-c1-c2 (L1)   0.8078
h1-h2-c1-c2 (L2)   0.7834
h1-h2-c1-c2        0.8545
Results

Features             Dim    mAP-100          mAP-5000
BOW                  400    0.5503           0.33
BOW                  2000   0.6321           -
Augmented Profiles   247    0.7371           0.6189
LSTM-Encoder         400    0.7402 (h1-h2),  0.8521
                            0.8521 (c1-c2)
OCR-TESSERACT        -      0.6594           0.7095
OCR-ABBYY            -      0.8583           0.872
Results
Limitations of sequence-to-sequence architecture
(Figure: a) sequence-to-sequence learning, b) with soft attention.)
Plan
Task and problem motivation
Unsupervised Feature Learning
Sequence to Sequence Learning
Extensions and Future Work
Future Directions
Representation learning for OCRs using
recurrent generative models.
Sequence to Sequence Learning with attention
for OCRs
Efficient semantic representation of sentences
in fixed dimension using hierarchy of recurrent
networks
Multi-task recurrent networks for OCRs
Conclusion
Deep profiles are better suited than hand-engineered profiles for representation learning in OCR.
Sequence-to-sequence learning performs well in recognition, and the learnt compact features can be used for efficient retrieval.
Publication
Devendra Kumar Sahu and C. V. Jawahar. "Unsupervised Feature Learning for Optical Character Recognition." 13th IAPR International Conference on Document Analysis and Recognition (ICDAR).
Questions??
Thanks!!