Contribution of recurrent connectionist language models in
improving LSTM-based Arabic text recognition in videos
Authors: Sonia Yousfi, Sid-Ahmed Berrani, Christophe Garcia
Source: Pattern Recognition (Elsevier), Volume 64, April 2017, Pages 245-254
Speaker: 李怡芬 (1710731003)
OUTLINE
• Abstract
• Related work
• Proposed methodology
• Experimental setup
• Results
Abstract – Motivation
• Automatically recognizing texts embedded in videos can eliminate a large part of the manual video
annotation effort.
• Very few works have addressed the problem of Arabic video OCR, even though many major Arabic
news channels have appeared in the last two decades and more than half a billion people worldwide
use the Arabic language.
Abstract – Contribution
• Different recurrent connectionist language models to improve LSTM-based Arabic
text recognition in videos.
• An efficient joint decoding paradigm using language model and LSTM responses.
• Additional decoding hyper-parameters, extensively evaluated, that improve
recognition results and optimize running time.
• A significant recognition improvement by integrating connectionist language
models, which outperform the contribution of n-grams.
• A final Arabic OCR system that significantly outperforms a commercial OCR engine.
Related work
• Language models
• N-grams were considered the state-of-the-art LMs for many years.
• Their most important drawback is their inability to represent long-context patterns.
• Even with a massive amount of training data, a large part of the patterns cannot be
effectively represented and discovered during training.
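To make this limitation concrete, here is a minimal character-level n-gram counter (an illustrative sketch, not code from the paper): with maximum-likelihood estimation, any context unseen in training gets probability zero, which is why longer orders need smoothing and still generalize poorly.

```python
from collections import defaultdict

def train_char_ngrams(text, order=3):
    """Count character n-grams; a context is the (order-1) preceding characters."""
    counts = defaultdict(lambda: defaultdict(int))
    padded = " " * (order - 1) + text
    for i in range(order - 1, len(padded)):
        context = padded[i - order + 1:i]
        counts[context][padded[i]] += 1
    return counts

def prob(counts, context, char):
    """Maximum-likelihood P(char | context); 0 for unseen contexts."""
    total = sum(counts[context].values())
    return counts[context][char] / total if total else 0.0

counts = train_char_ngrams("the cat sat on the mat", order=3)
print(prob(counts, "th", "e"))  # high: 'e' always follows 'th' in this toy corpus
print(prob(counts, "zz", "a"))  # 0.0: unseen context, the core n-gram weakness
```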
Related work
• Neural networks
• NN-based LMs were introduced more than a decade ago by Elman and by Bengio et al., and have been
successfully used for automatic speech recognition as well as for offline handwritten text recognition (HTR).
• The main drawback of these models remains their high computational complexity.
• The strength of an RNN-based LM lies in its representation of the history:
• Unlike the previously mentioned models, context patterns are learned from data.
• The history is represented recurrently by the hidden layer of the network.
• RNN language modelers can therefore handle arbitrarily long contexts.
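As an illustration of this recurrent representation of the history, here is a minimal Elman-style character LM forward pass in NumPy (a sketch with illustrative sizes and initialization, not the paper's model): the single hidden vector h summarizes the entire input so far, so the usable context length is not bounded by design.

```python
import numpy as np

class ElmanCharLM:
    """Minimal Elman-style RNN LM: the hidden state summarizes the full history."""
    def __init__(self, vocab_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        self.Wxh = rng.uniform(-0.1, 0.1, (hidden_size, vocab_size))
        self.Whh = rng.uniform(-0.1, 0.1, (hidden_size, hidden_size))
        self.Why = rng.uniform(-0.1, 0.1, (vocab_size, hidden_size))
        self.h = np.zeros(hidden_size)

    def step(self, char_id):
        x = np.zeros(self.Wxh.shape[1])
        x[char_id] = 1.0                                    # one-hot input character
        self.h = np.tanh(self.Wxh @ x + self.Whh @ self.h)  # recurrence carries history
        logits = self.Why @ self.h
        e = np.exp(logits - logits.max())
        return e / e.sum()          # P(next char | entire history seen so far)
```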
Proposed Methodology
In this work, the authors focus on two main factors to reach better improvements:
(1) What type of language model to choose
(2) How to integrate it into the decoding schema
Proposed Methodology
1. Arabic OCR system
2. Language modeling
   • RNN-based language modeling
   • RNNME: joint learning of RNN and ME LMs
3. Decoding schema
Proposed Methodology
Arabic OCR system
• It takes as input a text image, without any pre-processing or prior segmentation.
• It transforms Arabic text images into sequences of relevant learned features.
• Text transcription is then performed using the BLSTM-CTC schema.
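A minimal sketch of such a BLSTM-CTC transcription component, written in PyTorch with placeholder dimensions (the paper's exact layer sizes and character inventory are not reproduced here):

```python
import torch
import torch.nn as nn

class BLSTMCTC(nn.Module):
    """Bidirectional LSTM over a feature sequence, trained with CTC loss."""
    def __init__(self, feat_dim, hidden, n_classes):  # n_classes includes the CTC blank
        super().__init__()
        self.blstm = nn.LSTM(feat_dim, hidden, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, feats):                 # feats: (batch, time, feat_dim)
        out, _ = self.blstm(feats)
        return self.fc(out).log_softmax(-1)   # per-frame class log-probabilities

model = BLSTMCTC(feat_dim=128, hidden=100, n_classes=60)   # placeholder sizes
ctc = nn.CTCLoss(blank=0)
feats = torch.randn(2, 50, 128)                 # dummy feature sequences
log_probs = model(feats).permute(1, 0, 2)       # CTCLoss expects (time, batch, classes)
targets = torch.randint(1, 60, (2, 10))
loss = ctc(log_probs, targets,
           input_lengths=torch.full((2,), 50, dtype=torch.long),
           target_lengths=torch.full((2,), 10, dtype=torch.long))
```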
Proposed Methodology
Arabic OCR system – Feature extraction
• A multi-scale scanning of the input text image is applied, using 4 sliding windows with different
aspect ratios.
• Each window is then transformed into a set of learned features using a Convolutional Neural
Network (ConvNet).
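A rough sketch of this multi-scale scanning step (the aspect ratios and step size are assumptions for illustration; the paper ties each window to the text-line geometry):

```python
import numpy as np

def multiscale_windows(image, aspect_ratios=(0.5, 1.0, 1.5, 2.0), step=4):
    """Slide 4 windows of different aspect ratios along a text-line image.

    Each crop would then be fed to a ConvNet to produce one feature vector
    per position; the sequence of vectors feeds the BLSTM-CTC component.
    """
    h, w = image.shape[:2]
    crops = []
    for ar in aspect_ratios:
        win_w = max(1, int(h * ar))          # window width tied to line height
        for x in range(0, w - win_w + 1, step):
            crops.append(image[:, x:x + win_w])
    return crops
```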
Proposed Methodology
Language modeling
• RNN-based language modeling
• The network is trained using truncated BPTT to avoid two extreme cases during training:
• Unfolding the network over the whole history leads to high computational complexity and
runs into the vanishing gradient problem.
• Unfolding over a single step does not give the network the opportunity to store relevant
context information that can be useful in the future.
• RNNME: joint learning of RNN and ME LMs
• In maximum entropy (ME) models, features are usually hand-designed.
• Weight setup of the joint model.
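Truncated BPTT sits between these two extremes by backpropagating through only the last k steps while still carrying the hidden state forward. A hedged PyTorch-style sketch, assuming a model whose forward returns (logits, hidden):

```python
import torch

def truncated_bptt(model, optimizer, loss_fn, char_ids, targets, k=10):
    """Train an RNN LM with truncated BPTT: gradients flow through at most
    the last k steps; the hidden state is carried forward between chunks."""
    hidden = None
    for start in range(0, char_ids.size(1), k):
        x = char_ids[:, start:start + k]
        y = targets[:, start:start + k]
        logits, hidden = model(x, hidden)          # assumed model interface
        if isinstance(hidden, tuple):              # LSTM returns (h, c)
            hidden = tuple(h.detach() for h in hidden)
        else:
            hidden = hidden.detach()               # cut the gradient chain here
        optimizer.zero_grad()
        loss_fn(logits.reshape(-1, logits.size(-1)), y.reshape(-1)).backward()
        optimizer.step()
```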
Proposed Methodology
Decoding schema
• Given the BLSTM-CTC outputs, the decoding stage aims at finding the most probable transcription.
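A simplified sketch of such a joint decoding, assuming per-frame character posteriors from the optical model and a callable lm(hypothesis, char) returning the LM probability (a real CTC decoder must additionally handle the blank label and merge repeated characters; all names and the single weight omega are illustrative):

```python
import math

def joint_beam_search(frame_log_probs, lm, omega=0.7, beam_width=20):
    """Simplified joint decoding: at every frame, each hypothesis is extended
    and rescored by (OCR log-prob + omega * LM log-prob); omega plays the
    role of the LM weight that the paper tunes."""
    beams = [("", 0.0)]                            # (hypothesis, joint score)
    for frame in frame_log_probs:                  # frame: {char: log P_ocr}
        candidates = []
        for hyp, score in beams:
            for char, ocr_lp in frame.items():
                lm_lp = math.log(max(lm(hyp, char), 1e-12))
                candidates.append((hyp + char, score + ocr_lp + omega * lm_lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]            # beam-width pruning
    return beams[0][0]
```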
Experimental setup
1. OCR system setup
2. Language models set-up
3. Primary results
4. Tuning decoding parameters
5. Final results
Experimental setup
• OCR system setup
• Goals:
• Study the contribution of language modeling in optical text recognition.
• Study the effect of its integration paradigm in the decoding stage.
Experimental setup
• OCR system setup
Dataset: ALIF dataset (composed of Arabic text images extracted from Arabic TV broadcasts).
BLSTM-CTC component: trained using 7,673 text images (the 4,152 examples of the ALIF_Train
subset, augmented to 7,673 examples by applying image processing operations such as color
inversion and blurring).
Algorithm: stochastic gradient descent; learning rate = 10^-4, momentum = 0.9, random initial
weights in [-0.1, 0.1].
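In PyTorch terms, this training setup would look roughly as follows (the LSTM sizes are placeholders; only the optimizer settings and the initialization range come from the slide above):

```python
import torch
import torch.nn as nn

def init_uniform(module):
    """Random initial weights drawn uniformly from [-0.1, 0.1]."""
    for p in module.parameters():
        nn.init.uniform_(p, -0.1, 0.1)

model = nn.LSTM(128, 100, bidirectional=True)   # placeholder sizes
init_uniform(model)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
```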
Experimental setup
• OCR system setup
ConvNet feature extraction: 20,571 character images were initially extracted from these images;
after applying scaling and color inversion operations, a set of 46,689 single-character images was
obtained to train and evaluate the ConvNet model.
BLSTM network training: all Arabic letter shapes have been considered; the OCR component
distinguishes the different shapes of a letter depending on its position in a word. For Arabic
language modeling and final text transcription, atomic Arabic letters are considered.
Experimental setup
• Language models set-up
• Goal: choose the language model.
Experimental setup
• Language models set-up – Building the language dataset
Dataset sources:
• Ajdir Corpora (Arabic newspapers)
• Watan-2004/Khaleej-2004 corpora (Arabic newspapers)
• Open Source Arabic Corpora (includes texts collected from the websites of Arabic TV channels
such as BBC and CNN)
To fit the context: the obtained text (dedicated to the LM) is cut into text lines with a limited
number of words.
Note: the dataset contains 52.08M characters in total.
Experimental setup
• Language models set-up – Building the language dataset
Other processing steps:
• Remove non-Arabic characters, digits, extra punctuation marks, and a large number of text
repetitions.
• Split text lines into individual characters.
• Replace the space between words with a specific label.
Experimental setup
• Language models set-up – Training methodology
Dataset split:
• Randomly selected text lines with 44.47M characters to train the LMs.
• The remaining text lines are split into two subsets: 4.29M characters for validation, and
3.32M characters (denoted TEXT_Test) for test.
Algorithm:
• Train different RNN and RNNME models using stochastic gradient descent, with an evaluation
of the entropy on the validation set.
• Learning rate = 0.1
Note: the entropy reflects how well the LM, as a probabilistic model, predicts samples.
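For reference, the entropy criterion is the standard per-character entropy; in my notation (not copied from the paper):

```latex
H = -\frac{1}{N} \sum_{i=1}^{N} \log_2 P(c_i \mid c_1, \dots, c_{i-1})
```

where c_i is the i-th character of the evaluation text and N its length; a lower H means the LM predicts the text better.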
Experimental setup
• Language models set-up – Training methodology
Training the n-gram language models:
• n-gram LMs are trained in parallel using the SRILM toolkit.
• The models are smoothed using Witten-Bell discounting, and the order is tuned on the
validation set.
• The best entropy results are obtained with a 7-gram LM.
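With SRILM, training such a character-level 7-gram model with Witten-Bell discounting looks roughly like this (file names are placeholders):

```
ngram-count -order 7 -wbdiscount -text train_chars.txt -lm char7.lm
```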
Experimental setup
• Language models set-up – Supplement
• For the tuning of the joint decoding: the WRR (Word Recognition Rate) criterion is used on a
separate development set, denoted DEV_Set (made up of 1,225 text images).
• Final OCR system test: text images from the ALIF_Test1 and ALIF_Test2 sets, containing 827
and 1,175 text images respectively.
Experimental setup
• Primary results
• The entropy of each LM in the OCR schema, measured on the TEXT_Test set, is presented in Fig. 5.
Experimental setup
• Primary results
• Decoding schemes are evaluated in terms of WRR on the DEV_Set text images.
• Models with lower entropy yield better recognition rates when integrated into the decoding.
• Connectionist LMs outperform n-grams both in terms of entropy and WRR.
Experimental setup
• Primary results
• Entropy and WRR during training (RNN-700 LM): the entropy reduction over training epochs
corresponds to a progressive improvement in WRR when the joint decoding is applied at each of
these epochs, which confirms once again the soundness of the proposed joint decoding schema.
Experimental setup
• Tuning decoding parameters
1. LM weights
2. Beam width
3. Score pruning
Experimental setup
• Tuning decoding parameters – LM weights
• Based on experimental results: (ω1, ω2) = (0.7, 0.55).
• The map clearly shows the effect of the LM weights on the recognition results.
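The role of an LM weight can be sketched with a generic joint score; this is my hedged notation (the paper tunes a pair (ω1, ω2) of such weights rather than a single one):

```latex
\mathrm{score}(h \cdot c) \;=\; \log P_{\mathrm{OCR}}(c) \;+\; \omega \, \log P_{\mathrm{LM}}(c \mid h)
```

where h is the current hypothesis, c the candidate character, and ω the LM weight.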
Experimental setup
• Tuning decoding parameters – Beam width
• To analyze the impact of the beam width on decoding results:
• Models: the two best previously performing models, RNN-700 and RNNME-300.
• Research points: the impact of the beam width in terms of WRR and of average processing time
per word.
• (a) Beam width vs. WRR: RNNME-300 performs better.
• (b) Beam width vs. average processing time.
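This kind of sweep can be reproduced with a small tuning loop around the decoder sketched earlier; everything here (the dataset interface, frame_probs_fn, and the crude word-match WRR) is a hypothetical sketch:

```python
import time

def tune_beam_width(dev_set, frame_probs_fn, lm, widths=(2, 5, 10, 20, 50)):
    """Sweep beam widths, recording WRR and average decoding time per image.

    dev_set: list of (image, ground_truth_word_list); joint_beam_search is
    the hypothetical decoder sketched earlier -- all names are placeholders.
    """
    for w in widths:
        correct = total = 0
        t0 = time.perf_counter()
        for image, truth in dev_set:
            hyp = joint_beam_search(frame_probs_fn(image), lm, beam_width=w)
            correct += sum(a == b for a, b in zip(hyp.split(), truth))
            total += len(truth)
        elapsed = time.perf_counter() - t0
        print(f"beam={w}: WRR={correct / total:.3f}, "
              f"time/image={elapsed / len(dev_set):.3f}s")
```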
Experimental setup
• Final results
• The improvement reaches almost 16 points with the BS-RNN-700 schema on both datasets.
• The character-level connectionist LMs still outperform the n-gram LM in terms of WRR.
• In terms of speed, BS-7-gram is faster than the other LM schemas.
Experimental setup
• Final results – Performance of the proposed text recognizer
• A well-known commercial OCR engine, ABBYY FineReader 12, was chosen for comparison.
• The Arabic OCR component of this engine was applied to the selected images of ALIF_Test1
(examples with digits and punctuation marks were excluded).
• The proposed methods largely outperform the ABBYY system, by more than 35 points in terms of WRR.
Experimental setup
• Final results – Other observations
• Finally, Fig. 10 illustrates some OCR system outputs corrected by the joint decoding using the
RNN-700 LM.
• The linguistic information is able to correct confusions between similar characters, such as
'Saad' and 'Ayn'.
Results – Models used in this paper
Language models:
• RNN-based: RNN-100, RNN-300, RNN-500, RNN-700
• RNNME-based: RNNME-100, RNNME-300, RNNME-500
• N-gram-based: 7-gram LM
• Total: 8
Tuning decoding parameters (RNN-700, RNNME-300):
• LM weights: (0.7, 0.55)
• Beam widths: (20, 6.5)
• Score pruning
• Resulting schemas: BS-RNN-700, BS-RNNME-300
Comparison with the commercial OCR engine:
• BS-NO-LM, BS-RNN-700, BS-RNN-300
Results
1. LMs are used for Arabic text recognition in videos.
2. RNNs are used to model long-range dependencies in the language; two types of character-level
language models are built: one based on RNNs, the other based on a joint learning of Maximum
Entropy and RNN models.
3. A modified version of beam search is used as the decoding schema.
4. Hyper-parameters are introduced into the decoding to reach a better trade-off between
recognition results and response time.
5. The whole paradigm is extensively evaluated on the public ALIF dataset.
6. The language models in this paper outperform the n-gram by more than 4 points in terms of WRR.
7. The contribution of these models and the impact of the decoding schema bring an improvement
of almost 16 points in WRR over the BLSTM-CTC OCR alone (BS-RNN-700 vs. BS-NO-LM).
8. Conclusion: the methods used in this paper achieve good recognition rates, outperforming the
commercial OCR engine by almost 36 points (BS-RNN-700 vs. ABBYY FineReader 12).