This document describes the RWTH-OCR handwriting recognition system for Arabic handwriting, developed by researchers at RWTH Aachen University. The system adapts the RWTH ASR speech recognition framework for handwriting recognition. It uses preprocessing-free feature extraction and focuses on modeling writing variants, characters, and context. Discriminative training techniques, such as a modified MMI criterion and unsupervised confidence-based training, are applied. Experimental results show the system achieves a character error rate of 6.49% on the IFN/ENIT database, outperforming previous systems.
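The character error rate (CER) quoted above is, in its conventional definition, the character-level Levenshtein (edit) distance between the recognized text and the reference, normalized by the reference length. A minimal sketch of that metric (the standard definition, not code from the system itself):

```python
def char_error_rate(reference: str, hypothesis: str) -> float:
    """Levenshtein distance between character sequences,
    normalized by the reference length (standard CER)."""
    m, n = len(reference), len(hypothesis)
    # prev[j] = edit distance between reference[:i-1] and hypothesis[:j]
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution
        prev = cur
    return prev[n] / m if m else 0.0
```

A CER of 6.49% thus means roughly 6.5 character edits per 100 reference characters.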
1. The RWTH-OCR Handwriting Recognition System for Arabic Handwriting
Philippe Dreuw, Georg Heigold, David Rybach,
Christian Gollan, and Hermann Ney
dreuw@cs.rwth-aachen.de
DAAD Workshop, Sousse, Tunisia – March 2010
Human Language Technology and Pattern Recognition
Lehrstuhl für Informatik 6
Computer Science Department
RWTH Aachen University, Germany
Dreuw et al.: RWTH-OCR, Sousse, Tunisia, March 2010
2. Outline
1. Introduction
2. Adaptation of the RWTH-ASR framework for Handwriting Recognition
I System Overview
I Discriminative training using modified MMI criterion
I Unsupervised confidence-based discriminative training during decoding
I Writer Adaptive Training
3. Experimental Results
4. Summary
3. Introduction
I Arabic handwriting system
. right-to-left, 28 characters, position-dependent character writing variants
. ligatures and diacritics
. Pieces of Arabic Word (PAWs) as subwords
(a) Ligatures (b) Diacritics
I state-of-the-art
. preprocessing (normalization, baseline estimation, etc.) + HMMs
I our approach:
. adaptation of RWTH-ASR framework for handwriting recognition
. preprocessing-free feature extraction, focus on modeling
4. RWTH ASR System: Overview
The RWTH Aachen University Open Source
Speech Recognition System [Rybach & Gollan+
09]
http://www-i6.informatik.rwth-aachen.de/rwth-asr/
I speech recognition framework supporting:
. acoustic training
including speaker adaptive training
. speaker normalization / adaptation:
VTLN, CMLLR, MLLR
. multi-pass decoding
I framework also used for machine translation,
video / image processing
I published under an open source licence (RWTH ASR Licence)
I commercial licences available on request
I more than 100 registrations to date
5. Arabic Handwriting - IFN/ENIT Database
Corpus development
I ICDAR 2005 Competition: a, b, c, d sets for training, evaluation on set e
I ICDAR 2007 Competition: ICDAR05 + e sets for training, evaluation on set f
I ICDAR 2009 Competition: ICDAR 2007 for training, evaluation on set f
6. Arabic Handwriting - IFN/ENIT Database
I 937 classes
I 32492 handwritten Arabic words (Tunisian city names)
I database is used by more than 60 groups all over the world
I writer statistics
set #writers #samples
a 102 6537
b 102 6710
c 103 6477
d 104 6735
e 505 6033
Total 916 32492
I examples (same word):
7. System Overview
Image Input → Feature Extraction → Global Search → Recognized Word Sequence
I Global Search: maximize over w_1 ... w_N:
Pr(w_1 ... w_N) · Pr(x_1 ... x_T | w_1 ... w_N)
I knowledge sources: Character Inventory, Writing Variants Lexicon, Language Model
8. Writing Variant Model Refinement
I HMM baseline system
. searching for an unknown word sequence w_1^N := w_1, ..., w_N
. unknown number of words N
. maximize the posterior probability p(w_1^N | x_1^T)
. described by Bayes' decision rule:
ŵ_1^N = argmax_{w_1^N} { p^γ(w_1^N) · p(x_1^T | w_1^N) }
with γ a scaling exponent of the language model.
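The decision rule above can be sketched as a toy log-space search. The hypothesis set, its scores, and the `decode` helper are illustrative assumptions, not part of the RWTH-OCR decoder:

```python
import math

def decode(hypotheses, lm_scale):
    """Bayes decision rule in log space: pick the word sequence w
    maximizing lm_scale * log p(w) + log p(x | w)."""
    return max(hypotheses,
               key=lambda w: lm_scale * hypotheses[w][0] + hypotheses[w][1])

# Toy example: two competing transcriptions with
# (language-model log-prob, visual-model log-prob) scores.
hyps = {
    ("tunis",):  (math.log(0.6), math.log(0.2)),
    ("sousse",): (math.log(0.4), math.log(0.5)),
}
print(decode(hyps, lm_scale=1.0))  # ('sousse',): the visual model dominates
print(decode(hyps, lm_scale=5.0))  # ('tunis',): heavy LM scaling flips it
```

This illustrates why the LM scaling exponent γ is tuned: it shifts the balance between the language model and the visual model.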
9. Writing Variant Model Refinement
I ligatures and diacritics in Arabic handwriting
. same Arabic word can be written in several writing variants
→ depends on writer’s handwriting style
I Example: laB khM vs. khMlaB
I lexicon with multiple writing variants [Details]
. problem: many and rare writing variants
10. Writing Variant Model Refinement
I probability p(v|w) for a variant v of a word w
. usually considered as equally distributed
. here: we use the count statistics as probability:
p(v|w) = N(v, w) / N(w)
I writing variant model refinement:
p(x_1^T | w_1^N) ≈ max_{v_1^N | w_1^N} { p^α(v_1^N | w_1^N) · p(x_1^T | v_1^N, w_1^N) }
with v_1^N a sequence of unknown writing variants and
α a scaling exponent of the writing variant probability
I training: corpus and lexicon with supervised writing variants possible!
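The count-based estimate p(v|w) = N(v,w)/N(w) can be sketched in a few lines; the toy (word, variant) corpus is made up for illustration:

```python
from collections import Counter

def variant_probs(training_pairs):
    """Estimate p(v|w) = N(v, w) / N(w) from (word, variant) pairs."""
    pair_counts = Counter(training_pairs)          # N(v, w)
    word_counts = Counter(w for w, _ in training_pairs)  # N(w)
    return {(w, v): n / word_counts[w] for (w, v), n in pair_counts.items()}

# Toy corpus: one word observed with two writing variants.
pairs = [("word", "v1"), ("word", "v1"), ("word", "v1"), ("word", "v2")]
p = variant_probs(pairs)
print(p[("word", "v1")])  # 0.75
print(p[("word", "v2")])  # 0.25
```

Replacing the uniform variant distribution with these relative frequencies is what lets frequent variants dominate rare ones during search.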
11. Visual Modeling: Feature Extraction and HMM Transitions
I recognition of characters within a context, temporal alignment necessary
I features: sliding window, no preprocessing, PCA reduction
I important: HMM whitespace models (a) and state-transition penalties (b)
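The preprocessing-free feature extraction can be sketched as follows. The window width, image size, and the plain SVD-based PCA are illustrative assumptions, not the system's actual parameters:

```python
import numpy as np

def sliding_window_features(image, win=7):
    """One feature vector per pixel column: concatenate the raw pixels
    of a win-column window centered on that column (zero-padded at
    the borders); no normalization or baseline estimation is applied."""
    h, w = image.shape
    pad = win // 2
    padded = np.pad(image, ((0, 0), (pad, pad)))
    return np.stack([padded[:, t:t + win].ravel() for t in range(w)])

def pca_reduce(feats, dim):
    """Project the frames onto the dim leading principal components."""
    centered = feats - feats.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dim].T

img = np.random.default_rng(0).random((16, 40))  # toy 16x40 "text line"
feats = sliding_window_features(img)             # 40 frames, 16*7 = 112 dims
reduced = pca_reduce(feats, 8)
print(feats.shape, reduced.shape)                # (40, 112) (40, 8)
```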
12. Visual Modeling: Writing Variants Lexicon
I most reported error rates are dependent on the number of PAWs
I without separate whitespace model
I always whitespaces between compound words
I whitespaces as writing variants between and within words
White-Space Models for Pieces of Arabic Words [Dreuw & Jonas+ 08] in ICPR 2008
13. Visual Modeling: Model Length Estimation
I more complex characters should be represented by more HMM states
I the number of states S_c for each character c is updated by
S_c = (N_{x,c} / N_c) · α
with
S_c = estimated number of states for character c
N_{x,c} = number of observations aligned to character c
N_c = character count of c seen in training
α = character length scaling factor.
[Visualization]
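The update rule can be sketched directly; the alignment statistics, the value of α, and the rounding/minimum are assumptions for illustration:

```python
def estimate_state_counts(alignment, alpha, min_states=1):
    """Model length estimation: S_c = (N_{x,c} / N_c) * alpha, where
    N_{x,c} is the number of frames aligned to character c and N_c is
    how often c occurred in training; alpha scales the average length."""
    return {c: max(min_states, round(n_frames / n_occ * alpha))
            for c, (n_frames, n_occ) in alignment.items()}

# Toy alignment stats: character -> (aligned frames, occurrences).
stats = {"alif": (400, 100), "seen": (1500, 100)}
print(estimate_state_counts(stats, alpha=1.0))  # {'alif': 4, 'seen': 15}
```

Longer (more complex) characters attract more frames per occurrence and therefore receive more HMM states.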
14. RWTH-OCR Training and Decoding Architectures
I Training
. Maximum Likelihood (ML)
. CMLLR-based Writer Adaptive Training (WAT)
. discriminative training using modified-MMI criterion (M-MMI)
I Decoding
. 1-pass
◦ ML model
◦ M-MMI model
. 2-pass
◦ segment clustering for CMLLR writer adaptation
◦ unsupervised confidence-based M-MMI training for model adaptation
15. Discriminative Training: Modified-MMI Criterion
I training: weighted accumulation of observations x_t:
acc_s = Σ_{r=1}^{R} Σ_{t=1}^{T_r} ω_{r,s,t} · x_t
1. ML: Maximum Likelihood
ω_{r,s,t} := 1.0
2. MMI: Maximum Mutual Information
ω_{r,s,t} := Σ_{s_1^{T_r}: s_t = s} p(x_1^{T_r} | s_1^{T_r}) · p(s_1^{T_r}) · p(W_r)
           / Σ_V Σ_{s_1^{T_r}: s_t = s} p(x_1^{T_r} | s_1^{T_r}) · p(s_1^{T_r}) · p(V)
I ω_{r,s,t} is the "(true) posterior" weight
I iteratively optimized with Rprop
16. Discriminative Training: Modified-MMI Criterion
I margin-based training for HMMs
. similar to SVM training, but simpler/faster within the RWTH-OCR framework
. M-MMI = differentiable approximation to the SVM optimization problem
(Plot: loss vs. distance d for the hinge, MMI, and modified MMI loss functions)
3. M-MMI:
ω_{r,s,t}(ρ ≠ 0) := Σ_{s_1^{T_r}: s_t = s} [ p(x_1^{T_r} | s_1^{T_r}) · p(s_1^{T_r}) · p(W_r) · e^{-ρ·δ(W_r,W_r)} ]^γ
                  / Σ_V Σ_{s_1^{T_r}: s_t = s} [ p(x_1^{T_r} | s_1^{T_r}) · p(s_1^{T_r}) · p(V) · e^{-ρ·δ(W_r,V)} ]^γ
I ω_{r,s,t} is the "margin posterior" weight
I e^{-ρ·δ(W_r,W_r)} corresponds to the margin offset
I for γ → ∞ it converges to the SVM hinge loss function
I iteratively optimized with Rprop
17. Decoding: Unsupervised Confidence-Based Discriminative Training
I example for a word-graph and the corresponding 1-best state alignment
(Figure: word-graph and 1-best state alignment at confidence thresholds c = 0.001, c = 0.1, and c = 0.7)
I necessary steps for margin-based model adaptation during decoding:
. 1-pass recognition (unsupervised transcriptions and word-graph)
. calculation of corresponding confidences (sentence, word, or state-level)
. unsupervised M-MMI-conf training on test data
to adapt models (w/ regularization)
I can be done iteratively with unsupervised corpus update!
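The confidence-based selection in the steps above can be sketched as a simple threshold filter; the frame labels and confidence values are made up for illustration:

```python
def select_adaptation_frames(frames, confidences, threshold):
    """Keep only frames whose 1st-pass confidence exceeds the threshold,
    i.e. the hard selection delta(c_{r,s,t} > c_threshold)."""
    return [x for x, c in zip(frames, confidences) if c > threshold]

conf = [0.9, 0.3, 0.8, 0.1]
kept = select_adaptation_frames(["f0", "f1", "f2", "f3"], conf, threshold=0.5)
print(kept)  # ['f0', 'f2']
```

Only the surviving frames enter the unsupervised M-MMI-conf accumulation, so low-confidence 1st-pass errors cannot corrupt the adapted models.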
18. Decoding: Modified-MMI Criterion And Confidences
4. M-MMI-conf:
ω_{r,s,t}(ρ ≠ 0) := Σ_{s_1^{T_r}: s_t = s} p(x_1^{T_r} | s_1^{T_r}) · p(s_1^{T_r}) · p(W_r) · e^{-ρ·δ(W_r,W_r)}
                  / Σ_V Σ_{s_1^{T_r}: s_t = s} p(x_1^{T_r} | s_1^{T_r}) · p(s_1^{T_r}) · p(V)  [posterior]
                    · e^{-ρ·δ(W_r,V)}  [margin]
                    · δ(c_{r,s,t} > c_threshold)  [confidence]
I weighted accumulation becomes:
acc_s = Σ_{r=1}^{R} Σ_{t=1}^{T_r} ω_{r,s,t}(ρ)  [margin posterior, ρ ≠ 0]  · c_{r,s,t}  [confidence]  · x_t
I confidences at:
. sentence-, word-, or state-level
19. Training Criteria
I ML training: accumulation of observations x_t:
acc_s = Σ_{r=1}^{R} Σ_{t=1}^{T_r} x_t
I M-MMI training: weighted accumulation of observations x_t:
acc_s = Σ_{r=1}^{R} Σ_{t=1}^{T_r} ω_{r,s,t} · x_t
I M-MMI-conf training: confidence-weighted accumulation of observations x_t:
acc_s = Σ_{r=1}^{R} Σ_{t=1}^{T_r} ω_{r,s,t} · c_{r,s,t} · x_t
. with confidence c_{r,s,t} at sentence-, word-, or state-level
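The three accumulation rules can be sketched in one helper; the frame values, weights, and confidences are toy data, not system output:

```python
import numpy as np

def accumulate(frames, weights=None, confidences=None):
    """Accumulate observations x_t for one HMM state s:
    ML:          acc_s = sum_t x_t
    M-MMI:       acc_s = sum_t w_t * x_t
    M-MMI-conf:  acc_s = sum_t w_t * c_t * x_t
    """
    frames = np.asarray(frames, dtype=float)
    scale = np.ones(len(frames))
    if weights is not None:
        scale = scale * weights
    if confidences is not None:
        scale = scale * confidences
    return (scale[:, None] * frames).sum(axis=0)

x = [[1.0, 2.0], [3.0, 4.0]]
print(accumulate(x))                          # ML:         [4. 6.]
print(accumulate(x, weights=[0.5, 1.0]))      # M-MMI:      [3.5 5.]
print(accumulate(x, [0.5, 1.0], [1.0, 0.0]))  # M-MMI-conf: [0.5 1.]
```

The last call shows how a zero confidence rejects a frame from the accumulator entirely.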
20. Results - Unsupervised Model Adaptation: M-MMI-conf
I M-MMI criterion with posterior confidences (M-MMI-conf)
I unsupervised training for model adaptation during decoding
I word-confidence based M-MMI-conf training and rejections
(Plot: WER[%] over word confidence thresholds 0.1 to 1.0 for confidence-based M-MMI writer adaptation vs. the M-MMI baseline, with the number of rejected segments, 0 to 8000, on the secondary axis)
. confidence threshold c = 0.5 → more than 60% segment rejection rate
. small amount of adaptation data only
21. Results - Unsupervised Model Adaptation: M-MMI-conf
I unsupervised training for model adaptation during decoding
I state-confidence based M-MMI-conf training and rejections
. arc posteriors from the lattice output from the decoder
. only word frames aligned with a high confidence in 1st pass
→ unsupervised model adaptation
. only 5% frame rejection rate (20,970 frames of 396,416)
I ICDAR 2005 Setup [Comparison]
Training/Adaptation WER[%] CER[%]
ML 21.86 8.11
M-MMI 19.51 7.00
+ unsupervised adaptation 20.11 7.34
+ word-confidences 19.23 7.02
+ state-confidences 17.75 6.49
+ supervised adaptation 2.06 0.77
22. Results - Training: ML vs. MMI vs. Modified-MMI Criterion
I ML = Maximum Likelihood
I MLE = Model Length Estimation
I MMI vs. modified-MMI after 30 Rprop iterations
I ICDAR 2005 Setup [Comparison]
WER [%]
Train Test ML +MLE +MMI +Modified MMI
abc d 10.88 7.80 7.44 6.12
abd c 11.50 8.71 8.24 6.78
acd b 10.97 7.84 7.56 6.08
bcd a 12.19 8.66 8.43 7.02
abcd e 21.86 16.82 16.44 15.35
23. Visual Inspection of M-MMI Training
24. Constrained Maximum Likelihood Linear Regression (CMLLR)
I writer adaptation
. method for improving visual models in handwriting recognition
. refine models by adaptation data of particular writers
. widely used is affine transform based model adaptation
I CMLLR
. Idea: normalize writing styles by adaptation of the features xt
. constrained MLLR feature adaptation technique
. also known as feature space MLLR (fMLLR) [Details]
. estimate affine feature transform:
x'_t = A·x_t + b
. CMLLR is text dependent
◦ requires an (automatic) transcription
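Applying an already-estimated CMLLR transform is a single affine map per frame. The transform values below are assumed for illustration; the estimation step itself (the ML optimization of A and b) is not shown:

```python
import numpy as np

def apply_cmllr(features, A, b):
    """Apply the writer-specific affine feature transform x' = A x + b
    to every frame of a (T, D) feature matrix."""
    return features @ A.T + b

T, D = 5, 3
x = np.random.default_rng(1).random((T, D))
A = np.eye(D) * 0.9        # toy transform, not an estimated one
b = np.full(D, 0.1)
x_adapted = apply_cmllr(x, A, b)
print(x_adapted.shape)     # (5, 3)
```

With A = I and b = 0 the features pass through unchanged, which is the writer-independent baseline.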
25. Training: CMLLR-based Writer Adaptive Training
I writer adaptation compensates for writer differences during recognition
→ do the same during visual model training
→ maximize the performance gains from writer adaptation
I writer variations are compensated by writer adaptive training (WAT)
I writer normalization using CMLLR
I necessary steps
1. train writer-independent GMM models
2. CMLLR transformations are estimated for each (estimated) writer
. supervised if writers are known
3. apply CMLLR transformations on features to train writer dependent GMMs
26. Decoding: CMLLR-based Writer Adaptation
I writers and writing styles are unknown
I necessary steps
1. estimate writing styles using clustering
. Bayesian Information Criterion (BIC) based stopping condition
2. estimate CMLLR feature transformations
for every estimated writing style cluster
3. second pass recognition
. WAT models + CMLLR transformed features
(Diagram: Pass 1: writer-independent decoder (Sys.1); Pass 2: clustering + CMLLR estimation, then decoding with WAT+CMLLR models (Sys.2))
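A minimal 1-D sketch of a BIC merge test of the kind used as a stopping condition for the clustering; the single-Gaussian model, the penalty weight, and the toy data are illustrative assumptions, not the system's actual clustering:

```python
import math

def delta_bic(x1, x2, penalty=1.0):
    """BIC merge test for two 1-D segment clusters: a positive value
    means keeping them separate is worth the extra parameters."""
    def nloglik(xs):
        # Data term of a single 1-D Gaussian fit: n/2 * log(variance).
        n = len(xs)
        m = sum(xs) / n
        var = max(sum((x - m) ** 2 for x in xs) / n, 1e-12)
        return 0.5 * n * math.log(var)
    n = len(x1) + len(x2)
    # Two extra parameters (mean, variance) for the second cluster.
    return (nloglik(x1 + x2) - nloglik(x1) - nloglik(x2)
            - 0.5 * penalty * 2 * math.log(n))

close = delta_bic([0.0, 0.1, 0.2], [0.1, 0.2, 0.3])  # similar -> merge (<= 0)
far = delta_bic([0.0, 0.1, 0.2], [5.0, 5.1, 5.2])    # distinct -> keep split (> 0)
print(close <= 0, far > 0)  # True True
```

Agglomerative clustering of segments stops merging once every remaining pair has a positive delta, yielding the estimated writing-style clusters.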
27. Results - Decoding: Writer Adaptation
I comparison of MLE, WAT, and CMLLR based feature adaptation
I comparison of unsupervised and supervised writer clustering
. decoding always unsupervised
. supervised clustering → only the writer labels are used!
Train Test WER[%]: ML (1st pass) | +MLE (1st pass) | WAT+CMLLR unsup. (2nd pass) | WAT+CMLLR sup. (2nd pass)
abc d 10.88 7.83 7.72 5.82
abd c 11.50 8.83 9.05 5.96
acd b 10.97 7.81 7.99 6.04
bcd a 12.19 8.70 8.81 6.49
abcd e 21.86 16.82 17.12 11.22
28. Results - Decoding: Writer Adaptation
I unsupervised clustering: error analysis
. histograms for segment assignments over the different test folds
. problem: unbalanced segment assignments
29. Arabic Handwriting - Experimental Results for IFN/ENIT
I Writer Adaptive Training + CMLLR for Writer Adaptation
see [Dreuw & Rybach+ 09], ICDAR 2009 [Visualization]
I M-MMI Training + Unsupervised Confidence-Based Model Adaptation
see [Dreuw & Heigold+ 09], ICDAR 2009 [Details]
I ICDAR 2005 Setup [Comparison]
Train Test WER[%]: ML (1st pass) | +MLE (1st pass) | +M-MMI (1st pass) | WAT+CMLLR unsup. (2nd pass) | WAT+CMLLR sup. (2nd pass) | M-MMI-conf (2nd pass)
abc d 10.88 7.83 6.12 7.72 5.82 5.95
abd c 11.50 8.83 6.78 9.05 5.96 6.38
acd b 10.97 7.81 6.08 7.99 6.04 5.84
bcd a 12.19 8.70 7.02 8.81 6.49 6.79
abcd e 21.86 16.82 15.35 17.12 11.22 14.55
30. Arabic Handwriting - Experimental Results for IFN/ENIT
I evaluation of RWTH-OCR systems at Arabic HWR Competition, ICDAR 2009
. external evaluation at TU Braunschweig, Germany
. set f and set s are unknown (not available)
. unsupervised M-MMI-conf model adaptation achieved similar improvements
. 3rd rank (group)
ID WRR[%]
set fa set ff set fg set f set s
RWTH-OCR, ID12 86.97 88.08 87.98 85.51 71.33
RWTH-OCR, ID13 87.17 88.63 88.68 85.69 72.54
RWTH-OCR, ID15 86.97 88.08 87.98 83.90 65.99
A2iA, ID8 90.66 91.92 92.31 89.42 76.66
MDLSTM, ID11 94.68 95.65 96.02 93.37 81.06
I Note:
. focus on modeling (ID12 and ID13) and speed (ID15) - no preprocessing
31. Summary
I RWTH-ASR → RWTH-OCR
. simple feature extraction and preprocessing
. Arabic: created a state-of-the-art system, ranked 3rd at ICDAR 2009
I discriminative training
. margin-based HMM training (ML vs. MMI vs. M-MMI)
. unsupervised confidence-based MMI model adaptation (M-MMI-conf)
I writer adaptive training
. supervised writer adaptation demonstrated the potential
I ongoing work
. impact of preprocessing in feature extraction (Arabic vs. Latin)
. more complex features (e.g. MLP)
. character context modeling (e.g. CART)
. Latin: created a state-of-the-art system, best single system
32. Outlook: Latin Handwriting - IAM Database
I English handwriting, continuous sentences
Train Devel Eval 1 Eval 2 Total
Lines 6,161 1,861 900 940 9,862
Running words 53,884 17,720 7,901 8,568 88,073
Vocabulary size 7,754 3,604 2,290 2,290 11,368
Characters 281,744 83,641 41,672 42,990 450,047
Writers 283 128 46 43 500
OOV Rate ≈15% ≈17% ≈15%
I Example lines:
33. Outlook: Latin Handwriting - UPV Preprocessing
I Original images
I Images after color normalisation
I Images after slant correction
I Images after height normalisation
Note: preprocessing did not help for Arabic handwriting [Visualization]
34. Outlook: Latin Handwriting - Experimental Results on IAM Database
Systems Devel WER [%] Eval WER [%]
RWTH-OCR
Baseline* 81.07 83.60
+ UPV Preprocessing* 57.59 65.26
+ LBW LM & 50k Lexicon* 31.92 38.98
+ discriminative training (M-MMI) 26.19 32.52
+ confidences (M-MMI-conf) - 31.87
+ discriminative training (M-MPE) 24.31 30.07
+ confidences (M-MPE-conf) 23.75 29.23
Other Single HMM Systems
[Bertolami & Bunke 08] 30.98 35.52
[Natarajan & Saleem+ 08] - 40.01**
[Romero & Alabau+ 07] 30.6** -
System Combination
[Bertolami & Bunke 08] 26.85 32.83
*see [Jonas 09] for details
**different data
35. Thank you for your attention
Philippe Dreuw
dreuw@cs.rwth-aachen.de
http://www-i6.informatik.rwth-aachen.de/
36. References
[Bertolami & Bunke 08] R. Bertolami, H. Bunke: Hidden Markov model-based ensemble methods for offline handwritten text line recognition. Pattern Recognition, Vol. 41, No. 11, pp. 3452–3460, Nov 2008.
[Dreuw & Heigold+ 09] P. Dreuw, G. Heigold, H. Ney: Confidence-Based Discriminative Training for Writer Adaptation in Offline Arabic Handwriting Recognition. In International Conference on Document Analysis and Recognition, Barcelona, Spain, July 2009.
[Dreuw & Jonas+ 08] P. Dreuw, S. Jonas, H. Ney: White-Space Models for Offline Arabic Handwriting Recognition. In International Congress on Pattern Recognition, pp. 1–4, Tampa, Florida, USA, Dec 2008.
[Dreuw & Rybach+ 09] P. Dreuw, D. Rybach, C. Gollan, H. Ney: Writer Adaptive Training and Writing Variant Model Refinement for Offline Arabic Handwriting Recognition. In International Conference on Document Analysis and Recognition, Barcelona, Spain, July 2009.
[Jonas 09] S. Jonas: Improved Modeling in Handwriting Recognition. Master's thesis, Human Language Technology and Pattern Recognition Group, RWTH Aachen University, Aachen, Germany, Jun 2009.
37. [Natarajan & Saleem+ 08] P. Natarajan, S. Saleem, R. Prasad, E. MacRostie, K. Subramanian: Arabic and Chinese Handwriting Recognition, Vol. 4768/2008 of LNCS, chapter Multi-lingual Offline Handwriting Recognition Using Hidden Markov Models: A Script-Independent Approach, pp. 231–250. Springer Berlin / Heidelberg, 2008.
[Romero & Alabau+ 07] V. Romero, V. Alabau, J.M. Benedi: Combination of N-Grams and Stochastic Context-Free Grammars in an Offline Handwritten Recognition System. Lecture Notes in Computer Science, Vol. 4477, pp. 467–474, 2007.
[Rybach & Gollan+ 09] D. Rybach, C. Gollan, G. Heigold, B. Hoffmeister, J. Lööf, R. Schlüter, H. Ney: The RWTH Aachen University Open Source Speech Recognition System. In Interspeech, Brighton, U.K., Sep 2009.
38. Appendix: Comparisons for IFN/ENIT
I ICDAR 2005 Evaluation
Rank Group WRR [%]
abc-d abcd-e
1. UOB 85.00 75.93
2. ARAB-IFN 87.94 74.69
3. ICRA (Microsoft) 88.95 65.74
4. SHOCRAN 100.00 35.70
5. TH-OCR 30.13 29.62
BBN 89.49 N.A.
1* RWTH 94.05 85.45
*own evaluation result (no tuning on test data)
39. Appendix: Arabic Handwriting - IFN/ENIT Database
Corpus development
I ICDAR 2005 Competition: a, b, c, d sets for training, evaluation on set e
I ICDAR 2007 Competition: ICDAR05 + e sets for training, evaluation on set f
I ICDAR 2009 Competition: ICDAR 2007 for training, evaluation on set f
40. Appendix: Participating Systems at ICDAR 2005 and 2007
I MITRE: The MITRE Corporation, USA
over-segmentation, adaptive lengths, character recognition with post-processing
I UOB-ENST: University of Balamand (UOB), Lebanon and Ecole Nationale Superieure des Telecommunications (ENST), Paris
HMM-based (HTK), slant correction
I MIE: Mie University, Japan
segmentation, adaptive lengths
I ICRA: Intelligent Character Recognition for Arabic, Microsoft
partial word recognizer
I SHOCRAN: Egypt
confidential
I TH-OCR: Tsinghua University, Beijing, China
over-segmentation, character recognition with post-processing
I CACI: Knowledge and Information Management Division, Lanham, USA
HMM-based, trajectory features
I CEDAR: Center of Excellence for Document Analysis and Recognition, Buffalo, USA
over-segmentation, HMM-based
I PARIS V / A2iA: University of Paris 5, and A2iA SA, France
hybrid HMM/NN-based, shape-alphabet
I Siemens: SIEMENS AG Industrial Solutions and Services, Germany
HMM-based, adaptive lengths, writing variants
I ARAB-IFN: TU Braunschweig, Germany
HMM-based
41. Appendix: Visual Modeling - Model Length Estimation
I more complex characters should be represented by more HMM states
I the number of states Sc for each character c is updated by
S_c = (N_{x,c} / N_c) · α
with
S_c = estimated number of states for character c
N_{x,c} = number of observations aligned to character c
N_c = character count of c seen in training
α = character length scaling factor.
[Visualization]
42. Appendix: Visual Modeling - Model Length Estimation
Original Length
I overall mean of character length = 7.9 pixel (≈ 2.6 pixel/state)
I total #states = 357
43. Appendix: Visual Modeling - Model Length Estimation
Estimated Length
I overall mean of character length = 6.2 pixel (≈ 2.0 pixel/state)
I total #states = 558
44. Appendix: Alignment Visualization
I alignment visualization with and without discriminative training
I upper lines with 5-2 baseline setup, lower lines with additional
discriminative training
45. Appendix: Arabic Handwriting - UPV Preprocessing
I Original images
I Images after slant correction
I Images after size normalisation
Experimental Results:
I important information in the ascender and descender areas is lost
I not yet suitable for Arabic HWR
46. Appendix: Visual Modeling - Writing Variants Lexicon
I most reported error rates are dependent on the number of PAWs
I without separate whitespace model
I always whitespaces between compound words
I whitespaces as writing variants between and within words
White-Space Models for Pieces of Arabic Words [Dreuw & Jonas+ 08] in ICPR 2008
47. Appendix: Constrained Maximum Likelihood Linear Regression
Idea: improve the hypotheses by adaptation of the features xt
I effective algorithm for adaptation to a new speaker or environment (ASR)
I GMMs are used to estimate the CMLLR transform
I iterative optimization (ML criterion)
. align each frame x to one HMM state (i.e. GMM)
. accumulate to estimate the adaptation transform A
. likelihood function of the adaptation data given the model is to be
maximized with respect to the transform parameters A, b
I one CMLLR transformation per (estimated) writer
I constrained refers to the use of the same matrix A for
the transformation of the mean µ and variance Σ:
x'_t = A·x_t + b → N(x | μ̂, Σ̂) with μ̂ = A·μ + b and Σ̂ = A·Σ·Aᵀ