Handwritten Text Recognition:
Key concepts
PD Dr. Roger Labahn
Computational Intelligence Technology Lab
Mathematical Optimization Group
Institute for Mathematics
University of Rostock
co:op Convention | READ Kickoff 19.01.2016
Handwritten Text Recognition: Key concepts
Introduction
Concepts – Problems – Tasks
Recognition & Training
Interpretation – Decoding
Epilog
Framework – Workflow
• Application: keyword search, transcription, ...
OUT: textual information (words, positions, ...) with alternatives & confidences
⇑
• HTR-Engine
⇑
IN: writing images (lines, words, table cells, form fields, ...)
• Layout Analysis: ..., text blocks
Alternative recognition strategies
Topological methods
• learn & read graphical substructures of writings
• arcs, lines, curves, holes, ...
HMM based methods
• Hidden Markov Models
• learn & read states while traversing the writing
RNN based methods
• Recurrent Neural Networks
Recognition Engine – C I T lab & MoU partner PLANET
Decoding ⇒ textual output
• textual interpretation of recognition results
• matching external requirements / knowledge (dictionaries, language model, ...)
⇑
Recognition ⇒ recognition matrix
• recognition information from image information
• processing a standardized writing image
⇑
Writing preprocessing ⇒ standardized writing
• corrections & normalizations
• e.g. baseline, slant, height, ...
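The three stages above form a simple pipeline. A minimal data-flow sketch in Python; the function names and dummy data are hypothetical and do not describe the actual CITlab/PLANET interfaces:

```python
# Minimal data-flow sketch of the engine described above (hypothetical names,
# dummy data -- not the CITlab/PLANET interfaces).
import numpy as np

def preprocess(line_image):
    """Stage 1: corrections & normalizations (baseline, slant, height, ...)."""
    return line_image                       # stub: pretend the image is already normalized

def recognize(normalized_image, channels):
    """Stage 2: produce a recognition (confidence) matrix with one probability
    distribution over the character channels per writing position."""
    positions = normalized_image.shape[1] // 8            # e.g. one output column per 8 pixels
    matrix = np.random.rand(positions, len(channels))
    return matrix / matrix.sum(axis=1, keepdims=True)     # rows sum to 1

def decode(conf_matrix, channels, dictionary):
    """Stage 3: match the matrix against permissible expressions
    (stubbed here; a dynamic-programming sketch follows in the decoding section)."""
    return dictionary[0], 1.0 / len(dictionary)           # dummy result, dummy confidence

line = np.random.rand(64, 800)                            # stand-in for a writing-line image
conf = recognize(preprocess(line), ".BDadlou ")
print(decode(conf, ".BDadlou ", ["Bad Doberan", "Rostock"]))
```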
Introduction
Concepts – Problems – Tasks
Segmentation
Context
Language
HTR
Recognition & Training
Interpretation – Decoding
Epilog
Segmentation ? NONE !
(classical) OCR = Optical Character Recognition
• Reading single characters ⇒ sub-images per character ! ?
• B a d ␣ D o ??? a n
Segmentation-free Reading
• processing the entire writing image: word ... line ...
• scanning information ⇒ data sequence (signal) / character sequence
• the sequence grows as the scan proceeds: B, BB, BB., BB.a, BB.ad, BB.ad␣, BB.ad␣., BB.ad␣.D, ... (collapse sketched below)
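A minimal sketch of how such a raw per-position sequence can be collapsed into text. The assumptions are mine, not stated on the slide: '.' marks positions where no character is recognized, and repeated neighbouring symbols belong to the same character (as in CTC-style decoding of recurrent networks):

```python
# Collapse a raw per-position character sequence into text.
# Assumptions (mine, not from the slide): '.' is a "no character" output, and
# repeated neighbouring symbols belong to one character (CTC-style decoding).
def collapse(raw, none_char="."):
    out, prev = [], None
    for ch in raw:
        if ch != prev and ch != none_char:
            out.append(ch)
        prev = ch
    return "".join(out)

print(collapse("BB.ad␣DDolo.auu"))   # -> "Bad␣Doloau": a free reading, still imperfect
```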
Image context is essential !
Single segment without context
• u ?? OR n ??
• virtually not (sufficiently) readable
Character sequence without context
• ???
• virtually not (sufficiently) explainable
Language context is essential !
Free reading – no restrictions on possible reading results
• BB.ad␣DDolo.auu
• application: figures & general numbers, ...
Comparison against a dictionary or keyword
• task: read a German city name from a given list ! / find the name Bad Doberan !
• Bad Doberan
• goal: optimal (best possible) correspondence between the writing / reading result and the dictionary entry / keyword
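One toy way to realize such a comparison is edit distance between the free-reading result and every dictionary entry. The actual engine matches against the confidence matrix itself (see Interpretation – Decoding), so treat this purely as an illustration:

```python
# Toy dictionary comparison: Levenshtein (edit) distance between the
# free-reading string and every permissible word.  '␣' is written as a plain
# space here.  Illustration only; the real matching works on the confidence matrix.
def edit_distance(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                   # delete ca
                           cur[j - 1] + 1,                # insert cb
                           prev[j - 1] + (ca != cb)))     # substitute
        prev = cur
    return prev[-1]

cities = ["Bad Doberan", "Bad Sülze", "Rostock"]
reading = "BB.ad DDolo.auu"
print(min(cities, key=lambda c: edit_distance(reading, c)))   # -> "Bad Doberan"
```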
OCR ? HTR !
new paradigm – new concepts ⇒ new term !
• HTR: Handwritten Text Recognition
• ATR: Automatic Text Recognition
• ... ???
Introduction
Concepts – Problems – Tasks
Recognition & Training
Feature extraction
Writing processing
Neural Network
Parameter training
Interpretation – Decoding
Epilog
From pixel values to features
original grey image
Filtering
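The filter images themselves are not reproduced in this extract. As a generic illustration of filtering a grey-value line image (an example kernel of my choosing, not necessarily one used by the engine):

```python
# Generic illustration of "filtering" a grey-value image; the kernel is an
# example of mine, not necessarily a filter used by the actual engine.
import numpy as np
from scipy.ndimage import convolve

grey = np.random.rand(64, 800)                  # stand-in for an original grey line image
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)   # emphasizes vertical stroke edges
feature_map = convolve(grey, sobel_x, mode="nearest")
print(feature_map.shape)                        # same spatial size, one feature value per pixel
```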
Collect & remember context !
Writing processing
• scanning in different directions ⇒ data sequences (signals)
Information memory
• neural networks with complex neurons (cells)
• recurrent connections ⇒ memory
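In the simplest recurrent cell, "recurrent connections ⇒ memory" reads as follows (a plain RNN recurrence, written in my notation; the actual networks use the more complex cells of the next slide):

```latex
% Plain recurrent cell: the hidden state h_t carries memory of all earlier
% positions because it feeds back into itself.
h_t = \tanh\bigl(W\,x_t + U\,h_{t-1} + b\bigr), \qquad
y_t = \operatorname{softmax}\bigl(V\,h_t\bigr)
```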
Complex cells – memory by recurrent connections
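One widely used complex cell is the LSTM cell (used, for instance, in the Graves & Schmidhuber architecture referenced below). Assuming this is the cell family meant here, its gates control what is stored, kept, and emitted:

```latex
% LSTM cell in the usual notation (assumption: this is the "complex cell" meant).
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)                                % input gate
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)                                % forget gate
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)                                % output gate
c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c)   % cell memory
h_t = o_t \odot \tanh(c_t)                                               % cell output
```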
Hierarchical Neural Networks
From feature input to network output
(Figure from GRAVES, SCHMIDHUBER: Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks)
Parameter training: Machine Learning
Theory
• objective: optimally adapt parameters in cells & along network connections
• idea: train the network with learning data samples
• optimization: minimize the error (network output vs. sample target) over the training data
Practice: impression of large application cases
• 10⁴ network cells
• 10⁶ trainable parameters
• 10⁴ learning data samples (writing images)
• 150 training epochs, each processing every sample once
• 4 weeks of training from scratch
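The optimization bullet, written as a single objective (notation mine):

```latex
% Minimize the total error between network output and sample target over the
% training set \{(x^{(k)}, t^{(k)})\}_{k=1}^{N}:
\theta^{\ast} \;=\; \arg\min_{\theta} \sum_{k=1}^{N}
    E\bigl(\mathrm{net}_{\theta}(x^{(k)}),\, t^{(k)}\bigr)
```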
Learning data ...
• ... labeled training samples: ground truth
  HTR: writing images with correct text
• ... the more the better ... BUT:
  start with a realistic (reasonable) number, improve while working
• ... represent all project data ... BUT:
  start with HTR (networks) from similar collections & corpora
• ... contribute to general HTR engine improvement:
  put into the network repository for specific application cases
Introduction
Concepts – Problems – Tasks
Recognition & Training
Interpretation – Decoding
Network output
Decoding
Epilog
Channel probabilities
Pre-conditions
• (abstract) alphabet of (abstract) characters
• text composed of exactly these characters
• alphabet characters ⇐⇒ network output neurons (channels)
• example: digits, uppercase letters, lowercase letters, special characters (e.g. ␣, -)
• much more general: any symbol unit learnable from training data
• current (large) application case: up to 150 character channels
• independent of (natural) language, reading/writing direction, understanding
Network output
probability of a (character) channel at a writing (image) position
Confidence Matrix – recognition / perception matrix
(matrix image not reproduced; channels shown: . B D a d l o u ␣)
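A toy confidence matrix over exactly these channels; the numbers are invented for illustration and do not come from a real network:

```python
# Toy confidence matrix: one row per writing position, one column per channel,
# each row a probability distribution.  Numbers invented for illustration.
import numpy as np

channels = [".", "B", "D", "a", "d", "l", "o", "u", "␣"]
conf = np.array([
    [0.05, 0.80, 0.05, 0.02, 0.02, 0.02, 0.02, 0.01, 0.01],   # position 0: probably "B"
    [0.70, 0.10, 0.05, 0.05, 0.03, 0.03, 0.02, 0.01, 0.01],   # position 1: probably "."
    [0.05, 0.02, 0.02, 0.75, 0.06, 0.04, 0.03, 0.02, 0.01],   # position 2: probably "a"
    [0.05, 0.02, 0.02, 0.05, 0.78, 0.03, 0.02, 0.02, 0.01],   # position 3: probably "d"
])
assert np.allclose(conf.sum(axis=1), 1.0)                     # rows are distributions
print("".join(channels[i] for i in conf.argmax(axis=1)))      # -> "B.ad"
```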
Expression matching
• restrict to
  permissible words ⇒ dictionary
  keyword construct(s) ⇒ regular expression
• consider
  character confidences ⇒ probability measure
  or their negative logarithms ⇒ distance measure
Algorithmic method
• compare the confidence matrix against every permissible expression
• use an extremely fast algorithm: Dynamic Programming
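A minimal dynamic-programming sketch of this matching, with negative log probabilities as the distance measure. It assumes each character of a word covers one or more consecutive writing positions; real decoders additionally handle a "no character" channel, alternatives, and pruning:

```python
# Match candidate words against a confidence matrix by dynamic programming,
# using -log(probability) as the distance.  Simplified sketch, not the actual
# decoder: no "no character" channel, no alternatives, no pruning.
import math

def word_distance(word, conf, channels):
    idx = [channels.index(c) for c in word]
    T, J, INF = len(conf), len(word), float("inf")
    prev = [INF] * J                        # prev[j]: best cost with the last position assigned to character j
    for t in range(T):
        cur = [INF] * J
        for j in range(min(t, J - 1) + 1):  # character j cannot start before position j
            emit = -math.log(conf[t][idx[j]])
            if t == 0:
                best = 0.0 if j == 0 else INF
            else:
                best = min(prev[j], prev[j - 1]) if j > 0 else prev[0]
            cur[j] = emit + best
        prev = cur
    return prev[-1]                         # every character used, every position explained

channels = [".", "B", "D", "a", "d"]
conf = [                                    # toy matrix: 4 positions, rows sum to 1
    [0.05, 0.80, 0.05, 0.05, 0.05],
    [0.70, 0.10, 0.05, 0.10, 0.05],
    [0.05, 0.05, 0.05, 0.80, 0.05],
    [0.05, 0.05, 0.05, 0.05, 0.80],
]
for candidate in ["Bad", "Dad"]:
    print(candidate, word_distance(candidate, conf, channels))   # "Bad" gets the smaller distance
```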
Decoding
Objective – Result
• permissible expression(s) with the best match to the recognition output
• best match ⇐⇒ maximal probability ⇐⇒ minimal distance (see the identity below)
• best alternatives ranked by measure (probability / distance)
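Why maximal probability and minimal distance coincide: with per-character confidences p_1, ..., p_n and distances d_i = -log p_i,

```latex
\max \prod_{i=1}^{n} p_i
  \;\Longleftrightarrow\;
\min \Bigl(-\log \prod_{i=1}^{n} p_i\Bigr)
  \;=\; \min \sum_{i=1}^{n} (-\log p_i)
  \;=\; \min \sum_{i=1}^{n} d_i .
```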
Practice: impression of actual application cases
• timings cover decoding only, on pre-processed lines
• searching 1 keyword in 10,500 lines (433 pages): 2–3 sec on average
• reading 1 page against an 11,650-word dictionary: 8–9 sec on average
Dynamic Programming
Introduction
Concepts – Problems – Tasks
Recognition & Training
Interpretation – Decoding
Epilog
Results
from C I T lab’s contribution to ICDAR’s HTRtS-2015 contest
WER = 0%, CER = 0%
  reference:  who has with or without right the temporary possession of it : and
  hypothesis: who has with or without right the temporary possession of it : and
WER = 17%, CER = 4%
  reference:  operation of this act is spent upon Titius only ,
  hypothesis: operation of this act isspeut upon Titius only ,
WER = 67%, CER = 52%
  reference:  of the said first issue : the amount of such second consequently gap/ to the
  hypothesis: of the and put feet the without of such ; said uitrquunity be the
WER = 80%, CER = 17%
  reference:  for a simple personal Injury the Offender ’ s punish=
  hypothesis: For on simple personal injury the offenders punish .
(Figure caption, cropped in the source: examples of test line images of increasing difficulty; the reference transcript and the CITlab system hypothesis are displayed, in this order, below each image, with the corresponding WER and CER shown on the right. The cropped remarks note that lines with crossed-out words can still be transcribed, and that a transcript with a large WER but a low CER can be more useful than one with a lower WER and a higher CER.)
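WER and CER above are the usual word and character error rates: the word-level, respectively character-level, edit distance between hypothesis and reference, normalized by the reference length:

```latex
% S = substitutions, D = deletions, I = insertions against the reference;
% N = number of words (for WER) or characters (for CER) in the reference.
\mathrm{WER} = \frac{S_w + D_w + I_w}{N_w}, \qquad
\mathrm{CER} = \frac{S_c + D_c + I_c}{N_c}
```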
(Figure from SÁNCHEZ, TOSELLI, ROMERO, VIDAL: ICDAR2015 Competition HTRtS: Handwritten Text Recognition on the tranScriptorium Dataset)
Thanks ...
C I T lab Group – URO MoU Partner
PLANET intelligent systems GmbH
EU Funding
Recognition & Enrichment of Archival Documents
... for your attention!