The state of the art in handwriting synthesis

The state of the art in
handwriting synthesis
Randa I. Elanwar
Researcher
Computers and Systems Dept., ERI

Personal handwriting style aspects
1. Glyph and the size of characters
2. Pressure distribution and the slant of
handwriting
3. Relative sizes of the middle, the upper, and
the lower zones of letters
4. Existence and the shape of lead-in,
connecting, and ending parts
5. Letter, the word, and the line spacings
6. Embellishment
7. Simplified or neglected strokes
PEIT 2013 2

What is handwriting synthesis?
 Converting ASCII text into the user’s
personal handwriting.
PEIT 2013 3

What is handwriting synthesis
needed for?
 Forensic examiners
 The disabled
 Handwriting recognition systems
 Biometric security systems
 Information retrieval (web search)
 Natural language understanding
 Personalization of pen-computing devices
PEIT 2013 4

Handwriting synthesis strategies
 Duplicated Samples
 Combination of different real samples
 Synthetic-individuals
PEIT 2013 5

 Duplicated Samples
starts from real samples of a given person
produces duplicated samples corresponding to the
same person.
increase the amount of already acquired
handwriting data
does not to generate completely new datasets
 The great majority of existing approaches for
synthetic signature generation are based on
this type of strategy.
PEIT 2013 6

 Combination of different real samples
◦ starts from a pool of real n-grams and using some
type of concatenation
◦ needs real samples to generate the synthetic
handwriting trait
◦ utility for performance evaluation and vulnerability
assessment is very limited
 Useful to produce multiple handwriting
samples of a given real user, but not to
generate synthetic individuals.
PEIT 2013 7

 Synthetic-individuals
◦ A priori knowledge about handwriting trait is used
to create a model that characterizes that
handwriting trait for a population.
◦ New synthetic individuals can be generated
sampling the constructed model.
 Doesn't need any real handwriting samples to
generate completely synthetic databases.
Thus, overcomes the usual shortage of
handwriting data collection.
PEIT 2013 8

Methods for handwriting synthesis
 Movement simulation techniques
 Shape-simulation techniques
PEIT 2013 9

◦ Studies starting from the late sixties have
analyzed and studied the human handwriting
moments and the human "motor code" for the
production of cursive handwriting
PEIT 2013 10

◦ The handwriting trajectory is analyzed and
modeled by velocity or force functions
◦ focus on the representation and analysis of real
handwriting signals rather than handwriting
synthesis
◦ May not be convenient for synthesizing non-
cursive handwriting
PEIT 2013 11

◦ The straightforward approach to synthesize
handwriting from collected handwritten glyphs.
◦ Each glyph is a handwriting sample image of
one, two or three letters.
◦ When synthesizing a word, glyphs are simply
juxtaposed in sequence and are connected
using simple curves causing unnatural looking
(discontinuities)
PEIT 2013 12

◦ Also called glyph-based methods
◦ usually record the glyphs directly and reuse or
sample the glyphs when synthesis.
◦ Require intensive user involvement in the
sample collection process and cannot produce
various handwriting styles in a natural way
◦ tries to learn a separate model for the
connection of each pair of letters (impractical
due to limited training set)
PEIT 2013 13

Challenges of handwriting synthesis
 the seed data samples from which character
models are designed (variability vs.
generalization
 the model design and style-dependant
parameters selection
 how to generate (deform) a synthesized sample
 finding the best synthesized samples to compose
a word
 how to concatenate the glyphs without
introducing discontinuities
 how to make the handwriting look natural
 how to evaluate the quality of the synthesized
word
PEIT 2013 14

 For the glyph-based approaches, the main
challenge is to collect a large enough seed
dataset of natural human handwritings
encountering a wide range of writer
variability.
 Meanwhile this violates the main target
beyond synthesis, which is, generating
large amount of data independent from
natural sources.
PEIT 2013 15

 Model Selection: it is difficult to find a
reliable description of a word able to
represent all the admitted occurrences of
the input shape.
 Deformation: find an optimal distortion
parameter range and control the
variability of data to ensure that the
synthetic handwriting is natural enough.
PEIT 2013 16

 Natural looking of the synthesized data
(cursiveness): the system has to
determine which pair of adjacent letters in
a word is connected.
 It would be ideal if we compute the
connection probability from handwritten
samples. However, it is impractical as this
requires large amount natural handwriting
samples.
PEIT 2013 17

 Objective evaluation:
◦ Most researchers do not evaluate the quality of
the synthesis process
◦ others use human expertise
◦ using HMM recognizers to justify the
improvement in recognition accuracy due to
adding synthetic handwriting to the training
set.
PEIT 2013 18

Synthesis in biometric security
 Synthetically generated biometric databases:
 i) facilitate the performance evaluation of
recognition systems instead of the costly and
time-consuming real biometric databases
 ii) provide a tool with which to evaluate the
vulnerability of biometric systems to attacks
carried out with synthetically generated traits
PEIT 2013 19

Synthesis in biometric security
 A system example is given to generate synthetic
online signatures. A Discrete Fourier Transform
(DFT) of the trajectory x-y signals is generated.
 Deformations: Horizontal and vertical affine
scaling, Duration expansion or contraction, and
Noise addition are applied.
 Finally, signals are refined and processed in the
time domain to give more realistic appearance
(smoothing, translation, rotation and scaling
transformations are applied).
PEIT 2013 20

Synthesis in Information retrieval
 Handwriting synthesis makes it possible to
perform text searches on handwritten word
image databases when no ground-truth data is
available.
 The handwritten string is treated as a
pictographic pattern without an attempt to
understand it.
 In this approach the query string is compared to
database strings using an appropriate distance
function allowing user to extend his search to
include also non-ASCII symbols
PEIT 2013 21

Synthesis in Information retrieval
 A system example is given to perform text searches
on handwritten word image databases when no
ground-truth data is available.
 The approach proceeds by synthesizing multiple
images of the query string using different computer
fonts. Normalization and feature extraction operations
are applied.
 The model is trained using synthetic images, and
produces samples according to the distribution of
handwritten features. Finally, unsupervised font
selection method leverages the font contributions to
best represent handwritten data, and yields
significant improvements in accuracy.
PEIT 2013 22

Synthesis in handwriting recognition
 The problem of general unconstrained word
recognition is that the recognition rates are
low due to shortage of a good quality training
data of large amount.
 Expanding the training set by collecting
additional natural human written texts is
error prone, expensive and labor- and time-
consuming process. Alternatively, the training
set can be expanded by synthetic texts.
PEIT 2013 23

Synthesis in handwriting recognition
 Several methods have been overviewed. Most of
them generate the synthetic texts from existing
natural ones.
 Some approaches use human written character tuples
to build up synthetic texts, others decompose the
glyph to a set of control points and in-between
curves.
 synthetic texts are generated by applying random
perturbations on human written characters
 Some researchers use horizontal connection lines or
simple polynomials as within-word connections.
PEIT 2013 24

Synthesis in pen based computing
 The recent emergence of pen computers with high
resolution tablets has made available dynamic (temporal)
information as well as created the need for robust online
handwriting recognition algorithms.
 Handwriting is preferable to typed text in some cases
because it adds a personal touch.
 When writing a note on a Tablet PC, if the computer can
automatically correct some written errors and generate
some predefined handwriting strokes, this would be more
effective and intelligent.
 Researchers working on the online problem usually use the
same techniques used for offline problem solution
PEIT 2013 25

Conclusions and opinions
 Researchers are either tending to use local
datasets of their own or using very limited
part of a public dataset.
 This may be attributed to using human
expertise for evaluation (laborious with large
datasets).
 Finding reliable automatic evaluation scheme
accelerates the process and allow using much
bigger dataset.
PEIT 2013 26

 Researchers tend to collect isolated glyph
samples which leads to unnatural looking at
synthesizing words.
 This might be solved if they use cursive
words for training.
 Automation will be needed for training
dataset preprocessing (specially
segmentation) to help enlarging the dataset
used.
PEIT 2013 27

 In the generation process, the glyph model
produces samples according to the features
distribution extracted from the whole training
dataset.
 In case of large variances, such method may
deform the model and lead to odd looking
generated samples.
 Writer style clustering and generating
different models for the same character per
cluster will help restrict the distortion range.
PEIT 2013 28

 Researchers tend to use almost the same
features or subset of them (geometric,
polynomials, filter control points, etc.) which
may be a good reason for limited
performance.
 In case of not finding other novel descriptive
features, feature fusion techniques might
help change the distribution in feature space.
Consequently, modeling will be better.
PEIT 2013 29

 Researchers tend to evaluate their synthesis
process are using recognizers (especially
HMM) by adding the synthetic datasets to the
training data and test their system to
recognize natural handwritten data.
 Recognizers can be used to enhance
synthesis not to evaluate. The training data
set has to be natural handwritten data and
the test dataset should be synthetic.
PEIT 2013 30

 Targeting better recognition performance
will tune the distortion and modify the
synthesized glyph samples look.
 Reaching a proper recognition rate, may
then qualify the synthetic dataset to be
added as training data for another
recognizer type for evaluation.
PEIT 2013 31

The state of the art in handwriting synthesis

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (6)

Similar to The state of the art in handwriting synthesis

Similar to The state of the art in handwriting synthesis (20)

More from Randa Elanwar

More from Randa Elanwar (20)

Recently uploaded

Recently uploaded (20)

The state of the art in handwriting synthesis