In this paper we present a literature review about the recent trends in handwriting synthesis, highlighting the different generation processes and pointing out the challenges facing the researchers. Finally we are giving a conclusion about the scientific research collections presented, and summarizing our opinions to help move future work up to maturity
1. The state of the art in
handwriting synthesis
Randa I. Elanwar
Researcher
Computers and Systems Dept., ERI
2. Personal handwriting style aspects
1. Glyph and the size of characters
2. Pressure distribution and the slant of
handwriting
3. Relative sizes of the middle, the upper, and
the lower zones of letters
4. Existence and the shape of lead-in,
connecting, and ending parts
5. Letter, the word, and the line spacings
6. Embellishment
7. Simplified or neglected strokes
PEIT 2013 2
3. What is handwriting synthesis?
Converting ASCII text into the user’s
personal handwriting.
PEIT 2013 3
4. What is handwriting synthesis
needed for?
Forensic examiners
The disabled
Handwriting recognition systems
Biometric security systems
Information retrieval (web search)
Natural language understanding
Personalization of pen-computing devices
PEIT 2013 4
6. Handwriting synthesis strategies
Duplicated Samples
starts from real samples of a given person
produces duplicated samples corresponding to the
same person.
increase the amount of already acquired
handwriting data
does not to generate completely new datasets
The great majority of existing approaches for
synthetic signature generation are based on
this type of strategy.
PEIT 2013 6
7. Handwriting synthesis strategies
Combination of different real samples
◦ starts from a pool of real n-grams and using some
type of concatenation
◦ needs real samples to generate the synthetic
handwriting trait
◦ utility for performance evaluation and vulnerability
assessment is very limited
Useful to produce multiple handwriting
samples of a given real user, but not to
generate synthetic individuals.
PEIT 2013 7
8. Handwriting synthesis strategies
Synthetic-individuals
◦ A priori knowledge about handwriting trait is used
to create a model that characterizes that
handwriting trait for a population.
◦ New synthetic individuals can be generated
sampling the constructed model.
Doesn't need any real handwriting samples to
generate completely synthetic databases.
Thus, overcomes the usual shortage of
handwriting data collection.
PEIT 2013 8
9. Methods for handwriting synthesis
Movement simulation techniques
Shape-simulation techniques
PEIT 2013 9
10. Methods for handwriting synthesis
Movement simulation techniques
◦ Studies starting from the late sixties have
analyzed and studied the human handwriting
moments and the human "motor code" for the
production of cursive handwriting
PEIT 2013 10
11. Methods for handwriting synthesis
Movement simulation techniques
◦ The handwriting trajectory is analyzed and
modeled by velocity or force functions
◦ focus on the representation and analysis of real
handwriting signals rather than handwriting
synthesis
◦ May not be convenient for synthesizing non-
cursive handwriting
PEIT 2013 11
12. Methods for handwriting synthesis
Shape-simulation techniques
◦ The straightforward approach to synthesize
handwriting from collected handwritten glyphs.
◦ Each glyph is a handwriting sample image of
one, two or three letters.
◦ When synthesizing a word, glyphs are simply
juxtaposed in sequence and are connected
using simple curves causing unnatural looking
(discontinuities)
PEIT 2013 12
13. Methods for handwriting synthesis
Shape-simulation techniques
◦ Also called glyph-based methods
◦ usually record the glyphs directly and reuse or
sample the glyphs when synthesis.
◦ Require intensive user involvement in the
sample collection process and cannot produce
various handwriting styles in a natural way
◦ tries to learn a separate model for the
connection of each pair of letters (impractical
due to limited training set)
PEIT 2013 13
14. Challenges of handwriting synthesis
the seed data samples from which character
models are designed (variability vs.
generalization
the model design and style-dependant
parameters selection
how to generate (deform) a synthesized sample
finding the best synthesized samples to compose
a word
how to concatenate the glyphs without
introducing discontinuities
how to make the handwriting look natural
how to evaluate the quality of the synthesized
word
PEIT 2013 14
15. Challenges of handwriting synthesis
For the glyph-based approaches, the main
challenge is to collect a large enough seed
dataset of natural human handwritings
encountering a wide range of writer
variability.
Meanwhile this violates the main target
beyond synthesis, which is, generating
large amount of data independent from
natural sources.
PEIT 2013 15
16. Challenges of handwriting synthesis
Model Selection: it is difficult to find a
reliable description of a word able to
represent all the admitted occurrences of
the input shape.
Deformation: find an optimal distortion
parameter range and control the
variability of data to ensure that the
synthetic handwriting is natural enough.
PEIT 2013 16
17. Challenges of handwriting synthesis
Natural looking of the synthesized data
(cursiveness): the system has to
determine which pair of adjacent letters in
a word is connected.
It would be ideal if we compute the
connection probability from handwritten
samples. However, it is impractical as this
requires large amount natural handwriting
samples.
PEIT 2013 17
18. Challenges of handwriting synthesis
Objective evaluation:
◦ Most researchers do not evaluate the quality of
the synthesis process
◦ others use human expertise
◦ using HMM recognizers to justify the
improvement in recognition accuracy due to
adding synthetic handwriting to the training
set.
PEIT 2013 18
19. Synthesis in biometric security
Synthetically generated biometric databases:
i) facilitate the performance evaluation of
recognition systems instead of the costly and
time-consuming real biometric databases
ii) provide a tool with which to evaluate the
vulnerability of biometric systems to attacks
carried out with synthetically generated traits
PEIT 2013 19
20. Synthesis in biometric security
A system example is given to generate synthetic
online signatures. A Discrete Fourier Transform
(DFT) of the trajectory x-y signals is generated.
Deformations: Horizontal and vertical affine
scaling, Duration expansion or contraction, and
Noise addition are applied.
Finally, signals are refined and processed in the
time domain to give more realistic appearance
(smoothing, translation, rotation and scaling
transformations are applied).
PEIT 2013 20
21. Synthesis in Information retrieval
Handwriting synthesis makes it possible to
perform text searches on handwritten word
image databases when no ground-truth data is
available.
The handwritten string is treated as a
pictographic pattern without an attempt to
understand it.
In this approach the query string is compared to
database strings using an appropriate distance
function allowing user to extend his search to
include also non-ASCII symbols
PEIT 2013 21
22. Synthesis in Information retrieval
A system example is given to perform text searches
on handwritten word image databases when no
ground-truth data is available.
The approach proceeds by synthesizing multiple
images of the query string using different computer
fonts. Normalization and feature extraction operations
are applied.
The model is trained using synthetic images, and
produces samples according to the distribution of
handwritten features. Finally, unsupervised font
selection method leverages the font contributions to
best represent handwritten data, and yields
significant improvements in accuracy.
PEIT 2013 22
23. Synthesis in handwriting recognition
The problem of general unconstrained word
recognition is that the recognition rates are
low due to shortage of a good quality training
data of large amount.
Expanding the training set by collecting
additional natural human written texts is
error prone, expensive and labor- and time-
consuming process. Alternatively, the training
set can be expanded by synthetic texts.
PEIT 2013 23
24. Synthesis in handwriting recognition
Several methods have been overviewed. Most of
them generate the synthetic texts from existing
natural ones.
Some approaches use human written character tuples
to build up synthetic texts, others decompose the
glyph to a set of control points and in-between
curves.
synthetic texts are generated by applying random
perturbations on human written characters
Some researchers use horizontal connection lines or
simple polynomials as within-word connections.
PEIT 2013 24
25. Synthesis in pen based computing
The recent emergence of pen computers with high
resolution tablets has made available dynamic (temporal)
information as well as created the need for robust online
handwriting recognition algorithms.
Handwriting is preferable to typed text in some cases
because it adds a personal touch.
When writing a note on a Tablet PC, if the computer can
automatically correct some written errors and generate
some predefined handwriting strokes, this would be more
effective and intelligent.
Researchers working on the online problem usually use the
same techniques used for offline problem solution
PEIT 2013 25
26. Conclusions and opinions
Researchers are either tending to use local
datasets of their own or using very limited
part of a public dataset.
This may be attributed to using human
expertise for evaluation (laborious with large
datasets).
Finding reliable automatic evaluation scheme
accelerates the process and allow using much
bigger dataset.
PEIT 2013 26
27. Conclusions and opinions
Researchers tend to collect isolated glyph
samples which leads to unnatural looking at
synthesizing words.
This might be solved if they use cursive
words for training.
Automation will be needed for training
dataset preprocessing (specially
segmentation) to help enlarging the dataset
used.
PEIT 2013 27
28. Conclusions and opinions
In the generation process, the glyph model
produces samples according to the features
distribution extracted from the whole training
dataset.
In case of large variances, such method may
deform the model and lead to odd looking
generated samples.
Writer style clustering and generating
different models for the same character per
cluster will help restrict the distortion range.
PEIT 2013 28
29. Conclusions and opinions
Researchers tend to use almost the same
features or subset of them (geometric,
polynomials, filter control points, etc.) which
may be a good reason for limited
performance.
In case of not finding other novel descriptive
features, feature fusion techniques might
help change the distribution in feature space.
Consequently, modeling will be better.
PEIT 2013 29
30. Conclusions and opinions
Researchers tend to evaluate their synthesis
process are using recognizers (especially
HMM) by adding the synthetic datasets to the
training data and test their system to
recognize natural handwritten data.
Recognizers can be used to enhance
synthesis not to evaluate. The training data
set has to be natural handwritten data and
the test dataset should be synthetic.
PEIT 2013 30
31. Conclusions and opinions
Targeting better recognition performance
will tune the distortion and modify the
synthesized glyph samples look.
Reaching a proper recognition rate, may
then qualify the synthetic dataset to be
added as training data for another
recognizer type for evaluation.
PEIT 2013 31