Expert Systems With Applications 131 (2019) 219–239
A novel character segmentation-reconstruction approach for license plate recognition

Vijeta Khare (a), Palaiahnakote Shivakumara (b), Chee Seng Chan (b), Tong Lu (c,∗), Liang Kim Meng (d), Hon Hock Woon (d), Michael Blumenstein (e)

a Department of Computer Science and Software Engineering, Concordia University, Montreal, Canada
b Faculty of Computer Systems and Information Technology, University of Malaya, Malaysia
c National Key Lab for Novel Software Technology, Nanjing University, Nanjing, China
d Advanced Informatics Lab, MIMOS Berhad, Kuala Lumpur, Malaysia
e Faculty of Engineering and Information Technology, University of Technology Sydney, Australia
Article info
Article history:
Received 26 November 2018
Revised 2 March 2019
Accepted 16 April 2019
Available online 18 April 2019
Keywords:
Character segmentation
Character reconstruction
Stroke width
Zero crossing
Gradient vector flow
License plate recognition
Abstract
Developing an automatic license plate recognition system that can cope with multiple adverse factors is challenging and interesting in the current scenario. In this paper, we introduce a new concept called partial character reconstruction to segment characters of license plates to enhance the performance of license plate recognition systems. Partial character reconstruction is proposed based on the characteristics of stroke width in the Laplacian and gradient domains in a novel way. This results in character components with incomplete shapes. The angular information of character components determined by PCA and the major axis is then studied, by considering the regular spacing between characters and the aspect ratios of character components, in a new way for segmenting characters. Next, the same stroke width properties are used for reconstructing the complete shape of each character in the gray domain rather than in the gradient domain, which helps in improving the recognition rate. Experimental results on benchmark license plate databases, namely, MIMOS, Medialab, UCSD data, Uninsbria data and Challenged data, as well as video databases, namely, ICDAR 2015 and YVT video, and natural scene data, namely, ICDAR 2013, ICDAR 2015, SVT and MSRA, show that the proposed technique is effective and useful.
© 2019 Elsevier Ltd. All rights reserved.
1. Introduction
Creating a smart/digital/safe city has been one of the important emerging trends in both developing and developed countries in recent times. As a result, developing automatic systems has become an integral part of the above-mentioned initiatives (Rathore, Ahmad, Paul, & Rho, 2016; Yuan et al., 2017). One such example is developing intelligent transport systems for safety and mobility, and enhancing public welfare with the help of advanced technologies by recognizing license plates (Anagnostopoulos, Anagnostopoulos, Loumos, & Kayafas, 2006; Du, Ibrahim, Shehata, & Badawy, 2013; Suresh, Kumar, & Rajagopalan, 2007). Transport systems have been proposed in the literature for recognizing license plates in applications such as the automatic collection of toll fees, automatic monitoring of car speeds on the road, automatic estimation of traffic volume at different traffic junctions, and detection of illegal parking and incorrect traffic flows (Abolghasemi & Ahmadyafrd, 2009; Azam & Islam, 2016; Tadic, Popovic, & Odry, 2016). However, such a system only works well for a particular application, since it is not developed for multiple applications: any particular system can cope with a single adverse factor, but not with multiple factors that affect license plate visuals (Zhou, Li, Lu, & Tian, 2012).

∗ Corresponding author.
E-mail addresses: shiva@um.edu.my (P. Shivakumara), cs.chan@um.edu.my (C.S. Chan), lutong@nju.edu.cn (T. Lu), liang.kimmeng@mimos.my (L.K. Meng), hockwoon.hon@mimos.my (H.H. Woon), Michael.Blumenstein@uts.edu.au (M. Blumenstein).
In addition, most of the existing systems that have been developed use conventional binarization methods, which were proposed for plain-background document images, to localize and recognize license plates (Ghaili, Mashohor, Ramli, & Ismail, 2013; Du et al., 2013; Yu, Li, Zhang, Liu, & Meng, 2015). It is obvious that for different real-time applications, multiple environmental effects are common (e.g., low resolution, low contrast, complex backgrounds, blur due to camera or vehicle movements, illumination effects due to sunlight or headlights, degradation effects due to rain, fog or haze, and distortion effects due to camera angle variations).
The illustration in Fig. 1 demonstrates that input image-1 is affected by perspective distortion, while input image-2 is affected by blur, as shown in Fig. 1(a).
(https://doi.org/10.1016/j.eswa.2019.04.030)
Fig. 1. Binarization and recognition results of license plate images affected by different effects, which result in varying recognition results.
For these two license plate
images, the binarization method (Zhou, Feild, Learned-Miller, & Wang, 2013), which is a state-of-the-art method that works well for low-contrast and complex-background images, fails to give good results for input image-1, but gives better results for input image-2, as shown in Fig. 1(b). However, the recognition result given by Tesseract OCR is empty for input image-1 due to touching, and incorrect for input image-2 due to shape loss, as shown in Fig. 1(b). On the other hand, the proposed method works well, except for the first character in input image-1, through reconstruction-segmentation with the same OCR. From this illustration, one can conclude that there is an urgent need for a system that can withstand multiple adverse factors, so that the same system can be used successfully for several real-time applications.
2. Related work

The proposed license plate recognition system involves character segmentation through partial reconstruction, and complete reconstruction for recognition. Therefore, we review the research related to character segmentation, character recognition and character reconstruction.
Character Segmentation: Phan, Shivakumara, Su, and Tan (2011) proposed a gradient-vector-flow based method for video character segmentation. The method uses text line length for finding seed points, which is unreliable, and then uses minimum cost path estimation for finding spaces between characters. Sharma, Shivakumara, Pal, Blumenstein, and Tan (2013) proposed a new method for character segmentation from multi-oriented video words. The method is sensitive to dominant points. Liang, Shivakumara, Lu, and Tan (2015) proposed a new wavelet-Laplacian method for arbitrarily-oriented character segmentation in video text lines. This method explores zero crossing points to find spaces between words or characters. The performance of the method degrades when an image contains noisy backgrounds.
There are also methods proposed for segmenting characters from license plate images. For example, Tian, Wang, Wang, Liu, and Xia (2015a) proposed a two-stage character segmentation method for Chinese license plates. This method relies on binarization for segmentation. Sedighi and Vafadust (2011) proposed a new and robust method for character segmentation and recognition in license plate images. This method uses a classifier and binarization for segmentation; as a result, the method is dataset dependent. Khare, Shivakumara, Raveendran, Meng, and Woon (2015) proposed a sharpness-based approach for character segmentation of license plate images. The method explores gradient vectors and sharpness for segmentation. However, the method is said to be sensitive to seed point selection and the presence of blur. Kim, Song, Lee, and Ko (2016) proposed an effective character segmentation approach for license plate recognition under varying illumination environments. The method uses binarization and the super-pixel concept for segmentation. However, the method focuses on a single cause, not multiple causes.
In the same way, Dhar, Guha, Biswas, and Abedin (2018) recently proposed a system design for license plate recognition using edge detection and convolutional neural networks. The method uses character segmentation as a preprocessing step for license plate recognition; for character segmentation, it explores edge detection, morphological operations and region properties. However, the method is good for images with simple backgrounds, but not for images affected by many challenges. Ingole and Gundre (2017) proposed character feature-based vehicle license plate detection and recognition. The method first segments characters from license plate regions for recognition; for character segmentation, it proposes vertical and horizontal projection profile-based features. These projection profile-based features may not be robust for images with complex backgrounds. Radchenko, Zarovsky, and Kazymyr (2017) proposed a segmentation and recognition method for Ukrainian license plates. The method segments characters based on connected component analysis, which works well when the input image can be binarized without loss of character shapes and without touching between characters. However, for images with complex backgrounds, it is hard to design a binarization method that separates foreground from background information.
In summary, from the above context, we can conclude that most of these methods attempt to solve the problem of low resolution or illumination effects, but do not address other distortions such as blur, touching and complex backgrounds. In addition, none of the methods explore the concept of reconstruction for segmenting characters from license plate images.
Character Recognition: To recognize characters in text lines of video, natural scene images and license plate images, there are methods that use either binarization methods or classifiers (Ye & Doermann, 2015). For example, Zhou et al. (2013) proposed scene text binarization via inverse rendering. The method proposes a different idea for adapting parameters that tune the method according to image complexity. However, the assumptions made in its criteria limit its ability to work on different applications. Wang, Shi, Xiao, and Wang (2015) proposed MRF-based text binarization for complex images using stroke features. The success of the method depends on how well it selects seed pixels from the foreground and background. Similarly, Anagnostopoulos et al. (2006) proposed a license plate recognition algorithm for intelligent transportation applications. Since the method involves binarization and a classifier for recognition, it may not work well for images affected by multiple adverse effects such as low resolution, blur and touching. Saha, Basu, and Nasipuri (2015) proposed automatic license plate recognition for Indian license plate images. The method involves edge map generation, the Hough transform and a classifier for recognition; its success depends on the edge map generation and the classifier. Gou, Wang, Yao, and Li (2016) proposed vehicle license plate recognition based on extremal regions and restricted Boltzmann machines. The method extracts HoG features for detected characters, and then uses a classifier for recognition. In summary, it is noted from the above review that most license plate recognition approaches rely on binarization algorithms and classifiers. In addition, these methods do not consider images affected by multiple factors. Therefore, they lose generality and the ability to work on license plate images of different background and foreground complexities.
Deep Learning Models for Character Recognition: Jaderberg, Simonyan, Vedaldi, and Zisserman (2016) proposed an approach for reading text in the wild with a convolutional neural network, which explores deep learning to achieve high recognition rates for text in natural scene images. Goodfellow, Bulatov, Ibarz, Arnoud, and Shet (2013) proposed multi-digit number recognition from street view imagery using deep convolutional neural networks, which explores deep learning at the pixel level. Despite both methods addressing the challenges posed by natural scene images, they are limited to text recognition from high-contrast images, not from low-resolution license plate images and video images. Raghunandan et al. (2017) proposed a Riesz fractional-based model for enhancing license plate detection and recognition. This method attempts to address the causes which affect license plate detection and recognition. Based on the experimental results, it is noted that enhancement of license plate images may improve the recognition results, but it is not adequate for real-time applications. Al-Shemarry, Li, and Abdulla (2018) proposed an ensemble of AdaBoost cascades of 3L-LBP classifiers for license plate detection from low-quality images. The method explores texture features based on LBP operations and uses a classifier for license plate detection from images affected by multiple adverse factors. However, the performance of the method depends heavily on learning and the number of labeled samples. In addition, its scope is limited to text detection, not recognition as in the proposed work; text detection is easier than recognition in this case because detection does not require the full shapes of characters.
Recently, inspired by the strong discriminative power of deep learning models, some methods have explored different deep learning models for license plate recognition. For example, Dong, He, Luo, Liu, and Zeng (2017) proposed a CNN-based approach for automatic license plate recognition in the wild, which explores an R-CNN for license plate recognition. Bulan, Kozitsky, Ramesh, and Shreve (2017) proposed segmentation- and annotation-free license plate recognition with deep localization and failure identification. The method explores CNNs for detecting a set of candidate regions, and then filters false positives from the candidate regions using strong CNNs. Silva and Jung (2018) proposed license plate detection and recognition in unconstrained scenarios. The method explores CNNs to address challenges caused by degradation: it detects the license plate region first, and the detected region is then fed to an OCR for recognition. Lin, Lin, and Liu (2018) proposed an efficient license plate recognition system using convolutional neural networks. The method detects vehicles for license plate region detection and then explores CNNs for recognition. Yang et al. (2018) proposed Chinese vehicle license plate recognition using kernel-based Extreme Learning Machines (ELMs) with deep convolutional features; the method explores the combination of a CNN and an ELM for license plate recognition. It is found from the above discussion that deep learning models work well when a huge number of labeled, predefined samples is available. However, it is hard to choose predefined samples that represent all possible variations in license plate recognition, especially for images affected by multiple adverse factors, as in the proposed work. In addition, deep learning has its own inherent limitations, such as optimizing parameters for different databases and maintaining the stability of deep neural networks (Liu et al., 2017). It can be noted from the above discussion that there is a gap between the state-of-the-art methods and the present demand. This observation motivated us to propose a new method for license plate recognition that does not depend much on classifiers and a large number of labeled samples, as the existing methods do.
Character Reconstruction: Similar to the proposed work, there are methods in the literature which reconstruct character shapes to improve recognition rates without the help of classifiers and binarization algorithms. Shivakumara, Phan, Bhowmick, Tan, and Pal (2013) proposed a ring radius transform for character shape reconstruction in video. Its performance is good as long as Canny produces the correct character structures; however, Canny is sensitive to blur and other distortions. To overcome this drawback, Tian, Shivakumara, Phan, Lu, and Tan (2015b) proposed a method for character shape restoration using gradient orientations, which finds the medial axis in the gradient domain with different directions. However, the method does not work well for characters with blur and complex backgrounds. In addition, the primary objective of that work is to reconstruct characters from video, which suffer from low resolution and low contrast; it does not deal with license plate images.
In light of the above discussions on character segmentation from license plate images, character recognition from license plate images, and character reconstruction, most of the methods focus on a particular dataset and certain applications, such as natural scene images, video images or license plate images. As a result, the scope of the above methods is limited to specific applications and objectives. This motivated us to propose a method that can work well for license plate images, natural scenes and video images. In addition, license plate images are generally affected by multiple adverse factors due to background and foreground variations, making the problem of recognition more complex and interesting.
Inspired by the work of Shivakumara et al. (2019), where keyword spotting is addressed for multiple types of images with powerful feature extraction, we propose a novel idea for recognizing characters from license plates affected by multiple factors. The key contributions of the proposed work are as follows: (1) proposing partial reconstruction for segmenting characters from license plate images is novel; (2) reconstructing complete shapes of characters from segmented characters without binarization, which can work well not only for license plate images but also for natural scene and video images, is also novel; (3) the combination of reconstruction and character segmentation in a new way is another interesting step to achieve good recognition rates for multi-type images. The main advantage of the proposed method is that, since the proposed reconstruction approach preserves character shapes, the performance of the method does not depend much on classifiers and the number of training samples.
The proposed method is structured as follows. Stroke width pair candidate detection, by estimating stroke width distances for each pixel in the image, is presented in Section 3.1. In Section 3.2, we propose symmetry properties based on stroke width distances to obtain partial reconstruction results. Section 3.3 proposes character segmentation using the partial reconstruction results, based on the principal and major axis information of the character components. We describe the steps for complete reconstruction in the gray domain in Section 3.4.
3. Proposed technique
This work considers license plates affected by multiple factors arising in various applications, such as low resolution, low contrast, complex backgrounds, multiple fonts or font sizes, blur, multi-orientation, touching elements and distortion due to illumination effects, as input for character segmentation and recognition.
To overcome the problems of low contrast and low resolution, and inspired by Laplacian and gradient operations, which usually enhance high-contrast information at or near edges by suppressing background information (Phan et al., 2011; Liang et al., 2015; Khare et al., 2015), we use Laplacian and gradient information to find the pixels which represent the stroke width (thickness of the stroke) of characters in license plate images. This is justified because the Laplacian, being a second-order derivative, gives high positive and negative values at and near the edges, respectively; similarly, the gradient, being a first-order derivative, gives high positive values at and near the edges. This information is used for Stroke Width Pair (SWP) candidate detection. Stroke width (stroke width distance) and color remain constant throughout a character, regardless of font or font size variations (Epshtein, Ofek, & Wexler, 2010). Most of the time, license plates are prepared using upper case letters, and the spacing between characters in license plate images is almost constant. Based on these facts, we propose new symmetry features which use Laplacian and gradient properties at the SWP candidates to find neighboring SWPs. However, due to complex backgrounds, severe illumination effects and blur, some SWPs may fail to satisfy the symmetry features. This results in a loss of information, and hence we consider the output of this step as a partial reconstruction. We believe that the partial reconstruction results preserve the structure of character components, although this may lead to under- and over-segmentation.
It is understood that the eigenvectors from PCA give angles based on the pixels which contribute to the direction of character components (Shivakumara, Dutta, Tan, & Pal, 2014). In other words, to estimate the possible angle of a whole character, PCA does not require the full character information. According to our experiments, if a component contains more than 50% of the character's pixels, one can in general expect almost the same angle as for the actual character. The same is true for angle estimation via the major axis of the character. With this motivation, we use the angle information given by PCA and the Major Axis (MA) to estimate the angles of character components, and this angle information from PCA and MA is explored for character segmentation. Since the proposed symmetry properties are sensitive to blur, touching and complex backgrounds, we propose the same symmetry properties with weaker conditions in the gray domain, instead of the Laplacian and gradient domains, to reconstruct the full character shape with the help of the Canny edge image of the input image. This is possible because there is no influence from neighboring characters once the characters have been segmented from the image. The reconstructed characters are passed to Tesseract OCR for recognition. The flow of the proposed method is shown in Fig. 2.
Fig. 2. Pipeline of the proposed method.
3.1. Stroke width pair candidate detection
As mentioned in the previous section, the stroke width distances (thickness of the stroke) of characters in a license plate image are usually the same, as shown in Fig. 3(a). To extract the stroke width distance, we apply a Laplacian operation, which gives high positive and negative responses for the transition from background to foreground and vice versa, respectively. This amounts to searching for two zero crossing points that define the stroke width distance, as shown in Fig. 3(b) and 3(c), where a pictorial representation of the marked region in Fig. 3(b) is given. Since the input images considered have complex backgrounds and small orientations due to angle variations, we use the following mask to extract horizontal, vertical and diagonal zero crossing points. Due to background variations and noise introduced by the Laplacian operation, as shown in Fig. 3(b), background and noise pixels may also contribute to defining stroke width distances. To overcome this issue, we plot a histogram of the stroke width distances, as shown in Fig. 3(c), and choose the distances contributing to the highest peak as candidate stroke width pairs, which are shown in Fig. 3(d), where the red pixels denote stroke width pair candidates. This is justified because the number of stroke pixel pairs that define the actual stroke width distance is higher than the number of pixel pairs defined by background or noise pixels. In this way, the proposed step can withstand background noise and degradations. It may be noted from Fig. 3(d) that Stroke Width Pair (SWP) candidates represent character strokes, and that each character has a set of SWPs. It is evident from Fig. 3(e) that the proposed technique detects SWPs for the complex image in Fig. 1(a), where touching exists due to perspective distortion.
It is noted from Fig. 3(d) that the number of red pixels differs from one character to another. This is because the proposed steps estimate the stroke width distance by considering all the pixels of the characters, not the pixels of individual characters. Since we consider the common stroke width distance of the pixels in the image, the number of stroke width pairs varies from one region to another due to background complexity. As a result, not all character pixels contribute to the highest peak in the histogram, and one cannot predict the number of stroke width pairs for each character, as shown in Fig. 3(d). However, the proposed method has the ability to restore the character shape from a single stroke width pair per character through the partial reconstruction step. We believe that each character gets at least one stroke width pair from the histogram operation for the partial reconstruction step, because the characters follow the same font size and typeface.
Fig. 3. Stroke width pair candidate detection.
Laplace Mask =
[ 1   1   1 ]
[ 1  −8   1 ]
[ 1   1   1 ]
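The Laplacian zero-crossing search and the histogram vote over stroke width distances described above can be sketched as follows. This is a minimal illustration under our own assumptions (a grayscale image as a list of rows, row-wise scanning only, zero padding at the border); the function names are ours, not the authors'.

```python
# Sketch of stroke-width-pair candidate detection (Section 3.1): convolve with
# the 3x3 Laplace mask, find zero crossings per row, then keep the stroke
# width voted for by the highest histogram peak. Names are our assumptions.
from collections import Counter

LAPLACE_MASK = [[1, 1, 1], [1, -8, 1], [1, 1, 1]]

def laplacian(img):
    """Convolve the image with the 3x3 Laplace mask (zero padding)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            s = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        s += LAPLACE_MASK[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = s
    return out

def row_stroke_widths(lap_row):
    """Distances between consecutive sign changes (zero crossings) in a row."""
    widths, last = [], None
    for x in range(1, len(lap_row)):
        if lap_row[x - 1] * lap_row[x] < 0:  # sign change => zero crossing
            if last is not None:
                widths.append(x - last)
            last = x
    return widths

def dominant_stroke_width(img):
    """Histogram all widths and keep the highest peak, as the paper describes."""
    widths = []
    for row in laplacian(img):
        widths.extend(row_stroke_widths(row))
    return Counter(widths).most_common(1)[0][0] if widths else None
```

On a synthetic image with strokes of width 3 separated by wider gaps, the width-3 votes outnumber the gap votes, so the histogram peak recovers the stroke width despite the background pairs, mirroring the argument above.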
3.2. Partial character reconstruction
The proposed technique considers the SWP candidates given by the previous section as representatives to find neighboring SWP candidates, which define the stroke width of the character. To achieve this, for each SWP candidate, the proposed technique considers the eight neighbors of the two stroke pixels and then checks all the combinations to identify the correct SWP, as shown in Fig. 4(a), which illustrates the process of searching for the right neighbor SWP. In this work, the proposed method uses an 8-directional code for searching for the correct stroke width pair: since one can expect 8 neighbor pixels for each stroke pixel of the pair, the total number of combinations is 8 × 8 = 64 pairs. The reason for considering 8 neighbors for each stroke pixel is to ensure that the step does not miss checking any pair of pixels. Since stroke pixels represent edge pixels of characters, we can expect high gradient values compared to their background. Similarly, the pixels between the stroke pixels represent a homogeneous background, and the gradient gives low values for these pixels compared to the gradient values of the stroke pixels, as shown in Fig. 4(b) (Khare et al., 2015). Therefore, we study the gradual changes from high to low and low to high, as shown in Fig. 4(c); these gradual changes in gradient values are defined as the Gradient Symmetry (GS) feature. When we look at the Gradient Vector Flow (GVF) of the stroke pixels, as shown in Fig. 4(d), we can observe arrows pointing towards the edges; the arrows of the two stroke pixels point in opposite directions. This is called the GVF Symmetry (GVFS) feature, as shown in Fig. 4(e). Similarly, we consider the value of the positive peak of the Laplacian and the difference between the positive and negative peak values for finding symmetry. In this way, we find the neighboring SWP of each SWP candidate, as shown in Fig. 4(f), where one can see positive and positive-negative peaks. This is called the Laplacian Symmetry (LS) feature. The proposed technique extracts four symmetry features for each SWP candidate, and then checks the four symmetry features against all 64 combinations. It chooses the combination which satisfies the four symmetries as the neighboring SWP, and the pair is displayed as white pixels. The identified neighbor SWP is then treated as an SWP candidate, and the whole process repeats recursively to find all the neighboring SWPs in the image. The process stops when all SWPs have been visited; the number of iterations depends on the complexity of the characters and the number of SWPs of each character. As long as a stroke width pair satisfies the symmetry properties, the partial reconstruction step restores the contour pixels of the characters. When SWPs fail to satisfy the symmetry properties, or there are no more SWPs to visit, the iterative process terminates.
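The recursive propagation over the 64 neighbor combinations can be sketched as a breadth-first search. The symmetry test is abstracted here as a callable, since the full feature extraction is image-dependent; all names are our own, not the paper's.

```python
# Sketch of the recursive neighbor-SWP search (Section 3.2): each accepted
# pair seeds its 8 x 8 = 64 candidate neighbor pairs, and propagation stops
# when no new pair passes the (abstracted) four-way symmetry test.
from collections import deque

OFFSETS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]

def grow_partial_reconstruction(seed_pairs, satisfies_symmetry):
    """Breadth-first propagation from seed SWPs; returns all accepted pairs."""
    accepted, queue = set(seed_pairs), deque(seed_pairs)
    while queue:
        p1, p2 = queue.popleft()
        for d1 in OFFSETS:                 # 8 moves for the first stroke pixel
            for d2 in OFFSETS:             # x 8 moves for the second => 64 pairs
                pair = ((p1[0] + d1[0], p1[1] + d1[1]),
                        (p2[0] + d2[0], p2[1] + d2[1]))
                if pair not in accepted and satisfies_symmetry(pair):
                    accepted.add(pair)     # the pair would be painted white
                    queue.append(pair)
    return accepted
```

As in the text, the loop terminates when no candidate pair passes the symmetry test or no unvisited pairs remain, which is why the output is only a partial reconstruction of the character contour.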
Fig. 4. Exploiting symmetrical features for finding neighbor SWPs from 64 combinations.
This is why partial reconstruction yields the partial shape of the character, as shown in Fig. 5, which presents the intermediate steps of the partial reconstruction results. It can also be noted from Fig. 5 that the partial reconstruction results provide the structures of the characters with some loss of information.
The four symmetry features are defined specifically as follows.
(i) Let G_SW = {g_SW1, g_SW2, …, g_SWn} and G_NP = {g_NP1, g_NP2, …, g_NPn}, where n is the size of the stroke width (SW), and g_SWn and g_NPn represent the gradient values of the stroke width and the Neighbor Pair (NP) at location n, respectively. Then NP = 1 iff g_SW1 == g_NP1, g_SW2 == g_NP2, …, g_SWn == g_NPn. Gradient symmetry can be visualized as in Fig. 4(b).
Fig. 5. Intermediate and final partial reconstruction results.
(ii) The angle information of the GVF at the starting point (sp) and end point (ep) of the stroke width is represented as GVF_SW(sp) and GVF_SW(ep). Then NP = 1 iff GVF_NP(sp) == GVF_SW(sp) and GVF_NP(ep) == GVF_SW(ep), where GVF_NP(sp) and GVF_NP(ep) represent the angle information of the GVF at the starting point and end point of NP, respectively. GVF angle symmetry can be visualized as in Fig. 4(e).
(iii) The peak values of the stroke width Laplacian (L) at the starting point and end point are represented as P_L_SW(sp) and P_L_SW(ep), respectively, and the peak values of the neighbor pair Laplacian at the starting and end points are denoted by P_L_NP(sp) and P_L_NP(ep), respectively. Then NP = 1 iff P_L_NP(sp) == P_L_SW(sp) and P_L_NP(ep) == P_L_SW(ep).
(iv) Similarly, the difference between the highest and lowest peaks at the Laplacian zero crossings is also used for comparing neighbor pairs. Let the highest and lowest peaks of the Laplacian zero-crossing points for the stroke width be hP_L_SW and lP_L_SW, and for the neighbor pair hP_L_NP and lP_L_NP. The high-to-low differences are then defined as Diff_SW = hP_L_SW − lP_L_SW and Diff_NP = hP_L_NP − lP_L_NP, and NP = 1 iff Diff_NP == Diff_SW. Laplacian symmetries (iii) and (iv) can be visualized as in Fig. 4(f).
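Under our reading of definitions (i)-(iv), the four tests reduce to simple predicates over the sampled per-pixel quantities. The tolerance arguments are our addition, since exact equality of gradients or Laplacian peaks rarely holds on real images; the feature-dictionary layout is likewise an assumption for illustration only.

```python
# Sketch of the four symmetry tests (i)-(iv) between a stroke width (SW)
# and a candidate Neighbor Pair (NP). Tolerances and names are ours.
def gradient_symmetry(g_sw, g_np, tol=1e-6):
    """(i) the gradient profiles across SW and NP match element-wise."""
    return len(g_sw) == len(g_np) and all(
        abs(a - b) <= tol for a, b in zip(g_sw, g_np))

def gvf_symmetry(gvf_sw, gvf_np, tol=1e-6):
    """(ii) GVF angles agree at the starting (sp) and end (ep) points."""
    return (abs(gvf_sw["sp"] - gvf_np["sp"]) <= tol and
            abs(gvf_sw["ep"] - gvf_np["ep"]) <= tol)

def laplace_peak_symmetry(p_sw, p_np, tol=1e-6):
    """(iii) positive Laplacian peaks agree at sp and ep."""
    return (abs(p_sw["sp"] - p_np["sp"]) <= tol and
            abs(p_sw["ep"] - p_np["ep"]) <= tol)

def laplace_diff_symmetry(h_sw, l_sw, h_np, l_np, tol=1e-6):
    """(iv) highest-to-lowest Laplacian zero-crossing differences agree."""
    return abs((h_sw - l_sw) - (h_np - l_np)) <= tol

def is_neighbor_pair(f_sw, f_np):
    """NP = 1 iff all four symmetries hold."""
    return (gradient_symmetry(f_sw["g"], f_np["g"]) and
            gvf_symmetry(f_sw["gvf"], f_np["gvf"]) and
            laplace_peak_symmetry(f_sw["peaks"], f_np["peaks"]) and
            laplace_diff_symmetry(f_sw["hp"], f_sw["lp"],
                                  f_np["hp"], f_np["lp"]))
```

Note that test (iv) compares only the peak-to-peak *difference*, so two pairs can pass it with different absolute peak heights, which is why (iii) and (iv) are applied together.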
3.3. Character segmentation
Looking at the partial reconstruction results given by the previous section, shown in Fig. 5(f) and 5(g), one can see that even though there is a loss of shape, the results still provide enough structure to find the spacing between characters and the character regions for segmentation. As mentioned in the proposed methodology section, Principal Component Analysis (PCA) and the Major Axis (MA) do not require the full character shape to estimate the possible directions of character components. It is also noted that most license plate images, including Malaysian license plates, contain upper case letters with numerals, but not a combination of upper case and lower case letters. According to the statement in Yao, Bai, Liu, Ma, and Tu (2012) that "for most text lines, the major orientations of characters are nearly perpendicular to the major orientation of the text line", both PCA and MA should give approximately 90° if the characters in the text are aligned in the horizontal direction. The above observations can be confirmed from the sample partial reconstruction results on the alphabet (A to Z) and the numerals (0-9) chosen from the databases, shown in Fig. 6, where we note that for both alphabet and numeral images, PCA (yellow axis) and MA (red axis) give angles that are almost the same and approximately 90°, because all the images are inclined in the vertical direction. Similarly, the same conclusion can be drawn from the results shown in Fig. 7(a)-7(b), where we present PCA and MA angle information for images affected by low contrast, complex backgrounds, multiple fonts, multiple font sizes, blur and perspective distortion. In the same way, the sample partial reconstruction results shown in Fig. 8(a)-8(b) for images of two character components show that PCA and MA give angles of almost 0°, as the character components are aligned in the horizontal direction.
The results in Figs. 6, and 7 show that partial reconstruction
has the ability to preserve character shapes regardless of differ-
ent causes, while PCA and MA have the ability to give the angle
of character orientation without the complete shape of the char-
acter components. This observation leads to define the following
hypothesis for character segmentation. If both axes give almost
90° with a ±26° difference, the component is considered a full
character; if both axes give almost zero degrees with a ±26°
difference, the component is considered an under-segmentation,
which is possible when two character components are joined
together, as shown in Fig. 8. Otherwise, the component is
considered a case of over-segmentation, which occurs when a
character loses its shape. The value of ±26° is determined based on
experimental results, which will be presented in the Experimen-
tal Section. The reason to fix such a threshold is that segmentation
requires either a vertical or horizontal orientation. With this idea,
the proposed technique classifies components from the partial re-
construction results into three cases.
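A minimal sketch of this three-case classification, assuming the component is given as a binary mask: the PCA angle comes from the covariance of the foreground coordinates, while the major axis is approximated here by the farthest-apart pixel pair (one plausible reading of MA; all names are our own):

```python
import numpy as np

def axis_angles(mask):
    """PCA angle from the covariance of foreground coordinates, and a
    major-axis (MA) angle from the farthest-apart foreground pixel pair.
    Angles in degrees, folded into [0, 90]. `mask` is a 2-D boolean array."""
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=1).astype(float)
    # PCA: eigenvector of the largest eigenvalue of the covariance matrix
    cov = np.cov(pts.T)
    w, v = np.linalg.eigh(cov)
    vx, vy = v[:, np.argmax(w)]
    pca = abs(np.degrees(np.arctan2(vy, vx)))
    pca = min(pca, 180 - pca)
    # MA: direction between the two farthest-apart pixels (O(n^2) sketch)
    d = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    i, j = np.unravel_index(np.argmax(d), d.shape)
    dx, dy = pts[j] - pts[i]
    ma = abs(np.degrees(np.arctan2(dy, dx)))
    ma = min(ma, 180 - ma)
    return pca, ma

def classify(pca, ma, tol=26.0):
    """The three cases of Section 3.3 with the +/-26 degree tolerance."""
    if abs(pca - 90) <= tol and abs(ma - 90) <= tol:
        return 'full character'
    if pca <= tol and ma <= tol:
        return 'under-segmentation'
    return 'over-segmentation'
```

A vertically elongated component yields angles near 90° (full character), a horizontally elongated one yields angles near 0° (two joined characters).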
In general, characters in license plate images share the same
aspect ratio especially height of characters, as shown in Fig. 5(a).
This observation motivated us to find the width of components of
three cases. If partial reconstruction outputs characters with clear
Fig. 6. Angle information given by PCA and MA for the alphabets and numerals of license plate images. The MA axis is represented by a red color and the PCA axis is
represented by a yellow color.
Fig. 7. PCA and MA angle information of the partial reconstruction result for the different distorted images.
Fig. 8. PCA and MA angle information of the partial reconstruction results for the image of two character components.
shapes, and all the components are classified as an ideal charac-
ter case according to angular information, the proposed technique
considers the width which contributes to the highest peak in the
histogram as the probable width. If the proposed technique does
not find a peak on the basis of width, it considers the average of
the width of the characters as a probable width. The same prob-
able width is used for segmenting characters as shown in Fig. 9,
where for the input license plate images in Fig. 9(a) and Fig. 9(b),
the proposed technique plots histograms using the probable width
as shown in Fig. 9(c), and the segmentation results given by the
probable width are shown in Fig. 9(d) and Fig. 9(e), respectively.
Fig. 9(d) and 9(e) show that segmentation using the probable
width segments almost all the characters for image-1 in
Fig. 9(a) except for “12”. For image-2 in Fig. 9(b), it segments
almost all the characters except for “W” and “U”. Therefore,
segmentation with probable widths works well in ideal cases, as shown
in Fig. 9(f), where for the complex image in Fig. 3(e), the proba-
ble width segments all the characters successfully using the partial
reconstruction results. However, it is not true for all the cases. For
example, it results in under-segmentation and over-segmentation
as shown in Fig. 9(d) and Fig. 9(e), respectively.
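The probable-width rule can be sketched as below. The bin size, the majority condition used to decide that the histogram has a "clear peak", and both function names are our assumptions:

```python
from collections import Counter

def probable_width(widths, bin_size=2):
    """Sketch (our reading of the paper's rule): quantize component widths
    into bins; if one bin clearly dominates, take its centre as the
    probable width, otherwise fall back to the average width."""
    bins = Counter(w // bin_size for w in widths)
    (top_bin, top_count), = bins.most_common(1)
    if top_count > len(widths) / 2:            # a clear histogram peak
        return top_bin * bin_size + bin_size / 2
    return sum(widths) / len(widths)           # no peak: average width

def cut_points(plate_width, prob_w):
    """Segment the plate into character-sized slices of the probable width."""
    cuts, x = [], 0
    while x + prob_w <= plate_width:
        cuts.append((x, x + prob_w))
        x += prob_w
    return cuts
```

The slices produced by `cut_points` correspond to the initial segmentation of Fig. 9; the under- and over-segmented slices are then corrected by the iterative procedures described next.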
To solve the problem of under-segmentation given by the prob-
able width, we propose an iterative-shrinking algorithm, which re-
duces small portions of components from the right side with a step
size of five pixels in the partial reconstruction results, and then
checks angle information of ideal characters. The proposed tech-
nique investigates whether the angle difference between PCA and
MA leads to an angle of 90° or not, iteratively. When the angle
Fig. 9. Character segmentation using probable widths.
Fig. 10. Iterative-shrinking process for under-segmentation. (a) gives the case of under segmentation, and (b) shows the intermediate results of the iterative process.
difference satisfies the condition of an ideal character, the itera-
tive process stops, and the character is considered as an individual
component. Since under-segmentation usually contains two
characters, such as “12”, the iterative process segments such cases
successfully. This process is tested on all the components from the
results of partial reconstruction to solve the problem of under-
segmentation.
The process of iterative-shrinking is illustrated in Fig. 10, where
(a) is a sample of an under-segmentation case, (b) gives the in-
termediate results of the iterative process, and (c) shows the final
results. It is observed from Fig. 10(b) that the angle difference be-
tween axes given by PCA and MA reduces as the iterations con-
tinue, and subsequently stops when both the axes give the same
angle.
In the same way of iterative-shrinking for under-segmentation,
we propose iterative-expansion to solve the over-segmentation
cases. For each component given by the probable width, the pro-
posed technique expands with a step size of five pixels from the
left side. At the same time, in the partial reconstruction results,
it calculates the angle differences of PCA and MA. This process
continues until it gets the angle of almost zero degrees. When
two characters are merged, the iterative process gets an angle
of zero degrees by both PCA and MA. At this point, the itera-
tive process stops and then we use the iterative-shrinking algo-
rithm to segment both the characters. Therefore, the proposed
iterative-expansion uses iterative-shrinking for solving the over-
segmentation problem. Note that the proposed technique first em-
ploys iterative-shrinking to solve the under-segmentation, then
it uses iterative-expansion for solving over-segmentation. This is
because iterative-expansion requires iterative-shrinking. The rea-
son to propose an iterative procedure for both shrinking and ex-
pansion is that when a character component is split into small
Fig. 11. Iterative-expansion process for over-segmentation.
fragments due to adverse factors or when character components
are joined together, it is necessary to study local information in
order to identify the vertical and horizontal cases. The process of
iterative-expansion is illustrated in Fig. 11, where (a) shows the
cases of under-segmentation, (b) shows intermediate results of par-
tial reconstruction of (a), (c) gives the results of iterative-expansion
followed by shrinking for correct segmentation, and (d) gives the
final character segmentation results.
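Iterative-expansion admits a similar sketch. Here `angle_fn` is again the assumed PCA/MA estimator, and the function only locates the left boundary of the merged region, which the text says is then handed to iterative-shrinking:

```python
import numpy as np

def iterative_expand(mask, start, angle_fn, step=5, tol=26.0):
    """Sketch of iterative-expansion (our reading): starting from an
    over-segmented slice beginning at column `start`, grow the left edge
    by `step` pixels until PCA/MA report a near-0 degree (merged, i.e.
    horizontally aligned) orientation. Returns the expanded left edge."""
    left = start
    while left - step >= 0:
        pca, ma = angle_fn(mask[:, left:])
        if pca <= tol and ma <= tol:   # two components merged horizontally
            break
        left -= step                   # expand by 5 pixels to the left
    return left
```

Once the merged region is found, applying the shrinking step to it separates the two characters, which is why expansion depends on shrinking.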
3.4. Complete character reconstruction
Section 3.2 described the method to obtain partial character re-
construction for input license plate images, and the method pre-
sented in Section 3.3 uses the advantage of partial reconstruction
for character segmentation. Since characters are segmented well
from license plate images even when they are affected by multiple
factors, we apply Canny to obtain edges to reconstruct complete
shapes of characters for each incomplete shape given by partial re-
construction. This is because Canny gives fine edges for low and
high contrast images when we supply individual characters rather
than the whole license plate image (Saha et al., 2015). Therefore,
we consider the output of Canny as the input for reconstructing
missing information in partial reconstruction results.
For the Canny edge of the input character image shown in
Fig. 12(a), the proposed technique finds the Stroke Width Pair
(SWP) candidates as described in Section 3.2, where we can see
the characters “W” and “5” given by partial reconstruction with
lost shapes. The SWPs are considered as representatives for
reconstruction in this section. For each SWP, since the proposed
technique defines symmetrical features using gradient values, gradient vector
flow and Laplacian, we define the same symmetry features using
gray information rather than gradient information. This is because
according to our analysis of the experimental results, the gradient
does not give good responses for low contrast, low resolution and
distorted images. This is the main reason for the loss of shapes and
the same thing has led to partial reconstruction. Since characters
are segmented and pixels have uniform color values, we propose
symmetry features in the gray domain to restore the rest of the
incomplete information for partial reconstruction results to obtain
complete character shapes.
For SWP, the proposed technique calculates a tangent angle as
defined below:
Angle_tan = arctan((y − y1)/(x − x1))
where (x, y) is the starting pixel location of the SWP, and (x1,y1) is
the location of its neighbor pixel. Since the tangent angle between
the pixel of SWP and the neighbor pixel gives a direction, the pro-
posed technique finds the neighbor pixel in the same direction
with the same stroke width distance to restore the neighbor SWP.
As long as the difference between the tangent angle of the current
pixel and the neighbor pixel remains the same, and the neighbor
pair satisfies the stroke width distance of SWP, the proposed tech-
nique moves in the same direction to restore the neighbor SWP.
This process works well when straight strokes are present, whilst
at curves and corners the tangent angle gives a high difference.
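A hypothetical sketch of this tangent-guided walk: the direction fixed by Angle_tan is followed with a constant pixel offset, and the walk stops where edge evidence runs out, which is what happens at curves and corners since the straight step leaves the stroke. The set-based `stroke_pts` representation and all names are our simplifications:

```python
import math

def restore_along_stroke(seed, neighbour, stroke_pts, max_steps=50):
    """The tangent angle atan2(y1 - y, x1 - x) between an SWP pixel `seed`
    and its neighbour fixes a direction; keep stepping by the same offset,
    restoring pixels while edge evidence exists in `stroke_pts` (a set of
    (x, y) tuples). Returns the tangent angle and the restored pixels."""
    (x, y), (x1, y1) = seed, neighbour
    angle = math.degrees(math.atan2(y1 - y, x1 - x))   # Angle_tan of the pair
    dx, dy = x1 - x, y1 - y
    restored, cx, cy = [], x1, y1
    for _ in range(max_steps):
        nx, ny = cx + dx, cy + dy
        if (nx, ny) not in stroke_pts:
            break                # left the stroke: a curve or corner reached
        restored.append((nx, ny))
        cx, cy = nx, ny
    return angle, restored
```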
Moreover, this tangent-based restoration works well for individ-
ual characters but not for the whole license plate image, where
this tangent direction may be a guide for touching, adjacent char-
acters. In this situation, the proposed technique recalculates the
stroke width using eight neighbors of SWP pixels as we calculated
in Section 3.2. To find the right SWP combination out of the 64
candidates, we define symmetry features: the intensity values at
the first and second pixels should be almost the same, as shown in Fig. 12(b),
which is called the Peak Intensity Symmetry (PIS). The intensity
values between the first and second pixels of SWP should have
gradual changes from high to low and low to high as shown in
Fig. 12(c), which is called the Intensity Symmetry (IS). If the com-
bination of SWP satisfies the above two symmetry features, the
pair is considered as actual contour pixels and displayed as white
pixels, which are shown in Fig. 12(d), where one can see that the
lost information in Fig. 12(a) is restored. The potential of com-
plete character reconstruction for license plate images shown in
Fig. 13(a) can be seen in Fig. 13(b) where shapes are restored, and
the recognition results in Fig. 13(c) illustrate correct OCR recogni-
tion results for both the license plate images.
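The two gray-domain symmetry tests can be sketched on a 1-D intensity profile sampled between the two candidate contour pixels. The tolerance value and the monotone-dip formalisation of "gradual change" are our assumptions:

```python
def peak_intensity_symmetry(profile, tol=10):
    """PIS sketch: the intensities at the first and last pixel of the SWP
    profile should be almost equal (`tol` is our assumed tolerance)."""
    return abs(profile[0] - profile[-1]) <= tol

def intensity_symmetry(profile):
    """IS sketch: between the two ends the intensity should change
    gradually from high to low and then low to high, formalised here as
    monotone down to the minimum, then monotone up."""
    k = profile.index(min(profile))
    down = all(a >= b for a, b in zip(profile[:k + 1], profile[1:k + 1]))
    up = all(a <= b for a, b in zip(profile[k:], profile[k + 1:]))
    return down and up

def is_contour_pair(profile, tol=10):
    """Accept an SWP candidate as actual contour pixels iff both hold."""
    return peak_intensity_symmetry(profile, tol) and intensity_symmetry(profile)
```

Pairs that pass both tests are marked as white (restored) contour pixels, as in Fig. 12(d).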
In summary, the gradient domain helps us to define symmetry
properties, and at the same time, it misses vital pixels of charac-
ters due to sensitivity to low contrast and low resolution, which
results in partial character reconstruction. To overcome this prob-
lem, the proposed method defines the same properties using gray
values rather than gradient values. This is because the segmented
character is not influenced by complex backgrounds, and it is
understood that the pixels of the characters have almost uniform
values. Therefore, the combination of the properties in the gradient
and gray domains helps us to restore the missing information. In
other words, the partial reconstruction helps in the accurate seg-
mentation of characters while segmentation helps in restoring the
complete shape using intensity values in the gray domain.
4. Experimental results
To evaluate the effectiveness of the proposed technique for
real-time applications, we consider the dataset provided by MI-
MOS, which is the institute funded by the Government of Malaysia
where License Plate Recognition (LPR) is a live ongoing project. The
dataset consists of 680 complex license plate images with various
challenges, such as poor quality images where we can expect low
contrast, blurred images, and character-touching images due to il-
lumination effects, sun light, or headlights at night.
Fig. 12. Complete reconstruction in the gray domain.
Fig. 13. Effectiveness of the complete reconstruction algorithm.
To demonstrate the merit of the proposed technique, we con-
sider standard datasets that are available publicly, namely, the
UCSD dataset (Zamberletti, Gallo,  Noce, 2015) with 1547 im-
ages, which have a variety of challenges including the presence
of blur, license plate images with very small font captured from
a substantial distance, and low resolution images. The Medialab
dataset (Zamberletti et al., 2015) contains 680 license plate im-
ages, which have a variety of font sizes, illumination effects,
and shadow effects. The Uninsbria dataset (Zamberletti et al.,
2015) contains 503 license plate images captured at close range;
these are of better quality than the UCSD and Medialab datasets,
but generally have more complex backgrounds. In total, we
considered 3410 license plate images for experimentation, covering
multiple factors that were mentioned in the Introduction sec-
tion. In addition, we chose 100 license plate images that are af-
fected by multiple adverse factors as mentioned above from all
the license plate datasets to test the ability and effectiveness of
the proposed technique, which are termed as challenging data.
This data does not include ‘good’ (easy) images like in other
datasets.
Since the proposed technique is capable of handling mul-
tiple causes, we test the ability of the proposed technique
on other standard datasets, such as ICDAR 2013 which has
28 videos (Karatzas et al., 2013), YVT which has 29 videos
(Nguyen, Wang,  Belongie, 2014), and ICDAR 2015 which has
49 videos (Karatzas et al., 2015). These datasets are popular for
text detection and recognition in order to evaluate the method.
These datasets include low resolution, low contrast, complex back-
grounds, and multiple fonts, sizes, or orientations. Similarly, for
natural scene datasets, we use ICDAR 2013 (Karatzas et al., 2013),
which has 551 images, SVT which has 350 images (Wang  Be-
longie, 2010), MSRA-500 which has 500 images (Yao et al., 2012)
and ICDAR 2015 which has 462 images (Karatzas et al., 2015). The
reason to consider natural scene datasets for experimentation is to
show that when the proposed technique works well for low res-
olution and low contrast images, it will also work for high res-
olution and high contrast images. The main differences between
the video datasets and these datasets include contrast and reso-
lution. In other words, video datasets suffer from low resolution
and low contrast, while natural scene datasets provide high con-
trast and high resolution images. In total, 3510 license plate im-
ages, 106 videos, and 1863 natural scene images are considered for
experimentation to demonstrate that the proposed technique is ro-
bust, generic and effective.
The proposed technique involves a reconstruction step and a
character segmentation step. To evaluate the reconstruction step,
we follow the standard measures and scheme used in Peyrard, Bac-
couche, Mamalet, and Garcia (2015) for calculating measures,
namely, Peak Signal to Noise Ratio (PSNR), Root Mean Square Er-
ror (RMSE), and Mean Structural Similarity (MSSIM) as defined be-
low. Since the measures used (Peyrard et al., 2015) are proposed
for evaluating the quality of handwritten images, we therefore pre-
fer these measures for evaluating the reconstruction steps of the
proposed technique.
PSNR = (1/N) ∑_{i=1}^{N} PSNR_i (1)
RMSE = (1/N) ∑_{i=1}^{N} RMSE_i (2)
MSSIM = (1/N) ∑_{i=1}^{N} MSSIM_i (3)
where N is the number of images and PSNR_i, RMSE_i and MSSIM_i
are the scores for the i-th image.
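Eqs. (1)–(3) average per-image scores over the dataset. A minimal sketch for PSNR and RMSE (per-image SSIM, Eq. (3), would be averaged the same way), using flat intensity lists as a stand-in for images; all names are ours:

```python
import math

def rmse(ref, img):
    """Per-image root-mean-square error between a ground-truth image and
    a reconstructed image, both given as flat intensity sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ref, img)) / len(ref))

def psnr(ref, img, peak=255.0):
    """Per-image PSNR in dB, derived from the RMSE."""
    e = rmse(ref, img)
    return float('inf') if e == 0 else 20 * math.log10(peak / e)

def dataset_scores(pairs):
    """Eqs. (1)-(2): dataset-level PSNR and RMSE are plain averages of the
    per-image values over the N (reference, reconstruction) pairs."""
    n = len(pairs)
    return (sum(psnr(r, i) for r, i in pairs) / n,
            sum(rmse(r, i) for r, i in pairs) / n)
```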
For character segmentation, we use standard measures pro-
posed in Phan et al. (2011), where the same measures are used
for character segmentation, namely, Recall (R), Precision (P), F-
measure (F), UnderSegmentation (U) and OverSegmentation (O).
The definitions for the measures are as follows.
Truly Detected Character (TDC): A segmented block that con-
tains correctly-segmented characters.
Under-Segmented Blocks (USB): A segmented block which con-
tains more than one character.
Over-Segmented Blocks (OSB): A segmented block that contains
no complete characters.
False detected block (FDB): A segmented block that does not
contain any characters; for example, intermediate objects, bound-
ary or a blank space. The measures can be calculated as follows.
Recall (R) = TDC/ANC
Precision (P) = TDC/(TDC + FDB)
F-measure (F) = (2 × P × R)/(P + R)
UnderSegmentation (U) = USB/ANC
OverSegmentation (O) = OSB/ANC
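The segmentation measures can be computed directly from the block counts. A small sketch; note that we use the standard precision form TDC/(TDC + FDB), i.e. correctly segmented blocks over all reported blocks:

```python
def segmentation_measures(tdc, fdb, usb, osb, anc):
    """R = TDC/ANC, P = TDC/(TDC + FDB), F = 2PR/(P + R),
    U = USB/ANC, O = OSB/ANC, returned as fractions in [0, 1]."""
    r = tdc / anc
    p = tdc / (tdc + fdb)
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return {'R': r, 'P': p, 'F': f, 'U': usb / anc, 'O': osb / anc}
```

For example, 80 truly detected characters out of 100 with 20 false blocks, 5 under- and 3 over-segmented blocks give R = P = F = 0.8, U = 0.05 and O = 0.03.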
To validate that the reconstruction step preserves character
shapes, we consider the character recognition rate as a measure
for reconstructed images, using the publicly available Tesseract
OCR (2016). For the evaluation of the recognition results, we
follow the definitions of Recall (RR), Precision (RP) and F-measure
(RF) as in Ben-Ami, Basha, and Avidan (2012), because these
definitions were proposed for Bib number recognition. Since Bib
numbers and license numbers are similar, we prefer these
measures.
RR is defined as the percentage of correctly recognized charac-
ters out of the total number of characters (ground truth), and RP is
defined as the percentage of correctly recognized characters out of
the total number of recognized characters. For the F-measure, we
use the same formula employed for evaluating the segmentation
step for combining RR and RP into one measure.
Note that since there is no ground truth available for license
plate datasets, for MIMOS, UCSD, Medialab, and Uninsbria, we
manually count the Actual Number of Characters (ANC) as the
ground truth. For standard video and scene image datasets, we use
the available ground truth and evaluation schemes provided with
the datasets.
To show the usefulness and effectiveness of the proposed
technique, we implement existing character segmentation
methods for comparative studies: Phan et al. (2011), which uses
minimum cost path estimation for character segmentation in
video; Khare et al. (2015), which proposes sharpness features for
character segmentation in license plate images; and Sharma et al.
(2013), which uses a combination of cluster analysis and minimum
cost path estimation for character segmentation in video.
The main reason for selecting
these existing methods is that an existing method which focuses
on a single factor may not work well for license plate images af-
fected by multiple factors. Phan et al.’s method addresses low res-
olution and low contrast factors, Khare et al.’s method is a re-
cent one that addresses license plate issues to some extent, and
Sharma et al.’s method addresses multi-oriented and touching fac-
tors. Dhar et al. (2018) proposed a system design for license plate
recognition by using edge detection and convolutional neural net-
works. Ingole and Gundre (2017) proposed character feature-based
vehicle license plate detection and recognition. Radchenko et al.
(2017) proposed a method of segmentation and recognition of
Ukrainian license plates. The reason to choose these methods is
that the objective of the methods is the same as the proposed
work. However, the methods are confined to specific applications.
In the same way, we choose the state-of-the-art recognition
methods, namely, the method of Zhou et al. (2013), which is
a robust binarization approach that works well for high resolution
and low contrast images; the method of Tian et al. (2015a),
which is a recent approach proposed for the recognition of
video characters through shape restoration, and the method of
Anagnostopoulos et al. (2006) which proposes an artificial neural
network for character recognition in license plate images. The mo-
tivation to choose these methods for the comparative study is that
Zhou et al.’s method is the state-of-the-art approach which rep-
resents recognition of scene characters through binarization, Tian
et al.’s method is the state-of-the-art approach which represents
recognition of video characters through reconstruction, and the
method of Anagnostopoulos et al. (2006), is the state-of-the-art
approach recognizing characters in license plates through classi-
fiers. Since the proposed technique is robust to multiple factors, we
chose these methods to work on different datasets for undertaking
a comparative study to validate the strengths of the proposed tech-
nique. Additionally, we also consider the following methods that
explore the recent deep learning models for license plate recog-
nition. Bulan et al. (2017) proposed segmentation- and annotation-
free license plate recognition with deep localization and failure
identification; the method explores CNNs for detecting a set of
candidate regions. Silva and Jung (2018) proposed license plate
detection and recognition in unconstrained scenarios; the method
explores CNNs for addressing challenges caused by degradation.
Lin et al. (2018) proposed an efficient license plate recognition
system using convolutional neural networks.
To determine the values of the parameters, thresholds, symmetry
properties and conditions, we randomly chose 500 sample images
from the dataset for experimentation. Since the proposed method
does not involve classifiers for training, we prefer to choose sam-
ples randomly from all the databases considered in this work for
experimentation. We use a system with an Intel Core i5 CPU with
8 GB RAM configuration for all experiments. According to our ex-
periments, the proposed method consumes 30 ms for each im-
age, which includes partial reconstruction, character segmentation,
complete character reconstruction and recognition.
In Section 3.3, we define three hypotheses for ideal charac-
ter detection, over-segmentation and under-segmentation based on
the principal (PCA) and major axes (MA). It is expected that the
PCA gives the same angles for the ideal characters. However, it is
not the case due to the complexity of the problem. Therefore, we
Fig. 14. Determining the optimal value for the threshold of PCA and MA to check whether a segmented character is ideal or not.
Note: at an angle value of 26°, the recognition rate is high compared to other angle values.
Fig. 15. Determining the percentage of missing pixels to define partial reconstruction and the threshold value for angle difference between PCA and MA angles.
set ±26° as a threshold for character segmentation using partial
reconstruction results. To determine the value, we conduct experi-
ments for 500 samples chosen randomly by varying different angle
values against the recognition rate as shown in Fig. 14, where it is
observed that for an angle of 26°, the proposed method reports a high
recognition rate. Hence, we choose the same value for all the ex-
periments in this work.
In Section 3.2, the proposed method introduces the partial re-
construction concept for character segmentation. It is expected
that the partial reconstruction step outputs the structure of the
character shape such that at least a human could read the char-
acter. The question is how to define the partial reconstruction in
terms of quantity. Therefore, we conducted experiments by esti-
mating the number of missing pixels compared to the pixels in
the ground truth. In this experiment, we manually add noise and
blur at different levels to make the character images complex such
that they lose pixels. We calculate the percentage of missing pixels
with the help of the ground truth. We illustrate sample results for
different percentages of missing pixels during partial reconstruc-
tion as shown in Fig. 15(a) where we can see angles given by PCA,
MA, the difference between the PCA and MA angle and different
percentages of missing white pixels. It is observed from Fig. 15(a)
and 15(b) that for 90% to 40%, the proposed method constructs the
complete shape of the character and obtains correct recognition re-
sults. But for lower than 40%, the proposed method loses the shape
of the character, which results in incorrect recognition. Based on
this experimental analysis, we consider 40% as the threshold to de-
fine partial reconstruction results in this work. It is also noted from
Fig. 15(a) that for a difference angle of 28.2°, the proposed criterion
for character segmentation fails, as the OCR gives incorrect results.
It is evident that ±26° is a feasible threshold for achieving better results.
4.1. Experiments for analyzing the contributions of individual steps of
the proposed technique
The major contributions of the proposed technique are partial
reconstruction, character segmentation and complete reconstruc-
tion. To understand the effectiveness of each step, we conducted
experiments on the MIMOS dataset and calculated the respec-
tive measures as reported in Table 1. The reason for selecting the
Table 1
Performances of individual steps of the Proposed Technique on the MIMOS dataset.
Steps | Quality measures: PSNR, RMSE, MSSIM | Segmentation measures: R, P, F, O, U | Recognition: RR, RP, RF
PR | 12.3, 69.4, 0.29 | – | 23.7, 13.1, 18.5
SWR | – | 16.3, 8.4, 11.1, 33.3, 26.6 | –
CRWS | 8.6, 78.6, 0.24 | – | 14.4, 13.2, 13.8
MIMOS dataset is that it consists of live data provided by a re-
search institute. To estimate the quality measures for partial re-
construction and complete reconstruction, we use the Canny edge
images of English alphabets created artificially as the ground truth.
It is noted from the quality measures of the partial reconstruction
reported in Table 1 that except MSSIM, the other two measures re-
port poor results. This shows that the partial reconstruction steps
preserve the character structures, at the same time, some informa-
tion is lost. It is evident from the recognition results of the partial
reconstruction reported in Table 1 that all three measures report
low results. Therefore, one can ascertain that partial reconstruction
alone may not help us to achieve better results. To analyze the ef-
fectiveness of the segmentation step, we apply the segmentation
step on the Canny edge images of the input characters without
partial reconstruction (SWR) results. It is observed from the mea-
sures of segmentation that they all report low results, especially
under- and over-segmentation report poor results. This shows that
the segmentation step alone is inadequate for solving the problem
of segmentation for license plate images. Similarly, we apply the
steps of the complete reconstruction algorithm on the Canny edge
image of each input image without segmenting characters (CRWS).
The results reported in Table 1 show that the quality measures
report low results except for MSSIM, and the measures of recog-
nition also report poor results. Therefore, we can argue that the
symmetry features proposed for complete reconstruction are not
good when we apply them on the whole image without segmen-
tation. Overall, we can conclude that reconstruction and character
segmentation are complementing each other to achieve better re-
sults.
In the case of license plate recognition, when the images are af-
fected by multiple causes, sometimes, we can expect a little elon-
gation, such as the effect of perspective distortion. To show the
effect of elongation created by multiple causes, we implemented
the method in Dhar et al. (2018) where the method considers ex-
trema points for correcting small tilts to the horizontal direction.
In this work, we calculate quality measures, segmentation mea-
sures and recognition rate before and after rectification on the MI-
MOS dataset as reported in Table 2. Before rectification the images
are considered as input without correcting the small tilt in the hor-
izontal direction for experimentation. After rectification, the cor-
rected images are considered for experimentation. It is found from
Table 2 that the results of all the steps including the proposed
method give slightly better results after rectification compared to
before rectification. However, the difference is marginal. Therefore,
we can conclude that overall, if we use rectification before recog-
nizing the license plates, the recognition rate improves slightly.
4.2. Experiments on the proposed character segmentation approach
Qualitative results of the proposed technique on license plate
images of different datasets, namely, MIMOS, Medialab, UCSD and
Uninsubria are shown in Fig. 16(a) and 16(b), where we can see
that the complexity of the input images vary from one dataset to
another due to multiple factors of the datasets. For such images,
the proposed technique segments characters successfully. It is
evident that the proposed technique is robust to multiple factors.
Quantitative results of the proposed and existing techniques for
the above-mentioned datasets are reported in Table 3, where we
note that the proposed technique is the best at all the measures
especially under- and over-segmentation rates, which report a low
score compared to the existing techniques. Table 3 shows that
all the methods including the proposed technique provide good
accuracies on the MIMOS dataset and the lowest for the UCSD
dataset. This is because the number of distorted images is higher
in the case of the UCSD dataset compared to MIMOS and the
other datasets. The results of the proposed and existing methods
Table 2
Performance of the individual steps and the proposed method before and after rectification on the MIMOS dataset.

Before rectification:
Steps | Quality: PSNR, RMSE, MSSIM | Segmentation: R, P, F, O, U | Recognition: RR, RP, RF
PR | 12.3, 69.4, 0.29 | – | 23.7, 13.1, 18.5
SWR | – | 16.3, 8.4, 11.1, 33.3, 26.6 | –
CRWS | 8.6, 78.6, 0.24 | – | 14.4, 13.2, 13.8
Proposed | 32.1, 7.1, 0.65 | 86.8, 82.6, 84.6, 10.8, 2.4 | 88.4, 84.3, 86.3

After rectification:
Steps | Quality: PSNR, RMSE, MSSIM | Segmentation: R, P, F, O, U | Recognition: RR, RP, RF
PR | 13.7, 65.4, 0.32 | – | 27.6, 15.3, 21.4
SWR | – | 18.9, 10.6, 14.7, 30.7, 24.4 | –
CRWS | 10.4, 74.3, 0.21 | – | –
Proposed | 34.7, 6.4, 0.62 | 88.9, 84.3, 86.6, 9.4, 2.1 | 90.6, 87.3, 88.9
Table 3
Performance of the proposed and existing techniques for character segmentation on different license plate datasets.
Datasets  Measures  Phan et al. (2011)  Khare et al. (2015)  Sharma et al. (2013)  Dhar et al. (2018)  Ingole and Gundre (2017)  Radchenko et al. (2017)  Proposed
MIMOS R 39.4 58.4 68.3 73.4 74.6 65.3 86.8
P 38.4 57.3 66.9 72.3 70.4 63.3 82.6
F 38.7 57.5 67.5 72.8 72.5 64.3 84.6
O 21.1 23.2 14.9 14.8 15.3 18.3 10.8
U 38.7 18.4 16.7 12.4 12.2 17.4 2.4
Medialab R 34.3 51.3 54.7 69.7 70 59.4 82.1
P 33.6 47.3 42.1 64.2 67.4 55.4 81.6
F 33.9 49.3 48.3 66.9 68.7 57.4 81.6
O 24.2 25.2 19.7 21.1 20.4 42.6 10.1
U 39.6 20.6 22.6 12.7 10.9 23.3 7.9
UCSD R 21.3 26.1 41.3 35.2 47.2 29.6 56.7
P 20.4 22.4 36.9 30.6 40.7 27.4 53.4
F 20.8 24.6 39.1 32.9 43.9 28.5 55.1
O 35.5 43.1 26.4 39.7 34 35.9 12.9
U 45.7 30.6 32.6 27.4 22.1 35.6 29.8
Uninsubria R 31.4 42.7 61.3 41.6 53.6 48.7 75.7
P 30.5 41.6 57.4 39.8 50.9 46.1 66.4
F 30.9 42.1 59.3 40.7 52.2 47.4 71.1
O 35.7 28.9 22.3 31.9 26.9 24.3 12.3
U 32.9 28.4 16.4 27.4 20.8 28.3 12.6
Only Challenged Images  R 33.4 47.2 57.4 43.6 54.8 51.6 72.1
P 36.2 42.3 52.3 41.1 50.4 47.9 73.4
F 34.7 44.7 54.8 42.3 52.6 49.7 72.6
O 34.3 29.7 24.6 33.5 24.8 26.8 13.6
U 30.9 25.5 20.6 24.1 22.6 23.4 13.8
Fig. 16. Qualitative results of the proposed technique for character segmentation on different datasets.
on challenging data show that the proposed method performs
almost as well as on the other datasets, despite the fact that the
challenging data does not include any 'good' (easy) images as the
other datasets do. The existing methods give poor results because
the main goal of the first three methods is to detect text in video
or natural scene images rather than in license plate images.
Similarly, although the methods of Dhar et al. (2018), Ingole and
Gundre (2017) and Radchenko et al. (2017) were developed for
character segmentation from license plate images, they do not
perform well on all the datasets compared to the proposed method.
The reason is that these methods depend on profile-based features,
binarization and the specific nature of the dataset, as conventional
document analysis methods do.
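The segmentation measures reported in Table 3 (R, P, F, and the over- and under-segmentation rates O and U) can be read at the character-box level. The sketch below shows one plausible way to compute them from ground-truth and detected character boxes; the IoU-based one-to-one matching, the 0.5 threshold, and treating unmatched detections/ground truths as over-/under-segmentation are our assumptions, not the authors' exact protocol.

```python
# Hypothetical evaluation sketch for the measures in Table 3.
# Boxes are (x, y, w, h); assumes non-empty box lists.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def segmentation_measures(gt_boxes, det_boxes, thr=0.5):
    """Return (R, P, F, O, U) with greedy one-to-one IoU matching."""
    matched_gt, matched_det = set(), set()
    for i, g in enumerate(gt_boxes):
        for j, d in enumerate(det_boxes):
            if j not in matched_det and iou(g, d) >= thr:
                matched_gt.add(i)
                matched_det.add(j)
                break
    r = len(matched_gt) / len(gt_boxes)            # recall (R)
    p = len(matched_det) / len(det_boxes)          # precision (P)
    f = 2 * p * r / (p + r) if p + r else 0.0      # F-measure
    o = (len(det_boxes) - len(matched_det)) / len(det_boxes)  # over-seg. (O)
    u = (len(gt_boxes) - len(matched_gt)) / len(gt_boxes)     # under-seg. (U)
    return r, p, f, o, u
```

A perfect segmentation yields R = P = F = 1 and O = U = 0; a detector that merges two characters into one box drives U up while R falls.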
Similarly, quantitative results of the proposed and existing tech-
niques for video and natural scene images are reported in Table 4,
where it is observed that the proposed technique is the best in
terms of the F-measure and the under- and over-segmentation rates
compared to the existing techniques. It may be noted from Table 4
that the proposed technique scores consistent results on all the
datasets except MSRA-TD-500. This is because that dataset contains
arbitrarily-oriented text. Since our aim is to develop a technique
for license plate images, where arbitrary orientations are rare, the
proposed technique gives poor results when characters are
arbitrarily oriented, for example in curved text. The existing methods
give poor results because all three text segmentation methods are
sensitive to the starting point, as they need to estimate a
minimum-cost path. In contrast, the proposed technique requires
neither seed points nor starting points to find the spaces between
characters. Overall, the segmentation experiments show that the
proposed technique is capable of handling license plates as well as
video and natural scene images.
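For contrast, the classic baseline that also needs no seed or starting point is a vertical projection profile: columns containing no foreground ink are candidate inter-character gaps. The sketch below illustrates that baseline idea only, not the proposed PCA/angle-based segmentation; the `min_run` minimum gap width is an assumed parameter.

```python
# Illustrative projection-profile gap finder (baseline, not the proposed method).

def column_gaps(binary, min_run=2):
    """binary: 2-D list of 0/1 pixels (1 = ink).
    Returns (start, end) column spans that are entirely empty."""
    h, w = len(binary), len(binary[0])
    empty = [all(binary[r][c] == 0 for r in range(h)) for c in range(w)]
    gaps, start = [], None
    # Append a sentinel False so a trailing gap is closed properly.
    for c, e in enumerate(empty + [False]):
        if e and start is None:
            start = c                      # gap opens
        elif not e and start is not None:
            if c - start >= min_run:       # keep only wide-enough gaps
                gaps.append((start, c))
            start = None
    return gaps
```

Such a profile fails precisely in the cases the paper targets (skewed plates, touching characters, complex backgrounds), which motivates the angle-based segmentation instead.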
4.3. Experiments on the proposed character recognition technique
through reconstruction
Qualitative results of the proposed and existing techniques for
the recognition of license plate images from different datasets are
shown in Fig. 17(a)–(d) for MIMOS, Medialab, UCSD and Unin-
subria, respectively. The recognition step takes the output of the
segmentation step as input, as shown in Fig. 17, and reconstructs
the shapes of the segmented characters. The results show that the
proposed technique reconstructs shapes well for characters of
different datasets affected by different factors, which is validated by
the recognition results given by the OCR (in double quotes in
Fig. 17). Thus, we can assert that the proposed technique does not
require binarization for recognition. To evaluate the reconstruction
results given by the proposed and existing techniques, we estimate
quality measures, which are reported in Table 5 for the license
plate, video and natural scene datasets. Since Tian et al.'s method
(Tian et al., 2015a) outputs
Table 4
Performance of the proposed and existing techniques for character segmentation on different video and natural scene datasets.
Datasets  Measures  Phan et al. (2011)  Khare et al. (2015)  Sharma et al. (2013)  Dhar et al. (2018)  Ingole and Gundre (2017)  Radchenko et al. (2017)  Proposed
ICDAR 2015 Video R 22.6 37.9 60.7 39.4 55.3 46.8 66.9
P 24.6 34.2 58.3 37.4 53.9 42.8 62.4
F 23.2 36.1 59.5 38.4 54.6 44.8 64.6
O 38.7 28.1 23.4 31.9 24.1 30.9 18.7
U 36.4 35.8 17.1 29.7 21.3 24.3 18.1
YVT Video R 30.6 38.2 51.6 51 65.9 57.4 74.9
P 29.7 41.6 52.4 49.1 60.3 56.3 73.4
F 30.1 39.9 52.1 50 63.1 56.8 73.8
O 32.7 32.6 24.3 29.3 24.5 24.8 16.2
U 36.1 29.8 23.6 20.7 12.4 18.3 14.9
ICDAR 2013 Video R 28.9 37.4 52.6 54.2 66.8 54.5 71.2
P 27.6 39.1 51.4 50.3 63.4 52.2 70.9
F 28.1 38.5 51.9 52.2 65.1 53.3 71.1
O 42.4 33.2 17.8 30.3 20.8 29.2 13.5
U 30.6 28.9 29.4 17.4 14.1 17.4 18.4
ICDAR 2015 Scene Dataset  R 31.4 42.7 61.3 56.7 58.3 54.1 71.3
P 30.5 41.6 57.4 52.2 52.8 48.7 69.4
F 30.9 42.1 59.3 54.7 55.5 51.4 70.7
O 35.7 28.9 22.3 29.1 31.9 27 14.2
U 32.9 28.4 16.4 16.4 12.5 21.6 16.8
ICDAR 2013 Scene Dataset  R 32.6 40.7 61.1 59.3 54.3 53.8 76.8
P 32.5 46.5 52.2 52.6 53.7 47.2 72.3
F 32.5 43.4 56.6 55.9 54 50.5 74.5
O 37.1 29.7 21.3 23.6 25.3 25.4 16.3
U 32.4 26.9 22.0 20.4 20.7 24.1 9.1
SVT Scene Dataset R 21.4 38.6 61.3 43.4 50.6 47.9 64.7
P 20.4 31.4 57.4 39.2 45.8 42.7 61.9
F 20.9 35.0 59.3 41.3 48.2 45.3 63.3
O 41.3 22.9 22.3 31.6 27.5 30.8 12.6
U 37.2 42.1 16.4 27.1 24.3 23.9 24.1
MSRA-TD-500 Dataset  R 22.4 26.7 42.1 30.7 38.4 34.3 59.3
P 23.6 24.4 32.1 28.4 35.7 29.4 57.3
F 22.7 25.6 37.1 29.5 37 31.8 58.6
O 32.4 41.2 29.3 42.6 36.6 39.9 22.7
U 46.9 35.7 31.8 27.8 26.3 28.2 28.9
Fig. 17. Qualitative results of the proposed technique for reconstruction and recognition on different license plate images.
Fig. 18. Overall performance of the proposed method on images affected by multiple adverse factors. Columns 1–5 denote input images affected by different causes, and the results of partial reconstruction, character segmentation, full reconstruction and recognition, respectively.
Table 5
Performance of the proposed and existing techniques for reconstruction on different
license, video and natural scene datasets.
Methods Tian et al., 2015a Proposed
Datasets RMSE PSNR MSSIM RMSE PSNR MSSIM
MIMOS 22.7 19.9 0.74 7.1 32.1 0.65
Medialab 42.7 21.8 0.79 12.4 26.3 0.60
UCSD 69.0 19.7 0.59 31.7 22.4 0.6
Uninsubria 72.4 8.4 0.52 26.3 23.8 0.4
ICDAR 2015 Video 63.5 11.7 0.61 19.7 23.9 0.63
YVT Video 55.3 16.1 0.68 16.3 24.5 0.67
ICDAR 2013 Video 63.6 11.7 0.61 18.4 24.0 0.60
ICDAR 2015 Scene 57.3 15.4 0.67 18.41 24.0 0.64
ICDAR 2013 Scene 57.3 15.4 0.67 16.2 24.6 0.65
SVT Scene 62.1 12.4 0.62 22.3 23.8 0.61
MSRA-TD 500 Scene 68.7 10.7 0.59 26.1 23.7 0.55
Only Challenged 39.2 12.5 0.57 15.6 20.4 0.48
reconstruction results for recognition, as our technique does and
unlike the other existing methods, the proposed technique is
compared with Tian et al.'s method (Tian et al., 2015a) only. Table 5
shows that the proposed technique is better than the existing
method in terms of the three quality measures on all three types of
datasets. It is also observed from Table 5 that the proposed method
performs almost as well on the challenging data as on the other
datasets. The main reason for the poor results of the existing
method is that it depends on gradient information, which responds
well only for high-contrast images when reconstructing character
images, while the proposed technique uses both gradient and
intensity information for reconstruction to handle images affected
by multiple factors.
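The quality measures in Table 5 compare each reconstructed character image against its ground truth. The sketch below covers RMSE and PSNR with NumPy; MSSIM would additionally require a windowed SSIM (e.g. `skimage.metrics.structural_similarity`). It assumes 8-bit images with a peak value of 255, which is our assumption rather than a detail stated in the paper.

```python
# Sketch of the reconstruction quality measures (RMSE, PSNR) in Table 5.
import numpy as np

def rmse(ref, rec):
    """Root-mean-square error between reference and reconstructed images."""
    ref = ref.astype(np.float64)
    rec = rec.astype(np.float64)
    return float(np.sqrt(np.mean((ref - rec) ** 2)))

def psnr(ref, rec, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    e = rmse(ref, rec)
    return float("inf") if e == 0 else 20.0 * np.log10(peak / e)
```

Lower RMSE and higher PSNR indicate a reconstruction closer to the ground-truth character shape, which is how the Table 5 comparison should be read.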
Quantitative recognition results of the proposed and existing
techniques on the license plate, video and natural scene datasets
are reported in Table 6. To demonstrate that a Canny edge image
alone, without reconstruction, is inadequate for achieving good
results, these experiments also include recognition results obtained
by passing Canny edge images of the input character images
directly to the OCR. This can be verified from the results reported
in Table 6, where the recognition results with Canny images are
far below those of the proposed technique in terms of all three
measures. This is because Canny is sensitive to blur and complex
backgrounds. We can also
observe from Table 6 that the proposed technique achieves better
results than the other existing methods on the complex datasets,
namely, MIMOS, UCSD, YVT video, SVT, MSRA and the challenging
dataset. On the other datasets, the existing method of Silva and
Jung (2018) achieves better results than all the methods, including
the proposed method. This is justifiable because that method
explores a powerful deep learning model for unconstrained license
plate recognition. It is evident that the methods of Bulan et al.
(2017) and Lin et al. (2018), which also explore deep learning
models for license plate recognition, achieve better results than all
the other existing methods, but these two perform worse than the
proposed method. However, the difference between the method of
Silva and Jung (2018) and the proposed method is marginal.
Besides, the results on the difficult data show that the proposed
method is effective in tackling challenges, as it reports almost the
same performance as on the other datasets. Therefore, the proposed
technique is robust and generic compared to the existing methods.
The major weaknesses of the existing methods are as follows. Since
the gradient used in Tian et al.'s method is suited to high-contrast
images, it gives poor results for low-contrast images; Zhou et al.'s
method is developed for high-contrast images with homogeneous
backgrounds and therefore gives poor results otherwise; and
Anagnostopoulos et al.'s method involves binarization and parameter
tuning, which lead to poor results for images affected by multiple
factors. Conversely, the proposed technique does not depend on
binarization and explores the
Table 6
Performance of the proposed and existing techniques for recognition on different license, video and natural scene datasets.
Datasets  Measures  Canny  Anagnostopoulos et al. (2006)  Zhou et al. (2013)  Tian et al. (2015a)  Bulan et al. (2017)  Silva and Jung (2018)  Lin et al. (2018)  Proposed
MIMOS RR 58.7 63.2 47.4 57.6 86.3 89.3 78.3 88.4
RP 54.3 64.7 52.3 59.7 82.6 83.2 74.9 84.3
RF 56.4 63.8 50.3 58.6 84.5 86.2 76.6 86.3
Medialab RR 59.3 64.7 52.4 61.2 83.7 86.4 75.6 82.3
RP 52.4 66.9 56.8 62.7 75.3 82.3 71.9 79.3
RF 55.3 65.7 54.6 61.6 79.5 84.3 73.7 81.3
UCSD RR 29.2 42.3 47.2 44.9 52.4 58.3 51.7 65.7
RP 32.7 44.7 48.1 46.2 47.4 55.3 49.5 62.1
RF 31.3 43.6 47.6 45.5 49.9 56.8 50.6 63.9
Uninsubria RR 62.4 65.3 68.3 64.3 76.4 78.4 77.1 78.7
RP 66.7 68.7 69.4 69.4 72.4 75.3 77.4 80.3
RF 64.8 66.9 68.8 67.1 74.4 76.8 77.2 79.5
ICDAR 2015 Video  RR 66.2 68.9 71.8 72.6 83.4 86.4 84.3 78.6
RP 61.3 75.7 72.3 72.7 81.3 80.3 78.9 73.4
RF 63.7 72.7 72.1 72.6 82.3 83.3 81.6 76.2
YVT Video RR 72.4 72.9 66.9 71.4 83.4 85.9 84.9 78.3
RP 65.3 77.8 70.3 74.8 79.2 81.4 78.8 82.6
RF 68.4 75.8 68.7 72.9 81.3 83.6 81.8 80.5
ICDAR 2013 Video  RR 68.2 78.7 71.3 74.9 81.6 83.2 79.5 83.7
RP 61.3 79.3 68.9 71.3 80.4 81.5 78.4 84.2
RF 65.7 78.5 69.8 72.8 81 82.3 78.9 83.5
ICDAR 2015 Scene  RR 66.8 77.3 72.1 65.3 82.3 85.7 80.2 80.3
RP 67.3 72.1 74.3 62.1 81.4 84.4 80.3 82.1
RF 66.9 75.2 73.6 64.4 81.8 85 80.2 81.5
ICDAR 2013 Scene  RR 59.3 72.3 71.3 65.6 83.1 86.1 81.4 78.3
RP 56.3 72.4 68.7 64.3 81.5 84.3 80.4 73.2
RF 58.6 72.3 70.1 64.9 82.3 85.2 80.9 75.8
SVT Scene RR 58.3 76.4 71.4 66.3 78.2 79.3 76.4 80.4
RP 59.7 78.3 74.7 67.2 77.3 76.3 74.8 81.6
RF 58.6 77.9 73.1 66.8 77.7 77.8 75.6 81.0
MSRA-TD-500 Scene  RR 64.3 73.9 75.9 72.4 78.4 81.3 74.9 82.4
RP 65.8 76.4 74.3 77.3 74.8 80.4 73.8 81.6
RF 64.9 75.9 75.1 75.4 76.6 80.8 74.3 81.9
Only Challenged Images  RR 58.7 51.9 47.4 57.6 54.8 57.6 55.9 62.9
RP 54.3 52.3 52.3 59.7 51.7 56.6 51.4 65.7
RF 56.5 52.1 49.8 58.6 53.2 57.1 53.6 64.3
combination of gradient and intensity for reconstruction through
character segmentation, and it performs better than the existing
methods, especially on the datasets that involve images containing
multiple challenging factors.
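The recognition measures RR, RP and RF in Table 6 can be sketched at the character level: recall over ground-truth characters, precision over OCR output characters, and their harmonic mean, with matches counted by longest-common-subsequence-style alignment via `difflib`. This character-level reading of RR/RP is our assumption, not a definition quoted from the paper.

```python
# Hypothetical sketch of character-level recognition measures (Table 6).
from difflib import SequenceMatcher

def recognition_measures(gt, ocr):
    """gt: ground-truth plate string; ocr: OCR output string.
    Returns (RR, RP, RF)."""
    # Total length of matching blocks = number of aligned characters.
    matched = sum(b.size for b in
                  SequenceMatcher(None, gt, ocr).get_matching_blocks())
    rr = matched / len(gt) if gt else 0.0     # recall over GT characters
    rp = matched / len(ocr) if ocr else 0.0   # precision over OCR characters
    rf = 2 * rr * rp / (rr + rp) if rr + rp else 0.0
    return rr, rp, rf
```

For example, an OCR output that drops the last character of a seven-character plate gives RR = 6/7 and RP = 1, so RF penalizes the missed character without discarding the whole plate.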
Overall, to show that the proposed method is robust to the
multiple adverse factors mentioned in the Introduction and Proposed
Methodology sections, we present sample results of each step on
images affected by low contrast, complex backgrounds, multiple
fonts, multiple font sizes, blur and distortion due to perspective
angle, as shown respectively in Fig. 18(a)–(f), which include the
results of partial reconstruction, character segmentation, full
reconstruction and recognition. One can assert from the results
shown in Fig. 18 that the proposed method has significant benefits
in handling multiple adverse factors. If a license plate image
contains a logo or symbol, as shown in Fig. 18(c), the segmentation
algorithm segments the symbol as if it were a character; however,
when that segment is sent to the OCR, it is rejected, as shown in
Fig. 18(c). As a result, the presence of symbols in license plate
images does not affect the overall performance of the technique. It
is evident from the results reported in Tables 3, 5 and 6 on
challenging data that the proposed method performs almost the
same as on the other data. It is noted that for the recognition
experiments we use a publicly available OCR, which has inherent
limitations with respect to image size, font variation and
orientation. As a result, despite the fact that the proposed method
reconstructs character shapes, it fails to achieve an accuracy above 90%.
Since our target is to address the above challenges and to develop
a generalized method, we prefer to use available OCR approaches
to demonstrate the effectiveness and usefulness of the proposed
method rather than using language models, lexicons or learned
models, because these restrict generality. Therefore, we believe that
the proposed work makes an important statement: there is a way
to handle adverse factors such that one can use machine learning
or deep learning on the reconstructed results as input to achieve
high accuracy, instead of relying on a traditional OCR. Achieving
high accuracy on the MIMOS dataset by exploring deep learning is
our next target.
To show the effectiveness of the proposed method on license
plates of different countries, we also test the steps of the proposed
method on the American license plate images shown in Fig. 19,
where it is noted that each step and the overall method work well.
This is the advantage of the steps proposed in this work, i.e. stroke
width pair candidate detection, partial reconstruction, character
segmentation, complete reconstruction and recognition. This shows
that the proposed method is independent of scripts.
Fig. 19. Examples of the proposed stroke width pair candidate detection, reconstruction, segmentation and recognition approaches for American license plate images.
Fig. 20. Recognition rate of the proposed method for different scales to find the lower and upper boundary for scaling up and down.
To test the effect of scaling on license plate recognition, we
calculate the recognition rate of the proposed method for different
scales, as shown in Fig. 20. If the image is too small (i.e. the size of
the character image is 4 × 4), the proposed method reports poor
results, as shown in Fig. 20; however, such small sizes are rare in
license plate recognition. For sizes greater than 16 × 16, the
proposed method gives better results. This shows that different
scales do not have much of an effect on the overall performance,
and we can therefore conclude that the proposed method is
invariant to scaling. This is justifiable because the proposed features,
based on stroke width distance, are invariant to scaling.
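The scaling test in Fig. 20 can be sketched as follows: each character crop is rescaled to s × s before recognition and the recognition rate is recorded per scale. The `recognize` callable stands in for the OCR stage, and the nearest-neighbour resampling and the scale set are illustrative assumptions rather than the paper's exact setup.

```python
# Hypothetical sketch of the scale-sensitivity experiment (Fig. 20).

def nearest_resize(img, s):
    """Nearest-neighbour rescale of a 2-D list `img` to s x s."""
    h, w = len(img), len(img[0])
    return [[img[r * h // s][c * w // s] for c in range(s)] for r in range(s)]

def recognition_rate_per_scale(char_imgs, labels, recognize,
                               scales=(4, 16, 32)):
    """Return {scale: fraction of characters recognized correctly}."""
    rates = {}
    for s in scales:
        correct = sum(
            recognize(nearest_resize(img, s)) == lab
            for img, lab in zip(char_imgs, labels)
        )
        rates[s] = correct / len(labels)
    return rates
```

Plotting `rates` against the scale reproduces the shape of Fig. 20: accuracy collapses at tiny sizes (e.g. 4 × 4) and plateaus once characters exceed roughly 16 × 16.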
5. Conclusions and future work
We have proposed a novel technique for recognizing license
plates, video and natural scene images through reconstruction.
The proposed technique explores gradient and Laplacian symmetrical
features based on stroke width distance to obtain a partial
reconstruction for segmenting characters. To segment characters
affected by multiple factors such as low contrast, blur, complex
backgrounds and illumination variations, we introduce angular
information on the partial reconstruction results based on character
structures, which solves under- and over-segmentation successfully.
For the segmented characters, the proposed technique explores
symmetry features based on stroke width distance and tangent
direction in the gray domain to restore complete shapes from the
partial reconstruction results. Comprehensive experiments conducted
on large datasets, which include license plates, video and natural
scene images, show that the proposed technique is robust and
generic compared to existing methods. In the near future, the same
idea can be extended with the help of deep learning to images of
different scripts from other countries, such as Indian, Russian,
Arabic and European scripts, to develop a generic system.
Acknowledgements
This work was supported by the Natural Science Foundation
of China under Grant nos. 61672273 and 61832008, and the Sci-
ence Foundation for Distinguished Young Scholars of Jiangsu un-
der Grant BK20160021. This work is also partly supported by
the University of Malaya under Grant no: UM.0000520/HRU.BK
(BKS003-2018).
The authors would like to thank the anonymous reviewers and
the Editor for their constructive comments and suggestions to im-
prove the quality and clarity of this paper.
Conflict of Interest
None.
References
Abolghasemi, V., & Ahmadyfard, A. (2009). An edge-based color-aided method for license plate detection. Image and Vision Computing, 27(8), 1134–1142.
Al-Ghaili, A. M., Mashohor, S., Ramli, A. R., & Ismail, A. (2013). Vertical-edge-based car-license-plate detection method. IEEE Transactions on Vehicular Technology, 62(1), 26–38.
Al-Shemarry, M. S., Li, Y., & Abdulla, S. (2018). Ensemble of adaboost cascades of 3L-LBPs classifiers for license plates detection with low quality images. Expert Systems with Applications, 92, 216–235.
Anagnostopoulos, C. N. E., Anagnostopoulos, I. E., Loumos, V., & Kayafas, E. (2006). A license plate-recognition algorithm for intelligent transportation system applications. IEEE Transactions on Intelligent Transportation Systems, 7(3), 377–392.
Azam, S., & Islam, M. M. (2016). Automatic license plate detection in hazardous condition. Journal of Visual Communication and Image Representation, 36, 172–186.
Ben-Ami, I., Basha, T., & Avidan, S. (2012). Racing bib numbers recognition. In Proceedings of the BMVC (pp. 1–10).
Bulan, O., Kozitsky, V., Ramesh, P., & Shreve, M. (2017). Segmentation- and annotation-free license plate recognition with deep localization and failure identification. IEEE Transactions on Intelligent Transportation Systems, 18(9), 2351–2363.
Dhar, P., Guha, S., Biswas, T., & Abedin, M. Z. (2018). A system design for license plate recognition by using edge detection and convolution neural network. In Proceedings of the IC4ME2 (pp. 1–4).
Dong, M., He, D., Luo, C., Liu, D., & Zeng, W. (2017). A CNN-based approach for automatic license plate recognition in the wild. In Proceedings of the BMVC (pp. 1–12).
Du, S., Ibrahim, M., Shehata, M., & Badawy, W. (2013). Automatic license plate recognition (ALPR): A state-of-the-art review. IEEE Transactions on Circuits and Systems for Video Technology, 23(2), 311–325.
Epshtein, B., Ofek, E., & Wexler, Y. (2010). Detecting text in natural scenes with stroke width transform. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2963–2970). IEEE.
Goodfellow, I. J., Bulatov, Y., Ibarz, J., Arnoud, S., & Shet, V. (2013). Multi-digit number recognition from street view imagery using deep convolutional neural networks. arXiv:1312.6082.
Gou, C., Wang, K., Yao, Y., & Li, Z. (2016). Vehicle license plate recognition based on extremal regions and restricted Boltzmann machines. IEEE Transactions on Intelligent Transportation Systems, 17(4), 1096–1107.
Ingole, S. K., & Gundre, S. B. (2017). Characters feature based Indian vehicle license plate detection and recognition. In Proceedings of the I2C2 (pp. 1–5).
Jaderberg, M., Simonyan, K., Vedaldi, A., & Zisserman, A. (2016). Reading text in the wild with convolutional neural networks. International Journal of Computer Vision, 116(1), 1–20.
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., et al. (2015). ICDAR 2015 competition on robust reading. In Proceedings of the 13th international conference on document analysis and recognition (ICDAR) (pp. 1156–1160). IEEE.
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L. G., Mestre, S. R., et al. (2013). ICDAR 2013 robust reading competition. In Proceedings of the 12th international conference on document analysis and recognition (ICDAR) (pp. 1484–1493). IEEE.
Khare, V., Shivakumara, P., Raveendran, P., Meng, L. K., & Woon, H. H. (2015). A new sharpness based approach for character segmentation in license plate images. In Proceedings of the 3rd IAPR Asian conference on pattern recognition (ACPR) (pp. 544–548). IEEE.
Kim, D., Song, T., Lee, Y., & Ko, H. (2016). Effective character segmentation for license plate recognition under illumination changing environment. In Proceedings of the IEEE international conference on consumer electronics (ICCE) (pp. 532–533). IEEE.
Liang, G., Shivakumara, P., Lu, T., & Tan, C. L. (2015). A new wavelet-Laplacian method for arbitrarily-oriented character segmentation in video text lines. In Proceedings of the 13th international conference on document analysis and recognition (ICDAR) (pp. 926–930). IEEE.
Lin, C. H., Lin, Y. S., & Liu, W. C. (2018). An efficient license plate recognition system using convolution neural networks. In Proceedings of the ICASI (pp. 224–227).
Liu, W., Wang, Z., Liu, X., Zeng, N., Liu, Y., & Alsaadi, F. E. (2017). A survey of deep neural network architectures and their applications. Neurocomputing, 234, 11–26.
Nguyen, P. X., Wang, K., & Belongie, S. (2014). Video text detection and recognition: Dataset and benchmark. In Proceedings of the IEEE winter conference on applications of computer vision (WACV) (pp. 776–783). IEEE.
Peyrard, C., Baccouche, M., Mamalet, F., & Garcia, C. (2015). ICDAR2015 competition on text image super-resolution. In Proceedings of the 13th international conference on document analysis and recognition (ICDAR) (pp. 1201–1205). IEEE.
Phan, T. Q., Shivakumara, P., Su, B., & Tan, C. L. (2011). A gradient vector flow-based method for video character segmentation. In Proceedings of the international conference on document analysis and recognition (ICDAR) (pp. 1024–1028). IEEE.
Radchenko, A., Zarovsky, R., & Kazymyr, V. (2017). Method of segmentation and recognition of Ukrainian license plates. In Proceedings of the YSF (pp. 62–65).
Raghunandan, K. S., Shivakumara, P., Jalab, H. A., Ibrahim, R. W., Kumar, G. H., Pal, U., et al. (2017). Riesz fractional based model for enhancing license plate detection and recognition. IEEE Transactions on Circuits and Systems for Video Technology, 28(9), 2276–2288.
Rathore, M. M., Ahmad, A., Paul, A., & Rho, S. (2016). Urban planning and building smart cities based on the internet of things using big data analytics. Computer Networks, 101, 63–80.
Saha, S., Basu, S., & Nasipuri, M. (2015). iLPR: An Indian license plate recognition system. Multimedia Tools and Applications, 74(23), 10621–10656.
Sedighi, A., & Vafadust, M. (2011). A new and robust method for character segmentation and recognition in license plate images. Expert Systems with Applications, 38(11), 13497–13504.
Sharma, N., Shivakumara, P., Pal, U., Blumenstein, M., & Tan, C. L. (2013). A new method for character segmentation from multi-oriented video words. In Proceedings of the 12th international conference on document analysis and recognition (ICDAR) (pp. 413–417). IEEE.
Shivakumara, P., Dutta, A., Tan, C. L., & Pal, U. (2014). Multi-oriented scene text detection in video based on wavelet and angle projection boundary growing. Multimedia Tools and Applications, 72(1), 515–539.
Shivakumara, P., Phan, T. Q., Bhowmick, S., Tan, C. L., & Pal, U. (2013). A novel ring radius transform for video character reconstruction. Pattern Recognition, 46(1), 131–140.
Shivakumara, P., Roy, S., Jalab, H. A., Ibrahim, R. W., Pal, U., Lu, T., et al. (2019). Fractional means based method for multi-oriented keyword spotting in video/scene/license plate images. Expert Systems with Applications, 118, 1–19.
Silva, S. M., & Jung, C. R. (2018). License plate detection and recognition in unconstrained scenarios. In Proceedings of the ECCV (pp. 593–609).
Suresh, K. V., Kumar, G. M., & Rajagopalan, A. N. (2007). Superresolution of license plates in real traffic videos. IEEE Transactions on Intelligent Transportation Systems, 8(2), 321–331.
Tadic, V., Popovic, M., & Odry, P. (2016). Fuzzified Gabor filter for license plate detection. Engineering Applications of Artificial Intelligence, 48, 40–58.
Tesseract OCR software (2016). http://vision.ucsd.edu/belongie-grp/research/carRec/car_rec.html
Tian, J., Wang, R., Wang, G., Liu, J., & Xia, Y. (2015a). A two-stage character segmentation method for Chinese license plate. Computers & Electrical Engineering, 46, 539–553.
Tian, S., Shivakumara, P., Phan, T. Q., Lu, T., & Tan, C. L. (2015b). Character shape restoration system through medial axis points in video. Neurocomputing, 161, 183–198.
Wang, K., & Belongie, S. (2010). Word spotting in the wild. In Proceedings of the European conference on computer vision (pp. 591–604). Springer.
Wang, Y., Shi, C., Xiao, B., & Wang, C. (2015). MRF based text binarization in complex images using stroke feature. In Proceedings of the 13th international conference on document analysis and recognition (ICDAR) (pp. 821–825). IEEE.
Yang, Y., Li, D., & Duan, Z. (2018). Chinese vehicle license plate recognition using kernel-based extreme learning machine with deep convolutional features. IET Intelligent Transport Systems, 12(3), 213–219.
Yao, C., Bai, X., Liu, W., Ma, Y., & Tu, Z. (2012). Detecting texts of arbitrary orientations in natural images. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1083–1090). IEEE.
Ye, Q., & Doermann, D. (2015). Text detection and recognition in imagery: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(7), 1480–1500.
Yu, S., Li, B., Zhang, Q., Liu, C., & Meng, M. Q. H. (2015). A novel license plate location method based on wavelet transform and EMD analysis. Pattern Recognition, 48(1), 114–125.
Yuan, Y., Zou, W., Zhao, Y., Wang, Xin'an, Hu, X., & Komodakis, N. (2017). A robust and efficient approach to license plate detection. IEEE Transactions on Image Processing, 26(3), 1102–1114.
Zamberletti, A., Gallo, I., & Noce, L. (2015). Augmented text character proposals and convolutional neural networks for text spotting from scene images. In Proceedings of the 3rd IAPR Asian conference on pattern recognition (ACPR) (pp. 196–200). IEEE.
Zhou, W., Li, H., Lu, Y., & Tian, Q. (2012). Principal visual word discovery for automatic license plate detection. IEEE Transactions on Image Processing, 21(9), 4269–4279.
Zhou, Y., Feild, J., Learned-Miller, E., & Wang, R. (2013). Scene text segmentation via inverse rendering. In Proceedings of the 12th international conference on document analysis and recognition (ICDAR) (pp. 457–461). IEEE.

A novel character segmentation reconstruction approach for license plate recognition

  • 1.
    Expert Systems WithApplications 131 (2019) 219–239 Contents lists available at ScienceDirect Expert Systems With Applications journal homepage: www.elsevier.com/locate/eswa A novel character segmentation-reconstruction approach for license plate recognition Vijeta Kharea , Palaiahnakote Shivakumarab , Chee Seng Chanb , Tong Luc,∗ , Liang Kim Mengd , Hon Hock Woond , Michael Blumensteine a Department of Computer Science and Software Engineering, Concordia University, Montreal, Canada b Faculty of Computer Systems and Information Technology, University of Malaya, Malaysia c National Key Lab for Novel Software Technology, Nanjing University, Nanjing, China d Advanced Informatics Lab, MIMOS Berhad, Kuala Lumpur, Malaysia e Faculty of Engineering and Information Technology, University of Technology Sydney, Australia a r t i c l e i n f o Article history: Received 26 November 2018 Revised 2 March 2019 Accepted 16 April 2019 Available online 18 April 2019 Keywords: Character segmentation Character reconstruction Stroke width Zero crossing Gradient vector flow License plate recognition a b s t r a c t Developing an automatic license plate recognition system that can cope with multiple factors is chal- lenging and interesting in the current scenario. In this paper, we introduce a new concept called partial character reconstruction to segment characters of license plates to enhance the performance of license plate recognition systems. Partial character reconstruction is proposed based on the characteristics of stroke width in the Laplacian and gradient domain in a novel way. This results in character components with incomplete shapes. The angular information of character components determined by PCA and the major axis are then studied by considering regular spacing between characters and aspect ratios of char- acter components in a new way for segmenting characters. 
Next, the same stroke width properties are used for reconstructing the complete shape of each character in the gray domain rather than in the gra- dient domain, which helps in improving the recognition rate. Experimental results on benchmark license plate databases, namely, MIMOS, Medialab, UCSD data, Uninsbria data Challenged data, as well as video databases, namely, ICDAR 2015, YVT video, and natural scene data, namely, ICDAR 2013, ICDAR 2015, SVT, MSRA, show that the proposed technique is effective and useful. © 2019 Elsevier Ltd. All rights reserved. 1. Introduction Creating a smart/digital/safe city has been one of the important emerging trends in both developing and developed countries in re- cent times. As a result, developing automatic systems has become an integral part of the above-mentioned initiatives (Rathore, Ah- mad, Paul, & Rho, 2016; Yuan et al., 2017). One such example is to develop intelligent transport systems for safety and mobility, and to enhance public welfare with the help of advanced technologies by recognizing license plates (Anagnostopoulos, Anagnostopoulos, Loumos, & Kayafas, 2006; Du, Ibrahim, Shehata, & Badawy, 2013; Suresh, Kumar, & Rajagopalan, 2007). There are transport systems proposed for recognizing license plates in the literature for applica- tions such as the automatic collection of toll fees, automatic mon- ∗ Corresponding author. E-mail addresses: shiva@um.edu.my (P. Shivakumara), cs.chan@um.edu.my (C.S. Chan), lutong@nju.edu.cn (T. Lu), liang.kimmeng@mimos.my (L.K. Meng), hockwoon.hon@mimos.my (H.H. Woon), Michael.Blumenstein@uts.edu.au (M. Blumenstein). itoring of car speeds on the road, automatic estimation of traf- fic volume at different traffic junctions, detection of illegal park- ing and incorrect traffic flows (Abolghasemi & Ahmadyafrd, 2009; Azam & Islam, 2016; Tadic, Popovic, & Odry, 2016). However, such a system only works well for a particular application since it is not developed for multiple applications. 
This is because any particular system can cope with a single adverse factor but not multiple factors, which affect license plate visuals (Zhou, Li, Lu, & Tian, 2012). In addition, most of the existing systems that have been developed use conventional binarization methods, which are proposed for plain background document images, to localize and recognize license plates (Ghaili, Mashohor, Ramli, & Ismail, 2013; Du et al., 2013; Yu, Li, Zhang, Liu, & Meng, 2015). It is obvious that for different real-time applications, multiple environmental effects are common (e.g., low resolution, low contrast, complex backgrounds, blur due to camera or vehicle movements, illumination effects due to sunlight or headlights, degradation effects due to rain, fog or haze, and distortion effects due to camera angle variations).

https://doi.org/10.1016/j.eswa.2019.04.030 0957-4174/© 2019 Elsevier Ltd. All rights reserved.

The illustration shown in Fig. 1 demonstrates that input image-1 is affected by perspective distortion, while input image-2 is
Fig. 1. Binarization and recognition results of license plate images affected by different effects, which result in varying recognition results.

affected by blur, as shown in Fig. 1(a). For these two license plate images, the binarization method (Zhou, Feild, Learned-Miller, & Wang, 2013), which is a state-of-the-art method that works well for low contrast and complex background images, fails to give good results for input image-1, but gives better results for input image-2, as shown in Fig. 1(b). However, the recognition results given by Tesseract OCR are empty for input image-1 due to touching, and incorrect for input image-2 due to shape loss, as shown in Fig. 1(b). On the other hand, the proposed method works well except for the first character in input image-1 through reconstruction-segmentation with the same OCR. With this illustration, one can conclude that there is an urgent need for developing a system which can withstand multiple adverse factors, such that the same system can be used for several real-time applications successfully.

2. Related work

The proposed license plate recognition system involves character segmentation through partial reconstruction, and complete reconstruction for recognition. Therefore, we review the research related to character segmentation, character recognition and character reconstruction.

Character Segmentation: Phan, Shivakumara, Su, and Tan (2011) proposed a gradient-vector-flow based method for video character segmentation. The method uses text line length for finding seed points, which are unreliable, and then uses minimum cost path estimation for finding spaces between characters. Sharma, Shivakumara, Pal, Blumenstein, and Tan (2013) proposed a new method for character segmentation from multi-oriented video words. The method is sensitive to dominant points.
Liang, Shivakumara, Lu, and Tan (2015) proposed a new wavelet Laplacian method for arbitrarily-oriented character segmentation in video text lines. This method explores zero crossing points to find spaces between words or characters. The performance of the method degrades when an image contains noisy backgrounds. There are methods proposed for segmenting characters from license plate images. For example, Tian, Wang, Wang, Liu, and Xia (2015a) proposed a two-stage character segmentation method for Chinese license plates. This method relies on binarization for segmentation. Sedighi and Vafadust (2011) proposed a new and robust method for character segmentation and recognition in license plate images. This method uses a classifier and binarization for segmentation. As a result, the method is dataset dependent. Khare, Shivakumara, Raveendran, Meng, and Woon (2015) proposed a new sharpness-based approach for character segmentation of license plate images. The method explores gradient vectors and sharpness for segmentation. However, the method is said to be sensitive to seed point selection and the presence of blur. Kim, Song, Lee, and Ko (2016) proposed an effective character segmentation approach for license plate recognition under varying illumination environments. The method uses binarization and the superpixel concept for segmentation. However, the method focuses on a single cause but not multiple causes. In the same way, recently, Dhar, Guha, Biswas, and Abedin (2018) proposed a system design for license plate recognition using edge detection and convolutional neural networks. The method uses character segmentation as a preprocessing step for license plate recognition. For character segmentation, the method explores edge detection, morphological operations and region properties. However, the method is good for images with simple backgrounds but not for images affected by many challenges.
Ingole and Gundre (2017) proposed character feature-based vehicle license plate detection and recognition. First, the method segments characters from license plate regions for recognition. For character segmentation, the method proposes vertical and horizontal projection profile-based features. The proposed projection profile-based features may not be robust for images with complex backgrounds. Radchenko, Zarovsky, and Kazymyr (2017) proposed a segmentation and recognition method for Ukrainian license plates. The method segments characters based on connected component analysis. Connected component analysis works well when the input image is binarized without loss of character shapes and without touching between characters. However, for images with complex backgrounds, it is hard to propose a binarization method that separates foreground and background information. In summary, from the above context, we can conclude that most of the methods made an attempt to solve the problem of low resolution or illumination effects, but do not address other distortions such as blur, touching and complex backgrounds. In addition, none of the methods explore the concept of reconstruction for segmenting characters from license plate images.

Character Recognition: To recognize characters in text lines of video, natural scene images and license plate images, there are methods that use either binarization methods or classifiers (Ye & Doermann, 2015). For example, Zhou et al. (2013) proposed scene
text binarization via inverse rendering. The method proposes a different idea for adapting parameters that tune the method according to image complexity. However, the assumptions made for proposing a number of criteria limit its ability to work on different applications. Wang, Shi, Xiao, and Wang (2015) proposed MRF-based text binarization for complex images using stroke features. The success of the method depends on how well it selects seed pixels from the foreground and background. Similarly, Anagnostopoulos et al. (2006) proposed a license plate recognition algorithm for intelligent transportation applications. Since the method involves binarization and a classifier for recognition, it may not work well for images affected by multiple adverse effects such as low resolution, blur and touching. Saha, Basu, and Nasipuri (2015) proposed automatic license plate recognition for Indian license plate images. The method involves edge map generation, the Hough transform and a classifier for recognition. The success of the method depends on edge map generation and the classifier. Gou, Wang, Yao, and Li (2016) proposed vehicle license plate recognition based on extremal regions and restricted Boltzmann machines. The method extracts HoG features for detected characters, and then uses a classifier for recognition. In summary, it is noted from the above review of license plate recognition approaches that most of the methods rely on binarization algorithms and classifiers for recognition. In addition, the methods do not consider images affected by multiple factors when achieving their results. Therefore, the methods lose generality and the ability to work on license plate images of different background and foreground complexities.
Deep Learning Models for Character Recognition: Jaderberg, Simonyan, Vedaldi, and Zisserman (2016) proposed an approach for reading text in the wild with a convolutional neural network, which explores deep learning for achieving high recognition results for text in natural scene images. Goodfellow, Bulatov, Ibarz, Arnoud, and Shet (2013) proposed multi-digit number recognition from street view imagery using deep convolutional neural networks, which explores deep learning at the pixel level. Although both methods address the challenges caused by natural scene images, they are limited to text recognition from high contrast images, not from low resolution license plate images and video images. Raghunandan et al. (2017) proposed a Riesz fractional-based model for enhancing license plate detection and recognition. This method makes an attempt to address the causes which affect license plate detection and recognition. Based on the experimental results, it is noted that enhancement of license plate images may improve the recognition results, but it is not adequate for real-time applications. Al-Shemarry, Li, and Abdulla (2018) proposed an ensemble of AdaBoost cascades of 3L-LBPs classifiers for license plate detection from low quality images. The method explores texture features based on LBP operations and uses a classifier for license plate detection from images affected by multiple adverse factors. However, the performance of the method heavily depends on learning and the number of labeled samples. In addition, the scope is limited to text detection but not recognition, as in the proposed work. Text detection is easier than recognition in this case because detection does not require the full shapes of characters. Recently, inspired by the strong ability and discriminating power of deep learning models, some methods have explored different deep learning models for license plate recognition.
For example, Dong, He, Luo, Liu, and Zeng (2017) proposed a CNN-based approach for automatic license plate recognition in the wild. The method explores an R-CNN for license plate recognition. Bulan, Kozitsky, Ramesh, and Shreve (2017) proposed segmentation- and annotation-free license plate recognition with deep localization and failure identification. The method explores CNNs for detecting a set of candidate regions. Then it filters false positives from the candidate regions based on strong CNNs. Silva and Jung (2018) proposed license plate detection and recognition in unconstrained scenarios. The method explores CNNs for addressing challenges caused by degradation. It detects the license plate region first, and then the detected region is fed to an OCR for recognition. Lin, Lin, and Liu (2018) proposed an efficient license plate recognition system using convolutional neural networks. The method detects vehicles for license plate region detection and then explores CNNs for recognition. Yang et al. (2018) proposed Chinese vehicle license plate recognition using kernel-based extreme learning machines with deep convolutional features. The method explores the combination of a CNN and an ELM (extreme learning machine) for license plate recognition. It is found from the above discussion on deep learning models that the methods work well when a huge number of labeled predefined samples is available. However, it is hard to choose predefined samples that represent all possible variations in license plate recognition, especially for images affected by multiple adverse factors, as in the proposed work. In addition, deep learning has its own inherent limitations, such as optimizing parameters for different databases and maintaining the stability of deep neural networks (Liu et al., 2017). It can be noted from the above discussion that there is a gap between the state-of-the-art methods and the present demand.
This observation motivated us to propose a new method for license plate recognition without depending much on classifiers and a large number of labeled samples, as the existing methods do.

Character Reconstruction: Similar to the proposed work, there are methods in the literature which reconstruct character shapes to improve recognition rates without the help of classifiers and binarization algorithms. Shivakumara, Phan, Bhowmick, Tan, and Pal (2013) proposed a ring radius transform for character shape reconstruction in video. Its performance is good as long as Canny produces the correct character structures. However, Canny is sensitive to blur and other distortions. To overcome this drawback, Tian, Shivakumara, Phan, Lu, and Tan (2015b) proposed a method for character shape restoration using gradient orientations. It finds the medial axis in the gradient domain along different directions. However, the method does not work well for characters with blur and complex backgrounds. In addition, the primary objective of this work is to reconstruct characters from video, which suffer from low resolution and low contrast, and it does not deal with license plate images. In light of the above discussion of character segmentation from license plate images, character recognition from license plate images and character reconstruction, most of the methods focus on a particular dataset and certain applications, such as natural scene images, video images or license plate images. As a result, the scope of the above methods is limited to specific applications and objectives. This motivated us to propose a method that can work well for license plate images, natural scenes and video images. In addition, license plate images are generally affected by multiple adverse factors due to background and foreground variations, making the problem of recognition more complex and interesting.
Inspired by the work of Shivakumara et al. (2019), where keyword spotting is addressed for multiple types of images with powerful feature extraction, we propose a novel idea for recognizing characters from license plates affected by multiple factors. The key contributions of the proposed work are as follows: (1) proposing partial reconstruction for segmenting characters from license plate images is novel; (2) reconstructing complete shapes of characters from segmented characters without binarization, which works well not only for license plate images but also for natural scene and video images, is also novel; (3) the combination of reconstruction and character segmentation in a new way is another interesting step to achieve good recognition rates for multi-type images. The main advantage of the proposed method is that since the
proposed reconstruction approach preserves character shapes, the performance of the method does not depend much on classifiers and the number of training samples.

The proposed method is structured as follows. Stroke width pair candidate detection is illustrated by estimating stroke width distances for each pixel in the images in Section 3.1. In Section 3.2, we propose symmetry properties based on stroke width distances to obtain partial reconstruction results. Section 3.3 proposes character segmentation using the partial reconstruction results based on principal and major axis information of the character components. We describe the steps for complete reconstruction in the gray domain in Section 3.4.

3. Proposed technique

This work considers license plates affected by multiple factors according to various applications, such as low resolution, low contrast, complex backgrounds, multiple fonts or font sizes, blur, multi-orientation, touching elements and distortion due to illumination effects, as input for character segmentation and recognition. To overcome the problem of low contrast and low resolution, inspired by Laplacian and gradient operations, which usually enhance high contrast information at or near edges while suppressing background information (Phan et al., 2011; Liang et al., 2015; Khare et al., 2015), we use Laplacian and gradient information for finding pixels which represent the stroke width (thickness of the stroke) of characters in license plate images. This is justified because the Laplacian, which is a second-order derivative, gives high positive and negative values at and near edges, respectively. Similarly, the gradient, which is a first-order derivative, gives high positive values at and near edges. This information is used for Stroke Width Pair (SWP) candidate detection.
It is true that stroke width (or stroke width distance) and color remain constant throughout characters at the character level, regardless of font or font size variations (Epshtein, Ofek, & Wexler, 2010). Most of the time, license plates are prepared using upper case letters. Furthermore, the spacing between characters in license plate images is almost constant. Based on these facts, we propose new symmetry features which use Laplacian and gradient properties at the SWP candidates to find neighboring SWPs. However, due to complex backgrounds, severe illumination effects and blur, there is a possibility for SWPs to fail to satisfy the symmetry features. This results in a loss of information, and hence we consider the output of this step as partial reconstruction. We believe that the partial reconstruction results preserve the structure of character components. This may lead to under- and over-segmentation. It is understood that the eigenvectors of PCA give angles based on the number of pixels which contribute to the direction of character components (Shivakumara, Dutta, Tan, & Pal, 2014). In other words, to estimate the possible angle of the whole character, PCA does not require the full character information. As per our experiments, in general, if the character contains more than 50% of its pixels, one can expect almost the same angle as that of the actual character. The same is true for angle estimation via the major axis of the character. With this motivation, we use the angle information given by PCA and the Major Axis (MA) to estimate the angles of character components. The angle information between PCA and MA is explored for character segmentation.
Since the proposed symmetry properties are sensitive to blur, touching and complex backgrounds, we apply the same symmetry properties with weaker conditions in the gray domain instead of the Laplacian and gradient domains to reconstruct the full character shape with the help of the Canny edge image of the input image. This is possible because there is no influence from neighboring characters after segmenting characters from the image. The reconstructed characters are passed to Tesseract OCR for recognition. The flow of the proposed method is shown in Fig. 2.

Fig. 2. Pipeline of the proposed method.

3.1. Stroke width pair candidate detection

As mentioned in the previous section, the stroke width distances (thickness of the stroke) of characters in a license plate image are usually the same, as shown in Fig. 3(a). To extract the stroke width distance, we apply a Laplacian operation, which gives high positive and negative responses for the transition from background to foreground and vice versa, respectively. This leads to searching for two zero crossing points that define the stroke width distance, as shown in Fig. 3(b) and 3(c), where a pictorial representation of the marked region in Fig. 3(b) is shown. Since the input images considered have complex backgrounds and small orientations due to angle variations, we use the following mask to extract horizontal, vertical and diagonal zero crossing points. Due to background variations and noise introduced by the Laplacian operation, as shown in Fig. 3(b), background and noise pixels may contribute to defining stroke width distances. Therefore, to overcome this issue, we plot a histogram of stroke width distances as shown in Fig. 3(c). The distances contributing to the highest peak are chosen as candidate stroke width pairs, which are shown in Fig. 3(d), where one can see red pixels denoting stroke width pair candidates.
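The zero-crossing scan and histogram vote described above can be sketched as follows. This is a minimal NumPy illustration under our own assumptions (row-wise scanning only, and the function names are ours), not the authors' implementation:

```python
import numpy as np

# 3x3 Laplacian mask used in the paper (all ones, -8 at the center).
LAPLACE_MASK = np.array([[1, 1, 1],
                         [1, -8, 1],
                         [1, 1, 1]], dtype=float)

def laplacian(img):
    """Direct 3x3 convolution with zero-padded borders."""
    img = img.astype(float)
    padded = np.pad(img, 1)
    out = np.zeros_like(img)
    h, w = img.shape
    for dy in range(3):
        for dx in range(3):
            out += LAPLACE_MASK[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def stroke_width_pairs(img):
    """Collect distances between consecutive sign changes (zero crossings)
    of the Laplacian along each row, and vote for the modal distance."""
    lap = laplacian(img)
    distances = []
    for row in lap:
        signs = np.sign(row)
        idx = np.where(signs != 0)[0]                 # ignore exact zeros
        crossings = [b for a, b in zip(idx, idx[1:]) if signs[a] != signs[b]]
        distances.extend(int(b - a) for a, b in zip(crossings, crossings[1:]))
    if not distances:
        return [], None
    modal = int(np.argmax(np.bincount(distances)))    # histogram peak
    return distances, modal
```

On a synthetic bright bar of width 4, the histogram peak recovers the stroke width; pairs at that distance would then be kept as SWP candidates.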
This is justified because the stroke pixel pairs that define the actual stroke width distance outnumber the pixel pairs defined by background or noise pixels. In this way, the proposed step can withstand background noise and degradations. It may be noted from Fig. 3(d) that the Stroke Width Pair (SWP) candidates represent character strokes. In addition, each character has a set of SWPs. It is evident from Fig. 3(e) that the proposed technique detects SWPs for the complex image in Fig. 1(a), where touching exists due to perspective distortion. It is noted from Fig. 3(d) that the number of red pixels differs from one character to another. This is because the proposed steps estimate the stroke width distance by considering all the pixels of the characters, not the pixels of individual characters. Since we consider the common stroke width distance of the pixels in the image, the number of stroke width pairs varies from one region to another due to background complexity. As a result, not all the pixels of the characters may contribute to the highest peak in the histogram. Therefore, one cannot predict the number of stroke width pairs for each character, as shown in Fig. 3(d). However, the proposed method has the ability to restore the character shape from one stroke width pair per character through the partial reconstruction step. We believe that each character gets at least one stroke width pair from the histogram operation for the partial reconstruction step because the characters follow the same font size
and typeface.

Laplace Mask =
[ 1   1   1 ]
[ 1  −8   1 ]
[ 1   1   1 ]

Fig. 3. Stroke width pair candidate detection.

3.2. Partial character reconstruction

The proposed technique considers the SWP candidates given by the previous section as representatives to find neighboring SWP candidates, which define the stroke width of the character. To achieve this, for each SWP candidate, the proposed technique considers the eight neighbors of the two stroke pixels and then checks all the combinations to identify the correct SWP, as shown in Fig. 4(a), where we can see the process of searching for the right neighbor SWP. Since the proposed method uses an 8-directional code for searching for the correct stroke width pair, one can expect 8 neighbor pixels for each stroke pixel of the pair. Therefore, the total number of combinations is 8 × 8 = 64 pairs. The reason for considering 8 neighbors for each stroke pixel is to ensure that the step does not miss checking any pair of pixels. Since stroke pixels represent edge pixels of characters, we can expect high gradient values compared to their background. Similarly, the pixels between the stroke pixels represent a homogeneous background, and the gradient gives low values for these pixels compared to the gradient values of the stroke pixels, as shown in Fig. 4(b) (Khare et al., 2015). Therefore, we study the gradual changes from high to low and low to high as shown in Fig. 4(c), where we can see gradual changes in gradient values; these are defined as the Gradient Symmetry (GS) feature. When we look at the Gradient Vector Flow (GVF) of the stroke pixels, as shown in Fig. 4(d), we can observe arrows pointing towards the edges; the arrows of the two stroke pixels point in opposite directions. This is called the GVF Symmetry (GVFS) feature, as shown in Fig. 4(e).
Similarly, we consider the value of the positive peak of the Laplacian and the difference between the positive and negative peak values for finding symmetry. In this way, we find the neighboring SWP of each SWP candidate as shown in Fig. 4(f), where one can see positive and positive-negative peaks. This is called the Laplacian Symmetry (LS) feature. The proposed technique extracts four symmetry features for each SWP candidate, and then checks the four symmetry features against all 64 combinations. Subsequently, it chooses the combination which satisfies the four symmetries as the neighboring SWP, and the pair is displayed as white pixels. The identified neighbor SWP is then considered as an SWP candidate, and the whole process repeats recursively to find all the neighboring SWPs in the image. This process stops when it has visited all SWPs. However, the number of iterations depends on the complexity of the characters and the number of SWPs of each character. As long as the stroke width pair satisfies the symmetry properties, the partial reconstruction step restores the contour pixels of the characters. When SWPs fail to satisfy the symmetry properties or there are no more SWPs to visit, the iterative process terminates.
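The 8 × 8 candidate enumeration can be sketched as below. This is a toy illustration only: a single gradient-magnitude match stands in for the paper's four symmetry features, and the tolerance, the restriction to horizontal pairs on the same row, and the function name are our simplifying assumptions:

```python
import numpy as np

# 8-neighbour offsets used to enumerate the 8 x 8 = 64 candidate pairs.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
           (0, 1), (1, -1), (1, 0), (1, 1)]

def neighbour_swp(grad, p, q, width, tol=10.0):
    """Return the first neighbour pair (p2, q2) of the stroke-width pair
    (p, q) that keeps the stroke width and matches the gradient magnitudes
    of p and q within `tol` (a stand-in for the GS/GVFS/LS checks)."""
    h, w = grad.shape
    for dp in OFFSETS:
        for dq in OFFSETS:
            p2 = (p[0] + dp[0], p[1] + dp[1])
            q2 = (q[0] + dq[0], q[1] + dq[1])
            if not all(0 <= r < h and 0 <= c < w for r, c in (p2, q2)):
                continue                      # stay inside the image
            if p2[0] != q2[0] or abs(q2[1] - p2[1]) != width:
                continue                      # keep the stroke width
            if abs(grad[p2] - grad[p]) <= tol and abs(grad[q2] - grad[q]) <= tol:
                if (p2, q2) != (p, q):
                    return p2, q2
    return None
```

In the paper, the accepted pair becomes the next SWP candidate and the search repeats recursively until no pair satisfies the symmetries.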
Fig. 4. Exploiting symmetrical features for finding neighbor SWPs from 64 combinations.

This is the reason for obtaining the partial shape of the character by partial reconstruction, as shown in Fig. 5, where we can see the intermediate steps of the partial reconstruction results. It can also be noted from Fig. 5 that the partial reconstruction results provide the structures of the characters with some loss of information. The four symmetrical features are defined as follows.

(i) Let G_SW = {g_SW1, g_SW2, …, g_SWn} and G_NP = {g_NP1, g_NP2, …, g_NPn}, where n is the size of the stroke width (SW), and g_SWi and g_NPi represent the gradient values of the stroke width and the Neighbor Pair (NP) at location i, respectively. Then NP = 1 iff g_SW1 = g_NP1, g_SW2 = g_NP2, …, g_SWn = g_NPn. Gradient symmetries can be visualized as in Fig. 4(b).
Fig. 5. Intermediate and final partial reconstruction results.

(ii) The angle information of the GVF at the starting point (sp) and end point (ep) of the stroke width is represented as GVF_SW(sp) and GVF_SW(ep). Then NP = 1 iff GVF_NP(sp) = GVF_SW(sp) and GVF_NP(ep) = GVF_SW(ep), where GVF_NP(sp) and GVF_NP(ep) represent the angle information of the GVF at the starting point and end point of NP, respectively. GVF angle symmetry can be visualized as in Fig. 4(e).

(iii) The peak values of the stroke width Laplacian (L) at the starting point and end point are represented as P_L_SW(sp) and P_L_SW(ep), respectively, and the peak values of the neighbor pair Laplacian at the starting point and end point are denoted by P_L_NP(sp) and P_L_NP(ep), respectively. Then NP = 1 iff P_L_NP(sp) = P_L_SW(sp) and P_L_NP(ep) = P_L_SW(ep).

(iv) Similarly, the difference between the highest and lowest peaks at the Laplacian zero crossing is used for comparing neighbor pairs. The highest and lowest peaks of the Laplacian zero-crossing points for the stroke width are represented as hP_L_SW and lP_L_SW, and those for the neighbor pair as hP_L_NP and lP_L_NP. The high-to-low differences are defined as Diff_SW = hP_L_SW − lP_L_SW and Diff_NP = hP_L_NP − lP_L_NP. Then NP = 1 iff Diff_NP = Diff_SW. Laplacian symmetries (iii) and (iv) can be visualized as in Fig. 4(f).

3.3. Character segmentation

Looking at the partial reconstruction results given by the previous section, as shown in Fig. 5(f) and 5(g), one can understand that even though there is a loss of shape, they still provide enough structure to help us find the spacing between characters and the character regions for segmentation. As mentioned in the proposed methodology section, Principal Component Analysis (PCA) and the Major Axis (MA) do not require the full character shape to estimate the possible directions of character components.
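As a concrete sketch of the angle estimation, the PCA angle can be taken from the dominant eigenvector of the component's pixel coordinates; as a simple stand-in for the major-axis angle we use the segment joining the two farthest foreground pixels. Both helpers and the 0–180° convention are our assumptions, not the paper's exact procedure:

```python
import numpy as np

def pca_angle(mask):
    """Angle (degrees, in [0, 180)) of the dominant eigenvector of the
    foreground pixel coordinates of a binary component."""
    ys, xs = np.nonzero(mask)
    coords = np.stack([xs, ys]).astype(float)
    coords -= coords.mean(axis=1, keepdims=True)      # center the cloud
    cov = coords @ coords.T / coords.shape[1]
    vals, vecs = np.linalg.eigh(cov)
    v = vecs[:, np.argmax(vals)]                      # largest-eigenvalue axis
    return np.degrees(np.arctan2(v[1], v[0])) % 180.0

def major_axis_angle(mask):
    """Angle (degrees) of the segment joining the two farthest foreground
    pixels -- a simple stand-in for the major-axis orientation."""
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=1).astype(float)
    d = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    i, j = np.unravel_index(np.argmax(d), d.shape)
    dx, dy = pts[j] - pts[i]
    return np.degrees(np.arctan2(dy, dx)) % 180.0
```

For a vertically elongated component both angles come out near 90°, and for a horizontally elongated one near 0°, which is the cue the segmentation hypothesis below relies on.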
It is also noted that most license plate images, including Malaysian license plates, contain upper case letters with numerals, but not a combination of upper case and lower case letters. According to the statement in Yao, Bai, Liu, Ma, and Tu (2012) that "for most text lines, the major orientations of characters are nearly perpendicular to the major orientation of the text line", both PCA and MA should give approximately 90° if the characters in the text are aligned in the horizontal direction. The above observations can be confirmed from the sample partial reconstruction results for the alphabet, namely A to Z, and the numerals, namely 0–9, chosen from the databases and shown in Fig. 6, where we note that for both alphabet and numeral images, PCA (yellow axis) and MA (red axis) give angles which are almost the same and approximately 90°, because all the images are inclined in the vertical direction. Similarly, the same conclusion can be drawn from the results shown in Fig. 7(a) and 7(b), where we present PCA and MA angle information for images affected by low contrast, complex backgrounds, multiple fonts, multiple font sizes, blur and perspective distortion. In the same way, the sample partial reconstruction results shown in Fig. 8(a) and 8(b) for images of two character components show that PCA and MA give angles of almost 0°, because the character components are aligned in the horizontal direction. The results in Figs. 6 and 7 show that partial reconstruction has the ability to preserve character shapes regardless of different causes, while PCA and MA have the ability to give the angle of character orientation without the complete shape of the character components. This observation leads us to define the following hypothesis for character segmentation.
If both axes give almost 90° within a ±26° tolerance, the component is considered a full character; if both axes give almost 0° within a ±26° tolerance, the component is considered an under-segmentation. The latter occurs when two character components are joined together, as shown in Fig. 8. Otherwise, the component is considered a case of over-segmentation, which occurs when a character loses its shape. The value of ±26 is determined based on experimental results, which will be presented in the experimental section. The reason for fixing such a threshold is that segmentation requires either a vertical or a horizontal orientation. With this idea, the proposed technique classifies components from the partial reconstruction results into the three cases. In general, characters in license plate images share the same aspect ratio, especially the height of the characters, as shown in Fig. 5(a). This observation motivated us to find the widths of the components for the three cases. If partial reconstruction outputs characters with clear
Fig. 6. Angle information given by PCA and MA for the alphabets and numerals of license plate images. The MA axis is shown in red and the PCA axis in yellow.

Fig. 7. PCA and MA angle information of the partial reconstruction results for different distorted images.

Fig. 8. PCA and MA angle information of the partial reconstruction results for an image of two character components.

shapes, and all the components are classified as the ideal character case according to the angular information, the proposed technique takes the width which contributes to the highest peak in the histogram as the probable width. If the proposed technique does not find a peak on the basis of width, it takes the average width of the characters as the probable width. The same probable width is used for segmenting characters, as shown in Fig. 9, where for the input license plate images in Fig. 9(a) and Fig. 9(b), the proposed technique plots histograms using the probable width as shown in Fig. 9(c), and the segmentation results given by the probable width are shown in Fig. 9(d) and Fig. 9(e), respectively. Fig. 9(d) and 9(e) show that segmentation using the probable width segments almost all the characters for image-1 in Fig. 9(a) except for "12". For image-2 in Fig. 9(b), it segments almost all the characters except for "W" and "U". Therefore, segmentation with probable widths is good in ideal cases, as shown in Fig. 9(f), where for the complex image in Fig. 3(e), the probable width segments all the characters successfully using the partial reconstruction results. However, this is not true for all cases. For example, it results in under-segmentation and over-segmentation, as shown in Fig. 9(d) and Fig. 9(e), respectively.
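The probable-width vote can be sketched as follows. The peak-then-mean fallback is stated in the text; the helper names and the fixed-pitch cut model are our assumptions:

```python
import numpy as np

def probable_width(widths):
    """Modal component width from a histogram vote; fall back to the
    average width when no bin wins (no repeated width)."""
    hist = np.bincount(widths)
    peak = int(np.argmax(hist))
    if hist[peak] > 1:                     # a clear histogram peak exists
        return peak
    return int(round(float(np.mean(widths))))

def cut_points(line_width, prob_w):
    """Candidate vertical cut positions at multiples of the probable width."""
    return list(range(prob_w, line_width, prob_w))
```

For example, widths [12, 12, 13, 12, 25] vote for 12, and cutting a 60-pixel line at that pitch yields cuts at 12, 24, 36 and 48; the under- and over-segmented leftovers are then handled by the iterative procedures below.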
To solve the problem of under-segmentation given by the probable width, we propose an iterative-shrinking algorithm, which removes small portions of a component from the right side with a step size of five pixels in the partial reconstruction results, and then checks the angle information for an ideal character. The proposed technique iteratively investigates whether the angle difference between PCA and MA leads to an angle of 90°. When the angle
Fig. 9. Character segmentation using probable widths.

Fig. 10. Iterative-shrinking process for under-segmentation. (a) gives the case of under-segmentation, and (b) shows the intermediate results of the iterative process.

difference satisfies the condition of an ideal character, the iterative process stops, and the character is considered an individual component. Since an under-segmentation usually contains two characters, such as "12", the iterative process segments such cases successfully. This process is applied to all the components from the results of partial reconstruction to solve the problem of under-segmentation.

The process of iterative-shrinking is illustrated in Fig. 10, where (a) is a sample under-segmentation case, (b) gives the intermediate results of the iterative process, and (c) shows the final results. It is observed from Fig. 10(b) that the angle difference between the axes given by PCA and MA reduces as the iterations continue, and the process stops when both axes give the same angle.

In the same way as iterative-shrinking for under-segmentation, we propose iterative-expansion to solve the over-segmentation cases. For each component given by the probable width, the proposed technique expands with a step size of five pixels from the left side. At the same time, in the partial reconstruction results, it calculates the angle difference between PCA and MA. This process continues until the angle reaches almost zero degrees. When two characters are merged, both PCA and MA give an angle of zero degrees. At this point, the iterative process stops, and the iterative-shrinking algorithm is then used to segment the two characters. Therefore, the proposed iterative-expansion uses iterative-shrinking for solving the over-segmentation problem.
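The ±26° axis test and the iterative-shrinking loop can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the major axis (MA) is approximated here by the segment joining the two farthest-apart foreground pixels, and `axis_angles`, `is_ideal` and `iterative_shrink` are hypothetical helper names.

```python
import numpy as np

def axis_angles(component):
    """PCA and major-axis (MA) orientations, in degrees, of a binary
    component. MA is approximated by the farthest-apart pixel pair."""
    ys, xs = np.nonzero(component)
    pts = np.column_stack([xs, ys]).astype(float)
    # PCA: leading eigenvector of the coordinate covariance matrix
    vals, vecs = np.linalg.eigh(np.cov(pts.T))
    vx, vy = vecs[:, np.argmax(vals)]
    pca = abs(np.degrees(np.arctan2(vy, vx))) % 180
    # MA: brute-force farthest pair (fine for small components)
    d = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    i, j = np.unravel_index(np.argmax(d), d.shape)
    dx, dy = pts[j] - pts[i]
    ma = abs(np.degrees(np.arctan2(dy, dx))) % 180
    return pca, ma

def is_ideal(component, tol=26.0):
    """Both axes within +/-26 degrees of vertical -> ideal character."""
    pca, ma = axis_angles(component)
    return abs(pca - 90) <= tol and abs(ma - 90) <= tol

def iterative_shrink(component, step=5):
    """Drop `step` (five-pixel) column slices from the right until the
    remainder passes the ideal-character test; returns the cut column."""
    right = component.shape[1]
    while right > step and not is_ideal(component[:, :right]):
        right -= step
    return right
```

Iterative-expansion mirrors this loop: grow the block by five-pixel column slices until both angles approach zero degrees (the merged-characters signal), then hand the merged block back to `iterative_shrink`.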
Note that the proposed technique first employs iterative-shrinking to solve under-segmentation, and then uses iterative-expansion for solving over-segmentation; this ordering is necessary because iterative-expansion requires iterative-shrinking. The reason for proposing an iterative procedure for both shrinking and expansion is that when a character component is split into small
Fig. 11. Iterative-expansion process for over-segmentation.

fragments due to adverse factors, or when character components are joined together, it is necessary to study local information in order to identify the vertical and horizontal cases. The process of iterative-expansion is illustrated in Fig. 11, where (a) shows the cases of over-segmentation, (b) shows intermediate results of partial reconstruction of (a), (c) gives the results of iterative-expansion followed by shrinking for correct segmentation, and (d) gives the final character segmentation results.

3.4. Complete character reconstruction

Section 3.2 described the method to obtain partial character reconstruction for input license plate images, and the method presented in Section 3.3 uses the advantage of partial reconstruction for character segmentation. Since characters are segmented well from license plate images even when they are affected by multiple factors, we apply Canny to obtain edges for reconstructing the complete shape of each incomplete character given by partial reconstruction. This is because Canny gives fine edges for low and high contrast images when we supply individual characters rather than the whole license plate image (Saha et al., 2015). Therefore, we consider the output of Canny as the input for reconstructing the missing information in the partial reconstruction results. For the Canny edge of the input character image shown in Fig. 12(a), the proposed technique finds the Stroke Width Pair (SWP) candidates as described in Section 3.2, where we can see the characters "W" and "5" with shapes lost in partial reconstruction. The SWPs are considered as representatives for reconstruction in this Section.
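A per-character edge map of the kind fed into the reconstruction step could look like the following sketch. To keep the sketch self-contained in NumPy, a simple Sobel gradient-magnitude threshold stands in for the full Canny pipeline the paper uses; `thresh` is an assumed parameter.

```python
import numpy as np

def simple_edges(char_img, thresh=0.25):
    """Edge map for one segmented character crop (Sobel-magnitude
    threshold as a lightweight stand-in for Canny)."""
    img = char_img.astype(float)
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    pad = np.pad(img, 1, mode="edge")
    h, w = img.shape
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            win = pad[y:y + 3, x:x + 3]
            gx[y, x] = (win * kx).sum()  # horizontal gradient
            gy[y, x] = (win * ky).sum()  # vertical gradient
    mag = np.hypot(gx, gy)
    if mag.max() == 0:
        return np.zeros_like(mag, dtype=bool)
    return mag > thresh * mag.max()
```

In practice the full Canny detector (with hysteresis thresholding) would be applied to each segmented character, as the paper describes.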
For each SWP, since the proposed technique defines symmetry features using gradient values, gradient vector flow and the Laplacian, we define the same symmetry features using gray information rather than gradient information. This is because, according to our analysis of the experimental results, the gradient does not give good responses for low contrast, low resolution and distorted images. This is the main reason for the loss of shapes, and it is also what led to partial reconstruction. Since characters are segmented and their pixels have uniform color values, we propose symmetry features in the gray domain to restore the incomplete information in the partial reconstruction results and obtain complete character shapes. For each SWP, the proposed technique calculates a tangent angle as defined below:

Angle_tan = tan⁻¹((y − y1)/(x − x1))

where (x, y) is the starting pixel location of the SWP, and (x1, y1) is the location of its neighbor pixel. Since the tangent angle between the SWP pixel and the neighbor pixel gives a direction, the proposed technique finds the neighbor pixel in the same direction at the same stroke width distance to restore the neighboring SWP. As long as the difference between the tangent angles of the current pixel and the neighbor pixel remains the same, and the neighbor pair satisfies the stroke width distance of the SWP, the proposed technique moves in the same direction to restore the neighboring SWP. This process works well for straight strokes, whilst at curves and corners the tangent angle gives a large difference. Moreover, this tangent-based restoration works well for individual characters but not for the whole license plate image, where the tangent direction may lead into touching, adjacent characters. In this situation, the proposed technique recalculates the stroke width using the eight neighbors of the SWP pixels, as calculated in Section 3.2.
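The tangent-guided walk, together with gray-level symmetry checks of the kind the text describes next, can be sketched as follows. All function names and tolerances (`tol_deg`, `tol`) are hypothetical and assumed, since the paper gives only the qualitative conditions.

```python
import numpy as np

def next_swp(p, p1, stroke_width):
    """Given an SWP pixel p=(x, y) and its neighbour p1=(x1, y1), step
    one stroke-width along the tangent direction to predict where the
    neighbouring SWP should lie (curve/corner handling omitted)."""
    (x, y), (x1, y1) = p, p1
    angle = np.arctan2(y1 - y, x1 - x)  # tangent direction
    return (int(round(x + stroke_width * np.cos(angle))),
            int(round(y + stroke_width * np.sin(angle))))

def same_direction(p, p1, q, q1, tol_deg=10.0):
    """Continue the walk only while the tangent angle is preserved."""
    a1 = np.degrees(np.arctan2(p1[1] - p[1], p1[0] - p[0]))
    a2 = np.degrees(np.arctan2(q1[1] - q[1], q1[0] - q[0]))
    return abs(a1 - a2) <= tol_deg

def peak_intensity_symmetry(profile, tol=20):
    """PIS: the gray values at the two SWP end pixels are nearly equal."""
    return abs(int(profile[0]) - int(profile[-1])) <= tol

def intensity_symmetry(profile):
    """IS: gray values between the pair fall gradually to a minimum and
    then rise gradually back (high-to-low, then low-to-high)."""
    m = int(np.argmin(profile))
    falls = all(profile[i] >= profile[i + 1] for i in range(m))
    rises = all(profile[i] <= profile[i + 1]
                for i in range(m, len(profile) - 1))
    return falls and rises
```

A candidate pair that passes both checks would be marked as a restored contour pixel, as in Fig. 12(d).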
To find the right combination of SWP out of the 64, we define two symmetry features: the intensity value at the first pixel and that at the second pixel of the SWP are almost the same, as shown in Fig. 12(b), which is called the Peak Intensity Symmetry (PIS); and the intensity values between the first and second pixels of the SWP change gradually from high to low and then from low to high, as shown in Fig. 12(c), which is called the Intensity Symmetry (IS). If a combination of SWP satisfies these two symmetry features, the pair is considered as actual contour pixels and displayed as white pixels, as shown in Fig. 12(d), where one can see that the lost information in Fig. 12(a) is restored. The potential of complete character reconstruction for the license plate images shown in Fig. 13(a) can be seen in Fig. 13(b), where shapes are restored, and the recognition results in Fig. 13(c) illustrate correct OCR results for both license plate images.

In summary, the gradient domain helps us to define symmetry properties but, at the same time, it misses vital pixels of characters due to its sensitivity to low contrast and low resolution, which results in partial character reconstruction. To overcome this problem, the proposed method defines the same properties using gray values rather than gradient values. This is because a segmented character is not influenced by complex backgrounds, and the pixels of the characters have almost uniform values. Therefore, the combination of the properties in the gradient and gray domains helps us to restore the missing information. In other words, partial reconstruction helps in the accurate segmentation of characters, while segmentation helps in restoring the complete shape using intensity values in the gray domain.

4. Experimental results

To evaluate the effectiveness of the proposed technique for real-time applications, we consider the dataset provided by MIMOS, an institute funded by the Government of Malaysia where License Plate Recognition (LPR) is a live ongoing project. The dataset consists of 680 complex license plate images with various challenges, such as poor quality images where we can expect low contrast, blurred images, and character-touching images due to illumination effects, sunlight, or headlights at night.
Fig. 12. Complete reconstruction in the gray domain.

Fig. 13. Effectiveness of the complete reconstruction algorithm.

To demonstrate the merit of the proposed technique, we consider standard datasets that are publicly available, namely, the UCSD dataset (Zamberletti, Gallo, & Noce, 2015) with 1547 images, which have a variety of challenges, including the presence of blur, license plate images with very small fonts captured from a substantial distance, and low resolution images. The Medialab dataset (Zamberletti et al., 2015) contains 680 license plate images, which have a variety of font sizes, illumination effects, and shadow effects. The Uninsubria dataset (Zamberletti et al., 2015), containing 503 license plate images captured from nearby, is of better quality compared to the UCSD and Medialab datasets, but generally has more complex backgrounds. In total, we considered 3410 license plate images for experimentation, covering the multiple factors mentioned in the Introduction section. In addition, we chose 100 license plate images affected by multiple adverse factors, as mentioned above, from all the license plate datasets to test the ability and effectiveness of the proposed technique; these are termed the challenging data. This data does not include 'good' (easy) images as the other datasets do. Since the proposed technique is capable of handling multiple causes, we also test it on other standard datasets, such as ICDAR 2013, which has 28 videos (Karatzas et al., 2013), YVT, which has 29 videos (Nguyen, Wang, & Belongie, 2014), and ICDAR 2015, which has 49 videos (Karatzas et al., 2015). These datasets are popular for evaluating text detection and recognition methods, and they include low resolution, low contrast, complex backgrounds, and multiple fonts, sizes, and orientations.
Similarly, for natural scene datasets, we use ICDAR 2013 (Karatzas et al., 2013), which has 551 images, SVT, which has 350 images (Wang & Belongie, 2010), MSRA-TD500, which has 500 images (Yao et al., 2012), and ICDAR 2015, which has 462 images (Karatzas et al., 2015). The reason for considering natural scene datasets in the experimentation is to show that when the proposed technique works well for low resolution and low contrast images, it will also work for high resolution and high contrast images. The main differences between the video datasets and these datasets are contrast and resolution: video datasets suffer from low resolution and low contrast, while natural scene datasets provide high contrast and high resolution images. In total, 3510 license plate images, 106 videos, and 1863 natural scene images are considered for experimentation to demonstrate that the proposed technique is robust, generic and effective.
The proposed technique involves a reconstruction step and a character segmentation step. To evaluate the reconstruction step, we follow the standard measures and scheme used in Peyrard, Baccouche, Mamalet, and Garcia (2015), namely, Peak Signal to Noise Ratio (PSNR), Root Mean Square Error (RMSE), and Mean Structural Similarity (MSSIM), as defined below. Since the measures in Peyrard et al. (2015) are proposed for evaluating the quality of handwritten images, we prefer them for evaluating the reconstruction steps of the proposed technique.

PSNR = (1/N) ∑_{i=1}^{N} PSNR_i (1)

RMSE = (1/N) ∑_{i=1}^{N} RMSE_i (2)

MSSIM = (1/N) ∑_{i=1}^{N} MSSIM_i (3)

For character segmentation, we use the standard measures proposed in Phan et al. (2011), namely, Recall (R), Precision (P), F-measure (F), Under-Segmentation (U) and Over-Segmentation (O). The definitions for the measures are as follows.

Truly Detected Character (TDC): a segmented block that contains correctly-segmented characters.
Under-Segmented Block (USB): a segmented block which contains more than one character.
Over-Segmented Block (OSB): a segmented block that contains no complete characters.
False Detected Block (FDB): a segmented block that does not contain any characters, for example, intermediate objects, a boundary, or blank space.

The measures are calculated as follows:

Recall (R) = TDC/ANC, Precision (P) = TDC/(TDC + FDB), F-measure (F) = (2 × P × R)/(P + R),
Under-Segmentation (U) = USB/ANC, Over-Segmentation (O) = OSB/ANC.

To validate that the reconstruction step preserves character shapes, we consider the character recognition rate as a measure for the reconstructed images, using the publicly available Tesseract OCR (2016).
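Under these definitions, the segmentation measures can be computed as in the following sketch (assuming, as is standard for precision, that TDC is divided by all detected blocks, TDC + FDB; `segmentation_measures` is a hypothetical helper name):

```python
def segmentation_measures(tdc, fdb, usb, osb, anc):
    """Recall/precision/F-measure and under-/over-segmentation rates
    from the block counts defined above; ANC is the actual number of
    characters counted in the ground truth."""
    recall = tdc / anc
    precision = tdc / (tdc + fdb)
    f = 2 * precision * recall / (precision + recall)
    return {"R": recall, "P": precision, "F": f,
            "U": usb / anc, "O": osb / anc}
```

For example, 80 truly detected characters with 20 false blocks, 5 under- and 3 over-segmented blocks out of 100 ground-truth characters give R = P = 0.8.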
For the evaluation of the recognition results, we follow the definitions of Recall (RR), Precision (RP) and F-measure (RF) as in Ben-Ami, Basha, and Avidan (2012), because these definitions were proposed for Bib number recognition; since Bib numbers and license numbers are similar, we prefer these measures. RR is defined as the percentage of correctly recognized characters out of the total number of characters (ground truth), and RP is defined as the percentage of correctly recognized characters out of the total number of recognized characters. For the F-measure, we use the same formula employed in evaluating the segmentation step to combine RR and RP into one measure. Note that since no ground truth is available for the license plate datasets (MIMOS, UCSD, Medialab, and Uninsubria), we manually count the Actual Number of Characters (ANC) as the ground truth. For the standard video and scene image datasets, we use the available ground truth and the evaluation schemes given with it.

In order to show the usefulness and effectiveness of the proposed technique, we implement existing character segmentation methods to facilitate comparative studies, namely, Phan et al. (2011), which uses minimum cost path estimation for character segmentation in video; Khare et al. (2015), which proposes sharpness features for character segmentation in license plate images; and Sharma et al. (2013), which uses a combination of cluster analysis and minimum cost path estimation for character segmentation in video. The main reason for selecting these existing methods is that a method focusing on a single factor may not work well for license plate images affected by multiple factors.
Phan et al.'s method addresses low resolution and low contrast factors, Khare et al.'s method is a recent one that addresses license plate issues to some extent, and Sharma et al.'s method addresses multi-oriented and touching factors. Dhar et al. (2018) proposed a system design for license plate recognition using edge detection and convolutional neural networks. Ingole and Gundre (2017) proposed character feature-based vehicle license plate detection and recognition. Radchenko et al. (2017) proposed a method for the segmentation and recognition of Ukrainian license plates. The reason for choosing these methods is that their objective is the same as that of the proposed work; however, they are confined to specific applications. In the same way, we choose state-of-the-art recognition methods, namely, the method of Zhou et al. (2013), a robust binarization approach that works well for high resolution and low contrast images; the method of Tian et al. (2015a), a recent approach proposed for the recognition of video characters through shape restoration; and the method of Anagnostopoulos et al. (2006), which proposes an artificial neural network for character recognition in license plate images. The motivation for choosing these methods for the comparative study is that Zhou et al.'s method represents state-of-the-art recognition of scene characters through binarization, Tian et al.'s method represents state-of-the-art recognition of video characters through reconstruction, and the method of Anagnostopoulos et al. (2006) represents state-of-the-art recognition of characters in license plates through classifiers. Since the proposed technique is robust to multiple factors, we chose these methods to work on different datasets for undertaking a comparative study to validate the strengths of the proposed technique.
Additionally, we consider the following methods, which explore recent deep learning models for license plate recognition. Bulan et al. (2017) proposed segmentation- and annotation-free license plate recognition with deep localization and failure identification; the method explores CNNs for detecting a set of candidate regions. Silva and Jung (2018) proposed license plate detection and recognition in unconstrained scenarios; the method explores CNNs for addressing challenges caused by degradation. Lin et al. (2018) proposed an efficient license plate recognition system using convolutional neural networks.

To find the values for the parameters, thresholds, symmetry properties and conditions, we randomly chose 500 sample images from the datasets for experimentation. Since the proposed method does not involve classifiers for training, we prefer to choose samples randomly from all the databases considered in this work. We use a system with an Intel Core i5 CPU and 8 GB RAM for all experiments. According to our experiments, the proposed method consumes 30 ms per image, which includes partial reconstruction, character segmentation, complete character reconstruction and recognition.

In Section 3.3, we define three hypotheses for ideal character detection, over-segmentation and under-segmentation based on the principal (PCA) and major (MA) axes. It is expected that PCA gives the same angles for ideal characters; however, this is not the case due to the complexity of the problem. Therefore, we
Fig. 14. Determining the optimal value for the threshold of PCA and MA to check whether a segmented character is ideal or not. Note: at an angle value of 26°, the recognition rate is high compared to other values.

Fig. 15. Determining the percentage of missing pixels to define partial reconstruction and the threshold value for the angle difference between the PCA and MA angles.

set ±26° as the threshold for character segmentation using the partial reconstruction results. To determine this value, we conducted experiments on 500 randomly chosen samples by varying the angle value against the recognition rate, as shown in Fig. 14, where it is observed that for an angle of 26°, the proposed method reports a high recognition rate. Hence, we use the same value for all the experiments in this work.

In Section 3.2, the proposed method introduces the partial reconstruction concept for character segmentation. It is expected that the partial reconstruction step outputs the structure of the character shape such that at least a human could read the character. The question is how to define partial reconstruction in terms of quantity. Therefore, we conducted experiments by estimating the number of missing pixels compared to the pixels in the ground truth. In this experiment, we manually add noise and blur at different levels to make the character images complex such that they lose pixels, and we calculate the percentage of missing pixels with the help of the ground truth. We illustrate sample results for different percentages of missing pixels during partial reconstruction in Fig. 15(a), where we can see the angles given by PCA and MA, the difference between the PCA and MA angles, and the different percentages of missing white pixels. It is observed from Fig.
15(a) and 15(b) that for 90% down to 40%, the proposed method constructs the complete shape of the character and obtains correct recognition results, but for lower than 40%, the proposed method loses the shape of the character, which results in incorrect recognition. Based on this experimental analysis, we consider 40% as the threshold to define partial reconstruction results in this work. It is also noted from Fig. 15(a) that for a difference angle of 28.2°, the proposed criteria for character segmentation fail, as the OCR gives incorrect results. It is evident that ±26° is a feasible threshold to achieve better results.

4.1. Experiments for analyzing the contributions of individual steps of the proposed technique

The major contributions of the proposed technique are partial reconstruction, character segmentation and complete reconstruction. To understand the effectiveness of each step, we conducted experiments on the MIMOS dataset and calculated the respective measures, as reported in Table 1.

Table 1
Performances of individual steps of the proposed technique on the MIMOS dataset.

Steps | PSNR | RMSE | MSSIM | R    | P   | F    | O    | U    | RR   | RP   | RF
PR    | 12.3 | 69.4 | 0.29  | –    | –   | –    | –    | –    | 23.7 | 13.1 | 18.5
SWR   | –    | –    | –     | 16.3 | 8.4 | 11.1 | 33.3 | 26.6 | –    | –    | –
CRWS  | 8.6  | 78.6 | 0.24  | –    | –   | –    | –    | –    | 14.4 | 13.2 | 13.8

The reason for selecting the
MIMOS dataset is that it consists of live data provided by a research institute. To estimate the quality measures for partial reconstruction and complete reconstruction, we use artificially created Canny edge images of English alphabets as the ground truth. It is noted from the quality measures of the partial reconstruction reported in Table 1 that, except for MSSIM, the other two measures report poor results. This shows that the partial reconstruction step preserves the character structures while, at the same time, some information is lost. It is evident from the recognition results of the partial reconstruction reported in Table 1 that all three measures report low results. Therefore, one can ascertain that partial reconstruction alone may not help us to achieve better results. To analyze the effectiveness of the segmentation step, we apply it to the Canny edge images of the input characters without partial reconstruction (SWR). It is observed that all the segmentation measures report low results; in particular, the under- and over-segmentation rates are poor. This shows that the segmentation step alone is inadequate for solving the segmentation problem for license plate images. Similarly, we apply the complete reconstruction algorithm to the Canny edge image of each input image without segmenting characters (CRWS). The results reported in Table 1 show that the quality measures report low results except for MSSIM, and the recognition measures also report poor results. Therefore, we can argue that the symmetry features proposed for complete reconstruction are not good when applied to the whole image without segmentation. Overall, we can conclude that reconstruction and character segmentation complement each other to achieve better results.
In the case of license plate recognition, when the images are affected by multiple causes, we can sometimes expect a little elongation, such as the effect of perspective distortion. To show the effect of elongation created by multiple causes, we implemented the method in Dhar et al. (2018), which considers extrema points for correcting small tilts to the horizontal direction. In this work, we calculate the quality measures, segmentation measures and recognition rate before and after rectification on the MIMOS dataset, as reported in Table 2. Before rectification, the images are considered as input without correcting the small tilt in the horizontal direction; after rectification, the corrected images are considered for experimentation. It is found from Table 2 that all the steps, including the proposed method, give slightly better results after rectification than before. However, the difference is marginal. Therefore, we can conclude that if we use rectification before recognizing the license plates, the recognition rate improves slightly.

4.2. Experiments on the proposed character segmentation approach

Qualitative results of the proposed technique on license plate images of different datasets, namely, MIMOS, Medialab, UCSD and Uninsubria, are shown in Fig. 16(a) and 16(b), where we can see that the complexity of the input images varies from one dataset to another due to the multiple factors in the datasets. For such images, the proposed technique segments characters successfully. It is evident that the proposed technique is robust to multiple factors. Quantitative results of the proposed and existing techniques for the above-mentioned datasets are reported in Table 3, where we note that the proposed technique is the best on all the measures, especially the under- and over-segmentation rates, which are low compared to the existing techniques.
Table 3 shows that all the methods, including the proposed technique, provide good accuracies on the MIMOS dataset and the lowest on the UCSD dataset. This is because the number of distorted images is higher in the UCSD dataset than in MIMOS and the other datasets. The results of the proposed and existing methods

Table 2
Performance of the individual steps and the proposed method before and after rectification on the MIMOS dataset.

Before rectification:
Steps    | PSNR | RMSE | MSSIM | R    | P    | F    | O    | U    | RR   | RP   | RF
PR       | 12.3 | 69.4 | 0.29  | –    | –    | –    | –    | –    | 23.7 | 13.1 | 18.5
SWR      | –    | –    | –     | 16.3 | 8.4  | 11.1 | 33.3 | 26.6 | –    | –    | –
CRWS     | 8.6  | 78.6 | 0.24  | –    | –    | –    | –    | –    | 14.4 | 13.2 | 13.8
Proposed | 32.1 | 7.1  | 0.65  | 86.8 | 82.6 | 84.6 | 10.8 | 2.4  | 88.4 | 84.3 | 86.3

After rectification:
Steps    | PSNR | RMSE | MSSIM | R    | P    | F    | O    | U    | RR   | RP   | RF
PR       | 13.7 | 65.4 | 0.32  | –    | –    | –    | –    | –    | 27.6 | 15.3 | 21.4
SWR      | –    | –    | –     | 18.9 | 10.6 | 14.7 | 30.7 | 24.4 | –    | –    | –
CRWS     | 10.4 | 74.3 | 0.21  | –    | –    | –    | –    | –    | –    | –    | –
Proposed | 34.7 | 6.4  | 0.62  | 88.9 | 84.3 | 86.6 | 9.4  | 2.1  | 90.6 | 87.3 | 88.9
Table 3
Performance of the proposed and existing techniques for character segmentation on different license plate datasets.

Dataset | Measure | Phan et al. (2011) | Khare et al. (2015) | Sharma et al. (2013) | Dhar et al. (2018) | Ingole and Gundre (2017) | Radchenko et al. (2017) | Proposed
MIMOS | R | 39.4 | 58.4 | 68.3 | 73.4 | 74.6 | 65.3 | 86.8
MIMOS | P | 38.4 | 57.3 | 66.9 | 72.3 | 70.4 | 63.3 | 82.6
MIMOS | F | 38.7 | 57.5 | 67.5 | 72.8 | 72.5 | 64.3 | 84.6
MIMOS | O | 21.1 | 23.2 | 14.9 | 14.8 | 15.3 | 18.3 | 10.8
MIMOS | U | 38.7 | 18.4 | 16.7 | 12.4 | 12.2 | 17.4 | 2.4
Medialab | R | 34.3 | 51.3 | 54.7 | 69.7 | 70 | 59.4 | 82.1
Medialab | P | 33.6 | 47.3 | 42.1 | 64.2 | 67.4 | 55.4 | 81.6
Medialab | F | 33.9 | 49.3 | 48.3 | 66.9 | 68.7 | 57.4 | 81.6
Medialab | O | 24.2 | 25.2 | 19.7 | 21.1 | 20.4 | 42.6 | 10.1
Medialab | U | 39.6 | 20.6 | 22.6 | 12.7 | 10.9 | 23.3 | 7.9
UCSD | R | 21.3 | 26.1 | 41.3 | 35.2 | 47.2 | 29.6 | 56.7
UCSD | P | 20.4 | 22.4 | 36.9 | 30.6 | 40.7 | 27.4 | 53.4
UCSD | F | 20.8 | 24.6 | 39.1 | 32.9 | 43.9 | 28.5 | 55.1
UCSD | O | 35.5 | 43.1 | 26.4 | 39.7 | 34 | 35.9 | 12.9
UCSD | U | 45.7 | 30.6 | 32.6 | 27.4 | 22.1 | 35.6 | 29.8
Uninsubria | R | 31.4 | 42.7 | 61.3 | 41.6 | 53.6 | 48.7 | 75.7
Uninsubria | P | 30.5 | 41.6 | 57.4 | 39.8 | 50.9 | 46.1 | 66.4
Uninsubria | F | 30.9 | 42.1 | 59.3 | 40.7 | 52.2 | 47.4 | 71.1
Uninsubria | O | 35.7 | 28.9 | 22.3 | 31.9 | 26.9 | 24.3 | 12.3
Uninsubria | U | 32.9 | 28.4 | 16.4 | 27.4 | 20.8 | 28.3 | 12.6
Only Challenged Images | R | 33.4 | 47.2 | 57.4 | 43.6 | 54.8 | 51.6 | 72.1
Only Challenged Images | P | 36.2 | 42.3 | 52.3 | 41.1 | 50.4 | 47.9 | 73.4
Only Challenged Images | F | 34.7 | 44.7 | 54.8 | 42.3 | 52.6 | 49.7 | 72.6
Only Challenged Images | O | 34.3 | 29.7 | 24.6 | 33.5 | 24.8 | 26.8 | 13.6
Only Challenged Images | U | 30.9 | 25.5 | 20.6 | 24.1 | 22.6 | 23.4 | 13.8

Fig. 16. Qualitative results of the proposed technique for character segmentation on different datasets.

on the challenging data show that the proposed method performs almost the same as on the other license plate datasets, despite the fact that the challenging data does not include any 'good' (easy) images as the other datasets do. The reason for the poor results of the first three existing methods is that their main goal is to detect text in video or natural scene images, not license plate images. Similarly, although the methods of Dhar et al. (2018), Ingole and Gundre (2017) and Radchenko et al.
(2017) were developed for character segmentation from license plate images, they do not perform well on all the datasets compared to the proposed method. The reason is that these methods depend on profile-based features, binarization and the specific nature of the dataset, like conventional document analysis methods.

Similarly, quantitative results of the proposed and existing techniques for video and natural scene images are reported in Table 4, where it is observed that the proposed technique is the best in F-measure and in under- and over-segmentation compared to the existing techniques. It may be noted from Table 4 that the proposed technique scores consistent results for all the datasets except the MSRA-TD-500 dataset. This is because this dataset contains arbitrarily-oriented text. Since our aim is to develop a technique for license plate images, where we may not find arbitrary orientations, the proposed technique gives poor results when the characters are in arbitrary orientations, such as curved text. The reason for the poor results of the existing methods is that all three are sensitive to the starting point, as they need to estimate the minimum cost path. In contrast, the proposed technique requires neither seed points nor starting points to find the spaces between characters. Overall, the segmentation experiments show that the proposed technique is capable of handling license plates as well as video and natural scene images.

4.3. Experiments on the proposed character recognition technique through reconstruction

Qualitative results of the proposed and existing techniques for the recognition of license plate images on different datasets are shown in Fig. 17(a)–17(d) for MIMOS, Medialab, UCSD and Uninsubria, respectively. The recognition step considers the output of the segmentation step as its input, as shown in Fig. 17, and reconstructs the shapes of the segmented characters.
These results show that the proposed technique reconstructs shapes well for characters of different datasets affected by different factors. This can be validated by the recognition results given by the OCR, shown in double quotes in Fig. 17. Thus, we can assert that the proposed technique does not require binarization for recognition. To evaluate the reconstruction results given by the proposed and existing techniques, we estimate the quality measures, which are reported in Table 5 for the license plate, video and natural scene image datasets. Since Tian et al.'s method (Tian et al., 2015a) outputs
Table 4
Performance of the proposed and existing techniques for character segmentation on different video and natural scene datasets.

Dataset | Measure | Phan et al. (2011) | Khare et al. (2015) | Sharma et al. (2013) | Dhar et al. (2018) | Ingole and Gundre (2017) | Radchenko et al. (2017) | Proposed
ICDAR 2015 Video | R | 22.6 | 37.9 | 60.7 | 39.4 | 55.3 | 46.8 | 66.9
ICDAR 2015 Video | P | 24.6 | 34.2 | 58.3 | 37.4 | 53.9 | 42.8 | 62.4
ICDAR 2015 Video | F | 23.2 | 36.1 | 59.5 | 38.4 | 54.6 | 44.8 | 64.6
ICDAR 2015 Video | O | 38.7 | 28.1 | 23.4 | 31.9 | 24.1 | 30.9 | 18.7
ICDAR 2015 Video | U | 36.4 | 35.8 | 17.1 | 29.7 | 21.3 | 24.3 | 18.1
YVT Video | R | 30.6 | 38.2 | 51.6 | 51 | 65.9 | 57.4 | 74.9
YVT Video | P | 29.7 | 41.6 | 52.4 | 49.1 | 60.3 | 56.3 | 73.4
YVT Video | F | 30.1 | 39.9 | 52.1 | 50 | 63.1 | 56.8 | 73.8
YVT Video | O | 32.7 | 32.6 | 24.3 | 29.3 | 24.5 | 24.8 | 16.2
YVT Video | U | 36.1 | 29.8 | 23.6 | 20.7 | 12.4 | 18.3 | 14.9
ICDAR 2013 Video | R | 28.9 | 37.4 | 52.6 | 54.2 | 66.8 | 54.5 | 71.2
ICDAR 2013 Video | P | 27.6 | 39.1 | 51.4 | 50.3 | 63.4 | 52.2 | 70.9
ICDAR 2013 Video | F | 28.1 | 38.5 | 51.9 | 52.2 | 65.1 | 53.3 | 71.1
ICDAR 2013 Video | O | 42.4 | 33.2 | 17.8 | 30.3 | 20.8 | 29.2 | 13.5
ICDAR 2013 Video | U | 30.6 | 28.9 | 29.4 | 17.4 | 14.1 | 17.4 | 18.4
ICDAR 2015 Scene | R | 31.4 | 42.7 | 61.3 | 56.7 | 58.3 | 54.1 | 71.3
ICDAR 2015 Scene | P | 30.5 | 41.6 | 57.4 | 52.2 | 52.8 | 48.7 | 69.4
ICDAR 2015 Scene | F | 30.9 | 42.1 | 59.3 | 54.7 | 55.5 | 51.4 | 70.7
ICDAR 2015 Scene | O | 35.7 | 28.9 | 22.3 | 29.1 | 31.9 | 27 | 14.2
ICDAR 2015 Scene | U | 32.9 | 28.4 | 16.4 | 16.4 | 12.5 | 21.6 | 16.8
ICDAR 2013 Scene | R | 32.6 | 40.7 | 61.1 | 59.3 | 54.3 | 53.8 | 76.8
ICDAR 2013 Scene | P | 32.5 | 46.5 | 52.2 | 52.6 | 53.7 | 47.2 | 72.3
ICDAR 2013 Scene | F | 32.5 | 43.4 | 56.6 | 55.9 | 54 | 50.5 | 74.5
ICDAR 2013 Scene | O | 37.1 | 29.7 | 21.3 | 23.6 | 25.3 | 25.4 | 16.3
ICDAR 2013 Scene | U | 32.4 | 26.9 | 22.0 | 20.4 | 20.7 | 24.1 | 9.1
SVT Scene | R | 21.4 | 38.6 | 61.3 | 43.4 | 50.6 | 47.9 | 64.7
SVT Scene | P | 20.4 | 31.4 | 57.4 | 39.2 | 45.8 | 42.7 | 61.9
SVT Scene | F | 20.9 | 35.0 | 59.3 | 41.3 | 48.2 | 45.3 | 63.3
SVT Scene | O | 41.3 | 22.9 | 22.3 | 31.6 | 27.5 | 30.8 | 12.6
SVT Scene | U | 37.2 | 42.1 | 16.4 | 27.1 | 24.3 | 23.9 | 24.1
MSRA-TD-500 | R | 22.4 | 26.7 | 42.1 | 30.7 | 38.4 | 34.3 | 59.3
MSRA-TD-500 | P | 23.6 | 24.4 | 32.1 | 28.4 | 35.7 | 29.4 | 57.3
MSRA-TD-500 | F | 22.7 | 25.6 | 37.1 | 29.5 | 37 | 31.8 | 58.6
MSRA-TD-500 | O | 32.4 | 41.2 | 29.3 | 42.6 | 36.6 | 39.9 | 22.7
MSRA-TD-500 | U | 46.9 | 35.7 | 31.8 | 27.8 | 26.3 | 28.2 | 28.9

Fig. 17. Qualitative results of the proposed technique for reconstruction and recognition on different license plate images.
Fig. 18. Overall performance of the proposed method on images affected by multiple adverse factors. Columns 1–5 denote input images with different causes, the results of partial reconstruction, the result of character segmentation, the result of full reconstruction, and recognition, respectively.

Table 5
Performance of the proposed and existing techniques for reconstruction on different license plate, video and natural scene datasets.

                         Tian et al. (2015a)       Proposed
Dataset                  RMSE   PSNR   MSSIM       RMSE   PSNR   MSSIM
MIMOS                    22.7   19.9   0.74        7.1    32.1   0.65
Medialab                 42.7   21.8   0.79        12.4   26.3   0.60
UCSD                     69.0   19.7   0.59        31.7   22.4   0.60
Uninsubria               72.4   8.4    0.52        26.3   23.8   0.40
ICDAR 2015 Video         63.5   11.7   0.61        19.7   23.9   0.63
YVT Video                55.3   16.1   0.68        16.3   24.5   0.67
ICDAR 2013 Video         63.6   11.7   0.61        18.4   24.0   0.60
ICDAR 2015 Scene         57.3   15.4   0.67        18.41  24.0   0.64
ICDAR 2013 Scene         57.3   15.4   0.67        16.2   24.6   0.65
SVT Scene                62.1   12.4   0.62        22.3   23.8   0.61
MSRA-TD 500 Scene        68.7   10.7   0.59        26.1   23.7   0.55
Only Challenged          39.2   12.5   0.57        15.6   20.4   0.48

reconstruction results for recognition, as our technique does but unlike the other existing methods, the proposed technique is compared only with Tian et al.'s method (Tian et al., 2015a). Table 5 shows that the proposed technique is better than the existing method in terms of the three quality measures across all three types of datasets. It is also observed from Table 5 that the proposed method performs almost the same on the challenging data as on the other datasets. The main reason for the poor results of the existing method is that it depends on gradient information, which gives good responses only for high-contrast images when reconstructing character images, while the proposed technique uses both gradient and intensity information for reconstruction to handle images affected by multiple factors.
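For reference, the RMSE and PSNR figures in Table 5 can be computed from a reconstructed image and its ground truth as follows. This is a minimal sketch of the standard definitions for 8-bit gray images (peak value 255), not the authors' evaluation code; MSSIM is omitted because it requires windowed statistics:

```python
import math

def rmse(gt, rec):
    """Root-mean-square error between two equally sized gray images,
    given here as flat lists of 8-bit intensities."""
    n = len(gt)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(gt, rec)) / n)

def psnr(gt, rec, peak=255.0):
    """Peak signal-to-noise ratio in dB; a lower RMSE yields a higher PSNR."""
    e = rmse(gt, rec)
    return float("inf") if e == 0 else 20 * math.log10(peak / e)

gt  = [120, 130, 140, 150]   # toy ground-truth patch
rec = [118, 133, 139, 155]   # toy reconstruction
print(round(rmse(gt, rec), 2), round(psnr(gt, rec), 1))  # -> 3.12 38.2
```

The inverse relationship between the two columns is visible in Table 5: the proposed method's lower RMSE values come with higher PSNR values.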
Quantitative recognition results of the proposed and existing techniques on the license plate, video and natural scene image datasets are reported in Table 6. These experiments include recognition results using Canny edge images of the input character images. To demonstrate that a Canny edge image alone, without reconstruction, is inadequate for achieving good results, we pass the Canny edge images to the OCR directly. This can be verified from the results reported in Table 6, where the recognition results with Canny images fall far below those of the proposed technique in terms of all three measures. This is because Canny is sensitive to blur and complex backgrounds. We can also observe from Table 6 that the proposed technique achieves better results than the other existing methods on the complex datasets, namely MIMOS, UCSD, YVT Video, SVT, MSRA and the challenging dataset. For the other datasets, the existing method of Silva and Jung (2018) achieves better results than all the methods, including the proposed one. This is justifiable because that method explores a powerful deep learning model for unconstrained license plate recognition. It is evident that the methods of Bulan et al. (2017) and Lin et al. (2018), which also explore deep learning models for license plate recognition, achieve better results than all the other existing methods, but both perform worse than the proposed method. However, the difference between the method of Silva and Jung (2018) and the proposed method is marginal. Besides, the results on the difficult data show that the proposed method is effective in tackling challenges, as it reports almost the same results as on the other datasets. Therefore, the proposed technique is robust and generic compared to the existing methods.
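The sensitivity of edge detection to blur, cited above as the reason Canny-based recognition lags, can be seen in a one-dimensional toy example (an illustration, not the paper's experiment): blurring a step edge spreads the same intensity jump over more pixels, so the per-pixel gradient magnitude that an edge detector thresholds drops sharply.

```python
def gradient_peak(signal):
    """Maximum absolute finite-difference gradient of a 1-D signal."""
    return max(abs(b - a) for a, b in zip(signal, signal[1:]))

def box_blur(signal, k):
    """Simple box filter of width k, truncated at the borders."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - k // 2), min(n, i + k // 2 + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

step = [0] * 8 + [255] * 8                       # a sharp character edge
print(gradient_peak(step))                       # -> 255
print(round(gradient_peak(box_blur(step, 5))))   # -> 51, a much weaker edge response
```

A gradient-only detector with a fixed threshold therefore misses blurred strokes, whereas the proposed reconstruction also draws on intensity information.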
The major weaknesses of the existing methods are as follows. Since the gradient used in Tian et al.'s method is suited to high-contrast images, it gives poor results for low-contrast images; Zhou et al.'s method is developed for high-contrast images with homogeneous backgrounds, and hence it gives poor results elsewhere; and Anagnostopoulos et al.'s method involves binarization and parameter tuning, which give poor results for images affected by multiple factors. Conversely, the proposed technique does not depend on binarization and explores the
Table 6
Performance of the proposed and existing techniques for recognition on different license plate, video and natural scene datasets (measures RR, RP and RF, all in %).

Methods (columns, left to right): Canny | Anagnostopoulos et al. (2006) | Zhou et al. (2013) | Tian et al. (2015a) | Bulan et al. (2017) | Silva and Jung (2018) | Lin et al. (2018) | Proposed

MIMOS
  RR  58.7  63.2  47.4  57.6  86.3  89.3  78.3  88.4
  RP  54.3  64.7  52.3  59.7  82.6  83.2  74.9  84.3
  RF  56.4  63.8  50.3  58.6  84.5  86.2  76.6  86.3

Medialab
  RR  59.3  64.7  52.4  61.2  83.7  86.4  75.6  82.3
  RP  52.4  66.9  56.8  62.7  75.3  82.3  71.9  79.3
  RF  55.3  65.7  54.6  61.6  79.5  84.3  73.7  81.3

UCSD
  RR  29.2  42.3  47.2  44.9  52.4  58.3  51.7  65.7
  RP  32.7  44.7  48.1  46.2  47.4  55.3  49.5  62.1
  RF  31.3  43.6  47.6  45.5  49.9  56.8  50.6  63.9

Uninsubria
  RR  62.4  65.3  68.3  64.3  76.4  78.4  77.1  78.7
  RP  66.7  68.7  69.4  69.4  72.4  75.3  77.4  80.3
  RF  64.8  66.9  68.8  67.1  74.4  76.8  77.2  79.5

ICDAR 2015 Video
  RR  66.2  68.9  71.8  72.6  83.4  86.4  84.3  78.6
  RP  61.3  75.7  72.3  72.7  81.3  80.3  78.9  73.4
  RF  63.7  72.7  72.1  72.6  82.3  83.3  81.6  76.2

YVT Video
  RR  72.4  72.9  66.9  71.4  83.4  85.9  84.9  78.3
  RP  65.3  77.8  70.3  74.8  79.2  81.4  78.8  82.6
  RF  68.4  75.8  68.7  72.9  81.3  83.6  81.8  80.5

ICDAR 2013 Video
  RR  68.2  78.7  71.3  74.9  81.6  83.2  79.5  83.7
  RP  61.3  79.3  68.9  71.3  80.4  81.5  78.4  84.2
  RF  65.7  78.5  69.8  72.8  81.0  82.3  78.9  83.5

ICDAR 2015 Scene
  RR  66.8  77.3  72.1  65.3  82.3  85.7  80.2  80.3
  RP  67.3  72.1  74.3  62.1  81.4  84.4  80.3  82.1
  RF  66.9  75.2  73.6  64.4  81.8  85.0  80.2  81.5

ICDAR 2013 Scene
  RR  59.3  72.3  71.3  65.6  83.1  86.1  81.4  78.3
  RP  56.3  72.4  68.7  64.3  81.5  84.3  80.4  73.2
  RF  58.6  72.3  70.1  64.9  82.3  85.2  80.9  75.8

SVT Scene
  RR  58.3  76.4  71.4  66.3  78.2  79.3  76.4  80.4
  RP  59.7  78.3  74.7  67.2  77.3  76.3  74.8  81.6
  RF  58.6  77.9  73.1  66.8  77.7  77.8  75.6  81.0

MSRA-TD-500 Scene
  RR  64.3  73.9  75.9  72.4  78.4  81.3  74.9  82.4
  RP  65.8  76.4  74.3  77.3  74.8  80.4  73.8  81.6
  RF  64.9  75.9  75.1  75.4  76.6  80.8  74.3  81.9

Only Challenged Images
  RR  58.7  51.9  47.4  57.6  54.8  57.6  55.9  62.9
  RP  54.3  52.3  52.3  59.7  51.7  56.6  51.4  65.7
  RF  56.5  52.1  49.8  58.6  53.2  57.1  53.6  64.3

combination of gradient and intensity for reconstruction through character segmentation, and it performs better than the existing methods, especially on the datasets whose images contain multiple challenging factors.

Overall, to show that the proposed method is robust to the multiple adverse factors mentioned in the Introduction and Proposed Methodology sections, we present sample results of each step on different images affected by low contrast, complex background, multiple fonts, multiple font sizes, blur and distortion due to perspective angle, as shown respectively in Fig. 18(a)–(f), which include the results of partial reconstruction, character segmentation, full reconstruction and recognition. One can assert from the results shown in Fig. 18 that the proposed method has significant benefits in handling multiple adverse factors. If a license plate image contains a logo or symbol, as shown in Fig. 18(c), the segmentation algorithm dissects the symbols as characters; however, when the result is sent to the OCR, recognition of the symbol fails, as shown in Fig. 18(c). As a result, the presence of symbols in license plate images does not affect the overall performance of the technique. It is evident from the results reported in Tables 3, 5 and 6 on challenging data that the proposed method performs almost the same as on the other data. It is noted that for the recognition experiments we use an OCR that is publicly available. This OCR has inherent limitations with respect to image size, font variation and orientation. As a result, despite the fact that the proposed method reconstructs character shapes, it fails to achieve a high accuracy of more than 90%.
Since our target is to address the above challenges and to develop a generalized method, we prefer to use available OCR approaches to demonstrate the effectiveness and usefulness of the proposed method rather than using language models, lexicons or learning models, because such methods restrict generality. Therefore, we believe the proposed work makes an important statement: there is a way to handle adverse factors such that one can use machine learning or deep learning, with the reconstructed results as input, to achieve higher accuracy than traditional OCR. Achieving high accuracy on the MIMOS dataset by exploring deep learning is our next target. To show the effectiveness of the proposed method on license plate images of different countries, we also test the steps of the proposed method on American license plate images, as shown in Fig. 19, where it is noted that the steps and the proposed method work well for American license plate images. This is the advantage of the steps proposed in this work, i.e. stroke width pair candidate detection, partial reconstruction, character segmentation, complete reconstruction and recognition. This shows that the proposed method is independent of scripts.
Fig. 19. Examples of the proposed stroke width pair candidate detection, reconstruction, segmentation and recognition approaches for American license plate images.

Fig. 20. Recognition rate of the proposed method for different scales to find the lower and upper boundaries for scaling up and down.

To test the effect of scaling on license plate recognition, we calculate the recognition rate of the proposed method for different scales, as shown in Fig. 20. If the image is too small (i.e. the size of the character image is 4 × 4), the proposed method reports poor results, as shown in Fig. 20. Such small sizes are rare in license plate recognition. For sizes greater than 16 × 16, the proposed method gives better results. This shows that different scales do not have much of an effect on the overall performance of the proposed method. Therefore, we can conclude that the proposed method is invariant to scaling. This is justifiable because the proposed features, which are based on stroke width distance, are invariant to scaling.
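The claimed scale invariance of stroke-width-based features can be illustrated with a toy sketch (not the authors' implementation): upscaling a binary character row by an integer factor multiplies the run-length stroke width by the same factor, so the stroke width normalized by image width is unchanged.

```python
def stroke_width(row):
    """Length of the longest run of foreground (1) pixels in a binary row."""
    best = cur = 0
    for p in row:
        cur = cur + 1 if p else 0
        best = max(best, cur)
    return best

def upscale(row, factor):
    """Nearest-neighbour horizontal upscaling of a binary row."""
    return [p for p in row for _ in range(factor)]

row = [0, 0, 1, 1, 1, 0, 0, 0]       # a stroke 3 pixels wide in an 8-pixel row
big = upscale(row, 4)                 # the same stroke at 4x scale
print(stroke_width(row) / len(row))   # -> 0.375
print(stroke_width(big) / len(big))   # -> 0.375, normalized stroke width is unchanged
```

Below roughly 16 × 16, however, a stroke occupies so few pixels that quantization destroys this ratio, which is consistent with the drop in recognition rate at 4 × 4 shown in Fig. 20.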
5. Conclusions and future work

We have proposed a novel technique for recognizing license plates, video and natural scene images through reconstruction. The proposed technique explores gradient and Laplacian symmetrical features based on stroke width distance to obtain partial reconstruction for segmenting characters. To segment characters affected by multiple factors such as low contrast, blur, complex backgrounds and illumination variations, we introduce angular information for the partial reconstruction results based on character structures, which successfully resolves under- and over-segmentation. For the segmented characters, the proposed technique explores symmetry features based on stroke width distance and tangent direction in the gray domain to restore the complete shapes of the partial reconstruction results. Comprehensive experiments are conducted on large datasets, which include license plates, video and natural scene images, to show that the proposed technique is robust and generic compared to existing methods. The same idea can be extended with the help of deep learning to images of different scripts from other countries, such as Indian, Russian, Arabic and European, to develop a generic system in the near future.

Acknowledgements

This work was supported by the Natural Science Foundation of China under Grant nos. 61672273 and 61832008, and the Science Foundation for Distinguished Young Scholars of Jiangsu under Grant BK20160021. This work is also partly supported by the University of Malaya under Grant no. UM.0000520/HRU.BK (BKS003-2018).

The authors would like to thank the anonymous reviewers and the Editor for their constructive comments and suggestions to improve the quality and clarity of this paper.

Conflict of Interest

None.

References

Abolghasemi, V., Ahmadyfard, A. (2009).
An edge-based color-aided method for license plate detection. Image and Vision Computing, 27(8), 1134–1142.
Al-Ghaili, A. M., Mashohor, S., Ramli, A. R., Ismail, A. (2013). Vertical-edge-based car-license-plate detection method. IEEE Transactions on Vehicular Technology, 62(1), 26–38.
Al-Shemarry, M. S., Li, Y., Abdulla, S. (2018). Ensemble of adaboost cascades of 3L-LBPs classifiers for license plates detection with low quality images. Expert Systems with Applications, 92, 216–235.
Anagnostopoulos, C. N. E., Anagnostopoulos, I. E., Loumos, V., Kayafas, E. (2006). A license plate-recognition algorithm for intelligent transportation system applications. IEEE Transactions on Intelligent Transportation Systems, 7(3), 377–392.
Azam, S., Islam, M. M. (2016). Automatic license plate detection in hazardous condition. Journal of Visual Communication and Image Representation, 36, 172–186.
Ben-Ami, I., Basha, T., Avidan, S. (2012). Racing bib numbers recognition. In Proceedings of the BMVC (pp. 1–10).
Bulan, O., Kozitsky, V., Ramesh, P., Shreve, M. (2017). Segmentation- and annotation-free license plate recognition with deep localization and failure identification. IEEE Transactions on Intelligent Transportation Systems, 18(9), 2351–2363.
Dhar, P., Guha, S., Biswas, T., Abedin, M. Z. (2018). A system design for license plate recognition by using edge detection and convolution neural network. In Proceedings of the IC4ME2 (pp. 1–4).
Dong, M., He, D., Luo, C., Liu, D., Zeng, W. (2017). A CNN-based approach for automatic license plate recognition in the wild. In Proceedings of the BMVC (pp. 1–12).
Du, S., Ibrahim, M., Shehata, M., Badawy, W. (2013). Automatic license plate recognition (ALPR): A state-of-the-art review. IEEE Transactions on Circuits and Systems for Video Technology, 23(2), 311–325.
Epshtein, B., Ofek, E., Wexler, Y. (2010). Detecting text in natural scenes with stroke width transform. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2963–2970). IEEE.
Goodfellow, I. J., Bulatov, Y., Ibarz, J., Arnoud, S., Shet, V. (2013). Multi-digit number recognition from street view imagery using deep convolutional neural networks. arXiv:1312.6082.
Gou, C., Wang, K., Yao, Y., Li, Z. (2016). Vehicle license plate recognition based on extremal regions and restricted Boltzmann machines. IEEE Transactions on Intelligent Transportation Systems, 17(4), 1096–1107.
Ingole, S. K., Gundre, S. B. (2017). Characters feature based Indian vehicle license plate detection and recognition. In Proceedings of the I2C2 (pp. 1–5).
Jaderberg, M., Simonyan, K., Vedaldi, A., Zisserman, A. (2016). Reading text in the wild with convolutional neural networks. International Journal of Computer Vision, 116(1), 1–20.
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., et al. (2015). ICDAR 2015 competition on robust reading. In Proceedings of the 13th international conference on document analysis and recognition (ICDAR) (pp. 1156–1160). IEEE.
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L. G., Mestre, S. R., et al. (2013). ICDAR 2013 robust reading competition. In Proceedings of the 12th international conference on document analysis and recognition (ICDAR) (pp. 1484–1493). IEEE.
Khare, V., Shivakumara, P., Raveendran, P., Meng, L. K., Woon, H. H. (2015). A new sharpness based approach for character segmentation in license plate images. In Proceedings of the 3rd IAPR Asian conference on pattern recognition (ACPR) (pp. 544–548). IEEE.
Kim, D., Song, T., Lee, Y., Ko, H. (2016). Effective character segmentation for license plate recognition under illumination changing environment. In Proceedings of the IEEE international conference on consumer electronics (ICCE) (pp. 532–533). IEEE.
Liang, G., Shivakumara, P., Lu, T., Tan, C. L. (2015). A new wavelet-Laplacian method for arbitrarily-oriented character segmentation in video text lines. In Proceedings of the 13th international conference on document analysis and recognition (ICDAR) (pp. 926–930). IEEE.
Lin, C. H., Lin, Y. S., Liu, W. C. (2018). An efficient license plate recognition system using convolution neural networks. In Proceedings of the ICASI (pp. 224–227).
Liu, W., Wang, Z., Liu, X., Zeng, N., Liu, Y., Alsaadi, F. E. (2017). A survey of deep neural network architectures and their applications. Neurocomputing, 234, 11–26.
Nguyen, P. X., Wang, K., Belongie, S. (2014). Video text detection and recognition: Dataset and benchmark. In Proceedings of the IEEE winter conference on applications of computer vision (WACV) (pp. 776–783). IEEE.
Peyrard, C., Baccouche, M., Mamalet, F., Garcia, C. (2015). ICDAR2015 competition on text image super-resolution. In Proceedings of the 13th international conference on document analysis and recognition (ICDAR) (pp. 1201–1205). IEEE.
Phan, T. Q., Shivakumara, P., Su, B., Tan, C. L. (2011). A gradient vector flow-based method for video character segmentation. In Proceedings of the international conference on document analysis and recognition (ICDAR) (pp. 1024–1028). IEEE.
Radchenko, A., Zarovsky, R., Kazymyr, V. (2017). Method of segmentation and recognition of Ukrainian license plates. In Proceedings of the YSF (pp. 62–65).
Raghunandan, K. S., Shivakumara, P., Jalab, H. A., Ibrahim, R. W., Kumar, G. H., Pal, U., et al. (2017). Riesz fractional based model for enhancing license plate detection and recognition. IEEE Transactions on Circuits and Systems for Video Technology, 28(9), 2276–2288.
Rathore, M. M., Ahmad, A., Paul, A., Rho, S. (2016). Urban planning and building smart cities based on the internet of things using big data analytics. Computer Networks, 101, 63–80.
Saha, S., Basu, S., Nasipuri, M. (2015). iLPR: An Indian license plate recognition system. Multimedia Tools and Applications, 74(23), 10621–10656.
Sedighi, A., Vafadust, M. (2011). A new and robust method for character segmentation and recognition in license plate images. Expert Systems with Applications, 38(11), 13497–13504.
Sharma, N., Shivakumara, P., Pal, U., Blumenstein, M., Tan, C. L. (2013). A new method for character segmentation from multi-oriented video words. In Proceedings of the 12th international conference on document analysis and recognition (ICDAR) (pp. 413–417). IEEE.
Shivakumara, P., Dutta, A., Tan, C. L., Pal, U. (2014). Multi-oriented scene text detection in video based on wavelet and angle projection boundary growing. Multimedia Tools and Applications, 72(1), 515–539.
Shivakumara, P., Phan, T. Q., Bhowmick, S., Tan, C. L., Pal, U. (2013). A novel ring radius transform for video character reconstruction. Pattern Recognition, 46(1), 131–140.
Shivakumara, P., Roy, S., Jalab, H. A., Ibrahim, R. W., Pal, U., Lu, T., et al. (2019). Fractional means based method for multi-oriented keyword spotting in video/scene/license plate images. Expert Systems with Applications, 118, 1–19.
Silva, S. M., Jung, C. R. (2018). License plate detection and recognition in unconstrained scenarios. In Proceedings of the ECCV (pp. 593–609).
Suresh, K. V., Kumar, G. M., Rajagopalan, A. N. (2007). Superresolution of license plates in real traffic videos. IEEE Transactions on Intelligent Transportation Systems, 8(2), 321–331.
Tadic, V., Popovic, M., Odry, P. (2016). Fuzzified Gabor filter for license plate detection. Engineering Applications of Artificial Intelligence, 48, 40–58.
Tesseract OCR software (2016). http://vision.ucsd.edu/belongie-grp/research/carRec/car_rec.html
Tian, J., Wang, R., Wang, G., Liu, J., Xia, Y. (2015a). A two-stage character segmentation method for Chinese license plate. Computers & Electrical Engineering, 46, 539–553.
Tian, S., Shivakumara, P., Phan, T. Q., Lu, T., Tan, C. L. (2015b). Character shape restoration system through medial axis points in video. Neurocomputing, 161, 183–198.
Wang, K., Belongie, S. (2010). Word spotting in the wild. In Proceedings of the European conference on computer vision (pp. 591–604). Springer.
Wang, Y., Shi, C., Xiao, B., Wang, C. (2015). MRF based text binarization in complex images using stroke feature. In Proceedings of the 13th international conference on document analysis and recognition (ICDAR) (pp. 821–825). IEEE.
Yang, Y., Li, D., Duan, Z. (2018). Chinese vehicle license plate recognition using kernel-based extreme learning machine with deep convolutional features. IET Intelligent Transport Systems, 12(3), 213–219.
Yao, C., Bai, X., Liu, W., Ma, Y., Tu, Z. (2012). Detecting texts of arbitrary orientations in natural images. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1083–1090). IEEE.
Ye, Q., Doermann, D. (2015). Text detection and recognition in imagery: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(7), 1480–1500.
Yu, S., Li, B., Zhang, Q., Liu, C., Meng, M. Q. H. (2015). A novel license plate location method based on wavelet transform and EMD analysis. Pattern Recognition, 48(1), 114–125.
Yuan, Y., Zou, W., Zhao, Y., Wang, X., Hu, X., Komodakis, N. (2017). A robust and efficient approach to license plate detection. IEEE Transactions on Image Processing, 26(3), 1102–1114.
Zamberletti, A., Gallo, I., Noce, L. (2015). Augmented text character proposals and convolutional neural networks for text spotting from scene images. In Proceedings of the 3rd IAPR Asian conference on pattern recognition (ACPR) (pp. 196–200). IEEE.
Zhou, W., Li, H., Lu, Y., Tian, Q. (2012). Principal visual word discovery for automatic license plate detection. IEEE Transactions on Image Processing, 21(9), 4269–4279.
Zhou, Y., Feild, J., Learned-Miller, E., Wang, R. (2013). Scene text segmentation via inverse rendering. In Proceedings of the 12th international conference on document analysis and recognition (ICDAR) (pp. 457–461). IEEE.