Skip Trie Matching: A Greedy Algorithm for Real-
Time OCR Error Correction on Smartphones
Vladimir Kulyukin
Department of Computer Science
Utah State University
Logan, UT, USA
vladimir.kulyukin@usu.edu
Aditya Vanka
Department of Computer Science
Utah State University
Logan, UT, USA
aditya.vanka@aggiemail.usu.edu
Haitao Wang
Department of Computer Science
Utah State University
Logan, UT, USA
haitao.wang@usu.edu
Abstract—Proactive nutrition management is considered by
many nutritionists and dieticians as a key factor in reducing
diabetes, cancer, and other illnesses caused by mismanaged diets.
As more individuals manage their daily activities with
smartphones, they start using their smartphones as diet
management tools. Unfortunately, while there are many vision-
based mobile applications to process barcodes, especially aligned
ones, there is a relative dearth of vision-based applications for
extracting useful nutrition information items such as nutrition
facts, caloric contents, and ingredients. In this article, we present
a greedy algorithm, called Skip Trie Matching (STM), for real time
optical character recognition (OCR) output error correction on
smartphones. The STM algorithm uses a dictionary of strings
stored in a trie data structure to correct OCR errors by skipping
misrecognized characters while driving down several paths in the
trie. The number of skipped characters is referred to as the skip
distance. The algorithm’s worst-case performance is  n , where
n is the length of the input string to spellcheck. The algorithm’s
performance is compared with Apache Lucene’s spell checker
[1], a state of the art spell checker where spell checking can be
done with the n-gram matching [2] or the Levenshtein edit
distance (LED) [3]. The input data for comparison tests are text
strings produced by the Tesseract OCR engine [4] on text image
segments of nutrition data automatically extracted by an Android
2.3.6 smartphone application from real-time video streams of
grocery product packages. The evaluation results indicate that,
while the STM algorithm is greedy in that it does not find all
possible corrections of a misspelled word, it gives higher recalls
than Lucene’s n-gram matching or LED. The average run time
of the STM algorithm is also lower than that of Lucene’s
implementations of both algorithms.
Keywords—vision-based nutrition information extraction,
nutrition management, spellchecking, optical character
recognition, mobile computing
I. Introduction
According to the U.S. Department of Agriculture, U.S.
residents have increased their caloric intake by 523 calories
per day since 1970. Mismanaged diets are estimated to
account for 30-35 percent of cancer cases [5]. Approximately
47,000,000 U.S. residents have metabolic syndrome and
diabetes. Diabetes in children appears to be closely related to
increasing obesity levels. Many nutritionists and dieticians
consider proactive nutrition management to be a key factor in
reducing and controlling diabetes, cancer, and other illnesses
related to or caused by mismanaged or inadequate diets.
Research and development efforts in public health and
preventive care have resulted in many systems for personal
nutrition and health management. Numerous web sites have
been developed to track caloric intake (e.g.,
http://nutritiondata.self.com), to determine caloric contents
and quantities in consumed food (e.g.,
http://www.calorieking.com), and to track food intake and
exercise (e.g., http://www.fitday.com).
Online health informatics portals have also been
developed. For example, theCarrot.com and Keas.com are
online portals where patients can input medical information,
track conditions via mobile devices, and create health
summaries to share with their doctors. Another online portal,
mypreventivecare.com, provides a space where registered
patients and doctors can collaborate around illness prevention.
HealtheMe (http://www.krminc.com/) is an open source,
personal health record (PHR) system at the U.S. Department
of Veterans Affairs focused on preventive care.
Unfortunately, while these existing systems provide many
useful features, they fail to address two critical barriers (CRs): CR01:
Lack of automated, real-time nutrition label analysis: nutrition
labels (NLs) have the potential to promote healthy diets.
Unfortunately, many nutrition label characteristics impede
timely detection and adequate comprehension [6], often
resulting in patients ignoring useful nutrition information;
CR02: Lack of automated, real-time nutrition intake
recording: Manual nutrition intake recording, the de facto
state of the art, is time-consuming and error-prone, especially
on smartphones.
As smartphones become universally adopted worldwide,
they can be used as proactive nutrition management tools. One
smartphone sensor that may help address both critical barriers
is the camera. Currently, smartphone cameras are used in
many mobile applications to process barcodes. There are free
public online barcode databases (e.g.,
http://www.upcdatabase.com/) that provide some product
descriptions and issuing countries’ names. Unfortunately,
although the quality of data in public barcode databases has
improved, the data remain sparse. Most product information is
provided by volunteers who are assumed to periodically
upload product details and associate them with product IDs.
Almost no nutritional information is available, and some of it
may not be reliable. Some applications (e.g.,
http://redlaser.com) provide some nutritional information for a
few popular products.
While there are many vision-based applications to process
barcodes, there continues to be a relative dearth of vision-
based applications for extracting other types of useful nutrition
information from product packages such as nutrition facts,
caloric contents, and ingredients. If successfully extracted,
such information can be converted into text or SQL via
scalable optical character recognition (OCR) methods and
submitted as queries to cloud-based sites and services.
In general, there are two broad approaches to improving
OCR quality: improved image processing and OCR engine
error correction. The first approach (e.g., [8], [9]) strives to
achieve better OCR results by improving image processing
algorithms. Unfortunately, this approach may not always be
feasible, especially on mobile off-the-shelf platforms due to
processing and networking constraints on the amount of real
time computation or the technical or financial impracticality of
switching to a different OCR engine. The second approach
treats the OCR engine as a black box and attempts to improve
its quality via automated error correction methods applied to
its output. This approach can work with multiple OCR
engines, because it does not modify the underlying image
processing methods.
This article contributes to the body of research on the
second approach. In particular, we present an algorithm, called
Skip Trie Matching (STM) after the trie data structure on
which it is based, for real time OCR output error correction on
smartphones. The algorithm uses a dictionary of strings stored
in a trie to correct OCR errors by skipping misrecognized
characters while going down specific trie paths. The number
of skipped characters, called the skip distance, is the only
variable input parameter of the algorithm.
The remainder of our article is organized as follows.
Section 2 presents related work. Section 3 discusses the
components of the vision-based nutrition information
extraction (NIE) module of the ShopMobile system [10, 11]
that run prior to the STM algorithm. The material in this
section is not the main focus of this article, and is presented
with the sole purpose of giving the reader the broader context
in which the STM algorithm has been developed and applied.
Section 4 details the STM algorithm and gives its asymptotic
analysis. In Section 5, the STM algorithm’s performance is
compared with Apache Lucene’s n-gram matching [2] and
Levenshtein edit distance (LED) [3]. Section 6 analyzes the
results and outlines several research venues for the future.
II. Related Work
Many current R&D efforts aim to utilize the power of mobile
computing to improve proactive nutrition management. In
[12], research is presented that shows how to design
mobile applications for supporting lifestyle changes among
individuals with Type 2 diabetes and how these changes were
perceived by a group of 12 patients during a 6-month period.
In [13], an application is presented that contains a picture-
based diabetes diary that records physical activity and photos
taken with the phone camera of eaten foods. The smartphone
is connected to a glucometer via Bluetooth to capture blood
glucose values. A web-based, password-secured and encrypted
SMS is provided to users to send messages to their care
providers to resolve daily problems and to send educational
messages to users.
The nutrition label (NL) localization algorithm outlined in
Section 3 is based on vertical and horizontal projections used
in many OCR applications. For example, in [9], projections
are used to detect and recognize Arabic characters. The text
chunking algorithm, also outlined in Section 3, builds on and
complements numerous mobile OCR projects that capitalize
on the ever increasing processing capabilities of smartphone
cameras. For example, in [14], a system is presented for
mobile OCR on mobile phones. In [15], an interactive system
is described for text recognition and translation.
The STM algorithm detailed in Section 4 contributes to a
large and rapidly growing body of research on automated spell
checking, an active research area of computational linguistics
since early 1960’s (see [16] for a comprehensive survey).
Spelling errors can be broadly classified as non-word errors
and real-word errors [17]. Non-word errors are character
sequences returned by OCR engines but not contained in the
spell checker’s dictionary. For example, ‘polassium’ is a non-
word error if the spell checker’s dictionary contains only
‘potassium.’ Real-word errors occur when recognized words
are spelled correctly but are inappropriate in the current context. For
example, if the OCR engine recognizes ‘nutrition facts’ as
‘nutrition fats,’ the string ‘fats’ is a real-word error in that it is
correctly spelled but inappropriate in the current context.
Real-word error correction is beyond the scope of this paper.
Many researchers consider this problem to be much more
difficult than non-word error recognition and to require natural
language processing and AI techniques [2].
Two well-known approaches that handle non-word errors
are n-grams and edit distances [2, 3]. The n-gram approach
breaks dictionary words into sub-sequences of characters of
length n, i.e., n-grams, where n is typically set to 1, 2, or 3. A
table is computed with the statistics of n-gram occurrences.
When a word is checked for spelling, its n-grams are
computed and, if an n-gram is not found, spelling correction is
applied. To find spelling suggestions, spelling correction
methods use various similarity criteria between the n-grams of
a dictionary word and the misspelled word.
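To make the n-gram approach concrete, the following minimal Python sketch (an illustration of the idea, not Lucene's implementation) ranks dictionary words by the number of bigrams they share with a misspelled word; the three-word dictionary is hypothetical:

def ngrams(word, n=2):
    # The set of character n-grams of a word (bigrams by default).
    return {word[i:i + n] for i in range(len(word) - n + 1)}

def ngram_suggest(misspelled, dictionary, n=2):
    # Rank dictionary words by the number of shared n-grams with the input.
    target = ngrams(misspelled, n)
    return max(dictionary, key=lambda w: len(target & ngrams(w, n)))

# 'polassium' shares the most bigrams with 'potassium'.
print(ngram_suggest('polassium', ['potassium', 'sodium', 'protein']))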
The term edit distance denotes the minimum number of
edit operations such as insertions, deletions, and substitutions
required to transform one string into another. Two popular edit
distances are Levenshtein [3] and Damerau-Levenshtein
distance [18]. The Levenshtein distance (LED) is a minimum
cost sequence of single-character replacement, deletion, and
insertion operations required to transform a source string into
a target string. The Damerau-Levenshtein distance (DLED)
extends the LED’s set of operations with the operation of
swapping adjacent characters, called transposition. The DLED
is not as widely used in OCR as the LED, because a major
source of transposition errors are typography errors whereas
most OCR errors are caused by misrecognized characters with
similar graphic features (e.g., ‘t’ misrecognized as ‘l’ or ‘u’ as
‘ll’). Another frequent OCR error type, which does not yield
to correction by transposition, is character omission from a
word when the OCR engine fails to recognize a character or merges it
with an adjacent character (e.g., the letter ‘i’ with a missing
dot is merged with the following letter ‘k’ when ‘k’ is placed
close to ‘i’).
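For reference, the LED described above can be computed with the classic dynamic-programming recurrence; the self-contained Python sketch below applies it to an OCR-style confusion in which ‘t’ is misread as ‘l’:

def levenshtein(s, t):
    # dist[i][j] is the edit distance between s[:i] and t[:j].
    m, n = len(s), len(t)
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i
    for j in range(n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution
    return dist[m][n]

print(levenshtein('potassium', 'polassium'))  # 1: one substitution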
III. NFT Localization & Text Segmentation
The objective of this section is to give the reader a better
appreciation of the broader context in which the STM
algorithm is applied by outlining the nutrition information
extraction (NIE) module that runs prior to the STM algorithm.
The NIE module is a module of the Persuasive NUTrition
System (PNUTS), a mobile vision-based nutrition
management system for smartphone users currently under
development at the Utah State University (USU) Computer
Science Assistive Technology Laboratory (CSATL) [10, 11,
19]. The system will enable smartphone users, both sighted
and visually impaired, to specify their dietary profiles securely
on the web or in the cloud. When they go shopping, they will
use their smartphones to extract nutrition information from
product packages with their smartphones’ cameras. The
extracted information is not limited to barcodes but also
includes nutrition facts such as calories, saturated fat, sugar
content, cholesterol, sodium, potassium, carbohydrates,
protein, and ingredients.
The development of PNUTS has been guided by the Fogg
Behavior Model (FBM) [23]. The FBM states that motivation
alone is insufficient to stimulate target behavior; a motivated
patient must have both the ability to execute the behavior and
a trigger to engage in that behavior at an appropriate place and
time. Many nutrition management system designers assume
that patients are either more skilled than they actually are, or
that they can be trained to have the required skills. Since
training can be difficult and time consuming, PNUTS’s
objective is to make target behaviors easier and more intuitive
to execute.
A. Vertical and Horizontal Projections
Images captured from the smartphone’s video stream are
divided into foreground and background pixels. Foreground
pixels are content-bearing units where content is defined in a
domain-dependent manner, e.g., black pixels, white pixels,
pixels with specific luminosity levels, specific neighborhood
connectivity patterns, etc. Background pixels are those that are
not foreground. Horizontal projection of an image (HP) is a
sequence of foreground pixel counts for each row in an image.
Vertical projection of an image (VP) is a sequence of
foreground pixel counts for each column in an image. Figure 1
shows horizontal and vertical projections of a black and white
image of three characters ‘ABC’.
Suppose there is an m x n image I where foreground pixels
are black, i.e., \(I(x, y) = 0\), and the background pixels are
white, i.e., \(I(x, y) = 255\). Then the horizontal projection of
row y and the vertical projection of column x can be defined as
\(f(y)\) and \(g(x)\), respectively:

\[ f(y) = \sum_{x=0}^{n-1} \left(255 - I(x, y)\right); \qquad g(x) = \sum_{y=0}^{m-1} \left(255 - I(x, y)\right). \qquad (1) \]
For the discussion that follows it is important to keep in
mind that the vertical projections are used for detecting the
vertical boundaries of NFTs while the horizontal projections
are used in computing the NFTs’ horizontal boundaries.
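The following is a minimal numpy sketch of equation (1), assuming a binarized image with black (0) foreground and white (255) background:

import numpy as np

def projections(img):
    # fg is 1 for foreground (black) pixels and 0 for background pixels.
    fg = (255 - img) // 255
    f = fg.sum(axis=1)   # f(y): foreground pixel count of each row
    g = fg.sum(axis=0)   # g(x): foreground pixel count of each column
    return f, g

# Tiny 3x4 example with one black row and one black column.
img = np.full((3, 4), 255, dtype=np.int32)
img[1, :] = 0
img[:, 2] = 0
f, g = projections(img)
print(f.tolist(), g.tolist())  # [1, 4, 1] [1, 1, 3, 1]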
B. Horizontal Line Filtering
In detecting the boundaries of a nutrition fact table (NFT), three assumptions are made: 1) an
NFT is present in the image; 2) the NFT is not cropped; and 3)
the NFT is horizontally or vertically aligned. Figure 2 shows
horizontally and vertically aligned NFTs. The detection of
NFT boundaries proceeds in three stages. Firstly, the first
approximation of vertical boundaries is computed. Secondly,
the vertical boundaries are extended left and right. Thirdly, the
upper and lower horizontal boundaries are computed.
The objective of the first stage is to detect the approximate
location of the NFT along the horizontal axis, \((x'_s, x'_e)\). This
approximation starts with the detection of horizontal lines in
the image, which is accomplished with a horizontal line
detection kernel (HLDK) described in our previous
publications [19]. It should be noted that other line detection
techniques (e.g., Hough transform [20]) can be used for this
purpose. Our HLDK is designed to detect large horizontal
lines in images to maximize computational efficiency. On
rotated images, the kernel is used to detect vertical lines. The
left image of Figure 3 gives the output of running the HLDK
filter on the left image of Figure 2.
Figure 1. Horizontal & Vertical Projections.
C. Detection of Vertical Boundaries
Let HLFI be a horizontally line filtered image, i.e., the image
put through the HLDK filter or some other line detection filter.
Let \(VP(HLFI)\) be the vertical projection of white pixels in
each column of HLFI. The right image in Figure 3 shows the
vertical projection of the HLFI on the left. Let \(\theta_{VP}\) be a
threshold, which in our application is set to the mean count of
the white foreground pixels in columns. In Figure 3 (right),
\(\theta_{VP}\) is shown by a red line. The foreground pixel counts in the
columns of the image region with the NFT are greater than the
threshold. The NFT vertical boundaries are then computed as:

\[ x_l = \min\{\, c \mid g(c) \ge \theta_{VP} \,\}; \qquad x_r = \max\{\, c \mid g(c) \ge \theta_{VP} \wedge x_l < c \,\}. \qquad (2) \]
The pairs of the left and right boundaries detected by (2) may
be too close to each other, where ‘too close’ is defined as the
percentage of the image width covered by the distance
between the right and left boundaries. If the boundaries are
found to be too close to each other, the left boundary is
extended left of the current left boundary, for which the
projection is at or above the threshold, whereas the right
boundary is extended to the first column right of the current
right boundary, for which the vertical projection is at or above
the threshold. Figure 4 (left) shows the initial vertical
boundaries extended left and right.
D. Detection of Horizontal Boundaries
The NFT horizontal boundary computation is confined to the
image region vertically bounded by \((x_l, x_r)\). Let \(HP(HLFI)\)
be the horizontal projection of the HLFI in Figure 3 (left) and
let \(\theta_{HP}\) be a threshold, which in our application is set to the
mean count of the foreground pixels in rows, i.e.,
\(\theta_{HP} = \operatorname{mean}\{\, f(y) \mid f(y) > 0 \,\}\). Figure 4 (right) shows the
horizontal projection of the HLFI in Figure 3 (left). The red
line shows \(\theta_{HP}\).
The NFT’s horizontal boundaries are computed in a
manner similar to the computation of its vertical boundaries
with one exception: they are not extended after the first
approximation is computed, because the horizontal boundaries
do not have as much impact on subsequent OCR of segmented
text chunks as vertical boundaries. The horizontal boundaries
are computed as:

\[ r_u = \min\{\, r \mid f(r) \ge \theta_{HP} \,\}; \qquad r_l = \max\{\, r \mid f(r) \ge \theta_{HP} \wedge r_u < r \,\}. \qquad (3) \]
Figure 5 (left) shows the nutrition table localized via
vertical and horizontal projections and segmented from the
image in Figure 2 (left).
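A Python sketch of the boundary computations in equations (2) and (3), applied to precomputed projections f and g of the horizontally line filtered image, may help; it uses the mean-count thresholds described above and omits the left/right extension step of Section III.C:

import numpy as np

def nft_vertical_boundaries(g):
    # Equation (2): the first and last columns whose vertical projection
    # is at or above the threshold theta_VP (the mean column count).
    theta_vp = g.mean()
    cols = np.nonzero(g >= theta_vp)[0]
    return cols.min(), cols.max()

def nft_horizontal_boundaries(f):
    # Equation (3): the same computation over the row projection f,
    # with theta_HP set to the mean of the positive row counts.
    theta_hp = f[f > 0].mean()
    rows = np.nonzero(f >= theta_hp)[0]
    return rows.min(), rows.max()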
E. Text Chunking
A typical NFT includes text chunks with various caloric and
ingredient information, e.g., “Total Fat 2g 3%.” As can be
seen in Figure 5 (left), text chunks are separated by black
colored separators. These text chunks are segmented from
localized NFTs. This segmentation is referred to as text
chunking.
Figure 2. Vertically & Horizonally Aligned Tables.
Figure 4. VB Extension (left); HP of Left HLFI in
Fig. 2 (right).
Figure 3. HLFI of Fig. 2 (left); its VP (right).
Text chunking starts with the detection of the separator
lines. Let N be a binarized image with a segmented NFT and
let \(p(i)\) denote the probability of image row i containing a
black separator. If such probabilities are reliably computed,
text chunks can be localized. Toward that end, let \(l_j\) be the
length of the j-th consecutive run of black pixels in row i
above a length threshold \(\theta_l\). If m is the total number of such
runs, then \(p(i)\) is computed as the geometric mean of
\(l_0, l_1, \ldots, l_m\). The geometric mean is more indicative of the
central tendency of a set of numbers than the arithmetic mean.
If \(\theta\) is the mean value of all positive values of \(p(i)\) normalized by
the maximum value of \(p(i)\) for the entire image, the start and
end coordinates, \(y_s\) and \(y_e\), respectively, of every separator
along the y axis can be computed by detecting consecutive
rows for which the normalized values are above the threshold:

\[ y_s \in \{\, i \mid p(i) \ge \theta \wedge p(i-1) < \theta \,\}; \qquad y_e \in \{\, j \mid p(j) \ge \theta \wedge p(j+1) < \theta \,\}. \qquad (4) \]
Once these coordinates are identified, the text chunks can
be segmented from the image. As can be seen from Figure 5
(right), some text chunks contain single text lines while others
have multiple text lines. The actual OCR in the ShopMobile
system [19] takes place on the text chunk images such as
shown in Figure 5 (right).
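A Python sketch of this separator detection follows; the run-length threshold min_len stands in for \(\theta_l\), and its default value is hypothetical:

import numpy as np

def black_runs(row, min_len):
    # Lengths of the consecutive runs of black (0) pixels in a row
    # that exceed the length threshold.
    runs, count = [], 0
    for px in row:
        if px == 0:
            count += 1
        else:
            if count > min_len:
                runs.append(count)
            count = 0
    if count > min_len:
        runs.append(count)
    return runs

def separator_scores(img, min_len=20):
    # p(i): geometric mean of the qualifying run lengths in row i,
    # normalized by the maximum over the entire image.
    p = np.zeros(img.shape[0])
    for i, row in enumerate(img):
        runs = black_runs(row, min_len)
        if runs:
            p[i] = np.exp(np.mean(np.log(runs)))  # geometric mean
    return p / p.max() if p.max() > 0 else p

def separator_spans(p):
    # Equation (4): maximal runs of rows whose normalized score is at
    # or above theta, the mean of the positive normalized scores.
    theta = p[p > 0].mean()
    above = p >= theta
    starts = [i for i in range(len(p)) if above[i] and (i == 0 or not above[i - 1])]
    ends = [i for i in range(len(p)) if above[i] and (i == len(p) - 1 or not above[i + 1])]
    return list(zip(starts, ends))  # (y_s, y_e) of every separator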
IV. Skip Trie Matching
The trie data structure has gained popularity on mobile
platforms due to its space efficiency compared to the standard
hash table and its efficient worst-case lookup time of \(O(n)\),
where n is the length of the input string, not the number
of entries in the data structure. This performance compares
favorably to the hash table that spends the same amount of
time on computing the hash code but requires significantly
more storage space.
On many mobile platforms the trie has been used for word
completion. The STM algorithm is based on an observation
that the trie’s efficient storage of strings can be used in finding
closest dictionary matches to misspelled words. The only
parameter that controls the algorithm’s behavior is the skip
distance, a non-negative integer that defines the maximum
number of misrecognized (misspelled) characters allowed in a
misspelled word. In the current implementation, misspelled
words are produced by the Tesseract OCR engine. However,
the algorithm generalizes to other domains where spelling
errors must be corrected fast.
The STM algorithm uses the trie to represent the target
dictionary. For example, consider a trie dictionary in Figure 6
that (moving left to right) consists of ‘ABOUT,’ ‘ACID,’
‘ACORN,’ ‘BAA,’ ‘BAB,’ ‘BAG,’ ‘BE,’ ‘OIL,’ and ‘ZINC.’
The small balloons at character nodes are Boolean flags that
signal word ends. When a node’s word end flag is true, the
path from the root to the node is a word. The children of each
node are lexicographically sorted so that finding a child
character of a node is \(O(\log n)\), where n is the number of the
node’s children.
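A minimal Python sketch of such a trie, with lexicographically sorted children, binary search over the fan-out, and the word-end flag, built over the dictionary of Figure 6:

import bisect

class TrieNode:
    def __init__(self, char=''):
        self.char = char
        self.word_end = False  # the word-end 'balloon' in Figure 6
        self.keys = []         # children's characters, kept sorted
        self.children = []     # child nodes, parallel to keys

    def find_child(self, ch):
        # Binary search over the sorted keys: O(log n) in the fan-out n.
        i = bisect.bisect_left(self.keys, ch)
        return self.children[i] if i < len(self.keys) and self.keys[i] == ch else None

def insert(root, word):
    # Inserting (and looking up) a length-n word costs O(n log sigma).
    node = root
    for ch in word:
        child = node.find_child(ch)
        if child is None:
            child = TrieNode(ch)
            i = bisect.bisect_left(node.keys, ch)
            node.keys.insert(i, ch)
            node.children.insert(i, child)
        node = child
    node.word_end = True

root = TrieNode()
for w in ['ABOUT', 'ACID', 'ACORN', 'BAA', 'BAB', 'BAG', 'BE', 'OIL', 'ZINC']:
    insert(root, w)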
Suppose that the skip distance is set to 1 and the OCR engine
misrecognizes ‘ACID’ as ‘ACIR.’ The STM starts at the root
node, as shown in Figure 7. For each child of the root, the
algorithm checks if the first character of the input string
matches any of the root’s children. If no match is found and
the skip distance > 0, the skip distance is decremented by 1
and the recursive calls are made for each of the root’s
children. In this case, ‘A’ in the input matches the root’s ‘A’
child. Since the match is successful, a recursive call is made
on the remainder of the input ‘CIR’ and the root node’s ‘A’
child at Level 1 as the current node, as shown in Figure 8.
The algorithm next matches ‘C’ of the truncated input
‘CIR’ with the right child of the current node ‘A’ at Level 1,
truncates the input to ‘IR,’ and recurses to the node ‘C’ at
Level 2, i.e., the right child of the node ‘A’ at Level 1. The
skip distance is still 1, because no mismatched characters have
been skipped so far.
Figure 5. Localized NL (left); NL Text Chunks (right).
Figure 6. Simple Trie Dictionary.
Figure 10. End Position Errors along X-axis.
At the node ‘C’ at Level 2, the character ‘I’ of the
truncated input ‘IR’ is matched with the left child of the node
‘C,’ and the input is truncated to ‘R.’ The
algorithm recurses to the node ‘I’ at Level 3, as shown in
Figure 9. The last character of the input string, ‘R,’ is next
matched with the children of the current node ‘I’ at Level 3
(see Figure 10). The binary search on the node’s children fails.
However, since the skip distance is 1, i.e., one more character
can still be skipped, the skip distance is decremented by 1.
Since there are no more characters in the input string after the
mismatched ‘R,’ the algorithm checks whether any child of the
current node has its word end flag set to true. In this case, the
word end flag at the node ‘D’ at Level 4 is set to true. Thus,
the matched word, ‘ACID,’ is added to the returned list of
suggestions. When the skip distance is 0 and the current node
is not a word end, the algorithm fails, i.e., the current path is
not added to the returned list of spelling suggestions.
Figure 11 gives the pseudocode of the STM algorithm. The
dot operator is used with the semantics common to many OOP
languages to denote access to an object’s member variables. The
colon after an IF condition signals the start of the
THEN clause for that condition. Python-like indentation is
used to denote scope. Line 1 computes the length of the
possibly misspelled input string object stored in the variable
inputStr. Line 2 initializes an array of string objects called
suggestionList. When the algorithm finishes, suggestionList
contains all found corrections of inputStr.
The first call to the STM function occurs on Line 3. The
first parameter is inputStr; the second parameter d is a skip
distance; the third parameter curNode is a trie node object
initially referencing the trie’s root; the fourth parameter
suggestion is a string that holds the sequence of characters on
the path from the trie’s root to curNode. Found suggestions are
added to suggestionList on Line 10.
Let len(inputStr) = n and let d be the skip distance. The largest branching
factor of a trie node is \(\Sigma\), i.e., the size of the alphabet over
which the trie is built. If inputStr is in the trie, the binary
search on line 14 runs exactly once for each of the n
characters, which gives us \(O(n \log \Sigma)\). If inputStr is not in the
trie, it is allowed to contain at most d character mismatches.
Figure 8. Recursive Call from Node 'A' at Level 1.
Figure 7. STM of 'ACIR' with Skip Distance of 1.
Figure 9. Recursive call at node 'I' at Level 3.
1. inputLength = len(inputStr)
2. suggestionList = [ ]
3. STM(inputStr, d, curNode, suggestion):
4.   IF len(inputStr) == 0 || curNode == NULL: fail
5.   IF len(inputStr) == 1:
6.     IF inputStr[0] == curNode.char || d > 0:
7.       add curNode.char to suggestion
8.       IF len(suggestion) == inputLength &&
9.          curNode.wordEnd == True:
10.         add suggestion to suggestionList
11.  ELSE IF len(inputStr) > 1:
12.    IF inputStr[0] == curNode.char:
13.      add curNode.char to suggestion
14.      nextNode = binSearch(inputStr[1], curNode.children)
15.      IF nextNode != NULL:
16.        STM(rest(inputStr), d, nextNode, suggestion)
17.      ELSE IF d > 0:
18.        FOR each node C in curNode.children:
19.          add C.char to suggestion
20.          STM(rest(inputStr), d-1, C, suggestion)
21.      ELSE fail
Figure 11. STM Algorithm.
Thus, there are  dn  matches and d mismatches. All matches
run in   .log  dnO In the worst case, for each mismatch,
lines 18-19 ensure that
d
 nodes are inspected, which gives
us the run time of      loglog
dd
nOdnO . The
worst case rarely occurs in practice because in a trie built for a
natural language most nodes have branching factors equal to a
small fraction of the constant  . Since Nd  is a small non-
negative constant,    .log nOnO
d
 Since
d
 and log are small constants for the natural languages
with written scripts, the STM algorithm is asymptotically
expected to run faster than the quadratic edit distance
algorithms and to be on par with n-gram algorithms.
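To make the algorithm concrete, here is a compact Python rendering of Figure 11's logic; it is one reading of the pseudocode rather than the authors' exact code, reuses the TrieNode and insert sketch given earlier in this section, and reproduces the ‘ACIR’ to ‘ACID’ trace with a skip distance of 1:

def stm(input_str, d, node, prefix, out):
    # node.char is matched against input_str[0]; prefix holds the
    # characters already committed on the path above node.
    if not input_str or node is None:
        return
    if input_str[0] != node.char:
        if d == 0:
            return       # out of skip distance: fail this path
        d -= 1           # spend one skip on the mismatched character
    word = prefix + node.char
    if len(input_str) == 1:
        if node.word_end:
            out.append(word)  # suggestion has the input's exact length
        return
    nxt = node.find_child(input_str[1])
    if nxt is not None:
        # Greedy step: follow only the child matching the next character.
        stm(input_str[1:], d, nxt, word, out)
    elif d > 0:
        # No matching child: try every child; the recursive call
        # charges the skip for the mismatch itself.
        for child in node.children:
            stm(input_str[1:], d, child, word, out)

def correct(root, word, d=1):
    out = []
    for child in root.children:
        stm(word, d, child, '', out)
    return out

print(correct(root, 'ACIR', d=1))  # ['ACID'] on the trie built above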
Two current limitations of the STM algorithm, as is evident
from Figure 11, are 1) that it finds only spelling suggestions that
are of the same length as inputStr (Line 8) and 2) that it does
not find all possible corrections of a misspelled word because
of its greedy selection of the next node in the trie (Lines 14-16).
Both limitations were deliberately built into the algorithm
to increase its run-time performance. In Section VI, we
discuss how these limitations can be addressed and possibly
overcome. In particular, the second limitation can be
overcome by having the algorithm perform a breadth-first
search of the current node’s children instead of selecting only
the one child that matches the current character. While this
modification will not change the asymptotic performance, it
will, in practice, increase the running time.
V. Experiments
A. Tesseract vs. GOCR
Our first experiment was to choose an open source OCR
engine to test the STM algorithm. We chose to compare the
relative performance of Tesseract with GOCR [22], another
open source OCR engine developed under the GNU public
license. In Tesseract, OCR consists of segmentation and
recognition. During segmentation, a connected component
analysis first identifies blobs and segments them into text
lines, which are analyzed for proportional or fixed pitch text.
The lines are then broken into individual words via spacing
between individual characters. Finally, individual character
cells are identified. During recognition, an adaptive classifier
recognizes both word blobs and character cells. The classifier
is adaptive in that it can be trained on various text corpora.
GOCR preprocesses images via box-detection, zoning, and
line detection. OCR is done on boxes, zones, and lines via
pixel pattern analysis.
The available funding for this project did not allow us to
integrate a commercial OCR engine into our application. For
the sake of completeness, we should mention two OCR
engines that have become popular in the OCR community.
ABBYY [24] offers a powerful and compact mobile OCR
engine which can be integrated into various mobile
platforms such as iOS, Android, and Windows.
Another company called WINTONE [25] claims that its
mobile OCR software can achieve recognition accuracy to the
tune of 95 percent for English documents.
The experimental comparison of the two open source
engines was guided by speed and accuracy. An Android 2.3.6
application was developed and tested on two hundred images
of NL text chunks, some of which are shown in Figure 5
(right).
After the application starts, it first creates the necessary
temporary directories and output files and ensures the
presence of the necessary training files for Tesseract on the
sdcard. Both the modified Otsu’s method and the modified
Niblack’s method of binarization [26] are applied to each
image read from the input image directory on the sdcard.
These binarized images are stored in the binary images
directory. All three images (the input image and the two
binarized images) are OCRed either on the device
itself or on the server. The recognized text is then
subjected to spelling correction. Finally, the processed text is
saved to a text file.
Each image was processed with both Tesseract and GOCR,
and the processing times were logged for each text chunk. The
images were read from the smartphone’s sdcard one by one.
The image read time was not integrated into the run time total.
The Android application was designed and developed to
operate in two modes: device and server. In the device mode,
everything was computed on the smartphone. In the server
mode, the HTTP protocol was used to send the images over
Wi-Fi from the device to an Apache web server running on
Ubuntu 12.04. Images sent from the Android application were
handled by a PHP script that ran one of the OCR engines on
the server, captured the extracted text and sent it back to the
application. Returned text messages were saved on the
smartphone’s sdcard and later categorized by a human judge
into three categories: complete, partial, and garbled.
The three categories were meant to estimate the degree of
recognition. Complete recognition denoted images where the
text recognized with OCR was identical to the text in the
image. For partial recognition, at least one character in the
returned text had to be missing or inaccurately substituted. For
garbled recognition, either empty text was returned or all
characters in the returned text were misrecognized. Table 1
gives the results of the OCR engine text recognition
comparison. The abbreviations TD, GD, TS, and GS stand for
‘Tesseract Device,’ ‘GOCR Device,’ ‘Tesseract Server,’ and
‘GOCR Server,’ respectively. Each cell contains the exact
number of images out of 200 in a specific category.
Table 1. Tesseract vs. GOCR (number of images out of 200).
Complete Partial Garbled
TD 146 36 18
GD 42 23 135
TS 158 23 19
GS 58 56 90
The counts in Table 1 indicate that the recognition
rates of Tesseract are higher than those of GOCR. Tesseract
also compared favorably with GOCR in the garbled category,
where its numbers were lower than those of GOCR.
To evaluate the runtime performance of both engines, we
ran the application in both modes on the same sample of 200
images five times. Table 2 tabulates the rounded processing
times and averages in seconds. The numbers 1 through 5 are
the run numbers. The AVG column contains the average time
of the five runs. The AVG/Img column contains the average
time per individual image.
Table 2. Run Times (in secs) of Tesseract & GOCR.
1 2 3 4 5 AVG AVG/Img
TD 128 101 101 110 103 110 0.5
GD 50 47 49 52 48 49 0.3
TS 40 38 38 10 39 38 0.2
GS 21 21 20 21 21 21 0.1
Table 2 shows no significant variation among the
processing times of individual runs. The AVG column
indicates that Tesseract was slower than GOCR on the sample
of images. The difference in run times can be attributed to the
amount of text recognized by each engine. Since GOCR
extracts less information than Tesseract, as indicated in Table
1, its running times are faster. Tesseract, on the other hand,
extracts much more accurate information from most images,
which causes it to take more time. While the combined run
times of Tesseract were slower than those of GOCR, the
average run time per frame, shown in the AVG/Img column,
were still under one second. Tessaract’s higher recognition
rates swayed our final decision in its favor. Additionally, as
Table 2 shows, when the OCR was done on the server,
Tesseract’s run times, while still slower than GOCR, were
faster, which made it more acceptable to us. This performance
can be further optimized through server-side servlets and
faster image transmission channels (e.g., 4G instead of Wi-Fi).
After we performed these experiments, we discovered that we
could get even better accuracy from the Tesseract OCR engine
through the character whitelist and blacklist options.
Whitelisting characters will limit the set of characters for
which Tesseract looks. Blacklisting, on the other hand, instructs
Tesseract not to look for certain characters in the image. A list
of frequent NL characters was manually prepared and used for
whitelisting after the experiments. Further evaluation is
necessary to verify the conjecture that whitelisting improves
accuracy.
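As an illustration, a whitelist can be passed to Tesseract through its tessedit_char_whitelist config variable (tessedit_char_blacklist is the blacklisting counterpart); the sketch below uses the pytesseract wrapper, and the character set and file name are hypothetical:

import pytesseract
from PIL import Image

# A hypothetical whitelist of frequent NL characters.
NL_WHITELIST = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ%().,:/'

text = pytesseract.image_to_string(
    Image.open('nl_text_chunk.png'),
    config='-c tessedit_char_whitelist=' + NL_WHITELIST)
print(text)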
B. STM vs. Apache Lucene’s N-Grams and LED
After Tesseract was chosen as the base OCR engine, the
performance of the STM algorithm was compared with the
Apache Lucene implementation of n-gram matching and the
LED. Each of the three algorithms worked on the text strings
produced by the Tesseract OCR running on Android 2.3.6 and
Android 4.2 smartphones on a collection of 600 text segment
images produced by the algorithm described in Section III.
Figure 12 shows the exact control flow of the application used
in the tests.
Android applications cannot execute long-running
operations on the UI Thread. The application force-closes if an
operation takes more than 5 seconds while running on the UI
Thread. Since our application has some long-running
operations, such as OCR processing and network
communication, we have used the Pipeline Thread model. The
Pipeline Thread holds a queue of some units of work that can
be executed. Other threads can push new tasks into the
Pipeline Thread’s queue as new tasks become available. The
Pipeline Thread processes the tasks in the queue one after
another. If there are no tasks in the queue, it blocks until a new
task is added to the queue. On the Android platform, this
design pattern can be easily implemented using Loopers
and Handlers. A Looper class makes a thread into a pipeline
Figure 12. Application Flowchart.
thread and a Handler makes it easier to push tasks into a
Looper from other threads.
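A language-neutral sketch of the Pipeline Thread pattern in Python may clarify the design (the application itself uses Android's Looper and Handler classes):

import queue
import threading

class PipelineThread(threading.Thread):
    # A single worker thread that drains a task queue: other threads
    # post work; tasks run one after another; get() blocks when idle.
    def __init__(self):
        super().__init__(daemon=True)
        self.tasks = queue.Queue()

    def post(self, task):          # analogous to Handler.post(Runnable)
        self.tasks.put(task)

    def run(self):                 # analogous to Looper.loop()
        while True:
            task = self.tasks.get()
            task()
            self.tasks.task_done()

pipeline = PipelineThread()
pipeline.start()
pipeline.post(lambda: print('OCR task'))
pipeline.post(lambda: print('spelling correction task'))
pipeline.tasks.join()              # wait for the queue to drain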
The Lucene n-gram and edit distance matching algorithms
find possible correct spellings of misspelled words. The n-gram
distance algorithm divides a misspelled word into chunks
of size n (n defaults to 2) and compares them to a dictionary of
n-grams, each of which points to the words in which they are
found. A word with the largest number of matched n-grams is
a possible spelling of a misspelled word. If m is the number of
n-grams found in a misspelled word and K is the number of n-grams
in a dictionary, the time complexity of the n-gram
matching is \(O(Km)\). Since the standard LED uses dynamic
programming, its time complexity is \(O(n^2)\), where n is the
maximum of the lengths of the two strings being matched. If there
are m entries in a dictionary, the run time of the LED
algorithm is \(O(mn^2)\).
In comparing the performance of the STM algorithm with
the n-gram matching and the LED, time and accuracy were
used as performance metrics. The average run time taken by
the STM algorithm, the n-gram matching, and the LED
distance are 20 ms, 50 ms, and 51 ms, respectively. The
performance of each algorithm was evaluated on non-word
error correction with the recall measure computed as the ratio
of corrected and misspelled words. The recall coefficients of
the STM, n-gram matching, and the LED were 0.15, 0.085,
and 0.078, respectively. Table 3 gives the run-time and the
corresponding recalls.
Table 3. Performance Comparison.
STM N-Gram LED
Run Time 25ms 51ms 51ms
Recall 15% 9% 8%
VI. Discussion
In this paper, we presented an algorithm, called Skip Trie
Matching (STM), for real time OCR output error correction on
smartphones and compared its performance with the n-gram
matching and the LED. The experiments on the sample of over
600 texts extracted by the Tesseract OCR engine from the NL
text images show that the STM algorithm ran faster, which is
predicted by the asymptotic analysis, and corrected more
words than the n-gram matching and the LED.
One current limitation of the STM algorithm is that it can
correct a misspelled word so long as there is a target word in
the trie dictionary of the same length. One possible way to
address this limitation is to relax the exact length comparison
on Line 8 of the STM algorithm given in Figure 11. This will
increase the recall but will also increase the run time. Another
approach, which may be more promising in the long run, is
example-driven human-assisted learning [2]. Each OCR
engine working in a given domain is likely to misrecognize
the same characters consistently. For example, we have
noticed that Tesseract consistently misrecognized ‘u’ as ‘ll’ or
‘b’ as ‘8.’ Such examples, if consistently classified by human
judges as misrecognition examples, can be added by the
system to a dictionary of regular misspellings. In a
subsequent run, when the Tesseract OCR engine
misrecognizes ‘potassium’ as ‘potassillm,’ the STM
algorithm, when failing to find a possible suggestion on the
original input, will replace all regularly misrecognized character
sequences in the input with their corrections. Thus, ‘ll’ in ‘potassillm’
will be replaced with ‘u’ to obtain ‘potassium’ and get a
successful match.
Another limitation of the STM algorithm is that it does not
find all possible corrected spellings of a misspelled word. To
make it find all possible corrections, the algorithm can be
forced to examine all nodes even when it finds a successful
match on line 12. Although this will likely take a toll on the
algorithm’s run time, the worst case asymptotic run time,
\(O(\Sigma^d n \log\Sigma) = O(n)\), remains the same.
Another contribution of the research presented in
this article is an experimental comparison of Tesseract and
GOCR, two popular open source OCR engines, for vision-
based nutrition information extraction. GOCR appears to
extract less information than Tesseract but has faster run
times. While the run times of Tesseract were slower, its higher
recognition rates swayed our final decision in its favor. The
run-time performance, as indicated in Tables 1 and 2, can be
further optimized through server-side servlets and faster image
transmission channels (e.g., 4G instead of Wi-Fi).
Our ultimate objective is to build a mobile system that can
not only extract nutrition information from product packages
but also match the extracted information to the users’
dietary profiles and make dietary recommendations to effect
behavior changes. For example, if a user is pre-diabetic, the
system will estimate the amount of sugar from the extracted
ingredients and will make specific recommendations to the
user. The system, if the users so choose, will keep track of
their long-term buying patterns and make recommendations on
a daily, weekly or monthly basis. For example, if a user
exceeds his or her total amount of saturated fat permissible for
the specified profile, the system will notify the user and, if the
user’s profile has appropriate permissions, the user’s dietician.
Acknowledgment
This project has been supported, in part, by the MDSC
Corporation. We would like to thank Dr. Stephen Clyde,
MDSC President, for supporting our research and
championing our cause.
References
[1] Apache Lucene, http://lucene.apache.org/core/, Retrieved
04/15/2013.
[2] Jurafsky, D. and Martin, J. (2000). Speech and Language
Processing: An Introduction to Natural Language
Processing, Computational Linguistics, and Speech
Recognition. Prentice Hall, Upper Saddle River, New
Jersey, 2000, ISBN: 0130950696.
[3] Levenshtein, V. (1966). Binary codes capable of
correcting deletions, insertions, and reversals, Doklady
Akademii Nauk SSSR, 163(4):845-848, 1965 (Russian).
English translation in Soviet Physics Doklady, 10(8):707-
710, 1966.
[4] Tesseract Optical Character Recognition Engine.
http://code.google.com/p/tesseract-ocr/, Retrieved
04/15/2013.
[5] Anding, R. Nutrition Made Clear. The Great Courses,
Chantilly, VA, 2009.
[6] D. Graham, J. Orquin, V. Visschers. (2012). Eye Tracking
and Nutrition Label Use: A Review of the Literature and
Recommendations for Label Enhancement. Food Policy,
Vol. 37, pp. 378-382.
[7] Kane S, Bigham J, Wobbrock J. (2008). Slide Rule:
Making Mobile Touch Screens Accessible to Blind
People using Multi-Touch Interaction Techniques. In
Proceedings of 10-th Conference on Computers and
Accessibility (ASSETS 2008), October, Halifax, Nova
Scotia, Canada 2008; pp. 73-80.
[8] Kulyukin, V., Crandall, W., and Coster, D. (2011).
Efficiency or Quality of Experience: A Laboratory Study
of Three Eyes-Free Touchscreen Menu Browsing User
Interfaces for Mobile Phones. The Open Rehabilitation
Journal, Vol. 4, pp. 13-22, DOI:
10.2174/1874943701104010013.
[9] K-NFB Reader for Nokia Mobile Phones,
www.knfbreader.com, Retrieved 03/10/2013.
[10] Al-Yousefi, H., and Udpa, S. (1992). Recognition of
Arabic Characters. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 14, 8 (August 1992),
pp. 853-857.
[11] Kulyukin, V., Zaman, T., Andhavarapu, A., and
Kutiyanawala, A. (2012). Eyesight Sharing in Blind
Grocery Shopping: Remote P2P Caregiving through
Cloud Computing. In Proceedings of the 13-th
International Conference on Computers Helping People
with Special Needs (ICCHP 2012), K. Miesenberger et al.
(Eds.), ICCHP 2012, Part II, Springer Lecture Notes on
Computer Science (LNCS) 7383, pp. 75-82, July 11-13,
2012, Linz, Austria (pdf); DOI 10.1007/978-3-642-
31522-0; ISBN 978-3-642-31521-3; ISSN 0302-9743.
[12] Kulyukin, V. and Kutiyanawala, A. (2010). Accessible
Shopping Systems for Blind and Visually Impaired
Individuals: Design Requirements and the State of the
Art. The Open Rehabilitation Journal, ISSN: 1874-9437,
Volume 2, 2010, pp. 158-168, DOI:
10.2174/1874943701003010158.
[13] Årsand, E., Tatara, N., Østengen, G., and Hartvigsen, G.
(2010). Mobile Phone-Based Self-Management Tools for
Type 2 Diabetes: The Few Touch Application. Journal of
Diabetes Science and Technology, 4, 2 (March 2010), pp.
328-336.
[14] Frøisland, D.H., Arsand E., and Skårderud F. (2010).
Improving Diabetes Care for Young People with Type 1
Diabetes through Visual Learning on Mobile Phones:
Mixed-methods Study. J. Med. Internet Res. 6, 14(4),
(Aug 2012), published online; DOI= 10.2196/jmir.2155.
[15] Bae, K. S., Kim, K. K., Chung, Y. G., and Yu, W. P.
(2005). Character Recognition System for Cellular Phone
with Camera. In Proceedings of the 29th Annual
International Computer Software and Applications
Conference, Volume 01, (Washington, DC, USA, 2005),
COMPSAC '05, IEEE Computer Society, pp. 539-544.
[16] Hsueh, M. (2011). Interactive Text Recognition and
Translation on a Mobile Device. Master's thesis, EECS
Department, University of California, Berkeley, May
2011.
[17] Kukich, K. (1992). Techniques for Automatically
Correcting Words in Text. ACM Computing Surveys
(CSUR), v.24 n.4, p.377-439, Dec. 1992.
[18] Kai, N. (2010). Unsupervised Post-Correction of OCR
Errors. Master’s Thesis, Leibniz Universität
Hannover, Germany, 2010.
[19] Damerau, F.J. (1964). A technique for computer detection
and correction of spelling errors. In Communications of
ACM, 7(3):171-176, 1964.
[20] Kulyukin, V., Kutiyanawala, A., and Zaman, T. (2012).
Eyes-Free Barcode Detection on Smartphones with
Niblack's Binarization and Support Vector Machines. In
Proceedings of the 16-th International Conference on
Image Processing, Computer Vision, and Pattern
Recognition, Vol. 1, (Las Vegas, Nevada, USA), IPCV
2012, CSREA Press, July 16-19, 2012, pp. 284-290;
ISBN: 1-60132-223-2, 1-60132-224-0.
[21] Duda, R. O. and P. E. Hart. 1972. Use of the Hough
Transformation to Detect Lines and Curves in Pictures.
Comm. ACM, Vol. 15, (January 1972), pp. 11-15.
[22] GOCR - A Free Optical Character Recognition Program.
http://jocr.sourceforge.net/, Retrieved 04/15/2013.
[23] B.J. Fogg. 2009. A Behavior Model for Persuasive Design. In
Proceedings of the 4th International Conference on
Persuasive Technology (Persuasive 09), Article No. 40,
ACM New York, NY, USA, ISBN: 978-1-60558-376-1.
[24] ABBYY Mobile OCR Engine.
http://www.abbyy.com/mobileocr/, Retrieved April 15
2013.
[25] WINTONE Mobile OCR Engine.
http://www.wintone.com.cn/en/prod/44/detail270.aspx,
Retrieved April 15, 2013.
[26] Kulyukin, V., Kutiyanawala, A., and Zaman, T. (2012).
Eyes-Free Barcode Detection on Smartphones with
Niblack's Binarization and Support Vector Machines. In
Proceedings of the 16-th International Conference on
Image Processing, Computer Vision, and Pattern
Recognition (IPCV 2012), Vol. I, pp. 284-290, CSREA
Press, July 16-19, 2012, Las Vegas, Nevada, USA;
ISBN: 1-60132-223-2, 1-60132-224-0.
IRJET- A Food Recognition System for Diabetic Patients based on an Optimized ...
 
IRJET - Deep Multiple Instance Learning for Automatic Detection of Diabetic R...
IRJET - Deep Multiple Instance Learning for Automatic Detection of Diabetic R...IRJET - Deep Multiple Instance Learning for Automatic Detection of Diabetic R...
IRJET - Deep Multiple Instance Learning for Automatic Detection of Diabetic R...
 
"Predictive Modelling for Overweight and Obesity: Harnessing Machine Learning...
"Predictive Modelling for Overweight and Obesity: Harnessing Machine Learning..."Predictive Modelling for Overweight and Obesity: Harnessing Machine Learning...
"Predictive Modelling for Overweight and Obesity: Harnessing Machine Learning...
 
CHOOSELYF
CHOOSELYFCHOOSELYF
CHOOSELYF
 
A Mobile-Cloud based Context-Aware and Interactive Framework for Diabetes Man...
A Mobile-Cloud based Context-Aware and Interactive Framework for Diabetes Man...A Mobile-Cloud based Context-Aware and Interactive Framework for Diabetes Man...
A Mobile-Cloud based Context-Aware and Interactive Framework for Diabetes Man...
 
A CONCEPTUAL APPROACH TO ENHANCE PREDICTION OF DIABETES USING ALTERNATE FEATU...
A CONCEPTUAL APPROACH TO ENHANCE PREDICTION OF DIABETES USING ALTERNATE FEATU...A CONCEPTUAL APPROACH TO ENHANCE PREDICTION OF DIABETES USING ALTERNATE FEATU...
A CONCEPTUAL APPROACH TO ENHANCE PREDICTION OF DIABETES USING ALTERNATE FEATU...
 
A CONCEPTUAL APPROACH TO ENHANCE PREDICTION OF DIABETES USING ALTERNATE FEATU...
A CONCEPTUAL APPROACH TO ENHANCE PREDICTION OF DIABETES USING ALTERNATE FEATU...A CONCEPTUAL APPROACH TO ENHANCE PREDICTION OF DIABETES USING ALTERNATE FEATU...
A CONCEPTUAL APPROACH TO ENHANCE PREDICTION OF DIABETES USING ALTERNATE FEATU...
 
Multiple Disease Prediction System: A Review
Multiple Disease Prediction System: A ReviewMultiple Disease Prediction System: A Review
Multiple Disease Prediction System: A Review
 
Eating Habit and Health Monitoring System using Android Based Machine Learning
Eating Habit and Health Monitoring System using Android Based Machine LearningEating Habit and Health Monitoring System using Android Based Machine Learning
Eating Habit and Health Monitoring System using Android Based Machine Learning
 
“Detection of Diseases using Machine Learning”
“Detection of Diseases using Machine Learning”“Detection of Diseases using Machine Learning”
“Detection of Diseases using Machine Learning”
 
IRJET- An Android Application for Electronic Health Record System
IRJET- An Android Application for Electronic Health Record SystemIRJET- An Android Application for Electronic Health Record System
IRJET- An Android Application for Electronic Health Record System
 
DANES: Diet and Nutrition Expert System for Meal Management and Nutrition Cou...
DANES: Diet and Nutrition Expert System for Meal Management and Nutrition Cou...DANES: Diet and Nutrition Expert System for Meal Management and Nutrition Cou...
DANES: Diet and Nutrition Expert System for Meal Management and Nutrition Cou...
 
How Mobile Technology dominate the world of Healthcare industry
How Mobile Technology dominate the world of Healthcare industryHow Mobile Technology dominate the world of Healthcare industry
How Mobile Technology dominate the world of Healthcare industry
 
“A Medical System to Identify the Mortality Rates with Hospital ResourcesUsin...
“A Medical System to Identify the Mortality Rates with Hospital ResourcesUsin...“A Medical System to Identify the Mortality Rates with Hospital ResourcesUsin...
“A Medical System to Identify the Mortality Rates with Hospital ResourcesUsin...
 
IRJET- Data Mining Techniques to Predict Diabetes
IRJET- Data Mining Techniques to Predict DiabetesIRJET- Data Mining Techniques to Predict Diabetes
IRJET- Data Mining Techniques to Predict Diabetes
 

More from Vladimir Kulyukin

Toward Sustainable Electronic Beehive Monitoring: Algorithms for Omnidirectio...
Toward Sustainable Electronic Beehive Monitoring: Algorithms for Omnidirectio...Toward Sustainable Electronic Beehive Monitoring: Algorithms for Omnidirectio...
Toward Sustainable Electronic Beehive Monitoring: Algorithms for Omnidirectio...Vladimir Kulyukin
 
Digitizing Buzzing Signals into A440 Piano Note Sequences and Estimating Fora...
Digitizing Buzzing Signals into A440 Piano Note Sequences and Estimating Fora...Digitizing Buzzing Signals into A440 Piano Note Sequences and Estimating Fora...
Digitizing Buzzing Signals into A440 Piano Note Sequences and Estimating Fora...Vladimir Kulyukin
 
Generalized Hamming Distance
Generalized Hamming DistanceGeneralized Hamming Distance
Generalized Hamming DistanceVladimir Kulyukin
 
Adapting Measures of Clumping Strength to Assess Term-Term Similarity
Adapting Measures of Clumping Strength to Assess Term-Term SimilarityAdapting Measures of Clumping Strength to Assess Term-Term Similarity
Adapting Measures of Clumping Strength to Assess Term-Term SimilarityVladimir Kulyukin
 
A Cloud-Based Infrastructure for Caloric Intake Estimation from Pre-Meal Vide...
A Cloud-Based Infrastructure for Caloric Intake Estimation from Pre-Meal Vide...A Cloud-Based Infrastructure for Caloric Intake Estimation from Pre-Meal Vide...
A Cloud-Based Infrastructure for Caloric Intake Estimation from Pre-Meal Vide...Vladimir Kulyukin
 
Exploring Finite State Automata with Junun Robots: A Case Study in Computabil...
Exploring Finite State Automata with Junun Robots: A Case Study in Computabil...Exploring Finite State Automata with Junun Robots: A Case Study in Computabil...
Exploring Finite State Automata with Junun Robots: A Case Study in Computabil...Vladimir Kulyukin
 
Image Blur Detection with 2D Haar Wavelet Transform and Its Effect on Skewed ...
Image Blur Detection with 2D Haar Wavelet Transform and Its Effect on Skewed ...Image Blur Detection with 2D Haar Wavelet Transform and Its Effect on Skewed ...
Image Blur Detection with 2D Haar Wavelet Transform and Its Effect on Skewed ...Vladimir Kulyukin
 
Text Skew Angle Detection in Vision-Based Scanning of Nutrition Labels
Text Skew Angle Detection in Vision-Based Scanning of Nutrition LabelsText Skew Angle Detection in Vision-Based Scanning of Nutrition Labels
Text Skew Angle Detection in Vision-Based Scanning of Nutrition LabelsVladimir Kulyukin
 
Vision-Based Localization and Scanning of 1D UPC and EAN Barcodes with Relaxe...
Vision-Based Localization and Scanning of 1D UPC and EAN Barcodes with Relaxe...Vision-Based Localization and Scanning of 1D UPC and EAN Barcodes with Relaxe...
Vision-Based Localization and Scanning of 1D UPC and EAN Barcodes with Relaxe...Vladimir Kulyukin
 
Narrative Map Augmentation with Automated Landmark Extraction and Path Inference
Narrative Map Augmentation with Automated Landmark Extraction and Path InferenceNarrative Map Augmentation with Automated Landmark Extraction and Path Inference
Narrative Map Augmentation with Automated Landmark Extraction and Path InferenceVladimir Kulyukin
 
Toward Blind Travel Support through Verbal Route Directions: A Path Inference...
Toward Blind Travel Support through Verbal Route Directions: A Path Inference...Toward Blind Travel Support through Verbal Route Directions: A Path Inference...
Toward Blind Travel Support through Verbal Route Directions: A Path Inference...Vladimir Kulyukin
 
Eye-Free Barcode Detection on Smartphones with Niblack's Binarization and Sup...
Eye-Free Barcode Detection on Smartphones with Niblack's Binarization and Sup...Eye-Free Barcode Detection on Smartphones with Niblack's Binarization and Sup...
Eye-Free Barcode Detection on Smartphones with Niblack's Binarization and Sup...Vladimir Kulyukin
 
Eyesight Sharing in Blind Grocery Shopping: Remote P2P Caregiving through Clo...
Eyesight Sharing in Blind Grocery Shopping: Remote P2P Caregiving through Clo...Eyesight Sharing in Blind Grocery Shopping: Remote P2P Caregiving through Clo...
Eyesight Sharing in Blind Grocery Shopping: Remote P2P Caregiving through Clo...Vladimir Kulyukin
 
Wireless Indoor Localization with Dempster-Shafer Simple Support Functions
Wireless Indoor Localization with Dempster-Shafer Simple Support FunctionsWireless Indoor Localization with Dempster-Shafer Simple Support Functions
Wireless Indoor Localization with Dempster-Shafer Simple Support FunctionsVladimir Kulyukin
 
RFID in Robot-Assisted Indoor Navigation for the Visually Impaired
RFID in Robot-Assisted Indoor Navigation for the Visually ImpairedRFID in Robot-Assisted Indoor Navigation for the Visually Impaired
RFID in Robot-Assisted Indoor Navigation for the Visually ImpairedVladimir Kulyukin
 
RoboCart: Toward Robot-Assisted Navigation of Grocery Stores by the Visually ...
RoboCart: Toward Robot-Assisted Navigation of Grocery Stores by the Visually ...RoboCart: Toward Robot-Assisted Navigation of Grocery Stores by the Visually ...
RoboCart: Toward Robot-Assisted Navigation of Grocery Stores by the Visually ...Vladimir Kulyukin
 
On Natural Language Dialogue with Assistive Robots
On Natural Language Dialogue with Assistive RobotsOn Natural Language Dialogue with Assistive Robots
On Natural Language Dialogue with Assistive RobotsVladimir Kulyukin
 
Ergonomics-for-One in a Robotic Shopping Cart for the Blind
Ergonomics-for-One in a Robotic Shopping Cart for the BlindErgonomics-for-One in a Robotic Shopping Cart for the Blind
Ergonomics-for-One in a Robotic Shopping Cart for the BlindVladimir Kulyukin
 
A Wearable Two-Sensor O&M Device for Blind College Students
A Wearable Two-Sensor O&M Device for Blind College StudentsA Wearable Two-Sensor O&M Device for Blind College Students
A Wearable Two-Sensor O&M Device for Blind College StudentsVladimir Kulyukin
 
On the Impact of Data Collection on the Quality of Signal Strength in Wi-Fi I...
On the Impact of Data Collection on the Quality of Signal Strength in Wi-Fi I...On the Impact of Data Collection on the Quality of Signal Strength in Wi-Fi I...
On the Impact of Data Collection on the Quality of Signal Strength in Wi-Fi I...Vladimir Kulyukin
 

More from Vladimir Kulyukin (20)

Toward Sustainable Electronic Beehive Monitoring: Algorithms for Omnidirectio...
Toward Sustainable Electronic Beehive Monitoring: Algorithms for Omnidirectio...Toward Sustainable Electronic Beehive Monitoring: Algorithms for Omnidirectio...
Toward Sustainable Electronic Beehive Monitoring: Algorithms for Omnidirectio...
 
Digitizing Buzzing Signals into A440 Piano Note Sequences and Estimating Fora...
Digitizing Buzzing Signals into A440 Piano Note Sequences and Estimating Fora...Digitizing Buzzing Signals into A440 Piano Note Sequences and Estimating Fora...
Digitizing Buzzing Signals into A440 Piano Note Sequences and Estimating Fora...
 
Generalized Hamming Distance
Generalized Hamming DistanceGeneralized Hamming Distance
Generalized Hamming Distance
 
Adapting Measures of Clumping Strength to Assess Term-Term Similarity
Adapting Measures of Clumping Strength to Assess Term-Term SimilarityAdapting Measures of Clumping Strength to Assess Term-Term Similarity
Adapting Measures of Clumping Strength to Assess Term-Term Similarity
 
A Cloud-Based Infrastructure for Caloric Intake Estimation from Pre-Meal Vide...
A Cloud-Based Infrastructure for Caloric Intake Estimation from Pre-Meal Vide...A Cloud-Based Infrastructure for Caloric Intake Estimation from Pre-Meal Vide...
A Cloud-Based Infrastructure for Caloric Intake Estimation from Pre-Meal Vide...
 
Exploring Finite State Automata with Junun Robots: A Case Study in Computabil...
Exploring Finite State Automata with Junun Robots: A Case Study in Computabil...Exploring Finite State Automata with Junun Robots: A Case Study in Computabil...
Exploring Finite State Automata with Junun Robots: A Case Study in Computabil...
 
Image Blur Detection with 2D Haar Wavelet Transform and Its Effect on Skewed ...
Image Blur Detection with 2D Haar Wavelet Transform and Its Effect on Skewed ...Image Blur Detection with 2D Haar Wavelet Transform and Its Effect on Skewed ...
Image Blur Detection with 2D Haar Wavelet Transform and Its Effect on Skewed ...
 
Text Skew Angle Detection in Vision-Based Scanning of Nutrition Labels
Text Skew Angle Detection in Vision-Based Scanning of Nutrition LabelsText Skew Angle Detection in Vision-Based Scanning of Nutrition Labels
Text Skew Angle Detection in Vision-Based Scanning of Nutrition Labels
 
Vision-Based Localization and Scanning of 1D UPC and EAN Barcodes with Relaxe...
Vision-Based Localization and Scanning of 1D UPC and EAN Barcodes with Relaxe...Vision-Based Localization and Scanning of 1D UPC and EAN Barcodes with Relaxe...
Vision-Based Localization and Scanning of 1D UPC and EAN Barcodes with Relaxe...
 
Narrative Map Augmentation with Automated Landmark Extraction and Path Inference
Narrative Map Augmentation with Automated Landmark Extraction and Path InferenceNarrative Map Augmentation with Automated Landmark Extraction and Path Inference
Narrative Map Augmentation with Automated Landmark Extraction and Path Inference
 
Toward Blind Travel Support through Verbal Route Directions: A Path Inference...
Toward Blind Travel Support through Verbal Route Directions: A Path Inference...Toward Blind Travel Support through Verbal Route Directions: A Path Inference...
Toward Blind Travel Support through Verbal Route Directions: A Path Inference...
 
Eye-Free Barcode Detection on Smartphones with Niblack's Binarization and Sup...
Eye-Free Barcode Detection on Smartphones with Niblack's Binarization and Sup...Eye-Free Barcode Detection on Smartphones with Niblack's Binarization and Sup...
Eye-Free Barcode Detection on Smartphones with Niblack's Binarization and Sup...
 
Eyesight Sharing in Blind Grocery Shopping: Remote P2P Caregiving through Clo...
Eyesight Sharing in Blind Grocery Shopping: Remote P2P Caregiving through Clo...Eyesight Sharing in Blind Grocery Shopping: Remote P2P Caregiving through Clo...
Eyesight Sharing in Blind Grocery Shopping: Remote P2P Caregiving through Clo...
 
Wireless Indoor Localization with Dempster-Shafer Simple Support Functions
Wireless Indoor Localization with Dempster-Shafer Simple Support FunctionsWireless Indoor Localization with Dempster-Shafer Simple Support Functions
Wireless Indoor Localization with Dempster-Shafer Simple Support Functions
 
RFID in Robot-Assisted Indoor Navigation for the Visually Impaired
RFID in Robot-Assisted Indoor Navigation for the Visually ImpairedRFID in Robot-Assisted Indoor Navigation for the Visually Impaired
RFID in Robot-Assisted Indoor Navigation for the Visually Impaired
 
RoboCart: Toward Robot-Assisted Navigation of Grocery Stores by the Visually ...
RoboCart: Toward Robot-Assisted Navigation of Grocery Stores by the Visually ...RoboCart: Toward Robot-Assisted Navigation of Grocery Stores by the Visually ...
RoboCart: Toward Robot-Assisted Navigation of Grocery Stores by the Visually ...
 
On Natural Language Dialogue with Assistive Robots
On Natural Language Dialogue with Assistive RobotsOn Natural Language Dialogue with Assistive Robots
On Natural Language Dialogue with Assistive Robots
 
Ergonomics-for-One in a Robotic Shopping Cart for the Blind
Ergonomics-for-One in a Robotic Shopping Cart for the BlindErgonomics-for-One in a Robotic Shopping Cart for the Blind
Ergonomics-for-One in a Robotic Shopping Cart for the Blind
 
A Wearable Two-Sensor O&M Device for Blind College Students
A Wearable Two-Sensor O&M Device for Blind College StudentsA Wearable Two-Sensor O&M Device for Blind College Students
A Wearable Two-Sensor O&M Device for Blind College Students
 
On the Impact of Data Collection on the Quality of Signal Strength in Wi-Fi I...
On the Impact of Data Collection on the Quality of Signal Strength in Wi-Fi I...On the Impact of Data Collection on the Quality of Signal Strength in Wi-Fi I...
On the Impact of Data Collection on the Quality of Signal Strength in Wi-Fi I...
 

Recently uploaded

Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 

Recently uploaded (20)

Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 

Skip Trie Matching: A Greedy Algorithm for Real-Time OCR Error Correction on Smartphones

  • 1. Skip Trie Matching: A Greedy Algorithm for Real- Time OCR Error Correction on Smartphones Vladimir Kulyukin Department of Computer Science Utah State University Logan, UT, USA vladimir.kulyukin@usu.edu Aditya Vanka Department of Computer Science Utah State University Logan, UT, USA aditya.vanka@aggiemail.usu.edu Haitao Wang Department of Computer Science Utah State University Logan, UT, USA haitao.wang@usu.edu Abstract—Proactive nutrition management is considered by many nutritionists and dieticians as a key factor in reducing diabetes, cancer, and other illnesses caused by mismanaged diets. As more individuals manage their daily activities with smartphones, they start using their smartphones as diet management tools. Unfortunately, while there are many vision- based mobile applications to process barcodes, especially aligned ones, there is a relative dearth of vision-based applications for extracting useful nutrition information items such as nutrition facts, caloric contents, and ingredients. In this article, we present a greedy algorithm, called Skip Trie Matching (STM), for real time optical character recognition (OCR) output error correction on smartphones. The STM algorithm uses a dictionary of strings stored in a trie data structure to correct OCR errors by skipping misrecognized characters while driving down several paths in the trie. The number of skipped characters is referred to as the skip distance. The algorithm’s worst-case performance is  n , where n is the length of the input string to spellcheck. The algorithm’s performance is compared with Apache Lucene’s spell checker [1], a state of the art spell checker where spell checking can be done with the n-gram matching [2] or the Levenshtein edit distance (LED) [3]. The input data for comparison tests are text strings produced by the Tesserract OCR engine [4] on text image segments of nutrition data automatically extracted by an Android 2.3.6 smartphone application from real-time video streams of grocery product packages. The evaluation results indicate that, while the STM algorithm is greedy in that it does not find all possible corrections of a misspelled word, it gives higher recalls than Lucene’s n-gram matching or LED. The average run time of the STM algorithm is also lower than Lucene’s implementations of both algorithms. Keywords—vision-based nutrition information extraction, nutrition management, spellchecking, optical character recognition, mobile computing I. Introduction According to the U.S. Department of Agriculture, U.S. residents have increased their caloric intake by 523 calories per day since 1970. Mismanaged diets are estimated to account for 30-35 percent of cancer cases [5]. Approximately 47,000,000 U.S. residents have metabolic syndrome and diabetes. Diabetes in children appears to be closely related to increasing obesity levels. Many nutritionists and dieticians consider proactive nutrition management to be a key factor in reducing and controlling diabetes, cancer, and other illnesses related to or caused by mismanaged or inadequate diets. Research and development efforts in public health and preventive care have resulted in many systems for personal nutrition and health management. Numerous web sites have been developed to track caloric intake (e.g., http://nutritiondata.self.com), to determine caloric contents and quantities in consumed food (e.g., http://www.calorieking.com), and to track food intake and exercise (e.g., http://www.fitday.com). 
developed. For example, theCarrot.com and Keas.com are online portals where patients can input medical information, track conditions via mobile devices, and create health summaries to share with their doctors. Another online portal, mypreventivecare.com, provides a space where registered patients and doctors can collaborate on illness prevention. HealtheMe (http://www.krminc.com/) is an open source, personal health record (PHR) system at the U.S. Department of Veterans Affairs focused on preventive care.

Unfortunately, while these existing systems provide many useful features, they fail to address two critical barriers (CRs):

CR01: Lack of automated, real-time nutrition label analysis. Nutrition labels (NLs) have the potential to promote healthy diets. Unfortunately, many nutrition label characteristics impede timely detection and adequate comprehension [6], often resulting in patients ignoring useful nutrition information.

CR02: Lack of automated, real-time nutrition intake recording. Manual nutrition intake recording, the de facto state of the art, is time-consuming and error-prone, especially on smartphones.

As smartphones become universally adopted worldwide, they can be used as proactive nutrition management tools. One smartphone sensor that may help address both critical barriers is the camera. Currently, smartphone cameras are used in many mobile applications to process barcodes. There are free public online barcode databases (e.g., http://www.upcdatabase.com/) that provide some product descriptions and issuing countries' names. Unfortunately, although the quality of data in public barcode databases has improved, the data remain sparse.
Most product information is provided by volunteers who are assumed to periodically upload product details and associate them with product IDs; almost no nutritional information is available, and some of what is available may not be reliable. Some applications (e.g., http://redlaser.com) provide nutritional information for a few popular products. While there are many vision-based applications to process barcodes, there continues to be a relative dearth of vision-based applications for extracting other types of useful nutrition information from product packages, such as nutrition facts, caloric contents, and ingredients. If successfully extracted, such information can be converted into text or SQL via scalable optical character recognition (OCR) methods and submitted as queries to cloud-based sites and services.

In general, there are two broad approaches to improving OCR quality: improved image processing and OCR engine error correction. The first approach (e.g., [8], [9]) strives to achieve better OCR results by improving image processing algorithms. Unfortunately, this approach may not always be feasible, especially on off-the-shelf mobile platforms, due to processing and networking constraints on the amount of real-time computation or the technical or financial impracticality of switching to a different OCR engine. The second approach treats the OCR engine as a black box and attempts to improve its quality via automated error correction methods applied to its output. This approach can work with multiple OCR engines, because it does not modify the underlying image processing methods.

This article contributes to the body of research on the second approach. In particular, we present an algorithm, called Skip Trie Matching (STM) after the trie data structure on which it is based, for real-time OCR output error correction on smartphones. The algorithm uses a dictionary of strings stored in a trie to correct OCR errors by skipping misrecognized characters while going down specific trie paths. The number of skipped characters, called the skip distance, is the only variable input parameter of the algorithm.

The remainder of our article is organized as follows. Section 2 presents related work. Section 3 discusses the components of the vision-based nutrition information extraction (NIE) module of the ShopMobile system [10, 11] that run prior to the STM algorithm. The material in this section is not the main focus of this article and is presented with the sole purpose of giving the reader the broader context in which the STM algorithm has been developed and applied. Section 4 details the STM algorithm and gives its asymptotic analysis. In Section 5, the STM algorithm's performance is compared with Apache Lucene's n-gram matching [2] and Levenshtein edit distance (LED) [3]. Section 6 analyzes the results and outlines several research venues for the future.

II. Related Work

Many current R&D efforts aim to utilize the power of mobile computing to improve proactive nutrition management. In [12], research is presented that shows how to design mobile applications for supporting lifestyle changes among individuals with Type 2 diabetes and how these changes were perceived by a group of 12 patients during a 6-month period. In [13], an application is presented that contains a picture-based diabetes diary recording physical activity and photos, taken with the phone camera, of eaten foods. The smartphone is connected to a glucometer via Bluetooth to capture blood glucose values.
A web-based, password-secured and encrypted SMS service is provided for users to send messages to their care providers to resolve daily problems and for educational messages to be sent to users.

The nutrition label (NL) localization algorithm outlined in Section 3 is based on the vertical and horizontal projections used in many OCR applications. For example, in [9], projections are used to detect and recognize Arabic characters. The text chunking algorithm, also outlined in Section 3, builds on and complements numerous mobile OCR projects that capitalize on the ever increasing processing capabilities of smartphone cameras. For example, in [14], a system is presented for OCR on mobile phones. In [15], an interactive system is described for text recognition and translation.

The STM algorithm detailed in Section 4 contributes to a large and rapidly growing body of research on automated spell checking, an active research area of computational linguistics since the early 1960s (see [16] for a comprehensive survey). Spelling errors can be broadly classified as non-word errors and real-word errors [17]. Non-word errors are character sequences returned by OCR engines but not contained in the spell checker's dictionary. For example, 'polassium' is a non-word error if the spell checker's dictionary contains only 'potassium.' Real-word errors occur when recognized words are spelled correctly but are inappropriate in the current context. For example, if the OCR engine recognizes 'nutrition facts' as 'nutrition fats,' the string 'fats' is a real-word error in that it is correctly spelled but inappropriate in the current context. Real-word error correction is beyond the scope of this paper. Many researchers consider this problem to be much more difficult than non-word error correction, requiring natural language processing and AI techniques [2].

Two well-known approaches that handle non-word errors are n-grams and edit distances [2, 3]. The n-gram approach breaks dictionary words into subsequences of characters of length n, i.e., n-grams, where n is typically set to 1, 2, or 3. A table is computed with the statistics of n-gram occurrences. When a word is checked for spelling, its n-grams are computed and, if an n-gram is not found, spelling correction is applied. To find spelling suggestions, spelling correction methods use various similarity criteria between the n-grams of a dictionary word and the misspelled word.

The term edit distance denotes the minimum number of edit operations, such as insertions, deletions, and substitutions, required to transform one string into another. Two popular edit distances are the Levenshtein distance [3] and the Damerau-Levenshtein distance [18]. The Levenshtein distance (LED) is the minimum cost of a sequence of single-character replacement, deletion, and insertion operations required to transform a source string into a target string. The Damerau-Levenshtein distance (DLED) extends the LED's set of operations with the operation of swapping adjacent characters, called transposition. The DLED is not as widely used in OCR as the LED, because a major source of transposition errors is typography errors, whereas most OCR errors are caused by misrecognized characters with similar graphic features (e.g., 't' misrecognized as 'l' or 'u' as 'll'). Another frequent OCR error type, which does not yield to correction by transposition, is character omission from a word when the OCR engine fails to recognize a character or merges it with the previous character (e.g., the letter 'i' with a missing dot is merged with the following letter 'k' when 'k' is placed close to 'i').
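To make the LED baseline concrete, here is a standard dynamic-programming implementation of the Levenshtein distance in Python; this is an illustrative sketch of the metric itself, not the Lucene implementation used in the experiments reported below.

def levenshtein(source, target):
    """Levenshtein edit distance: the minimum number of single-character
    insertions, deletions, and substitutions turning source into target."""
    m, n = len(source), len(target)
    # dp[i][j] = distance between source[:i] and target[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i          # delete all i characters
    for j in range(n + 1):
        dp[0][j] = j          # insert all j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[m][n]

print(levenshtein('polassium', 'potassium'))  # 1: one substitution

The two nested loops make the computation quadratic in the string lengths, which is the cost that motivates the linear-time STM algorithm presented in Section 4.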
III. NFT Localization & Text Segmentation

The objective of this section is to give the reader a better appreciation of the broader context in which the STM algorithm is applied by outlining the nutrition information extraction (NIE) module that runs prior to the STM algorithm. The NIE module is part of the Persuasive NUtrition System (PNUTS), a mobile vision-based nutrition management system for smartphone users currently under development at the Utah State University (USU) Computer Science Assistive Technology Laboratory (CSATL) [10, 11, 19]. The system will enable smartphone users, both sighted and visually impaired, to specify their dietary profiles securely on the web or in the cloud. When they go shopping, they will use their smartphones' cameras to extract nutrition information from product packages. The extracted information is not limited to barcodes but also includes nutrition facts such as calories, saturated fat, sugar content, cholesterol, sodium, potassium, carbohydrates, protein, and ingredients.

The development of PNUTS has been guided by the Fogg Behavior Model (FBM) [23]. The FBM states that motivation alone is insufficient to stimulate target behavior; a motivated patient must have both the ability to execute the behavior and a trigger to engage in that behavior at an appropriate place and time. Many nutrition management system designers assume that patients are either more skilled than they actually are or that they can be trained to have the required skills. Since training can be difficult and time consuming, PNUTS's objective is to make target behaviors easier and more intuitive to execute.

A. Vertical and Horizontal Projections

Images captured from the smartphone's video stream are divided into foreground and background pixels. Foreground pixels are content-bearing units, where content is defined in a domain-dependent manner, e.g., black pixels, white pixels, pixels with specific luminosity levels, specific neighborhood connectivity patterns, etc. Background pixels are those that are not foreground. The horizontal projection of an image (HP) is a sequence of foreground pixel counts for each row in the image. The vertical projection of an image (VP) is a sequence of foreground pixel counts for each column in the image. Figure 1 shows the horizontal and vertical projections of a black and white image of the three characters 'ABC'. Suppose there is an $m \times n$ image $I$ where the foreground pixels are black, i.e., $I(x, y) = 0$, and the background pixels are white, i.e., $I(x, y) = 255$. Then the horizontal projection of row $y$ and the vertical projection of column $x$ can be defined as $f(y)$ and $g(x)$, respectively:

$$f(y) = \sum_{x=0}^{n-1} \left(255 - I(x, y)\right); \qquad g(x) = \sum_{y=0}^{m-1} \left(255 - I(x, y)\right). \qquad (1)$$

For the discussion that follows, it is important to keep in mind that the vertical projections are used for detecting the vertical boundaries of NFTs, while the horizontal projections are used in computing the NFTs' horizontal boundaries.
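To illustrate equation (1), the following sketch computes both projections with NumPy. The array layout (rows indexed by y, columns by x) and the use of NumPy are our assumptions for illustration; the paper's implementation is an Android application.

import numpy as np

def projections(img):
    """Compute horizontal and vertical projections of a binarized image.

    img is an m-by-n uint8 array with foreground pixels = 0 (black)
    and background pixels = 255 (white), as in equation (1).
    """
    inverted = 255 - img.astype(np.int64)   # foreground pixels become 255
    f = inverted.sum(axis=1)                # f[y]: projection of row y
    g = inverted.sum(axis=0)                # g[x]: projection of column x
    return f, g

# A toy 3x4 image with one foreground pixel in row 1, column 2.
img = np.full((3, 4), 255, dtype=np.uint8)
img[1, 2] = 0
f, g = projections(img)
print(f)  # [  0 255   0]
print(g)  # [  0   0 255   0]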
B. Horizontal Line Filtering

In detecting NFT boundaries, three assumptions are made: 1) an NFT is present in the image; 2) the NFT is not cropped; and 3) the NFT is horizontally or vertically aligned. Figure 2 shows horizontally and vertically aligned NFTs.

The detection of NFT boundaries proceeds in three stages. First, the first approximation of the vertical boundaries is computed. Second, the vertical boundaries are extended left and right. Third, the upper and lower horizontal boundaries are computed. The objective of the first stage is to detect the approximate location of the NFT along the horizontal axis, $(x'_s, x'_e)$. This approximation starts with the detection of horizontal lines in the image, which is accomplished with a horizontal line detection kernel (HLDK) described in our previous publications [19]. It should be noted that other line detection techniques (e.g., the Hough transform [20]) can be used for this purpose. Our HLDK is designed to detect large horizontal lines in images to maximize computational efficiency. On rotated images, the kernel is used to detect vertical lines. The left image of Figure 3 gives the output of running the HLDK filter on the left image of Figure 2.

Figure 1. Horizontal & Vertical Projections.
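The HLDK itself is described in [19] and is not reproduced here. As a rough, hypothetical stand-in, the sketch below keeps only foreground pixels that belong to sufficiently long horizontal runs; the run-length threshold min_run is an assumed parameter, not a value from the paper.

import numpy as np

def horizontal_line_filter(img, min_run=50):
    """A simple stand-in for a horizontal line filter (not the HLDK of [19]):
    keep only foreground (black) pixels that belong to a horizontal run of at
    least min_run consecutive foreground pixels; all else becomes background.
    """
    m, n = img.shape
    out = np.full((m, n), 255, dtype=np.uint8)   # start all background (white)
    for y in range(m):
        run_start = None
        for x in range(n + 1):
            fg = x < n and img[y, x] == 0        # is this a foreground pixel?
            if fg and run_start is None:
                run_start = x                    # a run of black pixels begins
            elif not fg and run_start is not None:
                if x - run_start >= min_run:     # keep only long runs
                    out[y, run_start:x] = 0
                run_start = None
    return out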
C. Detection of Vertical Boundaries

Let HLFI be a horizontally line filtered image, i.e., the image put through the HLDK filter or some other line detection filter. Let VP(HLFI) be the vertical projection of the white pixels in each column of the HLFI. The right image in Figure 3 shows the vertical projection of the HLFI on the left. Let $\theta_{VP}$ be a threshold, which in our application is set to the mean count of the white foreground pixels in columns. In Figure 3 (right), $\theta_{VP}$ is shown by a red line. The foreground pixel counts in the columns of the image region with the NFT are greater than the threshold. The NFT vertical boundaries are then computed as:

$$x'_l = \min\{\, c \mid g(c) \geq \theta_{VP} \,\}; \qquad x'_r = \max\{\, c \mid g(c) \geq \theta_{VP} \,\wedge\, x'_l < c \,\}. \qquad (2)$$

The pair of left and right boundaries detected by (2) may be too close to each other, where 'too close' is defined as the percentage of the image width covered by the distance between the right and left boundaries. If the boundaries are found to be too close to each other, the left boundary is extended to the first column left of the current left boundary for which the projection is at or above the threshold, whereas the right boundary is extended to the first column right of the current right boundary for which the vertical projection is at or above the threshold. Figure 4 (left) shows the initial vertical boundaries extended left and right.

D. Detection of Horizontal Boundaries

The NFT horizontal boundary computation is confined to the image region vertically bounded by $(x_l, x_r)$. Let HP(HLFI) be the horizontal projection of the HLFI in Figure 3 (left), and let $\theta_{HP}$ be a threshold, which in our application is set to the mean count of the foreground pixels in rows, i.e., $\theta_{HP} = \mathrm{mean}\{\, f(y) \mid f(y) > 0 \,\}$. Figure 4 (right) shows the horizontal projection of the HLFI in Figure 3 (left). The red line shows $\theta_{HP}$. The NFT's horizontal boundaries are computed in a manner similar to the computation of its vertical boundaries, with one exception: they are not extended after the first approximation is computed, because the horizontal boundaries do not have as much impact on the subsequent OCR of segmented text chunks as the vertical boundaries. The horizontal boundaries are computed as:

$$r_u = \min\{\, r \mid f(r) \geq \theta_{HP} \,\}; \qquad r_l = \max\{\, r \mid f(r) \geq \theta_{HP} \,\wedge\, r_u < r \,\}. \qquad (3)$$

Figure 5 (left) shows the nutrition table localized via vertical and horizontal projections and segmented from the image in Figure 2 (left).

Figure 2. Vertically & Horizontally Aligned Tables.
Figure 3. HLFI of Fig. 2 (left); its VP (right).
Figure 4. VB Extension (left); HP of Left HLFI in Fig. 2 (right).
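A minimal sketch of equations (2) and (3), assuming the projections f and g of the line-filtered image from the earlier sketch. Taking the mean of the nonzero projection values is one plausible reading of the mean-based thresholds; the boundary-extension step is omitted.

import numpy as np

def vertical_boundaries(g):
    """First approximation of the NFT vertical boundaries, equation (2).
    Assumes some columns have nonzero projection (an NFT is present)."""
    theta_vp = g[g > 0].mean()             # mean-based threshold (assumed form)
    cols = np.nonzero(g >= theta_vp)[0]    # columns at or above the threshold
    return cols.min(), cols.max()          # (x'_l, x'_r)

def horizontal_boundaries(f):
    """NFT horizontal boundaries, equation (3)."""
    theta_hp = f[f > 0].mean()             # mean of the positive row counts
    rows = np.nonzero(f >= theta_hp)[0]
    return rows.min(), rows.max()          # (r_u, r_l)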
E. Text Chunking

A typical NFT includes text chunks with various caloric and ingredient information, e.g., "Total Fat 2g 3%." As can be seen in Figure 5 (left), text chunks are separated by black colored separators. These text chunks are segmented from localized NFTs. This segmentation is referred to as text chunking.

Text chunking starts with the detection of the separator lines. Let N be a binarized image with a segmented NFT and let $p(i)$ denote the probability of image row $i$ containing a black separator. If such probabilities are reliably computed, text chunks can be localized. Toward that end, let $l_j$ be the length of the $j$-th consecutive run of black pixels in row $i$ above a length threshold $\theta_l$. If $m$ is the total number of such runs, then $p(i)$ is computed as the geometric mean of $l_0, l_1, \ldots, l_m$. The geometric mean is more indicative of the central tendency of a set of numbers than the arithmetic mean. If $\theta$ is the mean of all positive values of $p(i)$ normalized by the maximum value of $p(i)$ for the entire image, the start and end coordinates, $y_s$ and $y_e$, respectively, of every separator along the y axis can be computed by detecting consecutive rows for which the normalized values are above the threshold:

$$y_s = \{\, i \mid p(i) \geq \theta \,\wedge\, p(i-1) < \theta \,\}; \qquad y_e = \{\, j \mid p(j) \geq \theta \,\wedge\, p(j+1) < \theta \,\}. \qquad (4)$$

Once these coordinates are identified, the text chunks can be segmented from the image. As can be seen from Figure 5 (right), some text chunks contain single text lines while others have multiple text lines. The actual OCR in the ShopMobile system [19] takes place on text chunk images such as those shown in Figure 5 (right).
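The sketch below illustrates the separator-scoring idea behind equation (4): score each row by the geometric mean of its long black runs, normalize by the maximum, and threshold. The run-length threshold and the exact normalization are assumptions for illustration, and the grouping of consecutive rows into (y_s, y_e) pairs is omitted.

import numpy as np

def row_score(row, min_run=20):
    """Geometric mean of the lengths of black-pixel runs of at least min_run."""
    runs, run = [], 0
    for px in list(row) + [255]:             # sentinel closes a trailing run
        if px == 0:
            run += 1
        else:
            if run >= min_run:
                runs.append(run)
            run = 0
    if not runs:
        return 0.0
    return float(np.exp(np.mean(np.log(runs))))   # geometric mean

def separator_rows(img, min_run=20):
    """Rows likely to contain a black separator line, in the spirit of (4)."""
    p = np.array([row_score(img[i], min_run) for i in range(img.shape[0])])
    if p.max() > 0:
        p = p / p.max()                      # normalize by the maximum value
    theta = p[p > 0].mean() if (p > 0).any() else 1.0
    return np.nonzero(p >= theta)[0]         # candidate separator rows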
IV. Skip Trie Matching

The trie data structure has gained popularity on mobile platforms due to its space efficiency compared to the standard hash table and its efficient worst-case lookup time, $O(n)$, where $n$ is the length of the input string, not the number of entries in the data structure. This performance compares favorably to the hash table, which spends the same amount of time on computing the hash code but requires significantly more storage space. On many mobile platforms, the trie has been used for word completion.

The STM algorithm is based on the observation that the trie's efficient storage of strings can be used in finding the closest dictionary matches to misspelled words. The only parameter that controls the algorithm's behavior is the skip distance, a non-negative integer that defines the maximum number of misrecognized (misspelled) characters allowed in a misspelled word. In the current implementation, misspelled words are produced by the Tesseract OCR engine. However, the algorithm generalizes to other domains where spelling errors must be corrected fast.

The STM algorithm uses the trie to represent the target dictionary. For example, consider the trie dictionary in Figure 6 that (moving left to right) consists of 'ABOUT,' 'ACID,' 'ACORN,' 'BAA,' 'BAB,' 'BAG,' 'BE,' 'OIL,' and 'ZINC.' The small balloons at character nodes are Boolean flags that signal word ends. When a node's word end flag is true, the path from the root to the node is a word. The children of each node are lexicographically sorted so that finding a child character of a node is $O(\log k)$, where $k$ is the number of the node's children.

Figure 5. Localized NL (left); NL Text Chunks (right).
Figure 6. Simple Trie Dictionary.
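Below is a minimal trie representation sufficient for the discussion, with a word-end flag per node and lexicographically sorted children searched by binary search; the class and field names are ours, not the paper's.

import bisect

class TrieNode:
    """A trie node with a character, a word-end flag, and sorted children."""
    def __init__(self, char=''):
        self.char = char
        self.word_end = False
        self.children = []          # list of TrieNode, sorted by char

    def find_child(self, char):
        """Binary search for the child labeled char; None if absent."""
        keys = [c.char for c in self.children]
        i = bisect.bisect_left(keys, char)
        if i < len(keys) and keys[i] == char:
            return self.children[i]
        return None

    def add_word(self, word):
        node = self
        for ch in word:
            child = node.find_child(ch)
            if child is None:
                child = TrieNode(ch)
                keys = [c.char for c in node.children]
                node.children.insert(bisect.bisect_left(keys, ch), child)
            node = child
        node.word_end = True

# Build the dictionary of Figure 6.
root = TrieNode()
for w in ['ABOUT', 'ACID', 'ACORN', 'BAA', 'BAB', 'BAG', 'BE', 'OIL', 'ZINC']:
    root.add_word(w)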
Suppose that the skip distance is set to 1 and the OCR engine misrecognizes 'ACID' as 'ACIR.' The STM algorithm starts at the root node, as shown in Figure 7. For each child of the root, the algorithm checks if the first character of the input string matches any of the root's children. If no match is found and the skip distance > 0, the skip distance is decremented by 1 and recursive calls are made for each of the root's children. In this case, 'A' in the input matches the root's 'A' child. Since the match is successful, a recursive call is made on the remainder of the input, 'CIR,' with the root node's 'A' child at Level 1 as the current node, as shown in Figure 8. The algorithm next matches 'C' of the truncated input 'CIR' with the right child of the current node 'A' at Level 1, truncates the input to 'IR,' and recurses to the node 'C' at Level 2, i.e., the right child of the node 'A' at Level 1. The skip distance is still 1, because no mismatched characters have been skipped so far.

At the node 'C' at Level 2, the character 'I' of the truncated input 'IR' is matched with the left child of the node 'C'; the input is therefore truncated to 'R.' The algorithm recurses to the node 'I' at Level 3, as shown in Figure 9. The last character of the input string, 'R,' is next matched with the children of the current node 'I' at Level 3 (see Figure 10). The binary search on the node's children fails. However, since the skip distance is 1, i.e., one more character can still be skipped, the skip distance is decremented by 1. Since there are no more characters in the input string after the mismatched 'R,' the algorithm checks if the current node has a word end flag set to true. In this case, it does, because the word end flag at the node 'D' at Level 4 is set to true. Thus, the matched word, 'ACID,' is added to the returned list of suggestions. When the skip distance is 0 and the current node is not a word end, the algorithm fails, i.e., the current path is not added to the returned list of spelling suggestions.

Figure 7. STM of 'ACIR' with Skip Distance of 1.
Figure 8. Recursive Call from Node 'A' at Level 1.
Figure 9. Recursive Call at Node 'I' at Level 3.
Figure 10. End Position Errors along X-axis.

Figure 11 gives the pseudocode of the STM algorithm. The dot operator is used with the semantics common to many OOP languages to denote access to an object's member variables. The colon after an IF condition signals the start of the THEN clause for that condition. Python-like indentation is used to denote scope. Line 1 computes the length of the possibly misspelled input string object stored in the variable inputStr. Line 2 initializes an array of string objects called suggestionList. When the algorithm finishes, suggestionList contains all found corrections of inputStr. The first call to the STM function occurs on Line 3. The first parameter is inputStr; the second parameter d is the skip distance; the third parameter curNode is a trie node object initially referencing the trie's root; the fourth parameter suggestion is a string that holds the sequence of characters on the path from the trie's root to curNode. Found suggestions are added to suggestionList on Line 10.
1.  inputLength = len(inputStr)
2.  suggestionList = [ ]
3.  STM(inputStr, d, curNode, suggestion):
4.    IF len(inputStr) == 0 || curNode == NULL: fail
5.    IF len(inputStr) == 1:
6.      IF inputStr[0] == curNode.char || d > 0:
7.        add curNode.char to suggestion
8.        IF len(suggestion) == inputLength &&
9.           curNode.wordEnd == True:
10.         add suggestion to suggestionList
11.   ELSE IF len(inputStr) > 1:
12.     IF inputStr[0] == curNode.char:
13.       add curNode.char to suggestion
14.       nextNode = binSearch(inputStr[1], curNode.children)
15.       IF nextNode != NULL:
16.         STM(rest(inputStr), d, nextNode, suggestion)
17.     ELSE IF d > 0:
18.       FOR each node C in curNode.children:
19.         add C.char to suggestion
20.         STM(rest(inputStr), d-1, C, suggestion)
21.     ELSE fail

Figure 11. STM Algorithm.

Let len(inputStr) = n and let d denote the skip distance. The largest branching factor of a trie node is $\sigma$, i.e., the size of the alphabet over which the trie is built. If inputStr is in the trie, the binary search on Line 14 runs exactly once for each of the n characters, which gives us $O(n \log \sigma)$. If inputStr is not in the trie, it is allowed to contain at most d character mismatches. Thus, there are $n - d$ matches and $d$ mismatches. All matches run in $O((n-d)\log\sigma)$. In the worst case, for each mismatch, Lines 18-19 ensure that $\sigma$ nodes are inspected, which gives us the run time of $O(\sigma^{d}(n-d)\log\sigma)$. The worst case rarely occurs in practice, because in a trie built for a natural language most nodes have branching factors equal to a small fraction of the constant $\sigma$. Since $d \in \mathbb{N}$ is a small non-negative constant, $O(\sigma^{d} n \log\sigma) = O(n)$. Since $\sigma^{d}$ and $\log\sigma$ are small constants for natural languages with written scripts, the STM algorithm is asymptotically expected to run faster than the quadratic edit distance algorithms and to be on par with n-gram algorithms.

Two current limitations of the STM algorithm, as is evident from Figure 11, are 1) that it finds only spelling suggestions that are of the same length as inputStr (Line 8) and 2) that it does not find all possible corrections of a misspelled word because of its greedy selection of the next node in the trie (Lines 14-16). Both limitations were deliberately built into the algorithm to increase its run-time performance. In Section VI, we discuss how these limitations can be addressed and possibly overcome. In particular, the second limitation can be overcome by having the algorithm perform a breadth-first search of the current node's children instead of selecting only the one child that matches the current character. While this modification will not change the asymptotic performance, it will, in practice, increase the running time.
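For readers who want to experiment, below is a small runnable Python rendering of the STM idea, reusing the TrieNode sketch and the root dictionary from earlier in this section. Following the walkthrough in Figures 7-10 (rather than the pseudocode literally), it matches each input character against the children of the current node; it is our illustrative transcription, not the authors' Android implementation.

def stm(input_str, d, cur_node, suggestion, suggestions):
    """Skip Trie Matching: collect dictionary words matching input_str with
    at most d skipped (misrecognized) characters. By construction, every
    suggestion has the same length as the input (limitation 1 above)."""
    if not input_str:
        if cur_node.word_end:
            suggestions.append(suggestion)   # a complete dictionary word
        return
    child = cur_node.find_child(input_str[0])    # binary search, O(log k)
    if child is not None:
        # Greedy step: follow only the matching child (limitation 2 above).
        stm(input_str[1:], d, child, suggestion + child.char, suggestions)
    elif d > 0:
        # Skip the mismatched character: try every child once, d decremented.
        for c in cur_node.children:
            stm(input_str[1:], d - 1, c, suggestion + c.char, suggestions)

def skip_trie_match(root, word, d=1):
    suggestions = []
    stm(word, d, root, '', suggestions)
    return suggestions

print(skip_trie_match(root, 'ACIR', 1))  # ['ACID']

On the 'ACIR' example, the single allowed skip absorbs the misrecognized 'R' and the function returns ['ACID']; note that a matching child is always followed greedily, which is exactly limitation 2) discussed above.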
V. Experiments

A. Tesseract vs. GOCR

Our first experiment was conducted to choose an open source OCR engine with which to test the STM algorithm. We compared the relative performance of Tesseract with GOCR [22], another open source OCR engine developed under the GNU public license. In Tesseract, OCR consists of segmentation and recognition. During segmentation, a connected component analysis first identifies blobs and segments them into text lines, which are analyzed for proportional or fixed pitch text. The lines are then broken into individual words via the spacing between individual characters. Finally, individual character cells are identified. During recognition, an adaptive classifier recognizes both word blobs and character cells. The classifier is adaptive in that it can be trained on various text corpora. GOCR preprocesses images via box detection, zoning, and line detection. OCR is done on boxes, zones, and lines via pixel pattern analysis.

The available funding for this project did not allow us to integrate a commercial OCR engine into our application. For the sake of completeness, we should mention two OCR engines that have become popular in the OCR community. ABBYY [24] offers a powerful and compact mobile OCR engine that can be integrated into various mobile platforms such as iOS, Android, and Windows. Another company, WINTONE [25], claims that its mobile OCR software achieves recognition accuracy of around 95 percent on English documents.

The experimental comparison of the two open source engines was guided by speed and accuracy. An Android 2.3.6 application was developed and tested on two hundred images of NL text chunks, some of which are shown in Figure 5 (right). After the application starts, it first creates the necessary temporary directories and output files and ensures the presence of the training files required by Tesseract on the sdcard. Both the modified Otsu's and the modified Niblack's binarization methods [26] are applied to each image read from the input image directory on the sdcard. The binarized images are stored in the binary images directory. All three images (the input image and the two binarized images) are OCRed either on the device itself or on the server. The recognized text is then subjected to spelling correction, and the processed text is saved to a text file. Each image was processed with both Tesseract and GOCR, and the processing times were logged for each text chunk. The images were read from the smartphone's sdcard one by one; the image read time was not included in the run time totals.

The Android application was designed and developed to operate in two modes: device and server. In the device mode, everything was computed on the smartphone. In the server mode, the HTTP protocol was used to send the images over Wi-Fi from the device to an Apache web server running on Ubuntu 12.04. Images sent from the Android application were handled by a PHP script that ran one of the OCR engines on the server, captured the extracted text, and sent it back to the application.
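As a minimal sketch of the server-mode round trip, the client side reduces to a single HTTP POST followed by reading the response body. The endpoint URL and the raw-byte request body below are illustrative assumptions; the paper states only that a PHP script on an Apache server ran the engines and returned the text.

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

final class ServerOcrClient {
    static String ocrOnServer(String imagePath) throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                new URL("http://example.org/ocr.php").openConnection(); // hypothetical endpoint
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/octet-stream");
        InputStream img = new FileInputStream(imagePath);   // stream the image up
        OutputStream out = conn.getOutputStream();
        byte[] buf = new byte[8192];
        for (int n; (n = img.read(buf)) != -1; ) out.write(buf, 0, n);
        img.close();
        out.close();
        BufferedReader in = new BufferedReader(             // recognized text comes
                new InputStreamReader(conn.getInputStream(), "UTF-8")); // back in the body
        StringBuilder text = new StringBuilder();
        for (String line; (line = in.readLine()) != null; )
            text.append(line).append('\n');
        in.close();
        conn.disconnect();
        return text.toString();
    }
}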
Returned text messages were saved on the smartphone's sdcard and later categorized by a human judge into three categories: complete, partial, and garbled. The three categories were meant to estimate the degree of recognition. Complete recognition denoted images where the text recognized with OCR was identical to the text in the image. For partial recognition, at least one character in the returned text had to be missing or inaccurately substituted. For garbled recognition, either empty text was returned or all characters in the returned text were misrecognized. Table 1 gives the results of the OCR engine text recognition comparison. The abbreviations TD, GD, TS, and GS stand for 'Tesseract Device,' 'GOCR Device,' 'Tesseract Server,' and 'GOCR Server,' respectively. Each cell contains the exact number of images out of 200 in a specific category.

Table 1. Tesseract vs. GOCR (images out of 200).

      Complete   Partial   Garbled
TD    146        36        18
GD    42         23        135
TS    158        23        19
GS    58         56        90

The counts in Table 1 indicate that the recognition rates of Tesseract are higher than those of GOCR. Tesseract also compared favorably with GOCR in the garbled category, where its numbers were lower than those of GOCR. To evaluate the run-time performance of both engines, we ran the application in both modes on the same sample of 200 images five times. Table 2 tabulates the rounded processing times and averages in seconds. The numbers 1 through 5 are the run numbers. The AVG column contains the average time of the five runs. The AVG/Img column contains the average time per individual image.

Table 2. Run Times (in secs) of Tesseract & GOCR.

      1     2     3     4     5     AVG   AVG/Img
TD    128   101   101   110   103   110   0.5
GD    50    47    49    52    48    49    0.3
TS    40    38    38    10    39    38    0.2
GS    21    21    20    21    21    21    0.1

Table 2 shows no significant variation among the processing times of the individual runs. The AVG column indicates that Tesseract was slower than GOCR on the sample of images. The difference in run times can be attributed to the amount of text recognized by each engine. Since GOCR extracts less information than Tesseract, as indicated in Table 1, its run times are faster. Tesseract, on the other hand, extracts much more accurate information from most images, which causes it to take more time. While the combined run times of Tesseract were slower than those of GOCR, the average run time per image, shown in the AVG/Img column, was still under one second. Tesseract's higher recognition rates swayed our final decision in its favor. Additionally, as Table 2 shows, when the OCR was done on the server, Tesseract's run times, while still slower than GOCR's, improved, which made Tesseract more acceptable to us. This performance can be further optimized through server-side servlets and faster image transmission channels (e.g., 4G instead of Wi-Fi).

After we performed these experiments, we discovered that we could obtain even better accuracy from the Tesseract OCR engine through its character whitelist and blacklist options. Whitelisting limits the set of characters for which Tesseract looks; blacklisting instructs Tesseract not to look for certain characters in the image. A list of frequent NL characters was manually prepared and used for whitelisting after the experiments, as sketched below. Further evaluation is necessary to verify the conjecture that whitelisting improves accuracy.
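The paper does not name the Android wrapper used around Tesseract; the sketch below assumes the open source tess-two wrapper, whose VAR_CHAR_WHITELIST constant maps to Tesseract's tessedit_char_whitelist configuration variable. The whitelist string itself is illustrative, not the hand-prepared NL list.

import android.graphics.Bitmap;
import com.googlecode.tesseract.android.TessBaseAPI;

final class WhitelistedOcr {
    static String recognize(Bitmap textChunk) {
        TessBaseAPI api = new TessBaseAPI();
        api.init("/sdcard/tesseract/", "eng");  // directory must contain tessdata/
        api.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST,   // look only for these characters
                "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789%().,:");
        api.setImage(textChunk);                // e.g., a binarized NL text chunk
        String text = api.getUTF8Text();
        api.end();                              // release native resources
        return text;
    }
}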
B. STM vs. Apache Lucene's N-Grams and LED

After Tesseract was chosen as the base OCR engine, the performance of the STM algorithm was compared with the Apache Lucene implementations of n-gram matching and the LED. Each of the three algorithms worked on the text strings produced by the Tesseract OCR engine running on Android 2.3.6 and Android 4.2 smartphones on a collection of 600 text segment images produced by the algorithm described in Section IV. Figure 12 shows the exact control flow of the application used in the tests.

Figure 12. Application Flowchart.

Android applications cannot execute long-running operations on the UI thread: the application force-closes if an operation takes more than 5 seconds while running on the UI thread. Since our application includes long-running operations such as OCR processing and network communication, we used the Pipeline Thread model. The Pipeline Thread holds a queue of units of work to be executed. Other threads can push new tasks into the Pipeline Thread's queue as new tasks become available. The Pipeline Thread processes the tasks in the queue one after another; if there are no tasks in the queue, it blocks until a new task is added. On the Android platform, this design pattern can be easily implemented with Loopers and Handlers: a Looper class makes a thread into a pipeline thread, and a Handler makes it easier to push tasks into a Looper from other threads.
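A minimal sketch of this pattern uses Android's HandlerThread, which bundles a worker thread with its own Looper; the task bodies (OCR, networking) are elided as Runnables.

import android.os.Handler;
import android.os.HandlerThread;

final class OcrPipeline {
    private final HandlerThread worker = new HandlerThread("ocr-pipeline");
    private final Handler queue;

    OcrPipeline() {
        worker.start();                        // starts the thread and its Looper
        queue = new Handler(worker.getLooper());
    }

    void submit(Runnable task) {
        queue.post(task);                      // any thread may enqueue work here
    }

    void shutdown() {
        worker.quit();                         // stop processing queued tasks
    }
}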
The Lucene n-gram and edit distance matching algorithms find possible correct spellings of misspelled words. The n-gram distance algorithm divides a misspelled word into chunks of size n (n defaults to 2) and compares them to a dictionary of n-grams, each of which points to the words in which it is found. A word with the largest number of matched n-grams is a possible spelling of the misspelled word. If m is the number of n-grams found in a misspelled word and K is the number of n-grams in the dictionary, the time complexity of the n-gram matching is O(Km). Since the standard LED uses dynamic programming, its time complexity is O(n^2), where n is the maximum of the lengths of the two strings being matched. If there are m entries in a dictionary, the run time of the LED algorithm is O(mn^2).

In comparing the performance of the STM algorithm with the n-gram matching and the LED, time and accuracy were used as the performance metrics. The average run times taken by the STM algorithm, the n-gram matching, and the LED were 20 ms, 50 ms, and 51 ms, respectively. The performance of each algorithm was evaluated on non-word error correction with the recall measure computed as the ratio of corrected to misspelled words. The recall coefficients of the STM, the n-gram matching, and the LED were 0.15, 0.085, and 0.078, respectively. Table 3 gives the run times and the corresponding recalls.

Table 3. Performance Comparison.

          STM    N-Gram   LED
Run Time  25ms   51ms     51ms
Recall    15%    9%       8%
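For reference, the two Lucene baselines can be exercised with the spellchecker module roughly as follows. This sketch is written against the Lucene 3.x-era API that was current at the time of these experiments; method signatures, notably that of indexDictionary, differ in later releases, and the dictionary file name is a stand-in.

import java.io.File;
import java.io.IOException;
import org.apache.lucene.search.spell.LevensteinDistance;
import org.apache.lucene.search.spell.NGramDistance;
import org.apache.lucene.search.spell.PlainTextDictionary;
import org.apache.lucene.search.spell.SpellChecker;
import org.apache.lucene.store.FSDirectory;

final class LuceneBaselines {
    static void demo() throws IOException {
        // Build the spellchecking index from a plain-text word list.
        SpellChecker checker =
                new SpellChecker(FSDirectory.open(new File("spellindex")));
        checker.indexDictionary(new PlainTextDictionary(new File("words.txt")));

        checker.setStringDistance(new NGramDistance(2));      // bigram baseline
        for (String s : checker.suggestSimilar("potassillm", 5))
            System.out.println("n-gram: " + s);

        checker.setStringDistance(new LevensteinDistance());  // LED baseline
        for (String s : checker.suggestSimilar("potassillm", 5))
            System.out.println("LED: " + s);
    }
}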
VI. Discussion

In this paper, we presented an algorithm, called Skip Trie Matching (STM), for real-time OCR output error correction on smartphones and compared its performance with the n-gram matching and the LED. The experiments on the sample of over 600 texts extracted by the Tesseract OCR engine from the NL text images show that the STM algorithm ran faster, as predicted by the asymptotic analysis, and corrected more words than the n-gram matching and the LED.

One current limitation of the STM algorithm is that it can correct a misspelled word only if there is a target word of the same length in the trie dictionary. One possible way to address this limitation is to relax the exact length comparison on Line 8 of the STM algorithm given in Figure 11. This will increase the recall but will also increase the run time. Another approach, which may be more promising in the long run, is example-driven human-assisted learning [2]. Each OCR engine working in a given domain is likely to misrecognize the same characters consistently. For example, we have noticed that Tesseract consistently misrecognizes 'u' as 'll' or 'b' as '8.' Such examples, if consistently classified by human judges as misrecognition examples, can be added to a dictionary of regular misspellings. In a subsequent run, when the Tesseract OCR engine misrecognizes 'potassium' as 'potassillm,' the STM algorithm, on failing to find a possible suggestion for the original input, will replace the regularly misrecognized character sequences in the input with their corrections. Thus, 'll' in 'potassillm' will be replaced with 'u' to obtain 'potassium' and get a successful match; a sketch of this fallback follows below.

Another limitation of the STM algorithm is that it does not find all possible corrected spellings of a misspelled word. To make it find all possible corrections, the algorithm can be forced to examine all of the current node's children even when it finds a successful match on Line 12. Although this will likely take a toll on the algorithm's run time, the worst-case asymptotic run time, O(nΣ^d log Σ) = O(n), remains the same.
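The regular-misspellings fallback described above might be sketched as follows; the confusion table and class name are ours, and a deployed system would populate the table from human-judged examples rather than hard-code it.

import java.util.LinkedHashMap;
import java.util.Map;

final class RegularMisspellings {
    private static final Map<String, String> CONFUSIONS =
            new LinkedHashMap<String, String>();
    static {
        CONFUSIONS.put("ll", "u");   // Tesseract reads 'u' as 'll'
        CONFUSIONS.put("8", "b");    // Tesseract reads 'b' as '8'
    }

    // Rewrite systematic OCR confusions; applied only after a failed STM lookup.
    static String normalize(String ocrWord) {
        String fixed = ocrWord;
        for (Map.Entry<String, String> e : CONFUSIONS.entrySet())
            fixed = fixed.replace(e.getKey(), e.getValue());
        return fixed;                // "potassillm" becomes "potassium"
    }
}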
An experimental contribution of the research presented in this article is the comparison of Tesseract and GOCR, two popular open source OCR engines, for vision-based nutrition information extraction. GOCR appears to extract less information than Tesseract but has faster run times. While the run times of Tesseract were slower, its higher recognition rates swayed our final decision in its favor. The run-time performance, as indicated in Tables 1 and 2, can be further optimized through server-side servlets and faster image transmission channels (e.g., 4G instead of Wi-Fi).

Our ultimate objective is to build a mobile system that can not only extract nutrition information from product packages but also match the extracted information to the users' dietary profiles and make dietary recommendations to effect behavior changes. For example, if a user is pre-diabetic, the system will estimate the amount of sugar in the extracted ingredients and make specific recommendations to the user. The system, if the users so choose, will keep track of their long-term buying patterns and make recommendations on a daily, weekly, or monthly basis. For example, if a user exceeds the total amount of saturated fat permissible for the specified profile, the system will notify the user and, if the user's profile has appropriate permissions, the user's dietician.

Acknowledgment

This project has been supported, in part, by the MDSC Corporation. We would like to thank Dr. Stephen Clyde, MDSC President, for supporting our research and championing our cause.

References

[1] Apache Lucene. http://lucene.apache.org/core/. Retrieved 04/15/2013.
[2] Jurafsky, D. and Martin, J. (2000). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice Hall, Upper Saddle River, New Jersey. ISBN: 0130950696.
[3] Levenshtein, V. (1966). Binary codes capable of correcting deletions, insertions, and reversals. Doklady Akademii Nauk SSSR, 163(4):845-848, 1965 (Russian); English translation in Soviet Physics Doklady, 10(8):707-710, 1966.
[4] Tesseract Optical Character Recognition Engine. http://code.google.com/p/tesseract-ocr/. Retrieved 04/15/2013.
[5] Anding, R. (2009). Nutrition Made Clear. The Great Courses, Chantilly, VA.
[6] Graham, D., Orquin, J., and Visschers, V. (2012). Eye Tracking and Nutrition Label Use: A Review of the Literature and Recommendations for Label Enhancement. Food Policy, Vol. 37, pp. 378-382.
[7] Kane, S., Bigham, J., and Wobbrock, J. (2008). Slide Rule: Making Mobile Touch Screens Accessible to Blind People using Multi-Touch Interaction Techniques. In Proceedings of the 10th Conference on Computers and Accessibility (ASSETS 2008), Halifax, Nova Scotia, Canada, October 2008, pp. 73-80.
[8] Kulyukin, V., Crandall, W., and Coster, D. (2011). Efficiency or Quality of Experience: A Laboratory Study of Three Eyes-Free Touchscreen Menu Browsing User Interfaces for Mobile Phones. The Open Rehabilitation Journal, Vol. 4, pp. 13-22. DOI: 10.2174/1874943701104010013.
[9] K-NFB Reader for Nokia Mobile Phones. www.knfbreader.com. Retrieved 03/10/2013.
[10] Al-Yousefi, H. and Udpa, S. (1992). Recognition of Arabic Characters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(8) (August 1992), pp. 853-857.
[11] Kulyukin, V., Zaman, T., Andhavarapu, A., and Kutiyanawala, A. (2012). Eyesight Sharing in Blind Grocery Shopping: Remote P2P Caregiving through Cloud Computing. In Proceedings of the 13th International Conference on Computers Helping People with Special Needs (ICCHP 2012), K. Miesenberger et al. (Eds.), Part II, Springer Lecture Notes in Computer Science (LNCS) 7383, pp. 75-82, July 11-13, 2012, Linz, Austria. DOI: 10.1007/978-3-642-31522-0; ISBN: 978-3-642-31521-3; ISSN: 0302-9743.
[12] Kulyukin, V. and Kutiyanawala, A. (2010). Accessible Shopping Systems for Blind and Visually Impaired Individuals: Design Requirements and the State of the Art. The Open Rehabilitation Journal, Vol. 2, pp. 158-168. ISSN: 1874-9437. DOI: 10.2174/1874943701003010158.
[13] Årsand, E., Tatara, N., Østengen, G., and Hartvigsen, G. (2010). Mobile Phone-Based Self-Management Tools for Type 2 Diabetes: The Few Touch Application. Journal of Diabetes Science and Technology, 4(2) (March 2010), pp. 328-336.
[14] Frøisland, D. H., Årsand, E., and Skårderud, F. (2012). Improving Diabetes Care for Young People with Type 1 Diabetes through Visual Learning on Mobile Phones: Mixed-Methods Study. J. Med. Internet Res., 14(4) (Aug 2012), published online. DOI: 10.2196/jmir.2155.
[15] Bae, K. S., Kim, K. K., Chung, Y. G., and Yu, W. P. (2005). Character Recognition System for Cellular Phone with Camera. In Proceedings of the 29th Annual International Computer Software and Applications Conference (COMPSAC '05), Vol. 1, Washington, DC, USA, IEEE Computer Society, pp. 539-544.
[16] Hsueh, M. (2011). Interactive Text Recognition and Translation on a Mobile Device. Master's Thesis, EECS Department, University of California, Berkeley, May 2011.
[17] Kukich, K. (1992). Techniques for Automatically Correcting Words in Text. ACM Computing Surveys, 24(4), pp. 377-439, Dec. 1992.
[18] Niklas, K. (2010). Unsupervised Post-Correction of OCR Errors. Master's Thesis, Leibniz Universität Hannover, Germany.
[19] Damerau, F. J. (1964). A Technique for Computer Detection and Correction of Spelling Errors. Communications of the ACM, 7(3):171-176.
[20] Kulyukin, V., Kutiyanawala, A., and Zaman, T. (2012). Eyes-Free Barcode Detection on Smartphones with Niblack's Binarization and Support Vector Machines. In Proceedings of the 16th International Conference on Image Processing, Computer Vision, and Pattern Recognition (IPCV 2012), Vol. 1, Las Vegas, Nevada, USA, CSREA Press, July 16-19, 2012, pp. 284-290. ISBN: 1-60132-223-2, 1-60132-224-0.
[21] Duda, R. O. and Hart, P. E. (1972). Use of the Hough Transformation to Detect Lines and Curves in Pictures. Communications of the ACM, Vol. 15 (January 1972), pp. 11-15.
[22] GOCR - A Free Optical Character Recognition Program. http://jocr.sourceforge.net/. Retrieved 04/15/2013.
[23] Fogg, B. J. (2009). A Behavior Model for Persuasive Design. In Proceedings of the 4th International Conference on Persuasive Technology (Persuasive '09), Article No. 40, ACM, New York, NY, USA. ISBN: 978-1-60558-376-1.
[24] ABBYY Mobile OCR Engine. http://www.abbyy.com/mobileocr/. Retrieved 04/15/2013.
[25] WINTONE Mobile OCR Engine. http://www.wintone.com.cn/en/prod/44/detail270.aspx. Retrieved 04/15/2013.
[26] Kulyukin, V., Kutiyanawala, A., and Zaman, T. (2012). Eyes-Free Barcode Detection on Smartphones with Niblack's Binarization and Support Vector Machines. In Proceedings of the 16th International Conference on Image Processing, Computer Vision, and Pattern Recognition (IPCV 2012), Vol. I, pp. 284-290, CSREA Press, July 16-19, 2012, Las Vegas, Nevada, USA. ISBN: 1-60132-223-2, 1-60132-224-0.