Vision-Based Localization of Skewed UPC Barcodes on Smartphones
Vladimir Kulyukin and Tanwir Zaman
Department of Computer Science, Utah State University
Logan, Utah, United States of America

Abstract—Two algorithms are presented for vision-based localization of skewed UPC barcodes. The first algorithm localizes skewed barcodes in captured frames by computing dominant orientations of gradients (DOGs) of image segments and collecting smaller segments with similar dominant gradient orientations into larger connected components. The second algorithm localizes skewed barcodes by growing edge alignment trees (EATs) on binary images with detected edges. The DOG algorithm is implemented in Python 2.7.2 using the Python Image Library (PIL). The EAT algorithm is implemented on Android 2.3.6 with the Java OpenCV 2.4 library. The performance of both algorithms was evaluated on a sample of 1,066 images of skewed UPC barcodes on bags, boxes, bottles, cans, books, and images with no barcodes. All images were taken on an Android 2.3.6 Google Nexus One smartphone.

Keywords—computer vision; barcode localization; mobile computing; image gradients; skewed barcodes

I. Introduction

The U.S. Department of Agriculture estimates that U.S. residents have increased their caloric intake by 523 calories per day since 1970. Mismanaged diets are estimated to account for 30-35 percent of cancer cases. A leading cause of mortality in men is prostate cancer; a leading cause of mortality in women is breast cancer. Approximately 47,000,000 U.S. residents have metabolic syndrome and diabetes. Diabetes in children appears to be closely related to increasing obesity levels.
Many nutritionists and dieticians consider proactive nutrition management to be a key factor in reducing and controlling cancer, diabetes, and other illnesses related to or caused by mismanaged diets.

Surveys conducted by the American Dietetic Association (http://www.eatright.org/) demonstrate that the role of television and printed media as sources of nutrition information has been steadily falling. In 2002, the credibility of television as a source of nutrition information was estimated at 14%; the credibility of magazines was estimated at 25%. The popularity of the Internet increased from 13 to 25% with a perceived credibility of 22% in the same time period. Since smartphones and other mobile devices have, for all practical purposes, become the most popular gateway to the Internet on the go, they can be used as nutrition management tools. As more and more users manage their daily activities with smartphones, smartphones are increasingly being used for proactive diet management. Numerous web sites have been developed to track calorie intake (e.g., http://nutritiondata.self.com), to determine caloric contents and quantities in consumed food (e.g., http://www.calorieking.com), and to track food intake and exercise (e.g., http://www.fitday.com).

There are free public online barcode databases (e.g., http://www.upcdatabase.com/) that provide some product descriptions and issuing countries' names. Unfortunately, since product information is provided by volunteers who are assumed to periodically upload product details and to associate them with product IDs, almost no nutritional information is available. Some applications (e.g., http://redlaser.com) provide some nutritional information for a few popular products.

Visually impaired (VI), low vision, and blind shoppers currently lack eyes-free access to nutritional information, some of which can be obtained by successfully locating and decoding package barcodes.
While vision-based localization and decoding of barcodes is a well-known research problem, VI and blind consumers have not greatly benefitted from recent advances. A common disadvantage of open source and commercial barcode readers is that they require that the smartphone camera be carefully aligned with a target barcode, which is acceptable for sighted users but presents a notable accessibility barrier to VI, low vision, and blind shoppers.

In our previous research, we presented an eyes-free algorithm for vision-based localization and decoding of aligned barcodes by assuming that simple and efficient vision techniques can be augmented with interactive user interfaces that ensure that the smartphone camera is horizontally or vertically aligned with the surface on which a barcode is sought. In this paper, two algorithms are presented that relax the horizontal and vertical alignment constraint by localizing skewed barcodes in frames captured by the smartphone's camera.

Our paper is organized as follows. Section 2 covers related work. Section 3 presents barcode algorithm 1, which uses dominant gradient orientations in image segments. Section 4
presents barcode algorithm 2, which localizes skewed barcodes by growing edge alignment trees on binary images with detected edges. Section 5 presents our experiments. Section 6 discusses our results and outlines several directions for future work.

II. Related Work

Vision-based barcode localization on mobile phones is a well-known problem. Much research has been dedicated to developing and improving hardware, such as laser readers, for reading barcodes. In  and , a computer vision algorithm is presented that guides a mobile phone user via audio instructions so that the user can center a target barcode in the camera frame. In , another vision-based algorithm is presented to detect barcodes on mobile phones. The algorithm is based on image analysis and pattern recognition methods. A key assumption is that a barcode is present. In , an algorithm for scanning 1D barcodes is presented that is suitable for blurry and noisy low-resolution images. The algorithm can detect barcodes only if they are slanted by less than 45 degrees. In , an automatic barcode detection and recognition algorithm is developed for multiple and rotation-invariant barcode decoding. The proposed system is implemented and optimized on a DM6437 DSP EVM board, a special kind of embedded system. In [8-14], systems have been developed for scanning barcodes with mobile phones. However, these solutions have been developed for sighted users and may not be suitable for visually impaired (VI) individuals. We endorse this body of research and contribute to it two algorithms that localize skewed barcodes. Skewed barcode localization is necessary for eyes-free mobile barcode access, because VI, low vision, and blind individuals may not be capable of aligning smartphone cameras with target barcodes even in the presence of content-rich haptic and audio feedback.

III. Barcode Localization Algorithm I

A. Dominant Orientation of Gradients

Let I be an RGB image and let f be a linear relative luminance function computed from a pixel's R, G, and B components:

f(R, G, B) = 0.2126·R + 0.7152·G + 0.0722·B.  (1)

The gradient of f and the gradient's orientation θ can then be computed as follows:

∇f = (∂f/∂x, ∂f/∂y);  θ = tan⁻¹((∂f/∂y)/(∂f/∂x)).  (2)

Let M be an n x n mask, n > 0, convolved with I. Let the dominant orientation of gradients of M, DOG(M), be the most frequent discrete gradient orientation of all pixels covered by M. Let (c, r) be the column and row coordinates of the top left pixel of M. The regional gradient orientation table of M, RGOT(c, r), is a map of discrete gradient orientations to their frequencies in the region of I covered by M. The global gradient orientation table (GGOT) of I is a map of the top left coordinates of image regions covered by M to their RGOTs.

Figure 1. Global Gradient Orientation Table

Figure 1 shows the logical organization of an image's GGOT. In our implementation, both GGOTs and RGOTs are implemented as hash tables (i.e., Python dictionaries). A GGOT maps (c, r) integer tuples to RGOT hashes that, in turn, map discrete gradient orientations (GO1, GO2, ..., GOn in Figure 1) to their frequencies (FREQ1, FREQ2, ..., FREQn in Figure 1) in the corresponding image regions, i.e., the regions whose top left coordinates are specified by the corresponding (c, r) integer tuple and whose size is the size of M. Each RGOT is reduced to its most frequent gradient orientation above a specific threshold. This number is the region's DOG(M).

Figure 2. Skewed UPC-A Barcode

Consider the example of a skewed barcode in Figure 2. Figure 3 gives the DOGs for a 20 x 20 mask convolved with the image in Figure 2. Each green square is a 20 x 20 image region. The top number in the square is the region's DOG, in degrees; the bottom number is the frequency of that DOG in the region. For example, if the top number is 49 and the bottom number is 11, then 11 pixels in that region have a gradient orientation of 49°.
If no gradient orientation clears the frequency count threshold, both numbers are set to 0. Figure 4 displays the DOGs for a 50 x 50 mask. As the size of the mask increases, fewer image regions are expected to clear the DOG threshold if the latter is set as a ratio of the number of pixels with a specific gradient orientation to the total number of pixels in the region.
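Equations (1) and (2) and the per-region DOG extraction can be condensed into a short sketch. The paper's implementation uses Python 2.7.2 with PIL; the sketch below is a rewrite in modern Python with NumPy, and the function name, the one-degree bin width, and the array-based layout are illustrative assumptions rather than the authors' code.

```python
# Minimal sketch of the DOG computation: luminance per Eq. (1),
# gradient orientation per Eq. (2), and a per-region orientation
# histogram (RGOT) reduced to its dominant orientation (DOG).
import numpy as np

def dominant_orientations(rgb, n=20, threshold=0.02, bin_width=1):
    """Return {(c, r): (dog_degrees, frequency)} for each n x n region
    whose most frequent quantized gradient orientation clears the
    threshold, given as a fraction of the region's n*n pixels."""
    rgb = rgb.astype(np.float64)
    # Eq. (1): linear relative luminance.
    f = 0.2126 * rgb[..., 0] + 0.7152 * rgb[..., 1] + 0.0722 * rgb[..., 2]
    # Eq. (2): gradient and its orientation in degrees.
    dfdy, dfdx = np.gradient(f)
    theta = np.degrees(np.arctan2(dfdy, dfdx))
    ggot = {}
    h, w = f.shape
    for r in range(0, h - n + 1, n):
        for c in range(0, w - n + 1, n):
            region = theta[r:r + n, c:c + n]
            # RGOT: discrete orientation -> frequency.
            bins = np.round(region / bin_width).astype(int)
            values, counts = np.unique(bins, return_counts=True)
            k = counts.argmax()
            if counts[k] >= threshold * n * n:
                ggot[(c, r)] = (int(values[k]) * bin_width, int(counts[k]))
    return ggot
```

For a frame dominated by vertical bars, the sketch reports region DOGs of 0° or ±180°; skewing the barcode rotates the dominant orientation accordingly, which is what the D-neighborhood grouping of Section III.B exploits.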
Figure 3. GGOT for a 20 x 20 mask M

Figure 4. GGOT for a 50 x 50 mask M

B. DOG Neighborhoods

Suppose that an n x n mask M is convolved with an image. Let an RGOT 3-tuple (c_k, r_k, DOG_k) consist of the top left corner (c_k, r_k) of the image region covered by M and that region's DOG. A DOG neighborhood is a non-empty set of RGOT 3-tuples such that for any 3-tuple (c_k, r_k, DOG_k) there exists at least one other 3-tuple (c_j, r_j, DOG_j) in the set such that (c_j, r_j, DOG_j) ≠ (c_k, r_k, DOG_k) and sim((c_j, r_j, DOG_j), (c_k, r_k, DOG_k)) = True, where sim is a boolean similarity metric. In our implementation, the similarity metric is true when the square regions specified by the top left coordinates (c_k, r_k) and (c_j, r_j) and the mask size n are horizontal, vertical, or diagonal neighbors and the absolute difference of their DOGs does not exceed a small threshold.

The DOG neighborhoods (D-neighborhoods, for short) of an image are computed simultaneously with the computation of the image's GGOT. As each RGOT 3-tuple becomes available during the normal computation of RGOTs, it is placed into another hash table that keeps track of the neighborhoods. The computed D-neighborhoods are filtered by the ratio of the total area of their component RGOTs to the image area. Figure 5 shows a D-neighborhood, whose RGOTs are marked as blue rectangles, computed in parallel with the computation of the GGOT in Figure 3.

Figure 5. D-neighborhood found in GGOT in Figure 3

Figure 6. Boxed D-neighborhood in Figure 5

Detected D-neighborhoods are boxed by the smallest rectangles that contain all of their RGOT 3-tuples, as shown in Figure 6, where the number in the center of the box denotes the neighborhood's DOG. Boxed D-neighborhoods are barcode region candidates. There can be multiple boxed D-neighborhoods detected in an image, especially if the D-neighborhood filter threshold is set too low. Figure 7 shows all boxed neighborhoods when the threshold is set to .01.

Figure 7. Multiple D-neighborhoods

Note that multiple D-neighborhoods in Figure 7 intersect over a skewed barcode. A D-neighborhood is a complete true positive if at least one straight line can be drawn across all bars of a localized barcode. A partial true positive occurs if a straight line can be drawn across some, but not all, bars of a barcode. A false positive is a D-neighborhood with no barcode present. In Figure 7, the D-neighborhood whose DOG is 100, in the upper left corner of the image, is a false positive. False negatives occur when a boxed D-neighborhood does not cover a barcode either completely or partially.

Figure 8. EAT Algorithm Flowchart

IV. Barcode Localization Algorithm II

A. Edge Alignment Trees

Figure 8 shows an overview of the barcode detection algorithm that uses edge alignment trees (EATs). The algorithm is based on the observation that barcodes characteristically exhibit closely spaced aligned edges with the same angle, which sets them apart from text and graphics. As the flowchart shows, a captured image is put through a Canny edge detection filter to produce a binarized image. We chose the Canny detector because we experimentally found it to produce better barcode edges than the other two segmentation algorithms available in the standard OpenCV (opencv.org) distributions, i.e., watershed and flood fill.

The binarized image is next divided into smaller regions of interest (ROIs) and scanned row by row and column by column. For each ROI, edge alignment tree formation is started to detect the dominant skew angle of the edges. Figure 9 shows how EATs dynamically grow. The algorithm starts from the first row of each ROI and moves right column by column until the end of the row is reached. If a pixel's value is 255 (white), it is treated as a 1, marked as a root of an EAT, and stored in the list of nodes.

Figure 9. Growing EATs from Binarized Edges

If the current row is the ROI's first row, the node list contains the root nodes. Once all nodes in the current row are computed, the algorithm moves to the next row. In the next row (and all subsequent rows), whenever a white pixel (255), or 1, is found, it is checked against the current node list to see if any of the nodes can be the parent of this pixel. This is done by checking the angle between the root pixel and the current pixel using the formula shown in equation (3):

θ = tan⁻¹(Δr/Δc),  (3)

where Δr and Δc are the row and column offsets from the root pixel to the current pixel. Specifically, if the angle is between 45° and 135°, the current pixel is added to the list of children of the found parent.
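The tree-growing procedure just described can be sketched in Python (the authors' implementation is in Java with OpenCV 2.4). The sketch below simplifies the parent test: instead of checking the full node list, it only considers nodes from the immediately preceding row, so the 45°-135° test of equation (3) reduces to atan2(1, Δc). The function names and the ROI representation are illustrative assumptions, not the paper's code.

```python
# Minimal sketch of growing edge alignment trees (EATs) over one
# binarized ROI, given as an 8-bit NumPy array with edge pixels = 255.
import math
import numpy as np

def grow_eats(roi):
    """Return the list of per-tree dominant angles (mean parent-child
    angle) for one ROI."""
    trees = []                      # per-tree list of parent-child angles
    prev_nodes, cur_nodes = [], []  # (col, tree_id) pairs per row
    for row in roi:
        for c in np.flatnonzero(row == 255):
            tree_id = None
            for pc, tid in prev_nodes:
                # Eq. (3) with a row offset of 1 to the candidate parent.
                theta = math.degrees(math.atan2(1, int(c) - pc))
                if 45 <= theta <= 135:
                    trees[tid].append(theta)
                    tree_id = tid
                    break
            if tree_id is None:     # orphan pixel becomes a new root
                tree_id = len(trees)
                trees.append([])
            cur_nodes.append((int(c), tree_id))
        prev_nodes, cur_nodes = cur_nodes, []
    return [sum(a) / len(a) for a in trees if a]

def is_barcode_roi(roi, max_std=5.0):
    """An ROI is a barcode candidate when several trees grow and their
    dominant angles have a low standard deviation (< 5 degrees here)."""
    angles = grow_eats(roi)
    return len(angles) > 1 and float(np.std(angles)) < max_std
```

With this sketch, an ROI containing several parallel vertical edges yields trees whose dominant angles are all near 90° (standard deviation near 0), while a single horizontal edge produces only angle-less roots and is rejected.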
This parent-child matching is repeated for all nodes in the current node list. If none of the nodes satisfies the parent condition, the orphan pixel becomes a root itself and is added to the node list.

These steps are executed for all rows in the ROI. Once all the EATs are grown, as shown in Figure 9, the dominant angle is computed for each EAT as the average of the angles between each parent and its children, all the way down to the leaves. For each ROI, the standard deviation of the dominant angles is computed over all EATs. If the ROI's standard deviation is low (less than 5 degrees in our current implementation), the ROI is a potential barcode region.

V. Experiments

The barcode localization performance of both algorithms was evaluated on a sample of 1,066 images of skewed UPC barcodes on bags, boxes, bottles, cans, books, and images with no barcodes. All images were taken on an Android 2.3.6 Google Nexus One phone. The outputs of both algorithms, boxed image regions, were evaluated by two human judges.

A. DOG Localization Experiments

The DOG algorithm is implemented in Python 2.7.2 using the Python Image Library (PIL). Mask sizes were tested from 5 x 5 up to 50 x 50 in increments of 5. For each mask size, five thresholds (.01, .02, .03, .04, and .05) were evaluated. The charts in Figures 10 - 14 summarize the performance analysis of the DOG algorithm for each product class. The algorithm is conservative in that its number of false positives is very low. The mask sizes and thresholds for which DOG gave the best results are given in Table I. The algorithm gave its highest accuracy value of .85 for bags.

Table I. DOG Products, Mask Sizes, and Thresholds

Product | Mask Size | Threshold
Bag     | 20 x 20   | 0.02
Book    | 40 x 40   | 0.01
Bottle  | 40 x 40   | 0.02
Box     | 20 x 20   | 0.02
Can     | 20 x 20   | 0.01
Figure 10. DOG Performance on Bags

Figure 11. DOG Performance on Cans

Figure 12. DOG Performance on Bottles

Figure 13. DOG Performance on Boxes

Tables II - VI give the statistics of the DOG barcode localization performance for each product category.

Figure 14. DOG Performance on Books

B. EAT Algorithm

The EAT algorithm was implemented in Java with OpenCV 2.4 bindings for Android 2.3.6 and ran on Google Nexus One and Samsung Galaxy S2 smartphones. Three different Canny edge detector threshold pairs were used: (200, 300), (300, 400), and (400, 500). Three different window sizes for the ROI calculations were evaluated: 10, 20, and 40.

Figure 12. Complete & Partial Detections for the EAT Algorithm

Images where detected barcode regions had single straight lines crossing all bars of a real barcode were counted as complete true positives. Images where such lines covered some of the bars were counted as partial true positives. Figure 12 shows complete and partial true positives. Images where detected barcode regions did not contain any barcodes were false positives. True negatives were images with no barcodes where the algorithm did not detect anything. False negatives were images where the algorithm failed to detect a barcode.

The charts in Figures 13 - 16 summarize the performance of the EAT algorithm for each product category. The EAT algorithm gave its best results for bags with a Canny threshold of (300, 400). For bottles, boxes, and cans it performed best with a Canny threshold of (400, 500). The algorithm gave the most accurate results with a window size of 10 for all categories of products. Out of all the product categories, the algorithm gave its highest accuracy value of .8828 for boxes.
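The image-level counts defined above combine into the statistics reported in Tables II - X roughly as follows. This is one plausible formulation under the standard definitions of precision, recall, specificity, and accuracy; the paper does not spell out exactly how partial detections enter each figure, so pooling complete and partial true positives here is an assumption, and the counts in the usage line are purely illustrative.

```python
# Sketch of how per-image judgments (complete/partial true positives,
# false positives, true negatives, false negatives) could be turned
# into the summary statistics reported in the tables below.
def localization_stats(complete_tp, partial_tp, fp, tn, fn):
    tp = complete_tp + partial_tp    # assumption: pool both TP kinds
    total = tp + fp + tn + fn
    return {
        'precision':       tp / (tp + fp),
        'total_recall':    tp / (tp + fn),
        'complete_recall': complete_tp / (tp + fn),
        'partial_recall':  partial_tp / (tp + fn),
        'specificity':     tn / (tn + fp),
        'accuracy':        (tp + tn) / total,
    }

# Illustrative counts only, not taken from the experiments.
stats = localization_stats(complete_tp=380, partial_tp=120,
                           fp=50, tn=30, fn=420)
```

Under this formulation, complete and partial recall sum to total recall; the tables below suggest the authors' bookkeeping for partial detections differs somewhat, which is why the sketch is hedged as an approximation.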
Figure 13. EAT Performance on Bags

Figure 14. EAT Performance on Cans

Figure 15. EAT Performance on Bottles

Figure 16. EAT Performance on Boxes

Tables VII - X show the precision, recall, specificity, and accuracy averages for the four different categories of products. The EAT algorithm performed well on images that were properly focused. It did not perform very well on out-of-focus or blurred images, as shown in Figure 17.

Figure 17. A blurred image with a false positive

Figure 18. Distribution of False Positives and True Negatives for the EAT and DOG Algorithms

Table II. DOG Bag Data

Precision | Total Recall | Complete Recall | Partial Recall | Specificity | Accuracy
0.90896   | 0.48425      | 0.37954         | 0.30873        | 0.72141     | 0.46121

Table III. DOG Book Data

Precision | Total Recall | Complete Recall | Partial Recall | Specificity | Accuracy
0.88232   | 0.28699      | 0.25988         | 0.08062        | 0.0         | 0.27594

Table IV. DOG Bottle Data

Precision | Total Recall | Complete Recall | Partial Recall | Specificity | Accuracy
0.89031   | 0.55961      | 0.50903         | 0.23971        | 0.0         | 0.47493

Table V. DOG Box Data

Precision | Total Recall | Complete Recall | Partial Recall | Specificity | Accuracy
0.87369   | 0.52701      | 0.49681         | 0.24237        | 0.0         | 0.43242
Table VI. DOG Can Data

Precision | Total Recall | Complete Recall | Partial Recall | Specificity | Accuracy
0.93706   | 0.48894      | 0.46391         | 0.15826        | 0.51542     | 0.45702

Table VII. EAT Bag Data

Precision | Total Recall | Complete Recall | Partial Recall | Specificity | Accuracy
0.70158   | 0.9308       | 0.88978         | 0.8502         | 0.0125      | 0.66984

Table VIII. EAT Bottle Data

Precision | Total Recall | Complete Recall | Partial Recall | Specificity | Accuracy
0.76234   | 0.95345      | 0.93688         | 0.77586        | 0.00138     | 0.72496

Table IX. EAT Box Data

Precision | Total Recall | Complete Recall | Partial Recall | Specificity | Accuracy
0.76593   | 0.93631      | 0.91816         | 0.7197         | 0.0001      | 0.72697

Table X. EAT Can Data

Precision | Total Recall | Complete Recall | Partial Recall | Specificity | Accuracy
0.6903    | 0.93802      | 0.9198          | 0.72251        | 0.00351     | 0.66141

VI. Discussion

As the experiments show, the DOG algorithm has a higher precision (approximately 90%) than the EAT algorithm (approximately 70%). As shown in Figure 18, the DOG algorithm is better than the EAT algorithm in terms of true negatives (97% vs. 56%) and false positives (2.5% vs. 44%). On the other hand, the EAT algorithm is better than the DOG algorithm in terms of accuracy (approximately 70% vs. approximately 40%). The DOG algorithm appears to be better than the EAT algorithm on blurry images, but this has not been experimentally verified.

The choice of DOG vs. EAT is a choice between more vs. less conservative barcode region localizations. The DOG algorithm is preferable to the EAT algorithm when the objective is to minimize the percentage of false positives and to increase the percentage of true negatives. If, however, the objective is to maximize recall, either complete or partial, the EAT algorithm should be chosen.

An advantage of the EAT algorithm is that it has actually been implemented and tested on the Android platform, whereas the DOG algorithm is currently implemented in Python 2.7.2 using the Python Image Library (PIL).
We plan to port the DOG algorithm to the Android platform in the near future and evaluate it on the same or a similar sample of images. It should also be pointed out that the presented algorithms can run in the cloud if the data transmission rates are both inexpensive and efficient.

Acknowledgment

This project has been supported, in part, by the MDSC Corporation. We would like to thank Dr. Stephen Clyde, MDSC President, for supporting our research and championing our cause.

References

[1] Anding, R. Nutrition Made Clear. The Great Courses, Chantilly, VA, 2009.
[2] Kulyukin, V., Kutiyanawala, A., and Zaman, T. Eyes-Free Barcode Detection on Smartphones with Niblack's Binarization and Support Vector Machines. In Proceedings of the 16th International Conference on Image Processing, Computer Vision, and Pattern Recognition (IPCV 2012), Vol. 1, Las Vegas, Nevada, USA, CSREA Press, July 16-19, 2012, pp. 284-290; ISBN: 1-60132-223-2, 1-60132-224-0.
[3] Tekin, E. and Coughlan, J. M. A Mobile Phone Application Enabling Visually Impaired Users to Find and Read Product Barcodes. In Proceedings of the 12th International Conference on Computers Helping People with Special Needs (ICCHP '10), K. Miesenberger, J. Klaus, W. Zagler, and A. Karshmer (Eds.), Springer-Verlag, Berlin, Heidelberg, pp. 290-295.
[4] Tekin, E. and Coughlan, J. M. An Algorithm Enabling Blind Users to Find and Read Barcodes. WACV '09, 2009.
[5] Wachenfeld, S., Terlunen, S., and Jiang, X. Robust Recognition of 1-D Barcodes Using Camera Phones. In Proceedings of the 19th International Conference on Pattern Recognition (ICPR 2008), pp. 1-4, Dec. 8-11, 2008.
[6] Adelmann, R., Langheinrich, M., and Floerkemeier, C. A Toolkit for Bar Code Recognition and Resolving on Camera Phones - Jump Starting the Internet of Things. In Proceedings of the Workshop on Mobile and Embedded Information Systems (MEIS '06) at Informatik 2006, Dresden, Germany, Oct. 2006.
[7] Gallo, O. and Manduchi, R. Reading 1D Barcodes with Mobile Phones Using Deformable Templates. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 9, pp. 1834-1843, Sept. 2011.
[8] Lin, D.-T., Lin, M.-C., and Huang, K.-Y. Real-Time Automatic Recognition of Omnidirectional Multiple Barcodes and DSP Implementation. Machine Vision and Applications, 22, 2 (March 2011).
[9] Poynton, C. Digital Video and HDTV: Algorithms and Interfaces. Morgan Kaufmann, 2003. ISBN 1-55860-792-7.
[10] Rohs, M. Real-World Interaction with Camera Phones. In Proceedings of the 2nd International Symposium on Ubiquitous Computing Systems, pp. 74-89, Springer, 2004.
[11] McCune, J., Perrig, A., and Reiter, M. Seeing-Is-Believing: Using Camera Phones for Human-Verifiable Authentication. In Proceedings of the IEEE Symposium on Security and Privacy, pp. 110-124, May 2005.
[12] Chai, D. and Hock, F. Locating and Decoding EAN-13 Barcodes from Images Captured by Digital Cameras. In Proceedings of the Fifth International Conference on Information, Communications and Signal Processing, pp. 1595-1599, Dec. 2005.
[13] ZXing, http://code.google.com/p/zxing/, retrieved May 19, 2011.
[14] Occipital, LLC. RedLaser, http://redlaser.com/.