An Algorithm for In-Place Vision-Based Skewed 1D Barcode Scanning in the Cloud



IPCV 2014

Vladimir Kulyukin
Department of Computer Science, Utah State University, Logan, UT, USA
vladimir.kulyukin@usu.edu

Tanwir Zaman
Department of Computer Science, Utah State University, Logan, UT, USA
tanwir.zaman@aggiemail.usu.edu

Abstract—An algorithm is presented for in-place vision-based skewed 1D barcode scanning that requires no smartphone camera alignment. The algorithm is in-place in that it performs no rotation of input images to align localized barcodes for scanning. It is cloud-based, because image processing is done in the cloud. The algorithm is implemented in a distributed, cloud-based system. The system's front end is a smartphone application that runs on Android 4.3 or higher; the back end is currently deployed on a four-node Linux cluster used for image recognition and data storage. The algorithm was evaluated on a set of 506 video recordings of common grocery products. The videos had a 1280 x 720 resolution and an average duration of 15 seconds, and were recorded on an Android Galaxy Nexus smartphone in a local supermarket. The results of the experiments are presented and discussed.

Keywords—computer vision; barcode detection; barcode scanning; mobile computing; skewed barcodes

I. Introduction

The World Health Organization (www.who.int) reports that obesity causes such diseases as diabetes, kidney failure, and stroke, and predicts that these diseases will become a major cause of death worldwide. Berreby [1] points out that, for the first time in human history, obese people outnumber underfed ones. Chronic illnesses such as diabetes threaten many individuals with numerous complications that include, but are not limited to, blindness and amputations [2, 3]. The U.S. Academy of Nutrition and Dietetics (www.eatright.org) estimates that approximately twenty-six million Americans have diabetes and that seven million of them are unaware of their condition. It is estimated that by 2030 the prevalence of diabetes in the world will reach 4.4%, or approximately 366 million people [3].

While there is currently no cure for type 1 or type 2 diabetes, many experts agree that the disease can be successfully managed. Successful diabetes management has three integral components: healthy diet, blood glucose management, and physical exercise [4]. In this paper, we focus on healthy diet. An important component of a healthy diet is the patients' comprehension and retention of nutritional information and their understanding of how different foods and nutritional components affect their bodies. In the U.S. and many other countries, nutritional information is primarily conveyed to consumers through nutritional labels (NLs). Unfortunately, even highly motivated consumers, who deliberately look for NLs to make healthy food choices, find it difficult to locate them on many products [5].

One way to improve consumers' comprehension and retention of nutritional information is to use computer vision to scan barcodes and retrieve NLs from databases. Unfortunately, a common weakness of many barcode scanners, both open source and commercial, is the camera alignment requirement: the smartphone camera must be aligned with a target barcode to obtain at least one complete scanline for successful barcode recognition [6]. This requirement is acceptable for sighted users but presents a serious accessibility barrier to visually impaired and blind users, or to users who may not have good physical command of their hands. Skewed barcode scanning also benefits sighted smartphone users, because removing the camera alignment requirement makes barcode scanning faster. Another weakness of current mobile barcode scanners is the lack of coupling between barcode scanning and comprehensive NL databases from which nutritional information can be retrieved on demand.
To address the problem of skewed barcode scanning, we developed an algorithm for skewed barcode localization on mobile smartphones [7]. In this paper, an algorithm is presented for in-place vision-based skewed barcode scanning that no longer requires camera alignment. The algorithm is in-place in that it performs no rotation of input images to align localized barcodes for scanning. It is cloud-based, because image processing is done in the cloud. The algorithm is implemented in a distributed, cloud-based system. The system's front end is a smartphone application that runs on Android 4.3 or higher. The system's back end is currently deployed on a four-node Linux cluster used for image recognition and nutritional data storage. The front-end smartphone sends captured frames to the back-end cluster across a wireless data channel (e.g., 3G/4G/Wi-Fi), where barcodes, both skewed and aligned, are recognized. Corresponding NLs are retrieved from a cloud database, where they are stored as HTML documents, and sent back across the data channel to the smartphone, where they are displayed on the touchscreen. Wikipedia links to important nutrition terms are embedded for better comprehension. Consumers can use standard touch gestures (e.g., zoom in/out, swipe) available on mainstream smartphone platforms to manipulate the NL's surface size. The NL database currently includes approximately 230,000 products compiled from public web sites by a custom crawler.
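The round trip described above can be sketched as a simple client-side loop. This is an illustrative Python sketch, not the deployed code: the actual front end is an Android application and the back end is a Java servlet, so the stream abstraction and the `upload` callback used here are assumptions for illustration.

```python
def capture_frame(stream):
    """Grab the next frame from the video stream (placeholder abstraction)."""
    return stream.pop(0) if stream else None

def scan_loop(stream, upload):
    """Send frames to the back end until an NL HTML document comes back.

    upload -- callback that posts one frame over the wireless data channel
              and returns the NL HTML string on a successful scan, else None.
    """
    while True:
        frame = capture_frame(stream)
        if frame is None:
            return None          # stream exhausted, nothing recognized
        html = upload(frame)
        if html is not None:
            return html          # display the NL on the touchscreen
```

In the deployed system, `upload` corresponds to an HTTP POST of the frame to the cluster's recognition servlet.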
The remainder of this paper is organized as follows. In Section II, some background is given on related work and on our laboratory's research on proactive nutrition management and mobile vision-based nutritional information extraction from product packages. In Section III, we outline the details of our algorithm for in-place skewed barcode scanning in the cloud. In Section IV, we describe our four-node Linux cluster for image processing and data storage. Section V presents several experiments with the system and discusses the results. Section VI summarizes our findings, presents our conclusions, and outlines several research venues for the future.

II. Related Work

A. Barcode Localization and Scanning

Much research has been done on mobile barcode scanning. Tekin and Coughlan [8] describe a vision-based algorithm that guides visually impaired smartphone users to center target barcodes in the camera frame via audio instructions. Wachenfeld et al. [9] present another vision-based algorithm that detects barcodes on a mobile phone via image analysis and pattern recognition methods. A barcode is assumed to be present in the image. The algorithm overcomes typical distortions, such as inhomogeneous illumination, reflections, or blurriness due to camera movement. Unfortunately, the algorithm does not appear to address the localization and scanning of skewed barcodes. Adelmann et al. [10] have developed a vision-based algorithm for scanning barcodes on mobile phones. The algorithm relies on the fact that, if multiple scanlines are drawn across the barcode in various arbitrary orientations, one of them might cover the whole length of the barcode and result in a successful scan. This recognition scheme does not appear to handle distorted images, because it is not always possible to obtain scanlines that cover the entire barcode.
Gallo and Manduchi [11] present an algorithm for 1D barcode reading in blurred, noisy, and low-resolution images. However, the algorithm detects barcodes only if they are slanted by less than 45 degrees. Lin et al. [12] have developed an automatic barcode detection and recognition algorithm for multiple and rotation-invariant barcode decoding. The proposed system is implemented and optimized on a DM6437 DSP EVM board, a custom embedded system built specifically for barcode scanning. A common weakness of many barcode scanners, both open source and commercial (e.g., [13]), is the camera alignment requirement: the smartphone camera must be aligned with a target barcode to obtain at least one complete scanline for successful barcode recognition. This requirement is acceptable for sighted users but presents a serious accessibility barrier to visually impaired shoppers or to shoppers who may not have good dexterity. Skewed barcode scanning also benefits sighted smartphone users, because it may make barcode scanning faster: the camera alignment requirement no longer needs to be satisfied.

B. Proactive Nutrition Management

Many nutritionists and dieticians consider proactive nutrition management to be a key factor in managing diabetes. As more and more individuals manage their daily activities with smartphones and other mobile devices, such devices hold great potential to become self-management tools for diabetes and other chronic ailments. Unfortunately, modern nutrition management systems assume that users understand how to collect nutritional data and can be persuaded to perform the necessary data collection with emails, SMS messages, or other digital prompts. Such systems often underperform, because many users find it difficult to integrate nutritional data collection into their daily activities due to lack of time, motivation, or training. Eventually they turn off or ignore digital stimuli [14].
To overcome these challenges, we have begun to develop a Persuasive NUTrition Management System (PNUTS). PNUTS seeks to shift current research and clinical practices in nutrition management toward persuasion, automated nutritional information extraction and processing, and context-sensitive nutrition decision support. The system is based on a nutrition management approach inspired by the Fogg Behavior Model (FBM) [14], which states that motivation alone is insufficient to stimulate target behaviors. Even a motivated user must have both the ability to execute a behavior and a trigger to engage in that behavior at an appropriate place and time. The algorithm presented in this paper is one of the algorithms used by PNUTS for vision-based nutritional information extraction from product packages.

III. Skewed Barcode Scanning

The algorithm for skewed 1D barcode scanning builds on our algorithm for skewed barcode localization [7], which localizes skewed barcodes in captured frames by computing dominant orientations of gradients (DOGs) of image segments and collecting smaller segments with similar dominant gradient orientations into larger connected components. In Fig. 1, the output of the DOG localization algorithm is shown as a white rectangle around the skewed barcode.

Fig. 2 shows the control flow of our algorithm for skewed barcode scanning. The algorithm takes as input an image captured from the smartphone camera's video stream. This image is given to the DOG algorithm. If no barcode is localized, another frame is grabbed from the video stream. If the DOG algorithm localizes a barcode, as shown in Fig. 1, the coordinates of the detected region are passed to the line grower component. The line grower component selects the center of the localized region, which is always a rectangle, and starts growing scanlines.

For an example of how the line growing component works, consider Fig. 3. In Fig. 3, the horizontal and vertical white lines intersect at the center of the localized region. The skew angle of the localized barcode computed by the DOG algorithm is 120 degrees. The line that passes through the localized region's center at the skew angle detected by the DOG algorithm is referred to as the skew line. After the center of the region and the skew angle are determined, the line growing component begins to grow scanlines orthogonal to the skew line. In Fig. 3, the skew line is the black line that passes through the region's center at 120 degrees. A scanline is grown on both sides of the skew line; the upper half of the scanline is shown as a red arrow and the lower half as a blue arrow. Each half is extended until it reaches the portion of the image where the barcode lines are no longer detectable.
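The scanline growth just described, together with the black/white run extraction that the scanner subsequently applies to each grown scanline, can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the `in_barcode` predicate stands in for the bar-detectability test, and the run extraction uses a simple mean-intensity threshold rather than the sub-pixel interpolation of the actual implementation.

```python
import math

def grow_scanline(cx, cy, skew_deg, in_barcode, step=1.0, max_len=2000):
    """Grow one scanline through (cx, cy) orthogonal to the skew line.

    skew_deg   -- barcode skew angle reported by the DOG localizer
    in_barcode -- predicate (x, y) -> bool: are bars still detectable there?
    Returns the endpoints of the upper and lower halves of the scanline.
    """
    theta = math.radians(skew_deg + 90.0)   # direction orthogonal to the skew line
    dx, dy = math.cos(theta), math.sin(theta)
    ends = []
    for sign in (+1, -1):                   # grow both halves from the center
        x, y, n = cx, cy, 0
        while n < max_len and in_barcode(x + sign * dx * step, y + sign * dy * step):
            x, y, n = x + sign * dx * step, y + sign * dy * step, n + 1
        ends.append((x, y))
    return ends

def runs(scanline):
    """Split a luminosity scanline into widths and colors of bar runs.

    A pixel is 'black' if its intensity is below the scanline mean; adjacent
    pixels of the same color are merged into one run.
    """
    mean = sum(scanline) / len(scanline)
    widths, colors = [], []
    for v in scanline:
        c = "black" if v < mean else "white"
        if colors and colors[-1] == c:
            widths[-1] += 1
        else:
            colors.append(c)
            widths.append(1)
    return widths, colors
```

The resulting widths and colors would then be matched against the EAN encoding tables; that decoding step is omitted here.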
A five-pixel buffer region is subsequently added after the scanline's end to improve subsequent scanning.

Figure 1. Localization of a skewed barcode

The number of scanlines grown on both sides of the skew line is an adjustable parameter. In the current implementation of the algorithm, the value of this parameter is set to 10. The scanlines are arrays of luminosity values for each pixel along their growth path. For each grown scanline, the Line Widths (LW) of the barcode are computed by finding two points that lie on the intensity curve but on opposite sides of the mean intensity. By modeling the curve between these points as a straight line, we obtain the intersection point between the intensity curve and the mean intensity.

Figure 2. Skewed barcode scanning algorithm

Line Colors (LC) are classified as either black or white depending on whether the pixel intensity is less than or greater than the mean intensity of the scanline. Once the LW and LC are known, each scanline is decoded using the standard EAN decoding scheme. Since UPC is a subset of EAN, this scanning algorithm can decode both EAN and UPC barcodes. The number of scanlines is currently not dynamically adjusted. In other words, ten scanlines are grown and then each of them is passed to the barcode scanner. As soon as a successful scan is obtained, the remaining scanlines, if any are left, are not scanned. If no scanline results in a successful scan, control returns to capturing frames from the video stream.

Figure 3. Growth of a scanline orthogonal to the skew angle of a barcode

Figure 4. Sequence of images that demonstrates how the algorithm works

Fig. 4 shows a sequence of images that gives a visual demonstration of how the algorithm processes a captured frame. The top image in Fig. 4 is a frame captured from the
smartphone camera's video stream. The second image in Fig. 4 shows the result of the clustering stage of the DOG algorithm, which clusters small subimages with similar dominant gradient orientations. The third image in Fig. 4 shows a white rectangle that denotes the localized barcode. The fourth image in Fig. 4 shows the ten scanlines, one of which results in a successful skewed barcode scan.

IV. Linux Cluster for Image Processing and Data Storage

We built a Linux cluster out of four Dell computers for cloud-based computer vision and data storage. Each computer has an Intel Core i5-650 3.2 GHz dual-core processor that supports 64-bit computing. The processors have 3MB of cache memory. The machines are equipped with 6GB of DDR3 SDRAM and Intel integrated GMA 4500 graphics with Dynamic Video Memory Technology 5.0. All machines have 320 GB of hard disk space. Ubuntu 12.04 LTS was installed on each machine. We used JBoss (http://www.jboss.org) to build and configure the cluster and the Apache mod_cluster module (http://www.jboss.org/mod_cluster) to configure the cluster for load balancing. Our cluster has one master node and three slaves. The master node is the domain controller; it also runs mod_cluster and httpd. All four machines are part of a local area network and have high-speed Internet connectivity. We installed JDK 7 on each node.

The JBoss Application Server (JBoss AS) is a free, open-source, Java EE-based application server. It provides a full implementation of the Java EE specification and is maintained by jboss.org, a community that provides free support for the server. JBoss is licensed under the GNU Lesser General Public License (LGPL). The Apache mod_cluster module is an httpd-based load balancer, implemented as a set of httpd modules with mod_proxy enabled.
This module uses a communication channel to send requests from httpd to a set of designated application server nodes. An additional communication channel is established between the server nodes and httpd. The nodes use the additional channel to transmit server-side load balance factors and lifecycle events back to httpd via a custom set of HTTP methods collectively referred to as the Mod-Cluster Management Protocol (MCMP). The mod_cluster module provides dynamic configuration of httpd workers: the proxy's configuration resides on the application servers, which send lifecycle events to the proxies, enabling the proxies to auto-configure themselves. The mod_cluster module also provides accurate load metrics, because the load balance factors are calculated by the application servers, not the proxies.

All nodes in our cluster run JBoss AS 7.1.1. Apache httpd runs on the master with the mod_cluster-1.2.0 module enabled. The JBoss AS instances on the master and the slaves are discovered by httpd. A Java servlet for image recognition is deployed on the master node as a web archive file. The servlet's URL is hardcoded in every front-end smartphone. The servlet receives images uploaded with HTTP POST requests, recognizes barcodes, and sends an HTML response back to the front-end smartphones. No data caching is currently done on the servlet or the front-end smartphones.

V. Experiments

The skewed barcode scanning experiments were conducted on a set of 506 video recordings of common grocery products that we have made publicly available [15]. The videos have a 1280 x 720 resolution and an average duration of 15 seconds, and were recorded on an Android 4.2.2 Galaxy Nexus smartphone in a supermarket in Logan, UT. All videos were taken by an operator who held a grocery product in one hand and a smartphone in the other.
The videos covered four different categories of products: bags, boxes, bottles, and cans. Colored RGB frames were extracted from each video at the rate of one frame per second and grouped by product category. Each frame was automatically classified as blurred or sharp by the blur detection scheme based on Haar wavelet transforms [16, 17] that we implemented in Python. Each frame was also manually classified as having a barcode or not.

Figure 5. Average request-response times in milliseconds

After all classifications were completed (blurred vs. sharp, barcode vs. no barcode), the classified frames were stored on the smartphone's SD card. The evaluation procedure was implemented as follows. An Android service took one frame at a time and uploaded it to the Linux cluster via an HTTP POST request over the USU Wi-Fi network. The network has a download speed of 72.31 Mbps and an upload speed of 29.64 Mbps. Each captured frame was processed on the cluster as follows: the DOG localization algorithm [7] was executed and, if a barcode was successfully localized, the barcode was scanned in place within the localized region with ten scanlines. The detection result was sent back to the smartphone and recorded on the smartphone's SD card. The average request-response time for each session was calculated. Fig. 5 graphs the cluster's request-response times. The lowest average was 712 milliseconds; the highest average was 1813 milliseconds.
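The blurred/sharp classification above follows the Haar-wavelet scheme of Tong et al. [16]. Below is a much-simplified, illustrative Python sketch of the underlying idea for a single row of luminosity values. The function names, the single-row treatment, and the implicit thresholding are assumptions for illustration; the actual classifier operates on 2D edge maps across several decomposition levels.

```python
def haar_1d(signal):
    """One level of the 1D Haar transform: pairwise averages and details."""
    avg = [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal) - 1, 2)]
    det = [(signal[i] - signal[i + 1]) / 2.0 for i in range(0, len(signal) - 1, 2)]
    return avg, det

def sharpness_score(row, levels=3):
    """Crude sharpness proxy: the largest Haar detail magnitude across levels.

    Sharp edges produce large detail coefficients; a low maximum suggests the
    row is blurred. A frame would be labeled 'blurred' when the score falls
    below a tuned threshold.
    """
    score, avg = 0.0, list(row)
    for _ in range(levels):
        if len(avg) < 2:
            break
        avg, det = haar_1d(avg)
        score = max(score, max(abs(d) for d in det))
    return score
```

A crisp step edge scores higher than a gradual ramp of the same contrast, which is the property the blur classifier exploits.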
Figure 6. Skewed barcode scanning in blurred and sharp images

Fig. 6 shows the performance of the skewed barcode scanning algorithm on blurred and sharp images for each of the four product categories. As mentioned above, the blurriness coefficient was computed automatically with the Haar transform [17].

Figure 7. Average barcode scan times for each product category in a local supermarket, in seconds

We also conducted skewed barcode scanning experiments in a local supermarket. A sighted user was given a Galaxy Nexus 4 smartphone with an AT&T 4G connection and our front-end application installed. The user was asked to scan ten products of his choice in each of the four categories: box, can, bottle, and bag. A research assistant accompanied the user and recorded the scan time for each product. Each scan time started from the moment the user began scanning and ended when the response was received from the server. Fig. 7 shows the average times in seconds for each category.

VI. Results

No false positives occurred on images that contained barcodes, in any of the four product categories. As shown in Fig. 6, for each product category, the sharp images had a significantly better true positive percentage than the blurred images; a comparison of the bar charts in Fig. 6 reveals that the true positive percentage of the sharp images is more than double that of the blurry ones. Images without any barcode produced 100% accurate results in all categories, with all true negatives, irrespective of blurriness. In other words, the algorithm is highly specific in that it does not detect barcodes in images that do not contain them. Another observation from Fig. 6 is that the algorithm performed best on boxes. The algorithm's performance on bags, bottles, and cans was worse because of crumpled, curved, or shiny surfaces.
These surfaces caused many light reflections and hindered the performance of skewed barcode localization and scanning. The localization and scanning percentages were better on boxes due to their smoother surfaces. Quite expectedly, the sharpness of a frame also makes a positive difference: the algorithm performed much better on sharp images in each product category. Specifically, on sharp images, the algorithm performed best on boxes with a true positive percentage of 54%, followed by bags at 44%, bottles at 32%, and cans at 30%.

As Fig. 7 shows, the average scan time was lowest for boxes and highest for cans. This finding is in line with the results in Fig. 6. The average scan times for cans and bags were significantly longer than for boxes. The average scan time for bottles was longer than for boxes but shorter than for cans and bags. However, the explanation is not limited to crumpled, curved, and shiny surfaces. Another reason for the slower scan times on individual products in each product category is the availability of Internet connectivity at various locations in the supermarket. During our scan experiments, we noticed that in some areas of the supermarket there was no Internet connection, which caused delays in barcode scanning. For several products, a 10- or 15-step change in location within the supermarket resulted in a successful barcode scan.

VII. Discussion

The skewed barcode scanning algorithm presented in this paper targets medium- to high-end mobile devices with single- or quad-core ARM processors. Since cameras on these devices capture several frames per second, the algorithm is designed to minimize false positives rather than maximize true ones, because, at such frame capture rates, it is far more important to minimize the processing time per frame.
In other words, the algorithm is highly specific, where specificity is the percentage of true negative matches out of all possible negative matches. Figures 8, 9, 10, and 11 demonstrate the specificity of the algorithm on all four categories of products. In all product categories, the false positive percentages are 0, which means that the algorithm does not recognize barcodes in images that do not contain them. In all categories, the false negative percentages are relatively high. This is by design: the algorithm is conservative in that it rejects a frame on the slightest chance that it does not contain a barcode. While this increases false negatives, it keeps false positives close to zero.

Figure 8. Performance on sharp box images

Figure 9. Performance on sharp can images

Figure 10. Performance on sharp bag images

A limitation of the current implementation, discovered during our field experiments in a supermarket, is that there is no run-time check for the availability of Internet connectivity. If such a check were implemented, the user could be quickly notified that there is no Internet connectivity, and frame grabbing from the video stream could be temporarily halted.

Figure 11. Performance on sharp bottle images

Another limitation of the current implementation is that it does not compute the blurriness of a captured frame before sending it to the cluster for barcode localization and scanning. As shown in Fig. 6, the scanning results are substantially better on sharp images than on blurred ones. This limitation points to a potential improvement that we plan to implement in the future: when a frame is captured, its blurriness coefficient can be computed on the smartphone and, if it is high, the frame is not sent to the cluster at all. This improvement would also reduce the load on the cluster and may improve response times. Another approach to handling blurred inputs is to improve camera focus and stability, both of which are outside the scope of this algorithm because they are, technically speaking, hardware problems that are likely to diminish in later smartphone models. The current implementation on the Android platform attempts to force focus at the image center, but the ability to request camera focus is not present in older Android versions.
Over time, as device cameras improve and more devices run newer versions of Android, this limitation will have less impact on recall, although it cannot be eliminated entirely. Finally, we would like to implement a tighter integration of the current system with proactive nutrition management. Specifically, we plan to integrate skewed barcode scanning with a wireless glucometer so that nutrition intake recording can be coupled with glucometer readings. This will allow users to actively monitor the impact that various foods have on their blood glucose levels.

References

[1] Anding, R. Nutrition Made Clear. The Great Courses, Chantilly, VA, 2009.
[2] Berreby, D. "The obesity era." The Aeon Magazine, June 19, 2013.
[3] Rubin, A. L. Diabetes for Dummies. 3rd Edition, Wiley Publishing, Inc., Hoboken, New Jersey, 2008.
[4] Årsand, E., Tatara, N., Østengen, G., and Hartvigsen, G. "Mobile phone-based self-management tools for type 2 diabetes: the few touch application." Journal of Diabetes Science and Technology, Vol. 4, Issue 2, March 2010.
[5] Graham, D. J., Orquin, J. L., and Visshers, V. H. M. "Eye tracking and nutritional label use: a review of the literature and recommendations for label enhancement." Food Policy, Vol. 32, pp. 378-382, 2012.
[6] Kulyukin, V., Kutiyanawala, A., and Zaman, T. "Eyes-Free Barcode Detection on Smartphones with Niblack's Binarization and Support Vector Machines." In Proceedings of the 16th International Conference on Image Processing, Computer Vision, and Pattern Recognition (IPCV 2012), Vol. I, pp. 284-290, CSREA Press, July 16-19, 2012, Las Vegas, Nevada, USA. ISBN: 1-60132-223-2, 1-60132-224-0.
[7] Kulyukin, V. and Zaman, T. "Vision-Based Localization of Skewed UPC Barcodes on Smartphones." In Proceedings of the International Conference on Image Processing, Computer Vision, and Pattern Recognition (IPCV 2013), pp. 344-350, ISBN 1-60132-252-6, CSREA Press, Las Vegas, NV, USA.
[8] Tekin, E. and Coughlan, J. "A mobile phone application enabling visually impaired users to find and read product barcodes." In Proceedings of the 12th International Conference on Computers Helping People with Special Needs (ICCHP '10), Klaus Miesenberger, Joachim Klaus, Wolfgang Zagler, and Arthur Karshmer (Eds.). Springer-Verlag, Berlin, Heidelberg, pp. 290-295, 2010.
[9] Wachenfeld, S., Terlunen, S., and Jiang, X. "Robust recognition of 1-D barcodes using camera phones." In Proceedings of the 19th International Conference on Pattern Recognition (ICPR 2008), pp. 1-4, Dec. 8-11, 2008. ISSN: 1051-4651, IEEE.
[10] Adelmann, R., Langheinrich, M., and Floerkemeier, C.
A Toolkit for BarCode Recognition and Resolving on Camera Phones - Jump Starting the Internet of Things. Workshop on Mobile and Embedded Information Systems (MEIS '06) at Informatik 2006, Dresden, Germany, Oct. 2006.
[11] Gallo, O. and Manduchi, R. "Reading 1D barcodes with mobile phones using deformable templates." IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 33, No. 9, pp. 1834-1843, Sept. 2011. doi:10.1109/TPAMI.2010.229.
[12] Lin, D. T., Lin, M. C., and Huang, K. Y. "Real-time automatic recognition of omnidirectional multiple barcodes and DSP implementation." Machine Vision and Applications, Vol. 22, No. 2, pp. 409-419, March 2011. doi:10.1007/s00138-010-0299-3.
[13] ZXing, http://code.google.com/p/zxing/.
[14] Fogg, B. J. "A behavior model for persuasive design." In Proceedings of the 4th International Conference on Persuasive Technology, Article 40, ACM, New York, USA, 2009.
[15] Mobile Supermarket Barcode Videos. https://www.dropbox.com/sh/q6u70wcg1luxwdh/LPtUBdwdY1.
[16] Tong, H., Li, M., Zhang, H., and Zhang, C. "Blur detection for digital images using wavelet transform." In Proceedings of the IEEE International Conference on Multimedia and Expo, Vol. 1, pp. 27-30, June 2004. doi:10.1109/ICME.2004.1394114.
[17] Nievergelt, Y. Wavelets Made Easy. Birkhäuser, Boston, 1999.
