Eyesight Sharing in Blind Grocery Shopping: Remote P2P Caregiving through Cloud Computing

Vladimir Kulyukin, Tanwir Zaman, Abhishek Andhavarapu, and Aliasgar Kutiyanawala
Department of Computer Science, Utah State University, Logan, UT, USA
{vladimir.kulyukin}@usu.edu

Abstract. Product recognition continues to be a major access barrier for visually impaired (VI) and blind individuals in modern supermarkets. R&D approaches to this problem in the assistive technology (AT) literature vary from automated vision-based solutions to crowdsourcing applications in which VI clients send image identification requests to web services. The former struggle with run-time failures and scalability, while the latter must cope with concerns about trust, privacy, and quality of service. In this paper, we investigate a mobile cloud computing framework for remote caregiving that may help VI and blind clients with product recognition in supermarkets. This framework emphasizes remote teleassistance and assumes that clients work with dedicated caregivers (helpers). Clients tap on their smartphones' touchscreens to send images of products they examine to the cloud, where the SURF algorithm matches each incoming image against its image database. Images, along with the names of the top 5 matches, are sent to remote sighted helpers via push notification services. A helper confirms the product's name if it is in the top 5 matches, or speaks or types the product's name if it is not. Basic quality of service is ensured through human eyesight sharing even when image matching does not work well. We implemented this framework in a module called EyeShare on two Android 2.3.3/2.3.6 smartphones. EyeShare was tested in three experiments with one blindfolded subject: one lab study and two experiments in Fresh Market, a supermarket in Logan, Utah.
The results of our experiments show that the proposed framework may be used as a product identification solution in supermarkets.

1 Introduction

The term teleassistance covers a wide range of technologies that enable VI and blind individuals to transmit video and audio data to remote caregivers and receive audio assistance [1]. Research evidence suggests that the availability of remote caregiving reduces the psychological stress on VI and blind individuals when they perform various tasks in different environments [2].
A typical example of how teleassistance is used for blind navigation is the system developed by Bujacz et al. [1]. The system consists of two notebook computers: one is carried by the VI traveler in a backpack and the other is used by the remote sighted caregiver. The traveler transmits video through a chest-mounted USB camera. The traveler wears a headset (an earphone and a microphone) to communicate with the caregiver. Several indoor navigation experiments showed that VI travelers walked faster, at a steadier pace, and were able to navigate more easily when assisted by remote guides than when they navigated the same routes by themselves.

Our research group has applied teleassistance to blind shopping in ShopMobile, a mobile shopping system for VI and blind individuals [3]. Our end objective is to enable VI and blind individuals to shop independently using only their smartphones. ShopMobile is our most recent system for accessible blind shopping and follows RoboCart and ShopTalk [4]. The system has three software modules: an eyes-free barcode scanner, an OCR engine, and a teleassistance module called TeleShop. The eyes-free barcode scanner allows VI shoppers to scan UPC barcodes on products and MSI barcodes on shelves. The OCR engine is being developed to extract nutrition facts from nutrition tables available on many product packages. TeleShop provides a teleassistance backup in situations when the barcode scanner or the OCR engine malfunctions.

The current implementation of TeleShop consists of a server running on the VI shopper's smartphone (Google Nexus One with Android 2.3.3/2.3.6) and a client GUI module running on the remote caregiver's computer. All client-server communication occurs over UDP. Images from the phone camera are continuously transmitted to the client GUI. The caregiver can start, stop, and pause the incoming image stream and change image resolution and quality.
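The UDP image transport just described can be sketched minimally as follows. The paper does not specify TeleShop's wire format, so the 8-byte header (frame id, JPEG quality, payload length) and all names in this sketch are assumptions for illustration, not EyeShare's actual protocol.

```python
import socket
import struct

# Hypothetical 8-byte header: frame id (uint32), JPEG quality (uint16),
# payload length (uint16), all big-endian. This layout is an assumption.
HEADER = struct.Struct("!IHH")

def pack_frame(frame_id: int, quality: int, payload: bytes) -> bytes:
    """Prefix a JPEG payload with a fixed-size header for UDP transport."""
    return HEADER.pack(frame_id, quality, len(payload)) + payload

def unpack_frame(datagram: bytes) -> tuple:
    """Split a received datagram back into (frame_id, quality, payload)."""
    frame_id, quality, length = HEADER.unpack_from(datagram)
    return frame_id, quality, datagram[HEADER.size:HEADER.size + length]

# Loopback demonstration: one frame from the "phone" to the "caregiver GUI".
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))                    # OS assigns a free port
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
jpeg_stub = b"\xff\xd8 fake jpeg body \xff\xd9"    # stand-in for camera output
sender.sendto(pack_frame(7, 80, jpeg_stub), receiver.getsockname())
frame_id, quality, payload = unpack_frame(receiver.recv(65535))
print(frame_id, quality, payload == jpeg_stub)     # 7 80 True
sender.close(); receiver.close()
```

Carrying the quality setting in every frame header is one simple way the caregiver-side GUI could adjust the resolution/quality trade-off at run time.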
Images of high resolution and quality provide more reliable detail but may cause the video stream to become choppy. Lower resolution images result in smoother video streams but provide less detail. The pause option is for holding the current image on the screen.

TeleShop has so far been evaluated in two laboratory studies with Wi-Fi and 3G [3]. The first study was done with two sighted students, Alice and Bob. The second study was done with a married couple: a completely blind person (Carl) and his sighted wife (Diana). For both studies, we assembled four plastic shelves in our laboratory and stocked them with empty boxes, cans, and bottles to simulate an aisle in a grocery store. The shopper and the caregiver were in separate rooms. In the first study, we blindfolded Bob to act as a VI shopper. The studies were done on two separate days. The caregivers were given a list of nine products and were asked to help the shoppers find the products and read the nutrition facts on the product packages or bottles. A voice connection was established between the shopper and the caregiver via a regular phone call. Alice and Bob took an average of 57.22 and 86.5 seconds to retrieve a product from the shelf and to read its nutrition facts, respectively. The corresponding times for Carl and Diana were 19.33 and 74.8 seconds, respectively [3].

In this paper, we present an extension of TeleShop, called EyeShare, that leverages cloud computing to assist VI and blind shoppers (clients) with product recognition in supermarkets. The client takes a still image of the product that he or she currently examines and sends it to the cloud. The image is processed by an open source object
recognition software application that runs on a cloud server and returns the top 5 matches from its product database. The number 5 was chosen because a 5-item list easily fits on one Google Nexus One screen. The matches, in the form of a list of product names, are sent to the helper along with the original image through a push notification service. The helper uses his or her smartphone to select the correct product name from the list or, if the product's name is not found among the matches, to speak it into the smartphone. If speech recognition (SR) does not work, the helper types in the product's name. This framework is flexible in that various image recognition algorithms can be tested in the cloud. It is also possible to use no image recognition, in which case all product recognition is done by the sighted caregiver.

The remainder of our paper is organized as follows. In Section 2, we present our cloud computing framework for remote caregiving, with which mobile devices form ad hoc peer-to-peer (P2P) communication networks. In Section 3, we describe three experiments in two different environments, a laboratory and a local supermarket, where a blindfolded individual and a remote sighted caregiver evaluated the system on different products. In Section 4, we present the results of our experiments. In Section 5, we discuss our investigation.

Fig. 1. Cloud Computing Framework for Remote Caregiving.

2 A Cloud Computing Framework for Remote P2P Caregiving

The cloud computing framework we have implemented consists of mobile devices that communicate with each other in an ad hoc P2P network. The devices have Google accounts for authentication and are registered with Google's C2DM (cloud to device messaging) service (http://code.google.com/android/c2dm/), a push notification service that allocates unique IDs to registered devices. Our framework assumes that the cloud computing services run on Amazon's Elastic Compute Cloud (EC2) (http://aws.amazon.com/ec2/).
Other cloud computing services may be employed. We configured an Amazon EC2 Linux server with a 1 GHz
processor and 512 MB of RAM. The server runs an OpenCV 2.3.3 (http://opencv.willowgarage.com/wiki/) image matching application. Product images are saved in a MySQL database. The use of this framework requires that clients and helpers download the client and caregiver applications on their smartphones. The clients and helpers subsequently find each other and form an ad hoc P2P network via C2DM registration IDs.

Figure 1 shows this framework in action. A client sends a help request (Step 1). In EyeShare, this request consists of a product image. However, in principle, this request can be anything transmittable over available wireless channels such as Wi-Fi, 3G, 4G, Bluetooth, etc. The image is received by the Amazon EC2 Linux server, where it is matched against the images in the MySQL database.

Our image matching application uses the SURF algorithm [5]. The matching operation returns the top 5 matches and sends the names of the corresponding products, along with the URL that contains the client's original image, to the C2DM service (Step 2). Thus, the image is transmitted only once, in the help request. C2DM forwards the message to the caregiver's smartphone (Step 3). The helper confirms the product's name by selecting it from the list of the top 5 matches. If the top matches are incorrect, the helper uses SR to speak the product's name or, if SR does not work or is not available, types it in on the touchscreen. If the helper cannot determine the product's name from the image, the helper sends a resend request to the client. The helper's message goes back to the C2DM service (Step 4) and then on to the client's smartphone (Step 5). The helper application is designed in such a way that the helper does not have to interrupt his or her smartphone activities for too long to render assistance.

2.1 Android Cloud to Device Messaging (C2DM) Framework

C2DM (http://code.google.com/android/c2dm/) takes care of message queuing and delivery.
Push notifications ensure that the application does not need to keep polling the cloud server for new incoming requests. C2DM wakes up the Android application when messages are received through intent broadcasts. However, the application must be set up with the proper C2DM broadcast receiver permissions. In EyeShare, C2DM is used in two separate activities. First, C2DM forwards the message from the server to the helper application. This message consists of a formatted string of the client registration ID, the names of the top 5 product matches, and the URL containing the client's image. Clients' images are temporarily saved on the cloud-based Linux server and removed as soon as the corresponding help requests are processed. Second, C2DM is used when helper messages are sent back to clients.

2.2 Image Matching

We have used SURF (Speeded Up Robust Features) [5] as a black-box image matching algorithm on our cloud server. SURF extracts distinctive key points and descriptors from images and later uses them to match indexed images against incoming images. SURF uses an intermediate image representation called the Integral Image, which is computed from the input image. This intermediate representation speeds up calculations in rectangular areas: the integral image is formed by summing the pixel values from the origin to each (x, y) coordinate, so that the sum of the values in any rectangular region can be computed in constant time. This makes the computation time invariant to changes in size, which is useful in matching large images. The SURF detector is based on the determinant of the Hessian matrix. The SURF descriptor describes how pixel intensities are distributed within a scale-dependent neighborhood of each interest point detected by the Fast Hessian detector. Object detection using SURF is scale and rotation invariant and does not require long training. The fact that SURF is rotation invariant makes the algorithm useful in situations where the images being matched were taken at orientations different from those of the training images of the same objects.

3 Experiments

We evaluated EyeShare in product recognition experiments at two locations. The first study was conducted in our laboratory. The second and third studies were conducted at Fresh Market, a local supermarket in Logan, Utah.

3.1 A Laboratory Study

We assembled four shelves in our laboratory and placed 20 products on them: bottles, boxes, and cans. The same setup was successfully used in our previous experiments on accessible blind shopping [3, 4]. We created a database of 100 images: each of the 20 products on the shelves had 5 images taken at different orientations. The SURF algorithm was trained on these 100 images. A blindfolded individual was given a Google Nexus One smartphone (Android 2.3.3) with the EyeShare client application installed on it. A sighted helper was given another Google Nexus One (Android 2.3.3) with the EyeShare helper application installed on it.

The blindfolded client was asked to take each product from the assembled shelves and recognize it. The client took a picture of the product by tapping the touchscreen. The image was sent to the cloud Linux server, where it was processed by the SURF algorithm.
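The server-side ranking step, matching the query image's descriptors against each database product and returning the best candidates, can be sketched as follows. OpenCV's SURF implementation is patented and not always available, so this sketch uses synthetic 64-dimensional descriptors (SURF descriptors are 64-dimensional) and a NumPy brute-force matcher with Lowe's ratio test; the function and product names are illustrative, not part of EyeShare.

```python
import numpy as np

def count_matches(query: np.ndarray, candidate: np.ndarray,
                  ratio: float = 0.7) -> int:
    """Count query descriptors whose nearest candidate descriptor passes
    Lowe's ratio test (nearest distance < ratio * second-nearest distance)."""
    # Pairwise Euclidean distances between descriptor rows.
    d = np.linalg.norm(query[:, None, :] - candidate[None, :, :], axis=2)
    d.sort(axis=1)                       # per query row: ascending distances
    return int(np.sum(d[:, 0] < ratio * d[:, 1]))

def top_matches(query: np.ndarray, database: dict, k: int = 5) -> list:
    """Rank database products by match count and return the k best names."""
    scores = {name: count_matches(query, desc)
              for name, desc in database.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Synthetic descriptor database: 8 hypothetical products, 20 descriptors each.
rng = np.random.default_rng(0)
db = {f"product-{i}": rng.normal(size=(20, 64)) for i in range(8)}
# A query built from slightly noisy copies of product-3's descriptors
# should rank product-3 first.
query = db["product-3"] + rng.normal(scale=0.01, size=(20, 64))
print(top_matches(query, db)[0])         # product-3
```

Returning only the top 5 names keeps the push-notification payload small, which matches the single-screen list the helper sees.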
The names of the top 5 matched products were sent to the helper for verification, along with the URL of the original image, through C2DM. The helper, located in a different room in the same building, selected the product's name from the list of the top matches and sent the product's name back to the client. If the product's name was not in the list, the helper spoke the name of the product or, if the speech was not recognized after three attempts, typed in the product's name on the virtual touchscreen keyboard. The run for an individual product was considered complete when the product's name was spoken on the client's smartphone through TTS. Thus, the total run time (in seconds) for each run included all five steps given in Fig. 1.

3.2 Store Experiments

The next two experiments were executed in Fresh Market, a local supermarket in Logan, Utah. Prior to the experiments, we added 270 images to the image database
used in the laboratory study. We selected 45 products from 9 aisles (5 products per aisle) in the supermarket and took 6 images at different rotations for every product. The products included boxes, bottles, cans, and bags. We biased our selection toward products that an individual can hold in one hand. SURF was retrained on these 370 images (100 images from the lab study and 270 new ones).

The same blindfolded subject who participated in the laboratory study was given a Samsung Galaxy S2 smartphone (Android 2.3.6) with the EyeShare client application installed on it. The client used a 4G data plan. The same helper who participated in the laboratory study was given a Google Nexus One (Android 2.3.6) with the EyeShare helper application installed on it. The helper was located in a building approximately one mile away from the supermarket. The helper used a Wi-Fi connection.

The first set of experiments was confined to the first three aisles of the supermarket and lasted 30 minutes. In each aisle, three products from the database and three products not from the database were chosen by a research assistant who went to the supermarket with the blindfolded subject. The assistant gave each product to the subject, who was asked to use the EyeShare client application to recognize the product. There was no training involved, because it was the same blindfolded subject who did the laboratory study. The subject was given 16 products, one product at a time, by the assistant. One experimental run began at the time when the subject was given a product and went on until the time when the subject's smartphone received the product's name and read it out to the subject through TTS.

The second set of experiments was conducted in the same supermarket on a different day with the same subject and helper. The experiments lasted 30 minutes.
Since, as explained in the discussion section, the image matching did not perform as well as we hoped it would in the first supermarket study, we did not do any image matching in the second set of experiments. All product recognition was done by the remote sighted helper. The subject was given 17 products, one product at a time, taken from the next three aisles of the supermarket by the assistant. The experimental run times were computed in the same way as they were in the first supermarket study.

4 Results

The results of the experiments are summarized in Table 1. Column 1 gives the environments where the experiments were executed. Column 2 gives the number of products used in the experiments in the corresponding environments. Column 3 gives the mean time (in seconds) of the experimental runs. Column 4 gives the standard deviations of the corresponding mean time values. Column 5 gives the number of times the correct product was found in the top 5 matches. Column 6 gives the mean number of SR attempts. Column 7 gives the number of SR failures, i.e., cases in which the helper had to type the product name on the touchscreen keyboard after attempting to use SR three times. In all experiments, all products were successfully recognized by the blindfolded subject. As can be seen in Table 1, in supermarket study 1, after our image database had grown in size, there were no correct product names in the top 5 matches. Consequently, we decided not to use SURF in supermarket study 2. In supermarket
study 1, there were three cases when the helper requested the client to send another image of a product because he could not identify the product's name from the original image. In supermarket study 1, there was one brief (several seconds) loss of Wi-Fi connection on the helper's smartphone.

Table 1. Experimental results.

Environment | # Products | Mean Time (s) | STD | Top 5 | Mean SR | SR Failures
Lab         | 16         | 40            | .00021 | 8  | 1.1     | 0
Store 1     | 16         | 60            | .00033 | 0  | 1.2     | 2
Store 2     | 17         | 60            | .00081 | 0  | 1.1     | 3

5 Discussion

Our study contributes to the recent body of research that addresses various aspects of independent blind shopping through mobile and cloud computing (e.g., [6, 7, 8]). Our approach differs from these studies in its emphasis on dedicated remote caregiving. It addresses, at least to some extent, both the image recognition failures of fully automated solutions and the concerns about trust, privacy, and basic quality of service of pure crowdsourcing approaches. Dedicated caregivers alleviate image recognition failures through human eyesight sharing. Since dedicated caregiving is more personal and trustworthy, clients are not required to post image recognition requests on open web forums, which allows them to preserve more privacy. Interested readers may watch our research videos at www.youtube.com/csatlusu for more information on our accessible shopping experiments and projects.

The experiments show that the average product recognition time is within one minute. The results demonstrate that SR is a viable option for product naming. We attribute the poor performance of SURF in the first supermarket study to our failure to properly parameterize the algorithm. As we gain more experience with SURF, we may be able to improve the performance of automated image matching. However, database maintenance may be a more serious long-term concern for automated image matching unless there is direct access to the supermarket's inventory control system.
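Per-environment summaries like those in Table 1 can be computed from raw run times with a short script. The run times below are made up for illustration only, since the paper reports just the aggregated values.

```python
from statistics import mean, stdev

# Illustrative per-run times in seconds -- NOT the study's raw data; the
# paper reports only per-environment means and standard deviations.
runs = {
    "Lab":     [38.0, 41.5, 40.5],
    "Store 1": [58.0, 61.0, 61.0],
}

for env, times in runs.items():
    print(f"{env}: mean={mean(times):.1f}s  std={stdev(times):.2f}s")
# Lab: mean=40.0s  std=1.80s
# Store 1: mean=60.0s  std=1.73s
```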
Our findings should be interpreted with caution, because we used only one blindfolded subject in the experiments. Nonetheless, they may serve as a basis for future research on remote teleassisted caregiving in accessible blind shopping. Our experience with the framework suggests that teleassistance may be a feasible option for VI individuals in modern supermarkets. Dedicated remote caregiving can be applied not only to product recognition but also to assistance with cash payments and supermarket navigation. It is a relatively inexpensive solution, because the only required hardware device is a smartphone with a data plan.
As the second supermarket study suggests, cloud-based image matching may not be necessary. The use of mobile phones as the means of caregiving allows caregivers to provide assistance from the comfort of their homes or offices or on the go. As data plans move toward 4G network speeds, we can expect faster response times and better quality of service. Faster network connections may, in time, make it feasible to communicate via streaming video.

References

1. Bujacz, M., Baranski, P., Moranski, M., Strumillo, P., and Materka, A. "Remote Guidance for the Blind - A Proposed Teleassistance System and Navigation Trials." In Proceedings of the Conference on Human System Interactions, pp. 888-892, IEEE, Krakow, Poland, 2008.
2. Peake, P. and Leonard, J. "The Use of Heart-Rate as an Index of Stress in Blind Pedestrians." Ergonomics, 1971.
3. Kutiyanawala, A., Kulyukin, V., and Nicholson, J. "Teleassistance in Accessible Shopping for the Blind." In Proceedings of the 2011 International Conference on Internet Computing, ICOMP Press, pp. 190-193, July 18-21, 2011, Las Vegas, USA.
4. Kulyukin, V. and Kutiyanawala, A. "Accessible Shopping Systems for Blind and Visually Impaired Individuals: Design Requirements and the State of the Art." The Open Rehabilitation Journal, ISSN: 1874-9437, Volume 2, 2010, pp. 158-168, DOI: 10.2174/1874943701003010158.
5. Bay, H., Tuytelaars, T., and Van Gool, L. "SURF: Speeded Up Robust Features." Computer Vision - ECCV 2006, pp. 404-417, Springer-Verlag, 2006.
6. Tsai, S.S., Chen, D., Chandrasekhar, V., Takacs, G., Ngai-Man, C., Vedantham, R., Grzeszczuk, R., and Girod, B. "Mobile Product Recognition." In Proceedings of the International Conference on Multimedia (MM '10), ACM, New York, NY, USA, pp. 1587-1590, DOI: 10.1145/1873951.1874293.
7. Girod, B., Chandrasekhar, V., Chen, D.M., Ngai-Man, C., Grzeszczuk, R., Reznik, Y., Takacs, G., Tsai, S.S., and Vedantham, R. "Mobile Visual Search." IEEE Signal Processing Magazine, vol. 28, no. 4, pp. 61-76, July 2011, DOI: 10.1109/MSP.2011.940881.
8. Von Reischach, F., Michahelles, F., Guinard, D., Adelmann, R., Fleisch, E., and Schmidt, A. "An Evaluation of Product Identification Techniques for Mobile Phones." In Proceedings of the 2nd IFIP TC13 Conference on Human-Computer Interaction (Interact 2009), Uppsala, Sweden.