Towards A Real-Time System for Finding and Reading Signs for Visually Impaired Users

Portable and Mobile Systems in Assistive Technology - Towards A Real-Time System for Finding and Reading Signs for Visually Impaired Users - Coughlan, James (f)

  • 1. Towards A Real-Time System for Finding and Reading Signs for Visually Impaired Users. James Coughlan, Ph.D.
  • 2. Informational signs: Signs are ubiquitous indoors and outdoors. Useful for wayfinding, finding shops and businesses, accessing a variety of services. But nearly all are inaccessible to blind and visually impaired persons!
  • 3. OCR (Optical Character Recognition): Originally developed for clear images of text documents, acquired by a flatbed scanner. Not equipped to find text in an image with lots of non-text clutter (buildings, trees, etc.).
  • 4. Portable OCR for visually impaired users: Smartphone (Nokia N82) implementation: kReader Mobile, knfbReader Mobile (K–NFB Reading Technology, Inc.).
  • 5. kReader Mobile limitation: Assumes text comprises all (or most) of the image: “Get as close to the text as you can without cutting off any text, as it is displayed on the screen” “Distance from the target can greatly affect the text recognition quality. Most, but not all, documents should be approximately 10 inches from the Reader.” (KNFB Mobile Reader User Guide)
  • 6. Related work: Much research on computer vision algorithms for finding text in cluttered images. Very challenging problem. Even if text is correctly located in an image, many problems with OCR: • non-standard fonts • poor illumination • curved surfaces, perspective distortion • other forms of noise in images
  • 7. Related work (continued): Some smartphone apps find text, read it and translate it in real time.
  • 8. Related work (continued): A small amount of work targeted specifically at finding and reading text for blind and visually impaired persons: • C. Yi & Y. Tian, 2011 • “Smart Telescope” project from Blindsight Corporation: find text regions and present enlarged text to low vision user
  • 9. Our approach: • Design algorithm to rapidly find text on Android smartphone running in video mode (640 x 480 pixels) • Perform on-board OCR (Tesseract) • Read aloud (text-to-speech) immediately • For speed, all processing is done on-board (no need for internet connection). Read aloud up to 1-2 frames per second.
  • 10. System UI (user interface): • Philosophy: text detection/reading errors are inevitable. To overcome them, have user obtain multiple readings of each text sign over time. Ignore spurious (unreproducible) readings, and come to consensus about true contents of each sign. • If multiple text strings in one image, read aloud in “raster” order (from top to bottom, and along a line from left to right)
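The two ideas on the slide above (raster-order reading and ignoring unreproducible readings) can be illustrated with a short Python sketch. This is not the authors' implementation: the `Detection` type, the `line_tolerance` and `min_count` parameters, and the exact-match voting rule are all assumptions made for illustration. On the slide, it is the user who reaches consensus over repeated readings; the `consensus` function only shows what an automated version of that filtering might look like.

```python
from collections import Counter
from typing import NamedTuple


class Detection(NamedTuple):
    """Hypothetical container for one detected text string in a frame."""
    text: str
    x: int  # left edge of the bounding box, in pixels
    y: int  # top edge of the bounding box, in pixels
    h: int  # height of the bounding box, in pixels


def raster_order(detections, line_tolerance=0.5):
    """Order detections top to bottom, then left to right within a line.

    Two detections count as being on the same line when their top edges
    differ by at most line_tolerance * height (an assumed heuristic).
    """
    lines = []
    for d in sorted(detections, key=lambda d: d.y):
        first_in_line = lines[-1][0] if lines else None
        if first_in_line and abs(d.y - first_in_line.y) <= line_tolerance * first_in_line.h:
            lines[-1].append(d)
        else:
            lines.append([d])
    return [d for line in lines for d in sorted(line, key=lambda d: d.x)]


def consensus(readings, min_count=2):
    """Keep only strings that were read at least min_count times.

    Readings are compared after trimming whitespace and lowercasing;
    anything seen fewer than min_count times is treated as spurious.
    """
    counts = Counter(r.strip().lower() for r in readings if r.strip())
    return [text for text, n in counts.most_common() if n >= min_count]
```

A real system would probably need fuzzy matching (for example, edit distance) rather than exact string equality, since OCR errors vary from frame to frame.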
  • 11. Overview of algorithm
  • 12. Big challenge: how to aim the smartphone camera? If you are blind, you may have little idea where to aim the camera! (kReader Mobile User Guide has an entire section on “Learning to Aim Your Reader”) Also, text is best read when it is horizontal, but many blind users have trouble holding the camera horizontal.
  • 13. Help with aiming: UI features 1) Tilt detection function: allows user to vary pitch and yaw but forces roll to be zero. Issue vibration any time roll is far enough from zero. Allows user to point in any compass direction, and to aim high or low depending on whether text is above or below shoulder height. Increases chances that text appears horizontal in image.
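As a rough illustration of the tilt check described above, the following Python sketch estimates roll about the camera's optical axis from the gravity vector and decides whether to vibrate. The slides do not say how roll is measured, so everything here is an assumption: the phone is taken to be held in portrait orientation, with accelerometer x pointing to the right of the screen and y pointing to the top, and the 10-degree threshold and the `roll_degrees` / `should_vibrate` names are hypothetical.

```python
import math


def roll_degrees(gx, gy, min_gravity=1.0):
    """Estimate roll about the camera axis from accelerometer x and y (m/s^2).

    Assumes portrait orientation: upright and level gives gx = 0, gy > 0.
    Pitching the phone up or down shrinks gy but leaves gx at zero, so the
    estimate stays at 0 degrees. Returns None when the phone is nearly flat
    (camera pointing straight up or down), where roll is not well defined.
    """
    if math.hypot(gx, gy) < min_gravity:
        return None
    return math.degrees(math.atan2(gx, gy))


def should_vibrate(gx, gy, max_roll_deg=10.0):
    """Return True when roll is far enough from zero (assumed threshold)."""
    roll = roll_degrees(gx, gy)
    return roll is not None and abs(roll) > max_roll_deg
```

Because only the x and y components are used, the user remains free to change pitch and yaw, which matches the behavior described on the slide.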
  • 14. Help with aiming: UI features 2) Warning whenever text is close to being cut off: read aloud detected text in a low pitch. Red box = camera field of view → “Smoking” (low pitch)
  • 15. Help with aiming: UI features. Red box = camera field of view → “No smoking” (normal pitch)
  • 16. Help with aiming: UI features 3) Warning whenever text is small: read text in a high pitch → signal user to approach text for clearer view. Red box = camera field of view; sign reads NO SMOKING → “No smoking” (high pitch)
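The pitch cues on the three slides above can be summarized as a small decision rule. The Python sketch below is one possible reading, not the authors' code: the `speech_pitch` name, the `margin` and `min_height` thresholds, and the choice to let the cut-off warning take priority over the small-text warning are all assumptions. Only the 640 x 480 frame size comes from the slides.

```python
def speech_pitch(box, frame_w=640, frame_h=480, margin=10, min_height=20):
    """Choose a text-to-speech pitch cue for one detected text box.

    box is (left, top, right, bottom) in pixels.
    Returns "low" when the box is close to the edge of the camera frame
    (text may be cut off), "high" when the text is small (user should move
    closer), and "normal" otherwise. Thresholds are illustrative guesses.
    """
    left, top, right, bottom = box
    near_edge = (
        left <= margin
        or top <= margin
        or right >= frame_w - margin
        or bottom >= frame_h - margin
    )
    if near_edge:
        return "low"
    if bottom - top < min_height:
        return "high"
    return "normal"
```

For example, a box touching the left edge of the frame yields the low pitch, which corresponds to the case where only “Smoking” is read from a “No smoking” sign.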
  • 17. Experiments: Ten signs printed out and placed on two adjoining walls of a conference room. Two blind volunteer subjects, out of reach of the wall. Brief training session: purpose of experiment, how to hold and move camera.
  • 18. Subjects told to search for an unknown number of signs on the two walls, and to tell experimenter the content of each sign detected.
  • 19. Experimental results, Subject 1: • 6 signs reported perfectly correctly • 2 signs completely missed • 2 other signs reported with some errors: “Dr. Samuels” was detected as “Samuels” (audible to experimenter but not subject) • “Meeting in Session” sign gave rise to the words “Meeting” and “section” (though they were not uttered together)
  • 20. Experimental results, Subject 2: • 3 signs reported perfectly correctly • Typical errors: - “Exam Room 150” was detected and read aloud correctly, but subject was unable to understand the word “exam” - Reported “D L Samuels meeting in session” as a sign, which is an incorrect combination of two signs, “Dr. Samuels” (which the system misread as “Dr.”) and “Meeting in Session”
  • 21. Discussion: System still very difficult to use! False positives and false negatives (i.e., missed text) still a big problem → we are improving our text detection algorithm. Even when text is correctly detected, OCR still causes many errors. Slow processing speeds (plus camera motion blur) force user to pan camera very slowly.
  • 22. Discussion (continued): UI planned in the future: • Have user scan environment, sound an audio tone whenever text is detected • Compute an image mosaic (panorama) of entire scene, to seamlessly read text strings that don’t fit inside a single image frame • Cluster multiple text strings into distinct sign regions • User will be able to hear text-to-speech repeated for any sign region
  • 23. Discussion (continued): Further in the future: “Visual spam” is a big problem → task-driven search (“find me Dr. Smith’s office”). Finding signs will always be difficult at times (even for people with normal vision) → integration with “indoor GPS” (i.e., localization indoors) to provide useful, location-specific information.
  • 24. Thanks to… First author: Dr. Huiying Shen (Smith-Kettlewell). Collaborators: Dr. Roberto Manduchi (UC Santa Cruz), Dr. Vidya Murali and Dr. Ender Tekin (Smith-Kettlewell). Funding from NIH and NIDRR.