The document describes research on developing a real-time system to help visually impaired users find and read signs using a smartphone. The system uses computer vision algorithms to detect text in images and performs optical character recognition (OCR) to read the text aloud. Experiments with blind volunteers found the system was able to correctly read some signs but also had false positives and negatives. The discussion outlines plans to improve text detection and address other challenges like camera motion blur and limited processing speeds.
Real-Time Sign Reading for Visually Impaired
1. Towards A Real-Time System for
Finding and Reading Signs
for Visually Impaired Users
James Coughlan, Ph.D.
2. Informational signs
Signs are ubiquitous indoors and outdoors
Useful for wayfinding, finding shops and
businesses, and accessing a variety of services
But nearly all are inaccessible to blind and
visually impaired persons!
3. OCR (Optical Character
Recognition)
Originally developed for clear images of text
documents, acquired by a flatbed scanner
Not equipped to find text in an image with lots
of non-text clutter (buildings, trees, etc.)
4. Portable OCR for visually
impaired users
Smartphone (Nokia N82) implementation:
kReader Mobile, knfbReader Mobile (K-NFB
Reading Technology, Inc.)
5. kReader Mobile limitation
Assumes text comprises all (or most) of
image:
“Get as close to the text as you can without
cutting off any text, as it is displayed on the
screen”
“Distance from the target can greatly affect the
text recognition quality. Most, but not all,
documents should be approximately 10 inches
from the Reader.” (KNFB Mobile Reader User
Guide)
6. Related work
Much research on computer vision algorithms
for finding text in cluttered images
Very challenging problem
Even if text is correctly located in an image,
OCR still faces many problems:
• non-standard fonts
• poor illumination
• curved surfaces, perspective distortion
• other forms of noise in images
8. Related work (continued)
A small amount of work targeted specifically at
finding and reading text for blind and visually
impaired persons:
• C. Yi & Y. Tian, 2011
• “Smart Telescope” project from Blindsight
Corporation (www.blindsight.com): find text
regions and present enlarged text to low vision
user
9. Our approach
• Design algorithm to rapidly find text on
Android smartphone running in video mode
(640 x 480 pixels)
• Perform on-board OCR (Tesseract)
• Read aloud (text-to-speech) immediately
• For speed, all processing is done on-board
(no internet connection needed). Text is
read aloud at up to 1-2 frames per second.
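The 1-2 frames/second reading rate can be enforced with a simple rate limiter between the OCR stage and text-to-speech. This is an illustrative Python sketch, not the actual Android implementation; the class and parameter names (`ReadAloudThrottle`, `max_fps`) are hypothetical:

```python
import time

class ReadAloudThrottle:
    """Limit read-aloud announcements to at most max_fps per second,
    matching the slide's 1-2 frames/second reading rate."""

    def __init__(self, max_fps=2.0):
        self.min_interval = 1.0 / max_fps
        self.last = -float("inf")  # time of the last accepted frame

    def ready(self, now=None):
        """Return True (and record the time) if a new frame may be
        read aloud; False if we are still inside the quiet interval."""
        now = time.monotonic() if now is None else now
        if now - self.last >= self.min_interval:
            self.last = now
            return True
        return False
```

In a video-mode loop, frames that arrive while `ready()` returns False would simply be dropped rather than queued, so speech never lags far behind the camera.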
10. System UI (user interface)
• Philosophy: text detection/reading errors are
inevitable. To overcome them, have user
obtain multiple readings of each text sign
over time. Ignore spurious (unreproducible)
readings, and come to consensus about true
contents of each sign.
• If multiple text strings in one image, read
aloud in “raster” order (from top to bottom,
and along a line from left to right)
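Both UI ideas above — consensus over repeated readings, and raster-order announcement — are simple to express in code. A minimal sketch, assuming each detection is an `(x, y, text)` tuple in pixel coordinates and using a hypothetical `line_tol` to group detections into text lines:

```python
from collections import Counter

def raster_order(detections, line_tol=20):
    """Sort (x, y, text) detections top-to-bottom, then left-to-right
    within a line. Detections whose y values differ by less than
    line_tol pixels are treated as one line (illustrative tolerance)."""
    rows = sorted(detections, key=lambda d: d[1])
    out, line, line_y = [], [], None
    for d in rows:
        if line_y is None or d[1] - line_y < line_tol:
            line.append(d)
            line_y = d[1] if line_y is None else line_y
        else:
            out.extend(sorted(line, key=lambda d: d[0]))
            line, line_y = [d], d[1]
    out.extend(sorted(line, key=lambda d: d[0]))
    return [d[2] for d in out]

def consensus(readings, min_count=2):
    """Keep only strings seen at least min_count times across frames,
    discarding spurious (unreproducible) one-off readings."""
    counts = Counter(readings)
    return [s for s, n in counts.items() if n >= min_count]
```

For example, `raster_order([(200, 10, "NO"), (10, 12, "EXIT"), (15, 80, "SMOKING")])` yields `["EXIT", "NO", "SMOKING"]`, and a misread such as `"EXLT"` that appears in only one frame is dropped by `consensus`.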
12. Big challenge: how to aim the
smartphone camera?
If you are blind, you may have little idea where
to aim the camera! (kReader Mobile User
Guide has an entire section on “Learning to
Aim Your Reader”)
Also, text is best read when it is horizontal, but
many blind users have trouble holding camera
horizontal
13. Help with aiming: UI features
1) Tilt detection function:
allows user to vary pitch
and yaw but forces roll to
be zero. Issues a vibration
whenever roll deviates far
enough from zero.
Allows user to point in any compass direction,
and to aim high or low depending on whether
text is above or below shoulder height.
Increases chances that text appears
horizontal in image.
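The roll check can be computed from the accelerometer's gravity vector. This is a hypothetical Python sketch of the geometry (the real system would use the Android sensor APIs); `ROLL_TOL_DEG` is an illustrative threshold:

```python
import math

ROLL_TOL_DEG = 10.0  # hypothetical tolerance before vibrating

def roll_degrees(ax, ay):
    """Camera roll estimated from the accelerometer components
    (ax, ay) in the image plane. 0 degrees = level camera, so
    detected text lines appear horizontal in the frame."""
    return math.degrees(math.atan2(ax, ay))

def should_vibrate(ax, ay, tol=ROLL_TOL_DEG):
    """Vibrate whenever roll strays too far from zero; pitch and
    yaw are deliberately left unconstrained, as on the slide, so
    the user can still aim high, low, or in any compass direction."""
    return abs(roll_degrees(ax, ay)) > tol
```

With the phone held level, gravity lies along the y axis (`ax = 0`), roll is zero, and no vibration is issued; tilting toward 90 degrees triggers the warning.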
14. Help with aiming: UI features
2) Warning whenever text is close to being cut
off: read aloud detected text in a low pitch.
[Figure: red box = camera field of view; sign partly outside the box; “Smoking” read aloud in a low pitch]
15. Help with aiming: UI features
[Figure: red box = camera field of view; sign fully inside the box; “No smoking” read aloud in a normal pitch]
16. Help with aiming: UI features
3) Warning whenever text is small: read text in
a high pitch to signal the user to approach the
text for a clearer view
[Figure: red box = camera field of view; small “NO SMOKING” sign; “No smoking” read aloud in a high pitch]
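The three pitch cues (low = text may be cut off, high = text too small, normal = otherwise) amount to a small decision rule over the detected text box's geometry. A minimal sketch, assuming boxes are `(x, y, w, h)` in the 640x480 video frame; `margin` and `min_height` are illustrative thresholds, not values from the paper:

```python
def pitch_cue(box, frame_w=640, frame_h=480, margin=8, min_height=20):
    """Map a detected text box (x, y, w, h) to a TTS pitch cue:
    - 'low'   : box touches a frame edge, so text may be cut off
    - 'high'  : text is small, so the user should move closer
    - 'normal': text is fully in view at a readable size"""
    x, y, w, h = box
    if (x < margin or y < margin
            or x + w > frame_w - margin or y + h > frame_h - margin):
        return "low"
    if h < min_height:
        return "high"
    return "normal"
```

A box flush against the left edge maps to `"low"`, a 10-pixel-tall detection maps to `"high"`, and a large, centered box maps to `"normal"`.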
17. Experiments
Ten signs printed out and placed on two
adjoining walls of conference room
Two blind volunteer subjects, positioned out
of reach of the walls
Brief training session: purpose of
experiment, how to hold and move
camera
18. Experiments (continued)
Subjects told to search for an unknown
number of signs on the two walls, and to
tell experimenter content of each sign
detected
19. Experimental results
Subject 1:
• 6 signs reported perfectly correctly
• 2 signs completely missed
• 2 other signs reported with some errors:
- “Dr. Samuels” was detected as “Samuels”
(audible to experimenter but not to subject)
- “Meeting in Session” sign gave rise to
the words “Meeting” and “section” (though
they were not uttered together)
20. Experimental results
Subject 2:
• 3 signs reported perfectly correctly
• Typical errors:
- “Exam Room 150” was detected and
read aloud correctly, but subject was
unable to understand the word “exam”
- Reported “D L Samuels meeting in
session” as a sign, which is an incorrect
combination of two signs, “Dr. Samuels”
(which the system misread as “Dr.”) and
“Meeting in Session”
21. Discussion
System still very difficult to use!
False positives and false negatives (i.e.,
missed text) are still a big problem; we are
improving our text detection algorithm
Even when text is correctly detected, OCR still
causes many errors
Slow processing speeds (plus camera motion
blur) force user to pan camera very slowly
22. Discussion (continued)
UI planned in the future:
• Have user scan environment, sound an
audio tone whenever text is detected
• Compute an image mosaic (panorama) of
entire scene, to seamlessly read text strings
that don’t fit inside a single image frame
• Cluster multiple text strings into distinct sign
regions
• User will be able to hear text-to-speech
repeated for any sign region
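Clustering multiple text strings into distinct sign regions can be sketched as single-link grouping of text boxes by center proximity. This is an illustrative stand-in, not the authors' actual algorithm; `max_gap` is a hypothetical threshold:

```python
def cluster_signs(boxes, max_gap=50):
    """Greedily group text boxes (x, y, w, h) into sign regions:
    two boxes fall in the same region when their centers are within
    max_gap pixels on both axes. Simple single-link merging."""
    def center(b):
        return (b[0] + b[2] / 2, b[1] + b[3] / 2)

    def near(a, b):
        (ax, ay), (bx, by) = center(a), center(b)
        return abs(ax - bx) <= max_gap and abs(ay - by) <= max_gap

    clusters = []
    for b in boxes:
        merged = None
        for c in clusters:
            if any(near(b, other) for other in c):
                if merged is None:
                    c.append(b)      # join the first matching region
                    merged = c
                else:
                    merged.extend(c)  # b bridges two regions: fuse them
                    c.clear()
        clusters = [c for c in clusters if c]
        if merged is None:
            clusters.append([b])
    return clusters
```

Each resulting cluster would then get one text-to-speech announcement that the user can replay on demand, rather than hearing every string separately.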
23. Discussion (continued)
Further in the future:
“Visual spam” is a big problem: move toward
task-driven search (“find me Dr. Smith’s office”)
Finding signs will always be difficult at times
(even for people with normal vision):
integrate with “indoor GPS” (i.e., indoor
localization) to provide useful,
location-specific information
24. Thanks to…
First author: Dr. Huiying Shen (Smith-
Kettlewell)
Collaborators: Dr. Roberto Manduchi (UC
Santa Cruz), Dr. Vidya Murali and Dr.
Ender Tekin (Smith-Kettlewell)
Funding from NIH and NIDRR