Towards A Real-Time System for
  Finding and Reading Signs
  for Visually Impaired Users

     James Coughlan, Ph.D.
Informational signs
Signs are ubiquitous indoors and outdoors




Useful for wayfinding, finding shops and
businesses, accessing a variety of services
But nearly all are inaccessible to blind and
visually impaired persons!
OCR (Optical Character
          Recognition)
Originally developed for clear images of text
documents, acquired by a flatbed scanner
Not equipped to find text in an image with lots
of non-text clutter (buildings, trees, etc.)




Portable OCR for visually
         impaired users
Smartphone (Nokia N82) implementation:
kReader Mobile, knfbReader Mobile (K–NFB
Reading Technology, Inc.)




kReader Mobile limitation
Assumes text comprises all (or most) of
image:
“Get as close to the text as you can without
cutting off any text, as it is displayed on the
screen”
“Distance from the target can greatly affect the
text recognition quality. Most, but not all,
documents should be approximately 10 inches
from the Reader.” (KNFB Mobile Reader User
Guide)
Related work
Much research on computer vision algorithms
for finding text in cluttered images
Very challenging problem
Even if text is correctly located in an image,
many problems with OCR:
• non-standard fonts
• poor illumination
• curved surfaces, perspective distortion
• other forms of noise in images
Related work (continued)
Some smartphone apps find text, read it
and translate it in real time




Related work (continued)
A small amount of work targeted specifically at
finding and reading text for blind and visually
impaired persons:

• C. Yi & Y. Tian, 2011
• “Smart Telescope” project from Blindsight
Corporation (www.blindsight.com): find text
regions and present enlarged text to low vision
user
Our approach
• Design algorithm to rapidly find text on
  Android smartphone running in video mode
  (640 x 480 pixels)
• Perform on-board OCR (Tesseract)
• Read aloud (text-to-speech) immediately
• For speed, all processing is done on-board
  (no need for internet connection). Processes
  and reads aloud at up to 1-2 frames per second.


System UI (user interface)
• Philosophy: text detection/reading errors are
  inevitable. To overcome them, have user
  obtain multiple readings of each text sign
  over time. Ignore spurious (unreproducible)
  readings, and come to consensus about true
  contents of each sign.
• If multiple text strings in one image, read
  aloud in “raster” order (from top to bottom,
  and along a line from left to right)
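
The two UI rules above (consensus over repeated readings, and raster reading order) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `(text, x, y)` detection format, the `min_count` vote threshold and the `line_tolerance` value are all assumptions made for illustration.

```python
from collections import Counter


def consensus(readings, min_count=2):
    """Keep strings seen at least min_count times across frames.

    Readings are normalized (trimmed, lower-cased) before counting, so a
    one-off misread is dropped as spurious. min_count is an assumed value.
    """
    counts = Counter(r.strip().lower() for r in readings if r.strip())
    return [text for text, n in counts.most_common() if n >= min_count]


def raster_order(detections, line_tolerance=20):
    """Order (text, x, y) detections top to bottom, then left to right.

    Detections whose y coordinate is within line_tolerance pixels of the
    first detection on the current line are treated as the same line.
    """
    lines = []
    for det in sorted(detections, key=lambda d: d[2]):
        if lines and abs(det[2] - lines[-1][0][2]) <= line_tolerance:
            lines[-1].append(det)
        else:
            lines.append([det])
    return [d[0] for line in lines for d in sorted(line, key=lambda d: d[1])]
```

Exact string matching after normalization is the simplest possible voting rule; a more forgiving variant could group readings by edit distance instead.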

Overview of algorithm




Big challenge: how to aim the
     smartphone camera?
If you are blind, you may have little idea where
to aim the camera! (kReader Mobile User
Guide has an entire section on “Learning to
Aim Your Reader”)

Also, text is best read when it is horizontal, but
many blind users have trouble holding camera
horizontal
Help with aiming: UI features
1) Tilt detection function: allows user to vary
  pitch and yaw but forces roll to be zero.
  Vibrates whenever roll strays far enough from
  zero.
  Allows user to point in any compass direction,
  and to aim high or low depending on whether
  text is above or below shoulder height.
  Increases chances that text appears
  horizontal in image.
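
A roll check of this kind can be sketched from the gravity components reported by an accelerometer. The sketch below is an assumption-laden illustration, not the system's code: it assumes the phone is held upright with the camera's optical axis along the device z axis, so that roll is the rotation of gravity within the device x-y plane, and the 10 degree threshold is a made-up value.

```python
import math


def roll_degrees(ax, ay):
    """Roll about the camera's optical axis, in degrees.

    ax, ay are the accelerometer readings along the device x and y axes
    (assumed convention: y points up the screen, x to the right). Pitch
    moves gravity between y and z and yaw does not change gravity at all,
    so neither affects this angle.
    """
    return math.degrees(math.atan2(ax, ay))


def should_vibrate(ax, ay, threshold_deg=10.0):
    """True when roll is far enough from zero to warn the user."""
    return abs(roll_degrees(ax, ay)) > threshold_deg
```

When the camera points almost straight up or down, both components approach zero and the angle becomes unreliable; a fuller version would probably suppress the warning in that case.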
Help with aiming: UI features
2) Warning whenever text is close to being cut
off: read aloud detected text in a low pitch.
[Figure: red box = camera field of view;
spoken output: “Smoking” (low pitch)]


Help with aiming: UI features

[Figure: red box = camera field of view;
spoken output: “No smoking” (normal pitch)]



Help with aiming: UI features

3) Warning whenever text is small: read text in
a high pitch → signal user to approach text for
clearer view
[Figure: red box = camera field of view; sign text
“NO SMOKING”; spoken output: “No smoking” (high pitch)]
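
The two pitch warnings can be summarized as a small decision rule on the detected text's bounding box. The following is only a sketch under stated assumptions: the 640 x 480 frame size comes from the slides, but the `edge_margin` and `min_height` thresholds, the box format, and the choice to let the cut-off warning take priority over the small-text warning are all hypothetical.

```python
FRAME_W, FRAME_H = 640, 480  # video frame size given in the slides


def speech_pitch(box, edge_margin=10, min_height=20):
    """Pick a text-to-speech pitch for one detected text box.

    box is (left, top, right, bottom) in pixels.
    Returns "low" if the text is close to being cut off at a frame edge,
    "high" if the text is small, otherwise "normal".
    """
    left, top, right, bottom = box
    near_edge = (
        left < edge_margin
        or top < edge_margin
        or right > FRAME_W - edge_margin
        or bottom > FRAME_H - edge_margin
    )
    if near_edge:
        return "low"
    if bottom - top < min_height:
        return "high"
    return "normal"
```

For example, `speech_pitch((0, 200, 150, 260))` falls in the cut-off case because the box touches the left edge of the frame.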
Experiments
Ten signs printed out and placed on two
adjoining walls of conference room

Two blind volunteer subjects, out of reach
of wall

Brief training session: purpose of
experiment, how to hold and move
camera




Subjects told to search for an unknown
number of signs on the two walls, and to
tell experimenter content of each sign
detected
Experimental results
Subject 1:
• 6 signs reported perfectly correctly
• 2 signs completely missed
• 2 other signs reported with some errors:
- “Dr. Samuels” was detected as “Samuels”
(audible to experimenter but not subject)
- “Meeting in Session” sign gave rise to
the words “Meeting” and “section” (though
they were not uttered together)
Experimental results
Subject 2:
• 3 signs reported perfectly correctly
• Typical errors:
- “Exam Room 150” was detected and
read aloud correctly, but subject was
unable to understand the word “exam”
- Reported “D L Samuels meeting in
session” as a sign, which is an incorrect
combination of two signs, “Dr. Samuels”
(which the system misread as “Dr.”) and
“Meeting in Session”
Discussion
System still very difficult to use!
False positives and false negatives (i.e.,
  missed text) still a big problem → we are
  improving our text detection algorithm
Even when text is correctly detected, OCR still
  causes many errors
Slow processing speeds (plus camera motion
  blur) force user to pan camera very slowly

Discussion (continued)
UI planned in the future:
• Have user scan environment, sound an
  audio tone whenever text is detected
• Compute an image mosaic (panorama) of
  entire scene, to seamlessly read text strings
  that don’t fit inside a single image frame
• Cluster multiple text strings into distinct sign
  regions
• User will be able to hear text-to-speech
  repeated for any sign region
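
One simple way to cluster text strings into sign regions is to group bounding boxes that lie close together. The slides do not say which clustering method is planned, so the sketch below is purely illustrative: it uses single-link grouping on the pixel gap between boxes, and the `max_gap` threshold and function names are assumptions.

```python
def box_gap(a, b):
    """Gap in pixels between two (left, top, right, bottom) boxes.

    Returns 0 when the boxes overlap or touch.
    """
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return max(dx, dy)


def cluster_boxes(boxes, max_gap=30):
    """Group boxes into sign regions by proximity (single-link).

    A new box joins every existing cluster that has a member within
    max_gap pixels, merging those clusters into one.
    """
    clusters = []
    for box in boxes:
        touching, rest = [], []
        for cluster in clusters:
            close = any(box_gap(box, other) <= max_gap for other in cluster)
            (touching if close else rest).append(cluster)
        merged = [box] + [b for cluster in touching for b in cluster]
        clusters = rest + [merged]
    return clusters
```

Two text lines stacked 10 pixels apart end up in the same region with these settings, while a box on the far side of the frame forms its own region.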
Discussion (continued)
Further in the future:
“Visual spam” is a big problem → task-driven
  search (“find me Dr. Smith’s office”)
Finding signs will always be difficult at times
  (even for people with normal vision) →
  integration with “indoor GPS” (i.e.,
  localization indoors) to provide useful,
  location-specific information

Thanks to…
First author: Dr. Huiying Shen (Smith-Kettlewell)

Collaborators: Dr. Roberto Manduchi (UC
  Santa Cruz), Dr. Vidya Murali and Dr.
  Ender Tekin (Smith-Kettlewell)

Funding from NIH and NIDRR


