2. Abstract
A camera-based assistive text reading framework helps blind
persons read text labels and product packaging on hand-held
objects in their daily lives. To isolate the object in the
camera view, an efficient and effective motion-based method
is proposed to define a region of interest (ROI). Within the
extracted ROI, text localization and recognition are conducted
to acquire text information. The recognized text codes are
output to blind users as speech.
3. What is Assistive Technology?
“Any product, instrument, equipment or technical system
used by a disabled or elderly person, made specially or
existing on the market, aimed to prevent, compensate,
relieve or neutralise the deficiency, the inability or the handicap.”
4. Introduction
• Of the 314 million visually impaired people worldwide,
45 million are blind.
• Developments in computer vision, digital cameras, and
portable computers make it feasible to assist these
individuals by developing camera-based products that
combine computer vision technology with OCR systems.
• A few portable systems already exist, such as the portable
barcode reader, the pen scanner, and the kReader Mobile.
6. Drawbacks
• Cannot handle screen images with complex backgrounds.
• Hard to find the position of the barcode.
• Objects must be placed on a clear, dark surface and
must contain text.
7. Proposed method
• The camera-based label reader helps blind persons
read the names on product labels.
• The camera acts as the main vision component: it detects
the label image of the product, and the image is then
processed internally.
• The system separates the label from the image, identifies
the product, and finally pronounces the identified product
name through voice output.
8. • The received label image is then converted to text.
• Once the identified label name has been converted to text,
the converted text is displayed on a display unit
connected to the controller.
• The converted text is then converted to voice, so that the
user hears the label name through earphones connected to
the audio output (see the sketch below).
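A minimal sketch of this text-to-speech step, assuming the offline pyttsx3 library as a stand-in for the prototype's audio hardware (the label string and speech rate are illustrative):

```python
import pyttsx3  # offline text-to-speech engine

def speak_label(label_text: str) -> None:
    """Pronounce a recognized label name through the audio output."""
    engine = pyttsx3.init()
    engine.setProperty("rate", 150)  # slightly slower speech for clarity
    engine.say(label_text)
    engine.runAndWait()  # blocks until the speech has finished

speak_label("Tomato soup, 400 grams")
```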
10. • The scene capture component collects scenes
containing objects of interest in the form of images
or video.
• In this prototype, it corresponds to a camera
attached to a pair of sunglasses.
• The data processing component deploys the proposed
algorithms, starting with object-of-interest detection,
which selectively extracts the image of the object held
by the blind user from the cluttered background and
other neutral objects in the camera view.
11. It also performs text localization, to obtain image regions
containing text, and text recognition, to transform image-based
text information into readable codes.
• The audio output component informs the blind
user of the recognized text codes.
• A Bluetooth earpiece with a mini microphone is
employed for speech output.
12. Flowchart of the proposed framework to read text from hand-held objects for blind users: Scene Capture → Data Processor → Audio Output.
13. Object of interest
• A frame sequence v is captured by a camera worn by
the blind user.
• The user indicates the object of interest S by shaking
the object while recording:

S = (1/|v|) · Σᵢ R(vᵢ, B)

where vᵢ is the i-th frame in the captured sequence,
|v| is the number of frames,
B is the background estimated by motion-based object detection,
and R(vᵢ, B) is the foreground object calculated at each frame.
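A minimal numpy sketch of this averaging step, assuming each per-frame foreground mask R(vᵢ, B) has already been computed (one way to obtain the masks appears under background subtraction below); the 0.5 threshold is an illustrative choice:

```python
import numpy as np

def object_of_interest(foreground_masks):
    """S = (1/|v|) * sum_i R(v_i, B): average the binary
    foreground masks over the captured frame sequence."""
    masks = np.stack(foreground_masks).astype(np.float64)  # |v| x H x W
    s = masks.mean(axis=0)  # fraction of frames each pixel was foreground
    # Keep pixels that moved in at least half of the frames.
    return s > 0.5
```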
14. Text localization
• To extract the text region:

Xc = argmax_{s ∈ S} L(s)

where L(s) is the suitability response of the text layout model,
and Xc is the candidate text region selected from the object of interest S.
15. Object region detection
• To ensure that the hand-held object appears in the
camera view, a camera with a reasonably wide
angle is proposed (since the blind user may not aim
accurately).
• Users are asked to shake the hand-held objects
containing the text they wish to identify.
• A motion-based method is employed to localize the
objects against the cluttered background.
16. • A background subtraction (BGS) approach is used to
detect moving objects, as in video surveillance
systems with stationary cameras.
• This method is based on frame-to-frame variations.
• Since the background imagery is nearly constant across
frames, a Gaussian method is applied.
• The Gaussian mixture model method is robust to slow
lighting changes (a sketch follows this list).
• Texture information is employed to remove false-positive
foreground areas.
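A short OpenCV sketch of this step, assuming the standard cv2.createBackgroundSubtractorMOG2 Gaussian-mixture implementation; the history and variance-threshold parameters are illustrative, not taken from the paper:

```python
import cv2

def foreground_masks(video_path):
    """Yield a binary foreground mask R(v_i, B) per frame,
    using a Gaussian mixture background model."""
    subtractor = cv2.createBackgroundSubtractorMOG2(
        history=200, varThreshold=16, detectShadows=False)
    capture = cv2.VideoCapture(video_path)
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        mask = subtractor.apply(frame)  # 0 = background, 255 = moving object
        yield mask > 0
    capture.release()
```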
17. • Texture similarity between a candidate foreground region
and the background model is measured.
• If the textures are similar, the pixel distribution in
subsequent frames is more likely to belong to the
background model.
• To detect moving objects in a dynamic scene,
many adaptive BGS techniques have been
developed.
18. Localizing the image region of the hand-held object of interest. (a) Capturing images with a camera
mounted on a pair of sunglasses; (b) an example of a captured image; (c) detected moving areas in
the image while the user shakes the object; (d) detected region of the hand-held object for
further processing of text recognition.
19. Automatic text extraction
• Text extraction is based on two features:
Stroke orientation.
Edge distribution.
A sample of text strokes showing the relationship between stroke orientations and
gradient orientations at pixels on stroke boundaries. Blue arrows denote the stroke
orientations at the cross-sections; red arrows denote the gradient orientations at
the stroke boundaries.
20. Text stroke orientation
• Stroke orientation describes the local structure of
text characters.
• The stroke orientation is perpendicular to the
gradient orientation at stroke boundaries (a sketch of
this histogram follows below).
A text patch and its 16-bin histogram of quantized
stroke orientations.
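A simplified stand-in for this feature, assuming plain Sobel gradients and a mean-magnitude cutoff for stroke-boundary pixels (both choices are illustrative, not the paper's):

```python
import cv2
import numpy as np

def stroke_orientation_histogram(gray_patch, bins=16):
    """16-bin histogram of quantized stroke orientations for a text patch."""
    gx = cv2.Sobel(gray_patch, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray_patch, cv2.CV_64F, 0, 1, ksize=3)
    magnitude = np.hypot(gx, gy)
    # Stroke orientation is perpendicular to the gradient orientation,
    # so rotate by 90 degrees and fold into [0, pi).
    orientation = (np.arctan2(gy, gx) + np.pi / 2) % np.pi
    # Keep only strong-gradient pixels, i.e. likely stroke boundaries.
    strong = magnitude > magnitude.mean()
    hist, _ = np.histogram(orientation[strong], bins=bins, range=(0, np.pi))
    return hist / max(hist.sum(), 1)  # normalized 16-bin histogram
```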
21. Distribution of edge pixels
• Text characters appear in the form of stroke
boundaries.
• The edge distribution describes the density of a text region.
• It is used to distinguish text regions from
background regions.
• Edge detection is performed to obtain an edge map.
• The number of edge pixels in each row y and each column
x is calculated as NR(y) and NC(x), respectively.
22. Each pixel is labelled with the product of the number of
edge pixels in its row and in its column.
Then a 3×3 smoothing operator Wn is applied to obtain
the edge distribution feature map:

D(x, y) = Σₙ Wₙ · NR(yₙ) · NC(xₙ)

where (xₙ, yₙ) ranges over the neighbouring pixels of (x, y)
and each weight Wₙ = 1/9.
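A compact sketch of this feature map, assuming a Canny edge detector for the edge map (the thresholds are illustrative):

```python
import cv2
import numpy as np

def edge_distribution_map(gray):
    """D(x, y) = sum_n W_n * N_R(y_n) * N_C(x_n) with W_n = 1/9."""
    edges = cv2.Canny(gray, 100, 200)     # binary edge map
    n_r = (edges > 0).sum(axis=1)         # N_R(y): edge pixels per row
    n_c = (edges > 0).sum(axis=0)         # N_C(x): edge pixels per column
    # Label each pixel (x, y) with the product N_R(y) * N_C(x).
    product = np.outer(n_r, n_c).astype(np.float64)
    # A 3x3 box filter applies the uniform weights W_n = 1/9.
    return cv2.blur(product, (3, 3))
```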
23. Text recognition and audio output
• Text recognition is performed by off-the-shelf OCR
prior to the output of informative words from the
localized text regions.
• A text region labels the minimum rectangular area
accommodating the characters inside it, so the border
of the text region touches the edge boundaries of the
text characters.
24. • OCR generates better performance if text regions
are first assigned proper margin areas and
binarized to segment the text characters from the
background.
• Thus, each localized text region is enlarged by
extending its height and width by a fixed number of
pixels (see the sketch below).
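A hedged sketch of this preprocessing, assuming the Tesseract engine via pytesseract; the 10-pixel margin is chosen purely for illustration, since the exact enlargement is not restated above:

```python
import cv2
import pytesseract

def recognize_text_region(gray, region, margin=10):
    """Enlarge a localized text region, binarize it, and run OCR."""
    x, y, w, h = region
    # Add a margin so characters do not touch the region border.
    patch = gray[max(y - margin, 0): y + h + margin,
                 max(x - margin, 0): x + w + margin]
    # Otsu binarization segments text characters from the background.
    _, binary = cv2.threshold(patch, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return pytesseract.image_to_string(binary)
```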
25. Conclusion
• The framework reads printed text on hand-held
objects to assist blind persons.
• To solve the common aiming problem of blind users,
the method effectively distinguishes the object of
interest from the background and from other objects
in the camera view.
26. • To extract text regions from complex
backgrounds, a text localization algorithm based on
models of stroke orientation and edge distribution
is proposed.
• OCR is used to perform word recognition on the
localized text regions, and the result is transformed
into audio output for blind users.
27. References
• Base paper by Chucai Yi, Student Member, IEEE,
YingLi Tian, Senior Member, IEEE, and Aries Arditi.
• T. Phan, P. Shivakumara, and C. L. Tan, “A Laplacian
Method for Video Text Detection.”
• C. Stauffer and W. E. L. Grimson, “Adaptive Background
Mixture Models for Real-Time Tracking,” in Proc. IEEE Conf.
Comput. Vision Pattern Recognit., Fort Collins, CO, USA, 1999.