SlideShare a Scribd company logo
1 of 5
Download to read offline
International Journal of Trend in Scientific Research and Development (IJTSRD)
Volume 4 Issue 5, July-August 2020 Available Online: www.ijtsrd.com e-ISSN: 2456 – 6470
@ IJTSRD | Unique Paper ID – IJTSRD31508 | Volume – 4 | Issue – 5 | July-August 2020 Page 624
Text and Object Recognition using
Deep Learning for Visually Impaired People
R. Soniya1, B. Mounica1, A. Joshpin Shyamala1, Mr. D. Balakumaran2
2Assistant Professor, M.E,
1,2Department of Electronics and Communication Engineering,
1,2S. A. Engineering College, Chennai, Tamil Nadu, India
ABSTRACT
The main aim of this paper is to aid the visually impaired people with object
detection and text detection using deep learning. Object detection is done
using a convolution neural network and text recognition is done by optical
character recognition. The detected output is converted into speechusingtext
to the speech synthesizer. Object detection comprises of two methods. One is
object localization and the other is image classification. Image classification
refers to the prediction of classes of different objects within an image. Object
localization infers the location of objects using bounding boxes.
KEYWORDS: object detection, CNN, text detection, OCR, TTS
How to cite this paper: R. Soniya | B.
Mounica | A. Joshpin Shyamala | Mr. D.
Balakumaran "Text and Object
Recognition using Deep Learning for
Visually Impaired
People" Published in
International Journal
of Trend in Scientific
Research and
Development(ijtsrd),
ISSN: 2456-6470,
Volume-4 | Issue-5,
August 2020, pp.624-628, URL:
www.ijtsrd.com/papers/ijtsrd31508.pdf
Copyright © 2020 by author(s) and
International Journal of TrendinScientific
Research and Development Journal. This
is an Open Access
article distributed
under the terms of
the Creative Commons Attribution
License (CC BY 4.0)
(http://creativecommons.org/licenses/by
/4.0)
I. INTRODUCTION
Although much advancement has been made in the image
processing field, extracting objects from the color image is
not an easy task. This paper dealswith taking out the objects
and text alone from a frame. The object is recognized by
passing theinput video toa trainedmodelwhich leads to the
classification of objects between variousclasses.Themodelis
trained by MS-COCO (Common Objects in Context) [1] and
further tuned finely by using PASCAL VOC0712 [2] has some
pre-trained weights. It makes use of single-shot detectors
(SSD)and mobile nets result in fast and real-time object
detection.
Figure 1:Overall diagram for detection and recognition
A Text detectionmodelisbased ona fullyconvolution neural
network [2][3]. Itwas adopted tocentralize the text regions.
Generally, the feature of the actual input image is first
extracted. The one convolution is applied to output dense
per-pixel predictions of text to be present. For each positive
sample, the channel assumes a probability. As referred in
figure:1 The preprocessed image input is forwarded to the
neural network.
There it compares the input with the trained dataset after
detecting the object and text. It generates output as to what
object or text is that. This is followed by text to speech
conversion process to help the visually impaired person.
II. CONVOLUTION NEURAL NETWORK
CNN [3] is primarily used for pattern recognition within the
image, thus making it suitable for the image-focused taskby
reducing parameters to set up a model. It consists of three
layers namely, convolution layer, pooling, fully connected
layer. Figure 1: Shows simplified CNN architecture
comprising of feature extraction and classification.
Convolution layer performs convolution operation between
an array of input pixels and kernel of a particular size which
slides all over the input frame
IJTSRD31508
International Journal of Trend in Scientific Research and Development (IJTSRD) @ www.ijtsrd.com eISSN: 2456-6470
@ IJTSRD | Unique Paper ID – IJTSRD31508 | Volume – 4 | Issue – 5 | July-August 2020 Page 625
Figure 2: Simplified CNN Architecture
The resulting output is passed through the activation
function (Rectified linear unit) in-order to removenonlinear
functionalities figure (2). The Pooling layer simplifies the
overall dimension of the image by performing the down
sampling operation. It operates over each input activation
map and scales using the “MAX” function. Most of the CNN
contains a max-pooling layer with a kernel of 2x2 size.
Figure 3: Rectified linear unit (ReLU) activation
function
The output from the final pooling or convolution layer is
flattened and fedas input toa fully connected layer. The final
layer uses soft max activation function which is used to
classify based on probabilities obtained.
III. CAFFE OBJECT DETECTION MODELS
Caffe [4] (convolution Architecture for fast feature
embedding) is a deep learning framework that supports
different architecture for image classification and
segmentation. The following are the model used for object
recognition.
A. Single-shot detectors:
It is the combination of Yolo’s regression and a faster RCNN
mechanism [5]. It simplifies the complexity of the neural
network. The local feature extraction method is effective in
SSD where multiscale feature extraction. The recognition
speed of SSD is 59 fps (frames per second). It quenches the
need for real-timeimplementations. The only demeritisthat
its fragile detection of tiny objects. At the time of prediction,
the network generates a default box and adjusts it to match
with the object shape [6],[7].
Figure 3:localization and confidence using multiple
bounding boxes
B. Mobile Nets
While developing object detection networks we generally
utilize existing network architecture, suchas VGG or ResNet
The limitationis thatthese networkarchitecturescanbevery
huge in the order of 200-500MB. So it doesn't support
resource-constrained systems. Mobile nets differ from
traditional CNN through the usage of depth-wise separable
convolution (figure 4)[7],[8].
Figure 4: (Left) Standard convolutional layer with
batch normalization and ReLU. (Right) Depth- wise
separable convolution with depth-wise and pointwise
layers followed by batch normalization and ReLU
(figure and caption from Liu et al.).
Here, convolution is divided into two stages of 3x3 depth
wise convolutions followed bya 1x1 point wise convolution.
IV. TEXT RECOGNITION
Text recognition utilizes the help of OCR to detect the text
from an image. OCR is a popular model use to get the
character from the text. OCR is a self learning translator of
character in typed, printed or any documented and pictured
forms. It has various methods to recognize characters like
thinning thickening, feature extraction [9]. The major steps
corresponding to text detection comprises data collection,
pre processing, segmentation, features extracting and
classification. In case of first step, pre trained data sets or
own data sets can be used for the research. Pre processing
includes slant detection, slant correction, gray scale
conversion, normalization; noise removaletc.segmentationis
done to splits the words from sentences for better detection.
The feature extraction step extracts the features from the
separated words and compares it with features of trained
images to confirm with the matching classes. Finally
classifiers are used with respect to convenience for the
classification process.
Figure 5:Basic steps of OCR[10]
International Journal of Trend in Scientific Research and Development (IJTSRD) @ www.ijtsrd.com eISSN: 2456-6470
@ IJTSRD | Unique Paper ID – IJTSRD31508 | Volume – 4 | Issue – 5 | July-August 2020 Page 626
A. Data collection:
It involves acquiring the text from a frame in the form of
video of resolution using webcam [9]
B. Preprocessing:
Once the data is collected, various preprocessing techniques
are used to enhance and to improve the quality of the image.
It is an important stage prior to feature extraction because it
determines the quality of the image in the successive stages
[10].
C. Segmentation:
Segmentationisnothingbutseparating thevarious lettersof
a word. Only if there is a good segmentation, better
recognition rate can be obtained [11].
D. Feature extraction:
It is a stage where the required information is obtained by
using minimum number of resources. It also reduces the
overall dimensions of the image.
E. Classification:
It is process of categorizing a character into its suitable
category. There are different function to classify the
image[12],[13].
V. TEXT TO SPEECH SYNTHESIZER
Text to speech system (TTS) converts text into voice using a
synthesizer[14]which producesartificial speechofhuman.It
converts an arbitrary plain text into a corresponding
waveform.
The maincomponents of this systemare text processingand
speech generation [15]. The text processing unit is
responsible for producing a sequence of phonemic units
according to the input text and also includes text
normalization and tokenization. Then makes the text
into prosodic units, which is termed as phonetic (also called
as text-to-phoneme or grapheme-to-phoneme conversion).
These are then realized by speech generation system. A
general text to speech system is shown in figure 5[16]
Figure 5: A overview of TTS system
The Textnormalizationinvolvestransforming thewholetext
into standard format [17].Tokenization are used to split the
whole text intowords, keywords, phrasesand symbolsitalso
discards some punctuation like commas.
Speech synthesizer has to generate sounds to make those
words in input text. There are 40 phonemes available for 26
letters in English alphabet it is because some letters can be
read in multiple ways. But this may get harder for practical
implementation commonly called as prosody, in linguistics.
Within a word, a given phoneme may sounds differently
because the phonemes that come before and after it.
Thusgrapheme isthealternative approachwherewordsare
segmented into individual letters. Once the sequences of
words are converted into list of phonemes, then it will be
converted into speech. Following are the different speech
synthesis techniques.
A. Concatenative synthesis
It is a synthesizer that make use of preloaded record of
human voiceswhichisreferred toas units. Theunitscontains
lots of examples of different things, break the spoken
sentencesintowordsandthewordsinto phonemes[18].Thus
itcan easily rearrange tocreate newwordsandsentence.The
duration of units is defined and are in the range of 10 ms up
to 10 seconds.
B. formant synthesis
It is created by using additive synthesis and an acoustic
model instead of human speech input during runtime[19].
Factors like noise, frequencyandvoicearechangedwithtime
to produce signalsofartificial speech.Thisis often termedas
rules based synthesis. Most of the systems that use formant
synthesis have robotic or artificially generated voice which
are unique from human speech. It is trusted for its
intelligence and high speed, eliminating acoustic flaws that
usually infect concatenative systems. Visually impaired use
high speedsynthesizedspeech tonavigate throughcomputer
screensUsingscreen reader. Formant synthesizers normally
do not have a database of speech samples so, they are found
to be smaller programs than concatenative systems.
C. Articulatory synthesis
It is computational technique used for synthesizing speech
with respecttomodel ofhuman vocal tract(voiceprocess).It
is a mechanism to mimic a human voice[19].
VI. EXPERIMENTAL SETUP
A. Object Detection
To execute object detection using CNN it makes use of the
Caffe framework for deploying the object detection model.
The datasets include 20 classes [7], [20].
The model contains apre-trained. Caffe model file, many.
Proto txt files. Which has its parameterized structure within
the proto text file and also weights are predefined.
Thus a single frame of the input video is resized to the
dimension of 300x300(height & width) and forward passed
into the model. So that it can detect 19 objects in an image.
Multiple objects can be detected from a single frame.
B. trained dataset
The datasetcontainsthefollowingclassesincluding airplanes,
boats, bicycles, bottles, birds, buses, cars, cats, chairs, cows,
dining tables, dogs, horses, motorbikes, potted plants, people,
sofas, trains, and television monitors.
The probability during each detection is also checked. If the
confidence level is above the threshold level defined the
particularclass namewhich is referred toas label alongwith
their accuracy will be displayed and also a bounding box is
computed over the objects detected
And also the label detected will be created as a separate text
file which helps in producing the audio output of the object
detected.
International Journal of Trend in Scientific Research and Development (IJTSRD) @ www.ijtsrd.com eISSN: 2456-6470
@ IJTSRD | Unique Paper ID – IJTSRD31508 | Volume – 4 | Issue – 5 | July-August 2020 Page 627
A. labels and their accuracy during detection
TABLE I Different objects and text fonts that can be
detected
Since the labeling had to be done manually and also because
of the size of RAM we couldn’t produce enough data, or data
of assured quality, to properly train the neural network
model. We have reached the possible accuracy for a CNN
with images as input data, we have only created a model for
detecting 20 objects.
With the help of OCR it detects the different font styles with
approximate predictions of the word or sentence. Table
6.1(a) shows the list ofobjectsand Table 6.1(b) shows listof
text fonts which can be detected by the proposed algorithm.
Along with the detection italso produces voice output of the
detected object and text through the earphone connected to
the audio jack.
As far as the results are concerned the following are
observed to be achieved from the project. In case of text, itis
possible for the device to detect from either image inputs or
live reading of the texts. Itiscapable of recognizing standard
text formats of different fonts. Especially bold letters can be
detected fast and accurately. It is noted that words in light
background are recognized well with precision.
These results are based on the frame intensity and pixel
variations. The outcome is subject to vary in accuracy
depending on the camera specifications and focus done to
the object. The accuracy of the predicted objectsdifferdueto
different condition suchascontrast sensitivity, illumination,
noise and also it producesmaximumaccuracyif the objectis
closer to the camera in the range of 70 cm with good
lightning condition.
VII. CONCLUSION
This device iscapable ofacting as a smart device. It provides
visually impaired people with the list of objects in the
surrounding along with accuracy of prediction.Thereforethe
person will get to know about the hindrances in his path.
This is more like a voice assistant that tells name of the
things around us. Multiple objects in a single frame is
detected which is an added advantage.
REFERENCES
[1] T.-Y. Lin, J. Hays, M. Maire, P. Perona, S. Belongie, D.
Ramanan, L. Bourdev, L. Zitnick, R. Girshick and P.
Dollar, "MicrosoftCOCO: Common Objects in Context,"
arXiv:1405.0312v3, pp. 1-15, February 2015.
[2] X. Zhou et al., ”EAST: An efficient and accurate scene
text detector”, Proc-30th IEEEconf.Comput.vis. Pattern
Recoginition, CVPR 2017, vol.2017-janua, pp.2642-
2651, 2017.
[3] X. Liu, D. Liang, S. Yan, D. Chen, Y. Qiao and J. Yan,
”FOTS: Fast Oriented Text Spotting with a Unified
Network”, Proc. IEEEComput. Soc. Conf. Comput.Vis.
Pattern Recognit., pp.5676-5685, 2018.
[4] M. Everingham, L. V. Gool, C. K. I. Williams, J. Winn and
A. Zisserman, "The PASCAL Visual ObjectClasses(VOC)
Challenge," International Journal of Computer Vision,
vol. 88, no. 2, pp. 303-338, 2010. ****
[5] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet
classification with deep convo-lutional neural
networks. In: Advances in neural information
processing systems. pp. 1097–1105 (2012)****
[6] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y.
Fu, andA. C. Berg, “Ssd: Single shot multibox detector,”
in ECCV, 2016.(ssdku)***
[7] Liu, Wei, et al. "Ssd: Single shot multibox detector."
European conference on computer vision. Springer,
Cham, 2016.
[8] A. G. Howard, M. Zhu, B. Chen and D. Kalenichenko,
"MobileNets: Efficient ConvolutionalNeural Networks
for Mobile Vision," 2017.
[9] “mobilenet_ssd pretrained model”
https://github.com/chuanqi305/MobileNet-
SSD/blob/master/README.md
[10] A Survey on Optical Character Recognition System
Noman Islam, Zeeshan Islam, Nazia Noor, Journal of
Information & Communication Technology-JICTVol.10
Issue. 2, December 2016.
[11] Lund, W.B., Kennard, D.J., & Ringger, E.K. (2013).
Combining Multiple Thresholding Binarization Values
to Improve OCR Output presented in Document
Recognition and Retrieval XX Conference 2013,
California, USA, 2013. USA: SPIE.
[12] Shaikh, N. A., & Shaikh, Z. A, 2005, A generalized
thinning algorithm for cursive and non-cursive
language scripts presented in 9th International
Multitopic Conference IEEE INMIC, Pakistan, 2005.
Pakistan: IEEE
[13] Shaikh, N. A., Shaikh, Z. A., & Ali, G, 2008, Segmentation
of Arabic text into characters for recognitionpresented
in International Multi Topic Conference, IMTIC,
Jamshoro, Pakistan, 2008.
[14] Ciresan, D. C., Meier, U., Gambardella, L. M., &
Schmidhuber, J, 2011, Convolutional neural network
committees for handwritten character classification
presented in International Conference on Document
Analysis and Recognition, Beijing, China, 2011. USA:
IEEE.
International Journal of Trend in Scientific Research and Development (IJTSRD) @ www.ijtsrd.com eISSN: 2456-6470
@ IJTSRD | Unique Paper ID – IJTSRD31508 | Volume – 4 | Issue – 5 | July-August 2020 Page 628
[15] Archana Balyan, S. S. Agrwal and Amita Dev, Speech
Synthesis: Review, IJERT, ISSN 2278-0181 Vol. 2
(2013) p. 57 – 75.
[16] E. Cambria andB. White, “JumpingNLPcurves:areview
of natural language processing research,” IEEE
Computational Intelligence, vol. 9, no. 2, pp. 48–57,
2014.
[17] van Santen, Jan P. H.; Sproat, Richard W.; Olive,JosephP.;
Hirschberg, Julia (1997). Progress in Speech Synthesis.
Springer. ISBN 978-0-387-94701-3.
[18] Liu, F., Weng, F., Jiang, X.: A Broad-Coverage
Normalization System for Social Media Language. In:
Proceedings of the 50th Annual Meeting of the
AssociationforComputationalLinguistics:LongPapers-
Volume 1. Number July (2012) 1035–1044.
[19] Mark Tathamand Katherine Morton, Developments in
Speech Synthesis (John Wiley & Sons,Ltd. ISBN:0-470-
85538-X, 2005)
[20] “Deep learning for computer vision with Caffe and
cuDNN” NIVIDIA Developer Blog. October 16,2014
[21] (ocr) “A Complete Workflow for Development of
Bangla OCR” International Journal of Computer
Applications (0975 – 8887) Volume 21– No.9, May
2011

More Related Content

What's hot

Volume 2-issue-6-1974-1978
Volume 2-issue-6-1974-1978Volume 2-issue-6-1974-1978
Volume 2-issue-6-1974-1978Editor IJARCET
 
HUMAN COMPUTER INTERACTION ALGORITHM BASED ON SCENE SITUATION AWARENESS
HUMAN COMPUTER INTERACTION ALGORITHM BASED ON SCENE SITUATION AWARENESSHUMAN COMPUTER INTERACTION ALGORITHM BASED ON SCENE SITUATION AWARENESS
HUMAN COMPUTER INTERACTION ALGORITHM BASED ON SCENE SITUATION AWARENESScsandit
 
CONTENT RECOVERY AND IMAGE RETRIVAL IN IMAGE DATABASE CONTENT RETRIVING IN TE...
CONTENT RECOVERY AND IMAGE RETRIVAL IN IMAGE DATABASE CONTENT RETRIVING IN TE...CONTENT RECOVERY AND IMAGE RETRIVAL IN IMAGE DATABASE CONTENT RETRIVING IN TE...
CONTENT RECOVERY AND IMAGE RETRIVAL IN IMAGE DATABASE CONTENT RETRIVING IN TE...Editor IJMTER
 
UNSUPERVISED LEARNING MODELS OF INVARIANT FEATURES IN IMAGES: RECENT DEVELOPM...
UNSUPERVISED LEARNING MODELS OF INVARIANT FEATURES IN IMAGES: RECENT DEVELOPM...UNSUPERVISED LEARNING MODELS OF INVARIANT FEATURES IN IMAGES: RECENT DEVELOPM...
UNSUPERVISED LEARNING MODELS OF INVARIANT FEATURES IN IMAGES: RECENT DEVELOPM...ijscai
 
Paper id 25201491
Paper id 25201491Paper id 25201491
Paper id 25201491IJRAT
 
An interactive image segmentation using multiple user input’s
An interactive image segmentation using multiple user input’sAn interactive image segmentation using multiple user input’s
An interactive image segmentation using multiple user input’seSAT Publishing House
 
Image Classification using Deep Learning
Image Classification using Deep LearningImage Classification using Deep Learning
Image Classification using Deep Learningijtsrd
 
IRJET-MText Extraction from Images using Convolutional Neural Network
IRJET-MText Extraction from Images using Convolutional Neural NetworkIRJET-MText Extraction from Images using Convolutional Neural Network
IRJET-MText Extraction from Images using Convolutional Neural NetworkIRJET Journal
 
Gender Classification using SVM With Flask
Gender Classification using SVM With FlaskGender Classification using SVM With Flask
Gender Classification using SVM With FlaskAI Publications
 
An Iot Based Smart Manifold Attendance System
An Iot Based Smart Manifold Attendance SystemAn Iot Based Smart Manifold Attendance System
An Iot Based Smart Manifold Attendance SystemIJERDJOURNAL
 
IRJET- Implementation of Gender Detection with Notice Board using Raspberry Pi
IRJET- Implementation of Gender Detection with Notice Board using Raspberry PiIRJET- Implementation of Gender Detection with Notice Board using Raspberry Pi
IRJET- Implementation of Gender Detection with Notice Board using Raspberry PiIRJET Journal
 
IRJET - Explicit Content Detection using Faster R-CNN and SSD Mobilenet V2
 IRJET - Explicit Content Detection using Faster R-CNN and SSD Mobilenet V2 IRJET - Explicit Content Detection using Faster R-CNN and SSD Mobilenet V2
IRJET - Explicit Content Detection using Faster R-CNN and SSD Mobilenet V2IRJET Journal
 
IRJET- Deep Learning Techniques for Object Detection
IRJET-  	  Deep Learning Techniques for Object DetectionIRJET-  	  Deep Learning Techniques for Object Detection
IRJET- Deep Learning Techniques for Object DetectionIRJET Journal
 

What's hot (18)

Volume 2-issue-6-1974-1978
Volume 2-issue-6-1974-1978Volume 2-issue-6-1974-1978
Volume 2-issue-6-1974-1978
 
HUMAN COMPUTER INTERACTION ALGORITHM BASED ON SCENE SITUATION AWARENESS
HUMAN COMPUTER INTERACTION ALGORITHM BASED ON SCENE SITUATION AWARENESSHUMAN COMPUTER INTERACTION ALGORITHM BASED ON SCENE SITUATION AWARENESS
HUMAN COMPUTER INTERACTION ALGORITHM BASED ON SCENE SITUATION AWARENESS
 
CONTENT RECOVERY AND IMAGE RETRIVAL IN IMAGE DATABASE CONTENT RETRIVING IN TE...
CONTENT RECOVERY AND IMAGE RETRIVAL IN IMAGE DATABASE CONTENT RETRIVING IN TE...CONTENT RECOVERY AND IMAGE RETRIVAL IN IMAGE DATABASE CONTENT RETRIVING IN TE...
CONTENT RECOVERY AND IMAGE RETRIVAL IN IMAGE DATABASE CONTENT RETRIVING IN TE...
 
UNSUPERVISED LEARNING MODELS OF INVARIANT FEATURES IN IMAGES: RECENT DEVELOPM...
UNSUPERVISED LEARNING MODELS OF INVARIANT FEATURES IN IMAGES: RECENT DEVELOPM...UNSUPERVISED LEARNING MODELS OF INVARIANT FEATURES IN IMAGES: RECENT DEVELOPM...
UNSUPERVISED LEARNING MODELS OF INVARIANT FEATURES IN IMAGES: RECENT DEVELOPM...
 
Ijcet 06 08_005
Ijcet 06 08_005Ijcet 06 08_005
Ijcet 06 08_005
 
Ijetcas14 465
Ijetcas14 465Ijetcas14 465
Ijetcas14 465
 
Paper id 25201491
Paper id 25201491Paper id 25201491
Paper id 25201491
 
An interactive image segmentation using multiple user input’s
An interactive image segmentation using multiple user input’sAn interactive image segmentation using multiple user input’s
An interactive image segmentation using multiple user input’s
 
Image Classification using Deep Learning
Image Classification using Deep LearningImage Classification using Deep Learning
Image Classification using Deep Learning
 
50120130405017 2
50120130405017 250120130405017 2
50120130405017 2
 
IRJET-MText Extraction from Images using Convolutional Neural Network
IRJET-MText Extraction from Images using Convolutional Neural NetworkIRJET-MText Extraction from Images using Convolutional Neural Network
IRJET-MText Extraction from Images using Convolutional Neural Network
 
Paper
PaperPaper
Paper
 
Gender Classification using SVM With Flask
Gender Classification using SVM With FlaskGender Classification using SVM With Flask
Gender Classification using SVM With Flask
 
An Iot Based Smart Manifold Attendance System
An Iot Based Smart Manifold Attendance SystemAn Iot Based Smart Manifold Attendance System
An Iot Based Smart Manifold Attendance System
 
IRJET- Implementation of Gender Detection with Notice Board using Raspberry Pi
IRJET- Implementation of Gender Detection with Notice Board using Raspberry PiIRJET- Implementation of Gender Detection with Notice Board using Raspberry Pi
IRJET- Implementation of Gender Detection with Notice Board using Raspberry Pi
 
IRJET - Explicit Content Detection using Faster R-CNN and SSD Mobilenet V2
 IRJET - Explicit Content Detection using Faster R-CNN and SSD Mobilenet V2 IRJET - Explicit Content Detection using Faster R-CNN and SSD Mobilenet V2
IRJET - Explicit Content Detection using Faster R-CNN and SSD Mobilenet V2
 
F010433136
F010433136F010433136
F010433136
 
IRJET- Deep Learning Techniques for Object Detection
IRJET-  	  Deep Learning Techniques for Object DetectionIRJET-  	  Deep Learning Techniques for Object Detection
IRJET- Deep Learning Techniques for Object Detection
 

Similar to Text and Object Recognition using Deep Learning for Visually Impaired People

Deep Learning in Text Recognition and Text Detection : A Review
Deep Learning in Text Recognition and Text Detection : A ReviewDeep Learning in Text Recognition and Text Detection : A Review
Deep Learning in Text Recognition and Text Detection : A ReviewIRJET Journal
 
Image Captioning Generator using Deep Machine Learning
Image Captioning Generator using Deep Machine LearningImage Captioning Generator using Deep Machine Learning
Image Captioning Generator using Deep Machine Learningijtsrd
 
IMAGE SEGMENTATION AND ITS TECHNIQUES
IMAGE SEGMENTATION AND ITS TECHNIQUESIMAGE SEGMENTATION AND ITS TECHNIQUES
IMAGE SEGMENTATION AND ITS TECHNIQUESIRJET Journal
 
IRJET- Semantic Segmentation using Deep Learning
IRJET- Semantic Segmentation using Deep LearningIRJET- Semantic Segmentation using Deep Learning
IRJET- Semantic Segmentation using Deep LearningIRJET Journal
 
IRJET- Wearable AI Device for Blind
IRJET- Wearable AI Device for BlindIRJET- Wearable AI Device for Blind
IRJET- Wearable AI Device for BlindIRJET Journal
 
Voice Enable Blind Assistance System -Real time Object Detection
Voice Enable Blind Assistance System -Real time Object DetectionVoice Enable Blind Assistance System -Real time Object Detection
Voice Enable Blind Assistance System -Real time Object DetectionIRJET Journal
 
Object Detetcion using SSD-MobileNet
Object Detetcion using SSD-MobileNetObject Detetcion using SSD-MobileNet
Object Detetcion using SSD-MobileNetIRJET Journal
 
Secure IoT Systems Monitor Framework using Probabilistic Image Encryption
Secure IoT Systems Monitor Framework using Probabilistic Image EncryptionSecure IoT Systems Monitor Framework using Probabilistic Image Encryption
Secure IoT Systems Monitor Framework using Probabilistic Image EncryptionIJAEMSJORNAL
 
Object detection with deep learning
Object detection with deep learningObject detection with deep learning
Object detection with deep learningSushant Shrivastava
 
Automated Image Captioning – Model Based on CNN – GRU Architecture
Automated Image Captioning – Model Based on CNN – GRU ArchitectureAutomated Image Captioning – Model Based on CNN – GRU Architecture
Automated Image Captioning – Model Based on CNN – GRU ArchitectureIRJET Journal
 
IRJET- Face Recognition using Machine Learning
IRJET- Face Recognition using Machine LearningIRJET- Face Recognition using Machine Learning
IRJET- Face Recognition using Machine LearningIRJET Journal
 
Inpainting scheme for text in video a survey
Inpainting scheme for text in video   a surveyInpainting scheme for text in video   a survey
Inpainting scheme for text in video a surveyeSAT Journals
 
IRJET- Image Caption Generation System using Neural Network with Attention Me...
IRJET- Image Caption Generation System using Neural Network with Attention Me...IRJET- Image Caption Generation System using Neural Network with Attention Me...
IRJET- Image Caption Generation System using Neural Network with Attention Me...IRJET Journal
 
UNSUPERVISED LEARNING MODELS OF INVARIANT FEATURES IN IMAGES: RECENT DEVELOPM...
UNSUPERVISED LEARNING MODELS OF INVARIANT FEATURES IN IMAGES: RECENT DEVELOPM...UNSUPERVISED LEARNING MODELS OF INVARIANT FEATURES IN IMAGES: RECENT DEVELOPM...
UNSUPERVISED LEARNING MODELS OF INVARIANT FEATURES IN IMAGES: RECENT DEVELOPM...ijscai
 
Unsupervised learning models of invariant features in images: Recent developm...
Unsupervised learning models of invariant features in images: Recent developm...Unsupervised learning models of invariant features in images: Recent developm...
Unsupervised learning models of invariant features in images: Recent developm...IJSCAI Journal
 
Machine learning based augmented reality for improved learning application th...
Machine learning based augmented reality for improved learning application th...Machine learning based augmented reality for improved learning application th...
Machine learning based augmented reality for improved learning application th...IJECEIAES
 
DEEP LEARNING BASED BRAIN STROKE DETECTION
DEEP LEARNING BASED BRAIN STROKE DETECTIONDEEP LEARNING BASED BRAIN STROKE DETECTION
DEEP LEARNING BASED BRAIN STROKE DETECTIONIRJET Journal
 
Feature Extraction and Feature Selection using Textual Analysis
Feature Extraction and Feature Selection using Textual AnalysisFeature Extraction and Feature Selection using Textual Analysis
Feature Extraction and Feature Selection using Textual Analysisvivatechijri
 
Scene recognition using Convolutional Neural Network
Scene recognition using Convolutional Neural NetworkScene recognition using Convolutional Neural Network
Scene recognition using Convolutional Neural NetworkDhirajGidde
 

Similar to Text and Object Recognition using Deep Learning for Visually Impaired People (20)

Deep Learning in Text Recognition and Text Detection : A Review
Deep Learning in Text Recognition and Text Detection : A ReviewDeep Learning in Text Recognition and Text Detection : A Review
Deep Learning in Text Recognition and Text Detection : A Review
 
Image Captioning Generator using Deep Machine Learning
Image Captioning Generator using Deep Machine LearningImage Captioning Generator using Deep Machine Learning
Image Captioning Generator using Deep Machine Learning
 
IMAGE SEGMENTATION AND ITS TECHNIQUES
IMAGE SEGMENTATION AND ITS TECHNIQUESIMAGE SEGMENTATION AND ITS TECHNIQUES
IMAGE SEGMENTATION AND ITS TECHNIQUES
 
IRJET- Semantic Segmentation using Deep Learning
IRJET- Semantic Segmentation using Deep LearningIRJET- Semantic Segmentation using Deep Learning
IRJET- Semantic Segmentation using Deep Learning
 
IRJET- Wearable AI Device for Blind
IRJET- Wearable AI Device for BlindIRJET- Wearable AI Device for Blind
IRJET- Wearable AI Device for Blind
 
Voice Enable Blind Assistance System -Real time Object Detection
Voice Enable Blind Assistance System -Real time Object DetectionVoice Enable Blind Assistance System -Real time Object Detection
Voice Enable Blind Assistance System -Real time Object Detection
 
kanimozhi2019.pdf
kanimozhi2019.pdfkanimozhi2019.pdf
kanimozhi2019.pdf
 
Object Detetcion using SSD-MobileNet
Object Detetcion using SSD-MobileNetObject Detetcion using SSD-MobileNet
Object Detetcion using SSD-MobileNet
 
Secure IoT Systems Monitor Framework using Probabilistic Image Encryption
Secure IoT Systems Monitor Framework using Probabilistic Image EncryptionSecure IoT Systems Monitor Framework using Probabilistic Image Encryption
Secure IoT Systems Monitor Framework using Probabilistic Image Encryption
 
Object detection with deep learning
Object detection with deep learningObject detection with deep learning
Object detection with deep learning
 
Automated Image Captioning – Model Based on CNN – GRU Architecture
Automated Image Captioning – Model Based on CNN – GRU ArchitectureAutomated Image Captioning – Model Based on CNN – GRU Architecture
Automated Image Captioning – Model Based on CNN – GRU Architecture
 
IRJET- Face Recognition using Machine Learning
IRJET- Face Recognition using Machine LearningIRJET- Face Recognition using Machine Learning
IRJET- Face Recognition using Machine Learning
 
Inpainting scheme for text in video a survey
Inpainting scheme for text in video   a surveyInpainting scheme for text in video   a survey
Inpainting scheme for text in video a survey
 
IRJET- Image Caption Generation System using Neural Network with Attention Me...
IRJET- Image Caption Generation System using Neural Network with Attention Me...IRJET- Image Caption Generation System using Neural Network with Attention Me...
IRJET- Image Caption Generation System using Neural Network with Attention Me...
 
UNSUPERVISED LEARNING MODELS OF INVARIANT FEATURES IN IMAGES: RECENT DEVELOPM...
UNSUPERVISED LEARNING MODELS OF INVARIANT FEATURES IN IMAGES: RECENT DEVELOPM...UNSUPERVISED LEARNING MODELS OF INVARIANT FEATURES IN IMAGES: RECENT DEVELOPM...
UNSUPERVISED LEARNING MODELS OF INVARIANT FEATURES IN IMAGES: RECENT DEVELOPM...
 
Unsupervised learning models of invariant features in images: Recent developm...
Unsupervised learning models of invariant features in images: Recent developm...Unsupervised learning models of invariant features in images: Recent developm...
Unsupervised learning models of invariant features in images: Recent developm...
 
Machine learning based augmented reality for improved learning application th...
Machine learning based augmented reality for improved learning application th...Machine learning based augmented reality for improved learning application th...
Machine learning based augmented reality for improved learning application th...
 
DEEP LEARNING BASED BRAIN STROKE DETECTION
DEEP LEARNING BASED BRAIN STROKE DETECTIONDEEP LEARNING BASED BRAIN STROKE DETECTION
DEEP LEARNING BASED BRAIN STROKE DETECTION
 
Feature Extraction and Feature Selection using Textual Analysis
Feature Extraction and Feature Selection using Textual AnalysisFeature Extraction and Feature Selection using Textual Analysis
Feature Extraction and Feature Selection using Textual Analysis
 
Scene recognition using Convolutional Neural Network
Scene recognition using Convolutional Neural NetworkScene recognition using Convolutional Neural Network
Scene recognition using Convolutional Neural Network
 

More from ijtsrd

‘Six Sigma Technique’ A Journey Through its Implementation
‘Six Sigma Technique’ A Journey Through its Implementation‘Six Sigma Technique’ A Journey Through its Implementation
‘Six Sigma Technique’ A Journey Through its Implementationijtsrd
 
Edge Computing in Space Enhancing Data Processing and Communication for Space...
Edge Computing in Space Enhancing Data Processing and Communication for Space...Edge Computing in Space Enhancing Data Processing and Communication for Space...
Edge Computing in Space Enhancing Data Processing and Communication for Space...ijtsrd
 
Dynamics of Communal Politics in 21st Century India Challenges and Prospects
Dynamics of Communal Politics in 21st Century India Challenges and ProspectsDynamics of Communal Politics in 21st Century India Challenges and Prospects
Dynamics of Communal Politics in 21st Century India Challenges and Prospectsijtsrd
 
Assess Perspective and Knowledge of Healthcare Providers Towards Elehealth in...
Assess Perspective and Knowledge of Healthcare Providers Towards Elehealth in...Assess Perspective and Knowledge of Healthcare Providers Towards Elehealth in...
Assess Perspective and Knowledge of Healthcare Providers Towards Elehealth in...ijtsrd
 
The Impact of Digital Media on the Decentralization of Power and the Erosion ...
The Impact of Digital Media on the Decentralization of Power and the Erosion ...The Impact of Digital Media on the Decentralization of Power and the Erosion ...
The Impact of Digital Media on the Decentralization of Power and the Erosion ...ijtsrd
 
Online Voices, Offline Impact Ambedkars Ideals and Socio Political Inclusion ...
Online Voices, Offline Impact Ambedkars Ideals and Socio Political Inclusion ...Online Voices, Offline Impact Ambedkars Ideals and Socio Political Inclusion ...
Online Voices, Offline Impact Ambedkars Ideals and Socio Political Inclusion ...ijtsrd
 
Problems and Challenges of Agro Entreprenurship A Study
Problems and Challenges of Agro Entreprenurship A StudyProblems and Challenges of Agro Entreprenurship A Study
Problems and Challenges of Agro Entreprenurship A Studyijtsrd
 
Comparative Analysis of Total Corporate Disclosure of Selected IT Companies o...
Comparative Analysis of Total Corporate Disclosure of Selected IT Companies o...Comparative Analysis of Total Corporate Disclosure of Selected IT Companies o...
Comparative Analysis of Total Corporate Disclosure of Selected IT Companies o...ijtsrd
 
The Impact of Educational Background and Professional Training on Human Right...
The Impact of Educational Background and Professional Training on Human Right...The Impact of Educational Background and Professional Training on Human Right...
The Impact of Educational Background and Professional Training on Human Right...ijtsrd
 
A Study on the Effective Teaching Learning Process in English Curriculum at t...
A Study on the Effective Teaching Learning Process in English Curriculum at t...A Study on the Effective Teaching Learning Process in English Curriculum at t...
A Study on the Effective Teaching Learning Process in English Curriculum at t...ijtsrd
 
The Role of Mentoring and Its Influence on the Effectiveness of the Teaching ...
The Role of Mentoring and Its Influence on the Effectiveness of the Teaching ...The Role of Mentoring and Its Influence on the Effectiveness of the Teaching ...
The Role of Mentoring and Its Influence on the Effectiveness of the Teaching ...ijtsrd
 
Design Simulation and Hardware Construction of an Arduino Microcontroller Bas...
Design Simulation and Hardware Construction of an Arduino Microcontroller Bas...Design Simulation and Hardware Construction of an Arduino Microcontroller Bas...
Design Simulation and Hardware Construction of an Arduino Microcontroller Bas...ijtsrd
 
Sustainable Energy by Paul A. Adekunte | Matthew N. O. Sadiku | Janet O. Sadiku
Sustainable Energy by Paul A. Adekunte | Matthew N. O. Sadiku | Janet O. SadikuSustainable Energy by Paul A. Adekunte | Matthew N. O. Sadiku | Janet O. Sadiku
Sustainable Energy by Paul A. Adekunte | Matthew N. O. Sadiku | Janet O. Sadikuijtsrd
 
Concepts for Sudan Survey Act Implementations Executive Regulations and Stand...
Concepts for Sudan Survey Act Implementations Executive Regulations and Stand...Concepts for Sudan Survey Act Implementations Executive Regulations and Stand...
Concepts for Sudan Survey Act Implementations Executive Regulations and Stand...ijtsrd
 
Towards the Implementation of the Sudan Interpolated Geoid Model Khartoum Sta...
Towards the Implementation of the Sudan Interpolated Geoid Model Khartoum Sta...Towards the Implementation of the Sudan Interpolated Geoid Model Khartoum Sta...
Towards the Implementation of the Sudan Interpolated Geoid Model Khartoum Sta...ijtsrd
 
Activating Geospatial Information for Sudans Sustainable Investment Map
Activating Geospatial Information for Sudans Sustainable Investment MapActivating Geospatial Information for Sudans Sustainable Investment Map
Activating Geospatial Information for Sudans Sustainable Investment Mapijtsrd
 
Educational Unity Embracing Diversity for a Stronger Society
Educational Unity Embracing Diversity for a Stronger SocietyEducational Unity Embracing Diversity for a Stronger Society
Educational Unity Embracing Diversity for a Stronger Societyijtsrd
 
Integration of Indian Indigenous Knowledge System in Management Prospects and...
Integration of Indian Indigenous Knowledge System in Management Prospects and...Integration of Indian Indigenous Knowledge System in Management Prospects and...
Integration of Indian Indigenous Knowledge System in Management Prospects and...ijtsrd
 
DeepMask Transforming Face Mask Identification for Better Pandemic Control in...
DeepMask Transforming Face Mask Identification for Better Pandemic Control in...DeepMask Transforming Face Mask Identification for Better Pandemic Control in...
DeepMask Transforming Face Mask Identification for Better Pandemic Control in...ijtsrd
 
Streamlining Data Collection eCRF Design and Machine Learning
Streamlining Data Collection eCRF Design and Machine LearningStreamlining Data Collection eCRF Design and Machine Learning
Streamlining Data Collection eCRF Design and Machine Learningijtsrd
 

More from ijtsrd (20)

‘Six Sigma Technique’ A Journey Through its Implementation
‘Six Sigma Technique’ A Journey Through its Implementation‘Six Sigma Technique’ A Journey Through its Implementation
‘Six Sigma Technique’ A Journey Through its Implementation
 
Edge Computing in Space Enhancing Data Processing and Communication for Space...
Edge Computing in Space Enhancing Data Processing and Communication for Space...Edge Computing in Space Enhancing Data Processing and Communication for Space...
Edge Computing in Space Enhancing Data Processing and Communication for Space...
 
Dynamics of Communal Politics in 21st Century India Challenges and Prospects
Dynamics of Communal Politics in 21st Century India Challenges and ProspectsDynamics of Communal Politics in 21st Century India Challenges and Prospects
Dynamics of Communal Politics in 21st Century India Challenges and Prospects
 
Assess Perspective and Knowledge of Healthcare Providers Towards Elehealth in...
Assess Perspective and Knowledge of Healthcare Providers Towards Elehealth in...Assess Perspective and Knowledge of Healthcare Providers Towards Elehealth in...
Assess Perspective and Knowledge of Healthcare Providers Towards Elehealth in...
 
The Impact of Digital Media on the Decentralization of Power and the Erosion ...
The Impact of Digital Media on the Decentralization of Power and the Erosion ...The Impact of Digital Media on the Decentralization of Power and the Erosion ...
The Impact of Digital Media on the Decentralization of Power and the Erosion ...
 
Online Voices, Offline Impact Ambedkars Ideals and Socio Political Inclusion ...
Online Voices, Offline Impact Ambedkars Ideals and Socio Political Inclusion ...Online Voices, Offline Impact Ambedkars Ideals and Socio Political Inclusion ...
Online Voices, Offline Impact Ambedkars Ideals and Socio Political Inclusion ...
 
Problems and Challenges of Agro Entreprenurship A Study
Problems and Challenges of Agro Entreprenurship A StudyProblems and Challenges of Agro Entreprenurship A Study
Problems and Challenges of Agro Entreprenurship A Study
 
Comparative Analysis of Total Corporate Disclosure of Selected IT Companies o...
Comparative Analysis of Total Corporate Disclosure of Selected IT Companies o...Comparative Analysis of Total Corporate Disclosure of Selected IT Companies o...
Comparative Analysis of Total Corporate Disclosure of Selected IT Companies o...
 
The Impact of Educational Background and Professional Training on Human Right...
The Impact of Educational Background and Professional Training on Human Right...The Impact of Educational Background and Professional Training on Human Right...
The Impact of Educational Background and Professional Training on Human Right...
 
A Study on the Effective Teaching Learning Process in English Curriculum at t...
A Study on the Effective Teaching Learning Process in English Curriculum at t...A Study on the Effective Teaching Learning Process in English Curriculum at t...
A Study on the Effective Teaching Learning Process in English Curriculum at t...
 
The Role of Mentoring and Its Influence on the Effectiveness of the Teaching ...
The Role of Mentoring and Its Influence on the Effectiveness of the Teaching ...The Role of Mentoring and Its Influence on the Effectiveness of the Teaching ...
The Role of Mentoring and Its Influence on the Effectiveness of the Teaching ...
 
Design Simulation and Hardware Construction of an Arduino Microcontroller Bas...
Design Simulation and Hardware Construction of an Arduino Microcontroller Bas...Design Simulation and Hardware Construction of an Arduino Microcontroller Bas...
Design Simulation and Hardware Construction of an Arduino Microcontroller Bas...
 
Sustainable Energy by Paul A. Adekunte | Matthew N. O. Sadiku | Janet O. Sadiku
Sustainable Energy by Paul A. Adekunte | Matthew N. O. Sadiku | Janet O. SadikuSustainable Energy by Paul A. Adekunte | Matthew N. O. Sadiku | Janet O. Sadiku
Sustainable Energy by Paul A. Adekunte | Matthew N. O. Sadiku | Janet O. Sadiku
 
Concepts for Sudan Survey Act Implementations Executive Regulations and Stand...
Concepts for Sudan Survey Act Implementations Executive Regulations and Stand...Concepts for Sudan Survey Act Implementations Executive Regulations and Stand...
Concepts for Sudan Survey Act Implementations Executive Regulations and Stand...
 
Towards the Implementation of the Sudan Interpolated Geoid Model Khartoum Sta...
Towards the Implementation of the Sudan Interpolated Geoid Model Khartoum Sta...Towards the Implementation of the Sudan Interpolated Geoid Model Khartoum Sta...
Towards the Implementation of the Sudan Interpolated Geoid Model Khartoum Sta...
 
Activating Geospatial Information for Sudans Sustainable Investment Map
Activating Geospatial Information for Sudans Sustainable Investment MapActivating Geospatial Information for Sudans Sustainable Investment Map
Activating Geospatial Information for Sudans Sustainable Investment Map
 
Educational Unity Embracing Diversity for a Stronger Society
Educational Unity Embracing Diversity for a Stronger SocietyEducational Unity Embracing Diversity for a Stronger Society
Educational Unity Embracing Diversity for a Stronger Society
 
Integration of Indian Indigenous Knowledge System in Management Prospects and...
Integration of Indian Indigenous Knowledge System in Management Prospects and...Integration of Indian Indigenous Knowledge System in Management Prospects and...
Integration of Indian Indigenous Knowledge System in Management Prospects and...
 
DeepMask Transforming Face Mask Identification for Better Pandemic Control in...
DeepMask Transforming Face Mask Identification for Better Pandemic Control in...DeepMask Transforming Face Mask Identification for Better Pandemic Control in...
DeepMask Transforming Face Mask Identification for Better Pandemic Control in...
 
Streamlining Data Collection eCRF Design and Machine Learning
Streamlining Data Collection eCRF Design and Machine LearningStreamlining Data Collection eCRF Design and Machine Learning
Streamlining Data Collection eCRF Design and Machine Learning
 

Recently uploaded

Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Romantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxRomantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxsqpmdrvczh
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxChelloAnnAsuncion2
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
Atmosphere science 7 quarter 4 .........
Atmosphere science 7 quarter 4 .........Atmosphere science 7 quarter 4 .........
Atmosphere science 7 quarter 4 .........LeaCamillePacle
 
Planning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptxPlanning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptxLigayaBacuel1
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxDr.Ibrahim Hassaan
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfSpandanaRallapalli
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 

Recently uploaded (20)

TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Romantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxRomantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptx
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
Atmosphere science 7 quarter 4 .........
Atmosphere science 7 quarter 4 .........Atmosphere science 7 quarter 4 .........
Atmosphere science 7 quarter 4 .........
 
Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"
 
Planning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptxPlanning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptx
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptx
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdf
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 

Text and Object Recognition using Deep Learning for Visually Impaired People

  • 1. International Journal of Trend in Scientific Research and Development (IJTSRD) Volume 4 Issue 5, July-August 2020 Available Online: www.ijtsrd.com e-ISSN: 2456 – 6470 @ IJTSRD | Unique Paper ID – IJTSRD31508 | Volume – 4 | Issue – 5 | July-August 2020 Page 624 Text and Object Recognition using Deep Learning for Visually Impaired People R. Soniya1, B. Mounica1, A. Joshpin Shyamala1, Mr. D. Balakumaran2 2Assistant Professor, M.E, 1,2Department of Electronics and Communication Engineering, 1,2S. A. Engineering College, Chennai, Tamil Nadu, India ABSTRACT The main aim of this paper is to aid the visually impaired people with object detection and text detection using deep learning. Object detection is done using a convolution neural network and text recognition is done by optical character recognition. The detected output is converted into speechusingtext to the speech synthesizer. Object detection comprises of two methods. One is object localization and the other is image classification. Image classification refers to the prediction of classes of different objects within an image. Object localization infers the location of objects using bounding boxes. KEYWORDS: object detection, CNN, text detection, OCR, TTS How to cite this paper: R. Soniya | B. Mounica | A. Joshpin Shyamala | Mr. D. Balakumaran "Text and Object Recognition using Deep Learning for Visually Impaired People" Published in International Journal of Trend in Scientific Research and Development(ijtsrd), ISSN: 2456-6470, Volume-4 | Issue-5, August 2020, pp.624-628, URL: www.ijtsrd.com/papers/ijtsrd31508.pdf Copyright © 2020 by author(s) and International Journal of TrendinScientific Research and Development Journal. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0) (http://creativecommons.org/licenses/by /4.0) I. INTRODUCTION Although much advancement has been made in the image processing field, extracting objects from the color image is not an easy task. This paper dealswith taking out the objects and text alone from a frame. The object is recognized by passing theinput video toa trainedmodelwhich leads to the classification of objects between variousclasses.Themodelis trained by MS-COCO (Common Objects in Context) [1] and further tuned finely by using PASCAL VOC0712 [2] has some pre-trained weights. It makes use of single-shot detectors (SSD)and mobile nets result in fast and real-time object detection. Figure 1:Overall diagram for detection and recognition A Text detectionmodelisbased ona fullyconvolution neural network [2][3]. Itwas adopted tocentralize the text regions. Generally, the feature of the actual input image is first extracted. The one convolution is applied to output dense per-pixel predictions of text to be present. For each positive sample, the channel assumes a probability. As referred in figure:1 The preprocessed image input is forwarded to the neural network. There it compares the input with the trained dataset after detecting the object and text. It generates output as to what object or text is that. This is followed by text to speech conversion process to help the visually impaired person. II. CONVOLUTION NEURAL NETWORK CNN [3] is primarily used for pattern recognition within the image, thus making it suitable for the image-focused taskby reducing parameters to set up a model. It consists of three layers namely, convolution layer, pooling, fully connected layer. Figure 1: Shows simplified CNN architecture comprising of feature extraction and classification. Convolution layer performs convolution operation between an array of input pixels and kernel of a particular size which slides all over the input frame IJTSRD31508
  • 2. International Journal of Trend in Scientific Research and Development (IJTSRD) @ www.ijtsrd.com eISSN: 2456-6470 @ IJTSRD | Unique Paper ID – IJTSRD31508 | Volume – 4 | Issue – 5 | July-August 2020 Page 625 Figure 2: Simplified CNN Architecture The resulting output is passed through the activation function (Rectified linear unit) in-order to removenonlinear functionalities figure (2). The Pooling layer simplifies the overall dimension of the image by performing the down sampling operation. It operates over each input activation map and scales using the “MAX” function. Most of the CNN contains a max-pooling layer with a kernel of 2x2 size. Figure 3: Rectified linear unit (ReLU) activation function The output from the final pooling or convolution layer is flattened and fedas input toa fully connected layer. The final layer uses soft max activation function which is used to classify based on probabilities obtained. III. CAFFE OBJECT DETECTION MODELS Caffe [4] (convolution Architecture for fast feature embedding) is a deep learning framework that supports different architecture for image classification and segmentation. The following are the model used for object recognition. A. Single-shot detectors: It is the combination of Yolo’s regression and a faster RCNN mechanism [5]. It simplifies the complexity of the neural network. The local feature extraction method is effective in SSD where multiscale feature extraction. The recognition speed of SSD is 59 fps (frames per second). It quenches the need for real-timeimplementations. The only demeritisthat its fragile detection of tiny objects. At the time of prediction, the network generates a default box and adjusts it to match with the object shape [6],[7]. Figure 3:localization and confidence using multiple bounding boxes B. Mobile Nets While developing object detection networks we generally utilize existing network architecture, suchas VGG or ResNet The limitationis thatthese networkarchitecturescanbevery huge in the order of 200-500MB. So it doesn't support resource-constrained systems. Mobile nets differ from traditional CNN through the usage of depth-wise separable convolution (figure 4)[7],[8]. Figure 4: (Left) Standard convolutional layer with batch normalization and ReLU. (Right) Depth- wise separable convolution with depth-wise and pointwise layers followed by batch normalization and ReLU (figure and caption from Liu et al.). Here, convolution is divided into two stages of 3x3 depth wise convolutions followed bya 1x1 point wise convolution. IV. TEXT RECOGNITION Text recognition utilizes the help of OCR to detect the text from an image. OCR is a popular model use to get the character from the text. OCR is a self learning translator of character in typed, printed or any documented and pictured forms. It has various methods to recognize characters like thinning thickening, feature extraction [9]. The major steps corresponding to text detection comprises data collection, pre processing, segmentation, features extracting and classification. In case of first step, pre trained data sets or own data sets can be used for the research. Pre processing includes slant detection, slant correction, gray scale conversion, normalization; noise removaletc.segmentationis done to splits the words from sentences for better detection. The feature extraction step extracts the features from the separated words and compares it with features of trained images to confirm with the matching classes. Finally classifiers are used with respect to convenience for the classification process. Figure 5:Basic steps of OCR[10]
  • 3. International Journal of Trend in Scientific Research and Development (IJTSRD) @ www.ijtsrd.com eISSN: 2456-6470 @ IJTSRD | Unique Paper ID – IJTSRD31508 | Volume – 4 | Issue – 5 | July-August 2020 Page 626 A. Data collection: It involves acquiring the text from a frame in the form of video of resolution using webcam [9] B. Preprocessing: Once the data is collected, various preprocessing techniques are used to enhance and to improve the quality of the image. It is an important stage prior to feature extraction because it determines the quality of the image in the successive stages [10]. C. Segmentation: Segmentationisnothingbutseparating thevarious lettersof a word. Only if there is a good segmentation, better recognition rate can be obtained [11]. D. Feature extraction: It is a stage where the required information is obtained by using minimum number of resources. It also reduces the overall dimensions of the image. E. Classification: It is process of categorizing a character into its suitable category. There are different function to classify the image[12],[13]. V. TEXT TO SPEECH SYNTHESIZER Text to speech system (TTS) converts text into voice using a synthesizer[14]which producesartificial speechofhuman.It converts an arbitrary plain text into a corresponding waveform. The maincomponents of this systemare text processingand speech generation [15]. The text processing unit is responsible for producing a sequence of phonemic units according to the input text and also includes text normalization and tokenization. Then makes the text into prosodic units, which is termed as phonetic (also called as text-to-phoneme or grapheme-to-phoneme conversion). These are then realized by speech generation system. A general text to speech system is shown in figure 5[16] Figure 5: A overview of TTS system The Textnormalizationinvolvestransforming thewholetext into standard format [17].Tokenization are used to split the whole text intowords, keywords, phrasesand symbolsitalso discards some punctuation like commas. Speech synthesizer has to generate sounds to make those words in input text. There are 40 phonemes available for 26 letters in English alphabet it is because some letters can be read in multiple ways. But this may get harder for practical implementation commonly called as prosody, in linguistics. Within a word, a given phoneme may sounds differently because the phonemes that come before and after it. Thusgrapheme isthealternative approachwherewordsare segmented into individual letters. Once the sequences of words are converted into list of phonemes, then it will be converted into speech. Following are the different speech synthesis techniques. A. Concatenative synthesis It is a synthesizer that make use of preloaded record of human voiceswhichisreferred toas units. Theunitscontains lots of examples of different things, break the spoken sentencesintowordsandthewordsinto phonemes[18].Thus itcan easily rearrange tocreate newwordsandsentence.The duration of units is defined and are in the range of 10 ms up to 10 seconds. B. formant synthesis It is created by using additive synthesis and an acoustic model instead of human speech input during runtime[19]. Factors like noise, frequencyandvoicearechangedwithtime to produce signalsofartificial speech.Thisis often termedas rules based synthesis. Most of the systems that use formant synthesis have robotic or artificially generated voice which are unique from human speech. It is trusted for its intelligence and high speed, eliminating acoustic flaws that usually infect concatenative systems. Visually impaired use high speedsynthesizedspeech tonavigate throughcomputer screensUsingscreen reader. Formant synthesizers normally do not have a database of speech samples so, they are found to be smaller programs than concatenative systems. C. Articulatory synthesis It is computational technique used for synthesizing speech with respecttomodel ofhuman vocal tract(voiceprocess).It is a mechanism to mimic a human voice[19]. VI. EXPERIMENTAL SETUP A. Object Detection To execute object detection using CNN it makes use of the Caffe framework for deploying the object detection model. The datasets include 20 classes [7], [20]. The model contains apre-trained. Caffe model file, many. Proto txt files. Which has its parameterized structure within the proto text file and also weights are predefined. Thus a single frame of the input video is resized to the dimension of 300x300(height & width) and forward passed into the model. So that it can detect 19 objects in an image. Multiple objects can be detected from a single frame. B. trained dataset The datasetcontainsthefollowingclassesincluding airplanes, boats, bicycles, bottles, birds, buses, cars, cats, chairs, cows, dining tables, dogs, horses, motorbikes, potted plants, people, sofas, trains, and television monitors. The probability during each detection is also checked. If the confidence level is above the threshold level defined the particularclass namewhich is referred toas label alongwith their accuracy will be displayed and also a bounding box is computed over the objects detected And also the label detected will be created as a separate text file which helps in producing the audio output of the object detected.
  • 4. International Journal of Trend in Scientific Research and Development (IJTSRD) @ www.ijtsrd.com eISSN: 2456-6470 @ IJTSRD | Unique Paper ID – IJTSRD31508 | Volume – 4 | Issue – 5 | July-August 2020 Page 627 A. labels and their accuracy during detection TABLE I Different objects and text fonts that can be detected Since the labeling had to be done manually and also because of the size of RAM we couldn’t produce enough data, or data of assured quality, to properly train the neural network model. We have reached the possible accuracy for a CNN with images as input data, we have only created a model for detecting 20 objects. With the help of OCR it detects the different font styles with approximate predictions of the word or sentence. Table 6.1(a) shows the list ofobjectsand Table 6.1(b) shows listof text fonts which can be detected by the proposed algorithm. Along with the detection italso produces voice output of the detected object and text through the earphone connected to the audio jack. As far as the results are concerned the following are observed to be achieved from the project. In case of text, itis possible for the device to detect from either image inputs or live reading of the texts. Itiscapable of recognizing standard text formats of different fonts. Especially bold letters can be detected fast and accurately. It is noted that words in light background are recognized well with precision. These results are based on the frame intensity and pixel variations. The outcome is subject to vary in accuracy depending on the camera specifications and focus done to the object. The accuracy of the predicted objectsdifferdueto different condition suchascontrast sensitivity, illumination, noise and also it producesmaximumaccuracyif the objectis closer to the camera in the range of 70 cm with good lightning condition. VII. CONCLUSION This device iscapable ofacting as a smart device. It provides visually impaired people with the list of objects in the surrounding along with accuracy of prediction.Thereforethe person will get to know about the hindrances in his path. This is more like a voice assistant that tells name of the things around us. Multiple objects in a single frame is detected which is an added advantage. REFERENCES [1] T.-Y. Lin, J. Hays, M. Maire, P. Perona, S. Belongie, D. Ramanan, L. Bourdev, L. Zitnick, R. Girshick and P. Dollar, "MicrosoftCOCO: Common Objects in Context," arXiv:1405.0312v3, pp. 1-15, February 2015. [2] X. Zhou et al., ”EAST: An efficient and accurate scene text detector”, Proc-30th IEEEconf.Comput.vis. Pattern Recoginition, CVPR 2017, vol.2017-janua, pp.2642- 2651, 2017. [3] X. Liu, D. Liang, S. Yan, D. Chen, Y. Qiao and J. Yan, ”FOTS: Fast Oriented Text Spotting with a Unified Network”, Proc. IEEEComput. Soc. Conf. Comput.Vis. Pattern Recognit., pp.5676-5685, 2018. [4] M. Everingham, L. V. Gool, C. K. I. Williams, J. Winn and A. Zisserman, "The PASCAL Visual ObjectClasses(VOC) Challenge," International Journal of Computer Vision, vol. 88, no. 2, pp. 303-338, 2010. **** [5] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convo-lutional neural networks. In: Advances in neural information processing systems. pp. 1097–1105 (2012)**** [6] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, andA. C. Berg, “Ssd: Single shot multibox detector,” in ECCV, 2016.(ssdku)*** [7] Liu, Wei, et al. "Ssd: Single shot multibox detector." European conference on computer vision. Springer, Cham, 2016. [8] A. G. Howard, M. Zhu, B. Chen and D. Kalenichenko, "MobileNets: Efficient ConvolutionalNeural Networks for Mobile Vision," 2017. [9] “mobilenet_ssd pretrained model” https://github.com/chuanqi305/MobileNet- SSD/blob/master/README.md [10] A Survey on Optical Character Recognition System Noman Islam, Zeeshan Islam, Nazia Noor, Journal of Information & Communication Technology-JICTVol.10 Issue. 2, December 2016. [11] Lund, W.B., Kennard, D.J., & Ringger, E.K. (2013). Combining Multiple Thresholding Binarization Values to Improve OCR Output presented in Document Recognition and Retrieval XX Conference 2013, California, USA, 2013. USA: SPIE. [12] Shaikh, N. A., & Shaikh, Z. A, 2005, A generalized thinning algorithm for cursive and non-cursive language scripts presented in 9th International Multitopic Conference IEEE INMIC, Pakistan, 2005. Pakistan: IEEE [13] Shaikh, N. A., Shaikh, Z. A., & Ali, G, 2008, Segmentation of Arabic text into characters for recognitionpresented in International Multi Topic Conference, IMTIC, Jamshoro, Pakistan, 2008. [14] Ciresan, D. C., Meier, U., Gambardella, L. M., & Schmidhuber, J, 2011, Convolutional neural network committees for handwritten character classification presented in International Conference on Document Analysis and Recognition, Beijing, China, 2011. USA: IEEE.
  • 5. International Journal of Trend in Scientific Research and Development (IJTSRD) @ www.ijtsrd.com eISSN: 2456-6470 @ IJTSRD | Unique Paper ID – IJTSRD31508 | Volume – 4 | Issue – 5 | July-August 2020 Page 628 [15] Archana Balyan, S. S. Agrwal and Amita Dev, Speech Synthesis: Review, IJERT, ISSN 2278-0181 Vol. 2 (2013) p. 57 – 75. [16] E. Cambria andB. White, “JumpingNLPcurves:areview of natural language processing research,” IEEE Computational Intelligence, vol. 9, no. 2, pp. 48–57, 2014. [17] van Santen, Jan P. H.; Sproat, Richard W.; Olive,JosephP.; Hirschberg, Julia (1997). Progress in Speech Synthesis. Springer. ISBN 978-0-387-94701-3. [18] Liu, F., Weng, F., Jiang, X.: A Broad-Coverage Normalization System for Social Media Language. In: Proceedings of the 50th Annual Meeting of the AssociationforComputationalLinguistics:LongPapers- Volume 1. Number July (2012) 1035–1044. [19] Mark Tathamand Katherine Morton, Developments in Speech Synthesis (John Wiley & Sons,Ltd. ISBN:0-470- 85538-X, 2005) [20] “Deep learning for computer vision with Caffe and cuDNN” NIVIDIA Developer Blog. October 16,2014 [21] (ocr) “A Complete Workflow for Development of Bangla OCR” International Journal of Computer Applications (0975 – 8887) Volume 21– No.9, May 2011