Presentation
on
Detection & Recognition of Text Using YOLO Based
Framework
By
Nisarg Gandhewar
S. R. Tandan
Rohit Miri
Contents
•Introduction
•Motivation
•Challenges
•Review of Literature
• Identified Research Gap
• Objectives
•Methodology
•Result & Discussion
• Future Scope
• Conclusion
•References
By Mr Nisarg Gandhewar
Introduction
•The growing use of smartphones to capture images in day-to-day life creates a need to
recognize text in natural images.
•Text in natural scenes appears in almost every aspect of our daily life.
•It is an active research topic in computer vision because of its real-world applications,
such as driverless cars and industrial automation.
•Recognizing text in natural images is still a difficult task because of a series of major
challenges.
Motivation
Robot Vision
Self Driving Car
Visual Question Answering
Image Annotation and Retrieval
Challenges
•Text density
•Structure of text: text on a page is structured, mostly in strict rows, while text in
the wild may be sprinkled everywhere, in different rotations.
•Fonts:
•Artifacts: outdoor pictures are much noisier than images from a scanner.
•Location: some tasks include cropped/centred text, while in others text may be
located anywhere in the image.
•Diversity in fonts, scales, and orientations of text.
• Complexity and Interference of Backgrounds.
• Imperfect Imaging Conditions
•Ignorance of some Text Part
• Multi-Language
•Robust Reading Competition: provides a platform that bridges the document analysis
community and the computer vision community.
Review of Literature
•Classic Computer Vision Techniques
Connected component (CC) based, Sliding Window based, Texture based
•Segmentation Based Techniques
PSENet
•General Object Detection Based Techniques:
SSD, RetinaNet, RCNN, Fast RCNN, Faster RCNN, YOLO
Sr No | Author Name | Year | Proposed Work | Remark
1 | Wang et al. | 2019 | Applied a CNN model with a sliding window (SW) scheme to obtain candidate lines of text in a given image and thus estimate text locations. | Classic technique
2 | Yao et al. | 2016 | Treat text detection as a semantic segmentation problem. They use an FCN model based on holistically-nested edge detection (HED) to produce global maps that include information on text regions, individual characters and their relationships. | Segmentation based technique
3 | Deng et al. | 2018 | Dan Deng et al. proposed PixelLink, an instance segmentation based technique in which text instances are first separated by linking pixels that belong to the same instance. Bounding boxes are then obtained directly from the segmentation output, without location regression. | Segmentation based technique
4 | Wang et al. | 2019 | Presented PSENet to locate text instances of arbitrary shape. It produces kernels of different sizes for every text instance and gradually expands the smallest-scale kernel to the complete shape of the text instance. | Segmentation based technique
5 | Deng et al. | 2019 | Linjie Deng et al. proposed a RetinaNet-based technique for arbitrarily oriented text detection that aims to incorporate the learning mechanism borrowed from the two-stage RCNN structure into a one-stage detector. | Object detection based technique, two stage detector
6 | Adarsh et al. | 2020 | Pranav Adarsh et al. proposed YOLO v3-Tiny, an improved one-stage model based on YOLO that speeds up object detection while preserving the precision of the outcome [2]. | Object detection based technique, one stage detector
7 | Liu et al. | 2016 | Presented a Single-Shot Detector (SSD)-like architecture that is used to extract features and to perform text/non-text prediction as well as link prediction. | Object detection based technique, one stage detector
8 | Liao et al. | 2017 | Minghui Liao et al. presented TextBoxes, a text detection technique that detects text in a single network, with no post-processing other than non-maximum suppression. | Object detection based technique
9 | Liao et al. | 2018 | Minghui Liao et al. presented TextBoxes++, an SSD-based approach for multi-oriented scene text detection with both high precision and efficiency. | Object detection based technique, one stage detector
Identified Research Gap
•Existing techniques for detecting text in scene images trade off speed against
precision.
•The resulting models are large.
•There is scope to improve accuracy.
Objectives
•To detect & recognize text from natural scene images under different
conditions.
•To develop an approach exploring multiple real world datasets.
•To read text in different orientations.
•To explore deep learning framework for detection & recognition of text.
•To improve the speed and accuracy of text detection.
•To reduce the size of model.
Methodology For Text Detection
Pre-processing Steps
•Image Annotation
•Image Pre-processing Operations:
  • Orienting
  • Resizing
  • Auto-adjusting contrast
•Image Augmentation:
  • Shear with a 15 degree angle
  • Brightness with 25%
  • Saturation with 25%
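The shear and brightness augmentations listed above can be illustrated with a minimal Python sketch. It assumes that the shear is horizontal, that "Brightness with 25%" means scaling pixel values by a factor of up to 25% in either direction, and that boxes are axis-aligned (x1, y1, x2, y2) tuples. The function names are hypothetical and are not taken from the tooling used in the presentation. Saturation is left out because it needs a colour-space conversion.

```python
import math


def shear_point(x, y, angle_deg=15.0):
    """Horizontal shear (assumed direction): x' = x + tan(angle) * y."""
    return x + math.tan(math.radians(angle_deg)) * y, y


def shear_box(box, angle_deg=15.0):
    """Shear the four corners of an axis-aligned box (x1, y1, x2, y2)
    and return the axis-aligned box that encloses the sheared corners."""
    x1, y1, x2, y2 = box
    corners = [shear_point(x, y, angle_deg) for x in (x1, x2) for y in (y1, y2)]
    xs = [c[0] for c in corners]
    ys = [c[1] for c in corners]
    return min(xs), min(ys), max(xs), max(ys)


def adjust_brightness(pixels, factor):
    """Scale 8-bit pixel values by `factor` and clamp them to [0, 255]."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in pixels]


# Hypothetical annotation box and a tiny 2x2 gray-scale patch.
box = (10, 20, 110, 52)
sheared = shear_box(box)                                     # 15 degree shear
brighter = adjust_brightness([[100, 200], [240, 0]], 1.25)   # +25% brightness
```

When an image is sheared, the annotation boxes have to be transformed with it, which is why the sketch works on box corners as well as on pixels.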
Model Tuning and HyperParameters
•YOLOv4:
Backbone: CSPDarknet53
Neck: Path aggregation network (PANet)
Hyperparameters:
batch = 64, subdivisions = 32, width = 608, height = 608, channels = 3,
momentum = 0.949, decay = 0.0005, saturation = 1.5, exposure = 1.5, hue = 0.1,
learning rate = 0.00261, maximum batches = 4000, and filters = 18.
Here we use only the first 137 of the 162 layers.
•YOLOv5:
Backbone: Cross Stage Partial Network (CSPNet)
Neck: Path aggregation network (PANet)
Hyperparameters:
batch = 16, subdivisions = 32, width = 416, height = 416, momentum = 0.1,
learning rate = 0.00261, maximum batches = 4000.
Here we use the YOLOv5x pre-trained weights.
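The value filters = 18 in the YOLOv4 settings is consistent with the usual Darknet rule for the convolution layer before each YOLO head, filters = (classes + 5) x 3, if the detector is trained on a single "text" class. The slides do not state the class count, so that is an inference. The short sketch below works through this arithmetic and the batch/subdivisions relation; the helper name is hypothetical.

```python
def yolo_head_filters(num_classes, anchors_per_scale=3):
    """Each anchor predicts 4 box coordinates, 1 objectness score
    and one score per class."""
    return (num_classes + 5) * anchors_per_scale


# Assumption: a single class ("text"), inferred from filters = 18.
filters = yolo_head_filters(num_classes=1)

# In Darknet, one batch is split into `subdivisions` mini-batches.
batch, subdivisions = 64, 32
images_per_pass = batch // subdivisions
```

With batch = 64 and subdivisions = 32, two images are processed per forward pass, which keeps GPU memory use low at the 608 x 608 input size.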
•Detectron2:
Backbone: Base-RCNN-FPN
Neck: Region-Proposal-Network
Hyperparameters:
batch=64, subdivisions=32, width=416, height=416, channel=3,
momentum = 0.1, learning rate = 0.001, maximum batches = 4000.
Here we use the pre-trained weights of the X101-FPN model.
Text Detection Algorithm
Algorithm 1: Text Detection
Input: Image (I)
Output: Detected text regions
For n = 1 to N do
    Divide I into a G x G grid
    For each grid cell do
        Generate V bounding boxes from the anchor boxes
        For each box V do
            Compute confidence score (C) and class probability (P)
            IOU = Area of Intersection / Area of Union
            If IOU > 0.5 then
                Keep V and apply NMS (P)
            Else
                Ignore V
            End
        End
    End
End
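The IOU test and the NMS step in Algorithm 1 can be sketched in plain Python as follows. This is a minimal illustration, not the implementation used by YOLOv4, YOLOv5 or Detectron2. It assumes axis-aligned boxes given as (x1, y1, x2, y2) and uses the slide's 0.5 value as the overlap threshold for suppression; the example boxes and scores are made up.

```python
def iou(a, b):
    """Intersection over union of two boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression: visit boxes from highest to lowest
    score and keep a box only if it does not overlap an already kept box
    by more than `iou_threshold`. Returns indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two heavily overlapping detections of one word and one separate word.
boxes = [(10, 10, 110, 40), (12, 12, 112, 42), (200, 50, 260, 80)]
scores = [0.9, 0.8, 0.75]
kept = nms(boxes, scores)
```

Here the second box overlaps the first with an IOU of about 0.84, so it is suppressed and only the first and third boxes are kept.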
Methodology For Text Recognition
Data Preprocessing
•Read the image and convert it into a gray-scale image.
•Bring all images to size (128,32) by padding.
•Expand the image dimensions to (128,32,1) to make it compatible with the input
shape of the architecture.
•Normalize the image pixel values by dividing them by 255.
To preprocess the output labels:
•Read the text from the name of the image.
•Encode each character of a word as a numerical value using a mapping
function (such as 'a':0, 'b':1, ..., 'z':25).
For example, the word 'abab' is encoded as the label [0,1,0,1].
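The label encoding and image preparation steps above can be sketched in plain Python. The alphabet is assumed to be the lowercase letters used in the slide's example (the real character set is probably larger, since the results include uppercase words). The image is assumed to be a list of rows of 8-bit gray values that already fits inside 32 rows by 128 columns, so a larger image would need resizing first. White (255) padding is also an assumption, and the function names are hypothetical.

```python
import string

CHARS = string.ascii_lowercase  # assumed alphabet: 'a' -> 0, ..., 'z' -> 25


def encode_label(word):
    """Map each character of a word to its index in CHARS."""
    return [CHARS.index(ch) for ch in word]


def pad_and_normalize(gray, width=128, height=32, pad_value=255):
    """Pad a gray-scale image to height x width, add a channel axis
    and scale the pixel values to [0, 1]."""
    rows = [list(row) + [pad_value] * (width - len(row)) for row in gray]
    rows += [[pad_value] * width for _ in range(height - len(rows))]
    return [[[p / 255.0] for p in row] for row in rows]


encoded = encode_label("abab")
img = pad_and_normalize([[0, 128], [255, 64]])  # tiny 2x2 stand-in image
```

The result has 32 rows, 128 columns and one channel, matching the network input of height 32 and width 128.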
Network Architecture
•The input image has height 32 and width 128.
•Seven convolution layers are used, of which six have kernel size (3,3)
and the last one has kernel size (2,2). The number of filters increases from 64 to
512 layer by layer.
•Two max-pooling layers of size (2,2) are added, followed by two max-pooling
layers of size (2,1), which keep a larger width in the extracted features so that
long texts can be predicted.
•Batch normalization layers are used after the fifth and sixth convolution layers
to accelerate the training process.
•Two bidirectional LSTM layers follow, each with 128 units.
Loss Function: Connectionist Temporal Classification (CTC)
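To see why the (2,1) pooling layers matter, the following sketch traces the feature-map height and width through the pooling layers and the final convolution. The slides do not give padding or strides, so the sketch assumes that the (3,3) convolutions use "same" padding (shape unchanged), that each pooling layer uses a stride equal to its size, and that the final (2,2) convolution uses no padding and stride 1.

```python
def trace_crnn_shape(height=32, width=128):
    """Follow (height, width) through the four max-pooling layers and
    the last (2,2) convolution under the stated assumptions."""
    h, w = height, width
    # (pool_height, pool_width): two (2,2) poolings, then two (2,1) poolings
    for pool_h, pool_w in [(2, 2), (2, 2), (2, 1), (2, 1)]:
        h, w = h // pool_h, w // pool_w
    # Final (2,2) convolution without padding shrinks each side by 1.
    h, w = h - 1, w - 1
    return h, w


feature_height, time_steps = trace_crnn_shape()
```

Under these assumptions the height collapses to 1 while 31 positions remain along the width, and these positions serve as the time steps fed to the bidirectional LSTM layers.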
Benchmark Dataset Used for Experimentation
SVT
Result & Discussion: Text Detection
P: Precision R: Recall F: F-Measure
Outcome of text detection techniques on SVT dataset
Technique            P      R      F
Proposed YOLOv4      87     67     76
Proposed Detectron2  71     55.7   62.42
Proposed YOLOv5      57     63     60
Tian                 68     65     66
Zhang                68     53     60
Rong                 29     27     28
Gupta                26.20  27.4   26.7
Jaderberg            53.6   62.8   46.8
Kittler              55     81     62
Kasar                70     71     69
Outcome of text detection techniques on MSRA-TD 500 dataset
Technique            P      R      F
Proposed YOLOv4      83     78     81
Proposed Detectron2  69     53.8   60.45
Proposed YOLOv5      75     77.5   76.22
Zhou                 87.2   67.4   76.08
Zhang                83     67     74
He                   77     70     74
Turki                72     79     75.3
Liao                 87     73     79
Deng                 83     73.2   77.8
Liu                  84.5   77.1   80.6
P: Precision R: Recall F: F-Measure
Outcome of text detection techniques on ICDAR 2013 dataset
Technique            P      R      F
Proposed YOLOv4      89     94     91
Proposed Detectron2  93.5   77.1   84.10
Proposed YOLOv5      79.6   77.2   78.38
Zhong                93     86.7   89.7
He                   92     81     86
Gupta                92     75.5   83
Liao                 89     83     86
Zhang                88     74     80
Lyu                  92     84.4   88
Outcome of text detection techniques on ICDAR 2003 dataset
Technique            P      R      F
Proposed YOLOv4      83     82     83
Proposed Detectron2  84.91  68.2   75.64
Proposed YOLOv5      74.5   86.7   80.13
Kittler              75     89     78
Kasar                72     64     65
Sauvola              65     83     67
Howe                 76     84     76
P: Precision R: Recall F: F-Measure
Outcome of text detection techniques on ICDAR 2015 dataset
Technique            P      R      F
Proposed YOLOv4      69     68     69
Proposed Detectron2  61.2   50.5   55.33
Proposed YOLOv5      76     44     55.73
Zhou                 83.27  78.3   80.7
He                   82     80     81
Liao                 87.8   78.5   82.9
Lyu                  89.5   79.7   84.3
Zhong                89     83     86
Xie                  84     81.9   82.9
Wang                 86.9   84.5   85.7
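The F-measure reported in the detection tables is the harmonic mean of precision and recall, F = 2PR / (P + R). As a small worked check, the sketch below recomputes F for the three proposed models from their precision and recall on the SVT dataset. The values are copied from the SVT table, and the dictionary layout is only for illustration.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R) of the proposed models on the SVT dataset, in percent.
svt_rows = {
    "Proposed YOLOv4": (87, 67),
    "Proposed Detectron2": (71, 55.7),
    "Proposed YOLOv5": (57, 63),
}
recomputed = {name: round(f_measure(p, r), 2) for name, (p, r) in svt_rows.items()}
```

The recomputed values are about 75.70, 62.43 and 59.85, which agree with the reported 76, 62.42 and 60 to within rounding.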
Result & Discussion: Text Recognition
original_text = UPSTAIRS
predicted text = UPSTAIRS
original_text = SCIATICA
predicted text = SCIATICA
original_text = camouflagers
predicted text = camouflagers
original_text = ORATES
predicted text = ORATES
original_text = Ukuleles
predicted text = Ukueles
original_text = HILARIOUS
predicted text = WULURIOLS
original_text = Procurable
predicted text = Proturable
original_text = lix
predicted text = lix
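The slides do not report a numeric recognition metric, so the following is only an illustrative way to summarise the eight samples listed above. It computes exact-match word accuracy and the character-level edit distance (Levenshtein distance) for each pair; these metric choices are assumptions, not the authors' evaluation protocol.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


# (original, predicted) pairs copied from the sample results.
pairs = [
    ("UPSTAIRS", "UPSTAIRS"), ("SCIATICA", "SCIATICA"),
    ("camouflagers", "camouflagers"), ("ORATES", "ORATES"),
    ("Ukuleles", "Ukueles"), ("HILARIOUS", "WULURIOLS"),
    ("Procurable", "Proturable"), ("lix", "lix"),
]
word_accuracy = sum(t == p for t, p in pairs) / len(pairs)
distances = [edit_distance(t, p) for t, p in pairs]
```

Five of the eight samples match exactly, and two of the three errors ("Ukueles", "Proturable") differ from the ground truth by a single character.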
Sample Output
[Figures: sample output images, slides 28 to 32]
Conclusion
•Several techniques exist for identifying text in scene images, each with a
trade-off between speed, performance and accuracy.
•We have introduced a YOLOv4, YOLOv5 and Detectron2 based
framework for text detection that takes the drawbacks of existing techniques into account.
• The performance of the proposed framework is validated on the
ICDAR 2015, ICDAR 2013, ICDAR 2003, SVT and MSRA-TD 500 datasets
using precision, recall and F-measure.
•Our YOLOv4 based framework shows promising results compared with
existing text detection techniques on several benchmark datasets,
except ICDAR 2015.
• Our proposed model has overcome challenges such as
• Diversity and Variability of Text in Natural Scenes,
• Complexity and Interference of Backgrounds,
• Imperfect Imaging Conditions,
• Diversity inside Datasets,
• Ignorance of some Text Part,
• Multi-Orientation
and obtained its best results on the ICDAR 2013 dataset.
• The text recognition framework also attains very good results.
Future Scope
•To extend the proposed framework to detect curved text, with
multilingual support.
•To detect text in real-time video.
•To harden our model against different adversarial attacks.
References
Adarsh P, Rathi P (2020), “YOLO v3-Tiny: Object Detection and Recognition using one stage
improved model”, ICACCS.
Bochkovskiy A, Wang C-Y, Liao H-Y M (2020), “YOLOv4: Optimal Speed and Accuracy of Object Detection”,
arXiv:2004.10934.
Deng L, Gong Y, Lu X, Ma Z, Xie M (2019), “STELA: A Real-Time Scene Text Detector with Learned
Anchor”, arXiv:1909.07549.
Deng D, Liu H, Li X, Cai D (2018), “PixelLink: detecting scene text via instance segmentation”, In:
Proceedings of the AAAI Conference on Artificial Intelligence, pp 1–8.
Dai J, Qi H, Xiong Y, Li Y, Zhang G (2017) Deformable convolutional networks. In: IEEE international
conference on computer vision, pp 764–773.
Gupta A, Vedaldi A, (2016), “Synthetic data for text localisation in natural images”, IEEE Conf. Comp.
Vis. Patt. Recog. (CVPR), pp. 2315–2324.
He T, Huang W, Qiao Y, Yao J (2016), “Accurate text localization in natural image with cascaded
convolutional textnetwork”.,pp 1–10. arXiv :1603.09423.
He W, Zhang XY, Yin F, Liu CL (2017), “Deep direct regression for multi-oriented scene text
detection”, In: IEEE international conference on computer vision, pp 745–753. arXiv:1703.08289.
Howe N (2011), “A Laplacian Energy for Document Binarization”, International Conference on
Document Analysis and Recognition.
Huang Z, Zhong Z, Huo Q (2019), “Mask RCNN with Pyramid-Attention-Network for Scene-Text-
Detection”, IEEE Conference on Application of Computer Vision.
Jaderberg M, Simonyan K, Vedaldi A, Zisserman A (2014), “Reading text in the wild with
convolutional neural networks”, arXiv:1412.1842.
Jiang Y, Zhu X, Wang X, Yang S, Li W (2017), “R2CNN: rotational region cnn for orientation robust
scene text detection”, pp 1–8. arXiv :1706.09579
Jocher G, Stoken A, Borovec J, Laughing C, Hogan A, Wang A, Diaconu L, Poznanski J, Rai P,
Ferriday R, Sullivan T, Xinyu W, (2020), “Ultralytics/YOLOV5: V3.0. [Online]”, Available:
https://github.com/ultralytics/yolov5, doi:10.5281/zenodo.3983579.
Kasar T, Kumar J (2007), “Font and Background Color Independent Text-Binarization”, International
Workshop on Camera Based Document Analysis and Recognition, pp. 3–9.
Kittler J, Illingworth J, and Föglein J (1985), “Threshold Selection based on a Simple Image Statistic”, Computer
Vision Graphics and Image Processing, pages. 125-147
Liao M, Shi B, Bai X (2018), “TextBoxes++: A Single-Shot Oriented Scene Text Detector”,
arXiv:1801.02765.
Liao M, Wan Z, Yao C (2020), “Real-time Scene-Text-Detection with Differentiable Binarization”,
Conference on Artificial Intelligence.
Liao M, Shi B, Bai X, Wang X, Liu W (2017), “TextBoxes a fast text detector with a single deep neural
network”, In: Proceedings of association for the advancement of artificial intelligence,pp 1–7
Liao M, Zhu Z, Shi B, Xia G, Bai X (2018), “Rotation-sensitive regression for oriented scene text
detection”, In: IEEE conference on computer vision and pattern recognition, pp 1–10
Lin H, Yang P (2019),”Review of Scene-Text-Detection and Recognition”, Archives of Computational
Methods in Engg.
Liu X, Meng G, Pan C (2019), “Scene-text-detection and recognition with advances in deep-learning: a
survey”, IJDAR.
Liu Y, Zhang L, Luo C, Zhang S (2019), “Curved scene text detection via transverse and longitudinal
sequence connection”, Pattern Recognition Journal.
Liu Y, Jin L (2017), “Deep matching prior network toward tighter multi-oriented text detection”, In: IEEE
conference on computer vision and pattern recognition, pp 3454–3461
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S (2016), “SSD: single shot multibox detector”, In:
European conference on computer vision, pp 21–37
Li X, Wang W, Hou W, Liu RZ, Lu T (2018), “Shape robust text detection with progressive scale expansion
network”, pp 1–12. arXiv :1806.02559
Lyu P, Yao C, Wu W, Yan S, Bai X (2018), “Multi-oriented scene text detection via corner localization and
region segmentation”, IEEE Conf. Comp. Vis. Patt. Recog. (CVPR).
Ma J, Shao W, Ye H, Wang L, Wang H (2017), “Arbitrary-oriented scene text detection via rotation
proposals” IEEE Trans Multimed 20:1–9 39.
Qin S, Manduchi R (2017), “Cascaded segmentation-detection networks for word-level text
spotting”, In: International conference on document analysis and recognition, pp 1275–1282.
Rong X, Yi C, Tian Y (2017), “Unambiguous text-localization and retrieval for cluttered scenes”, In
CVPR, pp. 3279–3287.
Sauvola J and Pietikäinen M (2000), “Adaptive document image binarization”, Pattern
Recognition, Volume 33, Issue 2.
Shi B, Bai X, Yao C (2016), “An end-to-end trainable neural network for image-based sequence
recognition and its application to scene text recognition”, IEEE Trans. Pattern Anal. Mach. Intell.
Shi B, Wang X, Lv P, Yao C, Bai X (2016), “Robust scene text recognition with automatic
rectification”, In Proc. IEEE Conf. Comp. Vis. Patt. Recogn.
Shi B, Bai X, Belongie S (2017), “Detecting oriented text in natural images by linking segments”,
In: IEEE conference on computer vision and pattern recognition, pp 2482–3490.
Tian Z , Shu M , Lyu P, Li R, Zhou C (2019), “Learning Shape-Aware Embedding for Scene Text
Detection”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Tian Z, Huang W, He T, He P, Qiao Y (2016), “Detecting text in natural image with connectionist
text proposal network”, In: European conference on computer vision, pp 56–72.
Wang W, Xie E, Zang Y, Lu T (2019), “Efficient and Accurate Arbitrary-Shaped Text-Detection with
Pixel Aggregation Network”, ICCV.
Wu Y, Kirillov A, Massa F, Lo W, Girshick R (2019), “Detectron2”, url:
https://github.com/facebookresearch/detectron2.
Xue C, Lu S, Zhan F (2018), “Accurate Scene-Text-Detection through Border Semantics
Awareness and Bootstrapping”, ECCV.
Yang Q, Cheng M, Zhou W, Chen Y, Qiu M (2018), “IncepText: a new inception-text
module with deformable psroi pooling for multi-oriented scene text detection”, In:
International joint conference on artificial intelligence, pp 1–7
Yao C, Bai X, Sang N, Zhou X, Zhou S (2016), “SceneText detection via holistic, multi-
channel prediction”, arXiv :1606.09002 pp 1–10
Zhang Z , Shen W, Yao C (2015), “Symmetry-based text line detection in natural scenes”, In CVPR,
pp. 2558–2567.
Zhang Z, Zhang C, Shen W, Yao C, Liu W (2016), “Multioriented text detection with fully convolutional
networks”, In:Computer vision & pattern recognition, pp 4159–4167.
Zhong Z, Sun L, Huo Q (2017), “Improved localization accuracy by locnet for faster r-cnn based text
detection”, International Conference on Document Analysis and Recognition.
Zhong Z, Jin L, Zhang S, Feng Z (2016), “DeepText a unified framework for text proposal generation
and text detection in natural images”, pp 1–12. arXiv :1605.07314 v1.
Zhou X, Yao C, Wen H, Wang Y, Zhou S (2017), “EAST an efficient and accurate scene text detector”,
In: IEEE conference on computer vision and pattern recognition, pp 2642–2651.
Thanks
By
Nisarg Gandhewar
S. R. Tandan
Rohit Miri
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 

Recently uploaded (20)

Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 

Detection & Recognition of Text.pdf

  • 1. Presentation on Detection & Recognition of Text Using YOLO Based Framework. By Nisarg Gandhewar, S. R. Tandan, Rohit Miri
  • 2. Contents
      • Introduction
      • Motivation
      • Challenges
      • Review of Literature
      • Identified Research Gap
      • Objectives
      • Methodology
      • Result & Discussion
      • Future Scope
      • Conclusion
      • References
      By Mr Nisarg Gandhewar
  • 3. Introduction
      • The growing use of smartphones to capture images in day-to-day life creates a need to recognize text in natural images.
      • Text in natural scenes appears in almost every aspect of daily life.
      • It is an active research topic in computer vision because of its real-world applications, such as driverless cars and industrial automation.
      • Recognizing text in natural images is still difficult because of a series of major challenges.
  • 4. Motivation
      • Robot Vision
      • Self-Driving Car
      • Visual Question Answering
      • Image Annotation and Retrieval
  • 5. Challenges
      • Text density
      • Structure of text: text on a page is structured, mostly in strict rows, while text in the wild may be scattered anywhere, at different rotations.
      • Fonts
      • Artifacts: outdoor pictures are much noisier than images from a scanner.
      • Location: some tasks involve cropped or centred text, while in others text may appear at arbitrary locations in the image.
  • 6. Challenges
      • Diversity in fonts, scales, and orientations of text.
  • 7. Challenges
      • Complexity and interference of backgrounds
      • Imperfect imaging conditions
      • Ignorance of some text parts
  • 8. Challenges
      • Multi-language text
      • Robust Reading Competition: provides a platform that acts as a bridge between the document analysis community and the computer vision community.
  • 9. Review of Literature
      • Classic Computer Vision Techniques: CC based, Sliding Window based, Texture based
      • Segmentation Based Techniques: PSENet
      • General Object Detection Based Techniques: SSD, RetinaNet, RCNN, Fast RCNN, Faster RCNN, YOLO
  • 10. Review of Literature
      Sr No | Author Name | Year | Proposed Work | Remark
      1 | Wang et al. | 2019 | Applied a CNN model with a sliding-window (SW) scheme to obtain candidate text lines in a given image and thus estimate text locations. | Classic Technique
      2 | Yao et al. | 2016 | Treat text detection as a semantic segmentation problem. They use an FCN model based on holistically-nested edge detection (HED) to produce global maps that encode text regions, individual characters, and their relationships. | Segmentation Based Technique
      3 | Deng et al. | 2018 | Dan Deng et al. proposed PixelLink, an instance-segmentation-based technique in which text instances are first separated by linking pixels within the same instance together. Bounding boxes are then obtained directly from the segmentation output, without location regression. | Segmentation Based Technique
  • 11. Review of Literature
      Sr No | Author Name | Year | Proposed Work | Remark
      4 | Wang et al. | 2019 | Presented PSENet to locate text instances of arbitrary shape. It produces kernels of different sizes for each text instance and gradually expands the minimal-scale kernel to the complete shape of the instance. | Segmentation Based Technique
      5 | Deng et al. | 2019 | Linjie Deng et al. proposed a technique that relies on RetinaNet for arbitrary-oriented text detection, aiming to incorporate the learning mechanism of the two-stage RCNN structure into the one-stage detector. | Object Detection Based Technique, Two stage detector
      6 | Adarsh et al. | 2020 | Pranav Adarsh et al. proposed YOLO v3-Tiny, an improved one-stage model based on YOLO that speeds up object detection while preserving the precision of the result [2]. | Object Detection Based Technique, One stage detector
  • 12. Review of Literature
      Sr No | Author Name | Year | Proposed Work | Remark
      7 | Liu et al. | 2016 | Presented a Single-Shot Detector (SSD)-like architecture that extracts features and performs text/non-text prediction as well as link prediction. | Object Detection Based Technique, One stage detector
      8 | Liao et al. | 2017 | Minghui Liao et al. presented TextBoxes, a text detection technique that detects text in a single network with no post-processing other than non-maximum suppression. | Object Detection Based Technique
      9 | Liao et al. | 2018 | Minghui Liao et al. presented TextBoxes++, an SSD-based approach for multi-oriented scene text detection with both high precision and efficiency. | Object Detection Based Technique, One stage detector
  • 13. Identified Research Gap
      • There is a trade-off between speed and precision when detecting text in scene images.
      • The resulting models are large.
      • There is scope to improve accuracy.
  • 14. Objectives
      • To detect and recognize text in natural scene images under different conditions.
      • To develop an approach evaluated on multiple real-world datasets.
      • To read text of different orientations.
      • To explore deep learning frameworks for detection and recognition of text.
      • To improve the speed and accuracy of text detection.
      • To reduce the size of the model.
  • 15. Methodology for Text Detection
  • 16. Pre-processing Steps
      • Image Annotation
      • Image Pre-processing Operations: orienting, resizing, auto-adjusting contrast
      • Image Augmentation: shear with a 15 degree angle, brightness with 25%, saturation with 25%
  • 17. Model Tuning and Hyperparameters
      • YOLOv4
          Backbone: CSPDarknet53
          Neck: Path Aggregation Network (PANet)
          Hyperparameters: batch=64, subdivisions=32, width=608, height=608, channel=3, momentum=0.949, decay=0.0005, saturation=1.5, exposure=1.5, hue=0.1, learning rate=0.00261, maximum batches=4000, filters=18
          Only the first 137 of the 162 layers are used.
      • YOLOv5
          Backbone: Cross-Stage-Partial Networks
          Neck: Path Aggregation Network (PANet)
          Hyperparameters: batch=16, subdivisions=32, width=416, height=416, momentum=0.1, learning rate=0.00261, maximum batches=4000
          The YOLOv5x pre-trained weights are used.
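The value filters=18 in the YOLOv4 settings is consistent with the usual Darknet rule for the convolutional layer feeding each detection head: filters = (classes + 5) × anchors per scale. The sketch below is only an illustration of that rule; it assumes a single "text" class and three anchors per scale, neither of which is stated on the slide, and `yolo_head_filters` is a hypothetical helper name.

```python
def yolo_head_filters(num_classes: int, anchors_per_scale: int = 3) -> int:
    """Filters in the conv layer before a YOLO head (Darknet convention).

    Each anchor predicts 4 box coordinates, 1 objectness score,
    and one score per class.
    """
    return (num_classes + 5) * anchors_per_scale


# Assumption: one class ("text"), three anchors per detection scale.
print(yolo_head_filters(1))
```

With one class this gives the 18 filters listed above; the 80-class COCO configuration would give 255.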
  • 18. Model Tuning and Hyperparameters
      • Detectron2
          Backbone: Base-RCNN-FPN
          Neck: Region Proposal Network
          Hyperparameters: batch=64, subdivisions=32, width=416, height=416, channel=3, momentum=0.1, learning rate=0.001, maximum batches=4000
          Pre-trained weights of the X101-FPN model are used.
  • 19. Text Detection Algorithm
      Algorithm 1: Text Detection
      Input: Image (I)
      Output: Text Detection
      For n = 1 to N do
          Divide I into a G * G grid
          For each grid cell G
              Generate V bounding boxes & anchor boxes
              For each V
                  Generate confidence score (C) and class probability (P)
                  IOU = Area of Intersection / Area of Union
                  If IOU > 0.5 then
                      Consider V
                      NMS (P)
                  Else
                      Ignore V
              End
          End
      End
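The two core operations named in the algorithm, IoU and non-maximum suppression (NMS), can be sketched as follows. This is a minimal illustration, not the authors' implementation: the slide does not say how NMS is applied, so standard greedy NMS is assumed, boxes are assumed to be (x1, y1, x2, y2) corner tuples, and the 0.5 threshold from the slide is reused as the suppression threshold.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), an assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union: area of intersection / area of union."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep box i only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # indices of the boxes that survive suppression
```

In this example the second box overlaps the first with IoU 0.81 and is suppressed, while the third box does not overlap either and is kept.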
  • 20. Methodology for Text Recognition
  • 21. Data Preprocessing
      • Read the image and convert it into a gray-scale image.
      • Bring all images to size (128,32) by using padding.
      • Expand the image dimension to (128,32,1) to make it compatible with the input shape of the architecture.
      • Normalize the image pixel values by dividing them by 255.
      To preprocess the output labels, use the following:
      • Read the text from the name of the image.
      • Encode each character of a word into a numerical value by creating a function (such as 'a':0, 'b':1, ..., 'z':25, etc.). For example, the word 'abab' is encoded as the label [0,1,0,1].
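The preprocessing steps can be sketched with NumPy as below. Several details are assumptions not given on the slide: the size (128,32) is read as width 128 and height 32, so the array shape becomes (32, 128, 1); padding is done with white pixels (255) on the right and bottom; the character set is lowercase a-z only; and the grayscale conversion uses common luminance weights. All function names are hypothetical.

```python
import string

import numpy as np

CHARSET = string.ascii_lowercase  # assumption: 'a' -> 0, 'b' -> 1, ...


def to_gray(rgb: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) RGB array to (H, W) grayscale (assumed weights)."""
    weights = np.array([0.299, 0.587, 0.114])
    return (rgb[..., :3] @ weights).astype(np.uint8)


def preprocess(gray: np.ndarray, height: int = 32, width: int = 128) -> np.ndarray:
    """Pad a grayscale image to (height, width), add a channel axis, scale to [0, 1]."""
    h, w = gray.shape
    if h > height or w > width:
        raise ValueError("image is larger than the target size; resize it first")
    padded = np.full((height, width), 255, dtype=np.uint8)  # white padding (assumed)
    padded[:h, :w] = gray
    return np.expand_dims(padded, axis=-1).astype(np.float32) / 255.0


def encode_label(text: str) -> list:
    """Map each character to its index in CHARSET (raises ValueError if unknown)."""
    return [CHARSET.index(ch) for ch in text]


image = preprocess(to_gray(np.zeros((20, 100, 3), dtype=np.uint8)))
print(image.shape, encode_label("abab"))
```

Words containing uppercase letters or digits would need a larger character set than the one assumed here.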
  • 22. Network Architecture
      • The input image has height 32 and width 128.
      • Seven convolution layers are used, of which six have kernel size (3,3) and the last has kernel size (2,2). The number of filters increases from 64 to 512 layer by layer.
      • Two max-pooling layers of size (2,2) are added, followed by two max-pooling layers of size (2,1), to extract features with a larger width for predicting long texts.
      • Batch normalization layers are used after the fifth and sixth convolution layers, which accelerates the training process.
      • Two bidirectional LSTM layers are then used, each with 128 units.
      Loss function: CTC
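To see why the (2,1) pooling layers help with long texts, the spatial size of the feature map can be traced through the pooling stages. The sketch below is a rough calculation under assumptions the slide does not state: the six 3×3 convolutions use "same" padding (no size change), each pooling layer uses a stride equal to its pool size, and the final 2×2 convolution uses "valid" padding.

```python
from typing import Tuple


def feature_map_shape(height: int = 32, width: int = 128) -> Tuple[int, int]:
    """Trace (height, width) through the pooling layers and the last conv."""
    pools = [(2, 2), (2, 2), (2, 1), (2, 1)]  # (pool_height, pool_width)
    for pool_h, pool_w in pools:
        height //= pool_h
        width //= pool_w
    # Final 2x2 convolution with "valid" padding shrinks each side by 1.
    return height - 1, width - 1


print(feature_map_shape())
```

Under these assumptions the height collapses to 1 while the width stays at 31, so the LSTM layers would see a sequence of 31 time steps along the width of the image.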
  • 23. Benchmark Dataset Used for Experimentation: SVT
  • 24. Result & Discussion: Text Detection (P: Precision, R: Recall, F: F-Measure)
      Outcome of text detection techniques on SVT dataset
      Technique | P | R | F
      Proposed YOLOv4 | 87 | 67 | 76
      Proposed Detectron2 | 71 | 55.7 | 62.42
      Proposed YOLOv5 | 57 | 63 | 60
      Tian | 68 | 65 | 66
      Zhang | 68 | 53 | 60
      Rong | 29 | 27 | 28
      Gupta | 26.20 | 27.4 | 26.7
      Jaderberg | 53.6 | 62.8 | 46.8
      Kittler | 55 | 81 | 62
      Kasar | 70 | 71 | 69
      Outcome of text detection techniques on MSRA-TD 500 dataset
      Technique | P | R | F
      Proposed YOLOv4 | 83 | 78 | 81
      Proposed Detectron2 | 69 | 53.8 | 60.45
      Proposed YOLOv5 | 75 | 77.5 | 76.22
      Zhou | 87.2 | 67.4 | 76.08
      Zhang | 83 | 67 | 74
      He | 77 | 70 | 74
      Turki | 72 | 79 | 75.3
      Liao | 87 | 73 | 79
      Deng | 83 | 73.2 | 77.8
      Liu | 84.5 | 77.1 | 80.6
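The slides do not state how the F-measure is computed. Assuming the standard definition, the harmonic mean of precision and recall, F = 2PR / (P + R), the proposed rows of the SVT table can be reproduced with the short sketch below (`f_measure` is a hypothetical helper name).

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1, assumed here)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Proposed rows on the SVT dataset, as (P, R) pairs taken from the table.
for name, p, r in [("YOLOv4", 87, 67), ("Detectron2", 71, 55.7), ("YOLOv5", 57, 63)]:
    print(name, round(f_measure(p, r), 2))
```

Rows quoted from earlier work may have been computed under a different protocol, so this check is only meaningful for the proposed models.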
  • 25. Result & Discussion: Text Detection (P: Precision, R: Recall, F: F-Measure)
      Outcome of text detection techniques on ICDAR 2013 dataset
      Technique | P | R | F
      Proposed YOLOv4 | 89 | 94 | 91
      Proposed Detectron2 | 93.5 | 77.1 | 84.10
      Proposed YOLOv5 | 79.6 | 77.2 | 78.38
      Zhong | 93 | 86.7 | 89.7
      He | 92 | 81 | 86
      Gupta | 92 | 75.5 | 83
      Liao | 89 | 83 | 86
      Zhang | 88 | 74 | 80
      Lyu | 92 | 84.4 | 88
      Outcome of text detection techniques on ICDAR 2003 dataset
      Technique | P | R | F
      Proposed YOLOv4 | 83 | 82 | 83
      Proposed Detectron2 | 84.91 | 68.2 | 75.64
      Proposed YOLOv5 | 74.5 | 86.7 | 80.13
      Kittler | 75 | 89 | 78
      Kasar | 72 | 64 | 65
      Sauvola | 65 | 83 | 67
      Howe | 76 | 84 | 76
  • 26. Result & Discussion: Text Detection (P: Precision, R: Recall, F: F-Measure)
      Outcome of text detection techniques on ICDAR 2015 dataset
      Technique | P | R | F
      Proposed YOLOv4 | 69 | 68 | 69
      Proposed Detectron2 | 61.2 | 50.5 | 55.33
      Proposed YOLOv5 | 76 | 44 | 55.73
      Zhou | 83.27 | 78.3 | 80.7
      He | 82 | 80 | 81
      Liao | 87.8 | 78.5 | 82.9
      Lyu | 89.5 | 79.7 | 84.3
      Zhong | 89 | 83 | 86
      Xie | 84 | 81.9 | 82.9
      Wang | 86.9 | 84.5 | 85.7
  • 27. Result & Discussion: Text Recognition
      Original text | Predicted text
      UPSTAIRS | UPSTAIRS
      SCIATICA | SCIATICA
      camouflagers | camouflagers
      ORATES | ORATES
      Ukuleles | Ukueles
      HILARIOUS | WULURIOLS
      Procurable | Proturable
      lix | lix
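A network trained with the CTC loss named on the architecture slide outputs one score vector per time step, and a predicted string is read out by decoding that sequence. The slides do not say which decoder was used; the sketch below shows the simplest option, greedy (best-path) decoding, assuming a lowercase a-z character set with the CTC blank as the last class index.

```python
import string
from typing import List, Sequence

CHARSET = string.ascii_lowercase  # assumption: same 'a' -> 0 encoding as the labels
BLANK = len(CHARSET)              # assumption: blank symbol is the last class


def ctc_greedy_decode(step_scores: Sequence[Sequence[float]]) -> str:
    """Pick the best class per time step, merge repeats, then drop blanks."""
    best = [max(range(len(scores)), key=scores.__getitem__) for scores in step_scores]
    chars: List[str] = []
    previous = None
    for index in best:
        if index != previous and index != BLANK:
            chars.append(CHARSET[index])
        previous = index
    return "".join(chars)


def one_hot(index: int) -> List[float]:
    """Toy score vector with all mass on one class (for illustration only)."""
    vector = [0.0] * (len(CHARSET) + 1)
    vector[index] = 1.0
    return vector


# Toy path "l l _ i x x" (with _ as blank) collapses to a three-letter word.
path = [CHARSET.index("l")] * 2 + [BLANK, CHARSET.index("i")] + [CHARSET.index("x")] * 2
print(ctc_greedy_decode([one_hot(i) for i in path]))
```

The blank symbol is what lets the decoder keep genuinely doubled letters: two identical characters separated by a blank are emitted twice, while consecutive repeats without a blank are merged.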
  • 28. Sample Output
  • 29. Sample Output
  • 30. Sample Output
  • 31. Sample Output
  • 32. Sample Output
  • 33. Conclusion
      • Several techniques exist for identifying text in scene images, each with a trade-off between speed, performance, and accuracy.
      • We have introduced a YOLOv4, YOLOv5, and Detectron2 based framework for text detection that takes the drawbacks of existing techniques into account.
      • The performance of the proposed framework is validated on the ICDAR 2015, ICDAR 2013, ICDAR 2003, SVT, and MSRA-TD 500 datasets using precision, recall, and F-measure.
  • 34. Conclusion
      • Our YOLOv4 based framework shows promising results compared to existing text detection techniques on several benchmark datasets, except ICDAR 2015.
      • Our proposed model has overcome challenges such as:
          • Diversity and variability of text in natural scenes
          • Complexity and interference of backgrounds
          • Imperfect imaging conditions
          • Diversity inside datasets
          • Ignorance of some text parts
          • Multi-orientation
        and achieved its best results on the ICDAR 2013 dataset.
      • The text recognition framework also attains very good results.
  • 35. Future Scope
      • To extend the proposed framework to detect curved text, with multilingual support.
      • To detect text in real-time video as well.
      • To secure the model against different adversarial attacks.
  • 36. References
      Adarsh P, Rathi P (2020), “YOLO v3 Tiny: Object Detection and Recognition using one-stage improved model”, ICACCS.
      Bochkovskiy A, Wang C, Liao H (2020), “YOLOv4: Optimal Speed and Accuracy of Object Detection”, arXiv:2004.10934.
      Deng D, Liu H (2018), “PixelLink: Detecting Scene Text via Instance Segmentation”, AAAI.
      Deng L, Gong Y, Lu X, Ma Z, Xie M (2019), “STELA: A Real Time Scene-Text-Detector with Learned Anchor”, arXiv:1909.07549.
      Deng D, Liu H, Li X, Cai D (2018), “PixelLink: detecting scene text via instance segmentation”, In: Proceedings of association for the advancement of artificial intelligence, pp 1–8.
      Dai J, Qi H, Xiong Y, Li Y, Zhang G (2017), “Deformable convolutional networks”, In: IEEE international conference on computer vision, pp 764–773.
      Gupta A, Vedaldi A (2016), “Synthetic data for text localisation in natural images”, IEEE Conf. Comp. Vis. Patt. Recog. (CVPR), pp 2315–2324.
      He T, Huang W, Qiao Y, Yao J (2016), “Accurate text localization in natural image with cascaded convolutional text network”, pp 1–10. arXiv:1603.09423.
      He W, Zhang X, Yin Y (2017), “Deep direct regression for multi-oriented scene text detection”, arXiv:1703.08289.
  • 37. References
      He W, Zhang XY, Yin F, Liu CL (2017), “Deep direct regression for multi-oriented scene text detection”, In: IEEE international conference on computer vision, pp 745–753.
      Howe N (2011), “A Laplacian Energy for Document Binarization”, International Conference on Document Analysis and Recognition.
      Huang Z, Zhong Z, Huo Q (2019), “Mask RCNN with Pyramid-Attention-Network for Scene-Text-Detection”, IEEE Conference on Application of Computer Vision.
      Jaderberg M, Simonyan K, Vedaldi A, Zisserman A (2014), “Reading text in the wild with convolutional neural networks”, arXiv:1412.1842.
      Jiang Y, Zhu X, Wang X, Yang S, Li W (2017), “R2CNN: rotational region cnn for orientation robust scene text detection”, pp 1–8. arXiv:1706.09579.
      Jocher G, Stoken A, Borovec J, Laughing C, Hogan A, Wang A, Diaconu L, Poznanski J, Rai P, Ferriday R, Sullivan T, Xinyu W (2020), “Ultralytics/YOLOV5: V3.0”, [Online]. Available: https://github.com/ultralytics/yolov5, doi:10.5281/zenodo.3983579.
      Kasar T, Kumar J (2007), “Font and Background Color Independent Text-Binarization”, International Workshop on Camera Based Document Analysis and Recognition, pp 3–9.
      Kittler J, Illingworth J, Föglein J (1985), “Threshold Selection based on a Simple Image Statistic”, Computer Vision Graphics and Image Processing, pp 125–147.
  • 38. References
      Liao M, Shi B, Bai X (2018), “TextBoxes++: A Single-Shot Oriented Scene-Text-Detector”, arXiv:1801.02765.
      Liao M, Wan Z, Yao C (2020), “Real-time Scene-Text-Detection with Differentiable Binarization”, Conference on Artificial Intelligence.
      Liao M, Shi B, Bai X, Wang X, Liu W (2017), “TextBoxes a fast text detector with a single deep neural network”, In: Proceedings of association for the advancement of artificial intelligence, pp 1–7.
      Liao M, Zhu Z, Shi B, Xia G, Bai X (2018), “Rotation-sensitive regression for oriented scene text detection”, In: IEEE conference on computer vision and pattern recognition, pp 1–10.
      Lin H, Yang P (2019), “Review of Scene-Text-Detection and Recognition”, Archives of Computational Methods in Engg.
      Liu X, Meng G, Pan C (2019), “Scene-text-detection and recognition with advances in deep-learning: a survey”, IJDAR.
      Liu Y, Zhang L, Luo C, Zhang S (2019), “Curved scene text detection via transverse and longitudinal sequence connection”, Pattern Recognition Journal.
      Liu Y, Jin L (2017), “Deep matching prior network toward tighter multi-oriented text detection”, In: IEEE conference on computer vision and pattern recognition, pp 3454–3461.
      Liu W, Anguelov D, Erhan D, Szegedy C, Reed S (2016), “SSD: single shot multibox detector”, In: European conference on computer vision, pp 21–37.
      Li X, Wang W, Hou W, Liu RZ, Lu T (2018), “Shape robust text detection with progressive scale expansion network”, pp 1–12. arXiv:1806.02559.
      Lyu P, Yao C, Wu W, Yan S, Bai X (2018), “Multi-oriented scene text detection via corner localization and region segmentation”, IEEE Conf. Comp. Vis. Patt. Recog. (CVPR).
      Ma J, Shao W, Ye H, Wang L, Wang H (2017), “Arbitrary-oriented scene text detection via rotation proposals”, IEEE Trans Multimed 20:1–9.
  • 39. References
      Qin S, Manduchi R (2017), “Cascaded segmentation-detection networks for word-level text spotting”, In: International conference on document analysis and recognition, pp 1275–1282.
      Rong X, Yi C, Tian Y (2017), “Unambiguous text-localization and retrieval for cluttered scenes”, In: CVPR, pp 3279–3287.
      Sauvola J, Pietikäinen M (2000), “Adaptive document image binarization”, Pattern Recognition, Volume 33, Issue 2.
      Shi B, Bai X, Yao C (2016), “An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition”, IEEE Trans. Pattern Anal. Mach. Intell.
      Shi B, Wang X, Lv P, Yao C, Bai X (2016), “Robust scene text recognition with automatic rectification”, In: Proc. IEEE Conf. Comp. Vis. Patt. Recogn.
      Shi B, Bai X, Belongie S (2017), “Detecting oriented text in natural images by linking segments”, In: IEEE conference on computer vision and pattern recognition, pp 2482–3490.
      Tian Z, Shu M, Lyu P, Li R, Zhou C (2019), “Learning Shape-Aware Embedding for Scene Text Detection”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
      Tian Z, Huang W, He T, He P, Qiao Y (2016), “Detecting text in natural image with connectionist text proposal network”, In: European conference on computer vision, pp 56–72.
      Wang W, Xie E, Zang Y, Lu T (2019), “Efficient and Accurate Arbitrary-Shaped Text-Detection with Pixel Aggregation Network”, ICCV.
  • 40. References
      Wu Y, Kirillov A, Massa F, Lo W, Girshick R (2019), “Detectron2”, url: https://github.com/facebookresearch/detectron2.
      Xue C, Lu S, Zhan F (2018), “Accurate Scene-Text-Detection through Border Semantics Awareness and Bootstrapping”, ECCV.
      Yang Q, Cheng M, Zhou W, Chen Y, Qiu M (2018), “IncepText: a new inception-text module with deformable psroi pooling for multi-oriented scene text detection”, In: International joint conference on artificial intelligence, pp 1–7.
      Yao C, Bai X, Sang N, Zhou X, Zhou S (2016), “Scene text detection via holistic, multi-channel prediction”, pp 1–10. arXiv:1606.09002.
      Zhang Z, Shen W, Yao C (2015), “Symmetry-based text line detection in natural scenes”, In: CVPR, pp 2558–2567.
      Zhang Z, Zhang C, Shen W, Yao C, Liu W (2016), “Multi-oriented text detection with fully convolutional networks”, In: Computer vision & pattern recognition, pp 4159–4167.
      Zhong Z, Sun L, Huo Q (2017), “Improved localization accuracy by locnet for faster r-cnn based text detection”, International Conference on Document Analysis and Recognition.
      Zhong Z, Jin L, Zhang S, Feng Z (2016), “DeepText a unified framework for text proposal generation and text detection in natural images”, pp 1–12. arXiv:1605.07314v1.
      Zhou X, Yao C, Wen H, Wang Y, Zhou S (2017), “EAST an efficient and accurate scene text detector”, In: IEEE conference on computer vision and pattern recognition, pp 2642–2651.
  • 41. Thanks. By Nisarg Gandhewar, S. R. Tandan, Rohit Miri