REAL-TIME SCENE TEXT
LOCALIZATION AND RECOGNITION
project done by:
208H1A0454 PICHIKA MANOHAR
208H1A0416 CHEEDEPUDI G S PRAVEEN BABU
208H1A0422 DEVARAPALLI CHANDRASEKHAR
218H5A0411 KATARI SRINIVASRAO
CONTENTS
• INTRODUCTION
• PREVIOUS WORK
• IMAGE PROCESSING
• MATLAB SOFTWARE
• RESULT ANALYSIS
• CONCLUSION
• REFERENCES
INTRODUCTION
• Scene text recognition (STR) has recently become an increasingly active research field in computer vision, as shown by the popularity of the "robust reading" competitions held at ICDAR [1] every two years, along with the workshop on Camera-Based Document Analysis and Recognition (CBDAR).
• With growing demand for automatic information identification, STR technology has large-scale applications in automated logistics distribution, geographic positioning, license plate recognition, and driverless vehicles.
• Text detection [2] is a common task in image analysis, while text recognition [3] is a more advanced task: it must not only localize the text spatially, which is an object detection problem, but also recognize the text, i.e., perform text spotting [4].
• Compared with detection and recognition of traditional well-formatted document text, text detection and recognition in natural scenes is challenging because of multiple languages, varying text sizes, tilted fonts, blurring, background interference, handwriting, varying viewing angles, and so on, as shown
PREVIOUS WORK
• Numerous methods that focus solely on text localization in real-world images have been published [6, 2, 7, 17]. The method of Epshtein et al. [5] converts an input image to greyscale and uses the Canny detector [1] to find edges.
• Pairs of parallel edges are then used to calculate a stroke width for each pixel, and pixels with similar stroke widths are grouped together into characters. The method is sensitive to noise and blur because it depends on successful edge detection, and it provides only a single segmentation for each character, which is not necessarily the best one for an OCR module. A similar edge-based approach with a different connected component algorithm is presented in [24].
• A good overview of the methods and their performance can also be found in the ICDAR Robust Reading competition results [10, 9, 20]. Only a few methods that perform both text localization and recognition have been published. The method of Wang
Figure 2. Text localization and recognition overview.
(a) Source 2MPx image.
(b) Intensity channel extracted.
(c) ERs selected in O(N) by the first stage of the sequential classifier.
(d) ERs selected by the second stage of the classifier.
(e) Text lines found by region grouping.
(f) Only ERs in text lines selected and text recognized by an OCR module.
(g) Number of ERs at the end of each stage and its duration.
IMAGE PROCESSING
• The term digital image processing refers to the processing of a two-dimensional picture by a digital computer. In a broader context, it implies digital processing of any two-dimensional data.
• A digital image is an array of real or complex numbers represented by a finite number of bits. An image given in the form of a transparency, slide, photograph, or X-ray is first digitized and stored as a matrix of binary digits in computer memory.
• This digitized image can then be processed and/or displayed on a high-resolution television monitor.
• For display, the image is stored in a rapid-access buffer memory, which refreshes the monitor at a rate of 25 frames per second to produce a visually continuous display.
RGB IMAGE
• An RGB color image is an M × N × 3 array of color pixels, where each color pixel is a triplet corresponding to the red, green, and blue components of the image at a specific spatial location.
• An RGB image may be viewed as a "stack" of three grayscale images that, when fed into the red, green, and blue inputs of a color monitor, produce a color image on the screen.
• By convention, the three images forming an RGB color image are referred to as the red, green, and blue component images.
• The data class of the component images determines their range of values. If an RGB image is of class double, the range of values is [0, 1].
• A normal grayscale image has 8-bit color depth = 256 gray levels.
• A true color image has 24-bit color depth = 3 × 8 bits = 256 × 256 × 256 colors ≈ 16.7 million colors.
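The array layout and value ranges described above can be illustrated with a small sketch. It uses plain Python lists rather than MATLAB arrays, and the names `rgb_uint8`, `to_double`, and `split_channels` are hypothetical helpers introduced only for this example.

```python
# Illustrative 2 x 2 RGB image stored as rows x columns x 3 (R, G, B),
# with 8 bits per channel. The pixel values are made up for the example.
rgb_uint8 = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]


def to_double(image):
    """Rescale 8-bit channel values (0-255) to floating point in [0, 1]."""
    return [[tuple(v / 255.0 for v in pixel) for pixel in row] for row in image]


def split_channels(image):
    """Return the red, green, and blue component images as three 2-D lists."""
    return tuple([[pixel[c] for pixel in row] for row in image] for c in range(3))


rgb_double = to_double(rgb_uint8)
red, green, blue = split_channels(rgb_uint8)

gray_levels = 2 ** 8          # 8-bit grayscale: 256 levels
true_colors = 2 ** (3 * 8)    # 24-bit true color: 16,777,216 colors
```

Each of `red`, `green`, and `blue` is itself a grayscale image, which is the "stack of three images" view of an RGB image.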
BINARY IMAGE
• The most elementary type of image representation is the binary image. It typically uses only two levels.
• The two levels are referred to as black and white and are denoted '0' and '1', respectively. This kind of image is referred to as a 1-bit-per-pixel image, because a single binary digit is enough to represent every pixel (pel).
• Binary images are often used to depict low-level information about a picture, such as its outline or shape, especially in applications like optical character recognition (OCR), where only the outline of a character is needed to recognize the letter it represents.
• Binary images are generated from grayscale images through a technique called thresholding.
• Two-level thresholding acts as a decision rule: values above the threshold are set to '1' and values below it are set to '0'.
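The thresholding rule can be sketched in a few lines. This is a minimal illustration on a made-up 3 × 3 grayscale image; the function name `threshold` and the level 128 are assumptions chosen for the example, and values equal to the level are mapped to 0 here.

```python
def threshold(gray, level):
    """Map each gray value to 1 if it is above `level`, otherwise to 0."""
    return [[1 if value > level else 0 for value in row] for row in gray]


# Made-up 8-bit grayscale image (0 = black, 255 = white).
gray = [
    [12, 200, 45],
    [130, 128, 90],
    [255, 0, 129],
]

binary = threshold(gray, 128)
# binary is [[0, 1, 0], [1, 0, 0], [1, 0, 1]]
```

The result needs only one bit per pixel, which is why it suits outline and shape tasks such as OCR.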
GRAYSCALE IMAGE
• Grayscale (GS) images are also referred to as monochrome, or single-color, images. They contain brightness information only.
• They carry no color information; the brightness, however, is represented at different levels. A typical 8-bit image holds brightness levels in the range 0-255, known as gray levels.
• Here 0 refers to black and 255 refers to white. The 8-bit representation is a natural choice because computers handle data in 8-bit units (bytes). Fig. 5.7 (a) and (b) are two examples of such grayscale images.
COLOUR IMAGE
• A color image (CI) can be modeled as three-band monochrome image data, where each band of data corresponds to a different color.
• The following figure illustrates a color vector drawn as an arrow: the red, green, and blue values of a single pixel, the smallest unit of the image, form a color pixel vector (R, G, B).
• A color pixel vector consists of the red, green, and blue pixel values (R, G, B) at one given row/column pixel coordinate (r, c).
• A multispectral image is one that captures image data at specific frequencies across the electromagnetic spectrum. Multispectral images often contain information outside the normal human perceptual range. This may include infrared, ultraviolet, X-ray, acoustic, or radar data. Sources of these types of images include satellite systems, underwater sonar systems, and medical diagnostic imaging systems.
MATLAB SOFTWARE
• MATLAB® is a high-performance language for technical computing. It integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation.
• Typical uses include:
  Math and computation
  Algorithm development
  Data acquisition
  Modelling, simulation, and prototyping
  Data analysis, exploration, and visualization
  Scientific and engineering graphics
  Application development, including graphical user interface building
• MATLAB is an interactive system whose basic data element is an array that does not require dimensioning. This allows you to solve many technical computing problems, especially those with matrix and vector formulations, in a fraction of the time it would take to write a program in a scalar non-interactive language such as C or FORTRAN.
• The name MATLAB stands for matrix laboratory. MATLAB was originally written to provide easy access to matrix software developed by the LINPACK and EISPACK projects. Today, MATLAB engines incorporate the LAPACK and BLAS libraries, embedding the state of the art in software for matrix computation.
• MATLAB has evolved over a period of years with input from many users. In university environments, it is the standard instructional tool for introductory and advanced courses in mathematics, engineering, and science. In industry, MATLAB is the tool of choice for high-productivity research, development, and analysis.
MATLAB DESKTOP
• The MATLAB desktop is the main MATLAB application window. The desktop contains several subwindows: the command window, the workspace browser, the current directory window, the command history window, and one or more figure windows, which are shown only when the user displays a graphic.
EXISTING METHOD
• The method is able to cope with noisy data, but its generality is limited because a lexicon of words (containing at most 500 words in their experiments) has to be supplied for each individual image. The methods presented in [14, 15] detect characters as Maximally Stable Extremal Regions (MSERs) [11] and perform text recognition using the segmentation obtained by the MSER detector. An MSER is a particular case of an Extremal Region whose size remains virtually unchanged over a range of thresholds.
• The methods perform well but have problems with blurry images or characters with low contrast. According to the description provided by the ICDAR 2011 Robust Reading competition organizers [20], the winning method is based on MSER detection, but the method
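The idea of a region whose size stays virtually unchanged over a range of thresholds can be illustrated with a toy sketch. This is not the MSER detector of [11]; it only measures, for a few thresholds, the area of the connected dark region around a chosen seed pixel. The helper `region_area`, the 4-connectivity, and the sample image are assumptions made for the example.

```python
from collections import deque


def region_area(gray, seed, t):
    """Area of the 4-connected region of pixels <= t that contains `seed`.

    Returns 0 if the seed pixel itself is brighter than the threshold t.
    """
    rows, cols = len(gray), len(gray[0])
    r0, c0 = seed
    if gray[r0][c0] > t:
        return 0
    seen = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in seen and gray[nr][nc] <= t:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen)


# Made-up image: a dark 2 x 2 "character" on a bright background.
gray = [
    [200, 200, 200, 200],
    [200, 20, 30, 200],
    [200, 25, 20, 200],
    [200, 200, 200, 200],
]

areas = {t: region_area(gray, (1, 1), t) for t in (20, 50, 100, 150, 200)}
# areas is {20: 1, 50: 4, 100: 4, 150: 4, 200: 16}
```

The area stays at 4 pixels for thresholds from 50 to 150, so the dark blob behaves like a stable extremal region; a low-contrast or blurry character would show a much narrower stable range, which is consistent with the weakness noted above.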
PROPOSED METHOD
• The proposed methodology is described in four subsections. In Section II-A, we calculate the product of the Laplacian and Sobel operations on the input image to enhance the text details; this is called the Laplacian–Sobel product (LSP) process. A Bayesian classifier is used to classify true text pixels based on three probable matrices, as described in Section II-B. The three probable matrices are obtained from the LSP: the high-contrast pixels in the LSP classified as text pixels (HLSP), K-means with k = 2 applied to the maximum gradient difference of HLSP (K-MGD-HLSP), and K-means of LSP (between maximum and minimum values of a sliding window over HLSP.
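A rough sketch of the Laplacian–Sobel product step is given below. The slides do not specify the kernels or how the two responses are combined, so the 3 × 3 kernels, the use of the Sobel gradient magnitude, the absolute value of the Laplacian, and the zeroed border are all assumptions made for illustration; `filter3x3` and `laplacian_sobel_product` are hypothetical helper names.

```python
import math

# Common 3 x 3 kernels (assumed; the source does not state which ones are used).
LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def filter3x3(gray, kernel):
    """Correlate a 2-D list with a 3 x 3 kernel; border pixels are left at 0."""
    rows, cols = len(gray), len(gray[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            out[r][c] = float(sum(
                kernel[i][j] * gray[r + i - 1][c + j - 1]
                for i in range(3) for j in range(3)
            ))
    return out


def laplacian_sobel_product(gray):
    """Pixel-wise product of |Laplacian| and the Sobel gradient magnitude."""
    lap = filter3x3(gray, LAPLACIAN)
    gx = filter3x3(gray, SOBEL_X)
    gy = filter3x3(gray, SOBEL_Y)
    return [
        [abs(lap[r][c]) * math.sqrt(gx[r][c] ** 2 + gy[r][c] ** 2)
         for c in range(len(gray[0]))]
        for r in range(len(gray))
    ]


# Made-up image: flat background with one vertical edge, like a stroke boundary.
gray = [[10, 10, 10, 90, 90] for _ in range(5)]
lsp = laplacian_sobel_product(gray)
```

In this toy image the product is zero in the flat area and large on both sides of the edge, which is the kind of contrast enhancement the LSP step aims for before high-contrast pixels are picked out as text candidates.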
• Posterior probability estimation and text candidates. (a) TPM. (b) NTPM. (c) Bayesian result. (d) Text candidates.
• Boundary growing method. (a) BGM for components. (b) BGM for first line. (c) BGM for second line. (d) BGM for third line. (e) BGM for third line and false positives. (f) BGM for false positives. (g) False positives shown. (h) False positive elimination.
EXPERIMENT RESULT AND ANALYSIS
• An end-to-end real-time text localization and recognition method is presented in the paper. In the first stage of the classification, the probability of each ER being a character is estimated using novel features calculated with O(1) complexity, and only ERs with locally maximal probability are selected for the second stage, where the classification is improved using more computationally expensive features. It is demonstrated that, when the novel gradient magnitude projection is included, ERs cover 94.8% of characters. The average run time of the method on an 800 × 600 image is 0.3 s on a standard PC. However, direct comparison is not possible, as the method of Wang et al. uses a different task formulation and a different evaluation protocol. Robustness of
CONCLUSION
• In this paper, we proposed a new video scene text detection method that uses a new enhancement step, based on Laplacian and Sobel operations on the input image, to enhance low-contrast text pixels. A Bayesian classifier was used to classify true text pixels from the enhanced text matrix without a priori knowledge of the input image.
• Three probable text matrices and three probable non-text matrices were derived based on clustering and the result of the enhancement method. To traverse multi-oriented text, we proposed a boundary growing method based on the nearest neighbor concept.
• Experimentation and a comparative study showed that the proposed method outperformed the existing methods in terms of the evaluation measures, especially on complex non-horizontal data. However, there are a few problems in handling false positives.
• We plan to extend this method to the detection of curve-shaped text lines with good recall, precision, and F-measure, and low computational time. Notwithstanding the current limitations, which we will deal with in our future research, the contribution of this paper lies in our continued effort in detecting multi-oriented text lines in videos, which hitherto has not been well explored by others.
REFERENCES
• [1] J. Canny. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8:679–698, 1986.
• [2] X. Chen and A. L. Yuille. Detecting and reading text in natural scenes. CVPR, 2:366–373, 2004.
• [3] H. Cheng, X. Jiang, Y. Sun, and J. Wang. Colour image segmentation: advances and prospects. Pattern Recognition, 34(12):2259–2281, 2001.
• [4] N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press, March 2000.
• [5] B. Epshtein, E. Ofek, and Y. Wexler. Detecting text in natural scenes with stroke width transform. In CVPR 2010, pages 2963–2970.
• [6] L. Jung-Jin, P.-H. Lee, S.-W. Lee, A. Yuille, and C. Koch. AdaBoost for text detection in natural scene. In ICDAR 2011, pages 429–434, 2011.
• [7] R. Lienhart and A. Wernicke. Localizing and segmenting text in images and videos. Circuits and Systems for Video Technology, 12(4):256–268, 2002.
• [8] H. Liu and X. Ding. Handwritten character recognition using gradient feature and quadratic classifier with multiple discrimination schemes. In ICDAR 2005, pages 19–23, Vol. 1.
REAL-TIME SCENE TEXT LOCALIZATION AND RECOGNITION ppt.pptx

  • 1. REAL-TIME SCENE TEXT LOCALIZATION AND RECOGNITION project done by: 208H1A0454 PICHIKA MANOHAR 208H1A0416 CHEEDEPUDI G S PRAVEEN BABU 208H1A0422 DEVARAPALLI CHANDRASEKHAR 218H5A0411 KATARI SRINIVASRAO
  • 2. CONTENTS
      • INTRODUCTION
      • PREVIOUS WORK
      • IMAGE PROCESSING
      • MATLAB SOFTWARE
      • RESULT ANALYSIS
      • CONCLUSION
      • REFERENCES
  • 3. INTRODUCTION
      • Scene text recognition (STR) has recently become an increasingly active research field in computer vision, as shown by the popularity of the biennial ICDAR "robust reading" competitions [1] and the workshop on Camera-Based Document Analysis and Recognition (CBDAR).
      • With growing demand for automatic information identification, STR technology has wide application in automated logistics distribution, geographical positioning, license plate recognition, and driverless vehicles.
      • Text detection [2] is a common task in image analysis, while text recognition [3] is a more advanced task: it must not only localize the text spatially, which is an object detection problem, but also recognize the text, i.e., text spotting [4].
      • Compared with detection and recognition in traditional well-formatted documents, detecting and recognizing text in natural scenes is challenging because of multilingual content, varying text sizes, tilted fonts, blurring, background interference, handwriting, varying viewing angles, and so on, as shown
  • 4. Previous Work
      • Numerous methods that focus solely on text localization in real-world images have been published [6, 2, 7, 17]. The method of Epshtein et al. [5] converts an input image to greyscale and uses the Canny detector [1] to find edges.
      • Pairs of parallel edges are then used to calculate a stroke width for each pixel, and pixels with similar stroke width are grouped together into characters. The method is sensitive to noise and blur because it depends on successful edge detection, and it provides only a single segmentation for each character, which is not necessarily the best one for an OCR module. A similar edge-based approach with a different connected component algorithm is presented in [24].
      • A good overview of the methods and their performance can also be found in the ICDAR Robust Reading competition results [10, 9, 20]. Only a few methods that perform both text localization and recognition have been published. The method of Wang
      • Figure 2. Text localization and recognition overview. (a) Source 2 MPx image. (b) Intensity channel extracted. (c) ERs selected in O(N) by the first stage of the sequential classifier. (d) ERs selected by the second stage of the classifier. (e) Text lines found by region grouping. (f) Only ERs in text lines selected and text recognized by an OCR module. (g) Number of ERs at the end of each stage and its duration.
  • 5. IMAGE PROCESSING
      • The term digital image processing refers to the processing of a two-dimensional picture by a digital computer. In a broader context, it implies digital processing of any two-dimensional data.
      • A digital image is an array of real or complex numbers represented by a finite number of bits. An image given in the form of a transparency, slide, photograph, or X-ray is first digitized and stored as a matrix of binary digits in computer memory.
      • This digitized image can then be processed and/or displayed on a high-resolution television monitor.
      • For display, the image is stored in a rapid-access buffer memory, which refreshes the monitor at a rate of 25 frames per second to produce a visually continuous display.
  • 6. RGB IMAGE
      • An RGB color image is an M × N × 3 array of color pixels, where each color pixel is a triplet corresponding to the red, green, and blue components of the image at a specific spatial location.
      • An RGB image may be viewed as a "stack" of three gray-scale images that, when fed into the red, green, and blue inputs of a color monitor, produce a color image on the screen.
      • By convention, the three images forming an RGB color image are referred to as the red, green, and blue component images.
      • The data class of the component images determines their range of values. If an RGB image is of class double, the range of values is [0, 1].
      • A normal gray-scale image has 8-bit color depth = 256 gray levels.
      • A true color image has 24-bit color depth = 3 × 8 bits = 256 × 256 × 256 colors ≈ 16.7 million colors.
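The colour-depth arithmetic and the "stack of three gray-scale images" view on this slide can be checked with a short sketch. It is written in plain Python rather than MATLAB, and the helper names (`colour_count`, `split_channels`, `to_unit_range`) are illustrative assumptions, not functions from the slides or from any toolbox.

```python
def colour_count(bits_per_channel=8, channels=3):
    """Number of distinct values representable with the given depth."""
    return 2 ** (bits_per_channel * channels)


def split_channels(image):
    """Split an M x N x 3 image (rows of (R, G, B) triplets) into
    three M x N component images: red, green, and blue."""
    red = [[px[0] for px in row] for row in image]
    green = [[px[1] for px in row] for row in image]
    blue = [[px[2] for px in row] for row in image]
    return red, green, blue


def to_unit_range(pixel):
    """Map an 8-bit (R, G, B) triplet to floating-point values in [0, 1],
    mirroring the range used for images of class double."""
    return tuple(v / 255 for v in pixel)


# A tiny 2 x 2 RGB image: red, green, blue, and white pixels.
image = [[(255, 0, 0), (0, 255, 0)],
         [(0, 0, 255), (255, 255, 255)]]

print(colour_count(8, 1))       # 256 gray levels for one 8-bit channel
print(colour_count(8, 3))       # 16777216 colours for 24-bit true colour
print(split_channels(image)[0])  # red component image: [[255, 0], [0, 255]]
```

Dividing by 255 is one common way to rescale 8-bit data to [0, 1]; the slides do not say which scaling the project used.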
  • 7. BINARY IMAGE
      • The most elementary type of image representation is the binary image. It typically uses only two levels.
      • The two levels are referred to as black and white, denoted by '1' and '0'. This kind of representation is called a 1-bit-per-pixel image, because a single binary digit is enough to represent each pixel.
      • These images are often used to depict low-level information about a picture, such as its outline or shape, especially in applications such as optical character recognition (OCR), where only the outline of a character is needed to recognize the letter it represents.
      • Binary images are often generated from gray-scale images through a technique called thresholding.
      • Two-level thresholding acts as a decision rule: values above the threshold are set to '1' and values below it are set to '0'.
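The thresholding rule described on this slide can be sketched as follows. This is a minimal illustration in plain Python; the function name `threshold`, the default threshold of 128, and the choice to map a value exactly equal to the threshold to '1' are assumptions, since the slide does not specify them.

```python
def threshold(gray, t=128):
    """Convert a gray-scale image (rows of 0-255 values) to a binary image.

    Pixels with value >= t become 1 and all others become 0.
    Treating the value t itself as 1 is an assumption of this sketch.
    """
    return [[1 if value >= t else 0 for value in row] for row in gray]


gray = [[0, 127],
        [128, 255]]

print(threshold(gray))  # [[0, 0], [1, 1]]
```

A fixed global threshold is the simplest variant; the slides do not say how the threshold was chosen in the project.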
  • 8. IMAGE OF GRAY SCALE
      • Gray-scale (GS) images are referred to as monochrome, or one-color, images. They contain brightness information only.
      • They therefore carry no color information. The brightness, however, is represented at different levels. A typical 8-bit image holds brightness levels in the range 0-255, known as gray levels.
      • Here 0 refers to black and 255 refers to white. The 8-bit representation is a natural choice because computers handle data in 8-bit bytes. Fig. 5.7 (a) and (b) show two examples of such gray-scale images.
  • 9. COLOUR IMAGE
      • A color image (CI) can be modeled as three-band monochrome image data, where each band of data corresponds to a different color.
      • The figure illustrates this as an arrow (vector): the red, green, and blue values of a single pixel are referred to as a color pixel vector (R, G, B).
  • 10.
      • A color pixel vector consists of the red, green, and blue pixel values (R, G, B) at a given row/column pixel coordinate (r, c).
      • A multispectral image is one that captures image data at specific frequencies across the electromagnetic spectrum. Multispectral images often contain information outside the normal human perceptual range. This may include infrared, ultraviolet, X-ray, acoustic, or radar data. Sources of these types of images include satellite systems, underwater sonar systems, and medical diagnostic imaging systems.
  • 11. MATLAB SOFTWARE
      • MATLAB® is a high-performance language for technical computing. It integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation.
      • Typical uses include:
          • Math and computation
          • Algorithm development
          • Data acquisition
          • Modelling, simulation, and prototyping
          • Data analysis, exploration, and visualization
          • Scientific and engineering graphics
          • Application development, including graphical user interface building
  • 12.
      • MATLAB is an interactive system whose basic data element is an array that does not require dimensioning. This allows you to solve many technical computing problems, especially those with matrix and vector formulations, in a fraction of the time it would take to write a program in a scalar non-interactive language such as C or FORTRAN.
      • The name MATLAB stands for matrix laboratory. MATLAB was originally written to provide easy access to matrix software developed by the LINPACK and EISPACK projects. Today, MATLAB engines incorporate the LAPACK and BLAS libraries, embedding the state of the art in software for matrix computation.
      • MATLAB has evolved over a period of years with input from many users. In university environments, it is the standard instructional tool for introductory and advanced courses in mathematics, engineering, and science. In industry, MATLAB is the tool of choice for high-productivity research, development, and analysis.
  • 13. MATLAB DESKTOP
      • The MATLAB desktop is the main MATLAB application window. The desktop contains several sub-windows: the command window, the workspace browser, the current directory window, the command history window, and one or more figure windows, which are shown only when the user displays a graphic.
  • 14. EXISTING METHOD
      • The method is able to cope with noisy data, but its generality is limited, as a lexicon of words (which contains at most 500 words in their experiments) has to be supplied for each individual image. Methods presented in [14, 15] detect characters as Maximally Stable Extremal Regions (MSERs) [11] and perform text recognition using the segmentation obtained by the MSER detector. An MSER is a particular case of an Extremal Region whose size remains virtually unchanged over a range of thresholds. The methods perform well but have problems on blurry images or characters with low contrast. According to the description provided by the ICDAR 2011 Robust Reading competition organizers [20], the winning method is based on MSER detection, but the method
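The stability idea behind MSERs (a region whose size barely changes over a range of thresholds) can be illustrated with a toy sketch. This is not the MSER detector used in [14, 15]; it only tracks the size of one dark region around a chosen seed pixel as the threshold rises. The function names, the seed-based formulation, and the use of 4-connectivity are assumptions made for illustration.

```python
def region_size(gray, seed, t):
    """Size of the 4-connected region of pixels with value <= t
    that contains the seed pixel (0 if the seed itself is above t)."""
    height, width = len(gray), len(gray[0])
    if gray[seed[0]][seed[1]] > t:
        return 0
    seen = {seed}
    stack = [seed]
    while stack:
        r, c = stack.pop()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < height and 0 <= nc < width
            if inside and (nr, nc) not in seen and gray[nr][nc] <= t:
                seen.add((nr, nc))
                stack.append((nr, nc))
    return len(seen)


def size_profile(gray, seed, thresholds):
    """Region size at each threshold; a long flat run suggests a stable region."""
    return [region_size(gray, seed, t) for t in thresholds]


# A dark 2 x 2 "character" (value 20) on a bright background (value 200).
gray = [[200, 200, 200, 200],
        [200, 20, 20, 200],
        [200, 20, 20, 200],
        [200, 200, 200, 200]]

print(size_profile(gray, (1, 1), [10, 50, 100, 150, 200]))  # [0, 4, 4, 4, 16]
```

In this example the region keeps a size of 4 pixels for thresholds 50 through 150, which is the kind of plateau that marks a stable region. With low contrast or blur the plateau shortens, which is consistent with the weakness the slide mentions.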
  • 15. PROPOSED METHOD
      • The proposed methodology is described in four subsections. In Section II-A, we calculate the product of the Laplacian and Sobel operations on the input image to enhance the text details; this is called the Laplacian-Sobel product (LSP) process. A Bayesian classifier is used to classify true text pixels based on three probable matrices, as described in Section II-B. The three probable matrices are obtained from the LSP: high-contrast pixels in the LSP classified as text pixels (HLSP), K-means with k = 2 on the maximum gradient difference of HLSP (K-MGD-HLSP), and K-means of the LSP. The maximum gradient difference is the difference between the maximum and minimum values of a sliding window over HLSP.
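A minimal sketch of a Laplacian-Sobel product is given below in plain Python. The slides do not give the kernels, the way the two responses are combined, or the border handling, so the 4-neighbour Laplacian kernel, the Sobel gradient magnitude, the use of the absolute Laplacian response, and the zeroed border are all assumptions of this sketch rather than the method's actual definition.

```python
import math

# Assumed 3 x 3 kernels; the slides do not specify which variants were used.
LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def filter3x3(img, kernel):
    """Apply a 3 x 3 kernel to every interior pixel; border pixels stay 0."""
    height, width = len(img), len(img[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            out[r][c] = float(sum(
                kernel[i][j] * img[r + i - 1][c + j - 1]
                for i in range(3) for j in range(3)))
    return out


def laplacian_sobel_product(img):
    """Pixel-wise product of |Laplacian| and the Sobel gradient magnitude."""
    lap = filter3x3(img, LAPLACIAN)
    gx = filter3x3(img, SOBEL_X)
    gy = filter3x3(img, SOBEL_Y)
    return [[abs(lap[r][c]) * math.hypot(gx[r][c], gy[r][c])
             for c in range(len(img[0]))]
            for r in range(len(img))]


# A vertical step edge: two dark columns followed by three bright columns.
step = [[0, 0, 10, 10, 10] for _ in range(5)]
lsp = laplacian_sobel_product(step)
print(lsp[2])  # [0.0, 400.0, 400.0, 0.0, 0.0]
```

The product is large only where both operators respond, that is, on the two columns adjacent to the edge, and is zero in flat areas. That is the sense in which multiplying the two responses emphasises high-contrast transitions such as text strokes.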
  • 16.
      • Posterior probability estimation and text candidates. (a) TPM. (b) NTPM. (c) Bayesian result. (d) Text candidates.
      • Boundary growing method. (a) BGM for components. (b) BGM for first line. (c) BGM for second line. (d) BGM for third line. (e) BGM for third line and false positives. (f) BGM for false positives. (g) False positives shown. (h) False positive elimination.
  • 17. EXPERIMENT RESULT AND ANALYSIS
      • An end-to-end real-time text localization and recognition method is presented in the paper. In the first stage of the classification, the probability of each ER being a character is estimated using novel features calculated with O(1) complexity, and only ERs with locally maximal probability are selected for the second stage, where the classification is improved using more computationally expensive features. It is demonstrated that, when the novel gradient magnitude projection is included, ERs cover 94.8% of characters. The average run time of the method on an 800 × 600 image is 0.3 s on a standard PC. However, direct comparison is not possible, as the method of Wang et al. uses a different task formulation and a different evaluation protocol. Robustness of
  • 18.
  • 19. CONCLUSION
      • In this paper, we proposed a new video scene text detection method that uses a new enhancement step, based on Laplacian and Sobel operations on the input image, to enhance low-contrast text pixels. A Bayesian classifier was used to classify true text pixels from the enhanced text matrix without a priori knowledge of the input image.
      • Three probable text matrices and three probable non-text matrices were derived from clustering and from the result of the enhancement method. To traverse multi-oriented text, we proposed a boundary growing method based on the nearest-neighbor concept.
      • Experiments and a comparative study showed that the proposed method outperformed the existing methods in terms of the evaluation measures, especially on complex non-horizontal data. However, there are a few problems in handling false positives.
      • We plan to extend this method to the detection of curve-shaped text lines with good recall, precision, and F-measure and low computational time. Notwithstanding the current limitations, which we will deal with in future research, the contribution of this paper lies in our continued effort to detect multi-oriented text lines in videos, which has so far not been well explored by others.
  • 20. REFERENCES
      • [1] J. Canny. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8:679–698, 1986.
      • [2] X. Chen and A. L. Yuille. Detecting and reading text in natural scenes. CVPR, 2:366–373, 2004.
      • [3] H. Cheng, X. Jiang, Y. Sun, and J. Wang. Color image segmentation: advances and prospects. Pattern Recognition, 34(12):2259–2281, 2001.
      • [4] N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press, March 2000.
      • [5] B. Epshtein, E. Ofek, and Y. Wexler. Detecting text in natural scenes with stroke width transform. In CVPR 2010, pages 2963–2970.
      • [6] L. Jung-Jin, P.-H. Lee, S.-W. Lee, A. Yuille, and C. Koch. AdaBoost for text detection in natural scene. In ICDAR 2011, pages 429–434, 2011.
      • [7] R. Lienhart and A. Wernicke. Localizing and segmenting text in images and videos. Circuits and Systems for Video Technology, 12(4):256–268, 2002.
      • [8] H. Liu and X. Ding. Handwritten character recognition using gradient feature and quadratic classifier with multiple discrimination schemes. In ICDAR 2005, pages 19–23, Vol. 1.