REAL-TIME SCENE TEXT
LOCALIZATION AND RECOGNITION
Project done by:
208H1A0454 PICHIKA MANOHAR
208H1A0416 CHEEDEPUDI G S PRAVEEN BABU
208H1A0422 DEVARAPALLI CHANDRASEKHAR
218H5A0411 KATARI SRINIVASRAO
CONTENTS
 INTRODUCTION
 PREVIOUS WORK
 IMAGE PROCESSING
 MATLAB SOFTWARE
 RESULT ANALYSIS
 CONCLUSION
 REFERENCES
INTRODUCTION
 Scene text recognition (STR) has recently become an increasingly hot research field in computer vision, as manifested by the prosperity of the biennial "Robust Reading" competitions at ICDAR [1], along with the workshop on Camera-Based Document Analysis and Recognition (CBDAR).
 With the extensive demand for information identification, STR technology has large-scale applications in automated logistics distribution, geographical positioning, license plate recognition, and driverless vehicles.
 Text detection [2] is a common task in image analysis, while text recognition [3] is a more advanced task: it must not only localize the text spatially (an object detection problem) but also recognize it, i.e., perform text spotting [4].
 Compared to traditional well-formatted document text, detecting and recognizing text in natural scenes is a challenging visual task due to multilingual content, varying text sizes, font tilt, blurring, background interference, handwriting, arbitrary viewing angles, and so on.
PREVIOUS WORK
 Numerous methods that focus solely on text localization in real-world images have been published [6, 2, 7, 17]. The method of Epshtein et al. [5] converts an input image to greyscale and uses the Canny detector [1] to find edges (a minimal sketch of this edge-detection step appears below).
 Pairs of parallel edges are then used to calculate a stroke width for each pixel, and pixels with similar stroke width are grouped together into characters. The method is sensitive to noise and blurry images because it depends on successful edge detection, and it provides only a single segmentation for each character, which is not necessarily the best one for an OCR module. A similar edge-based approach with a different connected-component algorithm is presented in [24].
 A good overview of the methods and their performance can also be found in the ICDAR Robust Reading competition results [10, 9, 20]. Only a few methods that perform both text localization and recognition have been published, among them the method of Wang et al.
 Figure 2. Text localization and recognition overview: (a) source 2 MPx image; (b) intensity channel extracted; (c) ERs selected in ON by the first stage of the sequential classifier; (d) ERs selected by the second stage of the classifier; (e) text lines found by region grouping; (f) only ERs in text lines selected and text recognized by an OCR module; (g) number of ERs at the end of each stage and its duration.
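 A minimal MATLAB sketch of the edge-detection step behind such stroke-width methods; the file name 'scene.jpg' and the default Canny thresholds are placeholder assumptions, not parameters from [5].

    % Sketch: Canny edge detection as used by edge-based text detectors.
    I = imread('scene.jpg');   % placeholder input image (assumption)
    G = rgb2gray(I);           % convert to greyscale, as in [5]
    E = edge(G, 'canny');      % Canny edge map [1], default thresholds
    imshow(E);                 % inspect the detected edges

Pairs of parallel edges in E would then be matched to estimate a per-pixel stroke width.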
IMAGE PROCESSING
 The term digital image processing refers to the processing of a two-dimensional picture by a digital computer. In a broader context, it implies digital processing of any two-dimensional data.
 A digital image is an array of real or complex numbers represented by a finite number of bits. An image given in the form of a transparency, slide, photograph, or X-ray is first digitized and stored as a matrix of binary digits in computer memory.
 This digitized image can then be processed and/or displayed on a high-resolution television monitor.
 For display, the image is stored in a rapid-access buffer memory, which refreshes the monitor at a rate of 25 frames per second to produce a visually continuous display.
RGB IMAGE
 An RGB color image is an M×N×3 array of color pixels, where each color pixel is a triplet corresponding to the red, green, and blue components of the image at a specific spatial location.
 An RGB image may be viewed as a "stack" of three grayscale images that, when fed into the red, green, and blue inputs of a color monitor, produces a color image on the screen.
 By convention, the three images forming an RGB color image are referred to as the red, green, and blue component images.
 The data class of the component images determines their range of values. If an RGB image is of class double, the range of values is [0, 1] (see the sketch below).
 A normal grayscale image has 8-bit color depth = 256 gray levels.
 A true color image has 24-bit color depth = 8 + 8 + 8 bits = 256 × 256 × 256 colors ≈ 16 million colors.
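 A minimal MATLAB sketch of these points; 'photo.jpg' is a placeholder file name, not part of the original text.

    % Sketch: an RGB image as an M-by-N-by-3 array ('photo.jpg' assumed).
    rgb = imread('photo.jpg');   % typically uint8, values 0..255
    size(rgb)                    % returns [M N 3]
    R = rgb(:,:,1);              % red component image
    G = rgb(:,:,2);              % green component image
    B = rgb(:,:,3);              % blue component image
    d = im2double(rgb);          % class double: values rescaled to [0, 1]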
BINARY IMAGE
 The most elementary type of image representation is the binary image, which uses only two levels.
 The two levels are referred to as black and white, denoted '0' and '1'. This kind of representation is a 1-bit-per-pixel image, which is economical because only a single bit is needed to signify each pixel.
 Such images are often used to depict low-level information about a picture, such as its outline or shape, especially in applications like optical character recognition (OCR), where only the outline of a character is required to recognize the letter it represents.
 Binary images are generated from grayscale images through a technique called thresholding (a minimal sketch follows).
 Two-level thresholding simply acts as a decision boundary: pixels above the threshold switch to '1' and pixels below it switch to '0'.
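 A minimal thresholding sketch in MATLAB; the input file, the fixed threshold 0.5, and the use of Otsu's method via graythresh are illustrative assumptions.

    % Sketch: grayscale-to-binary conversion by two-level thresholding.
    g  = im2double(imread('page.png'));  % placeholder grayscale input
    bw = g > 0.5;                        % pixels above the threshold -> 1
    % Alternatively, let Otsu's method choose the threshold:
    t   = graythresh(g);                 % global threshold in [0, 1]
    bw2 = imbinarize(g, t);              % logical (1-bit-per-pixel) image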
GRAYSCALE IMAGE
 Grayscale (GS) images are also known as monochrome or single-color pictures. Such images carry brightness information only.
 Hence the color data they contain is empty; the brightness, however, is represented at different levels. A typical 8-bit image holds a range of 0-255 brightness levels, known as gray levels.
 Here 0 refers to black and 255 refers to white. The 8-bit depiction matches the fact that a computer naturally handles data in 8-bit units. Fig. 5.7 (a) and (b) below show two examples of such grayscale images.
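 A minimal MATLAB sketch of grayscale conversion and gray levels; 'photo.jpg' is a placeholder file name.

    % Sketch: 8-bit grayscale spans gray levels 0 (black) to 255 (white).
    rgb = imread('photo.jpg');    % placeholder color input
    gs  = rgb2gray(rgb);          % weighted sum of R, G, B -> one channel
    class(gs)                     % uint8: 256 possible gray levels
    [min(gs(:)) max(gs(:))]       % brightness range actually present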
COLOUR IMAGE
 A color image (CI) is modeled as three bands of monochrome light information, where each band of information corresponds to a different color.
 Each pixel can therefore be illustrated as a vector: a triplet of the individual smallest-unit measurements of the red, green, and blue values, the color vector (R, G, B).
 A color pixel vector consists of the red, green, and blue pixel values (R, G, B) at a given row/column pixel coordinate (r, c) (see the sketch below).
 A multispectral image is one that captures image data at particular frequencies across the electromagnetic spectrum. Multispectral images regularly contain data outside the typical human perceptual range, which may include infrared, ultraviolet, X-ray, acoustic, or radar data. Sources of these kinds of images include satellite systems, underwater sonar systems, and medical diagnostic imaging systems.
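 A minimal MATLAB sketch of reading the color vector at a pixel coordinate; the file name and the coordinates (100, 150) are placeholder assumptions.

    % Sketch: the color vector (R, G, B) at row/column coordinate (r, c).
    rgb = imread('photo.jpg');    % placeholder color image
    r = 100; c = 150;             % placeholder pixel coordinates
    v = squeeze(rgb(r, c, :))'    % 1-by-3 vector [R G B]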
MATLAB SOFTWARE
 MATLAB® is a high-performance language for technical computing. It integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation.
 Typical uses include:
Math and computation
Algorithm development
Data acquisition
Modelling, simulation, and prototyping
Data analysis, exploration, and visualization
Scientific and engineering graphics
Application development, including graphical user interface building
 MATLAB is an interactive system whose basic data element is an array that does not require dimensioning. This allows you to solve many technical computing problems, especially those with matrix and vector formulations, in a fraction of the time it would take to write a program in a scalar non-interactive language such as C or FORTRAN (see the sketch below).
 The name MATLAB stands for matrix laboratory. MATLAB was originally written to provide easy
access to matrix software developed by the LINPACK and EISPACK projects. Today, MATLAB
engines incorporate the LAPACK and BLAS libraries, embedding the state of the art in software for
matrix computation.
 MATLAB has evolved over a period of years with input from many users. In university environments, it is the standard instructional tool for introductory and advanced courses in mathematics, engineering, and science. In industry, MATLAB is the tool of choice for high-productivity research, development, and analysis.
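 A tiny MATLAB sketch of the matrix-oriented style described above; the values of A and b are arbitrary illustrative choices.

    % Sketch: matrix computing without explicit dimensioning or loops.
    A = [4 1; 2 3];       % arrays are created without declaring sizes
    b = [1; 2];
    x = A \ b             % solve A*x = b (LAPACK-backed backslash)
    y = sin(0:0.1:1)      % elementwise operation over a whole vector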
MATLAB DESKTOP
 The MATLAB Desktop is the main MATLAB application window. The desktop contains sub-windows: the Command Window, the Workspace Browser, the Current Directory window, the Command History window, and one or more Figure Windows, which are shown only when the user displays a graphic.
EXISTING METHOD
 The method is able to cope with noisy data, but its generality is limited, as a lexicon of words (containing at most 500 words in their experiments) has to be supplied for each individual image. Methods presented in [14, 15] detect characters as Maximally Stable Extremal Regions (MSERs) [11] and perform text recognition using the segmentation obtained by the MSER detector.
 An MSER is a particular case of an Extremal Region whose size remains virtually unchanged over a range of thresholds. The methods perform well but have problems on blurry images or characters with low contrast. According to the description provided by the ICDAR 2011 Robust Reading competition organizers [20], the winning method is also based on MSER detection (a minimal MSER sketch follows).
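 A minimal MATLAB sketch of MSER character-candidate detection using detectMSERFeatures from the Computer Vision Toolbox; this only illustrates the MSER idea and is not the competition code. 'scene.jpg' is a placeholder.

    % Sketch: MSER detection of character candidates (illustrative only).
    I = rgb2gray(imread('scene.jpg'));   % placeholder input image
    regions = detectMSERFeatures(I);     % maximally stable regions [11]
    imshow(I); hold on;                  % overlay detected region pixels
    plot(regions, 'showPixelList', true, 'showEllipses', false);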
PROPOSED METHOD
 The proposed methodology is described in four subsections. In Section II-A, we calculate the product of Laplacian and Sobel operations on the input image to enhance text details; this is called the Laplacian-Sobel product (LSP) process (a minimal sketch appears after this list). A Bayesian classifier is used for classifying true text pixels based on three probable matrices, as described in Section II-B. The three probable matrices are obtained on the basis of LSP: high-contrast pixels in LSP classified as text pixels (HLSP), K-means with k = 2 of the maximum gradient difference of HLSP (K-MGD-HLSP), and K-means of the LSP values between the maximum and minimum of a sliding window over HLSP.
 Posterior probability estimation and text candidates: (a) TPM; (b) NTPM; (c) Bayesian result; (d) text candidates.
 Boundary growing method: (a) BGM for components; (b) BGM for first line; (c) BGM for second line; (d) BGM for third line; (e) BGM for third line and false positives; (f) BGM for false positives; (g) false positives shown; (h) false positive elimination.
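 A hedged MATLAB sketch of the LSP enhancement as we read the description above; the kernel choices, boundary handling, and normalization are assumptions, not the authors' exact parameters.

    % Sketch of the Laplacian-Sobel product (LSP); parameters are assumed.
    g   = im2double(rgb2gray(imread('frame.png')));   % placeholder frame
    lap = abs(imfilter(g, fspecial('laplacian'), 'replicate'));
    gx  = imfilter(g, fspecial('sobel')', 'replicate'); % horizontal grad.
    gy  = imfilter(g, fspecial('sobel'),  'replicate'); % vertical grad.
    sob = sqrt(gx.^2 + gy.^2);        % Sobel gradient magnitude
    lsp = lap .* sob;                 % product enhances low-contrast text
    imshow(lsp, []);                  % high values mark candidate pixels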
EXPERIMENT RESULT AND ANALYSIS
 An end-to-end real-time text localization and recognition method is presented in the paper. In the first stage of the classification, the probability of each ER being a character is estimated using novel features calculated with O(1) complexity, and only ERs with locally maximal probability are selected for the second stage, where the classification is improved using more computationally expensive features. It is demonstrated that, by including the novel gradient magnitude projection, ERs cover 94.8% of characters. The average run time of the method on an 800 × 600 image is 0.3 s on a standard PC; however, direct comparison is not possible, as the method of Wang et al. uses a different task formulation and a different evaluation protocol.
CONCLUSION
 In this paper, we proposed a new video scene text detection method that makes use of a new enhancement technique based on Laplacian and Sobel operations on input images to enhance low-contrast text pixels. A Bayesian classifier was used to classify true text pixels from the enhanced text matrix without a priori knowledge of the input image.
 Three probable text matrices and three probable non-text matrices were derived based on clustering and the result of the enhancement method. To traverse multi-oriented text, we proposed a boundary growing method based on the nearest-neighbor concept.
 Experimentation and a comparative study showed that the proposed method outperformed existing methods in terms of the standard measures, especially on complex non-horizontal data. However, there remain a few problems in handling false positives.
 We plan to extend this method to the detection of curve-shaped text lines with good recall, precision, and F-measures and low computational times. Notwithstanding the current limitations, which we will address in future research, the contribution of this paper lies in our continued effort toward detecting multi-oriented text lines in videos, which hitherto has not been well explored.
REFERENCES
 [1] J. Canny. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8:679–698, 1986.
 [2] X. Chen and A. L. Yuille. Detecting and reading text in natural scenes. CVPR, 2:366–373, 2004.
 [3] H. Cheng, X. Jiang, Y. Sun, and J. Wang. Colour image segmentation: advances and prospects. Pattern Recognition, 34(12):2259–2281, 2001.
 [4] N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press, March 2000.
 [5] B. Epshtein, E. Ofek, and Y. Wexler. Detecting text in natural scenes with stroke width transform. In CVPR 2010, pages 2963–2970.
 [6] J.-J. Lee, P.-H. Lee, S.-W. Lee, A. Yuille, and C. Koch. AdaBoost for text detection in natural scene. In ICDAR 2011, pages 429–434, 2011.
 [7] R. Lienhart and A. Wernicke. Localizing and segmenting text in images and videos. IEEE Transactions on Circuits and Systems for Video Technology, 12(4):256–268, 2002.
 [8] H. Liu and X. Ding. Handwritten character recognition using gradient feature and quadratic classifier with multiple discrimination schemes. In ICDAR 2005, pages 19–23, Vol. 1.