In natural scene images, text detection is a challenging problem for many content-based image analysis tasks. In this paper, Bayesian network scores are used to classify candidate character regions by computing posterior probabilities, and these posterior probabilities define an adaptive threshold for accurate text detection in scene images. Candidate character regions are first extracted using maximally stable extremal regions (MSER). Bayesian network scores are then learned with the K2 algorithm by evaluating dependencies among the features of each candidate character region. A Bayesian logistic regression classifier is trained to compute the posterior probabilities that define the adaptive classifier threshold, and candidate regions falling below this threshold are discarded as non-character regions. Finally, text regions are detected using an effective text localization scheme based on geometric features. The entire system is evaluated on the ICDAR 2013 competition database, and experimental results show competitive performance (precision, recall and harmonic mean) against recently published algorithms.
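As a rough illustration of the final filtering step described above, the sketch below discards candidate regions whose classifier posterior falls below an adaptive threshold. The threshold rule (the mean of all posteriors) and the function name are assumptions for illustration, not the paper's exact formulation.

```python
# Hypothetical sketch: filter candidate character regions by an adaptive
# threshold on classifier posterior probabilities. The mean-of-posteriors
# threshold is an assumption, not the published method's exact rule.

def filter_candidates(posteriors):
    """Keep indices of candidates whose posterior meets the adaptive threshold."""
    threshold = sum(posteriors) / len(posteriors)  # data-dependent, hence "adaptive"
    return [i for i, p in enumerate(posteriors) if p >= threshold]

scores = [0.9, 0.2, 0.75, 0.4, 0.85]
print(filter_candidates(scores))  # [0, 2, 4] — likely character regions
```

The point of an adaptive threshold is that it shifts with each image's score distribution, rather than being a fixed cut-off tuned on one dataset.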
A NOVEL FEATURE SET FOR RECOGNITION OF SIMILAR SHAPED HANDWRITTEN HINDI CHARA...cscpconf
The growing need for handwritten Hindi character recognition in Indian offices such as passport and railway offices has made it a vital area of research. Similar shaped characters are especially prone to misclassification. In this paper, four Machine Learning (ML) algorithms, namely Bayesian Network, Radial Basis Function Network (RBFN), Multilayer Perceptron (MLP), and the C4.5 Decision Tree, are used for recognition of Similar Shaped Handwritten Hindi Characters (SSHHC), and their performance is compared. A novel set of 85 features is generated on the basis of character geometry. Because of the high dimensionality of the feature vector, the classifiers can be computationally complex, so its dimensionality is reduced to 11 and 4 features using Correlation-Based (CFS) and Consistency-Based (CON) feature selection techniques, respectively. Experimental results show that the Bayesian Network is the better choice with CFS features, while C4.5 performs better with CON features.
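A minimal sketch of correlation-driven feature ranking, in the spirit of the CFS step above: rank features by the absolute correlation of each feature with the class label and keep the top k. This is a crude stand-in (full CFS also penalises inter-feature correlation); the toy data and names are invented for illustration.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

# Toy data: 3 features over 4 samples, binary class label.
features = [[1, 2, 3, 4], [4, 3, 2, 1], [1, 1, 2, 2]]
label = [0, 0, 1, 1]
ranked = sorted(range(len(features)),
                key=lambda i: -abs(pearson(features[i], label)))
print(ranked[:2])  # indices of the 2 features most correlated with the label
```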
Image to Text Converter PPT. The presentation contains step-by-step algorithms and methods for converting images into text, and in particular covers algorithms for images containing human handwriting, converting the writing into text (image to text).
Improved wolf algorithm on document images detection using optimum mean techn...journalBEEI
Detecting text in handwritten historical documents provides high-level features for the challenging problem of handwriting recognition. Such handwriting often contains noise, faint or incomplete strokes, strokes with gaps, and competing lines when embedded in a table or form, making it unsuitable for local line-following algorithms and the associated binarization schemes. In this paper, a method based on an optimum threshold value, named the Optimum Mean method, is presented. The Wolf method fails to detect thin text in non-uniform input images; the proposed method overcomes this problem by computing a maximum threshold value from the optimum mean. In the experiments, the proposed method obtained a higher F-measure (74.53) and PSNR (14.77) and the lowest NRM (0.11) compared with the Wolf method. In conclusion, the proposed method effectively solves the Wolf method's problem and produces a high-quality output image.
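The general flavour of mean-based thresholding can be sketched as below. This is a deliberately simplified stand-in, assuming a plain mean-intensity threshold per window; the actual Optimum Mean formulation in the paper is more involved.

```python
def mean_threshold_binarize(pixels):
    """Binarize a window of grey-level pixels: values at or below the
    window mean become text (0), others background (255).
    Assumption: a plain mean threshold, not the paper's exact rule."""
    mean = sum(pixels) / len(pixels)
    return [0 if p <= mean else 255 for p in pixels]

row = [30, 200, 25, 220, 40, 210]
print(mean_threshold_binarize(row))  # [0, 255, 0, 255, 0, 255]
```

Because the threshold is derived from the local data, faint strokes that sit below the local mean are still captured, which is the failure mode the paper attributes to the Wolf method on thin text.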
Text detection and recognition in scene (natural) images has applications in computer vision systems such as registration number plate detection, automatic traffic sign detection, image retrieval, and assistance for visually impaired people. Scene text, however, suffers from complicated backgrounds, image blur, partly occluded text, variations in font styles, image noise, and varying illumination, so scene text recognition is a difficult computer vision problem. In this paper, a connected component method is used to extract the text from the background. Horizontal and vertical projection profiles, geometric properties of text, image binarization, and a gap-filling method are used to extract the text from scene images. A histogram-based threshold is then applied to separate text from the background, and finally the text is extracted from the images.
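The projection-profile step mentioned above can be sketched in a few lines: sum the text pixels along each row and each column of a binary image, so that peaks indicate text lines and character columns. The toy image is invented for illustration.

```python
def projection_profiles(binary):
    """Horizontal and vertical projection profiles of a binary image
    (1 = text pixel). Rows/columns with high sums indicate text."""
    horizontal = [sum(row) for row in binary]          # one sum per row
    vertical = [sum(col) for col in zip(*binary)]      # one sum per column
    return horizontal, vertical

img = [
    [0, 1, 1, 0],
    [0, 0, 0, 0],   # blank row: gap between text lines
    [1, 1, 0, 1],
]
h, v = projection_profiles(img)
print(h, v)  # [2, 0, 3] [1, 2, 1, 1]
```

Zero-valued runs in the horizontal profile are the natural places to cut the image into candidate text lines.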
Facial image retrieval on semantic features using adaptive mean genetic algor...TELKOMNIKA JOURNAL
The emergence of larger databases has made image retrieval techniques an essential component and has led to the development of more efficient image retrieval systems. Retrieval can be either content- or text-based. In this paper, the focus is on content-based image retrieval from the FGNET database. Input query images are subjected to several processing techniques before the squared Euclidean distance (SED) between them and the database images is computed; the images with the shortest distance are considered a match and are retrieved. The processing techniques involve the application of the median modified Wiener filter (MMWF) and the extraction of low-level features using histograms of oriented gradients (HOG), the discrete wavelet transform (DWT), GIST, and the local tetra pattern (LTrP). Finally, features are selected using the Adaptive Mean Genetic Algorithm (AMGA). In this study, the average PSNR value obtained after applying the Wiener filter was 45.29. The performance of the AMGA was evaluated in terms of precision, F-measure, and recall, with average values of 0.75, 0.692, and 0.66 respectively. The performance metrics of the AMGA were compared with those of the particle swarm optimization (PSO) and genetic algorithm (GA) approaches, and the AMGA was found to perform better, demonstrating its efficiency.
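The retrieval step itself reduces to a nearest-neighbour search under the squared Euclidean distance, which can be sketched as follows. The feature vectors are invented toy data; a real system would use the HOG/DWT/GIST/LTrP features described above.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance (SED) between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def retrieve(query, database):
    """Index of the database feature vector closest to the query."""
    return min(range(len(database)),
               key=lambda i: squared_euclidean(query, database[i]))

db = [[1.0, 0.0], [0.2, 0.9], [0.5, 0.5]]
print(retrieve([0.25, 0.8], db))  # 1 — the nearest stored vector
```

Squaring avoids the square root without changing the ranking, a common micro-optimisation when only the nearest match matters.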
NEW ONTOLOGY RETRIEVAL IMAGE METHOD IN 5K COREL IMAGESijcax
Semantic annotation of images is an important research topic in both image understanding and database or web image search. Image annotation is a technique for choosing appropriate labels for images by extracting effective and hidden features from pictures. In the feature extraction step of the proposed method, we present a model that combines effective features of visual topics (global features over an image) and regional contexts (relationships between regions within an image and across images) for automatic image annotation. In the annotation step, we create a new ontology (based on the WordNet ontology) for the semantic relationships between tags, improving the classification and narrowing the semantic gap in automatic image annotation. Experimental results on the 5k Corel dataset show that the proposed annotation method reduces the complexity of the classification and increases accuracy compared with other methods.
Texture features based text extraction from images using DWT and K-means clus...Divya Gera
Text extraction from different kinds of images: document, caption, and scene text images. The discrete wavelet transform was used to extract horizontal, vertical, and diagonal features, and k-means clustering was used to group the features into text and background clusters. For simple images k = 2 worked (text and background clusters), while for complex images k = 3 was used (text, complex background, and simple background clusters).
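A toy sketch of the two ingredients above: single-level Haar detail coefficients (differences of adjacent pixels, the simplest DWT high-pass) followed by a minimal 2-means on the coefficient magnitudes. Real systems use a full 2-D DWT (e.g. via PyWavelets) and cluster richer texture features; everything here is simplified for illustration.

```python
def haar_detail(row):
    """Single-level 1-D Haar detail coefficients (unnormalised pairwise
    differences); large magnitudes mark abrupt edges typical of text."""
    return [row[i] - row[i + 1] for i in range(0, len(row) - 1, 2)]

def two_means(values, iters=10):
    """Minimal 2-means on scalars; returns labels (1 = high-energy/text)."""
    lo, hi = min(values), max(values)  # initial centroids
    for _ in range(iters):
        labels = [1 if abs(v - hi) < abs(v - lo) else 0 for v in values]
        lo = sum(v for v, l in zip(values, labels) if l == 0) / max(1, labels.count(0))
        hi = sum(v for v, l in zip(values, labels) if l == 1) / max(1, labels.count(1))
    return labels

# One image row: flat background interleaved with sharp "stroke" edges.
energies = [abs(c) for c in haar_detail([10, 10, 200, 40, 12, 11, 180, 20])]
print(two_means(energies))  # [0, 1, 0, 1] — background vs. text blocks
```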
Dimensionality Reduction and Feature Selection Methods for Script Identificat...ITIIIndustries
The goal of this research is to explore the effects of dimensionality reduction and feature selection on the problem of script identification from images of printed documents. The k-adjacent segment descriptor is well suited to this task because of its ability to capture visual patterns. We used principal component analysis to reduce the feature matrix to a more manageable size that can be trained easily, and experimented with varying combinations of dimensions of the super feature set. A modular neural network approach was used to classify 7 languages: Arabic, Chinese, English, Japanese, Tamil, Thai, and Korean.
Text detection and recognition from natural sceneshemanthmcqueen
Text characters in natural scenes and surroundings provide valuable information about a place and can even carry legal or otherwise important information, so detecting and recognising such text is very useful. Recognising this text is not easy, however, because of the diverse backgrounds and fonts in which it appears. In this paper, a method is proposed to extract text information from the surroundings. First, a character descriptor is designed using existing standard detectors and descriptors. Then, character structure is modeled for each character class by designing stroke configuration maps. In natural scenes, text generally appears on nearby signboards and other objects, and its extraction is difficult because of noisy backgrounds and diverse fonts and text sizes, though many applications have proven effective at extracting text from surroundings. To this end, the text extraction method is divided into two processes:
Text detection
Text recognition
IEEE Final Year Projects 2011-2012 :: Elysium Technologies Pvt Ltd::Imageproc...sunda2011
IEEE Final Year Projects 2011-2012 :: Elysium Technologies Pvt Ltd
International Journal of Engineering Research and DevelopmentIJERD Editor
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture engineering,
Aerospace Engineering.
Review of ocr techniques used in automatic mail sorting of postal envelopessipij
This paper presents a review of various OCR techniques used in the automatic mail sorting process. A complete description of the various existing methods for address block extraction and digit recognition used in the literature is discussed. The objective of this study is to provide a complete overview of the methods and techniques used by many researchers for automating the mail sorting process in postal services in various countries. The significance of Zip code or Pincode recognition is discussed.
A Study of BFLOAT16 for Deep Learning TrainingSubhajit Sahu
Highlighted notes of:
A Study of BFLOAT16 for Deep Learning Training
This paper presents the first comprehensive empirical study demonstrating the efficacy of the Brain Floating Point (BFLOAT16) half-precision format for Deep Learning training across image classification, speech recognition, language modeling, generative networks, and industrial recommendation systems. BFLOAT16 is attractive for Deep Learning training for two reasons: the range of values it can represent is the same as that of the IEEE 754 floating-point format (FP32), and conversion to/from FP32 is simple. Maintaining the same range as FP32 is important to ensure that no hyper-parameter tuning is required for convergence; e.g., IEEE 754-compliant half-precision floating point (FP16) requires hyper-parameter tuning. In this paper, we discuss the flow of tensors and various key operations in mixed-precision training and delve into details of operations such as the rounding modes for converting FP32 tensors to BFLOAT16. We have implemented a method to emulate BFLOAT16 operations in Tensorflow, Caffe2, IntelCaffe, and Neon for our experiments. Our results show that deep learning training using BFLOAT16 tensors achieves the same state-of-the-art (SOTA) results across domains as FP32 tensors in the same number of iterations and with no changes to hyper-parameters.
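The FP32-to-BFLOAT16 conversion with round-to-nearest-even mentioned above is short enough to sketch directly: BFLOAT16 is just the top 16 bits of an FP32 value, so rounding amounts to adding a bias before truncating the low 16 mantissa bits. This is a generic illustration of the format, not the paper's framework-specific emulation code (NaN handling is omitted).

```python
import struct

def fp32_to_bf16_bits(x):
    """Round an FP32 value to BFLOAT16 (round-to-nearest-even on the 16
    dropped mantissa bits) and return the 16-bit pattern. NaNs not handled."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)  # the +lsb term breaks ties to even
    return ((bits + rounding_bias) >> 16) & 0xFFFF

def bf16_bits_to_fp32(b):
    """Widen a BFLOAT16 bit pattern back to FP32 (exact, just append zeros)."""
    return struct.unpack('<f', struct.pack('<I', b << 16))[0]

v = bf16_bits_to_fp32(fp32_to_bf16_bits(1.2345678))
print(v)  # 1.234375 — nearest value representable with a 7-bit mantissa
```

Because BFLOAT16 keeps FP32's 8 exponent bits, only precision is lost in this round trip, never dynamic range; that is the property the paper credits for training without hyper-parameter changes.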
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
A Multi Criteria Decision Making Based Approach for Semantic Image Annotationijcax
Automatic image annotation has emerged as an important research topic due to its potential application to both image understanding and web image search. This paper presents a model that integrates visual topics and regional contexts for automatic image annotation. Regional contexts model the relationships between the regions, while visual topics provide the global distribution of topics over an image. Previous image annotation methods neglected the relationships between the regions in an image, yet these regions express the image semantics, so considering the relationships between them helps in annotating images. Regional contexts and visual topics are learned by PLSA (Probabilistic Latent Semantic Analysis) from the training data. The proposed model combines these two types of information through an MCDM (Multi Criteria Decision Making) approach based on the WSM (Weighted Sum Method). Experiments conducted on the 5k Corel dataset demonstrate the effectiveness of the proposed model.
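The Weighted Sum Method at the heart of the MCDM step is straightforward to sketch: each candidate tag gets one score per criterion, and the criteria are blended with fixed weights. The weights and scores below are invented toy values, not learned PLSA outputs.

```python
def weighted_sum(scores_by_criterion, weights):
    """Weighted Sum Method: combine per-criterion scores for each
    candidate tag into a single ranking score."""
    n = len(scores_by_criterion[0])
    return [sum(w * s[i] for w, s in zip(weights, scores_by_criterion))
            for i in range(n)]

# Two criteria (regional context, visual topic) scoring 3 candidate tags.
regional = [0.6, 0.2, 0.9]
topical = [0.5, 0.8, 0.4]
combined = weighted_sum([regional, topical], [0.7, 0.3])
print(combined)  # tag 2 wins overall
```

WSM assumes the criteria are on commensurable scales; in practice the per-criterion scores are normalised before being summed.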
Speeded-up and Compact Visual Codebook for Object RecognitionCSCJournals
The well-known framework in the object recognition literature uses local information extracted at several patches in images, which are then clustered by a suitable clustering technique. A visual codebook maps the patch-based descriptors into a fixed-length vector in histogram space, to which standard classifiers can be directly applied. The construction of a codebook is thus an important step, usually done by cluster analysis, yet it remains difficult to construct a compact codebook with reduced computational cost. This paper evaluates the effectiveness and generalisation performance of the Resource-Allocating Codebook (RAC) approach, which avoids fixed-size codebooks: the codebook can be used at any time in the learning process, and the learning patterns do not have to be revisited. RAC either allocates a new codeword based on the novelty of a newly seen pattern or adapts the codebook to fit that observation. Furthermore, we improve RAC to yield codebooks that are more compact. We compare and contrast the recognition performance of RAC with two distinctive feature descriptors, SIFT and SURF, and two clustering techniques, K-means and Fast Reciprocal Nearest Neighbours (fast-RNN). An SVM is used to classify the image signatures. The entire visual object recognition pipeline has been tested on three benchmark datasets: the PASCAL Visual Object Classes challenge 2007, UIUC texture, and MPEG-7 Part-B silhouette image datasets. Experimental results show that RAC is suitable for constructing codebooks due to its wider span of the feature space. Moreover, RAC takes only one pass through the entire data and slightly outperforms traditional approaches at drastically reduced computing times. The modified RAC performs slightly better still and gives a more compact codebook.
Future research should focus on designing more discriminative and compact codebooks such as RAC rather than on methods tuned solely to achieve high classification performance.
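The allocate-or-adapt rule that defines RAC can be sketched in a single pass, as below. The novelty radius, the learning rate of 0.1, and the 2-D toy descriptors are assumptions for illustration; real RAC operates on SIFT/SURF descriptors with tuned parameters.

```python
def build_codebook(descriptors, radius):
    """Resource-Allocating Codebook sketch: one pass over the data.
    Allocate a new codeword when the nearest existing one is farther
    than `radius`; otherwise nudge that codeword toward the observation."""
    codebook = []
    for d in descriptors:
        if not codebook:
            codebook.append(list(d))
            continue
        dists = [sum((a - b) ** 2 for a, b in zip(d, c)) ** 0.5 for c in codebook]
        j = dists.index(min(dists))
        if dists[j] > radius:
            codebook.append(list(d))  # novel pattern: allocate a new codeword
        else:
            # familiar pattern: adapt the winner (learning rate 0.1 assumed)
            codebook[j] = [c + 0.1 * (a - c) for a, c in zip(d, codebook[j])]
    return codebook

data = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 4.9)]
print(len(build_codebook(data, radius=1.0)))  # 2 — one codeword per cluster
```

The one-pass property is what gives RAC its speed advantage over K-means, which must iterate over the whole dataset repeatedly.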
Dominating set based arbitrary oriented bilingual scene text localizationIJECEIAES
Localizing and recognizing arbitrarily oriented text in natural scene images is a major challenge, because scene texts are often erratic in shape. This paper presents a simple and effective graph-representation algorithm for detecting arbitrarily oriented text locations, chosen for its high impact and simplicity of representation, to smooth the text recognition process. Arbitrarily oriented text can be horizontal, vertical, perspective, curved (diagonal/off-diagonal), or a combination of these. As a pre-processing step, image enhancement is performed in the frequency domain to obtain image representations that are invariant to intensity. To extract text regions, bounding boxes are drawn for each candidate character in the scene images; this step uses the region-based approach of maximally stable extremal regions. A typical problem with curved text localization is that non-text objects may occur within localized text regions; our method is the first in the literature to search for dominating sets to solve this problem. The dominating set method outperforms several traditional methods, including deep learning methods for arbitrary text localization, on challenging datasets such as the 13th International Conference on Document Analysis and Recognition (ICDAR 2015), the Multi-script Robust Reading Competition (MRRC), CurvedText 80 (CUTE80), and Arbitrary Text (ArT).
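To make the dominating-set idea concrete, the sketch below computes a dominating set greedily on a toy component graph (vertices = candidate character components, edges = spatial adjacency). The greedy heuristic and the toy graph are illustrative assumptions; the paper's exact graph construction and search differ.

```python
def greedy_dominating_set(adj):
    """Greedy dominating set: repeatedly pick the vertex that dominates
    the most still-uncovered vertices. `adj` maps vertex -> neighbour list.
    Every vertex ends up either chosen or adjacent to a chosen vertex."""
    uncovered = set(adj)
    dom = set()
    while uncovered:
        best = max(adj, key=lambda v: len(({v} | set(adj[v])) & uncovered))
        dom.add(best)
        uncovered -= {best} | set(adj[best])
    return dom

# Toy component graph: component 0 touches all others (a star).
graph = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(greedy_dominating_set(graph))  # {0}
```

Finding a minimum dominating set is NP-hard in general, so greedy selection is the standard practical approximation on component graphs of this size.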
ROBUST TEXT DETECTION AND EXTRACTION IN NATURAL SCENE IMAGES USING CONDITIONA...ijiert bestjournal
In Natural Scene Image,Text detection is important tasks which are used for many content based image analysis. A maximally stable external region based method is us ed for scene detection .This MSER based method incl udes stages character candidate extraction,text candida te construction,text candidate elimination & text candidate classification. Main limitations of this method are how to detect highly blurred text in low resolutio n natural scene images. The current technology not focuses on any t ext extraction method. In proposed system a Conditi onal Random field (CRF) model is used to assign candidat e component as one of the two classes (text& Non Te xt) by Considering both unary component properties and bin ary contextual component relationship. For this pur pose we are using connected component analysis method. The proposed system also performs a text extraction usi ng OCR
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...CSCJournals
The traditional approach for solving the object recognition problem requires image representations to be first extracted and then fed to a learning model such as an SVM. These representations are handcrafted and heavily engineered by running the object image through a sequence of pipeline steps which requires a good prior knowledge of the problem domain in order to engineer these representations. Moreover, since the classification is done in a separate step, the resultant handcrafted representations are not tuned by the learning model which prevents it from learning complex representations that might would give it more discriminative power. However, in end-to-end deep learning models, image representations along with the classification decision boundary are all learnt directly from the raw data requiring no prior knowledge of the problem domain. These models deeply learn the object image representation hierarchically in multiple layers corresponding to multiple levels of abstraction resulting in representations that are more discriminative and give better results on challenging benchmarks. In contrast to the traditional handcrafted representations, the performance of deep representations improves with the introduction of more data, and more learning layers (more depth) and they perform well on large-scale machine learning problems. 
The purpose of this study is six fold: (1) review the literature of the pipeline processes used in the previous state-of-the-art codebook model approach for tackling the problem of generic object recognition, (2) Introduce several enhancements in the local feature extraction and normalization steps of the recognition pipeline, (3) compare the enhancements proposed to different encoding methods and contrast them to previous results, (4) experiment with current state-of-the-art deep model architectures used for object recognition, (5) compare between deep representations extracted from the deep learning model and shallow representations handcrafted through the recognition pipeline, and finally, (6) improve the results further by combining multiple different deep learning models into an ensemble and taking the maximum posterior probability.
Combination of texture feature extraction and forward selection for one-clas...IJECEIAES
This study aims to validate self-portraits using one-class support vector machine (OCSVM). To validate accurately, we build a model by combining texture feature extraction methods, Haralick and local binary pattern (LBP). We also reduce irrelevant features using forward selection (FS). OCSVM was selected because it can solve the problem caused by the inadequate variation of the negative class population. In OCSVM, we only need to feed the algorithm using the true class data, and the data with pattern that does not match will be classified as false. However, combining the two feature extractions produces many features, leading to the curse of dimensionality. The FS method is used to overcome this problem by selecting the best features. From the experiments carried out, the Haralick+LBP+FS+OCSVM model outperformed other models with an accuracy of 95.25% on validation data and 91.75% on test data.
Scene text recognition in mobile applications by character descriptor and str...eSAT Journals
Abstract
Camera-based scene images usually have complex background filled with non-text objects in multiple shapes and colors. The existing system is sensitive to font scale changes and background interference. The main focusof this system is on two character recognition methods. In text detection, previously proposed algorithms are used to search for regions of text strings. Proposed system uses character descriptor which is effective to extract representative and discriminative text features for both recognition schemes. The local features descriptor HOG is compatible with all above key point detectors. Our method of scene text recognition from detected text regions is compatible with the application of mobile devices. Proposedsystem accurately extracts text from natural scene image in presence of background interference.The demo system gives us details of algorithm design and performance improvements of scene text extraction. It is ableto detect text region of text strings from cluttered and recognize characters in the text regions.
Keywords: Scene text detection, scene text recognition, character descriptor, stroke configuration, text understanding, text retrieval.
PERFORMANCE EVALUATION OF FUZZY LOGIC AND BACK PROPAGATION NEURAL NETWORK FOR...ijesajournal
ABSTRACT
Fuzzy c-mean is one of the efficient tools used in character recognition. Back propagation neural network is another powerful that may be used in such field. A comparison between fuzzy c-mean and BP neural network classifiers are presented in this research to obtain the performance of both classifiers. The comparison was based on recognition efficiency; this efficiency was evaluated as the ratio of the number of assigned characters with unknown one to the number of character set related to that character. The fuzzy C-mean and BP neural network algorithms were tested on a set of hand written and machine printed dataset named Chars74K dataset using Matlab (2016 b) programming language and the result was that neural network classifier gave 82% recognition efficiency while fuzzy c –mean gave 78%. Neural network classifier is more superior than fuzzy C-mean in recognition due to the limitations of processing time of fuzzy C-mean that requires smaller image size and eventually this will cause less efficiency.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Recognition of Persian handwritten characters has been considered as a significant field of research for
the last few years under pattern analysing technique. In this paper, a new approach for robust handwritten
Persian numerals recognition using strong feature set and a classifier fusion method is scrutinized to
increase the recognition percentage. For implementing the classifier fusion technique, we have considered
k nearest neighbour (KNN), linear classifier (LC) and support vector machine (SVM) classifiers. The
innovation of this tactic is to attain better precision with few features using classifier fusion method. For
evaluation of the proposed method we considered a Persian numerals database with 20,000 handwritten
samples. Spending 15,000 samples for training stage, we verified our technique on other 5,000 samples,
and the correct recognition ratio achievedapproximately 99.90%. Additional, we got 99.97% exactness
using four-fold cross validation procedure on 20,000 databases.
Introduction to image processing and pattern recognitionSaibee Alam
this power point presentation provides a brief introduction to image processing and pattern recognition and its related research papers including conclusion
Multilinear Kernel Mapping for Feature Dimension Reduction in Content Based M...ijma
In the process of content-based multimedia retrieval, multimedia information is processed in order to
obtain descriptive features. Descriptive representation of features, results in a huge feature count, which
results in processing overhead. To reduce this descriptive feature overhead, various approaches have been
used to dimensional reduction, among which PCA and LDA are the most used methods. However, these
methods do not reflect the significance of feature content in terms of inter-relation among all dataset
features. To achieve a dimension reduction based on histogram transformation, features with low
significance can be eliminated. In this paper, we propose a feature dimensional reduction approaches to
achieve the dimension reduction approach based on a multi-linear kernel (MLK) modeling. A benchmark
dataset for the experimental work is taken and the proposed work is observed to be improved in analysis in
comparison to the conventional system.
K2 Algorithm-based Text Detection with An Adaptive Classifier Threshold

Khalid Iqbal, Xu-Cheng Yin, Hong-Wei Hao, Sohail Asghar & Hazrat Ali
International Journal of Image Processing (IJIP), Volume (8) : Issue (3) : 2014
Khalid Iqbal kik.ustb@gmail.com
Dept. of Computer Science and Technology,
School of Computer and Communication Engineering,
University of Science and Technology Beijing,
Beijing 100083, China.
Xu-Cheng Yin xuchengyin@ustb.edu.cn
Dept. of Computer Science and Technology,
School of Computer and Communication Engineering,
University of Science and Technology Beijing,
Beijing 100083, China.
Hong-Wei Hao hongwei.hao@ia.ac.cn
Institute of Automation
Chinese Academy of Sciences,
Beijing 100190, China.
Sohail Asghar sohail.asghar@uaar.edu.pk
University Institute of Information,
PMAS-Arid Agriculture University,
Rawalpindi, Pakistan
Hazrat Ali engr.hazratali@yahoo.com
Department of Computing,
City University London,
United Kingdom.
Abstract
In natural scene images, text detection is a challenging problem for diverse content-based
image analysis tasks. In this paper, Bayesian network scores are used to classify candidate
character regions by computing posterior probabilities, which in turn define an adaptive
threshold for accurate text detection in scene images. Candidate character regions are first
extracted as maximally stable extremal regions. K2 algorithm-based Bayesian network scores
are then learned by evaluating dependencies among the features of each candidate character
region, and a Bayesian logistic regression classifier is trained to compute the posterior
probabilities that define the adaptive classifier threshold. Candidate character regions falling
below this threshold are discarded as non-character regions. Finally, text regions are detected
using an effective text localization scheme based on geometric features. The entire system is
evaluated on the ICDAR 2013 competition database. Experimental results show competitive
performance (precision, recall and harmonic mean) against recently published algorithms.
Keywords: Bayesian Network, Adaptive Threshold, Bayesian Logistic Regression, Scene Image.
1. INTRODUCTION
With the advent of digital cameras at the consumer end, new challenges have emerged in
processing and analyzing image content. Image content captured with a camera may suffer
several degradations, such as distorted views, uneven illumination, curvature or shadow effects. However,
image content can carry useful information about vehicle license plates, landmark recognition,
gas/electricity meters and product recognition. Exploiting image content constitutes a
challenging research area of detecting and recognizing scene text, and the information
extracted from scene images must be robustly located. Determining the location of text in
scene images, under complex backgrounds, varying illumination, variable fonts and sizes, and
arbitrary text orientation, is referred to as text localization.
Text localization techniques can be grouped into region-based, connected component (CC)-
based [1] and hybrid methods [2]. Region-based techniques employ a sliding window to search
for image text, using machine learning techniques for text identification. Sliding-window-based
methods tend to be slow due to multi-scale processing of images. One text detection algorithm
extracts six dissimilar classes of text features and uses a Modest AdaBoost classifier to
recognize text regions based on them [3]. CC-based methods group extracted candidate
characters with similar geometric features into text regions; they demand additional checks to
eliminate false positives. To find CCs, the stroke width of every pixel is computed to group
neighboring pixels; these CCs are then screened and grouped into text regions [4]. Pan et al.
[2] proposed a hybrid method that exploits image regions to detect text candidates and
extracts CCs as candidate characters by local binarization. False-positive components are
eliminated efficiently using a conditional random field (CRF) model, and character components
are finally grouped into lines/words. Recently, Yin et al. [5] extracted maximally stable extremal
regions (MSERs) as letter candidates, eliminated non-letter candidates using geometric
information, and constructed candidate text regions by grouping similar letter candidates with a
disjoint-set structure. For each candidate text region, vertical and horizontal variances, color,
geometry and stroke width are extracted to identify text regions using an AdaBoost classifier.
Notably, an MSER-based method won the ICDAR 2011 Robust Reading Competition [6] with
promising performance.
In this paper, we propose a new robust and accurate MSER-based approach to localize text in
scene images. First, MSERs are extracted as candidate characters, and some non-character
candidates are pruned based on their total number of pixels. Second, Bayesian network scores
for the features of each candidate character are obtained using the K2 algorithm [7]. Third, a
Bayesian Logistic Regression Classifier (BLRC) [8, 9] is built from these score-based features
of candidate characters and used to identify text regions; a posterior-probability-based
adaptive threshold is then used to filter out non-character candidates. To use the ICDAR 2013
[10] competition dataset, candidate characters are grouped into words using an effective text
localization approach [5]. The output of each step is shown in Figure 1. Our method is tested
on the ICDAR 2013 test dataset, and the experimental results show remarkable performance in
terms of accuracy.
The remainder of this paper is organized as follows. Section II explains the text localization
method: candidate character extraction using MSER, Bayesian network score learning, the
Bayesian logistic regression classifier with an adaptive threshold, and the grouping of
candidate character regions. Section III describes the comparative experimental results of
different text localization algorithms. The last section concludes the paper.
2. K2 SCORES AND ADAPTIVE THRESHOLD BASED TEXT DETECTION METHOD
The text localization method is divided into four parts: candidate character extraction using
MSER, computation of Bayesian network scores for each candidate character, classification of
candidate characters using these scores with an adaptive threshold, and grouping of candidate
characters while eliminating false positives.
2.1 Candidate Character Extraction with MSER
Digital camera-based images may contain distortions [11]; motivated by [6, 12], we localize
text in scene images by extracting MSERs as candidate character regions. MSER was
proposed by Matas et al. [13] to find correspondences between different viewpoints of an
image, and it is a well-known robust method against scale and lighting variation [14]. Each
extracted MSER is resized (f × f) and binarized using an adaptive threshold. Consider an
image I and a set of resized extracted MSERs E_rc, referred to as candidate character
regions, with f features each. Binarization is performed to eliminate extra pixels of candidate
characters using the adaptive threshold given in equation 1. An additional check on the
number of pixels of each candidate character region filters out non-character regions; the
purpose of this additional constraint on the candidate binary character region is to minimize
the processing time of learning Bayesian network scores with the K2 algorithm [7].
\tau = \frac{1}{k \cdot k} \sum_{r=1}^{k} \sum_{c=1}^{k} CCR_{rc} \qquad (1)
where CCR_rc represents the candidate character region before binarization, and k = 25 is the number of features.
CBR = (CCR > \tau) \qquad (2)
where CBR represents the candidate binary character region. A candidate binary character
region is discarded as non-character if it has 30 pixels or fewer, or 600 pixels or more.
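As a concrete illustration, the adaptive binarization of equations 1 and 2 together with the pixel-count check can be sketched as follows. This is a minimal pure-Python sketch; the toy region, the function name and the variable names are illustrative assumptions, not the authors' code.

```python
import random

def binarize_candidate(region, min_px=30, max_px=600):
    """Binarize a resized candidate character region (Eqs. 1 and 2).

    tau is the mean intensity of the resized region; pixels above tau
    form the candidate binary region (CBR).  Regions whose foreground
    pixel count is <= min_px or >= max_px are rejected as non-characters,
    as in Section 2.1.
    """
    pixels = [p for row in region for p in row]
    tau = sum(pixels) / len(pixels)                     # Eq. 1: mean threshold
    cbr = [[p > tau for p in row] for row in region]    # Eq. 2: binarization
    count = sum(p for row in cbr for p in row)
    if count <= min_px or count >= max_px:              # pixel-count check
        return None                                     # non-character region
    return cbr

# Toy 25 x 25 "region": a bright vertical stroke on a dark background.
random.seed(0)
region = [[random.uniform(0, 60) for _ in range(25)] for _ in range(25)]
for r in range(5, 20):
    for c in range(10, 14):
        region[r][c] = 200.0                            # 60 stroke pixels
cbr = binarize_candidate(region)
count = sum(p for row in cbr for p in row)
print(cbr is not None, count)
```

The stroke pixels always exceed the mean threshold, so the region passes the 30–600 pixel check and survives as a candidate character.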
2.2 K2 Algorithm-based Bayesian Network Scores
A Bayesian network is an extensively used model for data analysis. Bayesian networks [7]
exploit a set of features of the candidate binary character region of an image to measure
probabilistic relationships among features graphically. Determining the optimal Bayesian
network structure for given data is an NP-hard problem [15]. The K2 algorithm [7] is one of the
most widely used heuristic solutions for learning structural dependencies among features
according to a specified order. The first feature in a given order has no parents; K2 then
incrementally adds parents to the current feature whenever doing so increases the score of
the resulting structure, and stops adding parents when no remaining predecessor can increase
the score. As the ordering of features is known beforehand, the search space is minimized,
making the algorithm efficient, and the parents of each feature can be chosen independently
[16]. Consider F = {f1, f2, f3, …, fk}, the features of a candidate binary character region of a
scene image, ordered from left to right for simplicity. The score of the i-th feature is given by
the K2 scoring function in equation 3. The final score of all features in a network structure is
obtained by multiplying the individual scores of the features of the candidate binary character
region.
g(i, \pi_i) = \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} \alpha_{ijk}! \qquad (3)
where g(i, \pi_i) represents the Bayesian network score under the K2 algorithm for each
feature of a candidate binary character region, q_i is the number of parent configurations, r_i
the number of states of feature i, \alpha_{ijk} the corresponding counts and N_{ij} = \sum_k \alpha_{ijk}.
For i = 1 the feature has no parents and a zero score; therefore, we omit the first feature of
each candidate binary character region in all ICDAR 2013 test images. Omitting the initial
feature allows a point-to-point product of all Bayesian network K2 scores, as given in
equation 4.
K2_{score} = F_{i-1} \circ \left[ g(i-1, \pi_{i-1}) \right]^{T} \qquad (4)
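The K2 scoring function of equation 3 is, up to notation, the Cooper-Herskovits score; it can be sketched in log-space (factorials via `lgamma` to avoid overflow). The data layout, variable names and the toy example below are our illustrative assumptions, not the paper's implementation.

```python
import math
from itertools import product

def k2_log_score(data, i, parents, r):
    """Log of the K2 scoring function g(i, pi_i) of Eq. 3.

    data    : list of samples, each a tuple of discrete feature values
    i       : index of the feature being scored
    parents : indices of its parent features (pi_i)
    r       : r[f] = number of states of feature f
    """
    score = 0.0
    # q_i parent configurations: Cartesian product of the parents' states
    # (product() of zero iterables yields one empty configuration).
    for config in product(*(range(r[p]) for p in parents)):
        # alpha_ijk: count of samples with feature i in state k under
        # parent configuration j; N_ij is their sum.
        alpha = [0] * r[i]
        for row in data:
            if all(row[p] == c for p, c in zip(parents, config)):
                alpha[row[i]] += 1
        n_ij = sum(alpha)
        # log[(r_i - 1)! / (N_ij + r_i - 1)!] + sum_k log(alpha_ijk!)
        score += math.lgamma(r[i]) - math.lgamma(n_ij + r[i])
        score += sum(math.lgamma(a + 1) for a in alpha)
    return score

# Toy binary data: feature 1 copies feature 0, so 0 is a good parent for 1.
data = [(0, 0), (0, 0), (1, 1), (1, 1), (0, 0), (1, 1)]
with_parent = k2_log_score(data, 1, [0], [2, 2])
no_parent = k2_log_score(data, 1, [], [2, 2])
print(with_parent > no_parent)
```

As expected, the score with the informative parent exceeds the parentless score, which is exactly the comparison K2 uses when deciding whether to add a parent.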
2.3 Classification of Bayesian Network Scores
Bayesian logistic regression models the relationship between Bayesian network scores and
automatically labeled candidate character regions, with the statistical analysis carried out
under Bayes' rule. In pattern recognition it can be applied to classify the Bayesian network
scores of candidate character regions into two or more classes. Its most important advantage
is strong classification accuracy while estimating a probabilistic relationship between Bayesian
network scores and labeled candidate character regions.
Let L \in \{0, 1\} be the labels of the Bayesian network scores of candidate character regions
R. The log-likelihood ratio \ln\left( P(L=1 \mid R, \omega) / P(L=0 \mid R, \omega) \right) is
assumed to be linear in R, so that the conditional likelihood for L = 1 is given by the sigmoid
function P(L=1 \mid R, \omega) = 1/(1 + e^{-\omega^{T} R}) = \delta(\omega^{T} R). Similarly,
P(L=0 \mid R, \omega) = 1 - P(L=1 \mid R, \omega) = 1/(1 + e^{\omega^{T} R}), so that
P(L \mid R, \omega) = \delta(L\, \omega^{T} R).
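The sigmoid likelihood above can be sketched directly; the weight and score vectors below are invented for illustration.

```python
import math

def sigmoid(z):
    """delta(z) = 1 / (1 + e^{-z}), the logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-z))

def likelihood(label, w, r):
    """P(L | R, w) for a single region with score vector r.

    P(L=1 | R, w) = sigmoid(w^T r);
    P(L=0 | R, w) = 1 - P(L=1 | R, w) = sigmoid(-w^T r).
    """
    z = sum(wi * ri for wi, ri in zip(w, r))
    return sigmoid(z) if label == 1 else sigmoid(-z)

w = [0.8, -0.4, 0.1]        # hypothetical classifier weights
r = [1.2, 0.5, 3.0]         # hypothetical K2 score vector for one region
p1, p0 = likelihood(1, w, r), likelihood(0, w, r)
print(round(p1 + p0, 10))
```

The two class likelihoods sum to one, as the complementary sigmoid forms require.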
Let the training set \tau be composed of the Bayesian network scores of an image's candidate
character regions R and their labels L, i.e. \tau = \{L, R\}, the input to the Bayesian logistic
regression classifier. The conditional probability P(\omega \mid \tau) and the prior P(\omega)
are used to compute the posterior probability. Since the sigmoid likelihood does not admit a
conjugate-exponential prior with an analytic posterior expression, a quadratic approximation in
\omega is used, giving a conjugate Gaussian prior parameterized by the hyper-parameter
\alpha. This prior and the conjugate Gamma distribution modeling \alpha are given by
equations (5) and (6).
P(ω | α) = (α / 2π)^(1/2) e^(−(α/2) ω^T ω)   (5)

P(α) = (b_0^(a_0) α^(a_0 − 1) e^(−b_0 α)) / Γ(a_0)   (6)
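For concreteness, the two densities can be evaluated numerically. The sketch below generalizes equation (5) to a d-dimensional weight vector (normalizer (α/2π)^(d/2)); the hyper-parameter values used in the comments are hypothetical:

```python
import math

def gaussian_prior(omega, alpha):
    """Equation (5): zero-mean Gaussian prior over the weights omega
    with precision alpha, generalized to d = len(omega) dimensions."""
    d = len(omega)
    sq = sum(w * w for w in omega)  # omega^T omega
    return (alpha / (2.0 * math.pi)) ** (d / 2.0) * math.exp(-0.5 * alpha * sq)

def gamma_hyperprior(alpha, a0, b0):
    """Equation (6): Gamma distribution over the hyper-parameter alpha
    with shape a0 and rate b0."""
    return (b0 ** a0) * alpha ** (a0 - 1) * math.exp(-b0 * alpha) / math.gamma(a0)

# With a0 = b0 = 1 the Gamma density reduces to exp(-alpha):
p = gamma_hyperprior(2.0, a0=1.0, b0=1.0)
```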
FIGURE 1: Text Detected in ICDAR 2013 Scene Images.
2.4 Posterior Probability-based Adaptive Threshold
To filter non-character regions, an adaptive threshold is computed from the posterior probabilities produced by Bayesian Logistic Regression in the previous subsection. For simplicity, we use the geometric mean and standard deviation of P(ω | α) to filter non-character candidate regions. The adaptive threshold is computed by equation (7).
T = ( ∏_(i=1)^n P(ω_i | α) )^(1/n) − 3 · √( n ∑_(i=1)^n P(ω_i | α)² − ( ∑_(i=1)^n P(ω_i | α) )² ) / n   (7)
All candidate character regions of an image, classified through their Bayesian network scores, are eliminated if their posterior probability is less than or equal to the adaptive threshold value T.
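A minimal sketch of this thresholding step (geometric mean minus three population standard deviations), assuming the per-region posteriors are available as a list; the function names are illustrative:

```python
import math

def adaptive_threshold(posteriors):
    """Equation (7): geometric mean of the posterior probabilities
    minus three (population) standard deviations."""
    n = len(posteriors)
    geo_mean = math.prod(posteriors) ** (1.0 / n)
    s1 = sum(posteriors)
    s2 = sum(p * p for p in posteriors)
    std = math.sqrt(n * s2 - s1 * s1) / n
    return geo_mean - 3.0 * std

def filter_regions(regions, posteriors):
    """Discard regions whose posterior is <= the adaptive threshold."""
    t = adaptive_threshold(posteriors)
    return [r for r, p in zip(regions, posteriors) if p > t]
```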
2.5 Candidate Character Regions Filtering and Grouping
The aspect ratio [12] is a cascaded filter that eliminates candidate regions that are too wide or too narrow, where Wi is the width and Hi the height of a candidate region. In this way, candidate character regions are further reduced after the adaptive threshold.
l_min < Wi / Hi < l_max   (8)
In all our experiments, the adaptive thresholds are set empirically, while the aspect ratio thresholds (l_min = 0.1 and l_max = 10) are fixed empirically throughout. Candidate character regions are then grouped using the effective text localization method proposed by Yin et al. [5]. This technique constructs candidate regions using geometric information and groups similar candidate character regions using disjoint sets. In addition, vertical and horizontal variances, color, geometry and stroke width features of each candidate character region are extracted to build an AdaBoost classifier for identifying text regions. In contrast to [5], we use Bayesian network based scores and a Bayesian Logistic Regression classifier with additional adaptive and fixed constraints. The output text regions after grouping are presented in Figure 1.
3. EXPERIMENTAL RESULTS
The Robust Reading Competition was organized at ICDAR 2013 [6, 10, 17] with the objective of benchmarking the performance of different algorithms. The ICDAR 2013 evaluation system reports object-level precision and recall of detection algorithms according to their detection quality. Our text localization method was evaluated on the test dataset of the ICDAR 2013 Robust Reading Competition Challenge 2 Task 1 [10]. The dataset consists of 233 images containing text of different fonts and colors over different backgrounds.
3.1 Performance Evaluation of Text Localization System
The ICDAR 2013 dataset served as the benchmark in the most recent text detection competition. To provide a baseline comparison, we tested our method on the publicly available dataset in [10]. The output of an algorithm is a set of rectangles designating bounding boxes for detected words in scene images, referred to as estimates, as shown in Figure 1. A set of ground truth boxes, referred to as targets, is provided with the dataset. The match between an estimate and a target is defined as the area of intersection of the two rectangles divided by the area of the minimum bounding box containing both. Identical rectangles have a match of one, and rectangles with no intersection have a match of zero. The closest match between estimated and targeted rectangles is then found to measure performance. For this purpose, precision, recall and harmonic mean are computed over the matched intersecting rectangles. Precision is the ratio of the number of matched detected rectangles to the total number of matched and unmatched detected rectangles. With M the matched detected rectangles and U the unmatched detected rectangles, precision is defined by equation (9).
Precision = M / (M + U)   (9)
Similarly, recall relates the matched detected rectangles to the targeted rectangles. With M the matched and T the targeted rectangles, recall is defined by equation (10).
Recall = M / (M + T)   (10)
Finally, the F-measure is defined as the harmonic mean of precision and recall, as given by equation (11).

F-Measure = 2 / (1/Precision + 1/Recall)   (11)
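These measures can be sketched as follows, assuming axis-aligned rectangles given as (x1, y1, x2, y2) tuples and following equations (9)-(11) exactly as written above (including the M + T denominator of equation (10)); all function names are illustrative:

```python
def match_score(est, tgt):
    """Match of two axis-aligned rectangles (x1, y1, x2, y2): intersection
    area divided by the area of the minimum box containing both."""
    ix = max(0, min(est[2], tgt[2]) - max(est[0], tgt[0]))  # overlap width
    iy = max(0, min(est[3], tgt[3]) - max(est[1], tgt[1]))  # overlap height
    bx = max(est[2], tgt[2]) - min(est[0], tgt[0])          # bounding width
    by = max(est[3], tgt[3]) - min(est[1], tgt[1])          # bounding height
    return (ix * iy) / (bx * by)

def precision(matched, unmatched):
    return matched / (matched + unmatched)   # equation (9)

def recall(matched, targets):
    return matched / (matched + targets)     # equation (10)

def harmonic_mean(p, r):
    return 2.0 / (1.0 / p + 1.0 / r)         # equation (11)

# Identical rectangles match perfectly; disjoint ones not at all:
assert match_score((0, 0, 2, 2), (0, 0, 2, 2)) == 1.0
assert match_score((0, 0, 1, 1), (2, 2, 3, 3)) == 0.0
```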
The performance of our localization method alongside some of the published algorithms in the ICDAR 2013 competition is presented in Table 1. Our text localization method achieves a recall of 62.37, a precision of 84.97 and a harmonic mean of 71.94, which is competitive with the leading methods reported in [10]. Moreover, our text localization method performs better than the ICDAR 2011 Robust Reading Competition methods reported in [6].
Method Recall Precision HMean
USTB_TexStar [27] 66.45 88.47 75.89
Text Spotter [20], [21], [22] 64.84 87.51 74.49
CASIA_NLPR [23], [24] 68.24 78.89 73.18
Text_Detector_CASIA [25], [26] 62.85 84.70 72.16
Our Method (LocaTeXT) 62.37 84.97 71.94
I2R NUS FAR 69.00 75.08 71.91
I2R NUS 66.17 72.54 69.21
TH-TextLoc 65.19 69.96 67.49
Text Detection [18], [19] 53.42 74.15 62.10
Baseline 34.74 60.76 44.21
Inkam 35.27 31.20 33.11
TABLE 1: Performance (%) Comparison of Text Detection Methods on ICDAR 2013 Dataset.
4. REFERENCES
[1] Y.-F. Pan, X. Hou, and C.-L. Liu. A hybrid approach to detect and localize texts in natural
scene images. IEEE Transactions on Image Processing, 20(3):800 –813, March 2011.
[2] Y.-F. Pan, X. Hou, and C.-L. Liu. "A hybrid approach to detect and localize texts in natural scene images." IEEE Transactions on Image Processing, 20(3):800-813, 2011.
[3] J.-J. Lee, P.-H. Lee, S.-W. Lee, A. Yuille, and C. Koch. Adaboost for text detection in natural
scene. In ICDAR 2011, pages 429 –434, September 2011.
[4] B. Epshtein, E. Ofek, and Y. Wexler. Detecting text in natural scenes with stroke width
transform. In CVPR 2010, pages 2963 –2970, June 2010.
[5] X. Yin, X-C Yin, H-W. Hao, and K. Iqbal. "Effective text localization in natural scene images
with MSER, geometry-based grouping and AdaBoost." In Pattern Recognition (ICPR), 2012
21st International Conference on, pp. 725-728. IEEE, 2012.
[6] A. Shahab, F. Shafait, and A. Dengel. ICDAR 2011 robust reading competition challenge 2:
Reading text in scene images. In ICDAR 2011, pages 1491 –1496, September 2011.
[7] G. F. Cooper and E. Herskovits. A Bayesian method for the induction of probabilistic networks from data. Machine Learning, vol. 9, no. 4, pp. 309-347, 1992.
[8] K. Iqbal, X.-C. Yin, H.-W. Hao, X. Yin, H. Ali, (2013), Classifier Comparison for MSER-Based
Text Classification in Scene Images, In Neural Networks (IJCNN), The 2013 International
Joint Conference on (pp. 1-6). IEEE.
[9] Hosmer, David W.; Lemeshow, Stanley (2000). “Applied Logistic Regression” (2nd ed.).
Wiley. ISBN 0-471-35632-8.
[10] D. Karatzas, F. Shafait, S. Uchida, M. Iwamura, S. R. Mestre, J. Mas, D. F. Mota, J. A.
Almazan, and L. P. de las Heras. "ICDAR 2013 Robust Reading Competition." In Document
Analysis and Recognition (ICDAR), 2013 12th International Conference on, pp. 1484-1493.
IEEE, 2013.
[11] X.-C. Yin, H.-W. Hao, J. Sun, and S. Naoi. Robust vanishing point detection for MobileCam-based documents. In ICDAR 2011, pages 136-140, September 2011.
[12] C. Merino-Gracia, K. Lenc, M. Mirmehdi, A head-mounted device for recognizing text in
natural scenes, Camera-Based Document Analysis and Recognition, pages 29--41, 2012.
[13] J. Matas, O. Chum, M. Urban, and T. Pajdla. Robust wide baseline stereo from maximally
stable extremal regions. In British Machine Vision Conference 2002, volume 1, pages 384–
393, 2002.
[14] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir,
and L. V. Gool. A Comparison of Affine Region Detectors. International Journal of Computer
Vision, 65(1):43–72, November 2005.
[15] D. M. Chickering. Learning Bayesian networks is NP-complete. In D. Fisher and H. J. Lenz, editors, Learning from Data: Artificial Intelligence and Statistics V, Springer-Verlag, pp. 121-130, 1996.
[16] N. Friedman, D. Koller, Being Bayesian About Network Structure: A Bayesian Approach to
Structure Discovery in Bayesian Networks, Machine Learning 50 (1-2) (2003) 95-125.
[17] D. Karatzas, S. Robles Mestre, J. Mas, F. Nourbakhsh, P. Pratim Roy , "ICDAR 2011 Robust
Reading Competition - Challenge 1: Reading Text in Born-Digital Images (Web and Email)",
In Proc. 11th International Conference of Document Analysis and Recognition, 2011, IEEE
CPS, pp. 1485-1490.
[18] J. Fabrizio, B. Marcotegui, and M. Cord, "Text segmentation in natural scenes using toggle-mapping," in Proc. Int. Conf. on Image Processing, 2009.
[19] J. Fabrizio, B. Marcotegui, M. Cord, “Text detection in street level images,” Pattern Analysis
and Applications, 2013, Volume 16, Issue 4, pp 519-533.
[20] L. Neumann and J. Matas, “A method for text localization and recognition in real-world
images,” in Proc. Asian Conf. on Computer Vision, 2010, pp. 2067–2078.
[21] L. Neumann, and J. Matas. Real-time scene text localization and recognition. in Computer
Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. 2012., pp. 3538–3545.
[22] L. Neumann and J. Matas, "On combining multiple segmentations in scene text recognition," in Proc. International Conference on Document Analysis and Recognition (ICDAR 2013), Washington D.C., 2013.
[23] Y.-M. Zhang, K.-Z. Huang, and C.-L. Liu, “Fast and robust graph-based transductive learning
via minimum tree cut,” in Proc. Int. Conf. on Data Mining, 2011.
[24] B. Bai, F. Yin, and C.-L. Liu, "Scene text localization using gradient local correlation," in Proc. Int. Conf. on Document Analysis and Recognition, 2013.
[25] C. Shi, C. Wang, B. Xiao, Y. Zhang, S. Gao, and Z. Zhang, “Scene text recognition using
part-based tree-structured character detections,” in Proc. Int. Conf. on Computer Vision and
Pattern Recognition, 2013.
[26] C. Shi, C. Wang, B. Xiao, Y. Zhang, and S. Gao, “Scene text detection using graph model
built upon maximally stable extremal regions,” Pattern Recognition Letters, vol. 34, no. 2, pp.
107–116, 2013.
[27] X.-C. Yin, X. Yin, K. Huang, and H.-W. Hao. Robust text detection in natural scene images. IEEE Transactions on Pattern Analysis and Machine Intelligence, preprint, 2013.