In scene images, text carries high-level information that helps in analyzing and understanding a particular environment. In this paper, we adapt Mask R-CNN, the mask-region-based convolutional neural network, to enable recognition at three levels: sequence-, holistic-, and pixel-level semantics. In particular, pixel-level and holistic-level semantics can be utilized to recognize texts and to define text shapes, respectively. In the mask and detection stages, we segment and recognize both character and word instances. Furthermore, we implement text detection through instance segmentation on a 2-D feature space. To tackle small and blurry texts, we perform text recognition with an attention-based optical character recognition (OCR) model combined with Mask R-CNN at the sequence level; the OCR module estimates the character sequence from the feature maps of word instances in a sequence-to-sequence fashion. Finally, we propose a fine-grained learning technique that trains a more accurate and robust model from datasets annotated at the word level. Our approach is evaluated on the popular ICDAR 2013 and ICDAR 2015 benchmark datasets.
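The attention-based sequence decoding described above can be illustrated with a minimal numpy sketch of one additive-attention read over a flattened word-instance feature map. All shapes and weight matrices here are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_step(feature_map, query, Wf, Wq, v):
    """One additive-attention read over a flattened word-instance feature map.

    feature_map: (T, D) spatial columns of the ROI feature map
    query:       (H,) current decoder state
    Returns the context vector used to predict the next character.
    """
    scores = np.tanh(feature_map @ Wf + query @ Wq) @ v   # (T,) alignment scores
    alpha = softmax(scores)                               # attention weights
    context = alpha @ feature_map                         # (D,) weighted sum
    return context, alpha

# toy shapes: T=5 spatial positions, D=8 feature dim, H=6 decoder dim, A=4 attention dim
rng = np.random.default_rng(0)
T, D, H, A = 5, 8, 6, 4
ctx, alpha = attention_step(rng.normal(size=(T, D)), rng.normal(size=H),
                            rng.normal(size=(D, A)), rng.normal(size=(H, A)),
                            rng.normal(size=A))
```

At each decoding step the context vector would be fed to a character classifier, repeating until an end-of-sequence symbol.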
IRJET - Text Detection in Natural Scene Images: A Survey (IRJET Journal)
This document summarizes various methods for text detection in natural scene images. It discusses approaches like feature pyramid networks (FPN), Mask R-CNN with pyramid attention networks, and Progressive Scale Expansion Network (PSENet). PSENet uses segmentation to precisely detect text in different shapes and orientations, then expands text instances using progressive scale expansion. The document also provides a literature review of several papers on text detection, compares their network architectures and results on datasets like ICDAR. Overall, it surveys state-of-the-art methods for text detection in natural images to better understand challenges and recent developments in the field.
A Review on Natural Scene Text Understanding for Computer Vision using Machin... (IRJET Journal)
This document reviews various machine learning techniques for natural scene text understanding in computer vision. It discusses how deep learning methods like convolutional neural networks (CNNs) are more accurate than traditional image processing for text detection and recognition in complex, natural images. The document also summarizes several papers that propose different models using techniques like recurrent neural networks (RNNs), feature pyramid networks (FPNs), attention mechanisms, and connectionist temporal classification to detect and recognize text in scene images with challenges like variations in font, orientation, lighting and background complexity.
Optically processed Kannada script realization with Siamese neural network model (IAESIJAI)
Optical character recognition (OCR) is a technology that allows computers to recognize and extract text from images or scanned documents. It is commonly used to convert printed or handwritten text into a machine-readable format. This study presents an OCR system for Kannada characters based on a Siamese neural network (SNN). The SNN, a deep network comprising two identical convolutional neural networks (CNNs), compares scripts and ranks them by dissimilarity; when a lower dissimilarity score is found, the pair is predicted as a character match. The authors use 5 classes of Kannada characters, which are first preprocessed by grayscale conversion to PGM format. These are input directly into the deep convolutional network, which learns from matching and non-matching image pairs between the two CNNs using a contrastive loss function in the Siamese architecture. The proposed OCR system takes much less time and gives more accurate results than a regular CNN. The model can become a powerful tool for identification, particularly where writing styles vary widely or training data is limited.
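The contrastive-loss training described above can be sketched as follows; the margin, decision threshold, and embeddings are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def contrastive_loss(e1, e2, same, margin=1.0):
    """Contrastive loss over a pair of CNN embeddings.

    same=1 pulls matching character images together (squared distance);
    same=0 pushes non-matching pairs at least `margin` apart.
    """
    d = np.linalg.norm(np.asarray(e1, float) - np.asarray(e2, float))
    return same * d ** 2 + (1 - same) * max(margin - d, 0.0) ** 2

def predict_match(e1, e2, threshold=0.5):
    """Lower dissimilarity score -> predicted character match."""
    return np.linalg.norm(np.asarray(e1, float) - np.asarray(e2, float)) < threshold
```

During training, the loss is minimized over pairs of outputs of the twin CNNs; at inference, only the distance between embeddings is needed.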
This document summarizes a research paper that presents a hybrid approach for detecting and localizing color text in natural scene images. The approach uses both region-based and connected component-based methods. In the preprocessing stage, a text region detector detects text regions and generates candidate text components. A conditional random field model combines unary component properties and binary contextual relationships to filter out non-text components. Finally, neighboring text components are grouped into text lines or words using a learning-based energy minimization method. The approach is evaluated on a natural scene image dataset and shows improvements over existing methods.
International Journal of Engineering Research and Development (IJERD Editor)
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture Engineering,
Aerospace Engineering.
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGES (ijnlc)
A large amount of information lies dormant in historical documents and manuscripts, and it would go to waste if not stored in digital form. Searching for relevant information in these scanned images would ideally require converting the document images to text through optical character recognition (OCR). For the indigenous scripts of India, there are very few OCRs that can successfully recognize printed text images of varying quality, size, style, and font. An alternative approach using word spotting can be effective for accessing large collections of document images. We propose a word spotting technique based on codes for matching word images of the Devanagari script. Shape information is utilised to generate integer codes for the words in the document image, and these codes are matched for final retrieval of relevant documents. The technique is illustrated using Marathi document images.
Recognition of Words in Tamil Script Using Neural Network (IJERA Editor)
In this paper, word recognition using a neural network is proposed. The recognition process starts by partitioning the document image into lines, words, and characters, and then capturing the local features of the segmented characters. After classifying the characters, the word image is transformed into a unique code based on character codes. This code ideally describes any form of word, including words with mixed styles and different sizes. The sequence of character codes of the word forms the input pattern, and the word code is the target value of the pattern. A neural network is used to train the word patterns. The trained network is tested with word patterns, which are recognized or unrecognized based on the network error value. Experiments were conducted with a local database to evaluate the performance of the word recognition system, and good accuracy was obtained. This method can be applied to a word recognition system for any language, as the training is based only on the unique codes of the characters and words belonging to that language.
Text Extraction of Colour Images using Mathematical Morphology & HAAR Transform (IOSR Journals)
This document summarizes a research paper that proposes a method to extract text from color images using the Haar discrete wavelet transform (DWT) and mathematical morphology. It begins with an introduction to text extraction challenges and prior approaches. The proposed method detects text edges using the Haar DWT and removes non-text edges with thresholding. Morphological dilation is then used to connect isolated candidate text edges. Mathematical morphology and templates are also used to extract characters from images. The simulation was carried out using MATLAB for image processing.
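The Haar DWT plus morphology pipeline above can be sketched in a few lines of numpy. This is an illustration, not the paper's implementation: the one-level Haar transform below uses averaged row/column sums and differences, and the dilation uses a 4-neighbour (cross-shaped) structuring element.

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2-D Haar DWT: returns (LL, LH, HL, HH) subbands.
    The detail bands (LH, HL, HH) respond strongly at text edges."""
    img = img[: img.shape[0] // 2 * 2, : img.shape[1] // 2 * 2].astype(float)
    lo, hi = (img[0::2] + img[1::2]) / 2, (img[0::2] - img[1::2]) / 2  # row pass
    def col_pass(x):
        return (x[:, 0::2] + x[:, 1::2]) / 2, (x[:, 0::2] - x[:, 1::2]) / 2
    LL, LH = col_pass(lo)
    HL, HH = col_pass(hi)
    return LL, LH, HL, HH

def dilate(mask):
    """4-neighbour binary dilation: connects isolated candidate text edges."""
    out = mask.copy()
    out[1:, :] |= mask[:-1, :]
    out[:-1, :] |= mask[1:, :]
    out[:, 1:] |= mask[:, :-1]
    out[:, :-1] |= mask[:, 1:]
    return out
```

Thresholding the detail-band magnitudes yields candidate text edges, which `dilate` then merges into connected regions.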
Text detection and recognition in scene or natural images has applications in computer vision systems such as registration number plate detection, automatic traffic sign detection, image retrieval, and assistance for visually impaired people. Scene text, however, involves complicated backgrounds, blurred images, partly occluded text, variations in font styles, image noise, and varying illumination. Hence scene text recognition is a difficult computer vision problem. In this paper a connected component method is used to extract text from the background. Horizontal and vertical projection profiles, geometric properties of text, image binarization, and a gap filling method are used to extract text from scene images. A histogram-based threshold is then applied to separate text from the background of the images. Finally the text is extracted from the images.
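The binarization and projection-profile steps can be sketched with numpy; the fixed threshold and the toy page below are illustrative assumptions (dark text on a light background):

```python
import numpy as np

def binarize(gray, t=128):
    """Fixed-threshold binarization: dark text pixels -> 1."""
    return (gray < t).astype(np.uint8)

def projection_profiles(binary):
    """Row and column sums of text pixels; valleys in these profiles
    separate lines (horizontal) and words/characters (vertical)."""
    return binary.sum(axis=1), binary.sum(axis=0)

# toy page: a dark band of "text" on rows 2-3, columns 1-6
page = np.full((6, 8), 255, dtype=np.uint8)
page[2:4, 1:7] = 0
h_profile, v_profile = projection_profiles(binarize(page))
```

Runs of non-zero values in `h_profile` mark text lines; the same idea applied to `v_profile` inside a line marks words and characters.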
Texture features based text extraction from images using DWT and K-means clus... (Divya Gera)
Text extraction from different kinds of images: document, caption, and scene text images. The discrete wavelet transform was used to extract horizontal, vertical, and diagonal features, and k-means clustering was used to cluster the features into text and background clusters. For simple images k = 2 (text and background clusters) worked, while for complex images k = 3 was used (text cluster, complex background, and simple background).
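The k-means step (k = 2, text vs. background) can be sketched in plain numpy; the one-dimensional "detail energy" features below are illustrative stand-ins for the DWT subband features:

```python
import numpy as np

def kmeans(x, k=2, iters=20, seed=0):
    """Plain k-means on feature rows; returns labels and centroids."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # assign each feature row to its nearest centroid
        labels = np.argmin(((x[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        # move each centroid to the mean of its assigned rows
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels, centers

# toy features: text pixels have high detail-coefficient energy, background low
feats = np.array([[9.0], [8.5], [9.2], [0.1], [0.2], [0.05]])
labels, centers = kmeans(feats, k=2)
```

For k = 3, the same loop simply partitions the energies into three bands (text, complex background, simple background).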
COHESIVE MULTI-ORIENTED TEXT DETECTION AND RECOGNITION STRUCTURE IN NATURAL S... (ijdpsjournal)
Scene text recognition has brought various new challenges in recent years. Detecting and recognizing text in scenes entails some of the same problems as document processing, but there are also numerous novel problems to face in recognizing text in natural scene images. Recent research in these areas has shown promise, but much work remains to be done. Most existing techniques have focused on detecting horizontal or near-horizontal texts. In this paper, we propose a new scheme that detects texts of arbitrary orientation in natural scene images. Our algorithm is equipped with two sets of features specially designed to capture the intrinsic characteristics of texts, using MSER regions with the Otsu method. To better evaluate our algorithm and compare it with existing algorithms, we use the existing MSRA dataset, the ICDAR dataset, and our new dataset, which includes various texts in various real-world situations. Experimental results on these standard datasets and the proposed dataset show that our algorithm compares favorably with state-of-the-art algorithms on horizontal texts and achieves significantly improved performance on texts of arbitrary orientations in complex natural scene images.
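The Otsu method mentioned above picks a global threshold by maximizing between-class variance over the grayscale histogram. A minimal numpy sketch (not the authors' code) is:

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: pick the threshold maximizing between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()          # class probabilities
        if w0 == 0 or w1 == 0:
            continue
        m0 = (np.arange(t) * p[:t]).sum() / w0     # class means
        m1 = (np.arange(t, 256) * p[t:]).sum() / w1
        var = w0 * w1 * (m0 - m1) ** 2             # between-class variance
        if var > best_var:
            best_t, best_var = t, var
    return best_t

# toy image with a dark cluster (~10) and a bright cluster (~200)
img = np.array([[10, 12, 11, 200], [11, 10, 201, 199]], dtype=np.uint8)
t = otsu_threshold(img)
```

In an MSER pipeline, such a threshold can binarize candidate regions before the geometric filtering stage.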
1) The document surveys approaches for scene text recognition in images over the past few years. It discusses techniques for text detection and localization, recognition, and end-to-end systems.
2) Traditional techniques included connected component analysis and texture-based methods using filters. Recent deep learning methods apply convolutional neural networks for region proposal, segmentation, and recognition.
3) State-of-the-art approaches use CNN-RNN models to recognize text as character sequences. Future areas of interest include handling more complex scenes and multi-language text in real-time systems.
Because of rapid technology breakthroughs, including multimedia and cell phones, Telugu character recognition (TCR) has recently become a popular study area. Although many studies have focused on offline TCR models, it is still necessary to construct automated and intelligent online TCR models. The construction and validation of a Telugu character dataset using an Inception- and ResNet-based model are presented. The dataset's collection of 645 letters includes 18 Achus, 38 Hallus, 35 Othulu, 34×16 Guninthamulu, and 10 Ankelu. The proposed technique aims to efficiently recognize and identify distinctive Telugu characters online. The model's main pre-processing steps include normalization, smoothing, and interpolation. Improved recognition performance can be attained by using stochastic gradient descent (SGD) to optimize the model's hyperparameters.
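The SGD optimization mentioned above follows the standard update rule w ← w − η·∇L(w). A toy sketch on a quadratic objective (illustrative only; the paper applies SGD to an Inception/ResNet model, not to this toy loss):

```python
import numpy as np

def sgd(grad, w0, lr=0.1, steps=200):
    """Vanilla SGD: repeatedly step against the gradient."""
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# toy objective L(w) = ||w - target||^2 with gradient 2 * (w - target)
target = np.array([3.0, -1.0])
w = sgd(lambda w: 2.0 * (w - target), np.zeros(2))
```

In practice the gradient would come from backpropagation over a mini-batch, and the learning rate itself is one of the tuned hyperparameters.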
CRNN model for text detection and classification from natural scenes (IAESIJAI)
In the emerging field of computer vision, text recognition in natural settings remains a significant challenge due to variables like font, text size, and background complexity. This study introduces a method focusing on the automatic detection and classification of cursive text in multiple languages: English, Hindi, Tamil, and Kannada using a deep convolutional recurrent neural network (CRNN). The architecture combines convolutional neural networks (CNN) and long short-term memory (LSTM) networks for effective spatial and temporal learning. We employed pre-trained CNN models like VGG-16 and ResNet-18 for feature extraction and evaluated their performance. The method outperformed existing techniques, achieving an accuracy of 95.0%, 96.3%, and 96.2% on ICDAR 2015, ICDAR 2017, and a custom dataset (PDT2023), respectively. The findings not only push the boundaries of text detection technology but also offer promising prospects for practical applications.
Script Identification for printed document images at text-line level using DC... (IOSR Journals)
This document summarizes a research paper that proposes a script identification approach for Indian scripts at the text-line level of printed documents. The approach uses visual appearance-based recognition by extracting features from text lines using Discrete Cosine Transform (DCT) and Principal Component Analysis (PCA). These features are classified using a Modified K-Nearest Neighbor algorithm. The method achieves 95% accuracy in recognizing 11 major Indian languages and discusses the importance of script identification in multilingual environments like India for applications such as optical character recognition.
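The DCT-plus-k-NN pipeline can be sketched as follows. This is an illustration under assumptions: square blocks, the PCA step omitted for brevity, and a plain majority-vote k-NN standing in for the paper's modified k-NN classifier.

```python
import numpy as np

def dct2_features(block, keep=4):
    """Low-frequency 2-D DCT-II coefficients of a text-line block,
    flattened into a texture feature vector."""
    n = block.shape[0]
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))  # DCT-II basis
    coeffs = C @ block.astype(float) @ C.T
    return coeffs[:keep, :keep].ravel()

def knn_classify(feat, train_feats, train_labels, k=3):
    """Plain k-nearest-neighbour vote."""
    d = np.linalg.norm(train_feats - feat, axis=1)
    votes = [train_labels[i] for i in np.argsort(d)[:k]]
    return max(set(votes), key=votes.count)

# two synthetic "script" textures: horizontal vs vertical stripes (8x8)
script_a = np.tile(np.array([[0], [255]]), (4, 8))
script_b = np.tile(np.array([[0, 255]]), (8, 4))
train = np.array([dct2_features(script_a), dct2_features(script_b)])
labels = ["script_a", "script_b"]
pred = knn_classify(dct2_features(script_a), train, labels, k=1)
```

Different scripts produce visibly different low-frequency DCT energy patterns, which is what makes this texture feature usable for line-level script identification.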
IMAGE TO TEXT TO SPEECH CONVERSION USING MACHINE LEARNING (IRJET Journal)
This document describes a study on converting images to text to speech using machine learning. The researchers developed a system that uses optical character recognition to extract text from images, then converts the text to speech. They achieved over 99% accuracy on their test dataset of over 1 million images. Their integrated system was able to accurately extract and convert text from various real-world images like street signs and menus. The system has potential to improve accessibility for people with visual impairments by allowing printed information to be converted to audio. Future work includes handling lower quality images and expanding the system to support additional languages and applications.
Scene text recognition in mobile applications by character descriptor and str... (eSAT Journals)
This document presents a method for scene text recognition in mobile applications using character descriptors and structure configuration. It proposes using a character descriptor that combines feature detectors and descriptors to extract text features effectively. It also models character structure using stroke configuration maps derived from character boundaries and skeletons. The method was tested on various datasets and achieved accuracy rates above 70%, outperforming existing methods. It can detect text regions and recognize text information for applications like text understanding and retrieval on mobile devices.
Dominating set based arbitrary oriented bilingual scene text localization (IJECEIAES)
Localizing and recognizing arbitrarily oriented text in natural scene images is a major challenge, because scene texts are often erratic in shape. This paper presents a simple and effective graph representational algorithm for detecting arbitrarily oriented text locations to smoothen the text recognition process, chosen for its high impact and simplicity of representation. An arbitrarily oriented text can be horizontal, vertical, perspective, curved (diagonal/off-diagonal), or even a combination. As a pre-processing step, image enhancement is performed in the frequency domain to improve the representation of images invariant to intensity. It is necessary to draw bounding boxes for each candidate character in the scene images to extract text regions. This step is carried out by utilizing the advantages of the region-based approach called maximally stable extremal regions. A typical problem with curved text localization is that non-text objects may occur within localized text regions. Our method is the first in the literature that searches for dominating sets to solve this problem. This dominating set method outperforms several traditional methods, including deep learning methods used for arbitrary text localization, on challenging datasets such as the 13th International Conference on Document Analysis and Recognition (ICDAR 2015), the multi-script robust reading competition (MRRC), CurvedText 80 (CUTE80), and arbitrary text (ArT).
Bayesian distance metric learning and its application in automatic speaker re... (IJECEIAES)
This document proposes a state-of-the-art automatic speaker recognition system based on Bayesian distance metric learning as a feature extractor. It explores constraints on the distance between modified and simplified i-vector pairs from the same speaker and different speakers. An approximation of the distance metric is used as a weighted covariance matrix from the higher eigenvectors of the covariance matrix, which is used to estimate the posterior distribution of the metric distance. This Bayesian distance learning approach achieves better performance than advanced methods and is insensitive to normalization compared to cosine scores. It is also effective with limited training data.
Optical Character Recognition from Text Image (Editor IJCATR)
Optical Character Recognition (OCR) is a system that provides full alphanumeric recognition of printed or handwritten characters by simply scanning the text image. An OCR system interprets the printed or handwritten character image and converts it into a corresponding editable text document. The text image is divided into regions by isolating each line, then individual characters with spaces. After character extraction, texture and topological features such as corner points, features of different regions, and the ratio of character area to convex area are calculated for all characters of the text image. Features of each uppercase and lowercase letter, digit, and symbol are stored beforehand as templates. Based on the texture and topological features, the system recognizes the exact character using feature matching between the extracted character and the templates of all characters as a measure of similarity.
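Feature matching against stored templates, as described above, can be sketched as follows. The features here (ink density, area ratio, centroid position) are simplified stand-ins for the paper's texture and topological features, and the glyphs are toy examples:

```python
import numpy as np

def char_features(glyph):
    """Crude texture/topology features of a binarized glyph: ink density,
    ratio of character area to bounding-box area, and normalized centroid."""
    ink = glyph > 0
    ys, xs = np.nonzero(ink)
    box = (ys.max() - ys.min() + 1) * (xs.max() - xs.min() + 1)
    return np.array([ink.mean(), ink.sum() / box,
                     ys.mean() / glyph.shape[0], xs.mean() / glyph.shape[1]])

def recognize(glyph, templates):
    """Return the template character whose stored features are closest."""
    f = char_features(glyph)
    return min(templates, key=lambda c: np.linalg.norm(templates[c] - f))

# toy templates: a vertical bar 'I' and a square ring 'O' on a 7x7 grid
bar = np.zeros((7, 7), int); bar[1:6, 3] = 1
ring = np.zeros((7, 7), int); ring[1:6, 1:6] = 1; ring[2:5, 2:5] = 0
templates = {"I": char_features(bar), "O": char_features(ring)}
```

Real systems store one template feature vector per letter, digit, and symbol, exactly as the abstract describes, and pick the nearest one as the recognized character.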
This document summarizes a research paper that reviews different methods for scene text detection and the challenges associated with it. The paper begins with an introduction that describes the overall process of automated scene text detection systems. It then provides a literature review of various text detection methods proposed in previous research, which can be categorized as connected component based methods or texture based methods. Some example methods are described. The paper discusses challenges in scene text detection, such as variable imaging conditions, complex backgrounds, and a wide range of text sizes and fonts. Finally, it discusses performance metrics like precision, recall, and f-measure that are used to evaluate scene text detection methods based on a standard dataset.
IRJET- Image to Text Conversion using Tesseract (IRJET Journal)
This document discusses using Tesseract OCR engine to convert images containing text into editable text files. It begins with an abstract describing how digital images often contain text data that users need to access and edit digitally. Tesseract is an open-source OCR tool that uses neural networks like LSTM to recognize text in images with high accuracy and convert it into editable text. It then reviews existing OCR methods before describing Tesseract's image processing and recognition steps in more detail. The document also notes that the converted text could then be used to create audio files for visually impaired users to hear the text content.
offline character recognition for handwritten gujarati text (Bhumika Patel)
This document summarizes a presentation on optical character recognition of Gujarati characters using convolutional neural networks. It outlines collecting a dataset of 1360 images each of 34 Gujarati characters written by different people. The proposed approach involves preprocessing images, training a CNN model, and calculating accuracy. Initial results correctly recognized some characters but had difficulty with connected characters. Future work includes recognizing remaining characters and vowels, collecting more data, and exploring different CNN configurations to improve accuracy.
Hand Written Character Recognition Using Neural Networks (Chiranjeevi Adi)
This document discusses a project to develop a handwritten character recognition system using a neural network. It will take handwritten English characters as input and recognize the patterns using a trained neural network. The system aims to recognize individual characters as well as classify them into groups. It will first preprocess, segment, extract features from, and then classify the input characters using the neural network. The document reviews several existing approaches to handwritten character recognition and the use of gradient and edge-based feature extraction with neural networks. It defines the objectives and methods for the proposed system, which will involve preprocessing, segmentation, feature extraction, and classification/recognition steps. Finally, it outlines the hardware and software requirements to implement the system as a MATLAB application.
Writing long sentences can be tedious, but text prediction in keyboards has made this simple. The learning technology behind the keyboard is developing fast and has become more accurate. Learning technologies such as machine learning and deep learning play an important role in predicting text. Current trends in deep learning have opened doors for data analysis, and emerging architectures such as Region-based CNNs and Recurrent CNNs have been under consideration for this analysis. Many techniques have been used for text sequence prediction, including Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and Recurrent Convolutional Neural Networks (RCNN). This paper provides a comparative study of the different techniques used for text prediction.
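For contrast with the deep models surveyed, the text-prediction task itself can be illustrated with a minimal frequency-based bigram predictor (a toy baseline, not one of the CNN/RNN/RCNN techniques compared in the paper):

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count word bigrams; the predictor suggests the most frequent follower."""
    model = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for a, b in zip(words, words[1:]):
            model[a][b] += 1
    return model

def predict_next(model, word):
    """Most frequent word seen after `word`, or None if unseen."""
    followers = model.get(word.lower())
    return followers.most_common(1)[0][0] if followers else None

corpus = ["deep learning models predict text",
          "deep learning models learn features",
          "deep learning models predict sequences"]
model = train_bigram(corpus)
```

The neural approaches replace these raw counts with learned representations that generalize to word sequences never seen verbatim in training.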
Deep Learning in Text Recognition and Text Detection: A Review (IRJET Journal)
The document discusses text detection and recognition using deep learning techniques. It begins with an introduction to deep learning and its use in optical character recognition (OCR). There are two main components of OCR - text detection, which locates text in an image, and text recognition, which identifies the text. Convolutional neural networks (CNNs) are effective for both tasks. The document then outlines the steps for text detection and recognition using deep learning, including data collection, preprocessing, feature extraction using CNNs, training and validating models on datasets, and testing the trained models on new data.
Handwritten Text Recognition and Translation with Audio (IRJET Journal)
This document presents research on handwritten text recognition and translation with audio. The researchers used convolutional neural networks to classify handwritten words and characters. For word classification, they directly classified whole words. For character classification, they first separated characters from words using an LSTM model and then classified each character. They trained models on the IAM Handwriting Dataset containing over 100k words. Pre-processing steps like noise removal, skew correction, and line segmentation were used to prepare images for classification. Both word-level and character-level classification models were explored. The character classification approach showed better results due to a smaller output size for the softmax layer. The recognized text could then be translated and output with audio.
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw... - IJECEIAES
Medical image analysis has witnessed significant advancements with deep learning techniques. In the domain of brain tumor segmentation, the ability to
precisely delineate tumor boundaries from magnetic resonance imaging (MRI)
scans holds profound implications for diagnosis. This study presents an ensemble convolutional neural network (CNN) with transfer learning, integrating
the state-of-the-art Deeplabv3+ architecture with the ResNet18 backbone. The
model is rigorously trained and evaluated, exhibiting remarkable performance
metrics, including an impressive global accuracy of 99.286%, a high-class accuracy of 82.191%, a mean intersection over union (IoU) of 79.900%, a weighted
IoU of 98.620%, and a Boundary F1 (BF) score of 83.303%. Notably, a detailed comparative analysis with existing methods showcases the superiority of
our proposed model. These findings underscore the model's competence in precise brain tumor localization and its potential to revolutionize medical
image analysis and enhance healthcare outcomes. This research paves the way
for future exploration and optimization of advanced CNN models in medical
imaging, emphasizing addressing false positives and resource efficiency.
Embedded machine learning-based road conditions and driving behavior monitoring - IJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
Similar to Customized mask region based convolutional neural networks for un-uniformed shape text detection and text recognition
Text detection and recognition in scene images or natural images has applications in computer vision systems such as registration number plate detection, automatic traffic sign detection, image retrieval, and aids for visually impaired people. Scene text, however, suffers from complicated backgrounds, image blur, partially occluded text, variations in font styles, image noise, and varying illumination. Scene text recognition is therefore a difficult computer vision problem. In this paper, a connected component method is used to extract the text from the background. Horizontal and vertical projection profiles, geometric properties of text, image binarization, and a gap-filling method are used to extract the text from scene images. A histogram-based threshold is then applied to separate text from the background of the images. Finally, text is extracted from the images.
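The pipeline sketched in this abstract (a histogram-based threshold to separate text from background, then projection profiles to locate lines and characters) can be illustrated with NumPy alone. This is a minimal sketch, not the paper's implementation; the function names and the Otsu-style threshold choice are assumptions for illustration:

```python
import numpy as np

def otsu_threshold(gray):
    """Pick the threshold that maximizes between-class variance
    over the 256-bin intensity histogram (Otsu's method)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    mu_total = np.dot(np.arange(256), hist)
    best_t, best_var = 0, -1.0
    cum_w = 0.0   # histogram mass of the "background" class so far
    cum_mu = 0.0  # weighted intensity sum of that class
    for t in range(256):
        cum_w += hist[t]
        cum_mu += t * hist[t]
        if cum_w == 0 or cum_w == total:
            continue
        w0 = cum_w / total
        mu0 = cum_mu / cum_w
        mu1 = (mu_total - cum_mu) / (total - cum_w)
        var = w0 * (1.0 - w0) * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def projection_profiles(binary):
    """Row and column sums of a 0/1 text mask; valleys in the
    horizontal profile separate text lines, valleys in the
    vertical profile separate characters."""
    return binary.sum(axis=1), binary.sum(axis=0)
```

On a synthetic image with dark text on a bright background, the threshold lands between the two intensity modes, and the horizontal profile is nonzero only on rows containing text.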
Texture features based text extraction from images using DWT and K-means clus... - Divya Gera
Text extraction from different kinds of images: document, caption, and scene text images. The discrete wavelet transform was used to extract horizontal, vertical, and diagonal features, and k-means clustering was used to cluster the features into text and background clusters. For simple images k = 2 worked (a text and a background cluster), while for complex images k = 3 was used (a text cluster, a complex-background cluster, and a simple-background cluster).
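The DWT-plus-k-means idea above can be sketched in a few lines of NumPy: a one-level Haar transform yields horizontal, vertical, and diagonal detail sub-bands, and a plain k-means on a per-block energy feature splits text-like (high-frequency) blocks from background. This is a generic sketch under those assumptions, not the author's code:

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2-D Haar transform: returns the approximation plus
    horizontal, vertical, and diagonal detail sub-bands."""
    img = img[:img.shape[0] // 2 * 2, :img.shape[1] // 2 * 2].astype(float)
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    ll = (a + b + c + d) / 4   # approximation
    lh = (a + b - c - d) / 4   # horizontal detail
    hl = (a - b + c - d) / 4   # vertical detail
    hh = (a - b - c + d) / 4   # diagonal detail
    return ll, lh, hl, hh

def kmeans_1d(x, k=2, iters=20):
    """Plain k-means on a 1-D feature vector; labels are remapped so
    cluster 0 holds the smallest centroid (e.g. the background)."""
    centers = np.linspace(x.min(), x.max(), k)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean()
    order = np.argsort(centers)
    remap = np.empty(k, dtype=int)
    remap[order] = np.arange(k)
    return remap[labels]
```

A per-block detail energy such as `|lh| + |hl| + |hh|` then serves as the clustering feature, with the high-energy cluster taken as text.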
COHESIVE MULTI-ORIENTED TEXT DETECTION AND RECOGNITION STRUCTURE IN NATURAL S... - ijdpsjournal
Scene text recognition has brought various new challenges in recent years. Detecting and recognizing text in scenes entails some of the same problems as document processing, but recognizing text in natural scene images also raises numerous novel problems. Recent research in these areas has shown promise, but much work remains. Most existing techniques have focused on detecting horizontal or near-horizontal texts. In this paper, we propose a new scheme which detects texts of arbitrary orientations in natural scene images. Our algorithm is equipped with two sets of characteristics specially designed for capturing the natural characteristics of texts, using MSER regions and the Otsu method. To better evaluate our algorithm and compare it with other existing algorithms, we use the existing MSRA dataset, the ICDAR dataset, and our new dataset, which includes various texts in various real-world situations. Experimental results on these standard datasets and the proposed dataset show that our algorithm compares favorably with modern algorithms on horizontal texts and achieves significantly improved performance on texts of arbitrary orientations in complex natural scene images.
1) The document surveys approaches for scene text recognition in images over the past few years. It discusses techniques for text detection and localization, recognition, and end-to-end systems.
2) Traditional techniques included connected component analysis and texture-based methods using filters. Recent deep learning methods apply convolutional neural networks for region proposal, segmentation, and recognition.
3) State-of-the-art approaches use CNN-RNN models to recognize text as character sequences. Future areas of interest include handling more complex scenes and multi-language text in real-time systems.
Because of the rapid growth in technology breakthroughs, including
multimedia and cell phones, Telugu character recognition (TCR) has recently
become a popular study area. It is still necessary to construct automated and
intelligent online TCR models, even though many studies have focused on offline TCR models. The construction and validation of a Telugu character dataset using an Inception- and ResNet-based model are presented. The collection of 645 letters in the dataset includes 18 Achus, 38 Hallus, 35 Othulu, 34×16
Guninthamulu, and 10 Ankelu. The proposed technique aims to efficiently
recognize and identify distinctive Telugu characters online. This model's main
pre-processing steps to achieve its goals include normalization, smoothing,
and interpolation. Improved recognition performance can be attained by using
stochastic gradient descent (SGD) to optimize the model's hyperparameters.
CRNN model for text detection and classification from natural scenes - IAESIJAI
In the emerging field of computer vision, text recognition in natural settings remains a significant challenge due to variables like font, text size, and background complexity. This study introduces a method focusing on the automatic detection and classification of cursive text in multiple languages: English, Hindi, Tamil, and Kannada using a deep convolutional recurrent neural network (CRNN). The architecture combines convolutional neural networks (CNN) and long short-term memory (LSTM) networks for effective spatial and temporal learning. We employed pre-trained CNN models like VGG-16 and ResNet-18 for feature extraction and evaluated their performance. The method outperformed existing techniques, achieving an accuracy of 95.0%, 96.3%, and 96.2% on ICDAR 2015, ICDAR 2017, and a custom dataset (PDT2023), respectively. The findings not only push the boundaries of text detection technology but also offer promising prospects for practical applications.
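CRNN-style recognizers of the kind described above typically read out the LSTM's per-timestep character scores with CTC greedy decoding: take the best class at each timestep, collapse consecutive repeats, and drop the blank symbol. A minimal sketch of that decoding step (not the paper's exact implementation) looks like this:

```python
import numpy as np

def ctc_greedy_decode(logits, blank=0):
    """Greedy CTC decoding: argmax at every timestep, collapse
    consecutive repeats, then remove the blank symbol.
    `logits` has shape (timesteps, num_classes)."""
    best = np.argmax(logits, axis=1)  # best class index per timestep
    decoded, prev = [], None
    for idx in best:
        if idx != prev and idx != blank:
            decoded.append(int(idx))
        prev = idx
    return decoded
```

For example, a per-timestep best path of `[1, 1, blank, 2, 2, 2, blank, 1]` decodes to the label sequence `[1, 2, 1]`.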
Script Identification for printed document images at text-line level using DC... - IOSR Journals
This document summarizes a research paper that proposes a script identification approach for Indian scripts at the text-line level of printed documents. The approach uses visual appearance-based recognition by extracting features from text lines using Discrete Cosine Transform (DCT) and Principal Component Analysis (PCA). These features are classified using a Modified K-Nearest Neighbor algorithm. The method achieves 95% accuracy in recognizing 11 major Indian languages and discusses the importance of script identification in multilingual environments like India for applications such as optical character recognition.
IMAGE TO TEXT TO SPEECH CONVERSION USING MACHINE LEARNING - IRJET Journal
This document describes a study on converting images to text to speech using machine learning. The researchers developed a system that uses optical character recognition to extract text from images, then converts the text to speech. They achieved over 99% accuracy on their test dataset of over 1 million images. Their integrated system was able to accurately extract and convert text from various real-world images like street signs and menus. The system has potential to improve accessibility for people with visual impairments by allowing printed information to be converted to audio. Future work includes handling lower quality images and expanding the system to support additional languages and applications.
Scene text recognition in mobile applications by character descriptor and str... - eSAT Journals
This document presents a method for scene text recognition in mobile applications using character descriptors and structure configuration. It proposes using a character descriptor that combines feature detectors and descriptors to extract text features effectively. It also models character structure using stroke configuration maps derived from character boundaries and skeletons. The method was tested on various datasets and achieved accuracy rates above 70%, outperforming existing methods. It can detect text regions and recognize text information for applications like text understanding and retrieval on mobile devices.
Dominating set based arbitrary oriented bilingual scene text localization - IJECEIAES
Localizing and recognizing arbitrarily oriented text in natural scene images is the biggest challenge. It is because scene texts are often erratic in shapes. This paper presents a simple and effective graph representational algorithm for detecting arbitrary-oriented text location to smoothen the text recognition process because of its high impact and simplicity of representation. An arbitrarily oriented text can be horizontal, vertical, perspective, curved (diagonal/off-diagonal), or even a combination. As a pre-processing step, image enhancement is performed in the frequency domain to improve the representation of images that are invariant to intensity. It is necessary to draw bounding boxes for each candidate character in the scene images to extract text regions. This step is carried out by utilizing the advantage of the region-based approach called maximally stable extremal regions. A typical problem with curved text localization is that non-text objects may occur within localized text regions. Our method is the first in the literature that searches for dominating sets to solve this problem. This dominating set method outperforms several traditional methods, including deep learning methods used for arbitrary text localization, on challenging datasets like the 13th International Conference on Document Analysis and Recognition (ICDAR 2015), multi-script robust reading competition (MRRC), CurvedText 80 (CUTE80), and arbitrary text (ArT).
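The abstract does not specify how its dominating sets are computed, but the standard greedy approximation of a minimum dominating set conveys the idea: repeatedly pick the vertex whose closed neighbourhood covers the most still-uncovered vertices. A generic sketch (the graph construction over text regions is the paper's, not shown here):

```python
def greedy_dominating_set(adj):
    """Greedy approximation of a minimum dominating set.
    `adj` maps each vertex to the set of its neighbours; the result
    is a set of vertices such that every vertex is either in the set
    or adjacent to a member of it."""
    uncovered = set(adj)
    dom = set()
    while uncovered:
        # pick the vertex whose closed neighbourhood covers the most
        # vertices that are still uncovered
        v = max(adj, key=lambda u: len(({u} | adj[u]) & uncovered))
        dom.add(v)
        uncovered -= {v} | adj[v]
    return dom
```

On a star graph the centre alone dominates everything; on a path of three vertices the middle vertex suffices.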
Bayesian distance metric learning and its application in automatic speaker re... - IJECEIAES
This document proposes a state-of-the-art automatic speaker recognition system based on Bayesian distance metric learning as a feature extractor. It explores constraints on the distance between modified and simplified i-vector pairs from the same speaker and different speakers. An approximation of the distance metric is used as a weighted covariance matrix from the higher eigenvectors of the covariance matrix, which is used to estimate the posterior distribution of the metric distance. This Bayesian distance learning approach achieves better performance than advanced methods and is insensitive to normalization compared to cosine scores. It is also effective with limited training data.
Optical Character Recognition from Text Image - Editor IJCATR
Optical Character Recognition (OCR) is a system that provides full alphanumeric recognition of printed or handwritten characters by simply scanning the text image. The OCR system interprets the printed or handwritten character image and converts it into a corresponding editable text document. The text image is divided into regions by isolating each line, then individual characters with spaces. After character extraction, texture and topological features such as corner points, features of different regions, and the ratio of character area to convex area are calculated for all characters of the text image. Features of each uppercase and lowercase letter, digit, and symbol are stored in advance as templates. Based on the texture and topological features, the system recognizes the exact character using feature matching between the extracted character and the templates of all characters as a measure of similarity.
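The template-matching step described above can be sketched with a simple normalized-correlation score between the extracted character bitmap and each stored template. This is an illustrative stand-in for the paper's texture/topological feature matching, with hypothetical function and variable names:

```python
import numpy as np

def match_character(char_img, templates):
    """Return the template label whose normalized correlation with
    the extracted character bitmap is highest. `templates` maps a
    label to a same-sized 0/1 array."""
    x = char_img.astype(float).ravel()
    x = (x - x.mean()) / (x.std() + 1e-9)   # zero-mean, unit-variance
    best_label, best_score = None, -np.inf
    for label, tpl in templates.items():
        t = tpl.astype(float).ravel()
        t = (t - t.mean()) / (t.std() + 1e-9)
        score = float(np.dot(x, t)) / x.size  # correlation in [-1, 1]
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score
```

A character identical to one of the templates scores close to 1, while an unrelated shape scores near 0.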
This document summarizes a research paper that reviews different methods for scene text detection and the challenges associated with it. The paper begins with an introduction that describes the overall process of automated scene text detection systems. It then provides a literature review of various text detection methods proposed in previous research, which can be categorized as connected component based methods or texture based methods. Some example methods are described. The paper discusses challenges in scene text detection, such as variable imaging conditions, complex backgrounds, and a wide range of text sizes and fonts. Finally, it discusses performance metrics like precision, recall, and f-measure that are used to evaluate scene text detection methods based on a standard dataset.
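The precision, recall, and f-measure metrics mentioned above reduce to a few lines once detections are matched against ground truth and counted as true positives, false positives, and false negatives:

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall and f-measure from counts of true-positive,
    false-positive and false-negative detections."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f
```

For example, 8 correct detections with 2 spurious boxes and 2 missed ground-truth boxes gives precision, recall, and f-measure of 0.8 each.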
IRJET - Image to Text Conversion using Tesseract - IRJET Journal
This document discusses using Tesseract OCR engine to convert images containing text into editable text files. It begins with an abstract describing how digital images often contain text data that users need to access and edit digitally. Tesseract is an open-source OCR tool that uses neural networks like LSTM to recognize text in images with high accuracy and convert it into editable text. It then reviews existing OCR methods before describing Tesseract's image processing and recognition steps in more detail. The document also notes that the converted text could then be used to create audio files for visually impaired users to hear the text content.
Offline character recognition for handwritten Gujarati text - Bhumika Patel
This document summarizes a presentation on optical character recognition of Gujarati characters using convolutional neural networks. It outlines collecting a dataset of 1360 images each of 34 Gujarati characters written by different people. The proposed approach involves preprocessing images, training a CNN model, and calculating accuracy. Initial results correctly recognized some characters but had difficulty with connected characters. Future work includes recognizing remaining characters and vowels, collecting more data, and exploring different CNN configurations to improve accuracy.
Hand Written Character Recognition Using Neural Networks - Chiranjeevi Adi
This document discusses a project to develop a handwritten character recognition system using a neural network. It will take handwritten English characters as input and recognize the patterns using a trained neural network. The system aims to recognize individual characters as well as classify them into groups. It will first preprocess, segment, extract features from, and then classify the input characters using the neural network. The document reviews several existing approaches to handwritten character recognition and the use of gradient and edge-based feature extraction with neural networks. It defines the objectives and methods for the proposed system, which will involve preprocessing, segmentation, feature extraction, and classification/recognition steps. Finally, it outlines the hardware and software requirements to implement the system as a MATLAB application.
Advanced control scheme of doubly fed induction generator for wind turbine us... - IJECEIAES
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
Neural network optimizer of proportional-integral-differential controller par... - IJECEIAES
Wide application of proportional-integral-differential (PID)-regulator in industry requires constant improvement of methods of its parameters adjustment. The paper deals with the issues of optimization of PID-regulator parameters with the use of neural network technology methods. A methodology for choosing the architecture (structure) of neural network optimizer is proposed, which consists in determining the number of layers, the number of neurons in each layer, as well as the form and type of activation function. Algorithms of neural network training based on the application of the method of minimizing the mismatch between the regulated value and the target value are developed. The method of back propagation of gradients is proposed to select the optimal training rate of neurons of the neural network. The neural network optimizer, which is a superstructure of the linear PID controller, allows increasing the regulation accuracy from 0.23 to 0.09, thus reducing the power consumption from 65% to 53%. The results of the conducted experiments allow us to conclude that the created neural superstructure may well become a prototype of an automatic voltage regulator (AVR)-type industrial controller for tuning the parameters of the PID controller.
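The abstract above describes a neural superstructure that tunes the gains of an otherwise standard discrete PID controller. The controller being tuned is textbook material and can be sketched as follows; the class name is illustrative, and the gains `kp`, `ki`, `kd` are the constants the neural optimizer would adjust:

```python
class PID:
    """Textbook discrete PID controller. The neural optimizer in the
    abstract would tune kp, ki, kd; here they are fixed constants."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0   # accumulated error (I term)
        self.prev_err = 0.0   # previous error (for the D term)

    def step(self, setpoint, measured):
        """One control update: returns the actuator command."""
        err = setpoint - measured
        self.integral += err * self.dt
        deriv = (err - self.prev_err) / self.dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv
```

With only a proportional gain of 2.0, a unit error produces a command of 2.0; with only an integral gain of 1.0 and a 0.1 s step, the first command is 0.1.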
An improved modulation technique suitable for a three level flying capacitor ... - IJECEIAES
This research paper introduces an innovative modulation technique for controlling a 3-level flying capacitor multilevel inverter (FCMLI), aiming to streamline the modulation process in contrast to conventional methods. The proposed
simplified modulation technique paves the way for more straightforward and
efficient control of multilevel inverters, enabling their widespread adoption and
integration into modern power electronic systems. Through the amalgamation of
sinusoidal pulse width modulation (SPWM) with a high-frequency square wave
pulse, this controlling technique attains energy equilibrium across the coupling
capacitor. The modulation scheme incorporates a simplified switching pattern
and a decreased count of voltage references, thereby simplifying the control
algorithm.
A review on features and methods of potential fishing zone - IJECEIAES
This review focuses on the importance of identifying potential fishing zones in seawater for sustainable fishing practices. It examines features such as sea surface temperature (SST) and sea surface height (SSH), along with the classification methods and classifiers applied to these data. This study underscores the importance of examining potential fishing zones using advanced analytical techniques. It thoroughly explores the methodologies employed by researchers, covering both past and current approaches. The examination centers on data characteristics and the application of classification algorithms for identifying potential fishing zones. Furthermore, the prediction of potential fishing zones relies significantly on the effectiveness of classification algorithms. Previous research has assessed the performance of models like support vector machines (SVM), naïve Bayes, and artificial neural networks (ANN); in one reported result, SVM classified test data for fisheries classification with 97.6% accuracy, compared with 94.2% for naïve Bayes. By considering recent work in this area, several recommendations for future work are presented to further improve the performance of potential fishing zone models, which is important to the fisheries community.
Electrical signal interference minimization using appropriate core material f... - IJECEIAES
As demand for smaller, quicker, and more powerful devices rises, Moore's law is strictly followed. The industry has worked hard to make little devices that boost productivity. The goal is to optimize device density. Scientists are reducing connection delays to improve circuit performance. This helped them understand three-dimensional integrated circuit (3D IC) concepts, which stack active devices and create vertical connections to diminish latency and lower interconnects. Electrical involvement is a big worry with 3D integrates circuits. Researchers have developed and tested through silicon via (TSV) and substrates to decrease electrical wave involvement. This study illustrates a novel noise coupling reduction method using several electrical involvement models. A 22% drop in electrical involvement from wave-carrying to victim TSVs introduces this new paradigm and improves system performance even at higher THz frequencies.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p... - IJECEIAES
Climate change's impact on the planet forced the United Nations and governments to promote green energies and electric transportation. The deployments of photovoltaic (PV) and electric vehicle (EV) systems gained stronger momentum due to their numerous advantages over fossil fuel types. The advantages go beyond sustainability to reach financial support and stability. The work in this paper introduces a hybrid system between PV and EV to support industrial and commercial plants. This paper covers the theoretical framework of the proposed hybrid system, including the equations required to complete the cost analysis when PV and EV are present. In addition, the proposed design diagram which sets the priorities and requirements of the system is presented. The proposed approach allows setups to advance their power stability, especially during power outages. The presented information supports researchers and plant owners in completing the necessary analysis while promoting the deployment of clean energy. The result of a case study representing a dairy milk farmer supports the theoretical work and highlights its benefits to existing plants. The short return on investment supports the paper's novel approach to a sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line, which enhances the safety of the electrical network.
Bibliometric analysis highlighting the role of women in addressing climate ch... - IJECEIAES
Fossil fuel consumption has increased quickly, contributing to climate change that is evident in unusual flooding, droughts, and global warming. Over the past ten years, women's involvement in society has grown dramatically, and they have played a noticeable role in reducing climate change. A bibliometric analysis of data from the last ten years has been carried out to examine the role of women in addressing climate change. The analysis's findings are discussed in relation to the sustainable development goals (SDGs), particularly SDG 7 and SDG 13. The results consider contributions made by women in various sectors while taking geographic dispersion into account. The bibliometric analysis delves into topics including women's leadership in environmental groups, their involvement in policymaking, their contributions to sustainable development projects, and the influence of gender diversity on attempts to mitigate climate change. The study's results highlight how women have influenced policies and actions related to climate change, point out gaps in the research, and offer recommendations on how to increase the role of women in addressing climate change and achieving sustainability. To achieve more successful results, this initiative aims to highlight the significance of gender equality and encourage inclusivity in climate change decision-making processes.
Voltage and frequency control of microgrid in presence of micro-turbine inter...IJECEIAES
Β
The active and reactive load changes have a significant impact on voltage
and frequency. In this paper, in order to stabilize the microgrid (MG) against
load variations in islanding mode, the active and reactive power of all
distributed generators (DGs), including energy storage (battery), diesel
generator, and micro-turbine, are controlled. The micro-turbine generator is
connected to MG through a three-phase to three-phase matrix converter, and
the droop control method is applied for controlling the voltage and
frequency of MG. In addition, a method is introduced for voltage and
frequency control of micro-turbines in the transition state from gridconnected mode to islanding mode. A novel switching strategy of the matrix
converter is used for converting the high-frequency output voltage of the
micro-turbine to the grid-side frequency of the utility system. Moreover,
using the switching strategy, the low-order harmonics in the output current
and voltage are not produced, and consequently, the size of the output filter
would be reduced. In fact, the suggested control strategy is load-independent
and has no frequency conversion restrictions. The proposed approach for
voltage and frequency regulation demonstrates exceptional performance and
favorable response across various load alteration scenarios. The suggested
strategy is examined in several scenarios in the MG test systems, and the
simulation results are addressed.
Enhancing battery system identification: nonlinear autoregressive modeling fo...IJECEIAES
Β
Precisely characterizing Li-ion batteries is essential for optimizing their
performance, enhancing safety, and prolonging their lifespan across various
applications, such as electric vehicles and renewable energy systems. This
article introduces an innovative nonlinear methodology for system
identification of a Li-ion battery, employing a nonlinear autoregressive with
exogenous inputs (NARX) model. The proposed approach integrates the
benefits of nonlinear modeling with the adaptability of the NARX structure,
facilitating a more comprehensive representation of the intricate
electrochemical processes within the battery. Experimental data collected
from a Li-ion battery operating under diverse scenarios are employed to
validate the effectiveness of the proposed methodology. The identified
NARX model exhibits superior accuracy in predicting the battery's behavior
compared to traditional linear models. This study underscores the
importance of accounting for nonlinearities in battery modeling, providing
insights into the intricate relationships between state-of-charge, voltage, and
current under dynamic conditions.
Smart grid deployment: from a bibliometric analysis to a surveyIJECEIAES
Β
Smart grids are one of the last decades' innovations in electrical energy.
They bring relevant advantages compared to the traditional grid and
significant interest from the research community. Assessing the field's
evolution is essential to propose guidelines for facing new and future smart
grid challenges. In addition, knowing the main technologies involved in the
deployment of smart grids (SGs) is important to highlight possible
shortcomings that can be mitigated by developing new tools. This paper
contributes to the research trends mentioned above by focusing on two
objectives. First, a bibliometric analysis is presented to give an overview of
the current research level about smart grid deployment. Second, a survey of
the main technological approaches used for smart grid implementation and
their contributions are highlighted. To that effect, we searched the Web of
Science (WoS), and the Scopus databases. We obtained 5,663 documents
from WoS and 7,215 from Scopus on smart grid implementation or
deployment. With the extraction limitation in the Scopus database, 5,872 of
the 7,215 documents were extracted using a multi-step process. These two
datasets have been analyzed using a bibliometric tool called bibliometrix.
The main outputs are presented with some recommendations for future
research.
Use of analytical hierarchy process for selecting and prioritizing islanding ...IJECEIAES
Β
One of the problems that are associated to power systems is islanding
condition, which must be rapidly and properly detected to prevent any
negative consequences on the system's protection, stability, and security.
This paper offers a thorough overview of several islanding detection
strategies, which are divided into two categories: classic approaches,
including local and remote approaches, and modern techniques, including
techniques based on signal processing and computational intelligence.
Additionally, each approach is compared and assessed based on several
factors, including implementation costs, non-detected zones, declining
power quality, and response times using the analytical hierarchy process
(AHP). The multi-criteria decision-making analysis shows that the overall
weight of passive methods (24.7%), active methods (7.8%), hybrid methods
(5.6%), remote methods (14.5%), signal processing-based methods (26.6%),
and computational intelligent-based methods (20.8%) based on the
comparison of all criteria together. Thus, it can be seen from the total weight
that hybrid approaches are the least suitable to be chosen, while signal
processing-based methods are the most appropriate islanding detection
method to be selected and implemented in power system with respect to the
aforementioned factors. Using Expert Choice software, the proposed
hierarchy model is studied and examined.
Enhancing of single-stage grid-connected photovoltaic system using fuzzy logi...IJECEIAES
Β
The power generated by photovoltaic (PV) systems is influenced by
environmental factors. This variability hampers the control and utilization of
solar cells' peak output. In this study, a single-stage grid-connected PV
system is designed to enhance power quality. Our approach employs fuzzy
logic in the direct power control (DPC) of a three-phase voltage source
inverter (VSI), enabling seamless integration of the PV connected to the
grid. Additionally, a fuzzy logic-based maximum power point tracking
(MPPT) controller is adopted, which outperforms traditional methods like
incremental conductance (INC) in enhancing solar cell efficiency and
minimizing the response time. Moreover, the inverter's real-time active and
reactive power is directly managed to achieve a unity power factor (UPF).
The system's performance is assessed through MATLAB/Simulink
implementation, showing marked improvement over conventional methods,
particularly in steady-state and varying weather conditions. For solar
irradiances of 500 and 1,000 W/m2
, the results show that the proposed
method reduces the total harmonic distortion (THD) of the injected current
to the grid by approximately 46% and 38% compared to conventional
methods, respectively. Furthermore, we compare the simulation results with
IEEE standards to evaluate the system's grid compatibility.
Enhancing photovoltaic system maximum power point tracking with fuzzy logic-b...IJECEIAES
Β
Photovoltaic systems have emerged as a promising energy resource that
caters to the future needs of society, owing to their renewable, inexhaustible,
and cost-free nature. The power output of these systems relies on solar cell
radiation and temperature. In order to mitigate the dependence on
atmospheric conditions and enhance power tracking, a conventional
approach has been improved by integrating various methods. To optimize
the generation of electricity from solar systems, the maximum power point
tracking (MPPT) technique is employed. To overcome limitations such as
steady-state voltage oscillations and improve transient response, two
traditional MPPT methods, namely fuzzy logic controller (FLC) and perturb
and observe (P&O), have been modified. This research paper aims to
simulate and validate the step size of the proposed modified P&O and FLC
techniques within the MPPT algorithm using MATLAB/Simulink for
efficient power tracking in photovoltaic systems.
Adaptive synchronous sliding control for a robot manipulator based on neural ...IJECEIAES
Β
Robot manipulators have become important equipment in production lines, medical fields, and transportation. Improving the quality of trajectory tracking for
robot hands is always an attractive topic in the research community. This is a
challenging problem because robot manipulators are complex nonlinear systems
and are often subject to fluctuations in loads and external disturbances. This
article proposes an adaptive synchronous sliding control scheme to improve trajectory tracking performance for a robot manipulator. The proposed controller
ensures that the positions of the joints track the desired trajectory, synchronize
the errors, and significantly reduces chattering. First, the synchronous tracking
errors and synchronous sliding surfaces are presented. Second, the synchronous
tracking error dynamics are determined. Third, a robust adaptive control law is
designed,the unknown components of the model are estimated online by the neural network, and the parameters of the switching elements are selected by fuzzy
logic. The built algorithm ensures that the tracking and approximation errors
are ultimately uniformly bounded (UUB). Finally, the effectiveness of the constructed algorithm is demonstrated through simulation and experimental results.
Simulation and experimental results show that the proposed controller is effective with small synchronous tracking errors, and the chattering phenomenon is
significantly reduced.
Remote field-programmable gate array laboratory for signal acquisition and de...IJECEIAES
Β
A remote laboratory utilizing field-programmable gate array (FPGA) technologies enhances studentsβ learning experience anywhere and anytime in embedded system design. Existing remote laboratories prioritize hardware access and visual feedback for observing board behavior after programming, neglecting comprehensive debugging tools to resolve errors that require internal signal acquisition. This paper proposes a novel remote embeddedsystem design approach targeting FPGA technologies that are fully interactive via a web-based platform. Our solution provides FPGA board access and debugging capabilities beyond the visual feedback provided by existing remote laboratories. We implemented a lab module that allows users to seamlessly incorporate into their FPGA design. The module minimizes hardware resource utilization while enabling the acquisition of a large number of data samples from the signal during the experiments by adaptively compressing the signal prior to data transmission. The results demonstrate an average compression ratio of 2.90 across three benchmark signals, indicating efficient signal acquisition and effective debugging and analysis. This method allows users to acquire more data samples than conventional methods. The proposed lab allows students to remotely test and debug their designs, bridging the gap between theory and practice in embedded system design.
Detecting and resolving feature envy through automated machine learning and m...IJECEIAES
Β
Efficiently identifying and resolving code smells enhances software project quality. This paper presents a novel solution, utilizing automated machine learning (AutoML) techniques, to detect code smells and apply move method refactoring. By evaluating code metrics before and after refactoring, we assessed its impact on coupling, complexity, and cohesion. Key contributions of this research include a unique dataset for code smell classification and the development of models using AutoGluon for optimal performance. Furthermore, the study identifies the top 20 influential features in classifying feature envy, a well-known code smell, stemming from excessive reliance on external classes. We also explored how move method refactoring addresses feature envy, revealing reduced coupling and complexity, and improved cohesion, ultimately enhancing code quality. In summary, this research offers an empirical, data-driven approach, integrating AutoML and move method refactoring to optimize software project quality. Insights gained shed light on the benefits of refactoring on code quality and the significance of specific features in detecting feature envy. Future research can expand to explore additional refactoring techniques and a broader range of code metrics, advancing software engineering practices and standards.
Smart monitoring technique for solar cell systems using internet of things ba...IJECEIAES
Β
Rapidly and remotely monitoring and receiving the solar cell systems status parameters, solar irradiance, temperature, and humidity, are critical issues in enhancement their efficiency. Hence, in the present article an improved smart prototype of internet of things (IoT) technique based on embedded system through NodeMCU ESP8266 (ESP-12E) was carried out experimentally. Three different regions at Egypt; Luxor, Cairo, and El-Beheira cities were chosen to study their solar irradiance profile, temperature, and humidity by the proposed IoT system. The monitoring data of solar irradiance, temperature, and humidity were live visualized directly by Ubidots through hypertext transfer protocol (HTTP) protocol. The measured solar power radiation in Luxor, Cairo, and El-Beheira ranged between 216-1000, 245-958, and 187-692 W/m 2 respectively during the solar day. The accuracy and rapidity of obtaining monitoring results using the proposed IoT system made it a strong candidate for application in monitoring solar cell systems. On the other hand, the obtained solar power radiation results of the three considered regions strongly candidate Luxor and Cairo as suitable places to build up a solar cells system station rather than El-Beheira.
An efficient security framework for intrusion detection and prevention in int...IJECEIAES
Β
Over the past few years, the internet of things (IoT) has advanced to connect billions of smart devices to improve quality of life. However, anomalies or malicious intrusions pose several security loopholes, leading to performance degradation and threat to data security in IoT operations. Thereby, IoT security systems must keep an eye on and restrict unwanted events from occurring in the IoT network. Recently, various technical solutions based on machine learning (ML) models have been derived towards identifying and restricting unwanted events in IoT. However, most ML-based approaches are prone to miss-classification due to inappropriate feature selection. Additionally, most ML approaches applied to intrusion detection and prevention consider supervised learning, which requires a large amount of labeled data to be trained. Consequently, such complex datasets are impossible to source in a large network like IoT. To address this problem, this proposed study introduces an efficient learning mechanism to strengthen the IoT security aspects. The proposed algorithm incorporates supervised and unsupervised approaches to improve the learning models for intrusion detection and mitigation. Compared with the related works, the experimental outcome shows that the model performs well in a benchmark dataset. It accomplishes an improved detection accuracy of approximately 99.21%.
Open Channel Flow: fluid flow with a free surfaceIndrajeet sahu
Β
Open Channel Flow: This topic focuses on fluid flow with a free surface, such as in rivers, canals, and drainage ditches. Key concepts include the classification of flow types (steady vs. unsteady, uniform vs. non-uniform), hydraulic radius, flow resistance, Manning's equation, critical flow conditions, and energy and momentum principles. It also covers flow measurement techniques, gradually varied flow analysis, and the design of open channels. Understanding these principles is vital for effective water resource management and engineering applications.
A high-Speed Communication System is based on the Design of a Bi-NoC Router, ...DharmaBanothu
Β
The Network on Chip (NoC) has emerged as an effective
solution for intercommunication infrastructure within System on
Chip (SoC) designs, overcoming the limitations of traditional
methods that face significant bottlenecks. However, the complexity
of NoC design presents numerous challenges related to
performance metrics such as scalability, latency, power
consumption, and signal integrity. This project addresses the
issues within the router's memory unit and proposes an enhanced
memory structure. To achieve efficient data transfer, FIFO buffers
are implemented in distributed RAM and virtual channels for
FPGA-based NoC. The project introduces advanced FIFO-based
memory units within the NoC router, assessing their performance
in a Bi-directional NoC (Bi-NoC) configuration. The primary
objective is to reduce the router's workload while enhancing the
FIFO internal structure. To further improve data transfer speed,
a Bi-NoC with a self-configurable intercommunication channel is
suggested. Simulation and synthesis results demonstrate
guaranteed throughput, predictable latency, and equitable
network access, showing significant improvement over previous
designs
Levelised Cost of Hydrogen (LCOH) Calculator ManualMassimo Talia
Β
The aim of this manual is to explain the
methodology behind the Levelized Cost of
Hydrogen (LCOH) calculator. Moreover, this
manual also demonstrates how the calculator
can be used for estimating the expenses associated with hydrogen production in Europe
using low-temperature electrolysis considering different sources of electricity
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELijaia
Β
As digital technology becomes more deeply embedded in power systems, protecting the communication
networks of Smart Grids (SG) has emerged as a critical concern. Distributed Network Protocol 3 (DNP3)
represents a multi-tiered application layer protocol extensively utilized in Supervisory Control and Data
Acquisition (SCADA)-based smart grids to facilitate real-time data gathering and control functionalities.
Robust Intrusion Detection Systems (IDS) are necessary for early threat detection and mitigation because
of the interconnection of these networks, which makes them vulnerable to a variety of cyberattacks. To
solve this issue, this paper develops a hybrid Deep Learning (DL) model specifically designed for intrusion
detection in smart grids. The proposed approach is a combination of the Convolutional Neural Network
(CNN) and the Long-Short-Term Memory algorithms (LSTM). We employed a recent intrusion detection
dataset (DNP3), which focuses on unauthorized commands and Denial of Service (DoS) cyberattacks, to
train and test our model. The results of our experiments show that our CNN-LSTM method is much better
at finding smart grid intrusions than other deep learning algorithms used for classification. In addition,
our proposed approach improves accuracy, precision, recall, and F1 score, achieving a high detection
accuracy rate of 99.50%.
Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...Transcat
Β
Join us for this solutions-based webinar on the tools and techniques for commissioning and maintaining PV Systems. In this session, we'll review the process of building and maintaining a solar array, starting with installation and commissioning, then reviewing operations and maintenance of the system. This course will review insulation resistance testing, I-V curve testing, earth-bond continuity, ground resistance testing, performance tests, visual inspections, ground and arc fault testing procedures, and power quality analysis.
Fluke Solar Application Specialist Will White is presenting on this engaging topic:
Will has worked in the renewable energy industry since 2005, first as an installer for a small east coast solar integrator before adding sales, design, and project management to his skillset. In 2022, Will joined Fluke as a solar application specialist, where he supports their renewable energy testing equipment like IV-curve tracers, electrical meters, and thermal imaging cameras. Experienced in wind power, solar thermal, energy storage, and all scales of PV, Will has primarily focused on residential and small commercial systems. He is passionate about implementing high-quality, code-compliant installation techniques.
Blood finder application project report (1).pdfKamal Acharya
Β
Blood Finder is an emergency time app where a user can search for the blood banks as
well as the registered blood donors around Mumbai. This application also provide an
opportunity for the user of this application to become a registered donor for this user have
to enroll for the donor request from the application itself. If the admin wish to make user
a registered donor, with some of the formalities with the organization it can be done.
Specialization of this application is that the user will not have to register on sign-in for
searching the blood banks and blood donors it can be just done by installing the
application to the mobile.
The purpose of making this application is to save the userβs time for searching blood of
needed blood group during the time of the emergency.
This is an android application developed in Java and XML with the connectivity of
SQLite database. This application will provide most of basic functionality required for an
emergency time application. All the details of Blood banks and Blood donors are stored
in the database i.e. SQLite.
This application allowed the user to get all the information regarding blood banks and
blood donors such as Name, Number, Address, Blood Group, rather than searching it on
the different websites and wasting the precious time. This application is effective and
user friendly.
Road construction is not as easy as it seems to be, it includes various steps and it starts with its designing and
structure including the traffic volume consideration. Then base layer is done by bulldozers and levelers and after
base surface coating has to be done. For giving road a smooth surface with flexibility, Asphalt concrete is used.
Asphalt requires an aggregate sub base material layer, and then a base layer to be put into first place. Asphalt road
construction is formulated to support the heavy traffic load and climatic conditions. It is 100% recyclable and
saving non renewable natural resources.
With the advancement of technology, Asphalt technology gives assurance about the good drainage system and with
skid resistance it can be used where safety is necessary such as outsidethe schools.
The largest use of Asphalt is for making asphalt concrete for road surfaces. It is widely used in airports around the
world due to the sturdiness and ability to be repaired quickly, it is widely used for runways dedicated to aircraft
landing and taking off. Asphalt is normally stored and transported at 150βC or 300βF temperature
Determination of Equivalent Circuit parameters and performance characteristic...pvpriya2
Β
Includes the testing of induction motor to draw the circle diagram of induction motor with step wise procedure and calculation for the same. Also explains the working and application of Induction generator
Customized mask region based convolutional neural networks for un-uniformed shape text detection and text recognition
International Journal of Electrical and Computer Engineering (IJECE)
Vol. 13, No. 1, February 2023, pp. 413-424
ISSN: 2088-8708, DOI: 10.11591/ijece.v13i1.pp413-424
Journal homepage: http://ijece.iaescore.com
Customized mask region based convolutional neural networks
for un-uniformed shape text detection and text recognition
Ravikumar Hodikehosahally Channegowda¹, Palani Karthik², Raghavendra Srinivasaiah³, Mahadev Shivaraj⁴

¹ Department of Electronics and Communication Engineering, K S School of Engineering and Management, Ghousia College of Engineering, Visvesvaraya Technological University, Karnataka, India
² Department of Electronics and Communication Engineering, K S School of Engineering and Management, Visvesvaraya Technological University, Karnataka, India
³ Department of Computer Science and Engineering, Christ Deemed to be University, Bengaluru, Karnataka, India
⁴ Department of Electronics and Communication Engineering, Ghousia College of Engineering, Ramanagaram, Visvesvaraya Technological University, Karnataka, India
Article Info

Article history:
Received Oct 26, 2021
Revised Aug 7, 2022
Accepted Sep 6, 2022

ABSTRACT

In a scene image, text carries high-level information that helps to analyze and interpret the particular environment. In this paper, we adapt the image mask and original detection of the mask region based convolutional neural network (R-CNN) to allow recognition at three levels: sequence-, holistic-, and pixel-level semantics. In particular, pixel- and holistic-level semantics can be utilized to recognize the texts and to define the text shapes, respectively. In the mask and detection branches, we segment and recognize both character and word instances. Furthermore, we implement text detection through the outcome of instance segmentation on the 2-D feature space. Also, to tackle the issues of smaller and blurry texts, we perform text recognition with an attention-based optical character recognition (OCR) model on top of the mask R-CNN at the sequence level. The OCR module estimates the character sequence from the feature maps of the word instances in a sequence-to-sequence manner. Finally, we propose a fine-grained learning technique that trains a more accurate and robust model by learning from datasets annotated at the word level. Our proposed approach is evaluated on the popular benchmark datasets ICDAR 2013 and ICDAR 2015.
Keywords:
Deep neural networks
Optical character recognition
Region based convolutional neural networks
Region of interest
Text detection
Text recognition
This is an open access article under the CC BY-SA license.
Corresponding Author:
Ravikumar Hodikehosahally Channegowda
Department of Electronics and Communication Engineering, Visvesvaraya Technological University
Belagavi, Karnataka, India
Email: raviec40@gmail.com
1. INTRODUCTION
Text is one of the most expressive means of communication and can be embedded into documents or into scenes to convey information. Text plays a crucial role in our daily lives, and reading it from videos and images is valuable for plentiful real-world applications such as scene text image retrieval and recognition [1], office automation, assistance for blind people, and geo-location, since it carries very convenient semantics for understanding the world. Reading scene text provides a rapid and automatic way to access the textual data embodied in natural scenes, which is the most powerful representation offered by scene text detection and recognition.
Text spotting intends to localize and recognize text in natural images and has been studied by prior techniques [2]. Techniques that follow the traditional pipeline [3] treat the processes of text detection and recognition individually: a text detector is trained first, and its output is fed into a text recognition model. This method looks straightforward and simple but may lead to sub-optimal performance for both detection and recognition, as the two tasks are closely related and complementary to one another. In particular, recognition outcomes depend heavily on the detected text, and they are in turn very helpful for eliminating false positive (FP) detections. Lately, He et al. [4] started to integrate text detection and recognition with the help of an end-to-end trainable network that contains two modules: a detection network that extracts the text instances and a sequence-to-sequence (Seq2Seq) network that estimates a sequential label for every single text instance. These techniques gained major performance improvements for text spotting, indicating that the detection and recognition models reinforce each other when they are trained in an end-to-end learning manner.
In such approaches, text spotting is performed in two phases: a detector is utilized to localize the text in an image, and after that, text recognition is applied on the detected regions. The main drawbacks of these techniques are the time cost and the fact that the correlation between text detection and recognition is ignored. Thus, many techniques [5] were introduced to combine horizontal and oriented text detection and recognition in an end-to-end manner. However, scene texts often appear in arbitrary shapes. Instead of describing the text with quadrangles and rectangles, TextSnake [6] describes the text with a series of local units that behave like a snake. That work is mainly focused on text detection in an image, whereas the mask text spotter recognizes and localizes arbitrarily shaped text using character segmentation on top of a text detector [7]; thus, character-level annotation is needed at training time.
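The traditional two-phase pipeline discussed above can be sketched with stub components; the detector and recognizer below are illustrative placeholders, not any specific model from the literature:

```python
# Hedged sketch of a traditional two-stage text spotting pipeline:
# detect boxes first, then recognize each cropped region independently.
# detect_text and recognize_crop are illustrative stubs.

def detect_text(image):
    """Stub detector: return word bounding boxes as (x1, y1, x2, y2)."""
    return [(10, 10, 60, 30), (70, 10, 120, 30)]

def recognize_crop(image, box):
    """Stub recognizer: return a transcription for one detected box."""
    fake_transcripts = {(10, 10, 60, 30): "HELLO", (70, 10, 120, 30): "WORLD"}
    return fake_transcripts[box]

def spot_text(image):
    """Two-stage spotting: detection errors propagate into recognition,
    and neither stage can correct the other -- the weakness that
    end-to-end trainable spotters address."""
    return [(box, recognize_crop(image, box)) for box in detect_text(image)]

print(spot_text(image=None))
```

Because the two stages never exchange gradients or confidence information, a false positive box is always transcribed and a missed box is never recovered, which is exactly the sub-optimality argument made above.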
This paper presents a unified technique for arbitrary shape text spotting (ASTS) to detect and recognize arbitrarily shaped text by considering three semantic levels. With the mask text spotter and mask region based convolutional neural network (R-CNN) [8], we can formulate text detection as an instance segmentation task. Unlike the mask text spotter, we adapt the image mask and original mask R-CNN detection to allow recognition at three levels: sequence-, pixel-, and holistic-level semantics. In particular, holistic- and pixel-level semantics can be utilized to recognize the texts and define the text shapes, respectively. In the mask and detection branches, we segment and recognize both character and word instances. Furthermore, we implement text detection through the segmentation result on the 2-D feature space. Also, to tackle the issues of smaller and blurry texts, we perform text recognition with an attention-based optical character recognition (OCR) model on top of the mask R-CNN to consider sequence-level semantics. We use the OCR module to estimate the character sequence from the feature maps of the word instances in a Seq2Seq manner, which is implemented in the 1-D feature space. Lastly, we integrate the two text recognition outcomes based on the edit distance between the outcomes and a given lexicon. In our technique, texts can be analyzed at three semantic levels: the sequence level for the recognition mask, the holistic level for the detection mask, and the pixel level for the mask.
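A minimal sketch of the edit-distance step follows; the recognized string and lexicon are hypothetical examples, and this shows only the standard Levenshtein computation, not the authors' exact integration scheme:

```python
# Hedged sketch: match a recognized string to the closest lexicon word
# by Levenshtein (edit) distance, computed by dynamic programming.

def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution / match
        prev = curr
    return prev[-1]

def match_to_lexicon(recognized: str, lexicon: list[str]) -> str:
    """Pick the lexicon word closest to the recognized string."""
    return min(lexicon, key=lambda w: edit_distance(recognized, w))

lexicon = ["STOP", "SHOP", "STREET"]
print(match_to_lexicon("ST0P", lexicon))  # → STOP
```

Here the OCR confusion of "O" with "0" costs one substitution, so "STOP" (distance 1) wins over "SHOP" (distance 2), which is how a lexicon can arbitrate between the two recognition outcomes.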
Finally, we propose a fine-grained learning technique that trains a more accurate and robust model by learning from datasets annotated at the word level. First, we train our technique within a unified framework. Afterward, the trained model is applied to the word-level annotated images to discover character samples, and the word-level annotations are exploited to reject false characters. Finally, the discovered character samples can be utilized as character-level annotations and integrated with the word-level annotations to train a stronger model. The rest of this paper is organized as follows: section 2 reviews prior work related to text spotting, recognition, and detection, and section 3 presents the proposed method and analyzes each module.
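The rejection step can be sketched as follows; the box format and the simple containment rule are our illustrative assumptions, not the authors' exact criterion:

```python
# Hedged sketch of false-character rejection: keep a discovered character
# box only if it lies inside some word-level annotation box.
# Boxes are (x1, y1, x2, y2); containment is an illustrative rule.

def inside(char_box, word_box):
    """True when char_box lies entirely within word_box."""
    cx1, cy1, cx2, cy2 = char_box
    wx1, wy1, wx2, wy2 = word_box
    return wx1 <= cx1 and wy1 <= cy1 and cx2 <= wx2 and cy2 <= wy2

def filter_characters(char_boxes, word_boxes):
    """Reject character candidates not covered by any word annotation;
    the survivors can serve as character-level pseudo-annotations."""
    return [c for c in char_boxes if any(inside(c, w) for w in word_boxes)]

words = [(0, 0, 100, 30)]
chars = [(5, 5, 20, 25),      # inside the word box -> kept
         (150, 5, 170, 25)]   # outside every word box -> rejected
print(filter_characters(chars, words))  # → [(5, 5, 20, 25)]
```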
2. LITERATURE SURVEY
2.1. Text detection in image
Text detection plays an eminent part in text spotting systems, and several techniques have been introduced to detect scene text [9]. Neumann and Matas [10] utilize edge boxes to create proposals and refine candidate boxes with the help of regression. Building on the single-shot detector (SSD) and on faster R-CNN [11] with modifications, later methods detect horizontal words. Recently, multi-oriented text detection has become an important topic.
Yao et al. [12] detected multi-oriented scene text with the help of semantic segmentation. Zhang et al. [13] introduced techniques that detect text segments and join them into text instances through link prediction and spatial relationships, whereas Tian et al. [14] regressed the text boxes through segmentation maps. Shi et al. [15] proposed to detect and link text segments to create
Int J Elec & Comp Eng ISSN: 2088-8708
Customized mask region based convolutional neural … (Ravikumar Hodikehosahally Channegowda)
the text boxes, whereas Lyu et al. [16] introduced rotation-sensitive regression for oriented scene text. In Dai et al. [17], a region proposal and feature enhancement network is used to obtain improved feature representations and text-shape instances for constructing bounding boxes. A pyramid region-of-interest (RoI) pooling attention is devised to extract features robust to text size, and the bounding-box network is utilized to extract curved text.
2.2. Text recognition in image
Shi et al. [18] decode a cropped image region into a character sequence for scene text recognition. Scene text recognition techniques can be divided into three branches: word-based techniques, Seq2Seq techniques, and character-based techniques. Character-based techniques [19] mostly localize individual characters, then detect and connect them into words. Jaderberg et al. [20] introduced a word-based technique that recognizes whole words directly, whereas Seq2Seq techniques treat text recognition as a sequence labeling problem. Shi et al. [21] utilized a recurrent neural network (RNN) and a convolutional neural network (CNN) to model image features and output the recognized sequence with the help of connectionist temporal classification (CTC), whereas Lee and Osindero [22] recognized scene text through an attention-based Seq2Seq model. The work of Shelhamer et al. [23] mainly demonstrates the complementarity of a character-segmentation module and a spatial-attention module, rather than introducing a novel spatial-attention module. Integrating spatial attention and character segmentation not only reduces the character-level annotation required for training but also yields a model robust to varied text shapes.
2.3. Text spotting in image
Prior text spotting techniques divide the spotting process into two phases: scene text detection is first utilized [24] to locate text instances, and a text recognizer is then applied to obtain the recognized text. Mask TextSpotter utilizes semantic segmentation and spatial attention to recognize arbitrary text shapes, managing irregular text instances by considering global and local textual data [7]. TextDragon describes a text shape with a sequence of quadrangles in order to manage arbitrary text shapes through RoI-slide, coupling the deep network with connectionist temporal classification in the text detector [25]; character-location labeling is not required. WacNet [26] shares a convolutional network between character- and word-level detection and recognition.
2.4. Object recognition in image
The development of deep learning (DL) has brought steady improvement to object detection and instance/semantic segmentation, and the detection and recognition of scene text have gained attention in the last few years. Our technique is motivated by these methods; specifically, it is inspired by the mask R-CNN instance segmentation method [8]. However, there are key differences from the mask branch introduced in mask R-CNN: our mask branch not only segments text regions but also estimates character maps and the probability of text sequences, which means our technique can be utilized to detect an instance sequence within the character maps rather than estimating an object mask only.
3. PROPOSED METHOD
This section presents the introduced method and then describes its main modules: a connected network for extracting image features, text detection in an image for identifying character and word instances, and text recognition in an image for processing the outcome of text detection through segmentation. Afterward, we describe a fine-grained learning method for training our technique.
3.1. The architecture of the proposed method
Figure 1 shows the architecture of our proposed method. First, the image is fed into a shared network, in which a backbone network extracts feature maps from the input image; these maps are shared by three subsequent branches through the region proposal network (RPN) and the region-of-interest alignment (RoI-align) operation. Afterward, text detection evaluates holistic-level semantics to classify and recognize character instances, refining the region proposals with the context-refinement network. Meanwhile, the image text mask evaluates pixel-level semantics for the detected instances to conduct segmentation. After that, the text mask is utilized to define the character mask, and the text shape is applied for primary recognition. Since blurred and smaller characters are prone to recognition failures, we feed the 2-D feature maps of the word instances into text recognition to evaluate sequence-level semantics for exact recognition. Lastly, the primary recognition outcome and the sequence-level recognition outputs are joined based on the edit distance between the outcomes and a given lexicon.
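The edit-distance fusion step can be sketched in a few lines of Python. The `fuse` rule shown here (map each recognition outcome to its nearest lexicon word and keep the closer one) is an illustrative assumption; the paper only specifies that the edit distance between the outcomes and the lexicon drives the choice:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def fuse(primary: str, seq_out: str, lexicon: list) -> str:
    """Map each recognition outcome to its nearest lexicon word and
    keep the outcome whose nearest word is closer (ties favor the
    sequence-level output)."""
    best = lambda s: min(lexicon, key=lambda word: edit_distance(s, word))
    p, q = best(primary), best(seq_out)
    return p if edit_distance(primary, p) < edit_distance(seq_out, q) else q

word = fuse("h0tel", "hotei", ["hotel", "motel", "house"])
```

Both noisy outcomes map to the same lexicon entry here, so the fused result is "hotel" either way.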
Int J Elec & Comp Eng, Vol. 13, No. 1, February 2023: 413-424
Figure 1. Architecture of the proposed work
3.2. Connected network
This network contains the region proposal network (RPN) and the backbone network. Image features are extracted by the shared backbone and passed to the subsequent networks. Feature pyramid networks (FPNs) utilize a top-down architecture to construct high-level feature maps at all scales from a single input image at marginal cost. Hence, we utilize an FPN with a ResNet backbone.
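The FPN top-down pathway can be sketched as follows. This NumPy sketch replaces the 1x1 lateral convolutions with already channel-aligned maps and uses nearest-neighbor upsampling, so it only illustrates how coarse, semantically strong maps are merged into finer levels:

```python
import numpy as np

def topdown_merge(coarse, lateral):
    """Merge one FPN level: upsample the coarser map 2x (nearest
    neighbor) and add the lateral feature map of the finer level."""
    up = coarse.repeat(2, axis=-2).repeat(2, axis=-1)  # (C, 2H, 2W)
    return up + lateral

# Three backbone levels with channels already unified (C=4), as the
# 1x1 lateral convolutions would produce in a real FPN.
c5 = np.ones((4, 8, 8))
c4 = np.ones((4, 16, 16))
c3 = np.ones((4, 32, 32))
p5 = c5
p4 = topdown_merge(p5, c4)   # (4, 16, 16)
p3 = topdown_merge(p4, c3)   # (4, 32, 32)
```

In the real network each merged map would additionally pass through a 3x3 convolution before being used by the RPN.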
The RPN is utilized to create regions of various sizes and aspect ratios for the image mask, the subsequent R-CNN, and the mask R-CNN branches. In our method, we implement the RPN to construct regions for the three subsequent modules, allocating anchors to different stages based on their sizes; various aspect ratios are utilized in every single stage. RoI pooling quantizes floating-point coordinates onto discrete feature maps, which leads to a misalignment issue. We therefore apply the RoI-align operation, which computes image features at floating-point coordinates with bilinear interpolation, and thus delivers accurate region alignment and beneficial features to the subsequent modules.
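The bilinear interpolation that lets RoI-align evaluate features at floating-point coordinates can be sketched as follows (one sampling point on a single-channel map; a full RoI-align would average several such samples per output bin):

```python
import numpy as np

def bilinear_sample(fmap, y, x):
    """Sample a 2-D feature map at a floating-point (y, x) using
    bilinear interpolation, as RoI-align does for each sample point."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1 = min(y0 + 1, fmap.shape[0] - 1)
    x1 = min(x0 + 1, fmap.shape[1] - 1)
    dy, dx = y - y0, x - x0
    top = fmap[y0, x0] * (1 - dx) + fmap[y0, x1] * dx
    bot = fmap[y1, x0] * (1 - dx) + fmap[y1, x1] * dx
    return top * (1 - dy) + bot * dy

fmap = np.arange(16, dtype=float).reshape(4, 4)
val = bilinear_sample(fmap, 1.5, 2.5)   # midpoint of four neighbors
```

Unlike RoI pooling, no coordinate is rounded to an integer grid cell, which is what removes the misalignment discussed above.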
3.3. Text detection in image
The detection branch refines the region proposals constructed by the RPN, estimating their classes and performing bounding-box regression to adjust their coordinates. This process can be expressed as evaluating holistic-level semantics for the region proposals, and it usually requires estimating two classes, text and non-text. As in prior work [27], our text detection recognizes both word and character instances. In addition to text and non-text, we utilize character data to train the detection module, which helps learn discriminative representations and improves detection performance. Based on the spatial relationship between character and word instances, the outcome of character detection can be utilized for text detection; this is equivalent to implementing a character-based text detection technique while conducting text detection.
However, if proposals overlap with a true word, existing refinement techniques may suffer refinement failure, since the proposals do not contain sufficient data for perceiving the holistic object. Furthermore, the OCR module is sensitive to the outcome of text localization. Therefore, we present a context-refinement module to augment the existing refinement in text detection. In context refinement, context data gathered from the surrounding regions are added into a unified context representation to improve detection performance.
The text detection loss function is defined as (1),

$L_{det} = L_{cls}(p, u) + \lambda [u \geq 1] L_{loc}(t^{u}, v)$ (1)

where the log loss is $L_{cls}(p, u) = -\log p_u$ for the true class $u$, and $p_u$ is the estimated confidence score for class $u$; $v$ is the tuple of true bounding-box regression targets for class $u$, $t^{u}$ is the estimated tuple, and $\lambda$ weights the localization term, which is active only for text classes ($u \geq 1$).
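A numerical sketch of (1) is given below; the smooth-L1 form of $L_{loc}$ is an assumption carried over from the standard fast R-CNN loss, since the paper does not spell it out here:

```python
import numpy as np

def smooth_l1(d):
    """Smooth-L1 regression loss summed over box coordinates."""
    d = np.abs(d)
    return np.where(d < 1, 0.5 * d ** 2, d - 0.5).sum()

def detection_loss(p, u, t_u, v, lam=1.0):
    """L_det = L_cls(p, u) + lam * [u >= 1] * L_loc(t_u, v):
    log loss on the true class plus box regression, which is
    switched off for the background class (u == 0)."""
    l_cls = -np.log(p[u])
    l_loc = smooth_l1(np.asarray(t_u) - np.asarray(v)) if u >= 1 else 0.0
    return l_cls + lam * l_loc

# class 1 ("text") with confidence 0.8 and a small box offset error
loss = detection_loss(np.array([0.2, 0.8]), 1,
                      [0.1, 0.1, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0])
```

For a background proposal ($u = 0$) only the classification term contributes, exactly as the indicator $[u \geq 1]$ dictates.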
candidate character $c$ with the word-level ground truth $h$, and the threshold $\tau$ is used to choose positive character samples. The confidence-score threshold $\tau$ can be set low because of the constraints given by the word-level annotations, which is also useful for preserving diverse character samples. Lastly, the recognized positive character samples $S$ can be used as character-level annotations and integrated with the word-level annotations $W$ to train an accurate and robust text spotting model.
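The character-sample selection described above can be sketched as follows; the exact matching rule is not fully specified here, so this sketch assumes a candidate is kept when its confidence reaches the threshold and its box center falls inside some word-level ground-truth box:

```python
def inside(box, x, y):
    """Point-in-box test for an axis-aligned box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return x0 <= x <= x1 and y0 <= y <= y1

def select_characters(candidates, word_boxes, tau=0.3):
    """Keep candidate characters (box, score, char) whose score is at
    least tau and whose center lies in a word-level GT box; the word
    constraint is what lets tau stay low without admitting false
    characters."""
    kept = []
    for (x0, y0, x1, y1), score, ch in candidates:
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        if score >= tau and any(inside(wb, cx, cy) for wb in word_boxes):
            kept.append(((x0, y0, x1, y1), score, ch))
    return kept

words = [(0, 0, 100, 30)]
cands = [((10, 5, 20, 25), 0.9, "h"),    # confident, inside a word
         ((12, 6, 22, 26), 0.35, "b"),   # weak but word-constrained
         ((200, 5, 210, 25), 0.9, "x")]  # confident but outside
kept = select_characters(cands, words)
```

The kept samples would then serve as character-level annotations alongside the word-level ones.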
4. RESULTS AND DISCUSSION
In this section, we discuss the results and analysis of our proposed approach, evaluated on the popular benchmark datasets ICDAR 2013 [29] and ICDAR 2015 [30], and compare the proposed model with state-of-the-art methods. The experiments were performed on an Intel i5 7th-generation processor with 6 GB RAM, a 1 TB solid-state drive, and an NVIDIA GTX 2080Ti GPU on the PyTorch platform. The network is optimized by stochastic gradient descent, and images were resized to 640x640 with a batch size of 12. The initial learning rate is $10^{-3}$ and is reduced to $10^{-4}$ after 100 epochs.
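The optimization schedule above reduces to a simple step rule; this sketch is a plain-Python stand-in for the SGD optimizer and scheduler objects PyTorch provides:

```python
def learning_rate(epoch, base_lr=1e-3, reduced_lr=1e-4, drop_epoch=100):
    """Step schedule used in the experiments: the base learning rate
    for the first drop_epoch epochs, the reduced rate afterwards."""
    return base_lr if epoch < drop_epoch else reduced_lr

def sgd_step(params, grads, lr):
    """One vanilla stochastic-gradient-descent update."""
    return [p - lr * g for p, g in zip(params, grads)]

lrs = [learning_rate(e) for e in (0, 99, 100, 150)]
params = sgd_step([1.0, 2.0], [10.0, -10.0], lrs[0])
```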
4.1. ICDAR dataset, detection, and recognition protocol
Two datasets are used: ICDAR 2013 and ICDAR 2015. ICDAR 2013 contains 229 training images (849 words) and 233 testing images (1,095 words); its texts are horizontal and annotated with word-level rectangles. ICDAR 2015 was released for the ICDAR 2015 challenge "Robust reading competition for incidental scene text detection" and contains 1,000 training images (11,886 words) and 500 testing images (5,230 words); its text areas are annotated by four quadrangle vertices. Here, we analyze the evaluation protocols for detection and recognition: text detection is evaluated by the ICDAR protocol, and text recognition is generally evaluated using the end-to-end recognition procedure or word recognition accuracy.
In order to find the best possible match $m(r, R)$ for a rectangle $r$ in the rectangle set $R$, we can write (7),

$m(r, R) = \max\{\, m_p(r, r') \mid r' \in R \,\}$ (7)

where $m_p$ denotes the match between two text rectangles, computed as the intersection area divided by the area of the minimum bounding box containing both rectangles. The evaluation parameters precision, recall, and F-score can then be computed as (8)-(10),

$\mathrm{Precision}(P) = \frac{\sum_{r_z \in Z} m(r_z, Y)}{|Z|}$ (8)

$\mathrm{Recall}(R) = \frac{\sum_{r_y \in Y} m(r_y, Z)}{|Y|}$ (9)

$F\text{-score} = \frac{1}{\frac{w}{P} + \frac{1 - w}{R}}$ (10)

where $Z$ and $Y$ represent the set of estimated rectangles and the ground-truth set, $r_z$ and $r_y$ represent an estimated rectangle and a ground-truth rectangle, and $w$ is a weight parameter. Text recognition performance is usually measured by the accuracy of word recognition. The word recognition accuracy (WRA) is simply defined as the percentage of recognized words that are correct, as in (11),

$\mathrm{WRA} = \frac{\text{Number of words correctly recognized}}{\text{Number of ground truth words}}$ (11)
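Equations (7)-(11) can be implemented in a few lines. The sketch below follows the protocol described above for axis-aligned rectangles (x0, y0, x1, y1); the helper names are illustrative:

```python
def match(r, s):
    """m_p: intersection area over the area of the minimum bounding
    box containing both axis-aligned rectangles."""
    ix = max(0, min(r[2], s[2]) - max(r[0], s[0]))
    iy = max(0, min(r[3], s[3]) - max(r[1], s[1]))
    bx = max(r[2], s[2]) - min(r[0], s[0])
    by = max(r[3], s[3]) - min(r[1], s[1])
    return (ix * iy) / (bx * by)

def best_match(r, rects):
    """Equation (7): best match of r against a rectangle set."""
    return max(match(r, s) for s in rects)

def detection_scores(est, gt, w=0.5):
    """Equations (8)-(10) for estimated and ground-truth rectangles."""
    p = sum(best_match(r, gt) for r in est) / len(est)
    r = sum(best_match(g, est) for g in gt) / len(gt)
    f = 1.0 / (w / p + (1 - w) / r)
    return p, r, f

def wra(recognized, gt_words):
    """Word recognition accuracy, equation (11)."""
    correct = sum(a == b for a, b in zip(recognized, gt_words))
    return correct / len(gt_words)

p, r, f = detection_scores([(0, 0, 10, 10)],
                           [(0, 0, 10, 10), (20, 0, 30, 10)])
acc = wra(["hotel", "taxi"], ["hotel", "cafe"])
```

With one perfect detection out of two ground-truth words, precision is 1.0, recall is 0.5, and the F-score sits between them.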
4.2. Proposed model comparison with state-of-art techniques
Our proposed model has been applied to sample images, and Figure 4 shows sample texts detected and recognized from the ICDAR 2013 dataset. Table 1 shows the recall, precision, and F-score comparison with state-of-the-art techniques on the ICDAR 2013 dataset, which comprises a large amount of text data stretching across entire images. As the table shows, our model performs considerably well on all evaluation parameters.
Figure 4. Sample of texts detected from ICDAR 2013 dataset
Table 1. Recall, precision, and F-Score comparison with state-of-art techniques at ICDAR 2013 dataset
Method Recall Precision F-Score
Textboxes++ [3] 74 88 81
Mask TextSpotter [7] 88.1 94.1 91
Connectionist text proposal network [14] 83 93 88
SegLink [15] 83 87.7 85.3
Lyu et al. [16] 79.4 93.3 85.8
Texboxes [24] 74 88 81
Fast oriented text spotting [27] 86.96
Fast oriented text spotting-Recg [27] 88.23
Deep direct regression [31] 81 96 86
Rotation region proposal networks [32] 72 90 80
PixelLink [33] 83.6 86.4 84.5
Border ResNet-50 [34] 86.9 87.8 87.4
Border DenseNet-121 [35] 87.1 91.5 89.2
Markov clustering network [34] 87 88 88
Rotation-sensitive regression detector [36] 75 88 81
He et al. [37] 89 95 91
HAM-ResNet50 [38] 81.28 89.62 85.25
HAM-ResNet50+IRB [38] 82.55 92.3 87.17
HAM-DenseNet-169 [38] 83.47 94.42 88.6
HAM-DenseNet-169+IRB [38] 83.56 94.52 88.7
Proposed method 90.56 97.83 94.05
The proposed model obtained recall, precision, and F-score of 90.56, 97.83, and 94.05, respectively, which are 1.56%, 2.83%, and 3.05% higher than He et al. [37]. Figure 5 shows sample texts detected from the ICDAR 2015 dataset, whose samples include 1,000 training images and 500 testing images. Table 2 shows the recall, precision, and F-score comparison with state-of-the-art techniques on the ICDAR 2015 dataset. The hidden anchor mechanism (HAM)-DenseNet-169 with iterative regression box (IRB) [38] achieved an F-score of 89.21%, with a precision of 90.69% and a recall of 87.77%. Our proposed method achieved an F-score of 89.61%, outperforming HAM-DenseNet-169+IRB [38] by 0.4% in terms of F-score. In Table 3, we show the recognition comparison with state-of-the-art techniques: our proposed method achieved 95.3% recognition accuracy on the ICDAR 2013 dataset and 81.1% on the ICDAR 2015 dataset, which is better than previous techniques.
Figure 5. Sample of texts detected from ICDAR 2015 dataset
Table 2. Recall, precision, and F-score comparison with state-of-the-art techniques at ICDAR 2015 dataset
Method Recall Precision F-Score
Textboxes++ [3] 78.5 87.8 82.9
TextSnake [6] 84.9 80.4 82.6
Mask TextSpotter [7] 81 91.6 86
SegLink [15] 76.8 73.1 75
Lyu et al. [16] 79.76 89.5 84.3
Fast oriented text spotting [27] 82.04 88.84 85.31
Fast oriented text spotting-Recg [27] 85.17 91 87.99
Deep direct regression [31] 80 88 83.8
Rotation region proposal networks [32] 77 84 80
PixelLink [33] 82 85.5 83.7
Markov clustering network [34] 80 72 76
Rotation-sensitive regression detector [36] 79 85.6 82.2
He et al. [37] 80 85 82
HAM-ResNet50 [38] 85.84 89.82 87.79
HAM-ResNet50+IRB [38] 85.8 89.9 87.8
HAM-DenseNet-169+IRB [38] 87.77 90.69 89.21
Instance transformation network [39] 74.1 85.7 79.5
Efficient and accurate scene text (EAST) [40] 78.3 83.3 80.7
Fused text segmentation networks [41] 80 88.6 84.1
IncepText [42] 84.3 89.4 86.8
Tian et al. [43] 85 88.3 86.6
Look more than once (LOMO) [44] 83.5 91.3 87.2
Proposed method 86.82 92.58 89.61
Table 3. Recognition performance comparison
Methods ICDAR-13 ICDAR-15
[19] 87.6
[21] 89.6
[22] 90.0
[45] 92.8 80.0
[46] 81.8
[47] 88.6
[48] 90.8
[49] 93.3 70.6
[50] 85.2
[51] 91.1 60.0
[52] 92.9
[53] 68.2
[54] 94.0
[55] 94.4 73.9
[56] 91.5
[57] 92.4 68.8
[58] 89.7 68.9
[59] 91.0 69.2
[60] 91.3 76.9
[61] 93.9 78.7
[62] 91.9 74.2
[63] 92.9 75.5
[64] 76.1
Proposed method 95.3 81.1
5. CONCLUSION
The demand for scene text detection and recognition has received growing attention in various fields due to its potential. In this paper, we proposed an optimized detection and recognition approach: text detection in an image is performed to identify character and word instances, and text recognition in an image processes the outcome of text detection through segmentation. In addition, we described a fine-grained learning method to achieve an optimized outcome. For evaluation, the popular benchmark datasets ICDAR 2013 and ICDAR 2015 are considered, and the proposed model is compared with state-of-the-art methods based on recall, precision, F-score, and recognition performance. The proposed model performs better than the state-of-the-art techniques. In future work, we will apply our detection and recognition framework to video datasets.
ACKNOWLEDGEMENTS
We gratefully thank Visvesvaraya Technological University, Jnana Sangama, Belagavi, for the financial support extended to this research work.
REFERENCES
[1] X. Bai, M. Yang, P. Lyu, Y. Xu, and J. Luo, βIntegrating scene text and visual appearance for fine-grained image classification,β
IEEE Access, vol. 6, pp. 66322β66335, 2018, doi: 10.1109/ACCESS.2018.2878899.
[2] M. Busta, L. Neumann, and J. Matas, βDeep TextSpotter: an end-to-end trainable scene text localization and recognition
framework,β in 2017 IEEE International Conference on Computer Vision (ICCV), Oct. 2017, pp. 2223β2231, doi:
10.1109/ICCV.2017.242.
[3] M. Liao, B. Shi, and X. Bai, βTextBoxes++: a single-shot oriented scene text detector,β IEEE Transactions on Image Processing,
vol. 27, no. 8, pp. 3676β3690, Aug. 2018, doi: 10.1109/TIP.2018.2825107.
[4] T. He, Z. Tian, W. Huang, C. Shen, Y. Qiao, and C. Sun, βAn end-to-end textspotter with explicit alignment and attention,β in
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2018, pp. 5020β5029, doi:
10.1109/CVPR.2018.00527.
[5] H. Li, P. Wang, and C. Shen, βTowards end-to-end text spotting with convolutional recurrent neural networks,β in 2017 IEEE
International Conference on Computer Vision (ICCV), Oct. 2017, pp. 5248β5256, doi: 10.1109/ICCV.2017.560.
[6] S. Long, J. Ruan, W. Zhang, X. He, W. Wu, and C. Yao, βTextSnake: a flexible representation for detecting text of arbitrary
shapes,β in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in
Bioinformatics), vol. 11206, 2018, pp. 19β35.
[7] P. Lyu, M. Liao, C. Yao, W. Wu, and X. Bai, βMask TextSpotter: an end-to-end trainable neural network for spotting text with
arbitrary shapes,β in Computer Vision ECCV 2018, Springer International Publishing, 2018, pp. 71β88.
[8] K. He, G. Gkioxari, P. Dollar, and R. Girshick, βMask R-CNN,β in 2017 IEEE International Conference on Computer Vision
(ICCV), Oct. 2017, pp. 2980β2988, doi: 10.1109/ICCV.2017.322.
[9] Y. Zhu, C. Yao, and X. Bai, βScene text detection and recognition: recent advances and future trends,β Frontiers of Computer
Science, vol. 10, no. 1, pp. 19β36, Feb. 2016, doi: 10.1007/s11704-015-4488-0.
[10] L. Neumann and J. Matas, βReal-time scene text localization and recognition,β in 2012 IEEE Conference on Computer Vision and
Pattern Recognition, Jun. 2012, pp. 3538β3545, doi: 10.1109/CVPR.2012.6248097.
[11] S. Ren, K. He, R. Girshick, and J. Sun, βFaster R-CNN: towards real-time object detection with region proposal networks,β IEEE
Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137β1149, Jun. 2017, doi:
10.1109/TPAMI.2016.2577031.
[12] C. Yao, X. Bai, N. Sang, X. Zhou, S. Zhou, and Z. Cao, βScene text detection via holistic, multi-channel prediction,β CoRR
abs/1606.09002, Jun. 2016.
[13] Z. Zhang, C. Zhang, W. Shen, C. Yao, W. Liu, and X. Bai, βMulti-oriented text detection with fully convolutional networks,β in
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2016, pp. 4159β4167, doi:
10.1109/CVPR.2016.451.
[14] Z. Tian, W. Huang, T. He, P. He, and Y. Qiao, βDetecting text in natural image with connectionist text proposal network,β in
Computer Vision ECCV 2016, Springer International Publishing, 2016, pp. 56β72, doi: 10.1007/978-3-319-46484-8_4.
[15] B. Shi, X. Bai, and S. Belongie, βDetecting oriented text in natural images by linking segments,β in 2017 IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), Jul. 2017, pp. 3482β3490, doi: 10.1109/CVPR.2017.371.
[16] P. Lyu, C. Yao, W. Wu, S. Yan, and X. Bai, βMulti-oriented scene text detection via corner localization and region
segmentation,β in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2018, pp. 7553β7563, doi:
10.1109/CVPR.2018.00788.
[17] P. Dai, H. Zhang, and X. Cao, βDeep multi-scale context aware feature aggregation for curved scene text detection,β IEEE
Transactions on Multimedia, vol. 22, no. 8, pp. 1969β1984, Aug. 2020, doi: 10.1109/TMM.2019.2952978.
[18] B. Shi, M. Yang, X. Wang, P. Lyu, C. Yao, and X. Bai, βASTER: an attentional scene text recognizer with flexible rectification,β
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 9, pp. 2035β2048, Sep. 2019, doi:
10.1109/TPAMI.2018.2848939.
[19] A. Bissacco, M. Cummins, Y. Netzer, and H. Neven, βPhotoOCR: reading text in uncontrolled conditions,β in 2013 IEEE
International Conference on Computer Vision, Dec. 2013, pp. 785β792, doi: 10.1109/ICCV.2013.102.
[20] M. Jaderberg, K. Simonyan, A. Vedaldi, and A. Zisserman, βSynthetic data and artificial neural networks for natural scene text
recognition,β arXiv:1406.2227, 2014, doi: 10.48550/arXiv.1406.2227.
[21] B. Shi, X. Bai, and C. Yao, βAn end-to-end trainable neural network for image-based sequence recognition and its application to
scene text recognition,β IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 11, pp. 2298β2304, Nov.
2017, doi: 10.1109/TPAMI.2016.2646371.
[22] C.-Y. Lee and S. Osindero, βRecursive recurrent nets with attention modeling for OCR in the wild,β in 2016 IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), Jun. 2016, pp. 2231β2239, doi: 10.1109/CVPR.2016.245.
[23] E. Shelhamer, J. Long, and T. Darrell, βFully convolutional networks for semantic segmentation,β IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. 39, no. 4, pp. 640β651, Apr. 2017, doi: 10.1109/TPAMI.2016.2572683.
[24] M. Liao, B. Shi, X. Bai, X. Wang, and W. Liu, βTextBoxes: a fast text detector with a single deep neural network,β Proceedings
of the AAAI Conference on Artificial Intelligence, vol. 31, no. 1, Feb. 2017, doi: 10.1609/aaai.v31i1.11196.
[25] W. Feng, W. He, F. Yin, X.-Y. Zhang, and C.-L. Liu, βTextDragon: an end-to-end framework for arbitrary shaped text spotting,β
in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Oct. 2019, pp. 9075β9084, doi:
10.1109/ICCV.2019.00917.
[26] Y. Gao, Z. Huang, Y. Dai, K. Chen, J. Guo, and W. Qiu, βWacnet: word segmentation guided characters aggregation net for scene
text spotting with arbitrary shapes,β in 2019 IEEE International Conference on Image Processing (ICIP), Sep. 2019,
pp. 3382β3386, doi: 10.1109/ICIP.2019.8803529.
[27] X. Liu, D. Liang, S. Yan, D. Chen, Y. Qiao, and J. Yan, βFOTS: fast oriented text spotting with a unified network,β in 2018
IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2018, pp. 5676β5685, doi: 10.1109/CVPR.2018.00595.
[28] J. Chorowski, D. Bahdanau, D. Serdyuk, K. Cho, and Y. Bengio, βAttention-based models for speech recognition,β in
arXiv:1506.07503v1, Jun. 2015, doi: 10.48550/arXiv.1506.07503.
[29] D. Karatzas et al., βICDAR 2013 robust reading competition,β in 2013 12th International Conference on Document Analysis and
Recognition, Aug. 2013, pp. 1484β1493, doi: 10.1109/ICDAR.2013.221.
[30] D. Karatzas et al., βICDAR 2015 competition on robust reading,β in 2015 13th International Conference on Document Analysis
and Recognition (ICDAR), Aug. 2015, pp. 1156β1160, doi: 10.1109/ICDAR.2015.7333942.
[31] W. He, X.-Y. Zhang, F. Yin, and C.-L. Liu, βDeep direct regression for multi-oriented scene text detection,β in 2017 IEEE
International Conference on Computer Vision (ICCV), Oct. 2017, pp. 745β753, doi: 10.1109/ICCV.2017.87.
[32] J. Ma et al., βArbitrary-oriented scene text detection via rotation proposals,β IEEE Transactions on Multimedia, vol. 20, no. 11,
pp. 3111β3122, Nov. 2018, doi: 10.1109/TMM.2018.2818020.
[33] D. Deng, H. Liu, X. Li, and D. Cai, βPixelLink: detecting scene text via instance segmentation,β Proceedings of the AAAI
Conference on Artificial Intelligence, vol. 32, no. 1, Apr. 2018, doi: 10.1609/aaai.v32i1.12269.
[34] Z. Liu, G. Lin, S. Yang, J. Feng, W. Lin, and W. L. Goh, βLearning Markov clustering networks for scene text detection,β in 2018
IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2018, pp. 6936β6944, doi: 10.1109/CVPR.2018.00725.
[35] C. Xue, S. Lu, and F. Zhan, βAccurate scene text detection through border semantics awareness and bootstrapping,β in Computer
Vision ECCV 2018, Springer International Publishing, 2018, pp. 370β387.
[36] M. Liao, Z. Zhu, B. Shi, G. Xia, and X. Bai, βRotation-sensitive regression for oriented scene text detection,β in 2018 IEEE/CVF
Conference on Computer Vision and Pattern Recognition, Jun. 2018, pp. 5909β5918, doi: 10.1109/CVPR.2018.00619.
[37] W. He, X.-Y. Zhang, F. Yin, and C.-L. Liu, βMulti-oriented and multi-lingual scene text detection with direct regression,β IEEE
Transactions on Image Processing, vol. 27, no. 11, pp. 5406β5419, Nov. 2018, doi: 10.1109/TIP.2018.2855399.
[38] J.-B. Hou et al., βHAM: hidden anchor mechanism for scene text detection,β IEEE Transactions on Image Processing, vol. 29,
pp. 7904β7916, 2020, doi: 10.1109/TIP.2020.3008863.
[39] F. Wang, L. Zhao, X. Li, X. Wang, and D. Tao, βGeometry-aware scene text detection with instance transformation network,β in
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2018, pp. 1381β1389, doi:
10.1109/CVPR.2018.00150.
[40] X. Zhou et al., βEAST: an efficient and accurate scene text detector,β in 2017 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), Jul. 2017, pp. 2642β2651, doi: 10.1109/CVPR.2017.283.
[41] Y. Dai et al., βFused text segmentation networks for multi-oriented scene text detection,β in 2018 24th International Conference
on Pattern Recognition (ICPR), Aug. 2018, pp. 3604β3609, doi: 10.1109/ICPR.2018.8546066.
[42] Q. Yang, M. Cheng, W. Zhou, Y. Chen, M. Qiu, and W. Lin, βIncepText: a new inception-text module with deformable PSROI
pooling for multi-oriented scene text detection,β in Proceedings of the Twenty-Seventh International Joint Conference on
Artificial Intelligence, Jul. 2018, pp. 1071β1077, doi: 10.24963/ijcai.2018/149.
[43] Z. Tian et al., βLearning shape-aware embedding for scene text detection,β in 2019 IEEE/CVF Conference on Computer Vision
and Pattern Recognition (CVPR), Jun. 2019, pp. 4229β4238, doi: 10.1109/CVPR.2019.00436.
[44] C. Zhang et al., βLook more than once: an accurate detector for text of arbitrary shapes,β in 2019 IEEE/CVF Conference on
Computer Vision and Pattern Recognition (CVPR), Jun. 2019, pp. 10544β10553, doi: 10.1109/CVPR.2019.01080.
[45] Z. Qiao, Y. Zhou, D. Yang, Y. Zhou, and W. Wang, βSEED: semantics enhanced encoder-decoder framework for scene text
recognition,β in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2020, pp. 13525β13534,
doi: 10.1109/CVPR42600.2020.01354.
[46] M. Jaderberg, K. Simonyan, A. Vedaldi, and A. Zisserman, βDeep structured output learning for unconstrained text recognition,β
Computer Vision and Pattern Recognition, 2014, doi: 10.48550/arXiv.1412.5903.
[47] B. Shi, X. Wang, P. Lyu, C. Yao, and X. Bai, βRobust scene text recognition with automatic rectification,β in 2016 IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2016, pp. 4168β4176, doi: 10.1109/CVPR.2016.452.
[48] M. Jaderberg, K. Simonyan, A. Vedaldi, and A. Zisserman, βReading text in the wild with convolutional neural networks,β
International Journal of Computer Vision, vol. 116, no. 1, pp. 1β20, Jan. 2016, doi: 10.1007/s11263-015-0823-z.
[49] Z. Cheng, F. Bai, Y. Xu, G. Zheng, S. Pu, and S. Zhou, βFocusing attention: towards accurate text recognition in natural images,β
in 2017 IEEE International Conference on Computer Vision (ICCV), Oct. 2017, pp. 5086β5094, doi: 10.1109/ICCV.2017.543.
[50] F. Yin, Y.-C. Wu, X.-Y. Zhang, and C.-L. Liu, βScene text recognition with sliding convolutional character models,β Computer
Vision and Pattern Recognition, Sep. 2017.
[51] W. Liu, C. Chen, and K.-Y. Wong, βChar-Net: a character-aware neural network for distorted scene text recognition,β
Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, no. 1, Apr. 2018, doi: 10.1609/aaai.v32i1.12246.
[52] Z. Liu, Y. Li, F. Ren, W. L. Goh, and H. Yu, βSqueezedText: a real-time scene text recognition by binary convolutional encoder-
decoder network,β Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, no. 1, Apr. 2018, doi:
10.1609/aaai.v32i1.12252.
[53] Z. Cheng, Y. Xu, F. Bai, Y. Niu, S. Pu, and S. Zhou, βAON: towards arbitrarily-oriented text recognition,β in 2018 IEEE/CVF
Conference on Computer Vision and Pattern Recognition, Jun. 2018, pp. 5571β5579, doi: 10.1109/CVPR.2018.00584.
[54] Y. Liu, Z. Wang, H. Jin, and I. Wassell, βSynthetically supervised feature learning for scene text recognition,β in Computer Vision
ECCV 2018, Springer International Publishing, 2018, pp. 449β465.
[55] F. Bai, Z. Cheng, Y. Niu, S. Pu, and S. Zhou, βEdit probability for scene text recognition,β in 2018 IEEE/CVF Conference on
Computer Vision and Pattern Recognition, Jun. 2018, pp. 1508β1516, doi: 10.1109/CVPR.2018.00163.
[56] M. Liao et al., βScene text recognition from two-dimensional perspective,β Proceedings of the AAAI Conference on Artificial
Intelligence, vol. 33, pp. 8714β8721, Jul. 2019, doi: 10.1609/aaai.v33i01.33018714.
[57] C. Luo, L. Jin, and Z. Sun, βMORAN: A multi-object rectified attention network for scene text recognition,β Pattern Recognition,
vol. 90, pp. 109–118, Jun. 2019, doi: 10.1016/j.patcog.2019.01.020.
[58] Z. Xie, Y. Huang, Y. Zhu, L. Jin, Y. Liu, and L. Xie, "Aggregation cross-entropy for sequence recognition," in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2019, pp. 6531–6540, doi: 10.1109/CVPR.2019.00670.
[59] H. Li, P. Wang, C. Shen, and G. Zhang, "Show, attend and read: a simple and strong baseline for irregular text recognition," Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 8610–8617, Jul. 2019, doi: 10.1609/aaai.v33i01.33018610.
[60] F. Zhan and S. Lu, "ESIR: end-to-end scene text recognition via iterative image rectification," in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2019, pp. 2054–2063, doi: 10.1109/CVPR.2019.00216.
[61] M. Yang et al., "Symmetry-constrained rectification network for scene text recognition," in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Oct. 2019, pp. 9146–9155, doi: 10.1109/ICCV.2019.00924.
[62] C. Wang and C.-L. Liu, "Scene text recognition by attention network with gated embedding," in 2020 International Joint Conference on Neural Networks (IJCNN), Jul. 2020, pp. 1–8, doi: 10.1109/IJCNN48605.2020.9206802.
[63] X. Chen, T. Wang, Y. Zhu, L. Jin, and C. Luo, "Adaptive embedding gate for attention-based scene text recognition," Neurocomputing, vol. 381, pp. 261–271, Mar. 2020, doi: 10.1016/j.neucom.2019.11.049.
[64] C. Luo, Y. Zhu, L. Jin, and Y. Wang, "Learn to augment: joint data augmentation and network optimization for text recognition," in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2020, pp. 13743–13752, doi: 10.1109/CVPR42600.2020.01376.
BIOGRAPHIES OF AUTHORS
Ravikumar Hodikehosahally Channegowda is a research scholar in the Department of Electronics and Communication Engineering at KSSEM, Bengaluru, affiliated to VTU, Belagavi, and is currently working as an assistant professor in the Department of ECE at Ghousia College of Engineering, Ramanagaram. He received his master's degree in VLSI design and embedded systems from the VTU Extension Centre, PESCE, Mandya. His areas of interest are image processing, machine learning, pattern recognition, and multimedia concepts. He can be contacted at raviec40@gmail.com.
Palani Karthik received his doctoral degree from Dr. MGR University in 2013 and his master's degree from Sathyabama University in 2006. He is currently working as a professor in the Department of Electronics and Communication Engineering at K S School of Engineering, Bengaluru. His current research interests are acoustic sensors, image processing, machine learning, smart grids, and mobile communication. He is actively involved in various professional bodies: he is a senior member of IEEE and a member of IEI, ISTE, IAENG, and ACM. He can be contacted at karthik.p@kssem.edu.in.
Raghavendra Srinivasaiah is currently working as an Associate Professor in the Department of Computer Science and Engineering at Christ Deemed to be University, Bangalore. He completed his Ph.D. in Computer Science and Engineering at VTU, Belgaum, India in 2017 and has 17 years of teaching experience. His interests include data mining, artificial intelligence, and big data. He can be contacted at raghav.trg@gmail.com.
Mahadev Shivaraj received the M.Tech. degree in VLSI Design and Embedded Systems from R.V.C.E., Bengaluru, under VTU, Belagavi, Karnataka, India in 2007, and the B.E. degree in Electronics and Communication Engineering from VTU, Belagavi, Karnataka, India in 2003. He is currently working as an assistant professor at Ghousia College of Engineering, Ramanagara. He has published two patents. His areas of research interest include image processing, hardware description languages, machine learning, and Python. He can be contacted at mahadevkhanderao@gmail.com.