This research tackles the challenge of manual data extraction from product labels by employing a blend of
computer vision and Natural Language Processing (NLP). We introduce an enhanced model that combines
Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) in a Convolutional
Recurrent Neural Network (CRNN) for reliable text recognition. Our model is further refined by
incorporating the Tesseract OCR engine, enhancing its applicability in Optical Character Recognition
(OCR) tasks. The methodology is augmented by NLP techniques and extended through the Open Food
Facts API (Application Programming Interface) for database population and text-only label prediction.
The CRNN model is trained on encoded labels and evaluated for accuracy on a dedicated test set.
Importantly, our approach enables visually impaired individuals to access essential information on
product labels, such as directions and ingredients. Overall, the study highlights the efficacy of deep
learning and OCR in automating label extraction and recognition.
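As a concrete illustration of the API step described above, the sketch below builds an Open Food Facts product-lookup URL and pulls the label fields a reading assistant would speak aloud. It is a minimal stdlib-only sketch: the helper names `build_product_url` and `extract_label_fields` are illustrative, not the paper's code, and a canned response stands in for a live network call.

```python
import urllib.parse

# Open Food Facts product-lookup endpoint (assumed v0 JSON API).
OFF_PRODUCT_URL = "https://world.openfoodfacts.org/api/v0/product/{}.json"

def build_product_url(barcode: str) -> str:
    """Build the Open Food Facts product-lookup URL for a barcode."""
    return OFF_PRODUCT_URL.format(urllib.parse.quote(barcode))

def extract_label_fields(api_response: dict) -> dict:
    """Pull the label fields a label-reading assistant would announce."""
    product = api_response.get("product", {})
    return {
        "name": product.get("product_name", ""),
        "ingredients": product.get("ingredients_text", ""),
    }

if __name__ == "__main__":
    # Offline demonstration with a canned response; a live call would pass
    # build_product_url(...) to urllib.request.urlopen instead.
    canned = {"product": {"product_name": "Rice Noodles",
                          "ingredients_text": "rice, water"}}
    print(build_product_url("737628064502"))
    print(extract_label_fields(canned))
```

A real pipeline would feed the OCR-recognized barcode or product name into this lookup and hand the returned fields to a text-to-speech stage.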
OPTICAL CHARACTER RECOGNITION IN HEALTHCARE (IRJET Journal)
This document discusses an optical character recognition (OCR) model for extracting text information from medical records using machine learning and deep learning. The proposed OCR model would speed up access to medical records and ensure data is available electronically with no errors. It would recognize characters in medical form images and convert paper records to electronic format. The document then reviews several related works on OCR, including methods using Tesseract, character recognition models using neural networks, and OCR systems for assisting the visually impaired. It concludes with a discussion of different feature extraction and machine learning methods used for text categorization and character recognition.
INVESTIGATING THE EFFECT OF BD-CRAFT TO TEXT DETECTION ALGORITHMS (ijaia)
With the rise and development of deep learning, computer vision and document analysis have influenced the
area of text detection. Despite significant efforts to improve text detection performance, the task remains
challenging, as evidenced by the series of Robust Reading Competitions. This study investigates the impact
of employing BD-CRAFT, a variant of CRAFT that automatically classifies images using a
Laplacian operator and further preprocesses the images classified as blurry with blind deconvolution, on the
top-ranked algorithms SenseTime and TextFuseNet. Results revealed that the proposed method
significantly enhanced the detection performance of both algorithms. TextFuseNet + BD-CRAFT
achieved an outstanding h-mean of 93.55% and improved its precision by over 4% to 95.71%,
while SenseTime + BD-CRAFT placed first with a remarkable 95.22% h-mean and likewise
exhibited a precision improvement of over 4%.
Investigating the Effect of BD-CRAFT to Text Detection Algorithms (gerogepatton)
The document summarizes a study that investigated the effect of applying BD-CRAFT, a blind deconvolution technique, to improve the performance of two state-of-the-art text detection algorithms, SenseTime and TextFuseNet. BD-CRAFT automatically classifies images as blurry or non-blurry using a Laplacian operator threshold, and applies blind deconvolution to deblur the blurry images. The study found that combining BD-CRAFT with SenseTime and TextFuseNet significantly improved their text detection performance on the ICDAR 2013 dataset, with TextFuseNet + BD-CRAFT achieving a 93.55% h-mean and SenseTime + BD-CRAFT a 95.22% h-mean.
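The blur-classification step that BD-CRAFT performs with a Laplacian operator can be sketched in a few lines: compute the variance of the Laplacian response over the image and compare it to a threshold, with low variance indicating blur. This is a pure-Python toy version on nested lists (a real pipeline would use OpenCV), and the threshold value is an illustrative assumption, not the paper's.

```python
def laplacian_variance(img):
    """Variance of the 4-neighbour Laplacian response of a grayscale image,
    given as a list of rows of pixel intensities."""
    h, w = len(img), len(img[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (img[y - 1][x] + img[y + 1][x] +
                   img[y][x - 1] + img[y][x + 1] - 4 * img[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)

def is_blurry(img, threshold=100.0):
    """Low Laplacian variance means few strong edges, i.e. a blurry image."""
    return laplacian_variance(img) < threshold

# A sharp checkerboard has strong edges; a flat image has none.
sharp = [[255 if (x + y) % 2 == 0 else 0 for x in range(8)] for y in range(8)]
flat = [[128] * 8 for _ in range(8)]
```

Images flagged blurry by this test would then be routed to the blind-deconvolution step before text detection.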
Because of rapid technological breakthroughs, including
multimedia and cell phones, Telugu character recognition (TCR) has recently
become a popular study area. It is still necessary to construct automated and
intelligent online TCR models, even though many studies have focused on offline
TCR models. The construction and validation of a Telugu character dataset using
an Inception- and ResNet-based model are presented. The 645
letters in the dataset comprise 18 Achus, 38 Hallus, 35 Othulu, 34×16
Guninthamulu, and 10 Ankelu. The proposed technique aims to efficiently
recognize and identify distinctive Telugu characters online. The model's main
pre-processing steps are normalization, smoothing, and interpolation, and
improved recognition performance is attained by using
stochastic gradient descent (SGD) to optimize the model's hyperparameters.
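The SGD optimization mentioned above boils down to repeated steps against the gradient. The toy sketch below applies plain gradient-descent updates to a one-parameter quadratic loss; it illustrates the update rule only, not the paper's actual model or hyperparameter search, and the learning rate and step count are illustrative choices.

```python
def sgd(grad, w0, lr=0.1, steps=100):
    """Plain (stochastic) gradient descent: repeatedly step against the
    gradient of the loss with a fixed learning rate."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# Minimise the toy loss (w - 3)^2, whose gradient is 2 * (w - 3);
# the minimiser is w = 3.
w_star = sgd(lambda w: 2 * (w - 3), w0=0.0)
```

In true SGD the gradient at each step is estimated from a random mini-batch of training samples rather than computed exactly; the update rule itself is unchanged.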
Product Label Reading System for visually challenged people (IRJET Journal)
1) The document proposes a camera-based assistive text reading system to help blind people read text labels on handheld objects. It uses computer vision techniques like stroke width transform to isolate the object of interest and detect the region of interest.
2) In the region of interest, the system performs text localization using gradient features and edge distributions. It then recognizes text using optical character recognition and outputs it verbally for the user.
3) The system aims to achieve robust text extraction and recognition from complex backgrounds while focusing on usability. It analyzes existing assistive technologies and proposes an improved workflow including image capture, processing, and audio output.
The document presents a novel pre-processing approach to improve optical character recognition (OCR) accuracy on document images. The approach uses a stack of enhancement techniques including illumination adjustment, grayscale conversion, sharpening, and binarization. Specifically, it applies contrast limited adaptive histogram equalization to adjust illumination, uses a luminance algorithm for grayscale optimization, employs unsharp masking to enhance text, and performs Otsu's method for binarization. The proposed method aims to compensate for distortions and improve OCR accuracy in a nonparametric and unsupervised manner. It is evaluated on standard datasets and shown to significantly improve text detection and OCR performance.
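Of the enhancement stack described, the final binarization step is easy to make concrete: Otsu's method picks the threshold that maximises the between-class variance of the grayscale histogram. Below is a minimal pure-Python sketch on a toy bimodal pixel sample (illustrative data, not the paper's); a real pipeline would run this on the output of the earlier enhancement stages.

```python
def otsu_threshold(pixels):
    """Otsu's method: choose the grayscale threshold that maximises the
    between-class variance of the foreground/background split."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(256))
    sum_bg, w_bg, best_t, best_var = 0.0, 0, 0, -1.0
    for t in range(256):
        w_bg += hist[t]            # pixels at or below the threshold
        if w_bg == 0:
            continue
        w_fg = total - w_bg        # pixels above the threshold
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Bimodal sample: dark text pixels around 40-45, bright paper around 200-210.
sample = [40] * 50 + [45] * 50 + [200] * 80 + [210] * 80
t = otsu_threshold(sample)
```

Pixels at or below the returned threshold become text (black) and the rest become background (white), which is the binary input OCR engines expect.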
This document summarizes and reviews various techniques for optical character recognition (OCR) of English text, including matrix matching, fuzzy logic, feature extraction, structural analysis, and neural networks. It discusses the structure and stages of OCR systems, including image preprocessing, segmentation, feature extraction, classification, and output. Challenges for OCR systems include degraded documents such as old books, photocopies, and newspapers. The document reviews several related works on OCR and discusses techniques for English and Indian languages, license plate recognition, document binarization, removing "bleed-through" effects from financial documents, and improving recognition of degraded text.
Correcting optical character recognition result via a novel approach (IJICTJOURNAL)
Optical character recognition (OCR) is a recognition system used to recognize the content of a scanned image. The system can give erroneous results, which necessitates post-processing for sentence correction. In this paper, we propose a new method for syntactic and semantic correction of sentences based on the frequency of pairs of correct words in the sentence and a recursive technique. The approach starts by computing the frequency of each pair of successive words in the corpora; the words with the greatest frequency build a correction center. We achieved 98% accuracy using our approach with the noisy channel, and 96% using the same corpus under the same conditions.
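The pair-frequency idea can be sketched as follows: count successive word pairs in a corpus, then replace a suspect OCR word with whichever candidate forms the most frequent pair with its predecessor. This is a toy reconstruction of the general approach on a three-sentence corpus, not the paper's exact recursive method; the candidate list would normally come from an edit-distance search.

```python
from collections import Counter

def train_bigrams(corpus_sentences):
    """Count successive word pairs (bigrams) in a toy corpus."""
    counts = Counter()
    for sentence in corpus_sentences:
        words = sentence.split()
        for a, b in zip(words, words[1:]):
            counts[(a, b)] += 1
    return counts

def correct_word(prev_word, ocr_word, candidates, bigrams):
    """Replace an OCR output word by the candidate whose bigram with the
    preceding word is most frequent in the corpus (the OCR word itself
    competes on equal terms, so correct words are left alone)."""
    scored = [(bigrams.get((prev_word, c), 0), c)
              for c in [ocr_word] + candidates]
    return max(scored)[1]

bigrams = train_bigrams([
    "the cat sat on the mat",
    "the cat ran home",
    "the cat sat quietly",
])
# OCR misread "cat" as "cot"; "cat" follows "the" most often in the corpus.
fixed = correct_word("the", "cot", candidates=["cat", "cut"], bigrams=bigrams)
```

A production system would combine this left-context score with the right-context bigram and recurse over the sentence, as the paper describes.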
IRJET- Image to Text Conversion using Tesseract (IRJET Journal)
This document discusses using Tesseract OCR engine to convert images containing text into editable text files. It begins with an abstract describing how digital images often contain text data that users need to access and edit digitally. Tesseract is an open-source OCR tool that uses neural networks like LSTM to recognize text in images with high accuracy and convert it into editable text. It then reviews existing OCR methods before describing Tesseract's image processing and recognition steps in more detail. The document also notes that the converted text could then be used to create audio files for visually impaired users to hear the text content.
Medical Prescription Scanner using Optical Character Recognition (IRJET Journal)
This document describes a proposed system to digitize medical prescriptions using optical character recognition (OCR). The system would allow users to scan prescriptions using their device's camera. Deep learning techniques like convolutional neural networks would be used to identify characters and text in the prescriptions despite variability in handwriting. The system is meant to help patients and doctors better manage prescriptions by storing them digitally rather than on paper. It discusses techniques for collecting prescription data, preprocessing images, segmenting characters, training and testing OCR models, and implementing the system in an application. Popular tools like TensorFlow, OpenCV, Keras and NumPy would be utilized.
Visual Product Identification For Blind Peoples (IRJET Journal)
1) The document presents a prototype system to assist blind people in reading printed text on handheld objects. It uses a camera, data processing, and audio output.
2) A motion-based method is used to isolate the object of interest from other objects in the camera view. Text localization and recognition algorithms are then used to extract and identify text from the isolated region.
3) The system is evaluated on its ability to localize and recognize text from images of objects with complex backgrounds, and on its usability with blind users. Future work will focus on improving text localization and the user interface.
Sign Language Detection and Classification using Hand Tracking and Deep Learn... (IRJET Journal)
The document presents a research paper on developing a real-time sign language detection system using hand tracking and deep learning. It aims to address the communication barrier faced by deaf individuals by automatically recognizing and interpreting sign language gestures. The researchers collected data, tracked hands using computer vision techniques, and classified gestures using machine learning models. Experimental results showed high confidence scores and real-time performance, demonstrating the potential of the system to facilitate independent communication for the deaf community. However, challenges remain around hand occlusion, lighting variations, and recognizing a wide range of sign language gestures.
IRJET- Design and Development of Tesseract-OCR Based Assistive System to Conv... (IRJET Journal)
The document describes the design and development of an assistive system using Tesseract optical character recognition (OCR) and a Raspberry Pi to convert captured text into voice output for visually impaired users. A Raspberry Pi with a webcam or mobile camera is used to focus on and capture printed text. The text is processed using OCR and segmentation before feature extraction and character recognition with Tesseract. Recognized text characters are converted to audio format using Festival so they can be accessed by blind users. An ultrasonic sensor is also included to allow users to determine the type of object, such as a menu, being interacted with. The system aims to provide faster reading access compared to braille.
A NOVEL SCHEME FOR ACCURATE REMAINING USEFUL LIFE PREDICTION FOR INDUSTRIAL I... (ijaia)
In the era of the fourth industrial revolution, measuring and ensuring the reliability, efficiency, and safety of industrial systems and components is one of the uppermost key concerns. In addition, predicting the performance degradation or remaining useful life (RUL) of equipment over time based on its historical sensor data enables companies to greatly reduce their maintenance costs. In this way, companies can prevent costly unexpected breakdowns and become more profitable and competitive in the marketplace. This paper introduces a deep learning-based method combining CNN (Convolutional Neural Network) and LSTM (Long Short-Term Memory) neural networks to predict RUL for industrial equipment. The proposed method does not depend on any degradation trend assumptions, and it can learn complex temporal, representative, and distinguishing patterns in the sensor data. To evaluate the efficiency and effectiveness of the proposed method, we evaluated it on two different experiments: RUL estimation and predicting the status of IoT devices over a 2-week period. Experiments are conducted on a publicly available NASA turbofan-engine dataset. Based on the experimental results, the deep learning-based approach achieved high prediction accuracy. Moreover, the results show that the method outperforms standard well-accepted machine learning algorithms and achieves competitive performance compared to state-of-the-art methods.
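The data-preparation step behind such RUL models can be made concrete: slice each run-to-failure sensor series into fixed-length windows, labelling each window with the number of cycles remaining. The sketch below is an illustrative pure-Python version; the optional `max_rul` cap mirrors the piece-wise linear RUL target commonly used with the NASA turbofan data, and is an assumption rather than something stated in the abstract.

```python
def rul_windows(sensor_series, window=5, max_rul=None):
    """Turn one run-to-failure sensor series into (window, RUL) training
    pairs: the label is the number of cycles remaining until the final
    recorded cycle, optionally capped."""
    pairs = []
    n = len(sensor_series)
    for end in range(window, n + 1):
        rul = n - end                     # cycles left after this window
        if max_rul is not None:
            rul = min(rul, max_rul)       # piece-wise linear RUL cap
        pairs.append((sensor_series[end - window:end], rul))
    return pairs

series = [0.1, 0.2, 0.4, 0.7, 1.1, 1.6, 2.2]  # toy degradation signal
pairs = rul_windows(series, window=3)
```

Each (window, RUL) pair would then feed the CNN (for local patterns within the window) and the LSTM (for the temporal trend across windows).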
Character recognition of kannada text in scene images using neural networks (IAEME Publication)
Character recognition in scene images is one of the most fascinating and challenging
areas of pattern recognition, with many practical applications. It can contribute
immensely to the advancement of automation and can improve the interface
between man and machine in many applications. Practical applications of
character recognition systems include reading aids for the blind, traffic guidance systems, tour
guide systems, location-aware systems, and many more. In this work, a novel method for
recognizing basic Kannada characters in natural scene images is proposed. The proposed
method uses zone-wise horizontal and vertical profile-based features of character images and
works in two phases. During training, zone-wise vertical and horizontal profile-based
features are extracted from training samples and a neural network is trained. During testing, the
test image is processed to obtain features and recognized using the neural network classifier. The
method has been evaluated on 490 Kannada character images captured with 2-megapixel
mobile phone cameras at various sizes (240x320, 600x800, and 900x1200), containing
samples of different sizes, styles, and degradations, and achieves an average
recognition accuracy of 92%. The system is efficient and insensitive to variations in size
and font, noise, blur, and other degradations.
Character recognition of kannada text in scene images using neural networks (IAEME Publication)
The document summarizes a proposed method for recognizing Kannada characters in low resolution scene images using neural networks. It involves extracting zone-wise horizontal and vertical profile features from character images. During training, features are extracted from samples and used to train a neural network. During testing, features are extracted from test images and recognized using the trained neural network classifier. The method achieved an average recognition accuracy of 92% on 490 Kannada character images captured from mobile phones under varying conditions.
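The zone-wise profile features described above can be sketched directly: sum the foreground pixels per row and per column, then pool those projections over a few zones to get a fixed-length feature vector. Below is a toy pure-Python version on a 4x4 binary glyph; the zone count and pooling are illustrative, not the paper's exact feature set.

```python
def zone_profiles(bitmap, zones=2):
    """Zone-wise horizontal and vertical projection profiles of a binary
    character image (list of rows of 0/1 pixels). Returns one pooled
    feature per zone for rows and for columns."""
    h, w = len(bitmap), len(bitmap[0])
    horizontal = [sum(row) for row in bitmap]                      # per-row counts
    vertical = [sum(bitmap[y][x] for y in range(h)) for x in range(w)]
    zh = h // zones
    zone_h = [sum(horizontal[z * zh:(z + 1) * zh]) for z in range(zones)]
    zw = w // zones
    zone_v = [sum(vertical[z * zw:(z + 1) * zw]) for z in range(zones)]
    return zone_h, zone_v

glyph = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [1, 0, 0, 1],
]
zh, zv = zone_profiles(glyph)
```

Concatenating `zh` and `zv` gives the feature vector that would be fed to the neural network classifier during training and testing.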
IRJET- Review on Text Recognization of Product for Blind Person using MATLAB (IRJET Journal)
This document summarizes a research paper that proposes a system to help blind people read text on product labels and documents using a camera and MATLAB software. The system uses image processing techniques like converting images to grayscale, binarization, and filtering to isolate text from complex backgrounds. It then applies optical character recognition to identify the text and provide information to blind users. The proposed system aims to address limitations of prior methods that struggled with non-horizontal text, complex backgrounds, and positioning objects in the camera view. It extracts a region of interest around a product using motion detection and recognizes text regardless of orientation.
IRJET- A Survey on Image Retrieval using Machine Learning (IRJET Journal)
This document summarizes a research paper on using machine learning for image retrieval. It discusses using optical character recognition (OCR) to extract text from images and then translate that text into machine-readable formats. The document reviews related work on combining visual and text features for image retrieval. It also provides an overview of the proposed system architecture, which combines color, texture, and edge features for image retrieval and evaluates performance using metrics like sensitivity, specificity, retrieval score, and accuracy.
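The evaluation metrics named above follow directly from a confusion matrix; the sketch below shows the standard definitions on illustrative counts (the retrieval-score metric varies by paper and is omitted).

```python
def retrieval_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # true-positive rate
    specificity = tn / (tn + fp)                 # true-negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # overall correctness
    return sensitivity, specificity, accuracy

# Illustrative retrieval run: 40 relevant images retrieved, 5 missed,
# 45 irrelevant images correctly excluded, 10 wrongly retrieved.
sens, spec, acc = retrieval_metrics(tp=40, fp=10, tn=45, fn=5)
```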
IRJET- Text Recognization of Product for Blind Person using MATLAB (IRJET Journal)
This document presents a camera-based text recognition system developed in MATLAB to assist blind or visually impaired individuals in reading product labels and text from handheld objects. The system uses a camera to capture an image of a product label, applies preprocessing techniques such as grayscale conversion and thresholding to extract the text, then uses optical character recognition (OCR) to identify the text and provide it as audio output. Key steps include text detection using maximally stable extremal regions (MSER) and stroke width transform algorithms, geometric filtering to remove non-text regions, and template-matching OCR. The system aims to improve on previous methods by handling more complex backgrounds and text orientations such as vertical text. Future work may include recognizing text from more challenging backgrounds.
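The template-matching OCR step can be illustrated with a Hamming-distance matcher over tiny binary glyph templates: the recognized character is the template with the fewest mismatched pixels. This toy pure-Python sketch is an illustrative stand-in for the paper's MATLAB implementation; the 3x3 glyphs and labels are invented for the example.

```python
def hamming(a, b):
    """Pixel-wise mismatch count between two equally sized binary bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def match_character(glyph, templates):
    """Template-matching OCR: return the label of the closest template."""
    return min(templates, key=lambda label: hamming(glyph, templates[label]))

templates = {
    "I": [[0, 1, 0],
          [0, 1, 0],
          [0, 1, 0]],
    "L": [[1, 0, 0],
          [1, 0, 0],
          [1, 1, 1]],
}
noisy_L = [[1, 0, 0],
           [1, 0, 0],
           [1, 1, 0]]  # an "L" with one pixel flipped by noise
```

Because the match is a minimum over distances, small amounts of noise (a flipped pixel or two) still resolve to the correct character, which is why the geometric filtering and binarization stages matter so much upstream.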
A STUDY ON OPTICAL CHARACTER RECOGNITION TECHNIQUES (ijcsitcejournal)
Optical Character Recognition (OCR) is the process which enables a system to identify, without human
intervention, the scripts or alphabets in which text is written. Optical character
recognition has grown to be one of the most successful applications of technology in the fields of
pattern recognition and artificial intelligence. In our survey we study various OCR techniques. In this
paper we analyze and examine the theoretical and numerical models of optical character recognition.
Optical character recognition (OCR) and Magnetic Character Recognition (MCR)
techniques are generally utilized for the recognition of patterns or alphabets. In general the alphabets
take the form of pixel images and can be either handwritten or printed, of any size, shape, or
orientation. In MCR, by contrast, the characters are printed with magnetic ink, and the reading machine
categorizes each character on the basis of the unique magnetic field it produces. Both
MCR and OCR find use in banking and various commercial applications. Earlier research on
optical character recognition has shown that handwritten text places no limitation
on the writing technique; handwritten text is difficult to recognize owing to
diverse human handwriting styles and disparities in the angle, size, and shape of the calligraphy. An
assortment of approaches to optical character recognition is discussed here along with their performance.
This document provides a summary of a minor project report on image recognition submitted in partial fulfillment of the requirements for a Bachelor of Technology degree in Computer Science and Engineering. The report was submitted by Bhaskar Tripathi and Joel Jose in October 2018 under the supervision of Dr. P. Mohamed Fathimal, Assistant Professor in the Department of Computer Science and Engineering at SRM Institute of Science and Technology. The report includes acknowledgements, a table of contents, and chapters on the introduction, project details, tools and technologies used, proposed system architecture, modules and functionality.
Real Time Sign Language Translation Using Tensor Flow Object Detection (IRJET Journal)
This document describes a real-time sign language translation system developed using TensorFlow object detection. The system was able to detect Indian sign language alphabets in real-time with an average accuracy of 87.4% after training an SSD MobileNet v2 model on a dataset of 500 images containing signs for the English alphabet. Future work may focus on improving accuracy, reducing latency for real-time translation, and recognizing facial expressions in addition to hand gestures.
CONTENT RECOVERY AND IMAGE RETRIVAL IN IMAGE DATABASE CONTENT RETRIVING IN TE... (Editor IJMTER)
Digital images are used in magazines, blogs, websites, television, and more. Digital image processing
techniques are used for feature selection, pattern extraction, classification, and retrieval. Color, texture,
and shape features are used in image processing, which also supports the computer graphics
and computer vision domains. Scene text recognition is performed with two schemes: character
recognizer models and binary character classifier models. A character recognizer is trained to predict the category of a
character in an image patch, while a binary character classifier is trained for each character class to predict the presence
of that category in an image patch. Scene text recognition is performed on detected text regions. A pixel-based layout
analysis method is adopted to extract text regions and segment text characters in images. Text character
segmentation is carried out using color uniformity and the horizontal alignment of text characters. A discriminative
character descriptor is designed by combining several feature detectors and descriptors; Histograms of Oriented
Gradients (HOG) are used to build the character descriptors. Character structure is modeled for each character
class by designing stroke configuration maps. The scene text extraction scheme also supports smart mobile
devices. Text recognition methods are used in text understanding and text retrieval applications. The text
recognition scheme is enhanced with a content-based image retrieval process, and the system is integrated with
additional representative and discriminative features for text structure modeling. The system is further enhanced to
perform text- and word-level recognition using lexicon analysis, and the training process includes a word
database update task.
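The HOG descriptor mentioned above can be reduced to its core idea: accumulate gradient magnitudes into orientation bins. The sketch below is a simplified single-cell version in pure Python (no block normalisation or cell grid, which full HOG adds) on a toy patch; the bin count is an illustrative choice.

```python
import math

def orientation_histogram(img, bins=8):
    """Simplified HOG-style descriptor: histogram of gradient orientations,
    weighted by gradient magnitude, over one grayscale patch."""
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]   # central differences
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.atan2(gy, gx) % math.pi  # unsigned orientation
            hist[min(int(angle / math.pi * bins), bins - 1)] += mag
    return hist

# Vertical edge: intensity jumps left-to-right, so gradients point along x
# and all the mass lands in the first (near-zero-angle) bin.
patch = [[0, 0, 100, 100]] * 4
hist = orientation_histogram(patch)
```

Full HOG computes such histograms over a grid of cells and normalises them over overlapping blocks; concatenating the normalised histograms yields the character descriptor used for classification.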
IRJET - Language Linguist using Image Processing on Intelligent Transport Sys... (IRJET Journal)
This document summarizes a research paper that proposes a system to detect text in images of traffic signs, extract the text, and translate it to English. The system uses convolutional neural networks (CNNs) to detect text areas and recurrent neural networks (RNNs) to translate the extracted text. The goal is to help travelers understand traffic signs written in unfamiliar languages like Spanish or French by automatically translating the text in images to English. The system performs three steps: 1) detect text areas in images of signs, 2) extract the words from the detected text regions, and 3) translate the extracted text to English for the user.
International Conference on Information Technology Convergence Services & AI (ITCAI 2024) (gerogepatton)
International Conference on Information Technology Convergence Services & AI (ITCAI 2024) will provide an excellent international forum for sharing knowledge and new research results in all areas of Information Technology Convergence Services & AI. The Conference focuses on all technical and practical aspects of ITC & AI. The goal of this conference is to bring together researchers and practitioners from academia and industry to focus on understanding Information Technology Convergence and services, AI, and establishing new collaborations in these areas.
International Journal of Artificial Intelligence & Applications (IJAIA) (gerogepatton)
The International Journal of Artificial Intelligence & Applications (IJAIA) is a bimonthly open-access, peer-reviewed journal that publishes articles contributing new results in all areas of artificial intelligence and its applications. It is an international journal intended for professionals and researchers in all fields of AI, including programmers and software and hardware manufacturers. The journal also aims to publish new work in the form of special issues on emerging areas of artificial intelligence and its applications.
Similar to Information Extraction from Product Labels: A Machine Vision Approach
Correcting optical character recognition result via a novel approachIJICTJOURNAL
Optical character recognition (OCR) is a recognition system used to recognize the content of a scanned image. Such systems produce erroneous results, which necessitates post-processing to correct the sentences. In this paper, we propose a new method for the syntactic and semantic correction of sentences, based on the frequency of pairs of correct words in the sentence and a recursive technique. The approach starts by computing the frequency of each pair of successive words in the corpora; the words with the greatest frequency build a correction center. We achieved 98% with our approach when using the noisy channel, and 96% using the same corpus under the same conditions.
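The scoring step this abstract describes, ranking candidate corrections by the frequency of successive word pairs in a corpus, can be sketched as follows. The mini-corpus, candidate lists, and function names here are illustrative, not taken from the paper:

```python
from collections import Counter

# Hypothetical mini-corpus standing in for the paper's reference corpora.
corpus = "the cat sat on the mat the cat ran on the mat".split()

# Frequency of each pair of successive words, as the approach describes.
bigram_freq = Counter(zip(corpus, corpus[1:]))

def correct(sentence, candidates):
    """For each OCR-uncertain word, pick the candidate that maximizes the
    frequency of its bigram with the preceding (already corrected) word."""
    words = sentence.split()
    out = [words[0]]
    for w in words[1:]:
        options = candidates.get(w, [w])
        out.append(max(options, key=lambda c: bigram_freq[(out[-1], c)]))
    return " ".join(out)

# 'sot' is a made-up OCR error with three candidate corrections.
print(correct("the cat sot", {"sot": ["sat", "sot", "set"]}))  # → the cat sat
```

The recursive, correction-center-based procedure in the paper is more elaborate; this shows only the bigram-frequency criterion at its core.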
IRJET- Image to Text Conversion using TesseractIRJET Journal
This document discusses using Tesseract OCR engine to convert images containing text into editable text files. It begins with an abstract describing how digital images often contain text data that users need to access and edit digitally. Tesseract is an open-source OCR tool that uses neural networks like LSTM to recognize text in images with high accuracy and convert it into editable text. It then reviews existing OCR methods before describing Tesseract's image processing and recognition steps in more detail. The document also notes that the converted text could then be used to create audio files for visually impaired users to hear the text content.
Medical Prescription Scanner using Optical Character RecognitionIRJET Journal
This document describes a proposed system to digitize medical prescriptions using optical character recognition (OCR). The system would allow users to scan prescriptions using their device's camera. Deep learning techniques like convolutional neural networks would be used to identify characters and text in the prescriptions despite variability in handwriting. The system is meant to help patients and doctors better manage prescriptions by storing them digitally rather than on paper. It discusses techniques for collecting prescription data, preprocessing images, segmenting characters, training and testing OCR models, and implementing the system in an application. Popular tools like TensorFlow, OpenCV, Keras and NumPy would be utilized.
Visual Product Identification For Blind PeoplesIRJET Journal
1) The document presents a prototype system to assist blind people in reading printed text on handheld objects. It uses a camera, data processing, and audio output.
2) A motion-based method is used to isolate the object of interest from other objects in the camera view. Text localization and recognition algorithms are then used to extract and identify text from the isolated region.
3) The system is evaluated on its ability to localize and recognize text from images of objects with complex backgrounds, and on its usability with blind users. Future work will focus on improving text localization and the user interface.
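The motion-based isolation step in point 2 can be sketched with simple frame differencing; the threshold value and function name are assumptions, and a real system would operate on camera frames rather than synthetic arrays:

```python
import numpy as np

def motion_roi(prev_frame, frame, thresh=30):
    """Return the bounding box (top, left, bottom, right) of the region that
    changed between two grayscale frames - a minimal stand-in for the
    motion-based object-of-interest isolation the summary describes."""
    diff = np.abs(frame.astype(int) - prev_frame.astype(int))
    ys, xs = np.nonzero(diff > thresh)
    if ys.size == 0:
        return None  # no motion detected
    return ys.min(), xs.min(), ys.max(), xs.max()

# Two synthetic 8x8 frames: a bright "object" appears in rows 2-4, cols 3-5.
prev = np.zeros((8, 8), dtype=np.uint8)
curr = prev.copy()
curr[2:5, 3:6] = 200

print(motion_roi(prev, curr))  # → (2, 3, 4, 5)
```

Text localization and recognition would then run only inside the returned box.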
Sign Language Detection and Classification using Hand Tracking and Deep Learn...IRJET Journal
The document presents a research paper on developing a real-time sign language detection system using hand tracking and deep learning. It aims to address the communication barrier faced by deaf individuals by automatically recognizing and interpreting sign language gestures. The researchers collected data, tracked hands using computer vision techniques, and classified gestures using machine learning models. Experimental results showed high confidence scores and real-time performance, demonstrating the potential of the system to facilitate independent communication for the deaf community. However, challenges remain around hand occlusion, lighting variations, and recognizing a wide range of sign language gestures.
IRJET- Design and Development of Tesseract-OCR Based Assistive System to Conv...IRJET Journal
The document describes the design and development of an assistive system using Tesseract optical character recognition (OCR) and a Raspberry Pi to convert captured text into voice output for visually impaired users. A Raspberry Pi with a webcam or mobile camera is used to focus on and capture printed text. The text is processed using OCR and segmentation before feature extraction and character recognition with Tesseract. Recognized text characters are converted to audio format using Festival so they can be accessed by blind users. An ultrasonic sensor is also included to allow users to determine the type of object, such as a menu, being interacted with. The system aims to provide faster reading access compared to braille.
A NOVEL SCHEME FOR ACCURATE REMAINING USEFUL LIFE PREDICTION FOR INDUSTRIAL I...ijaia
In the era of the fourth industrial revolution, measuring and ensuring the reliability, efficiency and safety of industrial systems and components is a key concern. In addition, predicting the performance degradation or remaining useful life (RUL) of equipment over time from its historical sensor data enables companies to greatly reduce their maintenance costs: they can prevent costly unexpected breakdowns and become more profitable and competitive in the marketplace. This paper introduces a deep learning-based method that combines CNN (Convolutional Neural Network) and LSTM (Long Short-Term Memory) neural networks to predict RUL for industrial equipment. The proposed method does not depend on any degradation-trend assumptions and can learn complex, temporally representative and discriminative patterns in the sensor data. To evaluate its efficiency and effectiveness, we tested the method on two different experiments: RUL estimation and predicting the status of IoT devices over a two-week period. Experiments are conducted on NASA's publicly available turbofan-engine dataset. Based on the experimental results, the deep learning-based approach achieved high prediction accuracy. Moreover, the results show that the method outperforms standard, well-accepted machine learning algorithms and achieves competitive performance compared with state-of-the-art methods.
Character recognition of kannada text in scene images using neuralIAEME Publication
Character recognition in scene images is one of the most fascinating and challenging areas of pattern recognition, with various practical applications. It can contribute immensely to the advancement of automation and can improve the interface between man and machine in many applications. Some practical applications of character recognition systems are reading aids for the blind, traffic guidance systems, tour guide systems, location-aware systems and many more. In this work, a novel method for recognizing basic Kannada characters in natural scene images is proposed. The proposed method uses zone-wise horizontal and vertical profile-based features of character images and works in two phases. During training, zone-wise vertical and horizontal profile-based features are extracted from training samples and a neural network is trained. During testing, the test image is processed to obtain features and recognized using the neural network classifier. The method has been evaluated on 490 Kannada character images captured with 2-megapixel mobile phone cameras at resolutions of 240x320, 600x800 and 900x1200, containing samples of different sizes and styles and with different degradations, and achieves an average recognition accuracy of 92%. The system is efficient and insensitive to variations in size and font, noise, blur and other degradations.
Character recognition of kannada text in scene images using neuralIAEME Publication
The document summarizes a proposed method for recognizing Kannada characters in low resolution scene images using neural networks. It involves extracting zone-wise horizontal and vertical profile features from character images. During training, features are extracted from samples and used to train a neural network. During testing, features are extracted from test images and recognized using the trained neural network classifier. The method achieved an average recognition accuracy of 92% on 490 Kannada character images captured from mobile phones under varying conditions.
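The zone-wise profile features both summaries mention can be sketched as row and column sums computed per zone of a binary character image. The zone count and function name below are assumptions; the papers do not fix them in these summaries:

```python
import numpy as np

def zone_profile_features(char_img, zones=2):
    """Split a binary character image into zones x zones blocks and, per
    block, concatenate the horizontal profile (row sums) and vertical
    profile (column sums) into one feature vector for the classifier."""
    h, w = char_img.shape
    zh, zw = h // zones, w // zones
    feats = []
    for i in range(zones):
        for j in range(zones):
            zone = char_img[i * zh:(i + 1) * zh, j * zw:(j + 1) * zw]
            feats.extend(zone.sum(axis=1))  # horizontal profile (per row)
            feats.extend(zone.sum(axis=0))  # vertical profile (per column)
    return np.array(feats)

# A tiny 4x4 binary "character": a diagonal stroke.
img = np.eye(4, dtype=int)
print(zone_profile_features(img, zones=2))
```

Each feature vector would then be fed to the neural network during training and testing.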
IRJET- Review on Text Recognization of Product for Blind Person using MATLABIRJET Journal
This document summarizes a research paper that proposes a system to help blind people read text on product labels and documents using a camera and MATLAB software. The system uses image processing techniques like converting images to grayscale, binarization, and filtering to isolate text from complex backgrounds. It then applies optical character recognition to identify the text and provide information to blind users. The proposed system aims to address limitations of prior methods that struggled with non-horizontal text, complex backgrounds, and positioning objects in the camera view. It extracts a region of interest around a product using motion detection and recognizes text regardless of orientation.
IRJET- A Survey on Image Retrieval using Machine LearningIRJET Journal
This document summarizes a research paper on using machine learning for image retrieval. It discusses using optical character recognition (OCR) to extract text from images and then translate that text into machine-readable formats. The document reviews related work on combining visual and text features for image retrieval. It also provides an overview of the proposed system architecture, which combines color, texture, and edge features for image retrieval and evaluates performance using metrics like sensitivity, specificity, retrieval score, and accuracy.
IRJET- Text Recognization of Product for Blind Person using MATLABIRJET Journal
This document presents a camera-based text recognition system developed using MATLAB to assist blind or visually impaired individuals in reading product labels and text from handheld objects. The system uses a camera to capture an image of a product label, applies preprocessing techniques like grayscale conversion and thresholding to extract the text, then uses optical character recognition (OCR) to identify the text and provide it in an audio output. Key steps include text detection using maximally stable extremal regions (MSER) and stroke width transform algorithms, geometric filtering to remove non-text regions, and template matching OCR. The system aims to improve on previous methods by handling more complex backgrounds and text orientations like vertical text. Future work may include recognizing text from more challenging
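The grayscale-conversion and thresholding preprocessing described above can be sketched as a global binarization step. The mean-based threshold below is a simplification of what a full system (e.g. Otsu's method) would use, and the synthetic patch stands in for a captured label image:

```python
import numpy as np

def binarize(gray, thresh=None):
    """Global thresholding: pixels darker than the threshold become text (1),
    the rest background (0). When no threshold is given, the image mean is
    used - a simple stand-in for automatic threshold selection."""
    if thresh is None:
        thresh = gray.mean()
    return (gray < thresh).astype(np.uint8)

# Synthetic label patch: dark text strokes (20) on a light background (220).
patch = np.full((4, 6), 220, dtype=np.uint8)
patch[1:3, 1:5] = 20

binary = binarize(patch)
print(binary.sum())  # number of text pixels → 8
```

MSER detection, geometric filtering, and template-matching OCR would then operate on this binary map.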
A STUDY ON OPTICAL CHARACTER RECOGNITION TECHNIQUESijcsitcejournal
Optical Character Recognition (OCR) is the process that enables a system to identify, without human intervention, the scripts or alphabets in which text is written. Optical character recognition has grown into one of the most successful applications of technology in the fields of pattern recognition and artificial intelligence. In this survey we study various OCR techniques, and we analyze and examine the theoretical and numerical models of optical character recognition. Both OCR and Magnetic Character Recognition (MCR) techniques are widely used for the recognition of patterns or alphabets. In general, the alphabets take the form of pixel images and may be either handwritten or printed, of any size, shape or orientation. In MCR, by contrast, the characters are printed with magnetic ink, and the reading machine identifies each character by the unique magnetic field it produces. Both MCR and OCR find use in banking and other business applications. Earlier research on optical character recognition has shown that handwritten text imposes no restriction on the writing style; handwritten text is difficult to recognize owing to the diversity of human handwriting styles and the variation in angle, size and shape of the script. A variety of optical character recognition approaches are discussed here, along with their performance.
This document provides a summary of a minor project report on image recognition submitted in partial fulfillment of the requirements for a Bachelor of Technology degree in Computer Science and Engineering. The report was submitted by Bhaskar Tripathi and Joel Jose in October 2018 under the supervision of Dr. P. Mohamed Fathimal, Assistant Professor in the Department of Computer Science and Engineering at SRM Institute of Science and Technology. The report includes acknowledgements, a table of contents, and chapters on the introduction, project details, tools and technologies used, proposed system architecture, modules and functionality.
Real Time Sign Language Translation Using Tensor Flow Object DetectionIRJET Journal
This document describes a real-time sign language translation system developed using TensorFlow object detection. The system was able to detect Indian sign language alphabets in real-time with an average accuracy of 87.4% after training an SSD MobileNet v2 model on a dataset of 500 images containing signs for the English alphabet. Future work may focus on improving accuracy, reducing latency for real-time translation, and recognizing facial expressions in addition to hand gestures.
PhotoQR: A Novel ID Card with an Encoded Viewgerogepatton
There is increasing interest in developing techniques to identify and assess data that allow easy and continuous access to resources, services or places requiring thorough ID control. Usually, several kinds of documents are required to grant access to these resources. To avoid forgeries without the need for extra credentials, a new system, named photoQR, is proposed here. The system is based on an ID card carrying two objects: a person's picture (pre-processed via blur and/or swirl techniques) and a QR code containing embedded data related to the picture. The idea is that the picture and the QR code can assess each other through a hash value stored in the QR: the QR cannot be assessed without the picture, and vice versa. An open-source prototype of the photoQR system has been implemented in Python and can be used in both offline and real-time environments, effectively combining security concepts and image processing algorithms to achieve data assessment.
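The mutual-assessment idea, a hash in the QR binding the QR's data to the picture, can be sketched in a few lines. The payload fields, SHA-256 choice, and function names are assumptions for illustration; the abstract only specifies that a hash ties the two objects together:

```python
import hashlib

def make_card(picture_bytes, holder_id):
    """Build the QR payload: embedded data plus a hash binding that data
    to the (pre-processed) picture."""
    digest = hashlib.sha256(picture_bytes + holder_id.encode()).hexdigest()
    return {"id": holder_id, "hash": digest}

def verify_card(picture_bytes, qr_payload):
    """Picture and QR assess each other: recompute the hash from the
    presented picture and compare it with the hash stored in the QR."""
    expected = hashlib.sha256(
        picture_bytes + qr_payload["id"].encode()).hexdigest()
    return expected == qr_payload["hash"]

picture = b"swirled-portrait-bytes"  # stand-in for the processed photo
qr = make_card(picture, "card-0042")

print(verify_card(picture, qr))          # → True
print(verify_card(b"forged-photo", qr))  # → False
```

Neither object verifies alone: a forged picture fails against the genuine QR, and a copied QR fails against any other picture.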
12th International Conference of Artificial Intelligence and Fuzzy Logic (AI ...gerogepatton
12th International Conference of Artificial Intelligence and Fuzzy Logic (AI & FL 2024) provides a peer-reviewed forum for researchers in these areas to present their work. Authors are solicited to contribute to the conference by submitting articles that illustrate research results, projects, survey works and industrial experiences describing significant advances in the following areas, though submissions are not limited to these topics.
International Conference on NLP, Artificial Intelligence, Machine Learning an...gerogepatton
International Conference on NLP, Artificial Intelligence, Machine Learning and Applications (NLAIM 2024) offers a premier global platform for exchanging insights and findings in the theory, methodology, and applications of NLP, Artificial Intelligence, Machine Learning, and their applications. The conference seeks substantial contributions across all key domains of NLP, Artificial Intelligence, Machine Learning, and their practical applications, aiming to foster both theoretical advancements and real-world implementations. With a focus on facilitating collaboration between researchers and practitioners from academia and industry, the conference serves as a nexus for sharing the latest developments in the field.
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELgerogepatton
As digital technology becomes more deeply embedded in power systems, protecting the communication
networks of Smart Grids (SG) has emerged as a critical concern. Distributed Network Protocol 3 (DNP3)
represents a multi-tiered application layer protocol extensively utilized in Supervisory Control and Data
Acquisition (SCADA)-based smart grids to facilitate real-time data gathering and control functionalities.
Robust Intrusion Detection Systems (IDS) are necessary for early threat detection and mitigation because
of the interconnection of these networks, which makes them vulnerable to a variety of cyberattacks. To
solve this issue, this paper develops a hybrid Deep Learning (DL) model specifically designed for intrusion
detection in smart grids. The proposed approach is a combination of the Convolutional Neural Network
(CNN) and the Long-Short-Term Memory algorithms (LSTM). We employed a recent intrusion detection
dataset (DNP3), which focuses on unauthorized commands and Denial of Service (DoS) cyberattacks, to
train and test our model. The results of our experiments show that our CNN-LSTM method is much better
at finding smart grid intrusions than other deep learning algorithms used for classification. In addition,
our proposed approach improves accuracy, precision, recall, and F1 score, achieving a high detection
accuracy rate of 99.50%.
10th International Conference on Artificial Intelligence and Applications (AI...gerogepatton
10th International Conference on Artificial Intelligence and Applications (AI 2024) will provide an excellent international forum for sharing knowledge and results in theory, methodology and applications of Artificial Intelligence and its applications. The Conference looks for significant contributions to all major fields of the Artificial Intelligence, Soft Computing in theoretical and practical aspects. The aim of the Conference is to provide a platform to the researchers and practitioners from both academia as well as industry to meet and share cutting-edge development in the field.
Immunizing Image Classifiers Against Localized Adversary Attacksgerogepatton
This paper addresses the vulnerability of deep learning models, particularly convolutional neural networks (CNNs), to adversarial attacks and presents a proactive training technique designed to counter them. We introduce a novel volumization algorithm, which transforms 2D images into 3D volumetric representations. When combined with 3D convolution and deep curriculum learning optimization (CLO), it significantly improves the immunity of models against localized universal attacks by up to 40%. We evaluate our proposed approach using contemporary CNN architectures and the modified Canadian Institute for Advanced Research (CIFAR-10 and CIFAR-100) and ImageNet Large Scale Visual Recognition Challenge (ILSVRC12) datasets, showcasing accuracy improvements over previous techniques. The results indicate that the combination of volumetric input and curriculum learning holds significant promise for mitigating adversarial attacks without requiring adversarial training.
May 2024 - Top 10 Read Articles in Artificial Intelligence and Applications (...gerogepatton
3rd International Conference on Artificial Intelligence Advances (AIAD 2024)gerogepatton
3rd International Conference on Artificial Intelligence Advances (AIAD 2024) will act as a major forum for the presentation of innovative ideas, approaches, developments, and research projects in the area advanced Artificial Intelligence. It will also serve to facilitate the exchange of information between researchers and industry professionals to discuss the latest issues and advancement in the research area. Core areas of AI and advanced multi-disciplinary and its applications will be covered during the conferences.
Research on Fuzzy C- Clustering Recursive Genetic Algorithm based on Cloud Co...gerogepatton
Aiming at the poor local-search ability and premature convergence of the fuzzy C-cluster recursive genetic algorithm (FOLD++), a new fuzzy C-cluster recursive genetic algorithm based on Bayesian function adaptation search (TS) is proposed, incorporating the idea of Bayesian function adaptation search into the fuzzy C-cluster recursive genetic algorithm. The new algorithm combines the advantages of FOLD++ and TS. In the early stage of optimization, the fuzzy C-cluster recursive genetic algorithm is used to obtain a good initial value, and the individual extreme value pbest is stored in the Bayesian function adaptation table. In the late stage of optimization, when the search ability of the fuzzy C-cluster recursive genetic algorithm weakens, the short-term memory function of the Bayesian function adaptation table is used to make the search jump out of the local optimal solution, while bad solutions are allowed to be accepted during the search. The improved algorithm is applied to function optimization, and simulation results show that the calculation accuracy and stability of the algorithm are improved, verifying the effectiveness of the improved algorithm.
10th International Conference on Artificial Intelligence and Soft Computing (...gerogepatton
10th International Conference on Artificial Intelligence and Soft Computing (AIS 2024) will
provide an excellent international forum for sharing knowledge and results in theory, methodology, and
applications of Artificial Intelligence, Soft Computing. The Conference looks for significant
contributions to all major fields of the Artificial Intelligence, Soft Computing in theoretical and practical
aspects. The aim of the Conference is to provide a platform to the researchers and practitioners from
both academia as well as industry to meet and share cutting-edge development in the field.
International Journal of Artificial Intelligence & Applications (IJAIA)gerogepatton
Employee attrition refers to the decrease in staff numbers within an organization due to various reasons.
As it has a negative impact on long-term growth objectives and workplace productivity, firms have
recognized it as a significant concern. To address this issue, organizations are increasingly turning to
machine-learning approaches to forecast employee attrition rates. This topic has gained significant
attention from researchers, especially in recent times. Several studies have applied various machinelearning methods to predict employee attrition, producing different resultsdepending on the employed
methods, factors, and datasets. However, there has been no comprehensive comparative review of multiple
studies applying machine-learning models to predict employee attrition to date. Therefore, this study aims
to fill this gap by providing an overview of research conducted on applying machine learning to predict
employee attrition from 2019 to February 2024. A literature review of relevant studies was conducted,
summarized, and classified. Most studies agree on conducting comparative experiments with multiple
predictive models to determine the most effective one.From this literature survey, the RF algorithm and
XGB ensemble method are repeatedly the best-performing, outperforming many other algorithms.
Additionally, the application of deep learning to employee attrition prediction issues also shows promise.
While there are discrepancies in the datasets used in previous studies, it is notable that the dataset
provided by IBM is the most widely utilized. This study serves as a concise review for new researchers,
facilitating their understanding of the primary techniques employed in predicting employee attrition and
highlighting recent research trends in this field. Furthermore, it provides organizations with insight into
the prominent factors affecting employee attrition, as identified by studies, enabling them to implement
solutions aimed at reducing attrition rates.
10th International Conference on Artificial Intelligence and Applications (AI...gerogepatton
10th International Conference on Artificial Intelligence and Applications (AIFU 2024) is a forum for presenting new advances and research results in the fields of Artificial Intelligence. The conference will bring together leading researchers, engineers and scientists in the domain of interest from around the world. The scope of the conference covers all theoretical and practical aspects of the Artificial Intelligence.
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Sinan KOZAK
Sinan from the Delivery Hero mobile infrastructure engineering team shares a deep dive into performance acceleration with Gradle build cache optimizations. Sinan shares their journey into solving complex build-cache problems that affect Gradle builds. By understanding the challenges and solutions found in our journey, we aim to demonstrate the possibilities for faster builds. The case study reveals how overlapping outputs and cache misconfigurations led to significant increases in build times, especially as the project scaled up with numerous modules using Paparazzi tests. The journey from diagnosing to defeating cache issues offers invaluable lessons on maintaining cache integrity without sacrificing functionality.
Null Bangalore | Pentesters Approach to AWS IAMDivyanshu
#Abstract:
- Learn more about the real-world methods for auditing AWS IAM (Identity and Access Management) as a pentester. So let us proceed with a brief discussion of IAM as well as some typical misconfigurations and their potential exploits in order to reinforce the understanding of IAM security best practices.
- Gain actionable insights into AWS IAM policies and roles, using hands on approach.
#Prerequisites:
- Basic understanding of AWS services and architecture
- Familiarity with cloud security concepts
- Experience using the AWS Management Console or AWS CLI.
- For hands on lab create account on [killercoda.com](https://killercoda.com/cloudsecurity-scenario/)
# Scenario Covered:
- Basics of IAM in AWS
- Implementing IAM Policies with Least Privilege to Manage S3 Bucket
- Objective: Create an S3 bucket with least privilege IAM policy and validate access.
- Steps:
- Create S3 bucket.
- Attach least privilege policy to IAM user.
- Validate access.
- Exploiting IAM PassRole Misconfiguration
-Allows a user to pass a specific IAM role to an AWS service (ec2), typically used for service access delegation. Then exploit PassRole Misconfiguration granting unauthorized access to sensitive resources.
- Objective: Demonstrate how a PassRole misconfiguration can grant unauthorized access.
- Steps:
- Allow user to pass IAM role to EC2.
- Exploit misconfiguration for unauthorized access.
- Access sensitive resources.
- Exploiting IAM AssumeRole Misconfiguration with Overly Permissive Role
- An overly permissive IAM role configuration can lead to privilege escalation by creating a role with administrative privileges and allow a user to assume this role.
- Objective: Show how overly permissive IAM roles can lead to privilege escalation.
- Steps:
- Create role with administrative privileges.
- Allow user to assume the role.
- Perform administrative actions.
- Differentiation between PassRole vs AssumeRole
Try at [killercoda.com](https://killercoda.com/cloudsecurity-scenario/)
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...PriyankaKilaniya
Energy efficiency has been important since the latter part of the last century. The main object of this survey is to determine the energy efficiency knowledge among consumers. Two separate districts in Bangladesh are selected to conduct the survey on households and showrooms about the energy and seller also. The survey uses the data to find some regression equations from which it is easy to predict energy efficiency knowledge. The data is analyzed and calculated based on five important criteria. The initial target was to find some factors that help predict a person's energy efficiency knowledge. From the survey, it is found that the energy efficiency awareness among the people of our country is very low. Relationships between household energy use behaviors are estimated using a unique dataset of about 40 households and 20 showrooms in Bangladesh's Chapainawabganj and Bagerhat districts. Knowledge of energy consumption and energy efficiency technology options is found to be associated with household use of energy conservation practices. Household characteristics also influence household energy use behavior. Younger household cohorts are more likely to adopt energy-efficient technologies and energy conservation practices and place primary importance on energy saving for environmental reasons. Education also influences attitudes toward energy conservation in Bangladesh. Low-education households indicate they primarily save electricity for the environment while high-education households indicate they are motivated by environmental concerns.
Software Engineering and Project Management - Introduction, Modeling Concepts...Prakhyath Rai
Introduction, Modeling Concepts and Class Modeling: What is Object orientation? What is OO development? OO Themes; Evidence for usefulness of OO development; OO modeling history. Modeling
as Design technique: Modeling, abstraction, The Three models. Class Modeling: Object and Class Concept, Link and associations concepts, Generalization and Inheritance, A sample class model, Navigation of class models, and UML diagrams
Building the Analysis Models: Requirement Analysis, Analysis Model Approaches, Data modeling Concepts, Object Oriented Analysis, Scenario-Based Modeling, Flow-Oriented Modeling, class Based Modeling, Creating a Behavioral Model.
Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...Transcat
Join us for this solutions-based webinar on the tools and techniques for commissioning and maintaining PV Systems. In this session, we'll review the process of building and maintaining a solar array, starting with installation and commissioning, then reviewing operations and maintenance of the system. This course will review insulation resistance testing, I-V curve testing, earth-bond continuity, ground resistance testing, performance tests, visual inspections, ground and arc fault testing procedures, and power quality analysis.
Fluke Solar Application Specialist Will White is presenting on this engaging topic:
Will has worked in the renewable energy industry since 2005, first as an installer for a small east coast solar integrator before adding sales, design, and project management to his skillset. In 2022, Will joined Fluke as a solar application specialist, where he supports their renewable energy testing equipment like IV-curve tracers, electrical meters, and thermal imaging cameras. Experienced in wind power, solar thermal, energy storage, and all scales of PV, Will has primarily focused on residential and small commercial systems. He is passionate about implementing high-quality, code-compliant installation techniques.
Height and depth gauge linear metrology.pdfq30122000
Height gauges may also be used to measure the height of an object by using the underside of the scriber as the datum. The datum may be permanently fixed or the height gauge may have provision to adjust the scale, this is done by sliding the scale vertically along the body of the height gauge by turning a fine feed screw at the top of the gauge; then with the scriber set to the same level as the base, the scale can be matched to it. This adjustment allows different scribers or probes to be used, as well as adjusting for any errors in a damaged or resharpened probe.
Applications of artificial Intelligence in Mechanical Engineering.pdfAtif Razi
Historically, mechanical engineering has relied heavily on human expertise and empirical methods to solve complex problems. With the introduction of computer-aided design (CAD) and finite element analysis (FEA), the field took its first steps towards digitization. These tools allowed engineers to simulate and analyze mechanical systems with greater accuracy and efficiency. However, the sheer volume of data generated by modern engineering systems and the increasing complexity of these systems have necessitated more advanced analytical tools, paving the way for AI.
AI offers the capability to process vast amounts of data, identify patterns, and make predictions with a level of speed and accuracy unattainable by traditional methods. This has profound implications for mechanical engineering, enabling more efficient design processes, predictive maintenance strategies, and optimized manufacturing operations. AI-driven tools can learn from historical data, adapt to new information, and continuously improve their performance, making them invaluable in tackling the multifaceted challenges of modern mechanical engineering.
Digital Twins Computer Networking Paper Presentation.pptxaryanpankaj78
A Digital Twin in computer networking is a virtual representation of a physical network, used to simulate, analyze, and optimize network performance and reliability. It leverages real-time data to enhance network management, predict issues, and improve decision-making processes.
Accident detection system project report.pdfKamal Acharya
The Rapid growth of technology and infrastructure has made our lives easier. The
advent of technology has also increased the traffic hazards and the road accidents take place
frequently which causes huge loss of life and property because of the poor emergency facilities.
Many lives could have been saved if emergency service could get accident information and
reach in time. Our project will provide an optimum solution to this draw back. A piezo electric
sensor can be used as a crash or rollover detector of the vehicle during and after a crash. With
signals from a piezo electric sensor, a severe accident can be recognized. According to this
project when a vehicle meets with an accident immediately piezo electric sensor will detect the
signal or if a car rolls over. Then with the help of GSM module and GPS module, the location
will be sent to the emergency contact. Then after conforming the location necessary action will
be taken. If the person meets with a small accident or if there is no serious threat to anyone’s
life, then the alert message can be terminated by the driver by a switch provided in order to
avoid wasting the valuable time of the medical rescue team.
International Journal of Artificial Intelligence and Applications (IJAIA), Vol.15, No.2, March 2024
DOI: 10.5121/ijaia.2024.15204
INFORMATION EXTRACTION FROM PRODUCT
LABELS: A MACHINE VISION APPROACH
Hansi Seitaj and Vinayak Elangovan
Computer Science program, Penn State Abington, Abington, PA, USA
ABSTRACT
This research tackles the challenge of manual data extraction from product labels by employing a blend of
computer vision and Natural Language Processing (NLP). We introduce an enhanced model that combines
Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) in a Convolutional
Recurrent Neural Network (CRNN) for reliable text recognition. Our model is further refined by
incorporating the Tesseract OCR engine, enhancing its applicability in Optical Character Recognition
(OCR) tasks. The methodology is augmented by NLP techniques and extended through the Open Food
Facts API (Application Programming Interface) for database population and text-only label prediction.
The CRNN model is trained on encoded labels and evaluated for accuracy on a dedicated test set.
Importantly, our approach enables visually impaired individuals to access essential information on
product labels, such as directions and ingredients. Overall, the study highlights the efficacy of deep
learning and OCR in automating label extraction and recognition.
KEYWORDS
Optical Character Recognition (OCR); Machine Vision; Machine Learning; Convolutional Recurrent
Neural Network (CRNN); Natural Language Processing (NLP); Text Recognition; Text Classification;
Product Labels; Deep Learning; Data Extraction.
1. INTRODUCTION
Automating visual inspection on product labels offers several important advantages and benefits,
which make it a valuable and essential process in various industries. Automated inspection
systems can process large volumes of product labels quickly and consistently, reducing the need
for manual labor and accelerating production processes. An efficient system can identify defects,
inconsistencies, and deviations in labels, ensuring that products meet regulatory standards and
quality requirements. Along the same lines, Optical Character Recognition (OCR) technology is
widely used to recognize and extract text and characters from images, allowing for the automated
processing and analysis of textual information on various objects, including product labels,
documents, and images.
In this research paper, the focus is on addressing a significant societal challenge related to manual
data extraction from product labels. This process is known to be tedious and error-prone, and it poses
accessibility problems for visually impaired individuals. The issue is compounded by the vast
array of variations in product labels, including differences in design, colors, text fonts, and sizes,
as well as variations in lighting conditions during image capture [1-4]. Traditional OCR methods
have been limited in their ability to handle these variations effectively. However, the initiative by
researchers in [2] underscores a significant advancement in automatic information extraction
from cylindrical prescription labels, with potential applications to assist the elderly and
individuals with visual disabilities. To address these challenges, this research introduces an
enhanced, automated OCR system that leverages advanced machine vision and learning methods.
The proposed solution makes use of a Convolutional Recurrent Neural Network (CRNN) model,
combining the feature extraction capabilities of Convolutional Neural Networks (CNNs) with the
sequence recognition strengths of Recurrent Neural Networks (RNNs).
This model is further strengthened by integrating the Tesseract OCR engine, improving
its robustness and accuracy [5]. Advanced image processing techniques are applied for efficient
preprocessing and noise reduction, while Natural Language Processing (NLP) techniques are
integrated to enhance context understanding and semantics.
The implementation of this research has wide-ranging implications across various industries. In
healthcare, it can help prevent medication errors by accurately extracting and recognizing
medication labels [6]. Food safety organizations can efficiently monitor product ingredients,
allergens, and expiration dates, and consumer goods companies can streamline product
information management [7, 8]. Although many recent approaches utilizing deep learning models
like CNNs and RNNs have shown promise, they have not fully integrated computer vision and
NLP techniques [9 - 12]. Importantly, Optical character recognition (OCR) systems provide
persons who are blind or visually impaired with the capacity to scan printed text and then have it
spoken in synthetic speech or saved to a computer file [13].
Our research also extends data analysis through the Open Food Facts API, enriching the database
and feeding a text-only RNN model for improved label prediction. This comprehensive
approach not only revolutionizes the field of text recognition but also makes significant
contributions to OCR applications. The research aims to offer a solution that automates
information extraction from product labels and does so with exceptional efficiency and accuracy.
In summary, this research presents an enhanced solution that effectively integrates computer
vision and NLP techniques to automate the extraction and recognition of textual information from
product labels, thereby addressing the limitations of existing OCR methods and enhancing the
efficiency and accuracy of the process.
2. LITERATURE REVIEW
The challenge of OCR, particularly in unconstrained environments like product labels, has been
a focal point in the domain of computer vision and text recognition. Traditional OCR
methods have been found to have limitations when dealing with text in such unconstrained
environments due to geometrical distortions, complex backgrounds, and diverse fonts [15]. One
of the significant steps in this domain has been the introduction of CRNN which combines the
feature extraction capabilities of CNNs with the sequence recognition capabilities of RNNs [16].
Also, the arrival of deep learning has brought new methodologies that aim to tackle OCR
challenges more effectively. However, as Ignat et al. (2022) point out, these methods, particularly
end-to-end transformer OCR models, require substantial amounts of annotated data, which poses
challenges for low-resource languages. They emphasize the necessity of OCR in improving
machine translation for these languages, highlighting a critical gap in OCR research [17].
Further advancements have been seen with the integration of
external OCR engines like Tesseract, which when combined with deep learning methodologies,
have shown promise in enhancing OCR capabilities [18]. These integrated models aim to
enhance the accuracy and efficiency of text recognition in OCR systems. Moreover, the behavior
of transformer models like BERT in language processing has been a subject of recent research.
Jawahar et al. (2019) investigated what BERT learns about the structure of language, providing
valuable insights into the model's linguistic capabilities [19]. Additionally, Guarasci et al. (2022)
conducted a computational experiment on the syntactic transfer capabilities of BERT in Italian,
French, and English, underscoring the adaptability of transformer models across various
languages [20].
The application of deep learning techniques for text detection and extraction has also been
explored, with algorithms like EAST being utilized to transform text from images or scanned
documents into machine-readable form [21]. Furthermore, OCR is identified as an important
branch in the field of machine vision, encompassing pattern recognition, image processing,
digital signal processing, and artificial intelligence, showcasing the multi-dimensional facets of
OCR technologies [22]. Moreover, the evolution of text recognition software has revolutionized
text extraction processes, presenting new techniques and technologies that continue to advance
the field of OCR. This integration aims to provide a robust solution to text extraction and
recognition challenges posed by diverse and complex imagery found on product labels. In fact,
the visual online detection method is proposed to ensure the label quality in real time during
industrial production, indicating the practical application of OCR in industrial settings [23].
Additionally, there is a growing interest in the application of OCR in various industry sectors. For
instance, a deep learning approach to drug label identification has been proposed to address
challenges in drug control and distribution, ensuring the provision of safe and approved drugs to
consumers and healthcare professionals [23, 24].
Despite the maturity of optical character recognition (OCR) technology, reading data from
displays such as seven-segment and LCD screens still poses difficulties that illustrate the ongoing
challenges in OCR [25]. Also, in terms of precision, a system designed for OCR is
noted to be capable of interpreting captured images of hard disk drive and solid-state drive labels
with high accuracy, emphasizing the progression towards higher accuracy in OCR systems [26 -
27]. Along the same lines, [28] proposes a camera-based assistive text reading
framework to help blind persons read text labels and product packaging on hand-held objects
in their daily lives, underlining the societal impact and accessibility improvements offered by
OCR technologies. In recent years, there has been a significant shift away from traditional
OCR methods toward more integrated and advanced approaches that utilize deep learning and
external OCR engines to tackle the inherent challenges in text recognition and extraction. This
paper aims to improve the extraction of information from product labels, an important aspect for
industries like healthcare, food safety, and consumer goods. Conventional optical character
recognition (OCR) techniques, which are predominantly rule-based and depend on manually
created features and heuristic rules, have demonstrated their limitations, particularly when it
comes to managing intricate document layouts, one-of-a-kind fonts, and complex formatting,
which may lead to the extraction of crucial data being lost. In industries like finance, law,
healthcare, and government, where documents frequently have different formatting requirements,
this problem is especially significant [29]. Due to the variation in handwriting styles and quality,
these methods also have trouble identifying handwritten text. This can result in mistakes and
inaccuracies, which presents serious problems in situations where handwritten documents are
common [30].
Another significant drawback of traditional OCR is its reliance on image quality; noise
introduced by low-resolution images or dim lighting can make it difficult to recognize characters
accurately. Because of its dependence on high-quality images, OCR is less useful when dealing
with a variety of image qualities, font variations, and layouts [31]. This is especially true in
scenarios that pose difficulties for the visually impaired. Finally, the challenges in data collection,
labeling, and regulatory compliance, particularly in sensitive sectors like finance and healthcare,
underscore the continuous need for advancements in OCR systems to handle real-world
applications more effectively [32].
3. PROPOSED METHODOLOGIES
This section provides a detailed description of the approaches used to process and analyze
product labels. It covers the methods, strategies, and algorithms employed, emphasizing the
experimental protocols as well as the logic behind the choice of each approach. The following
sections are organized as follows: section 3.1 addresses applied methodology; section 3.2
addresses machine vision technique; section 3.3 addresses techniques comparison; section 3.4
addresses similar applications and the limitations.
3.1. Applied Methodology
Figure 1 provides an overview of the comprehensive workflow of the system, covering the entire
process from initial data collection to the final assessment of the model. The creation of an
efficient automated system for processing and analyzing product labels employed a structured
methodology, as depicted in Figure 1. Initially, approximately fifty product label photos from
diverse industries were manually selected using an iPhone 11 Pro Max (Apple Inc., Cupertino,
California, USA), encompassing both '.jpg' and '.png' formats. To augment and enhance the
dataset, additional textual data was obtained from the Open Food Facts API using a free
developer account. Integration of this API data was crucial to broaden the dataset's scope,
specifically incorporating a diverse range of non-standardized product labels. This approach
facilitated the analysis of a more extensive dataset, comprising at least 500 lines, thereby
enhancing the depth of the study.
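As a concrete illustration, the API retrieval step might look like the sketch below. The endpoint and field names follow the public Open Food Facts read API, but the helper names and the specific fields kept are our assumptions; the paper does not list the fields it retrieved.

```python
import requests

API_URL = "https://world.openfoodfacts.org/api/v0/product/{}.json"

def extract_text_fields(product):
    """Keep only the textual fields used for labeling (assumed selection)."""
    return {
        "name": product.get("product_name", ""),
        "brand": product.get("brands", ""),
        "ingredients": product.get("ingredients_text", ""),
        "countries": product.get("countries", ""),
    }

def fetch_product_text(barcode, timeout=10):
    """Fetch one product record; return its text fields or None if not found."""
    resp = requests.get(API_URL.format(barcode), timeout=timeout)
    resp.raise_for_status()
    payload = resp.json()
    if payload.get("status") != 1:  # Open Food Facts uses 1 for "found"
        return None
    return extract_text_fields(payload["product"])
```

Looping such calls over a barcode list would yield the 500-plus lines of textual data the study describes.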
Figure 1. Overall System Overview
Subsequently, Tesseract OCR was employed for optical character recognition to extract text from
the images. In parallel, a PyTorch model was devised to process these images and extract
considerable information. Prior to feeding the images into the PyTorch model, image
preprocessing steps were undertaken to enhance image quality making them conducive for the
model. Furthermore, the text data embedded within the product labels required standardization,
hence several text preprocessing techniques were applied using Natural Language Processing
(NLP) methods. The methodology included the transformation of all text to lowercase to
maintain consistency, removal of punctuation to enhance focus on words and meaningful content,
and the elimination of common stop words such as 'and', 'the', 'in' to reduce noise and augment
the relevance of the information extracted.
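The text preprocessing steps just described (lowercasing, punctuation removal, stop-word filtering) can be sketched as follows. The stop-word list here is a small illustrative subset, not the one used in the paper; a full list such as NLTK's could be swapped in.

```python
import string

# Illustrative stop words; the paper cites 'and', 'the', 'in' as examples.
STOP_WORDS = {"and", "the", "in", "of", "a", "an", "to", "for", "with"}

def preprocess_label_text(text):
    """Lowercase, strip punctuation, and drop common stop words."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)
```

For example, `preprocess_label_text("Keep in a COOL, dry place.")` yields `"keep cool dry place"`.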
To enhance the processing speed, parallel threading was employed during image processing.
Furthermore, Figure 2 shows a visualization of the product categories to further clarify the nature
of the dataset derived post image processing. The data was then categorized into distinct features
like 'product_type', 'directions', 'supplements_or_elements', and 'warnings', based on predefined
keywords. For instance, the 'product_type' feature was discerned by matching keywords related to
groceries, skincare, or medication. Moreover, based on the length of the extracted text, the
corresponding image was categorized either in the 'extracted_images' or the
'not_extracted_images' directories.
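A minimal sketch of the keyword-based categorization might look like the following. The keyword sets and function name are illustrative assumptions; the paper does not publish its exact keyword lists.

```python
# Hypothetical keyword sets for the paper's categories; the real lists
# used by the authors are not given in the text.
CATEGORY_KEYWORDS = {
    "food": {"flour", "sugar", "milk", "wheat", "snack", "juice"},
    "medication": {"tablet", "capsule", "dose", "ibuprofen", "mg"},
    "skincare": {"lotion", "moisturizer", "spf", "cream", "serum"},
}

def categorize_product(text):
    """Assign a category by keyword match; 'unknown' if nothing matches."""
    words = set(text.lower().split())
    for category, keywords in CATEGORY_KEYWORDS.items():
        if words & keywords:  # any shared keyword selects the category
            return category
    return "unknown"
```

A description such as "Whole wheat flour blend" would map to "food", while text with no matching keywords falls through to "unknown", mirroring the ambiguous cases mentioned later in the paper.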
As shown in Figure 2, a step further was taken to group the products into four categories: 'food',
'medication', 'skincare', and 'unknown', based on keywords associated with each category
found in the product's name or ingredients. In addition, each row of the frame presents an image
reference, its corresponding category, also the product description or ingredients list and other
information extracted.
Figure 2. Data frame representation
After preprocessing, the processed images and the textual information taken from the Open Food
Facts API were combined to create a dataset, as shown in Figure 3. Along the same lines, the
visualization provides a concrete illustration of the process of implementing keyword-based
product categorization as well as the initial difficulties encountered when selecting a consistent
dataset. The Data Frame structure facilitates more efficient data processing and analysis in
accordance with the objectives of the study. Following the classification, data cleaning steps were
performed to handle missing values, format inconsistencies, and other anomalies, leading to a
cleaned and structured dataset converted into a Data Frame. Figure 3a shows the total quantity of
products under each category using a bar chart, with "groceries" having the higher number of
products. A donut chart in Figure 3b, shows the percentage distribution of these product types.
This visual representation highlights the frequency of product types in our dataset and facilitates
comprehension of its composition.
a. bar chart representing total quantity of products b. products percentage distribution.
Figure 3. Dataset Visualization
The data frame was improved by using data from product labels that were retrieved through the
Open Food Facts API. Complete product details, such as names, brands, ingredients, and
countries of origin, were included in the information. Moreover, the abundance of descriptors in
the label text extracted through the API improved the keyword-based product grouping system. A
comprehensive dataset comprising five hundred lines was analyzed, as illustrated in Figure 3,
leading to a more precise classification of products into categories like
"groceries," "dairy," "beverages," "medication," and "skincare," as well as accounting for
"unknown" classifications in situations where the data was ambiguous or not present. The
histogram and pie chart, which present the number and percentage distribution of product types
graphically, respectively, demonstrate the diversity and size of the dataset made possible by this
integration. Finally, using the Open Food Facts API improved the quantity and quality of
available data, allowing more informed decisions to be made. The machine learning
implementation included three neural network architectures: CNN (Convolutional Neural
Network), RNN (Recurrent Neural Network), and CRNN (Convolutional Recurrent Neural
Network). The CNN is primarily responsible for processing image data.
In addition, the images of product labels are first preprocessed and then fed into the CNN. This
network extracts features from the images, which are crucial for understanding visual patterns
in the product labels. The RNN, in turn, processes sequential data such as the textual content
extracted from the product labels. The sequential nature of RNNs makes them well suited for text,
as they consider the order and context of words. Lastly, the CRNN combines the strengths of both:
it takes the image features extracted by the CNN and the text processed by the RNN to perform a
comprehensive analysis of the product labels. This approach enables the model to understand both
the visual and textual content of the labels, enhancing the accuracy of the classification or
analysis task.
PyTorch was used to create a deep learning model for classifying product labels. A LabelEncoder
was used to encode the text labels, and CrossEntropyLoss with class weights served as the loss
function. The performance of the CRNN model on unseen data was assessed after the training
phase using a testing DataLoader. Additionally, the database enrichment enabled the development
of an RNN implementation that accepts text input only. We leveraged a text-only Recurrent
Neural Network (RNN) model, combining data from image extractions and the Open Food Facts
API into a single DataFrame. Our RNN and Gated Recurrent Unit (GRU) models, built with
PyTorch, processed the textual information with an embedding layer followed by recurrent layers
to accommodate sequential data. These models utilized LSTM and GRU units to better capture
dependencies in the text, which is crucial for classification tasks.
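As a sketch of the text-only pipeline described above, the following PyTorch model pairs an embedding layer with a GRU and a linear head, trained with class-weighted CrossEntropyLoss. All layer sizes, vocabulary size, and weight values here are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class TextGRUClassifier(nn.Module):
    """Minimal text-only classifier: embedding -> GRU -> linear head."""
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_classes):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)   # (batch, seq_len, embed_dim)
        _, hidden = self.gru(embedded)         # hidden: (1, batch, hidden_dim)
        return self.fc(hidden.squeeze(0))      # (batch, num_classes)

# Class-weighted loss for imbalanced label categories (weights are illustrative).
class_weights = torch.tensor([1.0, 2.0, 1.5, 1.0, 3.0])
criterion = nn.CrossEntropyLoss(weight=class_weights)

model = TextGRUClassifier(vocab_size=5000, embed_dim=64, hidden_dim=128, num_classes=5)
logits = model(torch.randint(1, 5000, (4, 20)))  # batch of 4 sequences, length 20
loss = criterion(logits, torch.tensor([0, 1, 2, 3]))
```

In practice the token IDs would come from the tokenized label text and the targets from the encoded product categories.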
To assess the model's classification performance, evaluation metrics such as accuracy and
precision were calculated. The weights of the trained CRNN model were stored in a file called
"crnn_model_new.pth" for later reuse. Finally, the model underwent several training phases at
different learning rates. The performance metrics for each learning rate were stored in a metrics
dictionary and visualized using seaborn to elucidate the relationship between learning rates and
performance, enabling a comprehensive understanding of the model's behavior and performance
under different conditions.
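The learning-rate sweep can be organized as sketched below; the `train_and_evaluate` stub and its returned numbers are placeholders for a real training run, and the seaborn call is shown only as a comment.

```python
# Hypothetical sweep: train at each learning rate and record metrics in a dictionary.
learning_rates = [1e-2, 1e-3, 1e-4]

def train_and_evaluate(lr):
    """Stand-in for a real training run; returns illustrative metrics only."""
    return {"accuracy": 0.90 - abs(lr - 1e-3) * 10, "precision": 0.88}

metrics = {lr: train_and_evaluate(lr) for lr in learning_rates}

# The resulting dictionary can then be visualized, e.g. with seaborn:
# sns.lineplot(x=list(metrics), y=[m["accuracy"] for m in metrics.values()])
```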
3.2. Machine Vision Techniques
To enhance the OCR process, we incorporated machine vision techniques. Image preprocessing
techniques such as image normalization, denoising, and contrast enhancement were applied to
improve the quality of label images. Additionally, image analysis algorithms, including contour
detection and segmentation, were employed to isolate and extract individual characters or text
regions from the labels.
Data Extraction Algorithm
The OCR algorithm takes a single image and returns the extracted information in these ordered
steps:
1. Image Loading: The image is loaded from the specified path using OpenCV.
2. Color Conversion: Convert the image from BGR to RGB color space.
3. Image Resizing: Resize image proportionally to standardize the images, which helps in
normalizing the variations in dimensions.
4. LAB Color Space Conversion and Splitting: The resized image is converted to the Lab color
space, and the channels are split. This space is chosen for its ability to separate grayscale
information from color data.
5. CLAHE: Apply CLAHE to the L channel to improve contrast. Merge enhanced L with A and
B channels.
6. RGB Conversion and Sharpening: Convert LAB image back to RGB. Sharpen the RGB
image.
7. Grayscale Conversion and Thresholding: Convert sharpened image to grayscale. Otsu’s
thresholding is applied to binarize the image, simplifying the detection of text regions.
8. Black/White Ratio Calculation: Calculate ratios of black and white pixels.
9. Image Inversion: If the image is mostly black or white, invert the binary image.
10. Contour Detection and Extraction: For color images, dilate the image, find contours, and
detect text regions. Extract contour with the maximum area and crop the image.
11. Text Extraction: Extract text from the cropped image using Pytesseract.
12. Data Extraction and Image Saving: Extract data from the text and save images based on text
length.
13. Exception Handling: Handle any errors during image processing and log the error message.
The settings for filters, such as the Contrast Limited Adaptive Histogram Equalization (CLAHE),
and threshold values, are chosen based on their proven effectiveness in similar image processing
tasks. Specifically, the CLAHE settings are tuned to enhance the contrast of the L channel in the
LAB color space without amplifying noise, which is critical for improving OCR accuracy on
unevenly lit images. The clip limit and tile grid size parameters for CLAHE are selected to
balance contrast enhancement and detail preservation. For binary thresholding, Otsu's method is
employed due to its automatic calculation of the optimal threshold value, which separates the
pixel intensity distribution into two classes, foreground and background. This method is
particularly effective for document images where there is a clear distinction between text
(foreground) and the rest of the image (background). Regarding color space notation, the LAB
color space is properly written as CIELAB or L*a*b*, reflecting the three axes of the color space:
L* for lightness and a* and b* for the color-opponent dimensions. Using the correct notation is
not just a matter of terminological accuracy; it also ensures adherence to established standards in
the image processing and color science literature.
Figure 4 showcases a few sample images after preprocessing and machine vision techniques on
label images. These three sample images, ranging from a mouthwash product, a medicinal label,
to a nutritional fact sheet of ground beef, depict the potential challenges faced during OCR
extraction, such as complex backgrounds, warped perspectives, and varying text densities.
a: Mouthwash label b: Vaseline label c: Food label
Figure 4. Preprocessed Images.
Figure 4a (Mouthwash Label) is likely to be full of text and various symbols, including
certification marks which could complicate text recognition. The complexity comes from both the
amount of textual information and the variety of formats such as logos, small print, and
potentially decorative elements in the background that are common on personal care products.
Figure 4b (Vaseline Label) presents a different challenge due to the curvature of the surface on
which it is attached. Labels on curved surfaces can cause distortions in the text, making it harder
for OCR to accurately read the information. The label's text may be stretched or compressed due
to the shape of the container, and the lighting in the image may create reflections or shadows that
interfere with text detection. Lastly, Figure 4c (Food Label), the nutritional information label for
ground beef, showcases the need for OCR to accurately capture numerical data and percentages.
This is crucial for consumers who rely on this information for dietary and health reasons. The
label's text density and the importance of precision in capturing the data make it a good example
of the need for advanced OCR techniques, which may include steps like color conversion and
contour detection to improve text extraction accuracy.
Each label requires a nuanced approach to OCR pre-processing to handle issues like complex
backgrounds, warped perspectives from curved surfaces, and high text density. These examples
highlight the necessity of a multi-step algorithmic method. As mentioned, the algorithm includes
various stages such as image loading, pre-processing steps like color conversion, grayscale
conversion, noise removal, and more sophisticated steps like contour detection and perspective
transformation to handle warped text. Exception handling is also crucial to ensure that anomalies
or unexpected issues don't lead to incorrect text extraction.
3.3. Natural Language Processing (NLP) in Product Labels
The NLP framework within our methodology utilizes an ensemble of techniques to interpret and
categorize the text extracted from product labels. The process starts with the application of a
lemmatizer, which reduces words to their base or dictionary form. This is particularly useful in
handling the various inflected forms of words encountered in product descriptions, ensuring that
'runs', 'running', and 'ran' are all analyzed as 'run'.
Stop words, typically the most common words in a language that carry minimal unique
information (like 'the', 'is', 'at'), are filtered out to focus on the more meaningful content. Removal
of these words is a standard practice in NLP to reduce noise and computational load.
The text extraction process includes several customized steps:
1. Text Lowercasing: Converting all characters to lowercase to maintain uniformity and
prevent the same words in different cases from being treated as distinct.
2. Keyword Extraction: Identifying specific keywords such as 'ingredients', 'directions', and
'warnings' using a combination of string operations and regular expressions. The extraction is
sensitive to the context, searching for these terms within the relevant sections of the text.
3. Tokenization: Breaking down text into individual terms or tokens, which are then analyzed
for patterns or used as input for further processing like classification.
4. Lemmatization: Applying the lemmatizer to reduce tokens to their lemmas, thus aiding in the
generalization and reducing the complexity of the textual data.
5. Text Cleaning: Implementing a series of regular expressions to remove non-alphanumeric
characters, excess whitespace, and numbers where they are not part of the meaningful text.
This step also includes encoding and decoding operations to handle non-ASCII characters,
which can often be found in product labels due to internationalization.
6. Advanced Text Processing: Utilizing NLP libraries like TextBlob to correct spelling, which
is crucial for ensuring the accuracy of keyword-based categorization.
7. Feature Extraction: From the cleaned and tokenized text, key features are extracted based on
predefined categories, which are then used to train machine learning models. This step
involves identifying the presence of specific terms that are indicative of product types, such
as differentiating food items from skincare products.
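A simplified, dependency-free sketch of steps 1-3 and 5 (lowercasing, keyword extraction, tokenization, cleaning) follows; the stop-word list and the section-extraction regex are minimal stand-ins for the full implementation, which also applies lemmatization and spelling correction.

```python
import re

STOP_WORDS = {"the", "is", "at", "of", "and", "a", "to", "in", "for", "with"}
KEYWORDS = ("ingredients", "directions", "warnings")

def clean_and_tokenize(text):
    """Steps 1, 3, 5: lowercase, strip non-alphanumerics, tokenize, drop stop words."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # remove non-alphanumeric characters
    text = re.sub(r"\s+", " ", text).strip()   # collapse excess whitespace
    return [tok for tok in text.split() if tok not in STOP_WORDS]

def extract_sections(text):
    """Step 2: locate keyword-led sections via a simple regex over the label text."""
    sections = {}
    alternation = "|".join(KEYWORDS)
    for key in KEYWORDS:
        match = re.search(
            rf"{key}\s*:?\s*(.+?)(?=\b(?:{alternation})\b|$)",
            text, re.IGNORECASE | re.DOTALL)
        if match:
            sections[key] = match.group(1).strip()
    return sections

label = "Directions: apply twice daily. Warnings: for external use only."
tokens = clean_and_tokenize(label)
sections = extract_sections(label)
```

A real pipeline would run lemmatization on `tokens` before feature extraction, so that inflected forms collapse to a shared lemma.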
In summary, the described NLP techniques are deeply integrated into the overall architecture,
playing an essential role in the system's ability to accurately categorize and understand the
product labels. Each technique contributes to refining the text data, which is essential for the
subsequent machine learning stages.
3.4. Technique Comparison with Traditional OCR Methods
We implemented and compared our approach with traditional OCR methods, specifically LSTM,
transformer-based models, and CRNN. LSTM (Long Short-Term Memory) is a recurrent neural
network (RNN) model apt for OCR (Optical Character Recognition) tasks due to its memory cells'
ability to retain long-term information. Unlike traditional neural networks, LSTM incorporates
feedback connections, allowing it to process entire sequences of data, not just individual data
points. This makes it highly effective in understanding and predicting patterns in sequential data
like time series, text, and speech [33]. Its workflow includes preprocessing input images, text
extraction using OCR techniques like Tesseract, feature extraction through keyword matching,
and training an LSTM model using encoded label features. The model's architecture can include
embedding layers, LSTM layers, and fully connected layers. It is trained on labeled data,
validated, and evaluated for accuracy on a test set.
Transformer-based models, like those in the EasyOCR library, have shown promising results in
OCR tasks due to their self-attention mechanisms and parallel processing capabilities. The
workflow mirrors that of the LSTM approach, except for text extraction, where a transformer-
based model like EasyOCR is used. Furthermore, the end-to-end transformer-based OCR model
for text recognition described in [34] is one of the first works to jointly leverage pre-trained
image and text transformers. These models require large datasets and
computational resources for effective training, and their performance is evaluated through metrics
like accuracy, precision, and recall.
CRNN (Convolutional Recurrent Neural Network) leverages both CNN (Convolutional Neural
Networks) and RNN's strengths for image and sequence recognition. The workflow is akin to the
previous approaches, with the training involving a CNN for image feature extraction and an RNN
(such as LSTM or GRU) for sequence modeling. This combined model learns visual
representations and captures label sequence dependencies. The CRNN model is described as a
unified framework that integrates feature extraction, sequence modeling, and transcription, which
does not require character segmentation and can predict label sequences with relationships
between characters [35].
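A minimal PyTorch sketch of the CRNN idea, convolutional feature extraction followed by an LSTM reading the feature map along its width axis, is shown below. Layer counts, channel sizes, and the 37-class alphabet are illustrative assumptions, not the configuration of the paper's model.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Sketch: CNN feature extractor, then an LSTM over the width axis."""
    def __init__(self, num_classes, img_height=32):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
        )
        feat_height = img_height // 4                  # after two 2x2 poolings
        self.rnn = nn.LSTM(128 * feat_height, 256,
                           bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * 256, num_classes)      # per-timestep class scores

    def forward(self, images):                         # images: (batch, 1, H, W)
        feats = self.cnn(images)                       # (batch, 128, H/4, W/4)
        b, c, h, w = feats.shape
        seq = feats.permute(0, 3, 1, 2).reshape(b, w, c * h)  # width = time axis
        out, _ = self.rnn(seq)
        return self.fc(out)                            # (batch, W/4, num_classes)

model = CRNN(num_classes=37)            # e.g. 26 letters + 10 digits + CTC blank
scores = model(torch.randn(2, 1, 32, 128))  # two 32x128 grayscale text crops
```

Because the scores are emitted per column, a segmentation-free loss such as CTC can be used, which matches the transcription layer described in [35].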
Our comparison with conventional OCR techniques showed that transformer-based models
outperform LSTM models because of their self-attention processes, while LSTM models are
resilient when processing sequential data. Nevertheless, our results show that CRNN provides an
improved method for OCR tasks by combining the CNN's capability to extract hierarchical visual
features with the LSTM's sequential data processing. This hybrid model is the most promising for
improving OCR capabilities because it captures the subtleties of temporal and spatial correlations
in text data, and it excels at interpreting text in the varied font styles and complex backgrounds
seen in real-world settings.
3.5. Similar Applications and the Limitations
3.5.1. OCR Space website API
The OCR Space website supplies an OCR (Optical Character Recognition) API for extracting text
from images. It parses images and multi-page PDF documents and returns the extracted text in
JSON format. The API offers three tiers: Free, PRO, and PRO PDF. The Free OCR API plan is
suitable for users who want to explore and test the OCR functionality. It comes with a rate limit of
500 requests per day per IP (Internet Protocol) address to prevent unintended spamming [36].
This plan allows developers to get started without any financial commitment.
However, there are limitations associated with using the free version of the API. One limitation is
that API commands can only be issued every 600 seconds (about 10 minutes). Additionally, the
image size must be less than 1 MB to be processed. Paid plans, by contrast, offer solutions better
suited to general image OCR implementations.
It is also important to note that while the Free OCR API plan has certain limitations such as the
rate limit and file size limit of 1 MB, the PRO plans offer expanded capabilities, including larger
file size support, higher request volumes, and customizable OCR servers. Finally, the PRO OCR
API can be purchased as locally installable OCR software, providing organizations with the
flexibility to have their OCR servers set up at a location of their choice. This option is suitable for
users with specific data security or compliance requirements.
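Calling the OCR Space API typically looks like the sketch below, based on the public API documentation; the endpoint, the `apikey` field, and the JSON layout (`ParsedResults`, `ParsedText`) follow that documentation, but error handling here is deliberately minimal and the demo key is rate-limited.

```python
def ocr_space_image(path, api_key="helloworld"):
    """POST an image to the OCR Space endpoint ('helloworld' is the demo key)."""
    import requests  # third-party; only needed for the network call
    with open(path, "rb") as f:
        resp = requests.post(
            "https://api.ocr.space/parse/image",
            files={"file": f},
            data={"apikey": api_key, "language": "eng"},
        )
    return resp.json()

def extract_text(response):
    """Pull the parsed text out of the JSON structure the API returns."""
    if response.get("IsErroredOnProcessing"):
        return ""
    return " ".join(r.get("ParsedText", "")
                    for r in response.get("ParsedResults", []))

# Example response shape (abridged), as documented by the service:
sample = {"IsErroredOnProcessing": False,
          "ParsedResults": [{"ParsedText": "Drug Facts"}]}
text = extract_text(sample)
```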
3.5.2. MMOCR
MMOCR is an OCR toolbox that provides a comprehensive set of tools for optical character
recognition. To use MMOCR, Microsoft Visual C++ 14.0 or greater must be installed on the
system to compile the pycocotools package, a dependency for MMOCR.
The installation process for MMOCR involves installing several dependencies using the
"openmim" package manager. These dependencies include "mmengine," "mmcv" (version
2.0.0rc1 or later), and "mmdet" (version 3.0.0rc0 or later). More detailed installation instructions
can be found on the MMOCR documentation website [37].
MMOCR offers two main components for OCR: detection and recognition. It is worth noting that
while MMOCR offers high accuracy in text recognition, it has some limitations. The approach
struggles with images of different scales, often producing extremely small images; resizing the
images manually can help overcome this issue. Furthermore, according to the official MMOCR
documentation, training the model can take up to 400 training phases to reach high accuracy,
which requires a far more powerful, high-speed computer. Additionally, the MMOCR approach is
slower than other OCR methods.
In conclusion, MMOCR provides a powerful OCR toolbox with efficient detection and
recognition capabilities [37]. By following the installation instructions and utilizing the approach
examples, users can leverage MMOCR to extract text from images effectively. However, it is
important to consider the limitations, such as the need for manual resizing of images and the
slower processing speed, when using MMOCR for OCR tasks.
3.5.3. EAST and CRAFT
The implementation of text detection using the CRAFT (Character Region Awareness for Text
Detection) algorithm performs text detection on both black-and-white and color images. The
CRAFT pipeline involves several steps: text region detection using the CRAFT model, contrast
enhancement, noise removal, CLAHE for even lighting, and Otsu's thresholding. CRAFT's
accuracy surpasses other methods like CTPN, EAST, and MSER, especially in complex text
scenarios, although it may have a longer processing time than these methods [38].
Similarly, the EAST algorithm relies on a fully convolutional neural network (FCN) for text
detection. However, the EAST approach also faces issues with optimization and modularity, as
indicated by the encountered module errors in C++ from the OpenCV library. The limitations of
the EAST algorithm for product label OCR include challenges in capturing complex designs and
layouts accurately, difficulty in detecting small text or fine details, and potential interference from
background noise or clutter.
Both the CRAFT and EAST algorithms lack semantic understanding of text regions, treating all
regions equally without considering their specific context or structure. This limitation is
particularly relevant for product label OCR, where different text regions often convey diverse
types of information. To overcome these challenges, alternative approaches tailored for product
label OCR can be explored. These approaches may involve combining text detection methods that
account for the unique characteristics of product labels, employing specialized preprocessing
techniques, utilizing domain-specific models, and incorporating techniques for semantic
understanding of the text content. Customizing the OCR pipeline to address the specific
challenges of product labels can improve the accuracy and reliability of OCR results.
In conclusion, while both CRAFT and EAST offer valuable capabilities in text detection, their
limitations point to the need for further development in OCR technology. This involves
combining text detection methods tailored for specific applications, utilizing specialized
preprocessing techniques, and incorporating models that understand the semantic context of text
content [21].
3.5.4. BLOB/Boxes
The approach implements text detection using the BLOB (Binary Large Object) approach. It
includes functions for extracting bounding boxes, cropping images, applying non-maximum
suppression, rescaling boxes, and filtering boxes based on specific criteria.
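Non-maximum suppression, one of the functions listed above, can be sketched in pure Python as follows; the IoU threshold of 0.5 is an assumed default, and boxes are given as (x1, y1, x2, y2) corners.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box in each cluster of overlapping detections."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return [boxes[i] for i in kept]

# Two heavily overlapping boxes and one separate box:
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
kept = non_max_suppression(boxes, scores=[0.9, 0.8, 0.7])
```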
In BLOB detection, parameters such as binarization threshold levels, minimum and maximum
pixel height and width are set to identify the desired text regions. The choice of these parameters
significantly influences the detection process. A higher binarization threshold can lead to the
detection of more dominant white pixels, while a lower threshold might result in a larger number
of undetected regions if the image has more dominant black pixels. Adjusting these parameters
allows the developer to target specific blob characteristics, but it requires a balance to achieve
accurate detection without overwhelming the system with too many potential regions [39].
However, a major limitation of this approach is that it encounters an issue where there are more
than 1600 boxes per image, causing the algorithm to get stuck in a loop for each box and resulting
in extremely slow performance. This significantly decreases the practical usability of the BLOB
approach. To address this limitation, an alternative contour-based approach can be considered.
The contour-based approach involves detecting contours in the image and filtering them based on
size, shape, or hierarchy. This technique significantly reduces the number of regions to be
processed, leading to improved algorithm speed. The contour-based approach is more efficient in
managing complex images, as it targets specific features of the text regions, thus reducing the
likelihood of processing irrelevant or excessive data [40].
In summary, the BLOB approach provides functions for text detection but suffers from poor
performance due to the substantial number of boxes per image. To overcome this limitation,
exploring the contour-based approach and incorporating contour filtering and edge detection
techniques can offer a more efficient and accurate text detection solution.
3.5.5. Google Vision API
The Google Vision API is a notable possibility for label OCR, offering powerful capabilities for
text recognition. It utilizes advanced machine learning models to accurately extract and interpret
text from images, making it an excellent choice for various OCR tasks, including label extraction
from product images.
The API incorporates optical character recognition, natural language processing, and computer
vision algorithms, ensuring reliable and accurate results. It handles complexities such as different
fonts, sizes, orientations, and backgrounds, providing a comprehensive solution for label OCR
[41].
However, there are important considerations when opting for the Google Vision API. Firstly, it is
a paid service, requiring users to create a developer account and adhere to Google’s pricing plans.
The costs associated with using the API depend on factors such as the volume of image requests
and the specific features utilized. This aspect of implementation may pose challenges for
researchers without technical expertise or limited development resources. Additionally, utilizing
the Google Vision API requires substantial computational resources. The API relies on cloud-
based infrastructure and powerful computing resources to process and analyze images effectively,
potentially resulting in additional computational expenses, particularly for high-volume or real-
time applications.
Despite these considerations, the Google Vision API remains a reliable choice for label OCR due
to its robust capabilities and high accuracy. It offers advanced features and comprehensive tools
for extracting text from images. Users need to be aware of the associated costs, the need for a
developer account, and the potential computational expenses involved when considering the
Google Vision API as an OCR solution. Evaluating these factors will enable users to make
informed decisions based on their specific requirements and available resources.
In summary, the comparison of OCR tools in Table 1 provides an overview of various OCR
technologies, highlighting their key features and limitations. Furthermore, this table serves as a
quick reference for understanding the strengths and weaknesses of each OCR tool, aiding in
informed decision-making for specific OCR tasks.
Table 1. Comparison of OCR Tools

| OCR Tool          | Key Features                                   | Limitations                                                |
|-------------------|------------------------------------------------|------------------------------------------------------------|
| MMOCR             | Multi-model OCR, modular, OpenMMLab ecosystem  | Struggles with scale, high computational need              |
| EAST              | Convolutional network, complex layout handling | Limited semantic understanding, modularity issues          |
| CRAFT             | Character-level detection, text region focus   | Similar limitations as EAST                                |
| BLOB/Boxes        | Bounding box extraction, image manipulation    | Poor with many regions, slow                               |
| OCR Space API     | User-friendly, free and paid versions          | Rate/file size limits on free tier, paid for full features |
| Google Vision API | Advanced ML models, versatile OCR              | Paid service, high computational resources                 |
4. EXPERIMENTAL RESULTS
4.1. Text Recognition and Classification
In this section, we analyze the performance of the OCR system in extracting and classifying
textual data from two sample product labels.
4.1.1. Analysis of Eye Drop Product Label
The processed image shown in Figure 5 (the eye-drops label) demonstrates how the
OCR result offers several insights. The image showcases a label from a bottle of eye drops. The
label contains multiple sections providing varied information, a list of active ingredients and their
respective purposes, as well as warnings about potential contraindications and side effects.
Throughout the label, text fonts and sizes vary to emphasize certain sections over others. The
main extracted text demonstrates how the OCR system can recognize important parts like “Drug
Facts,” “Active ingredients,” “Uses,” “Warnings,” and “Directions.” In addition, the system
classifies information, demonstrating its contextual comprehension skills, going beyond mere text
extraction.
For instance, “Instill 1 or 2 drops in the affected eye(s) as needed” is correctly identified as the
usage instructions, though with a few small errors such as a spurious “m” prefix. The product
category is appropriately identified as “medication.” The ‘supplements_or_elements’ label for the active
substances suggests room for improving the naming conventions. The important user
warning instructions are retrieved with small distortions and spurious prefixes such as “m” and
“Mm”. Lastly, the file path of the image was also captured by the system, information that
may be useful for traceability and further research.
The retrieved text shows how well the system identified the main content. Still,
the presence of distorted words, for example “LUDriCant” rather than “Lubricant”, indicates
areas where the OCR still needs refinement. If stray letters like “m” appear before
certain words, there may be noise in the image, or the algorithm may need further tuning to
remove these prefixes. Correctly classifying parts such as ‘directions’, ‘warnings’, and
‘product_type’ indicates that the algorithm comprehends the context of the text.
Figure 5. Eye Drop Label.
4.1.2. Analysis of CeraVe Daily Moisturizing Product Label
The “CeraVe Daily Moisturizing Lotion” label was subjected to the Optical Character
Recognition (OCR) method. First, the product’s title and description: “CeraVe Daily
Moisturizing Lotion” was clearly identified by the OCR. Additionally, the algorithm found a
significant description, which highlights the lotion’s special formulation created in collaboration
with specialists and its effectiveness in repairing the skin barrier. Figure 6 presents a label from a
CeraVe Daily Moisturizing Lotion bottle. It begins with the product name prominently displayed,
followed by a brief description emphasizing its unique formula developed with dermatologists
The middle section provides detailed directions for the lotion’s application and a comprehensive
list of ingredients. Icons and varied text formatting are used to visually distinguish different
sections of the label.
Furthermore, the algorithm effectively highlighted several of the product’s characteristics,
including the MVE Delivery Technology, its oil-free formulation, the presence of hyaluronic
acid, the absence of fragrance, and its allergy-tested status. The algorithm also did a
commendable job of extracting the lotion’s application recommendations.
However, the algorithm is not perfect, as a few minor issues show: “Avoid direct contact eye”
was meant to read “Avoid direct contact with eyes,” a small misrecognition.
A comprehensive list of all the recognized ingredients in the lotion’s recipe was provided.
Characters weren’t always appropriately distinguished or divided, though. For instance,
“CERAMIDE CERAMIDE CERAMIDE EOP” felt like an over-extraction, and
“CAPRYLICCAPRIC TRIGLYCERIDE” should ideally be “CAPRYLIC/CAPRIC
TRIGLYCERIDE.”
Figure 6. CeraVe Daily Moisturizing Lotion Label.
In conclusion, the OCR system demonstrated admirable performance in text classification
and recognition on these sample product labels. While a significant amount of data was
successfully recovered, the presence of minor errors suggests that further improvements are
required in noise reduction, special-character identification, and contextual comprehension.
4.2. Online Platform Comparison
When comparing OCR systems, our custom algorithm and OnlineOCR.net both demonstrated
remarkable effectiveness in extracting text from images, each with certain distinctive
characteristics. Both solutions captured and transcribed significant portions of the text in the
source image. Figure 7 shows the words extracted by the two algorithms. The test image is a
photo taken with an iPhone 11 Pro Max (Apple Inc., Cupertino, California, USA), and the
product name is ‘Ice Spice Deodorant’.
Figure 7. OCR Accuracy Evaluation.
Figure 7 illustrates the efficiency of our OCR method compared with the OnlineOCR.net
solution, based on a target picture of the “Ice Spice Deodorant” label; the transcription results
from both platforms are shown. Visual examination reveals that both methods capture a sizable
percentage of the text, but with varying degrees of reliability and completeness. In terms of data
volume, both OCR systems were able to
successfully extract a considerable amount of text from the image. Even though OnlineOCR.net
gathered more specifics, our proprietary OCR algorithm proved competent by drawing out most
of the text components. Notably, both solutions precisely identified and transcribed crucial
segments such as product details, usage instructions, warnings, and ingredient lists.
Despite variances in extraction precision, it’s noteworthy that our proprietary OCR algorithm
showcased significant efficiency. For instance, text strings like “48 HOUR ODOR
PROTECTION,” “DIRECTIONS: Twist up product. Apply to underarms only, Use daily for best
results,” and “INGREDIENTS: DIPROPYLENE GLYCOL, WATER, PROPYLENE GLYCOL,
SODIUM STEARATE, POLOXAMINE 1307, FRAGRANCE, PPG-3 MYRISTYL ETHER,
TETRASODIUM EDTA, VIOLET 2, GREEN 6” were accurately transcribed by both OCR
platforms, indicating that our proprietary OCR algorithm was able to match the precision of
OnlineOCR.net in specific aspects.
Regarding output presentation, both solutions delivered results in a format that was easy to
read and accessible. Although there is room to refine the structure of our proprietary OCR
algorithm’s output, it demonstrated comparable performance in generating a user-friendly result.
To conclude, the proprietary OCR algorithm showed impressive capability in text extraction from
images, matching OnlineOCR.net along several dimensions. Its accuracy in transcribing certain
vital text components was particularly notable, illustrating the considerable efficacy of
our algorithm. While there were instances where OnlineOCR.net showed a minor advantage, the
comparison emphasized the potential of our custom solution and highlighted areas for
improvement in our OCR system, such as enhanced error management, fine-tuned pre-processing
methodologies, and post-processing spelling corrections. Despite slight differences when
contrasted with a tool like OnlineOCR.net, our OCR algorithm stands as a robust, general-purpose
solution.
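One of the improvement areas noted above, post-processing spelling corrections, can be sketched as a vocabulary lookup: each recognized token is snapped to the closest entry in a lexicon of expected label terms. The lexicon below is a hypothetical illustration, not the one used in our system; in practice it could be populated from a resource such as the Open Food Facts ingredient taxonomy.

```python
import difflib

# Hypothetical lexicon of terms expected on product labels (illustrative only).
VOCABULARY = ["DIRECTIONS", "INGREDIENTS", "WARNING", "PROPYLENE", "GLYCOL",
              "FRAGRANCE", "SODIUM", "STEARATE", "WATER"]

def correct_token(token: str, vocab=VOCABULARY, cutoff=0.8) -> str:
    """Replace an OCR token with its closest vocabulary entry if one is
    similar enough; otherwise keep the token unchanged."""
    matches = difflib.get_close_matches(token.upper(), vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else token

def postprocess(text: str) -> str:
    """Apply token-level spelling correction to a whole OCR transcription."""
    return " ".join(correct_token(t) for t in text.split())
```

With this scheme an OCR slip such as “INGREDENTS” is corrected to “INGREDIENTS”, while tokens with no close vocabulary match pass through untouched.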
4.3. Implementation on the Web
To further demonstrate the system in practice, “LabelOCRWebsite” serves as an interactive
interface allowing users to upload images of product labels and retrieve the extracted textual information
therein. The source code and assets for this platform are publicly accessible on GitHub under the
repository “LabelOCRWebsite”.
The “LabelOCRWebsite” is structured as a Flask application, integrating HTML and Bootstrap
for the front-end design, thereby ensuring a user-friendly and intuitive interface. The back-end
logic, written in Python, harnesses the OCR model to process uploaded images, extract textual
data, and present the results to the user in a readable format. This platform exemplifies a real-
world application of the OCR system, showcasing its potential in addressing the challenges of
manual data extraction from product labels.
Furthermore, to provide a comprehensive understanding and an easy-to-follow guide to how the
OCR system was developed and implemented, a Google Colab notebook was prepared. The
notebook is intended to offer a step-by-step walkthrough of the OCR model’s training, evaluation,
and deployment process, alongside relevant code snippets and explanations, serving as an
educational resource that supports the reproducibility of the research and offers insight into the
methodologies employed.
The synergy between the “LabelOCRWebsite” platform and the Google Colab notebook
presents a holistic approach: the research not only proposes a solution to a societal challenge but
also ensures that the community can access, learn from, and build upon it. The interactive
platform demonstrates the practical utility of the OCR system, while the Colab notebook
provides a detailed exposition of the technical underpinnings and procedures involved in
realizing this solution.
5. CONCLUSIONS
This research marks a significant advancement in the domain of Optical Character Recognition
(OCR), specifically in the context of product label data extraction. By integrating a Convolutional
Recurrent Neural Network (CRNN) model, this study bridges the gap between conventional OCR
methods and the demands of modern data processing.
The CRNN model, integrating the feature extraction process of CNNs with the sequence
recognition strength of RNNs, demonstrated promise, achieving a peak validation accuracy of
47.06% over ten epochs. However, this performance level indicates that there is substantial room
for optimization and improvement. The intricacies of OCR tasks, especially in diverse and
complex environments like product labels, necessitate ongoing advancements and refinements in
model architecture and training methodologies.
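CRNN architectures of this kind are conventionally trained with CTC loss, as in the original CRNN work [35]. At inference time, greedy CTC decoding collapses the network’s per-frame predictions into a text string by merging repeated labels and dropping blanks. The following is a minimal sketch; the character set is illustrative, not our model’s actual vocabulary.

```python
def ctc_greedy_decode(frame_labels, blank=0,
                      charset="-ABCDEFGHIJKLMNOPQRSTUVWXYZ"):
    """Collapse per-frame argmax labels from a CRNN into text:
    merge consecutive repeats, then drop the blank symbol (index 0 here)."""
    decoded = []
    prev = None
    for lbl in frame_labels:
        if lbl != prev and lbl != blank:
            decoded.append(charset[lbl])
        prev = lbl
    return "".join(decoded)
```

For example, the frame sequence I, I, blank, C, C, blank, E decodes to “ICE”, while a blank between two identical labels preserves genuine double letters.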
The customized approach to OCR, leveraging the renowned Tesseract OCR engine, enhances the
system’s text recognition capabilities. The integration of NLP techniques acts as a catalyst,
further boosting the model’s overall efficiency and performance. Another groundbreaking facet
of this research is the use of the Open Food Facts API. The incorporation of data from the Open
Food Facts API provided a multi-dimensional approach to text extraction, ensuring a more
comprehensive and accurate analysis of product labels. This approach underscores the potential
of integrating various data sources and techniques in OCR tasks.
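Querying the Open Food Facts read API amounts to an HTTP GET on a barcode-keyed endpoint whose JSON response carries fields such as the product name and ingredient text, which can then enrich the OCR output. A minimal standard-library sketch follows; the field selection is illustrative.

```python
import json
from urllib.parse import quote

# Open Food Facts read API (v0), keyed by product barcode.
OFF_BASE = "https://world.openfoodfacts.org/api/v0/product"

def product_url(barcode: str) -> str:
    """Build the request URL for a given barcode."""
    return f"{OFF_BASE}/{quote(barcode)}.json"

def extract_label_fields(response: dict) -> dict:
    """Pull the fields used to enrich OCR output from a parsed API response.
    Field names follow the public Open Food Facts schema."""
    product = response.get("product", {})
    return {
        "name": product.get("product_name", ""),
        "ingredients": product.get("ingredients_text", ""),
    }
```

In use, the JSON would be fetched with `urllib.request.urlopen(product_url(barcode))` and passed through `json.load` before field extraction.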
While LSTM and other traditional OCR methods made significant strides, they faced challenges,
particularly with image variability. Transformer-based models, on the other hand, though
advanced, demanded substantial computational resources. The proposed CRNN model
ingeniously bridges the gap, amalgamating the strengths of both CNN and RNN architectures. In
a subsequent exploration using a GRU model, the system showed a sharp increase in its fifth
epoch, achieving a validation accuracy of 58.82%. This performance discrepancy underscores
the intricate nature of OCR and the necessity for sustained research.
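The GRU used in that follow-up experiment simplifies the LSTM to two gates, an update gate and a reset gate. A minimal NumPy sketch of one GRU step is shown below; biases are omitted for brevity, and the weight matrices are placeholder arrays rather than our trained parameters.

```python
import numpy as np

def gru_cell(x, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU time step (bias terms omitted).
    z gates how much of the new candidate state is taken;
    r gates how much of the previous state feeds the candidate."""
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sigmoid(Wz @ x + Uz @ h_prev)               # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev))   # candidate state
    return (1.0 - z) * h_prev + z * h_tilde         # interpolated new state
```

Because the new state is a convex combination of the previous state and a tanh-bounded candidate, the hidden activations stay in (-1, 1), which contributes to the stable training observed in the GRU run.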
This research not only addresses the immediate challenge of extracting data from product labels
but also sets the stage for future innovations. The implications of this work extend to critical
sectors such as healthcare and food safety. Moreover, it opens avenues for enhancing
accessibility for visually impaired individuals, highlighting the societal impact of advancements
in OCR technology. In practice, the results of this study can be implemented in various
applications where accurate and efficient text extraction from images is crucial. These include
consumer goods labeling, healthcare information systems, and automated data entry processes in
various industries. The findings also pave the way for future research, particularly in refining
deep learning models for OCR, exploring the integration of diverse data sources, and adapting
OCR technologies to different languages and contexts.
This model has the following limitations: (1) the output of the OCR technique depends on the
clarity of the image; (2) currently, the system works only on uploaded images; and (3) the
developed techniques have not been tested on labels with vertical text. In the future, the
developed systems can be integrated into a functioning application that processes real-time
video captured from a mobile phone at a suitable distance, outputs text describing the product
label, and answers user queries. The techniques developed in our previous work [2] can also be
combined to accommodate cylindrically distorted images.
6. ABBREVIATIONS
OCR: Optical Character Recognition
CNN: Convolutional Neural Network
RNN: Recurrent Neural Network
CRNN: Convolutional Recurrent Neural Network
GRU: Gated Recurrent Unit
LSTM: Long Short-Term Memory
NLP: Natural Language Processing
API: Application Programming Interface
REFERENCES
[1] Chauhan, A. Drawing Bounding Box Method in Image Processing. Medium. 2022. Available online:
https://pub.towardsai.net/drawing-bounding-box-method-in-image-processing-ec7487393cfa
(accessed on 31 July 2023).
[2] Gromova, K.; Elangovan, V. Automatic Extraction of Medication Information from Cylindrically
Distorted Pill Bottle Labels. Mach. Learn. Knowl. Extr. 2022, 4, 852-864.
https://doi.org/10.3390/make4040043
[3] Grubert, J.; Gao, Y. Mobile OCR: A Benchmark for Publicly Available Recognition Systems.
Stanford University. 2023. Available online:
https://stacks.stanford.edu/file/druid:bf950qp8995/Grubert_Gao.pdf
[4] How to Find the Bounding Rectangle of an Image Contour in OpenCV Python. Available online:
https://www.tutorialspoint.com/how-to-find-the-bounding-rectangle-of-an-image-contour-in-opencv-
python (accessed on 19 June 2023).
[5] Tesseract OCR. Tesseract Documentation. Tesseract OCR. 2023. Version 5.0.0. Available online:
https://tesseract-ocr.github.io/tessdoc/Home.html (accessed on 31 July 2023).
[6] Institute of Medicine (US), Roundtable on Environmental Health Sciences, Research, and Medicine.
Global Environmental Health in the 21st Century: From Governmental Regulation to Corporate
Social Responsibility: Workshop Summary. Washington (DC): National Academies Press (US);
2007. 5, Corporate Social Responsibility. Available from:
https://www.ncbi.nlm.nih.gov/books/NBK53982/ (accessed on 31 July 2023).
[7] Open Food Facts - World. Available online: https://world.openfoodfacts.org (accessed on 31 July
2023).
[8] OpenCV. Documentation. OpenCV. 2023. Version 4.8.0. Available online:
https://docs.opencv.org/4.x/ (accessed on 31 July 2023).
[9] PIL (Pillow). Pillow (PIL Fork) Documentation. Read the Docs. 2023. Version 9.5.0. Available
online: https://pillow.readthedocs.io/en/stable/index.html (accessed on 31 July 2023).
[10] PyTorch. PyTorch Documentation. Version 2.0.1+cpu. PyTorch. 2023. Available online:
https://pytorch.org/docs/stable/index.html (accessed on 31 July 2023).
[11] Rosebrock, A. OpenCV Text Detection (EAST Text Detector). PyImageSearch. 2018. Available
online: https://pyimagesearch.com/2018/08/20/opencv-text-detection-east-text-detector/ (accessed on
31 July 2023).
[12] Sun, H.; et al. Spatial Dual-Modality Graph Reasoning for Key Information Extraction. arXiv. 2021.
Available online: http://arxiv.org/abs/2103.14470 (accessed on 31 July 2023).
[13] "Optical Character Recognition Systems." American Foundation for the Blind. Accessed November
2, 2023. https://www.afb.org/node/16207/optical-character-recognition-systems.
[14] Li Zhang, Chee Peng Lim, Jungong Han, "Complex Deep Learning and Evolutionary Computing
Models in Computer Vision", Complexity, vol. 2019, Article ID 1671340, 2 pages, 2019.
https://doi.org/10.1155/2019/1671340
[15] Namysl, Marcin, and Iuliu Konya. "Efficient, lexicon-free OCR using deep learning." 2019
international conference on document analysis and recognition (ICDAR). IEEE,
2019. https://arxiv.org/abs/1906.01969 (accessed on 31 July 2023).
[16] Sydorenko, Iryna. "OCR with Deep Learning: The Curious Machine Learning Case." Label Your
Data, May 25, 2023. https://labelyourdata.com/articles/ocr-with-deep-learning
[17] Ignat, Oana, et al. “OCR Improves Machine Translation for Low-Resource Languages.” arXiv
preprint arXiv:2202.13274 (2022).
[18] Deep Learning Based OCR Text Recognition Using Tesseract. LearnOpenCV. Available online:
https://www.learnopencv.com/deep-learning-based-cor-text-recognition-using-tesseract (accessed on
31 July 2023).
[19] Jawahar, Ganesh, Benoît Sagot, and Djamé Seddah. “What does BERT learn about the structure of
language?.” ACL 2019-57th Annual Meeting of the Association for Computational Linguistics. 2019.
[20] Guarasci, R., Silvestri, S., De Pietro, G., Fujita, H., & Esposito, M. (2022). BERT syntactic transfer:
A computational experiment on Italian, French and English languages. Computer Speech &
Language, 71, 101261.
[21] Zhou, Xinyu & Yao, Cong & Wen, He & Wang, Yuzhi & Zhou, Shuchang & He, Weiran & Liang,
Jiajun. (2017). EAST: An Efficient and Accurate Scene Text Detector.
https://www.researchgate.net/publication/316015737_EAST_An_Efficient_and_Accurate_Scene_Text_Detector
[22] R. Mittal and A. Garg, "Text extraction using OCR: A Systematic Review," 2020 Second
International Conference on Inventive Research in Computing Applications (ICIRCA), Coimbatore,
India, 2020, pp. 357-362, doi: 10.1109/ICIRCA48905.2020.9183326.
https://ieeexplore.ieee.org/document/9183326
[23] Liu, X., Meehan, J., Tong, W. et al. DLI-IT: a deep learning approach to drug label identification
through image and text embedding. BMC Med Inform Decis Mak 20, 68 (2020).
https://doi.org/10.1186/s12911-020-1078-3
[24] Drug Label Extraction using Deep Learning. Section.io. Available online:
https://www.section.io/engineering-education/drug-labels-extraction-using-deep-learning/ (accessed
on 31 July 2023).
[25] A. Yadav, S. Singh, M. Siddique, N. Mehta and A. Kotangale, "OCR using CRNN: A Deep Learning
Approach for Text Recognition," 2023 4th International Conference for Emerging Technology
(INCET), Belgaum, India, 2023, pp. 1-6, doi: 10.1109/INCET57972.2023.10170436.
[26] V. Wati, K. Kusrini and H. A. Fatta, "Real Time Face Expression Classification Using Convolutional
Neural Network Algorithm," 2019 International Conference on Information and Communications
Technology (ICOIACT), Yogyakarta, Indonesia, 2019, pp. 497-501, doi:
10.1109/ICOIACT46704.2019.8938521.
[27] V. E. Bugayong, J. Flores Villaverde and N. B. Linsangan, "Google Tesseract: Optical Character
Recognition (OCR) on HDD / SSD Labels Using Machine Vision," 2022 14th International
Conference on Computer and Automation Engineering (ICCAE), Brisbane, Australia, 2022, pp. 56-
60, doi: 10.1109/ICCAE55086.2022.9762440.
[28] C. Yi, Y. Tian and A. Arditi, "Portable Camera-Based Assistive Text and Product Label Reading
From Hand-Held Objects for Blind Persons," in IEEE/ASME Transactions on Mechatronics, vol. 19,
no. 3, pp. 808-817, June 2014, doi: 10.1109/TMECH.2013.2261083.
[29] Tensorway. "What Is Optical Character Recognition (OCR): Its Working, Limitations, and
Alternatives." Tensorway. https://www.tensorway.com/post/optical-character-recognition
[30] Potrimba, Petru. "What is Optical Character Recognition (OCR)?" Roboflow Blog, November 21,
2023. https://blog.roboflow.com/what-is-optical-character-recognition-ocr/
[31] Tripathi, Pankaj. "Major Limitations Of OCR Technology and How IDP Systems Overcome Them."
Docsumo Blog, October 9, 2023. https://www.docsumo.com/blog/ocr-limitations
[32] Bais, Gourav. "Building Deep Learning-Based OCR Model: Lessons Learned." Neptune.ai.
September 12, 2023. https://neptune.ai/blog/building-deep-learning-based-ocr-model.
[33] Analytics Vidhya. "Introduction to Long Short-Term Memory (LSTM)." Analytics Vidhya Blog,
March 2021. https://www.analyticsvidhya.com/blog/2021/03/introduction-to-long-short-term-
memory-lstm/.
[34] Infrrd. "Transformer-Based OCR Model: How OCR Decoder Works." Infrrd AI Blog. Accessed
November 7, 2023. https://www.infrrd.ai/blog/transformer-based-ocr.
[35] B. Shi, X. Bai and C. Yao, "An End-to-End Trainable Neural Network for Image-Based Sequence
Recognition and Its Application to Scene Text Recognition," in IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. 39, no. 11, pp. 2298-2304, 1 Nov. 2017, doi:
10.1109/TPAMI.2016.2646371.
[36] a9t9 software GmbH. "Free OCR API." OCR.space. Last modified 2023. Version 3.52. Accessed
June 19, 2023. https://ocr.space/OCRAPI#local.
[37] MMOCR Contributors. (2020). OpenMMLab Text Detection, Recognition and Understanding
Toolbox (Version 0.3.0) [Computer software]. https://github.com/open-mmlab/mmocr
[38] Baek, Youngmin, Bado Lee, Dongyoon Han, Sangdoo Yun and Hwalsuk Lee. “Character Region
Awareness for Text Detection.” 2019 IEEE/CVF Conference on Computer Vision and Pattern
Recognition (CVPR) (2019): 9357-9366
[39] Acoba, Jonathan James. "Computer Vision Tutorial: Blob Detection." CodeMentor. Accessed
December 19, 2023. https://www.codementor.io/@jonathanjamesacoba/computer-vision-tutorial-
blob-detection-130qpyy6hr.
[40] "Text Detection and Recognition." MathWorks. Accessed December 19, 2023.
https://www.mathworks.com/help/vision/text-detection-and-recognition.html.
[41] Google Cloud. Vision AI | Cloud Vision API. Available online: https://cloud.google.com/vision
(accessed on 31 July 2023).
AUTHORS
Hansi Seitaj received his Bachelor of Science in Computer Science in 2023 from
Pennsylvania State University, Abington. His research interests are in developing real-world
applications of computer science and machine learning algorithms. He is an
interdisciplinary researcher and technologist who combines social responsibility with
technical expertise.
Dr. Vinayak Elangovan is an Assistant Professor of Computer Science at Penn State
Abington. His research interest includes computer vision, machine vision, multi-sensor data
fusion and activity sequence analysis with keen interest in software applications
development and database management.