The document compares the performance of various optical character recognition (OCR) tools. It analyzes eight OCR tools: Online OCR, Free Online OCR, OCR Convert, Convert image to text.net, Free OCR, i2OCR, Free OCR to Word Convert, and Google Docs. It presents each tool's output for the same input image, then evaluates the tools on character accuracy, character error rate, special symbol accuracy, and special symbol error rate to determine which ones convert images to editable text most accurately.
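The four evaluation measures can be illustrated with a short Python sketch. The exact formulas used in the paper are not given in this excerpt, so the sketch assumes the common edit-distance definitions. The character error rate is the Levenshtein distance between the ground-truth text and the OCR output, divided by the ground-truth length. Accuracy is 100 minus that rate, clamped at zero. The special-symbol variants apply the same measure after keeping only characters that are neither alphanumeric nor whitespace. All function names (`levenshtein`, `character_error_rate`, `special_symbols`, and so on) are hypothetical helpers, not part of any of the tools compared.

```python
"""Sketch of OCR accuracy / error-rate measures (assumed definitions)."""


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `ref` into `hyp` (dynamic programming)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance as a percentage of the ground-truth length (assumption)."""
    if not ref:
        raise ValueError("ground-truth text must not be empty")
    return 100.0 * levenshtein(ref, hyp) / len(ref)


def character_accuracy(ref: str, hyp: str) -> float:
    """100 minus the error rate, clamped at 0 because insertions can push
    the error rate above 100%."""
    return max(0.0, 100.0 - character_error_rate(ref, hyp))


def special_symbols(text: str) -> str:
    """Keep only characters that are neither letters/digits nor whitespace
    (assumed definition of a 'special symbol')."""
    return "".join(c for c in text if not c.isalnum() and not c.isspace())


def special_symbol_error_rate(ref: str, hyp: str) -> float:
    """Error rate computed over the special symbols only."""
    return character_error_rate(special_symbols(ref), special_symbols(hyp))


def special_symbol_accuracy(ref: str, hyp: str) -> float:
    """Accuracy computed over the special symbols only."""
    return character_accuracy(special_symbols(ref), special_symbols(hyp))


if __name__ == "__main__":
    truth = "Price: $4.50 (incl. tax)"       # hypothetical ground truth
    ocr_output = "Price: S4.50 (incl tax)"   # hypothetical OCR result
    print(f"character accuracy:       {character_accuracy(truth, ocr_output):.2f}%")
    print(f"character error rate:     {character_error_rate(truth, ocr_output):.2f}%")
    print(f"special symbol accuracy:  {special_symbol_accuracy(truth, ocr_output):.2f}%")
    print(f"special symbol error rate:{special_symbol_error_rate(truth, ocr_output):.2f}%")
```

For example, `character_error_rate("OCR tools", "0CR tools")` counts one substitution over nine reference characters, roughly 11.1%. Because extra inserted characters can push the error rate above 100%, clamping accuracy at zero is one design choice among several; a study may instead report raw counts of correct and incorrect characters.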
DETERMINING THE NETWORK THROUGHPUT AND FLOW RATE USING GSR AND AAL2Rijujournal
In multi-radio wireless mesh networks, one node is eligible to transmit packets over multiple channels to
different destination nodes simultaneously. This feature of multi-radio wireless mesh network makes high
throughput for the network and increase the chance for multi path routing. This is because the multiple
channel availability for transmission decreases the probability of the most elegant problem called as
interference problem which is either of interflow and intraflow type. For avoiding the problem like
interference and maintaining the constant network performance or increasing the performance the WMN
need to consider the packet aggregation and packet forwarding. Packet aggregation is process of collecting
several packets ready for transmission and sending them to the intended recipient through the channel,
while the packet forwarding holds the hop-by-hop routing. But choosing the correct path among different
available multiple paths is most the important factor in the both case for a routing algorithm. Hence the
most challenging factor is to determine a forwarding strategy which will provide the schedule for each
node for transmission within the channel. In this research work we have tried to implement two forwarding
strategies for the multi path multi radio WMN as the approximate solution for the above said problem. We
have implemented Global State Routing (GSR) which will consider the packet forwarding concept and
Aggregation Aware Layer 2 Routing (AAL2R) which considers the both concept i.e. both packet forwarding
and packet aggregation. After the successful implementation the network performance has been measured
by means of simulation study.
A SURVEY: TO HARNESS AN EFFICIENT ENERGY IN CLOUD COMPUTINGijujournal
Cloud computing affords huge potential for dynamism, flexibility and cost-effective IT operations. Cloud
computing requires many tasks to be executed by the provided resources to achieve good performance,
shortest response time and high utilization of resources. To achieve these challenges there is a need to
develop a new energy aware scheduling algorithm that outperform appropriate allocation map of task to
optimize energy consumption. This study accomplished with all the existing techniques mainly focus on
reducing energy consumption
Show drafts
volume_up
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
International Journal of UbiComp (IJU), Vol.6, No.3, July 2015
DOI:10.5121/iju.2015.6303
PERFORMANCE COMPARISON OF OCR TOOLS
Dr. S. Vijayarani¹ and Ms. A. Sakila²
¹Assistant Professor, Department of Computer Science, School of Computer Science and Engineering, Bharathiar University, Coimbatore.
²M.Phil Research Scholar, Department of Computer Science, School of Computer Science and Engineering, Bharathiar University, Coimbatore.
ABSTRACT:
Optical Character Recognition (OCR) is a technique used to convert scanned images into editable text. Many different OCR tools are commercially available today, and OCR is a useful and popular method for many types of applications. The accuracy of OCR results depends on the text pre-processing and segmentation algorithms. Image quality is one of the most important factors affecting the recognition quality of OCR tools. Images can be processed individually (.png, .jpg, and .gif files) or as multi-page PDF documents (.pdf). The primary objective of this work is to provide an overview of various OCR tools and to analyze their performance using two measures of OCR tool performance, i.e. accuracy and error rate.
KEYWORDS:
Optical Character Recognition (OCR), Online OCR, Free Online OCR, OCR Convert, Convert image to text.net, Free OCR, i2OCR, Free OCR to Word Convert, Google Docs.
1. INTRODUCTION
Optical Character Recognition technology recognizes text in images automatically. It supports different image formats such as JPG, PNG, BMP, GIF, TIFF, and multi-page PDF files. OCR analyzes captured or scanned images and translates the character images into character codes, so that the text can be edited, searched, stored more efficiently, displayed online, and used in machine processes [3]. With the help of OCR tools, the text in scanned images can be extracted easily. OCR works on images that consist almost entirely of text [1], and the output of a tool depends on the type of input image. Achieving 100% accuracy is not possible, but an imperfect result is better than none [1]. To improve accuracy, most OCR tools use dictionaries: after recognizing individual characters, they try to recognize entire words that exist in the selected dictionary. Extracting text can be very difficult when the image contains different font sizes and styles, symbols, or a dark background. OCR tools produce the best results on high-resolution documents. Many OCR tools are currently available, but only a few of them are open source and free [2]. Normally, the OCR process has five important steps: preprocessing, segmentation, feature extraction, classification/recognition, and post processing. This is depicted in Figure 1 [18].
Figure 1. OCR Tools Process
Input Image
The input is a digitized image, such as a scanned or captured text image. It may be in different formats, i.e. JPG, PNG, BMP, GIF, TIFF, or multi-page PDF files.
Preprocessing
Preprocessing techniques are essential in an OCR system for handling images. They are used to remove noise from the image, maintain the correct contrast, and remove backgrounds containing scenes or watermarks. Applying them enhances image quality, which is why this step is essential for OCR systems [12].
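Two common preprocessing operations can illustrate this step: median filtering for noise reduction and global thresholding to separate dark text from a light background. The following is a minimal sketch, assuming a grayscale image stored as a list of rows of integers from 0 to 255. The function names, the 3 x 3 window and the threshold of 128 are illustrative choices and are not taken from any of the tools compared in this paper.

```python
from statistics import median


def median_filter(img, k=3):
    """Replace each pixel by the median of its k x k neighbourhood.

    Pixels outside the image are clamped to the nearest edge pixel.
    """
    h, w = len(img), len(img[0])
    r = k // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in range(-r, r + 1)
                for dx in range(-r, r + 1)
            ]
            row.append(median(window))
        out.append(row)
    return out


def binarize(img, threshold=128):
    """Map dark (text) pixels to 1 and light (background) pixels to 0."""
    return [[1 if p < threshold else 0 for p in row] for row in img]


# A white 5 x 5 image with one isolated dark pixel (salt-and-pepper noise).
noisy = [[255] * 5 for _ in range(5)]
noisy[2][2] = 0
print(binarize(median_filter(noisy)))  # the speck is gone: all zeros
```

A median filter removes isolated specks but keeps strokes that are at least as thick as half the window. For this reason it is a frequent first choice before binarization.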
Segmentation
The accuracy of an OCR system depends mainly on the segmentation algorithm used. Segmentation divides text document images into pages, lines, words, and finally characters [16]. Page segmentation separates graphics from text; line segmentation divides the text into individual lines; word segmentation divides a string of written language into its component words [3]; and character segmentation separates each character from the others [12].
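Projection profiles are one common way to perform line and character segmentation on a binarized page (1 = ink). Rows that contain no ink separate text lines, and columns that contain no ink separate the characters within a line. The sketch below assumes this technique. It is not the algorithm of [16] or of any tool compared here, and it does not handle touching or overlapping characters. The function names are hypothetical.

```python
def ink_runs(profile):
    """Return (start, end) pairs, end exclusive, of consecutive non-zero entries."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                      # a run of ink begins
        elif not value and start is not None:
            runs.append((start, i))        # a blank gap ends the run
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_lines(binary):
    """Text lines = runs of rows that contain ink (horizontal projection)."""
    return ink_runs([sum(row) for row in binary])


def segment_characters(binary, top, bottom):
    """Characters = runs of ink columns inside one line (vertical projection)."""
    band = binary[top:bottom]
    return ink_runs([sum(col) for col in zip(*band)])


page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1],
]
for top, bottom in segment_lines(page):
    print((top, bottom), segment_characters(page, top, bottom))
```

Word segmentation can follow the same pattern by treating only wider blank gaps between column runs as word boundaries.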
Feature Extraction
The feature extraction stage analyzes a text segment and selects a set of features that can be used to uniquely identify that segment [18]. This stage extracts the most relevant information from the text image, which helps to recognize the characters in the text [14].
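Zoning features, a classic statistical feature for character recognition, show what such a set of features can look like. The glyph is divided into a grid, and the ink density of each cell becomes one feature. The grid size and the function name below are assumptions made for illustration. The paper does not say which features the compared tools use.

```python
def zoning_features(glyph, zones=2):
    """Divide a binary glyph into zones x zones cells; return the ink density of each cell."""
    h, w = len(glyph), len(glyph[0])
    features = []
    for zy in range(zones):
        for zx in range(zones):
            y0, y1 = zy * h // zones, (zy + 1) * h // zones
            x0, x1 = zx * w // zones, (zx + 1) * w // zones
            cell = [glyph[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            # Fraction of ink pixels in this cell (0.0 for an empty cell).
            features.append(sum(cell) / len(cell) if cell else 0.0)
    return features


letter_l = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
]
print(zoning_features(letter_l))  # [0.5, 0.0, 0.75, 0.5]
```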
Classification / recognition
Optical Character Recognition is a very significant application. Its main objective is to classify optical patterns such as alphanumeric and other characters. OCR is required when information must be readable by both humans and machines [1]. Recognition has become essential for performing the classification task [13].
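The paper does not commit to a particular classifier. One simple option is a nearest-neighbour rule: a segment receives the label of the stored prototype whose feature vector lies closest to its own. In the sketch below, the prototype values are hypothetical. They stand in for features that a real system would learn from training samples.

```python
import math


def classify(features, prototypes):
    """Nearest-neighbour rule: return the label whose prototype is closest in Euclidean distance."""
    return min(prototypes, key=lambda label: math.dist(features, prototypes[label]))


# Hypothetical prototype feature vectors (e.g. 2 x 2 zoning densities) for two letters.
PROTOTYPES = {
    "L": [0.5, 0.0, 0.75, 0.5],
    "T": [0.75, 0.75, 0.25, 0.25],
}
print(classify([0.5, 0.0, 0.7, 0.5], PROTOTYPES))  # L
```

Practical systems replace the single prototype per class with many training samples or with a trained model. The decision still reduces to choosing the class that best matches the extracted features.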
Post Processing
The post-processing stage is used to improve recognition accuracy. Its goal is to detect and correct spelling errors in the OCR output text after the input image has been scanned and completely processed.
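The dictionary lookup mentioned in the introduction is one simple way to implement this correction. The sketch below uses a small hypothetical word list and Python's standard `difflib.get_close_matches` to replace each token with its most similar dictionary word. The similarity cutoff of 0.6 is difflib's default, not a value from the paper. For example, the misreading "Mmns" of "Means" from Table 3.1 maps back to "means".

```python
import difflib


def correct_word(word, dictionary, cutoff=0.6):
    """Return the closest dictionary word to an OCR token, or the token unchanged."""
    matches = difflib.get_close_matches(word.lower(), dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def post_process(text, dictionary):
    """Correct each whitespace-separated token independently."""
    return " ".join(correct_word(token, dictionary) for token in text.split())


WORDS = ["means", "first", "change"]  # hypothetical dictionary
print(post_process("first Mmns", WORDS))  # first means
```

Tokens with no sufficiently similar dictionary entry, such as symbols or formula fragments, are left unchanged rather than forced onto a word.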
Output Text
The text recognized from the input image is displayed as the output text.
2. OCR TOOLS COMPARISON
This paper compares eight OCR tools:
1. Online OCR
2. Free Online OCR
3. OCR Convert
4. Convert image to text.net
5. Free OCR
6. i2OCR
7. Free OCR to Word Convert
8. Google Docs
The main goal of this work is to compare the performance of these tools in order to find the best OCR tool. For the analysis, the same input image is processed by each OCR tool, and the output produced by each tool is analyzed. Each OCR tool produced a different result for the same input image. The sample input image (a description of the k-means clustering algorithm), shown in Figure 2, was downloaded from Google Images [17] and is used for this comparative analysis.
Figure 2 Input Image
2.1 Online OCR
OnlineOCR.net is free web-based OCR software that converts scanned PDF documents (including multi-page files), faxes, photographs, and digital camera images (JPEG/JPG, BMP, PCX, PNG, GIF, ZIP file formats) into editable and searchable electronic documents [4]. The converted text can be output in different formats such as Adobe PDF, Microsoft Word, Microsoft Excel, RTF, and plain text. It supports 46 languages, and its maximum input file size is 100 MB [4]. The conversion of the sample input image by the Online OCR tool [4] is depicted in Figure 2.1.
Figure 2.1 Online OCR
2.2 Free Online OCR
NewOCR.com is a free online OCR service that can analyze the text in any image file and convert it into text format. Supported input files are JPEG, JFIF, PNG, GIF, BMP, PBM, PGM, PPM, and PCX. Supported compressed files are UNIX compress, bzip2, bzip, and gzip. Multi-page documents such as TIFF, PDF, DOCX, and ODT files with images, as well as multiple images in a ZIP archive, are also handled. After conversion, the result can be displayed in different formats, i.e. plain text (TXT), Microsoft Word (DOC), and Adobe Acrobat (PDF). It supports 75 recognition languages and several font types. An advantage of Free Online OCR is that it allows unlimited uploads. The resulting output [5] is illustrated in Figure 2.2.
Figure 2.2 Free Online OCR
2.3 OCR Convert
OCR Convert is a free online OCR service that converts scanned images into text. It supports JPG, PNG, BMP, GIF, TIFF, and multi-page PDF files, as well as low-resolution images. The result is produced in text format. The tool supports simultaneous uploads and can convert files of up to 5 MB (aggregated). The output text [6] is shown in Figure 2.3.
Figure 2.3 OCR Convert
2.4 Convert image to text.net
The Convert image to text.net tool converts any scanned image into an editable text file using the new JiNa OCR image to text software. The software is very easy to use: the user uploads an image file and clicks a button, and the image is converted directly into an open Word document. The output formats are Adobe PDF, Microsoft Word, Microsoft Excel, DOCX, HTML, and text. The output of the Convert image to text.net software [7] is shown in Figure 2.4.
Figure 2.4 Convert image to text.net
2.5 Free OCR
Free-OCR.com is a free online OCR (Optical Character Recognition) tool used to extract text from any image and convert it into an editable text document. It accepts JPG, GIF, TIFF, BMP, or PDF (first page only) files and supports 30 different languages. The only restriction of this tool is that images must not be larger than 2 MB. The output for the sample image [8] is illustrated in Figure 2.5.
Figure 2.5 Free OCR
2.6 i2OCR
i2OCR is a free online OCR service that extracts text from images so that it can be edited, formatted, indexed, searched, or translated. Supported input file types are TIF, JPEG, PNG, BMP, GIF, PBM, PGM, and PPM. It supports 60+ recognition languages, the major image formats, and multi-column document analysis, and it is 100% free with unlimited uploads. The output of i2OCR [9] is given in Figure 2.6.
Figure 2.6 i2OCR
2.7 Free OCR to word convert
Free OCR to Word provides a new way of converting printed text into a digital file that can be modified or edited in a word processor. The OCR to Word program works with popular image files such as JPG, JPEG, PSD, PNG, GIF, TIFF, and BMP, as well as scanned image files. All of these file types are equally easy for Free OCR to Word, and in just a few clicks we can get fully editable and searchable files in MS Word or TXT format [10]. The result is shown in Figure 2.7.
Figure 2.7 Free OCR to word convert
2.8 Google Docs
When an image file or a scanned PDF is uploaded to Google Docs, Google Docs automatically performs OCR on the file and converts the text to Google Docs format before saving it to our account. If the OCR operation is successful, the extracted text is stored as a new document; otherwise, Google Docs stores our original image without any modification. With Google Docs, we can perform OCR on images and PDFs as large as 2 MB, and the output formats are ODT, PDF, TXT, RTF, DOC, and HTML. It supports 30 languages [11]. The output text [11] is shown in Figure 2.8.
Figure 2.8 Google Docs
3. COMPARATIVE ANALYSIS
To perform the comparative analysis of the OCR tools, this paper considers two performance measures: conversion accuracy and error rate. Conversion accuracy identifies whether all the letters, numbers, and special symbols are converted accurately. Error rate identifies how many letters, numbers, and special symbols are not converted properly. Tables 3.1 and 3.2 show the conversion errors made by the OCR tools.
Table 3.1 Comparative Analysis of Online OCR, Free Online OCR, OCR Convert, Convert image to text.net
S.No | Original Text | Online OCR | Free Online OCR | OCR Convert | Convert image to text.net
1 M AI
2 first
3 Means Mmns
4 r0
(1)
,r0
(2)
,…. r0
(M)
41), 40, ..., 4m) r31 ‟. r32). r3") r§,", r§,2).....r§;'”' 41), 42), ...or
5 ri,1≤i≤N rbiNiNN ri. IsisN r,-. I €isN rb I i-<..A1
6 rito Dj r, to D, r, to D, r,- to D,» ritoDj
7 r0
(1)
4P r8) 1'87 rg)
8 φ(ri,r0
(j)
) ≤
φ(ri,r0
(j)
),1≤j,u≤
M
o(c,4ofr,4'1),lNJ,
u M
(Mn-18)) S
(p(r,-.r8‟)).l S j.
u S M
<p(r,».rX))
<<p(r,-.r§{„l).l € j.
11$ M
49(rhe),54)(rat
u),1j,u<Al
9 Dj Di Dj D,- Dj
10 1≤j≤M 1NjNM ISjSM I <j<M 1.<..j4M
11 riЄDj,
r0
(j)
=∑ /|Dj|
= lea rilipil rieDPif)” =
ZED, r;/lD,l
r,~eD‟,-.
rg) = 2,-CD‟ r,-
/|D,|
rie=EiED,
where IDJI
12 |Dj| ID, D,- D,- IDJI
13 change chan e
Table 3.2 Comparative Analysis of Free OCR, i2OCR, Google Docs.
S.No | Original Text | Free OCR | i2OCR | Google Docs
1 M
2 first ﬕrst
3 Means
4 r0
(1)
,r0
(2)
,…. r0
(M)
r8", r32’,
...,rﬁ”’
r3", r32‟. ....r§,„"‟ r3", r32‟. ....r§,„"‟
5 ri,1≤i≤N r,~, l<:'<N r,-. I <i<N r,-. I <i<N
6 rito Dj r, to D, r,- to D r,- to D,
7 r0
(1)
H87 r87 r87
8 φ(ri,r0
(j)
) ≤
φ(ri,r0
(j)
),1≤j,u≤
M
(p(riv'{P)<‘p(riv'
$u))vlgv u<M
«p<r,.rx*><<p(r,-
.r:;">.I <1. as M
«p<r,.rx*><<p(r,-.r:;">.I
<1. as M
9 Dj D,- D,- D,-
10 1≤j≤M l<j<M I <j<M I <j<M
11 riЄDj,
r0
(j)
=∑ /|Dj|
r,eD1,
rg) = Z,-GD’
r,-/ID,-|
r,-eD»,-.
If,” = Z,,_._,,, r,~/|D,|
r,-eD»,-.
If,” = Z,,_._,,, r,~/|D,|
12 |Dj| D] D,- D,
13 change
Free OCR to Word Convert ranked last because it does not produce accurate results when converting characters, symbols, and equations. Hence, it is not included in the tables. Its sample output is given in Figure 2.7.
4. PERFORMANCE MEASURES
The main function of OCR tools is to convert the given input images into text documents. To compare the performance of the above-mentioned OCR tools, the following strategies are applied.
Strategy 1: Find the character accuracy (CA) and character error rate (CER) of the resultant text documents, i.e. verify whether an OCR tool has correctly converted all the characters in the input image. The following formulas are used:
Character Accuracy (CA) = (a/n) × 100
Character Error Rate (CER) = 100 - CA
where a = number of characters correctly converted in the resultant text document
n = total number of characters in the input image
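The formulas above do not specify how the characters counted in a are matched against the input image. The following is one possible sketch. It assumes that a is the number of reference characters matched when the OCR output is aligned with the ground-truth text using Python's standard `difflib.SequenceMatcher`. The function name is hypothetical, and the paper does not say how the counts in Table 3 were obtained.

```python
from difflib import SequenceMatcher


def character_accuracy(reference, ocr_output):
    """Return (CA, CER) in percent.

    Assumption: a = characters of `reference` matched when it is aligned with
    `ocr_output`; n = len(reference).
    """
    n = len(reference)
    if n == 0:
        raise ValueError("reference text is empty")
    matcher = SequenceMatcher(None, reference, ocr_output, autojunk=False)
    a = sum(block.size for block in matcher.get_matching_blocks())
    ca = 100 * a / n
    return ca, 100 - ca


print(character_accuracy("Means", "Mmns"))  # (60.0, 40.0)
```

This sketch counts only matched characters, so extra characters inserted by a tool do not lower CA. An edit-distance-based CER would penalize such insertions as well.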
Strategy 2: Find the special symbols accuracy (SA) and special symbols error rate (SER) of the resultant text documents, i.e. verify whether an OCR tool has correctly converted all the special symbols in the input image (Σ, φ, ψ, ≥, ≤, =, +, *, -, /, ^, %, #, |, superscripts and subscripts such as (n), etc.). The following formulas are used:
Special Symbol Accuracy (SA) = (b/m) × 100
Special Symbol Error Rate (SER) = 100 - SA
where b = number of special symbols correctly converted in the resultant text document
m = total number of special symbols in the input image
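Counting b requires a rule for when a symbol counts as converted. The following is a minimal sketch. It assumes that b is the number of reference symbols that also appear in the OCR output, computed as a multiset intersection that ignores position. The symbol set is hypothetical and is drawn from the examples listed above. Applied to one tool's output for item 8 of Table 3.1 (with superscripts dropped), it yields 0% accuracy for that item.

```python
from collections import Counter

# Hypothetical symbol set built from the examples listed in Strategy 2
# (both the Greek capital sigma and the n-ary summation sign are included).
SPECIAL_SYMBOLS = set("Σ∑φψ≥≤=+*-/^%#|∈")


def symbol_accuracy(reference, ocr_output, symbols=SPECIAL_SYMBOLS):
    """Return (SA, SER) in percent.

    Assumption: b = number of reference symbols that also occur in the OCR
    output (multiset intersection, position ignored); m = symbols in reference.
    """
    ref_counts = Counter(ch for ch in reference if ch in symbols)
    out_counts = Counter(ch for ch in ocr_output if ch in symbols)
    m = sum(ref_counts.values())
    if m == 0:
        raise ValueError("reference text contains no special symbols")
    b = sum((ref_counts & out_counts).values())
    sa = 100 * b / m
    return sa, 100 - sa


# Simplified item 8 of Table 3.1 (superscripts dropped) and one tool's output for it.
print(symbol_accuracy("φ(ri,r0) ≤ φ(ri,r0), 1≤j,u≤M", "(p(r,-.r8‟)).l S j."))  # (0.0, 100.0)
```

Ignoring position keeps the sketch short. A stricter variant would align the two strings first, as in the character-level measure, so that a symbol only counts when it appears in the right place.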
Table 3 shows the accuracy and error rate of the different OCR tools. The first two value columns give the character accuracy (CA) and character error rate (CER); the third and fourth give the special symbols accuracy (SA) and special symbols error rate (SER).
Table 3 Comparison between OCR tools
S.No | OCR Tools | Character Accuracy (CA) (%) | Character Error Rate (CER) (%) | Special Symbols Accuracy (SA) (%) | Special Symbol Error Rate (SER) (%)
1 | Online OCR | 95.9 | 4.10 | 0 | 100
2 | Free Online OCR | 98.64 | 1.36 | 0 | 100
3 | OCR Convert | 100 | 0 | 0 | 100
4 | Convert image to text.net | 100 | 0 | 0 | 100
5 | Free OCR | 100 | 0 | 0 | 100
6 | i2OCR | 100 | 0 | 0 | 100
7 | Free OCR to Word Convert | 23.29 | 76.71 | 0 | 100
8 | Google Docs | 100 | 0 | 0 | 100
Figure 4.1 Character accuracy and Error rate of the OCR tools
Figure 4.2 Special Symbols accuracy and Error rate of the OCR tools
Figure 4.1 describes the overall character accuracy and error rate of the OCR tools. It shows that OCR Convert, Convert image to text.net, i2OCR, and Google Docs perform better than the other tools. Figure 4.2 presents the overall special symbols accuracy and error rate of the OCR tools. On this measure, all the OCR tools achieved 0% accuracy, i.e. a 100% error rate, which shows that none of the tools converted the mathematical symbols in the equations accurately.
5. CONCLUSION
This work has analyzed the performance of eight different OCR tools. The analysis shows that these OCR tools cannot detect fonts and formats properly; they produce only plain text as output. It also shows that the existing OCR tools produce good results when converting characters from text images, but inappropriate results when converting mathematical equations and symbols. In future, new OCR tools need to be developed that can convert mathematical equations and symbols appropriately.
REFERENCES
[1] Shivani Dhiman, A. J. Singh, “Tesseract vs Gocr: A Comparative Study,” International Journal of Recent Technology and Engineering (IJRTE), ISSN: 2277-3878, Volume-2, Issue-4.
[2] Chirag Patel, Atul Patel, Dharmendra Patel, “Optical Character Recognition by Open Source OCR Tool Tesseract: A Case Study,” International Journal of Computer Applications (0975-8887), Volume 55, No. 10.
[3] http://en.wikipedia.org/wiki/Optical_character_recognition
[4] http://www.onlineocr.net/
[5] http://www.newocr.com/
[6] http://www.ocrconvert.com/
[7] http://www.convertimagetotext.net/
[8] http://www.free-ocr.com/
[9] http://www.i2ocr.com/
[10] http://www.ocrtoword.com/
[11] https://docs.google.com/
[12] Yasser Alginahi, “Preprocessing Techniques in Character Recognition.”
[13] Øivind Due Trier, Anil K. Jain, Torfinn Taxt, “Feature Extraction Methods for Character Recognition: A Survey.”
[14] Pritpal Singh, Sumit Budhiraja, “Feature Extraction and Classification Techniques in O.C.R. Systems for Handwritten Gurmukhi Script: A Survey,” International Journal of Engineering Research and Applications (IJERA), Vol. 1, Issue 4, pp. 1736-1739.
[15] Youssef Bassil, Mohammad Alwani, “OCR Post-Processing Error Correction Algorithm Using Google's Online Spelling Suggestion,” Journal of Emerging Trends in Computing and Information Sciences, Vol. 3, No. 1.
[16] Archana A. Shinde, D. G. Chougule, “Text Pre-processing and Text Segmentation for OCR,” IJCSET, Vol. 2.
[17] https://www.google.co.in/search?q=k+means+algorithm&biw=1366&bih=615&source=lnms&tbm=isch&sa=X&sqi=2&ved=0CAcQ_AUoAmoVChMI5-DkrIvaxgIVD4-OCh0leQqA#imgrc=TQ19POvnV7mNbM%3A
[18] Sandeep Dangi, Ashish Oberoi, Nishi Goel, “Performance Comparison between Different Feature Extraction Techniques with SVM Using Gurumukhi Script,” International Journal of Engineering Research and Applications (IJERA), ISSN: 2248-9622, Vol. 4, Issue 7 (Version 5), July 2014, pp. 123-128.
BIOGRAPHY
Dr. S. Vijayarani, MCA, M.Phil, Ph.D, is working as an Assistant Professor in the School of Computer Science and Engineering, Bharathiar University, Coimbatore. Her fields of research interest are data mining, privacy and security issues in data mining, and data streams. She has published papers in international journals and presented research papers at international and national conferences.
Ms. A. Sakila has completed an M.Sc. in Computer Science. She is currently pursuing her M.Phil in Computer Science in the School of Computer Science and Engineering, Bharathiar University, Coimbatore. Her fields of interest are Data Mining and Multimedia Mining.