This document summarizes a research paper that presents an optical character recognition system for isolated Arabic characters using a DSP-based hardware implementation. The system uses a fuzzy ART neural network for character recognition after extracting features from segmented characters. Testing achieved a 95% recognition rate on 700 sample characters from various fonts and sizes. The system was implemented on a TI TMS320C6416T digital signal processor for high-speed performance with small size and low power consumption. Future work could expand the system to recognize connected characters and directly read images rather than relying on pre-processing in MATLAB.
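The summary does not say which features the authors extract from each segmented character. As an illustration only, the sketch below assumes a simple zoning scheme. The binary character image is divided into a grid, and the ink density of each grid cell becomes one feature. The function name `zoning_features` and the default 4 x 4 grid are hypothetical choices, not taken from the paper. One convenient property of density features is that every value lies in [0, 1], which is the input range a Fuzzy ART network expects.

```python
from typing import List

Bitmap = List[List[int]]  # 1 = ink (character pixel), 0 = background


def zoning_features(bitmap: Bitmap, zones: int = 4) -> List[float]:
    """Hypothetical zoning feature extractor (not the paper's method).

    Splits a binary character image into zones x zones cells and returns the
    fraction of ink pixels in each cell, in row-major order. Every feature
    lies in [0, 1]; a cell that receives no pixels (image smaller than the
    grid) is reported as 0.0.
    """
    rows, cols = len(bitmap), len(bitmap[0])
    features: List[float] = []
    for zr in range(zones):
        r0, r1 = zr * rows // zones, (zr + 1) * rows // zones
        for zc in range(zones):
            c0, c1 = zc * cols // zones, (zc + 1) * cols // zones
            cell = [bitmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            features.append(sum(cell) / len(cell) if cell else 0.0)
    return features
```

For a 4 x 4 bitmap and `zones=2`, each feature is the ink fraction of one 2 x 2 quadrant. A finer grid captures more shape detail but yields longer, noisier feature vectors.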
Paper 228, ENT 201
A Real-time DSP-Based Optical Character Recognition System for Isolated Arabic Characters Using the TI TMS320C6416T
Haidar Almohri
University of Hartford
almohrih@hotmail.com
John S. Gray
University of Hartford
gray@mwd.hartford.edu
Hisham Alnajjar, PhD
University of Hartford
alnajjar@hartford.edu
Abstract
Optical Character Recognition (OCR) is an area of research that has attracted the interest of researchers for the past forty years. Although the subject has long been a central research topic, it remains one of the most challenging and exciting areas in pattern recognition. Since Arabic is one of the most widely used languages in the world, a robust OCR system for this language could be commercially valuable. A variety of software-based solutions are available for Arabic OCR; however, little work has been done on hardware implementations of Arabic OCR, where speed is a factor. In this research, a robust DSP-based OCR system is designed for the recognition of Arabic characters. Since the scope of this research is focused on hardware implementation, the system is designed to recognize isolated Arabic characters. An efficient recognition algorithm based on feature extraction and a Fuzzy ART Neural Network is proposed, together with its hardware implementation. A recognition rate of 95% is reported.
Introduction
Optical Character Recognition, usually referred to as OCR, is the process of converting the image obtained by scanning a text or a document into a machine-editable format. OCR is one of the most important fields of pattern recognition and has been at the center of researchers' attention for the last four decades [1].
The goal is to have computers process data that would normally be processed only by humans. One of the apparent advantages of computer processing is the ability to handle huge amounts of information at high speed [2]. Typical applications of OCR include reading postal addresses off envelopes, reading customer-filled forms, archiving and retrieving text, and digitizing libraries. Using OCR, handwritten and typewritten text can be stored in computers to build databases of existing texts without retyping them on a keyboard.
The modern version of OCR appeared in the mid-1940s with the development of digital computers [3]. Since then, several character recognition systems for English, Chinese, and Japanese characters have been proposed [4, 5, 6]. However, OCR systems for other languages, such as Arabic, have not received the same amount of attention.
Proceedings of The 2008 IAJC-IJME International Conference
ISBN 978-1-60643-379-9
Arabic is the official language of all countries in North Africa and most of the countries in the Middle East and is spoken by 234 million people [7, 8]. It is the sixth most commonly used language in the world. While spoken Arabic varies across regions, written Arabic, sometimes called “Modern Standard Arabic” (MSA), is a standardized version used for official communication across the Arab world [9]. The Arabic script and similar scripts are used by an even larger share of the world’s population to write languages such as Arabic, Farsi (Persian), and Urdu [10]. Therefore, an efficient way to automate the digitization of Arabic documents such as books and articles would be highly beneficial and commercially valuable. A 2006 survey reports that the first modern Arabic OCR approach dates back to 1981, when Parhami and Taraghi presented an algorithm that achieved a character recognition rate of 85 percent. Since then, many attempts have been made, and a number of commercial OCR products are available on the market. However, little effort has been devoted to implementing a hardware-based Arabic OCR device that has a small footprint and can be easily transported.
This research aims to design and implement an efficient hardware-based OCR system using image processing and DSP techniques. The advantages of this OCR system include, but are not limited to, the following:
• Small footprint
• Light and easy to carry
• Low power consumption
• High speed performance
Characteristics of Arabic Script
One of the reasons for the slow advancement of Arabic OCR is a set of characteristics of this script that make it more challenging than other scripts. Some of these characteristics are listed below:
• The Arabic script is cursive
• Characters can have different shapes in different positions of a word
• Most letters have one, two, or three dots
• A word may be composed of one or more sub-words
In addition to the above characteristics, Arabic is written and read from right to left. These characteristics have made progress in Arabic OCR more complex and difficult than for other languages.
Review of Existing Work
Typewritten vs. Handwritten Recognition
The problem of character recognition can be divided into two major categories: typewritten and handwritten. As their names suggest, typewritten recognition processes a document that has been typed and then scanned prior to recognition. Such a system would be used to digitize books, documents, and papers held by libraries, governments, or companies. In handwritten recognition, the system attempts to recognize text that has been written by a human rather than a machine. This is usually more difficult, as there is no standard way of writing and each person’s handwriting is different. As a result, the recognition rates achieved by handwritten recognition systems are lower than those of typewritten systems.
Offline vs. Online Text Recognition
Character recognition systems may be further categorized into offline and online recognition systems. In offline recognition, the image of the typed or handwritten text is acquired by scanning with an optical scanner; the image is then read by the system and analyzed for recognition. In online recognition systems, the input is hand-printed text captured as it is written, usually on a tablet computer or a pen-based device such as a cell phone or signature pad. Online recognition is a fast-growing technique for convenient human-computer interfaces and has many advantages. For example, it can help computer novices, elderly people, and housewives use a computer conveniently. Additionally, it makes small portable computers (PDA, handheld PC, palm PC, etc.) possible, because no keyboard or keypad is needed.
In this research, the developed system is designed for typewritten, offline character recognition; therefore, the discussion will focus on this area.
Basic OCR System’s Architecture
Any offline OCR system consists of all or some of the following steps:
• Image Acquisition
• Preprocessing
• Line Segmentation
• Word Segmentation
• Character Segmentation
• Recognition
• Post Processing
Proposed Algorithm
Figure 1 shows the block diagram of the proposed algorithm.
[Block diagram] Image Acquisition (by MATLAB) → Binarizing the Image (by MATLAB) → Loading the Image Matrix to DSK → Line Segmentation (Horizontal Projection Profile) → Character Segmentation (Horizontal Projection Profile) → Feature Extraction → Recognition (Fuzzy ART Neural Network) → Saving the Result in a .dat File
Figure 1: The block diagram of the proposed algorithm
Image Acquisition
The process starts by acquiring the image. Text is scanned using a 300 dpi scanner, and the image of the text is saved as a .bmp file on a computer running MATLAB. MATLAB is used to read the image and convert it to a binary (black-and-white) format. The pixel values of the binary image (represented as 0 or 1) are then saved in a text file using MATLAB, and these values are used to create a header file that represents the image in the main program.
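The acquisition step above was done in MATLAB. As a rough C illustration of the same two operations, the sketch below thresholds an 8-bit grayscale buffer into 0/1 pixels and writes the result as a C header that the DSK program could `#include`. The threshold value, the convention that 1 means black, and the names `binarize` and `write_image_header` are assumptions for illustration, not taken from the paper.

```c
#include <stdint.h>
#include <stdio.h>

/* Threshold an 8-bit grayscale image into 0/1 pixels.
 * Assumed convention: 1 = black (ink), 0 = white (background). */
void binarize(const uint8_t *gray, uint8_t *bin, size_t n, uint8_t threshold)
{
    for (size_t i = 0; i < n; i++)
        bin[i] = (gray[i] < threshold) ? 1 : 0;   /* darker than threshold -> ink */
}

/* Write a rows x cols binary image (row-major) as a C header defining
 * NAME_ROWS, NAME_COLS and a const array NAME[rows][cols].
 * Returns 0 on success, -1 if the stream reports a write error. */
int write_image_header(FILE *out, const char *name,
                       const uint8_t *bin, int rows, int cols)
{
    fprintf(out, "#define %s_ROWS %d\n#define %s_COLS %d\n", name, rows, name, cols);
    fprintf(out, "static const unsigned char %s[%d][%d] = {\n", name, rows, cols);
    for (int r = 0; r < rows; r++) {
        fputs("  {", out);
        for (int c = 0; c < cols; c++)
            fprintf(out, "%d%s", bin[r * cols + c], (c + 1 < cols) ? "," : "");
        fprintf(out, "}%s\n", (r + 1 < rows) ? "," : "");
    }
    fputs("};\n", out);
    return ferror(out) ? -1 : 0;
}
```

Writing a header keeps the DSK side free of any image-file parsing, at the cost of recompiling for every new image, which matches the limitation noted under Future work.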
As shown in figure 2, a scanned image usually contains noise, which typically appears as extra pixels (black or white) in the character image. If the noise is not taken into account, it can disrupt the process and produce incorrect results.
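The paper does not describe how noise is removed. One common and simple option, shown here only as a hypothetical sketch, is to flip any pixel whose 8 neighbours all have the opposite value, which clears single-pixel black specks and fills single-pixel white holes. It assumes 1 = black, row-major 0/1 pixels, and separate input and output buffers; larger noise blobs, and care not to erase small Arabic dots, would need a more careful filter.

```c
#include <stdint.h>

/* Flip every pixel whose in-image 8-neighbours all have the opposite value.
 * This removes isolated black specks and fills isolated white holes.
 * in and out must be different row-major buffers of rows x cols (0/1) pixels. */
void remove_isolated_pixels(const uint8_t *in, uint8_t *out, int rows, int cols)
{
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            uint8_t v = in[r * cols + c];
            int same = 0, total = 0;
            for (int dr = -1; dr <= 1; dr++) {
                for (int dc = -1; dc <= 1; dc++) {
                    int rr = r + dr, cc = c + dc;
                    if ((dr == 0 && dc == 0) || rr < 0 || rr >= rows || cc < 0 || cc >= cols)
                        continue;              /* skip the pixel itself and out-of-image cells */
                    total++;
                    if (in[rr * cols + cc] == v) same++;
                }
            }
            out[r * cols + c] = (total > 0 && same == 0) ? (uint8_t)(1 - v) : v;
        }
    }
}
```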
Figure 2: Letter “faa” corrupted by noise
Line Segmentation
When the image matrix is ready to be processed, the first step is to isolate each line of text from the whole document. A horizontal projection profile technique is used for this purpose: the program scans the image row by row to find the first and last rows of a line that contain black pixels. The area between these rows represents a line that may contain one or more characters. Using the same technique, the whole document is scanned, and each detected line is saved in a temporary array for further processing.
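A minimal sketch of line segmentation by horizontal projection profile, assuming a row-major binary image with 1 = black: each row's black-pixel count is its projection value, and each run of non-empty rows becomes one line. The `LineSpan` type and the `segment_lines` name are illustrative, not from the paper.

```c
#include <stdint.h>

typedef struct { int top, bottom; } LineSpan;   /* inclusive row range */

/* Find text lines as runs of rows that contain at least one black pixel.
 * Returns the number of lines stored in `lines` (at most max_lines). */
int segment_lines(const uint8_t *img, int rows, int cols,
                  LineSpan *lines, int max_lines)
{
    int count = 0, in_line = 0;
    for (int r = 0; r < rows; r++) {
        int black = 0;
        for (int c = 0; c < cols; c++)
            black += img[r * cols + c];          /* projection value of row r */
        if (black > 0 && !in_line) {             /* first inked row: line starts */
            in_line = 1;
            if (count < max_lines) lines[count].top = r;
        } else if (black == 0 && in_line) {      /* first blank row: line ends */
            in_line = 0;
            if (count < max_lines) lines[count].bottom = r - 1;
            count++;
        }
    }
    if (in_line) {                               /* last line touches the bottom edge */
        if (count < max_lines) lines[count].bottom = rows - 1;
        count++;
    }
    return count < max_lines ? count : max_lines;
}
```

Because Arabic dots can sit above or below the main body with a blank row in between, a practical version might merge very thin bands into the neighbouring line; that refinement is not shown.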
Character Segmentation
Once each line of text is stored in a separate array, the program scans each array vertically, using a vertical projection profile, to detect and isolate each character within the line. The first and last columns in which black pixels are detected mark the left and right borders of the character. After this step, there may be a white area above, below, or both above and below a character; only the tallest character has a height equal to the height of the line. Since the exact edges of each character box are needed for recognition, another horizontal scan is run to detect the top and bottom of the character and isolate the area that contains only the pixels of the character.
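Character segmentation within one detected line can be sketched the same way, again assuming a row-major image with 1 = black: a vertical projection over the line's rows finds runs of inked columns, and a second horizontal pass trims the blank rows above and below each character. Boxes come out left to right; since Arabic reads right to left, a caller would walk them in reverse. All names here are hypothetical.

```c
#include <stdint.h>

typedef struct { int top, bottom, left, right; } CharBox;   /* inclusive bounds */

/* 1 if column c has a black pixel anywhere in rows top..bottom. */
static int column_has_ink(const uint8_t *img, int cols, int top, int bottom, int c)
{
    for (int r = top; r <= bottom; r++)
        if (img[r * cols + c]) return 1;
    return 0;
}

/* 1 if row r has a black pixel anywhere in columns left..right. */
static int row_has_ink(const uint8_t *img, int cols, int r, int left, int right)
{
    for (int c = left; c <= right; c++)
        if (img[r * cols + c]) return 1;
    return 0;
}

/* Segment the characters of one text line (rows line_top..line_bottom). */
int segment_chars(const uint8_t *img, int cols, int line_top, int line_bottom,
                  CharBox *boxes, int max_boxes)
{
    int count = 0, c = 0;
    while (c < cols && count < max_boxes) {
        while (c < cols && !column_has_ink(img, cols, line_top, line_bottom, c))
            c++;                                    /* skip blank columns */
        if (c == cols) break;
        int left = c;
        while (c < cols && column_has_ink(img, cols, line_top, line_bottom, c))
            c++;                                    /* run of inked columns */
        int right = c - 1;
        int top = line_top, bottom = line_bottom;   /* second pass: trim blank rows */
        while (!row_has_ink(img, cols, top, left, right)) top++;
        while (!row_has_ink(img, cols, bottom, left, right)) bottom--;
        boxes[count++] = (CharBox){ top, bottom, left, right };
    }
    return count;
}
```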
Feature Extraction and Recognition
At this point, the program has isolated each character in the document, and the matrix representation of each character is ready to be processed for recognition. In this research, several methods were examined to find the most suitable one for recognition. Several factors determine the efficiency of a recognition algorithm; the most important are the speed of the process and the accuracy of the result.
Feature Extraction: As discussed earlier, at the time of processing, the program extracts a matrix of pixel values bounded by the four borders of each character image, as shown in figure 3.
Figure 3: Extracted characters
Feature selection is one of the most critical issues in character recognition, as the recognition rate depends on the choice of features. Every character has some features that distinguish it from the other characters. Some of the commonly used features for character recognition are loops, holes, strokes, vertical lines, cusps, etc. The majority of previous works focus on these features, as they appeal to human intuition. Unfortunately, techniques using these features share a common drawback: excessive processing time. The solution lies in selecting an algorithm that effectively reduces image processing time without compromising accuracy.
To this end, a set of features that captures the distinguishing details of each character without requiring long processing time is extracted from each character prior to recognition.
In this way, each character is described by a set of features that is unique to that character. This information is used to train the Neural Network, which learns to use these features to find the result instead of receiving all the pixel values of each character as input. The extracted features have the following properties:
• Easy to extract, which reduces the complexity of the program.
• Distinct, which eases the Neural Network’s recognition process.
• Intended to be independent of font type and size, which is a big advantage, since it allows the system to recognize characters in a wide range of fonts and sizes.
Fourteen features are extracted from each character, four of which are computed over the whole image:
1. Height / width
2. Number of black pixels / number of white pixels
3. Number of horizontal transitions
4. Number of vertical transitions
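The first two whole-image features are simple ratios. In this sketch (row-major character box, 1 = black), a box with no white pixels divides by 1 instead of 0; the paper does not say how that case is handled, so the guard is an assumption, as is the function name.

```c
#include <stdint.h>

/* Features 1 and 2 for an h x w character box (row-major, 1 = black). */
void global_ratio_features(const uint8_t *box, int h, int w, double out[2])
{
    long total = (long)h * w, black = 0;
    for (long i = 0; i < total; i++)
        black += box[i];
    long white = total - black;
    out[0] = (double)h / (double)w;                             /* height / width */
    out[1] = (double)black / (double)(white > 0 ? white : 1);   /* black / white, assumed guard */
}
```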
Counting horizontal and vertical transitions is a technique used to capture the curvature of each character and was found to be effective for this purpose. The procedure scans each row of the character box horizontally and counts the number of times the pixel value changes state from 0 to 1 or from 1 to 0, as shown in figure 4. The total number of state changes is the character’s horizontal transition value. A similar process, scanning each column, gives the vertical transition value.
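The transition counts described above can be sketched as follows. This version counts only changes between adjacent pixels inside the character box, summed over all rows (horizontal) or all columns (vertical); whether the original implementation also counted transitions at the box edges is not stated, and the function name is hypothetical.

```c
#include <stdint.h>

/* Horizontal and vertical transition counts for an h x w character box
 * (row-major, pixel values 0/1). Only changes between adjacent pixels
 * inside the box are counted. */
void transition_features(const uint8_t *box, int h, int w, int *horiz, int *vert)
{
    int hz = 0, vt = 0;
    for (int r = 0; r < h; r++)            /* scan each row left to right */
        for (int c = 1; c < w; c++)
            hz += box[r * w + c] != box[r * w + c - 1];
    for (int c = 0; c < w; c++)            /* scan each column top to bottom */
        for (int r = 1; r < h; r++)
            vt += box[r * w + c] != box[(r - 1) * w + c];
    *horiz = hz;
    *vert = vt;
}
```

For a hollow 3 x 3 square, for example, only the middle row and middle column change state (twice each), giving 2 horizontal and 2 vertical transitions.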
Figure 4: Horizontal and vertical transitions
In addition, the image is divided into four regions, as shown in figure 5, and the following features are extracted from these regions:
1. Black Pixels in Region 1/White Pixels in Region 1
2. Black Pixels in Region 2/White Pixels in Region 2
3. Black Pixels in Region 3/White Pixels in Region 3
4. Black Pixels in Region 4/White Pixels in Region 4
5. Black Pixels in Region 1/Black Pixels in Region 2
6. Black Pixels in Region 3/Black Pixels in Region 4
7. Black Pixels in Region 1/Black Pixels in Region 3
8. Black Pixels in Region 2/Black Pixels in Region 4
9. Black Pixels in Region 1/Black Pixels in Region 4
10. Black Pixels in Region 2/Black Pixels in Region 3
Figure 5: Dividing the image into four regions and extracting features
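The ten region features can be computed by splitting the character box at its midpoints. The figure itself is not reproduced here, so the region layout in this sketch (1 = top-left, 2 = top-right, 3 = bottom-left, 4 = bottom-right) is an assumption, as is returning the numerator unchanged when a denominator is zero. Pixels are row-major with 1 = black.

```c
#include <stdint.h>

/* Number of black pixels (value 1) in rows [r0, r1) and columns [c0, c1). */
static long count_black(const uint8_t *box, int w, int r0, int r1, int c0, int c1)
{
    long n = 0;
    for (int r = r0; r < r1; r++)
        for (int c = c0; c < c1; c++)
            n += box[r * w + c];
    return n;
}

/* Ratio with an assumed fallback: an empty denominator is treated as 1. */
static double safe_ratio(long num, long den)
{
    return (double)num / (double)(den > 0 ? den : 1);
}

/* Ten region features for an h x w character box (row-major, 1 = black).
 * Assumed layout: 1 = top-left, 2 = top-right, 3 = bottom-left, 4 = bottom-right. */
void region_features(const uint8_t *box, int h, int w, double out[10])
{
    int hm = h / 2, wm = w / 2;
    const int r0[4] = {0, 0, hm, hm}, r1[4] = {hm, hm, h, h};
    const int c0[4] = {0, wm, 0, wm}, c1[4] = {wm, w, wm, w};
    long black[4], white[4];

    for (int k = 0; k < 4; k++) {
        long area = (long)(r1[k] - r0[k]) * (c1[k] - c0[k]);
        black[k] = count_black(box, w, r0[k], r1[k], c0[k], c1[k]);
        white[k] = area - black[k];
    }
    for (int k = 0; k < 4; k++)
        out[k] = safe_ratio(black[k], white[k]);   /* features 1-4: black/white per region */
    out[4] = safe_ratio(black[0], black[1]);       /* 5: R1 / R2 */
    out[5] = safe_ratio(black[2], black[3]);       /* 6: R3 / R4 */
    out[6] = safe_ratio(black[0], black[2]);       /* 7: R1 / R3 */
    out[7] = safe_ratio(black[1], black[3]);       /* 8: R2 / R4 */
    out[8] = safe_ratio(black[0], black[3]);       /* 9: R1 / R4 */
    out[9] = safe_ratio(black[1], black[2]);       /* 10: R2 / R3 */
}
```

Together with the four whole-image features, these ten values form the 14-element feature vector passed to the network.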
These features were found to be sufficient to distinguish between different characters. The extracted feature vector is used to train the Neural Network.
Fuzzy ART Neural Network: A training set database has to be generated before the network can be trained. 700 sample characters chosen from the most popular Arabic fonts and sizes are used to generate the database. The 14 features described previously are extracted from this set of characters using MATLAB, and the results are saved in a text file that is used by Professional II/PLUS as the training set.
The Fuzzy ART neural network has the following architecture: 14 inputs, 1 output, 60 nodes in the F2 layer, and a vigilance of 0.0000. The network is trained for 20,000 iterations and tested using different samples to measure its performance. The test results showed that the network correctly recognized about 95% of the input characters. This figure is an average, as the accuracy varies with the resolution of the image and with the font type and size. If the input font is one of the fonts in the database used to train the network, the accuracy rises to 98%; if the font is unknown to the network, the error rate increases and the accuracy drops to 92%. Since most of the popular Arabic fonts are included in the training set, the network should achieve a high accuracy rate in most cases.
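The trained network was generated by Professional II/PLUS, so its code is not given in the paper. For intuition only, the sketch below implements the standard Fuzzy ART equations: complement coding of inputs in [0,1], the choice function T_j = |I ∧ w_j| / (α + |w_j|), the vigilance test |I ∧ w_j| / |I| ≥ ρ, and the learning rule w_j ← β(I ∧ w_j) + (1 − β)w_j. Attaching a class label to each category node, and resetting a node whose label disagrees with the training label, is a hypothetical simplification in the spirit of Fuzzy ARTMAP that lets the classifier work even with the paper's vigilance of 0. The sizes (14 inputs, 60 F2 nodes) follow the paper; α, β, the function names, and the assumption that the 14 features have been rescaled to [0,1] are not from the source.

```c
#include <stddef.h>

#define MAX_DIM   14   /* feature-vector length used in the paper */
#define MAX_NODES 60   /* F2 layer size used in the paper */

typedef struct {
    int m;                          /* number of input features (<= MAX_DIM) */
    int committed;                  /* F2 nodes already in use */
    double rho, alpha, beta;        /* vigilance, choice parameter, learning rate */
    double w[MAX_NODES][2 * MAX_DIM];
    int label[MAX_NODES];           /* hypothetical: class label per category */
} FuzzyArt;

/* Complement coding: I = (a, 1 - a), with a clamped to [0, 1]. */
static void complement_code(const FuzzyArt *n, const double *a, double *I)
{
    for (int i = 0; i < n->m; i++) {
        double x = a[i] < 0.0 ? 0.0 : (a[i] > 1.0 ? 1.0 : a[i]);
        I[i] = x;
        I[i + n->m] = 1.0 - x;
    }
}

/* |I ^ w_j|: fuzzy AND is the component-wise minimum, |.| is the L1 norm. */
static double and_norm(const FuzzyArt *n, const double *I, int j)
{
    double s = 0.0;
    for (int i = 0; i < 2 * n->m; i++)
        s += (I[i] < n->w[j][i]) ? I[i] : n->w[j][i];
    return s;
}

/* Choice function T_j = |I ^ w_j| / (alpha + |w_j|). */
static double choice(const FuzzyArt *n, const double *I, int j)
{
    double wn = 0.0;
    for (int i = 0; i < 2 * n->m; i++)
        wn += n->w[j][i];
    return and_norm(n, I, j) / (n->alpha + wn);
}

/* Committed, non-reset node with the largest choice value, or -1. */
static int winner(const FuzzyArt *n, const double *I, const int *reset)
{
    int best = -1;
    double best_t = -1.0;
    for (int j = 0; j < n->committed; j++) {
        if (reset != NULL && reset[j]) continue;
        double t = choice(n, I, j);
        if (t > best_t) { best_t = t; best = j; }
    }
    return best;
}

/* Present one labelled training vector; returns the learning node, or -1 if full. */
int fuzzy_art_train(FuzzyArt *n, const double *a, int label)
{
    double I[2 * MAX_DIM];
    int reset[MAX_NODES] = {0};
    int j;
    complement_code(n, a, I);
    while ((j = winner(n, I, reset)) >= 0) {
        double match = and_norm(n, I, j) / n->m;   /* |I| = m under complement coding */
        if (match >= n->rho && n->label[j] == label) {
            for (int i = 0; i < 2 * n->m; i++) {   /* w = beta (I ^ w) + (1 - beta) w */
                double mn = (I[i] < n->w[j][i]) ? I[i] : n->w[j][i];
                n->w[j][i] = n->beta * mn + (1.0 - n->beta) * n->w[j][i];
            }
            return j;
        }
        reset[j] = 1;                              /* mismatch: keep searching */
    }
    if (n->committed >= MAX_NODES) return -1;
    j = n->committed++;                            /* commit a new node with w = I */
    for (int i = 0; i < 2 * n->m; i++) n->w[j][i] = I[i];
    n->label[j] = label;
    return j;
}

/* Label of the best-matching category, or -1 if nothing has been learned. */
int fuzzy_art_predict(const FuzzyArt *n, const double *a)
{
    double I[2 * MAX_DIM];
    complement_code(n, a, I);
    int j = winner(n, I, NULL);
    return j < 0 ? -1 : n->label[j];
}
```

With ρ = 0 the vigilance test never rejects a node, so in this sketch category separation comes entirely from the label check; a higher ρ would create more, tighter categories.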
After the network has been fully trained and tested with satisfactory results, its C source code is generated using the flashcode option, a utility available in Professional II/PLUS.
Hardware Implementation and Results
The project is written entirely in the C programming language using Code Composer Studio, a fully integrated development environment (IDE) for Texas Instruments DSP platforms. A C6416T DSK, a standalone development platform that enables users to evaluate and develop applications for the TI C64xx DSP family, is used to run the application. The Neural Network was designed in the Professional II/PLUS software. The project was built and run on the C6416T DSK, and the results were saved on the host computer as a .dat file. The images were obtained by scanning different texts with a 300 dpi scanner and transferred to the system.
Conclusion and achievements
The aim of this work is to implement a hardware-based Optical Character Recognition
system for printed Arabic characters. The goal was achieved using DSP techniques.
The following points summarize the conclusions of the work:
• Arabic character recognition is a research area involving many concepts and open research questions.
• A Fuzzy ART Neural Network was implemented and tested.
• A novel and efficient recognition method with high accuracy (about 95%) was successfully developed and tested in this work.
• A complete hybrid hardware-based system for isolated Arabic character recognition was proposed and tested for performance.
Future work
The implemented system has some constraints and needs further work to become a reliable and commercially valuable product. The following list notes the limitations and suggested solutions for future research on this project:
• The image is obtained manually: as discussed earlier, the image is read by MATLAB outside the program, and after preprocessing (conversion to binary, etc.) its pixel values are transferred to the project as a header file. A possible solution to this limitation is to write C code that reads the image directly from the computer (or any other host device on which the image is saved).
• The system works only for isolated characters: since the scope of this research is focused
on the recognition problem and the hardware implementation, the current system assumes
that the characters are already isolated and the algorithm can only recognize the isolated
characters. For this project to have a commercial value, the system should be able to
isolate the connected characters from a word. Since there are techniques already developed
for isolating Arabic characters, this system could be integrated with one of the existing
character segmentation algorithms to overcome this limitation.
• The system is developed using a general-purpose DSP board (TMS320C6416T DSK). The algorithm could instead be implemented on a single chip mounted on a smaller board designed only for this purpose.
• A multi-language Optical Character Recognition hardware could be developed if several
OCR applications for several languages are programmed on the same chip using the same
board.
• The board could be integrated with a built-in scanner (like the scanners used to read product bar codes) for image acquisition. This way, the system would work independently of any computer or host device: such a board could scan a document and perform OCR without a computer.
References
[1] Kavianifar, M., & Amin, A. (1999). Preprocessing and structural feature extraction for a multi-fonts Arabic/Persian OCR. Proceedings of the Fifth International Conference on Document Analysis and Recognition (ICDAR '99).
[2] Vinciarelli, A. (2003). Offline cursive handwriting: From word to text recognition.
[3] Govindan, V. K., & Shivaprasad, A. P. (1990). Character recognition: A review. Pattern Recognition, 23, 671-683.
[4] Sekita, I., Toraichi, K., Mori, R., Yamamoto, K., & Yamada, H. (1988). Feature extraction of handprinted Japanese characters by spline function or relaxation matching. Pattern Recognition, 21, 9-17.
[5] Xie, X. L., & Suk, M. (1988). On machine recognition of handprinted Chinese characters by feature relaxation. Pattern Recognition, 21, 1-7.
[6] Matsumura, H., Aoki, K., Iwahara, T., Oohama, H., & Kogura, K. (1986). Desktop optical handwritten character reader. Sanyo Tech., 18, 3-12.
[7] Hashemi, M., Fatemi, O., & Safavi, R. (1995). Persian script recognition. Proceedings of the Third International Conference on Document Analysis and Recognition, II, 869-873.
[8] Allam, M. (1995). Segmentation versus segmentation-free for recognizing Arabic text. Document Recognition II, SPIE, 2422, 228-235.
[9] Ethnologue: Languages of the World (2000). 14th ed. SIL International.
[10] Lorigo, L. M., & Govindaraju, V. (2006). Offline Arabic handwriting recognition: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28, 1.
Biography
Haidar Almohri is currently employed by Siemens Co. as a Communication Engineer in its Kuwait branch. He completed his undergraduate and graduate studies in Electrical Engineering at the University of Hartford, Connecticut, USA.
John S. Gray is currently a Professor of Computer Science at the University of Hartford,
West Hartford, CT. He is Chair of two degree programs – Computer Science and
Multimedia Web Design and Development. His area of interest and expertise is UNIX
system level programming with a focus on interprocess communications. As an educator,
author and consultant, he has been involved with computers and software development for
over 24 years.
Dr. Hisham Alnajjar is an Associate Professor of Electrical and Computer Engineering at the
University of Hartford, Connecticut (USA), where he is also the Associate Dean of the
College of Engineering, Technology, and Architecture (CETA). He received his Ph.D. from Vanderbilt
University and his M.S. from Ohio University. His research interests include sensor array
processing, digital signal processing, and power systems.