This document summarizes research on recognizing online handwritten Sanskrit characters using support vector classification. It discusses using the Freeman chain code to extract features from character images and represent boundary pixels; a randomized algorithm generates the chain codes. Feature vectors are then built and used to train a support vector machine classifier. Segmentation is also used to evaluate possible segmentation zones. The goal is to develop an accurate system for recognizing Sanskrit characters, which is challenging due to complex character shapes and writing styles. Previous work on character recognition is discussed, focusing on Indian scripts such as Devanagari and on techniques such as feature extraction and classification.
An Optical Character Recognition for Handwritten Devanagari ScriptIJERA Editor
Optical Character Recognition is the process of recognizing characters from a scanned document, and many OCR systems are now available in the market. However, most of these systems work for Roman, Chinese, Japanese and Arabic characters. There is not a sufficient body of work on Indian-language scripts such as Devanagari, so this paper presents a review of optical character recognition for handwritten Devanagari script.
FREEMAN CODE BASED ONLINE HANDWRITTEN CHARACTER RECOGNITION FOR MALAYALAM USI...acijjournal
Handwritten character recognition is the conversion of handwritten text to a machine-readable and editable form. Online character recognition deals with live conversion of characters. Malayalam is a language spoken by millions of people in the state of Kerala and the union territories of Lakshadweep and Pondicherry in India. It is written mostly in the clockwise direction and consists of loops and curves. The method aims at training a simple three-layer neural network using the backpropagation algorithm.
Freeman codes are used to represent each character as a feature vector. These feature vectors act as inputs to the network during the training and testing phases of the neural network. The output is the character expressed in Unicode format.
STRUCTURAL FEATURES FOR RECOGNITION OF HANDWRITTEN KANNADA C...ijcsit
Research in image processing involves many active areas; among these, recognition of handwritten characters holds much promise and is a challenging one. The idea is to enable the computer to recognize intelligibly handwritten inputs. In this paper, a new method that uses structural features and a Support Vector Machine (SVM) classifier for recognition of handwritten Kannada characters is presented. Average recognition accuracies of 89.84% and 85.14% for handwritten Kannada vowels and consonants are obtained with the proposed method, in spite of inherent variations.
Signboard Text Translator: A Guide to TouristIJECEIAES
Travelers face trouble understanding signboards written in the local language, and they can rely on a smartphone for traveling purposes. Smartphones have become very popular in recent years in terms of market value and the number of applications useful to users. This work intends to build a web application that can recognize the English content present in signboard pictures captured using a smartphone, translate the content from English to Telugu, and display the translated Telugu text back on the screen of the phone. Experiments have been conducted on various signboard pictures, and the outcomes demonstrate the viability of the proposed approach.
Handwritten character recognition (HCR) is the ability of a computer to receive and interpret handwritten input. It is one of the active and challenging research areas in the field of pattern recognition, which is the process of taking in raw data and taking an action based on the category of the pattern; HCR is one of its well-known applications. Handwriting recognition, especially for Indian languages, is still in its infancy because not much work has been done on it. This paper discusses an approach to recognizing Kannada vowels using chain code features. Kannada is a South Indian language. For any recognition system, an important part is feature extraction, and a proper feature extraction method can increase the recognition rate. In this paper, a chain-code-based feature extraction method is investigated for developing an HCR system. A chain code works on either the 4-neighborhood or the 8-neighborhood method: it is a sequence of direction codes along a character, connected to a starting point, and is often used in image processing. Here the 8-neighborhood method has been implemented, which allows generation of eight different codes for each character. These codes have been used as features of the character image, which were later used for training and testing K-Nearest Neighbor (KNN) classifiers. The accuracy reached 100%.
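The 8-neighborhood chain code described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: each step between consecutive boundary pixels is mapped to one of eight Freeman direction codes, and a normalized histogram of the codes serves as an 8-dimensional feature vector.

```python
# (dx, dy) -> Freeman code, using image coordinates (x right, y down).
DIRECTIONS = {
    (1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
    (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7,
}

def chain_code(boundary):
    """Convert a sequence of 8-connected boundary pixels to Freeman codes."""
    codes = []
    for (x0, y0), (x1, y1) in zip(boundary, boundary[1:]):
        codes.append(DIRECTIONS[(x1 - x0, y1 - y0)])
    return codes

def code_histogram(codes):
    """Normalized frequency of each of the 8 codes: the feature vector."""
    hist = [0.0] * 8
    for c in codes:
        hist[c] += 1
    total = len(codes) or 1
    return [h / total for h in hist]

# A tiny 2x2 square traced from the top-left corner.
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(chain_code(square))  # [0, 6, 4, 2]
```

Feature vectors of this form are what a KNN (or SVM) classifier would be trained on.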
SCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGEScscpconf
Identification of scripts from a multi-script document is one of the important steps in the design of an OCR system for successful analysis and recognition. Most optical character recognition (OCR) systems can recognize at most a few scripts, but for large archives of document images containing different scripts, there must be some way to automatically categorize these documents before applying the proper OCR to them. Much work has already been reported in this area; in the Indian context, though some results have been reported, the task is still in its infancy. This paper presents research on the identification of Tamil, English and Hindi scripts at the word level, irrespective of their font faces and sizes. It also identifies English numerals from multilingual document images. The proposed technique performs a document vectorization method which generates vectors from nine zones segmented over the characters based on their shape, density and transition features. The script is then determined using rule-based classifiers and their sub-classifiers, which contain sets of classification rules derived from the vectors. The proposed system identifies scripts from document images even if they suffer from noise and other kinds of distortion. Results from experiments and simulations show that the proposed technique identifies scripts and numerals with minimal pre-processing and high accuracy. In future, it can also be extended to other scripts.
Video Audio Interface for recognizing gestures of Indian sign LanguageCSCJournals
We propose a system to automatically recognize sign language gestures from a video stream of the signer. The developed system converts words and sentences of Indian sign language into voice and text in English. We have used image processing and artificial intelligence techniques to achieve this objective. To accomplish the task we used image processing techniques such as frame-differencing-based tracking, edge detection, wavelet transform and image fusion to segment shapes in the videos. The system also uses elliptical Fourier descriptors for shape feature extraction and principal component analysis for feature set optimization and reduction. A database of extracted features is compared with the input video of the signer using a trained fuzzy inference system. The proposed system converts gestures into a text and voice message with 91 percent accuracy. The training and testing of the system are done using gestures from Indian Sign Language (INSL); around 80 gestures from 10 different signers are used. The entire system was developed in a user-friendly environment by creating a graphical user interface in MATLAB. The system is robust and can be trained for new gestures using the GUI.
Fragmentation of handwritten touching characters in devanagari scriptZac Darcy
Character segmentation of handwritten words is a difficult task because of different writing styles and complex structural features, and segmentation of handwritten text in Devanagari script is an uphill task. The occurrence of the header line, overlapped characters in the middle zone and half characters makes the segmentation process more difficult. Sometimes, interline space and noise also make line fragmentation difficult. Without separating touching characters it is difficult to identify them; hence fragmentation of the touching characters in a word is necessary. So we devised a technique in which the first step is preprocessing of a word; we then identify the joint points, form bounding boxes around all vertical and horizontal lines, and finally fragment the touching characters on the basis of their height and width.
Character recognition for bi lingual mixed-type characters using artificial n...eSAT Publishing House
IJRET: International Journal of Research in Engineering and Technology is an international, peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together scientists, academicians, field engineers, scholars and students of related fields of Engineering and Technology.
Preprocessing Phase for Offline Arabic Handwritten Character RecognitionEditor IJCATR
In this paper we review important issues in optical character recognition, with emphasis on OCR and its phases. We discuss the main characteristics of the Arabic language and focus on the pre-processing phase of a character recognition system. We describe and implement algorithms for binarization, dot removal and thinning, which are then used in the feature extraction phase. The algorithms were tested using 47,988 isolated character samples taken from the SUST/ALT dataset and achieved good results. The pre-processing phase was developed using MATLAB software.
Segmentation and recognition of handwritten gurmukhi scriptRAJENDRA VERMA
The goal is to segment handwritten cursive words into individual predefined strokes. We designed an algorithm that calculates the angle between two coordinate points and, on the basis of that angle, segments the handwritten cursive word. It improves the accuracy of a handwriting recognition system.
DISCRIMINATION OF ENGLISH TO OTHER INDIAN LANGUAGES (KANNADA AND HINDI) FOR O...IJCSEA Journal
India is a multilingual, multi-script country. In every state of India there are two languages: the state's local language and English. For example, in Andhra Pradesh, a state in India, a document may contain text words in English and Telugu script. For Optical Character Recognition (OCR) of such a bilingual document, it is necessary to identify the script before feeding the text words to the OCRs of the individual scripts. In this paper, we introduce a simple and efficient script identification technique for Kannada, English and Hindi text words of a printed document. The proposed approach is based on horizontal and vertical projection profiles for the discrimination of the three scripts; feature extraction is done using the horizontal projection profile of each text word. We analysed 700 different words of Kannada, English and Hindi in order to extract the discriminating features and to develop the knowledge base. The proposed system was tested on 100 different document images containing more than 1000 text words of each script, and classification rates of 98.25%, 99.25% and 98.87% were achieved for Kannada, English and Hindi respectively.
The presentation describes an algorithm through which one can recognize Devanagari characters; Devanagari is the script in which Hindi is written. The algorithm automatically segments characters from an image of Devanagari text and then recognizes them. To extract the individual characters, the algorithm segments the image several times using vertical and horizontal projections. It starts by segmenting the lines from the document by taking the horizontal projection, and then segments each line into words by taking the vertical projection of the line. A further step particular to Devanagari is required: the header line is removed by finding the horizontal projection of each word. The characters can then be extracted by taking the vertical projection of the word without the header line.
The algorithm uses a Kohonen neural network for the recognition task. After the characters are separated from the image, the image matrix is downsampled to a fixed size so as to make the recognition size-independent. The matrix is then fed as input neurons to the Kohonen neural network, and the winning neuron that identifies the recognized character is found. This information was stored in the Kohonen network earlier, during its training phase: we first assigned random weights from the input neurons to the output neurons, and then, for each training sample, the winning neuron was calculated by finding the maximum output produced by the neurons. The weights for this winning neuron were then adjusted so that it responds to the pattern more strongly the next time.
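The winner-take-all training step described above can be sketched as follows. This is a minimal competitive-learning illustration, not the presentation's implementation; the dimensions and learning rate are illustrative, and a full Kohonen map would also update the winner's neighbors.

```python
import random

random.seed(0)

N_IN, N_OUT, LR = 4, 3, 0.5  # input size, output neurons, learning rate

# One weight vector per output neuron, randomly initialized.
weights = [[random.random() for _ in range(N_IN)] for _ in range(N_OUT)]

def winner(x):
    """Index of the output neuron with the largest response (dot product)."""
    responses = [sum(w * xi for w, xi in zip(wv, x)) for wv in weights]
    return responses.index(max(responses))

def train_step(x):
    """Move the winning neuron's weights toward the input pattern."""
    j = winner(x)
    weights[j] = [w + LR * (xi - w) for w, xi in zip(weights[j], x)]
    return j

# A downsampled character image flattened to a fixed-size vector.
x = [1.0, 0.0, 0.0, 1.0]
j = winner(x)
for _ in range(5):
    train_step(x)
print(winner(x) == j)  # the winner's response grows, so it keeps winning
```

After training, recognition is just a call to `winner` on the downsampled character.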
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
A STUDY ON OPTICAL CHARACTER RECOGNITION TECHNIQUESijcsitcejournal
Optical Character Recognition (OCR) is the process that enables a system, without human intervention, to identify the scripts or alphabets of the user's written communication. OCR has grown to be one of the most successful applications of technology in the field of pattern recognition and artificial intelligence. In this survey we study the various OCR techniques, and we analyze and examine the theoretical and numerical models of OCR. OCR and Magnetic Character Recognition (MCR) techniques are generally used for the recognition of patterns or alphabets. In OCR the characters are in the form of pixel images and may be either handwritten or printed, of any size, shape or orientation. In MCR, by contrast, the characters are printed with magnetic ink, and the reading machine categorizes each character on the basis of the unique magnetic field it produces. Both MCR and OCR find use in banking and various commercial applications. Earlier research on optical character recognition has shown that handwritten text places no restriction on the writing technique; handwritten script is difficult to recognize due to diverse human handwriting styles and variation in the angle, size and shape of the letters. An assortment of OCR approaches is discussed here, along with their performance.
A Comprehensive Study On Handwritten Character Recognition Systemiosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
A Review on Geometrical Analysis in Character Recognitioniosrjce
BLOB DETECTION TECHNIQUE USING IMAGE PROCESSING FOR IDENTIFICATION OF MACHINE...ijiert bestjournal
Optical character recognition systems have been effectively developed for the recognition of p rinted characters. Optical character recognition is an awesome computer vision technique with various applications ranging from saving real time scripts digitally and deriving context based intelligence using natural language processing from the texts. One such application is the recognition of machine printed characters. This paper illustrates the technique to identify machine printed characters using Blob detection method and Image processing. In many cases of such machine printed characters there is simi larity between character colour and background colour. There is mix up of reflected light and scattered light. Colour is not consistent across character area or background area. Paper explains how Blob detection technique is used for recognition of these m achines printed characters.
Optical character recognition (OCR) is process of classification of optical patterns contained in a digital image. The process of OCR Recognition involves several steps including pre-processing, segmentation, feature extraction, classification. Pre-processing is for done the basic operation on input image like noise reduction which remove the noisy signal from image. Segmentation stage for segment the given image into line by line and segment each character from segmented line. Future extraction calculates the characteristics of character. A Radial Basis Function Neural Network (RBFNN) is used to classification contains the database and does the comparison.
Character Recognition (Devanagari Script)IJERA Editor
Character Recognition is has found major interest in field of research and practical application to analyze and study characters in different languages using image as their input. In this paper the user writes the Devanagari character using mouse as a plotter and then the corresponding character is saved in the form of image. This image is processed using Optical Character Recognition in which location, segmentation, pre-processing of image is done. Later Neural Networks is used to identify all the characters by the further process of OCR i.e. by using feature extraction and post-processing of image. This entire process is done using MATLAB.
Because of the rapid growth in technology breakthroughs, including
multimedia and cell phones, Telugu character recognition (TCR) has recently
become a popular study area. It is still necessary to construct automated and
intelligent online TCR models, even if many studies have focused on offline
TCR models. The Telugu character dataset construction and validation using
an Inception and ResNet-based model are presented. The collection of 645
letters in the dataset includes 18 Achus, 38 Hallus, 35 Othulu, 34×16
Guninthamulu, and 10 Ankelu. The proposed technique aims to efficiently
recognize and identify distinctive Telugu characters online. This model's main
pre-processing steps to achieve its goals include normalization, smoothing,
and interpolation. Improved recognition performance can be attained by using
stochastic gradient descent (SGD) to optimize the model's hyperparameters.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofsAlex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Essentials of Automations: The Art of Triggers and Actions in FME
O45018291
Ms. P P. Kulkarni et al, Int. Journal of Engineering Research and Applications, www.ijera.com
ISSN: 2248-9622, Vol. 4, Issue 5 (Version 1), May 2014, pp. 82-91
Online Handwritten Sanskrit Character Recognition Using Support Vector Classification

Prof. Sonal P. Patil [1], Ms. Priyanka P. Kulkarni [2]
[1] Assistant Professor, Computer Science, GHRIEM, Jalgaon, Maharashtra, India
[2] Research Scholar, Computer Science & Engg, GHRIEM, Jalgaon, Maharashtra, India
Abstract
Handwriting recognition has been one of the most active and challenging research areas in the field of image processing. In this paper, we analyse a feature extraction technique for recognizing online handwritten Sanskrit words using pre-processing and segmentation. Most of the current work in this area is limited to English and a few oriental languages, and the lack of efficient solutions for Indic scripts and languages such as Sanskrit has hindered information extraction from a large body of documents of cultural and historical importance. Here we use the Freeman chain code (FCC) as the representation of a character image. The chain code describes the boundary of a character image, with each code giving the direction to the location of the next boundary pixel. A randomized algorithm is used to generate the FCC. After that, a feature vector is built: the chain code is converted into various features, which form the input to classification. Segmentation is applied to evaluate the possible segmentation zones. Accordingly, several generations are performed to evaluate the individuals with maximum fitness value. A support vector machine (SVM) is chosen for the classification step.
Key Words: Freeman chain code (FCC), Heuristic method, Support vector machine (SVM)
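The Freeman chain code representation mentioned above can be sketched as follows. This is an illustrative Python sketch under the assumption that the character boundary has already been traced into an ordered, 8-connected pixel sequence; the direction numbering (0 = east, counting counter-clockwise) is one common convention and not necessarily the exact one used in this work.

```python
# Map (dx, dy) steps between consecutive boundary pixels to Freeman directions.
# Image coordinates: x grows rightward, y grows downward, so "north" is dy = -1.
FREEMAN_DIRS = {
    (1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
    (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7,
}

def freeman_chain_code(boundary):
    """boundary: list of (x, y) pixels in traversal order (8-connected)."""
    codes = []
    for (x0, y0), (x1, y1) in zip(boundary, boundary[1:]):
        codes.append(FREEMAN_DIRS[(x1 - x0, y1 - y0)])
    return codes

# A small 2x2 square traced clockwise starting at its top-left corner:
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(freeman_chain_code(square))  # [0, 6, 4, 2]
```

Because each code stores only a direction, the chain code is a far more compact boundary description than the raw pixel coordinates.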
I. INTRODUCTION
Humans can accurately recognize handwritten characters if they are neat and clean; it is a very easy task for human beings, and even children can do it. For a machine, however, the same task is very difficult. Different languages are written in specific scripts; Hindi and Marathi are among the most commonly used languages, spoken by many thousands of people [1]. Most of the current work in this area is limited to English and a few oriental languages. The lack of efficient solutions for Indic scripts and languages such as Sanskrit has hampered information extraction from a large body of documents of cultural and historical importance. Sanskrit characters contain complicated curves and various shapes, so recognition of Sanskrit characters is a difficult and complicated task [2]. All these considerations make optical character recognition (OCR) of Sanskrit script very challenging. The ultimate goal of designing a character recognition system with an accuracy rate of 100% is quite difficult to reach because handwritten characters are non-uniform; they can be written in many different styles. Different writers write characters in various sizes, and there is even variation in characters written by the same writer at different times [3]. The problem of exchanging data between human beings and computing machines is challenging. Basically, character recognition is a process which associates a symbolic meaning with objects (letters, symbols and numbers) drawn on an image, i.e., character recognition techniques associate a symbolic identity with the image of a character.
1.1 Optical Character Recognition
Optical character recognition deals with the problem of recognizing optically processed characters. Optical recognition is performed off-line, after the writing or printing has been completed, as opposed to on-line recognition, where the computer recognizes the characters as they are drawn, as shown in Figure 1. Both hand-printed and machine-printed characters may be recognized, but the performance is directly dependent upon the quality of the input documents: the more constrained the input is, the better the performance of the OCR system will be. However, when it comes to totally unconstrained handwriting, OCR machines are still a long way from reading as well as humans. The computer reads fast, though, and technical advances are continually bringing the technology closer to its ideal [3].
RESEARCH ARTICLE OPEN ACCESS
Fig -1: The different areas of character recognition
1.2 Classification of Character Recognition Systems
1.2.1 Classification According to Data Acquiring Process
A) Online CRS
An online character recognition system involves an electronic digitizer, as shown in Figure 2. A special electronic pen samples the handwriting input, and writing is done on an electronic surface. The digitizer captures the temporal, or dynamic, information of the writing. This information consists of the pen strokes (i.e., the writing from pen-down to pen-up), the order of the pen strokes, the direction of writing, and the speed of writing within each stroke. The online handwriting signal therefore contains additional information that is not accessible offline.
Fig -2: Electronic Digitizer
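Such temporal stroke data is usually normalized before feature extraction. The sketch below shows one common, assumed pre-processing step: resampling a pen-down-to-pen-up point sequence to a fixed number of equally spaced points along its arc length, so that strokes written at different speeds become comparable. The function name and parameters are illustrative only.

```python
import math

def resample_stroke(points, n=8):
    """points: list of (x, y) pen samples in time order; returns n points
    spaced evenly along the stroke's arc length (linear interpolation)."""
    # cumulative arc length at each sample
    dists = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dists.append(dists[-1] + math.hypot(x1 - x0, y1 - y0))
    total = dists[-1]
    out, j = [], 0
    for i in range(n):
        target = total * i / (n - 1)
        # advance to the segment containing the target arc length
        while j < len(points) - 2 and dists[j + 1] < target:
            j += 1
        seg = dists[j + 1] - dists[j]
        t = 0.0 if seg == 0 else (target - dists[j]) / seg
        x = points[j][0] + t * (points[j + 1][0] - points[j][0])
        y = points[j][1] + t * (points[j + 1][1] - points[j][1])
        out.append((x, y))
    return out

stroke = [(0, 0), (4, 0), (4, 4)]      # an L-shaped stroke, sparsely sampled
print(resample_stroke(stroke, n=5))    # a point every 2 units of arc length
```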
B) Offline CRS
In the offline method, a piece of paper is used to write the character, which is then scanned directly into the system by a scanner or camera, as shown in Figure 3. In this system, the image of the writing is converted into a bit pattern by an optical digitizing device such as a scanner or camera, and the bit pattern data is represented by a matrix of pixels. A typical task of such an offline recognition system is to recognize handwritten characters on letters or parcels.
Fig -3: Handwritten character captured by camera
1.2.2 Classification According to Text Type
A) Printed CRS
Printed text includes all printed materials such as books, newspapers, magazines and documents. Machine-printed text and numbers have a uniform nature: for any given font, printed characters are of the same size. The recognition rate is, however, very much dependent on the age of the document and the quality of the paper and ink, which may introduce significant data acquisition noise.
B) Handwritten CRS
There is non-uniformity in handwritten characters: different writers write the characters in different ways and in various sizes, and there is even variation in the writing of the same writer at different times. Handwritten characters may vary in shape as well. Handwritten character recognition is therefore the most difficult part of character recognition [2].
Scanned documents require a large amount of storage, and many processing tasks such as searching for content, editing and maintenance are either hard or impossible to perform on them. Such documents require human beings to process them manually; for example, a postman manually reads and sorts postal addresses and zip codes. OCR, short for optical character recognition, translates such scanned images of printed or handwritten documents into machine-encoded text. The translated machine-encoded text can be easily edited, searched and processed in many other ways, and it also requires far less storage than the scanned documents. Optical character recognition thus simplifies human work and reduces the manual handling and processing of documents. Computerized processing to recognize individual characters is required to convert a scanned document into machine-encoded form.
1.3 Character Recognition Architecture
Optical character recognition involves many steps to completely recognize and produce machine-encoded text. These phases are termed pre-processing, segmentation, feature extraction and classification. The architecture of these phases is shown in Figure 4, and the phases are listed below with brief descriptions [4].
Fig -4: The stages of a handwriting recognition system
Pre-processing: The pre-processing phase normally includes techniques such as binarization, noise removal, skew detection, slant correction, normalization, contour making and skeletonization, applied to make it easy to extract relevant features from the character image and to recognize it efficiently.
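As an illustration of the binarization and normalization steps just listed, this hedged sketch thresholds a small grayscale character image and crops it to its bounding box; a real system would typically add noise removal, skew correction and the other steps named above, and the threshold value here is an arbitrary assumption.

```python
def binarize(img, threshold=128):
    """img: 2-D list of grayscale values (0-255); dark ink -> 1, background -> 0."""
    return [[1 if v < threshold else 0 for v in row] for row in img]

def crop_to_character(binary):
    """Trim empty rows/columns so the character fills its bounding box,
    a crude form of size normalization."""
    rows = [i for i, r in enumerate(binary) if any(r)]
    cols = [j for j in range(len(binary[0])) if any(r[j] for r in binary)]
    return [[binary[i][j] for j in range(cols[0], cols[-1] + 1)]
            for i in range(rows[0], rows[-1] + 1)]

img = [[255, 255, 255, 255],
       [255,  30,  40, 255],
       [255,  35,  25, 255],
       [255, 255, 255, 255]]
print(crop_to_character(binarize(img)))  # [[1, 1], [1, 1]]
```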
Segmentation: The segmentation phase, which is sometimes considered part of the pre-processing phase itself, involves splitting the image document into classifiable objects, generally isolated characters or modifiers. Generally practised segmentations are line segmentation, word segmentation, character segmentation, and horizontal segmentation to separate upper and lower modifiers, particularly in the context of most Indian scripts.
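The projection-profile idea behind line and character segmentation can be sketched as follows; this is a generic illustration, not necessarily the segmentation-zone evaluation used in this paper. Rows (or columns) whose ink count falls to zero mark candidate segmentation zones between lines (or between characters).

```python
def segmentation_zones(profile):
    """profile: per-row or per-column ink counts; returns (start, end) index
    ranges of the non-empty runs, i.e. the lines/characters themselves."""
    zones, start = [], None
    for i, count in enumerate(profile):
        if count > 0 and start is None:
            start = i                      # a run of ink begins
        elif count == 0 and start is not None:
            zones.append((start, i - 1))   # run ended at the previous index
            start = None
    if start is not None:
        zones.append((start, len(profile) - 1))
    return zones

binary = [[0, 0, 0],
          [1, 1, 0],
          [1, 0, 1],
          [0, 0, 0],
          [0, 1, 1]]
row_profile = [sum(row) for row in binary]  # horizontal projection
print(segmentation_zones(row_profile))      # [(1, 2), (4, 4)]
```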
Feature Extraction: Feature extraction derives relevant features on the basis of which characters are recognized. First, features are computed and extracted; then the most relevant features are selected to construct the feature vector, which is eventually used for recognition. The computation of features is based on structural, statistical, directional, moment or transformation-based approaches.
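As a concrete, assumed example of constructing a feature vector from a chain code, the sketch below builds a normalized histogram of the eight Freeman directions; the actual features used in this work may differ.

```python
def direction_histogram(chain_code):
    """chain_code: list of Freeman directions 0-7; returns a fixed-length
    feature vector of 8 relative frequencies, one per direction."""
    counts = [0] * 8
    for d in chain_code:
        counts[d] += 1
    total = len(chain_code)
    return [c / total for c in counts]

print(direction_histogram([0, 0, 6, 4, 2, 0, 6, 6]))
# [0.375, 0.0, 0.125, 0.0, 0.125, 0.0, 0.375, 0.0]
```

Normalizing by the code length makes the vector independent of the boundary length, so characters of different sizes map to comparable features.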
Classification: Each pattern's feature vector is classified into one of a set of predefined classes using classifiers. Classifiers are first trained on a training set of pattern samples to prepare a model, which is later used to recognize the test samples. The training data should consist of a wide variety of samples so that all possible samples can be recognized during testing. Some generally practised classifiers are the Support Vector Machine (SVM), K-Nearest Neighbour (K-NN), Artificial Neural Network (ANN) and Probabilistic Neural Network (PNN).
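To illustrate the train-then-test flow described above, the following sketch uses scikit-learn's SVC as one possible SVM implementation (an assumed tool choice; the paper names SVM but no specific library). The toy feature vectors and class labels are purely illustrative.

```python
from sklearn.svm import SVC

# Toy training set: two well-separated "character classes" in feature space,
# standing in for chain-code-derived feature vectors.
X_train = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
y_train = ["ka", "ka", "kha", "kha"]

clf = SVC(kernel="linear")   # the classifier is trained first...
clf.fit(X_train, y_train)

# ...and later used to recognize unseen test samples.
print(clf.predict([[0.85, 0.15], [0.15, 0.85]]))  # ['ka' 'kha']
```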
1.4 Issues of Handwritten Character Recognition
Whether handwritten documents are recognized online or offline, character recognition is much affected by style variations between different writers, and even between samples produced by the same writer at different times. Distortion and noise introduced during digitization are also major issues that negatively affect recognition accuracy [4]. Several issues regarding handwritten character recognition are discussed below:
Handwriting Style Variations
Different writers, and even the same writer, have different handwriting styles; many times a person is unable to recognize his or her own handwriting. Hence, it is practically very difficult for a machine to recognize handwriting efficiently. Deformed geometry, slants, skews, overlapping, noise and distortion are introduced by different writers in different ways, and geometric properties such as aspect ratio, position and size vary. Some variation also exists between samples of the same character, even though such samples share a high degree of similarity. The shape of a character is also influenced by the word in which it appears: characters can look similar although the number of strokes, and the drawing order and direction of the strokes, may vary considerably.
Constrained and Unconstrained Handwriting
The characters to be recognized may be constrained or unconstrained. Because unconstrained documents include all possible style variations, such documents are much more difficult to recognize; to make the recognition process easier, at least for laboratory work, constrained documents are used.
In a constrained document format, the handwritten samples are written in a standard format that makes the characters easy to recognize. A constrained document has a box-discrete or space-discrete arrangement of characters or words. In the box-discrete arrangement, each character is written in a separate box of standard size; in the space-discrete arrangement, characters are written with ample space between one another to make segmentation and recognition easier.
Unconstrained documents consist of touching, overlapping and cursive characters. Cursive writing makes recognition difficult due to stroke variability; touching characters are difficult to segment from each other, and overlapping characters make the situation worse.
Writer Dependent or Independent Recognition
A writer-dependent recognition system recognizes the samples of only those writers whose samples were used to train the system; that is, it is specific to a group of writers. In a writer-dependent system, all possible style variations can be trained into the system, so a higher recognition rate can be obtained.
On the other hand, a writer-independent system requires generalization so that it can also recognize the handwritten samples of unknown writers. It must therefore be trained with all possible and commonly used style variations, using a large number of samples taken from many different types of writers, to make the recognition system generalized. Because it must recognize unknown samples, the recognition rate of a writer-independent system is comparatively lower. In practice, writer-independent systems are in greater demand because of their generalized application.
Personal and Situational Aspects
Personal factors include the writer's writing style, which may be affected by handedness, whether left-handed or right-handed. Many people habitually write text lines with a random or specific inclination. Good recognition requires neat and clean handwriting, and writing style also depends to some extent on the writer's profession. The situational aspects depend on whether the writer is interested in writing, how much attention the writer is paying, whether the text was written with proper time or in a hurry, whether there was any interruption while writing, the quality of the material used for writing, and so on.
Number of Stroke Classes
Due to the presence of composite characters in Indian writing systems, a large number of stroke classes are possible. These stroke classes represent consonants, vowels and modifiers, or combinations of consonants and vowels. The large number of stroke classes and the shape complexity of the various strokes increase the complexity of the recognition system. This is addressed by choosing an efficient recognition algorithm whose performance does not degrade with a large number of classes.
Directionality of Writing
There exist big variations in the directionality of writing strokes and stroke segments, which can affect the uniformity of stroke representation using certain features. It is therefore necessary to identify writing-direction-invariant features for representing the stroke.
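One classic way to obtain a representation that is invariant to the overall writing direction, offered here only as an illustrative example and not necessarily the features used in this work, is the first difference of the Freeman chain code: it records turns rather than absolute directions, so it is unchanged when the whole stroke is rotated by a multiple of 45 degrees.

```python
def first_difference(chain_code):
    """Turn-based code: difference of consecutive Freeman directions mod 8."""
    return [(b - a) % 8 for a, b in zip(chain_code, chain_code[1:])]

# The same square boundary traced starting in two different directions:
print(first_difference([0, 6, 4, 2]))  # [6, 6, 6]
print(first_difference([2, 0, 6, 4]))  # [6, 6, 6]  -- rotated copy, same turns
```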
1.5 Applications of Handwritten Character Recognition
A handwritten character recognition system is basically used to convert a sample character image into the corresponding machine-coded character. This basic capability underlies many practical applications, some of which are listed below:
Check reading: Banks need to process an abundant number of checks. A handwritten or printed OCR system can be used to automatically read the name of the recipient, verify the signature, read the amount filled in and read all other information.
Form processing: Form processing can be applied to forms and documents in public applications. In such forms, handwritten information is written in the space provided, and this handwritten information can be processed automatically by a handwritten OCR system.
Signature verification: On legal or other documents that include the signatures of authorities and persons, signatures can be verified using a handwritten character recognition system. The system can be trained with various samples of the signatures of the required persons, and later their signatures can be verified on any document.
II. LITERATURE SURVEY
2.1 Overview
In this chapter, a literature survey of previous work and related approaches to character recognition is presented. The most advanced and efficient OCR systems are designed for scripts and languages such as English, Chinese and Japanese. In the context of Indian languages and scripts, significant research has recently been reported. Most Indian work on character recognition is dominated by the Devanagari script, which is used in writing the Hindi, Marathi, Nepali and Sanskrit languages. The Devanagari script dominates because Hindi, which is written in Devanagari, is spoken by a large part of the Indian population and is the national language of India. After Devanagari, second place in recognition research is taken by the Bangla script [6]. Recently much research has also been carried out on other Indian scripts such as Gurumukhi [7], Gujarati, Tamil, Telugu [8] and many others. Most of the current work in these areas is nonetheless limited to English and a few oriental languages, and the lack of efficient solutions for Indic scripts and languages such as Sanskrit has hampered information extraction from a large body of documents of cultural and historical importance.
In my literature survey I have studied many research approaches practised on many languages and scripts, particularly in the Indian context. The emphasis is on studying and analysing the feature extraction approaches, the observed or reported results, and many other issues relevant to any new research work on OCR. After this detailed literature survey I was able to determine the theme of my proposed work, including the techniques I have incorporated in its various phases, to implement it, and to evaluate my results and conclusions.
2.2 Works Related to Character Recognition
The first research report on handwritten
Devanagari characters was published in 1977 [5]. For
Indian languages, most research has been performed
first on the Devanagari script and second on the
Bangla script. U. Pal and B.B. Chaudhury [6]
presented a survey on Indian script character
recognition. Their paper introduces the properties of
Indian scripts and the work and methodologies
applied to different Indian scripts. They presented a
study of character recognition work on many Indian
language scripts including Devanagari, Bangla,
Tamil, Oriya, Gurumukhi, Gujarati and Kannada.
U. Pal, Wakabayashi and Kimura also
presented a comparative study of Devanagari
handwritten character recognition using different
features and classifiers [7]. They used four sets of
features based on curvature and gradient information
obtained from binary as well as gray-scale images,
and compared results using 12 different classifiers.
They concluded that the best results, 94.94% and
95.19% for features extracted from binary and gray
images respectively, were obtained with the Mirror
Image Learning (MIL) classifier. They also
concluded that curvature features give better results
than gradient features for most classifiers.
A later review of research on Devanagari
character recognition is presented by Vikas Dungre et
al. [8]. They reviewed the techniques available for
character recognition. They introduced the image
pre-processing techniques for thresholding, skew
detection and correction, size normalization and
thinning that are used in character recognition. They
also reviewed feature extraction using global
transformations and series expansions such as the
Fourier transform, Gabor transform, wavelets and
moments; statistical features such as zoning,
projections, crossings and distances; and some
commonly practiced geometrical and topological
features. Finally, they reviewed classification using
template matching, statistical techniques, neural
networks, SVM, and combinations of classifiers for
better recognition accuracy.
Prachi Mukherji and Priti Rege [9] used
shape features and fuzzy logic for offline Devanagari
character recognition. They segmented the thinned
character into strokes using structural features such
as end points, cross points and junction points. They
classified the segmented shapes or strokes as left
curve, right curve, horizontal stroke, vertical stroke,
slanted lines, etc. They used tree and fuzzy classifiers
and obtained an average accuracy of 86.4%.
Giorgos Vamvakas et al. [10], [11] have
described the statistical and structural features used
in their approach to Greek handwritten character
recognition. The statistical features they used are
zoning, projections and profiling, and crossings and
distances. Further, through zoning they derived local
features such as density and direction features. For
direction features they used directional histograms of
contour and skeleton images. In addition to normal
profile features they described in- and out-profiles of
image contours. The structural features they depict
are end points, crossing points, loops, horizontal and
vertical projection histograms, the radial histogram,
and radial out-in and in-out histograms.
Sarbajit Pal et al. [12] have described a projection-
based statistical approach for handwritten character
recognition. They proposed four-sided projections of
characters, smoothed by polygon approximation.
Wang Jin et al. [13] evolved a series of
recognition systems by using the virtual
reconfigurable architecture-based evolvable
hardware. To improve the recognition accuracy of the
proposed systems, a statistical pattern recognition-
inspired methodology was introduced.
Sandhya Arora et al. [1] used intersection, shadow
features, chain code histograms and straight-line
fitting features, with a weighted majority voting
technique for combining the classification decisions
obtained from different Multi-Layer Perceptron
(MLP) based classifiers. They obtained 92.80%
accuracy for handwritten Devanagari recognition.
They also used chain code histogram and moment-
based features in [13] to recognize handwritten
Devanagari characters. The chain code was generated
by detecting the direction of the next in-line pixel in
the scaled contour image. Moment features were
extracted from the scaled and thinned character
image.
Fuzzy directional features are used in [14], in
which directional features were derived from the
angle between two adjacent curvature points. This
approach was used to recognize online handwritten
Devanagari characters, with accuracy of up to
96.89%.
2.3 Work on Sanskrit Character Recognition
Namita Dwivedi et al. described recognition
of Sanskrit words using Prewitt's operator for
extracting features from an image; a thinning process
is applied as a pre-processing technique. Thinning is
an important pre-processing step in OCR; its purpose
is to delete redundant information while retaining the
characteristic features of the image. The Freeman
chain code, a representation technique useful in the
image processing, shape analysis and pattern
recognition fields, is used with a heuristic approach
for feature extraction. A genetic algorithm is used for
non-linear segmentation of multiple characters. The
recognition model was built from SVM classifiers for
higher classification accuracy [2].
III. METHODOLOGY
This section describes the architecture of
the HCR model in detail.
3.1 Preprocessor
In pre-processing, the scanned document is
converted to a binary image, and various other
techniques are applied to remove noise and make the
image ready and appropriate for feature extraction
and further recognition computations. These
techniques include segmentation to isolate individual
characters, skeletonization, contour extraction,
normalization, filtration, etc. Which pre-processing
techniques are suitable depends highly on our
requirements and is also influenced by the
mechanisms adopted in later steps. Some of the pre-
processing techniques are listed and discussed briefly
in the following sections.
A. Gray Scale Image
If the input document is scanned in a colored
image format, it may be required to first convert it
into gray scale before converting it to a binary image.
B. Binarization
Binarization converts a colored (RGB) or gray-
scale image into a binary image. A colored image
first needs to be converted into a gray image. To
convert a gray image into a binary image, we specify
a threshold value for the gray level, mapping the
range of gray levels onto exactly two levels, black or
white, i.e. either 0 or 1, with nothing in between [16].
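As an illustration, the thresholding step can be sketched in pure Python (an illustrative sketch only; the threshold value 128 and the list-of-lists image representation are assumptions, not from the paper):

```python
# Minimal binarization sketch: map gray levels (0-255) onto two levels.
# Dark pixels (below the threshold) become ink (1), the rest background (0).
def binarize(gray, threshold=128):
    return [[1 if px < threshold else 0 for px in row] for row in gray]

gray = [[250,  30, 245],
        [ 40,  20,  60],
        [240,  50, 235]]
binary = binarize(gray)   # -> [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
```

In practice the threshold is often chosen automatically (e.g. by Otsu's method) rather than fixed.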
C. Smoothing and Noise Removal
Smoothing operations are used to blur the
image and reduce the noise. Blurring is used in pre-
processing steps such as removal of small details
from an image. In binary images, smoothing
operations are used to reduce the noise or to
straighten the edges of the characters, for example, to
fill the small gaps or to remove the small bumps in
the edges (contours) of the characters [17].
D. Skew Detection and Correction
Deviation of the baseline of the text from
horizontal direction is called skew. Document skew
often occurs during document scanning or copying.
This effect visually appears as a slope of the text
lines with respect to the x-axis, and it mainly
concerns the orientation of the text lines.
E. Slant Correction
The character inclination normally found in
cursive writing is called slant. The general purpose of
slant correction is to reduce the variation of the script
and specifically to improve the quality of the
segmentation. To correct the slant, we first need to
estimate the slant angle (θ); then a horizontal shear
transform is applied to all pixels of the character/digit
image in order to shift them to the left or to the right
(depending on the sign of θ) [15].
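A hedged sketch of this shear transform, assuming the slant angle θ has already been estimated and that ink pixels are given as (x, y) coordinates with y measured downward from the top row:

```python
import math

# Horizontal shear: each ink pixel is shifted by -y_from_base * tan(theta),
# where y_from_base is the distance from the baseline (bottom row), so the
# baseline itself stays fixed.
def deslant(pixels, height, theta):
    t = math.tan(theta)
    return {(round(x - ((height - 1) - y) * t), y) for x, y in pixels}

# A 45-degree slanted stroke becomes a vertical one.
straight = deslant({(2, 0), (1, 1), (0, 2)}, height=3, theta=math.pi / 4)
```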
F. Character Normalization
Character normalization is considered to be
the most important pre-processing operation for
character recognition. Normally, the character image
is mapped onto a standard plane (with predefined
size) so as to give a representation of fixed
dimensionality for classification. The goal of
character normalization is to reduce the within-class
variation of the shapes of characters/digits, in order
to facilitate feature extraction and improve
classification accuracy. Basically, there are two
different approaches to character normalization:
linear methods and nonlinear methods; both are
described in [15].
G. Thinning (Skeleton)
Thinning is an important pre-processing step
in OCR. The purpose of thinning is to delete
redundant information while retaining the
characteristic features of the image. Thinning is
applied to find the skeleton of a character; the
skeleton is the output of the thinning process.
Thinning is a morphological operation used to
remove selected foreground pixels from binary
images, somewhat like erosion or opening. It can be
used for several applications, but is particularly
useful for skeletonization. In this mode it is
commonly used to tidy up the output of edge
detectors by reducing all lines to single-pixel
thickness. Thinning is normally applied only to
binary images, and produces another binary image as
output. A simple example of thinning of a binary
image is shown in Fig. 5.
Fig -5: Skeleton produced by thinning process
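The paper does not name the specific thinning algorithm it uses; as one concrete possibility, the classical Zhang-Suen method can be sketched as follows (illustrative only):

```python
# Zhang-Suen thinning: iteratively peel boundary pixels from a binary
# image (1 = ink) in two sub-passes until no pixel can be removed,
# leaving a thin skeleton.
def zhang_suen_thin(image):
    img = [row[:] for row in image]
    h, w = len(img), len(img[0])

    def neighbours(y, x):
        # P2..P9, clockwise, starting from the pixel directly above
        return [img[y-1][x], img[y-1][x+1], img[y][x+1], img[y+1][x+1],
                img[y+1][x], img[y+1][x-1], img[y][x-1], img[y-1][x-1]]

    def sub_pass(phase):
        marked = []
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                if img[y][x] != 1:
                    continue
                n = neighbours(y, x)
                b = sum(n)                        # non-zero neighbours
                a = sum(n[i] == 0 and n[(i + 1) % 8] == 1
                        for i in range(8))        # 0 -> 1 transitions
                if phase == 0:
                    ok = n[0]*n[2]*n[4] == 0 and n[2]*n[4]*n[6] == 0
                else:
                    ok = n[0]*n[2]*n[6] == 0 and n[0]*n[4]*n[6] == 0
                if 2 <= b <= 6 and a == 1 and ok:
                    marked.append((y, x))
        for y, x in marked:
            img[y][x] = 0
        return bool(marked)

    while sub_pass(0) | sub_pass(1):
        pass
    return img

# Example: a 3-pixel-wide vertical bar thins towards a thin skeleton.
bar = [[0] * 5 for _ in range(7)]
for y in range(1, 6):
    for x in range(1, 4):
        bar[y][x] = 1
skeleton = zhang_suen_thin(bar)
```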
3.2 Feature Extraction
A. View based features
This method is based on the fact that for
correct character recognition a human usually needs
only partial information about the character's shape
and contour. This feature extraction method, which
works on a scaled, thinned, binarized image,
examines four "views" of each character, extracting
from them a characteristic vector that describes the
given character, as shown in Fig. 6. A view is a set of
points that plot one of the four projections of the
object (top, bottom, left and right); it consists of
pixels belonging to the contour of the character and
having an extreme value of one of its coordinates.
For example, the top view of a letter is the set of
points having the maximal y coordinate for a given x
coordinate. Next, characteristic points are marked out
on the surface of each view to describe the shape of
that view [16]. (Thus, with 5 × 5 blocks, 5 × 5 × 8 =
200 features are obtained for recognition.)
Fig -6: Selecting characteristic points for four
views
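A minimal sketch of extracting the four views, written in image coordinates (row 0 at the top, so the "top view" here records the first ink row in each column); the list-of-lists binary image is an assumption:

```python
# For each column, top/bottom record the first ink row seen from above /
# below; for each row, left/right record the first ink column seen from
# the left / right. -1 marks an empty column or row.
def four_views(img):
    h, w = len(img), len(img[0])
    top = [next((y for y in range(h) if img[y][x]), -1) for x in range(w)]
    bottom = [next((y for y in range(h - 1, -1, -1) if img[y][x]), -1)
              for x in range(w)]
    left = [next((x for x in range(w) if img[y][x]), -1) for y in range(h)]
    right = [next((x for x in range(w - 1, -1, -1) if img[y][x]), -1)
             for y in range(h)]
    return top, bottom, left, right

# An "L"-shaped character
top, bottom, left, right = four_views([[1, 0, 0],
                                       [1, 0, 0],
                                       [1, 1, 1]])
```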
B. Shadow Features of character
The shadow is basically the length of the
projection on the sides, as shown in Fig. 7. For
computing shadow features on the scaled binary
image, the rectangular boundary enclosing the
character image is divided into eight octants. For
each octant, the shadows or projections of the
character segment on three sides of the octant-
dividing triangle are computed, so a total of 24
shadow features are obtained. Each of these features
is divided by the length of the corresponding side of
the triangle to get a normalized value [8].
Fig -7: Shadow Features of character
C. Chain Code Histogram of Character Contour
Given a scaled binary image, first find the contour
points of the character image. Consider a 3 × 3
window centered on an object point of the image. If
any of the 4-connected neighbour points is a
background point, then the object point (P), as shown
in Fig. 8, is considered a contour point.
Fig -8: Contour point detection
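The 4-connected contour test can be sketched as follows (illustrative; pixels outside the image are treated as background):

```python
# An ink pixel is a contour point if at least one of its 4-connected
# neighbours is background (or lies outside the image).
def is_contour_point(img, y, x):
    if not img[y][x]:
        return False
    h, w = len(img), len(img[0])
    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ny, nx = y + dy, x + dx
        if not (0 <= ny < h and 0 <= nx < w) or img[ny][nx] == 0:
            return True
    return False

# Filled 3x3 block: border pixels are contour points, the centre is not.
block = [[0, 0, 0, 0, 0],
         [0, 1, 1, 1, 0],
         [0, 1, 1, 1, 0],
         [0, 1, 1, 1, 0],
         [0, 0, 0, 0, 0]]
```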
The contour following procedure uses a contour
representation called "chain coding", shown in Fig. 9.
Each pixel of the contour is assigned a code that
indicates the direction of the next pixel belonging to
the contour. The chain code gives the points in
relative position to one another, independent of the
coordinate system. In this methodology of chain
coding connecting neighbouring contour pixels, both
the points and the outline coding are captured. The
contour following procedure may proceed in a
clockwise or counter-clockwise direction; here, we
have chosen to proceed clockwise [8]. The main
problem in representing characters using the FCC is
that the length of the FCC depends on the starting
point. Also, since FCC generation requires traversing
each pixel (or node) of the character, one often
encounters the problem of several branches and of
revisiting the same nodes. To solve this problem, a
heuristic is used to generate the FCC that correctly
represents the characters [2].
Fig -9: Chain Coding: (a) direction of
connectivity, (b) 4-connectivity, (c) 8-connectivity
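A sketch of 8-directional chain coding for an ordered contour; the direction numbering below is one common convention and is an assumption, not necessarily the one used in [2]:

```python
# Freeman chain code of an ordered, 8-connected contour. Direction
# convention assumed here (image coordinates, y grows downward):
# 0=E, 1=NE, 2=N, 3=NW, 4=W, 5=SW, 6=S, 7=SE.
CODES = {(1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
         (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7}

def chain_code(points):
    """points: ordered (x, y) contour pixels forming a closed loop."""
    return [CODES[(x1 - x0, y1 - y0)]
            for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1])]

# Clockwise boundary of a 3x3 square: runs of E, S, W, N moves.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
codes = chain_code(square)                    # [0, 0, 6, 6, 4, 4, 2, 2]
hist = [codes.count(d) for d in range(8)]     # the chain code histogram
```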
A heuristic is a method to find a solution
that is close to the best, without guaranteeing that the
best will be found. The heuristic proposed here is a
randomized algorithm; its pseudo code, used to
generate the FCC of a character, is given below. The
procedure starts from a first node chosen by the node
method or the end-node method. The node method
finds the first pixel on each boundary aspect, such as
left upper, left lower, right upper and right lower; the
end-node method finds the first pixel based on the
end position of a character. In this randomized
algorithm, while the number of visited nodes is less
than the total number of nodes, a node's neighbours
fall into three kinds: unvisited, visited and taboo.
Unvisited neighbours are nodes that have never been
through the route search. Visited neighbours are
nodes that have already been through the route
search. Taboo neighbours are used to keep track of
the visited search space and of nodes revisited one
step after the current node. The features input to the
segmentation stage are obtained from the chain code,
converted to the calculated values of ratio-upper,
ratio-right, ratio-height-weight, ratio-height and the
number of string characters. The ratio-upper is
calculated by first cropping the image and then
defining the centre of the character image; after that,
the total number of upper-character pixels is divided
by the total number of character pixels. The same is
done for ratio-right, ratio-height-weight and ratio-
height. The formula for character height is shown in
the equation below [2]:
Height = Height of character / Height of image
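As an illustration of these ratio features, a sketch of ratio-upper and ratio-height on a cropped binary character (the exact definitions in [2] may differ; this follows the description above):

```python
# ratio-upper: ink pixels in the upper half of the cropped character
# divided by all ink pixels; ratio-height: character height divided by
# image height, as in the equation above.
def ratio_features(img, char_top, char_bottom):
    h = len(img)
    total = sum(map(sum, img))
    upper = sum(map(sum, img[:h // 2]))
    return upper / total, (char_bottom - char_top + 1) / h

char = [[1, 0],
        [1, 0],
        [1, 1],
        [1, 1]]
ratio_upper, ratio_height = ratio_features(char, char_top=0, char_bottom=3)
```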
Randomized Algorithm (pseudo code):
Initialize Data
while Termination Not Met do
    Select First Node Randomly
        { Node-Method, End-Node-Method }
    while Number of Visited Nodes < Number of Nodes do
        if there are Unvisited Neighbours then
            Select one Node Randomly
        else if there are Visited Neighbours then
            Select one Node Randomly
        else if there are Taboo Neighbours then
            Select one Node Randomly
        end if
    end while
end while
Display Solution
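A runnable sketch of the inner loop of this randomized traversal, with the preference order (unvisited, then visited, then taboo) as stated above; the adjacency-dict graph representation and the fixed seed are assumptions:

```python
import random

# Walk a character graph (node -> list of 8-connected neighbours) until
# every node has been visited. Neighbour choice prefers unvisited nodes,
# then visited-but-not-taboo nodes, then taboo nodes, picking randomly
# within each class; taboo nodes record where the walk has just been.
def random_traverse(graph, start, seed=0):
    rng = random.Random(seed)
    visited, taboo, order = {start}, set(), [start]
    node = start
    while len(visited) < len(graph):
        nbrs = graph[node]
        pool = ([n for n in nbrs if n not in visited]
                or [n for n in nbrs if n not in taboo]
                or nbrs)
        taboo.add(node)
        node = rng.choice(pool)
        visited.add(node)
        order.append(node)
    return order

# Three pixels in a line: a - b - c
order = random_traverse({'a': ['b'], 'b': ['a', 'c'], 'c': ['b']}, 'a')
```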
IV. SEGMENTATION
Segmentation partitions the digital image
into multiple segments. These segments form local
zones of the image in a way that is useful for
extracting features for character recognition. It is a
process of assigning a label to each pixel in an image
such that pixels with the same label share certain
visual characteristics. In character recognition we
generally need the following types of segmentation
processes, applied one after another, proceeding to
segmentation into ever smaller objects. These
segmentation processes are discussed below; the first
three types are discussed in [19].
A. Line Segmentation: Straight text lines having
enough horizontal white space between them can be
segmented easily, but in practice, particularly in the
case of handwritten texts, this simple technique may
not succeed. In the context of Indian scripts having
header lines, a prevalent approach is to detect lines
by horizontal projections: the header line tends to
have the maximum number of pixels, while the base
line tends to have the minimum number of pixels in
the horizontal projection.
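The horizontal-projection approach can be sketched as follows (blank rows are taken as line separators, an assumption that suits clean, unskewed text):

```python
# Rows with zero ink in the horizontal projection separate text lines;
# each returned pair is the (first, last) row index of one line.
def segment_lines(img):
    proj = [sum(row) for row in img]
    lines, start = [], None
    for y, p in enumerate(proj):
        if p > 0 and start is None:
            start = y
        elif p == 0 and start is not None:
            lines.append((start, y - 1))
            start = None
    if start is not None:
        lines.append((start, len(img) - 1))
    return lines

page = [[1, 1], [1, 0], [0, 0], [0, 1], [1, 1]]   # two "lines" of ink
```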
B. Word Segmentation: After line segmentation, the
text in each line is segmented to detect words; this is
called word segmentation. Word segmentation is
easier than line segmentation, because there is
generally enough space present between words, and
each word is bound by a header line with no space
within the word.
C. Character Segmentation: After word
segmentation, each word needs to be segmented into
characters; this is called character segmentation. In
Indian scripts having header lines, the header line is
usually removed first so that vertical space appears
between the characters within the word under
consideration.
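A sketch of this step: drop the header-line row (taken here as the row with the maximum ink count) and split the remainder on empty columns. Practical header-line detection is more involved; this is illustrative:

```python
# Remove the header line, then split the word wherever the vertical
# projection of the remaining rows drops to zero.
def segment_characters(word):
    proj = [sum(row) for row in word]
    header = proj.index(max(proj))
    body = [row for y, row in enumerate(word) if y != header]
    w = len(word[0])
    cols = [sum(row[x] for row in body) for x in range(w)]
    chars, start = [], None
    for x, c in enumerate(cols):
        if c > 0 and start is None:
            start = x
        elif c == 0 and start is not None:
            chars.append((start, x - 1))
            start = None
    if start is not None:
        chars.append((start, w - 1))
    return chars

word = [[1, 1, 1, 1, 1],     # header line joining two vertical strokes
        [1, 0, 0, 1, 0],
        [1, 0, 0, 1, 0],
        [1, 0, 0, 1, 0]]
```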
D. Zone Segmentation: In most Indian scripts, such
as Devanagari used for Sanskrit, modifiers are
present either above the header line or below the base
line. The basic and conjunct characters are present in
the middle horizontal zone, below the header line and
above the base line. Most Indian researchers prefer to
segment the text horizontally into three zones,
namely the upper, middle and lower zones. The
header line and base line, having respectively the
maximum and minimum number of pixels in the
horizontal projection, are used to separate these
zones.
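A minimal sketch of the three-zone split, taking the header line as the row with the maximum ink count and the base line as the last inked row (one simple convention; others exist):

```python
# Returns the (first, last) row ranges of the upper, middle and lower
# zones, split at the header line and the base line.
def zones(img):
    proj = [sum(row) for row in img]
    header = proj.index(max(proj))
    base = max(y for y, p in enumerate(proj) if p > 0)
    return (0, header - 1), (header, base), (base + 1, len(img) - 1)

glyph = [[0, 1, 0],     # a modifier above the header line
         [1, 1, 1],     # header line (maximum ink)
         [0, 1, 0],
         [0, 1, 0],     # last inked row = base line
         [0, 0, 0]]
```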
V. CLASSIFIERS
OCR systems extensively use the
methodologies of pattern recognition, which assign
an unknown sample to a predefined class. Various
techniques for OCR have been investigated by
researchers. The concept of the SVM (Support
Vector Machine) was introduced by Vapnik and co-
workers. It gained popularity because it offers
attractive features, powerful machinery to tackle the
classification problem (i.e., deciding which sample
belongs to which group), and promising empirical
performance. The SVM is based on statistical
learning theory. The SVM's better generalization
performance rests on the principle of Structural Risk
Minimization (SRM); the idea of SRM is to
maximize the margin of class separation. The SVM
was defined for the two-class problem and looks for
the optimal hyperplane that maximizes the distance,
or margin, between the nearest examples of both
classes [2].
At present the SVM is a popular
classification tool used for pattern recognition and
other classification purposes. Support vector
machines (SVMs) are a group of supervised learning
methods that can be applied to classification or
regression. The standard SVM classifier takes a set of
input data and predicts, for each input, which of only
two distinct classes it belongs to. An SVM classifier
is trained on a given set of training data, and the
resulting model is used to classify test data. For a
multiclass classification problem, we decompose the
multiclass problem into multiple binary-class
problems and design a suitable combination of
multiple binary SVM classifiers [17].
Support vector machines are based on the
concept of decision planes that define decision
boundaries. A decision plane is one that separates a
set of objects having different class memberships.
The simplest such classifier separates a set of objects
into their respective classes with a line. Most
classification tasks, however, are not that simple, and
often more complex structures are needed to make an
optimal separation, i.e., to correctly classify new
objects (test cases) on the basis of the examples that
are available (training cases).
SVM Algorithm:
1. Choose a kernel function
2. Choose a value for C (controls overfitting)
3. Solve the quadratic programming problem (many
software packages are available)
4. Construct the discriminant function from the
support vectors
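For illustration, the two-class linear case can be trained from scratch by sub-gradient descent on the hinge loss (a toy sketch with made-up 2-D data; a real system would solve the quadratic program with a package such as LIBSVM or scikit-learn):

```python
# Minimal linear soft-margin SVM trained by batch sub-gradient descent
# on the regularized hinge loss: max(0, 1 - y*(w.x + b)) + lam*|w|^2/2.
def train_svm(data, epochs=200, lr=0.1, lam=0.01):
    """data: list of (x1, x2, label) with label in {-1, +1}."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        gw, gb = [lam * w[0], lam * w[1]], 0.0
        for x1, x2, y in data:
            if y * (w[0] * x1 + w[1] * x2 + b) < 1:   # margin violated
                gw[0] -= y * x1
                gw[1] -= y * x2
                gb -= y
        w = [w[0] - lr * gw[0] / len(data),
             w[1] - lr * gw[1] / len(data)]
        b -= lr * gb / len(data)
    return w, b

def predict(w, b, x1, x2):
    return 1 if w[0] * x1 + w[1] * x2 + b >= 0 else -1

# Two linearly separable clusters (made-up toy data).
data = [(2, 1, 1), (3, 2, 1), (2.5, 3, 1),
        (-2, -1, -1), (-3, -2, -1), (-2.5, -3, -1)]
w, b = train_svm(data)
```

The optimal hyperplane corresponds to the learned (w, b); the kernel trick (step 1 above) would replace the dot products to handle non-linear boundaries.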
VI. PERFORMANCE ANALYSIS
Researchers have investigated OCR for a
number of Indian scripts, including Devanagari,
Tamil, Telugu, Bengali, Kannada and Gurumukhi,
using different feature extraction techniques and
different classifiers. Some analyses from different
research papers are listed below.
Table -1: Performance analysis
VII. CONCLUSIONS
In this paper, we have presented a system
for recognizing handwritten characters, proposing a
model for handwritten Sanskrit character recognition.
The model starts with pre-processing. The pre-
processing stage involves all of the operations
needed to produce a clean character image, so that it
can be used directly and efficiently by the feature
extraction stage. A thinning process used in the pre-
processing stage produces the skeleton of a character.
The second step is feature extraction: the FCC
generated from the characters is used as the feature
set for segmentation. The main problem in
representing characters using the FCC is that the
length of the FCC depends on the starting point;
moreover, since FCC generation requires traversing
each pixel (or node) of the character, several
branches and revisits of the same nodes are often
encountered. To solve this problem, a heuristic is
used to generate the FCC that correctly represents the
characters. The third step is segmentation using the
features generated from the FCC; correct
segmentation of characters is mandatory for their
successful recognition. The recognition model was
built from SVM classifiers for higher classification
accuracy.
REFERENCES
[1] S. Arora, D. Bhattacharjee, M. Nasipuri, D.K.
Basu, and M. Kundu, "Combining
Multiple Feature Extraction Techniques for
Handwritten Devanagari Character
Recognition", Proc. IEEE Region 10
Colloquium and Third Intl. Conf.
Industrial & Information Systems,
Kharagpur (India), 2008.
[2] Namita Dwivedi, Kamal Srivastava and
Neelam Arya, "Sanskrit Word Recognition
Using Prewitt's Operator and Support
Vector Classification", IEEE International
Conference on Emerging Trends in
Computing, Communication and
Nanotechnology (ICECCN 2013).
[3] Prof. M.S. Kumbhar, Y.Y. Chandrachud,
"Handwritten Marathi Character
Recognition Using Neural Network",
International Journal of Emerging
Technology and Advanced Engineering,
Volume 2, Issue 9, September 2012.
[4] Shruthi Kubatur, Maher Sid-Ahmed, Majid
Ahmadi, "A Neural Network Approach to
Online Devanagari Handwritten
Character Recognition".
[5] Jayaraman, A., Chandra Sekhar, C.,
Chakravarthy, V. S., “Modular approach
to recognition of strokes in Telugu
script”, Proceedings of the Ninth
International Conference on Document
Analysis and Recognition, September 23-
26, pp 501-505, 2007.
[6] U. Pal, B.B. Chaudhury, “Indian Script
Character Recognition: A Survey”,
Pattern Recognition, Elsevier, pp. 1887-
1899, 2004.
[7] U. Pal, Wakabayashi, Kimura,
“Comparative Study of Devanagari
Handwritten Character Recognition using
Different Feature and Classifiers”, 10th
International Conference on Document
Analysis and Recognition, pp. 1111-1115,
2009.
[8] Vikas J Dungre et al., “A Review of
Research on Devanagari Character
Recognition”, International Journal of
Computer Applications, Volume-12, No.2,
pp. 8-15, November 2010.
[9] Prachi Mukherji, Priti Rege, “Shape Feature
and Fuzzy Logic Based Offline Devanagari
Handwritten Optical Character
Recognition”, Journal of Pattern
Recognition Research 4,pp. 52-68, 2009.
[10] G. Vamvakas, B. Gatos, S. Petridis, N.
Stamatopoulos, “Optical Character
Recognition for Handwritten Characters”.
[11] G. Vamvakas, B. Gatos, S. Petridis, N.
Stamatopoulos, "An Efficient Feature
Extraction and Dimensionality Reduction
Scheme for Isolated Greek Handwritten
Character Recognition", Ninth International
Conference on Document Analysis and
Recognition(ICDAR), Vol.2, pp. 1073-1077,
September 2007.
[12] Sarbajit Pal, Jhimli Mitra, Soumya Ghose,
Paromita Banerjee, "A Projection
Based Statistical Approach for Handwritten
Character Recognition," in Proceedings of
International Conference on Computational
Intelligence and Multimedia Applications,
Vol. 2, pp.404-408, 2007.
[13] Wang Jin, Tang Bin-bin, Piao Chang-hao,
Lei Gai-hui, "Statistical method-based
evolvable character recognition
system",IEEE International Symposium on
Industrial Electronics (ISIE), pp. 804-808,
July 2009.
[14] Arora, D. Bhattacharjee, M. Nasipuri,
M.Kundu, D.K. Basu, “Application of
Statistical Features in Handwritten
Devanagari Character Recognition”,
International Journal of Recent Trends in
Engineering (IJRTE), Vol. 2, No. 2, pp. 40-
42, November 2009.
[15] Mehmet Sezgin and Bulent Sankur, "Survey
over image thresholding techniques and
quantitative performance evaluation",
Journal of Electronic Imaging, Vol. 13,
Issue 1, pp. 146-165, January 2004.
[16] Sandhya Arora, Debotosh Bhattacharjee,
Mita Nasipuri, L. Malik , M. Kundu and D.
K. Basu, “ Performance Comparison of
SVM and ANN for Handwritten
Devanagari Character Recognition”, IJCSI
International Journal of Computer Science
Issues, Vol. 7, Issue 3, May 2010.
[17] Dr. Renu Dhir and Mrs. Rajneesh Rani,
"Handwritten Gurumukhi Character
Recognition".
[18] Mahesh Jangid, “Devanagari Isolated
Character Recognition by using
Statistical features (Foreground Pixels
Distribution, Zone Density and Background
Directional Distribution feature and SVM
Classifier) ”,International Journal on
Computer Science and Engineering Vol. 3,
No. 6, June 2011.