SlideShare a Scribd company logo
1 of 5
Download to read offline
ONLINE HANDWRITTEN SCRIPT RECOGNITION
                                    IEEE APPROACH
Automatic identification of handwritten script facilitates many important applications such as
automatic transcription of multilingual documents and search for documents on the Web containing
a particular script. The increase in usage of handheld devices which accept handwritten input has
created a growing demand for algorithms that can efficiently analyze and retrieve handwritten data.
This project proposes a method to classify words and lines in an online handwritten document into
one of the six major scripts: Arabic, Cyrillic, Devnagari, Han, Hebrew, or Roman. The classification
is based on 11 different spatial and temporal features extracted from the strokes of the words.
The proposed system attains an overall classification accuracy of 87.1 percent at the word level with
5-fold cross validation on a data set containing 13,379 words. The classification accuracy improves
to 95 percent as the number of words in the test sample is increased to five, and to 95.5 percent for
complete text lines consisting of an average of seven words.


OBJECTIVE
•   Automatic identification of handwritten script facilitates many important applications such as
    automatic transcription of multilingual documents and search for documents on the Web
    containing a particular script.
•   The increase in usage of handheld devices which accept handwritten input has created a
    growing demand for algorithms that can efficiently analyze and retrieve handwritten data.
•   This project proposes a method to classify words and lines in an online handwritten document
    into one of the six major scripts: Arabic, Cyrillic, Devnagari, Han, Hebrew, or Roman.


DESCRIPTION OF PROBLEM
The field of work on automatic script recognition deals with offline documents, i.e., documents which
are either handwritten or printed on a paper and then scanned to obtain a two-dimensional digital
representation. The existing method deals with script recognition in off-line.
Today we have lot of electronic devices and lot of input devices are available such as digital pen,
digital tablet, stylus through which user can input any type of script in side computer. Such devices,
which generate handwritten documents with online or dynamic (temporal) information, require
efficient algorithms for processing and retrieving handwritten data.
Online documents may be written in different languages and scripts. A single document page in
itself may contain text written in multiple scripts. So the project aims to develop efficient algorithm to
recognize the script.

Existing System
The existing method deals with languages are identified:
•   Using projection profiles of words and character shapes.
•   Using horizontal projection profiles and looking for the presence or absence of specific shapes
    in different scripts.
•   Existing method deals with only few characteristics.
•   Most of the method does this in off-line.

PROPOSED SYSTEM
The proposed method uses the features of connected components to classify six different scripts
(Arabic, Chinese, Cyrillic, Devnagari, Japanese, and Roman) and reported a classification accuracy
of 88 percent on document pages.
There are a few important aspects of online documents that enable us to process them in a
fundamentally different way than offline documents.
The most important characteristic of online documents is that they capture the temporal sequence of
strokes while writing the document. This allows us to analyze the individual strokes and use the
additional temporal information for both script identification as well as text recognition.
In the case of online documents, segmentation of foreground from the background is a relatively
simple task as the captured data, i.e., the (x; y) coordinates of the locus of the stylus, defines the
characters and any other point on the page belongs to the background.
We use stroke properties as well as the spatial and temporal information of a collection of strokes to
identify the script used in the document. Unfortunately, the temporal information also introduces
additional variability to the handwritten characters, which creates large intraclass variations of
strokes in each of the script classes.


Data collection and Preprocessing
The control for drawing that is writing the script is provided. User can choose the language and
write the script document with several pages and store inside script folder. For one language
various script can be written and stored. These documents are samples used for recognition
process. When the number of sample increases the error rate for recognition can be reduced.
So better keep as such script as possible before you are going for recognition.           During
preprocessing, the individual strokes are resampled to make the sampled points equidistant. This
helps to reduce the variations in scripts due to different writing speeds and to avoid anomalous
cases such as having a large number of samples at the same position when the user holds the pen
down at a point. The strokes are then smoothed using a Gaussian (lowpass) filter.
The x and y coordinates of the sequence of points are independently filtered by convolution with
one-dimensional Gaussian kernels. This reduces noise due to pen vibrations and errors in the
sensing mechanism.
The individual strokes are again resampled to make the points equidistant. The resampling
distances in both cases were set to 10 pixels and the standard deviation of the Gaussian was set to
6, both determined experimentally for the data. During the lowpass filtering and resampling
operations, the critical points in a stroke are retained.
A critical point is defined as a point in the stroke where the x or y direction of the stroke reverses
(changes sign), in addition to the extreme points (pen-up or pen-down) of the stroke. Fig. 3 shows
an example of preprocessing the online character a (consisting of a single stroke).




The character “r” written in two different styles. The offline representations ((a) and (b)) look similar,
but the temporal variations make them look very different ((c) and (d)).
The writing direction is indicated using arrows in (a) and (b). The vertical axes in (c) and (d)
represent time.
Each user was asked to write one page of text in a particular script, with each page containing
approximately 20 lines of text. No restriction was imposed on the content or style of writing (cursive
or handprint).
The writers consisted of college graduate students, professors, and employees in private companies
and businessmen. The details of the database used for this work. Multiple pages from some of the
writers were collected at different times.
Line and Word Detection
The data available to a script recognizer is usually a complete handwritten page or a subset of it. To
recognize the script of individual lines or words in the page, we first need to segment the page into
lines and words. The problem of text line identification in online documents has been attempted.
To identify the individual lines, first the interline distance is estimated. The interline distance, d, is
defined as the distance between successive peaks in the autocorrelation of the y-axis projection of
the text.The y-axis projection of the document and the autocorrelation of the projection.
The interline distance estimate is indicated as d. Lines are identified by finding valleys in the
projection, keeping the interline distance as a guiding factor. To avoid local minima, we choose only
those points, which have the smallest magnitude within a window of width d, as valleys.
The text-line boundaries identified for the document along with the y-axis projection. Once the line
boundaries are obtained, the text is divided into lines by collecting all the strokes which fall in
between two successive line boundaries.
The temporal information from stroke order is used to disambiguate strokes which fall across line
boundaries and to correctly group small strokes, which may fall into an adjacent line.
A word is defined as a set of strokes that overlap horizontally. The segmentation of a line into words
is done using an x-axis projection of the text in the line.
The valleys in the projection are noted as word boundaries and the strokes which fall between two
boundaries are collected and labeled as a word. The minimum width of a valley in the projection for
word segmentation was experimentally determined as n numbers pixels for the resolution of the
device used.


FEATURE EXTRACTION
Each sample or pattern that we attempt to classify is either a word or a set of contiguous words in a
line. It is helpful to study the general properties of each of the six scripts for feature extraction.
1.   Arabic: Arabic is written from right to left within a line and the lines are written from top to
     bottom. A typical Arabic character contains a relatively long main stroke which is drawn from
     right to left, along with one to three dots.
     The character set contains three long vowels. Short markings (diacritics) may be added to the
     main character to indicate short vowels. Due to these diacritical marks and the dots in the
     script, the length of the strokes vary considerably.
2.   Cyrillic: Cyrillic script looks very similar to the cursive Roman script. The most distinctive
     features of Cyrillic script, compared to Roman script are: 1) individual characters, connected
     together in a word, form one long stroke, and 2) the absence of delayed strokes. Delayed
     strokes cause movement of the pen in the direction opposite to the regular writing direction.
3.   Devnagari: The most important characteristic of Devnagari script is the horizontal line present
     at the top of each word, called “Shirorekha”. These lines are usually drawn after the word is
     written and hence are similar to delayed strokes in Roman script. The words are written from
     left to right in a line.
4.   Han: Characters of Han script are composed of multiple short strokes. The strokes are usually
     drawn from top to bottom and left to right within a character. The direction of writing of words in
     a line is either left to right or top to bottom. The database used in this study contains Han script
     text of the former type (horizontal text lines).
5.   Hebrew: Words in a line of Hebrew script are written from right to left and, hence, the script is
     temporally similar to Arabic. The most distinguishing factor of Hebrew from Arabic is that the
     strokes are more uniform in length in the former.
6.   Roman: Roman script has the same writing direction as Cyrillic, Devnagari, and Han scripts. We
     have already noted the distinguishing features of these scripts compared to the Roman script.
     In addition, the length of the strokes tends to fall between that of Devnagari and Cyrillic scripts.
     The features are extracted either from the individual strokes or from a collection of strokes.
     Here, we describe the features and their method of computation.
Horizontal Interstroke Direction (HID)
This is the sum of the horizontal directions between the starting points ofconsecutive strokes in the
pattern. The feature essentially captures the writing direction within a line.

Shirorekha Strength
This feature measures the strength of the horizontal line component in the pattern.

Average Stroke length
Average length of individual strokes in the pattern.

Shirorekha Confidence
We compute a confidence measure for a stroke being a Shirorekha Each stroke in the pattern is
inspected for three properties width of a word, always occur at the top of the word, and are
horizontal.

Recognition
The above mentioned statistics are calculated and the classifier is used to find which script is used
in the document.
In this project the classifier used is k-Nearest Neighbor classifier. It finds the distance between two
feature vectors was computed using the Euclidean distance metric.
The transformation was applied to every feature to make their mean zero, and variance unity, in the
training set. This transformation is referred to as normalization.
With help of this training set database, the experiments are performed and this classifier is invoked
to recognize the script.

System Analysis
System analysis can be defined, as a method that is determined to use the resources, machine in
the best manner and perform tasks to meet the information needs of an organization.

System Description
It is also a management technique that helps us in designing a new systems or improving an
existing system.
The four basic elements in the system analysis are
     •    Output
     •    Input
     •    Files
     •    Process
The above-mentioned are mentioned are the four basis of the System Analysis.

System Design
Design is concerned with identifying software components specifying relationships among
components. Specifying software structure and providing blue print for the document phase.
Modularity is one of the desirable properties of large systems. It implies that the system is divided
into several parts. In such a manner, the interaction between parts is minimal clearly specified.
Design will explain software components in detail. This will help the implementation of the system.
Moreover, this will guide the further changes in the system to satisfy the future requirements.

Form Design
Form is a tool with a message; it is the physical carrier of data or information. The user interface
form gives option for the user to select the language and user can write a new document, can open
an existing document to view or can able to delete.
If new document written it can be saved into the database.
HARDWARE & SOFTWARE SPECIFICATION
Processor    :    Intel Processor IV
RAM          :    128 MB
Hard disk    :    20 GB
CD drive     :    40 x Samsung
Floppy drive :    1.44 MB
Monitor      :    15’ color Monitor
Keyboard     :    108 mercury keyboard
Mouse        :    Logitech mouse


Software Specification
Operating System      –     Windows XP/2000
Language used         –     Java


                 ONLINE HANDWRITTEN SCRIPT RECOGNITION

System Flow Chart

More Related Content

Similar to Online Handwritten Script Recognition with 87.1% Accuracy

A survey on Script and Language identification for Handwritten document images
A survey on Script and Language identification for Handwritten document imagesA survey on Script and Language identification for Handwritten document images
A survey on Script and Language identification for Handwritten document imagesiosrjce
 
SCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGES
SCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGESSCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGES
SCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGEScscpconf
 
ARABIC ONLINE HANDWRITING RECOGNITION USING NEURAL NETWORK
ARABIC ONLINE HANDWRITING RECOGNITION USING NEURAL NETWORKARABIC ONLINE HANDWRITING RECOGNITION USING NEURAL NETWORK
ARABIC ONLINE HANDWRITING RECOGNITION USING NEURAL NETWORKijaia
 
Online Hand Written Character Recognition
Online Hand Written Character RecognitionOnline Hand Written Character Recognition
Online Hand Written Character RecognitionIOSR Journals
 
Haralick Texture Features based Syriac(Assyrian) and English or Arabic docume...
Haralick Texture Features based Syriac(Assyrian) and English or Arabic docume...Haralick Texture Features based Syriac(Assyrian) and English or Arabic docume...
Haralick Texture Features based Syriac(Assyrian) and English or Arabic docume...Editor IJCATR
 
An Empirical Study on Identification of Strokes and their Significance in Scr...
An Empirical Study on Identification of Strokes and their Significance in Scr...An Empirical Study on Identification of Strokes and their Significance in Scr...
An Empirical Study on Identification of Strokes and their Significance in Scr...IJMER
 
ANOMALY DETECTION IN ARABIC TEXTS USING NGRAMS AND SELF ORGANIZING MAPS
ANOMALY DETECTION IN ARABIC TEXTS USING NGRAMS AND SELF ORGANIZING MAPSANOMALY DETECTION IN ARABIC TEXTS USING NGRAMS AND SELF ORGANIZING MAPS
ANOMALY DETECTION IN ARABIC TEXTS USING NGRAMS AND SELF ORGANIZING MAPSIJCSEA Journal
 
International Journal of Computer Science, Engineering and Applications (IJCSEA)
International Journal of Computer Science, Engineering and Applications (IJCSEA)International Journal of Computer Science, Engineering and Applications (IJCSEA)
International Journal of Computer Science, Engineering and Applications (IJCSEA)IJCSEA Journal
 
ANOMALY DETECTION IN ARABIC TEXTS USING NGRAMS AND SELF ORGANIZING MAPS
ANOMALY DETECTION IN ARABIC TEXTS USING NGRAMS AND SELF ORGANIZING MAPSANOMALY DETECTION IN ARABIC TEXTS USING NGRAMS AND SELF ORGANIZING MAPS
ANOMALY DETECTION IN ARABIC TEXTS USING NGRAMS AND SELF ORGANIZING MAPSIJCSEA Journal
 
Script identification using dct coefficients 2
Script identification using dct coefficients 2Script identification using dct coefficients 2
Script identification using dct coefficients 2IAEME Publication
 
Preprocessing Phase for Offline Arabic Handwritten Character Recognition
Preprocessing Phase for Offline Arabic Handwritten Character RecognitionPreprocessing Phase for Offline Arabic Handwritten Character Recognition
Preprocessing Phase for Offline Arabic Handwritten Character RecognitionEditor IJCATR
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGES
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGESA NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGES
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGESijnlc
 
DIMENSION REDUCTION FOR SCRIPT CLASSIFICATION- PRINTED INDIAN DOCUMENTS
DIMENSION REDUCTION FOR SCRIPT CLASSIFICATION- PRINTED INDIAN DOCUMENTSDIMENSION REDUCTION FOR SCRIPT CLASSIFICATION- PRINTED INDIAN DOCUMENTS
DIMENSION REDUCTION FOR SCRIPT CLASSIFICATION- PRINTED INDIAN DOCUMENTSijait
 
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITIONA MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITIONIJNLC Int.Jour on Natural Lang computing
 
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITIONA MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITIONkevig
 
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITIONA MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITIONijnlc
 
Wavelet Packet Based Features for Automatic Script Identification
Wavelet Packet Based Features for Automatic Script IdentificationWavelet Packet Based Features for Automatic Script Identification
Wavelet Packet Based Features for Automatic Script IdentificationCSCJournals
 
A study of feature extraction for Arabic calligraphy characters recognition
A study of feature extraction for Arabic calligraphy characters recognitionA study of feature extraction for Arabic calligraphy characters recognition
A study of feature extraction for Arabic calligraphy characters recognitionIJECEIAES
 

Similar to Online Handwritten Script Recognition with 87.1% Accuracy (20)

A survey on Script and Language identification for Handwritten document images
A survey on Script and Language identification for Handwritten document imagesA survey on Script and Language identification for Handwritten document images
A survey on Script and Language identification for Handwritten document images
 
P01725105109
P01725105109P01725105109
P01725105109
 
SCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGES
SCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGESSCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGES
SCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGES
 
ARABIC ONLINE HANDWRITING RECOGNITION USING NEURAL NETWORK
ARABIC ONLINE HANDWRITING RECOGNITION USING NEURAL NETWORKARABIC ONLINE HANDWRITING RECOGNITION USING NEURAL NETWORK
ARABIC ONLINE HANDWRITING RECOGNITION USING NEURAL NETWORK
 
Online Hand Written Character Recognition
Online Hand Written Character RecognitionOnline Hand Written Character Recognition
Online Hand Written Character Recognition
 
Haralick Texture Features based Syriac(Assyrian) and English or Arabic docume...
Haralick Texture Features based Syriac(Assyrian) and English or Arabic docume...Haralick Texture Features based Syriac(Assyrian) and English or Arabic docume...
Haralick Texture Features based Syriac(Assyrian) and English or Arabic docume...
 
An Empirical Study on Identification of Strokes and their Significance in Scr...
An Empirical Study on Identification of Strokes and their Significance in Scr...An Empirical Study on Identification of Strokes and their Significance in Scr...
An Empirical Study on Identification of Strokes and their Significance in Scr...
 
ANOMALY DETECTION IN ARABIC TEXTS USING NGRAMS AND SELF ORGANIZING MAPS
ANOMALY DETECTION IN ARABIC TEXTS USING NGRAMS AND SELF ORGANIZING MAPSANOMALY DETECTION IN ARABIC TEXTS USING NGRAMS AND SELF ORGANIZING MAPS
ANOMALY DETECTION IN ARABIC TEXTS USING NGRAMS AND SELF ORGANIZING MAPS
 
International Journal of Computer Science, Engineering and Applications (IJCSEA)
International Journal of Computer Science, Engineering and Applications (IJCSEA)International Journal of Computer Science, Engineering and Applications (IJCSEA)
International Journal of Computer Science, Engineering and Applications (IJCSEA)
 
ANOMALY DETECTION IN ARABIC TEXTS USING NGRAMS AND SELF ORGANIZING MAPS
ANOMALY DETECTION IN ARABIC TEXTS USING NGRAMS AND SELF ORGANIZING MAPSANOMALY DETECTION IN ARABIC TEXTS USING NGRAMS AND SELF ORGANIZING MAPS
ANOMALY DETECTION IN ARABIC TEXTS USING NGRAMS AND SELF ORGANIZING MAPS
 
Script identification using dct coefficients 2
Script identification using dct coefficients 2Script identification using dct coefficients 2
Script identification using dct coefficients 2
 
Preprocessing Phase for Offline Arabic Handwritten Character Recognition
Preprocessing Phase for Offline Arabic Handwritten Character RecognitionPreprocessing Phase for Offline Arabic Handwritten Character Recognition
Preprocessing Phase for Offline Arabic Handwritten Character Recognition
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGES
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGESA NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGES
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGES
 
DIMENSION REDUCTION FOR SCRIPT CLASSIFICATION- PRINTED INDIAN DOCUMENTS
DIMENSION REDUCTION FOR SCRIPT CLASSIFICATION- PRINTED INDIAN DOCUMENTSDIMENSION REDUCTION FOR SCRIPT CLASSIFICATION- PRINTED INDIAN DOCUMENTS
DIMENSION REDUCTION FOR SCRIPT CLASSIFICATION- PRINTED INDIAN DOCUMENTS
 
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITIONA MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
 
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITIONA MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
 
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITIONA MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
A MULTI-STREAM HMM APPROACH TO OFFLINE HANDWRITTEN ARABIC WORD RECOGNITION
 
Wavelet Packet Based Features for Automatic Script Identification
Wavelet Packet Based Features for Automatic Script IdentificationWavelet Packet Based Features for Automatic Script Identification
Wavelet Packet Based Features for Automatic Script Identification
 
A study of feature extraction for Arabic calligraphy characters recognition
A study of feature extraction for Arabic calligraphy characters recognitionA study of feature extraction for Arabic calligraphy characters recognition
A study of feature extraction for Arabic calligraphy characters recognition
 

More from ncct

Biomedical Wearable Device For Remote Monitoring Ofphysiological Signals
Biomedical Wearable Device For Remote Monitoring Ofphysiological SignalsBiomedical Wearable Device For Remote Monitoring Ofphysiological Signals
Biomedical Wearable Device For Remote Monitoring Ofphysiological Signalsncct
 
Digital Water Marking For Video Piracy Detection
Digital Water Marking For Video Piracy DetectionDigital Water Marking For Video Piracy Detection
Digital Water Marking For Video Piracy Detectionncct
 
Self Repairing Tree Topology Enabling Content Based Routing In Local Area Ne...
Self Repairing Tree Topology Enabling  Content Based Routing In Local Area Ne...Self Repairing Tree Topology Enabling  Content Based Routing In Local Area Ne...
Self Repairing Tree Topology Enabling Content Based Routing In Local Area Ne...ncct
 
Cockpit White Box
Cockpit White BoxCockpit White Box
Cockpit White Boxncct
 
Rail Track Inspector
Rail Track InspectorRail Track Inspector
Rail Track Inspectorncct
 
Botminer Clustering Analysis Of Network Traffic For Protocol And Structure...
Botminer   Clustering Analysis Of Network Traffic For Protocol  And Structure...Botminer   Clustering Analysis Of Network Traffic For Protocol  And Structure...
Botminer Clustering Analysis Of Network Traffic For Protocol And Structure...ncct
 
Bot Robo Tanker Sound Detector
Bot Robo  Tanker  Sound DetectorBot Robo  Tanker  Sound Detector
Bot Robo Tanker Sound Detectorncct
 
Distance Protection
Distance ProtectionDistance Protection
Distance Protectionncct
 
Bluetooth Jammer
Bluetooth  JammerBluetooth  Jammer
Bluetooth Jammerncct
 
Crypkit 1
Crypkit 1Crypkit 1
Crypkit 1ncct
 
I E E E 2009 Java Projects
I E E E 2009  Java  ProjectsI E E E 2009  Java  Projects
I E E E 2009 Java Projectsncct
 
B E Projects M C A Projects B
B E  Projects  M C A  Projects  BB E  Projects  M C A  Projects  B
B E Projects M C A Projects Bncct
 
J2 E E Projects, I E E E Projects 2009
J2 E E  Projects,  I E E E  Projects 2009J2 E E  Projects,  I E E E  Projects 2009
J2 E E Projects, I E E E Projects 2009ncct
 
J2 M E Projects, I E E E Projects 2009
J2 M E  Projects,  I E E E  Projects 2009J2 M E  Projects,  I E E E  Projects 2009
J2 M E Projects, I E E E Projects 2009ncct
 
Engineering College Projects, M C A Projects, B E Projects, B Tech Pr...
Engineering  College  Projects,  M C A  Projects,  B E  Projects,  B Tech  Pr...Engineering  College  Projects,  M C A  Projects,  B E  Projects,  B Tech  Pr...
Engineering College Projects, M C A Projects, B E Projects, B Tech Pr...ncct
 
B E M E Projects M C A Projects B
B E  M E  Projects  M C A  Projects  BB E  M E  Projects  M C A  Projects  B
B E M E Projects M C A Projects Bncct
 
I E E E 2009 Java Projects, I E E E 2009 A S P
I E E E 2009  Java  Projects,  I E E E 2009  A S PI E E E 2009  Java  Projects,  I E E E 2009  A S P
I E E E 2009 Java Projects, I E E E 2009 A S Pncct
 
Advantages Of Software Projects N C C T
Advantages Of  Software  Projects  N C C TAdvantages Of  Software  Projects  N C C T
Advantages Of Software Projects N C C Tncct
 
Engineering Projects
Engineering  ProjectsEngineering  Projects
Engineering Projectsncct
 
Software Projects Java Projects Mobile Computing
Software  Projects  Java  Projects  Mobile  ComputingSoftware  Projects  Java  Projects  Mobile  Computing
Software Projects Java Projects Mobile Computingncct
 

More from ncct (20)

Biomedical Wearable Device For Remote Monitoring Ofphysiological Signals
Biomedical Wearable Device For Remote Monitoring Ofphysiological SignalsBiomedical Wearable Device For Remote Monitoring Ofphysiological Signals
Biomedical Wearable Device For Remote Monitoring Ofphysiological Signals
 
Digital Water Marking For Video Piracy Detection
Digital Water Marking For Video Piracy DetectionDigital Water Marking For Video Piracy Detection
Digital Water Marking For Video Piracy Detection
 
Self Repairing Tree Topology Enabling Content Based Routing In Local Area Ne...
Self Repairing Tree Topology Enabling  Content Based Routing In Local Area Ne...Self Repairing Tree Topology Enabling  Content Based Routing In Local Area Ne...
Self Repairing Tree Topology Enabling Content Based Routing In Local Area Ne...
 
Cockpit White Box
Cockpit White BoxCockpit White Box
Cockpit White Box
 
Rail Track Inspector
Rail Track InspectorRail Track Inspector
Rail Track Inspector
 
Botminer Clustering Analysis Of Network Traffic For Protocol And Structure...
Botminer   Clustering Analysis Of Network Traffic For Protocol  And Structure...Botminer   Clustering Analysis Of Network Traffic For Protocol  And Structure...
Botminer Clustering Analysis Of Network Traffic For Protocol And Structure...
 
Bot Robo Tanker Sound Detector
Bot Robo  Tanker  Sound DetectorBot Robo  Tanker  Sound Detector
Bot Robo Tanker Sound Detector
 
Distance Protection
Distance ProtectionDistance Protection
Distance Protection
 
Bluetooth Jammer
Bluetooth  JammerBluetooth  Jammer
Bluetooth Jammer
 
Crypkit 1
Crypkit 1Crypkit 1
Crypkit 1
 
I E E E 2009 Java Projects
I E E E 2009  Java  ProjectsI E E E 2009  Java  Projects
I E E E 2009 Java Projects
 
B E Projects M C A Projects B
B E  Projects  M C A  Projects  BB E  Projects  M C A  Projects  B
B E Projects M C A Projects B
 
J2 E E Projects, I E E E Projects 2009
J2 E E  Projects,  I E E E  Projects 2009J2 E E  Projects,  I E E E  Projects 2009
J2 E E Projects, I E E E Projects 2009
 
J2 M E Projects, I E E E Projects 2009
J2 M E  Projects,  I E E E  Projects 2009J2 M E  Projects,  I E E E  Projects 2009
J2 M E Projects, I E E E Projects 2009
 
Engineering College Projects, M C A Projects, B E Projects, B Tech Pr...
Engineering  College  Projects,  M C A  Projects,  B E  Projects,  B Tech  Pr...Engineering  College  Projects,  M C A  Projects,  B E  Projects,  B Tech  Pr...
Engineering College Projects, M C A Projects, B E Projects, B Tech Pr...
 
B E M E Projects M C A Projects B
B E  M E  Projects  M C A  Projects  BB E  M E  Projects  M C A  Projects  B
B E M E Projects M C A Projects B
 
I E E E 2009 Java Projects, I E E E 2009 A S P
I E E E 2009  Java  Projects,  I E E E 2009  A S PI E E E 2009  Java  Projects,  I E E E 2009  A S P
I E E E 2009 Java Projects, I E E E 2009 A S P
 
Advantages Of Software Projects N C C T
Advantages Of  Software  Projects  N C C TAdvantages Of  Software  Projects  N C C T
Advantages Of Software Projects N C C T
 
Engineering Projects
Engineering  ProjectsEngineering  Projects
Engineering Projects
 
Software Projects Java Projects Mobile Computing
Software  Projects  Java  Projects  Mobile  ComputingSoftware  Projects  Java  Projects  Mobile  Computing
Software Projects Java Projects Mobile Computing
 

Recently uploaded

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 

Recently uploaded (20)

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 

Online Handwritten Script Recognition with 87.1% Accuracy

  • 1. ONLINE HANDWRITTEN SCRIPT RECOGNITION IEEE APPROACH Automatic identification of handwritten script facilitates many important applications such as automatic transcription of multilingual documents and search for documents on the Web containing a particular script. The increase in usage of handheld devices which accept handwritten input has created a growing demand for algorithms that can efficiently analyze and retrieve handwritten data. This project proposes a method to classify words and lines in an online handwritten document into one of the six major scripts: Arabic, Cyrillic, Devnagari, Han, Hebrew, or Roman. The classification is based on 11 different spatial and temporal features extracted from the strokes of the words. The proposed system attains an overall classification accuracy of 87.1 percent at the word level with 5-fold cross validation on a data set containing 13,379 words. The classification accuracy improves to 95 percent as the number of words in the test sample is increased to five, and to 95.5 percent for complete text lines consisting of an average of seven words. OBJECTIVE • Automatic identification of handwritten script facilitates many important applications such as automatic transcription of multilingual documents and search for documents on the Web containing a particular script. • The increase in usage of handheld devices which accept handwritten input has created a growing demand for algorithms that can efficiently analyze and retrieve handwritten data. • This project proposes a method to classify words and lines in an online handwritten document into one of the six major scripts: Arabic, Cyrillic, Devnagari, Han, Hebrew, or Roman. DESCRIPTION OF PROBLEM The field of work on automatic script recognition deals with offline documents, i.e., documents which are either handwritten or printed on a paper and then scanned to obtain a two-dimensional digital representation. The existing method deals with script recognition in off-line. Today we have lot of electronic devices and lot of input devices are available such as digital pen, digital tablet, stylus through which user can input any type of script in side computer. Such devices, which generate handwritten documents with online or dynamic (temporal) information, require efficient algorithms for processing and retrieving handwritten data. Online documents may be written in different languages and scripts. A single document page in itself may contain text written in multiple scripts. So the project aims to develop efficient algorithm to recognize the script. Existing System The existing method deals with languages are identified: • Using projection profiles of words and character shapes. • Using horizontal projection profiles and looking for the presence or absence of specific shapes in different scripts. • Existing method deals with only few characteristics. • Most of the method does this in off-line. PROPOSED SYSTEM The proposed method uses the features of connected components to classify six different scripts (Arabic, Chinese, Cyrillic, Devnagari, Japanese, and Roman) and reported a classification accuracy of 88 percent on document pages. There are a few important aspects of online documents that enable us to process them in a fundamentally different way than offline documents.
  • 2. The most important characteristic of online documents is that they capture the temporal sequence of strokes while writing the document. This allows us to analyze the individual strokes and use the additional temporal information for both script identification as well as text recognition. In the case of online documents, segmentation of foreground from the background is a relatively simple task as the captured data, i.e., the (x; y) coordinates of the locus of the stylus, defines the characters and any other point on the page belongs to the background. We use stroke properties as well as the spatial and temporal information of a collection of strokes to identify the script used in the document. Unfortunately, the temporal information also introduces additional variability to the handwritten characters, which creates large intraclass variations of strokes in each of the script classes. Data collection and Preprocessing The control for drawing that is writing the script is provided. User can choose the language and write the script document with several pages and store inside script folder. For one language various script can be written and stored. These documents are samples used for recognition process. When the number of sample increases the error rate for recognition can be reduced. So better keep as such script as possible before you are going for recognition. During preprocessing, the individual strokes are resampled to make the sampled points equidistant. This helps to reduce the variations in scripts due to different writing speeds and to avoid anomalous cases such as having a large number of samples at the same position when the user holds the pen down at a point. The strokes are then smoothed using a Gaussian (lowpass) filter. The x and y coordinates of the sequence of points are independently filtered by convolution with one-dimensional Gaussian kernels. This reduces noise due to pen vibrations and errors in the sensing mechanism. The individual strokes are again resampled to make the points equidistant. The resampling distances in both cases were set to 10 pixels and the standard deviation of the Gaussian was set to 6, both determined experimentally for the data. During the lowpass filtering and resampling operations, the critical points in a stroke are retained. A critical point is defined as a point in the stroke where the x or y direction of the stroke reverses (changes sign), in addition to the extreme points (pen-up or pen-down) of the stroke. Fig. 3 shows an example of preprocessing the online character a (consisting of a single stroke). The character “r” written in two different styles. The offline representations ((a) and (b)) look similar, but the temporal variations make them look very different ((c) and (d)). The writing direction is indicated using arrows in (a) and (b). The vertical axes in (c) and (d) represent time. Each user was asked to write one page of text in a particular script, with each page containing approximately 20 lines of text. No restriction was imposed on the content or style of writing (cursive or handprint). The writers consisted of college graduate students, professors, and employees in private companies and businessmen. The details of the database used for this work. Multiple pages from some of the writers were collected at different times.
  • 3. Line and Word Detection The data available to a script recognizer is usually a complete handwritten page or a subset of it. To recognize the script of individual lines or words in the page, we first need to segment the page into lines and words. The problem of text line identification in online documents has been attempted. To identify the individual lines, first the interline distance is estimated. The interline distance, d, is defined as the distance between successive peaks in the autocorrelation of the y-axis projection of the text.The y-axis projection of the document and the autocorrelation of the projection. The interline distance estimate is indicated as d. Lines are identified by finding valleys in the projection, keeping the interline distance as a guiding factor. To avoid local minima, we choose only those points, which have the smallest magnitude within a window of width d, as valleys. The text-line boundaries identified for the document along with the y-axis projection. Once the line boundaries are obtained, the text is divided into lines by collecting all the strokes which fall in between two successive line boundaries. The temporal information from stroke order is used to disambiguate strokes which fall across line boundaries and to correctly group small strokes, which may fall into an adjacent line. A word is defined as a set of strokes that overlap horizontally. The segmentation of a line into words is done using an x-axis projection of the text in the line. The valleys in the projection are noted as word boundaries and the strokes which fall between two boundaries are collected and labeled as a word. The minimum width of a valley in the projection for word segmentation was experimentally determined as n numbers pixels for the resolution of the device used. FEATURE EXTRACTION Each sample or pattern that we attempt to classify is either a word or a set of contiguous words in a line. It is helpful to study the general properties of each of the six scripts for feature extraction. 1. Arabic: Arabic is written from right to left within a line and the lines are written from top to bottom. A typical Arabic character contains a relatively long main stroke which is drawn from right to left, along with one to three dots. The character set contains three long vowels. Short markings (diacritics) may be added to the main character to indicate short vowels. Due to these diacritical marks and the dots in the script, the length of the strokes vary considerably. 2. Cyrillic: Cyrillic script looks very similar to the cursive Roman script. The most distinctive features of Cyrillic script, compared to Roman script are: 1) individual characters, connected together in a word, form one long stroke, and 2) the absence of delayed strokes. Delayed strokes cause movement of the pen in the direction opposite to the regular writing direction. 3. Devnagari: The most important characteristic of Devnagari script is the horizontal line present at the top of each word, called “Shirorekha”. These lines are usually drawn after the word is written and hence are similar to delayed strokes in Roman script. The words are written from left to right in a line. 4. Han: Characters of Han script are composed of multiple short strokes. The strokes are usually drawn from top to bottom and left to right within a character. The direction of writing of words in a line is either left to right or top to bottom. The database used in this study contains Han script text of the former type (horizontal text lines). 5. Hebrew: Words in a line of Hebrew script are written from right to left and, hence, the script is temporally similar to Arabic. The most distinguishing factor of Hebrew from Arabic is that the strokes are more uniform in length in the former. 6. Roman: Roman script has the same writing direction as Cyrillic, Devnagari, and Han scripts. We have already noted the distinguishing features of these scripts compared to the Roman script. In addition, the length of the strokes tends to fall between that of Devnagari and Cyrillic scripts. The features are extracted either from the individual strokes or from a collection of strokes. Here, we describe the features and their method of computation.
  • 4. Horizontal Interstroke Direction (HID) This is the sum of the horizontal directions between the starting points ofconsecutive strokes in the pattern. The feature essentially captures the writing direction within a line. Shirorekha Strength This feature measures the strength of the horizontal line component in the pattern. Average Stroke length Average length of individual strokes in the pattern. Shirorekha Confidence We compute a confidence measure for a stroke being a Shirorekha Each stroke in the pattern is inspected for three properties width of a word, always occur at the top of the word, and are horizontal. Recognition The above mentioned statistics are calculated and the classifier is used to find which script is used in the document. In this project the classifier used is k-Nearest Neighbor classifier. It finds the distance between two feature vectors was computed using the Euclidean distance metric. The transformation was applied to every feature to make their mean zero, and variance unity, in the training set. This transformation is referred to as normalization. With help of this training set database, the experiments are performed and this classifier is invoked to recognize the script. System Analysis System analysis can be defined, as a method that is determined to use the resources, machine in the best manner and perform tasks to meet the information needs of an organization. System Description It is also a management technique that helps us in designing a new systems or improving an existing system. The four basic elements in the system analysis are • Output • Input • Files • Process The above-mentioned are mentioned are the four basis of the System Analysis. System Design Design is concerned with identifying software components specifying relationships among components. Specifying software structure and providing blue print for the document phase. Modularity is one of the desirable properties of large systems. It implies that the system is divided into several parts. In such a manner, the interaction between parts is minimal clearly specified. Design will explain software components in detail. This will help the implementation of the system. Moreover, this will guide the further changes in the system to satisfy the future requirements. Form Design Form is a tool with a message; it is the physical carrier of data or information. The user interface form gives option for the user to select the language and user can write a new document, can open an existing document to view or can able to delete. If new document written it can be saved into the database.
  • 5. HARDWARE & SOFTWARE SPECIFICATION Processor : Intel Processor IV RAM : 128 MB Hard disk : 20 GB CD drive : 40 x Samsung Floppy drive : 1.44 MB Monitor : 15’ color Monitor Keyboard : 108 mercury keyboard Mouse : Logitech mouse Software Specification Operating System – Windows XP/2000 Language used – Java ONLINE HANDWRITTEN SCRIPT RECOGNITION System Flow Chart