OPTICAL CHARACTER RECOGNITION
STAGE 1
1. INTRODUCTION:
There is a growing demand for software systems that can recognize characters when
information is scanned from paper documents, since a large number of newspapers and books
on different subjects exist only in printed format. There is also a huge demand for
“storing the information available in these paper documents on a computer storage disk and
then later reusing this information through a searching process”. One simple way to store
the information in these paper documents in a computer system is to first scan the
documents and then store them as IMAGES. Reusing this information is difficult, however,
because the individual contents cannot easily be read or searched line-by-line and
word-by-word. The reason for this difficulty is that the font characteristics of the
characters in the paper document differ from the fonts of the characters in the computer
system. As a result, the computer is unable to recognize the characters while reading them.
The concept of storing the contents of paper documents in computer storage and then
reading and searching that content is called DOCUMENT PROCESSING. Sometimes document
processing must also handle information in languages other than English. For document
processing we need a software system called a CHARACTER RECOGNITION SYSTEM. This process
is also called DOCUMENT IMAGE ANALYSIS (DIA).
Thus our need is to develop a character recognition software system that performs Document
Image Analysis, transforming documents in paper format into electronic format. Various
techniques exist for this process. Among them we have chosen OCR as the fundamental
technique for recognizing characters. The conversion of paper documents into electronic
format is an on-going task in many organizations, particularly in Research and Development,
in large business enterprises, in government institutions, and so on. From our problem
statement we can also introduce the necessity of OCR in mobile electronic devices such as
cell phones and digital cameras, which acquire images and recognize them as a part of face
recognition and validation.
To use OCR effectively for character recognition in order to perform Document Image
Analysis (DIA), we use the information in Grid format. This system is thus effective and
useful in the design and construction of a virtual Digital Library.
1.1 PURPOSE:
The main purpose of the OCR system based on a grid infrastructure is to perform Document
Image Analysis and document processing of electronic document formats more effectively and
efficiently. This improves the accuracy of recognizing characters during document
processing compared with various existing character recognition methods. Here the OCR
technique derives the meaning of the characters and their font properties from their
bit-mapped images.
 The primary objective is to speed up the process of character recognition in document
processing. As a result the system can process a huge number of documents in less time.
 Since our character recognition is based on a grid infrastructure, it aims to recognize
multiple heterogeneous characters that belong to different universal languages with
different font properties and alignments.
1.2 PROJECT SCOPE:
The scope of our product OCR on a grid infrastructure is to provide an efficient and
enhanced software tool for the users to perform
A) Abstract:
1. Optical Character Recognition software written in Java
2. …uses KNN classifier
3. …uses Open-CV image processing modules.
4. OCR is the recognition of printed or written text characters by a computer.
5. This involves scanning of the text char-by-char, analysis of the scanned-in
image, and then translation of the char image to char codes.
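As a rough illustration of the KNN classifier named above, the sketch below labels a
character by majority vote among its k nearest training samples. It is a minimal plain-Java
sketch under stated assumptions, not the project's code: the class name KnnSketch, the
feature representation (a flattened array of pixel values) and the squared Euclidean
distance are all assumptions, and no OpenCV calls are used.

```java
import java.util.Arrays;

/** Hypothetical k-nearest-neighbour sketch over flattened character bitmaps. */
final class KnnSketch {
    /** Squared Euclidean distance between two equal-length feature vectors. */
    static double distanceSquared(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Returns the majority label among the k training samples closest to the query. */
    static char classify(double[][] samples, char[] labels, double[] query, int k) {
        // Sort sample indices by their distance to the query, nearest first.
        Integer[] order = new Integer[samples.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Double.compare(
                distanceSquared(samples[a], query), distanceSquared(samples[b], query)));

        // Count votes among the k nearest; ties keep the label that got there first.
        int[] votes = new int[Character.MAX_VALUE + 1];
        char best = labels[order[0]];
        for (int n = 0; n < Math.min(k, order.length); n++) {
            char label = labels[order[n]];
            votes[label]++;
            if (votes[label] > votes[best]) best = label;
        }
        return best;
    }
}
```

With real character images, each sample would be the pixel vector of one training
character and the label its character code.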
B) Existing System:
There is a growing demand among users to convert printed documents into electronic
documents for maintaining the security of their data. Hence the basic OCR system was
invented to convert the data available on paper into computer-processable documents, so
that the documents can be edited and reused. The existing (previous) system is OCR
without grid functionality. That is, the existing system deals only with homogeneous
character recognition, or character recognition of a single language.
I. Existing System Disadvantages:
The drawback of the early OCR systems is that they are only capable of converting and
recognizing documents in English or one other specific language. That is, the older OCR
system is uni-lingual.
Characters must be hand-printed as separate characters in boxes.
C) Proposed System:
 Proposed system makes use of the camera
 A scanner is not mandatory to scan the document
 Built-in camera of the laptop can be used to capture images.
 Camera can also be attached externally to a desktop /laptop computer.
 Works equally well with scanned documents.
 Proposed system works even with Hand written documents.
 Works equally well with printed documents/ brochures/ books.
I. Architecture of the Proposed System:
The architecture of the optical character recognition system on a grid infrastructure
consists of three main components. They are:
 Scanner
 OCR Hardware or Software
 Output Interfaces.
II. Advantages of Proposed System:
 The benefit of the proposed system, which overcomes the drawback of the existing
system, is that it supports multiple functionalities such as editing and searching.
 It also adds benefits by providing heterogeneous character recognition.
C) INTENDED AUDIENCE AND READING SUGGESTIONS
In this section, we identify the audience interested in the product and involved in its
implementation, either directly or indirectly. Our research shows that the OCR system is
mainly useful in R&D at various scientific organizations, in governmental institutes and
in large business organizations. We therefore identify the following audiences as
interested in implementing the OCR system:-
 Scientists, research scholars and research fellows in telecommunication institutions
are interested in using the OCR system for processing the word documents that contain
the base papers for their research.
 The librarian who manages the information contents of older books when building a
virtual digital library requires the use of an OCR system.
 Various sites that sell e-books have a huge requirement for this OCR system in order
to scan all their books into electronic format and thus make money. The Amazon book
world is largely using this concept to build their digital libraries.
Now we present reading suggestions through which the user can better understand the
various phases of the product. These suggestions are likely to be more useful for
beginners than for regular users such as research scholars, librarians and administrators
of various web-sites. With these suggestions, the user need not waste time scrolling
documents up and down, browsing the web, visiting libraries in search of different
books and … The following are the reading suggestions that the user can follow in order to
understand our product completely and to save time:-
 It would help to start with Wikipedia.com, which explains the basic concept behind
every keyword you require. First learn from it what OCR is and how it works based on a
Grid infrastructure.
 You can then continue your reading with the introduction to our product provided in
our documentation. From these two steps you get an in-depth idea of the use of our
product and the several processes involved in it.
 What remains is the implementation of the product. For this you can visit Free
OCR.com, where you can view how a sample OCR works and try it.
STAGE 2:
SOFTWARE REQUIREMENT ANALYSIS
2.1 INFORMATION GATHERING
PROBLEM STATEMENT:
The problem is for software systems to recognize characters when information is scanned
from paper documents, since a large number of newspapers and books on different subjects
exist in printed format. Whenever we scan documents through a scanner, they are stored in
the computer system as images such as JPEG, GIF, etc. The text in these images cannot be
edited by the user, and to reuse the information it is very difficult to read the
individual contents and search them line-by-line and word-by-word. There is a huge demand
for “storing the information available in these paper documents on a computer storage
disk and then later editing or reusing this information through a searching process”.
2.2 MODULES AND THEIR FUNCTIONALITIES
Our software system, Optical Character Recognition on a grid infrastructure, can be
divided into five modules based on its functionality. The modules are as follows:-
 Document Processing Module
 System Training Module.
 Document Recognition Module.
 Document Editing Module and
 Document Searching Module.
2.2.1 DOCUMENT PROCESSING MODULE
This module is accessed by the administrator, whose role in our application is that of a
librarian. This module performs activities such as scanning documents, storing them as
images, and recognizing the characters in the images to transfer them into word format.
During the recognition process, this module uses the OCR methodology supported by the grid
infrastructure data structure. The module supports the following services:-
 Scanning printed documents.
 Storing the documents as snapshots or images.
 Processing those image-based documents.
 Converting these image-based documents into e-documents(also called structured
documents).
 Recognizing the characters in documents.
 Generating grid infrastructure data structure.
2.2.2 SYSTEM TRAINING MODULE
This module can be accessed by both the administrator and the end-user. Before
printed documents are converted into editable and searchable documents, the first and
mandatory step is to train the system. Training here means that the user first identifies
the font used in the scanned document. The user then types all the characters required
for recognition from the scanned document and supplies them as an image file. This image
file is provided as the input to the training process. The user then clicks the train
button provided in the recognition module, which completes the training. The system is
now familiar with the new font. This module supports:-
 Training the system with the pre-defined fonts.
 Training the system with new fonts that are not present in the system and that
cannot be identified by the system.
2.2.3 DOCUMENT RECOGNITION MODULE
This module can be accessed by both the administrator and the end-user. Once the
printed documents are converted into structured documents, any user can recognize the
characters present in the document. The user can recognize the characters of any language
he chooses, which makes OCR more flexible. This flexibility is due to the adoption of the
grid infrastructure. This is the module where the main functionality of OCR is tested.
Under this module, there are two types of recognition: handwritten recognition and
scanned document recognition.
In handwritten recognition, the system is trained on the user's handwriting, in any
language, only once. From then onwards, the system recognizes the characters or words
written by that user.
In scanned document recognition, the system is first trained with the font characters of
the document in the training module. Then, in the recognition module, the system takes the
scanned document image as an input file, first crops the image, and then extracts and
recognizes the characters from the document, making the document editable and searchable.
Hence the document recognition module as a whole supports the following services:-
 Converts the document into specific format
 Recognizes the characters
 Heterogeneous character Recognition
2.2.4 DOCUMENT EDITING MODULE
This module can be accessed by both the administrator and the end-user during
document editing. Once the scanned documents are stored, they reside in computer memory
in the form of an image that is only viewable in an image viewer. Hence, the document is
first converted into an editable form. The desired form of the document may be MS-Word,
Text,… as specified by the user. The objective of this module is to let the user perform:-
 Addition of specific content to the documents
 Deletion of certain content from documents
 Any other modification of documents.
2.2.5 DOCUMENT SEARCHING MODULE
This module can be accessed by both the administrator and the end-user when searching
for a required document on which the character recognition process has been carried out.
The user requests the system to search for a particular document. The system then finds
the documents based on the OCR methodology and returns the result of the search to the
user. The main objective of this module is:-
 To facilitate the search function.
2.3 SOFTWARE AND HARDWARE REQUIREMENTS
2.3.1 SOFTWARE REQUIREMENTS SPECIFICATION
 Operating System : Windows-XP
 Programming Language : Core Java
 User Interface : Swing
2.3.2 HARDWARE REQUIREMENTS SPECIFICATION
 Processor : Pentium IV processor or higher
 RAM : Minimum of 512 MB RAM
 Memory : 500 MB or higher
STAGE 3:
Project Analysis
LITERATURE SURVEY /REVIEW OF LITERATURE:
3.1 FEASIBILITY STUDY:
A feasibility study is a high-level capsule version of the entire system analysis and
design process. The study begins by clarifying the problem definition. The purpose of
feasibility analysis is to determine whether the project is worth doing. Once an
acceptable problem definition has been generated, the analyst develops a logical model of
the system, and alternatives are then searched for and analyzed carefully.
There are 3 parts to a feasibility study.
1. Technical feasibility
2. Operational feasibility
3. Economic feasibility
1. Technical Feasibility:
Evaluating the technical feasibility is the trickiest part of a feasibility study. This is
because, at this point in time, there is not much detailed design of the system, making it
difficult to assess issues like performance, costs, etc. A number of issues have to be
considered while doing a technical analysis. We must understand the different technologies
involved in the proposed system: before commencing the project we have to be very clear
about which technologies are required for the development of the new system. We must also
find out whether the organization currently possesses the required technologies. Are the
required technologies available within the organization?
2. Operational Feasibility:
A proposed project is beneficial only if it can be turned into an information system that
will meet the organization's operating requirements. Simply stated, this test of
feasibility asks whether the system will work when it is developed and installed. Are
there major barriers to implementation? Here are questions that will help test the
operational feasibility of a project.
 Is there sufficient support for the project from management and from users? If the
current system is well liked and used to the extent that person will
 Are the current business methods acceptable to the users? If they are not, users may
welcome a change that will bring about a more operational and useful system.
 Have the users been involved in the planning and development of the project? Early
involvement reduces the chances of resistance to the system in general and increases
the likelihood of a successful project.
Since the proposed system was intended to help reduce the hardships encountered in the
existing manual system, the new system was considered to be operationally feasible.
3. Economic Feasibility:
Economic feasibility attempts to weigh the costs of developing and implementing a
new system against the benefits that would accrue from having the new system in place.
This feasibility study gives the top management the economic justification for the new
system. A simple economic analysis that gives the actual comparison of costs and benefits
is much more meaningful in this case. In addition, this proves to be a useful point of
reference against which to compare actual costs as the project progresses. There could be
various types of intangible benefits on account of automation. These could include
increased customer satisfaction, improvement in product quality, better decision making,
timeliness of information, expediting of activities, improved accuracy of operations,
better documentation and record keeping, faster retrieval of information, and better
employee morale.
3.4 TRAINING
Training is a very important part of working with a neural network. There are two forms
of training that can be employed with a neural network, namely:-
1. Unsupervised Training
2. Supervised Training
Supervised training provides the neural network with training sets and the anticipated
output. Unsupervised training supplies the neural network with training sets, but no
anticipated output is provided.
3.4.1 UNSUPERVISED TRAINING:
Unsupervised training is a very common training technique for Kohonen neural networks.
We will discuss how to construct a Kohonen neural network and the general process for
training without supervision.
What is meant by training without supervision is that the neural network is provided with
training sets, which are collections of defined input values. But the unsupervised neural
network is not provided with anticipated outputs.
Unsupervised training is usually used in a classification neural network. A classification
neural network takes input patterns, which are presented to the input neurons. These input
patterns are then processed, and one single neuron on the output layer fires. This firing
neuron can be thought of as the classification of which group the neural input pattern
belonged to. Handwriting recognition is a good application of a classification neural
network.
The input patterns presented to the Kohonen neural network are the dot images of the
characters that were handwritten. We may then have 26 output neurons, which correspond
to the 26 letters of the English alphabet. The Kohonen neural network should classify each
input pattern as one of the 26 letters.
During the training process the Kohonen neural network in handwritten recognition is
presented with 26 input patterns. The network is configured to also have 26 output
neurons. As the Kohonen neural network is trained, the weights should be adjusted so that
the input patterns are classified into the 26 output neurons. This technique results in a
relatively effective method for character recognition.
Another common application for unsupervised training is data mining. In this case you
have a large amount of data, but you do not often know exactly what you are looking for.
You want the neural network to classify this data into several groups. You do not want to
dictate, ahead of time, to the neural network which input pattern should be classified to
which group. As the neural network trains the input patterns will fall into similar groups.
This will allow you to see which input patterns were in common groups.
3.4.2 SUPERVISED TRAINING:
The supervised training method is similar to the unsupervised training method in that
training sets are provided. Just as with unsupervised training these training sets specify
input signals to the neural network.
The primary difference between supervised and unsupervised training is that in
supervised training the expected outputs are provided. This allows the supervised training
algorithm to adjust the weight matrix based on the difference between the anticipated
output of the neural network, and the actual output.
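The idea of adjusting weights from the difference between anticipated and actual output
can be sketched for a single linear output neuron. This is a minimal sketch of the simple
delta rule, assumed here for illustration only; it is not back-propagation and not the
project's training code, and the names DeltaRuleSketch, output, train and learningRate are
hypothetical.

```java
/** Hypothetical single-neuron example of a supervised weight update (delta rule). */
final class DeltaRuleSketch {
    /** Actual output: the weighted sum of the inputs. */
    static double output(double[] weights, double[] input) {
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * input[i];
        }
        return sum;
    }

    /** One training step: shift each weight in proportion to (anticipated - actual). */
    static void train(double[] weights, double[] input, double anticipated, double learningRate) {
        double error = anticipated - output(weights, input);
        for (int i = 0; i < weights.length; i++) {
            weights[i] += learningRate * error * input[i];
        }
    }
}
```

Repeating the step over the training set shrinks the error as long as the learning rate
is small enough; an unsupervised method has no anticipated value to compute such an
error from.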
There are several popular training algorithms that make use of supervised training. One
of the most common is the back-propagation algorithm. It is also possible to use an
algorithm such as simulated annealing or a genetic algorithm to implement supervised
training.
3.5 INTRODUCING KOHONEN NEURAL NETWORK
The Kohonen neural network differs considerably from the feed-forward back propagation
neural network. The Kohonen neural network differs both in how it is trained and how it
recalls a pattern. The Kohonen neural network does not use any sort of activation function.
Further, the Kohonen neural network does not use any sort of a bias weight.
Output from the Kohonen neural network does not consist of the output of several
neurons. When a pattern is presented to a Kohonen network one of the output neurons is
selected as a "winner". This "winning" neuron is the output from the Kohonen network.
Often these "winning" neurons represent groups in the data that is presented to the
Kohonen network. For example, in an OCR program that uses 26 output neurons, the 26
output neurons map the input patterns into the 26 letters of the Latin alphabet.
The most significant difference between the Kohonen neural network and the feed
forward back propagation neural network is that the Kohonen network is trained in an
unsupervised mode. This means that the Kohonen network is presented with data, but the
correct output that corresponds to that data is not specified. Using the Kohonen network
this data can be classified into groups. We will begin our review of the Kohonen network by
examining the training process.
It is also important to understand the limitations of the Kohonen neural network. Neural
networks with only two layers can only be applied to linearly separable problems. This is the
case with the Kohonen neural network. Kohonen neural networks are used because they
are a relatively simple network to construct that can be trained very rapidly.
A "feed forward" neural network is similar to the types of neural networks that we have
already examined. Like many other neural network types, the feed forward neural
network begins with an input layer. This input layer must be connected to a hidden layer.
This hidden layer can then be connected to another hidden layer or directly to the output
layer. There can be any number of hidden layers so long as at least one hidden layer is
provided. In common use most neural networks will have only one hidden layer. It is very
rare for a neural network to have more than two hidden layers. We will now examine, in
detail, the structure of a "feed forward" neural network.
The Structure of a Feed Forward Neural Network
A "feed forward" neural network differs from the neural networks previously examined.
Figure 3.1 shows a typical feed forward neural network with a single hidden layer.
Figure.3.1
The Input Layer
The input layer to the neural network is the conduit through which the external
environment presents a pattern to the neural network. Once a pattern is presented to the
input layer of the neural network, the output layer will produce another pattern. In essence
this is all the neural network does. The input layer should represent the condition for
which we are training the neural network. Every input neuron should represent some
independent variable that has an influence over the output of the neural network.
It is important to remember that the inputs to the neural network are floating point
numbers. These values are expressed as the primitive Java data type "double". This is not to
say that you can only process numeric data with the neural network. If you wish to process a
form of data that is non-numeric you must develop a process that normalizes this data to a
numeric representation.
The Output Layer
The output layer of the neural network is what actually presents a pattern to the external
environment. Whatever pattern is presented by the output layer can be directly traced back
to the input layer. The number of output neurons should be directly related to the type of
work that the neural network is to perform.
To determine the number of neurons to use in your output layer you must consider the
intended use of the neural network. If the neural network is to be used to classify items into
groups, then it is often preferable to have one output neuron for each group that an item
can be assigned to. If the neural network is to perform noise reduction on a signal, then it
is likely that the number of input neurons will match the number of output neurons. In this
sort of neural network you would want the patterns to leave the neural network in the same
format as they entered.
For a specific example of how to choose the numbers of input and output neurons consider
a program that is used for optical character recognition, or OCR. To determine the number
of neurons used for the OCR example we will first consider the input layer. The number of
input neurons that we will use is the number of pixels that might represent any given
character. Characters processed by this program are normalized to a universal size that is
represented by a 5x7 grid. A 5x7 grid contains a total of 35 pixels. The optical character
recognition program therefore has 35 input neurons.
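The mapping from a character image to 35 input values can be sketched as follows. This is
a minimal sketch under assumptions: the class name GridSampler, the boolean bitmap
representation (true = inked pixel) and the rule "a grid cell is on if any pixel in its
region is inked" are illustrative choices, not taken from the program described here.

```java
/** Hypothetical down-sampler: character bitmap of any size to 5x7 = 35 input values. */
final class GridSampler {
    static final int COLS = 5;
    static final int ROWS = 7;

    /** Returns 35 values in row-major order: 1.0 if the cell's region has ink, else 0.0. */
    static double[] toInputs(boolean[][] bitmap) {
        int height = bitmap.length;
        int width = bitmap[0].length;
        double[] inputs = new double[COLS * ROWS];
        for (int r = 0; r < ROWS; r++) {
            // Pixel rows covered by grid row r (at least one row, even for tiny bitmaps).
            int y0 = r * height / ROWS;
            int y1 = Math.max(y0 + 1, (r + 1) * height / ROWS);
            for (int c = 0; c < COLS; c++) {
                // Pixel columns covered by grid column c.
                int x0 = c * width / COLS;
                int x1 = Math.max(x0 + 1, (c + 1) * width / COLS);
                boolean ink = false;
                for (int y = y0; y < y1 && y < height; y++) {
                    for (int x = x0; x < x1 && x < width; x++) {
                        ink |= bitmap[y][x];
                    }
                }
                inputs[r * COLS + c] = ink ? 1.0 : 0.0;
            }
        }
        return inputs;
    }
}
```

Each element of the returned array would feed one input neuron; a further rescaling step
may be needed if the network expects a different input range.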
The number of output neurons used by the OCR program will vary depending on how many
characters the program has been trained for. The default training file that is provided with
the optical character recognition program is trained to recognize 26 characters. As a result
using this file the neural network would have 26 output neurons. Presenting a pattern to the
input neurons will fire the appropriate output neuron that corresponds to the letter that the
input pattern corresponds to.
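Reading the recognized letter off the 26 output neurons can be sketched as below. The
sketch assumes that the firing neuron is the one with the largest output value and that
neuron 0 stands for 'A', neuron 1 for 'B', and so on; both the ordering and the class
name OutputDecoder are assumptions made for illustration.

```java
/** Hypothetical decoder from output-neuron values to a letter. */
final class OutputDecoder {
    /** Index of the neuron with the largest value, treated here as the one that fired. */
    static int firedNeuron(double[] outputs) {
        int best = 0;
        for (int i = 1; i < outputs.length; i++) {
            if (outputs[i] > outputs[best]) best = i;
        }
        return best;
    }

    /** Maps neuron index 0..25 to 'A'..'Z' (assumed ordering of the training file). */
    static char toLetter(int neuronIndex) {
        return (char) ('A' + neuronIndex);
    }
}
```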
3.5.1 HOW A KOHONEN NETWORK RECOGNIZES
We will now show you how the Kohonen neural network recognizes a pattern. We will
begin by examining the structure of the Kohonen neural network. Once you understand the
structure of the Kohonen neural network, and how it recognizes patterns, you will be shown
how to train the Kohonen neural network to properly recognize the patterns you desire.
3.5.2 THE STRUCTURE OF THE KOHONEN NEURAL NETWORK
The Kohonen neural network works differently than the feed forward neural network.
The Kohonen neural network contains only an input and output layer of neurons. There is
no hidden layer in a Kohonen neural network. First we will examine the input and output to
a Kohonen neural network.
The input to a Kohonen neural network is given to the neural network using the input
neurons. These input neurons are each given the floating point numbers that make up the
input pattern to the network. A Kohonen neural network requires that these inputs be
normalized to the range between -1 and 1. Presenting an input pattern to the network will
cause a reaction from the output neurons.
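One simple way to bring raw values into the range between -1 and 1 is a linear rescale,
sketched below. The text does not say which normalization the network uses, so this is
only one possible choice; the class name InputNormalizer and the min/max parameters are
hypothetical.

```java
/** Hypothetical linear rescaling of raw input values into the range [-1, 1]. */
final class InputNormalizer {
    /** Maps min to -1.0 and max to 1.0; values in between are scaled linearly. */
    static double[] toBipolar(double[] values, double min, double max) {
        double[] scaled = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            scaled[i] = 2.0 * (values[i] - min) / (max - min) - 1.0;
        }
        return scaled;
    }
}
```

For 8-bit grey levels, for example, min would be 0 and max 255, so a black pixel maps to
-1 and a white pixel to 1.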
The output of a Kohonen neural network is very different from the output of a feed
forward neural network. If we had a feed forward neural network with five output neurons
we would be given an output that consisted of five values. This is not the case with the
Kohonen neural network. In a Kohonen neural network only one of the output neurons
actually produces a value. Additionally, this single value is either true or false. When
the pattern is presented to the Kohonen neural network, one single output neuron is chosen
as the output neuron. Therefore, the output from the Kohonen neural network is usually the
index of the neuron (e.g. Neuron #5) that fired. The structure of a typical Kohonen neural
network is shown in Figure 3.2.
Figure.3.2
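The selection of a single winning neuron can be sketched as follows. The sketch assumes
that the winner is the output neuron whose weight vector gives the largest weighted sum
with the input pattern, which is one common formulation; the selection rule, the class
name KohonenWinner and the weights layout (one row per output neuron) are assumptions.

```java
/** Hypothetical winner selection for a two-layer (input/output) Kohonen network. */
final class KohonenWinner {
    /** Returns the index of the output neuron with the largest weighted sum. */
    static int winner(double[][] weights, double[] input) {
        int best = 0;
        double bestSum = Double.NEGATIVE_INFINITY;
        for (int n = 0; n < weights.length; n++) {
            double sum = 0.0;
            for (int i = 0; i < input.length; i++) {
                sum += weights[n][i] * input[i];
            }
            if (sum > bestSum) {
                bestSum = sum;
                best = n;
            }
        }
        return best;
    }
}
```

The returned index is the whole output of the network for that pattern; no activation
function or bias weight is applied.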
Now that you understand the structure of the Kohonen neural network we will examine
how the network processes information.
3.5.3 HOW A KOHONEN NETWORK LEARNS
In this section you will learn to train a Kohonen neural network. There are several steps
involved in this training process. Overall, the process for training a Kohonen neural network
involves stepping through several epochs until the error of the Kohonen neural network is
below an acceptable level. In this section we will learn these individual processes.
You will also learn how to calculate the error rate for a Kohonen neural network, how to
adjust the weights for each epoch, and how to determine when no more epochs are necessary
to further train the neural network.
The training process for the Kohonen neural network is competitive. For each training set
one neuron will "win". This winning neuron will have its weight adjusted so that it will react
even more strongly to the input the next time. As different neurons win for different
patterns, their ability to recognize that particular pattern will be increased.
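The adjustment of the winning neuron's weights can be sketched as below. This is a minimal
sketch assuming the simple rule "move the winner's weights a fraction of the way toward
the input"; the actual update rule is not given in this section, and the names
CompetitiveUpdate, reinforce and learningRate are hypothetical.

```java
/** Hypothetical competitive-learning step applied to the winning neuron only. */
final class CompetitiveUpdate {
    /** Moves the winner's weights toward the input by the given learning rate (0..1). */
    static void reinforce(double[] winnerWeights, double[] input, double learningRate) {
        for (int i = 0; i < winnerWeights.length; i++) {
            winnerWeights[i] += learningRate * (input[i] - winnerWeights[i]);
        }
    }
}
```

After the step, the winner's weighted sum for the same input is larger than before, so it
reacts more strongly the next time that pattern is presented; the losing neurons are left
unchanged.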
We will first examine the overall process involving training the Kohonen neural network.
These individual steps are summarized in figure 3.3
Figure.3.3
STAGE 4:
4. SOFTWARE DESIGN
4.1 Data Flow Diagram
The DFD is also called a bubble chart. A data flow diagram (DFD) is a graphical
representation of the “flow” of data through an information system. DFDs can also be used
for the visualization of data processing. The flow of data in our system can be described
in the form of a data flow diagram as follows:-
1. First, if the user is the administrator, he can initiate the following actions:-
 Document processing
 Document search
 Document editing
All the above actions fall under 2 cases, described as follows:-
a) If the printed document is a new document that has not yet been read into the system,
the document processing phase reads the scanned document as an image only and stores the
document image in computer memory. The document processing phase now holds the document
and can read it at any point of time. The document processing phase then proceeds with
recognizing the document using the OCR methodology and the grid infrastructure. Thus it
produces the document with the recognized characters as the final output, which can later
be searched and edited by the end-user or administrator.
b) If the printed document is already scanned in and held in system memory, the
document processing phase proceeds directly with document recognition using the OCR
methodology and the grid infrastructure, and finally produces the document with recognized
characters as output.
2. If the user of the OCR system is the end-user, then he can perform the following
actions:-
 Document searching
 Document editing
1. Document searching:- The documents that have been recognized can be searched by the
user whenever required by querying the system database.
2. Document editing:- The recognized document can be edited by adding specific
content to the document, deleting specific content from the document and modifying the
document.
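Once documents have been recognized as text, the search step reduces to matching a keyword
against that text. The sketch below assumes an in-memory map from document name to
recognized text standing in for the system database; the class name DocumentSearch and
the case-insensitive substring match are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical keyword search over the text produced by character recognition. */
final class DocumentSearch {
    /** Returns the names of all documents whose recognized text contains the keyword. */
    static List<String> find(Map<String, String> recognizedText, String keyword) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> entry : recognizedText.entrySet()) {
            if (entry.getValue().toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }
}
```

This kind of search is only possible after recognition; a scanned image on its own
contains no text to match against.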
4.1 Design Flow Diagram
4.2 UML Diagrams:
UML combines the best techniques from data modeling (ER diagrams), business
modeling (work flows), object modeling, and component modeling. It can be used
with all processes, throughout the software development life cycle, and across
different implementation technologies. UML has 14 types of diagram divided into 2
categories. Seven diagram types represent structural information, and the other seven
represent general types of behavior, including four that represent different aspects
of interactions. The diagrams we provide to describe the design and
implementation of our system are listed below:-
1.Use case diagram
2.Class diagrams
3.Sequence diagram
4.Collaboration diagram
5.Activity diagram
6.Component diagrams
7.Deployment diagrams
4.2.1 USE-CASE DIAGRAMS:
Our software system can be used to support a library environment by creating a digital
library in which paper documents are converted into electronic form for access
by the users. For this purpose the printed documents must be recognized before they are
converted into electronic form. The resulting electronic documents are accessed by
users such as faculty and students for reading and coding. Based on this
information, the actors involved in our OCR system are as follows.
• In a virtual digital library, the Administrator can be the Librarian and the
End-user can be students and/or faculty.
• The following use-case diagrams together form the overall use-case diagram:
1. Use-case diagram for document processing
2. Use-case diagram for neural network training
3. Use-case diagram for document recognition
4. Use-case diagram for document editing
5. Use-case diagram for document searching
For each use-case diagram below we explain the functionality of that particular use
case. We provide a description of the
• Use-case name
• Details about the use case
• Actors using the use case
• The flow of events carried out by the use case
• The pre- and post-conditions of the use case
4.2 Use-Case Diagram for Document Processing
Use Case Name
Document processing
Description
The administrator is the only person who participates in document processing.
He scans the documents, the scanned documents are read as images, and finally the
images are stored in system memory.
Actors
• Primary Actor : Administrator
• Secondary Actor : User
[Use-case diagram: actor Administrator; use cases Scans documents, Reads images, Stores the images]
Flow of Events
1. The Administrator scans the document that he wants to edit.
2. The scanned documents are read as images.
3. Finally, the images are stored in system memory for future reference.
4.3 Use-Case Diagram for Neural Network Training
Use Case Name
Neural Network Training
Description
The Administrator or End-user enters the specific characters required for training.
The user stores them as an image file and trains the system.
Actors
• Primary Actor : Administrator or End-user
• Secondary Actor : User
Flow of Events
1. The user enters the specific characters in order to train the system.
2. After entry, the characters are stored as an image file.
3. Finally, the user trains the system on the stored characters.
Pre-Condition:
The font in the scanned document should be identified.
[Use-case diagram: actor Administrator or End-user; use cases Enters specific characters, Stores them as image file, Trains the system]
4.4 Use-Case Diagram for Document Editing
Use Case Name
Document editing
Description
Both the administrator and the end-user can perform document editing. The user opens
the document in the editor and selects an edit action such as add, modify, or delete. After
the edit action is selected, the editing operation is performed, and finally the edited
document is stored.
Actors
• Primary Actor : Administrator or End-user
• Secondary Actor : User
Flow of Events
1. The Administrator or End-user opens the document that he wants to edit.
2. He selects the edit action. The action may be adding content to the document, modifying
the document, deleting content, and so on.
3. After the edit action is selected, the editing operation is performed.
4. Finally, the edited document is stored in system memory.
[Use-case diagram: actor Administrator or End-user; use cases Open document in editor, Select edit action, Performs editing, Stores edited document]
Pre-Condition
The input taken for editing should be an image of the document that has been converted
into a Word or text file. That is, the input file must be either a .doc file or a .txt file.
Post-Condition
After editing, the document must be saved in one of the specific target formats defined
by the user; that will be the output of the editor.
That is, as per our design, the final document after editing must be saved as a .doc or .txt
file only.
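The pre- and post-conditions above amount to a check on the file extension. A minimal sketch, assuming hypothetical helper names `is_editable` and `save_path` (they are not taken from the original editor):

```python
from pathlib import Path

# The two formats named in the pre- and post-conditions.
ALLOWED_SUFFIXES = {".doc", ".txt"}


def is_editable(path: str) -> bool:
    """Pre-condition: the editor accepts only .doc or .txt input files."""
    return Path(path).suffix.lower() in ALLOWED_SUFFIXES


def save_path(path: str, target_suffix: str) -> str:
    """Post-condition: the edited document is saved as .doc or .txt only."""
    suffix = target_suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported target format: {target_suffix}")
    return str(Path(path).with_suffix(suffix))
```

For example, `is_editable("scan.png")` is false because the scanned image must first be converted, while `save_path("report.doc", ".txt")` yields a path ending in `report.txt`.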
4.5 Use-Case Diagram for Document Recognition
Use Case Name
Document Recognition
Description
The Administrator or End-user trains the system on the given symbols or
alphabets. The characters are recognized after the system is trained.
Actors
• Primary Actor : Administrator or End-user
• Secondary Actor : User
[Use-case diagram: actor Administrator or End-user; use cases Trains system, Recognize characters]
Flow of Events
1. The user trains the system to recognize the characters.
2. After the system is trained, the characters are recognized.
Pre-Condition
Before trying to recognize the characters, the system should first be trained with the
font characteristics and the font size.
4.6 Use-Case Diagram for Document Searching
Use Case Name
Document Searching
Description
The Administrator or End-user opens the document in the editor, enters the word
he is looking for in that document, and then searches for the word.
Actors
• Primary Actor : Administrator or End-user
• Secondary Actor : User
Flow of Events
1. The user opens the document in which he wants to search for a word.
2. After opening the document, he enters the word to search for.
3. Finally, he searches for the word in that document.
[Use-case diagram: actor Administrator or End-user; use cases Opens document in editor, Enters word for search, Searches the word]
Pre and Post Conditions
There are no pre-conditions or post-conditions.
4.7 Overall Use-Case Diagram
[Overall use-case diagram: actors administrator, end-user, end-user1, end-user2; use cases Document processing, scan documents, store documents, Document recognition, Trains the system, Document editing, Document modification, Document deletion; two <<includes>> relationships]
4.2.2. CLASS DIAGRAMS
The class diagram is the main building block in object-oriented modeling. The classes in
a class diagram represent both the main objects and/or interactions in the application and
the objects to be programmed.
• The class diagram of our OCR system consists of 9 classes. They are
1. MainScreen
2. Editor
3. HelpFrame
4. Document
5. HEntry
6. Entry
7. TrainingSet
8. KohonenNetwork
9. PrintedFrame
Among all these classes, MainScreen is the main class that represents all the major
functions carried out by our OCR system. The MainScreen class has an association with five
classes, viz. Editor, HelpFrame, Document, TrainingSet, and PrintedFrame. The TrainingSet
class in turn has an association with the HEntry and KohonenNetwork classes. The
PrintedFrame class has an association with the Entry and KohonenNetwork classes.
4.8 Class Diagram
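The associations listed in the class diagram description can be written down as a small adjacency map. The sketch below only encodes the associations stated in the text; the `reachable` helper is an illustrative addition and says nothing about the fields or methods of the real classes.

```python
# Associations as stated in the class diagram description.
ASSOCIATIONS = {
    "MainScreen": ["Editor", "HelpFrame", "Document", "TrainingSet", "PrintedFrame"],
    "TrainingSet": ["HEntry", "KohonenNetwork"],
    "PrintedFrame": ["Entry", "KohonenNetwork"],
}


def reachable(start):
    """Return every class reachable from start by following associations."""
    seen, stack = set(), [start]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(ASSOCIATIONS.get(name, []))
    return seen
```

Starting from `MainScreen`, the traversal reaches all nine classes, which is consistent with MainScreen being described as the main class of the system.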
4.2.3 SEQUENCE DIAGRAMS
Sequence diagrams are sometimes called event-trace diagrams, event scenarios, and timing
diagrams. A sequence diagram shows, as parallel vertical lines, different processes or objects
that live simultaneously, and, as horizontal arrows, the messages exchanged between them, in
the order in which they occur. This allows simple runtime scenarios to be specified in a
graphical manner.
In a sequence diagram, the class objects used to describe the interaction between
classes vary from one function to another. There are five sequence
diagrams listed below, presenting the sequence of actions performed by each of
the five modules. The key class object involved in all of these module functions is the
MainScreen class, which controls the interaction among the various class objects.
Sequence Diagram for Document Processing
1. Objects:
Administrator - "a"
MainScreen - "m"
Document - "d"
SystemMemory - "s"
2. Links:
a) Administrator object to MainScreen object
b) MainScreen object to Document object
c) Document object to SystemMemory object
d) SystemMemory object to Administrator object
3. Messages:
a) Process document
b) Scan document
c) Scans
d) Stores document
e) Stores
f) Return the processed document
4.9 Sequence Diagram For Processing
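The objects, links, and messages listed for document processing can be combined into an ordered message trace. In the sketch below, the pairing of each message with a sender and receiver is an assumption inferred from the order of the links and messages (with "Scans" and "Stores" treated as self-messages); the original diagram is not available to confirm it.

```python
from typing import List, NamedTuple


class Message(NamedTuple):
    sender: str
    receiver: str
    text: str


# Assumed sender/receiver pairing; only the message texts and their order
# come from the lists above.
PROCESSING_TRACE: List[Message] = [
    Message("a:Administrator", "m:MainScreen", "Process document"),
    Message("m:MainScreen", "d:Document", "Scan document"),
    Message("d:Document", "d:Document", "Scans"),
    Message("d:Document", "s:SystemMemory", "Stores document"),
    Message("s:SystemMemory", "s:SystemMemory", "Stores"),
    Message("s:SystemMemory", "a:Administrator", "Return the processed document"),
]


def render(trace: List[Message]) -> List[str]:
    """Format the trace as numbered lines, in the order the messages occur."""
    return [
        f"{i}. {m.sender} -> {m.receiver}: {m.text}"
        for i, m in enumerate(trace, start=1)
    ]
```

The same structure could hold the traces for training, recognition, editing, and searching by swapping in their object and message lists.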
Sequence Diagram for System Training:
1. Objects:
Administrator - "a"
System - "s"
TrainingSet - "t"
2. Links:
a) Administrator object to System object
b) System object to TrainingSet object
c) TrainingSet object to System object
d) System object to Administrator object
3. Messages:
a) Specifies the font characters
b) Stores it as an image
c) Trains the system with new font
d) System recognizes new font and returns for user
4.10 Sequence Diagram For Training
Sequence Diagram for Document Recognition:
1. Objects:
Administrator - "a"
MainScreen - "m"
SystemMemory - "s"
TrainingSet - "t"
2. Links:
a) Administrator object to MainScreen object
b) MainScreen object to SystemMemory object
c) SystemMemory object to MainScreen object
d) MainScreen object to TrainingSet object
e) TrainingSet object to MainScreen object
f) MainScreen object to Administrator object
3. Messages:
a) Recognize document
b) Store processed document
c) Read file image
d) Recognize using OCR
e) Send processed document
f) Recognize the characters
4.11 Sequence Diagram For Recognition
Sequence Diagram for Document Editing:
1. Objects:
Administrator - "a"
MainScreen - "m"
Document - "d"
SystemMemory - "s"
2. Links:
a) Administrator object to MainScreen object
b) MainScreen object to Document object
c) MainScreen object to Document object
d) MainScreen object to Document object
e) Document object to SystemMemory object
f) SystemMemory object to Administrator object
3. Messages:
a) Edit document
b) Adding content
c) Adds
d) Deleting content
e) Deletes
f) Modifying content
g) Modifies
h) Stores the edited document
i) Administrator accesses the edited document
4.12 Sequence Diagram For Editing
Sequence Diagram for Document Searching
1. Objects:
Administrator - "a"
MainScreen - "m"
Document - "d"
2. Links:
a) Administrator object to MainScreen object
b) MainScreen object to Document object
c) Document object to Administrator object
3. Messages:
a) Specifies the word
b) Searches the word
c) Searches
d) Returns the location of the word
4.13 Sequence Diagram For Searching
4.2.4 COLLABORATION DIAGRAM
A collaboration diagram, also known as a communication diagram, models the interactions
between objects or parts in terms of sequenced messages. Communication diagrams show more
clearly which elements each one interacts with, whereas sequence diagrams show more clearly
the order in which the interactions take place.
A collaboration diagram describes the same functions as the corresponding sequence diagram,
but the presentation or structure of the classes differs. The class objects used to
define the relationships between the classes are the same as those of the sequence diagram.
Hence there are also five collaboration diagrams, one describing each of the five
modules. In these diagrams, too, MainScreen acts as the key class object.
4.14 Collaboration Diagram for Document Processing
[Collaboration diagram: objects a:Administrator, m:MainScreen, d:Document, s:SystemMemory; messages 1. Process documents, 2. Scan documents, 3. Scans, 4. Stores documents, 5. Stores, 6. Returns the processed documents]
4.15 Collaboration Diagram for Neural Network Training
[Collaboration diagram: objects a:Administrator, s:System, t:TrainingSet; messages 1. Specifies the font characters, 2. Stores it as an image, 3. Trains the system with new font, 4. System recognizes new font and returns for user]
4.16 Collaboration Diagram for Document Recognition
[Collaboration diagram: objects a:Administrator, m:MainScreen, s:SystemMemory, t:TrainingSet; messages 1. Recognise documents, 2. Store processed document, 3. Read file image, 4. Recognise using OCR, 5. Send processed document, 6. Recognise the characters]
4.17 Collaboration Diagram for Document Editing
[Collaboration diagram: objects a:Administrator, m:MainScreen, d:Document, s:SystemMemory; messages 1. Edit document, 2. Adding document, 3. Adds, 4. Deleting content, 5. Deletes, 6. Modifying content, 7. Modifies, 8. Stores the edited documents, 9. Administrator accesses the edited documents]
4.18 Collaboration Diagram for Document Searching
[Collaboration diagram: objects a:Administrator, m:MainScreen, d:Document; messages 1. Specifies the word, 2. Searches the word, 3. Searches, 4. Returns the location of the word]
4.2.4 ACTIVITY DIAGRAM:
The purpose of an activity diagram is to provide a view of flows and of what is going on inside a
use case or among several classes. An activity diagram can also be used to represent a class's
method implementation. A token represents an operation. An activity is shown as a rounded
box containing the name of the operation. An outgoing solid arrow attached to the end of the
activity symbol indicates a transition triggered by the completion of the activity.
4.19 Activity Diagram For Document Processing
[Activity diagram: activities Request document processing, Process document, Scan documents, Retry for scanning, Store documents; guards [scanner not ready], [scanner ready]]
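The document processing activity flow above, with its scanner-ready guard and retry step, can be sketched as a small loop. The `max_retries` bound and all function and parameter names are assumptions added for the example; the diagram itself does not state how many retries are allowed.

```python
from typing import Callable, List


def scan_with_retry(
    scanner_ready: Callable[[], bool],
    scan: Callable[[], bytes],
    store: List[bytes],
    max_retries: int = 3,
) -> bool:
    """Scan and store one document, retrying while the scanner is not ready."""
    for _attempt in range(max_retries + 1):
        if scanner_ready():
            # Guard [scanner ready]: scan the document, then store it.
            store.append(scan())
            return True
        # Guard [scanner not ready]: retry for scanning on the next pass.
    return False
```

The function returns `False` when the scanner never becomes ready within the retry budget, leaving the store unchanged.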
4.20 Activity Diagram for Document Retrieval
[Activity diagram: activities Request document, Initiate search, Retrieves document, Sends document to user, Returns message; guards [Document exists], [Document does not exist]]
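The retrieval flow above branches on whether the requested document exists. A minimal sketch, assuming the library is a dictionary from document names to text and that the "message" returned for a missing document is a plain string (both are illustrative choices):

```python
from typing import Dict, Tuple


def retrieve_document(name: str, library: Dict[str, str]) -> Tuple[bool, str]:
    """Initiate a search and return (found, payload)."""
    if name in library:
        # Guard [Document exists]: retrieve the document and send it to the user.
        return True, library[name]
    # Guard [Document does not exist]: return a message instead.
    return False, f"Document '{name}' does not exist"
```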
4.21 Activity Diagram For Document Storage
[Activity diagram: activities Edit documents, Add document content, Delete document content, Modify document, Store documents; guards [user chooses add], [user chooses delete], [user chooses modify]]
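The three guarded branches of the editing activity (add, delete, modify) followed by a store step can be sketched as a dispatch on the chosen action. The string-based document and the parameter names are assumptions for illustration only.

```python
def apply_edit(text: str, action: str, target: str = "", new: str = "") -> str:
    """Apply one edit action and return the document text to be stored."""
    if action == "add":
        # [user chooses add]: append new content.
        return text + new
    if action == "delete":
        # [user chooses delete]: remove the first occurrence of target.
        return text.replace(target, "", 1)
    if action == "modify":
        # [user chooses modify]: replace the first occurrence of target.
        return text.replace(target, new, 1)
    raise ValueError(f"unknown edit action: {action}")
```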
4.2.5 COMPONENT DIAGRAM
The crucial component in our component diagram, which plays a major role in
implementing the OCR system, is the GUI component. All other components, that is,
Document processing and recognition, Document editing, and Document searching, depend
on it. They are as follows:
• The GUI component is used to design the GUI screens for interacting with the end-user and
administrator.
• From the GUI component the other component functionalities are carried out. The
functionalities include Document processing and recognition, Document editing, and
Document searching.
[Component diagram: GUI (GUI screens); Document Processing and Recognition (scanning, storing and recognising characters); Editing (adding, deleting, modifying); Searching (supports user search function)]
4.22 Component Diagram
4.2.6 DEPLOYMENT DIAGRAM:
A deployment diagram serves to model the physical deployment of artifacts on
deployment targets. Deployment diagrams show “the allocation of Artifacts to Nodes
according to the Deployment defined between them”.
In the deployment diagram of our system, the server role is played by the admin, called the
Librarian. There can be N clients who access the digital library content at
a time. The clients here may be students, faculty, or both.
• The actions performed by the Administrator are document processing, searching, and
editing, whereas those performed by the end-user are only document searching and
editing.
[Deployment diagram: <<Server>> (document processing, editing and searching); <<Client1>>, <<Client2>>, ..., <<ClientN>> (document searching, editing)]
4.23 Deployment Diagram
STAGE 6:
6.TESTING
The purpose of testing is to discover errors. Testing is the process of trying to discover
every conceivable fault or weakness in a work product. It provides a way to check the
functionality of components, sub assemblies, assemblies and/or a finished product. It is the
process of exercising software with the intent of ensuring that the software system meets
its requirements and user expectations and does not fail in an unacceptable manner. There
are various types of test. Each test type addresses a specific testing requirement.
6.1 TYPES OF TESTS
Unit Testing
Unit testing involves the design of test cases that validate that the internal program logic
is functioning properly, and that program input produces valid outputs. All decision
branches and internal code flow should be validated. It is the testing of individual software
units of the application. It is done after the completion of an individual unit, before
integration. This is structural testing that relies on knowledge of the unit's construction and is
invasive. Unit tests perform basic tests at component level and test a specific business
process, application, and/or system configuration. Unit tests ensure that each unique path
of a business process performs accurately to the documented specifications and contains
clearly defined inputs and expected results.
Integration Testing
Integration tests are designed to test integrated software components to determine whether
they actually run as one program. Testing is event driven and is more concerned with the
basic outcome of screens or fields. Integration tests demonstrate that, although the
components were individually satisfactory, as shown by successful unit testing, the
combination of components is correct and consistent. Integration testing is specifically
aimed at exposing the problems that arise from the combination of components.
System Testing
System testing ensures that the entire integrated software system meets requirements. It
tests a configuration to ensure known and predictable results. An example of system testing
is the configuration-oriented system integration test. System testing is based on process
descriptions and flows, emphasizing pre-driven process links and integration points.
Functional Testing
Functional tests provide a systematic demonstration that functions tested are available
as specified by the business and technical requirements, system documentation, and user
manuals.
Functional testing is centered on the following items:
Valid Input : identified classes of valid input must be accepted.
Invalid Input : identified classes of invalid input must be rejected.
Functions : identified functions must be exercised.
Output : identified classes of application outputs must be exercised.
Systems/Procedures : interfacing systems or procedures must be invoked.
Organization and preparation of functional tests is focused on requirements, key
functions, or special test cases. In addition, systematic coverage pertaining to identified
business process flows, data fields, predefined processes, and successive processes must be
considered for testing. Before functional testing is complete, additional tests are identified
and the effective value of current tests is determined.
• There are two basic approaches to testing:
a. Black box or functional testing.
b. White box or structural testing.
(a) Black box testing
This method is used when the specified function that a product has been designed to
perform is known. The concept of a black box is used to represent a system whose
inside workings are not available for inspection. In a black box the test item is treated as
“black”, since its logic is unknown; all that is known is what goes in and what comes out,
that is, the input and output.
In black box testing, we try various inputs and examine the resulting outputs. Black
box testing can also be used for scenario-based tests. In this test we verify whether the
system takes valid input and produces the resulting output for the user. It is an imaginary box
that hides the internal workings. In our project the valid input is an image, and a
well-structured image should be received as the resulting output.
Figure 6.1: Black box testing (input, output)
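As a small illustration of black box testing, the sketch below checks only inputs and outputs: valid input classes must be accepted and invalid ones rejected, without looking at the internal logic. The unit under test, `to_binary_grid`, is a hypothetical stand-in for an image-loading step and is not part of the original system.

```python
def to_binary_grid(rows):
    """Hypothetical unit under test: convert '#'/'.' rows into a 0/1 grid."""
    if not rows or any(len(r) != len(rows[0]) for r in rows):
        raise ValueError("image must be a non-empty rectangle")
    if any(ch not in "#." for r in rows for ch in r):
        raise ValueError("image may contain only '#' and '.'")
    return [[1 if ch == "#" else 0 for ch in r] for r in rows]


def rejects(rows):
    """Black box check: report whether the unit rejects this input."""
    try:
        to_binary_grid(rows)
    except ValueError:
        return True
    return False
```

A valid class such as `["#.", ".#"]` is accepted, while a ragged image, an unknown pixel symbol, or an empty image is rejected; the test never inspects how the conversion is implemented.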
(b) White box testing
White box testing is concerned with testing the implementation of the program. The intent of
structural testing is not to exercise all the inputs or outputs but to exercise the different
programming and data structures used in the program. Structural testing thus aims for
test cases that force the desired coverage of different structures. Two types of
path testing are:
1. Statement testing
2. Branch testing
Statement Testing
The main idea of statement testing coverage is to test every statement in the object's
method by executing it at least once. Realistically, however, it is impossible to test a program
on every single input, so you can never be sure that a program will not fail on some input.
Branch Testing
The main idea behind branch testing coverage is to perform enough tests to ensure that
every branch alternative has been executed at least once under some test. As with statement
testing coverage, it is infeasible to fully test any program of considerable size.
Figure 6.2: White box testing (input, internal working, output)
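The difference between statement coverage and branch coverage can be seen on a function with an `if` that has no `else`. The function and the two test suites below are illustrative and are not taken from the project's code.

```python
def clamp_non_negative(x):
    """Return x, or 0 when x is negative."""
    result = x
    if x < 0:
        result = 0
    return result


# One negative input executes every statement at least once.
statement_suite = [-5]

# Branch coverage also needs an input for which the condition is false.
branch_suite = [-5, 3]
```

With `statement_suite` alone, the path that skips the `if` body is never exercised, so a defect that only appears for non-negative inputs would go unnoticed; `branch_suite` covers both alternatives.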
6.2 UNIT TESTING
Unit testing is usually conducted as part of a combined code and unit test phase of the
software life cycle, although it is not uncommon for coding and unit testing to be conducted
as two distinct phases.
Test strategy and approach
Field testing will be performed manually and functional tests will be written in detail.
Test objectives
• All field entries must work properly.
• Pages must be activated from the identified link.
• The entry screen, messages and responses must not be delayed.
Features to be tested
• Verify that the entries are of the correct format.
• No duplicate entries should be allowed.
• All links should take the user to the correct page.
6.3 INTEGRATION TESTING
Software integration testing is the incremental integration testing of two or more
integrated software components on a single platform to produce failures caused by
interface defects.
The task of the integration test is to check that components or software applications (e.g.
components in a software system or, one step up, software applications at the company level)
interact without error.
Test Results: All the test cases mentioned above passed successfully. No defects were
encountered.
6.4 ACCEPTANCE TESTING
User Acceptance Testing is a critical phase of any project and requires significant
participation by the end user. It also ensures that the system meets the functional
requirements.
Test Results: All the test cases mentioned above passed successfully. No defects were
encountered.
STAGE 7:
7. SCREENSHOTS
OUTPUT SCREENS
The following series of output screens shows how the actual process of
implementing OCR takes place:
• The home page of our OCR system provides an interface from which the user can access
any module present in the software. The page is as shown below:
7.1 Main Screen
• There are two types of recognition in the document recognition module:
handwritten letter recognition and scanned document recognition. Handwritten
document recognition proceeds as follows.
First, suppose that we have drawn the letter ‘A’ in the workspace provided.
Hand Written Screen 1
• In the above screen we can write letters with the mouse pointer in the workspace
labelled “Draw Letters Here”. To recognize these letters we have to
train the system first. Otherwise, it gives an error message stating that the system has to
be trained first. This process is explained with the following screens:
Hand Written Screen 3
• Now suppose that you click the “Recognize” button without training, in order to
recognize the character you have written and show the recognized character in the
grid. An error message is then displayed as shown below:
Hand Written Screen 4
• If we click the “Begin Training” button before proceeding with the recognition,
a status message reporting success is shown as below:
Hand Written Screen 5
• Since the training has been completed, the letter ‘A’ can now be recognized by clicking
the “Recognize” button. The letter ‘A’ then appears in the grid as output, as
shown below:
Hand Written Screen 6
• Once training has been provided in a session, the system does not need
any further training for any letter in any language. That is, once the system has been
trained on at least one character, it will from then on
recognize any character written in the workspace without further training.
For example, we first wrote the letter ‘A’, trained the system on it, and recognized the
letter ‘A’. We then wrote the letter ‘S’. Without further training we can
directly recognize the letter ‘S’ in the grid by clicking the “Recognize” button. Thus we
do not need to train the system again.
Hand Written Screen 7
Hand Written Screen 8
• Since we have trained the system once with one character of the English
language, we can now recognize characters of languages other than English,
again without further training. Suppose we have written a Telugu character as shown
below:
Hand Written Screen 9
• We can directly recognize the above Telugu character without training
the system. Just click the “Recognize” button after drawing the letter in the
workspace.
Hand Written Screen 10
• Besides training the system through drawn letters, we can
also train it by entering characters through the keyboard and storing them
as patterns. We then train the system on those patterns.
First, we provide the input through the keyboard as follows:
Hand Written Screen 11
• If we click OK, those letters are saved in the stored patterns workspace. We can then click
the “Begin Training” button so that the system is trained on those stored patterns.
Otherwise, an error message states that the system needs training.
Hand Written Screen 12
• Now suppose that we write the word ‘sr’ and click the “Recognize” button before
training on the stored pattern ‘A’. An error message is then
displayed stating that the system needs to be trained on the stored patterns, as shown
below:
Hand Written Screen 13
• Now click the “Begin Training” button before you attempt to recognize the drawn word
‘sr’. An output screen, shown below, indicates that the training has
been completed:
Hand Written Screen 14
• If we now click the “Recognize” button, the drawn word ‘sr’ is recognized and is
shown as output in grid format by firing the last neuron in the stored patterns.
Hand Written Screen 15
• Since we have trained on the stored patterns once, from now onwards we can
draw characters or words of any language and recognize them directly
by clicking the “Recognize” button, without training the system again. An
example is shown for a Telugu word.
Hand Written Screen 16
• Process Explaining Scanned Document Recognition
First, when we click the “Scanned Document Recognition” button, the main page of
this recognition module is displayed as follows:
Hand Written Screen 17
• The data present in the first text box is the default image file set by the user. The
user can replace the default input image file by clicking Open
and then selecting an image file. The procedure is as shown below:
Hand Written Screen 18
Hand Written Screen 19
• There are two main tabs under scanned document recognition: training
and recognition. We must first train the system under the training module. Only then
can we recognize the characters in the input image using the recognition
module. The training tab under scanned document recognition looks like this:
Hand Written Screen 20
• The above figure shows the default input image for training. We can change the training
input image for different fonts by opening different input image files and then training
on them, so that the system adapts to the new fonts.
Hand Written Screen 21
• Opening an image file replaces the default input image for training with the new
image, as shown below.
Hand Written Screen 22
• The user can now select the bounds within which the system must be trained by
using click and drag actions of the mouse. The selected data is highlighted as follows:
Hand Written Screen 23
After selecting the data, click the “Train” button. The system then trains itself
with the help of the Kohonen network and finally displays a dialog box stating that the
training has been completed successfully.
Hand Written Screen 24
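To give a feel for what training with a Kohonen network involves, the following is a generic winner-take-all sketch: each input is normalized, the neuron with the largest dot product wins, and only the winner's weights are pulled toward the input. It is an assumption-laden simplification written for this explanation and does not reproduce the project's KohonenNetwork class; the learning rate, epoch count, and random initialization are illustrative choices.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else list(vec)


def winner(weights, sample):
    """Index of the neuron whose weight vector best matches the sample."""
    x = normalize(sample)
    scores = [sum(w * v for w, v in zip(neuron, x)) for neuron in weights]
    return scores.index(max(scores))


def train(samples, n_neurons, epochs=50, rate=0.3, seed=0):
    """Winner-take-all training: pull the winning neuron toward each sample."""
    rng = random.Random(seed)
    dim = len(samples[0])
    weights = [normalize([rng.random() for _ in range(dim)]) for _ in range(n_neurons)]
    for _ in range(epochs):
        for sample in samples:
            x = normalize(sample)
            k = winner(weights, sample)
            moved = [w + rate * (v - w) for w, v in zip(weights[k], x)]
            weights[k] = normalize(moved)
    return weights
```

In a character recognizer, each sample would be a flattened, downsampled pixel grid of one letter, and the index returned by `winner` would be mapped back to the letter that the neuron learned during training.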
• Once the training of the system is completed, we move on to the recognition phase,
where we open a new scanned image file that is to be converted into an editable document.
We then select the part of the image from which the data
has to be extracted. It then looks like this:
Hand Written Screen 25
• Next click the “Crop” button, which finds the bounds of the text selected by
the user and draws a red boundary line around it. It is as shown
below:
Hand Written Screen 26
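Finding the bounds of the selected text, as the “Crop” step does, can be illustrated with a bounding-box search over a binary pixel grid. The grid representation (1 for ink, 0 for background) and the function name are assumptions for this sketch, not details of the original implementation.

```python
def bounding_box(grid):
    """Return (top, left, bottom, right) of the non-zero pixels, or None."""
    rows = [r for r, row in enumerate(grid) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(grid[0])) if any(row[c] for row in grid)]
    return rows[0], cols[0], rows[-1], cols[-1]
```

The returned indices are inclusive, so the red boundary line would be drawn just outside the rectangle they describe.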
• Finally click the “Recognize” button, which extracts the characters from
the image and presents them to the user. This data is still not editable. When we
click the “EDIT” button provided at the bottom-center, the document becomes
both editable and searchable. This complete process is shown in the next two
screens:
Hand Written Screen 27
Hand Written Screen 28
• With the data available in the above screen shot, we can make any sort of change
to the document using cut, copy, paste, etc., and finally save the document in
one of two formats (Word, text) as per our design.
• The search function can be carried out here by clicking the “find” image button at the
bottom-left corner. It then asks the user to enter the search term, as shown below:
Hand Written Screen 29
• In the dialog box of the above screen shot, if you click OK there are two cases
as per our design:
Case-1: If the search term entered by the user is present in the document, a
dialog box asks the user whether he wants to continue the search or not.
If the user clicks Yes, the cursor moves to the search term.
If the user clicks No, the search exits.
Case-2: If the user enters a search term that is not present in the document,
a dialog box is displayed directly saying that the searching is finished. This means that the
search term is not present in the document.
Thus the user can tell whether the search term is present in the document
immediately after entering it.
• If we search for a term that is present in the document, the series of
output screens is as follows:
Hand Written Screen 30
Hand Written Screen 31
• If we search for a term that is not present in the document, the series of
output screens is as follows:
Hand Written Screen 32
Hand Written Screen 33
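The two search cases described above can be sketched as a small function. The return values `"moved"`, `"exited"`, and `"finished"` are labels invented for this example to stand for the three outcomes; the real editor shows dialog boxes instead.

```python
def search_dialog(text, term, continue_search):
    """Return (outcome, cursor_position) for one search request."""
    position = text.find(term)
    if position == -1:
        # Case 2: the term is absent, so the search is reported as finished.
        return "finished", None
    if continue_search:
        # Case 1, user clicks Yes: move the cursor to the search term.
        return "moved", position
    # Case 1, user clicks No: exit the search.
    return "exited", None
```

Here `str.find` returns the index of the first occurrence or -1 when the term does not occur, which maps directly onto the two cases.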
• If we are using the editor, we can perform the actions displayed in the screens
below:
Hand Written Screen 34
Hand Written Screen 35
• The editor module directly displays the screen as shown below:
Hand Written Screen 37
STAGE 8
8. CONCLUSION
What does the future hold for OCR? Given enough entrepreneurial designers and sufficient
research and development dollars, OCR can become a powerful tool for future data entry
applications. However, the limited availability of funds in a capital-short environment could
restrict the growth of this technology. But, given the proper impetus and encouragement, a
lot of benefits can be provided by the OCR system. They are:
• The automated entry of data by OCR is one of the most attractive, labor-reducing
technologies.
• The recognition of new font characters by the system is easy and quick.
• We can edit the information in the documents more conveniently and reuse the
edited information as and when required.
• Extending the software beyond editing and searching is a topic for future work.
The Grid infrastructure used in the implementation of the Optical Character Recognition
system can be efficiently used to speed up the translation of image-based documents into
structured documents that are easy to discover, search and process.
STAGE 9
9. FUTURE ENHANCEMENTS
The Optical Character Recognition software can be enhanced in the future in several
ways, such as:
 Training and recognition speeds can be increased further, and the software can be
made more user-friendly.
 Many applications exist where it would be desirable to read handwritten entries.
Reading handwriting is a very difficult task considering the diversities that exist in
ordinary penmanship. However, progress is being made.
STAGE 10
10. REFERENCES
In this section we list the references from which we drew our problem and several
others that supported us in designing the solution for it. These references include
books, published papers and websites with their URLs:
 For a complete reference on neural networks, refer to Jeff Heaton's
chapter 1 at www.jeffheaton.com
 For a complete reference on OCR, refer to Jeff Heaton's chapter 7
at www.jeffheaton.com
 The IEEE reference paper from which we took our problem statement
is authored by Dana Petcu, Silviu Panica, Viorel Negru and Andrei Eckstein of the
Computer Science Department, West University of Timisoara, Romania.
 The reference paper is also co-authored by Doina Banciu of the National Institute for
Research and Development in Informatics, Romania.
 You can refer to the IEEE paper written by D. Andrews, R. Brown, C. Caldwell,
et al., “A Parallel Architecture for Performing Real Time Multi-Line Optical Character
Recognition”
 You can refer to the IEEE paper written by H. Goto, “OCRGrid: A Platform for
Distributed and Cooperative OCR Systems”
 You can refer to the paper written by M. Forbes, “OCHRE-P Optical Character
Recognition in Parallel”, which can be located online.
 Also refer to R. Mason, H. Schmidt, R. Trott, “Down on the OCR Farm: How We
Produced Searchable PDFs for 7 Million Documents in a Student Computer Lab”
STAGE 11
11. APPENDICES
Appendix A: Glossary
TERMS
All the terms and abbreviations used in the project are specified clearly. Definitions
that evolve during further development of the project will be specified as well.
ACRONYMS
IEEE : Institute of Electrical and Electronics Engineers
DFD : Data Flow Diagram
UML : Unified Modeling Language
J2EE : Java 2 Platform, Enterprise Edition
GUI : Graphical User Interface
GOCR : Grid OCR
Appendix B: Analysis Models
This includes all the pertinent analysis models, such as data flow diagrams, class diagrams,
use case diagrams, interaction diagrams and state-chart diagrams.
STAGE 12
BIBLIOGRAPHY
Java AWT
By John Zukowski
Java Swings
By Deitel & Deitel
Java Complete Reference
By Schildt
OCR
By Jeff Heaton

Optical character recognization word

  • 1. OPTICAL CHARACTER RECOGNIZATION STAGE 1 1.INTRODUCTION: In the running world, there is going demand for the software systemto recognize characters in computer system when information is scanned through paper documents as we know that we have number of newspapers and books which are in printed format related to different subjects. These days there is a huge demand in “storing the information available in these paper documents in to a computer storage disk and then later reusing this information be searching process”. One simply way to store information in these paper document in to computer systemis to first scan the documents and then store them as IMAGES. But to reuse this information it is very difficult to read the individual contents and searching the contents from these documents line-by-line and word-by-word. The reason for this difficulty is the font characteristics of the character in paper document are different to font of the character in computer system. As a result , computer is unable to recognize the characters while reading them. This concept of storing the contents of paper documents in computer storage place and then reading and searching the content is called DOCUMENT PROCESSING . Sometimes in this document processing we need to process the information that is related to languages other than the English in the world. For this document processing we need a software systemcalled CHARACTER RECOGNITION SYSTEM. This process is also called DOCUMENT IMAGE ANALYSIS(DIA). Thus our need is to develop character recognition software system to perform Document image Analysis which transforms documents in paper format to electronic format. For this process there are various techniques in the world. Among all those techniques we have chosen OCR as main fundamental techniques to recognize characters. 
The conversion of paper documents in to electronic format is an on-going task in many of the organizations particularly in Research and Development area, in large business enterprises, in government institutions, so on. From our problem statement we can introduce the necessity of OCR in mobile electronic devices such as cell phones, digital cameras to acquire images and recognize them as a part of face recognition and validation.
  • 2. To effectively use OCR for character recognition in-order to perform Document Image Analysis(DIA), we are using the information in Grid format. This system is thus effective and useful in virtual Digital Library’s design and construction. 1.1 PURPOSE: The main purpose of OCR systembased on a grid infrastructure is to perform Document image analysis, document processing of electronic document formats more effectively and efficiently. This improves the accuracy of recognizing the characters during document processing compared various existing available character recognition methods. Here OCR technique derives the meaning of the characters , their font properties from their bit- mapped images.  The primary objective is to speed up the process of character recognition in document processing. As a result the systemcan process huge number of documents with-in less time and hence saves the time.  Since our character recognition is based on a grid infrastructure, it aims to recognize multiple heterogeneous characters that belong to different universal languages with different font properties and alignments. 1.2 PROJECT SCOPE: The scope of our product OCR on a grid infrastructure is to provide an efficient and enhanced software tool for the users to perform A) Abstract: 1. Optical Character Recognition software written in Java 2. …uses KNN classifier 3. …uses Open-CV image processing modules. 4. OCR is the recognition of printed or written text characters by a computer. 5. This involves scanning of the text char-by-char, analysis of the scanned-in image, and then translation of the char image to char codes.
  • 3. B)Existing System: In the running world there is a growing demand for the users to convert the printed documents in to electronic documents for maintaining the security of their data. Hence the basic OCR systemwas invented to convert the data available on papers in to computer process able documents, So that the documents can be editable and reusable. The existing system/the previous system of OCR on a grid infrastructure is just OCR without grid functionality. That is the existing systemdeals with the homogeneous character recognition or character recognition of single languages. I. Existing System Dis-advantages: The drawback in the early OCR systemis that they only capability to convert and recognize only the documents of English are specific language only. That is, the older OCR System uni- lingual. Character must be hand-printed with separate character in boxes. C) Proposedsystem:  Proposed system makes use of the camera  A scanner is not mandatory to scan the document  Built-in camera of the laptop can be used to capture images.  Camera can also be attached externally to a desktop /laptop computer.  Works equally well with scanned documents.  Proposed system works even with Hand written documents.  Works equally well with printed documents/ brochures/ books.
  • 4. I. Architecture of the Proposed System: The Architecture of the optical character recognition System on a grid infrastructure consists of the main components. They are  Scanner  OCR Hardware or Software  Output Interfaces. II. Advantages of Proposed System:  The benefit of proposed systemthat overcomes the drawback of the existing system is that is supports multiple functionalities such as ending and searching.  If also adds benefits by providing heterogeneous character recognition.
  • 5. C) INTENDED AUDIENCE AND READING SUGGESTIONS In this section, we identify the audience who are interested with the product and are involved in the implementation of the product either directly or indirectly. As from our research, the OCR systemis mainly useful in R&D at various scientific organizations, in governmental institutes and in large business organizations, we identify the following as various interested audience in implementing OCR system:-  The scientists, the research scholars and the research fellows in telecommunication institutions are interested in using OCR systemfor processing the word document that contains base paper for their research.  The Librarian to manage the information contents of the older books in building virtual digital library requires use of OCR system.  Various sites that vendor e-books have a huge requirement of this OCR systemin- order to scan all the books in to electronic format and thus make money. The Amazon book world is largely using this concept to build their digital libraries. Now we present the reading suggestions for the users or clients through which the user can better understand the various phases of the product. These suggestions may be effective and useful for the beginners of the product rather than the regular users such as research scholars, librarians and administrators of various web-sites. With these suggestions, the user need not waste his time in scrolling the documents up and down, browsing through the web, visiting libraries in search of different books and …The following are the various reading suggestions that the user can follow in-order to completely understand about our product and to save time:-  It would help you if you start with Wikipedia.com. It lets you know the basic concept of every keyword you require. First learn from it what is OCR? And how does it work based on a Grid infrastructure?
  • 6.  Now you can proceed your further reading with the introduction of our product we provided in our documentation. From these two steps you completely get an in- depth idea of the use of our product and several processes involved in it.  The more you need is the implementation of the product. For this you can visit Free OCR.comwhere you can view how the sample OCR works and you can try it. STAGE 2: SOFTWARE REQUIREMENT ANALYSIS 2.1 information gathering PROBLEM STATEMENT: The problem here is for the software systems to recognize characters in computer system when information is scanned through paper documents as we know that we have number of newspapers and books which are in printed format related to different subjects. Whenever we scan the documents through the scanner, the documents are stored as images such as jpeg, gif etc., in the computer system. These images cannot be read or edited by the user. But to reuse this information it is very difficult to read the individual
  • 7. contents and searching the contents form these documents line-by-line and word-by-word. These days there is a huge demand in “storing the information available in these paper documents in to a computer storage disk and then later editing or reusing this information by searching process”. 2.2 MODULES AND THEIR FUNCTIONALITIES Our software systemOptical Character Recognition on a grid infrastructure can be divided into five modules based on its functionality.The modules classified are as follows:-  Document Processing Module  System Training Module.  Document Recognition Module.  Document Editing Module and  Document Searching Module. 2.2.1 DOCUMENT PROCESSING MODULE This module is accessed by administrator whose role in our application is a librarian.This module perform certain activities such as scanning documents, storing them as images, recognizing characters in images to transfer them into word format. During the recognition process, this module uses the OCR methodology in support of grid infrastructure data structure. The module supports the following services:-  Scanning printed documents.  Storing the documents as snapshots or images.  Processing those image-based documents.
  • 8. - Converting these image-based documents into e-documents (also called structured documents).
- Recognizing the characters in documents.
- Generating the grid infrastructure data structure.
2.2.2 SYSTEM TRAINING MODULE
This module can be accessed by both the administrator and the end-user. Before printed documents can be converted into editable and searchable documents, the first and mandatory step is to train the system. Training here means that the user identifies the font used in the scanned document, then types all the characters required for recognition and saves them as an image file. This image file is provided as input to the training process. The user then clicks the train button provided in the recognition module, which completes the training and makes the system familiar with the new font. This module supports:
- Training the system with the pre-defined fonts.
- Training the system with new fonts that are not present in the system and that it cannot identify.
2.2.3 DOCUMENT RECOGNITION MODULE
This module can be accessed by both the administrator and the end-user. Once the printed documents have been converted into structured documents, any user can recognize the characters present in a document. The user can recognize characters of any language he chooses, which makes the OCR more flexible. This flexibility comes from adopting the grid infrastructure. This is the module in which the main functionality of the OCR is tested.
  • 9. This module supports two types of recognition: handwritten recognition and scanned document recognition.
In handwritten recognition, the system is trained once with the user's handwriting in any language. From then onwards, the system recognizes the characters or words written by that user.
In scanned document recognition, the system is first trained, in the training module, with the font characters used in the document. In the recognition module, the system then takes the scanned document image as an input file, crops the image, extracts and recognizes the characters, and makes the document editable and searchable.
Hence the document recognition module as a whole supports the following services:
- Converting the document into a specific format
- Recognizing the characters
- Heterogeneous character recognition
2.2.4 DOCUMENT EDITING MODULE
This module can be accessed by both the administrator and the end-user to edit documents produced by the character recognition process. Once scanned documents are stored, they reside in computer memory as images that can only be viewed in an image viewer. Hence, the document is first converted into an editable form. The desired form of the document may be MS-Word, text, or another format specified by the user. The objective of this module is to let the user perform:
- Addition of specific content to the documents
- Deletion of certain content from documents
  • 10. - Any other modification of documents.
2.2.5 DOCUMENT SEARCHING MODULE
This module can be accessed by both the administrator and the end-user to search for a required document so that the character recognition process can be applied to it. The user asks the system to search for a particular document. The system then finds the documents based on the OCR methodology and returns the search result to the user. The main objective of this module is:
- To facilitate the search function.
2.3 SOFTWARE AND HARDWARE REQUIREMENTS
2.3.1 SOFTWARE REQUIREMENTS SPECIFICATION
- Operating System: Windows XP
- Programming Language: Core Java
- User Interface: Swing
2.3.2 HARDWARE REQUIREMENTS SPECIFICATION
- Processor: Pentium IV processor or higher
- RAM: Minimum of 512 MB
  • 11. - Memory: 500 MB or higher
STAGE 3: PROJECT ANALYSIS
LITERATURE SURVEY / REVIEW OF LITERATURE:
INTRODUCTION
A feasibility study is a high-level capsule version of the entire system analysis and design process. The study begins by clarifying the problem definition. Its purpose is to determine whether the project is worth doing. Once an acceptable problem definition has been generated, the analyst develops a logical model of the system. Alternatives are then searched for and analyzed carefully. There are three parts to a feasibility study.
  • 12. 3.1 FEASIBILITY STUDY:
The three parts of the feasibility study are:
1. Technical feasibility
2. Operational feasibility
3. Economic feasibility
1. Technical Feasibility:
Evaluating technical feasibility is the trickiest part of a feasibility study. This is because, at this point in time, not much detailed design of the system exists, making it difficult to assess issues such as performance and costs. A number of issues have to be considered in a technical analysis. Before commencing the project, we have to understand the different technologies involved in the proposed system and be very clear about which technologies are required to develop the new system. We also have to find out whether the organization currently possesses the required technologies: are they available within the organization?
2. Operational Feasibility:
A proposed project is beneficial only if it can be turned into an information system that meets the organization's operating requirements. Simply stated, this test of feasibility asks whether the system will work when it is developed and installed. Are there major barriers to implementation? The following questions help test the operational feasibility of a project.
- Is there sufficient support for the project from management and from users? If the current system is well liked and used to the extent that persons will
  • 13. - Are the current business methods acceptable to the users? If they are not, users may welcome a change that will bring about a more operational and useful system.
- Have the users been involved in the planning and development of the project?
- Early involvement reduces the chances of resistance to the system and, in general, increases the likelihood of a successful project.
Since the proposed system was meant to reduce the hardship encountered in the existing manual system, the new system was considered operationally feasible.
3. Economic Feasibility:
Economic feasibility attempts to weigh the costs of developing and implementing a new system against the benefits of having the new system in place. This feasibility study gives top management the economic justification for the new system. A simple economic analysis that gives an actual comparison of costs and benefits is much more meaningful in this case. In addition, it proves to be a useful point of reference against which to compare actual costs as the project progresses. There can be various types of intangible benefits on account of automation. These could include increased customer satisfaction, improvement in product quality, better decision making, timeliness of information, expediting of activities, improved accuracy of operations, better documentation and record keeping, faster retrieval of information, and better employee morale.
3.4 TRAINING
Training is a very important part of working with a neural network. There are two forms of training that can be employed with a neural network:
1. Unsupervised Training
2. Supervised Training
  • 14. Supervised training provides the neural network with training sets and the anticipated output. Unsupervised training supplies the neural network with training sets, but no anticipated output is provided.
3.4.1 UNSUPERVISED TRAINING:
Unsupervised training is a very common training technique for Kohonen neural networks. We will discuss how to construct a Kohonen neural network and the general process for training without supervision.
Training without supervision means that the neural network is provided with training sets, which are collections of defined input values, but is not provided with anticipated outputs. Unsupervised training is usually used in a classification neural network. A classification neural network takes input patterns, which are presented to the input neurons. These input patterns are then processed, and one single neuron on the output layer fires. This firing neuron can be thought of as the classification of the group to which the input pattern belongs.
Handwriting recognition is a good application of a classification neural network. The input patterns presented to the Kohonen neural network are the dot images of the handwritten characters. We may then have 26 output neurons, which correspond to the 26 letters of the English alphabet. The Kohonen neural network should classify each input pattern into one of these 26 letters.
During the training process the Kohonen neural network in handwritten recognition is presented with 26 input patterns. The network is configured to also have 26 output neurons. As the Kohonen neural network is trained, the weights should be adjusted so that the input patterns are classified into the 26 output neurons. This technique results in a relatively effective method for character recognition.
  • 15. Another common application for unsupervised training is data mining. In this case you have a large amount of data, but you often do not know exactly what you are looking for. You want the neural network to classify this data into several groups. You do not want to dictate, ahead of time, which input pattern should be classified into which group. As the neural network trains, the input patterns will fall into similar groups. This allows you to see which input patterns were in common groups.
3.4.2 SUPERVISED TRAINING:
The supervised training method is similar to the unsupervised training method in that training sets are provided. Just as with unsupervised training, these training sets specify input signals to the neural network. The primary difference between supervised and unsupervised training is that in supervised training the expected outputs are provided. This allows the supervised training algorithm to adjust the weight matrix based on the difference between the anticipated output of the neural network and the actual output.
There are several popular training algorithms that make use of supervised training. One of the most common is the back-propagation algorithm. It is also possible to use an algorithm such as simulated annealing or a genetic algorithm to implement supervised training.
3.5 INTRODUCING KOHONEN NEURAL NETWORK
The Kohonen neural network differs considerably from the feed-forward back-propagation neural network, both in how it is trained and in how it recalls a pattern. The Kohonen neural network does not use any sort of activation function. Further, it does not use any sort of bias weight.
Output from the Kohonen neural network does not consist of the output of several neurons. When a pattern is presented to a Kohonen network one of the output neurons is
  • 16. selected as a "winner". This "winning" neuron is the output from the Kohonen network. Often these "winning" neurons represent groups in the data that is presented to the Kohonen network. For example, in an OCR program that uses 26 output neurons, the 26 output neurons map the input patterns into the 26 letters of the Latin alphabet.
The most significant difference between the Kohonen neural network and the feed-forward back-propagation neural network is that the Kohonen network is trained in an unsupervised mode. This means that the Kohonen network is presented with data, but the correct output that corresponds to that data is not specified. Using the Kohonen network, this data can be classified into groups. We will begin our review of the Kohonen network by examining the training process.
It is also important to understand the limitations of the Kohonen neural network. Neural networks with only two layers can only be applied to linearly separable problems. This is the case with the Kohonen neural network. Kohonen neural networks are used because they are relatively simple to construct and can be trained very rapidly.
A "feed forward" neural network is similar to the types of neural networks that we have already examined. Like many other neural network types, the feed forward neural network begins with an input layer. This input layer must be connected to a hidden layer. This hidden layer can then be connected to another hidden layer or directly to the output layer. There can be any number of hidden layers so long as at least one hidden layer is provided. In common use most neural networks have only one hidden layer. It is very rare for a neural network to have more than two hidden layers. We will now examine, in detail, the structure of a "feed forward neural network".
The Structure of a Feed Forward Neural Network
A "feed forward" neural network differs from the neural networks previously examined.
Figure 3.1 shows a typical feed forward neural network with a single hidden layer.
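The layer structure described above can be illustrated with a small Java sketch of one forward pass through a network with a single hidden layer. This is only an illustration under stated assumptions: the class name `FeedForwardSketch`, the method names, the weight layout `w[input][neuron]`, and the choice of a sigmoid activation are all hypothetical and are not taken from the project's code.

```java
// A minimal sketch of a forward pass through a feed forward network
// with one hidden layer. All names here are hypothetical.
class FeedForwardSketch {

    // Sigmoid activation; choosing sigmoid is an assumption of this sketch.
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // Computes one layer: out[j] = sigmoid(bias[j] + sum_i in[i] * w[i][j]).
    static double[] layer(double[] in, double[][] w, double[] bias) {
        double[] out = new double[bias.length];
        for (int j = 0; j < out.length; j++) {
            double sum = bias[j];
            for (int i = 0; i < in.length; i++) {
                sum += in[i] * w[i][j];
            }
            out[j] = sigmoid(sum);
        }
        return out;
    }

    // Input layer -> hidden layer -> output layer.
    static double[] forward(double[] input,
                            double[][] wHidden, double[] bHidden,
                            double[][] wOut, double[] bOut) {
        double[] hidden = layer(input, wHidden, bHidden);
        return layer(hidden, wOut, bOut);
    }
}
```

A caller would pass the input pattern as a `double[]` together with one weight matrix and one bias vector per layer; adding a second hidden layer would simply mean one more call to `layer`.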
  • 17. Figure 3.1
The Input Layer
The input layer to the neural network is the conduit through which the external environment presents a pattern to the neural network. Once a pattern is presented to the input layer of the neural network, the output layer will produce another pattern. In essence this is all the neural network does. The input layer should represent the condition for which we are training the neural network. Every input neuron should represent some independent variable that has an influence over the output of the neural network.
It is important to remember that the inputs to the neural network are floating point numbers. These values are expressed as the primitive Java data type "double". This is not to say that you can only process numeric data with the neural network. If you wish to process a form of data that is non-numeric, you must develop a process that normalizes this data to a numeric representation.
  • 18. The Output Layer
The output layer of the neural network is what actually presents a pattern to the external environment. Whatever pattern is presented by the output layer can be directly traced back to the input layer. The number of output neurons should be directly related to the type of work that the neural network is to perform.
To decide the number of neurons to use in your output layer you must consider the intended use of the neural network. If the neural network is to be used to classify items into groups, then it is often preferable to have one output neuron for each group that an item can be assigned to. If the neural network is to perform noise reduction on a signal, then it is likely that the number of input neurons will match the number of output neurons. In this sort of neural network you would want the patterns to leave the neural network in the same format as they entered.
For a specific example of how to choose the numbers of input and output neurons, consider a program that is used for optical character recognition, or OCR. To determine the number of neurons used for the OCR example we will first consider the input layer. The number of input neurons that we will use is the number of pixels that might represent any given character. Characters processed by this program are normalized to a universal size that is represented by a 5x7 grid. A 5x7 grid contains a total of 35 pixels. The optical character recognition program therefore has 35 input neurons.
The number of output neurons used by the OCR program will vary depending on how many characters the program has been trained for. The default training file that is provided with the optical character recognition program is trained to recognize 26 characters. As a result, using this file the neural network would have 26 output neurons.
Presenting a pattern to the input neurons will fire the appropriate output neuron that corresponds to the letter that the input pattern corresponds to.
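The normalization of a character to a 5x7 grid of 35 inputs can be sketched in Java as follows. This is a hedged illustration, not the project's actual downsampling code: the class `DownsampleSketch`, the method `toInputs`, the rule "a cell is inked if any pixel inside it is set", and the +1.0 / -1.0 encoding are all assumptions. The sketch also assumes the image has already been cropped to the character and is non-empty and rectangular.

```java
// A minimal sketch: reduce a cropped black-and-white character image
// to a 5x7 grid and return it as 35 double inputs. Names are hypothetical.
class DownsampleSketch {
    static final int GRID_W = 5;
    static final int GRID_H = 7;

    // image[y][x] is true where the scanned character has ink.
    static double[] toInputs(boolean[][] image) {
        int h = image.length;
        int w = image[0].length;
        double[] inputs = new double[GRID_W * GRID_H];
        for (int gy = 0; gy < GRID_H; gy++) {
            // Pixel rows covered by this grid row (at least one row).
            int y0 = gy * h / GRID_H;
            int y1 = Math.max(y0 + 1, (gy + 1) * h / GRID_H);
            for (int gx = 0; gx < GRID_W; gx++) {
                // Pixel columns covered by this grid column (at least one).
                int x0 = gx * w / GRID_W;
                int x1 = Math.max(x0 + 1, (gx + 1) * w / GRID_W);
                boolean ink = false;
                for (int y = y0; y < y1 && y < h; y++) {
                    for (int x = x0; x < x1 && x < w; x++) {
                        ink |= image[y][x];
                    }
                }
                // Assumed encoding: +1.0 for ink, -1.0 for blank.
                inputs[gy * GRID_W + gx] = ink ? 1.0 : -1.0;
            }
        }
        return inputs;
    }
}
```

The returned array has one entry per grid cell, so its length (35) matches the number of input neurons described above.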
  • 19. 3.5.1 HOW A KOHONEN NETWORK RECOGNIZES
We will now show you how the Kohonen neural network recognizes a pattern. We will begin by examining the structure of the Kohonen neural network. Once you understand the structure of the Kohonen neural network, and how it recognizes patterns, you will be shown how to train the Kohonen neural network to properly recognize the patterns you desire.
3.5.2 THE STRUCTURE OF THE KOHONEN NEURAL NETWORK
The Kohonen neural network works differently from the feed forward neural network. The Kohonen neural network contains only an input layer and an output layer of neurons. There is no hidden layer in a Kohonen neural network.
First we will examine the input and output of a Kohonen neural network. The input is given to the neural network using the input neurons. These input neurons are each given the floating point numbers that make up the input pattern to the network. A Kohonen neural network requires that these inputs be normalized to the range between -1 and 1. Presenting an input pattern to the network will cause a reaction from the output neurons.
The output of a Kohonen neural network is very different from the output of a feed forward neural network. If we had a feed forward neural network with five output neurons, we would be given an output that consisted of five values. This is not the case with the Kohonen neural network. In a Kohonen neural network only one of the output neurons actually produces a value. Additionally, this single value is either true or false. When the pattern is presented to the Kohonen neural network, one single output neuron is chosen as the output neuron. Therefore, the output from the Kohonen neural network is usually the index of the neuron (e.g. Neuron #5) that fired. The structure of a typical Kohonen neural network is shown in Figure 3.2.
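The recognition step described in this section, where a normalized input is presented and the index of a single winning output neuron is returned, might look roughly like the Java sketch below. The class `KohonenRecallSketch`, the weight layout `weights[neuron][input]`, the unit-length normalization, and the use of the largest dot product to pick the winner are assumptions made for illustration; the source does not state which normalization or winner rule the project uses.

```java
// A minimal sketch of how a Kohonen network could pick its winning
// output neuron. Names and the exact rules are hypothetical.
class KohonenRecallSketch {

    // Scales the input to unit length so every component lies in [-1, 1].
    // Unit-length scaling is an assumption of this sketch.
    static double[] normalize(double[] v) {
        double len = 0.0;
        for (double x : v) {
            len += x * x;
        }
        len = Math.sqrt(len);
        double[] out = v.clone();
        if (len == 0.0) {
            return out; // an all-zero pattern is left unchanged
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= len;
        }
        return out;
    }

    // Returns the index of the output neuron whose weight vector gives
    // the largest dot product with the normalized input.
    static int winner(double[][] weights, double[] input) {
        double[] in = normalize(input);
        int best = 0;
        double bestActivation = Double.NEGATIVE_INFINITY;
        for (int n = 0; n < weights.length; n++) {
            double activation = 0.0;
            for (int i = 0; i < in.length; i++) {
                activation += weights[n][i] * in[i];
            }
            if (activation > bestActivation) {
                bestActivation = activation;
                best = n;
            }
        }
        return best;
    }
}
```

In an OCR setting with 35 inputs and 26 output neurons, the returned index would be mapped back to the letter that neuron was trained to represent.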
  • 20. Figure 3.2
Now that you understand the structure of the Kohonen neural network, we will examine how the network processes information.
3.5.3 HOW A KOHONEN NETWORK LEARNS
In this section you will learn to train a Kohonen neural network. There are several steps involved in this training process. Overall, the process for training a Kohonen neural network involves stepping through several epochs until the error of the Kohonen neural network is below an acceptable level. In this section we will learn these individual steps. We will also learn how to calculate the error rate for a Kohonen neural network, how to adjust the weights for each epoch, and how to determine when no more epochs are necessary to further train the neural network.
  • 21. The training process for the Kohonen neural network is competitive. For each training set one neuron will "win". This winning neuron will have its weights adjusted so that it will react even more strongly to the input the next time. As different neurons win for different patterns, their ability to recognize that particular pattern will be increased.
We will first examine the overall process of training the Kohonen neural network. The individual steps are summarized in Figure 3.3.
Figure 3.3
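The competitive training loop described above can be sketched in Java as shown below. This is an illustrative sketch only. The class `KohonenTrainSketch`, the learning-rate update `w += rate * (x - w)`, the choice of the nearest weight vector (Euclidean distance) as the winner, and the error measure (the largest distance between a pattern and its winner before the update) are all assumptions; the source does not specify how the project computes the error or adjusts the weights.

```java
// A minimal sketch of competitive (winner-takes-all) training.
// All names and formulas are assumptions made for illustration.
class KohonenTrainSketch {

    // Winner = output neuron whose weight vector is closest to the input.
    static int winner(double[][] weights, double[] input) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int n = 0; n < weights.length; n++) {
            double dist = 0.0;
            for (int i = 0; i < input.length; i++) {
                double d = input[i] - weights[n][i];
                dist += d * d;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = n;
            }
        }
        return best;
    }

    // Runs one epoch over all patterns and returns the largest distance
    // seen between a pattern and its winning neuron (the assumed error).
    static double epoch(double[][] weights, double[][] patterns, double rate) {
        double worst = 0.0;
        for (double[] x : patterns) {
            int win = winner(weights, x);
            double dist = 0.0;
            for (int i = 0; i < x.length; i++) {
                double diff = x[i] - weights[win][i];
                dist += diff * diff;
                // Move only the winner's weights toward the input.
                weights[win][i] += rate * diff;
            }
            worst = Math.max(worst, Math.sqrt(dist));
        }
        return worst;
    }

    // Repeats epochs until the error is acceptable or maxEpochs is reached.
    // Returns the number of epochs that were run.
    static int train(double[][] weights, double[][] patterns,
                     double rate, double maxError, int maxEpochs) {
        int epochs = 0;
        while (epochs < maxEpochs) {
            epochs++;
            if (epoch(weights, patterns, rate) <= maxError) {
                break;
            }
        }
        return epochs;
    }
}
```

Because only the winning neuron is updated for each pattern, each neuron gradually specializes in the patterns it wins, which is the behavior the text describes. When the weight and input vectors are both unit length, picking the nearest weight vector and picking the largest dot product select the same neuron.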
  • 22. STAGE 4: SOFTWARE DESIGN
4.1 Data Flow Diagram
A data flow diagram (DFD), also called a bubble chart, is a graphical representation of the "flow" of data through an information system. DFDs can also be used to visualize data processing. The flow of data in our system can be described as follows:
1. If the user is the administrator, he can initiate the following actions:
- Document processing
- Document search
- Document editing
These actions fall under two cases:
a) If the printed document is new and has not yet been read into the system, the document processing phase reads the scanned document as an image and stores the document image in computer memory. The document processing phase then has the document at hand and can read it at any time. It then proceeds to recognize the document using the OCR methodology and the grid infrastructure. The final output is the document with recognized characters, which can later be searched and edited by the end-user or the administrator.
b) If the printed document has already been scanned and is held in system memory, the document processing phase proceeds directly with document recognition using the OCR methodology and the grid infrastructure, and produces the recognized document as output.
  • 23. 2. If the user of the OCR system is the end-user, he can perform the following actions:
- Document searching
- Document editing
1. Document searching: Recognized documents can be searched by the user whenever required by querying the system database.
2. Document editing: A recognized document can be edited by adding specific content to it, deleting specific content from it, or modifying it.
Figure 4.1 Design Flow Diagram
  • 24. 4.2 UML Diagrams:
UML combines the best techniques from data modeling (ER diagrams), business modeling (work flows), object modeling, and component modeling. It can be used with all processes, throughout the software development life cycle, and across different implementation technologies. UML has 14 types of diagram divided into two categories. Seven diagram types represent structural information, and the other seven represent general types of behavior, including four that represent different aspects of interactions.
The diagrams we provide to describe the design and implementation of our system are:
1. Use case diagrams
2. Class diagrams
3. Sequence diagrams
4. Collaboration diagrams
5. Activity diagrams
6. Component diagrams
7. Deployment diagrams
4.2.1 USE-CASE DIAGRAMS:
Our software system can be used in a library environment to create a digital library in which paper documents are converted into electronic form for access by users. For this purpose the printed documents must be recognized before they are converted into electronic form. The resulting electronic documents are accessed by users such as faculty and students for reading and coding. Based on this information, the actors involved in our OCR system are as follows.
- For a virtual digital library, the administrator can be the librarian and the end-users can be students and/or faculty.
- The following use-case diagrams together form the overall use-case diagram:
1. Use-case diagram for document processing
2. Use-case diagram for neural network training
3. Use-case diagram for document recognition
  • 25. 4. Use-case diagram for document editing
5. Use-case diagram for document searching
For each use-case diagram below we explain the functionality of that particular use case. For each one we provide:
- The use-case name
- Details about the use case
- The actors using the use case
- The flow of events carried out by the use case
- The conditions that apply to the use case
4.2 Use Case Diagram For Document Processing
[Figure: the Administrator scans documents, reads images, and stores the images.]
Use Case Name: Document Processing
Description: The administrator is the only person who participates in document processing. He scans the documents. The scanned documents are read as images. Finally the read images are stored in system memory.
Actors:
- Primary Actor: Administrator
- Secondary Actor: User
  • 26. Flow of Events:
1. The administrator scans the document that he wants to edit.
2. The scanned documents are read as images.
3. Finally the images are stored in system memory for future reference.
4.3 Use Case Diagram For Neural Network Training
[Figure: the Administrator or End-user enters specific characters, stores them as an image file, and trains the system.]
Use Case Name: Neural Network Training
Description: The administrator or end-user enters the specific characters required for training. The user stores them as an image file and trains the system.
Actors:
- Primary Actor: Administrator or End-user
- Secondary Actor: User
Flow of Events:
1. The user enters the specific characters in order to train the system.
2. After being entered, the characters are stored as an image file.
3. Finally the user trains the system with these characters.
Pre-Condition:
  • 27. The font in the scanned document should be identified.
4.4 Use Case Diagram For Document Editing
[Figure: the Administrator or End-user opens the document in the editor, selects an edit action, performs the editing, and stores the edited document.]
Use Case Name: Document Editing
Description: Both the administrator and the end-user can perform document editing. The user opens the document in the editor and selects an edit action such as edit, modify, or delete. After the edit action is selected, the editing operation is performed, and finally the edited document is stored.
Actors:
- Primary Actor: Administrator or End-user
- Secondary Actor: User
Flow of Events:
1. The administrator or end-user opens the document that he wants to edit.
  • 28. 2. He selects the edit action. The action consists of editing the document, modifying the document, deleting the document, etc.
3. After the edit action is selected, the editing operation is performed.
4. Finally the edited document is stored in system memory.
Pre-Condition: The input taken for editing should be an image of the document that has been converted into a Word or text file. That is, the input file must be either a .doc file or a .txt file.
Post-Condition: After editing, the document should be saved only in one of the specific target formats defined by the user. That will be the output of the editor. As per our design, the final document after editing must be saved as a .doc or .txt file only.
4.5 Use Case Diagram For Document Recognition
[Figure: the Administrator or End-user trains the system and recognizes characters.]
Use Case Name: Document Recognition
Description: The administrator or end-user trains the system according to the given symbols or alphabets. The characters are then recognized once the system is trained.
Actors:
- Primary Actor: Administrator or End-user
- Secondary Actor: User
  • 29. Flow of Events:
1. The user trains the system to recognize the characters.
2. After the system is trained, the characters are recognized.
Pre-Condition: Before trying to recognize the characters, the system should first be trained with the font characteristics and the font size.
4.6 Use Case Diagram For Document Searching
[Figure: the Administrator or End-user opens the document in the editor, enters a word to search for, and searches for the word.]
Use Case Name: Document Searching
Description: The administrator or end-user opens the document in the editor. He enters the word that he is looking for in that document. Then he searches for the word.
Actors:
- Primary Actor: Administrator or End-user
- Secondary Actor: User
Flow of Events:
1. The user opens the document in which he wants to search for a word.
2. After opening the document he enters the word to search for.
  • 30. 3. Finally he searches for the word in that document.
Pre and Post Conditions: There is no pre-condition or post-condition.
4.7 Overall Use-Case Diagram
[Figure: actors shown are administrator, end-user, end-user1, and end-user2; use cases shown are scan documents, store documents, document processing, document recognition, document editing, document modification, document deletion, and trains the system, with two <<includes>> relationships.]
4.2.2 CLASS DIAGRAMS
The class diagram is the main building block in object-oriented modeling. The classes in a class diagram represent both the main objects and interactions in the application and the objects to be programmed.
- The class diagram of our OCR system consists of 9 classes. They are:
  • 31. 1. MainScreen
2. Editor
3. HelpFrame
4. Document
5. HEntry
6. Entry
7. TrainingSet
8. KohonenNetwork
9. PrintedFrame
Among all these classes, MainScreen is the main class and represents all the major functions carried out by our OCR system. The MainScreen class has an association with five classes: Editor, HelpFrame, Document, TrainingSet, and PrintedFrame. The TrainingSet class in turn has an association with the HEntry and KohonenNetwork classes. The PrintedFrame class has an association with the Entry and KohonenNetwork classes.
  • 32. 4.8 Class Diagram
4.2.3 SEQUENCE DIAGRAMS
Sequence diagrams are sometimes called event-trace diagrams, event scenarios, or timing diagrams. A sequence diagram shows, as parallel vertical lines, the different processes or objects that live simultaneously and, as horizontal arrows, the messages exchanged between them in the order in which they occur. This allows simple runtime scenarios to be specified graphically.
In a sequence diagram, the class objects used to describe the interaction between classes vary from one function to another. The five sequence diagrams listed below present the sequence of actions performed by each of
  • 33. the five modules. The key class object involved in all of these module functions is the MainScreen class, which controls the interaction among the various class objects.
Sequence Diagram for Document Processing
1. Objects:
Administrator "a"
MainScreen "m"
Document "d"
SystemMemory "s"
2. Links:
a) Administrator object to MainScreen object
b) MainScreen object to Document object
c) Document object to SystemMemory object
d) SystemMemory object to Administrator object
3. Messages:
a) Process document
b) Scan document
c) Scans
d) Stores document
e) Stores
f) Return the processed document
  • 34. 4.9 Sequence Diagram For Processing
[Figure: lifelines a:Administrator, m:MainScreen, d:Document, s:SystemMemory; messages 1. Process documents, 2. Scan documents, 3. Scans, 4. Stores documents, 5. Stores, 6. Returns the processed documents.]
Sequence Diagram for System Training:
1. Objects:
Administrator "a"
System "s"
TrainingSet "t"
2. Links:
a) Administrator object to System object
b) System object to TrainingSet object
c) TrainingSet object to System object
d) System object to Administrator object
3. Messages:
a) Specifies the font characters
  • 35. b) Stores it as an image
c) Trains the system with new font
d) System recognizes new font and returns for user
4.10 Sequence Diagram For Training
[Figure: lifelines a:Administrator, s:System, t:TrainingSet; messages 1. Specifies the font characters, 2. Stores it as an image, 3. Trains the system with new font, 4. System recognizes new font and returns for user.]
Sequence Diagram for Document Recognition:
1. Objects:
Administrator "a"
MainScreen "m"
SystemMemory "s"
TrainingSet "t"
2. Links:
a) Administrator object to MainScreen object
b) MainScreen object to SystemMemory object
c) SystemMemory object to MainScreen object
d) MainScreen object to TrainingSet object
e) TrainingSet object to MainScreen object
f) MainScreen object to Administrator object
  • 36. 3. Messages:
a) Recognize document
b) Store processed document
c) Read file image
d) Recognize using OCR
e) Send processed document
f) Recognize the characters
4.11 Sequence Diagram For Recognition
[Figure: lifelines a:Administrator, m:MainScreen, s:SystemMemory, t:TrainingSet; messages 1. Recognize documents, 2. Store processed document, 3. Read file image, 4. Recognize using OCR, 5. Send processed document, 6. Recognize the characters.]
Sequence Diagram for Document Editing:
1. Objects:
Administrator "a"
MainScreen "m"
Document "d"
SystemMemory "s"
2. Links:
1. Administrator object to MainScreen object
  • 37. 2. MainScreen object to Document object 3. MainScreen object to Document object 4. MainScreen object to Document object 5. Document object to SystemMemory object 6. SystemMemory object to Administrator object
    3. Messages: a) Edit document b) Adding document c) Adds d) Deleting document e) Deletes f) Modifying document g) Modifies h) Stores the edited document i) Administrator accesses the edited document
  • 38. Figure 4.12 Sequence Diagram for Editing (objects a:Administrator, m:MainScreen, d:Document, s:SystemMemory; messages: 1. Edit document 2. Adding document 3. Adds 4. Deleting content 5. Deletes 6. Modifying content 7. Modifies 8. Stores the edited documents 9. Administrator accesses the edited documents)
    Sequence Diagram for Document Searching
    1. Objects: Administrator "a", MainScreen "m", Document "d"
    2. Links: a) Administrator object to MainScreen object b) MainScreen object to Document object c) Document object to Administrator object
    3. Messages:
  • 39. a) Specifies the word b) Searches the word c) Searches d) Returns the location of the word
    Figure 4.13 Sequence Diagram for Searching (objects a:Administrator, m:MainScreen, d:Document; messages: 1. Specifies the word 2. Searches the word 3. Searches 4. Returns the location of the word)
    4.2.4 COLLABORATION DIAGRAM
    A collaboration diagram, also known as a communication diagram, models the interactions between objects or parts in terms of sequenced messages. Communication diagrams show more clearly which elements each object interacts with, whereas sequence diagrams show more clearly the order in which the interactions take place. The collaboration diagram is the same as the sequence diagram in the functions it describes, but the presentation or structure of the classes differs. The class objects used to define the relationships between the classes are the same as those of the sequence diagram. Hence there are also five collaboration diagrams, one for each of the five modules. In these diagrams, too, MainScreen acts as the key class object.
  • 40. Figure 4.14 Collaboration Diagram for Document Processing (objects a:Administrator, m:MainScreen, d:Document, s:SystemMemory; messages: 1. Process documents 2. Scan documents 3. Scans 4. Stores documents 5. Stores 6. Returns the processed documents)
    Figure 4.15 Collaboration Diagram for Neural Network Training (objects a:Administrator, s:System, t:TrainingSet; messages: 1. Specifies the font characters 2. Stores it as an image 3. Trains the system with new font 4. System recognizes new font and returns for user)
  • 41. Figure 4.16 Collaboration Diagram for Document Recognition (objects a:Administrator, m:MainScreen, s:SystemMemory, t:TrainingSet; messages: 1. Recognise documents 2. Store processed document 3. Read file image 4. Recognise using OCR 5. Send processed document 6. Recognise the characters)
    Figure 4.17 Collaboration Diagram for Document Editing (objects a:Administrator, m:MainScreen, d:Document, s:SystemMemory; messages: 1. Edit document 2. Adding document 3. Adds 4. Deleting content 5. Deletes 6. Modifying content 7. Modifies 8. Stores the edited documents 9. Administrator accesses the edited documents)
  • 42. Figure 4.18 Collaboration Diagram for Document Searching (objects a:Administrator, m:MainScreen, d:Document; messages: 1. Specifies the word 2. Searches the word 3. Searches 4. Returns the location of the word)
    4.2.4 ACTIVITY DIAGRAM:
    The purpose of an activity diagram is to provide a view of the flows and of what is going on inside a use case or among several classes. An activity diagram can also be used to represent a class's method implementation. A token represents an operation. An activity is shown as a rounded box containing the name of the operation. An outgoing solid arrow attached to the end of the activity symbol indicates a transition triggered by the completion of the activity.
  • 43. Figure 4.19 Activity Diagram for Document Processing (activities: Request document processing, Process document, Scan documents, Store documents, Retry for scanning; guards: [scanner ready] leads to Scan documents, [scanner not ready] leads to Retry for scanning)
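The flow in the activity diagram above can be sketched as a bounded retry loop. This is only an illustrative sketch under stated assumptions, not the project's code: the `ScanDevice` interface, the `processDocument` method and the `maxRetries` limit are hypothetical names introduced here, and the diagram itself does not say how many retries are allowed.

```java
import java.util.List;

/** Hypothetical sketch of the "Document Processing" activity flow. */
final class DocumentProcessingFlow {

    /** Assumed abstraction of a scanner; not part of the original system. */
    interface ScanDevice {
        boolean isReady();
        String scan();   // returns some reference to the scanned page image
    }

    /**
     * Checks the scanner up to maxRetries + 1 times. As soon as it is ready,
     * the document is scanned and stored. Returns true if it was stored.
     */
    static boolean processDocument(ScanDevice device, List<String> storage, int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (device.isReady()) {            // guard: [scanner ready]
                storage.add(device.scan());    // Scan documents, then Store documents
                return true;
            }
            // guard: [scanner not ready], so loop again (Retry for scanning)
        }
        return false;                          // gave up after the allowed retries
    }
}
```

Bounding the retries is a design choice made for the sketch; an interactive application could instead prompt the administrator before each retry.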
  • 44. Figure 4.20 Activity Diagram for Document Retrieval (activities: Request document, Initiate search, Retrieves document, Sends document to user, Returns message; guards: [Document exists] leads to Retrieves document, [Document does not exist] leads to Returns message)
  • 45. Figure 4.21 Activity Diagram for Document Storage (activities: Edit documents, Add document content, Delete document content, Modify document, Store documents; guards: [user chooses add], [user chooses delete], [user chooses modify])
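The three guarded branches of the editing activity (add, delete, modify) followed by a single store step can be sketched as below. This is a hedged illustration only: the `DocumentEditor` class, the `EditAction` enum and the representation of a document as a list of lines are assumptions made for the example and do not come from the original system.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the add / delete / modify branches before storing. */
final class DocumentEditor {

    enum EditAction { ADD, DELETE, MODIFY }

    /**
     * Applies one edit to a copy of the document's lines and returns the copy,
     * which is what would then be stored. The input list is left untouched.
     */
    static List<String> apply(List<String> lines, EditAction action, int index, String text) {
        List<String> edited = new ArrayList<>(lines);
        switch (action) {
            case ADD:                       // [user chooses add]
                edited.add(index, text);
                break;
            case DELETE:                    // [user chooses delete]
                edited.remove(index);
                break;
            case MODIFY:                    // [user chooses modify]
                edited.set(index, text);
                break;
        }
        return edited;
    }
}
```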
  • 46. 4.2.5 COMPONENT DIAGRAM
    The crucial component in our component diagram, the one that plays the major role in implementing the OCR system, is the GUI component. All other components, that is, Document Processing and Recognition, Document Editing and Document Searching, depend on it. They are as follows:
    - The GUI component is used to design the GUI screens for interacting with the end user and the administrator.
    - From the GUI component, the other component functionalities are carried out. These functionalities include document processing and recognition, document editing and document searching.
    Components shown in the diagram: GUI (GUI screens); Document Processing and Recognition (scanning, storing and recognising characters); Editing (adding, deleting, modifying); Searching (supports the user search function)
  • 47. Figure 4.22 Component Diagram
    4.2.6 DEPLOYMENT DIAGRAM:
    A deployment diagram serves to model the physical deployment of artifacts on deployment targets. Deployment diagrams show "the allocation of Artifacts to Nodes according to the Deployment defined between them". In the deployment diagram of our system, the server role is played by the admin, called the Librarian. There can be N clients accessing the digital library content at a time. The clients may be students, faculty, or both.
    - The actions performed by the administrator are document processing, searching and editing, whereas the actions performed by the end user are only document searching and editing.
    Nodes shown in the diagram: <<Server>> (document processing, editing and searching); <<Client1>>, <<Client2>>, ..., <<ClientN>> (document searching, editing)
  • 48. Figure 4.23 Deployment Diagram
    STAGE 6: 6. TESTING
    The purpose of testing is to discover errors. Testing is the process of trying to discover every conceivable fault or weakness in a work product. It provides a way to check the functionality of components, sub-assemblies, assemblies and/or a finished product. It is the process of exercising software with the intent of ensuring that the software system meets its requirements and user expectations and does not fail in an unacceptable manner. There are various types of test. Each test type addresses a specific testing requirement.
    6.1 TYPES OF TESTS
    Unit Testing
    Unit testing involves the design of test cases that validate that the internal program logic is functioning properly and that program inputs produce valid outputs. All decision branches and internal code flow should be validated. It is the testing of individual software units of the application. It is done after the completion of an individual unit and before integration. This is structural testing, which relies on knowledge of the unit's construction and is invasive. Unit tests perform basic tests at component level and test a specific business process, application, and/or system configuration. Unit tests ensure that each unique path of a business process performs accurately to the documented specifications and contains clearly defined inputs and expected results.
    Integration Testing
  • 49. Integration tests are designed to test integrated software components to determine whether they actually run as one program. Testing is event driven and is more concerned with the basic outcome of screens or fields. Integration tests demonstrate that, although the components were individually satisfactory, as shown by successful unit testing, the combination of components is correct and consistent. Integration testing is specifically aimed at exposing the problems that arise from the combination of components.
    System Testing
    System testing ensures that the entire integrated software system meets requirements. It tests a configuration to ensure known and predictable results. An example of system testing is the configuration-oriented system integration test. System testing is based on process descriptions and flows, emphasizing pre-driven process links and integration points.
    Functional Testing
    Functional tests provide a systematic demonstration that the functions tested are available as specified by the business and technical requirements, system documentation, and user manuals. Functional testing is centered on the following items:
    Valid Input: identified classes of valid input must be accepted.
    Invalid Input: identified classes of invalid input must be rejected.
    Functions: identified functions must be exercised.
    Output: identified classes of application outputs must be exercised.
    Systems/Procedures: interfacing systems or procedures must be invoked.
    Organization and preparation of functional tests is focused on requirements, key functions, or special test cases. In addition, systematic coverage pertaining to identified business process flows, data fields, predefined processes, and successive processes must be considered for testing. Before functional testing is complete, additional tests are identified and the effective value of current tests is determined.
  • 50. - There are two basic approaches to testing: a. Black box testing, or functional testing. b. White box testing, or structural testing.
    (a) Black box testing
    This method is used when the specified function that a product has been designed to perform is known. The concept of a black box is used to represent a system whose inner workings are not available for inspection. In a black box test, the test item is treated as "black": its logic is unknown, and all that is known is what goes in and what comes out, that is, the input and the output. In black box testing, we try various inputs and examine the resulting outputs. Black box testing can also be used for scenario-based tests, in which we verify whether the system takes valid input and produces the resultant output for the user. It is an imaginary box that hides the internal workings. In our project, the valid input is an image, and a well-structured image should be received as the resultant output.
    Figure 6.1 (black box with Input and Output)
    (b) White box testing
    White box testing is concerned with testing the implementation of the program. The intent of structural testing is not to exercise all the inputs or outputs but to exercise the different programming and data structures used in the program. Thus structural testing aims to produce test cases that force the desired coverage of the different structures. Two types of path testing are: 1. Statement testing
  • 51. 2. Branch testing
    Statement Testing
    The main idea of statement testing coverage is to test every statement in an object's methods by executing it at least once. Realistically, however, it is impossible to test a program on every single input, so you can never be sure that a program will not fail on some input.
    Branch Testing
    The main idea behind branch testing coverage is to perform enough tests to ensure that every branch alternative has been executed at least once under some test. As with statement testing coverage, it is infeasible to fully test any program of considerable size.
    Figure 6.2 (box labelled INTERNAL WORKING, with Input and Output)
    6.2 UNIT TESTING
    Unit testing is usually conducted as part of a combined code and unit test phase of the software life cycle, although it is not uncommon for coding and unit testing to be conducted as two distinct phases.
    Test strategy and approach
    Field testing will be performed manually and functional tests will be written in detail.
    Test objectives
    - All field entries must work properly.
  • 52. - Pages must be activated from the identified link.
    - The entry screen, messages and responses must not be delayed.
    Features to be tested
    - Verify that the entries are of the correct format.
    - No duplicate entries should be allowed.
    - All links should take the user to the correct page.
    6.3 INTEGRATION TESTING
    Software integration testing is the incremental integration testing of two or more integrated software components on a single platform, intended to produce failures caused by interface defects. The task of the integration test is to check that components or software applications (for example, components in a software system or, one step up, software applications at the company level) interact without error.
    Test Results: All the test cases mentioned above passed successfully. No defects were encountered.
    6.4 ACCEPTANCE TESTING
    User acceptance testing is a critical phase of any project and requires significant participation by the end user. It also ensures that the system meets the functional requirements.
    Test Results: All the test cases mentioned above passed successfully. No defects were encountered.
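The statement and branch coverage ideas from section 6.1 can be illustrated with a very small unit. The sketch below is hypothetical and is not taken from the project: `Binarizer.toBinary` and its threshold parameter are names assumed for the example. The method has three branch outcomes (invalid input, ink, background), so three test inputs are enough to execute every branch at least once.

```java
/** Hypothetical unit used only to illustrate branch coverage. */
final class Binarizer {

    /** Maps a gray level (0 to 255) to 1 for ink or 0 for background. */
    static int toBinary(int gray, int threshold) {
        if (gray < 0 || gray > 255) {
            // Invalid input class: must be rejected.
            throw new IllegalArgumentException("gray level out of range: " + gray);
        }
        if (gray < threshold) {
            return 1;   // darker than the threshold: treated as ink
        }
        return 0;       // otherwise: treated as background
    }
}
```

With a threshold of 128, the inputs 10, 200 and 300 exercise the ink branch, the background branch and the rejection branch respectively. Covering every branch in this way still does not prove the unit correct for every input, which is the limitation noted in section 6.1.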
  • 53. STAGE 7: 7. SCREENSHOTS
    OUTPUT SCREENS
    The following series of output screens shows how the actual OCR process takes place:
    - The first page, the home page of our OCR system, provides an interface from which the user can access any module present in this software. The page is as shown below:
    7.1 Main Screen
  • 54. - There are two types of recognition in the document recognition module: handwritten letter recognition and scanned document recognition. Handwritten recognition proceeds as follows. First, suppose that we have drawn the letter 'A' in the workspace provided.
    Hand Written Screen 1
    - On the above screen we can write letters with the mouse pointer in the workspace labelled "Draw Letters Here". To recognize these letters, we have to train the system first. Otherwise, it will give an error message stating that the system has to be trained first. This process is explained with the following screens:
  • 55. Hand Written Screen 3
    - Now suppose that you have clicked the "Recognize" button without training, in order to recognize the character you have written and show the recognized character in the grid. The system then displays an error message as shown below:
    Hand Written Screen 4
  • 56. - If we click the "Begin Training" button before proceeding with recognition, a status message reporting success is shown, as below:
    Hand Written Screen 5
    - Since the training has been completed, the letter 'A' can now be recognized by clicking the "Recognize" button. The letter 'A' then appears in the grid as output, as shown below:
  • 57. Hand Written Screen 6
    - Once we have trained the system in a session, the system does not need any further training for any kind of letter in any language. That is, once training has been provided for at least one character, from then onwards the system recognizes any character written in the workspace without further training. For example, we first wrote the letter 'A', provided training for it and recognized the letter 'A'. We then wrote the letter 'S'. Without any further training, we can directly recognize the letter 'S' in the grid by clicking the "Recognize" button. Thus we do not need to train the system further.
  • 58. Hand Written Screen 7
    Hand Written Screen 8
    - Since we have trained the system once with one character of the English language, we can now recognize the characters of languages other than English,
  • 59. and that too without the need for training. Suppose we have written a Telugu character as shown below:
    Hand Written Screen 9
    - Now we can directly recognize the above Telugu character without training the system. Just click the "Recognize" button once after drawing the letter in the workspace.
    Hand Written Screen 10
  • 60. - Besides training the system through drawn letters, we can also train it by entering characters through the keyboard and storing them as patterns. Later we train the system on those patterns. First, we provide the input through the keyboard as follows:
    Hand Written Screen 11
    - If we click OK, those letters are saved in the stored patterns workspace. We can then click the "Begin Training" button so that the system is trained on those stored patterns. Otherwise, the system gives an error message stating that it needs training.
  • 61. Hand Written Screen 12
    - Now suppose that we write the word 'sr' and click the "Recognize" button before providing training on the above stored pattern 'A'. An error message is then displayed stating that the system needs to be trained on the stored patterns, as shown below:
    Hand Written Screen 13
  • 62. - Now click the "Begin Training" button before you attempt to recognize the drawn word 'sr'. The system then produces an output screen, shown below, indicating that the training has been completed:
    Hand Written Screen 14
    - If we now click the "Recognize" button, the drawn word 'sr' is recognized and is shown as output in the grid, by firing the last neuron in the stored patterns.
  • 63. Hand Written Screen 15
    - Since we have provided training on the stored patterns once, from now onwards we can simply draw the characters or words of any language and recognize them directly by clicking the "Recognize" button, without training the system again. An example is shown for a Telugu word.
    Hand Written Screen 16
  • 64. - Process Explaining Scanned Document Recognition
    First, when we click the "Scanned Document Recognition" button, the main page of this recognition module is displayed as follows:
    Hand Written Screen 17
    - The first text box holds the default image file set by the user. The user can replace the default input image file by clicking Open and then selecting an image file. The procedure is as shown below:
  • 65. Hand Written Screen 18
    Hand Written Screen 19
    - There are two main tabs under scanned document recognition: training and recognition. First we should train the system under the training tab. Only then
  • 66. can we recognize the characters from the input image using the recognition tab. The training tab under scanned document recognition looks like this.
    Hand Written Screen 20
    - The above figure shows the default input image for training. We can change the training input image for different fonts by opening different input image files and then training on them, so that the system adapts to the new fonts.
  • 67. Hand Written Screen 21
    - Opening an image file replaces the default input image for training with the new image, as shown below.
    Hand Written Screen 22
  • 68. - Now the user can select the bounds up to which the system must be trained, simply by using click and drag actions of the mouse. The selected data is then highlighted as follows:
    Hand Written Screen 23
    After selecting the data, just click the "Train" button. This lets the system train itself with the help of the Kohonen network, and it finally displays a dialog box stating that the training has been completed successfully.
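The text says only that training uses a Kohonen network, so the following is a simplified, hedged sketch of the general idea rather than the project's implementation. It assumes a winner-take-all layer with no neighbourhood function, a fixed learning rate and squared Euclidean distance for choosing the winning neuron; the class and method names (`KohonenSketch`, `winner`, `trainStep`, `train`) are hypothetical.

```java
/** Simplified winner-take-all sketch of Kohonen-style training (assumptions noted above). */
final class KohonenSketch {

    /** Index of the neuron whose weight vector is closest to the input. */
    static int winner(double[][] weights, double[] input) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int n = 0; n < weights.length; n++) {
            double dist = 0.0;
            for (int i = 0; i < input.length; i++) {
                double d = input[i] - weights[n][i];
                dist += d * d;                       // squared Euclidean distance
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = n;
            }
        }
        return best;
    }

    /** Moves only the winning neuron's weights toward the input; returns the winner's index. */
    static int trainStep(double[][] weights, double[] input, double learningRate) {
        int w = winner(weights, input);
        for (int i = 0; i < input.length; i++) {
            weights[w][i] += learningRate * (input[i] - weights[w][i]);
        }
        return w;
    }

    /** Presents every sample once per epoch. */
    static void train(double[][] weights, double[][] samples, double learningRate, int epochs) {
        for (int e = 0; e < epochs; e++) {
            for (double[] sample : samples) {
                trainStep(weights, sample, learningRate);
            }
        }
    }
}
```

In a sketch like this, each input vector would be a down-sampled character image flattened to numbers, and recognition would amount to calling `winner` and looking up the character associated with that neuron. A full Kohonen map would normally also update the winner's neighbours and decay the learning rate over time.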
  • 69. Hand Written Screen 24
    - Once the training of the system is completed, we move on to the recognition phase, where we open a new scanned image file as the input to be converted into an editable document. We then select the part of the image from which the data has to be extracted. It then looks like this:
  • 70. Hand Written Screen 25
    - Next, click the "Crop" button. The system finds the bounds of the text selected by the user and draws a red boundary line around it, as shown below:
    Hand Written Screen 26
    - Finally, click the "Recognize" button so that the system extracts (recognizes) the characters from the image and presents them to the user. This data is still not editable. When we click the "EDIT" button provided at the bottom centre, the document becomes both editable and searchable. This complete process is illustrated in the next two screens, shown below:
  • 71. Hand Written Screen 27
    Hand Written Screen 28
    - Starting from the data available in the above screenshot, we can make any changes to the document using cut, copy, paste and so on, and can finally save the document in one of two formats (Word or text), as per our design.
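The "Crop" step described earlier finds the bounds of the selected text. How the project computes those bounds is not stated, so the sketch below simply assumes a binarized image held as a `boolean[][]` (true meaning an ink pixel) and returns the smallest rectangle that contains all ink. `TextBounds.find` is a hypothetical name introduced for the example.

```java
/** Hypothetical sketch: bounding box of the ink pixels in a binarized image. */
final class TextBounds {

    /**
     * Returns {top, left, bottom, right} (all inclusive) of the ink pixels,
     * or null if the image contains no ink at all.
     */
    static int[] find(boolean[][] ink) {
        int top = Integer.MAX_VALUE;
        int left = Integer.MAX_VALUE;
        int bottom = -1;
        int right = -1;
        for (int y = 0; y < ink.length; y++) {
            for (int x = 0; x < ink[y].length; x++) {
                if (ink[y][x]) {
                    top = Math.min(top, y);
                    left = Math.min(left, x);
                    bottom = Math.max(bottom, y);
                    right = Math.max(right, x);
                }
            }
        }
        return bottom < 0 ? null : new int[] {top, left, bottom, right};
    }
}
```

The returned rectangle is what a user interface could outline in red before recognition is run on the cropped region.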
  • 72. - The search function can be carried out here by clicking the "find" image button at the bottom-left corner. The system then asks the user to enter the search term, as shown below:
    Hand Written Screen 29
    - When you click OK in the dialog box of the above screenshot, one of two cases occurs, as per our design:
    Case 1: If the search term entered by the user occurs in the document, a dialog box asks the user whether to continue the search. If the user clicks Yes, the cursor moves to the search term. If the user clicks No, the search exits.
    Case 2: If the search term entered by the user does not occur in the document, a dialog box is displayed directly, saying that the search is finished. This means that the search term is not present in the document.
    Thus the user learns whether the search term is present in the document immediately after entering it.
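The two cases above come down to whether a lookup returns a position or a "not found" marker. The sketch below is a minimal illustration under stated assumptions, not the project's code: it treats the document as a single `String`, performs a case-sensitive search with the standard `String.indexOf(String, int)` method, and uses the hypothetical names `DocumentSearch`, `findNext` and `findAll`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the search behind the two cases. */
final class DocumentSearch {

    static final int NOT_FOUND = -1;

    /** Position of the next occurrence at or after fromIndex, or NOT_FOUND (Case 2). */
    static int findNext(String document, String term, int fromIndex) {
        if (term == null || term.isEmpty()) {
            return NOT_FOUND;                 // nothing sensible to search for
        }
        return document.indexOf(term, fromIndex);
    }

    /** All non-overlapping positions, as if the user kept choosing to continue (Case 1). */
    static List<Integer> findAll(String document, String term) {
        List<Integer> positions = new ArrayList<>();
        int pos = findNext(document, term, 0);
        while (pos != NOT_FOUND) {
            positions.add(pos);
            pos = findNext(document, term, pos + term.length());
        }
        return positions;
    }
}
```

A returned position is where the editor would move the cursor; `NOT_FOUND` corresponds to the "search is finished" dialog.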
  • 73. - If we search for a term that is present in the document, the series of output screens is as follows:
    Hand Written Screen 30
    Hand Written Screen 31
  • 74. - If we search for a term that is not present in the document, the series of output screens is as follows:
    Hand Written Screen 32
    Hand Written Screen 33
  • 75. - When using the editor, we can perform the actions displayed in the screens below:
    Hand Written Screen 34
  • 76. Hand Written Screen 35
    - The editor module directly displays the screen shown below:
    Hand Written Screen 37
  • 77. STAGE 8 8. CONCLUSION
    What does the future hold for OCR? Given enough entrepreneurial designers and sufficient research and development dollars, OCR can become a powerful tool for future data entry applications. However, the limited availability of funds in a capital-short environment could restrict the growth of this technology. Given the proper impetus and encouragement, an OCR system can provide many benefits:
    - The automated entry of data by OCR is one of the most attractive labor-reducing technologies.
    - The recognition of new font characters by the system is easy and quick.
    - We can edit the information in the documents more conveniently, and we can reuse the edited information as and when required.
    - Extending the software beyond editing and searching is a topic for future work.
    The Grid infrastructure used in the implementation of the Optical Character Recognition system can be used efficiently to speed up the translation of image-based documents into structured documents that are easy to discover, search and process.
  • 78. STAGE 9 9. FUTURE ENHANCEMENTS
    The Optical Character Recognition software can be enhanced in the future in several ways, such as:
    - Training and recognition speeds can be increased further, and the software can be made more user-friendly.
    - Many applications exist where it would be desirable to read handwritten entries. Reading handwriting is a very difficult task considering the diversity that exists in ordinary penmanship. However, progress is being made.
    STAGE 10 10. REFERENCES
    In this references section, we list the references from which we took our problem and several others that helped us design the solution to it. These references include books, papers published through standards bodies, and website links with URLs:
    - For a complete reference on and understanding of neural networks, refer to Jeff Heaton's chapter 1 from www.jeffheaton.com
    - For a complete reference on and understanding of OCR, refer to Jeff Heaton's chapter 7 from www.jeffheaton.com
  • 79. - The IEEE standard reference paper from which we took our problem statement is authored by Dana Petcu, Silviu Panica, Viorel Negru and Andrei Eckstein of the Computer Science Department, West University of Timisoara, Romania.
    - The reference paper is also authored by Doina Banciu of the National Institute for Research and Development in Informatics, Romania.
    - Refer to the IEEE standard paper by D. Andrews, R. Brown, C. Caldwell, et al., "A Parallel Architecture for Performing Real Time Multi-Line Optical Character Recognition"
    - Refer to the IEEE standard paper by H. Goto, "OCRGrid: A Platform for Distributed and Cooperative OCR Systems"
    - Refer to the paper by M. Forbes, "OCHRE-P Optical Character Recognition in Parallel", which can be located by browsing online.
    - Also refer to R. Mason, H. Schmidt, R. Trott, "Down on the OCR Farm: How We Produced Searchable PDFs for 7 Million Documents in a Student Computer Lab"
    STAGE 11 11. APPENDICES
    Appendix A: Glossary
    TERMS
    All the terms and abbreviations used in the project are specified clearly. For further development of the project, evolved definitions will be specified.
  • 80. ACRONYMS
    IEEE: Institute of Electrical and Electronics Engineers
    DFD: Data Flow Diagram
    UML: Unified Modeling Language
    J2EE: Java 2 Platform, Enterprise Edition
    GUI: Graphical User Interface
    GOCR: Grid OCR
    Appendix B: Analysis Models
    This includes all the pertinent analysis models, such as data flow diagrams, class diagrams, use case diagrams, interaction diagrams and state-chart diagrams.
    STAGE 12 BIBLIOGRAPHY
    Java AWT by John Zukowski
    Java Swings by Deitel & Deitel
    Java Complete Reference