A MINOR PROJECT REPORT
On
IMAGE RECOGNITION
Submitted in partial fulfilment of the requirement for the award of the degree of
B. TECH
in
COMPUTER SCIENCE AND ENGINEERING
Submitted By
BHASKAR TRIPATHI : RA1611003040016
JOEL JOSE : RA1611003040128
DEPT. OF COMPUTER SCIENCE & ENGINEERING
SRM Institute of Science & Technology
Vadapalani Campus, Chennai
OCTOBER 2018
BONAFIDE CERTIFICATE
Certified that this project report “IMAGE RECOGNITION” is the bonafide work of “BHASKAR TRIPATHI ASHWINI KUMAR and JOEL JOSE”, who carried out the project work under my supervision.
SIGNATURE OF THE GUIDE
Dr. P. Mohamed Fathimal, B.E., M.E., Ph.D.
Assistant Professor
Department of Computer Science and Engineering
SRM Institute of Science & Technology, Vadapalani Campus

SIGNATURE OF THE HOD
Dr. S. Prasanna Devi, B.E., M.E., Ph.D., PGDHRM., PDF(IISc)
Professor
Department of Computer Science and Engineering
SRM Institute of Science & Technology, Vadapalani Campus
ACKNOWLEDGEMENT
It is our privilege to express our sincerest regards to our project coordinator, Dr. P. Mohamed Fathimal, for her valuable inputs, able guidance, encouragement, whole-hearted cooperation and constructive criticism throughout the duration of our project.
We deeply express our sincere thanks to our Head of Department, Dr. S. Prasanna Devi, for encouraging and allowing us to present the project on the topic “IMAGE RECOGNITION” at our department premises in partial fulfilment of the requirements leading to the award of the B.Tech degree.
We take this opportunity to thank all our faculty members, our Dean, Dr. K. Duraivelu, and the Management, who have directly or indirectly helped our project. Last but not least, we express our thanks to our friends for their cooperation and support.
TABLE OF CONTENTS
ABSTRACT
CHAPTER 1
Introduction
1.1 Introduction of Project
1.2 Overview
CHAPTER 2
About Project
2.1 Purpose
2.2 Project Scope
2.3 Existing System
2.4 Drawback of Existing System
2.5 Proposed System
2.6 Benefits of Proposed System
2.7 System Specifications
CHAPTER 3
3.1 Tools And Technology
3.2 Architecture Of Proposed System
3.3 Problem Statement
3.4 Modules and their Functionalities
CHAPTER 4
System Study
4.1 Data Flow Diagram
4.2 UML Diagram
CHAPTER 5
Modules
5.1 Main Modules
5.2 Face Detection
5.3 Feature Extraction
5.4 Recognition
CHAPTER 6
System Testing
CHAPTER 7
Source Code
7.1 Text Recognition
7.2 Face Recognition
7.3 Landmark Detection
7.4 Label Detection
CHAPTER 8
Screenshots
CHAPTER 9
Conclusion
REFERENCES
LIST OF FIGURES
Figures
3.2 Architecture of the Proposed System
4.1 Proposed model for real-time classification
4.2.1 Use-Case Diagram For Document Editing
4.2.2 Class Diagram
4.2.3 Sequence Diagram for Processing
4.2.3.a Sequence Diagram for Training
4.2.4 Sequence Diagram for Recognition
4.2.5 Sequence Diagram for Editing
6.1 Normalized confusion matrix of our mini-Xception network.
6.2 Results of the real-time emotion classification provided in our public repository
Screenshots
ABSTRACT
In this paper we propose and implement a general convolutional neural network (CNN) building framework for designing real-time CNNs. We validate our models by creating a real-time vision system which accomplishes the tasks of face detection, gender classification and emotion classification simultaneously in one blended step using our proposed CNN architecture. After presenting the details of the training procedure setup we proceed to evaluate on standard benchmark sets. We report accuracies of 96% on the IMDB gender dataset and 66% on the FER-2013 emotion dataset. Along with this we also introduce the very recent real-time enabled guided back-propagation visualization technique. Guided back-propagation uncovers the dynamics of the weight changes and evaluates the learned features. We argue that the careful implementation of modern CNN architectures, the use of current regularization methods and the visualization of previously hidden features are necessary in order to reduce the gap between slow performances and real-time architectures. All our code, demos and pretrained architectures have been released under an open-source license in our public repository.
CHAPTER-1
INTRODUCTION
There is a growing demand for software systems that can recognize images, for example when information is scanned and analysed through the Google Vision API. Image recognition, in the context of machine vision, is the ability of software to identify objects, places, people, writing and actions in images. Computers can use machine vision technologies in combination with a camera and artificial intelligence software to achieve image recognition. While human and animal brains recognize objects with ease, computers have difficulty with the task; dedicated software is used to overcome this difficulty. Current and future applications of image recognition include smart photo libraries, targeted advertising, the interactivity of media, accessibility for the visually impaired and enhanced research capabilities. Google, Facebook, Microsoft and Apple are some of the major companies which use this technology. Facebook can now perform face recognition at 98% accuracy, which is comparable to the ability of humans, and can identify your friend's face with only a few tagged pictures. The efficacy of this technology depends on the ability to classify images. Classification is pattern matching with data. Images are data in the form of 2-dimensional matrices. In fact, image recognition is classifying data into one category out of many. One common and important example is optical character recognition (OCR). OCR converts images of typed or handwritten text into machine-encoded text. The major steps in the image recognition process are gathering and organizing data, building a predictive model and using it to recognize images.
Furthermore, the human accuracy for classifying an image of a face as one of 7 different emotions is 65% ± 5%. One can observe the difficulty of this task by trying to manually classify the FER-2013 dataset images
within the following classes {“angry”, “disgust”, “fear”, “happy”, “sad”, “surprise”, “neutral”}.
Gender classification was first perceived as an issue in psychophysical studies; it focuses on the efforts of
understanding human visual processing and identifying key features used to categorize between male and
female individuals [1]. Research has shown that the disparity between facial masculinity and femininity can
be utilized to improve performances of face recognition applications in biometrics, human–computer
interactions, surveillance, and computer vision. However, in a real-world environment, the challenge is how to
deal with the facial image being affected by the variance in factors such as illumination, pose, facial
expression, occlusion, background information, and noise dependent on the type of classifier chosen, which is
in turn dependent on the feature extraction method applied.
It is difficult to find a classifier that combines best with the chosen feature extractor such that an optimal
classification performance is achieved. Any changes to the problem domain require a complete redesign of the
system. The convolutional neural network (CNN) is a neural network variant that consists of a number of
convolutional layers alternating with subsampling layers and ends with one or more fully connected layers in
the standard multilayer perceptron (MLP). A significant advantage of the CNN over conventional approaches
in pattern recognition is its ability to simultaneously extract features, reduce data dimensionality, and classify
in one network structure. Such a structure, as illustrated in Figure 1, can boost recognition accuracy efficiently and cost-effectively.
This is then also the challenge in the development of a robust face-based gender classification system that has
high classification accuracy and real-time performance. The conventional approach applied in face recognition,
including face-based gender recognition, typically involves the stages of image acquisition and processing,
dimensionality reduction, feature extraction, and classification, in that order. Prior knowledge of the
application domain is required to determine the best feature extractor to design.
CHAPTER-2
ABOUT PROJECT
2.1 PURPOSE
The main purpose of an image recognition system based on a grid infrastructure is to perform image analysis and the processing of electronic documents converted from paper formats more effectively and efficiently. This improves the accuracy of recognizing the characters during document processing compared to the various existing character recognition methods. The primary objective is to speed up the process of character recognition in document processing. As a result the system can process a huge number of documents in less time and hence saves time. This application can also be used as a quick search tool that does not need any lengthy typing to search for something, which quickens our work. Since our image recognition is based on a grid infrastructure, it aims to recognize multiple images and characters that belong to different universal languages with different properties, font properties and alignments.
2.2 PROJECT SCOPE
The scope of our project, Image Recognition on a grid infrastructure, is to provide an efficient and enhanced software tool for users to perform document image analysis and document processing by reading and recognizing the characters in research, academic, governmental and business organizations that have a large pool of documented, scanned images. Irrespective of the size of the documents and the type of characters in them, the product recognizes, searches and processes them quickly according to the needs of the environment.
2.3 EXISTING SYSTEM
There is a growing demand from users to convert images and printed documents in order to identify the content within them and process it to understand what it means. Hence the Google Lens system was invented to convert the images and data available on paper into computer-processable documents and images. Google Lens is an image recognition mobile app developed by Google. First announced during Google I/O 2017, it is designed to bring up relevant information using visual analysis. When directing the phone's camera at an object, Google Lens will attempt to identify the object or read labels and text and show relevant search results and information. For example, when pointing the device's camera at a Wi-Fi label containing the network name and password, it will automatically connect to the Wi-Fi network that has been scanned. Lens is also integrated with the Google Photos and Google Assistant apps. The service is similar to Google Goggles, a previous app that functioned similarly but with less capability. Lens uses more advanced deep learning routines, similar to other apps such as Bixby Vision (for Samsung devices released in 2016 and later) and Image Analysis Toolset (available on Google Play); artificial neural networks are used to detect and identify objects and landmarks and to improve optical character recognition (OCR) accuracy.
2.4 DRAWBACK OF EXISTING SYSTEM
The drawbacks of Google Lens include limited device support, although it is not clear which devices are not supported or why. It requires Android Marshmallow (6.0) or newer. It is also not available in India.
2.5 PROPOSED SYSTEM
Our proposed system is built on a grid infrastructure and is an image and character recognition system that supports recognition of images and characters. This grid infrastructure eliminates the problem of heterogeneous character recognition and supports multiple functionalities to be performed on documents and images.
2.6 BENEFITS OF PROPOSED SYSTEM
The benefit of the proposed system that overcomes the drawbacks of the existing system is that it is supported as a mobile application. Image recognition with the Google Vision API and Google Lens identifies famous personalities, animals and the actions being performed within an image. Text understanding and text retrieval are used to extract images and text from the surroundings and identify them with the help of Google Lens and the Vision API.
2.7 SYSTEM SPECIFICATIONS
Hardware requirements:
● System : Snapdragon 410
● Hard disk : 8 GB
● Floppy drive : Not required
● Monitor : Any
● RAM : 512 MB
Software requirements:
● Operating system : Android
● Coding language : Java & XML
● Database : Not required
● API level : 21
● SDK version : 3.2.1
CHAPTER 3
3.1 TOOLS AND TECHNOLOGY
3.1.1 Android Studio
Android Studio is the official integrated development environment (IDE) for Google's Android operating system, built on JetBrains' IntelliJ IDEA software and designed specifically for Android development. It is available for download on Windows, macOS and Linux based operating systems. It is a replacement for the Eclipse Android Development Tools (ADT) as the primary IDE for native Android application development.
Android Studio was announced on May 16, 2013 at the Google I/O conference. It was in early access preview stage starting from version 0.1 in May 2013, then entered beta stage starting from version 0.8, which was released in June 2014. The first stable build was released in December 2014, starting from version 1.0. The current stable version is 3.2.1, which was released in October 2018.
Features
● Gradle-based build support
● Android-specific refactoring and quick fixes
● Lint tools to catch performance, usability, version compatibility and other problems
● ProGuard integration and app-signing capabilities
● Template-based wizards to create common Android designs and components
● A rich layout editor that allows users to drag-and-drop UI components, with the option to preview layouts on multiple screen configurations
● Support for building Android Wear apps
● Built-in support for Google Cloud Platform, enabling integration with Firebase Cloud Messaging
● Android Virtual Device (Emulator) to run and debug apps in Android Studio
Android Studio supports the same programming languages as IntelliJ and CLion, e.g. Java and C++. Android Studio 3.0 or later supports Kotlin, all Java 7 language features and a subset of Java 8 language features that vary by platform version. External projects backport some Java 9 features.
3.1.2 Google Vision API
The Google Vision API allows developers to easily integrate vision detection features within applications, including image labeling, face and landmark detection, optical character recognition (OCR), and tagging of explicit content. Cloud AutoML Vision additionally enables you to create a custom machine learning model for image labeling.
How to use Google Vision API?
Computers can, in a sense, see, hear, feel, smell and taste; one of the ways your code can “see” is with the Google Vision API. The Google Vision API connects your code to Google's image recognition capabilities. You can think of it as a kind of API/REST interface to images.google.com, but it does much more than show you similar images.
Google Vision can detect whether you're a cat or a human, as well as the parts of your face. It tries to detect whether you're posed or doing something that would not be okay for Google Safe Search. It even tries to detect whether you're happy or sad.
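As a minimal sketch of this capability (not part of our implementation), the method below follows the same request/response pattern as the listings in Chapter 7 but asks the Vision API for the Safe Search likelihoods mentioned above; the file path is a placeholder and the classes are assumed to come from the com.google.cloud.vision.v1 client library.

public static void detectSafeSearch(String filePath, PrintStream out) throws Exception, IOException {
  List<AnnotateImageRequest> requests = new ArrayList<>();

  // Read the local image and wrap it in a SAFE_SEARCH_DETECTION request.
  ByteString imgBytes = ByteString.readFrom(new FileInputStream(filePath));
  Image img = Image.newBuilder().setContent(imgBytes).build();
  Feature feat = Feature.newBuilder().setType(Type.SAFE_SEARCH_DETECTION).build();
  requests.add(AnnotateImageRequest.newBuilder().addFeatures(feat).setImage(img).build());

  try (ImageAnnotatorClient client = ImageAnnotatorClient.create()) {
    for (AnnotateImageResponse res : client.batchAnnotateImages(requests).getResponsesList()) {
      if (res.hasError()) {
        out.printf("Error: %s%n", res.getError().getMessage());
        return;
      }
      // Each field is a likelihood enum ranging from VERY_UNLIKELY to VERY_LIKELY.
      SafeSearchAnnotation annotation = res.getSafeSearchAnnotation();
      out.printf("adult: %s%nviolence: %s%nracy: %s%n",
          annotation.getAdult(), annotation.getViolence(), annotation.getRacy());
    }
  }
}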
3.2 ARCHITECTURE OF THE PROPOSED SYSTEM
Figure 3.2: Working architecture of the proposed system
The architecture of the optical character recognition system on a grid infrastructure consists of three main components:
● Scanner
● OCR hardware or software
● Output interface
3.3 PROBLEM STATEMENT
The problem here is for software systems to recognize characters in a computer system when information is scanned from paper documents, since we have a large number of newspapers and books in printed format related to different subjects. Whenever we scan documents through the scanner, the documents are stored as images, such as JPEG or GIF, in the computer system. These images cannot be read or edited by the user. To reuse this information it is very difficult to read the individual contents and to search the contents of these documents line-by-line and word-by-word. These days there is a huge demand for “storing the information available in these paper documents in a computer storage disk and then later editing or reusing this information through a searching process”.
3.4 MODULES AND THEIR FUNCTIONALITIES
Our software system, Optical Character Recognition on a grid infrastructure, can be divided into five modules based on its functionality. The modules are as follows:
● Document Processing Module
● System Training Module
● Document Recognition Module
● Document Editing Module
● Document Searching Module
3.4.1 DOCUMENT PROCESSING MODULE
This module is accessed by the administrator, whose role in our application is that of a librarian. This module performs activities such as scanning documents, storing them as images, and recognizing the characters in the images to transfer them into word format. During the recognition process, this module uses the OCR methodology supported by the grid infrastructure data structure. The module supports the following services (an illustrative code sketch follows the list):
● Scanning printed documents.
● Storing the documents as snapshots or images.
● Processing those image-based documents.
● Converting these image-based documents into e-documents (also called structured documents).
● Recognizing the characters in the documents.
● Generating the grid infrastructure data structure.
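The sketch below is illustrative only, not the module's actual implementation: it shows how the Google Vision API's DOCUMENT_TEXT_DETECTION feature (a dense-text variant of the TEXT_DETECTION call listed in Section 7.1) could turn a scanned page image into editable text. The method name is hypothetical and the classes are assumed to come from the same com.google.cloud.vision.v1 client library used in Chapter 7.

public static String scannedPageToText(String filePath) throws Exception, IOException {
  List<AnnotateImageRequest> requests = new ArrayList<>();

  // Read the scanned page and request dense, document-style text detection.
  ByteString imgBytes = ByteString.readFrom(new FileInputStream(filePath));
  Image img = Image.newBuilder().setContent(imgBytes).build();
  Feature feat = Feature.newBuilder().setType(Type.DOCUMENT_TEXT_DETECTION).build();
  requests.add(AnnotateImageRequest.newBuilder().addFeatures(feat).setImage(img).build());

  try (ImageAnnotatorClient client = ImageAnnotatorClient.create()) {
    AnnotateImageResponse res = client.batchAnnotateImages(requests).getResponses(0);
    if (res.hasError()) {
      throw new IOException(res.getError().getMessage());
    }
    // The full text annotation carries the whole recognized page as one editable string.
    return res.getFullTextAnnotation().getText();
  }
}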
3.4.2 DOCUMENT RECOGNITION MODULE
This module can be accessed by both the administrator and the end-user. Once the printed documents are converted into structured documents, any user can recognize the characters present in the document. That means the user can recognize the characters of any language he chooses, which makes the OCR more flexible. This flexibility is due to the adoption of the grid infrastructure. This is the module where the main functionality of the OCR is tested.
Under this module, there are two types of recognition: handwritten recognition and scanned document recognition.
In handwritten recognition, the handwriting of the user in any language is taught to the system only the first time. From then onwards, the system recognizes the characters or words written by the user. Thus handwritten document recognition recognizes human handwriting.
In scanned document recognition, the system is first trained with the font characters in the document in the training module itself. In the recognition module, the system then takes the scanned document image as an input file, first crops the image, then extracts and recognizes the characters from the document, and makes the document editable and searchable. Thus scanned document recognition recognizes the characters from the scanned document image and makes the document editable and searchable. Hence the document recognition module as a whole supports the following services:
● Converts the document into a specific format
● Recognizes the characters
● Heterogeneous character recognition
3.4.3 DOCUMENT EDITING MODULE
This module can be accessed by both the administrator and the end-user during document editing to implement the character recognition process. Once the scanned documents are stored, they reside in computer memory. This data resides in the form of an image that is only viewable in an image viewer. Hence, the document is first converted into a form in which it is editable. The desired form of the document may be MS Word, plain text, etc., as specified by the user. The objective of this module is to let the user perform:
● Addition of specific content to the documents
● Deletion of certain content from the documents
● Any other modification of the documents.
3.4.4 DOCUMENT SEARCHING MODULE
This module can be accessed by both the administrator and the end-user when searching for a required document in order to apply the character recognition process to it. The user requests the system to search for a particular document. The system then finds the documents based on the OCR methodology and returns the result of the search to the user. An illustrative sketch of such a keyword search over recognized documents is given below.
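As an illustration only (not the deployed module), the sketch below shows one simple way the searching module could scan a folder of recognized text documents for a user's keyword; the folder of .txt outputs is a hypothetical assumption.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class DocumentSearch {
  // Prints every recognized .txt document under recognizedDocsDir whose text contains the keyword.
  public static void search(Path recognizedDocsDir, String keyword) throws IOException {
    try (Stream<Path> files = Files.walk(recognizedDocsDir)) {
      files.filter(p -> p.toString().endsWith(".txt")).forEach(p -> {
        try {
          String text = new String(Files.readAllBytes(p));
          if (text.toLowerCase().contains(keyword.toLowerCase())) {
            System.out.println("Found \"" + keyword + "\" in " + p);
          }
        } catch (IOException e) {
          System.err.println("Could not read " + p + ": " + e.getMessage());
        }
      });
    }
  }
}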
CHAPTER 4
SYSTEM STUDY
We propose two models which we evaluated in accordance with their test accuracy and number of parameters. Both models were designed with the idea of creating the best accuracy-over-number-of-parameters ratio. Reducing the number of parameters helps us overcome two important problems. First, the use of small CNNs frees us from slow performance in hardware-constrained systems such as robot platforms. Second, the reduction of parameters provides better generalization under an Occam's razor framework. Our first model relies on the idea of completely eliminating the fully connected layers. The second architecture combines the deletion of the fully connected layers with the inclusion of combined depth-wise separable convolutions and residual modules. Both architectures were trained with the ADAM optimizer [8]. Following the previous architecture schemas, our initial architecture used Global Average Pooling to completely remove any fully connected layers. This was achieved by having in the last convolutional layer the same number of feature maps as the number of classes, and applying a softmax activation function to each reduced feature map. Our initial proposed architecture is a standard fully-convolutional neural network composed of 9 convolution layers, ReLUs [5], Batch Normalization [7] and Global Average Pooling. This model contains approximately 600,000 parameters. It was trained on the IMDB gender dataset, which contains 460,723 RGB images where each image belongs to the class “woman” or “man”, and it achieved an accuracy of 96% on this dataset. We also validated this model on the FER-2013 dataset. This dataset contains 35,887 grayscale images where each image belongs to one of the following classes {“angry”, “disgust”, “fear”, “happy”, “sad”, “surprise”, “neutral”}. Our initial model achieved an accuracy of 66% on this dataset. We will refer to this model as “sequential fully-CNN”. Our second model is inspired by the Xception [1] architecture. This architecture combines the use of residual modules [6] and depth-wise separable convolutions [2]. Residual modules modify the desired mapping between two subsequent layers, so that the learned features become the difference of the original feature map and the desired features. Consequently, the desired features H(x) are modified in order to solve an easier learning problem F(x) such that:
H(x) = F(x) + x (1)
Since our initial proposed architecture removed the last fully connected layer, we reduced the number of parameters further by eliminating them from the convolutional layers as well. This was done through the use of depth-wise separable convolutions. Depth-wise separable convolutions are composed of two different layers: depth-wise convolutions and point-wise convolutions. The main purpose of these layers is to separate the spatial cross-correlations from the channel cross-correlations [1]. They do this by first applying a D × D filter on each of the M input channels and then applying N 1 × 1 × M convolution filters to combine the M input channels into N output channels. Applying 1 × 1 × M convolutions combines each value in the feature map without considering their spatial relation within the channel. Depth-wise separable convolutions reduce the computation with respect to standard convolutions by a factor of 1/N + 1/D² [2]. A visualization of the difference between a normal convolution layer and a depth-wise separable convolution can be observed in Figure 4b.
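As a supporting calculation (following the analysis in [2], with the notation of this section: D × D kernels, M input channels, N output channels and a D_F × D_F output feature map), the reduction factor quoted above can be derived as:

cost of a standard convolution = D · D · M · N · D_F · D_F
cost of a depth-wise separable convolution = D · D · M · D_F · D_F + M · N · D_F · D_F
ratio = (D · D · M · D_F · D_F + M · N · D_F · D_F) / (D · D · M · N · D_F · D_F) = 1/N + 1/D²

For example, with 3 × 3 kernels (D = 3) and N = 128 output channels the depth-wise separable convolution needs roughly 1/128 + 1/9 ≈ 0.12 of the multiplications of a standard convolution, i.e. about an 8–9× saving.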
Our final architecture is a fully-convolutional neural network that contains 4 residual depth-wise separable convolutions, where each convolution is followed by a batch normalization operation and a ReLU activation function. The last layer applies global average pooling and a softmax activation function to produce a prediction. This architecture has approximately 60,000 parameters, which corresponds to a reduction of 10× when compared to our initial naive implementation, and 80× when compared to the original CNN. Figure 4a displays our complete final architecture, which we refer to as mini-Xception. This architecture obtains an accuracy of 95% on the gender classification task, which corresponds to a reduction of one percentage point with respect to our initial implementation. Furthermore, we tested this architecture on the FER-2013 dataset and obtained the same accuracy of 66% for the emotion classification task. Our final architecture weights can be stored in an 855-kilobyte file. By reducing our architectures' computational cost we are now able to join both models and use them consecutively on the same image without any serious reduction in speed.
Fig. 4a: Our proposed model for real-time classification.
SOFTWARE DESIGN
4.1 DATA FLOW DIAGRAM
The DFD is also called a bubble chart. A data-flow diagram (DFD) is a graphical representation of the "flow" of data through an information system. DFDs can also be used for the visualization of data processing. The flow of data in our system can be described in the form of a data-flow diagram as follows:
1. Firstly, if the user is the administrator, he can initiate the following actions:
● Document processing
● Document search
● Document editing
All the above actions fall under two cases, described as follows:
a) If the printed document is a new document that has not yet been read into the system, then the document processing phase reads the scanned document as an image only and produces the document image stored in computer memory as a result. The document processing phase then has the document at hand and can read it at any point of time. Later the document processing phase proceeds with recognizing the document using the OCR methodology and the grid infrastructure. Thus it produces the documents with the recognized characters as the final output, which can later be searched and edited by the end-user or administrator.
b) If the printed document is already scanned in and is held in system memory, then the document processing phase proceeds directly with document recognition using the OCR methodology and grid infrastructure, and thus finally produces the document with recognized characters as output.
2. If the user of the OCR system is the end-user, then he can perform the following actions:
● Document searching
● Document editing
Document Searching: The documents which are recognized can be searched by the user whenever required by requesting them from the system database.
Document Editing: The recognized documents can be edited by adding specific content to the document, deleting specific content from the document and modifying the document.
4.2 UML DIAGRAMS
UML combines best techniques from data modeling (entity relationship diagrams), business
modeling (work flows), object modeling, and component modeling. It can be used with all
processes, throughout the software development life cycle, and across different implementation
technologies. UML has 14 types of diagrams divided into two categories. Seven diagram types
represent structural information, and the other seven represent general types of behavior,
including four that represent different aspects of interactions. Some of the diagrams we provide to describe the design and implementation of our OCR system are listed below:
● Use case diagram
● Class diagram
● Sequence diagram
● Collaboration diagram
● Activity diagram
● Component diagram
● Deployment diagram
4.2.1 USE-CASE DIAGRAMS
Our software system can be used to support a library environment to create a digital library where several paper documents are converted into electronic form for access by users. For this purpose the printed documents must be recognized before they are converted into electronic form. The resulting electronic documents are accessed by users like faculty and students for reading and editing. According to this information, the following are the different actors involved in implementing our OCR system:
● For a virtual digital library, the Administrator can be the Librarian and the End-users can be Students and/or Faculty.
● The following is the list of use-case diagrams that together form the complete or overall use-case diagram:
1. Use-case diagram for document processing
2. Use-case diagram for neural network training
3. Use-case diagram for document recognition
4. Use-case diagram for document editing
5. Use-case diagram for document searching
For each of the use-case diagrams below we explain that particular use-case functionality, providing a description of the:
● Use-case name
● Details about the use-case
● Actors using this use-case
Use case Name
Neural Network Training
Description
The Administrator or End-user enters the specific characters required for training. The user stores them as an image file and trains the system.
Actors
○ Primary Actor : Administrator or End-user
○ Secondary Actor : User
Flow of Events
1. The user enters the specific characters in order to train the system.
2. After entering, they are stored as an image file.
3. Finally, the system is trained accordingly.
Pre-Condition
The font in the scanned document should be identified.
Figure 4.2.1: Use-Case Diagram for Document Editing (actor: Administrator or End-user; steps: open document in editor, select edit action, perform editing, store edited document)
Use case Name
Document Searching
Actors
○ Primary Actor : Administrator or End-user
○ Secondary Actor : User
Flow of Events
1. The user opens the document to search for a word he requires.
2. After opening the document he enters the word to be searched.
3. Finally, the word is searched for in that document.
Pre- and Post-Conditions
No pre-condition and no post-condition.
Overall Use-Case Diagram
4.2.2 CLASS DIAGRAMS
The class diagram is the main building block in object-oriented modeling. The classes in a class diagram represent both the main objects and interactions in the application and the objects to be programmed.
The class diagram of our OCR system consists of 9 classes. They are:
1. MainScreen
2. Editor
3. HelpFrame
4. Document
5. HEntry
6. Entry
7. TrainingSet
8. KohonenNetwork
9. PrintedFrame.
Among all these classes the MainScreen is the main class that represents all the major
functions carried out by our OCR system. The MainScreen class has an association with
five classes, viz. Editor, HelpFrame, Document, TrainingSet and PrintedFrame. The TrainingSet class in turn has an association with the HEntry and KohonenNetwork classes. The PrintedFrame class has an association with the Entry and KohonenNetwork classes.
Figure 4.2.2: Class Diagram
4.2.3 SEQUENCE DIAGRAMS
Sequence diagrams are sometimes called Event-trace diagrams, event scenarios, and timing
diagrams. A sequence diagram shows, as parallel vertical lines (lifelines), different processes or
objects that live simultaneously, and, as horizontal arrows, the messages exchanged between them,
in the order in which they occur. This allows the specification of simple runtime scenarios in a
graphical manner.
In a sequence diagram, the class objects used to describe the interaction between various classes vary from one function to another. There are five sequence diagrams short-listed
below for presenting the sequence of actions performed by each of the five modules. The key
class object involved in all of these module functions is MainScreen class which controls the
interaction among various class objects.
Sequence Diagram for Document Processing
1. Objects
Administrator - “a”
MainScreen - “m”
Document - “d”
SystemMemory - “s”
2. Links
1. Administrator object to MainScreen object.
2. MainScreen object to Document object.
3. Document object to SystemMemory object.
4. SystemMemory object to Administrator object.
3. Messages
1. Process documents
2. Scan documents
3. Scans
4. Stores documents
5. Stores
6. Returns the processed documents
7. Returns
8. End
9. Processed Document
Figure 4.2.3: Sequence Diagram for Processing

Sequence Diagram for Neural Network Training
1. Objects
Administrator - “a”
System - “s”
TrainingSet - “t”
2. Links
1. Administrator object to System object
2. System object to TrainingSet object
3. TrainingSet object to System object
4. System object to Administrator object
3. Messages
1. Specifies the font characters
2. Stores it as an image
3. Trains the system with new font
4. The system recognizes the new font and returns it to the user
Figure 4.2.3.a: Sequence Diagram for Training
Sequence Diagram for Document Recognition
1. Objects
Administrator - “a”
MainScreen - “m”
SystemMemory - “s”
TrainingSet - “t”
2. Links
1. Administrator object to MainScreen object
2. MainScreen object to SystemMemory object
3. SystemMemory object to MainScreen object
4. TrainingSet object to MainScreen object
5. MainScreen object to Administrator object
3. Messages
1. Recognize documents
2. Store processed document
3. Read file image
4. Recognize using ocr
5. Send processed document
6. Recognize the characters
Figure 4.2.4: Sequence Diagram for Recognition
Sequence Diagram for Document Editing
1. Objects
Administrator - “a”
MainScreen - “m”
Document - “d”
SystemMemory - “s”
2. Links
1. Administrator object to MainScreen object.
2. MainScreen object to Document object.
3. MainScreen object to Document object
4. MainScreen object to Document object
5. Document object to SystemMemory object.
6. SystemMemory object to Administrator object.
3. Messages
1. Edit document
2. Adding document
3. Adds
4. Deleting document
5. Modifying document
6. Modifies
7. Stores the edited documents
8. Administrator accesses the edited documents
Figure 4.2.5: Sequence Diagram for Editing
CHAPTER 5
MODULES
5.1 Main Modules:
● Face Detection
● Feature Extraction
● Recognition
5.2 Face Detection:
The Viola-Jones object detection framework is the first and one of the most mature frameworks to provide competitive object detection rates in real time. Face detection is treated as a binary classification problem, implemented with an AdaBoost classifier over Haar-like features. An illustrative sketch of this detector using OpenCV is given below.
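The sketch below is illustrative only (our project itself relies on the Google Vision API for face detection): it shows the Viola-Jones detector as exposed by OpenCV's Java bindings, which we assume are available; the cascade XML path and image name are placeholders.

import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.MatOfRect;
import org.opencv.core.Rect;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.objdetect.CascadeClassifier;

public class ViolaJonesDemo {
  public static void main(String[] args) {
    System.loadLibrary(Core.NATIVE_LIBRARY_NAME); // load the native OpenCV library

    // A pre-trained cascade of Haar-like features selected by AdaBoost.
    CascadeClassifier detector = new CascadeClassifier("haarcascade_frontalface_default.xml");
    Mat image = Imgcodecs.imread("group_photo.jpg");

    MatOfRect faces = new MatOfRect();
    detector.detectMultiScale(image, faces); // runs the cascaded classifier at multiple scales

    for (Rect face : faces.toArray()) {
      System.out.println("Face at (" + face.x + ", " + face.y + ") size "
          + face.width + "x" + face.height);
    }
  }
}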
5.3 Feature Extraction:
One possible classification divides the feature extraction methods into holistic methods and local feature-based methods. In the first approach the whole face image is used as the input to the recognition operation, as in the well-known PCA-based method introduced by Kirby and Sirovich and followed by Turk and Pentland. In the second approach local features are extracted; for example, the location and local statistics of the eyes, nose and mouth are used in the recognition task.
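For concreteness, the holistic PCA (eigenfaces) approach mentioned above can be summarized by the standard projection below (our notation, not taken from this report):

y = W^T (x − μ)

where x is the vectorized face image, μ is the mean training face, the columns of W are the leading eigenvectors (eigenfaces) of the training covariance matrix, and y is the resulting low-dimensional feature vector. A probe face is then recognized by finding the nearest stored y in this subspace.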
5.4 Recognition :
The facial recognition module is used to automatically identify people from their video images. It recognizes faces captured by the Axxon facial detection tool by comparing their parameters with digital templates stored in a dedicated database.
CHAPTER 6
SYSTEM TESTING
Results of the real-time emotion classification task on unseen faces can be observed in Figure 8(a). Our complete real-time pipeline, including face detection, emotion classification and gender classification, has been fully integrated in our Care-O-bot 3 robot. An example of our complete pipeline can be seen in Figure 8(b), in which we provide emotion and gender classification. In Figure 7 we provide the confusion matrix results of our emotion classification mini-Xception model. We can observe several common misclassifications, such as predicting “sad” instead of “fear” and predicting “angry” instead of “disgust”. A comparison of the learned features between several emotions and both of our proposed models can be observed in Figure 8(c). The white areas in Figure 8(d) correspond to the pixel values that activate a selected neuron in our last convolution layer. The selected neuron was always chosen in accordance with the highest activation. We can observe that the CNN learned to be activated by features such as the frown, the teeth, the eyebrows and the widening of one's eyes, and that each feature remains constant within the same class. These results reassure us that the CNN learned to interpret understandable human-like features that provide generalizable elements. These interpretable results have helped us understand several common misclassifications, such as persons with glasses being classified as “angry”. This happens because the label “angry” is highly activated when the network believes a person is frowning, and frowning features get confused with darker glass frames. Moreover, we can also observe that the features learned in our mini-Xception model are more interpretable than the ones learned from our sequential fully-CNN. Consequently, the use of more parameters in our naive implementation leads to less robust features.
Fig. 6.1: Normalized confusion matrix of our mini-Xception network.
Fig. 6.2: Results of the real-time emotion classification demo provided in our public repository
CHAPTER 7
Source Code
7.1 TEXT RECOGNITION
// Classes below come from com.google.cloud.vision.v1 and com.google.protobuf (ByteString).
public static void detectText(String filePath, PrintStream out) throws Exception, IOException {
  List<AnnotateImageRequest> requests = new ArrayList<>();

  // Read the local image and wrap it in a TEXT_DETECTION request.
  ByteString imgBytes = ByteString.readFrom(new FileInputStream(filePath));
  Image img = Image.newBuilder().setContent(imgBytes).build();
  Feature feat = Feature.newBuilder().setType(Type.TEXT_DETECTION).build();
  AnnotateImageRequest request =
      AnnotateImageRequest.newBuilder().addFeatures(feat).setImage(img).build();
  requests.add(request);

  try (ImageAnnotatorClient client = ImageAnnotatorClient.create()) {
    BatchAnnotateImagesResponse response = client.batchAnnotateImages(requests);
    List<AnnotateImageResponse> responses = response.getResponsesList();

    for (AnnotateImageResponse res : responses) {
      if (res.hasError()) {
        out.printf("Error: %s%n", res.getError().getMessage());
        return;
      }
      // For the full list of available annotations, see http://g.co/cloud/vision/docs
      for (EntityAnnotation annotation : res.getTextAnnotationsList()) {
        out.printf("Text: %s%n", annotation.getDescription());
        out.printf("Position : %s%n", annotation.getBoundingPoly());
      }
    }
  }
}
7.2 FACE RECOGNITION
public static void detectFaces(String filePath, PrintStream out) throws Exception, IOException {
  List<AnnotateImageRequest> requests = new ArrayList<>();

  // Read the local image and wrap it in a FACE_DETECTION request.
  ByteString imgBytes = ByteString.readFrom(new FileInputStream(filePath));
  Image img = Image.newBuilder().setContent(imgBytes).build();
  Feature feat = Feature.newBuilder().setType(Type.FACE_DETECTION).build();
  AnnotateImageRequest request =
      AnnotateImageRequest.newBuilder().addFeatures(feat).setImage(img).build();
  requests.add(request);

  try (ImageAnnotatorClient client = ImageAnnotatorClient.create()) {
    BatchAnnotateImagesResponse response = client.batchAnnotateImages(requests);
    List<AnnotateImageResponse> responses = response.getResponsesList();

    for (AnnotateImageResponse res : responses) {
      if (res.hasError()) {
        out.printf("Error: %s%n", res.getError().getMessage());
        return;
      }
      // For the full list of available annotations, see http://g.co/cloud/vision/docs
      for (FaceAnnotation annotation : res.getFaceAnnotationsList()) {
        out.printf(
            "anger: %s%njoy: %s%nsurprise: %s%nposition: %s%n",
            annotation.getAngerLikelihood(),
            annotation.getJoyLikelihood(),
            annotation.getSurpriseLikelihood(),
            annotation.getBoundingPoly());
      }
    }
  }
}
7.3 LANDMARK DETECTION
public static void detectLandmarks(String filePath, PrintStream out) throws Exception, IOException {
  List<AnnotateImageRequest> requests = new ArrayList<>();

  // Read the local image and wrap it in a LANDMARK_DETECTION request.
  ByteString imgBytes = ByteString.readFrom(new FileInputStream(filePath));
  Image img = Image.newBuilder().setContent(imgBytes).build();
  Feature feat = Feature.newBuilder().setType(Type.LANDMARK_DETECTION).build();
  AnnotateImageRequest request =
      AnnotateImageRequest.newBuilder().addFeatures(feat).setImage(img).build();
  requests.add(request);

  try (ImageAnnotatorClient client = ImageAnnotatorClient.create()) {
    BatchAnnotateImagesResponse response = client.batchAnnotateImages(requests);
    List<AnnotateImageResponse> responses = response.getResponsesList();

    for (AnnotateImageResponse res : responses) {
      if (res.hasError()) {
        out.printf("Error: %s%n", res.getError().getMessage());
        return;
      }
      // For the full list of available annotations, see http://g.co/cloud/vision/docs
      for (EntityAnnotation annotation : res.getLandmarkAnnotationsList()) {
        LocationInfo info = annotation.getLocationsList().listIterator().next();
        out.printf("Landmark: %s%n %s%n", annotation.getDescription(), info.getLatLng());
      }
    }
  }
}
7.4 LABEL DETECTION
public static void detectLabelsGcs(String gcsPath, PrintStream out) throws Exception, IOException {
  List<AnnotateImageRequest> requests = new ArrayList<>();

  // The image is read from a Cloud Storage URI (gs://...) instead of a local file.
  ImageSource imgSource = ImageSource.newBuilder().setGcsImageUri(gcsPath).build();
  Image img = Image.newBuilder().setSource(imgSource).build();
  Feature feat = Feature.newBuilder().setType(Type.LABEL_DETECTION).build();
  AnnotateImageRequest request =
      AnnotateImageRequest.newBuilder().addFeatures(feat).setImage(img).build();
  requests.add(request);

  try (ImageAnnotatorClient client = ImageAnnotatorClient.create()) {
    BatchAnnotateImagesResponse response = client.batchAnnotateImages(requests);
    List<AnnotateImageResponse> responses = response.getResponsesList();

    for (AnnotateImageResponse res : responses) {
      if (res.hasError()) {
        out.printf("Error: %s%n", res.getError().getMessage());
        return;
      }
      // For the full list of available annotations, see http://g.co/cloud/vision/docs
      for (EntityAnnotation annotation : res.getLabelAnnotationsList()) {
        annotation.getAllFields().forEach((k, v) -> out.printf("%s : %s%n", k, v.toString()));
      }
    }
  }
}
CHAPTER 8
SCREENSHOTS
CHAPTER 9
CONCLUSION
We have proposed and tested a general building design for creating real-time CNNs. Our proposed architectures have been systematically built in order to reduce the number of parameters. We began by completely eliminating the fully connected layers and by reducing the number of parameters in the remaining convolutional layers via depth-wise separable convolutions. We have shown that our proposed models can be stacked for multi-class classification while maintaining real-time inference. Specifically, we have developed a vision system that performs face detection, gender classification and emotion classification in a single integrated module. We have achieved human-level performance in our classification tasks using a single CNN that leverages modern architecture constructs. Our architecture reduces the number of parameters 80× while obtaining favorable results. Our complete pipeline has been successfully integrated in a Care-O-bot 3 robot. Finally, we presented a visualization of the features learned by the CNN using guided back-propagation. This visualization technique is able to show us the high-level features learned by our models and helps us discuss their interpretability.
CHAPTER 10
REFERENCES
1. François Chollet. Xception: Deep learning with depthwise separable convolutions. CoRR, abs/1610.02357, 2016.
2. Andrew G. Howard et al. MobileNets: Efficient convolutional neural networks for mobile vision applications. CoRR, abs/1704.04861, 2017.
3. Dario Amodei et al. Deep Speech 2: End-to-end speech recognition in English and Mandarin. CoRR, abs/1512.02595, 2015.
4. Ian Goodfellow et al. Challenges in Representation Learning: A report on three machine learning contests, 2013.
5. Xavier Glorot, Antoine Bordes, and Yoshua Bengio. Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pages 315–323, 2011.
6. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
7. Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning, pages 448–456, 2015.
8. Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014.
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineeringmalavadedarshan25
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxDeepakSakkari2
 

Recently uploaded (20)

OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
POWER SYSTEMS-1 Complete notes examples
POWER SYSTEMS-1 Complete notes  examplesPOWER SYSTEMS-1 Complete notes  examples
POWER SYSTEMS-1 Complete notes examples
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
Introduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxIntroduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptx
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptx
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptx
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineering
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
 

Image recognition

  • 7. 1 ABSTRACT In this paper we propose an implement a general convolutional neural network (CNN) building framework for designing real-time CNNs. We validate our models by creating a real- time vision system which accomplishes the tasks of face detection, gender classification and emotion classification simultaneously in one blended step using our proposed CNN architecture. After presenting the details of the training procedure setup we proceed to evaluate on standard benchmark sets. We report accuracies of 96% in the IMDB gender dataset and 66% in the FER-2013 emotion dataset. Along with this we also introduced the very recent real-time enabled guided backpropagation visualization technique. Guided back- propagation uncovers the dynamics of the weight changes and evaluates the learned features. We argue that the careful implementation of modern CNN architectures, the use of the current regularization methods and the visualization of previously hidden features are necessary in order to reduce the gap between slow performances and real-time architectures.All our code, demos and pretrained architectures have been released under an open-source license in our public repository.
  • 8. 2 CHAPTER-1 INTRODUCTION
In today's world there is a growing demand for software systems that can recognize images, for example when information is scanned and analysed through the Google Vision API. Image recognition, in the context of machine vision, is the ability of software to identify objects, places, people, writing and actions in images. Computers can use machine vision technologies in combination with a camera and artificial intelligence software to achieve image recognition. While human and animal brains recognize objects with ease, computers find the task difficult, and specialised software is used to overcome this gap. Current and future applications of image recognition include smart photo libraries, targeted advertising, interactive media, accessibility for the visually impaired and enhanced research capabilities. Google, Facebook, Microsoft and Apple are some of the major companies that use this technology. Facebook can now perform face recognition at 98% accuracy, which is comparable to human ability, and can identify a friend's face from only a few tagged pictures.
The efficacy of this technology depends on the ability to classify images. Classification is pattern matching with data; images are data in the form of 2-dimensional matrices, and image recognition is in fact the task of classifying data into one category out of many. One common and important example is optical character recognition (OCR), which converts images of typed or handwritten text into machine-encoded text. The major steps in the image recognition process are gathering and organizing data, building a predictive model, and using it to recognize images. Furthermore, human accuracy for classifying an image of a face into one of 7 different emotions is 65% ± 5%. One can observe the difficulty of this task by trying to manually classify the FER-2013 dataset images within the following classes {"angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"}.
Gender classification was first perceived as an issue in psychophysical studies; it focuses on the effort of understanding human visual processing and identifying the key features used to categorize between male and female individuals [1]. Research has shown that the disparity between facial masculinity and femininity can be utilized to improve the performance of face recognition applications in biometrics, human-computer interaction, surveillance, and computer vision. However, in a real-world environment, the challenge is how to deal with the facial image being affected by variance in factors such as illumination, pose, facial expression, occlusion, background information, and noise, which depends on the type of classifier chosen, which is in turn dependent on the feature extraction method applied.
  • 9. 3 It is difficult to find a classifier that combines best with the chosen feature extractor such that optimal classification performance is achieved, and any change to the problem domain requires a complete redesign of the system. The convolutional neural network (CNN) is a neural network variant that consists of a number of convolutional layers alternating with subsampling layers and ends with one or more fully connected layers, as in the standard multilayer perceptron (MLP). A significant advantage of the CNN over conventional approaches in pattern recognition is its ability to simultaneously extract features, reduce data dimensionality, and classify in one network structure. Such a structure, as illustrated in Figure 1, can boost recognition accuracy efficiently and cost-effectively. This is also the challenge in the development of a robust face-based gender classification system that has both high classification accuracy and real-time performance. The conventional approach applied in face recognition, including face-based gender recognition, typically involves the stages of image acquisition and processing, dimensionality reduction, feature extraction, and classification, in that order. Prior knowledge of the application domain is required to determine the best feature extractor to design.
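To make the structure described above concrete, the sketch below builds such a network (convolution layers alternating with subsampling layers, followed by a fully connected layer and a softmax output) with the open-source Deeplearning4j library. Deeplearning4j is not used in this project, and the layer sizes, the 48x48 single-channel input and the 7-class output are illustrative assumptions only.

import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.inputs.InputType;
import org.deeplearning4j.nn.conf.layers.ConvolutionLayer;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.conf.layers.SubsamplingLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.learning.config.Adam;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class SimpleCnnSketch {
    public static MultiLayerNetwork build() {
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .updater(new Adam(1e-3))
                .list()
                // convolution layer followed by a subsampling (max-pooling) layer
                .layer(0, new ConvolutionLayer.Builder(3, 3).nIn(1).nOut(8)
                        .activation(Activation.RELU).build())
                .layer(1, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
                        .kernelSize(2, 2).stride(2, 2).build())
                // a second convolution/subsampling pair
                .layer(2, new ConvolutionLayer.Builder(3, 3).nOut(16)
                        .activation(Activation.RELU).build())
                .layer(3, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
                        .kernelSize(2, 2).stride(2, 2).build())
                // fully connected layer and softmax classifier, as in a standard MLP
                .layer(4, new DenseLayer.Builder().nOut(64).activation(Activation.RELU).build())
                .layer(5, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                        .nOut(7).activation(Activation.SOFTMAX).build())
                .setInputType(InputType.convolutionalFlat(48, 48, 1))
                .build();
        MultiLayerNetwork net = new MultiLayerNetwork(conf);
        net.init();
        return net;
    }
}

Feature extraction, dimensionality reduction and classification all happen inside this one network structure, which is the advantage over the conventional pipeline described above.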
  • 10. 4 CHAPTER-2 ABOUT PROJECT
2.1 PURPOSE
The main purpose of an image recognition system based on a grid infrastructure is to perform image analysis and the processing of electronic documents converted from paper formats more effectively and efficiently. This improves the accuracy of character recognition during document processing compared with various existing character recognition methods. The primary objective is to speed up character recognition in document processing, so that the system can process a huge number of documents in less time. The application can also be used as a quick search engine that does not require lengthy typing to search for anything, which speeds up our work. Since our image recognition is based on a grid infrastructure, it aims to recognize multiple images and characters that belong to different universal languages with different properties, font properties and alignments.
2.2 PROJECT SCOPE
The scope of our project, Image Recognition on a grid infrastructure, is to provide an efficient and enhanced software tool for users to perform document image analysis and document processing by reading and recognizing the characters in research, academic, governmental and business organizations that hold large pools of documented, scanned images. Irrespective of the size of the documents and the type of characters they contain, the product recognizes, searches and processes them quickly according to the needs of the environment.
2.3 EXISTING SYSTEM
Today there is a growing demand from users to convert images and printed documents in order to identify and process the content within them. The Google Lens system was invented to convert the images and data available on paper into computer-processable documents and images. Google Lens is an image recognition mobile app developed by Google. First announced during Google I/O 2017, it is designed to bring up relevant information using visual analysis. When the phone's camera is directed at an object, Google Lens attempts to identify the object or read labels and text and show relevant search results and information. For example, when pointing the device's camera at a Wi-Fi label containing the network name and password, it will automatically connect to the Wi-Fi source that has been
  • 11. 5 scanned. Lens is also integrated with the Google Photos and Google Assistant apps. The service is similar to Google Goggles, a previous app that functioned similarly but with less capability. Lens uses more advanced deep learning routines, similar to apps such as Bixby Vision (for Samsung devices released in 2016 and after) and Image Analysis Toolset (available on Google Play); artificial neural networks are used to detect and identify objects and landmarks and to improve optical character recognition (OCR) accuracy.
2.4 DRAWBACK OF EXISTING SYSTEM
The drawbacks of Google Lens include limited device support, although it is not clear which devices are not supported or why. It requires Android Marshmallow (6.0) or newer, and it is also not available in India.
2.5 PROPOSED SYSTEM
Our proposed system is an image and character recognition system built on a grid infrastructure. This grid infrastructure eliminates the problem of heterogeneous character recognition and supports multiple functionalities to be performed on documents and images.
2.6 BENEFIT OF PROPOSED SYSTEM
The benefit of the proposed system, which overcomes the drawback of the existing system, is that it supports a mobile application. Image recognition with the Google Vision API and Google Lens identifies famous personalities, animals, and the actions being performed within the image. Text understanding and text retrieval are used to extract images and text from the surroundings and identify them with the help of Google Lens and the Vision API.
  • 12. 6 2.7 SYSTEM SPECIFICATIONS
Hardware requirements:
1. System: Snapdragon 410
2. Hard disk: 8 GB
3. Floppy drive: Not required
4. Monitor: Any
5. RAM: 512 MB
Software requirements:
● Operating system: Android
● Coding language: Java & XML
● Database: Not required
● API level: 21
● SDK version: 3.2.1
  • 13. 7 CHAPTER 3
3.1 TOOLS AND TECHNOLOGY
3.1.1 Android Studio
Android Studio is the official integrated development environment (IDE) for Google's Android operating system, built on JetBrains' IntelliJ IDEA software and designed specifically for Android development. It is available for download on Windows, macOS and Linux based operating systems. It is a replacement for the Eclipse Android Development Tools (ADT) as the primary IDE for native Android application development. Android Studio was announced on May 16, 2013 at the Google I/O conference. It was in early access preview stage starting from version 0.1 in May 2013, then entered beta stage starting from version 0.8, which was released in June 2014. The first stable build was released in December 2014, starting from version 1.0. The current stable version is 3.2.1, which was released in October 2018.
Features
● Gradle-based build support
● Android-specific refactoring and quick fixes
● Lint tools to catch performance, usability, version compatibility and other problems
● ProGuard integration and app-signing capabilities
● Template-based wizards to create common Android designs and components
● A rich layout editor that allows users to drag-and-drop UI components, with the option to preview layouts on multiple screen configurations
● Support for building Android Wear apps
● Built-in support for Google Cloud Platform, enabling integration with Firebase Cloud Messaging (earlier Google Cloud Messaging)
  • 14. 8 ● Android Virtual Device (Emulator) to run and debug apps inside Android Studio
Android Studio supports the same programming languages as IntelliJ and CLion, e.g. Java and C++; Android Studio 3.0 and later supports Kotlin, all Java 7 language features, and a subset of Java 8 language features that varies by platform version. External projects backport some Java 9 features.
3.1.2 Google Vision API
The Google Vision API allows developers to easily integrate vision detection features within applications, including image labeling, face and landmark detection, optical character recognition (OCR), and tagging of explicit content. Cloud AutoML Vision additionally enables you to create a custom machine learning model for image labeling.
How is the Google Vision API used? One of the ways your code can "see" is with the Google Vision API, which connects your code to Google's image recognition capabilities. You can think of it as a kind of API/REST interface to images.google.com, but it does much more than show you similar images. Google Vision can detect whether you're a cat or a human, as well as the parts of your face. It tries to detect whether you're posed or doing something that wouldn't be okay for Google Safe Search, and it even tries to detect if you're happy or sad.
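As a small illustration of how the Vision API is called from Java, the sketch below sends one local image for label detection and prints each label with its confidence score. It assumes the google-cloud-vision client library is on the classpath and that the GOOGLE_APPLICATION_CREDENTIALS environment variable points to a valid service account key; the image file name is only a placeholder. The full set of request/response helpers used by this project is listed in Chapter 7.

import com.google.cloud.vision.v1.*;
import com.google.protobuf.ByteString;
import java.io.FileInputStream;
import java.util.ArrayList;
import java.util.List;

public class LabelExample {
    public static void main(String[] args) throws Exception {
        // Read a local image file and wrap it in a Vision API Image object
        ByteString imgBytes = ByteString.readFrom(new FileInputStream("sample.jpg"));
        Image img = Image.newBuilder().setContent(imgBytes).build();
        Feature feat = Feature.newBuilder().setType(Feature.Type.LABEL_DETECTION).build();
        AnnotateImageRequest request =
                AnnotateImageRequest.newBuilder().addFeatures(feat).setImage(img).build();
        List<AnnotateImageRequest> requests = new ArrayList<>();
        requests.add(request);
        // Send the request and print each detected label with its score
        try (ImageAnnotatorClient client = ImageAnnotatorClient.create()) {
            BatchAnnotateImagesResponse response = client.batchAnnotateImages(requests);
            for (AnnotateImageResponse res : response.getResponsesList()) {
                if (res.hasError()) {
                    System.out.println("Error: " + res.getError().getMessage());
                    continue;
                }
                for (EntityAnnotation label : res.getLabelAnnotationsList()) {
                    System.out.printf("%s (score %.2f)%n", label.getDescription(), label.getScore());
                }
            }
        }
    }
}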
  • 15. 9 3.2 ARCHITECTURE OF THE PROPOSED SYSTEM Working architecture of Proposed System
  • 16. 10 The architecture of the optical character recognition system on a grid infrastructure consists of three main components:
● Scanner
● OCR hardware or software
● Output interface
3.3 PROBLEM STATEMENT
The problem is for software systems to recognize characters on a computer when information is scanned in from paper documents. We have a large number of newspapers and books in printed format on different subjects. Whenever we scan these documents through the scanner, they are stored as images such as JPEG, GIF, etc., in the
  • 17. 11 computer system. These images cannot be read or edited by the user, so to reuse this information it is very difficult to read the individual contents and to search the contents of these documents line by line and word by word. These days there is a huge demand for storing the information available in these paper documents on a computer storage disk and then later editing or reusing this information through a searching process.
3.4 MODULES AND THEIR FUNCTIONALITIES
Our software system, Optical Character Recognition on a grid infrastructure, can be divided into five modules based on its functionality. The modules are as follows:
● Document Processing Module
● System Training Module
● Document Recognition Module
● Document Editing Module
● Document Searching Module
3.4.1 DOCUMENT PROCESSING MODULE
This module is accessed by the administrator, whose role in our application is a librarian. The module performs activities such as scanning documents, storing them as images, and recognizing the characters in the images to transfer them into word format. During the recognition process, this module uses the OCR methodology supported by the grid infrastructure data structure. The module supports the following services:
● Scanning printed documents
● Storing the documents as snapshots or images
● Processing those image-based documents
● Converting these image-based documents into e-documents (also called structured documents)
● Recognizing the characters in documents
● Generating the grid infrastructure data structure
  • 18. 12 3.4.2 DOCUMENT RECOGNITION MODULE
This module can be accessed by both the administrator and the end-user. Once the printed documents are converted into structured documents, any user can recognize the characters present in the document. That means the user can recognize the characters of any language he chooses, which makes OCR more flexible. This flexibility is due to the adaptation of the grid infrastructure. This is the module where the main functionality of OCR is tested. Under this module there are two types of recognition: handwritten recognition and scanned document recognition. In handwritten recognition, the handwriting of the user in any language is trained into the system only the first time; from then onwards, the system recognizes the characters or words written by the user. Thus handwritten document recognition recognizes human handwriting. In scanned document recognition, the system is first trained with the font characters of the document in the training module itself. In the recognition module, the system takes the scanned document image as an input file, first crops the image, then extracts/recognizes the characters from the document, and makes the document editable and searchable. Thus scanned document recognition recognizes the characters from the scanned document image and makes the document editable and searchable. Hence the document recognition module as a whole supports the following services:
● Converting the document into a specific format
● Recognizing the characters
● Heterogeneous character recognition
3.4.3 DOCUMENT EDITING MODULE
This module can be accessed by both the administrator and the end-user during document editing to implement the character recognition process. Once the scanned documents are stored, they reside in computer memory. This data resides in the form of an image that is only viewable in an image viewer. Hence, the document is first converted into a form in which it is editable. The desired form of the document may be MS-Word, Text, … as specified by the user. The objective of this module is to let the user perform:
  • 19. 13 ● Addition of specific content to the documents
● Deletion of certain content from documents
● Any other modification of documents
3.4.4 DOCUMENT SEARCHING MODULE
This module can be accessed by both the administrator and the end-user when searching for a required document in order to apply the character recognition process to it. The user requests the system to search for a particular document; the system then finds the documents based on the OCR methodology and returns the result of the search to the user.
  • 20. 14 CHAPTER 4 SYSTEM STUDY
We propose two models, which we evaluated according to their test accuracy and number of parameters. Both models were designed with the idea of obtaining the best accuracy-to-parameter-count ratio. Reducing the number of parameters helps us overcome two important problems. First, the use of small CNNs frees us from slow performance in hardware-constrained systems such as robot platforms; second, the reduction of parameters provides better generalization under an Occam's razor framework. Our first model relies on the idea of completely eliminating the fully connected layers. The second architecture combines the deletion of the fully connected layers with the inclusion of depth-wise separable convolutions and residual modules. Both architectures were trained with the ADAM optimizer [8].
Following the previous architecture schemas, our initial architecture used Global Average Pooling to completely remove any fully connected layers. This was achieved by having, in the last convolutional layer, the same number of feature maps as the number of classes, and applying a softmax activation function to each reduced feature map. Our initial proposed architecture is a standard fully-convolutional neural network composed of 9 convolution layers, ReLUs [5], Batch Normalization [7] and Global Average Pooling. This model contains approximately 600,000 parameters. It was trained on the IMDB gender dataset, which contains 460,723 RGB images where each image belongs to the class "woman" or "man", and it achieved an accuracy of 96% on this dataset. We also validated this model on the FER-2013 dataset, which contains 35,887 grayscale images where each image belongs to one of the following classes {"angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"}; our initial model achieved an accuracy of 66% on this dataset. We will refer to this model as the "sequential fully-CNN".
Our second model is inspired by the Xception [1] architecture, which combines the use of residual modules [6] and depth-wise separable convolutions [2]. Residual modules modify the desired mapping between two subsequent layers, so that the learned features become the difference between the original feature map and the desired features. Consequently, the desired features H(x) are modified in order to solve an easier learning problem F(x) such that:
H(x) = F(x) + x (1)
Since our initial proposed architecture deleted the last fully connected layer, we further reduced the number of parameters by eliminating them from the convolutional layers as well. This was done through the use of depth-wise separable convolutions. Depth-wise separable convolutions are composed of two different layers: depth-wise convolutions and point-wise convolutions. The main purpose of these layers is to separate the spatial cross-correlations from the channel cross-correlations [1]. They do this by first applying a D × D filter on every one of the M input channels and then applying N 1 × 1 × M convolution filters to combine the M input channels into N output channels. Applying 1 × 1 × M convolutions combines each value in the feature map without considering their spatial relation within the channel. Depth-wise separable convolutions reduce the computation with respect to standard convolutions by a factor of 1/N + 1/D² [2]. A visualization of the difference between a normal convolution layer and a depth-wise separable convolution can be observed in Figure 4b.
Our final architecture is a fully-convolutional neural network that contains 4 residual depth-wise separable convolutions, where each convolution is followed by a batch normalization operation and a ReLU activation function. The last layer applies global average pooling and a soft-max activation function to produce a prediction. This architecture has approximately 60,000 parameters, which corresponds to a reduction of 10× when compared to our initial naive implementation, and 80× when compared to the original CNN. Figure 4a displays our complete final architecture, which we refer to as mini-Xception. This architecture obtains an accuracy of 95% on the gender classification task, which corresponds to a reduction of one percent with respect to our initial implementation. Furthermore, we tested this architecture on the FER-2013 dataset and obtained the same accuracy of 66% for the emotion classification task. Our final architecture weights can be stored in an 855-kilobyte file. By reducing our architecture's computational cost we are now able to join both models and use them consecutively on the same image without seriously affecting the processing time.
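To make the saving explicit, the following worked comparison (a derivation added here, not taken from the report) counts the multiply operations of a standard convolution against those of a depth-wise separable convolution for a D × D kernel, M input channels, N output channels and a W × H output feature map:

\begin{aligned}
\text{standard convolution:}\quad & D^{2}\,M\,N\,W\,H \\
\text{depth-wise + point-wise:}\quad & D^{2}\,M\,W\,H + M\,N\,W\,H \\
\text{ratio:}\quad & \frac{D^{2}MWH + MNWH}{D^{2}MNWH} = \frac{1}{N} + \frac{1}{D^{2}}
\end{aligned}

For example, with a 3 × 3 kernel (D = 3) and N = 64 output channels, the separable version needs roughly 1/64 + 1/9 ≈ 0.13 of the multiplications of the standard convolution, i.e. about an 8× saving.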
  • 22. 16 Fig. 4a: Our proposed model for real-time classification.
  • 23. 17 SOFTWARE DESIGN
4.1 DATA FLOW DIAGRAM
The DFD is also called a bubble chart. A data-flow diagram (DFD) is a graphical representation of the "flow" of data through an information system. DFDs can also be used for the visualization of data processing. The flow of data in our system can be described in the form of a data flow diagram as follows:
1. If the user is the administrator, he can initiate the following actions:
● Document processing
● Document search
● Document editing
All the above actions fall under two cases, described as follows:
a) If the printed document is a new document that has not yet been read into the system, the document processing phase reads the scanned document as an image only and produces the document image stored in computer memory as a result. The document processing phase now has the document at hand and can read it at any point of time. Later the document processing phase proceeds with recognizing the document using the OCR methodology and the grid infrastructure. Thus it produces the documents with the recognized characters as final output, which can later be searched and edited by the end-user or administrator.
b) If the printed document has already been scanned in and is held in system memory, the document processing phase proceeds directly with document recognition using the OCR methodology and the grid infrastructure, and thus finally produces the document with the recognized characters as output.
2. If the user of the OCR system is the end-user, he can perform the following actions:
● Document searching
● Document editing
Document Searching: the recognized documents can be searched by the user whenever required by requesting them from the system database.
Document Editing: the recognized documents can be edited by adding specific content to the document, deleting specific content from the document, and modifying the document.
4.2 UML DIAGRAMS
UML combines the best techniques from data modeling (entity relationship diagrams), business modeling (work flows), object modeling, and component modeling. It can be used with all processes, throughout the software development life cycle, and across different implementation technologies. UML has 14 types of diagrams divided into two categories: seven diagram types represent structural information, and the other seven represent general types of behavior, including four that represent different aspects of interactions. The diagrams we provide to describe the design and implementation of our OCR system can be categorized as below:
● Use case diagram
  • 25. 19 ● Class diagram
● Sequence diagram
● Collaboration diagram
● Activity diagram
● Component diagram
● Deployment diagram
4.2.1 USE-CASE DIAGRAMS
Our software system can be used to support a library environment to create a digital library where several paper documents are converted into electronic form for access by the users. For this purpose the printed documents must be recognized before they are converted into electronic form. The resulting electronic documents are accessed by users like faculty and students for reading and editing. According to this information, the following are the different actors involved in implementing our OCR system:
● If we consider a virtual digital library, the Administrator can be the Librarian and the End-users can be Students and/or Faculty.
● The following is the list of use-case diagrams that together form the complete, overall use-case diagram:
1. Use-case diagram for document processing
2. Use-case diagram for neural network training
3. Use-case diagram for document recognition
4. Use-case diagram for document editing
5. Use-case diagram for document searching
In each of the use-case diagrams below we explain that particular use-case functionality, providing a description of the:
● Use-case name
● Details about the use-case
● Actors using this use-case
  • 26. 20 Use case Name: Neural Network Training
Description: The Administrator or End-user enters the specific characters required for training. The user stores them as an image file and trains the system.
Actors:
○ Primary Actor: Administrator or End-user
○ Secondary Actor: User
Flow of Events:
1. The user enters the specific characters in order to train the system.
2. After entering, they are stored as an image file.
3. Finally the system is trained accordingly.
Pre-Condition: The font in the scanned document should be identified.
  • 27. 21 Figure 4.2.1: Use-Case Diagram for Document Editing (use cases: open document in editor, select edit action, perform editing, store edited document; actor: Administrator or End-user)
Actors:
○ Primary Actor: Administrator or End-user
○ Secondary Actor: User
Flow of Events:
1. The user opens the document to search for a required word.
2. After opening the document he enters the word to search for.
  • 28. 22 3. Finally the word is searched for in that document.
Pre- and Post-Conditions: No pre-condition and no post-condition.
Overall Use-Case Diagram (actors: administrator and end-user; use cases: document processing, which <<includes>> scanning documents and storing documents, document recognition, and training the system)
4.2.2 CLASS DIAGRAMS
The class diagram is the main building block in object-oriented modeling. The classes in a class diagram represent both the main objects and/or interactions in the application and the objects to be programmed.
● The class diagram of our OCR system consists of 9 classes. They are
  • 29. 23 1. MainScreen
2. Editor
3. HelpFrame
4. Document
5. HEntry
6. Entry
7. TrainingSet
8. KohonenNetwork
9. PrintedFrame
Among all these classes, MainScreen is the main class that represents all the major functions carried out by our OCR system. The MainScreen class has an association with five classes, viz. Editor, HelpFrame, Document, TrainingSet and PrintedFrame. The TrainingSet class in turn has an association with the HEntry and KohonenNetwork classes, and PrintedFrame has an association with the Entry and KohonenNetwork classes.
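Purely as an illustration of the associations just described (these skeletons are not part of the project's source code and contain no behaviour), the classes could be declared as:

// Skeleton reflecting the associations in the class diagram (fields only).
class Editor { }
class HelpFrame { }
class Document { }
class HEntry { }
class Entry { }
class KohonenNetwork { }

class TrainingSet {
    private HEntry entry;              // TrainingSet is associated with HEntry
    private KohonenNetwork network;    // and with KohonenNetwork
}

class PrintedFrame {
    private Entry entry;               // PrintedFrame is associated with Entry
    private KohonenNetwork network;    // and with KohonenNetwork
}

class MainScreen {
    // MainScreen is associated with the five classes below
    private Editor editor;
    private HelpFrame helpFrame;
    private Document document;
    private TrainingSet trainingSet;
    private PrintedFrame printedFrame;
}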
  • 30. 24 Figure 4.2.2: Class Diagram
4.2.3 SEQUENCE DIAGRAMS
Sequence diagrams are sometimes called event-trace diagrams, event scenarios, and timing diagrams. A sequence diagram shows, as parallel vertical lines (lifelines), different processes or objects that live simultaneously, and, as horizontal arrows, the messages exchanged between them, in the order in which they occur. This allows the specification of simple runtime scenarios in a graphical manner. In a sequence diagram, the class objects used to describe the interaction between the various classes vary from one function to another. There are five sequence diagrams, short-listed below, presenting the sequence of actions performed by each of the five modules. The key class object involved in all of these module functions is the MainScreen class, which controls the interaction among the various class objects.
  • 31. 25 Sequence Diagram for Document Processing
1. Objects: Administrator - "a", MainScreen - "m", Document - "d", SystemMemory - "s"
2. Links:
1. Administrator object to MainScreen object
2. MainScreen object to Document object
3. Document object to SystemMemory object
4. SystemMemory object to Administrator object
3. Messages:
1. Process documents
2. Scan documents
3. Scans
4. Stores documents
5. Stores
6. Returns the processed documents
7. Returns
8. End
9. Processed document
  • 32. 26 Figure 4.2.3: Sequence Diagram for Processing
Sequence Diagram for Neural Network Training
1. Objects: Administrator - "a", System - "s", TrainingSet - "t"
2. Links:
1. Administrator object to System object
2. System object to TrainingSet object
3. TrainingSet object to System object
4. System object to Administrator object
3. Messages:
1. Specifies the font characters
2. Stores it as an image
3. Trains the system with the new font
  • 33. 27 4. System recognizes the new font and returns it for the user
Figure 4.2.3.a: Sequence Diagram for Training
Sequence Diagram for Document Recognition
1. Objects: Administrator - "a", MainScreen - "m", SystemMemory - "s", TrainingSet - "t"
2. Links:
1. Administrator object to MainScreen object
2. MainScreen object to SystemMemory object
3. SystemMemory object to MainScreen object
  • 34. 28 4. TrainingSet object to MainScreen object
5. MainScreen object to Administrator object
3. Messages:
1. Recognize documents
2. Store processed document
3. Read file image
4. Recognize using OCR
5. Send processed document
6. Recognize the characters
Figure 4.2.4: Sequence Diagram for Recognition
  • 35. 29 Sequence Diagram for Document Editing
1. Objects: Administrator - "a", MainScreen - "m", Document - "d", SystemMemory - "s"
2. Links:
1. Administrator object to MainScreen object
2. MainScreen object to Document object
3. MainScreen object to Document object
4. MainScreen object to Document object
5. Document object to SystemMemory object
6. SystemMemory object to Administrator object
3. Messages:
1. Edit document
2. Adding document
3. Adds
4. Deleting document
  • 36. 30 5. Modifying document
6. Modifies
7. Stores the edited documents
8. Administrator accesses the edited documents
Figure 4.2.5: Sequence Diagram for Editing
  • 37. 31 CHAPTER 5 MODULES
5.1 Main Modules:
1. Face Detection
2. Feature Extraction
3. Recognition
5.2 Face Detection: The Viola-Jones object detection framework is the first and one of the most mature frameworks to provide competitive object detection rates in real time. Detection is treated as a binary classification problem, implemented with an AdaBoost classifier using Haar-like features (a minimal OpenCV sketch of this approach is given at the end of this chapter).
5.3 Feature Extraction: One possible classification divides feature extraction methods into holistic methods and local feature-based methods. In the first, the whole face image is applied as the input of the recognition operation, similar to the well-known PCA-based method used by Kirby and Sirovich [5] and followed by Turk and Pentland [6]. In the second, local features are extracted; for example, the location and local statistics of the eyes, nose and mouth are used in the recognition task.
5.4 Recognition: The facial recognition module is used to automatically identify people from their video images. It recognizes faces captured by the Axxon facial detection tool by comparing their parameters with digital templates stored in a dedicated database.
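The following is a minimal sketch of the Viola-Jones style detector described in Section 5.2, written with the OpenCV Java bindings. OpenCV is not part of this project's source code, and the cascade file name and image path are placeholder assumptions.

import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.MatOfRect;
import org.opencv.core.Rect;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.objdetect.CascadeClassifier;

public class HaarFaceDetection {
    public static void main(String[] args) {
        // Load the native OpenCV library
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);
        // A pre-trained Haar cascade (AdaBoost over Haar-like features) shipped with OpenCV
        CascadeClassifier detector = new CascadeClassifier("haarcascade_frontalface_default.xml");
        Mat image = Imgcodecs.imread("group_photo.jpg");
        MatOfRect faces = new MatOfRect();
        detector.detectMultiScale(image, faces);   // run the sliding-window detector
        for (Rect face : faces.toArray()) {
            System.out.printf("Face at (%d, %d), size %dx%d%n",
                    face.x, face.y, face.width, face.height);
        }
    }
}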
  • 38. 32 CHAPTER 6 SYSTEM TESTING
Results of the real-time emotion classification task on unseen faces can be observed in Figure 8(a). Our complete real-time pipeline, including face detection, emotion classification and gender classification, has been fully integrated into our Care-O-bot 3 robot. An example of our complete pipeline can be seen in Figure 8(b), in which we provide emotion and gender classification. In Figure 7 we provide the confusion matrix results of our emotion classification mini-Xception model. We can observe several common misclassifications, such as predicting "sad" instead of "fear" and predicting "angry" instead of "disgust". A comparison of the learned features between several emotions and both of our proposed models can be observed in Figure 8(c). The white areas in Figure 8(d) correspond to the pixel values that activate a selected neuron in our last convolution layer. The selected neuron was always chosen according to the highest activation. We can observe that the CNN learned to get activated by considering features such as the frown, the teeth, the eyebrows and the widening of one's eyes, and that each feature remains constant within the same class. These results reassure us that the CNN learned to interpret understandable, human-like features that provide generalizable elements. These interpretable results have helped us understand several common misclassifications, such as persons with glasses being classified as "angry". This happens because the label "angry" is highly activated when the network believes a person is frowning, and frowning features get confused with darker glass frames. Moreover, we can also observe that the features learned in our mini-Xception model are more interpretable than the ones learned by our sequential fully-CNN; consequently, the use of more parameters in our naive implementation leads to less robust features.
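Figure 6.1 shows a row-normalized confusion matrix. As a small illustration of what that normalization means (utility code added here, not taken from the project), each row of raw prediction counts is divided by its row total so that every row sums to 1:

public class ConfusionMatrixUtil {
    // Divide each row of a confusion matrix of raw counts by its row total,
    // so entry [i][j] becomes the fraction of class-i samples predicted as class j.
    public static double[][] normalizeRows(int[][] counts) {
        double[][] normalized = new double[counts.length][];
        for (int i = 0; i < counts.length; i++) {
            int rowTotal = 0;
            for (int c : counts[i]) rowTotal += c;
            normalized[i] = new double[counts[i].length];
            for (int j = 0; j < counts[i].length; j++) {
                normalized[i][j] = rowTotal == 0 ? 0.0 : (double) counts[i][j] / rowTotal;
            }
        }
        return normalized;
    }
}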
  • 39. 33 Fig. 6.1: Normalized confusion matrix of our mini-Xception network.
Fig. 6.2: Results of the real-time emotion classification demo provided in our public repository.
  • 40. 34 CHAPTER 7 Source Code
// Imports required by the methods below (added here for completeness):
import com.google.cloud.vision.v1.AnnotateImageRequest;
import com.google.cloud.vision.v1.AnnotateImageResponse;
import com.google.cloud.vision.v1.BatchAnnotateImagesResponse;
import com.google.cloud.vision.v1.EntityAnnotation;
import com.google.cloud.vision.v1.FaceAnnotation;
import com.google.cloud.vision.v1.Feature;
import com.google.cloud.vision.v1.Feature.Type;
import com.google.cloud.vision.v1.Image;
import com.google.cloud.vision.v1.ImageAnnotatorClient;
import com.google.cloud.vision.v1.ImageSource;
import com.google.cloud.vision.v1.LocationInfo;
import com.google.protobuf.ByteString;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.List;

7.1 TEXT RECOGNITION
// Detects text in the image at the given file path and prints each annotation.
public static void detectText(String filePath, PrintStream out) throws Exception, IOException {
  List<AnnotateImageRequest> requests = new ArrayList<>();
  ByteString imgBytes = ByteString.readFrom(new FileInputStream(filePath));
  Image img = Image.newBuilder().setContent(imgBytes).build();
  Feature feat = Feature.newBuilder().setType(Type.TEXT_DETECTION).build();
  AnnotateImageRequest request =
      AnnotateImageRequest.newBuilder().addFeatures(feat).setImage(img).build();
  requests.add(request);
  try (ImageAnnotatorClient client = ImageAnnotatorClient.create()) {
    BatchAnnotateImagesResponse response = client.batchAnnotateImages(requests);
    List<AnnotateImageResponse> responses = response.getResponsesList();
    for (AnnotateImageResponse res : responses) {
      if (res.hasError()) {
        out.printf("Error: %s\n", res.getError().getMessage());
        return;
      }
      // For full list of available annotations, see http://g.co/cloud/vision/docs
      for (EntityAnnotation annotation : res.getTextAnnotationsList()) {
        out.printf("Text: %s\n", annotation.getDescription());
        out.printf("Position : %s\n", annotation.getBoundingPoly());
      }
    }
  }
}

7.2 FACE RECOGNITION
// Detects faces in the image at the given file path and prints emotion likelihoods.
public static void detectFaces(String filePath, PrintStream out) throws Exception, IOException {
  List<AnnotateImageRequest> requests = new ArrayList<>();
  ByteString imgBytes = ByteString.readFrom(new FileInputStream(filePath));
  Image img = Image.newBuilder().setContent(imgBytes).build();
  Feature feat = Feature.newBuilder().setType(Type.FACE_DETECTION).build();
  AnnotateImageRequest request =
      AnnotateImageRequest.newBuilder().addFeatures(feat).setImage(img).build();
  requests.add(request);
  try (ImageAnnotatorClient client = ImageAnnotatorClient.create()) {
    BatchAnnotateImagesResponse response = client.batchAnnotateImages(requests);
    List<AnnotateImageResponse> responses = response.getResponsesList();
    for (AnnotateImageResponse res : responses) {
      if (res.hasError()) {
        out.printf("Error: %s\n", res.getError().getMessage());
        return;
      }
      // For full list of available annotations, see http://g.co/cloud/vision/docs
      for (FaceAnnotation annotation : res.getFaceAnnotationsList()) {
        out.printf(
            "anger: %s\njoy: %s\nsurprise: %s\nposition: %s",
            annotation.getAngerLikelihood(),
            annotation.getJoyLikelihood(),
            annotation.getSurpriseLikelihood(),
            annotation.getBoundingPoly());
      }
    }
  }
}

7.3 LANDMARK DETECTION
// Detects landmarks in the image at the given file path and prints their coordinates.
public static void detectLandmarks(String filePath, PrintStream out) throws Exception, IOException {
  List<AnnotateImageRequest> requests = new ArrayList<>();
  ByteString imgBytes = ByteString.readFrom(new FileInputStream(filePath));
  Image img = Image.newBuilder().setContent(imgBytes).build();
  Feature feat = Feature.newBuilder().setType(Type.LANDMARK_DETECTION).build();
  AnnotateImageRequest request =
      AnnotateImageRequest.newBuilder().addFeatures(feat).setImage(img).build();
  requests.add(request);
  try (ImageAnnotatorClient client = ImageAnnotatorClient.create()) {
    BatchAnnotateImagesResponse response = client.batchAnnotateImages(requests);
    List<AnnotateImageResponse> responses = response.getResponsesList();
    for (AnnotateImageResponse res : responses) {
      if (res.hasError()) {
        out.printf("Error: %s\n", res.getError().getMessage());
        return;
      }
      // For full list of available annotations, see http://g.co/cloud/vision/docs
      for (EntityAnnotation annotation : res.getLandmarkAnnotationsList()) {
        LocationInfo info = annotation.getLocationsList().listIterator().next();
        out.printf("Landmark: %s\n %s\n", annotation.getDescription(), info.getLatLng());
      }
    }
  }
}

7.4 LABEL DETECTION
// Detects labels in the image stored at the given Google Cloud Storage path and prints them.
public static void detectLabelsGcs(String gcsPath, PrintStream out) throws Exception, IOException {
  List<AnnotateImageRequest> requests = new ArrayList<>();
  ImageSource imgSource = ImageSource.newBuilder().setGcsImageUri(gcsPath).build();
  Image img = Image.newBuilder().setSource(imgSource).build();
  Feature feat = Feature.newBuilder().setType(Type.LABEL_DETECTION).build();
  AnnotateImageRequest request =
      AnnotateImageRequest.newBuilder().addFeatures(feat).setImage(img).build();
  requests.add(request);
  try (ImageAnnotatorClient client = ImageAnnotatorClient.create()) {
    BatchAnnotateImagesResponse response = client.batchAnnotateImages(requests);
    List<AnnotateImageResponse> responses = response.getResponsesList();
    for (AnnotateImageResponse res : responses) {
      if (res.hasError()) {
        out.printf("Error: %s\n", res.getError().getMessage());
        return;
      }
      // For full list of available annotations, see http://g.co/cloud/vision/docs
      for (EntityAnnotation annotation : res.getLabelAnnotationsList()) {
        annotation.getAllFields().forEach((k, v) -> out.printf("%s : %s\n", k, v.toString()));
      }
    }
  }
}
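A small driver such as the one below could exercise the four helpers above; the class name, the image paths and the Cloud Storage URI are placeholders, and valid Google Cloud credentials (GOOGLE_APPLICATION_CREDENTIALS) are assumed to be configured.

public class VisionDemo {
    public static void main(String[] args) throws Exception {
        // The detect* methods above are assumed to be static members of this class.
        detectText("document_scan.jpg", System.out);
        detectFaces("group_photo.jpg", System.out);
        detectLandmarks("landmark.jpg", System.out);
        detectLabelsGcs("gs://my-bucket/sample.jpg", System.out);
    }
}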
  • 45. 39 CHAPTER 8 SCREENSHOTS (the application screenshots are not reproduced in this text transcript)
  • 46. CHAPTER 9 CONCLUSION
We have proposed and tested a general building design for creating real-time CNNs. Our proposed architectures have been systematically built in order to reduce the number of parameters. We began by completely eliminating the fully connected layers and by reducing the number of parameters in the remaining convolutional layers via depth-wise separable convolutions. We have shown that our proposed models can be stacked for multi-class classification while maintaining real-time inference. Specifically, we have developed a vision system that performs face detection, gender classification and emotion classification in a single integrated module. We have achieved human-level performance in our classification tasks using a single CNN that leverages modern architecture constructs. Our architecture reduces the number of parameters 80× while obtaining favorable results. Our complete pipeline has been successfully integrated into a Care-O-bot 3 robot. Finally, we presented a visualization of the learned features in the CNN using guided back-propagation. This visualization technique is able to show the high-level features learned by our models and allows us to discuss their interpretability.
  • 47. CHAPTER 10 REFERENCES
1. François Chollet. Xception: Deep learning with depthwise separable convolutions. CoRR, abs/1610.02357, 2016.
2. Andrew G. Howard et al. MobileNets: Efficient convolutional neural networks for mobile vision applications. CoRR, abs/1704.04861, 2017.
3. Dario Amodei et al. Deep Speech 2: End-to-end speech recognition in English and Mandarin. CoRR, abs/1512.02595, 2015.
4. Ian Goodfellow et al. Challenges in Representation Learning: A report on three machine learning contests, 2013.
5. Xavier Glorot, Antoine Bordes, and Yoshua Bengio. Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pages 315–323, 2011.
6. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
7. Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning, pages 448–456, 2015.