3. Abstract
Detecting emotion from facial expressions has become an urgent need because of
its many applications in artificial intelligence, such as human-computer
collaboration, data-driven animation, and human-robot communication. Because it
is a demanding and interesting problem in computer vision, several works have
been conducted on this topic. The objective of this project is to develop a
facial expression recognition system based on a convolutional neural network
with data augmentation. This approach classifies seven basic emotions (angry,
disgust, fear, happy, neutral, sad, and surprise) from image data. The
convolutional neural network with data augmentation leads to higher validation
accuracy (96.24%) than the other existing models and helps to overcome their
limitations.
4. Introduction
We humans express our feelings in many ways. This project deals with facial
expressions, which can convey a person's perception. There are 7 universal
facial expressions: anger, contempt, disgust, fear, joy, sadness and surprise.
This project is built to detect the facial expressions of a person in any video
or from the system camera. It is developed using a deep learning algorithm, the
convolutional neural network. This algorithm is well suited to image
recognition and to sorting images into categories.
The model is trained to take input from a video or camera, predict the facial
expression in each image, and display the result on a web page. The data used
for training is the dataset of a machine learning competition held in 2013,
which contains images of a variety of expressions.
The dataset is split into 80 percent training data and 20 percent test data, so
that we can evaluate how accurately the model predicts the expression for the
inputs we provide and displays it on the screen.
5. Definition of problem
In recent years there has been a growing interest in improving all aspects
of the interaction between humans and computers.
The rapid advance of technology in recent years has made computers
cheaper and more powerful and has made the use of microphones and
PC cameras affordable and easily available. The microphones and
cameras enable the computer to “see” and “hear,” and to use this
information to act.
It is argued that to truly achieve effective human-computer intelligent
interaction, the computer needs to be able to interact naturally with the
user, in the way that human-human interaction takes place.
Human beings possess and express emotions in everyday interactions with
others. Emotions are often reflected on the face, in hand and body gestures,
and in the voice, to express our feelings or likings.
6. Psychologists and engineers alike have tried to analyze facial expressions
to understand and categorize them. This knowledge can be used, for example,
to teach computers to recognize human emotions from video images acquired
from built-in cameras.
There are several related problems: detection of an image segment as a
face, extraction of the facial expression information, and classification
of the expression.
A system that performs these operations accurately and in real time would
be a major step forward in achieving human-like interaction between
man and machine.
The demand to meet these requirements inspired us to build a facial
expression recognition system and apply it to real-time problems such as
computer instructors, emotion monitors, etc.
7. The 7 universal facial expressions
It is widely supported within the scientific community that there are seven
basic emotions, each with its own unique and distinctive facial expression.
1.Happiness
2.Sadness
3.Fear
4.Disgust
5.Anger
6.Contempt
7.Surprise
8. Existing Systems:
SVM classifier
A support vector machine (SVM) is a supervised machine learning model
that uses classification algorithms for two-group classification problems.
After being given sets of labeled training data for each category, an SVM
model can categorize new examples.
In this approach, the eyes and mouth are detected first, and their features
are extracted using Gabor filters and LBP; PCA is then used to reduce the
dimensions of the features. Finally, an SVM is used to classify the
expression and the facial action units.
Example: imagine we have two tags, red and blue, and our data has two
features, x and y. We want a classifier that, given a pair of (x, y)
coordinates, outputs either red or blue. We plot our already labeled
training data on a plane.
9. A support vector machine takes these data points and outputs the hyperplane
(which in two dimensions is simply a line) that best separates the tags. This line is
the decision boundary: anything that falls to one side of it we will classify as blue,
and anything that falls to the other as red.
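The decision rule described above can be sketched in a few lines. This is only an illustration of classifying a point by which side of a line it falls on; the weights and bias are hypothetical values chosen by hand, not the output of a trained SVM.

```python
# Hypothetical 2-D linear decision boundary: weights and bias are
# illustrative values, not learned from real data.

def classify(point, weights=(1.0, -1.0), bias=0.0):
    """Return "blue" or "red" depending on the side of the line w.x + b = 0."""
    x, y = point
    score = weights[0] * x + weights[1] * y + bias
    return "blue" if score >= 0 else "red"


print(classify((3.0, 1.0)))  # falls on the non-negative side of the line
print(classify((1.0, 3.0)))  # falls on the negative side of the line
```

A real SVM would choose the weights and bias so that the margin between the two groups of training points is as large as possible.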
11. Fully connected neural network
The fully connected neural network predicts the facial expression after the model is
fitted on the train data and evaluated on the test data, and the output is sent to a
device connected to the machine.
12. Proposed system
In the SVM and fully connected systems, the model produced by fitting on the
dataset consumes a large amount of space and is time consuming. To overcome
this problem, we built a convolutional neural network model, which is smaller
and faster.
[Figure labels: fully connected layer; CNN layer]
13. Tools used.
Python
In recent years, Python has become the language of choice for data
science and artificial intelligence, two technology trends essential for
global businesses to stay competitive today. In fact, Python is one of the
fastest-growing programming languages today. It is used across a wide
variety of applications, from web development to task automation to data
analysis.
Keras
Keras is a minimalist Python library for deep learning that can run on
top of Theano or TensorFlow. It was developed to make implementing
deep learning models as fast and easy as possible for research and
development.
TensorFlow
TensorFlow is an end-to-end open source platform for machine learning. It
has a comprehensive, flexible ecosystem of tools, libraries and community
resources that lets researchers push the state of the art in ML and lets
developers easily build and deploy ML-powered applications.
14. Flask
Flask is a web application framework written in Python. It is developed by Armin
Ronacher, who leads an international group of Python enthusiasts named Pocoo.
Flask is based on the Werkzeug WSGI toolkit and the Jinja2 template engine.
Jupyter notebook
JupyterLab is a web-based interactive development environment for Jupyter
notebooks, code, and data. JupyterLab is flexible: the user interface can be
configured and arranged to support a wide range of workflows in data science,
scientific computing, and machine learning.
Camera module
Python provides various libraries for image and video processing. One of
them is OpenCV, a large library that provides many functions for image and
video operations. With OpenCV, we can capture video from the camera: it lets
you create a video capture object that reads frames from the webcam, and you
can then perform the desired operations on that video.
OpenCV
OpenCV is an open source computer vision and machine learning software library.
15. Importing important modules
Import NumPy
NumPy is the fundamental package for scientific computing in Python. It
is a Python library that provides a multidimensional array object and
various derived objects.
Import seaborn
Seaborn is a Python data visualization library based on matplotlib. It provides
a high-level interface for drawing attractive and informative statistical
graphics.
Import matplotlib.pyplot
Matplotlib is a plotting library for the Python programming language and its
numerical mathematics extension NumPy.
Import utils
Python utils is a collection of small Python functions and classes which make
common patterns shorter and easier.
Import os
The os module in Python provides functions for interacting with the operating
system.
16. Import ImageDataGenerator
The ImageDataGenerator class is very useful in image classification. There
are several ways to use this generator, depending on the method we call.
Here we focus on flow_from_directory, which takes a path to a directory
containing images sorted into subdirectories, one per class; the image
augmentation parameters are set on the generator itself.
Import Dense layer
The Dense layer is the regular deeply connected neural network layer. It is the
most common and frequently used layer.
Import Input layer
Input is used to instantiate a Keras tensor.
Import Dropout layer
The Dropout layer randomly sets input units to 0 with a frequency of rate at
each step during training time, which helps prevent overfitting.
Import Flatten layer
Flatten reshapes the input into a vector without affecting the batch size. If
inputs are shaped (batch,) without a feature axis, then flattening adds an
extra channel dimension and the output shape is (batch, 1).
Import Conv2D layer
17. Import batch normalization
Normalizes the activations of the previous layer at each batch, i.e. applies a
transformation that maintains the mean activation close to 0 and the
activation standard deviation close to 1.
Import activation
ReLU activation: max(x, 0), the element-wise maximum of 0 and the input
tensor.
Import MaxPooling2D
Downsamples the input representation by taking the maximum value over the
window defined by pool_size for each dimension along the features axis.
Import Model
Model groups layers into an object with training and inference features.
Import Sequential
A Sequential model is appropriate for a plain stack of layers where each layer
has exactly one input tensor and one output tensor.
Import Adam
Adam optimization is a stochastic gradient descent method that is based on
adaptive estimation of first-order and second-order moments.
18. Import ReduceLROnPlateau
Reduces the learning rate when a metric has stopped improving.
Import IPython.display
When a display object is returned by an expression or passed to the display
function, it will result in the data being displayed in the frontend.
Import livelossplot
The livelossplot Python package provides live training loss plots in Jupyter
notebooks for Keras.
Import TensorFlow
TensorFlow is an end-to-end open source platform for machine learning. It
has a comprehensive, flexible ecosystem of tools, libraries and community
resources that lets researchers push the state-of-the-art in ML and
developers easily build and deploy ML powered applications.
19. LOADING IMAGE DATASET
Data augmentation
Data augmentation encompasses a wide range of techniques used to
generate “new” training samples from the original ones by applying
random jitters and perturbations (while ensuring that the class labels of
the data are not changed).
We have a smaller number of images of disgust, so we use data
augmentation to generate new images. This is done with
ImageDataGenerator.
A model trained on augmented data is more likely to generalize to data
points not included in the training set.
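One such perturbation, a random horizontal flip, can be sketched without any library. This is an illustration of the idea only, with images assumed to be plain lists of pixel rows; it is not how ImageDataGenerator is implemented.

```python
import random


def horizontal_flip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def augment(image, rng):
    """Flip the image with probability 0.5; the class label stays the same."""
    return horizontal_flip(image) if rng.random() < 0.5 else image


# A tiny 2x3 "image" with made-up pixel values.
tiny = [[1, 2, 3],
        [4, 5, 6]]
flipped = horizontal_flip(tiny)
sample = augment(tiny, random.Random(0))  # either the original or the mirror
```

A mirrored face still shows the same expression, which is why the label can be reused for the new sample.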
Loading dataset
We have two sets of data: train data and test data.
We split our data into an 80 percent train dataset and a 20 percent test
dataset for testing our model.
We use the flow_from_directory method to load our dataset.
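The 80/20 split can be illustrated with a small helper. The function name, the seed, and the file names are hypothetical; this only shows the shuffling and cutting, not how the project's directories were actually prepared.

```python
import random


def split_dataset(items, train_fraction=0.8, seed=42):
    """Shuffle a copy of the items and cut it into train and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical file names standing in for the image dataset.
files = [f"img_{i:03d}.png" for i in range(10)]
train_files, test_files = split_dataset(files)
print(len(train_files), len(test_files))
```

Shuffling before cutting avoids putting all images of one class into the same part when the file list is ordered by class.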
20. Role of layer in CNN image classification
A convolutional neural network (CNN) architecture has three main parts:
A convolutional layer that extracts features from a source image.
Convolution helps with blurring, sharpening, edge detection, noise
reduction, or other operations that can help the machine to learn specific
characteristics of an image.
A pooling layer that reduces the image dimensionality without losing
important features or patterns.
A fully connected layer, also known as the dense layer, in which the results
of the convolutional layers are fed through one or more neural layers to
generate a prediction.
In between the convolutional layer and the fully connected layer, there is a
‘flatten’ layer. Flattening transforms a two-dimensional matrix of features
into a vector that can be fed into a fully connected layer.
21. Create CNN model
In Keras, this is a typical process for building a CNN architecture:
Reshape the input data into a format suitable for the convolutional
layers, using x_train.reshape() and x_test.reshape().
For class-based classification, one-hot encode the categories using
to_categorical().
Build the model using the Sequential.add() function.
Add a convolutional layer.
Add a pooling layer.
Add a “flatten” layer which prepares a vector for the fully connected
layers.
Add one or more fully connected layers.
Compile the model using model.compile().
Train the model using model.fit(), supplying x_train and y_train, with
x_test and y_test as validation data.
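The steps above can be sketched as follows. This is a minimal sketch, not the network used in the project: the number of filters, the dense layer size, the dropout rate and the learning rate are all assumed values, and it presumes TensorFlow with its bundled Keras is installed.

```python
# Minimal sketch only: all layer sizes and rates are assumed, illustrative values.
from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES = 7            # seven emotion classes
INPUT_SHAPE = (48, 48, 1)  # 48x48 grayscale images

# One-hot encode three example integer labels.
one_hot = keras.utils.to_categorical([0, 3, 6], num_classes=NUM_CLASSES)

# Build the model layer by layer with Sequential.add().
model = keras.Sequential()
model.add(keras.Input(shape=INPUT_SHAPE))
model.add(layers.Conv2D(32, (3, 3), padding="same", activation="relu"))
model.add(layers.BatchNormalization())
model.add(layers.MaxPooling2D(pool_size=(2, 2)))
model.add(layers.Dropout(0.25))
model.add(layers.Flatten())
model.add(layers.Dense(128, activation="relu"))
model.add(layers.Dense(NUM_CLASSES, activation="softmax"))

# Compile with an optimizer, a loss function and a metric to monitor.
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
```

Training would then be started with model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=...), where the arrays hold the reshaped images and the one-hot labels.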
22. CNN layers
Filters: integer, the dimensionality of the output space.
Strides: an integer or tuple/list of 2 integers, specifying the strides of
the convolution along the height and width.
When the Conv2D layer is the first layer, we give it the input shape, for
example (48, 48, 1), because we use grayscale images.
Padding: one of "valid" or "same" (case-insensitive). "valid" means no
padding. "same" results in padding evenly to the left/right or up/down of
the input such that the output has the same height/width dimension as the
input.
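The effect of padding and strides on the output size can be checked with a little arithmetic. The helper below is a sketch of the standard size formulas for one spatial dimension; the function name is made up for illustration. Note that "same" keeps the size unchanged only when the stride is 1.

```python
import math


def conv_output_size(input_size, kernel_size, stride=1, padding="valid"):
    """Output length of a convolution along one spatial dimension."""
    padding = padding.lower()
    if padding == "same":
        return math.ceil(input_size / stride)
    if padding == "valid":
        return (input_size - kernel_size) // stride + 1
    raise ValueError("padding must be 'valid' or 'same'")


print(conv_output_size(48, 3, padding="valid"))  # 3x3 kernel, no padding
print(conv_output_size(48, 3, padding="same"))   # 3x3 kernel, padded
```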
23. Batch normalization normalizes the activations of the previous layer at
each batch, i.e. applies a transformation that maintains the mean activation
close to 0 and the activation standard deviation close to 1.
Max pooling downsamples the input representation by taking the maximum value
over the window defined by pool_size for each dimension along the features
axis. The window is shifted by strides in each dimension.
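Both operations can be illustrated on small hand-made inputs. The code below is a simplified sketch: the normalization omits the learned scale and shift that a real batch normalization layer also applies, and the pooling handles a single 2-D feature map given as a list of rows.

```python
def normalize_batch(values, epsilon=1e-3):
    """Shift and scale values so their mean is about 0 and std about 1."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / (variance + epsilon) ** 0.5 for v in values]


def max_pool_2d(image, pool_size=2, stride=2):
    """Take the maximum over each pool_size x pool_size window."""
    rows = range(0, len(image) - pool_size + 1, stride)
    cols = range(0, len(image[0]) - pool_size + 1, stride)
    return [
        [
            max(image[r + i][c + j]
                for i in range(pool_size)
                for j in range(pool_size))
            for c in cols
        ]
        for r in rows
    ]


# Made-up 4x4 feature map; pooling with a 2x2 window halves each dimension.
feature_map = [[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 2, 9, 0],
               [4, 8, 3, 5]]
pooled = max_pool_2d(feature_map)
normalized = normalize_batch([1.0, 2.0, 3.0, 4.0])
```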
Dropout is a technique where input units are randomly set to 0 during
training, which helps prevent overfitting.
26. Activations
ReLU
The rectified linear unit is the most used activation function in deep
learning models. The function returns 0 if it receives any negative input,
but for any positive value x it returns that value back. So it can be
written as f(x) = max(0, x).
SoftMax
The SoftMax function turns a vector of K real values into a vector of K real
values that sum to 1. The input values can be positive, negative, or zero;
the outputs always lie between 0 and 1.
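Both activations are short enough to write out directly. The sketch below uses only the standard library; subtracting the maximum before exponentiating is a common numerical-stability trick and does not change the result.

```python
import math


def relu(x):
    """Return 0 for negative inputs and x itself otherwise."""
    return max(0.0, x)


def softmax(values):
    """Turn K real values into K positive values that sum to 1."""
    shift = max(values)  # improves numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, -1.0, 0.0])
print(relu(-3.0), relu(2.5))
print(probs)
```

In a classifier with seven emotion classes, the softmax output can be read as one probability per class, and the largest one gives the predicted emotion.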
27. Train and evaluate model
Sample
One element of a dataset. For instance, one image is a sample in a
convolutional network. One audio snippet is a sample for a speech
recognition model.
Batch
A set of N samples. The samples in a batch are processed independently, in
parallel. If training, a batch results in only one update to the model. A batch
generally approximates the distribution of the input data better than a single
input.
Epochs
An arbitrary cutoff, generally defined as "one pass over the entire dataset",
used to separate training into distinct phases, which is useful for logging and
periodic evaluation. When using validation_data or validation_split with the
fit method of Keras models, evaluation will be run at the end of every epoch.
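The relationship between samples, batches and epochs comes down to simple arithmetic. The numbers used below are hypothetical and are not the project's dataset size or training settings.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of batches (and model updates) in one pass over the data."""
    return math.ceil(num_samples / batch_size)


def total_updates(num_samples, batch_size, epochs):
    """Number of model updates over the whole training run."""
    return steps_per_epoch(num_samples, batch_size) * epochs


# Hypothetical example: 1000 samples, batches of 64, 15 epochs.
print(steps_per_epoch(1000, 64))
print(total_updates(1000, 64, 15))
```

The last batch of an epoch is smaller when the dataset size is not a multiple of the batch size, which is why the division is rounded up.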
28. Callbacks
A callback is an object that can perform actions at various stages of
training (e.g. at the start or end of an epoch, before or after a single
batch, etc.).
ModelCheckpoint
A callback to save the Keras model or model weights at some frequency.
The ModelCheckpoint callback is used in conjunction with training using
model.fit() to save a model or weights (in a checkpoint file) at some
interval, so the model or weights can be loaded later to continue the
training from the state saved.
ReduceLROnPlateau
Models often benefit from reducing the learning rate by a factor of 2-10
once learning stagnates. This callback monitors a quantity and, if no
improvement is seen for a 'patience' number of epochs, reduces the
learning rate.
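The patience logic can be simulated in plain Python. This is a simplified sketch of the idea, not the Keras implementation: the function name, default values and loss values are assumptions made for illustration.

```python
def reduce_lr_on_plateau(val_losses, initial_lr=0.001, factor=0.5, patience=2):
    """Return the learning rate in effect after each epoch's evaluation."""
    lr = initial_lr
    best = float("inf")
    wait = 0  # epochs since the monitored loss last improved
    schedule = []
    for loss in val_losses:
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr *= factor  # no improvement for `patience` epochs
                wait = 0
        schedule.append(lr)
    return schedule


# Hypothetical validation losses: the loss stalls at epochs 3 and 4.
schedule = reduce_lr_on_plateau([1.0, 0.8, 0.9, 0.85, 0.7])
print(schedule)
```

Here a factor of 0.5 corresponds to dividing the learning rate by 2, the lower end of the range mentioned above.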
29. Model fit
Model fitting is a measure of how well a machine learning model
generalizes to data similar to that on which it was trained. A model that is
well fitted produces more accurate outcomes. A model that is overfitted
matches the data too closely. A model that is underfitted doesn’t match
closely enough. Models are trained on NumPy arrays using fit(), whose main
purpose is to train the model on the training data.
Before training a model with fit(), you need to specify a loss function, an
optimizer, and optionally, some metrics to monitor; these are passed to
compile(). If your model has multiple outputs, you can specify different
losses and metrics for each output, and you can modulate the contribution of
each output to the total loss of the model.
Adam optimization is a stochastic gradient descent method that is based
on adaptive estimation of first-order and second-order moments.
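The role of the two moment estimates can be seen in a one-parameter version of the update. This is a sketch of the standard Adam update rule applied to a toy quadratic; the function name, step count and learning rate are assumed for illustration and are unrelated to the project's training settings.

```python
import math


def adam_minimize(grad_fn, x0, steps=300, lr=0.1,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a one-parameter function given its gradient, using Adam."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g        # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy problem: f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at 3.
x_min = adam_minimize(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)
```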
30. The compilation is the final step in creating a model. Once the
compilation is done, we can move on to the training phase.
History
One of the default callbacks that is registered when training all deep
learning models is the History callback. It records training metrics for
each epoch. This includes the loss and the accuracy (for classification
problems) as well as the loss and accuracy for the validation dataset, if
one is set.
The history object is returned from calls to the fit() function used to
train the model. Metrics are stored in a dictionary in the history member
of the object returned.
31. Visualization of history
We can create plots from the collected history data:
A plot of accuracy on the training and validation datasets over
training epochs.
A plot of loss on the training and validation datasets over training
epochs.
32. From the plot of accuracy we can see that the model could probably be
trained a little more, as the trend for accuracy on both datasets is still rising
for the last few epochs. We can also see that the model has not yet
over-learned the training dataset, showing comparable skill on both datasets.
From the plot of loss, we can see that the model has comparable
performance on both train and validation datasets (labeled test). If these
parallel plots start to depart consistently, it might be a sign to stop training at
an earlier epoch.
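Reading the same information from the recorded metrics rather than from the plot might look like this. The dictionary mimics the layout of the metrics stored by the history object, but its values are invented for illustration and are not results from this project.

```python
def best_epoch(history, metric="val_loss"):
    """Return the zero-based index of the epoch with the lowest metric."""
    values = history[metric]
    return min(range(len(values)), key=values.__getitem__)


# Hypothetical metrics: training loss keeps falling, but validation loss
# starts to rise after the third epoch, a sign of over-learning.
example_history = {
    "loss": [1.8, 1.4, 1.1, 0.9, 0.7],
    "val_loss": [1.7, 1.5, 1.3, 1.35, 1.5],
}
print(best_epoch(example_history))
```

Stopping at the epoch where the validation loss is lowest keeps the model from the point just before the two curves begin to depart.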
Represent model as JSON
JSON is a simple file format for describing data hierarchically.
Keras provides the ability to describe any model using JSON format with a
to_json() function. This can be saved to file and later loaded via the
model_from_json() function that will create a new model from the JSON
specification.
The weights are saved directly from the model using the save_weights()
function and later loaded using the symmetrical load_weights() function.
33. Saving Keras model
We can load the saved model in our code, pass it the inputs captured with
cv2, and display the predicted output after the model we built has
processed the input we provide.
34. Metrics and accuracy
Keras allows you to list the metrics to monitor during the training of
your model.
You can do this by specifying the “metrics” argument and providing a list
of function names to the compile() function on your model.
All metrics are reported in verbose output and in the history object
returned from calling the fit() function.
Regardless of whether your problem is a binary or multi-class
classification problem, you can specify the ‘accuracy’ metric to report on
accuracy.
35. Crossentropy
As part of the optimization algorithm, the error for the current state of the
model must be estimated repeatedly. This requires the choice of an error
function, conventionally called a loss function, that can be used to estimate
the loss of the model so that the weights can be updated to reduce the loss
on the next evaluation.
Cross-entropy is the default loss function to use for binary classification
problems.
It is intended for use with binary classification where the target values are in
the set {0, 1}. For multi-class problems such as the seven emotion classes
here, the categorical form of cross-entropy is used with one-hot encoded
targets.
Mathematically, it is the preferred loss function under the inference
framework of maximum likelihood. It is the loss function to be evaluated
first and only changed if you have a good reason.
Cross-entropy is a measure from the field of information theory, building
upon entropy and generally calculating the difference between two
probability distributions.
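For a single sample, both forms of the loss can be computed directly. This sketch uses only the standard library; the function names and the example probabilities are made up for illustration, and the small eps guards against taking the logarithm of zero.

```python
import math


def binary_cross_entropy(target, predicted, eps=1e-12):
    """Loss for one sample with target in {0, 1} and predicted in [0, 1]."""
    p = min(max(predicted, eps), 1.0 - eps)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))


def categorical_cross_entropy(one_hot_target, predicted_probs, eps=1e-12):
    """Loss for one sample with a one-hot target and predicted probabilities."""
    return -sum(t * math.log(max(p, eps))
                for t, p in zip(one_hot_target, predicted_probs))


# The true class is the second of three; a confident correct prediction
# gives a much smaller loss than a confident wrong one.
confident_loss = categorical_cross_entropy([0, 1, 0], [0.05, 0.90, 0.05])
wrong_loss = categorical_cross_entropy([0, 1, 0], [0.90, 0.05, 0.05])
print(confident_loss, wrong_loss)
```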
36. Web application
The output of the model is given to a web application.
We built the web application with Flask.
Flask is a lightweight WSGI web application framework. It is designed to make
getting started quick and easy, with the ability to scale up to complex
applications.
It began as a simple wrapper around Werkzeug and Jinja and has become
one of the most popular Python web application frameworks.
We use render_template to render the page.
We give the path of the input in the video feed. If the input is a video, we
give the video path; if the input is from the webcam, we give 0 as the
source.
39. Conclusion
We made a CNN model to predict facial expressions and solve real-time
problems.
We built a web application using Flask to display the output.
We calculated the accuracy of the model; its accuracy is 66.7.
Applications of facial expression recognition:
Facial expressions and other gestures convey nonverbal communication
cues that play an important role in interpersonal relations.
A computer can monitor and counsel a person by recognizing their emotions.
For businesses, since facial expression recognition software delivers raw
emotional responses, it can provide valuable information about the
sentiment of a target audience towards a marketing message, product or
brand.