Data Science with Python
Facial Expression Recognition
Final Project Report - OPIM 5894 - Data Science with Python
Team Brogrammers: Santanu Paul, Sree Inturi, Saurav Gupta, Vibhuti Upadhyay, Sunender Pothula
Nov 30 2017
Table of Contents
Facial Expression Recognition
1. Introduction
1.1 Background – What is Representation Learning?
1.2 Research Objectives
2. Data Description and Exploration
2.1 About the Dataset
2.2 Data Exploration
2.3 Data Preprocessing
3. Dimensionality Reduction
3.1 Curse of Dimensionality
3.2 Principal Component Analysis
Takeaway from the plot
Visualizing the Eigenvalues
Interactive visualizations of PCA representation
Improvements
3.3 Linear Discriminant Analysis
4. Modeling
4.1 Support Vector Machine
4.2 Neural Networks
4.3 Conclusion
5. Scope for Improvement
5.1 CNN (Convolutional Neural Network) and Parameter Tuning
Attachments – Python Notebooks and Code
1. Introduction
1.1 Background – What is Representation Learning?
Older machine learning algorithms rely on the input being a set of features and then learn a classifier, regressor, etc. on top of them. Most of these features are hand crafted, i.e. designed by humans; classical examples in computer vision include SIFT, LBP, etc. The problem with these is that they are designed by humans based on heuristics. Images can be represented using such features and ML algorithms can be applied on top of that. However, they may not be optimal in terms of the objective function, i.e., it may be possible to design better features that lead to lower objective function values. Instead of hand crafting these image representations, we can learn them. That is known as representation learning. We can have a neural network which takes an image as input and outputs a vector, which is the feature representation of the image. This is the representation learner. It can be followed by another neural network that acts as the classifier, regressor, etc.
For example: A wheel has a geometric shape, but its image may be complicated by
shadows falling on the wheel, the sun glaring off the metal parts of the wheel, the fender of
the car or an object in the foreground obscuring part of the wheel, and so on. We can try to
manually describe what a wheel should look like and how it can be represented. Say, it should
be circular, be black in color, have treads, etc. But these are all hand-crafted features and
may not generalize to all situations. For example, if you look at the wheel from a different
angle, it might be oval in shape. Or the lighting may cause it to have lighter and darker
patches. These kinds of variations are hard to account for manually. Instead, we can let the
representation learning neural network learn them from data by giving it several positive and
negative examples of a wheel and training it end to end.
1.2 Research Objectives
The major objective of this project is to classify images by their facial expression.
(1) Image Classification
We present a method for classifying facial expressions from the analysis of facial deformations. The classification process is based on neural network models that classify an image as “Happy” or “Sad” (a Convolutional Neural Network is the planned next step; see Section 5). Our neural network model extracts an expression skeleton of facial features. We also demonstrate the efficiency of our classifier, comparing it with SVM classifiers built on PCA and LDA representations of the same data.
2. Data Description and Exploration
The data set used in this project is Challenges in Representation Learning: Facial
Expression Recognition Challenge, which contains 48x48 pixel grayscale images of faces. The
faces have been automatically registered so that the face is more or less centered and occupies
about the same amount of space in each image. The task is to categorize each face, based on the emotion shown in the facial expression, into one of two categories (3 = Happy, 4 = Sad).
2.1 About the Dataset
The training set consists of 15,066 examples (Happy: 8,989; Sad: 6,077) and two columns, "emotion" and "pixels". The "emotion" column contains a numeric code (3 or 4) for the emotion present in the image. The "pixels" column contains a quoted string for each image; the contents of this string are space-separated pixel values in row-major order.
Similarly, the test set used for the leaderboard consists of 3,589 examples and contains only the "pixels" column; our task is to predict the emotion (Happy or Sad).
There were no missing values in our data set; it was a clean dataset.
Value Counts of our data points: Our data set is quite balanced.
Screenshot of data
2.2 Data Exploration
Since the pixel information is stored in a single "pixels" column, our first goal is to split that column into multiple fields, so that we get a rough idea of what a final 48x48 picture looks like.
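As a minimal sketch of that split (assuming the training CSV has been loaded into a pandas DataFrame; the file name fer_train.csv is hypothetical):

```python
import numpy as np
import pandas as pd

# The file name fer_train.csv is hypothetical; substitute the actual path.
df = pd.read_csv("fer_train.csv")

# Each "pixels" entry is a space-separated string of 2304 values; split it
# into a float array, then reshape each row into the 48x48 image grid.
pixels = df["pixels"].apply(lambda s: np.array(s.split(), dtype=np.float64))
X = np.stack(pixels.values)        # shape: (n_samples, 2304)
images = X.reshape(-1, 48, 48)     # shape: (n_samples, 48, 48)
y = df["emotion"].values           # 3 = Happy, 4 = Sad
```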
Let’s see what the emotions look like:
Happy Face Sad Face
2.3 Data Preprocessing
Standardization: Standardization is good practice for many machine learning algorithms. Although our data is already on a single scale (pixel values from 0 to 255), we still chose to standardize it.
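A minimal sketch of this step, using scikit-learn's StandardScaler on the flattened pixel matrix X from the exploration step:

```python
from sklearn.preprocessing import StandardScaler

# Zero mean, unit variance per pixel column; fit on training data only.
scaler = StandardScaler()
X_std = scaler.fit_transform(X)
```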
3. Dimensionality Reduction
3.1 Curse of Dimensionality
This term is often thrown about, especially when PCA and LDA enter the mix. The phrase refers to how our perfectly good and reliable machine learning methods may suddenly perform badly when we are dealing with a very high-dimensional space. But what exactly do these two acronyms do? They are essentially transformation methods used for dimensionality reduction. If we can project our data from a higher-dimensional space to a lower one while keeping most of the relevant information, that makes life a lot easier for our learning methods.
In our data, the 48x48 pixel images contribute 2,304 columns. Modeling in such a high-dimensional space, our model could perform badly, so it is the perfect time to introduce dimensionality reduction methods.
3.2 Principal Component Analysis
In a nutshell, PCA is a linear transformation algorithm that seeks to project the original
features of our data onto a smaller set of features (or subspace) while still retaining most of the
information. To do this the algorithm tries to find the most appropriate directions/angles (which
are the principal components) that maximize the variance in the new subspace.
We know that principal components are orthogonal to each other. As such, when generating the covariance matrix in our new subspace, the off-diagonal values will be zero and only the diagonals (the eigenvalues) will be non-zero. It is these diagonal values that represent the variances of the principal components, i.e. the information about the variability of our features.
This is what our final preprocessed data looks like:
The method follows (see the sketch after these steps):
1. Standardize the data (already done)
2. Calculate the eigenvectors and eigenvalues of the covariance matrix
3. Create a list of (eigenvalue, eigenvector) tuples
4. Sort the (eigenvalue, eigenvector) pairs from high to low
5. Calculate the explained variance from the eigenvalues
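A sketch of steps 1 to 5 above, assuming X_std is the standardized pixel matrix (the full 2,304 x 2,304 covariance matrix makes this computationally heavy; it is shown for illustration):

```python
import numpy as np

# 2. covariance matrix and its eigendecomposition (eigh: symmetric matrix)
cov_mat = np.cov(X_std.T)
eig_vals, eig_vecs = np.linalg.eigh(cov_mat)

# 3.-4. pair eigenvalues with eigenvectors and sort from high to low
eig_pairs = sorted(
    [(eig_vals[i], eig_vecs[:, i]) for i in range(len(eig_vals))],
    key=lambda p: p[0], reverse=True,
)

# 5. explained variance (and its cumulative sum) from the eigenvalues
total = eig_vals.sum()
explained_var = [val / total for val, _ in eig_pairs]
cum_explained_var = np.cumsum(explained_var)
```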
Takeaway from the plot:
There are two plots above, a smaller one embedded within the larger plot. The smaller plot (green and red) shows the distribution of the individual and cumulative explained variances across all features, while the larger plot (gold and black) portrays a zoomed section of the explained variances only.
As we can see, out of our 2,304 features or columns, approximately 90% of the explained variance can be captured using just over 107 features. So, if we wanted to implement PCA on this, extracting the top 107 features would be a very logical choice, as they already account for the majority of the variance.
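One way to pick that cutoff programmatically is sketched here, using scikit-learn's PCA rather than the manual eigendecomposition:

```python
import numpy as np
from sklearn.decomposition import PCA

# Fit a full PCA, then find the smallest number of components whose
# cumulative explained variance ratio reaches 90%.
pca_full = PCA().fit(X_std)
n_90 = int(np.searchsorted(np.cumsum(pca_full.explained_variance_ratio_), 0.90)) + 1
print(n_90)  # roughly 107 on our data

X_pca = PCA(n_components=n_90).fit_transform(X_std)
```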
Visualizing the Eigenvalues:
As alluded to above, the PCA method seeks to obtain the optimal directions (or eigenvectors) that capture the most variance (spread out the data points the most). Therefore, it may be informative to visualize these directions and their associated eigenvalues. For the purposes of this notebook, and for speed, we invoke PCA to extract only the top 28 components. Of interest is that when one compares the first component "Eigenvalue 1" to the 28th component "Eigenvalue 28", it is obvious that more complicated directions or components are being generated in the search to maximize variance in the new feature subspace.
Interactive visualizations of PCA representation
When it comes to these dimensionality reduction methods, scatter plots are most commonly used because they allow for convenient visualization of clustering (if any exists), and that is exactly what we do as we plot the first two principal components. We observed that there are no discernible clusters for the first two principal components.
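A sketch of that scatter plot (assuming X_pca and the label vector y from earlier):

```python
import matplotlib.pyplot as plt

# First two principal components, colored by emotion (3 = Happy, 4 = Sad).
plt.figure(figsize=(8, 6))
for label, name in [(3, "Happy"), (4, "Sad")]:
    mask = (y == label)
    plt.scatter(X_pca[mask, 0], X_pca[mask, 1], s=4, alpha=0.4, label=name)
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.legend()
plt.show()
```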
Improvements:
Looking at the reconstruction of the original image vs. the image generated after PCA, the reconstructed images are not similar enough to the originals to discern them categorically. Facial expressions can be subtle, and a lot more information will be needed to detect them.
Sometimes even the naked eye fails to read the reconstructed images' emotions. Hence, 90% is not enough information. Let's move to 95% variance (259 components).
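A sketch of the 95% run and the reconstruction used for the visual comparison (scaler is the StandardScaler fitted earlier):

```python
from sklearn.decomposition import PCA

# Keep enough components to explain 95% of the variance (~259 here),
# then map the reduced data back to pixel space for inspection.
pca95 = PCA(n_components=0.95)
X_reduced = pca95.fit_transform(X_std)
X_recon = pca95.inverse_transform(X_reduced)

# Undo the standardization and reshape to 48x48 to view a face.
recon_images = scaler.inverse_transform(X_recon).reshape(-1, 48, 48)
```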
But as we know, PCA is an unsupervised method and therefore not optimized for separating different class labels. Classifying more accurately is what we try to accomplish with the very next method, i.e. LDA.
3.3 Linear Discriminant Analysis
LDA, much like PCA, is a linear transformation method commonly used in dimensionality reduction tasks. However, unlike PCA, which is an unsupervised learning algorithm, LDA falls into the class of supervised learning methods. The goal of LDA is, given the available class labels, to maximize the separation between the different classes by computing the component axes (linear discriminants) that do this.
LDA Implementation from Scratch
The objective of LDA is to preserve the class separation information whilst still reducing the dimensions of the dataset. Implementing the method from scratch can roughly be split into four distinct stages, as below (a code sketch follows stage D).
A. Projected Means
Since this method was designed to take class labels into account, we first need to establish a suitable metric with which to measure the 'distance' or separation between different classes. Let's assume that we have a set of data points $x$ that belong to one particular class $w$. In LDA, the first step is then to project these points onto a new line $Y$ that contains the class-specific information, via the transformation

$$Y = \omega^\intercal x$$

The idea is then to find some method that maximizes the separation of these new projected variables. To do so, we first calculate the projected mean, $\tilde{\mu} = \omega^\intercal \mu$, where $\mu$ is the mean of the original points in the class.
B. Scatter Matrices and their solutions: Having introduced our projected means, we now need a function that represents the difference between the means, which we can then maximize. As in linear regression, where the most basic case is to find the line of best fit, we need the equivalent of the variance in this context. This is where we introduce scatter matrices, where the scatter is the equivalent of the variance:

$$\tilde{S}^{2} = \sum_{y \in Y} \left(y - \tilde{\mu}\right)^{2}$$
C. Selecting Optimal Projection Matrices
D. Transforming features onto new subspace
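A rough sketch of stages A to D for our two-class problem (illustrative only; at the full 2,304-pixel dimensionality the scatter matrices are large and the pseudo-inverse is expensive):

```python
import numpy as np

classes = np.unique(y)
overall_mean = X_std.mean(axis=0)
n_features = X_std.shape[1]

# B. within-class (S_W) and between-class (S_B) scatter matrices
S_W = np.zeros((n_features, n_features))
S_B = np.zeros((n_features, n_features))
for c in classes:
    Xc = X_std[y == c]
    mean_c = Xc.mean(axis=0)
    S_W += (Xc - mean_c).T @ (Xc - mean_c)
    diff = (mean_c - overall_mean).reshape(-1, 1)
    S_B += len(Xc) * (diff @ diff.T)

# C. optimal projection: leading eigenvector of S_W^-1 S_B
eig_vals, eig_vecs = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)
order = np.argsort(eig_vals.real)[::-1]
W = eig_vecs[:, order[:1]].real   # two classes -> one linear discriminant

# D. transform features onto the new subspace
X_lda_scratch = X_std @ W
```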
LDA Implementation via Sklearn: We used scikit-learn's built-in LDA class, invoking an LDA model as follows:
The syntax for the LDA implementation is much like PCA's, whereby one calls the fit and transform methods, which fit the LDA model to the data and then apply the LDA dimensionality reduction to it. However, since LDA is a supervised learning algorithm, there is a second argument that the user must provide: the class labels, which in this case are the emotion labels.
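A minimal sketch of that call (note the labels y passed as the second argument, which is what makes LDA supervised):

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Two classes give at most one linear discriminant (LD 1).
lda = LinearDiscriminantAnalysis(n_components=1)
X_lda = lda.fit_transform(X_std, y)
```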
Interactive visualizations of LDA representation:
From the scatter plot above, we can see that the data points are more clearly clustered when using LDA, as compared to PCA, which ignores class labels. This is the inherent advantage of having class labels to supervise the method.
4. Modeling
4.1 Support Vector Machine
An SVM can be considered an extension of the perceptron. Using the perceptron algorithm, we minimize misclassification errors. In SVMs, however, our optimization objective is to maximize the margin between the classes. The margin is defined as the distance between the separating hyperplane (decision boundary) and the training samples (support vectors) that are closest to this hyperplane.
Input X: components from PCA, i.e. 107 components.
Running an SVM classifier with default parameters on it, we get an accuracy of 62%.
Input X: components from PCA, i.e. 259 components.
Running an SVM classifier with default parameters on it, we get an accuracy of 65%.
Input X: output from LDA, i.e. LD 1.
Running an SVM classifier with default parameters, we get an accuracy of 66.4%.
The misclassification rate is 33.6%, so we will try to fit a neural network model to classify with higher accuracy.
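A hedged sketch of these runs: a default-parameter SVC on the PCA features (the 259-component and LDA runs are identical with the corresponding input). The train/test split shown is an assumption; the report used its own split.

```python
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X_train, X_test, y_train, y_test = train_test_split(
    X_pca, y, test_size=0.2, random_state=42)

clf = SVC()  # default parameters (RBF kernel)
clf.fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))
```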
4.2 Neural Networks
A neural network is a computational model that works in a way similar to the neurons in the human brain. Each neuron takes an input, performs some operations, then passes the output to the following neuron. Having finished pre-processing and splitting our dataset, we can start implementing our neural network.
We designed a simple neural network with one hidden layer, i.e. a vanilla NN with 50 nodes and the hyperbolic tangent activation function.
We used a simple neural network with one hidden layer of 50 nodes. The learning rate is quite low in order to find the optimum solution. A mix of gradient descent and the momentum method is used. The hyperbolic tangent function is applied in the hidden layer, and a cross-entropy loss is computed from the softmax output. An accuracy of 65.8% was achieved.
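As a rough stand-in for this setup, scikit-learn's MLPClassifier can approximate it (one 50-node tanh hidden layer, SGD with momentum, a low learning rate, cross-entropy loss); the exact hyperparameter values below are assumptions:

```python
from sklearn.neural_network import MLPClassifier

nn = MLPClassifier(
    hidden_layer_sizes=(50,),   # one hidden layer with 50 nodes
    activation="tanh",          # hyperbolic tangent in the hidden layer
    solver="sgd",               # gradient descent ...
    momentum=0.9,               # ... with momentum
    learning_rate_init=1e-4,    # quite low learning rate
    max_iter=500,
)
nn.fit(X_train, y_train)
print(nn.score(X_test, y_test))
```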
The maximum accuracy is achieved rather quickly in this method using gradient descent and
momentum.
4.3 Conclusion
Our model misclassifies about 33 times out of 100, so we looked at the original images to see which features it fails to predict correctly. Pictures like the following are what our model cannot predict right, perhaps because of the hair, the eyes, or the lighting. As the image set is very varied, some error is expected. Because of the time constraint we were not able to run a CNN (Convolutional Neural Network) on the dataset, but that is our next step.
Many pictures in our data had watermarks just like this one, and these were misclassified. The majority of our training data does not have watermarks, which is also a reason the model cannot classify to its maximum capacity.
5. Scope for Improvement
5.1 CNN (Convolutional Neural Network) and Parameter Tuning
We were not able to tune the parameters of our neural network model because of the time crunch, as training on this huge dataset took a long time. So, going forward, not for the grades but for our own learning, we will be focusing on TensorFlow and CNNs.
Traditional neural networks applied to image classification have many more parameters and take a lot of time to train on a CPU. CNNs are faster and are applied heavily in image and video recognition, recommender systems, and natural language processing. CNNs share weights in convolutional layers, which means that the same filter weight bank is used for each receptive field in the layer; this reduces the memory footprint and improves performance.
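As a pointer for that next step, here is a hedged sketch of a small CNN in Keras for our 48x48 grayscale, two-class problem. The architecture and hyperparameters are our assumptions, not tested results:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(48, 48, 1)),
    layers.Conv2D(32, 3, activation="relu"),   # shared filter weights
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),     # Happy vs. Sad
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
# model.fit(images[..., None] / 255.0, (y == 3).astype("float32"),
#           epochs=10, validation_split=0.1)
```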
Attachments – Python Notebooks and Code
1. Python Project_Image Classification.ipynb: initial data exploration and preprocessing; dimensionality reduction by PCA and LDA
2. Python Project_Image Classification2.ipynb: SVM implementation on top of PCA and LDA (comparison)
3. Vanilla Neural Network.ipynb: neural network implementation