REPUBLIC OF TURKEY
YILDIZ TECHNICAL UNIVERSITY
FACULTY OF ELECTRICAL AND ELECTRONICS
DEPARTMENT OF COMPUTER ENGINEERING
REAL TIME EMOTION DETECTION FROM VIDEOS
10011082 − Cafer YILDIZ
10011015 − Musa GÖKMEN
SENIOR PROJECT
Advisor
Assoc. Prof. Mine Elif KARSLIGİL
January, 2017
© All rights reserved to Yıldız Technical University, Computer Engineering Department.
ACKNOWLEDGEMENTS
This study was prepared within the scope of the Senior Project course at the Yildiz Technical University Computer Engineering Department. We are grateful to our families for always supporting us and standing by us, and to our advisor, Associate Professor Mine Elif KARSLIGİL, for guiding us with her knowledge and leading us with her experience throughout the project.
Cafer YILDIZ
Musa GÖKMEN
TABLE OF CONTENTS
LIST OF ABBREVIATIONS
LIST OF FIGURES
LIST OF TABLES
1 Introduction
2 Literature Review
  2.1 Steps of Facial Expression Recognition
    2.1.1 Creating the Shape and Profile Models
    2.1.2 Feature Extraction
    2.1.3 Classification
3 Feasibility Studies
  3.1 Technical Feasibility
    3.1.1 Software Feasibility
    3.1.2 Hardware Feasibility
  3.2 Legal Feasibility
  3.3 Schedule Feasibility
  3.4 Financial Feasibility
    3.4.1 Software Costs
    3.4.2 Hardware Costs
    3.4.3 Employee Costs
4 System Analysis
  4.1 Active Shape Model
    4.1.1 Creating the Shape Model
    4.1.2 Creating the Profile Model
    4.1.3 Model Searching
  4.2 Local Binary Pattern Uniform (LBP)
  4.3 Comparing the Histogram with the Histograms of Emotions
5 System Architecture Design
  5.1 Cohn-Kanade Dataset
6 Experimental Results
7 Conclusion
References
Curriculum Vitae
LIST OF ABBREVIATIONS
ASM Active Shape Model
BSD Berkeley Software Distribution
CPU Central Processing Unit
FACS Facial Action Coding System
FER Facial Expression Recognition
GB Gigabyte
GHz Gigahertz
HCI Human Computer Interaction
HDD Hard Disk Drive
IDE Integrated Development Environment
JDK Java Development Kit
KNN K Nearest Neighbour
LBP Local Binary Pattern
MB Megabyte
PC Personal Computer
PCA Principal Component Analysis
RAM Random-Access Memory
TB Terabyte
TRY Turkish Lira (New)
LIST OF FIGURES
Figure 3.1 Gantt diagram of the project
Figure 4.1 76-point face shape example (modified from [19])
Figure 4.2 Before and after alignment
Figure 4.3 How the shape model is generated and used (modified from [20])
Figure 4.4 Different shapes generated by using different b values
Figure 4.5 Search profile
Figure 4.6 Initial shape
Figure 4.7 Search along the sampled profile and best fit location (modified from [22])
Figure 4.8 (Right): Candidate shape. (Left): Candidate shape modeled on the shape model
Figure 4.9 An example of the LBP operator (modified from [25])
Figure 4.10 Circularly symmetric neighbor sets for different values of m and R (modified from [25])
Figure 4.11 (Left): Local Binary Pattern applied, (Right): 58-uniform Local Binary Pattern applied
Figure 5.1 Class Diagram
Figure 5.2 Block diagram
Figure 5.3 Sample face expression images from the Cohn-Kanade database (modified from [28])
Figure 6.1 (Left): Initial, (Middle): Candidate, (Right): Result
Figure 6.2 (Left): Initial, (Middle): Candidate, (Right): Result
Figure 6.3 (Left): Initial, (Middle): Candidate, (Right): Result
LIST OF TABLES
Table 3.1 System requirements
Table 3.2 Software cost
Table 3.3 Hardware cost
Table 3.4 Employee cost
Table 6.1 Results of experiments with training dataset and k=1
Table 6.2 Results of experiments with training dataset and k=3
Table 6.3 Results of experiments with training dataset and k=5
Table 6.4 Results of experiments with test dataset and k=1
Table 6.5 Results of experiments with test dataset and k=3
Table 6.6 Results of experiments with test dataset and k=5
ABSTRACT
REAL TIME EMOTION DETECTION FROM VIDEOS
Cafer YILDIZ
Musa GÖKMEN
Department of Computer Engineering
Senior Project
Advisor: Assoc. Prof. Mine Elif KARSLIGİL
With technology entering every space in which people live, it has become a necessity for machines to recognize and analyze emotions in real time. Nowadays, many researchers are working on the recognition of facial expressions. In this work, the face and eye regions were detected using OpenCV. Once these regions had been identified, the Active Shape Models algorithm was used to detect other important regions such as the mouth and the nose. After all regions were found, feature extraction was performed for each region using the Local Binary Pattern algorithm for emotion recognition. Finally, the emotion was determined by the K-NN classification algorithm. The results of this study were tested on 259 samples. The success rate was 91.7% on the training data and 80.2% on the test data.
Keywords: Facial expression, real-time, face detection, active shape models
ÖZET
REAL-TIME EMOTION DETECTION FROM VIDEO IMAGES
Cafer YILDIZ
Musa GÖKMEN
Department of Computer Engineering
Senior Project
Advisor: Assoc. Prof. Mine Elif KARSLIGİL
With technology entering every area of people's lives, it has become a necessity for machines to recognize and analyze emotions in real time. Today, many researchers are working on the recognition of facial expressions. In this work, the face and eye regions were detected using OpenCV. After these regions were detected, other important regions of the face, such as the mouth and nose, were located with the Active Shape Models algorithm. Once all regions had been found, features were extracted for each region using the LBP algorithm for emotion recognition. Finally, the emotion was determined with the KNN classification algorithm. The results of this study were tested on 259 samples. A success rate of 91.7% was achieved on the training data and 80.2% on the test data.
Keywords: Emotion analysis, real-time, face recognition, active shape models
1 Introduction
Emotions are expressed through the different forms of facial regions and play an important role in ensuring proper communication between individuals. The recognition of facial expressions by machines can contribute significantly to the communication between the user and the computer; in the future, the computer will thus be able to make recommendations according to the emotional state of its users. Because the basic structure of the human face is largely the same everywhere, facial expressions are to a great extent similar across individuals. Consequently, a system trained with a representative group of facial expressions can determine emotional expressions more reliably, and this is the core task undertaken in this study.
This project aims to automatically determine the basic emotional expressions in 2D images taken from videos.
The system receives the captured video frame by frame and then detects the faces present in each frame using the OpenCV library. The training datasets used in our study are the MUCT database, which consists of 3755 faces with 76 manual landmarks [1], and the Cohn-Kanade database, which was collected for research in automatic facial image analysis and synthesis and for perceptual studies. We apply Active Shape Models (ASM) [2] to MUCT in order to calculate a profile by averaging the profile gradients of all the shapes in the dataset. The generated profile is used to detect the fitting face shape. We use the Local Binary Pattern [3] to extract the face's features. Finally, using the KNN classification algorithm [4], we classify the facial expressions into the happiness, anger, surprise, disgust, fear, sadness, and neutral classes.
2 Literature Review
The human face has an important place in recognizing emotions, because distinctive features emerge in the different forms the face takes for each emotion. The emotional state can be analyzed by examining these characteristics. Facial Expression Recognition (FER) [5][6][7] is used in different areas, among them Computer Vision, Digital Image Processing, and Artificial Intelligence [8]. In recent years, emotion recognition has become a necessity with the increasing interaction between computers and humans [9], which is why it can be called a popular topic.
The aim of this work is to develop an interactive computer vision system for recognizing facial expressions from videos. In parallel with the progress of technology, the need for interaction with machines in image processing is increasing day by day, and different techniques have been presented for the classification of emotions to meet this growing need.
Some of the difficulties reported in previous studies on emotion recognition are listed below:
• Not operating in real time
• The small number of face combinations used for performance evaluation
• The use of geometric and visual techniques to extract features from facial expressions
• The generalization of classification algorithms in the classification of facial expressions
• Processing large amounts of data
• Large-size images
• The dynamics of facial expressions
In this section, we focus on various feature extraction methods that use appearance-based features for recognizing human facial expressions. Different approaches have been developed for extracting features from face images, such as Gabor filters [10], Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) [11], and the Local Binary Pattern (LBP), combined with different classifiers such as the Support Vector Machine (SVM) [12] and KNN (K Nearest Neighbour).
2.1 Steps of Facial Expression Recognition
Facial expression recognition proceeds in three main steps: creating the shape and profile models, feature extraction, and classification.
2.1.1 Creating the Shape and Profile Models
The first step in building the shape model is aligning the shapes in the data set. In the next step, the shape model is created using PCA. As a final step, a profile model is created for each point of each shape in the dataset. When the profile model is created, the gray-level values of the pixels around each point are used.
2.1.2 Feature Extraction
Features are extracted for each region after the meaningful regions of the image taken from the video have been determined. The feature extraction step is important because the emotion is recognized by comparing these features. In this study, feature extraction was texture-based.
2.1.3 Classification
After the feature extraction from the face and the facial components of the input image, the next step is to classify the features according to their closeness. KNN (K Nearest Neighbour) [13] is used for classification.
In summary, similar applications based on the same research have been compared and found to be lacking certain properties. This application attempts to improve on them and to add the missing features as they are identified.
3 Feasibility Studies
To analyze the project, we carried out feasibility studies covering the technical, legal, scheduling, and financial aspects.
3.1 Technical Feasibility
The technical feasibility consists of two parts: choosing the suitable software and the suitable hardware.
3.1.1 Software Feasibility
Microsoft Visual Studio was used as the development environment, because the other options are limited. Windows 10 was used as the operating system, and C++ was selected as the programming language for this project.
3.1.1.1 Microsoft Visual Studio
Microsoft Visual Studio is an integrated development environment (IDE) from Microsoft [14]. Visual Studio supports different programming languages and allows the code editor and debugger to support (to varying degrees) nearly any programming language, provided a language-specific service exists. For this reason, it was preferred.
3.1.1.2 C++
C++ is a general-purpose programming language. It has imperative, object-oriented
and generic programming features, while also providing facilities for low-level
memory manipulation. It was designed with a bias toward system programming and
embedded, resource-constrained and large systems, with performance, efficiency and
flexibility of use as its design highlights [15].
3.1.1.3 OpenCV
OpenCV (Open Source Computer Vision) is a library of programming functions mainly
aimed at real-time computer vision, originally developed by Intel’s research center in
Nizhny Novgorod (Russia), later supported by Willow Garage and now maintained
by Itseez. The library is cross-platform and free for use under the open-source BSD
license[16].
3.1.1.4 Windows 10
Windows 10 is an operating system put on the market by Microsoft. It is used on personal computers, notebooks, netbooks, tablet PCs, and media centers. Microsoft released Windows 10 on 29 July 2015. Its main advantage is that it is easy to use.
3.1.2 Hardware Feasibility
Table 3.1 was created to calculate the hardware requirements; the requirements were determined with its help.
Table 3.1 System requirements
Software                 RAM      HDD      CPU      Graphics card
Microsoft Visual Studio  1 GB     4 GB     1.6 GHz  256 MB
JDK                      64 MB    396 MB   -        -
Windows 10               2 GB     20 GB    1 GHz    128 MB
Total                    3.05 GB  24.4 GB  2.6 GHz  384 MB
According to Table 3.1, the minimum system requirements are 3.05 GB of RAM, 24.4 GB of HDD space, a 2.6 GHz CPU, and a 384 MB graphics card. In this project, a notebook with 8 GB of RAM, a 1.5 TB HDD, a 2.6 GHz CPU, and a 2 GB graphics card was used.
3.2 Legal Feasibility
All rights reserved to Yıldız Technical University, Computer Engineering Department.
3.3 Schedule Feasibility
For the schedule feasibility, a Gantt diagram was created to determine the duration and the milestones of the project. The senior project started on 30 September 2016, and it was decided that it would be completed by 30 December 2016, as shown by the Gantt diagram in Figure 3.1.
Figure 3.1 Gantt diagram of the project
3.4 Financial Feasibility
A financial analysis was carried out for the financial feasibility study; the total cost is presented in this section.
3.4.1 Software Costs
The software is divided into purchased and free items: Microsoft Visual Studio and Windows 10 were purchased. Windows 10 costs 900 TRY, and Microsoft Visual Studio costs 1875 TRY per year. On the other hand, the JDK is free to use. As a result, the software costs total about 2775 TRY. All the software used is shown in Table 3.2.
Table 3.2 Software cost
Program                  License             Price (TRY)
Microsoft Visual Studio  Commercial license  1875
JDK                      Free license        Free
Windows 10               OEM license         900
Total                    -                   2775
See the Microsoft Visual Studio price [17] and the Windows 10 price [18].
3.4.2 Hardware Costs
No device on the market matches the minimum requirements exactly, so two notebooks that exceed the requirements determined in the feasibility study were used. Their costs are shown in Table 3.3.
Table 3.3 Hardware cost
Hardware    Price (TRY)
Laptop x 2  2000 + 4700
Total       6700
3.4.3 Employee Costs
All employee salaries are shown in Table 3.4.
Table 3.4 Employee cost
Employee    Price (TRY)
Employee 1  3000 per month (4 months)
Employee 2  3000 per month (4 months)
Total       24000
4 System Analysis
Since this study is concerned with analyzing faces, and since the ASM is one of the most popular methods in this domain, it was decided to build this project on ASM. In this section, the ASM and its processing steps are discussed in detail.
4.1 Active Shape Model
ASM is one of the model-based approaches; it is built from a shape model and a profile model. The shape model defines the variations of the shapes in the training set. The profile model, on the other hand, provides statistical data representing the gray-level texture around each landmark point. The shape and profile models created in the training step are used to position the shapes in the test images during the search step.
4.1.1 Creating the Shape Model
The shape model is created in three steps: manually marking the landmark points in the training set, aligning the marked shapes (removing the differences of scale, position, and rotation angle), and finally obtaining the statistical data describing the shape variations.
4.1.1.1 Marking the Shapes
The shape of an object is formed by a set of N points, each of which is d-dimensional. These points should be selected on or around regions that do not change across the images in the training set, and they should generally reflect the overall shape and character of the object. For the face, for example, points on the facial boundary, the eyes, the nose, and corner points can be used to create face shapes. A 76-point face shape is shown in Figure 4.1.
Figure 4.1 76-point face shape example (modified from [19])
An object is described by points, referred to as landmark points. The landmark points are (manually) determined in a set of training images. From these collections of landmark points, a point distribution model [28] is constructed as follows. The landmark points (x_1, y_1), ..., (x_n, y_n) are stacked in a shape vector as in Equation 4.1:

S = (x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_n)^T \quad (4.1)

A shape vector S_j is defined for each of the k images in the training set (S_j, j = 1, 2, 3, \ldots, k).
4.1.1.2 Aligning a Set of Shapes
During training we need to align not just two shapes but a set of shapes. By definition,
alignment means that the total distance from the aligned shapes to the mean shape is
minimized. The mean shape is the average of the aligned shapes. If we knew the mean
shape beforehand, we could simply align all the shapes to the mean shape and be done.
Since we don’t have this prior knowledge, we instead create an initial provisional mean
shape from a reference shape and iterate using the following algorithm. The reference
shape can be any shape in the set of shapes.
1. Translate all shapes so that their centers are at the origin (0,0)
2. Choose one shape S and scale it so that ||S|| = 1
3. Store S as the initial reference S0
4. Align all shapes to the shape S
5. Compute the mean of the aligned shapes
6. Align the new mean shape to S0 and scale it so that ||\bar{S}|| = 1
7. Repeat steps 4-6 until the mean shape converges
Before alignment begins, it may be beneficial to position the reference shape at the origin and prescale its size to unity; however, the absolute position and size are not essential. Figure 4.2 shows the difference between shapes before and after alignment, and a code sketch of the alignment loop follows the figure.
Figure 4.2 Before and after alignment
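The following C++ sketch illustrates the iterative alignment loop above using OpenCV. It is a minimal sketch, not the project's implementation: the helper alignToRef (a least-squares similarity fit of one shape to another) is assumed to exist, and each shape is stored as an N x 2 matrix with one landmark per row.

```cpp
#include <opencv2/core.hpp>
#include <vector>

// Hypothetical helper: similarity-align `shape` to `ref` over
// translation, scale and rotation (Procrustes fit).
cv::Mat alignToRef(const cv::Mat& shape, const cv::Mat& ref);

// Step 1: move a shape's center of gravity to the origin.
static void centerShape(cv::Mat& s) {                 // s is N x 2, CV_64F
    cv::Mat colMean;
    cv::reduce(s, colMean, 0, cv::REDUCE_AVG);        // 1 x 2: (mean x, mean y)
    for (int i = 0; i < s.rows; ++i) s.row(i) -= colMean;
}

// Steps 2-7: iteratively align all shapes and estimate the mean shape.
cv::Mat alignShapes(std::vector<cv::Mat>& shapes, int maxIters = 100) {
    for (auto& s : shapes) centerShape(s);
    cv::Mat mean = shapes[0] / cv::norm(shapes[0]);   // step 2: ||S|| = 1
    cv::Mat s0 = mean.clone();                        // step 3: keep S0
    for (int it = 0; it < maxIters; ++it) {
        for (auto& s : shapes) s = alignToRef(s, mean);   // step 4
        cv::Mat newMean = cv::Mat::zeros(mean.size(), mean.type());
        for (const auto& s : shapes) newMean += s;        // step 5
        newMean /= static_cast<double>(shapes.size());
        newMean = alignToRef(newMean, s0);                // step 6
        newMean /= cv::norm(newMean);
        if (cv::norm(newMean - mean) < 1e-7) return newMean;  // step 7
        mean = newMean;
    }
    return mean;
}
```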
4.1.1.3 Shape Model
To create the shape model, we use a standard principal components approach to
generate a set of directions, or axes, along which the mean shape can be flexed in
shape space to best represent the way the faces vary in the training set. This is done
as described below. Figure 4.3 is an overview.
Figure 4.3 How the shape model is generated and used (modified from [20])
Principal component analysis (PCA) is applied to the shape vectors by computing the mean shape as in Equation 4.2:

\bar{S} = \frac{1}{n} \sum_{i=1}^{n} S_i \quad (4.2)
and the covariance matrix as in Equation 4.3:

C = \frac{1}{n-1} \sum_{i=1}^{n} (S_i - \bar{S})(S_i - \bar{S})^T \quad (4.3)
The eigensystem of the covariance matrix is then computed. The eigenvectors corresponding to the t largest eigenvalues \lambda_i are retained in a matrix \Phi = (\phi_1 | \phi_2 | \ldots | \phi_t). A shape can now be approximated by Equation 4.4:

S \approx \bar{S} + \Phi b \quad (4.4)

where b is a vector of t elements containing the model parameters, computed by Equation 4.5:

b = \Phi^T (S - \bar{S}) \quad (4.5)
When fitting the model to a set of points, the values of b are constrained to lie within the range \pm m \sqrt{\lambda_i}, where m usually has a value between two and three.
The number t of eigenvalues to retain is chosen so as to explain a certain proportion f_v of the variance in the training shapes, usually ranging from 90% to 99.5%. The desired number of modes is given by the smallest t for which Equation 4.6 holds:

\sum_{i=1}^{t} \lambda_i \geq f_v \sum_{i=1}^{n} \lambda_i \quad (4.6)
Different shapes generated by using different b values are shown in Figure 4.4.
Figure 4.4 Different shapes generated by using different b values
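As an illustration of Equations 4.2-4.6, the sketch below builds the shape model with OpenCV's cv::PCA. It is a minimal sketch under the assumption that the aligned shapes are stored one per row as CV_32F vectors (x_1..x_n, y_1..y_n); it is not the project's actual implementation.

```cpp
#include <opencv2/core.hpp>
#include <algorithm>
#include <cmath>

struct ShapeModel {
    cv::Mat mean;    // 1 x 2n mean shape (Eq. 4.2)
    cv::Mat phi;     // t x 2n retained eigenvectors, one per row
    cv::Mat lambda;  // t x 1 retained eigenvalues
};

// cv::PCA computes the mean (Eq. 4.2) and the covariance eigensystem
// (Eq. 4.3), keeping enough modes to explain `fv` of the variance (Eq. 4.6).
ShapeModel buildShapeModel(const cv::Mat& alignedShapes, double fv = 0.95) {
    cv::PCA pca(alignedShapes, cv::noArray(), cv::PCA::DATA_AS_ROW, fv);
    return { pca.mean, pca.eigenvectors, pca.eigenvalues };
}

// Eq. 4.5: b = Phi^T (S - S_bar); `s` is a 1 x 2n shape vector.
cv::Mat shapeToParams(const ShapeModel& m, const cv::Mat& s) {
    return m.phi * (s - m.mean).t();
}

// Eq. 4.4: S = S_bar + Phi b; `b` is a t x 1 parameter vector.
cv::Mat paramsToShape(const ShapeModel& m, const cv::Mat& b) {
    return m.mean + (m.phi.t() * b).t();
}

// Constrain each b_i to +/- k*sqrt(lambda_i), with k between two and three.
void clampParams(const ShapeModel& m, cv::Mat& b, double k = 3.0) {
    for (int i = 0; i < b.rows; ++i) {
        double lim = k * std::sqrt((double)m.lambda.at<float>(i));
        b.at<float>(i) = (float)std::min(std::max((double)b.at<float>(i), -lim), lim);
    }
}
```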
4.1.2 Creating the Profile Model
The profile model is created to define the attributes of the texture around the landmark points. In other words, the profile model determines how the texture around a point should look. In the test images, when the shape is being aligned, the texture information around the sample points is extracted. This information is compared with the texture information obtained from the profile model, and the point's position is updated according to the comparison result. Thus, the points are moved to the most appropriate positions at each step, and the shape most similar to the object is obtained.
To create the profile model, suppose that for a given point we sample along a profile k pixels on either side of the model point in each training image. For the n-th landmark in the j-th image, a grey-level vector is obtained by sampling pixels along the normal to the line connecting the n-th and (n-1)-th landmarks; this gray-level information can be recorded as in Equation 4.7:

g_{nj} = [g_{nj1}, g_{nj2}, g_{nj3}, \ldots, g_{nj(2k+1)}] \quad (4.7)
To reduce the effects of global intensity changes, we sample the derivative grey values rather than the grey values themselves, as in Equation 4.8:

g'_{nj} = [(g_{nj1} - g_{nj2}), (g_{nj2} - g_{nj3}), \ldots, (g_{nj2k} - g_{nj(2k+1)})] \quad (4.8)
In order to reduce the impact of illumination changes and similar effects, the gray vectors are normalized as in Equation 4.9:

P_{nj} = \frac{g'_{nj}}{\sum_{i=1}^{2k} |g'_{nji}|} \quad (4.9)
Sampling the gray information for the n-th landmark in every image of the training set with the same method, we can build the profile model of the n-th labeled point; its mean gray-level information and covariance matrix are expressed as follows. The average profile of a point is given by Equation 4.10:

\bar{P}_n = \frac{1}{K} \sum_{j=1}^{K} P_{nj}, \quad n = 1, 2, 3, \ldots, N \quad (4.10)
The covariance matrix of a point is given by Equation 4.11:

S_{Pn} = \frac{1}{K-1} \sum_{j=1}^{K} (P_{nj} - \bar{P}_n)(P_{nj} - \bar{P}_n)^T, \quad n = 1, 2, 3, \ldots, N \quad (4.11)
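A minimal sketch of the profile sampling of Equations 4.7-4.9 is given below. The grayscale image, the landmark position, and the unit normal direction are assumed to be given; this is an illustration, not the project's code.

```cpp
#include <opencv2/core.hpp>
#include <algorithm>
#include <cmath>
#include <vector>

// Sample 2k+1 gray levels along the profile normal at a landmark (Eq. 4.7),
// take the derivative profile (Eq. 4.8) and normalize it (Eq. 4.9).
std::vector<double> normalizedProfile(const cv::Mat& gray,   // CV_8U image
                                      double px, double py,  // landmark
                                      double nx, double ny,  // unit normal
                                      int k) {
    std::vector<double> g(2 * k + 1);
    for (int i = -k; i <= k; ++i) {                  // Eq. 4.7
        int x = cvRound(px + i * nx), y = cvRound(py + i * ny);
        x = std::min(std::max(x, 0), gray.cols - 1); // clamp to the image
        y = std::min(std::max(y, 0), gray.rows - 1);
        g[i + k] = gray.at<uchar>(y, x);
    }
    std::vector<double> d(2 * k);                    // Eq. 4.8
    double absSum = 0.0;
    for (int i = 0; i < 2 * k; ++i) {
        d[i] = g[i] - g[i + 1];
        absSum += std::fabs(d[i]);
    }
    if (absSum > 0) for (double& v : d) v /= absSum; // Eq. 4.9
    return d;
}
```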
4.1.3 Model Searching
Based on the methods above, we obtain the shape model and the profile model, and then use the profile model to search an unknown image. The ASM starts the search for landmarks from the mean shape aligned to the position and size of the face determined by a global face detector. It then repeats the following two steps until convergence:
• suggest a tentative shape by adjusting the locations of the shape points through template matching of the image texture around each point;
• conform the tentative shape to the global shape model.
During training on manually landmarked faces, at each landmark we calculate the mean profile vector \bar{P}_n and the profile covariance matrix S_{Pn}. During searching, for each landmark we find the best fit along the profile, where the best profile gradient P_{sn} gives the minimum Mahalanobis distance d_{mn} to the model, as in Equation 4.12:

d_{mn} = (P_{sn} - \bar{P}_n)^T S_{Pn}^{-1} (P_{sn} - \bar{P}_n) \quad (4.12)

The search along a profile is illustrated in Figure 4.5, and a code sketch of this step follows the figure.
Figure 4.5 Search profile
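The sketch below illustrates Equation 4.12: the mean profile is slid over the longer profile sampled from the test image, and the offset with the smallest Mahalanobis distance wins. The derivative profiles are assumed to come from a routine such as normalizedProfile above; this is an illustrative sketch, not the project's code.

```cpp
#include <opencv2/core.hpp>
#include <limits>

// Slide the mean profile over the sampled profile and return the shift
// with the smallest Mahalanobis distance (Eq. 4.12).
int bestProfileShift(const cv::Mat& sampled,      // 2m x 1 derivative profile, CV_64F
                     const cv::Mat& meanProfile,  // 2k x 1 mean profile, CV_64F
                     const cv::Mat& covInv) {     // 2k x 2k inverse covariance
    const int len = meanProfile.rows;
    int bestShift = 0;
    double bestDist = std::numeric_limits<double>::max();
    for (int s = 0; s + len <= sampled.rows; ++s) {
        cv::Mat window = sampled.rowRange(s, s + len).clone();
        // cv::Mahalanobis returns the square root of Eq. 4.12; the
        // minimizing shift is the same either way.
        double d = cv::Mahalanobis(window, meanProfile, covInv);
        if (d < bestDist) { bestDist = d; bestShift = s; }
    }
    return bestShift;  // the landmark moves to the center of this window
}
```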
Before starting the search operation, the location of the desired object in the image must be determined. The Viola-Jones (VJ) face detection method is widely used for faces [21]. After the face position is determined, the initial shape is created: the mean shape (\bar{S}) recorded in the training phase is scaled and translated using the location and width of the detected face. Figure 4.6 shows the face region and the initial shape found with the VJ face detector.
Figure 4.6 Initial shape
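A minimal OpenCV sketch of this initialization step is shown below. The cascade file name and the normalized mean-shape coordinates are illustrative assumptions, not values taken from the project.

```cpp
#include <opencv2/objdetect.hpp>
#include <opencv2/imgproc.hpp>
#include <vector>

// Detect a face with the Viola-Jones cascade and place the mean shape
// (one landmark per row, coordinates normalized to [0,1]) inside its box.
std::vector<cv::Point2f> initialShape(const cv::Mat& frame,
                                      const cv::Mat& meanShape) {
    static cv::CascadeClassifier face("haarcascade_frontalface_default.xml");
    cv::Mat gray;
    cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
    std::vector<cv::Rect> faces;
    face.detectMultiScale(gray, faces, 1.1, 3);
    if (faces.empty()) return {};
    const cv::Rect r = faces[0];                      // first detected face
    std::vector<cv::Point2f> pts;
    for (int i = 0; i < meanShape.rows; ++i)          // scale and translate
        pts.emplace_back(r.x + meanShape.at<float>(i, 0) * r.width,
                         r.y + meanShape.at<float>(i, 1) * r.height);
    return pts;
}
```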
The profile comparison process is performed to find the best shape. Suppose that the mean profile created for each landmark (\bar{P}_n) is a vector of length 2k+1 and the profile sampled from the test image (P_{sn}) is a vector of length 2m+1, with m > k. For each point, the mean profile is shifted over the sampled profile. In each translation step (2m - 2k + 1 shifts in total), the corresponding part of the sampled profile is compared with the mean profile. The comparison is made by calculating the Mahalanobis distance between the vectors. The new position of the landmark point is determined by the shift step with the smallest Mahalanobis distance. The profile sampled for a landmark in the test image, the mean profile, and the comparison result are shown in Figure 4.7.
Figure 4.7 Search along the sampled profile and best fit location (modified from [22])
After the profile comparison has been performed for all the landmarks and the point positions have been updated, a new shape is obtained, called the candidate shape (Sc). When the candidate shape is created, the landmark points move independently of each other; therefore, the candidate shape may not resemble the object. For example, a landmark on the face boundary can get stuck on a weak edge in the image and fail to reach the location it should occupy in reality. To prevent such a situation, the candidate shape is conformed to the shape model. This is carried out by finding the model shape (produced by the shape model) closest to the candidate shape. A model can be aligned in any way with various geometric transformations (translation, scaling, rotation). Therefore, the vector b and the pose parameters (Xt: the displacement along the x-axis, Yt: the displacement along the y-axis, s: the scale, θ: the rotation angle) must be determined so as to produce the model shape closest to the candidate shape. The vector b and the pose parameters take the values that minimize the distance between the candidate shape and the model shape.
The algorithm shown below is used to find the most suitable vector b and pose parameters; a code sketch of this loop follows Figure 4.8.
1. Initialize b = 0.
2. Generate the model points S = \bar{S} + \Phi b.
3. Find s, Xt, Yt, θ to best fit the model points to the candidate shape, i.e. S_c \approx M(S), where S_c is the candidate shape after profile matching and M is the aligning function defined by the parameters s, Xt, Yt, θ.
4. Project S_c into the model space: S_p = M^{-1}(S_c).
5. Update the model parameters: b = \Phi^T (S_p - \bar{S}).
6. Go to step 2 and iterate until convergence.
The candidate shape and model shape are shown in Figure 4.8.
Figure 4.8 (Right): Candidate shape. (Left): Candidate shape modeled on the shape
model.
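The sketch below illustrates steps 1-6, reusing the ShapeModel helpers from the PCA sketch and the hypothetical alignToRef helper from the alignment sketch; an overload of alignToRef for flattened 1 x 2n shape vectors is assumed, and applying it toward the model frame plays the role of M^{-1}. This is a minimal illustration, not the project's implementation.

```cpp
#include <opencv2/core.hpp>

// Iteratively find the model shape closest to a candidate shape (steps 1-6).
// `candidate` is a 1 x 2n shape vector in image coordinates.
cv::Mat fitModelToCandidate(const ShapeModel& m, const cv::Mat& candidate) {
    cv::Mat b = cv::Mat::zeros(m.phi.rows, 1, CV_32F);    // step 1: b = 0
    for (int it = 0; it < 20; ++it) {
        cv::Mat S = paramsToShape(m, b);                  // step 2
        // Steps 3-4: align the candidate into the model's frame (M^-1).
        cv::Mat Sp = alignToRef(candidate, S);
        cv::Mat bNew = shapeToParams(m, Sp);              // step 5
        clampParams(m, bNew);                             // keep b plausible
        if (cv::norm(bNew - b) < 1e-6) { b = bNew; break; }
        b = bNew;                                         // step 6: iterate
    }
    return paramsToShape(m, b);  // the model shape closest to the candidate
}
```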
After all the regions of the face that affect the expressed emotion have been detected, working directly in the RGB [23] format is difficult for each related region, so all regions are processed in the YCbCr [24] format. Uniform Local Binary Pattern (LBP) is applied after the Y-channel image has been obtained, and a feature vector is generated from the histogram of the resulting LBP image. Which of the previously defined classes (anger, disgust, fear, happy, sad, neutral, surprise) this vector corresponds to is then determined using the KNN (K Nearest Neighbour) classification algorithm. A sketch of the color conversion step is given below.
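This minimal sketch assumes a BGR frame as delivered by OpenCV's capture interface and extracts the luma (Y) channel used by the LBP stage.

```cpp
#include <opencv2/imgproc.hpp>
#include <vector>

// Convert a BGR frame to YCbCr and keep only the luma (Y) channel.
cv::Mat lumaChannel(const cv::Mat& bgrFrame) {
    cv::Mat ycrcb;
    cv::cvtColor(bgrFrame, ycrcb, cv::COLOR_BGR2YCrCb);  // OpenCV orders Y, Cr, Cb
    std::vector<cv::Mat> planes;
    cv::split(ycrcb, planes);
    return planes[0];                                    // the Y channel
}
```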
4.2 Local Binary Pattern Uniform (LBP)
The LBP code is created from binary labels assigned by comparing the intensity values of the pixels around a point with the value of that point. That is, the value of each pixel is compared individually with the values of its neighbors: if a neighbor's value is greater than or equal to the center's value, the neighbor is labeled 1; if it is smaller, the neighbor is labeled 0 (see Figure 4.9). The decimal equivalent of the resulting binary sequence becomes the new value of the pixel. If the number of transitions between "0" and "1" in the circular pattern is less than or equal to two, the pattern is called uniform. For example, 11100011 and 00001000 are uniform patterns, but 11010111 is not. The histogram of the uniform patterns in each region forms the feature vector.
Figure 4.9 An example of the LBP operator (modified from [25])
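The following sketch computes the 8-neighbor uniform-LBP histogram (59 bins: one for each of the 58 uniform patterns plus one shared bin for all non-uniform patterns) over a grayscale region. It is an illustrative implementation, not the project's code.

```cpp
#include <opencv2/core.hpp>
#include <array>
#include <vector>

// Number of 0/1 transitions in a circular 8-bit pattern.
static int transitions(int code) {
    int t = 0;
    for (int i = 0; i < 8; ++i)
        t += ((code >> i) & 1) != ((code >> ((i + 1) % 8)) & 1);
    return t;
}

// 59-bin uniform-LBP histogram of a grayscale region (CV_8U).
std::vector<int> uniformLbpHistogram(const cv::Mat& gray) {
    // Map each of the 256 codes to a bin: the 58 uniform codes get their
    // own bins, every non-uniform code falls into bin 58.
    std::array<int, 256> binOf;
    int next = 0;
    for (int c = 0; c < 256; ++c)
        binOf[c] = (transitions(c) <= 2) ? next++ : 58;
    static const int dx[8] = {-1, 0, 1, 1, 1, 0, -1, -1};
    static const int dy[8] = {-1, -1, -1, 0, 1, 1, 1, 0};
    std::vector<int> hist(59, 0);
    for (int y = 1; y < gray.rows - 1; ++y)
        for (int x = 1; x < gray.cols - 1; ++x) {
            const int center = gray.at<uchar>(y, x);
            int code = 0;
            for (int i = 0; i < 8; ++i)      // neighbor >= center -> bit 1
                code |= (gray.at<uchar>(y + dy[i], x + dx[i]) >= center) << i;
            ++hist[binOf[code]];
        }
    return hist;
}
```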
Multiresolution analysis can be achieved by choosing different values of m and R, where m denotes the number of neighboring pixels around the center pixel and R represents the distance from the center pixel to each of the neighboring pixels. Figure 4.10 illustrates circularly symmetric neighbor sets for different values of m and R [25].
Figure 4.10 Circularly symmetric neighbor sets for different values of m and R (modified from [25])
In order to fully describe the dominant patterns contained in the face images, we extend the conventional LBP. Each pattern in the image is assigned a unique label by the following equation [26], given as Equation 4.13:

LBP_{(m,R)} = \sum_{i=0}^{m-1} u(t_i - t_c) \, 2^i \quad (4.13)
where t_c is the intensity of the center pixel, t_i is the intensity of neighbor i, and u(x) is the step function. It is clear that the LBP defined in Equation 4.13 is not rotation-invariant, as the intensity values t_i change when the circle is rotated by a specific angle. Two patterns should be treated as the same type if one can be obtained from the other by rotating it through a certain angle [25].
The results obtained after applying the Local Binary Pattern and after applying the uniform Local Binary Pattern are shown in Figure 4.11.
Figure 4.11 (Left): Local Binary Pattern applied, (Right): 58-uniform Local Binary Pattern applied
4.3 Comparing the Histogram with the Histograms of Emotions
In pattern recognition, the k-Nearest Neighbors algorithm (or k-NN for short) is a
non-parametric method used for classification and regression. In both cases, the input
consists of the k closest training examples in the feature space. The output depends
on whether k-NN is used for classification or regression.[27]
In this study, the relevant regions were found with the Active Shape Models algorithm, and after a feature vector had been extracted for each region with the 58-uniform Local Binary Pattern, the K-NN algorithm was used in the classification phase. First, the feature vector of each image in the data set is extracted and stored under the class of the relevant emotion. Then, during the testing phase, the corresponding regions of the face in the image coming from the video are detected and a feature vector is extracted for each region. The distance between this extracted feature vector and every stored vector of every class is computed, and the k nearest neighbors are examined. The emotion being tested is assigned to whichever class the majority of these k nearest neighbors belong to, as sketched below.
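A minimal sketch of this majority-vote classification is given below. The Euclidean distance between histograms is an assumption for illustration, since the text does not name the distance measure; chi-square is another common choice for LBP histograms. It is assumed that k does not exceed the number of stored examples.

```cpp
#include <algorithm>
#include <cmath>
#include <map>
#include <vector>

// One stored training example: an LBP histogram and its emotion label.
struct Example { std::vector<int> hist; int label; };

// Euclidean distance between two histograms of equal length (assumption:
// the project's actual distance measure is not named in the text).
static double dist(const std::vector<int>& a, const std::vector<int>& b) {
    double s = 0.0;
    for (size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        s += d * d;
    }
    return std::sqrt(s);
}

// Classify `query` by majority vote over its k nearest stored examples.
int knnClassify(const std::vector<Example>& train,
                const std::vector<int>& query, int k) {
    std::vector<std::pair<double, int>> byDist;        // (distance, label)
    for (const auto& e : train)
        byDist.emplace_back(dist(e.hist, query), e.label);
    std::partial_sort(byDist.begin(), byDist.begin() + k, byDist.end());
    std::map<int, int> votes;
    for (int i = 0; i < k; ++i) ++votes[byDist[i].second];
    return std::max_element(votes.begin(), votes.end(),
        [](const std::pair<const int, int>& a, const std::pair<const int, int>& b) {
            return a.second < b.second;
        })->first;
}
```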
5 System Architecture Design
First, all the images and the shapes of the images are read in the main class. These shapes are then sent to the GeneralizedProcrustesAnalysis class, where all shapes are aligned using the ProcrustesAnalysis and PointList classes. Then the shape model is generated by applying PCA to the aligned shapes, and the profile model is created using each shape in the set. Using the shape model and the profile model, the shape in the image is produced during the search process. The class diagram of the system is given in Figure 5.1, and a minimal skeleton of these classes is sketched after it.
Figure 5.1 Class Diagram
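The skeleton below makes the described structure concrete; the member signatures are illustrative assumptions based on the description above, not the project's actual interfaces.

```cpp
#include <opencv2/core.hpp>
#include <vector>

// A list of landmark points forming one shape.
class PointList {
public:
    std::vector<cv::Point2f> points;
};

// Similarity alignment of a single shape to a reference shape.
class ProcrustesAnalysis {
public:
    PointList align(const PointList& shape, const PointList& reference);
};

// Aligns a whole set of shapes to their evolving mean (Section 4.1.1.2).
class GeneralizedProcrustesAnalysis {
public:
    std::vector<PointList> alignAll(const std::vector<PointList>& shapes);
};
```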
First, the ASM needs to be trained. For this, a data set consisting of images and the shapes of those images is created, and the shape model is built during the training phase. All images are converted to grayscale for creating the profile model. In the testing phase, the eyes are detected first, and the starting shape is created according to the position of the eyes. The image being searched is also converted to grayscale. In the last step, the ASM search algorithm is applied to find the shape in the image. The block diagram of the system is given in Figure 5.2.
Figure 5.2 Block diagram
5.1 Cohn-Kanade Dataset
The proposed algorithm was trained and tested on the Cohn-Kanade Facial Expression
Database. This database consists of 100 university students aged from 18 to 30 years, of whom 65% were female, 15% were African-American, and 3% were Asian or Latino.
Subjects were instructed to perform a series of 23 facial displays, seven of which were
based on descriptions of prototype emotions (i.e., anger, disgust, fear, happy, neutral,
sadness, and surprise). Image sequences from neutral to target display were digitized
into 640x490 pixel arrays. For our experiments, we selected 320 image sequences
from the database for basic emotional expression recognition. The sequences come
from 96 subjects, with 1 to 6 emotions per subject. For each sequence, the neutral
face and three peak frames of each sequence were used. To evaluate generalization
performance, a 10-fold cross-validation test scheme was adopted.[28]
Figure 5.3 Sample face expression images from the Cohn-Kanade database (modified from [28])
6 Experimental Results
The experimental results of the project are given below. Several images were used for the experiments, and the progressive results on these images are presented visually in Figure 6.1, Figure 6.2, and Figure 6.3. The result in Figure 6.1 is successful, the result in Figure 6.2 is very successful, and the result in Figure 6.3 fails.
Figure 6.1 (Left): Initial, (Middle): Candidate, (Right): Result
Figure 6.2 (Left): Initial, (Middle): Candidate, (Right): Result
Figure 6.3 (Left): Initial, (Middle): Candidate, (Right): Result
Most facial expression algorithms aim to find expressions on high-resolution faces. However, most images taken in the real world are of low resolution, so the faces in the image, and the expressions on them, are of low resolution as well. Studies have shown that the LBP algorithm is not affected by images of different resolutions, which is why we use it in this work. Experimental observations showed that the mouth, nose, eye, forehead, and eyebrow regions are the ones most affected by a change of emotion, so local feature vectors were derived from these regions with the LBP algorithm. Today, most images taken from a camera have a size of 640x480; we used the Cohn-Kanade database to train our system because its images have the similar size of 640x490. When using the K-NN algorithm at the classification stage, we compared the results obtained from the system by giving different values to k while testing it. The results obtained in the tests are given in Tables 6.1-6.3 for the training dataset and in Tables 6.4-6.6 for the test dataset.
Table 6.1 Results of experiments with training dataset and k=1
          Anger  Disgust  Fear  Happy  Neutral  Sad  Surprise  Accuracy (%)
Anger     23     0        0     0      0        0    0         100
Disgust   0      21       0     0      0        1    0         95.5
Fear      0      0        30    2      0        0    0         93.75
Happy     0      0        1     72     0        0    0         98.6
Neutral   0      0        0     0      47       0    0         100
Sad       2      0        0     0      0        35   0         94.5
Surprise  0      0        0     0      0        0    27        100
Table 6.2 Results of experiments with training dataset and k=3
          Anger  Disgust  Fear  Happy  Neutral  Sad  Surprise  Accuracy (%)
Anger     21     0        0     0      1        1    0         91.3
Disgust   0      21       0     0      0        1    0         95.5
Fear      0      0        26    4      1        1    0         81.3
Happy     0      0        0     68     0        3    2         93.2
Neutral   1      0        1     0      43       1    1         91.5
Sad       2      0        1     0      1        33   0         89.2
Surprise  0      0        0     0      0        0    27        100
Table 6.3 Results of experiments with training dataset and k=5
          Anger  Disgust  Fear  Happy  Neutral  Sad  Surprise  Accuracy (%)
Anger     21     0        0     0      1        1    0         91.3
Disgust   0      21       0     0      0        1    0         95.5
Fear      0      0        28    2      1        1    0         87.5
Happy     0      0        2     68     0        1    2         93.2
Neutral   1      0        1     0      44       1    0         93.6
Sad       2      0        0     0      2        33   0         89.2
Surprise  0      1        0     0      0        0    26        96.2
Table 6.4 Results of experiments with test dataset and k=1
          Anger  Disgust  Fear  Happy  Neutral  Sad  Surprise  Accuracy (%)
Anger     19     0        0     0      1        2    1         82.6
Disgust   2      18       0     0      0        2    0         78.3
Fear      1      0        18    7      3        3    0         56.3
Happy     0      2        0     64     0        5    2         87.7
Neutral   2      0        2     0      40       1    2         85.1
Sad       2      0        2     1      3        29   0         78.4
Surprise  0      0        0     0      1        1    27        93.1
Table 6.5 Results of experiments with test dataset and k=3
          Anger  Disgust  Fear  Happy  Neutral  Sad  Surprise  Accuracy (%)
Anger     20     0        0     0      1        1    1         86.9
Disgust   2      18       0     0      0        2    0         78.3
Fear      1      0        22    6      2        1    0         68.75
Happy     0      2        0     66     0        4    1         90.4
Neutral   2      0        2     0      39       2    2         82.9
Sad       2      0        2     1      2        30   0         81
Surprise  0      1        0     0      0        1    27        93.1
Table 6.6 Results of experiments with test dataset and k=5
          Anger  Disgust  Fear  Happy  Neutral  Sad  Surprise  Accuracy (%)
Anger     21     0        0     0      2        0    0         91.3
Disgust   1      19       0     0      0        2    0         86.3
Fear      1      0        25    4      1        1    0         78.12
Happy     0      2        0     66     0        4    1         90.4
Neutral   2      0        2     0      39       2    2         82.9
Sad       0      0        2     1      1        33   0         89.18
Surprise  0      1        0     0      0        1    27        93.1
7 Conclusion
In this study, a data set (MUCT) consisting of previously manually landmarked faces was used. The shapes were first aligned to a common reference frame. A shape model and a profile model were then created from the aligned shapes; the PCA algorithm was used to construct the shape model. New shapes were generated using the eigenvectors and eigenvalues obtained with the PCA algorithm, and a profile model was created for the points representing each shape. Using the generated profile model, the generated shape was matched to the desired shape. Briefly, the ASM algorithm's shape model and profile model were used to determine the meaningful regions.
In the facial expression recognition stage, a method based on the 58-uniform Local Binary Pattern algorithm and K-NN was introduced: the 58-uniform Local Binary Pattern algorithm is used for local texture feature extraction, and K-NN is used for expression classification and recognition. The experimental results show that the method adopted in this study is robust across different expressions. Finally, if the person being tested was included in the training data, we get the best result with k = 1; if the person was not included in the training data, we get the best result with k = 5.
References
[1] S. Milborrow, J. Morkel, and F. Nicolls, “The muct landmarked face database,”
Pattern Recognition Association of South Africa, 2010, http://www.milbo.
org/muct.
[2] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham, “Active shape
models-their training and application,” Computer vision and image understand-
ing, vol. 61, no. 1, pp. 38–59, 1995.
[3] Z. Guo, L. Zhang, and D. Zhang, “A completed modeling of local binary pattern
operator for texture classification,” IEEE Transactions on Image Processing, vol.
19, no. 6, pp. 1657–1663, 2010.
[4] M.-L. Zhang and Z.-H. Zhou, “A k-nearest neighbor based algorithm for
multi-label classification,” in 2005 IEEE international conference on granular
computing, IEEE, vol. 2, 2005, pp. 718–721.
[5] Z. Zhang, M. Lyons, M. Schuster, and S. Akamatsu, “Comparison between
geometry-based and gabor-wavelets-based facial expression recognition using
multi-layer perceptron,” in Automatic Face and Gesture Recognition, 1998. Pro-
ceedings. Third IEEE International Conference on, IEEE, 1998, pp. 454–459.
[6] C. Shan, S. Gong, and P. W. McOwan, “Facial expression recognition based on
local binary patterns: A comprehensive study,” Image and Vision Computing,
vol. 27, no. 6, pp. 803–816, 2009.
[7] I. Cohen, N. Sebe, A. Garg, L. S. Chen, and T. S. Huang, “Facial expression
recognition from video sequences: Temporal and static modeling,” Computer
Vision and image understanding, vol. 91, no. 1, pp. 160–187, 2003.
[8] J. R. Jensen, “Introductory digital image processing: A remote sensing
perspective,” Univ. of South Carolina, Columbus, Tech. Rep., 1986.
[9] A. Dix, Human-computer interaction. Springer, 2009.
[10] W. K. Kong, D. Zhang, and W. Li, “Palmprint feature extraction using 2-d gabor
filters,” Pattern recognition, vol. 36, no. 10, pp. 2339–2347, 2003.
[11] A. J. Izenman, “Linear discriminant analysis,” in Modern multivariate statistical
techniques, Springer, 2013, pp. 237–280.
[12] J. A. Suykens and J. Vandewalle, “Least squares support vector machine
classifiers,” Neural processing letters, vol. 9, no. 3, pp. 293–300, 1999.
[13] P. Horton and K. Nakai, “Better prediction of protein cellular localization sites
with the k nearest neighbors classifier," in Ismb, vol. 5, 1997, pp. 147–152.
[14] Microsoft Visual Studio - Wikipedia, https://en.wikipedia.org/wiki/Microsoft_Visual_Studio, accessed: 2016-10-29.
[15] C++ - Wikipedia, https://en.wikipedia.org/wiki/C%2B%2B, accessed: 2016-10-29.
[16] G. Bradski and A. Kaehler, Learning OpenCV: Computer vision with the OpenCV library. O'Reilly Media, Inc., 2008.
[17] Pricing and purchasing options | Visual Studio, https://www.visualstudio.com/tr/vs/pricing/, accessed: 2016-10-28.
[18] Windows 10 Pro: Yükselt veya satın al - Microsoft Mağazası Türkiye, https://www.microsoftstore.com/store/msmea/tr_TR/pdp/productID.320421400?VID=320421600&s_kwcid=AL!4249!3!157215759337!!!g!18283950120!&WT.mc_id=tr_datafeed_pla_google_pointitsem_office&ef_id=WJHaOwAABaNV0ub8:20170201125315:s, accessed: 2016-10-28.
[19] Ö. Ayhan, "Yüz öznitelik çıkarımı için geliştirilmiş aktif şekil modeli," PhD thesis, Fen Bilimleri Enstitüsü, 2013.
[20] S. Milborrow, T. Bishop, and F. Nicolls, “Multiview active shape models with sift
descriptors for the 300-w face landmark challenge,” in Proceedings of the IEEE
International Conference on Computer Vision Workshops, 2013, pp. 378–385.
[21] P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple
features,” in Computer Vision and Pattern Recognition, 2001. CVPR 2001. Pro-
ceedings of the 2001 IEEE Computer Society Conference on, IEEE, vol. 1, 2001,
pp. I–511.
[22] I. Ari, A. Uyar, and L. Akarun, “Facial feature tracking and expression
recognition for sign language,” in Computer and Information Sciences, 2008.
ISCIS’08. 23rd International Symposium on, IEEE, 2008, pp. 1–6.
[23] G. E. Gunbas, A. Durmus, and L. Toppare, “Could green be greener?
novel donor–acceptor-type electrochromic polymers: Towards excellent neutral
green materials with exceptional transmissive oxidized states for completion of
rgb color space,” Advanced Materials, vol. 20, no. 4, pp. 691–695, 2008.
[24] D. Chai and A. Bouzerdoum, “A bayesian approach to skin color classification in
ycbcr color space,” in TENCON 2000. Proceedings, IEEE, vol. 2, 2000, pp. 421–
424.
[25] S. Liao, W. Fan, A. C. Chung, and D.-Y. Yeung, “Facial expression recognition
using advanced local binary patterns, tsallis entropies and global appearance
features,” in Image Processing, 2006 IEEE International Conference on, IEEE,
2006, pp. 665–668.
[26] T. Ahonen, J. Matas, C. He, and M. Pietikäinen, “Rotation invariant image
description with local binary pattern histogram fourier features,” in Scandina-
vian Conference on Image Analysis, Springer, 2009, pp. 61–70.
[27] N. S. Altman, “An introduction to kernel and nearest-neighbor nonparametric
regression,” The American Statistician, vol. 46, no. 3, pp. 175–185, 1992.
[28] P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar, and I. Matthews,
“The extended cohn-kanade dataset (ck+): A complete dataset for action unit
and emotion-specified expression,” in Computer Vision and Pattern Recognition
Workshops (CVPRW), 2010 IEEE Computer Society Conference on, IEEE, 2010,
pp. 94–101.
Curriculum Vitae
PERSONAL INFORMATION OF MEMBER 1
Name-Surname: Cafer YILDIZ
Birthdate and Place of Birth: 07.07.1988, Diyarbakır
E-mail: caferyildiz3@gmail.com
Phone: 0545 494 11 48
Practical Training: Evren Bilgisayar
PERSONAL INFORMATION OF MEMBER 2
Name-Surname: Musa GÖKMEN
Birthdate and Place of Birth: 24.11.1989, Diyarbakır
E-mail: musagokmen21@gmail.com
Phone: 0544 934 72 21
Practical Training: Inoart Bilişim Hizmetleri A.Ş.
Project System Information
System and Software: Windows operating system, C++
Required RAM: 3 GB
Required Disk: 25 GB
SENTIMENT ANALYSIS ON PPT AND Project template_.pptx
b0754201
 
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
upoux
 
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
PriyankaKilaniya
 
Object Oriented Analysis and Design - OOAD
Object Oriented Analysis and Design - OOADObject Oriented Analysis and Design - OOAD
Object Oriented Analysis and Design - OOAD
PreethaV16
 
5G Radio Network Througput Problem Analysis HCIA.pdf
5G Radio Network Througput Problem Analysis HCIA.pdf5G Radio Network Througput Problem Analysis HCIA.pdf
5G Radio Network Througput Problem Analysis HCIA.pdf
AlvianRamadhani5
 
Null Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAMNull Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAM
Divyanshu
 
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELDEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
ijaia
 
Software Engineering and Project Management - Software Testing + Agile Method...
Software Engineering and Project Management - Software Testing + Agile Method...Software Engineering and Project Management - Software Testing + Agile Method...
Software Engineering and Project Management - Software Testing + Agile Method...
Prakhyath Rai
 
Data Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason WebinarData Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason Webinar
UReason
 
Accident detection system project report.pdf
Accident detection system project report.pdfAccident detection system project report.pdf
Accident detection system project report.pdf
Kamal Acharya
 
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
ydzowc
 
一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理
uqyfuc
 
Applications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdfApplications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdf
Atif Razi
 
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
nedcocy
 
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
shadow0702a
 

Recently uploaded (20)

Mechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdfMechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdf
 
NATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENT
NATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENTNATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENT
NATURAL DEEP EUTECTIC SOLVENTS AS ANTI-FREEZING AGENT
 
Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...
Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...
Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...
 
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
 
Digital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptxDigital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptx
 
SENTIMENT ANALYSIS ON PPT AND Project template_.pptx
SENTIMENT ANALYSIS ON PPT AND Project template_.pptxSENTIMENT ANALYSIS ON PPT AND Project template_.pptx
SENTIMENT ANALYSIS ON PPT AND Project template_.pptx
 
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
 
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
Prediction of Electrical Energy Efficiency Using Information on Consumer's Ac...
 
Object Oriented Analysis and Design - OOAD
Object Oriented Analysis and Design - OOADObject Oriented Analysis and Design - OOAD
Object Oriented Analysis and Design - OOAD
 
5G Radio Network Througput Problem Analysis HCIA.pdf
5G Radio Network Througput Problem Analysis HCIA.pdf5G Radio Network Througput Problem Analysis HCIA.pdf
5G Radio Network Througput Problem Analysis HCIA.pdf
 
Null Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAMNull Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAM
 
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELDEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
 
Software Engineering and Project Management - Software Testing + Agile Method...
Software Engineering and Project Management - Software Testing + Agile Method...Software Engineering and Project Management - Software Testing + Agile Method...
Software Engineering and Project Management - Software Testing + Agile Method...
 
Data Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason WebinarData Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason Webinar
 
Accident detection system project report.pdf
Accident detection system project report.pdfAccident detection system project report.pdf
Accident detection system project report.pdf
 
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
 
一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理
 
Applications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdfApplications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdf
 
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
 
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
 

Real time emotion_detection_from_videos

Figure 4.4  Different shapes generated by using different b values  12
Figure 4.5  Search profile  14
Figure 4.6  Initial shape  15
Figure 4.7  Search along the sampled profile and best fit location (modified from [22])  16
Figure 4.8  (Right): Candidate shape. (Left): Candidate shape modeled on the shape model  17
Figure 4.9  An example of LBP operator (modified from [25])  18
Figure 4.10  Circularly symmetric neighbor sets for different values of m and R (modified from [25])  19
Figure 4.11  (Left): Performed Local Binary Pattern, (Right): Performed Local Binary Pattern 58 Uniform  20
Figure 5.1  Class Diagram  21
Figure 5.2  Block diagram  22
Figure 5.3  The sample face expression images from the Cohn-Kanade database (modified from [28])  23
Figure 6.1  (Left): Initial, (Middle): Candidate, (Right): Result  24
Figure 6.2  (Left): Initial, (Middle): Candidate, (Right): Result  24
Figure 6.3  (Left): Initial, (Middle): Candidate, (Right): Result  25
LIST OF TABLES

Table 3.1  System requirements  5
Table 3.2  Software cost  7
Table 3.3  Hardware cost  7
Table 3.4  Employee cost  7
Table 6.1  Results of experiments with training dataset and k=1  25
Table 6.2  Results of experiments with training dataset and k=3  26
Table 6.3  Results of experiments with training dataset and k=5  26
Table 6.4  Results of experiments with test dataset and k=1  26
Table 6.5  Results of experiments with test dataset and k=3  26
Table 6.6  Results of experiments with test dataset and k=5  27
ABSTRACT

REAL TIME EMOTION DETECTION FROM VIDEOS

Cafer YILDIZ
Musa GÖKMEN

Department of Computer Engineering
Senior Project
Advisor: Assoc. Prof. Mine Elif KARSLIGİL

With technology entering every area of people's lives, it has become a necessity for machines to recognize and analyze emotions in real time, and many researchers are currently working on the recognition of facial expressions. In this work, the face and eye regions were located using OpenCV. Once these regions were identified, the Active Shape Models algorithm was used to detect the other important regions of the face, such as the mouth and nose. After all regions were found, features were extracted for each region with the Local Binary Pattern algorithm, and the emotion was finally determined with the K-NN classification algorithm. The results of this study were tested on 259 objects; the success rate was 91.7% on the training data and 80.2% on the test data.

Keywords: Facial expression, real-time, face detection, active shape models

YILDIZ TECHNICAL UNIVERSITY
FACULTY OF ELECTRICAL AND ELECTRONICS
DEPARTMENT OF COMPUTER ENGINEERING
ÖZET

REAL-TIME EMOTION DETECTION FROM VIDEO IMAGES

Cafer YILDIZ
Musa GÖKMEN

Department of Computer Engineering
Senior Project
Advisor: Assoc. Prof. Mine Elif KARSLIGİL

With technology entering every area of people's lives, it has become a necessity for machines to recognize and analyze emotions in real time. Today, many researchers are working on the recognition of facial expressions. In this study, the face and eye regions were detected using OpenCV. After these regions were found, the other important regions of the face, such as the mouth and nose, were detected with the Active Shape Models algorithm. Once all regions were found, features were extracted for each region using the LBP algorithm for emotion recognition. Finally, the emotion was determined with the KNN classification algorithm. The results of this study were tested on 259 objects: a success rate of 91.7% was achieved on the training data and 80.2% on the test data.

Keywords: Emotion analysis, real-time, face recognition, active shape models

YILDIZ TECHNICAL UNIVERSITY
FACULTY OF ELECTRICAL AND ELECTRONICS
DEPARTMENT OF COMPUTER ENGINEERING
1 Introduction

Emotions are expressed through the different configurations of the facial regions and play an important role in proper communication between individuals. Machine recognition of facial expressions can therefore contribute significantly to communication between the user and the computer; in the future, a computer could even make recommendations according to the emotional state the user is in. Because the overall structure of the human face is largely the same from person to person, facial expressions of the same emotion closely resemble each other, so a system trained with a representative set of facial expressions can determine emotional expressions reliably. That is the basic goal of this study.

In this project, the aim is to automatically determine the basic emotional expressions from 2D images taken from videos. The system receives the captured video frame by frame and detects the faces present in each frame using the OpenCV library. The training datasets used in our study are the MUCT database, which consists of 3755 faces with 76 manual landmarks [1], and the Cohn-Kanade database, which was built for research in automatic facial image analysis and synthesis and for perceptual studies. We apply Active Shape Models (ASM) [2] to MUCT in order to calculate a mean profile by averaging the profile gradients of all the shapes in the dataset; the generated profile is used to detect the best-fitting face shape. We use the Local Binary Pattern [3] to extract the face's features. Finally, using the KNN classification algorithm [4], we classify the facial expressions into the happiness, anger, surprise, disgust, fear, sadness and neutral classes.
2 Literature Review

The human face has an important place in recognizing emotions, because each emotion produces distinctive features through different configurations of the face, and the emotional state can be analyzed by looking at these characteristics. Facial Expression Recognition (FER) [5][6][7] is used in different areas, among them Computer Vision, Digital Image Processing and Artificial Intelligence [8]. In recent years, emotion recognition has become a necessity with the increasing interaction between the computer and the human [9], which is why it is a popular topic.

The aim of this work is to develop an interactive computer vision system for the recognition of facial expressions from videos. The need for interaction with the machine is increasing day by day in parallel with the progress of technology, and different techniques have been presented for the classification of emotions. Previous studies on emotion recognition report, among others, the following difficulties:

• Not operating in real time
• The small number of face combinations available for performance evaluation
• The use of geometric and visual techniques to extract features of facial expressions
• The generalization of classification algorithms for facial expressions
• Processing large amounts of data
• Large images
• The dynamics of facial expressions

In this section, we focus on various feature extraction methods that use appearance-based features for recognizing human facial expressions. Different approaches
have been developed for extracting features from face images, such as the Gabor Filter [10], Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) [11] and the Local Binary Pattern (LBP), combined with different classifiers such as the Support Vector Machine (SVM) [12] and KNN (K Nearest Neighbour).

2.1 Steps of Facial Expression Recognition

Facial expression recognition proceeds in three main steps: creating the shape and profile models, feature extraction, and classification.

2.1.1 Creating the Shape and Profile Model

The first step is aligning the shapes in the dataset. In the next step, the shape model is created using PCA. As a final step, a profile model is created for each point of each shape in the dataset; the profile model is built from the grey-level values of the pixels around each point.

2.1.2 Feature Extraction

A feature vector is extracted for each meaningful region of the image taken from the video. This step is important because the emotion is recognized by comparing these features. In this study, texture-based feature extraction was used.

2.1.3 Classification

After feature extraction from the face and facial components of the input image, the next step is classification according to closeness; KNN (K Nearest Neighbour) [13] is used for classification. Similar applications based on the same research were compared and found to be lacking certain properties; this application attempts to improve on them and add the missing features as they are found.
3 Feasibility Studies

To analyze the project, we carried out studies on the labor, technical, legal and economic sides.

3.1 Technical Feasibility

Two feasibility studies were made to choose suitable software and hardware.

3.1.1 Software Feasibility

Microsoft Visual Studio was used as the development environment, Windows 10 as the operating system, and C++ as the programming language of this project.

3.1.1.1 Microsoft Visual Studio

Microsoft Visual Studio is an integrated development environment (IDE) from Microsoft [14]. Visual Studio supports different programming languages and allows the code editor and debugger to support (to varying degrees) nearly any programming language, provided a language-specific service exists, which is why it was preferred.

3.1.1.2 C++

C++ is a general-purpose programming language. It has imperative, object-oriented and generic programming features, while also providing facilities for low-level memory manipulation. It was designed with a bias toward systems programming and embedded, resource-constrained and large systems, with performance, efficiency and flexibility of use as its design highlights [15].
3.1.1.3 OpenCV

OpenCV (Open Source Computer Vision) is a library of programming functions mainly aimed at real-time computer vision. It was originally developed by Intel's research center in Nizhny Novgorod (Russia), later supported by Willow Garage, and is now maintained by Itseez. The library is cross-platform and free for use under the open-source BSD license [16].

3.1.1.4 Windows 10

Windows 10 is an operating system put on the market by Microsoft. It is used on personal computers, notebooks, netbooks, tablet PCs and media centers; Microsoft released it on 29 July 2015. Its main advantage is that it is easy to use.

3.1.2 Hardware Feasibility

Table 3.1 was created to calculate the hardware requirements.

Table 3.1 System requirements

Software                  RAM      HDD      CPU      Graphic Card
Microsoft Visual Studio   1 GB     4 GB     1.6 GHZ  256 MB
JDK                       64 MB    396 MB   -        -
Windows 10                2 GB     20 GB    1 GHZ    128 MB
Total                     3.05 GB  24.4 GB  2.6 GHZ  384 MB

According to Table 3.1, the minimum system requirements are 3.05 GB of RAM, 24.4 GB of HDD, a 2.6 GHZ CPU and a 384 MB graphics card. The notebook used in this project has 8 GB of RAM, a 1.5 TB HDD, a 2.6 GHZ CPU and a 2 GB graphics card.

3.2 Legal Feasibility

All rights reserved to Yıldız Technical University, Computer Engineering Department.

3.3 Schedule Feasibility

For the schedule feasibility, a Gantt diagram was created to determine the duration and milestones of the project. The senior project started on 30 September 2016, and it was decided to complete it by 30 December 2016, as shown by the Gantt diagram in Figure 3.1.
Figure 3.1 Gantt diagram of the project
3.4 Financial Feasibility

A financial analysis was made for the financial feasibility; the total cost is given in this part.

3.4.1 Software Costs

Microsoft Visual Studio and Windows 10 were purchased: Windows 10 costs 900 TRY and Microsoft Visual Studio costs 1875 TRY per year, while the JDK is free to use. As a result, the software cost is about 2775 TRY. All the software used is shown in Table 3.2.

Table 3.2 Software cost

Program                   License       Price (TRY)
Microsoft Visual Studio   BSD license   1875
JDK                       BSD license   Free
Windows 10                OEM license   900
Total                     -             2775

Refer to the Microsoft Visual Studio price [17] and the Windows 10 price [18].

3.4.2 Hardware Costs

No device on the market meets exactly the minimum requirements, so the notebooks noted in the feasibility part were used. Their costs are shown in Table 3.3.

Table 3.3 Hardware cost

Hardware     Price (TRY)
Laptop x 2   2000 + 4700
Total        6700

3.4.3 Employee Costs

All employee salaries are shown in Table 3.4.

Table 3.4 Employee cost

Employee     Price (TRY)
Employee 1   3000 per month (4 months)
Employee 2   3000 per month (4 months)
Total        24000
4 System Analysis

Since this study is concerned with face recognition, and since the ASM model is one of the most popular methods in this domain, the project was designed around ASM. In this section, the ASM model and its processing steps are discussed in detail.

4.1 Active Shape Model

ASM is a model-based approach built from a shape model and a profile model. The shape model defines the variations of the shapes inside the training set; the profile model, on the other hand, generates statistical data that represent the grey-level texture around each landmark point. The shape and profile models created in the training step are used to locate the shape in the test images during the search step.

4.1.1 Creating the Shape Model

The shape model is created in three steps: manually marking the landmark points in the training set, aligning the marked shapes (removing the differences of scale, position and rotation angle), and finally obtaining the statistical data describing the shape variation.

4.1.1.1 Marking the Shapes

The shape of an object is formed by a set of N points, each d-dimensional. These points should be selected on or around regions that do not change across the images in the training set, and they should generally reflect the overall shape and character of the object. For the face, for example, points on the facial boundary, eyes, nose and corner points can be used to create face shapes. A 76-point face shape is shown in Figure 4.1.
Figure 4.1 76-point face shape example (modified from [19])

An object is described by points referred to as landmark points, which are (manually) determined in a set of training images. From these collections of landmark points, a point distribution model [28] is constructed as follows. The landmark points (x1, y1), ..., (xn, yn) are stacked in shape vectors as in Equation 4.1:

$$ S = (x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_n)^T \qquad (4.1) $$

A shape vector $S_j$ is defined for each of the $k$ images in the training set ($S_j$, $j = 1, 2, 3, \ldots, k$).

4.1.1.2 Aligning a Set of Shapes

During training we need to align not just two shapes but a set of shapes. By definition, alignment means that the total distance from the aligned shapes to the mean shape is minimized, where the mean shape is the average of the aligned shapes. If we knew the mean shape beforehand, we could simply align all shapes to it and be done. Since we do not have this prior knowledge, we instead create an initial provisional mean shape from a reference shape (which can be any shape in the set) and iterate using the following algorithm.
1. Translate each shape so that its center lies at the origin (0, 0)
2. Choose one shape S as the reference and scale it so that ||S|| = 1
3. Record S as S0
4. Align all shapes to S
5. Compute the average shape of the aligned shapes
6. Align the new average shape to S0 and scale it so that ||S̄|| = 1
7. Repeat steps 4-6 until the mean shape converges

Before alignment begins, it may be beneficial to position the reference shape at the origin and prescale its size to unity; however, this is not essential, since the absolute position and size of the reference are removed by the alignment itself. Figure 4.2 shows the difference between the shapes before and after alignment.

Figure 4.2 Before and after alignment
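The alignment loop lends itself to a compact implementation. The sketch below is a simplified C++/OpenCV version, assuming each shape is stored as an N x 2 CV_32F matrix; the rotation step of the full algorithm is omitted for brevity, and all names are illustrative rather than taken from the project code.

```cpp
// Minimal sketch of the iterative shape-alignment loop (steps 1-7 above).
// Shapes are N x 2 CV_32F matrices; rotation is omitted for brevity.
#include <opencv2/core.hpp>
#include <vector>

static void centerAndScale(cv::Mat& s) {        // steps 1-2: centroid to the
    cv::Mat col0 = s.col(0), col1 = s.col(1);   // origin, then ||S|| = 1
    col0 -= cv::mean(col0)[0];
    col1 -= cv::mean(col1)[0];
    s /= cv::norm(s);
}

cv::Mat alignShapes(std::vector<cv::Mat>& shapes, int iters = 10) {
    for (auto& s : shapes) centerAndScale(s);
    cv::Mat meanShape = shapes[0].clone();      // reference shape S0
    for (int it = 0; it < iters; ++it) {        // steps 4-7
        cv::Mat acc = cv::Mat::zeros(meanShape.size(), CV_32F);
        for (auto& s : shapes) {
            double a = s.dot(meanShape) / s.dot(s); // least-squares scale
            s *= a;                                 // onto the current mean
            acc += s;
        }
        meanShape = acc / (double)shapes.size();
        centerAndScale(meanShape);              // step 6: renormalize
    }
    return meanShape;
}
```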
4.1.1.3 Shape Model

To create the shape model, we use a standard principal components approach to generate a set of directions, or axes, along which the mean shape can be flexed in shape space to best represent the way the faces vary in the training set. This is done as described below; Figure 4.3 gives an overview.

Figure 4.3 How the shape model is generated and used (modified from [20])

Principal component analysis (PCA) is applied to the shape vectors $S_i$ by computing the mean shape as in Equation 4.2:

$$ \bar{S} = \frac{1}{n} \sum_{i=1}^{n} S_i \qquad (4.2) $$

and the covariance matrix as in Equation 4.3:

$$ C = \frac{1}{n-1} \sum_{i=1}^{n} (S_i - \bar{S})(S_i - \bar{S})^T \qquad (4.3) $$
The eigensystem of the covariance matrix is then computed. The eigenvectors corresponding to the $t$ largest eigenvalues $\lambda_i$ are retained in a matrix $\phi = (\phi_1 | \phi_2 | \ldots | \phi_t)$. A shape can now be approximated by Equation 4.4:

$$ S \approx \bar{S} + \phi b \qquad (4.4) $$

where $b$ is a vector of $t$ elements containing the model parameters, computed by Equation 4.5:

$$ b = \phi^T (S - \bar{S}) \qquad (4.5) $$

When fitting the model to a set of points, the values of $b_i$ are constrained to lie within the range $\pm m\sqrt{\lambda_i}$, where $m$ usually has a value between two and three. The number $t$ of eigenvalues to retain is chosen so as to explain a certain proportion $f_v$ of the variance in the training shapes, usually ranging from 90% to 99.5%; the desired number of modes is the smallest $t$ for which Equation 4.6 holds:

$$ \sum_{i=1}^{t} \lambda_i \geq f_v \sum_{i=1}^{n} \lambda_i \qquad (4.6) $$

Different shapes generated by using different $b$ values are shown in Figure 4.4.

Figure 4.4 Different shapes generated by using different b values
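As a rough illustration of Equations 4.2-4.6, the shape model can be built with OpenCV's cv::PCA, which computes the mean and the eigensystem of the covariance in one call. The ShapeModel struct and the function names below are illustrative assumptions, not the project's actual classes.

```cpp
// Sketch of building the shape model with OpenCV's PCA (Equations 4.2-4.6).
// 'aligned' holds one aligned shape per row: (x1..xn, y1..yn), CV_32F.
#include <opencv2/core.hpp>

struct ShapeModel {
    cv::Mat mean;    // S_bar, 1 x 2n
    cv::Mat phi;     // eigenvectors, t x 2n (one mode per row)
    cv::Mat lambda;  // eigenvalues, t x 1
};

ShapeModel buildShapeModel(const cv::Mat& aligned, double fv = 0.95) {
    cv::PCA pca(aligned, cv::Mat(), cv::PCA::DATA_AS_ROW);
    // keep the smallest t explaining the fraction fv of variance (Eq 4.6)
    double total = cv::sum(pca.eigenvalues)[0], run = 0.0;
    int t = 0;
    while (t < pca.eigenvalues.rows && run < fv * total)
        run += pca.eigenvalues.at<float>(t++);
    return { pca.mean.clone(),
             pca.eigenvectors.rowRange(0, t).clone(),
             pca.eigenvalues.rowRange(0, t).clone() };
}

// Generate a shape from parameters b (Eq 4.4): S = S_bar + phi^T b
cv::Mat generateShape(const ShapeModel& m, const cv::Mat& b /* t x 1 */) {
    return m.mean + b.t() * m.phi;
}
```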
4.1.2 Creating the Profile Model

The profile model is created to define the attributes of the texture around the landmark points; in other words, it describes how the texture around a point should look. In the test images, while the shape is being aligned, the texture information around the sample points is extracted and compared with the texture information obtained from the profile model, and the point's position is updated according to the comparison result. Thus, the points are moved to the most appropriate position at each step, and the shape most similar to the object is obtained.

To create the profile model, suppose that for a given point we sample along a profile $k$ pixels on either side of the model point in each training image. For the $j$-th landmark in the $n$-th image, a grey-level vector is obtained by sampling pixels along the normal of the line connecting the $j$-th and $(j-1)$-th landmarks. The grey-level information is recorded as in Equation 4.7:

$$ g_{nj} = [g_{nj1}, g_{nj2}, g_{nj3}, \ldots, g_{nj(2k+1)}] \qquad (4.7) $$

To reduce the effects of global intensity changes, we sample the derivative grey values rather than the grey values themselves, as in Equation 4.8:

$$ g'_{nj} = [(g_{nj1} - g_{nj2}), (g_{nj2} - g_{nj3}), \ldots, (g_{nj2k} - g_{nj(2k+1)})] \qquad (4.8) $$

In order to further reduce the impact of illumination, the grey vectors are normalized as in Equation 4.9:

$$ P_{nj} = \frac{g'_{nj}}{\sum_{i=1}^{2k} |g'_{nji}|} \qquad (4.9) $$
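A minimal sketch of Equations 4.7-4.9 for one landmark might look as follows: the function samples 2k+1 grey values along a unit normal n at point p, differences them, and normalizes by the absolute sum. The name sampleProfile and the border-clamping policy are assumptions made for the illustration.

```cpp
// Sketch of building one normalized derivative profile (Equations 4.7-4.9).
// 'gray' is a CV_8U image; 'n' is the unit normal at landmark 'p'.
#include <opencv2/core.hpp>
#include <algorithm>
#include <vector>

cv::Mat sampleProfile(const cv::Mat& gray, cv::Point2f p, cv::Point2f n, int k) {
    std::vector<float> g;
    for (int i = -k; i <= k; ++i) {                   // Eq 4.7: raw samples
        cv::Point2f q = p + n * (float)i;
        int x = std::max(0, std::min(cvRound(q.x), gray.cols - 1));
        int y = std::max(0, std::min(cvRound(q.y), gray.rows - 1));
        g.push_back((float)gray.at<uchar>(y, x));     // clamped to the image
    }
    cv::Mat d(2 * k, 1, CV_32F);                      // Eq 4.8: derivatives
    for (int i = 0; i < 2 * k; ++i)
        d.at<float>(i) = g[i] - g[i + 1];
    double s = cv::norm(d, cv::NORM_L1);              // Eq 4.9: normalize by
    if (s > 0) d /= s;                                // the sum of |values|
    return d;
}
```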
Sampling the grey-level information in the same way for the $j$-th landmark of every image in the training set, we can build the profile model of that landmark. Its mean grey-level vector and covariance matrix are expressed as Equations 4.10 and 4.11. The average for a point is

$$ \bar{P}_n = \frac{1}{K} \sum_{j=1}^{K} P_{nj}, \quad n = 1, 2, 3, \ldots, N \qquad (4.10) $$

and the covariance matrix of a point is

$$ S_{p_n} = \frac{1}{K-1} \sum_{j=1}^{K} (P_{nj} - \bar{P}_n)(P_{nj} - \bar{P}_n)^T, \quad n = 1, 2, 3, \ldots, N \qquad (4.11) $$

4.1.3 Model Searching

Based on the methods above, we can obtain the shape model and the profile model, and then use the profile model to search an unknown image. The ASM starts the search for landmarks from the mean shape aligned to the position and size of the face determined by a global face detector. It then repeats the following two steps until convergence:

• suggest a tentative shape by adjusting the locations of the shape points through template matching of the image texture around each point;
• conform the tentative shape to the global shape model.

During training on manually landmarked faces, at each landmark we calculate the mean profile vector $\bar{P}_n$ and the profile covariance matrix $S_{p_n}$. During searching, for each landmark we find the best fit along the profile, where the best profile gradient $P_{sn}$ gives the minimum Mahalanobis distance $d_{mn}$ to the model, as in Equation 4.12:

$$ d_{mn} = (P_{sn} - \bar{P}_n)^T S_{p_n}^{-1} (P_{sn} - \bar{P}_n) \qquad (4.12) $$

The searching algorithm is shown in Figure 4.5.

Figure 4.5 Search profile
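Equation 4.12 maps directly onto OpenCV's cv::Mahalanobis, which expects the inverted covariance matrix. A sketch with illustrative names follows; inverting with a pseudo-inverse is one reasonable way to handle a rank-deficient covariance, not necessarily what the project did.

```cpp
// Sketch of the profile cost from Equation 4.12: the Mahalanobis distance
// between a sampled profile and the landmark's trained mean profile.
#include <opencv2/core.hpp>

double profileDistance(const cv::Mat& sampled,   // P_sn, 2k x 1, CV_32F
                       const cv::Mat& meanProf,  // P_bar_n, 2k x 1
                       const cv::Mat& icov)      // S_pn^{-1}, precomputed
{
    return cv::Mahalanobis(sampled, meanProf, icov);
}

// The covariance can be inverted once, offline, with a pseudo-inverse:
//   cv::invert(cov, icov, cv::DECOMP_SVD);
```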
Before the search operation can start, the approximate location of the desired object in the image must be determined. For faces, the Viola-Jones (VJ) face detection method is widely used [21]. After the face position is determined, the initial shape is created: the average shape $\bar{S}$ recorded in the training phase is translated and scaled using the location and width of the detected face. Figure 4.6 shows the face region and the initial shape found with the VJ face detector.

Figure 4.6 Initial shape
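The Viola-Jones initialization step could look like the following OpenCV sketch. The cascade file is the stock OpenCV frontal-face model, and the detector parameters are typical values rather than the ones used in the project.

```cpp
// Sketch of the Viola-Jones step that seeds the initial shape: detect the
// face with OpenCV's cascade, then place the scaled mean shape inside it.
#include <opencv2/objdetect.hpp>
#include <opencv2/imgproc.hpp>
#include <vector>

std::vector<cv::Rect> detectFaces(const cv::Mat& frame) {
    static cv::CascadeClassifier cascade("haarcascade_frontalface_alt.xml");
    cv::Mat gray;
    cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
    cv::equalizeHist(gray, gray);                 // stabilizes detection
    std::vector<cv::Rect> faces;
    cascade.detectMultiScale(gray, faces, 1.1, 3, 0, cv::Size(80, 80));
    return faces;                                 // each rect seeds S_bar
}
```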
A profile comparison is performed to find the best shape. Suppose that for each landmark the mean profile $\bar{P}_n$ has length $2k+1$ and the profile $P_{sn}$ sampled from the test image has length $2m+1$, with $m > k$. For each point, the mean profile is shifted over the sampled profile; in each of the $2m - 2k + 1$ translation steps, the corresponding part of the sampled profile is compared with the mean profile by computing the Mahalanobis distance between the vectors. The new position of the landmark point is given by the shift step with the smallest Mahalanobis distance. The profile sampled for a landmark of the test image, the mean profile and the comparison results are shown in Figure 4.7.

Figure 4.7 Search along the sampled profile and best fit location (modified from [22])
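The shift search itself reduces to a small loop. The sketch below assumes the profileDistance() helper from the earlier sketch and a sampled derivative profile of length 2m (from 2m+1 raw samples); both names are illustrative.

```cpp
// Sketch of the shift search described above: slide the 2k-long derivative
// mean profile over the longer profile sampled from the test image and keep
// the offset with the smallest Mahalanobis distance.
#include <opencv2/core.hpp>
#include <cfloat>

int bestShift(const cv::Mat& sampled,   // 2m x 1 derivative profile, m > k
              const cv::Mat& meanProf,  // 2k x 1 mean derivative profile
              const cv::Mat& icov, int k, int m)
{
    int best = 0;
    double bestDist = DBL_MAX;
    for (int s = 0; s <= 2 * (m - k); ++s) {       // 2m - 2k + 1 positions
        cv::Mat window = sampled.rowRange(s, s + 2 * k);
        double d = profileDistance(window, meanProf, icov);
        if (d < bestDist) { bestDist = d; best = s - (m - k); }
    }
    return best;   // signed offset of the landmark along its normal
}
```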
The profile comparison is performed for all landmarks and the point positions are updated, yielding a new shape called the candidate shape (Sc). When the candidate shape is created, the landmark points move independently of each other, so the candidate shape may be less similar to the object; for example, a landmark on the face boundary can get stuck on a weak edge in the image and fail to reach the location it should occupy in reality. To prevent this, the candidate shape is adapted to the shape model. This is done by finding the closest shape the model can produce. A model shape can be aligned with various geometric transformations (translation, scaling, rotation), so the vector b and the pose parameters (Xt: displacement along the x-axis, Yt: displacement along the y-axis, s: scale, θ: rotation angle) must be determined so that the generated model shape is as close as possible to the candidate shape; they take the values that minimize the distance between the candidate shape and the model shape.

The following algorithm is used to find the most suitable vector b and pose parameters:

1. Initialize b = 0
2. Generate the model points S = S̄ + φb
3. Find s, Xt, Yt, θ that best fit S to the candidate shape Sc, with M the aligning function: Sc = M(S)
4. Project Sc into the shape space: Sp = M⁻¹(Sc)
5. Update the model parameters: b = φᵀ(Sp − S̄)
6. Go to step 2 and iterate until convergence.

The candidate shape and the model shape are shown in Figure 4.8.

Figure 4.8 (Right): Candidate shape. (Left): Candidate shape modeled on the shape model.
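Steps 2 and 5 of this algorithm center on projecting the candidate onto the model and clamping b. A sketch of that core follows, reusing the illustrative ShapeModel struct from the earlier PCA sketch; the pose-alignment steps M and M⁻¹ are omitted, so in the full algorithm this would sit inside the iterate-until-convergence loop.

```cpp
// Sketch of the projection-and-constrain core of the fitting algorithm:
// b = phi^T (S_p - S_bar), clamped to +/- 3*sqrt(lambda_i) so the result
// stays a plausible face shape (Equations 4.4-4.5 and the +/-m*sqrt(lambda)
// constraint from Section 4.1.1.3).
#include <opencv2/core.hpp>
#include <algorithm>
#include <cmath>

cv::Mat fitToModel(const ShapeModel& mdl, const cv::Mat& candidate /*1 x 2n*/) {
    // Step 5: project onto the model; eigenvectors are stored as rows.
    cv::Mat b = mdl.phi * (candidate - mdl.mean).t();   // t x 1
    for (int i = 0; i < b.rows; ++i) {                  // clamp each mode
        float lim = 3.0f * std::sqrt(mdl.lambda.at<float>(i));
        b.at<float>(i) = std::max(-lim, std::min(lim, b.at<float>(i)));
    }
    return mdl.mean + b.t() * mdl.phi;                  // step 2: regenerate
}
```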
After all of the regions of the face that carry emotion have been detected, it is difficult to work directly on the RGB [23] representation of each relevant region, so all regions are processed in the YCbCr [24] format. Local Binary Pattern Uniform (LBP) is applied after the Y-channel image is obtained, and a feature vector is generated from the histogram of the LBP image. This vector is then assigned to one of the previously defined classes (anger, disgust, fear, happy, sad, neutral, surprise) using the KNN (K Nearest Neighbour) classification algorithm.
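The colour-space step translates to a one-liner in OpenCV; note that OpenCV names the conversion YCrCb (Cr before Cb), with the Y plane in channel 0 either way. The helper below is an illustrative sketch.

```cpp
// Sketch of the colour-space step above: convert the face region to YCbCr
// and keep the luminance (Y) channel for the LBP stage.
#include <opencv2/imgproc.hpp>
#include <vector>

cv::Mat lumaOf(const cv::Mat& bgrRegion) {
    cv::Mat ycrcb;
    cv::cvtColor(bgrRegion, ycrcb, cv::COLOR_BGR2YCrCb);
    std::vector<cv::Mat> planes;
    cv::split(ycrcb, planes);
    return planes[0];              // Y: the intensity image fed to LBP
}
```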
4.2 Local Binary Pattern Uniform (LBP)

The LBP code is built from binary comparisons against the intensity values of the pixels around a point: the value of each pixel is compared individually with the values of its neighbours. If a neighbour's value is greater than or equal to the centre value, that neighbour is labelled 1; if it is smaller, it is labelled 0 (see Figure 4.9). The decimal equivalent of the resulting binary sequence becomes the new value of the pixel. If the number of transitions between 0 and 1 in the pattern is at most two, the pattern is called uniform; for example, 11100011 and 00001000 are uniform patterns, but 11010111 is not. The histogram of the uniform patterns in each region forms the feature vector.

Figure 4.9 An example of LBP operator (modified from [25])

Multiresolution analysis can be achieved by choosing different values of m and R, where m denotes the number of neighbouring pixels with respect to the centre pixel, and R represents the distance from the centre pixel to each of the neighbouring pixels. Figure 4.10 illustrates circularly symmetric neighbour sets for different values of m and R [25].

Figure 4.10 Circularly symmetric neighbor sets for different values of m and R (modified from [25])

In order to fully describe the dominant patterns contained in the face images, we extend the conventional LBP. Each pattern in the image is assigned a unique label by the following equation [26], Equation 4.13:

$$ LBP_{(m,R)} = \sum_{i=0}^{m-1} u(t_i - t_c)\, 2^i \qquad (4.13) $$

where $t_c$ is the intensity of the centre pixel, $t_i$ is the intensity of neighbour $i$, and $u(x)$ is the step function. It is clear that the LBP defined in Equation 4.13 is not rotation-invariant, as the intensity value $t_i$ changes when the circle is rotated by a specific angle; two patterns should be treated as the same type if one can be obtained from the other through rotation by a certain angle [25].
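A direct transcription of Equation 4.13 together with the uniformity test could look as follows for the basic 8-neighbour, radius-1 case. The neighbour ordering is an arbitrary but fixed choice, and the caller is assumed to skip the one-pixel image border.

```cpp
// Sketch of the 8-neighbour LBP operator (Equation 4.13) plus the uniformity
// test: a pattern is uniform when it has at most two 0/1 transitions when
// the 8 bits are read circularly.
#include <opencv2/core.hpp>

static bool isUniform(unsigned char code) {
    int transitions = 0;
    for (int i = 0; i < 8; ++i)
        transitions += ((code >> i) & 1) != ((code >> ((i + 1) % 8)) & 1);
    return transitions <= 2;       // e.g. 11100011 yes, 11010111 no
}

unsigned char lbp8(const cv::Mat& gray, int y, int x) {  // 1 <= y,x < edge-1
    static const int dy[8] = {-1,-1,-1, 0, 1, 1, 1, 0};
    static const int dx[8] = {-1, 0, 1, 1, 1, 0,-1,-1};
    uchar c = gray.at<uchar>(y, x);
    unsigned char code = 0;
    for (int i = 0; i < 8; ++i)                          // u(t_i - t_c) * 2^i
        code |= (gray.at<uchar>(y + dy[i], x + dx[i]) >= c) << i;
    return code;   // histogram uniform codes; non-uniform codes share one bin
}
```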
The results obtained after applying the Local Binary Pattern and after applying the Local Binary Pattern Uniform are shown in Figure 4.11.

Figure 4.11 (Left): Performed Local Binary Pattern, (Right): Performed Local Binary Pattern 58 Uniform

4.3 Compare Histogram with the Histograms of Emotions

In pattern recognition, the k-Nearest Neighbours algorithm (k-NN for short) is a non-parametric method used for classification and regression; in both cases, the input consists of the k closest training examples in the feature space, and the output depends on whether k-NN is used for classification or regression [27].

In this study, the relevant regions were found with the Active Shape Models algorithm, the features of each region were extracted with the Local Binary Pattern 58 Uniform, and the K-NN algorithm was used in the classification phase. First, the feature vector of every image in the dataset is extracted and stored under the class of the corresponding emotion. Then, during the testing phase, the relevant regions of the face in the image coming from the video are detected and a feature vector is extracted for each region. The distance between this feature vector and every stored vector of every class is computed and the k nearest neighbours are examined: the tested image is assigned the emotion of the class that occurs most often among those k nearest neighbours.
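A hand-rolled version of this classification step might look like the sketch below. The report does not state the distance metric, so plain L2 between histograms is assumed here, and the seven emotions are assumed to be encoded as integer labels 0-6; all names are illustrative.

```cpp
// Sketch of the k-NN vote over stored LBP histograms: compute the distance
// from the query to every training histogram, take the k nearest, and
// return the majority label.
#include <opencv2/core.hpp>
#include <algorithm>
#include <vector>

int knnClassify(const cv::Mat& query,                  // 1 x d, CV_32F
                const std::vector<cv::Mat>& train,     // stored histograms
                const std::vector<int>& labels,        // emotion ids, 0..6
                int k)                                 // assumes k <= train.size()
{
    std::vector<std::pair<double, int>> d;             // (distance, label)
    for (size_t i = 0; i < train.size(); ++i)
        d.push_back({cv::norm(query, train[i], cv::NORM_L2), labels[i]});
    std::partial_sort(d.begin(), d.begin() + k, d.end());
    int votes[7] = {0};
    for (int i = 0; i < k; ++i) votes[d[i].second]++;
    return (int)(std::max_element(votes, votes + 7) - votes);
}
```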
5 System Architecture Design

First, all the pictures and the shapes of the pictures are read in the main class. These shapes are then sent to the GeneralizedProcrustesAnalysis class, where all shapes are aligned using the ProcrustesAnlysis and PointList classes. The shape model is then generated by applying PCA to the aligned shapes, and the profile model is created using each shape in the set. Using the shape model and the profile model, the shape of the image is produced in the search process. The class diagram of the system is given in Figure 5.1.

Figure 5.1 Class Diagram
First, the ASM needs to be trained. For this, a dataset consisting of pictures and the shapes of those pictures is created, and a shape model is built during the training phase; all images are converted to grey scale for creating the profiles. In the testing phase, the eyes are detected first and the initial shape is created according to their position. The image being searched is converted to grey scale, and in the last step the ASM search algorithm is applied to find the shape in the image. The block diagram of the system is given in Figure 5.2.

Figure 5.2 Block diagram

5.1 Cohn-Kanade Dataset

The proposed algorithm was trained and tested on the Cohn-Kanade Facial Expression Database. This database consists of 100 university students aged from 18 to 30 years, of whom 65% were female, 15% African-American, and 3% Asian or Latino. Subjects were instructed to perform a series of 23 facial displays, seven of which were based on descriptions of prototype emotions (i.e., anger, disgust, fear, happy, neutral, sadness, and surprise). Image sequences from neutral to target display were digitized into 640x490 pixel arrays. For our experiments, we selected 320 image sequences from the database for basic emotional expression recognition. The sequences come from 96 subjects, with 1 to 6 emotions per subject. For each sequence, the neutral face and the three peak frames were used. To evaluate generalization performance, a 10-fold cross-validation test scheme was adopted [28].
Figure 5.3 The sample face expression images from the Cohn-Kanade database (modified from [28])
6 Experimental Results

The experimental results of the project are given below. Several images were used for the experiments; the progressive results for these images are presented visually in Figure 6.1, Figure 6.2 and Figure 6.3. The result in Figure 6.1 is successful, the result in Figure 6.2 is very successful, and the result in Figure 6.3 fails.

Figure 6.1 (Left): Initial, (Middle): Candidate, (Right): Result
Figure 6.2 (Left): Initial, (Middle): Candidate, (Right): Result
Figure 6.3 (Left): Initial, (Middle): Candidate, (Right): Result

Most facial algorithms aim to find expressions on high-resolution faces; however, most images taken in the real world, and therefore the faces and expressions in them, are of low resolution. Studies have shown that the LBP algorithm is not affected by images of different resolutions, which is why we use it in this work. Experimental observation showed that the mouth, nose, eye, forehead and eyebrow regions are the ones most affected by a change of emotion, so local feature vectors were derived from those regions with the LBP algorithm. Today, most images taken from a camera have a size of 480x640; we used the Cohn-Kanade database to train our system because its images are also 640x490.

Since the K-NN algorithm is used at the classification stage, we compared the results obtained from the system by giving different values to k when testing. The results are given in Tables 6.1-6.3 for the training dataset and in Tables 6.4-6.6 for the test dataset.

Table 6.1 Results of experiments with training dataset and k=1

          Anger  Disgust  Fear  Happy  Natural  Sad  Surprise  Accuracy (%)
Anger       23      0      0      0       0      0      0        100
Disgust      0     21      0      0       0      1      0        95.5
Fear         0      0     30      2       0      0      0        93.75
Happy        0      0      1     72       0      0      0        98.6
Natural      0      0      0      0      47      0      0        100
Sad          2      0      0      0       0     35      0        94.5
Surprise     0      0      0      0       0      0     27        100
Table 6.2 Results of experiments with training dataset and k=3

          Anger  Disgust  Fear  Happy  Natural  Sad  Surprise  Accuracy (%)
Anger       21      0      0      0       1      1      0        91.3
Disgust      0     21      0      0       0      1      0        95.5
Fear         0      0     26      4       1      1      0        81.3
Happy        0      0      0     68       0      3      2        93.2
Natural      1      0      1      0      43      1      1        91.5
Sad          2      0      1      0       1     33      0        89.2
Surprise     0      0      0      0       0      0     27        100

Table 6.3 Results of experiments with training dataset and k=5

          Anger  Disgust  Fear  Happy  Natural  Sad  Surprise  Accuracy (%)
Anger       21      0      0      0       1      1      0        91.3
Disgust      0     21      0      0       0      1      0        95.5
Fear         0      0     28      2       1      1      0        87.5
Happy        0      0      2     68       0      1      2        93.2
Natural      1      0      1      0      44      1      0        93.6
Sad          2      0      0      0       2     33      0        89.2
Surprise     0      1      0      0       0      0     26        96.2

Table 6.4 Results of experiments with test dataset and k=1

          Anger  Disgust  Fear  Happy  Natural  Sad  Surprise  Accuracy (%)
Anger       19      0      0      0       1      2      1        82.6
Disgust      2     18      0      0       0      2      0        78.3
Fear         1      0     18      7       3      3      0        56.3
Happy        0      2      0     64       0      5      2        87.7
Natural      2      0      2      0      40      1      2        85.1
Sad          2      0      2      1       3     29      0        78.4
Surprise     0      0      0      0       1      1     27        93.1

Table 6.5 Results of experiments with test dataset and k=3

          Anger  Disgust  Fear  Happy  Natural  Sad  Surprise  Accuracy (%)
Anger       20      0      0      0       1      1      1        86.9
Disgust      2     18      0      0       0      2      0        78.3
Fear         1      0     22      6       2      1      0        68.75
Happy        0      2      0     66       0      4      1        90.4
Natural      2      0      2      0      39      2      2        82.9
Sad          2      0      2      1       2     30      0        81
Surprise     0      1      0      0       0      1     27        93.1
Table 6.6 Results of experiments with test dataset and k=5

          Anger  Disgust  Fear  Happy  Natural  Sad  Surprise  Accuracy (%)
Anger       21      0      0      0       2      0      0        91.3
Disgust      1     19      0      0       0      2      0        86.3
Fear         1      0     25      4       1      1      0        78.12
Happy        0      2      0     66       0      4      1        90.4
Natural      2      0      2      0      39      2      2        82.9
Sad          0      0      2      1       1     33      0        89.18
Surprise     0      1      0      0       0      1     27        93.1
7 Conclusion

In this study, a dataset (MUCT) of previously manually landmarked faces was used. The shapes were first aligned to a common reference. A shape model and a profile model were then created from the aligned shapes, with the PCA algorithm used to construct the shape model: new shapes were generated from the eigenvectors and eigenvalues obtained with PCA, and a profile model was created for the points representing each shape. Using the generated profile model, the initial shape was iteratively matched to the desired shape. Briefly, the ASM algorithm's shape model and profile model were used to determine the meaningful regions of the face.

For the facial expression recognition stage, a method based on the Local Binary Pattern 58 Uniform algorithm and K-NN was introduced: Local Binary Pattern 58 Uniform is used for local texture feature extraction, and K-NN for expression classification and recognition. Experimental results show that the method adopted in this study is robust across different expressions. Finally, if the system is trained specifically for each person to be tested, the best result is obtained with k = 1; if a person is not specifically trained for, the best result is obtained with k = 5.
References

[1] S. Milborrow, J. Morkel, and F. Nicolls, "The MUCT landmarked face database," Pattern Recognition Association of South Africa, 2010, http://www.milbo.org/muct.
[2] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham, "Active shape models: their training and application," Computer Vision and Image Understanding, vol. 61, no. 1, pp. 38-59, 1995.
[3] Z. Guo, L. Zhang, and D. Zhang, "A completed modeling of local binary pattern operator for texture classification," IEEE Transactions on Image Processing, vol. 19, no. 6, pp. 1657-1663, 2010.
[4] M.-L. Zhang and Z.-H. Zhou, "A k-nearest neighbor based algorithm for multi-label classification," in 2005 IEEE International Conference on Granular Computing, IEEE, vol. 2, 2005, pp. 718-721.
[5] Z. Zhang, M. Lyons, M. Schuster, and S. Akamatsu, "Comparison between geometry-based and Gabor-wavelets-based facial expression recognition using multi-layer perceptron," in Automatic Face and Gesture Recognition, 1998. Proceedings. Third IEEE International Conference on, IEEE, 1998, pp. 454-459.
[6] C. Shan, S. Gong, and P. W. McOwan, "Facial expression recognition based on local binary patterns: A comprehensive study," Image and Vision Computing, vol. 27, no. 6, pp. 803-816, 2009.
[7] I. Cohen, N. Sebe, A. Garg, L. S. Chen, and T. S. Huang, "Facial expression recognition from video sequences: Temporal and static modeling," Computer Vision and Image Understanding, vol. 91, no. 1, pp. 160-187, 2003.
[8] J. R. Jensen, "Introductory digital image processing: A remote sensing perspective," Univ. of South Carolina, Columbus, Tech. Rep., 1986.
[9] A. Dix, Human-Computer Interaction. Springer, 2009.
[10] W. K. Kong, D. Zhang, and W. Li, "Palmprint feature extraction using 2-D Gabor filters," Pattern Recognition, vol. 36, no. 10, pp. 2339-2347, 2003.
[11] A. J. Izenman, "Linear discriminant analysis," in Modern Multivariate Statistical Techniques, Springer, 2013, pp. 237-280.
[12] J. A. Suykens and J. Vandewalle, "Least squares support vector machine classifiers," Neural Processing Letters, vol. 9, no. 3, pp. 293-300, 1999.
[13] P. Horton and K. Nakai, "Better prediction of protein cellular localization sites with the k nearest neighbors classifier," in ISMB, vol. 5, 1997, pp. 147-152.
[14] Microsoft Visual Studio - Wikipedia, https://en.wikipedia.org/wiki/Microsoft_Visual_Studio, Accessed: 2016-10-29.
[15] C++ - Wikipedia, https://en.wikipedia.org/wiki/C%2B%2B, Accessed: 2016-10-29.
[16] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly Media, Inc., 2008.
[17] Pricing and purchasing options | Visual Studio, https://www.visualstudio.com/tr/vs/pricing/, Accessed: 2016-10-28.
[18] Windows 10 Pro: Yükselt veya satın al - Microsoft Mağazası Türkiye, https://www.microsoftstore.com/store/msmea/tr_TR/pdp/productID.320421400?VID=320421600&s_kwcid=AL!4249!3!157215759337!!!g!18283950120!&WT.mc_id=tr_datafeed_pla_google_pointitsem_office&ef_id=WJHaOwAABaNV0ub8:20170201125315:s, Accessed: 2016-10-28.
[19] Ö. Ayhan, "Yüz öznitelik çıkarımı için geliştirilmiş aktif şekil modeli," PhD thesis, Fen Bilimleri Enstitüsü, 2013.
[20] S. Milborrow, T. Bishop, and F. Nicolls, "Multiview active shape models with SIFT descriptors for the 300-W face landmark challenge," in Proceedings of the IEEE International Conference on Computer Vision Workshops, 2013, pp. 378-385.
[21] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, IEEE, vol. 1, 2001, pp. I-511.
[22] I. Ari, A. Uyar, and L. Akarun, "Facial feature tracking and expression recognition for sign language," in Computer and Information Sciences, 2008. ISCIS'08. 23rd International Symposium on, IEEE, 2008, pp. 1-6.
[23] G. E. Gunbas, A. Durmus, and L. Toppare, "Could green be greener? Novel donor-acceptor-type electrochromic polymers: Towards excellent neutral green materials with exceptional transmissive oxidized states for completion of RGB color space," Advanced Materials, vol. 20, no. 4, pp. 691-695, 2008.
[24] D. Chai and A. Bouzerdoum, "A Bayesian approach to skin color classification in YCbCr color space," in TENCON 2000. Proceedings, IEEE, vol. 2, 2000, pp. 421-424.
[25] S. Liao, W. Fan, A. C. Chung, and D.-Y. Yeung, "Facial expression recognition using advanced local binary patterns, Tsallis entropies and global appearance features," in Image Processing, 2006 IEEE International Conference on, IEEE, 2006, pp. 665-668.
[26] T. Ahonen, J. Matas, C. He, and M. Pietikäinen, "Rotation invariant image description with local binary pattern histogram Fourier features," in Scandinavian Conference on Image Analysis, Springer, 2009, pp. 61-70.
[27] N. S. Altman, "An introduction to kernel and nearest-neighbor nonparametric regression," The American Statistician, vol. 46, no. 3, pp. 175-185, 1992.
[28] P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar, and I. Matthews, "The extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression," in Computer Vision and Pattern Recognition Workshops (CVPRW), 2010 IEEE Computer Society Conference on, IEEE, 2010, pp. 94-101.
Curriculum Vitae

PERSONAL INFORMATION (MEMBER 1)

Name-Surname: Cafer YILDIZ
Birthdate and Place of Birth: 07.07.1988, Diyarbakır
E-mail: caferyildiz3@gmail.com
Phone: 0545 494 11 48
Practical Training: Evren Bilgisayar

PERSONAL INFORMATION (MEMBER 2)

Name-Surname: Musa GÖKMEN
Birthdate and Place of Birth: 24.11.1989, Diyarbakır
E-mail: musagokmen21@gmail.com
Phone: 0544 934 72 21
Practical Training: Inoart Bilişim Hizmetleri A.Ş.

Project System Information

System and Software: Windows Operating System, C++
Required RAM: 3 GB
Required Disk: 25 GB