MASTER'S THESIS
To obtain the Master's degree in:
Advanced Engineering of Robotized Systems
and Artificial Intelligence
Presented by: Bouzidi Amir
Emotions prediction for augmented EEG signals using VAE
and Convolutional Neural Networks CNN combined with LSTM
Presented on: July 4th, 2021
Graduation committee:
President: Mr. Sayadi Mounir
Technical advisors: Ms. Fourati Rahma and Mr. Yangui Maher
Report advisor: Ms. Ammar Boudour
Member: Ms. Selmani Anissa
Academic year: 2020/2021
Abstract (translated from Arabic)
This end-of-studies project consists of building an intelligent system for predicting emotions from augmented EEG signals using VAE and convolutional neural networks with LSTM. The goal of the project is to improve medical diagnosis, in particular by facilitating the detection of emotions and thus moving quickly to the treatment stage for patients suffering from psychological disorders and anxiety. This work helps both the medical staff and the patient. In the first stage, we generated new data via conditional variational autoencoders (cVAE) to provide a sufficient amount of data. Second, we used convolutional neural networks with the Long Short-Term Memory (LSTM) technique to predict emotions. Finally, we created a simple, easy-to-use interface for users for medical diagnosis.
Abstract (translated from French)
Our end-of-study project consists of developing an intelligent system for predicting emotions using EEG signals augmented with VAE and convolutional neural networks combined with LSTM. The objective of the project is to improve medical diagnosis, in particular by facilitating the detection of emotions and applying rapid treatment to patients, especially those suffering from trauma and psychological anxiety. This work helps both the medical staff and the patient. In a first step, we generated new data via cVAEs to provide a sufficient amount of data. Second, we used CNNs combined with the LSTM technique to predict emotions. Finally, we created an ergonomic, easy-to-access interface for users in medical diagnosis.
Abstract
Our end-of-study project aims to develop an intelligent emotion prediction system based on
EEG signals augmented with VAE, using convolutional neural networks combined with LSTM.
The aim of the project is to improve medical diagnosis, in particular by facilitating the
detection of emotions and applying quick treatment to patients, especially those suffering
from trauma and anxiety. This work helps both the medical staff and the patient. In a first
step, we generated new data via cVAEs to provide a sufficient amount of data. Second, we
used CNNs combined with the LSTM technique to predict emotions. Finally, we created an
ergonomic, easy-to-access interface for users in medical diagnosis.
Keywords (translated from Arabic): EEG, CNN, RNN, VAE, LSTM, Python, prediction, emotion recognition, semi-supervised learning, deep learning, machine learning, datasets.
Keywords (translated from French): EEG, CNN, RNN, VAE, LSTM, Python, prediction, emotion recognition, semi-supervised learning, deep learning, machine learning, datasets.
Key-words: EEG, CNN, RNN, VAE, LSTM, Python, prediction, emotion recognition, semi-supervised learning, deep learning, machine learning, datasets.
Acknowledgments
In the Name of Allah, the Most Beneficent, the Most Merciful
The Prophet Mohammad, peace be upon him, said:
"Allah does not thank the person who does not thank people."
• I would first like to thank my thesis advisor, Ms. Boudour
Ammar, PhD, Eng. and Assistant Professor at the National
Engineering School of Sfax, for her great support, help and
precious advice during this internship.
• I would also like to express my gratitude to Doctor Rahma
Fourati, member of the REGIM Lab at ENIS; her office door
was always open whenever I ran into a trouble spot or had a
question about my research or writing. She consistently allowed
this report to be my own work, but steered me in the right
direction whenever she thought I needed it.
• I would like to thank the industrial head of CISEN Computer,
Mr. Maher Yangui, the IT engineer, who gave me the
opportunity to accomplish this internship with their esteemed
company. I would also like to thank the administrative staff
of UVT for their coordination and support during my studies.
• Finally, I must express my very profound gratitude to my
parents, Mokhtar and Baya, to my brothers, Chedi and
Jemil, and to my friends and colleagues for providing me with
unfailing support and continuous encouragement throughout
my years of study and through the process of researching
and writing this thesis. This accomplishment would not have
been possible without them. Thank you.
Amir Bouzidi
Chapter 1
General Introduction
Affective computing is the study and development of systems and devices
that can recognize, interpret, process, and simulate human affects. It is an
interdisciplinary field spanning computer science, psychology, and cognitive
science. The machine should interpret the emotional state of humans and
adapt its behavior to them, giving an appropriate response for those emo-
tions.
Affective computing technologies sense the emotional state of a user (via
sensors, microphones, cameras or software logic). They respond by performing
specific predefined product/service features, such as changing a quiz or
recommending a set of videos to fit the mood of the learner. The more
computers we have in our lives, the more we will want them to be socially
smart. We do not want them to bother us with unimportant information. That
kind of common-sense reasoning requires an understanding of the person's
emotional state.
A major area in affective computing is the design of computational devices
that either exhibit innate emotional capabilities or are capable of
convincingly simulating emotions. A more practical approach, based on
current technological capabilities, is the simulation of emotions in
conversational agents in order to enrich and facilitate interactivity between
human and machine. While human emotions are often associated with surges
in hormones and other neuropeptides, emotions in machines might be
associated with abstract states linked to progress (or lack of progress) in
autonomous learning systems. In this view, affective emotional states
correspond to time-derivatives in the learning curve of an arbitrary learning
system. Two major categories describe emotions in machines: emotional
speech and facial affect detection.
Anxiety is a kind of negative emotion. In this work, the goal is to
design an application for the visualization of EEG signals, the inspection of
the topographical map and the recognition of the anxiety level.
The desktop application is useful for the therapist as a first diagnosis to
choose the convenient technique to continue the therapy. In other words,
some people do not share their thoughts or their mental state. In such cases,
the inspection of the EEG signals allows access to the inner state of the
human mind. The current work proposes two contributions:
• The generation of synthetic EEG data in order to enrich the
dataset and guarantee efficient training of a deep neural network.
• The recognition of anxiety states with a recurrent neural network
composed of convolutional layers followed by a Long Short-Term
Memory (LSTM) network to capture spatio-temporal features in clean
EEG signals.
The rest of this master's thesis is composed of four chapters, organised
as follows:
• Chapter Two: This chapter presents the theoretical background and
literature review on EEG-based emotion recognition as well as EEG-
based anxiety level recognition. It also provides a background on EEG
signals, including EEG rhythms, analysis techniques for EEG signals
and the role of EEG as a modality for emotion recognition. In addition,
the chapter covers the machine learning techniques used for emotion
recognition (facial recognition, speech recognition, body gestures and
movements, motor behavioural patterns and biosignals), followed by an
overview of the existing affective benchmarks, the available datasets,
and the framework and steps for this project.
• Chapter Three: This chapter introduces data projection with
autoencoders, followed by an explanation of the AE architecture,
variational autoencoders and the gap between AE and VAE. Then, we
choose the conditional variational autoencoder for writing the code on
COLAB. Finally, we discuss the metrics of our machine learning models.
• Chapter Four: This chapter presents the proposed model for the
recognition of emotion states. We discuss the advantages of CNNs, their
principle, architecture and real-world applications, then describe the
difference between 1D CNNs and the 2D CNN used in our code, followed
by a brief explanation of recurrent neural networks (RNN) and their
importance for our application with continuous signals like EEG. We
then cover the LSTM technique used in our code, followed by an
analysis of the code; finally, we present the graphical user interface and
the GANTT chart for the whole project.
• Chapter Five: This chapter presents the obtained results, summarizing
this experience and the contribution to future work. The thesis
ends with a conclusion that provides a summary of our contributions,
outlines the conclusions and the limitations of this research, and
suggests several directions for future research.
Chapter 2
Review of EEG-based emotion
recognition
2.1 Introduction
The mechanisms that regulate our physiological and mental processes behave
in a coupled way, with a remarkable inter-dependency. Mental
processes are responsible for changes of the physiological state in our body.
Conversely, changes in bodily functions also lead to different thoughts,
behaviours and emotions. In this chapter, we will present the EEG modality
along with its brain waves, then discuss the phenomenon of anxiety
in Tunisia and worldwide, followed by a presentation of the latest machine
learning techniques used for emotion recognition. Finally, we will introduce
the mobile applications most used in our community for treating anxiety
and identifying emotions, with an overview of the elicitation techniques for
anxious states.
2.2 Preliminaries
2.2.1 Electroencephalography
Electroencephalography, also known as EEG, is the study of brain functions
reflected by the brain's electrical activity, and it is considered one
of the basic tools for imaging brain functioning. Our thoughts are generated
through a network of neurons that send signals to each other with the help
of electrical currents. Electrodes of an EEG headset placed on the scalp
are used to collect the brain's electrical signals. In addition, a
conductive paste is used to improve the conduction of the electrical signals.
The EEG headset used in this study is an elastic cap, similar to a bathing
cap, with the electrodes mounted on it. The electrodes are placed
systematically on the cap using the international 10-20 system for electrode
placement to ensure that the data can be collected from identical positions
across all the respondents. These electrodes detect the electrical changes of
thousands of synchronized neurons simultaneously. The voltage fluctuations
measured by these kinds of sensors are very low, typically in microvolts.
The signals are amplified and then digitized.
Once digitized, the signals are sent to a computer, where
they can be recorded. Using different methodologies, the signals can be
represented as a vector array or matrix array for data processing utilities
[Cong et al., 2015]. In addition, various maps of brain activity can be
generated, with a fast temporal resolution.
Figure 2.1: Measuring the electrical activity using electrodes fixed on an
EEG cap
[Web 2]
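The vector/matrix representation mentioned above can be sketched in a few lines. A minimal example with NumPy, assuming a hypothetical 32-channel recording at 128 Hz (both values are illustrative, not tied to any particular headset):

```python
import numpy as np

# Hypothetical recording: 32 electrodes sampled at 128 Hz for 10 seconds.
n_channels, fs, duration = 32, 128, 10
rng = np.random.default_rng(0)
eeg = rng.normal(scale=50e-6, size=(n_channels, fs * duration))  # microvolt scale

# Matrix form: one row per electrode, one column per time sample.
assert eeg.shape == (32, 1280)

# A simple per-channel summary, e.g. mean absolute amplitude:
mean_amplitude = np.abs(eeg).mean(axis=1)  # one value per electrode
assert mean_amplitude.shape == (32,)
```

Downstream processing (filtering, feature extraction, topographic mapping) then operates on this channels-by-samples matrix.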
The main drawback of EEG is its spatial resolution: it is difficult to
tell whether the signals measured by the electrodes were generated
near the surface or in deeper regions of the brain, so it is hard to localize
where in the brain the electrical activity is coming from. The cost of EEG
systems depends on several factors: firstly, the number of electrodes on the
headset; secondly, the quality of the amplifier; and thirdly, the sampling
rate, measured in Hz.
One of the major advantages of using EEG is its excellent temporal
resolution, meaning that it can measure events happening in real time
in fine detail. According to Neuroscience News, researchers believe
that it takes about 17 milliseconds for the brain to form a representation of a
human face, making EEG the perfect candidate, as EEG can capture activity
at a time scale down to milliseconds [ScienceDaily, 2018].
2.2.2 Brain waves
The development of technologies such as virtual reality and wearable devices,
together with an understanding of the physiological responses to emotional
states, can serve a wide range of valuable applications in diverse domains:
• Medicine: Rehabilitation (help monitoring), companion (enhance
realism), counseling (client's emotional state), health care (patient's
feelings about treatment, especially for deaf, mute and blind patients).
• E-learning: Adjust the online presentation of an online tutor, detect
the state of the learner, improve tutoring systems.
• Monitoring: Detect the driver's state and warn them, ATMs not
dispensing money when the user is scared, improve call-center systems
(detect and prioritize angry customers via their voices).
• Entertainment: Recognize mood and emotions of the users and sat-
isfy their needs with the right content (Movies and music recommen-
dations).
• Law: Deeper discovery of depositions: improve investigation tools
for criminals, suspects and witnesses.
• Marketing: Measure the impact of ads, improve advertising plans,
optimize recommendation systems, and thus enhance the user shopping
experience and increase sales.
Brain waves represent the regularly recurring wave forms that are similar
in shape and duration [Steriade, 2005]. There are five main EEG frequency
bands: Delta, theta, alpha, beta and gamma which reflect the different brain
states [NeuroSky, 2009].
Brain waves and the functions of each EEG band are described below:
• Delta waves (0.1-3 Hz): Appear in dreamless sleep and unconscious
states.
• Theta waves (4-7 Hz): Observed in different states such as intuitive,
creative, imaginary and drowsy states.
• Alpha waves (8-12 Hz): The first EEG waves that were discovered by
Berger [Niedermeyer and da Silva, 2005]. Alpha waves appear in
relaxed, tranquil and conscious states, but not during drowsiness.
Alpha waves become attenuated in several situations, such as eye
opening, hearing sounds, anxiety or attention.
• Beta waves (12-30 Hz): Observed in the active state and anxious think-
ing. There are three different bands of beta waves:
– Low beta waves (12-15 Hz): Appear in relaxed yet focused and
integrated cases.
– Mid-range beta waves (16-20 Hz): Appear in thinking and aware-
ness of self and surroundings.
– High beta (21-30 Hz): Observed in alertness and agitation states.
• Gamma waves (30-100 Hz): Observed in higher mental activity such
as in processing information and learning.
The EEG oscillations of the same frequency may have different functions, as
depicted in Figure 2.2; for example, delta oscillations can be normal or
abnormal depending on the state: normal during slow-wave sleep, and a clear
sign of abnormality during the awake state [Freeman and Quiroga, 2012].
Figure 2.2: The five EEG signals and their associated activities
[Web 3]
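The frequency bands listed above can be separated with a simple band-pass filter bank. A minimal sketch with SciPy, assuming a 128 Hz sampling rate; the lower delta edge is raised from 0.1 Hz to 0.5 Hz for numerical stability, and gamma is capped at 45 Hz to stay below the Nyquist frequency (all of these choices are illustrative):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

# Band edges follow the list above (with the illustrative adjustments
# noted in the text: delta from 0.5 Hz, gamma capped at 45 Hz).
BANDS = {"delta": (0.5, 3), "theta": (4, 7), "alpha": (8, 12),
         "beta": (12, 30), "gamma": (30, 45)}

def band_power(signal, fs, low, high, order=4):
    """Band-pass filter one EEG channel and return its mean power."""
    sos = butter(order, [low, high], btype="band", fs=fs, output="sos")
    return float(np.mean(sosfiltfilt(sos, signal) ** 2))

fs = 128
t = np.arange(fs * 4) / fs
channel = np.sin(2 * np.pi * 10 * t)  # pure 10 Hz tone: alpha range
powers = {name: band_power(channel, fs, lo, hi)
          for name, (lo, hi) in BANDS.items()}
assert max(powers, key=powers.get) == "alpha"
```

A 10 Hz test tone, as expected, lands almost entirely in the alpha band; real EEG would show a mixture of powers across all five bands.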
2.2.3 Anxiety disorder
While the origins of the field may be traced as far back as early
philosophical inquiries into emotion ("affect" is, basically, a synonym for
"emotion"), the more modern branch of computer science originated with
Rosalind Picard's book [Picard, 1997] on affective computing. The motivation
for the research is the ability to simulate empathy.
According to [Web 4], anxiety refers to multiple mental and physiological
phenomena, including a person's conscious state of worry over a future
unwanted event, or fear of an actual situation. Anxiety and fear are closely
related; some scholars view anxiety as a uniquely human emotion and fear as
common to nonhuman species. Another distinction often made between fear
and anxiety is that fear is an adaptive response to a realistic threat, whereas
anxiety is a diffuse emotion, sometimes an unreasonable or excessive reaction
to a current or future perceived threat.
2.2.4 Worldwide and Tunisia statistics
According to the ADAA [Web 5], anxiety disorders are the most common
mental illness in the United States, affecting 40 million adults in the country
aged 18 and older, or 18.1% of the population, every year. Anxiety disorders
are highly treatable, yet only 36.9% of those suffering receive treatment.
Anxiety disorders affect 25.1% of children between 13 and 18 years old.
Research shows that untreated children with anxiety disorders are at higher
risk of performing poorly in school, missing out on important social
experiences, and engaging in substance abuse. The WHO reports that anxiety
disorders are the most common mental disorders worldwide, with specific
phobia, major depressive disorder and social phobia being the most common.
The figure below, quoted from the UN Sustainable Development Solutions
Network Report of 2018, shows that Tunisia ranked 111th in the happiness
index. This result reflects the deterioration of the mental health situation
within Tunisian society and the need to optimize existing solutions.
Figure 2.3: Happiness index for 2015-2017
[Web 6]
2.3 Ways of emotion detection using machine
learning
Different ways exist for emotion recognition through machine learning.
Here are the most recently used techniques for emotion identification:
2.3.1 Facial Recognition
Facial recognition based on machine learning (ML) is a widely used method
for detecting emotions. It takes advantage of the fact that our facial
characteristics change dramatically in response to our emotions. When we are
happy, for example, the corners of our lips turn upwards. Similarly, when
we are excited, we raise our eyebrows.
Facial recognition is a valuable emotion detection technology in which
pixels of critical facial regions are evaluated to characterize facial expressions
using facial landmarks, machine learning and deep learning. Eyes, nose, lips,
jaw, eyebrows, mouth, and other facial landmarks are employed in emotion
detection using machine learning. While a distinct facial landmark may
be present in two separate emotions, a detailed analysis of the combination of
different landmarks using artificial intelligence via machine learning can help
distinguish between similar-appearing but distinct emotions. For example,
while elevated eyebrows can indicate astonishment, they can also be a sign
of worry. Raised brows with raised lip corners, on the other hand, would
signal a joyful surprise rather than anxiety. Face recognition can be used to
detect emotions in surveillance and healthcare.
Figure 2.4: Extracting facial features
[Web 8]
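The combination logic described above (elevated eyebrows alone versus elevated eyebrows with raised lip corners) can be illustrated with a toy rule. The feature names, which stand for normalized 0-1 intensities, and the thresholds are invented for this sketch and do not come from any real landmark-detection library:

```python
# Toy rule combining two facial cues; the feature names and thresholds
# are invented for illustration, not taken from a real landmark API.
def classify_expression(brow_raise: float, lip_corner_raise: float) -> str:
    """Distinguish similar-looking expressions by combining two landmarks."""
    if brow_raise > 0.5:               # eyebrows clearly elevated
        if lip_corner_raise > 0.3:     # lip corners raised as well
            return "joyful surprise"
        return "astonishment or worry"
    return "neutral"

assert classify_expression(0.8, 0.6) == "joyful surprise"
assert classify_expression(0.8, 0.0) == "astonishment or worry"
assert classify_expression(0.1, 0.0) == "neutral"
```

A real system would replace this hand-written rule with a classifier trained over many landmark features, but the principle of disambiguating one cue by another is the same.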
2.3.2 Speech Recognition
Speech feature extraction and voice activity detection are required for emo-
tion identification using speech recognition. The method entails utilizing
machine learning to analyze speech parameters such as tone, energy, pitch,
formant frequency, and so on, and determining emotions based on changes
in these features.
Because voice signals can be obtained quickly and cheaply, ML-based
emotion identification via speech, also known as Speech Emotion Recogni-
tion (SER), is very popular. A good audio database, effective feature extrac-
tion, and the deployment of trustworthy classifiers employing ML techniques
and Natural Language Processing (NLP) are all required for speech emotion
recognition using machine learning.
Both feature extraction and feature selection are critical for reliable
findings. The raw data is then classified into a certain emotion class, based
on the features retrieved from the data, using classification techniques such
as the Gaussian Mixture Model (GMM), Hidden Markov Model (HMM),
Support Vector Machine (SVM), Neural Networks (NN), and Recurrent
Neural Networks (RNN).
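Two of the speech parameters named above, energy and pitch, can be sketched with plain NumPy. The 8 kHz sampling rate, the 50 ms frame and the synthetic 200 Hz "voiced" tone are illustrative assumptions:

```python
import numpy as np

def frame_energy(frame):
    """Mean squared amplitude of one analysis frame."""
    return float(np.sum(frame ** 2) / len(frame))

def pitch_autocorr(frame, fs, fmin=80, fmax=400):
    """Estimate pitch from the autocorrelation peak within [fmin, fmax] Hz."""
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)   # lag range for the search
    lag = lo + int(np.argmax(ac[lo:hi]))
    return fs / lag

fs = 8000
t = np.arange(int(0.05 * fs)) / fs            # one 50 ms frame
frame = np.sin(2 * np.pi * 200 * t)           # synthetic voiced frame at 200 Hz
pitch = pitch_autocorr(frame, fs)
assert abs(pitch - 200.0) < 10                # recovered pitch is close to 200 Hz
assert frame_energy(frame) > 0
```

Sequences of such per-frame features are what the classifiers listed above consume.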
Major application areas for SER are audio surveillance, e-learning, clinical
studies, banking, entertainment, call-centers, gaming, and many more. For
example, emotion detection in e-learning helps understand students’ emo-
tions and modify the teaching techniques accordingly [Web 7].
Figure 2.5: Extracting speech features
[Web 9]
2.3.3 Body Gestures and Movements
With the help of machine learning, analyzing body movements and gestures
can also aid in emotion identification. With changes in emotions, our bodily
movements, posture, and gestures alter dramatically. This is why, based on
a mix of hand/arm gestures and body movements, we can usually infer a
person's basic mood. A clenched fist with an alert stance, for example, is an
indication of rage.
Every shift in human mood is followed by a succession of gestures and
changes in body movement. With the use of proper machine learning
classifier algorithms and gesture sensors like Microsoft Kinect, OpenKinect, and
OpenNI, analyzing a combination of various gestures and body motions can
provide excellent insights into emotion recognition.
The process of emotion detection through body gestures and movements
involves the extraction of regions in relevant body parts, for example a
hand region mask obtained from the hands. Contour analysis is then
performed on this region, which produces contours and convexity defects;
these are used for classification. Five extended fingers imply an open hand
and no extended finger implies a fist.
Figure 2.6: Emotion recognition using hand movements
[Web 10]
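The open-hand/fist rule above can be written as a toy classifier; attaching "anger" to the fist case is only an illustrative cue from the earlier example, not a validated mapping from gesture to emotion:

```python
# Toy classifier over the extended-finger count produced by contour /
# convexity-defect analysis; labels are illustrative only.
def classify_hand(extended_fingers: int) -> str:
    if extended_fingers == 0:
        return "fist"          # with an alert stance, a possible cue for anger
    if extended_fingers == 5:
        return "open hand"
    return "partial"

assert classify_hand(0) == "fist"
assert classify_hand(5) == "open hand"
assert classify_hand(2) == "partial"
```

In a full pipeline, the finger count would come from the convexity defects of the hand-region contour rather than being passed in directly.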
2.3.4 Motor Behavioural Patterns
The changes in a person’s behavioral patterns with muscle tension, strength,
coordination, and frequency can also help characterize changes in their emo-
tional state when using the correct machine learning algorithms. As a result,
these are useful factors for machine learning-based emotion identification.
A cheerful state, for example, is shown by symmetric up and down hand
gestures. This method leverages the fact that our body muscles react
significantly to changes in our emotional state, as a reflex action. While
we may not even be aware of how prominent these changes are, these motor
behavioural changes, if recorded and analyzed properly through machine
learning techniques, act as great indicators for emotion detection.
Figure 2.7: Emotion recognition using walking behavioural features
([Randhavane, 2020])
2.3.5 Biosignals
Emotion detection through biosignals is the process of analyzing biologi-
cal changes occurring with emotion changes. Biosignals include heart rate,
temperature, pulse, respiration, perspiration, skin conductivity, electrical
impulses in the muscles, and brain activity. For example, a rapidly increasing
heart rate indicates a state of stress or anxiety [Web 7].
These biosignals, also known as physiological signals, aid in gaining
knowledge about human physiological states. The problem is that a single
biosignal is insufficient because it can convey a variety of emotions. As a
result, several biosignals from various areas of the body are combined and
examined as a whole. These biosignal combinations are then analyzed using
machine learning techniques such as convolutional neural networks (CNN)
and classification
algorithms such as regression tree, support vector machine, linear discrimi-
nant analysis, and Naive Bayes, among others. This technology is practical
since it is now possible to record and analyze biosignals using smart wear-
able devices. More complicated biosignals are also recorded for healthcare
purposes using electroencephalography (EEG), electrocardiography (ECG),
and electromyography (EMG).
Figure 2.8: Emotion recognition using biosignals
[Web 11]
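Combining several biosignals into one feature vector and then classifying the combination, as described above, can be sketched with a nearest-centroid rule. The feature values and the two-class calm/anxious setup are invented for illustration, not taken from any dataset:

```python
import numpy as np

def fuse(heart_rate, skin_conductance, respiration_rate):
    """Concatenate several biosignal measurements into one feature vector."""
    return np.array([heart_rate, skin_conductance, respiration_rate], float)

# Hypothetical class centroids, as if learned from labelled recordings.
CENTROIDS = {"calm":    fuse(65.0, 2.0, 14.0),
             "anxious": fuse(95.0, 8.0, 22.0)}

def classify(sample):
    """Assign the sample to the class with the nearest centroid."""
    return min(CENTROIDS, key=lambda c: np.linalg.norm(sample - CENTROIDS[c]))

assert classify(fuse(70, 3, 15)) == "calm"     # near the calm centroid
assert classify(fuse(100, 9, 24)) == "anxious"
```

The classifiers named in the text (SVM, discriminant analysis, Naive Bayes, CNNs) play the role of this nearest-centroid rule in a real system, operating on far richer fused features.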
In conclusion, the greatest results in emotion detection using machine
learning may be obtained by combining two or more of these approaches. As
the number of users grows, learning ability improves, and the data obtained
from these strategies enhances the results. EEG, as a breakthrough machine
learning modality for understanding emotions, has the potential to
be a game-changing method for treating millions of individuals all over the
world.
2.4 Existing applications for anxiety detection and treatment
2.4.1 Wysa: Depression and anxiety therapy chatbot
Wysa is an emotionally intelligent chatbot that uses AI to react to the
user's emotional expressions, free of charge. Users can talk to the cute
penguin or use its free mindfulness exercises for effective anxiety relief,
depression and stress management. With a 4.8/5 rating, the highest among
health care apps, and over 1 million downloads, Wysa obtained the ORCHA
prize for best stress apps (ORCHA: a British organization for testing and
reviewing health apps) and the Editor's Choice WMHD (World Mental
Health Day), both for 2019.
Figure 2.9: Screenshot from the Penguin chatbot
[Web 12]
If the patient is dealing with stress, anxiety and depression or coping
with low self-esteem, then talking to Wysa can help them relax and get
unstuck; it is empathetic, helpful, and will never judge. People can overcome
their mental health obstacles through empathetic conversation and free
techniques based on CBT (Cognitive Behavioral Therapy). It is used around
the clock and trusted by 1,000,000 people.
For extra support, people can get guidance from a real human coach,
a skilled psychologist who takes them through advanced coaching sessions
tailored to their needs.
Users can vent, talk things through or just reflect on their day with the AI
chatbot, and practice CBT and DBT (Dialectical Behavior Therapy)
techniques to build resilience in a fun way using one of 40 conversational
coaching tools, which help in dealing with stress, anxiety, depression, panic
attacks, worry, loss or conflict, and in managing anxious thoughts: deep
breathing, techniques for observing thoughts, visualization, and tension
relief [Web 12].
In brief, this application gathers data through the messages received from
users, and its AI algorithms improve their answers using this huge database.
2.4.2 Daylio:
Daylio is a highly flexible tool which we can use to track whatever we
want: exercise, meditation, food, and gratitude, with a fitness goal buddy,
mental health coach, and food log in one program that looks after mental,
emotional, and physical well-being. Self-care is essential for a better mood
and less worry. With a 4.6/5 rating from 320,000 users, Daylio has surpassed
10 million downloads.
Figure 2.10: Screenshots from Daylio
[Web 13]
This application is built on three principles:
1. Reach happiness and self-improvement by being mindful of our days.
2. Validate our hunches: how does our new hobby influence our life?
3. Form a new habit in an obstacle-free environment with no learning curve.
Finally, this app does not gather any data, as mentioned on its Google Play
page: “We don’t send your data to our servers. We don’t have access to your
entries. Also, any other third-party app can’t read your data.”
2.4.3 Headspace:
With over 10 million downloads and a 4.6/5 rating by 200 thousand users,
this piece of software is among the most popular mental health care
applications.
Headspace is a mindfulness app that users may utilize in their daily lives.
Learn meditation and mindfulness methods from world-class experts and
build tools to help patients focus, breathe, stay calm, and create balance in
their lives, whether they require stress relief or sleep assistance.
Figure 2.11: Screenshots from Headspace
[Web 15]
The users will learn how to deal with tension and anxiety, as well as how
to calm their minds.
• Stress & anxiety meditation: Managing anxiety, letting go of stress
• Falling asleep & waking up meditation: Sleep, restlessness
• Work & productivity meditation: Finding focus, prioritization, pro-
ductivity, creativity and student’s meditations.
• Movement & sports meditation: Motivation, focus, training, competi-
tion, recovery
• Physical health mindfulness training: Mindful eating, pain manage-
ment, pregnancy, coping with cancer. [Web 14]
In brief, Headspace does not gather any data; it does not use chatbots or
surveys to collect user data.
2.4.4 Calm:
This app is a popular choice for meditation and sleep. With guided medita-
tions, Sleep Stories, breathing programs, masterclasses, and calming music,
millions of individuals enjoy reduced tension, anxiety, and more peaceful
sleep. Top psychiatrists, therapists, and mental health professionals have
endorsed the app, according to the creators.
Guided meditation sessions are available in lengths of 3, 5, 10, 15, 20 or
25 minutes, so users can choose the perfect length to fit their schedule.
Calming anxiety, managing stress, deep sleep, focus and concentration,
relationships, breaking habits, happiness, gratitude, self-esteem, body scan,
loving-kindness, forgiveness, non-judgment, commuting to work or school,
mindfulness at College, mindfulness at work, walking meditation, and calm
kids are just a few of the topics covered. [Web 16]
Figure 2.12: Mindful days follow-up schedule from Calm
[Web 17]
In sum, Calm concentrates on relaxing music, sleep stories and breathing
programs; the app does not use chatbots or gather any data to improve its
algorithms.
2.4.5 AntiStress, Relaxing, Anxiety Stress Relief
Game:
With more than 5 million installs and a 4.2/5 rating by 53 thousand users,
the app provides users with relaxation through satisfying games that are
designed around great concepts and full of relaxation toys, for creating fun
moments in a hectic routine. This relaxing 2021 game with color therapy is
for all ages. Users just need to download it and plunge into it for unlimited
fun and relaxation.
Figure 2.13: Many relaxing and coloring games to treat anxiety
[Web 19]
The app contains: realistic 3D brain exercise and relaxation, different
mind-freshness toys, high-quality relaxing sounds to release stress, a realistic
experience of releasing stress in minutes, smooth controls to play with the 3D
fidget toys, and different relaxation-toy missions. [Web 18]
In brief, this application does not gather any data or information from
users.
2.4.6 Shine: Calm Anxiety Stress:
This application has more than 100 thousand downloads and a 4.8/5 rating. It helps users learn a new self-care strategy every day, get support from a diverse community, and access an audio library of 800+ original meditations, bedtime stories, and calming sounds to help patients shift their mindset or mood, plus meditations specific to the mental health challenges faced by members of marginalized groups. [Web 20]
Topics include: Black well-being, calming anxiety, reducing stress, confidence, growth, improving sleep, focus, burnout, forgiveness, self-love, motivation, creativity, finding joy, managing work frustrations, strengthening relationships and creating healthy habits.
Figure 2.14: Screenshots from Shine app
[Web 21]
To summarize, Shine does not collect any information or data from its
users; it is a classic pre-programmed app. [Web 20]
2.5 Available anxiety elicitation-based datasets
Anxiety affects human capabilities and behavior as much as it affects productivity and quality of life, and it is considered a major cause of depression and suicide. Anxious states are detectable by specialists by virtue of their acquired cognition and skills, but there is a need for non-invasive, reliable techniques that perform the complex task of anxiety detection automatically. Several works [Garcı́a-Martı́nez et al., 2017], [Arsalan et al., 2019], and [Zhang et al., 2020] were proposed to recognize anxious states. There is no consensus about either the elicitation of anxious states or the labels, which makes existing works very different and difficult to compare.
• Recently, a new dataset known as "DASPS" for anxiety level recognition [Baghdadi et al., 2020], recorded with a low-cost portable EEG device (Emotiv EPOC) with 14 channels, was released. The EEG recordings were taken from 23 participants. DASPS is characterized by a therapeutic elicitation that triggers different levels of anxiety in participants through self-recall of stressful situations. To assign labels to two and four levels, the Hamilton score was taken from a questionnaire filled in before and after the experiment.
• In the same context, Arsalan et al. [Arsalan et al., 2019] carried out a psychological experiment on 28 participants, recording EEG signals with a low-cost portable device (Muse, a wearable brain-sensing headband that measures brain activity via 4 electroencephalography (EEG) sensors). Preparing an oral presentation is used as the stressful activity to trigger perceived mental stress. Three sessions were recorded: pre-activity, when participants are in a resting position; activity, when they prepare the presentation; and post-activity, the public oral presentation. Arsalan et al. showed that only the pre-activity EEG recordings are well correlated to two and three stress levels, respectively, so only pre-activity EEG signals are considered in the classification task.
• Anxiety disorder can also be recognized through the Healthy Brain Network (HBN) dataset [Alexander et al., 2017], launched by the Child Mind Institute, which includes data collected from children and adolescents (ages 5 to 17) in New York City. HBN was proposed to diagnose and intervene in the mental health of minors. The dataset also contains eye movements and large EEG recordings. Zhang et al. [Zhang et al., 2020] selected 92 subjects (45 children considered anxious and 47 considered normal) to conduct experiments according to the Screen for Child Anxiety Related Disorders (SCARED) scale. They extracted power spectral density (PSD) features from the Gamma band and transformed them using a newly proposed Group Sparse Canonical Correlation Analysis (GSCCA), achieving 82.70% accuracy with an SVM classifier.
2.5.1 Problem of imbalanced dataset
First, let us explain clearly what a balanced dataset is. Consider flower pictures as the positive class and tree pictures as the negative class: the dataset is balanced when the numbers of positive and negative samples are approximately the same, and imbalanced when there is a very large difference between them. Using an imbalanced dataset will bias the learning and finally produce wrong classification results. For this reason, we must generate a sufficient amount of new data for each category before beginning the training and testing steps, in order to obtain precise results with high accuracy.
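To make the idea concrete, here is a small sketch (our own toy example, not thesis code; the labels are hypothetical) that measures class imbalance and derives inverse-frequency class weights, a common way to quantify and partly compensate for imbalance:

```python
# Illustrative sketch: measuring class imbalance and deriving per-class
# weights. The "flower"/"tree" labels are hypothetical toy data.
from collections import Counter

labels = ["flower"] * 900 + ["tree"] * 100  # strongly imbalanced dataset

counts = Counter(labels)
total = sum(counts.values())

# Inverse-frequency weights: the rare class gets a larger weight,
# which can partly compensate for the imbalance during training.
weights = {c: total / (len(counts) * n) for c, n in counts.items()}

print(counts)
print(weights)  # the "tree" class gets a weight 9x larger than "flower"
```

In our work, however, we go further than re-weighting: we generate new samples with a CVAE so that every category has enough data.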
Figure 2.15: Gap between balanced and imbalanced datasets
2.6 Conclusion
The EEG-based emotion recognition task is crucial in human daily life, especially for maintaining mental health and catching warning signs at an early stage, before the situation worsens. In this chapter, we detailed several concepts related to the EEG-based emotion recognition task. The specificities of EEG signals, where the representation of the anxiety state is explained, were presented, followed by a brief review of the machine learning methods used for emotion recognition. We then mentioned the most famous existing applications for anxiety detection and treatment, and finally presented the available EEG datasets from several scientific experiments, such as the DASPS dataset.
In the next chapter, a data augmentation step is proposed in order to
provide sufficient data for the training of the neural network.
Chapter 3
EEG data augmentation using
CVAE
3.1 Introduction
Unsupervised learning models the underlying structure or distribution of the data in order to learn more about it. These methods are called unsupervised because, unlike in supervised learning, there are no correct answers and there is no teacher: algorithms are left to their own devices to discover and present the interesting structure in the data.
The Variational Autoencoder, as a generative model, relies on unsupervised learning to capture the structure of the input data, with the aim of generating new data similar to the original real data.
3.2 The autoencoder
Autoencoders (AE) are a family of neural networks for which the output data is the same as the input data. They work by compressing the input into a latent-space representation and then reconstructing the output from this representation. The general idea of autoencoders is simple: set up an encoder and a decoder as neural networks, and learn the best encoding-decoding scheme using an iterative optimization process. The search for the encoder and decoder that minimize the reconstruction error is done by gradient descent over the parameters of these networks. Figure 3.1 depicts the encoder and decoder of an autoencoder network.
depicts the encoder and decoder of an autoencoder network.
Notice that the gradient descent is an optimization algorithm for finding
a local minimum of a differentiable function. Gradient descent is used to
find the values of a function’s parameters (coefficients) that minimize a cost
function as far as possible.
Notice that dimensionality reduction refers to techniques that reduce the
number of input variables in a dataset.
The more complex the architecture is, the higher the dimensionality reduction the autoencoder can achieve while keeping the reconstruction loss low.
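The whole encode-decode-optimize loop can be sketched in a few lines with a linear, tied-weight autoencoder (a deliberately simplified stand-in for the neural networks above; the data and sizes are arbitrary assumptions):

```python
import numpy as np

# Linear autoencoder with tied weights: encoder W (4 -> 2), decoder W^T.
# Trained by gradient descent on the reconstruction error ||X W W^T - X||^2.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:, 3] = X[:, 0] + 0.05 * rng.normal(size=200)  # redundant 4th dimension

W = rng.normal(scale=0.1, size=(4, 2))  # randomly initialized encoder

def loss(W):
    return float(np.mean((X @ W @ W.T - X) ** 2))

initial = loss(W)
lr = 0.01
for _ in range(500):
    E = X @ W @ W.T - X                        # reconstruction error
    grad = 2 * (X.T @ E @ W + E.T @ X @ W) / len(X)
    W -= lr * grad                             # gradient descent step

print(initial, loss(W))  # reconstruction error drops during training
```

Because the data has a redundant dimension, a 2-D latent space can reconstruct it well; with unstructured data the same compression would lose much more information.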
Figure 3.1: The autoencoder architecture
[Web 22]
Moreover, most of the time, the final purpose of dimensionality reduction is not only to reduce the number of dimensions of the data, but to do so while keeping the major part of the data structure information in the reduced representations. For these two reasons, the dimension of the latent space and the "depth" of the autoencoder (which define the degree and the quality of compression) have to be carefully controlled and adjusted depending on the final purpose of the dimensionality reduction.
Figure 3.2: The latent space regularization problem
[Web 23]
The regularity of the latent space of an autoencoder is a difficult point that depends on the distribution of the data in the initial space, the dimension of the latent space and the architecture of the encoder. The high degree of freedom of the autoencoder, which makes it possible to encode and decode with no information loss (despite the low dimensionality of the latent space), leads to severe overfitting.
According to Figure 3.2, the problem of the regularity of the autoencoder's latent space is quite general and needs special attention. Indeed, the autoencoder is not trained to enforce any particular organisation of the latent space: it is solely trained to encode and decode with as little loss as possible, no matter how the latent space is organised.
3.3 Variational AutoEncoder (VAE)
Variational autoencoders (VAEs) are a deep learning technique for learning latent representations. They have also been used to draw images and to generate new data in semi-supervised learning. The VAE is a generative model that estimates the Probability Density Function (PDF) of the training data. The unique fundamental property that separates VAEs from standard autoencoders, and makes them so useful for generative modeling, is that their latent spaces are, by design, continuous, allowing easy random sampling and interpolation. In brief, a VAE has more parameters to tune, which gives significant control over how we want to model the latent distribution and therefore yields meaningful, high-quality outputs.
The VAE training objective is to maximize the likelihood of the training
data as described by equation 3.1, according to the model shown in Figure
3.3, where x is the input, z is the latent vector or the hidden representation
and θ represents the network parameters.
pθ(x) = ∫ pθ(z) pθ(x|z) dz        (3.1)
The choice of the output distribution is mainly Gaussian, i.e. p(x|z; θ) = N(x | f(z; θ), σ²·I), where f(z; θ) is modeled using a neural network and σ is a hyperparameter that multiplies the identity matrix I.
The formula for pθ(x) is intractable because it requires exponential time to compute, as it needs to be evaluated over all configurations of latent variables. To solve this problem, an additional encoder network qφ(z|x) is defined to approximate the true posterior pθ(z|x). The marginal likelihood of individual data points can then be rewritten as follows:
log pθ(x(i)) = DKL(qφ(z|x(i)) || pθ(z|x(i))) + L(θ, φ, x(i))        (3.2)
The first term of equation 3.2 is the KL (Kullback-Leibler) divergence between the approximate posterior and the true posterior; the KL divergence intuitively measures how similar two distributions are. The second term is the variational lower bound on the marginal likelihood of data point i. Since the Kullback-Leibler divergence is always greater than or equal to zero, minimizing it is equivalent to maximizing the variational lower bound.

Figure 3.3: The variational autoencoder model
[Web 23]

Equation 3.2 can therefore be rewritten as follows:
log pθ(x(i)) ≥ L(θ, φ, x(i))        (3.3)
L(θ, φ, x(i)) = Ez[log pθ(x(i)|z)] − DKL[qφ(z|x(i)) || pθ(z)]        (3.4)
The loss function for this network (equation 3.4) consists of two terms: the first penalizes the reconstruction error and the second encourages the learned distribution qφ(z|x(i)) to be similar to the true prior distribution pθ(z).
The VAE architecture is presented in Figure 3.4, where the encoder model learns a mapping from x to z and the decoder model learns a mapping from z back to x. The encoder outputs two vectors describing the mean µz|x and variance σz|x of the latent state distributions. The decoder generates a latent vector by sampling from these defined distributions and proceeds to develop a reconstruction of the original input. Using backpropagation to optimize the loss is not directly feasible because the sampling process is random. To solve this problem, a "reparameterization trick" is used, which consists of randomly sampling a noise term from a standard normal distribution, multiplying it by the standard deviation and adding the mean µz|x to the result, as described in Figure 3.5.
Figure 3.4: The optimisation of the variational autoencoder model
[Web 23]
Figure 3.5: The reparameterization trick
3.4 The gap between AE and VAE
An autoencoder accepts input, compresses it, and then recreates the original input. This is an unsupervised technique because all we need is the original data, without any labels of known, correct results. The two main uses of an autoencoder are to compress data to two (or three) dimensions so it can be graphed, and to compress and decompress images or documents, which removes noise in the data. A variational autoencoder assumes that the source data has some underlying probability distribution (such as Gaussian) and then attempts to find the parameters of that distribution. Implementing a variational autoencoder is much more challenging than implementing an autoencoder. The main use of a variational autoencoder is to generate new data related to the original source data; exactly what the additional data is good for is hard to say in general. A variational autoencoder is a generative system, and serves a similar purpose to a generative adversarial network (although GANs work quite differently) [Web 24]. To conclude, if we want precise control over our latent representations and what we would like them to represent, then the VAE is the right choice [Web 23].
Figure 3.6: Real difference between AE and VAEs on MNIST dataset
[Web 25]
3.5 Conditional Variational Autoencoder
The only problem with generating data using Variational Autoencoders is that we do not have any control over what sort of data is generated. To explain the principle: when we train a VAE on the EEG dataset and try to produce new signals by feeding z ∼ N(0, 1) into the decoder, it generates random outputs. If we train the decoder well, the signal quality improves, but we still have no control over which EEG signal it will produce; we cannot decide exactly what we want to get in the output.
Figure 3.7: The network of the CVAE. Enc and Dec denote the encoder and decoder; the remaining symbols denote the real sample, real label, generated sample, mean value, standard deviation and resampled noise.
For this, we should change our VAE architecture. Given an input y (the label of the EEG), we want our generative model to produce an output x (the EEG). Thus, the VAE process is modified as follows: given observation y, z is drawn from the prior distribution pθ(z|y), and the output x is produced from the distribution pθ(x|y, z). Note that, for the simple VAE, the prior is pθ(z) and the output is produced by pθ(x|z).
Therefore, the encoder part tries to learn qφ(z|x, y), which is equivalent to learning a hidden representation of the data x conditioned on y. The decoder part tries to learn pθ(x|z, y), which decodes the hidden representation back to input space, conditioned on y. The graphical model is shown below.
Figure 3.8: The architecture of a Conditional Variational Autoencoder
[Web 26]
In this method, we aim to generate data of a specific category. As shown in Figure 3.8, to control the generated category, an extra label y is added to both the encoder and the decoder. First, we feed each training data point and its corresponding label to the encoder; second, we concatenate the hidden representation with the corresponding label and feed it to the decoder to train the network. Finally, we can generate data with a specific label by feeding the decoder with noise sampled from the Gaussian distribution together with the assigned label.
3.6 Coding in Colab
Our code in Colab is divided into four parts:
1. Create a sampling layer
2. Define the standalone encoder model
3. Define the standalone decoder model
4. Define the CVAE as a Model with a custom training step. The generation process is performed with the sampling layer followed by the decoder; we generated different samples, with labels encoded on 4 bits.
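The conditioning mechanism behind the generation step can be sketched as follows (our own illustration, not the Colab code itself; the latent dimension and class index are assumptions):

```python
import numpy as np

def one_hot(label, num_classes=4):
    """Encode a class index on 4 bits (one-hot), as in our generation step."""
    v = np.zeros(num_classes)
    v[label] = 1.0
    return v

rng = np.random.default_rng(0)
z = rng.standard_normal(8)              # noise from the sampling layer
y = one_hot(2)                          # the anxiety level we want to generate
decoder_input = np.concatenate([z, y])  # condition the decoder on the label

print(decoder_input.shape)  # (12,): 8 latent values + 4 label bits
```

Feeding this concatenated vector to the trained decoder yields a sample of the requested category, which is exactly the control that the plain VAE lacks.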
3.7 Evaluation of the generation process
It is important to ensure that the generated samples are of high quality; in other words, that they are realistic and diverse. A lack of diversity among the generated samples is an indicator of mode collapse, meaning that the generator has collapsed into generating only limited modes of the real data. We therefore use several qualitative and quantitative metrics to evaluate the quality of the samples generated by the VAE in terms of diversity and similarity with the real samples.
Visualization:
Before we start, we have to define t-SNE: t-Distributed Stochastic Neighbor Embedding (t-SNE) is an unsupervised, non-linear technique primarily used for data exploration and for visualizing high-dimensional data. In simpler terms, t-SNE gives an intuition of how the data is arranged in a high-dimensional space.
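A minimal sketch of this projection using scikit-learn (the arrays below are random stand-ins for EEG feature vectors, not our real data):

```python
# Project real and generated samples to 2-D with t-SNE for visual comparison.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(50, 64))       # stand-in real EEG features
generated = rng.normal(0.1, 1.0, size=(50, 64))  # stand-in CVAE samples

X = np.vstack([real, generated])
emb = TSNE(n_components=2, perplexity=20, init="pca",
           random_state=0).fit_transform(X)

print(emb.shape)  # (100, 2): ready for a 2-D scatter plot, colored by origin
```

Plotting the two halves of `emb` in different colors then shows at a glance whether the generated distribution overlaps the real one.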
Figure 3.9: t-SNE representation of real and generated data
Now, we visually inspect the quality of the artificial samples by mapping the generated and real samples into two dimensions using t-SNE. Figure 3.9 displays a 2-D plot of the anxiety classes in the latent space. It can be seen that the t-SNE embeddings of the real and generated samples have similar distributions for each class. Besides comparing the generated samples with the training samples, it is interesting to compare them with the test samples: the similarity between the generated and the test samples explains the classification improvement brought by the augmentation. The training set includes the samples from all subjects excluding the target subject, the generated set includes the generated samples for the target subject, and the test set includes the second half of the target subject's samples, which were not seen during training. Thus, there is no overlap between the training, test, and generated sets. The results verified that the generated samples were indeed realistic and diverse.
Figure 3.10: Topographical map of real and generated data
To evaluate the quality of the generated data, we plotted the topographical maps of the generated data and the real data. Overall, the signals are similar.
3.8 Conclusion
In summary, lacking a sufficient amount of data for good machine learning prediction, we used Variational Autoencoders to generate new data. An autoencoder is a neural network that is trained to attempt to copy its input to its output. The problem with plain Variational Autoencoders is that we have no control over what sort of data they generate, so we decided to use Conditional VAEs as a solution to improve the outputs.
Chapter 4
Emotion recognition using
recurrent CNN
4.1 Introduction
The problem of classifying multi-channel Electroencephalogram (EEG)
time series consists in assigning their representation to one of a fixed
number of classes. This is a fundamental task in many health-
care applications, including anxiety detection ([Baghdadi et al., 2017],
[Fourati et al., 2020] and [Baghdadi and Aribi, 2019]), epileptic seizures pre-
diction [Tsiouris et al., 2018] and also affective computing applications such
as EEG-based emotion recognition ([Fourati et al., 2020]). The problem
has been tackled by a wealth of different approaches, spanning from
the signal decomposition techniques of EEG signals to the feature ex-
traction and feature selection algorithms as highlighted in the surveys
[Baghdadi et al., 2016], [Movahedi et al., 2017], [Mahmud et al., 2018], and
[Garcı́a-Martı́nez et al., 2019].
Representation learning, or feature learning [Bengio et al., 2013], consists of automatically discovering the relevant representations for a classification or detection task directly from raw data. Consequently, laborious handcrafted features are no longer needed, since representation learning permits both learning the features and using them to perform a specific task.
In this chapter, we focus on EEG representation learning for anxiety
states recognition using recurrent convolutional neural network in a subject-
independent context.
4.2 Convolutional Neural Network
4.2.1 CNN principle
The Convolutional Neural Network (CNN) is a deep neural network originally designed for image analysis. Recently, it was discovered that CNNs also have an excellent capacity for sequence data analysis, such as natural language processing. A CNN always contains two basic operations, namely convolution and pooling. The convolution operation, using multiple filters, is able to extract features (e.g. edges) from the dataset while preserving their spatial information. The pooling operation, also called sub-sampling, is used to reduce the dimensionality of the feature maps produced by the convolution operation. Max pooling and average pooling are the most common pooling operations used in CNNs. Due to the complexity of CNNs, ReLU is the common choice of activation function for transferring the gradient during training by backpropagation [Jitendra Verma, 2020].
4.2.2 CNN applications
Convolutional neural networks (CNNs) are most often utilized for classification and computer vision tasks, such as pedestrian and object detection for self-driving cars, face recognition on social media or for securing mobile phones, image analysis in healthcare (detecting tumours and diseases), quality inspection in manufacturing, airport security, improving search engine results, recommender systems (like those of YouTube, Amazon and Facebook), emotion recognition, and stock and currency price prediction.
Figure 4.1: Charles Camiel looks into the camera for a facial recognition test
at Logan International Airport in Boston
[Web 28]
4.2.3 CNN Architecture
In general, a CNN architecture consists of four kinds of layers: convolutional layers, pooling layers, dense layers and an output layer.
• Convolutional layer: The convolutional layer is the backbone of any CNN model. This is the layer where the pixel-by-pixel scanning of the images takes place, creating a feature map used for later classification.
• Pooling layer: Pooling, also known as down-sampling, reduces the overall dimensions of the images. The information of each feature from each convolutional layer is reduced to only the most necessary data. The process of creating convolutional layers and applying pooling may be repeated several times.
• Fully connected input layer: This is also known as the flattening of the images. The outputs of the last layer are flattened into a single vector so that they can be used as the input data for the following layer.
• Fully connected layer: After the feature analysis is done and it is time for computation, this layer assigns weights to the inputs and predicts a suitable label.
• Fully connected output layer: This is the final layer of the CNN model; it contains the results of the labels determined for the classification and assigns a class to the images.
Figure 4.2: CNN architecture
[Web 29]
4.2.4 Difference between Conv1D and Conv2D
We want to highlight the difference between Conv1D and Conv2D because our code relies on Conv2D.
• For Conv1D, the kernel moves in only one direction. The input and output data of Conv1D are 2-dimensional (time and a variable). It is mainly used for time-series data like audio, text, acceleration, etc.
• For Conv2D, the kernel slides along the data in 2 dimensions, i.e. it moves in 2 directions (height and width). The input and output data of Conv2D are 3-dimensional (height, width and depth). It is mainly used for image data; the kernel matrix can extract spatial features from the data, detecting edges, color distribution, etc.
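The 2-D sliding can be made explicit with a naive convolution written from scratch (our own illustration, not thesis code; the image and kernel values are arbitrary):

```python
import numpy as np

def conv2d(img, kernel):
    """Naive 2-D convolution: the kernel slides along height and width."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):        # slide down (height direction)
        for j in range(out.shape[1]):    # slide right (width direction)
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

img = np.arange(25, dtype=float).reshape(5, 5)
edge = np.array([[1.0, -1.0]])  # a tiny horizontal edge-detector kernel
print(conv2d(img, edge).shape)  # (5, 4): one response per kernel position
```

A Conv1D, by contrast, would only slide the kernel along a single axis of the input.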
Figure 4.3: Conv2D kernel sliding
[Web 30]
4.3 Recurrent Neural Network
4.3.1 Definition
A recurrent neural network (RNN) is a type of artificial neural network that uses sequential data or time-series data. Like feedforward networks and CNNs, RNNs utilize training data to learn. They are distinguished by their "memory", as they take information from prior inputs to influence the current input and output. While traditional deep neural networks assume that inputs and outputs are independent of each other, the outputs of recurrent neural networks depend on the prior elements within the sequence. While future events would also be helpful in determining the output of a given sequence, unidirectional recurrent neural networks cannot account for these events in their predictions.
4.3.2 RNN Applications
RNNs are commonly used for ordinal or temporal problems, such as language translation, natural language processing (NLP), speech recognition, and image captioning; they are incorporated into popular applications such as Siri, voice search, and Google Translate.
Figure 4.4: Gap between RNNs (L) and Feedforward Neural Networks (R)
[Web 31]
4.3.3 Long Short Term Memory layer LSTM
The problem with RNNs is that, as time passes and they are fed more and more new data, they start to "forget" the previous data they have seen, in what is called the vanishing gradient problem; we therefore need some sort of long-term memory, which is just what LSTMs provide. The core concepts of the LSTM are the cell state and its various gates. The LSTM is a type of cell in a recurrent neural network used to process sequences of data in applications such as handwriting recognition, machine translation, and image captioning.
LSTMs address the vanishing gradient problem that occurs when training RNNs on long data sequences by maintaining history in an internal memory state, based on new input and context from previous cells in the RNN. The cell state, as illustrated in Figure 4.5, acts as a transport highway that transfers relative information all the way down the sequence chain; it can be seen as the "memory" of the network. The cell state can, in theory, carry relevant information throughout the processing of the sequence, so even information from earlier time steps can make its way to later time steps, reducing the effects of short-term memory. As the cell state goes on its journey, information gets added to or removed from it via gates. The gates are small neural networks that decide which information is allowed on the cell state; they learn which information is relevant to keep or forget during training.
Figure 4.5: Inside the LSTM cell
[Web 32]
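The gate equations inside the cell can be sketched in plain NumPy (our own didactic illustration with arbitrary small sizes, not the layer used in our model):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step: the gates decide what to forget, add and output."""
    n = h.shape[0]
    zs = W @ x + U @ h + b           # all four gate pre-activations at once
    f = sigmoid(zs[0:n])             # forget gate
    i = sigmoid(zs[n:2 * n])         # input gate
    g = np.tanh(zs[2 * n:3 * n])     # candidate cell values
    o = sigmoid(zs[3 * n:4 * n])     # output gate
    c = f * c + i * g                # update the cell-state "highway"
    h = o * np.tanh(c)               # new hidden state
    return h, c

rng = np.random.default_rng(1)
nx, nh = 3, 4                        # toy input and hidden sizes
W = rng.normal(scale=0.1, size=(4 * nh, nx))
U = rng.normal(scale=0.1, size=(4 * nh, nh))
b = np.zeros(4 * nh)

h = c = np.zeros(nh)
for x in rng.normal(size=(5, nx)):   # process a sequence of 5 time steps
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)  # (4,): the final hidden state
```

Note how the cell state c is only ever scaled and added to, which is what lets gradients survive over long sequences.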
4.4 The proposed architecture for anxiety states recognition
In this section, we detail the components of Figure 4.6. The preprocessed EEG signals are fed directly to the CNN-LSTM model. A convolutional block is composed of a convolutional layer followed by a pooling layer; this block is responsible for the spatial encoding of the EEG time series. Since there is a relation between channels, we did not perform a 1D convolution, as the kernel would then be convolved with each channel separately. Instead, a 2D convolution is performed, which allows operating on the channels together. The encoded CNN features are then fed to an LSTM layer for temporal parsing: each LSTM cell encodes the CNN features along the time axis and forwards them to the next cell. The LSTM produces the last output activation, which is then classified with a softmax layer. Our architecture is thus a spatio-temporal processing of EEG time series.
Figure 4.6: CNN-LSTM architecture for anxiety states recognition
4.5 Anxiety states recognition results
4.5.1 Experimental setup
We chose Colab because it allows anybody to write and execute arbitrary Python code through the browser, and it is especially well suited to machine learning and data analysis. The main advantages of using this environment are free access to GPUs, zero configuration, and easy access and sharing with other users.
In our model, four convolutional blocks are followed by a recurrent block, as follows:
• Input layer, Conv2D layer, batch normalization, leaky ReLU activation function, 2D max pooling layer and dropout.
• Conv2D layer, batch normalization, leaky ReLU activation function, 2D max pooling layer.
• Conv2D layer, batch normalization, leaky ReLU activation function, 2D max pooling layer.
• Conv2D layer, batch normalization, leaky ReLU activation function, 2D max pooling layer.
• Flattening layer, reshape, LSTM, dropout, first dense layer and second dense layer (output).

Figure 4.7: CNN-LSTM architecture description
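The layer stack above can be sketched in Keras as follows. This is a hedged sketch: the input shape (14 channels × 256 time samples), filter counts, pooling sizes and dropout rates are illustrative assumptions, not the exact thesis hyperparameters.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_lstm(n_channels=14, n_samples=256, n_classes=4):
    m = models.Sequential()
    m.add(tf.keras.Input(shape=(n_channels, n_samples, 1)))
    for k, filters in enumerate([32, 32, 64, 64]):   # four conv blocks
        m.add(layers.Conv2D(filters, (3, 3), padding="same"))
        m.add(layers.BatchNormalization())
        m.add(layers.LeakyReLU())
        m.add(layers.MaxPooling2D(pool_size=(1, 2)))  # pool along time only
        if k == 0:
            m.add(layers.Dropout(0.3))
    # Reshape to (time steps, flattened spatial features) for the LSTM
    m.add(layers.Reshape((n_samples // 16, -1)))
    m.add(layers.LSTM(64))
    m.add(layers.Dropout(0.3))
    m.add(layers.Dense(32, activation="relu"))
    m.add(layers.Dense(n_classes, activation="softmax"))
    return m

model = build_cnn_lstm()
print(model.output_shape)  # (None, 4): softmax over the 4 anxiety states
```

Pooling only along the time axis keeps all 14 channels available to every convolution, consistent with the choice of a 2D convolution over the channels.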
4.5.2 Classification results without data augmentation
To prove the usefulness of the data augmentation approach, we first classify only the original EEG signals.
The training and validation loss curves are depicted in Figure 4.8.
• The red curve refers to the training loss, which is the error on the training dataset.
• The blue curve refers to the validation loss, which is the error after running the validation data through the trained network.
Figure 4.8: Training and validation loss
• If the validation loss is much higher than the training loss, we call it overfitting.
• If the validation loss is much lower than the training loss, we call it underfitting.
While there are some fluctuations in the validation curve, the training step
ends with a small gap between training loss and validation loss.
Figure 4.9 presents the training and validation accuracy curve. The train-
ing set is used to train the model, while the validation set is only used to
evaluate the model’s performance.
Training accuracy is the accuracy we get if we apply the model to the training data, while validation or testing accuracy is the accuracy on unseen data. The validation accuracy is lower than the training accuracy because the EEG training data is already familiar to the model, whereas the validation data is a collection of new data points. The fluctuations in the validation loss curve are also present in the validation accuracy curve, but the training and validation accuracy curves nevertheless remain close to each other.
In the field of machine learning and specifically the problem of statistical
classification, a confusion matrix, also known as an error matrix, is a specific
table layout that allows visualization of the performance of an algorithm.
Each row of the matrix represents the instances in an actual class while
each column represents the instances in a predicted class, or vice versa;
both variants are found in the literature.

Figure 4.9: Training and validation accuracy of CNN-LSTM on DASPS dataset

The name stems from the fact
that it makes it easy to see whether the system is confusing two classes (i.e.
commonly mislabeling one as another). It is a special kind of contingency
table, with two dimensions ("actual" and "predicted") and identical sets of
"classes" in both dimensions (each combination of dimension and class is a
variable in the contingency table).
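The table described above is straightforward to build by hand; a minimal sketch, using the four anxiety classes of this work as an example:

```python
def confusion_matrix(actual, predicted, classes):
    """Build a confusion matrix as nested lists.

    Rows are actual classes, columns are predicted classes,
    following the row/column convention described above.
    """
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix
```

Here `matrix[i][j]` counts the trials of actual class `i` that the model labelled as class `j`, so off-diagonal cells directly expose which classes the system confuses.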
Figure 4.10: Confusion matrix of CNN-LSTM on DASPS dataset
When the test data is shown to the trained model, the CNN-LSTM
achieves 89.96% accuracy. According to Figure 4.10, the model achieves its
highest accuracy on the normal anxiety state, while the lowest accuracy is
obtained on the light anxiety state. There is no confusion between severe
and light trials; the highest confusion occurs between moderate and normal
trials.
4.5.3 Classification results with data augmentation
To begin this part, we should present DASPS (A Database for Anxious States
based on a Psychological Stimulation). DASPS is a database of EEG signals
for detecting anxiety levels. The electroencephalogram (EEG) signals of 23
subjects were captured during fear elicitation using face-to-face psychological
stimuli. This database is innovative not only in making EEG data available
to the affective computing community, but also in the design of a psychological
stimulation protocol that provides comfortable conditions for participants in
direct interaction with the therapist, as well as in the use of a wireless EEG
cap with few channels, namely only 14 dry electrodes. The raw EEG data
obtained from the 23 individuals is stored in .edf files; the database also
contains the same data in preprocessed form in .mat format.
The researchers provided a MATLAB script for segmenting each EEG signal
into six segments, one for each of the six scenarios.
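The effect of that MATLAB script can be sketched in Python; the (channels, samples) layout and the equal-length segments are assumptions about the preprocessed data, not a specification from the DASPS authors:

```python
import numpy as np


def segment_trial(eeg, n_segments=6):
    """Split one recording (channels x samples) into n equal segments,
    one per stimulation situation; trailing samples are dropped."""
    seg_len = eeg.shape[1] // n_segments
    return [eeg[:, i * seg_len:(i + 1) * seg_len] for i in range(n_segments)]
```

For a 14-channel recording, each of the six returned arrays keeps all 14 rows and an equal share of the time samples.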
Figure 4.11: Uploading data to GUI by therapist
4.6 Graphical user interface
A GUI (graphical user interface) is a system of interactive visual components
for computer software. A GUI displays objects that convey information, and
represent actions that can be taken by the user. The objects change color,
size, or visibility when the user interacts with them.
Figure 4.12: GUI showing the EEG brain map in side a and the emotion
prediction in side b
In Figure 4.11, the therapist can upload the recorded EEG signals (a
pre-processed trial saved as a .mat file).
Then, the system loads the trial and plots the topographical map, which
helps visualize the activated brain regions, as illustrated in side a of
Figure 4.12.
The last interface classifies the anxiety state of the subject. The trained
CNN-LSTM model is saved and then called for every new trial to classify.
The anxiety state is depicted in side b of Figure 4.12.
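The classification step behind side b can be sketched as follows; the model object and the format of its output are assumptions (a saved Keras-style model returning one probability per class), not the exact code of this work:

```python
ANXIETY_STATES = ["normal", "light", "moderate", "severe"]


def predict_state(model, trial):
    """Return the anxiety label for one pre-processed trial.

    model.predict is assumed to return one probability per class,
    in the order of ANXIETY_STATES.
    """
    probs = model.predict(trial)
    best = max(range(len(ANXIETY_STATES)), key=lambda i: probs[i])
    return ANXIETY_STATES[best]
```

The GUI would call such a function on every uploaded trial and display the returned label.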
4.7 The general methodology of our work
In the chart below, we summarized our framework in 3 basic steps:
1. The first step is gathering EEG signals and collecting the original
data set; however, the amount of available data is very small, which is
why we proceed to the second step: generating new data.
2. The second step is about generating new data using the Conditional
Variational Autoencoder model.
3. The final step is about predicting the emotions using convolutional
neural networks along with Long-Short Term Memory cells.
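Once the cVAE of step 2 is trained, data augmentation amounts to sampling the latent space and decoding conditioned on a class label; a minimal sketch, where `decoder` and `latent_dim` are hypothetical stand-ins for the trained model described in Chapter 3:

```python
import numpy as np


def sample_augmented(decoder, label, n_samples, latent_dim=32, seed=0):
    """Draw z ~ N(0, I) and decode each z conditioned on the given
    anxiety label, yielding synthetic EEG trials."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_samples, latent_dim))
    return np.stack([decoder(z[i], label) for i in range(n_samples)])
```

The generated trials are then mixed with the original DASPS trials before training the CNN-LSTM of step 3.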
Figure 4.13: Basic steps of our methodology
4.8 Project schedule
A Gantt chart is a type of bar chart that illustrates a project schedule,
named after its inventor, Henry Gantt (1861–1919), who designed such a
chart around the years 1910–1915. Modern Gantt charts also show the de-
pendency relationships between activities and the current schedule status.
Figure 4.14: GANTT chart
In our Gantt chart (see Figure 4.14), we laid out 7 basic tasks:
• Firstly, planning the outline of the whole internship with the professors.
• Secondly, starting to write chapter 1: Introduction.
• Thirdly, in parallel with chapter 1, starting the practical part of our
machine learning project (more details are given in the architecture of
the project).
• Fourthly, starting to write chapter 2 at the beginning of April: back-
ground and literature review of EEG-based emotion recognition.
• Fifthly, starting to write chapter 3, covering in depth the internal
architectures used: VAEs for generating new data and LSTM for
classifying the emotions.
• Sixthly, at the end of April, creating the graphical user interface using
Tkinter.
• Seventhly and last, starting to write chapter 4, covering the results
and future optimizations for this project.
4.9 Conclusion
In conclusion, in this chapter we have discussed convolutional neural
networks (principle, applications and architecture). Then, we discussed
recurrent neural networks as a primary solution for predicting emotions.
At the next stage, we decided to switch to the LSTM technique, which is
better suited to our continuous signals (EEG, voice and text translation).
Finally, we analysed our code on COLAB, interpreted the results and
presented our graphical user interface for therapists.
Chapter 5
Conclusion and Future Work
This work, "Emotions prediction for augmented EEG signals using VAE and
Convolutional Neural Networks CNN combined with LSTM", is based on two
main ideas: generating new data, and classifying emotions using both the
original and the newly produced data. For the first idea, we used conditional
variational autoencoders to generate new high-quality EEG data; for the
second, we used convolutional neural networks combined with the LSTM
technique to classify the level of anxiety (normal, light, moderate or severe).
The code is written in Python on Google COLAB. Finally, we wrapped this
code in a graphical user interface so that it can be used by regular users and
therapists.
This work has enabled us to know the patient's feelings without relying
on many traditional questionnaires, surveys and tests, taking into account
the psychological and emotional state of the person, especially those who
suffer from psychological trauma, depression and negative feelings. It helps
both the patient and the doctor by providing high accuracy in emotion
recognition and fast diagnosis. It speeds up the treatment process and
spares the patient embarrassment, taking into account the psychological
state of children, disabled people, and deaf and mute people, who find it
difficult to express their emotions after a negative event in their lives, such
as the death of a loved one, divorce, school failure or family problems. It
also helps doctors in refugee camps in war zones to diagnose psychological
conditions as quickly as possible, especially for those who suffer severe
trauma after bloody events. This work is an important step towards
improving mental health care on two levels: diagnosis and treatment. It
saves time and makes it easier to understand the situations of millions of
patients around the world with a minimum of documents and questionnaires.
This work can be improved and developed in coordination with official
authorities: cooperating with medical universities, doctors' clinics and the
ministry of health to enrich the database, and refining other points by
interacting with psychiatrists, therapists, sociologists, pediatricians, the
ministry of women and children, and other organizations.
Following the machine learning life cycle, this work can be improved in
future projects using large amounts of data from many hospitals. We can
also use generative adversarial networks (GANs) to generate more high-
quality EEG signals, which will improve the prediction results on new
inputs from new patients. In such an application, improving our algorithms
is a necessary step towards better prediction results.
Bibliography
[Alexander et al., 2017] Alexander, L. M., Escalera, J., Ai, L., Andreotti,
C., Febre, K., Mangone, A., Vega-Potler, N., Langer, N., Alexander, A.,
Kovacs, M., et al. (2017). An open resource for transdiagnostic research in
pediatric mental health and learning disorders. Scientific data, 4:170181.
[Arsalan et al., 2019] Arsalan, A., Majid, M., Butt, A. R., and Anwar, S. M.
(2019). Classification of perceived mental stress using a commercially avail-
able eeg headband. IEEE journal of biomedical and health informatics,
23(6):2257–2264.
[Baghdadi and Aribi, 2019] Baghdadi, A. and Aribi, Y. (2019). Effectiveness
of dominance for anxiety vs anger detection. In 2019 Fifth International
Conference on Advances in Biomedical Engineering (ICABME), pages 1–4.
IEEE.
[Baghdadi et al., 2016] Baghdadi, A., Aribi, Y., and Alimi, A. M. (2016).
A survey of methods and performances for eeg-based emotion recognition.
In International Conference on Hybrid Intelligent Systems, pages 164–174.
Springer.
[Baghdadi et al., 2017] Baghdadi, A., Aribi, Y., and Alimi, A. M. (2017).
Efficient human stress detection system based on frontal alpha asymmetry.
In International Conference on Neural Information Processing, pages 858–
867. Springer.
[Baghdadi et al., 2020] Baghdadi, A., Aribi, Y., Fourati, R., Halouani, N.,
Siarry, P., and Alimi, A. (2020). Psychological stimulation for anxious
states detection based on eeg-related features. Journal of Ambient Intelli-
gence and Humanized Computing, pages 1–15.
[Bengio et al., 2013] Bengio, Y., Courville, A., and Vincent, P. (2013). Rep-
resentation learning: A review and new perspectives. IEEE Transactions
on Pattern Analysis and Machine Intelligence, 35(8):1798–1828.
[Cong et al., 2015] Cong, F., Lin, Q.-H., Kuang, L.-D., Gong, X.-F.,
Astikainen, P., and Ristaniemi, T. (2015). Tensor decomposition of eeg
signals: a brief review. Journal of neuroscience methods, 248:59–69.
[Fourati et al., 2020] Fourati, R., Ammar, B., Sanchez-Medina, J., and Al-
imi, A. M. (2020). Unsupervised learning in reservoir computing for eeg-
based emotion recognition. IEEE Transactions on Affective Computing,
to be published. doi:10.1109/TAFFC.2020.2982143.
[Freeman and Quiroga, 2012] Freeman, W. and Quiroga, R. Q. (2012). Imag-
ing brain function with EEG: advanced temporal and spatial analysis of
electroencephalographic signals. Springer Science & Business Media.
[Garcı́a-Martı́nez et al., 2019] Garcı́a-Martı́nez, B., Martinez-Rodrigo, A.,
Alcaraz, R., and Fernández-Caballero, A. (2019). A review on nonlin-
ear methods using electroencephalographic recordings for emotion recog-
nition. IEEE Transactions on Affective Computing, to be published.
doi:10.1109/TAFFC.2018.2890636.
[Garcı́a-Martı́nez et al., 2017] Garcı́a-Martı́nez, B., Martı́nez-Rodrigo, A.,
Zangróniz, R., Pastor, J. M., and Alcaraz, R. (2017). Symbolic analy-
sis of brain dynamics detects negative stress. Entropy, 19(5):196.
[Jitendra Verma, 2020] Jitendra Verma, Sudip Paul, P. J. (2020). Computa-
tional Intelligence and Its Applications in Healthcare. Elsevier.
[Mahmud et al., 2018] Mahmud, M., Kaiser, M. S., Hussain, A., and Vas-
sanelli, S. (2018). Applications of deep learning and reinforcement learning
to biological data. IEEE Transactions on Neural Networks and Learning
Systems, 29(6):2063–2079.
[Movahedi et al., 2017] Movahedi, F., Coyle, J. L., and Sejdić, E. (2017).
Deep belief networks for electroencephalography: A review of recent con-
tributions and future outlooks. IEEE Journal of Biomedical and Health
Informatics, 22(3):642–652.
[NeuroSky, 2009] NeuroSky, I. (2009). Brain wave signal (eeg) of neurosky,
inc. Last accessed June 30, 2020.
[Niedermeyer and da Silva, 2005] Niedermeyer, E. and da Silva, F. L. (2005).
Electroencephalography: basic principles, clinical applications, and related
fields. Lippincott Williams & Wilkins.
[Picard, 1997] Picard, R. (1997). Affective Computing. Inteligencia artificial.
The MIT Press.
[Randhavane, 2020] Randhavane, T. (2020). Identifying emotions from walk-
ing using affective and deep features. ARXIV, pages 1–15.
[ScienceDaily, 2018] ScienceDaily (2018). University of toronto, mind-
reading algorithm uses eeg data to reconstruct images based on what we
perceive: New technique using eeg shows how our brains perceive faces.
Last accessed June 30, 2020.
[Steriade, 2005] Steriade, M. (2005). Cellular substrates of brain rhythms.
Electroencephalography: Basic principles, clinical applications, and related
fields, 5:31–83.
[Tsiouris et al., 2018] Tsiouris, K. M., Pezoulas, V. C., Zervakis, M., Konit-
siotis, S., Koutsouris, D. D., and Fotiadis, D. I. (2018). A long short-term
memory deep learning network for the prediction of epileptic seizures using
eeg signals. Computers in Biology and Medicine, 99:24–37.
[Zhang et al., 2020] Zhang, X., Pan, J., Shen, J., Din, Z. U., Li, J., Lu,
D., Wu, M., and Hu, B. (2020). Fusing of electroencephalogram and
eye movement with group sparse canonical correlation analysis for anxiety
detection. IEEE Transactions on Affective Computing, to be published.
doi:10.1109/TAFFC.2020.2981440.