Course
“Human Computer Interaction”
Project Type
“Research Paper”
Project Title
“HCI with the Help of Emotional Intelligence”
Table of Contents
1. Introduction
 Abstract
 Concept
2. Body
 Description
 Present Working Status
3. Conclusion
 Future Expectation
 References
1. Introduction
 Abstract
Emotion identification is beginning to be considered an essential feature in human-computer interaction. However, most studies have focused mainly on facial expression classification and speech recognition, and until recently little attention was paid to physiological pattern recognition. The traditional view of emotions as contributing to irrational and unpredictable behavior has changed. Recent research suggests that emotions play an essential role in important areas such as learning, memory, motivation, attention, creativity, and decision making. These findings have prompted many research groups around the world to start examining the role of emotions and emotional intelligence in human-computer interaction (HCI). This paper surveys what is done today and brings out the most important features to consider for emotion-sensitive HCI.
 Concept
To have an intelligent HCI system that responds appropriately to the user's affective feedback, the first step is that the system must be able to detect and interpret the user's emotional states automatically. The visual channel (e.g., facial expressions) and the auditory channel (e.g., vocal reactions) are the most important features in the human recognition of affective feedback, but other elements need to be considered as well, for example body movements and physiological reactions. When a person judges the emotional state of someone else, he or she relies mainly on facial and vocal expressions. However, some emotions are harder to differentiate than others and require other types of signals, such as gestures, posture, or physiological signals. A great deal of research has been done in the field of face and gesture recognition, especially on recognizing facial expressions. Facial expressions can be communicative signals or can be considered expressions of emotion, and they can be associated with basic emotions like happiness, surprise, fear, anger, disgust, or sadness. Another tool for detecting emotions is emotional speech recognition: in the voice, several factors can vary with emotion, such as pitch, loudness, voice quality, and rhythm. In human-computer interaction, we can also detect emotions by monitoring the nervous system, because for some feelings the physiological signs are very marked. We can measure blood pressure, skin conductivity, breathing rate, or finger temperature; a change in these physiological signals indicates a change in the user's state. Unfortunately, physiological signals play a secondary role in human recognition of affective states; these signals are neglected because, to detect someone's clamminess or heart rate, we would need to be in physical contact with the person. Analysis of the tactile channel is harder because the person must be wired to collect data, which is usually perceived as uncomfortable and unpleasant. Several techniques are available to capture these physiological signals: an electromyogram (EMG) for evaluating and recording the electrical activity produced by muscles, an electrocardiogram for measuring the activity of the heart with electrodes attached to the skin, or skin conductance sensors. All these data collection methods are useful, but they are not always easy to use. Firstly, the available technologies are restrictive, and some parameters must absolutely be controlled to obtain valid data; moreover, we need to remove noise from the collected data, such as background interference, backlit images, or physical characteristics like beards, glasses, or hats. Secondly, to collect physiological signals we must use intrusive methods such as electrodes, a chest strap, or a wearable computer.
2. Body
 Description
The human voice and associated speech patterns can be characterized by a number of attributes, the primary ones being pitch, loudness or sound pressure, timbre, and tone.
Pitch is an auditory sensation in which a listener assigns musical tones to relative positions on a musical scale based primarily on their perception of the frequency of vibration. Pitch can be quantified as a frequency, but it is based on the subjective perception of a sound wave. Sound oscillations can be measured to obtain a frequency in hertz, or cycles per second. Pitch is independent of the intensity or amplitude of the sound wave: a high-pitched sound indicates rapid oscillations, whereas a low-pitched sound corresponds to slower oscillations. The pitch of complex sounds such as speech and musical notes corresponds to the repetition rate of periodic or nearly periodic sounds, or the reciprocal of the time interval between similar repeating events in the sound waveform.
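As a minimal sketch of this idea, the snippet below estimates the repetition rate of a single frame by picking the strongest autocorrelation peak inside a plausible pitch range. The frame length, sampling rate, and search band are illustrative assumptions, not values from this paper.

```python
import numpy as np

def estimate_pitch(frame, sr, fmin=50.0, fmax=400.0):
    """Estimate F0 of one speech frame via the autocorrelation method."""
    frame = frame - frame.mean()                  # remove DC offset
    ac = np.correlate(frame, frame, mode="full")  # full autocorrelation
    ac = ac[len(ac) // 2:]                        # keep non-negative lags
    lo = int(sr / fmax)                           # shortest plausible period
    hi = int(sr / fmin)                           # longest plausible period
    lag = lo + np.argmax(ac[lo:hi])               # lag of the strongest repeat
    return sr / lag                               # period -> frequency in Hz

# 220 Hz test tone: the estimate should land near 220.
sr = 16000
t = np.arange(int(0.03 * sr)) / sr
print(estimate_pitch(np.sin(2 * np.pi * 220 * t), sr))
```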
Loudness is a subjective perception of sound pressure and can be defined as the attribute of auditory sensation in terms of which sounds can be ordered on a scale ranging from quiet to loud. Sound pressure is the local pressure deviation from the ambient, average, or equilibrium atmospheric pressure caused by a sound wave. Sound pressure level (SPL) is a logarithmic measure of the effective pressure of a sound relative to a reference value and is often measured in decibels (dB). The lower limit of audibility is defined as an SPL of 0 dB, but the upper limit is not as clearly defined.
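Since SPL is just a log-ratio against the 20 µPa reference, it is simple to compute once a calibrated pressure signal is available. The sketch below assumes the samples are already in pascals; a plain microphone capture is not calibrated, so real use needs a calibration factor.

```python
import numpy as np

P_REF = 20e-6  # reference pressure: 20 micropascals, the 0 dB SPL point

def spl_db(pressure):
    """Sound pressure level of a pressure waveform, in dB re 20 uPa."""
    p_rms = np.sqrt(np.mean(np.square(pressure)))  # effective (RMS) pressure
    return 20.0 * np.log10(p_rms / P_REF)

# A sine with 1 Pa RMS amplitude corresponds to roughly 94 dB SPL.
t = np.linspace(0, 1, 48000, endpoint=False)
print(spl_db(np.sqrt(2) * np.sin(2 * np.pi * 1000 * t)))  # ~94.0
```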
Timbre is the perceived sound quality of a musical note, sound, or tone. Timbre distinguishes different types of sound production and enables listeners to tell apart different instruments in the same category. The physical characteristics of sound that determine the perception of timbre include spectrum and envelope. In simple terms, timbre is what makes a particular sound be perceived as different from another sound, even when the two have the same pitch and loudness.
Tone is the use of pitch in language to distinguish lexical or grammatical meaning, that is, to distinguish or to inflect words. All verbal languages use pitch to express emotional and other paralinguistic information and to convey emphasis, contrast, and other such features.
In order to achieve this objective, three test cases have been examined, corresponding to three emotional states: normal, angry, and panicked. For carrying out the analysis, four vocal parameters have been taken into consideration: pitch, SPL, timbre, and the time gaps between consecutive words of speech. The proposed analysis was carried out with the help of software packages such as MATLAB and WavePad.
Case 1: Normal emotional state
This test case involves statistics for pitch, SPL, timbre, and word-timing gaps derived from speech samples spoken while the speaker was in a relaxed, normal emotional state. It serves as the baseline for the remaining two test cases. All the parameter statistics indicate mean values derived from the speech samples. As shown in Table I, statistics for two speech samples have been analyzed for the purpose of demonstration.
Case 2: Angry emotional state
This test case involves statistics for pitch, SPL, timbre, and word-timing gaps derived from speech samples spoken while the speaker was in an agitated emotional state, typically characterized by increased vocal loudness and pitch. All the parameter statistics indicate mean values derived from the speech samples, as shown in Table II. The same speech samples used in Case 1 have been reused in Case 2, but with a different intonation typical of an agitated or angry emotional state.
Case 3: Panicked emotional state
This test case involves statistics for pitch, SPL, timbre, and word-timing gaps derived from speech samples spoken while the speaker was in a panicked or overwhelmed emotional state. The speech samples used in Case 1 have been reused in Case 3, but with a different intonation typical of a panicked emotional state, as shown in Table III.
Working Prototype:
The general architecture of a speech emotion recognition (SER) system has three steps, shown in Fig. 1 [6]:
i. a speech processing system extracts appropriate quantities from the signal, such as pitch or energy;
ii. these quantities are summarized into a reduced set of features;
iii. a classifier learns, in a supervised manner with example data, how to associate the features with the emotions.
FEATURE EXTRACTION
Feature extraction is based on partitioning speech into small intervals known as frames. Selecting suitable features that carry information about emotions from the speech signal is an important step in an SER system. There are two types of features: prosodic features, including energy and pitch, and spectral features, including MFCC, MEDC, and LPCC.
a. Energy and related features
Energy is the most basic and important feature of the speech signal. To obtain the statistics of the energy feature, we use a short-term function to extract the energy value in each speech frame. We can then obtain statistics of the energy over the whole speech sample, such as its mean value, maximum value, variance, variation range, and contour.
b. Pitch and related features
The vibration rate of the vocal folds is called the fundamental frequency (F0) or pitch frequency. The pitch signal carries information about emotion because it depends on the tension of the vocal folds and the subglottal air pressure, so the mean value, variance, variation range, and contour of pitch differ across the seven basic emotional states. The following statistics are calculated from the pitch and used in the pitch feature vector (a sketch follows the list):
· Mean, median, variance, maximum, and minimum (for the pitch contour and its derivative)
· Average energies of voiced and unvoiced speech
· Speaking rate (the inverse of the average length of the voiced parts of an utterance)
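The sketch below assembles that feature vector from a per-frame F0 contour and energy track; the voicing mask and frame duration are assumed inputs from an earlier pitch-tracking step, and the toy data are stand-ins.

```python
import numpy as np

def voiced_runs(voiced):
    """Lengths, in frames, of each maximal run of voiced frames."""
    runs, n = [], 0
    for v in voiced:
        if v:
            n += 1
        elif n:
            runs.append(n); n = 0
    if n:
        runs.append(n)
    return runs

def pitch_feature_vector(f0, voiced, energy, frame_dur=0.010):
    """Statistics over an F0 contour, as listed above."""
    stats = lambda v: [np.mean(v), np.median(v), np.var(v), np.max(v), np.min(v)]
    p = f0[voiced]                                   # pitch over voiced frames only
    feats = stats(p) + stats(np.diff(p))             # contour and its derivative
    feats += [energy[voiced].mean(), energy[~voiced].mean()]  # voiced/unvoiced energy
    feats.append(1.0 / (np.mean(voiced_runs(voiced)) * frame_dur))  # speaking rate
    return np.array(feats)

# Toy contour: 100 frames, half voiced around 200 Hz.
rng = np.random.default_rng(0)
voiced = np.tile([True] * 5 + [False] * 5, 10)
f0 = np.where(voiced, 200 + 10 * rng.standard_normal(100), 0.0)
print(pitch_feature_vector(f0, voiced, rng.random(100)).round(2))
```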
c. MFCC and MEDC features
Mel-frequency cepstral coefficients (MFCC) are among the most important speech features, offering simple calculation, good discriminative ability, and resistance to noise. MFCC has good frequency resolution in the low-frequency region, and its robustness to noise is also very good. The MEDC extraction process is similar to that of MFCC; the only difference is that MEDC takes the logarithmic mean of the energies after the mel filter bank and frequency warping, whereas MFCC takes the logarithm directly after the mel filter bank and frequency warping. After that, we also compute the first and second differences of this feature.
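For illustration, MFCCs and their differences can be obtained with the widely used librosa package (an assumption of this sketch; the paper names no library). librosa has no MEDC routine, so the last lines approximate it under one reading of the description above: the log of the mean mel-band energy after the mel filter bank.

```python
import numpy as np
import librosa  # assumed available; any MFCC implementation would do

sr = 16000
y = np.sin(2 * np.pi * 220 * np.arange(sr) / sr).astype(np.float32)  # stand-in tone

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # 13 coefficients per frame
d1 = librosa.feature.delta(mfcc)                    # first difference
d2 = librosa.feature.delta(mfcc, order=2)           # second difference

# MEDC sketch: log of the mean mel-band energy, per band.
mel = librosa.feature.melspectrogram(y=y, sr=sr)    # mel filter-bank energies
medc = np.log(mel.mean(axis=1) + 1e-10)
print(mfcc.shape, d1.shape, d2.shape, medc.shape)
```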
d. Linear Prediction Cepstral Coefficients (LPCC)
LPCC embodies the characteristics of a particular vocal channel, and the same person produces different channel characteristics in different emotional speech, so we can extract these feature coefficients to identify the emotions contained in speech. The computational method for LPCC is usually a recursion over the linear prediction coefficients (LPC).
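A sketch under one common convention: LPC coefficients from the autocorrelation method via the Levinson-Durbin recursion, then cepstral coefficients from the standard LPC-to-cepstrum recursion. Sign conventions vary between textbooks, so treat this as illustrative.

```python
import numpy as np

def lpc(frame, order):
    """Linear prediction coefficients via autocorrelation + Levinson-Durbin."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:][: order + 1]
    a, err = np.zeros(order), r[0]
    for i in range(order):
        k = (r[i + 1] - a[:i] @ r[1:i + 1][::-1]) / err  # reflection coefficient
        a[:i] = a[:i] - k * a[:i][::-1]                  # update lower-order coeffs
        a[i] = k
        err *= 1 - k * k                                 # residual prediction error
    return a

def lpcc(a, n_ceps):
    """Cepstral coefficients from LPC coefficients by the standard recursion."""
    p, c = len(a), np.zeros(n_ceps)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc += (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c

frame = np.random.default_rng(1).standard_normal(400)  # stand-in speech frame
print(lpcc(lpc(frame, 10), 12).round(3))
```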
CLASSIFIERS
For the extracted features of emotional speech, a classification algorithm is applied to different sets of inputs. The different classifiers are discussed below.
A. Support Vector Machine (SVM)
The SVM is a binary classifier that is simple and computationally efficient. It is widely used for pattern recognition and classification problems and, under conditions of limited training data, can achieve very good classification performance compared to other classifiers. The idea behind the SVM is to transform the original input set into a high-dimensional feature space using a kernel function, so that non-linear problems can be solved by this transformation.
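A minimal illustration with scikit-learn (an assumed tool, not one named in the paper): an RBF-kernel SVM on stand-in feature vectors, with the kernel supplying the implicit high-dimensional mapping described above.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X: one feature vector per utterance (e.g. pitch/energy/MFCC statistics),
# y: emotion labels. Random stand-ins here.
rng = np.random.default_rng(0)
X, y = rng.standard_normal((60, 20)), rng.integers(0, 3, 60)

# The RBF kernel implicitly maps inputs to a high-dimensional space,
# which is how the non-linear separation described above is obtained.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X, y)
print(clf.predict(X[:5]))
```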
B. Hidden Markov Model (HMM)
The HMM consists of a first-order Markov chain whose states are hidden from the observer, so the internal behavior of the model remains hidden. The hidden states of the model capture the temporal structure of the data. Hidden Markov models are statistical models that describe sequences of events, and the HMM has the advantage that the temporal dynamics of the speech features can be captured through the state transition matrix. During classification, a speech signal is taken and the probability of that signal under each model is calculated; the output of the classifier is based on the model with the maximum probability of having generated the signal. For emotion recognition using HMMs, the database is first sorted according to the mode of classification, and then features are extracted from the input waveform and added to the database. The transition and emission matrices are built according to the modes, which generate the random sequence of states and emissions from the model. Finally, the state sequence probability is estimated using the Viterbi algorithm.
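The sketch below follows that recipe with the hmmlearn package (an assumption; the paper names no library): one Gaussian HMM per emotion, classification by maximum log-likelihood, and Viterbi decoding for the state path. The training data are random stand-ins.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM  # assumed available

rng = np.random.default_rng(0)
# Stand-in data: per-frame feature sequences for two emotions.
train = {"angry": rng.standard_normal((200, 13)) + 1.0,
         "sad":   rng.standard_normal((200, 13)) - 1.0}

# One HMM per emotion; the state-transition matrix captures temporal dynamics.
models = {}
for emo, X in train.items():
    models[emo] = GaussianHMM(n_components=3, covariance_type="diag",
                              n_iter=20).fit(X)

# Classify a new sequence by the model with the highest log-likelihood;
# decode() recovers the Viterbi state path mentioned above.
test = rng.standard_normal((50, 13)) + 1.0
scores = {emo: m.score(test) for emo, m in models.items()}
print(max(scores, key=scores.get), models["angry"].decode(test)[1][:10])
```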
C. K Nearest Neighbor (KNN)
A more general version of the nearest-neighbor technique bases the classification of an unknown sample on the "votes" of its K nearest neighbors rather than on its single nearest neighbor alone. Among the various methods of supervised statistical pattern recognition, the nearest-neighbor rule is the most traditional one; it makes no a priori assumptions about the distributions from which the training examples are drawn, and it keeps a training set of all cases. A new sample is classified by calculating the distance to the nearest training case, whose label then determines the classification of the sample. Larger K values help reduce the effect of noisy points within the training data set, and the choice of K is often performed through cross-validation.
D. AdaBoost Algorithm
The AdaBoost algorithm is an adaptive classifier that iteratively builds a strong classifier from weak classifiers. In each iteration, the weak classifier is used to classify the data points of the training data set. Initially all the data points are given equal weights, but after each iteration the weights of incorrectly classified data points increase, so that the classifier in the next iteration focuses more on them. This decreases the global error of the classifier and hence builds a stronger classifier. The AdaBoost algorithm is also used as a feature selector for training SVMs.
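A minimal scikit-learn sketch of that reweighting loop; the default weak learner is a depth-1 decision tree (a stump), and the data are stand-ins.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(0)
X, y = rng.standard_normal((120, 20)), rng.integers(0, 2, 120)

# Each boosting round reweights misclassified points so the next weak
# learner focuses on them, as described above.
clf = AdaBoostClassifier(n_estimators=50)  # default weak learner: depth-1 tree
clf.fit(X, y)
print(clf.score(X, y))
```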
 Present Working Status
Empath is an emotion recognition program developed by Smartmedical Corp. Its algorithm identifies emotion by analyzing the physical properties of the voice; based on tens of thousands of voice samples, Empath detects anger, joy, sadness, calmness, and vigor. Smartmedical provides the Empath Web API for developers: by adding sample code to a website, developers can integrate vocal emotion recognition into their apps. Any program that sends WAVE files to the API can use it, so the API can be integrated on various platforms such as Windows, iOS, and Android, and it is also easy to use as an M2M or IoT sensor.
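A hedged sketch of posting a WAVE file to such an API over HTTP: the endpoint URL, field names, and response format below are placeholders, so the vendor's documentation is the authority on the real values.

```python
import requests  # assumed HTTP client; any would do

# Placeholder endpoint and field names -- substitute the values from
# the vendor's API documentation and your own API key.
API_URL = "https://api.example.com/analyzeWav"
with open("sample.wav", "rb") as f:
    resp = requests.post(API_URL,
                         data={"apikey": "YOUR_API_KEY"},
                         files={"wav": f})
print(resp.status_code, resp.json())  # e.g. per-emotion scores
```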
The Vokaturi software reflects the state of the art in emotion recognition from the human
voice. Its algorithms have been designed, and are continually improved, by Paul Boersma,
professor of Phonetic Sciences at the University of Amsterdam, who is the main author of the
world’s leading speech analysis software Praat. Vokaturi can measure directly from your
voice whether you are happy, sad, afraid, angry, or have a neutral state of mind. Currently the
open-source version of the software chooses between these five emotions with high accuracy,
even if it hears the speaker for the first time. The "plus" version of the software reaches the
performance level of a dedicated human listener.
EmoVoice is a comprehensive framework for real-time recognition of emotions from acoustic
properties of speech (not using word information). Linking output to other applications is easy
and thus allows the implementation of prototypes of affective interfaces.
3. Conclusion
 Future Expectation
Our expectation for the future of this area of research is emotionally intelligent software that can run on desktop computers, laptops, tablets, and mobile phones. When a user runs it, the software should be able to detect the user's present emotion: whether he is sad, happy, angry, and so on. After evaluating the user's emotional state, it starts communicating with the user accordingly, replying in real time to the user's responses and offering suggestions, advice, and feedback. The software would use the internet for this purpose when needed. For example, a student installs the software on his mobile phone and runs it; the software analyzes his emotional state and finds that he is sad; it asks him why; he replies that his marks are low and he does not know which university to apply to; the software then searches for the university best suited to him and presents it; if he likes it, the task is done, and if not, he replies again and the software adjusts accordingly. This is our expectation of this technology.
Prototype:
Facial expression is a key mechanism for describing human emotion. From the start of the day to its end, a person passes through plenty of emotions, owing to mental or physical circumstances. Although humans are filled with various emotions, modern psychology defines six basic facial expressions as universal: happiness, sadness, surprise, fear, disgust, and anger. Movements of the facial muscles help to identify human emotions; the basic facial features are the eyebrows, mouth, nose, and eyes.
Universal Emotion Identification

Anger
Definition: Anger is one of the most dangerous emotions. It may be harmful, so humans try to avoid it. Secondary emotions of anger are irritation, annoyance, frustration, hate, and dislike.
Facial motion: Eyebrows pulled down, eyes open, teeth shut, lips tightened, upper and lower lids pulled up.

Fear
Definition: Fear is the emotion of danger, whether of physical or psychological harm. Secondary emotions of fear are horror, nervousness, panic, worry, and dread.
Facial motion: Outer eyebrows down, inner eyebrows up, mouth open, jaw dropped.

Happiness
Definition: Happiness is the expression most desired by humans. Secondary emotions are cheerfulness, pride, relief, hope, pleasure, and thrill.
Facial motion: Eyes open, mouth edges up, mouth open, lip corners pulled up, cheeks raised, wrinkles around the eyes.

Sadness
Definition: Sadness is the opposite of happiness. Secondary emotions are suffering, hurt, despair, pity, and hopelessness.
Facial motion: Outer eyebrows down, inner corners of the eyebrows raised, mouth edges down, eyes closed, lip corners pulled down.

Surprise
Definition: Surprise arises when unexpected things happen. Secondary emotions of surprise are amazement and astonishment.
Facial motion: Eyebrows up, eyes open, mouth open, jaw dropped.

Disgust
Definition: Disgust is a feeling of dislike. Humans may feel disgust at any taste, smell, sound, or touch.
Facial motion: Lip corners depressed, nose wrinkled, lower lip depressed, eyebrows pulled down.
Face Analysis
1. Input image
2. Facial feature tracking system (SkyBiometry, Face++, etc.)
3. Tracked features: eyes, eyebrows, furrow, lips
4. Feature vector extraction
5. Emotion output (Happy, Anger, or Sad)
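To make the pipeline concrete, here is a deliberately simplified rule-based sketch. The boolean cues are hypothetical stand-ins for measurements derived from a tracker's landmark output; a real system would run a trained classifier over the extracted feature vector instead.

```python
def classify_expression(cues):
    """Map coarse facial cues to one of the three output emotions."""
    if cues["mouth_corners_up"] and cues["cheeks_raised"]:
        return "Happy"   # table row: Happiness
    if cues["brows_down"] and cues["lids_tightened"]:
        return "Anger"   # table row: Anger
    if cues["mouth_corners_down"] and cues["inner_brows_raised"]:
        return "Sad"     # table row: Sadness
    return "Neutral"     # fallback when no rule fires

# Hypothetical cue values for a smiling face.
print(classify_expression({"mouth_corners_up": True, "cheeks_raised": True,
                           "brows_down": False, "lids_tightened": False,
                           "mouth_corners_down": False,
                           "inner_brows_raised": False}))
```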
 References
[1] https://diuf.unifr.ch/main/diva/sites/diuf.unifr.ch.main.diva/files/T6.pdf
[2] https://link.springer.com/chapter/10.1007/978-3-540-78293-3_5
[3] https://link.springer.com/chapter/10.1007/978-3-642-02580-8_62
[4] http://sail.usc.edu/publications/files/Busso_2004.pdf
[5] https://arxiv.org/pdf/1801.00451.pdf
[6] http://www.ijcsmc.com/docs/papers/April2013/V2I4201323.pdf