SlideShare a Scribd company logo
1 of 6
Download to read offline
TELKOMNIKA, Vol.15, No.2, June 2017, pp. 665~670
ISSN: 1693-6930, accredited A by DIKTI, Decree No: 58/DIKTI/Kep/2013
DOI: 10.12928/TELKOMNIKA.v15i2.3965  665
Received January 10, 2017; Revised March 28, 2017; Accepted April 13, 2017
Predicting the Level of Emotion by Means of
Indonesian Speech Signal
Fergyanto E. Gunawan*
1
, Kanyadian Idananta
2
Binus Graduate Programs, Bina Nusantara University, Jakarta, Indonesia
Jl. Kebon Jeruk Raya No. 27, Jakarta 11530, Indonesia
*Corresponding author, e-mail: fgunawan@binus.edu
1
, kidananta@binus.edu
2
Abstract
Understanding human emotion is of importance for developing better and facilitating smooth
interpersonal relations. It becomes much more important because human thinking process and behavior
are strongly influenced by the emotion. Align with these needs, an expert system that capable of predicting
the emotion state would be useful for many practical applications. Based on a speech signal, the system
has been widely developed for various languages. This study intends to evaluate to which extent Mel-
Frequency Cepstral Coefficients (MFCC) features, besides Teager energy feature, derived from
Indonesian speech signal relates to four emotional types: happy, sad, angry, and fear. The study utilizes
empirical data of nearly 300 speech signals collected from four amateur actors and actresses speaking 15
prescribed Indonesian sentences. Using support vector machine classifier, the empirical findings suggest
that the Teager energy, as well as the first coefficient of MFCCs, are a crucial feature and the prediction
can achieve the accuracy level of 86%. The accuracy increases quickly with a few initial MFCC features.
The fourth and more features have negligible effects on the accuracy.
Keywords: Indonesia speech, mel frequency cepstral coefficient, teager energy, support vector machine
Copyright © 2017 Universitas Ahmad Dahlan. All rights reserved.
1. Introduction
This study focuses on establishing a quantitative relation between human emotion and
Indonesian speech signal. Such relation is very important considering that human thinking
process and behavior are strongly influenced by the emotion [1, 2]; thus, understanding human
emotion is important for developing better and facilitating smooth personal relations. Failure in
understanding human emotion may damage interpersonal relations and on extreme occasions,
it may have catastrophic consequences. For example, in the case of the accident of Korean Air
Flight 801 bound for Antonio B. Won Pat International Airport in Guam on August 6, 1997, the
failure in communicating the runway position and the proximity of the aircraft to the ground
between the aircraft pilot, the flight engineer, and the air traffic controller had resulted in
fatalities of 228 passengers and crews [3].
We adopt Frieda's [1] definition of the human emotion, which is an intense feeling such
as happy, angry, or afraid, directed toward a person or a thing. The feeling intensity is assumed
to be rather significant such that its affects the person speeches, body gestures, or facial
expressions [2].
Understanding the quantitative relation between the emotional state and a speech
signal, in particular, is a prerequisite to establishing an automatic emotion detection system. The
relation is clearly affected by the nature of the language and many studies have been previously
conducted for languages: English [4], German [5], Mandarin [6], Assamese [7], Persian [8], and
Danish [9].
The central issues of this type of study are two folds. The first is related to the method
to extract features from the speech signal that sensitive to the emotion level and type. The
second is related to feature classification method. Various classification and extraction methods
have been evaluated so far. Kandali et al. [7] used Gaussian Mixture Model (GMM) for
classifying emotion of Assam tribe in India where the speech features were obtained from Mel-
Frequency Cepstral Coefficient (MFCC). They found that the surprise emotion state was difficult
to be identified correctly. Instead of using speech signal, Chibelushi and Bourel [2] related the
emotion to the facial expression. Muda et al. [10] utilized the method of Dynamic Time Warping
 ISSN: 1693-6930
TELKOMNIKA Vol. 15, No. 2, June 2017 : 665 – 670
666
(DTW) and MFCC and concluded that these methods were effective for the emotion
classification. Magre and Deshmukh [11] evaluated the strengths and weaknesses of a few
feature extraction methods of speech signals; their conclusions are reproduced in Table 1.
The objective of this work is to study to what extent Indonesian speech signal can be
related to the emotion type, how much is the accuracy of the prediction, and what are the most
relevant features for the emotion prediction. For this purpose, the speech signal will be analyzed
using MFCC for their features and support vector machine will be used for classification. This
article is composed of four sections. Section 2 Research Methods will briefly discuss the MFCC,
the data acquisition method, and research procedure. Section 3 Results and Discussion will
present the most important and relevant findings. Finally, Section 4 Conclusion will provide a
brief summary of the discussed problem and findings.
Table 1. Advantages and disadvantages of Mel-Frequency Cepstral Coecient (MFCC),
Linear Predictive Coecient (LPC), and Dynamic Time Warping (DTW) in modeling speech
signal [11]
Method Advantages Disadvantages
MFCC As the frequency bands are positioned
logarithmically in MFCC, it approximates the
human system response more closely than
any other system.
MFCC values are not very robust in the
presence of additive noise, and so it is common
to normalize their values in speech recognition
systems to lessen the in influence of noise.
LPC Provides a good approximation within vocal
spectral envelope.
On certain condition, the noise may become
dominant.
DWT Reduced storing space for the reference
template. Increased recognition rate.
Difficult to find the best reference template for
certain words.
2. Research Methods
The research procedure is as the following. We collected the speech data in Indonesia
language. The speakers were two actors and two actresses. All of them were amateur and were
active members of Theater Student Activity of Bina Nusantara University in Jakarta, Indonesia.
They were asked to simulate four emotional states, namely, happy, sad, angry, and fear. The
speech scripts were prepared by the researchers containing 15 sentences in Indonesia
language, see Table 2. The speakers were asked to speech the sentences with the four
emotional states. Several speech data were obtained per person, per sentence, and per
emotional state. As the results, 291 speech signals in Indonesia language were collected. As
much as 66% of the dataset were utilized for training, and the remainder was for testing.
Table 2. The four speakers, two men and two women, were asked to speak the following
sentences in four emotional states: happy, sad, angry, and fear
No Sentences
1 Bukunya tadi aku taruh di meja. (English: I put the book on the table)
2 Menurutmu gimana?. (English; What do you think?)
3 Tadi mama telpon kamu. (English; Mom just called you)
4 5 jam lagi acara dimulai.(English; Within 5 hours it will be started)
5 Itu tas kenapa ditaruh disitu? (English; why bagis placed there?)
6 Sabtu ini aku mau pulang dan bertemu dengan Agnes. (English; I want to go home on
Saturday and meet Agnes)
7 Baru aja buku itu dibawa ke atas dan sekarang dibawa ke bawah lagi. (English; It has
just been taken up and now it is brought down again)
8 Aku disuruh tangkap capung. (English; I’m asked to catch a dragonfly)
9 Lisa mau ngumpulin berkasnya hari Rabu. (English; Lisa want to submit her stuff on
Wednesday)
10 Kamu suka (cinta) sama Budi?. (English; You love budi, don’t you?)
11 Aku bakal kasih tahu nanti malam. (English: I’ll tell you tonight)
12 Aku itu belum punya pacar. (English; I don’t have girl/boyfriend)
13 Aku sudah melakukannya. (English; I have done it)
14 Makanannya ada di kulkas. (English; the food is on fridge)
15 Ekky tadi kasih aku boneka. (English; Ekky gave me a doll)
TELKOMNIKA ISSN: 1693-6930 
Predicting the Level of Emotion by Means of Indonesian Speech… (Fergyanto E. Gunawan)
667
2.1. Mel-Frequency Cepstral Coefficient
The speech data were processed to provide Mel-Frequency Cepstral Coefficients
(MFCC) and Teager Energy (TE). MFCC is based on human hearing perceptions, which are
difficult to perceive frequencies over 1 kHz. MFCC is based on known variation of the human
ears critical bandwidth frequency [10]. In summary, the MFCC procedures are: apply the Fourier
transform to every segment of the speech signal, map the frequency into the Mel scale and
apply triangular overlapping windows on the powers of the spectra, apply logarithmic
transformation on the power spectra on each Mel frequency, and apply the discrete cosine
transform. The results are 12 MFCC coefficients on each segment.
The computation of the MFCC in detail is of the following [12, 13]. The pre-processing
proses is called pre-emphasis process where the speech signal is filtered with the equation,
𝑦(𝑛) = (𝑥) − 0.95 ⋅ 𝑥(𝑛 − 1) (1)
where 𝑛 is an integer index, 𝑥(𝑛) is the value of the speech signal at the discrete time-step 𝑛,
and 𝑦(𝑛) is the filtered signal. Then, the filtered signal is divided into several small frames where
the length of each frame is within the range of 20–40 ms. The voice signal is divided into frames
of 𝑁 samples. The adjacent frames are separated by 𝑀 where 𝑀 < 𝑁. Typically, 𝑀 = 100, and
𝑁 = 256. The signal on each frame is scaled with a window function. The most widely window
function is the Hamming window. The windowing process is mathematically written as:
𝑧(𝑛) = 𝑦(𝑛) ⋅ 𝑤(𝑛), and (2)
𝑤(𝑛) = 0.54 − 0.46 cos (
2𝜋𝑛
𝑁 − 1
), (3)
where 𝑦(𝑛) is a segment of signal prior windowing process, 𝑧(𝑛) is after windowing process,
and 𝑛 ∈ [0, 𝑁 − 1]. Then, Fourier transform is applied to each frame signal to transform the
time-domain signal to the frequency-domain signal:
𝐹(𝑓) = ∫ 𝑓(𝑡)𝑒−2𝜋𝑗𝑡𝑓
d𝑡,
+∞
−∞
(4)
where the signal in the time domain is denotes by 𝑓(𝑡) and in the frequency domain by 𝐹(𝑓).
The computation of Eq. (4) is performed numerically by the Fast Fourier Transform algorithm.
The obtained Fourier spectra are expressed in form of the power spectra. The next step is
transformation of the Fourier frequency 𝑓 to Mel-scale frequency by:
𝑀(𝑓) = 1125 ⋅ ln (1 +
𝑓
700
), (5)
and applying triangular overlapping windows. The resulted spectra are subjected to logarithmic
transformation. Finally, the discrete cosine transform is applied to the log power spectra.
2.2. Teager Energy
The Teager energy of a speech signal 𝑥(𝑡) is defined by [14]:
Ψ [𝑥(𝑡)] = 𝑥̇2(𝑡) − 𝑥(𝑡)𝑥̈( 𝑡), (6)
where 𝑥̇( 𝑡) is the first derivative and 𝑥̈( 𝑡) is the second derivative of 𝑥(𝑡).
2.3. Support Vector Machine
In the present study, we only use the Support Vector Machine (SVM) for linearly
separable data. The SVM is a numerical method to compute a hyperplane for separating a two-
class dataset. It can easily be extended to multiple-class problem. The SVM establishes the
hyperplane, governed by (𝐰, 𝑏), by using the support vectors, which are the data points that are
closest to the hyperplane. The following SVM formulation is derived from Refs. [15, 16]; readers
 ISSN: 1693-6930
TELKOMNIKA Vol. 15, No. 2, June 2017 : 665 – 670
668
are advised to the two sources for detail exposition. In addition, readers may also consult Refs.
[17, 18].
We consider the point sets 𝐱 𝑖 ∈ ℜ 𝑑
, as the support vectors, with the categories
𝑦𝑖 ∈ [−1, +1]. The hyperplane that separates 𝑦𝑖 = −1 from those of 𝑦𝑖 = +1 should satisfy
< 𝐰, 𝐱 > +𝑏 = 0, (7)
where 𝐰 ∈ ℜ 𝑑
, < 𝐰, 𝐱 >, denotes the inner dot product of 𝐰 and 𝐱, and 𝑏 is a scalar constant.
The hyperplane is obtained by solving:
min
𝑤,𝑏
𝐿 𝑝 =
1
2
< 𝐰, 𝐰 > − ∑ 𝛼𝑖[𝑦𝑖(< 𝐰, 𝐱 𝑖 > +𝑏) − 1]
𝑖
(8)
where 𝛼𝐼 ≥ 0.
2.4. Accuracy Indicator
The classification accuracy is computed by:
Accuracy =
TP + TN
TP + TN + FP + FN
(9)
where TP stands for True Positive, TN for True Negative, FP for False Positive, and FN for
False Negative.
3. Results and Discussion
Firstly, we discuss how the accuracy of the emotion-state classification changes as the
number of features is increased. We should note that in the MFCC analysis, we utilized 12 Mel
filter banks from the lowest frequency to the highest frequencies. The coefficients in each bank
are preserved in a cepstral vector. For each vector, we determine the minimum and maximum
coefficients, and compute its average and standard deviation. Those statistical characteristics
are used in the feature vector.
Table 3 shows the computed accuracies with the increasing of the number feature
vectors. The first row shows the accuracy level when only the Teager energy is used as the
feature. The second row shows for the case where the features involve the Teager energy and
the statistical characteristics of the MFCC in the first bank. The characteristics on the second
bank are added in the third row case. The analysis is repeated until all banks are taken into
account.
Table 3. The changes in the level of accuracy as the function of the contents of the
feature vector. The TE denotes the Teager energy, 𝐶𝑖 denotes the feature vector of the
i-bank of MFCC where the vector contents are
𝐶𝑖 = [min(𝑀𝐹𝐶𝐶𝑖), max(𝑀𝐹𝐶𝐶𝑖) , mean(𝑀𝐹𝐶𝐶𝑖), dev(𝑀𝐹𝐶𝐶𝑖) ] mean and 𝑖 ∈ [1,12] is
the bank number
Feature Vector Accuracy (%) Change in Accuracy (%)
[TE] 41.41 -
[TE, 𝐶1] 72.73 31.32
[TE, 𝐶1, 𝐶2] 77.78 5.05
[TE, 𝐶1, 𝐶2, 𝐶3] 85.86 8.08
[TE, 𝐶1, 𝐶2, …, 𝐶4] 85.86 0.0
[TE, 𝐶1, 𝐶2, …, 𝐶5] 86.87 1.01
[TE, 𝐶1, 𝐶2, …, 𝐶6] 85.86 -1.01
[TE, 𝐶1, 𝐶2, …, 𝐶7] 84.85 -1.01
[TE, 𝐶1, 𝐶2, …, 𝐶8] 86.87 2.02
[TE, 𝐶1, 𝐶2, …, 𝐶9] 87.88 1.01
[TE, 𝐶1, 𝐶2, …, 𝐶10] 86.87 -1.01
[TE, 𝐶1, 𝐶2, …, 𝐶11] 86.87 0.0
[TE, 𝐶1, 𝐶2, …, 𝐶12] 85.86 -1.01
TELKOMNIKA ISSN: 1693-6930 
Predicting the Level of Emotion by Means of Indonesian Speech… (Fergyanto E. Gunawan)
669
The results suggest that the most important feature is the Teager energy and then
followed by the statistical characteristics in the MFCC first bank. The Teager energy only is
capable to achieve 41.4% of the level of accuracy. The features from the rest MFCC bank is
able to increase the accuracy by 31%. The features from the third MFCC bank have a slightly
better quality to increase the accuracy in comparison to those features in the second MFCC
bank. The features from the fourth to the last banks can be ignored without sacrificing the
accuracy.
The confusion matrix for the case where the feature vector consisting [TE, C1, C2, …,
C12] is shown in Table 4. The table shows that a few cases of happy-emotion state were
detected as angry emotion. Meanwhile, the motion states of sad, angry, and fear were classified
with high accuracy.
Table 4. The Computed Confussion Matrix for the case with 49 Speech Features
Prediction
Happy Sad Angry Fear
Actual
Happy 34 0 6 1
Sad 0 22 0 1
Angry 2 0 15 1
Fear 2 1 0 14
4. Conclusion
Automatic emotion recognition on the basis of human speech signal is important as it
has many practical benefits. For the reason, many studies have been performed particularly
using the speech signals in English, German, Mandarin, Persian, and Danish languages. This
work has similar nature but different in the aspects of the language of interest and of the feature
vector. The study focuses on Indonesia language and the Teager energy is taken into account
as an important feature alongside the features of the statistical descriptive of MFCC. The
performed numerical trials suggest that the Teager energy indeed is an important feature as it
contributes to the classification accuracy by about 41%. In addition, some statistical descriptive
of MFCC associated with low frequencies power spectra are also crucial for accurate
classification. The final finding is that the happy-emotion state seems slightly difficult to be
differentiated from the angry-emotion state and the emotion states of angry, sad, and fear are
detectable from the speech signal at a rather high accuracy.
References
[1] KKN Frieda, Moods, Emotion Episodes, and Emotions, Guilford Press, New York. 1993.
[2] C Chibelushi, F Bourel. Facial expression recognition: A brief tutorial overview, retrieved on January
2015 (January 2003). URL http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_
COPIES/CHIBELUSHI1/CCC_FB_FacExprRecCVonline.pdf
[3] POM Gladwell. The Ethnic Theory of Plane CrashesCaptain the Weather Radar Has Helped Us a
Lot. In Success, Outliers: The Story of Success, Little, Brown and Company, New York, United
States of America. 2008.
[4] TM Galla, A Sapra, N Panwar, S Panwar. Emotion Recognition from Speech. International Journal of
Emerging Technology and advanced engineering. 2013; 3(2): 341-345.
[5] BV Sathe-Pathak, AR Panat. Extraction of Pitch and Formants and its Analysis to Identify Three
Different Emotional States of a Person. International Journal of Computer Science. 2012; 9(4): 296-
299.
[6] TL Pao, Y Chen, JH Yeh, J Lu, Detecting Emotions in Mandarin Speech. ROCLING. 2004: 365-373.
[7] AB Kandali, A Routray, TK Basu. Emotion Recognition from Assamese Speeches using MFCC
Features and GMM classifier. TENCON 2008-2008 IEEE Region 10 Conference, IEEE. 2008: 1-5.
[8] M Hamidi, M Mansoorizade. Emotion Recognition from Persian Speech with Neural Network,
International Journal of Artificial Intelligence & Applications. 2012; 3(5): 107.
[9] YL Lin, G Wei. Speech Emotion Recognition based on HMM and SVM, in: Machine Learning and
Cybernetics. Proceedings of 2005 International Conference on. IEEE. 2005; 8: 4898-4901.
[10] L Muda, M Begam, I Elamvazuthi. Voice Recognition Algorithms using Mel Frequency Cepstral
Coefficient (MFCC) and Dynamic Time Warping (DTW) Techniques, arXiv preprint arXiv: 1003.4083.
URL http://arxiv.org/abs/1003.4083
 ISSN: 1693-6930
TELKOMNIKA Vol. 15, No. 2, June 2017 : 665 – 670
670
[11] SB Magre, RR Deshmukh. A Review on Feature Extraction and Noise Reduction Technique.
International Journal of Advanced Research in Computer Science and Software Engineering. 2014;
4(2): 352-356.
[12] SK Kopparapu, M Laxminarayana. Choice of Mel Filter Bank in Computing MFCC of a Resampled
Speech. Information Sciences Signal Processing and their Applications (ISSPA), 2010 10th
International Conference on, IEEE. 2010: 121-124.
[13] J Kaur, A Sharma. Emotion Detection Independent of User Using MFCC Feature Extraction.
International Journal of Advanced Research in Computer Science and Software Engineering. 2014;
4(6): 230-234.
[14] E Kvedalen. Signal Processing Using the Teager Energy Operator and other Nonlinear Operators,
Master Thesis, Department of Informatics, University of Oslo. 2003.
[15] N Christianni, J Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-Based
Learning Methods, Cambridge University Press, Cambridge, UK. 2000.
[16] T Hastie, R Tibshirani, J Friedman. The Elements of Statistical Learning, Springer, New York. 2008.
[17] AB Mutiara, R Refianti, NRA Mukarromah. Musical Genre Classification Using SVM and Audio
Features. TELKOMNIKA Telecommunication, Computing, Electronics and Control. 2016; 14(3).
[18] M Mulyati, WA Kusuma, M Nurilmala. Identification of Tuna and Mackerel based on DNA Barcodes
using Support Vector Machine. TELKOMNIKA Telecommunication, Computing, Electronics and
Control. 2016: 14(2).

More Related Content

What's hot

Understanding and estimation of
Understanding and estimation ofUnderstanding and estimation of
Understanding and estimation ofijnlc
 
Emotional telugu speech signals classification based on k nn classifier
Emotional telugu speech signals classification based on k nn classifierEmotional telugu speech signals classification based on k nn classifier
Emotional telugu speech signals classification based on k nn classifiereSAT Publishing House
 
ASERS-LSTM: Arabic Speech Emotion Recognition System Based on LSTM Model
ASERS-LSTM: Arabic Speech Emotion Recognition System Based on LSTM ModelASERS-LSTM: Arabic Speech Emotion Recognition System Based on LSTM Model
ASERS-LSTM: Arabic Speech Emotion Recognition System Based on LSTM Modelsipij
 
IEEE ICASSP 2021
IEEE ICASSP 2021IEEE ICASSP 2021
IEEE ICASSP 2021KunZhou18
 
Oral Qualification Examination_Kun_Zhou
Oral Qualification Examination_Kun_ZhouOral Qualification Examination_Kun_Zhou
Oral Qualification Examination_Kun_ZhouKunZhou18
 
VAW-GAN for disentanglement and recomposition of emotional elements in speech
VAW-GAN for disentanglement and recomposition of emotional elements in speechVAW-GAN for disentanglement and recomposition of emotional elements in speech
VAW-GAN for disentanglement and recomposition of emotional elements in speechKunZhou18
 
What can GAN and GMMN do for augmented speech communication?
What can GAN and GMMN do for augmented speech communication? What can GAN and GMMN do for augmented speech communication?
What can GAN and GMMN do for augmented speech communication? Shinnosuke Takamichi
 
LANGUAGE ASSIMILATION THROUGH TYPE-2 FUZZY GRAMMARS: AN APPLICATION IN COMPUT...
LANGUAGE ASSIMILATION THROUGH TYPE-2 FUZZY GRAMMARS: AN APPLICATION IN COMPUT...LANGUAGE ASSIMILATION THROUGH TYPE-2 FUZZY GRAMMARS: AN APPLICATION IN COMPUT...
LANGUAGE ASSIMILATION THROUGH TYPE-2 FUZZY GRAMMARS: AN APPLICATION IN COMPUT...paperpublications3
 
IRJET- Spoken Language Identification System using MFCC Features and Gaus...
IRJET-  	  Spoken Language Identification System using MFCC Features and Gaus...IRJET-  	  Spoken Language Identification System using MFCC Features and Gaus...
IRJET- Spoken Language Identification System using MFCC Features and Gaus...IRJET Journal
 
DYNAMIC PHONE WARPING – A METHOD TO MEASURE THE DISTANCE BETWEEN PRONUNCIATIONS
DYNAMIC PHONE WARPING – A METHOD TO MEASURE THE DISTANCE BETWEEN PRONUNCIATIONS DYNAMIC PHONE WARPING – A METHOD TO MEASURE THE DISTANCE BETWEEN PRONUNCIATIONS
DYNAMIC PHONE WARPING – A METHOD TO MEASURE THE DISTANCE BETWEEN PRONUNCIATIONS cscpconf
 
Mfcc based enlargement of the training set for emotion recognition in speech
Mfcc based enlargement of the training set for emotion recognition in speechMfcc based enlargement of the training set for emotion recognition in speech
Mfcc based enlargement of the training set for emotion recognition in speechsipij
 
A study of gender specific pitch variation pattern of emotion expression for ...
A study of gender specific pitch variation pattern of emotion expression for ...A study of gender specific pitch variation pattern of emotion expression for ...
A study of gender specific pitch variation pattern of emotion expression for ...IAEME Publication
 
Creation of speech corpus for emotion analysis in Gujarati language and its e...
Creation of speech corpus for emotion analysis in Gujarati language and its e...Creation of speech corpus for emotion analysis in Gujarati language and its e...
Creation of speech corpus for emotion analysis in Gujarati language and its e...IJECEIAES
 
Classification of Machine Translation Outputs Using NB Classifier and SVM for...
Classification of Machine Translation Outputs Using NB Classifier and SVM for...Classification of Machine Translation Outputs Using NB Classifier and SVM for...
Classification of Machine Translation Outputs Using NB Classifier and SVM for...mlaij
 

What's hot (16)

Understanding and estimation of
Understanding and estimation ofUnderstanding and estimation of
Understanding and estimation of
 
Emotional telugu speech signals classification based on k nn classifier
Emotional telugu speech signals classification based on k nn classifierEmotional telugu speech signals classification based on k nn classifier
Emotional telugu speech signals classification based on k nn classifier
 
ASERS-LSTM: Arabic Speech Emotion Recognition System Based on LSTM Model
ASERS-LSTM: Arabic Speech Emotion Recognition System Based on LSTM ModelASERS-LSTM: Arabic Speech Emotion Recognition System Based on LSTM Model
ASERS-LSTM: Arabic Speech Emotion Recognition System Based on LSTM Model
 
IEEE ICASSP 2021
IEEE ICASSP 2021IEEE ICASSP 2021
IEEE ICASSP 2021
 
Oral Qualification Examination_Kun_Zhou
Oral Qualification Examination_Kun_ZhouOral Qualification Examination_Kun_Zhou
Oral Qualification Examination_Kun_Zhou
 
VAW-GAN for disentanglement and recomposition of emotional elements in speech
VAW-GAN for disentanglement and recomposition of emotional elements in speechVAW-GAN for disentanglement and recomposition of emotional elements in speech
VAW-GAN for disentanglement and recomposition of emotional elements in speech
 
What can GAN and GMMN do for augmented speech communication?
What can GAN and GMMN do for augmented speech communication? What can GAN and GMMN do for augmented speech communication?
What can GAN and GMMN do for augmented speech communication?
 
200502
200502200502
200502
 
LANGUAGE ASSIMILATION THROUGH TYPE-2 FUZZY GRAMMARS: AN APPLICATION IN COMPUT...
LANGUAGE ASSIMILATION THROUGH TYPE-2 FUZZY GRAMMARS: AN APPLICATION IN COMPUT...LANGUAGE ASSIMILATION THROUGH TYPE-2 FUZZY GRAMMARS: AN APPLICATION IN COMPUT...
LANGUAGE ASSIMILATION THROUGH TYPE-2 FUZZY GRAMMARS: AN APPLICATION IN COMPUT...
 
IRJET- Spoken Language Identification System using MFCC Features and Gaus...
IRJET-  	  Spoken Language Identification System using MFCC Features and Gaus...IRJET-  	  Spoken Language Identification System using MFCC Features and Gaus...
IRJET- Spoken Language Identification System using MFCC Features and Gaus...
 
DYNAMIC PHONE WARPING – A METHOD TO MEASURE THE DISTANCE BETWEEN PRONUNCIATIONS
DYNAMIC PHONE WARPING – A METHOD TO MEASURE THE DISTANCE BETWEEN PRONUNCIATIONS DYNAMIC PHONE WARPING – A METHOD TO MEASURE THE DISTANCE BETWEEN PRONUNCIATIONS
DYNAMIC PHONE WARPING – A METHOD TO MEASURE THE DISTANCE BETWEEN PRONUNCIATIONS
 
Mfcc based enlargement of the training set for emotion recognition in speech
Mfcc based enlargement of the training set for emotion recognition in speechMfcc based enlargement of the training set for emotion recognition in speech
Mfcc based enlargement of the training set for emotion recognition in speech
 
Using interval type-2 fuzzy logic to analyze Igbo emotion words
Using interval type-2 fuzzy logic to analyze Igbo emotion wordsUsing interval type-2 fuzzy logic to analyze Igbo emotion words
Using interval type-2 fuzzy logic to analyze Igbo emotion words
 
A study of gender specific pitch variation pattern of emotion expression for ...
A study of gender specific pitch variation pattern of emotion expression for ...A study of gender specific pitch variation pattern of emotion expression for ...
A study of gender specific pitch variation pattern of emotion expression for ...
 
Creation of speech corpus for emotion analysis in Gujarati language and its e...
Creation of speech corpus for emotion analysis in Gujarati language and its e...Creation of speech corpus for emotion analysis in Gujarati language and its e...
Creation of speech corpus for emotion analysis in Gujarati language and its e...
 
Classification of Machine Translation Outputs Using NB Classifier and SVM for...
Classification of Machine Translation Outputs Using NB Classifier and SVM for...Classification of Machine Translation Outputs Using NB Classifier and SVM for...
Classification of Machine Translation Outputs Using NB Classifier and SVM for...
 

Similar to Predicting the Level of Emotion by Means of Indonesian Speech Signal

BASIC ANALYSIS ON PROSODIC FEATURES IN EMOTIONAL SPEECH
BASIC ANALYSIS ON PROSODIC FEATURES IN EMOTIONAL SPEECHBASIC ANALYSIS ON PROSODIC FEATURES IN EMOTIONAL SPEECH
BASIC ANALYSIS ON PROSODIC FEATURES IN EMOTIONAL SPEECHIJCSEA Journal
 
Emotion recognition based on the energy distribution of plosive syllables
Emotion recognition based on the energy distribution of plosive  syllablesEmotion recognition based on the energy distribution of plosive  syllables
Emotion recognition based on the energy distribution of plosive syllablesIJECEIAES
 
Literature Review On: ”Speech Emotion Recognition Using Deep Neural Network”
Literature Review On: ”Speech Emotion Recognition Using Deep Neural Network”Literature Review On: ”Speech Emotion Recognition Using Deep Neural Network”
Literature Review On: ”Speech Emotion Recognition Using Deep Neural Network”IRJET Journal
 
Signal & Image Processing : An International Journal
Signal & Image Processing : An International Journal Signal & Image Processing : An International Journal
Signal & Image Processing : An International Journal sipij
 
Wavelet Based Feature Extraction for the Indonesian CV Syllables Sound
Wavelet Based Feature Extraction for the Indonesian CV Syllables SoundWavelet Based Feature Extraction for the Indonesian CV Syllables Sound
Wavelet Based Feature Extraction for the Indonesian CV Syllables SoundTELKOMNIKA JOURNAL
 
An Approach of Human Emotional States Classification and Modeling from EEG
An Approach of Human Emotional States Classification and Modeling from EEGAn Approach of Human Emotional States Classification and Modeling from EEG
An Approach of Human Emotional States Classification and Modeling from EEGCSCJournals
 
Malayalam Isolated Digit Recognition using HMM and PLP cepstral coefficient
Malayalam Isolated Digit Recognition using HMM and PLP cepstral coefficientMalayalam Isolated Digit Recognition using HMM and PLP cepstral coefficient
Malayalam Isolated Digit Recognition using HMM and PLP cepstral coefficientijait
 
Emotion Detection from Voice Based Classified Frame-Energy Signal Using K-Mea...
Emotion Detection from Voice Based Classified Frame-Energy Signal Using K-Mea...Emotion Detection from Voice Based Classified Frame-Energy Signal Using K-Mea...
Emotion Detection from Voice Based Classified Frame-Energy Signal Using K-Mea...ijseajournal
 
fpsyg-08-01454 August 22, 2017 Time 1725 # 1REVIEWpubl
fpsyg-08-01454 August 22, 2017 Time 1725 # 1REVIEWpublfpsyg-08-01454 August 22, 2017 Time 1725 # 1REVIEWpubl
fpsyg-08-01454 August 22, 2017 Time 1725 # 1REVIEWpublJeanmarieColbert3
 
Sentiment analysis by deep learning approaches
Sentiment analysis by deep learning approachesSentiment analysis by deep learning approaches
Sentiment analysis by deep learning approachesTELKOMNIKA JOURNAL
 
Speech Emotion Recognition by Using Combinations of Support Vector Machine (S...
Speech Emotion Recognition by Using Combinations of Support Vector Machine (S...Speech Emotion Recognition by Using Combinations of Support Vector Machine (S...
Speech Emotion Recognition by Using Combinations of Support Vector Machine (S...mathsjournal
 
Speech Emotion Recognition by Using Combinations of Support Vector Machine (S...
Speech Emotion Recognition by Using Combinations of Support Vector Machine (S...Speech Emotion Recognition by Using Combinations of Support Vector Machine (S...
Speech Emotion Recognition by Using Combinations of Support Vector Machine (S...mathsjournal
 
Speech Emotion Recognition by Using Combinations of Support Vector Machine (S...
Speech Emotion Recognition by Using Combinations of Support Vector Machine (S...Speech Emotion Recognition by Using Combinations of Support Vector Machine (S...
Speech Emotion Recognition by Using Combinations of Support Vector Machine (S...mathsjournal
 
Speech Emotion Recognition by Using Combinations of Support Vector Machine (S...
Speech Emotion Recognition by Using Combinations of Support Vector Machine (S...Speech Emotion Recognition by Using Combinations of Support Vector Machine (S...
Speech Emotion Recognition by Using Combinations of Support Vector Machine (S...mathsjournal
 
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...TELKOMNIKA JOURNAL
 
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARSYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARijcseit
 
5215ijcseit01
5215ijcseit015215ijcseit01
5215ijcseit01ijcsit
 

Similar to Predicting the Level of Emotion by Means of Indonesian Speech Signal (20)

BASIC ANALYSIS ON PROSODIC FEATURES IN EMOTIONAL SPEECH
BASIC ANALYSIS ON PROSODIC FEATURES IN EMOTIONAL SPEECHBASIC ANALYSIS ON PROSODIC FEATURES IN EMOTIONAL SPEECH
BASIC ANALYSIS ON PROSODIC FEATURES IN EMOTIONAL SPEECH
 
Emotion recognition based on the energy distribution of plosive syllables
Emotion recognition based on the energy distribution of plosive  syllablesEmotion recognition based on the energy distribution of plosive  syllables
Emotion recognition based on the energy distribution of plosive syllables
 
Literature Review On: ”Speech Emotion Recognition Using Deep Neural Network”
Literature Review On: ”Speech Emotion Recognition Using Deep Neural Network”Literature Review On: ”Speech Emotion Recognition Using Deep Neural Network”
Literature Review On: ”Speech Emotion Recognition Using Deep Neural Network”
 
H010215561
H010215561H010215561
H010215561
 
Signal & Image Processing : An International Journal
Signal & Image Processing : An International Journal Signal & Image Processing : An International Journal
Signal & Image Processing : An International Journal
 
Wavelet Based Feature Extraction for the Indonesian CV Syllables Sound
Wavelet Based Feature Extraction for the Indonesian CV Syllables SoundWavelet Based Feature Extraction for the Indonesian CV Syllables Sound
Wavelet Based Feature Extraction for the Indonesian CV Syllables Sound
 
An Approach of Human Emotional States Classification and Modeling from EEG
An Approach of Human Emotional States Classification and Modeling from EEGAn Approach of Human Emotional States Classification and Modeling from EEG
An Approach of Human Emotional States Classification and Modeling from EEG
 
50120130406003
5012013040600350120130406003
50120130406003
 
50120130406003
5012013040600350120130406003
50120130406003
 
Malayalam Isolated Digit Recognition using HMM and PLP cepstral coefficient
Malayalam Isolated Digit Recognition using HMM and PLP cepstral coefficientMalayalam Isolated Digit Recognition using HMM and PLP cepstral coefficient
Malayalam Isolated Digit Recognition using HMM and PLP cepstral coefficient
 
Emotion Detection from Voice Based Classified Frame-Energy Signal Using K-Mea...
Emotion Detection from Voice Based Classified Frame-Energy Signal Using K-Mea...Emotion Detection from Voice Based Classified Frame-Energy Signal Using K-Mea...
Emotion Detection from Voice Based Classified Frame-Energy Signal Using K-Mea...
 
fpsyg-08-01454 August 22, 2017 Time 1725 # 1REVIEWpubl
fpsyg-08-01454 August 22, 2017 Time 1725 # 1REVIEWpublfpsyg-08-01454 August 22, 2017 Time 1725 # 1REVIEWpubl
fpsyg-08-01454 August 22, 2017 Time 1725 # 1REVIEWpubl
 
Sentiment analysis by deep learning approaches
Sentiment analysis by deep learning approachesSentiment analysis by deep learning approaches
Sentiment analysis by deep learning approaches
 
Speech Emotion Recognition by Using Combinations of Support Vector Machine (S...
Speech Emotion Recognition by Using Combinations of Support Vector Machine (S...Speech Emotion Recognition by Using Combinations of Support Vector Machine (S...
Speech Emotion Recognition by Using Combinations of Support Vector Machine (S...
 
Speech Emotion Recognition by Using Combinations of Support Vector Machine (S...
Speech Emotion Recognition by Using Combinations of Support Vector Machine (S...Speech Emotion Recognition by Using Combinations of Support Vector Machine (S...
Speech Emotion Recognition by Using Combinations of Support Vector Machine (S...
 
Speech Emotion Recognition by Using Combinations of Support Vector Machine (S...
Speech Emotion Recognition by Using Combinations of Support Vector Machine (S...Speech Emotion Recognition by Using Combinations of Support Vector Machine (S...
Speech Emotion Recognition by Using Combinations of Support Vector Machine (S...
 
Speech Emotion Recognition by Using Combinations of Support Vector Machine (S...
Speech Emotion Recognition by Using Combinations of Support Vector Machine (S...Speech Emotion Recognition by Using Combinations of Support Vector Machine (S...
Speech Emotion Recognition by Using Combinations of Support Vector Machine (S...
 
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...
 
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARSYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
 
5215ijcseit01
5215ijcseit015215ijcseit01
5215ijcseit01
 

More from TELKOMNIKA JOURNAL

Amazon products reviews classification based on machine learning, deep learni...
Amazon products reviews classification based on machine learning, deep learni...Amazon products reviews classification based on machine learning, deep learni...
Amazon products reviews classification based on machine learning, deep learni...TELKOMNIKA JOURNAL
 
Design, simulation, and analysis of microstrip patch antenna for wireless app...
Design, simulation, and analysis of microstrip patch antenna for wireless app...Design, simulation, and analysis of microstrip patch antenna for wireless app...
Design, simulation, and analysis of microstrip patch antenna for wireless app...TELKOMNIKA JOURNAL
 
Design and simulation an optimal enhanced PI controller for congestion avoida...
Design and simulation an optimal enhanced PI controller for congestion avoida...Design and simulation an optimal enhanced PI controller for congestion avoida...
Design and simulation an optimal enhanced PI controller for congestion avoida...TELKOMNIKA JOURNAL
 
Improving the detection of intrusion in vehicular ad-hoc networks with modifi...
Improving the detection of intrusion in vehicular ad-hoc networks with modifi...Improving the detection of intrusion in vehicular ad-hoc networks with modifi...
Improving the detection of intrusion in vehicular ad-hoc networks with modifi...TELKOMNIKA JOURNAL
 
Conceptual model of internet banking adoption with perceived risk and trust f...
Conceptual model of internet banking adoption with perceived risk and trust f...Conceptual model of internet banking adoption with perceived risk and trust f...
Conceptual model of internet banking adoption with perceived risk and trust f...TELKOMNIKA JOURNAL
 
Efficient combined fuzzy logic and LMS algorithm for smart antenna
Efficient combined fuzzy logic and LMS algorithm for smart antennaEfficient combined fuzzy logic and LMS algorithm for smart antenna
Efficient combined fuzzy logic and LMS algorithm for smart antennaTELKOMNIKA JOURNAL
 
Design and implementation of a LoRa-based system for warning of forest fire
Design and implementation of a LoRa-based system for warning of forest fireDesign and implementation of a LoRa-based system for warning of forest fire
Design and implementation of a LoRa-based system for warning of forest fireTELKOMNIKA JOURNAL
 
Wavelet-based sensing technique in cognitive radio network
Wavelet-based sensing technique in cognitive radio networkWavelet-based sensing technique in cognitive radio network
Wavelet-based sensing technique in cognitive radio networkTELKOMNIKA JOURNAL
 
A novel compact dual-band bandstop filter with enhanced rejection bands
A novel compact dual-band bandstop filter with enhanced rejection bandsA novel compact dual-band bandstop filter with enhanced rejection bands
A novel compact dual-band bandstop filter with enhanced rejection bandsTELKOMNIKA JOURNAL
 
Deep learning approach to DDoS attack with imbalanced data at the application...
Deep learning approach to DDoS attack with imbalanced data at the application...Deep learning approach to DDoS attack with imbalanced data at the application...
Deep learning approach to DDoS attack with imbalanced data at the application...TELKOMNIKA JOURNAL
 
Brief note on match and miss-match uncertainties
Brief note on match and miss-match uncertaintiesBrief note on match and miss-match uncertainties
Brief note on match and miss-match uncertaintiesTELKOMNIKA JOURNAL
 
Implementation of FinFET technology based low power 4×4 Wallace tree multipli...
Implementation of FinFET technology based low power 4×4 Wallace tree multipli...Implementation of FinFET technology based low power 4×4 Wallace tree multipli...
Implementation of FinFET technology based low power 4×4 Wallace tree multipli...TELKOMNIKA JOURNAL
 
Evaluation of the weighted-overlap add model with massive MIMO in a 5G system
Evaluation of the weighted-overlap add model with massive MIMO in a 5G systemEvaluation of the weighted-overlap add model with massive MIMO in a 5G system
Evaluation of the weighted-overlap add model with massive MIMO in a 5G systemTELKOMNIKA JOURNAL
 
Reflector antenna design in different frequencies using frequency selective s...
Reflector antenna design in different frequencies using frequency selective s...Reflector antenna design in different frequencies using frequency selective s...
Reflector antenna design in different frequencies using frequency selective s...TELKOMNIKA JOURNAL
 
Reagentless iron detection in water based on unclad fiber optical sensor
Reagentless iron detection in water based on unclad fiber optical sensorReagentless iron detection in water based on unclad fiber optical sensor
Reagentless iron detection in water based on unclad fiber optical sensorTELKOMNIKA JOURNAL
 
Impact of CuS counter electrode calcination temperature on quantum dot sensit...
Impact of CuS counter electrode calcination temperature on quantum dot sensit...Impact of CuS counter electrode calcination temperature on quantum dot sensit...
Impact of CuS counter electrode calcination temperature on quantum dot sensit...TELKOMNIKA JOURNAL
 
A progressive learning for structural tolerance online sequential extreme lea...
A progressive learning for structural tolerance online sequential extreme lea...A progressive learning for structural tolerance online sequential extreme lea...
A progressive learning for structural tolerance online sequential extreme lea...TELKOMNIKA JOURNAL
 
Electroencephalography-based brain-computer interface using neural networks
Electroencephalography-based brain-computer interface using neural networksElectroencephalography-based brain-computer interface using neural networks
Electroencephalography-based brain-computer interface using neural networksTELKOMNIKA JOURNAL
 
Adaptive segmentation algorithm based on level set model in medical imaging
Adaptive segmentation algorithm based on level set model in medical imagingAdaptive segmentation algorithm based on level set model in medical imaging
Adaptive segmentation algorithm based on level set model in medical imagingTELKOMNIKA JOURNAL
 
Automatic channel selection using shuffled frog leaping algorithm for EEG bas...
Automatic channel selection using shuffled frog leaping algorithm for EEG bas...Automatic channel selection using shuffled frog leaping algorithm for EEG bas...
Automatic channel selection using shuffled frog leaping algorithm for EEG bas...TELKOMNIKA JOURNAL
 

More from TELKOMNIKA JOURNAL (20)

Amazon products reviews classification based on machine learning, deep learni...
Amazon products reviews classification based on machine learning, deep learni...Amazon products reviews classification based on machine learning, deep learni...
Amazon products reviews classification based on machine learning, deep learni...
 
Design, simulation, and analysis of microstrip patch antenna for wireless app...
Design, simulation, and analysis of microstrip patch antenna for wireless app...Design, simulation, and analysis of microstrip patch antenna for wireless app...
Design, simulation, and analysis of microstrip patch antenna for wireless app...
 
Design and simulation an optimal enhanced PI controller for congestion avoida...
Design and simulation an optimal enhanced PI controller for congestion avoida...Design and simulation an optimal enhanced PI controller for congestion avoida...
Design and simulation an optimal enhanced PI controller for congestion avoida...
 
Improving the detection of intrusion in vehicular ad-hoc networks with modifi...
Improving the detection of intrusion in vehicular ad-hoc networks with modifi...Improving the detection of intrusion in vehicular ad-hoc networks with modifi...
Improving the detection of intrusion in vehicular ad-hoc networks with modifi...
 
Conceptual model of internet banking adoption with perceived risk and trust f...
Conceptual model of internet banking adoption with perceived risk and trust f...Conceptual model of internet banking adoption with perceived risk and trust f...
Conceptual model of internet banking adoption with perceived risk and trust f...
 
Efficient combined fuzzy logic and LMS algorithm for smart antenna
Efficient combined fuzzy logic and LMS algorithm for smart antennaEfficient combined fuzzy logic and LMS algorithm for smart antenna
Efficient combined fuzzy logic and LMS algorithm for smart antenna
 
Design and implementation of a LoRa-based system for warning of forest fire
Design and implementation of a LoRa-based system for warning of forest fireDesign and implementation of a LoRa-based system for warning of forest fire
Design and implementation of a LoRa-based system for warning of forest fire
 
Wavelet-based sensing technique in cognitive radio network
Wavelet-based sensing technique in cognitive radio networkWavelet-based sensing technique in cognitive radio network
Wavelet-based sensing technique in cognitive radio network
 
A novel compact dual-band bandstop filter with enhanced rejection bands
A novel compact dual-band bandstop filter with enhanced rejection bandsA novel compact dual-band bandstop filter with enhanced rejection bands
A novel compact dual-band bandstop filter with enhanced rejection bands
 
Deep learning approach to DDoS attack with imbalanced data at the application...
Deep learning approach to DDoS attack with imbalanced data at the application...Deep learning approach to DDoS attack with imbalanced data at the application...
Deep learning approach to DDoS attack with imbalanced data at the application...
 
Brief note on match and miss-match uncertainties
Brief note on match and miss-match uncertaintiesBrief note on match and miss-match uncertainties
Brief note on match and miss-match uncertainties
 
Implementation of FinFET technology based low power 4×4 Wallace tree multipli...
Implementation of FinFET technology based low power 4×4 Wallace tree multipli...Implementation of FinFET technology based low power 4×4 Wallace tree multipli...
Implementation of FinFET technology based low power 4×4 Wallace tree multipli...
 
Evaluation of the weighted-overlap add model with massive MIMO in a 5G system
Evaluation of the weighted-overlap add model with massive MIMO in a 5G systemEvaluation of the weighted-overlap add model with massive MIMO in a 5G system
Evaluation of the weighted-overlap add model with massive MIMO in a 5G system
 
Reflector antenna design in different frequencies using frequency selective s...
Reflector antenna design in different frequencies using frequency selective s...Reflector antenna design in different frequencies using frequency selective s...
Reflector antenna design in different frequencies using frequency selective s...
 
Reagentless iron detection in water based on unclad fiber optical sensor
Reagentless iron detection in water based on unclad fiber optical sensorReagentless iron detection in water based on unclad fiber optical sensor
Reagentless iron detection in water based on unclad fiber optical sensor
 
Impact of CuS counter electrode calcination temperature on quantum dot sensit...
Impact of CuS counter electrode calcination temperature on quantum dot sensit...Impact of CuS counter electrode calcination temperature on quantum dot sensit...
Impact of CuS counter electrode calcination temperature on quantum dot sensit...
 
A progressive learning for structural tolerance online sequential extreme lea...
A progressive learning for structural tolerance online sequential extreme lea...A progressive learning for structural tolerance online sequential extreme lea...
A progressive learning for structural tolerance online sequential extreme lea...
 
Electroencephalography-based brain-computer interface using neural networks
Electroencephalography-based brain-computer interface using neural networksElectroencephalography-based brain-computer interface using neural networks
Electroencephalography-based brain-computer interface using neural networks
 
Adaptive segmentation algorithm based on level set model in medical imaging
Adaptive segmentation algorithm based on level set model in medical imagingAdaptive segmentation algorithm based on level set model in medical imaging
Adaptive segmentation algorithm based on level set model in medical imaging
 
Automatic channel selection using shuffled frog leaping algorithm for EEG bas...
Automatic channel selection using shuffled frog leaping algorithm for EEG bas...Automatic channel selection using shuffled frog leaping algorithm for EEG bas...
Automatic channel selection using shuffled frog leaping algorithm for EEG bas...
 

Recently uploaded

Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...RajaP95
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...Call Girls in Nagpur High Profile
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 

Recently uploaded (20)

Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 

Predicting the Level of Emotion by Means of Indonesian Speech Signal

  • 1. TELKOMNIKA, Vol.15, No.2, June 2017, pp. 665~670 ISSN: 1693-6930, accredited A by DIKTI, Decree No: 58/DIKTI/Kep/2013 DOI: 10.12928/TELKOMNIKA.v15i2.3965  665 Received January 10, 2017; Revised March 28, 2017; Accepted April 13, 2017 Predicting the Level of Emotion by Means of Indonesian Speech Signal Fergyanto E. Gunawan* 1 , Kanyadian Idananta 2 Binus Graduate Programs, Bina Nusantara University, Jakarta, Indonesia Jl. Kebon Jeruk Raya No. 27, Jakarta 11530, Indonesia *Corresponding author, e-mail: fgunawan@binus.edu 1 , kidananta@binus.edu 2 Abstract Understanding human emotion is of importance for developing better and facilitating smooth interpersonal relations. It becomes much more important because human thinking process and behavior are strongly influenced by the emotion. Align with these needs, an expert system that capable of predicting the emotion state would be useful for many practical applications. Based on a speech signal, the system has been widely developed for various languages. This study intends to evaluate to which extent Mel- Frequency Cepstral Coefficients (MFCC) features, besides Teager energy feature, derived from Indonesian speech signal relates to four emotional types: happy, sad, angry, and fear. The study utilizes empirical data of nearly 300 speech signals collected from four amateur actors and actresses speaking 15 prescribed Indonesian sentences. Using support vector machine classifier, the empirical findings suggest that the Teager energy, as well as the first coefficient of MFCCs, are a crucial feature and the prediction can achieve the accuracy level of 86%. The accuracy increases quickly with a few initial MFCC features. The fourth and more features have negligible effects on the accuracy. Keywords: Indonesia speech, mel frequency cepstral coefficient, teager energy, support vector machine Copyright © 2017 Universitas Ahmad Dahlan. All rights reserved. 1. Introduction This study focuses on establishing a quantitative relation between human emotion and Indonesian speech signal. Such relation is very important considering that human thinking process and behavior are strongly influenced by the emotion [1, 2]; thus, understanding human emotion is important for developing better and facilitating smooth personal relations. Failure in understanding human emotion may damage interpersonal relations and on extreme occasions, it may have catastrophic consequences. For example, in the case of the accident of Korean Air Flight 801 bound for Antonio B. Won Pat International Airport in Guam on August 6, 1997, the failure in communicating the runway position and the proximity of the aircraft to the ground between the aircraft pilot, the flight engineer, and the air traffic controller had resulted in fatalities of 228 passengers and crews [3]. We adopt Frieda's [1] definition of the human emotion, which is an intense feeling such as happy, angry, or afraid, directed toward a person or a thing. The feeling intensity is assumed to be rather significant such that its affects the person speeches, body gestures, or facial expressions [2]. Understanding the quantitative relation between the emotional state and a speech signal, in particular, is a prerequisite to establishing an automatic emotion detection system. The relation is clearly affected by the nature of the language and many studies have been previously conducted for languages: English [4], German [5], Mandarin [6], Assamese [7], Persian [8], and Danish [9]. The central issues of this type of study are two folds. The first is related to the method to extract features from the speech signal that sensitive to the emotion level and type. The second is related to feature classification method. Various classification and extraction methods have been evaluated so far. Kandali et al. [7] used Gaussian Mixture Model (GMM) for classifying emotion of Assam tribe in India where the speech features were obtained from Mel- Frequency Cepstral Coefficient (MFCC). They found that the surprise emotion state was difficult to be identified correctly. Instead of using speech signal, Chibelushi and Bourel [2] related the emotion to the facial expression. Muda et al. [10] utilized the method of Dynamic Time Warping
  • 2.  ISSN: 1693-6930 TELKOMNIKA Vol. 15, No. 2, June 2017 : 665 – 670 666 (DTW) and MFCC and concluded that these methods were effective for the emotion classification. Magre and Deshmukh [11] evaluated the strengths and weaknesses of a few feature extraction methods of speech signals; their conclusions are reproduced in Table 1. The objective of this work is to study to what extent Indonesian speech signal can be related to the emotion type, how much is the accuracy of the prediction, and what are the most relevant features for the emotion prediction. For this purpose, the speech signal will be analyzed using MFCC for their features and support vector machine will be used for classification. This article is composed of four sections. Section 2 Research Methods will briefly discuss the MFCC, the data acquisition method, and research procedure. Section 3 Results and Discussion will present the most important and relevant findings. Finally, Section 4 Conclusion will provide a brief summary of the discussed problem and findings. Table 1. Advantages and disadvantages of Mel-Frequency Cepstral Coecient (MFCC), Linear Predictive Coecient (LPC), and Dynamic Time Warping (DTW) in modeling speech signal [11] Method Advantages Disadvantages MFCC As the frequency bands are positioned logarithmically in MFCC, it approximates the human system response more closely than any other system. MFCC values are not very robust in the presence of additive noise, and so it is common to normalize their values in speech recognition systems to lessen the in influence of noise. LPC Provides a good approximation within vocal spectral envelope. On certain condition, the noise may become dominant. DWT Reduced storing space for the reference template. Increased recognition rate. Difficult to find the best reference template for certain words. 2. Research Methods The research procedure is as the following. We collected the speech data in Indonesia language. The speakers were two actors and two actresses. All of them were amateur and were active members of Theater Student Activity of Bina Nusantara University in Jakarta, Indonesia. They were asked to simulate four emotional states, namely, happy, sad, angry, and fear. The speech scripts were prepared by the researchers containing 15 sentences in Indonesia language, see Table 2. The speakers were asked to speech the sentences with the four emotional states. Several speech data were obtained per person, per sentence, and per emotional state. As the results, 291 speech signals in Indonesia language were collected. As much as 66% of the dataset were utilized for training, and the remainder was for testing. Table 2. The four speakers, two men and two women, were asked to speak the following sentences in four emotional states: happy, sad, angry, and fear No Sentences 1 Bukunya tadi aku taruh di meja. (English: I put the book on the table) 2 Menurutmu gimana?. (English; What do you think?) 3 Tadi mama telpon kamu. (English; Mom just called you) 4 5 jam lagi acara dimulai.(English; Within 5 hours it will be started) 5 Itu tas kenapa ditaruh disitu? (English; why bagis placed there?) 6 Sabtu ini aku mau pulang dan bertemu dengan Agnes. (English; I want to go home on Saturday and meet Agnes) 7 Baru aja buku itu dibawa ke atas dan sekarang dibawa ke bawah lagi. (English; It has just been taken up and now it is brought down again) 8 Aku disuruh tangkap capung. (English; I’m asked to catch a dragonfly) 9 Lisa mau ngumpulin berkasnya hari Rabu. (English; Lisa want to submit her stuff on Wednesday) 10 Kamu suka (cinta) sama Budi?. (English; You love budi, don’t you?) 11 Aku bakal kasih tahu nanti malam. (English: I’ll tell you tonight) 12 Aku itu belum punya pacar. (English; I don’t have girl/boyfriend) 13 Aku sudah melakukannya. (English; I have done it) 14 Makanannya ada di kulkas. (English; the food is on fridge) 15 Ekky tadi kasih aku boneka. (English; Ekky gave me a doll)
  • 3. TELKOMNIKA ISSN: 1693-6930  Predicting the Level of Emotion by Means of Indonesian Speech… (Fergyanto E. Gunawan) 667 2.1. Mel-Frequency Cepstral Coefficient The speech data were processed to provide Mel-Frequency Cepstral Coefficients (MFCC) and Teager Energy (TE). MFCC is based on human hearing perceptions, which are difficult to perceive frequencies over 1 kHz. MFCC is based on known variation of the human ears critical bandwidth frequency [10]. In summary, the MFCC procedures are: apply the Fourier transform to every segment of the speech signal, map the frequency into the Mel scale and apply triangular overlapping windows on the powers of the spectra, apply logarithmic transformation on the power spectra on each Mel frequency, and apply the discrete cosine transform. The results are 12 MFCC coefficients on each segment. The computation of the MFCC in detail is of the following [12, 13]. The pre-processing proses is called pre-emphasis process where the speech signal is filtered with the equation, 𝑦(𝑛) = (𝑥) − 0.95 ⋅ 𝑥(𝑛 − 1) (1) where 𝑛 is an integer index, 𝑥(𝑛) is the value of the speech signal at the discrete time-step 𝑛, and 𝑦(𝑛) is the filtered signal. Then, the filtered signal is divided into several small frames where the length of each frame is within the range of 20–40 ms. The voice signal is divided into frames of 𝑁 samples. The adjacent frames are separated by 𝑀 where 𝑀 < 𝑁. Typically, 𝑀 = 100, and 𝑁 = 256. The signal on each frame is scaled with a window function. The most widely window function is the Hamming window. The windowing process is mathematically written as: 𝑧(𝑛) = 𝑦(𝑛) ⋅ 𝑤(𝑛), and (2) 𝑤(𝑛) = 0.54 − 0.46 cos ( 2𝜋𝑛 𝑁 − 1 ), (3) where 𝑦(𝑛) is a segment of signal prior windowing process, 𝑧(𝑛) is after windowing process, and 𝑛 ∈ [0, 𝑁 − 1]. Then, Fourier transform is applied to each frame signal to transform the time-domain signal to the frequency-domain signal: 𝐹(𝑓) = ∫ 𝑓(𝑡)𝑒−2𝜋𝑗𝑡𝑓 d𝑡, +∞ −∞ (4) where the signal in the time domain is denotes by 𝑓(𝑡) and in the frequency domain by 𝐹(𝑓). The computation of Eq. (4) is performed numerically by the Fast Fourier Transform algorithm. The obtained Fourier spectra are expressed in form of the power spectra. The next step is transformation of the Fourier frequency 𝑓 to Mel-scale frequency by: 𝑀(𝑓) = 1125 ⋅ ln (1 + 𝑓 700 ), (5) and applying triangular overlapping windows. The resulted spectra are subjected to logarithmic transformation. Finally, the discrete cosine transform is applied to the log power spectra. 2.2. Teager Energy The Teager energy of a speech signal 𝑥(𝑡) is defined by [14]: Ψ [𝑥(𝑡)] = 𝑥̇2(𝑡) − 𝑥(𝑡)𝑥̈( 𝑡), (6) where 𝑥̇( 𝑡) is the first derivative and 𝑥̈( 𝑡) is the second derivative of 𝑥(𝑡). 2.3. Support Vector Machine In the present study, we only use the Support Vector Machine (SVM) for linearly separable data. The SVM is a numerical method to compute a hyperplane for separating a two- class dataset. It can easily be extended to multiple-class problem. The SVM establishes the hyperplane, governed by (𝐰, 𝑏), by using the support vectors, which are the data points that are closest to the hyperplane. The following SVM formulation is derived from Refs. [15, 16]; readers
  • 4.  ISSN: 1693-6930 TELKOMNIKA Vol. 15, No. 2, June 2017 : 665 – 670 668 are advised to the two sources for detail exposition. In addition, readers may also consult Refs. [17, 18]. We consider the point sets 𝐱 𝑖 ∈ ℜ 𝑑 , as the support vectors, with the categories 𝑦𝑖 ∈ [−1, +1]. The hyperplane that separates 𝑦𝑖 = −1 from those of 𝑦𝑖 = +1 should satisfy < 𝐰, 𝐱 > +𝑏 = 0, (7) where 𝐰 ∈ ℜ 𝑑 , < 𝐰, 𝐱 >, denotes the inner dot product of 𝐰 and 𝐱, and 𝑏 is a scalar constant. The hyperplane is obtained by solving: min 𝑤,𝑏 𝐿 𝑝 = 1 2 < 𝐰, 𝐰 > − ∑ 𝛼𝑖[𝑦𝑖(< 𝐰, 𝐱 𝑖 > +𝑏) − 1] 𝑖 (8) where 𝛼𝐼 ≥ 0. 2.4. Accuracy Indicator The classification accuracy is computed by: Accuracy = TP + TN TP + TN + FP + FN (9) where TP stands for True Positive, TN for True Negative, FP for False Positive, and FN for False Negative. 3. Results and Discussion Firstly, we discuss how the accuracy of the emotion-state classification changes as the number of features is increased. We should note that in the MFCC analysis, we utilized 12 Mel filter banks from the lowest frequency to the highest frequencies. The coefficients in each bank are preserved in a cepstral vector. For each vector, we determine the minimum and maximum coefficients, and compute its average and standard deviation. Those statistical characteristics are used in the feature vector. Table 3 shows the computed accuracies with the increasing of the number feature vectors. The first row shows the accuracy level when only the Teager energy is used as the feature. The second row shows for the case where the features involve the Teager energy and the statistical characteristics of the MFCC in the first bank. The characteristics on the second bank are added in the third row case. The analysis is repeated until all banks are taken into account. Table 3. The changes in the level of accuracy as the function of the contents of the feature vector. The TE denotes the Teager energy, 𝐶𝑖 denotes the feature vector of the i-bank of MFCC where the vector contents are 𝐶𝑖 = [min(𝑀𝐹𝐶𝐶𝑖), max(𝑀𝐹𝐶𝐶𝑖) , mean(𝑀𝐹𝐶𝐶𝑖), dev(𝑀𝐹𝐶𝐶𝑖) ] mean and 𝑖 ∈ [1,12] is the bank number Feature Vector Accuracy (%) Change in Accuracy (%) [TE] 41.41 - [TE, 𝐶1] 72.73 31.32 [TE, 𝐶1, 𝐶2] 77.78 5.05 [TE, 𝐶1, 𝐶2, 𝐶3] 85.86 8.08 [TE, 𝐶1, 𝐶2, …, 𝐶4] 85.86 0.0 [TE, 𝐶1, 𝐶2, …, 𝐶5] 86.87 1.01 [TE, 𝐶1, 𝐶2, …, 𝐶6] 85.86 -1.01 [TE, 𝐶1, 𝐶2, …, 𝐶7] 84.85 -1.01 [TE, 𝐶1, 𝐶2, …, 𝐶8] 86.87 2.02 [TE, 𝐶1, 𝐶2, …, 𝐶9] 87.88 1.01 [TE, 𝐶1, 𝐶2, …, 𝐶10] 86.87 -1.01 [TE, 𝐶1, 𝐶2, …, 𝐶11] 86.87 0.0 [TE, 𝐶1, 𝐶2, …, 𝐶12] 85.86 -1.01
  • 5. TELKOMNIKA ISSN: 1693-6930  Predicting the Level of Emotion by Means of Indonesian Speech… (Fergyanto E. Gunawan) 669 The results suggest that the most important feature is the Teager energy and then followed by the statistical characteristics in the MFCC first bank. The Teager energy only is capable to achieve 41.4% of the level of accuracy. The features from the rest MFCC bank is able to increase the accuracy by 31%. The features from the third MFCC bank have a slightly better quality to increase the accuracy in comparison to those features in the second MFCC bank. The features from the fourth to the last banks can be ignored without sacrificing the accuracy. The confusion matrix for the case where the feature vector consisting [TE, C1, C2, …, C12] is shown in Table 4. The table shows that a few cases of happy-emotion state were detected as angry emotion. Meanwhile, the motion states of sad, angry, and fear were classified with high accuracy. Table 4. The Computed Confussion Matrix for the case with 49 Speech Features Prediction Happy Sad Angry Fear Actual Happy 34 0 6 1 Sad 0 22 0 1 Angry 2 0 15 1 Fear 2 1 0 14 4. Conclusion Automatic emotion recognition on the basis of human speech signal is important as it has many practical benefits. For the reason, many studies have been performed particularly using the speech signals in English, German, Mandarin, Persian, and Danish languages. This work has similar nature but different in the aspects of the language of interest and of the feature vector. The study focuses on Indonesia language and the Teager energy is taken into account as an important feature alongside the features of the statistical descriptive of MFCC. The performed numerical trials suggest that the Teager energy indeed is an important feature as it contributes to the classification accuracy by about 41%. In addition, some statistical descriptive of MFCC associated with low frequencies power spectra are also crucial for accurate classification. The final finding is that the happy-emotion state seems slightly difficult to be differentiated from the angry-emotion state and the emotion states of angry, sad, and fear are detectable from the speech signal at a rather high accuracy. References [1] KKN Frieda, Moods, Emotion Episodes, and Emotions, Guilford Press, New York. 1993. [2] C Chibelushi, F Bourel. Facial expression recognition: A brief tutorial overview, retrieved on January 2015 (January 2003). URL http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_ COPIES/CHIBELUSHI1/CCC_FB_FacExprRecCVonline.pdf [3] POM Gladwell. The Ethnic Theory of Plane CrashesCaptain the Weather Radar Has Helped Us a Lot. In Success, Outliers: The Story of Success, Little, Brown and Company, New York, United States of America. 2008. [4] TM Galla, A Sapra, N Panwar, S Panwar. Emotion Recognition from Speech. International Journal of Emerging Technology and advanced engineering. 2013; 3(2): 341-345. [5] BV Sathe-Pathak, AR Panat. Extraction of Pitch and Formants and its Analysis to Identify Three Different Emotional States of a Person. International Journal of Computer Science. 2012; 9(4): 296- 299. [6] TL Pao, Y Chen, JH Yeh, J Lu, Detecting Emotions in Mandarin Speech. ROCLING. 2004: 365-373. [7] AB Kandali, A Routray, TK Basu. Emotion Recognition from Assamese Speeches using MFCC Features and GMM classifier. TENCON 2008-2008 IEEE Region 10 Conference, IEEE. 2008: 1-5. [8] M Hamidi, M Mansoorizade. Emotion Recognition from Persian Speech with Neural Network, International Journal of Artificial Intelligence & Applications. 2012; 3(5): 107. [9] YL Lin, G Wei. Speech Emotion Recognition based on HMM and SVM, in: Machine Learning and Cybernetics. Proceedings of 2005 International Conference on. IEEE. 2005; 8: 4898-4901. [10] L Muda, M Begam, I Elamvazuthi. Voice Recognition Algorithms using Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) Techniques, arXiv preprint arXiv: 1003.4083. URL http://arxiv.org/abs/1003.4083
  • 6.  ISSN: 1693-6930 TELKOMNIKA Vol. 15, No. 2, June 2017 : 665 – 670 670 [11] SB Magre, RR Deshmukh. A Review on Feature Extraction and Noise Reduction Technique. International Journal of Advanced Research in Computer Science and Software Engineering. 2014; 4(2): 352-356. [12] SK Kopparapu, M Laxminarayana. Choice of Mel Filter Bank in Computing MFCC of a Resampled Speech. Information Sciences Signal Processing and their Applications (ISSPA), 2010 10th International Conference on, IEEE. 2010: 121-124. [13] J Kaur, A Sharma. Emotion Detection Independent of User Using MFCC Feature Extraction. International Journal of Advanced Research in Computer Science and Software Engineering. 2014; 4(6): 230-234. [14] E Kvedalen. Signal Processing Using the Teager Energy Operator and other Nonlinear Operators, Master Thesis, Department of Informatics, University of Oslo. 2003. [15] N Christianni, J Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, Cambridge, UK. 2000. [16] T Hastie, R Tibshirani, J Friedman. The Elements of Statistical Learning, Springer, New York. 2008. [17] AB Mutiara, R Refianti, NRA Mukarromah. Musical Genre Classification Using SVM and Audio Features. TELKOMNIKA Telecommunication, Computing, Electronics and Control. 2016; 14(3). [18] M Mulyati, WA Kusuma, M Nurilmala. Identification of Tuna and Mackerel based on DNA Barcodes using Support Vector Machine. TELKOMNIKA Telecommunication, Computing, Electronics and Control. 2016: 14(2).