2022 International Conference for Advancement in Technology (ICONAT 2022)
An Acoustic and Statistic Study of Emotions Expressed in Marathi Speech
Paper ID: 433
By Trupti K. Harhare, Milind Shah
University of Mumbai
1. Introduction
Problem Definition
The COVID-19 pandemic has drastically altered people's lifestyles in
many parts of the world. Lockdowns and social-distancing norms have
increased the use of human-machine interaction applications.
Improvements are being made in speech recognition, speaker
recognition, and various other human-system interaction technologies.
Emotion recognition, on the other hand, is still under research
for building a prosody model.
The Marathi language is poorly studied as far as emotions are concerned.
There is a need to analyze and compare the acoustic correlates of
prosodic features for various emotions in the Marathi language.
This paper seeks to acoustically and statistically evaluate acted
Marathi speech for anger, happiness, fear, and neutral emotions.
1. Introduction
Objectives
To develop information processing tools and techniques to facilitate
human-machine interaction without language barrier (Technology
Development for Indian Languages (TDIL) Programme initiated by
the Ministry of Electronics & Information Technology, Govt. of
India.)
To create awareness of and a positive attitude towards the Marathi
language through its visibility in the public domain.
To create and access Marathi language knowledge resources, and to
integrate them to develop innovative user products and services.
To analyze and compare, acoustically and statistically, the emotions
expressed in Marathi speech.
2. Literature Survey
Speech carries segmental information (vowels, consonants) [1-3] and
suprasegmental information (intonation, rhythm, tone, stress).
The suprasegmental elements are collectively called prosody features.
Prosody is shaped by emotions and speaking style.
Basic emotions (happy, sad, angry, fearful) and a neutral read-out
style were considered [1-5].
The analysis of acoustic parameters such as duration, fundamental
frequency (pitch), and intensity for the corresponding emotions was
carried out [6-11].
Statistical analysis [12-15] was performed after acoustic analysis to
corroborate the results of the acoustic analysis and then to pick the best
prosodic features to develop a prosody model.
3. Methodology
•Collection of neutral meaning sentences from story books, novels etc.
• Recording of the utterances in four emotional styles: angry,
happy, fearful, and neutral.
•Perceptual verification
•Annotation of the recorded utterances in PRAAT
•Calculation of twelve prosodic features based on pitch, intensity,
duration, and formants for all the emotional utterances.
•Acoustic analysis and statistical analysis of the annotated data based
on the emotions.
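In the paper the twelve prosodic features are computed in PRAAT. Purely as an illustration of the kind of measurements involved, the sketch below estimates three of them (sentence duration, mean intensity, and fundamental frequency) directly from a waveform with NumPy. The autocorrelation-based F0 estimator and the 20 µPa intensity reference are our simplifying assumptions, not the paper's PRAAT settings.

```python
import numpy as np

def estimate_f0(x, fs, fmin=75.0, fmax=500.0):
    """Rough F0 (pitch) estimate from the autocorrelation peak;
    a simplified stand-in for PRAAT's pitch tracker, searching
    only lags corresponding to the fmin..fmax range."""
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return fs / lag

def mean_intensity_db(x, ref=2e-5):
    """Mean intensity in dB relative to an assumed 20 uPa reference."""
    rms = np.sqrt(np.mean(x ** 2))
    return 20.0 * np.log10(rms / ref)

# Synthetic half-second "utterance": a 200 Hz tone sampled at 8 kHz
fs = 8000
t = np.arange(fs // 2) / fs
x = np.sin(2 * np.pi * 200.0 * t)

duration = len(x) / fs           # sentence duration in seconds (0.5)
f0 = estimate_f0(x, fs)          # close to 200 Hz for this tone
intensity = mean_intensity_db(x)
```

On real emotional utterances these values would be computed per annotated sentence in PRAAT, together with the remaining pitch, intensity, duration, and formant features listed above.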
4. Implementation
• Objective: To determine the important prosodic features for emotion
classification in Marathi speech using acoustic analysis and statistical hypothesis tests.
• Implementation: The acoustic analysis was carried out on twelve prosodic features.
• One-way ANOVA, a hypothesis test, was applied to each of the twelve prosodic
features independently to determine whether the mean values of these features
differ significantly across the four emotions.
• The ANOVA analysis establishes that a difference exists between two or more group
means, but it does not specify which groups are significantly different [20-23].
• The Tukey test compares the means of the features pairwise to discover
whether a significant difference exists between each pair.
• Advantage: The emotion-categorization capacity of the twelve prosodic features
is determined using the findings of ANOVA and the Tukey technique.
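As a minimal sketch of this two-step pipeline (not the paper's actual data), the snippet below runs one-way ANOVA followed by the Tukey HSD test on synthetic median-pitch samples for the four emotions, using SciPy. The group means, spreads, and sample sizes are invented for illustration.

```python
import numpy as np
from scipy.stats import f_oneway, tukey_hsd

rng = np.random.default_rng(0)
# Hypothetical median-pitch samples (Hz) for the four emotion groups
anger     = rng.normal(220, 15, 100)
happiness = rng.normal(215, 15, 100)
fear      = rng.normal(245, 15, 100)
neutral   = rng.normal(170, 15, 100)

# Step 1: one-way ANOVA -- do the four group means differ at all?
f_stat, p_value = f_oneway(anger, happiness, fear, neutral)

# Step 2: Tukey HSD -- which specific pairs of emotions differ?
res = tukey_hsd(anger, happiness, fear, neutral)
# res.pvalue[i, j] is the adjusted p-value for the pair (group i, group j)
```

`f_oneway` mirrors the F(3, 403) statistics reported in the results slides (3 between-group and 403 within-group degrees of freedom imply four emotion groups and 407 utterances), while `tukey_hsd` produces the pairwise significant/non-significant split shown in the tables.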
5. Results and Discussion
• Pitch-Related Features
A. Acoustic Analysis:
B. ANOVA analysis:
    Maximum pitch: F(3, 403) = 8.402, p < 0.001
    Minimum pitch: F(3, 403) = 21.54, p < 0.001
    Median pitch:  F(3, 403) = 65.07, p < 0.001
The results of ANOVA analysis showed that all these pitch features are significant and can be used for
emotion classification in Marathi.
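The reported statistics can be sanity-checked against the F distribution: with 3 and 403 degrees of freedom, even the smallest pitch F value above (8.402) corresponds to p < 0.001. A quick check with SciPy's survival function:

```python
from scipy.stats import f

# p-value for the smallest reported pitch F statistic, F(3, 403) = 8.402
p = f.sf(8.402, dfn=3, dfd=403)
print(p < 0.001)  # True: even the weakest pitch feature is significant
```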
5. Results and Discussion
• Intensity-Related Features
A. Acoustic Analysis:
B. ANOVA analysis:
    Maximum intensity: F(3, 403) = 87.34, p < 0.001
    Minimum intensity: F(3, 403) = 6.325, p < 0.001
    Mean intensity:    F(3, 403) = 98.01, p < 0.001
The results of ANOVA analysis showed that all these intensity features are significant and can
be used for emotion classification in Marathi.
5. Results and Discussion
• Duration-Related Features
A. Acoustic Analysis:
B. ANOVA analysis:
    Number of syllables per second: F(3, 403) = 31.8,  p < 0.001
    Number of voice breaks:         F(3, 403) = 19.14, p < 0.001
    Sentence duration:              F(3, 403) = 24.35, p < 0.001
The results of ANOVA analysis showed that all these duration features are
significant and can be used for emotion classification in Marathi.
5. Results and Discussion
• Formant-Related Features
A. Acoustic Analysis:
B. ANOVA analysis:
    F1: F(3, 403) = 15.84, p < 0.001
    F2: F(3, 403) = 1.941 (not significant)
    F3: F(3, 403) = 14.51, p < 0.001
• The results of ANOVA analysis showed that F1 and F3 are significant and can be
used for emotion classification in Marathi, whereas F2 is non-significant and
therefore not a useful feature for classifying the emotions.
Continued
Pitch-Related Features
Tukey test for multiple comparisons of means (confidence level: 95%).
Significance codes: p < 0.0001 '***', p < 0.001 '**', p < 0.01 '*', p < 0.05 '.', p < 0.1 ' '.

Feature         F (one-way ANOVA)   Significant pairs                  Non-significant pairs
Maximum Pitch   8.402               Fear-Anger (p < 0.0001),           Happiness-Anger,
                                    Happiness-Fear (p < 0.0001),       Neutral-Anger
                                    Neutral-Happiness (p < 0.05),
                                    Neutral-Fear (p < 0.05)
Minimum Pitch   21.54               Neutral-Anger (p < 0.0001),        Fear-Anger,
                                    Neutral-Happiness (p < 0.0001),    Happiness-Anger,
                                    Neutral-Fear (p < 0.0001)          Happiness-Fear
Median Pitch    65.07               Fear-Anger (p < 0.0001),           Happiness-Anger,
                                    Neutral-Anger (p < 0.0001),        Neutral-Fear
                                    Happiness-Fear (p < 0.0001),
                                    Neutral-Happiness (p < 0.0001)
Continued
Formant-Related Features
Tukey test for multiple comparisons of means (confidence level: 95%).
Significance codes: p < 0.0001 '***', p < 0.001 '**', p < 0.01 '*', p < 0.05 '.', p < 0.1 ' '.

Feature   F (one-way ANOVA)   Significant pairs                  Non-significant pairs
F1        15.84               Fear-Anger (p < 0.0001),           Neutral-Fear,
                              Happiness-Anger (p < 0.0001),      Neutral-Happiness
                              Neutral-Anger (p < 0.0001),
                              Happiness-Fear (p < 0.05)
F2        1.941               (none)                             Fear-Anger, Happiness-Anger,
                                                                 Neutral-Anger, Happiness-Fear,
                                                                 Neutral-Fear, Neutral-Happiness
F3        14.59               Fear-Anger (p < 0.0001),           Happiness-Anger,
                              Happiness-Fear (p < 0.0001),       Neutral-Anger
                              Neutral-Fear (p < 0.01),
                              Neutral-Happiness (p < 0.05)
6. Conclusion
This study presented the analysis of the Marathi emotional speech
database from an acoustic and statistical perspective.
A statistical study employing one-way ANOVA indicated significant
differences in various prosodic features across emotions.
The ANOVA analysis showed that all the pitch-related, duration-related,
and intensity-related attributes are significant for emotion
classification in Marathi. The formant F2 is non-significant and not a
useful feature for classifying emotions in the Marathi language.
The Tukey test revealed that even though a prosodic feature may be
significant in the ANOVA analysis, it does not necessarily separate all
emotions; some emotion pairs can behave similarly for that feature.
This study demonstrated the usefulness of statistical tests to assess
the Marathi emotional speech database.
References
[1] T. Wani, T. Gunwan, S. Qadri, M. Kartiwi, "A Comprehensive Review of Speech Emotion Recognition Systems", IEEE Access, vol. 9, pp.
47795–47814, April 2021.
[2] P. Rao, N. Sanghvi, H. Mixdorff, K. Sabu, "Acoustic correlates of focus in Marathi: Production and perception", Journal of Phonetics, vol. 65,
pp. 110, 2017.
[3] J. Yadav and K. S. Rao. ”Generation of emotional speech by prosody imposition on sentence, word and syllable level fragments of neutral
speech”, in Proc. International Conference on Cognitive Computing and Information Processing (CCIP), March 3-4, pp. 1-5, 2015.
[4] M. C. Madhavi, S. Sharma and H. A. Patil, "Development of language resources for speech application in Gujarati and Marathi," International
Conference on Asian Language Processing (IALP), Kuching, pp. 115-118, 2014.
[5] X. Yang and Y. Yang, “Prosodic Realization of Rhetorical Structure in Chinese Discourse”, IEEE Transactions on Audio, Speech, and
Language Processing, vol. 20, no.4, May 2012.
[6] A. Agrawal, A. Dev, “Emotion recognition and conversion based on segmentation of speech in Hindi language”, IEEE International
Conference on Computing for Sustainable Global Development, New Delhi, India, 2015.
[7] N. Apandi, N. Jamil, “An analysis of Malay language emotional speech corpus for emotion recognition system”, Industrial Electronics and
Applications Conference(IEACon), IEEE, Kota Kinabalu, Malaysia, 2016.
[8] E. Väyrynen, “Emotion recognition from speech using prosodic features”, Academic dissertation, University of Oulu, Finland, 2014.
[9] T. Wang, Y. Lee, Q. Ma, “Within and Across-Language Comparison of Vocal Emotions in Mandarin and English”, Appl. Sci. vol.8, pp.2629;
December 2018.
[10] J. Tao, Y. Kang and A. Li. “Prosody conversion from neutral speech to emotional speech”, IEEE Trans. On Audio, Speech, and Language
Processing, 14(4), pp. 1145-1154, 2006.
[11] M. Begum, N. Raja, Ainon, R. Zainuddin, Z. M. Don, G. Knowles, “Prosody Generation by Integrating Rule and Template based Approaches
for Emotional Malay Speech Synthesis”, In Proc. TENCON, Hyderabad, India, pp. 1-6, Nov. 2008.
[12] A. Jacob, P. Mythili, "Upgrading the Performance of Speech Emotion Recognition at the Segmental Level", IOSR Journal of Computer
Engineering (IOSR-JCE), vol. 15, no. 3, pp. 48-52, 2013.
[13] T. Iliou, C. Anagnostopoulos, "Classification on Speech Emotion Recognition - A Comparative Study", International Journal on
Advances in Life Sciences, vol. 2, no. 1 & 2, 2010.
[14] S. Ali, M. Andleeb, D. Rehman, "A Study of the Effect of Emotions and Software on Prosodic Features on Spoken Utterances in Urdu
Language", I.J. Image, Graphics and Signal Processing, vol. 4, pp. 46-53, 2016.
[15] M. Yusnita A, Paulraj M. P., S. Yaacobb, N. Fadzilah, Shahriman A. B., "Acoustic Analysis of Formants across Genders and Ethnical Accents
in Malaysian English using ANOVA", International Conference On Design and Manufacturing, vol. 64, pp. 385–394, 2013.