A study of gender specific pitch variation pattern of emotion expression for hindi speech

International Journal of Advanced Research in Engineering and Technology (IJARET), ISSN 0976 –
6480(Print), ISSN 0976 – 6499(Online) Volume 5, Issue 8, August (2014), pp. 47-55 © IAEME
© IAEME: http://www.iaeme.com/IJARET.asp
Journal Impact Factor (2014): 7.8273 (Calculated by GISI)
www.jifactor.com
A STUDY OF GENDER SPECIFIC PITCH VARIATION PATTERN OF
EMOTION EXPRESSION FOR HINDI SPEECH
Sushma Bahuguna1, Y. P. Raiwani2
1Research Scholar, Singhaniya University, Pechri, Rajasthan, India
2Department of Computer Science Engineering, HNB Garhwal University, Uttarakhand, India
ABSTRACT
The present study analyzes the pitch variation pattern of Hindi emotional voice samples in
five emotional states. Sample sentences were recorded from male and female speakers in five
emotions: neutral, anger, happiness, sadness and surprise. The emotion of each sample sentence
was validated with a machine recognizer and a listening test. Statistics derived from the pitch
contours of the selected sample sentences were analyzed using PRAAT and MATLAB. The results,
computed as mean, standard deviation, maximum and minimum pitch, pitch range and peak
parameters, show that the emotional state of the speaker strongly influences pitch variation. The
analysis also examines the discriminating capability of pitch features in different emotions when
gender information is taken into consideration.
Keywords: Emotion Expression, Emotion Identification, Pitch Contours, Pitch Features.
1. INTRODUCTION
Pitch is defined as the fundamental frequency (F0) of the excitation source. The pitch signal
depends on the tension of the vocal folds and the subglottal air pressure during speech generation.
The pitch contour contains information that characterizes the emotion being expressed by the
speaker, and consequently parameters extracted from pitch form an essential part of many automatic
emotion identification systems. Currently, emotion identification is computed from sentence-level
statistics of acoustic features, such as the range, mean, maximum, minimum and variance of F0 and
energy parameters. Much research has examined pitch characteristics for emotion detection in
speech. [1] studied the pitch contour of vocal expression, and [2] concluded that the fine structure
of the pitch contour is an important emotional cue. The pitch shape of expressive speech was
analyzed by [3]. [4] identified pitch mean and range as the most important variations in pitch. [5]
compared linguistic and paralinguistic features of pitch and also explained how pitch shape is
associated with the linguistic structure of speech. [6, 7] described the role of pitch contour patterns
in emotional speech expression. [8] worked on the role of pitch contours in differentiating anger
and joy in speech. Of the basic emotions, anger in particular has received much acoustic research.
[9] examined the effects of an angry emotional tone on single-word utterances and found that the
fundamental frequency (F0) contour appears to remain steady or fall slightly and that the mean
duration is shorter for an 'angry' word. [10] corroborated the findings of [9] and [11] and found
these variables more relevant to arousal than to valence. [12] explained that anger's degree of
pleasantness is very open to individual experience and found that hot anger exhibited very tense
activation with narrow hedonic valence and extremely full power. In terms of acoustic variables,
these characteristics translated into much greater F0 variability, F0 range and high-frequency
energy. [13] also found a conspicuous lack of interactions among these variables, with main effects
indicating that they are all relevant to affective expression. [14] explained that emotion is encoded
and decoded with a high degree of agreement across cultures. [15] analyzed statistics derived from
the pitch contour. In our earlier study we examined acoustic feature patterns using pitch, intensity
and formant parameters [16]. [17] tried algorithms that allow a robot to express its emotions by
modulating the intonation of its voice. A study of ensemble methods for spoken emotion recognition
in call centers was carried out by [18]. [19] explained the effect of emotional state upon the
variation of fundamental frequency. [20] studied speech emotion recognition accuracy using hidden
Markov models (HMMs).

In most studies the pitch characteristics of human voices have been measured exclusively for
either male or female voices. Moreover, knowledge about pitch behavior in distinctive speech styles
is rather scarce [21]. [22] explained that the fundamental frequency of women varied within a range
of about two octaves, while men often limited their variability to one octave. [23] found that female
speakers used distinctly larger fundamental frequency variability than male speakers. [24] measured
the fundamental frequencies and ranges (in semitones) of male and female speakers expressing a
number of specific emotions. [25] found differences in intonation contours between men and women.
[26] analyzed emotional speech in the English and Slovenian Interface emotional speech databases
and also compared emotional features between English male and female speakers. The present paper
is concerned with the pitch variation pattern of male and female speakers expressing different
emotions in Hindi speech sentences. For our study, four native Hindi speakers (2 male and 2 female)
were chosen to record sample sentences. Each speaker recorded 10 sentences used in daily
conversation in five basic emotions. The recordings were passed through a recognition stage
consisting of a listening test with human listeners and a machine recognizer based on MFCC
(mel-frequency cepstral coefficient) and VQ (vector quantization) techniques. Finally, the sentences
correctly recognized by both the listening test and the machine recognizer were chosen to calculate
the pitch variation pattern in different emotions using the PRAAT software package and MATLAB.
2. ANALYSIS AND RESULTS
PRAAT was used to extract the pitch values from the recorded sample sentences, and the
extracted values were used as input to MATLAB to evaluate pitch range, mean pitch, standard
deviation, minimum pitch, maximum pitch and pitch peaks as measures of the pitch variation pattern
for all of the sound files. The pitch features are statistical properties of the pitch contours. In
the present study, pitch contours were derived with the method described above in order to analyze
the statistics of the pitch parameters of the sample sentences. The pitch contour is an important
cue for the perceived emotion of a speech sample. The frequency contours and waveforms of the
expressive speech sentences reveal a distinctive contour pattern for each emotion. The results for
one of the sample sentences, “Aaj Office nahi jana hai”, in five emotions by male and female
speakers are shown in Figures 1-10 and Tables 1-10.
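A minimal Python sketch of how such statistics could be derived from a pitch contour is given here. It is not the authors' PRAAT and MATLAB procedure: the function names, the use of the sample standard deviation, the definition of a peak as a strict local extremum, and the short example contour are all assumptions made for illustration.

```python
from statistics import mean, stdev


def pitch_statistics(f0_hz):
    """Summary statistics (Hz) of the voiced F0 values of one utterance.

    Assumption: f0_hz holds only voiced frames, and the sample standard
    deviation (n - 1 denominator) is used.
    """
    lowest, highest = min(f0_hz), max(f0_hz)
    return {
        "mean": mean(f0_hz),
        "sd": stdev(f0_hz),
        "min": lowest,
        "max": highest,
        "range": highest - lowest,
    }


def pitch_peaks(times_s, f0_hz):
    """Return (maxima, minima) as lists of (time, F0) pairs.

    Assumption: a peak is a frame strictly higher (or lower) than both
    of its neighbours; the first and last frames are never peaks.
    """
    maxima, minima = [], []
    for i in range(1, len(f0_hz) - 1):
        previous, current, following = f0_hz[i - 1], f0_hz[i], f0_hz[i + 1]
        if current > previous and current > following:
            maxima.append((times_s[i], current))
        elif current < previous and current < following:
            minima.append((times_s[i], current))
    return maxima, minima


# Hypothetical contour (not data from the paper): time in s, F0 in Hz.
example_times = [0.00, 0.01, 0.02, 0.03, 0.04, 0.05]
example_f0 = [200.0, 220.0, 210.0, 230.0, 190.0, 195.0]

print(pitch_statistics(example_f0))
print(pitch_peaks(example_times, example_f0))
```

The two peak lists correspond to the "Max. Pitch Peaks" and "Min. Pitch Peaks" columns of the tables that follow, under the peak definition assumed here.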

Figure 1: Anger(Female)
Max. Pitch Peaks Min. Pitch Peaks
Time(Sec) Max. PK(Hz) Time(Sec) Min. PK(Hz)
0.0274 266.8637 0.0474 240.8800
0.1724 297.1195 0.1924 293.3104
0.2924 338.6185 0.3624 315.2018
0.3724 319.7495 0.4274 239.4681
0.4374 321.5376 0.4524 317.1090
0.4674 320.0073 0.4824 318.3651
0.5174 331.5229 0.5774 313.8545
0.6424 333.5767 0.7074 299.6494
0.7624 304.141 0.8074 288.1895
0.8224 291.5525 0.8424 285.5504
0.8574 286.4642 0.8924 268.8876
0.9324 339.7722 0.9674 324.6308
1.0024 333.4951 1.0324 296.5501
1.0424 309.3049 1.0574 303.6815
1.0924 322.7117 1.1224 317.7040
1.1374 320.8305 1.1624 307.1315
1.1924 324.5019 1.3174 261.9790
1.3424 281.0573 1.5074 182.8470
Table 1: Anger(Female)
Figure 2: Happy(Female)
Max. Pitch Peaks Min. Pitch Peaks
Time(Sec) Max. PK(Hz) Time(Sec) Min. PK(Hz)
0.0723 209.4121 0.2173 180.9632
0.2323 183.2369 0.2623 169.3780
0.3023 203.3406 0.3623 185.5871
0.4073 189.0843 0.4423 182.3814
0.5673 232.3424 0.5923 223.2977
0.6173 223.9561 0.6723 204.0832
0.7923 256.3747 0.8023 255.5742
0.8823 275.3295 0.9323 243.5835
1.0173 278.5901 1.1723 252.6873
1.1973 264.5584 1.3673 230.3095
1.3723 231.3755 1.4073 229.1406
1.4573 235.5940 1.4723 234.9366
1.4873 235.5841 1.5473 234.8199
1.5973 236.6531 1.6773 208.6305
1.7273 226.4945 1.8023 209.7734
1.9073 212.9623 1.9723 210.3596
2.0073 218.7860 2.0523 215.4405
2.0773 217.4566 2.0923 214.3920
Table 2: Happy(Female)
Figure 3: Neutral(Female)
Max. Pitch Peaks Min. Pitch Peaks
Time(Sec) Max. PK(Hz) Time(Sec) Min. PK(Hz)
0.0354 176.3932 0.0723 162.9141
0.2254 185.8086 0.1273 174.7061
0.3454 169.3526 0.1723 174.9997
0.5954 168.2188 0.2273 170.1198
0.8504 178.4579 0.3173 174.2214
0.9004 151.1423 0.3823 170.5779
1.1404 175.4615 0.7623 148.6841
1.2604 160.1985 0.7873 145.7212
1.3854 137.2319 0.8573 147.4194
1.4254 138.6274 1.1573 139.9198
1.4754 141.1931 1.2073 132.4903
1.5404 142.0372 1.2423 136.7588
1.6154 136.4066 1.3123 138.8464
1.7004 146.3144 1.4323 141.3799
1.7354 146.1305 1.4523 142.2651
Table 3: Neutral(Female)
Figure 4: Sad(Female)
Max. Pitch Peaks Min. Pitch Peaks
Time(Sec) Max. PK(Hz) Time(Sec) Min. PK(Hz)
0.2301 405.2725 0.0904 164.8212
0.3151 419.1142 0.3154 168.5937
0.3251 413.3566 0.4954 130.5681
0.5251 184.5954 0.7204 144.7384
0.6601 284.2466 0.8904 150.0317
0.7351 376.0756 1.0154 122.4403
0.7501 395.5935 1.2304 143.9606
0.8401 430.4868 1.3654 136.6003
0.8751 428.9179 1.4004 134.7780
0.9501 399.9716 1.4504 138.1148
1.0001 368.4136 1.5004 138.8244
1.1601 245.0676 1.6104 135.7536
1.2951 174.2809 1.6604 126.3469
1.4501 403.8697 1.7204 144.1078
Table 4: Sad(Female)

Figure 5: Surprise(Female)
Max. Pitch Peaks Min. Pitch Peaks
Time(Sec) Max. PK(Hz) Time(Sec) Min. PK(Hz)
0.2301 405.2725 0.3001 332.7995
0.3151 419.1142 0.3201 411.8019
0.3251 413.3566 0.4701 179.3032
0.5251 184.5954 0.5401 179.0843
0.6601 284.2466 0.6651 282.8639
0.7351 376.0756 0.7401 374.5004
0.7501 395.5935 0.7851 320.9308
0.8401 430.4868 0.8651 424.6280
0.8751 428.9179 0.9351 371.1580
0.9501 399.9716 0.9851 358.1337
1.0001 368.4136 1.1151 232.5406
1.1601 245.0676 1.2901 173.6630
1.2951 174.2809 1.3251 171.1973
Table 5: Surprise(Female)
Time(Sec) Max. PK(Hz) Time(Sec) Min. PK(Hz)
0.1105 151.8427 0.1205 150.2417
0.2305 226.5018 0.2605 214.3573
0.2755 220.8551 0.2905 202.4689
0.3255 215.1622 0.3555 211.4449
0.3705 213.3613 0.4305 195.2386
0.4605 211.5450 0.4955 206.2813
0.5005 248.0258 0.5055 228.3682
0.5405 243.0083 0.6055 191.8354
0.6255 209.1673 0.6305 208.4470
0.6455 216.9955 0.6905 170.8492
0.6955 171.4495 0.7055 166.6838
0.8905 266.7977 0.9305 246.3066
0.9505 269.5548 0.9855 243.2995
1.0055 247.0963 1.0205 241.8763
1.0555 244.9087 1.1155 228.6133
1.1305 229.5808 1.3755 091.0580
Figure 6: Anger(Male) Table 6: Anger(Male)
Figure 7: Happy(Male)
Time(Sec) Max. PK(Hz) Time(Sec) Min. PK(Hz)
0.0466 148.9253 0.0516 127.5883
0.1916 181.2705 0.2116 175.2165
0.2216 176.6431 0.2466 164.9239
0.3066 182.8573 0.3316 182.0870
0.3716 185.4090 0.4066 175.2852
0.4366 182.8874 0.4566 180.9115
0.4666 181.6462 0.4766 179.1653
0.4866 212.4714 0.5916 185.1137
0.6166 199.3520 0.6816 196.2145
0.6866 196.8589 0.7416 182.2006
0.8016 187.1457 0.8666 178.0532
0.9766 246.8907 1.0316 196.1084
1.0416 203.6092 1.0516 197.7426
1.0766 208.0856 1.2116 129.6689
1.2316 131.7508 1.2716 129.2980
1.2866 131.2784 1.4316 113.3092
1.5116 116.7049 1.5666 113.6222
Table 7: Happy(Male)
Figure 8: Neutral(Male)
Time(Sec) Max. PK(Hz) Time(Sec) Min. PK(Hz)
0.3308 167.3301 0.3508 155.4644
0.3658 166.3833 0.3808 151.6457
0.4258 158.8058 0.4808 151.4266
0.5058 152.9884 0.5258 145.9730
0.5458 151.1042 0.5558 150.3856
0.6008 193.2725 0.6958 154.7391
0.8058 169.0030 0.8308 151.3895
0.8508 154.2452 0.8658 153.3511
0.9108 160.6289 0.9258 159.5749
1.0108 185.8276 1.0758 165.6972
1.0958 168.3742 1.2108 133.0394
1.2258 134.4333 1.2508 132.2823
1.2608 133.1881 1.3758 115.4469
1.4008 118.0272 1.4608 111.0422
1.4708 111.5599 1.5108 103.0068
1.5308 107.9201 1.5458 104.3688
Table 8: Neutral(Male)

Figure 9: Sad (Male)
Time(Sec) Max. PK(Hz) Time(Sec) Min. PK(Hz)
0.1717 116.1005 0.1817 112.2522
0.3417 131.0295 0.3717 126.3966
0.3967 130.7466 0.4217 127.5389
0.4517 129.7727 0.5817 118.8933
0.5917 120.1534 0.5967 119.2434
0.7067 152.2471 0.8118 120.7858
0.9817 158.6623 1.0818 114.5193
1.1118 116.0195 1.1568 108.7562
1.3018 147.8923 1.3568 134.7556
1.3868 138.9676 1.5118 107.1190
1.5218 107.8125 1.5718 104.4020
1.5867 107.3844 1.7017 096.6589
1.7268 098.5362 1.7418 095.5470
1.7768 100.5100 1.8267 096.8091
1.8468 098.1214 1.8868 095.9293
1.9018 099.6447 1.9117 097.4501
1.9417 101.2222 1.9517 099.0492
Table 9: Sad (Male)
Figure 10: Surprise(Male)
Max. Pitch Peaks Min. Pitch Peaks
Time(Sec) Max. PK(Hz) Time(Sec) Min. PK(Hz)
0.2121 190.2424 0.2471 181.1592
0.2621 188.9810 0.3021 184.7994
0.3171 187.2840 0.3621 142.4524
0.4471 176.5780 0.4871 170.4763
0.4971 175.6148 0.5121 174.0372
0.5421 175.0406 0.5521 173.2507
0.5771 222.8746 0.5821 220.6013
0.6121 226.4705 0.6821 187.0653
0.6971 206.1591 1.0221 120.3817
1.0871 142.5872 1.1121 139.4091
1.1621 150.4366 1.1671 145.4267
1.2771 269.7540 1.2971 259.9232
1.3371 339.6683 1.3821 324.5930
1.4321 336.3981 1.4471 332.2637
1.5121 366.8736 1.6221 309.9620
Table 10: Surprise(Male)
The fine structure of the pitch contour, which shows pitch peaks as small fluctuations, is an
important cue conveying emotional information. The pitch fluctuations of high-activation
expressions (Tables 1, 5, 6 and 10) differ from those of medium-activation (Tables 2 and 7) and
low-activation expressions (Tables 3, 4, 8 and 9). Comparing the activation levels, the minima
seemed to be slightly higher for both activation levels, with sadness as the exception; the maxima
did not follow any regular pattern, but the pitch level went up as the mean rose. Pitch contours are
higher and more variable for surprise (Figs. 5 and 10), anger (Figs. 1 and 6) and happy (Figs. 2
and 7) emotions, and lower and less variable for neutral (Figs. 3 and 8) and sad (Figs. 4 and 9)
emotions. The anger pitch contour follows an angular frequency curve with irregular upward and
downward inflections. The stressed syllables ascend rhythmically and frequently. They glide up
suddenly to a high level and subsequently fall to a mid or lower level in the last syllable.
Surprise sentences also follow a more or less angular frequency curve, with larger irregular upward
and downward inflections. Happy expressions follow a descending line that ascends frequently at
irregular intervals. Neutral sentences are characterized by a monotonous contour with a shallow
range, whereas sad sentences have downward inflections. The declination in happy speech is less
than in neutral speech, and the slope of the F0 contour is higher than in neutral speech, especially
at the end of the sentence. The steepness of the rises and falls of the pitch and the direction of
the pitch contour are remarkable for anger and surprise sentences (Figs. 1, 5, 6 and 10). The energy
contours show increased intensity for happy expressions, whereas decreased intensity is noticeable
in the sad sentences. Angry and surprise expressions have a noticeably increased energy envelope
with raised intensity.
Average fundamental frequency (mean pitch), standard deviation (SD) and pitch range were
computed for both male and female speakers; these can be used as parameters of a classifier.
Tables 11-15 present the pitch parameters computed for the following 4 sample sentences expressed
in five emotions.
1. “Aaj office nahi jana hai”.
2. “Maine apna kaam kar liya hai”.
3. “Aaj Cricket match hai”.
4. “Mujhe aaj kaam karna hai”.
Gender Emo. 1 2 3 4
F a 293.43639 290.99344 283.59904 294.9011
F h 227.12478 223.59414 224.82956 227.18903
F n 162.07243 172.59577 167.92701 191.54918
F sa 153.03138 164.76875 165.93882 174.17183
F su 302.36815 324.9595 321.45352 239.30298
M a 205.61052 221.30491 206.36296 205.63666
M h 162.80784 155.44498 167.71843 162.94207
M n 147.68864 150.30648 152.85659 148.64526
M sa 118.02006 118.05338 120.26458 119.95379
M su 231.88352 229.92513 206.38492 253.70895
Table 11: Mean Pitch (Hz) for sample sentences 1-4 (a = anger, h = happy, n = neutral, sa = sad, su = surprise)
Gender Emo. 1 2 3 4
F a 38.0521 44.9733 37.5875 49.2198
F h 26.7900 26.1878 29.1527 31.9514
F n 17.8518 26.8248 22.4802 26.9890
F sa 15.0193 16.7422 20.5397 15.5639
F su 89.0408 68.6711 84.7988 69.5469
M a 40.4847 44.5519 45.6355 39.2557
M h 38.6968 36.6439 31.0797 34.8896
M n 21.8147 19.9589 15.6523 20.1834
M sa 14.0969 12.97769 11.0894 15.7631
M su 78.1396 69.7615 72.3298 57.7255
Table 12: Standard deviation (Hz) for sample sentences 1-4 (a = anger, h = happy, n = neutral, sa = sad, su = surprise)
Gender Emo. 1 2 3 4
F a 156.9252 183.5770 167.4712 134.0630
F h 109.2121 122.3690 124.4345 134.8000
F n 63.5045 76.8922 88.3878 87.6267
F sa 63.3682 71.9073 75.9701 72.2699
F su 259.2894 238.6640 302.8878 204.3170
M a 178.4968 145.8040 191.4007 112.8700
M h 139.8440 115.1850 113.8293 109.8560
M n 90.2657 62.2681 85.5473 89.0045
M sa 63.1153 49.5587 45.8823 55.2080
M su 246.4920 239.0870 251.1407 189.0660
Table 13: Pitch Range (Hz) for sample sentences 1-4 (a = anger, h = happy, n = neutral, sa = sad, su = surprise)
Gender Emo. 1 2 3 4
F a 339.7722 365.4715 359.2399 362.5315
F h 278.5901 306.6425 282.7491 298.1711
F n 195.9949 213.8228 225.7316 229.8439
F sa 185.8086 201.0567 205.5097 199.8777
F su 430.4868 383.1413 471.47 335.9438
M a 269.5548 296.0135 281.5507 262.5344
M h 246.8907 222.8417 214.2731 218.6937
M n 193.2725 186.2944 188.9552 179.9229
M sa 158.6623 136.4878 141.4901 145.3877
M su 366.8736 351.3492 359.5778 326.5271
Table 14: Maximum Pitch (Hz) for sample sentences 1-4 (a = anger, h = happy, n = neutral, sa = sad, su = surprise)
Gender Emo. 1 2 3 4
F a 182.8470 181.8945 191.7688 228.4687
F h 169.3780 184.2731 158.3147 163.3715
F n 132.4903 136.9306 137.3438 142.2173
F sa 122.4403 129.1493 129.5396 127.6078
F su 171.1973 144.4775 168.5822 131.6264
M a 91.0580 150.2095 90.1500 149.6641
M h 107.0467 107.6567 100.4438 108.8374
M n 103.0068 124.0263 103.4078 90.9184
M sa 95.5470 86.9291 95.6077 90.1797
M su 120.3817 112.2624 108.4371 137.4607
Table 15: Minimum Pitch (Hz) for sample sentences 1-4 (a = anger, h = happy, n = neutral, sa = sad, su = surprise)
The results indicate that the pitch frequencies of the different emotions are quite different
and each lies in a particular range. F0 is lower for sad and neutral sentences than for happy,
angry and surprise sentences. Anger and surprise are produced with a higher, more varied pitch and
greater energy than neutral speech. Sadness is produced with a lower, downward-directed pitch and
lower energy. Average pitch and pitch range are smallest for sadness, higher for neutral and
happiness, and highest for anger and surprise. The standard deviation (SD) increases with the
activation level of the emotion: it is highest for surprise and anger, lower for happy and lowest
for sad expressions. The SD value is larger for female speakers, and the deviation pattern is the
same for both genders apart from between-speaker differences. SD is slightly higher in some happy
and angry male expressions, as SD varies with the type of discourse. Pitch range is widest for
surprise, wide for anger and happiness, and slightly narrower for sad. Apart from between-speaker
differences, the majority of speakers showed a homogeneous pattern, and none of them deviated much
from it. Some high-activation male expressions exhibit a higher pitch range, as the pitch range of
a specific gender is also influenced by the language, the type of text and the type of discourse.
Higher activation levels are accompanied by an increase in fundamental frequency. Angry speech
exhibits a higher mean pitch and a wider pitch range, falling pitch contours for all syllables, an
increased number of downward-directed F0 contours, unaccented final syllables, a high articulation
rate, fast rhythm and little variation in phoneme durations, whereas sad speech exhibits a lower
mean pitch and a narrower pitch range. Angry and surprise expressions tend to fall more steeply and
sometimes show a rise in pitch at the end of an utterance. A slight descending slope is usually
seen in neutral speech. The degree of perceived sadness is negatively correlated with pitch.
Emotional intensity increases the pitch level and decreases the syllable rate. Amplitude and pitch
are positively correlated for emotionally intense speech. The amplitude is proportional to the
degree of emotional intensity. The perceived level of emotional intensity has a strong positive
correlation with the perceived activation level of the emotion.
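The ordering of pitch range across emotions can be checked directly against the tabulated values. The Python sketch below recomputes the pitch range of sentence 1 for the female speaker as maximum minus minimum pitch (values copied from Tables 14 and 15) and compares it with Table 13. The variable and function names are illustrative assumptions, and the tolerance only absorbs rounding in the published figures.

```python
# Sentence 1, female speaker, values in Hz copied from Tables 13-15.
MAXIMUM_PITCH = {"anger": 339.7722, "happy": 278.5901, "neutral": 195.9949,
                 "sad": 185.8086, "surprise": 430.4868}
MINIMUM_PITCH = {"anger": 182.8470, "happy": 169.3780, "neutral": 132.4903,
                 "sad": 122.4403, "surprise": 171.1973}
REPORTED_RANGE = {"anger": 156.9252, "happy": 109.2121, "neutral": 63.5045,
                  "sad": 63.3682, "surprise": 259.2894}


def pitch_range(maximum_hz, minimum_hz):
    """Pitch range taken as maximum minus minimum pitch (assumed definition)."""
    return maximum_hz - minimum_hz


computed_range = {
    emotion: pitch_range(MAXIMUM_PITCH[emotion], MINIMUM_PITCH[emotion])
    for emotion in MAXIMUM_PITCH
}

# Emotions ordered from narrowest to widest pitch range.
emotions_by_range = sorted(computed_range, key=computed_range.get)

for emotion in emotions_by_range:
    difference = abs(computed_range[emotion] - REPORTED_RANGE[emotion])
    print(f"{emotion:9s} range = {computed_range[emotion]:8.4f} Hz "
          f"(differs from Table 13 by {difference:.4f} Hz)")
```

For this sentence the recomputed ranges agree with Table 13 to within rounding, and sorting them puts sad and neutral at the narrow end and surprise at the wide end.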
2.1 Gender Specific Pitch variation pattern

The discriminating capability of the statistical features, when gender information is
considered, revealed significant differences in variability between the male and female speakers.
Figures 1-10 show that the female pitch contour has a higher average magnitude than the male pitch
contour. The pitch dynamism of the speech expressions reveals that males and females differ in
frequency, in the rate of pitch change and in the variation of pitch around the average pitch
value. The pitch parameters (Tables 11-15) show that the mean pitch for neutral expressions across
the different utterances is in the vicinity of 150 Hz (147 Hz to 152 Hz) for male expressions,
whereas it is about 175 Hz (162 Hz to 191 Hz) for female expressions. In increasing order of
activation level, happiness is closest to the neutral state, with a mean pitch of about 160 Hz
(155 Hz to 167 Hz) for males and 225 Hz (223 Hz to 227 Hz) for females, which corresponds to a
deviation from the neutral state of up to 15% for male expressions and up to 40% for female
expressions. A downward deviation of the mean is observed for sadness, at about 119 Hz (118 Hz to
120 Hz) for males and 164 Hz (153 Hz to 174 Hz) for females, which represents a deviation from the
neutral mean of up to 21% for males and up to 9% for females. The highest deviation is observed
for anger (205 Hz to 221 Hz for males and 283 Hz to 294 Hz for females) and surprise (206 Hz to
253 Hz for males and 239 Hz to 324 Hz for females). For males the deviation from the neutral value
is up to 47% for anger and up to 70% for surprise, whereas for females it is up to 80% for anger
and up to 92% for surprise.
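The deviation figures can be illustrated with a short Python sketch using the male mean-pitch values of Table 11. The text does not state which neutral baseline was used for the quoted percentages, so this sketch assumes the average of the four neutral sentence means and reports, for each emotion, the per-sentence mean that deviates most from it. The names are hypothetical, and the results need not match the quoted percentages exactly.

```python
# Per-sentence mean pitch (Hz) for the male speaker, copied from Table 11.
MEAN_PITCH_MALE = {
    "anger": [205.61052, 221.30491, 206.36296, 205.63666],
    "happy": [162.80784, 155.44498, 167.71843, 162.94207],
    "neutral": [147.68864, 150.30648, 152.85659, 148.64526],
    "sad": [118.02006, 118.05338, 120.26458, 119.95379],
    "surprise": [231.88352, 229.92513, 206.38492, 253.70895],
}


def percent_deviation(value_hz, baseline_hz):
    """Signed deviation of value_hz from baseline_hz, in percent."""
    return 100.0 * (value_hz - baseline_hz) / baseline_hz


def largest_deviation_from_neutral(mean_pitch):
    """Largest-magnitude deviation per emotion from the neutral baseline.

    Assumption: the baseline is the average of the neutral sentence means.
    """
    neutral = mean_pitch["neutral"]
    baseline = sum(neutral) / len(neutral)
    result = {}
    for emotion, values in mean_pitch.items():
        if emotion == "neutral":
            continue
        deviations = [percent_deviation(v, baseline) for v in values]
        result[emotion] = max(deviations, key=abs)
    return result


male_deviation = largest_deviation_from_neutral(MEAN_PITCH_MALE)
for emotion, deviation in male_deviation.items():
    print(f"{emotion:9s} {deviation:+.1f}% from neutral")
```

A negative value marks a downward shift of the mean, as reported for sadness; swapping in the female rows of Table 11 gives the corresponding female figures under the same assumed baseline.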
Male speech expressions are characterized by less pitch dynamism than female speech
expressions, suggesting a tendency of female speakers to show more emotion during conversation.
The graphs show that the female voice has a high-frequency component followed by low intensity,
whereas the male voice has a band of medium intensity that gradually tapers towards the end. Male
expressions exhibit a more monotonous tone, while women tend to use a wider range of tones. The
standard deviation of women's F0 from the female mean is greater than that of men's F0 from the
male mean. Female speech expressions show a sharper gradient of pitch change over time than male
speech expressions, except in sad expressions, indicating more intonation dynamics. Female
speakers deviate less from the neutral mean pitch for the sad emotion than male speakers. However,
the difference in rate of articulation between the genders is not large. On the whole, female
speech expressions exhibit a wider pitch range, while male speech expressions exhibit a pitch
range closer to monotone, which may be related to social ethics and a tendency to show less
emotion. Although significant differences in variability are marked between the male and female
speakers, both genders follow a similar pattern of pitch evolution.
3. CONCLUSION

The voice conveys the emotional state of the speaker during expressive speech, and this
emotional modulation affects the pitch contour, one of the most important properties of speech.
The analysis of pitch and pitch-derivative contours has shown that surprise and anger have
increased feature values, while decreased values were seen for sadness. Anger is characterized by
a higher mean pitch, high variance, fast rhythm, accented syllables and falling syllable contours.
Happiness is characterized by a high mean pitch, high variance and a rather fast rhythm, with few
accented syllables, an accented last word and rising syllable contours. Sadness has a low mean
pitch, low variance and slow rhythm, with very few accented syllables and falling syllable
contours. Neutral expressions are characterized by a lower mean pitch than happiness, slow rhythm,
very few accented syllables and rising syllable contours. Surprise expressions are characterized
by the highest mean pitch, with high variance, accented syllables and falling pitch contours. The
study reveals significant differences in variability between male and female speakers. The
analysis showed that gross pitch contour statistics such as mean, standard deviation and range are
more emotionally prominent in female speech expressions than in male speech expressions. The
average pitch is lower for male speakers, but the pattern of deviation from the average pitch
across emotions is similar for male and female speakers. Surprise, anger and happiness have higher
mean values than sadness and neutral, which have a slow speaking style, and the variation patterns
of both genders behave similarly across the features at different activation levels. Although a
significant difference in pitch parameters is noticeable between male and female speakers, the
deviation from the average value of the pitch parameters follows a consistent pattern across
speakers.
REFERENCES
[1] Juslin, P. N., Laukka, P. (2001). “Impact of intended emotion intensity on cue utilization
and decoding accuracy in vocal expression of emotion”. Emotion, 1, 381-412.
[2] Lieberman, P. and Michaels, S. (1962). “Some aspects of fundamental frequency and
envelope amplitude as related to the emotional content of speech,” J. Acoust. Soc. Amer.,
vol. 34, no. 7, pp. 922–927.
[3] A. Paeschke (2004). “Global trend of fundamental frequency in emotional
speech,” in Proc. Speech Prosody (SP’04), Nara, Japan, pp. 671–674.
[4] T. Bänziger and K. Scherer (2005). “The role of intonation in emotional
expressions,” Speech Communication, vol. 46, no. 3–4, pp. 252–267.
[5] K. Scherer, D. Ladd, and K. Silverman (1984). “Vocal cues to speaker affect:
Testing two models,” J. Acoust. Soc. Amer., vol. 76, no. 5, pp. 1346–1356.
[6] Bänziger, T., and Scherer, K.R. (2005). “The role of intonation in emotional expressions”,
Speech Communication, Vol. 46, Issues 3-4, 252-267.

[7] Hirose, K., et al. (2005). “Synthesis of F0 contours using generation process model
parameters predicted from unlabeled corpora: application to emotional speech synthesis”,
Speech Communication, Volume 46, Issues 3-4, 385-404.
[8] S. Chuenwattanapranithi et.al. (2007). “The Roles of Pitch Contours in Differentiating Anger
and Joy in speech”, World Academy of Science, Engineering and Technology 11.
[9] Mullennix et.al. (2002). “Effects of variation in emotional tone of voice on speech
perception”, language and speech 45(3), 255-283.
[10] Pittam, J., Scherer, K.R. (1993). “Vocal expression and communication of emotion”. In M.
Lewis and J.M. Haviland (Eds.), Handbook of emotions, New York: The Guilford Press.
[11] Oudeyer, P.Y. (2002). “Novel useful features and algorithms for the recognition of emotions
in human speech”, Sony computer science lab, Paris, France.
[12] Scherer, K.R., Banse, R., Wallbott, H.G. (2001). “Emotion inferences from vocal
expression correlate across languages and cultures”. Journal of Cross-Cultural Psychology,
32(1), 76-92.
[13] Scherer, K.R. (2003). “Vocal communication of emotion: a review of research paradigms”,
Speech Communication 40, 227-256.
[14] Frick, R.W. (1985). “Communicating emotion: The role of prosodic features”. Psychological
Bulletin, 97, 412-29.
[15] Carlos Busso, (2009). “Analysis of Emotionally Salient Aspects of Fundamental Frequency
for Emotion Detection”, IEEE Transaction on audio, speech and language processing, Vol.17,
No.4.
[16] Sushma Bahuguna and Y.P. Raiwani (2013). “A Study of acoustic features pattern of
emotion expression for Hindi speech”, International journal of computer engineering
Technology, vol. 4, issue 6, 16-24.
[17] Pierre-Yves Oudeyer (2003). “The production and recognition of emotions in speech:
features and algorithms”, International journal of human-computer studies 59(2003) 157-183.
[18] Donn Morrison et.al. (2006). “Ensemble methods for spoken emotion recognition in call-centers”,
cui.unige.ch/~morrison/morrison-speechcomm06. Preprint submitted to Elsevier
Science.
[19] Marius Vasile Ghiurcau et. al. “A Study of the Effect of Emotional State upon the Variation
of the Fundamental Frequency of a Speaker”. Journal of applied computer science
Mathematics, no. 7- special issue 79.
[20] Tin Lay Nwe et al. (2003). “Speech emotion recognition using hidden Markov models”,
Elsevier Speech Communication journal, vol. 41, issue 4.
[21] Mirjam T. J. Tielen citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.396.7843
[22] Olsen, C.L. (1981). “Sex differences in English intonation observed in female
impersonation”. Toronto Papers of the Speech and Voice Society 2, 30-49.
[23] Huber, D. (1989b). “Voice characteristics of female speech and their representation in
computer speech synthesis and recognition”. Proc. Eurospeech '89, 477-480.
[24] Bezooijen, R. Van (1981). “Characteristics of vocal expressions of emotion: pitch level”.
Proc. Institute of Phonetics, Univ of Nijmegen 5, 1-18.
[25] Brend, R. (1971). “Male-female intonation patterns in American English”. Proc. of the 7th
Int. Congress of Phonetic Sciences, Mouton, The Hague, 866-869.
[26] Vladimir Hozjan and Zdravko Kacic (2000). “Objective analysis of emotional speech for
English and Slovenian Interface emotional speech databases”. Interspeech, 1113-1116
Speech Communication 22(4): 385-402.
