
Diphthong dynamics in unscripted speech



Hughes, V., McDougall, K. and Foulkes, P. (2009) Diphthong dynamics in unscripted speech. Paper presented at the International Association for Forensic Phonetics and Acoustics (IAFPA) Conference, University of Cambridge, 2-5 August 2009.



  1. Diphthong dynamics in unscripted speech
     Vincent Hughes (University of York), Kirsty McDougall (University of Cambridge), Paul Foulkes (University of York & JP French Associates), 2009
  2. 0. Outline
     1. formant dynamics
     2. previous work on dynamic aspects of speech
     3. methodology
     4. results
     5. discussion
     6. conclusion and outlook
     Hughes, McDougall & Foulkes
  3. 1. Formant dynamics
     • growing body of work on dynamic aspects of speech production
       – e.g. temporal changes in vowel formants
     • may furnish more fine-grained speaker-specific information than ‘static’ measures
     • language determines phonological/phonetic targets
     • but speakers have freedom to take different paths between targets
  4. 2. Previous work
     study                        speakers  segment(s)  best classification
     Greisbach et al (1995)       80        6 x /Vː/    94% (matching procedure)
     Ingram et al (1996)          15        21 chunks   93% (DA)
     McDougall (2004, 2006)       5         /aɪ/        88-96% (DA)
     McDougall & Nolan (2007)     20        /uː/        52% (DA)
     Eriksson & Sullivan (2008)   5         /jœː/       88% (DA)
     Morrison (2008)              27        /aɪ/        n/a (LR analysis)
     DA = discriminant analysis; LR = likelihood ratio
  5. 2. Previous work
     • promising results
     • but previous studies all used scripted/lab speech
     • unclear whether the method can be usefully applied to spontaneous speech/forensic samples
       – we hypothesise greater within-speaker variation
         • variation in rate, connected speech processes (CSPs), context effects...
         • mixed channel effects
       – ... hence the aim of the present study
  6. 3. Methodology
     • segmental material
       – /aɪ/ in English
       – direct comparison with McDougall (2004, 2006)
     • corpus: DyViS (Nolan et al 2008; Hudson et al, this morning)
       – 100 educated males, aged 18-25, standard British English
       – spontaneous speech but controlled materials
       – target words elicited during tasks
  7. 3. Methodology
     • speakers used in present study
       – 20 males
     • speech sample
       – mock police interview (map task)
     • 14 target words with /aɪ/
       – bike, heights, type, Skype...
       – 11-15 tokens per speaker
  8. procedure
     • formant analysis by Praat script
     • manual check and correction (c. 5% of data)
  9. 3. Methodology
     [Figure: F1, F2 and F3 tracks of an /aɪ/ token, with measurement points at 10%, 20%, ..., 90% of vowel duration]
     • formant analysis by Praat script
     • time normalised as % of vowel duration
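Time normalisation means every token is measured at the same relative positions (10%, 20%, ..., 90%) whatever its duration; with three formants this gives the 27 predictors mentioned under the discriminant analysis. The slides do not show the Praat script itself, so what follows is only a minimal sketch assuming linear interpolation between analysis frames. The function names and the sample track are hypothetical.

```python
from bisect import bisect_left


def interpolate(times, values, t):
    """Linearly interpolate a formant track (times in ascending order) at time t."""
    i = bisect_left(times, t)
    if i == 0:
        return values[0]  # at or before the first analysis frame
    if i == len(times):
        return values[-1]  # clamp beyond the last analysis frame
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def time_normalise(times, values, onset, offset, steps=range(10, 100, 10)):
    """Sample a formant track at fixed percentages (default 10%...90%) of vowel duration."""
    duration = offset - onset
    return [interpolate(times, values, onset + duration * pct / 100) for pct in steps]


# Hypothetical F2 track for one /aɪ/ token: frame times (s) and values (Hz)
frame_times = [0.00, 0.05, 0.10, 0.15, 0.20]
f2_track = [1200.0, 1300.0, 1500.0, 1800.0, 2000.0]
f2_steps = time_normalise(frame_times, f2_track, onset=0.0, offset=0.2)
```

Each token then contributes nine values per formant, which can be compared across tokens of different lengths.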
  10. 4. Results
  11. [Figure: all speakers, F1, F2 and F3 means: frequency (Hz, 0-3000) against time (10% steps, 0-100%), one line per speaker; speakers 1-4, 6, 8-13, 15-23]
  12. [Figure: all speakers, F1 means: frequency (Hz, 200-800) against time (10% steps, 0-100%), one line per speaker]
  13. [Figure: all speakers, F2 means: frequency (Hz, 1000-2200) against time (10% steps, 0-100%), one line per speaker]
  14. [Figure: all speakers, F3 means: frequency (Hz, 2000-3000) against time (10% steps, 0-100%), one line per speaker]
  15. [Figure: all speakers, F2 : F1 means: F1 (Hz, 250-800) plotted against F2 (Hz, 1000-2200), one trajectory per speaker]
  16. 4.1 discriminant analysis
     • used to test the power of the feature in discriminating between speakers in the sample
     • standard procedure followed (Tabachnick and Fidell 1996)
       – a small number of outliers removed
       – various combinations of the 27 predictors (F1, F2, F3 × 9 time points) tested initially
       – max. 10 predictors (i.e. sets of formant values)
     • the number of predictors must be smaller than the minimum number of tokens per speaker (11), hence the maximum of 10
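As a rough illustration of what the discriminant analysis measures, the following minimal sketch implements linear discriminant classification with equal priors: each token is assigned to the speaker whose centroid is nearest in Mahalanobis distance under the pooled within-speaker covariance, and the classification rate is the proportion of tokens assigned to the right speaker. The slides do not say which software or validation scheme was used; this sketch scores by resubstitution (classifying the same tokens it was fitted on), which is an assumption, and all function names are hypothetical. It also enforces the predictor limit described above.

```python
def mean_vector(rows):
    """Column means of a list of equal-length predictor vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def pooled_covariance(groups):
    """Pooled within-speaker covariance matrix (divisor: N tokens - N speakers)."""
    p = len(next(iter(groups.values()))[0])
    scatter = [[0.0] * p for _ in range(p)]
    n_tokens = 0
    for rows in groups.values():
        mu = mean_vector(rows)
        for row in rows:
            d = [x - m for x, m in zip(row, mu)]
            for i in range(p):
                for j in range(p):
                    scatter[i][j] += d[i] * d[j]
        n_tokens += len(rows)
    dof = n_tokens - len(groups)
    return [[v / dof for v in row] for row in scatter]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("pooled covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mahalanobis_sq(x, mu, inv):
    """Squared Mahalanobis distance of x from mu, given the inverse covariance."""
    d = [a - b for a, b in zip(x, mu)]
    return sum(d[i] * inv[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))


def classification_rate(groups):
    """Proportion of tokens assigned to the correct speaker (equal priors, resubstitution).

    groups maps speaker ID -> list of predictor vectors (one vector per token).
    """
    n_predictors = len(next(iter(groups.values()))[0])
    if n_predictors >= min(len(rows) for rows in groups.values()):
        raise ValueError("N predictors must be < min N tokens per speaker")
    inv = invert(pooled_covariance(groups))
    centroids = {spk: mean_vector(rows) for spk, rows in groups.items()}
    hits = total = 0
    for speaker, rows in groups.items():
        for x in rows:
            predicted = min(centroids, key=lambda s: mahalanobis_sq(x, centroids[s], inv))
            hits += predicted == speaker
            total += 1
    return hits / total
```

With 20 speakers, a classifier guessing at random would be right 1 time in 20, which is the 5% chance level quoted with the results.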
  17. initial DA results (NB chance level = 5%, i.e. 1 in 20 speakers)
     formant(s)     % steps              N predictors   classification rate
     F1             20, 70               2              19.2%
     F2             20, 70               2              21.4%
     F3             20, 70               2              26.3%
     F1 + F2 + F3   20, 70               6              45.7%
     F1 + F2 + F3   10, 50, 90           9              45.4%
     F3             all                  9              30.8%
     F1 + F2        10, 30, 50, 70, 90   10             36.1%
     F1 + F3        10, 30, 50, 70, 90   10             44.4%
     F2 + F3        10, 30, 50, 70, 90   10             42.5%
  18. initial DA results
     • more predictors = better DA (p = 0.0004)
     [Figure: DA classification rate (0-50%) against N predictors (0-10), with linear trend line, R² = 0.41156]
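The relationship between number of predictors and classification rate can be quantified with an ordinary least-squares line and its R². The sketch below fits only the nine combinations tabulated under "initial DA results"; these rows alone give R² of about 0.61 and a slope of about 2.2 percentage points per added predictor, not the 0.41 on the slide, so the plotted fit presumably covered further combinations from the exploratory stage (an assumption). The p-value would additionally need a t-distribution, which the Python standard library does not provide; `linear_fit` is a hypothetical helper.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope, sxy ** 2 / (sxx * syy)


# The nine tabulated combinations: N predictors and % correctly classified
n_predictors = [2, 2, 2, 6, 9, 9, 10, 10, 10]
rates = [19.2, 21.4, 26.3, 45.7, 45.4, 30.8, 36.1, 44.4, 42.5]
intercept, slope, r2 = linear_fit(n_predictors, rates)
print(f"slope = {slope:.2f} points per predictor, R^2 = {r2:.2f}")
```

A linear fit cannot capture the ceiling effect discussed later, so a positive slope here says nothing about whether further predictors keep helping.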
  19. 4.1 discriminant analysis
     • exploratory analysis: better results with more predictors, and with all 3 formants
     • best 10 predictors identified via F-ratios (ANOVA)
       – used best 3 for each formant + 1 extra
     Predictors included:
       F3: 30%, 40%, 50%, 60%
       F2: 10%, 20%, 30%
       F1: 40%, 50%, 60%
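An F-ratio from a one-way ANOVA with speaker as the factor compares between-speaker variation with within-speaker variation for a single predictor, so a high value marks a measurement that separates speakers well. The sketch below computes that ratio and applies the "best 3 for each formant + 1 extra" rule; treating the extra as the highest-ranked predictor not yet chosen is an assumption (the selection shown gave F3 the extra slot), and the function names are hypothetical.

```python
def f_ratio(groups):
    """One-way ANOVA F-ratio for a single predictor.

    groups maps speaker ID -> list of that speaker's values (e.g. F3 at 40%).
    F = between-speaker mean square / within-speaker mean square.
    """
    values = [v for vs in groups.values() for v in vs]
    n, g = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {spk: sum(vs) / len(vs) for spk, vs in groups.items()}
    ss_between = sum(len(vs) * (means[spk] - grand_mean) ** 2 for spk, vs in groups.items())
    ss_within = sum((v - means[spk]) ** 2 for spk, vs in groups.items() for v in vs)
    return (ss_between / (g - 1)) / (ss_within / (n - g))


def select_predictors(scores, per_formant=3, extra=1):
    """Keep the `per_formant` highest-F steps for each formant, then `extra` more overall.

    scores maps (formant, step) -> F-ratio, e.g. key ("F3", 40).
    """
    chosen = []
    for formant in sorted({f for f, _ in scores}):
        ranked = sorted((k for k in scores if k[0] == formant), key=scores.get, reverse=True)
        chosen.extend(ranked[:per_formant])
    remaining = sorted((k for k in scores if k not in chosen), key=scores.get, reverse=True)
    chosen.extend(remaining[:extra])
    return chosen
```

Ranking predictors one at a time like this ignores correlations between neighbouring time points, which may partly explain why adding predictors gives diminishing returns.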
  20. 4.2 DA results
     • with best 10 predictors: classification rate 45.1%
     [Figure: canonical discriminant functions, Function 1 against Function 2, showing the group centroid for each speaker]
  21. 5. Discussion
     • best DA classification = 45%
       – far better than chance (5%)
     • compares well with work on scripted speech
       – cf. McDougall & Nolan (2007): same 20 speakers, 52% with /uː/ in read text
  22. 5. Discussion
     • DA results improve when:
       – including more predictors (i.e. dynamic information)
       – including F1, F2 and F3
     • but ceiling effect at c. 45%
       – adding more predictors yields diminishing returns
       – cf. McDougall (2004, 2006)
  23. 5. Discussion
     • F3 dominates in this data set
       – fairly flat contours
       – large cross-speaker differences
       – generally less within-speaker variation than F1, F2 (Nolan 1983 etc)
     • F1 & F2 dynamics thus play a smaller role here than in e.g. McDougall (2004, 2006)
       – power of F3 needs to be tested on other data sets
       – Australian /aɪ/ sociolinguistically more variable than RP
  24. 6. Conclusion & outlook
     • first exploration of the formant dynamics approach with spontaneous speech
     • further testing needed and underway
       – reducing data to quadratic equations
       – other DyViS speakers and data (Atkinson, in progress)
       – corpus from Derby, UK (Rhodes, in progress)
         • /aɪ/ sociolinguistically variable: [aɪ aːːɪ ɒɪ]
     • but promising results & method looks applicable to forensic data
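"Reducing data to quadratic equations" means summarising each formant trajectory by the three coefficients of a fitted curve instead of nine raw measurements; for three formants that would give 9 predictors, within the limit of 10 set by the minimum token count. The slides give no details of this work in progress, so the following is only a sketch assuming an ordinary least-squares fit over time expressed as a proportion of vowel duration (0.1 to 0.9); the helper names are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(ts, ys):
    """Least-squares fit y = c0 + c1*t + c2*t**2; returns [c0, c1, c2].

    Solves the 3x3 normal equations (X^T X) c = X^T y by Cramer's rule.
    """
    s = [sum(t ** k for t in ts) for k in range(5)]                 # power sums of t
    r = [sum(y * t ** k for t, y in zip(ts, ys)) for k in range(3)]  # X^T y
    m = [[s[i + j] for j in range(3)] for i in range(3)]            # X^T X
    d = det3(m)
    coeffs = []
    for col in range(3):
        # Replace column `col` of X^T X with X^T y (Cramer's rule)
        mc = [row[:col] + [r[i]] + row[col + 1:] for i, row in enumerate(m)]
        coeffs.append(det3(mc) / d)
    return coeffs


# Time points as proportions of vowel duration: 0.1, 0.2, ..., 0.9
steps = [pct / 100 for pct in range(10, 100, 10)]
```

Usage: `fit_quadratic(steps, f1_values)` for a token's nine F1 measurements returns its intercept, slope and curvature; using proportions rather than percentages keeps the power sums small and the system well conditioned.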
  25. Thanks! Questions?
  26. Acknowledgments
     • Toby Hudson & Caroline Williams for the Praat script
     • Philip Harrison for technical assistance
     • the DyViS team for the data (RES-000-23-1248)
