Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Filled pauses as variables in speaker comparison: dynamic formant analysis and duration measurements improve performance

107 views

Published on

Wood, S., Hughes, V. and Foulkes, P. (2014) Filled pauses as variables in speaker comparison: dynamic formant analysis and duration measurements improve performance. Paper presented at International Association for Forensic Phonetics and Acoustics (IAFPA) Conference, Universität Zürich, Switzerland. 31 August-3 September 2014.

Published in: Education
  • Be the first to comment

  • Be the first to like this

Filled pauses as variables in speaker comparison: dynamic formant analysis and duration measurements improve performance

  1. 1. FILLED PAUSES AS VARIABLES IN SPEAKER COMPARISON Sophie Wood, Vincent Hughes, and Paul Foulkes Department of Language and Linguistic Science, University of York 1
  2. 2. 1. Overview2
  3. 3. 1. Overview ¨ discriminatory power of filled pauses (FPs) as a variable for speaker comparison ¤ extending work presented by King et al. (2013) ¨ acoustic properties examined: ¤ ‘static’ mid-point formant frequencies of F1-F3 ¤ ‘dynamic’ measurements of F1-F3 ¤ duration ¨ calibrated log likelihood ratios (LRs) 3
  4. 4. 2. Introduction4
  5. 5. 2.1. Speaker Comparison ¨ speaker comparison involves comparative analysis of speech in a disputed sample and speech in a reference sample ¨ the best acoustic parameters used: ¤ exhibit large inter-speaker/ low intra-speaker variation ¤ are readily available in short samples ¤ are accurately measureable ¤ are resistant to disguise (Nolan 1997) 5
  6. 6. 2.2. Filled Pauses ¨ frequent for most speakers and in most types of spontaneous speech. ¤ 3.7 FPs/ minute (Tschäpe et al. 2005) ¨ typically longer than lexical vowels ¨ often abut silence à less influence of coarticulation ¤ more easily measured ¤ in principle, more consistent for the individual speaker. ¨ little awareness by speakers (Jessen 2008) 6
  7. 7. 2.3. Formant dynamics ¨ speech as a series of phonetic targets (McDougall 2006) ¨ individual has flexibility in articulatory movement from one target to the next ¤ Individual variation in vocal tract anatomy ¤ Speakers exercising a degree of choice ¨ Nolan (1997: 763): “the nearest we will come to finding a speaker’s unique signature will be in the detailed dynamics of speech” 7
  8. 8. 3. Previous study: King et al. (2013)8
  9. 9. 3. King et al. (2013). ¨ 20 male SSBE speakers aged 18-25 (DyViS). ¨ FPs and tokens of lexical vowels KIT, DRESS and TRAP. ¤ ‘Static’ mid-point measurements of F1-F3. ¤ Duration. ¨ FPs have greater discriminatory power than lexical vowels (formant frequencies only condition). ¨ Addition of durational info worsened discriminatory power. 9
  10. 10. 3. King et al. (2013). Condition EER% Cllr Formants only Uh 17.5 0.567 Um 20.0 0.506 Formants + duration Uh 42.5 0.870 Um 22.5 0.620 Lexical Vowels KIT 20.0 0.894 DRESS 37.5 0.830 TRAP 20.0 0.574 10
  11. 11. 4. Method11
  12. 12. 4.1. Subjects ¨ 75 speakers, DyViS corpus (Nolan et al. 2009) ¤ male ¤ aged 18-25 ¤ Southern Standard British English (SSBE) ¨ recordings taken from Task 1- mock police interview 12
  13. 13. 4.2. Segmentation of FPs ¨ 20 tokens of both uh and um/ speaker 13
  14. 14. 4.3. F1-F3 analysis ¨ 9 measurements of F1-F3, @ 10% intervals ¤ ‘dynamics_script_v2’ (Hughes, 2013) ¤ 27 formant measurements/ token ¨ quadratic polynomial curves fitted to F1-F3 contours ¤ quadratic estimations seen to be the most effective in carrying greater speaker-distinguishing information/ predictor (McDougall & Nolan 2007) 14
  15. 15. 4.4. Calibrated log likelihood ratios ¨ 75 speakers ¤ 25 speakers = background population ¤ 25 speakers= development set; 25 speakers= test set ¨ LRs computed using Aitken & Lucy’s (2004) Multivariate Kernel Density (MVKD) formula ¨ scores for test set converted to log LRs using logistic calibration weights based on the development scores 15
  16. 16. 4.5. EER and Cllr ¨ system performance was assessed using… 1. Equal Error Rate (EER) ¤ metric of absolute discrimination between SS & DS pairs 2. log LR cost function (Cllr) ¤ gradient assessment of system accuracy based on the magnitude of contrary-to-fact LRs ¤ Cllr is a measure of the validity of the system (< 1) ¤ the lower the Cllr the better (Morrison 2011) 16
  17. 17. 5. Results17
  18. 18. 5.1. Results for uh Test: EER (%): Cllr: ‘Static’ F1-F3 frequencies 11.92 0.5246 ‘Static’ F1-F3 frequencies fused with durations 12.00 0.4876 Formant dynamics 15.17 0.7068 Formant dynamics fused with durations 11.92 0.7449 • ‘static’ measurements outperform dynamics 18
  19. 19. 5.2. Results for um Test: EER (%): Cllr: ‘Static’ F1-F3 frequencies 11.92 0.3692 ‘Static’ F1-F3 frequencies fused with durations 8.92 0.2825 Formant dynamics 4.67 0.1978 Formant dynamics fused with durations 4.17 0.1821 • dynamic measurements better • addition of duration info further improves results 19
  20. 20. 6. Discussion20
  21. 21. 6.1. EER values for uh and um 0 2 4 6 8 10 12 14 16 Static Static + duration Dynamics Dynamics + duration EER(%) Uh Um 21
  22. 22. 6.2. Cllr values for uh and um 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 Static Static + duration Dynamics Dynamics + duration Uh Um 22
  23. 23. 6.3. Uh- Overfitting? ¨ possible reason for the ‘static’ measurements outperforming the dynamic measurementsfor uh may be due to issues of overfitting ¨ trajectories essentially flat throughout the uh vocoid ¨ ‘static’ frequencies provide as much information without requiring so much input data 23
  24. 24. 6.4. Um- Inherent acoustic change ¨ dynamic properties of um are more useful because /VN/ FPs contain inherentlymore acoustic change between the vocalic and nasal portions 24
  25. 25. 7. Conclusion25
  26. 26. 7. Conclusion ¨ obtained LRs with EER scores below 5% using acoustic-phonetic features in spontaneous speech recordings ¤ Um: formant dynamics EER 4.67%. ¤ Um: formant dynamics + duration EER 4.17%. ¨ strongly supports the view that FPs have excellent potential as variables in forensic speaker comparison cases ¨ ‘static’ measurements provide equally good or better results than formant dynamics for uh 26
  27. 27. 8. References27
  28. 28. 8. References ¨Aitken, C. G. G. & Lucy, D. (2004). Evaluation of trace evidence in the form of multivariate data. Applied Statistics, 54, 109-122. ¨Foulkes, P. & French, J. P. (2012). Forensic speaker comparison: A linguistic-acoustic perspective. In P. M. Tiersma & L. M. Solan (eds.) The Oxford Handbookof Language and the Law. Oxford: Oxford University Press, pp. 557-572. ¨Jessen, M. (2008). Forensic phonetics. Language and Linguistics Compass, 2, 671-711. ¨King, J., Foulkes, P., French, P. & Hughes, V. (2013). Hesitation markers as a parameter for forensic speaker comparison. Paper presented at IAFPA conference, Tampa, Florida, USA. ¨McDougall, K. (2006). Dynamic features of speech and the characterisation of speakers: towards a new approach using formant frequencies. The International Journal of Speech, Language and the Law, 13, 89-126. ¨McDougall, K. & Nolan, F. (2007). Discrimination of speakers using the formant dynamics of /u:/ in British English. In J. Trouvain & W. Barry (eds.) Proceedings of the 16th International Congress of Phonetic Sciences, 6-10 August 2007. Saarbrücken, Germany. 1825-1828. 28
  29. 29. 8. References ¨Morrison, G. S. (2011). Measuring the validity and reliability of forensic likelihood-ratio systems. Science & Justice, 51, 91-98. ¨Nolan, F. J. (1997). Speaker recognition and forensic phonetics. In W. J. Hardcastle & J. Laver (eds.) The Handbook of Phonetic Sciences. Oxford: Blackwell. pp. 744-767. ¨Nolan, F., K. McDougall, G. de Jong & T. Hudson (2009). The DyViS database: style-controlled recordings of 100 homogeneous speakers for forensic phonetic research. The International Journal of Speech, Language and the Law, 16, 31-57. ¨Tschäpe, N., Trouvain, J., Bauer, D. & Jessen, M. (2005). Idiosyncratic patterns of filled pauses. Paper presented at the 14th Annual Conference of the International for Forensic Phonetics and Acoustics Annual Conference, 3-6 August 2005. Marrakesh, Morocco. 29
  30. 30. 9. Questions?30

×