Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Speaker- and group-specific information in formant dynamics: a forensic perspective

49 views

Published on

Hughes, V. and Foulkes, P. (2016) Speaker- and group-specific information in formant dynamics: a forensic perspective. Paper presented at LabPhon 15 Satellite Workshop: Speech dynamics, social meaning, and phonological categories, Cornell University, Ithaca, USA. 13-16 July 2016.

Published in: Education
  • Be the first to comment

  • Be the first to like this

Speaker- and group-specific information in formant dynamics: a forensic perspective

  1. 1. Speaker- and group-specific information in formant dynamics: a forensic perspective Vincent Hughes Paul Foulkes LabPhon 15 Satellite Speech dynamics, social meaning, and phonological categories 13th July 2016
  2. 2. 2 e.g. Roberts (2012) Rhodes (2012) Harrison (2013)
  3. 3. Outline 1. the forensic problem 2. formant dynamics in forensics 3. research questions 4. method 5. experiments: – speaker discrimination – group discrimination 6. discussion 3
  4. 4. 1. The forensic problem • forensic voice comparison (FVC): vs. unknown offender known suspect 4
  5. 5. 1. The forensic problem defence (innocent) prosecution (guilty) FVC 5
  6. 6. 1. The forensic problem • properties of ideal features: – high between-speaker variability – low within-speaker variability – resistance to disguise – robustness in transmission – measurability – availability from Nolan (1983) 6 ✗ ✓
  7. 7. 1. The forensic problem 7 ✓ ✗ • properties of ideal features: – high between-speaker variability – low within-speaker variability – resistance to disguise – robustness in transmission – measurability – availability from Nolan (1983)
  8. 8. 2. Formant dynamics in forensics • commonly used in forensics for last 20 years – starting with… Greisbach et al. (1995) – McDougall (2006) • value of parametric representations • polynomials better than raw Hz input – Morrison (2009) • comparison of different parametric representations 8
  9. 9. 2. Formant dynamics in forensics why dynamics? • targets = learned by speech community • transitions = “acquired … by trial and error” • “speakers' 'vocal signatures' lie in the rapid, transitional movements of the speech organs between sounds” from Nolan (1997)/ McDougall (2004) 9
  10. 10. group e.g. targets 2. Formant dynamics in forensics • so… phonology is all about targets? individual e.g. transitions speech (language, contrast etc.) speaker Mokhtari (1998) Garvin & Ladefoged (1963)
  11. 11. 2. Formant dynamics in forensics • but… inconsistent with e.g. usage-based models? – any element of phonetic/ phonological structure can be learned & represented cognitively – thus potential for transitions to carry ‘group’ information • formant dynamics increasingly used to explore group-patterns in sociophonetics 11
  12. 12. 3. Research questions • to what extent is speaker- and group-specific information encoded in the dynamics of formant trajectories? – implications for models of phonology – value of the forensic perspective 12
  13. 13. 4. Method variable • PRICE /aɪ/ – subject of considerable analysis in forensics – covers a wide range of the vowel space • potential for considerable formant movement across the duration of the vowel 13
  14. 14. 4. Method: datasets (1) Standard Southern British English (SSBE) – DyViS corpus (Nolan et al. 2009) – 97 male speakers – 18-25 years – mock police interview (map task) 14
  15. 15. 4. Method: datasets (2) Newcastle (Milroy et al. 1994-97) (3) Manchester(Haddican et al. 2013) (4) Derby (Milroy et al. 1994-97) – 8 male speakers – 18-31 years – sociolinguistic interviews in peer-group pairs 15
  16. 16. 4. Method dynamics • c. 10 tokens/ sp • measurements at +10% steps • pre-testing for optimal fit – cubic polynomials • 4 coefficients/ formant 16 statics • +20% & +80% Hz values/ formant
  17. 17. 5. Results: speaker discrimination 17 • SSBE speakers: – 20 test speakers – 57 reference speakers • same- (SS) & different-speaker (DS) comps • likelihood ratios (LRs) used for discrimination p(E|Hp) p(E|Hd) p = probability E = evidence | = ‘given’ Hp = prosecution hyp Hd = defence hyp
  18. 18. 5. Results: speaker discrimination 18 • output = log10 LRs: – centered on 0 (no evidence) – > 0 = support for prosecution – < 0 = support for defence • error metrics: – equal error rate (EER) – log LR cost function (Cllr) Closer to 0, the better the performance
  19. 19. 5. Results: speaker discrimination 19 F1, F2 and F3 F1-only F2-only F3-only 0 5 10 15 20 25 30 35 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 Log LR Cost (Cllr ) EER(%) Static F1, F2 and F3 F1-only F2-only F3-only 0 5 10 15 20 25 30 35 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 Log LR Cost (Cllr ) EER(%) Dynamic
  20. 20. 5. Results: group discrimination 20 • predicting regional background • cross-validated discriminant analysis: – each token assigned to 1 of 4 regional groups – models built on all data excluding target token • generates classification rate based on posterior probability • chance = 25% (1/4)
  21. 21. 5. Results: group discrimination 21 Formant Classification rate F1 63.8% F2 64.7% F3 40.6%
  22. 22. 6. Discussion speaker discrimination • formant dynamics contain considerable speaker-specific information: – better performance than static values • higher formants = greater speaker- discriminatory power – speech-speaker dichotomy (Mokhtari1998) – F1~F2 responsible for contrast 22
  23. 23. 6. Discussion group discrimination • group-specific information isn’t all about targets – individual cubic coefficients capable of predicting regional background above chance – all coefficients in combination outperform any one in isolation • so… fine-grained phonetics clearly shared across speech communities 23
  24. 24. 6. Discussion • results challenge underlying phonological model for formant dynamics – groups = not all about targets – Individuals = not all about transitions • need to rethink the dichotomies: – speech-speaker (Mokhtari1998) – group-individual (Garvin & Ladefoged1963) – maybe it’s about continua? 24
  25. 25. 7. Conclusion • formant dynamics capable of encoding both speaker- and group-information – consistent with usage-based approaches? • focus on the individual may help us better understand acquisition of variation – therefore a role for forensics (methodological and theoretical) in understanding phonology 25
  26. 26. Thanks! Questions? LabPhon 15 Satellite Speech dynamics, social meaning, and phonological categories 13th July 2016

×