
Effects of variability on numerical likelihood ratio calculations for forensic voice comparison


Hughes, V. and Foulkes, P. (2012) Effects of variability on numerical likelihood ratio calculations for forensic voice comparison. Paper presented at BAAP 2012 Colloquium, University of Leeds. 26-28 March 2012.



  1. Effects of variability on numerical likelihood ratio calculations for forensic voice comparison
     Vincent Hughes, Paul Foulkes
     BAAP 2012 Colloquium, Wednesday 28th March
  2. 1. introduction
     • forensic voice comparison
       – voice of criminal vs. voice of suspect
     • increasing move towards the likelihood ratio (LR)
       – “logically and legally correct framework” for expert evidence (Rose & Morrison 2009: 143)
     Hughes & Foulkes, BAAP 2012
  3. 1. introduction
     • likelihood ratio: $LR = \frac{p(E \mid H_{p})}{p(E \mid H_{d})}$, where $E$ is the evidence, $H_{p}$ the prosecution hypothesis and $H_{d}$ the defence hypothesis
       – assessment of the strength of the evidence
       – LR > 1 offers support for the prosecution
       – LR < 1 offers support for the defence
     • measurement of similarity vs. typicality
       – typicality evaluated against a broader ‘relevant’ population (Aitken & Taroni 2004)
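As a rough illustration of the similarity/typicality idea, the sketch below computes a univariate LR. The numerator asks how similar the offender and suspect means are. The denominator asks how typical the offender mean is of a reference population. This is a deliberately simplified stand-in, not the Multivariate Kernel Density formula (Aitken & Lucy 2004) used in the study. The Gaussian within-speaker model, the function names, and the `within_sd` and `bandwidth` values are all assumptions, and the reference means are invented.

```python
import numpy as np


def gaussian_pdf(x, mean, sd):
    """Normal probability density at x (vectorised over `mean`)."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))


def simple_lr(offender, suspect, reference_means, within_sd, bandwidth):
    """Hypothetical univariate LR = similarity / typicality (NOT the MVKD formula)."""
    x = np.mean(offender)  # mean of the offender (criminal) tokens
    y = np.mean(suspect)   # mean of the suspect tokens
    # Similarity: density of the offender mean under a Gaussian centred on the suspect
    numerator = gaussian_pdf(x, y, within_sd)
    # Typicality: Gaussian kernel density over the reference speakers' means,
    # with each kernel widened by the assumed within-speaker SD
    spread = np.sqrt(within_sd ** 2 + bandwidth ** 2)
    means = np.asarray(reference_means, dtype=float)
    denominator = np.mean(gaussian_pdf(x, means, spread))
    return numerator / denominator


reference = np.linspace(1200, 2000, 41)  # invented reference speaker means (Hz)
same = simple_lr([1500, 1510], [1505, 1505], reference, within_sd=50, bandwidth=100)
diff = simple_lr([1900, 1890], [1505, 1505], reference, within_sd=50, bandwidth=100)
print(same > 1, diff < 1)  # True True
```

A close offender/suspect match inside a dense part of the reference distribution gives LR > 1. A distant match drives the numerator, and so the LR, well below 1.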
  4. 1. introduction
     Study                   Feature       Speech style             N ref. speakers   Age          Language
     Rose et al (2003)       /ɕ/ /o/ /N/   read                     60                20-50        Japanese
     Rose et al (2006)       /aɪ/          read                     166               19-64        Australian English
     Morrison (2008)         /aɪ/          read                     27                19-64        Australian English
     Kinoshita et al (2009)  f0            controlled spontaneous   201               no control   Japanese
  5. 1. introduction
     • practical problems:
       – lack of available reference data (some exists for f0, articulation rate and pathological features)
       – definition of the ‘relevant’ population
       – size of the reference data set
  6. 1.1 research questions
     To what extent are LRs affected by…
       i. varying N speakers in the reference data?
       ii. varying N tokens per speaker in the reference data?
       iii. dialect mismatch between target voice and reference data?
  7. 1. introduction
     Verbal scale (Champod and Evett 2000):
     Raw LR          Log10 LR   Verbal expression            Support for
     >10000          5          Very strong evidence         prosecution
     1000-10000      4          Strong evidence              prosecution
     100-1000        3          Moderately strong evidence   prosecution
     10-100          2          Moderate evidence            prosecution
     1-10            1          Limited evidence             prosecution
     1-0.1           -1         Limited evidence             defence
     0.1-0.01        -2         Moderate evidence            defence
     0.01-0.001      -3         Moderately strong evidence   defence
     0.001-0.0001    -4         Strong evidence              defence
     <0.0001         -5         Very strong evidence         defence
     • LRs necessarily vary with different input data
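The verbal scale can be applied mechanically. The minimal Python sketch below assumes that each band is identified by |log10 LR| rounded up and capped at 5, as the Log10 LR column suggests. The function name is hypothetical, and the handling of exact band boundaries (e.g. LR = 10) is also an assumption.

```python
import math

# Verbal expressions from the table above, indexed by the magnitude of the log10 band
VERBAL = {1: "limited", 2: "moderate", 3: "moderately strong", 4: "strong", 5: "very strong"}


def verbal_expression(lr):
    """Map a raw LR onto the Champod & Evett (2000) verbal scale (sketch)."""
    if lr <= 0:
        raise ValueError("an LR must be positive")
    if lr == 1:
        return "neutral: supports neither hypothesis"
    log_lr = math.log10(lr)
    # Band = |log10 LR| rounded up, kept within 1..5 (assumption: exact band
    # boundaries such as LR = 10 fall into the weaker category)
    band = min(5, max(1, math.ceil(abs(log_lr))))
    side = "prosecution" if log_lr > 0 else "defence"
    return f"{VERBAL[band]} evidence in support of the {side}"


print(verbal_expression(250))    # moderately strong evidence in support of the prosecution
print(verbal_expression(0.004))  # moderately strong evidence in support of the defence
```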
  8. 2. data and method
     • 4 sets of test data
     • 1 set of reference data
     • GOOSE /u:/
       – phonetically diphthongal
     • dynamic time-normalised measurements of F1 and F2 (McDougall 2004, 2006)
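Time normalisation can be sketched by resampling each formant track at fixed proportions of the vowel's duration. This is an illustration under assumptions, not the authors' procedure. The 11 points (0%, 10%, ..., 100%) are taken from the +10% steps plotted for the formant analysis method, and linear interpolation is assumed. `time_normalise` and the toy track are hypothetical.

```python
import numpy as np


def time_normalise(times, formant_hz, n_points=11):
    """Resample a formant track at equal proportions of the vowel's duration.

    n_points=11 gives measurements at 0%, 10%, ..., 100% (an assumption based
    on the +10% steps plotted for the formant analysis method).
    """
    t = np.asarray(times, dtype=float)
    f = np.asarray(formant_hz, dtype=float)
    proportions = np.linspace(0.0, 1.0, n_points)
    # Convert each proportion of the duration into an absolute time
    targets = t[0] + proportions * (t[-1] - t[0])
    # Linear interpolation between the measured analysis frames
    return np.interp(targets, t, f)


# Invented F2 track: frames at 100, 150 and 200 ms
f2_steps = time_normalise([0.10, 0.15, 0.20], [1500, 1700, 1900])
```

Because every token is reduced to the same 11 proportional points, vowels of different durations can be compared point by point.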
  9. 2.1 reference data set
     • reference data: ONZE (Origins of New Zealand English) corpus
       – 120 male speakers
       – born 1932-1987
       – minimum 10 tokens per speaker
       – tokens coded for phonological context
       – auto-generated formant data
  10. 2.2 test data
     • 8 male speakers per set
     • 16 tokens per speaker
     • tokens coded according to phonological context
     Dialect      Age     Corpus
     NZ           20-30   ONZE Canterbury corpus
     Manchester   19-31   Haddican et al (2008)
     Newcastle    16-25   Milroy et al (1994-97)
     York         17-26   Tagliamonte (1998), Haddican et al (2008)
  11. 2.3 GOOSE vowels @ midpoint
     [Figure: F1 (Hz) vs. F2 (Hz) for Manchester, Newcastle, York and ONZE]
  12. 2.4 formant analysis method
     • data reduction using quadratic polynomials
       – $\tilde{y} = a_0 + a_1 x + a_2 x^2$
     • LR calculation:
       – Multivariate Kernel Density formula (Aitken & Lucy 2004; Morrison 2007)
     [Figure: raw data (y) and quadratic fit (yfit) plotted against +10% time steps]
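The quadratic data-reduction step can be sketched with NumPy's least-squares polynomial fit, `numpy.polynomial.polynomial.polyfit`, which returns coefficients lowest order first (a0, a1, a2). Taking x as 0-100% of vowel duration is an assumption. The toy track is invented and exactly quadratic, so the expected coefficients are known in advance.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def quadratic_coefficients(track):
    """Reduce a time-normalised formant track to (a0, a1, a2) of y~ = a0 + a1*x + a2*x**2."""
    y = np.asarray(track, dtype=float)
    x = np.linspace(0, 100, len(y))  # assumed x-axis: % of vowel duration
    a0, a1, a2 = P.polyfit(x, y, 2)  # least-squares fit; coefficients lowest order first
    return a0, a1, a2


steps = np.linspace(0, 100, 11)
toy_track = 1500 + 4 * steps - 0.02 * steps ** 2  # invented, exactly quadratic
a0, a1, a2 = quadratic_coefficients(toy_track)
```

The three coefficients summarise the height, slope and curvature of a trajectory in far fewer numbers than the raw measurements.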
  13. 3.1 results: N speakers
     • number of speakers in the reference data
       – test data combined
         • 32 same-speaker comparisons
         • 992 different-speaker comparisons
       – starting with 120 speakers
         • 10 tokens per speaker
       – one speaker removed at a time
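The comparison counts follow from the design. Four test sets of 8 speakers give 32 speakers, hence 32 same-speaker comparisons and 32 × 31 = 992 different-speaker comparisons (every ordered pair of distinct speakers). The sketch below reproduces these counts and the one-speaker-at-a-time shrinking of the reference set. The speaker labels, the random removal order and the minimum reference size of 2 are assumptions.

```python
import random
from itertools import permutations

# Combined test data: 4 dialect sets x 8 speakers (speaker labels are hypothetical)
test_speakers = [f"{d}{i}" for d in ("NZ", "MAN", "NCL", "YRK") for i in range(1, 9)]

# Same-speaker comparisons: each test speaker compared with themself
same_pairs = [(s, s) for s in test_speakers]
# Different-speaker comparisons: every ordered pair of distinct test speakers
diff_pairs = list(permutations(test_speakers, 2))


def reference_subsets(reference_ids, min_size=2, seed=0):
    """Yield ever-smaller reference sets, dropping one speaker per step.

    The removal order (random, fixed seed) and min_size are assumptions.
    """
    order = list(reference_ids)
    random.Random(seed).shuffle(order)
    for size in range(len(order), min_size - 1, -1):
        yield order[:size]


sizes = [len(s) for s in reference_subsets(range(120))]
print(len(same_pairs), len(diff_pairs), sizes[0], sizes[-1])  # 32 992 120 2
```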
  14. 3.1 results: N speakers
     Log10 LR   Verbal expression
     +/- 1      Limited evidence
     +/- 2      Moderate evidence
     +/- 3      Moderately strong evidence
     +/- 4      Strong evidence
     +/- 5      Very strong evidence
  15. 3.1 results: N speakers
     • stable mean > 25 speakers
     • increased variance < 60 speakers
  16. 3.1 results: N speakers
     • stable mean > 45 speakers
     • greater fluctuations in SD
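Judging where the mean "stabilises" requires some criterion. A hypothetical one is sketched here. First, summarise the log10 LRs for each reference-set size. Then find the smallest N from which every mean stays within a tolerance of the full-reference-set mean. The 0.1 tolerance, the function names and the toy means are illustrative assumptions, not values from the study.

```python
import numpy as np


def summarise(log_lrs_by_size):
    """Mean and sample SD of the log10 LRs obtained with each reference-set size."""
    return {n: (float(np.mean(v)), float(np.std(v, ddof=1)))
            for n, v in log_lrs_by_size.items()}


def stable_from(means_by_size, tolerance=0.1):
    """Smallest N such that the mean at N and at every larger size stays within
    `tolerance` of the mean for the full reference set (hypothetical criterion)."""
    sizes = sorted(means_by_size, reverse=True)
    full_mean = means_by_size[sizes[0]]
    smallest = sizes[0]
    for n in sizes:
        if abs(means_by_size[n] - full_mean) > tolerance:
            break
        smallest = n
    return smallest


# Invented mean log10 LRs, purely to show the calculation
toy_means = {120: -3.0, 100: -3.05, 60: -2.95, 45: -3.08, 30: -2.7, 10: -1.5}
print(stable_from(toy_means))  # 45
```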
  17. 3.2 results: N tokens
     • N tokens per speaker in the reference data
       – test data combined
         • 32 same-speaker comparisons
         • 992 different-speaker comparisons
       – maximum N tokens shared by 102 speakers = 13
       – LRs calculated 11 times, with 1 token per speaker removed at each stage
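The token-reduction design can be sketched as follows. Keep only speakers with at least 13 tokens, then build reference sets with 13, 12, ..., 3 tokens per speaker, assuming the 11 calculations run from 13 down to 3. Which token is removed at each stage is not stated, so keeping each speaker's first k tokens is an assumption, as are the function name and the toy data.

```python
def token_conditions(tokens_by_speaker, max_tokens=13, n_stages=11):
    """Build reference sets with max_tokens, max_tokens - 1, ... tokens per speaker.

    Speakers with fewer than max_tokens tokens are excluded (in the study, 102
    speakers had at least 13). Keeping the first k tokens is an assumption.
    """
    eligible = {s: toks for s, toks in tokens_by_speaker.items() if len(toks) >= max_tokens}
    conditions = {}
    for stage in range(n_stages):
        k = max_tokens - stage  # 13, 12, ..., 3 with the default arguments
        conditions[k] = {s: toks[:k] for s, toks in eligible.items()}
    return conditions


# Invented token lists for three hypothetical speakers
toy = {"A": list(range(15)), "B": list(range(13)), "C": list(range(10))}
conds = token_conditions(toy)
print(sorted(conds)[0], sorted(conds)[-1], sorted(conds[3]))  # 3 13 ['A', 'B']
```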
  18. 3.2 results: N tokens
     • mean LRs = stable
     • standard deviation = stable
  19. 3.2 results: N tokens
     • continual increase in mean & SD as N tokens decreases
  20. 3.2 results: N tokens
     • massive increase in strength of evidence
  21. 3.3 results: dialect mismatch
     • dialect mismatch
       – 4 independent test sets
         • ONZE, Manchester, Newcastle and York
       – 102 speakers in the reference data
       – 13 tokens per speaker
  22. 3.3 results: dialect mismatch
     [Figure labels: ‘Support for prosecution (same speaker)’, ‘Support for defence (different speakers)’]
  23. 3.3 results: dialect mismatch
     • same-speaker pairs
  24. 3.3 results: dialect mismatch
     • different-speaker pairs
  25. 3.3 results: dialect mismatch
     • different-speaker pairs
  26. 4. discussion
     • number of speakers in the reference data
       – mean LRs & variance stable > ca. 45 speakers
     • different-speaker pairs more sensitive
       – a small reference distribution gives a misrepresentative estimate of the strength of evidence
       – findings consistent with Ishihara & Kinoshita (2008) and Hawkins & Clermont (2009)
  27. 4. discussion
     • number of tokens per speaker in the reference data
       – mean and SD for same-speaker pairs robust
       – different-speaker pairs very sensitive to the removal of even a single token
       – raises issues for Rose (2011)
         • 2 tokens of 5 vowel phonemes
  28. 4. discussion
     • dialect mismatch
       – same-speaker strength of evidence overestimated
         • generally by the equivalent of one verbal category
       – multitude of issues with different-speaker pairs
         • overestimation of LRs for York (BUT issues of between-speaker variation)
         • high levels of contrary-to-fact support for the prosecution for Manchester and Newcastle
         • potential miscarriages of justice
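Contrary-to-fact support can be counted directly from the two sets of LRs. A same-speaker LR below 1 wrongly supports the defence, and a different-speaker LR above 1 wrongly supports the prosecution. Below is a minimal sketch. The function name is hypothetical and the LR values are invented.

```python
import numpy as np


def contrary_to_fact_rates(same_speaker_lrs, diff_speaker_lrs):
    """Proportion of comparisons whose LR points the wrong way.

    Same-speaker pairs should give LR > 1; different-speaker pairs LR < 1.
    """
    same = np.asarray(same_speaker_lrs, dtype=float)
    diff = np.asarray(diff_speaker_lrs, dtype=float)
    false_defence = float(np.mean(same < 1))      # same speaker, but supports defence
    false_prosecution = float(np.mean(diff > 1))  # different speakers, but supports prosecution
    return false_defence, false_prosecution


print(contrary_to_fact_rates([20, 3, 0.5, 150], [0.01, 2.5, 0.2, 0.001]))  # (0.25, 0.25)
```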
  29. 5. conclusion
     • given these limitations, the implementation of a fully numerical LR framework for forensic voice comparison is still a way off
     • but the outlook is positive…
       – 45+ reference speakers
       – relatively large N tokens
       – awareness of sociolinguistic factors which affect within- and between-speaker variation
  30. Thanks. Questions?
     Vincent Hughes vh503@york.ac.uk
     Paul Foulkes pf11@york.ac.uk
  31. References
     Aitken, C. and Stoney, D. A. (1991) The use of statistics in forensic science. London: Ellis Horwood.
     Aitken, C. G. G. and Taroni, F. (2004) Statistics and the evaluation of evidence for forensic scientists (2nd edition). Chichester: John Wiley & Sons.
     Aitken, C. G. G. and Lucy, D. (2004) Evaluation of trace evidence in the form of multivariate data. Applied Statistics 54: 109-122.
     Brümmer, N. and du Preez, J. (2006) Application independent evaluation of speaker detection. Computer Speech and Language 20: 230-275.
     Champod, C. and Evett, I. W. (2000) Commentary on A. P. A. Broeders (1999) ‘Some observations on the use of probability scales in forensic identification’. Forensic Linguistics 7(2): 238-243.
     French, J. P. and Harrison, P. (2007) Position Statement concerning use of impressionistic likelihood terms in forensic speaker comparison cases. International Journal of Speech, Language and the Law 14(1): 137-144.
     French, J. P., Nolan, F., Foulkes, P., Harrison, P. and McDougall, K. (2010) The UK position statement on forensic speaker comparison: a rejoinder to Rose and Morrison. International Journal of Speech, Language and the Law 17(1): 138-163.
     Hawkins, S. and Clermont, F. (2009) A new approach to evaluating the likelihood ratio test for forensic speaker comparison: sample size, confidence intervals and intrinsic dimension. Unpublished paper submitted for presentation at Interspeech International Conference. Brighton, United Kingdom.
     Ishihara, S. and Kinoshita, Y. (2008) How many do we need? Exploration of the Population Size Effect on the performance of forensic speaker classification. Paper presented at the 9th Annual Conference of the International Speech Communication Association (Interspeech). Brisbane, Australia. 1941-1944.
     Kinoshita, Y., Ishihara, S. and Rose, P. (2009) Exploring the discriminatory potential of F0 distribution parameters in traditional speaker recognition. International Journal of Speech, Language and the Law 16(1): 91-111.
     Loakes, D. (2006) A forensic phonetic investigation into the speech patterns of identical and non-identical twins. PhD Dissertation, University of Melbourne.
     McDougall, K. (2006) Dynamic features of speech and the characterisation of speakers: towards a new approach using formant frequencies. International Journal of Speech, Language and the Law 13(1): 89-126.
  32. References
     McDougall, K. (2004) Speaker-specific formant dynamics: an experiment on Australian English /aɪ/. International Journal of Speech, Language and the Law 11(1): 103-130.
     Morrison, G. S. (2007) Matlab implementation of Aitken and Lucy’s (2004) Forensic Likelihood-Ratio Software Using Multivariate-Kernel-Density Estimation [software]. Available: http://geoff-morrison.net
     Morrison, G. S. (2008) Forensic voice comparison using likelihood ratios based on polynomial curves fitted to the formant trajectories of Australian English /aɪ/. International Journal of Speech, Language and the Law 15(2): 249-266.
     Robertson, B. and Vignaux, G. A. (1995) Interpreting evidence: evaluating forensic science in the courtroom. Chichester: John Wiley & Sons.
     Rose, P. (2007) Going and getting it – forensic speaker recognition from the perspective of a traditional practitioner-researcher. Paper presented at the Australian Research Council Network in Human Communication Science Workshop: FSI not CSI – Perspectives in State-of-the-Art Forensic Speaker Recognition, Sydney. http://forensic-voice-comparison.net/document
     Rose, P. (2011) Forensic voice comparison with Japanese vowel acoustics – a likelihood ratio-based approach using segmental cepstra. Proceedings of the 17th International Congress of Phonetic Sciences. 17-21 August 2011, Hong Kong. 1718-1721.
     Rose, P., Osanai, T. and Kinoshita, Y. (2003) Strength of forensic speaker identification evidence: multispeaker formant- and cepstrum-based segmental discrimination with a Bayesian likelihood ratio as threshold. Forensic Linguistics 10(2): 179-202.
     Rose, P., Kinoshita, Y. and Alderman, T. (2006) Realistic extrinsic forensic speaker discrimination with the diphthong /aɪ/. Proceedings of the 10th Australian Conference on Speech Science and Technology, 8-10 December 2004, Sydney: Macquarie University. 329-334.
     Rose, P. and Morrison, G. S. (2009) A response to the UK Position Statement on forensic speaker comparison. International Journal of Speech, Language and the Law 16(1): 139-163.
     Wells, J. C. (1982) Accents of English (3 vols). Cambridge: Cambridge University Press.
