
The role of regional variation in the quantification of strength of evidence for forensic voice comparison

Hughes, V. and Foulkes, P. (2013) The role of regional variation in the quantification of strength of evidence for forensic voice comparison. Paper presented at VALP2 Conference, University of Canterbury, Christchurch, NZ. 16-18 January 2013.

  1. 1. The role of regional variation in the quantification of strength-of-evidence for forensic voice comparison Vincent Hughes Paul Foulkes Department of Language and Linguistic Science Variation and Language Processing 2 Conference (VALP2) New Zealand Institute of Language, Brain and Behaviour (NZILBB) University of Canterbury 16-18 January 2013
  2. 2. 1.1 forensic voice comparison (FVC) 2 Hughes & Foulkes VALP2 • voice of criminal (disputed) vs. voice of suspect (known) – disputed (DS) = threatening phone calls, wire-tap recording, bomb threat... – known (KS) = police interview recording (in the UK) • “ultimate issue” (Lynch and McNally 2003: 96): do the known and disputed recordings contain the voice of the same or different individuals? – an issue for the trier-of-fact (not the expert) to determine
  3. 3. 1.2 likelihood ratio (LR) 3 Hughes & Foulkes VALP2 • “logically and legally correct framework” for assessing the strength of forensic comparison evidence (Rose & Morrison 2009: 143) • LR = p(E|Hp) / p(E|Hd) • Hp (prosecution hypothesis) = same-speaker • Hd (defence hypothesis) = different-speakers
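For reference, the relationship on this slide can be written out in full. The posterior-odds form below is the standard Bayesian reading of how an LR feeds into the trier-of-fact's decision; it is implied by the framework rather than stated explicitly on the slide.

```latex
\[
\mathrm{LR} \;=\; \frac{p(E \mid H_p)}{p(E \mid H_d)},
\qquad
\underbrace{\frac{p(H_p \mid E)}{p(H_d \mid E)}}_{\text{posterior odds}}
\;=\; \mathrm{LR} \times \underbrace{\frac{p(H_p)}{p(H_d)}}_{\text{prior odds}}
\]
```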
  4. 4. 1.3 what does this mean? 4 Hughes & Foulkes VALP2 [figure: schematic distributions of /i:/ evidence under Hp (prosecution) and Hd (defence)]
  5. 5. 5 Hughes & Foulkes VALP2 ...and for the trier-of-fact? [figure adapted from Berger (2012): scale running from innocent to guilty beyond a reasonable doubt]
  6. 6. 1.4 how does this work for FVC? 6 Hughes & Foulkes VALP2 [scatter plot: FLEECE /i:/ mid-point F1 and F2 values (Hz), Known Sample vs Disputed Sample] (i) how similar are the two samples with regard to the parameter under investigation?
  7. 7. 1.4 how does this work for FVC? 7 Hughes & Foulkes VALP2 [same scatter plot] (ii) what is the probability of these values assuming it’s within-speaker variation? p(E|Hp)
  8. 8. 1.4 how does this work for FVC? 8 Hughes & Foulkes VALP2 [same scatter plot] (iii) what is the probability of these values assuming it’s between-speaker variation? p(E|Hd)
  9. 9. 1.4 how does this work for FVC? 9 Hughes & Foulkes VALP2 • question (i) relates to similarity between KS and DS • to answer questions (ii) and (iii) we need to know the typicality of within-speaker and between-speaker variation • typicality = dependent on patterns in the relevant population (Aitken & Taroni 2004) – quantified relative to a sampled sub-section of that population (reference data)
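As a rough illustration of questions (i)-(iii), the sketch below computes a likelihood ratio for a single, invented acoustic parameter: the numerator models similarity to the known speaker's own tokens and the denominator models typicality against a reference population, both via kernel density estimates. This is a deliberately simplified, univariate stand-in; the study itself uses the Multivariate Kernel Density formula of Aitken and Lucy (2004), and all values below are invented.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)

# Invented F2 mid-point values (Hz) for FLEECE, purely for illustration.
known_tokens = rng.normal(2150, 40, size=16)          # known-speaker (suspect) tokens
disputed_tokens = rng.normal(2130, 45, size=16)       # disputed (criminal) tokens
reference_speakers = rng.normal(2050, 120, size=120)  # reference-population speaker means

# (ii) similarity: how probable are the disputed values if they come from
#      the same speaker as the known sample?  p(E|Hp)
within_kde = gaussian_kde(known_tokens)
log_p_same = np.sum(np.log(within_kde(disputed_tokens)))

# (iii) typicality: how probable are the disputed values if they come from
#       some other speaker in the relevant population?  p(E|Hd)
between_kde = gaussian_kde(reference_speakers)
log_p_diff = np.sum(np.log(between_kde(disputed_tokens)))

log10_lr = (log_p_same - log_p_diff) / np.log(10)
print(f"log10 LR = {log10_lr:.2f}  (positive supports Hp, negative supports Hd)")
```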
  10. 10. 1.5 defining the relevant population 10 Hughes & Foulkes VALP2 • Rose (2004: 4) default Hd = “same-sex speaker(s) of the language” – ‘logical relevance’ (Kaye 2004, 2008) • inevitable mismatch between the off-the-shelf data and the facts of the case at trial (Loakes 2006) • LRs necessarily vary with different reference data
  11. 11. 2.0 Research questions 11 Hughes & Foulkes VALP2 To what extent are LRs affected by dialect mismatch between target voice and reference data for... i) GOOSE – predicted to display little between-community variation? ii) PRICE – predicted to display considerable between-community variation?
  12. 12. 12 Hughes & Foulkes VALP2 Champod and Evett (2000) verbal scale:
      Raw LR          Log10 LR    Verbal expression
      >10000          4 to 5      Very strong evidence (support for Hp)
      1000-10000      3 to 4      Strong evidence (support for Hp)
      100-1000        2 to 3      Moderately strong evidence (support for Hp)
      10-100          1 to 2      Moderate evidence (support for Hp)
      1-10            0 to 1      Limited evidence (support for Hp)
      1-0.1           0 to -1     Limited evidence (support for Hd)
      0.1-0.01        -1 to -2    Moderate evidence (support for Hd)
      0.01-0.001      -2 to -3    Moderately strong evidence (support for Hd)
      0.001-0.0001    -3 to -4    Strong evidence (support for Hd)
      <0.0001         -4 to -5    Very strong evidence (support for Hd)
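A small helper (hypothetical, not from the paper) showing how a log10 LR would be mapped onto the Champod and Evett (2000) verbal scale above:

```python
def verbal_expression(log10_lr: float) -> str:
    """Map a log10 likelihood ratio onto the Champod & Evett (2000) verbal scale."""
    bands = [(4, "very strong"), (3, "strong"), (2, "moderately strong"), (1, "moderate")]
    side = ("the prosecution (same speaker)" if log10_lr >= 0
            else "the defence (different speakers)")
    magnitude = abs(log10_lr)
    for threshold, label in bands:
        if magnitude >= threshold:
            return f"{label} evidence in support of {side}"
    # magnitudes below 1 fall in the 'limited' band
    return f"limited evidence in support of {side}"

print(verbal_expression(2.3))   # moderately strong evidence ... prosecution
print(verbal_expression(-0.4))  # limited evidence ... defence
```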
  13. 13. 3.0 Method 13 Hughes & Foulkes VALP2 • dynamic time-normalised measurements (McDougall 2004, 2006) • GOOSE (F1 and F2) / PRICE (F1, F2 and F3) • data reduction using quadratic polynomials: y = ax² + bx + c • LR computed using the Multivariate Kernel Density formula (Aitken and Lucy 2004, Morrison 2007)
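A minimal sketch of the data-reduction step described on this slide: a quadratic polynomial y = ax² + bx + c is fitted to a time-normalised formant trajectory, and the three coefficients stand in for the raw measurements. The trajectory values are invented, and the subsequent MVKD likelihood-ratio computation (Aitken and Lucy 2004; Morrison 2007) is not reproduced here.

```python
import numpy as np

# Ten time-normalised measurement points over the vowel (0-100% of duration)
# and an invented F2 trajectory (Hz) for one PRICE token.
t = np.linspace(0.0, 1.0, 10)
f2 = np.array([1150, 1230, 1340, 1460, 1580, 1690, 1780, 1850, 1900, 1930], dtype=float)

# Quadratic fit: np.polyfit returns [a, b, c] for y = a*t**2 + b*t + c.
a, b, c = np.polyfit(t, f2, deg=2)
print(f"a = {a:.1f}, b = {b:.1f}, c = {c:.1f}")

# These three coefficients (per formant, per token) replace the raw
# trajectory as the parameters entered into the LR computation.
```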
  14. 14. • test data (4 sets) – NZE/ Manchester/ Newcastle/ York – 8 male speakers per set (aged 16-31) – 16 tokens per speaker (coded for context) • reference data (1 set) – ONZE (Canterbury Corpus) – 120 male speakers (born 1932-1987) – 10 tokens per speaker (coded for context) – auto-generated formant data 14 Hughes & Foulkes VALP2 4.1 Results: GOOSE /u:/
  15. 15. 4.1 Results: why GOOSE? 15 • not a regional stereotype (Labov 1971) of any of the test set dialects [scatter plot: GOOSE F1 (Hz) vs F2 (Hz) for the Manchester, Newcastle, York and ONZE speakers]
  16. 16. 4.1 Results: Tippett plots 16 Hughes & Foulkes VALP2 [Tippett plot: cumulative proportion against log10 likelihood ratio; labels mark support for the prosecution (same speaker) and support for the defence (different speakers)]
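For readers unfamiliar with Tippett plots, the sketch below shows one common way of drawing them from two sets of log10 LR scores (same-speaker and different-speaker comparisons). The scores are invented, numpy and matplotlib are assumed to be available, and plotting conventions vary slightly between authors.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Invented log10 LR scores, purely for illustration.
ss_llrs = rng.normal(1.5, 1.0, size=100)    # same-speaker comparisons
ds_llrs = rng.normal(-2.0, 1.5, size=400)   # different-speaker comparisons

# One common convention: same-speaker curve = proportion of SS LRs at or
# above each value; different-speaker curve = proportion of DS LRs at or
# below each value.
ss_sorted = np.sort(ss_llrs)
ss_prop = 1.0 - np.arange(len(ss_sorted)) / len(ss_sorted)
ds_sorted = np.sort(ds_llrs)
ds_prop = np.arange(1, len(ds_sorted) + 1) / len(ds_sorted)

plt.plot(ss_sorted, ss_prop, label="same-speaker pairs")
plt.plot(ds_sorted, ds_prop, label="different-speaker pairs")
plt.axvline(0.0, color="grey", linewidth=0.8)
plt.xlabel("Log10 Likelihood Ratio")
plt.ylabel("Cumulative Proportion")
plt.legend()
plt.show()
```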
  17. 17. 4.1 Results: GOOSE (F1 and F2) 17 Hughes & Foulkes VALP2 [Tippett plots: same-speaker pairs; reference data: ONZE (match), Newcastle, Manchester, York]
  18. 18. 4.1 Results: GOOSE (F1 and F2) 18 Hughes & Foulkes VALP2 [Tippett plots: same-speaker and different-speaker pairs; reference data: ONZE (match), Newcastle, Manchester, York]
  19. 19. 4.1 Results: GOOSE (F1 and F2) 19 Hughes & Foulkes VALP2 [Tippett plots: same-speaker and different-speaker pairs; reference data: ONZE (match), Newcastle, Manchester, York; annotations: 71%, 58%]
  20. 20. • test data (1 set): – 20 SSBE speakers (male, young, MC) – 10 tokens per speaker • reference data (2 sets): tailored = 32 Standard Southern British English (SSBE) speakers (match with test set) mixed = 32 BrEng speakers (SSBE, Derby, Manchester, Newcastle) - young, males, spontaneous speech 20 Hughes & Foulkes VALP2 4.1 Results: PRICE /aɪ/
  21. 21. 4.1 Results: Why PRICE? • regional stereotype across these varieties • interested in whether regional variation is also encoded in F3 [formant trajectory plot: frequency (Hz) against +10% time steps for DyVis, Derby, Manchester and Newcastle]
  22. 22. 4.2 Results: PRICE (F1, F2 and F3) 22 Hughes & Foulkes VALP2 [Tippett plots: same-speaker and different-speaker pairs; reference data: tailored vs gen BrEng]
  23. 23. 4.2 Results: PRICE (F1, F2 and F3) 23 Hughes & Foulkes VALP2 [Tippett plots: same-speaker and different-speaker pairs; reference data: tailored vs gen BrEng; annotation: 35%]
  24. 24. 4.2 Results: PRICE (F3 only) 24 Hughes & Foulkes VALP2 [Tippett plots: same-speaker and different-speaker pairs; reference data: tailored vs gen BrEng]
  25. 25. 5.1 Discussion: GOOSE 25 Hughes & Foulkes VALP2 • same-speaker strength of evidence overestimated – generally by the equivalent of one verbal category • multitude of issues with different-speaker pairs – overestimation of LRs for York (BUT issues of between-speaker variation) – high levels of contrary-to-fact support for the prosecution for Manchester and Newcastle – potential miscarriages of justice
  26. 26. 5.2 Discussion: PRICE 26 Hughes & Foulkes VALP2 • same-speaker (SS) evidence overestimated using F1, F2 and F3 – tailored = ‘moderate support’; mixed = ‘moderately strong support’ • SS and different-speaker (DS) evidence similar across reference sets using F3 only • underestimation of DS evidence with the mixed reference set even using F3 only – regional information encoded in F3? – little sociophonetic research on F3
  27. 27. 27 Hughes & Foulkes VALP2 6.0 Conclusion • predictably, dialect matters - even for features which aren’t expected to display considerable variation according to region • structured variation = invaluable resource for assessing typicality - helps us to provide a more meaningful estimation of strength of evidence
  28. 28. more broadly, the LR makes us think about... • fine-grained variation between communities • the extent of between-speaker variation within a single community • defining truly homogeneous speech communities • modelling the possible ranges of within-speaker variation = essential in order to provide a meaningful estimate of strength of evidence - good speaker discriminants need to display high between-speaker variability and low within-speaker variability 28 Hughes & Foulkes VALP2 6.0 Conclusion
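The last point, that a good speaker discriminant combines high between-speaker and low within-speaker variability, is often operationalised as a simple variance ratio; the sketch below is a generic illustration with invented data, not an analysis from the paper.

```python
import numpy as np

def variance_ratio(tokens_by_speaker):
    """Variance of speaker means divided by the mean within-speaker variance:
    higher values suggest a better speaker discriminant."""
    means = np.array([np.mean(t) for t in tokens_by_speaker])
    within = np.mean([np.var(t, ddof=1) for t in tokens_by_speaker])
    between = np.var(means, ddof=1)
    return between / within

rng = np.random.default_rng(2)
# Invented data: 8 speakers x 16 tokens of one acoustic parameter each.
speakers = [rng.normal(rng.normal(2050, 120), 40, size=16) for _ in range(8)]
print(f"between/within variance ratio: {variance_ratio(speakers):.2f}")
```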
  29. 29. Thanks Questions? 29 Vincent Hughes vh503@york.ac.uk Hughes & Foulkes VALP2
  30. 30. References 30 Hughes & Foulkes VALP2
  Aitken, C. G. G. and Taroni, F. (2004) Statistics and the evaluation of evidence for forensic scientists (2nd edition). Chichester: John Wiley & Sons.
  Berger, C. (2012) Modern evidential interpretation, reporting and fallacies. Lecture given at the BBfor2 Summer School in Forensic Evidence Evaluation and Validation. Universidad Autonoma de Madrid, Spain. 18-21 July 2012.
  Brümmer, N. and du Preez, J. (2006) Application independent evaluation of speaker detection. Computer Speech and Language 20: 230-275.
  Champod, C. and Evett, I. W. (2000) Commentary on A. P. A. Broeders (1999) ‘Some observations on the use of probability scales in forensic identification’. Forensic Linguistics 7(2): 238-243.
  Ishihara, S. and Kinoshita, Y. (2008) How many do we need? Exploration of the population size effect on the performance of forensic speaker classification. Paper presented at the 9th Annual Conference of the International Speech Communication Association (Interspeech). Brisbane, Australia. 1941-1944.
  Kaye, D. H. (2004) Logical relevance: problems with the reference population and DNA mixtures in People v. Pizarro. Law, Probability and Risk 3: 211-220.
  Kaye, D. H. (2008) DNA probabilities in People v. Prince: When are racial and ethnic statistics relevant? In Speed, T. and Nolan, D. (eds.) Probability and Statistics: Essays in Honour of David A. Freedman. Beachwood, OH: Institute of Mathematical Statistics. 289-301.
  31. 31. 31 Hughes & Foulkes VALP2
  Kinoshita, Y., Ishihara, S. and Rose, P. (2009) Exploring the discriminatory potential of F0 distribution parameters in traditional speaker recognition. International Journal of Speech, Language and the Law 16(1): 91-111.
  Labov, W. (1971) The study of language in its social context. In Fishman, J. A. (ed.) Advances in the Sociology of Language (vol. 1). The Hague: Mouton. 152-216.
  Loakes, D. (2006) A forensic phonetic investigation into the speech patterns of identical and non-identical twins. PhD dissertation, University of Melbourne.
  McDougall, K. (2004) Speaker-specific formant dynamics: An experiment on Australian English /aɪ/. International Journal of Speech, Language and the Law 11(1): 103-130.
  McDougall, K. (2006) Dynamic features of speech and the characterisation of speakers: towards a new approach using formant frequencies. International Journal of Speech, Language and the Law 13(1): 89-126.
  Morrison, G. S. (2007) Matlab implementation of Aitken and Lucy’s (2004) forensic likelihood-ratio software using multivariate-kernel-density estimation [software]. Available: http://geoff-morrison.net.
  Morrison, G. S. (2008) Forensic voice comparison using likelihood ratios based on polynomial curves fitted to the formant trajectories of Australian English /aɪ/. International Journal of Speech, Language and the Law 5(2): 249-266.
  Rose, P. (2004) Technical Forensic Speaker Identification from a Bayesian Linguist's Perspective. Keynote paper, Forensic Speaker Recognition Workshop, Speaker Odyssey ’04. 31 May - 3 June 2004, Toledo, Spain. 3-10.
  32. 32. 32 Hughes & Foulkes VALP2
  Rose, P. (2011) Forensic voice comparison with Japanese vowel acoustics – a likelihood ratio-based approach using segmental cepstra. Proceedings of the 17th International Congress of Phonetic Sciences. 17-21 August 2011, Hong Kong. 1718-1721.
  Rose, P., Osanai, T. and Kinoshita, Y. (2003) Strength of forensic speaker identification evidence: multispeaker formant- and cepstrum-based segmental discrimination with a Bayesian likelihood ratio as threshold. Forensic Linguistics 10(2): 179-202.
  Rose, P., Kinoshita, Y. and Alderman, T. (2006) Realistic extrinsic forensic speaker discrimination with the diphthong /aɪ/. Proceedings of the 10th Australian Conference on Speech Science and Technology, 8-10 December 2004, Sydney: Macquarie University. 329-334.
  Rose, P. and Morrison, G. S. (2009) A response to the UK Position Statement on forensic speaker comparison. International Journal of Speech, Language and the Law 16(1): 139-163.
  Wells, J. C. (1982) Accents of English (3 vols). Cambridge: Cambridge University Press.
