What is the relevant population? Considerations for the computation of likelihood ratios in forensic voice comparison

Hughes, V. and Foulkes, P. (2017) What is the relevant population? Considerations for the computation of likelihood ratios in forensic voice comparison. Paper presented at Interspeech, University of Stockholm, Sweden. 20-24 August 2017.

  1. What is the relevant population? Considerations for the computation of likelihood ratios in forensic voice comparison
     Vincent Hughes and Paul Foulkes
  2. 1. The likelihood ratio (LR)
     LR = p(E|Hp,I) / p(E|Hd,I)
     where:
     • p = probability
     • E = evidence (observations)
     • Hp = prosecution proposition (same speaker: offender = suspect)
     • Hd = defence proposition (different speakers: offender ≠ suspect)
     • I = background information
  3. 1. The likelihood ratio (LR)
     • Hicks et al. (2015) definition of propositions:
       – should not include observations (unless the observations have no value or require no expert knowledge to evaluate)
     • Morrison et al. (2016):
       – expert should refine the relevant population based on properties of the offender
       – based on accent (regional background) and sex (male/female), which will “usually be perceptually salient to all parties”
     • But… this fails to acknowledge the complexity of speech evidence
  4. 2. Speech evidence
     • Lay listeners are not good at determining group-level characteristics (Van Bezooijen and Gooskens 1999)
     • Speech community: overlapping regional/social groups
       – multiple potential groups
       – indirect link between speech community and linguistic output
       – speakers do not have a monolithic way of speaking
     • Morrison et al. (2016): implication that accent and sex are the most important characteristics and are easily extractable from the offender sample
       – neither is true…
  5. 3. Considerations
     a. Group-level characteristics to control
     b. Specificity
     c. Error
     d. Certainty
  6. 4. This study
     a. Group-level characteristics to control
     b. Specificity
     c. Error
     d. Certainty
     To what extent are LRs and system performance affected by the specificity of the relevant population?
  7. 5. Method: Features
     • 121 young male speakers
       – 72 Standard Southern British English (SSBE)
       – 8 Manchester, 8 Derby, 8 Newcastle
     • F1-F3 trajectories of /aɪ/ (10-43 tokens/speaker)
     • Fitted with cubic polynomial curves
     • 4 input values per formant
  8. 5. Method: LR Testing
     • Test data = 40 SSBE speakers
     • Population conditions:
       – Matched = 32 SSBE speakers: “the voice in the offender sample does not belong to the suspect but to another male speaker of SSBE”
       – Mixed = 8 SSBE / 8 Derby / 8 Manchester / 8 Newcastle speakers: “the voice in the offender sample does not belong to the suspect but to another male speaker of British English”
  9. 5. Method: LR Testing
     • Cross-validated scores computed separately for Matched and Mixed sets using the Multivariate Kernel Density (MVKD; Aitken and Lucy 2004) approach
       – used to calculate logistic regression calibration coefficients
     • Separate sets of scores computed for the test data using Matched and Mixed sets as reference data
       – calibration coefficients applied to appropriate sets
       – output = parallel sets of Matched and Mixed same-speaker (40) and different-speaker (1560) log LRs (LLRs)
  10. 5. Method: LR Testing
      • Experiment repeated using different input:
        – F1, F2, and F3
        – F2 and F3
        – F3-only
      • System performance evaluated using equal error rate (EER) and log LR cost (Cllr)
        – in both cases, optimal performance closer to zero
  11. 6. Results: F1, F2 and F3 [figure: Matched and Mixed panels]
  12. 6. Results: F2 and F3 [figure: Matched and Mixed panels]
  13. 6. Results: F3-only [figure: Matched and Mixed panels]
  14. 6. Results: Validity [figure: EER and Cllr panels]
  15. 7. Discussion
      • Same-speaker (SS) evidence comparable across Matched and Mixed definitions
        – different-speaker (DS) evidence considerably weaker (by up to four orders of magnitude)
      • Validity consistently worse using Mixed definition (by up to 7% EER and 0.15 Cllr)
      • Removal of F1 and F2:
        – lower magnitude LRs and poorer system performance
        – but… smaller difference between Matched and Mixed output
  16. 7. Discussion
      • In principle… what we should do / what we want to do
      • Translating this into the real world = problematic
      • Casework: what we can do given the limitations of the data that are available
      • Courtroom: explaining that strength of evidence is based on assumptions, and explaining these assumptions
  17. 8. Conclusions
      • Effects of different definitions of the relevant population on LRs and system performance
        – factors to control
        – specificity
        – error
        – certainty
      • “Open scientific debate, in an atmosphere of mutual respect, is a key enabler to progress, especially when it comes to the complexity of interpretative issues in forensic science” (Hicks et al. 2017)
      • Sensible decisions rely on:
        – sociolinguistic knowledge of the community
        – understanding the potential magnitude and direction of effects of different assumptions
  18. vincent.hughes@york.ac.uk
      www.vincent-hughes.com
      Follow me on Twitter: @VinceH_Forensic
      Thanks to Richard Rhodes, Jonas Lindh and Michael Jessen for comments and suggestions
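
The feature extraction described on the Method: Features slide (formant trajectories fitted with cubic polynomial curves, 4 input values per formant) can be sketched as below. This is a minimal illustration, not the authors' implementation. It assumes the four values are the polynomial coefficients, that each trajectory is sampled at 10 time-normalised points between 0 and 1, and it uses an invented F2 trajectory. The helper name fit_cubic is hypothetical.

```python
def fit_cubic(times, values):
    """Least-squares cubic fit.

    Returns [c0, c1, c2, c3] for the curve c0 + c1*t + c2*t**2 + c3*t**3.
    """
    n_coef = 4
    # Normal equations (X^T X) c = X^T y, where X has columns 1, t, t^2, t^3.
    a = [[sum(t ** (i + j) for t in times) for j in range(n_coef)]
         for i in range(n_coef)]
    b = [sum(v * t ** i for t, v in zip(times, values)) for i in range(n_coef)]

    # Gaussian elimination with partial pivoting.
    for col in range(n_coef):
        pivot = max(range(col, n_coef), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n_coef):
            factor = a[row][col] / a[col][col]
            for k in range(col, n_coef):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]

    # Back substitution.
    coefs = [0.0] * n_coef
    for i in range(n_coef - 1, -1, -1):
        tail = sum(a[i][j] * coefs[j] for j in range(i + 1, n_coef))
        coefs[i] = (b[i] - tail) / a[i][i]
    return coefs


# Hypothetical F2 trajectory (Hz) for one /aɪ/ token, invented for illustration.
times = [i / 9 for i in range(10)]
f2_hz = [1200 + 900 * t - 300 * t ** 2 + 150 * t ** 3 for t in times]

coefs = fit_cubic(times, f2_hz)  # four values describing this formant
```

Under these assumptions, repeating the fit for F1, F2 and F3 would give 12 values per token, which is one plausible reading of how the trajectories become input to the LR system.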

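The two validity metrics named on the Method: LR Testing slide, EER and Cllr, can be sketched as below. This is a hedged illustration rather than the authors' evaluation code. It assumes the LLRs are base-10 logs, approximates the EER by a simple threshold sweep, and uses invented LLR values. The function names cllr and eer are hypothetical.

```python
import math


def cllr(ss_llrs, ds_llrs):
    """Log LR cost, assuming base-10 log LRs as input."""
    # Same-speaker pairs are penalised for low LRs, different-speaker
    # pairs for high LRs.
    ss_cost = sum(math.log2(1 + 10 ** (-x)) for x in ss_llrs) / len(ss_llrs)
    ds_cost = sum(math.log2(1 + 10 ** x) for x in ds_llrs) / len(ds_llrs)
    return 0.5 * (ss_cost + ds_cost)


def eer(ss_llrs, ds_llrs):
    """Approximate equal error rate via a sweep over observed LLR values."""
    best_gap, best_rate = None, None
    for thr in sorted(set(ss_llrs) | set(ds_llrs)):
        # Miss: same-speaker pair falling below the threshold.
        miss = sum(x < thr for x in ss_llrs) / len(ss_llrs)
        # False alarm: different-speaker pair at or above the threshold.
        false_alarm = sum(x >= thr for x in ds_llrs) / len(ds_llrs)
        gap = abs(miss - false_alarm)
        if best_gap is None or gap < best_gap:
            best_gap, best_rate = gap, (miss + false_alarm) / 2
    return best_rate


# Invented LLRs for illustration only (not results from the study).
ss = [2.0, 1.0, 0.5, -0.5]
ds = [-2.0, -1.0, -0.5, 0.5]

system_cllr = cllr(ss, ds)
system_eer = eer(ss, ds)
```

A system whose LLRs are all zero (LR = 1 for every comparison) gives Cllr = 1 under this definition, so values below 1 indicate that the system is providing useful information. This is consistent with the slide's note that optimal performance is closer to zero for both metrics.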