
The relevant population and the estimation of typicality in traditional linguistic-phonetic forensic voice comparison

Hughes, V. (2013) The relevant population and the estimation of typicality in traditional linguistic-phonetic forensic voice comparison. Department of Electrical and Computer Engineering (ECE) Seminar, University of Auckland, Auckland, NZ. 28 March 2013. (INVITED TALK)


  1. The relevant population and the estimation of typicality in traditional linguistic-phonetic forensic voice comparison. Vincent Hughes, Department of Language and Linguistic Science. Department of Engineering Seminar, The University of Auckland, 28th March 2013
  2. 0. Outline. Focus of this talk: • introduction to linguistic-phonetic forensic voice comparison (FVC) and the likelihood ratio (LR) • estimating typicality: default defence hypothesis, reference data • theoretical concerns: what is the relevant population? • practical concerns: how much reference data?
  3. 1. Introduction
  4. • forensic voice comparison (FVC) = voice of criminal (disputed) vs. voice of suspect (known) – disputed sample (DS) = threatening phone calls, bomb threats... increasingly recorded via telephone transmission/mobile phone recording devices – known sample (KS) = police interview recording (in the UK) • auditory-acoustic phonetic approach (Gold and French 2011) – detailed analytical listening/acoustic analysis
  5. • range of parameters analysed (Gold and French 2011): – segmental (vowels, consonants) – suprasegmental (f0, intonation, articulation rate) – higher-order linguistic (lexical choice, syntax) – voice quality/vocal setting • mixture of continuous/discrete data: – often normally distributed, but not necessarily (cf. Aitken and Gold 2012)
  6. 1. Introduction: within-speaker variability • speech is inherently variable within individuals – two utterances produced by the same individual will never be exactly the same • therefore, unlike DNA, there will never be a 1:1 match between DS and KS even assuming Hp • multiple sources of within-speaker variability – long term = ageing/habitual behaviour (smoking etc.) – short term = stylistic factors/time of day/emotion
  7. 1. Introduction: between-speaker variation • constrained by (amongst others): – anatomical factors (size + shape of vocal tract) – phonological factors (preceding/following sound) – regional/social factors (dialect/age/sex/ethnicity) • all of these interact with each other and affect different linguistic-phonetic parameters in different ways
  8. • good discriminants = low within-speaker variability/high between-speaker variation (Nolan 1983) – but most linguistic-phonetic parameters are relatively poor discriminants • componential approach preferred: – multidimensional speaker-space (Nolan 1991) – provides useful strength of evidence to the courts
  9. 1. Introduction • courts are dependent on notions of conditional probability • in the courtroom we have two mutually exclusive hypotheses: – either the defendant is innocent or guilty • the formal way to think about conditional probability in the courtroom is to use Bayes' Theorem (1763)
  10. 1. Introduction: odds form of Bayes' Theorem – prior odds × evidence = posterior odds
  11.–13. $\frac{p(H_p \mid E)}{p(H_d \mid E)} = \frac{p(E \mid H_p)}{p(E \mid H_d)} \times \frac{p(H_p)}{p(H_d)}$ where p = probability; Hp = prosecution hypothesis (guilty); Hd = defence hypothesis (innocent); E = evidence; | = 'given'. The prior and posterior odds are the province of the trier of fact (judge/jury); the evidence term (the likelihood ratio) is the province of the forensic expert
  14. 1. Introduction: likelihood ratio (LR) $\mathrm{LR} = \frac{p(E \mid H_p)}{p(E \mid H_d)}$ • gradient assessment of strength of evidence – LR > 1 = support for prosecution – LR < 1 = support for defence • logically and legally correct [figure: verdict scale from 'innocent' to 'guilty beyond a reasonable doubt', adapted from Berger (2012)]
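A minimal sketch of how such an LR might be computed for a single continuous measurement, assuming purely illustrative Gaussian models for the numerator (the suspect's own within-speaker distribution, the similarity term) and the denominator (the relevant population, the typicality term); all values here are invented:

```python
from scipy.stats import norm

# Hypothetical example: one acoustic measurement from the disputed sample,
# e.g. a vowel midpoint F2 value in Hz (all numbers invented).
x = 1450.0

# p(E|Hp): likelihood of the measurement under a model of the suspect's
# within-speaker distribution, estimated from the known sample.
p_e_given_hp = norm.pdf(x, loc=1440.0, scale=60.0)

# p(E|Hd): likelihood under a model of the relevant population,
# estimated from reference data.
p_e_given_hd = norm.pdf(x, loc=1600.0, scale=150.0)

lr = p_e_given_hp / p_e_given_hd
print(f"LR = {lr:.2f}")  # LR > 1 supports Hp, LR < 1 supports Hd
```

In casework both densities are estimated with more sophisticated models (such as the MVKD formula discussed later in the talk), but the similarity/typicality logic is the same.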
  15. • LR = similarity and typicality – it matters "whether the values found matching (…) are vanishingly rare, or sporadic, or near universal" (Nolan 2001:16) – typicality of values within- and between-speakers • typicality = dependent on patterns in the "relevant population" (Aitken and Taroni 2004) – quantified relative to a sample of the population – distributions modelled statistically to generate numerical output
  16. 2. Theoretical issues
  17. 2. The relevant population • depends on the question being asked – defined by the defence hypothesis • logically the relevant population is the same across all forms of expert evidence • problem: the defence offers a vague alternative hypothesis, or none at all – need an assumed default defence hypothesis which is more specific than "it was someone else in the population"
  18. 2.1 Current approaches: (i) "logical relevance" (Kaye 2004, 2008) • frequency of variables in the population is constrained by certain factors • between-speaker variation – "language, sex" (Rose 2004:4) – age (Loakes 2006; Rose 2012) • within-speaker variation – "non-contemporaneity" (Rose, Morrison…)
  19. 2.1 Current approaches: problems • can't ever truly know what is logically relevant • reflects a limited view of variation • why sex and language above other sources of between-speaker variation? – how narrowly do we define things like regional background? • sources of within-speaker variation in KS and DS not captured by 'non-contemporaneity' – not recreating the important facts of the case at trial – different interlocutor/style/topic…
  20. 2.1 Current approaches: (ii) speaker-similarity (Morrison et al. 2012) • similar-sounding speakers to the offender as judged by lay listeners – because it was a lay listener (police officer) who made the decision to submit the samples for analysis • listeners match characteristics of the person who made the original decision: – e.g. young, male police officer from Melbourne…
  21. 2.1 Current approaches: problems • limited view of variation in production and perception • what factors do we control in our listeners? • what do the listeners hear? – some controls on the part of the expert (usually sex and language again) • lay listeners are linguistically erratic when it comes to assessing speaker-similarity (McDougall 2011) – inconsistent with a single conceptual relevant population across all types of forensic evidence – relevant population won't be "those persons who could have been involved (in the crime)" (Coleman and Walls 1974:276)
  22. 2.2 Research questions • to what extent are LRs affected by different definitions of the relevant population? – little understanding of how the LR is affected by variation in the reference data – which factors to control and which to ignore? (there are so many!)
  23. 2.3 General methods • single test set from which multiple pairs of speakers (with known outcomes) are compared – compare LRs with different reference sets – estimate of similarity remains consistent across conditions – focus on typicality element
  24. 2.3 General methods • dynamic time-normalised measurements (McDougall 2004, 2006) • data reduction using quadratic polynomials: $y = ax^2 + bx + c$ (see the sketch below) • LR computed using the Multivariate Kernel Density (MVKD) formula (Aitken and Lucy 2004, Morrison 2007)
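A sketch of this data-reduction step, assuming a formant trajectory sampled at equidistant points through the vowel; numpy's polyfit returns the quadratic coefficients a, b, c, which then replace the raw measurements as the token's feature vector (the trajectory values below are invented):

```python
import numpy as np

# Hypothetical F2 trajectory for one /ai/ token, sampled at 10 equidistant
# points across the vowel (values in Hz, invented for illustration).
steps = np.linspace(0, 1, 10)  # normalised time, 0-100% of the vowel
f2 = np.array([1100, 1150, 1230, 1330, 1440,
               1550, 1650, 1730, 1790, 1830], dtype=float)

# Fit y = a*x^2 + b*x + c; the three coefficients stand in for the
# ten raw measurements as the token's features.
a, b, c = np.polyfit(steps, f2, deg=2)
print(f"a = {a:.1f}, b = {b:.1f}, c = {c:.1f}")
```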
  25. 2.3 General methods [Tippett plot: cumulative proportion against log10 likelihood ratio; same-speaker pairs fall towards log10 LR > 0 (support for prosecution), different-speaker pairs towards log10 LR < 0 (support for defence)]
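A minimal sketch of how a plot of this kind might be drawn from sets of same- and different-speaker log10 LRs; the scores are randomly generated stand-ins, and plotting conventions for Tippett plots vary (some plot the same-speaker curve as the proportion of LRs at or above each value):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Stand-in scores: same-speaker log10 LRs tend to fall above 0,
# different-speaker log10 LRs below 0 (both sets invented).
ss_llrs = rng.normal(1.0, 1.0, 20)    # e.g. one SS comparison per speaker
ds_llrs = rng.normal(-1.5, 1.2, 190)  # C(20, 2) = 190 DS pairs

def cumulative(scores):
    """Sorted scores and the cumulative proportion at or below each one."""
    s = np.sort(scores)
    return s, np.arange(1, len(s) + 1) / len(s)

plt.plot(*cumulative(ss_llrs), label="same-speaker pairs")
plt.plot(*cumulative(ds_llrs), label="different-speaker pairs")
plt.axvline(0, color="grey", linestyle=":")
plt.xlabel("Log10 Likelihood Ratio")
plt.ylabel("Cumulative Proportion")
plt.legend()
plt.show()
```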
  26. 2.3 General methods: performance metrics (sketched in code below) • equal error rate (EER): – "hard detector" → based on categorical accept/reject – % false accept = % false reject • log LR cost function (Cllr): – "soft detector" → gradient goodness of a set of LRs – closer to 0 = better
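A sketch of both metrics, assuming base-10 log LRs as input; the Cllr formula is the standard one from Brümmer and du Preez (2006), and the EER here is a simple threshold-sweep approximation rather than any particular toolkit's implementation:

```python
import numpy as np

def cllr(ss_llrs, ds_llrs):
    """Log LR cost for same- and different-speaker base-10 log LRs."""
    ss_lr = 10.0 ** np.asarray(ss_llrs)
    ds_lr = 10.0 ** np.asarray(ds_llrs)
    # Penalises SS LRs below 1 and DS LRs above 1, weighted by magnitude.
    return 0.5 * (np.mean(np.log2(1.0 + 1.0 / ss_lr)) +
                  np.mean(np.log2(1.0 + ds_lr)))

def eer(ss_llrs, ds_llrs):
    """Approximate EER: sweep a hard accept/reject threshold over all scores."""
    ss_llrs = np.asarray(ss_llrs)
    ds_llrs = np.asarray(ds_llrs)
    thresholds = np.sort(np.concatenate([ss_llrs, ds_llrs]))
    false_rej = np.array([np.mean(ss_llrs < t) for t in thresholds])
    false_acc = np.array([np.mean(ds_llrs >= t) for t in thresholds])
    i = np.argmin(np.abs(false_rej - false_acc))  # where the rates cross
    return 0.5 * (false_rej[i] + false_acc[i])
```

A set of LRs can have a low EER but a poor Cllr if the LRs are badly calibrated, which is why both metrics are reported.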
  27. 3.1 Regional background
  28. 3.1 PRICE /ai/ (e.g. size, lies, dice, mice) • test data (1 set) – 20 Standard Southern British English (SSBE) speakers – 10 tokens per speaker • reference data (2 sets) – tailored = 32 SSBE speakers (match with test set); DyViS database (Nolan et al. 2009) – mixed = 32 BrEng speakers (SSBE, Derby, Manchester, Newcastle) – young males, spontaneous speech, mock police interview
  29. 3.1 PRICE /ai/ (e.g. size, lies, dice, mice) • regional stereotype across these varieties • interested in whether regional variation is also encoded in F3 [figure: formant trajectories, frequency (Hz, 0–3000) against +10% steps through the vowel, for the DyViS, Derby, Manchester and Newcastle speakers]
  30. 3.1 PRICE /ai/ (e.g. size, lies, dice, mice) [figure: Tippett plots of same-speaker and different-speaker pairs, tailored vs. general BrEng reference sets]
  31. 3.1 PRICE /ai/ (e.g. size, lies, dice, mice) [figure: bar charts of EER (%) and Cllr for the F1–F3, F2–F3 and F3 parameter combinations, comparing tailored and mixed reference data]
  32. 3.2 Socio-economic class
  33. 3.2 FACE /ei/ (e.g. base, case, race, disgrace) • all data from the Canterbury Corpus of the Origins of New Zealand English (ONZE) database (Gordon et al. 2007) • socio-economic class: – test data (1 set) = 44 professional male speakers/8 tokens per speaker – reference data (3 sets, 40 speakers per set): tailored = all professional (males); mismatch = all non-professional (males); mixed = equal mix of professional/non-professional (males)
  34. 3.2 FACE /ei/: Cllr by reference set (F1, F2 and F3)
      Reference set   Cllr
      Tailored        0.936
      Mismatch        0.753
      Mixed           0.851
  35. 4. Practical issues
  36. 4. Reference data (i) case-by-case basis (Rose 2007) • need reference data for every feature we would want to analyse • time consuming/still an inevitable mismatch with the facts of the case at trial (ii) existing database (e.g. Nolan et al. 2009) • may be forensic/sociolinguistic • appropriate? enough data?
  37. 4. Reference data • how many speakers do we need? – depends on your desired level of precision (Rose 2012, Wackerly et al. 2008) – Ishihara and Kinoshita (2008): LRs not robust to a small N of reference speakers – Rose (2012): same-speaker LRs asymptotic by around 30 speakers • how much data per speaker do we need? – meaningful estimation of within-speaker variation
  38. 4.1 Pilot study: small N
  39. 4.1.1 N speakers: /ai/, same-speaker pairs
  40. 4.1.2 N tokens per speaker: /ai/, same-speaker pairs
  41. 4.1.1 N speakers: /ai/, different-speaker pairs
  42. 4.2 Monte Carlo simulations: large N
  43. 4.2 Monte Carlo simulations • Monte Carlo simulations (MCS) offer a way of investigating upper limits – use MCS to generate synthetic data from a sample of raw data – requires the raw data to be representative (!!) – properties of the distribution of synthetic data defined by the raw data – Rose (2012): no real differences in LRs between 30 and 10,000 speakers
  44. 4.2 Monte Carlo simulations • MCS to test N speakers/N tokens per speaker for local articulation rate (AR) – data from Gold (in prep) – raw data = 79 speakers/26 'tokens' per speaker – generated synthetic mean and SD values for new speakers (up to 10,000 speakers) – then generated 26 tokens per speaker from each of the synthetic normal distributions (a sketch of this procedure follows below)
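A minimal sketch of the kind of Monte Carlo procedure described, assuming each speaker's articulation-rate tokens are roughly normally distributed, with speaker means and SDs themselves drawn from distributions fitted to the raw sample; all parameter values below are invented stand-ins, not figures from the study:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population-level statistics, as would be estimated from the
# raw sample (79 speakers x 26 tokens); all values invented for illustration.
mean_of_means, sd_of_means = 5.2, 0.5  # speaker mean AR (syll/sec)
mean_of_sds, sd_of_sds = 0.45, 0.10    # speakers' within-speaker SDs

def synthesise_speakers(n_speakers, n_tokens=26):
    """Generate synthetic AR tokens for n_speakers new synthetic speakers."""
    means = rng.normal(mean_of_means, sd_of_means, n_speakers)
    sds = np.abs(rng.normal(mean_of_sds, sd_of_sds, n_speakers))
    # One row per synthetic speaker: n_tokens tokens drawn from N(mean, sd).
    return np.array([rng.normal(m, s, n_tokens) for m, s in zip(means, sds)])

tokens = synthesise_speakers(10_000)
print(tokens.shape)  # (10000, 26)
```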
  45. 4.2.1 N speakers: local AR
  46. 4.2.1 N speakers: local AR
  47. 4.2.2 N tokens per speaker: local AR
  48. 5. Discussion
  49. 5.1 The relevant population • a more nuanced view of how speech varies within- and between-speakers is needed • different features are affected by different factors in different ways: – we need to think beyond 'regional background and sex' • general reference sets: – greater between-speaker variation = overestimated SS LRs/underestimated DS LRs
  50. 5.1 The relevant population • it's an issue of knowing the sociophonetics of the data you've been asked to analyse – not assuming that everyone broadly 'speaks the same' • speaker-similarity (but the relevant population must be sociolinguistically coherent) – consistent with a single underlying relevant population across all evidence – based on logical relevance without having to be explicit
  51. 5.2 Size of the sample • best to avoid small samples (with <20 speakers, models of within- and between-speaker variation are not representative) • different parameters behave in different ways – dependent on inherent speaker-discriminatory power • all cases may require some sample-size testing (MCS?) – but MCS is no solution to small N (!!)
  52. 6. Conclusion
  53. 6. Conclusion • plenty of arguments why the LR is the logically and legally correct framework for forensic evidence – keeps the roles of expert and trier of fact separate – forces the expert to analyse only the specific piece of evidence
  54. 6. Conclusions • the definition of the relevant population in the absence of a specific alternative hypothesis is not a theoretically or practically trivial issue – different definitions have different effects on the strength of evidence achieved (and the courts need to be aware of this) – different approaches cause different types of challenges
  55. 6. Conclusions • challenges aren't a reason for abandoning the LR • BUT… greater awareness and acknowledgement of the complex dimensions on which speech varies within- and between-speakers is essential – the framework needs to fit the data, NOT the other way around
  56. Thanks! Questions? Acknowledgements: ESRC, Paul Foulkes, Erica Gold, Peter French, Dom Watt, FSS Research Group (York), Ashley Brereton (University of Liverpool), NZILBB (Canterbury)
