Issues and opportunities: the application of the numerical likelihood ratio framework to forensic speaker comparison

Hughes, V. and Gold, E. (2014) Issues and opportunities: the application of the numerical likelihood ratio framework to forensic speaker comparison. UK Government Biometrics Working Group Meeting 65, National Physical Laboratory, Teddington, UK. 20 June 2014. (INVITED TALK)


  1. 1. Vincent Hughes and Erica Gold. Issues and opportunities: the application of the numerical likelihood ratio framework to forensic speaker comparison. Biometrics Working Group, 20 June 2014.
  2. 2. Gold, E. and Hughes, V. (in press) Issues and opportunities: the application of the numerical likelihood ratio framework to forensic speaker comparison. Science and Justice.
  3. 3. outline
       • introduction
       • forensic speaker comparison (FSC)
       • likelihood ratio (LR)
       • LR in FSC research
       • complexity of speech evidence
       • issues and alternative approaches: (i) modelling, (ii) relevant population, (iii) correlations
       • discussion
  4. 4. Introduction
  5. 5. introduction
       • forensic speaker comparison (FSC) = voice of offender (unknown) vs. voice of suspect (known)
       • ultimate issue (Lynch and McNally 2003): is the person on the offender recording the same as the person on the suspect recording?
       • auditory-acoustic linguistic-phonetic analysis
       • analysis of a range of segmental (vowels, consonants), suprasegmental (f0, intonation, articulation rate (AR)), higher-order linguistic (lexical choice, syntax) and voice quality (VQ)/vocal setting features
  6. 6. introduction: what can’t the expert say? ✗
       • [Figure: verbal scale of POSITIVE IDENTIFICATION and NEGATIVE IDENTIFICATION conclusions, with terms including “sure beyond reasonable doubt”, “there can be very little doubt”, “highly likely”, “likely”, “very probable”, “probable”, “quite probable”, “quite possible” and “possible” … that they are the same person / … that they are different people]
  7. 7. introduction: why?
       • how likely is it that the suspect is the offender given the evidence?
       • assessment of innocence/guilt
       • role of the judge/jury (trier of fact)
       • requires access to all of the evidence
       • possibility doesn’t tell you about probability
       • continuum isn’t equal on both sides (bias towards positive identification?)
  8. 8. introduction: what can the expert say? ✓
       • LR = p(E|Hp) / p(E|Hd), where p = probability, E = evidence, | = ‘given’, Hp = prosecution hypothesis, Hd = defence hypothesis
       • provides a gradient assessment of the strength/weight of evidence
       • ratio = value centered on 1, where:
            • support for prosecution = > 1
            • support for defence = < 1
  9. 9. introduction: why?
       • evaluation of the evidence, rather than the hypotheses (innocence vs. guilt)
       • separates the role of the expert and trier of fact
       • explicit consideration of both prosecution and defence hypotheses (objective)
       • clear (??) probabilistic statement presented to the court
  10. 10. introduction
       • LR = similarity and typicality
       • it matters “whether the values found matching (…) are vanishingly rare (…) or near universal” (Nolan 2001:16)
       • typicality = dependent on patterns in the “relevant population” (Aitken & Taroni 2004)
       • reference data = sample of that population
       • distributions modelled statistically to generate numerical output
  11. 11. introduction: LR = p(E|Hp) / p(E|Hd) = 0.047 / 0.0115 = 4.08
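The ratio on this slide can be reproduced in a few lines. The sketch below first divides the two densities shown on the slide, then illustrates "similarity and typicality" with two normal distributions. All numbers in the second part (the F2 value, the suspect and population means and standard deviations) are invented for illustration and do not come from the talk. Note that the quotient of the two displayed values is about 4.087, so the 4.08 on the slide presumably reflects rounding of the displayed densities.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def likelihood_ratio(p_e_given_hp, p_e_given_hd):
    """LR = p(E|Hp) / p(E|Hd)."""
    return p_e_given_hp / p_e_given_hd


# Densities as displayed on the slide.
lr_slide = likelihood_ratio(0.047, 0.0115)

# Hypothetical illustration (all values assumed, not from the talk):
# an offender F2 value evaluated against a suspect model (similarity)
# and against a reference-population model (typicality).
offender_value = 1500.0
suspect_mean, suspect_sd = 1480.0, 60.0
population_mean, population_sd = 1700.0, 150.0

similarity = normal_pdf(offender_value, suspect_mean, suspect_sd)
typicality = normal_pdf(offender_value, population_mean, population_sd)
lr_example = likelihood_ratio(similarity, typicality)

print(f"slide LR: {lr_slide:.3f}")
print(f"illustrative LR: {lr_example:.2f}")
```

An LR above 1 here arises because the offender value is both close to the suspect's values and relatively atypical of the assumed population.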
  12. 12. LRs in forensic speaker comparison
  13. 13. LR in FSC research
       • LR increasingly accepted as the “logically and legally correct” (Rose and Morrison 2009:143) framework
       • relatively strong representation of the LR framework in FSC
       • the development of the LR framework in the field of forensic phonetics is largely thanks to a small community of researchers
  14. 14. LR in FSC research
       • LR-based research in FSC focused on two main areas:
            1. speaker-discriminatory power of individual parameters
                 • exclusively focused on continuous data (and almost exclusively on vowels)
            2. methodological advances
                 • procedures for assessing validity and reliability
                 • calibration (for improving system validity)
                 • fusion (for combining correlated parameters)
  15. 15. complexity of speech evidence
       • inherent variability of speech: within-speaker variation: no two utterances are ever the same
            • interlocutor
            • topic
            • emotion
            • time of day etc.
       • ∴ p(E|Hp) ≠ 1
  16. 16. complexity of speech evidence
       • inherent variability of speech: between-speaker variation:
            • biological/physiological factors (e.g. size/shape of the vocal tract)
            • social factors (regional varieties/class/ethnicity…)
            • attitude (towards speech community)
            • habitual/idiosyncratic features etc.
  17. 17. complexity of speech evidence
       • ling-phon FSC = componential approach based on combined weight of multiple parameters
       • representative analysis of suspect/offender voices
       • ling-phon parameters form highly correlated (sub-)systems:
            • within parameters: e.g. F2 & F3 in /i:/
            • between parameters: e.g. AR & formant movement between onset and offset
            • physiological/phonological/social factors
  18. 18. complexity of speech evidence
       • componential ling-phon analysis involves lots of different types of data:
            • continuous: e.g. formant frequencies, fundamental frequency
            • discrete: e.g. allophonic/lexical variation (counts), voice quality & vocal setting
       • distributed in different ways:
            • normal and non-normal distributions
            • elements of parameter distributed in different ways
            • different distributions within-/between-speakers
  19. 19. complexity of speech evidence
       • complexity of forensic conditions for FSC
            • transmission mismatch between suspect (police interview) and offender (telephone) samples: affects frequency components around the boundaries of the band-pass filter for telephone speech
            • low signal-to-noise ratio in offender samples: due to background noise/overlapping speech
            • linguistic mismatch between suspect and offender samples: intoxication, time of day, emotion, topic, interlocutor etc.
  20. 20. Issues and alternative approaches: i. Modelling
  21. 21. modelling
       • currently only a small handful of models available to calculate LRs
       • univariate, normal LRs (Lindley 1977)
            • developed for glass analysis
            • requires continuous data
            • models within- and between-speaker variation using a normal distribution
            • variances in suspect and offender samples are assumed equal
            • for use with univariate parameters
       • *limited application for most ling-phon parameters
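The assumptions listed on this slide (continuous univariate data, normal within- and between-speaker variation, equal within-speaker variance for both samples) can be made concrete with a simplified two-level normal model. This is a sketch in the spirit of that approach, not a reproduction of the Lindley (1977) formula. The function names, the parameterisation by sample means, and every numeric value are assumptions introduced here. Under the same-speaker hypothesis the suspect and offender means share a speaker mean and are therefore correlated; under the different-speaker hypothesis they are independent.

```python
import math


def bivariate_normal_pdf(dx, dy, var_x, var_y, cov):
    """Zero-mean bivariate normal density evaluated at the deviations (dx, dy)."""
    det = var_x * var_y - cov * cov
    q = (var_y * dx * dx - 2.0 * cov * dx * dy + var_x * dy * dy) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


def two_level_normal_lr(suspect_mean, offender_mean, m, n,
                        mu, within_var, between_var):
    """LR for two sample means under a two-level normal model.

    m, n        : number of measurements behind each sample mean
    mu          : population mean (assumed known from reference data)
    within_var  : within-speaker variance, shared by both samples
    between_var : between-speaker variance
    """
    var_x = between_var + within_var / m
    var_y = between_var + within_var / n
    dx = suspect_mean - mu
    dy = offender_mean - mu
    # Same speaker: the shared speaker mean induces covariance between_var.
    numerator = bivariate_normal_pdf(dx, dy, var_x, var_y, between_var)
    # Different speakers: the two sample means are independent.
    denominator = bivariate_normal_pdf(dx, dy, var_x, var_y, 0.0)
    return numerator / denominator


# Hypothetical values: similar and atypical means vs. very different means.
lr_close = two_level_normal_lr(1400.0, 1410.0, 10, 10, 1700.0, 3600.0, 22500.0)
lr_far = two_level_normal_lr(1400.0, 2000.0, 10, 10, 1700.0, 3600.0, 22500.0)
print(lr_close > 1, lr_far < 1)
```

The sketch also shows why such a model has limited reach: it has no way to represent discrete, multivariate or non-normally distributed parameters.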
  22. 22. modelling
       • currently only a small handful of models available to calculate LRs
       • multivariate kernel density (Aitken and Lucy 2004)
            • developed for glass analysis
            • requires continuous data
            • accounts for multivariate parameters (but designed for 3 or 4 variables per parameter)
            • within-speaker values = normal
            • between-speaker values = Gaussian kernel
       • *applied to most continuous acoustic phon parameters
  23. 23. modelling
       • currently only a small handful of models available to calculate LRs
       • Gaussian mixture model – universal background model (Reynolds et al. 2000)
            • requires continuous data
            • person-independent UBM for background data generated using GMM
            • speaker-specific GMM forms suspect model
            • no assumption of normality
       • *commonly applied to automatic speaker recognition (ASR), but variable performance using ling-phon parameters
  24. 24. modelling
       • currently only able to handle continuous data
       • but some of the most helpful parameters are discrete, i.e. voice quality using the VPA (Gold and French 2011)
       • many speech parameters are not considered under a numerical LR framework due to this lack of models, effectively providing only a partial analysis of an individual’s speech characteristics
  25. 25. modelling: solutions
       • create statistical models to fit the complexity of the forensic data
            • rather than applying models from other fields
       • fairer representation of speaker characteristics by incorporating all (relevant) parameters
       • linguistics/phonetics informing the choice of parameter rather than statistical models
  26. 26. modelling
       • Aitken and Gold (2013)
            • initial attempt to model discrete data in the form of counts
            • parameter = click rate (N clicks (ingressive velaric stops) per minute)
            • with many zeros (i.e. minutes with no clicks)
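To show what an LR for count data with many zeros might look like, here is a minimal sketch using a zero-inflated Poisson distribution. This is an assumption made purely for illustration and is not the model of Aitken and Gold (2013), whose details are not given on the slide. The parameter values, the function names and the treatment of minutes as independent observations are all hypothetical.

```python
import math


def zip_pmf(k, rate, p_zero):
    """Zero-inflated Poisson: extra probability mass p_zero placed on a count of zero."""
    poisson = math.exp(-rate) * rate ** k / math.factorial(k)
    if k == 0:
        return p_zero + (1.0 - p_zero) * poisson
    return (1.0 - p_zero) * poisson


def count_log_lr(counts, suspect_params, population_params):
    """Natural-log LR for per-minute counts, treating minutes as independent.

    Each *_params tuple is (rate, p_zero); both are assumed to have been
    estimated beforehand from suspect and reference data respectively.
    """
    log_lr = 0.0
    for k in counts:
        log_lr += math.log(zip_pmf(k, *suspect_params))
        log_lr -= math.log(zip_pmf(k, *population_params))
    return log_lr


# Hypothetical offender sample: clicks per minute, with several zero minutes.
offender_counts = [0, 0, 3, 0, 4, 2]
suspect_params = (3.0, 0.5)      # assumed: frequent clicker with many silent minutes
population_params = (0.5, 0.3)   # assumed: clicks are rare in the population

log_lr = count_log_lr(offender_counts, suspect_params, population_params)
print(f"log LR: {log_lr:.2f}")
```

Zero minutes count slightly against the same-speaker hypothesis in this toy setting, while minutes with several clicks count strongly for it, because such counts are rare under the assumed population parameters.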
  27. 27. modelling
       • Foulkes, French, Gold, Hughes, Harrison, Stevens, Aitken and Neocleous (2013-15)
            • grant to develop specific models for different types of ling-phon parameters
            • multivariate model to capture correlations and ordering effects of formants from vowels
                 • i.e. F1<F2<F3 for all vowels
            • model for discrete, binary data (presence-absence) for auditory VPA data
  28. 28. Issues and alternative approaches: ii. Relevant population
  29. 29. relevant population: logical relevance
       • Rose (2004: 4): “quite often (Hd) will simply be that the voice of the unknown speaker does not belong to the accused, but to another same-sex speaker of the language”
       • reflected in the majority of LR-based studies:
            • Kinoshita (2002), Aldermann (2004), Kinoshita (2005), Rose (2006), Rose et al. (2006), Rose (2007), Morrison & Kinoshita (2008), Morrison (2009), Morrison et al. (2011), Morrison (2011), Zhang et al. (2011)…
  30. 30. relevant population
       • why just sex and language?
            • sex/language are easily accessible in the speech signal (but see French et al. (2010:145), Foulkes & French (2012:569))
            • sex/language are the most significant sources of social variation defining sub-populations
                 • lack of understanding of complexity of socially stratified variation in speech
       • paradox = without knowing who the offender is we can’t know (for sure) the population of which (s)he is a member
  31. 31. relevant population
       • empirical testing of logical relevance (Hughes, in progress; Hughes & Foulkes, submitted)
            • single set of test data (pairs of mock suspect and offender samples)
            • social factors varied: regional background, age, class
            • multiple sets of reference data (matched, mixed and mismatched)
            • LRs computed using different ling-phon input parameters (/u:/ in goose, boot; /aɪ/ in price, bite; /eɪ/ in face, bake)
  32. 32. relevant population
       • strength of evidence:
            • generally greater when using mismatched reference data
            • generally lower when using mixed reference data (up to 100 times weaker compared with matched baseline)
       • validity (errors):
            • fewer errors (better Cllr) with mismatched reference data
            • mixed: up to 20% more different-speaker (DS) pairs classified as same-speaker (SS) (false hits)/up to 5% more SS pairs classified as DS (false misses)
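The validity metric named on this slide, Cllr (the log-likelihood-ratio cost), can be computed directly from the LRs of known same-speaker and different-speaker test pairs. The sketch below uses the standard definition. The function name and the example LR values are assumptions for illustration and are not results from the study.

```python
import math


def cllr(ss_lrs, ds_lrs):
    """Log-likelihood-ratio cost.

    ss_lrs: LRs from same-speaker pairs (penalised when small)
    ds_lrs: LRs from different-speaker pairs (penalised when large)
    Lower is better; a system that always outputs LR = 1 scores exactly 1.
    """
    ss_cost = sum(math.log2(1.0 + 1.0 / lr) for lr in ss_lrs) / len(ss_lrs)
    ds_cost = sum(math.log2(1.0 + lr) for lr in ds_lrs) / len(ds_lrs)
    return 0.5 * (ss_cost + ds_cost)


# Hypothetical LRs for illustration only.
uninformative = cllr([1.0, 1.0], [1.0, 1.0])
good_system = cllr([100.0, 50.0], [0.01, 0.02])
misleading = cllr([0.01, 0.02], [100.0, 50.0])
print(uninformative, good_system < uninformative, misleading > uninformative)
```

Because the penalty grows with the size of a misleading LR, Cllr reflects the magnitude of errors and not only their count, unlike the false-hit and false-miss percentages. A same-speaker LR of exactly 0 would need special handling, since the cost is then infinite.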
  33. 33. relevant population: solutions
       • objective speaker similarity (i.e. a population of speakers who actually sound like the offender)
            • similar to the approach used in ASR
            • *should* end up with sociolinguistically homogeneous group
       • more overt awareness/acknowledgement of the complex sources of between-speaker variation
       • more empirical testing of effects of different definitions of the relevant population on LR output
  34. 34. Issues and alternative approaches: iii. Correlations
  35. 35. correlations
       • naïve Bayes: LRs from multiple parameters may be combined using the independent product rule
       • but ling-phon parameters are often highly correlated (between- and/or within-individuals) (as a result of anatomical/social factors)
       • early LR-based research used naïve Bayes to combine parameters
            • often with disregard for linguistic correlations
            • but some with empirical testing
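The product rule mentioned on this slide is simple to state in code, and doing so makes the risk of ignoring correlations easy to see. In the sketch below the function names and LR values are hypothetical. The second example treats one parameter as if it were two independent ones, which is the extreme case of perfect correlation.

```python
import math


def combine_naive_bayes(lrs):
    """Combine LRs from several parameters, assuming they are independent."""
    return math.prod(lrs)


def combine_log10(lrs):
    """The same combination in the log10 domain: log LRs add."""
    return sum(math.log10(lr) for lr in lrs)


# Hypothetical LRs from three parameters assumed to be independent.
independent_lrs = [4.0, 2.5, 0.8]
combined = combine_naive_bayes(independent_lrs)

# Counting one parameter (LR = 4) twice, as if it were two independent
# parameters, squares its contribution and overstates the evidence.
double_counted = combine_naive_bayes([4.0, 4.0])

print(combined, double_counted)
```

Real correlations between parameters are rarely perfect, so the overstatement is usually smaller than in the double-counting example, but it acts in the same direction when the parameters are positively correlated.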
  36. 36. correlations
       • multivariate LR models (e.g. MVKD)
            • accounting for correlations within parameters
            • but still issues with correlations between parameters
       • logistic-regression fusion as a potential solution for between-parameter correlations
            • currently the only alternative
            • procedure developed for automatic speaker recognition systems
  37. 37. correlations
       • fusion = form of “back-end processing”
            • considering the correlations in the LRs between parameters
            • generates overall LR
       • “it is…possible…that two segments which are not correlated by virtue of their internal structure and which therefore should be naively combined, nevertheless have LRs which do correlate” (Rose 2010: 32)
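In its usual linear form, logistic-regression fusion replaces the equal-weight sum of log LRs implied by naïve Bayes with a weighted sum. The notation below is assumed here and does not appear on the slides: N is the number of parameters, and the offset α₀ and weights αᵢ are estimated by logistic regression from the LRs of known same-speaker and different-speaker training pairs.

```latex
\log \mathrm{LR}_{\text{fused}} = \alpha_0 + \sum_{i=1}^{N} \alpha_i \,\log \mathrm{LR}_i
```

Naïve Bayes corresponds to the special case α₀ = 0 and αᵢ = 1 for every parameter. When the LRs from two parameters are correlated, the fitted weights would typically come out below 1, which is how the back end discounts overlapping information.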
  38. 38. correlations: solutions
       • thinking about linguistic correlations in the data rather than in the LRs
       • front-end processing: through theory and structural learning
            • Bayesian networking or graphical models
       • Gold (2014)
            • combining LRs from the most commonly analysed parameters in FSC
  39. 39. correlations: International Survey of Forensic Experts (speech): Most Helpful Discriminants (Gold and French 2011a; 2011b)
       • Vowels: long-term formant distributions (LTFD)
       • Speech tempo: articulation rate (AR)
       • Fundamental frequency: F0 (mean and SD)
       • Linguistic features: clicks (click rate)
  40. 40. correlations: [Figure: hypothetical Bayesian network of speech parameters]
  41. 41. correlations

       Comparison          % Correct  Mean LLR  Min LLR    Max LLR  EER    Cllr
       Complete System SS  92.00      5.673     -3.082     7.316    .0607  .3793
       Complete System DS  93.27      1.560     -infinity  3.963
  42. 42. correlations
       • Gold and Hughes (2013-14)
            • International Association of Forensic Phonetics and Acoustics (IAFPA) Grant
            • analysis of correlations between a large number of ling-phon parameters from a large, homogeneous population of speakers
            • expected: F3 from all vowels (positive correlation); AR & VOT for /t/ (negative correlation)
            • unexpected: mean click rate & F2 mean (positive correlation)
            • differences in correlations for the group and for individuals (e.g. F2 and F3 for UM)
  43. 43. correlations
       • next step: compare the correlations in the raw data with the correlations in the resulting LRs
            • means of assessing whether fusion is capturing linguistically meaningful correlations in the data
       • understanding ling-phon principles of correlations may help us to capture the strength of evidence more accurately
       • more efficient: not spending an excessive amount of time analysing highly correlated parameters
  44. 44. Discussion
  45. 45. discussion
       • plenty of arguments why the LR is the logically and legally correct framework for forensic evidence
            • keeps the role of expert and trier of fact separate
            • forces the expert to analyse only the specific piece of evidence
       • but long way off widespread application of numerical LR in casework
            • due, in part, to limitations of current approaches
  46. 46. discussion
       • implicit in current LR-based FSC research is that the data should be forced to fit the numerical LR framework even if this means:
            • analysing only a small sub-set of potential parameters
            • making unrealistic assumptions about the distribution of our data
            • not accounting for the inherent complexity
       • models and procedures that we apply were often not developed to account for the complexity of speech data
  47. 47. discussion
       • linguistics/phonetics should inform how we apply and compute LRs
            • rather than the framework dictating what can/can’t be incorporated
       • the complexity of linguistic-phonetic evidence shouldn’t be ignored
       • DNA = seen as “setting the standard” (Baldwin 2005:55) for forensic evidence
       • offers opportunity for speech to be at the forefront of forensic science
  48. 48. acknowledgements
       • This research has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 238803 and the Economic and Social Research Council.
       • Thanks also go to Colin Aitken, Paul Foulkes, Peter French, Michael Jessen, Tereza Neocleous.