
Investigating the effects of sample size on numerical likelihood ratios using Monte Carlo simulations

Hughes, V. (2013) Investigating the effects of sample size on numerical likelihood ratios using Monte Carlo simulations. Paper presented at International Association for Forensic Phonetics and Acoustics (IAFPA) Conference, University of South Florida, Tampa. 21-24 July 2013.


  1. Investigating the effects of sample size on numerical likelihood ratios using Monte Carlo simulations. Vincent Hughes, Department of Language and Linguistic Science. International Association for Forensic Phonetics and Acoustics Annual Conference, University of South Florida, 21st-24th July 2013
  2. 1.0 Introduction • essential part of the LR approach = estimation of typicality – models of within- and between-speaker variation generated from a sampled sub-section of the relevant population (Aitken and Taroni 2004) BUT how much reference data do we need? (i) representativeness (ii) precision
  3. 1.0 Introduction • limited set of previous studies provides some insight into (i): Ishihara and Kinoshita (2008) - f0 (distribution characteristics) - “population size effect” with < 20 speakers; Hughes and Foulkes (2012) - formant trajectories of /u:/ - improved Cllr (validity) and stability of logLRs with > 20 speakers
  4. 1.0 Introduction Rose (2012) first study to consider (ii): – single SS comparison from real case – mid-point F1, F2 and F3 for AusEng /a:/ (‘car’) – Monte Carlo simulations (MCS) to generate synthetic speakers based on Bernard (1967) – results compared with ‘true’ LR using 10,000 reference speakers – relative stability of logLRs with around 30 speakers (approaching value of ‘true’ LR)
  5. 1.0 Introduction Rose (2012) first study to consider (ii): BUT – limited test data – no assessment of validity as f(N speakers) – no calibration of system based on scores from an appropriate training set – no explicit description of MCS procedures for synthesising correlated features of a parameter
  6. 1.0 Introduction Research Questions: using Monte Carlo simulations to generate a sufficiently large amount of reference data: (1) between-speaker variation: how are logLRs and system performance affected by the number of reference speakers? (2) within-speaker variation: how are logLRs and system performance affected by the number of tokens per reference speaker?
  7. 2.1 Data • local articulation rate (AR) (from Gold, in progress) for 100 DyViS speakers (Task 2) (Nolan et al. 2009) – recordings divided into 26-32 memory stretches (Jessen 2007) – pauses, hesitation phenomena and repairs removed (following Künzel 1997) – N phonological syllables per memory stretch (first 26 (tokens) used for all speakers)
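The slide does not spell out the AR measure itself. A minimal sketch, assuming local AR is simply phonological syllables per second within each pause-free memory stretch (which may differ from the exact measure in Gold, in progress; inputs and names are hypothetical):

```python
# Assumed definition (not given on the slide): local AR for one memory stretch
# = phonological syllables / stretch duration in seconds, with pauses,
# hesitations and repairs already excluded.
import numpy as np

def local_ar(n_syllables, durations_s):
    """One AR value (syllables per second) per memory stretch."""
    return np.asarray(n_syllables, dtype=float) / np.asarray(durations_s, dtype=float)

# e.g. 26 memory stretches for one speaker -> 26 AR tokens
rng = np.random.default_rng(0)
n_syll = rng.integers(8, 25, size=26)
dur_s = n_syll / rng.normal(5.2, 0.4, size=26)    # durations implying ~5 syll/s
tokens = local_ar(n_syll, dur_s)
print(tokens.mean(), tokens.std(ddof=1))          # per-speaker mean and SD of AR
```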
  8. 2.1 Data Why AR? – availability of a large amount of data (thanks Erica!) – univariate parameter (only the correlation between means and SDs to model) Problems with AR: – poor discriminant – confirmed by low variance ratio (Rose et al. 2006)
  9. 2.1 Data • training (development) data = 20 speakers • test data = 20 speakers • reference data = 59 speakers (1 removed based on outlying SD value, z > 3.29) • LRs computed using the MVKD formula (Aitken and Lucy 2004) – between-speaker variation = kernel density – within-speaker variation = normal distribution
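The MVKD formula itself is not reproduced on the slide. As a strictly simplified, univariate illustration of the same modelling assumptions (kernel-density between-speaker distribution over reference speaker means, normal within-speaker distribution), an LR can be computed by numerical integration; this sketch is not the Aitken and Lucy (2004) formula, and all names are illustrative:

```python
# Simplified univariate stand-in for the modelling assumptions listed above.
# NOT the multivariate MVKD formula; it only shows the structure of the
# same-speaker numerator and different-speaker denominator.
import numpy as np
from scipy.stats import norm

def simple_univariate_lr(suspect_mean, offender_mean, ref_means, s_within):
    """LR by numerical integration over the candidate speaker mean mu.

    s_within: assumed within-speaker spread of an observed sample mean
    (e.g. pooled within-speaker SD / sqrt(number of tokens)).
    """
    ref_means = np.asarray(ref_means, dtype=float)
    # Gaussian kernel density over the reference speakers' means (Silverman bandwidth)
    h = 1.06 * ref_means.std(ddof=1) * len(ref_means) ** (-0.2)
    mu = np.linspace(ref_means.min() - 4 * h, ref_means.max() + 4 * h, 2000)
    kde = norm.pdf(mu[:, None], loc=ref_means, scale=h).mean(axis=1)

    like_s = norm.pdf(suspect_mean, loc=mu, scale=s_within)   # within-speaker: normal
    like_o = norm.pdf(offender_mean, loc=mu, scale=s_within)

    num = np.trapz(like_s * like_o * kde, mu)                        # same speaker
    den = np.trapz(like_s * kde, mu) * np.trapz(like_o * kde, mu)    # different speakers
    return num / den
```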
  10. 2.2 MCS procedure • means and SDs first sampled to create synthetic normal distributions • means sampled from the normal distribution of the means for the raw data: – pseudo-random value between 0 and 1 (Zi) generated by the rand function in MATLAB
  11. 2.2 MCS procedure (cont.) • correlation between means and SDs used to generate synthetic SDs - speakers with higher average tempo display greater variability • synthetic SDs generated from a normal distribution
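A sketch of the synthesis step described on the two slides above, assuming speaker (mean, SD) pairs are treated as roughly bivariate normal. The slides refer to MATLAB's rand; the same inverse-transform idea is written here with NumPy/SciPy, and the names are illustrative rather than those of the original code:

```python
import numpy as np
from scipy.stats import norm

def synthesise_speakers(raw_means, raw_sds, n_synth, seed=1):
    raw_means = np.asarray(raw_means, dtype=float)
    raw_sds = np.asarray(raw_sds, dtype=float)
    rng = np.random.default_rng(seed)

    # 1) synthetic means: uniform (0,1) values (cf. MATLAB rand) mapped through
    #    the inverse CDF of the normal distribution fitted to the raw means
    z = rng.random(n_synth)
    means = norm.ppf(z, loc=raw_means.mean(), scale=raw_means.std(ddof=1))

    # 2) synthetic SDs: normal distribution conditioned on the synthetic mean,
    #    preserving the raw mean-SD correlation (faster speakers more variable)
    r = np.corrcoef(raw_means, raw_sds)[0, 1]
    slope = r * raw_sds.std(ddof=1) / raw_means.std(ddof=1)
    cond_mean = raw_sds.mean() + slope * (means - raw_means.mean())
    cond_sd = raw_sds.std(ddof=1) * np.sqrt(1.0 - r ** 2)
    sds = np.abs(rng.normal(cond_mean, cond_sd))   # keep SDs positive

    # tokens for speaker i can then be drawn as rng.normal(means[i], sds[i], n_tokens)
    return means, sds
```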
  12. 3.1 Experiment 1: N speakers • reference data = up to 1000 speakers (59 raw-data speakers, 941 synthetic speakers) / 26 tokens per speaker • SS and DS LR scores computed for development and test sets using reference data from 10 to 1000 speakers • logistic regression calibration weights generated from development scores and applied to test scores • validity (Cllr and EER) assessed as f(N speakers)
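A sketch of the calibration and validity step, assuming the raw scores are log10 LRs and that SS pairs are labelled 1 and DS pairs 0; function and variable names are illustrative, not from the original implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def calibrate(dev_scores, dev_labels, test_scores):
    """Logistic-regression calibration: shift and scale fitted on the
    development scores, applied to the test scores; returns log10 LRs."""
    s_dev = np.asarray(dev_scores, dtype=float).reshape(-1, 1)
    y_dev = np.asarray(dev_labels)
    model = LogisticRegression(C=1e6).fit(s_dev, y_dev)      # ~unregularised fit
    a, b = model.coef_[0, 0], model.intercept_[0]
    # subtract the development-set prior log-odds so the output is a log LR
    prior = np.log(np.sum(y_dev == 1) / np.sum(y_dev == 0))
    return (a * np.asarray(test_scores, dtype=float) + b - prior) / np.log(10)

def cllr(ss_log10lrs, ds_log10lrs):
    """Log-LR cost: 0 = perfect, values near 1 = uninformative system."""
    ss = 10.0 ** np.asarray(ss_log10lrs, dtype=float)
    ds = 10.0 ** np.asarray(ds_log10lrs, dtype=float)
    return 0.5 * (np.mean(np.log2(1.0 + 1.0 / ss)) + np.mean(np.log2(1.0 + ds)))
```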
  13. 3.1 Experiment 1: N speakers [Figure: calibrated log10 LRs, same-speaker pairs, plotted against the number of reference speakers (10 to 1000)]
  14. 3.1 Experiment 1: N speakers [Figure: calibrated log10 LRs, different-speaker pairs, plotted against the number of reference speakers (10 to 1000)]
  15. 3.1 Experiment 1: N speakers [Figure]
  16. 3.1 Experiment 1: N speakers [Figure: uncalibrated log10 LRs, different-speaker pairs, plotted against the number of reference speakers (10 to 1000)]
  17. 3.2 Experiment 2: N tokens • reference data = 200 speakers (59 raw-data speakers, 141 synthetic speakers) / up to 200 tokens per speaker • SS and DS LR scores computed for development and test sets using 10 to 200 tokens per reference speaker • logistic regression calibration weights from development scores applied to test scores • validity (Cllr and EER) assessed as f(N tokens per speaker)
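For completeness, a minimal EER sketch to accompany the Cllr code above; it uses a simple threshold sweep over the pooled SS and DS scores rather than an exact ROC-based computation:

```python
import numpy as np

def eer(ss_scores, ds_scores):
    """Operating point where the SS miss rate ~ the DS false-alarm rate."""
    ss = np.asarray(ss_scores, dtype=float)
    ds = np.asarray(ds_scores, dtype=float)
    thresholds = np.unique(np.concatenate([ss, ds]))
    miss = np.array([np.mean(ss < t) for t in thresholds])    # SS below threshold
    fa = np.array([np.mean(ds >= t) for t in thresholds])     # DS at/above threshold
    i = np.argmin(np.abs(miss - fa))
    return (miss[i] + fa[i]) / 2.0
```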
  18. 3.2 Experiment 2: N tokens [Figure: calibrated log10 LRs, same-speaker pairs, plotted against the number of tokens per reference speaker (10 to 200)]
  19. 3.2 Experiment 2: N tokens [Figure: calibrated log10 LRs, different-speaker pairs, plotted against the number of tokens per reference speaker (10 to 200)]
  20. 3.2 Experiment 2: N tokens [Figure]
  21. 3.2 Experiment 2: N tokens [Figure: uncalibrated log10 LRs, different-speaker pairs, plotted against the number of tokens per reference speaker (10 to 200)]
  22. 4.0 Discussion • some fluctuation in magnitude of LRs and Cllr (albeit within a narrow range) with small sample sizes • calibrated SS and DS LRs relatively robust against sample size effects – both for N speakers and N tokens – implies that relatively reliable/precise models of within- and between-speaker variation for AR can be generated with relatively small amounts of data
  23. 4.0 Discussion • calibration has played an important role here… – much more stability in LR performance when scores are calibrated with small amounts of reference data – trade-off between the amount of reference data and calibration weights? – is this parameter-specific? (because mean LR is close to unity?)
  24. 5.0 Conclusion • MCS provide a useful way of assessing issues of sample size • extent to which these results are generalisable? – ideally need to test on more complex multivariate data with greater discriminatory power (e.g. formants) – sample size testing needed in LR-based casework? • other issues to test: size of training/test sets + effects on calibration weights
  25. Thanks! Questions? Acknowledgements: Ashley Brereton, Paul Foulkes, Peter French, Erica Gold, York FSS Research Group, Dom Watt. vh503@york.ac.uk
