Forensic voice comparison using long-term acoustic measures of laryngeal voice quality

Hughes, V., Cardoso, A., Foulkes, P., French, J. P., Harrison, P. and Gully, A. (2019) Forensic voice comparison using long-term acoustic measures of voice quality. Paper presented at the 19th International Congress of Phonetic Sciences (ICPhS), Melbourne, Australia. 5-9 August 2019.

  1. Forensic voice comparison using long-term acoustic measures of laryngeal voice quality
     Vincent Hughes [1], Amanda Cardoso [2], Philip Harrison [1, 3], Paul Foulkes [1], Peter French [1, 3], Amelia Gully [1]
     [1] University of York; [2] University of British Columbia; [3] J P French Associates
  2. 1. Introduction
     • forensic voice comparison:
       – known suspect vs. unknown offender
       – question of identity
     • two broad sets of methods used:
       – linguistic-phonetic: componential auditory and acoustic analysis at all linguistic levels
       – automatic speaker recognition (ASR): holistic analysis of short-term cepstral features (typically MFCCs) across the entire voice-active portion of a recording
  4. 1. Introduction
     • voice quality (VQ: combination of supralaryngeal and laryngeal settings) reported to be useful in casework (Gold & French 2011)
     • but…
       – no single protocol used by all analysts
       – little critical insight into forensic value (although see Nolan 2005)
       – crucially… claim has never been tested empirically
  5. 2. Research questions
     1. How well does laryngeal VQ perform at voice comparison when analysed acoustically?
     2. How robust are laryngeal VQ measures to channel variation?
     3. Can laryngeal VQ help to improve the performance of an MFCC-based ASR system?
     Our approach: utilise errors produced by older systems (rather than attempt to improve state-of-the-art systems)
  6. 3. Methods
     • DyViS corpus (Nolan et al. 2009):
       – 97 young (18-25), male speakers of SSBE
     • Two tasks per speaker:
       – Suspect: mock police interview (Task 1)
       – Offender: telephone conversation with accomplice (Task 2)
     • 60 seconds of vowel material per sample per speaker, segmented automatically
  7. 3. Methods
     • Four channel conditions (listed in order of decreasing quality):

       Condition  Channel             Sampling rate  Bandpass filtering  Bit-rate
       HQ         Studio              44.1 kHz       -                   -
       TEL        Landline telephone  8 kHz          300-3400 Hz         -
       MOBHQ      GSM mobile          8 kHz          300-3400 Hz         12.2 kb/s
       MOBLQ      GSM mobile          8 kHz          300-3400 Hz         4.75 kb/s
  8. 3. Methods
     • VQ measures (using VoiceSauce):
       – Cepstral peak prominence (CPP) [additive noise]
       – Harmonics-to-noise ratios (HNR): over 0-500 Hz, 0-1500 Hz, 0-2500 Hz & 0-3500 Hz [additive noise]
       – H1-A1, H1-A2, H1-A3 [spectral tilt]
       – H1-H2, H2-H4 [spectral tilt]
       – F0: using STRAIGHT with 75-200 Hz range
     • MFCCs with deltas & delta-deltas
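The slides name "deltas & delta-deltas" without giving the settings used. As a rough illustration only, the sketch below applies a commonly used regression formula for delta coefficients to a list of per-frame feature vectors. The window width of 2 frames, the edge handling, and the function names `delta` and `add_deltas` are assumptions made for this example, not details taken from the study.

```python
from typing import List

Frames = List[List[float]]  # one feature vector (e.g. MFCCs) per analysis frame


def delta(frames: Frames, width: int = 2) -> Frames:
    """Regression-based deltas over a +/- `width` frame window (assumed width)."""
    n_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out: Frames = []
    for t in range(n_frames):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                # Clamp indices at the edges by repeating the first/last frame.
                nxt = frames[min(t + n, n_frames - 1)][k]
                prv = frames[max(t - n, 0)][k]
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def add_deltas(frames: Frames, width: int = 2) -> Frames:
    """Append deltas and delta-deltas to each static feature vector."""
    d1 = delta(frames, width)
    d2 = delta(d1, width)  # delta-deltas are deltas of the deltas
    return [static + a + b for static, a, b in zip(frames, d1, d2)]
```

With 13 static coefficients per frame, `add_deltas` would return 39-dimensional vectors; the number of coefficients actually used in the study is not stated on the slides.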
  9. 3. Methods
     • Likelihood ratio-based testing conducted:
       – GMM-UBM approach
       – Logistic-regression calibration (and fusion)
     • 20 replications using different sets of speakers
     • Performance:
       – Equal error rate (EER): proportion of errors
       – Log LR cost function (Cllr): magnitude of errors
       – For both metrics, the closer to 0 the better
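To make the two performance metrics concrete, here is a minimal sketch of how EER and Cllr can be computed from calibrated log LRs for same-speaker and different-speaker comparisons. It assumes natural-log LRs as input; the function names and the simple threshold sweep used for the EER are choices made for this illustration and are not the authors' evaluation code.

```python
import math
from typing import Sequence


def _log2_one_plus_exp(x: float) -> float:
    """Numerically stable log2(1 + e^x)."""
    return (max(x, 0.0) + math.log1p(math.exp(-abs(x)))) / math.log(2)


def cllr(ss_llrs: Sequence[float], ds_llrs: Sequence[float]) -> float:
    """Log LR cost: penalises same-speaker (SS) LRs below 1 and
    different-speaker (DS) LRs above 1, in proportion to their magnitude."""
    ss_pen = sum(_log2_one_plus_exp(-l) for l in ss_llrs) / len(ss_llrs)
    ds_pen = sum(_log2_one_plus_exp(l) for l in ds_llrs) / len(ds_llrs)
    return 0.5 * (ss_pen + ds_pen)


def eer(ss_llrs: Sequence[float], ds_llrs: Sequence[float]) -> float:
    """Equal error rate: error rate at the threshold where the miss rate
    and the false-alarm rate are (as nearly as possible) equal."""
    thresholds = sorted(set(ss_llrs) | set(ds_llrs))
    thresholds.append(thresholds[-1] + 1.0)  # threshold that rejects everything
    best_gap, best_rate = None, None
    for th in thresholds:
        miss = sum(l < th for l in ss_llrs) / len(ss_llrs)
        false_alarm = sum(l >= th for l in ds_llrs) / len(ds_llrs)
        gap = abs(miss - false_alarm)
        if best_gap is None or gap < best_gap:
            best_gap, best_rate = gap, (miss + false_alarm) / 2
    return best_rate
```

A system that always outputs a log LR of 0 (LR = 1, no information) gets a Cllr of 1, which is why values well below 1, and closer to 0, indicate better-performing systems.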
  10. 4. Results: all VQ [slides 10-12: figures not preserved in this transcript]
  13. 4. Results: separate VQ [figure not preserved in this transcript]
  14. 4. Results: MFCC & VQ fusion
      [Figure: EER (%) and log LR cost (Cllr) for the MFCC-only and fused systems, one panel per channel condition (MOBLQ, MOBHQ, TEL, HQ); plot not preserved in this transcript]
  15. 4. Results: MFCC & VQ fusion
      HQ:
      • 5/20 improvement
      • 10/20 same
      • 5/20 worse
  16. 4. Results: MFCC & VQ fusion
      MOBLQ:
      • 16/20 improvement
      • 4/20 worse
      • Improvements of up to 97% (70% average)
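The counts and percentages above compare the fused system with the MFCC-only baseline across the 20 replications. The sketch below shows one plausible way such a tally and a relative improvement could be computed; the slides do not say which metric the 97% and 70% figures refer to, and the function names, the tolerance, and the handling of a zero baseline are assumptions for illustration.

```python
from typing import Dict, Sequence


def relative_improvement(baseline: float, fused: float) -> float:
    """Percentage reduction in an error metric (EER or Cllr) relative to baseline."""
    if baseline == 0.0:
        # Assumed convention: nothing to improve on a perfect baseline.
        return 0.0 if fused == 0.0 else float("-inf")
    return 100.0 * (baseline - fused) / baseline


def tally(baseline: Sequence[float], fused: Sequence[float],
          tol: float = 1e-9) -> Dict[str, int]:
    """Count replications where fusion improved, matched, or worsened the metric."""
    counts = {"improvement": 0, "same": 0, "worse": 0}
    for b, f in zip(baseline, fused):
        if abs(b - f) <= tol:
            counts["same"] += 1
        elif f < b:
            counts["improvement"] += 1  # lower error is better
        else:
            counts["worse"] += 1
    return counts
```

For example, with made-up values, `relative_improvement(2.0, 1.0)` gives 50.0, meaning the fused system halved the error of the baseline in that replication.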
  17. 5. Discussion
      • acoustic laryngeal VQ measures carry considerable speaker-specific information:
        – especially given extraction of measures across vowels & no correction of data
        – outperforms formants on the same recordings
      • unsurprisingly, performance degrades using telephone and mobile samples:
        – but… not by much
        – VQ remarkably robust to channel
  18. 5. Discussion
      • individual differences
        – some speakers are easier to separate using VQ than others (unsurprisingly)
        – system-level performance dependent on the make-up of the speakers tested (cf. Wang et al. 2019)
      • important to test systems with different configurations of speakers
      • we should be cautious in generalising about the speaker discriminatory power of variables
  19. 5. Conclusion
      • automatic measures of laryngeal VQ are useful speaker discriminants:
        – perform well by themselves and are relatively robust to channel
        – can be used to improve ASR systems, especially in degraded conditions
      • but… individual variation (as you would expect)
      • no clear correlation with auditory vocal profile analysis (VPA)
  20. Thanks! Questions?
      vincent.hughes@york.ac.uk
      @VinceH_Forensic
  21. References
      Gold, E. and French, J. P. (2011) International practices in forensic speaker comparison. International Journal of Speech, Language and the Law 18(2): 293-307.
      Nolan, F. (2005) Forensic speaker identification and the phonetic description of voice quality. In: W. Hardcastle and J. Beck (eds), A Figure of Speech. Mahwah, New Jersey: Erlbaum. pp. 385-411.
      Nolan, F., McDougall, K., de Jong, G. and Hudson, T. (2009) The DyViS database: style-controlled recordings of 100 homogeneous speakers for forensic phonetic research. International Journal of Speech, Language and the Law 16(1): 31-57.
      San Segundo, E., Foulkes, P., French, J. P., Harrison, P., Hughes, V. and Kavanagh, C. (2018) The use of the vocal profile analysis for speaker characterisation: a methodological proposal. Journal of the IPA.
      Wang, B., Hughes, V. and Foulkes, P. (2019) Effect of score sampling on system stability in likelihood ratio based forensic voice comparison. Proceedings of the 19th International Congress of Phonetic Sciences, Melbourne, Australia. 5-9 August 2019.