
The contribution of source and filter to speaker characterisation


Hughes, V., Cardoso, A., Foulkes, P., French, J. P., Gully, A. and Harrison, P. (2019) The contribution of source and filter to speaker characterisation. Paper presented at the International Association of Forensic Phonetics and Acoustics Conference. Istanbul, Turkey. 14-17 July 2019.



  1. The contribution of source and filter to speaker characterisation
     Vincent Hughes, Amanda Cardoso, Paul Foulkes, Peter French, Amelia Gully & Philip Harrison
     IAFPA 14-17 July 2019
  2. • explore strengths, weaknesses and relationships between different speaker characterisation methods:
       – phonetic (e.g. voice quality)
       – semi-automatic (e.g. long-term formant distributions)
       – automatic (e.g. MFCCs)
     • today's presentation: source and filter…
  3. 1. Quick summary…
     • source-filter theory of speech production (Fant 1960)
       – assumes independence of source & filter
     • but are they really independent?
       – if yes… combining source and filter features gives better characterisation
       – if no… don't duplicate evidence
     • examine correlations between source and filter features
     • examine the speaker discrimination of an automatic speaker recognition (ASR) system using different combinations of source and filter features
  4. 2. Research questions
     • Are source and filter independent in terms of speaker characterisation?
     • Does the fusion of source and filter features improve speaker characterisation performance over either one individually?
     • Are there differences between individual speakers in terms of source-filter independence?
     (limited prior research on source & filter)
  5. 3. Method: Data
     • DyViS (of course!): high-quality (HQ) recordings, Tasks 1 and 2
     • 90 speakers
     • hesitation marker um:
       – focus on the vowel portion
     • 6758 tokens:
       – c. 35 per speaker per task
       – marked up manually
  6. 3. Method: Features
     Source: F0, energy, additive noise (cepstral peak prominence), spectral tilt (range of relative harmonic measures)
     Filter: formants (F1, F2, F3), MFCCs
     • midpoint measures &
     • multiple measures across the duration of the vowel
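The slide contrasts a single midpoint measure with multiple measures across the vowel. A minimal sketch of both, assuming each feature is stored as a track of (time, value) frames and that values between frames are linearly interpolated; the function name `sample_track`, the toy F2 track and the nine sampling points (10% to 90%) are hypothetical, not the authors' settings.

```python
from bisect import bisect_left


def sample_track(times, values, proportions):
    """Linearly interpolate a feature track at proportional points of the
    vowel's duration (0.0 = onset, 1.0 = offset).

    times must be strictly increasing; values holds one measurement per frame.
    """
    start, end = times[0], times[-1]
    samples = []
    for p in proportions:
        t = start + p * (end - start)
        i = bisect_left(times, t)  # index of the first frame at or after t
        if i == 0:
            samples.append(values[0])
        elif i >= len(times):
            samples.append(values[-1])
        else:
            t0, t1 = times[i - 1], times[i]
            v0, v1 = values[i - 1], values[i]
            samples.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return samples


# Hypothetical F2 track (Hz) for one token of "um", one frame every 10 ms.
times = [0.00, 0.01, 0.02, 0.03, 0.04]
f2 = [1500.0, 1520.0, 1540.0, 1560.0, 1580.0]

midpoint = sample_track(times, f2, [0.5])                           # one value
dynamic = sample_track(times, f2, [k / 10 for k in range(1, 10)])   # 10%..90%
print(midpoint, [round(v, 1) for v in dynamic])
```

Passing `[0.5]` yields the midpoint value; a longer list of proportions yields a time-normalised representation of the whole vowel, which makes tokens of different durations comparable.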
  7. 3. Method: Correlations
     based on midpoint data…
     (1) correlations in raw data
       – token-by-token correlations across all speakers
       – data for two samples pooled per speaker
     (2) correlations in between-speaker distances
       – data modelled with GMMs for each feature
       – Kullback-Leibler divergences (distances) between speakers
       – normalised and averaged by source and filter
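The Kullback-Leibler divergence between two Gaussian mixtures has no closed form, and the slide does not say how it was computed. The sketch below assumes univariate GMMs, a Monte Carlo estimate of KL(p || q) = E_p[log p(x) - log q(x)], and symmetrisation by averaging both directions; all function names and parameter values are illustrative.

```python
import math
import random


def gmm_pdf(x, weights, means, sds):
    """Density of a univariate Gaussian mixture at x."""
    return sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
        for w, m, s in zip(weights, means, sds)
    )


def gmm_sample(rng, weights, means, sds):
    """Draw one value: choose a component by its weight, then sample from it."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(means[k], sds[k])


def kl_monte_carlo(p, q, n=20000, seed=0):
    """Estimate KL(p || q) = E_p[log p(x) - log q(x)] by sampling from p.

    p and q are (weights, means, sds) tuples describing univariate GMMs.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = gmm_sample(rng, *p)
        total += math.log(gmm_pdf(x, *p)) - math.log(gmm_pdf(x, *q))
    return total / n


def symmetric_kl(p, q, n=20000, seed=0):
    """KL is asymmetric; averaging both directions gives a symmetric distance."""
    return 0.5 * (kl_monte_carlo(p, q, n, seed) + kl_monte_carlo(q, p, n, seed))


# Hypothetical two-component GMMs of midpoint F0 (Hz) for two speakers.
speaker_a = ([0.6, 0.4], [110.0, 125.0], [8.0, 10.0])
speaker_b = ([0.5, 0.5], [118.0, 135.0], [9.0, 12.0])
print(round(symmetric_kl(speaker_a, speaker_b), 3))
```

Multivariate features such as MFCCs would need multivariate mixtures, but the estimator has the same form.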
  8. 4. Results: Correlations (raw)
     [Figure: raw-data correlations; axis labels: Source, Filter]
  9. 4. Results: Correlations (distances)
     [Figure: between-speaker distances, source vs filter; R² = 0.0002]
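Step (2) of the correlation method (normalise the per-feature distances, average them into one source distance and one filter distance per speaker pair, then relate the two) can be sketched as follows. This assumes z-score normalisation, since the slides do not name the normalisation used, and computes R² as the squared Pearson correlation of a simple linear fit; feature names and distances are hypothetical.

```python
from statistics import mean, pstdev


def zscore(xs):
    """Put one feature's distances on a common scale (mean 0, SD 1)."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]


def group_distance(distances_by_feature, features):
    """Average the z-scored distances of the given features, pair by pair."""
    normed = [zscore(distances_by_feature[f]) for f in features]
    return [mean(pair_values) for pair_values in zip(*normed)]


def r_squared(xs, ys):
    """R^2 of a simple linear fit, i.e. the squared Pearson correlation."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


# Hypothetical KL distances for the same five speaker pairs, one list per feature.
distances = {
    "F0": [0.8, 2.1, 0.4, 1.7, 1.1],
    "CPP": [0.5, 1.0, 0.9, 1.6, 0.3],
    "F2": [1.9, 0.7, 1.2, 0.6, 2.4],
    "MFCC": [3.1, 1.4, 2.2, 1.0, 2.9],
}
source_d = group_distance(distances, ["F0", "CPP"])
filter_d = group_distance(distances, ["F2", "MFCC"])
print(round(r_squared(source_d, filter_d), 4))
```

An R² near zero, as reported on this slide, means that knowing how far apart two speakers are in source terms says almost nothing about how far apart they are in filter terms.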
  10. 3. Method: Speaker discrimination
      • speakers divided into training, test and reference sets
      • calibrated log-likelihood ratios (LLRs) computed for each feature
      • all possible combinations of source and filter features tested using logistic-regression fusion
      • best source combination fused with the best filter combination
      • evaluation: equal error rate (EER, %) and log LR cost function (Cllr)
      • 20 replications using different speaker combinations for the training, test and reference sets
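The slides do not name the fusion toolkit. As a minimal sketch of logistic-regression fusion, the code below learns an offset and one weight per input system by prior-weighted logistic regression (same-speaker and different-speaker comparisons given equal total weight), so the fused output can be read as a natural-log LR at a prior of 0.5. Plain batch gradient descent, the learning rate, the epoch count and the training scores are all assumptions for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fuse(a, w, scores):
    """Apply the learned offset and weights to one comparison's scores."""
    return a + sum(wi * si for wi, si in zip(w, scores))


def train_fusion(ss_scores, ds_scores, lr=0.5, epochs=3000):
    """Fit fused LLR = a + sum_i w_i * s_i by prior-weighted logistic regression.

    ss_scores / ds_scores hold one score vector per same-speaker /
    different-speaker comparison (one entry per feature system).
    """
    dim = len(ss_scores[0])
    a, w = 0.0, [0.0] * dim
    # (scores, label, weight): each class contributes a total weight of 0.5.
    data = [(s, 1.0, 0.5 / len(ss_scores)) for s in ss_scores]
    data += [(s, 0.0, 0.5 / len(ds_scores)) for s in ds_scores]
    for _ in range(epochs):
        grad_a, grad_w = 0.0, [0.0] * dim
        for s, y, weight in data:
            err = weight * (sigmoid(fuse(a, w, s)) - y)  # gradient of log loss
            grad_a += err
            for i in range(dim):
                grad_w[i] += err * s[i]
        a -= lr * grad_a
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
    return a, w


# Hypothetical training scores: [source LLR, filter LLR] per comparison.
ss = [[1.2, 2.5], [0.4, 1.8], [-0.3, 2.2], [0.9, 0.6], [-0.5, -0.2]]
ds = [[-1.0, -2.4], [0.5, -1.1], [-0.8, 0.3], [-1.5, -2.0], [0.7, 0.9]]
a, w = train_fusion(ss, ds)
print(round(a, 3), [round(wi, 3) for wi in w])
print(round(fuse(a, w, [2.0, 3.0]), 2), round(fuse(a, w, [-2.0, -3.0]), 2))
```

Because the same mapping is fitted on the training set and then applied unchanged to test comparisons, fusion also acts as calibration of the combined scores.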
  11. 4. Results: Speaker discrimination
                          EER (%)           Cllr
                          Mean     Min      Mean    Min
      F0                  21.84    16.83    0.67    0.55
      Additive noise      17.19    10.63    0.61    0.49
      Spectral tilt       16.39    12.76    0.61    0.50
      Energy              47.70    36.03    1.01    0.99
      ALL SOURCE           8.25     3.85    0.32    0.19

      Formants             9.21     4.20    0.35    0.21
      MFCCs                3.21     0.46    0.14    0.08
      ALL FILTER           0.74     0.00    0.07    0.01
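Both metrics in the table can be computed from lists of same-speaker (SS) and different-speaker (DS) scores. A minimal sketch, assuming natural-log LLRs as input: Cllr = 0.5 × [mean over SS of log2(1 + 1/LR) + mean over DS of log2(1 + LR)], and the EER is approximated by trying every observed score as a threshold. A system that always outputs LLR = 0 has Cllr = 1, so Energy's mean Cllr of 1.01 indicates essentially no useful speaker information. The LLR values are hypothetical.

```python
import math


def cllr(ss_llrs, ds_llrs):
    """Log LR cost in bits, for natural-log LRs (LR = exp(llr))."""
    ss_term = sum(math.log2(1 + math.exp(-llr)) for llr in ss_llrs) / len(ss_llrs)
    ds_term = sum(math.log2(1 + math.exp(llr)) for llr in ds_llrs) / len(ds_llrs)
    return 0.5 * (ss_term + ds_term)


def eer(ss_scores, ds_scores):
    """Equal error rate (%), approximated by trying every observed score as a
    threshold and keeping the one where miss and false-alarm rates are closest."""
    best_gap, best_rate = None, None
    for t in sorted(set(ss_scores) | set(ds_scores)):
        miss = sum(s < t for s in ss_scores) / len(ss_scores)          # SS rejected
        false_alarm = sum(d >= t for d in ds_scores) / len(ds_scores)  # DS accepted
        gap = abs(miss - false_alarm)
        if best_gap is None or gap < best_gap:
            best_gap, best_rate = gap, (miss + false_alarm) / 2
    return 100 * best_rate


# Hypothetical natural-log LLRs from one replication.
ss = [2.3, 1.1, 0.4, -0.2, 3.0]
ds = [-2.8, -1.5, 0.1, -0.6, -3.2]
print(f"EER = {eer(ss, ds):.2f}%, Cllr = {cllr(ss, ds):.3f}")
```

EER only reflects how well the scores are ranked, whereas Cllr also penalises poorly calibrated LLR magnitudes, which is why the two are reported together.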
  12. 4. Results: Speaker discrimination
      [Figure: EER (%) and log LR cost (Cllr); panels: Filter, Source & Filter]
  13. 5. Discussion
      Are source and filter independent in terms of speaker characterisation?
      • in our data, essentially independent
      • but…
        – highly controlled demographics (DyViS)
        – only one segment
  14. 5. Discussion
      Does the fusion of source and filter features improve speaker characterisation performance over either one individually?
      • on average, there was an improvement
        – in 16 of 20 replications (up to a 64% decrease in Cllr)
      • but… it depends on the replication
        – possible floor effect in some replications
  15. 5. Discussion
      Are there differences between individual speakers in terms of source-filter independence?
      • yes, it's very speaker-dependent
      • we calculated mean same-speaker (SS) and different-speaker (DS) LLRs for each speaker across all 20 replications
      • then took the difference between filter-only LLRs and source-filter LLRs
  16. 5. Discussion
      when adding source to filter:
      • stronger SS and DS LLRs – better performance
      • stronger SS or DS LLRs (but not both) – better & worse performance
      • weaker SS and DS LLRs – worse performance
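The per-speaker comparison described on the last two slides can be sketched as below. Assumed here: the difference is taken as source-filter minus filter-only, so a positive SS change (more positive LLRs) and a negative DS change (more negative LLRs) both count as stronger evidence; the function name, the three-way labels and the scores are hypothetical.

```python
from statistics import mean


def speaker_change(filter_ss, filter_ds, fused_ss, fused_ds):
    """Compare one speaker's LLRs before and after adding source features.

    Each argument is a list of that speaker's LLRs (e.g. one mean per
    replication). Differences are source-filter minus filter-only.
    """
    d_ss = mean(fused_ss) - mean(filter_ss)  # > 0: SS LLRs stronger
    d_ds = mean(fused_ds) - mean(filter_ds)  # < 0: DS LLRs stronger
    ss_stronger, ds_stronger = d_ss > 0, d_ds < 0
    if ss_stronger and ds_stronger:
        label = "better"
    elif ss_stronger or ds_stronger:
        label = "mixed"  # better for one comparison type, worse for the other
    else:
        label = "worse"
    return d_ss, d_ds, label


# Hypothetical per-replication mean LLRs for one speaker (3 replications shown).
print(speaker_change([2.1, 1.8, 2.4], [-3.0, -2.7, -3.3],
                     [2.9, 2.5, 3.1], [-3.6, -3.1, -3.9]))
```

Counting the "better" labels across all speakers gives the kind of tally reported next (51 of 90 speakers).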
  17. 5. Discussion
      • 51 of 90 speakers: stronger SS and DS LLRs
      • three problematic speakers: #33, #96 and #113
      • is there anything odd about these speakers?
        – thin/short worms for source features: weak/error-prone scores and low variability
        – average to strong scores (sheep/doves) for filter features
        – source features weaken filter features
        – but voice quality (VQ) profiles not unusual
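The "worms", "sheep" and "doves" on this slide come from the biometric menagerie (Doddington et al.'s sheep, goats, lambs and wolves; Dunstone and Yager's doves, worms, chameleons and phantoms). A minimal sketch of the Dunstone and Yager labels, assuming per-speaker mean SS and DS LLRs and top/bottom-quartile cut-offs. It does not reproduce the zoo plot's ellipses, whose extent reflects score variability (the "thin/short" description). Speaker IDs and scores are hypothetical.

```python
from statistics import mean, quantiles


def zoo_labels(ss_by_speaker, ds_by_speaker):
    """Label speakers from per-speaker mean SS and DS scores, using the
    top and bottom quartiles of each distribution as cut-offs."""
    ss_means = {spk: mean(v) for spk, v in ss_by_speaker.items()}
    ds_means = {spk: mean(v) for spk, v in ds_by_speaker.items()}
    ss_q1, _, ss_q3 = quantiles(list(ss_means.values()), n=4)
    ds_q1, _, ds_q3 = quantiles(list(ds_means.values()), n=4)
    labels = {}
    for spk, ss in ss_means.items():
        ds = ds_means[spk]
        high_ss, low_ss = ss >= ss_q3, ss <= ss_q1
        high_ds, low_ds = ds >= ds_q3, ds <= ds_q1
        if high_ss and low_ds:
            labels[spk] = "dove"       # strong SS and strong (low) DS scores
        elif low_ss and high_ds:
            labels[spk] = "worm"       # weak SS and weak (high) DS: error-prone
        elif high_ss and high_ds:
            labels[spk] = "chameleon"
        elif low_ss and low_ds:
            labels[spk] = "phantom"
        else:
            labels[spk] = None         # not in the extreme quartiles
    return labels


# Hypothetical mean LLRs for eight speakers.
ss_demo = {f"s{i}": [float(i)] for i in range(1, 9)}
ds_demo = {"s1": [8.0], "s2": [5.0], "s3": [4.0], "s4": [3.0],
           "s5": [6.0], "s6": [2.5], "s7": [7.0], "s8": [1.0]}
labels = zoo_labels(ss_demo, ds_demo)
print(labels)
```

Running the same labelling separately on source-only and filter-only scores would show whether a speaker is a worm in one domain but a sheep or dove in the other.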
  18. 6. Conclusions
      • tested source-filter independence in the context of speaker characterisation
      • no correlations in the raw data
      • general improvement in speaker discrimination by combining source & filter features
      • but… speaker effects
        – key (next) question: why these speakers?
  19. Thanks! Questions?
      Special thanks to Frantz Clermont
