Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Holistic perception of voice quality matters more than L1 when judging speaker similarity in short stimuli

72 views

Published on

San Segundo, E., Foulkes, P. and Hughes, V. (2016) Holistic perception of voice quality matters more than L1 when judging speaker similarity in short stimuli. Paper presented at the 16th Australasian Conference on Speech Science and Technology (ASSTA). University of Western Sydney, Australia. 6-9 December 2016.

Published in: Education
  • Be the first to comment

  • Be the first to like this

Holistic perception of voice quality matters more than L1 when judging speaker similarity in short stimuli

  1. 1. Holistic perception of voice quality matters more than L1 when judging speaker similarity in short stimuli Eugenia San Segundo, Paul Foulkes & Vincent Hughes University of York SST 2016 Parramatta, Australia 7-9 December
  2. 2. 1. Introduction 2 - Voice quality(VQ) …but which VQ? Quasi-permanent quality resulting from a combination of long-term laryngeal and supralaryngeal features (Laver, 1980) – broad definition - social marker à membership of a speech community - idiosyncratic à characteristic colouring, or timbre FORENSIC PHONETICS! • forensic speaker comparison • earwitness evidence/testimony APPROACH Articulatory/Perceptual/Acoustic naïve listeners experts - holistic -featural
  3. 3. à naïve listeners will rely on holistic VQ perception in order to judge similarity between speakers... …. regardless of their L1 i.e. no native language advantage (cf. Perrachione et al. 2009) § when? under controlled conditions of speaker similarity: 2. Hypothesis • shared dialect • approximate age • approximate mean F0 § what? short speech samples VQ = only resource available for listeners to judge speaker similarity (i.e. nasality, creaky voice)
  4. 4. 3. Materials and method 3.1. Subjects 5 pairs male MZ twins: – native Spanish (Madrid) – no voice pathologies – similar sounding: 1. similar age mean 21, sd 3.7 2. similar mean F0 mean 113 Hz, sd 13 Hz 3. similar Euclidean Distance (ED) à based on VQ perceptual assessment 4
  5. 5. VPA settings Speaker Labial Mandibular LingualTip Lingualbody Velopharyngeal Pharyngeal LarynxHeight VTtension Larynxtension Phonationtype SMC AGF 0 0 0 0 0 0 2 1 1 1 SGF 0 1 0 0 1 0 2 1 1 1 Match 1 0 1 1 0 1 1 1 1 1 0.8 AMG 0 1 0 2 2 2 1 1 1 0 EMG 0 1 0 2 0 2 1 1 0 0 Match 1 1 1 1 0 1 1 1 0 1 0.8 ASM 1 0 0 0 1 0 2 0 1 1 RSM 1 0 0 0 1 0 2 2 2 0 Match 1 1 1 1 1 1 1 0 0 0 0.7 ARJ 2 2 1 1 0 1 2 2 1 0 JRJ 1 2 0 0 0 0 2 2 2 0 Match 0 1 0 0 1 0 1 1 0 1 0.5 DCT 0 2 0 2 0 2 0 2 1 1 JCT 0 0 0 0 2 2 2 0 1 1 Match 1 0 1 0 0 1 0 0 1 1 0.5 3.1. Subjects …similar perceptual ED 3. Materials… 5 – based on perceptual asssesment of VQ – using a simplified version of the VPA scheme: – 10 settings e.g. mandibular setting (close – neutral – open) • EDs = Similarity Matching Coefficients (SMC) number of setting matches number of settings (10) 1 0 2 mean SMC = 0.66 SMC= APPROACH Perceptual Naïve listeners Experts
  6. 6. 3. Materials and method 3.2. Stimuli and listeners Stimuli • ~3 secs • from spontan. conversations – interlocutor = controlled – same speaking style • declarative sentences – different ling. content – diverse neutral topics 6 Listeners • 20 native Spanish speakers age range 22-51; mean 33 • 20 native English speakers age range 19-35; mean 25 no knowledge of Spanish! à no hearing difficulties àrecruited at universities in
  7. 7. 3. Materials and method 3.3. Design of perceptual test • Multiple Forced Choice Praat experiment 90 different-speaker pairings – random order • Instructions for listeners: “please rate their similarity from 1 to 5” very similar very different • Familiarization test: short pre-test • Test duration = 15 min (break every 30 stimuli) • Test administration = silent room + PC with headphones • Listeners were not told that the test included twin pairs! 7 1 432 5
  8. 8. 3. Materials and method 3.4. Analysis methods • Multidimensional Scaling (MDS) à to visualize degree of perceived similarity à to detect meaningful dimensions that explain observed (dis)similarities • Mixed-effects modelling (MEM) à to fit models to the similarity ratings — Fixed effects (predictors): § Listener language § SMC between speakers in the target trial § Reaction time § Twins –whether speakers were twins or not 8 — Random effects: § Listeners § Trial (target sp. comparison)
  9. 9. 4. Results • MDS analysis scree plot: relative magnitude of the sorted Eigenvalues 9 stress:0.03 stress:0.07 *stress = goodness-of-fit criterion to minimize - A certain amount of distortion is tolerable. Rule of thumb: <0.1 is excellent; >0.15 is unacceptable (>0.20 is poor)
  10. 10. 4. Results • MDS plots (2D) 10 stress: 0.8
  11. 11. 4. Results • MDS plots (3D) 11 stress: 0.4
  12. 12. 4. Results 12 • Intra-pair EDs based on 7D speakers → AGF SGF DCT JCT ARJ JRJ ASM RSM AMG EMGlisteners ↓ Spanish 0.341 0.343 0.345 0.369 0.607 English 0.264 0.219 0.349 0.435 0.445 most similar most different
  13. 13. 4. Results 13 • Mixed-effects modelling – Best model à all fixed effects + interactions Est. Std Error z val Pr(>|z|) Language (Spanish) 2.98 0.94 3.165 0.001 ** SMC 1.97 1.38 1.43 0.152 Reaction Time 0.03 0.07 0.49 0.627 Twins (yes) -12.9 3.77 -3.44 <0.001 *** Language* SMC -.3.96 2.15 -1.84 0.066 Language* Reaction Time -0.35 0.11 -3.15 0.002 ** SMC* Reaction Time -0.18 0.15 -1.20 0.229 Language* Twins 5.59 5.83 0.96 0.337 SMC*Twins 13.45 5.75 2.34 0.019 * Reaction Time *Twins 1.21 0.41 2.93 0.003 ** • Significant interactions: ü Language * Reaction time ü Reaction time * Twins ü SMC * Twins • Besides… descriptive patterns
  14. 14. 4. Results 14 • Mixed-effects modelling – Best model à all fixed effects + interactions – Significantinteractions: üLanguage * Reaction time üReaction time * Twins üSMC * Twins
  15. 15. 4. Results 15 ü Language * Reaction time language*reaction_time effect plot reaction_time response(probability) 0.1 0.2 0.3 0.4 0.5 0 2 4 6 8 10 12 14 :language English :response 1 :language Spanish :response 1 :language English :response 2 0.1 0.2 0.3 0.4 0.5 :language Spanish :response 2 0.1 0.2 0.3 0.4 0.5 :language English :response 3 :language Spanish :response 3 :language English :response 4 0.1 0.2 0.3 0.4 0.5 :language Spanish :response 4 0.1 0.2 0.3 0.4 0.5 :language English :response 5 0 2 4 6 8 10 12 14 :language Spanish :response 5
  16. 16. 4. Results 16 ü Reaction time * Twins (language independent effects!) reaction_time*twins effect plot reaction_time response(probability) 0.2 0.4 0.6 0.8 0 2 4 6 8 10 12 14 :twins No :response 1 :twins Yes :response 1 :twins No :response 2 0.2 0.4 0.6 0.8 :twins Yes :response 2 0.2 0.4 0.6 0.8 :twins No :response 3 :twins Yes :response 3 :twins No :response 4 0.2 0.4 0.6 0.8 :twins Yes :response 4 0.2 0.4 0.6 0.8 :twins No :response 5 0 2 4 6 8 10 12 14 :twins Yes :response 5
  17. 17. 4. Results 17 ü SMC * Twins (language independent effects!) smc*twins effect plot smc response(probability) 0.2 0.4 0.6 0.8 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 :twins No :response 1 :twins Yes :response 1 :twins No :response 2 0.2 0.4 0.6 0.8 :twins Yes :response 2 0.2 0.4 0.6 0.8 :twins No :response 3 :twins Yes :response 3 :twins No :response 4 0.2 0.4 0.6 0.8 :twins Yes :response 4 0.2 0.4 0.6 0.8 :twins No :response 5 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 :twins Yes :response 5
  18. 18. 5. Discussion 18 - MDS • optimal configuration = 7D space – lowest possible stress value – confirms VQ multidimensionality (Kreiman & Sidtis 2011) • from most similar…. …to least similar twins pairs same cue prominence?different weight?
  19. 19. 5. Discussion 19 - Mixed Effects Modelling • mostly language-independent effects – notably: twins rated as more similar than non-twins • …but one language-dependent effect: language*reaction_time effect plot reaction_time response(probability) 0.1 0.2 0.3 0.4 0.5 0 2 4 6 8 10 12 14 :language English :response 1 :language Spanish :response 1 :language English :response 2 0.1 0.2 0.3 0.4 0.5 :language Spanish :response 2 0.1 0.2 0.3 0.4 0.5 :language English :response 3 :language Spanish :response 3 :language English :response 4 0.1 0.2 0.3 0.4 0.5 :language Spanish :response 4 0.1 0.2 0.3 0.4 0.5 :language English :response 5 0 2 4 6 8 10 12 14 :language Spanish :response 5 mean. 0.82 s sd: 0.14 mean. 0.84 s sd: 0.18
  20. 20. 6. Conclusions 20 • aim à explore the role of holistic VQ perception in speaker similarity ratings • results à native ≈ non-native ratings of similarity § no native advantage - short stimuli + homogeneous population (same accent, similar age, etc.) § VQ = available resource • possible implications in earwitness testimony • future studies: § interrelationships between - (naïve) holistic VQ perception - (expert) featural VQ perception à different salience à weigthing methods
  21. 21. Thanks! Questions?

×