Current Dev. In Phonetics

Transcript

  • 1. Current developments in phonetics. Louis C.W. Pols, Institute of Phonetic Sciences (IFA), Amsterdam Center for Language and Communication (ACLC)
  • 2. Overview
    • My job: Aspects of Phonetics
    • Size of Phonetics Community
    • Ken Stevens and Phonetics
    • My choices of topics
    • Conclusions
  • 3. Aspects of Phonetics
    • Phonetics is fantastic and interdisciplinary
    • it is about the speech signal and much more:
    • spoken language, spoken communication
    • phonemes and prosody
    • speaking and listening
    • mental storage and retrieval
    • speech acquisition and speech pathology
    • speech technology, speech databases
    • languages of the world, dialects
    • and more: e.g. laboratory phonology, evaluating cochlear implants, or designing Web avatars
  • 4. My choices, given other intros
    • Phonetics as a basic science
    • Computational modeling, computational phonetics
    • Knowledge from annotated, and preferably freely accessible, speech corpora
    • Phonetics as an interdisciplinary science
  • 5. Size of Phonetics Community
    • 1000+ participants at speech conferences such as ICSLP & Eurospeech (now Interspeech under ISCA), ICASSP, LREC, ICPhS (under IPA)
    • numerous workshops (see ISCA and FoNETiks newsletters, and News section in SpeCom)
    • IPA ~1000 members, ISCA ~1350 members
    • phonetics community at least 10 times bigger
    • books; journals; LDC & ELRA
    • ICPhS’03 Barcelona: 50 countries (USA 158; FR 81; GE 73; UK 71; JAP 46; SP 45, SW 41; NE 31; CAN 25; RU 19; IT 17; FI 14; AU 12; BRA 12, CH 12)
    • Ba-Ma (Bachelor-Master) system: less specialization in Phonetics
  • 6. Ken Stevens and Phonetics
    • ESCA medal at Eurospeech’95 in Madrid
    • on average one paper per year in JASA
    • special issue JPhon “Quantal Theory” (1989)
    • 1998 masterpiece “Acoustic Phonetics”
    • regular keynote speaker at conferences
    • many international contacts (also Europe)
    • many good students world-wide
  • 7. Banquet Eurospeech’95, Madrid [Photo: J.M. Pardo (E’95 chairman), Ken Stevens (ESCA medalist), Mrs. Pardo, Louis Pols (ESCA president)]
  • 8. Textbook Phonetics
    • Summer course in English Phonetics (UCL):
      • phonemic systems (vowels and consonants)
      • segmental analysis (allophonic processes)
      • word stress
      • weakening and coarticulation processes
      • sentence stress (accent, tonal stress)
      • intonation and meaning
    • similar in most textbooks
  • 9. Invariance Symp., MIT 1983
    • Invariance and variability in speech processes (Perkell & Klatt, 1986)
    • also Leitmotiv for my Amsterdam group
    • perception of dynamic speechlike sounds (van Wieringen)
    • formant dynamics (van Son)
    • appropriate context (van Son)
    • acoustic vowel reduction (van Bergem)
    • efficiency of speech (van Son)
  • 10. DL for short speech-like transitions
    • Adopted from van Wieringen & Pols (1998), “Discrimination of short and rapid speechlike transitions”, Acta Acustica 84, 520-528
    • [Figure: panel labels simple vs. complex; short vs. longer transitions; initial vs. final]
  • 11. Static vs. dynamic V recogn.
    • see Weenink (2001)
      • “Vowel normalizations with the TIMIT acoustic phonetic speech corpus”, IFA Proc. 24, 117-123
    • 438 males, both train & test sentences TIMIT
    • 35,385 vowel segments, hand segmented
    • 13 monophthongal vowel categories
    • 1-Bark bandfilter analysis (18), intensity normalization
    • 3 frames per segment: central and 25 ms L/R
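The static and dynamic conditions on this slide differ only in how many spectral frames represent each vowel segment. A minimal sketch of that frame selection, assuming segment boundaries in seconds and a hypothetical `spectrum_at` callback that returns the 18 band levels at a given time (these names are illustrative and do not come from Weenink's paper):

```python
from typing import Callable, List, Sequence

FRAME_OFFSET_S = 0.025  # 25 ms to the left and right of the segment center

# Hypothetical type: maps a time in seconds to the 18 band levels at that time.
SpectrumFn = Callable[[float], Sequence[float]]


def frame_times(start_s: float, end_s: float) -> List[float]:
    """Return the three analysis times: center - 25 ms, center, center + 25 ms."""
    center = (start_s + end_s) / 2.0
    return [center - FRAME_OFFSET_S, center, center + FRAME_OFFSET_S]


def static_features(start_s: float, end_s: float, spectrum_at: SpectrumFn) -> List[float]:
    """Static condition: band levels of the central frame only."""
    center = frame_times(start_s, end_s)[1]
    return list(spectrum_at(center))


def dynamic_features(start_s: float, end_s: float, spectrum_at: SpectrumFn) -> List[float]:
    """Dynamic condition: left, central, and right frames concatenated."""
    features: List[float] = []
    for t in frame_times(start_s, end_s):
        features.extend(spectrum_at(t))
    return features


def fake_spectrum(t: float) -> List[float]:
    """Placeholder for a real 18-channel bandfilter analysis (assumption)."""
    return [t] * 18


static = static_features(0.100, 0.200, fake_spectrum)
dynamic = dynamic_features(0.100, 0.200, fake_spectrum)
print(len(static), len(dynamic))
```

With 18 bands per frame, the static vector has 18 values and the dynamic vector has 54; a classifier such as a discriminant analysis would then be trained on either representation.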
  • 12. Some results
    • Vowel classification (%) with discriminant functions

      Condition                        # Items   Static (1 frame)   Dynamic (3 frames)
      Original, 438x13x(1…25)          35,385    59.3               66.9
      Speaker normalized               35,385    62.2               69.2
      V centers per speaker, 438x13    5,374     78.9               90.1
      Speaker normalized               5,374     87.9               94.5
  • 13. Perceiving (speech) dynamics
    • vowel perception with or without transitions?
    • our claims (vSon, IFA Proc. 17, 1993):
      • only evidence for compensatory processes (i.e. perceptual-overshoot and dynamic-specification), when in an appropriate context
      • synthetic isolated dynamic formant tracks lead to perceptual undershoot (=averaging)
      • silent center studies are ambiguous
    • conclusion: information in formant dynamics is only used when vowels are heard in an appropriate context
  • 14.  
  • 15. Vowel identification
    • compare V responses for dynamic stimuli with those for static stimuli
    • calculate net shift in V responses per onglide (CV), complete (CVC), or offglide (VC)
    • result: responses average over the trailing part of the formant track
    • see Pols & vSon, “Acoustics and perception of dynamic vowel segments”, Speech Comm.
  • 16. Perceptual undershoot
    • [Figure: net shift in vowel responses to tokens with curved formant tracks vs. stationary tokens. All values significant, except small open triangles]
  • 17. Local context and C & V identification
    • 120 CVC fragments taken from a read text
    • various segments per CVC fragment (50 ms V-kernel and beyond)
    • both accented and unaccented vowels
    • subjects identified (pre- or post-vocalic) consonant or vowel in CV-, VC-, or CVC-segments
    • vSon & Pols (1999), “Perisegmental speech improves consonant and vowel identification”, Speech Comm. 29, 1-22
  • 18.  
  • 19. Error rates of vowel identification for the individual stimulus token types. Long-short vowel errors (/α-a:, -o:/) are ignored
  • 20.
    • results:
    • phoneme identification benefits from extra speech
    • left context more beneficial than right context
    • better identification in CV when also other member of pair was identified correctly (context effect)
  • 21. Effect of (lack of) context
    • 100 Dutch listeners identifying V segments
      • “Vowel contrast reduction”, K-vBeinum (1980)
    • $\mathrm{ASC} = \frac{1}{n}\sum_{i=1}^{n} \left| LF_i - \overline{LF} \right|^2$ (total variance), with $LF_i = 100 \log_{10} F_i$

      3 conditions                           M1           M2           F1           F2           Av.
                                             ASC   %      ASC   %      ASC   %      ASC   %      ASC   %
      isolated V (3)                         433   95.2   404   88.9   447   88.0   634   86.4   480   89.6
      words (5)                              406   88.1   320   78.8   374   84.9   529   85.3   407   84.3
      unstressed, free conversation (10)     174   31.2   119   28.7   209   33.3   255   38.9   189   33.0
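The ASC measure on this slide is the mean squared distance of a speaker's vowel points from their centroid in a log-formant space. A minimal sketch, assuming each vowel is given as an (F1, F2) pair in Hz and that the distance is Euclidean in the (LF1, LF2) plane; the formant values below are illustrative placeholders, not data from the 1980 study:

```python
import math
from typing import List, Tuple

Vowel = Tuple[float, float]  # (F1, F2) in Hz


def log_formant(f_hz: float) -> float:
    """LF = 100 * log10(F), as defined on the slide."""
    return 100.0 * math.log10(f_hz)


def asc(vowels: List[Vowel]) -> float:
    """Mean squared distance of the vowel points to their centroid (total variance)."""
    points = [(log_formant(f1), log_formant(f2)) for f1, f2 in vowels]
    n = len(points)
    mean_lf1 = sum(p[0] for p in points) / n
    mean_lf2 = sum(p[1] for p in points) / n
    return sum((p[0] - mean_lf1) ** 2 + (p[1] - mean_lf2) ** 2 for p in points) / n


# Placeholder vowel triangles (assumed values): clearly articulated vs. reduced.
clear_vowels = [(280.0, 2250.0), (750.0, 1300.0), (300.0, 800.0)]
reduced_vowels = [(380.0, 1800.0), (550.0, 1400.0), (400.0, 1100.0)]

print(round(asc(clear_vowels)), round(asc(reduced_vowels)))
```

A vowel system that shrinks toward its centroid gives a smaller ASC, which is the pattern the table shows when going from isolated vowels to unstressed vowels in free conversation.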
  • 22. Historical biases
    • R. Plomp (2002) “The intelligent ear. On the nature of sound perception”
    • biases in research:
      • dominance of simple stimuli (e.g., phonemes)
      • preference for microscopic approach (e.g., phoneme discrimination rather than intelligibility)
      • emphasis on psychophysical rather than cognitive aspects of hearing
      • use of clean signals in lab (rather than acoustic reality of outside world with its disruptive sounds)
  • 23. Computational Phonetics
    • R. Moore (1995) 13th ICPhS, Stockholm
    • unify the emerging theoretical and practical developments in speech technology with the established knowledge and practices in phonetic sciences
    • Sagisaka et al. (1997), “Computing prosody. Computational models for processing spontaneous speech”
    • Klatt (1987), vSanten (1997), Wang (1997): duration modeling
    • vBergem (1993): acoustic and lexical vowel reduction
    • Steeneken (1992): Speech Transmission Index
  • 24. Stylized formant contour
    • $F_2(t) = c_0 + c_1 t + c_2 t^2$ (second-order polynomial)
    • $F_2(t) = \overline{F_2}(t) + \alpha_{2p}(t) + \beta_{2t}(t) + \gamma_{2\alpha}(t)$ for @ in /p@tα/
    • $F_2(-1) = 1352$ Hz; $F_2(0) = 1435$ Hz; $F_2(1) = 1485$ Hz
    • [Figure: F2 against normalized time (-1, 0, 1), marking F onset, F center (c0), and F offset, with c1 and c2 indicated]
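The three F2 values on this slide are enough to determine the coefficients of the second-order polynomial. A minimal sketch, assuming the contour passes exactly through the onset, center, and offset values at normalized times -1, 0, and 1 (the function name is hypothetical):

```python
from typing import Tuple


def fit_stylized_contour(f_onset: float, f_center: float, f_offset: float) -> Tuple[float, float, float]:
    """Solve F(t) = c0 + c1*t + c2*t**2 through the points at t = -1, 0, 1."""
    c0 = f_center                               # F(0) = c0
    c1 = (f_offset - f_onset) / 2.0             # from F(1) - F(-1) = 2*c1
    c2 = (f_offset + f_onset) / 2.0 - f_center  # from F(1) + F(-1) = 2*c0 + 2*c2
    return c0, c1, c2


def contour(t: float, c0: float, c1: float, c2: float) -> float:
    """Evaluate the stylized contour at normalized time t."""
    return c0 + c1 * t + c2 * t * t


c0, c1, c2 = fit_stylized_contour(1352.0, 1435.0, 1485.0)
print(c0, c1, c2)  # 1435.0 66.5 -16.5
```

Here c1 captures the overall slope of the track and c2 its curvature, so the three numbers summarize a whole schwa formant track.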
  • 25. Schwa realization The schwa is not just a centralized vowel but something that is completely assimilated with its phonemic context
  • 26. Human word intelligibility vs. noise, from Ph.D. thesis H. Steeneken (1992), ‘On measuring and predicting speech intelligibility’
  • 27. Knowledge from Annotated Sp. Corp.
    • knowledge cast in rules vs. knowledge derived from intelligent searches in databases
    • vSanten (1997) greedy algorithm
    • Greenberg et al. (2003) Switchboard
    • Oostdijk et al. (2002) Spoken Dutch Corpus (CGN): 1000 hrs., 10M words
    • vSon et al. (2001) 5.5 hrs. IFA corpus
    • Intas915 project (Dutch, Finnish, Russian)
  • 28. Freq. effects vs. vowel reduction
    • [Figure: two panels, read speech and spontaneous speech, showing the correlation coefficient R between -log2(word frequency) and acoustic vowel reduction (in terms of Duration, F1F2Dist, CoG, and Intensity) for Dutch, Finnish, and Russian]
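Slide 28 relates word frequency, expressed as -log2(word frequency), to acoustic measures of vowel reduction through a correlation coefficient R. A minimal sketch of that computation, assuming R is a plain Pearson correlation over vowel tokens; the relative frequencies and durations below are made up for illustration and are not values from the corpora:

```python
import math
from typing import List, Sequence, Tuple


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical tokens: (relative word frequency, vowel duration in ms).
tokens: List[Tuple[float, float]] = [
    (0.02, 48.0),
    (0.005, 55.0),
    (0.001, 70.0),
    (0.0002, 66.0),
    (0.00005, 82.0),
]

information = [-math.log2(freq) for freq, _ in tokens]  # rarer word -> larger value
durations = [dur for _, dur in tokens]

r = pearson_r(information, durations)
print(round(r, 3))
```

A positive R here means that vowels in rarer words are longer, that is, less reduced; the same function can be applied to the other measures named on the slide.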
  • 29. Phonetics an Interdisciplinary Science
    • some examples
      • phonetics is a contributor to many signal and data processing techniques as well as pattern recognition techniques
      • use of source-filter model to describe early speech development
      • laryngectomized speech, production and evaluation
      • turn switches in conversational dialogs
      • progress in vowel production in babies
  • 30. Early speech development
    • vBeinum, Clement, vdDikkenberg, Developmental Sc. 4, 61-70 (2001)
    • average onset (in weeks):
      • Stage I: 0
      • Stage II: 6
      • Stage III: 10
      • Stage IV: 20
      • Stage V (babbling): 31
      • Stage VI (‘words’): 40
  • 31. Tracheoesophageal speech C. van As, Ph.D thesis (2001)
  • 32. Turn switches in conversation
    • shift in phonetics from isolated stimuli to conversational speech
    • quantitative modelling of the identification of turn-relevant places (TRPs)
    • integration process of temporally unfolding information at different levels in speech, from conversation acts and semantics to prosody, phonetics and visual cues
    • use of laryngograph to detect preparatory glottal closure that precedes most TRPs
    • new project Rob van Son (start Jan. 2004)
  • 33. Progression in V production of babies
    • especially in the first year of life
    • utterances difficult to identify as phoneme sequences
    • spectro-temporal analyses difficult because of very high pitch
    • formant measurements biased by expectations
    • pitch-related bandfilter analysis (automatic)
    • 5 normal-hearing and 5 hearing-impaired
    • vdStelt et al. (2003)
  • 34. Spectral measurements
    • [Figure: spectra of the vowels i, u, a for a normal-hearing child and a hearing-impaired child, each at 5 and 24 months]
  • 35. Conclusions
    • importance of dynamic information
    • implications of (lack of) (local) context
    • interdisciplinary nature of phonetics
    • need for large, annotated, and freely accessible speech corpora
    • generalization via computational phonetics
    • phonetics and phonology (Patricia Keating)