Published in: Technology


  1. 1. CS 551/652: Structure of Spoken Language. Lecture 2: Spectrogram Reading and Introductory Phonetics. John-Paul Hosom, Fall 2008.
  2. 2. Spectrogram Reading: Why bother?
     What’s the point of spectrogram reading? Do people read spectrograms as part of their job? Do computers “read” spectrograms in order to recognize speech?
     There are some jobs that require spectrogram reading (e.g. phonetic time alignment), but not many. Automatic speech recognition systems do not process speech in this way.
     Primary reason for spectrogram reading:
     • If you’re going to work on a problem, it’s advisable to understand the nature of that problem. Spectrogram reading provides a direct method for “hands-on” learning of the characteristics of speech. Studying phonetics, signal processing, or techniques in speech recognition/speech synthesis does not fully convey the complexity and structure of spoken language.
  3. 3. A great website on spectrogram reading: includes “how to” tips on spectrogram reading, a monthly “mystery spectrogram”, and archives of past months’ spectrograms.
  4. 4. Phonetics: Introduction
     Phonology: A description of the systems and patterns of sounds that occur in a language (abstract), often involving comparisons between languages and/or evolution of a language over time.
     Phonetics: A branch of phonology that deals with individual speech sounds, their production, and their written representation.
     Phoneme:
     • A unit of speech that can be used to differentiate words (e.g. “cat” /k ae t/ vs. “bat” /b ae t/).
     • Phonemes identify minimal pairs in a language.
     • The set of phonemes in a language is subject to interpretation; most languages have 20 to 40 phonemes.
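The minimal-pair idea above can be sketched in a few lines of Python. The function name `is_minimal_pair` and the list-of-symbols representation are assumptions made for illustration, not part of the lecture. The sketch treats two words as a minimal pair when their phoneme sequences have the same length and differ in exactly one position.

```python
def is_minimal_pair(a, b):
    """Return True if phoneme sequences a and b differ in exactly one position."""
    if len(a) != len(b):
        return False
    differences = sum(1 for x, y in zip(a, b) if x != y)
    return differences == 1


# Phoneme strings taken from the slide: "cat" /k ae t/ and "bat" /b ae t/.
cat = "k ae t".split()
bat = "b ae t".split()

print(is_minimal_pair(cat, bat))  # True: only the first phoneme differs
print(is_minimal_pair(cat, cat))  # False: no difference at all
```

This deliberately ignores pairs that differ by an inserted or deleted phoneme; whether those count is a matter of definition.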
  5. 5. Phonetics: Introduction
     Allophone: A speech sound constituting one of the systematic phonetic variants of a given phoneme. Different allophones are predictable from environment (e.g. “toe”, “caught”, “fitness”, “writer”; “sill”, “still”, “spill”).
     Phone: An acoustic realization of a phoneme. (Many different phones may represent the same phoneme.)
     “The phoneme /s/ consists of more than 100 allophones” (Pickett, The Acoustics of Speech Communication, p. 7).
     Phonemes are indicated by / /; phones (allophones) are indicated by [ ].
  6. 6. Phonetics: Introduction
     Syllable:
     • Unit of speech containing one or more phonemes.
     • A vowel in a syllable is called the syllable nucleus.
     • Most syllables contain one vowel (or diphthong); some contain only a lateral (“bott/le”) or nasal (“butt/on”) as the most intense sound.
     • Syllable boundaries are sometimes ambiguous (“tas/ty” vs. “tast/y” vs. “ta/sty”).
     Coarticulation: The “blending” of two or more adjacent phones, causing a non-distinct boundary between them. Coarticulation is caused by smooth changes in the articulators (lips, tongue, jaw) over time.
  7. 7. Phonetics: Introduction
     Coarticulation example: “you are”, /y uw aa r/. [Figure: the utterance with phone labels y, uw, aa, r.]
  8. 8. Phonetics: Introduction (adapted from Schane, pp. 4-6)
     • Speech signal is continuous; we perceive discrete entities. (How many sound units are in the word “cat”?)
     • One assumption of phonology: utterances can be represented as a sequence of discrete units.
     • Are such units purely an “invention” of linguistics? Spoonerisms (“belly jeans” vs. “jelly beans”) and rhymes indicate small units of language. (Reverend William Archibald Spooner, 1844-1930)
     • Utterances of the same word(s) have many differences… we’re usually only interested in those differences that are “linguistically significant” or that are “perceived as different”.
     • This implies a somewhat subjective nature to phonology, whereas we want an objective measure of perceived or produced units.
  9. 9. Phonetics: Distinctive Phonetic Features
     • Phonemes do not differ randomly from one another; there are relationships among phonemes (e.g. /p/ vs. /t/ vs. /ah/).
     • A (distinctive) feature is a “phonetic property that can be used to classify sounds” [Ladefoged, p. 42].
     • Typically, features are associated with aspects of articulation.
     • Features may be binary or multi-valued.
     • Capital letters indicate a feature name (Manner); square brackets [] indicate a feature value ([+fricative]).
  10. 10. Phonetics: Distinctive Phonetic Features
     • Exact set of features and feature values depends on goals (no “right” or “wrong” set of features or values).
     • Distinctive features provide a vocabulary for describing speech.
     • Are distinctive features purely an “invention” of linguistics? Memory tasks show that when people forget a phoneme, they usually remember a phoneme with similar distinctive features.
  11. 11. Phonetics: Distinctive Phonetic Features
     [Figure: The Speech Production Apparatus (from Olive, p. 23). Labeled parts: nasal tract, (hard) palate, oral tract, velum (soft palate), velic port, tongue, tongue tip, pharynx, glottis (vocal folds and space between vocal cords), vocal folds (larynx) = vocal cords, alveolar ridge, lips, teeth.]
  12. 12. Phonetics: Distinctive Phonetic Features *
     Feature: Description
     Consonantal: produced with a constriction along center line of oral cavity. Only vowels, /w/, /h/, and /y/ are not.
     Vocalic: largely unobstructed vocal tract. Vowels and liquids (/l/, /r/) are vocalic; glides (/w/, /y/) are not.
     Anterior: point of articulation near alveolar ridge, including all labial and dental sounds.
     Coronal: articulation involves front of tongue.
     Continuant: no complete obstruction in oral cavity; only nasals, stops, and affricates are non-continuant.
     Strident: articulation with long, narrow constriction, such as /s/, /z/, /f/, /v/, /sh/, /zh/, /ch/, /jh/.
     Voiced: vibration of the vocal folds occurs during articulation.
  13. 13. Phonetics: Distinctive Phonetic Features *
     Feature: Description
     Lateral: contact between corona of tongue and roof of mouth, with lowering of sides of tongue (only /l/ in English).
     Nasal: lowering of the velic port and opening of nasal cavity.
     High: vowel with high tongue position (narrow constriction); in English, /iy/, /ih/, /uh/, /uw/.
     Low: vowel with low tongue position (no constriction); /ae/, /ao/, /aa/ are (some) low vowels in English.
     Back: vowels produced with tongue toward back of mouth; /uw/, /uh/, /ah/, /ao/, /aa/, /ow/ are back vowels.
     Round: articulation involving rounding of the lips; only /uw/, /ow/, /ao/, and /uh/ are rounded in English. However, /uh/ may take an unrounded form.
     * Adapted from “Language” by C.E. Cairns and F. Williams in Normal Aspects of Speech, Hearing, and Language, edited by Minifie, Hixon, and Williams, 1973, p. 424, as printed in Daniloff p. 51.
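One way to see how binary features pick out classes of sounds is to store each feature as the set of vowels that carry it and combine the sets. The sketch below is illustrative only: the names `FEATURES`, `ALL_VOWELS`, and `select` are assumptions, and the vowel sets are limited to those listed on this slide (the slide itself says its Low list is partial).

```python
# Vowel sets copied from the High, Low, Back, and Round rows of the slide.
FEATURES = {
    "High": {"iy", "ih", "uh", "uw"},
    "Low": {"ae", "ao", "aa"},
    "Back": {"uw", "uh", "ah", "ao", "aa", "ow"},
    "Round": {"uw", "ow", "ao", "uh"},
}

# Only the vowels mentioned in those four rows; not the full English inventory.
ALL_VOWELS = set().union(*FEATURES.values())


def select(plus=(), minus=()):
    """Return the vowels that have every feature in `plus` and none in `minus`."""
    result = set(ALL_VOWELS)
    for feature in plus:
        result &= FEATURES[feature]
    for feature in minus:
        result -= FEATURES[feature]
    return result


print(sorted(select(plus=["High", "Back"])))           # ['uh', 'uw']
print(sorted(select(plus=["Back"], minus=["Round"])))  # ['aa', 'ah']
```

Because the universe is restricted to the vowels named on the slide, a negative query such as [–High] only ranges over those nine symbols.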
  14. 14. Phonetics: More Distinctive Phonetic Features *
     Feature: Description
     Sonorant: “resonant quality” of a sound; vowels are +sonorant, stops and fricatives are –sonorant. Nasals are also sonorant.
     Syllabic: is the phoneme the main sound in a syllable? Vowels are +syllabic; stops are usually –syllabic, but there are syllabic nasals and liquids.
     Tense: tense vowels are longer, more fully articulated, and more “distinct,” e.g. /iy ey uw ow aa/; lax vowels are less so, e.g. /ih eh uh ah/.
     Aspirated: produced without a constriction in the vocal tract, but also without voicing (/h/).
     Glottalized: produced with aperiodic or extremely low-frequency vibrations of the vocal cords.
     Diphthong: a single phoneme composed of two or more other phonemes in sequence (/ay/, /oy/, /ey/, /aw/, /ow/).
     * from Schane, pp. 26-32
  15. 15. Phonetics: Distinctive Phonetic Features
     Physiological Features:
     • Manner: stop /p/, fricative /s/, affricate /ch/, liquid /l/, /r/, glide /y/, /w/, nasal /m/, vowel /ah/, aspiration /h/
     • Place: bilabial /p/, labiodental /f/, dental /th/, alveolar /t/, palato-alveolar /r/, palatal /sh/, velar /k/, glottal /h/, front /iy/, mid /ah/, back /aa/ (can combine mid + back)
     • Height: high /iy/, mid-high /ih/, mid /ax/, mid-low /eh/, low /aa/; or high /iy/, mid /eh/, low /aa/ (3 values, plus tense/lax)
     • Tenseness, Nasality, Rounding: same as previous descriptions
  16. 16. Phonetics: Distinctive Feature Relationships: Vowels *
                Front                   Back
                Unrounded   Rounded     Unrounded   Rounded
     High       i (iy)      ü           i (ix)      u (uw)
     Mid        e (eh)      ö           ^ (ah)      o (ow)
     Low        æ (ae)      œ           a (aa)      ɔ (ao)

                Front, –Round     Back, –Round       Back, +Round
                Tense   Lax       Tense   Lax        Tense   Lax
     High       iy      ih                ix         uw      uh
     Mid        ey      eh                ah, ax †   ow
     Low                ae        aa                 ao
     * from Schane, pp. 12-13.
     † /ax/ is slightly more centralized than /ah/, and shorter in duration.
  17. 17. Phonetics: Distinctive Phonetic Features: The Case of /ae/
     /ae/ is classified in the preceding table as “lax”, but we have been considering it as “tense”.
     One rule for differentiating tense/lax: a lax vowel can never be a word-final stressed vowel.
     • /iy/ can be word-final: “be” /b iy/, “tea” /t iy/.
     • /ih/ cannot be word-final in a one-syllable word: /b ih/, /t ih/.
     • /ah/ can be word-final, but only if unstressed.
     According to this rule, both /eh/ and /ae/ are lax, because they cannot be word-final stressed vowels. In this case, the tense vowel in contrast to /eh/ is /ey/.
     However, /ae/ is long in duration (e.g. Forgie and Forgie (1959) and Peterson and Lehiste (1960)), making it acoustically more similar to a tense vowel.
     For spectrogram reading, we’re more concerned with acoustics, so we’ll call /ae/ a tense vowel, although others may call it lax.
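The word-final rule above can be tried out mechanically on a pronunciation lexicon. The following is a minimal sketch under stated assumptions: entries follow the CMU dictionary convention of a word followed by uppercase phonemes, with a stress digit on each vowel where 1 marks primary stress; the four sample entries are hand-written for illustration; and the names `ends_stressed` and `looks_lax` are hypothetical.

```python
# Hand-written, CMU-dictionary-style entries used only for illustration.
SAMPLE_LINES = [
    "BE  B IY1",
    "TEA  T IY1",
    "CAT  K AE1 T",
    "BAT  B AE1 T",
]


def ends_stressed(vowel, lines):
    """Return True if some entry ends in `vowel` carrying primary stress."""
    target = vowel.upper() + "1"
    for line in lines:
        parts = line.split()
        # parts[0] is the word; the last token is the final phoneme.
        if len(parts) >= 2 and parts[-1] == target:
            return True
    return False


def looks_lax(vowel, lines):
    """Apply the rule: a vowel never found word-final and stressed is treated as lax."""
    return not ends_stressed(vowel, lines)


print(looks_lax("iy", SAMPLE_LINES))  # False: "be" and "tea" end in stressed /iy/
print(looks_lax("ae", SAMPLE_LINES))  # True: no sample word ends in stressed /ae/
```

With a lexicon this small, absence of a word-final example says very little; the rule only becomes informative over a full dictionary, and even there loanwords and interjections can produce exceptions.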
  18. 18. Phonetics: Distinctive Phonetic Features: The Case of /ae/
     Looking at 130,000 words in the CMU dictionary (counts of words ending in each vowel or diphthong):
     PHN    CNT     PCNT      EXAMPLES
     /iy/   12945   0.10002
     /ih/   15      0.00012   “chui”, “des”, “kiwani”, “lui”, “moishe”, “pih”, “to”
     /eh/   30      0.00023   “bienvenue”, “des”, “eh”, “moshe”, “yahweh”, “zeh”
     /ae/   5       0.00004   “dhaka”, “lashua”, “losoya”, “pah”, “yeah”
     /uw/   714     0.00552
     /uh/   2       0.00002   “l’heureux”, “milieu”
     /ah/   6413    0.04955
     /aa/   170     0.00131
     /ao/   243     0.00188
     /ey/   962     0.00743
     /ay/   379     0.00293
     /oy/   167     0.00129
     /yu/   171     0.00132
     /aw/   226     0.00175
     /ow/   5137    0.03969
     Total          0.21280   21% of words end in vowel/diphthong
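A table like the one above can be produced by tallying the final phoneme of every dictionary entry. The sketch below is one possible way to do that, not the script used for the lecture. It assumes CMU-dictionary-style lines (word, then uppercase phonemes with stress digits on vowels), maps symbols to the slide's lowercase notation by stripping the digit, and runs on four hand-written entries rather than the real dictionary. The names `VOWELS` and `final_vowel_stats` are hypothetical.

```python
from collections import Counter

# Vowel and diphthong symbols as written in the slide's table.
VOWELS = {"iy", "ih", "eh", "ae", "uw", "uh", "ah", "aa", "ao",
          "ey", "ay", "oy", "yu", "aw", "ow"}

# Hand-written, CMU-dictionary-style entries used only for illustration.
SAMPLE_LINES = [
    "BE  B IY1",
    "TEA  T IY1",
    "CAT  K AE1 T",
    "BAT  B AE1 T",
]


def final_vowel_stats(lines):
    """Return ({vowel: (count, fraction_of_all_words)}, total_words)."""
    counts = Counter()
    total = 0
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        total += 1
        # Drop the stress digit and lowercase to match the slide's symbols.
        last = parts[-1].rstrip("012").lower()
        if last in VOWELS:
            counts[last] += 1
    stats = {v: (n, n / total) for v, n in counts.items()}
    return stats, total


stats, total = final_vowel_stats(SAMPLE_LINES)
print(total)  # 4
print(stats)  # {'iy': (2, 0.5)}
```

The PCNT column corresponds to the second element of each tuple: the count divided by the total number of words, not by the number of vowel-final words.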
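  19. 19. Phonetics: Distinctive Feature Relationships: Vowels
     [Figure: vowel chart with horizontal axis Front, Central, Back and vertical axis High, Mid, Low. Vowels and diphthongs plotted: iy, ih, eh, ae, ah, aa, ao, uh, uw, ix, ax, ju, ey, ay, aw, ow, oy.]
     From Ladefoged, pp. 38, 81, 218, with correction to /aw/.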
  20. 20. Phonetics: Distinctive Feature Relationships: Consonants (from Olive, p. 28 and Daniloff, p. 56)
     Place columns, left to right: bilabial, labio-dental, dental, alveolar, palato-alveolar, palatal, velar, glottal. Phonemes in each row are listed in column order.
     Manner       Voicing   Phonemes
     stops        +voice    b d g
                  -voice    p t k
     fricatives   +voice    v dh z zh
                  -voice    f th s sh
                  glottal   h
     affricates   +voice    jh
                  -voice    ch
     nasals       +voice    m n ng
     glides       +voice    w y (w)
     retroflex    +voice    r
     lateral      +voice    l
     Row groups: obstruent (stops, fricatives, affricates); approximant (glides, retroflex, lateral).
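The +voice and -voice rows of the stops, fricatives, and affricates line up position by position, so each voiced obstruent has a voiceless counterpart at the same place of articulation. A small sketch of that pairing follows; the names `ROWS` and `toggle_voicing` are assumptions for illustration, and /h/ is left out because the table gives it no partner.

```python
# (+voice row, -voice row) for each obstruent manner, in the table's column order.
ROWS = {
    "stop": (["b", "d", "g"], ["p", "t", "k"]),
    "fricative": (["v", "dh", "z", "zh"], ["f", "th", "s", "sh"]),
    "affricate": (["jh"], ["ch"]),
}

VOICED_TO_VOICELESS = {
    voiced: voiceless
    for voiced_row, voiceless_row in ROWS.values()
    for voiced, voiceless in zip(voiced_row, voiceless_row)
}
VOICELESS_TO_VOICED = {v: k for k, v in VOICED_TO_VOICELESS.items()}


def toggle_voicing(phoneme):
    """Return the phoneme's voicing counterpart, or None if the table lists none."""
    if phoneme in VOICED_TO_VOICELESS:
        return VOICED_TO_VOICELESS[phoneme]
    return VOICELESS_TO_VOICED.get(phoneme)


print(toggle_voicing("z"))   # s
print(toggle_voicing("ch"))  # jh
print(toggle_voicing("m"))   # None: nasals appear only in a +voice row
```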
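  21. 21. Phonetics: Distinctive Feature Relationships: Consonants (from Ladefoged, p. 44)
     Place columns, left to right: Labial, Coronal (+anterior, -anterior), Dorsal. Phonemes in each row are listed in column order.
     Manner        Features             Phonemes
     stop          -sibilant, +nasal    m n ng
                   -sibilant, -nasal    p b, t d, k g
                   +sibilant            ch jh
     fricative     +sibilant            s z, sh zh
                   -sibilant            f v, th dh
     approximant   -lateral             w r y
                   +lateral             l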
  22. 22. Approximants: Terminology
     • “Approximants” are NOT the same as “Semi-Vowels” (although Rabiner states they are the same…). American English /r/ is debatable, but we’ll exclude it from the Semi-Vowels for consistency. (Ladefoged p. 229)
     • Approximants can be divided into two groups, Liquids and Glides: Liquid = {/l/, /r/}, Glide = {/w/, /y/}. (Again, Rabiner confuses things by mixing up these sets.)
     • Lateral = {/l/}
     • Retroflex = {/r/, /er/, /axr/}. (In some cases, /er/ is considered a retroflex but /r/ isn’t; we’ll keep things simple by calling /r/ a retroflex.)
     • Central Approximants = {/r/, /w/, /y/}, Lateral Approximant = {/l/}
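The set definitions on this slide translate directly into Python sets, which makes it easy to check that the two ways of dividing the approximants cover the same phonemes. This is an illustrative sketch; the names `CLASSES`, `APPROXIMANTS`, and `classes_of` are assumptions.

```python
# Sets copied from the slide's definitions.
CLASSES = {
    "liquid": {"l", "r"},
    "glide": {"w", "y"},
    "lateral": {"l"},
    "retroflex": {"r", "er", "axr"},
    "central approximant": {"r", "w", "y"},
    "lateral approximant": {"l"},
}

# Approximants are the liquids and glides taken together.
APPROXIMANTS = CLASSES["liquid"] | CLASSES["glide"]


def classes_of(phoneme):
    """Return the sorted names of every class that contains the phoneme."""
    return sorted(name for name, members in CLASSES.items() if phoneme in members)


# The central/lateral split covers exactly the same phonemes as liquid/glide.
print(APPROXIMANTS == CLASSES["central approximant"] | CLASSES["lateral approximant"])  # True
print(classes_of("r"))   # ['central approximant', 'liquid', 'retroflex']
print(classes_of("er"))  # ['retroflex']
```

Note that /er/ and /axr/ fall under Retroflex but not under Approximant in this encoding, which matches the slide's sets as written.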
  23. 23. Approximants: Terminology
     Approximant
     • Semi-Vowel / Glide: /w/, /y/ (central approximants)
     • Liquid
       • Retroflex: /r, er, axr/ (central approximants)
       • Lateral: /l/ (lateral approximant)