Frequency Effects on Perceptual Compensation for Coarticulation

Alan C. L. Yu¹, Ed King¹, Morgan Sonderegger²
¹ Phonology Laboratory, Department of Linguistics, University of Chicago
² Department of Computer Science, University of Chicago

Abstract

Errors in compensating perceptually for effects of coarticulation in speech have been hypothesized as one of the major sources of sound change in language. Little research has elucidated the conditions under which such errors might take place. Using the paradigm of selective adaptation, this paper reports the results of a series of experiments testing for the effect of frequency on the likelihood of perceptual compensation for coarticulation by listeners. The results suggest that perceptual compensation might be ameliorated (which might result in hypocorrection) or exaggerated (i.e. hypercorrection) depending on the relative frequency of the categories that are being perceived in their specific coarticulated contexts.

Index Terms: perceptual compensation, sound change, selective adaptation.

1. Introduction

A fundamental property of speech is its tremendous variability. Much research has shown that human listeners take such variability into account in speech perception [1, 2, 3, 4, 5, 6]. Beddor and colleagues [7], for example, found that adult English and Shona speakers perceptually compensate for the coarticulatory anticipatory raising of /a/ in C/i/ context and the anticipatory lowering of /e/ in C/a/ context. Both English and Shona listeners report hearing more /a/ in the context of a following /i/ than in the context of a following /a/.

Many scholars have hypothesized that a primary source of systematic sound changes in language comes from errors in perceiving the intended speech signal [8, 9]. That is, errors in listeners' classification of speakers' intended pronunciation, if propagated, might result in systematic changes in the sound systems of all speakers-listeners within the speech community. Ohala [8], in particular, argues that hypocorrective sound change (e.g., assimilation and vowel harmony) obtains when a contextual effect is misinterpreted as an intrinsic property of the segment (i.e. an increase in false positives in sound categorization). For example, an ambiguous /a/ token might be erroneously categorized as /e/ in the context of a following /i/ if the listeners fail to take into account the anticipatory raising effect of /i/. If enough /a/ exemplars are misidentified as /e/, a pattern of vowel harmony might emerge. That is, the language will show a prevalence of mid vowels before /i/ and low vowels before /a/. On the other hand, a hypercorrective sound change (e.g., dissimilation) emerges when the listener erroneously attributes intended phonetic properties to contextual variation (i.e. an increase in false negatives in sound identification). In this case, an ambiguous /e/ might be misclassified as /a/ in the context of a following /i/. If enough /e/ exemplars are misidentified as /a/ when followed by /i/, a dissimilatory pattern of only low vowels before high vowels and non-low vowels before low vowels might emerge.

While much effort has gone into identifying the likely sources of such errors [10, 11, 8, 12, 9], little is known about the source of regularity in listener misperception that leads to the systematic nature of sound change. That is, why would random and haphazard misperception in an individual's percept lead to systematic reorganization of the sound system within the individual and within the speech community? The present study demonstrates that the likelihood of listeners adjusting their categorization pattern contextually (i.e. perceptual compensation) may be affected by the frequencies of the sound categories occurring in the specific contexts. In particular, the present study expands on Beddor et al.'s work [7] on the perceptual compensation for vowel-to-vowel coarticulation in English, showing that the way English listeners compensate perceptually for the effect of regressive coarticulation from a following vowel (either /i/ or /a/) depends on the relative frequency of the coarticulated vowels (i.e. the relative frequency of /a/ and /e/ appearing before /i/ or /a/).

The idea that category frequency information affects speech perception is not new. Research on selective adaptation has shown that repeated exposure to a particular speech sound, say /s/, would shift the identification of ambiguous sounds, say sounds that are half-way between /s/ and /ʃ/, away from the repeatedly presented sound towards the alternative [13, 14, 15]. In perceptual learning studies, repeated exposure to an ambiguous sound, say a /s/-/f/ mixture, in /s/-biased lexical contexts induces retuned perception such that subsequent sounds are heard as /s/ even in lexically neutral contexts [16, 17]. The experiments reported below extend Beddor et al.'s [7] findings by presenting three groups of participants with the same training stimuli but varying the frequency with which they hear each token. The purpose of the present study is to demonstrate that contextually-sensitive category frequency information can induce selective adaptation effects in perceptual compensation and to examine the implications of such effects on theories of sound change.

2. Methods

2.1. Stimuli

The training stimuli consisted of CV1CV2 syllables where C is one of /p, t, k/, V1 is either /a/ or /e/, and V2 is either /a/ or /i/. To avoid any vowel-to-vowel coarticulatory effect in the training stimuli, a phonetically-trained native English speaker (second author) produced each syllable of the training stimuli in isolation (/pa/, /pe/, /pi/, /ta/, /te/, /ti/, /ka/, /ke/, /ki/). The training disyllabic stimuli were assembled by splicing together the appropriate syllables and were resynthesized with a consistent intensity and pitch profile to avoid a potential confound of stress. The test stimuli consisted of two series of /pV1pV2/ disyllables where V2 is either /a/ or /i/. The first syllable, pV1, is a 9-step continuum resynthesized in PRAAT by varying F1, F2, and F3
in equidistant steps from the above-mentioned speaker's /pa/ and /pe/ syllables. The original /pa/ and /pe/ syllables serve as the end points of the 9-step continuum.

2.2. Participants and procedure

The experiment consists of two parts: exposure and testing. Subjects were assigned randomly to three exposure conditions. One group was exposed to CeCi tokens four times more often than to CeCa tokens and to CaCa tokens four times more often than to CaCi ones (the HYPER condition). The second group was exposed to CeCa tokens four times more often than to CeCi tokens and to CaCi tokens four times more often than to CaCa ones (the HYPO condition). The final group was exposed to an equal number of /e/ and /a/ vowels preceding /i/ and /a/ (the BALANCED condition). See Table 1 for a summary of the frequency distribution of exposure stimuli. The exposure stimuli were presented over headphones automatically in random order in E-Prime in a sound-proof booth. Subjects performed a phoneme monitoring task during the exposure phase in which they were asked to press a response button when the word contained a medial /t/. Each subject heard 360 exposure tokens three times; a short break followed each block of 360 tokens. A total of forty-eight students at the University of Chicago, all native speakers of American English, participated in the experiment for course credit or a nominal fee. Eleven subjects took part in the HYPO condition; sixteen subjects each participated in the HYPER condition and the BALANCED condition.

During the testing phase, subjects performed a 2-alternative forced-choice task. Subjects listened to a randomized set of test stimuli and were asked to decide whether the first vowel sounded like /e/ or /a/.

Table 1: Stimuli presentation frequency during the exposure phase. C = /p, t, k/.

Type    BALANCED    HYPER    HYPO
CeCi    90          144      36
CeCa    90          36       144
CaCi    90          36       144
CaCa    90          144      36

3. Analysis

Subjects' responses (i.e. subjects' /a/ response rates) were modeled using a mixed-effects logistic regression. The model contains four fixed variables: TRIAL (1-180), CONTINUUM STEP (1-9), EXPOSURE CONDITION (balanced, hyper, hypo) and VOCALIC CONTEXT (/a/ vs. /i/). The model also includes three two-way interactions: VOCALIC CONTEXT x STEP, STEP x CONDITION, and VOCALIC CONTEXT x CONDITION. In addition, the model includes by-subject random slopes for TRIAL. A likelihood ratio test comparing a model with VOCALIC CONTEXT x STEP x CONDITION as a three-way interaction term and one without it shows that the added three-way interaction does not significantly improve model log-likelihood (χ² = 3.253, df = 2, Pr(> χ²) = 0.1966). Table 2 summarizes the parameter estimate β for all fixed effects in the model, as well as the estimate of their standard error SE(β), the associated Wald's z-score, and the significance level. To eliminate collinearity, scalar variables were centered, while the categorical variables were sum-coded.

Table 2: Estimates for all predictors in the analysis of listener response in the identification task.

Predictor                                  Coef. β    SE(β)     z         p
Intercept                                  -0.0096    0.0674    -0.14     0.8867
TRIAL                                      -0.0008    0.0009    -0.87     0.3825
STEP                                       -0.9260    0.0185    -49.98    < 0.001 ***
VOCALIC CONTEXT = a                        -0.2690    0.0316    -8.51     < 0.001 ***
CONDITION = hyper                          -0.0594    0.0934    -0.64     0.5251
CONDITION = hypo                            0.0126    0.0950     0.13     0.8942
STEP x VOCALIC CONTEXT = a                  0.0372    0.0170     2.19     < 0.05 *
STEP x CONDITION = hyper                    0.0481    0.0247     1.95     0.0514
STEP x CONDITION = hypo                    -0.2631    0.0296    -8.89     < 0.001 ***
VOCALIC CONTEXT = a x CONDITION = hyper     0.0311    0.0432     0.72     0.4717
VOCALIC CONTEXT = a x CONDITION = hypo     -0.4146    0.0477    -8.70     < 0.001 ***

Consistent with Beddor et al.'s findings, continuum step and vocalic context are both significant predictors of /a/ response. That is, listeners reported hearing less and less /a/ from the /a/-end of the continuum to the /e/-end of the continuum, and they heard more /a/ when the target vowel was followed by /i/ than when it was followed by /a/. Specifically, the odds of hearing /a/ before /i/ are 1.3 times those before /a/. The significant interaction between STEP and VOCALIC CONTEXT suggests that the vocalic context effect differs depending on where the test stimulus is along the /a/-/e/ continuum. As illustrated in Figure 1, the effect of vocalic context is largest around steps 4-6, while identification is close to ceiling at the two endpoints of the continuum regardless of vocalic context.

Figure 1: Interaction between VOCALIC CONTEXT and STEP. The predictor variables were back-transformed to their original scales in the figure. [Plot "Vocalic Context x Step": probability of 'a' (0.0 to 1.0) by step, with separate lines for v2 = i and v2 = a.]

Of particular interest here is the significant interaction between the exposure condition and vocalic context. Figure 2 illustrates this interaction clearly; the effect of vocalic context on /a/ response is influenced by the nature of the exposure data. When the exposure data contain more CaCa and CeCi tokens than CaCi and CeCa tokens (i.e. the hyper condition), listeners report hearing more /a/ in the /i/ context than in the /a/ context, compared to the response rate after the balanced condition, where the frequencies of CaCa, CeCi, CaCi, and CeCa tokens are equal. On the other hand, in the hypo condition, where listeners heard more CaCi and CeCa tokens than CaCa and CeCi ones, listeners reported hearing less /a/ in the /i/ context than in the /a/ context, the opposite of what is observed in both the balanced and hyper conditions. The model also shows a significant interaction between CONDITION and STEP. As illustrated in Figure 3, the slope of the identification function is steepest after the hyper condition and shallowest in the hypo condition.

Figure 2: Interaction between VOCALIC CONTEXT and CONDITION. [Plot "Vocalic Context x Condition": probability of 'a' (0.0 to 1.0) by condition (Balanced, Hyper, Hypo), with separate lines for v2 = i and v2 = a.]

Figure 3: Interaction between CONDITION and STEP. [Plot "Condition x Step": probability of 'a' (0.0 to 1.0) by step, with separate lines for the Hypo, Balanced, and Hyper conditions.]

4. Discussion and conclusion

The present study shows that the classification of vowels in different prevocalic contexts is influenced by the relative frequency distribution of the relevant vowels in specific contexts. For example, when /a/ frequently occurs before /a/, listeners are less likely to identify future instances of ambiguous /a/-/e/ vowels as /a/ in the same context; listeners would report hearing more /e/ before /a/ if CaCa exemplars outnumber CeCa exemplars. Likewise, when /a/ occurs frequently before /i/, listeners would reduce their rate of identification of /a/ in the same context; listeners would report hearing more /e/ before /i/ if CaCi tokens are more prevalent than CeCi tokens. These results suggest that listeners exhibit selective adaptation when frequency information of the target sounds varies in a context-specific fashion. That is, repeated exposure to an adaptor (the more frequent variant) results in heightened identification of the alternative. This finding has serious implications for models of sound change that afford a prominent role to listener misperception to account for sources of variation that lead to change.

To begin with, subjects in the hyper exposure condition exhibit what can be interpreted as hypercorrective behavior. That is, speech tokens that were classified as /a/ in the balanced condition were classified as /e/ in the hyper condition when V2 = /a/; likewise, sounds that were classified as /e/ in the balanced condition were treated as /a/ in the hyper condition when V2 = /i/. If this type of hypercorrective behavior persists, the pseudo-lexicon of the made-up language our subjects experienced would gradually develop a prevalence of disyllabic "words" that do not allow two low vowels or two non-low vowels in consecutive syllables. This would represent a state of vocalic height dissimilation, not unlike the pattern found in the Vanuatu languages [18]. On the other hand, listeners in the hypo exposure condition exhibit what could be interpreted as hypocorrective behavior. That is, tokens that were classified as /e/ in the balanced condition were classified as /a/ in the hypo condition when V2 = /a/; likewise, vowels heard as /a/ in the balanced condition were heard as /e/ in the hypo condition when V2 = /i/. If this type of reclassification persists, listeners in the hypo condition would develop a pseudo-lexicon where vowels in disyllabic words must agree in lowness and a state of vocalic height harmony would obtain, similar to many cases found in the Bantu languages of Africa [19].

Another ramification the present findings have for listener-misperception models of sound change concerns the role of the conditioning environment. Such models of sound change often attribute misperception to listeners failing to detect the contextual information properly and thus failing to properly normalize for the effect of context on the realization of the sound in question. Here, our findings establish that systematic "failure" of perceptual compensation takes place despite the presence of the coarticulatory source; perceptual compensation "failure" is understood here as any case in which the context-specific identification functions deviate from the canonical identification functions observed in the balanced condition. This finding echoes early findings that perceptual compensation may only be partial under certain circumstances. Taken together, these findings suggest that failure to compensate perceptually for coarticulatory influence need not be the result of not detecting the source of coarticulation. Listeners may fail to take proper account of the role that coarticulatory contexts play in speech production and perception.

It is worth pointing out in closing that selective adaptation effects have generally been attributed to adaptors fatiguing specialized linguistic feature detectors [13], which suggests that the
neural mechanism that subserves speech perception may eventually recuperate from adaptor fatigue and the selective adaptation might dissipate. There is some evidence that selective adaptation effects are temporary [20]. The lack of durativity of selective adaptation raises doubt about its implication for sound change, since sound change necessitates the longevity of the influencing factors. Additional research is underway to ascertain the longitudinal effects of selective adaptation. Such data will provide much needed information regarding the significance of selective adaptation effects on speech perception and sound change.

5. Acknowledgements

This work is partially supported by National Science Foundation Grant BCS-0949754.

6. References

[1] V. Mann, “Influence of preceding liquid on stop consonant perception,” Perception & Psychophysics, vol. 28, no. 5, pp. 407–412, 1980.
[2] V. A. Mann and B. H. Repp, “Influence of vocalic context on perception of the [ʃ]-[s] distinction,” Perception & Psychophysics, vol. 28, pp. 213–228, 1980.
[3] J. S. Pardo and C. A. Fowler, “Perceiving the causes of coarticulatory acoustic variation: consonant voicing and vowel pitch,” Perception & Psychophysics, vol. 59, no. 7, pp. 1141–1152, 1997.
[4] A. Lotto and K. Kluender, “General contrast effects in speech perception: effect of preceding liquid on stop consonant identification,” Perception & Psychophysics, vol. 60, no. 4, pp. 602–619, 1998.
[5] P. Beddor and R. A. Krakow, “Perception of coarticulatory nasalization by speakers of English and Thai: Evidence for partial compensation,” Journal of the Acoustical Society of America, vol. 106, no. 5, pp. 2868–2887, 1999.
[6] C. Fowler, “Compensation for coarticulation reflects gesture perception, not spectral contrast,” Perception & Psychophysics, vol. 68, no. 2, pp. 161–177, 2006.
[7] P. S. Beddor, J. Harnsberger, and S. Lindemann, “Language-specific patterns of vowel-to-vowel coarticulation: acoustic structures and their perceptual correlates,” Journal of Phonetics, vol. 30, pp. 591–627, 2002.
[8] J. Ohala, “The phonetics of sound change,” in Historical Linguistics: Problems and Perspectives, C. Jones, Ed. London: Longman Academic, 1993, pp. 237–278.
[9] J. Blevins, Evolutionary Phonology: the emergence of sound patterns. Cambridge: Cambridge University Press, 2004.
[10] J. Ohala, Sound change is drawn from a pool of synchronic variation. Berlin: Mouton de Gruyter, 1989, pp. 173–198.
[11] J. Ohala, “The phonetics and phonology of aspects of assimilation,” in Papers in Laboratory Phonology I: Between the Grammar and the Physics of Speech, J. Kingston and M. Beckman, Eds. Cambridge: Cambridge University Press, 1990, vol. 1, pp. 258–275.
[12] J. Ohala, “Towards a universal, phonetically-based, theory of vowel harmony,” ICSLP, Yokohama, vol. 3, pp. 491–494, 1994.
[13] P. Eimas and J. Corbit, “Selective adaptation of linguistic feature detectors,” Cognitive Psychology, vol. 4, pp. 99–109, 1973.
[14] P. D. Eimas and J. L. Miller, “Effects of selective adaptation of speech and visual patterns: Evidence for feature detectors,” in Perception and Experience, H. L. Pick and R. D. Walk, Eds. N.J.: Plenum, 1978.
[15] A. G. Samuel, “Red herring detectors and speech perception: In defense of selective adaptation,” Cognitive Psychology, vol. 18, pp. 452–499, 1986.
[16] D. Norris, J. M. McQueen, and A. Cutler, “Perceptual learning in speech,” Cognitive Psychology, vol. 47, no. 2, pp. 204–238, 2003.
[17] A. G. Samuel and T. Kraljic, “Perceptual learning for speech,” Attention, Perception, & Psychophysics, vol. 71, no. 6, pp. 1207–1218, 2009.
[18] J. Lynch, “Low vowel dissimilation in Vanuatu languages,” Oceanic Linguistics, vol. 42, no. 2, pp. 359–406, 2003.
[19] F. B. Parkinson, “The representation of vowel height in phonology,” PhD dissertation, Ohio State University, 1996.
[20] J. Vroomen, S. van Linden, M. Keetels, B. de Gelder, and P. Bertelson, “Selective adaptation and recalibration of auditory speech by lipread information: dissipation,” Speech Communication, vol. 44, pp. 55–61, 2004.
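Several of the figures reported in Sections 2.2 and 3 can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the original analysis: the names `exposure_counts`, `block_total`, `chi2_sf_df2`, and `odds_ratio` are hypothetical, and the odds-ratio step assumes that the reported factor of 1.3 comes from exponentiating the magnitude of the VOCALIC CONTEXT coefficient in Table 2 (under a different contrast coding the relevant quantity could differ). The p-value uses the closed form of the chi-square survival function, which holds for df = 2 only.

```python
import math

# Exposure token counts per block, copied from Table 1.
exposure_counts = {
    "BALANCED": {"CeCi": 90, "CeCa": 90, "CaCi": 90, "CaCa": 90},
    "HYPER": {"CeCi": 144, "CeCa": 36, "CaCi": 36, "CaCa": 144},
    "HYPO": {"CeCi": 36, "CeCa": 144, "CaCi": 144, "CaCa": 36},
}


def block_total(condition: str) -> int:
    """Number of exposure tokens in one block for the given condition."""
    return sum(exposure_counts[condition].values())


def chi2_sf_df2(statistic: float) -> float:
    """Upper-tail probability of a chi-square variable with exactly 2 df.

    For df = 2 the survival function reduces to exp(-x / 2).
    """
    return math.exp(-statistic / 2.0)


def odds_ratio(beta: float) -> float:
    """Odds ratio implied by a logit coefficient.

    Assumption: the coefficient is read directly as a log-odds difference.
    """
    return math.exp(abs(beta))


if __name__ == "__main__":
    for condition in exposure_counts:
        print(condition, block_total(condition))
    hyper = exposure_counts["HYPER"]
    print("HYPER CeCi:CeCa ratio =", hyper["CeCi"] / hyper["CeCa"])
    print("LRT p-value =", round(chi2_sf_df2(3.253), 4))
    print("odds ratio =", round(odds_ratio(-0.2690), 1))
```

Each condition sums to the 360 tokens per block stated in Section 2.2, and the 144:36 split corresponds to the "four times more often" manipulation.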