Perception & Psychophysics, 1974, Vol. 16 (3), 513-521. Auditory and linguistic processing of cues for place of articulation by infants
  1. 1. Perception & Psychophysics, 1974, Vol. 16 (3), 513-521

     Auditory and linguistic processing of cues for place of articulation by infants*

     PETER D. EIMAS
     W. S. Hunter Laboratory of Psychology, Brown University, Providence, Rhode Island 02912

     Two- and 3-month-old infants were found to discriminate the acoustic cues for the phonetic feature of place of articulation in a categorical manner; that is, evidence for the discriminability of two synthetic speech patterns was present only when the stimuli signaled a change in the phonetic feature of place. No evidence of discriminability was found when two stimuli, separated by the same acoustic difference, signaled acoustic variations of the same phonetic feature. Discrimination of the same acoustic cues in a nonspeech context was found, in contrast, to be noncategorical or continuous. The results were discussed in terms of infants' ability to process acoustic events in either an auditory or a linguistic mode.

     *The studies reported here were supported by Grant HD 05331 from the National Institute of Child Health and Human Development, United States Public Health Service. I wish to thank Jane Carlstrom, Jean Fechheimer, and Sheila Polofsky for their capable assistance in the collection and analysis of the data. I also wish to acknowledge my indebtedness to Franklin S. Cooper for his generosity in making the facilities of the Haskins Laboratories available for the preparation of stimulus materials.

     A major conclusion of the research on the perception of speech over the past two decades is that human listeners perceive speech quite differently from the way in which they perceive nonspeech sounds. Speech perception is said to occur in a linguistic or speech mode as opposed to an auditory mode (Eimas, in press; Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967; Liberman, 1970; Studdert-Kennedy, 1974). Evidence for this conclusion comes from studies in which listeners identified and discriminated series of synthetic speech sounds that varied continuously along a single acoustic dimension. Listeners were typically quite consistent in assigning phonetic labels to the various stimuli. Moreover, and most importantly, the ability to discriminate pairs of stimuli was strongly determined by the phonetic assignments. Thus, two stimuli that were acoustic variations of the same phonetic category or feature were discriminated only slightly better than would be expected by chance, whereas two stimuli that were separated by the same acoustic difference but members of different phonetic categories were highly discriminable. This form of perception has been termed categorical and is particularly apparent for those acoustic dimensions that distinguish the stop consonants (Liberman, 1957; Liberman, Harris, Hoffman, & Griffith, 1957; Eimas, 1963; Mattingly, Liberman, Syrdal, & Halwes, 1971; Lisker & Abramson, 1970; Abramson & Lisker, 1970; Pisoni, 1973). Categorical-like perception has also been found with other consonantal distinctions (e.g., Miyawaki, Liberman, Fujimura, Strange, & Jenkins, 1972) and, to a considerably lesser extent, with vowel stimuli under some conditions (Pisoni, 1973; Stevens, Liberman, Studdert-Kennedy, & Ohman, 1969; Fujisaki & Kawashima, 1969).

     Although these findings are of considerable importance in and of themselves, they are particularly revealing when the discriminability functions for the synthetic speech stimuli are compared with the discriminability functions for the same acoustic dimensions when presented in a nonspeech context. Mattingly et al (1971) showed that the discontinuities in the discriminability functions, so apparent when the synthetic sounds were perceived as speech, were absent when the stimuli were not perceived as speech. More specifically, they showed that variations in the starting frequency and direction of the second-formant transition (which are sufficient for the perceived distinctions among the voiced stop consonants [b, d, g] and their voiceless counterparts [p, t, k]) were perceived categorically and continuously in speech and nonspeech contexts, respectively. The latter condition was arranged by presenting only the second-formant transitions, which are heard as bird-like chirps. Similar effects have also been obtained by Miyawaki et al (1972). In addition, Eimas, Cooper, and Corbit (1973) have shown the acoustic information signaling the onset of voicing is perceived differently when presented in a speech as opposed to a nonspeech context.

     Although it is certainly the case that all sounds, speech or nonspeech, must undergo some common auditory analysis, the sounds of speech would appear to undergo some additional, specialized processing that permits the extraction of distinctive phonetic features. It is the process of feature extraction that is categorical in that it reduces the continuous variation in acoustic-auditory information to a set of discrete feature values. The actual phonetic experience, that is, the perception of a particular phone, involves additional higher processes, including most likely
  2. 2. matching a set of features with its appropriate label.

     Additional evidence favoring a special speech processor comes from studies of dichotic listening. A number of researchers have shown that speech signals are better perceived by the left hemisphere (right ear) whereas nonspeech signals are better processed by the right hemisphere (left ear) (Kimura, 1961, 1964; Studdert-Kennedy & Shankweiler, 1970). Electrophysiological studies of neural activity have also supported this conclusion in adults (Wood, Goff, & Day, 1971).

     Of particular interest have been the recent electrophysiological (Molfese, 1972) and behavioral findings (Eimas, Siqueland, Jusczyk, & Vigorito, 1971) that the specialized speech processes may be operative as early as the first few weeks of life. Eimas et al (1971) investigated the infant's ability to discriminate small differences in voice onset time. Voice onset time, which is defined by Lisker and Abramson (1964) in articulatory terms as the time between the release burst and the onset of laryngeal pulsing or voicing, is a sufficient cue for distinguishing between the voiced and voiceless stop consonants [b] vs [p], [d] vs [t], and [g] vs [k]. They found that infants 1 and 4 months of age were better able to discriminate a 20-msec difference in voice onset time when the two stimuli to be discriminated were from different adult phonetic categories, [b] and [p], than when they were from the same adult phonetic category, [b] or [p]. As was true for adult listeners (Abramson & Lisker, 1970), acoustic variations of the same phonetic feature were not discriminable. In later studies, Eimas (in press) was able to replicate these results with the apical stops, [d, t], and to find some evidence for the categorical perception by young infants of a voicing distinction not found in English.

     Additional research has shown that infants between the ages of 2 and 6 months are sensitive to differences in place of articulation, acoustically represented by variations in the second- and third-formant transitions (Moffitt, 1971; Morse, 1972). Although Morse has contended that infants process place distinctions in a linguistically relevant manner, the supporting evidence was not compelling. It is toward a resolution of this issue, that is, the manner in which place information is perceived by infants, that the present studies were directed. Another purpose was to determine whether the mechanisms that are necessary to decode information signaling place of articulation are, as is apparently the case with adult listeners, the unique property of the speech processor.

     EXPERIMENT I

     The purpose of Experiment I was to determine whether the perception of place distinctions by young infants was categorical. The stimuli were synthetic speech sounds that varied in the starting frequency and direction of the second- and third-formant transitions. These variations correspond to the variations in articulatory movements necessary for the production of bilabial stops [b, p] as opposed to apical [d, t] or velar [g, k] stops. Discrimination of a particular acoustic difference was measured under two conditions: (1) when the acoustic variation signifies a change in place of articulation as measured by adult identification functions, and (2) when the variation represents acoustic variants of the same place of articulation. Evidence of greater discriminability in the first condition permits the inference that infants are capable of perceiving information related to place distinctions in a categorical fashion and hence in a linguistically relevant manner.

     Method

     Procedure. The procedure was a modification of the methodology developed by Siqueland and DeLucia (1969). Each infant was tested individually in a small sound-shielded room, moderately illuminated by the rear-projected image of a colorful object. The visual image, in addition to providing illumination, tended to maintain the infant's orientation to the speaker, which was situated just above the screen, about 45 cm from the infant's head. The infant was placed in a reclining seat and presented with a blind nipple. The nipple was held gently by one of the Es, who listened to music over a set of headphones. The second E monitored the recording apparatus and controlled the presentation of the speech patterns. During the first 2 or 3 min, the high-amplitude criterion and the baseline rate of high-amplitude sucking were established. The amplitude criterion was defined as the level which yielded a baseline sucking rate of approximately 20 or 30 responses/min. By permitting the amplitude criterion to vary from infant to infant, it was possible to reduce the variability associated with baseline rates of sucking as well as to establish a baseline rate for each infant such that changes in either direction could occur without serious contamination by either floor or ceiling effects. Immediately after obtaining the baseline rate, the first speech sound was made contingent upon high-amplitude sucking. If the time between each high-amplitude sucking response was at least 1 sec, then each such response produced one presentation of the stimulus pattern, 300 msec in duration plus 700 msec of silence. However, if the infant produced a burst of sucking responses with interresponse times less than 1 sec, as was typical, then each response did not produce one presentation of the stimulus. Rather, each response recycled the timing apparatus and the 1-sec on period began again. This limitation in the presentation of auditory feedback was imposed to prevent the occurrence of the reinforcing sound longer than 1 sec after the last response.

     The presentation of an auditory stimulus in this manner typically results in an increase in the infant's rate of high-amplitude sucking, compared with the baseline rate. After several minutes of this contingency (the time varies from infant to infant from about 4 or 5 min to over 15 min), the infant usually shows a decrement in performance, presumably as a result of a diminution in the reinforcing quality of the once novel speech stimulus. When the rate of sucking diminished by 20% or more for 2 consecutive minutes compared with the minute immediately preceding the first minute of decrement, the feedback stimulus was changed without interruption by switching the channel selector on the tape deck. The second synthetic speech pattern was presented, likewise contingent upon high-amplitude sucking, for 4 min, after which the testing session was terminated.

     One group of infants, Group D, received two stimuli, one of which signaled the adult phonetic category [d] and the other the
  3. 3. phonetic category [g]. The second group of infants, Group S, likewise received two stimuli, but these sound patterns were acoustic variations of the same adult phonetic category [d]. The order in which the two stimuli were presented was counterbalanced across infants in both groups. The infants in the control condition, Group C, were randomly assigned one of the four stimuli administered to Groups D and S. At the point at which a change in stimulation would have occurred for the infants of Groups S and D, the records of the control infants were marked, the channel selector control was switched, and the session continued for 4 min. Thus, the control records were completely comparable with those of the experimental infants. Given that infants are highly responsive to novel stimuli, the presentation of a new, discriminable stimulus would be expected to result in a different rate of sucking when compared with the sucking rates for infants who did not receive a change in stimulation. Thus, either an increase in response rate associated with a change in stimulation greater than that shown by the control infants or a decrease less than that of the controls is taken as inferential evidence that the infants perceived the two stimuli as different.

     Stimuli. The stimuli were four synthetic speech sounds prepared by means of the parallel resonance synthesizer at the Haskins Laboratories by Pisoni (1971). Each stimulus was 300 msec in duration, with the initial 40 msec simulating a period of closure voicing by means of a low-amplitude first formant centered at 150 Hz. The next 40 msec was a period of transition during which all three formants moved from their starting frequencies to their terminal steady-state values of 743, 1,620, and 2,862 Hz for the first, second, and third formant, respectively. These steady-state formant values are appropriate for the American English vowel [ae]. The steady-state portion of each pattern was 220 msec in duration. The starting values for the second and third formants are given in Table 1 for each of the two stimuli presented to Groups D and S, along with their phonetic identification as determined by adult listeners (Pisoni, 1973). All four stimuli had the same first-formant starting value, 150 Hz. As may be seen in Table 1, the only differences among the stimuli were in starting frequencies and direction of the second- and third-formant transitions (and, of course, in one instance, in the phonetic identification or place of articulation value). Of importance to note is the fact that the acoustic differences between the two stimuli used for Group D were very nearly identical to the differences between the stimuli used for Group S. Thus, any differences in discriminability between the two groups cannot be attributed to an inequality in the acoustic differences between the stimuli. For additional details concerning the construction of these stimuli, the reader is referred to Pisoni (1971).

     Table 1
     Starting Frequencies (in Hz) of the Second- and Third-Formant Transitions: Experiment I

                        Stimulus 1   Stimulus 2   Difference
     Group D    F-2        1845         1996         151
                F-3        2862         2525         337
     Group S    F-2        1541         1695         154
                F-3        2862         3195         333

     Note: Adult listeners consistently identified the stimuli for Group S and Stimulus 1 for Group D as [d]. Stimulus 2 for Group D was identified as [g].

     The stimuli were recorded on high-quality magnetic tape from which continuous loops were made, with each 300-msec speech pattern separated by 700 msec of silence. There were several copies of each tape loop. The loops used for Groups D and S had one stimulus on Channel 1 and the second stimulus on Channel 2. The control tapes contained the same stimulus on both channels. This arrangement of the stimuli permitted all infants to be treated in a like manner as well as the nearly instantaneous switching between stimuli.

     Apparatus. Part of the apparatus was a blind nipple on which the infant sucked. The positive pressure generated by the infant's sucking was transduced to provide a record of all sucking responses as well as a digital record of criterional high-amplitude sucking responses by means of a HP 7202B polygraph and a Hunter digital timer. Additional equipment included a two-channel tape deck (Sony Model TC 850), a VM Model 33-1 speaker, power supply, HP 8805A preamplifier, and Lafayette 5710 event timer, arranged to provide auditory feedback when the power supply was activated by a criterion response. Each sucking response of criterion amplitude activated a power supply for 1 sec (or restarted a period of 1-sec activation), which resulted in a rapid increase in the intensity of auditory feedback stimulation from an inaudible level to one about 13 to 15 dB above the background noise level of 63 dB. Intensity readings were made with a General Radio Type 1551-C sound-level meter, B scale. The background noise was produced almost entirely by the room exhaust system and the slide projector fan.

     Subjects. The Ss were 24 2-month-old and 24 3-month-old infants from the Greater Providence, Rhode Island, area. Half of the infants at each age level were males and half were females. In order to obtain complete data on 48 infants, it was necessary to test 115 infants. The success rate of approximately 40% is typical of the satiation/release-from-satiation procedure that we have used to assess the speech processing capacities of young infants. There were no reliable differences in failures due to experimental condition, age, or sex. The reasons for failure to complete testing were many and included such factors as falling asleep (33%), crying (25%), initial failure to suck on the nipple (20%), ceasing to suck during the course of the experiment (7%), and a group of factors consisting of failure to show satiation, an extremely erratic pattern of sucking, equipment failure, and E error (15%). The majority of the infants (78%) who failed to complete the session were eliminated prior to the shift in stimulation. There were no reliable differences in the infants who were eliminated after the shift as a function of sex or experimental treatment, and 80% of these infants were eliminated for crying or falling asleep. The criterion for the former was loud crying and a failure to take the nipple, whereas the criterion for the latter was the characteristic sucking pattern associated with sleep, that is, brief bursts of very low-amplitude sucking. Most of the postshift eliminations occurred during the first or second postshift minute.

     As often as possible, the infants were assigned randomly to the three conditions. However, given the high attrition rate and the restrictions that the three groups (each with 16 infants) be counterbalanced with respect to age, sex, and order of stimulus presentation, random assignments could not always be made.

     Results and Discussion

     The mean number of sucking responses is displayed in Fig. 1 as a function of minutes and treatments. An analysis of the minute-by-minute sucking rates for the 5 min immediately prior to the shift in stimulation (or at that point in time when a shift would have occurred in the case of the control infants) revealed nonreliable differences between groups. As would be expected from the nature of the experiment, the mean response rate for the 2 min before shift (the satiation period) was significantly lower than the third minute before stimulus change (p < .01).

     Inasmuch as the measure of discriminability is based on a change in response rate correlated with a change in stimulation, and given that there were individual differences in response rates during the satiation period, the analyses of postshift performance
  4. 4. were conducted on difference scores. For each infant, the mean response rate for the 2 min immediately preceding the stimulus shift was subtracted from each of the four 1-min measures of postshift performance. An analysis of variance, Groups by Minutes, performed on these difference scores, revealed a significant groups effect, F(2,45) = 4.5, p < .025, and a significant Groups by Minutes interaction, F(6,135) = 8.4, p < .001. Group D, which showed a mean increment of 4.1 responses/min for the 4-min postshift period, differed reliably from Groups S and C (p < .025 in each instance). Groups S and C, which showed mean decrements of 6.1 and 4.8 responses/min, respectively, did not differ significantly. The significant interaction, as can be seen in Fig. 1, was primarily a function of the increase in rate of responding shown by Group D over the 4-min postshift period, whereas Groups S and C tended to show a decreasing rate of response with time.¹

     [Figure 1: three panels, Group D, Group S, and Group C; horizontal axis: time (min).]
     Fig. 1. Mean number of sucking responses as a function of time and experimental conditions. Time is measured with reference to the moment of stimulus shift, which is indicated by the dashed line. The baseline rate of sucking is indicated by the letter "B."

     The cues for place of articulation, like the cues for voice onset time, are discriminated in a categorical-like manner by young infants; that is to say, the infant's ability to discriminate place cues is determined largely by whether or not the acoustic stimuli signal the same or different values along the phonetic feature, place of articulation. Before considering the perceptual mechanisms that might be responsible for processing of speech in a linguistic mode, we wish to consider whether these mechanisms are part of the more general auditory processing system or whether they are part of the specialized speech processor.

     EXPERIMENT II

     Experiment II compared the infant's ability to discriminate acoustic cues for place when these cues are presented in a speech context, as in Experiment I, and when they are presented in a nonspeech context. The speech stimuli for this experiment were two-formant synthetic speech patterns, which varied in the starting frequency and direction of the second-formant transition only. All other characteristics of the stimuli did not vary from one pattern to another. Variations in the second-formant transition are by themselves sufficient cues to signal variations in place of articulation, and indeed, as Liberman et al (1967) have noted: "... [the second-formant transition] is probably the single most important carrier of linguistic information in the speech signal [p. 434]." Based on identification functions from adult listeners obtained by Mattingly et al (1971), pairs of synthetic speech patterns, separated by a constant acoustic difference, were selected such that the two stimuli were perceived either as different phones, [b] and [d], or as the same phone, [b]. The two different types of stimulus pairs corresponded to the Group D and Group S stimuli of Experiment I.

     In order to present the identical relevant acoustic information in a nonspeech context, additional patterns were synthesized that eliminated all acoustic features, except the second-formant transitions of the speech stimuli. These patterns, 40 msec in duration, are perceived as bird-like chirps by adult listeners (Mattingly et al, 1971). Thus, both sets of stimuli vary in precisely the same manner, but yet are perceived in radically different ways by adult listeners.

     Should both sets of stimuli be perceived in the same categorical manner, then it is possible to conclude that, at least during infancy, the more general auditory processing system is differentially sensitive to variations in rapidly changing frequency values. Furthermore, it would seem that the points of greater discriminability were selected to mark phonetic
  5. 5. boundaries (cf. Stevens, 1972; Stevens et al, 1969). However, should categorical-like perception be characteristic of only the speech stimuli (as in adult listeners), then categorical decisions regarding phonetic feature values cannot be ascribed to the functioning of the general auditory processing system, but rather must be considered to be a result of some form of additional linguistic processing to which only the sounds of speech are subjected.

     Method

     Stimuli. The speech stimuli were six synthetic speech patterns constructed by means of a parallel resonance synthesizer at the Haskins Laboratories by Mattingly et al (1971). Each stimulus was 245 msec in duration. The initial 15 msec simulated closure voicing by means of a low-amplitude first formant at 150 Hz. The following 40 msec was a transitional period, during which the first and second formants moved from their starting values to their terminal steady-state values of 743 and 1,620 Hz, respectively. The steady-state portions were 190 msec in duration. These stimuli were perceived as the voiced stops [b] or [d] plus the vowel [ae]. The nonspeech stimuli, 40 msec in duration, were likewise synthesized by means of the parallel resonance synthesizer. Each nonspeech pattern corresponded exactly to one of the second-formant transitions of the speech stimuli. Table 2 gives the starting frequencies for the six second-formant transitions. The difference in the second-formant starting frequency between the two stimuli of the two pairs used for Group D and the single pair used for Group S was approximately the same. This equality held, of course, for both the speech and nonspeech stimuli. The stimuli were recorded and made into continuous loops, as in Experiment I. The silent interval between successive stimuli was 755 msec for the speech sounds and 960 msec for the nonspeech sounds.

     Table 2
     Starting Frequencies (in Hz) of the Second-Formant Transitions: Experiment II

                                Stimulus 1   Stimulus 2   Difference
     Group D   Pair 1   F-2        1312         1541         229
               Pair 2   F-2        1386         1620         234
     Group S            F-2        1232         1465         233

     Note: Adult listeners consistently identified the stimuli for Group S and Stimulus 1 for both Group D pairs as [b]. Stimulus 2 for both Group D pairs was identified as [d].

     Apparatus. The apparatus was the same as that used in Experiment I.

     Procedure. The general procedural details were the same as those used in Experiment I. For each type of stimulus, speech or nonspeech, there were three groups of 16 infants, D, S, and C. The infants of Group D (both speech and nonspeech) received one of two pairs of stimuli, half of the infants receiving one pair and the remaining infants the other pair. When the stimuli were speech sounds, one stimulus of each pair signaled the phonetic category [b], while the second stimulus signaled the phonetic category [d]. Two different pairs were used to be sure that, in at least one instance, the two stimuli lay on opposite sides of the phonetic boundary. The infants in Group S received a single pair of stimuli, which, when speech sounds, were both representatives of the category [b]. The control infants (Group C) received one of the six speech or six nonspeech stimuli, randomly selected. All stimuli were presented at approximately 15 dB above a background noise level of 63 dB (B scale, Type 1551-C General Radio sound-level meter).

     Subjects. The Ss were 96 infants from the Greater Providence, Rhode Island, area. Approximately half of the infants were males and half females. Likewise, approximately half of the sample were 2 months of age and the remaining infants were 3 months old. Forty-four percent of the infants tested completed the experiment, and there were no reliable differences in the rate of failure due to age, sex, type of stimulus, or experimental treatment. Of the 122 infants who failed to complete the experiment, 34% cried, 27% fell asleep, 12% failed to suck on the nipple at all, 8% stopped sucking at some time during the course of the experiment, and 18% were eliminated for a variety of reasons, including failure to achieve the satiation criterion, very erratic responding, and equipment failure. The majority (86%) of the infants who failed to complete the experiment were eliminated before the stimulus change, and of those who were eliminated after the change, 83% either fell asleep or cried. As in the first experiment, the infants who were eliminated after the stimulus change did not differ reliably due to sex or experimental treatment, and most of the infants were eliminated before the third postshift minute. When possible, infants were randomly assigned to the six conditions. But, again, given the high attrition rate and the restrictions that the six groups be balanced as closely as possible for age, sex, order of stimulus presentation, and, in the case of the infants in Group D, for stimulus pair, completely random assignments were simply not possible.

     Results and Discussion

     An analysis of the response rates for the 5 min immediately preceding the stimulus change revealed no reliable differences due to experimental conditions or type of stimulus, speech or nonspeech. All six groups showed a reliable increment in sucking rate at the third minute before shift compared with the baseline rate (p < .01) and a reliable decrement during the final 2 min before shift, as would be expected (p < .01).

     Difference scores were again used in all analyses of postshift performance. The mean changes in response rates over the final 4 min are shown in Table 3. Preliminary analyses of the postshift data revealed no significant differences as a function of age, sex, order of stimulus presentation, or, in the case of the infants assigned to Group D, the stimulus pair. An analysis of variance, Type of Stimulus (speech vs nonspeech) by Groups (D vs S vs C) by Minutes, yielded only one significant effect, namely, the Groups by Type of Stimulus interaction, F(2,90) = 4.1, p < .025. Individual comparisons revealed the following: (1) When the stimuli were synthetic speech patterns, there was evidence for categorical perception in that Group D responded at a reliably higher rate than did Groups S and C, which did not differ from each other. (2) When the stimuli were nonspeech sounds, there

     Table 3
     Mean Recovery in Response Rate (Responses Per Minute) for the Entire 4-Min Postshift Period: Experiment II

                     Experimental Condition
     Stimuli       Group D    Group S    Group C
     Speech         +8.5       -4.6       -1.2
     Nonspeech      +4.0*      +7.9       +.6

     *In a replication study with nine infants, the mean increment in response rate was +7.3 responses/min.
was no evidence for categorical perception; Groups D and S did not differ reliably. (3) Although Group S did differ from Group C, the infants of Group D did not differ significantly from the control infants when the stimuli were nonspeech sounds.

It is, of course, possible that the auditory processing system finds some pairs of nonspeech sounds more discriminable than other pairs (cf. Mattingly et al, 1971). On the other hand, it is likewise possible that the slightly depressed performance of the nonspeech Group D (compared with Group S) was due to sampling error. Indeed, the difference between the two groups could be attributed to the relatively poor performance of only four infants in Group D. To test these alternatives, an additional nine infants were tested on their ability to discriminate the two pairs of nonspeech chirps previously used for Group D. The results indicate that these stimuli can be reliably discriminated and that the original estimate of discriminability may well have been underestimated due to sampling error: the mean increase in response rate was 7.3 responses (p < .05), very near the performance level of Group S.

Regardless of the true level of discriminability for the stimuli of the nonspeech Group D, it is the case that infants discriminate variations in the second-formant transition quite differently when they are presented in a speech context as opposed to a nonspeech context. Further confirmation of this conclusion comes from comparisons of Groups D and S, which received speech stimuli, with the same groups which received nonspeech stimuli. A 2 by 2 analysis of variance revealed a highly significant Groups by Type of Stimulus interaction, F(1,60) = 20.5, p < .001: Group S, which received speech stimuli, responded less frequently than did the remaining three groups, which, in turn, did not differ significantly from one another.

In summary, the results of these two experiments indicate that infants are capable of categorizing continuous variations in the acoustic dimension that signal adult phonetic distinctions corresponding to place of articulation. Moreover, the manner in which this categorization takes place agrees perfectly with the three major and possibly universal phonetic distinctions based on place of articulation.2 Finally, the processing system that subserves perception of the phonetic feature of place differs from the system that is capable of processing the same acoustic information in a nonspeech context.

GENERAL DISCUSSION

It remains now to consider (1) the perceptual mechanisms that could be responsible for the categorical perception of place cues in speech contexts, (2) how acoustic differences that are discriminable in one context (nonspeech) might not be discriminable in another context (speech), and (3) what functions perception in a speech mode might serve in the infant during the earliest stages of language acquisition.

The major theoretical positions pertaining to the perception of speech are complex models, in that perception is assumed to be more directly linked to internal mediating events than to external acoustic events or their immediate auditory transforms (see, for example, Liberman, 1957, 1970; Liberman et al, 1967; Stevens & House, 1972). Thus, in the motor theory of speech perception of Liberman and his associates, decoding of the incoming acoustic signal is made by reference to invariant motoric commands to the articulatory system, undoubtedly at the level of neural activity. Exactly how the acoustic information makes contact with these central articulatory commands has not as yet been fully explicated. In a similar manner, the analysis-by-synthesis model of Stevens and his colleagues makes the assumption that perception occurs when the temporarily stored speech event, in auditory form, is matched by a second, internally generated or synthesized representation of the speech event in question. This process of generation must necessarily involve considerable knowledge of production processes, more specifically, of the rules by which distinctive phonetic features are converted into articulatory commands and of the rules by which auditory patterns are associated with these commands.

Although motor theories of speech perception are possible models of adult speech perception, the application of this class of model to the speech-processing capacities of the infant raises certain difficulties. Most prominent of these is the need to explain how the young, virtually inarticulate infant has come to possess the complex knowledge required for the conversion of phonetic features to articulatory commands, and how he has come to associate particular auditory patterns with articulatory commands. Without ascribing this knowledge to the biological endowment of the infant, there would appear to be no way in which an infant could have acquired this information by traditional means during the first few weeks or months of life. However, to attribute this knowledge to the infant's biological endowment would seem to extend considerably the cognitive competencies that we are willing to impute to genetically determined factors.

A number of researchers have begun to consider alternate models of speech perception that are based on feature detectors and that do not require knowledge of production routines (Abbs & Sussman, 1971; Cole & Scott, 1972; Cooper, in press; Eimas & Corbit, 1973; Eimas, Cooper, & Corbit, 1973; Lieberman, 1970; Stevens, 1972, 1973).3 Although
feature detectors which permit phonetic decisions could be either auditory or linguistic in nature, the weight of evidence favors linguistic feature detectors (but see Stevens, 1973, for an account of place discrimination based on auditory property detectors). Recently, Eimas and Corbit (1973) and Eimas, Cooper, and Corbit (1973) have presented evidence consistent with the hypothesis that there exist phonetic feature detectors for the acoustic consequences of the two modes of voicing found in English and numerous other languages (Lisker & Abramson, 1964). These detectors were assumed to be sensitive to relatively restricted ranges of complex acoustic energy (voice onset time), part of the central speech processing system, and finally the direct mediators of the categorization of voicing information. Similarly, Cooper (in press) and Cooper and Blumstein (1974) have found evidence for the existence of phonetic feature detectors underlying the perception of the acoustic consequences of the three major place distinctions. If these phonetic feature detectors exist and are operative early in the lives of infants, then it is possible to accommodate the categorical perception of infants without recourse to motoric theories of speech perception (Eimas, in press; Cutting & Eimas, in press). Essentially, it has been argued that, given the passive nature of a feature detector system of analysis, the presentation of an acoustic signal with sufficient linguistic information to activate the speech processor (cf. Molfese, 1972) will likewise activate those phonetic feature detectors for which there is an adequate stimulus. Continued presentation of the same speech sound eventuates in the adaptation of the stimulated detectors. This adaptation may be related to the diminution of the reinforcing properties of novel speech stimuli as they become increasingly familiar and the subsequent reduction in effort to obtain the stimulus. The introduction of an acoustically different, but phonetically identical, speech pattern will not be experienced as novel by the infant, whereas an acoustically and phonetically different speech sound will be perceived as novel. From a phonetic feature analysis and the infant's strong attraction to novelty, it is easily predicted that the postshift period of Group D should be marked by a substantial increment in response rate to obtain the new stimulus. On the other hand, the postshift period of Group S (and, of course, of Group C) should reveal a continued decrease in the rate of responding. As for accommodating the continuous discrimination of second-formant transition variations when presented in a nonspeech context, we need only assume that there was not sufficient linguistic information in the signal to activate the speech processor. Hence, the analysis of these sounds is restricted to the auditory processing system, the discriminative capacity of which is more closely tied to physical (acoustic) differences per se than is that of the speech analyzers. (For evidence pertaining to the insensitivity of the speech processor to variations in the acoustic difference between two phonetically distinct sounds, the reader is referred to Eimas, in press.) In essence, then, we have tried to argue that the discriminability of speech in infants is based primarily on the detection of different values of one or more phonetic features, such as voice onset time or place of articulation.4

The ability of infants (and adults) to discriminate certain variations of second-formant transitions in a nonspeech context and yet be unable to discriminate the same acoustic information when it is presented in a speech context is a seemingly paradoxical finding. As mentioned above, it must be the case that acoustic events, regardless of their nature, initially undergo a considerable degree of common auditory analysis, but as the results indicate, some of the auditory information from this common processing is apparently lost or not available when the stimuli are speech signals. Perhaps the information arising from the additional linguistic analysis that only the sounds of speech undergo takes precedence in some manner over the output information of the auditory analysis. Of equal likelihood, however, is the hypothesis that linguistic processing reduces the amount of auditory information that can be held in a temporary, probably preperceptual store (cf. Massaro, 1972), as a consequence of the additional time and/or processing capacity required by the linguistic analysis. Thus, after linguistic processing, the relatively brief, but complex, auditory information associated with phonetic features becomes unavailable, for all intents and purposes, to response decision units. Although at this time the available evidence does not permit a resolution of this problem, it is of interest to note that a number of researchers have considered the categorical nature of speech perception to be due, at least in part, to memorial processes that affect the stored auditory analogs of the acoustic input (cf. Fujisaki & Kawashima, 1969; Pisoni, 1973).

The capacity of infants to perceive speech in a linguistic mode through the activation of a special speech processor with phonetic feature detectors serves a number of functions. In the first place, this form of analysis of speech provides the infant with an automatic means for the recognition of speech. As a consequence, speech signals will undergo special processing, without conscious decisions on the part of the listener (infant or adult), provided only that there is sufficient linguistic information in the signal to activate the speech processor (cf. Eimas, Cooper, & Corbit, 1973; Mattingly et al, 1971). Secondly, the speech processing system, with its phonetic feature analyzers, not only provides information concerning the phonetic feature structure of the speech sounds
but also automatically acts to segment the continuous stream of speech into discrete elements, a necessary step for the analysis of speech and language. Finally, this form of processing, in not requiring that the infant actively learn to recognize speech or to segment it into phonetic features, must certainly hasten, and perhaps even make possible, the acquisition of human languages.

REFERENCES

ABBS, J. H., & SUSSMAN, H. M. Neurophysiological feature detectors and speech perception: A discussion of theoretical implications. Journal of Speech & Hearing Research, 1971, 14, 23-36.
ABRAMSON, A. S., & LISKER, L. Discriminability along the voicing continuum: Cross-language tests. In Proceedings of the Sixth International Congress of Phonetic Sciences, Prague, 1967. Prague: Academia, 1970. Pp. 569-573.
COLE, R. A., & SCOTT, B. Phoneme feature detectors. Paper presented at the meetings of the Eastern Psychological Association, Boston, Massachusetts, April 1972.
COOPER, W. E. Adaptation of phonetic feature analyzers for place of articulation. Journal of the Acoustical Society of America, in press.
COOPER, W. E., & BLUMSTEIN, S. E. A "labial" feature analyzer in speech perception. Perception & Psychophysics, 1974, 16, 591-600.
CUTTING, J. E., & EIMAS, P. D. Phonetic feature analyzers in the processing of speech by infants. In J. Kavanagh and J. E. Cutting (Eds.), The role of speech in language. Cambridge, Mass: M.I.T. Press, in press.
EIMAS, P. D. The relation between identification and discrimination along speech and nonspeech continua. Language & Speech, 1963, 6, 206-217.
EIMAS, P. D. Speech perception in early infancy. In L. B. Cohen and P. Salapatek (Eds.), Infant perception. New York: Academic Press, in press.
EIMAS, P. D., COOPER, W. E., & CORBIT, J. D. Some properties of linguistic feature detectors. Perception & Psychophysics, 1973, 13, 247-252.
EIMAS, P. D., & CORBIT, J. D. Selective adaptation of linguistic feature detectors. Cognitive Psychology, 1973, 4, 99-109.
EIMAS, P. D., SIQUELAND, E. R., JUSCZYK, P., & VIGORITO, J. Speech perception in infants. Science, 1971, 171, 303-306.
FUJISAKI, H., & KAWASHIMA, T. On the modes and mechanisms of speech perception. Annual Report of the Engineering Research Institute, Vol. 28, Faculty of Engineering, University of Tokyo, Tokyo, 1969. Pp. 67-73.
KIMURA, D. Cerebral dominance and the perception of verbal stimuli. Canadian Journal of Psychology, 1961, 15, 166-171.
KIMURA, D. Left-right differences in the perception of melodies. Quarterly Journal of Experimental Psychology, 1964, 16, 355-358.
LIBERMAN, A. M. Some results of research on speech perception. Journal of the Acoustical Society of America, 1957, 29, 117-123.
LIBERMAN, A. M. The grammars of speech and language. Cognitive Psychology, 1970, 1, 301-323.
LIBERMAN, A. M., COOPER, F. S., SHANKWEILER, D. P., & STUDDERT-KENNEDY, M. Perception of the speech code. Psychological Review, 1967, 74, 431-461.
LIBERMAN, A. M., HARRIS, K. S., HOFFMAN, H. S., & GRIFFITH, B. C. The discrimination of speech sounds within and across phoneme boundaries. Journal of Experimental Psychology, 1957, 54, 358-368.
LIEBERMAN, P. Towards a unified phonetic theory. Linguistic Inquiry, 1970, 1, 307-322.
LISKER, L., & ABRAMSON, A. S. A cross-language study of voicing of initial stops: Acoustical measurements. Word, 1964, 20, 384-422.
LISKER, L., & ABRAMSON, A. S. The voicing dimension: Some experiments in comparative phonetics. In Proceedings of the Sixth International Congress of Phonetic Sciences, Prague, 1967. Prague: Academia, 1970.
MASSARO, D. W. Preperceptual images, processing time, and perceptual units in auditory perception. Psychological Review, 1972, 79, 124-145.
MATTINGLY, I. G., LIBERMAN, A. M., SYRDAL, A. K., & HALWES, T. Discrimination in speech and nonspeech modes. Cognitive Psychology, 1971, 2, 131-157.
MIYAWAKI, K., LIBERMAN, A. M., FUJIMURA, O., STRANGE, W., & JENKINS, J. J. Cross-language study of the perception of the F3 cue for [r] versus [l] in speech and nonspeech patterns. Paper presented at the XXth International Congress of Psychology, Tokyo, 1972. (Also in Status Report on Speech Research, Haskins Laboratories, New Haven, Connecticut, 1973, SR-33. Pp. 67-76.)
MOFFITT, A. R. Consonant cue perception by twenty- to twenty-four-week-old infants. Child Development, 1971, 42, 717-731.
MOLFESE, D. L. Cerebral asymmetry in infants, children and adults: Auditory evoked responses to speech and noise stimuli. Unpublished PhD dissertation, The Pennsylvania State University, 1972.
MORSE, P. A. The discrimination of speech and nonspeech stimuli in early infancy. Journal of Experimental Child Psychology, 1972, 14, 477-492.
PISONI, D. B. On the nature of categorical perception of speech sounds. Unpublished PhD dissertation, University of Michigan, 1971. (Also in Supplement to Status Report on Speech Research, Haskins Laboratories, New Haven, Connecticut, November 1971.)
PISONI, D. B. Auditory and phonetic memory codes in the discrimination of consonants and vowels. Perception & Psychophysics, 1973, 13, 253-260.
SIQUELAND, E. R., & DELUCIA, C. A. Visual reinforcement of nonnutritive sucking in human infants. Science, 1969, 165, 1144-1146.
STEVENS, K. N. The quantal nature of speech: Evidence from articulatory-acoustic data. In E. E. David, Jr., and P. B. Denes (Eds.), Human communication: A unified view. New York: McGraw-Hill, 1972. Pp. 51-66.
STEVENS, K. N. The potential role of property detectors in the perception of consonants. Paper presented at Symposium on Auditory Analysis and Perception of Speech, Leningrad, USSR, August 1973.
STEVENS, K. N., & HOUSE, A. S. Speech perception. In J. Tobias (Ed.), Foundations of modern auditory theory. Vol. II. New York: Academic Press, 1972. Pp. 3-62.
STEVENS, K. N., LIBERMAN, A. M., STUDDERT-KENNEDY, M., & OHMAN, S. E. G. Cross-language study of vowel perception. Language & Speech, 1969, 12, 1-23.
STUDDERT-KENNEDY, M. The perception of speech. In T. A. Sebeok (Ed.), Current trends in linguistics. Vol. XII. The Hague: Mouton, 1974.
STUDDERT-KENNEDY, M., & SHANKWEILER, D. P. Hemispheric specialization for speech perception. Journal of the Acoustical Society of America, 1970, 48, 579-594.
WOOD, C. C., GOFF, W. R., & DAY, R. S. Auditory evoked potentials during speech perception. Science, 1971, 173, 1248-1251.

NOTES

1. In one respect, the present findings differ from those obtained with the dimension of voice onset time: the recovery function for Group D showed a rising function, whereas for voice onset time, the Group D recovery function attained its maximum response rate
during the initial 2 min. This effect appears to be real in that the same rising function for Group D was found in a replication study using similar three-formant synthetic speech stimuli. The recovery function for the comparable Group D in Experiment II showed a less pronounced increment over time, with the peak response rate occurring at the third postshift minute. The reason for the delay in the postshift response peak with place cues is not clear (but see Cutting and Eimas, in press, for some possibly analogous findings in the adult speech perception literature).

2. Comparison of the pattern of results of the first experiment with that obtained by Pisoni (1971, 1973) indicates that infants may be bound by phonetic feature values in discriminating speech signals even more than adult listeners. Pisoni, using a variety of psychophysical procedures, found that the level of discriminability for the stimuli of Group S was consistently above chance, albeit only 7%-10%. It will be recalled that the recovery function for Group S did not differ from that of the control infants. Moreover, in Experiment II, an even larger acoustic difference than used by Mattingly et al (1971) failed to produce any evidence of within-phonetic category discrimination. Whether infants are actually more constrained than adult listeners by the phonetic feature structure or whether the apparent difference is one attributable to procedural differences remains to be determined. In any event, there appears to be ample evidence that the discrimination of speech signals by infants is controlled to a large extent by the phonetic composition of the acoustic event.

3. We recognize that there exist, as yet, a number of unresolved problems in hypothesizing a model of speech perception based on the extraction of phonetic information by feature detectors. These problems include, for example, delineating the phonetic features and the invariant stimulus information necessary for the activation of the various detectors. The latter issue is particularly difficult, given the extent to which a number of cues sufficient for signaling phonetic feature distinctions are context-conditioned, and vary greatly from speaker to speaker (cf. Liberman et al, 1967). Whether these issues can be resolved is to a large extent an empirical problem. What we have tried to do in this paper and others is to demonstrate that this form of analysis can accommodate at least a portion of the speech data, whether from infant or adult listeners.

4. It should be borne in mind that the feature detector model is used in the present paper to accommodate the discriminability data only. It is obviously the case that additional, high-level processes of integration, as well as some form of matching sets of distinctive features with appropriate labels, are necessary before an identification response can be made.

(Received for publication March 1, 1974; revision received July 6, 1974.)