BRIEF REPORTS AND SUMMARIES
TESOL Quarterly invites readers to submit short reports and updates on their work.
These summaries may address any areas of interest to Quarterly readers.
Edited by CATHIE ELDER
University of Auckland
Pennsylvania State University
Vietnamese Acquisition of English Word Stress
THU T. A. NGUYEN
University of Queensland
St. Lucia, Queensland, Australia
University of Queensland
St Lucia, Queensland, Australia
■ The acquisition of prosody in a second language (L2) is relatively
understudied; however, the acquisition of stress—especially in English as
L2—has recently received some attention (e.g., Archibald, 1992, 1993,
1995, 1997, 1998; Pater, 1997; Peperkamp, Dupoux, & Sebastián-Gallés,
1999). Nevertheless, these studies focus on the transfer of the phonological
aspect of stress (e.g., stress placement and truncation); the transfer of
ﬁrst language (L1) acoustic features in realizing L2 stress has not been
widely investigated. In a recent study on the bidirectional transfer
between English and Japanese, Ueyama (2000) found that when produc-
ing L2 word accent in the beginning stage of L2 development, L2
speakers tend to import L1 phonetic features, for example, pitch (F0:
fundamental frequency) in L1 Japanese. However, speakers can also
modify these L1 habits to simulate L2 patterns. In addition, L2 speakers
use an acoustic correlate that is already active in the L1 system (e.g., F0
in L1 Japanese) to learn to control a correlate that is not active in L1
(e.g., syllable duration in L1 Japanese). This study examined the transfer
of tonal acoustic correlates in Vietnamese learners’ production of
English word stress. More speciﬁcally, the study examined acoustic
features that native and nonnative speakers (Vietnamese learners of
English) use to differentiate stressed from unstressed syllables in noun-verb
pairs (e.g., récord vs. recórd). Vietnamese and
TESOL QUARTERLY Vol. 39, No. 2, June 2005 309
English, respectively, represent two broadly contrastive prosodic types:
tone languages and stress languages. English has a system of culminative
word stress, but Vietnamese, a tonal language, has no system of word
stress; rather, it has a system of lexically distinctive tones (Nguyen, 1970).
Stress is different from tone in several ways. First, stress is culminative
(head-marking); that is, in stress languages, with few exceptions, every
(content) word has at least one stressed syllable. Second, because a
prominence hierarchy may occur among multiple stresses (e.g., primary
vs. secondary stresses in English), stress is hierarchical. Third, stress can
mark edges or boundaries in some systems; for example, some languages
prefer iambic feet (stress on the ﬁnal syllable), but others prefer trochaic
ones (stress on the initial syllable). Fourth, stress is rhythmic in systems
where stressed and unstressed syllables alternate and where clashes
(adjacent stresses) are avoided. Fifth, stress contrasts tend to be en-
hanced segmentally: Stressed syllables may be lengthened by vowel
lengthening or by gemination, and unstressed syllables may be weakened
by vowel reduction (Kager, 1996). Vietnamese, as a tonal language, has
no system of culminative word stress but a system of six lexical tones in
which pitch is used to contrast individual lexical items or words. As a
result, English and Vietnamese differ in terms of how they manipulate
the acoustic correlates in word-level prosody. Studies on the acoustic
correlates of English stress show that judgments of linguistically signiﬁ-
cant stress in English are contingent on at least four acoustical param-
eters: fundamental frequency, duration, amplitude, and vowel quality (Beckman,
1986; Fry, 1955; Lehiste & Peterson, 1959; Lieberman, 1960). Research
on tonal languages (Chao, 1930, 1980; Gandour, 1983; Gandour &
Harshman, 1978; Hashimoto, 1986; Ruhlen, 1976; Vance, 1977; Wang,
1967) shows that a limited number of tonal dimensions are used to signal
tonal distinctions: pitch height, direction of pitch movement, pitch range, and
beginning and ending point of pitch movement. Of these pitch characteristics,
two primary dimensions of linguistic tone are most commonly identiﬁed:
pitch height and the direction of pitch movement (Gandour, 1983;
Gandour & Harshman, 1978). In Vietnamese, in addition to direction of
pitch movement (tone contour) and pitch height, tones are distin-
guished by voice quality, intensity, and duration (Nguyen & Edmondson,
1997; Pham, 2000; Vu Thanh Phuong, 1981). Intensity was found to
highly correlate with pitch (Vu Thanh Phuong, 1981) and thus is
supplementary to pitch. Duration or, particularly, tonal length is not a
distinctive feature in Vietnamese (Pham, 2000; Vu Thanh Phuong, 1981)
but only varies in segmental contexts. From a study on native perception
of Vietnamese tones, Vu Thanh Phuong (1981) concludes that the
direction of pitch movement, pitch height, and voice quality play a more
important role than other tonal dimensions, such as duration and
intensity, in the identiﬁcation of tones. Intensity and duration support
perception but play no independent role in tone recognition.
The aforementioned studies show that even though both languages
employ F0 as a perceptual cue (to tone in Vietnamese and to stress in
English), the two languages differ in terms of how acoustic cues are
manipulated. In English, stressed syllables are longer than unstressed
syllables (i.e., duration is an active correlate in producing word stress),
and unstressed vowels tend to be reduced. In contrast, in Vietnamese, a
syllable-timed language, no systematic difference in duration or vowel
quality among syllables has been found.
The comparison of acoustic features used in the two languages shows
potential prosodic transfer effects in the ways that Vietnamese learners
produce English word stress patterns: (a) The active role of F0 as a tonal
cue in Vietnamese probably facilitates the production of F0 (and
intensity) contrasts between lexically stressed and unstressed syllables in
L2 English. (b) Because duration and vowel reduction do not function in
Vietnamese as active cues for tonal contrasts, Vietnamese learners will
have difﬁculties producing the requisite vowel duration and quality
contrasts for English word stress.
Twenty minimal pairs of nouns and verbs such as présent (noun) and
presént (verb) were used as the stimulus items. Five minimal pairs were
three-syllable words (e.g., dócument vs. documént); the remaining 15 pairs
were two-syllable words (see Appendix for the complete stimulus set).
For each pair, the noun form has word stress on the ﬁrst syllable, and the
verb form has word stress on the second or third syllable. The two forms
are segmentally homophonous except for the vowel reduction in the
unstressed syllable. Each noun and verb was embedded in the carrier
sentence, “Say the word ______ again.” Stress was marked on each
stressed vowel to make sure that Vietnamese speakers produced the
correct stress patterns. To ensure that speakers produced the correct
contrastive stress patterns, the sentences were presented in pairs with a
target noun form immediately followed by a counterpart verb form. For
example:
1. a. Say the word “cónduct” again.
   b. Say the word “condúct” again.
2. a. Say the word “présent” again.
   b. Say the word “presént” again.
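The construction of the paired carrier sentences can be illustrated with a short Python sketch. It is a minimal illustration rather than the procedure actually used to prepare the materials: the function names (`mark_stress`, `build_sentence_pairs`), the syllable divisions, and the use of capital letters to mark the stressed syllable are assumptions introduced here. The verb form is assumed to carry stress on its final syllable, which matches the second-or-third-syllable description given for the two- and three-syllable items.

```python
# Illustrative sketch only: names, syllable divisions, and the capital-letter
# stress notation are assumptions, not part of the original materials.
CARRIER = 'Say the word "{}" again.'


def mark_stress(syllables, stressed_index):
    """Join syllables into one word, capitalizing the stressed syllable."""
    return "".join(
        syllable.upper() if index == stressed_index else syllable
        for index, syllable in enumerate(syllables)
    )


def build_sentence_pairs(items):
    """Return (noun_sentence, verb_sentence) tuples, noun form first."""
    pairs = []
    for syllables in items:
        # Noun form: stress on the first syllable.
        noun = CARRIER.format(mark_stress(syllables, 0))
        # Verb form: stress on the final (second or third) syllable.
        verb = CARRIER.format(mark_stress(syllables, len(syllables) - 1))
        pairs.append((noun, verb))
    return pairs


sentence_pairs = build_sentence_pairs(
    [("con", "duct"), ("pre", "sent"), ("doc", "u", "ment")]
)
for noun_sentence, verb_sentence in sentence_pairs:
    print(noun_sentence)
    print(verb_sentence)
```

Keeping the noun sentence first in each tuple mirrors the presentation order described above, in which each target noun form is immediately followed by its counterpart verb form.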
Three groups of subjects participated in this experiment: beginning-
level Vietnamese learners of English, advanced-level Vietnamese speak-
ers of English, and a control group of native speakers of Australian
English. The beginning group consisted of 10 subjects (3 from Hanoi, 3
from Hue, and 4 from Saigon; 5 male and 5 female), and they were paid
for their participation. All had just completed their ﬁrst year as English
majors at universities in Hanoi, Hue, and Saigon. They had all started
learning English at the age of 12 (in secondary school) with the grammar
translation method, which focuses on vocabulary and grammar learning.
However, during their ﬁrst year in university, they were exposed to
communicative English learning. The advanced group consisted of 10
postgraduate students at the University of Queensland (4 Southerners, 3
Northerners, and 3 Hue speakers). They were in the age range 25–32.
Their length of residence in Australia varied from 8 months to 10 years.
Eight of them had received a bachelor of arts degree in English and had
been teaching for 2 to 3 years. They were working toward a master of arts
degree in TESOL. Two other subjects had just ﬁnished bachelor’s
degrees in science studies. Like the beginning-level group, the advanced-
level students had started learning English at the age of 12 with the
grammar translation method. They were exposed to communicative
language teaching methods during their 4 years of undergraduate study.
They spoke Vietnamese and English, and they had very limited knowl-
edge of French, which they had learned at university as a second foreign
language with a curriculum that strongly emphasized grammar. Two
native speakers of Australian English (a phonetician and a linguist from
the University of Queensland linguistics program) served as the control
native speaker group. It is worth noting that the nonnative speaker
groups in this study included speakers of the three main Vietnamese
dialects: speakers from Hanoi representing the northern dialect, speak-
ers from Hue representing the central dialects, and speakers from
Saigon representing the southern dialects. This study was originally
designed to examine dialectal differences in prosodic transfer effect, but
a preliminary analysis and other related studies showed no signiﬁcant
dialectal differences on variables investigated. Therefore, the dialect
factor was excluded from this report.
Before the recording, subjects were presented the list of contrastive
sentence pairs and provided sufﬁcient time for familiarization and
practice. They then read the sentences aloud three times in their normal
speaking manner. Only the third repetition was recorded and used for
analysis. The two native speakers recorded ﬁve repetitions, all of which
were used for analysis. The recordings were made in a quiet room using
Speech Station, a sound recording and editing program, at a 20-kHz
sampling rate and 16-bit precision.
The acoustic measurements included fundamental frequency (F0),
vowel and syllable duration, and intensity of the accent-bearing elements
(the ﬁrst syllable and the second syllable in a two-syllable word, or the
ﬁrst syllable and the third syllable in the three-syllable word). All the
measurements were made using Emu Speech Tools (Cassidy, 1999).
Studies of the effects of stress and accent on duration in English have
shown that not only the rhymes but also the initial consonants are
lengthened relative to their counterparts in unstressed syllables (see,
e.g., Ingrisano & Weismer, 1979; Umeda, 1977). Therefore, in this
experiment, we measured the duration of the vowel as well as of the
whole syllable, including the onset and the rhyme. We also measured the
fundamental frequency (F0, in Hz) and intensity (in dB) at the
center point of each vowel.
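The measurement step can be sketched in Python as follows. This is only an illustration under stated assumptions and does not reproduce the Emu Speech Tools interface: the data layout (segment boundaries in milliseconds, F0 and intensity tracks as parallel lists of frame values), the function names, and all numeric values are hypothetical. The sketch takes the duration of the syllable and of the vowel from their boundaries and reads F0 and intensity from the analysis frame nearest the vowel midpoint.

```python
# Illustrative sketch only: the data layout (times in ms, tracks as parallel
# lists) and all names are assumptions, not the Emu Speech Tools interface.
from bisect import bisect_left


def value_at_midpoint(frame_times, frame_values, start, end):
    """Return the track value from the frame nearest the midpoint of [start, end]."""
    midpoint = (start + end) / 2
    i = bisect_left(frame_times, midpoint)
    if i == 0:
        return frame_values[0]
    if i == len(frame_times):
        return frame_values[-1]
    # Choose whichever neighboring frame lies closer to the midpoint.
    if frame_times[i] - midpoint < midpoint - frame_times[i - 1]:
        return frame_values[i]
    return frame_values[i - 1]


def measure(syllable, vowel, frame_times, f0_track, intensity_track):
    """Collect the four measurements; intervals are (start, end) in ms."""
    return {
        "syllable_duration_ms": syllable[1] - syllable[0],
        "vowel_duration_ms": vowel[1] - vowel[0],
        "f0_hz": value_at_midpoint(frame_times, f0_track, *vowel),
        "intensity_db": value_at_midpoint(frame_times, intensity_track, *vowel),
    }


# Invented values for demonstration, with one analysis frame every 50 ms.
frame_times = [0, 50, 100, 150, 200, 250, 300, 350, 400]
f0_track = [0, 0, 180, 190, 200, 210, 205, 190, 0]
intensity_track = [40, 45, 68, 70, 71, 72, 70, 66, 42]

result = measure((100, 340), (160, 300), frame_times, f0_track, intensity_track)
print(result)
```

Sampling at the vowel midpoint, as described above, keeps the F0 and intensity readings away from the consonant transitions at the vowel edges.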
First, to examine the effect of word stress (stressed vs. unstressed) on
the acoustic correlates (i.e., to ﬁnd out whether each speaker group
could distinguish between stressed and unstressed syllables based on the
four acoustic correlates), a series of two-way ANOVAs (stress × speakers)
was conducted on each acoustic parameter (F0, intensity, syllable
duration, and vowel duration) for each speaker group. This process
yielded 12 separate data sets (4 acoustic parameters × 3 speaker groups).
The independent variables were stress (stressed vs. unstressed) and
speakers. The dependent variable was the acoustic parameter.
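For readers who wish to see how the F ratios in such a stress × speakers design are obtained, the following Python sketch computes them for a balanced layout. It is a textbook calculation offered under assumptions, not a reconstruction of the analysis reported here: both factors are treated as fixed, every cell is assumed to hold the same number of observations, the function name `two_way_anova_f` is hypothetical, and the F0 values are invented. No p values are computed, because the Python standard library provides no F distribution.

```python
# Illustrative sketch only: a balanced two-way ANOVA with both factors treated
# as fixed is an assumption; the source does not specify the model details.
from statistics import mean


def two_way_anova_f(cells):
    """Return F ratios for a balanced stress x speaker design.

    cells[i][j] holds the observations for stress level i and speaker j;
    every cell must contain the same number (at least 2) of observations.
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    grand = mean([x for row in cells for cell in row for x in cell])
    cell_means = [[mean(cell) for cell in row] for row in cells]
    stress_means = [mean([x for cell in row for x in cell]) for row in cells]
    speaker_means = [mean([x for row in cells for x in row[j]]) for j in range(b)]

    # Sums of squares for the two main effects, the interaction, and error.
    ss_stress = b * n * sum((m - grand) ** 2 for m in stress_means)
    ss_speaker = a * n * sum((m - grand) ** 2 for m in speaker_means)
    ss_cells = n * sum((m - grand) ** 2 for row in cell_means for m in row)
    ss_interaction = ss_cells - ss_stress - ss_speaker
    ss_error = sum(
        (x - cell_means[i][j]) ** 2
        for i in range(a) for j in range(b) for x in cells[i][j]
    )
    ms_error = ss_error / (a * b * (n - 1))
    return {
        "stress": ss_stress / (a - 1) / ms_error,
        "speaker": ss_speaker / (b - 1) / ms_error,
        "interaction": ss_interaction / ((a - 1) * (b - 1)) / ms_error,
    }


# Invented F0 values (Hz): rows are stress levels, columns are two speakers.
f0_cells = [
    [[200, 210], [150, 160]],  # stressed syllables
    [[180, 190], [130, 140]],  # unstressed syllables
]
f_ratios = two_way_anova_f(f0_cells)
print(f_ratios)
```

In this invented example the stress and speaker effects are both large and the interaction is zero: each speaker has a different overall pitch level, but both raise F0 on stressed syllables by the same amount.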
Second, to compare the degree of difference in acoustic values
between stressed and unstressed syllables among the three speaker
groups (e.g., the difference in degree of stressed-syllable lengthening),
the ratios of stressed to unstressed vowels in terms of the four acoustic
parameters were calculated (e.g., F0 ratios were calculated by dividing
the F0 value of the stressed vowel by that of the corresponding unstressed
vowel). Then one-way ANOVAs with planned comparison among speaker
groups (native vs. advanced, advanced vs. beginner, and native vs.
beginner) by the Tukey method were conducted on the ratios of each
acoustic parameter (F0 ratios, intensity ratios, syllable duration ratios,
and vowel duration ratios).
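The ratio calculation and the group comparison can be sketched in Python as follows. The sketch is illustrative only: the duration values are invented and are not the data of this study, the function names are hypothetical, and only the omnibus F ratio is computed by hand. The Tukey comparisons are not implemented, because they require the studentized range distribution, which the Python standard library does not provide.

```python
# Illustrative sketch only: the duration values below are invented, and the
# hand-computed F ratio stands in for a full ANOVA with Tukey comparisons.
from statistics import mean


def stress_ratios(pairs):
    """Divide each stressed value by its unstressed counterpart."""
    return [stressed / unstressed for stressed, unstressed in pairs]


def one_way_anova_f(groups):
    """Return the F ratio (between-group over within-group mean square)."""
    samples = list(groups.values())
    all_values = [x for sample in samples for x in sample]
    grand = mean(all_values)
    ss_between = sum(len(s) * (mean(s) - grand) ** 2 for s in samples)
    ss_within = sum((x - mean(s)) ** 2 for s in samples for x in s)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    return (ss_between / df_between) / (ss_within / df_within)


# Invented (stressed, unstressed) vowel durations in ms, three items per group.
durations = {
    "native": [(280, 200), (300, 200), (320, 200)],
    "advanced": [(260, 200), (280, 200), (300, 200)],
    "beginner": [(180, 200), (200, 200), (220, 200)],
}
ratios = {group: stress_ratios(pairs) for group, pairs in durations.items()}
f_ratio = one_way_anova_f(ratios)
print(ratios)
print(round(f_ratio, 2))
```

A ratio near 1.0 means that the stressed and unstressed values are about equal, so a group whose ratios cluster around 1.0 is not using that parameter to mark stress.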
Results of ANOVAs on Stress as a Factor (Stressed vs. Unstressed)
The results of 12 separate ANOVAs showed significant main effects for
stress (p < 0.001) and speakers (p < 0.001), but no significant
interaction effect of stress × speakers. The significant speaker effect
merely indicates speaker variation in intrinsic voice characteristics.
Therefore, we examine only the main effect of stress (see Table 1).
As shown in Table 1 and Figure 1, the F0 and intensity values on
stressed syllables were signiﬁcantly higher than the same values on the
unstressed syllables across the three speaker groups, indicating that all
three groups could differentiate stressed from unstressed syllables in
terms of F0 and intensity.
However, there is a difference among speaker groups on the duration
parameter. As shown in Table 1 and Figure 2, the native Australian
English speakers and the advanced Vietnamese learners of English
produced stressed syllables and vowels that were signiﬁcantly longer than
the corresponding unstressed syllables, but the beginning Vietnamese
learners produced stressed and unstressed syllables with no signiﬁcant
difference in duration. This result indicates that beginners failed to
encode the duration cue in their production of English word stress.
TABLE 1
Results of ANOVAs on Stress

                     Native           Advanced          Beginner
Syllable duration    F(1,1) = 14.9    F(1,1) = 26.4     F(1,1) = 2.3
                     p < 0.001        p < 0.0001        NS
Vowel duration       F(1,1) = 12.8    F(1,1) = 69.5     F(1,1) = 4.6
                     p < 0.001        p < 0.0001        p < 0.04
F0                   F(1,1) = 60      F(1,1) = 146      F(1,1) = 40.8
                     p < 0.0001       p < 0.0001        p < 0.0001
Intensity            F(1,1) = 14      F(1,1) = 168      F(1,1) = 32.4
                     p < 0.001        p < 0.0001        p < 0.0001

FIGURE 1
Mean and Standard Deviation of F0 Values (left) and Intensity Values (right)
[Figure not reproduced. Vertical axes: mean F0 (Hz) and mean intensity (dB);
horizontal axes: native, advanced, and beginner groups.
* significantly different at p < 0.01.]

Results on Acoustic Ratios of Stressed to Unstressed Syllables
This section examines the magnitude of difference in acoustic values
between stressed and unstressed syllables among speaker groups (e.g.,
the difference in degree of stressed-syllable lengthening and
vowel-duration reduction among the three speaker groups). The results of
one-way ANOVAs (ratios × speaker groups) on the ratio of each acoustic
value of stressed to unstressed syllables all showed significant main
effects (p < 0.001). The significant pair comparisons by the Tukey
method between speaker groups are given in Table 2 (only those
significant at p < 0.01 are flagged).
As shown in Table 2 and Figure 3, the native and advanced groups
produced signiﬁcantly greater duration ratios of stressed to unstressed
syllables than beginners did, but advanced and native groups produced
no significant difference in duration ratios. The mean ratios of syllable
and vowel duration produced by the native and advanced groups range
from 1.3 to 1.5, respectively. In contrast, the ratios of beginning speakers
are clustered around 1.0, which indicates that they produced no difference
in duration between stressed and unstressed syllables, confirming
the ANOVA results on duration presented in Table 1. Advanced speakers
show ratio magnitudes equivalent to those of native speakers, indicating
that they produce native-like duration patterns (i.e., the magnitude of
difference in duration between a stressed syllable and an unstressed
syllable is the same as that of native speakers). Even though the
difference in F0 ratios and intensity ratios between the advanced and
beginner groups is statistically significant, the magnitude of difference is
not large. Vietnamese speakers of English and native speakers of English
generally do not produce significantly different F0 and intensity ratios
(i.e., no significant difference in magnitude of pitch contrast between
Vietnamese and native English speakers).

FIGURE 2
Mean and Standard Deviation of Syllable Duration
[Figure not reproduced. Vertical axis: mean duration (ms); horizontal axis:
native, advanced, and beginner groups.
* significantly different at p < 0.01.]

TABLE 2
Comparison of Ratio Difference in Acoustic Values Between Stressed and
Unstressed Syllables Among Speaker Groups

                                          F0    Intensity   Syllable duration   Vowel duration
Advanced- vs. beginning-level
  Vietnamese speakers of English          ***   ***         ***                 ***
Advanced-level vs. native speakers
  of English                              -     -           -                   -
Beginning-level vs. native speakers
  of English                              -     -           ***                 ***

***Significant at p < 0.01; - not significant at that level.
FIGURE 3
Average Ratio of Acoustic Parameters of English Stressed/Unstressed Syllables
[Figure not reproduced. Parameters shown: F0, intensity, syllable duration,
and vowel duration.]
The results of the experiment generally support the predictions. Both
groups of Vietnamese speakers could differentiate stressed from un-
stressed syllables in terms of F0 and intensity as well as native English
speakers did. This ﬁnding suggests that the active role of F0 (and
intensity) as tonal cues in Vietnamese facilitated the production of F0
(and intensity) contrast between lexically stressed and unstressed syllables
in L2 English. Note that the increase in intensity between
unstressed and stressed syllables tended to correlate with changes of
voice pitch but was usually marginal and thus had less perceptual
discriminating power, consistent with classical experiments on the pho-
netics of English word stress (Fry, 1955).
Although advanced speakers could produce native-like duration pat-
terns between stressed and unstressed syllables, beginners failed to
differentiate English stressed and unstressed syllables in terms of dura-
tion. This ﬁnding suggests a negative transfer effect: Because duration
does not function as an active cue in Vietnamese tonal distinctions,
Vietnamese beginning English learners fail to encode this cue in their L2
production. Nevertheless, advanced speakers’ ability to produce con-
trasting duration between stressed and unstressed syllables indicates that
this feature is learnable.
In conclusion, the results of this study are consistent with ﬁndings in
our related investigations (Nguyen, 2003; Nguyen & Ingram, 2002) that
native speakers and nonnative speakers employ acoustic cues in different
ways that are optimally suited to their respective L1 phonologies. Native
speakers produced word stress using both pitch and duration cues. In
contrast, when compared with advanced Vietnamese learners of English,
beginning Vietnamese learners produced word stress that accommo-
dated L2 pitch and intensity targets but not timing parameters such as
duration and vowel reduction.
Although beginning Vietnamese learners failed to realize or recognize
syllable duration contrast and vowel reduction, phonetic features that
are not active in their L1, this result does not mean that they do not have
the ability to perceive or to encode duration contrast, but that they need
to be explicitly taught at the initial stage to encode this necessary cue.
Explicitly teaching learners about these features will help them master
the features faster than letting them pick up the features through
exposure to the language, particularly in a foreign language context.
Therefore, it is necessary for ESL teachers to draw learners’ attention to
these features and to provide them with explicit training, particularly the
vowel reduction and syllable duration contrast in the acquisition of
English word stress.
T. A. T. Nguyen is a postdoctoral research fellow and a lecturer in linguistics at the
School of English, Media Studies, and Art History, the University of Queensland,
Australia. His research interests include phonetics, phonology, second language
phonology, and Vietnamese prosodic phonology. He
has been a TESOL lecturer in Vietnam.
John Ingram is a senior lecturer in linguistics at the School of English, Media Studies,
and Art History, the University of Queensland, Australia. His research interests
include phonetics, phonology, and psycholinguistics. His most recent work includes
studies on Parkinson’s Disease and on Vietnamese acquisition of Australian English.
Archibald, J. (1992). Transfer of L1 parameter settings: Some empirical evidence
from Polish metrics. Canadian Journal of Linguistics, 37(3), 301–339.
Archibald, J. (1993). Metrical phonology and the acquisition of L2 stress. In F. R.
Eckman (Ed.), Confluence: Linguistics, L2 acquisition and speech pathology (pp. 37–
48). Amsterdam: John Benjamins.
Archibald, J. (1995). A longitudinal study of the acquisition of English stress. Calgary
working papers in linguistics, 17, 1–10. Calgary, Alberta, Canada: Department of
Linguistics, University of Calgary.
Archibald, J. (1997). The acquisition of English stress by speakers of nonaccentual
languages: Lexical storage versus computation of stress. Linguistics, 35, 167–181.
Archibald, J. (1998). Second language phonetics, phonology, and typology. Studies in
Second Language Acquisition, 20(2), 189–213.
Beckman, M. E. (1986). Stress and non-stress accent. Dordrecht, Holland: Foris.
Cassidy, S. (1999, September). Compiling multi-tiered speech databases into the
relational model: Experiments with the Emu System. Eurospeech ’99: Vol. 6 (pp.
2239–2242). Budapest, Hungary: N.p.
Chao, Y. R. (1930). A system of tone letters. Le Maître Phonétique, 45, 283–319.
Chao, Y. R. (1980). Chinese tone and English stress. In L. R. Waugh & C. H. Van
Schooneveld (Eds.), The melody of language (pp. 41–44). Baltimore, MD: University
Fry, D. B. (1955). Duration and intensity as physical correlates of linguistic stress.
Journal of the Acoustical Society of America, 27, 765–768.
Gandour, J. (1983). Tone perception in Far Eastern languages. Journal of Phonetics,
Gandour, J., & Harshman, R. A. (1978). Crosslanguage differences in tone percep-
tion: A multidimensional scaling investigation. Journal of Phonetics 11, 149–175.
Hashimoto, A. (1986). Tone sandhi across Chinese dialects. In The Chinese
Language Society of Hong Kong (Ed.), Wang Li memorial volumes (pp. 445–474).
Hong Kong: Joint Publishing.
Ingrisano, D., & Weismer, G. (1979). s-Duration: methodological and linguistic
factors. Phonetica, 36, 32–43.
Kager, R. (1996). The metrical theory of word stress. In J. A. Goldsmith (Ed.), The
handbook of phonological theory (pp. 367–443). Cambridge, MA: Blackwell.
Lehiste, I., & Peterson, G. E. (1959). Vowel amplitudes and phonemic stress in
American English. Journal of the Acoustical Society of America, 31, 428.
Lieberman, P. (1960). Some acoustic correlates of word stress in American English.
Journal of the Acoustical Society of America, 32, 451.
Nguyen, D. H. (1980). Language in Vietnamese society. Chicago: University of Illinois Press.
Nguyen, D. L. (1970). A contrastive phonological analysis of English and Vietnamese
(Paciﬁc Linguistics Series, No 8). Canberra: Australian National University.
Nguyen, T. A. T. (2003). Prosodic transfer: The tonal constraints on Vietnamese acquisition
of English stress and rhythm. Unpublished doctoral dissertation, University of
Nguyen, T. A. T., & Ingram, J. (2002). Native and Vietnamese production of
compound and phrasal stress patterns. In J. H. L. Hansen & B. Pellom (Eds.),
ICSLP-2002 Conference Proceedings: Vol. 2. 7th International Conference on Spoken
Language Processing, September 16–20, 2002, Denver, Colorado. (pp. 533–536). Boul-
der, CO: Center for Spoken Language Research.
Nguyen, V. L., & Edmondson, J. (1997). Tones and voice quality in modern northern
Vietnamese: Instrumental case studies. Mon-Khmer Studies, 28, 1–18.
Pater, J. (1997). Metrical parameter missetting in second language acquisition. In
S. J. Hannahs & M. Young-Scholten (Eds.), Focus on phonological acquisition (pp.
235–262). Amsterdam: John Benjamins.
Peperkamp, S., Dupoux, E., & Sebastián-Gallés, N. (1999). Perception of stress by
French, Spanish, and bilingual subjects. In Proceedings of Eurospeech ’99: Vol. 6 (pp.
2683–2686). Budapest, Hungary: N.p.
Pham, H. (2000). Vietnamese tone: Tone is not pitch. Unpublished doctoral dissertation,
University of Toronto, Ontario, Canada.
Ruhlen, M. (1976). A guide to the languages of the world. Palo Alto, CA: Language
Ueyama, M. (2000). Prosodic transfer: An acoustic study of L2 English vs. L2 Japanese.
Unpublished doctoral dissertation, University of California, Los Angeles.
Umeda, N. (1977). Consonant duration in American English. Journal of the Acoustical
Society of America, 60, 846–858.
Vance, T. J. (1977). Tonal distinctions in Cantonese. Phonetica, 34, 93–107.
Vu, T. P. (1981). The acoustic and perceptual nature of tone in Vietnamese. Unpublished
doctoral dissertation, Australian National University, Canberra.
Wang, W. S.-Y. (1967). Phonological features of tone. International Journal of American
Linguistics, 33(2), 93–105.
APPENDIX
List of Test Words
Say the word “______” again.

 1. upset(n)        8. proceeds(n)    15. rebel(n)
    upset(v)           proceed(v)         rebel(v)
 2. offset(n)       9. addict(n)      16. compliment(n)
    offset(v)          addict(v)          compliment(v)
 3. segment(n)     10. ally(n)        17. implement(n)
    segment(v)         ally(v)            implement(v)
 4. fragment(n)    11. relay(n)       18. document(n)
    fragment(v)        relay(v)           document(v)
 5. accent(n)      12. confine(n)     19. regiment(n)
    accent(v)          confine(v)         regiment(v)
 6. compress(n)    13. combine(n)     20. interlock(n)
    compress(v)        combine(v)         interlock(v)
 7. conduct(n)     14. present(n)
    conduct(v)         present(v)