English Pronunciation for Chinese and Vietnamese Speakers
Pronunciation Doctor on YouTube
Sunburst Media and Mission College, CA
CATESOL Teaching of Pronunciation Interest Group (TOP-IG)
Co-founder & Co-coordinator
Marsha J. Chan
Over 2000 free videos for learning English, curated into over 25 playlists
More info and handouts at
Most of my students are from southern
China, and they cannot pronounce an "l"
sound at the beginning of a word or
distinguish the sound from an initial "r"
sound. How do you go about teaching this?
What's the best way to teach students the
difference between "walk" and "work"? I draw
diagrams of the mouth and tongue, but it's
still very difficult. Is there a better way?
The Vietnamese woman who works with me
refuses to pronounce the ends of words. I
keep telling her how, but she just won’t do it.
It’s so taxing to listen to them speak. It’s like
a sing-song staccato, and sometimes I just
can’t figure out what they’re saying. Is there a
way to make their pronunciation more
Ana Wu’s Blog
Every student must use a mirror fastidiously
Direct line: student’s mouth–mirror–teacher’s
1st listen & watch the teacher
2nd look in the mirror, repeat 3 (5, 10) times
Long pause = waiting for students’ eyes
A sagittal view of the mouth, nose, and throat
Both are sonorants.
◦ The sounds reverberate off the vocal organs freely
Both are liquid phonemes.
◦ Liquids produce only a partial closure in the mouth,
resulting in a resonant, vowel-like consonant.
◦ The tongue approaches a point of articulation
within the mouth, but it does not obstruct the flow
of air through the oral cavity.
Contrast liquids /l/ and /r/: lake, rake
with stops /k/ and /p/: cup
with obstruent /ʧ/: chair
The sound /l/ is a lateral consonant.
◦ The outward flow of air goes around the tongue
toward the sides of the mouth before it exits
through the lips.
In English, /l/ and /r/ may be syllabic, acting
like a vowel
◦ the second syllables of table and father
/l/ and /r/ are mostly non-syllabic, acting
like a consonant at the beginning
◦ rock, lock
Bring tip of tongue to alveolar (gum) ridge,
and let voiced breath travel over relaxed left
and right sides of tongue.
◦ Postvocalic: all, fell, cold
◦ Intervocalic: alive, belong, yellow
◦ Prevocalic: let, lie, look
◦ Clear/Light L: prevocalic: lay, slay, play
◦ Dark L /ɫ/: Raise back of tongue toward velum
(soft palate), at back of roof of mouth. Insert
short schwa-like vowel before the dark L in AmE.
tile, tail, tell
Do not round the lips for words ending in the
sound /l/ (Vietnamese, southern Chinese).
◦ Feel–few, dill–dew, mail–mayo
◦ Use a mirror
◦ Hold the lips with fingers
The letter ‘r’ is used in many written
languages, but it represents very different
◦ Vietnamese ‘r’: letter name e-rờ; rờ /ɛ˧ɹəː˧˩, ʐəː˧˩/
◦ Mandarin /ɻ/ (retroflex approximant): 日 rì, 人 rén
◦ Cantonese has no r-like sound
◦ Many other dialects
American English /ɹ/: ray, row
spelled as jih and jen in the
Anchor both L & R sides of tongue against
upper side teeth, round lips slightly, slowly
curl tongue up near but not touching gum
ridge, and let voiced breath travel over
◦ Postvocalic: air, car, more
◦ Intervocalic: array, arise, erase
◦ Prevocalic: ray, ride, rock
Essential point: Do not allow tip of tongue
to touch the palate, gum ridge, or teeth;
contact causes stoppage or friction.
English has many more final consonant
possibilities than Chinese or Vietnamese.
◦ 0 final consonant: see, my, shoe
◦ 1 final C: dog, cuff, smile /g, f, l/
◦ 2 final C: dogs, cuffs, smiled /gz, fs, ld/
◦ 3 final C: pants, curves, thanks /nts, rvz, ŋks/
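The counts above can be checked mechanically. What follows is only a minimal sketch, not part of the original handout: it assumes broad, one-symbol-per-sound transcriptions (supplied here for illustration) and a small hypothetical vowel set, and it counts the consonant symbols that follow the last vowel symbol. Following the slide, /r/ in curves is treated as a consonant.

```python
# Vowel symbols used in the illustrative transcriptions below.
# This is an assumption for the sketch, not a full inventory of English vowels.
VOWELS = set("iɪeɛæɑɔoʊuʌəa")


def final_consonant_count(transcription: str) -> int:
    """Count the consonant symbols that follow the last vowel symbol."""
    count = 0
    for symbol in reversed(transcription):
        if symbol in VOWELS:
            break
        count += 1
    return count


# Broad, hypothetical transcriptions of the example words on the slide.
EXAMPLES = {
    "see": "si", "my": "maɪ", "shoe": "ʃu",
    "dog": "dɔg", "cuff": "kʌf", "smile": "smaɪl",
    "dogs": "dɔgz", "cuffs": "kʌfs", "smiled": "smaɪld",
    "pants": "pænts", "curves": "kərvz", "thanks": "θæŋks",
}

for word, ipa in EXAMPLES.items():
    print(f"{word:7} /{ipa}/ -> {final_consonant_count(ipa)} final consonant(s)")
```

Because the sketch counts symbols, an affricate written with two characters (for example "tʃ") would be counted as two consonants; writing it with a single symbol such as "ʧ" avoids that.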
Many combinations are not in students’
primary language inventory and will take
instruction, practice (observation +
English tends to be explosive: air travels
outward. Strong aspiration occurs on
consonants beginning stressed syllables
◦ pay, come, tend
◦ repay, become, attend
Vietnamese tends to be implosive, with lip
closure or rounding at the ends of stop
◦ English hawk vs. Vietnamese học (study, learn)
Phonemic stops (plosives) in English
◦ Labial: /p/, /b/
◦ Alveolar: /t/, /d/
◦ Velar: /k/, /g/
◦ Glottal stop is non-phonemic in English
Uh-oh, kitten, bottle (BrE), Batman
Glottal stops occur in Vietnamese, Cantonese,
and other Chinese dialects
◦ They are not written down –– not in orthography
◦ Belong to 2 lowest VN tones, 2 lowest Cant tones
◦ Final stops are pre-glottalized
◦ 特別 (special) dak6 bit6 /daʔk6 biʔt6/
◦ 北角 (North Point)
◦ 食得 (edible, can eat)
◦ 合作 (cooperate)
◦ Mỹ (America)
◦ Nguyễn (a common surname)
◦ học tập (study)
◦ rất đẹp (very pretty)
◦ Conceptual: description
◦ Auditory sensation:
◦ Visual sensation: eye gaze on throat
◦ Tactile sensation: fingers
◦ Pronunciation Workouts
◦ Breathing and vibration
Did‿anyone call while‿I
Is the news good‿or‿bad?
When can we bring‿him
Yes. Dad’s doctor
Very positive. He’s
much better now.
The red curved lines show consonant to vowel linking. Note that we
delete /h/ in ‘he, him, his, her’ in the middle of a sentence.
Tone. Chinese and Vietnamese are tone
languages. Each word has its own tone.
◦ English: These pronunciations signify different
words: pan, fan, van, ban. (consonants)
◦ Chinese: These pronunciations signify different
words: mā, má, mǎ, mà (tones)
(mother, hemp, horse, scold)
Intonation: the rise and fall of the voice in
speaking – important for meaning in English.
Did_anyone call while_I
Is the news good_or bad?
When can we bring_him
Yes. Dad’s doctor
Very positive. He’s
much better now.
The arrows show intonation, the direction of pitch change.
I’ve spent decades teaching English to speakers of Vietnamese and Chinese dialects (as well as other languages). I’ve taken classes in Vietnamese, Cantonese, and Mandarin, and I’ve researched linguists’ descriptions of these Asian languages. Today I’m going to share with you a small bit of what I know that will help you, as ESL teachers, approach teaching English pronunciation to Chinese and Vietnamese speakers.
Pronunciation Doctor is one of my personas. I have a YouTube channel with over 2000 free video clips for teaching and learning English. I’ve curated them into over 25 different playlists. Most of them center on pronunciation and oral communication, but others focus on vocabulary and grammar. I welcome you and your students to access the learning videos at
Sometimes I get questions from teachers in an online format, sometimes from attendees at my face-to-face workshops, sometimes from teachers not trained in teaching pronunciation but wishing to help nonnative speakers communicate more clearly, and, of course, many from ESL students. Here’s a common question about learners from many language backgrounds.
Go to my blog to see a response to this question.
This native English speaker’s question shows the frustration that comes from not understanding speech production and habit, or the incremental steps needed to alleviate the problem.
This person asks a more moderate question. It recognizes the goal of helping nonnative speakers (NNS) speak in a listener-friendly way.
In today’s short session, I may not be able to talk about every topic, but I invite you to my blog marshaprofdev.blogspot.com to read my detailed responses to some of these and other questions. Let me give a shout-out to Ana Wu, ESL instructor at City College of San Francisco, who sought questions from her colleagues at the various CCSF campuses.
The mirror is a basic tool for raising pronunciation awareness and improving production. I systematically train students in how to use (and how not to use) a mirror for pronunciation.
Here’s a sagittal view of the speech organs. Are you familiar with this type of illustration? Teachers of pronunciation should become familiar with the parts of the speaking apparatus: what they’re called, what they do, and how they interact to affect speech production. Can we get this picture to move?
Aha! Now, thanks to magnetic resonance imaging (MRI), we can see the speech organs in motion.
The tongue, besides being important for tasting and moving food as we chew, is the essential articulator of speech. In fact, we use this part of the body to refer to language: mother tongue, native tongue. The tongue is an amazing and powerful set of muscles. It is able to move in nearly every direction, and it can expand and compress to change size and shape. It is reported that it takes around 100 different muscles, in collaboration with one another, to create speech. These include muscles in the chest, neck, jaw, tongue and lips.
See how much the tongue occupies the mouth and how it moves during speech?!
The first time I saw a film of the movements of the tongue of a person talking (when I was in graduate school at Stanford, back in the Dark Ages), I was mesmerized by the size, myriad shapes, and rapid movements of the tongue. It seemed quite magical! Nowadays, we can find video recordings using Magnetic Resonance Imaging (MRI) and ultrasound technology on the web.
This soundless video clip is from the Max Planck Institute for Biophysical Chemistry, Göttingen
Live video of movements during speech production (MRI at 20 ms.)
Can you discern where the tongue is each time the speaker pronounces an “l” or “r” sound?
As we teach our students to pronounce English as an additional language, especially adult learners with long-ingrained habits, we need to be aware that speaking requires a myriad of interconnected physical and mental actions, that the tongue by itself is a complex collection of muscles, and that movements become automatic over time and through continual practice.
Pronunciation workouts are physical warm-up exercises for the mouth: the jaw, lips, tongue, and lungs. Spending some time on these preparatory exercises goes a long way toward helping learners pronounce English, or any language, more clearly. I won’t take time on these today, but let me refer you to my blog post Pronunciation Workouts for Improved Oral Production and to the Pronunciation Workout playlist on my YouTube channel. You’ll find links to video resources for various types of exercises for you and your students.
Both are sonorants.
The sounds reverberate off the vocal organs freely without obstruction.
Both are liquid phonemes.
Liquids produce only a partial closure in the mouth, which results in a resonant, vowel-like consonant.
The tongue approaches a point of articulation within the mouth, but it does not close off, or obstruct, the flow of air out from the lungs through the oral cavity.
Contrast the liquid quality of /l/ and /r/ with the stop sounds /k/ and /p/ in the word cup, and with the obstruent sound /ʧ/ in chair, where there is an obstruction of the airflow.
The sound /l/ is a lateral consonant. This means that the outward flow of air goes around the tongue toward the sides of the mouth before it exits through the lips. In English, /l/ and /r/ may be syllabic, acting like a vowel, as in the second syllables of table and father. They are mostly non-syllabic, acting like a consonant, as at the beginning of rock and lock.
To produce the sound /l/, bring the tip of the tongue to the alveolar (gum) ridge, and let a voiced breath travel over the relaxed left and right sides of your tongue. Southern Chinese dialects do not necessarily differentiate /l/ and /n/, which in English are separate phonemes, so their speakers may have difficulty distinguishing words like low and no. When teaching /l/, press the tip of the tongue against the gum ridge firmly and emphasize that the air comes out of the oral, not the nasal, cavity. If you have a copy of my book and video Phrase by Phrase, look in Chapter 10 on page 103 and notice that the presentation starts with words like all, fell, and cold. The /l/ in these words occurs after a vowel sound. That means you can show the inside of your mouth as you voice the vowel /ɔ/ and then move your tongue into position for the /l/, pressing it against the gum ridge and continuing the vibration in your vocal cords to pronounce all. The next set of words contains /l/ before a vowel sound, such as let, alive, and play. To model let, begin by showing students the tip of your tongue touching the gum ridge for /l/, then vibrate the vocal cords and lower the tip of the tongue into position for /ɛ/, and finally devoice and stop the sound with the tip of the tongue on the gum ridge for /t/. Follow the same scaffolding: listening and looking, (listening and marking), listening and repeating, speaking independently.
In English, the sound /l/ has two allophones. The prevocalic one is known as the “clear L” or “light L”; it is articulated at the beginning of syllables. The post-vocalic one is known as the “dark L” and can be represented by another phonetic symbol, /ɫ/, though this is not usually used in materials for ESL/EFL students. They both require placement of the tip of the tongue on the alveolar ridge and air flowing out over the sides. For the post-vocalic dark L, the back of the tongue is raised toward the velum, or soft palate, at the back of the roof of the mouth. A short schwa-like vowel is typically inserted before the dark L in American English, more so in some dialects and by some speakers than others.
For Vietnamese speakers, it’s important to distinguish movement of the tongue vs. movement of the lips.
Using a mirror may help if the student is able to translate a verbal explanation into physical action.
For learners who can’t control their lips from a verbal explanation alone, have them hold their lips apart with their fingers, like this, when they pronounce all, tell, dill, mail.
Doing Pronunciation Workouts before lessons on specific pronunciation points will help.
One important concept to repeat when teaching pronunciation, especially in English, is that the written and spoken languages are different. Especially for visual book-learners, it’s often useful to pull students away from the alphabet and focus their attention on listening, hearing, seeing, discriminating, and physically producing new sounds.
The sound /r/ is very tricky. One reason is that the letter ‘r’ is used in many written languages but articulated orally in totally different ways, so if students are learning English while assisted by or exposed to the written medium, they can be mightily misled into pronouncing the American /r/ as it is pronounced in their native language. The Vietnamese alphabet uses the letter ‘r’, but it represents a sound unlike English /r/.
As for speakers of Chinese, there are many spoken Chinese dialects: in fact, five to seven main language groups, which are mutually unintelligible and of which Mandarin is only one. Most of them do not have a sound like the American 'r' as in rare. In Mainland China, the modern transcription system called Pinyin uses Roman letters to denote phonemes of Putonghua, or Standard Chinese (Mandarin), and uses the letter ‘r’ to represent a voiced retroflex phoneme, usually transcribed as the approximant /ɻ/ (or, in some analyses, the fricative /ʐ/), that differs from American /r/ in initial position, for example 日 rì (sun) and 人 rén (person). It should be emphasized that the ‘r’ is not articulated the same way throughout China and other Chinese-speaking societies. Some word endings are rhotacized by speakers from certain parts of northern China, such as Beijing, who will find it easy to pronounce English words like car and more. However, your students from other locations in China, whose languages don't have ‘r’, will need more explicit instruction and practice with the American /r/.
˧ (IPA) A symbol representing a mid tone
˩ (IPA) A symbol representing an extra-low (bottom) tone
I suggest starting with the /r/ in post-vocalic position, as for /l/, because you can show the inside of your mouth as you pronounce the vowel and then move your tongue into position for the /r/. In Chapter 10 of my book Phrase by Phrase, you’ll see that the presentation starts with words like air, more, and hurry. To model air, start by voicing the vowel /ɛ/. Press the sides of your tongue against the upper side teeth and round your lips slightly as you slowly curl your tongue up near, but not touching, the gum ridge to pronounce the /r/. You might add a slight schwa /ə/ sound before curling the tongue. Next, present words in which a vowel sound precedes /r/ at the beginning of a stressed syllable: array, arise, erase. This scaffolding provides a transition that will make it easier to pronounce /r/ at the beginning of a word: ray, ride, rock. An essential point: Do not allow the tip of your tongue to touch the palate, the gum ridge, or the teeth; otherwise it’ll create a stop or a fricative. And remember to have students use a mirror.
I want to talk a little about airstream. Do any of you speak or understand Spanish?
Yo quiero preparar un pedazo de papel.
I want to prepare a piece of paper.
You can hear (and see) the difference in aspiration when I pronounce the /p/ at the beginning of a stressed syllable in English. It almost seems like we are spitting. Learners from some languages consider English to be a very “spitty” language! English is explosive; the air goes outward.
A stop sound is one where the articulators stop the air from exiting the mouth. English recognizes six stop sounds that are phonemes, in three voiceless-voiced pairs: /p-b, t-d, k-g/. English speakers use glottal stops non-phonemically: we say ʔuh-ʔoh, with a glottal stop at the beginning of each syllable, and kitten with a glottal stop between the ‘t’ and the ‘n’. Kit-ten and kit’n are recognized as the same word. They aren’t phonemically different like rock/lock. Bot-tle, bodl, and boʔl are recognized as the same word. The phonetic symbol for a glottal stop looks like a question mark without the dot at the bottom. It is not part of the alphabet in English, Chinese, or Vietnamese. Therefore, hardly anyone besides linguists knows about it.
Vietnamese has six tones. Tone 5, ngã 'tumbling', and tone 6, nặng 'heavy', are glottalized. In addition, final stops are pre-glottalized. That means that the speaker’s glottis closes, say, before the tongue touches the gum ridge for /t/ in “sit”.
Recent research posits that glottal stops occur before vowels in Vietnamese.
Cantonese has 6 or 9 tones (depending on the linguistic analysis); the lowest tones are also glottalized, and final /t, p, k/ are pre-glottalized.
Listen to these examples. (read)
Interference from the native language can make English sound choppy, staccato, unfinished. Some English speakers say “They’re swallowing the endings of the words.” Swallow ~ inward airstream, implosion.
So when an adult Vietnamese or Chinese speaker attempts to say “cook”, the glottis may close before the final consonant, through long years of habit and without the speaker being aware of it, so that /kʊk/ comes out as /kʊʔk/.
4 for nặng, the falling tone with glottalization,
(5 for hỏi, the falling-rising tone,)
6 for ngã, the falling-rising tone with glottalization
“The identity of non-identified sounds: glottal stop, prevocalic /w/ and triphthongs in Vietnamese” by Andrea Hoa Pham, University of Florida
The phonemic status of the glottal stop has been an issue in
many languages of North America and Asia. In Vietnamese,
the glottal stop has a distributional restriction in that it occurs
only syllable-initially. Whether researchers include it in the
initial consonantal inventory varies, and its status has received
no systematic analysis.
How can we help learners avoid glottal stops?
First: awareness. People generally are not aware of features of their own language; they don’t study it, they simply acquire it, learning it by “osmosis” or long-term exposure. I’ve also learned to speak French and Spanish, and whether the students speak Spanish, Chinese, or Vietnamese, they are often surprised when I describe characteristics of their language–they know how to speak the language, but they cannot explain what’s going on phonologically (or lexically or grammatically). If you spend some time learning about the language of your students–not so that you can speak or write it, but so that you can understand the habits they bring with them–then you can share with them features that are important to what you are teaching them in English.
To help students become aware that they speak with glottal stops, I have them look in the mirror to see, and place their fingers on their throat to feel, the glottis closing and opening. For those who are less body aware, I permit them to place their hand on my throat as I demonstrate.
I recommend doing training in breathing and continual vibration along with building sensation awareness to overcome unwanted glottal stops.
If you access my Pronunciation Workout playlist on YouTube, you’ll find a lot of videos that deal with breathing and becoming aware of how the actions of our speech apparatus affect the way we pronounce.
Once we’ve established better airstream direction and glottis control, we can enhance conversation by linking words together in phrases. This dialog shows numerous places where fluent speakers link ending consonants to beginning vowels: C-V linking. Linking is helpful in making English sound more listener-friendly.
The red curved lines show consonant to vowel linking. Move the consonant to the word beginning with the vowel.
“Di-danyone…whi-li wazout? Goo-der bad, bri-ngem home, thi-safternoon, three’r’four.”
Note that we delete /h/ in ‘he, him, his, her’ (unstressed function words) in the middle of a sentence.
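For teachers who prepare marked-up dialogs, the C-V linking rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a tool from the presentation: the words must already be transcribed by sound (spelling is unreliable, since "while" ends in a silent letter), the vowel set is a small hypothetical one, and the sample sentence is reconstructed from the respelling "Di-danyone…whi-li wazout". The sketch does not handle /h/ deletion, so "bring him" would need to be entered without the /h/.

```python
# Vowel symbols assumed for this sketch; glides such as /w/ and /j/ are
# deliberately left out, so they count as consonants.
VOWELS = set("iɪeɛæɑɔoʊuʌəa")


def find_cv_links(transcribed_words):
    """Return (word, next_word) pairs where a final consonant meets an initial vowel.

    Assumes every item is a non-empty, sound-based transcription.
    """
    links = []
    for current, following in zip(transcribed_words, transcribed_words[1:]):
        ends_in_consonant = current[-1] not in VOWELS
        starts_with_vowel = following[0] in VOWELS
        if ends_in_consonant and starts_with_vowel:
            links.append((current, following))
    return links


# "Did anyone call while I was out?" in a broad, hypothetical transcription.
sentence = ["dɪd", "ɛniwʌn", "kɔl", "waɪl", "aɪ", "wʌz", "aʊt"]

for current, following in find_cv_links(sentence):
    print(f"{current}‿{following}")
```

The three pairs it reports for this sentence correspond to the links Did‿anyone, while‿I, and was‿out.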
Chinese and Vietnamese are tone languages. Each word has its own tone or pitch change. In English these are different words: pan, fan, van, ban. In Chinese, these are different words: mā, má, mǎ, mà. In English, the pronunciations mān, mán, mǎn, màn all refer to the same word. You’re a good man, Charlie Brown!
In English a lot of the meaning of an utterance depends on the pitch change, or intonation, of phrases, that is, groups of words. You're happy. You're happy? YOU’RE happy?
This dialog incorporates several common intonation patterns in English:
Yes-no question: rising (AmE)
Choice: rising on 1st choice… falling on 2nd choice