Speech Rhythm in World Englishes: The Case of Hong Kong
University of Reading
This study investigated syllable duration as a measure of speech rhythm
in the English spoken by Hong Kong Cantonese speakers. A computer
dataset of Hong Kong English speech data amounting to 4,404 sylla-
bles was used. Measurements of syllable duration were taken, investi-
gated statistically, and then compared with measurements of 1,847
syllables from an existing corpus of British English speakers. It was
found that, although some similarities existed, the Hong Kong English
speakers showed smaller differences in the relative syllable duration of
tonic, stressed, unstressed, and weakened syllables than the British
English speakers. This result is discussed with regard to potential intel-
ligibility problems, features of possible language transfer from
Cantonese to English with respect to speech rhythm, and implications
for language teaching professionals.
In considering nonnative patterns of English speech, two paths are
generally pursued: segmental and suprasegmental. This article focuses
on the suprasegmental features of language. Speech rhythm is a
suprasegmental aspect of pronunciation, those aspects which describe
and address features larger than individual speech sounds. English
speech rhythm in older native varieties like British and American
English is often described as stress timed, which, in basic terms, means
that the start of each stressed syllable is said to be equidistant in time
from the start of the next stressed syllable. This kind of rhythm is in
contrast to syllable-timed languages (e.g., French, Spanish, Cantonese),
in which the start of each syllable is said to be equidistant in time from
the start of the next. Instrumental studies have, in fact, shown that very
little difference can be found between languages thought of as typically
stress timed and typically syllable timed (Roach, 1982; Dauer, 1983),
and Cauldwell (2002) describes English as irrhythmical.
Whether or not these descriptions stand up under instrumental scrutiny,
they do seem to have some psychological importance for speakers of
the languages so described. English spoken with a syllable-timed rhythm
TESOL QUARTERLY Vol. 40, No. 4, December 2006 763
can be difﬁcult for speakers of stress-timed accents to understand
(Anderson-Hsieh & Venkatagiri, 1994). Tajima, Port, & Dalby (1997)
demonstrated that, when a Mandarin Chinese or Taiwanese speaker’s
speech was manipulated to match the syllable timing of a native
American English speaker, and vice versa, the Chinese speaker’s speech
improved in intelligibility by up to 25%, and the American English
speaker’s speech worsened in intelligibility by up to 25%, showing that
use of more native-like patterns considerably improves intelligibility
among native speakers of stress-timed varieties of English.
This result indicates that the acquisition of stress-timed English
speech rhythm by nonnative speakers is important in some contexts,
for example, in those where a nonnative speaker may be interacting
with a native speaker of an older, stress-timed variety such as British
or American English. Adams (1979) suggests that a learner’s failure to
use appropriate syllable timing when producing utterances in English,
instead producing “an anomalous rhythm which seriously impairs the
total intelligibility of their utterance” (p. 87), results in communicative
failure, and both parties to the act of communication will be at a loss
to explain what has happened and what was intended. This matter has
not eluded researchers, materials writers, and teachers (see, e.g.,
Anderson-Hsieh, Johnson, & Koehler, 1992; Anderson-Hsieh &
Venkatagiri, 1994; Chela-Flores, 1998; Gilbert, 1984; Taylor, 1981;
Wong, 1987), but it seems that speech rhythm and other suprasegmen-
tal features of speech are not the easiest for teachers or learners to
tackle. Indeed, rhythm is considered by some to be the single most
difﬁcult feature of English for nonnative speakers to learn (Taylor,
It should be noted that this article assumes interactions between
native speakers of stress-timed varieties of English and nonnative speak-
ers of English. Jenkins (2000), for example, considering nonnative
speaker interactions, does not include speech rhythm in her lingua
franca core, although she does agree that, based on evidence from
her own research, it appears to be crucial to lengthen stressed and
tonic syllables to improve intelligibility in English.
This study was based largely on the suggestions arising from Dauer
(1983). Although Dauer shows by examination of interstress intervals
in several languages that there is no instrumental evidence for the
stress-timed/syllable-timed dichotomy in speech production, she admits
that a so-called syllable-timed language like Spanish and a stress-timed
language like English do sound different rhythmically. She looks to
other features for an explanation, considering syllable structure, vowel
reduction, and stress/accent.
Concerning syllable structure, Dauer (1983) ﬁnds that stress-timed
languages tend to have a greater variety of syllable types. In addition,
open syllables such as consonant-vowel or CV syllables are found to
predominate in Spanish and French, whereas English has much more
variation among different syllable types. Dauer also ﬁnds that “there
is a strong tendency for ‘heavy’ syllables … to be stressed and ‘light’
syllables … to be unstressed” in stress-timed languages (p. 55). Heavy
syllables are determined according to what happens at the end of a
syllable, and usually contain consonants in coda position, although
those containing a long vowel or diphthong may also be analysed as
heavy. English is certainly a language which allows heavy syllables, with
up to three consonants at the beginning of a syllable and four in
syllable-ﬁnal position, whereas Cantonese, the ﬁrst language (L1) of the
speakers in this article, maximally permits CVC, with the ﬁnal conso-
nant being restricted to either an unreleased voiceless bilabial, alveolar
or velar stop [p t k], or a nasal consonant, one of [m n ŋ]. Based on
Dauer’s suggestions, it might be predicted that Cantonese is less likely
to be a stress-timed language than English. Dauer also notes that in
Arabic and Thai, considered to be stress-timed languages, stressed
syllables are more likely to be heavy.
It is not only the structure of the syllable, but also its composition
which has a bearing on stress. Dauer (1983) claims that 92% of the
unstressed CV syllables in the English text she analysed were made up
of a consonant plus a weak vowel, a type which tends to be inherently
short, whereas the stressed CV syllables contained strong vowels, which
tend to be longer.
Turning to vowel reduction, Dauer (1983) claims that stress-timed
languages often have a “separate and more restricted set of vowels to
choose from in unstressed syllables” (p. 57), whereas syllable-timed
languages tend not to have reduced vowel variants in unstressed sylla-
bles, but rather reduction results in the elimination of whole syllables.
For example, weak syllables in English contain /ə ɪ ʊ/ or a syllabic
consonant, with the actual number of syllables in a word preserved
(unless in a contracted form, like I’m for I am); in Spanish “a sequence
of adjacent vowels often becomes reduced to a single vowel or is pro-
nounced as a single syllable” (p. 57). Cantonese has an extremely
restricted number of instances where syllable weakening is possible
(Bauer & Benedict, 1997).
Finally, Dauer (1983) examines stress, claiming that, whereas stress-
timed languages tend to have stress at the lexical or word level, syllable-
timed languages usually either have no lexical stress, or, where it does
exist, realise accent by pitch contour variation. Cantonese is a tone
language, in which each syllable has a specific pitch contour assigned to it.
In conclusion, Dauer (1983) asks whether we are justiﬁed in using
the terms stress timed and syllable timed at all, if it is the case that syllable
structure, vowel reduction, and word stress, rather than aspects of timing,
make a language nearer to one or the other category. Preferring the
term stress-based, as used by both Allen (1975) and O’Connor (1973),
she suggests, as did Roach (1982), a continuum on which languages
may be placed depending on how stress based their rhythm is, with
Japanese as the least stress based and English the most (Dauer, 1983,
So, although instrumental studies have proven either dismissive or,
at best, inconclusive about the physical existence of stress timing and
syllable timing, even those undertaking the instrumental studies men-
tioned earlier admit that the languages under discussion sound either
stress timed or syllable timed, enough so to be able to suggest a con-
tinuum on which these languages can be placed. Therefore, the labels
stress timed and syllable timed are used throughout this study. In addition,
Dauer (1983) makes a good case for there being factors other than
differences in interstress intervals, or the lack thereof, that make lan-
guages sound more or less stress based; these factors are syllable struc-
ture, vowel reduction, and word stress or accent.
The difﬁculty experienced by nonnative speakers of English from
language backgrounds which have different rhythmical types in acquir-
ing stress-timed English speech rhythm has implications for intelligibil-
ity, as demonstrated in investigations of Englishes similar to that spoken
in Hong Kong. Low, Grabe, and Nolan (2000) studied the temporal
features of Singapore English, a Southeast Asian English which has
been recognised as having native speakers. Using the pairwise variabil-
ity index (PVI), which they developed, they compared vowel quality
and vowel duration with that of British English. They demonstrated
that Singapore English speakers do not reduce vowels in weak syllables
to the same extent that British English speakers do. This practice can
be expected to contribute to the rhythmic differences between
Singapore English and British English, the implication being that
Singapore English will be difficult for speakers of British English to
understand.
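The idea behind a pairwise variability index can be sketched in a few lines of Python. This is a minimal illustration of one commonly cited normalised formulation, in which the absolute difference between each pair of successive durations is divided by the mean of that pair; the exact variant used by Low, Grabe, and Nolan is not given in this article, and the function name and the durations below are invented for illustration.

```python
def normalised_pvi(durations):
    """Normalised pairwise variability index for successive durations (ms)."""
    if len(durations) < 2:
        raise ValueError("need at least two durations")
    pairs = zip(durations, durations[1:])
    # Each term compares neighbouring intervals relative to their mean.
    terms = [abs(a - b) / ((a + b) / 2) for a, b in pairs]
    return 100 * sum(terms) / len(terms)

# Hypothetical vowel durations in ms, invented for illustration only.
alternating = [60, 180, 60, 180]   # strong long/short alternation
even = [120, 120, 120, 120]        # no variation between neighbours

print(normalised_pvi(alternating))
print(normalised_pvi(even))
```

On such a measure, speech in which reduced and full vowels alternate scores high, and speech with little vowel reduction scores low.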
ENGLISH IN HONG KONG
English in Hong Kong is described by Li (1999) as a “value added”
language (p. 97), meaning that being able to communicate effectively
in English is perceived by the speaker as having socioeconomic advan-
tages. Because of the economic and business environment in Hong
Kong, speakers of Hong Kong English may be interacting with other
speakers whose English could be classiﬁed as having stress-timed
rhythm. This being the case, for Hong Kong English speakers, speech
rhythm is certainly a feature of English pronunciation worthy of study.
Simply by listening to Hong Kong English, it is clear that the speech
rhythm is very different from that of varieties with a stress-timed rhythm.
AIMS OF THE STUDY
This study aims to investigate speech rhythm among speakers of
Hong Kong English. Syllable duration was selected for investigation
because, in combination with pitch, loudness, and vowel quality, it is
an important factor in determining syllable stress in English and must
therefore contribute to its perceived rhythmical properties. An addi-
tional reason to study syllable duration is that it is thought to be a
highly learnable and teachable feature of word and rhythmic stress
(see, e.g., Gilbert, 1984; Chela-Flores, 1994, 1998; Halliday, 1989). This
study focuses on weakened, unstressed, stressed, and tonic syllables.
Because this is a study of English as a second language, transfer effects
from the learners' first language, Cantonese, may contribute to the
perceived rhythm of Hong Kong English, in particular through fewer
instances of weakened syllables in the Hong Kong English data. The
hypothesis was that the rhythm of Hong Kong English differs from that
of British English because Hong Kong English has smaller differences
in the relative durations of weakened, unstressed, stressed, and tonic
syllables.
The use of the term Hong Kong English does not attribute any special
status to this variety as an official new variety of English. As far as I
am aware, and certainly at the time of undertaking this study, there
are no native speakers of Hong Kong English, as there are of Singapore
or Indian English. Hong Kongers do not speak English with each
other outside of contrived situations, such as classes at tertiary-level
educational establishments and conversations, including business deal-
ings, where someone is present who is not a speaker of Cantonese but
is a speaker of English.
The relative differences in duration between weakened, unstressed,
stressed, and tonic syllables were measured to test the hypothesis that
the rhythm of Hong Kong English differs from that of British English
because Hong Kong English has smaller differences in the relative
durations of each type of syllable.
The hierarchy of syllable stressing (weakened, unstressed, stressed,
and tonic) was derived and developed for this research from studies
such as Bolinger (1965) and Klatt (1975), which indicate that stressed
(including tonic) syllables are longer than unstressed (including weak-
ened) syllables in spoken discourse, and teaching materials such as
Gilbert (1984) and Chela-Flores (1998), which support such an
approach. In the sentence The book I bought had a blue front, assuming
no prior context and the main stress falling on the last word, with
rhythmic beats occurring on book, bought, blue and front, front is tonic,
with a falling tone, book, bought, and blue are stressed, I and had are
unstressed, and the and a are weakened. The item had is in fact the
main verb and therefore could be stressed, in which case blue would
probably be unstressed to maintain overall rhythm; had could certainly
not be weakened. It should be noted that any item could be stressed
depending on context. Applying the stress hierarchy, and according to
the ﬁrst rhythmic pattern described, a British English speaker could
be expected to produce the single-syllable words book, bought, blue, and
front with a longer average duration than the single-syllable words the,
I, had, and a, with front being particularly long because it is tonic, and
the and a being particularly short because they are weakened.
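The worked example above can be written down as a small data structure. The sketch below is only an illustration: the numeric codes 1 to 4 follow the category codes used later in the analysis, while the names `STRESS_LEVELS`, `EXAMPLE`, and `expected_order` are hypothetical. It encodes the first rhythmic pattern described and orders the words by the duration category the hierarchy predicts, from shortest to longest.

```python
# Stress categories and their numeric codes (1 = weakened ... 4 = tonic).
STRESS_LEVELS = {"weakened": 1, "unstressed": 2, "stressed": 3, "tonic": 4}

# The example sentence under the first rhythmic pattern described.
EXAMPLE = [
    ("The", "weakened"), ("book", "stressed"), ("I", "unstressed"),
    ("bought", "stressed"), ("had", "unstressed"), ("a", "weakened"),
    ("blue", "stressed"), ("front", "tonic"),
]

def expected_order(labelled_words):
    """Sort words from the shortest to the longest expected category."""
    return sorted(labelled_words, key=lambda pair: STRESS_LEVELS[pair[1]])

ranked = expected_order(EXAMPLE)
print([word for word, _ in ranked])
```

The ordering holds between categories on average; it makes no claim about the relative duration of two words within the same category.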
The Hong Kong English Data
Data from 20 Hong Kong Cantonese speakers of English were used in
this study. Participants were all students in their third and ﬁnal year of
study at the Hong Kong Polytechnic University at the time of data col-
lection. Recordings were made over a 3-year period, from 1996 to 1999.
The 10 female and 10 male students from whom the data were col-
lected fall roughly into two groups: those studying for language degrees,
and those studying nonlanguage subjects. The students following lan-
guage degrees at the Hong Kong Polytechnic University are assessed
in English language skills as part of their degree, whereas, at the time
of data collection, those following nonlanguage programmes had to
take classes in English but were not required to pass English to be
awarded a degree. The students whose speech was analysed for this
study were from three different departments of the university: Chinese
and Bilingual Studies, specifically the Bachelor of Arts (Honors) in
Language and Communication; Building and Real Estate, specifically
the Bachelor of Science (Honors) in Building Surveying; and Building
Services Engineering, specifically the Bachelor of Engineering (Honors)
in Building Services Engineering.
The Hong Kong Polytechnic University is an English medium insti-
tution, which means that, with the exception of students studying
another language, all classes should take place in English. In reality,
a good deal of tuition takes place in Cantonese. This is especially so
in the case of nonlanguage subjects.
Because the study focuses on the rhythm of Hong Kong English,
research based on word lists, which are possibly the most convenient
method for collecting large amounts of data, is inappropriate. Instead,
cassette tape recordings were made of students giving presentations in
class. This method has the advantage of providing a dataset that com-
prises a large amount of monologue from a number of different speakers.
It is for the latter reason that conversational data were not considered
for this study; although potentially the most natural kind of speech, it
was felt that it might not have yielded a suitably large quantity of con-
nected speech from one speaker and would certainly have involved
interruptions and overlap from other speakers. Also, the participants
would not normally speak English to each other, and so any spoken
English data collected at all is bound to be contrived to some extent.
One criticism of using data generated from class presentations is that
the delivery might be stilted, or less than natural, because of the
scripted nature of the task. However, being students in their third year
of study, the participants were all skilled in-class presenters and in the
main did not require strong adherence to a script.
Cue cards were used by students during their presentations as an
aide-mémoire, and students also used overhead transparencies. Because
it was an assessed task, students may well have rehearsed their presen-
tations. In addition, most students were either presenting on their
ﬁnal year projects—material with which they are more than familiar—
or on a passion or hobby of theirs. Therefore, the data used in this
study can be considered to give a reasonably accurate representation
of the features of English connected speech of all participants.
The topics covered in the data are presented in Table 1; each par-
ticipant is labelled f for female or m for male. A purely subjective score
of how stress timed or syllable timed the speaker sounds based on my
expert opinion as a phonetician is given in the column marked Rhythm;
a rating of 1 means a speaker sounds stress timed and a rating of 5
that the speaker sounds syllable timed. I wish to emphasise that this
score is entirely subjective.
Participants were tape-recorded using a personal stereo cassette
recorder (Sony Walkman™ model WM-R707) with a lapel microphone
clipped on to either a lapel or the collar of their clothing. The partici-
pants were fully aware that they were being recorded and had given their
permission for the recordings to be used as data for study purposes.

TABLE 1
List of Participants’ Presentation Topics and Author’s Impression of Rhythmic Type

Female speaker files               Rhythm 1–5   Male speaker files                 Rhythm 1–5
f01: To be a good manager          2            m01: Property & housing market     3
f02: … communication               3            m02: Ceramic tiles                 4
f03: Personal space                4            m03: Safety (demolition)           3
f04: Interview follow-up           5            m04: Site supervisor motivation    4
f05: Advertisements                3            m05: Interest risk                 5
f06: Wording of advertisements     4            m06: Pollution problems            5
f07: Nonverbal behaviour           3            m07: Job satisfaction              4
f08: AIDS                          3            m08: Bamboo scaffolding            3
f09: SCMP versus People’s Daily    4            m09: Industrial accidents          5
f10: Goal setting                  3            m10: Poling contractors            4

The speech collected was analysed by converting the recordings to
a machine-readable sound signal and measuring the duration of syllables
using specialist computer software on a PC platform. Speech from the
cassette recordings was sampled at a rate of 16,000 samples per second
(16 kHz, 16 bit mono PCM), and then labelled on computer by the
author. The computer software used to label data in this study is
Speech Filing System (SFS; for the latest edition, see Phonetics &
Linguistics, UCL, 2004), developed for research purposes at the
Department of Phonetics and Linguistics, University College London.
With the SFS software, speech data may be labelled in a number of
ways. For the purposes of this study, a broad phonetic segmental tran-
scription was used but included glottal stops, nasalisation, vocalised
/l/, and aspiration, where strong. The software then allows the user
to generate a file which contains information on the duration of each
of the sound segments in samples. This number was converted into
milliseconds (ms) by dividing it by 16 (thus 16,000 samples =
1,000 ms). Calculations of syllable duration were made from that
information; these durations were then analysed and compared with
the SCRIBE data.
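The conversion from sample counts to milliseconds is simple arithmetic and can be sketched as follows. The function and constant names are hypothetical, and the 3,200-sample segment is an invented example; only the 16 kHz sampling rate comes from the description above.

```python
def samples_to_ms(n_samples, sample_rate_hz):
    """Convert a duration in samples to milliseconds."""
    return n_samples * 1000 / sample_rate_hz

SFS_RATE_HZ = 16_000  # sampling rate reported for the Hong Kong recordings

print(samples_to_ms(16_000, SFS_RATE_HZ))  # 1000.0
print(samples_to_ms(3_200, SFS_RATE_HZ))   # 200.0 (hypothetical segment)
```

Passing the sampling rate as a parameter means the same function applies to material digitised at a different rate.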
The British English Data
The British English data used for this study were drawn from the
SCRIBE corpus (see Spencer, 1990). SCRIBE is a corpus of British
English speakers from four main areas of the United Kingdom: the
Southeast (with received pronunciation or a southern standard British
English accent), Glasgow, Leeds, and Birmingham. The aim was to
record and annotate the speech of 30 speakers from each set perform-
ing a number of different spoken tasks, which include reading several
different sets of sentences, reading a passage, and undertaking a map
task to elicit free speech.
In selecting appropriate material for comparison, it was necessary
to decide which speech task performed by the British English speakers
is most closely comparable to the Hong Kong English data. In this
instance, it was decided to use the read passage for comparison. The
passage itself takes little more than 2 minutes to read aloud and is
about the advances in sailing technology since the time of the Vikings
to the present day. This passage, and not the free speech task, was
chosen for comparison because the Hong Kong English speakers, in
giving presentations with the aid of note cards that may have been
rehearsed, are performing a task which is in more ways similar to pas-
sage reading than to free speech.
Five speakers were taken from the SCRIBE material, one female and
four male speakers. All were from the Southeast set. The choice of
speakers was restricted by the availability of comparable transcription
passages because only one female and ﬁve male speakers from this
region were transcribed using a broad phonetic transcription. The pas-
sage is divided into four paragraphs of just over 30 seconds each. To
extract an amount of speech from each of the speakers for compari-
son, approximately one minute of each of the four male speakers was
used, two of the male speakers reading the ﬁrst two paragraphs and
the other two reading the last two paragraphs. In the case of the
female speaker, as there was only one female for whom a broad phonetic
transcription was available, the entire passage was used in this study.
Speech from the SCRIBE corpus was sampled at a rate of 20,000
samples per second (20 kHz) and labelled using suitable speech analy-
sis software. This renders the label ﬁles into a slightly different format
from that of SFS, and so the data were manipulated on computer to
make them comparable. In addition, the segmental durations derived
from sampling at 20 kHz were divided by 20 in order to give a duration
in milliseconds (20,000 samples = 1,000 ms).
In order to calculate the duration of the syllables in the data, it is
ﬁrst necessary to syllabify the data. This was achieved using the maxi-
mal onsets approach adopted in Roach, Hartman, & Setter (2006) for
syllabifying the entries in the seventeenth edition of the English
Pronouncing Dictionary. In its most basic form, maximal onsets means
that, “where possible, syllables should be divided in such a way that as
many consonants as possible are assigned to the beginning of the syl-
lable to the right” (p. xiii), assuming a linear transcription in which
speech is transcribed from left to right.
The rules for syllabification were based on what is permissible in
the citation form of a monosyllabic word in English. In the case of
vowels, long vowels and diphthongs in English were permitted to be
syllable final, but short vowels were not; this is because no monosyllabic
word in RP or southern standard British English ends with
one of the short vowels /ɪ e æ ʌ ɒ/ or /ʊ/. There
are, however, exceptions among short vowels in the case of unstressed
syllables. Schwa is always weak and can therefore occur in syllable-final
position; unstressed /ɪ/ and /ʊ/ also occur in weakened syllables in
English and were therefore afforded the same structural status when
weakened. In this system, photography, for example, is syllabified
/fə.tɒg.rə.fi/, and educate /ed.jʊ.keɪt/.
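The interaction between maximal onsets and the restriction on syllable-final short vowels can be sketched in code. This is a simplified illustration and not the procedure actually used for the data: the vowel and onset inventories are hypothetical and deliberately tiny, syllabic consonants are ignored, and unstressed /ɪ/ and /ʊ/ are always treated as ordinary short vowels because stress is not represented. Given a list of segments, the function assigns as many consonants as possible to the following syllable while leaving at least one consonant after a short vowel.

```python
# Hypothetical, deliberately tiny inventories for illustration; a real
# analysis would need the full vowel system and all permissible onsets.
CHECKED_VOWELS = {"ɪ", "e", "æ", "ʌ", "ɒ", "ʊ"}   # short: need a coda
FREE_VOWELS = {"ə", "i", "u", "iː", "ɑː", "ɔː", "uː", "eɪ"}  # may be final
VOWELS = CHECKED_VOWELS | FREE_VOWELS
LEGAL_ONSETS = {
    (), ("f",), ("t",), ("g",), ("r",), ("k",), ("s",),
    ("g", "r"), ("t", "r"), ("s", "t", "r"),
}

def syllabify(segments):
    """Return the syllables of a segment list, split by maximal onsets."""
    nuclei = [i for i, seg in enumerate(segments) if seg in VOWELS]
    if not nuclei:
        return [list(segments)]
    starts = [0]
    for prev, nxt in zip(nuclei, nuclei[1:]):
        cluster = segments[prev + 1:nxt]
        # A checked vowel must keep at least one consonant as its coda.
        first = 1 if segments[prev] in CHECKED_VOWELS and cluster else 0
        for split in range(first, len(cluster) + 1):
            # A smaller split means a longer onset, so the first legal
            # candidate found is the maximal one.
            if tuple(cluster[split:]) in LEGAL_ONSETS:
                starts.append(prev + 1 + split)
                break
    starts.append(len(segments))
    return [list(segments[a:b]) for a, b in zip(starts, starts[1:])]

def dotted(segments):
    """Join syllables with dots for display."""
    return ".".join("".join(syl) for syl in syllabify(segments))

print(dotted(["f", "ə", "t", "ɒ", "g", "r", "ə", "f", "i"]))
print(dotted(["e", "k", "s", "t", "r", "ə"]))
```

For photography, the cluster after the short vowel in the second syllable would form a legal two-consonant onset, but the first consonant is kept as a coda, which yields /fə.tɒg.rə.fi/.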
The nonphonemic vowel symbols [i] and [u] were used as
the counterparts to unstressed /ɪ/ and /ʊ/, respectively, when either
was followed by a vowel (e.g., react /ri'ækt/; influential /ɪnflu'enʃəl/)
or appeared word finally in unstressed positions (e.g., happy /'hæpi/).
This is in line with current practice in transcribing British
English, as demonstrated in Roach et al. (2006) and Wells (2000). It
should be noted, however, that using the symbols /i/ and /u/ is based
on native speaker intuitions of vowel quality in the positions mentioned
earlier and that the symbols have no phonemic validity.
Concerning consonants, it is permissible to have up to three consonants
initially and four consonants finally in restricted combinations
in British English monosyllables (Roach, 2000). All consonants making
up the consonantal inventory of British English, with the exception of
/ŋ/, may occur in initial positions. In final positions in British English,
the approximant consonants /r w/ and /j/ and fricative /h/ are not
permitted. However, according to the maximal onsets rule, in connected
speech, consonants belonging to the end of words may be syllabified
as initials when the speech is broken down into syllables. For
example, if the maximal onsets rule is applied, cats and dogs is likely
to become /kæt.sn.dɒgz/ and forced in two will be divided as
/fɔː.stɪn.tuː/ in connected speech.
It was found in the process of syllabifying the Hong Kong English
data that, in some cases, it was difficult to apply maximal onsets insofar
as many syllables that would usually be weakened in British English
connected speech were pronounced with a vowel that was not weakened.
For example, collapse of any part is produced by speaker m03 as
/kɒlæps ɒvenipɑːt/, rather than /kəlæps əvenipɑːt/. If adhering strictly
to maximal onsets in this case, it would be necessary to divide collapse
of any part as /kɒl.æp.sɒv.en.i.pɑːt/; however, it was felt that for Hong
Kong English speakers, a short vowel in syllable-final position is entirely
possible, as long as the syllable is unstressed. In other words, unstressed
short vowels in syllable-final position in Hong Kong English are treated
as having a similar status to /ɪ/ and /ʊ/ in British English. In fact,
Jenkins (2000) positively encourages this approach with regard to
English as an international language. This interpretation leads us to
the following division of syllables: /kɒ.læp.sɒ.ven.i.pɑːt/, which is
comparable to the likely British English version, /kə.læp.sə.ven.i.pɑːt/. This
approach, together with others mentioned below, was adopted to cope
with the data in this study and is not intended to imply that Hong
Kong English speakers have overt rules about syllabification.
Other matters arose during syllabiﬁcation. One was that Hong Kong
English has many phonetically nasalised vowels, where Cantonese
speakers of English lower the velum in anticipation of a nasal conso-
nant which is present in the target phonology but not necessarily real-
ised with a full oral closure (Walmsley, 1997). In syllabifying nasalised
vowels where there was nasalisation in anticipation of a ﬁnal nasal
consonant, but no ﬁnal nasal consonant was pronounced, the syllable
was treated as containing a ﬁnal nasal consonant. For example, speaker
M06 produces construction industry as [k nsrÙk ò ̃ind stri], where the
vowel in the third syllable [ò ̃] is nasalised; this syllable is treated as
ending with a nasal consonant.
A second issue concerns ﬁnal dark and syllabic /l/. As is noted in
Hung (2000), dark and syllabic /l/ are frequently realised as vowels
by Hong Kong English speakers. Where a dark /l/ was very clearly
realised as a vowel, it was transcribed as a vowel.
Finally, the Hong Kong English data contains a large amount of
glottal stopping. This feature can prevent the linking associated with
connected English speech. Where the glottal stop is clearly not a reali-
sation of another consonant and appears in prevocalic position (e.g.,
speaker m09’s the accident is realised with a glottal stop at the begin-
ning of accident), it is not included as part of the syllable measurement.
This rule was also applied to the British English data to make sure the
treatment was comparable.
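The treatment of prevocalic glottal stops can be illustrated with a short sketch. It assumes that each syllable is available as a list of (symbol, duration in ms) pairs and that the transcriber has already decided that the glottal stop is not a realisation of another consonant; the names, the vowel subset, and the durations are all hypothetical.

```python
GLOTTAL_STOP = "ʔ"
VOWELS = {"æ", "ɪ", "ə", "e", "i"}  # hypothetical subset for the example

def syllable_duration_ms(syllable):
    """Sum segment durations, skipping an initial prevocalic glottal stop.

    `syllable` is a list of (symbol, duration_ms) pairs.
    """
    segments = list(syllable)
    if (len(segments) >= 2
            and segments[0][0] == GLOTTAL_STOP
            and segments[1][0] in VOWELS):
        segments = segments[1:]  # exclude the glottal stop from the total
    return sum(duration for _, duration in segments)

# First syllable of "accident" with an inserted glottal stop; the
# durations are invented for illustration.
with_glottal = [("ʔ", 40.0), ("æ", 110.0), ("k", 70.0)]
print(syllable_duration_ms(with_glottal))  # 180.0
```

Applying the same function to both datasets keeps the measurements comparable, as the text requires.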
The British English data are much more straightforward to syllabify,
and in no cases were maximal onsets violated to cope with a speaker’s
Assigning Syllables to Stress Type
I assigned the syllables to a category in the stress hierarchy by using
an auditory/perceptual analysis, that is, listening to the speech in its
continuous form and deciding which syllable belonged to which cate-
gory, based on my experience of both varieties of English and my
expertise as a phonetician. The categories were weakened (1),
unstressed (2), stressed (3), and tonic (4). A sample was checked by
another phonetician with less experience of Hong Kong English for
verification; no objective measure of interrater reliability was carried out.
RESULTS AND ANALYSIS
Tables 2 and 3 give an overview of syllable duration across the two
language types, measured in milliseconds (ms). As previously stated,
the Hong Kong English data comprised 4,404 syllables and the British
English data comprised 1,847 syllables.
It becomes immediately apparent from a quick glance at Tables
2 and 3 that the overall duration of syllables in Hong Kong English
was longer than in British English. The mean syllable duration for the
Hong Kong English speakers was 244.39 ms and that of the British
English speakers was 190.99 ms (all data in this section are rounded to
two decimal places where appropriate, with some rounding resulting
in one decimal place only). The British English syllables were shorter
despite the fact that the British English speakers were performing a
reading task in which their speech tempo was reasonably slow and
precise. However, the standard deviation in both cases was relatively
similar: 104.60 ms for the Hong Kong English speakers and 109.21 ms
for the British English speakers. The distributions for both sets of data
were normal, and an alpha level of 0.01 was used for all statistical tests.
The syllables were divided into four categories: weakened, unstressed,
stressed, and tonic, as outlined earlier, and these categories were used
in the data analysis. It was assumed that tonic syllables in the data
would be the longest in duration, followed by stressed, unstressed, and
then weakened syllables. The findings support this assumption.
Descriptive statistics can be seen for Hong Kong English and British
English in Tables 4 and 5, respectively (1 = weakened, 2 = unstressed,
3 = stressed, 4 = tonic).

TABLE 2
Descriptive Statistics for All Syllables: Hong Kong English

                     N       Minimum   Maximum   Mean     Std. deviation
Duration (ms)        4,404   22.38     759.38    244.39   104.60
Valid N (listwise)   4,404

TABLE 3
Descriptive Statistics for All Syllables: British English

                     N       Minimum   Maximum   Mean     Std. deviation
Duration (ms)        1,847   18.00     687.00    190.99   109.21
Valid N (listwise)   1,847

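Descriptive statistics of the kind reported in this section (N, minimum, maximum, mean, and standard deviation) could be computed per stress level along the following lines. This is a sketch only: the record format and function name are hypothetical, the durations are invented, and the sample standard deviation (n - 1 denominator) is assumed because the article does not state which formula was used.

```python
from collections import defaultdict
from statistics import mean, stdev

def describe_by_level(records):
    """Group (stress_level, duration_ms) records and summarise each level."""
    groups = defaultdict(list)
    for level, duration in records:
        groups[level].append(duration)
    summary = {}
    for level in sorted(groups):
        values = groups[level]
        summary[level] = {
            "n": len(values),
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
            # stdev needs at least two values; report 0.0 otherwise.
            "sd": stdev(values) if len(values) > 1 else 0.0,
        }
    return summary

# Invented durations; 1 = weakened ... 4 = tonic.
toy = [(1, 100.0), (1, 140.0), (2, 150.0), (2, 170.0), (3, 250.0), (4, 320.0)]
for level, stats in describe_by_level(toy).items():
    print(level, stats)
```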
Figure 1 shows the difference between the two varieties. The Hong
Kong English data are represented by the upper solid line (L1 = 1 in
the key), and the British English data by the lower dashed line (L1 =
2). On the x (horizontal) axis, 1 = weakened syllables, 2 = unstressed
syllables, 3 = stressed syllables and 4 = tonic syllables. On the y (verti-
cal) axis, average duration in milliseconds is given.
From the fact that syllables in the Hong Kong English data were
considerably longer overall than those in the British English data, it
might be anticipated that syllables in all categories in the Hong Kong
English data would be significantly longer statistically than those in the
British English data, but in fact this is not the case. It is clearly shown
in Figure 1, in which a curvilinear relationship between stress and
duration emerges, that this group of Hong Kong English speakers
maintains differences in length across the four stress levels, but that
they do not maintain these differences to the same degree as the
British English speakers studied; the ratio is different. An independent
samples t-test of each category finds the data to be different at a
significance level of p < 0.001 for weak, unstressed, and stressed syllables,
but it finds no significant difference between the duration of tonic
syllables across the two language groups, at p = 0.536 (equal variances
not assumed). This finding can be expected from looking at Figure 1.
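An independent samples t-test with equal variances not assumed is commonly known as Welch's t-test, and it can be approximated from summary statistics alone. The sketch below uses the means, standard deviations, and counts from Tables 4 and 5. It is not the procedure or software used in the study: the p-value here relies on a standard normal approximation, which is an assumption that is reasonable only because the samples are large, and all function and variable names are illustrative.

```python
import math

# (mean in ms, standard deviation in ms, N) per stress level, from Tables 4 and 5.
HKE = {
    "weakened": (195.34, 100.09, 849),
    "unstressed": (220.78, 90.29, 1922),
    "stressed": (282.47, 91.22, 960),
    "tonic": (319.38, 107.36, 673),
}
BRE = {
    "weakened": (129.95, 71.97, 643),
    "unstressed": (150.30, 77.52, 498),
    "stressed": (246.83, 71.32, 408),
    "tonic": (314.23, 124.76, 298),
}


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def two_sided_p_normal(t):
    """Two-sided p-value from the standard normal (large-sample approximation)."""
    return math.erfc(abs(t) / math.sqrt(2))


results = {}
for level in HKE:
    t, df = welch_t(*HKE[level], *BRE[level])
    results[level] = (t, df, two_sided_p_normal(t))
    print(f"{level:<10} t = {t:6.2f}  df = {df:7.1f}  p = {results[level][2]:.3g}")
```

Under these assumptions, the tonic category is the only one whose approximate p-value exceeds the alpha level of 0.01.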
The ratios of the mean syllable durations (British English : Hong Kong
English) are as follows: weak syllables 1:1.5; unstressed syllables 1:1.47;
stressed 1:1.14; tonic 1:1.02.
Table 4. Syllable Duration (ms) According to Stress Level: Hong Kong English Data
Stress level         N   Minimum   Maximum     Mean   Std. deviation
1 (weakened)       849     33.19    637.75   195.34           100.09
2 (unstressed)   1,922     22.38    669.38   220.78            90.29
3 (stressed)       960     94.06    697.88   282.47            91.22
4 (tonic)          673     72.25    759.38   319.38           107.36
Table 5. Syllable Duration (ms) According to Stress Level: British English Data
Stress level         N   Minimum   Maximum     Mean   Std. deviation
1 (weakened)       643        18       453   129.95            71.97
2 (unstressed)     498        20       599   150.30            77.52
3 (stressed)       408        73       553   246.83            71.32
4 (tonic)          298        99       687   314.23           124.76
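The per-category ratios follow directly from the mean durations in Tables 4 and 5: each ratio is the Hong Kong English mean divided by the British English mean for the same stress level, so that the British English mean is set to 1. A minimal sketch, with illustrative names only:

```python
# Mean syllable durations in ms by stress level, copied from Tables 4 and 5.
HKE_MEAN = {"weakened": 195.34, "unstressed": 220.78, "stressed": 282.47, "tonic": 319.38}
BRE_MEAN = {"weakened": 129.95, "unstressed": 150.30, "stressed": 246.83, "tonic": 314.23}


def duration_ratios(numerator, denominator):
    """Divide each category mean in `numerator` by the matching one in `denominator`."""
    return {level: numerator[level] / denominator[level] for level in numerator}


ratios = duration_ratios(HKE_MEAN, BRE_MEAN)
for level, ratio in ratios.items():
    print(f"{level:<10} 1:{ratio:.2f}")
```

The ratios shrink steadily from weakened to tonic syllables, which is the numerical counterpart of the converging lines in Figure 1.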
Figure 1. Line Plot of Syllable Duration According to Stress Level in Hong Kong English and British English
A feature revealed by the descriptive statistics that may have influenced
the perceived rhythm of Hong Kong English was the much
greater proportion of unstressed but not weakened syllables in the
Hong Kong English data, as demonstrated in Figure 2. Although the
Hong Kong English and British English data had similar percentages
of stressed and tonic syllables, the Hong Kong English data had far
more unstressed than weakened syllables: 43.64% of Hong Kong
English syllables were unstressed and 19.3% weakened, compared with
26.96% unstressed and 34.81% weakened in the British English data.
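The percentages in this paragraph can be reproduced from the syllable counts (N) in Tables 4 and 5 by dividing each category count by the total for that variety. A short sketch, in which the dictionary and function names are illustrative:

```python
# Syllable counts per stress level, copied from Tables 4 and 5.
HKE_N = {"weakened": 849, "unstressed": 1922, "stressed": 960, "tonic": 673}
BRE_N = {"weakened": 643, "unstressed": 498, "stressed": 408, "tonic": 298}


def category_shares(counts):
    """Return each category's share of all syllables, as a percentage."""
    total = sum(counts.values())
    return {level: 100 * n / total for level, n in counts.items()}


hke_shares = category_shares(HKE_N)
bre_shares = category_shares(BRE_N)
for level in HKE_N:
    print(f"{level:<10} HKE {hke_shares[level]:5.2f}%   BrE {bre_shares[level]:5.2f}%")
```

The same calculation also gives the stressed and tonic shares, which differ by less than one percentage point between the two varieties.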
Figure 2. Proportion of Syllables According to Stress Level for Hong Kong English and British English
The line plot, Figure 1, is rather telling about the situation in Hong
Kong English rhythmic stress: Weak and unstressed syllables are not as
short as those in the British English speech data, but tonic syllables
are very similar in length. Thus, the degree to which these syllables
differ in Hong Kong English is in sharp contrast to that of British
English. For the pattern to reflect the British English speakers, and
taking into account the overall difference in syllable length, the lines
would have had to have been parallel, not convergent. The lines,
although similar in form, are certainly not parallel, and the only point
at which the two varieties show no statistically significant difference is
tonic syllables (4 on the x axis). At each of the other three points, the
amount of difference becomes progressively less, but is still significantly
different from the British English data. However, it would be overly
simplistic to conclude that the difference in rhythmic pattern between
Hong Kong English speakers and British English speakers is depend-
ent only on differences in relative syllable duration across categories
of stressing. Figure 2 clearly shows that the Hong Kong English speech
has a much greater proportion of unstressed syllables than does the
British English speech, which contains more weakened syllables, and
this fact will affect the perceived rhythm of Hong Kong English.
It was noted earlier that syllables in Hong Kong English were longer
on average than those of British English. This difference could be due
to speaking rate, and no attempt has been made in this study to nor-
malise the data for differences in participants’ speaking rate—unlike,
for example, Low et al. (2000). However, speaking rate should not
affect the relative durations of syllables, and will certainly have no
bearing on the ratio of each category. In addition, it is hoped that,
through choosing data from a fairly large number of Hong Kong
English speakers (20 in total), speaking rate would be reasonably con-
sistent, at least for this group of speakers, for the task they were doing
(i.e., giving a presentation).
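One way to illustrate why a uniform difference in speaking rate leaves relative durations untouched is to express each category mean as a fraction of the same variety's tonic mean. The sketch below does this with the means from Tables 4 and 5. Taking the tonic syllable as the reference point is an assumption of this sketch, not a normalisation carried out in the study, and the names are illustrative.

```python
# Mean syllable durations in ms by stress level, copied from Tables 4 and 5.
HKE_MEAN = {"weakened": 195.34, "unstressed": 220.78, "stressed": 282.47, "tonic": 319.38}
BRE_MEAN = {"weakened": 129.95, "unstressed": 150.30, "stressed": 246.83, "tonic": 314.23}


def relative_to_tonic(means):
    """Express every category mean as a fraction of the tonic mean."""
    return {level: value / means["tonic"] for level, value in means.items()}


hke_rel = relative_to_tonic(HKE_MEAN)
bre_rel = relative_to_tonic(BRE_MEAN)
for level in HKE_MEAN:
    print(f"{level:<10} HKE {hke_rel[level]:.2f}   BrE {bre_rel[level]:.2f}")

# Scaling every duration by a constant (a uniformly faster or slower speaker)
# leaves these fractions unchanged.
slower = {level: value * 1.25 for level, value in HKE_MEAN.items()}
slower_rel = relative_to_tonic(slower)
```

On this measure the weakened, unstressed, and stressed syllables of Hong Kong English all sit closer to the tonic duration than their British English counterparts do.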
Relative syllable duration in different levels of stressing may be a
key factor in determining the perceived rhythm of a language. This
belief arises directly from Dauer’s (1983) observations concerning dif-
ferences in vowel reduction in syllables across languages demonstrating
different rhythmic types, or being at one end or the other of a stress-based
continuum, and it was the basis of the hypothesis explored in this study.
Support can certainly be found for this hypothesis. Figure 1 clearly
shows that, although this group of Hong Kong English speakers maintained
the differences in length across the four stress levels (weakened,
unstressed, stressed, and tonic), they did not maintain them to the same
degree as the group of British English speakers. Vowel reduction, or lack
thereof, is one of Dauer’s (1983) criteria for languages to differ in the
way they sound rhythmically; we can expect that these differences in
the patterns of vowel and syllable reduction shown in Figure 1 will,
therefore, serve to make Hong Kong English sound different rhythmi-
cally from British English. This situation could be seen as similar to
Low et al.’s (2000) finding for syllable nuclei in Singapore English.
The descriptive statistics revealed a feature of equally high impor-
tance to the perceived rhythm of Hong Kong English, that of the
much greater number of unstressed but not weakened syllables in the
Hong Kong English data (Figure 2). This result is tied to Dauer’s
(1983) criterion of vowel reduction, and it could be seen as similar to
Low and Grabe’s (1999) “lack of ‘deprominencing’” (p. 49) in
Singapore English. Although the Hong Kong English and British
English data have similar proportions of stressed and tonic syllables,
the Hong Kong English data have far more unstressed than weakened
syllables: 43.64% of Hong Kong English syllables are unstressed and
19.3% weakened, compared with 26.96% unstressed and 34.81% weak-
ened in the British English data. Therefore, more syllables in Hong
Kong English appear with a full vowel rather than a schwa or syllabic
consonant—they are, in effect, less weak, and so lack deprominencing.
Dauer’s observation that syllable-based languages do not have the same
patterns of vowel reduction supports the view that, if a language transfer
stance is adopted, Hong Kong English is likely to sound syllable based
rather than stress based, because such a stance accounts for Hong Kong
English speakers’ preference for unstressed rather than weakened syllables.
Because the pattern of strong and weak syllables seems to be impor-
tant in native speakers’ perception of stress-based languages (see, e.g.,
Adams, 1979; Anderson-Hsieh et al., 1992; Cutler, 1993; Fear et al.,
1995), the lack of deprominencing in the Hong Kong English data
could suggest that these speakers are likely to be less intelligible to
native speakers of English when compared with their British English
counterparts. Native speakers of English may be less able to under-
stand these Hong Kong English speakers because the predictability of
English speech rhythm, which Buxton (1983) notes to be “relevant to
perceptual processing” (p. 120), is somewhat lacking in their speech.
So, is it possible to explain what is responsible for the differing pat-
tern of Hong Kong English from British English? One possible cause
is L1 transfer. Cantonese is described as a syllable-timed language, in
part because it has an extremely restricted number of instances where
syllable weakening is possible (Bauer & Benedict, 1997). This restricted
syllable weakening could mean that Cantonese speakers of English do
not demonstrate native-like patterns of English stress-timing because
they transfer their L1 patterns of syllable-timing, in which a full vowel
appears in each syllable, with syllables typically not subject to weaken-
ing, to the L2. This transfer of course would go hand in hand with
other features of the L1 syllable, all of which might contribute to the
perceived syllable-timed sound of the L2.
Another suggestion involves the difference in how English and
Chinese are represented graphically. The Chinese writing system is not
alphabetic, but pictographic or ideographic. Outside of alphabetic rep-
resentations of Chinese, like Pin Yin for Mandarin, no claim is made
that the form of the character in any systematic way depicts the pro-
nunciation of the syllable represented (although a phonetic element
may be present). English, on the other hand, is basically represented
in a phonetic manner, in that letters are used which correspond to
the sounds of the word and presented in a linear left-to-right format
giving the order in which these sounds are produced. However, English
is notorious for being difficult to spell because the grapheme-phoneme
correspondence is not static and is therefore often a poor
guide to pronunciation. Luke and Richards (1982) have commented on
the more frequent occurrence of full vowels in syllables that are
neither stressed nor tonic in Hong Kong English, and they ascribed it to
the influence of English orthography. Additionally, Brown (1988)
mentions the same phenomenon in Singapore English and suggests spelling
pronunciation (pronouncing each vowel with a full value as
represented in the spelling) as a possible culprit. It could be that the
phenomenon of preferring unstressed rather than weakened syllables
is, therefore, not so much a matter of L1 transfer, but of habits devel-
oped when learning to read in L2. It is also possible that a combina-
tion of L1 transfer and L2 reading habits is responsible.
To conclude, Hong Kong English speakers have smaller differences
in the duration of weakened, unstressed, stressed, and tonic syllables
than British English speakers as well as a much greater proportion of
unstressed to weakened syllables than found in the British English
data. These two factors combine to affect the perceived rhythm of
Hong Kong English speech.
Having reported on a study of speech rhythm in speakers of Hong
Kong English, I hope it is clear that I value the
relative stressing of syllables in a stream of speech and believe that
work on teaching English speech rhythm of the kind thought to exist
in British English has obvious importance and rewards for learners.
Like Chela-Flores (1994, 1998) and Gilbert (1984), I advocate work on
syllable duration as a way of teaching and learning speech rhythm
because, as this study shows, the relative duration of syllables in Hong
Kong English differs from that of syllables in British English; this
difference in duration contributes to the lack of deprominencing which can
make Hong Kong English difficult to follow. More native-like speech
rhythm will improve matters for those British, American, or Australian
visitors, for example, whether commercial or recreational, who are not
used to the syllable-timed patterns of Hong Kong English, resulting in
better transactions for all concerned.
The controversy over whether the terms stress timed and syllable timed
are useful as pedagogical terms, however, rumbles on. Cauldwell
(2002), based on his own research, concludes that the use of these
terms in fact obstructs our understanding of how spontaneous speech
works and that they should therefore be abandoned altogether
in teaching and learning theories and materials. But although the
influence of research into the reality of the production of stress- and
syllable-timed languages is growing in English language teaching circles,
sensible research will not fail to focus on the importance of and mech-
anisms behind appropriate stressing to make messages clear. For Marks
(1999), the use of rhythmical structures such as rhymes in the class-
room is valid in so far as it
provides a convenient framework for the perception and production of a
number of characteristic features of English pronunciation which are
often found to be problematic for learners: stress/unstress (and therefore
the basis for intonation), vowel length, vowel reduction, elision, compression,
pause (between adjacent stresses). (p. 198)
Although stress timing may itself fall out of favour as a description
of what is happening in the rhythm of English, skilful identification
of some key aspects of the theory and how they contribute to making
messages clear is useful for pedagogical purposes.
Jane Setter is a lecturer in phonetics at the University of Reading, Reading, England.
She has also worked in Hong Kong and Japan. Jane is co-editor with Peter Roach
and James Hartman of the seventeenth edition of Daniel Jones’s English Pronouncing
Dictionary and joint coordinator of IATEFL’s Pronunciation Special Interest Group.
Adams, C. (1979). English speech rhythm and the foreign learner. The Hague, the
Allen, G. D. (1975). Speech rhythm: Its relation to performance universals and articu-
latory timing. Journal of Phonetics, 3, 75–86.
Anderson-Hsieh, J., Johnson, R., & Koehler, K. (1992). The relationship between
native speaker judgements of nonnative pronunciation and deviance in segmen-
tals, prosody, and syllable structure. Language Learning, 42(4), 529–555.
Anderson-Hsieh, J., & Venkatagiri, H. (1994). Syllable duration and pausing in the
speech of Chinese ESL speakers. TESOL Quarterly, 28, 807–812.
Bauer, R. S., & Benedict, P. K. (Eds.). (1997). Modern Cantonese phonology. Berlin:
Mouton De Gruyter.
Bolinger, D. W. (1965). Pitch accent and sentence rhythm. In D. W. Bolinger (Ed.),
Forms of English (pp. 139–180). Cambridge, MA: Harvard University Press.
Brown, A. (1988). The staccato effect in the pronunciation of English in Malaysia
and Singapore. In J. Foley (Ed.), New Englishes: The case of Singapore (pp. 115–147).
Singapore: Singapore University Press.
Buxton, H. (1983). Temporal predictability in the perception of English speech. In A.
Cutler & D. R. Ladd (Eds.), Prosody: Models and measurements (pp. 111–121). Berlin:
Cauldwell, R. (2002). The functional irrhythmicality of spontaneous speech: A discourse
view of speech rhythms. Apples, 2(1), 1–24. Retrieved January 31, 2003,
Chela-Flores, B. (1994). On the acquisition of English rhythm: Theoretical and practical
issues. International Review of Applied Linguistics in Language Teaching, 32(3), 232–242.
Chela-Flores, B. (1998). Teaching English rhythm: From theory to practice. Caracas,
Venezuela: Fondos Editorial Tropykos.
Cutler, A. (1993, October). Segmenting speech in different languages. The Psychologist,
Dauer, R. M. (1983). Stress timing and syllable timing reanalyzed. Journal of Phonetics,
Fear, B. D., Cutler, A., & Butterfield, S. (1995). The strong/weak syllable distinction
in English. The Journal of the Acoustical Society of America, 97(3), 1893–1904.
Gilbert, J. B. (1984). Clear Speech: Pronunciation and listening comprehension in American
English. Cambridge, England: Cambridge University Press.
Halliday, M. A. K. (1989). Spoken and written language (2nd ed.). Oxford, England:
Oxford University Press.
Hung, T. (2000). Towards a phonology of Hong Kong English. World Englishes, 19,
Jenkins, J. (2000). The phonology of English as an international language. Oxford, England:
Oxford University Press.
Klatt, D. H. (1975). Vowel lengthening as syntactically determined in a connected
discourse. Journal of Phonetics, 3, 129–140.
Li, D. S. C. (1999). The functions and status of English in Hong Kong: A post 1997
update. English World-Wide, 20(1), 67–110.
Low, E. L., & Grabe, E. (1995). Prosodic patterns in Singapore English. In K. Elenius &
P. Branderud (Eds.), Proceedings of the XIIIth International Congress of Phonetic Sciences,
Stockholm (vol. 3, pp. 636–639). Stockholm, Sweden: KTH/Stockholm University.
Low, E. L., Grabe, E., & Nolan, F. (2000). Quantitative characterizations of speech
rhythm: syllable-timing in Singapore English. Language and Speech, 43(4), 377–401.
Luke, K. K., & Richards, J. C. (1982). English in Hong Kong: Functions and status.
English World-Wide, 3, 147–164.
Marks, J. (1999). Is stress-timing real? ELT Journal, 53, 191–199.
O’Connor, J. D. (1973). Phonetics. Harmondsworth, England: Penguin.
Roach, P. (1982). On the distinction between “stress-timed” and “syllable-timed”
languages. In D. Crystal (Ed.), Linguistic controversies: Essays in linguistic theory and
practice in honour of F R Palmer (pp. 73–79). London: Edward Arnold.
Roach, P. (2000). English phonetics and phonology: A practical course (3rd ed.). Cambridge,
England: Cambridge University Press.
Roach, P. J., Hartman, J. W., & Setter, J. E. (Eds.). (2006). Daniel Jones’ English pro-
nouncing dictionary (17th ed.). Cambridge, England: Cambridge University Press.
Spencer, C. (1990). Pre-SCRIBE Final Report. London: University College.
Tajima, K., Port, R., & Dalby, J. (1997). Effects of temporal correction on intelligibility
of foreign-accented English. Journal of Phonetics, 25, 10–24.
Taylor, D. S. (1981). Non-native speakers and the rhythm of English. International
Review of Applied Linguistics in Language Teaching, 19(3), 219–226.
University College of London, Phonetics & Linguistics. (2004). Speech filing
system [version 4.6]. London: Author. Available from http://www.phon.ucl.
Walmsley, J. B. (1997). Cantonese English: An essay in diagnostic linguistics. In G.
Nickel (Ed.), Proceedings of the 3rd AILA Congress Copenhagen (vol. 1, pp. 261–277).
Heidelberg, Germany: Groos.
Wells, J. C. (2000). Longman pronunciation dictionary. London: Longman.
Wong, R. (1987). Teaching pronunciation: Focus on English rhythm and intonation.
Englewood Cliffs, NJ: Prentice Hall Regents.