University Of Baghdad
College Of Arts
Perception of Sounds
Pitch and Loudness
By: Aseel Kazum Mahmood
Perception is a general term with a general sense found in phonetics and psycholinguistics,
where it refers to the process of receiving and decoding speech input. The perception process
requires that the listener take into account not only the acoustic cues present in the speech
signal, but also their own knowledge of the sound pattern of their language, in order to interpret
what they hear. The term is usually contrasted with production (Crystal, 2003: 165).
The perception of sound in any organism is limited to a certain range of frequencies. For
humans, hearing is normally limited to frequencies between about 20 Hz and 20,000 Hz
(20 kHz), although these limits are not definite. The upper limit generally decreases with age.
Other species have a different range of hearing. For example, dogs can perceive vibrations
higher than 20 kHz, but are deaf to anything below 40 Hz. As a signal perceived by one of the
major senses, sound is used by many species for detecting danger, navigation, predation,
and communication. Earth's atmosphere, water, and virtually any physical phenomenon, such
as fire, rain, wind, surf, or earthquake, produces (and is characterized by) its unique sounds.
Many species, such as frogs, birds, marine and terrestrial mammals, have also developed
special organs to produce sound. In some species, these produce song and speech.
Furthermore, humans have developed culture and technology (such
as music, the telephone and radio) that allow them to generate, record, transmit, and broadcast
sound. The scientific study of human sound perception is known as psychoacoustics.
Trask (1996:260) defines perception as the process by which an individual detects and interprets
information from the external world by means of the sense organs, the nervous system and
the brain. In speech, the term is applied particularly to the way in which acoustic characteristics
such as frequency and intensity are registered and interpreted. Speech perception, in turn,
is the process by which a hearer extracts identifiable linguistic elements from the continuous
acoustic signal of speech.
The mechanism of speech perception, according to Gimson (1998:8), follows the same steps as
sound production but in reverse order: the reception of sound waves by the hearing apparatus of
the listener (the physiological stage), then the transmission of the information through the nervous
system to the brain, where the linguistic interpretation of the message takes place (the
psychological stage).
The process of perceiving speech sounds involves several factors. Accordingly, saying that
we “heard” a sound can mean several different things, which David Crystal
(2006:44-45) summarizes as follows:
1. The body may react physiologically to the sound stimulus without our being
consciously aware of it, as in involuntary reflexes such as changes in
the rate of our breathing or heartbeat in response to specific situations.
2. A sound is consciously detected. This means it has to be audible in order to be
heard. For this to happen there must be a certain minimum of stimulation.
3. Sounds may be perceived to be the same (recognized) or different
(discriminated). For the brain to differentiate between two
sounds, there has to be a minimum difference in magnitude between them.
4. The brain is able to focus on certain aspects of a complex auditory stimulus and to
ignore others; this is called the phenomenon of auditory attention. Therefore,
when we “hear attentively”, we are said to be “listening”. So listening and
hearing are not the same, and must be carefully distinguished.
Perception in the brain:
Many different aspects of our perception of sound help us make sense of auditory nerve activity.
Speech perception is the study of the way speech sounds are analyzed and identified in the
brain. It is part of the general subject called auditory perception, the study of
the way we take in any kind of sound stimulus, from music to a barking dog. The basic question
in speech perception is how the brain manages to find linguistic units within the auditory stimulus
or noise that surrounds us (Crystal, 2006:45).
When several people talk at once in a crowded room, we are able to tune in to one speaker and
ignore the others. This ability of the human brain to pay attention to some incoming sound
stimuli and ignore others is known as selective listening. How does the brain select auditory
information so impressively?
Those complications are avoided when we are listening to just one speaker, but even one-to-one
interaction is not a simple process. What we receive from a speaker is a continuously varying
waveform. If we record that waveform, we find that the linguistic units are not neatly demarcated
by pauses or other boundary markers: sounds run into each other. Yet when we are listening we hear
this waveform as a sequence of sounds and words. How is the brain able to analyze this signal so
that the language units can be identified?
When we analyze the signal, we find other intriguing issues. If we hear different instances of
a particular sound, we have no difficulty recognizing them as the same, but when we examine the
relevant parts of the waveform we find that the same sound may not have the same waveform.
Moreover, the same sound articulated by different people will result in different waveforms,
because their regional dialects and individual voice qualities are not the same.
In normal speech, people produce sounds very quickly (twelve or more segments per second), run
sounds together, and leave sounds out. Nevertheless, the brain is able to process such rapid
sequences and cope with these modifications (ibid.).
The mechanical process described so far is only the beginning of our perception of sounds. The
mechanisms of sound interpretation are poorly understood; in fact, it is not yet clear whether all
people interpret sounds in the same way. Until recently, there has been no way to trace the
wiring of the brain, no way to apply simple stimuli and see which parts of the nervous system
respond, at least not in any detail. The only research method available was to have people listen
to sounds and describe what they heard. The variability of listening skills and the imprecision of
the language combined to make psycho-acoustics a rather frustrating field of study.
The current best guess as to the neural operation of hearing goes like this:
We have seen that sound of a particular waveform and frequency sets up a characteristic pattern
of active locations on the basilar membranes. (We might assume that the brain deals with these
patterns in the same way it deals with visual patterns on the retina.) If a pattern is repeated
enough we learn to recognize that pattern as belonging to a certain sound, much as we learn a
particular visual pattern belongs to a certain face. (This learning is accomplished most easily
during the early years of life.) The absolute position of the pattern is not very important; it is the
pattern itself that is learned. We do possess an ability to interpret the location of the pattern to
some degree, but that ability is quite variable from one person to the next. (It is not clear whether
that ability is innate or learned.) What use the brain makes of the fact that the aggregate firing of
the nerves more or less approximates the waveform of the sound is not known. The processing of
impulse sounds (which do not last long enough to set up basilar patterns) is also not well
understood.
Theories in speech perception:
The reason for the phonetician’s interest in perception has already been stated: the
phenomena of speech can be understood only if its production and perception are viewed as
interrelated and interacting elements of a single process (Tiffany and Carrel, 1987:8). Many
speech scientists, such as Ladefoged (1967) and Warren (1969), have tested listeners’ ability to
identify the units of speech by listening to tapes. In fact, understanding spoken language is not
hard to account for. Our perceptual sets are usually tuned to listen for meaning, and it turns out
that meaning can be apprehended without necessarily utilizing every potentially available acoustic
cue. The phonetician, however, is required to adopt a special listening set in order to note
the salient features of speech.
The motor theory of speech perception: this hypothesis about the way spoken language is
perceived is related to the nature of the thought process. Psychologists generally agree that
thinking, except in the case of nonverbal forms, is mediated by verbal symbols. Thinking is carried
out by means of covert or subvocal speech: we think, they theorize, by “talking” silently to
ourselves by means of “inner speech movements”.
That is, the hearer silently repeats the message and apprehends its meaning from cues provided by
the inner speech response. Liberman et al. (1967) held a different view, that “the speech decoder
works by referring the incoming speech signal to commands that would be appropriate to its
production”. However, perceptual processing has a number of acceptable explanatory principles that
are brought together under an analysis-by-synthesis model.
Analysis-by-synthesis: the formulation of what takes place in the process of speech
perception runs as follows. At the initial stage, the incoming speech signal is received by the
sensory end organ of hearing in the ear and transmitted to the brain via the auditory pathways. Up
to this point, only physical energy in the form of sensory nerve impulses will have reached the
brain. The brain circuitry next organizes the data it has received into percepts on which
recognition is based. However this takes place, structuring is an essential feature: the listener
constructs a pattern of his own in an attempt to match the incoming one. Recognition based on
fragmentary information involves a principle that psychologists call closure. Auditory perceptions
are conditioned by certain presumptions made on the basis of past experience. In the case of speech,
these presumptions are the product of learning and take the form of some kind of “known”
speech sound. The constancy principle inclines us to perceive a given figure as always the same
regardless of variations in detail. Percepts are made to fit one’s prior presumptions, and cues not
consistent with those presumptions are rejected. One’s ability to understand spoken language is made
highly efficient through analysis-by-synthesis.
However, our immediate interest lies in the implications analysis-by-synthesis may have for the
problem of perceiving the phonetic characteristics of speech. A basic premise would seem to be that
our habitual perceptual set, which is to listen to speech for its meaning, must be replaced by one
which allows us to perceive details of its form. It should help to realize that our descriptions of
speech form are likely to be biased, not necessarily because we are bad listeners but by reason of
the very factors which enable us to grasp meaning efficiently: they work against the recognition
of structural details.
A second premise, which is also basic, is that knowledge of possible speech forms provides a memory
bank from which to draw in matching features that have been detected in a sample
under study with known articulatory possibilities.
Perception in speaker to hearer:
According to Gimson (1989:19), when we listen to a continuous utterance, we perceive an
ever-changing pattern of sound. As we have seen, when it is a question of our own language, we are
not conscious of the complexities of pattern which reach our ears: we tend consciously to perceive
and interpret only those sound features which are relevant to the intelligibility of our language.
Nevertheless, despite this linguistic selection which we ultimately make, we are aware that this
changing pattern consists of variations of different kinds: of sound quality (we hear a variety of
vowels and consonants); of pitch (we appreciate the melody, or intonation, of the utterance); of
loudness (we will agree that some sounds are “louder” than others); and of length (some sounds
will seem longer to our ears than others). These are judgments made by a listener in respect of a
sound continuum emitted by a speaker and, if the sound stimulus from the speaker and the response
from the listener are made in terms of the same linguistic system, then the utterance will be
meaningful for the speaker and listener alike. It is reasonable to assume, therefore, that there is
a constant relationship between the speaker’s articulation and the listener’s reception of sound
variations. In other words, it should be possible to link, through the transmission phase, the
listener’s impressions of changes of quality, pitch, loudness and length to some articulatory
activity on the part of the speaker. It will in fact be seen that the exact correlation between the
production, transmission and reception phases of speech is not always easy to establish, the
investigation of such relationships being one of the tasks of present-day phonetic studies.
The perception of speech sounds involves four perceptual categories: pitch, loudness, quality
and length (O’Connor, 1973:99).
According to Lyons (1981:68), the auditory dimensions of pitch and loudness correlate with the
acoustic parameters of frequency and intensity; but the correlation between pitch and frequency,
on the one hand, and between loudness and intensity, on the other, is not stated in terms of a
fixed ratio valid for the whole range of speech sounds varying along the relevant dimension.
Pitch: the attribute of auditory sensation in terms of which a sound may be ordered on a scale from
“low” to “high”. It is an auditory phonetic feature, corresponding to some degree with the
acoustic feature of frequency, which in the study of speech is based upon the number of complete
cycles of vibration of the vocal folds. Pitch refers to an auditory property of a sound that
enables a listener to place it on a scale going from low to high, without considering its acoustic
properties (Ladefoged, 2006:23). According to Peter Roach, it is “an auditory sensation”. That is
to say, “when we hear a sound that is vibrating regularly, such as a note played on a musical
instrument or a vowel produced by the human voice, we hear a high pitch if the rate of vibration is
high and a low pitch if the rate of vibration is low” (1992:23).
Trask (1996:278) defines pitch as the perceptual correlate of the frequency of a sound; in
speech, of the fundamental frequency of the vocal cords. The higher the frequency (that is, the
more rapid the vibration), the higher the pitch, but the correlation is far from linear: at higher
frequencies (though not at lower), the pitch is roughly proportional to the logarithm of the
frequency. Denes and Pinson (1993:104) classify pitch into high, low, elevated, rising, falling and
Pitch is an auditory phonetic feature associated with the acoustic feature of frequency, which is
based on the number of complete cycles of vibration of the vocal cords (Crystal, 2003: 355). So,
when a speech sound goes up in frequency it also goes up in pitch, since pitch depends on the rate
of vibration of the vocal cords. According to Katamba, “the more taut the vocal cords are, the
faster they vibrate and the higher is the pitch of the perceived sound” (1989:186).
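Trask's remark that pitch follows the logarithm of frequency only at higher frequencies can be illustrated with a short sketch. The mel scale used here is not mentioned by any of the authors cited above; it is one widely used psychoacoustic approximation, and the constants 2595 and 700 belong to that approximation. The function name hz_to_mel is hypothetical and chosen only for this illustration.

```python
import math


def hz_to_mel(frequency_hz: float) -> float:
    """Approximate perceived pitch, in mels, for a frequency given in Hz.

    Assumption: the common mel formula, mel = 2595 * log10(1 + f / 700).
    It is close to linear at low frequencies and close to logarithmic
    at high frequencies.
    """
    if frequency_hz < 0:
        raise ValueError("frequency must be non-negative")
    return 2595.0 * math.log10(1.0 + frequency_hz / 700.0)


if __name__ == "__main__":
    # Compare how a doubling of frequency is reflected on the pitch scale.
    for low, high in ((100, 200), (4000, 8000)):
        ratio = hz_to_mel(high) / hz_to_mel(low)
        print(f"{low} Hz -> {high} Hz: mel ratio {ratio:.2f}")
```

Under this approximation, doubling a low frequency nearly doubles the mel value, while doubling a high frequency raises it by much less, which is the non-linear behaviour the definitions above describe.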
Pitch is thus usually associated with frequency: the higher the frequency of a sound, the higher
we perceive the pitch to be. But our perception of pitch is also affected by the duration and
intensity of the sound stimulus. The concepts of pitch and frequency are therefore not identical:
whereas frequency is an objective, physical fact, pitch is a subjective psychological sensation (Crystal,
According to Gimson (1989:24), our perception of the pitch of a speech sound depends directly upon
the frequency of vibration of the vocal folds. Thus we are normally conscious of the pitch caused
by the “voiced” sounds, especially vowels; pitch judgments made on voiceless or whispered sounds,
without the glottal tone, are limited in comparison with those made on voiced sounds, and are
induced mainly by variations of intensity or by the dominance of certain harmonics brought out by
the resonating cavities. The higher the glottal fundamental frequency, the higher our impression
of pitch. The pitch level of the voice will vary a great deal between individuals and in the speech of one
Our perception of pitch is not, however, solely dependent upon fundamental frequency. Variations
of intensity on the same frequency may induce impressions of a change of pitch, and again, tones
of very high or very low frequency, if they are to be audible, require a greater intensity than
those in a middle range of frequencies (ibid.).
Loudness, according to Trask (1996:211), is the perceptual correlate of the acoustic intensity of
a sound. It is the attribute of auditory sensation in terms of which a sound may be ordered on a
scale from soft to loud. It is an auditory phonetic feature, corresponding to some degree with the
acoustic feature of intensity or power (measured in decibels (dB)), which in the study of
speech is based on the size of vibration of the vocal cords, as a result of variations in air
pressure. There is, however, no direct or parallel correlation between loudness (or volume) and
intensity: factors other than intensity may affect our sensation of loudness; e.g. increasing the
frequency of vocal cord vibration may make one sound seem louder than another (Crystal, 2003: 288).
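Since intensity is measured in decibels, a brief sketch may help show how that logarithmic unit behaves. The reference intensity of 10^-12 W/m^2 is the conventional reference for sound intensity level and is an assumption brought in here, not a figure from the sources cited; the function name intensity_level_db is likewise hypothetical.

```python
import math

# Assumed conventional reference intensity, in watts per square metre.
REFERENCE_INTENSITY = 1e-12


def intensity_level_db(intensity: float,
                       reference: float = REFERENCE_INTENSITY) -> float:
    """Return the sound intensity level in decibels: 10 * log10(I / I0)."""
    if intensity <= 0 or reference <= 0:
        raise ValueError("intensity and reference must be positive")
    return 10.0 * math.log10(intensity / reference)


if __name__ == "__main__":
    # Each tenfold increase in intensity adds 10 dB to the level.
    for intensity in (1e-12, 1e-9, 1e-6, 1e-3):
        print(f"{intensity:.0e} W/m^2 -> {intensity_level_db(intensity):.0f} dB")
```

The decibel figure describes the physical signal only. As the passage above stresses, the listener's sensation of loudness does not follow it in a direct or parallel way.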
As for loudness, it is another perceptual dimension of speech sounds, one which is primarily
related to sound intensity (O’Connor, 1973:101). It refers to “an attribute of auditory sensation
in terms of which a sound may be ordered on a scale from soft to loud” (Crystal, 2003:278). Peter
Roach distinguishes the “scientific measurements of the amounts of energy present in sounds” from
“the impression received by the human listener”, and it is the latter that loudness names
(1992:48). Loudness is used to overcome difficult communication conditions, or to give strong
emphasis to what we say (ibid.).
Loudness is an auditory feature which corresponds to some degree with the acoustic feature of
intensity or power, which is based on the size of vibration of the vocal cords (Crystal, 2003:278).
The loudness of a sound may depend on several factors. For example, a sound or syllable may
stand out from its neighbors, and so seem louder, because it is associated with
a marked pitch or because it is longer than its neighbors (Gimson, 1989:25).
The loudness of a sound also depends on the size of the variations in air pressure that occur,
and intensity is the appropriate measure that corresponds with loudness (Ladefoged, 1993:187).
Finally, both pitch and loudness provide some functional indications to the listener as they may
indicate the psychological conditions of the speaker, the significance of what he/she is saying,
and the manner and mode of what is said.
Our sensation of the relative loudness of sounds may depend on several factors. A sound or
syllable may appear to stand out from its neighbors. It is better to use a term such as prominence
to cover these general listener-impressions of variations in the perceptibility of sounds. More
strictly, what is “loudness” at the receiving end should be related to intensity at the production
stage, which is in turn related to the size or amplitude of the vibration and the speaker’s feeling
for stress. Moreover, all other things being equal, some sounds appear by their nature to be
louder than others: e.g. vowels may be more powerful than consonants.
- Crystal, D. 2006. How Language Works. England: Clay Ltd.
- Crystal, D. 2003. A Dictionary of Linguistics and Phonetics. Oxford: Blackwell.
- Denes, Peter B. and Eliot N. Pinson. 1993. The Speech Chain: The Physics and
Biology of Spoken Language. 2nd ed. (1st ed. 1973). New York: W. H. Freeman.
- Gimson, A. C. 1989. An Introduction to the Pronunciation of English. 4th ed.
New York: Routledge, Chapman Ltd.
- Ladefoged, P. 2001. Vowels and Consonants: An Introduction to the Sounds of
Languages. Blackwell Ltd.
- Lyons, J. 1981. Language and Linguistics. Cambridge: Cambridge University Press.
- Roach, P. 2009. English Phonetics and Phonology. 4th ed. Cambridge:
Cambridge University Press.
- Roca and Johnson. 2000. A Course in Phonology. 2nd
ed. Blackwell Publishers Ltd.
- O'Connor, J. 1988. Phonetics. Australia and UK: Penguin.
- Olson, Harry F. 1967. Music, Physics and
Engineering. ISBN 9780486217697.
- Yost, William A. and Fay, Richard R. (Eds.). 2008. Springer Handbook of Auditory
Research, Vol. 29.
- Singh, S. and Singh, K. 1976. Phonetics: Principles and Practices.
Maryland: University Park Press.
- Trask, R. L. 1996. A Dictionary of Phonetics and Phonology. London: Routledge.
- Tiffany, William R. and Carrel, J. 1978. Phonetics: Theory and Application. 2nd ed.
Singapore: McGraw-Hill Book Co.