Speech perception is defined as the process by which a perceiver tries to identify the talker's underlying language patterns on the basis of speech sounds and movements. The ultimate goal of speech perception is to determine the meaning and intent behind the spoken message.
-Arthur Boothroyd (1998)
In many everyday situations, we find ourselves listening to speech, often trying to understand one particular person even as other conversations, radio broadcasts, and public address announcements create a troublesome speech background. How do we understand the speech of other people? How do we single out one voice from a crowd of conversing people? By what processes do we take in the perishable acoustic signal of speech and quickly reach decisions about who said it, what was said, and how it was said? All of these decisions must be made before the speaker produces the next utterance. These are some of the questions that the study of speech perception attempts to answer.
Auditory perception of speech is a process of interpreting the instructions imprinted on the acoustic wave by the speaker over a time span.
Auditory perception of speech per se deals mainly with the temporal management of information from the input (Berlin 1969).
• Speech is a continuous, unsegmented event. The organs of speech glide from one target position to the next, generating transitional information in the process.
• The characteristics of the acoustic stimulus for any given phoneme are considerably influenced by its neighbors, i.e., its phonetic context. Coarticulation results from the overlapping of the articulatory constituents of one sound with those of the next.
The perception of any sound can be considered in terms of either
a) The manner of articulation used in its production
b) The resultant acoustic event.
McKay (1956) described two approaches to explaining how linguistic value is determined from a speech signal. They are:
1) Active
2) Passive
The passive system is envisaged as a filtering system that identifies and combines information so as to restructure the pattern. These theories are termed ‘non-mediated’ theories.
The active models are viewed as comparator systems in which the input pattern is compared with an internally generated pattern. These models/theories are referred to as ‘mediated’ theories.
ACTIVE THEORIES | PASSIVE THEORIES
1. Relies on cognitive resources or intellectual energy | 1. Relies on passive responses such as thresholds
2. Mediating | 2. Non-mediating
3. Top-down | 3. Bottom-up
4. Sequential | 4. Non-sequential
5. Comparator system is involved | 5. Comparator system is not involved
6. Involves motor processes | 6. Involves sensory processes
7. Assumes that perception involves work on the part of the perceiver | 7. Assumes that decisions can be made with little or no use of special cognitive operations
GENERAL ATTRIBUTES
BOTTOM-UP | TOP-DOWN
Assumes that the information in the physical signal is essential to, and adequate for, the eventual perceptual decisions. | Assumes that analysis of the physical signal is not sufficient to make the required perceptual decisions.
Data-driven: the perceptual decision-making process is directed almost entirely by the information obtained from the physical signal. | The decision-making process is highly dependent on higher-level sources of information, such as hypotheses generated by linguistic or cognitive operations.

AUTONOMOUS | INTERACTIVE
Based on the idea that perception occurs in a closed system of decision making. | Perceptual decision making can and does rely on various sources of information outside the perceptual processor.
All information needed to reach the required perceptual decision is contained in the basic perceptual operations, i.e., the sub-stages of perceptual processing. | Allows the sub-stages to interact.
ACOUSTIC THEORY (FANT, 1960, 1962, 1967) – PASSIVE THEORY
The theory utilizes the concept of distinctive features that Fant developed along with Jakobson and Halle. It depends upon extracting the distinctive features from the acoustic signal.
The origin of speech wave pattern is the response of the vocal tract filter systems
to one or more sound sources. So, speech wave is specified in terms of its source &
filter characteristics.
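As a worked illustration of the filter side, a standard textbook simplification (assumed here; it is not part of the theory as stated above) treats the neutral vocal tract as a uniform tube closed at the glottis and open at the lips. Such a tube resonates at odd multiples of its quarter-wavelength frequency, F_n = (2n - 1)c / 4L. The sketch assumes a tract length L = 17.5 cm and a speed of sound c = 35,000 cm/s; the function name `tube_formants` is hypothetical.

```python
def tube_formants(length_cm: float = 17.5, speed_cm_s: float = 35000.0, n: int = 3) -> list[float]:
    """Resonances of a uniform tube closed at one end (glottis) and open at the other (lips).

    F_n = (2n - 1) * c / (4 * L): odd multiples of the quarter-wavelength frequency.
    The default length and speed of sound are typical textbook assumptions.
    """
    return [(2 * k - 1) * speed_cm_s / (4 * length_cm) for k in range(1, n + 1)]


print(tube_formants())               # assumed adult-sized tract
print(tube_formants(length_cm=14.0))  # a shorter tract raises every resonance
```

With the assumed values the first three resonances come out at 500, 1500 and 2500 Hz, roughly the formants of a neutral, schwa-like vowel; shortening the tube shifts all of them upward in proportion.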
Source-filter theory (Fant 1960, 1962, 1967): this states that speech is the product (P) of the source (S) and the transfer function of the vocal tract (T):
P = S × T
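The product P = S × T can be made concrete in the frequency domain: the amplitude of each harmonic in the output is the source amplitude at that frequency multiplied by the vocal tract gain there, so on a decibel scale the two simply add. The following is a minimal sketch under loudly stated assumptions: a source whose harmonics fall about 12 dB per octave, a tract modelled as a cascade of three simple resonances with made-up centre frequencies and bandwidths, and no lip-radiation term. All names are hypothetical.

```python
import math

F0 = 100  # assumed fundamental frequency (Hz)
FORMANTS = [(500, 60), (1500, 90), (2500, 120)]  # assumed (centre Hz, bandwidth Hz)


def source_db(f, f0=F0):
    """S: assumed glottal source whose harmonics fall about 12 dB per octave above F0."""
    return -12.0 * math.log2(f / f0)


def resonance_gain(f, fc, bw):
    """Magnitude of one second-order resonance (a formant), equal to 1 at 0 Hz."""
    half = bw / 2
    num = fc ** 2 + half ** 2
    den = math.sqrt(((f - fc) ** 2 + half ** 2) * ((f + fc) ** 2 + half ** 2))
    return num / den


def transfer_db(f, formants=FORMANTS):
    """T: vocal tract transfer function as a product of formant resonances, in dB."""
    gain = 1.0
    for fc, bw in formants:
        gain *= resonance_gain(f, fc, bw)
    return 20.0 * math.log10(gain)


def output_spectrum(max_hz=3000, f0=F0):
    """P = S x T at each harmonic; on a dB scale the product becomes a sum."""
    return {k * f0: source_db(k * f0, f0) + transfer_db(k * f0)
            for k in range(1, max_hz // f0 + 1)}


def spectral_peaks(spectrum):
    """Harmonics louder than both neighbours, i.e. where the output shows peaks."""
    freqs = sorted(spectrum)
    return [f for prev, f, nxt in zip(freqs, freqs[1:], freqs[2:])
            if spectrum[f] > spectrum[prev] and spectrum[f] > spectrum[nxt]]


print(spectral_peaks(output_spectrum()))
```

Although the assumed source spectrum falls steadily, the output shows local peaks at the harmonics nearest the assumed formants, which is the sense in which the filter shapes the source.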
The central theme of the acoustic theory of speech perception is that each vowel sound has its own characteristic formant frequencies. Hence, an acoustic (frequency) analysis of the speech signal should readily help in identifying speech sounds.
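To illustrate the idea that formant frequencies alone can identify a vowel, here is a minimal nearest-template classifier. The (F1, F2) values are approximate adult-male figures assumed for illustration, Euclidean distance in Hz is a simplifying choice, and `identify_vowel` is a hypothetical helper, not part of Fant's theory.

```python
import math

# Illustrative (F1, F2) values in Hz for a few vowels; approximate figures
# assumed for this sketch, not measurements reported in this text.
VOWEL_TEMPLATES = {
    "i": (270, 2290),
    "ae": (660, 1720),
    "a": (730, 1090),
    "u": (300, 870),
}


def identify_vowel(f1, f2, templates=VOWEL_TEMPLATES):
    """Return the vowel whose template formants lie closest to the measured ones."""
    return min(templates, key=lambda v: math.dist((f1, f2), templates[v]))


print(identify_vowel(310, 2100))  # i
print(identify_vowel(700, 1150))  # a
```

Real listeners face overlapping formant regions across talkers, which is why normalization matters in later models; the sketch ignores that.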
The distinctive feature information that exists at the articulatory stage of speech production is imprinted upon the acoustic speech wave. The listener then internalizes this information by drawing physiological maps in the auditory system; these maps constitute the internal auditory pattern representation.
The overlapping of information is a major factor in phoneme identification. Human listeners appear to derive formant frequencies through a procedure that makes use of their unconscious, internalized knowledge of the mechanisms and physics of speech production.
MERITS
Acoustic theory follows the concept of distinctive features, which are well documented and well established.
Many phonemes, phoneme sequences and words have similar articulatory placements, and it is the acoustics of these phonemes that helps us differentiate them (Ohala).
Non-speaking infants and animals can identify many speech and non-speech sounds; here acoustic cues, not articulatory postures, play the major role.
DEMERITS
This theory does not speak about infant perception.
The theory has failed to prove the acoustic mapping of speech sounds.
It has also failed to prove the direct link between the acoustic and phonetic features.
The theory does not talk about coarticulation.
ANALYSIS BY SYNTHESIS – ACTIVE THEORY
The listener unconsciously produces a synthetic version of the input speech based on a coarse auditory analysis. If the two versions match, the analysis is considered successful; if they do not match, more refined processing of the input is necessary.
The model thus involves both top-down and bottom-up processing in speech perception, and it hypothesizes that the listener decodes these details without being consciously aware of doing so.
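The analysis-by-synthesis loop (coarse analysis, hypothesis, internal synthesis, comparison, refinement) can be sketched as a toy program. Everything concrete here is assumed: vowels stand in for speech units, the "synthesizer" simply returns a stored (F1, F2) pattern, the coarse analysis ranks hypotheses by F1 alone, and the 150 Hz tolerance is arbitrary. The function names are hypothetical.

```python
# Hypothetical internal "synthesizer": the formant pattern (F1, F2 in Hz) the
# listener would expect to hear for each candidate vowel. Values are assumed.
INTERNAL_PATTERNS = {"i": (270, 2290), "ae": (660, 1720), "a": (730, 1090), "u": (300, 870)}


def coarse_analysis(observed):
    """Rough first pass: rank candidate vowels by F1 alone (a deliberately coarse cue)."""
    f1 = observed[0]
    return sorted(INTERNAL_PATTERNS, key=lambda v: abs(INTERNAL_PATTERNS[v][0] - f1))


def synthesize(candidate):
    """Generate the internal (synthetic) version of the hypothesised vowel."""
    return INTERNAL_PATTERNS[candidate]


def analysis_by_synthesis(observed, tolerance_hz=150):
    """Try hypotheses in order; accept the first whose synthetic version matches the input."""
    for attempt, candidate in enumerate(coarse_analysis(observed), start=1):
        synthetic = synthesize(candidate)
        if all(abs(o - s) <= tolerance_hz for o, s in zip(observed, synthetic)):
            return candidate, attempt
    return None, len(INTERNAL_PATTERNS)  # no match: more refined analysis would be needed


print(analysis_by_synthesis((290, 2200)))  # ('i', 2)
```

In the example, the coarse F1 cue first suggests /u/; its synthetic version mismatches the input's F2, so processing continues and /i/ is accepted on the second attempt.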
Categorical perception is evidence for the dual-process model of speech perception, with a bottom-up auditory process and a top-down phonetic process.
The theory holds that sounds perceived categorically are coded in terms of features that disappear rapidly from auditory memory and are therefore recoded phonetically, where they are retained for a longer time. Sounds perceived continuously, by contrast, have more lasting features in auditory memory.
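Categorical perception is often demonstrated by showing that discrimination can be predicted from labelling alone. The sketch below assumes a logistic identification function along a hypothetical /b/-/p/ voice onset time (VOT) continuum with the boundary at 30 ms, and uses the classic labelling-based ABX prediction P(correct) = 0.5 + 0.5(pA - pB)², which is brought in here as an illustration rather than taken from the text.

```python
import math


def p_voiceless(vot_ms, boundary_ms=30.0, slope=0.5):
    """Assumed identification function: probability of labelling a token /p/ rather than /b/."""
    return 1.0 / (1.0 + math.exp(-slope * (vot_ms - boundary_ms)))


def predicted_abx(vot_a, vot_b):
    """ABX accuracy predicted from labelling alone: 0.5 + 0.5 * (pA - pB) ** 2."""
    pa, pb = p_voiceless(vot_a), p_voiceless(vot_b)
    return 0.5 + 0.5 * (pa - pb) ** 2


# Pairs 20 ms apart along a hypothetical 0-80 ms VOT continuum.
for start in range(0, 70, 10):
    stop = start + 20
    print(f"{start}-{stop} ms: identification {p_voiceless(start):.2f} vs "
          f"{p_voiceless(stop):.2f}, predicted ABX {predicted_abx(start, stop):.2f}")
```

Pairs that straddle the assumed boundary (for example 20 vs. 40 ms) are predicted to be discriminated almost perfectly, while equally spaced pairs within one category stay near the 50% chance level, which is the signature of categorical perception.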
Listeners may recognize words by matching aspects of the acoustic input to patterns stored in their mental lexicon. Recognition, however, involves cognitive processes beyond simply matching auditory patterns: the listener uses the context to anticipate upcoming words, and if a word cannot be recognized immediately, the following words often help.
Transient acoustic information in short-term auditory memory may be lost unless it is quickly recoded into a more compact phonetic form for long-term memory. Word recognition may not require the identification of individual phonemes as an intermediate step; for multisyllabic words, perception may involve syllables as intermediate processing units.
In this model, the incoming acoustic speech pattern is subjected to analysis at lower levels of the auditory system. This yields information not only about the frequency and intensity distribution but also about the spectral characteristics of the signal over time.
NEUROLOGICAL THEORY (ABBS AND SUSSMAN, 1971) – PASSIVE THEORY
Corcoran (1971) explained passive processing of patterns as involving two stages:
a) Analysis of the patterns into their parts
b) Resynthesis of the processed parts back into a neurological representation of the entire stimulus.
This theory says that there are special cells in the brain that are sensitive to, and capable of analyzing, particular information.
The authors cite developmental evidence suggesting the existence of innate feature-sensing neuron systems whose development is stimulated by exposure to spoken language.
Mc Caffery (1967) and Moffitt (1971) showed that the auditory system of very young infants can discriminate between synthetic speech patterns of certain consonant sounds. A change in heart rate occurred on presentation of a second consonant sound after the infant had become accustomed (habituated) to the first. This indicates that the acoustic features were identified as dissimilar.
These neural detectors must be able to respond to spatiotemporal changes in the signal.
LATERAL INHIBITION:
Lateral inhibition is brought about by fibers of the efferent pathway, otherwise called the descending pathway.
In lateral inhibition, certain impulses (considered unwanted) are inhibited by the efferent system so that the necessary signals travelling through the ascending system are enhanced.
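The sharpening effect of lateral inhibition can be shown numerically. In this minimal sketch each channel's output is its own excitation minus a fixed fraction of its neighbours' excitation, clipped at zero. The inhibition weight k = 0.4, the nearest-neighbour wiring and the excitation profile are assumptions, and the sketch models only the net effect, not the efferent anatomy described above.

```python
def lateral_inhibition(excitation, k=0.4):
    """Subtract k times the nearest neighbours' excitation from each channel.

    Channels at the edges simply have one neighbour fewer. Negative results
    are clipped to zero, since a firing rate cannot be negative.
    """
    out = []
    for i, value in enumerate(excitation):
        left = excitation[i - 1] if i > 0 else 0.0
        right = excitation[i + 1] if i < len(excitation) - 1 else 0.0
        out.append(max(0.0, value - k * (left + right)))
    return out


# Assumed broad excitation pattern across nine neighbouring channels.
profile = [1, 2, 4, 7, 8, 7, 4, 2, 1]
sharpened = lateral_inhibition(profile)
print([round(x, 2) for x in sharpened])  # [0.2, 0.0, 0.4, 2.2, 2.4, 2.2, 0.4, 0.0, 0.2]
```

The ratio between the peak channel (8) and a flank channel two steps away (4) rises from 2:1 in the input to about 6:1 in the output, i.e., the pattern is sharpened.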
FEATURE DETECTOR THEORY (Abbs and Sussman, 1971):
It is concerned with the process of auditory decoding of the acoustic speech signal that results in phonetic identification.
Feature detectors are defined as ‘organizational configurations of the sensory nervous system that are highly sensitive to certain parameters of complex stimuli’; they respond simultaneously to multiple characteristics.
Spatial configurations of receptor cells located in the inner ear can be specially tuned to respond to formant patterns, especially formant transitions.
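A feature detector that responds simultaneously to multiple characteristics can be caricatured in code as a unit that fires only when several properties of a formant transition co-occur. This is a hypothetical sketch, not Abbs and Sussman's model: the class name, the thresholds (200 Hz extent, 60 ms duration) and the F2 track are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class TransitionDetector:
    """Hypothetical detector tuned to one direction of formant movement.

    It fires only when several stimulus properties co-occur: the right
    direction, a large enough frequency change, and a short enough transition.
    """
    direction: int          # +1 responds to rising, -1 to falling transitions
    min_extent_hz: float    # smallest frequency change that counts
    max_duration_ms: float  # transitions slower than this are ignored

    def responds(self, track_hz: list[float], step_ms: float) -> bool:
        extent = track_hz[-1] - track_hz[0]
        duration = step_ms * (len(track_hz) - 1)
        return extent * self.direction >= self.min_extent_hz and duration <= self.max_duration_ms


rising = TransitionDetector(direction=+1, min_extent_hz=200, max_duration_ms=60)
falling = TransitionDetector(direction=-1, min_extent_hz=200, max_duration_ms=60)

f2_track = [900, 1000, 1100, 1150, 1200]  # assumed F2 samples every 10 ms
print(rising.responds(f2_track, 10), falling.responds(f2_track, 10))  # True False
```

Stretching the same 300 Hz rise over 120 ms (30 ms steps) silences the rising detector as well, because the duration condition is no longer met.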
Speech stimuli would be processed differently from non-speech stimuli of equal complexity. Several other researchers provide evidence that speech sounds and non-speech sounds (other than rhythmic features) are processed in opposite hemispheres of the brain: speech mainly in the left and non-speech in the right.
MERITS
It explains theoretically how transition changes, coded into the spatiotemporal aspects of the acoustic wave, may be detected by the auditory system.
Each group of neural cells has a dynamic range; such an arrangement, augmented by the tuning action of lateral inhibition, could explain how the system identifies phonemes that differ by only one feature.
The feature detector model of speech perception can provide a direct explanation of a very intricate transduction and detection phenomenon: changing acoustic energy into coded neural energy at high rates of acoustic input.
Normalization has been explained through lateral inhibition.
Infant perception has been accounted for through innate feature-sensing mechanisms.
DEMERITS
Many of the studies mentioned here were done on animals and on the visual system, and their findings have been generalized to human beings and to the auditory system respectively.
In contrast to the studies showing that speech and non-speech sounds are processed differently, researchers have noted that material handled by different hemispheres is not necessarily processed differently.
The number of feature detectors required to process all of the acoustic features conveying a single phonological feature (for example, voicing) is inordinately large.
Storing every possible speech pattern in the brain, as this theory would require, is impossible.
AUDITORY THEORY (H.S. GOPAL & SYRDAL, 1986) – PASSIVE THEORY
Speech is perceived via complex auditory processing of the acoustic signal, not by relating it to the production apparatus.
It emphasizes the sensory, filtering mechanisms of the listener and relegates speech production knowledge to a minor, secondary role, in which it is used only in difficult perceptual conditions.
The auditory model was primarily intended to address two major issues in speech perception:
1. The mapping of phonetic features onto the acoustic signal
2. The normalization of acoustic variability for a given sound
Fant (1962) has modeled speech perception as primarily sensory. He maintains that the perceptual and
production mechanisms share a pool of distinctive features but that the listener need not refer to
production to perceive speech.
Fant (1962): listeners, having been exposed to language, are sensitive to the distinctive patterns of the
speech wave and only need to refer to their own ability to speak when shadowing or listening under
other unusual circumstances.
Morton and Broadbent (1967): listeners can decode directly, although reference to production may
be made when the perceptual task is difficult, as in transcribing speech phonetically.
Marler (1970): adult speakers are presumed to have stored abstract patterns of speech (templates of phonemes or syllables). When they listen to speech, they match the incoming auditory patterns to the stored templates to identify the sounds.
The auditory model of vowel recognition proposed by Syrdal and Gopal (1986):
When the acoustic signal enters the auditory system, it first sets up a pattern of excitation in the peripheral auditory system (the basilar membrane of the cochlea). This excitation pattern, set up by an acoustic signal consisting of formant frequencies (as well as harmonics), is first captured on a critical-band scale, i.e., the Bark scale.
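A minimal sketch of this first step, followed by a deliberately simplified vowel decision. Two things are assumptions brought in from outside this text: the Hz-to-Bark conversion uses Traunmüller's (1990) approximation, which may differ from the formula Syrdal and Gopal used, and the decision rule follows the commonly cited summary of their model, in which a Bark difference F1 - F0 below about 3 Bark marks a high vowel and F3 - F2 below about 3 Bark marks a front vowel. The formant and F0 values are illustrative, and the function names are hypothetical.

```python
def hz_to_bark(f_hz):
    """Convert Hz to Bark (Traunmueller 1990 approximation; assumed here)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def classify_vowel(f0, f1, f2, f3, critical_bark=3.0):
    """Simplified height / front-back decision from Bark differences.

    Covers only the two dimensions discussed in the text (height and place).
    """
    height_distance = hz_to_bark(f1) - hz_to_bark(f0)  # small -> high vowel
    place_distance = hz_to_bark(f3) - hz_to_bark(f2)   # small -> front vowel
    height = "high" if height_distance < critical_bark else "non-high"
    place = "front" if place_distance < critical_bark else "back"
    return height, place


# Illustrative (F0, F1, F2, F3) values in Hz; assumed, not measured here.
print(classify_vowel(130, 270, 2290, 3010))  # ('high', 'front'), e.g. /i/
print(classify_vowel(124, 730, 1090, 2440))  # ('non-high', 'back'), e.g. /a/
print(classify_vowel(141, 300, 870, 2240))   # ('high', 'back'), e.g. /u/
```

As the critique of this model points out, two binary dimensions cannot uniquely specify every vowel; the sketch only shows how a Bark representation turns formant distances into perceptual decisions.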
Perception of all speech sounds: It is applicable only to vowel perception; it does not account for consonant perception. Even for vowels, it considers only two features: vowel height and place of articulation.
Infant perception: It indirectly accounts for infant perception, given that infants can perceive speech sounds even before they start producing them.
Perception of speech vs. non-speech sounds: It does not talk about the perception of non-speech sounds.
Production and perception link: It cannot explain the rate disparity between production and perception.
MERITS
This model provides a perceptually based, quantitatively defined link between some acoustic and phonetic features.
It emphasizes the sensory filtering mechanisms of the listener.
It gives a better classification of vowels than other models.
It is supported by good psychoacoustic and speech perception data.
DEMERITS
This model talks only about vowel perception and does not talk about consonant perception.
Even for vowels, it addresses only height and place of articulation, so it does not completely and uniquely specify a given vowel.
It does not talk about coarticulation.
QUANTAL THEORY (STEVENS, 1972)
This theory belongs to neither the active nor the passive category. It deals with the relation between speech perception and articulatory changes.
The key observation about the vocal tract is that, as the constriction moves from the glottis to the lips, there are regions in which the acoustic output changes little, separated by large, abrupt discontinuities. Stevens thus treats these acoustic discontinuities not as perceptual ones but locates them in the actual formant changes.
It is concluded from the theory that the human auditory system is especially sensitive
to those acoustic changes that the human articulatory system produces.
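The quantal idea, that some stretches of articulatory movement barely change the acoustic output while others change it abruptly, can be illustrated with a toy mapping. The sigmoid relation between constriction position and formant frequency below is invented for illustration (it is not Stevens' vocal tract model), as are the 1000 and 2000 Hz plateaus; `toy_formant` and `sensitivity` are hypothetical names.

```python
import math


def toy_formant(position, low_hz=1000.0, high_hz=2000.0, centre=0.5, steepness=20.0):
    """Hypothetical quantal mapping: formant frequency vs. constriction position.

    Position runs from 0 (glottis) to 1 (lips). A steep sigmoid gives two
    stable plateaus joined by an abrupt region around `centre`.
    """
    return low_hz + (high_hz - low_hz) / (1.0 + math.exp(-steepness * (position - centre)))


def sensitivity(position, delta=0.01):
    """Approximate Hz of formant change per unit of articulatory movement (central difference)."""
    return (toy_formant(position + delta) - toy_formant(position - delta)) / (2 * delta)


for x in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(x, round(toy_formant(x)), round(sensitivity(x)))
```

Near the plateaus (positions 0.1 and 0.9) the toy formant moves only a few Hz per unit of articulatory movement, whereas near the centre it moves by several thousand, so small articulatory errors matter little in the stable regions and a great deal in the abrupt one.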
Implications of theories of speech perception:
1. Designing aids for the hearing impaired (HI).
2. Low-frequency transposition hearing aids (HA).
3. Work with speech-impaired individuals.
MOTOR THEORY – ACTIVE THEORY
People perceive spoken words by identifying the vocal tract gestures
with which they are pronounced rather than by identifying the sound
patterns that speech generates.
Speech perception is done through a specialized module that is
innate and human-specific.
The role of the speech motor system is not only to produce speech
articulations but also to detect them.
The theory was initially proposed in the Haskins Laboratories in the
1950s by Alvin Liberman and Franklin S. Cooper.
It was developed further by Donald Shankweiler, Michael Studdert-Kennedy, Ignatius Mattingly, Carol Fowler and Douglas Whalen.
ORIGIN & DEVELOPMENT
Associationist approach:
Infants were thought to mimic the speech they hear, and this mimicry was held to lead to behavioristic associations between articulation and its sensory consequences. Later, this overt mimicry would be short-circuited and become speech perception.
Cognitivist approach:
The behavioristic approach was replaced by a
cognitivist one in which there was a speech module.
The module detected speech in terms of hidden distal
objects rather than at the proximal or immediate level
of their input.
Changing distal objects:
Initially, speech perception was assumed to link to
speech objects that were both
1. the invariant movements of speech articulators
2. the invariant motor commands sent to muscles to
move the vocal tract articulators
This was later revised to include the phonetic gestures
rather than motor commands, and then the gestures
intended by the speaker at a prevocal, linguistic level,
rather than actual movements.
Modern revision:
The "speech is special" claim has been dropped, as it
was found that speech perception could occur for
nonspeech sounds (for example, slamming doors for
duplex perception).
Mirror neurons:
The discovery of mirror neurons has led to renewed
interest in the motor theory of speech perception, and
the theory still has its advocates, although there are
also critics.
SUPPORT
• Nonauditory gesture information
• Categorical perception
• Speech imitation
• Speech production
• Perception-action meshing
CRITICISM
• Multiple sources
• Production
• Speech module
• Sublexical tasks
REFERENCES
1. Sanders. Introduction to speech perception.
2. Gopal, H. S. (1992). Models of speech perception: An auditory approach to vowel recognition. JISHA, Vol. 9.
3. Borden & Harris. Speech science primer: Physiology, acoustics, and perception of speech (5th ed.).
4. Tatham & Katherine. Speech production and perception.
5. http://kunnampallilgejo.blogspot.com/2012/09/acoustic-theory-of-speech-perception.html?q=theories+of+speech+perception
6. https://en.wikipedia.org/wiki/Motor_theory_of_speech_perception
QUESTIONS ASKED IN PREVIOUS YEARS
1. Write a short note on Quantum Theory - 4 Mark (2019, 2011, 2009, 2006)
2. Critically evaluate the two passive theories of speech perception - 16 Mark (2019)
3. What are the different classifications of speech perception theories? Discuss the acoustic theory of speech perception and its relevance - 16 Mark (2017)
4. Which theory explains speech perception best? Justify your choice - 16 Mark (2015, 2014, 2013)
5. Critically evaluate the motor and quantum theory of speech perception - 16 Mark (2011)
6. Critically evaluate the acoustic theory of speech perception. What are its advantages over motor theory? - 16 Mark (2009)
7. Describe neurological theories. How does acoustic theory assist in understanding speech perception? - 16 Mark (2006)
8. Short note on source filter theory - 4 Mark (2022, 2021)
9. Discuss and critically evaluate motor theory of speech perception – 16 Mark (2022, 2018)
10. Explain with recent research how the information theory can be applied in the field of speech
and hearing – 16 Mark (2021)
11. Short note on TRACE theory – 4 Mark (2011)
12. Critically evaluate neurobiological theory of speech perception – 16 Mark (2011)
13. Short note on analysis by synthesis – 4 Mark (2017, 2009)
14. Describe motor theory of speech perception. What are its advantages and disadvantages? –
16 Mark (2016)
15. Short note on McGurk effect – 4 Mark (2011)