UNIVERSITY OF BAGHDAD
COLLEGE OF ARTS
DEPARTMENT OF ENGLISH
The Production of Language
ASEEL KAZUM MAHMOOD
Speech production is a process that begins when the talker formulates a message in
his/her mind to transmit to the listener via speech. The next step in this process is
the conversion of the message into the message code. This corresponds to
converting the message into a set of phoneme sequences corresponding to the
sounds that make up the words, along with prosody (syntax) markers denoting
the duration of sounds, the loudness of sounds, and the pitch associated with the sounds
(Rabiner & Juang, 1993).
There has been less research on language production than on language
comprehension. The investigation of production is perceived to be more difficult
than the investigation of comprehension, primarily because it is difficult to control
the input in experiments on production. It is relatively easy to control the
frequency, imageability, and visual appearance (or any other aspect that is
considered important) of the materials of word recognition experiments, but our
thoughts are much harder to control experimentally (Harley 2001).
Stages of language production:
Language production refers to the processes involved in creating and expressing
meaning through language (Fields, 2004).
Scovel (1998) mentions that the production of speech is neurologically and
psychologically far more complicated than negotiating a flight of stairs, but its
intricacy goes unappreciated until we suffer some linguistic disability or
commit a slip of the tongue. In daily conversations, we remain generally unaware
of the complexity of our achievement.
Psychologists tend to divide linguistic phenomena into stages. One of the most
influential psycholinguistic models of speech production, developed by Levelt
(1989), views it as a linear progression of four successive stages: (1)
conceptualization, (2) formulation, (3) articulation, and (4) self-monitoring. First, we
must conceptualize what we wish to communicate. Second, we formulate this
thought into a linguistic plan. Third, we execute the plan through the muscles in
the speech system. Finally, we monitor our speech to assess whether it is what we
intended to say and how we intended to say it.
Where does the very beginning of any spoken utterance come from? What
sparks speech? These are difficult questions to answer, partly because we still
don’t know enough about how language is produced, but partly because they
deal with mental abstractions so vague that they elude empirical
investigation. The American psycholinguist David McNeill, however, has
gone on record with an interesting mentalist account of how speech is first
conceptualized in the human mind. His theory is that primitive linguistic
concepts are formed as two concurrent and parallel modes of thought. These
are syntactic thinking, which spawns the sequence of words we
typically think of when we talk about how language is initiated, and
imagistic thinking, which creates a more holistic and visual mode of
communication. The former is segmented and linear and creates strings of
syllables, words, phrases and sentences that together make up speech. The
latter is global and synthetic and tends to develop the gestures which we
naturally use to punctuate and illustrate our conversation.
McNeill’s claim that syntactic thought and imagistic thought collaborate to
conceptualize conversations is quite convincingly demonstrated by the way
in which speech utterances and ordinary gestures seem to be tied and timed
together in any conversation.
E.g.: Person A: Where’s my briefcase?
Person B: There’s your briefcase!
Person B points to the briefcase at the same moment he says there’s.
The problem with mentalism is that the process by which imagistic and syntactic
thoughts are initially conceptualized is unclear. But McNeill offers some
plausible evidence by suggesting that syntactic thought may be generated by
beginning with something demonstrative (such as there’s in the example above),
while imagistic thought might be of someone pointing towards an object.
Appealing as McNeill’s hypothesis appears, it is difficult to use his model to
explain this first stage of production. For one thing, his attempts to describe
how imagistic and syntactic thought are initially conceptualized are
unclear. For another, the illustrations he uses to describe how gestures
synchronize with important syntactic breaks in spoken language are difficult
to follow; perhaps they can be adequately illustrated only by videotape and not in print.
We have seen that the initial stage of conceptualization is so far removed
from the words we actually speak and write that it is difficult to describe
this phase of production. But at the second stage of speech production,
formulation, we move close enough to the eventual output of the process
to allow us to be more precise in our terminology and more convincing in
our use of empirical data. Conceptualization is hard to conceptualize, but
formulation is much easier to formulate. Well over three decades ago, the
psychologist Karl Lashley published one of the first attempts to account for the
way speakers sequence strings of sounds, words, and phrases together so
rapidly and accurately. Lashley gave the following example of how we
comprehend spoken sentences:
Rapid righting with his uninjured hand saved from loss the contents of the capsized canoe.
Like all native speakers of any language, the listeners were able to readjust
their comprehension of this sentence. Hearing it aloud, they first took rapid
righting to mean rapid writing. After they recognized they had
initially wandered down the wrong garden path of comprehension, they were
forced to retrace their steps and to choose the proper path towards complete
understanding. Thus Lashley was able to demonstrate many of the themes
which were central to his seminal essay on speech production.
First, he showed how slips of the tongue (or of the computer keyboard)
provide vivid insights into how speech is formulated.
Second, he illustrated the power of priming in guiding the direction of
Third, he demonstrated that both the production and the comprehension of
speech are largely linear processes. People tend to produce and comprehend
sentences in a linear way, and in comprehension, each additional piece of
information we receive has the potential to force us to remap our
understanding of what we have already heard.
Once we have organized our thoughts into a linguistic plan, this information
must be sent from the brain to the muscles in the speech system so that they
can then execute the required movements and produce the desired sounds.
Obviously, a thorough explanation of articulatory processes would take us too far
afield here. However, it is useful to understand certain basic aspects of
articulation, which is carried out by three systems of muscles.
Three Systems of Muscles:
Fluent articulation of speech requires the coordinated use of a large number of
muscles. These muscles are distributed over three systems: the respiratory, the
laryngeal, and the supralaryngeal or vocal tract. The respiratory system
regulates the flow of air from the lungs to the vocal tract.
The laryngeal system consists of the vocal cords, or vocal folds. This system
is responsible for the distinction between voiced and voiceless sounds,
e.g. [b] vs. [p] (Ladefoged, 1976).
The muscles in and around the laryngeal region produce these changes by
manipulating the length, thickness, and tension of the vocal cords. This, in
turn, significantly influences the fundamental frequency of the sound that
results. In particular, the larynx seems to be involved in the increase in
frequency that occurs at the end of yes/no questions such as Did Tom mow the
lawn? (Lieberman, 1967).
The supralaryngeal system consists of structures that lie above the larynx,
including the tongue, lips, teeth, jaw, and velum. These structures play a
significant role in the production of speech by manipulating the size and shape
of the oral cavity (the mouth and pharynx) and the nasal cavity.
All of the structures involved in speech production have other functions. The
main function of the respiratory system is, of course, breathing. The teeth and
tongue are used to chew and swallow food. The larynx operates as a valve,
controlling the airflow to and from the lungs and preventing food from
entering the lungs.
Aitchison (1998) stresses the dramatic change in the larynx, or ‘voice box’,
which houses our vocal cords. Like all other speech organs, the larynx did not
initially evolve with the specific function of helping humans to articulate
language. For one thing, the vocal cords in all animals possessing a larynx
serve as a kind of emergency trap door which can prevent foreign matter, such
as bits of food, from falling from the mouth down the pharyngeal tube and
through the trachea into the lungs.
Lenneberg and others have documented several speech-enhancing
characteristics of the voice box that are unique to humans and absent in other
mammals. The most striking difference between humans and all other animals
in this area of the body is the position of the larynx. The advantage of the
lower voice box is that it enhances the articulation of speech. Because the
human larynx sits lower than in other animals, the pharynx is longer, and this
benefits the production of speech in at least two ways: it creates a new source of
speech sounds, and the pharyngeal tube also increases resonance by adding extra
acoustic space to the already existing oral and nasal cavities. Another enormous
benefit is the way the lowered larynx frees up the back of the tongue so that its
root can maneuver and create more speech sounds.
However, when these structures are used to produce speech, the pattern of
coordination is different. A major challenge for speech researchers is to explain
how so many different muscles are coordinated so smoothly during the
production of speech.
Motor Control of Speech:
Articulation begins with motor commands from the brain. As we assemble a linguistic plan
for our utterance, the brain structures responsible for speech production send
messages to the muscles in the respiratory, laryngeal, and supralaryngeal
systems. It is generally believed that these motor commands to speech muscles
take the form of commands for the articulators (tongue, lips, and so on) to move
to a particular location. If the next phonetic segment is [b], the muscles
controlling the lips must be brought into action.
One way to think of the motor commands, then, is that they specify a series of
target locations in the vocal tract. It is a simplification, however, to view
articulation as the production of a series of discrete sounds. Recall the concept
of coarticulation. This phenomenon refers to the condition that the shape of the
vocal tract for any given sound often accommodates to the shape needed for
surrounding sounds. This typically occurs for upcoming sounds (anticipatory
coarticulation) but also may occur when a sound is influenced by previous sounds.
The result of coarticulation is the undershooting of targets. When an articulator,
in anticipation of an upcoming sound, aims for a given location, it does
not actually achieve it. The main reason appears to be the distance the
articulators must travel to reach a series of rapidly changing targets. When
sounds are produced individually, the targets are reached; but when they are
articulated in a phonetic context, particularly one that involves antagonistic
movements, articulatory undershooting occurs (Sussman & Westbury, 1981).
Planning and Production Cycles:
Several studies have converged on the conclusion that we alternate between
planning speech and implementing our plans. Consider first a study performed
by Henderson, Goldman-Eisler, and Skarbek (1966), who analyzed the
hesitations and fluent speech of individuals being interviewed.
Henderson and his colleagues found that all of the participants showed a cycle
of hesitation and fluency, although the ratio of speech to silence varied among
speakers. These results are consistent with the notion that we plan our
utterances in cycles: we express a portion of our intended message, pause to
plan the next portion, articulate that portion, pause again, and so on (Beattie,
One underlying reason that we tend to hesitate during speech production is
that linguistic planning is very cognitively demanding, and it is difficult to plan
an entire utterance at once (Lindsley, 1975). As a consequence, we typically
plan only a portion of an utterance at a time.
Levelt (1983) found that pauses occurred more often before low-frequency
words than before high-frequency words. Another variable that influences
lexical retrieval, and therefore pauses during speech, is the sheer number of
words from which we choose. Schachter, Christenfeld, Ravina, and Bilous
(1991) found that during lectures humanists used more filled pauses (such as
uh, ah, or um) than social scientists or natural scientists. According to Schachter
and his colleagues, the humanities have a far richer vocabulary than the
social and natural sciences, so humanists have more words to choose among.
A different kind of variable that influences lexical retrieval during speech
production is the use of gestures. Gestures that accompany speech may help
speakers formulate coherent speech by facilitating the retrieval of elusive words
from the internal lexicon. Gestures are more common in spontaneous speech
than in rehearsed speech (Chawla & Krauss, 1994) and more common with
speech that contains concrete and spatial words, such as adjacent, cube, and
spin (Rauscher, Krauss, & Chen, 1996).
In addition to word frequency and size of vocabulary, such variables as
morphological complexity, lexical ambiguity, age of acquisition, and recency of
usage (that is, priming) also influence retrieval.
We have been talking of planning and production cycles as being in strict
alternation, but sometimes they overlap. Building on the work of Lindsley
(1975), Griffin (2001, 2003) has explored the circumstances under which we
articulate the beginnings of sentences while planning later parts.
A later study (Griffin, 2003) extended this line of thought. Speakers were
presented with line drawings and were asked to name the objects without
pausing between the names of the two objects (for example, windmill-carrot).
It thus seemed that speakers’ response times were sensitive to the fact that they
could prepare the second noun (such as carrot) while articulating the first, but
only if the first noun had two syllables. In short, we again see that speakers can
maintain fluent speech by preparing later portions of their sentences on the fly.
From time to time, we spontaneously interrupt our speech and correct
ourselves. These corrections are referred to as self-repairs. According to
Levelt (1983), self-repairs have a characteristic structure that consists of three parts.
First, we interrupt ourselves after we have detected an error in our speech.
Second, we usually utter one of various editing expressions. These include
terms such as uh, sorry, I mean, and so forth.
Finally, we repair the utterance. Let us consider each in turn.
Levelt (1983, 1989) distinguishes among three types of repairs.
Instant repairs consist of a speaker’s retracing back to a single troublesome
word, which is then replaced with the correct word, as in
-Again left to the same blank crossing point—white crossing point.
In anticipatory retracings, the speaker retraces back to some point prior to
the error, as in
-And left to the purple crossing point—to the red crossing point.
Finally, in fresh starts, the speaker drops the original syntactic structure and
just starts over, as in
-From yellow down to brown—no—that’s red.
In general, speakers repair their utterances in a way that maximizes listeners’
comprehension. When a speaker errs, the listener’s problem is not only to
understand the correction but also to fit the correction into the ongoing
discourse (Clark & Clark, 1977).
Fox Tree and Schrock (1999, p. 295) suggest that speakers use oh to signal to
their interlocutors that the conversation is about to change direction.
Sometimes oh is used as a sudden reaction to new or surprising information,
such as a surprise recollection or a surprise offer. As we have already seen, it
may also be used to indicate that the speaker is choosing what to say next, or
Self-Interruptions: Nooteboom (1980) examined a corpus of 648 speech errors
and made several interesting discoveries:
-He found that 64% of the errors were corrected. Some errors were more
likely to be corrected than others; anticipations were corrected more often
than perseverations. In addition, Nooteboom found that most interruptions
occurred very shortly after the error.
Nooteboom suggests that the timing of self-interruption after detection of an
error is based on two competing forces. On one hand, we have an urge to
correct the error immediately. On the other hand, we want to complete the
word we are speaking.
Editing Expressions: Although the matter could use further study, it appears
that the editing expression conveys to the listener the kind of trouble that the
speaker is correcting.
James (1972) analyzed utterances containing expressions such as uh and oh,
suggesting that these convey different meanings.
-I saw . . . uh . . . 12 people at the party.
-I saw . . . oh . . . 12 people at the party.
Du Bois (1974) has also analyzed several different editing expressions, as in:
-Bill hit him—hit Sam, that is. (a potentially ambiguous referent)
-I am trying to lease, or rather, sublease, my apartment. (nuance editing)
-I really like to—I mean, hate to—get up in the morning. (true errors)
These different editing expressions are not fully interchangeable, and the
expression that is used conveys the type of editing that the speaker is doing.
Levelt (1989) suggests that the expression uh may differ in some respects
from these other expressions: it is a symptom of trouble rather than a signal
with a specific communicative meaning. Speakers may simply utter uh when
they get stuck in the middle of their utterances. If it does not convey a
specific meaning, why say it at all? Perhaps uh, along with various
nonverbal cues such as averting one’s gaze, indicates to the listener that the
speaker still has the floor.
Speech production as a full process:
To summarize how the Levelt model (implemented computationally as WEAVER++)
works: production begins with a set of ideas that the speaker wishes to express,
the abstract level of conceptual preparation. Next, those ideas are tied to lexical
concepts, because a language may have specific words for some of the ideas but
may require combinations of words to express others. After a set of lexical
concepts has been activated, the lemmas that correspond to those lexical concepts
become activated. Activated lemmas provide information about the morphological
properties of words, including information about how words can be combined.
After a set of morphemes has been activated and organized into a sequence, the
speech sounds (phonemes) required can be activated and placed in a sequence.
Phonological encoding involves the activation of a metrical structure and
syllabification, which organizes a set of phonemes into syllable-sized groups
whether or not the phonemes come from the same word. The outcome of this
process is a set of phonological words consisting of sequences of syllable-sized
frames. During phonetic encoding, the speech production system consults a set of
stored representations of specific syllables, activates the appropriate syllable
representations, and places them in the appropriate positions in the frame. This
representation is used to create a phonetic gestural score, which is the
representation the motor system uses to plan the actual muscle movements of
articulation that will create sounds that the listener will perceive as speech.
-Concepts point you to lemmas.
-Lemmas point you to the morphological information you need to combine lemmas
and to express a specific set of lemmas in specific forms.
-Morphological encoding points you to speech sounds (phonemes).
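The staged flow summarized above can be illustrated with a toy Python sketch. It is not the WEAVER++ implementation: the two-word lexicon, the concept labels, the letter-based "phonemes", the one-consonant-onset syllabification rule and the gesture(...) entries of the syllabary are all hypothetical simplifications. The sketch only shows the order of the stages (lexical concept, lemma, morpheme, syllabified phonological word, gestural score) and how syllabification can cross a word boundary in a phrase such as escort us.

```python
"""Toy sketch of the stage order in Levelt's model, as summarized above.

All lexicon entries, concept labels, letter-based "phonemes" and the
syllabification rule are hypothetical simplifications for illustration.
This is not the WEAVER++ implementation.
"""

VOWELS = set("aeiou")

# Hypothetical lexical concepts -> lemmas (with a syntactic category).
CONCEPT_TO_LEMMA = {
    "ESCORT": {"lemma": "escort", "category": "verb"},
    "SPEAKER_AND_OTHERS": {"lemma": "us", "category": "pronoun"},
}

# Hypothetical lemma -> morphemes (both words are single morphemes here).
LEMMA_TO_MORPHEMES = {"escort": ["<escort>"], "us": ["<us>"]}

# Hypothetical morpheme -> phoneme segments, written with plain letters.
MORPHEME_TO_PHONEMES = {
    "<escort>": ["e", "s", "k", "o", "r", "t"],
    "<us>": ["u", "s"],
}

# Hypothetical syllabary: stored articulatory routines for known syllables.
SYLLABARY = {"es": "gesture(es)", "kor": "gesture(kor)", "tus": "gesture(tus)"}


def select_lemmas(concepts):
    """Lexical selection: each lexical concept activates its lemma."""
    return [CONCEPT_TO_LEMMA[c]["lemma"] for c in concepts]


def encode_morphemes(lemmas):
    """Morphological encoding: retrieve the morphemes of each lemma in order."""
    return [m for lemma in lemmas for m in LEMMA_TO_MORPHEMES[lemma]]


def syllabify(morphemes):
    """Phonological encoding: spell out segments, then group them into syllables.

    Segments of all words are pooled into one phonological word, so syllable
    boundaries may cross word boundaries. Simplifying assumption: each vowel
    takes at most one preceding consonant as its onset; other consonants stay
    in the coda of the previous syllable.
    """
    segments = [p for m in morphemes for p in MORPHEME_TO_PHONEMES[m]]
    vowel_positions = [i for i, seg in enumerate(segments) if seg in VOWELS]
    starts = []
    for k, v in enumerate(vowel_positions):
        if k == 0:
            starts.append(0)  # any initial consonants join the first syllable
        elif segments[v - 1] not in VOWELS:
            starts.append(v - 1)  # one consonant becomes the onset
        else:
            starts.append(v)  # two adjacent vowels: no onset
    ends = starts[1:] + [len(segments)]
    return ["".join(segments[s:e]) for s, e in zip(starts, ends)]


def phonetic_encoding(syllables):
    """Phonetic encoding: look up each syllable's stored gesture."""
    return [SYLLABARY[s] for s in syllables]


def produce(concepts):
    """Run the stages in order and return the output of each one."""
    lemmas = select_lemmas(concepts)
    morphemes = encode_morphemes(lemmas)
    syllables = syllabify(morphemes)
    gestural_score = phonetic_encoding(syllables)
    return lemmas, morphemes, syllables, gestural_score


if __name__ == "__main__":
    for stage_output in produce(["ESCORT", "SPEAKER_AND_OTHERS"]):
        print(stage_output)
```

Running the script prints the output of each stage in turn. The syllables come out as es, kor, tus: the final t of escort is resyllabified as the onset of us, because syllabification in this sketch operates on the whole phonological word rather than on each word separately.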
Evidence for WEAVER++:
Evidence comes from three kinds of studies:
-tip-of-the-tongue (TOT) experiences
-picture-naming studies
-picture-word interference studies (Traxler, 2012).
Insights from sign language :
Here in the final section we look at the production of sign language. The
production of signs is important theoretically because it gives us an opportunity to
disentangle the cognitive processes involved in translating thought into language
from the physical characteristics of our speech apparatus. Speech shares the vocal
channel with respiration; in contrast, sign production can occur entirely in parallel
with, and unimpeded by, respiratory activity. Thus, consideration of sign
production in comparison with speech production can yield insights into some of
the biological limits on linguistic form (Bellugi & Studdert-Kennedy, 1980).One
striking similarity is that errors occur in signing that strongly resemble those found in speech.
Independence of Parameters.
Morpheme Structure Constraints.
Studies of production rate, in contrast, reveal differences between the two
modes. Speakers achieve differences in speech rate primarily by varying the
number of pauses, whereas signers vary the duration of signed segments and both
the duration and number of pauses. These dissimilarities reflect the effects of
respiratory functioning on speech but not on signs.
References:
Aitchison, J. (1998). The articulate mammal: An introduction to
psycholinguistics. London: Routledge.
Carroll, D. (2003). Psychology of language (6th ed.).
Fields, J. (2004). Psycholinguistics: The key concepts. London: Routledge.
Harley, T. (2001). The psychology of language. Sussex: Psychology Press.
Ladefoged, P. (1976). A course in phonetics. New York.
Lieberman, P. (1967). Intonation, perception, and language. Cambridge.
Rabiner, L. R., & Juang, B. H. (1993). Fundamentals of speech recognition.
Englewood Cliffs, NJ: PTR Prentice Hall.
Scovel, T. (1998). Psycholinguistics. Oxford: Oxford University Press.
Traxler, M. J. (2012). Introduction to psycholinguistics: Understanding
language science. Chichester, West Sussex: Wiley-Blackwell.