
MULTIDIMENSIONAL STOP CATEGORIZATION IN ENGLISH,
SPANISH, KOREAN, JAPANESE, AND CANADIAN FRENCH
Eric Nathanael Oglesbee
Submitted to the faculty of the University Graduate School
in partial fulfillment of the requirements
for the degree
Doctor of Philosophy
in the Department of Linguistics,
Indiana University
July 2008

Accepted by the Graduate Faculty, Indiana University, in partial fulfillment of the
requirements for the degree of Doctor of Philosophy.
_________________________________________
Kenneth de Jong, Ph.D.
(chair)
_________________________________________
Daniel A. Dinnsen, Ph.D.
_________________________________________
Robert F. Port, Ph.D.
_________________________________________
Diane Kewley-Port, Ph.D.
Bloomington, Indiana
July 10th, 2008

© 2008
ALL RIGHTS RESERVED

Dedication
For my son, β…
I wish you were here to share this with me.

Acknowledgments
Although my name is the only one that appears on the title page, in truth this work
belongs to the countless people who have invested their lives in my own over the last 29
years. As with all acknowledgment sections, this one is woefully inadequate, and I
apologize to anyone who feels that they should have been included but is not explicitly
mentioned. My acknowledgments span nearly every aspect of my life from the last few
years and are not limited to my academic interactions. I have broken things down into
three broad categories: Educational, Personal, and for lack of a better word, Special.
Educational
Over the last 6 years I have received funding from the following sources:
1. Indiana University department of Mathematics (2 fall semesters; 2 spring semesters)
2. Indiana University department of Linguistics (2 summers; 1 spring semester)
3. Steve Chin: NIH Grant 5R01DC005594-03 (1 fall semester; 1 spring semester)
4. Diane Kewley-Port: NIH Grant R01-DC02229 (2 summers)
5. Kenneth de Jong: NSF Grant 0446540 (1 summer; 3 fall semesters; 2 spring semesters)
At no point over the last 6 years did I go without funding, for which I am extremely
grateful.
With respect to my dissertation, I would like to thank the 55+ participants whom I
subjected to varying degrees of boredom and/or frustration, as well as those who
provided translation assistance (Scott Lamanna, Audrey Liljestrand, Hanyong Park, and
Kenji Yoshida) for developing multi-lingual experimental interfaces. Without these
people, there would have been nothing to write about for a few hundred pages.
More generally, I want to acknowledge some of my other fellow graduate
students, without whom this would have been a very dull experience. In Linguistics, this
includes Tossi “Whopper” Ikuta, Thora Asgeirsdottir, and Brian Riordan. Some people
who deserve special recognition are Ashley Farris-Trimble (who is quite possibly my
twin sister), Indrek Park (who talked us into going to Beijing for a couple months) and
Noah Silbert (who probably thinks I understand more of his work than I really do). From
the Math department, where I spent a lot of my time early on, I would like to
acknowledge all of my office mates from the “crack house”. In particular, I am grateful
for the friendship (and distractions) provided by Shawn Alspaugh, David Meier, and Rob O’Connell.
Some often overlooked people who deserve to be mentioned are the secretaries in
Linguistics: Marilyn Estep, Jan Cobb, and the late Ann Baker. These people make the
world go round.
Three professors from my undergraduate education also deserve to be mentioned:
Robert Talbert, Tim Pennings, and Fred Long. Robert oversaw much of my
undergraduate mathematics education, and was instrumental in directing me towards an
NSF Research Experience for Undergraduates (REU) at Hope College that proved to be a
catalyst in developing my independent research skills. Tim was the professor in charge of
the REU. Fred’s New Testament courses at Bethel College substantially enhanced my
ability to wrestle with complicated, multifaceted material, and communicate my
observations in a coherent fashion. The skills I learned from these three individuals
formed the foundation for my success in graduate school.

Finally, I would like to thank each of the members of my dissertation committee:
Kenneth de Jong (chair), Diane Kewley-Port, Daniel Dinnsen, and Robert Port. Ken,
Diane, and Dan also served as my advisory committee during the proposal stage. It has
been a privilege to be associated with each one of these people, and the skills I have
learned from them have been invaluable. The synthesis work and GUI development in
Matlab used in this dissertation would not have been possible without Diane’s Digital
Signal Processing (DSP) course, and subsequent work in her lab. Dan’s courses in
phonology sparked my interest in the sound systems of the world’s languages, and Bob’s
willingness to oversee a summer project (and resulting ASA presentation) following my
first year gave me the confidence to succeed in all of the other projects I worked on over
the last five years. My thanks to Ken are delayed until the section on “Special”
acknowledgments.
Personal
This may be silly, but I would like to thank 92.3 WTTS for providing the
soundtrack to my dissertation, and more generally, my time in Bloomington. I have been
able to discover (and rediscover) a host of fantastic music, and it was their internet
broadcast that kept me sane during the long hours I spent in the Speech Psychophysics
Lab (SPL) working on projects for Diane.
The free wireless internet and cheap coffee at Panera Bread and Copper Cup
helped me to not go insane during the writing process. Scotty’s Brewhouse, Nuestro
Mexico, and Avers were comfort food refuges that could always make a bad day a little
bit better.
I would also like to thank all of my extended family for their encouragement
throughout the last six years. Though somewhat unnerving, it was nice to have so many
of them at my defense.
Finally, a special thanks needs to go to the congregation and staff at Evangelical
Community Church (ECC). There were some bumpy times along the road, and there
were so many people there who came alongside us during different times of need and
disappointment. In particular, Bill and Lucy James became family to us. They were there
to celebrate some of our happiest times, and they cried with us during the hardest hour of
our life.
Special
With this being an acknowledgment section, it would not be complete if I did not
explicitly acknowledge the author and perfector of my faith, Jesus Christ. This work, and
all that I am, are His; I am simply a steward.
Finally, there are two people who fall into a category all their own:
Ken de Jong was far more than simply my academic advisor and the chair of my
dissertation committee. Ken was a mentor to me in all aspects of life: marriage,
parenting, faith, etc. Although I am happy to be finished, I am sad that the sack lunch
chats about everything under the sun are a thing of the past. No one has a sense of humor
or perspective on life quite like Ken’s. I hope that I can be 1/10th of the
father/husband/teacher that he is.

Last, but by no means least, my wife needs to be acknowledged. To provide some
context, 6 months before we were married we bought a house in South Bend to renovate
because I was going to attend Notre Dame to get a Ph.D. in Mathematics. A couple
months after closing on the house, it became clear that God was leading me away from
Mathematics and into something else. When I broke the news to her that I didn’t want to
pursue Math, but rather something in Linguistics (I had read a chapter out of an
introductory linguistics book and found it interesting), she was initially not amused.
However, she very quickly came on board, and supported me in applying to graduate
school at IU, even though I had no background in Linguistics. The following year, we
packed up and moved to Bloomington. During our time there, Lisa did whatever needed
to be done to support me in my studies. Along the way she picked up her own M.A. in
TESOL, which opened the door for our return to Bethel College. Above all things, I am
simply grateful to her for keeping me human during the last six years, which I’m sure
was not always a fun task.

MULTIDIMENSIONAL STOP CATEGORIZATION IN ENGLISH, SPANISH,
KOREAN, JAPANESE, AND CANADIAN FRENCH
This thesis introduces a novel approach for identifying best exemplars and individual cue sensitivities in a
multidimensional stimulus space, and uses this new method to investigate acoustic correlates of labial stop consonant
voicing across five languages: English, Spanish, Korean, Japanese, and Canadian French.
Experiment 1 reports best exemplars and cue sensitivities for each language (3 listeners per language) in a
six-dimensional stimulus space designed to examine stop contrasts. Using speech resynthesis techniques, six cues were
varied: VOT, formant transitions, F0 register, F0 initial contour, vowel amplitude contours, and burst release intensity.
Results differed across languages with respect to the types of cues used, as well as general best
exemplar locations. General utilization of the stimulus space is discussed, as well as the tractability of the search
procedure. A preliminary grouping of contrasts by cue sensitivity is given.
Using a subset of the listeners who participated in Experiment 1 (12 listeners), Experiment 2 examines the
relationship between individual cue sensitivities and category contrasts in production. Frame sentences were created for
each language in order to elicit phonologically focused and non-focused labial stops. Quantifications of four of the six
acoustic cues studied in Experiment 1 (VOT, F0 register, F0 initial contour, and vowel amplitude contour) were
examined relative to the production distributions. Results showed that separability of non-focused production
distributions, as well as focus effects, were inconsistent predictors of the perceptual relevance of a given acoustic cue.
Experiment 3 presents a refined picture of best exemplar and cue sensitivities for four languages: English,
Spanish, Japanese, and Korean. A five-dimensional stimulus space similar to Experiment 1 was used; the burst
intensity dimension was removed, and the VOT and F0 initial contour dimensions were expanded. Ten native listeners
from each language group participated. Results showed that VOT and F0 were the main perceptually relevant cues, and
that their relative importance and best exemplar locations varied gradiently across contrasts. A two-level approach for
typologizing stop voicing contrasts is proposed.
In the final chapter, possible enhancements to the search procedure, final comments concerning relevance to
linguistic theory, and future directions for research are given.
_____________________________________________
_____________________________________________
_____________________________________________
_____________________________________________

Dedication
Acknowledgments
CHAPTER 1: MULTIDIMENSIONAL STIMULUS SPACES AND SPEECH PERCEPTION
I. INTRODUCTION
1. Background
2. Purpose
3. Importance of multiple dimensions in linguistic categorization
II. PREVIOUS APPROACHES FOR EXAMINING LINGUISTIC CATEGORY LOCATIONS
1. Forced-choice labeling
2. Listener-directed search
3. Goodness-driven search algorithms
III. TEST CASE: LARYNGEAL CONTRASTS
IV. DISSERTATION CONTENTS
CHAPTER 2: ALGORITHM FOR FINDING MULTIDIMENSIONAL BEST EXEMPLAR LOCATIONS (AMBEL)
I. INTRODUCTION
II. IVERSON AND EVANS (2003)
1. Background
2. Search algorithm
3. Generalization issues
3.1. The existence of a “neutral” stimulus
3.2. A priori knowledge of the location of production tokens within the stimulus space
3.3. A priori knowledge of covarying stimulus dimensions
3.4. Conclusions
III. ALGORITHM FOR FINDING MULTIDIMENSIONAL BEST EXEMPLAR LOCATIONS (AMBEL)
1. General procedure
2. Implementation
2.1. Key features
2.2. Summary of similarities and differences between AMBEL and Iverson and Evans (2003)
3. Data types produced by AMBEL
3.1. Destination points (i.e. best exemplar locations)
3.2. Individual cue sensitivities
3.3. Algorithm tracking performance
4. Augmenting AMBEL: Post-test description
5. Limitations and generalizability
IV. EVALUATION OF AMBEL
1. Random goodness judgments
1.1. Methods (random goodness judgments)
1.2. Results (random goodness judgments)
1.2.1. Effect of dimension size
1.2.2. Effect of initial point and number of iterations
1.3. Discussion (random goodness judgments)
2. Equal goodness judgments
2.1. Methods (equal goodness judgments)
2.2. Results (equal goodness judgments)
2.3. Discussion (equal goodness judgments)
V. CONCLUSION
CHAPTER 3: VOICING CUES AND STIMULUS SPACE DESIGN
I. INTRODUCTION
II. CUES SELECTED FOR STUDY
1. List of cues
2. Rationale for specific cues
2.1. Gestural timing: VOT and formant transitions
2.2. Source characteristics: F0 initial contour and F0 register
2.3. Intensity: Burst release amplitude and vowel amplitude contour
III. PREVIOUS RESULTS
1. English
2. Japanese
3. Latin American Spanish
4. Canadian French
5. Korean
6. Summary tables
IV. STIMULUS SPACE DESCRIPTION
1. Experiment 1
1.1. VOT (19 levels)
1.1.1. Prevoicing
1.1.2. Aspiration
1.2. Formant transition contours (7 levels)
1.3. F0 register (7 levels)
1.4. F0 initial contour (5 levels)
1.5. Burst release amplitude (5 levels)
1.6. Amplitude contour of the following vowel (11 levels)
2. Experiment 3
2.1. VOT (25 levels)
2.2. Formant transition contours (7 levels)
2.3. F0 register (7 levels)
2.4. F0 initial contour (11 levels)
2.5. Burst release amplitude (1 level)
2.6. Amplitude contour of the following vowel (11 levels)
3. Summary of stimulus values for experiment 1 (chapter 4) and experiment 3 (chapter 6)
CHAPTER 4: EXPERIMENT 1 (PERCEPTION)
I. INTRODUCTION
II. METHODS
1. Subjects
2. Stimuli
3. Procedure
4. Analysis
4.1. Destination points
4.2. Individual cue sensitivities: Sensitivity metric
4.3. Tracking performance: Derived point goodness
III. RESULTS
1. Stimulus space usage
1.1. Destination point distribution: Gestural timing dimensions
1.2. Destination point distribution: Source characteristics dimensions
1.3. Destination point distribution: Intensity dimensions
2. Algorithm performance
3. Language specific results
3.1. Dimensional sensitivity
3.2. Destination Points
3.2.1. English
3.2.2. Japanese
3.2.3. French
3.2.4. Spanish
3.2.5. Korean
IV. DISCUSSION
1. Stimulus space coverage and usage
2. Tractability of search procedure
3. Language profiles
V. CONCLUSION
1. Summary
2. Preview of chapters 5 and 6
CHAPTER 5: EXPERIMENT 2 (PRODUCTION)
I. INTRODUCTION
1. Background: Focus variation as a diagnostic for contrastive function
2. Current study
II. METHODS
1. Subjects
2. Experimental tasks
3. Production measurements
3.1. VOT
3.2. Fundamental frequency (F0)
3.3. Vowel amplitude contour
III. RESULTS
1. Measured production values: non-focus condition
2. Focus effects
3. Relationship between separability, focus effects, and perceptual sensitivity
3.1. English
3.2. Japanese
3.3. French
3.4. Spanish
3.5. Korean
3.5.1. Fortis vs. Lenis
3.5.2. Fortis vs. Aspirated
3.5.3. Lenis vs. Aspirated
IV. DISCUSSION
1. Interaction between consonant voicing specifications and focus
2. Use of non-focus separability and focus effects as diagnostics for identifying the relevance of acoustic cues for a given contrast
3. Plausibility of perceptual results in chapter 4
V. CONCLUSION
CHAPTER 6: EXPERIMENT 3 (PERCEPTION)
I. INTRODUCTION
II. METHODS
1. Subjects
2. Stimuli
3. Procedure
3.1. AMBEL
3.2. Post-test
3.3. Changes to experiment 1 interface
4. Analysis
4.1. AMBEL
4.2. Post-test
III. RESULTS
1. Individual cue sensitivity: cross-linguistic comparisons
1.1. VOT
1.2. Formant...........................................................................................................................................................173
1.3. Amplitude contour .........................................................................................................................................175
1.4. F0 register.......................................................................................................................................................177
1.5. F0 initial contour............................................................................................................................................179
2. Destination points: All.........................................................................................................................181
2.1. Category locations..........................................................................................................................................182
2.2. Within-language category comparisons .......................................................................................................186
2.3. Cross-language category comparisons..........................................................................................................187
2.3.1 . VOT.......................................................................................................................................................188
2.3.2 . Formant .................................................................................................................................................189
2.3.3 . Amplitude contour................................................................................................................................191
2.3.4 . F0 register .............................................................................................................................................193
2.3.5 . F0 initial contour ..................................................................................................................................195
3. Destination points: Best.......................................................................................................................196
3.1. Category locations..........................................................................................................................................196
3.2. Within-language comparisons.......................................................................................................................199
3.3. Cross-language comparisons.........................................................................................................................200
3.3.1 . VOT.......................................................................................................................................................201
3.3.2 . Formant .................................................................................................................................................202
3.3.3 . Amplitude contour................................................................................................................................204
3.3.4 . F0 register .............................................................................................................................................205
3.3.5 . F0 initial contour ..................................................................................................................................207
IV. DISCUSSION .............................................................................................................................................208
1. Difference between using all and best destination points .................................................................208
1.1. Within-language similarities/differences......................................................................................................209
1.2. Cross-linguistic similarities/differences .......................................................................................................211
2. Language groupings ............................................................................................................................215
3. Pre-voicing preference for Korean Lenis stops .................................................................................217
4. Japanese and Korean category mismatch..........................................................................................218
V. CONCLUSION .............................................................................................................................................220
CHAPTER 7 : CONCLUSION......................................................................................................................222
I. INTRODUCTION ...........................................................................................................................................222
II. METHODOLOGY.........................................................................................................................................222
1. Implementing AMBEL in spaces containing large numbers of categories ......................................222
1.1. Initial points....................................................................................................................................................223
1.2. Search vectors.................................................................................................................................................224
2. Using Method of Adjustment (MOA) tasks in conjunction with AMBEL .........................................225
III. LINGUISTIC THEORY................................................................................................................................227
IV. FUTURE RESEARCH .................................................................................................................................230
V. FINAL THOUGHTS .....................................................................................................................................232
APPENDIX A : AMBEL INSTRUCTIONS AND GUI SCREENSHOTS .............................................233
APPENDIX B : PRODUCTION PROMPTS FOR EXPERIMENT 2 (CHAPTER 5).........................254
APPENDIX C : SUPPLEMENTARY FIGURES FOR CHAPTER 6 (BEST VS. ALL
DESTINATION POINTS)..............................................................................................................................266
REFERENCES .................................................................................................................................................278

Chapter 1 : Multidimensional Stimulus Spaces and
Speech Perception
I. Introduction
1. Background
Since Chomsky and Halle’s Sound Pattern of English (SPE) (1968), considerable
attention has been given to understanding the mapping between language-universal
phonological features, and their instantiations as language-specific phonetic features. In
the SPE model, the same articulatorily based feature set is used to describe sounds at both
the phonological and phonetic levels:
“…the features have a phonetic function and a classificatory [phonological]
function. In their phonetic function, they are scales that admit a fixed number of
values, and they relate to independently controllable aspects of the speech event
or independent elements of perceptual representation. In their classificatory
[phonological] function they admit only two coefficients, and they fall together
with other categories that specify the idiosyncratic properties of lexical items.
(SPE, p. 298)”
In contrast to Jakobson, Fant, and Halle (1952), this approach makes the claim that sound
systems are structured according to the articulation of segments as opposed to their
acoustics:
“…we have not adopted it here as it conflicts with our conception of phonetic
features as directly related to particular articulatory mechanisms. (SPE, p.
326, Footnote 30, emphasis added)”
Although the same feature set is assumed for all languages, implementation variability
due to context or language is possible because phonetic features are not constrained to a
binary representation like phonological features:
“This does not mean that the phonetic features into which the phonological
features are mapped must also be binary. In fact, the phonetic features are
physical scales and may thus assume numerous coefficients, as determined by
the rules of the phonological component. However, this fact clearly has no
bearing on the binary structure of the phonological features, which, as noted, are
abstract but not arbitrary categorial markers. (SPE, p. 296, emphasis added)”
Implicit in the SPE model is the notion that even though a phonetic feature can
have different values due to language or context, the physical scale itself should not
change. This suggests that identical cross-linguistic feature contrasts should not be
implemented in a manner where the acoustic cues for the contrast result from grossly
different articulatory gestures.
In an attempt to provide a more “structured” connection between the phonological
and phonetic levels of representation in the SPE model, Keating (1984) proposed the
introduction of what she termed “phonetic categories”. Essentially, this was an attempt to
constrain the possible set of values that a phonetic feature could take on as a result of the
settings of the corresponding phonological feature. Using the feature [voice] as an
example, Keating (1984) believed that, due to inherent properties of the auditory system
as well as observations across multiple languages, the +/- phonetic implementation of
the feature [voice] involved an opposition between two of three possibilities: {voiced},
{voiceless unaspirated}, and {voiceless aspirated}. In the case of [voice], the phonetic
categories generally correspond with the observations of Lisker and Abramson (1964)
with regard to Voice Onset Time (VOT). Keating (1984) leaves open the question of whether
[voice] is binary or ternary; however, in the case where there are more than three
stops at a place of articulation, she indicates that some feature other than [voice] must be
involved. What is important to note about Keating’s (1984) discussion of [voice] is that
in the SPE framework, glottal gesture timing, as embodied by VOT and closure duration,
has often been assumed to be the primary physical scale that is relevant to phonetic
implementation of stop voicing contrasts.
One of the points raised by Keating (1984) is that there are actually a number of
possible “acoustic correlates and articulatory mechanisms” relevant to the phonetic
categories she proposed. This raises the question of redundancy of physical scales for
signaling a feature contrast. In particular, if there are additional acoustic dimensions in
which consistent production/perception differences are correlated with a phonological
feature’s specification, are the values on each physical scale redundant with respect to
one another? If the answer is “no”, then the implicit assumption in the SPE model that
each feature is mapped onto a single physical scale does not hold. A consequence of this
failed assumption would be that the implementation of phonological features would need
to be examined simultaneously across multiple physical scales.
Looking again at stop voicing in English, the existence of trading relations
between VOT and other acoustic cues such as F1 transition, aspiration amplitude, and F0
onset frequency (Repp, 1982) supports the notion of multiple physical scales being
important for a given contrast. In the particular case of F0 onset frequency, it has been
demonstrated that consistent production differences cannot be completely ascribed to
“involuntary consequences of voicing gestures” (Francis, Ciocca, Wong, and Chan,
2006), suggesting that speakers are able to manipulate F0 onset as a cue for initial stop
voicing. Because VOT and F0 onset can be traded against one another in English, and
given that a speaker can actively manipulate both cues, this seems to go against the
implicit mapping assumptions in SPE and Keating (1984).
In principle, since it appears to be the case that more than one non-redundant
physical scale is relevant to stop voicing in a fixed prosodic context (word-initial) in a
particular language (English), the question now becomes: how many non-redundant
phonetic scales are there for a given phonological contrast? Across languages, the
question becomes even more complicated; different languages could have gradient
differences in the utilization of a particular cue¹, or a cue that is important in one
language may be irrelevant in another. Given the close tie between phonological and
phonetic features in the SPE model, understanding how multiple physical scales interact
with one another both within and across languages becomes a key component of
developing and evaluating phonological feature sets. Since distinctive feature sets are
built around the idea of isolating primary contrasts between segments, it seems that
identifying the relevant cues for the perception of speech contrasts should provide a
clearer understanding of the structure of sound systems that cannot be obtained from
impressionistic transcription or production measurements alone.
¹ In Strange, Bohn, Trent, and Nishi (2004), speakers of American English were more
sensitive to spectral differences in vowels as opposed to duration when assimilating
North German vowels. When the two cues were in conflict, the English speakers
assimilated vowels based on spectral properties, even though German heavily weights
duration.
2. Purpose
One great challenge for understanding human speech perception is to understand
how a complex speech signal with many dimensions of variation can be mapped onto
single differences in categories. Put another way, in a speech signal where it is possible to
analytically identify numerous consistencies, how does one know what cue (or cues) a
listener is using when perceiving a contrast? It is not difficult to compile a large
candidate list of possible cues. For example, in the case of stop “voicing” in intervocalic
position, Lisker (1986) identifies 16 possible closure, pre-closure, and post-closure cues
that may influence responses of English listeners. The difficulty lies in evaluating
multiple cues simultaneously in a single perception experiment. At present, a tractable,
generalizable approach for doing this has yet to be demonstrated.
Therefore, the purpose of the current research is to provide a generalizable first-
approximation solution for the problem of examining multiple phonetic dimensions in a
single perception experiment. This is done by proposing a novel approach for finding best
exemplars of speech categories in a multidimensional stimulus space, and then testing it
using both computer simulations and human listeners. Results from a production
experiment are reported as an additional way of evaluating the performance of the
proposed search algorithm.
3. Importance of multiple dimensions in linguistic categorization
Exploring categorization in multiple acoustic dimensions is desirable for a
number of reasons. First, implementing searches in higher dimensional stimulus spaces
makes it possible to identify multiple cues for a given contrast and examine their cross-
linguistic impact. It is not guaranteed that two different languages will manifest a
phonologically similar contrast in a phonetically similar manner (Shimizu, 1989). By
studying categorization in the context of higher dimensional spaces it is possible to
capture differences between languages that could be muddied or lost in a lower
dimensional space.
Second, an in-principle problem with lower-ordered stimulus spaces (i.e. small
number of dimensions) is that stimuli have to be tailored to the language-specific contrast
being tested. Cross-linguistically, this means either (a) using the same stimulus space for
multiple languages, and hoping that the space properly spans the categories being
examined in all of the languages, or (b) using different stimulus spaces for different
languages. The problem with (a) is that for a given language it may be possible to elicit a
category shift in listener responses without manipulating a dimension that is normally
important for categorization. For native English listeners, Oglesbee and Kewley-Port (in
review) were able to resynthesize /ih/ - /eh/ and /ah/ - /uh/ vowel continua where only the
first or second formant was shifted, yet subjects were able to consistently label continua
endpoints with one category or the other. It is probable that many cases exist where
category shifts can be elicited without varying all of the relevant acoustic information for
a contrast. Consequently, a pair of languages could look similar in how categories are
oriented in a lower-ordered stimulus space, yet one of the languages may have an
additional cue for the contrast that is as strong as, or stronger than, the one manipulated in
the stimulus set. The problem with (b) is that a direct comparison between languages is
not possible; relationships between languages have to be inferred from perception results
on different sets of stimuli. In this situation, interaction effects between critical
dimensions cannot be examined, resulting in an incomplete picture of what factors
influence listeners when they identify a speech sound as belonging to one category or
another.
Having higher-ordered stimulus spaces (i.e. large number of dimensions) does not
eliminate either of these problems; however, it does substantially reduce their magnitude.
Each additional dimension that can be included in a stimulus set increases the likelihood
that the stimulus space spans the relevant cues for a given contrast. This leads to being
able to use the same stimulus space to study multiple languages, thus allowing for a direct
comparison between them. Results from direct comparisons can be used in the
development of phonological typologies concerning phonemic contrast systems, as well
as theories of second language acquisition.
II. Previous Approaches for Examining Linguistic Category
Locations
1. Forced-choice labeling
Typically, the primary goal in a linguistic categorization experiment is to identify
where the boundaries between two or more categories are located in a given stimulus
space. The simplest and most straightforward way to accomplish this is to employ a
forced-choice labeling task. In this task, a stimulus set is created that is designed to span
the categories being studied. The stimuli are then presented to listeners multiple times in
random order, and the listener is asked to explicitly associate each stimulus with a label
taken from a set of previously established possible responses. The logic behind this
approach is that listeners will respond with near 100% accuracy to many of the stimuli,
while responding probabilistically to those at the boundaries between categories. As an
example of this, Fig. 1 contains a schematized reproduction of the perception data from
Lisker and Abramson (1970) for the three-way labial stop contrast in Thai. The stimulus
space in this experiment consisted of a single varied dimension (Voice Onset Time
(VOT)) that contained 37 steps.
Figure 1: VOT perception data for Thai labial stops (schematized from Lisker and
Abramson’s (1970) Fig. 3). VOT (in ms) is given on the horizontal axis; percent
identification is given on the vertical axis.
As can be seen from Fig. 1, forced identification of a stimulus space provides a clear
picture of category locations with respect to the cue(s) being manipulated.
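The tallying step behind a plot like Fig. 1 can be sketched in a few lines of Python. This is only an illustrative sketch: the continuum values, the category labels, and the deterministic `toy_listener` with its boundaries are hypothetical stand-ins, not the stimuli or data of Lisker and Abramson (1970). A real listener would respond probabilistically near the boundaries rather than deterministically.

```python
import random
from collections import Counter, defaultdict


def run_forced_choice(stimuli, labels, respond, repetitions, seed=0):
    """Present every stimulus `repetitions` times in random order and
    return percent identification per stimulus and label."""
    rng = random.Random(seed)
    trials = [s for s in stimuli for _ in range(repetitions)]
    rng.shuffle(trials)  # randomized presentation order
    counts = defaultdict(Counter)
    for stimulus in trials:
        counts[stimulus][respond(stimulus)] += 1
    return {
        s: {lab: 100.0 * counts[s][lab] / repetitions for lab in labels}
        for s in stimuli
    }


def toy_listener(vot_ms):
    """Hypothetical listener with made-up boundaries at -20 ms and +30 ms."""
    if vot_ms < -20:
        return "voiced"
    if vot_ms < 30:
        return "voiceless unaspirated"
    return "voiceless aspirated"


LABELS = ["voiced", "voiceless unaspirated", "voiceless aspirated"]
vot_continuum = list(range(-100, 101, 20))  # hypothetical 11-step continuum (ms)
percent_id = run_forced_choice(vot_continuum, LABELS, toy_listener, repetitions=30)
```

Each entry of `percent_id` corresponds to one point on the horizontal axis of an identification plot, with one percentage per response label.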
Although in principle there is no problem with using a forced-choice labeling task
to probe stimulus spaces containing more than one dimension, there are practical
difficulties in doing so. Kitahara’s (2001) study of pitch accent categories in Tokyo
Japanese provides an excellent example of these difficulties, and the compromises that
must be made in order to compensate for them. In Kitahara’s second experiment (pp. 50–60),
a stimulus space was generated in which five variables were manipulated: voicing
degree (3 levels), H* tone (6 levels), L% tone (6 levels), alignment (10 levels), and
phrasal H (6 levels). If fully crossed, this would have resulted in a stimulus space
containing 6,480 stimuli. Kitahara was able to reduce the size of this stimulus set to 1,658
by not including tokens where certain values in one dimension (e.g. voicing degree)
would result in other dimensions being irrelevant (e.g. alignment). He also lowered the
stimulus count by not probing the corners of the stimulus space. Even with this reduction
in the number of stimuli, the space was so large that multiple judgments per stimulus in a
single experimental session were not feasible. Consequently, data had to be pooled across
listeners and/or stimulus dimensions in order to examine category boundaries.
To put the problem of stimulus space size in perspective, Table I contains
estimates of how long it would take to collect forced choice data assuming 10 stimuli per
dimension, 30 judgments per stimulus, and 2 seconds per response. These estimates
assume that constant data collection would be possible (i.e. no breaks), and that no
strategies like those employed in Kitahara (2001) would be used to eliminate portions of
the space from being presented.
# Dimensions    # Trials                                   Total Time
1               30 * 10 = 300                              10 min
2               30 * 10 * 10 = 3,000                       1.67 hours
3               30 * 10 * 10 * 10 = 30,000                 16.7 hours
4               30 * 10 * 10 * 10 * 10 = 300,000           6.94 days
5               30 * 10 * 10 * 10 * 10 * 10 = 3,000,000    69.44 days
Table I. Estimates of protocol duration for multidimensional stimulus arrays.
Based on the results in Table I, non-optimized forced-choice tasks (i.e. every stimulus in
a fully-crossed space is played the same number of times) do not appear to feasibly scale
upward beyond two dimensions; however, even two dimensions may not necessarily be
tractable in all cases. The time estimates in Table I assume 10 steps per dimension, which
is relatively small compared to the 37 steps in Lisker and Abramson’s (1970) VOT
continuum. Were the Lisker and Abramson (1970) stimulus set augmented even by
varying one additional dimension, the resulting space would likely be prohibitively large
from a subject-running standpoint.
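The arithmetic behind Table I can be reproduced with a short Python sketch. The function names are illustrative only; the defaults (30 judgments per stimulus, 2 seconds per response, no breaks) are the assumptions stated for Table I. The last line applies the same arithmetic to a 37-step continuum crossed with one additional dimension, where the choice of 10 steps for the added dimension is a hypothetical value.

```python
from math import prod


def forced_choice_trials(steps_per_dim, judgments_per_stimulus=30):
    """Number of trials for a fully crossed stimulus space."""
    return judgments_per_stimulus * prod(steps_per_dim)


def protocol_seconds(steps_per_dim, judgments_per_stimulus=30,
                     seconds_per_response=2):
    """Total running time in seconds, assuming no breaks."""
    trials = forced_choice_trials(steps_per_dim, judgments_per_stimulus)
    return trials * seconds_per_response


# Rows of Table I: n dimensions with 10 steps each.
for n_dims in range(1, 6):
    steps = [10] * n_dims
    hours = protocol_seconds(steps) / 3600
    print(n_dims, forced_choice_trials(steps), round(hours, 2), "hours")

# A 37-step VOT continuum crossed with one hypothetical 10-step dimension.
augmented_hours = protocol_seconds([37, 10]) / 3600
```

Under these assumptions, the augmented two-dimensional space would require a little over six hours of uninterrupted listening per subject.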
In an attempt to salvage the use of a forced choice task in multidimensional
spaces, Morrison (2006) proposed a general optimization strategy for presenting stimuli.
Instead of playing each stimulus in the space the same number of times, the regions near
category centers are sampled fewer times than other parts of the space. The idea behind
this approach is that substantial portions of the space will be identified 100% of the time
as being exemplars of a single category, and that it is unnecessary to repeatedly sample
these portions of the stimulus space. The algorithm thus focuses on repeatedly sampling
the category boundaries, because that is where the most variability in listener responses
would be expected. In practice, Morrison’s method effectively reduces the number of
trials by one-third; however, even with this reduction in experimental trials, forced choice
tasks still fail to practically scale upward to most n-dimensional stimulus spaces.
2. Listener-directed search
A typical forced-choice task contains a number of inefficiencies. One of these is
that listener intuitions about the relatedness between different members of the stimulus
space are ignored. When a stimulus space is generated, successive steps on a given
dimension normally have an identifiable relationship to one another (i.e. each step along
a continuum may be louder, have a higher pitch, have a longer duration, etc… as
compared to the one next to it). An experimental task that randomizes and presents these
stimuli for forced-choice identification throws away this relational information, which
could be used to optimize a search for category locations. Recognizing this, Johnson,
Flemming, and Wright (1993) illustrated an alternative method² for finding best exemplar
locations in a two-dimensional vowel space that did not involve randomized forced-
choice identification.
In Johnson et al. (1993), a two-dimensional stimulus set containing 330 stimuli
was generated by varying F1 and F2 in Klatt (1980) synthesized vowels. Instead of
presenting each stimulus individually, an interactive Graphical User Interface (GUI) was
created which allowed listeners to play any stimulus they chose from the set. The
interface consisted of a two-dimensional grid, where each square in the grid corresponded
to an element in the stimulus set, and listeners simply had to click on a square in order to
play a stimulus. The instructions for the task were very basic. Listeners were given a key
word (e.g. heed, hid, head, etc.) and instructed to search the stimulus grid for either (a)
the best example of the vowel in the key word, or (b) the stimulus that was most similar
to how the listener would produce the vowel in the key word.³ The goal of both
instruction sets was to have listeners identify best exemplars in the stimulus space. The
search process was extremely rapid because ordering relationships between stimuli with
respect to formant frequency were maintained in the grid.⁴ This allowed listeners to
quickly focus in on the section of the space that sounded the most similar to the vowel in
the key word, while ignoring the other stimuli.
² They used the term “Method of Adjustment (MOA)” to describe the family of tasks that
it belonged to.
This method is not without its drawbacks. First, it is difficult, though not
impossible, to examine category boundaries. The particular instantiation of this approach
in Johnson et al. (1993) is limited to finding exemplars that are well within a category’s
boundaries, since listeners are asked to search for the best example of a particular vowel.
This provides useful information about how categories are generally positioned relative
to one another in a space, but it does not indicate the size and shape of the categories.
There are ways this can be overcome⁵, but the solutions result in a substantial increase in
the complexity of the task, as well as the number of stimuli that would need to be played
for subjects. A second issue is one of scalability. While this method is well suited for
searching two-dimensional spaces for best exemplars, it does not scale upwards very well
to n-dimensions. For example, a three-dimensional stimulus space would require
generating multiple two-dimensional grids, where the number of grids is equal to the
smallest number of steps in the three dimensions. The problem is compounded
exponentially with each additional dimension⁶.
³ Johnson et al. (1993) found that it did not matter which instruction set was used.
⁴ Although the ordering of the stimuli with respect to formant frequency was maintained
in the grid, the overall orientation of the grid was adjusted throughout the experiment so
that listeners had to re-explore the space on each trial.
⁵ One possible solution would be a tiered approach where the listener “colors in” regions
of the 2-D space with different colors to represent best, acceptable, and marginal
exemplars of a category. However, the number of tokens that are presented and judged
begins to approximate a forced-choice task that is similar to Morrison’s (2006)
optimization algorithm.
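The growth in the number of grids can be made concrete with a small Python sketch. It assumes, as in the three-dimensional example above, that the two dimensions with the most steps are laid out on the grid axes, so that one grid is needed for every combination of steps on the remaining dimensions. The function name `grids_per_trial` is hypothetical.

```python
from math import prod


def grids_per_trial(steps_per_dim):
    """Number of 2-D grids needed to cover an n-dimensional space when the
    two dimensions with the most steps are shown on the grid axes."""
    if len(steps_per_dim) < 2:
        raise ValueError("need at least two dimensions to draw a grid")
    remaining = sorted(steps_per_dim)[:-2]  # drop the two largest dimensions
    return prod(remaining)  # prod([]) == 1: a single grid for a 2-D space


three_dim = grids_per_trial([10, 10, 5])     # smallest dimension has 5 steps
four_dim = grids_per_trial([10, 10, 5, 5])   # two dimensions with 5 steps each
```

Here `three_dim` equals the smallest number of steps (5), and `four_dim` equals 5*5 = 25 grids for a single trial.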
3. Goodness-driven search algorithms
Implicit in the Johnson et al. (1993) approach is the idea that listeners make
goodness judgments about each stimulus they select in the grid, and that movement
through the stimulus space is directed by the relative goodness of neighboring stimuli.
Each listener is assumed to adopt a stimulus sampling strategy for converging on the best
exemplar that is based on the visually intuitive two-dimensional presentation of the
space. The primary drawback to extending this approach to n-dimensional stimulus
spaces is the loss of intuitive sampling strategies for simultaneously probing all of the
stimulus dimensions. As mentioned above, the most obvious remedy for this is to present
listeners with multiple unique two-dimensional grids where only two dimensions are
varied at a time; however, the practical consequences of this compromise for most
applications make this an unattractive option.
A solution to this problem that preserves most of the benefits of having a user-
directed search method, as well as increases the number of stimulus dimensions that can
be examined, is to remove from the listener the burden of having to develop a stimulus
sampling strategy. This can be done by specifying a search algorithm that presents stimuli
to listeners and asks them to rate the stimuli using a goodness rating system. These
goodness judgments are then used to decide the next set of stimuli to be played and rated.
After a sufficient number of trials, the algorithm terminates at a point in the stimulus
space that should be in the region of best exemplars. Iverson and Evans (2003) and Evans
and Iverson (2004) used this type of approach to find best exemplars of vowels in five-
and four-dimensional stimulus spaces respectively, using a small number of trials.
⁶ Supposing four dimensions, two of which only have 5 steps, a listener would have to
work through 5*5 = 25 unique grids in order to complete one trial of the experiment.
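The general idea of a goodness-driven search can be illustrated with a minimal Python sketch. This is a generic greedy neighbor search written for illustration; it is not the algorithm of Iverson and Evans (2003), nor the search procedure proposed in this dissertation. The simulated listener, whose ratings fall off smoothly and without noise around a single hypothetical best exemplar, is likewise an assumption; ratings from real listeners would be noisier.

```python
def goodness_search(rate, start, steps_per_dim, max_trials=200):
    """Greedy search on an integer stimulus grid driven by goodness ratings.

    `rate` maps a stimulus (tuple of step indices) to a goodness score.
    Returns the final stimulus and the number of ratings collected.
    """
    current = tuple(start)
    best = rate(current)
    trials = 1
    improved = True
    while improved and trials < max_trials:
        improved = False
        for dim in range(len(current)):
            for delta in (-1, 1):
                candidate = list(current)
                candidate[dim] += delta
                if not 0 <= candidate[dim] < steps_per_dim[dim]:
                    continue  # neighbor lies outside the stimulus space
                candidate = tuple(candidate)
                score = rate(candidate)  # one goodness judgment
                trials += 1
                if score > best:
                    current, best = candidate, score
                    improved = True
    return current, trials


def make_listener(target):
    """Hypothetical noise-free listener: goodness peaks at `target`."""
    def rate(stimulus):
        return -sum((x - t) ** 2 for x, t in zip(stimulus, target))
    return rate


best_exemplar, n_trials = goodness_search(
    make_listener((3, 7, 5)), start=(0, 0, 0), steps_per_dim=(10, 10, 10)
)
```

In this toy three-dimensional space of 10 * 10 * 10 = 1,000 stimuli, the search reaches the simulated best exemplar after rating only a small fraction of the space, which is the property that makes this family of methods attractive for higher-dimensional stimulus sets.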
One of the difficulties encountered with this method is knowing whether the
search algorithm provided by the experimenter consistently converges on best exemplars.
A second issue is that great care must be taken to not preload the search algorithm to
converge where the experimenter wants it to. As is discussed in Chapter 2, the Iverson
and Evans (2003) algorithm potentially falls prey to the problem of preloading, in
addition to not being general enough to apply to arbitrary stimulus spaces. Specifically,
the algorithm incorporates a number of properties that are specific to the examination of
vowel qualities.
III. Test Case: Laryngeal Contrasts
A relatively large amount of work has been done with respect to studying vowel
production and perception in multiple dimensions (F1, F2, duration, etc.); however, this
has not been the case with consonants. In the particular case of word-initial stops, since
Lisker and Abramson (1964; 1970), a single dimension, voice onset time (VOT), has often
been the primary acoustic cue associated with delineating categories in both production
and perception. This is not to say that other cues have not been shown to influence
“voicing” judgments. In Repp’s (1982) review of the literature with respect to trading
relations among acoustic cues, he points out that F1 transition, aspiration amplitude, and
F0 at vowel onset have all been shown to influence the identification of word-initial stop
consonants in English. In spite of these results though, VOT has still been regarded cross-
linguistically as the dominant cue. One of the reasons for this is that VOT production
distributions are separable in a number of the world’s languages, especially English, as
well as Dutch, Spanish, Hungarian, Tamil, Cantonese, Eastern Armenian, and Thai
(Lisker and Abramson, 1964). A second compelling reason has been that varying VOT in
perception experiments is often sufficient for eliciting category shifts across a number of
languages: English (Lisker and Abramson, 1970; Zlatin, 1974; Keating, Mikos, and
Ganong, 1981; Whalen, Abramson, Lisker, and Mody, 1993; Benki, 2005), Polish
(Keating et al., 1981), Spanish (Lisker and Abramson, 1970; Benki, 2005), Canadian
French (Caramazza et al., 1973), and Thai (Lisker and Abramson, 1970) just to name a
few. However, even given these positives, evidence exists suggesting that additional
acoustic cues should be examined in greater detail.
A primary example of a language where word-initial labial stop VOT
distributions substantially overlap is Korean. Korean has a three-way labial stop contrast
(fortis, lenis, and aspirated) where the “lenis” and “fortis” categories overlap with respect
to VOT (Lisker and Abramson, 1964; Han and Weitzman, 1970; Shimizu, 1989). Hindi, a
language with a 4-way contrast, also shows overlap between some of its labial stops (Lisker
and Abramson, 1964; Benguerel and Bhatia, 1980; Shimizu, 1989). In both of these
languages the labial stop series are accurately perceived by native speakers, which
suggests that additional acoustic dimensions besides VOT are necessary for some of the
contrasts.

Motivation for examining cues other than VOT can also be found even in
languages where VOT production distributions do not overlap. In Caramazza et al.
(1973), production and perception data are reported for a group of monolingual Canadian
French speakers. When tested on a synthetic VOT continuum, the Canadian French
speakers showed a pronounced “dip” and chance performance in their identification
functions in the negative 5 to positive 10 ms range of VOT values. Thus, French
speakers respond to most of the voiceless unaspirated tokens in the stimulus set at
somewhere around a chance level, even though their productions of ‘p’ fall in this same
VOT region. This pattern, in conjunction with the close spacing between labial stop
categories in the VOT production distributions of these same listeners, suggests that some
other acoustic dimension is particularly relevant for the Canadian French labial stop
“voicing” contrast.
Being concerned with acoustic dimensions other than VOT for languages where
VOT production distributions are separable is not a novel idea. As mentioned earlier,
Lisker (1986) discussed 16 acoustic properties that could plausibly be relevant to the p-b
“voicing” distinction in the words rapid and rabid in English (a separable VOT
language). In particular, Lisker listed six possible “post-closure” cues (release burst
intensity, timing of VOT, time of F1 onset, F1 onset frequency, F1 transition duration,
and F0 contour) that could be relevant. Shimizu (1989) noted that in Japanese production
data, F0 contours on the following vowel also seemed to separate the categories, and
Abramson and Lisker (1985) observed that F0 initial value affected labial stop
categorization in English. As recently as Francis et al. (2006), the use of F0 as a cue for
aspiration in Cantonese initial stops has been explored.

Taken together, these perception and production studies suggest that a need exists for
an expanded, systematic evaluation of the acoustic cues for stop “voicing” contrasts. This
is not a new idea:
“The fact that there are multiple cues for most phonetic contrasts has been known
for a long time…a nearly complete list of cues has been accumulated over the
years. Nevertheless, the data were typically collected by varying one cue at a
time [emphasis added], although there are some exceptions, such as Hoffman’s
(1958) heroic study that varied three cues to stop place of articulation
simultaneously. Restrictions on the size of stimulus ensembles were imposed by
the limited technology of the time, which made stimulus synthesis and test
randomization very cumbersome. With the advent of modern computer-controlled
synthesis and randomization routines, however, orthogonal variation of several
cues in a single experiment became an easy task, and the limit to the number of
stimuli was set by the patience of the listener [emphasis added] rather than by
that of the investigator.” (Repp, 1982)
Very little has changed in 26 years. The problem has not been an inability to create
multidimensional stimulus spaces; rather, the issue has been one of having experimental
tasks that scale upwards in order to handle large numbers of stimuli without terminally
taxing the patience of the listeners (see Table I). In this dissertation, I seek to provide a
substantial part of a solution to this methodological problem while revisiting the question
of what acoustic cues are used for labial stop “voicing” contrasts in five of the world’s
languages.

IV. Dissertation Contents
This dissertation consists of two primary components. First, a novel, generalized
multidimensional search algorithm for locating regions of best exemplars in n-
dimensional stimulus spaces is proposed and tested for inherent biases (Chapter 2).
Second, using the proposed algorithm, the “voicing” contrast between labial stops in five
languages (English, Latin-American Spanish, Seoul Korean, Tokyo Japanese, and
Quebec French) is examined in a series of perception experiments (Chapters 4 and 6;
rationale for choosing labial stops and descriptions of the stimulus spaces are given in
Chapter 3). The first perception experiment (Chapter 4) looks for best exemplars and
dimensional sensitivities in a six-dimensional stimulus space using three native listeners
for each of the five languages given above. Most of these listeners also participated in a
production experiment (Chapter 5) that tests both the validity of the perceptual results in
the first experiment, as well as the effect of lexical focus on the stimulus dimensions that
seemed to be important for categorization. The second perception experiment (Chapter 6)
uses a refined five-dimensional version of the stimulus space employed in the first
perception experiment, as well as an augmented version of the search procedure. Ten
native listeners each of English, Latin-American Spanish, Seoul Korean, and Tokyo
Japanese participated⁷. The thesis concludes with a general discussion (Chapter 7) of
possible improvements to the search procedure, implications for linguistic theory, and
future research directions.
7
Quebec French was not examined in the second experiment due to a limited local
subject pool.

Chapter 2 : Algorithm for finding Multidimensional Best
Exemplar Locations (AMBEL)
I. Introduction
In this chapter, a generalized n-dimensional search method is proposed
(Algorithm for finding Multidimensional Best Exemplar Locations (AMBEL)), and
results from simulations using computer-generated goodness judgments are presented.⁸
Since AMBEL was inspired by the seminal work of Iverson and Evans (2003)⁹, a
description of their algorithm is given in order to provide context for AMBEL. Next, the
AMBEL search process is described. Data types produced by the search method, as well
as its limitations and generalizability are discussed. Finally, AMBEL is evaluated for
inherent biases using both random and non-random goodness judgments.
II. Iverson and Evans (2003)
1. Background
Iverson and Evans (2003) took on the ambitious task of finding best exemplar
locations of vowels in a five-dimensional stimulus space containing 100,700 stimuli. The
space was created by fully crossing F1 onset, F1 offset, F2 onset, F2 offset, and duration.
Formant values during the steady-state portion of the vowels were linearly interpolated
between the specified onset and offset values. The large number of stimuli created by
crossing these five dimensions ruled out a forced-choice identification task. Also, the
large number of dimensions prevented the use of Johnson et al.’s (1993)
self-directed Method of Adjustment (MOA) task. Therefore, Iverson and Evans (2003)
proposed a “goodness optimization method” for locating best exemplars using a small
number of stimulus presentations.
8
Portions of this chapter appear in Oglesbee and de Jong (2007).
9
For the duration of this chapter, I will be referencing Iverson and Evans (2003), because
it provides a better description of their algorithm than does Evans and Iverson (2004).
The primary difference between the two publications was that Iverson and Evans (2003)
used a 5-dimensional stimulus space, whereas Evans and Iverson (2004) used a 4-
dimensional stimulus space.
2. Search algorithm
Iverson and Evans’ (2003) search algorithm was designed to
find best exemplar locations in a five-dimensional stimulus space by presenting listeners
only 35 stimuli per vowel category being investigated. The general approach they used
involved first finding local best exemplars within one-dimensional subsets (or “search
vectors”) of the stimulus space, where no more than four stimulus variables were
covaried at a time. Then, using this information, global best exemplars were identified by
simultaneously manipulating all five of the stimulus dimensions. Listener-provided
goodness judgments were used to identify best exemplars within each search vector. In
the end, a total of seven search vectors with five stimulus presentations per search vector
were used for each category examined.
The procedure used with their stimulus set was as follows:
(1) The initial search vector (V1) consisted of the stimuli along a straight-line path
between the measured production location of the vowel being examined, and a neutral
vowel (schwa) at the center of the stimulus space. This resulted in a one-dimensional
vector where four dimensions were covaried (F1 onset, F1 offset, F2 onset, and F2
offset). Each endpoint of V1 was presented to the listener, and a goodness rating between
0 and 1 was elicited. Then, based on a weighted average of these two goodness ratings, a
third stimulus was selected and presented for rating. The fourth stimulus was selected by
using the goodness judgments from the first three stimuli played, and the fifth stimulus
was chosen by using the goodness judgment from the best stimulus found so far (1 through
4), along with the goodness judgments of the closest stimuli on either side of this best
stimulus. If the search produced a poor exemplar (i.e. if the 5th stimulus was not a good
exemplar), listeners were allowed to repeat the search. If the 5th stimulus was acceptable,
then all of the parameters of this stimulus were “passed onto the next stage of the search
algorithm”.
(2) The second search vector (V2) varied duration, while keeping all of the other stimulus
variables constant (i.e. no covariation of other stimulus variables). The same goodness
optimization method for finding the best exemplar on V1 was used to locate the best
exemplar on V2, and all other search vectors (V3, V4, V5, V6, and V7).
(3) The third search vector (V3) varied F1 and F2 formant onset frequencies “along the
same basic path as the first vector”. The difference between V3 and V1 was that the
offset formant frequencies remained fixed at the values determined while searching V1
(i.e. two stimulus dimensions were covaried).

(4) The fourth search vector (V4) was “orthogonal in the F1/F2 onset space” relative to
V3, and intersected V3 where the best exemplar was located. This vector also involved
covarying two stimulus dimensions.
(5) The fifth (V5) and sixth (V6) search vectors were “analogous to [V3] and [V4],
except that offset F1 and F2 frequencies were varied”.
(6) The seventh vector (V7) “varied all dimensions, passing through the best values found
thus far on all dimensions and the neutral vowel” (i.e. five covaried dimensions).
Iverson and Evans (2003) summarized their search procedure like this:
“The search paths were chosen so that [V1] would likely get close to a best
exemplar quickly, [V2] would adjust duration, [V3] – [V6] would adjust the onset and
offset formant frequencies, and [V7] would fine-tune the selection by allowing subjects
to make their best exemplar more or less extreme in the vowel space.”
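The within-vector selection described in step (1) can be illustrated with a small sketch. It is not Iverson and Evans' implementation: the goodness-weighted average used to place the third stimulus and the fallback used when the fitted parabola has no maximum are assumptions, positions are expressed as proportions (0 to 1) along a search vector, goodness is treated as a quantity to be maximized, and all function names are hypothetical.

```python
def third_stimulus(x1, g1, x2, g2):
    """Assumed rule: goodness-weighted average of the two endpoint positions.

    Requires at least one non-zero rating.
    """
    return (g1 * x1 + g2 * x2) / (g1 + g2)


def parabola_peak(p1, p2, p3):
    """Position of the maximum of the parabola through three (position, goodness) points.

    The three positions must be distinct. Returns None when the points are
    collinear or the parabola opens upward, i.e. when there is no maximum.
    """
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    denom = (x1 - x2) * (x1 - x3) * (x2 - x3)
    a = (x3 * (y2 - y1) + x2 * (y1 - y3) + x1 * (y3 - y2)) / denom
    b = (x3**2 * (y1 - y2) + x2**2 * (y3 - y1) + x1**2 * (y2 - y3)) / denom
    if a >= 0:
        return None
    return -b / (2 * a)


def fourth_stimulus(rated):
    """Choose the next position from the first three (position, goodness) pairs."""
    peak = parabola_peak(*rated)
    if peak is None:
        # Assumed fallback: stay at the best-rated position so far.
        return max(rated, key=lambda p: p[1])[0]
    return min(1.0, max(0.0, peak))  # keep the choice on the search vector


def simulated_goodness(x):
    """Hypothetical listener whose ratings peak at 0.6 along the vector."""
    return 1.0 - (x - 0.6) ** 2


# Rate both endpoints, then the weighted-average point, then fit the parabola.
x1, x2 = 0.0, 1.0
g1, g2 = simulated_goodness(x1), simulated_goodness(x2)
x3 = third_stimulus(x1, g1, x2, g2)
rated = [(x1, g1), (x2, g2), (x3, simulated_goodness(x3))]
x4 = fourth_stimulus(rated)
```

Because the simulated ratings lie exactly on a parabola, the fourth stimulus lands on the peak of the simulated goodness function; real ratings would only approximate this.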
3. Generalization issues
Although Iverson and Evans (2003) make the claim that “this basic method can be
used, in principle, with a wide variety of phonetic contrasts”, it is not clear that this is the
case. The idea of using listener-provided goodness judgments to quickly navigate a
multidimensional stimulus space seems generalizable; however, their particular approach
for choosing what stimuli to present to listeners was intertwined with known properties of
the stimulus space they were probing (vowels). Also, rapid convergence of their
algorithm was made possible by preloading the algorithm to search in the specific region
of the space where they expected the best exemplars to be located. Specifically, their
search procedure took advantage of:
(1) The existence of a “neutral” stimulus in the space (i.e. the vowel “schwa”).
(2) A priori knowledge of where production tokens would be located in the
stimulus space.
(3) A priori knowledge of the stimulus dimensions that can be covaried to induce
category shifts.
Each of these issues is discussed in more detail below.
3.1. The existence of a “neutral” stimulus
Four of the seven search vectors in the above search method rely on the presence
of a “neutral” stimulus for orienting the search vector within the space. In the case of a
stimulus space designed to examine vowel contrasts, there is a natural candidate for the
“neutral” stimulus: schwa. However, in the case of a consonantal contrast, how would
one determine this “neutral” stimulus? In particular, in a multidimensional space that has
been tailored to examine labial stop voicing, would the “neutral” stimulus be the same for
a two-category language such as English as compared to a three-category language like
Korean? One possible solution would be to define the “neutral” stimulus to be at the
center of the stimulus space; however, there is no guarantee that the center of the stimulus
space would be “neutral” with respect to the categories being examined.¹⁰ Since the
selection of search vectors depends heavily on the existence of this neutral stimulus, the
generalizability of their approach to other types of stimulus spaces is questionable.
10
If we look at just VOT, and ignore all other possible dimensions, the different
boundary locations between [b] and [p] in Japanese and English are enough to
demonstrate that the center of a fixed VOT continuum would not be in the crossover
region for both languages. In the case of a continuum that was symmetric around 0 ms, if
the “neutral” stimulus was placed at the center of the dimension, it would be well within
the [b] category of both languages.
3.2. A priori knowledge of the location of production tokens within the
stimulus space
In order to speed up the search procedure, Iverson and Evans (2003) used
production measurements to select where the initial search vector should be defined in
the stimulus space. On the surface this is a reasonable tactic for increasing the efficiency
of the algorithm; in a way, it is analogous to the idea behind Morrison’s (2006) approach
where certain areas of the stimulus space were ignored because they were identified with
100% accuracy.
However, this optimization approach raises questions with regard to how one
handles category searches across languages, or even multiple speakers of the same
language. For example, if two individuals have different production locations for a
category, a decision has to be made whether or not to tailor the search procedure to each
individual’s productions, or to select a single point that is used to define initial search
vectors for both listeners. In the case of the former, this introduces an extra variable:
unique initial search vectors. It becomes an open question whether or not differences in
convergence location can be traced to having different starting points within the space. If
the latter approach is adopted, then the initial search vector may steer one individual into
the region of best exemplars much faster than another listener, resulting in data that might
not truly reflect best exemplar locations for one of the listeners. Both of these potential
problems are compounded by the fact that the Iverson and Evans (2003) algorithm is a
non-iterative method; only one goodness optimized search is done for each search vector.
Also, the preloading of the algorithm to converge to one part of the space brings
into question whether or not each search for a particular category really has a candidate
space of 100,700 stimuli; in practice, there is a much smaller stimulus set per category
due to the approaches taken to reduce the number of experimental trials. It is not likely
that the algorithm for any particular vowel actually can steer subjects to a vast majority of
the 100,000 stimuli. Unfortunately, Iverson and Evans (2003) do not present an analysis
of the general selectivity introduced into their results by their search method. Regardless,
it appears that in their attempt to speed the convergence process, the general application
of their approach may be sacrificed.
3.3. A priori knowledge of covarying stimulus dimensions
Finally, one of the largest hurdles to generalizing their approach is that all but one
of the search vectors are constructed by covarying two, four, or five stimulus variables.
The motivation for this was that simultaneous manipulation of multiple variables should
facilitate quicker movement through the space. This required them to identify what
dimensions would benefit from covariation, as well as what ratio to use¹¹. In the case of
vowels, covarying F1 and F2 onset/offset is well motivated, while it is not clear that
pairing duration with another variable is well motivated. The key was that Iverson and
Evans (2003) were able to take advantage of prior knowledge about the acoustic
representation of vowels to build their algorithm. Problems arise though when faced with
a stimulus space where there are not clear choices for covarying stimulus dimensions. In
these cases, it is not known (a) what dimensions are coupled, (b) what the trading
relation between two (or more) cues happens to be, or (c) even if a known trading relation
exists, whether or not it is language universal.
11
For example, if the number of steps in two dimensions were the same, a 1:1 ratio could
be used, so that when one stimulus variable increased by one step, the covaried variable
would also increase (or decrease) by one step. In the case of Iverson and Evans (2003),
because the search vector passed through schwa and the vowel being studied, the angle of
the search vector within the dimensions being covaried would determine this ratio.
An additional consequence of covarying stimulus dimensions is that it becomes
difficult to untangle relationships between dimensions. Questions arise concerning
whether or not a listener was actually sensitive to both covarying dimensions, or just one
of them. This is particularly relevant in cross-linguistic studies where different sets of
cues are used to signal a similar contrast, which is precisely the linguistic situation that
this multidimensional search approach is designed to elucidate. This problem is
compounded even further with search vectors like V1 and V7, where four and five
dimensions respectively are simultaneously varied.
3.4. Conclusions
Overall, it appears that the specific search method proposed by Iverson and Evans
(2003) is not directly generalizable to other stimulus spaces examining non-vowel
stimuli. In particular, their method appears to be inappropriate for examining categorical
contrasts where there has not been a substantial amount of previous work that can serve
as a guide for designing the search procedure. However, there are innovations within
their algorithm, which, when applied differently, could be used as the foundation for a
more general procedure designed to locate regions of best exemplars in multidimensional
stimulus spaces.¹² Specifically, the notion of using listener-provided goodness judgments
as a way of finding localized best exemplars within one-dimensional search vectors, and
then using this information to find global best exemplars, has substantial merit. From a
generalizability standpoint, the primary shortcoming of the Iverson and Evans (2003)
approach was that their method for selecting and probing search vectors was not
independent of the stimulus space being used.
III. Algorithm for finding Multidimensional Best Exemplar
Locations (AMBEL)
A key feature of the Iverson and Evans (2003) approach that is incorporated here
is the use of goodness ratings from listeners to direct the search for best exemplars.
However, unlike Iverson and Evans (2003), AMBEL is designed to always use multiple
iterations of the search process to achieve convergence. Section 1 describes a single
iteration of the general AMBEL approach in a 2-dimensional stimulus space. Section 2
contains a brief discussion of key features of the algorithm, specific implementation
details, and a summary of similarities/differences between AMBEL and the algorithm in
Iverson and Evans (2003). Section 3 discusses the types of data generated by
AMBEL, while section 4 provides a description of a post-test for verifying best exemplar
locations. Finally, section 5 highlights issues related to AMBEL limitations and
generalizability.
12
Although the analysis of Iverson and Evans’ (2003) approach has been somewhat
critical, the pioneering nature of their research and its importance in laying the foundation
for this dissertation should not be overlooked.
1. General procedure
In Fig. 1, two dimensions (D1 and D2) containing 13 stimuli each are represented.
Each point in the grid corresponds to a single stimulus. Here, the search order for the
dimensions is arbitrarily D1 followed by D2. In the first step, an initial point for the
search is chosen, as well as two comparison candidates that are identical to the initial
point in D2, but have different values in D1. Thus, a range of the probed space is chosen
as a search vector. The search vector is constrained to varying D1 only, and by design
does not span the entire length of D1.
Second, the middle point of the search vector is played for the listener, and a
goodness judgment is elicited using a slider bar that is part of a graphical interface. Prior
to assigning the goodness judgment, listeners are allowed to play the stimulus as many
times as they wish.
Next, an endpoint of the search vector is chosen at random, and the listener
assigns a goodness judgment, followed by the same procedure with the opposite
endpoint. The three goodness judgments obtained from the initial point and endpoints are
then used to estimate the location of a “best derived point”¹³ in the search vector. This
best derived point then serves as the initial point of the next search vector, which will
probe D2. The method for obtaining initial point and endpoint goodness judgments is the
same for D2 as it was for D1.
13
This would be an example of a “local” best exemplar.

In order to achieve convergence, the process iterates through all of the dimensions
multiple times. In the case of Fig. 1, the “X” in the lower right panel would become the
new initial point for probing D1, if a second iteration were used. Although the above
example is given in two dimensions, the process scales upward to n dimensions.
Figure 1. Example of a single iteration of the search algorithm procedure in a two-
dimensional space. Boxed-in regions indicate current search vectors. If a second iteration
were shown, the derived “best” point in the lower right panel (X) would serve as the
initial point for searching D1.
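The procedure illustrated in Fig. 1 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the implementation used in the experiments: the listener is replaced by a hypothetical simulated rating function, the half-width of the search vector is an arbitrary parameter, and the "best derived point" is simply taken to be the highest-rated of the three stimuli, a placeholder for the goodness-ratio calculation used in the actual experiments. All names are hypothetical.

```python
import random


def probe_dimension(point, dim, n_steps, rate, half_width=4, rng=random):
    """Probe one dimension and return the updated point.

    The middle (initial) point is rated first, then one endpoint chosen at
    random, then the opposite endpoint. All other dimensions are held fixed.
    """
    lo = max(0, point[dim] - half_width)                  # truncate at the edge
    hi = min(n_steps[dim] - 1, point[dim] + half_width)
    endpoints = [lo, hi]
    rng.shuffle(endpoints)                                # random endpoint order
    candidates = [point[dim]] + endpoints
    ratings = []
    for value in candidates:
        stimulus = list(point)
        stimulus[dim] = value
        ratings.append(rate(tuple(stimulus)))
    # Placeholder for the derived "best point": the highest-rated stimulus.
    best_value = candidates[ratings.index(max(ratings))]
    new_point = list(point)
    new_point[dim] = best_value
    return tuple(new_point)


def ambel_search(start, n_steps, rate, n_iterations=3, half_width=4, rng=random):
    """Iterate through every dimension, in a fixed order, n_iterations times."""
    point = tuple(start)
    for _ in range(n_iterations):
        for dim in range(len(n_steps)):
            point = probe_dimension(point, dim, n_steps, rate, half_width, rng)
    return point


def simulated_listener(stimulus, target=(10, 2)):
    """Hypothetical listener: goodness falls off with distance from a target.

    Ratings are kept on a 0.01 to 1.00 scale.
    """
    distance = sum(abs(s - t) for s, t in zip(stimulus, target))
    return max(0.01, 1.0 - distance / 24)


# Two dimensions (D1, D2) of 13 stimuli each, starting at the centre of the grid.
result = ambel_search(start=(6, 6), n_steps=(13, 13),
                      rate=simulated_listener, n_iterations=4)
```

With this simulated listener the search settles on the target at (10, 2) after the first iteration and stays there. Nothing in the loop depends on the number of dimensions, which is the sense in which the process scales upward.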

2. Implementation
In the first part of this section, the important implementation details of AMBEL
are discussed in greater detail. In the second part, differences/similarities between
AMBEL and Iverson and Evans (2003) are summarized.
2.1. Key features
6 Key features:
(1) Search vectors are constrained to a single dimension
Searching a single dimension at a time removes the need for making a priori decisions
concerning relationships between stimulus dimensions. This is contrary to the Iverson
and Evans (2003) approach, and consequently it makes AMBEL well-suited for probing
stimulus spaces in which relationships between dimensions are unknown.
(2) Best exemplars in each dimension are derived from goodness judgments taken
from three points contained in the search vector (initial point, left endpoint, right
endpoint).
This differs from Iverson and Evans (2003) where five points, instead of three, were
sampled on each search vector. In principle, AMBEL allows for freedom of choice by the
experimenter with regard to how many goodness judgments are used, and how the
goodness judgments are used to derive the “best point” on each search vector. In the
experiments reported in this thesis, goodness judgments were recorded on a scale ranging
from 0.01 to 1.00. A non-zero lower bound was used to avoid division by zero when
calculating the goodness ratio. The specific formula used in the current experiments for
finding the location of the “best point” in a search vector involved taking the ratio of the
two highest goodness judgments and modulating this ratio by the decreasing exponential
function in Fig. 2 (solid line).
Figure 2. Decreasing exponential function used to modulate goodness ratios. The
function is $X(r) = 0.5 \cdot r^{-2.3219}$, where $r = g_1/g_2$, $g_1$ is the highest goodness judgment,
and $g_2$ is the second-highest goodness judgment. The goodness ratio $r$ is shown on the
horizontal axis, and the distance of the “best” exemplar from the stimulus associated with
goodness judgment $g_1$ is given on the vertical axis as a proportion of the distance between
the stimuli associated with $g_1$ and $g_2$.
The location of the “best point” was expressed as a proportion of the distance between
the two stimuli with the highest goodness judgments as measured from the highest rated
stimulus. Modulating the goodness judgment ratio by the decreasing exponential in Fig. 2
(solid line) has the advantage of moving the choice of the “best” exemplar towards the
stimulus with the highest goodness judgment further than would be true with a simple
ratio (see dotted curve in Fig. 2). For example, if a simple ratio were being used and the
goodness ratings were in a 2:1 relationship, the location of the derived “best” point would
be 33% of the distance from the highest rated point to the second highest rated point.
However, using the decreasing exponential function in Fig. 2 to modulate the ratio, the
derived “best” point is located at 10% of the distance. This weighting biases the selection
of the derived point towards a location that the listener has already identified as being the
best of the three stimuli that have been presented, and it also allows rapid movement
away from the center of the space.
(3) The estimate used for eliciting goodness judgments is retained between
stimulus presentations.
The slider bar is not reset at the beginning of each stimulus presentation. Not resetting
the slider bar between goodness judgments allows for consecutively played stimuli to be
judged relative to one another. An additional consequence of this is that the search
process generates two additional pieces of data that are quite valuable. First, since the
search vector endpoints are always played consecutively within a dimension, the listener
is directly comparing these endpoints. Movement in the slider bar after presentation of
the second endpoint indicates the listener’s preference for changes in a given
dimension.¹⁴ These search vector endpoint preferences provide useful data concerning
listeners’ tendency to respond to differences in a particular dimension, as well as the
general region of a dimension where best exemplars are likely located. Second, the
goodness of the derived “best point” can be directly compared to the stimulus that
immediately preceded it (one of the search vector endpoints). If the algorithm for
choosing derived “best points” is consistently steering the search into regions containing
better exemplars, the derived “best point” will have a goodness value that is the same as or
better than the point that preceded it. This can be used as a post hoc test of the search
algorithm’s performance.
14
Actually, movement in the slider bar after any two stimulus presentations in the same
dimension would indicate a sensitivity; however, it is reasonable to assume that if there
was movement in the slider bar between the presentation of the middle of the search
vector and an endpoint, there would also be movement in the slider bar when the
endpoints are presented.
(4) Only a subset of a stimulus dimension is searched at a given time.
A search vector should be large enough that there are substantial acoustic differences
between some of the stimuli, yet small enough to allow a large set of possible stimuli to
be played during the course of multiple iterations.¹⁵ The rationale for the former is to avoid
ceiling effects within the goodness rating system; the latter encourages movement
through the space. The goal is that one of the endpoints will be noticeably worse than the
other stimuli, forcing listeners to recalibrate their use of the slider bar by moving it back
down (see Appendix A for screen shots of the interface used in the experiments reported
in chapters 4 and 6). In all simulations and experiments reported in this thesis, search
vectors were nominally constrained to a maximum length of 2/3 of a stimulus
dimension.¹⁶ This contrasts with Iverson and Evans (2003) where search vectors spanned
an entire dimension. In cases where the initial point was near an edge of the space, the
side of the search vector closest to the edge of the space was truncated.
15
This is not a contradiction. If a search vector is too large, there will be a very limited
set of endpoints played because the vector will always be bumping up against the edge of
the space. Smaller search vectors increase the likelihood that more steps on a particular
dimension will be sampled at least once.
16
Sometimes the actual percentage was higher due to rounding effects. For example, in
the case of dimensions containing only 5 stimuli, the search vector spanned the entire
length of the dimension.

(5) All dimensions are probed before a dimension is probed again.
Every dimension is “tuned” before a particular dimension is tested again. In principle, a
fixed or random ordering of dimensions within an iteration can be implemented;
however, in all experiments reported here, a fixed ordering was used.
(6) Multiple iterations of the search process are used to achieve convergence.
The primary principle behind the proposed search method is that incremental progress
made in each individual dimension aids global convergence on a best exemplar. Iverson
and Evans (2003) rely on their algorithm to converge in a single iteration. In the current
approach, by iterating through all of the dimensions multiple times, listeners are given
multiple opportunities to steer the algorithm into the proper portion of the space. As is
discussed below, the iterative nature of AMBEL provides more data than just best
exemplar locations; the tracking process of the algorithm identifies listener sensitivity to
specific cues, which aids in analyzing and validating best exemplar locations.
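The control flow implied by (5) and (6) can be sketched as two nested loops: every dimension is probed once, in a fixed order, before any dimension is probed again, and the whole pass is repeated for a set number of iterations. This is a rough sketch under stated assumptions, not the AMBEL implementation. The names `run_search`, `rate`, and `half_width` are hypothetical, and the rule for placing the derived point (a goodness-weighted position between the two endpoints) is a simple stand-in for the thesis's actual weighting function.

```python
def run_search(rate, n_steps, start, n_iterations, half_width=2):
    """Iteratively tune one dimension at a time.

    rate: callable taking a list of step indices and returning a
          non-negative goodness rating (stands in for the listener).
    n_steps: number of stimulus steps on each dimension.
    start: initial point, one step index per dimension.
    Returns the final point and the number of stimulus presentations.
    """
    point = list(start)
    trials = 0
    for _ in range(n_iterations):
        for dim in range(len(point)):  # fixed ordering of dimensions
            low = max(0, point[dim] - half_width)
            high = min(n_steps[dim] - 1, point[dim] + half_width)
            ratings = []
            for value in (low, high):  # play both endpoints
                candidate = point.copy()
                candidate[dim] = value
                ratings.append(rate(candidate))
                trials += 1
            # Stand-in rule: pull the derived point toward the better endpoint.
            total = ratings[0] + ratings[1]
            weight = ratings[1] / total if total > 0 else 0.5
            point[dim] = round(low + weight * (high - low))
            rate(point)  # the derived point is also played and rated
            trials += 1
    return point, trials
```

With three presentations per search vector, the trial count returned here equals 3 * # dimensions * # iterations, matching the count given for AMBEL in Table I.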

2.2. Summary of similarities and differences between AMBEL and
Iverson and Evans (2003)
| | Iverson and Evans (2003) | AMBEL |
|---|---|---|
| # Stimuli per search vector | 5 | 3 |
| # Search vectors | 7 (for a five-dimensional space) | # Dimensions * # Iterations |
| Search vectors covary stimulus variables? | Yes (all but one search vector) | No |
| # Iterations | 1 | Defined by experimenter |
| Method for calculating “local” best exemplars | A minimization algorithm using absolute goodness judgments had three components. (1) Weighted averages of the first 2 stimuli played would be used to select the third stimulus. (2) The minima of a parabola passing through the goodness judgments of these three points were then used to select the fourth point. (3) The fifth point was selected by doing the parabolic minimization through the point with the highest goodness judgment thus far, and one point on either side of this highest rated point. This fifth point was the local best exemplar on the search vector. | Relative goodness judgments are elicited using a slider bar that is not reset between stimulus presentations. The ratio of the two highest goodness judgments is modulated by a decreasing exponential function to weight the selection of the local best exemplar towards the location of the stimulus that was rated the highest by the listener. |
| # Stimulus presentations per category examined | 35 (for a five-dimensional space) | 3 * # dimensions * # iterations |
Table I. Summary of comparison between Iverson and Evans (2003) and AMBEL.
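The presentation counts in the last row of Table I can be checked with a few lines of arithmetic. The iteration counts below are illustrative only; the number of iterations used in any given experiment is set by the experimenter and is not assumed here. The function names are hypothetical.

```python
def iverson_evans_trials(n_vectors=7, stimuli_per_vector=5):
    """Presentations per category for a five-dimensional space (Table I)."""
    return n_vectors * stimuli_per_vector


def ambel_trials(n_dimensions, n_iterations, stimuli_per_vector=3):
    """Presentations per category: 3 * # dimensions * # iterations."""
    return stimuli_per_vector * n_dimensions * n_iterations


if __name__ == "__main__":
    baseline = iverson_evans_trials()
    for n_iterations in (1, 2, 3, 4):
        count = ambel_trials(5, n_iterations)
        relation = "more" if count > baseline else "not more"
        print(f"{n_iterations} iteration(s): {count} trials ({relation} than {baseline})")
```

For a five-dimensional space, the AMBEL count exceeds 35 once three or more iterations are used.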
An important difference between the two algorithms is that AMBEL typically
requires more trials per category being examined. Assuming a five-dimensional stimulus
