1. PLIN3104 - Advanced Phonological Theory B Robert Foster
University College London
1
Modelling variation in the form of English comparatives and superlatives
Robert Foster
University College London
1. Introduction
The comparative and superlative forms of English adjectives may be produced in two ways.
Comparison can be denoted through attachment of the -er/-est suffix to an adjective, or by using the
periphrastic form more/most. While for certain adjectives one form is greatly preferred, most
adjectives exhibit some degree of free variation in their comparative/superlative form. This paper
aims to identify the phonological factors that influence the alternation through inspection of the
relative frequencies of the two competing forms in a database of English adjectives. By expressing
these generalisations as grammatical constraints, we are able to use two statistical models (MaxEnt
and regression trees) to provide insights into how a grammar that gives rise to variation may be
represented in the mind of the speaker.
2. Which factors play a role in determining the form of the comparative/superlative?
Speakers have strong intuitions as to whether certain adjectives should be modified through the
-er/-est suffix or through the use of more/most. These intuitions are directly reflected in the relative
frequencies of the morphological and periphrastic forms in spoken and written English. For example,
more tragic is far more frequent in speech and text corpora than its morphologically inflected
counterpart tragicer, whereas biggest is much better represented than most big. For some
adjectives, no clear preferences emerge and both forms receive similar levels of usage e.g.
cleverest/most clever, vainer/more vain. Several authors have attempted to identify generalisations
that capture speakers’ intuitions on the form of inflected adjectives.
Quirk et al. (1985) use the number of syllables in the adjective’s positive form as the main diagnostic
for the alternation. Monosyllables tend to prefer the morphological form (oldest > most old), whereas
words of three syllables or more strongly prefer the periphrastic form (most difficult > difficultest).
Words of two syllables exhibit free variation (stuffier ≈ more stuffy). While this generalisation is
broadly true, the authors state numerous counterexamples. These include the adjectives apt and
prone, which prefer to take most, and unhappy, which frequently takes the -er/-est suffix. This
suggests that syllable count is not the only factor which influences the alternation. Quirk et al. outline
several other diagnostics, summarised below:
• Disyllabic adjectives ending in an unstressed vowel or a syllabic consonant tend to prefer the
  morphological form. Unstressed vowels include /əʊ/ as in narrow, /i/ as in silly, and /ɚ/ as in
  tender.
• Adjectives such as scared and caring, which are formed from verb participles, are almost
  always modified with more.
• Among monosyllables, frequently occurring adjectives such as easy and large take the
  morphological form more readily than rare adjectives such as wry and pert.
Leech & Culpeper (1997) analyse the relative frequencies of morphological and periphrastic forms of
adjectives found in the British National Corpus. In addition to the generalisations observed above,
the authors identify other phonological and syntactic factors which influence the alternation:
• Disyllabic adjectives with initial stress such as quiet and stupid take the morphological form
  more frequently than those with word-final stress such as remote and absurd.
• Comparatives are more likely to be expressed with the more form when they have a
  predicative function than when they have an attributive function. For example, the
  periphrastic form is better represented in the predicative 1b) than in the attributive 1a):
  1a) This bar is a livelier/?more lively place than the old bar.
  1b) The bars in this town are a lot more lively/?livelier than I’m used to.
• There is a tendency for comparative/superlative adjectives in parallel syntactic structures to
  be of the same form, as illustrated in 2) and 3):
  2) the wildest, riskiest/?most risky, craziest stunt in the world
  3) The more irritated he gets, the more angry/?angrier he becomes.
Mondorf (2003) argues that the underlying factor responsible for the alternation is the cognitive
complexity of comparatives/superlatives and the environment in which they occur. She identifies 21
determinants for ‘more-support’ (i.e. the periphrastic form) which are used to illustrate her
hypothesis that -er/-est forms favour environments that are simple to process. This hypothesis goes
beyond the scope of papers such as Quirk et al. (1985) and Leech & Culpeper (1997) which simply
aim to observe the determinants that play a role in adjectival form, and not necessarily explain why
these patterns should be the case. Some of the determinants that are original to Mondorf’s paper
are outlined below:
• Adjectives that end in consonant clusters tend to take the periphrastic form (most apt >
  aptest).
• Adjectives that are morphologically complex tend to take the periphrastic form (more selfish
  > selfisher, more famous > famouser).
• Adjectives that have a high frequency ratio of inflected form to uninflected form take
  -er/-est more often than adjectives that are rarely made into comparatives/superlatives.
  This suggests a semantic factor concerning the gradability of adjectives; less gradable
  adjectives such as real and dead frequently require more-support.
One aspect that all of these papers have in common is that they treat adjectives as belonging to one
of three categories: i) those that always take the morphological form, ii) those that always take the
periphrastic form, and iii) those that give rise to variation. As we shall see, the distinction between
these categories is not always so clear, and the alternation is observed to some extent for almost
every adjective in the data set. This suggests that grammaticality across all comparative/superlative
forms is gradient in nature, rather than being absolute for certain adjectives and variable for others.
In the following sections, we formalise some of these generalisations as grammatical constraints and
verify whether they are indeed influential in determining the form of comparatives/superlatives. In
order to do this, we observe the relative frequencies of morphological/periphrastic forms in a
database of adjectives for both comparatives and superlatives. If the individual constraints are
statistically true of the data set, they can be used as the targets of learning in grammatical models.
3. Data and Methodology
Our data set consists of 500 of the most frequent English adjectives. For each item, we conducted
four Google searches (www.google.com) using the templates “x-er than”, “more x than”, “the x-est”
and “the most x” and tabulated the number of hits returned for each search. By comparing the
number of hits for the two competing comparative and superlative forms, we obtain a reliable
estimate of the relative frequencies of these forms for a wide array of adjectives.
The main advantage of using the above templates instead of simply searching the bare
comparative/superlative is to reduce the inherent syntactic ambiguity of the strings “more x” and
“most x”. If these templates were not used, our search would return hits for pages containing
phrases such as Most big companies donate some of their profits to charity, where most is modifying
the entire noun phrase big companies rather than just the adjective. This would lead to the
periphrastic forms being overrepresented in the data set. Even when using the templates, some
ambiguity remains; the search will still return false positives for constructions where the
morphological form is impossible such as more sad than angry and the most Big Macs (compare
*sadder than angry and *the Biggest Macs). However, these constructions are relatively uncommon.
Several steps were taken to identify any problematic adjectives for data collection. These words
were removed and replaced with the next most frequent adjective. Such problems include:
• the adjective having a homograph that is a mass noun. Some examples are fat, fun, and
  light: “thin glass lets in the most light” is problematic for this reason.
• the adjective being ungradable, e.g. solar, financial. This would limit search results to the
  unwanted syntactic constructions mentioned above.
One advantage of using Google hits to estimate frequency is that language use found in websites
may be more representative of actual usage than data obtained from text corpora. This is because
text corpora such as the Brown Corpus are usually compiled from publications that are edited to
reflect the highest standards of ‘grammatical’ English. This means that non-standard adjectival
forms, which are occasionally produced in less formal environments, may be underrepresented.
Figure 1 displays the number of Google hits (in thousands) for 11 adjectives in each of the four
templates. The %er and %est columns give the percentage of hits accounted for by the morphologically
formed comparatives and superlatives respectively. This is a small subsection of the data for
illustrative purposes. The entire data set containing frequency figures for 500 adjectives can be found
in the Appendix.
Figure 1 Table of raw and proportional frequencies for a small sample of adjectives
Adjective   x-er (K)   more x (K)   %er      x-est (K)   most x (K)   %est
NOISY       420        89.1         82.50%   401         51           88.72%
STRICT      509        192          72.61%   5390        269          95.25%
DISCREET    0.815      64.8         1.24%    4.05        171          2.31%
IMMENSE     0.28       107          0.26%    3.15        46.4         6.36%
TOUGH       5130       46.5         99.10%   23600       158          99.33%
FRESH       542        53           91.09%   16800       297          98.26%
POLITE      13.7       141          8.86%    145         408          26.22%
STUFFY      26.7       129          17.15%   16.2        71.5         18.47%
LONELY      164        80.8         66.99%   618         117          84.08%
GLORIOUS    0.13       181          0.07%    0.844       563          0.15%
DEAR        382        45.9         89.27%   455         314          59.17%
...         ...
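The %er and %est columns of Figure 1 follow directly from the raw hit counts. The sketch below shows one way to reproduce them; the function name `morphological_share` is an illustrative assumption and not part of the original analysis.

```python
def morphological_share(morph_hits, periph_hits):
    """Share of hits taken by the -er/-est form, given hits for both forms.

    Both arguments are hit counts in the same unit (thousands in Figure 1).
    """
    total = morph_hits + periph_hits
    if total == 0:
        raise ValueError("no hits for either form")
    return morph_hits / total


# NOISY, comparative: 420K hits for "noisier than", 89.1K for "more noisy than"
print(f"{morphological_share(420, 89.1):.2%}")   # 82.50%, as in Figure 1
# POLITE, superlative: 145K hits for "the politest", 408K for "the most polite"
print(f"{morphological_share(145, 408):.2%}")    # 26.22%, as in Figure 1
```

Because the share is a ratio of two counts, it does not depend on whether hits are tabulated in units or in thousands.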
4. Forming constraints
From the generalisations discussed above, we formulate grammatical constraints on the output form
of comparatives and superlatives. The analysis consists of 6 constraints on the morphological form as
well as one universal constraint on the periphrastic form. As the primary focus of this paper is
phonological, the analysis does not incorporate any syntactic or semantic factors proposed in the
literature.
Constraints 1, 2 and 3 concern the syllable structure and stress pattern of the morphological form.
Constraints 4, 5 and 6 refer to the phono- and morphotactics of the morphological form. For each
constraint we performed a paired two-tailed t-test to check that the constraint is statistically
significant for both the -er and -est data sets (p < 0.05).
Constraint 1- * σ σ σ [+COMP]
The morphological form must not contain three or more syllables
This constraint aims to capture the contrast between short words and medium/long words, e.g.
bigger and valider. Note that this constraint (as well as all of the others) applies to the candidate
-er/-est form and not the base adjective. This means that words such as humbler and littlest, which
are formed from words ending in a syllabic consonant, do not violate the constraint due to the
desyllabification of /l/ after suffixation. As a result, their inflected forms do not contain three or
more syllables. p-value <0.001 for both data sets.
Constraint 2- * σ σ σ σ [+COMP]
The morphological form must not contain four or more syllables
This constraint aims to capture the contrast between medium and long words, e.g. stupidest and
difficultest. Like the previous constraint, candidates that undergo resyllabification after suffixation
such as unstabler satisfy this constraint. p-value <0.001 for both data sets.
Constraint 3- * σ ‘σ σ [+COMP]
The penultimate syllable of trisyllabic (or longer) morphological forms must not be stressed
This constraint is derived from Leech and Culpeper’s (1997) observation that adjectives such as
profounder and naivest are less well formed than adjectives with initial stress such as quieter and
stupidest. p-value <0.001 for both data sets.
Constraint 4- * [-reduced][+COMP]
The comparative/superlative morpheme must not be preceded by any segment that is not a reduced
vowel
This highly restrictive constraint is inspired by Quirk et al.’s observation that the morphological
form can be readily applied to adjectives that end in /əʊ/, /i/ and /ɚ/. Conversely, it can be
construed that the morphological form disprefers final segments that are not /əʊ/, /i/ or /ɚ/. These
three phonemes can be referred to as reduced vowels, and are characterised by their laxness, short
duration and central position. p-value <0.001 for both data sets.
Constraint 5- * C C [+COMP]
The comparative/superlative morpheme must not be preceded by two or more consonants
This constraint is motivated by Mondorf’s (2003) claim that adjectives ending in consonant clusters
such as apt tend to prefer the periphrastic form. When determining which words contained clusters,
we only considered non-rhotic pronunciations where /r/ in coda position is unpronounced.
p-value= 0.026 for -er, p-value= 0.004 for -est.
Constraint 6- * [SUFFIX][+COMP]
The comparative/superlative morpheme must not be preceded by a suffix
This is a simplified version of Mondorf’s (2003) hypothesis that morphologically complex adjectives
are less likely to take -er/-est. Through manual inspection, the suffixes we identify as reluctant to
take -er/-est are as follows: -able, -al, -ant, -ed, -en, -ent, -ful, -ible, -ic, -ing, -ish, -ive, -less, -ous, -some.
p-value <0.001 for both data sets.
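As a rough illustration of how candidates for Constraint 6 could be shortlisted before manual inspection, the sketch below checks spelling only. The function name and the orthographic heuristic are assumptions made for this example; the suffixes in the analysis itself were identified by hand.

```python
# Suffix list copied from the description of Constraint 6.
SUFFIXES = ("able", "al", "ant", "ed", "en", "ent", "ful", "ible",
            "ic", "ing", "ish", "ive", "less", "ous", "some")


def may_violate_suffix_constraint(adjective):
    """Return True if the spelling ends in one of the listed suffixes.

    This is a spelling-based shortlist only: it cannot tell a real suffix
    from a coincidental ending, so every hit still needs manual checking.
    """
    word = adjective.lower()
    return any(word.endswith(s) and len(word) > len(s) for s in SUFFIXES)


for adj in ("selfish", "famous", "hot", "red"):
    print(adj, may_violate_suffix_constraint(adj))
```

The last item shows the limitation: red is flagged because it ends in the letters "ed", although it contains no suffix, which is why a purely orthographic check cannot replace inspection by hand.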
Constraint 7- * more/most ADJ
Comparatives/superlatives must not be formed through the use of more/most
This constraint represents the inherent bias against the periphrastic form, capturing the intuition
that this form is only licensed when the morphological form is sufficiently ungrammatical. Without this
constraint, the periphrastic form would be the preferred candidate for every input. Of course, it is
conceivable that the reverse situation is true: that there is an inherent bias against the
morphological form and several well-formedness constraints on the periphrastic form. However, this
account would not be very insightful from a phonetic standpoint. Assuming that phonological
constraints are generally phonetically motivated, it is more intuitive to think of the novel word forms
as the ones which create new, potentially ungrammatical phonological environments.
Figure 2 displays a small sample of the full data set of 500 adjectives. It states the relative
frequencies of the morphological form of 12 comparatives and superlatives, as well as listing the
constraints that each adjective violates.
Figure 2 Constraint violations for 12 adjectives
Adjective %-er %-est * σ σ σ * σ σ σ σ * σ ‘σ σ * [-reduced] * C C * [SUFFIX]
ORDINARY 0.5% 0.3% * *
IGNORANT 0.3% 0.8% * * * * *
DIFFICULT 0.5% 0.0% * * * *
STEADY 71.5% 70.1% *
HOT 99.5% 99.6% *
POLITE 8.9% 26.2% * * *
SUDDEN 11.6% 9.0% * * *
HONEST 7.6% 3.1% * * *
PROFOUND 6.8% 6.5% * * * *
ASHAMED 0.2% 0.0% * * * * *
VAST 66.6% 61.2% * *
SCARED 14.4% 2.1% * *
... ... ...
These data are used as input to the statistical models MaxEnt and regression trees. In the next
section, we explain how these models can be used to construct a grammar that is representative of
the data set.
5. The Maximum Entropy model
The Maximum Entropy (MaxEnt) model provides a principled, objective means of simulating a
constraint-based grammar. The MaxEnt model has frequently been applied to phonological data
sets, as in Keller (2000), Goldwater & Johnson (2003) and Hayes & Wilson (2008). One notable
advantage of the model is that the output is probabilistic, meaning it is easily applicable to
grammars in which there is variation. The goal of the MaxEnt framework is to maximise the
probability of the observed data while maintaining maximum entropy of the model, i.e. it aims to
capture as many generalisations as possible from the data without making additional assumptions.
In the MaxEnt model, each constraint is assigned a non-negative weight. These weights are learned
algorithmically through maximising the objective function below:
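A common form of such an objective in the MaxEnt literature, given here as an assumption rather than as the exact formula used in this analysis, is the log-likelihood of the observed forms. The symbols are introduced for this sketch only: $n_{i,c}$ is the observed count of candidate $c$ for adjective $i$, $w_k$ is the weight of constraint $k$, and $v_k(c)$ is 1 if candidate $c$ violates constraint $k$ and 0 otherwise. Implementations often subtract a regularising prior term on the weights; whether one was used here is not stated.

```latex
\begin{align*}
\mathcal{O}(w) &= \sum_{i} \sum_{c} n_{i,c} \, \log P_w(c \mid i), \\
P_w(c \mid i) &= \frac{\exp\bigl(-\sum_{k} w_k \, v_k(c)\bigr)}
                      {\sum_{c'} \exp\bigl(-\sum_{k} w_k \, v_k(c')\bigr)},
\qquad w_k \ge 0 .
\end{align*}
```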
In order to do this, the model searches over possible sets of weights and settles on the grammar that
maximises the likelihood of the observed data. Given a set of weights, the probability of each output
candidate can be calculated as follows:
1. Take the dot product of the constraint weights and the candidate’s constraint violations. In our
data set, a constraint can only be violated once, so the dot product for a particular candidate is just
the summed weights of all violated constraints.
2. Find the maxent value by raising e to the negative of the dot product.
3. Divide this maxent value by the summed maxent values of all candidates for that input (in our
model, there are only two candidates). The resulting figure is the predicted probability of that candidate.
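The three steps above can be sketched in a few lines of Python. The weights are the -er weights reported in Figure 3; the dictionary keys, the function names and the use of "S" as an ASCII stand-in for σ are assumptions made for this illustration.

```python
import math

# -er constraint weights as reported in Figure 3 ("S" stands in for a syllable).
WEIGHTS = {
    "*more": 4.109,
    "*SSS": 3.419,
    "*[-reduced]": 2.830,
    "*SSSS": 2.326,
    "*[SUFFIX]": 2.324,
    "*S'SS": 0.935,
    "*CC": 0.340,
}


def dot_product(violated):
    """Step 1: summed weights of the violated constraints (each violated once)."""
    return sum(WEIGHTS[c] for c in violated)


def maxent_probabilities(candidates):
    """Steps 2 and 3: exponentiate the negated dot products, then normalise."""
    values = {name: math.exp(-dot_product(v)) for name, v in candidates.items()}
    total = sum(values.values())
    return {name: value / total for name, value in values.items()}


scary = maxent_probabilities({"scarier": ["*SSS"], "more scary": ["*more"]})
polite = maxent_probabilities({
    "politer": ["*SSS", "*[-reduced]", "*S'SS"],
    "more polite": ["*more"],
})
print(round(scary["scarier"], 3))   # 0.666, matching the 66.6% predicted in Figure 3
print(round(polite["politer"], 3))  # 0.044, matching the 4.4% predicted in Figure 3
```

With only two candidates per input, the result depends solely on the difference between the two dot products, so any constraint violated by both candidates would cancel out.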
Figure 3 displays a small sample of results obtained from applying the MaxEnt model to our -er data
set. The predicted values for all 500 adjectives for both -er and -est forms can be found in the
Appendix:
Figure 3 MaxEnt predictions for 4 adjectives
Constraint weights: * more 4.109, * σ σ σ 3.419, * [-reduced] 2.830, * σ σ σ σ 2.326, * [SUFFIX] 2.324, * σ ‘σ σ 0.935, * C C 0.340
Input      Candidate       Dot product  MaxEnt  Violated constraints                            Predicted  Observed
POISONOUS  poisonouser     10.898       0.000   * σ σ σ, * [-reduced], * σ σ σ σ, * [SUFFIX]    0.1%       0.2%
           more poisonous  4.109        0.016   * more                                          99.9%      99.8%
SCARY      scarier         3.419        0.033   * σ σ σ                                         66.6%      72.6%
           more scary      4.109        0.016   * more                                          33.4%      27.4%
VAST       vaster          3.169        0.042   * [-reduced], * C C                             71.9%      66.6%
           more vast       4.109        0.016   * more                                          28.1%      33.4%
POLITE     politer         7.183        0.001   * σ σ σ, * [-reduced], * σ ‘σ σ                 4.4%       8.9%
           more polite     4.109        0.016   * more                                          95.6%      91.1%
...        ...             ...
These results allow us to quantify the influence of each phonological factor in determining the form
of comparatives and superlatives. The weights of each constraint for both the -er and -est data sets
are plotted below.
Figure 4 Constraint weights for -er and -est (two panels: -er, -est)
From these graphs, we see that the most influential constraint on the morphological form is the
constraint on trisyllabic or longer comparatives/superlatives. This corroborates Quirk et al.’s
proposal that the primary cause of the alternation is word length. The constraints on morphological
forms containing four syllables, a suffix or a final segment that is not a reduced vowel also play a
significant role in determining output form, receiving similar weights in both data sets. The
remaining two constraints are less influential, particularly the consonant cluster constraint which
only has a marginal effect on the frequency of the morphological form. These findings seem to
contradict the assertion of Mondorf (2003) that adjectives ending in consonant clusters greatly
prefer more-support. No single constraint outweighs the inherent bias against the periphrastic form,
although because the constraints *σ σ σ σ and * σ‘σ σ only ever occur in tandem with * σ σ σ, we can
conclude that having four syllables or penultimate stress alone is enough to make the morphological
form dispreferred.
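The arithmetic behind this conclusion can be checked against the -er weights listed in Figure 3. The probabilities in the last line assume, for illustration only, that the morphological candidate violates just the two constraints shown; most real candidates also violate * [-reduced], which lowers the figure further.

```latex
\begin{align*}
w(*\sigma\sigma\sigma) + w(*\sigma\sigma\sigma\sigma) &= 3.419 + 2.326 = 5.745 > 4.109 = w(*\textit{more}) \\
w(*\sigma\sigma\sigma) + w(*\sigma\,{}^{\prime}\sigma\sigma) &= 3.419 + 0.935 = 4.354 > 4.109 = w(*\textit{more}) \\
\frac{e^{-5.745}}{e^{-5.745} + e^{-4.109}} &\approx 0.16,
\qquad
\frac{e^{-4.354}}{e^{-4.354} + e^{-4.109}} \approx 0.44
\end{align*}
```

In both cases the predicted share of the morphological form falls below 50%, so the periphrastic form is the preferred candidate.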
6. Modelling with regression trees
In this section, we use a second model to analyse the data set: regression trees. This model
estimates the observed frequency of inputs by identifying generalisations in the data set and
expressing them as a network of binary decisions. The algorithm is outlined below:
1. All samples start at the top of the tree (in our case, we use morphological forms as samples).
2. At each branch, the algorithm looks to find the variable that splits the remaining data in the
‘best’ possible way. Specifically, it tests the null hypothesis of independence between each
variable and the response. Then it selects the variable with the strongest association to the
response (lowest p-value) and splits on it.
3. Branches stop splitting if the p-values for all variables exceed a certain stopping criterion.
4. The prediction at each leaf node is simply the average observed frequency for all samples at
that node.
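The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used for the analysis: the independence test is replaced by an exact permutation test on the difference in means, the function names are hypothetical, and the rows are made-up toy data rather than figures from the data set.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(values, flags):
    """Exact two-sided permutation p-value for the gap in mean response
    between samples that violate a constraint (flag 1) and those that do not."""
    n, k = len(values), sum(flags)
    total = sum(values)

    def gap(indices):
        s = sum(values[i] for i in indices)
        return abs(s / k - (total - s) / (n - k))

    observed = gap([i for i, f in enumerate(flags) if f])
    relabellings = list(combinations(range(n), k))
    extreme = sum(1 for idx in relabellings if gap(idx) >= observed - 1e-12)
    return extreme / len(relabellings)


def grow(rows, features, threshold):
    """rows: list of (violations dict, observed share of the -er form)."""
    ys = [y for _, y in rows]
    leaf = {"predict": mean(ys), "n": len(rows)}          # step 4
    # Only constraints that vary within this node can split it.
    usable = [f for f in features if 0 < sum(v[f] for v, _ in rows) < len(rows)]
    if not usable:
        return leaf
    p, best = min((permutation_p_value(ys, [v[f] for v, _ in rows]), f)
                  for f in usable)                        # step 2
    if p > threshold:                                     # step 3
        return leaf
    return {"split": best, "p": p,
            "violated": grow([r for r in rows if r[0][best]], features, threshold),
            "satisfied": grow([r for r in rows if not r[0][best]], features, threshold)}


# Made-up toy data: "long" and "suffix" are hypothetical binary constraints.
ROWS = [({"long": 1, "suffix": 1}, 0.01), ({"long": 1, "suffix": 1}, 0.02),
        ({"long": 1, "suffix": 0}, 0.10), ({"long": 1, "suffix": 0}, 0.12),
        ({"long": 0, "suffix": 0}, 0.95), ({"long": 0, "suffix": 0}, 0.97),
        ({"long": 0, "suffix": 0}, 0.99), ({"long": 0, "suffix": 0}, 0.90)]

tree = grow(ROWS, ["long", "suffix"], threshold=0.05)     # step 1: all rows at the root
print(tree["split"], round(tree["p"], 4))
```

On the toy rows with a threshold of 0.05, the root splits on `long` and neither child splits again, so each leaf simply predicts the mean observed share of its samples.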
If we set the p-value threshold in step 3 to 1, no p-value can exceed it, so the algorithm will always
split, i.e. the tree will grow to maximum size. The resulting tree for the -er data set is shown below:
Figure 5 Overfitted tree for -er
While this tree is extremely faithful to the data set, many of the ‘rules’ it has learnt are tenuous and
only relevant to the data it was trained on. This makes it a poor predictor of the behaviour of new
inputs, and unlikely to be representative of a speaker’s grammar. We see that most of the early
splits have a very low p-value, indicating strong evidence of a real difference between the two
outcomes. However, at nodes 2, 31, 15, 26 and 29, the p-value is high, suggesting that the
difference between the two outcomes is no greater than would be expected from random noise.
Therefore, if we set a p-value threshold of 0.01, the resulting trees for -er and -est capture only
the most important generalisations, allowing them to be much better predictors of new input
(in the trees, >0 indicates violation and ≤0 indicates non-violation):
Figure 6 Master trees for -er and -est
The strength of regression trees is that they are highly interpretable and that they capture non-
linear relationships between constraints and observed frequencies. For example, compare the values
of the leaf nodes in the following simplified version of the -er tree, in which the algorithm decides to
split on the * [-reduced] constraint on both sides of the tree:
Figure 7 Simplified tree for -er
The impact of violating * [-reduced] in the left-hand branch (unsuffixed, and at least trisyllabic) is
much greater than the impact of violating * [-reduced] in the right-hand branch (suffixed disyllables).
This kind of variable impact could not be captured by the MaxEnt methodology, in which a
constraint has the same weight no matter which other constraints a particular candidate violates. In
this example, the impact of capturing non-linearity is negligible (only 2 words fall in the left branch
of node 8), but in a larger scale analysis with more features, capturing non-linearity may be crucial to
the success of the model.
7. Evaluating the two models
To assess the performance of the MaxEnt and regression tree models, we conduct a 5-fold
cross-validation, in which the models are trained on 80% of the data and asked to predict the relative
frequencies of the remaining 20%. This is repeated for every fifth of the data set to simulate
exposure to completely unseen adjectives. The table below displays the R² value and root mean
squared error (RMSE) of the two models for both the original and cross-validated data sets:
Figure 8 Table of R² and RMSE values for original and cross-validated data sets
             Original               Cross-Validated
             MaxEnt    Reg. Tree    MaxEnt    Reg. Tree
R² -er       0.742     0.743        0.713     0.734
R² -est      0.832     0.834        0.807     0.827
RMSE -er     0.205     0.205        0.217     0.209
RMSE -est    0.173     0.172        0.185     0.174
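The evaluation procedure can be sketched as below. Several details are assumptions made for this illustration: R² is computed as 1 minus the ratio of residual to total sum of squares (the squared correlation is another common definition), folds are assigned by interleaving rather than at random, and `fit` stands for any hypothetical training function that returns a predictor.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted shares."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted))
                     / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def k_fold_splits(n_items, k=5):
    """Yield (train, test) index lists; each item is held out exactly once."""
    for start in range(k):
        test = list(range(start, n_items, k))
        held_out = set(test)
        yield [i for i in range(n_items) if i not in held_out], test


def cross_validate(features, observed, fit, k=5):
    """Train on k-1 folds, predict the held-out fold, score all predictions."""
    predictions = [None] * len(observed)
    for train, test in k_fold_splits(len(observed), k):
        model = fit([features[i] for i in train], [observed[i] for i in train])
        for i in test:
            predictions[i] = model(features[i])
    return r_squared(observed, predictions), rmse(observed, predictions)


def fit_mean(train_features, train_observed):
    """Hypothetical baseline: always predict the mean of the training data."""
    average = sum(train_observed) / len(train_observed)
    return lambda _features: average


# Toy demonstration with made-up values, not the paper's data.
toy_observed = [0.0, 1.0] * 5
toy_r2, toy_rmse = cross_validate([None] * 10, toy_observed, fit_mean)
print(toy_r2, toy_rmse)
```

With 500 adjectives and k = 5, each round trains on 400 items and predicts the remaining 100, which corresponds to the 80%/20% split described above.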
The values in the two left-hand columns show that when trained and tested on the entire data set,
both models perform almost identically. The high R² and low RMSE values indicate a strong
correspondence between the observed and predicted frequencies. However, after cross-validation,
the regression tree model performs slightly better than the MaxEnt model. This suggests that the
relationship between constraints and observed frequencies may be non-linear.
The main cause of error in the models appears to be the large amount of variation in monosyllabic
adjectives, as well as in disyllables that end in -y. For example, easy, tiny and heavy take -er/-est
almost 100% of the time, compared to <25% for nosy, hasty and stuffy. The models cannot make
different predictions for these adjectives because they all violate the same phonological constraints.
This suggests that the relative frequency of the morphological form is not determined by phonology
alone, and that variation among these words may instead be attributed to lexical or semantic
properties such as raw frequency or gradability as proposed in the literature.
8. Summary
We have shown that the phonology of an English adjective influences whether its
comparative/superlative is formed with -er/-est or more/most. Phonological constraints can be
learned by the MaxEnt and regression tree models in order to capably simulate a grammar that
captures variation in the relative frequency of the morphological and periphrastic forms. The
grammars produced by the models allow us to draw the following conclusions:
• The number of syllables is the most influential factor in determining surface form.
• Adjectives that end in a stressed syllable, a suffix, or any segment that is not a reduced
  vowel are much less likely to take the morphological form.
• Adjectives that end in consonant clusters only slightly prefer the periphrastic form.
• The relationship between phonological factors and observed frequency is likely to be
  non-linear.
• Phonological constraints alone are not able to capture all of the variation observed for
  certain kinds of adjective, particularly monosyllables and disyllables that end in -y. This
  suggests that lexical or semantic factors may also play a role in determining output form.
3990 words (discounting appendix and bibliography)
Bibliography and References
Goldwater, S & Johnson, M (2003)
Learning OT constraint rankings using a maximum entropy model. Proceedings of the Stockholm
Workshop on Variation within Optimality Theory, ed. by Jennifer Spenader, Anders Eriksson, and
Östen Dahl, 111–120. Stockholm: Stockholm University Department of Linguistics.
Hayes, B & Wilson, C (2008)
A maximum entropy model of phonotactics and phonotactic learning. Linguistic Inquiry 39: 379-440.
Hilpert, M (2008)
English Language and Linguistics 12.3: 395–417. Cambridge University Press.
Keller, F (2000)
Gradience in grammar: Experimental and computational aspects of degrees of grammaticality.
Ph.D. thesis, University of Edinburgh.
Leech, G & Culpeper, J (1997)
The Comparison of Adjectives in Recent British English. In: Terttu Nevalainen and Leena Kahlas-
Tarkka (eds.) To Explain the Present: Studies in the Changing English Language in Honour of Matti
Rissanen (Mémoires de la Société Néophilologique de Helsinki 52), pp. 353–373. Helsinki: Société
Néophilologique.
Mondorf, B (2003)
Support for more-support. In: Günter Rohdenburg and Britta Mondorf (eds.). Determinants of
Grammatical Variation in English (Topics in English Linguistics 43), pp. 251–304. Berlin: Mouton de
Gruyter.
Quirk, R, Greenbaum, S, Leech, G & Svartvik, J (1985)
A Comprehensive Grammar of the English Language. London: Longman.