SlideShare a Scribd company logo
1 of 5
Download to read offline
Phonaesthemes: A corpora-based analysis
                                            Katya Otis (kotis@northwestern.edu)
                                         Department of Psychology, Northwestern University
                                           2029 Sheridan Road, Evanston, IL 60208 USA

                                            Eyal Sagi (ermon@northwestern.edu)
                                         Department of Psychology, Northwestern University
                                           2029 Sheridan Road, Evanston, IL 60208 USA

                           Abstract                                in separating lexical categories, a construct that necessarily
                                                                   includes some syntactic features and some semantic features
  The association between sound and meaning is commonly
  thought of as symbolic and arbitrary. While this appears to be   (Monaghan et al., 2005).1 Recent research indicates that
  mostly correct, there is some evidence that specific phonetic    systematic sound-meaning and sound-syntax relationships
  groupings can be indicative of word meaning. In this paper       play a role in language processing (Hutchins, 1998; Bergen,
  we present a corpus-based method that can be used to test        2004; Farmer, Christiansen, & Monaghan, 2006), and may
  whether such an association exists in a given corpus for a       also be important to language learning (Monaghan et al.,
  specified phonetic grouping. The results we obtain using this    2005).
  method are compared with other empirical findings in the
  field and its implications are discussed.
                                                                      To the degree that it differs from adult-directed speech,
                                                                   child-directed speech should be sensitive to the child‟s
  Keywords: Corpus analysis, Computational linguistics,            status as a language learner. Monaghan et al. (2005) tested
  Phonaesthemes, Phonetics, Psycholinguistics, Sound-              adult speech from the CHILDES corpus for the presence of
  Meaning association.
                                                                   16 phonological cues in open- and closed-class words and
                                                                   for their diagnosticity in determining whether a word is a
                         Introduction                              noun or a verb. Significantly diagnostic cues to the
   It is a popular intuition that words with similar sounds        noun/verb distinction were: syllable length, onset and
also mean similar things. There is a long tradition of belief      syllabic complexity, syllable reduction, -ed inflection
in the association between phonetic clusters and semantic          (voiced or unvoiced vowel), vowel position, and vowel
clusters going back at least as far as Wallis‟ grammar of          height. Furthermore, in an experiment on artificial language
English (Wallis, 1699). Morphemes form one such well-              learning of bigrams, they found that participants used
known cluster, but other, sub-morphemic phonetic clusters          phonological cues when distributional cues were weak or
that contribute to the meaning of the word as a whole have         absent. Since grammatical categories designate not only
also been hypothesized (Firth, 1930; Jakobsen & Waugh,             syntactically disparate words but also semantically disparate
1979). Anthropologists have documented sound symbolism             words, this might indicate that sound-meaning
in many languages (Blust, 2003; Nuckolls, 1999;                    correspondences are a boon to language learners, especially
Ramachandran & Hubbard, 2001), but its role as a purely            in low-frequency cases.
linguistic phenomenon is still unclear. Moreover, the                 Farmer, Christiansen, and Monaghan (2006) expanded the
Saussurean notion of the arbitrary relationship between the        research on phonological diagnostics for lexical category
sign‟s form and its referent is a matter of dogma for most         membership begun in Monaghan et al. (2005). They
linguists (Hockett, 1960). This makes the study of words           performed a regression analysis on over 3,000 monosyllabic
that do participate in predictable sound-meaning mappings          English words that significantly associated certain
all the more important, since it is difficult to explain how       phonological features with an unambiguous interpretation as
these patterns come to be, or why they might survive despite       either a noun or a verb. An associated series of experiments
the obvious benefits of arbitrary sound-meaning mappings,          demonstrated reaction time, reading time, and sentence
under the framework of contemporary linguistics. What we           comprehension advantages for phonologically “noun-like
mean by “sound-meaning mapping” is not purely sound                nouns” and “verb-like verbs.”
symbolism, however, nor is it morphology. In the following            Bergen (2004) used a morphological priming paradigm to
paper, we offer a statistical, corpus-based approach to the        test whether there was a processing advantage for words
phonaestheme, a sub-morphemic unit that has a predictable          containing phonaesthemes over words that shared only
effect on the meaning of the word as a whole. These non-           semantic or only formal features, or which contained
morphological relationships between sound and meaning              “pseudo-phonaesthemes.”          He found a difference in
have not been thoroughly explored by behavioral and                reaction times between the phonaestheme condition and the
computational research, with some notable exceptions (e.g.         other three conditions by comparing primed reaction times
Hutchins, 1998; Bergen, 2004).
   By contrast, sound-syntax mappings are somewhat better            1
documented in the literature. Monaghan, Chater, and                     For example, Subject-Verb-Object word order implicates
Christiansen (2005) address the role of phonetic similarity        syntax; persons, places, and things (nouns) are semantically
                                                                   different from actions and states of being (verbs).
to RTs to the same words in isolation, drawn from              exhibit greater semantic relatedness than words chosen at
Washington University‟s English Lexicon Project. He            random from the entire corpus.            This computational
demonstrated both a facilitation effect for word pairs         approach to the problem has two distinct advantages over
containing a phonaestheme and an inhibitory effect for word    the experimental methods commonly found in the literature.
pairs in which the prime contained a pseudo-phonaestheme.      First, this method is objective and does not rely heavily on
His use of corpus-based methods (in this case, Latent          intuition on either the part of the experimenter or
Semantic Analysis: Landauer, Foltz & Laham, 1998) was          participants2. Second, it is possible to use the method to test
limited to ensuring that his list of words used in meaning-    a large number of candidate phonaesthemes without
only priming pairs did not have any higher semantic            requiring us to probe each participant for hundreds of
coherence than the list of words used in phonaestheme          linguistic intuitions at a time.
priming pairs.
   Finally, Hutchins (1998, Study 1 and 2) examined                                    The Experiment
participants‟ intuitions about 46 phonaesthemes drawn from
nearly 70 years of speculation about sound-meaning links in    Methods
the literature.       In her studies, participants ranked
phonaestheme-bearing words according to their coherence        Procedure For our computational model we used Infomap
                                                               (http://infomap-nlp.sourceforge.net/; Schütze, 1997), a
with a gloss or definition. Participants also matched nonce
                                                               variant of Latent Semantic Analysis (Landauer & Dumais,
words containing phonaesthemes to a relevant definition at
rates significantly above chance. She also examined patterns   1997; Landauer et al., 1998). Infomap represents words as
internal to phonaesthemes: strength of sound-meaning           vectors in a multi-dimensional space whereby the distance
                                                               between the words is inversely proportional to their
association, regularity of this association, and
                                                               semantic similarity. This space is constructed by reducing
“productivity,” defined as likelihood that a nonword
containing that phonaestheme will be associated with the       the number of dimensions of a matrix that records the
definition of a real word containing that phonaestheme.        frequency of co-occurrence between content words in the
                                                               corpus through the application of a statistical method known
   In this paper, we chose a different approach to the study
                                                               as singular value decomposition. For the purposes of
of phonaesthemes than previously demonstrated in the
literature. Where previous approaches relied on intuition or   Infomap, two content words are said to co-occur if they are
a structured lexicon to gather words with candidate            found within a specific distance from each other (i.e., for
                                                               Infomap the co-ocurrence frequency of swim and water
phonaesthemes in them, we use a corpus analysis of texts
                                                               could depend on how many times the word swim appears
available          through         Project         Gutenberg
(http://www.gutenberg.org/; Lebert, 2005). In the following    within 15 words of water). This results in a space in which
experiment, we examine 47 distinct groups of words bearing     the vectors for words that frequently co-occur are grouped
                                                               closer together than words that rarely co-occur within the
candidate phonaesthemes from a large corpus using a
                                                               analysis window. As a result, words which relate to the
statistical method based on Latent Semantic Analysis.
   Previous studies on phonaesthemes relied on the             same topic, and can be assumed to have a strong semantic
intuitions of participants to verify the sound-meaning         relation, tend to be grouped together. The semantic
                                                               relationship between two words can then be measured by
relationships of interest (e.g., Hutchins, 1998; Bergen,
                                                               correlating the vectors representing those two words within
2004). Moreover, these methods are at their best when
testing only a limited number of phonaesthemes. As a result,   the semantic space.3
such studies have often constrained their examination to          Leveraging this property of semantic spaces allows us to
                                                               test the hypothesis that pairs of words sharing a
only a handful of phonaesthemes. Even the most extensive
                                                               phonaestheme are more likely to share some aspect of their
of these works, Hutchins (1998), who identified over 100
phonaesthemes previously indicated in the literature, uses     meaning than pairs of words chosen at random. We tested
only 46 of them in her experiments. In contrast with the       whether this was true for any specific candidate
                                                               phonaestheme using a Monte-Carlo analysis. We first
experimental methods employed by Hutchins (1998),
                                                               identified all of the words in the corpus sharing a
Bergen (2004), and others, we used a computational method
based on LSA to explore sound-meaning relationships such       conjectured phonaestheme4 resulting in a word cluster
as those exhibited by phonaesthemes.                           representing each candidate phonaestheme. Next we
                                                               performed two separate Monte-Carlo analyses. The first
   One of Latent Semantic Analysis‟ most useful features is
that it can be used to compute and compare semantic vectors
                                                                  2
of words and phrases. We use this feature to compare the            At present the experimenters choose which phonetic clusters to
semantic relatedness clusters comprised of words that share    test, meaning that intuition is still part of the process. However, the
                                                               validity of a phonetic cluster as a phonaestheme is entirely
a phonaestheme to clusters comprised of words chosen at
                                                               statistically determined.
random from the entire corpus. Because the phonaestheme           3
                                                                    This correlation is equivalent to calculating the cosine of the
as a construct necessarily involves a partial overlap in       angle formed by the two vectors.
meaning beyond that generally found in language, we               4
                                                                    It is important to note that due to the nature of a written corpus,
hypothesize that words sharing a phonaestheme would            the match was orthographical rather than phonetic. However, in
                                                               most cases the two are highly congruent.
analysis averaged the semantic relationship of 1000                        binomial distribution for our α (.05). After applying a
instances of word pairs chosen at random from the cluster.                 Bonferroni correction for performing 50 comparisons, the
This was designed to measure the overall semantic                          threshold for statistical significance of the binomial test was
relationship of words within the cluster. A second analysis                for 14 t-tests out of 100 to turn out as significant, with a
tested the statistical significance of this relationship by                frequency of 13 being marginally significant. We therefore
running 100 t-test comparisons. Each of these tests                        judged significance frequencies (#Sig below) of 15 and
compared the relationship of 50 pairs of words chosen at                   higher to indicate a phonaestheme for which we had
random from the conjectured cluster with 50 pairs of words                 statistical evidence. We judged significance frequencies of
chosen at random from a similarly sized cluster, randomly                  13 and 14 to indicate a phonaestheme for which we had
generated from the entire corpus. We recorded the number                   only marginal statistical support. A list of the results for
of times these t-tests resulted in a statistically significant             each of the tested phonaesthemes can be found in Appendix
difference (α = .05). Both of these analyses were performed                A.
3 times for each conjectured phonaestheme and the median                      Among Hutchins‟ original list of 46 possible
value for each run was used as the final result.                           phonaesthemes, we discovered 26 statistically reliable
                                                                           phonaesthemes and 2 marginally reliable phonaesthemes.
Materials We used a corpus based on Project Gutenberg                      Overall our results were in line with the empirical data
(http://www.gutenberg.org/). Specifically, we used the bulk                collected by Hutchins. By way of comparing the two
of the English language works available through the                        datasets, #Sig and Hutchins‟ average rating measure were
project‟s website. This resulted in a corpus of 4034 separate              well correlated (r = .53). Neither of the unlikely
documents consisting of over 290 million words. Infomap                    phonaestheme candidates we examined were statistically
analyzed this corpus using default settings (a co-occurrence               supported phonaesthemes (#Sigbr- = 6; #Sigz- = 5), whereas
window of 15 words and using the 20,000 most frequent                      both of our newly hypothesized phonaesthemes were
content words for the analysis) and its default stop list.                 statistically supported (#Sigkn- = 28; #Sig-gn = 23).
  The bulk of the candidate phonaesthemes we used were
taken from the list used by Hutchins (1998) with the                                                Discussion
addition of two possible phonaesthemes that seemed                           We successfully used our computational method to verify
interesting to us. We also included several letter                         phonaesthemes using a statistical corpus analysis. These
combinations that we thought were unlikely to be                           results were congruent with the empirical data collected by
phonaesthemes in order to test the method‟s capacity for                   Hutchins, suggesting that this statistical method can be used
discriminating between phonaesthemes and non-                              as a tool to examine the validity of conjectured
phonaesthemes. Overall we examined 50 possible                             phonaesthemes. Unlike previous work, our model can be
phonaesthemes. Of these, 46 were taken from the list                       used to directly test whether a cluster of words containing a
Hutchins‟ used in her first study5, two were candidates that               phonaestheme is more semantically similar than would be
we considered to be plausible phonaesthemes („kn-„ and „-                  expected by chance. While we successfully applied this test
ign‟), and for the last two we chose phonemic sequences we                 and discriminated between phonaesthemes and pseudo-
thought were unlikely to be phonaesthemes („br-„ and „z-„),                phonaesthemes, at present our method cannot identify what
yielding a final list of 47 candidate phonaesthemes.                       specific semantic content is carried by the identified
                                                                           phonaestheme.
Results                                                                      It is interesting to note that the most frequently cited
Our two measures, the average strength of the semantic                     phonaesthemes also exhibited an exceptionally high level of
relationship and the overall frequency of statistically                    support (e.g., #Siggl- = 85; #Sig-ump = 73). This supports the
significant t-test comparisons, were highly correlated (r =                common intuition about these phonetic groups‟ internal
0.93). This indicates that our method is reliable and that its             sound-meaning relationship and suggests that intuition tends
results are reproducible. Because of this high correlation we              to pick out the strongest phonaesthemes rather than weaker
will focus our analysis on the frequency of statistically                  ones. Because of this tendency, it is likely that there is an
significant t-tests as this analysis is likely to apply equally to         inherent bias toward specific “easy to find” phonaesthemes
the strength of the semantic relationship and is easier to                 in the literature.      For instance, it is possible that
interpret from a statistical perspective.                                  phonaesthemes can also be infixes, but all of the
   To determine whether a conjectured phonaestheme was                     phonaesthemes identified so far have been either prefixes or
statistically supported by our analysis we compared the                    suffixes. In order to more rigorously test the effect of
overall frequency of statistically significant t-tests with the            phonaesthemes on processing, a better method for
                                                                           identifying phonaesthemes of various degrees of strength
  5
     After examining our corpus we decided to drop three of                and semantic coherence is required. Our computational
Hutchins‟ list of phonaesthemes („str_p‟, ‟sp_t‟, and „-isp‟)              method can be adapted for such a use within a given corpus
because each of them had 3 or fewer types in our corpus and were           or across several corpora, and is therefore more suitable for
therefore not suitable for statistical analysis. It should be noted that   such a task than any previous method of which we are
two of these three („str_p‟ and „-isp‟) also had only 3 types              aware.
according to Hutchins.
References

Bergen, B. (2004). The Psychological Reality of                  Appendix A – Detailed results
  Phonaesthemes. Language, 80(2), 291-311.
Blust, R. (2003). The phonestheme ŋ in Austronesian              Table 1: Prefix Phonaesthemes from Hutchins (1998)
  languages. Oceanic Linguistics 42: 187-212.
Farmer, T. A., Christiansen, M. H., & Monaghan, P. (2006).
  Phonological typicality influences on-line sentence            Phonaestheme      Strength        #Sig #Tokens
  comprehension. Proceedings of the National Academy of               bl-            0.059          29      117
  Sciences, 103(32), 12203-12208.                                      cl-           0.033            8      86
Firth, J. (1930). Speech. London: Oxford University Press.
Hockett, C. F. (1960). The origin of speech. Scientific               cr-            0.023            4      95
  American, 203, 88-96.                                               dr-            0.046          13       72
Hutchins, S. S. (1998). The psychological reality,                     fl-           0.052          16       54
  variability,     and     compositionality     of    English
                                                                       gl-           0.120          85       25
  phonesthemes. Dissertation Abstracts International,
  59(08), 4500B. (University Microfilms No. AAT                       gr-            0.028            5     202
  9901857).                                                         sc-/sk-          0.038          11      104
Infomap [Computer Software]. (2007). http://infomap-                  scr-           0.050          23       19
  nlp.sourceforge.net/. Stanford: CA.
Jakobson, R., and Waugh, L. (1979). The sound shape of                 sl-           0.044          13       63
  language. Bloomington: Indiana University Press.                   sm-             0.061          22       44
Landauer, T. K., & Dumais, S. T. (1997). A solution to                sn-            0.080          51       37
  Plato's problem: The Latent Semantic Analysis theory of
  the acquisition, induction, and representation of                   sp-            0.023            7     109
  knowledge. Psychological Review, 104, 211-240.                      spl-           0.069          33        6
Landauer, T. K., Foltz, P. W., & Laham, D. (1998).                   spr-            0.121          92        8
   Introduction to Latent Semantic Analysis. Discourse
   Processes, 25, 259-284.
                                                                     squ-            0.038          11       12
Lebert, M. (2005). Project Gutenberg, from 1971 to 2005.               st-           0.028            9     224
  Retrieved January 22nd, 2008 from http://www.etudes-                str-           0.039            7      63
  francaises.net/dossiers/gutenberg_eng.htm
                                                                      sw-            0.045          15       29
Monaghan, P., Chater, N., & Christiansen, M. (2005). The
  differential role of phonological and distributional cues in         tr-           0.033            7     249
  grammatical categories. Cognition, 96(1), 143-182.                  tw-            0.058          31       35
Nuckolls, J. B. (1999). The case for sound symbolism.                 wr-            0.067          21       56
  Annual Review of Anthropology, 28, 225-252.
Project Gutenberg (n.d.) http://www.gutenberg.org.
Ramachandran, V. S. & Hubbard, E. M. (2001).
  Synaesthesia: A window into perception, thought and
  language. Journal of Consciousness Studies, 8(12), 3-34.
Schütze , J. (1997). Ambiguity Resolution in Language
  Learning. Stanford CA: CSLI Publications.
Wallis, J. (1699). Grammar of the English Language.
  Oxford, UK: Lichfield.
Table 2: Suffix Phonaesthemes from Hutchins (1998)

Phonaestheme       Strength       #Sig #Tokens
      -ack           0.056         19       45
      -am            0.033           7      98
     -amp            0.068         32       18
       -ap           0.060         34       40
      -ash           0.095         65       30
      -asp           0.204        100        9
      -awl           0.068         27       10
       -ick          0.058         19       36
     -inge           0.018           5       4
        -ip          0.039           9      67
   -irl/-url         0.086         68        7
       -ng           0.035         15       59
       -nk           0.037           4      59
       -oil          0.048         19       20
       -olt          0.101         92        5
     -oop            0.055         32       13
    -ouch            0.061         22       11
      -owl           0.161         97       13
       -sk           0.029           6      23
    -ump             0.095         73       19
      -ust           0.042         11       31


Table 3: Additional phonaesthemes and non-phonaesthemes

Phonaestheme        Strength      #Sig   #Tokens
      kn-            0.060         28         15
      -ign           0.059         23         14
       br-           0.029           6        77
       z-            0.011           5        11

More Related Content

What's hot

Morphology pp
Morphology ppMorphology pp
Morphology ppMissHindA
 
What can a corpus tell us about discourse
What can a corpus tell us about discourseWhat can a corpus tell us about discourse
What can a corpus tell us about discoursePascual Pérez-Paredes
 
Utilising wordsmith and atlas to explore, analyse and report qualitative data
Utilising wordsmith and atlas to explore, analyse and report qualitative dataUtilising wordsmith and atlas to explore, analyse and report qualitative data
Utilising wordsmith and atlas to explore, analyse and report qualitative dataMerlien Institute
 
AMBIGUITY-AWARE DOCUMENT SIMILARITY
AMBIGUITY-AWARE DOCUMENT SIMILARITYAMBIGUITY-AWARE DOCUMENT SIMILARITY
AMBIGUITY-AWARE DOCUMENT SIMILARITYijnlc
 
What can a corpus tell us about grammar?
What can a corpus tell us about grammar?What can a corpus tell us about grammar?
What can a corpus tell us about grammar?Pascual Pérez-Paredes
 
A Comparision of Emotion Metaphors
A Comparision of Emotion MetaphorsA Comparision of Emotion Metaphors
A Comparision of Emotion MetaphorsPhuong Vo An
 
Assimilation and reduplication in pangasinan adjectives
Assimilation and reduplication in pangasinan adjectivesAssimilation and reduplication in pangasinan adjectives
Assimilation and reduplication in pangasinan adjectivesshinathrun
 
Derivational process in matbat language (jurnal ijhan)
Derivational process in matbat language (jurnal ijhan)Derivational process in matbat language (jurnal ijhan)
Derivational process in matbat language (jurnal ijhan)Trijan Faam
 
Textual cohesion
Textual cohesionTextual cohesion
Textual cohesionmrstovila
 
basic notions in linguistics
basic notions in linguisticsbasic notions in linguistics
basic notions in linguisticswilmeridiomasuce
 
Language and its components
Language and its componentsLanguage and its components
Language and its componentsMIMOUN SEHIBI
 
What is the linguistics POR VANNESA ROGEL
What is the linguistics POR VANNESA ROGELWhat is the linguistics POR VANNESA ROGEL
What is the linguistics POR VANNESA ROGELvanne2020
 
Components of language
Components of languageComponents of language
Components of languagenirmeennimmu
 
An Introduction to Historical Linguistics and Traditional Grammar
An Introduction to Historical Linguistics andTraditional GrammarAn Introduction to Historical Linguistics andTraditional Grammar
An Introduction to Historical Linguistics and Traditional Grammarsamatta111
 
Semantic discourse analysis
Semantic discourse analysisSemantic discourse analysis
Semantic discourse analysisblessedkkr
 
Discourse analysis and vocabulary
Discourse analysis and vocabularyDiscourse analysis and vocabulary
Discourse analysis and vocabularyKaikka Kaikka
 

What's hot (20)

Morphology pp
Morphology ppMorphology pp
Morphology pp
 
What can a corpus tell us about discourse
What can a corpus tell us about discourseWhat can a corpus tell us about discourse
What can a corpus tell us about discourse
 
Utilising wordsmith and atlas to explore, analyse and report qualitative data
Utilising wordsmith and atlas to explore, analyse and report qualitative dataUtilising wordsmith and atlas to explore, analyse and report qualitative data
Utilising wordsmith and atlas to explore, analyse and report qualitative data
 
Chapter I Introduction
Chapter I IntroductionChapter I Introduction
Chapter I Introduction
 
AMBIGUITY-AWARE DOCUMENT SIMILARITY
AMBIGUITY-AWARE DOCUMENT SIMILARITYAMBIGUITY-AWARE DOCUMENT SIMILARITY
AMBIGUITY-AWARE DOCUMENT SIMILARITY
 
What can a corpus tell us about grammar?
What can a corpus tell us about grammar?What can a corpus tell us about grammar?
What can a corpus tell us about grammar?
 
A Comparision of Emotion Metaphors
A Comparision of Emotion MetaphorsA Comparision of Emotion Metaphors
A Comparision of Emotion Metaphors
 
Basic linguistic notions
Basic linguistic notionsBasic linguistic notions
Basic linguistic notions
 
Assimilation and reduplication in pangasinan adjectives
Assimilation and reduplication in pangasinan adjectivesAssimilation and reduplication in pangasinan adjectives
Assimilation and reduplication in pangasinan adjectives
 
Derivational process in matbat language (jurnal ijhan)
Derivational process in matbat language (jurnal ijhan)Derivational process in matbat language (jurnal ijhan)
Derivational process in matbat language (jurnal ijhan)
 
Textual cohesion
Textual cohesionTextual cohesion
Textual cohesion
 
basic notions in linguistics
basic notions in linguisticsbasic notions in linguistics
basic notions in linguistics
 
Language and its components
Language and its componentsLanguage and its components
Language and its components
 
Discourse and corpus
Discourse and corpusDiscourse and corpus
Discourse and corpus
 
What is the linguistics POR VANNESA ROGEL
What is the linguistics POR VANNESA ROGELWhat is the linguistics POR VANNESA ROGEL
What is the linguistics POR VANNESA ROGEL
 
Components of language
Components of languageComponents of language
Components of language
 
Issues in linguistics
Issues in linguisticsIssues in linguistics
Issues in linguistics
 
An Introduction to Historical Linguistics and Traditional Grammar
An Introduction to Historical Linguistics andTraditional GrammarAn Introduction to Historical Linguistics andTraditional Grammar
An Introduction to Historical Linguistics and Traditional Grammar
 
Semantic discourse analysis
Semantic discourse analysisSemantic discourse analysis
Semantic discourse analysis
 
Discourse analysis and vocabulary
Discourse analysis and vocabularyDiscourse analysis and vocabulary
Discourse analysis and vocabulary
 

Similar to Phonaesthemes: A Corpus-based Analysis

Semantic Glimmers: CSDL9
Semantic Glimmers: CSDL9Semantic Glimmers: CSDL9
Semantic Glimmers: CSDL9kotis
 
Corpus linguistics and multi-word units
Corpus linguistics and multi-word unitsCorpus linguistics and multi-word units
Corpus linguistics and multi-word unitsPascual Pérez-Paredes
 
2014-Article 7 ISC
2014-Article 7 ISC2014-Article 7 ISC
2014-Article 7 ISCNatashaPDA
 
THE DIFFERENCES BETWEEN SYNTAX AND SEMANTICS
THE DIFFERENCES BETWEEN SYNTAX AND SEMANTICSTHE DIFFERENCES BETWEEN SYNTAX AND SEMANTICS
THE DIFFERENCES BETWEEN SYNTAX AND SEMANTICSHENOK SHIHEPO
 
Natural Phonology by Hussain H Mayuuf/2013
Natural Phonology by Hussain H Mayuuf/2013Natural Phonology by Hussain H Mayuuf/2013
Natural Phonology by Hussain H Mayuuf/2013Hhm Mayuuf
 
Hanks_Final Research Paper
Hanks_Final Research PaperHanks_Final Research Paper
Hanks_Final Research PaperAmanda Hanks
 
Learner Views Of Using Authentic Audio To Aid Pronunciation You Can Just Grab...
Learner Views Of Using Authentic Audio To Aid Pronunciation You Can Just Grab...Learner Views Of Using Authentic Audio To Aid Pronunciation You Can Just Grab...
Learner Views Of Using Authentic Audio To Aid Pronunciation You Can Just Grab...englishonecfl
 
Examination of the Prediction of Different Dimensions of Analytic Relations’ ...
Examination of the Prediction of Different Dimensions of Analytic Relations’ ...Examination of the Prediction of Different Dimensions of Analytic Relations’ ...
Examination of the Prediction of Different Dimensions of Analytic Relations’ ...Mohammad Mosiur Rahman
 
Memory & Cognition2002, 30 (4), 583-593Spoken word recog.docx
Memory & Cognition2002, 30 (4), 583-593Spoken word recog.docxMemory & Cognition2002, 30 (4), 583-593Spoken word recog.docx
Memory & Cognition2002, 30 (4), 583-593Spoken word recog.docxandreecapon
 
Theoretical_grammar_of_english.doc
Theoretical_grammar_of_english.docTheoretical_grammar_of_english.doc
Theoretical_grammar_of_english.docssuser67b22b1
 
What can a corpus tell us about grammar
What can a corpus tell us about grammarWhat can a corpus tell us about grammar
What can a corpus tell us about grammarSami Khalil
 
Differences in Frequencies between Linking Verbs and Relative Pronouns in Wri...
Differences in Frequencies between Linking Verbs and Relative Pronouns in Wri...Differences in Frequencies between Linking Verbs and Relative Pronouns in Wri...
Differences in Frequencies between Linking Verbs and Relative Pronouns in Wri...CSCJournals
 
Kuehnast & al._Being Moved_Frontiers in Psychology_2014
Kuehnast & al._Being Moved_Frontiers in Psychology_2014Kuehnast & al._Being Moved_Frontiers in Psychology_2014
Kuehnast & al._Being Moved_Frontiers in Psychology_2014Milena Kuehnast
 
Vocabulary learning during reading
Vocabulary learning during readingVocabulary learning during reading
Vocabulary learning during readingAlexander Decker
 
A semantics theory of word classes.pdf
A semantics theory of word classes.pdfA semantics theory of word classes.pdf
A semantics theory of word classes.pdfSara Parker
 
Decoding word association 1 lexical dev within and mental lexicon for language 2
Decoding word association 1 lexical dev within and mental lexicon for language 2Decoding word association 1 lexical dev within and mental lexicon for language 2
Decoding word association 1 lexical dev within and mental lexicon for language 2Col Mukteshwar Prasad
 

Similar to Phonaesthemes: A Corpus-based Analysis (20)

Semantic Glimmers: CSDL9
Semantic Glimmers: CSDL9Semantic Glimmers: CSDL9
Semantic Glimmers: CSDL9
 
Corpus linguistics and multi-word units
Corpus linguistics and multi-word unitsCorpus linguistics and multi-word units
Corpus linguistics and multi-word units
 
2014-Article 7 ISC
2014-Article 7 ISC2014-Article 7 ISC
2014-Article 7 ISC
 
THE DIFFERENCES BETWEEN SYNTAX AND SEMANTICS
THE DIFFERENCES BETWEEN SYNTAX AND SEMANTICSTHE DIFFERENCES BETWEEN SYNTAX AND SEMANTICS
THE DIFFERENCES BETWEEN SYNTAX AND SEMANTICS
 
Lexicology as a science
Lexicology as a scienceLexicology as a science
Lexicology as a science
 
Natural Phonology by Hussain H Mayuuf/2013
Natural Phonology by Hussain H Mayuuf/2013Natural Phonology by Hussain H Mayuuf/2013
Natural Phonology by Hussain H Mayuuf/2013
 
Hanks_Final Research Paper
Hanks_Final Research PaperHanks_Final Research Paper
Hanks_Final Research Paper
 
Cognitive linguistics
Cognitive linguisticsCognitive linguistics
Cognitive linguistics
 
Cognitive linguistics
Cognitive linguisticsCognitive linguistics
Cognitive linguistics
 
Learner Views Of Using Authentic Audio To Aid Pronunciation You Can Just Grab...
Learner Views Of Using Authentic Audio To Aid Pronunciation You Can Just Grab...Learner Views Of Using Authentic Audio To Aid Pronunciation You Can Just Grab...
Learner Views Of Using Authentic Audio To Aid Pronunciation You Can Just Grab...
 
Examination of the Prediction of Different Dimensions of Analytic Relations’ ...
Examination of the Prediction of Different Dimensions of Analytic Relations’ ...Examination of the Prediction of Different Dimensions of Analytic Relations’ ...
Examination of the Prediction of Different Dimensions of Analytic Relations’ ...
 
Memory & Cognition2002, 30 (4), 583-593Spoken word recog.docx
Memory & Cognition2002, 30 (4), 583-593Spoken word recog.docxMemory & Cognition2002, 30 (4), 583-593Spoken word recog.docx
Memory & Cognition2002, 30 (4), 583-593Spoken word recog.docx
 
Theoretical_grammar_of_english.doc
Theoretical_grammar_of_english.docTheoretical_grammar_of_english.doc
Theoretical_grammar_of_english.doc
 
What can a corpus tell us about grammar
What can a corpus tell us about grammarWhat can a corpus tell us about grammar
What can a corpus tell us about grammar
 
Differences in Frequencies between Linking Verbs and Relative Pronouns in Wri...
Differences in Frequencies between Linking Verbs and Relative Pronouns in Wri...Differences in Frequencies between Linking Verbs and Relative Pronouns in Wri...
Differences in Frequencies between Linking Verbs and Relative Pronouns in Wri...
 
Kuehnast & al._Being Moved_Frontiers in Psychology_2014
Kuehnast & al._Being Moved_Frontiers in Psychology_2014Kuehnast & al._Being Moved_Frontiers in Psychology_2014
Kuehnast & al._Being Moved_Frontiers in Psychology_2014
 
MELT 104 Functional Grammar
MELT 104   Functional GrammarMELT 104   Functional Grammar
MELT 104 Functional Grammar
 
Vocabulary learning during reading
Vocabulary learning during readingVocabulary learning during reading
Vocabulary learning during reading
 
A semantics theory of word classes.pdf
A semantics theory of word classes.pdfA semantics theory of word classes.pdf
A semantics theory of word classes.pdf
 
Decoding word association 1 lexical dev within and mental lexicon for language 2
Decoding word association 1 lexical dev within and mental lexicon for language 2Decoding word association 1 lexical dev within and mental lexicon for language 2
Decoding word association 1 lexical dev within and mental lexicon for language 2
 

Phonaesthemes: A Corpus-based Analysis

  • 1. Phonaesthemes: A corpora-based analysis Katya Otis (kotis@northwestern.edu) Department of Psychology, Northwestern University 2029 Sheridan Road, Evanston, IL 60208 USA Eyal Sagi (ermon@northwestern.edu) Department of Psychology, Northwestern University 2029 Sheridan Road, Evanston, IL 60208 USA Abstract in separating lexical categories, a construct that necessarily includes some syntactic features and some semantic features The association between sound and meaning is commonly thought of as symbolic and arbitrary. While this appears to be (Monaghan et al., 2005).1 Recent research indicates that mostly correct, there is some evidence that specific phonetic systematic sound-meaning and sound-syntax relationships groupings can be indicative of word meaning. In this paper play a role in language processing (Hutchins, 1998; Bergen, we present a corpus-based method that can be used to test 2004; Farmer, Christiansen, & Monaghan, 2006), and may whether such an association exists in a given corpus for a also be important to language learning (Monaghan et al., specified phonetic grouping. The results we obtain using this 2005). method are compared with other empirical findings in the field and its implications are discussed. To the degree that it differs from adult-directed speech, child-directed speech should be sensitive to the child‟s Keywords: Corpus analysis, Computational linguistics, status as a language learner. Monaghan et al. (2005) tested Phonaesthemes, Phonetics, Psycholinguistics, Sound- adult speech from the CHILDES corpus for the presence of Meaning association. 16 phonological cues in open- and closed-class words and for their diagnosticity in determining whether a word is a Introduction noun or a verb. Significantly diagnostic cues to the It is a popular intuition that words with similar sounds noun/verb distinction were: syllable length, onset and also mean similar things. There is a long tradition of belief syllabic complexity, syllable reduction, -ed inflection in the association between phonetic clusters and semantic (voiced or unvoiced vowel), vowel position, and vowel clusters going back at least as far as Wallis‟ grammar of height. Furthermore, in an experiment on artificial language English (Wallis, 1699). Morphemes form one such well- learning of bigrams, they found that participants used known cluster, but other, sub-morphemic phonetic clusters phonological cues when distributional cues were weak or that contribute to the meaning of the word as a whole have absent. Since grammatical categories designate not only also been hypothesized (Firth, 1930; Jakobsen & Waugh, syntactically disparate words but also semantically disparate 1979). Anthropologists have documented sound symbolism words, this might indicate that sound-meaning in many languages (Blust, 2003; Nuckolls, 1999; correspondences are a boon to language learners, especially Ramachandran & Hubbard, 2001), but its role as a purely in low-frequency cases. linguistic phenomenon is still unclear. Moreover, the Farmer, Christiansen, and Monaghan (2006) expanded the Saussurean notion of the arbitrary relationship between the research on phonological diagnostics for lexical category sign‟s form and its referent is a matter of dogma for most membership begun in Monaghan et al. (2005). They linguists (Hockett, 1960). This makes the study of words performed a regression analysis on over 3,000 monosyllabic that do participate in predictable sound-meaning mappings English words that significantly associated certain all the more important, since it is difficult to explain how phonological features with an unambiguous interpretation as these patterns come to be, or why they might survive despite either a noun or a verb. An associated series of experiments the obvious benefits of arbitrary sound-meaning mappings, demonstrated reaction time, reading time, and sentence under the framework of contemporary linguistics. What we comprehension advantages for phonologically “noun-like mean by “sound-meaning mapping” is not purely sound nouns” and “verb-like verbs.” symbolism, however, nor is it morphology. In the following Bergen (2004) used a morphological priming paradigm to paper, we offer a statistical, corpus-based approach to the test whether there was a processing advantage for words phonaestheme, a sub-morphemic unit that has a predictable containing phonaesthemes over words that shared only effect on the meaning of the word as a whole. These non- semantic or only formal features, or which contained morphological relationships between sound and meaning “pseudo-phonaesthemes.” He found a difference in have not been thoroughly explored by behavioral and reaction times between the phonaestheme condition and the computational research, with some notable exceptions (e.g. other three conditions by comparing primed reaction times Hutchins, 1998; Bergen, 2004). By contrast, sound-syntax mappings are somewhat better 1 documented in the literature. Monaghan, Chater, and For example, Subject-Verb-Object word order implicates Christiansen (2005) address the role of phonetic similarity syntax; persons, places, and things (nouns) are semantically different from actions and states of being (verbs).
  • 2. to RTs to the same words in isolation, drawn from exhibit greater semantic relatedness than words chosen at Washington University‟s English Lexicon Project. He random from the entire corpus. This computational demonstrated both a facilitation effect for word pairs approach to the problem has two distinct advantages over containing a phonaestheme and an inhibitory effect for word the experimental methods commonly found in the literature. pairs in which the prime contained a pseudo-phonaestheme. First, this method is objective and does not rely heavily on His use of corpus-based methods (in this case, Latent intuition on either the part of the experimenter or Semantic Analysis: Landauer, Foltz & Laham, 1998) was participants2. Second, it is possible to use the method to test limited to ensuring that his list of words used in meaning- a large number of candidate phonaesthemes without only priming pairs did not have any higher semantic requiring us to probe each participant for hundreds of coherence than the list of words used in phonaestheme linguistic intuitions at a time. priming pairs. Finally, Hutchins (1998, Study 1 and 2) examined The Experiment participants‟ intuitions about 46 phonaesthemes drawn from nearly 70 years of speculation about sound-meaning links in Methods the literature. In her studies, participants ranked phonaestheme-bearing words according to their coherence Procedure For our computational model we used Infomap (http://infomap-nlp.sourceforge.net/; Schütze, 1997), a with a gloss or definition. Participants also matched nonce variant of Latent Semantic Analysis (Landauer & Dumais, words containing phonaesthemes to a relevant definition at rates significantly above chance. She also examined patterns 1997; Landauer et al., 1998). Infomap represents words as internal to phonaesthemes: strength of sound-meaning vectors in a multi-dimensional space whereby the distance between the words is inversely proportional to their association, regularity of this association, and semantic similarity. This space is constructed by reducing “productivity,” defined as likelihood that a nonword containing that phonaestheme will be associated with the the number of dimensions of a matrix that records the definition of a real word containing that phonaestheme. frequency of co-occurrence between content words in the corpus through the application of a statistical method known In this paper, we chose a different approach to the study as singular value decomposition. For the purposes of of phonaesthemes than previously demonstrated in the literature. Where previous approaches relied on intuition or Infomap, two content words are said to co-occur if they are a structured lexicon to gather words with candidate found within a specific distance from each other (i.e., for Infomap the co-ocurrence frequency of swim and water phonaesthemes in them, we use a corpus analysis of texts could depend on how many times the word swim appears available through Project Gutenberg (http://www.gutenberg.org/; Lebert, 2005). In the following within 15 words of water). This results in a space in which experiment, we examine 47 distinct groups of words bearing the vectors for words that frequently co-occur are grouped closer together than words that rarely co-occur within the candidate phonaesthemes from a large corpus using a analysis window. As a result, words which relate to the statistical method based on Latent Semantic Analysis. Previous studies on phonaesthemes relied on the same topic, and can be assumed to have a strong semantic intuitions of participants to verify the sound-meaning relation, tend to be grouped together. The semantic relationship between two words can then be measured by relationships of interest (e.g., Hutchins, 1998; Bergen, correlating the vectors representing those two words within 2004). Moreover, these methods are at their best when testing only a limited number of phonaesthemes. As a result, the semantic space.3 such studies have often constrained their examination to Leveraging this property of semantic spaces allows us to test the hypothesis that pairs of words sharing a only a handful of phonaesthemes. Even the most extensive phonaestheme are more likely to share some aspect of their of these works, Hutchins (1998), who identified over 100 phonaesthemes previously indicated in the literature, uses meaning than pairs of words chosen at random. We tested only 46 of them in her experiments. In contrast with the whether this was true for any specific candidate phonaestheme using a Monte-Carlo analysis. We first experimental methods employed by Hutchins (1998), identified all of the words in the corpus sharing a Bergen (2004), and others, we used a computational method based on LSA to explore sound-meaning relationships such conjectured phonaestheme4 resulting in a word cluster as those exhibited by phonaesthemes. representing each candidate phonaestheme. Next we performed two separate Monte-Carlo analyses. The first One of Latent Semantic Analysis‟ most useful features is that it can be used to compute and compare semantic vectors 2 of words and phrases. We use this feature to compare the At present the experimenters choose which phonetic clusters to semantic relatedness clusters comprised of words that share test, meaning that intuition is still part of the process. However, the validity of a phonetic cluster as a phonaestheme is entirely a phonaestheme to clusters comprised of words chosen at statistically determined. random from the entire corpus. Because the phonaestheme 3 This correlation is equivalent to calculating the cosine of the as a construct necessarily involves a partial overlap in angle formed by the two vectors. meaning beyond that generally found in language, we 4 It is important to note that due to the nature of a written corpus, hypothesize that words sharing a phonaestheme would the match was orthographical rather than phonetic. However, in most cases the two are highly congruent.
  • 3. analysis averaged the semantic relationship of 1000 binomial distribution for our α (.05). After applying a instances of word pairs chosen at random from the cluster. Bonferroni correction for performing 50 comparisons, the This was designed to measure the overall semantic threshold for statistical significance of the binomial test was relationship of words within the cluster. A second analysis for 14 t-tests out of 100 to turn out as significant, with a tested the statistical significance of this relationship by frequency of 13 being marginally significant. We therefore running 100 t-test comparisons. Each of these tests judged significance frequencies (#Sig below) of 15 and compared the relationship of 50 pairs of words chosen at higher to indicate a phonaestheme for which we had random from the conjectured cluster with 50 pairs of words statistical evidence. We judged significance frequencies of chosen at random from a similarly sized cluster, randomly 13 and 14 to indicate a phonaestheme for which we had generated from the entire corpus. We recorded the number only marginal statistical support. A list of the results for of times these t-tests resulted in a statistically significant each of the tested phonaesthemes can be found in Appendix difference (α = .05). Both of these analyses were performed A. 3 times for each conjectured phonaestheme and the median Among Hutchins‟ original list of 46 possible value for each run was used as the final result. phonaesthemes, we discovered 26 statistically reliable phonaesthemes and 2 marginally reliable phonaesthemes. Materials We used a corpus based on Project Gutenberg Overall our results were in line with the empirical data (http://www.gutenberg.org/). Specifically, we used the bulk collected by Hutchins. By way of comparing the two of the English language works available through the datasets, #Sig and Hutchins‟ average rating measure were project‟s website. This resulted in a corpus of 4034 separate well correlated (r = .53). Neither of the unlikely documents consisting of over 290 million words. Infomap phonaestheme candidates we examined were statistically analyzed this corpus using default settings (a co-occurrence supported phonaesthemes (#Sigbr- = 6; #Sigz- = 5), whereas window of 15 words and using the 20,000 most frequent both of our newly hypothesized phonaesthemes were content words for the analysis) and its default stop list. statistically supported (#Sigkn- = 28; #Sig-gn = 23). The bulk of the candidate phonaesthemes we used were taken from the list used by Hutchins (1998) with the Discussion addition of two possible phonaesthemes that seemed We successfully used our computational method to verify interesting to us. We also included several letter phonaesthemes using a statistical corpus analysis. These combinations that we thought were unlikely to be results were congruent with the empirical data collected by phonaesthemes in order to test the method‟s capacity for Hutchins, suggesting that this statistical method can be used discriminating between phonaesthemes and non- as a tool to examine the validity of conjectured phonaesthemes. Overall we examined 50 possible phonaesthemes. Unlike previous work, our model can be phonaesthemes. Of these, 46 were taken from the list used to directly test whether a cluster of words containing a Hutchins‟ used in her first study5, two were candidates that phonaestheme is more semantically similar than would be we considered to be plausible phonaesthemes („kn-„ and „- expected by chance. While we successfully applied this test ign‟), and for the last two we chose phonemic sequences we and discriminated between phonaesthemes and pseudo- thought were unlikely to be phonaesthemes („br-„ and „z-„), phonaesthemes, at present our method cannot identify what yielding a final list of 47 candidate phonaesthemes. specific semantic content is carried by the identified phonaestheme. Results It is interesting to note that the most frequently cited Our two measures, the average strength of the semantic phonaesthemes also exhibited an exceptionally high level of relationship and the overall frequency of statistically support (e.g., #Siggl- = 85; #Sig-ump = 73). This supports the significant t-test comparisons, were highly correlated (r = common intuition about these phonetic groups‟ internal 0.93). This indicates that our method is reliable and that its sound-meaning relationship and suggests that intuition tends results are reproducible. Because of this high correlation we to pick out the strongest phonaesthemes rather than weaker will focus our analysis on the frequency of statistically ones. Because of this tendency, it is likely that there is an significant t-tests as this analysis is likely to apply equally to inherent bias toward specific “easy to find” phonaesthemes the strength of the semantic relationship and is easier to in the literature. For instance, it is possible that interpret from a statistical perspective. phonaesthemes can also be infixes, but all of the To determine whether a conjectured phonaestheme was phonaesthemes identified so far have been either prefixes or statistically supported by our analysis we compared the suffixes. In order to more rigorously test the effect of overall frequency of statistically significant t-tests with the phonaesthemes on processing, a better method for identifying phonaesthemes of various degrees of strength 5 After examining our corpus we decided to drop three of and semantic coherence is required. Our computational Hutchins‟ list of phonaesthemes („str_p‟, ‟sp_t‟, and „-isp‟) method can be adapted for such a use within a given corpus because each of them had 3 or fewer types in our corpus and were or across several corpora, and is therefore more suitable for therefore not suitable for statistical analysis. It should be noted that such a task than any previous method of which we are two of these three („str_p‟ and „-isp‟) also had only 3 types aware. according to Hutchins.
  • 4. References Bergen, B. (2004). The Psychological Reality of Appendix A – Detailed results Phonaesthemes. Language, 80(2), 291-311. Blust, R. (2003). The phonestheme ŋ in Austronesian Table 1: Prefix Phonaesthemes from Hutchins (1998) languages. Oceanic Linguistics 42: 187-212. Farmer, T. A., Christiansen, M. H., & Monaghan, P. (2006). Phonological typicality influences on-line sentence Phonaestheme Strength #Sig #Tokens comprehension. Proceedings of the National Academy of bl- 0.059 29 117 Sciences, 103(32), 12203-12208. cl- 0.033 8 86 Firth, J. (1930). Speech. London: Oxford University Press. Hockett, C. F. (1960). The origin of speech. Scientific cr- 0.023 4 95 American, 203, 88-96. dr- 0.046 13 72 Hutchins, S. S. (1998). The psychological reality, fl- 0.052 16 54 variability, and compositionality of English gl- 0.120 85 25 phonesthemes. Dissertation Abstracts International, 59(08), 4500B. (University Microfilms No. AAT gr- 0.028 5 202 9901857). sc-/sk- 0.038 11 104 Infomap [Computer Software]. (2007). http://infomap- scr- 0.050 23 19 nlp.sourceforge.net/. Stanford: CA. Jakobson, R., and Waugh, L. (1979). The sound shape of sl- 0.044 13 63 language. Bloomington: Indiana University Press. sm- 0.061 22 44 Landauer, T. K., & Dumais, S. T. (1997). A solution to sn- 0.080 51 37 Plato's problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of sp- 0.023 7 109 knowledge. Psychological Review, 104, 211-240. spl- 0.069 33 6 Landauer, T. K., Foltz, P. W., & Laham, D. (1998). spr- 0.121 92 8 Introduction to Latent Semantic Analysis. Discourse Processes, 25, 259-284. squ- 0.038 11 12 Lebert, M. (2005). Project Gutenberg, from 1971 to 2005. st- 0.028 9 224 Retrieved January 22nd, 2008 from http://www.etudes- str- 0.039 7 63 francaises.net/dossiers/gutenberg_eng.htm sw- 0.045 15 29 Monaghan, P., Chater, N., & Christiansen, M. (2005). The differential role of phonological and distributional cues in tr- 0.033 7 249 grammatical categories. Cognition, 96(1), 143-182. tw- 0.058 31 35 Nuckolls, J. B. (1999). The case for sound symbolism. wr- 0.067 21 56 Annual Review of Anthropology, 28, 225-252. Project Gutenberg (n.d.) http://www.gutenberg.org. Ramachandran, V. S. & Hubbard, E. M. (2001). Synaesthesia: A window into perception, thought and language. Journal of Consciousness Studies, 8(12), 3-34. Schütze , J. (1997). Ambiguity Resolution in Language Learning. Stanford CA: CSLI Publications. Wallis, J. (1699). Grammar of the English Language. Oxford, UK: Lichfield.
  • 5. Table 2: Suffix Phonaesthemes from Hutchins (1998) Phonaestheme Strength #Sig #Tokens -ack 0.056 19 45 -am 0.033 7 98 -amp 0.068 32 18 -ap 0.060 34 40 -ash 0.095 65 30 -asp 0.204 100 9 -awl 0.068 27 10 -ick 0.058 19 36 -inge 0.018 5 4 -ip 0.039 9 67 -irl/-url 0.086 68 7 -ng 0.035 15 59 -nk 0.037 4 59 -oil 0.048 19 20 -olt 0.101 92 5 -oop 0.055 32 13 -ouch 0.061 22 11 -owl 0.161 97 13 -sk 0.029 6 23 -ump 0.095 73 19 -ust 0.042 11 31 Table 3: Additional phonaesthemes and non-phonaesthemes Phonaestheme Strength #Sig #Tokens kn- 0.060 28 15 -ign 0.059 23 14 br- 0.029 6 77 z- 0.011 5 11