Effects of CLI and Markedness on L2 Word Formation
Word formation through use of derivational morphemes is essential to the acquisition of
foreign languages. Such an ability enables the learner to parse unknown words in order to
glean their meaning without the use of outside resources, and to fill their own lexical gaps
by recombining units they already have access to. The latter ability is of special interest
to the study of second language (L2) acquisition, as its correct usage hinges upon
multiple factors. Language learners must be able to recognize morphemes as separate
units, understand their meanings, identify the syntactic groups they attach to and create,
and be aware of any usage constraints, whether arbitrary or otherwise conditioned. This is
a complicated process, and conscious application thereof can greatly augment the
vocabulary of the L2 learner and improve their understanding of given input.
The acquisition of this ability is already a complex matter, and is further
complicated by cross-linguistic influences (CLI) involving transfer from either the native
language (L1) or from within the L2, and potentially the degree of markedness within the
material. As word formation can be expected to expand L2 learners’ vocabularies and improve
their knowledge of the L2, any insight into factors helping or hindering this process is of
great importance. Thus, this study intends to add data to the discussion of how
and to what extent CLI and markedness affect L2 word formation.
In the next section I will detail the parameters of CLI and markedness that are
relevant to this study. The third section will address the parameters and aims of the study.
Section 4 will review a previous study on the subject. Section 5 will present the results
and discussion, and I will conclude in Section 6.
2. CLI and Markedness
2.1 Cross-linguistic influences
In the study of L2 acquisition, the effect of the L1 on the progress and usage of the
L2 has long been taken into account. This “influence” from the L1 may come from a lack
of understanding of the L2 or insufficient exposure to correct forms, learners’ perceptions
of similarities or differences between the languages, shared or clashing features, et cetera.
It should be noted that “influence” should not be taken to be necessarily
detrimental. Features of the L1 may facilitate an easier acquisition of L2 features that
other learners from different linguistic backgrounds may have trouble with. That is, if the
L1 and L2 share a feature such as the progressive tense in English and Spanish, learners
from either background will have an easier time learning this tense in the L2 than an L1
German speaker would, as there is no progressive tense in German. The ease with which the
tense is acquired in this case is likely a result of influence from their L1.
Naturally, CLI can also result in errors, and not simply a difficulty in acquisition.
An L1 English speaker learning Japanese will not have experience with a distinction
between a subject and a topic. Due to this absence from English, they are likely to
mislabel units in Japanese. An L1 Korean speaker who already has exposure to this
distinction is less likely to make the same errors.
Therefore, a workable definition of CLI might be “any influence from already
acquired languages which has an effect on L2 usage and acquisition, whether positive or
negative.”
For this study, we are interested in the negative effects of CLI on word formation.
In the general usage of an L2, negative CLI may be manifested in a few ways. Learners
may try to mimic features of their L1 that are ungrammatical in the L2, they may overuse
features or prefer certain structures to others, or they may avoid usage of structures as
much as possible to minimize their mistakes. Before discussing how CLI affects word
formation directly, we must discuss the effect of markedness on CLI.
2.2 Markedness
CLI can be partly based on perceived difficulties between languages. Markedness
is a more objective measure of the difficulty of specific language facets. It is more or less
a measure of the number of features that distinguish these facets, and can be applied to
most basic linguistic fields. An unmarked component indicates that no features are used
in marking, and is considered the base state. Marked components require differentiation,
and therefore need to be broken into subsets with further features. For example, in a
number system, the singular, having the features [Ø], is regarded as the unmarked
state. Languages do not typically mark nouns for singularity. Plural, [group], is more
marked than the singular, and is much more likely to be explicitly marked by a language
as such. Dual is a subset of plural: minimally plural, or [group, minimal] (Harley & Ritter
2000), and one is unlikely to find a language that does not mark this morphologically if it
Many components of language are grouped into similar hierarchies of markedness.
Those elements that are more marked are considered to be more complex and less
frequent and therefore more difficult to learn.
2.3 CLI on word formation
Both CLI and markedness affect the acquisition of L2 English
derivational affixes. With regard to word formation, especially the more constrained
English word formation, we typically consider non-native affixes to be more marked.
This is due to the fact that these affixes come with more constraints (such as only being
applicable to words of a specific etymology), more base allomorphy (in- may be realized
as ir-, il-, or im-; an affix may change the pronunciation, spelling, or stress patterns of the
base), are generally semantically less transparent, and attach overall to less frequent
words, giving learners less exposure to them. With this in mind, it is assumed that L2
English learners will have a harder time acquiring these morphemes, leading to more
errors in their production, especially spontaneous usage (using morphologically complex
words that were not learned as whole units).
Learners use different strategies in spontaneous word formation that may lead to
errors. Broadly, these strategies encompass drawing on knowledge from within the L2
(intralanguage) or from their L1 (interlanguage), and can be generalized as coinage (CO)
and foreignizing (FO), respectively. Further strategies of overgeneralization (OG), and
back-formation (BF) are examples of intralanguage coinage, as they also take cues from
the L2. All strategies are outlined below:
COINAGE (1)          Creates an existing word from a new base and affix
COINAGE (2)          Shortens a lexeme creating a new base, to which affixes are attached
FOREIGNIZING         Uses L1 words as a base
OVERGENERALIZATION   Applies affix without regard to restrictions
BACK-FORMATION       Derives a base from a complex form
The following are examples of each strategy:
COINAGE (1)          *costable ⟶ expensive
COINAGE (2)          *sustable ⟶ sustainable
FOREIGNIZING         *juristical (from German juristisch) ⟶ legal
OVERGENERALIZATION   *specifical ⟶ specific
BACK-FORMATION       *inhalate (from inhalation) ⟶ inhale
Because the definition given in the table above for overgeneralization is much too
broad (without the following break-down it encompasses over half of my data set), I have
split the strategy into three subsets, based on the types of errors created. This allows for a
more fine-tuned analysis. These are OG, constraint violation (CV), and rival affix (RA).¹
OVERGENERALIZATION     Attached an affix to a base that needed none
CONSTRAINT VIOLATION   Incorrect affix/base pairing. *pollutive ⟶ polluting
RIVAL AFFIX            Attached the wrong affix from a pair with the same meaning. *unexpensive ⟶ inexpensive
¹ These are common terms in the literature, but I am defining them here for the purposes of this study.
3. TOEFL Corpus Study
3.1 Study overview, methodology, concerns, and aims
This study was conducted with data from the TOEFL 2011 exam. This is the
standard test given to measure English language abilities. It is a high-stakes exam, as it is
used for purposes such as university entry. The data set contained essays from speakers of Spanish,
Italian, French, German, Turkish, Arabic, Japanese, Chinese, Korean, Telugu, and Hindi.
Essays are divided by skill level: high, medium, or low. Participants were given eight
prompts to choose from and a limited amount of time in which to write the essay. It was a
concern that such a high-stakes exam would not lend itself to participant creativity, as
incorrect English could lower participants’ scores, making them more likely to use forms
they know to be correct.
For the purposes of this study, I only used data from speakers of Spanish, Italian,
German, Turkish, Arabic, and Japanese. This was done to keep a similar background to
the Callies study, outlined in the next section, but as the exam did not have data on
speakers of Russian, I opted to replace it with Arabic and Japanese, to further diversify
the background pool. This narrowed down the available essays from 10,000 to 5,000.
Within these essays, I looked for instances of the following affixes:
PREFIXES          Native                Non-Native
                  un-                   in-, de-, dis-
SUFFIXES          Native                Non-Native
VERBAL BASE                             -ate, -ify, -ize
NOMINAL BASE      -ment, -ness, -ship   -ion, -ity, -ism
ADJECTIVAL BASE   -ful, -ly             -able/ible, -ic(al), -ive
This set of affixes is in keeping with Callies’ study, with the exception of -hood,
which I replaced with -ly. This was due to the extreme unproductiveness of -hood.
I might have done the same with -ship, as I found only two instances of its
With these constraints, I extracted all the relevant essays, and used the grep
function to search these for instances of each affix. This was not without its faults, as
grep is a string-searching tool, and I was unable to account for spelling errors or to
target only morphologically complex words. As such, my data set likely has omissions.
Furthermore, as I could only search for strings, much more was returned than
necessary. Subsequently I combed through the returned lists and determined for myself
which were morphological mistakes and which were not. Human error on my part may
account for typing errors mistaken as morphological errors (the u and i keys are side by
side on most keyboards, and exchanging un- for in- was a popular error), or other
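To illustrate the kind of string-based search described in this section, the following is a minimal sketch in Python. It is not the procedure actually used in this study (which relied on grep); the function name affix_candidates and the matching rules are assumptions made for illustration only. The affix lists are taken from the table of affixes in this section, and the sample sentence is drawn from excerpt (19) in Section 5.

```python
import re

# Affix inventory copied from the affix table in this section.
PREFIXES = ("un", "in", "de", "dis")
SUFFIXES = ("ate", "ify", "ize", "ment", "ness", "ship", "ion", "ity", "ism",
            "ful", "ly", "able", "ible", "ic", "ical", "ive")

WORD_RE = re.compile(r"[A-Za-z]+")


def affix_candidates(text):
    """Return (word, matched_affixes) for every word that begins or ends
    with one of the listed affix strings (hypothetical helper).

    This is pure string matching, so it over-returns: it cannot tell a real
    affix from a coincidental letter sequence, and it cannot see misspellings.
    """
    hits = []
    for word in WORD_RE.findall(text.lower()):
        matched = [p + "-" for p in PREFIXES
                   if word.startswith(p) and len(word) > len(p)]
        matched += ["-" + s for s in SUFFIXES
                    if word.endswith(s) and len(word) > len(s)]
        if matched:
            hits.append((word, matched))
    return hits


sample = "most of the young people are uncapable of enjoying life"
for word, affixes in affix_candidates(sample):
    print(word, affixes)
```

As with grep, a search of this kind also returns monomorphemic words such as "family" (which merely ends in the string "ly"), so every hit still has to be checked by hand.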
With this in mind, I narrowed down from 5,000 essays a data set of 500 instances;
about one error for every ten essays. This data set is what was used to facilitate the
discussion in Section 5 concerning the following:
❖ DOES CLI AFFECT THE L2 USAGE OF ENGLISH AFFIXES, AND HOW?
❖ DOES THE MARKEDNESS OF THE AFFIX PLAY A ROLE IN CLI MANIFESTATIONS?
❖ DO THESE EFFECTS DIFFER BETWEEN L1S?
4. Previous Work
Callies (2014) conducted similar research to the study outlined above, using data
from the ICLE databank. Though the selection of relevant texts used in his study was
much smaller than the TOEFL data, the data came from low-stakes settings, potentially
resulting in relatively more creative affix usage.
Callies’ research found that the strategy of foreignizing was largely one of
Romance-language speakers, but found little evidence for trends regarding other L1s.
This was explained as many of these strategies being L2-based, and therefore unlikely to
be conditioned by the L1. His findings also indicated that creative usage resulted in errors
primarily with non-native affixes, suggesting that markedness is indeed a factor.
He found high rates of back-formation for the affix -ate, primarily back-forming
from -(a)tion forms. He also found that un- was overwhelmingly preferred to in-, as was
-ical to -ic.
5. Study Results and Discussion
Results have been normalized based on distributional data and graphed below. It
should be noted that this data was normalized based on each L1, so that this graph
represents the distribution of strategies within the language. This holds for all figures
below. This was done as L1 Romance speakers were the highest frequency group, and
would otherwise have the highest rates in nearly every category. Excerpts used as
examples below are unaltered, and relevant forms are underlined.
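One way to read the per-L1 normalization described above is as a within-group proportion: each strategy count is divided by the total number of errors recorded for that L1 group. The sketch below assumes that reading. The function name normalize_within_l1 is hypothetical, and the toy counts are invented for illustration; they are not figures from this data set.

```python
from collections import Counter


def normalize_within_l1(records):
    """records: list of (l1_group, strategy) pairs, one per error instance.

    Returns {l1_group: {strategy: share of that group's errors}}, so that
    the shares within each L1 group sum to 1.
    """
    totals = Counter(l1 for l1, _ in records)   # errors per L1 group
    pair_counts = Counter(records)              # errors per (L1, strategy)
    shares = {}
    for (l1, strategy), n in pair_counts.items():
        shares.setdefault(l1, {})[strategy] = n / totals[l1]
    return shares


# Invented toy counts, NOT taken from the study: the larger group dominates
# the raw counts, but the within-group shares are directly comparable.
toy = [("ROM", "CV")] * 6 + [("ROM", "FO")] * 2 + [("GER", "CV"), ("GER", "FO")]
shares = normalize_within_l1(toy)
for l1, dist in shares.items():
    print(l1, dist)
```

Under this scheme a group that contributes many essays no longer shows the highest value in every category simply because of its size.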
FIGURE 1. DISTRIBUTION OF STRATEGY USAGE ACROSS L1 (strategies: CO, FO, OG, CV, BF, RA; L1 groups: ROM, GER, TUR, ARA, JAPN)
As noted in Callies (2014), foreignizing appears to be a strategy predominately
used by Romance language speakers. Indeed, my results have found the same pattern. Of
the 33 occurrences of foreignizing in the data set, 31 had Spanish or Italian backgrounds
(the other two were German L1 speakers). This is hardly surprising, as English has had
considerable influence from French, another Romance language. Speakers fill lexical
gaps with words from their L1, resulting in formations that apply English affixes to L1
1. The frenetic rythm of our society seem to emarginate the people regarded as
unuseful so it is impossible for them to live in a life in wich every day is seen as a
possible to enjoy life. (Italian, M | marginalize, from Italian
2. Also, you have see the product that is better for you and have realition ships
with the people involucrate in this situation. (Spanish, L | involved, from Spanish
These are occasionally etymologically unrelated to the target word, but instead to
borrowed English words with similar but not identical meanings and usage.
3. In that case students are learning the esential things about the subjects, but are
not taking the time in any of them to specialize and pronfundize their knowledge
in a certain subject. (Spanish, H | deepen, from Spanish profundizar, similar to
4. When you study something that is not obvious or immediatly rapportable at the
real life, for example math, if you are a student, you'll ask to yourself why are you
learning it […] (Italian, M | relatable, from Italian rapportabile, similar to
In the two German instances, the words borrowed from the L1 are unrelated to
English vocabulary. It is likely they perceived the languages as similar enough to borrow
from the L1 to fill a lexical gap in their knowledge, being both Germanic, though
English’s extensive outside influence has replaced words or filled gaps.
5. As mentioned above people who haven't to work anymore have much more time
to concentrate on their hobbies than younger people. So if enjoying life would
really be indicated by the activities of the people, older people can concurate with
young people basically by having more time. (German, M | compete, from
No other strategy is so restricted to particular L1s, which suggests that many
effects of CLI in word formation are not conditioned by the native language.
Considering, though, that foreignizing is the only strategy that necessitates pulling from the
L1, this is to be expected.
FIGURE 2. DISTRIBUTION OF AFFIX USAGE ACROSS STRATEGIES (Affix vs. Error Type; affixes: able/ible, ate, de/dis, ful, ic(al), ify, ion, ism, ity, ive, ize, ly, ment, ness, ship, un/in; error types: CO, FO, OG, CV, BF, RA)
The raw frequency counts showed a roughly equal number of errors with both
native and non-native affixes. This would indicate that markedness may not be as
important a factor in the acquisition of affixes as hypothesized. It is, however, interesting
to note that native affixes tend to be incorrectly used in fewer ways than non-native affixes.
Aside from the native/non-native pair un/in-, native affixes are primarily misused in one
of three ways: overgeneralization, constraint violations, and coinage. Native affixes show
much higher rates of overgeneralization than non-native affixes. It is possible that, due to
the high productivity of these affixes, learners simply do not consider the presence of rules. Many of these
affixes are attached to bases that already belong to the category the affix creates. Learners may have
simply learned or intuited what part of speech particular affixes form, and do not stop to
consider whether their base already fulfills that function before applying the affix.
6. There are many adolescence who do not have a lot of fun because this stage of
life can be a very difficult one fore many young people. It is the time where the
body changes a lot. Young people have to deal with this changement. (German, M
This is not restricted to native affixes, though it is much less attested with non-native ones.
7. For have broad knowledge of many academy subjects you have to study a lot
and paid attention to many things for understand well for specialize in one specific
subject you only have to focus in one subject and you may controlate all its
derivates. (Spanish, L | control)
Constraint violations may arise due to perceptions of across-the-board combinability. A
significant number of the instances of native-affix constraint violations come from using
the wrong base rather than the wrong affix.
8. Although they are usually old, they can make an effort in order to invent new
things and they take risks. Therefore , they can live truthly to be successful.
(Japanese, M | truly)
9. In 2003, i had the opportunity to participate to the world congres of youth in
Morocco, there, i met and debate with young people from different countries about
some of the most burnig isues in the world , and tried to find solutions to deal with
poverty, unemploy; […] (Arabic, M | unemployment)
Instances where the affix is incorrect seem mostly (though not entirely) to stem from avoiding
use of the gerund -ing. It is possible that learners have trouble mapping more than one
usage to a morpheme, and avoid using this suffix for forms other than the progressive
tense, leading them to find other nominalizing morphemes. This need is rarely filled with
10. Therefore as the number of products on the market have increased
esponentially, the advertisement had to introduced more and more sophisticated
and intrusive mechanisms to get its aim of convincement. (Italian, H | convincing)
11. Another reason is that you will save time and money because you wont spend
time trying to find places unknown for you due to the guideness of the person
leading the group. (Spanish, H | guiding)
Constraint violations with non-native affixes, on the other hand, are more often the result
of applying the wrong affix.
12. So, with no replacement, oils seems to be remain the main source of power for
autos, in accordance with the argument I made above, rising oil prices will still
make these technologies unattractable. (Turkish, H | unattractive)
13. This can lead to the government to save this fossil carburant and to use it better
and so they will see as a priority to incentivate the public transport […] (Italian, M
The large contrast between the meanings of the morpheme used and the one intended
indicates that learners may have little more than a basic understanding of these affixes,
perhaps limited to what syntactic category they attach to and then create. In (12), both
morphemes form adjectives from verbs, but -ive has the meaning “having the nature of”,
and -able “able to be” (among others). It is likely that explicit instruction of non-native
English affixes is shallow at best, and at worst nonexistent, leaving learners to intuit for
themselves a very complicated system of affixation.
High levels of coinage might be explained by the fact that, in creating new words, these productive
native affixes are more likely to attach to a new base than morphemes with more
usage constraints. This may indicate that learners are more aware of the meanings and
usage of these morphemes, and simply lack the vocabulary.
14. […] that certain hyped product is not as good as it is and will cause in other
people a sense of afraidness of spending a fair amount of money on a product that
will surely disappoint them. (Italian, H | fear)
15. We usually link young people to the energy, when I think about a baby, i think
life, movement and when i think about an old person i think something slow,
unlively. (Italian, M | listless)
Instances of coinage using non-native affixes are much more likely to use clipped bases,
mirroring the stem allomorphy often present in their regular usage.
16. Another useful activity young people could do for their community would be
go to elementary schools and sensibilize kids about the problem of bullism, in an
age where it emerges. (Italian, H | bullying)
17. […] it is a place where the child development her or his motricity and
intelectual forms based in understand ideas and not only in learn facts. (Spanish,
M | motor-skills)
In the case of rival affixes un/in-, the native affix un- is highly preferred to its non-native
counterpart. While a portion of errors involving the affix was due to coinage or constraint
18. It is difficult to become an expert in one specific subject without a good
knowledge in other different subjects , but it is unproductive and unuseful towards
society to have a general knowledge but not the possibility to get a specific role.
(Italian, L | useless)
The majority of errors were in the preference of un- over in-. In fact, out of 78 instances
of rival affixation between un- and in-, only in four of these instances was in- preferred
19. This is an important reason , that explains why most of the young people are
uncapable of enjoying life, the way they should, and why older people are capable
to. (Spanish, H | incapable)
20. If he tris his chance in show business, that would not be taking risk, but rather
unrational, since the result would be catastrophe. (Turkish, H | irrational)
In some cases this preference in affix also affects shades of meaning. In (21), unsecure
has connotations of something not being safe, whereas the intended form, insecure, is in
this context intended for a personal feeling about oneself.
21. He's unsecure about his financial situation and therefore now is more
cautionous. (German, H | insecure)
Another interesting phenomenon is the complete lack of back-formation from
native affixes. Aside from foreignizing, back-formation seems to be the least preferred
method among all L1 groups, which may be due to the restrictedness of its usage. The
majority of instances, in fact, come solely from the application of the suffix -ate, a
pattern also found in Callies’ data.
22. […] if you don't know a thing you can't imaginate of what this thing need.
(Italian, L | imagine, from imagination)
23. Broad knowledge prepare us to rapidly adaptate to a changing enviroment.
(Spanish, H | adapt, from adaptation)
Callies also noted that the majority of these back-formations with -ate come from forms
ending in -(a)tion. My data set corroborates this finding. Of the 48 instances of -ate in my
data, half involved back-formation, and only one of these was derived not from an
-(a)tion form but from its equivalent, -ison.
24. In my personal experience, I have traveled both in a group led by a tour guide,
and in a group with out a tour guide. I think that one of this experince is better and
fun. Next I will comparate this two situations. (Spanish, M | compare, from
It is possible that learners consider this affix to be the result of combining -ate + -ion, and
back-form by removing the perceived last suffix (though the allomorph -ison does not
lend itself to this assumption), or that they prefer to use non-native affixes where one is
already present. Although learners frequently overgeneralize the prefix un- to bases with
other, non-native affixes, the behavior of the other morphemes offers little evidence for
or against the latter explanation. Occurrences of back-formation also greatly drop off as the
learner progresses, mainly being found in lower-level learners (see Appendix, Fig. 5).
6. Conclusion
CLI presents itself in many complicated ways. It operates primarily without
regard for L1, as most word-formation strategies pull from within the target language.
Foreignizing, the only strategy that does not, is overwhelmingly a feature of Romance-language
speakers. Effects, then, manifest vis-à-vis etymological origins and degrees of
markedness. My data suggest that markedness is less a factor in determining the
presence of errors than in how these errors were brought about. English
affixation is a complicated process and subject to a great number of rules and restrictions,
which are sometimes seemingly arbitrary. Learners cope with these restrictions with
numerous strategies, which may result in usage errors.
In this data set, CLI affected native and non-native affixes in equal numbers,
though the results of this interference are clearly different. Learners apply native affixes
without regard to restrictions: applying nominalizing suffixes to nouns, picking incorrect
base forms, and recombining elements to fill lexical gaps that do not exist within the
language itself. They are more comfortable creating new words using native affixes, and
prefer the native un- to the non-native in-. Learners also avoid mapping more than one
meaning to a particular morpheme, notably the gerund, which shares a form with the
English progressive tense. They tend to fill this gap with native over non-native
Non-native affixes, on the other hand, are heavily affected by constraint violations
involving the application of morphemes, and not the choice of base. English affixes,
especially non-native ones, have overlapping meanings and multiple meanings mapped to
each morpheme. Each of these is then restricted in the bases they can attach to, creating a
veritable maze of usage. As their usage is much more complicated than that of native affixes, a
lack of explicit instruction would make it much more difficult to navigate that maze and
would result in the number of errors present in this data set, a great deal of which are constraint
violations. Examples of coinage employing non-native affixes tend to involve
base allomorphy, which learners do not apply with native affixes. All of this suggests that
learners understand how non-native affixes function in general, but not necessarily each
Appendix
The following graphs were also created using the data compiled, but were
relatively inconclusive and are discussed little in Section 5.
FIGURE 3. DISTRIBUTION OF AFFIX USAGE ACROSS L1 (Affix vs. L1; affixes: able/ible, ate, de/dis, ic(al), ify, ion, ism, ity, ive, ize, ful, ly, ment, ness, ship, un/in; L1 groups: ROM, GER, TUR, ARA, JAPN)
FIGURE 4. DISTRIBUTION OF AFFIX USAGE ACROSS LEVEL (Affix vs. Level; affixes: able/ible, ate, de/dis, ic(al), ify, ion, ism, ity, ive, ize, ful, ly, ment, ness, ship, un/in; levels: H, M, L)
FIGURE 5. DISTRIBUTION OF STRATEGIES ACROSS LEVEL (Error Type vs. Level; error types: CO, FO, OG, CV, BF, RA; levels: H, M, L)