Subtle patterns of learner language: 13 topics for further research

Steve Pepper 2013-09-26 ASKeladden
[Word cloud: og, er, det, å, i, jeg, som, en, at, på, for, de, til, ikke, har, med, vi, kan, av, man, men, om, et, så, mange, den, var, må, eller, seg, også, mye, veldig, når, være, fra, norge, andre, alle, skal, meg, du, vil, noen, hvis, mer, mennesker, ha, dette, barn, bare, blir, viktig, fordi, folk, da, han, min, barna, hva, noe, få, dem, bli, synes, hvor, selv, etter, hadde, oss, nå, land, år, kommer, ting, gjøre, alt, enn, dag, der, livet, tror, venner, gå, flere, stor, får, trenger]

Introduction
• An application of the detection-based argument (Jarvis 2010)
– Modelled on Jarvis & Crossley (2012)
• Use of data mining methods to
1) automatically detect (predict) the L1
2) identify (lexical) features that serve to discriminate between L1 groups, i.e. L1 predictors
• Major advantages:
– Ability to recognize positive as well as negative transfer
– Ability to detect very subtle patterns that might otherwise escape notice
[Image: Jarvis & Crossley (2012)]
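The study uses discriminant analysis (DA) for L1 detection. The sketch below illustrates the idea with a much simpler stand-in: a nearest-centroid classifier over per-text relative frequencies of a few feature words. It is not DA, which also takes the pooled within-group covariance into account. The feature list, the function names and the two toy "training texts" are invented purely for illustration.

```python
from collections import Counter
from math import dist

# Hypothetical feature subset; the study itself selected 55 words using DA.
FEATURES = ["skal", "en", "et", "er", "være", "og"]


def feature_vector(text, features=FEATURES):
    """Occurrences of each feature word per 100 tokens of the text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    n = len(tokens) or 1  # avoid division by zero for empty texts
    return [100 * counts[f] / n for f in features]


def centroids(training):
    """Mean feature vector per L1 group; `training` maps L1 -> list of texts."""
    result = {}
    for l1, texts in training.items():
        vectors = [feature_vector(t) for t in texts]
        result[l1] = [sum(col) / len(vectors) for col in zip(*vectors)]
    return result


def predict_l1(text, cents):
    """Return the L1 whose centroid is closest in Euclidean distance."""
    v = feature_vector(text)
    return min(cents, key=lambda l1: dist(v, cents[l1]))


# Toy training data (invented, one text per group).
cents = centroids({
    "NL": ["han skal lage middag og hun skal lese"],
    "EN": ["det er en bil og et hus"],
})
print(predict_l1("vi skal reise og vi skal spise", cents))  # prints NL
```

With real data, each L1 group would contribute many texts. The evaluation would use held-out texts, for example with cross-validation.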

Evidence of the third kind...
• The method supplies the first two kinds of evidence “out of the box”
– The focus here is therefore on supplying the third kind
• Sources of type 3 evidence
– the learner’s L1 performance
– comparable users’ L1 performance
– contrastive grammars
– traditional grammars
• Involves Contrastive Interlanguage Analysis (Granger 1996)
– IL (L2) <> NL (L1)
Evidence for transfer (Jarvis 2010):
1. Intergroup heterogeneity
2. Intragroup homogeneity
3. Cross-language congruity
4. Intralingual contrasts

L1 predictors
• 55 features (i.e. words) selected using Discriminant Analysis (see box)
– DA explained on Saturday at LCR 2013
• Subjected to post-hoc analysis using Tukey’s HSD
– a single-step multiple comparison procedure and statistical test, used in conjunction with an ANOVA to find means that differ significantly from each other
• The output is not very easy to interpret…
[Box: the 55 L1 predictors] andre, at, av, bare, barn, barna, bo, da, de, den, det, du, eller, en, enn, er, et, for, fordi, fra, han, har, hun, i, ikke, jeg, kan, liker, man, mange, med, meg, men, mennesker, mer, min, mye, norge, norsk, når, og, også, om, på, skal, som, sted, så, til, veldig, venner, vi, viktig, være, å
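Tukey’s HSD is applied after a one-way ANOVA of each feature’s per-text frequencies across the L1 groups. As a reminder of what the ANOVA table reports, here is a minimal pure-Python sketch of the computation: sums of squares, mean squares and the F ratio. It leaves out the p-value, which needs the F distribution from a statistics library. The function name and the toy data are illustrative only.

```python
def one_way_anova(groups):
    """One-way ANOVA for a dict mapping group label -> list of per-text values.

    Returns the degrees of freedom, sums of squares, mean squares and F ratio,
    i.e. the quantities in the first two rows of an R summary(aov(...)) table.
    """
    all_values = [x for values in groups.values() for x in values]
    grand_mean = sum(all_values) / len(all_values)
    group_means = {g: sum(v) / len(v) for g, v in groups.items()}

    # Between groups: squared deviations of group means from the grand mean, weighted by group size
    ss_between = sum(len(v) * (group_means[g] - grand_mean) ** 2 for g, v in groups.items())
    # Within groups: squared deviations of each value from its own group mean
    ss_within = sum((x - group_means[g]) ** 2 for g, v in groups.items() for x in v)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "df": (df_between, df_within),
        "ss": (ss_between, ss_within),
        "ms": (ms_between, ms_within),
        "F": ms_between / ms_within,
    }


# Toy data (invented): two groups of three texts each
result = one_way_anova({"a": [1, 2, 3], "b": [4, 5, 6]})
print(result["df"], result["F"])  # prints (1, 4) 13.5
```

The printed ANOVA output for den follows the same arithmetic. The mean squares are 1790 / 5 ≈ 358 and 21044 / 594 ≈ 35.4; the sums of squares are shown rounded. Their ratio, about 10.1, is the F value.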

SH EN PL DE NO RU
               X
   Y  Y  Y  Y
X  X  X
             Df Sum Sq Mean Sq F value   Pr(>F)
myData$L1     5   1790   358.1   10.11 2.65e-09 ***
Residuals   594  21044    35.4
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = myData[, X] ~ myData$L1)
$`myData$L1`
        diff        lwr         upr     p adj
en-de -1.373 -3.7796269  1.03362692 0.5781845
no-de  0.032 -2.3746269  2.43862692 1.0000000
pl-de -0.239 -2.6456269  2.16762692 0.9997514
ru-de  3.186  0.7793731  5.59262692 0.0023298
sh-de -2.434 -4.8406269 -0.02737308 0.0456381
no-en  1.405 -1.0016269  3.81162692 0.5528485
pl-en  1.134 -1.2726269  3.54062692 0.7583997
ru-en  4.559  2.1523731  6.96562692 0.0000013
sh-en -1.061 -3.4676269  1.34562692 0.8063672
pl-no -0.271 -2.6776269  2.13562692 0.9995400
ru-no  3.154  0.7473731  5.56062692 0.0026907
sh-no -2.466 -4.8726269 -0.05937308 0.0409536
ru-pl  3.425  1.0183731  5.83162692 0.0007589
sh-pl -2.195 -4.6016269  0.21162692 0.0969624
sh-ru -5.620 -8.0266269 -3.21337308 0.0000000
   sh    en    pl    de    no    ru
2.806 3.867 5.001 5.240 5.272 8.426
feature: den
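Each row of the Tukey output gives the difference between two group means together with a simultaneous 95% interval. For equal group sizes the interval is diff ± q·sqrt(MS_within / n). The sketch below reproduces the en-de row under two assumptions:

- n = 100 texts per group. The degrees of freedom add up to 599, i.e. 600 texts in total. Every interval also has the same half-width of about 2.407, which suggests equal group sizes.
- q ≈ 4.04 is the critical value of the studentized range. Python’s standard library cannot compute it, so it is taken here as an approximate table value.

Because the printed sums of squares are rounded, only about two decimals should be expected to match.

```python
from math import sqrt


def tukey_interval(diff, ms_within, n_per_group, q_crit):
    """Tukey HSD interval for a difference of two group means (equal group sizes).

    The half-width is q_crit * sqrt(MS_within / n), where q_crit is the upper
    critical value of the studentized range for k groups and df_within.
    """
    half_width = q_crit * sqrt(ms_within / n_per_group)
    return diff - half_width, diff + half_width


def excludes_zero(lwr, upr):
    """A pair differs significantly (family-wise) iff 0 lies outside the interval."""
    return lwr > 0 or upr < 0


MS_WITHIN = 21044 / 594  # Residuals Sum Sq / Df from the printed table (already rounded)
Q_CRIT = 4.04            # assumption: approx. q(0.95; k = 6, df = 594) from a studentized range table

lwr, upr = tukey_interval(-1.373, MS_WITHIN, 100, Q_CRIT)  # the en-de row
print(round(lwr, 2), round(upr, 2), excludes_zero(lwr, upr))  # prints -3.78 1.03 False
```

An interval that excludes zero corresponds to a p adj value below 0.05. An example is ru-de, where the lower bound is about 0.78.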
NOTE:
Tukey’s HSD was performed for groups of six L1s at a time. There were six such “groups of six”:
– DE, EN, PL and RU were always included (along with the control group NO)
– NL, SH, SP, SO, SQ and VI were each added in turn
– The example above shows the homogeneity table for the group of L1s that includes SH
– Examples to follow (including the next one) contain up to six homogeneity tables at once
The essence of the output is represented visually as a “homogeneity table”
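A homogeneity table sorts the L1s by mean and groups them into runs whose members do not differ significantly from one another. The sketch below derives those runs from the p adj values printed for den. It is a simplified stand-in for a compact letter display, not necessarily the procedure used to draw the slides. The function names are illustrative, the X/Y labels simply alternate per row, and the row order may differ from the slides.

```python
# p adj values and group means copied from the Tukey HSD output for "den"
P_ADJ = {
    ("en", "de"): 0.5781845, ("no", "de"): 1.0000000, ("pl", "de"): 0.9997514,
    ("ru", "de"): 0.0023298, ("sh", "de"): 0.0456381, ("no", "en"): 0.5528485,
    ("pl", "en"): 0.7583997, ("ru", "en"): 0.0000013, ("sh", "en"): 0.8063672,
    ("pl", "no"): 0.9995400, ("ru", "no"): 0.0026907, ("sh", "no"): 0.0409536,
    ("ru", "pl"): 0.0007589, ("sh", "pl"): 0.0969624, ("sh", "ru"): 0.0000000,
}
MEANS = {"sh": 2.806, "en": 3.867, "pl": 5.001, "de": 5.240, "no": 5.272, "ru": 8.426}


def homogeneous_subsets(means, p_adj, alpha=0.05):
    """Maximal runs of mean-sorted groups with no significant pairwise difference."""
    order = sorted(means, key=means.get)  # groups by ascending mean

    def differ(a, b):
        p = p_adj[(a, b)] if (a, b) in p_adj else p_adj[(b, a)]
        return p < alpha

    spans = []
    for i in range(len(order)):
        j = i
        # extend the run while the next group differs from none of the groups already in it
        while j + 1 < len(order) and not any(
            differ(order[k], order[j + 1]) for k in range(i, j + 1)
        ):
            j += 1
        spans.append((i, j))
    # keep only runs that are not contained in a longer run
    maximal = [
        (i, j) for i, j in spans
        if not any((a, b) != (i, j) and a <= i and j <= b for a, b in spans)
    ]
    return order, [order[i:j + 1] for i, j in maximal]


def render(order, subsets):
    """Plain-text table: header of L1 codes, then one row of marks per subset."""
    lines = [" ".join(f"{g.upper():<2}" for g in order)]
    for n, subset in enumerate(subsets):
        mark = "XY"[n % 2]
        lines.append(" ".join(f"{mark if g in subset else ' ':<2}" for g in order).rstrip())
    return "\n".join(lines)


order, subsets = homogeneous_subsets(MEANS, P_ADJ)
print(subsets)
print(render(order, subsets))
```

For den this yields the runs SH–EN–PL, EN–PL–DE–NO and RU alone. These are the same three groupings as in the homogeneity table for den.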

#1 NL speakers overuse skal
• Finite form of the modal auxiliary skulle; used to form the future tense
han skal lage middag i kveld ‘he will make dinner tonight’
– Other ways of expressing the future:
• non-past: han lager middag i kveld
• construction komme til + infinitive
• Recognized tendency for beginners to overuse this form
– Partly due to overly simplistic explanations in teaching materials
• “Futurum lager vi av skal + infinitiv” ‘We form the future with skal + infinitive’ (Greftegreff 1985)
• Analysis shows that skal is overused by NL, SH, SO, SQ and VI learners
RU DE EN NO PL NL
               Y
X  X  X  X  X

RU DE EN NO PL SH
               Y
X  X  X  X  X

RU DE EN NO PL SO
               Y
X  X  X  X  X

RU DE EN NO PL SP
X  X  X  X  X  X

RU DE EN NO PL SQ
               Y
X  X  X  X  X

RU DE EN NO PL VI
               Y
X  X  X  X  X
Possible explanations: proficiency? thematic bias? transfer?

Proficiency?
• We have CEFR ratings for 7 of the 10 L1 groups (not NL, SH, SQ)
– VI and SO score lowest
– DE and EN score highest
• For these 7 L1 groups, overuse of skal thus goes hand in hand with lower proficiency, which in turn correlates with linguistic and/or cultural distance
– VI and SO communities in Norway originated as refugees
– If lower proficiency explains overuse of skal for VI and SO, chances are that it also does so for SH and SQ
– But this does not explain the NL case
• So could the reason for NL users’ overuse be thematic bias?
[Chart: distribution of CEFR ratings (A2, A2/B1, B1, B1/B2, B2, B2/C1, C1), in per cent, for the SO, VI, SP, RU, PL, EN and DE groups]

Thematic bias?
• Some topics are more concerned with future events than others
– Over half the occurrences of skal are in 6 of the 46 topics
• Cf. occurrences per text (“freq”) with the topic held constant:
– Framtida: 4.9 (NL) >> 2.9 (SP)
– Bomiljø: 1.3 (NL) >> 0.5 (EN) and 0.6 (SP)
– Nyheter: 1.1 (NL) >> 0.7 (DE) and 0.4 (EN)
• Even with the topic held constant, the tendency is clear
• Thematic bias can thus be ruled out
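                                  DE              EN              NL              SP
Topic                             wc  tc  freq    wc  tc  freq    wc  tc  freq    wc  tc  freq
Framtida                          -   -   -       -   -   -       39  8   4.9     29  10  2.9
Bomiljø                           -   -   -       20  38  0.5     21  16  1.3     14  23  0.6
Bolig og bosted                   -   -   -       -   -   -       13  9   1.4     -   -   -
Frivillig hjelp i organisasjoner  2   5   0.4     -   -   -       9   2   4.5     -   -   -
Nyheter                           7   10  0.7     4   9   0.4     8   7   1.1     2   -
Reise                             -   -   -       -   -   -       8   14  0.6     -   -   -
wc = occurrences of skal; tc = number of texts; freq = wc/tc (occurrences per text). Topics: Framtida ‘The future’, Bomiljø ‘Living environment’, Bolig og bosted ‘Housing and place of residence’, Frivillig hjelp i organisasjoner ‘Voluntary help in organizations’, Nyheter ‘News’, Reise ‘Travel’.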
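Holding the topic constant simply means comparing the per-text rates within a single topic. The sketch reads wc as occurrences of skal and tc as number of texts, which is consistent with freq = wc/tc in every row of the table. It uses only the topics in which NL and at least one other group occur. The SP entry for Nyheter is left out because its text count is missing. The names are illustrative.

```python
# (wc, tc) per topic and L1, copied from the table; cells shown as "-" are omitted.
TABLE = {
    "Framtida": {"NL": (39, 8), "SP": (29, 10)},
    "Bomiljø": {"EN": (20, 38), "NL": (21, 16), "SP": (14, 23)},
    "Frivillig hjelp i organisasjoner": {"DE": (2, 5), "NL": (9, 2)},
    "Nyheter": {"DE": (7, 10), "EN": (4, 9), "NL": (8, 7)},
}


def per_text_rates(table):
    """freq = wc / tc, i.e. occurrences of skal per text, for each topic and L1."""
    return {
        topic: {l1: wc / tc for l1, (wc, tc) in row.items()}
        for topic, row in table.items()
    }


def highest_by_topic(rates):
    """The L1 with the highest per-text rate within each topic."""
    return {topic: max(row, key=row.get) for topic, row in rates.items()}


rates = per_text_rates(TABLE)
for topic, row in rates.items():
    print(topic, {l1: round(r, 1) for l1, r in row.items()})
print(highest_by_topic(rates))
```

With so few texts per cell (as few as 2), an outcome like this is suggestive rather than conclusive. A per-topic significance test would be a natural next step.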

Cross-linguistic explanation
• In NL the future tenses are formed with the auxiliary zullen
hij zal het diner vanavond maken ‘he will make dinner tonight’
• NL zullen is cognate with skulle, and its finite form zal is similar in form to skal
– EN shall is also cognate with skal and similar in form, but in EN it is much less frequent than ’ll, will and going to
– DE werden is neither cognate nor similar in form
• Conclusion: The strong tendency for NL speakers to overuse skal appears to be a case of formal lexical transfer
– Caveat: NL has other means to express future action, including the non-past tense (hij maakt het diner vanavond ‘he is making dinner tonight’) and the auxiliary gaan
– Further investigation of relative frequencies is necessary in order to confirm or disconfirm possible transfer effects
➔ Is there anything else that should be considered???

#2 DE speakers overuse en
• Speakers of Slavic languages use the indefinite articles en (m.) and et (n.) much less frequently than learners from other L1 backgrounds
– As expected, this also applies to SO, SQ and VI
• But why do DE speakers use the masculine form en more than everyone else?
– The DE forms ein (m., n.) and eine (f.) bear a strong formal resemblance to en
– Tendency to use en instead of et because of this?
– Detailed error analysis required
• Hypothesis
– That DE speakers commit errors of type
<sic type="W" corr="et"><word>en</word></sic>
more frequently than other L1 groups
➔ Comments???
PL RU EN NO NL DE
Y Y Y
X X X
Y Y
X X
PL SH RU EN NO DE
Y Y
X X
Y Y
X X X
PL RU SO EN NO DE
Y Y
X X X
Y Y Y
X X X
PL RU SP EN NO DE
Y Y
X X X
Y Y Y
X X
PL RU SQ EN NO DE
Y Y
X X X
Y Y Y
X X
PL VI RU EN NO DE
Y Y
X X
Y Y Y
X X X
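Testing the hypothesis means counting, for each L1 group, how often en has been corrected to et. The sketch below uses Python’s xml.etree.ElementTree and rests on an assumption: the texts are XML in which tokens are `<word>` elements and errors are wrapped in `<sic>` exactly as shown on the slide. The surrounding `<text>` and `<s>` elements, the sample sentences and the function names are hypothetical. Dividing by the number of en tokens gives an error rate that can be compared across L1 groups.

```python
import xml.etree.ElementTree as ET

# Hypothetical text fragment. Only the <sic type="W" corr="et"><word>en</word></sic>
# pattern is taken from the slide; the <text>/<s> wrapper is assumed.
SAMPLE = """<text>
  <s><word>jeg</word><word>har</word><sic type="W" corr="et"><word>en</word></sic><word>hus</word></s>
  <s><word>det</word><word>er</word><word>en</word><word>bil</word></s>
</text>"""


def en_for_et_errors(root):
    """Count <sic> elements of type W that correct the word 'en' to 'et'."""
    return sum(
        1
        for sic in root.iter("sic")
        if sic.get("type") == "W"
        and sic.get("corr") == "et"
        and (sic.findtext("word") or "").strip().lower() == "en"
    )


def en_tokens(root):
    """Count all <word> tokens 'en', including those inside <sic> elements."""
    return sum(1 for w in root.iter("word") if (w.text or "").strip().lower() == "en")


root = ET.fromstring(SAMPLE)
errors, total = en_for_et_errors(root), en_tokens(root)
print(errors, total, errors / total)  # prints 1 2 0.5
```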

#3 EN speakers overuse et
• Cross-linguistic explanation?
– Avoidance of en (as indefinite article) due to identification with the numeral ‘one’?
– Greater similarity between EN ‘a’ [ə] and NO et (short vowel, voiceless dental plosive) than between ‘a’ and NO en (formal lexical transfer)?
• Greater similarity between en and EN ‘an’, but ‘an’ is much less frequent than ‘a’
– Wiktionary rankings #102 and #5 respectively
– ‘a’ occurs 11 times more often than ‘an’
– Evidence that frequency constrains transfer?
• Conclusion: L1 transfer appears to be at work when EN speakers overuse et
➔ But how can this be proved beyond doubt???
RU PL DE NL NO EN
X X
Y Y Y
X X X
RU PL SH DE NO EN
Y Y
X X X X
SO RU PL DE NO EN
X X
Y Y
X X X X
RU PL DE SP NO EN
X X
Y Y Y
X X X X
RU PL SQ DE NO EN
X X
Y Y
X X X X
RU PL DE VI NO EN
X X
Y Y Y
X X X X

#4 PL and RU speakers: den and det
• These are 3SG pronouns, demonstratives and (preposed) definite articles
• RU speakers use den (m.) significantly more often than all other L1 groups, including PL speakers
• PL speakers use det (n.) significantly more often than RU speakers
– Absolute usage figures:
• den PL 122, RU 166 (~40:60)
• det PL 668, RU 496 (~60:40)
➔ Why???
➔ How can we find out???
NOTE:
• 3SG personal pronouns are identical in PL (on, ona, ono) and RU (он, она, оно)
• Demonstrative pronouns
– PL ten, ta, to
– RU этот, эта, это
$den
SH EN PL DE NO RU
               X
   Y  Y  Y  Y
X  X  X
$det
NO RU SH EN DE PL
X X X X
Y Y Y Y
X X X X
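The approximate ratios quoted for den and det can be checked directly from the absolute figures. These are raw counts. If the PL and RU sub-corpora differ in size (texts or tokens), the counts would need to be normalized before the ratios are interpreted. The helper name is illustrative.

```python
def shares(a, b):
    """Percentage shares of two counts, rounded to whole per cent."""
    total = a + b
    return round(100 * a / total), round(100 * b / total)


# Absolute figures from the slide
print("den PL:RU", shares(122, 166))  # prints den PL:RU (42, 58)
print("det PL:RU", shares(668, 496))  # prints det PL:RU (57, 43)
```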

#5 EN speakers overuse er
• EN speakers use er ‘is, are’ significantly more often than all other L1 groups (except PL and SH)
• Most likely explanation: formal transfer
– formal resemblance er [æɾ] ~ are [ɑ(ɹ)]
     EN          NO
     sg    pl    sg    pl
1.   am    are   er    er
2.   are   are   er    er
3.   is    are   er    er
• High salience of ‘to be’ in English (not least because of the present continuous)
– And yet the ENPC shows finite forms of NO være to be more frequent than finite forms of EN be
• 8,182 vs. 6,566 occurrences
➔ So how to explain EN overuse???
RU NO NL DE PL EN
X X
Y Y Y
X X X X
RU NO DE PL SH EN
X X X
Y Y Y
X X X
SO RU NO DE PL EN
X X
Y Y
X X X X
RU NO DE SP PL EN
Y Y
X X X
Y Y Y
X X X
RU NO SQ DE PL EN
X X
Y Y Y
X X X X
RU VI NO DE PL EN
X X
Y Y
X X X X

#6 While RU speakers underuse er
• PL and SH speakers use er more than RU speakers
– Even though all three are Slavic languages
• PL and SH have a copula in the present tense (być and бити ~ biti)
PL dom jest tam
SH кућа је тамо ~ kuća je tamo
‘the house is there’
• RU no longer has such a copula
RU дом _ там
‘the house is there’
➔ Case proved???
(Homogeneity tables for er: identical to those shown under #5.)

#7 Many L1 groups underuse være
• Underuse by RU, SH, SO, SQ and VI
• Possible cross-linguistic explanations:
– RU: no copula in the present tense
– VI: copula là not used with adjectives (because adjectives are verbal), thus:
Mai là sinh viên ‘Mai is (a) student’
but
Mai cao ‘Mai is tall’
– SH: copula exists but is little used due to contact with other Balkan languages
– SO: yahay ‘to be’ contracts with adjectives, losing its root (-ah-) in the process
– SQ: no infinitives (është is a finite form)
➔ Case proved???
RU NL DE PL NO EN
Y Y Y Y Y
X X X X X
SH RU DE PL NO EN
Y Y Y Y
X X X X X
SO RU DE PL NO EN
X X X X
Y Y Y Y
X X X X
RU DE PL NO SP EN
Y Y Y Y Y
X X X X
SQ RU DE PL NO EN
Y Y Y Y
X X X X X
VI RU DE PL NO EN
Y Y Y Y
X X X X X

#8 But EN speakers overuse være
• Overuse by EN speakers
– The difference is statistically significant with respect to RU, SH, SO, SQ and VI
• The difference with respect to NO is not statistically significant, but still noticeable
– In the English-Norwegian Parallel Corpus (ENPC), be occurs much more frequently in English texts (both fiction and non-fiction) than være does in Norwegian texts
• be: 3,126 occurrences
• være: 1,193 occurrences
– Worthy of a more detailed investigation using the ENPC
➔ Alternative explanations?
(Homogeneity tables for være: identical to those shown under #7.)
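Both ENPC comparisons are raw counts: finite forms 8,182 vs. 6,566, and be vs. være 3,126 vs. 1,193. Whether they are directly comparable depends on the sizes of the English and Norwegian halves of the corpus, which the slides do not give. The sketch below computes the raw ratios. It also defines two standard corpus-comparison helpers, normalization per million words and the log-likelihood (G2) statistic, which could be applied once those sizes are known. No corpus sizes are assumed here, and the function names are illustrative.

```python
from math import log


def per_million(count, corpus_tokens):
    """Normalized frequency per million tokens."""
    return 1_000_000 * count / corpus_tokens


def log_likelihood(a, b, size_a, size_b):
    """Log-likelihood (G2) for a word occurring a times in corpus A and b times in corpus B."""
    expected_a = size_a * (a + b) / (size_a + size_b)
    expected_b = size_b * (a + b) / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((a, expected_a), (b, expected_b)):
        if observed > 0:  # 0 * log(0) is taken as 0
            g2 += observed * log(observed / expected)
    return 2 * g2


# Raw ENPC counts quoted on the slides (sub-corpus sizes not given there)
print(round(8182 / 6566, 2))  # finite forms of være vs. be: prints 1.25
print(round(3126 / 1193, 2))  # be vs. være: prints 2.62
```

A G2 value above about 3.84 corresponds to p < 0.05 with one degree of freedom.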

#9 Prepositions i and på
• Preposition på ‘on’
– EN (overuse) vs. DE (underuse)
– Investigate using error analysis
– Check type and token frequencies of constructions in which the corresponding L1 forms (on and auf) are congruent in one L1 but not the other, e.g.:
– NO på søndag ≡ EN on Sunday, but ≠ DE am Sonntag
whereas
– NO på engelsk ≡ DE auf Englisch, but ≠ EN in English
• Preposition i ‘in’
– RU (overuse) vs. PL (underuse)
– Investigate using error analysis
➔ Any suggestions???
$i
PL EN DE NO NL RU
X X X
Y Y Y Y
X X X X
PL EN DE SH NO RU
Y Y
X X X X X
PL EN DE SO NO RU
Y Y Y
X X X X X
PL EN SP DE NO RU
Y Y
X X X X X
PL EN DE NO SQ RU
X X X
Y Y Y
X X X X
PL EN DE NO VI RU
X X X
Y Y Y Y
X X X X
$på
DE RU NO NL PL EN
Y Y Y Y Y
X X X X X
DE RU NO SH PL EN
Y Y Y Y Y
X X X X X
SO DE RU NO PL EN
X X X X
Y Y Y Y
X X X X
DE RU NO SP PL EN
Y Y Y Y Y
X X X X X
DE SQ RU NO PL EN
Y Y Y Y Y
X X X X X
DE RU NO VI PL EN
Y Y Y Y Y
X X X X X
Prepositions, especially spatial prepositions, are renowned for being “among the hardest expressions to acquire when learning a second language” (Coventry & Garrod 2004: 4) and they have already been the subject of some interesting work based on ASK (Szymanska 2010; Malcher 2011).
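As a crude first approximation to the type and token frequencies suggested for på, each "preposition + next word" bigram can be treated as a construction. The tokens are all such bigrams, and the types are their distinct forms. Real constructions such as på søndag may need lemmatization and longer contexts. The function name and the sample sentence are invented for illustration.

```python
from collections import Counter


def preposition_bigrams(tokens, preposition):
    """Token and type frequency of 'preposition + following word' sequences.

    Returns (token_count, type_count, counter_of_following_words).
    """
    following = Counter(
        tokens[i + 1].lower()
        for i in range(len(tokens) - 1)
        if tokens[i].lower() == preposition
    )
    return sum(following.values()), len(following), following


toks = "vi snakker på engelsk på søndag og på engelsk i dag".split()
n_tokens, n_types, following = preposition_bigrams(toks, "på")
print(n_tokens, n_types, following.most_common(1))  # prints 3 2 [('engelsk', 2)]
```

Comparing these counts across L1 groups, restricted to constructions that are congruent in one L1 but not the other (e.g. på søndag vs. på engelsk), would be one way to operationalize the comparison.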

#10 Prepositions til and fra
• Preposition til ‘to’
– underused by all L1 groups, especially DE, SH and SQ
– …
• Preposition fra ‘from’
– used significantly more often by EN speakers than by PL or native speakers
– …
➔ Any suggestions here???
$til
DE RU PL NL EN NO
Y Y Y Y Y
X X X X X
SH DE RU PL EN NO
Y Y Y Y
X X X X X
DE RU SO PL EN NO
Y Y Y Y Y
X X X X X
DE RU SP PL EN NO
Y Y Y Y Y
X X X X X
SQ DE RU PL EN NO
Y Y Y Y
X X X X X
DE RU PL VI EN NO
Y Y Y Y Y
X X X X X
$fra
NO PL DE NL RU EN
X X X X
Y Y Y Y
X X X
NO PL SH DE RU EN
X X
Y Y Y
X X X X
NO PL DE SO RU EN
X X X X
Y Y Y Y
X X X
NO PL DE SP RU EN
X X X X
Y Y Y Y
X X X
NO PL DE SQ RU EN
X X X X
Y Y Y Y
X X X
NO PL DE VI RU EN
X X X X
Y Y Y Y
X X X X

#11 Underuse and overuse of og
• Striking contrast between PL speakers (underuse) and RU speakers (overuse)
– Cannot be formal transfer, since PL i and RU и are phonologically identical
• Different token frequencies in L1s?
– Wiktionary frequency lists (WFREQ)*
• RU и ranked as #1
• PL i ranked as #2 (after w ‘in’)
– Raw frequencies not comparable in WFREQ
• Zipfian distribution?
• Requires further investigation
➔ Your suggestions???
PL DE NL EN NO RU
Y Y
X X X X
PL DE SH EN NO RU
X X
Y Y
X X X X
PL DE EN SO NO RU
X X
Y Y Y
X X X X
PL SP DE EN NO RU
X X
Y Y
X X X X
PL SQ DE EN NO RU
X X
Y Y
X X X X
VI PL DE EN NO RU
X X
Y Y
X X X X
* http://en.wiktionary.org/wiki/Wiktionary:FREQ
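One way to relate WFREQ ranks without comparable raw frequencies is to assume a Zipfian distribution, in which frequency is roughly proportional to 1/rank^s. The sketch turns ranks into expected frequency ratios. Conversely, it estimates the exponent s implied by an observed ratio, using the #5 vs. #102 ranks and the 11:1 ratio reported for EN ‘a’ and ‘an’. With s = 1 those ranks would predict roughly 20:1 rather than 11:1, which shows how rough the assumption is. Comparing RU и (#1) with PL i (#2) this way also presupposes vocabularies of similar size. The function names are illustrative.

```python
from math import log


def zipf_ratio(rank_a, rank_b, s=1.0):
    """Expected f(rank_a) / f(rank_b) if frequency is proportional to 1 / rank**s."""
    return (rank_b / rank_a) ** s


def zipf_exponent(rank_a, rank_b, observed_ratio):
    """Exponent s for which the Zipf ratio of the two ranks equals an observed ratio."""
    return log(observed_ratio) / log(rank_b / rank_a)


print(zipf_ratio(1, 2))                     # rank #1 vs. #2: prints 2.0
print(round(zipf_ratio(5, 102), 1))         # 'a' (#5) vs. 'an' (#102): prints 20.4
print(round(zipf_exponent(5, 102, 11), 2))  # s implied by the reported 11:1 ratio: prints 0.8
```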

#12 Overuse and underuse of eller
• DE and EN speakers overuse eller ‘or’
– The difference with respect to NL is highly significant
• This seems odd. (Are the Dutch more decisive than the English and Germans?)
– The difference between DE and NO is also significant
– Frequency related?
• Mutual correspondence between NO eller and EN ‘or’ is 84%
• RU speakers underuse eller
– Strong formal resemblance with или (ili)
• Possible cross-linguistic explanation
– или has a more restricted distribution
– Not used in negative contexts
он не любит ни футбол, ни теннис ‘he doesn’t like football or tennis’
RU NO NL PL EN DE
X X
Y Y Y Y
X X X X
RU SH NO PL EN DE
X X
Y Y Y
X X X X
RU SO NO PL EN DE
X X
Y Y Y
X X X X
RU NO PL SP EN DE
X X
Y Y Y Y
X X X
RU SQ NO PL EN DE
X X
Y Y Y
X X X X
RU VI NO PL EN DE
X X
Y Y Y
X X X X
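The 84% figure is presumably a mutual correspondence value in the sense of Altenberg, the measure commonly used with the ENPC. Under that assumption, it is the number of times each item is translated by the other, as a percentage of the two items’ combined frequencies in the source texts. The sketch below implements that formula. The counts are invented and are not ENPC figures.

```python
def mutual_correspondence(a_t, b_t, a_s, b_s):
    """Mutual correspondence (in %) between items A and B in a parallel corpus.

    a_t: times A (in source texts) is translated by B
    b_t: times B (in source texts) is translated by A
    a_s, b_s: frequencies of A and B in the respective source texts
    """
    return 100 * (a_t + b_t) / (a_s + b_s)


# Invented counts, for illustration only (not ENPC figures)
print(mutual_correspondence(a_t=30, b_t=50, a_s=40, b_s=60))  # prints 80.0
```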

#13 More general questions
• Misclassification can also be revealing
– Texts written by EN learners are more often misclassified as SP than as NL or DE, despite EN being more closely related to the latter
➔ Why???
– Texts by SO and SQ learners are most often misclassified as RU, whilst texts by VI learners are most often misclassified as PL
➔ Again, why???
• All 12 patterns discussed above pertain to the Indo-European languages most closely related to NO (DE, EN, NL; PL, RU)
– There are no really clear-cut predictors for the most distantly related L1s, i.e. SO, SQ and VI
➔ Why???
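Finding the L1 that a group’s texts are most often mistaken for amounts to tallying the off-diagonal cells of the classifier’s confusion matrix. The sketch below assumes the classifier output is available as (true L1, predicted L1) pairs, for example from cross-validation. The function name and the pairs are invented, chosen only to exercise the code.

```python
from collections import Counter, defaultdict


def top_confusions(pairs):
    """Map each true L1 to its most frequent *wrong* predicted L1.

    `pairs` is an iterable of (true_l1, predicted_l1) tuples; correctly
    classified texts are ignored.
    """
    errors = defaultdict(Counter)
    for true_l1, predicted in pairs:
        if predicted != true_l1:
            errors[true_l1][predicted] += 1
    return {l1: counts.most_common(1)[0][0] for l1, counts in errors.items()}


# Invented pairs, for illustration only
pairs = [("PL", "PL"), ("PL", "RU"), ("PL", "RU"), ("PL", "SH"),
         ("DE", "DE"), ("DE", "NL")]
print(top_confusions(pairs))  # prints {'PL': 'RU', 'DE': 'NL'}
```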

Conclusion
• Discriminant analysis reveals subtle patterns of L2 usage that would otherwise go undetected
• Homogeneity tables based on Tukey’s HSD can help us understand those patterns
• Contrastive analysis is required in order to confirm that the patterns are due to cross-linguistic influence
• All 13 issues discussed in this chapter are suitable topics for further research using ASK
• This study has merely scratched the surface…

13 research questions
1. Why do NL speakers overuse skal?
2. Why do DE speakers overuse en?
3. Why do EN speakers overuse et?
4. Why do PL and RU speakers differ so much in their use of den and det?
5. Why do EN speakers overuse er?
6. Why do RU speakers underuse er?
7. Why do many L1 groups underuse være?
8. Why do EN speakers, on the other hand, overuse være?
9. Why do EN speakers overuse på, while DE speakers underuse it? And why do RU speakers overuse i, while PL speakers underuse it?
10. Why do all L1 groups underuse til, and why do EN speakers overuse fra?
11. Why do PL and RU speakers differ so markedly in their use of og?
12. Why do EN and DE speakers overuse eller, and why do RU speakers underuse it?
13. What lies behind the misclassification patterns, and why are there no good predictors for SO, SQ and VI?

References
Donaldson, Bruce. 1997. Dutch: A Comprehensive Grammar. London: Routledge.
Granger, Sylviane. 1996. From CA to CIA and back: An integrated approach to computerized bilingual and learner corpora. In Karin Aijmer, Bengt Altenberg and Mats Johansson (eds.) Languages in Contrast. Papers from a symposium on text-based cross-linguistic studies. Lund 4–5 March 1994. Lund: Lund University Press [Lund Studies in English 88], 37–51.
Greftegreff, Liv Astrid. 1985. Enkel norsk grammatikk. Oslo: NKS-Forlaget.
Husby, Olaf. 1999. En kort innføring i albansk. Trondheim: Tapir.
Husby, Olaf. 2001. En kort innføring i somali. Trondheim: Tapir.
Jarvis, Scott. 2010. Comparison-based and detection-based approaches to transfer research. EUROSLA Yearbook 10, 169–192.
Jarvis, Scott & Scott A. Crossley (eds.) 2012. Approaching Language Transfer through Text Classification. Explorations in the detection-based approach. Bristol: Multilingual Matters.
Koolhoven, H. 1961. Teach yourself Dutch. London: The English Universities Press.
Lie, Svein. 2005. Kontrastiv grammatikk – med norsk i sentrum, 3rd Edition. Oslo: Novus.
Malcher, Jenny. 2011. Jeg liker å treffe folk i café. Man må nyter de fine tingene på verden! Preposisjoner og morsmålstransfer – en korpusbasert studie med i og på i fokus. Master’s thesis, Department of Linguistics and Scandinavian Studies, University of Oslo.
Mønnesland, Svein. 1990. Serbokroatisk-norsk kontrastiv grammatikk. In Anne Hvenekilde (ed.) Med to språk: Fem kontrastive språkstudier for lærere. Oslo: Cappelen.
Saeed, John Ibrahim. 1993. Somali Reference Grammar, 2nd Edition. Kensington, MD: Dunwoody Press.
Szymanska, Oliwia. 2010. A conceptual approach towards the use of prepositional phrases in Norwegian – the case of i and på. Folia Scandinavica 11, 173–183.
Wade, Terence. 2011. A Comprehensive Russian Grammar. Malden, MA: Wiley.
Wiull, Hans Olaf. 2007. Bli bedre i norsk – se forskjellene mellom norsk og vietnamesisk. Oslo: VOX.
