Filled pauses (FPs)
and L2 proficiency
Finnish Australians speaking English
Timo Lauttamus, John Nerbonne and
Wybo Wiersma
University of Oulu, Rijksuniversiteit Groningen
timo.lauttamus@oulu.fi, j.nerbonne@rug.nl &
wybo@logilogi.org
5 June 2009
In this talk we present
A method for detecting syntactic differences and
our findings on pausing
3 sub-questions about the method
1 What did your corpus look like?
2 What is permutation statistics?
3 How do you apply it to syntax?
3 sub-questions about the results
1 What general differences did you find?
2 How much pausing is there, and by whom?
3 What does this tell us about the speakers?
FPs and L2
proficiency:
Timo Lauttamus,
John Nerbonne
and Wybo
Wiersma
Introduction
In this Talk
Outline
The Method
Our Corpus
Permutation Statistics
Applying it to Syntax
Results
General Differences
Pausing
Analysis
Conclusion
Questions
References
Outline of the Talk
Introduction
In this Talk
Outline of the Talk
The Method
Results
Conclusion
Questions
References
The Method
• Detect a wide range of syntactic differences, and report them as
• statistically significant differences
• aggregate differences
• relative differences
• This would make it possible to measure the syntactic part of the
total impact:
“No easy way of measuring or characterizing the
total impact of one language on another in the
speech of bilinguals has been, or probably can be
devised. The only possible procedure is to describe
the various forms of interference and to tabulate
their frequency.”
- Weinreich, Languages in Contact
The Method
We also want to do this automatically and computationally,
in order to be able to:
• Mine for differences in syntax between
• learners versus native speakers
• speakers of different dialects
• writers from different discourses
• Test dialectological and other linguistic hypotheses
• Record over- and under-use instead of judging right / wrong
The Method
We did it in four steps:
1 Tag 2 or more collections of comparable material (using
an automatic POS-tagger)
2 Take n-grams (2- to 5-grams) of POS tags
3 Statistically compare their frequencies
4 Sort the significant POS n-grams by extent of difference
Aarts and Granger (1998) did this without the statistics in
’Tag sequences in learner corpora: a key to interlanguage
grammar and discourse’
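Step 2 above can be sketched in a few lines of Python. This is only an illustration, not the authors' software: the function name `pos_ngrams` and the tag sequence are hypothetical, and the tags are assumed to come from an automatic POS tagger run beforehand.

```python
from collections import Counter

def pos_ngrams(tags, n_min=2, n_max=5):
    """Yield every POS n-gram (as a tuple) with n_min <= n <= n_max."""
    for n in range(n_min, n_max + 1):
        for i in range(len(tags) - n + 1):
            yield tuple(tags[i:i + n])

# Hypothetical tagger output for one short utterance (illustration only)
tags = ["Interj", "Pron", "V", "Art", "N"]

# Frequency of each n-gram type; these counts feed the statistical comparison
counts = Counter(pos_ngrams(tags))
```

Counting types per collection in this way gives the frequencies that steps 3 and 4 compare and sort.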
Our Corpus
Origins:
• 20,000 Finns immigrated to Australia
• Working-class background, limited education
• 25-40 years old upon arrival
Corpus collected 1995-1998 by Greg Watson:
• of the University of Joensuu, Finland
• two age groups: adults and juveniles
• 350,000 words, of which 305,000 words are free conversation
Our Corpus
Adults:
• over 18 years old at arrival, on average 30
• on average 58 at time of interview
• 60 interviews, 65-70 min each (221,000 words)
Juveniles:
• under 16 years old at arrival, on average 6
• on average 36 at time of interview
• 30 interviews, 65-70 min each (84,000 words)
Our Corpus
In preparation, we part-of-speech tagged it with:
• Trigrams’n’Tags (TnT) statistical POS tagger
• made by Thorsten Brants (Universität des Saarlandes)
It achieves an accuracy of:
• 96.7% on the Penn Treebank
• 85.1% - 90.5% on our spoken material
Accuracy is of course lower for n-grams of tags:
• 2-grams 74%, 3-grams 65%, 4-grams 58% ...
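A back-of-the-envelope reading of the n-gram figures: if tagging errors were independent across positions (an assumption made only for this illustration, not a claim from the talk), an n-gram would be entirely correct with probability p^n for per-tag accuracy p. The function name `ngram_accuracy` is hypothetical.

```python
def ngram_accuracy(per_tag_accuracy, n):
    """Chance that all n tags of an n-gram are correct, assuming independent errors."""
    return per_tag_accuracy ** n

# Per-tag accuracy range reported for the spoken material
for p in (0.851, 0.905):
    print(p, [round(ngram_accuracy(p, n), 3) for n in (2, 3, 4)])
```

Under this simplification the reported 74%, 65% and 58% each fall between the values obtained for the lower and upper per-tag accuracy.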
Permutation Statistics
It is different from parametric (normal) statistics:
• It is about the data, not about the population
• no need for normality
• no need for homoscedasticity (equal variances across groups)
• no absolute need for random sampling
• Still, permutation statistics does require
• random assignment and independence of observations
• in practice this poses no problems for linguistic/dialect data
As a statistical method it is very suitable for linguistics
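The general idea can be sketched as a generic Monte Carlo permutation test. This is not the authors' implementation: the names `permutation_p_value` and `abs_mean_diff`, the number of permutations and the add-one correction are all choices made for this sketch.

```python
import random

def permutation_p_value(group_a, group_b, statistic, n_perm=2000, seed=0):
    """Estimate how often a random regrouping is at least as extreme as the observed one."""
    rng = random.Random(seed)
    observed = statistic(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of observations to the two groups
        if statistic(pooled[:n_a], pooled[n_a:]) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero
    return (at_least_as_extreme + 1) / (n_perm + 1)

def abs_mean_diff(a, b):
    """Example statistic: absolute difference of the group means."""
    return abs(sum(a) / len(a) - sum(b) / len(b))
```

No distributional assumption enters the computation: the reference distribution is built from the data themselves by reshuffling group membership.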
Applying it to Syntax
First, one needs something to permute:
• We permuted interviewees
• more conservative than permuting 3-grams
• and also easier than permuting sentences (which we did earlier)
• For each interview
• we took 3-grams (and other n-grams) of POS tags
• we then calculated the 3-gram promillages for all
3-grams (occurrences of a 3-gram type per 1000 3-gram
tokens)
• These 3-gram promillage vectors were then
• summed per group after each permutation
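A minimal sketch of the per-interview vectors and the per-group sum, assuming each interview is available as a flat list of POS tags. The function names `trigram_per_mille` and `group_sum` are hypothetical and do not come from the authors' software.

```python
from collections import Counter

def trigram_per_mille(tags):
    """Map each POS-trigram type to its occurrences per 1000 trigram tokens."""
    trigrams = [tuple(tags[i:i + 3]) for i in range(len(tags) - 2)]
    total = len(trigrams)
    if total == 0:
        return {}
    return {t: 1000.0 * c / total for t, c in Counter(trigrams).items()}

def group_sum(vectors):
    """Sum the per-mille vectors of all interviews currently assigned to one group."""
    summed = Counter()
    for vector in vectors:
        summed.update(vector)  # Counter.update adds values key by key
    return dict(summed)
```

Because interviewees are the unit being permuted, `group_sum` would be recomputed for both groups after every reshuffle of the interviews.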
Applying it to Syntax
Second, one needs something to measure extremity:
• Both
• for the whole group
• and for each individual POS-3-gram
• We used r-square and summed r-square
• we also tried cosine and summed r
• R-square is the square of the difference (r)
• for a POS-3-gram-promillage between the 2 groups
• Summed r-square is the sum of r-square for all 3-grams
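The two measures can be written down directly from the definitions above. Note that r here is the plain difference between the two groups' values for a trigram type, not a correlation coefficient. The function names are hypothetical, and each group is assumed to be a dict from trigram type to its (normalized) value.

```python
def r_square_per_trigram(group_a, group_b):
    """Squared difference between the two groups' values, per trigram type."""
    types = set(group_a) | set(group_b)
    return {t: (group_a.get(t, 0.0) - group_b.get(t, 0.0)) ** 2 for t in types}

def summed_r_square(group_a, group_b):
    """Aggregate measure: the sum of r-square over all trigram types."""
    return sum(r_square_per_trigram(group_a, group_b).values())
```

The per-trigram values serve the test for each individual POS-3-gram, and the sum serves the test for the groups as a whole.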
R-square
Applying it to Syntax
Third, one needs to apply normalizations for:
• Text-size per subject (for each subject)
• divide by the sum of the subject’s 3-grams (the promillages)
• to eliminate differences in text-size between subjects
• Frequencies of 3-gram types (for each 3-gram type)
• divide by the corpus-wide total of the 3-gram type
• to eliminate differences in frequencies (optional)
• Group-size (for both groups, across permutations)
• divide by the average frequency of 3-grams in the group
• to correctly detect over- and under-use
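The last two normalizations might look roughly as follows. This is one possible reading of the bullets above, not the authors' code: the text-size step is the promillage calculation itself, the function names are hypothetical, and each group is assumed to be a dict from trigram type to its summed value.

```python
def normalize_by_type_total(group_a, group_b):
    """Optional step: divide each value by the type's total over both groups."""
    out_a, out_b = {}, {}
    for t in set(group_a) | set(group_b):
        a, b = group_a.get(t, 0.0), group_b.get(t, 0.0)
        total = a + b
        if total > 0:
            out_a[t], out_b[t] = a / total, b / total
    return out_a, out_b

def normalize_by_group_mean(group, types):
    """Divide by the group's mean value over a shared list of trigram types.

    Assumed interpretation: afterwards 1.0 means 'average for this group',
    so groups of different sizes end up on the same scale.
    """
    if not types:
        return {}
    mean = sum(group.get(t, 0.0) for t in types) / len(types)
    if mean == 0:
        return {t: 0.0 for t in types}
    return {t: group.get(t, 0.0) / mean for t in types}
```

With the group-mean step applied to both groups, a type whose value is higher in one group than in the other can be read as relative over-use by that group, independent of how many interviews each group contains.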
Applying it to Syntax
Normalizations are needed to
• Prevent false significance
• arising from differences in text- and group-sizes
• Increase the weight of less frequent 3-grams
• on the level of the group
• (as noted, this is optional)
• Allow one to sort 3-grams based on
• whether they are more or less typical for each group
• relative to group-size
Normalizing for Frequency
Results and Analysis
• The following slides summarise some of the material in
Lauttamus, Nerbonne, and Wiersma (2007, 2009)
• The evidence based on the data of the two groups
shows that there are statistically significant syntactic
differences between the adult and juvenile groups
• We argue that some of the significant differences found
in the data can be ascribed to the lower level of
language proficiency of the adults
General Differences
Some of the syntactic differences found in the data can be
described in the most general terms as follows (all for the adult
group):
1 Overuse of hesitation phenomena
2 Overuse of parataxis
3 Underuse of contracted forms
4 Reduced repertoire of discourse markers
5 Avoidance of complex verbal structures
6 Avoidance of prepositional and phrasal verbs
7 Underuse of the existential there
General Differences
• The adults demonstrate features of disfluent speech
• such as (filled) pauses, repeats, false starts, incomplete
or false syntactic structures, arising from difficulties in
speech processing, and particularly in lexical access
• We argue that the statistical evidence obtained from
our data reflects syntactic distance between the two
varieties of L2 English
• And, consequently, aggregate effects of the differences
in the two groups’ English proficiency
• We will now look further into pausing
Pausing
We applied the computational technique described earlier to
examine:
1 whether the adults and juveniles show a differential use of
pausing (filled pauses, FPs), and
2 how such a difference can be analysed and explained
Thus:
• We will now look at the 308 POS-trigrams typical for
the adult (L1) speakers’ syntax
• first we look at the top 200
• compare them to the juveniles’ POS-trigrams
• and then we look at all of them
Fig. 1: Percentage of FPs, adults top-200
Pausing
• These are the top 200 POS-trigram types which most
characteristically distinguish the adults from the
juveniles
• they are significant at a p ≤ 0.05 level
• 42.5%, 85 out of 200
• include at least one filled pause, as in (1) and (2)
• 6%, 12 out of 200
• include at least two filled pauses, as in (3) and (4)
• In addition, there is one trigram with only filled pauses
Pausing
(1)
Interj Conj(subord) Art (def)
politically | uh when the | liberals
(2)
V(cop,pres,encl) Interj Adv(inten)
I’ | m ah very | sick
(3)
Interj Interj Conj(subord)
and | uh uh because | in the morning
(4)
Interj Pron(pers, sing) Interj
and | uh I uh | snow-skied
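The counts behind the percentages can be illustrated on examples (1) to (4). The sketch assumes, as in these examples, that filled pauses are the tokens tagged Interj; the constant and function names are hypothetical.

```python
FILLED_PAUSE_TAG = "Interj"  # assumption: tag carried by uh/ah in examples (1)-(4)

# The four example POS-trigram types from the slide
examples = {
    1: ("Interj", "Conj(subord)", "Art(def)"),
    2: ("V(cop,pres,encl)", "Interj", "Adv(inten)"),
    3: ("Interj", "Interj", "Conj(subord)"),
    4: ("Interj", "Pron(pers,sing)", "Interj"),
}

def filled_pause_count(trigram):
    """Number of positions in the trigram that carry the filled-pause tag."""
    return sum(1 for tag in trigram if tag == FILLED_PAUSE_TAG)

def share_with_at_least(trigram_types, k):
    """Percentage of trigram types containing at least k filled pauses."""
    types = list(trigram_types)
    hits = sum(1 for t in types if filled_pause_count(t) >= k)
    return 100.0 * hits / len(types)
```

Applied to a full list of significant trigram types, `share_with_at_least(types, 1)` and `share_with_at_least(types, 2)` give the kind of percentages reported for the two groups.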
Pausing
• For the juveniles there are 792 POS-trigram types in
which they use the sequence of POS tags more
frequently than the adults
• Again significant at the p ≤ 0.05 level
• But only 0.4%, 3 out of 792
• include one filled pause
• None include more than one FP
Fig. 2: Percentage of FPs, adults all 308
Pausing
• Of all 308 POS-trigram types typical for the adults
• these are all the POS-trigrams significant at the p ≤ 0.05 level
• 38.0%, 117 out of 308
• include at least one filled pause
• 4.5%, 14 out of 308
• include two filled pauses
• And again there is one trigram type with FPs only
Analysis
• Both figures show the same trend. The highly skewed
distribution of the filled pauses across the two groups of
Finnish Australian English speakers conclusively shows
that
• the juveniles have a much more varied syntactic
repertoire (measured in terms of POS-trigrams) than
the adults, and
• the adults have much more limited and idiosyncratic
(ungrammatical or substandard) syntactic patterns at
their disposal than the juveniles
Analysis
• The large number of filled pauses found in the adults’
speech as opposed to the juveniles’ is in agreement with
the evidence in Paananen-Porkka (2007: 234), who
argues that native speakers of Finnish show longer
pauses on average in English than in Finnish
• The statistically significant differential use of filled
pauses by the adults can be explained in terms of the
adults’ lesser proficiency (particularly at the level of
speech planning) and, consequently, lesser fluency in the L2.
Analysis
• The elimination of all FPs from the data has little effect
on the significance value for the tag sets
• The outcome of running the scripts again without the
FPs showed that there are still
• 729 statistically significant trigram types for the
juveniles
• as opposed to 220 for the adults
Analysis
• The examination of the top 200 FP-less trigram types
produced by the adults showed that about 38% of the
trigram types are ungrammatical, and that some of the
remaining trigram types are substandard
• (e.g. omission of an obligatory article or preposition,
omission of an obligatory copula or primary verb be or
have, omission of the subject, use of a redundant article
with proper nouns etc.; cf. Lauttamus et al. 2007).
Conclusion
• The uneven distribution of the filled pauses across the
two groups of Finnish Australian English speakers
conclusively shows that
• the adults used many more filled pauses than the
juveniles, and
• that the adults have much more limited and
idiosyncratic syntactic patterns at their disposal
• The statistically significant differential use of filled
pauses by the adults can be explained in terms of the
adults’ lesser proficiency compared to that of the
juveniles
Concluding Remarks
There is room for fine-tuning the method:
• Find optimum size for data-sets
• Try and evaluate with different measures
The method as is can easily be applied to many data-sets:
• Works on untagged corpora of spoken language
• Can empirically buttress theses
Software to do it and to pre-process corpora is freely
available:
• the ComLinToo http://old.logilogi.org/ComLinToo
Questions
Any questions?
References
Jan Aarts and Sylviane Granger.
Tag sequences in learner corpora: A key to
interlanguage grammar and discourse.
In Sylviane Granger, editor, Learner English on
Computer, pages 132 – 141. Longman, London, 1998.
Alan Agresti.
An Introduction to Categorical Data Analysis.
Wiley, New York, 1996.
Thorsten Brants.
TnT - a statistical part-of-speech tagger.
In 6th Applied Natural Language Processing Conference,
pages 224 – 231. ACL, Seattle, 2000.
Eugenio Coseriu.
Probleme der kontrastiven Grammatik.
Schwann, Düsseldorf, 1970.
Kees de Bot, Wander Lowie, and Marjolijn Verspoor.
Second Language Acquisition: An Advanced Resource
Book.
Routledge, London, 2005.
Charles Fillmore and Paul Kay.
Grammatical constructions and linguistic generalizations:
the what’s x doing y construction.
Language, 75, 1999.
Roger Garside, Geoffrey Leech, and Tony McEnery.
Corpus Annotation: Linguistic Information from
Computer Text Corpora.
Longman, London/New York, 1997.
Phillip Good.
Permutation Tests.
Springer, New York, 1995 2nd, corr. ed.
Brett Kessler.
The Significance of Word Lists.
CSLI Press, Stanford, 2001.
T. Lauttamus, J. Nerbonne, and W. Wiersma.
Detecting syntactic contamination in emigrants: The
English of Finnish Australians.
SKY Journal of Linguistics, 20, 2007.
Chris Manning and Hinrich Schütze.
Foundations of Statistical Natural Language Processing.
MIT Press, Cambridge, 1999.
J. Nerbonne and W. Wiersma.
A measure of aggregate syntactic distance.
In J. Nerbonne and E. Hinrichs, editors, Linguistic
Distances, pages 82 – 90. ACL, Stroudsburg, PA, 2006.
C.C.E. Oomen and A. Postma.
Effects of divided attention on the production of filled
pauses and repetitions.
Journal of Speech, Language, Hearing Research, 44,
2001.
M. Paananen-Porkka.
Speech Rhythm in an Interlanguage Perspective: Finnish
Adolescents Speaking English. Pragmatics, Ideology and
Contact Monographs.
University of Helsinki, Helsinki, 2007.
Shana Poplack and David Sankoff.
Borrowing: the synchrony of integration.
Linguistics, 22, 1984.
Shana Poplack, David Sankoff, and Christopher Miller.
The social correlates and linguistic processes of lexical
borrowing and assimilation.
Linguistics, 26, 1988.
William C. Ritchie and Tej K. Bhatia, editors.
Handbook of Child Language Acquisition.
Academic, San Diego, 1998.
Peter Sells.
Lectures on Contemporary Syntactic Theories.
CSLI, Stanford, 1982.
Sarah Thomason and Terrence Kaufman.
Language Contact, Creolization and Genetic Linguistics.
University of California Press, Berkeley, 1988.
Frans van Coetsem.
Loan phonology and the two transfer types in language
contact.
In Publications in Language Sciences, page 27. Foris
Publications, Dordrecht, 1988.
Greg Watson.
The Finnish-Australian English corpus.
ICAME Journal: Computers in English Linguistics, 20,
1996.
Uriel Weinreich.
Languages in Contact.
Mouton, The Hague, 1953 (page numbers from 2nd ed.
1968).
Copyleft
Copyright Wybo Wiersma, John Nerbonne and Timo
Lauttamus, available under the Creative Commons By-Sa
license
• Thanks to the OpenClipart archive; Carlitos for the
landscape, and unknown authors for the bomb and the
frogs.
http://creativecommons.org/licenses/by-sa/2.5/

More Related Content

What's hot

Processing short-message communications in low-resource languages
Processing short-message communications in low-resource languages�Processing short-message communications in low-resource languages�
Processing short-message communications in low-resource languages Robert Munro
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
FORMANT ANALYSIS OF BANGLA VOWEL FOR AUTOMATIC SPEECH RECOGNITION
FORMANT ANALYSIS OF BANGLA VOWEL FOR AUTOMATIC SPEECH RECOGNITIONFORMANT ANALYSIS OF BANGLA VOWEL FOR AUTOMATIC SPEECH RECOGNITION
FORMANT ANALYSIS OF BANGLA VOWEL FOR AUTOMATIC SPEECH RECOGNITIONsipij
 
Word representation: SVD, LSA, Word2Vec
Word representation: SVD, LSA, Word2VecWord representation: SVD, LSA, Word2Vec
Word representation: SVD, LSA, Word2Vecananth
 
International Journal on Natural Language Computing (IJNLC) Vol. 4, No.2,Apri...
International Journal on Natural Language Computing (IJNLC) Vol. 4, No.2,Apri...International Journal on Natural Language Computing (IJNLC) Vol. 4, No.2,Apri...
International Journal on Natural Language Computing (IJNLC) Vol. 4, No.2,Apri...ijnlc
 
Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...RajkiranVeluri
 
Myanmar named entity corpus and its use in syllable-based neural named entity...
Myanmar named entity corpus and its use in syllable-based neural named entity...Myanmar named entity corpus and its use in syllable-based neural named entity...
Myanmar named entity corpus and its use in syllable-based neural named entity...IJECEIAES
 
Sipij040305SPEECH EVALUATION WITH SPECIAL FOCUS ON CHILDREN SUFFERING FROM AP...
Sipij040305SPEECH EVALUATION WITH SPECIAL FOCUS ON CHILDREN SUFFERING FROM AP...Sipij040305SPEECH EVALUATION WITH SPECIAL FOCUS ON CHILDREN SUFFERING FROM AP...
Sipij040305SPEECH EVALUATION WITH SPECIAL FOCUS ON CHILDREN SUFFERING FROM AP...sipij
 
Warnikchow - SAIT - 0529
Warnikchow - SAIT - 0529Warnikchow - SAIT - 0529
Warnikchow - SAIT - 0529WarNik Chow
 
Implementation of English-Text to Marathi-Speech (ETMS) Synthesizer
Implementation of English-Text to Marathi-Speech (ETMS) SynthesizerImplementation of English-Text to Marathi-Speech (ETMS) Synthesizer
Implementation of English-Text to Marathi-Speech (ETMS) SynthesizerIOSR Journals
 

Filled pauses and L2 proficiency: Finnish Australians speaking English

  • 1. Filled pauses (FPs) and L2 proficiency: Finnish Australians speaking English
      Timo Lauttamus, John Nerbonne and Wybo Wiersma
      University of Oulu, Rijksuniversiteit Groningen
      timo.lauttamus@oulu.fi, j.nerbonne@rug.nl & wybo@logilogi.org
      5 June 2009
  • 2. In this talk we present a method for detecting syntactic differences and our findings on pausing
      3 sub-questions about the method:
          1 What did your corpus look like?
          2 What is permutation statistics?
          3 How do you apply it to syntax?
      3 sub-questions about the results:
          1 What general differences did you find?
          2 How much pausing is there, and by whom?
          3 What does this tell us about the speakers?
  • 3. FPs and L2 proficiency: Timo Lauttamus, John Nerbonne and Wybo Wiersma
      Sidebar: Introduction (In this Talk, Outline); The Method (Our Corpus, Permutation Statistics, Applying it to Syntax); Results (General Differences, Pausing, Analysis); Conclusion; Questions; References
      Outline of the Talk
      • Introduction
          • In this Talk
          • Outline of the Talk
      • The Method
      • Results
      • Conclusion
      • Questions
      • References
  • 4. The Method
      • Introduction
      • The Method
          • Our Corpus
          • Permutation Statistics
          • Applying it to Syntax
      • Results
      • Conclusion
      • Questions
      • References
  • 6. The Method
      • Detect a wide range of syntax differences, and characterize them as
          • significant differences
          • aggregate differences
          • relative differences
      • This would enable measuring the syntax part of total impact:
        “No easy way of measuring or characterizing the total impact of one language on another in the speech of bilinguals has been, or probably can be devised. The only possible procedure is to describe the various forms of interference and to tabulate their frequency.” - Weinreich, Languages in Contact
  • 7. The Method
      We also want to do this automatically and computationally, in order to be able to:
      • Mine for differences in syntax between
          • learners versus native speakers
          • speakers of different dialects
          • writers from different discourses
      • Test dialectological and other linguistic hypotheses
      • Note over- and under-use instead of right / wrong
  • 8. The Method
      We did it in four steps:
          1 Tag 2 or more collections of comparable material (using an automatic POS-tagger)
          2 Take n-grams (2- to 5-grams) of POS-tags
          3 Statistically compare their frequencies
          4 Sort the significant POS-n-grams by extent of difference
      Aarts J. and Granger S. did this without the statistics in: ‘Tag sequences in learner corpora: a key to interlanguage grammar and discourse’ (1998)
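Step 2 above can be sketched in Python. This is a minimal illustration, not the authors' implementation: it assumes the tagging step has already produced a list of POS tags for each text, and the function name `pos_ngrams` and the sample tag sequence are hypothetical.

```python
from collections import Counter

def pos_ngrams(tags, n_min=2, n_max=5):
    """Count all POS-tag n-grams of length n_min..n_max in one tagged text."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of width n over the tag sequence.
        for i in range(len(tags) - n + 1):
            counts[tuple(tags[i:i + n])] += 1
    return counts

# Hypothetical Penn Treebank style tag sequence for one utterance.
tags = ["PRP", "VBP", "UH", "PRP", "VBP", "DT", "NN"]
counts = pos_ngrams(tags)
```

The resulting counters, one per text, are the raw material for the frequency comparison in steps 3 and 4.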
  • 9. Our Corpus
      Origins:
      • 20,000 Finns immigrated to Australia
      • Working class background, limited education
      • 25-40 years old upon arrival
      Corpus collected 1995-1998 by Greg Watson:
      • of the University of Joensuu, Finland
      • two age groups: adults and juveniles
      • 350,000 words, of which 305,000 words are free conversation
  • 10. Our Corpus
      Adults:
      • over 18 years at arrival, on average 30
      • on average 58 at time of interview
      • 60 interviews, 65-70 min each (221,000 words)
      Juveniles:
      • under 16 years at arrival, on average 6
      • on average 36 at time of interview
      • 30 interviews, 65-70 min each (84,000 words)
  • 11. Our Corpus
      In preparation we part-of-speech tagged it with:
      • Trigrams'n'Tags (TnT) Statistical POS Tagger
      • made by Thorsten Brants (Universität des Saarlandes)
      It achieves an accuracy of:
      • 96.7% on the Penn Treebank
      • 85.1% - 90.5% on our spoken material
      Accuracy is of course lower for n-grams than for single tags:
      • 2-grams 74%, 3-grams 65%, 4-grams 58% ...
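The slide does not say how the n-gram figures were obtained. As a rough plausibility check only, one can assume (an assumption not made in the talk) that tagging errors are independent, so that an n-gram is fully correct with probability equal to the per-tag accuracy raised to the power n. The helper name `ngram_accuracy` is hypothetical.

```python
def ngram_accuracy(tag_accuracy, n):
    """Chance that all n tags of an n-gram are correct, assuming independent errors."""
    return tag_accuracy ** n

# Lower and upper per-tag accuracy reported for the spoken material.
for p in (0.851, 0.905):
    print(p, [round(ngram_accuracy(p, n), 3) for n in (2, 3, 4)])
```

Under this assumption the bounds come out at roughly 72-82% for 2-grams, 62-74% for 3-grams and 52-67% for 4-grams, so the reported 74%, 65% and 58% fall inside the implied bands.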
  • 12. Permutation Statistics
      It is different from parametric (normal) statistics:
      • It is about the data, not about the population
          • no need for normality
          • no need for homoscedasticity (equal variances)
          • no absolute need for random sampling
      • Still, important for permutation statistics are
          • random assignment and independence of observations
          • in practice these pose no problems for linguistic/dialect data
      As a statistical method it is very suitable for linguistics
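A generic permutation test can be sketched as follows. This is a textbook-style illustration under stated assumptions, not the authors' code: the function names, the toy data and the choice of the absolute difference of means as the statistic are all hypothetical.

```python
import random

def permutation_test(group_a, group_b, statistic, n_permutations=2000, seed=0):
    """Estimate how often a random relabelling is at least as extreme as the data."""
    rng = random.Random(seed)
    observed = statistic(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Reassign observations to the two groups at random, keeping group sizes.
        rng.shuffle(pooled)
        if statistic(pooled[:n_a], pooled[n_a:]) >= observed:
            at_least_as_extreme += 1
    # Count the observed labelling itself so the estimate is never exactly 0.
    return (at_least_as_extreme + 1) / (n_permutations + 1)

def abs_mean_difference(a, b):
    return abs(sum(a) / len(a) - sum(b) / len(b))

p = permutation_test([1, 2, 3, 4, 5], [101, 102, 103, 104, 105], abs_mean_difference)
```

No distributional assumption enters the computation: the reference distribution is built from the data themselves, which is the point made on the slide.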
  • 13-23. Permutation Statistics (figure-only slides)
  • 24. Applying it to Syntax
      First, one needs something to permute:
      • We permuted interviewees
          • more conservative than permuting 3-grams
          • and also easier than permuting sentences (we did this earlier)
      • For each interview
          • we took 3-grams (other n-grams too) of POS-tags
          • we then calculated the 3-gram-promillages for all 3-grams (occurrences of a 3-gram type per 1000 3-gram tokens)
      • These 3-gram-promillage-vectors were then used
          • summed per group after each permutation
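The per-interview promillage vectors and their per-group sums might look like this in Python. It is a sketch only: the function names and the toy counts are hypothetical, and the input is assumed to be a counter of POS 3-grams for one interview.

```python
from collections import Counter

def promillage_vector(trigram_counts):
    """Convert one interview's raw 3-gram counts to occurrences per 1000 3-gram tokens."""
    total = sum(trigram_counts.values())
    if total == 0:
        return {}
    return {gram: 1000 * count / total for gram, count in trigram_counts.items()}

def group_vector(vectors):
    """Sum the per-interview vectors of one group, 3-gram type by 3-gram type."""
    summed = Counter()
    for vec in vectors:
        for gram, value in vec.items():
            summed[gram] += value
    return dict(summed)

# Hypothetical counts for a single interview.
interview = Counter({("PRP", "VBP", "DT"): 3, ("UH", "PRP", "VBP"): 1})
vec = promillage_vector(interview)
```

Because whole interviews are reassigned between groups, `group_vector` would be recomputed for both groups after every permutation.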
  • 25. Applying it to Syntax. Secondly, one needs something to measure extremity: • Both • for the whole group • and for each individual POS-3-gram • We used r-square and summed r-square • we also tried cosine and summed r • R-square is the square of the difference (r) • in the promillage of a POS-3-gram between the 2 groups • Summed r-square is the sum of r-square over all 3-grams
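A minimal sketch of how the summed r-square measure could be combined with a permutation over interviewees. The function names, the default number of permutations, the fixed seed, and the add-one correction in the p-value are all assumptions made for illustration; the normalizations discussed on the following slides are left out here.

```python
import random


def summed_r_square(vec_a, vec_b):
    """Sum over all 3-gram types of the squared difference between two vectors."""
    types = set(vec_a) | set(vec_b)
    return sum((vec_a.get(t, 0.0) - vec_b.get(t, 0.0)) ** 2 for t in types)


def group_vector(subject_vectors):
    """Sum the promillage vectors of the subjects in one group."""
    total = {}
    for vec in subject_vectors:
        for t, x in vec.items():
            total[t] = total.get(t, 0.0) + x
    return total


def permutation_p_value(group_a, group_b, n_permutations=1000, seed=0):
    """Estimate how often a random regrouping is at least as extreme as observed."""
    rng = random.Random(seed)
    observed = summed_r_square(group_vector(group_a), group_vector(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign whole interviewees to the two groups
        stat = summed_r_square(group_vector(pooled[:n_a]),
                               group_vector(pooled[n_a:]))
        if stat >= observed:
            extreme += 1
    # add-one correction (an assumption) so the estimate is never exactly zero
    return (extreme + 1) / (n_permutations + 1)
```

Shuffling whole interviewees rather than individual 3-grams keeps each speaker's 3-grams together, which is what makes the test the more conservative choice.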
  • 26. R-square
  • 27. Applying it to Syntax. Thirdly, one needs to apply normalizations for: • Text size per subject (for each subject) • divide by the sum of the subject's 3-grams (giving the promillages) • to eliminate differences in text size between speakers • Frequencies of 3-gram types (for each 3-gram type) • divide by the corpus-wide total of the 3-gram type • to eliminate differences in frequencies (optional) • Group size (for both groups, across permutations) • divide by the average frequency of 3-grams in the group • to correctly detect over- and under-use
  • 28. Applying it to Syntax. Normalizations are needed to • Prevent false significance • arising from differences in text and group sizes • Increase the weight of less frequent 3-grams • at the level of the group • (as said, this is optional) • And they allow one to sort 3-grams by • whether they are more or less typical of each group • relative to group size
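The three normalizations can be sketched as follows. The slides give no formulas, so this is one possible reading rather than the authors' implementation: the function names are hypothetical, and "average frequency of 3-grams in the group" is interpreted here as the mean value over the 3-gram types present in the group vector.

```python
def normalize_subject(counts):
    """Text size: turn a subject's raw 3-gram counts into per-mille values."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {t: 1000.0 * n / total for t, n in counts.items()}


def normalize_by_type(subject_vectors):
    """Optional: divide each value by the corpus-wide total of its 3-gram type."""
    totals = {}
    for vec in subject_vectors:
        for t, x in vec.items():
            totals[t] = totals.get(t, 0.0) + x
    return [{t: x / totals[t] for t, x in vec.items() if totals[t]}
            for vec in subject_vectors]


def normalize_group(group_vec):
    """Group size: divide by the mean value over the group's 3-gram types
    (an assumed reading of 'average frequency of 3-grams in the group')."""
    if not group_vec:
        return {}
    mean = sum(group_vec.values()) / len(group_vec)
    if mean == 0:
        return dict(group_vec)
    return {t: x / mean for t, x in group_vec.items()}
```

After the group-size step, a value above 1 for a 3-gram type suggests over-use relative to the group's average and a value below 1 suggests under-use, so groups with different numbers of speakers can be compared on the same scale.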
  • 29. Normalizing for Frequency
  • 30. Results and Analysis (outline: Introduction; The Method; Results: General Differences, Pausing, Analysis; Conclusion; Questions; References)
  • 31. Results and Analysis • The following slides summarise some of the material in Lauttamus, Nerbonne, and Wiersma (2007, 2009) • The data from the two groups show statistically significant syntactic differences between the adult and juvenile groups • We argue that some of the significant differences found in the data can be ascribed to the adults' lower level of language proficiency
  • 32. General Differences. Some of the syntactic differences found in the data can be described in the most general terms as follows (all for the adult group): 1. Overuse of hesitation phenomena 2. Overuse of parataxis 3. Underuse of contracted forms 4. Reduced repertoire of discourse markers 5. Avoidance of complex verbal structures 6. Avoidance of prepositional and phrasal verbs 7. Underuse of the existential there
  • 33. General Differences • The adults demonstrate features of disfluent speech • such as (filled) pauses, repeats, false starts, and incomplete or false syntactic structures, arising from difficulties in speech processing, particularly in lexical access • We argue that the statistical evidence obtained from our data reflects the syntactic distance between the two varieties of L2 English • And, consequently, the aggregate effects of the differences in the two groups' English proficiency • We will now look further into pausing
  • 34. Pausing. We applied the computational technique described earlier to examine: 1. whether the adults and juveniles show a differential use of pausing (filled pauses, FPs), and 2. how such a difference can be analysed and explained. Thus: • We will now look at the 308 POS-trigrams typical for the adult (L1) speakers' syntax • first we look at the top 200 • compare them to the juveniles' POS-trigrams • and then we look at all of them
  • 35. Fig. 1: Percentage of FPs, adults top-200
  • 36. Pausing • These are the top 200 POS-trigram types that most characteristically distinguish the adults from the juveniles • they are significant at the p ≤ 0.05 level • 42.5%, 85 out of 200 • include at least one filled pause, as in (1) and (2) • 6%, 12 out of 200 • include at least two filled pauses, as in (3) and (4) • In addition, there is one trigram type consisting only of filled pauses
  • 37. Pausing. (1) Interj Conj(subord) Art(def): politically | uh when the | liberals (2) V(cop,pres,encl) Interj Adv(inten): I' | m ah very | sick (3) Interj Interj Conj(subord): and | uh uh because | in the morning (4) Interj Pron(pers,sing) Interj: and | uh I uh | snow-skied
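The FP shares reported on these slides amount to counting the trigram types that contain a filled-pause tag. A small sketch, assuming (as in examples (1) to (4)) that filled pauses are tagged Interj; the function name is hypothetical, and in a real tag set Interj may also cover interjections that are not filled pauses.

```python
def fp_share(trigram_types, fp_tag="Interj", min_fps=1):
    """Return (count, percentage) of trigram types with at least min_fps FP tags."""
    if not trigram_types:
        return 0, 0.0
    hits = sum(
        1 for tri in trigram_types
        if sum(tag == fp_tag for tag in tri) >= min_fps
    )
    return hits, 100.0 * hits / len(trigram_types)


# The four example trigram types from the slide
examples = [
    ("Interj", "Conj(subord)", "Art(def)"),
    ("V(cop,pres,encl)", "Interj", "Adv(inten)"),
    ("Interj", "Interj", "Conj(subord)"),
    ("Interj", "Pron(pers,sing)", "Interj"),
]
at_least_one = fp_share(examples, min_fps=1)
at_least_two = fp_share(examples, min_fps=2)
```

Applied to a full list of significant trigram types, the same count divided by the list length gives figures of the kind reported in the talk, for example 85 of 200 types is 42.5%.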
  • 38. Pausing • For the juveniles there are 792 POS-trigram types that they use more frequently than the adults • Again significant at the p ≤ 0.05 level • But only 0.4%, 3 out of 792 • include one filled pause • None includes more than one FP
  • 39. Fig. 2: Percentage of FPs, adults all 308
  • 40. Pausing • Of all 308 POS-trigram types typical for the adults • these are all the POS-trigrams significant at the p ≤ 0.05 level • 38.0%, 117 out of 308 • include at least one filled pause • 4.5%, 14 out of 308 • include two filled pauses • And again there is one trigram type with FPs only
  • 41. Analysis • Both figures show the same trend. The highly skewed distribution of the filled pauses across the two groups of Finnish Australian English speakers conclusively shows that • the juveniles have a much more varied syntactic repertoire (measured in terms of POS-trigrams) than the adults, and • the adults have much more limited and idiosyncratic (ungrammatical or substandard) syntactic patterns at their disposal than the juveniles
  • 42. Analysis • The large number of filled pauses found in the adults' speech, as opposed to the juveniles', is in agreement with the evidence in Paananen-Porkka (2007: 234), who argues that native speakers of Finnish show longer pauses on average in English than in Finnish • The statistically significant differential use of filled pauses by the adults can be explained in terms of the adults' lesser proficiency (particularly at the level of speech planning) and, consequently, lesser fluency in the L2.
  • 43. Analysis • Eliminating all FPs from the data has little effect on the significance value for the tag sets • Running the scripts again without the FPs showed that there are still • 729 statistically significant trigram types for the juveniles • as opposed to 220 for the adults
  • 44. Analysis • The examination of the top 200 FP-less trigram types produced by the adults showed that about 38% of the trigram types are ungrammatical, and that some of the remaining trigram types are substandard • (e.g. omission of an obligatory article or preposition, omission of an obligatory copula or primary verb be or have, omission of the subject, use of a redundant article with proper nouns etc.; cf. Lauttamus et al. 2007).
  • 45. Conclusion • The uneven distribution of the filled pauses across the two groups of Finnish Australian English speakers conclusively shows that • the adults used many more filled pauses than the juveniles, and • the adults have much more limited and idiosyncratic syntactic patterns at their disposal • The statistically significant differential use of filled pauses by the adults can be explained in terms of the adults' lesser proficiency compared to that of the juveniles
  • 46. Concluding Remarks. There is room for fine-tuning the method: • Find the optimum size for data sets • Try and evaluate different measures. The method as it stands can easily be applied to many data sets: • Works on untagged corpora of spoken language • Can empirically buttress theses. Software to do it and to pre-process corpora is freely available: • the ComLinToo http://old.logilogi.org/ComLinToo
  • 47. Questions. Any questions?
  • 49. References • Jan Aarts and Sylviane Granger. Tag sequences in learner corpora: A key to interlanguage grammar and discourse. In Sylviane Granger, editor, Learner English on Computer, pages 132–141. Longman, London, 1998. • Alan Agresti. An Introduction to Categorical Data Analysis. Wiley, New York, 1996. • Thorsten Brants. TnT - a statistical part-of-speech tagger. In 6th Applied Natural Language Processing Conference, pages 224–231. ACL, Seattle, 2000. • Eugenio Coseriu. Probleme der kontrastiven Grammatik. Schwann, Düsseldorf, 1970.
  • 50. References • Kees de Bot, Wander Lowie, and Marjolijn Verspoor. Second Language Acquisition: An Advanced Resource Book. Routledge, London, 2005. • Charles Fillmore and Paul Kay. Grammatical constructions and linguistic generalizations: the what's x doing y construction. Language, 75, 1999. • Roger Garside, Geoffrey Leech, and Tony McEnery. Corpus Annotation: Linguistic Information from Computer Text Corpora. Longman, London/New York, 1997. • Phillip Good. Permutation Tests. Springer, New York, 2nd, corr. ed., 1995.
  • 51. References • Brett Kessler. The Significance of Word Lists. CSLI Press, Stanford, 2001. • T. Lauttamus, J. Nerbonne, and W. Wiersma. Detecting syntactic contamination in emigrants: The English of Finnish Australians. SKY Journal of Linguistics, 20, 2007. • Chris Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, 1999. • J. Nerbonne and W. Wiersma. A measure of aggregate syntactic distance. In J. Nerbonne and E. Hinrichs, editors, Linguistic Distances, pages 82–90. ACL, Stroudsburg, PA, 2006.
  • 52. References • C.C.E. Oomen and A. Postma. Effects of divided attention on the production of filled pauses and repetitions. Journal of Speech, Language, and Hearing Research, 44, 2001. • M. Paananen-Porkka. Speech Rhythm in an Interlanguage Perspective: Finnish Adolescents Speaking English. Pragmatics, Ideology and Contact Monographs. University of Helsinki, Helsinki, 2007. • Shana Poplack and David Sankoff. Borrowing: the synchrony of integration. Linguistics, 22, 1984.
  • 53. References • Shana Poplack, David Sankoff, and Christopher Miller. The social correlates and linguistic processes of lexical borrowing and assimilation. Linguistics, 26, 1988. • William C. Ritchie and Tej K. Bhatia, editors. Handbook of Child Language Acquisition. Academic, San Diego, 1998. • Peter Sells. Lectures on Contemporary Syntactic Theories. CSLI, Stanford, 1982. • Sarah Thomason and Terrence Kaufman. Language Contact, Creolization and Genetic Linguistics. University of California Press, Berkeley, 1988.
  • 54. References • Frans van Coetsem. Loan phonology and the two transfer types in language contact. In Publications in Language Sciences, page 27. Foris Publications, Dordrecht, 1988. • Greg Watson. The Finnish-Australian English Corpus. ICAME Journal: Computers in English Linguistics, 20, 1996. • Uriel Weinreich. Languages in Contact. Mouton, The Hague, 1953 (page numbers from 2nd ed. 1968).
  • 55. Copyleft • Copyright Wybo Wiersma, John Nerbonne and Timo Lauttamus; available under the Creative Commons BY-SA license, http://creativecommons.org/licenses/by-sa/2.5/ • Thanks to the OpenClipart archive; Carlitos for the landscape, and unknown authors for the bomb and the frogs.