Tracking Learning:  Using Corpus Linguistics to Assess Language Development James Lantolf Steve Thorne CALPER Center for Advanced Language Proficiency Education and Research The Pennsylvania State University
Tracking Learning: Approaches to Assessment Traditional Classroom Assessment Achievement, Placement, Formative Standardized Tests AP, TOEFL, OPI, STAMP Alternative Assessment Portfolio & LinguaFolio Performance Assessment, Task-Based CALPER Assessment Dynamic Assessment Corpus-Informed Assessment
Today’s Talk What is a corpus? Types of corpora Corpus-informed assessment Developmental learner corpora  Contrastive learner corpus analysis against baseline Two examples of corpus-informed assessment Advanced ESL academic discourse competence German modal particles
What is a Corpus? A  corpus  (plural  corpora ) Large collection of texts Gathered according to specific criteria  Stored in an electronic database with relevant meta-data associated with each text entry Student ID Time/date Activity type Corpora can be constructed from written language use (especially digital texts) or transcribed from spoken interaction
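A minimal sketch of the idea on this slide: each text entry carries its own meta-data, so sub-corpora can be selected by student or activity type. The class name, field names, the select helper, and the sample entries are all hypothetical illustrations, not the project's actual data model.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CorpusEntry:
    """One text entry plus the meta-data fields named on the slide."""
    student_id: str
    timestamp: datetime
    activity_type: str
    text: str


def select(entries, *, student_id=None, activity_type=None):
    """Return entries matching every meta-data filter that is not None."""
    return [
        e for e in entries
        if (student_id is None or e.student_id == student_id)
        and (activity_type is None or e.activity_type == activity_type)
    ]


# Hypothetical sample entries, invented for illustration only.
corpus = [
    CorpusEntry("S01", datetime(2005, 9, 12, 10, 30), "chat",
                "Wann kommst du mal nach Deutschland?"),
    CorpusEntry("S01", datetime(2005, 9, 19, 14, 5), "email",
                "Das ist ja noch über ein Jahr."),
    CorpusEntry("S02", datetime(2005, 9, 12, 10, 31), "chat",
                "Hoffentlich komme ich im Frühling."),
]

chat_by_s01 = select(corpus, student_id="S01", activity_type="chat")
```

Keeping the meta-data on every entry is what later makes developmental questions (one learner over time) and cohort questions (one activity across learners) answerable from the same collection.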
Basic Tenets of Corpus Analysis Data driven, highly empirical Objective approach  A grammar of use based on attested utterance types A grammar of probability based on frequency and distribution Language use and structure: Collocational patterns Lexicon heart of systematicity in language, i.e., grammar Formulaic sequences comprise ~60% of language use  (Wray, 2002; Schmitt & Carter, 2004)
Corpora & Language Assessment For advanced proficiency -- develop and/or utilize genre, modality, and context-specific corpora Focus can be on grammatical, lexical, metaphoric, discourse, pragmatic features  Typical problems and errors of usage can be found in learner data Teachers and learners themselves can observe and assess their own and one another’s performance Expert-speaker  corpora can reveal what learners are  not  using/doing, as well as how appropriately, successfully, and differentially they  are  using the target language
Comparing Assessment Approaches
Testing: Elicited performance indicative of competence. “Authenticity” and/or ecological validity of test instrument. Sampling issues. Reliability. Critical question: Is the elicited performance representative of the individual’s state of language development?
Corpus-based: Naturally-occurring language performance indicates competence. Volume of language learners produce across tasks/genres and time. Sampling issues become irrelevant. Reconceptualize reliability. Critical question: Have enough data been collected to conclude that an individual’s performance is representative of her state of language development?
ITA Project Describing, assessing, and  developing academic discourse with international teaching assistants Steve Thorne Jonathan Reinhardt Paula Golombek
ITAcorp Project ITAs highly competent researchers Expand repertoire of options for performing often complex social roles (instructor, adjudicator, tutor, advisor, fellow student, mediator) Assessment --> Contrastive corpus analysis of ITAcorp with baseline corpus -- MICASE Grammar as choice as it relates to meaning and social actions Formulaic sequences, small words, modulation Corpus-informed pedagogical intervention to prepare students to participate successfully in spoken and written genres of academic discourse
Methodology Contrastive corpus analysis of MICASE and ITAcorp -->  what are the differences in language use between expert/native and ITA/advanced ESL speakers? Identified directive and obligative constructions Quantified usage of directive language in both corpora The case of wanna / want to
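A rough sketch of the quantification step, assuming plain-text transcripts: count a construction in each corpus and normalize by corpus size so that corpora of different lengths can be compared. The regular expression, the per-10,000-token normalization, and the two toy strings are illustrative assumptions; they are not MICASE or ITAcorp data and not the project's actual query.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def per_10k(pattern, text):
    """Occurrences of a regex pattern per 10,000 tokens of text."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = len(re.findall(pattern, " ".join(tokens)))
    return 10_000 * hits / len(tokens)


# "you want to" / "you wanna", optionally with one word (e.g. a hedge) after "you".
WANT_TO = r"\byou (?:\w+ )?(?:want to|wanna)\b"

# Invented toy strings standing in for a baseline corpus and a learner corpus.
expert = "so you might want to check the syllabus and you wanna start early"
learner = "you must check the syllabus please start early"

expert_rate = per_10k(WANT_TO, expert)
learner_rate = per_10k(WANT_TO, learner)
```

Normalized rates, rather than raw counts, are what make the comparison between a smaller learner corpus and a larger baseline corpus meaningful.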
Corpus Assessment: Time 1 The case of “you want to” | “you could …” Please + [imperative]
Corpus Assessment: Time 2 Post intervention usage of “you want to” 10 instances of usage across 25 advanced ESL students Concordance lines of proceduralized usage in context
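The concordance lines mentioned on this slide can be approximated with a small key-word-in-context (KWIC) routine. This is a sketch under simple assumptions (whitespace tokenization, exact token match); the function name kwic and the sample sentence are hypothetical, and dedicated concordancers offer far more.

```python
def kwic(tokens, node, width=4):
    """Return (left context, node, right context) for each hit of node.

    tokens: list of word tokens; node: list of tokens to search for;
    width: number of context tokens kept on each side.
    """
    node_len = len(node)
    lines = []
    for i in range(len(tokens) - node_len + 1):
        if tokens[i:i + node_len] == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + node_len:i + node_len + width])
            lines.append((left, " ".join(node), right))
    return lines


# Invented sample utterance, for illustration only.
sample = "so first you want to open the file and then you want to save it".split()
lines = kwic(sample, ["you", "want", "to"], width=2)

for left, node, right in lines:
    # Right-align the left context so the node lines up in a column.
    print(f"{left:>12}  {node}  {right}")
```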
Corpus Assessment Corpus-informed Assessment and Materials Development: German Modal Particles Nina Vyatkina
Teaching the MPs: Challenges Modal Particles:  ja, doch, denn, mal Rampant polysemy in MPs Strongly context-bound meaning Absence of a direct counterpart in English (translated by tag questions, intonation, omitted)  Absence of an informal “particle-friendly climate” in traditional language classrooms Overly formal treatment in textbooks   Sentence-based rather than utterance-based [interactive]
Participants 7 American students and 16 German students discussing intercultural topics in German and in English using email and chat during 8 semester weeks (Fall 2005)
German Modal Particles German modal particles: indeclinable “smallwords” typical of conversations ‘The German listener expects a particle. If it is absent, the sentence acquires a specific stylistic value: without a particle it sounds choppy, harsh, unfriendly, its utterance is apodictic, abrupt, blatantly noncommittal.’ (Weydt, 1969)
Pedagogical intervention Classroom intracultural sessions: explicit instruction based on the data produced by the participants in Internet-mediated intercultural sessions: practice in language use in CMC with native speakers (Belz, 2006) Diagram labels: Data-driven instruction, CMC practice, QUAN & QUAL analysis
Relative frequency:  modality/intervention effect * Statistically significant difference in mean relative frequencies (no. MPs/1000 German words), p<.05
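The measure named on this slide, modal particles per 1,000 German words, can be computed as below. This sketch assumes the particle counts were already separated by hand from their homonyms (for example, ja as "yes"), since a plain word-form count would overcount. The function name and the three (count, words) pairs are invented for illustration and are not the study's data.

```python
from statistics import mean


def per_thousand(mp_count, word_count):
    """Relative frequency: modal particles per 1,000 words."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return 1000 * mp_count / word_count


# Hypothetical (MP count, German word count) pairs, one per learner.
samples = [(3, 600), (0, 450), (5, 1000)]

rates = [per_thousand(mp, words) for mp, words in samples]
mean_rate = mean(rates)  # mean relative frequency across learners
```

Averaging per-learner rates, rather than pooling all words, keeps one very productive learner from dominating the group figure.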
MP Dispersion in the corpus Learners: ja denn doch mal NSs: ja denn doch mal
MP use by NSs and learners (absolute numbers)
Stages | Learners: Accurate use | Learners: Inaccurate use
Pre-Interv. (4 weeks) | 3 | 0
Interv. W1 | 7 | 2
Interv. W2 | 6 | 0
Interv. W3 | 27 | 3
Total Post-Interv. | 65 | 6
Post-Int. W4 | 22 | 1
NSs | 89 | 80
Awareness-raising exercise 1 Questions adopted in part from Möllering and Nunan (1995) In this excerpt from your chat with your German partner, what lexical category (part of speech) do the words  ja, mal, aber  belong to? Can you list other words belonging to this category? What functions do these words have in the examples from your partners’ writing? Which of these words have you ever used (in this course or earlier) in the same functions? Soren:  Wann kommst Du  mal  nach Deutschland? Jeremy:  Hoffentlich komme ich der Fruhling 2007. Soren:  Oh das dauert  aber  noch Soren:  Das ist  ja  noch über ein Jahr Soren:  Naja,   vielleicht schaffst Du es  ja  dann  mal  bei mir vorbei zu kommen.
Awareness-raising exercise 2 In the given excerpt from your chat with your German partner, separate modal particles (MP) from their homonyms (H). Try to paraphrase the meaning of each word marked in bold. How many MPs and their homonyms did you use? And your partner? Chip : Was hast du ueber FKK geschrieben? [...] Simone : Zu welcher Frage meinst Du  denn (___) ? Chip : Umm... ich muss die Frage finden Simone : Es gab  ja (___)  mehrere Fragen zum Thema FKK […] Simone : Die Serie läuft  doch (___)  aber noch in den USA? [...] Simone : dann  mal (___)  bis zur nächsten e-mail! Chip :  Ja (___) ! Bis zum naechsten  Mal (___)  ! :-) Simone : Du kannst mir  ja (___) mal (___)  schreiben, was Du außerhalb der Uni noch so machst
Awareness-raising exercise 3 Questions adopted in part from Möllering (2004) Consider the following concordance lines with the modal particle MAL extracted by means of WordSmith Tools® from your partners’ writing and answer the questions: Underline all the finite verbs in the clauses containing MAL. Do you see any patterns? In these examples, does the content refer to the past, present, or future time? What is the lexical category (part of speech) of the word expressing the subject in each line? What sentence types do the examples contain – declaratives, exclamatives, commands, questions?
Corpus-informed Assessment: Conclusions, Questions, & Resources Representativeness and ecological validity? Assemble corpus data to adequately and significantly represent production Use benchmark corpora for assessing learner language successes and problems Developmental corpus assessment of individuals and class-cohorts CALPER materials: Corpus tutorial -- see calper.la.psu.edu INVESTIGATING REAL LANGUAGE -- June 25-27, 2007 DYNAMIC ASSESSMENT workshop June 25-27, 2007 CALPER Corpus Tool available Summer, 2007
Thanks -- please visit our website for more information on CALPER materials, events, and services: http://calper.la.psu.edu
 
Challenges to Corpus Approaches One data source among many: ethnographic details, visual field, introspection, clinical and experimental elicitation Descriptive not explanatory Focus on externalized language use / performance – psycholinguistics and language processing inferred Corpora are “real” (representation of actual use), but are they “authentic” (meaningful and applicable to learners, e.g., Widdowson, 2002) Only as good as its representativeness Harkening back to contrastive error analysis? No, contrastive analysis of actual use that does not need to include incapacity evaluations of learners
Types of Corpora & Analyses
Synchronic: Learner Corpora (Granger): contrastive IL analyses; frequencies, ratios. Genre/Register/Variation (Biber, Swales, Sinclair): factor & cluster analyses; Youmans’ Vocabulary Management Profiling; mutual information. Descriptive Benchmark (BNC, ANC): frequencies, ratios.
Diachronic: Developmental Learner Corpora (Myles, Payne, Belz, Thorne et al.): frequencies, ratios, ???. Historical Corpora (Davies): frequencies, ratios.
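Of the analyses listed here, mutual information is the easiest to show in a few lines. The sketch below computes pointwise mutual information (log base 2) for an adjacent word pair; treating the slide's "mutual information" as this pointwise, bigram-based variant is an assumption, and the token list is an invented toy example.

```python
import math
from collections import Counter


def pmi(tokens, w1, w2):
    """Pointwise mutual information (log2) of the adjacent bigram (w1, w2).

    Compares how often the pair actually occurs together with how often
    it would be expected to if the two words were independent.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    joint = bigrams[(w1, w2)]
    if joint == 0:
        return float("-inf")  # the pair never occurs together
    p_joint = joint / (len(tokens) - 1)
    p_w1 = unigrams[w1] / len(tokens)
    p_w2 = unigrams[w2] / len(tokens)
    return math.log2(p_joint / (p_w1 * p_w2))


# Invented toy sequence, for illustration only.
toy = "you want to go and you want to stay".split()
score = pmi(toy, "you", "want")
```

A high positive score marks a pair that co-occurs more often than chance would predict, which is one way collocational patterns and formulaic sequences surface in corpus data.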
Corpus Design and Construction: Synchronic
Aggregative. Genre, register. Meta-data: situational context, activity, level of proficiency.
Learner Corpora (Granger): contrastive IL analyses; frequencies, ratios. Genre/Register/Variation (Biber, Swales, Sinclair): factor & cluster analyses; Youmans’ Vocabulary Management Profiling; mutual information. Descriptive Benchmark (BNC, ANC): frequencies, ratios.
Corpus Design and Construction: Diachronic
Role of meta-data: individual, task, time. Corpus construction as a form of experimental research.
Developmental Learner Corpora (Myles, Payne, Belz, Thorne et al.): frequencies, ratios, ???. Historical Corpora (Davies): frequencies, ratios.
Corpus Annotation Frequency and location of tags Laughter for hyperbole Language use as social action Part-of-speech Lemmatization Syntactic tagging Error tagging Semantic tagging
Corpus Informing Language Theory Not only what is possible (e.g., nativist and UG approaches), but what is likely or frequent in usage Illustrates the limits of introspection about language (enormous differences between intuition and actual use) Language structure, i.e., formulaic sequences comprise ~60% of language use (Wray, 2002; Schmitt & Carter, 2004) Emergent grammar (Hopper, 2002; Bybee, 2001) Grammar a consequence, not a precondition -- epiphenomenal Grammar = observable repetition in discourse Grammar contingent upon lexical environment “Grammar contracts as texts expand” --> fragments and repertoires
Revisioning Ellipsis Speakers add features as necessary rather than taking away from what would be required in written discourse (see also Wittgenstein, 1953; Rommetveit, 1974) Omission of auxiliaries is common (be, have, do) but not often from speaker’s or 1st person perspective Empty “its” and existential “there is” often dropped in spoken discourse Pronouns before modal verbs e.g., can happen, should be Overall, beginning bits are left out Grammatical description SHOULD represent spoken language use, should relate items and structures to interactional and situational functions
Importance of Measuring & Understanding Process Alfred Binet (1909) advocated process assessment, though he never designed an instrument to measure it. Buckingham (1921): accounting for learning processes is as important as accounting for products.
Challenges of Assessing  Process   Feasibility “the most direct procedure for determining an individual’s proficiency…would simply be to follow that individual surreptitiously over an extended period of time…It is clearly impossible, or at least highly impractical, to administer a ‘test’ of this type in the language learning situation” (Clark 1978, as quoted in Bachman, 1990). Scalability - the bane of “alternative” assessment
Depicting Process in SLA Accuracy of production of L2 forms and IL development suggests a curvilinear rather than a linear relationship (Norris & Ortega, 2003). Threshold and stage effects (Meisel, Clahsen & Pienemann, 1991). U-shaped behavior (Kellerman, 1985). Omega-shaped behavior - temporary increase in frequency followed by a normalization (Wolfe-Quintero, Inagaki, & Kim, 1998).
Using Corpus to Assess IL Development Addressing  feasibility  and  scalability Proliferation of technology-mediated language learning More powerful computers and more refined software. Automated speech recognition - “dirty” ASR
“Complementary” Assessment Use testing techniques (traditional or performance) in conjunction with corpus-based assessment to generate a more detailed and broad-based account of IL development.
ITAs’ success as instructors and future faculty depends on successful participation in written and spoken academic discourse, e.g., spoken genres (Academic Discourse Performance): small lecture presentation, large lecture presentation, discussion leading, lab section leading, seminar leading, advising, colloquia participation, interviewing, meeting participation, office hours conducting, service encounters, tutorial leading, socializing, conference presentation
The ITA “problem” Jan 2005: North Dakota proposed legislation: bill would have forced universities to reimburse class fees to students who complained about an instructor’s inability in English. If ten percent or more of the students had complained, the instructor would have been relieved from teaching pending further review. A watered-down version of the bill passed. High number of international graduate students in the U.S. -- 50% of US graduate students in engineering and sciences are international
Directive Language DL is language with directive illocutionary force (Searle, 1979) used functionally for making suggestions or giving advice In traditional frameworks, DL has primarily deontic qualities of obligative modality In textbooks, DL is taught as a series of modals & semi-modals (must, mustn’t, have to, should, ought to, need to, needn’t) In SYS-FUNC, DL would be considered part of the MODULATION system, a continuum between obligation (what I want you to do) and inclination (what you want to do)
Why Study Directive Language? DL is an important part of several academic discourse genres and professional competence Inappropriate or unintended use of DL may result in miscommunication or misunderstanding of speaker intention DL is highly interpersonal, involving speaker authority and power hierarchies
Research Contrastive genre-comparable spoken corpora ITAcorp (ITA language use): office hours role plays (CMC, presentation, post-evaluation)-- approx. 120,000 tokens MICASE (base-line ‘expert’ corpus): Advising and Office Hours sub-corpora--180,000 tokens MICASE data as model Analytical framework:  Corpus: usage-based, frequency & distribution Qualitative: (professional) discourse analysis,  SYS-FUNC & APPRAISAL
Preliminary Contrastive Analysis of wanna / want to
You [+ hedge] want to / wanna [+ hedge] (chart comparing MICASE and ITAcorp) MICASE shows 12x the hedged use of want to / wanna ITAcorp uses followed the pedagogical intervention on hedged wanna DL
Additional Preliminary Descriptive Findings In comparison to MICASE data, ITAs as represented in ITAcorp: Generally use very few hedges or intensifiers Generally underuse periphrastic forms Overuse obligative modals (must, should) and please + imperative Use ‘can’ for obligative ‘should’ Use only the basic conditional, underuse ‘you could’ and do not use ‘I would’ Navigate between ‘I’ and exclusive ‘we’ strategically, invoking departmental or professorial authority when the going gets tough
Next Steps Complementary ethnographic data (survey, interviews) for ITAcorp participants Use audio to produce narrow transcriptions of select data Focus on differences across modality (CMC vs. F2F) Focus on classroom presentation of a concept (contrasting with MICASE) Gather data from non-role play ITA professional activity (section leader, lecturer, office hours) Develop set of corpus-informed pedagogical interventions focusing on professional discourse competencies
What is Data-Driven Learning? Application of tools (concordancers) and techniques from corpus linguistics in the service of language learning. Inquiry-based pedagogy Learner as researcher "Research is too important to leave to researchers" (Johns, 1991, p. 2)
Paradigms of L2 Instruction Traditional approaches: Present -> Practice -> Produce Data-driven learning: Observe -> Hypothesize -> Experiment
Impact of Corpus Techniques on L2 Pedagogy Materials development How do native/expert speakers actually use the target language? What drives sequencing? Instructional activities Example - link Data-driven learning tools KWICionary - link
Research on Data-Driven Learning Vocabulary Acquisition: improved through the use of concordances (Stevens, 1991; Cobb, 1997) Writing Instruction: students can correct their own errors with concordances (Gaskell & Cobb, 2004; Ross & Payne, 2005)
Pedagogical Issues for DDL Learning a new way to learn language Relationship between proficiency level and data-driven learning approach Should frequency of use drive materials development?
Next Generation Corpus Tools Text files => relational databases Storing data as smallest atomic unit Associate extensive meta-data with each data entry Application-based => web-based Promote aggregation and sharing of data Location-independent collaborative research Integration with online learning environments Online Corpus Analytic Tool (OCAT)
OCAT Design Relational database backend Extensive meta-data can be assigned to each data entry. Multiple corpora can be linked and meta-data fields aligned to create meta-corpora. Dynamic sub-corpora Users can create corpora and upload data via a web interface. Location-independent collaborative research Concordance query, frequency lists, Mean Tokens per Learner Data visualization techniques
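To make the relational-backend idea concrete, here is a small sketch using SQLite from the Python standard library. The table names, column names, sample rows, and the definition of "Mean Tokens per Learner" (the average, over learners, of each learner's total token count) are all assumptions made for illustration; the actual OCAT schema is not described in these slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE corpus (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE entry (
    id INTEGER PRIMARY KEY,
    corpus_id INTEGER NOT NULL REFERENCES corpus(id),
    learner_id TEXT NOT NULL,   -- meta-data: individual
    task TEXT NOT NULL,         -- meta-data: activity type
    recorded_at TEXT NOT NULL,  -- meta-data: time
    text TEXT NOT NULL,
    n_tokens INTEGER NOT NULL
);
""")
conn.execute("INSERT INTO corpus (id, name) VALUES (1, 'demo')")

# Hypothetical rows, invented for illustration only.
rows = [
    (1, "L01", "chat", "2005-09-12", "das ist ja gut", 4),
    (1, "L01", "email", "2005-09-19", "komm doch mal vorbei", 4),
    (1, "L02", "chat", "2005-09-12", "wann kommst du", 3),
]
conn.executemany(
    "INSERT INTO entry (corpus_id, learner_id, task, recorded_at, text, n_tokens) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    rows,
)


def mean_tokens_per_learner(conn, task=None):
    """Average, over learners, of each learner's total token count.

    Passing a task restricts the query to a dynamic sub-corpus.
    """
    where, params = ("WHERE task = ?", (task,)) if task else ("", ())
    sql = (
        "SELECT AVG(total) FROM ("
        f"SELECT SUM(n_tokens) AS total FROM entry {where} GROUP BY learner_id)"
    )
    return conn.execute(sql, params).fetchone()[0]


overall = mean_tokens_per_learner(conn)
chat_only = mean_tokens_per_learner(conn, task="chat")
```

Because the meta-data live in columns rather than in file names, a sub-corpus is just a WHERE clause, which is the sense in which sub-corpora can be "dynamic".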
Assessing Language Development

Tracking Learning: Using Corpus Linguistics to Assess Language Development

  • 1.
    Tracking Learning: Using Corpus Linguistics to Assess Language Development James Lantolf Steve Thorne CALPER Center for Advanced Language Proficiency Education and Research The Pennsylvania State University
  • 2.
    Tracking Learning: Approachesto Assessment Traditional Classroom Assessment Achievement, Placement, Formative Standardized Tests AP, TOEFL, OPI, STAMP Alternative Assessment Portfolio & LinguaFolio Performance Assessment, Task-Based CALPER Assessment Dynamic Assessment Corpus-Informed Assessment
  • 3.
    Today’s Talk Whatis a corpus? Types of corpora Corpus-informed assessment Developmental learner corpora Contrastive learner corpus analysis against baseline Two examples of corpus-informed assessment Advanced ESL academic discourse competence German modal particles
  • 4.
    What is aCorpus? A corpus (plural corpora ) Large collection of texts Gathered according to specific criteria Stored in an electronic database with relevant meta-data associated with each text entry Student ID Time/date Activity type Corpora can be constructed from written language use (especially digital texts) or transcribed from spoken interaction
  • 5.
    Basic Tenets ofCorpus Analysis Data driven, highly empirical Objective approach A grammar of use based on attested utterance types A grammar of probability based on frequency and distribution Language use and structure: Collocational patterns Lexicon heart of systematicity in language, i.e., grammar Formulaic sequences comprise ~60% of language use (Wray, 2002; Schmitt & Carter, 2004)
  • 6.
    Corpora & LanguageAssessment For advanced proficiency -- develop and/or utilize genre, modality, and context-specific corpora Focus can be on grammatical, lexical, metaphoric, discourse, pragmatic features Typical problems and errors of usage can be found in learner data Teachers and learners themselves can observe and assess their own and one another’s performance Expert-speaker corpora can reveal what learners are not using/doing, as well as how appropriately, successfully, and differentially they are using the target language
  • 7.
    Comparing Assessment ApproachesElicited performance indicative of competence “ Authenticity” and / or ecological validity of test instrument Sampling issues Reliability Critical question: Is the elicited performance representative of the individual’s state of language development? Naturally-occurring language performance indicates competence Volume of language learners produce across tasks/genres and time Sampling issues become irrelevant Reconceptualize reliability Critical question: Have enough data been collected to conclude that an individual’s performance is representative of her state of language development? Testing Corpus-based
  • 8.
    ITA Project Describing,assessing, and developing academic discourse with international teaching assistants Steve Thorne Jonathan Reinhardt Paula Golombek
  • 9.
    ITAcorp Project ITAshighly competent researchers Expand repertoire of options for performing often complex social roles (instructor, adjudicator, tutor, advisor, fellow student, mediator) Assessment --> Contrastive corpus analysis of ITACorp with baseline corpus -- MICASE Grammar as choice as it relates to meaning and social actions Formulaic sequences, small words, modulation Corpus-informed pedagogical intervention to prepare students to participate successfully in spoken and written genres of academic discourse
  • 10.
    Methodology Contrastive corpusanalysis of MICASE and ITAcorp --> what are the differences in language use between expert/native and ITA/advanced ESL speakers? Identified directive and obligative constructions Quantified usage of directive language in both corpora The case of wanna / want to
  • 11.
    Corpus Assessment: Time1 The case of “you want to” | “you could …” Please + [imperative]
  • 12.
    Corpus Assessment: Time2 Post intervention usage of “you want to” 10 instances of usage across 25 advanced ESL students Concordance lines of proceduralized usage in context
  • 13.
    Corpus Assessment Corpus-informedAssessment and Materials Development: German Modal Particles Nina Vyatkina
  • 14.
    Teaching the MPs:Challenges Modal Particles: ja, doch, denn, mal Rampant polysemy in MPs Strongly context-bound meaning Absence of a direct counterpart in English (translated by tag questions, intonation, omitted) Absence of an informal “particle-friendly climate” in traditional language classrooms Overly formal treatment in textbooks Sentence-based rather than utterance-based [interactive]
  • 15.
    Participants 7 Americanstudents and 16 German students discussing intercultural topics in German and in English using email and chat during 8 semester weeks (Fall 2005)
  • 16.
    German modal particles:indeclinable “smallwords” typical of conversations ‘ The German listener expects a particle. If it is absent, the sentence acquires a specific stylistic value: without a particle it sounds choppy, harsh, unfriendly, its utterance is apodictic, abrupt, blatantly noncommittal.’ (Weydt, 1969) German Modal Particles
  • 17.
    Pedagogical intervention Classroom intra cultural sessions: explicit instruction based on the data produced by the participants in Internet-mediated inter cultural sessions: practice in language use in CMC with native speakers (Belz, 2006) Data-driven instruction CMC practice QUAN & QUAL analysis CMC practice Data-driven instruction QUAN & QUAL analysis
  • 18.
    Relative frequency: modality/intervention effect * Statistically significant difference in mean relative frequencies (no. MPs/1000 German words), p<.05
  • 19.
    MP Dispersion inthe corpus Learners: ja denn doch mal NSs: ja denn doch mal
  • 20.
    MP use byNSs and learners (absolute numbers) 80 89 NSs 1 22 Post-Int. W4 6 65 Total Post-Interv. 3 27 Interv. W3 0 6 Interv. W2 2 7 Interv. W1 0 3 Pre-Interv. (4 weeks) Learners: Inaccurate use Learners: Accurate use Stages
  • 21.
    Awareness-raising exercise 1Questions adopted in part from Möllering and Nunan (1995) In this excerpt from your chat with your German partner, what lexical category (part of speech) do the words ja, mal, aber belong to? Can you list other words belonging to this category? What functions do these words have in the examples from your partners’ writing? Which of these words have you ever used (in this course or earlier) in the same functions? Soren: Wann kommst Du mal nach Deutschland? Jeremy: Hoffentlich komme ich der Fruhling 2007. Soren: Oh das dauert aber noch Soren: Das ist ja noch über ein Jahr Soren: Naja, vielleicht schaffst Du es ja dann mal bei mir vorbei zu kommen.
  • 22.
    Awareness-raising exercise 2In the given excerpt from your chat with your German partner, separate modal particles (MP) from their homonyms (H). Try to paraphrase the meaning of each word marked in bold. How many MPs and their homonyms did you use? And your partner? Chip : Was hast du ueber FKK geschrieben? [...] Simone : Zu welcher Frage meinst Du denn (___) ? Chip : Umm... ich muss die Frage finden Simone : Es gab ja (___) mehrere Fragen zum Thema FKK […] Simone : Die Serie läuft doch (___) aber noch in den USA? [...] Simone : dann mal (___) bis zur nächsten e-mail! Chip : Ja (___) ! Bis zum naechsten Mal (___) ! :-) Simone : Du kannst mir ja (___) mal (___) schreiben, was Du außerhalb der Uni noch so machst
  • 23.
    Awareness-raising exercise 3Questions adopted in part from Möllering (2004) Consider the following concordance lines with the modal particle MAL extracted by means of WordSmith Tools® from your partners’ writing and answer the questions: Underline all the finite verbs in the clauses containing MAL. Do you see any patterns? In these examples, does the content refer to the past, present, or future time? What is the lexical category (part of speech) of the word expressing the subject in each line? What sentence types do the examples contain – declaratives, exclamatives, commands, questions?
  • 24.
    Corpus-informed Assessment: Conclusions,Questions, & Resources Representativeness and ecological validity? Assemble corpus data to adequately and significantly represent production Use benchmark corpora for assessing learner language successes and problems Developmental corpus assessment of individuals and class-cohorts CALPER materials: Corpus tutorial -- see calper.la.psu.edu INVESTIGATING REAL LANGUAGE -- June 25-27, 2007 DYNAMIC ASSESSMENT workshop June 25-27, 2007 CALPER Corpus Tool available Summer, 2007
  • 25.
    Thanks -- pleasevisit our website for more information on CALPER materials, events, and services: http://calper.la.psu.edu
  • 26.
  • 27.
    Challenges to CorpusApproaches One data source among many: ethnographic details, visual field, introspection, clinical and experimental elicitation Descriptive not explanatory Focus on externalized language use / performance – psycholinguistics and language processing inferred Corpora are “real” (representation of actual use), but are they “authentic” (meaningful and applicable to learners, e.g., Widdowson, 2002) Only as good as its representativeness Harkening back to contrastive error analysis? No, contrastive analysis of actual use that does not need to include incapacity evaluations of learners
  • 28.
    Types of Corpora& Analyses Synchronic Diachronic Developmental Learner Corpora (Myles, Payne, Belz, Thorne et. al.) Frequencies, ratios, ??? Learner Corpora (Granger) Contrastive IL Analyses Frequencies, ratios Genre/Register/Variation (Biber, Swales, Sinclair) Factor & cluster analyses Youman’s Vocabulary Management Profiling Mutual information Historical Corpora (Davies) Frequencies, ratios Descriptive Benchmark (BNC, ANC) Frequencies, ratios
  • 29.
    Corpus Design andConstruction Aggregative Genre, register Meta-data: Situational context Activity Level of proficiency Synchronic Learner Corpora (Granger) Contrastive IL Analyses Frequencies, ratios Genre/Register/Variation (Biber, Swales, Sinclair) Factor & cluster analyses Youman’s Vocabulary Management Profiling Mutual information Descriptive Benchmark (BNC, ANC) Frequencies, ratios
  • 30.
    Corpus Design andConstruction Role of meta-data: Individual Task Time Corpus construction as a form of experimental research Diachronic Developmental Learner Corpora (Myles, Payne, Belz, Thorne et. al.) Frequencies, ratios, ??? Historical Corpora (Davies) Frequencies, ratios
  • 31.
    Corpus Annotation Frequencyand location of tags Laughter for hyperbole Language use as social action Part-of-speech Lemmatization Syntactic tagging Error tagging Semantic tagging
  • 32.
    Corpus Informing LanguageTheory Not only what is possible (e.g., nativist and UG approaches), but what is likely or frequent in usage Illustrates the limits of introspection about language (enormous differences between intuition and actual use) Language structure, i.e., formulaic sequences comprise ~60% of language use (Wray, 2002; Schmitt & Carter, 2004) Emergent grammar (Hopper, 2002; Bybee, 2001) Grammar a consequence, not a precondition -- epiphenomenal Grammar = observable repetition in discourse Grammar contingent upon lexical environment “ Grammar contracts as texts expands” --> fragments and repertoires
  • 33.
    Revisioning Ellipsis Speakersadd features as necessary rather than as taking away from what would be required in written discourse (see also Wittgenstein, 1953; Rommetveit, 1974) Omission of auxiliaries is common (be, have, do) but not often from speaker’s or 1 st person perspective Empty “its” and existential “there is” often dropped in spoken discourse Pronouns before modal verbs e.g., can happen, should be Overall, beginning bits are left out Grammatical description SHOULD represent spoken language use, should relate items and structures to interactional and situational functions
  • 34.
    Importance of Measuring& Understanding Process Alfred Binet (1909) advocated process assessment , though never designed an instrument to measure it. Buckingham (1921) accounting for learning processes as important as products .
  • 35.
    Challenges of Assessing Process Feasibility “the most direct procedure for determining an individual’s proficiency…would simply be to follow that individual surreptitiously over an extended period of time…It is clearly impossible, or at least highly impractical, to administer a ‘test’ of this type in the language learning situation” (Clark 1978, as quoted in Bachman, 1990). Scalability - the bane of “alternative” assessment
  • 36.
    Depicting Process in SLA Accuracy of production of L2 forms and IL development suggests a curvilinear rather than a linear relationship (Norris & Ortega, 2003). Threshold and stage effects (Meisel, Clahsen & Pienemann, 1991). U-shaped behavior (Kellerman, 1985) Omega-shaped behavior - temporary increase in frequency followed by a normalization (Wolfe-Quintero, Inagaki,& Kim, 1998).
  • 37.
    Using Corpus toAssess IL Development Addressing feasibility and scalability Proliferation of technology-mediated language learning More powerful computers and more refined software. Automated speech recognition - “dirty” ASR
  • 38.
    “Complementary” Assessment Usetesting techniques (traditional or performance) in conjunction with corpus-based assessment to generate a more detailed and broad-based account of IL development.
  • 39.
    An ITA’s successas instructors and future faculty depends on successful participation in written and spoken academic discourse e.g. spoken genres: Academic Discourse Performance small lecture presentation large lecture presentation discussion leading lab section leading seminar leading advising colloquia participation interviewing meeting participation office hours conducting service encounters tutorial leading socializing conference presentation
  • 40.
    The ITA “problem” Jan 2005: North Dakota proposed legislation: the bill would have forced universities to reimburse class fees to students who complained about an instructor’s inability in English. If ten percent or more of the students had complained, the instructor would have been relieved from teaching pending further review. A watered-down version of the bill passed. High number of international graduate students in the U.S. -- 50% of US graduate students in engineering and sciences are international
  • 41.
    Directive Language DL is language with directive illocutionary force (Searle, 1979) used functionally for making suggestions or giving advice In traditional frameworks, DL has primarily deontic qualities of obligative modality In textbooks, DL is taught as a series of modals & semi-modals (must, mustn’t, have to, should, ought to, need to, needn’t) In SYS-FUNC, DL would be considered part of the MODULATION system, a continuum between obligation (what I want you to do) and inclination (what you want to do)
  • 42.
    Why Study Directive Language? DL is an important part of several academic discourse genres and professional competence Inappropriate or unintended use of DL may result in miscommunication or misunderstanding of speaker intention DL is highly interpersonal, involving speaker authority and power hierarchies
  • 43.
    Research Contrastive genre-comparable spoken corpora ITAcorp (ITA language use): office hours role plays (CMC, presentation, post-evaluation) -- approx. 120,000 tokens MICASE (baseline ‘expert’ corpus): Advising and Office Hours sub-corpora -- 180,000 tokens MICASE data as model Analytical framework: Corpus: usage-based, frequency & distribution Qualitative: (professional) discourse analysis, SYS-FUNC & APPRAISAL
  • 45.
    You [+ hedge] want to / wanna [+ hedge] MICASE ITAcorp MICASE shows 12x the hedged use of want to / wanna ITAcorp uses followed pedagogical intervention on hedged wanna DL
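Because the two corpora differ in size (approx. 120,000 tokens for ITAcorp and 180,000 for the MICASE sub-corpora), a comparison such as "12x the hedged use" is usually made on normalized rather than raw counts. The following is a minimal sketch of that arithmetic, not the procedure used in the study; the function names and the per-10,000-token basis are assumptions made for illustration.

```python
def per_10k(hits: int, corpus_tokens: int) -> float:
    """Normalized frequency: occurrences per 10,000 tokens."""
    if corpus_tokens <= 0:
        raise ValueError("corpus_tokens must be positive")
    return hits * 10_000 / corpus_tokens


def frequency_ratio(hits_a: int, tokens_a: int, hits_b: int, tokens_b: int) -> float:
    """How many times more frequent a feature is in corpus A than in corpus B,
    after normalizing both counts for corpus size."""
    rate_b = per_10k(hits_b, tokens_b)
    if rate_b == 0:
        raise ValueError("feature does not occur in corpus B; ratio is undefined")
    return per_10k(hits_a, tokens_a) / rate_b


if __name__ == "__main__":
    # Hypothetical hit counts, NOT figures from MICASE or ITAcorp.
    expert_hits, expert_tokens = 30, 180_000
    learner_hits, learner_tokens = 5, 120_000
    print(frequency_ratio(expert_hits, expert_tokens, learner_hits, learner_tokens))
```

Note that the ratio is undefined when the feature never occurs in the learner corpus, which is why the sketch raises an error instead of returning a number.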
  • 46.
    Additional Preliminary Descriptive Findings In comparison to MICASE data, ITAs as represented in ITAcorp: Generally use very few hedges or intensifiers Generally underuse periphrastic forms Overuse obligative modals (must, should) and please + imperative Use ‘can’ for obligative ‘should’ Use only the basic conditional, underuse ‘you could’, and make no use of ‘I would’ Navigate between ‘I’ and exclusive ‘we’ strategically, invoking departmental or professorial authority when the going gets tough
  • 47.
    Next Steps Complementary ethnographic data (survey, interviews) for ITAcorp participants Use audio to produce narrow transcriptions of select data Focus on differences across modality (CMC vs. F2F) Focus on classroom presentation of a concept (contrasting with MICASE) Gather data from non-role play ITA professional activity (section leader, lecturer, office hours) Develop set of corpus-informed pedagogical interventions focusing on professional discourse competencies
  • 48.
    What is Data-Driven Learning? Application of tools (concordancers) and techniques from corpus linguistics in the service of language learning. Inquiry-based pedagogy Learner as researcher "Research is too important to leave to researchers" (Johns, 1991, p. 2)
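A concordancer lists every occurrence of a search term together with its surrounding context (keyword in context, or KWIC), so that learners can inspect patterns of use for themselves. The sketch below shows the basic idea only; the function name `kwic`, the character-based context window, and the sample sentence are assumptions for illustration and do not describe any particular tool.

```python
import re


def kwic(text: str, keyword: str, width: int = 30) -> list[str]:
    """Return one keyword-in-context line per whole-word match of `keyword`.

    `width` is the number of characters of context shown on each side.
    Matching is case-insensitive.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so that the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right:<{width}}")
    return lines


if __name__ == "__main__":
    # Invented sample text, not corpus data.
    sample = "You might want to check the syllabus. You want to come early."
    for line in kwic(sample, "want to", width=12):
        print(line)
```

Aligning the keyword in a fixed column is the design choice that makes collocational patterns to the left and right of the term easy to scan.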
  • 49.
    Paradigms of L2 Instruction Traditional approaches: Present -> Practice -> Produce Data-driven learning: Observe -> Hypothesize -> Experiment
  • 50.
    Impact of Corpus Techniques on L2 Pedagogy Materials development How do native/expert speakers actually use the target language? What drives sequencing? Instructional activities Example - link Data-driven learning tools KWICionary - link
  • 51.
    Research on Data-Driven Learning Vocabulary Acquisition: improved through the use of concordances (Steven, 1991; Cobb, 1997) Writing Instruction: students can correct their own errors with concordances (Gaskell & Cobb, 2004; Ross & Payne, 2005)
  • 52.
    Pedagogical Issues for DDL Learning a new way to learn language Relationship between proficiency level and data-driven learning approach Should frequency of use drive materials development?
  • 53.
    Next Generation Corpus Tools Text files => relational databases Storing data as smallest atomic unit Associate extensive meta-data with each data entry Application-based => web-based Promote aggregation and sharing of data Location-independent collaborative research Integration with online learning environments Online Corpus Analytic Tool (OCAT)
  • 54.
    OCAT Design Relational database backend Extensive meta-data can be assigned to each data entry. Multiple corpora can be linked and meta-data fields aligned to create meta-corpora. Dynamic sub-corpora Users can create corpora and upload data via a web interface. Location-independent collaborative research Concordance query, frequency lists, Mean Tokens per Learner Data visualization techniques
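To make the relational design concrete, here is a minimal sketch of storing each token as its own row, attaching meta-data to each data entry, and computing a Mean Tokens per Learner figure over a dynamically selected sub-corpus. It uses Python's built-in sqlite3 module; the table and column names are hypothetical and do not describe OCAT's actual schema.

```python
import sqlite3


def create_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE entry (            -- one row per text/data entry
            id            INTEGER PRIMARY KEY,
            learner_id    TEXT NOT NULL,
            activity_type TEXT NOT NULL,
            recorded_at   TEXT NOT NULL
        );
        CREATE TABLE token (            -- one row per token (smallest unit)
            entry_id INTEGER NOT NULL REFERENCES entry(id),
            position INTEGER NOT NULL,
            form     TEXT NOT NULL
        );
    """)
    return conn


def add_entry(conn, learner_id, activity_type, recorded_at, text):
    """Store an entry's meta-data and its whitespace-separated tokens."""
    cur = conn.execute(
        "INSERT INTO entry (learner_id, activity_type, recorded_at) VALUES (?, ?, ?)",
        (learner_id, activity_type, recorded_at),
    )
    entry_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO token (entry_id, position, form) VALUES (?, ?, ?)",
        [(entry_id, i, w.lower()) for i, w in enumerate(text.split())],
    )
    return entry_id


def mean_tokens_per_learner(conn, activity_type=None):
    """Average token count per learner, optionally restricted to a sub-corpus
    selected by activity type. Returns None if no tokens match."""
    where, params = ("", ()) if activity_type is None else (
        "WHERE e.activity_type = ?", (activity_type,))
    sql = f"""
        SELECT AVG(n) FROM (
            SELECT COUNT(*) AS n
            FROM token t JOIN entry e ON e.id = t.entry_id
            {where}
            GROUP BY e.learner_id
        )
    """
    return conn.execute(sql, params).fetchone()[0]
```

A sub-corpus here is just a query filter on the meta-data, so no data needs to be copied or re-uploaded to define one.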

Editor's Notes

  • #5 steve
  • #6 steve
  • #7 steve
  • #28 steve
  • #37 All of these accounts of IL developmental trajectories reflect the underlying process of restructuring and provide a challenge for current corpus analytic techniques.
  • #38 Yet today so much of our learning is technology-mediated (email, chat, forum discussions, essays, voice-message boards, etc.) that collecting much of the language students generate in instructional settings is now feasible.
  • #39 To use only a corpus for assessing IL development, all naturally occurring language use would have to be captured and added to the corpus. Even in the most Orwellian of worlds, this is not a possibility. In other words, the weakness of CBA lies in the inability to collect an exhaustive sample of a learner’s language. Therefore, it would be unwise to declare a learner deficient in some aspect of the language just because the structure/function/etc. was not present in the corpus. With “testing” a very different limitation is encountered: is the language elicited truly representative of the learner’s ability? Therefore, the two approaches to assessment are highly complementary. If