Language corpora for grammar and
vocabulary instruction.

Jon Smart
Department of Linguistics
University of Pittsburgh
Language corpora
¤ Originally, corpora were paper-based and were
searched by hand.
¤ Modern corpora are computer databases that can
be searched or analyzed using software programs.
¤ Some corpora are available as online searchable
databases.
¤ Many modern grammar references and some ESL
textbooks are based on or informed by corpora.
Corpora in language learning
Corpora can offer language learners the following benefits:
¤  More accurate descriptions of the language than
teachers’ or textbook writers’ intuitions (Biber & Reppen,
2002; Conrad, 1999).
¤  Exposure to contextualized, meaningful language and
“real” language data (Reppen, 2010).
¤  Language input for specific registers (Conrad, 2000;
Flowerdew, 2009, 2012).
¤  A reference tool for autonomous learning (Johns, 1991).
Application to language teaching
1.  Corpus research informs syllabus design (e.g.,
word frequency, register information for ESP).
2.  Textbooks and materials are informed by language
corpora.
3.  Learners use corpora or corpus data directly as
learning tools.
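Point 1 above mentions word frequency. As a rough illustration of how a frequency list is derived from a corpus, here is a minimal Python sketch. It assumes the corpus is already loaded as a list of plain-text strings and uses a deliberately simple tokenizer (lowercased runs of letters and apostrophes). The function name frequency_list is hypothetical, and published frequency lists are normally built from far larger corpora, often with lemmatization.

```python
import re
from collections import Counter


def frequency_list(texts, top_n=10):
    """Return the top_n most frequent word forms across a list of texts."""
    counts = Counter()
    for text in texts:
        # Simple tokenizer (an assumption): lowercase, keep runs of letters and apostrophes
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts.most_common(top_n)


if __name__ == "__main__":
    texts = [
        "The number of students has increased.",
        "A number of students have asked about the course.",
    ]
    for word, freq in frequency_list(texts, top_n=5):
        print(f"{word}\t{freq}")
```

Counting word forms rather than lemmas keeps the sketch short; "student" and "students" are counted separately here.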
Textbooks/materials
¤ Grammar & Beyond, Reppen et al., 2012
¤ Real Grammar, Conrad & Biber, 2009
¤ Touchstone Series, McCarthy, McCarten, &
Sandiford, 2004/2006
¤ Natural Grammar, Thornbury, 2004 [example]
Corpora as learning tools
¤  Data-driven learning (DDL) was pioneered by Johns (1991),
who had relatively advanced language learners search
corpora he built on microcomputers.
¤  A learner-focused approach in which learners identify
patterns and trends in the language.
¤  Can be done either by having learners conduct their
own searches of corpora using a search tool (e.g., a
concordancer) OR by pre-selecting materials from
corpora to help learners discover language patterns.
Data-driven learning
¤  Has been used in several areas of language teaching:
vocabulary, grammar, discourse, writing, translation
studies, etc.
¤  Research exists on both hands-on and paper-based DDL,
but the empirical evidence is not entirely clear (see Boulton,
2010, 2012, for summaries of empirical research).
¤  Teachers can use existing corpora or have students
create their own (valuable in ESP contexts).
Hands-on DDL
Two strategies for doing hands-on concordancing
with your students:
1.  Build your own corpus of documents relevant
to your course(s).
2.  Use an already existing, web-based corpus or
corpora.
Build your own corpus
What you need to get started:
¤ A collection of language texts, saved as individual
plain-text (.txt) files or in a similar format.
¤ Concordancer (a search interface for your
texts)
¤ AntConc (free)
¤ WordSmith
¤ MonoConc
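To show what a concordancer actually does with such a collection, here is a minimal key-word-in-context (KWIC) sketch in Python. It is not AntConc, WordSmith or MonoConc and does not reproduce their features; it only assumes a folder of UTF-8 .txt files, as described above. The function names load_corpus and kwic and the folder name my_corpus are hypothetical.

```python
import re
from pathlib import Path


def load_corpus(folder):
    """Read every .txt file in `folder` into a {filename: text} dict (assumes UTF-8)."""
    return {p.name: p.read_text(encoding="utf-8") for p in sorted(Path(folder).glob("*.txt"))}


def kwic(corpus, node, width=30):
    """Return (filename, left context, node, right context) for each whole-word hit."""
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    lines = []
    for name, text in corpus.items():
        flat = " ".join(text.split())  # collapse line breaks and repeated spaces
        for m in pattern.finditer(flat):
            left = flat[max(0, m.start() - width):m.start()]
            right = flat[m.end():m.end() + width]
            lines.append((name, left, m.group(0), right))
    return lines


if __name__ == "__main__":
    # In practice: corpus = load_corpus("my_corpus")  (hypothetical folder name)
    corpus = {"demo.txt": "A number of studies suggest that the number of\nlearners is rising."}
    for name, left, node, right in kwic(corpus, "number", width=25):
        # Right-align the left context so the search word lines up in one column
        print(f"{name:<10} {left:>25} [{node}] {right}")
```

Aligning the search word in a central column is the core of the KWIC display that learners see in concordancers; sorting the lines by the right or left context is the usual next step.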
Using an existing corpus
¤  Several important, powerful corpora are available online (all
free, but some require registration). Here are a few:
¤  Lextutor (Lextutor.ca): very valuable for vocabulary study.
¤  Michigan Corpus of Academic Spoken English (MICASE)
¤  Michigan Corpus of Upper-Level Student Papers
(MICUSP)
¤  Sacodeyl: an EU-sponsored project covering a variety
of languages.
¤  A collection of corpora at Brigham Young University
(corpus.byu.edu).
BYU Corpus interface
Paper-based DDL
¤  Using a concordancer or web interface can be daunting
for students.
¤  Teachers can use corpora to build teaching activities
that use authentic language materials but are not
overwhelming for their students. The focus stays on
language, not on learning to search corpora.
Steps in a DDL activity
1.  Illustration: looking at data.
2.  Interaction: discussion and sharing observations
and opinions.
3.  Intervention: optional step to provide learners with
hints or clearer guides for induction.
4.  Induction: making one’s own rule for a particular
feature.
Flowerdew, 2009
Let’s look at some examples.
¤  Vocabulary (word meaning, collocations,
differences in meaning, parts of speech)
¤  Grammar (valuable for looking at differences between
forms)
¤  Corpora are especially useful for teaching lexicogrammatical information.
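Collocation, listed above, is one of the patterns concordancers help learners see. A minimal Python sketch of the idea counts which words occur within a few tokens of a search word. The function name collocates and the default span of two words are assumptions for illustration. This raw count ignores sentence boundaries and favors frequent function words, which is why dedicated tools typically offer association measures such as mutual information instead.

```python
import re
from collections import Counter


def collocates(text, node, span=2):
    """Count the words that appear within `span` tokens on either side of `node`."""
    tokens = re.findall(r"[a-z']+", text.lower())  # simple tokenizer (an assumption)
    node = node.lower()
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            counts.update(left + right)
    return counts


if __name__ == "__main__":
    text = "We had to make a decision. She made a decision quickly and made it final."
    for word, freq in collocates(text, "decision").most_common(3):
        print(word, freq)
```

Even on two sentences, "make" and "made" surface next to "decision", which is the kind of pattern a vocabulary activity would ask learners to notice.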
CAN (modal): How does it function?
¤  Students examine a corpus to find examples of different
functions of the modal CAN.
¤  CAN has four functions in English:
¤  Ability
¤  Possibility
¤  Permission
¤  Request
¤  Find examples of each function.
Examples of CAN
Expanded context
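For a paper-based version of the CAN activity, a teacher might pull a manageable, random set of corpus sentences containing "can" for students to sort by function. A minimal Python sketch follows, assuming the source text is a plain string. The function name sample_sentences and the naive sentence splitter (which will mishandle abbreviations such as "e.g.") are assumptions. The search cannot tell modal "can" from the noun "can", and it also matches "can't", so the teacher should screen the output before printing it.

```python
import random
import re


def sample_sentences(text, target, k=5, seed=None):
    """Return up to k randomly chosen sentences that contain `target` as a whole word."""
    # Naive splitter: break after ., ! or ? when followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pattern = re.compile(r"\b" + re.escape(target) + r"\b", re.IGNORECASE)
    hits = [s for s in sentences if pattern.search(s)]
    rng = random.Random(seed)  # a fixed seed makes the handout reproducible
    return rng.sample(hits, min(k, len(hits)))


if __name__ == "__main__":
    text = ("I can swim. Can I leave now? It rains a lot here. "
            "Prices can vary. You can use my pen if you like.")
    for n, sentence in enumerate(sample_sentences(text, "can", k=3, seed=42), start=1):
        print(f"{n}. {sentence}   Function: __________")
```

Sampling keeps the handout short, which matches the paper-based DDL goal of authentic data that does not overwhelm students.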
A number of vs. The number of
¤  With respect to these approaches, __________ number of
countries in the world have revised their education
systems in general.
¤  And after decades of progress against malnutrition,
__________ number of its victims is rising.
¤  In Brazil, for instance, in 1991, __________ number of
people 60 years old or older was of 10.7 million.
¤  __________ number of lifecycle assessment software
packages have been released in recent years to help
designers.
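Gap-fill items like the four sentences above can be produced from corpus hits semi-automatically. The Python sketch below blanks out the article in front of "number of" and keeps an answer key for the teacher. The function name make_cloze, the regular expression, and the blank string are assumptions; the teacher still decides which sentences are clear enough to use.

```python
import re

# "a" or "the" as a whole word, followed by "number of" (case-insensitive)
TARGET = re.compile(r"\b(a|the)\s+number\s+of\b", re.IGNORECASE)


def make_cloze(sentences, blank="__________"):
    """Blank out the article before 'number of'; return (gapped items, answer key)."""
    items, answers = [], []
    for sentence in sentences:
        m = TARGET.search(sentence)
        if m is None:
            continue  # skip sentences that do not contain the target pattern
        answers.append(m.group(1).lower())
        items.append(sentence[:m.start(1)] + blank + sentence[m.end(1):])
    return items, answers


if __name__ == "__main__":
    hits = [
        "And after decades of progress against malnutrition, the number of its victims is rising.",
        "A number of lifecycle assessment software packages have been released in recent years.",
    ]
    items, key = make_cloze(hits)
    for n, item in enumerate(items, start=1):
        print(f"{n}. {item}")
    print("Key:", ", ".join(key))
```

Keeping the key separate from the printed items lets students induce the rule from the sentences before checking their answers.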
Audience Participation
Oh No!
Avaunt: What does it mean?
¤  “Avaunt and quit my sight, thy face is dirty, and thy
hands unwashed; Avaunt, avaunt! I say!”
¤  “Avaunt, ye malicious enchanters, avaunt, ye wizard
tribe! For I am Don Quixote de la Mancha, against whom
your wicked arts avail not.”
¤  Now I comprehend the meaning of thy damned visit;
Avaunt, thou hag of darkness, and quit my
¤  He called me villain! bade me avaunt!
Gramercy: What does it mean?
¤  “Ha! Salisbury, good knight, and true,” returned the
Frenchman. "I knew not thou wert here. Gramercy for thy
caution, else had it fared with me right hardly.”
¤  “Thou shalt have a royal guard of salvages to escort thee
whither thou wilt go." "Gramercy for thy courtesy good
my Valiant," replied Winslow in the same tone.
¤  “My good lord,” cried Tristram, “gramercy of your
goodness which ye showed me in your marches, and of
your nobleness in calling me unto your aid, for it is great
honour to me that ye ask this, and I will do all for you to
the utmost of my strength.”
Considerations for using corpora
¤  What “meta-language” do students need to use corpora
or complete inductive activities?
¤  How much do students need to understand about the
online tools?
¤  At what proficiency level should students start working
with corpora?
¤  Which vocabulary and grammar points are best taught
deductively, and which inductively?
¤  Research on DDL outcomes is limited.
Thank you!
¤  Feel free to contact me at jsmart@pitt.edu
