by Alannah Fitzgerald, Open Education Practitioner and Researcher for Open Resources and Open Practices in English Language Teaching at Concordia University / The Open Educational Resources Research Hub on Nov 09, 2011
Writing with Open Tools (Part One) 09/11/2011 http://www.flickr.com/photos/mikekline/265954619/ Alannah Fitzgerald
Overview (part one)
• Introducing Corpus Linguistics
• Lexical knowledge: collocations, derivatives, register
• The Flexible Language Acquisition Project (FLAX)
• The British National Corpus (BNC)
• The Lextutor
• The Academic Wordlist (AWL)
• EAP practice resources
Intro to corpus linguistics
Let's start with three questions about English:
1. What is the meaning of goalless?
2. How is the word shall used in present-day British English? Think of some examples.
3. Which is more commonly expressed in everyday English?
   a. “I was a little disappointed…”
   b. “I was very disappointed…”
Adapted from Hoffmann et al., 2008
British National Corpus http://www.natcorp.ox.ac.uk/
Focus on representation
The British National Corpus (BNC)
• 100 million-word static corpus, 1978-1992
• Spoken (10%); Written (90%)
• Domain representation
Focus on automation
The Flexible Language Acquisition Project (FLAX)
• Web n-gram corpora generated and supplied by the 2006 Google web dump
• 500,000 words and 380 million five-grams
• GALL - Google Assisted Language Learning (Chinnery, 2008; Shei, 2008)
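To make the idea of a five-gram concrete, here is a minimal sketch of how runs of five consecutive words could be extracted and counted from raw text. The `tokenize` and `ngrams` helpers and the sample sentences are hypothetical illustrations; they are not how FLAX or the Google web data were actually processed.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep only runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def ngrams(tokens, n=5):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical sample text, not taken from FLAX or the Google web dump.
sample = ("I was a little disappointed with the result. "
          "I was a little disappointed, to be honest.")

tokens = tokenize(sample)

# Note: this simple version ignores sentence boundaries, so some
# five-grams span the end of one sentence and the start of the next.
five_grams = ngrams(tokens, n=5)
counts = Counter(five_grams)

# Show the most frequent five-gram in the sample.
for gram, freq in counts.most_common(1):
    print(" ".join(gram), freq)
```

At web scale the same counting step is what produces a table of five-grams with frequencies, which a collocation search can then query.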
'Goalless' keyword search in FLAX http://flax2.nzdl.org/greenstone3/flax?
Distribution of shall I/we in the spoken component of the BNC
Distribution of I/we shall in the spoken component of the BNC
FLAX - Samples retrieved for “I was a little disappointed”
BNC - Samples retrieved for “I was a little disappointed”
BNC - Samples retrieved for “I was very disappointed”
FLAX Web Collocations Collection Search (http://flax2.nzdl.org/greenstone3/flax?a=p&sa=home&module=)
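The comparison behind these searches amounts to counting how often each phrase occurs in a corpus. The sketch below shows that counting step on a tiny made-up text; the `phrase_count` helper and the `corpus` string are hypothetical, and the resulting numbers say nothing about real usage in the BNC or FLAX.

```python
import re

def tokenize(text):
    """Lowercase the text and keep only runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def phrase_count(tokens, phrase):
    """Count how many times the phrase occurs as consecutive tokens."""
    target = tokenize(phrase)
    n = len(target)
    return sum(1 for i in range(len(tokens) - n + 1)
               if tokens[i:i + n] == target)

# Hypothetical mini-corpus for illustration only.
corpus = ("I was a little disappointed with the film. "
          "She said she was very disappointed. "
          "I was very disappointed by the news. "
          "To be honest, I was a little disappointed.")

tokens = tokenize(corpus)
little = phrase_count(tokens, "I was a little disappointed")
very = phrase_count(tokens, "I was very disappointed")

print("a little:", little, "very:", very)
```

The same helper could be pointed at “shall I” versus “I shall”, though a real study would also normalise the counts by corpus size before comparing sources.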
FLAX vs BNC?
• Limitations with representativeness
• Identifying register on the Web is difficult
• Successful corpora are based on domains, genres, collections of document types
• The web is a “dirty corpus” (Kilgarriff & Grefenstette, 2003, p. 342)
• FLAX cleaned by 30% using BNC wordlist
• Linked externally to BNC, Yahoo
• Complementary sources, both with limitations
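One way to picture cleaning web data against a wordlist is to discard any n-gram containing a token that the reference list does not recognise. The slides do not describe the actual FLAX procedure, so the sketch below is an assumption: `clean_ngrams`, the tiny `wordlist`, and the sample n-grams are all hypothetical.

```python
def clean_ngrams(ngrams, wordlist):
    """Keep n-grams whose tokens all appear in the wordlist.

    Returns the kept n-grams and the share of n-grams removed.
    """
    kept = [gram for gram in ngrams if all(tok in wordlist for tok in gram)]
    removed_share = 1 - len(kept) / len(ngrams) if ngrams else 0.0
    return kept, removed_share

# Hypothetical stand-in for a reference wordlist such as one drawn from the BNC.
wordlist = {"i", "was", "a", "little", "very", "disappointed"}

# Hypothetical web n-grams, including the kind of noise found online.
web_ngrams = [
    ("i", "was", "very", "disappointed"),
    ("i", "was", "a", "little"),
    ("i", "wuz", "very", "disappointed"),
    ("click", "here", "2", "win"),
]

kept, removed_share = clean_ngrams(web_ngrams, wordlist)
print(len(kept), "kept;", f"{removed_share:.0%} removed")
```

A filter like this trades coverage for cleanliness: it removes misspellings and spam, but it also drops legitimate words that postdate or fall outside the reference list.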
Google's terms of service
“You agree not to access (or attempt to access) any of the Services by any means other than through the interface that is provided by Google, unless you have been specifically allowed to do so in a separate agreement with Google.”
http://www.google.com/accounts/TOS Clause 5.3
Typical lexical errors
a. He's very humorous. He's always doing jokes. Correction: telling. Error type: collocation.
b. We conversated for almost one hour. Correction: conversed. Error type: word families / derivatives.
c. …and compromise, the issue was resolved in a jiffy. Correction: without delay. Error type: register.