The document introduces several online tools and corpora that can be used to analyze English language usage, including the British National Corpus, the Flexible Language Acquisition Project, and the Compleat Lexical Tutor. It discusses how these resources can help with identifying collocations, derivatives, and register variation. Examples are provided of searching corpora to analyze the usage of words like "shall" and phrases containing verbs like "be disappointed."
1. Writing with Open Tools (Part One)
09/11/2011 http://www.flickr.com/photos/mikekline/265954619/ Alannah Fitzgerald
2. Overview (part one)
Introducing Corpus Linguistics
Lexical knowledge: collocations, derivatives,
register
The Flexible Language Acquisition Project
(FLAX)
The British National Corpus (BNC)
The Lextutor
The Academic Wordlist (AWL)
EAP practice resources
3. Intro to corpus linguistics
Let's start with three questions about English:
1. What is the meaning of goalless?
2. How is the word shall used in present-day British
English? Think of some examples.
3. Which is more commonly expressed in everyday
English?
a. “I was a little disappointed…”
b. “I was very disappointed…”
Adapted from Hoffmann et al., 2008
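Question 3 is a frequency question, and a corpus answers it by counting. The sketch below shows the idea on a tiny invented sample. The texts, the function names and the resulting counts are all hypothetical and say nothing about real English usage; a real answer needs a corpus such as the BNC.

```python
import re

# Hypothetical mini-corpus, invented purely for illustration.
SAMPLE_TEXTS = [
    "I was very disappointed with the service.",
    "To be honest, I was a little disappointed by the ending.",
    "She was very disappointed, and I was very disappointed too.",
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_phrase(texts, phrase):
    """Count how often a multi-word phrase occurs across the texts."""
    target = tokenize(phrase)
    n = len(target)
    total = 0
    for text in texts:
        tokens = tokenize(text)
        # Slide a window of n tokens over the text and compare.
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == target:
                total += 1
    return total


phrases = ["I was a little disappointed", "I was very disappointed"]
counts = {p: count_phrase(SAMPLE_TEXTS, p) for p in phrases}
for phrase, count in counts.items():
    print(phrase, count)
```

The same counting logic applies to any collection of texts; only the size and representativeness of the corpus change how far the result can be trusted.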
5. Focus on representation
The British National Corpus (BNC)
100 million-word static corpus 1978-1992
Spoken (10%); Written (90%); Domain representation
9. Focus on automation
The Flexible Language Acquisition Project
(FLAX)
Web n-gram corpora generated and supplied by 2006
Google web dump
500,000 words and 380 million five-grams
GALL - Google Assisted Language Learning
(Chinnery, 2008; Shei, 2008)
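A five-gram is a sequence of five consecutive words. As a rough illustration of what an n-gram collection contains, the sketch below extracts and counts five-grams from one invented sentence. It is not how the Google data or FLAX was built; the sentence and function name are assumptions made for the example.

```python
from collections import Counter


def ngrams(tokens, n=5):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical input sentence, invented for illustration.
text = "i was a little disappointed but i was a little disappointed again"
tokens = text.split()

# Count how often each five-gram occurs.
five_gram_counts = Counter(ngrams(tokens, 5))

for gram, count in five_gram_counts.most_common(3):
    print(" ".join(gram), count)
```

Storing counts per n-gram rather than full texts is what makes it possible to look up phrases quickly, at the cost of losing the surrounding context.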
13. FLAX - Samples retrieved for I was a little
disappointed
14. BNC - Samples retrieved for I was a little
disappointed
15. BNC – Samples retrieved for I was very
disappointed
FLAX Web Collocations Collection Search (http://flax2.nzdl.org/greenstone3/flax?a=p&sa=home&module=)
16. FLAX vs BNC?
• Limitations with representativeness
Identifying register on the Web is difficult
Successful corpora are based on
domains, genres, collections of document types
The web is a “dirty corpus” (Kilgarriff & Grefenstette, 2003, p. 342)
FLAX cleaned by 30% using BNC wordlist
Linked externally to BNC, Yahoo
Complementary sources, both with limitations
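The slide says only that FLAX was cleaned using a BNC wordlist; the actual procedure is not described here. One plausible way to do this kind of cleaning, shown purely as an assumption, is to keep only those n-grams whose words all appear in a trusted wordlist. The wordlist and n-gram counts below are invented.

```python
def filter_ngrams(ngram_counts, wordlist):
    """Keep only n-grams in which every token appears in the wordlist."""
    return {
        gram: count
        for gram, count in ngram_counts.items()
        if all(token in wordlist for token in gram)
    }


# Hypothetical wordlist standing in for a list derived from a clean corpus.
wordlist = {"i", "was", "a", "little", "very", "disappointed"}

# Hypothetical web n-gram counts, including one noisy entry.
web_ngrams = {
    ("i", "was", "very", "disappointed"): 120,
    ("i", "was", "a", "little", "disappointed"): 45,
    ("i", "waz", "very", "disapointed"): 7,
}

cleaned = filter_ngrams(web_ngrams, wordlist)
removed_share = 1 - len(cleaned) / len(web_ngrams)
print(len(cleaned), round(removed_share, 2))
```

A filter like this removes misspellings and non-words but cannot identify register or genre, which is why the two sources remain complementary.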
17. Google's terms of service
“You agree not to access (or attempt to access)
any of the Services by any means other than
through the interface that is provided by
Google, unless you have been specifically
allowed to do so in a separate agreement with
Google.”
http://www.google.com/accounts/TOS, Clause 5.3
18. Typical lexical errors
a. He's very humorous. He's always doing jokes. [correction: telling; error type: collocation]
b. We conversated for almost one hour. [correction: conversed; error type: word families / derivatives]
c. …and compromise, the issue was resolved in a jiffy. [correction: without delay; error type: register]
20. OSS Mozilla
http://www.flickr.com/photos/hindrik/2586245939/
21. FLAX Web Pronoun Phrases Collection Search (http://flax2.nzdl.org/greenstone3/flax?a=p&sa=home&module=)
22. Noticing Text Types – Issues of Register and
Genre
FLAX Web Pronoun Phrases Collection Search (http://flax2.nzdl.org/greenstone3/flax?a=p&sa=home&module=)
23. FLAX Web Pronoun Phrases Collection Search (http://flax2.nzdl.org/greenstone3/flax?a=p&sa=home&module=)
26. Web Collocations (fact vs idea)
http://flax2.nzdl.org/greenstone3/flax?a=g&rt=r&sa=CollocationSearch&s=CollocationTypes&s1.wordClass=n&c=collodb&s1.query=&s1.multiple=on
36. FLAX Web Phrases Collection Search (http://flax2.nzdl.org/greenstone3/flax?a=p&sa=home&module=)
37. Preparation (part two)
• Samples of your own writing – soft copy
• Build your own corpus – collect ten
academic articles in your discipline
• Writing analysis tools
• Specific academic word lists