
Using online corpus for literacy teachers



Workshop conducted at the BDA International Conference 2016. 12 March 2016

Published in: Education


  1. 1. Online Corpus: Literacy Teachers’ Best Friend. Dominik Lukeš, @techczech
  2. 2. Outline: What is a corpus; Answering questions with a corpus; The language of corpus searches; The corpus and the classroom; Practice
  3. 3. Corpus / Corpora ????
  4. 4. Knowledge of / about language
  5. 5. Prescriptivism … how language should be used v. Descriptivism … how language is used
  6. 6. “Most of the prescriptive rules of the language mavens make no sense on any level. They are bits of folklore that originated for screwball reasons several hundred years ago… For as long as they have existed, speakers have flouted them…”
  7. 7. “intellectual abdication” “should be ashamed” “current around 1900” “a perversion of grammatical education” “blind to textual evidence even when he himself exhibits it” “dishonest and stupid” “vile little compendium of tripe about style” Grammarian Geoffrey K Pullum on … “More passives in Orwell's pompous essay with the warning about how you mustn't use them than in any periodical you can lay your hands on!”
  8. 8. This usage stuff is not straightforward and easy. If ever someone tells you that the rules of English grammar are simple and logical and you should just learn them and obey them, walk away, because you're getting advice from a fool.
  9. 9. Corpus Key modern tool for finding out about how language works…
  10. 10. Corpus … is a large database of representative language samples …
  11. 11. Corpus … 100s of millions of words from (mostly) written language in different genres in small samples (~2000 words) …
  12. 12. Corpus … used for linguistic research, making dictionaries, writing grammars, …
  13. 13. Corpora available for teachers
  14. 14. BYU corpora available: COCA (contemporary Am English); COHA (historical Am English); GloWbE (global web English); Wikipedia; Google Books (BrEng/AmEng); BNC (British National Corpus); Hansard (British parliamentary speeches); Spanish/Portuguese
  15. 15. Access to COCA and related BYU corpora is free… …but free registration is required for more than ~10 queries a day
  16. 16. Other resources derived from BYU corpora
  19. 19. Searching a corpus early on in the process of making a generalization can save you a lot of unpleasant surprises later.
  20. 20. How do we use the word dyslexia? We speak more often of dyslexic children than adults. We speak more often of dyslexia than any other dys- word.
  21. 21. Concordance BNC: dyslexic [n*] COCA: dyslexic [n*]
  22. 22. COCA: dys*
  23. 23. Suffixing rules *yed *ied
  24. 24. Suffixing rules. *yed: played, stayed, portrayed, enjoyed, unemployed, surveyed. *ied: died, tried, married, worried, identified, applied
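The two word lists on this slide suggest the familiar spelling pattern: -yed after a vowel, -ied after a consonant. As a minimal sketch (the helper name `letter_before_ending` is purely illustrative, and the word lists are the slide's own), the pattern can be checked in Python like this:

```python
VOWELS = set("aeiou")

# Word lists copied from the slide.
yed_words = ["played", "stayed", "portrayed", "enjoyed", "unemployed", "surveyed"]
ied_words = ["died", "tried", "married", "worried", "identified", "applied"]


def letter_before_ending(word: str, ending: str) -> str:
    """Return the letter immediately before the given ending."""
    if not word.endswith(ending):
        raise ValueError(f"{word!r} does not end in {ending!r}")
    return word[-len(ending) - 1]


# -yed: the letter before "yed" should be a vowel (pl-a-yed, enj-o-yed).
yed_ok = all(letter_before_ending(w, "yed") in VOWELS for w in yed_words)
# -ied: the letter before "ied" should be a consonant (tr-ied, appl-ied).
ied_ok = all(letter_before_ending(w, "ied") not in VOWELS for w in ied_words)

print(yed_ok, ied_ok)  # True True
```

A check like this only looks at spelling. It cannot tell that "died" comes from "die" rather than from a base word ending in -y, which is exactly the kind of exception worth spotting in real search results.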
  25. 25. The Corpus Magic * [ ] ? Different corpora use slightly different codes. Read the manual. [n* ]
  26. 26. The Corpus Magic: * = any number of characters (incl. 0); ? = any one character; [ ] = lemma (all inflectional forms of a word); [n*] = part of speech tags (e.g. nouns). Different corpora use slightly different codes. Read the manual.
  27. 27. *each → each, reach, beach, teach, outreach, …, impeach, …; teach* → teachers, teaching, …, teachable, teacher-librarians, …; t*ch → touch, teach, tech, torch, trench, twitch, …, three-inch, …; teach * → teach the, teach us, teach students, …
  28. 28. ?each → reach, beach, teach, peach, leach, keach, …; each? → each- (1), each# (1) [i.e. nothing]; ?each? → peachy, bleachy, teacha, reachs (2) [i.e. spelling error], …; t?ch → tech, tach, toch, tuch, tsch, tich; t??ch → touch, teach, torch, tisch, …
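To see how the two wildcards behave, here is a minimal Python sketch that translates a single-word pattern into a regular expression and runs it over a small word list. The function names and the word list are made up for illustration; this is not how any particular corpus site implements its search, and multi-word queries such as `teach *` are not handled.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a corpus-style wildcard pattern into a regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(r"\S*")  # any number of characters, including none
        elif ch == "?":
            parts.append(r"\S")   # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def search(pattern: str, words: list[str]) -> list[str]:
    """Return the words that match the whole pattern."""
    regex = wildcard_to_regex(pattern)
    return [w for w in words if regex.fullmatch(w)]


# Illustrative word list, not real corpus output.
words = ["each", "reach", "teach", "teacher", "teaching",
         "touch", "tech", "torch", "peachy", "beach"]

print(search("*each", words))   # ['each', 'reach', 'teach', 'beach']
print(search("?each", words))   # ['reach', 'teach', 'beach']
print(search("teach*", words))  # ['teach', 'teacher', 'teaching']
print(search("t?ch", words))    # ['tech']
print(search("t*ch", words))    # ['teach', 'touch', 'tech', 'torch']
```

Note the difference between `*each` and `?each`: only the first matches "each" itself, because `*` may stand for nothing while `?` always needs one character.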
  29. 29. [Lemma]
  30. 30. Part of speech tags: [run].[n*] (forms of "run" used as a noun) vs. [run] [n*] (forms of "run" followed by a noun)
  31. 31. Common tags: [n*] noun; [NN2] plural nouns; [v*] verb; [VVD] verb past tense; [aj*] (BNC) / [j*] (COCA) adjective; [av*] (BNC) / [r*] (COCA) adverb
  32. 32. Help
  33. 33. You can also: search for idioms (cats and dogs); combine wildcards (?each*s); search for synonyms ([=pretty]); search for alternatives (car|bike|horse); exclude searches (used -car). For more details see:
  34. 34. Concordance + KWIC *ies.[N*]
  35. 35. KWIC – Key-Word In Context *ies.[N*]
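A KWIC display lines up every hit in a centre column with a fixed amount of text on either side. The Python sketch below shows the idea on an invented three-sentence sample; the function name `kwic`, the `width` parameter, and the sample text are all assumptions made for illustration.

```python
import re


def kwic(text: str, keyword: str, width: int = 30) -> list[str]:
    """Return one line per hit, with the keyword aligned in a centre column."""
    lines = []
    for m in re.finditer(rf"\b{re.escape(keyword)}\b", text, re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so every keyword starts in the same column.
        lines.append(f"{left:>{width}} [{m.group()}] {right:<{width}}")
    return lines


# Invented sample text, not real corpus data.
sample = ("Many dyslexic children enjoy stories. Some dyslexic adults "
          "were never assessed at school. A dyslexic student may need extra time.")

for line in kwic(sample, "dyslexic", width=25):
    print(line)
```

Reading down the column to the right of the keyword is what makes patterns such as "dyslexic children" versus "dyslexic adults" easy to spot.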
  36. 36. Limit searches by genre
  37. 37. Other questions a corpus can answer: Are there more nouns or verbs ending in -ies? (*ies.[V*] vs. *ies.[N*]) Are there four-letter verbs ending in -ed in the present tense? (??ed.[VVB]) What are the most common adjectives describing students vs. pupils? ([j*] [student] vs. [j*] [pupil]) What do we say teachers do most often? ([teacher] [vvb])
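Queries like `*ies.[V*]` vs. `*ies.[N*]` combine a spelling pattern with a part of speech tag. As a rough sketch of what such a count involves, the Python below works on a tiny hand-tagged list. The list, the simplified "N"/"V" tags, and the function name `count_by_tag` are hypothetical; a real corpus uses a much richer tagset and millions of tokens.

```python
from collections import Counter

# Hypothetical hand-tagged sample: (word, simplified part of speech tag).
tagged = [
    ("studies", "V"), ("studies", "N"), ("stories", "N"), ("tries", "V"),
    ("families", "N"), ("carries", "V"), ("cities", "N"), ("teacher", "N"),
    ("worries", "N"), ("worries", "V"),
]


def count_by_tag(tokens, suffix):
    """Count tokens ending in `suffix`, grouped by their tag."""
    return Counter(tag for word, tag in tokens if word.endswith(suffix))


counts = count_by_tag(tagged, "ies")
print(counts["N"], counts["V"])  # 5 4
```

The same word form can appear under both tags ("studies", "worries"), which is why the tag has to be part of the query and cannot be guessed from spelling alone.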
  38. 38. Corpus, rules, and regularity pre* *ed *ies.[V*]
  39. 39. Collocations Limits on variability See also Kennedy, p. 80-23
  41. 41. Collocations (cont) [teacher] must [v*]
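A collocation query such as `[teacher] must [v*]` asks which words most often fill the slot after a fixed phrase. The Python sketch below counts the word following "teacher(s) must" in an invented sample. The function name `following_words` and the sample sentences are assumptions, and unlike a tagged corpus the sketch does not check that the following word is actually a verb.

```python
import re
from collections import Counter


def following_words(text: str, pattern: str) -> Counter:
    """Count the word that immediately follows each match of `pattern`."""
    regex = re.compile(pattern + r"\s+(\w+)", re.IGNORECASE)
    return Counter(m.group(1).lower() for m in regex.finditer(text))


# Invented sample text, not real corpus data.
sample = ("Teachers must be patient. A teacher must know the subject. "
          "Teachers must be fair, and every teacher must plan ahead.")

counts = following_words(sample, r"\bteachers?\s+must")
print(counts.most_common(1))  # [('be', 2)]
```

Sorting by frequency is the key step: it is what separates a typical collocation from a one-off combination.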
  42. 42. Idioms and set phrases 275 results 359 results
  43. 43. Google as a Corpus "put the search text in quotes" use * for the search item
  45. 45. Google as a Corpus PRO: rare, low frequency usage, up-to-date usage CON: no sampling, no frequency sort, no genre limit, no part of speech tags
  46. 46. Google result counts are only rough estimates… Different people searching in different geographic locations can get different numbers. Sometimes searching for A gives fewer results than searching for A without B.
  47. 47. …but Google fights can be fun
  48. 48. WebCorp makes Google search results linguist-friendly
  49. 49. Avoid Common Corpus Errors
  50. 50. Be aware of limitations: sampling, coverage, size, presence of typos and errors, bad part of speech tagging. Beware of low frequency results. Beware of homographs.
  51. 51. Check results come from multiple sources. Check KWIC to confirm relevance. Limit search by genre.
  52. 52. Check examples and sources
  53. 53. Always check low frequency results (e.g. must [v*] [n*]) … sometimes they come from the same source
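The "same source" problem is easy to test for once each hit is paired with the document it came from. Below is a minimal Python sketch; the hit list, the source labels, and the function name `single_source_phrases` are hypothetical and only illustrate the check.

```python
from collections import defaultdict

# Hypothetical hits: (matched phrase, source document).
hits = [
    ("must have patience", "news_01"),
    ("must have patience", "blog_07"),
    ("must take care", "novel_03"),
    ("must take care", "novel_03"),
    ("must take care", "novel_03"),
]


def single_source_phrases(hits):
    """Return phrases whose hits all come from one and the same source."""
    sources = defaultdict(set)
    for phrase, source in hits:
        sources[phrase].add(source)
    return sorted(p for p, s in sources.items() if len(s) == 1)


print(single_source_phrases(hits))  # ['must take care']
```

Here "must take care" has the higher raw count, yet all three hits come from one novel, so it says less about general usage than the two-hit phrase found in two different sources.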
  54. 54. False roots corner, silly, preface, cockroach, protest, stable …
  55. 55. Make your own corpus with TextSTAT
  56. 56. Make your own corpus with AntConc
  57. 57. Corpus in the classroom: teacher preparation; student discovery
  58. 58. Teacher preparation: find relevant, common examples; prepare worksheets; check for exceptions; find out answers to student questions about rules and usage
  59. 59. Student discovery: show search results to students to work out rules or word meanings; teach students how to search for questions; ask students to give each other puzzles for searching
  60. 60. For heavy classroom use… register for group access to prevent spam lock out
  61. 61. Corpus v dictionary
  62. 62. Non-classroom corpus use: supplement dictionary; crossword puzzles; check typical usage when writing
  63. 63. Where to go next?
  64. 64. Thank you Contact