Presented at The 2nd Joint International Methodology Research Colloquium hosted by Okinawa JALT (Japan Association for Language Teaching), Corpus SIG, Korea Association of Teachers of English (KATE), Korea, & Methodology SIG, Kansai Chapter,
Language Education and Technology (LET), Japan
Can we develop TV drama corpus-informed English vocabulary materials for elementary-level EFL learners?
1. Can we develop
TV drama corpus-informed
English vocabulary materials
for elementary-level EFL learners?
Hiroya Tanaka
Hokkai-Gakuen University
tanaka-h@hgu.jp
2nd Joint International Methodology Research Colloquium
Feb. 16, 2016 @Tenbusu Naha, Okinawa, Japan
2. Menu
1. Background - Why?
2. Purpose - What?
3. Method - How?
4. Results and Discussion - Where to?
3. 1. Background - Why?
Problem
• Insufficient input
both in quantity and quality
1.1 Elementary-level EFL learners in Japan
4. Input sufficient in quantity?
Authorized English textbook for JHS
Sunshine English Course 1 ~ 3, Kairyudo
• headword: 1,241 words
• token: 6,131 words
( + appendices = 7,583 words)
5. token: 6,131 words
( + appendices = 7,583 words)
= 24 pages in a paperback novel
6. Input sufficient in quantity?
textbook drill dictionary
7. Input sufficient in quantity and quality?
vocabulary book
8. (example) celebrate
• This is the festival celebrating the end of the
year. (Textbook A)
• We celebrated John’s birthday. (Book A)
• celebrate our father’s birthday (Book B)
• Quite a few people have been invited to
celebrate the couple’s anniversary. (Book C)
9. 1.1 Elementary-level EFL learners in Japan
Input sufficient in quality?
• Lack of functional context (to whom, for what,
where, when?)
• Little attention to politeness?
(neg-politeness / pos-politeness / without
redressive action, baldly)
• Mostly one-way
10. • Need for sufficient input both in
quantity and quality
• Need for more exposure to
everyday life English
11. 1. Background - Why?
1.2 Formulaic language
formulaic sequence, multi-word construction, lexical bundle,
lexical phrase, multi-word unit, collocation, idiom, phrasal verb,
pragmatic routine, conventional expression, formula
12. Significance
• Wide coverage both in spoken and written text
(e.g., Biber et al., 1999; Erman & Warren, 2000; Foster, 2001)
• Use linked to higher assessments on speech and
writing tasks
(e.g., Hsu & Chiu, 2008; Keshavarz & Salimi, 2007)
• “Most (all?) conventional speech acts are realized
by families of formulaic language …”
(Schmitt, 2010, p. 120)
13. Challenge for learners
• slow development
• limited ability to intuit usage norms
• use of first-language-based sequences,
which end up being inappropriate
(Wood, 2015)
14. Application
• Academic Formulas List (AFL)
- Core and the top 200 Written and Spoken AFL
(Simpson-Vlach & Ellis, 2010)
• PHRASE List
- 505 most frequent non-transparent multiword
expressions
(Martinez & Schmitt, 2011)
• PHAVE List
- 150 most frequent phrasal verbs
(Garnier & Schmitt, 2015)
15. Direction of this study
Not learning formulaic language
itself,
but learning basic vocabulary
through formulaic language
16. Why formulaic language?
“You shall know a word
by the company it keeps.”
(Firth, 1957, p. 11)
Learners should learn a word
by the company it keeps.
17. Why corpus?
• Corpus data have always played a major part in
developing vocabulary materials
• Difficulty in including formulaic sequences in
materials
- which word strings are formulaic
- which of the many formulae known to native
speakers learners most need to learn
(Jones & Durrant, 2010)
1.3 TV drama corpus
20. • spoken(-like) corpus
• much larger and more recent than other
spoken corpora (100 million words)
• search for phrases, grammatical
constructions, collocates available
• more words dealing with everyday life and
personal relationships than the spoken portions
of BNC and COCA (Davies, 2012)
Why TV drama corpus? Why SOAP?
21. Why TV drama corpus?
• Corpus of “you and I” and “here
and now”
• Potential as a corpus for
elementary-level EFL learners
1.3 TV drama corpus
23. 2. Purpose - What?
Developing TV drama corpus-informed
supplementary vocabulary materials
for elementary-level Japanese EFL
learners
to help them acquire knowledge of basic
English vocabulary through formulaic
language
24. 3. Method - How?
Method
for
Materials Development?
25. Simple sequence of materials development
(Jolly & Bolitho, 2011)
1. Identification of need
2. Exploration of language
3. Contextual realisation
4. Pedagogical realisation
5. Physical production
6. Use
7. Evaluation
26. Simple sequence of materials development
(Jolly & Bolitho, 2011)
1. Identification of need
Students have difficulty
understanding the sentence,
“It’s time the Prime Minister listened
to his critics.”
27. Simple sequence of materials development
(Jolly & Bolitho, 2011)
2. Exploration of language
The teacher consults
“Practical English Usage”
(Swan, 2005)
for explanations and example
sentences.
28. Simple sequence of materials development
(Jolly & Bolitho, 2011)
3. Contextual realisation
The teacher decides to produce
worksheets on “Hypothetical
Meaning” for class use to
reinforce actual teaching.
29. Simple sequence of materials development
(Jolly & Bolitho, 2011)
4. Pedagogical realisation
The teacher decides on a
contrastive approach (facts vs.
hypothesis). The focus is on
unspoken meaning and
speaker’s attitude.
30. Simple sequence of materials development
(Jolly & Bolitho, 2011)
5. Physical production
The worksheet is produced as a
Word document, photocopied
and distributed to learners.
31. Simple sequence of materials development
(Jolly & Bolitho, 2011)
6. Use
There is an introduction in class,
followed by completion of the
worksheet at home and checking
in the next class.
32. Simple sequence of materials development
(Jolly & Bolitho, 2011)
7. Evaluation
Students write comments on the
worksheet and note any difficulties with it.
33. a variety of optional pathways and feedback loops which
make the whole process dynamic and self-regulating
Jolly & Bolitho (2011)
34. 3.1 Procedure for this study
1. Identification of need
2. Exploration of language
3. Contextual realisation
4. Pedagogical realisation
5. Tentative Physical production
6. Use (Empirical Study)
7. Evaluation (Empirical Study)
35. 3.1 Procedure for this study
8. Re-exploration of language
9. Re-contextual realisation
10. Re-pedagogical realisation
11. Physical production
36. 1. Identification of need
• Need for sufficient input both in
quantity and quality
• Need for more exposure to
everyday life English
37. Corpora
• Corpus of American SOAP operas
(100 million words)
• Corpus of American SITCOM
(5 million words)
2. Exploration of language
38. Corpus of American SITCOM
• 5 million words
• 18 titles from 2006 to 2015
• Transcript data from different fansites
39. Corpus of American SITCOM
18 titles from 2006 to 2015
• The Big Bang Theory (Season 1 ~ 8)
• Modern Family (Season 1 ~ 6)
• Parks and Recreation (Season 2 ~ 6)
• Baby Daddy (Season 1 ~ 4)
• Jessie (Season 1 ~ 2)
• Girl Meets World (Season 1 ~ 2)
• The Middle (Season 3 ~ 6)
• It’s Always Sunny in Philadelphia (Season 6 ~ 10)
40. Corpus of American SITCOM
• 2 Broke Girls (Season 1 ~ 4)
• Bad Teacher (Season 1)
• Silicon Valley (Season 1 ~ 2)
• Switched at Birth (Season 1 ~ 4)
• Raising Hope (Season 1 ~ 4)
• Community (Season 1 ~ 5)
• Happy Endings (Season 1 ~ 3)
• Melissa & Joey (Season 2 ~ 4)
• Mike & Molly (Season 1 ~ 5)
• How I Met Your Mother (Season 6 ~ 9)
41. • Base : American SITCOM
- produce n-gram list
- search for example sentence
- search for situation and function
• Reference : American SOAP operas
- search for construction
- frequency / MI score
2. Exploration of language
43. • Basic vocabulary list:
Longman Communication 3000
• Spoken formulas list:
SITCOM Corpus
• Formula search:
SOAP and SITCOM
3. Contextual realisation
44. 3. Contextual realisation
Target Word
(Longman 3000)
Spoken Formulas List
(SITCOM Corpus)
Example Formula
(SOAP and SITCOM)
Vocabulary Material
45. • Basic vocabulary list
Longman Communication 3000:
a list of the 3000 most frequent words
in both spoken and written English,
based on the 390 million words in the
Longman Corpus Network
(Bullon & Leech, 2007)
3. Contextual realisation
46. Spoken formulas list: SITCOM Corpus
(1) Extract word, 2-gram, 3-gram,
4-gram, and 5-gram list from SITCOM
(2) Cut off at the point of 10 per million
words (pmw)
(3) Use checklist to compile spoken
formulas list
3. Contextual realisation
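Steps (1) and (2) above can be sketched in a few lines of Python. This is only a minimal illustration, not the tool used in the study: the function names (tokenize, ngram_counts, per_million, frequent_ngrams) and the tokenizing rule are assumptions, and the sketch lets n-grams run across utterance boundaries, which a cleaned-up corpus would probably avoid. Following the checklist wording ("more than 10 pmw"), the cut-off is applied as strictly greater than 10.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumed rule: lowercase, keep straight apostrophes so "don't" is one token.
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())

def ngram_counts(tokens, n):
    # Count every run of n consecutive tokens (n = 1 gives the word list).
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def per_million(freq, corpus_size):
    # Normalise a raw frequency to occurrences per million words (pmw).
    return freq * 1_000_000 / corpus_size

def frequent_ngrams(tokens, n, min_pmw=10.0):
    # Return (n-gram, raw frequency, pmw) rows above the cut-off, most frequent first.
    total = len(tokens)
    rows = [(gram, freq, per_million(freq, total))
            for gram, freq in ngram_counts(tokens, n).most_common()]
    return [row for row in rows if row[2] > min_pmw]

# Toy usage on two invented utterances (not corpus data).
tokens = tokenize("I don't know. I don't know what to do.")
for n in range(1, 6):
    candidates = frequent_ngrams(tokens, n)
```

The resulting candidate lists would still need step (3), the manual checklist, because frequency alone does not show pragmatic integrity or transparency.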
48. Checklist for Spoken formula
1) two or more words (2- to 5-gram)
2) more than 10 pmw
3) pragmatic integrity
4) transparent rather than opaque
(literal rather than non-literal)
3. Contextual realisation
49. 3) pragmatic integrity
• discourse marking (e.g., you know, if you see
what I mean)
• face and politeness (e.g., do you think, do you
want me to)
• vagueness and approximation (e.g., a couple
of, or something like that)
(O’Keeffe, McCarthy, & Carter, 2007)
3. Contextual realisation
50. 4) transparent rather than opaque
• the most common sequences in
everyday talk — their recurrence
is typically subliminal and not
immediately accessible to the
intuition of the native speakers
(Adolphs & Carter, 2013)
3. Contextual realisation
51. Checklist for Example formula
1) Span of ± 4
(Sinclair, Jones, & Daley, 2004)
2) MI score of at least 3.0
(Wood & Namba, 2013)
3) transparent rather than opaque
(literal rather than non-literal)
3. Contextual realisation
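Criteria 1) and 2) can be illustrated with a short Python sketch. It assumes one commonly used formulation of the MI score, MI = log2((pair frequency × corpus size) / (node frequency × collocate frequency × span)), where the span is the total window size (8 for ± 4). Whether this matches the exact calculation behind the scores reported in this study is an assumption, and all function names are hypothetical.

```python
import math
from collections import Counter

def collocate_counts(tokens, node, window=4):
    # Count words occurring within +/- `window` tokens of each occurrence of `node`.
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                counts[tokens[j]] += 1
    return counts

def mi_score(pair_freq, node_freq, collocate_freq, corpus_size, span):
    # Observed co-occurrence frequency relative to the frequency expected by chance.
    expected = node_freq * collocate_freq * span / corpus_size
    return math.log2(pair_freq / expected)

def strong_collocates(tokens, node, window=4, min_mi=3.0):
    # Keep collocates whose MI score is at least `min_mi` (criterion 2).
    freqs = Counter(tokens)
    scores = {}
    for word, pair_freq in collocate_counts(tokens, node, window).items():
        mi = mi_score(pair_freq, freqs[node], freqs[word], len(tokens), 2 * window)
        if mi >= min_mi:
            scores[word] = mi
    return scores
```

Criterion 3), transparency, is a qualitative judgement and is not something this kind of script can decide.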
52. • SITCOM word list
• spoken formulas list
→ General spoken formulas list
3. Contextual realisation
4. Results and Discussion - Where to?
53. • SITCOM word list
3. Contextual realisation
4. Results and Discussion - Where to?
54. Rank Word Frequency Per Million
1 you 170029 33900.68
2 i 165635 33024.60
3 the 130895 26098.08
4 to 114979 22924.72
5 a 110625 22056.61
6 and 81224 16194.58
7 it 62501 12461.56
8 that 61105 12183.22
9 of 55067 10979.36
10 is 50012 9971.48
(you = 3.4% of all tokens; i = 3.3% of all tokens)
55. Rank Word Frequency Per Million
11 in 47184 9407.63
12 my 43872 8747.28
13 this 43031 8579.60
14 me 42921 8557.66
15 i'm 41755 8325.19
16 what 40397 8054.42
17 we 36501 7277.63
18 oh 36177 7213.03
19 no 35887 7155.21
20 on 34843 6947.06
56. • General spoken formulas list
3. Contextual realisation
4. Results and Discussion - Where to?
57. Rank 2-gram Frequency Per Million
1 you know 14136 2818.46
2 i don't 12591 2510.42
3 are you 10018 1997.41
4 this is 9377 1869.60
5 do you 7829 1560.96
6 have to 7041 1403.85
7 i know 7015 1398.66
8 i have 6620 1319.91
9 all right 6240 1244.14
10 i think 6075 1211.24
58. Rank 2-gram Frequency Per Million
11 i just 5770 1150.43
12 a little 5291 1054.93
13 i mean 5248 1046.36
14 come on 4827 962.42
15 if you 4803 957.63
16 i can't 4729 942.88
17 thank you 4698 936.70
18 i can 3983 794.14
19 you guys 3965 790.55
20 did you 3744 746.49
59. Rank 3-gram Frequency Per Million
1 i don't know 4198 837.00
2 you know what 3457 689.26
3 what are you 3326 663.14
4 oh my god 3179 633.83
5 what do you 2115 421.69
6 you want to 1898 378.43
7 are you doing 1889 376.63
8 a lot of 1872 373.24
9 i don't want 1623 323.60
10 i have to 1622 323.40
60. Rank 3-gram Frequency Per Million
11 i want to 1486 296.28
12 do you think 1312 261.59
13 i don't think 1294 258.00
14 you have to 1180 235.27
15 i can't believe 1149 229.09
16 i'm going to 1103 219.92
17 why don't you 1077 214.73
18 i need to 955 190.41
19 do you want 935 186.42
20 what the hell 909 181.24
61. Rank 4-gram Frequency Per Million
1 what are you doing 1658 330.57
2 i don't want to 1056 210.55
3 you don't have to 577 115.04
4 what do you think 563 112.25
5 what do you mean 509 101.49
6 i want you to 455 90.72
7 you want me to 455 90.72
8 do you want to 398 79.35
9 get out of here 388 77.36
10 in the middle of 336 66.99
62. Rank 4-gram Frequency Per Million
11 thank you so much 327 65.20
12 i just want to 318 63.40
13 i need you to 313 62.41
14 nice to meet you 307 61.21
15 i just wanted to 302 60.21
16 what do you want 300 59.81
17 you don't want to 284 56.62
18 if you want to 276 55.03
19 i don't know how 275 54.83
20 i don't know if 271 54.03
63. Rank 5-gram Frequency Per Million
1 what are you doing here 513 102.28
2 what are you talking about 430 85.73
3 do you want me to 158 31.50
4 what the hell are you 141 28.11
5 what are we gonna do 138 27.51
6 thank you so much for 135 26.92
7 i don't know what to 134 26.72
8 can i talk to you 129 25.72
9 you know what i mean 128 25.52
10 what are you gonna do 122 24.32
64. Rank 5-gram Frequency Per Million
11 i don't want you to 114 22.73
12 what is wrong with you 93 18.54
13 what am i supposed to 92 18.34
14 what are you guys doing 91 18.14
15 you know what i think 89 17.74
16 how am i supposed to 88 17.55
17 let's get out of here 87 17.35
18 am i supposed to do 86 17.15
19 i don't know how to 86 17.15
20 i need to talk to 84 16.75
65. • General spoken formulas list criteria
cover 57% of the top 200
Academic spoken formulas.
(Simpson-Vlach & Ellis, 2010)
3. Contextual realisation
4. Results and Discussion - Where to?
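A coverage figure like the one above can be computed as the share of reference-list items that also appear in a candidate list. The sketch below assumes exact matching after lowercasing and whitespace normalisation; how the 57% was actually calculated is not stated on the slide, so this is one possible reading only. The function names and the toy items are hypothetical and are not taken from the AFL or from the SITCOM lists.

```python
def normalize(formula):
    # Lowercase and collapse whitespace so trivially different spellings match.
    return " ".join(formula.lower().split())

def coverage(reference_list, candidate_list):
    # Percentage of reference formulas that also occur in the candidate list.
    reference = {normalize(item) for item in reference_list}
    candidates = {normalize(item) for item in candidate_list}
    return 100 * len(reference & candidates) / len(reference)

# Toy data for illustration only (invented, not the actual lists).
reference = ["You know", "a lot of", "in terms of", "the fact that"]
candidates = ["you know", "a  lot of", "i mean"]
share = coverage(reference, candidates)
```

Exact matching is the strictest option; partial matches (for example, a 3-gram contained in a longer reference formula) would need a separate, explicitly stated rule.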
66. 4. Pedagogical realisation
Word familiarity-based approach
• Familiarity 1: Recognition
“I have seen this word.”
• Familiarity 2: Form-meaning
mapping
“I know its form and meaning.”
67. 4. Pedagogical realisation
Word familiarity-based approach
• Familiarity 3: Key formula
“I can use this word in a formula.”
• Familiarity 4: Formula in dialog
“I can use this word in a formula
embedded in a conversation.”
68. 4. Pedagogical realisation
Word familiarity-based approach
• Familiarity 5: Formula in
context
“I can use this word in a formula to
express my own feelings and
thoughts.”
69. 5. Tentative Physical production
Collocates of “celebrate” (node with a span of ± 4 / MI > 3.0)
Rank  COCA collocate  Freq.  MI     Rank  SOAP collocate  Freq.  MI
1     ANNIVERSARY     1629   8.59   1     WE              1279   3.19
2     BIRTHDAY        1218   7.54   2     OUR             228    3.32
3     CHRISTMAS       414    4.95   3     SHOULD          227    3.04
4     VICTORY         311    4.89   4     DAY             100    3.09
5     MASS            288    4.14   5     TONIGHT         98     3.83
6     50TH            276    8.54   6     NEW             97     3.40
7     HOLIDAY         253    5.20   7     REASON          89     3.80
8     WEDDING         223    4.91   8     BIRTHDAY        88     6.54
9     HOLIDAYS        218    6.38   9     TODAY           73     3.56
10    FESTIVAL        194    4.94   10    CHAMPAGNE       67     6.36
70. celebrate
• This is the festival celebrating the end of
the year. (Textbook A)
• We celebrated John’s birthday. (Book A)
• celebrate our father’s birthday (Book B)
• Quite a few people have been invited to
celebrate the couple’s anniversary. (Book C)
“celebrate” in a textbook and vocabulary books
72. 5. Tentative Physical production
Familiarity 3: Key formula
celebrate /ˈsɛləˌbreɪt/ W3
It’s your birthday. Come on.
Let’s celebrate!
73. 5. Tentative Physical production
Familiarity 4: Formula in dialog
celebrate /ˈsɛləˌbreɪt/ W3
A: (Come on,) she came back.
This is good news. We should celebrate.
B: I don't want to celebrate.
A: Not even a little?
74. 5. Tentative Physical production
Familiarity 5: Formula in
context
celebrate /ˈsɛləˌbreɪt/ W3
Make a sentence or a dialog using “celebrate”
in your own context.
75. To do …
(0) Add more transcripts and clean up the corpus
(1) Finalize n-gram lists
• MI for n-gram (3-, 4-, 5-gram)?
• pragmatic function labeling
(2) Physical realisation
• polysemy and varied use of formulas
• publisher and English informant to work with
(3) Empirical study for “use” and “evaluation”
4. Results and Discussion - Where to?
76. Adolphs, S., & Carter, R. (2013). Spoken corpus linguistics: From monomodal to multimodal. London,
UK: Routledge.
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman grammar of spoken and
written English. London, UK: Longman.
Bullon, S., & Leech, G. (2007). Longman communication 3000 and the Longman defining vocabulary.
In S. Bullon & G. Leech (Eds.), Longman communication 3000. Harlow: Pearson Education.
Coulmas, F. (1979). On the sociolinguistic relevance of routine formulae. Journal of pragmatics, 3(3),
239-266.
Davies, M. (2012). The corpus of American soap operas: 500 million words, 1990-2012. Available
online at http://corpus.byu.edu/soap/overview.asp
Erman, B., & Warren, B. (2000). The idiom principle and the open choice principle. Text, 20(1), 29-62.
Firth, J. R. (1957). Papers in linguistics 1934–1951. London, UK: Oxford University Press.
Foster, P. (2001). Rules and routines: A consideration of their role in the task-based language production
of native and non-native speakers. In M. Bygate, P. Skehan and M. Swain (Eds.), Researching
pedagogic tasks: Second language learning, teaching, and testing (pp. 75-94). Harlow: Longman.
Garnier, M., & Schmitt, N. (2015). The PHaVE List: A pedagogical list of phrasal verbs and their
most frequent meaning senses. Language teaching research, 19(6), 645-666.
Hsu, J.-Y., & Chiu, C.-Y. (2008). Lexical collocations and their relation to speaking proficiency of college EFL
learners in Taiwan. Asian EFL Journal, 10, 181–204.
Imao, Y. (2016). CasualConc (Version 2.0.2) [Computer software]. Retrieved from
https://sites.google.com/site/casualconc/Home
References
77. Jolly, D., & Bolitho, R. (2011). A framework for materials writing. In B. Tomlinson (Ed.), Materials development in
language teaching. London, UK: Cambridge University Press.
Jones, M., & Durrant, P. (2010). What can a corpus tell us about vocabulary teaching materials? In A. O'Keeffe &
M. McCarthy (Eds.), The Routledge handbook of corpus linguistics (pp. 387-400). Routledge.
Keshavarz, M. H., & Salimi, H. (2007). Collocational competence and cloze test performance: A study of Iranian
EFL learners. International journal of applied linguistics, 17(1), 81-92.
Krishnamurthy, R. (2003). English collocation studies: The OSTI report new edition of Sinclair, J., Jones, S., &
Daley, R. (1970), Birmingham, UK: Birmingham University Press.
Martinez, R., & Murphy, V. A. (2011). Effect of frequency and idiomaticity on second language reading
comprehension. TESOL quarterly, 45, 267–290.
O'Keeffe, A., McCarthy, M., & Carter, R. (2007). From corpus to classroom: Language use and language teaching.
Cambridge University Press.
Peters, A. M. (1983). Units of language acquisition. Cambridge, UK: Cambridge University Press.
Schmitt, N. (2010). Researching vocabulary: A vocabulary research manual. London: Palgrave Macmillan.
Simpson-Vlach, R., & Ellis, N. C. (2010). An academic formulas list (AFL). Applied linguistics, 31, 487-512.
Wood, D. (2010). Uses and functions of formulaic sequences in second language speech: An exploration of the
foundation of fluency. Canadian modern language review, 63(1), 13-33.
Wood, D. (2015). Fundamentals of formulaic language: An introduction. London, UK: Bloomsbury Publishing.
Wood, D. & Namba, K. (2013). Focused instruction of formulaic language: Use and awareness in a Japanese
university class. The Asian conference of language learning official conference proceedings 2013, pp. 203-212.
Wray, A. & Namba, K. (2003). Use of formulaic language by a Japanese-English bilingual child: A practical
approach to data analysis. Japanese journal for multilingualism and multiculturalism, 9(1), 24-51.
References