1. An Examination of the New General Service List
Tim Stoeckel
University of Niigata Prefecture
Presentation at the JALT Vocabulary SIG Symposium, September 15, 2018
3. The General Service List
• West, 1953
• 1,964 entries
• Word forms with shared roots grouped together
  broad, breadth, broadcast
• Word selection
  • Frequencies in two corpora
  • Subjective considerations
4. The General Service List
• Revised by Paul Nation, early 1990s
  • Added some inflected and derived forms (e.g., broader, broadly)
  • Removed some compound forms (e.g., broadcast)
  • Added other basic lexical sets (letters, numbers, months, days of the week)
• Most widely used version of the GSL
  • Lextutor VocabProfiler
  • Range
  • AntWordProfiler
5. Limitations of the GSL
• Derived from small corpora (5 million words total)
• Dated
  e.g., merchant, plow, spade, cultivator, bless, grace
6. Limitations of the GSL
• Organizing principle (word families)
  • Learners with knowledge of base forms often lack knowledge of derived forms
      O broad
      O broader
      X breadth
      X broadly
    (McLean, 2017; Mochizuki & Aizawa, 2000; Ward & Chuenjundaeng, 2009)
  • Learning burden
7. The New General Service List
• Browne, Culligan, & Phillips, 2013
• 2,801 entries
  • Supplementary list (numbers, months, days of the week)
• 273-million-word section of the modern Cambridge English Corpus (CEC)
8. The New General Service List
• Modified lemma as organizing principle

  GSL: Word Family    Lemma        NGSL: Modified Lemma
  color               color (n)    color (n, v)
  colored             colors       colors (n, v)
  colorful                         colored
  coloring                         coloring
  colorless
  colors

  (British spellings not shown)
• Learners with knowledge of a base form usually have knowledge of its inflected forms (McLean, 2017).
9. The New General Service List
Learning Burden of the GSL and NGSL (Browne, 2014)

List     Word Families   Modified Lemmas
GSL      1,964           3,623
NGSL¹    2,368           2,818

¹ Values based on NGSL version 1.00.
10. The New General Service List
• Modified lemma as organizing principle

  rise (n, v)    rose (n)
  rose           roses
  rises
  rising
  risen
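The two entries above share the form "rose". As a minimal sketch of how such overlaps could be detected automatically, the snippet below inverts a headword-to-forms mapping and reports every form listed under more than one headword. The dictionary layout and the function name `shared_types` are assumptions made for illustration; only the word forms come from the slide.

```python
from collections import defaultdict

# Forms copied from the slide; the dictionary layout is an assumption.
modified_lemmas = {
    "rise (n, v)": ["rise", "rose", "rises", "rising", "risen"],
    "rose (n)": ["rose", "roses"],
}

def shared_types(entries):
    """Return each word form that is listed under more than one headword."""
    owners = defaultdict(list)
    for headword, forms in entries.items():
        for form in forms:
            owners[form].append(headword)
    return {form: heads for form, heads in owners.items() if len(heads) > 1}

print(shared_types(modified_lemmas))
```

A frequency count for the bare token "rose" cannot be split between the two headwords without part-of-speech information, which is why such forms need separate treatment.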
12. The New General Service List
Word Selection
• Empirical considerations
  • Adjusted word frequencies in the CEC (Carroll's U, 1971)
• Subjective considerations
  • Consultation with Paul Nation
  • Comparison to other lists
13. The New General Service List
Carroll's U Values for Words at Different Frequency Ranks in the CEC

Modified Lemma Head   Rank    U
the                   1       60,909.9
somebody              1,000   93.4
revolution            2,000   36.5
mortgage              2,500   26.0
utility               2,801   21.2
quit                  3,000   19.0
explicit              3,500   14.7

Note. Carroll's (1971) U values downloaded from http://www.newgeneralservicelist.org/
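The slides do not state how U is computed. The sketch below follows one commonly described formulation of Carroll's adjusted frequency per million, in which raw frequency is weighted by an entropy-based dispersion value D across corpus sections. Treat the formula, the function name `carroll_u`, and the toy counts as assumptions for illustration, not as the procedure used to produce the values in the table.

```python
import math

def carroll_u(freqs, sizes):
    """Adjusted frequency per million tokens for one word.

    freqs[j] is the word's raw count in corpus section j;
    sizes[j] is the total number of tokens in section j.
    Requires at least two sections.
    """
    n = len(freqs)
    total_tokens = sum(sizes)
    total_freq = sum(freqs)
    if total_freq == 0:
        return 0.0
    # Relative frequency of the word within each section.
    p = [f / s for f, s in zip(freqs, sizes)]
    sum_p = sum(p)
    # Dispersion D: 1 when spread evenly, 0 when confined to one section.
    weighted_logs = sum(x * math.log2(x) for x in p if x > 0)
    d = (math.log2(sum_p) - weighted_logs / sum_p) / math.log2(n)
    # f_min: section counts weighted by each section's share of the corpus.
    f_min = sum(f * s for f, s in zip(freqs, sizes)) / total_tokens
    return (1_000_000 / total_tokens) * (total_freq * d + (1 - d) * f_min)

# Toy counts: the same 40 occurrences, spread evenly vs. packed into one section.
print(carroll_u([10, 10, 10, 10], [1000] * 4))
print(carroll_u([40, 0, 0, 0], [1000] * 4))
```

Under this formulation, a word that is concentrated in one section receives a much lower U than an evenly distributed word with the same raw frequency, which is the sense in which the frequencies are "adjusted".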
14. The New General Service List
NGSL Entries from Various Adjusted-Frequency Levels of the CEC

                      Adjusted-Frequency Band
List                  1K     2K     2,001-2,801   >2,801
NGSL                  981    987    715           118
NGSL Supplementary    12     5      7             23
Off List              7      8      79
18. The New General Service List
• Areas in Need of Further Research
  • Word selection
    • Explanations of the subjective criteria behind individual word selections remain unavailable
    • CEC is 65% British English
  • Coverage

Previous Research on NGSL Text Coverage (Browne, 2014)

                          Coverage (%)
Corpus                    GSL     NGSL    Difference
CEC section (273 mil)     84.24   90.34   +6.10
19. Research Questions
1. How does the NGSL compare to the GSL in terms of coverage of the
Corpus of Contemporary American English (COCA) and each of its
individual sections?
2. Does this analysis of the COCA as a secondary data source reveal
candidates for addition to the NGSL?
20. Methods
• Materials
  • 2010-2015 section of the COCA (fiction, spoken, academic, magazine, newspaper)
  • GSL (Nation's version)
  • NGSL 1.01 + NGSL Supplementary List + letters of the alphabet
• Procedures
  • Corpus cleaned (144 million → 114 million words)
  • RQ 1: GSL and NGSL coverage calculated with AntWordProfiler
  • RQ 2: Carroll's U calculated for NGSL entries and high-frequency off-list modified lemmas in the COCA
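Coverage here means the percentage of running words in a corpus whose forms appear on a word list. The study used AntWordProfiler for this step; the snippet below is only a minimal sketch of the underlying calculation, not that tool's implementation. The function name `coverage`, the toy sentence, and the toy list are invented for illustration.

```python
def coverage(tokens, word_list):
    """Percentage of running words (tokens) whose form is on the word list."""
    if not tokens:
        return 0.0
    listed = {w.lower() for w in word_list}
    hits = sum(1 for t in tokens if t.lower() in listed)
    return 100 * hits / len(tokens)

# Toy data, invented for illustration only.
text = "the colors of the roses rose in the solar light".split()
toy_list = ["the", "of", "in", "color", "colors", "rise", "rose", "roses", "light"]

print(coverage(text, toy_list))  # 9 of the 10 tokens are listed: 90.0
```

Because matching is done on word forms, a list organized by modified lemmas has to be expanded to all of its member forms before coverage is counted.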
22. Research Question 1
How does the NGSL compare to the GSL in terms of coverage of the
COCA and each of its individual sections?
23. RQ1: Results
A Comparison of GSL and NGSL Coverage of the 2010-2015 Section of the COCA

            Proper                                NGSL
Section     Nouns    Foreign   Marginal   NGSL    Supp    Total   GSL     Diff
Fiction     3.62     0.04      0.25       85.38   0.66    86.03   85.11   +0.92
Spoken      4.65     0.01      0.65       88.08   0.70    88.78   85.59   +3.18
Academic    6.58     0.10      0.04       79.57   0.84    80.41   71.34   +9.07
Magazine    5.29     0.03      0.07       81.06   0.75    81.81   77.70   +4.11
Newspaper   8.42     0.06      0.05       79.57   1.25    80.82   75.95   +4.87
Total       5.67     0.05      0.22       82.82   0.84    83.66   79.34   +4.32
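The figures suggest that Total is the sum of the NGSL and NGSL Supp columns and that Diff is Total minus GSL. That reading is inferred from the numbers rather than stated on the slide. As a rough check under that assumption, the sketch below recomputes both relationships and allows a small tolerance for rounding in the reported values.

```python
# Rows copied from the table: (section, NGSL, NGSL Supp, Total, GSL, Diff).
rows = [
    ("Fiction",   85.38, 0.66, 86.03, 85.11, 0.92),
    ("Spoken",    88.08, 0.70, 88.78, 85.59, 3.18),
    ("Academic",  79.57, 0.84, 80.41, 71.34, 9.07),
    ("Magazine",  81.06, 0.75, 81.81, 77.70, 4.11),
    ("Newspaper", 79.57, 1.25, 80.82, 75.95, 4.87),
    ("Total",     82.82, 0.84, 83.66, 79.34, 4.32),
]

def inconsistent_rows(table, tolerance=0.015):
    """Return the sections whose reported figures disagree beyond rounding."""
    problems = []
    for section, ngsl, supp, total, gsl, diff in table:
        total_ok = abs((ngsl + supp) - total) <= tolerance
        diff_ok = abs((total - gsl) - diff) <= tolerance
        if not (total_ok and diff_ok):
            problems.append(section)
    return problems

print(inconsistent_rows(rows))  # an empty list means every row is consistent
```

Proper nouns, foreign words, and marginal words are reported separately and are not part of either list's total under this reading.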
24. Research Question 2
Does analysis of the COCA reveal candidates for addition to the NGSL?
25. RQ2: Results
Descriptive Statistics
• Strong linear relationship between U values for the first 3,000 modified lemmas in the CEC and U values for the same words in the COCA (r = .991)
• Empirical benchmark for NGSL membership (i.e., the 2,801st-ranked modified lemma): COCA U = 22.7 (CEC U = 21.2)
• In the COCA, 10.07% (282) of the words above this benchmark were not in the NGSL.
28. NAWL Members That Are Candidates for the NGSL

                 U Values                        2010-2015 COCA (wpm)
Modified Lemma   COCA     CEC      NAWL Corpus   Academic   Fiction   Magazine   Newspaper   Spoken
candidate        117.4    94.29    94            109.6      13.5      72.9       212.6       268.7
conference       90.81    117.04   64            141.0      32.6      77.7       165.3       75.2
click            42.09    24.61    8             27.3       60.7      70.6       30.1        28.8

Note. CEC and NAWL Corpus U values downloaded from http://www.newgeneralservicelist.org/
29. Words Whose Usage Is Increasing

Modified Lemma   COCA U   CEC U   Mean U
website          76.07    19.13   47.60
blog             31.40    6.60    19.00
immigration      51.67    16.91   34.29
solar¹           43.66    14.03   28.84
click¹           42.09    24.61   33.35

Note. CEC U values downloaded from http://www.newgeneralservicelist.org/
¹ NAWL members
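Mean U in this table is the simple average of the COCA and CEC values. The sketch below recomputes it and compares each corpus value with the benchmarks reported earlier in the deck (COCA U = 22.7, CEC U = 21.2 for the 2,801st-ranked modified lemma). The function name `summarize` and the output layout are assumptions made for illustration.

```python
COCA_BENCHMARK = 22.7  # COCA U of the 2,801st-ranked modified lemma
CEC_BENCHMARK = 21.2   # CEC U at the same rank

# (COCA U, CEC U) copied from the table.
u_values = {
    "website": (76.07, 19.13),
    "blog": (31.40, 6.60),
    "immigration": (51.67, 16.91),
    "solar": (43.66, 14.03),
    "click": (42.09, 24.61),
}

def summarize(values):
    """Map each word to (mean U, above COCA benchmark?, above CEC benchmark?)."""
    return {
        word: ((coca + cec) / 2, coca > COCA_BENCHMARK, cec > CEC_BENCHMARK)
        for word, (coca, cec) in values.items()
    }

for word, (mean_u, above_coca, above_cec) in summarize(u_values).items():
    print(f"{word:12s} mean U = {mean_u:6.2f}  COCA: {above_coca}  CEC: {above_cec}")
```

All five words clear the COCA benchmark, while only click clears the CEC benchmark, which is consistent with the slide's framing of these words as rising in use.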
30. Change in Frequency Over Time

               Occurrences per Million
COCA Section   website   blog    immigration   solar¹   click¹
1990-1994      0         0       28.71         23.79    6.69
1995-1999      3.34      0       29.69         23.46    14.55
2000-2004      13.39     1.01    34.33         35.34    20.48
2005-2009      27.90     21.26   51.05         40.91    21.23
2010-2014      67.87     23.21   58.55         43.39    22.23

Note. CEC U values downloaded from http://www.newgeneralservicelist.org/
¹ NAWL members
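One simple way to read this table is to compare the first and last periods and to check whether a word's rate ever drops from one period to the next. The snippet below is a minimal sketch of that reading; the helper names `net_change` and `never_falls` are hypothetical, and the figures are copied from the table.

```python
periods = ["1990-1994", "1995-1999", "2000-2004", "2005-2009", "2010-2014"]

# Occurrences per million words in each COCA period, copied from the table.
per_million = {
    "website": [0, 3.34, 13.39, 27.90, 67.87],
    "blog": [0, 0, 1.01, 21.26, 23.21],
    "immigration": [28.71, 29.69, 34.33, 51.05, 58.55],
    "solar": [23.79, 23.46, 35.34, 40.91, 43.39],
    "click": [6.69, 14.55, 20.48, 21.23, 22.23],
}

def net_change(series):
    """Difference between the last and the first period."""
    return series[-1] - series[0]

def never_falls(series):
    """True if the rate never drops from one period to the next."""
    return all(later >= earlier for earlier, later in zip(series, series[1:]))

print(f"Change from {periods[0]} to {periods[-1]}")
for word, series in per_million.items():
    print(f"{word:12s} {net_change(series):+7.2f}  never falls: {never_falls(series)}")
```

Every word ends higher than it started; solar is the only one with a small dip, between the first two periods.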
31. Types That Are Constituents of Two Modified Lemmas

Token      COCA U¹   CEC U²       Mean U
best       445.48    1.47         223.48
better     433.92    3.13         218.53
rose       78.04     12.01        45.03
born       102.46    0.25         51.35
criteria   18.83     (unlisted)

¹ COCA U values are for occurrences of the token only.
² The basis for the CEC U values is unclear.
32. Entry for Good in the NGSL
good (adj, n)
goods
CEC U = 2,038.14

Offlist Modified Lemma for Best
best (v)
bests
bested
besting
CEC U = 1.47
34. Occurrences of Modified Lemma Constituents for Good and Best in the COCA

Type      Noun     Verb    Adjective   Adverb
good      329      2       500,960     6,688
goods     19,797   1       0           0
best      86       1,795   179,702     55,846
bests     2        131     0           0
bested    0        361     10          0
besting   0        127     1           0
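The counts above can be turned into shares to show how rarely "best" occurs as a verb, the only part of speech covered by the off-list modified lemma best (v). The snippet below is a minimal sketch of that arithmetic; the function name `share` is hypothetical and the counts are copied from the table.

```python
# Part-of-speech counts in the COCA, copied from the table.
counts = {
    "good": {"noun": 329, "verb": 2, "adjective": 500_960, "adverb": 6_688},
    "goods": {"noun": 19_797, "verb": 1, "adjective": 0, "adverb": 0},
    "best": {"noun": 86, "verb": 1_795, "adjective": 179_702, "adverb": 55_846},
    "bests": {"noun": 2, "verb": 131, "adjective": 0, "adverb": 0},
    "bested": {"noun": 0, "verb": 361, "adjective": 10, "adverb": 0},
    "besting": {"noun": 0, "verb": 127, "adjective": 1, "adverb": 0},
}

def share(word, *parts_of_speech):
    """Percentage of a word's occurrences tagged with the given parts of speech."""
    total = sum(counts[word].values())
    selected = sum(counts[word][pos] for pos in parts_of_speech)
    return 100 * selected / total

print(f"best as verb:             {share('best', 'verb'):.2f}%")
print(f"best as adjective/adverb: {share('best', 'adjective', 'adverb'):.2f}%")
```

Under these counts, fewer than 1 in 100 occurrences of "best" are verbs, so the token's high COCA frequency comes almost entirely from its adjective and adverb uses.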
36. Proposed Entry for Good in the NGSL
good
goods
better
best
Proposed Entry for Rise in the NGSL
rise
rose
rises
rising
risen
39. Proposed Entry for Bear in the NGSL
bear
bears
bearing
born
borne
bearings

OR

Proposed Entry for Born in the NGSL
born
42. Proposed Entry for Criterion in the NGSL
criterion
criterions
criteria
(with removal of criteria from the NAWL)
43. Summary
• COCA coverage
  • Overall: NGSL +4.32 percentage points relative to the GSL
• Candidates for inclusion in the NGSL
  • 3 current NAWL terms (candidate, conference, click)
  • 5 words whose usage is increasing (website, blog, immigration, solar, click)
  • 5 constituents of two modified lemmas (best, better, rose, born, criteria)
44. Conclusions
An Entry from West's "A General Service List of English Words"
45. References
Anthony, L. (2013). AntWordProfiler (Version 1.4.0) [Computer software]. Tokyo, Japan: Waseda University. Retrieved from http://www.antlab.sci.waseda.ac.jp/
Bauer, L., & Nation, P. (1993). Word families. International Journal of Lexicography, 6(4), 253-279.
Browne, C. (2014). A new general service list: The better mousetrap we've been looking for. Vocabulary Learning and Instruction, 3(1), 1-10.
Browne, C., Culligan, B., & Phillips, J. (2013a). The New Academic Word List. Retrieved from http://www.newgeneralservicelist.org
Browne, C., Culligan, B., & Phillips, J. (2013b). The New General Service List. Retrieved from http://www.newgeneralservicelist.org
Carroll, J. B. (1971). Statistical analysis of the corpus. In The American heritage word frequency book (pp. xxi-xl). Boston, MA: Houghton Mifflin.
Davies, M. (2008-). The Corpus of Contemporary American English (COCA): 560 million words, 1990-present [Corpus]. Available online at https://corpus.byu.edu/coca/
Heatley, A., Nation, P., & Coxhead, A. (1994). Range [Computer software]. Retrieved from https://www.victoria.ac.nz/lals/about/staff/paul-nation
McLean, S. (2017). Evidence for the adoption of the flemma as an appropriate word counting unit. Applied Linguistics. Advance online publication. doi:10.1093/applin/amw050
Mochizuki, M., & Aizawa, K. (2000). An affix acquisition order for EFL learners: An exploratory study. System, 28(2), 291-304. doi:10.1016/S0346-251X(00)00013-0
Nation, I. S. P. (2016). Making and using word lists for language learning and testing. Amsterdam: John Benjamins Publishing Company.
Ward, J., & Chuenjundaeng, J. (2009). Suffix knowledge: Acquisition and applications. System, 37(3), 461-469. doi:10.1016/j.system.2009.01.004
West, M. (1953). A general service list of English words. London: Longman, Green.