Understanding Corpora Tables
By Jamie Watanabe
• Counting words
• Tokens
• Types
• Lemmas
• Word families
• Frequency levels
Agenda/Topics to Be Covered
According to Nation, there are several ways of counting words, that is,
deciding what will be counted. (Nation, 2013)
Counting words
e.g. It is not easy to say it correctly
A simple way to count the words in the preceding example is to count every
word form in a spoken or written text; if the same word form occurs more
than once, each occurrence is counted. So the example sentence
contains eight words, even though two of them are the same word
form, it. Words counted this way are called tokens. (Nation, 2013)
I.S.P. Nation, Learning Vocabulary in Another Language, second edition
Tokens
e.g. It is not easy to say it correctly
Another way to count is not to count a word form again when it occurs
a second time. So the sentence of eight tokens consists of seven
different words, or types. (Nation, 2013)
Types
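The difference between the two counts can be checked with a short sketch. It assumes a deliberately simple tokenizer (lowercase everything, keep runs of letters and apostrophes), which is an illustrative choice and not a rule taken from Nation; the helper name tokenize is hypothetical.

```python
import re

def tokenize(text):
    """Split text into lowercase word forms.

    Assumption for illustration: case is ignored, so "It" and "it"
    count as the same word form.
    """
    return re.findall(r"[a-z']+", text.lower())

sentence = "It is not easy to say it correctly"

tokens = tokenize(sentence)   # every occurrence is counted
types = set(tokens)           # each distinct word form is counted once

print(len(tokens), "tokens")
print(len(types), "types")
```

Only the repeated form "it" separates the two counts in this sentence.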
e.g. cook and cooks
Counting the preceding example as two different words to be learned is
strange. So, instead of counting different types as different words, closely
related word forms can be counted as members of the same word, or
lemma.
A lemma consists of a headword and its inflected and reduced forms.
(Nation, 2013)
Lemmas
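Grouping types under a lemma can be sketched as a lookup from each inflected form to its headword. The table LEMMA_OF below is a tiny hand-made assumption for illustration; a real study would use a lemmatizer or a published lemma list.

```python
from collections import defaultdict

# Hypothetical hand-made table: inflected form -> headword.
LEMMA_OF = {
    "cooks": "cook",
    "cooked": "cook",
    "cooking": "cook",
    "says": "say",
    "said": "say",
}

def group_by_lemma(types):
    """Collect word forms under their headword; unknown forms are their own lemma."""
    groups = defaultdict(set)
    for word in types:
        groups[LEMMA_OF.get(word, word)].add(word)
    return dict(groups)

lemmas = group_by_lemma(["cook", "cooks", "said", "say", "easy"])
print(len(lemmas), "lemmas from 5 types")
```

Here cook and cooks fall under one lemma, so five types reduce to three lemmas.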
A word family consists of a headword, its inflected forms and its closely
related derived forms. (Nation, 2013)
Word families
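How much the unit of counting matters can be seen by counting the same list of word forms three ways. Both lookup tables below are small hand-made assumptions for illustration, not data from Nation: LEMMA_OF maps inflected forms to a headword, and FAMILY_OF maps derived forms to the family headword.

```python
# Hypothetical tables for a single headword, "happy".
LEMMA_OF = {"happier": "happy", "happiest": "happy"}        # inflected forms
FAMILY_OF = {"happiness": "happy", "unhappy": "happy",
             "happily": "happy"}                             # derived forms

def count_units(word_forms):
    """Return (types, lemmas, word families) for a list of word forms."""
    types = set(word_forms)
    lemmas = {LEMMA_OF.get(t, t) for t in types}
    families = {FAMILY_OF.get(l, l) for l in lemmas}
    return len(types), len(lemmas), len(families)

counts = count_units(
    ["happy", "happier", "happiest", "happiness", "unhappy", "happily"]
)
print(counts)  # (6, 4, 1)
```

Six types become four lemmas and a single word family, which is why vocabulary size figures are only comparable when they use the same unit.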
There are three kinds of vocabulary based on frequency levels.
For example, we can look at an academic text and examine the different
frequency levels of the vocabulary it contains. The vocabulary is divided into
three groups according to frequency lists of word families:
• High-frequency words
• Mid-frequency words
• Low-frequency words
Frequency-based word lists
High-frequency vocabulary
2,000 word families
(90% coverage, with proper nouns etc.)
Mid-frequency vocabulary
7,000 word families
(9% coverage)
Low-frequency vocabulary
around 50,000 words
(1% coverage)
British National Corpus
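Coverage here means the percentage of running words (tokens) in a text that belong to a given frequency band. A minimal sketch of that calculation follows; the band assignments in BAND_OF and the ten-token sample are invented toy data, not figures from the British National Corpus.

```python
from collections import Counter

# Hypothetical band lookup: word -> frequency band. Toy data only.
BAND_OF = {
    "the": "high", "cat": "high", "sat": "high", "on": "high",
    "mat": "high", "near": "high",
    "veranda": "mid",
}

def coverage_by_band(tokens, band_of):
    """Percentage of tokens in each band; unlisted words count as low-frequency."""
    counts = Counter(band_of.get(token, "low") for token in tokens)
    total = len(tokens)
    return {band: 100 * n / total for band, n in counts.items()}

tokens = ["the", "cat", "sat", "on", "the", "mat",
          "near", "the", "veranda", "zymurgy"]
result = coverage_by_band(tokens, BAND_OF)
print(result)
```

Because coverage is computed over tokens, a small number of very frequent word families can account for most of a text.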
High-frequency words are a small group of words that are very important because
they cover a very large proportion of the running words in
spoken and written texts and occur in all kinds of uses of the language.
According to Nation, there are 2,000 word families in the high-frequency
vocabulary.
One such list is Michael West’s (1953) A General Service List of English Words, which
contains around 2,000 word families. About 165 word families in this list
are function words, such as a, some, two, because, and to.
High-frequency words
Mid-frequency words are a large group of generally useful words that occur rather infrequently,
but frequently enough to be a sensible learning goal after the high-frequency
and specialized vocabulary is known.
Mid-frequency words consist of 7,000 word families, from the third to the
ninth 1,000.
This depends on the corpus.
Mid-frequency words
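The band boundaries described in these slides can be written as a rule on a word family's rank in a frequency list. This is a sketch under the assumption that families are ranked from 1 (most frequent) upward; the function name band_for_rank is hypothetical, and the resulting ranks would differ from corpus to corpus.

```python
def band_for_rank(rank):
    """Map a word family's frequency rank (1 = most frequent) to a band.

    Follows the slide figures: first and second 1,000 are high-frequency,
    third to ninth 1,000 are mid-frequency, everything beyond is low-frequency.
    """
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    thousand = (rank - 1) // 1000 + 1   # which 1,000-family band the rank falls in
    if thousand <= 2:
        return "high"
    if thousand <= 9:
        return "mid"
    return "low"

for rank in (1, 2000, 2001, 9000, 9001):
    print(rank, band_for_rank(rank))
```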
Low-frequency words are a very large group of words that occur very infrequently and
cover only a small proportion of any text.
Around 3% of the running words in the British National Corpus are words like Carl,
Johnson, and Ohio.
Low-frequency words
Nation, I.S.P. (2013). Learning vocabulary in another language (2nd ed.). New
York: Cambridge University Press.
Reference
