Corpus Linguistics
“Corpus linguistics is the study of language as
expressed in corpora (samples) of ‘real world’
text.”
Example
For example, a teacher who has been teaching for two years conducts a
corpus analysis of the students' writing.
Questions
1. What are the three most frequent words used by the students in their
writing?
2. How do they use these words in their writing?
Steps
1. The teacher collects all of the students' writing.
2. Count the total number of words they used.
3. Generate a frequency list of these words and rank them from most to least frequent.
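The three steps above can be sketched in Python with `collections.Counter`. This is a minimal sketch, assuming the writings are plain-text strings and that a "word" is any run of letters or apostrophes; real corpus tools tokenize more carefully. The function names and the sample sentences are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_list(writings, top_n=3):
    """Pool all writings, count the tokens, and rank words by frequency."""
    tokens = [tok for text in writings for tok in tokenize(text)]  # step 1: collect
    total = len(tokens)                                            # step 2: count words
    ranked = Counter(tokens).most_common(top_n)                    # step 3: rank
    return total, ranked


if __name__ == "__main__":
    # Hypothetical student sentences, for illustration only.
    writings = [
        "The book for my sister is on the table. It was a gift.",
        "I bought it for the class.",
    ]
    total, ranked = frequency_list(writings)
    print("Total number of words:", total)
    for word, count in ranked:
        print(word, count)
```

`Counter.most_common` keeps words with equal counts in the order they were first seen, so ties are ranked by first appearance.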
Result
Total number of words: 120
The three most frequent words: the, for, it

Word   Occurrences
the    48
for    24
it     20
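A frequency list answers only the first question. For the second (how the students use these words), a common approach is a concordance: every occurrence of the word shown with a few words of context on each side, often called keyword in context (KWIC). A minimal sketch, assuming plain-text writings and a simple letters-and-apostrophes tokenizer; `kwic` and its `window` parameter are hypothetical names.

```python
import re


def kwic(texts, keyword, window=3):
    """Return (left context, keyword, right context) for every hit of `keyword`."""
    keyword = keyword.lower()
    lines = []
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        for i, tok in enumerate(tokens):
            if tok == keyword:
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                lines.append((left, tok, right))
    return lines


if __name__ == "__main__":
    texts = ["I bought it for the class.", "The book is for my sister."]
    for left, kw, right in kwic(texts, "for"):
        # Right-align the left context so the keyword lines up in one column.
        print(f"{left:>25}  [{kw}]  {right}")
```

Reading the aligned lines makes patterns such as "for" + noun phrase easy to spot by eye.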
Corpora
“Corpora are large and structured sets of texts (nowadays
usually stored and processed electronically). They are used
for statistical analysis and hypothesis testing.”
Notable English-language corpora include the following:
- The American National Corpus (ANC)
- The British National Corpus (BNC)
- The Corpus of Contemporary American English (COCA)
- The International Corpus of English (ICE)
Types of Corpora
Speech Corpus
“A speech corpus (or spoken corpus) is a database of speech audio files
and text transcriptions. In linguistics, spoken corpora are used to do
research into phonetics, conversation analysis, dialectology and other
fields.”
Types
Read speech, which includes:
1. Book excerpts
2. Broadcast news
3. Lists of words
Spontaneous speech, which includes:
1. Dialogues: conversations between two or more people (including meetings).
2. Narratives: a person telling a story.
3. Map tasks: one person explains a route on a map to another.
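Because a speech corpus pairs audio files with transcriptions, each entry can be modelled as a small record that also notes whether the speech is read or spontaneous. A minimal sketch, assuming one record per recording; the field names, file paths and the `by_speech_type` helper are hypothetical, and real speech corpora typically use richer, time-aligned annotation formats.

```python
from dataclasses import dataclass


@dataclass
class SpeechRecord:
    audio_path: str    # path to the audio file (hypothetical layout)
    transcript: str    # orthographic transcription of the recording
    speech_type: str   # "read" or "spontaneous"
    genre: str         # e.g. "broadcast news", "dialogue", "map task"


def by_speech_type(records, speech_type):
    """Select the records of one speech type, e.g. only spontaneous speech."""
    return [r for r in records if r.speech_type == speech_type]


corpus = [
    SpeechRecord("audio/news_001.wav", "good evening here is the news", "read", "broadcast news"),
    SpeechRecord("audio/map_004.wav", "go past the old mill then turn left", "spontaneous", "map task"),
    SpeechRecord("audio/dlg_010.wav", "so what did you think of it", "spontaneous", "dialogue"),
]

spontaneous = by_speech_type(corpus, "spontaneous")
print([r.genre for r in spontaneous])
```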
Text Corpus
A text corpus is a very large collection of text (often billions of words)
produced by real users of the language and used to analyze how words,
phrases and language in general are used. It is used by linguists,
lexicographers, social scientists, humanities scholars, experts in natural
language processing and people in many other fields.
Types
1. Monolingual Corpus
A monolingual corpus is the most common type of corpus. It contains texts in one
language only. The corpus is usually tagged for parts of speech and is used by a wide
range of users for tasks ranging from highly practical ones, such as checking
the correct usage of a word or looking up the most natural word combinations,
to scientific ones, such as identifying frequent patterns or new trends in language.
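Looking up "the most natural word combinations" usually means counting which words occur together most often, and part-of-speech tags make such patterns easy to query. A minimal sketch over a tiny hand-tagged sample, assuming each token is stored as a (word, tag) pair; the Penn Treebank-style tags (DT, JJ, NN, ...) and the `adj_noun_pairs` helper are used only for illustration. It counts adjective + noun bigrams.

```python
from collections import Counter

# A tiny hand-tagged sample (hypothetical); each token is (word, part-of-speech tag).
tagged = [
    ("a", "DT"), ("strong", "JJ"), ("coffee", "NN"), ("and", "CC"),
    ("a", "DT"), ("heavy", "JJ"), ("rain", "NN"), ("then", "RB"),
    ("another", "DT"), ("strong", "JJ"), ("coffee", "NN"),
]


def adj_noun_pairs(tokens):
    """Count adjacent adjective + noun pairs (tags starting with JJ and NN)."""
    pairs = Counter()
    for (w1, t1), (w2, t2) in zip(tokens, tokens[1:]):
        if t1.startswith("JJ") and t2.startswith("NN"):
            pairs[(w1, w2)] += 1
    return pairs


print(adj_noun_pairs(tagged).most_common())
```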
2. Parallel Corpus
A parallel corpus consists of two monolingual corpora, one of which
is a translation of the other. The two languages need to be aligned,
i.e. corresponding segments, usually sentences or paragraphs, need
to be matched. The user can then search for all examples of a word
or phrase in one language, and the results are displayed
together with the corresponding sentences in the other language.
Example
For example, a novel and its translation, or the translation memory of a
computer-assisted translation (CAT) tool, could be used to build a parallel corpus.
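The search described for parallel corpora can be sketched with sentence-aligned pairs: searching one side returns the matching segments together with their aligned translations. A minimal sketch, assuming the alignment has already been done and is stored as (English, French) sentence pairs; the sentences and the `search_parallel` name are hypothetical.

```python
import re

# Hypothetical sentence-aligned English-French pairs (alignment assumed already done).
aligned = [
    ("The house is small.", "La maison est petite."),
    ("She opened the window.", "Elle a ouvert la fenêtre."),
    ("My house has a garden.", "Ma maison a un jardin."),
]


def search_parallel(pairs, word, side=0):
    """Return every aligned pair whose `side` (0 = source, 1 = target) contains `word`."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [pair for pair in pairs if pattern.search(pair[side])]


for src, tgt in search_parallel(aligned, "house"):
    print(f"{src}  |  {tgt}")
```

Matching on word boundaries (`\b`) avoids false hits such as "house" inside "houses" or "warehouse".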
3. Multilingual Corpus
A multilingual corpus is very similar to a parallel corpus, and the two terms
are often used interchangeably. A multilingual corpus contains texts in
several languages that are all translations of the same text and are
aligned in the same way as parallel corpora. Sketch Engine allows the
user to select more than two aligned corpora, and the search then displays
the translations into all the selected languages simultaneously. When only two
languages are selected, a multilingual corpus behaves as a parallel
corpus. The user can also work with just one language and use it as a
monolingual corpus.
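The same idea extends to more than two languages: each aligned segment holds one translation per language, and the user chooses which languages to display. A minimal sketch, assuming one dictionary per segment keyed by language code and plain substring matching for brevity; the data and the `search_multilingual` helper are hypothetical and do not describe how Sketch Engine stores its corpora.

```python
# Hypothetical aligned segments: one dict per segment, keyed by language code.
segments = [
    {"en": "Good morning.", "de": "Guten Morgen.", "es": "Buenos días."},
    {"en": "Thank you very much.", "de": "Vielen Dank.", "es": "Muchas gracias."},
]


def search_multilingual(segs, query, lang, show):
    """Find `query` in language `lang`; return the aligned text in the `show` languages."""
    hits = []
    for seg in segs:
        if query.lower() in seg[lang].lower():
            hits.append({code: seg[code] for code in show})
    return hits


# Two languages selected: behaves like a parallel corpus.
print(search_multilingual(segments, "morning", "en", ["en", "es"]))
# One language selected: behaves like a monolingual corpus.
print(search_multilingual(segments, "dank", "de", ["de"]))
```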
Example
By contrast, the Aarhus corpus of Danish, French and English contract
law consists of three monolingual law corpora that are not translations
of the same texts, so it is better described as a comparable corpus.
4. Comparable Corpus
A comparable corpus is a set of two or more monolingual corpora,
typically each in a different language, built according to the same
principles. Because the corpora also share the same metadata, users
can rely on it when searching and comparing them.
Example
The International Corpus of English (ICE) is a set of comparable corpora of
1 million words each, covering different varieties of English.
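Because comparable corpora are built to the same design and share metadata, a word's frequency can be compared across them by restricting each corpus to the same metadata value and normalizing by size. A minimal sketch, assuming each text carries a `text_type` label and using frequency per million words; the two toy "corpora" and their labels are invented for illustration and are not ICE data.

```python
import re

# Hypothetical toy components with the same metadata scheme (not real ICE data).
corpora = {
    "variety_A": [
        {"text_type": "conversation", "text": "I reckon we should go now"},
        {"text_type": "news", "text": "The minister said the plan would go ahead"},
    ],
    "variety_B": [
        {"text_type": "conversation", "text": "I guess we should go now I guess"},
        {"text_type": "news", "text": "Officials said the budget was approved"},
    ],
}


def per_million(corpus, word, text_type):
    """Frequency of `word` per million words, counting only texts of one `text_type`."""
    tokens = [
        tok
        for doc in corpus
        if doc["text_type"] == text_type
        for tok in re.findall(r"[a-z']+", doc["text"].lower())
    ]
    if not tokens:
        return 0.0
    return tokens.count(word.lower()) / len(tokens) * 1_000_000


for name, corpus in corpora.items():
    print(name, per_million(corpus, "guess", "conversation"))
```

Normalizing matters because the filtered subsets rarely contain the same number of words, so raw counts are not directly comparable.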
5. Learner Corpus
A learner corpus is a corpus of texts produced by learners of a language.
The corpus is used to study the mistakes and problems learners have
when learning a foreign language. Sketch Engine allows learner
corpora to be annotated for the type of error and provides a special
interface to search for the error itself, the error correction,
the error type, or a combination of the three.
Example
1. Louvain Corpus of Native English Essays (LOCNESS), a corpus of essays by native speakers used as a point of comparison for learner writing
2. International Corpus of Learner English (ICLE)
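Searching by the error, the correction, the error type, or any combination of them amounts to filtering annotated records on whichever fields are given. A minimal sketch, assuming each error is stored as a record with those three fields; the error-type labels and sentences are hypothetical and do not follow the ICLE or Sketch Engine annotation schemes.

```python
from dataclasses import dataclass


@dataclass
class ErrorAnnotation:
    sentence: str
    error: str        # what the learner wrote
    correction: str   # suggested correction ("" means the word should be deleted)
    error_type: str   # hypothetical label such as "article" or "preposition"


annotations = [
    ErrorAnnotation("He is interested on music.", "on", "in", "preposition"),
    ErrorAnnotation("She is a honest person.", "a", "an", "article"),
    ErrorAnnotation("We discussed about the plan.", "about", "", "preposition"),
]


def find_errors(records, error=None, correction=None, error_type=None):
    """Return the records that match every criterion given; None means 'any value'."""
    return [
        r
        for r in records
        if (error is None or r.error == error)
        and (correction is None or r.correction == correction)
        and (error_type is None or r.error_type == error_type)
    ]


print(len(find_errors(annotations, error_type="preposition")))          # by error type
print(find_errors(annotations, error="a", correction="an")[0].sentence)  # combined search
```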
6. Diachronic Corpus
A diachronic corpus contains texts from different periods
and is used to study the development of, or change in, a language. Sketch
Engine allows searching the corpus as a whole or restricting the search to
selected time intervals. In addition, a specialized diachronic feature
called Trends identifies words whose usage changes the most over the
selected period of time.
Example
Helsinki Corpus of English Texts: texts from about 700 to 1700, covering a wide range of text types and situations.
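One simple way to find words whose usage changes the most over time is to compute each word's relative frequency in every period and rank words by the slope of a least-squares line through those values. This is a minimal sketch under that assumption; it is not necessarily how Sketch Engine's Trends feature computes its scores, and the period labels and texts are hypothetical.

```python
import re
from collections import Counter


def relative_freqs(texts_by_period, word):
    """Relative frequency of `word` in each period, in chronological order.

    Assumes every period's text is non-empty.
    """
    freqs = []
    for _, text in sorted(texts_by_period.items()):
        tokens = re.findall(r"[a-z]+", text.lower())
        freqs.append(Counter(tokens)[word] / len(tokens))
    return freqs


def slope(values):
    """Least-squares slope of `values` against period index 0, 1, 2, ... (needs >= 2 periods)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def trending(texts_by_period, words):
    """Rank `words` by how steeply their relative frequency rises or falls."""
    scores = {w: slope(relative_freqs(texts_by_period, w)) for w in words}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical texts keyed by period (4-digit years sort correctly as strings).
texts = {
    "1500": "thou and thou and",
    "1600": "thou and you and",
    "1700": "you and you and",
}
print(trending(texts, ["and", "thou", "you"]))
# [('thou', -0.25), ('you', 0.25), ('and', 0.0)]
```

A negative slope marks a word falling out of use and a positive slope a word gaining ground; ranking by absolute value surfaces both.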
7. Specialized Corpus
A specialized corpus contains texts limited to one or more subject areas,
domains or topics. Such a corpus is used to investigate how a particular
type of language, for example that of one domain or register, is used.
Example
1. Cambridge and Nottingham Corpus of Discourse in English (CANCODE):
informal spoken registers of British English, about 5 million words.
2. Michigan Corpus of Academic Spoken English (MICASE): spoken
registers in a US academic setting, about 1.8 million words.
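A common way to see what is typical of a specialized corpus is to compare it with a general reference corpus and rank words by how much more frequent they are in the specialized one. A minimal sketch, assuming a score equal to the ratio of per-million frequencies with a smoothing constant `n` added to both (the value `n=1.0` is an assumption); the two sample texts and the function names are hypothetical.

```python
import re
from collections import Counter


def freq_per_million(text):
    """Map each word in `text` to its frequency per million tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {w: c / len(tokens) * 1_000_000 for w, c in Counter(tokens).items()}


def keywords(focus_text, reference_text, n=1.0, top=3):
    """Rank words that are typical of the focus text relative to the reference text."""
    focus = freq_per_million(focus_text)
    ref = freq_per_million(reference_text)
    # Adding n to both sides avoids division by zero for words absent from the reference.
    scores = {w: (f + n) / (ref.get(w, 0.0) + n) for w, f in focus.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top]


specialized = "the patient has a fever and the patient needs rest"
reference = "the cat and the dog have a rest"
print(keywords(specialized, reference))
```

Function words such as "the" occur at similar rates in both texts, so their scores stay near 1 and they drop out of the ranking.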
8. Multimedia Corpus
A multimedia corpus contains texts that are enhanced with audio or
visual materials or other types of multimedia content.
Example
The spoken part of the British National Corpus in Sketch Engine has links to
the corresponding recordings, which can be played from the Sketch
Engine interface.
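Linking transcribed text to recordings can be done by storing, for each segment, the audio file plus start and end times, so that a search hit can be played back. A minimal sketch, assuming times in seconds; the file names, the `Segment` fields and `hits_with_audio` are hypothetical and do not describe how the BNC or Sketch Engine store this information.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    audio_file: str
    start: float  # seconds from the beginning of the recording
    end: float


segments = [
    Segment("shall we have a cup of tea", "rec_017.wav", 12.4, 14.1),
    Segment("no thanks I have just had one", "rec_017.wav", 14.3, 16.0),
    Segment("tea is ready", "rec_052.wav", 3.0, 4.2),
]


def hits_with_audio(segs, word):
    """Return (text, audio file, start, end) for every segment containing `word`."""
    return [
        (s.text, s.audio_file, s.start, s.end)
        for s in segs
        if word.lower() in s.text.lower().split()
    ]


for text, audio, start, end in hits_with_audio(segments, "tea"):
    print(f"{audio} [{start:.1f}-{end:.1f}s]: {text}")
```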
Conclusion
Corpus linguistics shows how language is actually used and how that use
varies across situations. It lets us see how language is used today and
in different contexts, which helps us teach language more effectively.
Thank You
