Section III
Pages 105 to 150
Presented by Ata ul Ghafer & Shoiba Sabir
Department of Applied Linguistics, GCUF
Chapter 9
What corpora are available?
by David Y. W.
Outline
- What are corpora?
- Types of corpora:
  1. General corpora
  2. Specialized corpora
  3. Speech corpora
  4. Parsed corpora
  5. Historical corpora
  6. Multimedia corpora
  7. Developmental, learner and lingua franca corpora
  8. ESL/EFL learner corpora
  9. Parallel corpora
  10. Comparable corpora
  11. Multilingual corpora
What are corpora?
- "Corpus" is a Latin word meaning "body"; "corpora" is its plural.
- A corpus is a collection of written texts, especially the entire works of a particular author or a body of writing on a particular subject: "the Darwinian corpus".
- Corpora are large, structured sets of texts (nowadays usually stored and processed electronically).
- They are used for statistical analysis and hypothesis testing, for checking occurrences, and for validating linguistic rules within a specific language or variety.
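Checking occurrences usually starts from a frequency list. The sketch below is a minimal illustration, assuming a corpus is simply a list of plain-text strings and that a "word" is any run of letters or apostrophes (a deliberate simplification; real corpus tools tokenize far more carefully). The function names `tokenize` and `frequency_list` are hypothetical, and the three sentences are invented.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_list(texts: list[str]) -> Counter:
    """Count how often each word form occurs across all texts in the corpus."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


# A toy "corpus" of three invented sentences, used only for illustration.
corpus = [
    "The corpus is a body of texts.",
    "A corpus can be searched for every occurrence of a word.",
    "Corpora are usually stored electronically.",
]

freqs = frequency_list(corpus)
print(freqs.most_common(3))  # [('a', 3), ('corpus', 2), ('of', 2)]
```

Note that "corpus" and "corpora" are counted separately: a raw frequency list counts word forms, not lemmas, so grouping inflected forms would need an extra lemmatization step.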
Types of corpora
General Corpora
- Contain texts that do not belong to a single text type, subject field, or register.
- May include written or spoken language, or both.
- May include texts produced in one country or in many.
- Aim to represent language in its broadest sense and to serve as a widely available resource for baseline or comparative studies of general linguistic features.
Examples
- Brown Corpus – 1 million words.
- LOB Corpus – 1 million words.
- BNC (British National Corpus) – 100 million words.
Specialized Corpora
- Designed with more specific research goals in mind, such as register-specific descriptions and investigations of language.
- Aim to be representative of a given type of text.
- Used to investigate a particular type of language.
- The kinds of texts included are limited to:
  - a time frame, such as a particular century;
  - a social setting, such as conversations taking place in a bookshop;
  - a given topic, such as newspaper articles dealing with a particular issue.
Examples
- Cambridge and Nottingham Corpus of Discourse in English (CANCODE) (informal registers of British English) – 5 million words.
- Michigan Corpus of Academic Spoken English (MICASE) (spoken registers in a US academic setting) – about 1.8 million words.
Historical (Diachronic) Corpora
- Contain texts from different periods of time.
- Aim to represent one or more earlier stages of a language, and help to trace the development of a language over time.
- Example: Helsinki Corpus – texts dating from about AD 700 to 1700; about 1.5 million words.
Speech Corpora
- Consist of sound recordings, often annotated to allow a detailed description of spoken phenomena: phonology, prosody (stress, tone units, etc.).
- Example: Spoken English Corpus
Multimedia Corpora
- Transcripts synchronised with audio/video recordings.
- Example: Santa Barbara Corpus of Spoken American English (SBCSAE), available through the TalkBank website.
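In a multimedia corpus, each stretch of transcript is tied to a time span in the recording, which is what lets a user jump from a line of text to the matching audio or video. A minimal sketch, assuming each utterance carries start and end times in seconds and that utterances do not overlap; real conversational corpora such as SBCSAE do contain overlapping speech, which would need an interval search returning several matches. The `Utterance` class and `utterance_at` function are hypothetical, not part of TalkBank's tools or file formats, and the transcript lines are invented.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional


@dataclass
class Utterance:
    start: float  # seconds from the beginning of the recording
    end: float    # seconds; the utterance covers the interval [start, end)
    speaker: str
    text: str


def utterance_at(utterances: list[Utterance], t: float) -> Optional[Utterance]:
    """Return the utterance being spoken at time t, or None during a pause.

    Assumes utterances are sorted by start time and do not overlap.
    """
    starts = [u.start for u in utterances]
    i = bisect_right(starts, t) - 1  # index of the last utterance starting at or before t
    if i >= 0 and t < utterances[i].end:
        return utterances[i]
    return None


# Invented transcript lines, for illustration only.
transcript = [
    Utterance(0.0, 2.5, "A", "so where did you go"),
    Utterance(3.1, 5.0, "B", "down to the coast"),
    Utterance(5.0, 6.2, "A", "oh nice"),
]

print(utterance_at(transcript, 4.0).text)  # down to the coast
print(utterance_at(transcript, 2.8))       # None (pause between turns)
```

Binary search over start times keeps the lookup fast even for long recordings; the half-open interval means a time exactly on a boundary belongs to the next utterance.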
Learner Corpora
- Aim to represent language as produced by learners of a language; they include spoken or written language samples produced by non-native speakers.
- Used to identify differences in learners' word frequencies and in the types of errors they make.
- Show in what respects learners differ from each other and from native speakers.
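Examples
- Louvain Corpus of Native English Essays (LOCNESS): essays by native speakers of English, used as a baseline for comparison with learner data.
- International Corpus of Learner English (ICLE): essays by learners of English from many L1 backgrounds; several million words.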
Multilingual Corpora
- Any systematic collection of empirical language data that enables linguists to analyse multilingual individuals, multilingual societies or multilingual communication.
Comparable Corpora
- Two (or more) corpora in different languages (e.g. English and Spanish) or in different varieties of a language (e.g. Indian English and Canadian English). They are designed along the same lines, so they contain the same proportions of newspaper texts, novels, casual conversation, etc.
- Comparable corpora of varieties of the same language can be used to compare those varieties.
- Comparable corpora of different languages can be used by translators to identify differences and equivalences between the languages.
- Example: the International Corpus of English (ICE) consists of comparable corpora of 1 million words each, covering different varieties of English.
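Comparing frequencies across corpora only makes sense on a common base, because two corpora (or sub-corpora) rarely contain exactly the same number of words. A common choice is occurrences per million words: raw count × 1,000,000 ÷ corpus size. The sketch below assumes that formula; `per_million` is a hypothetical helper, and the counts are invented, not real ICE figures.

```python
def per_million(raw_count: int, corpus_size: int) -> float:
    """Normalise a raw frequency to occurrences per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be a positive number of words")
    return raw_count * 1_000_000 / corpus_size


# Invented figures for one word in two corpora: (raw count, corpus size in words).
counts = {
    "corpus_A": (150, 1_000_000),
    "corpus_B": (150, 600_000),
}

for name, (raw, size) in counts.items():
    print(name, per_million(raw, size))
# corpus_A 150.0
# corpus_B 250.0
```

The same raw count of 150 corresponds to 150 occurrences per million words in the larger corpus but 250 per million in the smaller one, which is why raw counts alone can be misleading when comparing varieties or languages.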
Parallel Corpora
- Two (or more) corpora in different languages, each containing texts that have been translated from one language into another, or texts that have been produced simultaneously in two or more languages.
- Can be used by translators and learners to find potential equivalent expressions in each language and to investigate differences between languages.
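Parallel corpora are normally aligned, often at sentence level, so that each source sentence is paired with its translation. The sketch below assumes such an alignment already exists (producing it is a separate task) and uses three invented English-Spanish pairs; the name `find_equivalents` is hypothetical. It retrieves the aligned pairs whose English side contains a search word, roughly the kind of lookup a translator or learner might use to find candidate equivalents.

```python
import re

# A tiny sentence-aligned English-Spanish parallel corpus (invented examples).
# Each pair is (English sentence, Spanish translation); alignment is assumed given.
ALIGNED_PAIRS = [
    ("I take the train every day.", "Tomo el tren todos los días."),
    ("Take a seat, please.", "Tome asiento, por favor."),
    ("The train was late.", "El tren llegó tarde."),
]


def find_equivalents(pairs, word):
    """Return the aligned pairs whose English side contains `word` as a whole word."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return [(en, es) for en, es in pairs if pattern.search(en)]


for en, es in find_equivalents(ALIGNED_PAIRS, "take"):
    print(f"{en}  ->  {es}")
```

Here "take" is rendered as "tomo" in one sentence and as part of the fixed expression "tome asiento" in the other, the kind of contrast parallel concordances are meant to reveal.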
Parsed Corpora
- Syntactically analysed: each sentence is annotated with its syntactic structure.
- Example: Surface and Underlying Structural Analyses of Naturalistic English (SUSANNE) Corpus
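Because every sentence in a parsed corpus carries a syntactic analysis, it can be searched by structure rather than by word form. SUSANNE uses its own annotation format; the sketch below instead assumes a simplified Penn-Treebank-style bracketed notation (an assumption, not SUSANNE's actual format), parses one invented sentence into a tree, and collects every phrase with a given label. All function names are hypothetical.

```python
import re


def parse_bracketed(s: str):
    """Parse '(LABEL child ...)' notation into nested (label, children) tuples.

    Terminal words are kept as plain strings.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}, got {tokens[pos]!r}")
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())
            else:
                children.append(tokens[pos])  # a terminal word
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return parse_node()


def leaves(node):
    """Return the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    _, children = node
    return [word for child in children for word in leaves(child)]


def find_phrases(node, label):
    """Yield every subtree whose label equals `label`, in pre-order."""
    if isinstance(node, str):
        return
    if node[0] == label:
        yield node
    for child in node[1]:
        yield from find_phrases(child, label)


tree = parse_bracketed(
    "(S (NP (DT the) (NN learner)) (VP (VBD wrote) (NP (DT an) (NN essay))))"
)
for phrase in find_phrases(tree, "NP"):
    print(" ".join(leaves(phrase)))
# the learner
# an essay
```

A word-level search could not distinguish "essay" as the object of a verb from "essay" as a subject; a query over the tree can, which is the main reason for building parsed corpora.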
Developmental Language Corpora
- Contain the output of non-adult native speakers of English.
- The language is not yet as proficient as that found in adult native-speaker corpora.
- Example: Polytechnic of Wales (POW) Corpus
ESL/EFL Learner Corpora
- Contain the output of learners of English.
- The learners may share one and the same L1 background or have different mother tongues.
- Example: Japanese EFL Learner Corpus (JEFLL)
