What are the basics of analysing a corpus?
Ata ul ghafer & shoiba sabir
Department of Applied linguistics
GCUF
Introduction
– On their own, corpora can tell us nothing, because they are simply collections of electronic texts.
– As Hunston (2002: 3) puts it, ‘a corpus does not contain new information about
language, but the software offers us a new perspective on the familiar’.
– There is an increasing range of software available to carry out such processes,
from established commercial software such as WordSmith Tools (Scott 1999),
MonoConc Pro (2000) and Word Sketch Engine (Kilgarriff et al. 2004) to
freeware downloadable from the internet.
Exploring word frequency lists
– The first basic corpus technique that we will consider is that of frequency
analysis.
– When we generate a frequency list for a particular corpus, the software
searches every item in that corpus in order to establish how many tokens there
are in total.
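The counting step described above can be sketched in a few lines of Python. This is only an illustration, not how any of the tools named earlier work internally: the function name `frequency_list`, the sample sentence and the tokenisation rule (lower-cased runs of letters and apostrophes) are all assumptions. The count of distinct word forms is included as an extra alongside the token total.

```python
import re
from collections import Counter

def frequency_list(text):
    """Return (total tokens, distinct word forms, items ranked by frequency)."""
    # Assumption: a token is a lower-cased run of letters or apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # most_common() sorts from the most to the least frequent item.
    return len(tokens), len(counts), counts.most_common()

# Hypothetical sample text, used only to show the output shape.
sample = "We meet when we can, and we talk when we meet."
total, types, ranked = frequency_list(sample)
print(total, types)
for word, freq in ranked[:3]:
    print(word, freq)
```

A real corpus would need a more careful tokeniser (hyphens, numbers, contractions), and the choice of tokeniser changes the totals.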
Exploiting frequency data
– What can frequency lists tell us?
– Frequency lists can be useful documents for lexicographers and language
syllabus and materials designers (see McCarten, this volume).
– Example
– The Compleat Lexical Tutor (available online) utilises the Academic Word List (see
Coxhead, this volume) and the much older General Service List (West 1953) as
the basis of its lexical profiling programs.
– It can be useful to compare the rank order of items in two or more corpora by
looking at them side by side.
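A side-by-side comparison of rank order might be produced along the following lines. The helper names `ranked_words` and `side_by_side` and the two tiny token lists are hypothetical, chosen only to show the layout; they are not taken from any real corpus.

```python
from collections import Counter
from itertools import zip_longest

def ranked_words(tokens, n):
    """Return the n most frequent items, most frequent first."""
    return [word for word, _ in Counter(tokens).most_common(n)]

def side_by_side(tokens_a, tokens_b, n=5):
    """Pair up the top-n items of two corpora, rank by rank."""
    rows = zip_longest(ranked_words(tokens_a, n),
                       ranked_words(tokens_b, n), fillvalue="")
    return [(rank, a, b) for rank, (a, b) in enumerate(rows, start=1)]

# Hypothetical token lists standing in for two corpora.
corpus_a = "the we the we the agenda".split()
corpus_b = "you the you yeah you the".split()
for rank, a, b in side_by_side(corpus_a, corpus_b, n=3):
    print(f"{rank:>2}  {a:<10} {b:<10}")
```

Reading across a row shows which item holds the same rank in each corpus.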
Normalisation
– Normalisation is a technique used to make frequencies comparable across corpora
of different sizes: each raw frequency is converted into a rate per fixed number of
words (for example, per thousand words).
– Example
– The pronoun we occurs 2,142 times in a sub-corpus of meetings
extracted from the BNC Sampler corpus, and 2,666 times in another sub-corpus of
the BNC Sampler made up of casual conversation. However, because the two
corpora are of such different sizes, these raw frequencies mean very little relative to
each other. In order to normalise the figure for the meeting sub-corpus, the raw
frequency of 2,142 is divided by 148,624 (the total word count of the meeting sub-
corpus) and multiplied by 1,000, giving a figure of fourteen occurrences per
thousand words.
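The arithmetic in this example can be checked with a short sketch. The function name `normalise` and its `base` parameter are assumptions made for illustration; the two figures are the ones given above for the meetings sub-corpus.

```python
def normalise(raw_frequency, corpus_size, base=1000):
    """Convert a raw frequency into occurrences per `base` words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_frequency / corpus_size * base

# Figures from the meetings sub-corpus example.
meetings = normalise(2142, 148624)
print(round(meetings, 2))  # about 14.41, i.e. roughly fourteen per thousand words
```

The same function would be applied to the casual conversation sub-corpus, using its own total word count, before the two rates are compared.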
Exploring key-word lists
– Key words are not necessarily the most frequent words in a corpus, but they are
those words which are identified by statistical comparison of a ‘target’ corpus
with another, larger corpus, which is referred to as the ‘reference’ corpus.
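The slide does not say which statistic is used for the comparison. As one possibility, the sketch below uses the log-likelihood statistic, which is widely used for keyness; treating it as the method here is an assumption. The function name and all counts are hypothetical.

```python
import math

def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood score for one word in a target vs. a reference corpus."""
    total_freq = freq_target + freq_ref
    total_size = size_target + size_ref
    # Expected frequencies if the word were equally common in both corpora.
    expected_target = size_target * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    score = 0.0
    for observed, expected in ((freq_target, expected_target),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score

# Hypothetical counts: the word is five times more frequent, relative to
# corpus size, in the target corpus than in the reference corpus.
print(log_likelihood(50, 1000, 100, 10000) > 0)
```

A word with the same relative frequency in both corpora scores zero, and the score grows as the two relative frequencies diverge, so a word can be key without being among the most frequent.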
Exploring concordance lines
– Also known as KWIC (key word in context) analysis, concordance analysis is
probably the first basic corpus analytic technique that many people interested
in corpus analysis undertake.
– Concordancing is a valuable analytical technique because it allows a large
number of examples of an item to be brought together in one place, in their
original context. It is useful both for hypothesis testing and for hypothesis
generation.
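A minimal KWIC display can be sketched as follows. The function name `kwic`, the `width` parameter and the sample sentence are assumptions for illustration. The sketch also assumes plain running text without line breaks; dedicated concordancers do considerably more (sorting, tags, file handling).

```python
import re

def kwic(text, node, width=30):
    """Return concordance lines with the node word aligned in one column."""
    lines = []
    pattern = rf"\b{re.escape(node)}\b"
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        # Up to `width` characters of context on each side of the node.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append(f"{left:>{width}} [{match.group()}] {right:<{width}}")
    return lines

# Hypothetical sample text.
sample = "We agreed that we would meet again. Then we left."
lines = kwic(sample, "we")
for line in lines:
    print(line)
```

Because the left context is right-aligned to a fixed width, every occurrence of the node word lines up vertically, which is what makes patterns on either side easy to scan.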
– Thank you

What are the basics of Analysing a corpus? chpt.10 Routledge
