 Group Members:
 Ayesha Azhar
 Bareera Akbar
 Irum Masood
 Maryam Ahmed
 Tahira Jabeen
Incomprehensible
consciously
A sea of words
Essence of human beings
Corpus linguistics
 From the Latin word for “body” or “mass”.
 A collection of written texts, especially the entire
works of a particular author or a body of writing
on a particular subject: “the Darwinian corpus”.
Corpora (plural)
History of Corpus Linguistics
 Language study is not a new idea.
 1921: 30,000 words. A Treasure, but of no use.
 1960s, with the advent of the computer: the use of collections of
COMPUTER-READABLE text for language study.
 Brown Corpus of Standard American English:
 One million words of American English texts printed in 1961.
 The first electronic corpus.
Corpus Linguistics
 Linguistics is the scientific study of language
and its structure; ‘corpus linguistics’ is the study
of language “on the basis of text corpora.”
 The analysis does not stop at describing those
texts; the contexts in which they occur are
examined as well.
Place for Corpus Linguistics in
Applied Linguistics
 A means to explore actual patterns of language use.
 A tool for developing materials for classroom language
instruction.
 To explore different questions about language use.
 To provide powerful tools for analysis of natural
languages.
 To give an insight about how language use varies in
different situations.
Corpora
 A corpus (plural ‘corpora’) is a large, structured
set of texts (nowadays usually stored and
processed electronically).
 Corpora are used for statistical analysis and
hypothesis testing, for checking occurrences, and
for validating linguistic rules within a specific
language territory.
General Corpora
 The texts that do not belong to a single text type,
subject field, or register.
 May include written or spoken language, or both.
 May include texts produced in one country or
many.
 They aim to represent language in its broadest
sense and to serve as a widely available resource
for baseline or comparative studies of general
linguistic features.
 May be used to produce reference materials for
language learning or translation.
 Often used as a baseline in comparison with more
specialized corpora.
 Also sometimes known as ‘reference corpora’.
Examples
 Brown Corpus – 1 million words.
 LOB Corpus – 1 million words.
 BNC (British National Corpus) – 100 million
words.
Specialized Corpora
 Texts collected with more specific research goals
in mind – register-specific descriptions and
investigations of language.
 A specialized corpus aims to be representative of a given type of text.
 It is used to investigate a particular type of language.
 The kind of texts included are limited:
 A time frame – such as a particular century.
 A social setting – such as conversations taking place in
a bookshop.
 A given topic – such as newspaper articles dealing
with a particular thing.
Examples
 Cambridge and Nottingham Corpus of Discourse in English
(CANCODE) (informal registers of British English) – 5 million words.
 Michigan Corpus of Academic Spoken English (MICASE)
(spoken registers in a US academic setting) – about 1.8 million words.
Historical or Diachronic Corpora
Texts from different
periods of time.
Aim at representing an
earlier stage(s) of a
language.
They help to trace the
development of a language
over time.
Example
Helsinki Corpus: texts from about 700 to 1700,
1.5 million words
Regional Corpora
Aim at representing a regional variety of a
language, such as dialects.
Learner Corpora
 Aim at representing the language as produced by
learners of a language; they include spoken or
written language samples produced by non-native
speakers.
 They are used to identify differences among learners
in word frequency and in types of mistakes.
 They show in what respects learners differ from each
other and from native speakers.
Example
 Louvain Corpus of Native
English Essays (LOCNESS)
 International Corpus of Learner
English (ICLE)
 20,000 words
Multilingual Corpora
 Any systematic collection of empirical language data
enabling linguists to carry out analyses of multilingual
individuals, multilingual societies or multilingual
communication.
Comparable Corpora
 Two (or more) corpora in different languages (e.g. English
and Spanish) or in different varieties of a language (e.g.
Indian English and Canadian English).
 They are designed along the same lines – will contain the
same proportions of newspaper texts, novels, casual
conversation, etc.
 Comparable corpora of varieties of the same language can be
used to compare those varieties.
 Comparable corpora of different languages can be used by
translators to identify differences and equivalences in each
language.
Example
 The International Corpus of English (ICE) comprises
comparable corpora of 1 million words each,
representing different varieties of English.
Parallel Corpora
 Two (or more) corpora in different languages, each
containing texts that have been translated from one
language into the other, or texts that have been produced
simultaneously in two or more languages.
 Can be used by translators and by learners to find
potential equivalent expressions in each language and to
investigate differences between languages.
Corpus linguistics
 Size
 Representativeness
 Registers / modes / topics
 Demographics
 Production / reception
 Research goals
 Funding
 Time
 Staff/students
Written Corpora
 Obtaining/creating, Storing, Organizing
Materials Required:
-scanner, OCR software
Process:
-paper document into electronic text file
Types:
-newspapers, periodicals
-small specialized corpora
-informal writings (travel diaries, e-mail,
discussion, blogs, newsgroups)
Spoken Corpora
 Deciding on a transcription system
I. prosodic/non-prosodic
II. representing interactional characteristics of
speech (overlapping speech, backchannels,
pauses, non-verbal contextual events)
III. permission to use data
IV. ensuring anonymity
V. avoiding impracticality of data
Markup
1. Structural markup:
-written corpus: titles, authors, paragraphs, subheadings,
chapters, etc.
-spoken corpus: contextual events, paralinguistic features
2. Header:
-written corpus:
Classification into categories (register, genre, topic domain, discourse
mode, formality)
-spoken corpus:
Demographic information about the speaker (gender, social
class, occupation, age, native language/dialect)
Relationship among the participants
Linguistic Annotation
Parts of Speech Tagging:
Grammatical category, case assigning
Prosodic Annotation
Phonetic Annotation
Syntactic Parsing
Advantages of Tagging
Vast exploration
Frequency
Co-occurrence
Multiple meaning studies
Automatically retrievable
Concordance Lines
 Concordance lines are a useful tool for
investigating corpora, but their use is limited by
the ability of the human observer to process
information.
 Methods that take the analysis further include
statistical calculations of collocation and corpus annotation.
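To make the idea of a concordance line concrete, here is a minimal sketch of a key-word-in-context (KWIC) display. The tokenizer, the function names and the sample sentence are assumptions made for illustration; real concordancers offer many more options.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def concordance(tokens, node, width=3):
    """Return one (left context, node, right context) triple per hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

# Invented sample text, not taken from any real corpus.
sample = ("The price of oil rose sharply. Analysts said the price "
          "would fall again when demand eased.")

# Print the hits with the node word aligned in a column.
for left, node, right in concordance(tokenize(sample), "price"):
    print(f"{left:>25}  {node}  {right}")
```

Aligning the node word in one column is what lets the human observer scan the left and right contexts for recurring patterns.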
Frequency and Key-word Lists
 A frequency list is a list of all the types in a corpus
together with the number of occurrences of each type.
 Comparing the frequency lists for two corpora can give
interesting information about the differences between
the two sets of texts.
 e.g. Kennedy (1998): a comparison between a corpus of
Economics texts and one of general academic English → the
words price, cost, demand, curve, firm… are frequently
found in the Economics corpus.
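A frequency list of this kind can be sketched in a few lines. The tokenization rule and the sample sentence below are assumptions for illustration only.

```python
from collections import Counter
import re

def frequency_list(text):
    """Return (type, count) pairs, most frequent type first."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens).most_common()

# Invented sample text.
sample = "The cat sat on the mat and the dog sat too"
freq = frequency_list(sample)

for word, count in freq:
    print(f"{count:>3}  {word}")

# Tokens are running words; types are distinct word forms.
n_tokens = sum(count for _, count in freq)
n_types = len(freq)
print(n_tokens, "tokens,", n_types, "types")
```

The distinction between tokens and types matters here: the list has one row per type, and the counts in those rows add up to the number of tokens.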
Keywords
 A useful starting point in investigating a
specialized corpus.
 They can be lexical items which reflect the topic
of a particular text but also grammatical words
which convey more subtle information.
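The slides do not say how keywords are calculated. One statistic often used for this purpose is log-likelihood, which compares a word's frequency in the corpus under study with its frequency in a reference corpus. The sketch below assumes that statistic; the function names and the two tiny "corpora" are invented for illustration.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """a, b: frequency of a word in the study / reference corpus.
    c, d: total number of tokens in the study / reference corpus."""
    e1 = c * (a + b) / (c + d)   # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)   # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(study_tokens, reference_tokens, top=5):
    """Rank words that are relatively more frequent in the study corpus."""
    study, ref = Counter(study_tokens), Counter(reference_tokens)
    c, d = len(study_tokens), len(reference_tokens)
    scored = []
    for word, a in study.items():
        b = ref[word]
        if a / c > b / d:  # keep only words over-represented in the study corpus
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]

# Invented miniature corpora, far too small for real conclusions.
study = "the price of the firm rose as demand for the firm grew".split()
reference = "the student read the book and the teacher read the paper".split()
print(keywords(study, reference))
```

With corpora this small the scores mean little; the point is only that a grammatical word such as "the" drops out because it is no more frequent in the study corpus than in the reference corpus.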
Collocation
 The tendency of words to be biased in the way
they co-occur.
 Statistical measurements of collocation are more
reliable, and for this reason a
corpus is essential.
Measurements of Collocation
 Computer programs, which calculate collocation,
take a node word and count the instances of all words
occurring within a particular span.
 Note: the count ignores punctuation marks.
 It counts ’s as a separate word.
 It ignores sentence boundaries.
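The counting procedure described above can be sketched as follows. This is a simplified illustration, not the method of any particular program: the tokenizer, the span of 2 and the sample text are assumptions.

```python
from collections import Counter
import re

def tokenize(text):
    """Drop punctuation and split 's off as a token of its own."""
    return re.findall(r"'s|[a-z]+", text.lower())

def collocates(tokens, node, span=4):
    """Count every word within `span` tokens to the left or right of the node."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            # The window runs across sentence boundaries, because the
            # token list no longer records where sentences end.
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts

# Invented sample text.
sample = "The firm's price rose. Demand fell, so the price dropped again."
counts = collocates(tokenize(sample), "price", span=2)
print(counts.most_common())
```

Here "demand" is counted as a collocate of "price" although it belongs to the next sentence, which is what ignoring sentence boundaries means in practice.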
Tagging and Parsing
 Tagging is allocating a part of speech (POS) label
to each word in a corpus.
 e.g. the word light is tagged as a verb, a noun
or an adjective each time it occurs in the corpus.
 Parsing is analyzing the sentences in a corpus
into their constituent parts, that is, doing a grammatical analysis.
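As an illustration of what a tagged corpus makes retrievable, the sketch below reads text stored as word_TAG items and reports which tags the word light received. The storage format, the Penn Treebank-style tag labels and the hand-tagged sentences are all assumptions for this example; no automatic tagger is involved.

```python
from collections import Counter

# Hand-tagged, invented sentences in an assumed word_TAG format.
tagged_text = ("Please_UH light_VB the_DT candle_NN ._. "
               "The_DT light_NN faded_VBD ._. "
               "She_PRP carried_VBD a_DT light_JJ bag_NN ._.")

def parse_tagged(text):
    """Turn 'word_TAG' items into (word, tag) pairs."""
    return [tuple(item.rsplit("_", 1)) for item in text.split()]

def tags_for(pairs, word):
    """Count the tags assigned to each occurrence of `word`."""
    return Counter(tag for w, tag in pairs if w.lower() == word)

pairs = parse_tagged(tagged_text)
print(tags_for(pairs, "light"))

# Retrieve only the noun uses of the word.
noun_hits = [(w, t) for w, t in pairs if w.lower() == "light" and t == "NN"]
print(noun_hits)
```

Without the tags, a search for light would return all three occurrences with no way to separate the verb from the noun or the adjective.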
Annotation
 General term for tagging and parsing, and also
used to describe other kinds of categorisation that
may be performed on a corpus.
(e.g.) The annotation of a spoken corpus for
prosodic features.
 The annotation of a corpus of learner English for
types of error.
 Annotation of anaphora and semantic annotation.
Software
 Special software is used to analyze a corpus and to
search it for certain words or phrases.
For example
• SARA for the BNC
• ICECUP for ICE Great Britain (ICE-GB).
• Concordancers can be used for the analysis of almost
any corpus.
Concordancer
 One of the most frequently used concordancers is
‘WordSmith Tools’.
 Its two most important tools are:
 Concord and WordList
 As an alternative to WordSmith, you can also use a
concordancer called ‘AntConc’, which can be
downloaded for free.
WordSmith Concord
 Click on the WordSmith icon on the desktop to
open the program. Select Concord in order to
search a corpus for a certain word or phrase. You
can now choose a corpus and select the files of
the corpus you want to analyse.
Some further options for entering a search word or
phrase:
 By using the asterisk *, you can widen the scope of your
search. For example, entering going as a search word will
provide you only with all instances of going; entering going
to with all instances of going to. If you type in go*, on the
other hand, you will get all words beginning with go-, e.g.
going, goes, gold. Searching for *ing, you will get all words
ending in –ing, e.g. swimming, dancing, sing.
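The effect of the asterisk can be imitated with Python's standard fnmatch module. This is only a sketch of the general idea, not a description of how WordSmith implements its search; the word list is invented.

```python
from fnmatch import fnmatchcase

def wildcard_search(words, pattern):
    """Return the words matching a pattern in which * stands for any string."""
    return [w for w in words if fnmatchcase(w.lower(), pattern.lower())]

# Invented word list.
words = ["going", "goes", "gold", "ago", "swimming", "dancing", "sing", "song"]

print(wildcard_search(words, "going"))  # exact form only
print(wildcard_search(words, "go*"))    # everything beginning with go-
print(wildcard_search(words, "*ing"))   # everything ending in -ing
```

As in the slide's own examples, the wildcard matches letter strings rather than word forms of a lemma, so go* also returns gold and *ing also returns sing.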
WordSmith WordList
 The tool WordList generates word lists of the
selected text files and enables you to compare the
length of text files or corpora.
 Moreover, you can use WordList to compare the
frequency of a word in different text files or
across genres and to identify common clusters.
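When the files or corpora being compared differ in length, raw counts are usually converted to a common base first. The sketch below normalizes to occurrences per million words; the function name and the counts are hypothetical, and only the corpus sizes (1 million and 100 million words) echo the figures given earlier for the Brown Corpus and the BNC.

```python
def per_million(count, corpus_size):
    """Normalize a raw frequency to occurrences per million words."""
    return count * 1_000_000 / corpus_size

# Hypothetical raw counts of the same word in two corpora of different size.
small_corpus = per_million(120, 1_000_000)      # 1-million-word corpus
large_corpus = per_million(9_000, 100_000_000)  # 100-million-word corpus

print(small_corpus, large_corpus)
```

In this made-up case the word has a far higher raw count in the larger corpus but is relatively more frequent in the smaller one.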
AntConc Concordance tool
 This tool shows the words or word strings you want to
analyse in their textual context.
 Select the files you want to analyse: File > Open file(s)
 Choose the tab "Concordance"
 Type in a search word (“Search Term”, bottom left-hand
corner)
Corpus linguistics
 More reliable than intuition.
 Language patterns are easily identified.
 Deconstruct texts to discover patterns.
 Track the development of specific features in the
history of English.
 Test hypotheses on specific language features
empirically.
 Follow language acquisition properly.
 Draw conclusions on large amounts of linguistic data.
 Shows frequency rather than possibility.
 Not always a complete picture.
Corpus linguistics
 More communicative modes:
 spoken corpora, interactional corpora (classroom
interactions, authentic interactions, etc) multimodal
corpora, corpora of textbook materials, etc.
 More text types and genres, to cover text types which are
less represented in corpora (letters, emails, leaflets, TV
programs, book synopses, recipes, short notes, chat room
logs, etc.).
 More longitudinal language data:
 from beginners to advanced levels, from children to adults,
from L1 to L2.
 More variables:
 more language learning variables should be collected and
encoded at the time of corpus collection (proficiency, language
aptitude, motivation, more precise description of the task, of
temporal, social or situational settings, etc).
 More languages:
 to counterbalance the predominance of Anglo-Saxon native and
learner corpora and to foster the computer-aided analysis of
different languages and language families.
Corpus linguistics
 Prior to Corpus Linguistics it was difficult to note patterns of
use in language, since observing and tracking usage patterns
was a monumental task.
 Scholars have used various types of corpora to gain insights
into changes related to language development, both in first
and second language situations.
 Corpus Linguistics can help in telling about language use and
how it varies in different situations.
Finals of Kant get Marx 2.0 : a general politics quiz
 
What is the Future of QuickBooks DeskTop?
What is the Future of QuickBooks DeskTop?What is the Future of QuickBooks DeskTop?
What is the Future of QuickBooks DeskTop?
 
Diploma in Nursing Admission Test Question Solution 2023.pdf
Diploma in Nursing Admission Test Question Solution 2023.pdfDiploma in Nursing Admission Test Question Solution 2023.pdf
Diploma in Nursing Admission Test Question Solution 2023.pdf
 
5 charts on South Africa as a source country for international student recrui...
5 charts on South Africa as a source country for international student recrui...5 charts on South Africa as a source country for international student recrui...
5 charts on South Africa as a source country for international student recrui...
 
KARNAADA.pptx made by - saransh dwivedi ( SD ) - SHALAKYA TANTRA - ENT - 4...
KARNAADA.pptx  made by -  saransh dwivedi ( SD ) -  SHALAKYA TANTRA - ENT - 4...KARNAADA.pptx  made by -  saransh dwivedi ( SD ) -  SHALAKYA TANTRA - ENT - 4...
KARNAADA.pptx made by - saransh dwivedi ( SD ) - SHALAKYA TANTRA - ENT - 4...
 
How to Create a Toggle Button in Odoo 17
How to Create a Toggle Button in Odoo 17How to Create a Toggle Button in Odoo 17
How to Create a Toggle Button in Odoo 17
 
Over the counter (OTC)- Sale, rational use.pptx
Over the counter (OTC)- Sale, rational use.pptxOver the counter (OTC)- Sale, rational use.pptx
Over the counter (OTC)- Sale, rational use.pptx
 
A gentle introduction to Artificial Intelligence
A gentle introduction to Artificial IntelligenceA gentle introduction to Artificial Intelligence
A gentle introduction to Artificial Intelligence
 
Ultra structure and life cycle of Plasmodium.pptx
Ultra structure and life cycle of Plasmodium.pptxUltra structure and life cycle of Plasmodium.pptx
Ultra structure and life cycle of Plasmodium.pptx
 
Personal Resilience in Project Management 2 - TV Edit 1a.pdf
Personal Resilience in Project Management 2 - TV Edit 1a.pdfPersonal Resilience in Project Management 2 - TV Edit 1a.pdf
Personal Resilience in Project Management 2 - TV Edit 1a.pdf
 
Riddhi Kevadiya. WILLIAM SHAKESPEARE....
Riddhi Kevadiya. WILLIAM SHAKESPEARE....Riddhi Kevadiya. WILLIAM SHAKESPEARE....
Riddhi Kevadiya. WILLIAM SHAKESPEARE....
 
Optical Fibre and It's Applications.pptx
Optical Fibre and It's Applications.pptxOptical Fibre and It's Applications.pptx
Optical Fibre and It's Applications.pptx
 

Corpus linguistics

  • 1. Group Members: Ayesha Azhar, Bareera Akbar, Irum Masood, Maryam Ahmed, Tahira Jabeen
  • 6. Corpus: a Latin word meaning “body” or “mass”. A collection of written texts, especially the entire works of a particular author or a body of writing on a particular subject, e.g. “the Darwinian corpus”. Plural: corpora.
  • 7. History of Corpus Linguistics: Language study is not a new idea. 1921: 30,000 words; a treasure, but of no use. 1960: with the advent of the computer came the use of collections of computer-readable text for language study. Brown Corpus of Standard American English: one million words of American English texts printed in 1961 (the corpus itself was released in 1964); the first electronic corpus.
  • 9. Corpus Linguistics: Linguistics being the scientific study of language and its structure, ‘corpus linguistics’ is the study of language “on the basis of text corpora.” The analysis does not stop at describing those texts; it also attends to their contexts.
  • 10. Place for Corpus Linguistics in Applied Linguistics  A means to explore actual patterns of language use.  A tool for developing materials for classroom language instruction.  To explore different questions about language use.  To provide powerful tools for analysis of natural languages.  To give an insight about how language use varies in different situations.
  • 11. Corpora: Corpora are large, structured sets of texts (nowadays usually stored and processed electronically). They are used for statistical analysis and hypothesis testing, such as checking occurrences or validating linguistic rules within a specific language territory.
  • 13. General Corpora  The texts that do not belong to a single text type, subject field, or register.  May include written or spoken language, or both.  May include texts produced in one country or many.  They aim to represent language in its broadest sense and to serve as a widely available resource for baseline or comparative studies of general linguistic features.
  • 14.  May be used to produce reference materials for language learning or translation.  Often used as a baseline in comparison with more specialized corpora.  Also sometimes known as ‘reference corpora’.
  • 15. Examples  Brown Corpus – 1 million words.  LOB Corpus – 1 million words.  BNC (British National Corpus) – 100 million words.
  • 16. Specialized Corpora  Texts that are designed with more specific research goals in mind – register-specific descriptions and investigations of language.  It aims to be representative of a given type of text.  Used to investigate a particular type of language.  The kind of texts included are limited:  A time frame – such as a particular century.  A social setting – such as conversations taking place in a bookshop.  A given topic – such as newspaper articles dealing with a particular thing.
  • 17. Examples  Cambridge and Nottingham Corpus of Discourse in English (CANCODE) (informal registers of British English) – 5 million words.  Michigan Corpus of Academic Spoken English (MICASE) (spoken registers in a US academic setting) – 5 million words.
  • 18. Historical or Diachronic Corpora: Texts from different periods of time. They aim to represent one or more earlier stages of a language and help to trace its development over time.
  • 19. Example: Helsinki Corpus; texts dating from 700 to 1700; 1.5 million words.
  • 20. Regional Corpora Aim at representing a regional variety of a language, such as dialects.
  • 21. Learner’s Corpora: Aim at representing the language as produced by learners of a language; they include spoken or written language samples produced by non-native speakers. They are used to identify differences among learners in word frequency and types of mistakes, and to show in what respects learners differ from each other and from native speakers.
  • 22. Example: Louvain Corpus of Native English Essays (LOCNESS); International Corpus of Learner English (ICLE); 20,000 words
  • 23. Multilingual Corpora  Any systematic collection of empirical language data enabling linguists to carry out analyses of multilingual individuals, multilingual societies or multilingual communication.
  • 24. Comparable Corpora  Two (or more) corpora in different languages (e.g. English and Spanish) or in different varieties of a language (e.g. Indian English and Canadian English).  They are designed along the same lines – will contain the same proportions of newspaper texts, novels, casual conversation, etc.  Comparable corpora of varieties of the same language can be used to compare those varieties.  Comparable corpora of different languages can be used by translators to identify differences and equivalences in each language.
  • 25. Example: The components of the International Corpus of English (ICE) are comparable corpora of 1 million words each, covering different varieties of English.
  • 26. Parallel Corpora  Two (or more) corpora in different languages, each containing texts that have been translated from one language into the other, or texts that have been produced simultaneously in two or more languages.  Can be used by translators and by learners to find potential equivalent expressions in each language and to investigate differences between languages.
  • 28.  Size  Representativeness  Registers / modes / topics  Demographics  Production / reception  Research goals  Funding  Time  Staff/students
  • 30. Written Corpora: Obtaining/creating, storing, organizing. Materials required: scanner, OCR software. Process: paper document into electronic text file. Types: newspapers and periodicals; small specialized corpora; informal writing (travel diaries, e-mail, discussion, blogs, news groups).
  • 31. Spoken Corpora: Deciding on a transcription system: I. prosodic / non-prosodic; II. representing interactional characteristics of speech (overlapping speech, back channels, pauses, non-verbal contextual events); III. permission to use data; IV. ensuring anonymity; V. avoiding impracticality of data.
  • 32. Markup: 1. Structural markup: for a written corpus, titles, authors, paragraphs, subheadings, chapters, etc.; for a spoken corpus, contextual events and paralinguistic features. 2. Header: for a written corpus, classification into categories (register, genre, topic domain, discourse mode, formality); for a spoken corpus, demographic information about the speaker (gender, social class, occupation, age, native language/dialect) and the relationship among the participants.
  • 33. Linguistic Annotation: Part-of-speech tagging (grammatical category, case assignment); prosodic annotation; phonetic annotation; syntactic parsing.
  • 34. Advantages of Tagging: Vast exploration; frequency; co-occurrence; studies of multiple meanings; automatic retrieval.
  • 36. Concordance Lines  Concordance lines are a useful tool for investigating corpora, but their use is limited by the ability of the human observer to process information.  There are some statistical calculations of collocation and corpus annotation.
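
As a rough illustration of what a concordance line is, the following Python sketch prints each occurrence of a search word with a fixed amount of context on either side (the keyword-in-context layout). The function name `kwic`, the sample sentence and the `width` parameter are assumptions made for this example; this is not how WordSmith or AntConc are implemented.

```python
import re

def kwic(text, node, width=30):
    """Return one keyword-in-context line per occurrence of `node`."""
    lines = []
    # Match the node as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the node words line up in a column.
        lines.append(f"{left:>{width}} [{match.group()}] {right:<{width}}")
    return lines

# Hypothetical sample text, invented for this illustration.
sample = ("The light was fading. She switched on the light and "
          "began to light the candles, one light at a time.")

for line in kwic(sample, "light", width=20):
    print(line)
```

Even in this toy case the reader has to scan every line by eye, which is the limitation the slide points to once a search returns thousands of hits.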
  • 37. Frequency and Key-word Lists: A frequency list is a list of all the types in a corpus together with the number of occurrences of each type. Comparing the frequency lists for two corpora can give interesting information about the differences between the two. E.g. Kennedy (1998): in a comparison between a corpus of Economics texts and one of general academic English, the words price, cost, demand, curve, firm… are frequently found in the Economics corpus.
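
A frequency list of this kind can be sketched in a few lines of Python. The tokenizer below (lowercase alphabetic strings, with an optional internal apostrophe) and the sample sentence are simplifying assumptions for illustration; real corpus tools make more careful tokenization choices.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumption: a token is a run of letters, optionally with one apostrophe part.
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def frequency_list(text):
    """Return (type, count) pairs, most frequent first."""
    return Counter(tokenize(text)).most_common()

# Hypothetical sample text, invented for this illustration.
sample = "The price of the goods rose as demand rose and the cost fell."

for word, count in frequency_list(sample)[:3]:
    print(word, count)
```

Running the same function over two corpora and placing the lists side by side is the simplest form of the comparison described above.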
  • 38. Keywords  A useful starting point in investigating a specialized corpus.  They can be lexical items which reflect the topic of a particular text but also grammatical words which convey more subtle information.
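
One simple way to approximate keywords is to compare how often each word occurs in a specialized corpus relative to a reference corpus. The sketch below uses a plain ratio of relative frequencies with add-one smoothing; this is an assumption chosen for brevity, and dedicated tools typically rely on statistics such as log-likelihood or chi-square instead. The function name `keyness` and both token lists are hypothetical.

```python
from collections import Counter

def keyness(target_tokens, reference_tokens):
    """Rank words by how much more frequent they are in the target corpus."""
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    t_total = len(target_tokens)
    r_total = len(reference_tokens)
    scores = {}
    for word, count in target.items():
        # Add-one smoothing keeps the ratio defined for words absent from the reference.
        t_rate = (count + 1) / t_total
        r_rate = (reference[word] + 1) / r_total
        scores[word] = t_rate / r_rate
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical miniature corpora, invented for this illustration.
economics = "price cost demand price firm the the curve price".split()
general = "the study of the language and the text the the".split()

for word, score in keyness(economics, general)[:3]:
    print(word, round(score, 2))
```

A score above 1 means the word is relatively more frequent in the target corpus; grammatical words such as "the" tend to score near or below 1 unless the specialized corpus uses them in an unusual way.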
  • 39. Collocation: The tendency of words to be biased in the way they co-occur. Statistical measurements of collocation are more reliable than intuition, and for this reason a corpus is essential.
  • 40. Measurements of Collocation: Computer programs that calculate collocation take a node word and count the instances of all words occurring within a particular span. Note that the count ignores punctuation marks, counts ’s as a separate word, and ignores sentence boundaries.
  • 41. Tagging and Parsing: Tagging is allocating a part of speech (POS) label to each word in a corpus, e.g. the word light is tagged as a verb, a noun or an adjective each time it occurs in the corpus. Parsing is analyzing the sentences in a corpus into their constituent parts.
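
The counting procedure described above can be sketched as follows. The tokenizer is an assumption written to mirror the three notes on the slide: punctuation is dropped, 's is split off as its own token, and the window runs across sentence boundaries. The function name `collocates`, the `span` parameter and the sample text are hypothetical, and no association statistic is computed here, only raw counts.

```python
import re
from collections import Counter

def tokenize(text):
    # Keep words only (punctuation is ignored); split off 's as a separate token.
    return re.findall(r"(?<=[a-z])'s\b|[a-z]+", text.lower())

def collocates(text, node, span=4):
    """Count every word occurring within `span` tokens of each instance of `node`."""
    tokens = tokenize(text)
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            # The window ignores sentence boundaries: it is purely positional.
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts

# Hypothetical sample text, invented for this illustration.
sample = "The firm's price rose. Demand fell, so the price fell too."

print(collocates(sample, "price", span=2).most_common())
```

In the sample, "demand" is counted as a collocate of "price" even though it belongs to the next sentence, which is the behaviour the slide notes.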
  • 42. Annotation: A general term for tagging and parsing, also used to describe other kinds of categorisation that may be performed on a corpus, e.g. the annotation of a spoken corpus for prosodic features, the annotation of a corpus of learner English for types of error, annotation of anaphora, and semantic annotation.
  • 43. Software: Special software is used to analyze a corpus and to search it for certain words or phrases. For example: SARA for the BNC; ICECUP for ICE Great Britain. Concordancers can be used for the analysis of almost any corpus.
  • 44. Concordancer: One of the most frequently used concordancers is ‘WordSmith Tools’. Its two most important tools are Concord and WordList. As an alternative to WordSmith, you can also use a concordancer called ‘AntConc’, which can be downloaded for free.
  • 45. WordSmith Concord: Click on the WordSmith icon on the desktop to open the program. Select Concord in order to search a corpus for a certain word or phrase. You can now choose a corpus and select those files of the corpus you want to analyse.
  • 47. Some further options for entering a search word or phrase: By using the asterisk *, you can widen the scope of your search. For example, entering going as a search word will return only the instances of going, and entering going to will return only the instances of going to. If you type in go*, on the other hand, you will get all words beginning with go-, e.g. going, goes, gold. Searching for *ing, you will get all words ending in -ing, e.g. swimming, dancing, sing.
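
The effect of the asterisk can be imitated in Python with the standard `fnmatch` module, whose `*` also matches any run of characters. This is only an analogy for the behaviour described above, not WordSmith's own search syntax; the function name `wildcard_search` and the token list are assumptions for illustration.

```python
from fnmatch import fnmatchcase

def wildcard_search(tokens, pattern):
    """Return the tokens matching a pattern in which * stands for any characters."""
    # Note: fnmatch also treats ? and [...] as special characters.
    return [t for t in tokens if fnmatchcase(t.lower(), pattern.lower())]

# Hypothetical token list, invented for this illustration.
tokens = "going goes gold go swimming dancing sing song".split()

print(wildcard_search(tokens, "going"))
print(wildcard_search(tokens, "go*"))
print(wildcard_search(tokens, "*ing"))
```

As on the slide, a wildcard matches on spelling alone, so `go*` also returns "gold" and `*ing` also returns "sing"; such hits have to be filtered out by hand or with annotation.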
  • 48. WordSmith WordList  The tool WordList generates word lists of the selected text files and enables you to compare the length of text files or corpora.  Moreover, you can use WordList to compare the frequency of a word in different text files or across genres and to identify common clusters.
  • 49. AntConc Concordance tool  This tool shows the words or word strings you want to analyse in their textual context.  Select the files you want to analyse: File > Open file(s)  Choose the tab "Concordance"  Type in a search word (“Search Term”, bottom left-hand corner)
  • 51. More reliable than intuition. Language patterns are easily identified. Deconstruct texts to discover patterns. Track the development of specific features in the history of English. Test hypotheses on specific language features empirically. Follow language acquisition properly. Draw conclusions from large amounts of linguistic data. Shows frequency rather than possibility. Does not always give a complete picture.
  • 53. More communicative modes: spoken corpora, interactional corpora (classroom interactions, authentic interactions, etc.), multimodal corpora, corpora of textbook materials, etc. More text types and genres, to cover text types which are less represented in corpora (letters, emails, leaflets, TV programs, book synopses, recipes, short notes, chat room logs, etc.).
  • 54.  More longitudinal language data:  from beginners to advanced levels, from children to adults, from L1 to L2.  More variables:  more language learning variables should be collected and encoded at the time of corpus collection (proficiency, language aptitude, motivation, more precise description of the task, of temporal, social or situational settings, etc).  More languages:  to counterbalance the predominance of Anglo-Saxon native and learner corpora and to foster the computer-aided analysis of different languages and language families.
  • 56.  Prior to Corpus Linguistics it was difficult to note patterns of use in language, since observing and tracking usage patterns was a monumental task.  Scholars have used various types of corpora to gain insights into changes related to language development, both in first and second language situations.  Corpus Linguistics can help in telling about language use and how it varies in different situations.