 Group Members:
 Ayesha Azhar
 Bareera Akbar
 Irum Masood
 Maryam Ahmed
 Tahira Jabeen
Incomprehensible
consciously
A sea of words
Essence of human beings
 From the Latin word for “body” or “mass”
 A collection of written texts, especially the entire
works of a particular author or a body of writing
on a particular subject: “the Darwinian corpus”
Corpora (plural)
History of Corpus Linguistics
 Language study is not a new idea.
 1921: 30,000 words. A treasure, but of no use.
 1960: with the advent of computers…
 The use of collections of computer-readable text for
language study.
 Brown Corpus of Standard American English.
 One million words of American English texts printed in 1961
(the corpus was first released in 1964).
 Widely regarded as the first electronic corpus.
Corpus Linguistics
 Linguistics being the scientific study of language
and its structure, ‘corpus linguistics’ is the study
of language “on the basis of text corpora.”
 The analysis does not stop at the description of
those texts; the contexts in which they occur are also
examined.
Place for Corpus Linguistics in
Applied Linguistics
 A means to explore actual patterns of language use.
 A tool for developing materials for classroom language
instruction.
 To explore different questions about language use.
 To provide powerful tools for analysis of natural
languages.
 To give an insight about how language use varies in
different situations.
Corpora
 Corpora are large and structured sets of texts
(nowadays usually electronically stored and
processed).
 They are used to do statistical analysis and
hypothesis testing, checking occurrences or
validating linguistic rules within a specific
language territory.
General Corpora
 Contain texts that do not belong to a single text type,
subject field, or register.
 May include written or spoken language, or both.
 May include texts produced in one country or
many.
 They aim to represent language in its broadest
sense and to serve as a widely available resource
for baseline or comparative studies of general
linguistic features.
 May be used to produce reference materials for
language learning or translation.
 Often used as a baseline in comparison with more
specialized corpora.
 Also sometimes known as ‘reference corpora’.
Examples
 Brown Corpus – 1 million words.
 LOB Corpus – 1 million words.
 BNC (British National Corpus) – 100 million
words.
Specialized Corpora
 Texts that are designed with more specific research goals
in mind – register-specific descriptions and
investigations of language.
 They aim to be representative of a given type of text.
 Used to investigate a particular type of language.
 The kinds of texts included are limited by:
 A time frame – such as a particular century.
 A social setting – such as conversations taking place in
a bookshop.
 A given topic – such as newspaper articles dealing
with a particular subject.
Examples
 Cambridge and Nottingham Corpus
of Discourse in English
(CANCODE) (informal registers of
British English) – 5 million words.
 Michigan Corpus of Academic
Spoken English (MICASE) (spoken
registers in a US academic setting) –
approximately 1.8 million words.
Historical or Diachronic Corpora
Texts from different
periods of time.
Aim at representing one or
more earlier stages of a
language.
They help to trace the
development of a language
over time.
Example
Helsinki Corpus – texts from c. 700 to 1700,
about 1.5 million words
Regional Corpora
Aim at representing a regional variety of a
language, such as dialects.
Learner Corpora
 Aim at representing the language as produced by the
learners of a language, and they include spoken or
written language samples produced by non-native
speakers.
 They are used to identify differences among learners’
frequency of words and types of mistakes.
 They show in what respects learners differ from each other and
from native speakers of the language.
Example
 Louvain Corpus of Native
English Essays (LOCNESS)
 International Corpus of Learner
English (ICLE)
 20,000 words
Multilingual Corpora
 Any systematic collection of empirical language data
enabling linguists to carry out analyses of multilingual
individuals, multilingual societies or multilingual
communication.
Comparable Corpora
 Two (or more) corpora in different languages (e.g. English
and Spanish) or in different varieties of a language (e.g.
Indian English and Canadian English).
 They are designed along the same lines – will contain the
same proportions of newspaper texts, novels, casual
conversation, etc.
 Comparable corpora of varieties of the same language can be
used to compare those varieties.
 Comparable corpora of different languages can be used by
translators to identify differences and equivalences in each
language.
Example
 The International Corpus of English (ICE) consists of
comparable corpora of 1 million words each, representing
different varieties of English.
Parallel Corpora
 Two (or more) corpora in different languages, each
containing texts that have been translated from one
language into the other, or texts that have been produced
simultaneously in two or more languages.
 Can be used by translators and by learners to find
potential equivalent expressions in each language and to
investigate differences between languages.
 Size
 Representativeness
 Registers / modes / topics
 Demographics
 Production / reception
 Research goals
 Funding
 Time
 Staff/students
Written Corpora
 Obtaining/creating, Storing, Organizing
Materials Required:
-scanner, OCR software
Process:
-converting paper documents into electronic text files
Types:
-newspapers, periodicals
-small specialized corpora
-informal writings (travel diaries, e-mail,
discussion, blogs, news groups)
Spoken Corpora
 deciding on a transcription system
I. prosodic/non-prosodic
II. representing interactional characteristics of
speech (overlapping speech, backchannels,
pauses, non-verbal contextual events)
III. permission to use data
IV. ensuring anonymity
V. avoiding impracticality of data
Markup
1. Structural markups:
-written corpus: Titles, authors, paragraphs, subheadings,
chapters etc.
-spoken corpus: Contextual events, paralinguistic features
2. Header:
-written corpus:
Classification into categories (register, genre, topic domain, discourse
mode, formality)
-spoken corpus:
Demographic information about the speaker (gender, social
class, occupation, age, native language/dialect)
Relationship among the participants
Linguistic Annotation
Parts of Speech Tagging:
Grammatical category, case assigning
Prosodic Annotation
Phonetic Annotation
Syntactic Parsing
Advantages of Tagging
Vast exploration
Frequency
Co-occurrence
Multiple meaning studies
Automatically retrievable
Concordance Lines
 Concordance lines are a useful tool for
investigating corpora, but their use is limited by
the ability of the human observer to process
information.
 Methods that go beyond concordance lines include statistical
calculations of collocation and corpus annotation.
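As a rough illustration of what a concordance line is, the sketch below prints each occurrence of a search word with a fixed amount of context on either side (the keyword-in-context layout). The function name `concordance`, the `width` parameter and the sample sentences are assumptions made for this example; they are not taken from any particular concordancer.

```python
import re

def concordance(text, node, width=30):
    """Return one keyword-in-context line per occurrence of `node` in `text`."""
    lines = []
    # \b stops "light" from matching inside "lighter"; matching ignores case.
    pattern = r"\b" + re.escape(node) + r"\b"
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the node words line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines

# Invented sample text, used only to show the layout.
sample = ("She turned on the light. The light rain fell all day. "
          "They light a fire every evening.")
kwic_lines = concordance(sample, "light", width=20)
for line in kwic_lines:
    print(line)
```

Even in this tiny sample the three lines show the search word as a noun, an adjective and a verb, which is the kind of pattern a human reader picks out by scanning the column of node words.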
Frequency and Key-word Lists
 A frequency list is a list of all the types in a corpus
together with the number of occurrences of each type.
Comparing the frequency lists for two corpora can give
interesting information about the differences between them.
e.g. Kennedy (1998):
 In a comparison between a corpus of Economics texts
and one of general academic English, the words price,
cost, demand, curve, firm… are found more frequently in the
Economics corpus.
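A frequency list of this kind can be sketched in a few lines of Python. The example below is only a minimal illustration: the tokenisation rule (lowercase letters and apostrophes) and the two one-line "corpora" are assumptions invented for the example, not data from Kennedy (1998).

```python
import re
from collections import Counter

def frequency_list(text):
    """Count how often each word type occurs (lowercased, punctuation dropped)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

# Two tiny invented "corpora", used only to show the mechanics.
economics = "The price rose as demand rose. The firm cut the price to meet demand."
general = "The study shows that the results support the claim. The data are clear."

econ_freq = frequency_list(economics)
gen_freq = frequency_list(general)

# The most frequent types in the first list.
print(econ_freq.most_common(3))

# Types that occur in the first list but never in the second.
only_in_econ = sorted(w for w in econ_freq if w not in gen_freq)
print(only_in_econ)
```

With real corpora the raw counts would normally be converted to relative frequencies (for example per million words) before comparison, because the two corpora are rarely the same size.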
Keywords
 A useful starting point in investigating a
specialized corpus.
 They can be lexical items which reflect the topic
of a particular text but also grammatical words
which convey more subtle information.
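The slides do not say how keywords are calculated. One widely used option is the log-likelihood statistic, which compares a word's frequency in the specialized corpus with its frequency in a reference corpus. The sketch below assumes that statistic; the function name, the word counts and the corpus sizes are all hypothetical.

```python
import math

def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood keyness of one word across a study and a reference corpus."""
    total = size_study + size_ref
    combined = freq_study + freq_ref
    # Expected frequencies if the word were spread evenly over both corpora.
    expected_study = size_study * combined / total
    expected_ref = size_ref * combined / total
    ll = 0.0
    for observed, expected in ((freq_study, expected_study),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll

# Hypothetical counts in two corpora of 1,000 words each.
study = {"price": 50, "the": 60, "of": 40}
reference = {"price": 5, "the": 62, "of": 41}

ranked = sorted(
    study,
    key=lambda w: log_likelihood(study[w], 1000, reference.get(w, 0), 1000),
    reverse=True,
)
print(ranked)
```

A high score only says that the two frequencies differ more than chance would suggest; it does not say in which direction, so the raw frequencies still need to be checked.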
Collocation
 The tendency of words to be biased in the way
they co-occur.
 Statistical measurements of collocation are more
reliable than intuition, and for this reason a
corpus is essential.
Measurements of Collocation
 Computer programs, which calculate collocation,
take a node word and count the instances of all words
occurring within a particular span.
 Note: the count ignores punctuation marks.
 It counts ’s as a separate word.
 It ignores sentence boundaries.
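The counting procedure described above can be sketched as follows. This is a simplified illustration, not the algorithm of any specific program: the function names, the default span of four words on each side and the sample text are assumptions.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text, drop punctuation and split off 's as its own token."""
    text = text.lower().replace("’", "'")
    return re.findall(r"'s|[a-z]+", text)

def collocates(tokens, node, span=4):
    """Count every word occurring within `span` tokens of each instance of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left = tokens[max(0, i - span):i]
        right = tokens[i + 1:i + 1 + span]
        # The token list is flat, so the window runs across sentence boundaries.
        counts.update(left + right)
    return counts

# Invented sample text, used only to show the mechanics.
tokens = tokenize("The price fell. Demand rose, so the firm's price rose again.")
counts = collocates(tokens, "price", span=2)
print(counts)
```

In the sample, "demand" is counted as a collocate of "price" even though the two words are in different sentences, and "'s" appears in the counts as a word in its own right.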
Tagging and Parsing
 Tagging is allocating a part of speech (POS) label
to each word in a corpus.
 e.g. the word light is tagged as a verb, a noun
or an adjective each time it occurs in the corpus.
 Parsing is analyzing the sentences in a corpus
into their constituent parts, that is, doing a grammatical analysis.
Annotation
 General term for tagging and parsing, and also
used to describe other kinds of categorisation that
may be performed on a corpus.
Examples:
 The annotation of a spoken corpus for
prosodic features.
 The annotation of a corpus of learner English for
types of error.
 Annotation of anaphora and semantic annotation.
Software
 Special software is used to analyze a corpus and to search it for
certain words or phrases.
For example:
• SARA for the BNC
• ICECUP for ICE-GB (the British component of ICE)
• Concordancers can be used for the analysis of almost
any corpus.
Concordancer
 One of the most frequently used concordancers is
‘WordSmith Tools’.
 Its two most important tools are:
 Concord and WordList
 As an alternative to WordSmith, you can also use a
concordancer called ‘AntConc’, which can be
downloaded for free.
WordSmith Concord
 Click on the WordSmith icon on the desktop to
open the program. Select Concord in order to
search a corpus for a certain word or phrase. You
can now choose a corpus and select those files of
the corpus you want to analyse.
Some further options for entering a search word or
phrase:
 By using the asterisk *, you can widen the scope of your
search. For example, entering going as a search word will
provide you only with all instances of going; entering going
to with all instances of going to. If you type in go*, on the
other hand, you will get all words beginning with go-, e.g.
going, goes, gold. Searching for *ing, you will get all words
ending in –ing, e.g. swimming, dancing, sing.
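The effect of the asterisk can be imitated outside WordSmith. The sketch below uses Python's standard `fnmatch` module, whose `*` also stands for any run of characters; it illustrates the idea only and makes no claim about how WordSmith itself performs the search. The function name and the word list are assumptions for the example.

```python
from fnmatch import fnmatchcase

def wildcard_search(words, pattern):
    """Return the words that match `pattern`, where * stands for any characters."""
    # fnmatchcase also treats ? and [...] as wildcards; only * is used here.
    return [w for w in words if fnmatchcase(w.lower(), pattern.lower())]

words = ["going", "goes", "gold", "swimming", "dancing", "sing", "ago"]

exact = wildcard_search(words, "going")   # only the exact word form
prefix = wildcard_search(words, "go*")    # every word beginning with go-
suffix = wildcard_search(words, "*ing")   # every word ending in -ing

print(exact)
print(prefix)
print(suffix)
```

As in the slide's example, the prefix search returns unrelated words such as gold and the suffix search returns sing, so wildcard results usually need manual checking.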
WordSmith WordList
 The tool WordList generates word lists of the
selected text files and enables you to compare the
length of text files or corpora.
 Moreover, you can use WordList to compare the
frequency of a word in different text files or
across genres and to identify common clusters.
AntConc Concordance tool
 This tool shows the words or word strings you want to
analyse in their textual context.
 Select the files you want to analyse: File > Open file(s)
 Choose the tab "Concordance"
 Type in a search word (“Search Term”, bottom left-hand
corner)
 More reliable than intuition.
 Language patterns are easily identified.
 Deconstruct texts to discover patterns.
 Track the development of specific features in the
history of English.
 Test hypotheses on specific language features
empirically.
 Follow language acquisition properly.
 Draw conclusions from large amounts of linguistic data.
 A corpus shows frequency (what is common) rather than possibility (what can occur).
 A corpus does not always give a complete picture.
 More communicative modes:
 spoken corpora, interactional corpora (classroom
interactions, authentic interactions, etc) multimodal
corpora, corpora of textbook materials, etc.
 More text types and genres, to cover text types which are
less represented in corpora (letters, emails, leaflets, TV
programs, book synopses, recipes, short notes, chat room
logs, etc.).
 More longitudinal language data:
 from beginners to advanced levels, from children to adults,
from L1 to L2.
 More variables:
 more language learning variables should be collected and
encoded at the time of corpus collection (proficiency, language
aptitude, motivation, more precise description of the task, of
temporal, social or situational settings, etc).
 More languages:
 to counterbalance the predominance of Anglo-Saxon native and
learner corpora and to foster the computer-aided analysis of
different languages and language families.
 Prior to Corpus Linguistics it was difficult to note patterns of
use in language, since observing and tracking usage patterns
was a monumental task.
 Scholars have used various types of corpora to gain insights
into changes related to language development, both in first
and second language situations.
 Corpus Linguistics can help in telling about language use and
how it varies in different situations.
Corpus linguistics

Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
 
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptxMARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
 
Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptx
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
 
How to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERPHow to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERP
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
 
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptxStudents, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
 
Introduction to Quality Improvement Essentials
Introduction to Quality Improvement EssentialsIntroduction to Quality Improvement Essentials
Introduction to Quality Improvement Essentials
 

Corpus linguistics

  • 1.  Group Members:  Ayesha Azhar  Bareera Akbar  Irum Masood  Maryam Ahmed  Tahira Jabeen
  • 3.
  • 4.
  • 5.
  • 6.  A Latin word meaning “body / mass”  A collection of written texts, especially the entire works of a particular author or a body of writing on a particular subject: “the Darwinian corpus”  Corpora (plural)
  • 7. History of Corpus Linguistics  Language study is not a new idea.  1921: 30,000 words. A Treasure, but of no use.  1960 with the advent of computer....  The use of collections of COMPUTER-READABLE text for language study.  Brown Corpus of Standard American English.  One million words of American English texts printed in 1961.  First electronic corpus
  • 8.
  • 9. Corpus Linguistics  Linguistics being the scientific study of language and its structure, ‘corpus linguistics’ is the study of language “on the basis of text corpora.”  The analysis does not stop at the description of those texts; rather the contexts are also focused upon.
  • 10. Place for Corpus Linguistics in Applied Linguistics  A means to explore actual patterns of language use.  A tool for developing materials for classroom language instruction.  To explore different questions about language use.  To provide powerful tools for analysis of natural languages.  To give an insight about how language use varies in different situations.
  • 11. Corpora  ‘Corpora’ are large and structured sets of texts (nowadays usually electronically stored and processed).  They are used for statistical analysis and hypothesis testing, checking occurrences or validating linguistic rules within a specific language territory.
  • 12.
  • 13. General Corpora  The texts that do not belong to a single text type, subject field, or register.  May include written or spoken language, or both.  May include texts produced in one country or many.  They aim to represent language in its broadest sense and to serve as a widely available resource for baseline or comparative studies of general linguistic features.
  • 14.  May be used to produce reference materials for language learning or translation.  Often used as a baseline in comparison with more specialized corpora.  Also sometimes known as ‘reference corpora’.
  • 15. Examples  Brown Corpus – 1 million words.  LOB Corpus – 1 million words.  BNC (British National Corpus) – 100 million words.
  • 16. Specialized Corpora  Texts that are collected with more specific research goals in mind – register-specific descriptions and investigations of language.  They aim to be representative of a given type of text.  Used to investigate a particular type of language.  The kinds of texts included are limited by:  A time frame – such as a particular century.  A social setting – such as conversations taking place in a bookshop.  A given topic – such as newspaper articles dealing with a particular subject.
  • 17. Examples  Cambridge and Nottingham Corpus of Discourse in English (CANCODE) (informal registers of British English) – 5 million words.  Michigan Corpus of Academic Spoken English (MICASE) (spoken registers in a US academic setting) – 5 million words.
  • 18. Historical or Diachronic Corpora Texts from different periods of time. Aim at representing an earlier stage(s) of a language. They help to trace the development of a language over time.
  • 19. Example Helsinki Corpus - 700 to 1700 texts 1.5 million words
  • 20. Regional Corpora Aim at representing a regional variety of a language, such as dialects.
  • 21. Learner Corpora  Aim at representing the language as produced by the learners of a language; they include spoken or written language samples produced by non-native speakers.  They are used to identify differences among learners in word frequency and types of mistakes.  They show in what respects learners differ from each other and from native speakers.
  • 22. Example  Louvain Corpus of Native English Essays (LOCNESS)  International Corpus of Learner English (ICLE)  20,000 words
  • 23. Multilingual Corpora  Any systematic collection of empirical language data enabling linguists to carry out analyses of multilingual individuals, multilingual societies or multilingual communication.
  • 24. Comparable Corpora  Two (or more) corpora in different languages (e.g. English and Spanish) or in different varieties of a language (e.g. Indian English and Canadian English).  They are designed along the same lines – will contain the same proportions of newspaper texts, novels, casual conversation, etc.  Comparable corpora of varieties of the same language can be used to compare those varieties.  Comparable corpora of different languages can be used by translators to identify differences and equivalences in each language.
  • 25. Example  The components of the International Corpus of English (ICE) are comparable corpora of 1 million words each, covering different varieties of English.
  • 26. Parallel Corpora  Two (or more) corpora in different languages, each containing texts that have been translated from one language into the other, or texts that have been produced simultaneously in two or more languages.  Can be used by translators and by learners to find potential equivalent expressions in each language and to investigate differences between languages.
  • 27.
  • 28.  Size  Representativeness  Registers / modes / topics  Demographics  Production / reception  Research goals  Funding  Time  Staff/students
  • 29.
  • 30. Written Corpora  Obtaining/creating, Storing, Organizing Materials Required: -scanner, OCR software Process: -paper document into electronic text file Types: -newspapers, periodicals -small specialized corpora -informal writings (travel diaries, e-mail, discussion, blogs, news groups)
  • 31. Spoken Corpora  deciding on a transcription system I. prosodic/non-prosodic II. representing interactional characteristics of speech (overlapping speech, back channels, pauses, non-verbal contextual events) III. permission to use data IV. ensuring anonymity V. avoiding impracticality of data
  • 32. Markup 1. Structural markup: -written corpus: Titles, authors, paragraphs, subheadings, chapters etc. -spoken corpus: Contextual events, paralinguistic features 2. Header: -written corpus: Classification into categories (register, genre, topic domain, discourse mode, formality) -spoken corpus: Demographic information about the speaker (gender, social class, occupation, age, native language/dialect); relationship among the participants
  • 33. Linguistic Annotation Parts of Speech Tagging: Grammatical category, case assigning Prosodic Annotation Phonetic Annotation Syntactic Parsing
  • 34. Advantages of Tagging Vast exploration Frequency Co-occurrence Multiple meaning studies Automatically retrievable
  • 35.
  • 36. Concordance Lines  Concordance lines are a useful tool for investigating corpora, but their use is limited by the ability of the human observer to process information.  There are some statistical calculations of collocation and corpus annotation.
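A concordance line shows each occurrence of a search word with a few words of context on either side. The sketch below is only an illustration of that idea, not how any particular concordancer works: the function names `tokenize` and `concordance`, the `width` parameter, and the sample sentence are all assumptions made for this example, and the tokenizer is deliberately crude.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())

def concordance(tokens, node, width=3):
    """Return one (left context, node, right context) triple per hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

# Hypothetical sample text, invented for this illustration.
sample = "The light was dim. She tried to light the lamp, but the light went out."

# Print the hits with the node word aligned in a centre column.
for left, node, right in concordance(tokenize(sample), "light"):
    print(f"{left:>20} | {node} | {right}")
```

Aligning the node word in one column is what lets a human reader scan for patterns on either side, which is also why the number of lines a person can usefully read is the limiting factor.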
  • 37. Frequency and Key-word Lists  A frequency list is a list of all the types in a corpus together with the number of occurrences of each type.  Comparing the frequency lists for two corpora can give interesting information about the differences between the two sets of texts.  e.g. Kennedy (1998): a comparison between a corpus of Economics texts and one of general academic English → the words price, cost, demand, curve, firm… are frequently found in the Economics corpus.
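A frequency list and a simple comparison between two corpora can be sketched in a few lines. Everything named here is an assumption for illustration: `frequency_list`, `per_thousand`, and the two tiny sample texts are invented and are not Kennedy's data. Frequencies are normalised per thousand tokens so that corpora of different sizes can be compared.

```python
import re
from collections import Counter

def frequency_list(text):
    """Count how often each word type occurs in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)

def per_thousand(freqs, word):
    """Normalised frequency: occurrences per 1,000 tokens."""
    total = sum(freqs.values())
    return 1000 * freqs[word] / total if total else 0.0

# Hypothetical miniature "corpora", invented for this example.
economics = "The price rose as demand rose. The firm cut cost when demand fell."
general = "The study found that the method was sound. The results were clear."

econ_freqs = frequency_list(economics)
gen_freqs = frequency_list(general)

# Most frequent types in the first text, then a side-by-side comparison.
print(econ_freqs.most_common(3))
for word in ("demand", "the"):
    print(word, round(per_thousand(econ_freqs, word), 1),
          round(per_thousand(gen_freqs, word), 1))
```

A word such as "demand" that is much more frequent in one list than in the other is a candidate topic word for that corpus, while grammatical words such as "the" tend to be frequent in both.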
  • 38. Keywords  A useful starting point in investigating a specialized corpus.  They can be lexical items which reflect the topic of a particular text but also grammatical words which convey more subtle information.
  • 39. Collocation  The tendency of words to be biased in the way they co-occur.  Statistical measurements of collocation are more reliable, and for this reason a corpus is essential.
  • 40. Measurements of Collocation  Computer programs, which calculate collocation, take a node word and count the instances of all words occurring within a particular span.  (note) the count ignores punctuation marks.  Counts ‘s’ as a separate word.  Ignores sentence boundaries.
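The counting step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of any specific program: the function name `collocates`, the `span` parameter, and the sample text are hypothetical. Because the tokenizer keeps only letters, punctuation is dropped, an apostrophe splits off "s" as its own token, and a window can run across a sentence boundary, mirroring the three notes on the slide.

```python
import re
from collections import Counter

def collocates(text, node, span=4):
    """Count every word occurring within `span` tokens of the node word."""
    # Keeping only runs of letters discards punctuation, so windows
    # ignore sentence boundaries and "firm's" becomes "firm" + "s".
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
        counts.update(window)
    return counts

# Hypothetical sample text, invented for this example.
sample = "Strong tea is nice. Strong coffee is better. He made strong tea."
counts = collocates(sample, "strong", span=1)
print(counts.most_common())
```

Raw co-occurrence counts like these are only the input to a collocation measure; a statistical score would additionally compare them with how often each word occurs in the corpus overall.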
  • 41. Tagging and Parsing  Tagging is allocating a part of speech (POS) label to each word in a corpus.  e.g. the word light is tagged as a verb, a noun or an adjective each time it occurs in the corpus.  Parsing is analyzing the sentences in a corpus into their constituent parts, that is, doing.
  • 42. Annotation  General term for tagging and parsing, and also used to describe other kinds of categorisation that may be performed on a corpus. (e.g.) The annotation of a spoken corpus for prosodic features.  The annotation of a corpus of learner English for types of error.  Annotation of anaphora and semantic annotation.
  • 43. Software  Special software is used in order to analyze a corpus and certain words or phrases. For example • SARA for the BNC • ICECUP for ICE Great Britain. • Concordancers can be used for the analysis of almost any corpus.
  • 44. Concordancer  One of the most frequently used concordancers is ‘WordSmith Tools’.  Its two most important tools are:  Concord and WordList  As an alternative to WordSmith, you can also use a concordancer called ‘AntConc’ which can be downloaded for free.
  • 45. WordSmith Concord  Click on the WordSmith icon on the desktop to open the program. Select Concord in order to search a corpus for a certain word or phrase. You can now choose a corpus and select those files of the corpus you want to analyse.
  • 46.
  • 47. Some further options for entering a search word or phrase:  By using the asterisk *, you can widen the scope of your search. For example, entering going as a search word will provide you only with all instances of going; entering going to with all instances of going to. If you type in go*, on the other hand, you will get all words beginning with go-, e.g. going, goes, gold. Searching for *ing, you will get all words ending in –ing, e.g. swimming, dancing, sing.
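The asterisk behaviour described above can be imitated with Python's standard `fnmatch` module, which uses the same `*` convention. This is only a sketch of the matching idea and does not reproduce WordSmith's own search engine; the function name `wildcard_search` and the token list are assumptions for illustration, and multi-word searches such as going to are not handled.

```python
from fnmatch import fnmatchcase

def wildcard_search(tokens, pattern):
    """Return the tokens matching a pattern in which * stands for any characters."""
    return [t for t in tokens if fnmatchcase(t.lower(), pattern.lower())]

# Hypothetical token list, invented for this example.
tokens = ["going", "goes", "gold", "ago", "swimming", "dancing", "sing", "single"]

print(wildcard_search(tokens, "going"))  # exact form only
print(wildcard_search(tokens, "go*"))    # every word beginning with go-
print(wildcard_search(tokens, "*ing"))   # every word ending in -ing
```

As the slide's own examples show, a wildcard matches on spelling alone, so go* also returns gold and *ing also returns sing; such hits have to be filtered out by hand or with part-of-speech tags.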
  • 48. WordSmith WordList  The tool WordList generates word lists of the selected text files and enables you to compare the length of text files or corpora.  Moreover, you can use WordList to compare the frequency of a word in different text files or across genres and to identify common clusters.
  • 49. AntConc Concordance tool  This tool shows the words or word strings you want to analyse in their textual context.  Select the files you want to analyse: File > Open file(s)  Choose the tab "Concordance"  Type in a search word (“Search Term”, bottom left-hand corner)
  • 50.
  • 51.  More reliable than intuition.  Language patterns are easily identified.  Deconstruct texts to discover patterns.  Track the development of specific features in the history of English.  Test hypotheses on specific language features empirically.  Follow language acquisition properly.  Draw conclusions on large amounts of linguistic data.  Frequency rather than the possibility.  Not always a complete picture.
  • 52.
  • 53.  More communicative modes:  spoken corpora, interactional corpora (classroom interactions, authentic interactions, etc.), multimodal corpora, corpora of textbook materials, etc.  More text types and genres, to cover text types which are less represented in corpora (letters, emails, leaflets, TV programs, book synopses, recipes, short notes, chat room logs, etc.).
  • 54.  More longitudinal language data:  from beginners to advanced levels, from children to adults, from L1 to L2.  More variables:  more language learning variables should be collected and encoded at the time of corpus collection (proficiency, language aptitude, motivation, more precise description of the task, of temporal, social or situational settings, etc).  More languages:  to counterbalance the predominance of Anglo-Saxon native and learner corpora and to foster the computer-aided analysis of different languages and language families.
  • 55.
  • 56.  Prior to Corpus Linguistics it was difficult to note patterns of use in language, since observing and tracking usage patterns was a monumental task.  Scholars have used various types of corpora to gain insights into changes related to language development, both in first and second language situations.  Corpus Linguistics can help in telling about language use and how it varies in different situations.