Corpus linguistics
(Schmitt,2020)
TABLE OF CONTENTS
01 What is corpus linguistics?
02 Corpus design and compilation
03 What can a corpus tell us?
04 Overview of different types of corpus studies
05 How can corpora inform language teaching?
01 What is corpus linguistics?
 ‘Corpus linguistics’ has enjoyed much greater popularity, both as a means to explore actual
patterns of language use and as a tool for developing materials for classroom language instruction.
 Corpus linguistics uses large collections of both spoken and written natural texts that are stored on
computers.
 One of the major contributions of corpus linguistics is in the area of exploring patterns of language
use.
 In its present-day sense, the term ‘corpus’, like corpus linguistics itself, is synonymous with computerized corpora and methods, but this was not always the case.
 Corpus linguistics is an empirical approach to linguistic analysis, based on naturally occurring spoken or written data.
 Advances in technology have led to a number of advantages for corpus linguists, including the collection of larger language samples and faster, more efficient text processing.
 Characteristics of corpus-based analyses of language:
o It is empirical, analysing the actual patterns of use in natural texts.
o It utilizes a large and principled collection of natural texts.
o It makes extensive use of computers for analysis, using both automatic and interactive techniques.
o It depends on both quantitative and qualitative analytical techniques.
 A corpus refers to a large principled collection of natural texts.
 The use of natural texts means that language has been collected from naturally occurring sources.
 Examples of well-known corpora:
o The British National Corpus (BNC)
o The Corpus of Contemporary American English (COCA)
o The Brown Corpus
 The text collection process for building a corpus needs to be principled to ensure
representativeness and balance.
 The linguistic features or research questions being investigated will shape the collection of texts
used in creating the corpus.
 Although computers make possible a wide range of sophisticated statistical techniques, human
analysts are still needed to decide what information is worth searching for, to extract that
information from the corpus and to interpret the findings.
 Corpus linguistics brings together aspects of quantitative and qualitative techniques.
 The quantitative analyses provide an accurate view of more macro-level characteristics, whereas
the qualitative analyses provide the complementary micro-level perspective.
02 Corpus design and compilation
 Although there is no minimum size for a text collection to be considered a corpus, an early
standard size set by the creators of the Brown Corpus was one million words.
 A number of well-known specialized corpora are much smaller than that, but there is a
general assumption that for most tasks within corpus linguistics, larger corpora are more
valuable.
 Many modern corpora are available to other researchers, often free of charge.
 They enable researchers all over the world to access the same sets of data, which
encourages a higher degree of accountability in data analysis and permits collaborative
studies by different researchers.
Types of corpora
A. General corpora
o The BNC contains 100 million words; COCA had 560 million words.
o The Brown and LOB corpora are much smaller, at a mere one million words each.
 General corpora are designed to be balanced and to include language samples from a wide range of registers or genres.
 Most of the early general corpora were limited to written language, because written texts are vastly easier and cheaper to compile than transcripts of speech.
 A few corpora are dedicated to spoken discourse, for example:
o The Cambridge and Nottingham Corpus of Discourse in English (CANCODE).
B. Specialized corpora
 They are designed with more specific research goals in mind and are considered the most crucial ‘growth area’ for corpus linguistics.
 Specialized corpora may include both spoken and written components.
o International Corpus of English (ICE)
o The TOEFL-2000 Spoken and Written Academic Language Corpus
 A specialized corpus focuses on a particular spoken or written variety of language.
• Historical corpora such as the Archer Corpus (two million words of British and American English dating from
1650 to 1990).
• Learner corpora (spoken or written language samples produced by non-native speakers).
Issues in corpus design
 One of the most important factors in corpus linguistics is the design of the corpus.
 The design of the corpus affects all subsequent analyses and results.
 The composition of the corpus should reflect the anticipated research goals.
 For example, when comparing patterns of language found in spoken and written discourse:
• The corpus has to include a range of possible spoken and written texts.
• Only then can the information derived from the corpus accurately reflect the variation possible in the patterns being compared across the two registers.
 A well-designed corpus should aim to be representative of the types of language included in it.
 There are many different ways to conceive of and justify representativeness:
a. Representativeness of different registers (fiction, casual conversation) and topics (national vs local news).
b. Representativeness of the demographics of the speakers or writers (nationality, gender, education level).
c. Representativeness based on production or reception (e-mail messages, newspapers).
 All these issues must be weighed when deciding how much of each category to include.
 In thinking about the research goals of a corpus, compilers must bear in mind the intended
distribution of the corpus.
Corpus compilation
 When creating a corpus, data collection involves obtaining or creating electronic versions of the
target texts, and storing and organizing them.
 Data collection for a written corpus often means using a scanner and optical character recognition (OCR) software to scan paper documents into electronic text files.
 OCR is not error-free, so manual proofreading and error correction are necessary.
 Data collection for a spoken corpus is lengthy and expensive, and requires decisions about:
• A transcription system (for example, an orthographic transcription system)
• The representation of interactional characteristics of the speech in the transcripts
 An important issue for both spoken and written corpora during data collection is obtaining permission to use the data for the corpus.
Markup and annotation
 A simple corpus could consist of raw text, with no additional information provided about the
origins, authors, speakers, structure or contents of the texts themselves.
 Encoding some of the information in the form of markup makes the corpus more useful.
 Structural markup refers to the use of codes in the texts to identify structural features of the
text.
o A written corpus (titles, authors, chapters)
o A spoken corpus (speakers, paralinguistic features)
 Many corpora provide information about the contents and creation of each text in what is called
a header.
 Headers include classifications of the text into categories, such as register, genre, topic domain.
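As a minimal sketch of how a header and structural markup might be used together, the Python snippet below parses one invented spoken text. The element and attribute names (`text`, `header`, `register`, `topic`, `u`, `who`) are assumptions made for illustration and are not the markup scheme of any particular corpus.

```python
import xml.etree.ElementTree as ET

# Hypothetical marked-up spoken text: the header classifies the text,
# and each <u> (utterance) element records who is speaking.
SAMPLE = """
<text id="s001">
  <header>
    <register>casual conversation</register>
    <topic>weekend plans</topic>
  </header>
  <body>
    <u who="A">are you coming on Saturday</u>
    <u who="B">I think so</u>
    <u who="A">great see you then</u>
  </body>
</text>
"""

root = ET.fromstring(SAMPLE.strip())

# Read the classification stored in the header.
header = {child.tag: child.text for child in root.find("header")}

# Use the structural markup to count words per speaker.
words_per_speaker = {}
for utterance in root.iter("u"):
    speaker = utterance.get("who")
    n_words = len(utterance.text.split())
    words_per_speaker[speaker] = words_per_speaker.get(speaker, 0) + n_words

print(header)
print(words_per_speaker)
```

Because the register and speaker information is encoded rather than left implicit, a search can be restricted to one register or one speaker without reading the text by hand.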
 Some corpora are also encoded with certain types of linguistic annotation.
 There are different kinds of linguistic processing or annotation:
A. Part-of-speech tagging which involves assigning a grammatical category tag to each word in the
corpus.
o ‘A goat can eat shoes’: A (indefinite article), goat (noun, singular), can (modal), eat (main verb), shoes (noun, plural).
B. Prosodic and phonetic annotation, which are not uncommon.
C. Syntactic parsing, which is much less common.
 A tagged corpus allows researchers to answer different types of questions, to explore the frequency of lexical items and grammatical structures, and to address the problem of words that have multiple meanings or functions.
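One way to picture a part-of-speech tagged text is as a sequence of word and tag pairs. The sketch below stores the example sentence in an assumed `word_TAG` format and counts the tags; the tag labels (`ART`, `NOUN-SG`, `MODAL`, `VERB`, `NOUN-PL`) are made up for this illustration and do not follow any published tagset.

```python
from collections import Counter

# The example sentence stored as word_TAG pairs (illustrative tags only).
TAGGED = "A_ART goat_NOUN-SG can_MODAL eat_VERB shoes_NOUN-PL"

def parse_tagged(text):
    """Split 'word_TAG' tokens into (word, tag) tuples."""
    return [tuple(token.rsplit("_", 1)) for token in text.split()]

tokens = parse_tagged(TAGGED)

# Frequency of each grammatical category in this (tiny) sample.
tag_counts = Counter(tag for _, tag in tokens)

# Retrieve every word whose tag starts with NOUN.
nouns = [word for word, tag in tokens if tag.startswith("NOUN")]

print(tokens)
print(tag_counts)
print(nouns)
```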
03 What can a corpus tell us?
Word counts and basic corpus tools
 There are many levels of information that can be gathered from a corpus.
 These levels range from simple word lists to complex grammatical structures and interactive
analyses.
 Analyses can explore individual lexical or linguistic features or identify clusters.
 The tools that are used for these analyses range from basic to complex computer programs.
 The most basic information that we can get from a corpus is frequency-of-occurrence information.
o Concordancing tools: MonoConc, WordSmith Tools and AntConc
 A word list is a list of all the words that occur in the corpus, arranged in alphabetical or frequency order.
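A word list of this kind can be sketched in a few lines of Python. The two sentences below are an invented stand-in for a corpus, and the simple regular-expression tokenizer is an assumption; dedicated concordancing packages handle tokenization in more careful ways.

```python
import re
from collections import Counter

# Tiny invented sample standing in for the texts of a corpus.
TEXTS = [
    "The cat sat on the mat.",
    "The dog chased the cat.",
]

def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

counts = Counter(word for text in TEXTS for word in tokenize(text))

# Word list in frequency order (most frequent first, ties alphabetical).
by_frequency = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# The same word list in alphabetical order.
alphabetical = sorted(counts.items())

for word, freq in by_frequency:
    print(f"{word}\t{freq}")
```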
 Concordancing packages can provide additional information about lexical co-occurrence
patterns.
 Once the search word is selected, the program can search the texts in the corpus and provide a list of each occurrence of the target word in context; this is called ‘key word in context’ (KWIC).
 A concordance program can also provide information about words that tend to occur together in the corpus; these words are called ‘collocates’, and the resulting sets of words are called ‘collocations’.
 An analysis of collocations provides important information about grammatical and semantic patterns of use for individual lexical items.
 Corpus analysis can uncover patterns of use that previously went unnoticed.
 For example, the synonymous verbs begin and start have the same grammatical potential, but corpus analysis can reveal differences in how they are actually used.
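The KWIC display and collocate counts described above can be approximated with a short Python sketch. The sample text is invented and far too small to show real patterns of use, and the function names `kwic` and `collocates` and the window sizes are assumptions for illustration rather than the behaviour of any particular concordancing package.

```python
from collections import Counter

# Invented sample text; a real corpus would be far larger.
TEXT = (
    "she began to sing and then he started the car "
    "they began to laugh when we started the engine"
)
TOKENS = TEXT.split()

def kwic(tokens, target, width=3):
    """Return (left context, key word, right context) for each hit."""
    lines = []
    for i, token in enumerate(tokens):
        if token == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines

def collocates(tokens, target, window=1):
    """Count words occurring within `window` tokens of the target."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == target:
            start = max(0, i - window)
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts

# Print each hit with the key word aligned in a central column.
for left, key, right in kwic(TOKENS, "began"):
    print(f"{left:>20}  [{key}]  {right}")

print(collocates(TOKENS, "began").most_common(3))
print(collocates(TOKENS, "started").most_common(3))
```

Aligning the key word in a central column is what lets an analyst scan the left and right contexts for recurring patterns.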
Working with tagged texts
 In order to carry out more sophisticated types of corpus analyses, it is often necessary to have
a tagged corpus.
 When a corpus is tagged, each word in the corpus is given a grammatical label.
 The process of assigning grammatical labels to words is complex.
 For example, ‘can’ falls into two grammatical categories:
o It can be a modal: ‘I can reach the book’.
o It can be used as a noun: ‘Put the paper in the can’.
 Computers can accurately identify the grammatical labels for many words.
 Certain features remain elusive; here the program will bring the problematic item to the screen for the user to select the correct classification.
 Once texts have been tagged, analyses can provide a fuller picture of the texts in a register.
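To illustrate why tags matter for a word like ‘can’, the sketch below searches a small hand-tagged sample built from the two example sentences. The tags were assigned by hand for this illustration and the labels are assumptions; no automatic tagger is involved.

```python
from collections import Counter

# Hand-tagged toy sample of (word, tag) pairs; tags are illustrative only.
TAGGED_CORPUS = [
    ("I", "PRON"), ("can", "MODAL"), ("reach", "VERB"),
    ("the", "ART"), ("book", "NOUN"),
    ("Put", "VERB"), ("the", "ART"), ("paper", "NOUN"),
    ("in", "PREP"), ("the", "ART"), ("can", "NOUN"),
]

def tag_profile(corpus, word):
    """Count how often `word` occurs with each grammatical tag."""
    return Counter(tag for w, tag in corpus if w.lower() == word)

# An untagged search only sees the word form and lumps both uses together.
untagged_hits = sum(1 for w, _ in TAGGED_CORPUS if w.lower() == "can")

# A tagged search separates the modal from the noun.
profile = tag_profile(TAGGED_CORPUS, "can")

print(untagged_hits)
print(profile)
```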
04 Overview of different types of corpus studies
 Over the years, corpora have been used to address a number of interesting issues such as the
question of language change.
 Historical linguistics looks at how language has changed over the centuries.
 Scholars have also looked into language development, in both first and second language situations.
 Corpora have also been used to explore similarities or differences across different national or regional varieties of English (Australian English, American English, Indian English).
 Other studies explore the differences between spoken and written language.
 Before corpus linguistics it was difficult to note patterns of use, since observing and tracking
use patterns was a huge task.
05 How can corpora inform language teaching?
 Corpus linguistic studies have an impact on classroom language teaching practices.
 Corpus-based studies of particular language features, such as the Longman Grammar of Spoken and Written English, serve language teachers by providing a basis for deciding which language features and structures are important.
 Teachers and materials writers can have a basis for selecting the material that is being
presented.
 Rather than being based on intuitions, pedagogical decisions can now be grounded in actual patterns of language use in various situations.
Bringing corpora into the language classroom
 Corpus-based information can be brought to bear on language teaching in two ways:
1. Teachers can shape instruction based on corpus-based information.
• They can consult corpus studies to gain information about the features that they are teaching.
• For example:
o ‘Conversational English’ teachers could read corpus investigations on spoken language
to determine which features and grammatical structures are characteristic of conversational
English.
2. Learners interact with corpora.
 This can take place in one of two ways:
A. If computer facilities are adequate, learners can be actively involved in exploring corpora.
B. If adequate facilities do not exist, teachers can bring the results from corpus searches for use in the classroom.
 The use of concordancing tasks in the classroom is a matter of some controversy.
• It is strongly advocated by those who favour an inductive or data-driven approach to learning.
• It is criticized by others who argue that it is difficult to guide students appropriately in the analysis of vast numbers of linguistic examples.
Examples of corpus-based classroom activities
 The creation of appropriate, corpus-based teaching materials takes time, careful planning and
access to a few basic tools and resources.
 The activities will require access to a computer, texts and a concordancing package.
 Several vocabulary activities can be generated through simple frequency lists and concordance
output.
 The vocabulary frequency list can be used to identify vocabulary words that need to be taught.
 Frequency lists can also be a starting point for students to group words by grammatical category (verbs, nouns, etc.) or semantic category.
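As a rough sketch of how a frequency list might point to vocabulary worth teaching, the Python snippet below ranks the words of a passage by frequency and leaves out words the class is assumed to know already. The passage, the `KNOWN_WORDS` set and the function name `teaching_candidates` are all hypothetical.

```python
import re
from collections import Counter

# Invented reading passage and an assumed list of words the class knows.
PASSAGE = (
    "The corpus shows that frequent words recur. "
    "Frequent words deserve attention, and the corpus shows which words recur."
)
KNOWN_WORDS = {"the", "that", "and", "which", "words", "shows"}

def teaching_candidates(text, known, top_n=3):
    """Return the most frequent words in `text` that are not in `known`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in known)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]

candidates = teaching_candidates(PASSAGE, KNOWN_WORDS)
print(candidates)
```

A teacher would still need to review the ranked list by hand, since frequency alone does not show which words are useful or difficult for a given group of learners.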
 Concordances of target words can be used to better understand those words’ meanings and
usage.
 The use of a word and its patterning characteristics also contribute to its senses of meaning.
 For example, words are often seen as synonymous when, in actual use, they are not.
 Dictionaries often list the ‘resulting copulas’ become, turn, go and come as synonyms, with
meanings like ‘to become’, ‘to get to be’, ‘to result’, ‘to turn out’.
 Most dictionaries provide no clues to how these four words might differ in meaning.
 Corpus research shows that these words differ dramatically in their typical contexts of use.
o ‘turn’ describes a change of colour or physical appearance.
(The water turned grey)
o ‘go’ describes a change to a negative state.
(go crazy, go bad, go wrong)
o ‘come’ describes a change to a more active state.
(come awake, come alive)
 If corpus activities are coupled with dictionary activities, they can provide a much richer language-learning environment for students.
 The patterns of language use that can be discovered through corpus linguistics will continue to
reshape the way we think of language.
RESOURCES
Schmitt, N. (2020). An introduction to applied linguistics. Routledge.
Do You Have Any Questions?
THANKS

HYPERTENSION - SLIDE SHARE PRESENTATION.
deepaannamalai16
 
How to Fix [Errno 98] address already in use
How to Fix [Errno 98] address already in useHow to Fix [Errno 98] address already in use
How to Fix [Errno 98] address already in use
Celine George
 
Ch-4 Forest Society and colonialism 2.pdf
Ch-4 Forest Society and colonialism 2.pdfCh-4 Forest Society and colonialism 2.pdf
Ch-4 Forest Society and colonialism 2.pdf
lakshayrojroj
 
220711130083 SUBHASHREE RAKSHIT Internet resources for social science
220711130083 SUBHASHREE RAKSHIT  Internet resources for social science220711130083 SUBHASHREE RAKSHIT  Internet resources for social science
220711130083 SUBHASHREE RAKSHIT Internet resources for social science
Kalna College
 
78 Microsoft-Publisher - Sirin Sultana Bora.pptx
78 Microsoft-Publisher - Sirin Sultana Bora.pptx78 Microsoft-Publisher - Sirin Sultana Bora.pptx
78 Microsoft-Publisher - Sirin Sultana Bora.pptx
Kalna College
 
220711130097 Tulip Samanta Concept of Information and Communication Technology
220711130097 Tulip Samanta Concept of Information and Communication Technology220711130097 Tulip Samanta Concept of Information and Communication Technology
220711130097 Tulip Samanta Concept of Information and Communication Technology
Kalna College
 
Oliver Asks for More by Charles Dickens (9)
Oliver Asks for More by Charles Dickens (9)Oliver Asks for More by Charles Dickens (9)
Oliver Asks for More by Charles Dickens (9)
nitinpv4ai
 
A Visual Guide to 1 Samuel | A Tale of Two Hearts
A Visual Guide to 1 Samuel | A Tale of Two HeartsA Visual Guide to 1 Samuel | A Tale of Two Hearts
A Visual Guide to 1 Samuel | A Tale of Two Hearts
Steve Thomason
 
220711130082 Srabanti Bag Internet Resources For Natural Science
220711130082 Srabanti Bag Internet Resources For Natural Science220711130082 Srabanti Bag Internet Resources For Natural Science
220711130082 Srabanti Bag Internet Resources For Natural Science
Kalna College
 
A Free 200-Page eBook ~ Brain and Mind Exercise.pptx
A Free 200-Page eBook ~ Brain and Mind Exercise.pptxA Free 200-Page eBook ~ Brain and Mind Exercise.pptx
A Free 200-Page eBook ~ Brain and Mind Exercise.pptx
OH TEIK BIN
 
How to Setup Default Value for a Field in Odoo 17
How to Setup Default Value for a Field in Odoo 17How to Setup Default Value for a Field in Odoo 17
How to Setup Default Value for a Field in Odoo 17
Celine George
 

Recently uploaded (20)

Simple-Present-Tense xxxxxxxxxxxxxxxxxxx
Simple-Present-Tense xxxxxxxxxxxxxxxxxxxSimple-Present-Tense xxxxxxxxxxxxxxxxxxx
Simple-Present-Tense xxxxxxxxxxxxxxxxxxx
 
KHUSWANT SINGH.pptx ALL YOU NEED TO KNOW ABOUT KHUSHWANT SINGH
KHUSWANT SINGH.pptx ALL YOU NEED TO KNOW ABOUT KHUSHWANT SINGHKHUSWANT SINGH.pptx ALL YOU NEED TO KNOW ABOUT KHUSHWANT SINGH
KHUSWANT SINGH.pptx ALL YOU NEED TO KNOW ABOUT KHUSHWANT SINGH
 
THE SACRIFICE HOW PRO-PALESTINE PROTESTS STUDENTS ARE SACRIFICING TO CHANGE T...
THE SACRIFICE HOW PRO-PALESTINE PROTESTS STUDENTS ARE SACRIFICING TO CHANGE T...THE SACRIFICE HOW PRO-PALESTINE PROTESTS STUDENTS ARE SACRIFICING TO CHANGE T...
THE SACRIFICE HOW PRO-PALESTINE PROTESTS STUDENTS ARE SACRIFICING TO CHANGE T...
 
INTRODUCTION TO HOSPITALS & AND ITS ORGANIZATION
INTRODUCTION TO HOSPITALS & AND ITS ORGANIZATION INTRODUCTION TO HOSPITALS & AND ITS ORGANIZATION
INTRODUCTION TO HOSPITALS & AND ITS ORGANIZATION
 
Data Structure using C by Dr. K Adisesha .ppsx
Data Structure using C by Dr. K Adisesha .ppsxData Structure using C by Dr. K Adisesha .ppsx
Data Structure using C by Dr. K Adisesha .ppsx
 
CapTechTalks Webinar Slides June 2024 Donovan Wright.pptx
CapTechTalks Webinar Slides June 2024 Donovan Wright.pptxCapTechTalks Webinar Slides June 2024 Donovan Wright.pptx
CapTechTalks Webinar Slides June 2024 Donovan Wright.pptx
 
adjectives.ppt for class 1 to 6, grammar
adjectives.ppt for class 1 to 6, grammaradjectives.ppt for class 1 to 6, grammar
adjectives.ppt for class 1 to 6, grammar
 
220711130088 Sumi Basak Virtual University EPC 3.pptx
220711130088 Sumi Basak Virtual University EPC 3.pptx220711130088 Sumi Basak Virtual University EPC 3.pptx
220711130088 Sumi Basak Virtual University EPC 3.pptx
 
Creative Restart 2024: Mike Martin - Finding a way around “no”
Creative Restart 2024: Mike Martin - Finding a way around “no”Creative Restart 2024: Mike Martin - Finding a way around “no”
Creative Restart 2024: Mike Martin - Finding a way around “no”
 
HYPERTENSION - SLIDE SHARE PRESENTATION.
HYPERTENSION - SLIDE SHARE PRESENTATION.HYPERTENSION - SLIDE SHARE PRESENTATION.
HYPERTENSION - SLIDE SHARE PRESENTATION.
 
How to Fix [Errno 98] address already in use
How to Fix [Errno 98] address already in useHow to Fix [Errno 98] address already in use
How to Fix [Errno 98] address already in use
 
Ch-4 Forest Society and colonialism 2.pdf
Ch-4 Forest Society and colonialism 2.pdfCh-4 Forest Society and colonialism 2.pdf
Ch-4 Forest Society and colonialism 2.pdf
 
220711130083 SUBHASHREE RAKSHIT Internet resources for social science
220711130083 SUBHASHREE RAKSHIT  Internet resources for social science220711130083 SUBHASHREE RAKSHIT  Internet resources for social science
220711130083 SUBHASHREE RAKSHIT Internet resources for social science
 
78 Microsoft-Publisher - Sirin Sultana Bora.pptx
78 Microsoft-Publisher - Sirin Sultana Bora.pptx78 Microsoft-Publisher - Sirin Sultana Bora.pptx
78 Microsoft-Publisher - Sirin Sultana Bora.pptx
 
220711130097 Tulip Samanta Concept of Information and Communication Technology
220711130097 Tulip Samanta Concept of Information and Communication Technology220711130097 Tulip Samanta Concept of Information and Communication Technology
220711130097 Tulip Samanta Concept of Information and Communication Technology
 
Oliver Asks for More by Charles Dickens (9)
Oliver Asks for More by Charles Dickens (9)Oliver Asks for More by Charles Dickens (9)
Oliver Asks for More by Charles Dickens (9)
 
A Visual Guide to 1 Samuel | A Tale of Two Hearts
A Visual Guide to 1 Samuel | A Tale of Two HeartsA Visual Guide to 1 Samuel | A Tale of Two Hearts
A Visual Guide to 1 Samuel | A Tale of Two Hearts
 
220711130082 Srabanti Bag Internet Resources For Natural Science
220711130082 Srabanti Bag Internet Resources For Natural Science220711130082 Srabanti Bag Internet Resources For Natural Science
220711130082 Srabanti Bag Internet Resources For Natural Science
 
A Free 200-Page eBook ~ Brain and Mind Exercise.pptx
A Free 200-Page eBook ~ Brain and Mind Exercise.pptxA Free 200-Page eBook ~ Brain and Mind Exercise.pptx
A Free 200-Page eBook ~ Brain and Mind Exercise.pptx
 
How to Setup Default Value for a Field in Odoo 17
How to Setup Default Value for a Field in Odoo 17How to Setup Default Value for a Field in Odoo 17
How to Setup Default Value for a Field in Odoo 17
 

Corpus linguistics, ch6

  • 2. TABLE OF CONTENTS (Schmitt, 2020)
      01 What is corpus linguistics?
      02 Corpus design and compilation
      03 What can a corpus tell us?
      04 Overview of different types of corpus studies
      05 How can corpora inform language teaching?
  • 4. What is corpus linguistics?
      - Corpus linguistics has grown greatly in popularity, both as a means of exploring actual patterns of language use and as a tool for developing materials for classroom language instruction.
      - Corpus linguistics uses large collections of natural spoken and written texts that are stored on computers.
      - One of its major contributions is in exploring patterns of language use.
      - Today, corpus linguistics and the term ‘corpus’ are synonymous with computerized corpora and methods, but this was not always the case.
  • 5. What is corpus linguistics?
      - An empirical approach to linguistic analysis is one based on naturally occurring spoken or written data.
      - Advances in technology have brought corpus linguists a number of advantages, including the collection of larger language samples and faster, more efficient text processing.
      - Characteristics of corpus-based analyses of language:
          o They are empirical, analysing the actual patterns of use in natural texts.
          o They utilize a large and principled collection of natural texts.
          o They make extensive use of computers for analysis, using both automatic and interactive techniques.
          o They depend on both quantitative and qualitative analytical techniques.
  • 6. What is corpus linguistics?
      - A corpus is a large, principled collection of natural texts.
      - The use of natural texts means that the language has been collected from naturally occurring sources.
      - Examples of well-known corpora:
          o The British National Corpus (BNC)
          o The Corpus of Contemporary American English (COCA)
          o The Brown Corpus
      - The text collection process for building a corpus needs to be principled to ensure representativeness and balance.
  • 7. What is corpus linguistics?
      - The linguistic features or research questions being investigated shape the collection of texts used in creating the corpus.
      - Although computers make possible a wide range of sophisticated statistical techniques, human analysts are still needed to decide what information is worth searching for, to extract that information from the corpus and to interpret the findings.
      - Corpus linguistics brings together quantitative and qualitative techniques.
      - The quantitative analyses provide an accurate view of macro-level characteristics, whereas the qualitative analyses provide the complementary micro-level perspective.
  • 9. Corpus design and compilation
      - Although there is no minimum size for a text collection to be considered a corpus, an early standard set by the creators of the Brown Corpus was one million words.
      - A number of well-known specialized corpora are much smaller than that, but the general assumption is that for most tasks within corpus linguistics, larger corpora are more valuable.
      - Many modern corpora are made available to other researchers, often free of charge.
      - This enables researchers all over the world to access the same sets of data, which encourages a higher degree of accountability in data analysis and permits collaborative studies by different researchers.
  • 11. Types of corpora
      A. General corpora
      - General corpora vary widely in size:
          o The BNC contains 100 million words and COCA had 560 million words.
          o Brown and LOB contain a mere one million words each.
      - A general corpus is designed to be balanced and to include language samples from a wide range of registers or genres.
      - Most of the early general corpora were limited to written language, because written texts are vastly easier and cheaper to compile than transcripts of speech.
      - A few corpora are dedicated to spoken discourse:
          o The Cambridge and Nottingham Corpus of Discourse in English (CANCODE)
  • 12. Types of corpora
      B. Specialized corpora
      - Specialized corpora are designed with more specific research goals in mind, and they are considered the most crucial ‘growth area’ for corpus linguistics.
      - Specialized corpora may include both spoken and written components:
          o International Corpus of English (ICE)
          o The TOEFL-2000 Spoken and Written Academic Language Corpus
      - A specialized corpus may also focus on a particular spoken or written variety of language:
          o Historical corpora, such as the Archer Corpus (two million words of British and American English dating from 1650 to 1990)
          o Learner corpora (spoken or written language samples produced by non-native speakers)
  • 13. Issues in corpus design
  • 14. Issues in corpus design
      - One of the most important factors in corpus linguistics is the design of the corpus.
      - The design of the corpus affects all subsequent analyses and results.
      - The composition of the corpus should reflect the anticipated research goals.
      - For example:
          o To compare patterns of language found in spoken and written discourse:
              • The corpus has to include a range of possible spoken and written texts.
              • This ensures that the information derived from the corpus accurately reflects the variation possible in the patterns being compared across the two registers.
  • 15. Issues in corpus design
      - A well-designed corpus should aim to be representative of the types of language included in it.
      - There are many different ways to conceive of and justify representativeness:
          a. Representativeness across different registers (e.g. fiction, casual conversation) and topics (e.g. national vs local news).
          b. Representativeness in terms of the demographics of the speakers or writers (nationality, gender, education level).
          c. Representativeness based on production or reception (e.g. e-mail messages, newspapers).
      - All these issues must be weighed when deciding how much of each category to include.
      - In thinking about the research goals of a corpus, compilers must also bear in mind the intended distribution of the corpus.
  • 17. Corpus compilation
      - When creating a corpus, data collection involves obtaining or creating electronic versions of the target texts, then storing and organizing them.
      - For a written corpus, collecting texts that exist only on paper means using a scanner and optical character recognition (OCR) software to convert the documents into electronic text files.
      - OCR is not error-free, so manual proofreading and error correction are necessary.
      - Data collection for a spoken corpus is time-consuming and expensive. It requires:
          • a transcription system (e.g. an orthographic transcription system)
          • a way of representing the interactional characteristics of the speech in the transcripts
      - An important issue for both spoken and written corpora during data collection is obtaining permission to use the data.
  • 19. Markup and annotation
      - A simple corpus could consist of raw text, with no additional information provided about the origins, authors, speakers, structure or contents of the texts themselves.
      - Encoding some of this information in the form of markup makes the corpus more useful.
      - Structural markup refers to the use of codes in the texts to identify structural features of the text:
          o In a written corpus: titles, authors, chapters
          o In a spoken corpus: speakers, paralinguistic features
      - Many corpora provide information about the contents and creation of each text in what is called a header.
      - Headers include classifications of the text into categories such as register, genre and topic domain.
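The idea of a header can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `Header` and `CorpusText`, the field names and the two sample texts are all invented here, and real corpora use their own markup schemes.

```python
from dataclasses import dataclass


@dataclass
class Header:
    # Hypothetical header fields, taken from the categories named on the slide.
    register: str
    genre: str
    topic_domain: str


@dataclass
class CorpusText:
    header: Header
    body: str


# Invented sample texts, for illustration only.
texts = [
    CorpusText(Header("written", "news", "local"), "Council approves new park."),
    CorpusText(Header("spoken", "conversation", "family"), "So what did you do today?"),
]

# With headers in place, a sub-corpus can be selected by category.
spoken = [t for t in texts if t.header.register == "spoken"]
print([t.body for t in spoken])
```

Raw text alone would not allow this kind of selection, which is one reason markup makes a corpus more useful.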
  • 20. Markup and annotation
      - Some corpora are also encoded with certain types of linguistic annotation.
      - There are different kinds of linguistic processing or annotation:
          A. Part-of-speech tagging, which involves assigning a grammatical category tag to each word in the corpus.
              o ‘A goat can eat shoes’: A (indefinite article) goat (noun, singular) can (modal) eat (main verb) shoes (noun, plural).
          B. Prosodic and phonetic annotation, which are not uncommon.
          C. Syntactic parsing, which is much less common.
      - A tagged corpus allows researchers to answer different types of questions, to explore the frequency of lexical items and grammatical structures, and to address the problem of words that have multiple meanings or functions.
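The tagged sentence above can be reproduced with a toy lookup tagger. This is a minimal sketch, not how real taggers work: `LEXICON` is a hypothetical hand-made dictionary that covers only the slide's example sentence, and the function name `tag` is likewise an assumption.

```python
# Hypothetical toy lexicon: it covers only the slide's example sentence.
LEXICON = {
    "a": "indefinite article",
    "goat": "noun, singular",
    "can": "modal",
    "eat": "main verb",
    "shoes": "noun, plural",
}


def tag(sentence):
    """Pair each word with its tag from LEXICON ('unknown' if the word is absent)."""
    return [(word, LEXICON.get(word.lower(), "unknown")) for word in sentence.split()]


tagged = tag("A goat can eat shoes")
for word, label in tagged:
    print(f"{word} ({label})")
```

A lookup like this always gives ‘can’ the same tag, so it cannot handle words with more than one grammatical category.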
  • 21. 03 What can a corpus tell us?
  • 22. Word counts and basic corpus tools
  • 23. Word counts and basic corpus tools
      - There are many levels of information that can be gathered from a corpus.
      - These levels range from simple word lists to complex grammatical structures and interactive analyses.
      - Analyses can explore individual lexical or linguistic features, or identify clusters of features.
      - The tools used for these analyses range from basic to complex computer programs.
      - The most basic information that we can get from a corpus is frequency of occurrence, which concordancing packages can provide:
          o MonoConc, WordSmith Tools and AntConc
      - A word list is a list of all the words that occur in the corpus, arranged in alphabetic or frequency order.
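A word list of the kind described above can be sketched with the Python standard library. The sample sentence and the simple tokenizing pattern are assumptions made for illustration; dedicated tools such as those named on the slide handle tokenization with far more care.

```python
import re
from collections import Counter


def word_list(text):
    """Count every word form in the text (lower-cased, punctuation dropped)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Invented sample text, for illustration only.
sample = "The goat can eat shoes. The goat can eat the can."
counts = word_list(sample)

# Frequency order: most frequent first, ties broken alphabetically.
by_frequency = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
# Alphabetic order.
by_alphabet = sorted(counts.items())

print(by_frequency)
print(by_alphabet)
```

The same counts feed both orderings, so only the sort key differs between a frequency list and an alphabetic list.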
  • 26. Word counts and basic corpus tools
      - Concordancing packages can provide additional information about lexical co-occurrence patterns.
      - Once the search word is selected, the program searches the texts in the corpus and lists each occurrence of the target word in context. This is called ‘key word in context’ (KWIC).
      - A concordance program can also provide information about words that tend to occur together in the corpus. These words are called ‘collocates’, and the resulting sets of words are called ‘collocations’.
      - An analysis of collocations provides important information about grammatical and semantic patterns of use for individual lexical items.
      - Corpus analysis can discover patterns of use that previously went unnoticed.
      - For example, the synonymous verbs begin and start have the same grammatical potential.
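KWIC lines and collocate counts can be sketched as follows. This is a simplified illustration, not the method of any particular concordancing package: the function names `kwic` and `collocates`, the window sizes and the four sample sentences are all assumptions, the sentences are invented rather than corpus findings, and sentence boundaries are ignored when context windows are built.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def kwic(tokens, keyword, width=3):
    """Return one line per occurrence of keyword, with `width` words of context per side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines


def collocates(tokens, keyword, span=2):
    """Count the words that occur within `span` tokens of each occurrence of keyword."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            counts.update(tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span])
    return counts


# Invented sentences, for illustration only.
text = "She began to cry. He started the car. They began to sing. We started the engine."
tokens = tokenize(text)

for line in kwic(tokens, "began", width=2):
    print(line)
print(collocates(tokens, "began").most_common(3))
print(collocates(tokens, "started").most_common(3))
```

Raw co-occurrence counts like these are only a starting point; concordancing tools typically also offer statistical measures of association.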
  • 28. Working with tagged texts
  • 29. Working with tagged texts
      - In order to carry out more sophisticated types of corpus analyses, it is often necessary to have a tagged corpus.
      - When a corpus is tagged, each word in the corpus is given a grammatical label.
      - The process of assigning grammatical labels to words is complex.
      - For example, ‘can’ falls into two grammatical categories:
          o It can be a modal: ‘I can reach the book’.
          o It can be a noun: ‘Put the paper in the can’.
      - Computers can accurately identify the grammatical labels for many words.
      - Certain features remain elusive. In these cases, the program brings the problematic item to the screen so that the user can select the correct classification.
      - Once texts have been tagged, analyses can provide a fuller picture of the texts in a register.
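The two uses of ‘can’ can be told apart with a single context rule, sketched below. This is a deliberately crude assumption for illustration: the rule (a preceding determiner signals the noun), the `DETERMINERS` set and the function name `tag_can` are all invented here, and real taggers rely on much richer information.

```python
# Hypothetical determiner list used by the one-rule sketch below.
DETERMINERS = {"a", "an", "the", "this", "that"}


def tag_can(tokens):
    """Label each 'can' as 'noun' if it follows a determiner, otherwise as 'modal'."""
    labels = []
    for i, tok in enumerate(tokens):
        if tok.lower() != "can":
            continue
        previous = tokens[i - 1].lower() if i > 0 else ""
        labels.append((i, "noun" if previous in DETERMINERS else "modal"))
    return labels


modal_example = "I can reach the book".split()
noun_example = "Put the paper in the can".split()

print(tag_can(modal_example))
print(tag_can(noun_example))
```

A rule this simple will mislabel many real sentences, which is why interactive checking of problematic cases remains part of the tagging process.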
  • 30. 04 Overview of different types of corpus studies
  • 31. Overview of different types of corpus studies
      - Over the years, corpora have been used to address a number of interesting issues, such as the question of language change.
      - This is the area of historical linguistics, which looks at how language has changed over the centuries.
      - Scholars have also looked into language development in first and second language situations.
      - Corpora have also been used to explore similarities and differences across national or regional varieties of English (Australian English, American English, Indian English).
      - There are also studies that explore the differences between spoken and written language.
      - Before corpus linguistics, it was difficult to note patterns of use, since observing and tracking them was a huge task.
  • 32. 05 How can corpora inform language teaching?
  • 33. How can corpora inform language teaching?
      - Corpus linguistic studies have had an impact on classroom language teaching practices.
      - Corpus-based studies of particular language features, such as The Longman Grammar of Spoken and Written English, serve language teachers by providing a basis for deciding which language features and structures are important.
      - Teachers and materials writers gain a basis for selecting the material to be presented.
      - Rather than basing pedagogical decisions on intuitions, these decisions can now be grounded in actual patterns of language use in various situations.
  • 34. Bringing corpora into the language classroom
  • 35. Bringing corpora into the language classroom
      - Corpus-based information can be brought to bear on language teaching in two ways:
          1. Teachers can shape instruction based on corpus-based information.
              • They can consult corpus studies to gain information about the features that they are teaching.
              • For example, teachers of ‘Conversational English’ could read corpus investigations of spoken language to determine which features and grammatical structures are characteristic of conversational English.
  • 36. Bringing corpora into the language classroom
          2. Learners can interact with corpora.
      - This can take place in one of two ways:
          A. If computer facilities are adequate, learners can be actively involved in exploring corpora.
          B. If adequate facilities do not exist, teachers can bring the results of corpus searches into the classroom.
      - The use of concordancing tasks in the classroom is a matter of some controversy:
          • It is strongly advocated by those who favour an inductive or data-driven approach to learning.
          • It is criticized by others, who argue that it is difficult to guide students appropriately in the analysis of vast numbers of linguistic examples.
  • 37. Examples of corpus-based classroom activities
  • 38. Examples of corpus-based classroom activities
      - The creation of appropriate corpus-based teaching materials takes time, careful planning and access to a few basic tools and resources.
      - The activities require access to a computer, texts and a concordancing package.
      - Several vocabulary activities can be generated from simple frequency lists and concordance output.
      - A vocabulary frequency list can be used to identify words that need to be taught.
      - Frequency lists can also be a starting point for students to group words by grammatical category (verbs, nouns, etc.) or by semantic category.
  • 39. Examples of corpus-based classroom activities
      - Concordances of target words can be used to better understand those words’ meanings and usage.
      - The use of a word and its patterning characteristics also contribute to its meaning senses.
      - For example, words are often seen as synonymous when, in actual use, they are not.
      - Dictionaries often list the ‘resulting copulas’ become, turn, go and come as synonyms, with meanings like ‘to become’, ‘to get to be’, ‘to result’, ‘to turn out’.
      - Most dictionaries provide no clues as to how these four words might differ in meaning.
      - Corpus research shows that these words differ dramatically in their typical contexts of use.
  • 40. Examples of corpus-based classroom activities
          o ‘turn’ describes a change of colour or physical appearance (The water turned grey).
          o ‘go’ describes a change to a negative state (go crazy, go bad, go wrong).
          o ‘come’ describes a change to a more active state (come awake, come alive).
      - If corpus activities are coupled with dictionary activities, they can provide a much richer language-learning environment for students.
      - The patterns of language use that can be discovered through corpus linguistics will continue to reshape the way we think about language.
  • 41. RESOURCES
      Schmitt, N. (2020). An introduction to applied linguistics. Routledge.
  • 42. Do you have any questions? Thanks.