Corpus linguistics
Mr Jitendra B. Patil
Assistant Professor of English
Pratap College Amalner
Dist – Jalgaon (Maharashtra)
Pin-425401 Mob.- 919421655091
Email- jitendrapca@gmail.com
Corpus (Latin) means ‘body’
Any body of text
A new approach to language study
Collects samples of text from various fields of language use in a scientific and systematic way
Corpus: a statistically sampled language database
Purposes: investigation, description, application and analysis relevant to all branches of linguistics
Indispensability of Corpus in Linguistics:
Corpora are indispensable because of their large size, varied composition, wealth of information, confirmed referential authenticity, wide representation, ease of use and simple verifiability
Usages:
To verify earlier propositions and examples
To verify the logic of previously proposed definitions and explanations
Corpus in Corpus Linguistics:
Holds special connotations
A large collection of linguistic data used as a starting point for linguistic description
A body of language text in written and spoken form
Represents the varieties of language used in every field of human interaction
Preserved in machine-readable form
Enables all kinds of linguistic description and analysis
A corpus is a large collection of texts assumed to be representative of a given language, dialect or other subset of language, to be used for linguistic analysis.
A corpus is a large collection of pieces of language that are selected and ordered according to some explicit linguistic criteria in order to be used as samples of the language.
A corpus is a large collection of naturally occurring language texts, presented in machine-readable form and accumulated in a scientific manner, to characterize a particular variety or use of language.
A corpus contains constituent pieces of language that are documented as to their origin and provenance, and it is encoded in a standard and homogeneous way for open-ended retrieval tasks.
Linguists have always used the word ‘corpus’ to describe a collection of naturally occurring examples of language, consisting of anything from a set of written texts to a set of tape recordings, gathered for linguistic study.
A corpus refers to:
Any body of text
A body of machine-readable text
A finite collection of machine-readable texts sampled to be maximally representative of a language or language variety
Important Issues in Corpus Designing:
Composition of a corpus
Usage potential of a corpus
A corpus should:
Faithfully represent both the common and the special linguistic features of the language for which it is designed and developed
Be large enough to encompass samples of text from various disciplines
Be a true replica of the physical texts
Preserve the various forms of words, punctuation marks, spellings, variations and other orthographic symbols used in the source text
Represent all varieties of linguistic usage in a proportional manner
Use authentic, referential and verifiable text samples
Enable users to use language data in multiple tasks
Preserve texts in both annotated and non-annotated form
Quantity:
No fixed parameter
The bigger the corpus, the better its authenticity and reliability
Data from a variety of sources in large quantity
Refers to the total amount of linguistic material included
Electronically generated corpora contain millions of words
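Since quantity is the total amount of linguistic material included, a first practical step is counting the words each source contributes. A minimal sketch, assuming a toy three-source corpus and a naive regex tokenizer (both hypothetical; real corpora use more careful tokenization):

```python
import re

# Hypothetical mini-corpus: source category -> raw text (illustrative data only).
CORPUS = {
    "news": "The minister said the budget would be announced on Monday.",
    "fiction": "She opened the door and the cold wind rushed in.",
    "conversation": "yeah I think so, but I'm not sure, you know",
}

# Assumed tokenization rule: runs of letters, optionally with one internal apostrophe.
TOKEN_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def tokenize(text):
    """Return lower-cased word tokens found by the simple regex above."""
    return [token.lower() for token in TOKEN_RE.findall(text)]


def words_per_source(corpus):
    """Count the word tokens contributed by each source."""
    return {source: len(tokenize(text)) for source, text in corpus.items()}


sizes = words_per_source(CORPUS)
total = sum(sizes.values())
for source, n in sizes.items():
    print(f"{source:<14}{n:>4} words ({n / total:.0%} of the corpus)")
print(f"{'total':<14}{total:>4} words")
```

The same per-source breakdown scales to millions of words and shows at a glance whether one source dominates.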
Quality:
Relates to authenticity
Collection from genuine communication
Depends on ideally restricting the corpus collector's role
Data should be drawn from real-life language use
Should capture the interactional properties of casual and informal talk
Representativeness:
Proper representation of a broad range of material
Representative of maximum linguistic features
Authentic in representation of text variety
Maximally representative of demographic variables
Overall size of the corpus to be set against the diversity of sources
Random selection of text samples
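The slides do not prescribe a sampling procedure. One common way to combine random selection with broad representation is stratified random sampling: texts are drawn at random, but separately within each category, so no category is left out. A minimal sketch; the category names and text IDs are hypothetical:

```python
import random

# Hypothetical pool of candidate texts, grouped by category (illustrative IDs only).
POOL = {
    "news": [f"news_{i:03d}" for i in range(200)],
    "fiction": [f"fic_{i:03d}" for i in range(120)],
    "academic": [f"acad_{i:03d}" for i in range(80)],
}


def stratified_sample(pool, per_category, seed=None):
    """Randomly select `per_category` text IDs from every category.

    Sampling inside each category guarantees that every category is
    represented, while the choice within a category stays random.
    """
    rng = random.Random(seed)  # a seeded generator makes the selection reproducible
    sample = {}
    for category, texts in pool.items():
        if per_category > len(texts):
            raise ValueError(f"category {category!r} has only {len(texts)} texts")
        sample[category] = rng.sample(texts, per_category)  # without replacement
    return sample


selection = stratified_sample(POOL, per_category=10, seed=42)
```

Drawing the same number per category is only one option; sample sizes could instead be made proportional to each category's share of real language use.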
Simplicity:
Simple and plain text samples
An unbroken string of characters without any added information
Separate preservation of additional features
Separate storage of extralinguistic information
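Keeping the text a plain, unbroken string while storing added features apart can be done with standoff annotation, where each label points into the text by character offsets. A minimal sketch; the `Span` class and the tag names are illustrative assumptions, not a standard format:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int   # character offset where the annotated span begins
    end: int     # character offset just past the span's last character
    label: str   # the added information, here a part-of-speech tag


# The text itself stays a plain string with nothing inserted into it.
raw_text = "Corpora preserve language."

# Additional features live in a separate layer that refers to the text by offset.
pos_layer = [
    Span(0, 7, "NOUN"),
    Span(8, 16, "VERB"),
    Span(17, 25, "NOUN"),
    Span(25, 26, "PUNCT"),
]


def annotated_view(text, layer):
    """Pair each annotated substring with its label without modifying the text."""
    return [(text[span.start:span.end], span.label) for span in layer]


# An inline "word/TAG" rendering can be generated on demand and thrown away.
inline = " ".join(f"{word}/{tag}" for word, tag in annotated_view(raw_text, pos_layer))
print(inline)
```

Because the raw string is never edited, further annotation layers can be added or discarded independently.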
Equality:
Text samples with an equal number of words
Balance between spoken and written text samples
Collection of an equal amount of text from all sources
Balance in the quality of samples
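Equal-sized samples and the spoken/written balance can be checked mechanically. A minimal sketch with hypothetical texts; taking the first n words of each text is an assumed cut-off rule, one of several possible:

```python
# Hypothetical text samples and their mode (illustrative data only).
TEXTS = {
    "s1": "well I mean we could go there tomorrow",
    "s2": "right okay",
    "w1": "The committee approved the proposal after a long debate",
    "w2": "Rain is expected across the region this weekend",
}
MODES = {"s1": "spoken", "s2": "spoken", "w1": "written", "w2": "written"}


def equal_size_samples(texts, n_words):
    """Keep the first `n_words` words of each text; drop texts shorter than that."""
    samples = {}
    for text_id, text in texts.items():
        words = text.split()
        if len(words) >= n_words:
            samples[text_id] = words[:n_words]
    return samples


def words_per_mode(samples, modes):
    """Total words per mode (e.g. spoken vs written) across the kept samples."""
    totals = {}
    for text_id, words in samples.items():
        mode = modes[text_id]
        totals[mode] = totals.get(mode, 0) + len(words)
    return totals


samples = equal_size_samples(TEXTS, n_words=5)
balance = words_per_mode(samples, MODES)
print(balance)  # {'spoken': 5, 'written': 10}
```

Here the equal-size rule discards the short spoken text `s2`, and the totals expose that spoken material is now under-represented, so more spoken samples would be needed to restore the balance.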
Retrievability:
Easy retrieval of data by the end user
Techniques and tools for preserving data in electronic form
Accessibility for all
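A concordance is a classic way for end users to retrieve data from an electronic corpus. The slides name no particular tool; the following is a minimal Key Word In Context (KWIC) sketch in plain Python, assuming lower-cased `\w+` tokens:

```python
import re


def kwic(text, keyword, width=3):
    """Key Word In Context: each hit of `keyword` with up to `width` words either side."""
    words = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    hits = []
    for i, word in enumerate(words):
        if word == target:
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            hits.append((left, word, right))
    return hits


sample = ("A corpus is a body of text. Every corpus needs careful design, "
          "and a corpus must be documented.")
for left, kw, right in kwic(sample, "corpus"):
    print(f"{left:>15} [{kw}] {right}")
```

Right-aligning the left context lines the keyword up in one column, which is how concordance output is usually read.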
Verifiability:
Must be open to empirical verification
Reflective of actual patterns of language use
Authentic and valid in synchronic and diachronic studies
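Empirical verification is easier when every reported frequency comes with the raw count and the corpus size it was computed from. A minimal sketch, assuming whitespace tokenization; normalizing to occurrences per million words is a common corpus-linguistics convention used here for illustration, not something the slides prescribe:

```python
from collections import Counter


def per_million(count, corpus_size):
    """Normalized frequency: occurrences per million words of running text."""
    return count / corpus_size * 1_000_000


def frequency_report(tokens, word):
    """Raw count, corpus size and normalized rate: enough for others to re-check the figure."""
    raw = Counter(tokens)[word]
    return {
        "word": word,
        "raw": raw,
        "corpus_size": len(tokens),
        "per_million": per_million(raw, len(tokens)),
    }


report = frequency_report("the cat sat on the mat".split(), "the")
print(report)
```

Normalized rates also make frequencies comparable between corpora of different sizes.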
Augmentation:
Changeable with time
Can be synchronic
Can be diachronic
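If every text carries a date, one augmentable corpus can serve both kinds of study: a single period is a synchronic slice, and comparing periods is diachronic. A minimal sketch with hypothetical (year, text) records, grouping by decade as an arbitrary choice:

```python
from collections import defaultdict


def add_texts(corpus, new_texts):
    """Augment the corpus in place with new (year, text) records; old records are kept."""
    corpus.extend(new_texts)


def tokens_by_period(corpus, period_len=10):
    """Group word tokens by period (decades by default) for diachronic comparison."""
    groups = defaultdict(list)
    for year, text in corpus:
        groups[year - year % period_len].extend(text.lower().split())
    return dict(groups)


corpus = [(1995, "send a letter by post"), (1998, "write a letter")]
add_texts(corpus, [(2015, "send an email"), (2019, "write an email")])

periods = tokens_by_period(corpus)
for period in sorted(periods):
    tokens = periods[period]
    print(period, "letter:", tokens.count("letter"), "email:", tokens.count("email"))
```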
Documentation:
Separation of documentary information from the text components
Meticulous documentation of extralinguistic information
Easy retrieval of extralinguistic information (annotated information)
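Separating documentary information from the texts, while keeping it easy to query, can be sketched as a header record per text stored apart from the text itself. The `TextHeader` fields, the `select` helper and the data are hypothetical illustrations:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TextHeader:
    """Extralinguistic documentation for one text, stored apart from the text itself."""
    text_id: str
    source: str                        # origin / provenance of the text
    year: int
    mode: str                          # "spoken" or "written"
    speaker_age: Optional[int] = None  # demographic detail, when recorded


# Hypothetical texts and headers (illustrative data only), kept in separate stores.
TEXTS = {
    "T1": "Heavy rain is expected in the district.",
    "T2": "I started teaching in the village school.",
    "T3": "We used to listen to the radio every evening.",
}
HEADERS = [
    TextHeader("T1", "newspaper", 2021, "written"),
    TextHeader("T2", "interview", 2022, "spoken", speaker_age=34),
    TextHeader("T3", "oral history", 2022, "spoken", speaker_age=61),
]


def select(headers, texts, **criteria):
    """Return the texts whose headers match every given field, e.g. mode="spoken"."""
    return {
        h.text_id: texts[h.text_id]
        for h in headers
        if all(getattr(h, field) == value for field, value in criteria.items())
    }


spoken_2022 = select(HEADERS, TEXTS, mode="spoken", year=2022)
```

The texts stay plain, yet any documented variable (source, year, mode, speaker age) can be used to retrieve a sub-corpus.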
Management:
A necessary scheme for maintenance, standardization, augmentation and upgrading
Protection of data from virus infection
Displacement of corpus data
Conversion of corpus data across different formats
Adaptation to new hardware and software technology
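Converting corpus data across formats is safest when the conversion can be checked by a round trip. A minimal sketch, assuming JSON Lines (one JSON object per line) as the target format and hypothetical records; an in-memory `io.StringIO` stands in for a file:

```python
import io
import json


def write_jsonl(records, stream):
    """Write one JSON object per line; ensure_ascii=False keeps non-Latin scripts readable."""
    for record in records:
        stream.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(stream):
    """Read records back, skipping blank lines."""
    return [json.loads(line) for line in stream if line.strip()]


# Hypothetical records; "भाषा" is the Marathi word for "language".
records = [
    {"id": "T1", "lang": "en", "text": "A corpus is a body of text."},
    {"id": "T2", "lang": "mr", "text": "भाषा"},
]

buffer = io.StringIO()
write_jsonl(records, buffer)
restored = read_jsonl(io.StringIO(buffer.getvalue()))
assert restored == records  # the conversion lost nothing
```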