Corpus linguistics
1. Mr Jitendra B. Patil
Assistant Professor of English
Pratap College Amalner
Dist – Jalgaon (Maharashtra)
Pin-425401 Mob.- 919421655091
Email- jitendrapca@gmail.com
2. Corpus (Latin) means ‘body’
any body of text
new approach to language study
collects samples of text from various fields of
language use in a scientific and systematic way
Corpus: a statistically sampled language
database
Purposes: investigation, description, application, and
analyses relevant to all branches of linguistics
3. Indispensability of Corpus in Linguistics:
Due to large structure, varied composition, huge
information, confirmed referential authenticity, wide
representation, easy usability and simple verifiability
Usages:
To verify earlier propositions and examples
To verify the logic of pre-proposed definitions and
explanations
4. Corpus in Corpus Linguistics:
Holds special connotations
A large collection of linguistic data used as a starting point
of linguistic description
A body of language text in written and spoken form
Represents varieties of language used in each and every field
of human interaction
Preserved in machine-readable form
Enables all kinds of linguistic description and analysis
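Illustration (not from the original slides): a minimal Python sketch of what machine-readable storage makes possible. It assumes a toy two-sentence corpus and a deliberately simple tokenizer; the names tokenize, word_frequencies and concordance are hypothetical helpers, not part of any standard corpus tool.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(texts):
    """Count how often each token occurs across all texts in the corpus."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts

def concordance(texts, keyword, width=2):
    """Return keyword-in-context triples: up to `width` tokens either side of each hit."""
    lines = []
    for text in texts:
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append((left, tok, right))
    return lines

# Toy corpus, invented for illustration only.
sample_corpus = [
    "The corpus is a body of text.",
    "A corpus is stored in machine-readable form.",
]

frequencies = word_frequencies(sample_corpus)
hits = concordance(sample_corpus, "corpus", width=1)
```

A frequency list and a concordance are two of the most basic descriptions a corpus supports; a real study would need a tokenizer suited to the language in question.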
5. Corpus means a large collection of texts assumed to be
representative of a given language, dialect or other subset of
language, to be used for linguistic analyses.
Corpus is a large collection of pieces of language that are
selected and ordered according to some explicit linguistic
criteria in order to be used as samples of the language.
Corpus is a large collection of naturally occurring language
texts presented in machine-readable form accumulated in
scientific manner to characterize a particular variety or use of
language.
6. A corpus, which contains constituent pieces of language
that are documented as to their origin and provenance, is
encoded in a standard and homogeneous way for open-ended
retrieval tasks.
Linguists have always used the word ‘corpus’ to
describe a collection of naturally occurring examples of
language, consisting of anything from a set of written texts
to tape recordings, which have been collected for linguistic
study.
7. A corpus refers to :
Any body of text
A body of machine-readable text
A finite collection of machine-readable texts which are
sampled to be maximally representative of a language or
language variety.
Important Issues in Corpus Designing:
Composition of a corpus
Usage potential of a corpus
8. A Corpus should-
Faithfully represent both common and special linguistic features
of the language from which it is designed and developed
Be large enough to encompass samples of text from various
disciplines
Be a true replica of physical texts
Preserve various forms of words, punctuation marks, spellings,
variations and other orthographic symbols used in the source text.
Represent all linguistic usage varieties in a proportional manner
Use authentic, referential and verifiable text samples
Enable users to use language data in multiple tasks
Preserve texts in annotated and non-annotated form
10. Quantity:
No fixed parameter
The bigger the corpus, the better its authenticity and
reliability
Data from a variety of sources in large quantity
Refers to the sum total of the linguistic components
included
The current generation of electronic corpora contains millions of words
11. Quality:
Relates to authenticity
Collection from genuine communications
Depends on ideal restriction of the corpus collector's role
Databases should be drawn from actual reality
Interactional properties of casual and informal talks
12. Representativeness:
Proper representation of a broad range of material
Representative of maximum linguistic features
Authentic in representation of text variety
Maximally representative of demographical variables
Overall size of corpus to be set against the diversity of
sources
Random selection of text samples
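Illustration (not from the original slides): one possible way to combine a broad range of sources with random selection of text samples. The function sample_by_category and the category names are assumptions made for this sketch; a fixed seed is used only so that the selection can be repeated.

```python
import random

def sample_by_category(texts_by_category, per_category, seed=0):
    """Randomly pick up to `per_category` texts from every source category.

    Sampling each category separately keeps small categories from being
    crowded out by large ones.
    """
    rng = random.Random(seed)  # seeded so the same selection can be reproduced
    selected = {}
    for category, texts in texts_by_category.items():
        k = min(per_category, len(texts))
        selected[category] = rng.sample(texts, k)
    return selected

# Hypothetical source pool, invented for illustration only.
pool = {
    "news": ["n1", "n2", "n3"],
    "fiction": ["f1"],
}
picked = sample_by_category(pool, per_category=2, seed=1)
```

Whether equal or proportional numbers per category are appropriate depends on the aims of the corpus; the sketch shows only the random selection step.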
13. Simplicity:
Simple and plain text samples
Unbroken string of characters without any added
information
Separate preservation of additional features
Separate storage of extralinguistic information
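Illustration (not from the original slides): a small sketch of keeping the plain text and the extralinguistic information in separate stores linked by a sample identifier. The class CorpusStore and the metadata fields (source, year) are hypothetical and chosen only for this example.

```python
class CorpusStore:
    """Keeps raw text and extralinguistic metadata apart, linked by sample id."""

    def __init__(self):
        self._texts = {}      # sample id -> plain, unannotated text
        self._metadata = {}   # sample id -> extralinguistic information

    def add(self, sample_id, text, **metadata):
        self._texts[sample_id] = text
        self._metadata[sample_id] = dict(metadata)

    def text(self, sample_id):
        """Return the text exactly as stored, with nothing added to it."""
        return self._texts[sample_id]

    def metadata(self, sample_id):
        return self._metadata[sample_id]

    def find(self, **criteria):
        """Return ids of samples whose metadata match every given criterion."""
        return [
            sample_id
            for sample_id, meta in self._metadata.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        ]

store = CorpusStore()
store.add("s1", "A plain text sample.", source="newspaper", year=1995)
store.add("s2", "Another plain sample.", source="novel", year=1995)
```

Because the text is never mixed with its description, the same sample can be used as an unbroken string of characters or retrieved through its documentation.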
14. Equality:
Text samples with an equal number of words
Balance between spoken text samples and written text
samples
Collection of equal amount of text from all sources
Balance in case of quality of samples
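Illustration (not from the original slides): a rough sketch of producing text samples with an equal number of words. It assumes whitespace-separated words and simply discards texts that are too short; equal_size_samples is a hypothetical helper, and other policies (for example, sampling from the middle of a text) are equally possible.

```python
def equal_size_samples(texts, words_per_sample):
    """Cut each text to exactly `words_per_sample` words; skip shorter texts."""
    samples = []
    for text in texts:
        words = text.split()  # naive whitespace split, assumed adequate here
        if len(words) >= words_per_sample:
            samples.append(" ".join(words[:words_per_sample]))
    return samples

# Toy texts, invented for illustration only.
raw_texts = ["one two three four", "five six", "seven eight nine"]
balanced = equal_size_samples(raw_texts, 3)
```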
16. Verifiability:
Must be open to empirical verification
Reflective of actual patterns of language use
Authentic and valid in synchronic and diachronic studies
18. Documentation:
Separation of documentary information from the components
Meticulous documentation of extralinguistic information
Easy retrieval of extralinguistic information (annotated info)
19. Management:
Necessary scheme for maintenance, standardization,
augmentation and upgrading
Protection of data from virus infection
Displacement of corpus data
Conversion of Corpus data across different formats
Adaptation to new hardware and software technology
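Illustration (not from the original slides): a minimal sketch of converting corpus data across formats, here from JSON Lines to CSV using only the Python standard library. The record fields (id, text) and both function names are assumptions made for this example.

```python
import csv
import io
import json

def records_to_jsonl(records):
    """Serialize corpus records as JSON Lines: one JSON object per line."""
    return "\n".join(
        json.dumps(record, ensure_ascii=False, sort_keys=True) for record in records
    )

def jsonl_to_csv(jsonl_text, fieldnames):
    """Convert JSON Lines text to CSV, keeping only the named fields."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for line in jsonl_text.splitlines():
        if line.strip():  # skip blank lines
            writer.writerow(json.loads(line))
    return buffer.getvalue()

# Hypothetical record, invented for illustration only.
records = [{"id": "s1", "text": "hello world"}]
as_jsonl = records_to_jsonl(records)
as_csv = jsonl_to_csv(as_jsonl, fieldnames=["id", "text"])
```

Keeping conversions this explicit makes it easier to check that no text is altered when the corpus moves to a new format or platform.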