Corpus Linguistics



         Jesus Guerrero Perez

    Corpus linguistics is the study of actual
    patterns of language use, and it also serves
    as a tool for developing materials for
    classroom language instruction. It provides
    an extremely powerful means of analyzing
    natural language and can offer valuable
    insights into how language use varies across
    situations, such as spoken versus written
    language, or formal interactions versus
    casual conversation.
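    To compare spoken and written language, corpus studies
    usually normalize raw counts to a common base (often per
    million words), because subcorpora rarely have the same
    size. The sketch below illustrates that idea under stated
    assumptions: the two text samples are made up and far too
    small for real conclusions, the tokenizer is a naive
    lowercase regex, and the function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Naive tokenizer (an assumption): lowercase letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def per_million(counts: Counter, total: int, word: str) -> float:
    """Normalized frequency: occurrences of `word` per one million tokens."""
    return counts[word] / total * 1_000_000 if total else 0.0


def compare(word: str, corpus_a: str, corpus_b: str) -> tuple[float, float]:
    """Return the per-million frequency of `word` in two subcorpora."""
    results = []
    for text in (corpus_a, corpus_b):
        tokens = tokenize(text)
        results.append(per_million(Counter(tokens), len(tokens), word))
    return results[0], results[1]


# Hypothetical toy samples standing in for spoken and written subcorpora.
spoken = "well you know I think it's fine you know what I mean"
written = "The committee considers the proposal to be acceptable."

if __name__ == "__main__":
    s, w = compare("you", spoken, written)
    print(f"'you' per million words: spoken={s:.0f}, written={w:.0f}")
```

    Normalizing first means that a larger written subcorpus
    does not appear to "use" a word more simply because it
    contains more text.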
   A corpus is a large, principled collection
    of natural texts. Creating written
    transcripts of spoken language can be quite
    time-consuming, and it involves a series of
    choices based on the research interests of
    the corpus compilers.
   Corpus design and compilation
    A corpus, as defined above, is a large and
    principled collection of texts stored in
    electronic format.

   Types of corpora
    There are as many types of corpora as there
    are research topics in linguistics. The main
    types include:
      • General corpora
      • Specialized corpora
      • Learner corpora
   Issues in corpus design
    One of the most important factors in corpus
    linguistics is the design of the corpus. A
    corpus of one million words will not be large
    enough to provide reliable information about
    less frequent lexical items. Another issue to
    consider when devising a representative
    sample is whether it should be based on
    production or on reception.
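    A back-of-the-envelope calculation shows why one million
    words is too small for rare items. Suppose a hypothetical
    item occurs on average once per ten million words (0.1 per
    million). Its expected count in a one-million-word corpus
    is then only 0.1. The sketch below treats occurrences as a
    Poisson process; this is a simplifying assumption, since
    real words tend to cluster in particular texts. The
    threshold of five hits and the function names are also
    illustrative choices, not established standards.

```python
import math


def expected_count(rate_per_million: float, corpus_size: int) -> float:
    """Expected occurrences of an item, given its rate per million words."""
    return rate_per_million * corpus_size / 1_000_000


def prob_at_least(k: int, rate_per_million: float, corpus_size: int) -> float:
    """P(X >= k) for X ~ Poisson(expected_count(...)).

    The Poisson model is an assumption: it ignores the way real
    words cluster (burstiness), so it is only a rough guide.
    """
    lam = expected_count(rate_per_million, corpus_size)
    # Sum P(X = i) for i = 0 .. k-1, then take the complement.
    below = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - below


if __name__ == "__main__":
    rate = 0.1  # hypothetical item: once per ten million words
    for size in (1_000_000, 100_000_000):
        lam = expected_count(rate, size)
        p = prob_at_least(5, rate, size)
        print(f"{size:>11,} words: expected {lam:g}, P(at least 5 hits) = {p:.3f}")
```

    Under these assumptions, the smaller corpus almost never
    yields even five examples of such an item, while a corpus
    one hundred times larger usually does.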

    Corpus compilation
    When creating a corpus, data collection
    involves obtaining or creating electronic
    versions of the target texts. Written data
    require far less labor to collect than spoken
    data: the data collection phase of building a
    spoken corpus is lengthy and expensive. Most
    spoken corpora use an orthographic
    transcription system that does not attempt to
    capture prosodic details or phonetic variation.