What is Corpus Linguistics?
Corpus linguistics is the study of language as expressed in samples (corpora) of "real world" text. This method represents a data-driven approach to deriving the set of abstract rules by which a natural language is governed, or by which it relates to another language. Originally compiled by hand, corpora are now largely derived by an automated process.
One of the main contributions of corpus linguistics is in exploring patterns of language use. Corpus linguistics provides an extremely powerful tool for analyzing natural language and how its use varies in different situations.
As a result of these advances, four features are typically seen as characteristic of corpus-based analyses of language:
o It is empirical, analyzing the actual patterns of use in natural texts.
o It uses a large and principled collection of natural texts, known as a 'corpus', as the basis for analysis.
o It makes extensive use of computers for analysis, using both automatic and interactive techniques.
o It depends on both quantitative and qualitative analytical techniques.
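The quantitative side of such analyses often starts by comparing how often a word occurs in texts of different sizes. The following is a minimal sketch, not a prescribed method: the names `tokenize` and `rate_per_thousand`, the very simple tokenization rule, the choice of a per-1,000-words rate, and the two invented sample texts are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def rate_per_thousand(word, text):
    """Return the number of occurrences of `word` per 1,000 tokens of `text`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0  # avoid dividing by zero on an empty text
    return 1000 * tokens.count(word.lower()) / len(tokens)


# Two tiny invented samples standing in for different situations of use.
conversation = "well I think you know I mean I think it is fine"
news_report = "the minister said the report would be published in the spring"

for label, sample in [("conversation", conversation), ("news", news_report)]:
    print(label, round(rate_per_thousand("i", sample), 1))
```

Normalizing to a rate lets texts of unequal length be compared directly; interpreting why the rates differ is where the qualitative side of the analysis comes in.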
Corpus Design and Compilation
A corpus is a large and principled collection of texts stored in electronic format. There is no minimum size for a text collection to be considered a corpus. Electronic storage is a significant development, as it enables researchers all over the world to access the same sets of data, which not only encourages a higher degree of accountability in data analysis but also permits collaborative work and follow-up studies by different researchers.
Types of Corpora
There are as many types of corpora as there are research topics in linguistics. General corpora, such as the Brown Corpus, the LOB, or the BNC, aim to represent language in its broadest sense and to serve as a widely available resource for baseline or comparative studies of general linguistic features.
A general corpus is designed to be balanced and to include language samples from a wide range of registers or genres, including both fiction and nonfiction in all their diversity.
Corpus Compilation
When creating a corpus, data collection involves obtaining or creating electronic versions of the target texts, and storing and organizing them. Written corpora are far less labor intensive to collect than spoken corpora.
The data collection phase of building a spoken corpus is lengthy and expensive. The first step is to decide on a transcription system.
Word Counts and Basic Corpus Tools
There are many levels of information that can be gathered from a corpus. These levels range from simple word lists to analyses that can reveal linguistic association patterns.
The tools that are used for these analyses range from basic concordance packages to complex interactive computer programs.
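The two most basic outputs mentioned here, a word list and a concordance, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a description of any particular concordance package: the function names `tokenize`, `word_list`, and `concordance`, the simple tokenization rule, and the sample text are all invented for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def word_list(tokens, top_n=5):
    """Return the `top_n` most frequent words as (word, count) pairs."""
    return Counter(tokens).most_common(top_n)


def concordance(tokens, keyword, width=3):
    """Return key-word-in-context lines: each hit with `width` tokens on either side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Invented sample text standing in for a (very small) corpus.
sample = (
    "The corpus is large. The corpus is stored in electronic format, "
    "and the texts in the corpus are natural."
)
tokens = tokenize(sample)

print(word_list(tokens, top_n=3))
for line in concordance(tokens, "corpus"):
    print(line)
```

The word list gives a purely quantitative view, while the concordance lines put each occurrence back in its context so that patterns of use can be read and interpreted.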