British national corpus


Published in: Education

  1. 1. The British National Corpus (BNC) is one of the most important corpora in the field of linguistics. It contains British English data from the late twentieth century and covers a variety of genres.<br />
  2. 2. CREATION OF THE BRITISH NATIONAL CORPUS (BNC)<br />The project was developed by the BNC Consortium with the support of the British Library and the British Academy. The academic research centres involved include the University Centre for Computer Corpus Research on Language and Oxford University Computing Services.<br />
  3. 3. The construction of the corpus began in 1991 and was finished in 1994. Although no more texts were added to the corpus, it was revised in 2001 with the publication of BNC World, and again in 2007 with a new edition called the BNC XML Edition.<br />
  4. 4. Two smaller sub-corpora have been derived from the corpus:<br />● The BNC Sampler is a collection of one million written words and one million spoken words.<br />● The BNC Baby collects four one-million-word samples, each belonging to a different genre.<br />
  5. 5. The British National Corpus follows the Guidelines of the Text Encoding Initiative (TEI). The corpus consists of two parts:<br />● Written part (90%): It covers data from several sources, such as books, periodicals, brochures and leaflets, including regional and national newspapers, journals for all ages and interests, academic books, popular fiction and university essays.<br />
  6. 6. ● Spoken part (10%): It consists of orthographic transcriptions of informal conversations and of spoken language collected in different contexts.<br />
  7. 7. Why use the British National Corpus (BNC)?<br />The BNC can be used to discover aspects of a word that we did not know and to check our intuitions about its meaning: the corpus shows what a word actually means in use, not just what we think it means. It also makes it possible to check, for instance, whether a word collocates with a given set of words and whether it is grammatically acceptable in specific contexts.<br />
  8. 8. If we look for the word “bent” plus the preposition “on”, the BNC shows that this combination of words appears in a specific context. From a grammatical point of view, the British National Corpus shows that “bent on” is followed only by a noun or noun phrase, or by a verb with the suffix -ing. Let’s look at it in the next image:<br />
  9. 9.
  10. 10. HOW TO USE THE BRITISH NATIONAL CORPUS<br />There are two ways of using the British National Corpus, which differ in complexity:<br />● Xaira: It can be used to check the spelling of a word, to compare the frequency of different variants and to find out whether a certain word is part of the BNC.<br />● The BNC Simple Search: It is a quick way of searching for a word or phrase. This type of search can be used to check the spelling of a word and to compare the frequency of its variants.<br />
  11. 11. To use the BNC Simple Search, type the word or phrase you want to find in the search box. Once the search has been run, a list of up to 50 selected instances appears on the screen, headed by a note of the total frequency of the word or phrase.<br />
  12. 12.
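The behaviour described for the Simple Search (a total frequency plus up to 50 example instances) can be imitated on a small scale. The following is only a minimal sketch: the function name `simple_search`, the toy sentences and the choice to keep the first 50 hits are assumptions made for illustration, and they do not reproduce how the BNC service itself selects its examples.

```python
import re

def simple_search(texts, phrase, max_hits=50):
    """Return (total frequency, up to max_hits texts containing the phrase)."""
    # Whole-word, case-insensitive match of the word or phrase.
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    total = 0
    examples = []
    for text in texts:
        count = len(pattern.findall(text))
        if count:
            total += count
            # Keep only the first max_hits matching texts (an assumption;
            # the real service may pick its instances differently).
            if len(examples) < max_hits:
                examples.append(text)
    return total, examples

# Hypothetical toy data, not taken from the BNC.
TOY_TEXTS = [
    "He was bent on winning the race.",
    "The branch bent under the snow.",
    "She seemed bent on leaving early, bent on it in fact.",
]

total, examples = simple_search(TOY_TEXTS, "bent on")
print(f"Total frequency: {total}")
for example in examples:
    print(" -", example)
```

The total counts every occurrence, while the example list is capped, which is why the two numbers can differ.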
  13. 13. For more complex queries, the following characters can be added to the search words. The _ character matches any single word, the = character restricts a word to a given part of speech, and braces {} enclose a pattern that the word must match.<br />
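To make the three special characters more concrete, here is a minimal sketch of a matcher that applies them to a short list of tagged words. It is an illustration under assumptions, not the BNC search engine: the functions `token_matches` and `find_matches`, the toy token list and the part-of-speech labels attached to it are all hypothetical, and the braces are interpreted here as a Python regular expression.

```python
import re

def token_matches(query_token, word, tag):
    """Check one query token against one (word, tag) pair."""
    word = word.lower()
    if query_token == "_":
        return True  # "_" matches any single word
    if query_token.startswith("{") and query_token.endswith("}"):
        # Braces enclose a pattern that the whole word must match.
        return re.fullmatch(query_token[1:-1], word) is not None
    if "=" in query_token:
        # "word=TAG" restricts the word to one part of speech.
        wanted_word, wanted_tag = query_token.split("=", 1)
        return word == wanted_word.lower() and tag == wanted_tag
    return word == query_token.lower()

def find_matches(query, tagged_tokens):
    """Return every run of words that matches the whole query."""
    parts = query.split()
    hits = []
    for start in range(len(tagged_tokens) - len(parts) + 1):
        window = tagged_tokens[start:start + len(parts)]
        if all(token_matches(q, w, t) for q, (w, t) in zip(parts, window)):
            hits.append(" ".join(w for w, _ in window))
    return hits

# Hypothetical tagged tokens; the tags are illustrative labels only.
TOY_TOKENS = [
    ("they", "PNP"), ("house", "VVB"), ("students", "NN2"), ("in", "PRP"),
    ("a", "AT0"), ("big", "AJ0"), ("house", "NN1"), ("with", "PRP"),
    ("bread", "NN1"), ("and", "CJC"), ("butter", "NN1"), ("and", "CJC"),
    ("they", "PNP"), ("sang", "VVD"),
]

print(find_matches("bread _ butter", TOY_TOKENS))
print(find_matches("house=VVB", TOY_TOKENS))
print(find_matches("{s[aiu]ng}", TOY_TOKENS))
```

In this sketch a plain word matches every occurrence, while `house=VVB` keeps only the occurrence tagged as a verb.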
  14. 14. In addition, the “display” option on the search screen offers four choices when we are looking for a word: LIST, CHART, KWIC and COMPARE. There are three more options under the label “search string”: word, collocation and POS list.<br />
  15. 15.
  16. 16. In addition, there is a section called “sorting and limits”. Results can be sorted by frequency, relevance or alphabetical order.<br />
  17. 17. The corpus includes several categories or labels for texts of different kinds, such as “spoken”, “fiction”, “magazine”, “newspaper” or “non-academic texts”.<br />
  18. 18. For instance, if we look for the word “couch”, the corpus shows us that this word collocates with different words: lying, lay, room, potato, etc. After clicking on one of these words, several examples appear on the screen. The corpus therefore allows us to look for a word or phrase and, at the same time, to find its collocations. Looking for a collocation is as easy as typing the word to be searched for; an asterisk automatically appears in the collocation box. Once the search has been run, the corpus displays a list of words which collocate with the word. Let’s see:<br />
  19. 19.
  20. 20.
  21. 21.
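The idea behind a collocation search can be sketched in a few lines: count the words that occur within a few positions of the search word. This is only an illustration under assumptions. The function `collocates`, the window size and the toy sentences are hypothetical, and the real interface may rank or filter collocates differently.

```python
from collections import Counter
import re

def collocates(sentences, node, window=4):
    """Count the words found within `window` positions of the node word."""
    counts = Counter()
    node = node.lower()
    for sentence in sentences:
        # Very simple tokenisation: lower-case runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", sentence.lower())
        for i, token in enumerate(tokens):
            if token == node:
                left = tokens[max(0, i - window):i]
                right = tokens[i + 1:i + 1 + window]
                counts.update(left + right)
    return counts

# Hypothetical toy sentences, not taken from the BNC.
TOY_SENTENCES = [
    "He is a couch potato.",
    "She was lying on the couch.",
    "A real couch potato never moves.",
    "The cat lay on the couch in the living room.",
]

counts = collocates(TOY_SENTENCES, "couch")
for word in ("potato", "lying", "lay", "room"):
    print(word, counts[word])
```

Raw counts like these favour very frequent words such as “the”, which is one reason corpus tools usually offer ways to sort or limit the collocate list.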
  22. 22. The KWIC (key word in context) search is valuable because it shows the person looking for a word the grammatical structures and contexts in which it appears. For example, if we look for the word “shoes”, the corpus shows in colours the different words which can be used with it: “a new pair of”, “the soles of our”, “the second hand”, “new polished”, “thousands of”, etc.<br />
  23. 23.
  24. 24.
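A KWIC display simply lines up each occurrence of the search word with a few words of context on either side. The sketch below shows the principle on a toy text. The function `kwic`, the context width and the example text are assumptions for illustration, and the colour coding of the real interface is not reproduced.

```python
def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence."""
    tokens = text.split()
    lines = []
    for i, token in enumerate(tokens):
        # Ignore surrounding punctuation and case when comparing.
        if token.strip(".,;:!?\"'").lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines

# Hypothetical toy text, not taken from the BNC.
TOY_TEXT = (
    "She bought a new pair of shoes yesterday. "
    "The soles of our shoes were worn."
)

concordance = kwic(TOY_TEXT, "shoes")
for left, keyword, right in concordance:
    # Right-align the left context so the keywords line up in a column.
    print(f"{left:>20} [{keyword}] {right}")
```

Aligning the keyword in one column is what makes recurring patterns such as “pair of shoes” easy to spot.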
  25. 25. COMPARISON BETWEEN THE BRITISH NATIONAL CORPUS AND THE COCA<br />In terms of size there is a huge difference between the two corpora, as the COCA is about four times bigger than the BNC: the COCA is made up of more than 410 million words, whereas the BNC contains 100 million. In terms of composition, the COCA is divided among spoken texts, fiction, popular magazines, newspapers and academic texts, and each of those genres accounts for 20% of the total.<br />
  26. 26. The BNC, in contrast, is divided into 90% written and 10% spoken English. The COCA also contains more recent material, because it has continued to be updated, while the BNC represents the language of the late twentieth century.<br />
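The proportions in this comparison are easier to judge as absolute word counts. The short calculation below uses only the figures given in these slides (100 million words for the BNC, a little over 410 million for the COCA, a 90/10 split and 20% per COCA genre). Treating the COCA as exactly 410 million words is a simplifying assumption.

```python
# Figures taken from the slides; COCA is rounded down to 410 million.
BNC_WORDS = 100_000_000
COCA_WORDS = 410_000_000

size_ratio = COCA_WORDS / BNC_WORDS        # how many times larger COCA is
bnc_written = BNC_WORDS * 90 // 100        # 90% written
bnc_spoken = BNC_WORDS * 10 // 100         # 10% spoken
coca_per_genre = COCA_WORDS * 20 // 100    # 20% per genre

print(f"COCA / BNC size ratio: {size_ratio:.1f}")
print(f"BNC written words:     {bnc_written:,}")
print(f"BNC spoken words:      {bnc_spoken:,}")
print(f"COCA words per genre:  {coca_per_genre:,}")
```

On these figures a single 20% slice of the COCA is about eight times the size of the BNC's spoken part and close to the size of its written part.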
  27. 27. Information sources<br />British National Corpus. (2011, April 9). In Wikipedia. Retrieved 19:40, April 9, 2011.<br />British National Corpus. Retrieved April 9, 2011.<br />BYU-BNC: British National Corpus. Mark Davies / Brigham Young University. Retrieved 19:40, April 9, 2011.<br />Encoding the British National Corpus. Retrieved 19:40, April 9, 2011.<br />“Phrases in English” (PIE) and the British National Corpus. Retrieved 19:40, April 9, 2011.<br />