Corpus linguistics

U of K
Faculty of Arts
Department of Linguistics
Applied Linguistics
By Ahmed Sosal ALtayeb M.

Concordance/corpora

A “concordance”, according to the Collins Cobuild English Language Dictionary (1987), is “An alphabetical list of the words in a book or a set of books which also says where each word can be found and often how it is used.”

Concordancers are used extensively these days for creating glossaries and dictionaries, and they are extremely valuable tools for the language teacher. However, as Chambers, Farr & O'Riordan (2011:86) point out, there is still considerable resistance among language teachers (both of EFL and of Modern Foreign Languages) to making use of corpora and concordancers.

The main advantage of concordancers is summed up by Tim Johns in the following paragraph, which uses a phrase he frequently quoted, "the company that words keep":

  MicroConcord [...] offers both language learners and language teachers a research tool for investigating "the company that words keep" that has hitherto usually been available only on mainframe computers to academic researchers in such fields as computational linguistics, lexicography, and stylistics (Hockey 1980). (Johns 1986b:121)

1. Some kinds of concordance (corpora)

1.1. Basic manual concordance

For instance, here is a concordance for the word "sin" in English, prepared manually and shown with the text from which the four separate occurrences of this word are taken.

Concordance 1 on the word "sin":
  1. Thus from my lips, by yours, my Sin is purged.
  2.           Then have my lips the Sin that they have took.
  3.                                 Sin from thy lips? O trespass sweetly urged!
  4.                      Give me my Sin again.

Text used as basis for the concordance:

  JULIET  Ay, pilgrim, lips that they must use in prayer.
  ROMEO   O, then, dear saint, let lips do what hands do;
          They pray, grant thou, lest faith turn to despair.
  JULIET  Saints do not move, though grant for prayers’ sake.
  ROMEO   Then move not, while my prayer’s effect I take.
          Thus from my lips, by yours, my sin is purged.
  JULIET  Then have my lips the sin that they have took.
  ROMEO   Sin from thy lips? O trespass sweetly urged!
          Give me my sin again.

According to this, a concordance may be defined as a list of words (called keywords, e.g. here "sin"), taken from a piece of authentic language (the corpus, e.g. here Romeo and Juliet), displayed in the centre of the page and shown with parts of the contexts in which they occur (here a maximum of 29 characters to the left of the keyword and to the right). This is also known as a Key Word In Context concordance or KWIC concordance.

1.2. Computer-generated concordance

The following is the same concordance, displayed with fuller context (here between 75 and 80 characters each side, including blank spaces):

Concordance 2 on the word "sin":

  1. move not, while my prayer’s effect I take. Thus from my lips, by yours, my sin is purged. JULIET Then have my lips the sin that they have took. ROMEO
  2. Thus from my lips, by yours, my sin is purged. JULIET Then have my lips the sin that they have took. ROMEO Sin from thy lips? O trespass sweetly urged!
  3. is purged. JULIET Then have my lips the sin that they have took. ROMEO Sin from thy lips? O trespass sweetly urged! Give me my sin again
  4. they have took. ROMEO Sin from thy lips? O trespass sweetly urged! Give me my sin again.

Below is a concordance on the same keyword, based this time on a 25-citation sample created by a concordancer, using contemporary texts including British and American books, ephemera, newspapers, magazines, radio transcripts and transcriptions of ordinary conversations.

   1.        said cohabiting was no longer a sin. Serbs free last six
   2.  daily care of others was the ultimate sin. We arranged for Ted to spend a
   3.     remarkable. Shaw’s rendition was a sin against culture, an insult to Eliot
   4.  them that God wants them to turn from sin and transform their lives. Women
   5.   the ascendancy to and loss of power; sin and redemption; self-doubt and
   6.  to prove that all that a life of sex, sin and St Tropez sun brings is wrinkles
   7.   taken seriously. Julian’s account of sin and forgiveness stands unexcelled
   8. deepening anxiety over the question of sin and evil, she took it up. Carolly
   9.     can spring as much from a sense of sin as from sanctity. That, thank God,
  10.           Roebuck was dismissed to the sin bin for 10 minutes for his part in
  11.     is pride, covetousness, deceit and sin, but say you’ll accept adultery and
  12.        is like Sodom and Gomorrah -you Sin City. So the very word know, Youngstown
  13.   of rubber safety bumpers, as ugly as sin. Few mourned its passing. [p] That
  14. White.26 He finds the earthly ideas of sin, guilt, punishment, good and evil
  15.          BERLIN CABARET NOW Decadence, sin… bohemian excess… Once satire,
  16.  sumptuous food shops. with a sense of sin, I bought some on Nevsky Prospekt
  17.     to mine without a tumble. The only sin I’ve committed is not having you
  18.  sin of all: I have heard of a certain sin. I thank God that I do not know of
  19.   cannot announce God’s forgiveness of sin in the Absolution and cannot
  20.     It was during the Reformation that sin in Scotland really got going. Any
  21.        sin is prevalent. Although this sin is a comment on all of mankind, it
  22.   sounds a bit stage-ethnic: `The only sin is to believe that happiness is gone
  23.   insisting on the concept of original sin. It would take on a kind of
  24. bed the selfsame one! More primal than sin itself, this fell to me. [f]
  25. do nothing to deal with her problem of sin. Joni was disturbed by Carl’s
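The KWIC displays above can be approximated with a few lines of code. The following is a minimal Python sketch, not the method used by MicroConcord or by any concordancer mentioned in this text. The function name kwic, the width parameter and the whole-word, case-insensitive matching rule are all assumptions made for illustration.

```python
import re


def kwic(text, keyword, width=30):
    """Return one KWIC line per occurrence of `keyword` in `text`.

    Illustrative sketch only: `kwic` and `width` are hypothetical names,
    and whole-word, case-insensitive matching is an assumed rule.
    """
    # Collapse line breaks and repeated spaces so that the context
    # windows are cut from one continuous string.
    flat = " ".join(text.split())
    # \b keeps "sin" from matching inside "single" or "sinner".
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # Right-justify the left context so every keyword starts
        # in the same column.
        lines.append(f"{left:>{width}}{match.group(0)}{right}")
    return lines


sample = (
    "Thus from my lips, by yours, my sin is purged. "
    "Then have my lips the sin that they have took. "
    "Sin from thy lips? O trespass sweetly urged! "
    "Give me my sin again."
)

for line in kwic(sample, "sin", width=30):
    print(line)
```

Raising width gives the fuller context of Concordance 2, while a small value gives the narrow display of Concordance 1. Because the context is cut by character count, words at the edges of a line may be truncated, as they are in many concordancer displays.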
1.3. Parallel concordance

A parallel French-English concordance on "pour" using an extract from Le Petit Prince by Antoine de Saint-Exupéry

  1. Original text: Ainsi, quand il aperçut POUR la première fois mon avion [...]
     Translation:   The first time he saw my aeroplane, for instance [...]

  2. Original text: Alors elle avait forcé sa toux POUR lui infliger quand même des remords.
     Translation:   Then she forced her cough a little more SO THAT he should suffer from remorse just the same.

  3. Original text: -Approche-toi que je te voie mieux, lui dit le roi qui était tout fier d’être enfin roi POUR quelqu’un.
     Translation:   “Approach, so that I may see you better,” said the king, who felt consumingly proud of being at last a king OVER somebody.

  4. Original text: Car, POUR les vaniteux, les autres hommes sont des admirateurs.
     Translation:   For, TO conceited men, all other men are admirers.

  5. Original text: C’est comme POUR la fleur.
     Translation:   It is just as it is WITH the flower.

  6. Original text: C’est donc POUR ça encore que j’ai acheté une boîte de couleurs et des crayons.
     Translation:   It is FOR THAT PURPOSE, again, that I have bought a box of paints and some pencils.

  7. Original text: C’est le même paysage que celui de la page précédente, mais je l’ai dessiné une fois encore POUR bien vous le montrer.
     Translation:   It is the same as that on page 90, but I have drawn it again TO impress it on your memory.

  8. Original text: Elle ferait semblant de mourir POUR échapper au ridicule.
     Translation:   She would [...] pretend that she was dying, TO avoid being laughed at.

  9. Original text: et c’était bien commode POUR faire chauffer le déjeuner du matin
     Translation:   and they were very convenient FOR heating his breakfast in the morning.

  10. Original text: Il commença donc par les visiter POUR y chercher une occupation et POUR s’instruire.
      Translation:   He began therefore, by visiting them, IN ORDER TO add to his knowledge.

  11. Original text: Il me fallut longtemps POUR comprendre d’où il venait.
      Translation:   It took me a long time TO learn where he came from.

  12. Original text: J’avais le reste du jour POUR me reposer, et le reste de la nuit POUR dormir...
      Translation:   I had the rest of the day FOR relaxation and the rest of the night FOR sleep.

  13. Original text: POUR toi je ne suis qu’un renard semblable à cent mille renards
      Translation:   TO you, I am nothing more than a fox like a hundred thousand other foxes

Conclusion
A corpus is either just one text or a collection of texts. Samples of KWIC concordances from Romeo and Juliet are shown above; in this case the corpus was Shakespeare’s play. A corpus can also be just one student’s essay. If the intention is to study the style of, say, Shakespeare, the corpus must be limited to his works, but if the intention is to study the grammar and semantics of a whole language, the corpus must contain many texts representing many genres. Likewise, if we want to study 18th-century English, we must make sure that the corpus contains a representative number of texts from the 18th century only. So the contents of a corpus depend on the aims of the user.

Reference

Lamy M-N. & Klarskov Mortensen H. J. (2011) Using concordance programs in the Modern Foreign Languages classroom.