Gabriela da Costa Rosa
Ha Na Choi
Ye Jin Choi
What is a corpus?
“A corpus can be defined as a systematic collection of
naturally occurring texts (of both written and spoken
language)”. Nesselhauf, 2005
What is corpus linguistics?
“Corpus linguistics thus is the analysis of naturally
occurring language on the basis of computerized
corpora. Usually, the analysis is performed with the
help of the computer, i.e. with specialized software,
and takes into account the frequency of the
phenomena investigated”. Nesselhauf, 2005
Advantages
• Shows language in its natural use, with frequency and context.
Disadvantages
• The corpus provides data but does not explain it; interpretation is left to the user.
Interactive work
Corpus of Contemporary American English (COCA)
What can we use it for?
• Learning English - resolve doubts about usage
• Teaching English - prepare and run class activities
How to use COCA
Displays:
• LIST: shows a list of words or word combinations, ranked by frequency.
• CHART: shows a chart comparing the frequency of a word across genres or time periods.
• KWIC (Key Word in Context): shows each occurrence of the search word(s) in context.
• COMPARE: compares two words by frequency, either overall or with a particular collocate.
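To make the LIST and KWIC displays concrete, here is a minimal Python sketch that produces both views from a few invented sentences. It is not COCA's implementation; the function names (`tokenize`, `frequency_list`, `kwic`) and the sample text are assumptions made for illustration only.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def frequency_list(tokens, top=3):
    """LIST-style view: the most frequent tokens with their counts."""
    return Counter(tokens).most_common(top)

def kwic(tokens, keyword, width=3):
    """KWIC-style view: every hit of `keyword` with up to `width`
    tokens of context on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

# Invented sample text (not taken from COCA).
sample = ("I am fully aware of the problem. "
          "She was totally aware of the risk. "
          "We are aware of the problem.")
tokens = tokenize(sample)

for line in kwic(tokens, "aware", width=2):
    print(line)
print(frequency_list(tokens))
```

A real corpus interface would also track genre and year for each text, which is what makes the CHART and COMPARE displays possible.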
How to use COCA
• COLLOCATES: a collocate is a single word (not a phrase) that occurs within up to 10 words before or after the search word(s).
You can set the collocation range with the two small boxes next to the COLLOCATES box.
• POS LIST: a list of parts of speech, used to search for a part of speech (a noun, a verb, etc.) that occurs after a word.
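The idea of a collocation range can be sketched in a few lines of Python: count every word that falls inside a window of a chosen size to the left and right of the search word. This is only an illustration on invented text, not how COCA computes collocates; the function name `collocates` and its `left` / `right` parameters are hypothetical stand-ins for the two range boxes.

```python
from collections import Counter

def collocates(tokens, node, left=4, right=4):
    """Count words occurring up to `left` tokens before and `right`
    tokens after each occurrence of the search word `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - left)
        # Window on both sides, excluding the search word itself.
        window = tokens[start:i] + tokens[i + 1:i + 1 + right]
        counts.update(window)
    return counts

# Invented sample text (not taken from COCA).
tokens = ("she is fully aware of the problem and "
          "he is fully aware of the risk").split()

# One word to the left only: which words come directly before "aware"?
print(collocates(tokens, "aware", left=1, right=0))
# Two words to the right only.
print(collocates(tokens, "aware", left=0, right=2))
```

Setting one side of the window to 0 restricts the search to words that come only before or only after the search word.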
Other devices:
• [word] - all forms of a word
• [=word] - synonyms of a word
• word.[n*] - the word restricted to a specific part of speech (here, noun)
• word * - any word that comes after the word
• word* - any word that begins with "word" (any ending)
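The two wildcard devices behave differently, and a small Python sketch on invented tokens shows the contrast: `word*` matches single words with any ending, while `word *` matches whatever word follows. This is an approximation, not COCA's query engine; the helper names `match_wildcard` and `next_words` are hypothetical, and the forms, synonym and part-of-speech devices are left out because they need an annotated corpus.

```python
from fnmatch import fnmatchcase

def match_wildcard(pattern, tokens):
    """Approximate `word*`: return the tokens matching the wildcard pattern."""
    return [t for t in tokens if fnmatchcase(t, pattern)]

def next_words(word, tokens):
    """Approximate `word *`: return each token that directly follows `word`."""
    return [tokens[i + 1] for i in range(len(tokens) - 1) if tokens[i] == word]

# Invented sample tokens (not taken from COCA).
tokens = ["we", "work", "out", "daily", "workers", "worked", "out", "plans"]

print(match_wildcard("work*", tokens))  # words starting with "work"
print(next_words("work", tokens))       # words that come after "work"
```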
Practical use of COCA
http://corpus.byu.edu/coca/
Suggestion of activities
Activity 1
Search for the most frequent word in a genre
Look at the example sentence below and answer questions a) - c).
Then circle the word that you should use in your paper.
I am fully / totally aware of the problem.
a) In which genre is “totally” most frequently used?
b) In which genre is “fully” most frequently used?
c) So, which word would you use in your paper?
Activity 2
Examples of sentences with phrasal verbs
Find examples in COCA of the following phrasal verbs:
• come across
• figure out
• look after
• work out
Activity 3
Inferring the meaning of words from context
sensitive vs. sensible
Tutorials on YouTube
• Introduction to COCA
https://www.youtube.com/watch?v=sCLgRTlxG0Y
• Using Part-of-Speech Tags
https://www.youtube.com/watch?v=KP-7thiUnLM
• Collocations
https://www.youtube.com/watch?v=t_SxpfiPo_o

Corpus linguistics
