This document provides information about what a corpus is and how it can be used. It defines a corpus as a large collection of written or spoken text stored electronically. It then lists some examples of text that may be included in a corpus, such as books, periodicals, unpublished letters, and transcripts of spoken language. The document also categorizes different types of written and spoken language that could be included in a corpus, such as academic writing, novels, phone calls, and parliamentary debates. It concludes by explaining how analyzing corpora can help language learners with word choice, noticing patterns, and finding answers to language questions.
Corpus methods, Pia Paola
What is a corpus?
• Corpus (pl. corpora) = Latin for “body”
• Collection of written text or transcribed speech
• Usually, but not necessarily, purposefully collected
• Usually, but not necessarily, structured
A corpus is a large collection of searchable text stored electronically…
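To make "searchable text stored electronically" concrete, here is a minimal sketch of a corpus as a list of texts with a little metadata, plus a word search and a frequency count. The `Text` class, its fields (`mode`, `genre`), the simple tokenizer and the two sample sentences are hypothetical illustrations, not part of any real corpus or corpus tool.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Text:
    """One text in the corpus plus hypothetical metadata fields."""
    text_id: str
    mode: str  # e.g. "written" or "spoken"
    genre: str
    content: str


def tokenize(content: str) -> list[str]:
    """Lower-case word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z']+", content.lower())


def search(corpus: list[Text], word: str) -> list[str]:
    """Return the IDs of the texts that contain the word."""
    word = word.lower()
    return [t.text_id for t in corpus if word in tokenize(t.content)]


def frequencies(corpus: list[Text]) -> Counter:
    """Count every word token across the whole corpus."""
    counts = Counter()
    for t in corpus:
        counts.update(tokenize(t.content))
    return counts


# Two invented texts: one written, one transcribed speech.
corpus = [
    Text("W1", "written", "novel", "She made a decision and left the house."),
    Text("S1", "spoken", "conversation", "Did you make a decision yet? No decision yet."),
]

print(search(corpus, "decision"))       # ['W1', 'S1']
print(frequencies(corpus)["decision"])  # 3
```

Keeping metadata alongside each text is what lets a "structured" corpus be filtered, for example by mode or genre, before searching.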
Written texts by medium
• 60% books
• 25% periodicals
• 5% brochures and other ephemera (e.g. bus tickets, produce containers, junk mail)
• 5% unpublished letters, essays, minutes
• 5% plays, speeches (written to be spoken)
Written texts by level
• 30% literary or technical (“high”)
• 45% “middle”
• 25% informal (“low”)
• Obvious difficulty: how to judge these levels a priori
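The percentages for medium and level work as sampling targets: given a planned corpus size, they fix how many words each category should contribute. The sketch below shows that arithmetic; the 10-million-word target is a hypothetical figure chosen for illustration, and `allocate` is an illustrative helper, not part of any corpus toolkit.

```python
# Sampling targets for written texts, copied from the medium and level lists.
MEDIUM = {
    "books": 0.60,
    "periodicals": 0.25,
    "brochures and other ephemera": 0.05,
    "unpublished letters, essays, minutes": 0.05,
    "plays, speeches (written to be spoken)": 0.05,
}
LEVEL = {"high": 0.30, "middle": 0.45, "low": 0.25}


def allocate(total_words: int, proportions: dict[str, float]) -> dict[str, int]:
    """Split a word budget across categories in proportion to their shares."""
    if abs(sum(proportions.values()) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    # Each share is rounded independently to a whole number of words.
    return {name: round(total_words * share) for name, share in proportions.items()}


# Hypothetical target: 10 million words of written text.
TARGET = 10_000_000
for name, words in allocate(TARGET, MEDIUM).items():
    print(f"{name}: {words:,}")
for name, words in allocate(TARGET, LEVEL).items():
    print(f"{name}: {words:,}")
```

Because each share is rounded on its own, for some budgets the parts can differ from the total by a word or two; for a target of 10 million they add up exactly.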
Spoken texts
• Context-governed material:
  • Lectures, tutorials, classrooms
  • News reports
  • Product demonstrations, consultations, interviews
  • Sermons, political speeches, public meetings, parliamentary debates
  • Sports commentaries, phone-ins, chat shows
• Samples from 12 different regions
ICE (International Corpus of English) text categories. Numbers in brackets are numbers of texts; each sample should be 2,000 words.
Spoken (300)
  Dialogues (180)
    Private (100): Conversations (90), Phone calls (10)
    Public (80): Class lessons (20), Broadcast discussions (20), Broadcast interviews (10), Parliamentary debates (10), Cross-examinations (10), Business transactions (10)
  Monologues (120)
    Unscripted (70): Commentaries (20), Unscripted speeches (30), Demonstrations (10), Legal presentations (10)
    Scripted (50): Broadcast news (20), Broadcast talks (20), Non-broadcast talks (10)
Written (200)
  Non-printed (50)
    Student writing (20): Student essays (10), Exam scripts (10)
    Letters (30): Social letters (15), Business letters (15)
  Printed (150)
    Academic (40): Humanities (10), Social Sciences (10), Natural Sciences (10), Technology (10)
    Popular (40): Humanities (10), Social Sciences (10), Natural Sciences (10), Technology (10)
    Reportage (20): Press reports (20)
    Instructional (20): Administrative writing (10), Skills/hobbies (10)
    Persuasive (10): Editorials (10)
    Creative (20): Novels (20)
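The subtotals in the ICE category tree can be checked mechanically. The sketch below transcribes the tree into a nested dictionary (the transcription and the `count_texts` helper are illustrative, not an official ICE resource) and sums the leaves to confirm the subtotals and the overall corpus size at 2,000 words per sample.

```python
# ICE text categories transcribed from the tree: category -> number of texts.
ICE_DESIGN = {
    "Spoken": {
        "Dialogues": {
            "Private": {"Conversations": 90, "Phone calls": 10},
            "Public": {
                "Class lessons": 20, "Broadcast discussions": 20,
                "Broadcast interviews": 10, "Parliamentary debates": 10,
                "Cross-examinations": 10, "Business transactions": 10,
            },
        },
        "Monologues": {
            "Unscripted": {"Commentaries": 20, "Unscripted speeches": 30,
                           "Demonstrations": 10, "Legal presentations": 10},
            "Scripted": {"Broadcast news": 20, "Broadcast talks": 20,
                         "Non-broadcast talks": 10},
        },
    },
    "Written": {
        "Non-printed": {
            "Student writing": {"Student essays": 10, "Exam scripts": 10},
            "Letters": {"Social letters": 15, "Business letters": 15},
        },
        "Printed": {
            "Academic": {"Humanities": 10, "Social Sciences": 10,
                         "Natural Sciences": 10, "Technology": 10},
            "Popular": {"Humanities": 10, "Social Sciences": 10,
                        "Natural Sciences": 10, "Technology": 10},
            "Reportage": {"Press reports": 20},
            "Instructional": {"Administrative writing": 10, "Skills/hobbies": 10},
            "Persuasive": {"Editorials": 10},
            "Creative": {"Novels": 20},
        },
    },
}

WORDS_PER_SAMPLE = 2000


def count_texts(node) -> int:
    """Recursively sum the leaf counts of a nested category dictionary."""
    if isinstance(node, int):
        return node
    return sum(count_texts(child) for child in node.values())


total = count_texts(ICE_DESIGN)
print(count_texts(ICE_DESIGN["Spoken"]), count_texts(ICE_DESIGN["Written"]), total)  # 300 200 500
print(f"{total * WORDS_PER_SAMPLE:,} words")  # 1,000,000 words
```

The same recursive sum works at any level of the tree, so each bracketed subtotal (for example Dialogues = 180 or Printed = 150) can be checked the same way.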
How can it help?
A corpus can help language learners make better decisions about word choice, notice patterns in language use, and find answers to their language questions.
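One common way a corpus supports word choice is a concordance (key word in context, KWIC) view, together with a count of the words that occur near a keyword. The sketch below is an illustration under stated assumptions: the four sentences are invented, the tokenizer is crude, and the hypothetical `left_collocates` helper simply looks a fixed number of words to the left instead of identifying verbs properly.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def kwic(texts: list[str], keyword: str, window: int = 3) -> list[str]:
    """Key Word In Context: each hit of `keyword` with `window` words either side."""
    lines = []
    for text in texts:
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token == keyword:
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                lines.append(f"{left:>25} [{token}] {right}")
    return lines


def left_collocates(texts: list[str], keyword: str, distance: int = 2) -> Counter:
    """Count the word `distance` positions left of the keyword (e.g. VERB in 'VERB a decision')."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token == keyword and i >= distance:
                counts[tokens[i - distance]] += 1
    return counts


# Invented example sentences standing in for a real corpus.
texts = [
    "We need to make a decision by Friday.",
    "The board will take a decision next week.",
    "Did you make the decision on your own?",
    "It was a difficult decision to make.",
]

for line in kwic(texts, "decision"):
    print(line)
print(left_collocates(texts, "decision").most_common())  # [('make', 2), ('take', 1), ('a', 1)]
```

Reading the aligned concordance lines is how a learner notices patterns; the stray "a" in the counts shows why a fixed-distance heuristic is only a rough first pass.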