I discuss the basics of corpus linguistics, the application of corpus linguistics to linguistic studies and second language learning, and some freely available corpus linguistics resources for beginner corpus linguists.
Citation: Zubaidi, N. (2021). Corpus linguistics: An introduction. UM de Universe 2021. doi: 10.13140/RG.2.2.25479.11683
4.
• Very few language education study programs in Indonesia offer a corpus linguistics (CL) course to students.
• CL has been treated only as a method, rather than as a theory and a field of study.
• Zubaidi et al. (2021): most senior high school English teachers in Malang (N=27) have never learned or used CL in their teaching.
• Purpose: Introduce corpus linguistics to beginner language practitioners
6.
Corpus (pl. Corpora)
• Corpus (Latin for “body”)
• A collection of LARGE, STRUCTURED, AUTHENTIC TEXTS
• ELECTRONICALLY stored and processed (MACHINE-READABLE DATA/TEXT)
• SAMPLED to be REPRESENTATIVE of a particular language use/variety (Xiao, n.d.)
• Corpus vs. text archive vs. database
• The LARGER the corpus, the MORE RELIABLE the generalizations about language use.
(Zubaidi, 2021) UM de Universe
7.
Most corpora are written
• Written text is EASIER TO OBTAIN than spoken text
• Newspapers
• Fiction (e.g. novels, poems)
• Technical literature (e.g. manuals, medicine)
• Personal letters & e-mail
• Advertising (e.g. political propaganda)
• Belief and thought (e.g. Quran, Bible)
• The web (WWW)
8.
Corpus Linguistics (CL)
• The study of language as expressed in corpora (samples) of "real world" text.
• Aim: checking OCCURRENCES and validating LINGUISTIC RULES in a specific language area
• Four primary characteristics of a corpus in CL:
  • SAMPLING and REPRESENTATIVENESS;
  • FINITE SIZE;
  • MACHINE-READABLE form;
  • a STANDARD REFERENCE.
9.
[Figure: Theoretical, Interdisciplinary and Applied Linguistics (Dendrinos, n.d.)]
10.
Comparison
Theoretical Linguistics
• competence (what is grammatical?)
• introspection
• indefinitely many types, productivity
• grammatical vs. ungrammatical
Corpus Linguistics
• performance (what is attested?)
• instances
• finite number of types
• degrees of grammaticality
15.
Size of Corpora
• CORPUS SIZE increases with the DEVELOPMENT OF TECHNOLOGY
• 1960s-70s: 1 million words each (Brown and LOB)
• 1980s: 20 million words (The Birmingham/Cobuild corpus)
• 1990s: 100 million words (BNC)
• 2000s: 645 million words (The Bank of English)
• 2021: billions of words (BYU corpora)
16.
Types of Corpora
• Raw vs. annotated corpora
• Automatically annotated vs. manually annotated corpora
• General/balanced/reference vs. special corpora
• Spoken vs. written language
• Monolingual vs. multilingual corpora
• Parallel vs. comparable corpora
• Synchronic vs. diachronic corpora
• Static/sample vs. dynamic/monitor corpora
• Native vs. learner corpora
• Developmental vs. learner/interlanguage corpora
17.
Popular English Corpora
• The British National Corpus (BNC)
• The Bank of English (BoE)
• BYU American English Corpus (COCA, Wikipedia, Google, Google Books)
• Corpora of the Brown family (Brown, LOB, FLOB, Frown)
• ICE corpora (GB, EA, HK, Singapore, Philippines, New Zealand, etc.)
• London-Lund Corpus of Spoken English
• Santa Barbara Corpus of Spoken American English (SBCSAE)
• The Helsinki Diachronic Corpus of English Texts (8th-18th century)
• The International Corpus of Learner English (ICLE)
• Michigan Corpus of Academic Spoken English (MICASE)
19.
CL in language teaching & the classroom
• Technology has become globally widespread and accessible
• Larger, more powerful computers that can analyze large datasets are available
• Many corpus-related resources are available FOR FREE
• Language teachers and learners can use corpora
• http://iteslj.org/Articles/Krieger-Corpus.html
20.
CL analysis
• Basic analysis:
  • Listing, sorting, and counting of concordances (KWIC, keyword in context)
• Complex analysis:
  • Processing using complex programs (e.g. Complex Ana, WordSmith Tools)
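The basic operations named on this slide (listing, sorting, and counting concordance lines) can be sketched in a few lines of Python. This is a minimal illustration only, not how WordSmith Tools or any other concordancer works internally. The function name `kwic`, the sample text, and the 30-character context width are assumptions made for the example.

```python
import re

def kwic(text, keyword, width=30):
    """List every occurrence of `keyword` with `width` characters of context.

    Returns (left context, matched word, right context) tuples.
    Matching is whole-word and case-insensitive; no lemmatization is done.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    hits = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append((left, m.group(0), right))
    return hits

# Hypothetical sample text standing in for a real corpus file.
SAMPLE_TEXT = (
    "The corpus is large. A corpus should be representative. "
    "Teachers can search a corpus for real examples."
)

hits = kwic(SAMPLE_TEXT, "corpus")                                # listing
hits_sorted = sorted(hits, key=lambda h: h[2].strip().lower())    # sorting by right context

for left, node, right in hits_sorted:
    print(f"{left:>30} | {node} | {right}")
print("Occurrences:", len(hits))                                  # counting
```

Sorting by the right-hand context groups similar continuations together, which is what makes recurring patterns after the keyword easy to spot.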
21.
Possible applications of CL
1. Corpora as a source of empirical data
2. Corpora in language teaching and learning
3. Corpora in lexical studies
4. Corpora in speech research
5. Corpora in grammar studies
6. Corpora and semantic studies
7. Corpora in pragmatic and discourse studies
8. Corpora in sociolinguistic studies
9. Corpora and stylistic studies
10. Corpora in historical linguistics
11. Corpora in dialectology and variational studies
12. Corpora in psycholinguistics
13. Corpora in cultural studies
(Adorjan, 2020; Nkemleke, 2008, 2009)
22.
Examples: CL application
1. Corpora in Teaching and Learning
• Real-life language data for textbook examples
• A critical look at existing language teaching material
(Adorjan, 2020; Nkemleke, 2008, 2009)
23.
CL Classroom Application
• Low-contact vs. high-contact uses (Kitao & Kitao, n.d.)
  • Low: the teacher uses CL to help in teaching
  • High: students use the corpora to learn about language
• Data-driven learning: e.g. determining a word’s connotation (whether positive or negative)
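One way a learner might explore a word's connotation is to count the words that occur near it (its collocates) and see whether they lean negative or positive. The sketch below is a rough illustration under stated assumptions: the mini-corpus, the word forms for "cause", and the tiny `NEGATIVE_WORDS` list are all invented for the example, and a real activity would use a proper corpus and let students judge the collocates themselves.

```python
import re
from collections import Counter

def collocates(text, node_forms, span=2):
    """Count words found within `span` tokens to the left or right of any node form.

    Tokenization is a simple lowercase letter match; windows may cross
    sentence boundaries, which a real tool would usually prevent.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok in node_forms:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts

# Hypothetical mini-corpus and word list, invented for illustration.
MINI_CORPUS = (
    "The storm caused serious damage. Smoking can cause cancer. "
    "The delay caused problems for everyone. Heavy rain may cause flooding."
)
NEGATIVE_WORDS = {"damage", "cancer", "problems", "flooding"}

counts = collocates(MINI_CORPUS, {"cause", "caused"})
negative_hits = sum(counts[w] for w in NEGATIVE_WORDS)

print(counts.most_common(5))
print("Collocates from the negative list:", negative_hits)
```

In this invented sample every occurrence of "cause" is followed by an unpleasant outcome, which is the kind of pattern a student could then check against a larger corpus.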
25.
CL Classroom Application: Strengths
• CL offers a rich, varied, and authentic language database.
• An effective tool for teaching and learning vocabulary
• It may motivate students and attract them to the language type
• A valuable tool for conducting linguistic research
• It trains learners to actively control their learning
(Adorjan, 2020; Ebrahimi & Faghih, 2016)
26.
CL Classroom Application: Weaknesses
• Only accessible using computers/mobile phones and the internet
• Learning the software and designing corpus-based activities is difficult and takes time and energy.
(Adorjan, 2020; Ebrahimi & Faghih, 2016)
28.
Corpus-related Internet resources
1. General resources on corpus linguistics
2. Vocabulary frequency lists and frequency level checkers
3. Online corpora, concordancers, and other text-analysis software
4. E-texts
5. Information about using corpus linguistics for language teaching
29.
FREE Corpus Analysis Tools
• Types: tools tied to specific corpora vs. tools that work with any text or collection of texts
• General: Word, Excel, etc.
• Specialized:
  • Counting words
  • Finding examples of specific words or parts of speech
  • Analyzing word frequencies
  • Evaluating readability
http://www.cis.doshisha.ac.jp/kkitao/library/resource/corpus/corpus.htm
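Several of the specialized tasks listed on this slide (counting words, analyzing word frequencies, and a very rough readability indicator) can be approximated with Python's standard library. The sketch below is illustrative only: the function name `basic_stats` and the sample text are assumptions, and average words per sentence is used here as a crude stand-in for readability, not as an established readability formula.

```python
import re
from collections import Counter

def basic_stats(text):
    """Return simple counts for a text: tokens, types, top words, sentence length."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Split on sentence-final punctuation and ignore empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    freq = Counter(tokens)
    return {
        "tokens": len(tokens),                    # running words
        "types": len(freq),                       # distinct word forms
        "top": freq.most_common(2),               # most frequent words
        "words_per_sentence": len(tokens) / max(len(sentences), 1),
    }

# Hypothetical sample text; replace with the contents of any text file.
SAMPLE = "A corpus is a collection of texts. A large corpus gives more reliable results."

stats = basic_stats(SAMPLE)
for name, value in stats.items():
    print(f"{name}: {value}")
```

The same function could be run over a learner essay and a textbook passage to compare vocabulary range (types vs. tokens) and sentence length side by side.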
30.
Corpus Analysis Tools: Concordancer
• SOFTWARE: AntConc, MonoConc, WordSmith Tools
• ONLINE:
  • Turbo Lingo: http://www.staff.amu.edu.pl/~sipkadan/lingo.htm
  • VIEW (Variation in English Words and Phrases): http://view.byu.edu/
  • BNCweb: http://bncweb.lancs.ac.uk/bncwebSignup/
  • Lextutor: http://www.lextutor.ca/concordancers/concord_e.html
  • WebCorp: http://www.webcorp.org.uk/
  • Text Lex Compare: http://www.lextutor.ca/text_lex_compare/
31.
References
• Dendrinos, B. (n.d.). Unit 1: An Introduction to Applied Linguistics. Applied Linguistics to Foreign Language Teaching and Learning. University of Athens, Greece.
• Nkemleke, D. (2008). Corpus Linguistics and Language Education: Development and Utility of the Corpus of Cameroon English. Humboldt Kolleg Kamerun.
• Nkemleke, D.A. (n.d.). Corpus Linguistic Development with reference to Cameroon. University of Yaounde I.
• Say, B.
• Volk, M. (n.d.). Korpuslinguistik mit und für Computerlinguistik. Universität Zürich.
• Xiao, R. (n.d.). Corpus design and types of corpora. University of Lancaster.
32.
Conclusion
• CL is a fast-developing area.
• The trend is toward using technology in various areas of linguistics (theoretical & applied) and literature.
• It needs to be taught not only as a research method, but also as a field of study.
• Our ever-changing curriculum has not accommodated it. We need a breakthrough.