CORPUS-INTEGRATED DATA-DRIVEN LEARNING IN LANGUAGE CLASSROOM: A REVIEW OF EMPIRICAL STUDY
1. CORPUS-INTEGRATED DATA-DRIVEN
LEARNING IN LANGUAGE CLASSROOM:
A REVIEW OF EMPIRICAL STUDY
DR HAFIZAH HAJIMIA
SENIOR LECTURER
UITM ARAU, PERLIS
hafizah.hajimia@uitm.edu.my
2. Introduction
• The growing interest in identity and language education, especially in the new norm, coupled with increased interest in digital technology and transnationalism, has resulted in a rich body of work that has informed language learning, teaching, and research.
• Research in language teaching and learning has increasingly drawn on corpus linguistics and the data-driven learning (DDL) approach, which refers to
the application of corpora to investigate language use.
• This approach, rooted in the principles of discovery or inquiry learning, allows students to learn English by discovering
language patterns in the data presented in a corpus, an extensive collection of authentic spoken or written texts stored on a computer
for language analysis.
• A growing body of evidence from empirical studies in language classrooms indicates that corpora offer abundant opportunities for
data-driven learning (DDL) that can support students’ language development.
• Although perceived as a promising language learning approach, DDL appears to have stagnated on its path into the classroom.
• Notwithstanding the availability of free online corpora and classroom guides, DDL has not been “part of mainstream teaching practice”
(Boulton, 2010, p. 534).
• The primary purpose of this article is to present empirical findings that provide a rationale for the practice of corpus-integrated data-driven learning
(CiDDL) in the language classroom.
3. Corpus-integrated data-driven learning (CiDDL) is an inductive instructional approach
using computer-generated concordances. It provides students with the
opportunity to analyze different language forms across contexts found in the
concordance output. The idea of engaging students to discover the language
rules and patterns from authentic learning materials is central to the theory of
inquiry-based learning.
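To make the idea of a computer-generated concordance concrete, the following is a minimal keyword-in-context (KWIC) sketch in Python. It is an illustration only, not the tool used in any of the studies reviewed here; the function name `kwic`, the `width` parameter, and the sample sentences are all hypothetical.

```python
import re

def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per occurrence of `keyword`.

    `width` is the number of characters of context shown on each side
    (an illustrative choice, not a standard setting).
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keyword lines up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right:<{width}}")
    return lines

# Hypothetical mini-corpus, invented for this example.
sample = (
    "Students make progress when they make an effort. "
    "Teachers make decisions quickly, and learners make mistakes often."
)

concordance = kwic(sample, "make", width=20)
for line in concordance:
    print(line)
```

Reading down the aligned keyword column, a student can notice which nouns follow "make" (progress, an effort, decisions, mistakes) without being told the rule first, which is the inductive step the approach relies on.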
Despite the robust research support, however, CiDDL has not been widely adopted, in
part because of a dearth of practical and specific recommendations for educators. More
studies are needed to corroborate the claim that the approach can promote the
4. The use of corpus linguistics – the compilation and analysis of corpora – was
initially advocated by John Sinclair (Johns, 1994; Moon, 2007).
The term data-driven learning (DDL) was later popularized to refer to the
language learning strategy that allows students to be “language detectives” or
“researchers” to explore language data (Johns, 1991).
When Johns initially used DDL with his postgraduate students to improve their
writing, concordancers were far less readily available than they are
today (Boulton, 2012).
Data-Driven Learning?
5. Corpus Linguistics involves the construction of large digital collections of authentic
texts (corpora) and their investigation through software tools (e.g. McEnery and
Hardie 2012).
It involves systematically compiling data.
It is regarded by many scholars as a technological enhancement to traditional
linguistic methodologies (Houston, 2012).
It is a powerful methodology, a way of using computers to assist the analysis of
language so that regularities among many millions of words can be quickly and
accurately identified (Baker & McEnery, 2015).
What is corpus linguistics?
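As a rough illustration of how a computer can surface regularities across texts, the sketch below counts word frequencies and adjacent word pairs (bigrams) using only the Python standard library. The three sentences stand in for a corpus and are invented for this example; the helper names `tokenize` and `bigram_counts` are hypothetical, and real corpus tools apply far more careful tokenization and statistics.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def bigram_counts(docs):
    """Count adjacent word pairs, never pairing words across documents."""
    counts = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        counts.update(zip(tokens, tokens[1:]))
    return counts

# Hypothetical mini-corpus, invented for this example.
docs = [
    "She made a decision to make an effort.",
    "They make an effort every day.",
    "We make an effort and make a decision together.",
]

word_counts = Counter(tok for doc in docs for tok in tokenize(doc))
pair_counts = bigram_counts(docs)

print(word_counts.most_common(5))
print(pair_counts.most_common(5))
```

The same counting logic applies unchanged to millions of words; only the size of the input changes, which is why frequency lists and recurring word pairs are a common starting point for corpus analysis.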
6. CiDDL: new norm, online distance learning (ODL), IR 4.0
CiDDL is compatible with machine learning and can draw on corpora in various genres, which is in line with
education for the Fourth Industrial Revolution (IR 4.0).
It caters to the new norm for language learning and is easily applied in ODL. It encourages
lifelong learning, offers flexible features, instills 21st-century education (PAK21), includes
work-based learning (WBL), and is based on experiential and autonomous learning.
The findings from this study indicate the potential of CiDDL for future teaching-materials developers and
ODL classroom practitioners who may want to integrate it in their language classrooms and, by
extension, encourage learners to move to a self-paced, self-accessed, and independent mode of
learning.
7. Findings
Authors | Title | Year
A Boulton | Data-driven learning: Taking the computer out of the equation | 2010
A Boulton, T Cobb | Corpus use in language learning: A meta-analysis | 2017
K Ackermann, YH Chen | Developing the Academic Collocation List (ACL) – A corpus-driven and expert-judged approach | 2013
D Gablasova, V Brezina, T McEnery | Collocations in corpus-based language learning research: Identifying, comparing, and interpreting the evidence | 2017
ST Gries, N Otani | Behavioral profiles: A corpus-based perspective on synonymy and antonymy | 2010
CF Chang, CH Kuo | A corpus-based approach to online materials development for writing research articles | 2011
C Yoon | Concordancing in L2 writing class: An overview of research and issues | 2011
J Smart | The role of guided induction in paper-based data-driven learning | 2014
A Boulton | Data-driven learning: the perpetual enigma | 2011
AL Frankenberg-Garcia | The use of corpus examples for language comprehension and production | 2014
H Yoon, JW Jo | Direct and indirect access to corpora: An exploratory case study comparing students' error correction and learning strategy use in L2 writing | 2014
Y Tono, Y Satake, A Miura | The effects of using corpora on revision tasks in L2 writing with coded error feedback | 2014
E Cotos | Enhancing Writing Pedagogy with Learner Corpus Data | 2014
K Oghigian, K Chujo | An effective way to use corpus exercises to learn grammar basics in English | 2010
HI Hou | Teaching Specialized Vocabulary by Integrating a Corpus-Based Approach: Implications for ESP Course Design at the University Level | 2014
A Boulton, H Tyne | Corpus linguistics and data-driven learning: a critical overview | 2013
YA Breyer | Corpora in Language Teaching and Learning: Potential, Evaluation, Challenges. English Corpus Linguistics. Volume 13 | 2011
S Ucar, C Yükselir | The Effect of Corpus-Based Activities on Verb-Noun Collocations in EFL Classes | 2015
H Tyne | Corpus work with ordinary teachers: Data-driven learning activities | 2012
K Ackerley | Effects of corpus-based instruction on phraseology in learner English | 2017
ND Almutairi | The Effectiveness of Corpus-Based Approach to Language Description in Creating Corpus-Based Exercises to Teach Writing Personal Statements | 2016
M Narita | Developing a corpus-based online grammar tutorial prototype | 2012
8. This section reviews research findings assessing corpus-based language
analysis in L2 classrooms.
A meta-analysis by Boulton & Cobb (2017) systematically examined previous
studies on the use of corpus linguistics for L2 learning. Findings from other
empirical studies provide evidence for the rationale and strategies for using corpus
consultation to increase students’ knowledge in different language areas.
This section is organized under the theme of corpora in language teaching and learning,
with three sub-themes: meta-analysis results, authentic language input, and
attention in L2 learning.
Corpora in Language Teaching and Learning
10. Meta-analysis results
The most recent, and possibly the first, meta-analysis on the use of
corpora in language learning is that of Boulton & Cobb (2017). The
study offers compelling evidence supporting the use of corpus
linguistics for L2 development programs.
The study draws three main conclusions.
1. “DDL research is a flourishing field” with at least 205 publications reporting quantitative study findings since
2014 (Boulton & Cobb, 2017, p. 381).
2. Both the effectiveness and efficiency studies on the use of DDL to increase learners’ L2 skills and
knowledge yielded large effect sizes.
3. DDL showed consistent significant effects in situations where
(a) the presence of native English instructors was limited;
(b) courses targeted undergraduate and graduate learners as well as those of intermediate and advanced
English levels;
(c) computer- and paper-based concordancing were used; and
(d) corpora were used either as a reference resource or for learning vocabulary and lexicogrammar.
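For readers unfamiliar with the term, an effect size expresses how far apart two groups are in standard-deviation units. The sketch below computes one common measure, Cohen's d with a pooled standard deviation, for a DDL group and a control group. The slides do not state which formula Boulton & Cobb (2017) used, so this is an assumption for illustration, and the scores are invented, not taken from any study.

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(treatment, control):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    pooled = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(treatment) - mean(control)) / pooled

# Hypothetical post-test scores, invented for this example.
ddl_group = [14, 16, 15, 18, 17]
control_group = [12, 13, 11, 14, 15]

d = cohens_d(ddl_group, control_group)
print(round(d, 2))
```

A positive value means the DDL group scored higher on average. How large a value counts as "large" depends on the benchmark a given meta-analysis adopts.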
11. Authentic language input
The most significant benefit of CiDDL lies in the authenticity of the language to
be analyzed by students (Clifton & Phillips, 2006; Romer, 2008).
Studies reporting the effectiveness of CiDDL in undergraduate
classrooms include those of Daskalovska (2015), Huang (2014),
and Gordani (2012). Daskalovska (2015) compared the
effectiveness of corpus-based activities and traditional activities for
learning collocations among 46 first-year students at a university in
the Republic of Macedonia. The study found that students in the
experimental group, who used the online concordancer, learned
more collocations and performed better on the test that
measured their knowledge of verb-adverb collocations.
12. Authentic language input
The approach facilitates the acquisition of English vocabulary
(Karras, 2016), lexico-grammatical patterns (Huang, 2014; Liu &
Jiang, 2009), and speaking fluency (Geluso & Yamaguchi, 2014).
Students additionally improve their ability to use familiar words in
new ways (Frankenberg-Garcia, 2012). Being engaged in corpus-based
queries and analyses helps students develop their knowledge
of linking adverbials in English (Boulton, 2009) and English
verb-noun collocations (Chan & Liou, 2005) as well as strengthen
their ability to use the passive voice (Smart, 2014).
13. Attention in L2 learning
Other claimed benefits concern the development of students’
metacognitive and cognitive skills through inductive and deductive
reasoning activities (Boulton, 2009). Students enhance their
language noticing and autonomy by engaging in inquiry-based
activities (Boulton, 2017; Chambers, 2007; Godwin-Jones, 2017;
Yoon & Hirvela, 2004). These findings are in agreement with the
notion of Schmidt (1995, 2001) that attention plays a significant
role in retention and in all types of learning, including second
language learning. One implication for L2 learning is that
learners need to pay attention to the language input and compare
their own utterances with those of target-language speakers. By
finding clues in language samples, L2 learners are
expected to notice how language items occur in specific
contexts and to generate principles of how the target language works.
14. Conclusion & Recommendation
Although many studies have found the use of CiDDL beneficial for
L2 students’ language development, further research is still
needed to provide support for the efficacy of the approach in
various contexts. Classroom action research is one way to bridge
the gap between research and instruction, in that educators
become more involved in applying the CiDDL approach and
evaluating its efficacy. Joint research projects between educators,
as the direct users of DDL, and corpus linguistics researchers
can establish a stronger connection between empirical evidence
and classroom practice.