Oh, what a BAWE! The British Academic Written English corpus
Articles from TOETOE: Technology for Open English, Toying with Open E-resources (ˈtɔɪtɔɪ)
2013-03-24 03:03:43 admin
This is the sixth post in a blog series based on the TOETOE International
project with the University of Oxford, the UK Higher Education Academy (HEA) and
the Joint Information Systems Committee (JISC).
FLAX British Academic Written English (BAWE) collections
The BAWE collections in FLAX, as demonstrated in the training video below, enable
you to interact with the BAWE corpus of university student writing from across the
disciplines to learn about the thirteen different genres assigned by the makers of the
corpus (Nesi, Gardner, Thompson & Wickens, 2007). The complete manual on the
making of the BAWE (Heuboeck, Holmes & Nesi, 2010) is freely available from the
following link (The BAWE Corpus Manual, An Investigation of Genres of Assessed
Writing in British Higher Education). Features from the FLAX open source software
(OSS) project for understanding the BAWE include: word lists and keyness
indicators; collocations; lexical bundles; a glossary function with Wikipedia; along
with a variety of automated functions for searching, saving and linking within the
BAWE corpus.
From its inception the FLAX project has been envisioned and advanced with
the language teacher and learner in mind. Since 2008, I have been engaged with the
FLAX project to provide user feedback on the development of the language
reference collections and to devise ways to promote the project resources within
mainstream English language teaching and learning communities. A simplified and
intuitive interface has been developed for presenting language collections and
interactive learning activities based on the powerful and complex handling of search
queries from a range of linked corpora and open linguistic content.
Another open web-based interface for accessing the BAWE is located within the
commercial Sketch Engine project. This project provides the more traditional KWIC
(KeyWord In Context) concordancer interface for linguistic data presentation with
strings of search terms embedded in truncated language context snippets. The
Using Sketch Engine with BAWE manual (Nesi & Thompson, 2011) provides an in-
depth user guide for the more expert corpus user.
Sketch Engine open concordancer interface for the BAWE showing results for a KWIC
query for the item ‘research’.
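To make the KWIC presentation concrete, here is a minimal sketch in Python of how a concordancer lines up a search term within truncated context snippets. It is not how Sketch Engine is implemented; the function name `kwic`, the `width` parameter and the sample sentence are all assumptions made for illustration.

```python
import re

def kwic(text, keyword, width=30):
    """Return keyword-in-context lines: left context, [keyword], right context.

    Hypothetical helper for illustration only.
    """
    lines = []
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    flat = " ".join(text.split())  # collapse line breaks and repeated spaces
    for m in pattern.finditer(flat):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        # Right-align the left context so every hit sits in the same column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines

# Invented sample text, not taken from the BAWE.
sample = ("This research draws on earlier research methods in biology. "
          "Further Research is needed.")
for line in kwic(sample, "research", width=20):
    print(line)
```

Because every hit is padded to the same width, the search term forms a vertical column, which is what lets a reader scan the words to its immediate left and right for recurring patterns.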
The Word Tree corpus interface is a JISC Rapid Innovation project based at
Coventry University providing yet another open web-based interface alternative to
KWIC searches for analysing the BAWE. One of the project’s goals is for the open
source code developed for this rapid innovation project, which is available from
GitHub, to be re-used in further open corpus-based projects for analysing additional
corpora. The project can be followed via the Word Tree project blog and JISC final
report, which outline issues encountered with managing and processing the
presentation of large amounts of linguistic data through a word tree interface that
provides click-through pathways and the ability to prune and graft word tree
searches.
The Word Tree corpus interface for the BAWE showing a search query word tree for the items ‘research’ and ‘research methods’
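As a rough illustration of the idea behind a word tree, the Python sketch below groups the words that follow a search term into a nested tree of counted branches. It is not the Word Tree project's code; `build_word_tree`, `print_tree`, the `depth` parameter and the sample sentences are hypothetical.

```python
from collections import defaultdict

def build_word_tree(sentences, root, depth=3):
    """Build a nested tree of the words that follow `root`, with counts.

    Hypothetical helper for illustration only.
    """
    def new_node():
        return {"count": 0, "children": defaultdict(new_node)}

    tree = new_node()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token != root:
                continue
            tree["count"] += 1
            node = tree
            # Walk down one branch per following word, up to `depth` words.
            for following in tokens[i + 1:i + 1 + depth]:
                node = node["children"][following]
                node["count"] += 1
    return tree

def print_tree(node, label, indent=0):
    """Print the tree with the most frequent branches first."""
    print("  " * indent + f"{label} ({node['count']})")
    ranked = sorted(node["children"].items(), key=lambda kv: -kv[1]["count"])
    for word, child in ranked:
        print_tree(child, word, indent + 1)

# Invented sample sentences, not taken from the BAWE.
sentences = [
    "research methods are varied",
    "research methods in biology",
    "research shows that genres differ",
]
tree = build_word_tree(sentences, "research", depth=2)
print_tree(tree, "research")
```

In a structure like this, pruning would amount to dropping branches below a chosen count, and clicking through would amount to re-rooting the tree at a child node.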
Reference corpora versus specialist corpora
Comparisons made between language as it is used in reference corpora, such as
the British National Corpus (BNC) which provides a snapshot of how English occurs
across a variety of contexts, and how it is used in specialist academic sub-corpora,
or in actual student-generated academic text corpora as in the case of the BAWE,
help us to identify which words and phrases occur more commonly in specific as
well as in general academic contexts of use. Not confined by the boundaries of a
printed volume, the openly available web-based BAWE collections in FLAX
(demonstrated in the video above) are arguably more powerful than the average
dictionary or coursebook for practice with academic English.
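One common way to quantify this kind of comparison is a log-likelihood keyness score, which asks whether a word is more frequent in a specialist corpus than its frequency in a reference corpus would predict. The Python sketch below uses the standard two-corpus log-likelihood formula; it is an assumption that this resembles what FLAX computes, and the function names and toy token lists are invented for illustration.

```python
import math
from collections import Counter

def log_likelihood(freq_study, freq_ref, size_study, size_ref):
    """Two-corpus log-likelihood score for a single word."""
    total = freq_study + freq_ref
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    ll = 0.0
    # A zero frequency contributes nothing (the limit of x * ln(x) is 0).
    if freq_study > 0:
        ll += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll

def keywords(study_tokens, ref_tokens, top=5):
    """Rank words that are relatively more frequent in the study corpus."""
    study, ref = Counter(study_tokens), Counter(ref_tokens)
    n_study, n_ref = len(study_tokens), len(ref_tokens)
    scored = []
    for word, freq in study.items():
        # Keep only positive keywords: overused relative to the reference.
        if freq / n_study > ref[word] / n_ref:
            score = log_likelihood(freq, ref[word], n_study, n_ref)
            scored.append((word, score))
    return sorted(scored, key=lambda pair: -pair[1])[:top]

# Toy token lists standing in for a specialist and a reference corpus.
study = ("the methodology of the research was the focus "
         "of the research report").split()
reference = ("the cat sat on the mat and the dog sat by the door "
             "of the house").split()
for word, score in keywords(study, reference):
    print(f"{word}\t{score:.2f}")
```

With corpora of realistic size, function words such as "the" score close to zero because they are about equally frequent everywhere, while discipline-specific vocabulary rises to the top of the list.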
Before setting out on my travels with the TOETOE International project, I had written
an extensive project blog post on open trends within corpora and ELT materials
development in Radio Ga Ga: corpus-based resources, you’ve yet to have your
finest hour. At the Open Education conference in Vancouver in October 2012, in
my presentation on the Great Beyond with Open ELT Resources (see below), I
outlined the development work that TOETOE and the FLAX team were going to
embark on with respect to the BAWE corpus, and the evaluations of the earlier
BAWE collections in FLAX that we would be seeking from international participants
in collaboration with the project. Feedback on the BAWE collections in FLAX from
international stakeholders in China (Confucian dynamism in Chinese ELT context)
and Korea (the English language skyline in South Korea) led to further design and
development iterations back in New Zealand with the FLAX team (Love is a
stranger in an open car to tempt you in and drive you far away…toward open
educational practice). The project blog posts that capture this work are given here in
brackets.
The Great Beyond with Open English Language Resources from Alannah
Fitzgerald
Earlier in 2012 FLAX had developed the wikify function for matching key words and
phrases in the BAWE collections to Wikipedia entries as a glossary support feature.
This provides help with subject-specific language in the BAWE, which may be
daunting to learners and teachers alike who are not yet familiar with the language of
a given topic area. Learners are nonetheless expected to develop proficiency with
specific academic English if they are to engage in English-medium higher education
programmes. For example, the technical language from a biology methodology
recount text in the BAWE can be glossed for enhanced understanding in FLAX with
links to Wikipedia definitions and related topics.
Corpus-based approaches for understanding genre in EAP
“Unsurprisingly, the utility of the corpus is increased when it has been
annotated, making it no longer a body of text where linguistic information
is implicitly present, but one which may be considered a repository of
linguistic information.” (McEnery & Wilson, 2012, ICT4LT)
Corpus studies help with investigations into understanding more than just discrete
language items. The study of genres, as different communities of practice develop
them, is also central to corpus work for better understanding the different written
assessment types that students will actually encounter across the academy.
Generic EAP writing assessments, especially those found in College Composition
and Writing Across the Curriculum programmes (Freedman, 1995; Petraglia, 1995;
Russell, 2002), have been criticized for becoming genres unto themselves, with
serious doubts cast on their ability to resemble, or assist with transfer to, the
multitude of specific genres that students will be expected to engage with in their
different academic programmes. Generic EAP teaching resources and writing
assignments that teach general things about academic language and writing have
resulted in EAP writing that Wardle (2009) describes as conforming to ‘mutt genres’.
In response to the issue of genre in university writing, the BAWE corpus collections
in FLAX provide EAP teachers and students with a first-hand look into this student-
generated corpus of assessed undergraduate and taught postgraduate writing
collected at three UK universities: Warwick, Oxford Brookes and Reading. Thirteen
different genres were assigned by the developers of the BAWE (Nesi et al., 2004-
2007):
Case Study
Critique
Design Specification
Empathy Writing
Essay
Exercise
Explanation
Literature Survey
Methodology Recount
Narrative Recount
Problem Question
Proposal
Research Report
The Oxford Text Archive, where the BAWE is managed by the University of Oxford IT
Services, granted access to the FLAX project to develop OSS for language learning
and teaching on top of this valuable research corpus, in the same way that FLAX
have developed OSS to enable access to the BNC, which is also managed and
distributed by University of Oxford IT Services. Four sub-corpora have been
developed as BAWE collections in FLAX, corresponding to written academic
assessments across the major academic disciplines as identified by the makers of
the BAWE: the Physical Sciences, the Life Sciences, the Social Sciences and the
Arts and Humanities. It was determined that student texts from the BAWE
would serve as an achievable model for academic writing for EAP students, and that
this corpus of student texts would serve as a starting point if linked to wider
resources, namely the BNC, Wikipedia, the Learning Collocations collection in FLAX
and the live Web, thereby providing a ‘bridge’ to more expert writing.
The developers of the BAWE corpus have a follow-on ESRC-funded project, Writing
for a Purpose, which is currently piloting EAP learning resources based on the
BAWE. These soon-to-be-launched resources will be housed on the British
Council’s LearnEnglish website, with further resources based on the BAWE for
improving the quality of students’ discipline-specific work emerging on Andy Gillett’s
UEfAP website. According to the project schedule, these resources are going to be
promoted at the upcoming 2013 IATEFL and BALEAP conferences and will definitely
be something to look out for.
References
Freedman, A. (1995). The What, Where, When, Why, and How of Classroom Genres.
In J. Petraglia (Ed.), Reconceiving Writing, Rethinking Writing Instruction (pp. 121–44).
Heuboeck, A., Holmes, J. & Nesi, H. (2010). The BAWE corpus manual for the
project entitled, ‘An Investigation of Genres of Assessed Writing in British Higher
Education’, version 3. Retrieved from
http://www.coventry.ac.uk/Global/05%20Research%20section%20assets/Research/British%20Academic%20Written%20English%20Corpus%20%28BAWE%29/Microsoft%20Word%20-%20BAWEmanual%20v3%20-%20BAWEmanual%20v3.pdf
McEnery, T. & Wilson, A. (2012). Corpus linguistics. Module 3.4 in Davies, G. (ed.)
Information and Communications Technology for Language Teachers (ICT4LT),
Slough, Thames Valley University [Online]. Retrieved from
http://www.ict4lt.org/en/en_mod3-4.htm
Nesi, H., Gardner, S., Thompson, P. & Wickens, P. (2007). The British Academic
Written English (BAWE) corpus, developed at the Universities of Warwick, Reading
and Oxford Brookes under the directorship of Hilary Nesi and Sheena Gardner
(formerly of the Centre for Applied Linguistics [previously called CELTE], Warwick),
Paul Thompson (Department of Applied Linguistics, Reading) and Paul Wickens
(Westminster Institute of Education, Oxford Brookes), with funding from the ESRC
(RES-000-23-0800)
Nesi, H. & Thompson, P. (2011). Using Sketch Engine with BAWE. Retrieved from
http://wwwm.coventry.ac.uk/researchnet/BAWE/Documents/Using%20Sketch%20Engine%20with%20BAWE%202011.pdf
Nesi, H. & Gardner, S. (2012). Genres across the disciplines: student writing in
Higher Education. Cambridge: Cambridge University Press.
Petraglia, J. (Ed.). (1995). Reconceiving Writing, Rethinking Writing Instruction.
Mahwah, NJ: Lawrence Erlbaum.
Russell, D. (2002). Writing in the Academic Disciplines: A Curricular History. 2nd
ed. Carbondale: Southern Illinois UP.
Wardle, E. (2009). “‘Mutt Genres’ and the Goal of FYC: Can We Help Students Write
the Genres of the University?” College Composition and Communication 60: 765-789.
The Oh, what a BAWE! The British Academic Written English corpus by Alannah
Fitzgerald, unless otherwise expressly stated, is licensed under a Creative
Commons Attribution 3.0 Unported License. Terms and conditions beyond the
scope of this license may be available at www.alannahfitzgerald.org.