Oh, what a BAWE! The British Academic Written English corpus


From the blog TOETOE (ˈtɔɪtɔɪ): Technology for Open English - Toying with Open E-resources

Published in: Education

  1. Articles from TOETOE (ˈtɔɪtɔɪ): Technology for Open English - Toying with Open E-resources

     Oh, what a BAWE! The British Academic Written English corpus
     2013-03-24 03:03:43, admin

     This is the sixth post in a blog series based on the TOETOE International project with the University of Oxford, the UK Higher Education Academy (HEA) and the Joint Information Systems Committee (JISC).

     FLAX British Academic Written English (BAWE) collections

     The BAWE collections in FLAX, as demonstrated in the training video below, enable you to interact with the BAWE corpus of university student writing from across the disciplines and to learn about the thirteen different genres assigned by the makers of the corpus (Nesi, Gardner, Thompson & Wickens, 2007). The complete manual on the making of the BAWE (Heuboeck, Holmes & Nesi, 2010) is freely available as The BAWE Corpus Manual, An Investigation of Genres of Assessed Writing in British Higher Education. Features from the FLAX open source software (OSS) project for understanding the BAWE include: word lists and keyness indicators; collocations; lexical bundles; a glossary function with Wikipedia; along with a variety of automated functions for searching, saving and linking within the BAWE corpus.

     From its earliest inception the FLAX project has been envisioned and advanced with the language teacher and learner in mind. Since 2008, I have been engaged with the FLAX project to provide user feedback on the development of the language reference collections and to devise ways to promote the project resources within mainstream English language teaching and learning communities. A simplified and intuitive interface has been developed for presenting language collections and interactive learning activities, built on the powerful and complex handling of search queries across a range of linked corpora and open linguistic content.

     Another open web-based interface for accessing the BAWE is located within the commercial Sketch Engine project. This project provides the more traditional KWIC (KeyWord In Context) concordancer interface for linguistic data presentation, with search terms embedded in truncated snippets of their surrounding language context. The Using Sketch Engine with BAWE manual (Nesi & Thompson, 2011) provides an in-depth user guide for the more expert corpus user.
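To make the KWIC idea concrete, here is a minimal sketch of a keyword-in-context search in Python. It is an illustration only and not how Sketch Engine or FLAX implement their concordancers; the function name `kwic`, the `width` parameter and the sample sentences are all assumptions made for this example.

```python
import re

def kwic(text, keyword, width=30):
    """Return (left context, keyword, right context) tuples for each hit.

    Hypothetical helper for illustration; contexts are cut to a fixed
    number of characters, which is why KWIC lines look truncated.
    """
    hits = []
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Pad both sides so the keyword lines up in a central column.
        hits.append((left.rjust(width), match.group(0), right.ljust(width)))
    return hits

# Invented sample text, not taken from the BAWE.
sample = ("This research examines student writing. Further research "
          "is needed on genre. Research methods vary by discipline.")

for left, node, right in kwic(sample, "research", width=25):
    print(f"{left} [{node}] {right}")
```

Each printed line shows one occurrence of the search term with its left and right context aligned around it, which is the layout a traditional concordancer presents.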
  2. Sketch Engine open concordancer interface for the BAWE showing results for a KWIC query for the item ‘research’.

     The Word Tree corpus interface is a JISC Rapid Innovation project based at Coventry University, providing yet another open web-based alternative to KWIC searches for analysing the BAWE. One of the project’s goals is for the open source code developed for this rapid innovation project, which is available from GitHub, to be re-used in further open corpus-based projects for analysing additional corpora. The project can be followed via the Word Tree project blog and the JISC final report, which outline issues encountered with managing and processing the presentation of large amounts of linguistic data through a word tree interface that provides click-through pathways and the ability to prune and graft word tree searches.

     The Word Tree corpus interface for the BAWE showing a search query word tree for the items ‘research’ and ‘research methods’.

     Reference corpora versus specialist corpora

     Comparisons between language as it is used in reference corpora, such as the British National Corpus (BNC), which provides a snapshot of how English occurs across a variety of contexts, and language as it is used in specialist academic sub-corpora, or in actual student-generated academic text corpora as in the case of the BAWE, help us to identify which words and phrases occur more commonly in specific as well as in general academic contexts of use. Not confined by the boundaries of a printed volume, the openly available web-based BAWE collections in FLAX (demonstrated in the video above) are arguably more powerful than the average dictionary or coursebook for practice with academic English.

     Before commencing my journeys with the TOETOE International project, I had written an extensive project blog post on open trends within corpora and ELT materials development in Radio Ga Ga: corpus-based resources, you’ve yet to have your finest hour. At the Open Education conference in Vancouver in October 2012, in my presentation on the Great Beyond with Open ELT Resources (see below), I outlined the development work that TOETOE and the FLAX team were going to embark on with respect to the BAWE corpus, and the evaluations of the earlier BAWE collections in FLAX that we would be seeking from international participants in collaboration with the project. Feedback on the BAWE collections in FLAX from international stakeholders in China (Confucian dynamism in Chinese ELT context) and Korea (the English language skyline in South Korea) led to further design and development iterations back in New Zealand with the FLAX team (Love is a stranger in an open car to tempt you in and drive you far away…toward open educational practice). Each of these stages is captured in the project blog post named in brackets.
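The comparison of a specialist corpus against a reference corpus described in this section can be sketched with a keyness score. The example below uses the log-likelihood statistic, one measure commonly used in corpus linguistics; the post does not say which measure FLAX uses, so treat the choice of statistic, the function names and the two toy texts as assumptions for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def log_likelihood(a, b, c, d):
    """Log-likelihood keyness for one word.

    a: frequency in the study corpus, c: study corpus size in tokens
    b: frequency in the reference corpus, d: reference corpus size
    """
    expected_study = c * (a + b) / (c + d)
    expected_ref = d * (a + b) / (c + d)
    score = 0.0
    # A zero frequency contributes nothing (0 * log 0 is taken as 0).
    if a > 0:
        score += a * math.log(a / expected_study)
    if b > 0:
        score += b * math.log(b / expected_ref)
    return 2 * score

def keywords(study_tokens, reference_tokens, top=5):
    """Rank words that are relatively more frequent in the study corpus."""
    study, ref = Counter(study_tokens), Counter(reference_tokens)
    c, d = len(study_tokens), len(reference_tokens)
    scored = []
    for word, a in study.items():
        b = ref[word]
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]

# Invented stand-ins for a specialist corpus and a reference corpus.
study_text = ("the methodology section reports the methodology and "
              "the results of the methodology")
reference_text = ("the cat sat on the mat and the dog sat on the rug "
                  "by the door")

for word, score in keywords(tokenize(study_text), tokenize(reference_text)):
    print(f"{word}\t{score:.2f}")
```

Words common to both texts, such as ‘the’, score close to zero, while a word concentrated in the study text rises to the top of the list. With real data the study corpus would be a BAWE sub-corpus and the reference corpus something like the BNC.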
  3. The Great Beyond with Open English Language Resources from Alannah Fitzgerald

     Earlier in 2012 FLAX had developed the wikify function for matching key words and phrases in the BAWE collections to Wikipedia entries as a glossary support feature. This provides help with subject-specific language in the BAWE, which may be daunting to learners and teachers who are not yet familiar with the language of a given topic area, yet learners are expected to develop proficiency with specific academic English if they are to engage in English-medium higher education programmes. For example, the technical language from a biology methodology recount text in the BAWE can be glossed for enhanced understanding in FLAX with links to Wikipedia definitions and related topics.

     Corpus-based approaches for understanding genre in EAP

     “Unsurprisingly, the utility of the corpus is increased when it has been annotated, making it no longer a body of text where linguistic information is implicitly present, but one which may be considered a repository of linguistic information.” (ICT4LT, McEnery & Wilson, 2012)

     Corpus studies help with investigations into more than just discrete language items. The study of genres, as different communities of practice develop them, is also central to corpus work for better understanding the different written assessment types that students will actually encounter across the academy. Generic EAP writing assessments, especially those found in College Composition and Writing Across the Curriculum programmes (Freedman; Petraglia, 1995; Russell, 2002), have been criticized for becoming genres unto themselves, with serious doubts cast on their ability to resemble, or assist with transfer to, the multitude of specific genres that students will be expected to engage with in their different academic programmes. Generic EAP teaching resources and writing assignments that teach general things about academic language and writing have resulted in EAP writing that Wardle describes as conforming to ‘mutt genres’ (2009).

     In response to the issue of genre in university writing, the BAWE corpus collections in FLAX provide EAP teachers and students with a first-hand look into this student-generated corpus of assessed undergraduate and taught postgraduate writing collected at three UK universities: Warwick, Oxford Brookes and Reading. Thirteen different genres were assigned by the developers of the BAWE (Nesi et al., 2004-2007): Case Study, Critique, Design Specification, Empathy Writing, Essay, Exercise, Explanation, Literature Survey, Methodology Recount, Narrative Recount, Problem Question, Proposal and Research Report.

     The Oxford Text Archive, where the BAWE is managed by the University of Oxford IT Services, granted access to the FLAX project to develop OSS for language learning and teaching on top of this valuable research corpus, in the same way that FLAX have developed OSS to enable access to the BNC, which is also managed and distributed by the University of Oxford IT Services. Four sub-corpora have been developed in FLAX, corresponding to written academic assessments across the major academic disciplines identified by the makers of the BAWE: the Physical Sciences, the Life Sciences, the Social Sciences and the Arts and Humanities BAWE collections in FLAX. It was determined that student texts from the BAWE would serve as an achievable model of academic writing for EAP students, and that this corpus of student texts would serve as a starting point if linked to wider resources, namely the BNC, Wikipedia, the Learning Collocations collection in FLAX and the live Web, thereby providing a ‘bridge’ to more expert writing.

     The developers of the BAWE corpus have a follow-on ESRC-funded project, Writing for a Purpose, which is currently piloting EAP learning resources based on the BAWE. These soon-to-be-launched resources will be housed on the British Council’s LearnEnglish website, with further resources based on the BAWE for improving the quality of students’ discipline-specific work emerging on Andy Gillett’s UEfAP website. According to the project schedule these resources are going to be promoted at the upcoming 2013 IATEFL and BALEAP conferences and will definitely be something to look out for.

     References

     Freedman, A. (1995). “The What, Where, When, Why, and How of Classroom Genres.” In Petraglia (1995), 121–44.

     Heuboeck, A., Holmes, J. & Nesi, H. (2010). The BAWE corpus manual for the project entitled, ‘An Investigation of Genres of Assessed Writing in British Higher Education’, version 3. Retrieved from
  4. McEnery, T. & Wilson, A. (2012). Corpus linguistics. Module 3.4 in Davies, G. (Ed.), Information and Communications Technology for Language Teachers (ICT4LT). Slough: Thames Valley University [Online]. Retrieved from

     Nesi, H., Gardner, S., Thompson, P. & Wickens, P. (2007). The British Academic Written English (BAWE) corpus, developed at the Universities of Warwick, Reading and Oxford Brookes under the directorship of Hilary Nesi and Sheena Gardner (formerly of the Centre for Applied Linguistics [previously called CELTE], Warwick), Paul Thompson (Department of Applied Linguistics, Reading) and Paul Wickens (Westminster Institute of Education, Oxford Brookes), with funding from the ESRC (RES-000-23-0800).

     Nesi, H. & Thompson, P. (2011). Using Sketch Engine with BAWE. Retrieved from

     Nesi, H. & Gardner, S. (2012). Genres across the disciplines: student writing in Higher Education. Cambridge: Cambridge University Press.

     Petraglia, J. (Ed.) (1995). Reconceiving Writing, Rethinking Writing Instruction. Mahwah, NJ: Lawrence Erlbaum.

     Russell, D. (2002). Writing in the Academic Disciplines: A Curricular History. 2nd ed. Carbondale: Southern Illinois UP.

     Wardle, E. (2009). “‘Mutt Genres’ and the Goal of FYC: Can We Help Students Write the Genres of the University?” College Composition and Communication 60: 765-789.

     Oh, what a BAWE! The British Academic Written English corpus by Alannah Fitzgerald, unless otherwise expressly stated, is licensed under a Creative Commons Attribution 3.0 Unported License. Terms and conditions beyond the scope of this license may be available at