Sharing an Open Methodology for Building Domain-specific Corpora for EAP

Presented at the EAP and Corpora BALEAP Professional Issues Meeting in Coventry, UK on June 21st 2014. Research and Development Collaboration with the FLAX Language Project (University of Waikato), The Open Educational Resources Research Hub (The UK Open University) and the Language Centre at Queen Mary University of London (with Martin Barge, William Tweddle and Saima Sherazi).

Published in: Education, Technology

  • TIRF is The International Research Foundation for English Language Education, which, along with the OER Research Hub, is partly funding Alannah to be in the UK to work with QMUL.
  • CTWL students are the target users at QMUL for the Law collections in FLAX, in addition to the Pre-Sessional Law-strand students on the summer programme at QMUL. A further target user group is the MOOC learners registered on the courses whose MOOC lectures we have reused for this corpus.
  • FLAX deals with the access limitation of corpus-based approaches at three levels:
    Free and Open Access to the software (and the code) and to most of the corpus resources used. In the case of this Law corpus, all resources are open and free.
    Accessible interfaces in FLAX that have been designed for the non-expert corpus user, namely language teachers and learners and anybody wanting linguistic support with specific academic resources, including, in this case, MOOC learners who are not registered language learners. FLAX avoids the complex query languages that most corpus-based tools expect users to understand.
    Accessible Open Educational Resources and Open Access resources that can be further used in the development of corpus-based derivatives for classroom use, as exemplified by this Law ESAP corpus.
  • Aiming for flexible ESAP (English for Specific Academic Purposes) resources for uptake in traditional classroom-based EAP and in online and open education, including MOOCs.
  • Fewer than half of all Open Access journals are published under Creative Commons licenses, so this is where Open Educational Resources and Open Source Software have more in common with each other than they do with OA. But there are OA journals we can use, most of which are published under the most flexible Creative Commons licenses (e.g. CC-BY), with only a few under the most restrictive (e.g. CC-ND). The number of OA journals varies by field. There are not many OA journals for Law, but there are many openly licensed government papers in the field of Law. We will look at adding samples of these in future.
    Being able to show demo corpora online, like the one we are building in FLAX, enables us to explain to e.g. the British Library how we intend to use the theses for Non-Commercial Educational and Research Development purposes in ESAP.
  • OA articles have been pre-formatted by journals, and removing this formatting is something of a challenge. Martin Barge has developed the first iteration of an OA formatting tool which can export text sections with the relevant code into HTML format for use in FLAX. More iterations of this tool will be developed as we continue to rebuild the corpus.
  • Text augmentation – linking in other data resources, here Wikipedia, to enhance the efficacy of the corpus
  • A further example of text augmentation, whereby the smaller subject-specific corpus is linked to larger corpora (The British National Corpus, The British Academic Written English Corpus and Wikipedia as a Corpus) and further resources (Roget’s Thesaurus, Wikipedia, Wiktionary) for comparison across relevant corpora and for further linguistic support for key terms and phrases.
  • Open resources that you can do whatever you want with: corpus building and online activities based on the corpus with open source software as in the FLAX project; developing course book derivatives from the open resources; researching the effectiveness of the corpus for future iterations of collections building and interface designs.
  • Please add your contact details here, QMUL peeps!
  • Transcript

    • 1. Sharing an Open Methodology for Building Domain-specific Corpora for EAP Martin Barge, William Tweddle, Saima Sherazi, Alannah Fitzgerald
    • 2. Outline
        • FLAX Language Project at Waikato University
        • Developing an EAP Resource Interface between Traditional EAP and Massive Open Online Courses
        • Developing ESAP Collections in FLAX (Academic English for Law at QMUL)
            – What’s in the Demo Collection and What’s to Come!
            – Formatting Open Access Articles for FLAX Corpora
        • Fully Open Texts
            – Beyond Parsing with Text Augmentation & Linked Data
            – Lexical Bundles, Collocations, Wordlists, Cherry Picking Functions
            – Building in Interactivity
        • Design-based Research with FLAX, Queen Mary and the OER Research Hub
            – Research & Development Cycles with Design-based Research for Iterating Collections Development
            – Rapid Prototyping of Online Demo Collections to Evaluate the Design Process and to Share with Stakeholders
    • 3. FLAX Language Project at Waikato University (FLAX image by Jane Galloway, reused with permission for non-commercial purposes)
    • 4. FLAX Language Project at the Greenstone Digital Library Lab, Waikato University, NZ
        • Professor Ian Witten, FLAX Project Lead
        • Dr Shaoqun Wu, FLAX Project Lead Researcher & Developer
    • 5. QM’s Critical Thinking & Writing in Law
        • Queen Mary’s Critical Thinking and Writing in Law (CTWL) Programme has been running successfully for over 7 years.
        • It is delivered by QM Language Centre’s EAP/ESAP team as part of the Insessional provision.
        • Between 600 and 800 LLM students enroll on it every year.
        • A team of 6-7 EAP tutors teach on it, and are under constant pressure to develop better and new materials for their high calibre students.
    • 6. The FLAX System for Subject-Specific Corpus Development
        Corpus Linguistics – pioneered by Sinclair 1991. DDL – Data-Driven Learning – term coined by Johns 1991. An empirical method of linguistic enquiry:
        • Used to discover the lexico-grammatical properties of a genre or text-type
        • Used to discover the key terminology of a given field or discipline – English for Specific Academic Purposes (ESAP)
        • Used for exploring collocations: “You shall know a word by the company it keeps.” (Firth, 1957:11)
    • 7. Collaboration with Subject Specialists “In the emerging academic literacies approach involving cooperation between subject specialists and writing teachers, the aim is to help the students develop metacognitive awareness of the roles and functions of writing in that discipline, to enable them to stand back from it and observe how it functions, and then to help them gradually participate in the genres, where genre is understood as a constellation of actions rather than a list of formal features.” (Breeze, 2012)
    • 8. Benefits
        • Inductive – promotes critical thinking
        • Promotes learner autonomy
        • Based on evidence, not instinct
        • Especially relevant for ESP and ESAP
      Limitations
        • Need for teachers and students to have technical skills to use corpora and concordancers
        • Need for access to corpora and software programmes
        • Large amount of data can be overwhelming
      “Every student is Sherlock Holmes.” (Johns, 2002:108)
    • 9. Interfacing Traditional EAP & MOOCs
    • 10. ESAP Law Collections in FLAX (type of media, followed by the number and source of items)
        • Podcast audio files & transcripts (OpenSpires): 10-15 Lectures (Oxford Law Faculty & the Centre for Socio-Legal Studies)
        • MOOC lecture transcripts & videos (streamed via YouTube & Vimeo): 4 MOOC Collections: Copyright Law (Harvard/edX), English Common Law (Uni. of London/Coursera), Age of Globalization (Texas at Austin/edX), Environmental Law & Politics (OpenYale)
        • Student PhD thesis writing and Pre-sessional for Law ESAP essay writing: 70 QMUL EThOS Theses at the British Library (Open Access but not licensed with Creative Commons – will need permission to develop for Non-Commercial Educational & Research purposes); 20+ Essays from QMUL Law Pre-sessional
        • Open Access research articles (relevant to QMUL Law and EAP for Law and Globalisation): 40 Articles (DOAJ - Directory of Open Access Journals)
    • 11. Formatting OA Articles for FLAX
    • 12. Working with Full Texts
    • 13. Text Augmentation + Text Parsing
    • 14. Law Corpus Wikify Function in FLAX
    • 15. Wordlist from OA Articles
    • 16. Collocations from Law Lectures
    • 17. Linking Collocations in Law-Specific Corpus to Reference Collections in FLAX (BNC, BAWE, Wikipedia)
    • 18. Lexical Bundles from Law Lectures
    • 19. Building Interactivity into FLAX
    • 20. FLAX Activities Continued
    • 21. FLAX Do-It-Yourself Podcast Corpora with Oxford OER
    • 22. FLAX Do-It-Yourself Podcast Corpora 2: Building interactivity into your collections
    • 23. Developing Podcast Activities in FLAX
    • 24. Cloze Exercises in FLAX
    • 25. Scrambled Sentences in FLAX
    • 26. Drag ‘n’ Drop exercises in FLAX
    • 27. Learning Collocations in FLAX
    • 28. Automated Collocations Guessing in FLAX (drawing on the British National Corpus)
    • 29. Design-Based Research Cycles with FLAX, the OER Research Hub & Queen Mary
        • Practitioners/Researchers involved in iterative development of ESAP language collections
            – Interfacing with open Law resources: Open Access articles, Open Government research reports with contributions from QMUL Law professors, Case Law, Open lectures, Openly-licensed student writing
            – Developing expertise with open tools and resources
            – Developing interaction within the corpus and derivatives from the corpus
            – Documenting the collections development process for sharing across the EAP and Open Education sectors
    • 30. Free to Do Whatever You Want
        • Open Resources for EAP Soup Dragons:
            – Building ESAP Corpora
            – Developing Interactivity into ESAP Corpora
            – Developing ESAP Course Book and Lesson Plan Derivatives
            – Researching and Developing ESAP Corpora & Derivatives
            – Researching and Developing Corpus Tools e.g. Interfaces, Text Augmentation and Linked Data Approaches
    • 31. Thank You
        • FLAX Language Project: Shaoqun Wu: / Ian Witten:
        • OER Research Hub: Alannah Fitzgerald:; @AlannahFitz; TOETOE Blog; Slideshare:
        • The Language Centre – Queen Mary University of London: http://language-centre. Martin Barge, William Tweddle, Saima Sherazi
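
The wordlists, collocations and lexical bundles shown in slides 15 to 18 can be illustrated with a few lines of Python. This is only a minimal sketch of the general techniques and not FLAX's own code: the function names, the frequency thresholds, the choice of pointwise mutual information as a collocation score, and the sample sentences are all assumptions made for illustration.

```python
import re
from collections import Counter
from math import log2


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def wordlist(tokens, top=10):
    """Return the `top` most frequent words with their counts."""
    return Counter(tokens).most_common(top)


def lexical_bundles(tokens, n=3, min_freq=2):
    """Return n-word sequences that recur at least `min_freq` times."""
    grams = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(" ".join(gram) for gram in grams)
    return {gram: c for gram, c in counts.items() if c >= min_freq}


def collocations(tokens, min_freq=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    PMI is one common association score; it is an assumption here,
    not necessarily the measure used by FLAX.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_tokens = len(tokens)
    n_pairs = max(n_tokens - 1, 1)
    scored = []
    for (w1, w2), c in bigrams.items():
        if c < min_freq:
            continue
        p_pair = c / n_pairs
        p_w1 = unigrams[w1] / n_tokens
        p_w2 = unigrams[w2] / n_tokens
        scored.append(((w1, w2), log2(p_pair / (p_w1 * p_w2))))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Invented sample text standing in for a law lecture transcript.
sample = (
    "The court held that the contract was void. "
    "The court held that the claim failed. "
    "In the case of the contract the court held otherwise."
)
tokens = tokenize(sample)

print(wordlist(tokens, top=5))
print(lexical_bundles(tokens, n=3, min_freq=2))
print(collocations(tokens, min_freq=2))
```

A real collection would need sentence-aware tokenization and much more text before the counts become meaningful; the point here is only that each function maps onto one of the corpus views listed in the slides.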