University Lecturer at University of Cambridge
Dec. 4, 2013
Pedagogical applications of corpus data for English for General and Specific Purposes
FIAL (lecture open to researchers and students): "Pedagogical applications of corpus data for English for General and Specific Purposes", Wednesday 4 December, 12:45 (room ERAS 56). UCL, Louvain-la-Neuve
1. Pedagogical applications of
corpus data for English for
General and Specific Purposes
Université Catholique de Louvain
FIAL (lecture open to researchers and students): Wednesday 4
December, 12:45 (room ERAS 56).
Pascual Pérez-Paredes
Universidad de Murcia,
Campus Mare Nostrum
2. Pedagogical applications of
corpus data for English for
General and Specific Purposes
perezparedes.blogspot.com
3. 3
Outline
1. Background: corpora
2. The SACODEYL-BACKBONE approach at a glance
3. Getting down to annotating
   1. Backbone Annotator: download and installation
   2. Texts and CMT
   3. Guided annotation
4. BACKBONE
5. Specific purposes: LADEX
5. 5
Corpus
Principled collection of texts representative of a
given language or of a particular language domain.
-Language research purposes
-Applied purposes: teaching, learning, dictionary
making, testing…
view.byu.edu/corpora.asp
sacodeyl.inf.um.es/sacodeyl-search/
webapps.ael.uni-tuebingen.de/backbone-search/
6. 6
Corpora in language education
ReCALL special issue: Researching uses of corpora for language teaching
and learning. Boulton & Pérez-Paredes (2014).
-Indirect uses: Thorndike and Lorge’s Teacher’s Word Book
of 30,000 Words (1944), West’s General Service List (1953),
or Gougenheim (e.g. 1958) and colleagues’ work on the
Français Fondamental
-Cobuild work led by John Sinclair (1987)
-Routledge Frequency Dictionaries
-Coxhead’s Academic Word List (2000)
-Martinez and Schmitt’s (2012) Phrasal Expressions List
7. 7
TaLC Lancaster 1994
(1) computers and storage at the time were
improving dramatically;
(2) there was a new interest in authentic data and
usage in language education; and
(3) there was a consensus that learners were
adopting new, more active roles in their
learning process.
9. 9
• Braun (2005, 2007): pedagogically motivated
corpora
(a) provide a more systematic range of material
than individual texts or scattered collections of
activities and, if well-designed, (b) offer a wider
range of idiolects than the average material.
Braun (2006): thematic annotation, including topic
keys and section titles, is particularly useful in the
implementation of pedagogically motivated corpora
11. 11
• Pérez-Paredes & Alcaraz (2009)
For the time being, the natural corpus playground
continues to be tertiary education.
Our motivation:
CL in the language classroom.
The resulting annotated corpus can be seen as being
integrative of language data and annotated pedagogy.
Pedagogy can be annotated and, subsequently, accessed by
corpus users.
12. 12
[Diagram: "Linguistics comes first", what is possible (Alcaraz and Pérez-Paredes 2008)]
Linguistic analysis of interest in FLT ------> DDL materials
Labels: Concordances and corpus; Researcher/Linguist; End user
13. 13
• Pedagogical analysis (and annotation) of language corpora ------> Pedagogy-driven DDL
[Diagram: "Pedagogy comes first", what is feasible (Alcaraz and Pérez-Paredes 2008)]
Labels: Material developer/Teacher/Learner; End user
26. 26
What is XML TEI format?
▫ TEI: Text Encoding Initiative
▫ This is a format for storing corpora
▫ Has been promoted by OTA
(Oxford Text Archive)
▫ Is a continuously evolving format (more than 50
versions released so far; currently TEI P5)
▫ Is rapidly spreading among the available tools
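To make the format more concrete, the sketch below parses a very small TEI P5 document with Python's standard library. The sample text, its title and the use of numbered div elements as sections are invented for illustration; they are not taken from the SACODEYL or BACKBONE corpora, whose actual markup may differ.

```python
import xml.etree.ElementTree as ET

# TEI P5 elements live in this XML namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample document (invented for illustration).
SAMPLE_TEI = """\
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Hypothetical interview transcript</title></titleStmt>
      <publicationStmt><p>Unpublished classroom sample.</p></publicationStmt>
      <sourceDesc><p>Invented text for illustration.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div n="1"><p>I live in a small town near the coast.</p></div>
      <div n="2"><p>At weekends I usually go hiking.</p></div>
    </body>
  </text>
</TEI>
"""


def read_tei(xml_text):
    """Return the document title and a list of (number, text) pairs, one per div."""
    root = ET.fromstring(xml_text)
    title = root.findtext(".//tei:titleStmt/tei:title", namespaces=TEI_NS)
    sections = []
    for div in root.findall(".//tei:body/tei:div", TEI_NS):
        # Collect all text inside the div and normalise whitespace.
        text = " ".join("".join(div.itertext()).split())
        sections.append((div.get("n"), text))
    return title, sections


title, sections = read_tei(SAMPLE_TEI)
print(title)
for number, text in sections:
    print(number, text)
```

The header (teiHeader) carries metadata about the text, while the body holds the text itself; this separation is what lets tools search and display TEI corpora consistently.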
27. 27
TEI Tools (Research)
• TeiPublisher
“This tool is an XML-based repository that
allows the publication of TEI corpora to the
public community and offers a search tool.”
• Dexter
“This is another annotation tool that uses TEI as
the format for the annotated files.”
28. 28
TEI Tools (Research)
• Oxygen XML Editor and XMLSpy
“These are XML editors that allow the
modification of TEI files without any
limitation”
(These are complex for non-advanced users)
29. 29
TEI Tools (Research)
• TAPoR (http://portal.tapor.ca/)
“The Text Analysis Portal for Research
(TAPoR) is a gateway to tools for sophisticated
analysis and retrieval, along with representative
texts for experimentation.”
30. 30
TEI Tools (Research)
• TokenX
http://www.unl.edu/libr/etext/tokenx.shtml
“Is a text visualization, analysis, and play
tool”
• WordHoard
http://wordhoard.northwestern.edu/userman/index.html
“Is a tool for annotating or tagging texts by
morphological, lexical, prosodic, and
narratological criteria and for determining
frequency information”
31. 31
TEI Tools (Research)
• XAIRA
XAIRA (XML Aware Information Retrieval
Architecture) is an open source tool for
constructing high-quality linguistically-motivated
search interfaces to large collections of XML documents.
33. 33
TEI Tools (Classroom)
• A more interesting orientation.
How can I use the annotation in the classroom?
Backbone Search Tool
www.um.es/backbone
35. 35
Download BACKBONE Annotator +
Install + CMT config
http://www.um.es/backbone/
Pérez-Paredes, P., and Alcaraz-Calero, J. M.
(2009). Developing annotation solutions for online
Data Driven Learning. ReCALL 21, 55..
37. 37
How can I add a new document to
the current corpus?
1. Add document …
2. Select the text format/encoding
3. Select the new document
38. 38
What does the text format mean?
• Four main text formats are supported:
▫ Plain text (written), .txt
▫ Oral text in Backbone Transcriptor format
▫ Oral text in SACODEYL Transcriptor format
▫ XML text in TEI standard format (text in special XML files)
39. 39
What does the text encoding mean?
This is the form in which the text is stored
(related to multilingual support).
(On Windows, ANSI by default)
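As a rough illustration of why the encoding choice matters, the sketch below stores the same word in two encodings and then reads one of them back with the wrong setting. It assumes that "ANSI" means Windows code page 1252, which is typical of Western European Windows installations but is only an assumption here; the sample word is taken from the talk announcement.

```python
# The same text stored under two different encodings.
text = "décembre"

utf8_bytes = text.encode("utf-8")      # "é" takes two bytes
cp1252_bytes = text.encode("cp1252")   # "é" takes one byte (assumed "ANSI")

# Reading UTF-8 bytes as if they were cp1252 garbles accented characters.
garbled = utf8_bytes.decode("cp1252")

# Reading with the encoding that was used for storage restores the text.
restored = utf8_bytes.decode("utf-8")

print(len(utf8_bytes), len(cp1252_bytes))
print(garbled)
print(restored)
```

This is why the annotator asks for the encoding when a document is added: picking the one the file was actually saved in keeps accented and non-Latin characters intact.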
40. 40
Selecting the text to annotate
• Select a document and annotate it
1. Open document…
2. Select the document
41. 41
Information shown in the working document
• Section Number
• Categories applied to this section
(annotations)
• Speaker (only in oral text)
• Transcription
42. 42
What is a section?
• A stretch of text that is “whateverly”
motivated, that is, delimited for any reason
the annotator finds relevant.
• A fragment that could be useful in whatever
context
• A section can be established in any kind of text
(oral and written) by inserting the special
character (#), which divides the text into
sections.
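The idea of dividing a text with a special character can be sketched in a few lines. This is only an illustration: the function name, the sample text and the decision to ignore empty chunks are assumptions, and the way BACKBONE Annotator itself processes the # marker is not described in these slides.

```python
def split_sections(raw_text, marker="#"):
    """Split raw_text at each marker and number the non-empty sections from 1."""
    chunks = (chunk.strip() for chunk in raw_text.split(marker))
    non_empty = (chunk for chunk in chunks if chunk)
    return list(enumerate(non_empty, start=1))


# Hypothetical transcript with two sections marked by "#".
raw = "# Where do you live? I live in Murcia. # What do you do at weekends? I go hiking."

for number, section in split_sections(raw):
    print(number, section)
```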
44. 44
What is a Keyword?
• “… [a] keyword is a stretch of language (a
word, more than one word or a whole
paragraph) that the annotator associates to a
category…”
Pérez-Paredes and Alcaraz, ReCALL, 2009,
Vol. 21 (1)
45. 45
What are Keywords?
• BACKBONE Annotator supports the annotation
of keywords
• Just select text and apply a category by right-clicking
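One way to picture a keyword annotation is as a span of text tied to a category. The sketch below is a hypothetical data model, not the internal format of BACKBONE Annotator: the Keyword class, the character offsets and the "Topics/Leisure" category path are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Keyword:
    """A stretch of text (start/end character offsets) associated with a category."""
    start: int
    end: int
    category: str


def annotate(section_text, phrase, category):
    """Associate the first occurrence of phrase in section_text with category."""
    start = section_text.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} does not occur in the section")
    return Keyword(start, start + len(phrase), category)


# Hypothetical section and category.
section = "At weekends I usually go hiking in the mountains."
keyword = annotate(section, "go hiking", "Topics/Leisure")

print(section[keyword.start:keyword.end], "->", keyword.category)
```

Because the keyword is stored as offsets rather than as a copy of the words, it can cover a single word, several words or a whole paragraph, as in the definition quoted above.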
47. 47
Section title
• Drag and Drop the special “Title”
category to the desired section.
• The title is shown as a
tooltip when the cursor is placed on
the section.
(No tooltip = no title)
48. 48
Extensible annotation
• Supports customization of the
annotation
• Users can add their own
annotation taxonomy or
remove any annotation
category
49. 49
How can I add a new category?
▫ Select the parent category
(e.g. Topics)
▫ Press the Add Cat. button.
▫ Fill in
51. 51
How can I remove a category?
▫ Select the category to remove (e.g. Topic)
▫ Be careful …
▫ All the associated children will also be removed
▫ All annotations that use those tags will also be removed
▫ Press the Delete Cat. button.
52. 52
How can I reorder the categories?
▫ Select the category to reorder (e.g. Topic)
▫ Press Up Cat. or Down Cat. to move it.
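The add, remove and reorder operations can be pictured as operations on a small tree. The sketch below is a hypothetical model, not BACKBONE Annotator's own code: the Category class, the function names and the sample taxonomy are assumptions. It shows in particular why removal needs care, since deleting a category also deletes its children and every annotation that uses those tags.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    name: str
    children: list = field(default_factory=list)

    def add(self, name):
        """Add a new child category under this one and return it."""
        child = Category(name)
        self.children.append(child)
        return child

    def names(self):
        """Return the names of this category and all its descendants."""
        found = [self.name]
        for child in self.children:
            found.extend(child.names())
        return found


def move_category(parent, name, step):
    """Move a child up (step=-1) or down (step=+1) among its siblings."""
    names = [child.name for child in parent.children]
    index = names.index(name)  # raises ValueError if the category is missing
    target = max(0, min(len(names) - 1, index + step))
    parent.children.insert(target, parent.children.pop(index))


def remove_category(parent, name, annotations):
    """Remove a child, its descendants, and all annotations tagged with them."""
    target = next((c for c in parent.children if c.name == name), None)
    if target is None:
        return annotations
    removed = set(target.names())
    parent.children.remove(target)
    return [(text, tag) for text, tag in annotations if tag not in removed]


# Hypothetical taxonomy and annotations.
root = Category("Root")
topics = root.add("Topics")
topics.add("Leisure")
topics.add("Work")
root.add("Grammar")
annotations = [("go hiking", "Leisure"), ("usually", "Grammar"), ("office", "Work")]

move_category(root, "Grammar", -1)
order_after_move = [child.name for child in root.children]

remaining = remove_category(root, "Topics", annotations)
print(order_after_move)
print(root.names(), remaining)
```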
53. 53
How can I customize a category?
▫ Select the category to customize (e.g. Topic)
▫ Double-click it
63. 63
Backbone
• Pedagogic Corpora for Content and Language
Integrated Learning. Insights from the
BACKBONE Project. The EUROCALL Review,
20, 2, September 2012
• Kurt Kohn, Applied English Linguistics,
University of Tübingen (Germany)
http://eurocall.webs.upv.es/index.php?m=menu_00&n=news_20_2
66. 66
Specific uses: Legal-administrative
language and immigration
This project aims to fill the existing gap in the linguistic studies
combining legal language characterisation and the cultural and social
implications of immigration, from a multilingual angle
(English, Italian, French and Spanish).
The project will contribute to the definition of the immigrant in
each society, encouraging the debate on solidarity from a linguistic
perspective.
Our starting point is the compilation, tagging and annotation of a
multilingual corpus comprising a collection of representative
documents used in immigration (EU and non-EU citizens), issued
by the different public administrations and institutions in Spain,
the UK, France and Italy, ranging from 2007 to 2011.
67. 67
• 1. Compilation and organisation of legal-administrative
binding documents for immigrants in all the countries involved.
• 2. Contrastive analysis of all those terminological, phraseological
and discursive aspects which can help us shape the cultural
identity of administrators and immigrants.
• 3. Multilingual study of the legal-administrative language analysed
in the research corpus textual typology.
• 4. Contrastive characterisation of the foreign user and cultural
implications.
69. 69
Annotation Aim
• Why are you annotating?
• What is the purpose of your annotation?
• What use will you make of your annotation?
70. 70
Discussion and debate
• Pedagogical annotation vs. Morphological
tagging paradigm
• Learner-centered vs. Researcher-oriented
• Indirect applications of language corpora vs.
Direct applications
• Constraints of traditional CL in the language
classroom
71. 71
Discussion and debate
• Cognitive demands of traditional CL in the
language classroom: learner as a researcher and
as a traveller
• Is CL an extra hassle in language classrooms?
(Mauranen 2004)
• Customization of language corpus/collection of
texts
• Mediation role of corpus-based resources in the
FLT classroom
• Authenticity issues (Widdowson)
72. 72
References and further reading
• Braun, S. 2005. “From pedagogically relevant corpora to authentic
language learning contents”. ReCALL 17/1: 47-64.
• Braun, S. 2006. “ELISA - a pedagogically enriched corpus for language
learning purposes”. In Corpus Technology and Language Pedagogy: New
Resources, New Tools, New Methods (eds). Frankfurt/M: Peter Lang, 25-47.
• Braun, S. 2007. “Integrating corpus work into secondary education: from
data-driven learning to needs-driven corpora”. ReCALL 19/3: 307-328.
• Mauranen, A. 2004. “Spoken - general: Spoken corpus for an ordinary
learner”. In How to Use Corpora in Language Teaching, Sinclair, J. McH.
(ed.), 89-105.
• Pérez-Paredes, P. and Alcaraz, J.M. 2009. “Developing annotation
solutions for online data-driven learning”. ReCALL 21/1.
• Römer, U. 2008. “Corpora and Language Teaching”. In Corpus
Linguistics. An International Handbook, Lüdeling, A. and Kytö, M.
(eds). Berlin: Mouton de Gruyter.
• Widdowson, H.G. 2003. Defining Issues in English Language Teaching.
Oxford: Oxford University Press.
perezparedes.blogspot.com