Pedagogical applications of corpus data for English for General and Specific Purposes

FIAL (conference open to researchers and students): "Pedagogical applications of corpus data for English for General and Specific Purposes", Wednesday 4 December, 12:45 (room ERAS 56). UCL, Louvain-la-Neuve


    1. Pedagogical applications of corpus data for English for General and Specific Purposes. Université Catholique de Louvain, FIAL (conference open to researchers and students): Wednesday 4 December, 12:45 (room ERAS 56). Pascual Pérez-Paredes, Universidad de Murcia, Campus Mare Nostrum
    2. Pedagogical applications of corpus data for English for General and Specific Purposes. Université Catholique de Louvain, FIAL (conference open to researchers and students): Wednesday 4 December, 12:45 (room ERAS 56). perezparedes.blogspot.com
    3. Outline. 1. Background: corpora 2. The SACODEYL-BACKBONE approach at a glance 3. Getting down to annotating: (1) Backbone Annotator: download and installation, (2) Texts and CMT, (3) Guided annotation. 1. BACKBONE 2. Specific purposes: LADEX
    4. 4. 4
    5. 5. 5 Corpus Principled collection of texts representative of a given language or reprsentative of a particular language domain. -Language research purposes -Applied purposes: teaching, learning, dictionary making, testing… view.byu.edu/corpora.asp sacodeyl.inf.um.es/sacodeyl-search/ webapps.ael.uni-tuebingen.de/backbone-search/
    6. 6. 6 Corpora in language education ReCALL special issue: Researching uses of corpora for language teaching and learning . Boulton & Pérez-Paredes (2014). -Indirect uses: Thorndike and Lorge’s Teacher’s Word Book of 30,000 Words (1944), West’s General Service List (1953), or Gougenheim (e.g. 1958) and colleagues’ work on the Français Fondamental -Cobuild work led by John Sinclair (1987) Routledge Frequency Dictionaries Coxhead’s Academic Word List (2000) Martinez and Schmitt’s (2012) Phrasal Expressions List
    7. 7. 7 TaLC Lancaster 1994 (1) computers and storage at the time were improving dramatically; (2) there was a new interest in authentic data and usage in language education; and (3) there was a consensus that learners were adopting new, more active roles in their learning process.
    8. 8. 8 Imagine …..today
    9. 9. 9 • Braun (2005, 2007): pedagogically motivated corpora (a) provide a more systematic range of material than individual texts or scattered collections of activities and, if well-designed, (b) offer a wider range of idiolects than the average material. Braun (2006) : thematic annotation, including topic keys and section titles, are particularly useful in the implementation of pedagogically motivated corpora
    10. 10. 10
    11. 11. 11 • Pérez-Paredes & Alcaraz (2009) For the time being, the natural corpus playground continues to be tertiary education. Our motivation: CL in the language classroom. The resulting annotated corpus can be seen as being integrative of language data and annotated pedagogy. Pedagogy can be annotated and, subsequently, accessed by corpus users.
    12. 12. 12 Linguistic analysis of interest in FLT ------> Linguistics comes first -------> DDL materials Concordances and corpus Researcher/Linguist End user What is possible.. (Alcáraz and Pérez-Paredes 2008)
    13. 13. 13 • Pedagogical analysis (and annotation) of language corpora ------> Pedagogy comes first -------> Pedagogy-driven DDL What is feasible.. Material (Alcáraz and developer/Teacher / Learner Pérez-Paredes End user 2008)
    14. 14. 14
    15. www.um.es/sacodeyl
    19. sacodeyl.inf.um.es/sacodeyl-search/ webapps.ael.uni-tuebingen.de/backbone-search/
    20. • The default annotation tree was developed by the teachers and researchers in SACODEYL
    21. What categories does this default category tree contain? Topics, Grammatical, Lexical, Style, CEF Level, …
    22. Annotator-friendly GUI
    23. Multilanguage • Supports genuinely multilingual annotation
    24. Outline. 1. Background: corpora 2. The SACODEYL-BACKBONE approach at a glance 3. Getting down to annotating: (1) Backbone Annotator: download and installation, (2) Texts and CMT, (3) Guided annotation. 1. BACKBONE 2. Specific purposes: LADEX
    26. What is the XML TEI format? ▫ TEI: Text Encoding Initiative ▫ This is a format for storing corpora ▫ It has been promoted by OTA (Oxford Text Archive) ▫ It is a continuously growing format (more than 50 versions released so far, currently TEI P5) ▫ It is rapidly spreading among the available tools
    27. TEI Tools (Research) • TeiPublisher: "This tool is an XML-based repository that allows the publication of TEI corpora to the public community and offers a search tool." • Dexter: "This is another annotation tool that uses TEI as the format for the annotated files."
    28. TEI Tools (Research) • Oxygen XML Editor and XMLSpy: "These are XML editors that allow the modification of the TEI files without any limitation." (These are complex for non-advanced users.)
    29. TEI Tools (Research) • TAPoR (http://portal.tapor.ca/): "The Text Analysis Portal for Research (TAPoR) is a gateway to tools for sophisticated analysis and retrieval, along with representative texts for experimentation."
    30. TEI Tools (Research) • TokenX http://www.unl.edu/libr/etext/tokenx.shtml "Is a text visualization, analysis, and play tool" • WordHoard http://wordhoard.northwestern.edu/userman/index.html "Is a tool for annotating or tagging texts by morphological, lexical, prosodic, and narratological criteria and for determining frequency information"
    31. TEI Tools (Research) • XAIRA: XAIRA (XML Aware Information Retrieval Architecture) is an open source tool for constructing high-quality, linguistically motivated search interfaces to large collections of XML documents.
    32. • The XAIRA search
    33. TEI Tools (Classroom) • A more interesting orientation: how can I use the annotation in the classroom? Backbone Search Tool www.um.es/backbone
    34. Outline. 1. Background: corpora 2. The SACODEYL-BACKBONE approach at a glance 3. Getting down to annotating: (1) Backbone Annotator: download and installation, (2) Texts and CMT, (3) Guided annotation. 1. BACKBONE 2. Specific purposes: LADEX
    35. Download BACKBONE Annotator + Install + CMT config http://www.um.es/backbone/ Pérez-Paredes, P., and Alcaraz-Calero, J. M. (2009). Developing annotation solutions for online Data Driven Learning. ReCALL 21, 55..
    36. How do I create a corpus?
    37. How can I add a new document to the current corpus? 1. Add document… 2. Select the text format/encoding 3. Select the new document
    38. What does the text format mean? • Four main text formats are supported: ▫ Plain text (written), .txt ▫ Oral text in Backbone Transcriptor format ▫ Oral text in SACODEYL Transcriptor format ▫ XML text in TEI standard format (text in special XML files)
    39. What does the text encoding mean? • This is the form in which the text is stored (related to the multilanguage support). (On Windows, ANSI by default.)
    40. Selecting the text to annotate • Select a document and annotate it: 1. Open document… 2. Select the document
    41. Information shown in the working document • Section number • Categories applied to this section (annotations) • Speaker (only in oral text) • Transcription
    42. What is a section? • A stretch of text that is "whateverly" motivated. • A fragment that could be useful in whatever context. • A section can be established in any kind of text (oral or written) by inserting the special character (#), which divides texts into sections.
    43. Intuitive annotation process • Drag and drop to annotate a section
    44. What is a keyword? • "… [a] keyword is a stretch of language (a word, more than one word or a whole paragraph) that the annotator associates to a category…" Pérez-Paredes and Alcaraz, ReCALL, 2009, Vol. 21 (1)
    45. What are keywords? • BACKBONE Annotator supports the annotation of keywords • Just select text and apply a category by right-clicking
    46. Selective view • Offers a selective view of the information in order to facilitate organization.
    47. Section title • Drag and drop the special "Title" category onto the desired section. • The title is shown as a tooltip when the cursor is placed on the section. (No tooltip = no title)
    48. Extensible annotation • Supports customization of the annotation • Users can add their own annotation taxonomy or remove any annotation category
    49. How can I add a new category? ▫ Select the parent category (e.g. Topics). ▫ Press the Add Cat. button. ▫ Fill in
    51. How can I remove a category? • Select the category to remove (e.g. Topic). • Be careful: all its child categories will also be removed, and all annotations that use those tags will also be removed. • Press the Delete Cat. button.
    52. How can I reorder the categories? • Select the category to reorder (e.g. Topic). • Press Up Cat. or Down Cat. to move it.
    53. How can I customize a category? • Select the category to customize (e.g. Topic). • Double-click it.
    54. Can I manage metadata?
    55. What if I find mistakes? • Supports editing of the inserted texts. • Uses the XML TEI standard for encoding corpora.
    56. Integration • Backbone Annotator is integrated with: ▫ Backbone Transcriptor ▫ Backbone CMT ▫ Backbone Search ▫ SACODEYL VRP
    57. Resource management • Offers enrichment of text with external resources • e.g. HTML links, videos, audio files, etc.
    58. Where is the information stored? • Remember: all the information is stored in one file, the corpus file that you have created.
    59-61. Make your corpus collaborative
    62. Outline. 1. Background: corpora 2. The SACODEYL-BACKBONE approach at a glance 3. Getting down to annotating: (1) Backbone Annotator: download and installation, (2) Texts and CMT, (3) Guided annotation. 1. BACKBONE 2. Specific purposes: LADEX
    63. Backbone • Pedagogic Corpora for Content and Language Integrated Learning. Insights from the BACKBONE Project. The EUROCALL Review, 20, 2, September 2012 • Kurt Kohn, Applied English Linguistics, University of Tübingen (Germany) http://eurocall.webs.upv.es/index.php?m=menu_00&n=news_20_2
    64. webapps.ael.uni-tuebingen.de/backbone-search/
    65. Outline. 1. Background: corpora 2. The SACODEYL-BACKBONE approach at a glance 3. Getting down to annotating: (1) Backbone Annotator: download and installation, (2) Texts and CMT, (3) Guided annotation. 1. BACKBONE 2. Specific purposes: LADEX
    66. Specific uses: Legal-administrative language and immigration. This project aims to fill the existing gap between linguistic studies that characterise legal language and the cultural and social implications of immigration, from a multilingual angle (English, Italian, French and Spanish). The project will contribute to the definition of the immigrant in each society, encouraging the debate on solidarity from a linguistic perspective. Our starting point is the compilation, tagging and annotation of a multilingual corpus comprising a collection of representative documents used in immigration (EU and non-EU citizens), issued by the different public administrations and institutions in Spain, the UK, France and Italy, ranging from 2007 to 2011.
    67. • 1. Compilation and organisation of legally binding legal-administrative documents for immigrants in all the countries involved. • 2. Contrastive analysis of all those terminological, phraseological and discursive aspects which can help us shape the cultural identity of administrators and immigrants. • 3. Multilingual study of the legal-administrative language analysed in the research corpus textual typology. • 4. Contrastive characterisation of the foreign user and cultural implications.
    68. • LADEX Annotator (multilingual automatic tagging) + manual collaborative annotation • http://www.um.es/languagecorpora
    69. Annotation aim • Why are you annotating? • What is the purpose of your annotation? • What use are you making of your annotation?
    70. Discussion and debate • Pedagogical annotation vs. morphological tagging paradigm • Learner-centered vs. researcher-oriented • Indirect applications of language corpora vs. direct applications • Constraints of traditional CL in the language classroom
    71. Discussion and debate • Cognitive demands of traditional CL in the language classroom: learner as a researcher and as a traveller • Is CL an extra hassle in language classrooms? (Mauranen 2004) • Customization of language corpus/collection of texts • Mediation role of corpus-based resources in the FLT classroom • Authenticity issues (Widdowson)
    72. References and further reading
        • Braun, S. 2005. "From pedagogically relevant corpora to authentic language learning contents". ReCALL 17/1: 47-64.
        • Braun, S. 2006. "ELISA - a pedagogically enriched corpus for language learning purposes". In Corpus Technology and Language Pedagogy: New Resources, New Tools, New Methods. Frankfurt/M: Peter Lang, 25-47.
        • Braun, S. 2007. "Integrating corpus work into secondary education: from data-driven learning to needs-driven corpora". ReCALL 19/3: 307-328.
        • Mauranen, A. 2004. "Spoken - general: Spoken corpus for an ordinary learner". In How to Use Corpora in Language Teaching, Sinclair, J. McH. (ed.), 89-105.
        • Pérez-Paredes, P. and Alcaraz, J.M. 2009. "Developing annotation solutions for online data-driven learning". ReCALL 21/1.
        • Römer, U. 2008. "Corpora and Language Teaching". In Corpus Linguistics. An International Handbook, Lüdeling, A. and Kytö, M. (eds.). Berlin: Mouton de Gruyter.
        • Widdowson, H.G. 2003. Defining Issues in English Language Teaching. Oxford: Oxford University Press.
        perezparedes.blogspot.com