Language corpora and the language classroom.
Pérez-Paredes, P. & Díez Bedmar, B. 2010.

Language corpora and the language classroom. Murcia: Consejería de Educación de la CARM.

ISBN 978-84-692-4229-2


    • Pascual Pérez-Paredes & Belén Díez Bedmar (2009). Language corpora and the language classroom. Materiales de formación del profesorado de lengua extranjera (Inglés). Murcia: Consejería de Educación y Cultura.
      Language corpora and the language classroom
      1. Introduction
      These days, language corpora are being used by language teachers, researchers and students more and more often. Computers have become widely available in homes and schools, corpora can be searched on the Internet for free and corpus resources have improved the quality of, and the access to, the methods of corpus linguistics in applied fields such as foreign language teaching. Compiling your own ad-hoc corpus or a corpus of your own students is easier today than ever before and free resources abound. The most important application of corpora in language classrooms is called Data-driven learning.
      Corpus Linguistics (CL) and Data-driven learning (DDL) are two terms that have caught the attention of teachers in foreign language teaching (FLT) and researchers alike for a decade now. This is so because the assumptions behind CL and DDL are of enormous importance to language researchers and FL teachers. In a very recent publication, O'Keeffe, McCarthy and Carter (2007:21) state the following about the application of language corpora in FLT:
        As well as providing an empirical basis for checking our intuitions about language, corpora have also brought to light features about language which had eluded our intuition […] In terms of what we actually teach, numerous studies have shown us that the language presented in textbooks is frequently still based on intuitions about how we use language, rather than actual evidence of use.
      It seems that language corpora can help us re-examine what is presented as undisputed in prescriptive or intuition-led textbooks and other reference materials. In the following paragraphs, we will offer a brief account of the implications of CL and DDL for mainstream FLT. In particular, we aim to present useful insights into how using language corpora can help our teaching. Most of the resources presented in this chapter are freely available on the Internet.
    • 2. Corpus linguistics and Data-driven learning in a nutshell
      2.1. Data in FLT: preliminary issues
      Data-driven learning is a language learning approach that is “basically developed through self-conscious activities instead of being imparted through conceptual knowledge” (Pérez Basanta, C and Rodríguez Martín: 146-7). In DDL, learners become active researchers: they see language from a different perspective and discover language and communication facts that may otherwise remain unseen. In DDL, reading concordance lines is a usual practice. Take the word important, a basic adjective that learners use on an everyday basis in schools. The following screenshot from the Collins WordbanksOnline English corpus¹ shows fifty random uses of the word in a 10-million-word corpus of spoken British English:
      ¹ http://www.collins.co.uk/Corpus/CorpusSearch.aspx
    • Figure 1. Sample concordances of important in the Collins WordbanksOnline English corpus.
      In a way, DDL promotes vertical reading rather than horizontal reading, as learners are invited to look at the accumulated frequency and co-occurrence of lexical items. In Figure 1, learners could note the following:
      The words to the left of important: more, most, quite, awfully, very, etc.
      The words to the right of important: to + infinitive, factor, thing, point, etc.
      However, concordance lines are also useful for noting language behaviour that goes beyond the boundaries of two contiguous words. Take the word sure as an instance. The Cambridge Advanced Learner’s Dictionary² offers 8 entries for the word. You can find the entries and examples below:
      1 certain; without any doubt: "What's wrong with him?" "I'm not really sure." I'm sure (that) I left my keys on the table. I feel absolutely sure (that) you've made the right decision. It now seems sure (that) the election will result in another victory for the government. Simon isn't sure whether/if he'll be able to come to the party or not. Is there anything you're not sure of/about? There is only one sure way (= one way that can be trusted) of finding out the truth. See also cocksure.
      2 be sure of/about sb: to have confidence in and trust someone: Henry has only been working for us for a short while, and we're not really sure about him yet. You can always be sure of Kay.
      3 be sure of yourself: to be very or too confident: She's become much more sure of herself since she got a job.
      4 be sure of sth: to be confident that something is true: He said that he wasn't completely sure of his facts.
      5 be sure of getting/winning sth: to be certain to get or win something: We arrived early, to be sure of getting a good seat. A majority of Congress members wanted to put off an election until they could be sure of winning it.
      6 be sure to: to be certain to: She's sure to win. I want to go somewhere where we're sure to have good weather.
      7 make sure (that): to look and/or take action to be certain that something happens, is true, etc: Make sure you lock the door behind you when you go out.
      8 If you have a sure knowledge or understanding of something, you know or understand it very well: I don't think he has a very sure understanding of the situation.
      Isolated from any context, sure is usually taught as being highly assertive, that is, it is taught to express certainty, as in I’m sure I was there. Of course, there is nothing wrong with this. As you have read above, this is the usual mainstream use of the word.
      ² http://dictionary.cambridge.org/
However, if we search for sure in a corpus, in this case the SACODEYL English corpus of European young people, we will find that a new pattern emerges clearly: I’m not sure + what / if / whether. See Figure 2:
    • Figure 2. sure in the SACODEYL English corpus.
      It appears that I’m not sure is a powerful pattern to express hedging or tentative opinion, as in I’m not sure if I’d like to live there. It can also be followed by a canonical Subject + Verb + Complement clause to indicate contrast or opinion, as in I’m not sure. I’ve always wanted to be... or in I’m not sure. I find art relaxing because… As you can see, when we examine the different contexts in which a node (that is, the word you are looking up) is found, we can clearly see different patterns of use that are not always found in textbooks or dictionaries. Corpus linguists often discuss this phenomenon and try to account for it by looking at language as a lexico-grammatical field of interplay rather than one where meaning is created by the use of words in isolation (i.e. sure).
      Bernardini (2004:16) highlights the fact that in DDL there is a “shift of emphasis from deductive to inductive learning routines” which has a great impact on the agents of FLT. This is summarised in Table 1:
      FLT agents | Shift
      Teachers | Become coordinators of research and facilitators
      Learners | Learn how to learn through exercises that involve the observation and interpretation of patterns of use
      Pedagogic grammars | Are now informed by enough evidence and stimuli for the learner to arrive at developmentally-appropriate generalisations
      Table 1. Shift of emphasis in DDL-FLT (Bernardini 2004: 16-7).
      DDL, then, is about using data to promote richer language learning experiences. The definition needs clarification, though. D in DDL stands for data, in other words, for language data. However, we should say that in the CL literature these data are markedly understood in a computational sense. In the following paragraphs we will explore the implications for language teachers and try to dispel the obscurity that may surround the term.
      2.1.1. Our English teaching is mediated by language data
      We may not have reflected on the issue before, but when we decide on a textbook we are opting for a particular set of language data to be used in our classroom. In all probability, you face a situation where the Education Authorities have set an official curriculum that you are bound to abide by. In a similar way, as a member of a large institution, you are required to follow certain general methodological guidelines. Leaving organizational aspects aside, however, teachers have the chance to reflect on their teaching and choose the materials that best suit their learners. What choices can you make in terms of the contents of your teaching? What are the main ingredients of your teaching? Do you stick to a textbook? If so, to what extent do you or your Department consider the language in there? Have you examined the language used in your textbook? This is a fundamental issue that deserves our attention. EFL teachers, like most professionals in other teaching areas, rely on reputable, reliable publishing houses that make an effort to mediate between the learners and their teachers.
In this process, the teacher, or group of teachers of a school, has the opportunity first to review and then to select the textbooks that will later be used. If we use language corpora as a complement to our teaching, we will be widening the scope of the language that we present to our students and, certainly, we will be enriching their learning environment (Aston 1997). But, before we move on to the ways in which we can use language corpora, let us consider briefly the very basics of corpus linguistics.
    • 2.2. Introducing Corpus Linguistics
      Corpus linguistics (CL) makes use of data to gain insight into how language works. A well-known definition of corpus is the following:
        Any collection of more than one text can be called a corpus, (corpus being Latin for "body", hence a corpus is any body of text). But the term "corpus" when used in the context of modern linguistics tends most frequently to have more specific connotations than this simple definition³.
      This definition is well rooted in the linguistic tradition, and thus the connotations that McEnery and Wilson bring up are concerned with the role of a corpus in a research-oriented paradigm. These connotations are
      • representativeness,
      • size,
      • machine-readable form and
      • standard reference.
      If linguists claim that using a corpus is a convenient way to research language use and behaviour, they have to make sure that their tool, that is their language corpus, and their methodology are geared towards maximizing the representative quality of the language samples that have been included in the corpus. McEnery and Wilson have put it this way:
        We are therefore interested in creating a corpus which is maximally representative of the variety under examination, that is, which provides us with an as accurate a picture as possible of the tendencies of that variety, as well as their proportions. What we are looking for is a broad range of authors and genres which, when taken together, may be considered to "average out" and provide a reasonably accurate picture of the entire language population in which we are interested⁴.
      An example of all this is the British National Corpus (BNC). The BNC claims to be representative of the English language used in the UK in the late 1980s; its size (100 million words) is big enough to include most communicative genres and textual types; it is of course electronic and, as a consequence of it all, it has become a standard reference for British English. The BNC is introduced on its website as follows:
        The British National Corpus (BNC) is a 100 million word collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of British English from the later part of the 20th century, both spoken and written. The latest edition is the BNC XML Edition, released in 2007.
      ³ McEnery and Wilson. Corpus Linguistics. Available at http://bowlandfiles.lancs.ac.uk/monkey/ihe/linguistics/corpus2/2fra1.htm
      ⁴ Idem.
    • The written part of the BNC (90%) includes, for example, extracts from regional and national newspapers, specialist periodicals and journals for all ages and interests, academic books and popular fiction, published and unpublished letters and memoranda, school and university essays, among many other kinds of text. The spoken part (10%) consists of orthographic transcriptions of unscripted informal conversations (recorded by volunteers selected from different age, region and social classes in a demographically balanced way) and spoken language collected in different contexts, ranging from formal business or government meetings to radio shows and phone-ins⁵.
      The BNC can be searched free of charge from http://www.natcorp.ox.ac.uk/ The results are limited to 50 hits, but this is enough to have a clear idea of what we are looking into:
      Figure 3. The BNC website.
      However, using corpora is not the one and only solution to linguistic inquiry and research. This is not the place to revisit the old controversy between Noam Chomsky and Charles Fillmore, two influential linguists of the second half of the 20th century. The former has overtly criticized the use of language corpora, as they are not seen as a reliable way to render the complexity and vastness of language. Chomsky believed that the rules governing a language could actually be scrutinized through introspection; actual performance was considered, by contrast, something that could not be apprehended. Fillmore criticised armchair linguists who do not use real, that is, attested language data and, on the contrary, rely on their own intuition and idiolect to develop complex theories of language.
      ⁵ From http://www.natcorp.ox.ac.uk/corpus/index.xml
    • By the way, Fillmore similarly criticises corpus linguists who waste their time on design issues, but that’s a different story. The point here is that there has traditionally been a controversy between introspection and data examination as valid tools for linguistic analysis. Corpus Linguistics has now gained the interest of many researchers who believe that data need to be collected before we can jump to conclusions about language use. In this sense, CL methodology is empirical and data-driven. Corpus-based research can then be characterised by two main features (Conrad 1999:3-4):
      1. The use of a principled collection of naturally-occurring texts, that is, a corpus. The BNC discussed above is an example.
      2. The use of computers for language analyses. Depending on the items being analysed, these can be automatic or may need human interaction.
      Corpus-based studies include both quantitative analyses and functional interpretations of language use. The following table offers the basics of CL:
    • Term | Explanation
      Chunks | Groups of words that cluster together in n-number of words, i.e., 2, 3, 4, 5, etc. These are not necessarily phrases (i.e. Noun Phrases) or clauses, but rather words that combine together in a statistically significant way. I don’t know, what I really mean or a couple of are good examples of chunks.
      Collocates | Words that occur frequently in contiguity or almost in contiguity. To determine whether a collocate is significant, the software package performs statistical analyses.
      Concordance lines | Lines of text which show a node in the middle. The node is the word or string of words that is being searched in a corpus.
      Concordancer | The software that generates concordance lines.
      Corpus | A principled collection of texts. This collection should follow strict design guidelines if the corpus is to represent a language or a register.
      Wordlist | The list of words that are found in a corpus or in a particular text. This list usually shows the frequency of occurrence and, possibly, other statistical indexes.
      Table 2. The basics of CL.
      All these terms are usually found in descriptive accounts of English and have a very interesting potential in language learning. For example, chunks are strings of n words that cluster together in a systematic way. Linguists such as Lewis (1993) or Nattinger and DeCarrico (1992) have stressed that lexis takes priority over grammar in discourse:
        Lexis is central in creating meaning, grammar plays a subservient managerial role. If you accept this principle then the logical implication is that we should spend more time helping learners develop their stock of phrases, and less time on grammatical structures⁶.
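To make the Wordlist entry of Table 2 concrete, here is a minimal sketch in Python of how a frequency list could be produced. The function names and the sample sentence are invented for illustration, and the tokenisation rule (lowercase letters, with contractions kept whole) is a simplifying assumption; dedicated packages compute further statistical indexes that are not shown here.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens.
    Assumption: contractions such as "don't" count as one word."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def wordlist(text):
    """Return (word, frequency) pairs, most frequent first."""
    return Counter(tokenize(text)).most_common()

# Hypothetical sample text, not taken from any corpus.
sample = "I don't know what you mean. You know what I mean, I don't."

for word, freq in wordlist(sample)[:5]:
    print(word, freq)
```

With a real text, the top of such a list is typically dominated by function words, which is one reason why wordlists are usually read alongside concordance lines rather than on their own.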
Corpora are useful in revealing that the language speakers use relies heavily on chunking, that is, the repetition of strings of words. O'Keeffe, McCarthy and Carter (2007:60) highlight that “language is available for use in ready-made chunks to a far greater extent than could ever be accommodated by a theory of language which rested upon the primacy of syntax”. Let us give you real instances of chunking in English. These authors have used the CANCODE corpus⁷, a 5-million word corpus of spoken British English, to generate the most frequent chunks of n words. These are the results for the top 1 and 2:
      Chunk length | Top 1 chunk | Top 2 chunk
      3-word chunks | I don’t know | a lot of
      4-word chunks | You know what I | know what I mean
      5-word chunks | you know what I mean | at the end of the
      6-word chunks | do you know what I mean | at the end of the day
      and these for the top 15 and 19 (chosen at random):
      Chunk length | Top 15 chunk | Top 19 chunk
      3-word chunks | I think it’s | you know the
      4-word chunks | or something like that | that sort of thing
      5-word chunks | I don’t know what it | an hour and a half
      6-word chunks | and at the end of the | if you see what I mean (top 16)
      O'Keeffe, McCarthy and Carter (2007:71) state that despite being syntactic fragments, these chunks perform a very important pragmatic function beyond the word level and, significantly, many have a discourse marking function (I mean, you know, you know what I mean, at the end of the day, if you see what I mean,...).
      In the same way, a corpus can be used to generate collocates, frequency lists and, as seen, concordance lines. There are software packages that can handle this. WordSmith 5.0⁸ is probably one of the most complete suites available. Interesting non-commercial applications include:
      Generate concordance lines for every word in a text: Text-based concordances: http://www.lextutor.ca/concordancers/text_concord/
      Generate chunks for a text: N-Gram phrase extractor: http://www.lextutor.ca/tuples/eng/
      Search principled corpora: Online concordancer: http://www.lextutor.ca/concordancers/concord_e.html
      Generate concordance lines, frequency lists, etc.: Turbo Lingo: http://www.staff.amu.edu.pl/~sipkadan/lingo.htm
      ⁶ Islam and Timmis: http://www.teachingenglish.org.uk/think/methodology/lexical_approach1.shtml
      ⁷ http://www.cambridge.org/elt/corpus/cancode.htm
      ⁸ http://www.lexically.net/wordsmith/
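For readers who would like to see what a chunk extractor does in principle, the following Python sketch counts every run of n consecutive words in a text. It is only an illustration under stated assumptions: the function names and the sample transcript are invented, the counts are raw frequencies with no test of statistical significance, and utterance boundaries are ignored. It does not reproduce how any of the tools listed above work internally.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def chunks(tokens, n):
    """Count every sequence of n consecutive tokens (an n-word chunk)."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)

# Hypothetical mini-transcript, invented for this example.
transcript = ("you know what i mean at the end of the day "
              "i don't know what i mean do you know what i mean")
tokens = tokenize(transcript)

for n in (3, 4, 5):
    # Show the two most frequent chunks of each length.
    print(n, chunks(tokens, n).most_common(2))
```

Run on a corpus of several million words instead of a toy transcript, the same counting procedure is what makes rankings such as "top 1" or "top 15" chunk possible.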
    • Figure 4. Online concordancer.
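The layout a concordancer produces, with the node aligned in the middle of each line, can be sketched in a few lines of Python. This is a minimal illustration rather than a description of the online concordancer in Figure 4: the function name, the context width and the sample sentences are assumptions made for the example, and the text is assumed to contain no line breaks.

```python
import re

def concordance(text, node, width=30):
    """Return key-word-in-context lines for every occurrence of node.
    Each line holds `width` characters of left context, the node,
    and `width` characters of right context."""
    pattern = r"\b%s\b" % re.escape(node)
    lines = []
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so that the node lines up vertically.
        lines.append(f"{left:>{width}} {match.group(0)} {right:<{width}}")
    return lines

# Hypothetical sample text, invented for this example.
text = ("I'm not sure if I'd like to live there. "
        "I'm sure I left my keys on the table. "
        "She's sure to win.")

for line in concordance(text, "sure"):
    print(line)
```

Because the node always starts in the same column, learners can read down the column and compare what comes immediately to its left and right, which is the vertical reading described earlier.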
    • 2.3. How can we make use of Corpus Linguistics? Indirect approaches
      Following Geoffrey Leech, Römer (2008) distinguishes between indirect and direct applications of CL in the field of language teaching. Indirect approaches to corpora provide access to corpus-informed insights into the nature of language. Those who consume this information are typically, although not exclusively, researchers and language material writers and designers. The typical users of direct approaches, by contrast, are teachers and learners themselves. The following figure summarises this dichotomy:
      Figure 5. Indirect and direct applications of CL in the language classroom (Römer 2008).
      Direct approaches are focused on straight, hands-on learning activities and the generation of classroom material. These direct hands-on experiences can be either guided or unguided by the teachers, and thus it is likely that most teachers will find tasks that are suitable to their students’ needs and contexts.
      Indirect approaches to using corpora in the language classroom have occupied the agenda of applied linguists for over a quarter of a century now. These approaches have benefited from linguistic research into the nature of language and offer a fresh non-normative view of naturally occurring language. One of the main contributions of these studies is that corpus data very often question our perceptions of how language works. A good example of this is Biber (1988) and, particularly useful in the context of FLT, Biber et al. (1999):
    • Figure 6. Longman Grammar of Spoken and Written English (LGSWE).
      The authors of the LGSWE claim that this work “describes the actual use of grammatical features in different varieties of English: mainly conversation, fiction, newspaper language, and academic prose […] The LGSWE adopts a corpus-based approach, which means that the grammatical descriptions are based on the patterns of structure and use found in a large collection of spoken and written texts, stored electronically, and searchable by computer” (Biber et al. 1999: 4). So the idea here is that a well-designed corpus can be useful in learning more about how language works. This is useful for both native and non-native speakers, as even native speakers cannot rely on pure intuition to determine how language works across every single register and communicative domain.
      Let us have a look at one syntactical construction to illustrate the usefulness of corpora in the language classroom. Existential clauses contain, in most cases, be as a verb and there as a subject: There is no coffee is a typical example. There, however, also introduces other verbs: seem, appear, suppose and use to are good examples. When should we use one or the other, given that their meanings are so close? In the LGSWE we find corpus-driven information that tells us that the frequency of these verbs after existential there depends on the textual and domain features of the communicative event. Thus there exist/exists is very frequent in academic texts while it is rare or infrequent in conversation, fiction and news language. There come/comes, on the contrary, is infrequent in academic language, conversation and news, but very often found in fiction texts and creative language use. Figure 7 illustrates this point:
    • Figure 7. Verbs other than be in existential constructions. Biber et al. (1999).
      When these and similar verbs are followed by to be we discover interesting facts. There seem/seems to be is found to occur across all 4 domains and textual types while there used to be is untypical and not frequent at all in fiction, news or academic language:
      Figure 8. To be after some verbs in some existential constructions. Biber et al. (1999).
      In these examples we can note the interplay between grammatical categories and register.
    • 3. Direct approaches
      As stated, direct approaches lend themselves more readily to immediate, straightforward classroom applications. In some schools, it might be convenient to make use of a computer room while in others teachers will prefer to develop materials that can be printed and later distributed. The nature of the lesson will determine what kind of interaction we expect from our students.
      3.1. Some tips
      If you want your learners to plunge into using a corpus, our suggestion is to follow a carefully-planned route:
      1. Select a small group of learners. Using technology is cumbersome at times and computers tend to crash in multimedia LANs which are often used by many. If your LAN restricts IPs or domains, make sure beforehand that the sites you plan to use are available.
      2. Avoid meta-language, such as linguistics, node or principled corpus. It is language, real language, that your learners will be more interested in.
      3. Before getting your students to use a concordancer or a similar tool, distribute activities where they can get used to reading vertically rather than horizontally. Make sure they get used to interpreting the context and making hypotheses about contexts of use and prosodies, that is, whether the line is used in a derogatory way or positively.
      4. Select what you want your students to look up well beforehand. Examples or activities that are over the top easily discourage students.
      5. Try to put interesting questions to your students. Motivate them and get them interested in turning themselves into researchers or, better, detectives.
      6. Select carefully the corpus you want to use. You may consider building your own corpus.
      3.2. Activities: using SACODEYL
      A corpus is an excellent tool to discover language behaviour and to learn more about collocations and patterning. In teaching contexts, principled corpora may not adapt well to your students’ level, especially if these are very young. We recommend that you build your own collection of texts if these are better suited to your students’ needs. However, using SACODEYL is a more straightforward option if you want to use teen talk, multimedia corpora:
    • By using a corpus as a tool to find out about language, learners are given the chance to strengthen their inductive skills to learn about language, which is highly instrumental for further learning. Sinclair (2004:288) is definitely optimistic about the unmediated use of reference corpora in the language classroom:
        ...both teacher and student can make use of a corpus right away, with only a modest few hours orientation; there is no need to wait for the new textbooks and reference books. Only fairly simple queries can be handled at this stage, but the results can be illuminating and very helpful. For this, you will need a computer of normal performance, a corpus and some query software. Will the corpus be 100% reliable, comprehensive and representative? Of course not, but do your present books match these targets? Or your reference grammars and dictionaries? Or any native speaker models? Or any combination of these? Of course not.
      Despite Sinclair’s statement, the teaching context in secondary education is still far from meeting many of the requirements above. Good reference corpora are commercial and search tools are difficult to handle⁹. Mauranen (2004:1999) has voiced her concern for the actual use of innovation in classrooms:
        No teaching method can become an important innovation, whatever its potential, if it does not make its way to the normal classroom where teachers and students can use it as part of their everyday routines, with not too much extra hassle.
      Fortunately, there are now a few instances of pedagogical corpora whose focus is more on learning than on linguistic research and which happen to be free to use. SACODEYL is one of these pedagogically-motivated corpora. ELISA, its predecessor and inspiration, is another interesting effort:
        ELISA is a collection of video-based interviews with native speakers of different varieties of English (e.g. US, England, Scotland, Ireland, Australia) and from different walks of life. They talk about their professional career. All interviews follow a general pattern, covering a similar range of topics, e.g. what the speakers do, their educational background, how they started their career or business, the type of projects they are involved in, their daily routines and future plans. While some of the speakers engage in unusual professions (e.g. a tour guide at Ayers Rock, a guitar teacher, a travel journalist and an arts therapist) and thus make for the attraction of the materials, they all describe issues of general interest in professional contexts. The corpus currently contains 25 interviews of 5 to 15 minutes. The transcripts amount to about 60,000 words in total¹⁰.
      ⁹ Guy Aston and Lou Burnard published The BNC handbook: exploring the British National Corpus with SARA (Edinburgh Textbooks in Empirical Linguistics) in 1998, an excellent reference book for fully exploiting SARA.
      ¹⁰ http://www.uni-tuebingen.de/elisa/html/elisa_index.html
    • SACODEYL offers young learners the language and the voices of their peers. As in ELISA, SACODEYL kids talk about their daily routines, about themselves, their schools, their hometowns, their leisure time activities and hobbies, films, books, sports and many other topics. The SACODEYL corpus has been annotated with a view to pedagogical applications. This makes SACODEYL a very interesting complementary material in mainstream teaching, where teachers and students can find a familiar range of language and communication contexts. The following figure illustrates this:
Figure 9. SACODEYL search categories.
These categories resemble the language and the communication-oriented methodology of mainstream language teaching. Learners and teachers using SACODEYL may want to navigate the English corpus in exactly the same way as they navigate the contents of their textbook. In SACODEYL, every interview has been split into sections, that is, convenient teaching and learning stretches of language which have a pedagogical value. Each section has been annotated by experienced teachers, who have assigned it a full array of categories and subcategories. Once annotated, the corpus can be searched accordingly:
    • Figure 10. SACODEYL search categories in detail.
Users can also browse interviews:
    • Figure 11. Browse area for SACODEYL English corpus.
Users can also browse sections within interviews and search for sections that meet the criteria they set:
Figure 12. Browse area for SACODEYL section search.
Let us consider some activities for the language classroom. We assume that your learners are Secondary School students of English, so we will use the SACODEYL English corpus, a small
    • corpus of teenage talk contributed by some 25 interviewees from the Reading area in the UK. Here is a selection of activities that illustrate the type of
3.2.1. Activities focused on communication and attention to form
Tell your students to search for [Reading]. You may want to introduce them to the area and neighbouring cities, all of them widely known. Ask them to read the concordance lines and get them to classify (A) words on the left, (B) words on the right and (C) contexts of use:
Figure 13. Simple SACODEYL word search.
The following screen shows the number of hits by displaying the concordance lines:
    • Figure 14. SACODEYL Search tool.
You may want to guide your students in their search. Providing tables to fill in is usually very productive as this keeps students focused on the task, which becomes more convergent:
A. Write here the most frequent words or punctuation to the left of Reading: (like, feel, tell) about | (live, be) (here) in | the (centre, outskirts) of
B. Write here the most frequent words or punctuation to the right of Reading: as a place | ./? | festival
C. Guess: what is being talked about?
Context 1: Reactions to / opinions on your hometown / Reading / where you live
Context 2: Staying in Reading or leaving
Context 3: Reading festival / Travelling
Table 3. Fill-in table.
In A and B students are invited to observe the surrounding context of a word and note the accumulation of certain instances to the left or to the right of the node. In C, students are invited to make hypotheses about what is being talked about. If desired, you can explore uses of like about / feel about / tell about or [Murcia / Cartagena as a place] or, from a more communicative perspective, expressing opinions about your city or the place where you live. If you tell your students to search for [like about], they will be given instances where kids use it in a real context embedded in the flow of speech. More importantly, your students will be presented with an opportunity to disambiguate other uses of [like about]:
Figure 15. SACODEYL Search tool.
In the case highlighted above, [like about] is used as a hedge, a very common feature of spoken English. This is a convenient way to combine communication-oriented teaching and form-focused instruction. This range of activities is focused on analysing the context of use of a given word [Reading], both linguistically and communicatively.
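For teachers who want to prepare a fill-in table like Table 3 outside the SACODEYL interface, the left/right classification can be approximated with a few lines of Python. This is only a minimal sketch: the sample sentences are invented for illustration (they are not SACODEYL data), and the function names `tokenize` and `kwic` are hypothetical helpers, not part of any corpus tool.

```python
import re
from collections import Counter


def tokenize(text):
    # Words (with an optional internal apostrophe) or single punctuation marks.
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?|[.,?!]", text)


def kwic(tokens, node, width=4):
    """Return (left, node, right) context windows for every hit of `node`."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            hits.append((tokens[max(0, i - width):i], tok, tokens[i + 1:i + 1 + width]))
    return hits


# Invented sample sentences, NOT taken from SACODEYL.
sample = ("I live in Reading. What do you like about Reading? "
          "I went to the Reading festival. Reading as a place is quite nice.")

tokens = tokenize(sample)
hits = kwic(tokens, "Reading")

# Count the first item to the left (A) and to the right (B) of the node.
left_words = Counter(left[-1].lower() for left, _, _ in hits if left)
right_words = Counter(right[0].lower() for _, _, right in hits if right)

for left, node, right in hits:
    print(f"{' '.join(left):>25}  [{node}]  {' '.join(right)}")
print("Left of node: ", left_words.most_common())
print("Right of node:", right_words.most_common())
```

Part C of the table (guessing the context) is deliberately left to the students; the counts only support parts A and B.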
    • In a unit where music and concerts are presented, you may want to ask your students to find out about [Reading Festival]. This is what they may find [11]:
Figure 16. SACODEYL Search tool.
From here, students can go to the interview section where the speaker talks about it:
[11] At the time of writing, the corpus search facility was under construction, so search results may vary.
    • Figure 17. SACODEYL Search tool: section level.
and read and listen to what this speaker says about it:
Figure 18. SACODEYL English corpus: section level.
It is interesting to see how the online nature of spoken discourse affects the way we put things while speaking. In this very short extract, your students can find the following, among others:
- Native correction: [gonna to]
- Unfinished sentences: [been so, but]
- Contractions not frequently used by Spanish speakers: [it'll be]
    • As put by Bernardini (2004: 17), “concordancing in particular may prove unique in the acquisition and restructuring of competence [...] Language learning may be viewed as an inductive process in which meaning and form come to be associated”.
3.2.2. Activities focused on attention to form and communication
Römer (2008: 19) has pointed out that concordance lines can be used by teachers to “create DDL exercises tailored to their learners' proficiency level and their particular learning needs”. A case in point is the use of articles. This will be dealt with later in chapter 4 from a different angle. Let us search for sections in the SACODEYL English corpus that have been annotated as being representative of this particular linguistic feature:
Figure 19. SACODEYL English corpus: category search on section level.
From this you may want to select stretches of language that can be submitted to students for evaluation and analysis, or they can simply be used as materials to improve their mastery of the form. The following extracts are interesting for different reasons. Extract A is a convenient illustration of the use of the indefinite article: (A)
    • Interviewer: So, what kind of house do you live in? Can you describe what kind of house you live in?
Rachel: It's a semi-detached and it's got a garage and a big garden and it's quite big. It's got quite a lot of rooms but I have to share my room with my sister.
You could present this in a cloze format:
Interviewer: So, what kind of ...house do you live in? Can you describe what kind of ...house you live in?
Rachel: It's ... semi-detached and it's got ...garage and ... big garden and it's quite big. It's got quite ... lot of rooms but I have to share my room with my sister.
In B, we can notice the presence of the zero article: (B)
Interviewer V: You say you've got a lot of work this year why is that?
Sam: It's our first year of GCSEs so you've got course work and it's like writing essays for different subjects. And recently we've been doing English we did a we did a we did course work on a book Hard Times by Charles Dickens. Which was a bit boring but, but we've finished that now so it's alright.
You could present this in a cloze format:
Interviewer V: You say you've got a lot of work this year why is that?
Sam: It's our first year of GCSEs so you've got ...course work and it's like writing ...essays for ...different subjects. And recently we've been doing ...English we did a we did a we did ...course work on ... book Hard Times by Charles Dickens. Which was a bit boring but, but we've finished that now so it's alright.
In actual fact, (B) can easily be expanded into an interesting source of pragmatic information, including sentence restructuring [we did a we did a we did], sentential relatives to express evaluation [Which was a bit boring] and conclusion [so it's alright].
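A first draft of a cloze text like the one for extract (A) can be produced automatically. The sketch below is an assumption-laden illustration, not a feature of SACODEYL: the helper `make_cloze` is hypothetical, and it only blanks overt articles (a, an, the). Gaps for the zero article, as in extract (B), still have to be inserted by hand, because there is no word to search for.

```python
import re

# Overt articles only; the zero article cannot be matched in raw text.
ARTICLES = re.compile(r"\b(a|an|the)\b", flags=re.IGNORECASE)


def make_cloze(text, gap="..."):
    """Replace every overt article with a gap; return (cloze_text, answer_key)."""
    answers = ARTICLES.findall(text)
    return ARTICLES.sub(gap, text), answers


extract = ("It's a semi-detached and it's got a garage and a big garden "
           "and it's quite big.")

cloze, key = make_cloze(extract)
print(cloze)
print("Answer key:", key)
```

The answer key is returned alongside the gapped text so that the teacher can check students' answers or add distractor gaps manually.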
Barlow (1996) sees in activities like these a potential for teachers to enrich the learning environment and students' knowledge of language. For a thorough account of concordance-based DDL, we suggest reading a practical book on the issue (Tribble and Jones 1990):
    • Figure 20. Concordances in the classroom, by Chris Tribble and Glyn Jones. Longman 1990.
    • 4. Indirect approaches: Learner corpora in the EFL classroom
4.1. Definition
Among the many types of corpora which can be compiled, analysed and used (see McEnery, Xiao and Tono, 2006, for an overview), Computer Learner Corpora (CLC) stand out as one of the most powerful pedagogic tools for the EFL or ESL classroom. As recently defined, they are '[…] electronic collections of foreign or second language learner texts collected on the basis of strict design criteria.' (Granger, Kraif, Ponton, Antoniadis and Zampa, 2007: 254) In other words, a learner corpus is compiled when the oral or written texts produced by your students of English are collected following strict design criteria, put in electronic format, and then stored on your hard drive, memory stick, etc., so that you can conduct analyses with programmes like WordSmith Tools, already mentioned:
Figure 21. From oral or written texts to a computer learner corpus.
Thanks to the availability of computers and freely available software to carry out analyses, Learner Corpora Research (LCR) has been a fruitful field since the second half of the 1990s. Since then, the growing number of publications, either in edited volumes (cf. Granger, 1998; Granger, Hung and Petch-Tyson, 2002; Gilquin, Papp and Díez-Bedmar, in press, etc.) or in international journals (cf. Corpora, Applied Linguistics, English Corpus Studies, Journal of English for Academic Purposes, ReCALL, etc.), shows the potential of this type of research and constitutes a first step towards awareness of the possibilities that CLC can offer for Second Language Acquisition and for the TEFL or TESOL classroom.
    • 4.2. Types of CLC
Due to the importance of CLC-based results, the number of CLC has mushroomed since the second half of the 1990s. The research questions pursued by various researchers or research teams have fostered different types of CLC, which are frequently classified according to four related variables, namely the mode of the language in the learner corpus, its size, the type of intervention (i.e. when the CLC-based results will be applied in the design of materials, the sequencing of the curriculum, etc.), and the type of annotation in the corpus.
Mode: Written | Spoken | Multimedia
Size: Big (commercial or some research teams) | Small (research)
Type of intervention [12]: Delayed Human Intervention | Early Human Intervention
Type of annotation [13]: Raw | POS-tagged | Semantically-tagged | Error-tagged
Table 4. Main variables considered for the classification of learner corpora.
4.3. Methodologies used with CLC
Compiling students' production is not a new practice for teachers of English as a second or foreign language, as it has always been used to create remedial exercises, test their command of the foreign language, etc. However, the methodology used to conduct the analysis of the students' production has changed over time, as researchers and teachers have focused their attention on different aspects (the students' L1, the target language, etc.) and technology has made it possible to compile CLC, i.e. learners' real data in electronic format. Table 5 shows the three main methodologies used before the arrival of CLC. The first one, Contrastive Analysis, in its strong form, did not consider the students' production, but the
[12] This distinction was made by Sinclair (2001, vii).
[13] For the types of annotation, refer to McEnery and Wilson
    • similarities and differences between the students' L1 and their target language (i.e. Spanish and English), in order to predict the difficulties that students would have. The weaknesses found in this methodology led researchers to shift their attention to Error Analysis, whose theoretical principles and methodological issues were set out in a series of articles in the 1960s and 1970s (reprinted in Corder, 1981). Especially noteworthy was the paper 'The significance of learner's errors' (included in Corder, 1981), which argued that errors were crucial to researchers, teachers and students, since they could all learn from them and apply that knowledge to their research, teaching practice or learning process. Thus, the steps for conducting an EA were followed by many teachers and researchers and the results published, on some occasions, as dictionaries and lists of common errors. However, Error Analysis only considered errors and dismissed the learners' correct use of the foreign or second language. This led Selinker to his Interlanguage Analysis (IA) (Selinker, 1972), which examined the students' entire production, i.e. errors and non-errors alike. In this way, it was possible to obtain a better description of the students' use of the foreign language when performing a task at a specific point in time in their language learning process: their interlanguage.
Pre-CLC methodologies:
Methodology | Focus of interest | Publications
Contrastive Analysis (CA) | Comparison of the students' L1 and their TL | Lado (1957)
Error Analysis (EA) | Students' real errors | Corder (1981)
Interlanguage Analysis (IA) | The students' whole production, errors and non-errors | Selinker (1972)
Table 5. Methodologies used to describe the students' language before CLC.
Although not in a systematic way, teachers of English as a foreign or second language frequently analyse their students' production following one of these methodologies or a combination of them. For instance, an Error Analysis is conducted when a teacher corrects a batch of essays and uses a code system, i.e. an error taxonomy [14], to make the students aware of the type of error made. Thus, 'sp' may stand for a spelling error, 'wo' for word order, 'prep' for a problem with a preposition, etc. After marking all the essays, and skimming his or her annotation, the teacher realises that the most frequent error in the set of essays has to do with a certain aspect of the foreign language (be it prepositions, articles, verb tenses, etc.). If the correct instances of those aspects are considered together with the incorrect ones, an Interlanguage Analysis is conducted. However, if the students' L1 is compared to their TL
[14] For an overview of various error taxonomies, refer to Dulay, Burt and Krashen (1982: 146-197) or James (1998: 102-117).
    • either before or after analysing their production in an attempt to explain the causes of the students' errors, a CA in its strong or weak version, respectively, is completed. The manual analysis of the students' errors, following a CA, EA or IA methodology, proves a time- and effort-consuming task which a teacher can only do with a limited number of essays, as it is necessary to go to the essays, look for the errors, highlight, classify and count them, make sure all the errors are being considered, look for the correct use of the aspect of the language being analysed, compare the use of the aspect under analysis in the L1 and the FL, etc. Fortunately, those processes have been sped up thanks to the improvement in technology and, consequently, the advent of CLC, their electronic format being among their main advantages (Nesselhauf, 2004: 139-40), because it makes their compilation and analysis easier. To avoid falling prey to the temptation to collect huge disorganized amounts of data, as is the case with corpora in general (see section 2.2. above), strict design criteria are to be observed when compiling a learner corpus. Special attention needs to be given to the principles of authenticity and representativeness, and every attempt should be made to avoid the effects of variability, so as not to compare aspects from a non-homogeneous learner corpus. Thus, if the teacher aims at representing students' in-class argumentative writing at intermediate level, pieces of writing which belong to other genres, which are written by students at other proficiency levels, or at home (and presumably with access to reference materials), should not be included in that corpus, since the results would be biased.
Just consider, from your own experience, the difference in the type and number of errors between an argumentative essay written by a student in class (and without the use of dictionaries, online resources, etc.) and one written at home or, likewise, the type of errors that you expect from descriptive writing as compared to narrative writing. Drawing on the methodologies of the pre-CLC era, the analysis of students' use of language, as represented in a learner corpus, is nowadays carried out in a systematic and scientific way following Computer-aided Error Analysis (CEA), Contrastive Interlanguage Analysis (CIA) or the Integrated Contrastive Method (ICM):
CLC methodologies:
Methodology | Focus of interest | Publications
Computer-aided Error Analysis (CEA) | Students' real errors, as attested in a CLC | Dagneaux, Denness and Granger (1998)
Contrastive Interlanguage Analysis (CIA) | Comparison of NS vs. NNS production and of NNS vs. NNS production | Granger (1996)
Integrated Contrastive Method (ICM) | CA and CIA | Granger (1996); Gilquin (2000/2001)
Table 6. Methodologies used in the description of the learners' production of the foreign language.
The first one, CEA, is a 'new type of EA' (Dagneaux, Denness and Granger, 1998: 165). In other words, it is a computerized version of EA, which allows quicker error annotation and easy retrieval of the erroneous instances of students' use of the foreign language. There are
    • two ways to conduct such an analysis, depending on whether the learner corpus is error-tagged or not, i.e. whether a code system to highlight the errors has been used or not. If it is not, an intuitive search for an error-prone aspect is undertaken. This is the case when the teacher feels that the central articles the and a(n) pose problems to his or her students. By means of a learner corpus and retrieval tools, s/he can read the uses of those articles in the concordances retrieved and decide which ones are incorrect, thus conducting an EA. However, a raw learner corpus, i.e. one without error annotation, will not allow the researcher to retrieve instances of the (mis-)use of the zero article, since there is no word form to search for automatically. To do so, the learner corpus needs to be error-tagged. There are two types of error-tagged learner corpora:
- Fully error-tagged
- Partially error-tagged
In the former, a comprehensive error taxonomy has been used to highlight all the possible errors in a learner corpus. Although few learner corpora are fully error-tagged due to practical reasons of time and money, the results which such EAs yield provide a bird's-eye perspective of the students' problems when using the foreign language at a specific moment in their language acquisition process. As an example, Figure 22 shows the percentage of errors in forty-three aspects of the foreign language (as represented in the error tags on the horizontal axis) that the written production by first-year university students contains at the beginning of the academic year (Díez-Bedmar, 2005):
Figure 22. EA of first-year University students when beginning the academic year (Díez-Bedmar, 2005).
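The difference between searching a raw and an error-tagged learner corpus can be illustrated with a short sketch. Everything in it is an assumption made for illustration: the learner sentences are invented, and the tagging convention (the tag (GA) placed directly before the erroneous article, with the digit 0 standing for a wrongly used zero article) is hypothetical and may differ from the scheme used in any real learner corpus.

```python
import re
from collections import Counter

# Invented learner sentences with a HYPOTHETICAL tagging convention:
# "(GA)" precedes the erroneous article; "0" marks a wrongly used zero article.
tagged = ("I think that (GA) the life is hard . She is (GA) 0 teacher . "
          "He bought (GA) a new furniture . We live in a big city .")

# Raw-style search: every overt article, correct or not. The zero article
# cannot be found this way because there is no word form to match.
raw_hits = re.findall(r"\b(?:a|an|the)\b", tagged)

# Tag-based search: only the items the annotator marked as article errors,
# including the zero article.
ERROR = re.compile(r"\(GA\)\s+(\S+)")
errors = Counter(m.group(1).lower() for m in ERROR.finditer(tagged))

print("Overt articles found by a raw search:", len(raw_hits))
print("Tagged article errors by form:", dict(errors))
```

With a raw search the teacher still has to read every hit and judge it; with the tags, erroneous uses (zero article included) can be retrieved and counted directly.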
    • A partially error-tagged learner corpus only highlights a specific type of error, which is of interest to the teacher or the researcher. Returning to the case of the central articles, a partially error-tagged learner corpus will make it possible to easily retrieve, quantify and analyse the errors made with the articles the and a(n) (as was the case with a raw learner corpus), but also those errors involving the zero article (Ø). Notice in the following concordance lines the cases of incorrect use of the central article a(n), followed by erroneous uses of the zero article, and then erroneous uses of the, as error-tagged (GA).
Figure 23. Article errors as retrieved from a partially error-tagged learner corpus using WordSmith Tools.
The second methodology used with CLC, the Contrastive Interlanguage Analysis, allows the researcher to compare the students' production with:
1. the production by native speakers of English
2. the production by other groups of learners of English with a different L1
On the one hand, if your students' production is compared to that of native-speaker students of English (at the same level and under the same external variables), it is possible to see how (dis-)similar both productions are when an aspect of the foreign language is studied. As a result, instances of misuse but also of under- or over-use are revealed, and conclusions can be drawn such as the overuse of the prepositions between, inside and according to by Spanish university students when compared to native-speaker students of English (Martínez Osés and Neff, 2001: 144). On the other hand, you may be interested in comparing how various
    • groups of students of English (at the same proficiency level and under the same external variables) struggle with the same aspect of the foreign language, as Kaszubski (2001) did when comparing the use of the lemma be by Spanish, Polish and Belgian-French students. Finally, the Integrated Contrastive Model includes a CIA and a corpus-based CA. Therefore, three different corpora are used, namely the learner corpus, the control corpus and a corpus which contains the production by native speakers in the L1. As with CA in the pre-CLC era, there are two ways of conducting an ICM. In the first, the corpus-based CA is conducted in order to see the main differences between the two native languages considered and, then, the problems posed by such differences are attested in the learner corpus. Alternatively, the problems in a learner corpus, as revealed by a CIA, may lead to a corpus-based analysis of the two native languages in an attempt to find the causes of such errors.
4.4. The application of CLC in the TEFL classroom
The potential of CLC in the direct and indirect approaches will be explored in this section. The first part will deal with the indirect approach, that is, using the results from the analysis of CLC (following the methodologies described in 4.3) to improve teaching materials, the curricula, etc., whereas the second will focus on the direct approach, which provides hands-on experience in working with CLC.
4.3.1. The indirect approach
Although CLC-based descriptions of the students' interlanguage are still limited and only provide '[…] patchy knowledge of the different stages of interlanguage development.' (Gilquin et al., 2007: 322), the results obtained are progressively being introduced in teaching materials.
Among those which have benefited most from the results in CLC are the dictionaries of common errors, such as The Longman Dictionary of Common Errors (Turton and Heaton, 1987) and the Cambridge series Common Mistakes at… (Tayfoor, 2004; Driscoll, 2005; etc.), in which frequent errors in learner corpora are highlighted and explained.
    • Figure 24. CLC-informed materials focused on common errors.
Likewise, dictionaries have also been CLC-informed. The first one was the Longman Essential Activator (LEA), which made use of the information in the Longman Learner’s Corpus (LLC), and was followed by some others such as the Cambridge International Dictionary of English, based on the error-tagged Cambridge Learners’ Corpus (Nicholls, 2003), or the second edition of the Macmillan English Dictionary for Advanced Learners, based on a CIA analysis of the International Corpus of Learner English (ICLE) and a corpus of native speakers' academic writing.
Figure 25. CLC-informed monolingual dictionaries of English.
The CLC-based information in these dictionaries is typically provided in 'help boxes', which are quite familiar to any learner of English as a foreign or second language. However, new ways of offering information from CLC are being devised, as is the case with the graphs in the Macmillan English Dictionary for Advanced Learners, which show the results of the CIAs conducted on problems of frequency, register confusion, etc. Similarly, alternative ways to express the students' typical errors are also suggested (as exemplified from the control corpus) and extended writing sections on twelve rhetorical or organizational functions which are particularly prominent in academic writing are included (cf. Gilquin, Granger and Paquot, 2007, pp. IW1-IW29).
    • Figure 26. CLC-based results as provided in the Macmillan English Dictionary for Advanced Learners (MED2).
Recent grammars also include information from learner corpora, as is the case with Carter and McCarthy's (2006) Cambridge Grammar of English, or the on-line Chemnitz Internet Grammar of English.
Figure 27. CLC-informed grammars of English.
Finally, CLC may inform CALL programmes, such as WordPilot (Milton, 1998), or be integrated into CALL programs, so that teachers and students, if deemed convenient, have direct access to the real data, as in the EXample eXtractor Engine for LAnguage Teaching (eXXelant) (Granger, Kraif, Ponton, Antoniadis and Zampa, 2007). Although syllabus design, textbooks and writing courses are now beginning to consider native data in their recent editions (cf. the Touchstone Student’s Book series), there is no doubt that the information provided by CLC can complement and improve such materials to meet the students' real needs.
    • 4.3.2. Designing remedial exercises from a learner corpus
Analysing a learner corpus and designing CLC-based remedial exercises to meet your students' real needs is not a difficult task. To help you analyse the data in a learner corpus, this section will explore two ways to approach a small raw learner corpus. The first one deals with the students' use of vocabulary, and the second one with the lexico-grammatical patterns of the verbs 'say' and 'tell'. The learner corpus used is composed of the handwritten production of 16 first-year university students (amounting to 17,765 words) when writing descriptive texts in class, without any access to reference materials and with a time limit of 60 minutes. The piece of software used for this purpose is WordSmith Tools version 4.0.
4.3.2.1. Exploring vocabulary usage: wordlists and concord
This piece of software allows the teacher or researcher to create a wordlist, to run concordances and to explore keywords, as can be seen in the following figure. However, we will focus on the use of word lists and concordances for an exploratory analysis of the adjectives used by a group of learners.
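For readers without access to WordSmith Tools, a frequency-ordered word list can be approximated in a few lines of Python. The sketch below is not the WordSmith Tools procedure: the function `word_list` is a hypothetical helper and the sample text is invented. It lists every word form; narrowing the list to adjectives still has to be done by hand (or with a part-of-speech tagger, which is outside this sketch).

```python
import re
from collections import Counter


def word_list(text):
    """Return (word, frequency) pairs, most frequent first."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(tokens).most_common()


# Invented learner-style text, used only to show the output format.
sample = "It is a good city. The city is important and the people are good."

for word, freq in word_list(sample):
    print(f"{freq:>3}  {word}")
```

Reversing the list (for example with `word_list(sample)[::-1]`) gives the order from least to most frequent.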
    • Figure 28. WordSmith Tools 4.0.
As this self-explanatory term indicates, a word list is a list of the words in your learner corpus. This term was reviewed in Table 2 above. Such a list may be quantitatively ordered from the word which presents the highest number of occurrences to the ones which only appear once, or the other way round. As can be seen in Figure 29 below, a word list of the adjectives that students used in the learner corpus was obtained after removing from the list the words which did not belong to this open word-class. As a result, it was possible to check that the adjectives which were most used by those students were 'good', 'important' and 'different'. This finding may not surprise an experienced teacher, but the co-text in which these adjectives are used may reveal interesting and unexpected deficiencies in the learners' vocabulary. In order to explore such co-texts, the next step is to run concordances of any of these words. For this example, 'important' was selected. As can be seen in Figure 30 below, by running a concordance we obtain sentences with the searched word in the middle and in blue. This is known as 'Key Word In Context' (KWIC), or node, and the lines obtained (i.e. concordance lines) are not to be read in the traditional way (that is, everything from left to right as already seen above); instead, we only focus on the first word to the left or to the right of the KWIC. Thus, we are able to see the type of pre-modification the students use with the adjective under consideration (first word to the left of the KWIC), and which elements are qualified as 'important'. As already reported (cf.
Granger and Tribble, 1998 or Osborne, 2004, among others), students rely on this adjective, to the detriment of the use of others like „crucial‟, „outstanding‟, „main‟, „valuable‟, etc., in the appropriate contexts. Therefore, a very easy Page 39 of 48
    • Pascual Pérez-Paredes & Belén Díez Bedmar (2009). Language corpora and the language classroom. Materiales de formación del profesorado de lengua extranjera (Inglés) Murcia: Consejería de Educación y Cultura. exercise to create with the students‟ real words in their compositions is to remove the KWIC and leave a blank, so that they have to think of a better alternative to fit in the linguistic contexts they have created. Figures 29 and 30. WordSmih Tools: Running a concordance and hiding the KWIC. Page 40 of 48
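For readers who would like to see what a word list and a KWIC concordance involve, the two steps can be approximated in a few lines of Python. This is only a rough sketch and not a description of how WordSmith Tools works internally: the function names (`tokenize`, `word_list`, `concordance`) and the sample sentences are invented for illustration, and the tokenisation is deliberately naive.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def word_list(tokens):
    """Return (word, frequency) pairs, most frequent first."""
    return Counter(tokens).most_common()


def concordance(tokens, node, width=4):
    """Return one (left context, node, right context) tuple per hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Invented sentences standing in for a real learner corpus.
sample = (
    "Music is very important for young people. "
    "It is important to have good friends. "
    "The most important thing is a good family."
)
tokens = tokenize(sample)

# The five most frequent words in the sample.
print(word_list(tokens)[:5])

# Concordance lines with the node aligned in the middle.
for left, node, right in concordance(tokens, "important"):
    print(f"{left:>30}  [{node}]  {right}")
```

Note that this sketch lets the context run across sentence boundaries and does not filter by word class; selecting only the adjectives, as described above, would still have to be done by hand or with a part-of-speech tagger.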
Figure 32 presents a screenshot of such a worksheet, which you can put into a Word document and use in class. The strongest aspect of this exercise is that it is based on your students' own errors and therefore caters for their very specific needs. Furthermore, students are more likely to feel motivated to do this exercise, since they may recognise their sentences and may be willing to learn how to improve them.

Figure 31. Concord utility.
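The gap-fill idea can also be sketched in Python for teachers who keep their students' sentences in a plain text file. The code below is an illustrative assumption, not part of WordSmith Tools: `make_gap_fill` and `format_worksheet` are hypothetical helper names, and the learner sentences are invented.

```python
import re


def make_gap_fill(sentences, node, blank="_____"):
    """Keep the sentences that contain node and replace it with a blank."""
    pattern = re.compile(rf"\b{re.escape(node)}\b", flags=re.IGNORECASE)
    items = []
    for sentence in sentences:
        if pattern.search(sentence):
            items.append(pattern.sub(blank, sentence))
    return items


def format_worksheet(items, instructions):
    """Number the items and put the instructions on top."""
    lines = [instructions, ""]
    for number, item in enumerate(items, start=1):
        lines.append(f"{number}. {item}")
    return "\n".join(lines)


# Invented learner sentences used only to demonstrate the functions.
learner_sentences = [
    "Music is very important for young people.",
    "I like going to the cinema.",
    "The most important thing is a good family.",
]
items = make_gap_fill(learner_sentences, "important")
worksheet = format_worksheet(
    items, "Fill in each gap with a suitable adjective other than 'important'."
)
print(worksheet)
```

The resulting text can be pasted into a word processor and laid out as a worksheet like the one described above.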
Figure 32. Worksheet in a .doc document.

4.3.2.2. Exploring lexico-grammatical patterns: 'say' and 'tell'

The verbs 'say' and 'tell' are reported to pose difficulties to students at various levels due to their different lexico-grammatical patterns. However, it is worth exploring whether your students do make those mistakes and, if so, which are the most problematic uses. In order to do so, the first step is to run a concordance of the verb 'say' and sort the concordance lines by the first word to the right of the KWIC, as shown in Figures 33 to 35.
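Sorting concordance lines by the first word to the right of the node simply groups together the lines that share the same complementation. A minimal Python sketch of the idea follows; it assumes a hand-written list of word forms for 'say' (`SAY_FORMS`), invented learner sentences and hypothetical function names, and it is not meant to reproduce the sorting options of the Concord utility.

```python
import re

# Hand-listed forms of the verb; an assumption made for this sketch.
SAY_FORMS = {"say", "says", "said", "saying"}


def kwic(text, forms, width=5):
    """Return (left tokens, node, right tokens) for every hit of any form."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = []
    for i, tok in enumerate(tokens):
        if tok in forms:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            hits.append((left, tok, right))
    return hits


def sort_by_first_right(hits):
    """Order the hits alphabetically by the first word after the node."""
    return sorted(hits, key=lambda hit: hit[2][0] if hit[2] else "")


# Invented learner sentences, including typical 'say' for 'tell' uses.
sample = (
    "She said me that she was tired. He said that he liked football. "
    "My mother always says to me the truth. They said us the story."
)

for left, node, right in sort_by_first_right(kwic(sample, SAY_FORMS)):
    print(f"{' '.join(left):>35}  [{node}]  {' '.join(right)}")
```

Once sorted, lines such as 'said me' and 'said us' appear next to lines with other complements, which makes it easier to spot the uses where 'tell' would have been preferred.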
Figures 33 to 35. Running a concordance and sorting the lines by the first element to the right of the KWIC.

By doing so, it is now possible to see how the students complement the verb 'say' in the different contexts and co-texts that they have created themselves. In checking those uses, it is also possible to notice uses of the verb 'say' where 'tell' would have been preferred, or where another wording would have been more native-like.

In order to show students real native examples of the use of those problematic verbs, i.e. 'say' and 'tell', we can use the freely available version of the British National Corpus (BNC) or the Collins Wordbanks Online English Corpus as control corpora, and show students some examples in KWIC format to foster their analysis of the lexico-grammatical patterns used (with the help of the teacher if necessary). To do so, we only have to query those corpora (Figures 36 and 37), select the examples which show the various possibilities to complement the verbs and, finally, create a Word document for them to work with.

Once real input has been provided to students and they have reflected on the various lexico-grammatical patterns, an exercise based on their own written production, that is, on the learner corpus compiled, can be created. As was the case with the example of 'important' above, we can easily remove the KWIC (the verbs 'say' or 'tell' in this case) from the concordance lines and create a remedial exercise.
Figures 36 and 37. Concordances of the verbs 'say' and 'tell' in two native corpora.

As can be seen, creating materials which meet our students' real needs is not such a difficult or time-consuming task. EFL teachers' experience is highly valuable, since their intuitions regarding their students' problems are worth checking and exploring in the learner corpus that they have compiled. Once the remedial exercises have been created, the worksheets can be stored in paper format or distributed on a virtual platform, so that students with the same problems, in our school or in another, may benefit from our work and improve their use of the foreign language.
References

Barlow, M. (1996). Corpora for Theory and Practice. International Journal of Corpus Linguistics 1(1): 1-37.
Bernardini, S. (2004). In the classroom: Corpora in the classroom: An overview and some reflections on future developments. In J. Sinclair (ed.), How to Use Corpora in Language Teaching, 15-36. Amsterdam: John Benjamins.
Carter, R. and McCarthy, M. (2006). Cambridge Grammar of English. Cambridge: Cambridge University Press.
Corder, S. P. (1981). Error analysis and interlanguage. Oxford: Oxford University Press.
Dagneaux, E., Dennes, S. and Granger, S. (1998). Computer-aided error analysis. System 26: 163-174.
Díez-Bedmar, M. B. (2005). Struggling with English at university level: error-patterns and problematic areas of first-year students' interlanguage. In P. Danielsson and M. Wagenmakers (eds.), The corpus linguistics conference series. Retrieved 16 September 2007, from <http://www.corpus.bham.ac.uk/PCLC/>
Driscoll, L. (2005). Common Mistakes at PET… and How to Avoid Them. Cambridge: Cambridge University Press.
Dulay, H., Burt, M. and Krashen, S. (1982). Language Two. Oxford: Oxford University Press.
Gilquin, G. (2000/2001). The integrated contrastive model. Spicing up your data. Languages in Contrast 3(1): 95-123.
Gilquin, G., Papp, Sz. and Díez-Bedmar, M. B. (eds.) (in press). Linking up Contrastive and Learner Corpus Research. Amsterdam and Atlanta: Rodopi.
Gilquin, G., Granger, S. and Paquot, M. (2007). Learner corpora: The missing link in EAP pedagogy. Journal of English for Academic Purposes 6: 319-335.
Granger, S. (1996). From CA to CIA and back: an integrated approach to computerized bilingual and learner corpora. In K. Aijmer, B. Altenberg and M. Johansson (eds.), Languages in Contrast. Text-Based Cross-Linguistic Studies, 37-51. Lund: Lund University Press.
Granger, S. (ed.) (1998). Learner English on Computer. London and New York: Addison Wesley Longman.
Granger, S. and Tribble, C. (1998). Learner corpus data in the foreign language classroom: form-focused instruction and data-driven learning. In S. Granger (ed.), Learner English on Computer, 199-209. London and New York: Addison Wesley Longman.
Granger, S., Hung, J. and Petch-Tyson, S. (eds.) (2002). Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching. Amsterdam and Philadelphia: John Benjamins.
Granger, S., Kraif, O., Ponton, C., Antoniadis, G. and Zampa, V. (2007). Integrating learner corpora and natural language processing: A crucial step towards reconciling technological sophistication and pedagogical effectiveness. ReCALL 19(3): 252-268.
James, C. (1998). Errors in Language Learning and Use. Exploring Error Analysis. London and New York: Longman.
Kaszubski, P. (2001). Tracing idiomaticity in learner language: the case of BE. In P. Rayson, A. Wilson, T. McEnery, A. Hardie and S. Khoja (eds.), Proceedings of the Corpus Linguistics 2001 Conference (29 March-2 April), 312-322. Lancaster: University Centre for Computer Corpus Research on Language.
Lado, R. (1957). Linguistics Across Cultures. Ann Arbor, Michigan: Michigan University Press.
Lewis, M. (1993). The Lexical Approach. Language Teaching Publications.
McEnery, T., Xiao, R. and Tono, Y. (2006). Corpus-based language studies. An advanced resource book. London: Routledge.
Milton, J. (1998). Exploiting L1 and Interlanguage Corpora in the Design of an Electronic Language Learning and Production Environment. In S. Granger (ed.), Learner English on Computer, 186-198. London and New York: Addison Wesley Longman.
Martínez Osés, F. and Neff Van Aertselaer, J. (2001). Corpus analysis of prepositional patterns in native and non-native university writing. In C. Muñoz, M. L. Celaya, M. Fernández-Villanueva, T. Navés, O. Strunk and E. Tragant (eds.), Trabajos en Lingüística Aplicada, 139-147. Barcelona: Univerbook.
Mauranen, A. (2004). Spoken corpus for an ordinary learner. In J. Sinclair (ed.), How to Use Corpora in Language Teaching, 89-105. Amsterdam: John Benjamins.
Nattinger, J. R. and Decarrico, J. S. (1992). Lexical phrases and language teaching. Oxford: Oxford University Press.
Nesselhauf, N. (2004). How learner corpus analysis can contribute to language teaching: A study of support verb constructions. In G. Aston, S. Bernardini and D. Stewart (eds.), Corpora and Language Learners, 109-124. Amsterdam and Philadelphia: John Benjamins.
O'Keeffe, A., McCarthy, M. and Carter, R. (2007). From corpus to classroom. Cambridge: Cambridge University Press.
Osborne, J. (2004). Top-down and Bottom-up Approaches to Corpora in Language Teaching. In U. Connor and T. A. Upton (eds.), Applied Corpus Linguistics. A Multidimensional Perspective, 251-265. Amsterdam and New York: Rodopi.
Römer, U. (2008). Corpora and language teaching.
Selinker, L. (1972). Interlanguage. International Review of Applied Linguistics 10: 209-231.
Sinclair, J. (2001). Preface. In M. Ghadessy, A. Henry and R. L. Roseberry (eds.), Small Corpus Studies and ELT. Theory and Practice, vii-xv. Amsterdam and Philadelphia: John Benjamins.
Sinclair, J. (2004). New evidence, new priorities, new attitudes. In J. Sinclair (ed.), How to Use Corpora in Language Teaching, 271-299. Amsterdam: John Benjamins.
Tayfoor, S. (2004). Common Mistakes at First Certificate… and How to Avoid Them. Cambridge: Cambridge University Press.
Tribble, C. and Jones, G. (1990). Concordances in the classroom. London: Longman.
Turton, N. D. and Heaton, J. B. (1987). Longman Dictionary of Common Errors. Harlow: Longman.