Eurocall2013: A viewpoint on the place of CALL within the Digital Humanities: considering journals, research data and the sharing of research results.


The term "Digital Humanities" (DH) received much attention at the MLA (Modern Language Association) convention in 2009. The term is now in widespread use within the Humanities. CALL may be directly concerned: our field belongs to the Humanities and, from the outset, we have had a strong interest in computers and computing. Although various meanings and interpretations can be attributed to the term DH, this presentation will address issues related to ways of promoting CALL research in order to meet what may soon become research standards within the Humanities.
Starting with a historical overview of the release of research results, i.e. in academic journals, we will examine whether CALL encourages multilingual publications. We will then turn to links between journals and research data. We will consider the position of several disciplines (including linguistics) regarding ways to enhance replicability by linking research results and researcher data, increasing the visibility and credibility of research.
Another move towards enhancing the quality of CALL research may be to collect, organize and share data stemming from learning situations in such a way that analyses can be clearly and overtly processed and discussed in our community. With this in mind, we will introduce the notion of Learning and Teaching Corpora (LETEC), and illustrate this methodology with data from online multimodal interactions. Beyond CALL research issues, such data may have different applications, both within the area of teacher-training (examples of Pedagogical Corpora will be given) and the general field of linguistics. Finally we will examine how sustained access to research results (articles and data) can be provided in open-access formats and criteria the CALL field will need to meet to become compliant with the so-called "OpenData".

  • Download the slides and videos; an extended version is available online.
  • Dear colleagues and friends, I do not speak Portuguese and have never learned it, but before I begin my presentation on journals and research data, I would like to give a brief introduction about Portugal and my city, Clermont-Ferrand. This city is the second largest Portuguese city here in France, after Paris. This short video shows well the relationships that exist between our two languages and cultures. Thank you for your attention.
  • Nowadays, when you read CALL literature or attend CALL conferences, our domain is generally presented as a subfield of Language Teaching, Second Language Acquisition, or even Applied Linguistics. Without rejecting this standpoint, I would like to turn our attention to other fields, more particularly to the so-called “Digital Humanities”.
  • The term "Digital Humanities" (DH) created a buzz at the MLA (Modern Language Association) convention in 2009. The term is now in widespread use within the Humanities. CALL may be directly concerned: our field belongs to the Humanities and, from the outset, we have had a strong interest in computers and computing. Although various meanings and interpretations can be attributed to this term, this presentation will address issues related to ways of promoting CALL research in order to meet what may soon become research standards within the Humanities. We will talk about journals, data linked to publications, research data organized as corpora, and access to these publications and research data.
  • First of all, let us start with CALL journals (and conferences) and briefly examine whether we have built a multilingual academic community.
  • On this first issue, as well as on the other ones, I would be happy to collect your current standpoints anonymously (I have no way of knowing who answered what). If you have not already done so before the conference, you can access the survey. The link to the survey is available on the front page of the site mulce.org. Of course I will publish the results within one or 2 weeks on the Eurocall and CALICO member lists.
  • Since it is the 20th anniversary of Eurocall, let me first remind you that our journal ReCALL has a long tradition. It first appeared in 89 and was at that time supported by the CTI Centre for Modern Languages. At the end of the eighties, a British national initiative was launched to support the development of technology (CAL) in education. The University of Hull, where Graham Chesters and June Thompson were working, was in charge of CALL development. Besides publishing the ReCALL journal, Hull also provided many services to language teachers. Teachers could visit the CTI and get software demos, or people from the CTI could come to their own institutions. The CTI also regularly published a software guide, and abroad we were always waiting for the latest release.
  • In 95, ReCALL became a joint publication of the CTI and the Eurocall association. In 2003, Eurocall and Hull, who still had editorial responsibility, handed the publishing task to Cambridge University Press. Of course I would not like to omit the Eurocall Review, edited by Ana Gimeno, which is still published directly by Eurocall. Since the beginning of ReCALL one person has remained in charge of it, namely June Thompson. Unfortunately, for the first time, June could not attend our conference, but from here we can send her our sincere thanks for all the wonderful work she has achieved!
  • Now let us consider the multilingual issue. I made a quick enquiry into the patchy information available on Eurocall websites concerning communications not given in English during our conferences (from 93 up to 2012). I could only note one keynote given in 97 and one in 98. In 99 in Besançon we had one keynote and nearly 20 communications not in English, and 2 in 2009 in Spain. When looking at full papers published after Eurocall conferences, I found one in 97 in ReCALL and 2 in 98, but 8 and 11 respectively after the 99 and 2010 conferences in France. As you can see, the total number of communications and articles not in English is fairly small.
  • Let us have a look outside Eurocall. In July this year we had the WorldCALL conference in Glasgow. There were more than 250 events, whether papers, posters or courseware. 134 of them were given by English language teachers and, naturally, related only to this language. Only 45 events were produced by colleagues teaching other languages. 85 authors did not mention any language in their abstracts. When looking more closely at these 85 events, a majority of them were organized by colleagues from East Asia. A very large majority of them are teachers of English. So I can reasonably count half of these 85 events as being connected to ELT. This gives us around 190 ELT events. In other words, 70% of the conference related to ELT, versus only 17% clearly not referring to the English language. Without denying the good scientific quality of WorldCALL conferences, one may wonder whether we are not building another kind of TESOL conference.
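The shares quoted above can be rechecked with a few lines of Python. This is only a sketch: it assumes the three categories are disjoint and add up to the whole programme (the talk itself only says "more than 250 events"), and the helper name `share` is ours.

```python
# Counts quoted in the talk for the WorldCALL events.
elt_explicit = 134      # given by English language teachers
other_languages = 45    # given by teachers of other languages
unspecified = 85        # no language mentioned in the abstract

# Assumption: the three categories are disjoint and cover the whole programme.
total = elt_explicit + other_languages + unspecified


def share(count, whole):
    """Return count as a percentage of whole, rounded to one decimal."""
    return round(100 * count / whole, 1)


# Literal reading of the talk: half of the unspecified events are ELT-related.
elt_half = elt_explicit + unspecified / 2

print(total)                          # 264
print(share(elt_half, total))         # 66.9
print(share(190, total))              # 72.0
print(share(other_languages, total))  # 17.0
```

Under these assumptions, counting exactly half of the unspecified events gives roughly two thirds of the programme, while the rounded figure of about 190 events corresponds to a little over 70%. The 17% for other languages matches the figure quoted.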
  • Please do not misunderstand my viewpoint. I have nothing against English. When we created the first French-speaking CALL journal, namely Alsic, we did it in full collaboration with the English-speaking journals. We jointly organized Eurocall 99 in Besançon. Neither can we complain about ReCALL, which has always welcomed submissions of papers in other languages. But we all know that language issues are closely connected to cultural and political ones. Hence we may be on the way to losing our autonomy. Moreover, the Humanities are generally considered by other scientific fields as a place where researchers can publish good quality papers in vernacular languages. What would it mean, for example, to be obliged to translate the words “didactique” in French or “didaktik” in German as “pedagogy”, knowing that each of these 3 terms has a distinctive meaning representative of a different perspective in the Educational Research field? Finally, can we really be trusted by our learners when we try to explain to them “Oh yes, please learn Arabic, or French, or Spanish, Portuguese, etc. because these are important scientific languages”?
  • What can we do in order to develop multilingual spaces in our community?
  • I mentioned the larger number of papers published in languages other than English after the two Eurocall conferences organized in France. How did this happen? In Besançon we offered authors the possibility of submitting papers either to ReCALL or to Alsic. In Bordeaux, the proceedings were published with another publisher. These opportunities were jointly organized with EuroCALL. ReCALL lost nothing. How can we develop publications in other languages?
  • In fact, this already exists in other disciplines. Here is the example of the very large scientific publisher named SciELO. It publishes scientific journals in Spanish and Portuguese with authors from the Americas and Europe. Here is an illustration of a journal in biochemistry published in Portuguese.
  • You could think “well, this is only an example for natural science and technology”. In fact, in the Humanities we have a great opportunity with the public publisher OpenEdition. It is managed by academics and already publishes 400 academic journals. This is where, for example, the Alsic editorial board moved 5 years ago in order to publish its journal. It has now set up offices in Spain and Portugal, publishes in these languages (here a journal in Sociology) and has started discussions with German editorial teams. So, why not soon start publishing an international CALL journal in Spanish & Portuguese (only one for America & Europe) to ensure a strong scientific basis from the beginning?
  • After having considered publications, let us turn our attention to research data.
  • What types of data can we find in research? - Data linked to publications - Data coming out of research projects (one or several CALL experiments) - A recent new type, the so-called Big Data (see the extended version). Let us start with data linked to publications.
  • What do other disciplines in the Humanities say about this? Gary King, a political scientist, explains why we should make data, which he calls “replication data sets”, available with publications.
  • What does Europe say? A recent text published by the Commission with respect to the new 2020 European framework proposes, in order to improve scientific communication, to develop a pilot on open access to data, primarily those data underlying scientific publications.
  • In CALL, since we are not waiting for the last train to arrive, we started discussing a joint project among 5 journals, European and North American ones, which are already in the habit of cooperating and for which some of the people here in the room are reviewers or members of the editorial boards.
  • Here are the contents of the proposal made to these journals. I am pleased to announce that yesterday the ReCALL editorial board, supported by CUP, agreed to join the project!
  • In this 3rd section we will now consider data stemming from research projects.
  • Generally, when one talks in CALL about data assembled into corpora, one immediately thinks of learner corpora. As you know, a learner corpus is made out of learners’ productions. It is studied as a way to enhance language learning or the understanding of the learning process. Since there are plenty of opportunities to discuss this issue at this conference, I will not say more about learner corpora, except when mentioning the question of access to data.
  • Here I would like to introduce another type of corpus, the Learning & teaching corpus (LETEC). Examples will be taken from the field of online learning situations in a multimodal context.
  • Over the past 12 years a community of researchers has been involved in online language learning projects, whether for designing pedagogical scenarios and research protocols, for online tutoring, for collecting and analysing data, or for publishing. Projects started in 2001, with the global simulation Simuligne. From 2007 some of us decided that it was time to adopt a coherent and systematic way to organize data in order to improve our research methodology, for reasons I will soon mention. This gave birth to the Mulce project. It is impossible for me to cite all my colleagues here. Let me just mention Christophe Reffay, who was there from the beginning, and Marie-Noelle Lamy, with whom we co-developed the Simuligne project and the Mulce project. Marie-Laure Betbeder, Maud Ciekanski and Ciara Wigham came later on and did a great deal of work. Chris Jones, present here, co-constructed the Tridem project with Mirjam Hauck, Tim Lewis, and Bonnie Youngs. As you can see here, every cloud corresponds to an online learning situation, a research project. Above the clouds you have the IDs of the countries involved (Colombia, Germany, UK, USA, etc.); most of the situations correspond to what is now called a telecollaborative project. Under the clouds you have the languages that were at stake.
  • Mulce researchers were concerned with questions related to validity and reliability. When we set up a learning situation, study it and publish, to what extent is what we say anecdotal, and to what extent can it be generalized? Did we actually study what we claimed to study, or did we neglect hidden factors which may open the way to other explanations and conclusions? When we want to give some assurance about these issues and discuss them with other researchers, we are in real trouble, for many reasons. Among others: - As mentioned, no data are associated with publications - Almost no research data are accessible or visible - When you do have slices of data, they are often not contextualized (what were the precise technological and pedagogical situations?) - They are tangled up in specific software using proprietary formats.
  • Hence in Mulce we decided to design corpora with these criteria in mind:
  • Here is the definition of what we mean by a LETEC corpus. A learning and teaching corpus is made of several parts. The research protocol and the pedagogical scenario describe the context of the learning situation. The technical term “instantiation” (coming from the IMS consortium) refers to the interactions and participants’ productions collected during the learning situations. We also assemble forms and licences related to ethics and rights. Subsequent analyses may be attached to the first version of the corpus or come later on.
  • How can a LETEC be built? Here is a schema detailing the process. We presented it at WorldCALL. I will quickly skim through it just for the sake of understanding the rest of my presentation.
  • To contextualize things, let me choose one of our latest projects, Archi21. We designed a CLIL scenario jointly with teachers of architecture and language teachers. Learners had an intensive course in order to develop an architectural project, simultaneously F2F and online in Second Life. Language teachers were only at a distance. We had 4 groups of learners, either in French or English as a foreign language.
  • During this first stage, the design stage, research questions are fixed, here mainly: - relationships between verbal and non-verbal modes (please understand “verbal” as the antonym of “non-verbal”, with no exclusive relation to speech) - and the interplay between textchat & voicechat modalities.
  • The design process of a learning scenario encompasses the description of the 3 main elements: online environments, learning activities and participants’ roles. Generally we present our learning design in a pretty formal way (see the graph on the left) in order to let people clearly understand it and to relate every step to pedagogical documents such as the guidelines and resources given to the learners. But there is no obligation to use such a description format. A corpus compiler can just describe the pedagogical design as a simple text.
  • The design of the research protocol includes the definition of the researchers’ roles alongside the teacher’s, and the protocols for data collection, questionnaires, ethics agreements, etc.
  • Here is an overview of the data collected during the Archi21 course: pre- and post-questionnaires, 20 hours of video screen captures, post-interviews, etc.
  • In order to organize data, we use the IMS-CP format, a standard coming from the IMS consortium already mentioned. This international consortium, which gathers academic institutions and companies, is concerned with establishing interoperability for learning systems and learning content; I have no time to develop this here. The bottom part of the corpus assembles the primary data (after they have been anonymised). In the second part, a set of IDs makes the link between the top description and the corresponding files of the primary data. The top part, called the “manifest” (another technical term), is made of one XML file and gives information about each component of the corpus: metadata, technological environments used in the course, biographical information and IDs of participants (teachers, learners, groups). Then, in a structure called workspaces, come the interactions and the links between what participants did and the pedagogical scenario.
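The three-part layout described above (manifest on top, IDs in the middle, primary data files at the bottom) can be sketched with Python's standard XML library. This is a simplified, namespace-free illustration that only borrows the general manifest/organizations/resources shape of a content package; every identifier, title and file path is hypothetical and is not taken from an actual Mulce corpus.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free sketch of a content-package manifest.
# All identifiers and file paths below are hypothetical.
manifest = ET.Element("manifest", identifier="corpus-demo")

# Top part: descriptive information about the corpus.
metadata = ET.SubElement(manifest, "metadata")
ET.SubElement(metadata, "title").text = "Demo learning and teaching corpus"

# Structure of the corpus: each item points to a resource through an ID.
organizations = ET.SubElement(manifest, "organizations")
organization = ET.SubElement(organizations, "organization", identifier="workspaces")
ET.SubElement(organization, "item", identifier="session-1", identifierref="res-chat-1")

# Second part: IDs linking the description to the primary data files.
resources = ET.SubElement(manifest, "resources")
resource = ET.SubElement(
    resources, "resource", identifier="res-chat-1", href="data/session1_chat.xml"
)
ET.SubElement(resource, "file", href="data/session1_chat.xml")


def resolve(root, item_id):
    """Follow an item's identifierref to the path of its primary data file."""
    item = root.find(f".//item[@identifier='{item_id}']")
    ref = item.get("identifierref")
    res = root.find(f".//resource[@identifier='{ref}']")
    return res.get("href")


print(resolve(manifest, "session-1"))  # data/session1_chat.xml
```

The point of the indirection is that the description never hard-codes file locations: it refers to IDs, and only the resources section knows where the anonymised files live.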
  • All 3 of these components are assembled in an archive which is deposited in the corpus databank, the “Mulce repository”. Here is part of the interface, with the locations of the different experiments already mentioned. It is one way to access the archives / corpora.
  • Another way of selecting and downloading corpora is offered through this second interface, which details corpus criteria such as learning situations, technological environments used, languages, types of interactions, pedagogical approaches, etc.
  • We started transcribing multimodal data in 2005 thanks to Lyceum, the Open University environment used in the Copéas experiment. Here is a simplified view of the kinds of interactions we transcribe in Second Life (it is detailed in one of our recent papers in ReCALL). Whilst teaching and researching in various environments, we elaborated our transcription methodology. The latest version of our transcription manual is online and, like all our publications, open access.
  • Once you have completed your transcription and analysis, you compile them into what we call a distinguished corpus (the previous one being called the global corpus) and you deposit this second corpus. As you can see here, every corpus receives its own reference on the repository. The new corpus only contains transformed data. It gives links to data already described in the global corpus, and adds a description of the tools used during the analysis and transcription steps.
  • Some colleagues often wonder why we spend so much time on organizing data. Well, research is not a one-shot process. Once you have your learning situation and your data, you may want to study it from different perspectives. New ideas may come, and new analysis tools help you build these new ideas; it is an interactive process. For example here, on the left, you have transcriptions coming from a Copeas corpus, in our XML format, linked to a video capture in a given format. We have been interested in using the Tatiana software. It has been a straightforward process to transform the Mulce XML format into the Tatiana XML one and to convert the video formats.
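To give a feel for why structured transcriptions make this kind of conversion straightforward, here is a minimal sketch of an XML-to-XML mapping. Both schemas are invented for the illustration: the `<act>` and `<event>` elements and their attribute names are assumptions, and they do not reproduce the actual Mulce or Tatiana formats.

```python
import xml.etree.ElementTree as ET

# Hypothetical source transcription; element and attribute names are invented.
SOURCE = """
<transcription>
  <act id="a1" speaker="L1" modality="audio" start="12.0" end="15.5">Bonjour</act>
  <act id="a2" speaker="T1" modality="textchat" start="14.0" end="14.0">ok</act>
</transcription>
"""


def convert(source_xml):
    """Map each hypothetical <act> onto a hypothetical <event> element."""
    src = ET.fromstring(source_xml)
    target = ET.Element("events")
    for act in src.iter("act"):
        event = ET.SubElement(
            target,
            "event",
            id=act.get("id"),
            actor=act.get("speaker"),
            channel=act.get("modality"),
            begin=act.get("start"),
            finish=act.get("end"),
        )
        event.text = act.text
    return target


converted = convert(SOURCE)
print(ET.tostring(converted, encoding="unicode"))
```

Because every act already carries its participant, modality and timing as explicit attributes, the conversion reduces to renaming fields; nothing has to be recovered from free text.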
  • As an example, you can see an extract of a session in Lyceum, displayed in Tatiana: colors refer to modalities (audio, textchat, word processor) and layers to participants. 3 learners are collaboratively writing a document, whereas the tutor tries several times to intervene but is completely ignored by the learners. Maud Ciekanski gave an interpretation of this surprising phenomenon through an analysis of the context. She used the Goodwin & Duranti 92 model to explain why the tutor was out of context. We published this analysis as a LETEC corpus. Funnily enough, I recently discovered this quotation in a paper published by Lamy in 2012. It is a general paper where she explains why social semiotics can be a good theoretical framework to study multimodality. As far as I understood, Marie-Noelle was not referring to a specific example. We can feel here how interesting it could be to raise different theoretical issues, or explanations, around pieces of data.
  • What can we gain by giving access to research data? The paper by De Los Arcos demonstrates, for example, that anxiety is not present in the audiographic environment and that, consequently, this type of environment needs to be studied as a specific topic.
  • Organising and publishing data may be useful to CALL research. But we can extend our perspective and see whether it may be of interest more generally in linguistics or for pedagogical motivations.
  • Let us firstly consider the linguistic perspective, and keep in mind that we all here in CALL have got very rich CMC data.
  • After the historical project on the English language which led to the creation of the BNC (British National Corpus), linguists have, during the last 10 years, started to develop reference corpora of other European languages: - there is a reference corpus of German - a second one for Flemish Dutch - another reference corpus, for French, is in progress. Their common features are: - broad coverage - billions of tokens, 500 M structured & annotated (POS) - access provided for linguistic research - they seek to develop extensions to Internet communication.
  • When colleagues in Linguistics build corpora, they systematically structure their data. Hence, when considering CMC data, one of the first things to do is to study their specific micro and macro structures (here Wikipedia forums). Very often they use the TEI (Text Encoding Initiative), a standard originally designed for text, then extended to speech. They now want to propose another extension of the TEI, for CMC.
  • My German linguist colleague, Michael Beisswenger, recently characterized CMC structures after studying discussion forums and textchats. He says … But in CALL, when we study multimodal CMC, the “en bloc” nature does not apply anymore. Here is a transcription from a LETEC corpus which shows the interplay between modalities. Since it is not very readable, let us see it differently.
  • In one of our papers, which will appear in the CALL journal and whose corresponding data are already online in Mulce, Ciara Wigham discusses the interplay between audio and textchat. Here is an extract from Archi21. In the left column you have the transcription of the audio of one learner, who presents his feelings about the on-going process of his architectural project. He is a French native speaker and speaks in English as his L2. In the 3 other columns on the right, you find textchat turns coming from the tutor and two other learners belonging to the same architectural project group. Let me show you a short video. **** In this example of conversation doubling, the acts in the text chat respond to the voice chat (blue arrows), but equally acts in the voice chat respond to the text chat (orange arrows), and text chat acts respond to interaction in both voice chat and text chat modalities and prompt interaction in both modalities.
  • Thanks to the rich kind of data we find in CALL, we have been able to create the CoMeRe project with colleagues belonging to a dozen different linguistic research labs. CoMeRe is the French name given to CMC. We aim to build a CMC corpus in French and, at the same time, to participate at a European level, with Michael Beisswenger, in the extension of the TEI to CMC.
  • Let us now consider pedagogical applications of LETEC corpora.
  • The idea of pedagogical corpora stemmed from discussions with colleagues who are simultaneously researchers in physical activity and teacher trainers in sport. Here is how they design new procedures to train pre-service sport teachers. - Step 1: the teacher trainer explains how a sport lesson should be designed - Step 2: students, organized in pairs, have a live experience: one teaches a lesson in a school; the second one records the lesson - Step 3: students come back to the university, share their experience and use their videos. But the reflection process is not deep enough - hence Step 4: the teacher trainer uses selected data from previous research situations for cross-confrontation.
  • To understand the flavour of pedagogical corpora, let us consider a specific situation. In 2006, our colleague Tim Lewis, from the Open University, who had had his first experience as an online tutor in a multimodal environment during the Copeas experiment, published a paper in the CALICO journal. We assembled in a LETEC corpus all the data he gave us related to this paper: personal diary and video interviews of learners. We added other data coming from learners’ reports and discussion forums. Tim was quite pessimistic about the nature of the collaborative process among the learners, and also between himself and the learners. But when you look closely at the data, different perspectives appear. Directly using a LETEC corpus in a training situation is not that easy. Hence we extracted data and, with Ciara Wigham, devised a pedagogical corpus.
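The interplay described above can be made concrete by recording, for each act, which earlier act it responds to, and then counting the links by pair of modalities. The sketch below uses an invented toy sequence; it is not taken from the Archi21 data, and the `Act` fields and the `response_links` helper are hypothetical names chosen for the illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Act:
    act_id: str
    modality: str               # "voice" or "text"
    responds_to: Optional[str]  # id of the act being answered, if any


# Invented toy sequence, not taken from the Archi21 data.
acts = [
    Act("v1", "voice", None),
    Act("t1", "text", "v1"),
    Act("v2", "voice", "t1"),
    Act("t2", "text", "t1"),
    Act("t3", "text", "v2"),
]


def response_links(sequence):
    """Count (responding modality, answered modality) pairs."""
    by_id = {a.act_id: a for a in sequence}
    links = Counter()
    for act in sequence:
        if act.responds_to is not None:
            links[(act.modality, by_id[act.responds_to].modality)] += 1
    return links


links = response_links(acts)
print(links[("text", "voice")])  # 2
print(links[("voice", "text")])  # 1
print(links[("text", "text")])   # 1
```

Non-zero counts in both cross-modal cells are what distinguishes this kind of doubled conversation from two parallel, independent channels.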
  • Here is an extract of a video which has been created as a lead-in document for the new pedagogical corpus.
  • The pedagogical corpus can be downloaded from the Mulce repository. It uses data from the Copeas experiment and offers new activities around the perception of the collaborative process as a language tutor. It aims to: - identify language tutors' and students' differing views of successful collaboration - summarize the characteristics of successful collaboration and produce a list of implications for practice - appraise the advantages of keeping a teaching journal - compare and contrast reflections from a teaching journal with naturally occurring data (interaction tracks) and researcher-provoked data (student feedback) to analyse whether teachers should base reflections about teaching practice solely on journal entries and personal reactions.
  • 4 th section , last subject
  • Let us start with a warning coming from James Boyle, one of creator of Creative Common project. Many people would like to enclose the Commons of the Mind as he says. I seriously … And he gave a second warning about the necessity of paying attention to licences even when looking at public domain issues.
  • Let us briefly look at open access to publications, subject I already detailed in LLT in 2007..
  • Here is a recent statement from the European commission which give guidelines to researchers for the “Horizon 2020” framework (recently opened). The report says : there should be open access to publications resulting from publicly founded research as soon as possible, and forms should be changed in order to let researchers retain their copyright while granting licences to publishers. These are 2 different issues. Open access needs not wait for the new wording of copyright forms.
  • There is a huge (open access) literature on this topic, which details the main ways of providing open access to publications, the so-called Green and Gold roads. As regards CALL journals and the Gold road, Language Learning & Technology (LLT) and Alsic were the first to provide open access to their articles. Recently CALICO reduced its moving wall to one year (articles become open access one year after their publication). Let me just recall here that it is first and foremost the author's responsibility to deposit her/his article in an open archive once it has been accepted by reviewers (the Green road), whichever kind of archive it is, national or institutional.
  • Let us spend a little more time on access to research data, and on the new concept of OpenData.
  • This term is starting to be widely used with different aims in mind. In the academic world, it refers to ways of sharing research data. Governments and public institutions, for their part, have started opening their data to the public. It is of course the first case that we will consider here.
  • There are 3 main criteria that research data should meet in order to be considered OpenData. Besides the data obviously being available, the interesting perspective is that the data can be accessed in order to be reused and mixed with other data. The second interesting point is that the constraints on reuse should be reduced to a minimum: the definition stipulates that 'non-commercial' restrictions that would prevent 'commercial' use, or restrictions of use for certain purposes, are not allowed.
  • Another interesting point is that authors should always put a licence on their data when they plan to make them available.
  • What about access to data from the language learning field? Here is the example of the best-known learner corpus, namely the International Corpus of Learner English. On the website, no access is given other than "pay to look at". Nothing is mentioned about reuse and mixing.
  • Here is another learner corpus, ELFA (English as a Lingua Franca in Academic Settings). Its creators have made some progress. Under the earlier licence, users were charged over 100 euros for a mere six-month licence (and just for the text corpus, not the audio), with instructions to "destroy" the files at the end of this period or purchase a new licence! Now the text data are open access, but for personal use only (i.e. not here, for example). There are important restrictions in the licence and still no access is given to the audio.
  • As regards the LETEC corpora and the Mulce repository, all our corpora are of course open access, without any registration. In each corpus we have included the informed consent form signed by participants. We have started to change our licence in order to become fully compliant with the OpenData criteria.
  • Here is the list of the Creative Commons licences, created by James Boyle and his colleagues. 2 of them are not OpenData compliant, because they forbid direct use for commercial purposes. In Mulce, we have started to switch from BY-NC-SA to BY only, i.e. the only obligation is to refer to the authors and the original work.
  • The CC-BY licence is an improvement. But it still restricts possibilities for mixing and reusing data. There are 2 licences which are fully compliant: PDDL and CC0. These licences are a significant step forward. They waive intellectual property rights; the data then become part of the public domain.
  • When hearing this you may be afraid, or at least sceptical: - What will happen if the attribution licence is no longer there? - Might I not be cited?
  • But we should not be afraid. We have the habit of confusing 2 very different issues: intellectual property rights (IPR) and citation references. The first one only refers to legal issues: "you did not cite me, I am going to take you to court!" The second one is our academic procedure. We need to refer to previous work, and when authors do not do so properly, their work is rejected by peer reviewers. Hence we need only worry about correctly referencing our work and making this reference clearly available. Here are, for example, two references to LETEC corpora: the first one to the author, sole creator of the corpus; the second one where creators and editors are distinguished (like a chapter of a book). Moreover, these references are tagged as such and included in metadata which are harvested on the Internet thanks to the OLAC harvesting protocol.

Transcript

  • 1. A viewpoint on the place of CALL within the Digital Humanities: considering CALL journals, research data and the sharing of research results. Thierry Chanier, Université Blaise Pascal. Eurocall 2013, University of Évora, Portugal, 11-14 September 2013. Download slides and all videos for this talk: link on http://mulce.org, main editorial article. Version: 15th September 2013
  • 2. Portugal & Clermont-Ferrand. Recent but strong relationships. Portugal and Clermont-Ferrand: Cultures and languages between the past and the future (3 mn video)
  • 3. Connecting CALL with other disciplines / research fields. Current situation: are we connected? SLA, Linguistics, Education / CAL, CALL, Digital Humanities
  • 4. Overview
  • 5. JOURNALS AND MULTILINGUAL ISSUES WITHIN THE CALL COMMUNITY
  • 6. Survey on CALL journals and research data • Please participate in the online survey • The survey is anonymous. I will publish the results on the EUROCALL mailing list at the end of September. • Find the survey: – Link in the main editorial article on: http://mulce.org – Questions 1 to 5
  • 7. History of ReCALL. Find the survey: http://Mulce.org
  • 8. History of ReCALL. June Thompson - there from the very beginning. 1989, 1995? 2003? Ana Gimeno (ed)
  • 9. History of ReCALL. 1995? 2003?
  • 10. Does Eurocall support multi-languages? Communications in languages other than English during Eurocall conferences (hard to be exhaustive, websites have disappeared): France, Spain, France, France. Publications not in English after Eurocall conferences
  • 11. WorldCALL and multi-languages. Sum of papers, posters, courseware. - For tandems involving ELT, count 1 for ELT and 1 for others - More than half of Unknown from Asia (English as L2). Target languages in WorldCALL13, v1. Target languages in WorldCALL13, v2: half of « unknown » counted as ELT.
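
The counting rule on this slide can be sketched in a few lines of Python. This is only an illustration, assuming each contribution (paper, poster or courseware) is recorded as a list of target languages; the function name, the data layout and the sample values are hypothetical and are not the WorldCALL figures.

```python
from collections import Counter


def count_target_languages(contributions, split_unknown=False):
    """Tally target languages over a list of contributions.

    Each contribution is a list of target languages. A tandem lists two
    languages, so it adds 1 to ELT and 1 to the other language.
    An empty list means the target language is unknown.
    """
    counts = Counter()
    for languages in contributions:
        if not languages:
            counts["Unknown"] += 1
        for language in languages:
            counts[language] += 1
    if split_unknown:
        # Second version of the chart: half of "Unknown" is counted as ELT.
        moved = counts["Unknown"] / 2
        counts["ELT"] += moved
        counts["Unknown"] -= moved
    return dict(counts)


# Hypothetical sample: one ELT paper, one ELT/French tandem,
# two contributions with no stated language, one Spanish poster.
sample = [["ELT"], ["ELT", "French"], [], [], ["Spanish"]]
v1 = count_target_languages(sample)
v2 = count_target_languages(sample, split_unknown=True)
```

Keeping the reallocation of "Unknown" as an explicit option makes the difference between the two versions of the chart visible, so readers can see how much of the ELT share rests on that assumption.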
  • 12. Unpleasant situations for (Euro)CALL • Nothing against English (cf. my position on French-FLE) (ReCALL accepts submissions in other languages) • Language is culture and politics • The humanities are generally a multilingual domain: cf. pedagogy ≠ didactique ≠ didaktik • Can we be trusted by learners when we assert that other languages are used for academic / scientific purposes?
  • 13. What can we do? • Raise awareness through conferences: – Specify language taught when submitting – Conference organizers build statistics – Organize national events during conferences (cf. Portugal this year, Spain, Belgium, France,…) and encourage communications in the vernacular language • Publish in several languages (cf. telecollaboration projects) • Develop international CALL journals in other languages
  • 14. Develop international CALL journals. After Eurocall99 (Besançon): publications in ReCALL and in Alsic. After Eurocall2010 (Bordeaux): publications in ReCALL and in another journal
  • 15. Examples from other disciplines
  • 16. European publishing structures exist
  • 17. European publishing structures exist
  • 18. European publishing structures exist. International CALL journal in Spanish & Portuguese (only one for America and Europe)?
  • 19. ORGANIZE AND PUBLISH RESEARCH DATA. Enhance research quality in CALL
  • 20. Different coverage for data. Corpora: see next section. We start here. Warning: in this presentation, we only consider data produced by CALL research, not data coming from other fields and used by CALL (cf. mixed situation in Corpus CALL)
  • 21. Current situation in CALL (and many, but not all, fields in Humanities) • Some (not all) of our papers are based on research data • These data (empty forms, filled forms, spreadsheets, transcriptions, language data and their computation, audio, video, etc.) are accessible neither to reviewers nor to readers once papers are published
  • 22. What other disciplines say. "Replication data sets include the original data and any other information needed to reproduce the numerical results in a published work. […] making publicly available a replication data set for each of their empirical articles or books. Citation credit should be apportioned both for the original article and separately for the data." Gary King (2007). "An Introduction to the Dataverse Network as an Infrastructure for Data Sharing," Sociological Methods and Research, Vol. 32, No. 2
  • 23. What Europe says. COMMISSION RECOMMENDATION of 17.7.2012 on access to and preservation of scientific information: http://ec.europa.eu/research/science-society/document_library/pdf_06/recommendation-access-and-preservation-scientific-information_en.pdf
  • 24. Data publication for CALL journals: proposal for a joint project. 25th July 2013
  • 25. Contents of the proposal • Reviewers will access data when reading the paper (this strengthens the review process) • Once the paper is accepted, data are published • The reader (researcher) can access these data in order to replicate, join them to her/his own data, etc. (cf. OpenData) • The author is the great winner! Two references to her/his work: data will have an individual reference (but linked to the paper's reference)
  • 26. Link between publication & data: example from earth sciences. Arason, P. et al. (2011): Plume-top altitude time-series during 2010 volcanic eruption of Eyjafjallajökull. Icelandic Meteorological Office, Reykjavik, doi:10.1594/PANGAEA.760690, Supplement to: Arason, Pordur; Petersen, G N; Bjornsson, H (2011): Observations of the altitude of the volcanic plume during the eruption of Eyjafjallajökull, April-May 2010. Earth System Science Data, 3, 9-17, doi:10.5194/essd-3-9-2011. Journal site, Data site
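
The two-way link between a data set and its article can be sketched as a small Python record. This is a minimal illustration using the two DOIs cited on the slide; the class names and the citation layout are assumptions made for the example, not a format prescribed by any journal or by DataCite. The only external convention relied on is that a DOI resolves through https://doi.org/.

```python
from dataclasses import dataclass

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver


@dataclass(frozen=True)
class Reference:
    authors: str
    year: int
    title: str
    doi: str

    def url(self) -> str:
        """Permanent link obtained by resolving the DOI."""
        return DOI_RESOLVER + self.doi


@dataclass(frozen=True)
class DataPublication:
    """A data set and the article it supplements, each citable on its own."""
    dataset: Reference
    article: Reference

    def cite_dataset(self) -> str:
        # Hypothetical citation layout: the data reference points to the paper.
        d, a = self.dataset, self.article
        return (f"{d.authors} ({d.year}): {d.title}. doi:{d.doi}. "
                f"Supplement to: {a.authors} ({a.year}): {a.title}. doi:{a.doi}")


pub = DataPublication(
    dataset=Reference("Arason, P. et al.", 2011,
                      "Plume-top altitude time-series during 2010 volcanic "
                      "eruption of Eyjafjallajökull",
                      "10.1594/PANGAEA.760690"),
    article=Reference("Arason, P.; Petersen, G. N.; Bjornsson, H.", 2011,
                      "Observations of the altitude of the volcanic plume during "
                      "the eruption of Eyjafjallajökull, April-May 2010",
                      "10.5194/essd-3-9-2011"),
)
citation = pub.cite_dataset()
```

Because the data set and the article are two separate references, each can be cited and counted independently while the "Supplement to" part keeps them linked.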
  • 27. WHAT WE STARTED TO DO IN FRANCE
  • 28. Datapublication (French project) • With the help of TGE-Adonis (national infrastructure for humanities) – Now part of Huma-Num • For Alsic and Sticef journals (as a starting point) • Every journal has its entries and an internal review process (cf. OJS) for data • Reviewers can look at data when reading the paper (data are not open at this stage) • When the paper is accepted, data are published. http://datapublication.tge-adonis.fr
  • 29. An example: http://datapublication.tge-adonis.fr/data/d-001-102 http://sticef.univ-lemans.fr/num/vol2012/05-guichon/sticef_2012_guichon_05.htm
  • 30. IRIS IS NOT THE PROJECT WE ARE LOOKING AT
  • 31. http://www.iris-database.org
  • 32. Why not IRIS? • Iris is an interesting OpenData project with links to journals from UK and USA universities, sponsored by UK, but … • Data are not part of the review process • Once a paper is accepted, authors do as they please, e.g. some deposit the blank form of a questionnaire, not the data collected (answers), nor the computation (spreadsheet) • Metadata are not standard (just for search on the site, like Merlot) • They are local and cannot be harvested • No reference to the data (cf. DataCite), no permalink • No crosslink between data and publication (which would not make sense because data are not exhaustive and have not been part of the evaluation process)
  • 33. CALL Datapublication project • Make a common proposal at the European Union level (Research agency) via DARIAH • Get logistical and official scientific support in order to design and open a website (Datapublication) • Where our 5 journals will have separate access for their editorial boards in order to manage distinct review processes • Manage a joint design for the workflow of the review process • Metadata format will be standard, permalink given, full reference with link and full reference of papers • When the website is open, each journal's author guidelines need to be changed (for authors who submit papers which rely on data) and links need to be implemented in order to point from the journal to the data site • Then the Datapublication website may be opened to other journals in the humanities (best to get EU support), whether they are based in or outside the EU
  • 34. DATA & PROJECT(S), LETEC CORPORA. With extracts from Wigham & Chanier (2013)
  • 35. First corpora in CALL: learner corpora • Building corpora: collecting learners' productions (essays), structuring, annotating, processing • Using corpora – To enhance learning (DDL: data-driven learning) under some circumstances – To enhance research • Thinking about: Eurocall SIG, conferences, special issues, etc.
  • 36. New type of corpora • LEarning and TEaching Corpora (LETEC) (corpus d'apprentissage) • data-sharing and repository for research on multimodal interactions
  • 37. Simuligne (2001), UK-FR, fre; Copéas (2005), UK-FR, eng; Tridem (2005-06), UK-FR-USA, eng, fre; Ecofralin (2008), CO-FR, fre, spa; VMT-teamC (2006), UK-USA-SG, math; INFRAL (2009), DE-FR, deu, fra; FAVI (2006-08), FR, fra; ARCHI21 (2011), FR, eng, fra; SLIC (2013), USA-FR, fra
  • 38. Data validity & reliability in CALL research? • Questions related to validity and reliability • Problems in Humanities, Social Sciences and CALL: ▫ Visibility, accessibility of research data ▫ Data representative / anecdotal? ▫ Publication (already mentioned) • CALL data is often: ▫ not contextualised – pedagogical & technological situations (Kern et al., 2004) ▫ tangled in specific software using proprietary formats • Replication for interaction analysis in online learning near impossible: ▫ variables that are difficult to control ▫ replication does not imply that phenomenon previously observed will reoccur (Reffay et al., 2012)
  • 39. Research data quality: Mulce project • Interoperability: ▫ Structured and coherent data sets => analyses can be completed by researchers who did not participate in the course • Sustainability: ▫ Independent from online platforms ▫ Stored in independent formalisms • Open access to research data & appropriate licences • Accessibility: ▫ Finding the research data thanks to harvesting protocols based on standard metadata – OLAC (Open Language Archives Community)
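
How standard metadata make a corpus findable can be sketched as follows. OLAC metadata are harvested over the OAI-PMH protocol, where a harvester sends a `ListRecords` request and reads back XML records. In this illustrative Python sketch the base URL, the sample response and the helper names are hypothetical; only the OAI-PMH verb, the `olac` metadata prefix and the namespace URIs come from the published standards. The sample is parsed locally, so no network access is needed.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="olac"):
    """Build the OAI-PMH request a harvester would send to a repository."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


# Hypothetical, hand-written response used instead of a live repository.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:corpus-1</identifier></header>
      <metadata>
        <olac:olac xmlns:olac="http://www.language-archives.org/OLAC/1.1/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example learning and teaching corpus</dc:title>
          <dc:rights>Hypothetical licence statement</dc:rights>
        </olac:olac>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Extract identifier, title and rights from each harvested record."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": record.findtext(".//dc:title", namespaces=NS),
            "rights": record.findtext(".//dc:rights", namespaces=NS),
        })
    return records


request_url = list_records_url("https://repository.example.org/oai")
records = parse_records(SAMPLE_RESPONSE)
```

Since every compliant repository answers the same request in the same format, an aggregator can index corpora from many archives without knowing anything about each site.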
  • 40. LETEC Components: Instantiation, Pedagogical scenario, Research protocol, Analyses. "A LETEC corpus collects in a systematic and structured way all the data from interactions which occur during a course which is partially or entirely online. These data are enriched by technical, pedagogical and scientific information as well as information about the participants and are organized to allow contextualized analyses to be performed." (Mulce-documentation, 2013) Public licence, Private licence, ethics & rights
  • 41. Building a LETEC: stages = Data analyses
  • 42.
  • 43. Illustration of methodology • European project KA2 Languages • CLIL approach (Content and Language Integrated Learning) ▫ Architecture + French / English L2 • Hybrid course "Building Fragile Spaces": 5-day studio, Feb. 2011 • 17 students, 2 architecture tutors, 1 EFL tutor, 1 FFL tutor. Working with external partners: exchanges
  • 44. Elaboration of research areas • Interplay between verbal and non verbal modes • Role of nonverbal in identity construction • Interplay between textchat & voicechat modalities. Support for L2 verbal participation and production. Wigham (2012) – PhD Thesis http://tel.archives-ouvertes.fr/tel-00762382 Stage 1: Design
  • 45. Pedagogical Design • Macro-task: collaboratively elaborate a model in a synthetic world (Second Life) as a response to an architectural problem brief • Architectural studio, hybrid CLIL approach • 4 workgroups. Stage 1: Design. Learning design: Online environments, Participants' roles, Learning & support activities
  • 46. Research protocol • Research protocol design ▫ Protocol for data collection ▫ Researchers' roles ▫ Timetable of research activities. Stage 1: Design, researcher (Wigham & Chanier, 2013, ReCALL)
  • 47.
  • 48. Data collection & coverage for Archi21 (data collected: environment, data type, quantity & coverage) • Pre-questionnaires (pre-course): Kwiksurveys, spreadsheet file, 17 student questionnaires • Session data (during course): ▫ Second Life, video screen captures, 20 group sessions & 2 presentation sessions, 19h40m ▫ VoiceForum, audio recordings, 64 forum messages • Post-questionnaires (post-course): Kwiksurveys, spreadsheet file, 16 student questionnaires • Semi-directive interviews (post-course): Skype, audio recordings, 5 student interviews, 2h30. Stage 2: Data collection
  • 49.
  • 50. Primary data (anonymised): each resource has an ID and a description. LETEC global corpus: IMS content packaging. Manifest: structured data, Structured Interaction Data Model (Mce_sid, 2011), XML. Information about each component of the corpus. Stage 3: Data organisation
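
The idea of a packaging manifest in which every resource carries an ID and a description can be sketched in Python. This is a deliberately simplified illustration and not the Mulce structure: the IMS Content Packaging namespace and the `manifest` / `resources` / `resource` / `file` element names come from the IMS specification, while the `letec` namespace, the `description` element, the identifiers and the file names are hypothetical. The output is not validated against the IMS schema.

```python
import xml.etree.ElementTree as ET

IMSCP = "http://www.imsglobal.org/xsd/imscp_v1p1"  # IMS Content Packaging v1.1
LETEC = "urn:example:letec"  # hypothetical namespace for the descriptions

ET.register_namespace("", IMSCP)
ET.register_namespace("letec", LETEC)


def q(namespace, tag):
    """Return a namespace-qualified tag in ElementTree's {uri}tag notation."""
    return f"{{{namespace}}}{tag}"


def build_manifest(corpus_id, resources):
    """Serialize a minimal manifest; resources are (id, href, description) tuples."""
    manifest = ET.Element(q(IMSCP, "manifest"), {"identifier": corpus_id})
    ET.SubElement(manifest, q(IMSCP, "organizations"))
    container = ET.SubElement(manifest, q(IMSCP, "resources"))
    for res_id, href, description in resources:
        resource = ET.SubElement(
            container, q(IMSCP, "resource"),
            {"identifier": res_id, "type": "webcontent", "href": href},
        )
        # Resource-level metadata holding the human-readable description.
        metadata = ET.SubElement(resource, q(IMSCP, "metadata"))
        ET.SubElement(metadata, q(LETEC, "description")).text = description
        ET.SubElement(resource, q(IMSCP, "file"), {"href": href})
    return ET.tostring(manifest, encoding="unicode")


# Hypothetical anonymised primary data.
manifest_xml = build_manifest("example-corpus", [
    ("res-001", "sessions/group1_session1.mp4", "Video screen capture, group 1"),
    ("res-002", "questionnaires/pre_course.csv", "Pre-course questionnaire answers"),
])
```

Listing every file in one manifest keeps the corpus independent of the platform it was collected on: a researcher who did not take part in the course can still locate and interpret each component.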
  • 51. Corpus deposit • Mulce corpus repository: http://repository.mulce.org Stage 3: Data organisation
  • 52. Corpus diffusion • Description of corpus; interface to browse structure; zip file to download. Stage 3: Data organisation
  • 53.
  • 54. Multimodal data transcription. Verbal mode: audio, textchat (proxemic transmission, radio transmission, public, private). Non verbal mode: not detailed here, see Wigham & Chanier (2013), ReCALL 25(1). Stage 4: Data transcription & diffusion. Saddour, I., Wigham, C., Chanier, T. (2011). Manuel de transcription. http://edutice.archives-ouvertes.fr/edutice-00676230
  • 55. Production & deposit of LETEC distinguished corpus • Particular analysis of a selected part of the global LETEC corpus. Chanier, T., Saddour, I. & Wigham, C.R. (2012). (dir.) Distinguished Corpus: Transcription of Verbal and Nonverbal Interactions of the Second Life Reflection archi21-slrefl-av-j2. Mulce.org : Clermont Université. [oai:mulce.org:mce-archi21-slrefl-av-j2 ; http://repository.mulce.org] • Only contains transformed data (= the transcriptions) • Refers to a selection of the original data in global corpus (= videos) • Software used for transcription cited (= ELAN). Stage 4: Data transcription & diffusion
  • 56. Simple conversions from LETEC to analysis tools. LETEC structure (format Mulce-struct), LETEC (format Tatiana), Conversions, Analysis
  • 57. Type 2: Sharing analyses with associated tools. Chanier, T. & Ciekanski, M. (2009). (editors). Corpus distinguable Copeas T5 contexte. Mulce.org : Clermont Université. [ oai:mulce.org:mce-copeas-T5_contexte-all ; http://repository.mulce.org ]
  • 58. Type 2: Sharing analyses with associated tools. Chanier, T. & Ciekanski, M. (2009). (editors). Corpus distinguable Copeas T5 contexte. Mulce.org : Clermont Université. [ oai:mulce.org:mce-copeas-T5_contexte-all ; http://repository.mulce.org ] Various interpretations of data: - (Ciekanski & Chanier, 2007) Context (Goodwin & Duranti, 1992) "imagine that the tutor led his tutorial via postings in the text-chat while students talked about other topics in the audio channel. It is unlikely that the group would accept such a position for the tutor, and we draw from multimodal social semiotics to help explain why." - (Lamy, 2012) Social semiotics (Kress & Leeuwen, 2001)
  • 59. What providing access to data means • Go in depth into discussions about models and what they explain • Carefully compare previous and new situations • Limit research cycles which may not be so interesting: – Re-inventing the wheel: new techno. environments, new affordances, but… – Back to the endless comparison with F2F, with the standpoint that when online you lose things (cf. current papers on webcams, presence, anxiety, etc.) – Could we at last reason on new possibilities to discuss and learn in L2 online? (De Los Arcos, Coleman, Hampel, 2009)
  • 60. ANOTHER LIFE FOR LETEC DATA (AFTER REUSE FOR CALL RESEARCH). Reference corpus & Pedagogical corpora
  • 61. CORPORA WHICH MAY INCLUDE CALL CMC (COMPUTER MEDIATED COMMUNICATION). Linguistic perspective: reference corpus
  • 62. Reference corpora of different languages • Corpus in German, DWDS: Digitales Wörterbuch der deutschen Sprache • Corpus in Flemish / Dutch, SoNaR: STEVIN Nederlandstalig Referentiecorpus • Corpus in French (in progress) • Common aims: – Billions of tokens, 500 M structured & annotated (POS), access for linguistic research – Extension to Internet communication. http://www.dwds.de/
  • 63. CMC macro and micro structures
  • 64. Multimodality and CMC? "The element <posting> is the basic CMC-specific element in our schema. In CMC documents it represents the largest structural unit that can be assigned to one author and one point in time. The category posting is defined as a content unit that has been sent to the server 'en bloc'." TEI and CMC (Beißwenger et al., 2012). (Chanier, Saddour & Wigham, 2012) LETEC corpus
  • 65. Modality interplay 1.5 mn video * Paper: (Wigham & Chanier, 2013) CALL journal * Data: (Chanier, Saddour & Wigham, 2012) LETEC corpus
  • 66. CoMeRe.org: CMC corpus in French (CoMeRe: Communication Médiée par les Réseaux). Salut s que <NOM_4> c dcd à ht 1 dvd pr sa cop ki e pa la 2main? SMS / texts, Tweets, Blogs, Forums, Text chat, etc.
  • 67. PEDAGOGICAL CORPORA. Example from sports science
  • 68. Training the pre-service teacher in sport • Step 1: course on building a lesson • Step 2: personal live experience in a school; record interaction (video); reflection (document) • Step 3: back at university: share experience and reflection (process not deep enough) • Step 4: teacher uses selected data from previous research for cross confrontation (Researcher in physical activity: N. Gal-Petitfaux, Université Blaise Pascal)
  • 69. PEDAGOGICAL CORPORA CREATED OUT OF LETEC CORPORA. Authors: Ciara Wigham, Thierry Chanier
  • 70. Starting from a distinguished corpus. Lewis, T. (2006) When Teaching is Learning: A Personal Account of Learning to Teach Online. CALICO, Vol 23, No. 3, May 2006, pp. 581-600 http://calico.org/html/article_110.pdf
  • 71. Starting from a distinguished corpus
  • 72. Lead-in document: 5 mn video
  • 73. Wigham, C.R. & Chanier, T. (2013) Pedagogical corpus: Reflective Teaching Journals. Mulce.org : Clermont Université. [oai : mulce.org:mce-peda-rtjournals ;
  • 74. Wigham, C.R. & Chanier, T. (2013) Pedagogical corpus: Reflective Teaching Journals. Mulce.org : Clermont Université. [oai : mulce.org:mce-peda-rtjournals ;
  • 75. OPEN ACCESS TO PUBLICATIONS & DATA. OpenData. Survey on CALL journals and research data: - Link in the main editorial article on: http://mulce.org - Questions 10 to 17
  • 76. Enclosing the Commons of the Mind • "I seriously doubt that we would create the Web today - at least if policy makers and market incumbents understood what the technology might become early enough to stop it." (p. 278) • "Almost everything on the Internet is copyrighted, even if its creators do not know that and would prefer it to be in the public domain." (p. 26) (Boyle, J. 2008, The Public Domain: Enclosing the Commons of the Mind) Boyle is one of the creators of the Creative Commons (CC) project
  • 77. FREE AND IMMEDIATE ACCESS TO PUBLICATIONS (ONCE ACCEPTED BY REVIEWERS). Open archives. Chanier, T. "Commentary: Open Access to Research and the Individual Responsibility of Researchers". Language Learning & Technology, vol. 11, 2 (2007).
  • 78. Guidelines for researchers (EU level)Guidelines for researchers (EU level)  “The Commission proposes to make open access to scientific publications a general principle of Horizon 2020, building on the already existing activities in FP7 (e.g. eligibility of open access publishing costs, embargo for 'Green' open access of six to twelve months). 7878 http://ec.europa.eu/research/science-society/document_library/pdf_06/background-paper- open-access-october-2012_en.pdf
  • 79. Institutional repository; national repository.
  • 80. OPEN ACCESS TO RESEARCH DATA (OpenData). OER (Open Educational Resources) are important, but not considered here.
  • 81. Opendata. A term which is starting to be widely used with different aims in mind, among other things: 1) Academic world: share research results; 2) Government and public institutions: open their data to the public. Here we mainly consider the 1st perspective.
  • 82. Opendata definition. “Open data is data that can be freely used, reused and redistributed by anyone – subject only, at most, to the requirement to attribute and sharealike.” (OpenDefinition.org)
  • 83. Opendata criteria. “Availability and Access: the data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the internet. The data must also be available in a convenient and modifiable form. Reuse and Redistribution: the data must be provided under terms that permit reuse and redistribution including the intermixing with other datasets. The data must be machine-readable. Universal Participation: everyone must be able to use, reuse and redistribute – there should be no discrimination against fields of endeavor or against persons or groups. For example, ‘non-commercial’ restrictions that would prevent ‘commercial’ use, or restrictions of use for certain purposes (e.g. only in education), are not allowed.” (OpenDefinition.org)
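The criteria above can be read as a simple rule: attribution and share-alike conditions are tolerated, and any other restriction blocks compliance. As a rough sketch in Python, assuming licences are described by Creative Commons style condition codes (BY, SA, NC, ND), such a check might look like the following. The function names are hypothetical and not part of any OpenDefinition.org tooling; treating ND (no derivatives) as blocking is an interpretation of the reuse and intermixing requirement, not something the quoted text names explicitly.

```python
# Hypothetical helpers for illustration only; not an official Open Definition tool.

# Conditions the quoted definition tolerates: attribution (BY) and share-alike (SA).
ALLOWED_CONDITIONS = {"BY", "SA"}


def normalize(conditions):
    """Upper-case and strip condition codes such as 'by' or ' NC '."""
    return {code.strip().upper() for code in conditions}


def is_open_data_compliant(conditions):
    """Return True if the licence imposes nothing beyond attribution/share-alike."""
    return normalize(conditions) <= ALLOWED_CONDITIONS


def blocking_conditions(conditions):
    """Return the sorted list of conditions that prevent compliance."""
    return sorted(normalize(conditions) - ALLOWED_CONDITIONS)


if __name__ == "__main__":
    examples = {
        "no conditions (public-domain style waiver)": [],
        "BY-SA": ["BY", "SA"],
        "BY-NC": ["BY", "NC"],
        "BY-NC-ND": ["BY", "NC", "ND"],
    }
    for name, codes in examples.items():
        print(name, is_open_data_compliant(codes), blocking_conditions(codes))
```

Under this reading, a licence restricted to personal or non-commercial use fails the check because of its NC condition, whatever else it permits.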
  • 84. Why should we use licences? “In most jurisdictions there are intellectual property rights in data that prevent third-parties from using, reusing and redistributing data without explicit permission. Even in places where the existence of rights is uncertain, it is important to apply a license simply for the sake of clarity. Thus, if you are planning to make your data available you should put a license on it – and if you want your data to be open this is even more important.” (OpenDefinition.org)
  • 85. Example of licences on learner corpora: ICLE. No access given on the website, except "pay to look at". Nothing about reuse, mixing, etc.
  • 86. Example of licences on learner corpora: ELFA. Open access, but for personal use (hence not for research). Important restriction (NC); where are the sound files? https://elomake.helsinki.fi/lomakkeet/43518/lomake.html
  • 87. Open access, ethics and licence. For usage: licence. For participants: informed consent form + anonymization process. Open Data: http://opendefinition.org/guide/ Note: the licences on our sites are currently inconsistent; the changes are not yet complete.
  • 88. Usual CC licences (open but not necessarily compliant with OpenData)
  • 89. 2 licences on data fully compliant with OpenData. CC0: as creator, I may have had some rights (rights on models, rights on data, etc.) on the work and I waive them (permanent, irrevocable). PDDL: I do not even mention the fact that I may have had rights over something.
  • 90. What will happen if the attribution licence is not there anymore? I may not be cited?
  • 91. No confusion between attribution (IPR) and citation-references. We give users the way to refer to our work (metadata: OLAC – bibliographicCitation) and will use this in our lists of publications & works. For example: 1) creator of the corpus – Wigham, C.R. (2013). Distinguished Corpus: Interplay between textchat and audio modalities during the Second Life Reflective Sessions. Mulce.org : Clermont Université. [oai : mulce.org:mce-archi21-modality-textchat ; http://repository.mulce.org] 2) creator and editor – Stahl, Gerry ; Weimar, Steve ; Shumar, Wes (2009). LETEC Corpus Virtual Math Team. Reffay, C. (editor). Mulce.org : Clermont Université. [oai : mulce.org:mce-vmt-letec-teamc ; http://repository.mulce.org]
  • 92. Recommendations. Actions: open our data (provided that ethics is OK – anonymisation); choose licences with the fewest restrictions; cite others and your data as bibliographic references; list them in your work. Implications: acknowledgement will come (from institutions, other colleagues); CALL research will progress (re-analysis, coverage extended with mixing); CALL data will be reused by other fields. Open our data! Open Data, if we want to be connected to Digital Humanities.
  • 93. Thank you for your attention! Thierry.chanier at univ-bpclermont.fr http://lrl.univ-bpclermont.fr/spip.php?rubrique98