LREC 2010 presentation

LREC 2010 presentation (Malta): The Dictionary of Italian Collocations: Design and Integration in an online Learning Environment


  1. The Dictionary of Italian Collocations: Design and Integration in an Online Learning Environment
     Stefania Spina
     University for Foreigners Perugia, Italy
  2. The Dictionary of Italian Collocations
     LREC 2010 - Stefania Spina - The Dictionary of Italian Collocations
     Part of the APRIL project (“Personalised web environment for language learning”)
     NLP resources as a support for the lexical competence of students of Italian within a Virtual Learning Environment (VLE).
  3. Presentation outline
     background and motivation
     reference corpus
     methodology
     dictionary compilation
     integration within the VLE
  4. Background
     different syntactic and semantic profiles, but prototypical features:
       semantic non-compositionality
       non-substitutability of components by semantically similar words
       non-insertion of external items
     continuum rather than definite categories
  5. Continuum
     semantic non-compositionality
       tagliare la corda “run away”
       aprire la porta “open the door”
     non-substitutability
       camera oscura “dark room”
       {fare|porre|rivolgere|formulare} una domanda “ask a question”
       * stanza oscura
     insertion of external items
       fare una lunga calda riposante doccia “take a long, hot, restful shower”
       sistema *molto operativo “operating system”
  6. Motivation: collocations in SLA
     improving learners' fluency
     non-native speakers and L2 vocabulary: first single words, then more extended chunks
     tendency to overuse the creative combination of isolated words
       Sinclair’s open choice principle
     Examples from Italian learner corpora
       preoccupata per il corso che mi mette nelle difficoltà (Russia)
         mettere in difficoltà “cause problems”
       e poi alla fine ho fatto questa decisione (Vietnam)
         prendere una decisione “make a decision”
  7. DICI
     collocations require specific pedagogical attention
     Dictionary of Italian Collocations (DICI)
       it is corpus-based;
       it is a learner-oriented tool: a list of the most common Italian collocations, classified on a frequency basis;
       it is also based on statistical methodologies (dispersion in the different textual genres represented in the corpus).
  8. Reference corpus
     Perugia corpus: POS-tagged, lemmatized
  9. POS filtering
     Analysis of an existing list of collocations:
       150 different POS sequences
       10 most productive POS sequences
  10. Experimental methodology: 4 steps
     extraction of candidate collocations from the corpus;
     filtering of the candidate collocations: frequency and dispersion;
     compilation of the dictionary;
     integration of the dictionary with the online learning environment
     6 POS sequences
     12-million-word sample, 4 sections
  11. Collocations extraction
     via IMS Corpus Workbench
     removing all the candidates with frequency = 1
     41643 collocations
     Two more filters:
       Dispersion
       Manual (non-collocations)
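The extraction step can be pictured with a minimal Python sketch. It is not the IMS Corpus Workbench procedure used in the project: it only illustrates, on a toy list of (lemma, POS) pairs, how candidates matching a set of POS sequences might be counted and how candidates with frequency = 1 are dropped. The tag names and the two patterns are hypothetical, since the slides do not list the 6 POS sequences.

```python
from collections import Counter

# Hypothetical POS sequences; the slides mention 6 sequences but do not list them.
POS_PATTERNS = [("VERB", "DET", "NOUN"), ("NOUN", "ADJ")]


def extract_candidates(tokens, patterns=POS_PATTERNS):
    """Count lemma n-grams whose POS tags match one of the patterns.

    tokens is a list of (lemma, pos) pairs from a POS-tagged, lemmatized text.
    """
    counts = Counter()
    for pattern in patterns:
        n = len(pattern)
        for i in range(len(tokens) - n + 1):
            window = tokens[i:i + n]
            if tuple(pos for _, pos in window) == pattern:
                counts[tuple(lemma for lemma, _ in window)] += 1
    return counts


def drop_hapaxes(counts):
    """Remove candidates that occur only once (frequency = 1)."""
    return {cand: freq for cand, freq in counts.items() if freq > 1}


# Toy input, invented for illustration.
tokens = [
    ("prendere", "VERB"), ("un", "DET"), ("decisione", "NOUN"),
    ("e", "CONJ"),
    ("prendere", "VERB"), ("un", "DET"), ("decisione", "NOUN"),
    ("in", "PREP"), ("camera", "NOUN"), ("oscuro", "ADJ"),
]
candidates = drop_hapaxes(extract_candidates(tokens))
print(candidates)
```

Matching on lemmas rather than surface forms lets inflected variants of the same combination count as one candidate.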
  12. Dispersion
     Examples:
       aggrottare la fronte “to frown” (fiction)
       vincere le elezioni “to win the elections” (press)
       dare una definizione “to give a definition” (academic prose)
     Juilland’s D value (Juilland and Chang-Rodriguez, 1964)
     D value combined with frequency = usage
     Usage value ≥ 2: 2047 candidate collocations
     Manual selection. Final result:
       list of 1553 word combinations = dictionary entries
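As a worked illustration of the dispersion filter, the sketch below computes Juilland's D in its usual form, D = 1 - V / sqrt(n - 1), where V is the coefficient of variation of a candidate's frequencies over n equally sized corpus sections, and then a usage value as D times total frequency. The slides do not say how frequency was scaled before the "usage ≥ 2" threshold was applied, so raw frequency and equal section sizes are assumptions here, and the counts are invented.

```python
import math

USAGE_THRESHOLD = 2  # threshold named on the slide


def juilland_d(section_freqs):
    """Juilland's D for frequencies observed in n equally sized sections."""
    n = len(section_freqs)
    mean = sum(section_freqs) / n
    if n < 2 or mean == 0:
        return 0.0
    # Population variance (divide by n), so D stays between 0 and 1.
    variance = sum((f - mean) ** 2 for f in section_freqs) / n
    v = math.sqrt(variance) / mean  # coefficient of variation
    return 1 - v / math.sqrt(n - 1)


def usage(section_freqs):
    """Usage value: dispersion times total frequency (assumed raw frequency)."""
    return juilland_d(section_freqs) * sum(section_freqs)


# Invented frequencies over 4 sections, for illustration only.
section_counts = {
    "evenly spread candidate": [5, 5, 5, 5],
    "one-section candidate": [8, 0, 0, 0],
    "rare candidate": [3, 1, 0, 0],
}
kept = [c for c, freqs in section_counts.items() if usage(freqs) >= USAGE_THRESHOLD]
print(kept)
```

A candidate found in a single section gets D close to 0 however frequent it is, which is how genre-bound combinations fall below the threshold.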
  13. Collocations list
  14. Compilation of the Dictionary
     Lexical database enriched with two kinds of data:
       visible to the learner (client output)
         definition, examples, part-of-speech, syntactic context of occurrence of collocations
       to be processed by other applications (server)
         internal syntactic configuration for automatic recognition
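One way to picture the two kinds of data is a single record whose learner-facing fields are separated from the server-side field at output time. The sketch below is an assumption about shape only: the field names, the example sentence, and the `client_view` helper are hypothetical and do not describe the actual DICI database schema.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class CollocationEntry:
    # Fields visible to the learner (client output).
    collocation: str
    definition: str
    examples: list
    pos_sequence: str
    syntactic_context: str
    # Server-side field: internal configuration used for automatic recognition.
    recognition_pattern: list = field(default_factory=list)

    def client_view(self):
        """Return only the fields meant for the learner."""
        data = asdict(self)
        data.pop("recognition_pattern")
        return data


# Illustrative entry; the example sentence and labels are invented.
entry = CollocationEntry(
    collocation="prendere una decisione",
    definition="make a decision",
    examples=["Alla fine ho preso una decisione."],
    pos_sequence="VERB DET NOUN",
    syntactic_context="verb + direct object",
    recognition_pattern=["prendere", "un", "decisione"],
)
print(entry.client_view())
```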
  15. DB integration in the VLE
     Virtual Learning Environment:
       web application specifically devoted to language learning
     LELE (Linguistically-Enhanced Learning Environment)
       provide language learners with additional NLP resources, in order to improve their linguistic competence
       receptive and productive learning activities concerning the recognition and the active use of collocations
  16. LELE Features
     to automatically recognize and highlight multi-word units in written Italian texts;
     to show additional linguistic information about the selected collocations;
     to generate collocation tests for collocational competence assessment of second or foreign language learners.
     …
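The recognition and highlighting feature can be sketched as lemma matching that tolerates inserted items between the components of a collocation, as in "fare una lunga calda riposante doccia". This is a minimal illustration, not LELE's implementation: the function names, the `max_gap` parameter, and the bracket markup are hypothetical, and the input is assumed to be already tokenized and lemmatized.

```python
def find_collocation(tokens, pattern, max_gap=3):
    """Find occurrences of a collocation in a lemmatized sentence.

    tokens is a list of (form, lemma) pairs; pattern is a list of lemmas.
    Components must appear in order, with at most max_gap extra tokens
    between two consecutive components. Returns a list of index tuples.
    """
    matches = []
    for start, (_, lemma) in enumerate(tokens):
        if lemma != pattern[0]:
            continue
        positions = [start]
        for wanted in pattern[1:]:
            last = positions[-1]
            window = range(last + 1, min(last + 2 + max_gap, len(tokens)))
            nxt = next((i for i in window if tokens[i][1] == wanted), None)
            if nxt is None:
                break
            positions.append(nxt)
        else:
            # Reached only when every component was found.
            matches.append(tuple(positions))
    return matches


def highlight(tokens, matches):
    """Wrap matched word forms in square brackets."""
    marked = {i for match in matches for i in match}
    return " ".join(
        f"[{form}]" if i in marked else form
        for i, (form, _) in enumerate(tokens)
    )


# Invented sentence, lemmatized by hand for illustration.
tokens = [("Ho", "avere"), ("fatto", "fare"), ("una", "un"),
          ("lunga", "lungo"), ("doccia", "doccia")]
pattern = ["fare", "un", "doccia"]
print(highlight(tokens, find_collocation(tokens, pattern)))
```

Setting `max_gap=0` would restrict matching to contiguous sequences, which suits combinations that do not admit inserted items.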
  17. LELE scheme
     [Diagram not preserved in the transcript; surviving label: server]
  18.
  19.
  20. Conclusions
     Next step:
       apply the same methodology to the whole corpus, for all the 10 selected POS sequences
     Further research
       refine statistical measures
       assign collocations to different levels of competence
       other tools (productive tasks)
  21. Stefania Spina
     stefania.spina@unistrapg.it
     http://april.unistrapg.it
