The Dictionary of Italian Collocations: Design and Integration in an Online Learning Environment
Stefania Spina, University for Foreigners Perugia, Italy
The Dictionary of Italian Collocations
LREC 2010 - Stefania Spina - The Dictionary of Italian Collocations
  • Part of the APRIL project (“Personalised web environment for language learning”)
  • NLP resources as a support for the lexical competence of students of Italian within a Virtual Learning Environment (VLE)
Presentation outline
  • background and motivation
  • reference corpus
  • methodology
  • dictionary compilation
  • integration within the VLE
Background
  • different syntactic and semantic profiles, but prototypical features:
      • semantic non-compositionality
      • non-substitutability of components by semantically similar words
      • non-insertion of external items
  • a continuum rather than definite categories
Continuum
  • semantic non-compositionality: tagliare la corda “run away” vs. aprire la porta “open the door”
  • non-substitutability: camera oscura “dark room” (*stanza oscura) vs. {fare|porre|rivolgere|formulare} una domanda “ask a question”
  • insertion of external items: fare una lunga calda riposante doccia “take a long, hot, restful shower” vs. sistema *molto operativo “operating system”
Motivation: collocations in SLA
  • improving learners’ fluency
  • non-native speakers and L2 vocabulary: first single words, then more extended chunks
  • tendency to overuse the creative combination of isolated words (Sinclair’s open choice principle)
  • Examples from Italian learner corpora:
      • preoccupata per il corso che mi mette nelle difficoltà (Russia); expected collocation: mettere in difficoltà “cause problems”
      • e poi alla fine ho fatto questa decisione (Vietnam); expected collocation: prendere una decisione “make a decision”
DICI
  • collocations require specific pedagogical attention
  • Dictionary of Italian Collocations (DICI):
      • it is corpus-based;
      • it is a learner-oriented tool: a list of the most common Italian collocations, classified on a frequency basis;
      • it is also based on statistical methodologies (dispersion across the different textual genres represented in the corpus).
Reference corpus
  • Perugia corpus: POS-tagged, lemmatized
POS filtering
  • Analysis of an existing list of collocations: 150 different POS sequences
  • 10 most productive POS sequences
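The POS filtering idea can be illustrated with a minimal sketch. It assumes a POS-tagged, lemmatized token stream like the one the slides describe; the `Token` type, the tag names and the two patterns are hypothetical, since the slides do not list the ten selected POS sequences.

```python
from collections import Counter
from typing import NamedTuple, Sequence, Set, Tuple


class Token(NamedTuple):
    form: str   # surface form as it appears in the text
    lemma: str  # dictionary form
    pos: str    # part-of-speech tag (tagset is an assumption)


# Hypothetical patterns; the actual ten sequences are not given in the slides.
PATTERNS: Set[Tuple[str, ...]] = {("VERB", "DET", "NOUN"), ("NOUN", "ADJ")}


def extract_candidates(tokens: Sequence[Token],
                       patterns: Set[Tuple[str, ...]] = PATTERNS) -> Counter:
    """Count lemma n-grams whose POS sequence matches one of the patterns."""
    counts: Counter = Counter()
    for n in {len(p) for p in patterns}:
        for i in range(len(tokens) - n + 1):
            window = tokens[i:i + n]
            if tuple(t.pos for t in window) in patterns:
                counts[" ".join(t.lemma for t in window)] += 1
    return counts


sentence = [
    Token("Ha", "avere", "AUX"),
    Token("tagliato", "tagliare", "VERB"),
    Token("la", "il", "DET"),
    Token("corda", "corda", "NOUN"),
]
print(extract_candidates(sentence))
```

Counting by lemma rather than by surface form groups inflected variants of the same candidate under one key, which is why a lemmatized corpus is useful here.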
Experimental methodology: 4 steps
  1. extraction of candidate collocations from the corpus;
  2. filtering of the candidate collocations: frequency and dispersion;
  3. compilation of the dictionary;
  4. integration of the dictionary with the online learning environment.
12-million-word sample, 4 sections
Dispersion
  • Examples:
      • aggrottare la fronte “to frown” (fiction)
      • vincere le elezioni “to win the elections” (press)
      • dare una definizione “to give a definition” (academic prose)
  • Juilland’s D value (Juilland and Chang-Rodriguez, 1964)
  • D value combined with frequency = usage
  • Usage value ≥ 2: 2047 candidate collocations
  • Manual selection. Final result: list of 1553 word combinations = dictionary entries
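As a rough illustration of the dispersion filter, the sketch below computes Juilland’s D in its commonly cited form, D = 1 − V/√(n − 1), where V is the coefficient of variation of the frequencies over n equally sized corpus sections, and a usage coefficient U = D × F. The slides do not say how frequencies were normalised before applying the threshold of 2, so the function names and the raw counts used here are assumptions for illustration only.

```python
import math
from typing import Sequence


def juilland_d(freqs: Sequence[float]) -> float:
    """Juilland's D over n equally sized sections (1 = perfectly even)."""
    n = len(freqs)
    if n < 2:
        raise ValueError("at least two corpus sections are required")
    mean = sum(freqs) / n
    if mean == 0:
        return 0.0  # the item never occurs, so dispersion is undefined; use 0
    # Population standard deviation, which keeps D in the range [0, 1].
    sd = math.sqrt(sum((f - mean) ** 2 for f in freqs) / n)
    return 1.0 - (sd / mean) / math.sqrt(n - 1)


def usage(freqs: Sequence[float]) -> float:
    """Usage coefficient: dispersion multiplied by total frequency."""
    return juilland_d(freqs) * sum(freqs)


# Hypothetical counts over four sections of a corpus.
even = [5, 5, 5, 5]      # spread evenly across all sections
skewed = [20, 0, 0, 0]   # concentrated in a single section
print(juilland_d(even), usage(even))
print(juilland_d(skewed), usage(skewed))
```

Two candidates with the same total frequency can therefore receive very different usage values: the one confined to a single genre, like aggrottare la fronte in fiction, is penalised relative to one spread across all sections.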
Collocations list
Compilation of the Dictionary
  • Lexical database enriched with two kinds of data:
      • visible to the learner (client output): definition, examples, part of speech, syntactic context of occurrence of collocations
      • to be processed by other applications (server): internal syntactic configuration for automatic recognition
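One way to picture the two kinds of data is a single record whose fields are split into a learner-facing view and a server-side part. This is only a sketch: the class, the field names and the example values are hypothetical and do not reflect the actual DICI database schema.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class CollocationEntry:
    headword: str
    # Visible to the learner (client output).
    definition: str
    examples: Tuple[str, ...]
    part_of_speech: str
    syntactic_context: str
    # Processed by other applications (server side only).
    internal_configuration: Tuple[str, ...]

    def learner_view(self) -> Dict[str, object]:
        """Return only the fields meant to be shown to the learner."""
        return {
            "headword": self.headword,
            "definition": self.definition,
            "examples": list(self.examples),
            "part_of_speech": self.part_of_speech,
            "syntactic_context": self.syntactic_context,
        }


# Illustrative entry; the example sentence and tags are invented.
entry = CollocationEntry(
    headword="prendere una decisione",
    definition="make a decision",
    examples=("Devo prendere una decisione.",),
    part_of_speech="VERB DET NOUN",
    syntactic_context="verb + direct object",
    internal_configuration=("prendere", "decisione"),
)
print(entry.learner_view())
```

Keeping the recognition pattern out of the learner view mirrors the client/server split on the slide: the same entry serves both display and automatic recognition.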
DB integration in the VLE
  • Virtual Learning Environment: web application specifically devoted to language learning
  • LELE (Linguistically-Enhanced Learning Environment):
      • provide language learners with additional NLP resources, in order to improve their linguistic competence
      • receptive and productive learning activities concerning the recognition and the active use of collocations
LELE Features
  • to automatically recognize and highlight multi-word units in written Italian texts;
  • to show additional linguistic information about the selected collocations;
  • to generate collocation tests for collocational competence assessment of second or foreign language learners.
  …
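The first feature, recognizing and highlighting multi-word units, could look roughly like the sketch below. It is not LELE’s implementation: it assumes lemmatized input, represents each dictionary entry as a hypothetical (first lemma, last lemma) pair, and allows a few intervening tokens so that insertions such as fare una lunga doccia are still matched.

```python
from typing import NamedTuple, Sequence, Set, Tuple


class Token(NamedTuple):
    form: str   # surface form shown to the learner
    lemma: str  # lemma used for matching against the dictionary


def highlight(tokens: Sequence[Token],
              entries: Set[Tuple[str, str]],
              max_gap: int = 3) -> str:
    """Wrap each recognized collocation in square brackets.

    max_gap is the number of tokens allowed between the two lemmas.
    """
    out = []
    i = 0
    while i < len(tokens):
        end = None
        last = min(i + max_gap + 2, len(tokens))
        for j in range(i + 1, last):
            if (tokens[i].lemma, tokens[j].lemma) in entries:
                end = j
                break
        if end is None:
            out.append(tokens[i].form)
            i += 1
        else:
            span = " ".join(t.form for t in tokens[i:end + 1])
            out.append("[" + span + "]")
            i = end + 1  # continue after the highlighted span
    return " ".join(out)


text = [
    Token("Ho", "avere"),
    Token("preso", "prendere"),
    Token("una", "uno"),
    Token("decisione", "decisione"),
    Token("importante", "importante"),
]
print(highlight(text, {("prendere", "decisione")}))
```

Matching on lemmas lets the inflected form preso trigger the entry for prendere una decisione, while the learner still sees the original wording inside the brackets.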
LELE scheme (diagram; surviving label: server)
Conclusions
  • Next step: apply the same methodology to the whole corpus, for all the 10 selected POS sequences
  • Further research:
      • refine statistical measures
      • assign collocations to different levels of competence
      • other tools (productive tasks)
Stefania Spina
stefania.spina@unistrapg.it
http://april.unistrapg.it
