Senso Comune as a Knowledge Base of Italian 
language 
The Resource and its Development 
Tommaso Caselli 1, Isabella Chiari 2, Aldo Gangemi 3, Elisabetta Jezek 4,
Alessandro Oltramari 5, Guido Vetere 6, Laure Vieu 7, Fabio Massimo Zanzotto 8
1 VU Amsterdam
2 Università di Roma ’Sapienza’
3 CNR ISTC
4 Università di Pavia
5 Carnegie Mellon University
6 IBM Italia
7 CNRS IRIT
8 Università di Roma ’Tor Vergata’
Tommaso Caselli, Isabella Chiari, Aldo Gangemi, Elisabetta Jezek, Alessandro Oltramari, Guido Vetere, Laure Vieu, Fabio Massimo Zanzotto. Senso Comune as a Knowledge Base of Italian language. December 10, 2014
Introduction 
Senso Comune (www.sensocomune.it) is an open, machine-readable 
knowledge base of the Italian language 
Lexical content has been extracted from a monolingual Italian 
dictionary (De Mauro’s GRADIT), and is continuously enriched 
through a collaborative online platform 
Linguistic knowledge is represented by a semasiological model where 
each sense can be qualified with respect to a small set of ontological 
categories 
Senses can be further enriched in many ways and mapped to other
dictionaries, such as the Italian version of MultiWordNet, thus
qualifying Senso Comune as a linguistic Linked Open Data resource
General principles 
(Computational) lexicography should be able to build on the direct 
witness of native speakers (not only textual sources) 
The way linguistic meanings relate to ontological categories is 
tangential 
Linguistic knowledge belongs to the entire community of speakers,
thus we are committed to keeping the resource as open as possible
Lexicon and ontology 
To map lexical senses to concepts, Senso Comune adopts a notion of
ontological commitment: if the sense S commits to ($\mapsto$) the concept C,
then there are entities of type C to which occurrences of S may refer.
Ontological Commitment
$(S \mapsto C) \Leftrightarrow \exists s, c \mid S(s) \land C(c) \land \mathit{refers\_to}(s, c)$
A sense may commit to several different ontological categories (e.g. 
ARTIFACT, INFORMATION) 
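The commitment relation can be pictured as a many-valued mapping from senses to categories. The following is a minimal sketch, not the resource's actual data model: the `Sense` class, the `commits_to` helper, and the example entry for "libro" are hypothetical, and only the category labels ARTIFACT and INFORMATION come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    """A lexical sense with the ontological categories it commits to.

    Hypothetical structure for illustration only.
    """
    lemma: str
    gloss: str
    commitments: frozenset  # category labels such as "ARTIFACT"


def commits_to(sense: Sense, category: str) -> bool:
    """Return True if the sense is declared to commit to the category."""
    return category in sense.commitments


# Illustrative entry (assumption): one sense committing to two categories.
libro = Sense(
    lemma="libro",
    gloss="placeholder gloss, not taken from the resource",
    commitments=frozenset({"ARTIFACT", "INFORMATION"}),
)

print(commits_to(libro, "ARTIFACT"))
print(commits_to(libro, "INFORMATION"))
```

Storing the commitments as a set reflects the point that a single sense may commit to several categories at once.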
Lexicon and ontology, a semiotic approach 
Senses are semiotic objects whose relationship with real world 
entities is mediated by cognitive structures, emotional polarity and 
social interactions 
Lexical relations, such as synonymy, which hold among senses, do 
not bear direct ontological import 
Conversely, ontological axioms, such as equivalence, do not have 
immediate linguistic side-effects 
If the equivalence of linguistic senses to ontological concepts is 
desired (e.g. for technical portions of the dictionary), this condition 
has to be specifically formalized and managed 
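Synonymy $\nLeftrightarrow$ Equivalence
$S \mapsto C \land S' \mapsto C' \land S \approx S' \nRightarrow C \equiv C'$
$S \mapsto C \land S' \mapsto C' \land C \equiv C' \nRightarrow S \approx S'$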
Sense classification 
Senso Comune meanings are classified with respect to a small set of
categories inspired by DOLCE
A tutoring methodology (TMEO) supports the classification process
Annotation of lexicographic examples and definitions 
Ongoing work in Senso Comune focuses on manual annotation of the 
usage examples associated with the sense definitions of the most 
common verbs in the resource, with the goal of providing Senso Comune 
with corpus-derived verbal frames. The annotation task, which is 
performed through a Web-based tool, is organized in two main subtasks. 
1 The first subtask consists in identifying the constituents that hold a
relation with the target verb in the example and annotating them with
information about the type of phrase and grammatical relation
2 In the second subtask, users are asked to attach a semantic role, an
ontological category and the sense definition associated with the
argument filler of each frame participant in the instances
Figure: Annotation of andare a cavallo (riding)
Word Sense Alignment 
To enrich Senso Comune (SC) and make it interoperable with other 
lexical-semantic resources, we conducted Word Sense Alignment (WSA) 
experiments with MultiWordNet (MWN), both manually and automatically 
Figure: Alignment of appartamento
Manual Alignment 
At the time of this writing
584 SC lemmas (nouns) have been processed for manual alignment,
for a total of 6,730 word senses, with 3.64 average word senses for each lemma
2,131 senses could be aligned with at least one MWN synset (31.7%)
2,187 MWN synsets could be aligned to at least one SC sense
1,093 biunique alignments

SC senses   MWN synsets   %
1,622       1             76.1
367         2             17.2
108         3             5
25          4             1.1
11          5,7           0.6
Table: SC to MWN

MWN synsets   SC senses   %
1,681         1           76.8
400           2           18.2
85            3           3.8
17            4           0.9
4             5,6         0.3
Table: MWN to SC

⇒ Similar granularity, relatively little overlap
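The coverage figure on this slide can be re-derived from the reported counts. The sketch below is only an arithmetic check on the numbers given above; the variable names and the `pct` helper are introduced here for illustration, and the two biunique shares are derived values that the slide itself does not report.

```python
# Counts as reported on the slide.
sc_senses_total = 6730      # SC word senses processed
sc_senses_aligned = 2131    # SC senses aligned to at least one MWN synset
mwn_synsets_aligned = 2187  # MWN synsets aligned to at least one SC sense
biunique = 1093             # one-to-one alignments


def pct(part: int, whole: int) -> float:
    """Percentage of part over whole, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


coverage = pct(sc_senses_aligned, sc_senses_total)
print(coverage)  # 31.7, matching the figure reported on the slide

# Derived shares (not reported on the slide): how many aligned items
# take part in a biunique alignment, seen from each side.
biunique_share_sc = pct(biunique, sc_senses_aligned)
biunique_share_mwn = pct(biunique, mwn_synsets_aligned)
print(biunique_share_sc, biunique_share_mwn)
```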
Automatic Alignment 
Lexical Match (overlapping tokens between two sense descriptions),
with
1 Lemmatized version of the original glosses of Senso Comune
2 Bag-of-words based on synset words, direct hypernyms, nearest
synsets, the corresponding Italian synset words from the “Princeton
Annotated Gloss Corpus” and Wikipedia glosses from BabelNet
Sense Similarity (cosine score between the vector representations
of sense descriptions)
1 Vector representations have been obtained by means of
Personalized PageRank (PPR) with WordNet 3.0 (WN30) and the
“Princeton Annotated Gloss Corpus” as knowledge base
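The two scoring ideas above, token overlap and cosine similarity, can be sketched in a few lines. This is a simplified illustration under stated assumptions: it uses plain token-count vectors as a stand-in for the Personalized PageRank vectors named on the slide, skips lemmatization, and the two English glosses are invented toy strings rather than entries from Senso Comune or MultiWordNet.

```python
import math
from collections import Counter


def tokens(text: str) -> list:
    """Lowercase whitespace tokenization (no lemmatization in this sketch)."""
    return text.lower().split()


def lexical_match(gloss_a: str, gloss_b: str) -> int:
    """Number of distinct tokens shared by two sense descriptions."""
    return len(set(tokens(gloss_a)) & set(tokens(gloss_b)))


def cosine(vec_a: dict, vec_b: dict) -> float:
    """Cosine score between two sparse vectors stored as dicts."""
    dot = sum(weight * vec_b.get(key, 0.0) for key, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy glosses (assumptions, not taken from either resource).
gloss_sc = "unit of a building used as a dwelling"
gloss_mwn = "a suite of rooms used as a dwelling"

overlap = lexical_match(gloss_sc, gloss_mwn)
score = cosine(Counter(tokens(gloss_sc)), Counter(tokens(gloss_mwn)))
print(overlap, round(score, 2))
```

With PPR-based representations, only the construction of the vectors would change; the cosine step would stay the same.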
Evaluation 
Two Gold Standards, one for verbs (350 sense pairs) and one for 
nouns (166 sense pairs), with Precision (P), Recall (R) and F1 scores 
Best F1 by merging the outputs of the two methods: 0.47 for verbs 
(P=0.61, R=0.38) and 0.64 for nouns (P=0.67, R=0.61). 
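The reported F1 values follow from the reported precision and recall via the standard harmonic mean, F1 = 2PR / (P + R). The sketch below checks this; the function names are illustrative, and the `precision_recall` helper assumes that alignments are represented as sets of (SC sense, MWN synset) pairs, which the slides do not specify.

```python
def precision_recall(predicted: set, gold: set) -> tuple:
    """Precision and recall of predicted sense pairs against a gold standard."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Values reported on the slide for the merged output of the two methods.
f1_verbs = round(f1(0.61, 0.38), 2)
f1_nouns = round(f1(0.67, 0.61), 2)
print(f1_verbs, f1_nouns)  # 0.47 0.64
```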
Conclusion 
The gap between a “native” Italian dictionary and an
English-derived WordNet may be relevant
This should be carefully taken into account when devising techniques 
and methodologies to construct multilingual resources 
Our results suggest that more attention should be paid to the 
semantic peculiarity of each language, i.e. the specific way each 
language constructs a conceptual view of the world 