Roberto Navigli
Multilinguality at Your Fingertips:
BabelNet, Babelfy and Beyond!
Bertinoro, 8th July 2015
http://lcl.uniroma1.it
ERC Starting Grant n. 259234
LIDER CSA n. 610782
10/07/2015
Simone Ponzetto
Tiziano Flati
Andrea Moro
Daniele Vannella
Taher Pilehvar
Francesco Cecconi
Federico Scozzafava
It’s all about knowledge!
• But can we expect computers to know?
• Can’t computers just use, e.g., statistical techniques?
State-of-the-art Machine Translation
• EN: These are movies in which the music genre, e.g.
rock, is an important element but not necessarily central
to the plot. Examples are Easy Rider (1969), The
Graduate (1969), and Saturday Night Fever (1978).
• FR: Ce sont des films dans lesquels le genre de
musique, par exemple, rock, est un élément important,
mais pas nécessairement au centre de l'intrigue. Les
exemples sont Easy Rider (1969), The Graduate (1969),
et Saturday Night Fever (1978).
• ES: Estas son las películas en las que el género de la
música, por ejemplo, roca, es un elemento importante,
pero no necesariamente el centro de la trama. [...]
State-of-the-art Machine Translation
• EN: We can look at how this vast slug of molten
underground rock was injected.
Danger here!
• FR: Nous pouvons voir comment ce vaste bouchon de
rock underground fondu a été injecté.
• IT: Possiamo guardare a come è stato iniettato questo
vasto slug del rock underground fusa.
What are we talking about?
A 5-year ERC Starting Grant (2011-2016)
on Multilingual Word Sense Disambiguation
INTEGRATING KNOWLEDGE
[Navigli & Ponzetto, ACL 2010; Pilehvar & Navigli, ACL 2014]
Key Objective 1: create knowledge for all languages
Multilingual Joint Word Sense Disambiguation
(MultiJEDI)
WordNet
MultiWordNet
WOLF
MCR
GermaNet
BalkaNet
It all started with merging WordNet and Wikipedia
[Navigli and Ponzetto, ACL 2010; AIJ 2012]
• A wide-coverage multilingual semantic network
including both encyclopedic (from Wikipedia) and
lexicographic (from WordNet) entries
Concepts from WordNet
Named Entities and specialized concepts from Wikipedia
Concepts integrated from both resources
Creating a Multilingual Semantic Network
• Start from two large complementary resources:
– WordNet: full-fledged taxonomy
– Wikipedia: multilingual and continuously updated
[Figure: excerpt of the WordNet graph around {car, auto, automobile, machine, motorcar}. Synsets shown: {wheeled vehicle}, {self-propelled vehicle}, {motor vehicle}, {tractor}, {convertible}, {golf cart, golfcart}, {wagon, waggon}, {locomotive, engine, locomotive engine, railway locomotive}, {air bag}, {accelerator, accelerator pedal, gas pedal, throttle}, {car window}, {brake}, {wheel}, {splasher}. Edges are labeled is-a or has-part.]
Get the best from both worlds
[Figure: the same WordNet excerpt, annotated: nodes are concepts, edges are semantic relations.]
WordNet [Miller et al., 1990; Fellbaum, 1998]
[Figure: Wikipedia seen as a graph: nodes are concepts, edges are (unspecified) semantic relations.]
Wikipedia [The Web Community, 2001-today]
Structural Similarity with Personalized PageRank
[Pilehvar and Navigli, ACL 2014]
To merge or not to merge?
[Pilehvar and Navigli, ACL 2014]
• Measure the similarity of senses of the same word (but
from different resources)
• If they are similar enough, merge the corresponding two
concepts
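The merge decision above can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure of [Pilehvar and Navigli, ACL 2014]: the toy graph, the node names, the cosine comparison and the 0.5 threshold are all invented here. Each sense gets a Personalized PageRank vector (restart mass concentrated on that sense), and two senses are merged when their vectors are similar enough.

```python
import math


def personalized_pagerank(graph, seed, damping=0.85, iterations=50):
    """Power iteration for PageRank with all restart mass on `seed`."""
    rank = {node: 1.0 if node == seed else 0.0 for node in graph}
    for _ in range(iterations):
        nxt = {node: (1.0 - damping) if node == seed else 0.0 for node in graph}
        for node, neighbours in graph.items():
            if not neighbours:
                # Dangling node: return its mass to the seed.
                nxt[seed] += damping * rank[node]
                continue
            share = damping * rank[node] / len(neighbours)
            for neighbour in neighbours:
                nxt[neighbour] += share
        rank = nxt
    return rank


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * v.get(key, 0.0) for key, value in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def should_merge(graph, sense_a, sense_b, threshold=0.5):
    """Merge two senses if their PPR vectors are similar enough (assumed rule)."""
    sig_a = personalized_pagerank(graph, sense_a)
    sig_b = personalized_pagerank(graph, sense_b)
    return cosine(sig_a, sig_b) >= threshold


# Invented toy graph: two "industrial plant" senses from different resources
# share neighbours; the "organism" sense lives in a separate component.
TOY_GRAPH = {
    "wn:plant(industrial)": ["factory", "industry", "building"],
    "other:plant(industrial)": ["factory", "industry", "building"],
    "other:plant(organism)": ["leaf", "flower", "botany"],
    "factory": ["wn:plant(industrial)", "other:plant(industrial)"],
    "industry": ["wn:plant(industrial)", "other:plant(industrial)"],
    "building": ["wn:plant(industrial)", "other:plant(industrial)"],
    "leaf": ["other:plant(organism)"],
    "flower": ["other:plant(organism)"],
    "botany": ["other:plant(organism)"],
}
```

On this toy graph the two industrial senses end up merged, while the industrial and organism senses do not, because their vectors share no mass.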
[Figure: the WordNet sense plant#n#1 compared with a sense plant#n#1 from another resource.]
Merging entries from different resources into BabelNet
• We collect lexicalizations, definitions, translations, images, etc. from each of the merged resources
BabelNet: concepts and semantic relations (2)
• We encode knowledge as a labeled directed graph:
– Each vertex is a Babel synset
– Each edge is a semantic relation between synsets:
• is-a (balloon is-a aircraft)
• part-of (gasbag part-of balloon)
• instance-of (Einstein instance-of physicist)
• …
• unspecified/relatedness (balloon related-to flight)
{balloon (EN), Ballon (DE), aerostato (ES), aerostato (IT), pallone aerostatico (IT), montgolfière (FR)}
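A minimal sketch of the encoding described above, assuming a plain in-memory structure: vertices are synsets carrying multilingual lexicalizations, and edges are (label, target) pairs. The class and method names are hypothetical and are not part of the BabelNet API; the data is taken from this slide.

```python
from collections import defaultdict


class SemanticNetwork:
    """Toy labeled directed graph whose vertices are synset identifiers."""

    def __init__(self):
        self.lexicalizations = defaultdict(list)  # synset id -> [(language, lemma)]
        self.edges = defaultdict(list)            # source id -> [(label, target id)]

    def add_lexicalization(self, synset_id, language, lemma):
        self.lexicalizations[synset_id].append((language, lemma))

    def add_relation(self, source, label, target):
        self.edges[source].append((label, target))

    def targets(self, source, label=None):
        """Targets of edges leaving `source`, optionally filtered by label."""
        return [t for (l, t) in self.edges.get(source, []) if label is None or l == label]

    def lemmas(self, synset_id, language):
        """Lexicalizations of a synset in one language."""
        return [lemma for (lang, lemma) in self.lexicalizations.get(synset_id, [])
                if lang == language]


network = SemanticNetwork()
for language, lemma in [("EN", "balloon"), ("DE", "Ballon"), ("ES", "aerostato"),
                        ("IT", "aerostato"), ("IT", "pallone aerostatico"),
                        ("FR", "montgolfière")]:
    network.add_lexicalization("balloon", language, lemma)

network.add_relation("balloon", "is-a", "aircraft")
network.add_relation("gasbag", "part-of", "balloon")
network.add_relation("Einstein", "instance-of", "physicist")
network.add_relation("balloon", "related-to", "flight")
```

For example, `network.lemmas("balloon", "IT")` returns both Italian lexicalizations, and `network.targets("balloon", "is-a")` returns the hypernym.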
Building BabelNet: Translating Babel synsets
1. Exploiting Wikipedia interlanguage links
pallone aerostatico
globo aerostático
Ballon
Building BabelNet: Translating Babel synsets
2. Filling the lexical translation gaps using a Machine
Translation system to translate the English lexicalizations of
a concept
• On August 27, 1783 in Paris, Franklin witnessed the
world's first hydrogen [[Balloon (aircraft)|balloon]]
flight.
• Le 27 Août, 1783 à Paris, Franklin vu le premier vol en
ballon d'hydrogène.
Statistical Machine Translation
The most frequent translation of a word in a given meaning
left context | term | right context
(empty) | wikification | may refer to: the…
geoinformatics services' and ' | wikification | of GIS by the masses'
the process may be called | wikification | (as in ...
which is then called " | wikification | and to the related problem
reason needs copyediting, | wikification | , reduction of POV, work on references
huge amount of cleanup, | wikification | , etc. Version of 12 Nov
left context | term | right context
(empty) | wikificazione | potrebbe riferirsi a: il…
servizi geoinformatici' e ' | wikification | di GIS dalle masse'
il processo chiamato | wikificazione | (come in ...
che è quindi chiamato | wikificazione | e al problema correlato…
ragione richiede copyediting, | wikification | , riduzione di POV, lavoro su reference
grandi quantità di pulizia, | wikificazione | , ecc. Versione del 12 Novembre
The most frequent translation of a word in a given meaning
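The selection step can be sketched as a simple frequency count over the "term" column of the translated sentences. The function name is hypothetical and the case-folding is an assumption made for this sketch; the input list is the Italian column shown on the slide.

```python
from collections import Counter


def most_frequent_translation(translated_terms):
    """Return (translation, count) for the most frequent item, or None if empty."""
    counts = Counter(term.casefold() for term in translated_terms)
    if not counts:
        return None
    return counts.most_common(1)[0]


# Translations of "wikification" observed in the six translated contexts.
italian_terms = [
    "wikificazione", "wikification", "wikificazione",
    "wikificazione", "wikification", "wikificazione",
]

best = most_frequent_translation(italian_terms)  # ("wikificazione", 4)
```

Here "wikificazione" wins with 4 occurrences out of 6, so it would be kept as the Italian lexicalization for this meaning.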
What is BabelNet?
• A merger of resources of different kinds:
– WordNet: the most popular computational lexicon of English
– Open Multilingual WordNet: a collection of open wordnets
– Wikipedia: the largest collaborative encyclopedia
– Wikidata: the largest collaborative knowledge base
– Wiktionary: the largest collaborative dictionary
– OmegaWiki: a medium-size collaborative multilingual dictionary
– High-quality automatic sense-based translations
Why do we need BabelNet?
• Multilinguality: the same concept is expressed in tens of
languages
• Coverage: 271 languages and 14 million entries!
• Concepts and named entities together: dictionary and
encyclopedic knowledge is semantically interconnected
• "Dictionary of the future": semantic network structure
with labeled relations, pictures, multilingual synsets
• Full-fledged taxonomy: is-a relations are available for
both concepts and named entities (Wikipedia Bitaxonomy)
– Bertinoro is-a town & comune
– BabelNet is-a semantic network & encyclopedic dictionary
– enjoyment is-a pleasure
– Couldn't find a hypernym for ontology learning
– Ontology engineering is-a Winchester College Ground (!!!)
DON’T PANIC: this time has yet to come!!!
But you are working on it, aren't you?
Why do we need BabelNet?
• Easy access: Java and HTTP RESTful APIs; SPARQL endpoint (2 billion triples)
SPARQL endpoint: http://babelnet.org/sparql/
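As a rough illustration of how a SPARQL endpoint is typically queried over HTTP, the sketch below builds a GET request URL with the query passed in the standard `query` parameter. Only the endpoint address comes from the slide; the resource URI is a placeholder, the helper names are hypothetical, and whether the endpoint is reachable or accepts this exact request is an assumption. Nothing is sent over the network.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://babelnet.org/sparql/"  # address shown on the slide


def describe_resource_query(resource_uri, limit=10):
    """Generic SPARQL query listing predicates and objects of one resource."""
    return f"SELECT ?p ?o WHERE {{ <{resource_uri}> ?p ?o }} LIMIT {int(limit)}"


def build_request_url(endpoint, query):
    """URL for a SPARQL-protocol GET request (query in the `query` parameter)."""
    return endpoint + "?" + urlencode({"query": query})


def extract_query(url):
    """Recover the SPARQL query from a request URL (useful for checking)."""
    return parse_qs(urlsplit(url).query)["query"][0]


# Placeholder URI: replace it with a real resource identifier before use.
query = describe_resource_query("http://example.org/resource/balloon")
url = build_request_url(ENDPOINT, query)
```

The resulting URL could then be fetched with any HTTP client; the response format depends on the server configuration.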
The core of the Linguistic Linked Open Data cloud!
What can we do with BabelNet?
• Search and translate:
• Explore the network:
We are not alone in the (resource) universe!
• DBpedia [Bizer et al. 2009] - a resource obtained from
structured information in Wikipedia
– «Describes 3.77M things»
– No dictionary side
• YAGO [Suchanek et al. 2007]
– «Contains 10M entities and 120M facts about these entities»
– Links Wikipedia categories to WordNet synsets
• MENTA [de Melo and Weikum, 2010]
– A «multilingual taxonomy with 5.4M entities»
• WikiNet [Nastase and Strube, 2013]
– Semantic network connecting Wikipedia entities
– «3M concepts and 38+M relations»
• Freebase (http://freebase.com): collaborative effort
– Started from Wikipedia, MusicBrainz, ChefMoz, etc. Shut down!
ADDRESSING AMBIGUITY
[Moro, Raganato & Navigli, TACL 2014]
Context matters!!!
Back to our issue: lexical ambiguity!
• Thomas and Mario played as strikers in Munich.
Word Sense Disambiguation and Entity Linking
• Thomas and Mario are strikers playing in Munich
• Entity Linking: the task of discovering mentions of entities within a text and linking them to a knowledge base.
• WSD: the task of assigning meanings to word occurrences within a text.
Word Sense Disambiguation in a Nutshell
Input: target word (strikers) and context (“Thomas and Mario are strikers playing in Munich”)
WSD system + knowledge
Output: sense of target word
Entity Linking in a Nutshell
Input: target mention (Thomas) and context (“Thomas and Mario are strikers playing in Munich”)
EL system + knowledge
Output: Named Entity
Disambiguation and Entity Linking together!
BabelNet is a huge multilingual inventory
for both word senses and named entities!
Multilingual Joint Word Sense Disambiguation
(MultiJEDI)
Key Objective 2: use all languages to disambiguate one
So what?
Step 1: Find all possible meanings of words
1. Exact Matching (good for WSD, bad for EL)
Thomas and Mario are strikers playing in Munich
Thomas,
Norman Thomas,
Seth
They both have Thomas as one of their lexicalizations
2. Partial Matching (good for EL)
Thomas Müller
It has Thomas as a substring of one of its lexicalizations
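The two matching strategies of Step 1 can be sketched as follows. The inventory below is a toy stand-in for BabelNet: the entries and their lexicalizations are invented for illustration, and the function names are hypothetical. Exact matching keeps meanings that have the mention as a full lexicalization; partial matching also keeps meanings where the mention is only a substring of a lexicalization.

```python
# Toy inventory (assumed data): meaning -> list of lexicalizations.
INVENTORY = {
    "Norman Thomas": ["Thomas", "Norman Thomas"],
    "Seth Thomas": ["Thomas", "Seth Thomas"],
    "Thomas Müller": ["Thomas Müller"],
    "Mario Gómez": ["Mario Gómez"],
    "striker (Sport)": ["striker", "forward"],
}


def exact_candidates(mention, inventory):
    """Meanings with a lexicalization equal to the mention (case-insensitive)."""
    needle = mention.casefold()
    return sorted(meaning for meaning, lexicalizations in inventory.items()
                  if any(lex.casefold() == needle for lex in lexicalizations))


def partial_candidates(mention, inventory):
    """Meanings with a lexicalization containing the mention as a substring."""
    needle = mention.casefold()
    return sorted(meaning for meaning, lexicalizations in inventory.items()
                  if any(needle in lex.casefold() for lex in lexicalizations))
```

With this toy data, exact matching on "Thomas" misses Thomas Müller, which is why partial matching is the better fit for Entity Linking; the price is a larger and noisier candidate set.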
Step 1: Find all possible meanings of words
“Thomas and Mario are strikers playing in Munich”
Thomas (novel)
Seth Thomas
Thomas Müller
Mario Gómez
Mario (Album)
Mario (Character)
Striker (Movie)
Striker (Video Game)
striker (Sport)
Munich (City)
FC Bayern Munich
Munich (Song)
Ambiguity!
Step 2: Connect all the candidate meanings
Thomas and Mario are strikers playing in Munich
Step 3: Extract a dense subgraph
Thomas and Mario are strikers playing in Munich
Step 4: Select the most reliable meanings
Thomas and Mario are strikers playing in Munich
Thomas (novel)
Seth Thomas
Thomas Müller
Mario Gómez
Mario (Album)
Mario (Character)
Striker (Movie)
Striker (Video Game)
striker (Sport)
Munich (City)
FC Bayern Munich
Munich (Song)
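Steps 2 to 4 can be illustrated with a greedy simplification. This is a sketch under stated assumptions, not Babelfy's actual algorithm: the edges between candidate meanings are invented here, and the pruning rule (repeatedly drop the least connected candidate of any mention that is still ambiguous) is a simplified stand-in for the dense-subgraph heuristic.

```python
CANDIDATES = {
    "Thomas": {"Thomas (novel)", "Seth Thomas", "Thomas Müller"},
    "Mario": {"Mario Gómez", "Mario (Album)", "Mario (Character)"},
    "strikers": {"Striker (Movie)", "Striker (Video Game)", "striker (Sport)"},
    "Munich": {"Munich (City)", "FC Bayern Munich", "Munich (Song)"},
}

# Step 2 (assumed edges): which candidate meanings are semantically connected.
EDGES = {frozenset(pair) for pair in [
    ("Thomas Müller", "Mario Gómez"),
    ("Thomas Müller", "striker (Sport)"),
    ("Thomas Müller", "FC Bayern Munich"),
    ("Mario Gómez", "striker (Sport)"),
    ("Mario Gómez", "FC Bayern Munich"),
    ("striker (Sport)", "FC Bayern Munich"),
    ("Mario (Character)", "Striker (Video Game)"),
]}


def disambiguate(candidates, edges):
    """Greedy pruning: keep removing the weakest candidate until one is left per mention."""
    alive = {mention: set(meanings) for mention, meanings in candidates.items()}

    def degree(meaning):
        # Number of edges linking `meaning` to candidates that are still alive.
        live = set().union(*alive.values())
        return sum(1 for edge in edges if meaning in edge and (edge - {meaning}) <= live)

    # Step 3: shrink the graph towards a dense core.
    while any(len(meanings) > 1 for meanings in alive.values()):
        removable = [(degree(meaning), meaning, mention)
                     for mention, meanings in alive.items() if len(meanings) > 1
                     for meaning in meanings]
        _, weakest, mention = min(removable)  # ties broken alphabetically
        alive[mention].discard(weakest)

    # Step 4: the surviving candidate is selected for each mention.
    return {mention: next(iter(meanings)) for mention, meanings in alive.items()}


selected = disambiguate(CANDIDATES, EDGES)
```

On this toy input the football-related meanings reinforce each other and survive, while isolated candidates such as Thomas (novel) or Munich (Song) are pruned first.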
Experimental Results:
Fine-grained (Multilingual) Disambiguation
Datasets: Senseval-3; SemEval-2007 task 17; SemEval-2013 task 12
Experimental Results: KORE50, AIDA-CoNLL
• Two gold-standard Entity Linking datasets:
What can we do with Babelfy?
• Disambiguate text written in any language!
• Disambiguate in a language-agnostic setting!
Now in the Linked Data cloud…
The Linguistic Linked Open Data Cloud
BabelNet in Lemon/RDF
• BabelNet is a proof of concept: a multilingual (271 languages) encyclopedic dictionary and a semantic network in the LLOD cloud.
• Choice of model: lemon, mainly, but also SKOS (for concepts and narrower/broader relations), OWL and LexInfo (meronym/holonym relations, POS, translations)
[Diagram labels: Babel words (lemmas and inflected words), Babel senses, Babel synsets, Babel gloss]
Now: what can I do with my (linguistic? domain-oriented?) resource?
Your (linguistic) resource
A case study: the IATE term base
• EU's inter-institutional terminology database
• 1.4 million multilingual entries
Early achievement: linking IATE to BabelNet
• Goal: To automatically (and semantically) link IATE to
BabelNet using a language- and resource-agnostic
approach
IATE-258730 bn:00491522n Corylus maxima
Linking "pomodoro di serra" (Italian for "greenhouse tomato") to BabelNet
• Babelfy features language-agnostic disambiguation!
Summarizing
Actually there’s much much much more!
But no time NOW! (but I'll be here till Saturday!!!)
Thanks or… (grazie)
Roberto Navigli
Linguistic Computing Laboratory
http://lcl.uniroma1.it

Navigli sssw

  • 1.
    Roberto Navigli Multilinguality atYour Fingertips: BabelNet, Babelfy and Beyond! Bertinoro, 8th July 2015 http://lcl.uniroma1.it ERC Starting Grant n. 259234 LIDER CSA n. 610782
  • 2.
  • 3.
    10/07/2015Multilingual Web Access– WWW 2015 Roberto Navigli 3
  • 4.
    BabelNet, Babelfy, Videogames with a purpose & the Wikipedia Bitaxonomy Roberto Navigli 4 It’s all about knowledge! • But can we expect computers to know? • Can’t computers just use, e.g., statistical techniques?
  • 5.
    State-of-the-art Machine Translation •EN: These are movies in which the music genre, e.g. rock, is an important element but not necessarily central to the plot. Examples are Easy Rider (1969), The Graduate (1969), and Saturday Night Fever (1978). BabelNet, Babelfy and Beyond! Roberto Navigli
  • 6.
    State-of-the-art Machine Translation •EN: These are movies in which the music genre, e.g. rock, is an important element but not necessarily central to the plot. Examples are Easy Rider (1969), The Graduate (1969), and Saturday Night Fever (1978). • FR: Ce sont des films dans lesquels le genre de musique, par exemple, rock, est un élément important, mais pas nécessairement au centre de l'intrigue. Les exemples sont Easy Rider (1969), The Graduate (1969), et Saturday Night Fever (1978). BabelNet, Babelfy and Beyond! Roberto Navigli
  • 7.
    State-of-the-art Machine Translation •EN: These are movies in which the music genre, e.g. rock, is an important element but not necessarily central to the plot. Examples are Easy Rider (1969), The Graduate (1969), and Saturday Night Fever (1978). • ES: Estas son las películas en las que el género de la música, por ejemplo, roca, es un elemento importante, pero no necesariamente el centro de la trama. [...] BabelNet, Babelfy and Beyond! Roberto Navigli
  • 8.
    State-of-the-art Machine Translation •EN: We can look at how this vast slug of molten underground rock was injected. Danger here! BabelNet, Babelfy and Beyond! Roberto Navigli
  • 9.
    State-of-the-art Machine Translation •EN: We can look at how this vast slug of molten underground rock was injected. • FR: Nous pouvons voir comment ce vaste bouchon de rock underground fondu a été injecté. • IT: Possiamo guardare a come è stato iniettato questo vasto slug del rock underground fusa. BabelNet, Babelfy and Beyond! Roberto Navigli
  • 10.
    10/07/2015 11 What arewe talking about? A 5-year ERC Starting Grant (2011-2016) on Multilingual Word Sense Disambiguation BabelNet, Babelfy and Beyond! Roberto Navigli
  • 11.
    INTEGRATING KNOWLEDGE [Navigli & Ponzetto,ACL 2010; Pilehvar & Navigli, ACL 2014] 10/07/2015 16BabelNet, Babelfy and Beyond! Roberto Navigli
  • 12.
    10/07/2015 14 Key Objective1: create knowledge for all languages Multilingual Joint Word Sense Disambiguation (MultiJEDI) WordNet MultiWordNet WOLF MCR GermaNet BalkaNet BabelNet, Babelfy and Beyond! Roberto Navigli
  • 13.
    It all startedwith merging WordNet and Wikipedia [Navigli and Ponzetto, ACL 2010; AIJ 2012] • A wide-coverage multilingual semantic network including both encyclopedic (from Wikipedia) and lexicographic (from WordNet) entries Concepts from WordNetNamed Entities and specialized concepts from Wikipedia Concepts integrated from both resources 10/07/2015 15BabelNet, Babelfy and Beyond! Roberto Navigli
  • 14.
    Creating a MultilingualSemantic Network • Start from two large complementary resources: – WordNet: full-fledged taxonomy – Wikipedia: multilingual and continuously updated {wheeled vehicle} {self-propelled vehicle} {motor vehicle} {tractor} {car,auto, automobile, machine, motorcar} {convertible} {air bag} is-a is-a is-a is-a is-a has-part {golf cart, golfcart} is-a {wagon, waggon} is-a {accelerator, accelerator pedal, gas pedal, throttle} has-part {car window} has-part {locomotive, engine, locomotive engine, railway locomotive} is-a {brake}has-part {wheel} has-part {splasher} has-part Get the best from both worlds 16BabelNet, Babelfy and Beyond! Roberto Navigli
  • 15.
    {wheeled vehicle} {self-propelled vehicle} {motorvehicle} {tractor} {car,auto, automobile, machine, motorcar} {convertible} {air bag} is-a is-a is-a is-a is-a has-part{golf cart, golfcart} is-a {wagon, waggon} is-a {accelerator, accelerator pedal, gas pedal, throttle} has-part {car window} has-part {locomotive, engine, locomotive engine, railway locomotive} is-a {brake}has-part {wheel} has-part {splasher} has-part concepts semantic relation WordNet [Miller et al., 1990; Fellbaum, 1998] 17BabelNet, Babelfy and Beyond! Roberto Navigli
  • 16.
    BabelNet, Babelfy, Videogames with a purpose & the Wikipedia Bitaxonomy Roberto Navigli 18 • Playing with senses • Bla bla bla bla bla bla bla • Bla bla bla bla bla bla bla • Bla bla bla bla bla bla bla • Bla bla bla bla bla bla bla • Bla bla bla bla bla bla bla concepts (unspecified) semantic relation Wikipedia [The Web Community, 2001-today]
  • 17.
    Structural Similarity withPersonalized PageRank [Pilehvar and Navigli, ACL 2014] some
  • 18.
    Structural Similarity withPersonalized PageRank [Pilehvar and Navigli, ACL 2014]
  • 19.
    To merge ornot to merge? [Pilehvar and Navigli, ACL 2014] • Measure the similarity of senses of the same word (but from different resources) • If they are similar enough, merge the corresponding two concepts WordNetplant#n#1plant#n#1 21BabelNet, Babelfy and Beyond! Roberto Navigli
  • 20.
    • We collectlexicalizations, definitions, translations, images, etc. from each of the merged resources Merging entries from different resources into BabelNet BabelNet, Babelfy and Beyond! Roberto Navigli 22 WordNet
  • 21.
    BabelNet: concepts andsemantic relations (2) • We encode knowledge as a labeled directed graph: – Each vertex is a Babel synset – Each edge is a semantic relation between synsets: • is-a (balloon is-a aircraft) • part-of (gasbag part-of balloon) • instance-of (Einstein instance-of physicist) • … • unspecified/relatedness (balloon related-to flight) balloonEN, BallonDE, aerostatoES, aerostatoIT, pallone aerostaticoIT, mongolfièreFR 3210/07/2015BabelNet, Babelfy and Beyond! Roberto Navigli
  • 22.
    Building BabelNet: TranslatingBabel synsets 1. Exploiting Wikipedia interlanguage links pallone aerostatico globo aerostàtico Ballon 3310/07/2015BabelNet, Babelfy and Beyond! Roberto Navigli
  • 23.
    Building BabelNet: TranslatingBabel synsets 2. Filling the lexical translation gaps using a Machine Translation system to translate the English lexicalizations of a concept • On August 27, 1783 in Paris, Franklin witnessed the world's first hydrogen [[Balloon (aircraft)|balloon]] flight. • Le 27 Août, 1783 à Paris, Franklin vu le premier vol en ballon d'hydrogène. Statistical Machine Translation 3410/07/2015BabelNet, Babelfy and Beyond! Roberto Navigli
  • 24.
    The most frequenttranslation of a word in a given meaning left context term right context wikification may refer to: the… geoinformatics services' and ' wikification of GIS by the masses' the process may be called wikification (as in ... which is then called " wikification and to the related problem reason needs copyediting, wikification , reduction of POV, work on references huge amount of cleanup, wikification , etc. Version of 12 Nov 3610/07/2015BabelNet, Babelfy and Beyond! Roberto Navigli
  • 25.
    left context termright context wikificazione potrebbe riferirsi a: il… servizi geoinformatici' e ' wikification di GIS dalle masse' il processo chiamato wikificazione (come in ... che è quindi chiamato wikificazione e al problema correlato… ragione richiede copyediting, wikification , riduzione di POV, lavoro su reference grandi quantità di pulizia, wikificazione , ecc. Versione del 12 Novembre The most frequent translation of a word in a given meaning 3710/07/2015BabelNet, Babelfy and Beyond! Roberto Navigli
  • 26.
    left context termright context wikificazione potrebbe riferirsi a: il… servizi geoinformatici' e ' wikification di GIS dalle masse' il processo chiamato wikificazione (come in ... che è quindi chiamato wikificazione e al problema correlato… ragione richiede copyediting, wikification , riduzione di POV, lavoro su reference grandi quantità di pulizia, wikificazione , ecc. Versione del 12 Novembre The most frequent translation of a word in a given meaning 3810/07/2015BabelNet, Babelfy and Beyond! Roberto Navigli
  • 27.
    What is BabelNet? •A merger of resources of different kinds: 10/07/2015META Prize 2015: BabelNet Roberto Navigli 39
  • 28.
    10/07/2015 40 What isBabelNet? • A merger of resources of different kinds: – WordNet: the most popular computational lexicon of English – Open Multilingual WordNet: a collection of open wordnets – Wikipedia: the largest collaborative encyclopedia – Wikidata: the largest collaborative knowledge base – Wiktionary: the largest collaborative dictionary – OmegaWiki: a medium-size collaborative multilingual dictionary – High-quality automatic sense-based translations BabelNet, Babelfy and Beyond! Roberto Navigli
  • 29.
    10/07/2015 41 What isBabelNet? • A merger of resources of different kinds: BabelNet, Babelfy and Beyond! Roberto Navigli
  • 30.
    10/07/2015 42 Why dowe need BabelNet? • Multilinguality: the same concept is expressed in tens of languages BabelNet, Babelfy and Beyond! Roberto Navigli
  • 31.
    10/07/2015 43 Why dowe need BabelNet? • Multilinguality: the same concept is expressed in tens of languages BabelNet, Babelfy and Beyond! Roberto Navigli
  • 32.
    10/07/2015Multilingual Web Access– WWW 2015 Roberto Navigli 44
  • 33.
    10/07/2015 45 Why dowe need BabelNet? • Multilinguality: the same concept is expressed in tens of languages • Coverage: 271 languages and 14 million entries! BabelNet, Babelfy and Beyond! Roberto Navigli
  • 34.
    10/07/2015Multilingual Web Access– WWW 2015 Roberto Navigli 46 Why do we need BabelNet? • Multilinguality: the same concept is expressed in tens of languages • Coverage: 271 languages and 14 million entries! • Concepts and named entities together: dictionary and encyclopedic knowledge is semantically interconnected 10/07/2015META Prize 2015: BabelNet Roberto Navigli 46
  • 35.
    10/07/2015Multilingual Web Access– WWW 2015 Roberto Navigli 47 Why do we need BabelNet? • Multilinguality: the same concept is expressed in tens of languages • Coverage: 271 languages and 14 million entries! • Concepts and named entities together: dictionary and encyclopedic knowledge is semantically interconnected • "Dictionary of the future": semantic network structure with labeled relations, pictures, multilingual synsets 10/07/2015META Prize 2015: BabelNet Roberto Navigli 47
  • 36.
    10/07/2015 48 Why dowe need BabelNet? • Multilinguality: the same concept is expressed in tens of languages • Coverage: 271 languages and 14 million entries! • Concepts and named entities together: dictionary and encyclopedic knowledge is semantically interconnected • "Dictionary of the future": semantic network structure with labeled relations, pictures, multilingual synsets • Full-fledged taxonomy: is-a relations are available for both concepts and named entities (Wikipedia Bitaxonomy) – Bertinoro is-a town & comune – BabelNet is-a semantic network & encyclopedic dictionary – enjoyment is-a pleasure – Couldn't find a hypernym for ontology learning – Ontology engineering is-a Winchester College Ground (!!!) BabelNet, Babelfy and Beyond! Roberto Navigli
  • 37.
    10/07/2015Multilingual Web Access– WWW 2015 Roberto Navigli 49 DON’T PANIC: this time has yet to come!!! But you are working on it, aren't you?
  • 38.
    10/07/2015 50 Why dowe need BabelNet? • Easy access: Java and HTTP RESTful APIs; SPARQL endpoint (2 billion triples) SPARQL endpoint: http://babelnet.org/sparql/ BabelNet, Babelfy and Beyond! Roberto Navigli
  • 39.
    The core ofthe Linguistic Linked Open Data cloud!
  • 40.
    10/07/2015 52 What canwe do with BabelNet? • Search and translate: BabelNet, Babelfy and Beyond! Roberto Navigli
  • 41.
    10/07/2015META Prize 2015:BabelNet Roberto Navigli 53 What can we do with BabelNet?
  • 42.
    What can wedo with BabelNet? • Explore the network: 10/07/2015META Prize 2015: BabelNet Roberto Navigli 54
  • 43.
    We are notalone in the (resource) universe! 10/07/2015BabelNet: a Very Large Multilingual Ontology Roberto Navigli 71
  • 44.
    We are notalone in the (resource) universe! • DBpedia [Bizer et al. 2009] - a resource obtained from structured information in Wikipedia – «Describes 3.77M things» – No dictionary side • YAGO [Suchanek et al. 2007] – «Contains 10M entities and 120M facts about these entities» – Links Wikipedia categories to WordNet synsets • MENTA [de Melo and Weikum, 2010] – A «multilingual taxonomy with 5.4M entities» • WikiNet [Nastase and Strube, 2013] – Semantic network connecting Wikipedia entities – «3M concepts and 38+M relations» • Freebase (http://freebase.com): collaborative effort – Started from Wikipedia, MusicBrainz, ChefMoz, etc. Shut down! 72BabelNet, Babelfy and Beyond! Roberto Navigli
  • 45.
  • 46.
    ADDRESSING AMBIGUITY [Moro, Raganato & Navigli, TACL 2014]
  • 47.
    Context matters!!!
  • 48.
    Back to our issue: lexical ambiguity! • Thomas and Mario played as strikers in Munich.
  • 49.
    Word Sense Disambiguation and Entity Linking
    • Thomas and Mario are strikers playing in Munich
    Entity Linking: the task of discovering mentions of entities within a text and linking them to entries in a knowledge base.
    WSD: the task of assigning meanings to word occurrences within a text.
  • 50.
    Word Sense Disambiguation in a Nutshell: strikers (target word) + “Thomas and Mario are strikers playing in Munich” (context) + knowledge → WSD system → sense of target word
  • 51.
    Entity Linking in a Nutshell: Thomas (target mention) + “Thomas and Mario are strikers playing in Munich” (context) + knowledge → EL system → Named Entity
  • 52.
    Disambiguation and Entity Linking together! BabelNet is a huge multilingual inventory for both word senses and named entities!
  • 53.
    Multilingual Joint Word Sense Disambiguation (MultiJEDI). Key Objective 2: use all languages to disambiguate one
  • 54.
    So what?
  • 55.
    Step 1: Find all possible meanings of words
    1. Exact Matching (good for WSD, bad for EL)
    Thomas and Mario are strikers playing in Munich
    Candidates for "Thomas": Thomas, Norman; Thomas, Seth
    They both have Thomas as one of their lexicalizations
  • 56.
    Step 1: Find all possible meanings of words
    2. Partial Matching (good for EL)
    Thomas and Mario are strikers playing in Munich
    Candidates for "Thomas": Thomas, Norman; Thomas, Seth; Thomas Müller
    Thomas Müller has Thomas as a substring of one of its lexicalizations
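    The difference between exact and partial matching can be sketched as follows. This is a toy illustration, not Babelfy's implementation: the `INVENTORY` dictionary and its lexicalizations are invented stand-ins for BabelNet, and the function names are hypothetical.

```python
from typing import Dict, List

# Toy inventory (assumption): entity -> its lexicalizations.
INVENTORY: Dict[str, List[str]] = {
    "Norman Thomas": ["Norman Thomas", "Thomas"],
    "Seth Thomas": ["Seth Thomas", "Thomas"],
    "Thomas Müller": ["Thomas Müller", "Müller"],
    "Munich": ["Munich"],
}


def exact_candidates(mention: str) -> List[str]:
    """Entities having the mention as one of their lexicalizations."""
    m = mention.lower()
    return [e for e, lexs in INVENTORY.items()
            if any(m == lex.lower() for lex in lexs)]


def partial_candidates(mention: str) -> List[str]:
    """Entities having the mention as a substring of a lexicalization."""
    m = mention.lower()
    return [e for e, lexs in INVENTORY.items()
            if any(m in lex.lower() for lex in lexs)]


if __name__ == "__main__":
    print(exact_candidates("Thomas"))
    print(partial_candidates("Thomas"))
```

    In this toy inventory, exact matching misses Thomas Müller because "Thomas" alone is not one of his lexicalizations, while partial matching recovers him at the cost of a larger candidate set.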
  • 57.
    Step 1: Find all possible meanings of words
    “Thomas and Mario are strikers playing in Munich”
    • Thomas: Thomas (novel), Seth Thomas, Thomas Müller
    • Mario: Mario Gómez, Mario (Album), Mario (Character)
    • strikers: Striker (Movie), Striker (Video Game), striker (Sport)
    • Munich: Munich (City), FC Bayern Munich, Munich (Song)
  • 58.
    Step 1: Find all possible meanings of words
    “Thomas and Mario are strikers playing in Munich”
    • Thomas: Thomas (novel), Seth Thomas, Thomas Müller
    • Mario: Mario Gómez, Mario (Album), Mario (Character)
    • strikers: Striker (Movie), Striker (Video Game), striker (Sport)
    • Munich: Munich (City), FC Bayern Munich, Munich (Song)
    Ambiguity!
  • 59.
    Step 2: Connect all the candidate meanings. Thomas and Mario are strikers playing in Munich
  • 60.
    Step 3: Extract a dense subgraph. Thomas and Mario are strikers playing in Munich
  • 61.
    Step 3: Extract a dense subgraph. Thomas and Mario are strikers playing in Munich
  • 62.
    Step 4: Select the most reliable meanings. Thomas and Mario are strikers playing in Munich
  • 63.
    Step 4: Select the most reliable meanings
    “Thomas and Mario are strikers playing in Munich”
    • Thomas: Thomas (novel), Seth Thomas, Thomas Müller
    • Mario: Mario Gómez, Mario (Album), Mario (Character)
    • strikers: Striker (Movie), Striker (Video Game), striker (Sport)
    • Munich: Munich (City), FC Bayern Munich, Munich (Song)
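    Steps 2 to 4 can be sketched on the running example as a greedy pruning of weakly connected candidates. This is a deliberately simplified stand-in, not the algorithm of the TACL 2014 paper: the `EDGES` between candidate meanings are invented for illustration (a real system would derive them from BabelNet), plain degree replaces any weighted score, and pruning continues until each mention keeps exactly one candidate.

```python
from typing import Dict, List, Set

# Candidate meanings per mention, as listed on the slide.
CANDIDATES: Dict[str, List[str]] = {
    "Thomas": ["Thomas (novel)", "Seth Thomas", "Thomas Müller"],
    "Mario": ["Mario Gómez", "Mario (Album)", "Mario (Character)"],
    "strikers": ["Striker (Movie)", "Striker (Video Game)", "striker (Sport)"],
    "Munich": ["Munich (City)", "FC Bayern Munich", "Munich (Song)"],
}

# Step 2 (assumption): hypothetical semantic relations between candidates.
EDGES = {frozenset(pair) for pair in [
    ("Thomas Müller", "FC Bayern Munich"),
    ("Thomas Müller", "striker (Sport)"),
    ("Thomas Müller", "Mario Gómez"),
    ("Mario Gómez", "FC Bayern Munich"),
    ("Mario Gómez", "striker (Sport)"),
    ("striker (Sport)", "FC Bayern Munich"),
    ("FC Bayern Munich", "Munich (City)"),
    ("Mario (Character)", "Striker (Video Game)"),
    ("Munich (Song)", "Mario (Album)"),
]}


def degree(node: str, alive: Set[str]) -> int:
    """Number of edges linking node to candidates still in the subgraph."""
    return sum(1 for e in EDGES if node in e and e <= alive)


def disambiguate(candidates: Dict[str, List[str]]) -> Dict[str, str]:
    """Steps 3-4: drop the least connected candidate of the most ambiguous
    mention until every mention has a single meaning left."""
    remaining = {m: list(cs) for m, cs in candidates.items()}
    while True:
        ambiguous = [m for m in remaining if len(remaining[m]) > 1]
        if not ambiguous:
            break
        alive = {c for cs in remaining.values() for c in cs}
        mention = max(ambiguous, key=lambda m: len(remaining[m]))
        weakest = min(remaining[mention], key=lambda c: degree(c, alive))
        remaining[mention].remove(weakest)
    return {m: cs[0] for m, cs in remaining.items()}


if __name__ == "__main__":
    for mention, meaning in disambiguate(CANDIDATES).items():
        print(mention, "->", meaning)
```

    With these invented edges the football-related candidates support each other and survive the pruning, while isolated candidates such as Thomas (novel) are removed first.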
  • 65.
    Experimental Results: Fine-grained (Multilingual) Disambiguation on Senseval-3, SemEval-2007 task 17 and SemEval-2013 task 12
  • 66.
    Experimental Results: KORE50, AIDA-CoNLL • Two gold-standard Entity Linking datasets:
  • 67.
    What can we do with Babelfy? • Disambiguate text written in any language!
  • 68.
    What can we do with Babelfy? • Disambiguate text written in any language! • Disambiguate in a language-agnostic setting!
  • 69.
  • 70.
    Now in the Linked Data cloud…
  • 71.
  • 72.
    BabelNet in Lemon/RDF
    • BabelNet is a proof of concept: a multilingual (271 languages) encyclopedic dictionary and a semantic network in the LLOD cloud.
    • Choice of model: lemon, mainly, but also SKOS (for concepts and narrower/broader relations), OWL and LexInfo (meronym/holonym relations, POS, translations)
    Diagram labels: Babel words (lemmas and inflected words), Babel senses, Babel synsets, Babel gloss
  • 73.
    Now: what can I do with my (linguistic? domain-oriented?) resource? Your (linguistic) resource
  • 74.
    Now: what can I do with my (linguistic? domain-oriented?) resource?
  • 75.
    Now: what can I do with my (linguistic? domain-oriented?) resource?
  • 76.
    Now: what can I do with my (linguistic? domain-oriented?) resource?
  • 77.
    A case study: the IATE term base • EU's inter-institutional terminology database • 1.4 million multilingual entries
  • 78.
    A case study: the IATE term base
  • 79.
    Early achievement: linking IATE to BabelNet • Goal: To automatically (and semantically) link IATE to BabelNet using a language- and resource-agnostic approach
  • 80.
    Early achievement: linking IATE to BabelNet
    IATE-258730, bn:00491522n, Corylus maxima
  • 81.
    Linking "pomodoro di serra" to BabelNet • Babelfy features language-agnostic disambiguation!
  • 82.
    Linking "pomodoro di serra" to BabelNet
  • 83.
  • 84.
    Actually there’s much much much more! But no time NOW! (but I'll be here till Saturday!!!)
  • 85.
  • 86.
    Thanks or… m i (grazie)
  • 87.
    Roberto Navigli, Linguistic Computing Laboratory, http://lcl.uniroma1.it