Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Roberto Navigli - From Text to Concepts and Back: Going Multilingual with BabelNet in a Step or Two
1. Roberto Navigli
From Text to Concepts and Back: Going
Multilingual with BabelNet in a Step or Two
30th November 2017 – Rome, Italy
Machine Learning Meetup
http://lcl.uniroma1.it
3. Outline of the talk
• Introduction to BabelNet
• Disambiguation and entity linking with Babelfy
• Demos:
– The Babelfy disambiguation system
– Multilingual concept and entity extraction
30/11/2017 3From Text to Concepts and Back: Going
Multilingual with BabelNet in a Step or Two
Roberto Navigli
4. 30/11/2017 4
A 5-year ERC Starting Grant (2011-2016)
on Multilingual Word Sense Disambiguation
http://multijedi.org
From Text to Concepts and Back: Going
Multilingual with BabelNet in a Step or Two
Roberto Navigli
5. INTEGRATING
KNOWLEDGE
[Navigli & Ponzetto, ACL 2010
& Artificial Intelligence Journal 2012;
Pilehvar & Navigli, ACL 2014]
30/11/2017
2017 Artificial
Intelligence Prominent
Paper Award!
From Text to Concepts and Back: Going
Multilingual with BabelNet in a Step or Two
Roberto Navigli
7. 30/11/2017 7
Key Objective 1: create knowledge for all languages
Multilingual Joint Word Sense Disambiguation
(MultiJEDI)
WordNet
MultiWordNet
WOLF
MCR
GermaNet
BalkaNet
From Text to Concepts and Back: Going
Multilingual with BabelNet in a Step or Two
Roberto Navigli
8. It all started with merging WordNet and Wikipedia
[Navigli and Ponzetto, ACL 2010; AIJ 2012]
• A wide-coverage multilingual semantic network
including both encyclopedic (from Wikipedia) and
lexicographic (from WordNet) entries
Concepts from WordNetNamed Entities and specialized
concepts from Wikipedia
Concepts integrated from
both resources
30/11/2017 8From Text to Concepts and Back: Going
Multilingual with BabelNet in a Step or Two
Roberto Navigli
9. Creating a Multilingual Semantic Network
• Start from two large complementary resources:
– WordNet: full-fledged taxonomy
– Wikipedia: multilingual and continuously updated
{wheeled vehicle}
{self-propelled vehicle}
{motor vehicle} {tractor}
{car,auto, automobile,
machine, motorcar}
{convertible}
{air bag}
is-a
is-a
is-a
is-a
is-a
has-part
{golf cart,
golfcart}
is-a
{wagon,
waggon}
is-a
{accelerator,
accelerator pedal,
gas pedal, throttle}
has-part
{car window}
has-part
{locomotive, engine,
locomotive engine,
railway locomotive}
is-a
{brake}has-part
{wheel}
has-part
{splasher}
has-part
Get the best from both worlds
9From Text to Concepts and Back: Going
Multilingual with BabelNet in a Step or Two
Roberto Navigli
10. {wheeled vehicle}
{self-propelled vehicle}
{motor vehicle} {tractor}
{car,auto, automobile,
machine, motorcar}
{convertible}
{air bag}
is-a
is-a
is-a
is-a
is-a
has-part{golf cart,
golfcart}
is-a
{wagon,
waggon}
is-a
{accelerator,
accelerator pedal,
gas pedal, throttle}
has-part
{car window}
has-part
{locomotive, engine,
locomotive engine,
railway locomotive}
is-a
{brake}has-part
{wheel}
has-part
{splasher}
has-part
concepts
semantic relation
WordNet [Miller et al., 1990; Fellbaum, 1998]
10From Text to Concepts and Back: Going
Multilingual with BabelNet in a Step or Two
Roberto Navigli
11. 11
• Playing with senses
• Bla bla bla bla bla bla bla
• Bla bla bla bla bla bla bla
• Bla bla bla bla bla bla bla
• Bla bla bla bla bla bla bla
• Bla bla bla bla bla bla bla
concepts
(unspecified) semantic relation
Wikipedia [The Web Community, 2001-today]
From Text to Concepts and Back: Going
Multilingual with BabelNet in a Step or Two
Roberto Navigli
12. To merge or not to merge?
[Pilehvar and Navigli, ACL 2014]
• Measure the similarity of senses of the same word (but
from different resources)
• If they are similar enough, merge the corresponding two
concepts
WordNetplant#n#1plant#n#1
12From Text to Concepts and Back: Going
Multilingual with BabelNet in a Step or Two
Roberto Navigli
13. Structural Similarity with Personalized PageRank
[Pilehvar and Navigli, ACL 2014]
The resulting probabilities form a semantic signature
14. • We collect lexicalizations, definitions, translations,
images, etc. from each of the merged resources
Merging entries from different resources into BabelNet
14
WordNet
From Text to Concepts and Back: Going
Multilingual with BabelNet in a Step or Two
Roberto Navigli
15. BabelNet: concepts and semantic relations (2)
• We encode knowledge as a labeled directed graph:
– Each vertex is a Babel synset (=synonym set)
– Each edge is a semantic relation between synsets:
• is-a (balloon is-a aircraft)
• part-of (gasbag part-of balloon)
• instance-of (Einstein instance-of physicist)
• …
• unspecified/relatedness (balloon related-to flight)
balloonEN, BallonDE,
aerostatoES, aerostatoIT,
pallone aerostaticoIT,
mongolfièreFR
1530/11/2017From Text to Concepts and Back: Going
Multilingual with BabelNet in a Step or Two
Roberto Navigli
16. Building BabelNet: Translating Babel synsets
1. Exploiting Wikipedia interlanguage links
pallone
aerostatico
globo
aerostàtico
Ballon
1630/11/2017From Text to Concepts and Back: Going
Multilingual with BabelNet in a Step or Two
Roberto Navigli
17. Building BabelNet: Translating Babel synsets
2. Filling the lexical translation gaps using a Machine
Translation system to translate the English lexicalizations of
a concept
• On August 27, 1783 in Paris, Franklin witnessed the
world's first hydrogen [[Balloon (aircraft)|balloon]]
flight.
• Le 27 Août, 1783 à Paris, Franklin vu le premier vol en
ballon d'hydrogène.
Statistical Machine Translation
1730/11/2017From Text to Concepts and Back: Going
Multilingual with BabelNet in a Step or Two
Roberto Navigli
18. What is BabelNet?
• A merger of resources of different kinds:
30/11/2017META Prize 2015: BabelNet
Roberto Navigli
18
19. 30/11/2017 19
• A merger of resources of different kinds:
– WordNet: the most popular computational lexicon of English
– Open Multilingual WordNet: a collection of open wordnets
– WoNeF: a French WordNet
– Wikipedia: the largest collaborative encyclopedia
– Wikidata: the largest collaborative knowledge base
– Wiktionary: the largest collaborative dictionary
– OmegaWiki: a medium-size collaborative multilingual dictionary
– GeoNames: a worldwide geographical database
– FrameNet lexical units
– VerbNet entries
– Microsoft Terminology: a computer science thesaurus
– High-quality automatic sense-based translations
What is BabelNet?
From Text to Concepts and Back: Going
Multilingual with BabelNet in a Step or Two
Roberto Navigli
20. 30/11/2017 20
What is BabelNet?
• A merger of resources of different kinds:
From Text to Concepts and Back: Going
Multilingual with BabelNet in a Step or Two
Roberto Navigli
21. Anatomy of BabelNet
30/11/2017Quando il linguaggio incontra l’informatica
Roberto Navigli
21
lemma lemma language
definitionpicture
further
languages
synonyms
synonyms
in other
languages
22. 30/11/2017 22
Anatomy of BabelNet
pronunciation
definition speech
definitions
From Text to Concepts and Back: Going
Multilingual with BabelNet in a Step or Two
Roberto Navigli
usage examples
generalizations and other relations
23. 30/11/2017 23
Why do we need BabelNet?
• Multilinguality: the same concept is expressed in tens of
languages
From Text to Concepts and Back: Going
Multilingual with BabelNet in a Step or Two
Roberto Navigli
24. 30/11/2017 24
Why do we need BabelNet?
• Multilinguality: the same concept is expressed in tens of
languages
From Text to Concepts and Back: Going
Multilingual with BabelNet in a Step or Two
Roberto Navigli
25. 30/11/2017 25
Why do we need BabelNet?
• Multilinguality: the same concept is expressed in tens of
languages
From Text to Concepts and Back: Going
Multilingual with BabelNet in a Step or Two
Roberto Navigli
26. 30/11/2017 26
Why do we need BabelNet?
• Multilinguality: the same concept is expressed in tens of
languages
• Coverage: 271 languages and 14 million entries!
– 6M concepts and 7.7M named entities
– 119M word senses
– 378M semantic relations (27 relations per concept on avg.)
– 11M images associated with concepts
– 41M textual definitions
– 2M concepts with domains associated
From Text to Concepts and Back: Going
Multilingual with BabelNet in a Step or Two
Roberto Navigli
27. 30/11/2017Multilingual Web Access – WWW 2015
Roberto Navigli
27
Why do we need BabelNet?
• Multilinguality: the same concept is expressed in tens of
languages
• Coverage: 271 languages and 14 million entries!
• Concepts and named entities together: dictionary and
encyclopedic knowledge is semantically interconnected
30/11/2017META Prize 2015: BabelNet
Roberto Navigli
27
28. 30/11/2017Multilingual Web Access – WWW 2015
Roberto Navigli
28
Why do we need BabelNet?
• Multilinguality: the same concept is expressed in tens of
languages
• Coverage: 271 languages and 14 million entries!
• Concepts and named entities together: dictionary and
encyclopedic knowledge is semantically interconnected
• "Dictionary of the future": semantic network structure
with labeled relations, pictures, multilingual synsets
30/11/2017META Prize 2015: BabelNet
Roberto Navigli
28
29. 30/11/2017 29
Why do we need BabelNet?
• Multilinguality: the same concept is expressed in tens of
languages
• Coverage: 271 languages and 14 million entries!
• Concepts and named entities together: dictionary and
encyclopedic knowledge is semantically interconnected
• "Dictionary of the future": semantic network structure
with labeled relations, pictures, multilingual synsets
• Full-fledged taxonomy: is-a relations are available for
both concepts and named entities (Wikipedia Bitaxonomy)
– Ferrari Testarossa is-a sports car
– BabelNet is-a semantic network & encyclopedic dictionary
From Text to Concepts and Back: Going
Multilingual with BabelNet in a Step or Two
Roberto Navigli
30. 30/11/2017 30
Why do we need BabelNet?
• Multilinguality: the same concept is expressed in tens of
languages
• Coverage: 271 languages and 14 million entries!
• Concepts and named entities together: dictionary and
encyclopedic knowledge is semantically interconnected
• "Dictionary of the future": semantic network structure
with labeled relations, pictures, multilingual synsets
• Full-fledged taxonomy: is-a relations are available for
both concepts and named entities (Wikipedia Bitaxonomy)
• Easy access: Java and HTTP RESTful APIs; SPARQL
endpoint (2 billion triples)
From Text to Concepts and Back: Going
Multilingual with BabelNet in a Step or Two
Roberto Navigli
31. The core of the Linguistic
Linked Open Data cloud!
32. 30/11/2017 32
What can we do with BabelNet?
• Search and translate:
From Text to Concepts and Back: Going
Multilingual with BabelNet in a Step or Two
Roberto Navigli
34. What can we do with BabelNet?
• Explore the network:
30/11/2017META Prize 2015: BabelNet
Roberto Navigli
34
35. BabelNet is now live!
• 284 languages
• 15 million concepts and named entities
• 1.8 billion semantic relations
30/11/2017Monolingual and multilingual, latent and
explicit representations of meaning
Roberto Navigli
35
36. Outline of the talk
• Introduction to BabelNet
• Disambiguation and entity linking with Babelfy
• Demos:
– The Babelfy disambiguation system
– Multilingual concept and entity extraction
30/11/2017 36From Text to Concepts and Back: Going
Multilingual with BabelNet in a Step or Two
Roberto Navigli
37. ADDRESSING
AMBIGUITY
[Moro, Raganato & Navigli,
TACL 2014]
37From Text to Concepts and Back: Going
Multilingual with BabelNet in a Step or Two
Roberto Navigli
38. Multilingual Joint Word Sense Disambiguation
(MultiJEDI)
Key Objective 2: use all languages to disambiguate one
3830/11/2017From Text to Concepts and Back: Going
Multilingual with BabelNet in a Step or Two
Roberto Navigli
40. Core step: Connect all candidate meanings
Thomas and Mario are strikers playing in Munich
4030/11/2017From Text to Concepts and Back: Going
Multilingual with BabelNet in a Step or Two
Roberto Navigli
41. Key points:
1. Disambiguate in any language with a multilingual
NLP pre-processing pipeline
2. Do not use training data
3. Use a knowledge-based algorithm that exploits a
processed version of the BabelNet graph
BabelNet: past, present and future
Roberto Navigli
Babelfy
44. Babelfy demo
• Demo with a newspaper article
30/11/2017 44From Text to Concepts and Back: Going
Multilingual with BabelNet in a Step or Two
Roberto Navigli
45. Working with BabelNet saves you a
lot of time
• Annotating with BabelNet implies annotating with
WordNet, Wikipedia, OmegaWiki, Open Multilingual
WordNet, Wikidata, Wiktionary, ...
Key fact!
45
BabelNet
7
45From Text to Concepts and Back: Going
Multilingual with BabelNet in a Step or Two
Roberto Navigli
47. Live demo – Crazy polyglot!
KO 자연 언어 처리, 인공 지능
EN In todayʼs knowledge and information society
FR le paysage lexicographique est plus hétérogène que
jamais.
IT Possono le risorse stand-alone competere
ES con múltiples funciones, portale lexicográficas
multilingüe y servicios web,
ZH Web服务,定 制 的 喜 好 和 个 人 用 户 的 个 人 资 料 ?
30/11/2017 47From Text to Concepts and Back: Going
Multilingual with BabelNet in a Step or Two
Roberto Navigli
48. Same can be done with unstructured tags
30/11/2017 48From Text to Concepts and Back: Going
Multilingual with BabelNet in a Step or Two
Roberto Navigli
49. Same can be done with unstructured tags
30/11/2017BabelNet, the LLOD cloud and the Industry
Roberto Navigli
49
50. Neural Models for Word Sense Disambiguation
(Raganato, Delli Bovi, Navigli, EMNLP 2017)
• Sequence labeling:
30/11/2017 50
words & BabelNet
concepts
From Text to Concepts and Back: Going
Multilingual with BabelNet in a Step or Two
Roberto Navigli
51. Neural Models for Word Sense Disambiguation
(Raganato, Delli Bovi, Navigli, EMNLP 2017)
• Training on English (SemCor sense annotated data)
• Testing on all English Senseval & SemEval test sets
30/11/2017 51From Text to Concepts and Back: Going
Multilingual with BabelNet in a Step or Two
Roberto Navigli
52. Neural Models for Word Sense Disambiguation
(Raganato, Delli Bovi, Navigli, EMNLP 2017)
• Training on English (SemCor sense annotated data)
• Testing on arbitrary languages (!) – SemEval 2013
– Using multilingual embeddings to encode words in the same space
30/11/2017 52From Text to Concepts and Back: Going
Multilingual with BabelNet in a Step or Two
Roberto Navigli
53. The future of BabelNet and related technologies
• The MultiJEDI ERC project is now over (but: the MOUSSE
ERC grant just started – in a couple of slides!)
• However, much work still to be done in this direction
• We created a Sapienza startup, Babelscape, with the key
objective of making BabelNet sustainable
• Income is reinvested in BabelNet and subsequent projects
Multilinguality for free, or why you should care about
linking vector representations to (BabelNet) synsets
Roberto Navigli
54. Demos
• Babelfy DONE
• Multilingual concept and term extraction
30/11/2017 55From Text to Concepts and Back: Going
Multilingual with BabelNet in a Step or Two
Roberto Navigli
55. Demo: multilingual concept and term extraction
• http://babeltex.org
– News in Italian: http://www.repubblica.it
– News in English: http://www.bbc.com/news
– News in Chinese: http://www.bbc.com/zhongwen/simp
30/11/2017 56From Text to Concepts and Back: Going
Multilingual with BabelNet in a Step or Two
Roberto Navigli
56. From MultiJEDI to MOUSSE: a Quantum Leap!
MultiJEDI (2011-16)
Large multilingual repository of
concepts: BabelNet
Disambiguation outputs a bag
of concepts
MOUSSE: Multilingual Open-text Unified Syntax-independent SEmantics
Roberto Navigli
57
MOUSSE (2017-22)
Huge repository of novel structured
universal semantic representations
We will construct a language-
independent structured semantic
representation of the sentence
1
2
1
2
CONCEPTS SENTENCE REPRESENTATIONS
DISAMBIGUATION SEMANTIC PARSING
57. Thanks or…
m i(grazie)
5830/11/2017
MultiJEDI (Starting Grant, 2011-2016) + MOUSSE (Consolidator Grant, 2017-2022)
From Text to Concepts and Back: Going
Multilingual with BabelNet in a Step or Two
Roberto Navigli