Entity Linking meets Word Sense
Disambiguation: a Unified Approach
Paper by: Andrea Moro, Alessandro Raganato, Roberto Navigli
Dipartimento di Informatica, Sapienza Università di Roma
Presentation by: Antonio Quirós
Grupo LaBDA (Laboratorio de Bases de Datos Avanzadas)
Universidad Carlos III de Madrid
Babelfy is a unified, multilingual, graph-based approach to Entity
Linking and Word Sense Disambiguation based on a loose
identification of candidate meanings coupled with a densest subgraph
heuristic which selects high-coherence semantic interpretations.
Babelfy is based on the BabelNet 3.0 multilingual semantic network
and jointly performs disambiguation and entity linking.
Entity Linking: Discovering mentions of entities within a text and
linking them to a Knowledge Base.
Word Sense Disambiguation: Assigning meanings to word
occurrences within a text.
Babelfy combines Entity Linking and Word Sense Disambiguation.
EL & WSD
- Unlike WSD, Babelfy allows overlapping fragments of text,
e.g. “Major League Baseball”:
it identifies and disambiguates several nominal and entity mentions:
“Major League Baseball” - “Major League” - “League” - “Baseball”
- Unlike EL, it links not only named entity mentions (“Major League
Baseball”) but also nominal mentions (“Major League”) to their
corresponding meaning in the Knowledge Base.
Babelfy approach in three steps:
One: Associate each vertex of the Semantic Network with a Semantic
Signature.
Two: Given an input text, extract all the linkable fragments and for
each fragment list the possible meanings according to the Semantic
Network.
Three: Create a graph-based semantic interpretation of the whole text
by linking the candidate meanings of the fragments using the Semantic
Signatures created in the first step, and then, extract a dense subgraph
of this representation and select the best candidate meaning for each
fragment.
(The Semantic Signature is a set of highly related vertices, and it is computed only once.)
(A candidate meaning is either a concept or a named entity.)
(Step Three is the novel part of the approach!)
Step One: (Creating the Semantic Signatures)
Assign higher weights to edges that are involved in more densely
connected areas.
This is accomplished by using “directed triangles” (cycles of length 3):
each edge is weighted by the number of triangles it occurs in.
Step One: (Creating the Semantic Signatures)
weight(v, v') := |{(v, v', v'') : (v, v'), (v', v''), (v'', v) ∈ E}| + 1
(Figure: an example graph over the vertices Football, Ball, Basketball, Field, Sports and Court.)
Step One: (Creating the Semantic Signatures)
weight(Football, Sports) = |{(Football, Sports, Ball), (Football, Sports, Field)}| + 1 = 2 + 1 = 3
(Figure: the example graph, highlighting the two triangles that contain the edge (Football, Sports).)
Step One: (Creating the Semantic Signatures)
(Figure: the example graph with the resulting edge weights; edges that occur in two triangles, such as (Football, Sports), get weight 2 + 1 = 3, while edges occurring in a single triangle get weight 2.)
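As a concrete illustration of the triangle-based weighting, here is a minimal Python sketch; the adjacency-set representation and the toy edge list are assumptions for illustration, not the actual BabelNet graph:

```python
from collections import defaultdict

def triangle_weights(edges):
    """Weight each directed edge (v, v') by the number of directed
    triangles (cycles of length 3) it occurs in, plus one."""
    out_neighbors = defaultdict(set)
    for v, w in edges:
        out_neighbors[v].add(w)

    weights = {}
    for v, v_prime in edges:
        # A triangle through (v, v') is a vertex v'' with
        # (v', v'') and (v'', v) also in the edge set.
        triangles = sum(1 for v2 in out_neighbors[v_prime] if v in out_neighbors[v2])
        weights[(v, v_prime)] = triangles + 1
    return weights

# Toy graph loosely following the slide's example:
edges = [
    ("Football", "Sports"), ("Football", "Ball"), ("Football", "Field"),
    ("Sports", "Ball"), ("Sports", "Field"), ("Sports", "Court"),
    ("Basketball", "Ball"), ("Basketball", "Court"),
]
edges += [(b, a) for a, b in edges]  # make the toy graph symmetric

print(triangle_weights(edges)[("Football", "Sports")])  # -> 3: two triangles (via Ball and Field) + 1
```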
Step One: (Creating the Semantic Signatures)
After assigning weights to each edge, perform a Random Walk with
Restart (RWR) to create the Semantic Signature: a set of highly related
vertices.
For a fixed number of steps, run an RWR from every vertex v of the
Semantic Network and keep track of the encountered vertices; then eliminate
weakly related vertices, keeping only those that were hit at least
η times.
Finally, return the remaining vertices as semSign_v, the Semantic Signature of v.
Step One: (Creating the Semantic Signatures)

input: v, the starting vertex; α, the restart probability;
       n, the number of steps to be executed; P, the transition probabilities;
       η, the frequency threshold.
output: semSign_v, the set of related vertices for v.

function RWR(v, α, n, P, η)
    v' := v
    counts := new Map<Synset, Integer>
    while n > 0 do
        if random() > α then
            choose a random neighbor v'' of v' according to
            the transition probabilities P(·|v')
            v' := v''
            counts[v']++
        else
            restart the walk
            v' := v
        n := n - 1
    for each v' in counts.keys() do
        if counts[v'] < η then
            remove v' from counts.keys()
    return semSign_v = counts.keys()

The transition probabilities are the normalized edge weights:
P(v' | v) = weight(v, v') / Σ_{v'' ∈ V} weight(v, v'')
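A runnable Python sketch of the RWR procedure above, assuming the graph is given as a dictionary of weighted out-neighbors; the default parameter values are illustrative, not the ones used in the paper:

```python
import random
from collections import defaultdict

def rwr_semantic_signature(v_start, out_weights, alpha=0.15, n_steps=100_000, eta=100):
    """Random Walk with Restart from v_start.

    out_weights maps each vertex to a dict {neighbor: edge weight};
    alpha is the restart probability, eta the frequency threshold.
    Returns semSign(v_start): the vertices hit at least eta times."""
    counts = defaultdict(int)
    v = v_start
    for _ in range(n_steps):
        neighbors = out_weights.get(v, {})
        if neighbors and random.random() > alpha:
            # Move to a random neighbor v'' chosen with probability
            # P(v'' | v) = weight(v, v'') / sum of weights of v's out-edges.
            v = random.choices(list(neighbors), weights=list(neighbors.values()), k=1)[0]
            counts[v] += 1
        else:
            # Restart the walk from the starting vertex.
            v = v_start
    return {u for u, c in counts.items() if c >= eta}
```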
Step Two: (Candidate Identification)
Using part-of-speech tagging, identify the set F of all textual fragments
that contain at least one noun and are substrings of lexicalizations in
BabelNet.
For each f ∈ F, look for its candidate meanings cand(f): the vertices
having f or, only for named entities, a superstring of f as one of their
lexicalizations.
Babelfy uses a loose candidate identification based on superstring
matching, instead of exact matching.
Step Two: (Candidate Identification)
Example for the word “Sports”. Candidates:
- Sports (a vertex containing f as a lexicalization)
- Water sports (a vertex having a superstring of f as one of its lexicalizations)
- ...
- Skateboarding, whose senses include {…, Extreme Sports, …} (again a superstring of f)
- ...
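A minimal sketch of this loose candidate identification; `lexicalizations` and `is_named_entity` are stand-ins for the corresponding BabelNet lookups, not actual API calls:

```python
def candidates(fragment, lexicalizations, is_named_entity):
    """cand(f): synsets having the fragment as a lexicalization (exact match),
    plus, for named entities only, synsets with a superstring of it."""
    frag = fragment.lower()
    cands = set()
    for synset, lexs in lexicalizations.items():
        for lex in lexs:
            lex_l = lex.lower()
            if lex_l == frag:
                cands.add(synset)            # exact match
            elif frag in lex_l and is_named_entity(synset):
                cands.add(synset)            # loose, superstring match
    return cands

# Toy example mirroring the slide:
lexicalizations = {
    "Sports": ["sports"],
    "Water_sports": ["water sports"],
    "Skateboarding": ["skateboarding", "extreme sports"],
}
print(candidates("sports", lexicalizations, is_named_entity=lambda s: True))
```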
Step Three: (Candidate Disambiguation)
Create a directed graph G_I = (V_I, E_I) of the semantic interpretations of
the input text.
V_I contains all candidate meanings of all fragments:
V_I := {(v, f) : v ∈ cand(f), f ∈ F}
E_I connects two candidate meanings of different fragments if one is in
the semantic signature of the other:
add an edge from (v, f) to (v', f') iff f ≠ f' and v' ∈ semSign_v.
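A short sketch of this graph construction over plain Python sets (the helper name and data layout are illustrative):

```python
def build_interpretation_graph(fragments, cand, sem_sign):
    """Build the semantic interpretation graph G_I = (V_I, E_I).

    fragments: the list F; cand: fragment -> set of candidate meanings;
    sem_sign: meaning -> set of related meanings (its semantic signature)."""
    V = {(v, f) for f in fragments for v in cand[f]}
    # Edge from (v, f) to (v2, f2) iff the fragments differ and v2 is in semSign(v).
    E = {((v, f), (v2, f2))
         for (v, f) in V for (v2, f2) in V
         if f != f2 and v2 in sem_sign.get(v, ())}
    return V, E
```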
Step Three: (Candidate Disambiguation)
Once G_I (the graph representation of all the possible
interpretations) has been created, apply the densest-subgraph heuristic.
The result is a subgraph which contains the semantic
interpretations that are most coherent with each other. But this subgraph
might still contain multiple interpretations for the same fragment.
So, the final step is to select the most suitable candidate meaning for
each fragment f, given a threshold to discard semantically unrelated
candidate meanings.
Step Three: (Candidate Disambiguation)

input: F, the fragments in the input text; semSign, the semantic signatures;
       µ, the ambiguity level to be reached; cand, from fragments to candidate meanings.
output: selected, the disambiguated fragments.

function DISAMB(F, semSign, µ, cand)
    V_I := ∅; E_I := ∅
    G_I := (V_I, E_I)
    for each fragment f ∈ F do
        for each candidate v ∈ cand(f) do
            V_I := V_I ∪ {(v, f)}
    for each ((v, f), (v', f')) ∈ V_I × V_I do
        if f ≠ f' and v' ∈ semSign_v then
            E_I := E_I ∪ {((v, f), (v', f'))}
    G_I* := DENSSUB(F, cand, G_I, µ)        (the function with the novel approach!)
    selected := new Map<String, Synset>
    for each f ∈ F s.t. ∃(v, f) ∈ V_I* do
        cand*(f) := {v : (v, f) ∈ V_I*}
        v* := argmax_{v ∈ cand*(f)} score((v, f))
        if score((v*, f)) ≥ θ then
            selected(f) := v*
    return selected
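The same algorithm rendered as a compact Python sketch, assuming the `build_interpretation_graph` helper above and the `score` and `denssub` functions sketched further below:

```python
def disamb(fragments, cand, sem_sign, mu, theta):
    """Select one meaning per fragment: build G_I, prune it to a dense
    subgraph, then keep the best-scoring candidate if it clears theta."""
    V, E = build_interpretation_graph(fragments, cand, sem_sign)
    V_star, E_star = denssub(fragments, cand, (V, E), mu)   # densest-subgraph heuristic
    selected = {}
    for f in fragments:
        surviving = [v for (v, f2) in V_star if f2 == f]
        if not surviving:
            continue
        scores = {v: score((v, f), V_star, E_star, fragments) for v in surviving}
        best = max(scores, key=scores.get)
        if scores[best] >= theta:
            selected[f] = best
    return selected
```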
Step Three: (Candidate Disambiguation)
Let's see an example:
“The leaf is falling from the tree on my head”
- Leaf has many candidate meanings.
- falling also has many candidate meanings.
- tree also has many candidate meanings.
And, as you might have guessed...
- Head also has many candidate meanings.
Step Three: “The leaf is falling from the tree on my head”
(Generate a graph representation with all possible meanings: each candidate (v, f) from cand(f) is shown with its semantic signature SemSign_v.)
( Leaf, leaf ): Fall, Woods, Tree, Forest, Flora
( Leaf (Book), leaf ): Text, Side, Right, Left, Book, Novel
( Nissan Leaf, leaf ): Car, Motor, Vehicle, Japan, Tree
( Leaf (Japanese Co.), leaf ): Games, Visual Novel, Publisher
( Leaf (Band), leaf ): Music, Pop, Dutch, Falling (Song)
( Fall, falling ): Physics, Descend, Sky, High
( Falling (Song), falling ): Music, Alicia Keys, Album
( Falling (Accident), falling ): Pain, Hit, Push, Trauma
( Falling (Movie), falling ): Action, Hollywood, Cinema
( Tree, tree ): Nature, Fall, Earth, Oxygen, Leaf
( Tree (Album), tree ): Music, Disc, Record, Rock
( Tree (Data Structure), tree ): Leaf, Storage, Father, Son, Binary
( Tree (Graph Theory), tree ): Node, Euler, Binary, Math, Path
( Head, head ): Body, Anatomy, Falling (Accident)
( Mind, head ): Thoughts, Feelings, Reason
( Leader, head ): Guide, Group, Team, Boss
( Header, head ): Book, Text, Paragraph, Novel
Step Three: (Candidate Disambiguation)
Following the algorithm, create an edge between two vertices if and only
if they do not belong to the same fragment and one is part of the
Semantic Signature of the other.
Step Three: “The leaf is falling from the tree on my head”
(Figure: the same candidates and semantic signatures as above, now with a directed edge between candidates of different fragments whenever one appears in the other's semantic signature.)
Step Three:
Apply the densest-subgraph heuristic to obtain a subgraph which contains those
semantic interpretations that are most coherent with each other:
DENSSUB(F, cand, G_I, µ)
We'll come back to it later...
Step Three: “The leaf is falling from the tree on my head”
(The graph after the densest-subgraph step; let's assume this is the output of the blackbox.)
( Leaf, leaf ): Fall, Woods, Tree, Forest, Flora
( Leaf (Band), leaf ): Music, Pop, Dutch, Falling (Song)
( Fall, falling ): Physics, Descend, Sky, High
( Falling (Accident), falling ): Pain, Hit, Push, Trauma, Tree
( Tree, tree ): Nature, Root, Earth, Oxygen, Fall
( Tree (Data Structure), tree ): Leaf, Storage, Father, Son, Binary
( Head, head ): Body, Anatomy, Falling (Accident)
( Header, head ): Book, Text, Paragraph, Novel
Step Three:
Then we have to select the most suitable candidate meaning for each fragment f.
We use a given threshold θ to discard semantically unrelated candidates.
For each fragment f, we compute the score of each candidate for that fragment and
keep those candidates whose score is higher than θ.

score((v, f)) = w(v, f) · deg((v, f)) / Σ_{v' ∈ cand(f)} w(v', f) · deg((v', f))

w(v, f) := |{f' ∈ F : ∃ v' s.t. ((v, f), (v', f')) ∈ E_I or ((v', f'), (v, f)) ∈ E_I}| / (|F| - 1)

deg((v, f)) is the overall number of incoming and outgoing edges:
deg(v) := deg⁺(v) + deg⁻(v)
Step Three:
In other words: we compute the score for each meaning by calculating its normalized
weighted degree.
Calculate the weight of the meaning, multiply it by its degree, and divide by the
sum of the same weight-times-degree product over all candidates for that fragment.
The weight is calculated as the fraction of fragments the candidate meaning v connects
to. In other words, count the number of fragments the vertex v connects to and divide it
by the number of fragments minus one.
(Fragments, not vertices: if the vertex v connects to v' and v'' and they both
belong to the same fragment, they count as one.)
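A sketch of these weight, degree and score computations over the vertex/edge sets from the earlier construction sketch (the function names are illustrative):

```python
def weight(vf, V, E, fragments):
    """w(v, f): fraction of the other fragments that (v, f) has an edge to or from."""
    _, f = vf
    connected = set()
    for (src, dst) in E:
        if src == vf:
            connected.add(dst[1])   # fragment of the target
        elif dst == vf:
            connected.add(src[1])   # fragment of the source
    connected.discard(f)            # edges never stay within a fragment, but be safe
    return len(connected) / (len(fragments) - 1)

def degree(vf, E):
    """deg((v, f)): number of incoming plus outgoing edges of (v, f)."""
    return sum(1 for (src, dst) in E if src == vf or dst == vf)

def score(vf, V, E, fragments):
    """Normalized weighted degree of (v, f) among the candidates of its fragment."""
    _, f = vf
    same_fragment = [u for u in V if u[1] == f]
    denom = sum(weight(u, V, E, fragments) * degree(u, E) for u in same_fragment)
    if denom == 0:
        return 0.0
    return weight(vf, V, E, fragments) * degree(vf, E) / denom
```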
Step Three: “The leaf is falling from the tree on my head”
(Same reduced graph as above.)
Let's compute the weight of ( Leaf, leaf ):
the number of fragments ( Leaf, leaf ) is linked to, divided by
the number of fragments minus one:
w((Leaf, leaf)) = |{Fall, Tree}| / (4 - 1) = 2/3
Step Three: “The leaf is falling from the tree on my head”
(Same reduced graph as above.)
And the degree of ( Leaf, leaf ) is
the number of incoming and
outgoing edges:
deg((Leaf, leaf)) = 3
Step Three:
For our example, the slides show the computed weights and degrees in a table.
Step Three:
Now we can calculate the score for every candidate meaning:
for each candidate, multiply its weight by its degree (w·d);
then, again for each candidate, divide w·d by the sum of all w·d for that fragment.
For example, for ( Leaf, leaf ):
weight((Leaf, leaf)) = 2/3
degree((Leaf, leaf)) = 4
w·d = 8/3
sum of all w·d for that specific fragment (leaf) = 8/3
score((Leaf, leaf)) = (8/3) / (8/3) = 1.000
Step Three:
For our example, the slides show the computed scores in a table.
Step Three:
Finally, we link each fragment with its highest-ranking candidate meaning v* if its score
is higher than the fixed threshold.
For our example, with a threshold of 0.7,
we keep:
Leaf (plant)
Fall
Tree
Head (as body part)
Which is correct.
Densest Sub-Graph
DENSSUB(F, cand, G_I, µ)
Back to the blackbox !!
Densest Sub-Graph
This is an approach to drastically reduce the level of ambiguity of the initial semantic
interpretation graph.
It is based on the assumption that the most suitable meanings of each text fragment will
belong to the densest area of the graph.
Identifying the densest subgraph of size at least k is NP-hard, so Babelfy uses a heuristic
for k-partite graphs inspired by a 2-approximation greedy algorithm for arbitrary graphs.
Babelfy's strategy is based on the iterative removal of low-coherence vertices.
Densest Sub-Graph
First, start with the initial semantic interpretation graph G_I^(0) at step 0.
At each step, identify the most ambiguous fragment f_max (the one with the maximum
number of candidate meanings).
Then, discard the weakest interpretation of the current fragment f_max. This is done by
determining the lexical and semantic coherence of each candidate meaning using the
score formula shown before.
The vertex with the minimum score is removed from the graph.
Densest Sub-Graph
Then, in the next step, repeat the low-coherence removal, and stop when the
number of remaining candidates for each fragment is at most the ambiguity threshold µ.
During each iteration, compute the average degree of the current step's graph, and keep
the densest subgraph of the initial semantic interpretation graph, which is the one that
maximizes the average degree.
Densest Sub-Graph

input: F, the set of all fragments in the input text;
       cand, from fragments to candidate meanings;
       G_I^(0), the full semantic interpretation graph; µ, the ambiguity level to be reached.
output: G_I*, a dense subgraph.

function DENSSUB(F, cand, G_I^(0), µ)
    t := 0
    G_I* := G_I^(0)
    while true do
        f_max := argmax_{f ∈ F} |{v : ∃(v, f) ∈ V_I^(t)}|
        if |{v : ∃(v, f_max) ∈ V_I^(t)}| ≤ µ then
            break
        v_min := argmin_{v ∈ cand(f_max)} score((v, f_max))
        V_I^(t+1) := V_I^(t) \ {(v_min, f_max)}
        E_I^(t+1) := E_I^(t) ∩ (V_I^(t+1) × V_I^(t+1))
        G_I^(t+1) := (V_I^(t+1), E_I^(t+1))
        if avgdeg(G_I^(t+1)) > avgdeg(G_I*) then
            G_I* := G_I^(t+1)
        t := t + 1
    return G_I*
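A Python sketch of this heuristic, reusing the `score` helper from the earlier scoring sketch; `avgdeg` here is simply the average of incoming plus outgoing degrees:

```python
def avgdeg(V, E):
    """Average (incoming plus outgoing) degree of the graph."""
    return 2 * len(E) / len(V) if V else 0.0

def denssub(fragments, cand, graph, mu):
    """Iteratively drop the weakest candidate of the most ambiguous fragment,
    keeping the iteration whose graph has the highest average degree.
    (cand is kept only to mirror the pseudocode's signature.)"""
    V, E = set(graph[0]), set(graph[1])
    best_V, best_E = set(V), set(E)
    while True:
        # Most ambiguous fragment: the one with the most surviving candidates.
        per_fragment = {f: [v for (v, f2) in V if f2 == f] for f in fragments}
        f_max = max(per_fragment, key=lambda f: len(per_fragment[f]))
        if len(per_fragment[f_max]) <= mu:
            break
        # Remove its lowest-scoring candidate meaning.
        v_min = min(per_fragment[f_max],
                    key=lambda v: score((v, f_max), V, E, fragments))
        V = V - {(v_min, f_max)}
        E = {(a, b) for (a, b) in E if a in V and b in V}
        if avgdeg(V, E) > avgdeg(best_V, best_E):
            best_V, best_E = set(V), set(E)
    return best_V, best_E
```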
Links
Reference paper about Babelfy:
A. Moro, A. Raganato, R. Navigli. Entity Linking meets Word Sense
Disambiguation: a Unified Approach. Transactions of the Association for
Computational Linguistics (TACL), 2, pp. 231-244, 2014.
http://wwwusers.di.uniroma1.it/~navigli/pubs/TACL_2014_Babelfy.pdf
Babelfy website
http://babelfy.org/
BabelNet website
http://babelnet.org/
Grupo LaBDA
http://labda.inf.uc3m.es/