PhD Day – 04/2014
Bianca Pereira
The PhD Route
Outline
Literature Review
Define the PhD topic
DEFINING THE TOPIC
Entity Linking is..
“Grounding entity mentions in documents to
Knowledge Base entries”
- TAC-KBP 2009
Entity Resolution
http://en.wikipedia.org/wiki/The_Guardian
http://en.wikipedia.org/wiki/National_Security_Agency
http://en.wikipedia.org/wiki/British_people
http://en.wikipedia.org/wiki/Edward_snowden
PROBLEM SEEKING
Types of Entity
Domains of Knowledge
Methods
Accuracy
Time
Types of Entity
Named Entities, Unnamed Entities
Topics, Classes
Natural Language Processing
Statistics
Entity Linking
Domains of Knowledge
Methods
EVERYTHING!
Natural Language Processing
Statistics
Entity Linking
PROBLEM DEFINITION
Types of Entity
Named Entities: Given by Class
Given by Knowledge Base
Others
Domains of Knowledge
Cross-domain Knowledge Base
Methods
“(…) Collective Inference over a set of entities can lead
to better performance.”
- Stoyanov et al 2012
Named Entity Recognition → Disambiguation
http://en.wikipedia.org/wiki/Michael_Jackson
http://en.wikipedia.org/wiki/Popular_music
http://en.wikipedia.org/wiki/Beat_It
http://en.wikipedia.org/wiki/Billie_Jean
http://en.wikipedia.org/wiki/Thriller_(song)
Collective Inference algorithms are used
for Disambiguation
[Figure: a mention and its candidate entities URI1-URI10 in the Knowledge Base]
A local context is used to compute the
mention-candidate score
[Figure: local-context scoring of one mention-candidate pair]
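A minimal sketch of what such a local score could look like, assuming a simple token overlap between the mention's surrounding text and a candidate description (the second URI and all texts are hypothetical; the cited systems use richer features):

```python
# A minimal sketch (hypothetical data): score a mention-candidate pair by
# token overlap between the mention's local context and a short candidate
# description taken from the Knowledge Base.

def local_score(context_tokens, candidate_description):
    """Jaccard overlap between the mention context and the candidate text."""
    context = {t.lower() for t in context_tokens}
    candidate = set(candidate_description.lower().split())
    if not context or not candidate:
        return 0.0
    return len(context & candidate) / len(context | candidate)

# Example: the mention "Jackson" in a music-related sentence.
context = ["the", "singer", "released", "a", "new", "pop", "album"]
candidates = {
    "http://en.wikipedia.org/wiki/Michael_Jackson": "american singer pop music artist",
    "http://en.wikipedia.org/wiki/Jackson,_Mississippi": "capital city of the state of mississippi",
}
for uri, description in candidates.items():
    print(uri, round(local_score(context, description), 3))
```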
There is coherence between
entities in the same document.
[Figure: coherence edges among the candidate entities (URI1-URI10) of the mentions in a document]
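As an illustration of such a coherence signal, the sketch below computes a Milne and Witten (2008)-style link relatedness from toy in-link sets; the in-link data and the total entity count are made up for the example:

```python
import math

def relatedness(inlinks_a, inlinks_b, total_entities):
    """Milne & Witten (2008)-style relatedness between two entities,
    computed from the sets of KB pages that link to each of them."""
    common = inlinks_a & inlinks_b
    if not common:
        return 0.0
    a, b = len(inlinks_a), len(inlinks_b)
    return 1.0 - (
        (math.log(max(a, b)) - math.log(len(common)))
        / (math.log(total_entities) - math.log(min(a, b)))
    )

# Toy in-link sets (hypothetical): related entities share incoming links.
inlinks = {
    "URI1": {"p1", "p2", "p3", "p4"},
    "URI2": {"p2", "p3", "p5"},
    "URI3": {"p9"},
}
print(relatedness(inlinks["URI1"], inlinks["URI2"], total_entities=10_000))
print(relatedness(inlinks["URI1"], inlinks["URI3"], total_entities=10_000))
```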
Disambiguation using collective
inference is an NP-hard problem.
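A rough sketch of why exact (exhaustive) collective inference is intractable: the joint assignment space is the Cartesian product of the mentions' candidate sets, so it grows exponentially with the number of mentions. The scoring functions here are placeholders, not any specific system:

```python
from itertools import product

# Exhaustive collective inference: score every joint assignment of candidates
# to mentions and keep the best one. Exponential in the number of mentions.

def best_joint_assignment(candidate_sets, local_score, coherence):
    best, best_value = None, float("-inf")
    for assignment in product(*candidate_sets):   # |C1| * |C2| * ... states
        value = sum(local_score(c) for c in assignment)
        value += sum(coherence(a, b)
                     for i, a in enumerate(assignment)
                     for b in assignment[i + 1:])
        if value > best_value:
            best, best_value = assignment, value
    return best, best_value

# Tiny example with toy scores: two mentions, two candidates each.
local = {"URI1": 0.7, "URI2": 0.3, "URI6": 0.5, "URI7": 0.4}.get
pair = lambda a, b: 1.0 if {a, b} == {"URI1", "URI6"} else 0.0
print(best_joint_assignment([["URI1", "URI2"], ["URI6", "URI7"]], local, pair))

# The state space explodes quickly: 10 mentions x 23 candidates each.
print(f"{23 ** 10:.2e} joint assignments")
```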
[Figure: the candidate space pruned from 230 candidates to 24 candidates]
“The number of contexts [entities] is
overwhelming and had to be reduced to
a manageable size.”
- Cucerzan 2007
“Much speed is gained by imposing a
threshold below which all senses
[candidates] are discarded”
- Milne and Witten 2008
“Inference is NP Hard”
- Kulkarni et al 2009
“(…) exact algorithms on large
input graphs are infeasible.”
- Hoffart et al 2011
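The thresholding strategy quoted above (Milne and Witten 2008) can be sketched as discarding candidates whose prior score falls below a cut-off; the scores and the 0.05 threshold are purely illustrative:

```python
# A minimal sketch of candidate-space pruning at disambiguation time: keep
# only candidates whose prior probability (commonness of the surface form)
# is above a threshold. Numbers are illustrative, not from any system.

def prune(candidates, threshold=0.05):
    """candidates: dict mapping URI -> prior probability for one mention."""
    return {uri: p for uri, p in candidates.items() if p >= threshold}

candidates = {"URI1": 0.62, "URI2": 0.21, "URI3": 0.09, "URI4": 0.05,
              "URI5": 0.02, "URI6": 0.01}
pruned = prune(candidates)
print(len(candidates), "->", len(pruned), "candidates")   # 6 -> 4
```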
Collective Inference - Accuracy
Collective Inference - Time
Using approximation algorithms, the
time becomes suitable for the task
Methods
Recalling
Given by Knowledge Base
Cross-domain Knowledge Base
~ 5 MILLION entities
~ 10 MILLION entities
~ 43 MILLION entities
Problem Statement
The time spent on disambiguation for Entity Linking
increases with the size of the Knowledge Base. This
makes disambiguation with large Knowledge Bases
infeasible.
RELATED WORK
Two solutions for the problem..
1.  Approximation Algorithms
2.  Dimensionality Reduction
Approximation Algorithms
Kulkarni et al 2009, Hoffart et al 2011
Dimensionality Reduction
[Figure: dimensionality reduction of the candidate space from 230 candidates to 24 candidates]
Cucerzan 2007, Milne and Witten 2008, Hoffart et al 2011
Dimensionality Reduction (candidate space)
[Figure: pipeline from Knowledge Base to Algorithm; in related work, dimensionality reduction is applied to the candidate space inside the Algorithm, not to the Knowledge Base]
RESEARCH QUESTIONS
R1. Is it possible to delimit a feasible maximum
amount of time for disambiguation regardless of
the size of the Knowledge Base?
R2. Is it possible to reduce the dimensionality
directly in the Knowledge Base?
R3. Is it feasible to use exact algorithms for
disambiguation using large Knowledge Bases?
HYPOTHESES
R1. Is it possible to delimit a feasible
maximum amount of time for disambiguation
regardless of the size of the Knowledge
Base?
H1. There is a maximum size of candidate
set that allows disambiguation in a feasible
time.
R1. Is it possible to delimit a feasible
maximum amount of time for disambiguation
regardless of the size of the Knowledge
Base?
H2. If the Knowledge Base can be divided
into subsets of constant ambiguity then the
candidate space is constant.
R1. Is it possible to delimit a feasible
maximum amount of time for disambiguation
regardless of the size of the Knowledge
Base?
Subsets of constant ambiguity → constant candidate space
Candidate space = maximum allowed size → feasible time
R2. Is it possible to reduce the
dimensionality directly in the Knowledge
Base?
H3. The relatedness between entities is a
sufficient condition to reduce the
dimensionality without loss of accuracy.
R3. Is it feasible to use exact algorithms
for disambiguation using large Knowledge
Bases?
H4. Decreasing the ambiguity in the
Knowledge Base is less time consuming
than performing it at disambiguation time.
R3. Is it feasible to use exact algorithms
for disambiguation using large Knowledge
Bases?
H5. Exact algorithms can be used in a
feasible time up to a maximum size of
candidate space.
PROPOSED SOLUTION
Ontology Modularization for
Disambiguation in Entity Linking
Ontology Modularization
How to Generate the Modules?
Semantic-Driven Strategies
Depends on the Application.
Structure-Driven Strategies
Graph Decomposition based on inter-relation.
Machine Learning Strategies
Data Mining and Clustering.
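As an example of the structure-driven strategy, the following sketch decomposes a toy entity-relation graph into modules by taking connected components; the graph data is hypothetical, and a real modularization would exploit the ontology structure:

```python
from collections import defaultdict

# A minimal structure-driven sketch: modules are the connected components of
# the entity-relation graph (toy nodes and edges, for illustration only).

def connected_components(edges, nodes):
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen, modules = set(), []
    for start in nodes:
        if start in seen:
            continue
        module, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            module.add(node)
            stack.extend(graph[node] - seen)
        modules.append(module)
    return modules

nodes = [f"URI{i}" for i in range(1, 11)]
edges = [("URI1", "URI2"), ("URI2", "URI3"), ("URI4", "URI5"), ("URI6", "URI7")]
print(connected_components(edges, nodes))
```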
EVALUATION
H1. There is a maximum size of candidate set that
allows disambiguation in a feasible time.
Perform an experiment using different
collective inference approaches to discover
how the time increases with the size of the
candidate set.
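A possible shape for this experiment (illustrative only, with a random stand-in for the coherence function) is to time an exhaustive collective-inference step while the candidate set size grows:

```python
import random
import time
from itertools import product

# Time exact (exhaustive) collective inference for growing candidate sets.
# The coherence function is a random stand-in, not a real relatedness measure.

def exact_inference(candidate_sets, coherence):
    best, best_value = None, float("-inf")
    for assignment in product(*candidate_sets):
        value = sum(coherence(a, b)
                    for i, a in enumerate(assignment)
                    for b in assignment[i + 1:])
        if value > best_value:
            best, best_value = assignment, value
    return best

coherence = lambda a, b: random.random()
for size in (2, 4, 6, 8):
    candidate_sets = [list(range(size)) for _ in range(5)]   # 5 mentions
    start = time.perf_counter()
    exact_inference(candidate_sets, coherence)
    print(size, "candidates per mention:",
          round(time.perf_counter() - start, 3), "s")
```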
H2. If the Knowledge Base can be divided in
subsets of constant ambiguity then the candidate
space is constant.
Perform Ontology Modularization
aiming at a maximum ambiguity in each
module.
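One simple way to prototype this, assuming a surface-form index and a greedy assignment policy (both hypothetical), is to cap the number of candidates a surface form may have inside any single module:

```python
# A minimal sketch: split a surface-form index into modules so that, inside
# each module, no surface form has more than MAX_AMBIGUITY candidates.
# Entities are assigned greedily; a real approach would use the ontology.

MAX_AMBIGUITY = 2

def modularize(surface_index):
    """surface_index: dict mapping surface form -> list of candidate URIs."""
    modules = []                     # each module: dict surface form -> URIs
    for form, uris in surface_index.items():
        for uri in uris:
            placed = False
            for module in modules:
                if len(module.get(form, [])) < MAX_AMBIGUITY:
                    module.setdefault(form, []).append(uri)
                    placed = True
                    break
            if not placed:
                modules.append({form: [uri]})
    return modules

index = {"Jackson": ["URI1", "URI2", "URI3", "URI4", "URI5"],
         "Thriller": ["URI6", "URI7"]}
for i, module in enumerate(modularize(index), 1):
    print("module", i, module)
```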
H3. The relatedness between entities is a sufficient
condition to reduce the dimensionality without loss
of accuracy.
Generate the module based on the
same relatedness measure used by the
original method and verify the accuracy.
H4. Decreasing the ambiguity in the Knowledge
Base is less time consuming than performing it at
disambiguation time.
Measure the time for disambiguation when
reducing the dimensionality at disambiguation
time versus using the Modularization approach.
H5. Exact algorithms can be used in a feasible
time up to a maximum size of candidate space.
Select a set of exact algorithms and
measure the time for different sizes of
candidate space.
Next Steps
Doctoral Consortium
TAC-KBP
First Experiments
Use Cases
Thank you!
Bianca Pereira
bianca.pereira@insight-centre.org

PhD Day: Entity Linking using Ontology Modularization