Type-Aware Entity Retrieval

Darío Garigliotti
University of Stavanger
Motivation
∎ One of the unique characteristics of entity retrieval is that entities are typed.
∎ Typically, types are organized hierarchically in a type categorization system.
∎ We identify three main dimensions along which to study the use of entity type information:
⋆ RQ1: How do the retrieval approaches perform across different type taxonomies?
⋆ RQ2: How to represent the type information provided by the type hierarchy?
⋆ RQ3: How to combine type-based and text-based information in retrieval?
Type Taxonomies
We normalize four type systems to a uniform taxonomy structure:
DBpedia Ontology
∎ A well-designed hierarchy.
∎ Created manually by considering the most frequently used infoboxes in Wikipedia.
∎ Clean and consistent, but with limited coverage.
[Figure: DBpedia Ontology hierarchy, levels 0 to 7. |Level 1| = 58 types; |Level 7| = 1 type.]
Freebase Types
∎ A two-layer categorization system: types and domains.
∎ Entities are assigned only to types; most entities have “same as” links to DBpedia entities.
[Figure: Freebase hierarchy, levels 0 to 2. |Level 2| = 1,626 types.]
Wikipedia Categories
∎ It consists of textual labels known as categories.
∎ It is not a well-defined “is-a” hierarchy, but a graph: it requires a major normalization strategy.
∎ Category assignments are neither consistent nor complete.
[Figure: Wikipedia category hierarchy, levels 0 to 34. |Level 2 ∪ ... ∪ Level 10| = 121,657 types; |Level 11 ∪ ... ∪ Level 24| = 410,697 types; |Level 25 ∪ ... ∪ Level 34| = 14,564 types.]
YAGO Types
∎ A deep subsumption hierarchy.
∎ Constructed by taking leaf categories from Wikipedia categories and then using WordNet synsets to establish the hierarchy.
[Figure: YAGO hierarchy, levels 0 to 19. |Level 2 ∪ ... ∪ Level 5| = 80,384 types; |Level 6 ∪ ... ∪ Level 10| = 461,843 types; |Level 11 ∪ ... ∪ Level 19| = 26,383 types.]
Type Representations
We propose three representations of hierarchical type information:
Types along path to the top
[Diagram: example type hierarchy with types t0 to t12 and an entity e.]
Top-level types
[Diagram: example type hierarchy with types t0 to t12 and an entity e.]
Most specific types
[Diagram: example type hierarchy with types t0 to t12 and an entity e.]
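The three representations can be illustrated on a toy taxonomy. The following is a minimal sketch, not the authors' implementation: the taxonomy, the `PARENT` map, and all function names are hypothetical, and it assumes that "top-level" means the types directly below the root and that "most specific" means the assigned types that are not ancestors of any other assigned type.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy taxonomy (child -> parent); "t0" is the root at level 0.
PARENT: Dict[str, Optional[str]] = {
    "t0": None,
    "t1": "t0", "t2": "t0",
    "t3": "t1", "t4": "t2",
    "t5": "t3",
}


def path_to_top(t: str) -> List[str]:
    """Return t followed by its ancestors, excluding the root (assumption)."""
    path: List[str] = []
    current: Optional[str] = t
    while current is not None and PARENT[current] is not None:
        path.append(current)
        current = PARENT[current]
    return path


def types_along_path(assigned: Set[str]) -> Set[str]:
    """Assigned types plus all of their ancestors up to the top level."""
    result: Set[str] = set()
    for t in assigned:
        result.update(path_to_top(t))
    return result


def top_level_types(assigned: Set[str]) -> Set[str]:
    """Only the level-1 ancestors of the assigned types."""
    return {path_to_top(t)[-1] for t in assigned if path_to_top(t)}


def most_specific_types(assigned: Set[str]) -> Set[str]:
    """Assigned non-root types that are not an ancestor of another assigned type."""
    ancestors: Set[str] = set()
    for t in assigned:
        ancestors.update(path_to_top(t)[1:])
    return {t for t in assigned if PARENT[t] is not None and t not in ancestors}


entity_types = {"t3", "t4", "t5"}  # hypothetical type assignments for an entity e
along_path = types_along_path(entity_types)
top_level = top_level_types(entity_types)
most_specific = most_specific_types(entity_types)
```

Under these assumptions, the entity's path representation covers every type from its assigned types up to level 1, the top-level representation keeps only `t1` and `t2`, and the most specific representation keeps only `t4` and `t5`.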
Type Information in Retrieval
We define the retrieval task in a generative probabilistic framework.
Both the query and the entity are considered in the term space as well as in the type space.
An oracle process can provide the target types for the query from its relevant results.
[Diagram: a query ("Olympic games") with its target types and an entity ("Rio de Janeiro") with its entity types, connected by term-based similarity and type-based similarity.]
(Strict) Filtering
$$P(q \mid e) = P(\theta^{T'}_q \mid \theta^{T'}_e) \cdot \chi[\mathrm{types}(q) \cap \mathrm{types}(e) \neq \emptyset]$$
[Diagram: overlap between Types(q) and Types(e).]
(Soft) Filtering
$$P(q \mid e) = P(\theta^{T'}_q \mid \theta^{T'}_e) \cdot P(\theta^{T}_q \mid \theta^{T}_e)$$
Interpolation
$$P(q \mid e) = (1 - \lambda) \cdot P(\theta^{T'}_q \mid \theta^{T'}_e) + \lambda \cdot P(\theta^{T}_q \mid \theta^{T}_e)$$
The type weight λ takes values in [0, 1] in steps of 0.05. We use the best-performing setting when comparing against other approaches.
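The three ways of combining the two similarities can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the term-based and type-based probabilities are taken as precomputed inputs, since the poster does not specify how each component is estimated.

```python
from typing import List, Set


def strict_filtering(p_term: float, query_types: Set[str], entity_types: Set[str]) -> float:
    """Keep the term-based score only if query and entity share at least one type."""
    return p_term if query_types & entity_types else 0.0


def soft_filtering(p_term: float, p_type: float) -> float:
    """Multiply the term-based score by the type-based score."""
    return p_term * p_type


def interpolation(p_term: float, p_type: float, lam: float) -> float:
    """Linear mixture of the two scores; lam is the type weight."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * p_term + lam * p_type


# Candidate type weights: 0, 0.05, ..., 1 (21 values in total).
LAMBDAS: List[float] = [i / 20 for i in range(21)]

# Hypothetical scores for one query-entity pair.
p_term, p_type = 0.2, 0.6
strict_hit = strict_filtering(p_term, {"City", "Place"}, {"Place"})
strict_miss = strict_filtering(p_term, {"City"}, {"Person"})
soft = soft_filtering(p_term, p_type)
mixed = interpolation(p_term, p_type, lam=0.25)
```

Strict filtering discards any entity with no overlapping type, so it is the most sensitive to missing type assignments; interpolation still returns a nonzero score from the term-based component alone.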
Results
[Figure 1: bar charts of MAP (y-axis, 0 to 0.4) for DBpedia, Freebase, Wikipedia, and YAGO under strict filtering, soft filtering, and interpolation. Panels: (a) Types along path to top, (b) Top-level types, (c) Most specific types.]
Fig. 1: Retrieval performance considering only entities that have types from all four type systems.
The term-based baseline (shown as the red line) and the ground truth are restricted to the same set of entities.
[Figure 2: bar charts of MAP (y-axis, 0 to 0.4) for DBpedia, Freebase, Wikipedia, and YAGO under strict filtering, soft filtering, and interpolation. Panels: (a) Types along path to top, (b) Top-level types, (c) Most specific types.]
Fig. 2: Retrieval performance considering all entities, and using the full set of relevance judgments.
The red line represents the term-based baseline.
Conclusions
∎ Type information proves most useful when larger, deeper type taxonomies provide very specific types.
⋆ RQ1 (Type taxonomy): for a given type representation and retrieval model, Wikipedia performs best in most cases.
⋆ RQ2 (Type representation): using the most specific types is the most effective way to represent type information.
⋆ RQ3 (Retrieval model): all models suffer from missing type information, but interpolation appears to be the most robust.
