Date: June 10, 2016
Venue: Stavanger, Norway. Doctoral Seminar at the IAI group for the research visit of Prof. Kalervo Järvelin
Please cite, link to or credit this presentation when using it or part of it in your work.
Type-Aware Entity Retrieval
1. Types and Entity Retrieval
Environment Dimensions
Type-aware Entity Retrieval
Type-aware Entity Retrieval
Darío Garigliotti
University of Stavanger
June 10, 2016
Darío Garigliotti Type-aware Entity Retrieval
Outline:
1 Types and Entity Retrieval
2 Environment Dimensions
Type taxonomies
Type representations
Retrieval models
3 Type-aware Entity Retrieval
Types and Entity Retrieval
Traditional Information Retrieval has recently been extended to Entity-oriented Search
It revolves around satisfying more complex information needs
Several elements of knowledge bases appear naturally in queries, e.g. "Countries where one can pay with the euro":
Related entities (via a relation or predicate)
Types or categories or classes
Why think about types?
Entities are typed
Types are useful for retrieval, presentation,
summarization...
Related tasks, e.g.
Entity ranking (given a query and target categories)
List completion (given a query and entity examples, and possibly types)
Query target type identification
Our focus is on emergent dimensions to explore
Type taxonomies
There are different type taxonomies from various knowledge
bases
DBpedia Ontology
Freebase Types
Wikipedia Categories
YAGO Taxonomy
These vary a lot in terms of hierarchical structure and in how
entity-type assignments are recorded
Normalisation efforts are needed
DBpedia Ontology
A well-designed hierarchy
Created manually by
considering the most
frequently used infoboxes
in Wikipedia
Clean and consistent, but
with limited coverage
[Figure: DBpedia Ontology hierarchy, levels 0 to 7]
|Level 1| = 58 types
|Level 2| = 114 types
|Level 3| = 142 types
|Level 4| = 213 types
|Level 5| = 45 types
|Level 6| = 17 types
|Level 7| = 1 type
Freebase Types
A two-layer categorization
system: types and
domains
Entities are assigned only to types; most of them have “same as” links to DBpedia entities
[Figure: Freebase Types hierarchy, levels 0 to 2]
|Level 1| = 92 types
|Level 2| = 1,626 types
Wikipedia Categories
It consists of textual
labels known as
categories
It’s not a well-defined
“is-a” hierarchy, but a
graph
Category assignments
are neither consistent
nor complete
It requires a major
normalisation strategy
[Figure: Wikipedia Categories graph, levels 0 to 34]
|Level 1| = 27 types
|Level 2 ∪ ... ∪ Level 10| = 121,657 types
|Level 11 ∪ ... ∪ Level 24| = 410,697 types
|Level 25 ∪ ... ∪ Level 34| = 14,564 types
YAGO Taxonomy
A deep subsumption
hierarchy
Its classification schema is
constructed by taking leaf
categories from Wikipedia
categories and then using
WordNet synsets to
establish the hierarchy
[Figure: YAGO Taxonomy hierarchy, levels 0 to 19]
|Level 1| = 61 types
|Level 2 ∪ ... ∪ Level 5| = 80,384 types
|Level 6 ∪ ... ∪ Level 10| = 461,843 types
|Level 11 ∪ ... ∪ Level 19| = 26,383 types
Type representations
How to represent the hierarchical information?
[Figure: an example type hierarchy with root t0, types t1 to t12 and an entity e, shown once for each of the three representations below]
Type(s) along path to top
Top-level type(s)
Most specific type(s)
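The three representations can be illustrated on a toy hierarchy. The following is a minimal sketch, assuming the taxonomy is a tree stored as a child-to-parent map; the `PARENT` map, the type assignments and the helper function names are hypothetical and are not taken from the slides.

```python
# Hypothetical toy taxonomy: child type -> parent type; "t0" is the root.
ROOT = "t0"
PARENT = {
    "t1": "t0", "t2": "t0",
    "t3": "t1", "t4": "t2", "t5": "t2",
    "t8": "t4", "t9": "t5",
}

def path_to_top(t):
    """Return t followed by its ancestors below the root, most specific first."""
    path = []
    while t != ROOT:
        path.append(t)
        t = PARENT[t]
    return path

def types_along_path(assigned):
    """All types on the paths from the assigned types up to the top level."""
    result = set()
    for t in assigned:
        result.update(path_to_top(t))
    return result

def top_level_types(assigned):
    """Only the top-level ancestors (children of the root)."""
    paths = (path_to_top(t) for t in assigned)
    return {p[-1] for p in paths if p}

def most_specific_types(assigned):
    """Assigned types that are not an ancestor of another assigned type."""
    ancestors = set()
    for t in assigned:
        ancestors.update(path_to_top(t)[1:])
    return set(assigned) - ancestors

entity_types = {"t3", "t4", "t8"}  # assumed type assignments of an entity e
print(sorted(types_along_path(entity_types)))
print(sorted(top_level_types(entity_types)))
print(sorted(most_specific_types(entity_types)))
```

In this toy case, t4 is dropped from the most specific representation because it is an ancestor of t8, which is also assigned to the entity.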
Retrieval models
The retrieval task is defined in a generative probabilistic framework, where entities are scored by P(q | e)
[Figure: a query and an entity (example labels: "Olympic games", "Rio de Janeiro"), compared by term-based similarity and, through target types and entity types, by type-based similarity]
Both the query and the entity are considered in the term space as well as in the type space
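(Strict) Filtering model
$P(q \mid e) = P(\theta_q^{T} \mid \theta_e^{T}) \cdot \chi[\mathit{types}(q) \cap \mathit{types}(e) \neq \emptyset]$
where $\theta^{T}$ is the term-based model and $\chi$ is a binary indicator function
[Figure: Venn diagram of Types(q) and Types(e)]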
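A minimal sketch of strict filtering, assuming the term-based score has already been computed and that types are plain string labels; the function name and the example values are hypothetical. The entity keeps its term-based score only if it shares at least one type with the query.

```python
def strict_filtering_score(term_score, query_types, entity_types):
    """Keep the term-based score only if query and entity share a type."""
    shares_type = bool(set(query_types) & set(entity_types))
    indicator = 1.0 if shares_type else 0.0
    return term_score * indicator

# Assumed toy values: a term-based score of 0.4 and string type labels.
print(strict_filtering_score(0.4, {"City", "Place"}, {"City"}))    # 0.4
print(strict_filtering_score(0.4, {"City", "Place"}, {"Person"}))  # 0.0
```

Because the target types are treated as a set, a single missing or wrong type assignment can remove a relevant entity from the ranking entirely.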
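(Soft) Filtering model
$P(q \mid e) = P(\theta_q^{T} \mid \theta_e^{T}) \cdot P(\theta_q^{\mathcal{T}} \mid \theta_e^{\mathcal{T}})$
where $\theta^{T}$ is the term-based model and $\theta^{\mathcal{T}}$ is the type-based model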
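A minimal sketch of soft filtering, assuming query and entity types are given as probability distributions over type labels. The slides do not say how the type-based similarity is estimated, so `type_similarity` below is a hypothetical stand-in (the overlap of the two distributions), not the estimator used in this work.

```python
def type_similarity(query_dist, entity_dist):
    """Stand-in type-based similarity: overlap of two type distributions."""
    return sum(min(p, entity_dist.get(t, 0.0)) for t, p in query_dist.items())

def soft_filtering_score(term_score, query_dist, entity_dist):
    """Multiply the term-based score by the type-based similarity."""
    return term_score * type_similarity(query_dist, entity_dist)

# Assumed toy distributions over type labels.
query_dist = {"City": 0.5, "Place": 0.5}
full = soft_filtering_score(0.4, query_dist, {"City": 0.5, "Place": 0.5})
partial = soft_filtering_score(0.4, query_dist, {"City": 1.0})
none = soft_filtering_score(0.4, query_dist, {"Person": 1.0})
print(full, partial, none)
```

With this stand-in, a partial type match scales the term-based score down instead of discarding the entity, while an entity with no matching type still receives a score of zero.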
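Interpolation model
$P(q \mid e) = (1 - \lambda) \cdot P(\theta_q^{T} \mid \theta_e^{T}) + \lambda \cdot P(\theta_q^{\mathcal{T}} \mid \theta_e^{\mathcal{T}})$
where $\theta^{T}$ is the term-based model and $\theta^{\mathcal{T}}$ is the type-based model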
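A minimal sketch of the interpolation, assuming both component scores are already available as numbers; the function name and the example values are hypothetical. It shows that an entity with a type-based score of zero still keeps part of its term-based score unless the weight is 1.

```python
def interpolation_score(term_score, type_score, lam):
    """Linear mixture of term-based and type-based scores, weight lam in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return (1.0 - lam) * term_score + lam * type_score

# Assumed toy case: good term-based match (0.4), no type-based match (0.0).
for lam in (0.0, 0.5, 1.0):
    print(lam, interpolation_score(0.4, 0.0, lam))
```

This is one way to see why a mixture can degrade more gracefully than a product-based filter when type information is noisy or incomplete.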
What did we do?
Lessons learned
What are we doing?
What did we do?
We systematically identified and compared all combinations of
those dimensions
4 type taxonomies: DBpedia Ontology (3.9), Freebase
Types (2015-03-31), Wikipedia Categories (for DBpedia
3.9) and YAGO Taxonomy (3.0.2)
3 type representations: path-to-top, top-level, most
specific
3 models: strict and soft filtering, interpolation
Environment: from idealized to realistic
query types oracle
entities fully typed in all the taxonomies
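The space of configurations compared above can be enumerated as a Cartesian product of the three dimensions. This is only an illustrative sketch; the list names are hypothetical, and the labels are copied from the slide.

```python
from itertools import product

TAXONOMIES = [
    "DBpedia Ontology", "Freebase Types",
    "Wikipedia Categories", "YAGO Taxonomy",
]
REPRESENTATIONS = ["path-to-top", "top-level", "most specific"]
MODELS = ["strict filtering", "soft filtering", "interpolation"]

# Every (taxonomy, representation, model) combination.
configurations = list(product(TAXONOMIES, REPRESENTATIONS, MODELS))
print(len(configurations))  # 36 = 4 * 3 * 3
print(configurations[0])
```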
What did we do? Results
Lessons learned
Summary of insights:
How to represent hierarchical entity type information? (RQ1)
Using the most specific types appears to be the best way
What (kind of) type taxonomies to use? (RQ2)
Wikipedia, in combination with most specific types, performs the best in both the idealized and the more realistic scenarios
What combination model to choose? (RQ3)
The interpolation model appears to be more robust
Further analysis: strict filtering vs interpolation models
Strict filtering treats
target types as a set
Interpolation operates
with a probability
distribution over types
When we drop from the oracle every type assigned to fewer than 3 entities, interpolation adapts considerably better
[Figure: two panels, each over DBpedia, Freebase, Wikipedia and YAGO, using most-specific types]
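The pruning step described above, dropping from the oracle every type assigned to fewer than 3 entities, could be sketched as follows. The data layout (a dict from entity to its set of types) and the function name are assumptions made for illustration.

```python
from collections import Counter

def drop_rare_types(oracle_types, entity_types, min_entities=3):
    """Keep only oracle types assigned to at least min_entities entities."""
    counts = Counter(
        t for types in entity_types.values() for t in set(types)
    )
    return {t for t in oracle_types if counts[t] >= min_entities}

# Hypothetical entity-type assignments.
entity_types = {
    "e1": {"A", "B"},
    "e2": {"A"},
    "e3": {"A", "B"},
    "e4": {"C"},
}
print(drop_rare_types({"A", "B", "C"}, entity_types))  # {'A'}
```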
Further analysis: query-level ranking details
E.g. performance for (Interpolation, Most specific level, Wikipedia-3.9)
query = “Which books by Kerouac were published by Viking Press?”
Types: 90 (including Viking Press books)
Types of the relevant entities that are hurt: all contain Viking Press books
E.g. performance for
(Interpolation, Most
specific level,
Wikipedia-3.9)
query = “Give me all
actors starring in Batman
Begins”
All 7 relevant entities improve in the ranking
What are we doing?
Automatic query target type detection
Baselines
Entity-centric: first rank entities by their relevance to the query, then look at the types of the top-k ranked entities
Type-centric: build a direct term-based representation for each type by aggregating the descriptions of entities of that type
Learning-to-rank with several features
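The two baselines can be sketched roughly as below. This is an illustration under stated assumptions, not the implementation behind the slides: the entity-centric variant aggregates by summing the retrieval scores of the top-k entities, and the type-centric variant scores a type by the relative frequency of query terms in its aggregated description. Both scoring choices, the function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

def entity_centric(ranked_entities, entity_types, k=10):
    """Score each type by summing the scores of top-k entities that have it."""
    scores = defaultdict(float)
    for entity, score in ranked_entities[:k]:
        for t in entity_types.get(entity, ()):
            scores[t] += score
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

def type_centric(query, entity_descriptions, entity_types):
    """Score each type by query-term frequency in its aggregated description."""
    type_docs = defaultdict(Counter)
    for entity, text in entity_descriptions.items():
        for t in entity_types.get(entity, ()):
            type_docs[t].update(text.lower().split())
    terms = query.lower().split()
    scores = {
        t: sum(doc[w] for w in terms) / max(sum(doc.values()), 1)
        for t, doc in type_docs.items()
    }
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical toy data.
entity_types = {"Rio": ["City"], "Oslo": ["City"], "Pele": ["Person"]}
ranked = [("Rio", 0.9), ("Pele", 0.5), ("Oslo", 0.3)]
descriptions = {
    "Rio": "city in brazil",
    "Oslo": "capital city of norway",
    "Pele": "brazilian football player",
}
print(entity_centric(ranked, entity_types, k=2))
print(type_centric("city norway", descriptions, entity_types))
```

Either ranked list of types could then serve as one of several features for a learning-to-rank approach.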
What are we doing? Target type detection