Type-aware Entity Retrieval
Darío Garigliotti
IAI
Universitetet i Stavanger (UiS)
NTNU - March 3, 2016
From Information Retrieval to Entity Retrieval
• Traditional Information Retrieval has recently been extended to entity-oriented search
• Satisfaction of more complex information needs
• Current support in search engines
Entity Retrieval
Countries where one can pay with the euro
Impressionist art museums in The Netherlands
• Related entities (via a relation or predicate)
• Types or categories or classes
Entity Retrieval
Evaluated tasks
• Entity ranking (given a textual query and target categories)
• List completion (given a query Q and example entities, and optionally target types)
• Related entity finding (given an entity E, a relation R, and a type T)
e.g., from Q = "Schumacher teammates when he was on Ferrari":
E = "Schumacher", R = "His teammates when he was on Ferrari", T = "Person"
Type-aware entity retrieval
Our research questions
1. How to represent type-based information?
2. How to combine type-based and textual
information?
3. How to estimate type-based information?
Type-aware entity retrieval
RQ2. How to combine type-based and textual information?
• Basics: term-based models
• A variety of related tasks across the literature
• Entity retrieval approaches
• Where to look for entities? How to find them? How to rank
them?
• Major model families
• Common main insight: types help!
Type-aware entity retrieval
RQ1. How to represent type-based information?
• Dimensions we identified
• type taxonomies
• hierarchical structure
• dataset version
• Received minimal attention in related work
Type taxonomies
• We consider four well-known type taxonomies
Type system                  Wikipedia   DBpedia   Freebase   YAGO
#types                       753,524     591       1,719      568,672
#top-level types             NA          58        92         61
#most-specific-level types   753,524     472       1,626      549,623
depth                        NA          7         2          19
entities w/ type             4.12M       3.24M     3.77M      2.89M
avg #types/entity            4.02        6.30      9.57       16.44
Type representation
• We consider different ways of modeling type
assignments:
Top level, most specific level, and path-to-top
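One way to read the three representations, sketched in Python. The slides do not define them formally, so the definitions here are assumptions: "top level" keeps the root ancestor of each assigned type, "most specific level" keeps assigned types that are not an ancestor of another assigned type, and "path-to-top" keeps every type on the path from a most specific type up to its root. The `PARENT` map is a hypothetical toy taxonomy, not one of the four taxonomies in the talk.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy taxonomy: child type -> parent type (None marks a top-level type).
PARENT: Dict[str, Optional[str]] = {
    "Work": None,
    "MusicalWork": "Work",
    "Album": "MusicalWork",
    "Agent": None,
    "Person": "Agent",
}


def path_to_top(t: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return t followed by all of its ancestors, ending at a top-level type."""
    path = [t]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def top_level_types(assigned: Set[str], parent: Dict[str, Optional[str]]) -> Set[str]:
    """Map every assigned type to its top-level ancestor."""
    return {path_to_top(t, parent)[-1] for t in assigned}


def most_specific_types(assigned: Set[str], parent: Dict[str, Optional[str]]) -> Set[str]:
    """Keep assigned types that are not an ancestor of another assigned type."""
    ancestors: Set[str] = set()
    for t in assigned:
        ancestors.update(path_to_top(t, parent)[1:])
    return {t for t in assigned if t not in ancestors}


def path_to_top_types(assigned: Set[str], parent: Dict[str, Optional[str]]) -> Set[str]:
    """Collect all types on the paths from the most specific types to the top."""
    result: Set[str] = set()
    for t in most_specific_types(assigned, parent):
        result.update(path_to_top(t, parent))
    return result


assigned = {"Album", "MusicalWork"}
top = top_level_types(assigned, PARENT)
specific = most_specific_types(assigned, PARENT)
full_path = path_to_top_types(assigned, PARENT)
```

For flat type systems (the table lists no depth for Wikipedia categories), the three sets would coincide under these definitions.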
Experimental setup
• Our experimental environment looks like this:
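Term-based representation
Query model: $p(t' \mid \theta^{T'}_q)$
Entity model: $p(t' \mid \theta^{T'}_e)$
Comparison: $KL(\theta^{T'}_q \parallel \theta^{T'}_e)$
Type-based representation
Query model: $p(t \mid \theta^{T}_q)$
Entity model: $p(t \mid \theta^{T}_e)$
Comparison: $KL(\theta^{T}_q \parallel \theta^{T}_e)$
Ranking: $P(e \mid q) \propto P(q \mid e)\, P(e)$
Combination: $P(q \mid e) = (1 - \lambda)\, P(\theta^{T'}_q \mid \theta^{T'}_e) + \lambda\, P(\theta^{T}_q \mid \theta^{T}_e)$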
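The KL comparison between a query model and an entity model can be sketched as follows. This is a minimal illustration, not the setup used in the experiments: the uniform smoothing in `smooth` is an assumption added so that the divergence stays finite, and the type names and probabilities are hypothetical toy values.

```python
import math
from typing import Dict, List


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """KL(p || q) = sum_t p(t) * log(p(t) / q(t)).

    Requires q(t) > 0 wherever p(t) > 0.
    """
    return sum(pt * math.log(pt / q[t]) for t, pt in p.items() if pt > 0)


def smooth(dist: Dict[str, float], vocabulary: List[str], epsilon: float = 1e-3) -> Dict[str, float]:
    """Mix dist with a uniform distribution so no type in vocabulary has zero mass.

    Assumption: uniform smoothing is used here only to keep the example simple.
    """
    uniform = 1.0 / len(vocabulary)
    return {t: (1 - epsilon) * dist.get(t, 0.0) + epsilon * uniform for t in vocabulary}


# Hypothetical toy distributions over three types.
types = ["Album", "MusicalArtist", "Person"]
theta_q = {"Album": 0.2, "MusicalArtist": 0.8}
theta_e_album = smooth({"Album": 1.0}, types)
theta_e_artist = smooth({"MusicalArtist": 1.0}, types)

kl_album = kl_divergence(theta_q, theta_e_album)
kl_artist = kl_divergence(theta_q, theta_e_artist)
```

A lower divergence means the entity's type distribution is closer to the query's, so the artist entity would rank above the album entity for this toy query model.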
Experimental setup
• Term-based component: Mixture of Language Models (LM) method
• We obtain combinations of these elements:
• Type taxonomies
• Models
• Type-based representations
Ingredients
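• Model instantiations for $P(e \mid q) \propto P(q \mid e)\, P(e)$
• M1 (Mixture): $P(q \mid e) = (1 - \lambda)\, P(\theta^{T'}_q \mid \theta^{T'}_e) + \lambda\, P(\theta^{T}_q \mid \theta^{T}_e)$
• M2 (Multiplicative): $P(q \mid e) = P(\theta^{T'}_q \mid \theta^{T'}_e)\, P(\theta^{T}_q \mid \theta^{T}_e)$
• M3 (Filtering): $P(\theta^{T}_q \mid \theta^{T}_e) \in \{0, 1\}$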
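The three models can be sketched as plain score combinations. Here `p_term` stands for the term-based component and `p_type` for the type-based component; both names, and the mixing weight `lam`, are labels chosen for this sketch. For M3 the slides only state that the type-based component is binary, so two further assumptions are made: the component is 1 when the query's target types and the entity's types overlap, and it is multiplied with the term-based score.

```python
import math
from typing import Set


def m1_mixture(p_term: float, p_type: float, lam: float) -> float:
    """M1: linear interpolation of the term-based and type-based components."""
    return (1 - lam) * p_term + lam * p_type


def m2_multiplicative(p_term: float, p_type: float) -> float:
    """M2: product of the term-based and type-based components."""
    return p_term * p_type


def m3_filtering(p_term: float, query_types: Set[str], entity_types: Set[str]) -> float:
    """M3: binary type component used as a filter.

    Assumption: the component is 1 if the two type sets overlap, else 0,
    and it is multiplied with the term-based score.
    """
    p_type = 1.0 if query_types & entity_types else 0.0
    return p_term * p_type


# Hypothetical scores for one query-entity pair.
score_m1 = m1_mixture(p_term=0.4, p_type=0.8, lam=0.5)
score_m2 = m2_multiplicative(p_term=0.4, p_type=0.8)
score_m3_keep = m3_filtering(0.4, {"Album"}, {"Album", "MusicalWork"})
score_m3_drop = m3_filtering(0.4, {"Person"}, {"Album", "MusicalWork"})
```

With `lam = 0` M1 falls back to the term-based score alone, while M3 removes every entity that shares no type with the query.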
Ingredients
• Query model for the type-based representation
is provided by a target types oracle
$P(t \mid \theta^{T}_q)$
Query: guitar origin blues
DBpedia Types:
<dbo:Album>: 4
<dbo:MusicalArtist>: 43
...
Freebase Types:
<fb:music.group_member>: 34
<fb:people.deceased_person>: 17
...
Wikipedia Categories:
<dbpedia:Category:Blues_musicians_from_New_Orleans,_Louisiana>: 2
<dbpedia:Category:Blues_songs>: 2
...
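A minimal sketch of turning such oracle counts into a query type distribution, assuming the counts are normalized by relative frequency (the slides do not state the estimator). Only the two DBpedia counts visible in the example are used; the elided types are left out, so the resulting probabilities are illustrative only.

```python
from typing import Dict


def normalize(counts: Dict[str, int]) -> Dict[str, float]:
    """Turn per-type counts into a probability distribution over types."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot normalize an empty or all-zero count table")
    return {t: c / total for t, c in counts.items()}


# Counts shown for the query "guitar origin blues"; further types are elided in the source.
dbpedia_counts = {"dbo:Album": 4, "dbo:MusicalArtist": 43}
theta_q = normalize(dbpedia_counts)
```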
Ingredients
• Our experimental environment looks like this:
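Type-based representation
Query model: $p(t \mid \theta^{T}_q)$
Entity model: $p(t \mid \theta^{T}_e)$
Comparison: $KL(\theta^{T}_q \parallel \theta^{T}_e)$
Ranking: $P(e \mid q) \propto P(q \mid e)\, P(e)$
Combination: $P(q \mid e) = (1 - \lambda)\, P(\theta^{T'}_q \mid \theta^{T'}_e) + \lambda\, P(\theta^{T}_q \mid \theta^{T}_e)$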
Ingredients
• Entity model for the type-based representation is a
distribution estimated through the entity types
Query: guitar origin blues
Relevant entities:
<dbpedia:The_Merle_Travis_Guitar>
<dbpedia:Blues_Breakers_with_Eric_Clapton>
<dbpedia:Poor_Boy_Blues>
...
DBpedia Types:
<dbo:Album>
<dbo:MusicalWork>
...
Freebase Types:
...
Wikipedia Categories:
<Category:1950_albums>
<Category:Merle_Travis_albums>
...
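One possible way to estimate an entity's type distribution from its assigned types, sketched under explicit assumptions: probability mass is spread uniformly over the entity's types and then interpolated with a background type distribution. The slides do not specify the estimator, and the `background` values and the `weight` parameter below are hypothetical.

```python
import math
from typing import Dict, Set


def entity_type_model(
    entity_types: Set[str],
    background: Dict[str, float],
    weight: float = 0.1,
) -> Dict[str, float]:
    """Uniform distribution over the entity's types, smoothed with a background model.

    Assumes every entity type also appears in the background distribution.
    """
    if not entity_types:
        return dict(background)
    uniform = 1.0 / len(entity_types)
    return {
        t: (1 - weight) * (uniform if t in entity_types else 0.0) + weight * p_bg
        for t, p_bg in background.items()
    }


# Hypothetical background distribution over three DBpedia types.
background = {"dbo:Album": 0.25, "dbo:MusicalWork": 0.25, "dbo:MusicalArtist": 0.5}
theta_e = entity_type_model({"dbo:Album", "dbo:MusicalWork"}, background)
```

Smoothing keeps every type at non-zero probability, which a KL-based comparison against the query model needs.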
Results (1)
RQ1. How to represent type-based information?
[Figure: Type representation - Model M1. MAP per type taxonomy (YAGO, Freebase, Wikipedia, DBpedia) and type representation (all assigned types, most specific level, path-to-top, top level)]
[Figure: Type representation - Model M2. MAP per type taxonomy (YAGO, Freebase, Wikipedia, DBpedia) and type representation (all assigned types, most specific level, path-to-top, top level)]
[Figure: Type representation - Model M3. MAP per type taxonomy (YAGO, Freebase, Wikipedia, DBpedia) and type representation (all assigned types, most specific level, path-to-top, top level)]
Results (2)
RQ2. How to combine type-based and textual information?
[Figure: Combining information - All assigned types. MAP per type taxonomy (YAGO, Freebase, Wikipedia, DBpedia) for models M1, M2, M3]
[Figure: Combining information - Most-specific-level types. MAP per type taxonomy (YAGO, Freebase, Wikipedia, DBpedia) for models M1, M2, M3]
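The results are reported in MAP (mean average precision). For reference, the standard definition can be sketched as below; the rankings and relevance sets are hypothetical toy data, not results from the experiments.

```python
import math
from typing import List, Sequence, Set, Tuple


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision@k over the ranks k at which a relevant entity appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, entity in enumerate(ranking, start=1):
        if entity in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average of the per-query average precision values."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Two hypothetical queries: (ranked entities, relevant entities).
runs = [
    (["a", "b", "c", "d"], {"a", "c"}),
    (["x", "y"], {"y"}),
]
ap_first = average_precision(*runs[0])
ap_second = average_precision(*runs[1])
map_score = mean_average_precision(runs)
```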
Future work
RQ3: How to estimate type-based information?
Future work
• Main focus will be on query typing, but eventually
on entity typing as well
• How to take the best from different type taxonomies
