We propose the Parametrized Fielded Sequential Dependence Model (PFSDM) and the Parametrized Fielded Full Dependence Model (PFFDM), two novel models for entity retrieval from knowledge graphs, which infer the user's intent behind each individual query concept by dynamically estimating its projection onto the fields of structured entity representations based on a small number of statistical and linguistic features.
Parameterized Fielded Term Dependence Models for Ad-hoc Entity Retrieval from Knowledge Graph
1. 1/31
Parameterized Fielded Term Dependence Models
for Ad-hoc Entity Retrieval from Knowledge
Graph
Fedor Nikolaev (1,2), Alexander Kotov (1), Nikita Zhiltsov (2)
(1) Textual Data Analytics Lab, Department of Computer Science, Wayne State University
(2) Kazan Federal University
3. 3/31
Entities
• Material objects or concepts in the
real world or fiction (e.g. people,
movies, conferences etc.)
• Are connected with other entities
by relations (e.g. hasGenre,
actedIn, isPCmemberOf etc.)
• Subject-Predicate-Object (SPO)
triple: subject=entity;
object=entity (or primitive data
value); predicate=relationship
between subject and object
• Many SPO triples → knowledge
graph
5. 5/31
Entity Retrieval from Knowledge Graph(s)
• Graph KBs are perfectly suited for addressing the information
needs that aim at finding specific objects (entities) rather
than documents
• Given the user’s information need expressed as a keyword
query, retrieve a relevant set of objects from the knowledge
graph(s)
6. 6/31
Typical ERWD tasks
• Entity Search
Queries refer to a particular entity.
• “Ben Franklin”
• “England football player highest paid”
• “Einstein Relativity theory”
• List Search
Complex queries with several relevant entities.
• “US presidents since 1960”
• “animals lay eggs mammals”
• Question Answering
Queries are questions in natural language.
• “Who is the architect of leaning tower of Pisa?”
• “For which label did Elvis record his first album?”
7. 7/31
Entity document
An entity is represented as a structured (multi-fielded) document:
names
Conventional names of the entities, such as the name
of a person or the name of an organization
attributes
All entity properties, other than names
categories
Classes or groups, to which the entity has been
assigned
similar entity names
Names of the entities that are very similar or
identical to a given entity
related entity names
Names of the entities that are part of the same RDF
triple
8. 8/31
Entity document example
Multi-fielded entity document for the entity Barack Obama.
Field                 Content
names                 barack obama barack hussein obama ii
attributes            44th current president united states
                      birth place honolulu hawaii
categories            democratic party united states senator
                      nobel peace prize laureate christian
similar entity names  barack obama jr barak hussein obama
                      barack h obama ii
related entity names  spouse michelle obama illinois state
                      predecessor george walker bush
9. 9/31
PRMS
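\[
P(Q \mid d) = \prod_{q_i \in Q} \sum_{j} P_M(E_j \mid q_i)\, P_{QL}(q_i \mid e_j),
\]
where
\[
P_M(E_j \mid w) = \frac{P_M(w \mid E_j)\, P_M(E_j)}{\sum_{E_k \in E} P_M(w \mid E_k)\, P_M(E_k)}
\]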
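A minimal Python sketch of the PRMS scoring above may help make the two formulas concrete. The function names, the dictionary layout, and the toy probabilities are illustrative assumptions, not part of the original slides; the sketch also assumes each query term has non-zero probability in at least one field.

```python
def field_posterior(p_w_given_field, prior):
    """P_M(E_j | w): Bayes' rule over fields for a single term w."""
    joint = {f: p_w_given_field[f] * prior[f] for f in prior}
    z = sum(joint.values())  # assumed > 0 (term occurs in some field)
    return {f: v / z for f, v in joint.items()}


def prms_score(query_terms, p_w_given_field, prior, p_ql):
    """P(Q | d): product over query terms of a posterior-weighted field mixture."""
    score = 1.0
    for q in query_terms:
        post = field_posterior(p_w_given_field[q], prior)
        score *= sum(post[f] * p_ql[q][f] for f in prior)
    return score


# Toy numbers, made up for illustration only.
prior = {"names": 0.5, "attributes": 0.5}
p_w_given_field = {"obama": {"names": 0.02, "attributes": 0.005}}
p_ql = {"obama": {"names": 0.1, "attributes": 0.01}}

posterior = field_posterior(p_w_given_field["obama"], prior)
score = prms_score(["obama"], p_w_given_field, prior, p_ql)
```

Here the term "obama" is mapped mostly to the names field, so the names-field query likelihood dominates the score.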
10. 10/31
SDM, FDM
Ranks w.r.t.
\[
P_\Lambda(D \mid Q) = \sum_{i \in \{T,U,O\}} \lambda_i f_i(Q, D)
\]
Potential function for unigrams is QL:
\[
f_T(q_i, D) = \log P(q_i \mid \theta_D) = \log \frac{tf_{q_i,D} + \mu \frac{cf_{q_i}}{|C|}}{|D| + \mu}
\]
SDM only considers two-word sequences in queries, FDM considers
all two-word combinations.
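The unigram potential and the difference between the SDM and FDM bigram sets can be sketched in Python as follows. This is an illustration under assumptions: the function names are hypothetical, the default mu = 2500 is a commonly used value rather than one given on the slides, and the statistics are made up. The collection frequency is assumed to be positive, otherwise the logarithm is undefined for terms absent from the document.

```python
import math
from itertools import combinations


def unigram_potential(tf, doc_len, cf, coll_len, mu=2500.0):
    """f_T(q_i, D): log of the Dirichlet-smoothed query likelihood."""
    return math.log((tf + mu * cf / coll_len) / (doc_len + mu))


def sdm_pairs(terms):
    """Two-word sequences: adjacent query terms only."""
    return list(zip(terms, terms[1:]))


def fdm_pairs(terms):
    """All two-word combinations of query terms (query order preserved)."""
    return list(combinations(terms, 2))


# Toy statistics, made up for illustration only.
score_present = unigram_potential(tf=3, doc_len=100, cf=50, coll_len=10_000)
score_absent = unigram_potential(tf=0, doc_len=100, cf=50, coll_len=10_000)

terms = ["einstein", "relativity", "theory"]
sequential = sdm_pairs(terms)
full = fdm_pairs(terms)
```

For the three-term query, SDM yields two adjacent pairs, while FDM additionally includes the non-adjacent pair ("einstein", "theory").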
11. 11/31
Fielded SDM
FSDM incorporates document structure and term dependencies
into a single ranking model.
Potential function for unigrams in case of FSDM:
\[
\tilde{f}_T(q_i, D) = \log \sum_j w^T_j P(q_i \mid \theta^j_D) = \log \sum_j w^T_j \frac{tf_{q_i,D_j} + \mu_j \frac{cf^j_{q_i}}{|C_j|}}{|D_j| + \mu_j}
\]
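As a rough illustration of the fielded unigram potential, the following Python sketch mixes per-field Dirichlet-smoothed estimates with fixed field weights. The function name, the dictionary keys, and all numbers are assumptions made for the example, not values from the slides.

```python
import math


def fielded_unigram_potential(field_stats, weights):
    """log of the weighted sum of per-field smoothed term probabilities."""
    total = 0.0
    for field, w in weights.items():
        s = field_stats[field]
        # Dirichlet-smoothed P(q_i | theta_D^j) for field j
        p = (s["tf"] + s["mu"] * s["cf"] / s["coll_len"]) / (s["len"] + s["mu"])
        total += w * p
    return math.log(total)


# Toy per-field statistics for one query term, made up for illustration.
stats = {
    "names": {"tf": 2, "len": 8, "cf": 10, "coll_len": 1_000, "mu": 100.0},
    "attributes": {"tf": 0, "len": 40, "cf": 30, "coll_len": 10_000, "mu": 100.0},
}

names_only = fielded_unigram_potential(stats, {"names": 1.0})
mixed = fielded_unigram_potential(stats, {"names": 0.7, "attributes": 0.3})
```

With a single field of weight 1, the expression reduces to the ordinary query-likelihood potential computed on that field.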
12. 12/31
FSDM limitation
In FSDM field weights are the same for all query concepts of the
same type.
Example
capitals in Europe which were host cities of summer Olympic games
17. 13/31
Parametric extension of FSDM
\[
w^T_{q_i,j} = \sum_k \alpha^U_{j,k}\, \phi_k(q_i, j)
\]
• $\phi_k(q_i, j)$ is the k-th feature value for unigram $q_i$ in field j.
• $\alpha^U_{j,k}$ are feature weights that we learn.
\[
\sum_j w^T_{q_i,j} = 1, \quad w^T_{q_i,j} \ge 0, \quad \alpha^U_{j,k} \ge 0, \quad 0 \le \phi_k(q_i, j) \le 1
\]
PFFDM is the same, but uses full dependence model.
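The feature-based field weights can be sketched in Python as below. The slides state the constraint that the weights sum to one over fields but do not say how it is enforced; dividing by the sum over fields is an assumption of this sketch, as are the function name, the two-feature layout, and all numbers.

```python
def field_weights(features, alpha):
    """Per-field weights for one query concept.

    features[j][k] is the k-th feature value (in [0, 1]) for field j;
    alpha[j][k] is the corresponding non-negative feature weight.
    """
    raw = {j: sum(a * f for a, f in zip(alpha[j], features[j])) for j in features}
    z = sum(raw.values())
    if z == 0.0:
        # Assumed fallback: uniform weights when every raw score is zero.
        return {j: 1.0 / len(raw) for j in raw}
    # Assumed normalization so that the weights sum to 1 over fields.
    return {j: v / z for j, v in raw.items()}


# Toy example: one field-specific feature plus an intercept feature (= 1).
features = {"names": [0.6, 1.0], "attributes": [0.2, 1.0]}
alpha = {"names": [1.0, 0.5], "attributes": [1.0, 0.1]}

weights = field_weights(features, alpha)
```

Because the features depend on the query concept, two concepts in the same query can receive different field weights, which is what removes the FSDM limitation described earlier.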
19. 14/31
Features
Source                 Feature   Description                                               CT
Collection statistics  FP(κ, j)  Posterior probability P(Ej|w).                            UG BG
                       TS(κ, j)  Top SDM score on j-th field when κ is used as a query.    BG
Stanford POS Tagger    NNP(κ)    Is concept κ a proper noun?                               UG
                       NNS(κ)    Is κ a plural non-proper noun?                            UG BG
                       JJS(κ)    Is κ a superlative adjective?                             UG
Stanford Parser        NPP(κ)    Is κ part of a noun phrase?                               BG
                       NNO(κ)    Is κ the only singular non-proper noun in a noun phrase?  UG
                       INT       Intercept feature (= 1).                                  UG BG
20. 15/31
Parameters of PFSDM
Both PFSDM and PFFDM have F · U + F · B + 3 free
parameters: $\hat{\alpha}^U$, $\hat{\alpha}^B$, $\hat{\lambda}$.
We perform direct optimization w.r.t. the target metric (e.g. MAP)
using coordinate ascent.
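A generic coordinate ascent loop can be sketched in Python as follows. This is not the authors' implementation: the step schedule, the non-negativity clipping, and the toy objective (a smooth function standing in for a retrieval metric such as MAP) are all assumptions made for illustration.

```python
def coordinate_ascent(objective, params, step=0.1, n_rounds=20, lower=0.0):
    """Maximize `objective` by changing one parameter at a time."""
    params = list(params)
    best = objective(params)
    for _ in range(n_rounds):
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                candidate = list(params)
                # Keep parameters non-negative, as required for the alphas.
                candidate[i] = max(lower, candidate[i] + delta)
                value = objective(candidate)
                if value > best:
                    params, best = candidate, value
                    improved = True
        if not improved:
            step /= 2.0  # refine the search once no single move helps
    return params, best


# Toy stand-in for the target metric, peaking at (0.3, 0.7).
def toy_metric(p):
    return -((p[0] - 0.3) ** 2) - ((p[1] - 0.7) ** 2)


best_params, best_value = coordinate_ascent(toy_metric, [0.0, 0.0])
```

In the retrieval setting, each call to the objective would rank the training queries with the candidate parameters and compute the metric, so the method needs no gradients of the metric.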
21. 16/31
Collections
1 DBpedia 3.7
• Structured version of the online encyclopedia Wikipedia
• Provides the descriptions of over 3.5 million entities belonging
to 320 classes
2 BTC-2009
• Contains entities from multiple knowledge bases.
• Consists of 1.14 billion RDF triples.
22. 17/31
Query sets
Balog and Neumayer. A Test Collection for Entity Search in
DBpedia, SIGIR’13.
Query set      Amount  Query types [Pound et al., 2010]
SemSearch ES   130     Entity
ListSearch     115     Type
INEX-LD        100     Entity, Type, Attribute, Relation
QALD-2         140     Entity, Type, Attribute, Relation
Only SemSearch ES judgments are available for BTC-2009, so for
BTC-2009 we used only this query set.
38. 29/31
Conclusion
• Entity-centric keyword queries have an implicit structure, with
each element in that structure designating a particular aspect
in the multi-fielded representation of relevant entities.
• We proposed two novel models for ad-hoc entity retrieval from
knowledge graphs, which account for term dependencies and
perform feature-based projection of query concepts onto the
fields of entity documents.
• By demonstrating the possibility of inferring implicit structure
of keyword queries using linguistic attributes and simple field
statistics of query concepts, the proposed models constitute
an important step in the evolution of models for structured
document retrieval.
39. 30/31
Future work
We hypothesize that the proposed models can be effective in other
structured information retrieval scenarios, such as product and
social graph search, and leave verification of this hypothesis to
future work.