Scholarly search engines, reference management tools, and academic social networks enable modern researchers to organize their scientific libraries. Moreover, they often provide recommendations for scientific publications that might be of interest to researchers. Because of the exponentially increasing volume of publications, effective citation recommendation is of great importance to researchers, as it reduces the time and effort spent on retrieving, understanding, and selecting research papers. In this context, we address the problem of citation recommendation, i.e., the task of recommending citations for a new paper. Current research investigates this task in different settings, including cases where rich user metadata is available (e.g., user profile, publications, citations). This work focuses on a setting where the user provides only the abstract of a new paper as input. Our proposed approach expands the semantic features of the given abstract using knowledge graphs and combines them with other features (e.g., in-degree, recency) to fit a learning-to-rank model, which is then used to generate the citation recommendations. Evaluating on real data, we show that the expanded semantic features improve the quality of the recommendations as measured by nDCG@10.
Global Citation Recommendations Using Knowledge Graphs
1. Global Citation Recommendations Using Knowledge Graphs
F. Ayala-Gómez, B. Daróczy, A. Benczúr, M. Mathioudakis, A. Gionis
LKE’2017, NOVEMBER 22-24, 2017, PUEBLA, PUE, MX
2. Motivation
Reference management software
• Organize digital libraries
• Academic social networks
• Recommendations
Recommendations
• Reduce the time spent on retrieving, understanding, and selecting research papers
Research papers corpus
• The number of scientific articles has exhibited exponential increase [1,2].
• New publication channels (e.g., open archives).
• Exciting, but challenging: hard to keep pace.
3. Citation* Recommendations [3]
The task of finding papers to cite.
Local citation recommendations
• Detect local contexts (e.g., sentences or paragraphs in a paper) for which there exist related papers in the literature.
Global citation recommendations
• Find relevant articles related to a given paper as a whole.
• This is the task we are researching.
*Citations are the entries in the bibliography of a paper, i.e., the papers used as supporting literature. We use the words citation and reference interchangeably.
4. Use case
[Diagram: Author → research abstract → recommender system → list of top articles.]
“The author has an accurate idea about the problem and approach, and little knowledge of related work.”
5. Problem and Intuition
Find papers that are relevant to a short description (abstract) of a new paper, provided as input.
Intuition
• Avoid using author profiles (e.g., collaborations, short history, anonymous users).
• Focus on the given information need (the abstract).
• Leverage methods for expanding semantic features.
• Expand the text using knowledge graphs, and propose an approach that uses this new set of features.
• Focus on the top-10 recommendations.
6. Setting: Ground-truth dataset
MICROSOFT ACADEMIC GRAPH (MAG) [32]
• 127M scientific papers
• Metadata (e.g., authors, year)
• List of each paper’s references
• No abstracts
CITESEERX [11]
• 10M scientific papers
• Abstracts
• No citations
Merge the datasets by the edit distance between titles.
The merged collection: approximately 2.2 million research papers, 6.7 million authors, and 24 million citations.
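As an illustration of the merge step, here is a minimal Python sketch of matching titles by edit distance; the lowercase normalization and the `max_dist` threshold are assumptions for illustration, not the paper’s actual matching procedure, and a real pipeline would block on cheap keys first rather than compare all pairs.

```python
def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def match_by_title(mag_titles, citeseerx_titles, max_dist=2):
    # Naive O(n*m) pairing of near-identical titles.
    return [(a, b) for a in mag_titles for b in citeseerx_titles
            if edit_distance(a.lower(), b.lower()) <= max_dist]
```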
7. Setting: Mapping to DBpedia
KNOWLEDGE GRAPHS
• Structured and detailed knowledge about a topic.
• Usually follow the Resource Description Framework (RDF).
• RDF triple: (subject, predicate, object), e.g., (Puebla, state_of, Mexico).
DBPEDIA [6]
• The most extensive knowledge graph built from Wikipedia.
• DBpedia Spotlight [7] maps text to DBpedia resources.
• Set of entities: RDF subjects.
• Set of properties: RDF objects.
8. Example of DBpedia Spotlight
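DBpedia Spotlight’s public /annotate endpoint returns JSON with one record per detected entity. A minimal sketch of extracting entity URIs from such a response; the sample values are illustrative, not taken from the paper, and the `min_score` cutoff is an assumption.

```python
import json

# Truncated sample of the JSON returned by the DBpedia Spotlight
# /annotate endpoint (field names as documented; values illustrative).
sample = json.dumps({
    "Resources": [
        {"@URI": "http://dbpedia.org/resource/Knowledge_graph",
         "@surfaceForm": "knowledge graphs", "@similarityScore": "0.99"},
        {"@URI": "http://dbpedia.org/resource/Citation",
         "@surfaceForm": "citations", "@similarityScore": "0.87"},
    ]
})

def extract_entities(response_text, min_score=0.5):
    # Keep the URIs of resources whose similarity score passes the cutoff.
    data = json.loads(response_text)
    return [r["@URI"] for r in data.get("Resources", [])
            if float(r["@similarityScore"]) >= min_score]
```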
9. Mapping to DBpedia (Example)
[Diagram: papers mapped to entities; entities mapped to properties.]
10. [Figure: the top-10 entities and their co-occurrence.]
11. The Learning to Rank task (LTR)
Given a query and a set of candidates, rank the candidates according to their relevance to the given query.
The relevance can be expressed using graded labels:
◦ 0: Bad
◦ 1: Ok
◦ 2: Good
◦ 3: Very Good
◦ 4: Excellent
An LTR model should output higher scores for more relevant candidates.
12. Normalized Discounted Cumulative Gain (nDCG)
Used in web search and recommender systems.
“DCG measures the usefulness, or gain, of a document based on its position in the result list.” The gain is accumulated from the top of the result list to the bottom, with the gain of each result discounted at lower ranks.

DCG@k = Σ_{i=1..k} (2^{r_i} − 1) / log₂(i + 1)

Worked example (r̂ᵢ: labels in ranked order; rᵢ: labels in ideal order):

i (rank)   r̂ᵢ   log₂(i+1)   (2^r̂ᵢ − 1)/log₂(i+1)   rᵢ   (2^rᵢ − 1)/log₂(i+1)
1          3    1.0         7.0                     3    7.0
2          2    1.6         1.9                     3    4.4
3          3    2.0         3.5                     2    1.5
4          -    2.3         -                       2    1.3
5          1    2.6         0.4                     1    0.4
6          2    2.8         1.1                     -    -

DCG@10 = 13.8    IDCG@10 = 14.6
nDCG@10 = 13.8 / 14.6 ≈ .948
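The worked example above can be reproduced with a short Python sketch, using the 2^r − 1 gain from the formula; treating the missing rank-4 item as label 0 is an assumption made here.

```python
import math

def dcg(relevances, k=10):
    # Gain 2^r - 1, discounted by log2(rank + 1); ranks are 1-based.
    return sum((2 ** r - 1) / math.log2(i + 2)
               for i, r in enumerate(relevances[:k]))

def ndcg(ranked_relevances, k=10):
    # Normalize by the DCG of the ideal (descending-label) ordering.
    ideal = sorted(ranked_relevances, reverse=True)
    return dcg(ranked_relevances, k) / dcg(ideal, k)

# Labels in the order the model ranked them (slide example).
ranked = [3, 2, 3, 0, 1, 2]
print(round(ndcg(ranked), 3))  # ≈ 0.949 (the slide rounds to .948)
```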
13. From RankNet to LambdaRank to LambdaMART
◦ LambdaRank:
◦ Intuition: to train an LTR model we don’t need the costs themselves, we only need the gradients.
◦ The λ for a given candidate gets contributions from all other candidates of the same query that have different labels.
◦ λ’s can be seen as forces that push candidates up or down.
◦ Multiplying by the size of the change in nDCG from swapping i and j gives good results.
◦ The cost is now a utility function, and we want to maximize nDCG.
◦ nDCG is optimized directly by computing the gradients after the candidates have been sorted by their score.
[Figure: λ forces on candidates; gains on pairwise error vs. gains on nDCG.]
Burges, Christopher J.C. “From RankNet to LambdaRank to LambdaMART: An overview.” Microsoft Research Technical Report MSR-TR-2010-82 (2010).
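A toy sketch of the λ computation described above: scaling the pairwise gradient by |ΔnDCG| follows the cited Burges report, while σ = 1 and the sign convention (positive λ = push the candidate up) are choices made here for illustration.

```python
import math

def ndcg_delta(rels, scores, i, j):
    # |Change in nDCG| if candidates i and j swap rank positions.
    order = sorted(range(len(scores)), key=lambda k: -scores[k])
    rank = {doc: pos for pos, doc in enumerate(order)}
    idcg = sum((2 ** r - 1) / math.log2(p + 2)
               for p, r in enumerate(sorted(rels, reverse=True)))
    gain = lambda doc, pos: (2 ** rels[doc] - 1) / math.log2(pos + 2)
    before = gain(i, rank[i]) + gain(j, rank[j])
    after = gain(i, rank[j]) + gain(j, rank[i])
    return abs(after - before) / idcg

def lambda_gradients(rels, scores, sigma=1.0):
    # lambda[k] > 0 means candidate k is pushed up the ranking.
    lam = [0.0] * len(scores)
    for i in range(len(scores)):
        for j in range(len(scores)):
            if rels[i] > rels[j]:
                force = sigma / (1 + math.exp(sigma * (scores[i] - scores[j])))
                force *= ndcg_delta(rels, scores, i, j)
                lam[i] += force  # push the more relevant candidate up
                lam[j] -= force  # and the less relevant one down
    return lam
```

Note how a relevant candidate that the model scored low receives a large positive λ, exactly the “force” intuition on the slide.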
16. Pair-wise Features
• MLT score: Lucene’s More Like This algorithm.
• tf-idf: over entities, properties, and abstract terms (highest/lowest).
• Candidate age: the years that have passed since the candidate paper was published.
• In-degree: the number of papers citing the candidate.
• In-degree with decay.
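The slide does not specify the decay function for the last feature; one common choice is an exponential decay with a fixed half-life, sketched here with an illustrative half-life of 5 years (both the decay form and the half-life are assumptions, not the paper’s definition).

```python
import math

def indegree_with_decay(citing_years, now=2017, half_life=5.0):
    # Each citing paper contributes a weight that halves every
    # `half_life` years; plain in-degree would count each as 1.
    return sum(math.exp(-math.log(2) * (now - y) / half_life)
               for y in citing_years)
```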
17. Splitting data
Papers published in a year are split in two ways:
• Random split per year: a random training/testing split within the same year.
• Training on full year: train on one full year, test on the next year.
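The two splitting strategies can be sketched as follows; the dictionary field name `year`, the test fraction, and the fixed seed are assumptions for illustration.

```python
import random

def random_split_per_year(papers, year, test_frac=0.2, seed=0):
    # Hold out a random fraction of the papers published in `year`.
    pool = [p for p in papers if p["year"] == year]
    random.Random(seed).shuffle(pool)
    cut = int(len(pool) * (1 - test_frac))
    return pool[:cut], pool[cut:]

def train_year_test_next(papers, year):
    # Train on all of `year`, test on `year + 1`.
    return ([p for p in papers if p["year"] == year],
            [p for p in papers if p["year"] == year + 1])
```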
22. Conclusions
• We expanded semantic features using DBpedia, creating a set of entities and a set of properties.
• The set of entities helps improve candidate set generation.
• Our proposed approach of building the candidate set, computing pairwise abstract-candidate features, and fitting LambdaMART improved nDCG@10 over the baselines.
• The entities tf-idf and properties tf-idf are of high importance to LambdaMART.
23. Thanks!
Global Citation Recommendations Using
Knowledge Graphs
F. Ayala-Gómez, B. Daróczy, A. Benczúr, M. Mathioudakis, A. Gionis
25. Approaches used in LTR
Picture from http://www.tongji.edu.cn/~qiliu/
26. From RankNet to LambdaRank to LambdaMART
◦ RankNet:
◦ Intuition: for a pair of candidates associated with a given query, estimate the probability of 𝑈𝑖 (i.e., 𝑐𝑖) being more relevant than 𝑈𝑗 (i.e., 𝑐𝑗).
◦ The score difference is mapped to a probability using the sigmoid function.
◦ Cost: cross-entropy.
◦ Optimized via SGD.
◦ The model can be any model whose output is a differentiable function of the model parameters.
𝑆𝑖𝑗 = 1 if 𝑟𝑖 > 𝑟𝑗; −1 if 𝑟𝑗 > 𝑟𝑖; 0 if 𝑟𝑖 = 𝑟𝑗.
Burges, Christopher J.C. “From RankNet to LambdaRank to LambdaMART: An overview.” Microsoft Research Technical Report MSR-TR-2010-82 (2010).
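The sigmoid and cross-entropy expressions that the slide refers to are, following the cited Burges report (with model scores sᵢ and σ a shape parameter):

```latex
P_{ij} \equiv P(U_i \triangleright U_j) = \frac{1}{1 + e^{-\sigma (s_i - s_j)}},
\qquad
C = \tfrac{1}{2}\,(1 - S_{ij})\,\sigma (s_i - s_j) + \log\!\left(1 + e^{-\sigma (s_i - s_j)}\right)
```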
27. From RankNet to LambdaRank to LambdaMART
◦ Factorizing RankNet:
◦ The cost is symmetric: swapping i and j does not affect it.
◦ Factorizing the gradient accumulates, for each candidate, the pairwise contributions into a single λ𝑖.
◦ Updating the model parameters then uses these accumulated λ’s.
◦ Since the set I must be built per query, mini-batch updates are used instead of SGD (weight updates are first computed for a given query, and then applied).
◦ “Training time dropped from close to quadratic in the number of URLs per query, to close to linear.”
𝑆𝑖𝑗 = 1 if 𝑟𝑖 > 𝑟𝑗; −1 if 𝑟𝑗 > 𝑟𝑖; 0 if 𝑟𝑖 = 𝑟𝑗.
I: the set of pairs of indices {i, j} for which we desire 𝑈𝑖 to be ranked differently from 𝑈𝑗 (i.e., 𝑆𝑖𝑗 = 1).
Burges, Christopher J.C. “From RankNet to LambdaRank to LambdaMART: An overview.” Microsoft Research Technical Report MSR-TR-2010-82 (2010).
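Written out from the same Burges report, the factorized gradient that the slide alludes to is:

```latex
\lambda_{ij} \equiv \frac{\partial C}{\partial s_i}
  = \sigma\!\left(\tfrac{1}{2}(1 - S_{ij}) - \frac{1}{1 + e^{\sigma (s_i - s_j)}}\right),
\qquad
\frac{\partial C}{\partial w_k}
  = \sum_{\{i,j\} \in I} \lambda_{ij}\!\left(\frac{\partial s_i}{\partial w_k} - \frac{\partial s_j}{\partial w_k}\right)
  = \sum_i \lambda_i \frac{\partial s_i}{\partial w_k},
\quad
\lambda_i = \sum_{j:\{i,j\} \in I} \lambda_{ij} \;-\; \sum_{j:\{j,i\} \in I} \lambda_{ij}
```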
29. From RankNet to LambdaRank to LambdaMART
◦ LambdaMART:
◦ Intuition: LambdaRank + MART, i.e., gradient boosting of regression trees driven by the λ-gradients.
Burges, Christopher J.C. “From RankNet to LambdaRank to LambdaMART: An overview.” Microsoft Research Technical Report MSR-TR-2010-82 (2010).