Global Citation Recommendations Using Knowledge Graphs
F. Ayala-Gómez, B. Daróczy, A. Benczúr, M. Mathioudakis, A. Gionis
LKE’2017, NOVEMBER 22-24, 2017, PUEBLA, PUE, MX 1
Motivation
Reference management software
• Organize digital libraries
• Academic social networks
• Recommendations
Recommendations
• Reduce the time spent on retrieving, understanding, and selecting research papers
Research papers corpus
• The number of scientific articles has grown exponentially [1,2].
• New publication channels (e.g., open archives).
• Exciting, but challenging: it is hard to keep pace.
Citation* Recommendations [3]
The task of finding papers to cite.
Local Citation Recommendations
• Detect local contexts (e.g., sentences or paragraphs in a paper) for which related papers exist in the literature.
Global Citation Recommendations
• Find relevant articles related to a given corpus as a whole.
• This is the task we are researching.
*Citations are the entries in the bibliography of a paper, that is, the papers used as supporting literature. We use the words citation and reference interchangeably.
Use case
[Figure: Author → Research Abstract → Recommender System → List of Top Articles]
“The author has an accurate idea about the problem and approach, and little knowledge of related work.”
Problem and Intuition
Find papers that are relevant to a short description (abstract)
of a new paper, provided as input.
Intuition
• Avoid using author profiles (e.g., collaborations, short history, anon. user).
• Focus on the given information need (abstract).
• Leverage methods for expanding semantic features.
• Expand the text using knowledge graphs, and propose an approach that
uses this new set of features.
• Focus on the Top 10 recommendations.
Setting: Ground-truth dataset
MICROSOFT ACADEMIC GRAPH (MAG) [32]
• 127M scientific papers
• Metadata (e.g., authors, year)
• List of each paper's references
• No abstracts
CITESEERX [11]
• 10M scientific papers
• Abstracts
• No citations
Merge the two datasets by the edit distance between titles.
The merged collection has approximately 2.2 million research papers, 6.7 million authors, and 24 million citations.
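The title-based merge can be illustrated with a minimal sketch. It assumes a plain Levenshtein distance on normalized titles and a brute-force comparison; the helper names `normalize` and `match_titles` and the threshold `max_distance=2` are hypothetical, since the slides do not state how the distance was thresholded or how the matching was scaled to millions of titles.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalize(title: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(title.lower().split())


def match_titles(mag_titles, citeseerx_titles, max_distance=2):
    """Pair each MAG title with its closest CiteSeerX title, if close enough.

    max_distance is an illustrative threshold, not a value from the slides.
    """
    matches = {}
    if not citeseerx_titles:
        return matches
    for mag in mag_titles:
        best = min(citeseerx_titles,
                   key=lambda other: edit_distance(normalize(mag), normalize(other)))
        if edit_distance(normalize(mag), normalize(best)) <= max_distance:
            matches[mag] = best
    return matches
```

The brute-force loop is quadratic in the number of titles, so at the scale of the two datasets some form of blocking or indexing would be needed; that part is outside this sketch.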
Setting: Mapping to DBpedia
KNOWLEDGE GRAPHS
• Structured and detailed knowledge about a topic.
• Usually follow the Resource Description Framework (RDF).
• RDF triple: (subject, predicate, object), e.g., (Puebla, state_of, Mexico)
DBPEDIA [6]
• The most extensive knowledge graph built from Wikipedia.
• DBpedia Spotlight [7] maps text to DBpedia resources.
• Set of Entities: RDF subjects
• Set of Properties: RDF objects
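As a small illustration of the two feature sets, the sketch below collects RDF subjects into a Set of Entities and RDF objects into a Set of Properties, as defined on the slide. The `Triple` type and the second example triple are hypothetical; in the actual pipeline the triples would come from DBpedia resources found by DBpedia Spotlight, which is not called here.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Triple(NamedTuple):
    """One RDF statement: (subject, predicate, object)."""
    subject: str
    predicate: str
    obj: str


def entities_and_properties(triples: Iterable[Triple]) -> Tuple[Set[str], Set[str]]:
    """Return (Set of Entities, Set of Properties) for one paper's triples."""
    triples = list(triples)  # allow generators, since we iterate twice
    entities = {t.subject for t in triples}   # RDF subjects
    properties = {t.obj for t in triples}     # RDF objects
    return entities, properties


example = [
    Triple("Puebla", "state_of", "Mexico"),  # triple from the slide
    Triple("Puebla", "type", "Place"),       # hypothetical extra triple
]
entities, properties = entities_and_properties(example)
```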
Example of DBpedia Spotlight
[Figure: example of DBpedia Spotlight annotations]
Mapping to DBpedia (Example)
[Figure: papers mapped to entities, and entities mapped to properties]
[Figure: top-10 entities and their co-occurrence]
The Learning to Rank task (LTR)
Given a query and a set of candidates, rank the candidates according to their relevance to the query.
Relevance can be expressed using graded labels:
◦ 0: Bad
◦ 1: Ok
◦ 2: Good
◦ 3: Very Good
◦ 4: Excellent
An LTR model should output higher scores for more relevant candidates.
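Normalized Discounted Cumulative Gain (nDCG)
Used in web search and recommender systems.
“DCG measures the usefulness, or gain, of a document based on its position in the result list”. The gain is accumulated from the top of the result list to the bottom, with the gain of each result discounted at lower ranks.

Here $\hat{r}_i$ is the relevance label at rank $i$ in the predicted ordering and $r_i$ is the label at rank $i$ in the ideal ordering.

| i (rank) | $\hat{r}_i$ | $\log_2(i+1)$ | $\frac{2^{\hat{r}_i}-1}{\log_2(i+1)}$ | $r_i$ | $\frac{2^{r_i}-1}{\log_2(i+1)}$ |
|---|---|---|---|---|---|
| 1 | 3 | 1.0 | 7.0 | 3 | 7.0 |
| 2 | 2 | 1.6 | 1.9 | 3 | 4.4 |
| 3 | 3 | 2.0 | 3.5 | 2 | 1.5 |
| 4 | - | 2.3 | - | 2 | 1.3 |
| 5 | 1 | 2.6 | 0.4 | 1 | 0.4 |
| 6 | 2 | 2.8 | 1.1 | - | - |
| | | | DCG@10 = 13.8 | | IDCG@10 = 14.6 |

$$\mathrm{nDCG@10} = \frac{\mathrm{DCG@10}}{\mathrm{IDCG@10}} = \frac{13.8}{14.6} \approx 0.948$$

The quotient is taken on the unrounded sums (13.848 and 14.595), which is why it differs slightly from dividing the rounded values.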
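The worked example above can be reproduced with a few lines of Python. This is a minimal sketch that assumes the exponential gain $2^{r}-1$ and the $\log_2(i+1)$ discount shown in the table, and that treats the "-" entry as label 0; the function names are illustrative.

```python
import math


def dcg_at_k(labels, k=10):
    """DCG@k for relevance labels listed in ranked order (rank 1 first)."""
    return sum((2 ** r - 1) / math.log2(i + 1)
               for i, r in enumerate(labels[:k], start=1))


def ndcg_at_k(labels, k=10):
    """nDCG@k: DCG of the given ordering divided by DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(labels, k) / ideal if ideal > 0 else 0.0


# Predicted ordering from the table; the "-" at rank 4 is treated as label 0.
ranked_labels = [3, 2, 3, 0, 1, 2]
dcg = dcg_at_k(ranked_labels)
idcg = dcg_at_k(sorted(ranked_labels, reverse=True))
ndcg = ndcg_at_k(ranked_labels)
```

With these inputs, `dcg` and `idcg` round to the 13.8 and 14.6 shown in the table.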
From RankNet to LambdaRank to LambdaMART
◦ LambdaRank:
◦ Intuition: to train an LTR model we don’t need the costs themselves, only the gradients.
◦ The λ for a given candidate gets contributions from all other candidates for the same query that have different labels.
◦ The λ’s can be seen as forces that push candidates up or down the ranking.
◦ Multiplying each pairwise λ by the size of the change in NDCG from swapping i and j gives good results.
◦ The cost is now a utility function, and we want to maximize NDCG.
◦ NDCG is optimized directly by computing the gradients after the candidates have been sorted by their score.
[Figure: λ gradients with gains on pairwise error vs. gains on nDCG]
Burges, Christopher J. C. "From RankNet to LambdaRank to LambdaMART: An Overview." Microsoft Research Technical Report MSR-TR-2010-82 (2010).
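The idea of λ's as forces can be sketched for a single query as follows. This is a hedged illustration based on the formulation in Burges's overview, not the authors' code: each pair where candidate i has a higher label than candidate j contributes the RankNet pairwise gradient magnitude, scaled by the absolute change in NDCG from swapping the two. The sign convention (positive means "push up") and the names `lambdarank_pushes` and `sigma` are choices made for this sketch.

```python
import math


def dcg(labels):
    """DCG of labels listed in ranked order, using gain 2^r - 1."""
    return sum((2 ** r - 1) / math.log2(pos + 1)
               for pos, r in enumerate(labels, start=1))


def lambdarank_pushes(scores, labels, sigma=1.0):
    """Per-candidate push for one query: positive pushes up, negative pushes down."""
    n = len(scores)
    # 1-based rank positions after sorting the candidates by score, best first.
    order = sorted(range(n), key=lambda idx: scores[idx], reverse=True)
    position = {idx: rank for rank, idx in enumerate(order, start=1)}
    ideal = dcg(sorted(labels, reverse=True))
    pushes = [0.0] * n
    if ideal == 0:
        return pushes  # no relevant candidate, so nothing to optimize
    for i in range(n):
        for j in range(n):
            if labels[i] <= labels[j]:
                continue  # only pairs where i should rank above j
            # |change in NDCG| if i and j swapped rank positions.
            gain_diff = 2 ** labels[i] - 2 ** labels[j]
            discount_diff = (1 / math.log2(position[i] + 1)
                             - 1 / math.log2(position[j] + 1))
            delta_ndcg = abs(gain_diff * discount_diff) / ideal
            # RankNet pairwise gradient magnitude, scaled by |delta NDCG|.
            strength = sigma / (1 + math.exp(sigma * (scores[i] - scores[j]))) * delta_ndcg
            pushes[i] += strength
            pushes[j] -= strength
    return pushes


# The most relevant candidate (label 2) is currently scored lowest.
pushes = lambdarank_pushes(scores=[0.1, 0.9, 0.5], labels=[2, 0, 1])
```

In this example the first candidate receives a positive push and the second a negative one, and the pushes over a query sum to zero because every pair contributes equal and opposite amounts.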
[Figure: panels "Training Set" and "Prediction Task", with the label "Maximize"]
Proposed Approach
[Figure: overview of the proposed approach]
Pair-wise Features
• MLT score: score from Lucene’s More Like This algorithm
• tf-idf: entities, properties, abstract terms; highest/lowest
• Candidate age: the years that have passed since the candidate paper was published
• In-degree: number of papers citing candidate C
• In-degree with decay
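Two of the simpler features above can be sketched directly. The snippet assumes that citations are available as (citing, cited) identifier pairs and that candidate age is measured relative to the year of the query paper; both are assumptions, and the function names are hypothetical. The decayed in-degree is left out because the slides do not define the decay.

```python
from collections import Counter


def in_degree(citations):
    """Count how many papers cite each paper.

    citations: iterable of (citing_id, cited_id) pairs.
    """
    return Counter(cited for _, cited in citations)


def candidate_age(candidate_year, query_year):
    """Years passed since the candidate was published, relative to the query paper."""
    return max(query_year - candidate_year, 0)


def simple_features(candidate_id, candidate_year, query_year, degrees):
    """Feature dictionary for one (abstract, candidate) pair; illustrative subset only."""
    return {
        "candidate_age": candidate_age(candidate_year, query_year),
        "in_degree": degrees.get(candidate_id, 0),
    }


# Hypothetical toy citation list.
degrees = in_degree([("a", "c"), ("b", "c"), ("a", "b")])
features = simple_features("c", candidate_year=2010, query_year=2015, degrees=degrees)
```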
Splitting data
• Random split per year: the papers published in a year are randomly divided into training and testing sets.
• Training on full year: train on all papers published in a year and test on the papers of the next year.
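The two splitting schemes can be sketched as follows, assuming papers are given as (paper_id, year) pairs. The `test_fraction` value and the fixed seed are hypothetical; the slides do not give the split proportions.

```python
import random
from collections import defaultdict


def random_split_per_year(papers, test_fraction=0.2, seed=0):
    """Randomly split the papers of each year into training and testing sets."""
    by_year = defaultdict(list)
    for paper_id, year in papers:
        by_year[year].append(paper_id)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, test = [], []
    for year in sorted(by_year):
        ids = list(by_year[year])
        rng.shuffle(ids)
        cut = int(round(len(ids) * test_fraction))
        test.extend(ids[:cut])
        train.extend(ids[cut:])
    return train, test


def train_on_year_test_on_next(papers, train_year):
    """Train on every paper of train_year and test on the papers of the next year."""
    train = [paper_id for paper_id, year in papers if year == train_year]
    test = [paper_id for paper_id, year in papers if year == train_year + 1]
    return train, test


# Toy corpus: five papers in 2014 and five in 2015.
papers = [(f"p{i}", 2014 + i % 2) for i in range(10)]
train_a, test_a = random_split_per_year(papers, test_fraction=0.2)
train_b, test_b = train_on_year_test_on_next(papers, train_year=2014)
```

The second scheme is the stricter one, because no paper from the test year is seen during training.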
Candidate Set Results
Lucene’s More Like This method has a higher recall when the relevant terms in the abstract and the entities are combined.
LTR: nDCG@10 Results on next year
[Figure: nDCG@10 results on the next year, labeled "Microsoft LightGBM"]
LTR: Feature Importance
[Figure: feature importance]
Conclusions
We expanded semantic features using DBpedia and created a Set of Entities and a Set of Properties.
The Set of Entities helps improve candidate set generation.
Our proposed approach (building the candidate set, computing pairwise abstract-candidate features, and fitting LambdaMART) improved nDCG@10 over the baselines.
The entities tf-idf and properties tf-idf features are of high importance to LambdaMART.
Thanks!
Backup Slides
Approaches used in LTR
Picture from http://www.tongji.edu.cn/~qiliu/
◦ RankNet (formulas as given in Burges's 2010 overview):
◦ Intuition: for a pair of candidates $U_i$ (i.e., $c_i$) and $U_j$ (i.e., $c_j$), both associated with a given query, estimate the probability that $U_i$ is more relevant than $U_j$.
◦ Using the sigmoid function on the score difference: $P_{ij} = \frac{1}{1 + e^{-\sigma (s_i - s_j)}}$
◦ Cost (cross-entropy): $C = -\bar{P}_{ij} \log P_{ij} - (1 - \bar{P}_{ij}) \log (1 - P_{ij})$, with target probability $\bar{P}_{ij} = \tfrac{1}{2}(1 + S_{ij})$
◦ $S_{ij} = 1$ if $r_i > r_j$; $S_{ij} = -1$ if $r_j > r_i$; $S_{ij} = 0$ if $r_i = r_j$
◦ Optimized via SGD.
◦ The model can be any model whose output is a differentiable function of the model parameters.
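A minimal sketch of the RankNet pairwise probability and cross-entropy cost, following the formulation in Burges's overview. The function names and the default `sigma=1.0` are choices made for this illustration; the scores `s_i` and `s_j` would come from whatever differentiable model is being trained.

```python
import math


def pair_probability(s_i, s_j, sigma=1.0):
    """Modeled probability that candidate i is more relevant than candidate j."""
    return 1.0 / (1.0 + math.exp(-sigma * (s_i - s_j)))


def pair_cost(s_i, s_j, S_ij, sigma=1.0):
    """Cross-entropy between the target and the modeled pair probability.

    S_ij is 1 if i is more relevant, -1 if j is more relevant, 0 if tied.
    """
    target = 0.5 * (1 + S_ij)
    p = pair_probability(s_i, s_j, sigma)
    return -target * math.log(p) - (1 - target) * math.log(1 - p)


# Scoring the more relevant candidate higher gives a lower cost than the reverse.
good = pair_cost(2.0, 0.5, S_ij=1)
bad = pair_cost(0.5, 2.0, S_ij=1)
```

The cost is symmetric: swapping the two candidates and flipping the sign of `S_ij` leaves it unchanged.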
◦ Factorizing RankNet (formulas as given in Burges's 2010 overview):
◦ The cost is symmetric: swapping i and j (and flipping the sign of $S_{ij}$) doesn’t affect it.
◦ Factorizing the gradient: $\frac{\partial C}{\partial w_k} = \lambda_{ij} \left( \frac{\partial s_i}{\partial w_k} - \frac{\partial s_j}{\partial w_k} \right)$, with $\lambda_{ij} = \sigma \left( \tfrac{1}{2}(1 - S_{ij}) - \frac{1}{1 + e^{\sigma (s_i - s_j)}} \right)$
◦ Updating the model parameters is now $\delta w_k = -\eta \sum_i \lambda_i \frac{\partial s_i}{\partial w_k}$, with $\lambda_i = \sum_{j:\{i,j\} \in I} \lambda_{ij} - \sum_{j:\{j,i\} \in I} \lambda_{ij}$
◦ $I$: set of pairs of indices $\{i, j\}$ for which we desire $U_i$ to be ranked differently from $U_j$ (i.e., $S_{ij} \neq 0$).
◦ Because $\lambda_i$ sums over the set $I$, mini-batch updates are used instead of per-pair SGD (i.e., weight updates are first computed for a given query, and then applied).
◦ “training time dropped from close to quadratic in the number of URLs per query, to close to linear” (Burges, 2010)
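The per-query accumulation can be sketched for a linear scorer $s_i = w \cdot x_i$, where $\partial s_i / \partial w_k$ is simply the k-th feature of candidate i. This is an illustration under the convention from Burges's overview that the set I holds pairs with i more relevant than j (so S_ij = 1); the linear model, the learning rate `eta`, and the function names are assumptions of this sketch.

```python
import math


def ranknet_lambdas(scores, labels, sigma=1.0):
    """Accumulate lambda_i for every candidate of one query."""
    lambdas = [0.0] * len(scores)
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:  # pair {i, j} is in I, so S_ij = 1
                lam_ij = -sigma / (1.0 + math.exp(sigma * (scores[i] - scores[j])))
                lambdas[i] += lam_ij
                lambdas[j] -= lam_ij
    return lambdas


def minibatch_update(weights, features, labels, eta=0.1, sigma=1.0):
    """One per-query weight update for a linear scorer s_i = w . x_i."""
    scores = [sum(w * x for w, x in zip(weights, row)) for row in features]
    lambdas = ranknet_lambdas(scores, labels, sigma)
    # delta w_k = -eta * sum_i lambda_i * ds_i/dw_k, and ds_i/dw_k = x_ik here.
    return [w - eta * sum(lam * row[k] for lam, row in zip(lambdas, features))
            for k, w in enumerate(weights)]


# Two candidates with one-hot features; the first one is the relevant one.
new_weights = minibatch_update([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], labels=[1, 0])
```

After one update the weight tied to the relevant candidate increases and the other decreases, so the relevant candidate's score moves up. Each candidate needs one gradient evaluation per query instead of one per pair, which is the source of the speed-up quoted above.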
◦ LambdaMART:
◦ Intuition: LambdaRank combined with MART (Multiple Additive Regression Trees): gradient boosting that uses the λ gradients.
