| 0
Maya Hristakeva (@mayahhf)
Beyond Collaborative Filtering:
Learning to Rank Research Articles
8th November 2018
| 1
The Team
Data Science, Engineering & Product
| 2
What do we do?
| 3
| 4
Our Users
We combine content and data with analytics and technology to help:
RESEARCHERS
to make new discoveries and
have more impact on society
CLINICIANS
to treat patients better
and save more lives
NURSES
throughout their careers
and to help save lives
| 5
Researcher’s Journey
Help me
stay up to
date
Help me
showcase my
work
Help me
organise my
writing
Help me
make peer review more
rewarding
Help me
publish faster
Help me
manage research
data
Help me
with my editorial decisions
Help me
connect with the right
people
Help me
secure funding
Help me
read and evaluate
articles
| 6
Being the best researcher you can be!
• Good researchers are on top of their game
• Large amount of research produced
• Takes time to get what you need
• Help researchers by recommending relevant content
| 7
Recommenders @ Elsevier
| 8
Mendeley Suggest – personalized article and people
recommenders
| 9
ScienceDirect – personalized and related article
recommenders
| 10
Mendeley Funding & Institutional Recommenders
| 11
ScienceDirect Related Articles
| 12
ScienceDirect – related article recommender
• Scientific publication
database
• 15 million articles
• 14 million monthly visitors
| 13
ScienceDirect V1 Recommender
• Goal
- Present users with related articles based on the article they are
reading
• Start simple & iterate
- Browsing logs to generate item-to-item CF recommendations
- Article content for business-logic filtering by recency and article type
| 14
Item-based kNN Collaborative Filtering
Recommend articles that are similar to the ones you browsed
- Similarity is based on article co-occurrences in users’ browsing sessions
- “Users who read x also read y”
Identify similar articles using cosine similarity: $\cos(a_i, a_j) = \frac{\vec{a}_i \cdot \vec{a}_j}{\lVert \vec{a}_i \rVert \times \lVert \vec{a}_j \rVert}$
Why do we use it?
- Gives good results
- Scales relatively well
- Relatively simple to implement
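A minimal sketch of item-to-item cosine similarity over browsing sessions, assuming binary occurrence (an article either appears in a session or it does not). The function name `item_cosine` and the toy sessions are illustrative and not taken from the talk.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_cosine(sessions):
    """Cosine similarity between articles from binary session co-occurrence."""
    session_count = defaultdict(int)  # sessions containing each article
    co_count = defaultdict(int)       # sessions containing both articles of a pair
    for articles in sessions:
        unique = sorted(set(articles))
        for a in unique:
            session_count[a] += 1
        for a, b in combinations(unique, 2):
            co_count[(a, b)] += 1
    # For binary vectors the dot product is the co-occurrence count and
    # each norm is the square root of the article's session count.
    return {
        (a, b): n / sqrt(session_count[a] * session_count[b])
        for (a, b), n in co_count.items()
    }


# Hypothetical browsing sessions
sessions = [["x", "y"], ["x", "y", "z"], ["x", "z"], ["y"]]
sims = item_cosine(sessions)
```

Pairs are keyed in sorted order, so each unordered pair is stored once.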
| 15
Evaluation: Session prediction task
• Article browsing logs: <sessionId, articleId, accessTime>
• Predict what users would browse next
• Time-split evaluation
[Figure: timeline (time, user interactions) with segments labelled Train model, Query, Ground truth and Test]
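One way to set up a time-split session prediction task is sketched below. It assumes that the first article of each test session serves as the query and the remaining articles as ground truth; that rule, the cutoff value and the function names are assumptions for illustration, since the slide does not specify them.

```python
def time_split(events, cutoff):
    """Split (session_id, article_id, access_time) events at a cutoff time."""
    train = [e for e in events if e[2] < cutoff]
    test = [e for e in events if e[2] >= cutoff]
    return train, test


def query_and_truth(test_events):
    """Assumed rule: first article in a session is the query, the rest is ground truth."""
    by_session = {}
    for session_id, article_id, _ in sorted(test_events, key=lambda e: e[2]):
        by_session.setdefault(session_id, []).append(article_id)
    # Sessions with a single article have no ground truth and are skipped.
    return {s: (arts[0], arts[1:]) for s, arts in by_session.items() if len(arts) > 1}


def recall_at_k(recommended, truth, k):
    """Fraction of ground-truth articles found in the top-k recommendations."""
    return len(set(recommended[:k]) & set(truth)) / len(truth)


# Hypothetical log rows
events = [("s1", "a", 1), ("s1", "b", 2),
          ("s2", "a", 10), ("s2", "c", 11), ("s2", "d", 12),
          ("s3", "b", 13)]
train, test = time_split(events, cutoff=10)
cases = query_and_truth(test)
```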
| 16
CF & Significance Weighting
• Scale down cosine similarity with significance weighting
• Preference is given to high co-occurrence neighbors
- k: minimum number of sessions in common needed to keep the original cosine similarity
• Alternative – minimum co-occurrence threshold
- Significantly reduces the catalogue coverage
$\mathrm{score}(a_i, a_j) = \min\left(1, \frac{|S_i \cap S_j|}{k}\right) \times \mathrm{cosine}(a_i, a_j)$
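The significance weighting can be sketched as a one-line function. The value k = 20 in the example is a hypothetical threshold, not one reported in the talk.

```python
def weighted_score(cosine, n_common, k):
    """Scale cosine similarity down when fewer than k sessions are shared."""
    return min(1.0, n_common / k) * cosine


# Hypothetical threshold of k = 20 shared sessions
low_support = weighted_score(cosine=0.8, n_common=5, k=20)    # scaled by 5/20
high_support = weighted_score(cosine=0.8, n_common=40, k=20)  # unchanged
```

Unlike a hard minimum co-occurrence threshold, pairs with little support are kept but ranked lower, which preserves catalogue coverage.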
| 17
Other CF Improvements
• Min/max filters for # articles per user-session & # user-sessions per article
• ~ 12 months of browsing logs
- gives good coverage
- removes cyclical nature of academic year
- focuses on more “current” interactions
• Bias for recent activity using time decay functions (e.g. exponential)
• Using article content as business logic filters for recency and article types
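An exponential time decay can be sketched as a weight applied to each interaction before co-occurrences are counted. The half-life parameterisation and the 90-day default are assumptions chosen for illustration.

```python
from math import exp, log


def decay_weight(age_days, half_life_days=90.0):
    """Weight of an interaction that happened age_days ago (1.0 for today)."""
    return exp(-log(2) * age_days / half_life_days)


w_today = decay_weight(0)
w_half_life = decay_weight(90)
w_year = decay_weight(365)
```

With this form an interaction loses half of its weight every `half_life_days`, so older sessions still contribute but recent ones dominate.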
| 18
Collaborative Filtering in production
Recommendations
per article
IBCF
Article
views/downloads
| 19
Can we do any better?
| 20
A Wealth of Data
• Usage Data
- Logged-in activity
- Alt-metrics,
popularity, trending
• Social Features
- User profiles
- Social network
- Collaboration groups
• > 60 million records: journals, conferences,
books, patents …
• The most accurate and complete citation & co-author graphs
• Reputation metrics for articles, authors and
journals
• > 15 million full text articles
• Article browsing logs
• Recommender impression and click logs!!!
| 21
Learning to Rank (LtR)
CF
candidates
Enriched
candidates
Re-ranked recs
Features
LtR
model
Use CF as candidate selection
Enrich with item and user features
Re-rank results with a learnt model optimised for click-through rate (CTR)
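The three stages can be sketched as a small function. The feature function and the linear scorer below are toy stand-ins for the real feature store and trained model, and every name is hypothetical.

```python
def rerank(query, cf_candidates, features_for, model_score, top_n=5):
    """CF candidates -> enriched candidates -> re-ranked recommendations."""
    enriched = [(article, features_for(query, article, cf_score))
                for article, cf_score in cf_candidates]
    scored = [(article, model_score(feats)) for article, feats in enriched]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [article for article, _ in scored[:top_n]]


# Toy stand-ins (assumptions): one extra feature and a hand-set linear scorer
recency = {"a": 0.1, "b": 0.9, "c": 0.2}


def features_for(query, article, cf_score):
    return {"cf": cf_score, "recency": recency[article]}


def model_score(feats):
    return 0.5 * feats["cf"] + 0.5 * feats["recency"]


ranked = rerank("q", [("a", 0.9), ("b", 0.7), ("c", 0.5)], features_for, model_score)
```

Here article "b" overtakes "a" because its recency feature outweighs the lower CF score.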
| 22
LtR Features
Reputation &
Alt-Metrics Text
Topics
Temporal
Images: wsj, alamy, bookedelic
CF similarity
$\mathrm{score}(a_i, a_j) = \min\left(1, \frac{|S_i \cap S_j|}{k}\right) \times \mathrm{cosine}(a_i, a_j)$
Citation Network
| 23
LtR Models
• Set of labelled query documents and their associated recommended
documents with feature vectors and relevance judgements
• Different optimization objectives – point-wise, pair-wise & list-wise
• RankLib: Java-based LtR package
- RankNet – pair-wise neural network algorithm
- LambdaRank – extension of RankNet optimizing list-wise IR metrics such as
NDCG
- LambdaMART – list-wise approach combining LambdaRank and MART
< queryDoc_i, recDocWithFeatures_j, relScore_ij >
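NDCG, the list-wise metric named above, can be sketched as follows. This uses the common exponential-gain form; whether the team used this exact variant is not stated in the talk.

```python
from math import log2


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of graded relevances in ranked order."""
    return sum((2 ** rel - 1) / log2(pos + 2)
               for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


perfect = ndcg_at_k([3, 2, 1, 0], k=4)   # already in ideal order
reversed_list = ndcg_at_k([0, 1, 2, 3], k=4)
```

A list in ideal order scores 1.0, and any ordering that pushes relevant items down scores lower.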
| 24
Recommender Logs
LtR requires labelled training data that represents user preferences
in relation to the recommendation lists
Recommender Logs
- Impressions – recs shown to the user
- Clicks & conversions – recs the user engaged with
- Timestamp – when the event happened
- Page-load ID – groups recs that were shown at the same time
| 25
Training data for LtR Models
• Query-recommendations pairs with relevance labels inferred from
recommender logs
• For each query article
- Aggregate the recommended articles across all user sessions
- Count # impressions & clicks for each recommendation
- Compute graded relevance scores based on CTR
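The aggregation steps above can be sketched as follows. The CTR thresholds that map to grades 0 to 3 are hypothetical, as is the row layout `(query_article, rec_article, clicked)`.

```python
from collections import defaultdict


def graded_relevance(log_rows, thresholds=(0.01, 0.05, 0.10)):
    """Turn impression/click rows into a graded label per (query, rec) pair."""
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for query, rec, clicked in log_rows:
        impressions[(query, rec)] += 1
        clicks[(query, rec)] += int(clicked)
    labels = {}
    for pair, shown in impressions.items():
        ctr = clicks[pair] / shown
        # Grade = number of (assumed) CTR thresholds reached, from 0 to 3
        labels[pair] = sum(ctr >= t for t in thresholds)
    return labels


# Hypothetical recommender log rows
rows = [("q", "a", True), ("q", "a", False), ("q", "a", False), ("q", "a", False),
        ("q", "b", False), ("q", "b", False)]
labels = graded_relevance(rows)
```

In practice pairs with very few impressions give noisy CTR estimates, so a minimum impression count would be a reasonable additional filter.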
| 26
Explore/Exploit via Dithering
Slightly shuffle the list of recommendations
• Allows for the exploration of the list
• Gives the impression of freshness
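• Reduces some of the bias in LtR training data
$\mathrm{score}_{\mathrm{dithered}} = \log(\mathrm{rank}) + N(0, \log \varepsilon)$
where $\varepsilon = \frac{\Delta \mathrm{rank}}{\mathrm{rank}}$ and typically $\varepsilon \in [1.5, 2]$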
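A minimal sketch of dithering: add Gaussian noise to the log of each rank and re-sort. Treating log ε as the standard deviation of the noise is an assumption about how the second parameter of N is meant, and the function name is illustrative.

```python
import random
from math import log


def dither(ranked_items, epsilon=1.5, rng=None):
    """Slightly shuffle a ranked list by adding noise to log(rank)."""
    rng = rng or random.Random()
    sigma = log(epsilon)  # assumption: log(epsilon) used as standard deviation
    noisy = [(log(rank) + rng.gauss(0.0, sigma), item)
             for rank, item in enumerate(ranked_items, start=1)]
    return [item for _, item in sorted(noisy, key=lambda pair: pair[0])]


items = ["a", "b", "c", "d", "e", "f"]
shuffled = dither(items, epsilon=1.5, rng=random.Random(0))
unchanged = dither(items, epsilon=1.0)  # log(1) = 0, so no noise is added
```

Because the noise is added on a log scale, top positions move by about a rank while items further down the list can move by larger amounts.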
| 27
Evaluation: Click prediction task
• Data: <queryArticleId, recArticleWithFeatures, relLabel>
• Rank higher the recommendations users engage with
• Time-split evaluation
[Figure: timeline (time, user interactions) with segments labelled Train model, Validation Set and Test Set]
| 28
Results
• LtR improved the quality of recommendations
- 9-10% improvement in user engagement
- Winner is LambdaMART - GBDT with list-wise optimization
• LtR increased journal diversity in recommendation lists
• LtR promotes articles published within the last year
• Best ranking model combines usage data with rich structured
network and meta data
| 29
Offline evaluation should match the online challenge
• Candidate generation – Collaborative Filtering – session prediction task
• Re-ranking candidates – Learning-to-Rank – click prediction task
| 30
LtR in Production
LtR
rescoring
IBCF
Recommendation
clicks
Training data
LtR
model
Article
views/downloads
| 31
Next Steps & Future Directions
| 32
Alternative Approaches
Graph-based approaches
- Random walks for candidate generation
Deep Learning
- Learn more complex features for LtR
- Neural embeddings for candidate
generation
- Hybrid systems for ranking
| 33
Evaluation – correcting for bias & confounding
• Algorithm confounding
- How algorithmic confounding in recommendation systems increases homogeneity and
decreases utility. Allison J. B. Chaney, Brandon M. Stewart, and Barbara E. Engelhardt
(RecSys '18).
• Explore/exploit – multi-armed bandits
- Explore, exploit, and explain: personalizing explainable recommendations with bandits.
James McInerney, et al. (RecSys '18).
• Counterfactuals
- Counterfactual reasoning and learning systems: The example of computational advertising.
Bottou, Léon, et al. (JMLR 2013).
| 34
Qualitative & Quantitative Evaluation
https://github.com/jeanigarcia/recsys2018-evaluation-tutorial
| 35
Challenges
| 36
Recommender Team Publications
Hristakeva, M., Kershaw, D., Pettit, B., Vargas, S., & Jack, K. (2019). Academic recommendations:
The Mendeley case. In Collaborative Recommendations: Algorithms, Practical Challenges and
Applications.
Pettit, B., Hristakeva, M., Kershaw, D. & Jack, K. (2018). Learning to Rank Research Articles: A case
study of collaborative filtering and learning to rank in Science Direct.
Hristakeva, M., Kershaw, D., Rossetti, M., Knoth, P., Pettit, B., Vargas, S., & Jack, K. (2017). Building
recommender systems for scholarly information. WSDM2017.
Rossetti, M., Vargas, S., Pettit, B., Kershaw, D., Hristakeva, M., & Jack, K. (2017). Effectively
identifying users’ research interests for scholarly reference management and discovery. WSDM2017.
Vargas, S., Hristakeva, M., & Jack, K. (2016). Mendeley: Recommendations for
Researchers. RecSys ’16
| 37
References
From RankNet to LambdaRank to LambdaMART: An Overview (2010). Christopher J. C. Burges.
On Application of Learning to Rank for E-Commerce Search by Shubhra Kanti Karmaker Santu,
Parikshit Sondhi, and ChengXiang Zhai (SIGIR 2017).
Recommender Systems Handbook (2010). Francesco Ricci, Lior Rokach, Bracha Shapira, and Paul
B. Kantor.
Practical Machine Learning: Innovations in Recommendation (2014).
Ted Dunning and Ellen Friedman. O'Reilly Media, Inc.
Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time by Chantat
Eksombatchai, Pranav Jindal, Jerry Zitao Liu, Yuchen Liu, Rahul Sharma, Charles Sugnet, Mark
Ulrich, and Jure Leskovec (WWW 2018).
Getting Deep Recommenders Fit: Bloom Embeddings for Sparse Binary Input/Output Networks by
Joan Serrà and Alexandros Karatzoglou (RecSys 2017)
We're hiring, come speak to
us!
https://www.elsevier.com/about/careers/technology-careers
| 39
www.elsevier.com/rd-solutions
Thank you

Beyond Collaborative Filtering: Learning to Rank Research Articles

  • 1. | 0 Maya Hristakeva (@mayahhf) Beyond Collaborative Filtering: Learning to Rank Research Articles 8th November 2018
  • 2. | 1 The Team Data Science, Engineering & Product
  • 3. | 2 What do we do?
  • 4. | 3
  • 5. | 4 Our Users We combine content and data with analytics and technology to help: RESEARCHERS to make new discoveries and have more impact on society CLINICIANS to treat patients better and save more lives NURSES throughout their careers and to help save lives
  • 6. | 5 Researcher’s Journey Help me stay up to date Help me showcase my work Help me organise my writing Help me make peer review more rewarding Help me publish faster Help me manage research data Help me with my editorial decisionsHelp me connect with the right people Help me secure funding Help me read and evaluate articles
  • 7. | 6 Being the best researcher you can be! • Good researchers are on top of their game • Large amount of research produced • Takes time to get what you need • Help researchers by recommending relevant content
  • 9. | 8 Mendeley Suggest – personalized article and people recommenders
  • 10. | 9 Science Direct – personalized and related article recommenders
  • 11. | 10 Mendeley Funding & Institutional Recommenders
  • 12. | 11 Science Direct Related Articles
  • 13. | 12 ScienceDirect – related article recommender • Scientific publication database • 15 million articles • 14 million monthly visitors
  • 14. | 13 Science Direct V1 Recommender • Goal - Present users with related articles based on the article they are reading • Start simple & iterate - Browsing logs to generate item-to-item CF recommendations - Article content as business logic filtering based on recency, article type
  • 15. | 14 Item-based kNN Collaborative Filtering Recommend articles that are similar to the ones you browsed - Similarity is based on article co-occurrences in users’ browsing sessions - “Users who read x also read y” Identify similar articles using cosine similarity: $\cos(a_i, a_j) = \frac{\vec{v}_i \cdot \vec{v}_j}{\lVert \vec{v}_i \rVert \, \lVert \vec{v}_j \rVert}$ Why do we use it? - Gives good results - Scales relatively well - Relatively simple to implement
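The co-occurrence similarity on this slide can be sketched in a few lines. This is a minimal illustration, not the production implementation: it assumes each article is represented by a binary vector over browsing sessions, so the dot product is the number of shared sessions and each norm is the square root of the article's session count. The function name `item_cosine` and the toy sessions are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations

def item_cosine(sessions):
    """Cosine similarity between articles from session co-occurrence.

    sessions: iterable of sets of article ids (one set per browsing session).
    Returns {(a, b): similarity} for every co-occurring pair with a < b.
    """
    views = defaultdict(int)  # number of sessions that contain each article
    co = defaultdict(int)     # number of sessions that contain both articles
    for session in sessions:
        for article in session:
            views[article] += 1
        for a, b in combinations(sorted(session), 2):
            co[(a, b)] += 1
    # Binary vectors: dot product = co-occurrence count, norm = sqrt(view count).
    return {
        (a, b): count / math.sqrt(views[a] * views[b])
        for (a, b), count in co.items()
    }

# Toy data (made up for illustration).
sessions = [{"x", "y"}, {"x", "y", "z"}, {"x"}]
sims = item_cosine(sessions)
```

For a given article, the k nearest neighbours are then the k pairs with the highest similarity that include it.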
  • 16. | 15 Evaluation: Session prediction task • Article browsing logs: <sessionId, articleId, accessTime> • Predict what users would browse next • Time-split evaluation [Diagram labels: Train model, Test, Query, Ground truth; axis: Time, user interactions]
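A time-split evaluation of this kind might look like the sketch below. It makes two assumptions the slide does not spell out: events are split at a single cut-off time, and within each test session the first article viewed is the query while the later ones are the ground truth. The function names and the toy log are hypothetical.

```python
from collections import defaultdict

def time_split(events, split_time):
    """Split (session_id, article_id, access_time) events at a cut-off time."""
    train = [e for e in events if e[2] < split_time]
    test = [e for e in events if e[2] >= split_time]
    return train, test

def queries_and_truth(test_events):
    """Per test session: first viewed article is the query, the rest the ground truth."""
    by_session = defaultdict(list)
    for session_id, article_id, access_time in test_events:
        by_session[session_id].append((access_time, article_id))
    pairs = {}
    for session_id, views in by_session.items():
        ordered = [article for _, article in sorted(views)]
        if len(ordered) > 1:  # a single-view session has nothing to predict
            pairs[session_id] = (ordered[0], ordered[1:])
    return pairs

# Toy log (made up for illustration).
events = [
    ("s1", "a", 1), ("s1", "b", 2),
    ("s2", "a", 10), ("s2", "c", 11), ("s2", "d", 12),
]
train, test = time_split(events, split_time=5)
pairs = queries_and_truth(test)
```

A session that straddles the cut-off would be divided between train and test here; a real pipeline would likely assign whole sessions to one side.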
  • 17. | 16 CF & Significance Weighting • Scale down cosine similarity with significance weighting • Preference is given to high co-occurrence neighbors - k – min # sessions in common to get original cosine similarity • Alternative – minimum co-occurrence threshold - Significantly reduces the catalogue coverage $\mathrm{score}(a_i, a_j) = \min\left(1, \frac{|S_i \cap S_j|}{k}\right) \times \mathrm{cosine}(a_i, a_j)$
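Significance weighting as described on this slide scales the cosine by the fraction of k that the shared-session count reaches, capped at 1. A minimal sketch, where the default `k=5` is an assumed value and not one given in the talk:

```python
def significance_weighted(cosine, n_common, k=5):
    """Scale cosine similarity down when two articles share fewer than k sessions.

    cosine:   plain cosine similarity of the two articles
    n_common: number of sessions containing both articles
    k:        sessions in common needed to keep the original cosine (assumed default)
    """
    return min(1.0, n_common / k) * cosine

weak = significance_weighted(0.8, n_common=2, k=4)     # half of k, so cosine is halved
strong = significance_weighted(0.8, n_common=10, k=4)  # at least k, cosine unchanged
```

Unlike a hard minimum co-occurrence threshold, low-evidence pairs are demoted and not dropped, which is why catalogue coverage is preserved.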
  • 18. | 17 Other CF Improvements • Min/max filters for # articles per user-session & # users-sessions per article • ~ 12 months of browsing logs - gives good coverage - removes cyclical nature of academic year - focuses on more “current” interactions • Bias for recent activity using time decay functions (e.g. exponential) • Using article content as business logic filters for recency and article types
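The exponential time decay mentioned above can be sketched as a weight on each co-occurrence, so that recent sessions count more than old ones. The half-life parameterisation and the 90-day default are assumptions for illustration only.

```python
def decay_weight(age_days, half_life_days=90.0):
    """Exponential decay: a session loses half its weight every half_life_days."""
    return 0.5 ** (age_days / half_life_days)

def decayed_cooccurrence(ages_days, half_life_days=90.0):
    """Weighted co-occurrence count for one article pair.

    ages_days: age in days of each session in which the pair co-occurred.
    """
    return sum(decay_weight(age, half_life_days) for age in ages_days)

# Three shared sessions: today, 90 days ago and 180 days ago.
weighted = decayed_cooccurrence([0, 90, 180])
```

The weighted count can replace the raw co-occurrence count when computing the similarity.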
  • 19. | 18 Collaborative Filtering in production Recommendations per article IBCF Article views/downloads
  • 20. | 19 Can we do any better?
  • 21. | 20 A Wealth of Data • Usage Data - Logged-in activity - Alt-metrics, popularity, trending • Social Features - User profiles - Social network - Collaboration groups • > 60 million records: journals, conferences, books, patents … • The most accurate and complete citation & co-author graphs • Reputation metrics for articles, authors and journals • > 15 million full text articles • Article browsing logs • Recommender impression and click logs!!!
  • 22. | 21 Learning to Rank (LtR) CF candidates Enriched candidates Re-ranked recs Features LtR model Use CF as candidate selection Enrich with item and user features Re-rank results based on learnt model optimised for CTR
  • 23. | 22 LtR Features Reputation & Alt-Metrics Text Topics Temporal Images: wsj, alamy, bookedelic CF similarity $\mathrm{score}(a_i, a_j) = \min\left(1, \frac{|S_i \cap S_j|}{k}\right) \times \mathrm{cosine}(a_i, a_j)$ Citation Network
  • 24. | 23 LtR Models • Set of labelled query documents and their associated recommended documents with feature vectors and relevance judgements: <queryDoc_i, recDocWithFeatures_j, relScore_ij> • Different optimization objectives – point-wise, pair-wise & list-wise • RankLib, a Java-based LtR package - RankNet – pair-wise neural network algorithm - LambdaRank – extension of RankNet optimizing list-wise IR metrics such as NDCG - LambdaMART – list-wise approach combining LambdaRank and MART
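RankLib reads training data in the LETOR-style text format, one line per query-document pair: `<label> qid:<id> <feature>:<value> ... # <comment>`. The sketch below serialises one labelled pair into that shape. The helper name `to_letor_line`, the feature values and the comment text are hypothetical; the talk does not show the actual feature layout.

```python
def to_letor_line(relevance, qid, features, comment=""):
    """Format one (query, recommendation) pair as a LETOR-style line.

    relevance: integer relevance grade for the recommendation
    qid:       integer id of the query article
    features:  feature values, written with 1-based feature indices
    comment:   optional free text, e.g. the recommended article id
    """
    feats = " ".join(f"{i}:{v:g}" for i, v in enumerate(features, start=1))
    line = f"{relevance} qid:{qid} {feats}"
    return f"{line} # {comment}" if comment else line

# Hypothetical pair: grade 2, query 7, two made-up feature values.
line = to_letor_line(2, 7, [0.5, 3.0], comment="rec42")
```

Lines that share a `qid` form one ranked list, which is what the pair-wise and list-wise objectives operate on.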
  • 25. | 24 Recommender Logs LtR requires labelled training data that represents user preferences in relation to the recommendation lists Recommender Logs - Impressions – recs shown to the user - Clicks & conversions – recs the user engaged with - Timestamp – when the event happened - Page-load ID – groups recs that were shown at the same time
  • 26. | 25 Training data for LtR Models • Query-recommendations pairs with relevance labels inferred from recommender logs • For each query article - Aggregate the recommended articles across all user sessions - Count # impressions & clicks for each recommendation - Compute graded relevance scores based on CTR
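The aggregation described on this slide can be sketched as follows: count impressions and clicks per query-recommendation pair, compute the click-through rate, and bucket it into a graded label. The CTR cut points and the event tuple layout are assumptions made for illustration; the talk does not state how grades were derived from CTR.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical CTR cut points mapping to grades 0..3 (not from the talk).
CTR_THRESHOLDS = (0.01, 0.05, 0.10)

def relevance_labels(events, min_impressions=1):
    """Turn recommender log events into graded relevance labels.

    events: iterable of (query_article, rec_article, kind) tuples,
            where kind is "impression" or "click".
    Returns {(query_article, rec_article): grade}.
    """
    impressions, clicks = Counter(), Counter()
    for query, rec, kind in events:
        if kind == "impression":
            impressions[(query, rec)] += 1
        elif kind == "click":
            clicks[(query, rec)] += 1
    labels = {}
    for pair, shown in impressions.items():
        if shown < min_impressions:
            continue  # too little evidence to estimate a CTR
        ctr = clicks[pair] / shown
        labels[pair] = bisect_right(CTR_THRESHOLDS, ctr)
    return labels

# Toy log (made up): rec "a" is clicked 2 times in 10, "b" never, "c" 3 times in 100.
events = (
    [("q1", "a", "impression")] * 10 + [("q1", "a", "click")] * 2
    + [("q1", "b", "impression")] * 10
    + [("q1", "c", "impression")] * 100 + [("q1", "c", "click")] * 3
)
labels = relevance_labels(events)
```

Raising `min_impressions` trades coverage for less noisy CTR estimates.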
  • 27. | 26 Explore/Exploit via Dithering Slightly shuffle the list of recommendations • Allows for the exploration of the list • Gives the impression of freshness • Reduces some of the bias in LtR training data $\mathrm{score}_{\mathrm{dithered}} = \log(\mathrm{rank}) + \mathcal{N}(0, \log \varepsilon)$, where $\varepsilon = \Delta\mathrm{rank} / \mathrm{rank}$ and typically $\varepsilon \in [1.5, 2]$
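Dithering as described on this slide adds Gaussian noise to the log of each item's rank and re-sorts by the noisy score. In the sketch below, log ε is used as the standard deviation of the noise, which is one reading of the N(0, log ε) notation and should be treated as an assumption. The function name and the injected `rng` parameter are hypothetical.

```python
import math
import random

def dither(ranked_items, epsilon=1.5, rng=None):
    """Re-order a ranked list by log(rank) plus Gaussian noise.

    ranked_items: items in their original order, best first
    epsilon:      noise level; the slide suggests values in [1.5, 2]
    rng:          optional random.Random, e.g. seeded for reproducibility
    """
    rng = rng or random.Random()
    sd = math.log(epsilon)  # assumption: log(epsilon) is the standard deviation
    scored = [
        (math.log(rank) + rng.gauss(0.0, sd), item)
        for rank, item in enumerate(ranked_items, start=1)
    ]
    # Lower noisy score means a higher position in the dithered list.
    return [item for _, item in sorted(scored, key=lambda pair: pair[0])]

recs = ["r1", "r2", "r3", "r4", "r5"]
shuffled = dither(recs, epsilon=1.5, rng=random.Random(0))
unchanged = dither(recs, epsilon=1.0)  # log(1) = 0, so no noise is added
```

Because the noise is added on a log scale, items near the top move by a few positions at most, while items further down can swap more freely.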
  • 28. | 27 Evaluation: Click prediction task • Data: <queryArticleId, recArticleWithFeatures, relLabel> • Rank higher the recommendations users engage with • Time-split evaluation [Diagram labels: Train model, Validation Set, Test Set; axis: Time, user interactions]
  • 29. | 28 Results • LtR improved the quality of recommendations - 9-10% improvement in user engagement - Winner is LambdaMART - GBDT with list-wise optimization • LtR increased journal diversity in recommendation lists • LtR promotes recently published articles in the last year • Best ranking model combines usage data with rich structured network and meta data
  • 30. | 29 Offline evaluation should match the online challenge • Candidate generation – Collaborative Filtering – session prediction task • Re-ranking candidates – Learning-to-Rank – click prediction task
  • 31. | 30 LtR in Production LtR rescoring IBCF Recommendation clicks Training data LtR model Article views/downloads
  • 32. | 31 Next Steps & Future Directions
  • 33. | 32 Alternative Approaches Graph-based approaches - Random walks for candidate generation Deep Learning - Learn more complex features for LtR - Neural embeddings for candidate generation - Hybrid systems for ranking
  • 34. | 33 Evaluation – correcting for bias & confounding • Algorithm confounding - How algorithmic confounding in recommendation systems increases homogeneity and decreases utility. Allison J. B. Chaney, Brandon M. Stewart, and Barbara E. Engelhardt (RecSys '18). • Explore/exploit – multi-armed bandits - Explore, exploit, and explain: personalizing explainable recommendations with bandits. James McInerney, et al. (RecSys '18). • Counterfactuals - Counterfactual reasoning and learning systems: The example of computational advertising. Bottou, Léon, et al. (JMLR 2013).
  • 35. | 34 Qualitative & Quantitative Evaluation https://github.com/jeanigarcia/recsys2018-evaluation-tutorial
  • 37. | 36 Recommender Team Publications Hristakeva, M., Kershaw, D., Pettit, B., Vargas, S., & Jack, K. (2019). Academic recommendations: The Mendeley case. In Collaborative Recommendations: Algorithms, Practical Challenges and Applications. Pettit, B., Hristakeva, M., Kershaw, D. & Jack, K. (2018). Learning to Rank Research Articles: A case study of collaborative filtering and learning to rank in Science Direct. Hristakeva, M., Kershaw, D., Rossetti, M., Knoth, P., Pettit, B., Vargas, S., & Jack, K. (2017). Building recommender systems for scholarly information. WSDM2017. Rossetti, M., Vargas, S., Pettit, B., Kershaw, D., Hristakeva, M., & Jack, K. (2017). Effectively identifying users’ research interests for scholarly reference management and discovery. WSDM2017. Vargas, S., Hristakeva, M., & Jack, K. (2016). Mendeley: Recommendations for Researchers. RecSys ’16
  • 38. | 37 References From RankNet to LambdaRank to LambdaMART: An Overview (2010). Christopher J. C. Burges. On Application of Learning to Rank for E-Commerce Search by Shubhra Kanti Karmaker Santu, Parikshit Sondhi, and ChengXiang Zhai (SIGIR 2017). Recommender Systems Handbook (2010). Francesco Ricci, Lior Rokach, Bracha Shapira, and Paul B. Kantor. Practical Machine Learning: Innovations in Recommendation (2014). Ted Dunning and Ellen Friedman. O'Reilly Media, Inc. Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time by Chantat Eksombatchai, Pranav Jindal, Jerry Zitao Liu, Yuchen Liu, Rahul Sharma, Charles Sugnet, Mark Ulrich, and Jure Leskovec (WWW 2018). Getting Deep Recommenders Fit: Bloom Embeddings for Sparse Binary Input/Output Networks by Joan Serrà and Alexandros Karatzoglou (RecSys 2017)
  • 39. We're hiring, come speak to us! https://www.elsevier.com/about/careers/technology-careers