SlideShare a Scribd company logo
1 of 26
Download to read offline
RecSys 2018, Vancouver, Canada
On the Robustness and Discriminative Power
of IR Metrics for Top-N Recommendation
Daniel Valcarce∗ A. Bellogín† J. Parapar∗ P. Castells†
@dvalcarce @abellogin @jparapar @pcastells
∗University of A Coruña
†Universidad Autónoma de Madrid
Evaluation
Recommender Systems Evaluation
Online evaluation (e.g., A/B testing):
expensive,
measures real user behavior.
Offline evaluation:
cheap,
highly reproducible,
usually constitutes the first step before deploying a
recommender system.
2
Recommender Systems Evaluation
Online evaluation (e.g., A/B testing):
expensive,
measures real user behavior.
Offline evaluation ←
cheap,
highly reproducible,
usually constitutes the first step before deploying a
recommender system.
2
Offline Evaluation of Recommender Systems (RS)
When evaluating recommender systems (RS), which metric
should we use?
Many types: error, ranking accuracy, diversity, novelty, etc.
Ranking accuracy metrics are becoming the most popular.
These metrics have been traditionally used in Information
Retrieval (IR).
3
IR Metrics used in RS
Some IR metrics that have been used in RS:
P: Precision
Recall
MAP: Mean Average Precision
nDCG: Normalised Discounted Cumulative Gain
MRR: Mean Reciprocal Rank
bpref: Binary Preference
infAP: Inferred Average Precision
4
Analyse how IR Metrics behave in RS
These ranking accuracy metrics have been studied in IR.
We want to study their behavior in top-N recommendation.
Two perspectives:
robustness,
discriminative power.
5
Proposal
Robustness
Sparsity bias
Sparsity is intrinsic to the
recommendation task.
We take random
subsamples from the test
set to increase the bias.
Popularity bias
Missing-not-at-random
(long tail distribution).
We remove the most
popular items to study the
bias.
We measuretherobustnessofametricbycomputingtheKendall’s
correlation of systems rankings when changing the amount of
bias.
7
Discriminative Power
A metric is discriminative when its differences in value are
statistically significant.
We use the permutation test with difference in means as
test statistic.
We run a statistical test between all possible system pairs.
We plot the obtained p-values sorted by decreasing value.
8
Experiments
Experimental Settings
Three datasets:
◦ MovieLens 1M,
◦ LibraryThing,
◦ BeerAdvocate.
Methodology:
◦ AllItems: rank all the items in the dataset that have not
been rated by the target user.
◦ 80-20% random split.
Systems:
◦ 21 different recommendation algorithms.
10
Comparing metric cut-offs
Comparing cut-offs of the same metric (nDCG) 1/4
@5 @10 @20 @30 @40 @50 @60 @70 @80 @90 @100
@5
@10
@20
@30
@40
@50
@60
@70
@80
@90
@100
1.00 0.95 0.93 0.92 0.92 0.92 0.92 0.91 0.90 0.90 0.90
0.95 1.00 0.98 0.97 0.97 0.97 0.97 0.96 0.95 0.95 0.95
0.93 0.98 1.00 0.99 0.99 0.99 0.99 0.98 0.97 0.97 0.97
0.92 0.97 0.99 1.00 1.00 1.00 1.00 0.99 0.98 0.98 0.98
0.92 0.97 0.99 1.00 1.00 1.00 1.00 0.99 0.98 0.98 0.98
0.92 0.97 0.99 1.00 1.00 1.00 1.00 0.99 0.98 0.98 0.98
0.92 0.97 0.99 1.00 1.00 1.00 1.00 0.99 0.98 0.98 0.98
0.91 0.96 0.98 0.99 0.99 0.99 0.99 1.00 0.99 0.99 0.99
0.90 0.95 0.97 0.98 0.98 0.98 0.98 0.99 1.00 1.00 1.00
0.90 0.95 0.97 0.98 0.98 0.98 0.98 0.99 1.00 1.00 1.00
0.90 0.95 0.97 0.98 0.98 0.98 0.98 0.99 1.00 1.00 1.00
Correlation between cut-offs of nDCG.
12
Comparing cut-offs of the same metric (nDCG) 2/4
100 90 80 70 60 50 40 30 20 10 0
% of ratings in the test set
0.85
0.90
0.95
1.00
Kendall’sτ
nDCG@5
nDCG@10
nDCG@20
nDCG@30
nDCG@40
nDCG@50
nDCG@60
nDCG@70
nDCG@80
nDCG@90
nDCG@100
Kendall’s correlation among systems when increasing the
sparsity bias using nDCG.
13
Comparing cut-offs of the same metric (nDCG) 3/4
100 95 90 85 80
% least popular items in the test set
0.0
0.2
0.4
0.6
0.8
1.0
Kendall’sτ
nDCG@5
nDCG@10
nDCG@20
nDCG@30
nDCG@40
nDCG@50
nDCG@60
nDCG@70
nDCG@80
nDCG@90
nDCG@100
Kendall’s correlation among systems when changing the
popularity bias using nDCG.
14
Comparing cut-offs of the same metric (nDCG) 4/4
0 5 10 15 20 25
pairs of recommender systems
0.0
0.2
0.4
0.6
0.8
1.0
p-value
nDCG@5
nDCG@10
nDCG@20
nDCG@30
nDCG@40
nDCG@50
nDCG@60
nDCG@70
nDCG@80
nDCG@90
nDCG@100
Discriminative power of nDCG measured with p-value curves.
15
Comparing metrics at the same
cut-off
Comparing metrics with cut-off @100 1/4
P Recall MAP nDCG MRR bpref infAP
P
Recall
MAP
nDCG
MRR
bpref
infAP
1.00 0.89 0.87 0.89 0.71 0.89 0.91
0.89 1.00 0.87 0.90 0.72 0.90 0.92
0.87 0.87 1.00 0.96 0.84 0.92 0.92
0.89 0.90 0.96 1.00 0.82 0.94 0.96
0.71 0.72 0.84 0.82 1.00 0.80 0.80
0.89 0.90 0.92 0.94 0.80 1.00 0.96
0.91 0.92 0.92 0.96 0.80 0.96 1.00
Correlation between metrics.
17
Comparing metrics with cut-off @100 4/4
100 90 80 70 60 50 40 30 20 10 0
% of ratings in the test set
0.85
0.90
0.95
1.00
Kendall’sτ
P
Recall
MAP
nDCG
MRR
bpref
infAP
Kendall’s correlation among systems when increasing the
sparsity bias.
18
Comparing metrics with cut-off @100 3/4
100 95 90 85 80
% least popular items in the test set
0.0
0.2
0.4
0.6
0.8
1.0
Kendall’sτ
P
Recall
MAP
nDCG
MRR
bpref
infAP
Kendall’s correlation among systems when increasing the
popularity bias.
19
Comparing metrics with cut-off @100 4/4
0 5 10 15 20 25
pairs of recommender systems
0.0
0.2
0.4
0.6
0.8
1.0
p-value
P
Recall
MAP
nDCG
MRR
bpref
infAP
Discriminative power measured with p-value curves.
20
Conclusions and Future Directions
Conclusions
Deeper cut-offs offer greater robustness and
discriminative power than shallow cut-offs.
Precision offers high robustness to sparsity and popularity
biases and good discriminative power.
NDCG provides the best discriminative power and high
robustness to the sparsity bias and moderate robustness to
the popularity bias.
22
Future work
Explore different types of metrics such as diversity or
novelty metrics.
Use other evaluation methodologies instead of AllItems.
◦ For instance, One-Plus-Random: one relevant item and N
non-relevant items as candidate set.
Employ different partitioning schemes such as temporal
splits.
23
Thank you!

More Related Content

Similar to On the Robustness and Discriminative Power of IR Metrics for Top-N Recommendation [RecSys '18 Slides]

QCP user manual EN.pdf
QCP user manual EN.pdfQCP user manual EN.pdf
QCP user manual EN.pdfEmerson Ceras
 
RS in the context of Big Data-v4
RS in the context of Big Data-v4RS in the context of Big Data-v4
RS in the context of Big Data-v4Khadija Atiya
 
[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...
[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...
[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...Emanuel Lacić
 
ANALYTICAL VARIABLES IN QUALITY CONTROL.pptx
ANALYTICAL VARIABLES IN QUALITY CONTROL.pptxANALYTICAL VARIABLES IN QUALITY CONTROL.pptx
ANALYTICAL VARIABLES IN QUALITY CONTROL.pptxDr. Jagroop Singh
 
Measurement system analysis Presentation.ppt
Measurement system analysis Presentation.pptMeasurement system analysis Presentation.ppt
Measurement system analysis Presentation.pptjawadullah25
 
Bridging the Gap: Machine Learning for Ubiquitous Computing -- Evaluation
Bridging the Gap: Machine Learning for Ubiquitous Computing -- EvaluationBridging the Gap: Machine Learning for Ubiquitous Computing -- Evaluation
Bridging the Gap: Machine Learning for Ubiquitous Computing -- EvaluationThomas Ploetz
 
Performance evaluation of IR models
Performance evaluation of IR modelsPerformance evaluation of IR models
Performance evaluation of IR modelsNisha Arankandath
 
Off-line vs. On-line Evaluation of Recommender Systems in Small E-commerce
Off-line vs. On-line Evaluation of Recommender Systems in Small E-commerceOff-line vs. On-line Evaluation of Recommender Systems in Small E-commerce
Off-line vs. On-line Evaluation of Recommender Systems in Small E-commerceLadislav Peska
 
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...Sagar Deogirkar
 
Hidalgo jairo, yandun marco 595
Hidalgo jairo, yandun marco 595Hidalgo jairo, yandun marco 595
Hidalgo jairo, yandun marco 595Marco Yandun
 
R&R Gage Analysis
R&R Gage AnalysisR&R Gage Analysis
R&R Gage AnalysisTripticon
 
6-130914140240-phpapp01.pdf
6-130914140240-phpapp01.pdf6-130914140240-phpapp01.pdf
6-130914140240-phpapp01.pdfssuserdca880
 
Summer 2015 Internship
Summer 2015 InternshipSummer 2015 Internship
Summer 2015 InternshipTaylor Martell
 
Vital QMS Process Validation Statistics - OMTEC 2018
Vital QMS Process Validation Statistics - OMTEC 2018Vital QMS Process Validation Statistics - OMTEC 2018
Vital QMS Process Validation Statistics - OMTEC 2018April Bright
 
2007 Pharmasug, Promotion Response Analysis
2007 Pharmasug, Promotion Response Analysis2007 Pharmasug, Promotion Response Analysis
2007 Pharmasug, Promotion Response AnalysisAlejandro Jaramillo
 
Multiple objectives in Collaborative Filtering (RecSys 2010)
Multiple objectives in Collaborative Filtering (RecSys 2010)Multiple objectives in Collaborative Filtering (RecSys 2010)
Multiple objectives in Collaborative Filtering (RecSys 2010)Tamas Jambor
 
How predictive models help Medicinal Chemists design better drugs_webinar
How predictive models help Medicinal Chemists design better drugs_webinarHow predictive models help Medicinal Chemists design better drugs_webinar
How predictive models help Medicinal Chemists design better drugs_webinarAnn-Marie Roche
 

Similar to On the Robustness and Discriminative Power of IR Metrics for Top-N Recommendation [RecSys '18 Slides] (20)

QCP user manual EN.pdf
QCP user manual EN.pdfQCP user manual EN.pdf
QCP user manual EN.pdf
 
RS in the context of Big Data-v4
RS in the context of Big Data-v4RS in the context of Big Data-v4
RS in the context of Big Data-v4
 
[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...
[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...
[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...
 
ANALYTICAL VARIABLES IN QUALITY CONTROL.pptx
ANALYTICAL VARIABLES IN QUALITY CONTROL.pptxANALYTICAL VARIABLES IN QUALITY CONTROL.pptx
ANALYTICAL VARIABLES IN QUALITY CONTROL.pptx
 
Measurement system analysis Presentation.ppt
Measurement system analysis Presentation.pptMeasurement system analysis Presentation.ppt
Measurement system analysis Presentation.ppt
 
report
reportreport
report
 
Bridging the Gap: Machine Learning for Ubiquitous Computing -- Evaluation
Bridging the Gap: Machine Learning for Ubiquitous Computing -- EvaluationBridging the Gap: Machine Learning for Ubiquitous Computing -- Evaluation
Bridging the Gap: Machine Learning for Ubiquitous Computing -- Evaluation
 
Performance evaluation of IR models
Performance evaluation of IR modelsPerformance evaluation of IR models
Performance evaluation of IR models
 
ARIMA
ARIMA ARIMA
ARIMA
 
Off-line vs. On-line Evaluation of Recommender Systems in Small E-commerce
Off-line vs. On-line Evaluation of Recommender Systems in Small E-commerceOff-line vs. On-line Evaluation of Recommender Systems in Small E-commerce
Off-line vs. On-line Evaluation of Recommender Systems in Small E-commerce
 
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
 
Hidalgo jairo, yandun marco 595
Hidalgo jairo, yandun marco 595Hidalgo jairo, yandun marco 595
Hidalgo jairo, yandun marco 595
 
R&R Gage Analysis
R&R Gage AnalysisR&R Gage Analysis
R&R Gage Analysis
 
MSA (GR&R)
MSA (GR&R)MSA (GR&R)
MSA (GR&R)
 
6-130914140240-phpapp01.pdf
6-130914140240-phpapp01.pdf6-130914140240-phpapp01.pdf
6-130914140240-phpapp01.pdf
 
Summer 2015 Internship
Summer 2015 InternshipSummer 2015 Internship
Summer 2015 Internship
 
Vital QMS Process Validation Statistics - OMTEC 2018
Vital QMS Process Validation Statistics - OMTEC 2018Vital QMS Process Validation Statistics - OMTEC 2018
Vital QMS Process Validation Statistics - OMTEC 2018
 
2007 Pharmasug, Promotion Response Analysis
2007 Pharmasug, Promotion Response Analysis2007 Pharmasug, Promotion Response Analysis
2007 Pharmasug, Promotion Response Analysis
 
Multiple objectives in Collaborative Filtering (RecSys 2010)
Multiple objectives in Collaborative Filtering (RecSys 2010)Multiple objectives in Collaborative Filtering (RecSys 2010)
Multiple objectives in Collaborative Filtering (RecSys 2010)
 
How predictive models help Medicinal Chemists design better drugs_webinar
How predictive models help Medicinal Chemists design better drugs_webinarHow predictive models help Medicinal Chemists design better drugs_webinar
How predictive models help Medicinal Chemists design better drugs_webinar
 

More from Daniel Valcarce

Information Retrieval Models for Recommender Systems - PhD slides
Information Retrieval Models for Recommender Systems - PhD slidesInformation Retrieval Models for Recommender Systems - PhD slides
Information Retrieval Models for Recommender Systems - PhD slidesDaniel Valcarce
 
LiMe: Linear Methods for Pseudo-Relevance Feedback [SAC '18 Slides]
LiMe: Linear Methods for Pseudo-Relevance Feedback [SAC '18 Slides]LiMe: Linear Methods for Pseudo-Relevance Feedback [SAC '18 Slides]
LiMe: Linear Methods for Pseudo-Relevance Feedback [SAC '18 Slides]Daniel Valcarce
 
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...Daniel Valcarce
 
Computing Neighbourhoods with Language Models in a Collaborative Filtering Sc...
Computing Neighbourhoods with Language Models in a Collaborative Filtering Sc...Computing Neighbourhoods with Language Models in a Collaborative Filtering Sc...
Computing Neighbourhoods with Language Models in a Collaborative Filtering Sc...Daniel Valcarce
 
When Recommenders Met Big Data: an Architectural Proposal and Evaluation [CER...
When Recommenders Met Big Data: an Architectural Proposal and Evaluation [CER...When Recommenders Met Big Data: an Architectural Proposal and Evaluation [CER...
When Recommenders Met Big Data: an Architectural Proposal and Evaluation [CER...Daniel Valcarce
 
A Study of Smoothing Methods for Relevance-Based Language Modelling of Recomm...
A Study of Smoothing Methods for Relevance-Based Language Modelling of Recomm...A Study of Smoothing Methods for Relevance-Based Language Modelling of Recomm...
A Study of Smoothing Methods for Relevance-Based Language Modelling of Recomm...Daniel Valcarce
 
A Study of Priors for Relevance-Based Language Modelling of Recommender Syste...
A Study of Priors for Relevance-Based Language Modelling of Recommender Syste...A Study of Priors for Relevance-Based Language Modelling of Recommender Syste...
A Study of Priors for Relevance-Based Language Modelling of Recommender Syste...Daniel Valcarce
 
Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...
Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...
Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...Daniel Valcarce
 
Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...
Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...
Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...Daniel Valcarce
 
Language Models for Collaborative Filtering Neighbourhoods [ECIR '16 Slides]
Language Models for Collaborative Filtering Neighbourhoods [ECIR '16 Slides]Language Models for Collaborative Filtering Neighbourhoods [ECIR '16 Slides]
Language Models for Collaborative Filtering Neighbourhoods [ECIR '16 Slides]Daniel Valcarce
 
Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recom...
Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recom...Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recom...
Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recom...Daniel Valcarce
 

More from Daniel Valcarce (11)

Information Retrieval Models for Recommender Systems - PhD slides
Information Retrieval Models for Recommender Systems - PhD slidesInformation Retrieval Models for Recommender Systems - PhD slides
Information Retrieval Models for Recommender Systems - PhD slides
 
LiMe: Linear Methods for Pseudo-Relevance Feedback [SAC '18 Slides]
LiMe: Linear Methods for Pseudo-Relevance Feedback [SAC '18 Slides]LiMe: Linear Methods for Pseudo-Relevance Feedback [SAC '18 Slides]
LiMe: Linear Methods for Pseudo-Relevance Feedback [SAC '18 Slides]
 
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
 
Computing Neighbourhoods with Language Models in a Collaborative Filtering Sc...
Computing Neighbourhoods with Language Models in a Collaborative Filtering Sc...Computing Neighbourhoods with Language Models in a Collaborative Filtering Sc...
Computing Neighbourhoods with Language Models in a Collaborative Filtering Sc...
 
When Recommenders Met Big Data: an Architectural Proposal and Evaluation [CER...
When Recommenders Met Big Data: an Architectural Proposal and Evaluation [CER...When Recommenders Met Big Data: an Architectural Proposal and Evaluation [CER...
When Recommenders Met Big Data: an Architectural Proposal and Evaluation [CER...
 
A Study of Smoothing Methods for Relevance-Based Language Modelling of Recomm...
A Study of Smoothing Methods for Relevance-Based Language Modelling of Recomm...A Study of Smoothing Methods for Relevance-Based Language Modelling of Recomm...
A Study of Smoothing Methods for Relevance-Based Language Modelling of Recomm...
 
A Study of Priors for Relevance-Based Language Modelling of Recommender Syste...
A Study of Priors for Relevance-Based Language Modelling of Recommender Syste...A Study of Priors for Relevance-Based Language Modelling of Recommender Syste...
A Study of Priors for Relevance-Based Language Modelling of Recommender Syste...
 
Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...
Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...
Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...
 
Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...
Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...
Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...
 
Language Models for Collaborative Filtering Neighbourhoods [ECIR '16 Slides]
Language Models for Collaborative Filtering Neighbourhoods [ECIR '16 Slides]Language Models for Collaborative Filtering Neighbourhoods [ECIR '16 Slides]
Language Models for Collaborative Filtering Neighbourhoods [ECIR '16 Slides]
 
Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recom...
Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recom...Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recom...
Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recom...
 

Recently uploaded

꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxFurkanTasci3
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...ThinkInnovation
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 

Recently uploaded (20)

꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptx
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 

On the Robustness and Discriminative Power of IR Metrics for Top-N Recommendation [RecSys '18 Slides]

  • 1. RecSys 2018, Vancouver, Canada On the Robustness and Discriminative Power of IR Metrics for Top-N Recommendation Daniel Valcarce∗ A. Bellogín† J. Parapar∗ P. Castells† @dvalcarce @abellogin @jparapar @pcastells ∗University of A Coruña †Universidad Autónoma de Madrid
  • 3. Recommender Systems Evaluation Online evaluation (e.g., A/B testing): expensive, measures real user behavior. Offline evaluation: cheap, highly reproducible, usually constitutes the first step before deploying a recommender system. 2
  • 4. Recommender Systems Evaluation Online evaluation (e.g., A/B testing): expensive, measures real user behavior. Offline evaluation ← cheap, highly reproducible, usually constitutes the first step before deploying a recommender system. 2
  • 5. Offline Evaluation of Recommender Systems (RS) When evaluating recommender systems (RS), which metric should we use? Many types: error, ranking accuracy, diversity, novelty, etc. Ranking accuracy metrics are becoming the most popular. These metrics have been traditionally used in Information Retrieval (IR). 3
  • 6. IR Metrics used in RS Some IR metrics that have been used in RS: P: Precision Recall MAP: Mean Average Precision nDCG: Normalised Discounted Cumulative Gain MRR: Mean Reciprocal Rank bpref: Binary Preference infAP: Inferred Average Precision 4
  • 7. Analyse how IR Metrics behave in RS These ranking accuracy metrics have been studied in IR. We want to study their behavior in top-N recommendation. Two perspectives: robustness, discriminative power. 5
  • 9. Robustness Sparsity bias Sparsity is intrinsic to the recommendation task. We take random subsamples from the test set to increase the bias. Popularity bias Missing-not-at-random (long tail distribution). We remove the most popular items to study the bias. We measuretherobustnessofametricbycomputingtheKendall’s correlation of systems rankings when changing the amount of bias. 7
  • 10. Discriminative Power A metric is discriminative when its differences in value are statistically significant. We use the permutation test with difference in means as test statistic. We run a statistical test between all possible system pairs. We plot the obtained p-values sorted by decreasing value. 8
  • 12. Experimental Settings Three datasets: ◦ MovieLens 1M, ◦ LibraryThing, ◦ BeerAdvocate. Methodology: ◦ AllItems: rank all the items in the dataset that have not been rated by the target user. ◦ 80-20% random split. Systems: ◦ 21 different recommendation algorithms. 10
  • 14. Comparing cut-offs of the same metric (nDCG) 1/4 @5 @10 @20 @30 @40 @50 @60 @70 @80 @90 @100 @5 @10 @20 @30 @40 @50 @60 @70 @80 @90 @100 1.00 0.95 0.93 0.92 0.92 0.92 0.92 0.91 0.90 0.90 0.90 0.95 1.00 0.98 0.97 0.97 0.97 0.97 0.96 0.95 0.95 0.95 0.93 0.98 1.00 0.99 0.99 0.99 0.99 0.98 0.97 0.97 0.97 0.92 0.97 0.99 1.00 1.00 1.00 1.00 0.99 0.98 0.98 0.98 0.92 0.97 0.99 1.00 1.00 1.00 1.00 0.99 0.98 0.98 0.98 0.92 0.97 0.99 1.00 1.00 1.00 1.00 0.99 0.98 0.98 0.98 0.92 0.97 0.99 1.00 1.00 1.00 1.00 0.99 0.98 0.98 0.98 0.91 0.96 0.98 0.99 0.99 0.99 0.99 1.00 0.99 0.99 0.99 0.90 0.95 0.97 0.98 0.98 0.98 0.98 0.99 1.00 1.00 1.00 0.90 0.95 0.97 0.98 0.98 0.98 0.98 0.99 1.00 1.00 1.00 0.90 0.95 0.97 0.98 0.98 0.98 0.98 0.99 1.00 1.00 1.00 Correlation between cut-offs of nDCG. 12
  • 15. Comparing cut-offs of the same metric (nDCG) 2/4 100 90 80 70 60 50 40 30 20 10 0 % of ratings in the test set 0.85 0.90 0.95 1.00 Kendall’sτ nDCG@5 nDCG@10 nDCG@20 nDCG@30 nDCG@40 nDCG@50 nDCG@60 nDCG@70 nDCG@80 nDCG@90 nDCG@100 Kendall’s correlation among systems when increasing the sparsity bias using nDCG. 13
  • 16. Comparing cut-offs of the same metric (nDCG) 3/4 100 95 90 85 80 % least popular items in the test set 0.0 0.2 0.4 0.6 0.8 1.0 Kendall’sτ nDCG@5 nDCG@10 nDCG@20 nDCG@30 nDCG@40 nDCG@50 nDCG@60 nDCG@70 nDCG@80 nDCG@90 nDCG@100 Kendall’s correlation among systems when changing the popularity bias using nDCG. 14
  • 17. Comparing cut-offs of the same metric (nDCG) 4/4 0 5 10 15 20 25 pairs of recommender systems 0.0 0.2 0.4 0.6 0.8 1.0 p-value nDCG@5 nDCG@10 nDCG@20 nDCG@30 nDCG@40 nDCG@50 nDCG@60 nDCG@70 nDCG@80 nDCG@90 nDCG@100 Discriminative power of nDCG measured with p-value curves. 15
  • 18. Comparing metrics at the same cut-off
  • 19. Comparing metrics with cut-off @100 1/4 P Recall MAP nDCG MRR bpref infAP P Recall MAP nDCG MRR bpref infAP 1.00 0.89 0.87 0.89 0.71 0.89 0.91 0.89 1.00 0.87 0.90 0.72 0.90 0.92 0.87 0.87 1.00 0.96 0.84 0.92 0.92 0.89 0.90 0.96 1.00 0.82 0.94 0.96 0.71 0.72 0.84 0.82 1.00 0.80 0.80 0.89 0.90 0.92 0.94 0.80 1.00 0.96 0.91 0.92 0.92 0.96 0.80 0.96 1.00 Correlation between metrics. 17
  • 20. Comparing metrics with cut-off @100 4/4 100 90 80 70 60 50 40 30 20 10 0 % of ratings in the test set 0.85 0.90 0.95 1.00 Kendall’sτ P Recall MAP nDCG MRR bpref infAP Kendall’s correlation among systems when increasing the sparsity bias. 18
  • 21. Comparing metrics with cut-off @100 3/4 100 95 90 85 80 % least popular items in the test set 0.0 0.2 0.4 0.6 0.8 1.0 Kendall’sτ P Recall MAP nDCG MRR bpref infAP Kendall’s correlation among systems when increasing the popularity bias. 19
  • 22. Comparing metrics with cut-off @100 4/4 0 5 10 15 20 25 pairs of recommender systems 0.0 0.2 0.4 0.6 0.8 1.0 p-value P Recall MAP nDCG MRR bpref infAP Discriminative power measured with p-value curves. 20
  • 24. Conclusions Deeper cut-offs offer greater robustness and discriminative power than shallow cut-offs. Precision offers high robustness to sparsity and popularity biases and good discriminative power. NDCG provides the best discriminative power and high robustness to the sparsity bias and moderate robustness to the popularity bias. 22
  • 25. Future work Explore different types of metrics such as diversity or novelty metrics. Use other evaluation methodologies instead of AllItems. ◦ For instance, One-Plus-Random: one relevant item and N non-relevant items as candidate set. Employ different partitioning schemes such as temporal splits. 23