How to improve the statistical power of the 10-fold cross validation scheme in Recommender Systems

University of Ljubljana
[LDOS]

..: Faculty of Electrical Engineering
..: Digital Signal, Image and Video Processing Laboratory

Andrej Košir
Ante Odić
Marko Tkalčič
Statistical power, replicability and reproducibility
• Definitions:
  - Replicability: obtaining the same experimental result (on the same data)
  - Reproducibility: obtaining similar experimental results that lead to the same conclusion
MacKay, R., & Oldford, R. (2000). Scientific method, statistical method, and the speed of light (Working paper 2000-02). Department of Statistics and Actuarial Science, University of Waterloo.

• In terms of statistical testing:
  - Higher power => better reproducibility
  - We are more likely to reach the same conclusions
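A small worked illustration of why higher power helps reproducibility, assuming H1 is true and two replications of the experiment are run independently, each with power π: the probability that both replications reject H0 (and so reach the same, correct conclusion) is π². The power values in the loop are arbitrary examples chosen for illustration.

```python
def prob_both_detect(power):
    """Probability that two independent replications both reject H0 when H1 is true."""
    if not 0.0 <= power <= 1.0:
        raise ValueError("power must lie in [0, 1]")
    return power ** 2  # two independent events, each with probability `power`


for pw in (0.14, 0.42, 0.99):
    print(f"power = {pw:.2f}: both replications detect the effect "
          f"with probability {prob_both_detect(pw):.3f}")
```

The quadratic dependence is the point: a study with low power rarely has its positive finding confirmed by an equally powered replication.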
On statistical hypothesis testing
• When do we need statistical tests?
  - To make sure the conclusions do not change if we repeat the experiment
  - When we need them: at later stages of development, when the compared systems give similar results
[Figure: RS 1 and RS 2 evaluated on the same test data, yielding F-measures F1 and F2 (example values on the slide: 0.72, 0.89, 0.74)]

• Elements of statistical testing:
  - Working hypotheses
  - Null and alternative hypotheses: H0 and H1
  - p-value: p
  - Risk level: α
  - Decision on H0
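A minimal sketch of how these elements fit together, using an exact two-sided sign test on hypothetical per-fold differences F(RS 2) − F(RS 1). The sign test and the numbers are illustrative choices, not the method of this work: H0 states that positive and negative differences are equally likely, the p-value comes from the Binomial(n, 1/2) distribution, and the decision compares p with the risk level α.

```python
from math import comb


def sign_test_p_value(differences):
    """Exact two-sided sign test for H0: positive and negative differences are equally likely."""
    nonzero = [d for d in differences if d != 0]  # zero differences carry no sign information
    n = len(nonzero)
    k = sum(1 for d in nonzero if d > 0)          # number of positive differences
    # Binomial(n, 1/2) tail probabilities under H0
    lower = sum(comb(n, i) for i in range(0, k + 1)) / 2 ** n
    upper = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * min(lower, upper))


def decide(p_value, alpha=0.05):
    """Decision on H0 at risk level alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"


# Hypothetical per-fold differences F(RS 2) - F(RS 1) on 10 folds
diffs = [0.02, 0.01, 0.03, -0.01, 0.02, 0.015, 0.005, 0.01, 0.02, 0.025]
p = sign_test_p_value(diffs)
print(f"p = {p:.4f}: {decide(p, alpha=0.05)}")
```

Here 9 of the 10 differences are positive, so p = 22/1024 and H0 is rejected at α = 0.05.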
On errors and statistical power
• Errors in the test decision:
  - Type I and type II errors
  - Effect size
• Power: more is better; maximizing it is the best one can do
  - For each test a new analysis is required
  - Task 1: how to select the sample size (a priori power)
  - Task 2: how to estimate the achieved power (post hoc power)

Decision table (rows: true hypothesis; columns: test decision):

              Ĥ0                    Ĥ1
  H0          OK                    type I error (α)
  H1          type II error (β)     OK

Power = Pr[Ĥ1 | H1] = 1 − β

• History:
  - 1908, William Sealy Gosset ("Student"): he did not need it
  - Mainly ignored until then

• Software: G*Power
  http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/
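Task 1 and Task 2 can be roughly approximated outside G*Power for a paired comparison. The sketch below assumes a two-sided paired z-test (normal approximation) with standardized effect size d, i.e. the mean of the paired differences divided by their standard deviation; G*Power's exact t-based computation gives somewhat larger sample sizes. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def approx_power(d, n, alpha=0.05):
    """Approximate power of a two-sided paired z-test with n pairs and effect size d.
    Plugging in the observed d gives a (rough) post hoc power estimate (Task 2)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n)  # expected value of the test statistic under H1
    # Probability that the statistic lands in either rejection region under H1
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def approx_sample_size(d, power=0.8, alpha=0.05):
    """A priori power analysis (Task 1): number of pairs needed to reach the target power."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return ceil(((z_crit + z_power) / d) ** 2)


n_needed = approx_sample_size(0.5)  # pairs needed for d = 0.5 at power 0.8
print(n_needed)  # 32
print(round(approx_power(0.5, n_needed), 3))
print(round(approx_power(0.5, 10), 3))  # power achievable with only 10 paired folds
```

The last line hints at the practical issue with 10-fold cross validation: with only 10 pairs, a medium effect is detected with low probability, so using the most powerful (paired) test matters.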
The application we were working on: contextual variables
• Which contextual variables are relevant?
  - What is context?
  - Candidates: time, weather, mood, ...
  - Can we simply use all of them?
    • Irrelevant context can worsen the performance of an RS

• Test whether a given contextual variable is relevant
  - How: compare the RS with and without it

ODIĆ, Ante, TKALČIČ, Marko, TASIČ, Jurij F., KOŠIR, Andrej. Predicting and detecting the relevant contextual information in a movie-recommender system. Interact. Comput., vol. 25, no. 1, pp. 74-90, 2013, doi:10.1093/iwc/iws003. [COBISS.SI-ID 9650260]
ODIĆ, Ante, TKALČIČ, Marko, TASIČ, Jurij F., KOŠIR, Andrej. Impact of the context relevancy on ratings prediction in a movie-recommender system. Automatika (Zagreb), vol. 54, no. 2, pp. 252-262, 2013, doi:10.7305/automatika.54-2.258. [COBISS.SI-ID 9782356]
The problem we observed: the cross validation scheme

ODIĆ, Ante, TKALČIČ, Marko, TASIČ, Jurij F., KOŠIR, Andrej. Predicting and detecting the relevant contextual information in a movie-recommender system. Interact. Comput., vol. 25, no. 1, pp. 74-90, 2013.

• There were differences among folds, but they were not reflected in the conclusion
  - What is wrong?
  - Paired or unpaired test?

• What is usually done:
  - Computing a single confusion matrix is actually an unpaired comparison
Proposed solution
The procedure outline:
1. Select the scalar comparison measure (such as precision or F-measure).
2. Store the evaluation results of each fold and each method separately.
3. According to the specific features of the evaluation results (distributions etc.), select the most powerful test that meets these features.
4. Perform the paired version of the selected test.
Materials and methods (1)
• Dataset:
  - Context Movie Dataset (LDOS-CoMoDa)
  - 1611 ratings from 89 users to 946 items, with associated contextual factors
• Contextual variables:
  - time (morning, afternoon, evening, night)
  - daytype (working day, weekend, ...)
  - season (spring, summer, autumn, winter)
  - location (home, public place, friend's house)
  - weather (sunny/clear, rainy, stormy, snowy, cloudy)
  - social (alone, partner, friends, colleagues, parents, public, family)
  - endEmo (sad, happy, scared, surprised, angry, disgusted, neutral)
  - dominantEmo (sad, happy, scared, surprised, angry, disgusted, neutral)
  - mood (positive, neutral, negative)
  - physical (healthy, ill)
  - decision (user's choice, given by other)
  - interaction (1st, n-th)

• Publicly available:
  - LDOS-CoMoDa contextual dataset: available at www.ldos.si/comoda.html
  - Used by 29 researchers at the time of this presentation
Materials and methods (2), results
• Experimental design:
  - 10-fold cross validation
  - Two procedures: ProcPaired, ProcIndep

• Results: which contextual variable improves MF (matrix factorization)?
  - Tests: Wilcoxon signed-rank test (ProcPaired) and Mann-Whitney U test (ProcIndep)
  - The achieved (post hoc) statistical power for the paired test (pw paired) and for the independent test (pw indep.), along with the computed p-values

  Id  Var 1        Var 2     pw paired  p paired  pw indep.  p indep.
  1   Physical     Weather   0.42       0.001     0.14       0.24
  2   Decision     Social    0.99       0.004     0.25       0.19
  3   Interaction  Social    0.06       <0.001    0.05       0.43
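ProcPaired only works if every method is scored on exactly the same folds and the per-fold scores are kept separately. A minimal sketch, assuming the folds are drawn at random over the individual ratings with a fixed seed (the slides do not state how the folds were formed); `fit_and_score` is a hypothetical callable standing in for training and evaluating one RS variant, and the method names in the usage lines are placeholders.

```python
import random


def make_folds(n_ratings, k=10, seed=42):
    """Split rating indices into k disjoint folds with a fixed seed, so that every
    method is evaluated on exactly the same test folds (a prerequisite for pairing)."""
    indices = list(range(n_ratings))
    random.Random(seed).shuffle(indices)
    return [indices[f::k] for f in range(k)]


def per_fold_scores(folds, methods):
    """Return {method_name: [score on fold 0, ..., score on fold k-1]}.
    `methods` maps a name to a hypothetical fit_and_score(train_idx, test_idx) -> float."""
    scores = {name: [] for name in methods}
    for f, test_idx in enumerate(folds):
        # Training set: all ratings outside the current test fold
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        for name, fit_and_score in methods.items():
            scores[name].append(fit_and_score(train_idx, test_idx))
    return scores


folds = make_folds(1611)  # 1611 ratings, as in LDOS-CoMoDa
dummy_methods = {  # placeholder scorers; a real study would train and evaluate an RS here
    "MF": lambda train_idx, test_idx: 0.70,
    "MF+context": lambda train_idx, test_idx: 0.72,
}
scores = per_fold_scores(folds, dummy_methods)
print([len(fold) for fold in folds])
```

The two lists in `scores` are aligned fold by fold, which is exactly what a paired test such as the Wilcoxon signed-rank test needs; pooling them into one sample would discard that pairing.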
Discussion
• Power improvements (independent → paired):
  - The first combination (physical vs. weather): 0.14 → 0.42, low but useful
  - The second combination (decision vs. social): 0.19 → 0.99, again a substantial difference in power
  - The third combination (interaction vs. social): 0.05 → 0.06, irrelevant

• It does not require substantial additional work
  - Worth the effort
Further work
• We limited ourselves to 10-fold cross validation and simple tests only; there is more out there.
• We will concentrate on comparing RS with respect to the selected final task (such as recommending the best five items), not only on scalar performance measures (such as precision at five).

• More sophisticated statistical approaches:
  - are available, such as multi-level repeated binomial regression
  - in my opinion, they will not be used frequently

THANK YOU
Invitation: International Conference on Automatic Face and Gesture Recognition FG2015, http://www.fg2015.org/
Presentation structure
• The goal
• What does it have to do with replicability and reproducibility?
• Selected items from statistics
• Our case & problem statement
• Proposed solution & comments
• Experimental results
• Future work
• Take-away notes
Building Production Ready Search Pipelines with Spark and Milvus
 
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStrDeep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
 
Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Operating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptxOperating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptx
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
 

How to improve the statistical power of the 10-fold cross validation scheme in Recommender Systems

  • 1. How to improve the statistical power of the 10-fold cross validation scheme in Recommender Systems
      - Andrej Košir, Ante Odić, Marko Tkalčič
      - University of Ljubljana [LDOS], Faculty of Electrical Engineering, Digital Signal, Image and Video Processing Laboratory
  • 2. Statistical power, replicability and reproducibility
      - What are they?
          - Replicability: obtaining the same experimental result (on the same data).
          - Reproducibility: obtaining similar experimental results that lead to the same conclusion.
      - Mackay, R., & Oldford, R. (2000). Scientific method, statistical method, and the speed of light (Working paper 2000-02). Department of Statistics and Actuarial Science, University of Waterloo.
      - In terms of statistical testing: higher power means better reproducibility, because a repeated experiment is more likely to reach the same conclusion.
  • 3. On statistical hypothesis testing
      - When do we need statistical tests? When the results should not change if we repeat the experiment.
      - In practice, we need them at later stages of development, where the results of the compared systems are similar.
      - [Figure: RS 1 and RS 2 evaluated on the same test data, giving F-measures F1 and F2; values shown: 0.72, 0.89, 0.74]
      - Elements of statistical testing:
          - working hypotheses;
          - null and alternative hypotheses: H0 and H1;
          - p-value: p;
          - risk level: α;
          - decision on H0.
  • 4. On errors and statistical power
      - Errors in the test decision: type I and type II errors; effect size.
      - Decision outcomes (rows: true state, columns: decision):
                      Ĥ0 decided       Ĥ1 decided
            H0 true   OK               type I error
            H1 true   type II error    OK
      - Power = Pr[Ĥ1 | H1] is the probability of deciding H1 when H1 is true (1 − β, where β is the type II error probability):
          - a new analysis is required for each test;
          - more is better;
          - the best one can do.
      - Task 1: how to select the sample size (a priori power).
      - Task 2: how to estimate the achieved power (post hoc, or posterior, power).
      - History: 1908, William Sealy Gosset ("Student"), who did not need it; power has long been mostly ignored.
      - Software: GPower, http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/
  • 5. The application we were working on: contextual variables
      - Which contextual variables are relevant?
          - What is context?
          - Candidates: time, weather, mood, ...
          - Can we simply use them all? Irrelevant context can worsen the performance of an RS.
      - Test whether a given contextual variable is relevant.
      - How: compare the RS with and without it.
      - Odić, A., Tkalčič, M., Tasič, J. F., & Košir, A. (2013). Predicting and detecting the relevant contextual information in a movie-recommender system. Interacting with Computers, 25(1), 74-90. doi:10.1093/iwc/iws003. [COBISS.SI-ID 9650260]
      - Odić, A., Tkalčič, M., Tasič, J. F., & Košir, A. (2013). Impact of the context relevancy on ratings prediction in a movie-recommender system. Automatika (Zagreb), 54(2), 252-262. doi:10.7305/automatika.54-2.258. [COBISS.SI-ID 9782356]
  • 6. The problem we observed: the cross-validation scheme
      - Odić, A., Tkalčič, M., Tasič, J. F., & Košir, A. (2013). Predicting and detecting the relevant contextual information in a movie-recommender system. Interacting with Computers, 25(1), 74-90.
      - There were differences among the folds, but they were not reflected in the conclusion.
      - What is wrong? Paired or unpaired testing?
      - What is usually done: computing the confusion matrix, which is actually an unpaired comparison.
  • 7. Proposed solution
      The procedure outline:
        1. Select the scalar comparison measure (such as precision or F-measure).
        2. Store the evaluation results of each fold and each method separately.
        3. According to the specific features of the evaluation results (distributions etc.), select the most powerful test that meets these features.
        4. Perform the paired version of the selected test.
  • 8. Materials and methods (1)
      - Dataset: Context Movie Dataset (LDOS-CoMoDa), 1611 ratings from 89 users to 946 items, with associated contextual factors.
      - Contextual variables:
          - time (morning, afternoon, evening, night)
          - daytype (working day, weekend, ...)
          - season (spring, summer, autumn, winter)
          - location (home, public place, friend's house)
          - weather (sunny/clear, rainy, stormy, snowy, cloudy)
          - social (alone, partner, friends, colleagues, parents, public, family)
          - endEmo (sad, happy, scared, surprised, angry, disgusted, neutral)
          - dominantEmo (sad, happy, scared, surprised, angry, disgusted, neutral)
          - mood (positive, neutral, negative)
          - physical (healthy, ill)
          - decision (user's choice, given by other)
          - interaction (first, n-th)
      - Publicly available: the LDOS-CoMoDa contextual dataset is available at www.ldos.si/comoda.html. It was used by 29 researchers at the time of the talk.
  • 9. Materials and methods (2), results
      - Experimental design: 10-fold cross validation; two procedures, ProcPaired and ProcIndep.
      - Results: which contextual variable improves MF?
      - Tests: Wilcoxon signed-rank test (ProcPaired) and Mann-Whitney U test (ProcIndep).
      - Achieved (post hoc) statistical power of the paired and of the independent test, with the computed p-values:

          Id  Var 1        Var 2    Power (paired)  p (paired)  Power (indep.)  p (indep.)
          1   Physical     Weather  0.42            0.001       0.14            0.24
          2   Decision     Social   0.99            0.004       0.25            0.19
          3   Interaction  Social   0.06            <0.001      0.05            0.43
  • 10. Discussion
      - Power improvements (independent → paired):
          - first combination (physical vs. weather): 0.14 → 0.42, low but useful;
          - second combination (decision vs. social): 0.19 → 0.99, again a substantial difference in power;
          - third combination (interaction vs. social): 0.05 → 0.06, negligible.
      - The paired procedure does not require substantial additional work, so it is worth the effort.
  • 11. Further work
      - We limited ourselves to 10-fold cross validation and simple tests only; there is more out there.
      - We will concentrate on comparing RS with respect to the selected final task (such as the best five items), not limited to scalar performance measures (such as precision at five).
      - More sophisticated statistical approaches are available, such as multi-level repeated binomial regression. In my opinion, they will not be used frequently.
      - THANK YOU
      - Invitation: International Conference on Automatic Face and Gesture Recognition FG2015, http://www.fg2015.org/
  • 12. Presentation structure
      - The goal
      - What does it have to do with replicability and reproducibility?
      - Selected items from statistics
      - Our case & problem statement
      - Proposed solution & comments
      - Experimental results
      - Future work
      - Take-away notes
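Slide 2 ties higher power to better reproducibility. The sketch below illustrates why, under an assumption chosen only for illustration: a one-sample, two-sided z-test of H0: mean = 0 with a known standard deviation, which is not a test used in the talk. When the simulated experiment is repeated many times, the fraction of repetitions that reach the same conclusion ("reject H0") approaches the analytic power. A higher-powered design therefore reproduces its conclusion more often. All function names are hypothetical.

```python
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution

def analytic_power(delta, sigma, n, alpha=0.05):
    """Pr[reject H0 | true mean = delta] for a two-sided one-sample z-test
    of H0: mean = 0 with known standard deviation sigma (illustrative)."""
    z_crit = _STD.inv_cdf(1 - alpha / 2)
    shift = delta / (sigma / n ** 0.5)
    return _STD.cdf(shift - z_crit) + _STD.cdf(-shift - z_crit)

def rejects_h0(sample, sigma, alpha=0.05):
    """Apply the same z-test to one simulated experiment."""
    n = len(sample)
    z = (sum(sample) / n) / (sigma / n ** 0.5)
    p = 2 * (1 - _STD.cdf(abs(z)))
    return p < alpha

def replication_rate(delta, sigma, n, alpha=0.05, repeats=4000, seed=1):
    """Fraction of independent repetitions that reach 'reject H0';
    this is a Monte Carlo estimate of the power."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(delta, sigma) for _ in range(n)]
        hits += rejects_h0(sample, sigma, alpha)
    return hits / repeats

if __name__ == "__main__":
    for n in (10, 40):
        print(n, round(analytic_power(0.3, 1.0, n), 3),
              round(replication_rate(0.3, 1.0, n), 3))
```

With an effect of 0.3 and sigma 1, going from n = 10 to n = 40 raises both the analytic power and the simulated replication rate. The simulated values vary slightly with the seed.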
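The elements of statistical testing listed on slide 3 (H0, H1, p, α, decision on H0) can be made concrete with a small sketch. It assumes an exact two-sided sign test on paired per-fold F-measures of RS 1 and RS 2. The sign test is chosen here only because it fits in a few lines. The fold scores and function names are hypothetical.

```python
from math import comb

def sign_test_p(scores_a, scores_b):
    """Exact two-sided sign test on paired scores.
    H0: each method wins a fold with probability 1/2; H1: it does not.
    Folds with equal scores (ties) are discarded."""
    diffs = [b - a for a, b in zip(scores_a, scores_b) if a != b]
    n = len(diffs)
    if n == 0:
        return 1.0
    wins_b = sum(d > 0 for d in diffs)
    tail = min(wins_b, n - wins_b)
    # P(X <= tail) for X ~ Binomial(n, 1/2), doubled for the two-sided test
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(p, 1.0)

def decide(p, alpha=0.05):
    """Decision on H0 at risk level alpha."""
    return "reject H0" if p < alpha else "do not reject H0"

# Hypothetical per-fold F-measures of two recommenders on the same 10 folds
rs1 = [0.72, 0.70, 0.75, 0.71, 0.73, 0.69, 0.74, 0.72, 0.70, 0.71]
rs2 = [0.74, 0.73, 0.76, 0.74, 0.75, 0.71, 0.75, 0.73, 0.69, 0.74]
p = sign_test_p(rs1, rs2)
print(round(p, 4), decide(p))  # prints: 0.0215 reject H0
```

RS 2 wins 9 of the 10 folds, so p = 2 · (1 + 10) / 1024 ≈ 0.021. This is below α = 0.05, and H0 is rejected.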
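Slide 4 defines power as Pr[Ĥ1 | H1] and names two tasks: choosing the sample size a priori, and estimating the achieved (post hoc) power. The following is a minimal sketch of both tasks. It assumes a two-sided test on paired differences with a standardized effect size d (mean difference divided by the standard deviation of the differences), and it uses a normal approximation. Tools such as GPower work with the noncentral t distribution and return a slightly larger n. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution

def a_priori_n(d, alpha=0.05, power=0.80):
    """Task 1: number of paired differences needed to reach the target
    power for standardized effect size d (two-sided, normal approximation)."""
    z_alpha = _STD.inv_cdf(1 - alpha / 2)
    z_beta = _STD.inv_cdf(power)
    return ceil(((z_alpha + z_beta) / d) ** 2)

def post_hoc_power(d, n, alpha=0.05):
    """Task 2: power Pr[decide H1 | H1] achieved with n paired differences."""
    z_alpha = _STD.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n)
    return _STD.cdf(shift - z_alpha) + _STD.cdf(-shift - z_alpha)

print(a_priori_n(0.5))                    # 32
print(round(post_hoc_power(0.5, 10), 2))  # power with only 10 folds
```

For d = 0 the post hoc power equals α. This matches the type I error row of the decision table: with no effect, "deciding H1" happens only by error.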
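Slide 6 asks whether the comparison should be paired or unpaired. The sketch below computes a t statistic both ways on hypothetical per-fold scores, in which fold difficulty varies much more than the gain of method B over method A:

- The unpaired (Welch-style) statistic sees the whole fold-to-fold spread.
- The paired statistic works on per-fold differences, where the shared fold effect cancels.

The data and function names are assumptions for illustration only.

```python
from math import sqrt
from statistics import mean, stdev

def unpaired_t(a, b):
    """Welch-style t statistic: treats the two fold lists as independent samples."""
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(b) - mean(a)) / se

def paired_t(a, b):
    """Paired t statistic: works on per-fold differences b_i - a_i."""
    d = [y - x for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / sqrt(len(d)))

# Hypothetical per-fold scores: fold difficulty varies a lot, while
# method B gains a small, fairly stable amount on every fold.
fold_a = [0.60, 0.75, 0.68, 0.80, 0.65, 0.72, 0.58, 0.77, 0.70, 0.63]
gains = [0.02, 0.03, 0.01, 0.02, 0.03, 0.02, 0.01, 0.02, 0.03, 0.02]
fold_b = [a + g for a, g in zip(fold_a, gains)]

print(f"unpaired t = {unpaired_t(fold_a, fold_b):.2f}")
print(f"paired t   = {paired_t(fold_a, fold_b):.2f}")
```

On these made-up folds, the unpaired statistic stays well below 1, while the paired one is roughly 9. The same data, tested the two ways, can lead to opposite conclusions.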
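The four-step procedure of slide 7 can be sketched as follows. This sketch rests on three hypothetical assumptions:

- The scalar measure (step 1) has already been computed per fold.
- The per-fold scores are stored in a dict keyed by method name, in the same fold order for every method (step 2).
- For steps 3 and 4, the selected test is the Wilcoxon signed-rank test, the paired test reported on slide 9.

The exact p-value enumerates all 2^n sign patterns, which is cheap for 10 folds.

```python
from itertools import product

def average_ranks(values):
    """1-based ranks of values; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def wilcoxon_signed_rank(scores_a, scores_b):
    """Exact two-sided Wilcoxon signed-rank p-value for paired scores."""
    # Rounding removes floating-point noise so that equal differences tie;
    # zero differences are dropped.
    diffs = [round(b - a, 12) for a, b in zip(scores_a, scores_b)]
    diffs = [d for d in diffs if d != 0]
    n = len(diffs)
    if n == 0:
        return 1.0
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    center = sum(ranks) / 2  # mean of W+ under H0
    observed = abs(w_plus - center)
    # Null distribution: every sign pattern over the ranks is equally likely.
    extreme = 0
    for signs in product((False, True), repeat=n):
        w = sum(r for r, positive in zip(ranks, signs) if positive)
        if abs(w - center) >= observed - 1e-9:
            extreme += 1
    return extreme / 2 ** n

def compare_methods(fold_scores, method_a, method_b, alpha=0.05):
    """Steps 2-4: paired test on per-fold scores stored per method."""
    a, b = fold_scores[method_a], fold_scores[method_b]
    if len(a) != len(b):
        raise ValueError("both methods must be scored on the same folds")
    p = wilcoxon_signed_rank(a, b)
    return p, p < alpha

# Hypothetical per-fold scores (step 1), one list per method (step 2)
fold_scores = {
    "MF": [0.5] * 10,
    "MF+context": [0.5 + k / 100 for k in range(1, 11)],
}
print(compare_methods(fold_scores, "MF", "MF+context"))  # (0.001953125, True)
```

When the second method is better on all 10 folds, only the all-positive and all-negative sign patterns are as extreme. The exact two-sided p-value is therefore 2/1024 ≈ 0.002.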
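ProcIndep on slide 9 treats the 10 fold scores of each method as two independent samples and applies the Mann-Whitney U test. A minimal exact version is sketched below, assuming continuous scores with no ties. The null distribution of the rank sum is counted by dynamic programming over subsets of ranks. The function name is hypothetical. When the fold-to-fold spread is large, the two samples overlap and this p-value tends to stay large even if one method wins on every fold. This is the loss of power that slide 9 reports for the independent procedure.

```python
from math import comb

def mann_whitney_exact(x, y):
    """Exact two-sided Mann-Whitney U test p-value (rank-sum form) for two
    independent samples; assumes no tied values in the pooled data."""
    pooled = sorted(x + y)
    if len(set(pooled)) != len(pooled):
        raise ValueError("this sketch assumes no tied scores")
    rank = {v: i + 1 for i, v in enumerate(pooled)}
    n1, n_total = len(x), len(pooled)
    r1 = sum(rank[v] for v in x)  # rank sum of the first sample
    # ways[k][s]: number of k-element subsets of ranks 1..N with sum s
    max_sum = n_total * (n_total + 1) // 2
    ways = [[0] * (max_sum + 1) for _ in range(n1 + 1)]
    ways[0][0] = 1
    for r in range(1, n_total + 1):
        for k in range(min(r, n1), 0, -1):  # descending k: each rank used once
            row, prev = ways[k], ways[k - 1]
            for s in range(max_sum, r - 1, -1):
                row[s] += prev[s - r]
    mean = n1 * (n_total + 1) / 2  # expected rank sum under H0
    observed = abs(r1 - mean)
    extreme = sum(c for s, c in enumerate(ways[n1]) if abs(s - mean) >= observed)
    return extreme / comb(n_total, n1)

print(mann_whitney_exact([1, 2, 3], [4, 5, 6]))  # 0.1
```

Even with complete separation, three scores per sample cannot give p below 0.1. The smallest attainable p-value is fixed by the number of possible rank arrangements, not by the size of the gap between the samples.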