A Similarity-based Adaptation of Naive Bayes for Label
Ranking: Application to the Metalearning Problem of
Algorithm Recommendation
Artur Aiguzhinov¹, Carlos Soares¹, Ana Paula Serra²

¹ LIAAD-INESC Porto LA & Faculdade de Economia da Universidade do Porto
² Faculdade de Economia da Universidade do Porto & CEFUP - Centro de Economia e Finanças da Universidade do Porto
October 8th, 2010
Discovery Science 2010, Canberra
Motivation:
the ability to predict rankings ahead of time (e.g., rankings of financial analysts or of algorithms);
a popular topic in Machine Learning.
Why use naive Bayes?
successful results in many applications;
builds on a principled Bayesian framework.
Label ranking: formalization
Instances: X ⊆ {V1, . . . , Vm}
Labels: L = {λ1, . . . , λk}
Output: Y = Π_L, the set of permutations of L
Training set: T = {(x_i, y_i)}_{i∈{1,...,n}} ⊆ X × Y
Learn a mapping h : X → Y such that a loss function is minimized, i.e., the average rank correlation between target and predicted rankings is maximized:

$$\frac{1}{n}\sum_{i=1}^{n} \rho(\pi_i, \hat{\pi}_i) \qquad (1)$$
with ρ being the Spearman rank correlation coefficient:

$$\rho(\pi, \hat{\pi}) = 1 - \frac{6\sum_{j=1}^{k}(\pi_j - \hat{\pi}_j)^2}{k^3 - k} \qquad (2)$$
where π and ˆπ are, respectively, the target and predicted rankings for a
given instance.
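To make the evaluation measure concrete, here is a minimal Python sketch of Eqs. (1) and (2); the function names are ours, not from the paper:

```python
import numpy as np

def spearman(pi, pi_hat):
    """Spearman rank correlation between two rankings (Eq. 2).
    Rankings are sequences of ranks, e.g. [1, 2, 3] ranks the
    first label 1st, the second 2nd, the third 3rd."""
    pi, pi_hat = np.asarray(pi), np.asarray(pi_hat)
    k = len(pi)
    return 1.0 - 6.0 * np.sum((pi - pi_hat) ** 2) / (k ** 3 - k)

def mean_accuracy(targets, predictions):
    """Average Spearman correlation over n instances (Eq. 1)."""
    return np.mean([spearman(t, p) for t, p in zip(targets, predictions)])

print(spearman([1, 2, 3], [1, 2, 3]))  # 1.0  (identical rankings)
print(spearman([1, 2, 3], [3, 2, 1]))  # -1.0 (exactly reversed)
```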
Naive Bayes for Classification
Day Outlook Temperature Humidity Wind Play tennis
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
3 Overcast Hot High Weak Yes
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
6 Rain Cool Normal Strong No
7 Overcast Cool Normal Strong Yes
$$c_{nb}(x_i) = \arg\max_{\lambda \in L} P(\lambda) \prod_{j=1}^{m} P(x_{i,j} \mid \lambda) \qquad (3)$$
Example:
Prior probability: P(Yes) = 4/7
Conditional probability: P(Outlook = Rain | Yes) = 2/4 = 1/2
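A small sketch reproducing the two estimates above directly from the table (only the Outlook attribute is kept; the variable names are illustrative):

```python
# (Outlook, Play tennis) pairs from the table above.
data = [("Sunny", "No"), ("Sunny", "No"), ("Overcast", "Yes"),
        ("Rain", "Yes"), ("Rain", "Yes"), ("Rain", "No"),
        ("Overcast", "Yes")]

labels = [y for _, y in data]
prior_yes = labels.count("Yes") / len(labels)                # 4/7

yes_outlooks = [x for x, y in data if y == "Yes"]
cond_rain_given_yes = yes_outlooks.count("Rain") / len(yes_outlooks)  # 2/4

print(prior_yes, cond_rain_given_yes)                        # 0.571..., 0.5
```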
Naive Bayes for Label Ranking
Day Outlook Temperature Humidity Wind A B C
1 Sunny Hot High Weak 1 2 3
2 Sunny Hot High Strong 2 3 1
3 Overcast Hot High Weak 1 2 3
4 Rain Mild High Weak 3 2 1
5 Rain Cool Normal Weak 3 2 1
6 Rain Cool Normal Strong 2 1 3
7 Overcast Cool Normal Strong 1 2 3
Main idea: maximizing the likelihood is equivalent to minimizing the
distance (i.e., maximizing the similarity) in a Euclidean space
Prior Probability of Label Ranking
Table: Comparison of values of prior probability by addressing the label ranking
problem as a classification problem or using similarity
π        P(π)            P_LR(π)
1 2 3    3/7 = 0.428     0.571
2 1 3    1/7 = 0.143     0.546
$$P_{LR}(\pi) = \frac{\sum_{i=1}^{n} \rho(\pi, \pi_i)}{n}$$
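A sketch of this estimate on the toy table, reusing `spearman` from the earlier sketch. Note one assumption on our part: to reproduce the values in the tables on these slides, ρ has to be rescaled from [-1, 1] to [0, 1], which is what `sim` does below:

```python
import numpy as np

# Rankings over labels (A, B, C) from the table above.
rankings = [[1, 2, 3], [2, 3, 1], [1, 2, 3], [3, 2, 1],
            [3, 2, 1], [2, 1, 3], [1, 2, 3]]

def sim(pi, pi_hat):
    """Spearman rho rescaled to [0, 1] (assumed normalization)."""
    return (spearman(pi, pi_hat) + 1) / 2

def prior_lr(pi, rankings):
    """Similarity-based prior: mean similarity of pi to all
    training rankings."""
    return np.mean([sim(pi, r) for r in rankings])

print(round(prior_lr([1, 2, 3], rankings), 3))  # 0.571
print(round(prior_lr([2, 1, 3], rankings), 3))  # ~0.54
```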
Conditional Probability of Label Ranking
Table: Comparison of values of conditional probability by addressing the label
ranking problem as a classification problem or using similarity
π        P(Outlook = Rain | π)    P_LR(Outlook = Rain | π)
3 2 1    2/2 = 1.00               0.75
2 1 3    1/1 = 1.00               0.50
1 2 3    0/3 = 0.00               0.25
$$P_{LR}(v_{a,i} \mid \pi) = \frac{\sum_{i:\, x_{i,a} = v_{a,i}} \rho(\pi, \pi_i)}{|\{i : x_{i,a} = v_{a,i}\}|}$$
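The corresponding sketch for the conditional estimate, reusing `sim` and `rankings` from the previous sketch; it reproduces the 0.75 / 0.50 / 0.25 column above:

```python
# Outlook values for the seven instances in the table above.
outlook = ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast"]

def cond_lr(value, pi, attr_values, rankings):
    """Similarity-based conditional: mean similarity of pi to the
    rankings of the instances that take this attribute value."""
    sims = [sim(pi, r) for a, r in zip(attr_values, rankings) if a == value]
    return sum(sims) / len(sims)

print(cond_lr("Rain", [3, 2, 1], outlook, rankings))  # 0.75
print(cond_lr("Rain", [2, 1, 3], outlook, rankings))  # 0.50
print(cond_lr("Rain", [1, 2, 3], outlook, rankings))  # 0.25
```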
Adapting Naive Bayes for Label Ranking
Estimated ranking:
$$\hat{\pi} = \arg\max_{\pi \in \Pi_L} P_{LR}(\pi) \prod_{a=1}^{m} P_{LR}(x_{i,a} \mid \pi) \qquad (4)$$
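Putting the pieces together, a brute-force sketch of Eq. (4) that scores every candidate permutation (feasible only for small k; it reuses `prior_lr`, `cond_lr`, `outlook`, and `rankings` from the sketches above, and the helper names are ours):

```python
from itertools import permutations

def predict_ranking(x, attrs, rankings):
    """Adapted naive Bayes for label ranking (Eq. 4): pick the
    permutation maximizing prior * product of conditionals.
    `x` maps attribute name -> value for the query instance;
    `attrs` maps attribute name -> list of training values."""
    k = len(rankings[0])
    best, best_score = None, -1.0
    for cand in permutations(range(1, k + 1)):
        score = prior_lr(list(cand), rankings)
        for a, v in x.items():
            score *= cond_lr(v, list(cand), attrs[a], rankings)
        if score > best_score:
            best, best_score = cand, score
    return best

# Example query: a rainy day, using only the Outlook attribute.
print(predict_ranking({"Outlook": "Rain"}, {"Outlook": outlook}, rankings))
# -> (3, 1, 2) on this toy data
```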
Metalearning Problem of Algorithm Recommendation
Problem of algorithm selection:
Choose the best algorithm for a given dataset.
Metalearning approach:
gather information about the performance of algorithms on many
datasets;
find a mapping between characteristics of the datasets and the
performance of the algorithms;
Label ranking for metalearning: rank the algorithms according to their performance.
Baseline
The baseline:
$$\bar{\pi}^{-1}_j = \frac{\sum_{i=1}^{n} \pi^{-1}_{i,j}}{n} \qquad (5)$$

where $\pi^{-1}_{i,j}$ is the rank of label λj on dataset i.
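A sketch of this baseline on the toy rankings above. Eq. (5) gives average ranks; turning those averages back into a ranking is our reading of how the default prediction is produced:

```python
import numpy as np

def baseline_ranking(rankings):
    """Average the rank of each label over all training instances
    (Eq. 5), then rank those averages to get the default ranking."""
    mean_ranks = np.mean(np.asarray(rankings), axis=0)
    # argsort of argsort turns the averages into ranks starting at 1.
    return np.argsort(np.argsort(mean_ranks)) + 1

print(baseline_ranking(rankings))  # [1 2 3] for the toy table
```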
Dataset description
class: performance of 10 algorithms on a set of 57 classification base-level datasets (BLD).
regr: performance of 9 algorithms on a set of 42 regression BLD.
svm-*: 4 datasets with the performance of different variants of the Support Vector Machines algorithm on the same 42 regression BLD.
Experiment Results: Metalearning
Table: Experimental results of the adapted naive Bayes algorithm for label ranking compared to the baseline. Items marked (*), (**), and (***) are statistically significant at the 10%, 5%, and 1% significance levels, respectively.

Dataset      NBr     Baseline   p-value
class        0.506   0.479      0.000***
regr         0.658   0.523      0.056*
svm-5        0.326   0.083      0.000***
svm-11       0.372   0.144      0.029**
svm-21       0.362   0.229      0.055*
svm-eps01    0.369   0.244      0.091*
Conclusion and Future work
Summary:
a similarity-based approach to the label ranking problem;
uses the Bayesian framework for ranking prediction;
outperforms the baseline.
Future work:
treating missing values;
adapting to continuous variables.
Ranking of Financial Analysts (to be presented at
FMA Annual Meeting Oct. 19th, 2010, NY)
StarMine® issues annual analyst rankings:
it ranks analysts based on recommendation performance and EPS forecast accuracy.
Why not predict stock prices directly?
Analysts' relative performance (rankings) is more predictable than stock prices.
Is it possible to predict these rankings?
If so, can we use those predictions in a profitable strategy?
Methodology
236 stocks from 4 sectors (Energy, Materials, IT, Industrials);
quarterly EPS forecasts from 1989 until 2009;
variables that describe market conditions and stock characteristics;
Data summary
Table: Summary of the data
Sector # analysts # stocks
Energy 135 34
Industrials 208 66
Materials 147 30
IT 301 106
Total 791 236
Experiment Results: Financial Analysts
Table: Summary of the results compared to the default ranking. The last three columns count the stocks whose predictions are significant at the 1%, 5%, and 10% levels.

Sector       # stocks   # successful predictions   Prediction rate   1%   5%   10%
Energy       34         18                         0.53              7    9    9
Industrials  66         31                         0.47              16   18   23
Materials    30         12                         0.40              5    7    8
IT           106        51                         0.48              18   27   30
Total        236        112                        0.47              46   61   70