A Similarity-based Adaptation of Naive Bayes for Label
Ranking: Application to the Metalearning Problem of
Algorithm Recommendation
Artur Aiguzhinov¹, Carlos Soares¹, Ana Paula Serra²

¹ LIAAD-INESC Porto LA & Faculdade de Economia da Universidade do Porto
² Faculdade de Economia da Universidade do Porto & CEFUP - Centro de Economia e Finanças da Universidade do Porto
October 8th, 2010
Discovery Science 2010, Canberra
Motivation:
the ability to predict rankings ahead of time (e.g., rankings of financial analysts or of algorithms);
a popular topic in Machine Learning.
Why use naive Bayes?
successful results in many applications;
builds on a principled Bayesian framework.
Label ranking: formalization
Instances: X ⊆ {V1, . . . , Vm}
Labels: L = {λ1, . . . , λk}
Output: Y = Π_L, the set of permutations of L
Training set: T = {(x_i, y_i)}_{i∈{1,...,n}} ⊆ X × Y
Learn a mapping h : X → Y such that a loss function is minimized, i.e., the average rank correlation between target and predicted rankings is maximized:

$$\frac{1}{n}\sum_{i=1}^{n} \rho(\pi_i, \hat{\pi}_i) \qquad (1)$$
with ρ being the Spearman rank correlation coefficient:

$$\rho(\pi, \hat{\pi}) = 1 - \frac{6\sum_{j=1}^{k}(\pi_j - \hat{\pi}_j)^2}{k^3 - k} \qquad (2)$$
where π and ˆπ are, respectively, the target and predicted rankings for a
given instance.
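To make the evaluation measure concrete, here is a minimal Python sketch of Eqs. (1) and (2); the function names are ours, not from the paper:

```python
import numpy as np

def spearman(pi, pi_hat):
    """Spearman rank correlation between two rankings (Eq. 2).
    Rankings are sequences of ranks, e.g. [1, 2, 3] ranks the
    first label 1st, the second 2nd, the third 3rd."""
    pi, pi_hat = np.asarray(pi), np.asarray(pi_hat)
    k = len(pi)
    return 1.0 - 6.0 * np.sum((pi - pi_hat) ** 2) / (k ** 3 - k)

def mean_accuracy(targets, predictions):
    """Average Spearman correlation over n instances (Eq. 1)."""
    return np.mean([spearman(t, p) for t, p in zip(targets, predictions)])

print(spearman([1, 2, 3], [1, 2, 3]))  # 1.0  (identical rankings)
print(spearman([1, 2, 3], [3, 2, 1]))  # -1.0 (exactly reversed)
```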
Naive Bayes for Classification
Day Outlook Temperature Humidity Wind Play tennis
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
3 Overcast Hot High Weak Yes
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
6 Rain Cool Normal Strong No
7 Overcast Cool Normal Strong Yes
$$c_{nb}(x_i) = \arg\max_{\lambda \in L} P(\lambda) \prod_{j=1}^{m} P(x_{i,j} \mid \lambda) \qquad (3)$$
Example:
Prior probability: P(Yes) = 4/7
Conditional probability: P(Outlook = Rain | Yes) = 2/4 = 1/2
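A small sketch reproducing the two estimates above directly from the table (only the Outlook attribute is kept; the variable names are illustrative):

```python
# (Outlook, Play tennis) pairs from the table above.
data = [("Sunny", "No"), ("Sunny", "No"), ("Overcast", "Yes"),
        ("Rain", "Yes"), ("Rain", "Yes"), ("Rain", "No"),
        ("Overcast", "Yes")]

labels = [y for _, y in data]
prior_yes = labels.count("Yes") / len(labels)                # 4/7

yes_outlooks = [x for x, y in data if y == "Yes"]
cond_rain_given_yes = yes_outlooks.count("Rain") / len(yes_outlooks)  # 2/4

print(prior_yes, cond_rain_given_yes)                        # 0.571..., 0.5
```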
Naive Bayes for Label Ranking
Day Outlook Temperature Humidity Wind A B C
1 Sunny Hot High Weak 1 2 3
2 Sunny Hot High Strong 2 3 1
3 Overcast Hot High Weak 1 2 3
4 Rain Mild High Weak 3 2 1
5 Rain Cool Normal Weak 3 2 1
6 Rain Cool Normal Strong 2 1 3
7 Overcast Cool Normal Strong 1 2 3
Main idea: maximizing the likelihood is equivalent to minimizing the
distance (i.e., maximizing the similarity) in a Euclidean space
Prior Probability of Label Ranking
Table: Comparison of values of prior probability by addressing the label ranking
problem as a classification problem or using similarity
π        P(π)            P_LR(π)
1 2 3    3/7 = 0.428     0.571
2 1 3    1/7 = 0.143     0.546
$$P_{LR}(\pi) = \frac{\sum_{i=1}^{n} \rho(\pi, \pi_i)}{n}$$
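A sketch of this estimate on the toy table, reusing `spearman` from the earlier sketch. Note one assumption on our part: to reproduce the values in the tables on these slides, ρ has to be rescaled from [-1, 1] to [0, 1], which is what `sim` does below:

```python
import numpy as np

# Rankings over labels (A, B, C) from the table above.
rankings = [[1, 2, 3], [2, 3, 1], [1, 2, 3], [3, 2, 1],
            [3, 2, 1], [2, 1, 3], [1, 2, 3]]

def sim(pi, pi_hat):
    """Spearman rho rescaled to [0, 1] (assumed normalization)."""
    return (spearman(pi, pi_hat) + 1) / 2

def prior_lr(pi, rankings):
    """Similarity-based prior: mean similarity of pi to all
    training rankings."""
    return np.mean([sim(pi, r) for r in rankings])

print(round(prior_lr([1, 2, 3], rankings), 3))  # 0.571
print(round(prior_lr([2, 1, 3], rankings), 3))  # ~0.54
```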
Conditional Probability of Label Ranking
Table: Comparison of values of conditional probability by addressing the label
ranking problem as a classification problem or using similarity
π        P(Outlook = Rain | π)    P_LR(Outlook = Rain | π)
3 2 1    2/2 = 1.00               0.75
2 1 3    1/1 = 1.00               0.50
1 2 3    0/3 = 0.00               0.25
$$P_{LR}(v_{a,i} \mid \pi) = \frac{\sum_{i:\, x_{i,a} = v_{a,i}} \rho(\pi, \pi_i)}{|\{i : x_{i,a} = v_{a,i}\}|}$$
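The corresponding sketch for the conditional estimate, reusing `sim` and `rankings` from the previous sketch; it reproduces the 0.75 / 0.50 / 0.25 column above:

```python
# Outlook values for the seven instances in the table above.
outlook = ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast"]

def cond_lr(value, pi, attr_values, rankings):
    """Similarity-based conditional: mean similarity of pi to the
    rankings of the instances that take this attribute value."""
    sims = [sim(pi, r) for a, r in zip(attr_values, rankings) if a == value]
    return sum(sims) / len(sims)

print(cond_lr("Rain", [3, 2, 1], outlook, rankings))  # 0.75
print(cond_lr("Rain", [2, 1, 3], outlook, rankings))  # 0.50
print(cond_lr("Rain", [1, 2, 3], outlook, rankings))  # 0.25
```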
Adapting Naive Bayes for Label Ranking
Estimated ranking:
$$\hat{\pi} = \arg\max_{\pi \in \Pi_L} P_{LR}(\pi) \prod_{a=1}^{m} P_{LR}(x_{i,a} \mid \pi) \qquad (4)$$
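Putting the pieces together, a brute-force sketch of Eq. (4) that scores every candidate permutation (feasible only for small k; it reuses `prior_lr`, `cond_lr`, `outlook`, and `rankings` from the sketches above, and the helper names are ours):

```python
from itertools import permutations

def predict_ranking(x, attrs, rankings):
    """Adapted naive Bayes for label ranking (Eq. 4): pick the
    permutation maximizing prior * product of conditionals.
    `x` maps attribute name -> value for the query instance;
    `attrs` maps attribute name -> list of training values."""
    k = len(rankings[0])
    best, best_score = None, -1.0
    for cand in permutations(range(1, k + 1)):
        score = prior_lr(list(cand), rankings)
        for a, v in x.items():
            score *= cond_lr(v, list(cand), attrs[a], rankings)
        if score > best_score:
            best, best_score = cand, score
    return best

# Example query: a rainy day, using only the Outlook attribute.
print(predict_ranking({"Outlook": "Rain"}, {"Outlook": outlook}, rankings))
# -> (3, 1, 2) on this toy data
```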
Metalearning Problem of Algorithm Recommendation
Problem of algorithm selection:
Choose the best algorithm for a given dataset.
Metalearning approach:
gather information about the performance of algorithms on many
datasets;
find a mapping between characteristics of the datasets and the
performance of the algorithms;
Label ranking for metalearning: rank the algorithms according to their performance.
Baseline
The baseline:
$$\bar{\pi}^{-1}_j = \frac{\sum_{i=1}^{n} \pi^{-1}_{i,j}}{n} \qquad (5)$$

where $\pi^{-1}_{i,j}$ is the rank of label λj on dataset i.
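A sketch of this baseline on the toy rankings above. Eq. (5) gives average ranks; turning those averages back into a ranking is our reading of how the default prediction is produced:

```python
import numpy as np

def baseline_ranking(rankings):
    """Average the rank of each label over all training instances
    (Eq. 5), then rank those averages to get the default ranking."""
    mean_ranks = np.mean(np.asarray(rankings), axis=0)
    # argsort of argsort turns the averages into ranks starting at 1.
    return np.argsort(np.argsort(mean_ranks)) + 1

print(baseline_ranking(rankings))  # [1 2 3] for the toy table
```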
Dataset description
class: performance of 10 algorithms on a set of 57 classification base-level datasets (BLD).
regr: performance of 9 algorithms on a set of 42 regression BLD.
svm-*: 4 datasets with the performance of different variants of the Support Vector Machines algorithm on the same 42 regression BLD.
Experiment Results: Metalearning
Table: Experimental results of the adapted naive Bayes algorithm for label ranking compared to the baseline. Items marked (*), (**), and (***) are statistically significant at the 10%, 5%, and 1% significance levels, respectively.

Dataset      NBr     Baseline   p-value
class        0.506   0.479      0.000***
regr         0.658   0.523      0.056*
svm-5        0.326   0.083      0.000***
svm-11       0.372   0.144      0.029**
svm-21       0.362   0.229      0.055*
svm-eps01    0.369   0.244      0.091*
Conclusion and Future work
Summary:
a similarity-based approach to the label ranking problem;
uses the Bayesian framework for ranking prediction;
outperforms the baseline.
Future work:
treating missing values;
adapting to continuous variables.
Ranking of Financial Analysts (to be presented at
FMA Annual Meeting Oct. 19th, 2010, NY)
StarMine® issues annual analyst rankings:
it ranks analysts based on recommendation performance and EPS forecast accuracy.
Why not predict stock prices directly?
Analysts' relative performance (rankings) is more predictable than stock prices.
Is it possible to predict these rankings?
If so, can we use those predictions in a profitable strategy?
Methodology
236 stocks from 4 sectors (Energy, Materials, IT, Industrials);
quarterly EPS forecasts from 1989 until 2009;
variables that describe market conditions and stock characteristics;
Data summary
Table: Summary of the data
Sector # analysts # stocks
Energy 135 34
Industrials 208 66
Materials 147 30
IT 301 106
Total 791 236
Experiment Results: Financial Analysts
Table: Summary of the results compared to the default ranking. The last three columns count the stocks whose predictions are significant at the 1%, 5%, and 10% levels.

Sector       # stocks   # successful predictions   Prediction rate   1%   5%   10%
Energy       34         18                         0.53              7    9    9
Industrials  66         31                         0.47              16   18   23
Materials    30         12                         0.40              5    7    8
IT           106        51                         0.48              18   27   30
Total        236        112                        0.47              46   61   70