Meta Learning via Collaborative Filtering
Barnett Chiu
10.01.19
Ensemble Learning
• Ensemble learning methods combine the decisions from multiple models to improve overall performance.
– Key idea: a tradeoff between diversity and accuracy
– Reduces variance and generalization error
• Homogeneous ensemble
• Heterogeneous ensemble
CF Ensemble Learning
• Collaborative filtering
• Operates mainly in heterogeneous ensemble learning settings, in which the base predictors are assumed to be quite different from one another
– In general, applicable as long as the base predictors output probability scores
• CF Ensemble deals with probability matrices (i.e. the prediction matrices of the base predictors)
– The probability matrix is analogous to the rating matrix in CF settings
– A method of stacking (or stacked generalization)
– It attempts to identify unreliable entries in the probability matrix and then re-estimates the probability values for those entries
• Collaborative filtering
• Probability matrix / rating matrix
Collaborative Filtering (Basics)
Collaborative Filtering
Decision Tree | Logistic | kNN Classifier | Multilayer Perceptron
0.853 | 0.709 | 0.528 | 0.935
Probability matrix
(rows = base predictors, columns = data instances, bottom row = class labels)

BP1:    0.7  0.9  0.2  0.6  …  0.7
BP2:    0.2  0.1  0.6  0.1  …  0.3
BP3:    0.1  0.8  0.7  0.2  …  0.8
BP4:    0.3  0.7  0.3  0.6  …  0.7
labels:   0    1    0    0  …    1
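A hedged sketch of how such a probability matrix might be assembled from heterogeneous base predictors with scikit-learn; the classifiers and the synthetic data below are placeholders, not the ones used in the talk.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# Placeholder data; the talk's datasets are not specified here.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

base_predictors = [
    DecisionTreeClassifier(max_depth=5, random_state=0),
    LogisticRegression(max_iter=1000),
    KNeighborsClassifier(),
    MLPClassifier(max_iter=500, random_state=0),
]

# R[u, i] = probability of the positive class assigned by predictor u to instance i
R = np.vstack([clf.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
               for clf in base_predictors])
labels = y_te  # the label row shown at the bottom of the slide's matrix
print(R.shape)  # (n_base_predictors, n_instances)
```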
Majority Votes
[Figure: a probability matrix and the corresponding thresholded binary matrices; per-instance majority votes over the base predictors yield the ensemble prediction]
• Labeling matrix (L) as a binary matrix
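A minimal sketch (assuming a simple 0.5 threshold, which the talk does not necessarily use) of turning the probability matrix into a binary labeling matrix and taking per-instance majority votes.

```python
import numpy as np

def majority_vote(R, threshold=0.5):
    """R: (n_base_predictors, n_instances) probability matrix."""
    L_hat = (R >= threshold).astype(int)       # binary labeling matrix
    votes = L_hat.sum(axis=0)                  # positive votes per instance
    return (votes > R.shape[0] / 2).astype(int)

R = np.array([[0.7, 0.9, 0.2, 0.6],
              [0.2, 0.1, 0.6, 0.1],
              [0.1, 0.8, 0.7, 0.2],
              [0.3, 0.7, 0.3, 0.6]])
print(majority_vote(R))  # [0 1 0 0]
```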
Stacking
[Figure: base-predictor outputs on the training set and the test set form the meta-features of a second-level (stacked) model, which is then applied to a new instance]
Ensemble Selection
E.g. CES (Caruana’s Ensemble Selection)
… Iteratively grows the ensemble by selecting base predictors that lead to higher gains in the chosen performance metric (a greedy sketch follows below).
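A hedged sketch of greedy ensemble selection in this spirit; the metric (AUC here) and details such as bagged selection are simplifications, not necessarily what CES or the talk uses.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def greedy_ensemble_selection(R_val, y_val, n_steps=10):
    """R_val: (n_base_predictors, n_val_instances) probabilities on a validation set."""
    selected, scores = [], []
    for _ in range(n_steps):
        best_u, best_score = None, -np.inf
        for u in range(R_val.shape[0]):            # selection with replacement
            candidate = selected + [u]
            score = roc_auc_score(y_val, R_val[candidate].mean(axis=0))
            if score > best_score:
                best_u, best_score = u, score
        selected.append(best_u)
        scores.append(best_score)
    return selected, scores
```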
CF Ensemble Learning
[Figure: the probability matrix R (base predictors × data points) is factorized into latent factor matrices X and Y, from which a re-estimated matrix R' is reconstructed]
• Identify unreliable conditional probabilities
• Re-estimate the unreliable entries via reliable entries
• Unreliable entries are analogous to un-rated entries in a recommender system
• Predict what could have been a better estimate of p(y=1|x) using reliable entries
• Need to figure out the latent representations of both classifiers and data points in order to re-estimate conditional probabilities and quantify classifier and data similarities
Latent Factor Representation
[Figure: each base predictor and each data point is mapped to a latent vector (e.g. <x1,x2,x3> and <y1,y2,y3>); the inner product of the two vectors approximates the corresponding entry of the probability matrix]
Optimization Objectives (1)
• Minimizing the reconstruction error
• Minimizing the weighted reconstruction error, where the weight C_ui ∈ {0, 1}
• Since every entry of the probability matrix is “observed,” we instead need to decide which entries should remain in the cost function
$$\sum_{u,i}\left(r_{ui}-x_u^{\top}y_i\right)^2+\lambda\left(\lVert x_u\rVert^2+\lVert y_i\rVert^2\right)$$
• Include TPs and TNs and leave out FPs and FNs
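A hedged sketch of the weighted-reconstruction objective above, solved by alternating least squares; the variable names (X for classifier factors, Y for data-point factors) follow the slides, but the ALS solver itself is an assumption, not necessarily the author's implementation.

```python
import numpy as np

def weighted_mf(R, C, k=3, lam=0.1, n_iters=20, seed=0):
    """R, C: (n_predictors, n_instances); entries with C[u, i] = 0 drop out of the error."""
    rng = np.random.default_rng(seed)
    n_u, n_i = R.shape
    X = rng.normal(scale=0.1, size=(n_u, k))   # latent vectors for base predictors
    Y = rng.normal(scale=0.1, size=(n_i, k))   # latent vectors for data points
    I = np.eye(k)
    for _ in range(n_iters):
        for u in range(n_u):                   # solve for x_u with Y fixed
            W = np.diag(C[u])
            X[u] = np.linalg.solve(Y.T @ W @ Y + lam * I, Y.T @ W @ R[u])
        for i in range(n_i):                   # solve for y_i with X fixed
            W = np.diag(C[:, i])
            Y[i] = np.linalg.solve(X.T @ W @ X + lam * I, X.T @ W @ R[:, i])
    return X, Y, X @ Y.T                       # R' = reconstructed probability matrix
```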
Optimization Objectives (2)
• Minimizing the weighted reconstruction error with confidence scores (as continuous quantities rather than discrete values like {0, 1})
• In classification, the situation is more complex because not all TPs and TNs are created equal
• We often have very skewed class distributions (e.g. protein function prediction): very few positive instances
• Each probability score has an associated confidence measure
CF for Ensemble Learning
• How to determine (degrees of) reliability?
• Confidence score
– Brier score (a per-entry sketch follows below)
– Ratio of correct predictions
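A minimal sketch of one way a per-entry, Brier-style confidence score could be computed; the exact confidence definition used in the talk may differ.

```python
import numpy as np

def brier_confidence(R, labels):
    """R: (n_base_predictors, n_instances) probabilities; labels: (n_instances,) in {0, 1}."""
    brier = (R - labels[None, :]) ** 2      # per-entry squared error, in [0, 1]
    return 1.0 - brier                      # higher = more reliable entry

R = np.array([[0.7, 0.9, 0.2],
              [0.2, 0.1, 0.6]])
labels = np.array([0, 1, 0])
print(brier_confidence(R, labels).round(2))
```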
Optimization Objectives (3)
• Estimate a preference score in {0, 1}, depending on the “polarity” (M)
$$\sum_{u,i}c_{ui}\left(r_{ui}-x_u^{\top}y_i\right)^2+\lambda\left(\lVert x_u\rVert^2+\lVert y_i\rVert^2\right)$$
• Positive polarity if the entry (u, i) corresponds to TPs or TNs
• Negative polarity if the entry (u, i) corresponds to FPs or FNs
• M[u,i] represents a “polarity” of an entry in the matrix
Polarity Matrix
[Figure: the probability matrix (base predictors × data points) with its label row; entries are annotated “+” (TP/TN) or “−” (FP/FN) according to their polarity]
• If we could identify the polarities reasonably well and drop the negatives at prediction time, the predictive performance would be exceptional
• Why? When there are many BPs (e.g. via bagging), chances are that one or more predictors are making correct predictions (for the most part); a polarity-construction sketch follows below.
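A hedged sketch of deriving the polarity matrix from the probability matrix and the true labels, again assuming a simple 0.5 threshold rather than per-predictor thresholds.

```python
import numpy as np

def polarity_matrix(R, labels, threshold=0.5):
    """+1 for TP/TN entries, -1 for FP/FN entries."""
    predicted = (R >= threshold).astype(int)
    correct = predicted == labels[None, :]
    return np.where(correct, 1, -1)

R = np.array([[0.7, 0.9, 0.2, 0.6],
              [0.2, 0.1, 0.6, 0.1]])
labels = np.array([0, 1, 0, 0])
print(polarity_matrix(R, labels))
# [[-1  1  1 -1]
#  [ 1 -1 -1  1]]
```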
Cost Function with Polarities (1)
[Figure: the same polarity-annotated probability matrix as on the previous slide]
• When approximating “ratings” …
Cost Function with Polarities (2)
$$\sum_{u,i}c_{ui}\left(r_{ui}-x_u^{\top}y_i\right)^2+\lambda\left(\lVert x_u\rVert^2+\lVert y_i\rVert^2\right)$$
[Figure: the same polarity-annotated probability matrix as on the previous slides]
• When representing preference scores …
Polarity Modeling
• Can be broken down into a four-class prediction problem: i.e. predicting TPs, TNs, FPs, FNs
• Seems more complex, however …
– In predicting polarities, we actually have more training examples (than when we predict class labels)
• Each entry in the probability matrix is a training instance
– We do not need a perfect model
Polarity Modeling
• Baseline model via majority votes
– Drawback: label dependent
• Identify useful features for predicting polarity (a feature-construction sketch follows this list)
– Horizontal statistics (for each BP …)
• Is R[u, i] greater than the median of the BP's probability scores?
• KDE signature (next slides)
• How does R[u, i] compare to the BP's own probability threshold?
– Vertical statistics (for each data instance …)
• Majority votes
• Rank of R[u, i]
• KDE? Too expensive
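A hedged sketch of these horizontal/vertical features; the default 0.5 thresholds and the exact feature set are assumptions for illustration.

```python
import numpy as np

def polarity_features(R, u, i, bp_thresholds=None):
    """Features for predicting the polarity of entry R[u, i]."""
    if bp_thresholds is None:
        bp_thresholds = np.full(R.shape[0], 0.5)   # assumed per-BP thresholds
    row, col = R[u], R[:, i]
    return {
        # horizontal statistics (computed over base predictor u's row)
        "above_row_median": float(R[u, i] > np.median(row)),
        "above_bp_threshold": float(R[u, i] > bp_thresholds[u]),
        # vertical statistics (computed over data instance i's column)
        "majority_vote": float((col >= bp_thresholds).mean() > 0.5),
        "rank_in_column": float(np.argsort(np.argsort(col))[u]) / max(len(col) - 1, 1),
    }
```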
Polarity Modeling: KDEstimates
• KDE for the four different flavors of particles:
TPs, TNs, FPs, FNs …
• Given a query point R[u, i], get the
“amplitudes” for the four flavors of particles
– If R[u, i] corresponds to TPs, it tends to be large;
by contrast, if R[u, i] corresponds to TNs, it tends
to be small
– Perhaps even better, use the survival function to find P(R >= R[u, i]) given the density estimate (sketched below)
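A minimal sketch using SciPy's gaussian_kde: fit one KDE per flavor of training-fold probability scores, then query the density ("amplitude") and the survival probability at R[u, i]. The example score arrays are hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_signature(scores_by_flavor, r_ui):
    """scores_by_flavor: dict flavor -> 1-D array of probability scores."""
    signature = {}
    for flavor, scores in scores_by_flavor.items():
        kde = gaussian_kde(scores)
        signature[f"{flavor}_density"] = kde(r_ui)[0]                   # amplitude at r_ui
        signature[f"{flavor}_sf"] = kde.integrate_box_1d(r_ui, np.inf)  # P(R >= r_ui)
    return signature

# Hypothetical score distributions for the four flavors, for illustration only
rng = np.random.default_rng(0)
scores = {"TP": rng.beta(5, 2, 200), "TN": rng.beta(2, 5, 200),
          "FP": rng.beta(4, 3, 200), "FN": rng.beta(3, 4, 200)}
print(kde_signature(scores, r_ui=0.7))
```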
[Workflow figure: Ensemble Generation → Ensemble Transformation → Ensemble Integration]
• Ensemble Generation: m classifiers varied by decision boundaries (e.g. m = 5) are trained on source data of size n (e.g. n = 10K); their outputs form the original base-level probability estimates (meta-data) with LESS reliable entries, paired with the ground-truth labels
• Ensemble Transformation: substitute new estimates for the unreliable entries via latent-factor-based CF, yielding transformed meta-data with MORE reliable entries (the figure also notes the shapes 5-by-2 and 2-by-10K, presumably the latent factor matrices with k = 2)
• Ensemble Integration: various blending methods (e.g. stacking, mean prediction, ensemble selection …) combine the transformed meta-data into the final prediction
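To make the workflow concrete, here is a hedged end-to-end sketch that reuses the hypothetical helpers sketched earlier in this deck (polarity_matrix, weighted_mf). It illustrates the transformation and integration stages on training meta-data only and is not the author's implementation.

```python
import numpy as np

def cf_ensemble_pipeline(R, labels, k=2, lam=0.1):
    """R: (m, n) probability matrix from the generation stage; labels: ground truth in {0, 1}.

    Relies on polarity_matrix() and weighted_mf() from the earlier sketches.
    """
    # Ensemble Transformation: keep TP/TN entries, drop FP/FN entries,
    # then re-estimate the dropped entries through the latent factor model
    C = (polarity_matrix(R, labels) > 0).astype(float)
    _, _, R_prime = weighted_mf(R, C, k=k, lam=lam)
    R_prime = np.clip(R_prime, 0.0, 1.0)   # keep re-estimates in probability range
    # Ensemble Integration: mean prediction as the simplest blending method
    return R_prime.mean(axis=0)
```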