How good is your prediction?
A gentle introduction to
Conformal Prediction.
Marco Capuccini
System-Software Development Group
RIKEN Center for Computational Science
marco.capuccini@riken.jp
Marco Capuccini
PhD in Scientific Computing, Uppsala University, Sweden
MSc in Bioinformatics, Uppsala University, Sweden
BSc in Computer Science, La Sapienza, Rome, Italy
Cloud / HPC Computing · Data Engineering · Machine Learning · Bioinformatics
Postdoc at RIKEN (R-CCS), Kobe, Japan
RIKEN
▪ RIKEN is Japan's largest (government-funded) research institution
▪ Established in 1917
▪ Research centers and institutes across Japan
…
Machine Learning (ML)
Machine Learning (ML) is a family of methods to derive knowledge or predictions
from data
Supervised learning:
● Given a set of objects x1, x2, … xn with known labels l1, l2, …
● Goal: train a model M that can be used to predict unknown labels ln+1, ln+2, … for new objects xn+1, xn+2, …
Applications: spam filtering, text recognition, image analysis, finance, genomics, song and movie recommendation …
How do we evaluate M?
Current (best) practices
1. Split x1, x2, … xn into a training set x1, x2, … xk and a test set xk+1, xk+2, … xn (1 < k < n)
2. Train M over the training set and evaluate it over the test set
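The split-and-evaluate procedure above can be sketched as follows; scikit-learn, the bundled breast-cancer dataset, and the 70/30 split ratio are illustrative assumptions, not part of the slides.

```python
# Minimal sketch of the train/test evaluation described above.
# Dataset and model choices are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# 1. Split x1 ... xn into a training set and a test set (here k ~ 0.7 n)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# 2. Train M over the training set and evaluate it over the test set
M = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print(f"Test accuracy: {M.score(X_test, y_test):.3f}")
```

Note that this only yields an aggregate accuracy over the test set, which motivates the two questions below.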
What if performance changes with new objects?
How do we assign object-specific confidence to predictions?
Conformal Prediction
Mathematical framework (by Vovk et al.). Main idea:
● For an unseen object, instead of producing a single prediction l', a Conformal Predictor (CP) produces a prediction set { l1', l2', … lK' } according to a user-specified significance level 𝜺
● Vovk et al. prove that ℙ( l ∈ { l1', l2', … lK' } ) ≥ 1 - 𝜺, where l is the true label for the unseen object
Can be applied to any ML predictor. How?
Vovk, Vladimir, Alex Gammerman, and Glenn Shafer. Algorithmic learning in a random world. Springer Science &
Business Media, 2005.
Neural network example
Binary classification
● The output sigmoid layer models the probability of positive class
CP requires the underlying predictor to assign a Non-Conformity Measure (NCM) to examples, i.e. a strangeness measure for examples
● Given a labelled object (x, l)
NCM(x,l) = 1 - NN(x) if l is positive; NN(x) otherwise
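As a minimal sketch of this NCM, assuming a toy logistic function as a stand-in for the trained network NN(x):

```python
# Sketch of the NCM above for a binary classifier whose sigmoid
# output NN(x) models P(positive | x). The toy nn() below is a
# stand-in for a trained network (an assumption for illustration).
import math

def nn(x):
    # toy "network": a fixed logistic function of a 1-D input
    return 1.0 / (1.0 + math.exp(-x))

def ncm(x, label):
    """Non-conformity (strangeness) of the labelled example (x, label)."""
    return 1.0 - nn(x) if label == 1 else nn(x)

# A confidently positive example labelled positive is not strange ...
print(ncm(5.0, 1))   # close to 0
# ... but the same example labelled negative is very strange
print(ncm(5.0, 0))   # close to 1
```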
More NCMs (1)
● Logistic regression: NCM(x,l) = 1 - LR(x) if l is positive; LR(x) otherwise
● Linear Support Vector Machines: NCM(x,l) = -SVM(x) if l is positive; SVM(x) otherwise
● Random forests: NCM(x,l) = fraction of trees predicting the wrong label
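The random-forest NCM, for instance, can be computed directly from the individual trees; the dataset and split below are illustrative assumptions.

```python
# Sketch of the random-forest NCM above: the fraction of trees that
# vote for a label other than the one assigned to the example.
# Dataset and split are illustrative assumptions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_cal, y_train, y_cal = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

def rf_ncm(x, label):
    """Fraction of the forest's trees predicting a label other than `label`."""
    votes = np.array([tree.predict(x.reshape(1, -1))[0] for tree in rf.estimators_])
    return float(np.mean(votes != label))

print(rf_ncm(X_cal[0], y_cal[0]))
```

For a binary problem the two per-label strangeness values of one object sum to 1, since every tree votes for exactly one of the two labels.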
Implementation
Idea
1. Given a new object, for each candidate label:
● compute a “p-value” using a calibration set
2. Add the label to the prediction set if its p-value > 𝜺. The p-values can also be used as a
measure of confidence.
Details
● Shafer, Glenn, and Vladimir Vovk. "A tutorial on conformal prediction." Journal
of Machine Learning Research 9.Mar (2008): 371-421.
● http://jmlr.csail.mit.edu/papers/volume9/shafer08a/shafer08a.pdf
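Putting the steps above together, a minimal inductive (split) conformal predictor might look as follows, reusing the logistic-regression NCM from the previous slide; the dataset, model, and split sizes are illustrative assumptions, and the p-value uses the standard (n+1)-smoothed formula.

```python
# Minimal sketch of an inductive Conformal Predictor: calibrate NCM
# scores on a held-out calibration set, then compute a per-label
# p-value for each new object. Dataset/model are illustrative assumptions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, random_state=0)
X_cal, X_new, y_cal, y_new = train_test_split(X_rest, y_rest, random_state=0)

model = LogisticRegression(max_iter=5000).fit(X_train, y_train)

def ncm(X, labels):
    # logistic-regression NCM: 1 - LR(x) if label is positive, LR(x) otherwise
    p_pos = model.predict_proba(X)[:, 1]
    return np.where(labels == 1, 1.0 - p_pos, p_pos)

cal_scores = ncm(X_cal, y_cal)  # calibration non-conformity scores

def p_value(x, label):
    alpha = ncm(x.reshape(1, -1), np.array([label]))[0]
    # fraction of calibration examples at least as strange as (x, label)
    return (np.sum(cal_scores >= alpha) + 1) / (len(cal_scores) + 1)

def predict_set(x, eps):
    # keep every label whose p-value exceeds the significance level
    return {l for l in (0, 1) if p_value(x, l) > eps}

print(predict_set(X_new[0], 0.05))
```

Empirically, the fraction of new objects whose true label falls inside the prediction set should be at least roughly 1 - 𝜺, matching the validity guarantee stated earlier.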
Example: AI-assisted pathology
~80K prostate biopsies from
~7.5K Swedish men
~80K slides → ~5M training tiles
Train CNN with
Inception-v3
Validation over ~500K tiles
Kaggle:
https://www.kaggle.com/c/prostate-cancer-grade-assessment
AI-assisted Pathology with Conformal Prediction
CNN → Conformal Predictor (CP)
For a user-defined 𝜺
CP(𝜺, x1) = {Benignant}
CP(𝜺, x2) = {Benignant, Cancer}
...
CP(𝜺, xn) = {}
where xk, k = 1...n, is an unseen tile.
By construction, the true label of xk is in the prediction set with probability at least 1 - 𝜺
(Vovk et al. provide proof under exchangeability assumption)
Conformal Predictor Efficiency (AI-assisted Pathology)
[Figure: prediction-set efficiency vs. significance level (𝜺)]
Questions?
marco.capuccini@riken.jp
