Imprecision in (statistical) learning: an incomplete
overview
Sébastien Destercke
Heudiasyc, CNRS Compiegne, France
UTC Data science seminar
Plan
1 Basic setting
Setting the learning framework
Model selection (by loss minimisation)
2 Imprecision in learning
Imprecise data (and precise models)
Imprecision in models
Imprecision in predictions
Outline
1 Basic setting
Setting the learning framework
Model selection (by loss minimisation)
2 Imprecision in learning
Imprecise data (and precise models)
Imprecision in models
Imprecision in predictions
The basic (supervised) setting
You consider a parametrized set Θ of possible models
You observe a bunch of input/output pairs (xi, yi) over X × Y:
X: input space
Y: output space
From them, learn a predictive model with parameters θ̂ ∈ Θ
A model θ takes as input x, and can typically output:
A probability p(y|x) over Y (e.g., logistic regression)
A real-valued score s(y|x) over Y (e.g., SVM)
One of the elements of Y (e.g., nearest neighbour)
Classification: Y finite set
X = R2
Y = {●, ○} (two classes, shown as two point colours in the figure)
Θ = {θ1, θ2}
[Figure: labelled points (x, y) in the (X1, X2) plane; θ1 classifies by thresholding X2 < b, θ2 by thresholding X1 > a]
Regression: Y continuous
X = R
Y = R
Θ = {(a, b) ∈ R2} → θ(x) = a · x + b
[Figure: scatter of points (x, y) with the fitted line θ]
The classical scheme
Precise
data (xi, yi)
Precise
model θ
Precise
prediction
θ(x) = y
Induction
principle
Inference/Decision
rule
→ this talk: what if one of these steps becomes imprecise/partial (by
constraint or by design)?
Outline
1 Basic setting
Setting the learning framework
Model selection (by loss minimisation)
2 Imprecision in learning
Imprecise data (and precise models)
Imprecision in models
Imprecision in predictions
Loss and selection
ℓ(ŷ, y): loss incurred by predicting ŷ if y is observed.
A model θ will produce predictions θ(x), and its global loss on
observed training data (xi, yi) will be evaluated as1
Remp(θ) = ∑_{i=1}^{N} ℓ(θ(xi), yi)
possibly with regularization to avoid overfitting (not this talk's topic)
The optimal model is
θ∗ = arg min_{θ∈Θ} Remp(θ),
the one with the lowest possible average loss
1 Used as an approximation of R(θ) = ∫_{X×Y} ℓ(θ(x), y) dP(x, y).
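To make the induction principle concrete, here is a minimal sketch (not from the talk; the data, thresholds and class names are made up) of empirical risk minimisation over a finite model set, in Python:

# Empirical risk minimisation over a finite model set (illustrative sketch).
def zero_one_loss(y_pred, y_true):
    return 0 if y_pred == y_true else 1

def empirical_risk(model, data, loss):
    # average loss of the model on the observed (x, y) pairs
    return sum(loss(model(x), y) for x, y in data) / len(data)

# Two hypothetical threshold classifiers, echoing the toy figure:
theta_1 = lambda x: 'blue' if x[1] < 0.5 else 'red'   # thresholds X2
theta_2 = lambda x: 'blue' if x[0] > 0.3 else 'red'   # thresholds X1

data = [((0.1, 0.2), 'red'), ((0.6, 0.8), 'blue'), ((0.4, 0.1), 'blue')]
theta_star = min([theta_1, theta_2],
                 key=lambda m: empirical_risk(m, data, zero_one_loss))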
Classification: Y finite set
ℓ0/1(ŷ, y) = 1 if ŷ ≠ y, 0 if ŷ = y
Remp(θ1) = 2/13, Remp(θ2) = 1/13 → θ∗ = θ2
[Figure: the 13-point toy dataset in the (X1, X2) plane; θ1 (X2 < b) makes two errors, θ2 (X1 > a) makes one]
Illustrations
Regression
L(y, ŷ) = (y − ŷ)²
[Figure: data points and the fitted curve hθ∗]
Classification (binary log reg)
L(y, p) = −log(p) if y = 1, −log(1 − p) if y = 0
[Figure: binary data and the fitted model hθ∗]
Some additional notes
The function Remp induces a complete order ⪯ between all
models → up to indifference, the best model is unambiguously defined2
The likelihood function
L(θ|(x·, y·)) = ∏_{i=1}^{n} p(xi, yi|θ)
or the Bayesian posterior
P(θ|(x·, y·)) ∝ L(θ|(x·, y·)) · P(θ)
also induces numerical scores that completely order models θ.
P(θ|(x·, y·)) also provides (“meaningful”) probabilistic weights.
2 Convexifying, a common ML game, ensures computability and the absence of indifference.
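As an illustration (a sketch with hypothetical Bernoulli models, priors and data, not taken from the talk), both scores can be computed and used to rank a finite set of models:

import numpy as np

# Likelihood and posterior both induce a complete order over a finite model set.
thetas = np.array([0.2, 0.5, 0.8])   # three candidate Bernoulli models
prior = np.array([0.5, 0.3, 0.2])    # hypothetical prior weights P(θ)
sample = [1, 1, 0, 1]                # hypothetical observations

# L(θ | data) = Π_i p(y_i | θ)
lik = np.array([np.prod([t if y else 1 - t for y in sample]) for t in thetas])
posterior = lik * prior / np.sum(lik * prior)   # P(θ | data) ∝ L(θ | data) · P(θ)
ranking = thetas[np.argsort(-posterior)]        # best model first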
Plan
1 Basic setting
Setting the learning framework
Model selection (by loss minimisation)
2 Imprecision in learning
Imprecise data (and precise models)
Imprecision in models
Imprecision in predictions
Outline
1 Basic setting
Setting the learning framework
Model selection (by loss minimisation)
2 Imprecision in learning
Imprecise data (and precise models)
Imprecision in models
Imprecision in predictions
Induction with imprecise data
We observe possibly imprecise inputs/outputs (X, Y) containing the
truth (some (x, y) ∈ (X, Y) is the true, unobserved value)
Losses3 become set-valued [2]:
ℓ(θ(X), Y) = {ℓ(θ(x), y) | y ∈ Y, x ∈ X}
Previous induction principles are no longer well-defined
What if we still want to get one model?
3 And likelihoods/posteriors alike
The imprecise setting illustrated
Regression [figure: interval-valued observations]
Classification (binary log reg) [figure: partially/imprecisely labelled points]
How to define hθ∗ ?
Illustration on toy example
ℓ0/1(ŷ, y)
R̲(θ) = ∑_i min_{(xi,yi)∈(Xi,Yi)} ℓ(θ(xi), yi) → best-case scenario
R̄(θ) = ∑_i max_{(xi,yi)∈(Xi,Yi)} ℓ(θ(xi), yi) → worst-case scenario
[Figure: the toy data in the (X1, X2) plane, with five imprecise (set-valued) observations numbered 1–5 and the classifiers θ1, θ2]
[R̲(θ1), R̄(θ1)] = [0, 5/13]
[R̲(θ2), R̄(θ2)] = [1/13, 3/13]
Going back to a precise model
If we know the “imprecisiation” process Pobs((X, Y)|(x, y)), no
theoretical problem → “merely” a computational one
If not, common approaches are to redefine a precise criterion:
Optimistic (Maximax/Minimin) approach [8, 1]:
ℓopt(θ(x), Y) = min{ℓ(θ(x), y) | y ∈ Y}
Pessimistic (Maximin/Minimax) approach [6]:
ℓpes(θ(x), Y) = max{ℓ(θ(x), y) | y ∈ Y}
EM-like or averaging/weighting approaches4:
ℓw(θ(x), Y) = ∑_{y∈Y} wy ℓ(θ(x), y)
4 With likelihood ∼ Lav(θ|(x, Y)) = P((x, Y)|θ) [4]
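As a minimal sketch (assuming a 0/1 loss and set-valued labels; the class names are illustrative), the optimistic and pessimistic criteria simply replace the set-valued loss by its lower or upper bound:

# Optimistic vs pessimistic reduction of a set-valued 0/1 loss (sketch).
def loss_opt(y_pred, Y_set):
    # best case: the most favourable replacement y in the imprecise label
    return min(0 if y_pred == y else 1 for y in Y_set)

def loss_pes(y_pred, Y_set):
    # worst case: the least favourable replacement y in the imprecise label
    return max(0 if y_pred == y else 1 for y in Y_set)

loss_opt('a', {'a', 'b'})   # 0: one replacement agrees with the prediction
loss_pes('a', {'a', 'b'})   # 1: another replacement disagrees
loss_pes('a', {'a'})        # 0: precise labels give back the usual loss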
Not a trivial choice: regression example
Pessimistic tries to be good for every replacement
Optimistic tries to be the best for one replacement
A logistic regression example
[Figure: logistic regression fits on imprecisely labelled data under the optimistic (OPT) and pessimistic (PESS) criteria]
Which one should I be?
Optimist . . .
or. . .
Pessimist?
→ pretty much depends on the context!
Some elements of answer
When to be optimist?
reasonably sure the model space Θ can capture the best model / a good predictor, and is not too flexible (overfitting!)
the “imprecisiation” process is random / not designed to make you fail
Optimism ≈ semi-sup. learning if imprecision = missingness.
When to be pessimist?
you want to obtain guarantees in all possible scenarios (≈ distributional robustness)
you are facing an “adversarial” process
partial data = set of situations for which you want to perform reasonably well
Beyond imprecise data: soft data
Assume two classes {a, b}. We can put different uncertainty models
over them
Certain label: all mass on a
Imprecise label: the set {a, b}, nothing more
Probabilistic label: α on a, 1 − α on b
Possibilistic label: graded possibility over {a, b}
[Figure: the four label types as bar plots over {a, b}]
→ is it useful to consider such more complex models/cases?
Outline
1 Basic setting
Setting the learning framework
Model selection (by loss minimisation)
2 Imprecision in learning
Imprecise data (and precise models)
Imprecision in models
Imprecision in predictions
Models and ordering
In the classical scheme, models are completely ranked
[Diagram: models θ1, θ2, θ3, . . . , θn, each with a precise risk R(θi)]
Models and ordering
And we pick the top one
[Diagram: models reordered as θ(1), . . . , θ(n) by increasing risk; θ(1) is kept, all others are crossed out]
Model weighting
Ensembles, Bayes posteriors, etc → weights over models
[Diagram: the ranked models θ(1), . . . , θ(n) now carry weights P(θ1|x, y), . . . , P(θn|x, y)]
But still no imprecision
Outline
1 Basic setting
Setting the learning framework
Model selection (by loss minimisation)
2 Imprecision in learning
Imprecise data (and precise models)
Imprecision in models
Imprecise data
Other ways to get imprecise models
Why look for an imprecise model?
Imprecision in predictions
How to get an imprecise prediction?
How to evaluate an imprecise prediction?
Back to the toy example
ℓ0/1(ŷ, y)
Unless we commit to a behaviour, models θ1, θ2 are incomparable
[Figure: the same toy data with its five imprecise observations and the classifiers θ1, θ2]
[R̲(θ1), R̄(θ1)] = [0, 5/13]
[R̲(θ2), R̄(θ2)] = [1/13, 3/13]
Induced partial order
Each model is now set- or interval-valued
[Diagram: models θ1, . . . , θn with interval-valued risks [R̲(θi), R̄(θi)]]
θi is surely preferred to θj if R̄(θi) < R̲(θj)
this is known as an interval order
(very) safe bet: take all maximal models θ, i.e., those such that no θ′ is surely preferred to θ
works also if [R̲(θi), R̄(θi)] is a statistical confidence interval
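A minimal sketch of this decision rule (the third model and its interval are made up to show a dominated case; the first two intervals follow the toy example):

# Maximal models under interval dominance (sketch).
risks = {'theta1': (0.0, 5/13),    # [R_low, R_up] from the toy example
         'theta2': (1/13, 3/13),
         'theta3': (0.4, 0.5)}     # hypothetical extra model

def surely_better(a, b):
    # a surely preferred to b: a's upper risk is below b's lower risk
    return risks[a][1] < risks[b][0]

maximal = [m for m in risks
           if not any(surely_better(o, m) for o in risks if o != m)]
# -> ['theta1', 'theta2']; theta3 is dominated by both other models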
Outline
1 Basic setting
Setting the learning framework
Model selection (by loss minimisation)
2 Imprecision in learning
Imprecise data (and precise models)
Imprecision in models
Imprecise data
Other ways to get imprecise models
Why look for an imprecise model?
Imprecision in predictions
How to get an imprecise prediction?
How to evaluate an imprecise prediction?
Sets of best models
Not taking the best, but the k-best
[Diagram: the ranked models θ(1), . . . , θ(k) are kept, θ(k+1) and beyond are discarded]
One common way to do it [5] (dates back to Birnbaum, at least):
Normalize the likelihood by computing
L∗(θ|(x·, y·)) = L(θ|(x·, y·)) / sup_θ L(θ|(x·, y·))
Take as set estimate the cut of level α:
Θα = {θ : L∗(θ|(x·, y·)) ≥ α}
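A sketch of this α-cut on a finite grid (the Bernoulli likelihood and the sample are made up for illustration):

import numpy as np

# α-cut of the normalized likelihood over a parameter grid (sketch).
thetas = np.linspace(0.001, 0.999, 999)   # grid of Bernoulli models
sample = [1, 0, 1, 1]                     # hypothetical observations
log_lik = sum(np.log(thetas) if y else np.log(1 - thetas) for y in sample)

rel_lik = np.exp(log_lik - log_lik.max())   # L*(θ) = L(θ) / sup_θ L(θ)
alpha = 0.5
Theta_alpha = thetas[rel_lik >= alpha]      # the set estimate Θα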
Robust Bayes and imprecise probabilities [12, 14]
Consider a set of priors, and its corresponding set of posteriors
[Diagram: models θ1, . . . , θn, each now with an interval-valued weight [P̲(θi), P̄(θi)]]
≠ from the pure Bayesian approach, as priors are not weighted5
5 Anarchy!
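A sketch with a finite set of hypothetical priors over three candidate models: each prior gives one posterior, and taking bounds per model yields the interval-valued weights above.

import numpy as np

# Robust Bayes: a set of priors yields interval-valued posterior weights (sketch).
thetas = np.array([0.2, 0.5, 0.8])          # candidate Bernoulli models
priors = [np.array([0.6, 0.3, 0.1]),        # hypothetical prior set
          np.array([1/3, 1/3, 1/3]),
          np.array([0.1, 0.3, 0.6])]
sample = [1, 1, 0, 1]

lik = np.array([np.prod([t if y else 1 - t for y in sample]) for t in thetas])
posteriors = np.array([lik * p / np.sum(lik * p) for p in priors])
p_low, p_up = posteriors.min(axis=0), posteriors.max(axis=0)   # [P_low(θi), P_up(θi)]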
Outline
1 Basic setting
Setting the learning framework
Model selection (by loss minimisation)
2 Imprecision in learning
Imprecise data (and precise models)
Imprecision in models
Imprecise data
Other ways to get imprecise models
Why look for an imprecise model?
Imprecision in predictions
How to get an imprecise prediction?
How to evaluate an imprecise prediction?
Yes, why?
Because...
Other reasons
You want to make some robustness analysis around your top
models or your weighting scheme (because of limited data, or because
they are not the theoretically optimal ones, . . . );
You suspect the observed data will be different6 from the training
ones (transfer learning, distributional robustness [9]);
You want a rich uncertainty quantification where there is a clear
distinction between aleatory uncertainty (irreducible, due to fixed
learning setting) and epistemic uncertainty (reducible by collecting
information). This can be used to:
produce cautious predictions (see next slides)
perform active learning [10]
explain uncertainty sources (largely unexplored topic)
6 In practice, drawn from a distribution Ptest(X, Y) ≠ Ptrain(X, Y)
Two kinds of uncertainties
Aleatory uncertainty: classes are really mixed → irreducible with
more data (but possibly by adding features)
Epistemic uncertainty: lack of information → reducible
[Figure, left: around x, several a and b observations thoroughly mixed]
Aleatory uncertainty: P(a) ∈ [0.45, 0.55]
[Figure, right: around x, only two observations, one a and one b]
Epistemic uncertainty: P(a) ∈ [0.2, 0.8]
Outline
1 Basic setting
Setting the learning framework
Model selection (by loss minimisation)
2 Imprecision in learning
Imprecise data (and precise models)
Imprecision in models
Imprecision in predictions
Imprecise decisions: illustration
Predicting over Y = {a, b, c}
[Figure: precise predictions pick a single class everywhere; imprecise predictions return a set of classes in ambiguous regions]
Outline
1 Basic setting
Setting the learning framework
Model selection (by loss minimisation)
2 Imprecision in learning
Imprecise data (and precise models)
Imprecision in models
Imprecise data
Other ways to get imprecise models
Why look for an imprecise model?
Imprecision in predictions
How to get an imprecise prediction?
How to evaluate an imprecise prediction?
Some context
You allow your model θ to output more than one class/value7:
θ(x) ⊂ Y
Some questions:
How to ensure a trade-off b/w information (θ(x) small) and accuracy (ytrue ∈ θ(x))?
How to evaluate the quality of θ(x)?
Given confidence α, how to ensure global coverage P(ytrue ∈ θ(x)) ≥ α?
Given confidence α and x, how to ensure local coveragea P(ytrue ∈ θ(x)|x) ≥ α?
a Much, much more difficult.
7 Yes, this is classical in regression, less so in other frameworks.
Probabilistic partial reject [3, 7]
Assume we have p(y|x) as training output
Fix a confidence value α ∈ [0, 1]
Consider the permutation (·) on Y such that p(y(1)|x) ≥ p(y(2)|x) ≥ . . . ≥ p(y(K)|x)
Take classes in this order until the cumulated probability is above α:
θ(x) = {y(1), . . . , y(j) : ∑_{i=1}^{j−1} p(y(i)|x) ≤ α, ∑_{i=1}^{j} p(y(i)|x) ≥ α}
Example
α = 0.9
P(a|x) = 0.7 ≥ P(c|x) = 0.25 ≥ P(b|x) = 0.05
θ(x) = {a, c}
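A minimal sketch of this rule (the class names and probabilities follow the example above):

# Smallest class set whose cumulated probability reaches α (sketch).
def partial_reject(probs, alpha):
    ranked = sorted(probs, key=probs.get, reverse=True)  # y(1), y(2), ...
    pred, cumul = [], 0.0
    for y in ranked:
        pred.append(y)
        cumul += probs[y]
        if cumul >= alpha:          # stop once the cumulated mass is above α
            break
    return set(pred)

partial_reject({'a': 0.7, 'b': 0.05, 'c': 0.25}, alpha=0.9)   # {'a', 'c'}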
On probabilistic reject
The pros:
Rather straightforward to implement
Approximate coverage ensured if P calibrated8
The cons:
Difficult to differentiate ambiguity vs lack of knowledge
Badly estimated probabilities can lead to misleading conclusions
8 ∃ techniques for global calibration, less for local.
(inductive) Conformal prediction [11]
Take a validation set9 of I instances (xi, yi)
To each yi associate a score αi = max_{y≠yi} (p(y|xi) − p(yi|xi))
Given a new instance xI+1, define the p-value of each prediction yj as
pv(yj) = |{i = 1, . . . , I, I + 1 : αi ≥ αyj}| / (I + 1)
with αyj the score of (xI+1, yj)
Fix a confidence value α ∈ [0, 1]
Get as prediction
θ(x) = {yj : pv(yj) ≥ 1 − α}
9 ≠ training and test sets
Conformal prediction: example
α = 0.9
Assume 10 validation data with scores
−0.1; 0.3; −0.4; 0.1; 0; −0.6; −0.2; 0.2; 0.3; −0.1
We observe an instance x and test each candidate label:
if (x, a) has score 0.5, pv(a) = 0/11 < 0.1 → a rejected
if (x, b) has score −0.2, pv(b) = 8/11 ≥ 0.1 → b kept
if (x, c) has score 0, pv(c) = 5/11 ≥ 0.1 → c kept
θ(x) = {b, c}
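A sketch reproducing the computation above (the candidate scores for (x, a), (x, b), (x, c) are those of the example):

# Inductive conformal prediction step (sketch, following the example above).
val_scores = [-0.1, 0.3, -0.4, 0.1, 0, -0.6, -0.2, 0.2, 0.3, -0.1]

def p_value(score, val_scores):
    # validation scores at least as nonconforming, over I + 1 (as in the example)
    return sum(s >= score for s in val_scores) / (len(val_scores) + 1)

alpha = 0.9
candidates = {'a': 0.5, 'b': -0.2, 'c': 0.0}   # scores of (x, a), (x, b), (x, c)
theta_x = {y for y, s in candidates.items()
           if p_value(s, val_scores) >= 1 - alpha}
# -> {'b', 'c'}: pv(a) = 0/11, pv(b) = 8/11, pv(c) = 5/11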
On conformal prediction
The pros:
Provide global coverage guarantee10
Works on any score-based model (including deep ones), with weak
theoretical requirements (exchangeability)
The cons:
Needs a validation set
May give fairly imprecise outputs if bad model/small validation set
No clear distinction between aleatoric/epistemic aspects
10 Some works on conditional coverage exist.
Working with a set of models
Output is a set M of (probabilistic) models
For any m ∈ M, call m(x) its optimal prediction for x
Take as θ(x) all possibly optimal predictions
θ(x) = {m(x) : m ∈ M}
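A minimal sketch (the two threshold models echo the toy example; the thresholds themselves are made up):

# Prediction set induced by a set of models (sketch).
models = [
    lambda x: 'a' if x[0] > 0.3 else 'b',   # θ2-like: thresholds X1
    lambda x: 'a' if x[1] < 0.5 else 'b',   # θ1-like: thresholds X2
]

def prediction_set(x, models):
    # gather all possibly optimal predictions m(x), m ∈ M
    return {m(x) for m in models}

prediction_set((0.6, 0.8), models)   # {'a', 'b'}: the models disagree here
prediction_set((0.6, 0.2), models)   # {'a'}: all models agree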
Example
[Figure: the (X1, X2) plane split by θ1 (threshold b on X2) and θ2 (threshold a on X1); where the two models agree, θ(x) is a single class, and where they disagree, θ(x) contains both classes]
On sets of models
The pros:
Approximate global coverage can be obtained11
Better control of imprecision
If m probabilistic, easier to distinguish aleatoric/epistemic
uncertainty
The cons:
The learning method has to be adapted → more or less painful
Decision rule can lead to complex optimisation
11 Conditional coverage is harder; not much on that for now.
Outline
1 Basic setting
Setting the learning framework
Model selection (by loss minimisation)
2 Imprecision in learning
Imprecise data (and precise models)
Imprecision in models
Imprecise data
Other ways to get imprecise models
Why look for an imprecise model?
Imprecision in predictions
How to get an imprecise prediction?
How to evaluate an imprecise prediction?
The two doctors story
In a hospital, doctors get 1$ each time their diagnosis is right.
Two doctors are pretty sure that patients have either Pneumonia (P) or Bronchitis (B)
Doctor 1
Flips a coin each time
Diagnoses the result
Gets 0.5$ on average
Doctor 2
Tells you he does not know b/w P and B
Should his reward be 0.5$, the same as doctor 1? Higher? Lower?
Main solution so far for 0/1 loss
u(Ŷ, y) = 0 if y ∉ Ŷ, and α/|Ŷ| + (1 − α)/|Ŷ|² otherwise,
with u(Ŷ, y) = 1 if |Ŷ| = 1 and Ŷ = {y}
Discounted accuracy: α = 1
u(Ŷ, y) = 1/|Ŷ|
→ no reward to cautiousness (cautiousness ≡ randomness)
u65: α = 1.6, moderate reward to cautiousness
u80: α = 2.2, big reward to cautiousness [15]
The higher α, the higher the reward
Solutions exist for generic losses too [13].
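A sketch of this utility (the set and class names are illustrative):

# Discounted utility u(Ŷ, y) for set-valued predictions under 0/1 loss (sketch).
def u(pred_set, y, alpha):
    if y not in pred_set:
        return 0.0
    k = len(pred_set)
    return alpha / k + (1 - alpha) / k**2   # equals 1 whenever k == 1

u({'P'}, 'P', alpha=1.6)        # 1.0 : precise and correct
u({'P', 'B'}, 'P', alpha=1.0)   # 0.50: discounted accuracy
u({'P', 'B'}, 'P', alpha=1.6)   # 0.65: u65
u({'P', 'B'}, 'P', alpha=2.2)   # 0.80: u80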
Boldness averseness illustrated
[Figure: utility of a cautious-but-correct prediction as a function of 1/|Ŷ|; when 2 classes are predicted and the good one is among them, u50 (discounted accuracy) gives 0.5, u65 gives 0.65, u80 gives 0.8]
References I
[1] Timothee Cour, Ben Sapp, and Ben Taskar.
Learning from partial labels.
Journal of Machine Learning Research, 12(May):1501–1536, 2011.
[2] Inés Couso and Luciano Sánchez.
Machine learning models, epistemic set-valued data and generalized loss functions: An encompassing approach.
Information Sciences, 358:129–150, 2016.
[3] Juan José del Coz, Jorge Díez, and Antonio Bahamonde.
Learning nondeterministic classifiers.
Journal of Machine Learning Research, 10(Oct):2273–2293, 2009.
[4] Thierry Denoeux.
Maximum likelihood estimation from uncertain data in the belief function framework.
IEEE Transactions on Knowledge and Data Engineering, 25(1):119–130, 2013.
[5] D. Dubois, S. Moral, and H. Prade.
A semantics for possibility theory based on likelihoods.
Journal of Mathematical Analysis and Applications, 205(2):359–380, 1997.
[6] Romain Guillaume, Inés Couso, and Didier Dubois.
Maximum likelihood with coarse data based on robust optimisation.
In Proceedings of the Tenth International Symposium on Imprecise Probability: Theories and Applications, pages 169–180,
2017.
[7] Thien M Ha.
The optimum class-selective rejection rule.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(6):608–615, 1997.
References II
[8] Eyke Hüllermeier.
Learning from imprecise and fuzzy observations: Data disambiguation through generalized loss minimization.
International Journal of Approximate Reasoning, 55(7):1519–1534, 2014.
[9] Daniel Kuhn, Peyman Mohajerin Esfahani, Viet Anh Nguyen, and Soroosh Shafieezadeh-Abadeh.
Wasserstein distributionally robust optimization: Theory and applications in machine learning.
In Operations Research & Management Science in the Age of Analytics, pages 130–166. INFORMS, 2019.
[10] Vu-Linh Nguyen, Sébastien Destercke, and Eyke Hüllermeier.
Epistemic uncertainty sampling.
In International Conference on Discovery Science, pages 72–86. Springer, 2019.
[11] Harris Papadopoulos.
Inductive conformal prediction: Theory and application to neural networks.
In Tools in artificial intelligence. Citeseer, 2008.
[12] P. Walley.
Statistical reasoning with imprecise probabilities.
Chapman and Hall, New York, 1991.
[13] Gen Yang, Sébastien Destercke, and Marie-Hélène Masson.
The costs of indeterminacy: How to determine them?
IEEE Transactions on Cybernetics, 47(12):4316–4327, 2017.
[14] M. Zaffalon.
The naive credal classifier.
Journal of Statistical Planning and Inference, 105(1):5–21, 2002.
[15] Marco Zaffalon, Giorgio Corani, and Denis Mauá.
Evaluating credal classifiers by utility-discounted predictive accuracy.
International Journal of Approximate Reasoning, 53(8):1282–1301, 2012.