Parametric fiducial prediction
Dempster–Hill procedure
Conformal predictive distributions
Nonparametric fiducial prediction
Vladimir Vovk
(based on joint work with many people)
Royal Holloway, University of London
BFF 2019, Duke University, 29 April 2019
Vladimir Vovk Nonparametric fiducial prediction 1
This talk
This talk fits under one of the Fs in BFF (fiducial). (At least
this is my interpretation.)
It is mostly about fiducial prediction.
Moreover: it is mostly about nonparametric fiducial
prediction.
I will start from history.
My plan
1 Parametric fiducial prediction
Fisher’s fiducial prediction
Terminology
Validity of fiducial prediction
2 Dempster–Hill procedure
3 Conformal predictive distributions
Fisher’s publications
It appears that Fisher has only two publications that discuss
fiducial prediction:
R. A. Fisher.
Fiducial argument in statistical inference.
Annals of Eugenics, 1935.
R. A. Fisher.
Statistical Methods and Scientific Inference.
1956 (3rd edition: 1973).
Fisher’s 1935 example
His example from the 1935 paper: the Gaussian IID model.
After observing y1, . . . , yn (past data), compute
\bar{y} := \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad s^2 := \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2.
Then
t := \sqrt{\frac{n}{n+1}}\,\frac{y - \bar{y}}{s},
where y is a future observation, has Student’s t-distribution
with n − 1 degrees of freedom (is a pivot).
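The pivotal claim can be checked by a small simulation (a sketch of mine, not code from the talk): draw Gaussian data with arbitrary µ and σ and compare the simulated pivot with Student's t-distribution with n − 1 degrees of freedom.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, trials = 10, 20000
mu, sigma = 3.0, 2.0   # arbitrary "unknown" parameters

past = rng.normal(mu, sigma, size=(trials, n))   # past data y_1, ..., y_n
future = rng.normal(mu, sigma, size=trials)      # future observation y

ybar = past.mean(axis=1)
s = past.std(axis=1, ddof=1)
t = np.sqrt(n / (n + 1)) * (future - ybar) / s

# The pivot should follow Student's t with n - 1 degrees of freedom,
# whatever mu and sigma are.
pval = stats.kstest(t, stats.t(df=n - 1).cdf).pvalue
print(pval)  # typically well above 0.05
```

Changing mu and sigma leaves the distribution of t unchanged, which is exactly what makes t a pivot.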
General scheme (1)
I will only talk about predicting one future scalar
observation (uncontroversial part).
General scheme (at least in Fisher’s work): we combine
the past observations Y^past and the future observation Y to
obtain a pivot U:
U := Q(Y^{past}, Y).
The distribution of U is independent of the parameter θ.
Without loss of generality we can assume that U is
uniformly distributed in [0, 1] (at least when it is
continuous): if not, replace U by FU(U), where FU is U’s
distribution function.
General scheme (2)
If y ∈ R and Q(y^past, y) is increasing in y, then y ↦ Q(y^past, y)
is a distribution function.
It is the fiducial (predictive) distribution.
In our example: the pivot
t = \sqrt{\frac{n}{n+1}}\,\frac{y - \bar{y}}{s}
becomes the fiducial (predictive) distribution (function)
Q(y^{past}, y) := F_{t_{n-1}}\!\left(\sqrt{\frac{n}{n+1}}\,\frac{y - \bar{y}}{s}\right),
F_{t_{n-1}} being the distribution function of Student's
t-distribution with n − 1 degrees of freedom.
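As a sketch (my code, using scipy, not from the talk), the fiducial predictive distribution function in Fisher's example can be computed directly:

```python
import numpy as np
from scipy import stats

def fiducial_predictive_cdf(past, y):
    """Q(y_past, y) = F_{t_{n-1}}( sqrt(n/(n+1)) * (y - ybar) / s )."""
    past = np.asarray(past, dtype=float)
    n = len(past)
    ybar = past.mean()
    s = past.std(ddof=1)          # sample standard deviation (1/(n-1) version)
    return stats.t(df=n - 1).cdf(np.sqrt(n / (n + 1)) * (y - ybar) / s)

past = [1.2, 0.8, 1.5, 1.1, 0.9]
print(fiducial_predictive_cdf(past, np.mean(past)))  # 0.5: the median is ybar
```

By construction the function is 0.5 at y = ȳ and increases continuously in y.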
General scheme (3)
Summary:
fiducial predictive distribution = uniform pivot
(provided the pivot is a distribution function).
Notice: no explicit “fiducial inversion” is needed in this
exposition (unless we want prediction regions).
Fisher emphasized both continuity (in many publications)
and monotonicity (in 1962 in letters to Barnard and Sprott,
for parametric fiducial inference).
In Fisher’s own words (in a non-predictive context)
His letter to Barnard in March (?) 1962:
A pivotal quantity is a function of parameters and statistics, the
distribution of which is independent of all parameters. To be of
any use in deducing probability statements about parameters,
let me add
(a) it involves only one parameter,
(b) the statistics involved are jointly exhaustive for that
parameter,
(c) it varies monotonically with that parameter.
In his publications Fisher also mentions continuity (“the
observations should not be discontinuous” in the 1956 book).
Unpleasant possibility
There is no guarantee that Q(y^past, −∞) = 0 and
Q(y^past, ∞) = 1 are satisfied for all y^past.
For example, Q(y^past, −∞) > 0 means that there is a
positive mass at −∞.
We might want to disallow this.
Fiducial prediction: existing terminology
fiducial distribution for a future observation (Fisher, 1935)
fiducial distribution (Hora, Buehler, McCullagh)
fiducial predictive distribution (Dawid, Wang)
predictive fiducial distribution (Hannig, Iyer, Wang)
marginal association (Martin and Liu)
predictive confidence distribution (Schweder, Hjort)
predictive distribution (Xie, Liu, Shen)
People often avoid “fiducial” to stay away from controversy.
The terminology of this talk
Now I prefer to talk about predictive distributions without
imposing any conditions of validity a priori.
Fisher’s probabilistic calibration Q(Y^past, Y) ∼ U may be
the most fundamental notion of validity, but we may also
want to have Q(Y^past, Y) ∼ U given a σ-algebra F.
If F is generated by Y^past, that is the ideal situation, but
only Bayesians can have it.
There are lots of alternative definitions (“marginal
calibration”, “F-ideal calibration”,. . . ).
Validity and efficiency of fiducial prediction
Since Q(Y^past, Y) is a pivot, fiducial prediction is calibrated
in probability by definition.
The main problem is its efficiency (or sharpness). Fisher
insisted that fiducial inference should be based on
exhaustive statistics, leading to uniqueness. This part of
his programme failed.
Gneiting et al.’s paradigm:
Probabilistic forecasting has the general goal of
maximizing the sharpness of predictive distributions,
subject to calibration.
An example of conditional probabilistic calibration (1)
Peter McCullagh: in many cases (such as linear regression),
fiducial prediction is probabilistically calibrated conditionally on
a non-trivial σ-algebra.
Peter McCullagh.
Fiducial prediction.
2004,
http://www.stat.uchicago.edu/~pmcc/reports/fiducial.pdf
McCullagh, Vovk, Nouretdinov, Devetyarov, Gammerman.
Conditional prediction intervals for linear regression.
ICMLA 2009.
An example of conditional probabilistic calibration (2)
Our model is y_i = β′x_i + σξ_i, where the x_i are fixed vectors
and the ξ_i are IID with a known distribution P (which does not
have to be Gaussian).
Let F be the σ-algebra of events invariant under the
transformations (y_1, y_2, . . . ) ↦ (a′x_1 + by_1, a′x_2 + by_2, . . . ).
Then the fiducial predictive distribution is probabilistically
calibrated given F.
My plan
1 Parametric fiducial prediction
2 Dempster–Hill procedure
Fisher’s nonparametric fiducial inference
Dempster–Hill
3 Conformal predictive distributions
Nonparametric fiducial prediction in Fisher’s work
It might not even exist (Teddy Seidenfeld, personal
communication at BFF 2017).
But lots of authors believe that it does (Dempster 1963,
Lane and Sudderth 1984, Hill 1992, Coolen 1998).
In his 1992 paper “Bayesian nonparametric prediction and
statistical inference”, Bruce M. Hill writes:
Note that for all three of these authors [Student, Fisher,
Dempster] the justification for An seems to be purely
intuitive. Thus none give anything vaguely representing a
“proof” for An. . . .
In my talk at BFF4 I referred to Fisher–Dempster–Hill
(which became “Dempster–Hill” in the published paper).
Nonparametric fiducial inference in Fisher’s work
But Fisher definitely introduced nonparametric fiducial
inference for parameters.
Fisher traced the idea back to Student (in his 1939 paper
“Student”).
In the case of two observations y1 and y2 from N(µ, 1), the
probability that µ < y(1) is 1/4, the probability that
µ ∈ (y(1), y(2)) is 1/2, and the probability that µ > y(2) is 1/4.
Fisher extended this to an arbitrary sample size n and to
the pth quantile µp (dropping the Gaussian assumption).
Predicting the third observation
Nonparametric fiducial prediction: started by Jeffreys
1932; predicting a third observation. Fiducial derivation:
Seidenfeld, 1995.
Fisher did not accept Jeffreys’s argument as fiducial;
probably because it’s blatantly discontinuous.
Full Dempster–Hill procedure
Dempster (fiducial argument) and then Hill (Coolen’s NPI):
extended to n observations.
Hill’s definition of An: given the data y1, . . . , yn, the
probability that the next observation y falls in (y(i), y(i+1)) is
1/(n + 1), for each i = 0, . . . , n. By definition, y(0) = −∞,
and y(n+1) = ∞.
To make it into a fiducial predictive distribution, we need a
pivot.
To get a continuous pivot, let’s randomize: for τ ∼ U,
Q(y^{past}, y) := \frac{|\{i : y_i < y\}| + \tau + \tau\,|\{i : y_i = y\}|}{n + 1}
(the last addend is to take care of possible ties).
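A minimal sketch of the randomized pivot (function and variable names are mine, not from the talk); a quick simulation also illustrates that Q(y^past, Y) is uniform on [0, 1] when all observations are IID:

```python
import numpy as np

def dempster_hill_Q(past, y, tau):
    """Q(y_past, y) = (#{i: y_i < y} + tau + tau * #{i: y_i = y}) / (n + 1)."""
    past = np.asarray(past)
    return ((past < y).sum() + tau + tau * (past == y).sum()) / (len(past) + 1)

rng = np.random.default_rng(1)
# For IID data (here: standard Gaussian), the randomized pivot is uniform.
u = np.array([dempster_hill_Q(rng.normal(size=9), rng.normal(), rng.uniform())
              for _ in range(5000)])
print(u.mean())  # should be close to 0.5, as for a uniform random variable
```

For continuous data the ties term vanishes and Q = (R + τ)/(n + 1), where the rank R of the future observation is uniform on {0, . . . , n}; hence the exact uniformity.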
What the predictive distribution may look like
[Figure: an example of a randomized predictive distribution function Q(y), plotted for y ∈ [−2, 2].]
My plan
1 Parametric fiducial prediction
2 Dempster–Hill procedure
3 Conformal predictive distributions
Conformal prediction
Split-conformal prediction
Extensions
Our setting
Limitation of the Dempster–Hill procedure (constantly
criticized on this account): it does not cover regression or
classification; we need (x, y), not just y.
Our statistical model (for now): the observations are IID;
standard in machine learning. Each observation: pair
(x, y) (an object and its label).
There is a natural conformal pivot (uniform in [0, 1]).
Conformity measure
A conformity measure is a function A mapping
observations (z1, . . . , zl) to conformity scores that is
equivariant: for any l and any permutation π of {1, . . . , l},
A(z_1, . . . , z_l) = (\alpha_1, . . . , \alpha_l) \implies A(z_{\pi(1)}, . . . , z_{\pi(l)}) = (\alpha_{\pi(1)}, . . . , \alpha_{\pi(l)}).
Intuitively, αi measures how well zi conforms to the other
observations among z1, . . . , zl.
Usually A is built on top of some “base” algorithm. A
simple example:
αi := yi − ŷi.
Conformal pivot
The conformal pivot determined by a conformity measure A is
Q(y) := \frac{|\{i : \alpha^y_i < \alpha^y\}| + \tau + \tau\,|\{i : \alpha^y_i = \alpha^y\}|}{n + 1},
where i ranges over 1, . . . , n and
(\alpha^y_1, . . . , \alpha^y_n, \alpha^y) := A(z_1, . . . , z_n, (x, y)).
The implementation is not as easy as it looks!
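For intuition only, here is a toy sketch of mine with a deliberately simple conformity measure, αi := yi minus the mean of all labels in the augmented sample; with this A the conformal pivot reduces to the Dempster–Hill pivot (the common mean cancels in the comparisons), and the real computational difficulty only appears for less trivial base algorithms, which must be refitted for every candidate y:

```python
import numpy as np

def conformal_Q(train_y, y, tau):
    """Conformal pivot with the toy conformity measure
    alpha_i = y_i - mean(augmented labels); objects x are ignored
    to keep the sketch minimal."""
    aug = np.append(train_y, y)          # augmented sample z_1, ..., z_n, (x, y)
    alphas = aug - aug.mean()            # equivariant conformity scores
    a, alpha_y = alphas[:-1], alphas[-1]
    n = len(train_y)
    return ((a < alpha_y).sum() + tau + tau * (a == alpha_y).sum()) / (n + 1)

train_y = np.array([0.9, 1.1, 1.0, 1.3, 0.7])
print([round(conformal_Q(train_y, y, tau=0.5), 2) for y in (0.5, 1.0, 1.5)])
# [0.08, 0.5, 0.92]
```

Note that the scores of the training observations depend on the candidate label y, which is why a grid of y values (or algebra specific to the base algorithm) is needed in general.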
Some literature
Vladimir Vovk, Alex Gammerman, and Glenn Shafer.
Algorithmic Learning in a Random World.
Springer, New York, 2005.
Glenn Shafer and Vladimir Vovk.
A tutorial on conformal prediction.
Journal of Machine Learning Research 2008.
Most of the papers mentioned in this talk where I am a
co-author are at: http://alrw.net.
Least Squares Predictive Machine (LSPM)
The main struggle is for the monotonicity in y. Even for
αi := yi − ŷi, it’s not so obvious!
If ŷi is the LS estimate based on all of z1, . . . , zl (“full
residual”), monotonicity can be violated.
If ŷi is the LS estimate based on z1, . . . , zl with zi removed
(“deleted residual”), monotonicity can be violated.
But in an intermediate situation (“studentized residual”),
monotonicity always holds.
We can “kernelize” LSPM to cover non-linear situations.
Efficiency results for LSPM
Fisher’s ideas (using exhaustive statistics leading to
uniqueness) do not seem to work.
A more promising (albeit more restrictive) idea (Burnaev,
Wasserman):
assume a statistical model under which the base algorithm
works perfectly
show that the corresponding conformal predictive
distributions also work well (so that the guaranteed validity
does not cost much).
For LSPM, the difference between the true distribution
function and the predicted one is O(n^{−1/2}), with precise
weak convergence results.
Asymptotic efficiency
Conformal predictive distributions are universally
consistent, for a suitable conformity measure.
They can be built on top of many classical universally
consistent algorithms, such as nearest neighbours.
Exact statements and proofs
Vladimir Vovk, Jieli Shen, Valery Manokhin, and Min-ge
Xie.
Nonparametric predictive distributions based on conformal
prediction.
Machine Learning, 2019.
Vladimir Vovk, Ilia Nouretdinov, Valery Manokhin, and Alex
Gammerman.
Conformal predictive distributions with kernels.
In: Braverman’s Readings in Machine Learning, Lecture
Notes in Artificial Intelligence, 2018.
http://alrw.net, Working Paper 18.
Split-conformal pivot (1)
Remember that full conformal predictive distributions may be
difficult to compute (this depends on the conformity measure).
Let us divide the training set z1, . . . , zn into two parts:
the training set proper, z1, . . . , zm, of size m,
and the calibration set, zm+1, . . . , zn, of size n − m.
Split-conformal pivot (2)
The split-conformal pivot for a test object x is
Q(y) := \frac{|\{i : \alpha_i < \alpha\}| + \tau + \tau\,|\{i : \alpha_i = \alpha\}|}{n - m + 1},
where i ranges over m + 1, . . . , n and
\alpha_i := A(z_i; z_1, . . . , z_m), \qquad \alpha := A((x, y); z_1, . . . , z_m),
where there are no restrictions on the split-conformity
measure A.
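The split-conformal pivot can be sketched with least squares as the base algorithm and residuals as scores (the function names and the toy data are mine, not from the talk):

```python
import numpy as np

def split_conformal_Q(x_tr, y_tr, x_cal, y_cal, x, y, tau):
    """Split-conformal pivot: fit the base algorithm on the training set
    proper, then compare calibration scores with the test score."""
    X = np.column_stack([np.ones_like(x_tr), x_tr])
    beta = np.linalg.lstsq(X, y_tr, rcond=None)[0]   # least squares fit
    predict = lambda u: beta[0] + beta[1] * u
    alphas = y_cal - predict(x_cal)     # A(z_i; z_1, ..., z_m): residuals
    alpha = y - predict(x)              # increasing in y, as required
    k = len(y_cal)                      # k = n - m calibration observations
    return ((alphas < alpha).sum() + tau + tau * (alphas == alpha).sum()) / (k + 1)

rng = np.random.default_rng(2)
x = rng.uniform(0, 1, 40)
yv = 2 * x + rng.normal(scale=0.1, size=40)
Q = lambda y: split_conformal_Q(x[:20], yv[:20], x[20:], yv[20:], 0.5, y, 0.5)
print(Q(0.5), Q(1.0), Q(1.5))  # increasing in y
```

Unlike full conformal prediction, the base algorithm is fitted once, so evaluating Q at many candidate labels y is cheap.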
Discussion
Split-conformal predictive distributions are computationally
efficient but may lose predictive efficiency as compared
with full conformal predictive distributions, which use the
full training set as both training set proper and calibration
set.
Way out: divide the training set into a number of folds (as
in cross-validation) and use each fold in turn as calibration
set.
The resulting cross-conformal predictive distribution loses
guaranteed validity but is well-calibrated in practice (unless
the base algorithm is wildly randomized).
Added flexibility
Problem with the conformity measure αi := yi − ŷi: it
implicitly assumes homoscedasticity.
We can use a more flexible base algorithm, but then face
difficult calculations for full conformal predictive
distributions.
Or we can use the split-conformal method, which is trivial;
just make sure to make A((x, y); z1, . . . , zm) an increasing
function of y.
Conformalizing predictive distributions
In particular, we can take
A((x, y); z1, . . . , zm) := F(y),
where F is a standard predictive distribution function for
the label of x computed from z1, . . . , zm as training set,
such as Nadaraya–Watson.
The resulting predictive distributions are probabilistically
calibrated under the IID assumption.
Therefore, “conformalizing” is a way of calibrating
predictive distributions.
A natural version of Dempster–Hill.
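A minimal sketch of conformalizing (my code; a plain Gaussian fit to the training labels stands in for a real base predictive system such as Nadaraya–Watson, and objects x are ignored):

```python
import numpy as np
from scipy import stats

def conformalized_cdf(y_tr, y_cal, y, tau):
    """Split-conformal pivot with A((x, y); training set) := F(y),
    where F is a base predictive distribution function fitted on the
    training set proper (a toy Gaussian fit here)."""
    F = stats.norm(loc=np.mean(y_tr), scale=np.std(y_tr, ddof=1)).cdf
    alphas = F(np.asarray(y_cal))    # calibration scores
    alpha = F(y)                     # test score; increasing in y since F is
    k = len(y_cal)
    return ((alphas < alpha).sum() + tau + tau * (alphas == alpha).sum()) / (k + 1)

rng = np.random.default_rng(3)
y_tr, y_cal = rng.normal(size=50), rng.normal(size=50)
print(conformalized_cdf(y_tr, y_cal, 0.0, 0.5))  # near 0.5 at the true median
```

Even if the base F is badly miscalibrated, the output is probabilistically calibrated under the IID assumption, because only the ranks of the scores matter.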
Efficiency
A primitive efficiency result (http://alrw.net, Working
Paper 23):
If A((x, y); . . . ) as a function of y is the true distribution
function conditional on x, the difference between the true
distribution function and the predicted one is
O((n − m)^{−1/2}), with precise weak convergence results.
Can be stated for samples of any size (non-asymptotically).
This result is also true without the IID assumption.
But without the IID assumption we lose the validity
guarantee.
More conditional calibration?
If we have a large training set, it is natural to aim for
conditional probabilistic calibration.
If we identify object clusters of a reasonable size from the
training set proper, we can do calibration for each cluster
separately. (For example, 1000 calibration observations
will give accuracy ≈ 1/√1000 ≈ 3% in the estimates of
probability.)
We will have calibration inside each cluster, and the hope
is to approach Dawid’s full calibration.
A. Philip Dawid.
Calibration-based empirical probability (with discussion).
Annals of Statistics, 1985.
Repetitive structures
The IID assumption (equivalent to exchangeability for an
infinite sequence of observations) is a serious limitation of
conformal prediction.
Conformal prediction works in general repetitive structures
(Per Martin-Löf, Lauritzen), and there are lots of them
apart from the IID model.
For example, you can have partial exchangeability or
hypergraphical models.
Another recent extension (1)
There is one extension that is extremely useful for practical
applications.
Rina Foygel Barber, Emmanuel J. Candès, Aaditya
Ramdas, Ryan J. Tibshirani.
Conformal Prediction Under Covariate Shift.
arXiv, April 2019.
They generalize the exchangeability assumption to weighted
exchangeability.
Another recent extension (2)
What to do if the xi in the test set are generated from a
different distribution?
Example: a drug company looking for new drugs decides
to explore closely a specific region of the vast chemical
space of all compounds.
We still have the same distribution Y | X (the underlying
chemistry/biology) but the distribution of X changes.
Conformal prediction can be adapted to the situation
where dP̃/dP is known or can be estimated (where P and
P̃ are the old and new distributions of X, respectively).
Conclusion
Key messages of this talk:
There are notions of validity different from the traditional
probabilistic calibration (some of them are stronger), and
they deserve to be studied for the modern versions of
fiducial prediction (Hannig, Martin, . . . ).
There are ways to extend fiducial prediction to
nonparametric settings, including those useful in
regression problems.
Thank you for your attention!
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 

Recently uploaded (20)

9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 

MUMS: Bayesian, Fiducial, and Frequentist Conference - Calibration of Probability Forecasts, Vladimir Vovk, April 29, 2019

  • 1. Nonparametric fiducial prediction
    Vladimir Vovk (based on joint work with many people)
    Royal Holloway, University of London
    BFF 2019, Duke University, 29 April 2019
  • 2. This talk
    This talk is about one of the Fs in BFF (fiducial); at least this is my interpretation.
    It is mostly about fiducial prediction.
    Moreover: it is mostly about nonparametric fiducial prediction.
    I will start from history.
  • 3. My plan
    1. Parametric fiducial prediction
       Fisher’s fiducial prediction
       Terminology
       Validity of fiducial prediction
    2. Dempster–Hill procedure
    3. Conformal predictive distributions
  • 4. Fisher’s publications
    It appears that Fisher has only two publications that discuss fiducial prediction:
    R. A. Fisher. The fiducial argument in statistical inference. Annals of Eugenics, 1935.
    R. A. Fisher. Statistical Methods and Scientific Inference. 1956 (3rd edition: 1973).
  • 5. Fisher’s 1935 example
    His example from the 1935 paper: the Gaussian IID model.
    After observing y_1, ..., y_n (past data), compute
    ȳ := (1/n) ∑_{i=1}^n y_i,  s² := (1/(n−1)) ∑_{i=1}^n (y_i − ȳ)².
    Then
    t := √(n/(n+1)) · (y − ȳ)/s,
    where y is a future observation, has Student’s t-distribution with n − 1 degrees of freedom (it is a pivot).
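This predictive distribution function is easy to evaluate numerically; a minimal Python sketch of the two displayed formulas, using SciPy's Student-t distribution function (the function and variable names are ours, not from the talk):

```python
import numpy as np
from scipy import stats

def fiducial_predictive_cdf(y_past, y):
    """Fisher's fiducial predictive distribution function for one future
    Gaussian observation, evaluated at y (a sketch of the slide's formulas)."""
    y_past = np.asarray(y_past)
    n = len(y_past)
    y_bar = y_past.mean()
    s = y_past.std(ddof=1)                     # the 1/(n-1) sample variance
    t = np.sqrt(n / (n + 1)) * (y - y_bar) / s # the pivot of the slide
    return stats.t.cdf(t, df=n - 1)            # F of Student's t, n-1 df

rng = np.random.default_rng(0)
sample = rng.normal(10.0, 2.0, size=20)
print(fiducial_predictive_cdf(sample, sample.mean()))   # 0.5 at y = ȳ
```

Inverting this function at levels 0.05 and 0.95 would give the usual 90% prediction interval.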
  • 6. General scheme (1)
    I will only talk about predicting one future scalar observation (the uncontroversial part).
    General scheme (at least in Fisher’s work): we combine the past observations Y_past and the future observation Y to obtain a pivot U:
    U := Q(Y_past, Y).
    The distribution of U is independent of the parameter θ.
    Without loss of generality we can assume that U is uniformly distributed in [0, 1] (at least when it is continuous): if not, replace U by F_U(U), where F_U is U’s distribution function.
  • 7. General scheme (2)
    If y ∈ R and Q(y_past, y) is increasing in y, then y ↦ Q(y_past, y) is a distribution function: the fiducial (predictive) distribution.
    In our example the pivot t = √(n/(n+1)) · (y − ȳ)/s becomes the fiducial (predictive) distribution (function)
    Q(y_past, y) := F_{t_{n−1}}( √(n/(n+1)) · (y − ȳ)/s ),
    F_{t_{n−1}} being the distribution function of Student’s t-distribution with n − 1 degrees of freedom.
  • 8. General scheme (3)
    Summary: fiducial predictive distribution = uniform pivot (provided the pivot is a distribution function).
    Notice: no explicit “fiducial inversion” is needed in this exposition (unless we want prediction regions).
    Fisher emphasized both continuity (in many publications) and monotonicity (in 1962, in letters to Barnard and Sprott, for parametric fiducial inference).
  • 9. In Fisher’s own words (in a non-predictive context)
    His letter to Barnard in March (?) 1962:
    A pivotal quantity is a function of parameters and statistics, the distribution of which is independent of all parameters. To be of any use in deducing probability statements about parameters, let me add (a) it involves only one parameter, (b) the statistics involved are jointly exhaustive for that parameter, (c) it varies monotonically with that parameter.
    In his publications Fisher also mentions continuity (“the observations should not be discontinuous” in the 1956 book).
  • 10. Unpleasant possibility
    There is no guarantee that Q(y_past, −∞) = 0 and Q(y_past, ∞) = 1 for all y_past.
    For example, Q(y_past, −∞) > 0 means that there is a positive mass at −∞. We might want to disallow this.
  • 11. Fiducial prediction: existing terminology
    fiducial distribution for a future observation (Fisher, 1935)
    fiducial distribution (Hora, Buehler, McCullagh)
    fiducial predictive distribution (Dawid, Wang)
    predictive fiducial distribution (Hannig, Iyer, Wang)
    marginal association (Martin and Liu)
    predictive confidence distribution (Schweder, Hjort)
    predictive distribution (Xie, Liu, Shen)
    People often avoid “fiducial” to stay away from controversy.
  • 12. The terminology of this talk
    Now I prefer to talk about predictive distributions without imposing any conditions of validity a priori.
    Fisher’s probabilistic calibration, Q(Y_past, Y) ∼ U, may be the most fundamental notion of validity, but we may also want to have Q(Y_past, Y) ∼ U given a σ-algebra F.
    If F is generated by Y_past, it’s the ideal situation, but only Bayesians can have it.
    There are lots of alternative definitions (“marginal calibration”, “F-ideal calibration”, ...).
  • 13. Validity and efficiency of fiducial prediction
    Since Q(Y_past, Y) is a pivot, fiducial prediction is calibrated in probability by definition. The main problem is its efficiency (or sharpness).
    Fisher insisted that fiducial inference should be based on exhaustive statistics, leading to uniqueness. This part of his programme failed.
    Gneiting et al.’s paradigm: probabilistic forecasting has the general goal of maximizing the sharpness of predictive distributions, subject to calibration.
  • 14. An example of conditional probabilistic calibration (1)
    Peter McCullagh: in many cases (such as linear regression), fiducial prediction is probabilistically calibrated conditionally on a non-trivial σ-algebra.
    Peter McCullagh. Fiducial prediction. 2004, http://www.stat.uchicago.edu/~pmcc/reports/fiducial.pdf
    McCullagh, Vovk, Nouretdinov, Devetyarov, Gammerman. Conditional prediction intervals for linear regression. ICMLA 2009.
  • 15. An example of conditional probabilistic calibration (2)
    Our model is y_i = β′x_i + σξ_i, where the x_i are fixed vectors and the ξ_i are IID with a known distribution P (which does not have to be Gaussian).
    Let F be the σ-algebra of events invariant under the transformations
    (y_1, y_2, ...) ↦ (a′x_1 + by_1, a′x_2 + by_2, ...).
    Then the fiducial predictive distribution is probabilistically calibrated given F.
  • 16. My plan
    1. Parametric fiducial prediction
    2. Dempster–Hill procedure
       Fisher’s nonparametric fiducial inference
       Dempster–Hill
    3. Conformal predictive distributions
  • 17. Nonparametric fiducial prediction in Fisher’s work
    It might not even exist (Teddy Seidenfeld, personal communication at BFF 2017). But lots of authors believe that it does (Dempster 1963, Lane and Sudderth 1984, Hill 1992, Coolen 1998).
    In his 1992 paper “Bayesian nonparametric prediction and statistical inference”, Bruce M. Hill writes:
    Note that for all three of these authors [Student, Fisher, Dempster] the justification for A_n seems to be purely intuitive. Thus none give anything vaguely representing a “proof” for A_n. ...
    In my talk at BFF4 I referred to Fisher–Dempster–Hill (which became “Dempster–Hill” in the published paper).
  • 18. Nonparametric fiducial inference in Fisher’s work
    But Fisher definitely introduced nonparametric fiducial inference for parameters. Fisher traced the idea back to Student (in his 1939 “Student” paper).
    In the case of two observations y_1 and y_2 from N(µ, 1): the probability that µ < y_(1) is 1/4, the probability that µ ∈ (y_(1), y_(2)) is 1/2, and the probability that µ > y_(2) is 1/4.
    Fisher extended this to an arbitrary sample size n and to the pth quantile µ_p (dropping the Gaussian assumption).
  • 19. Predicting the third observation
    Nonparametric fiducial prediction: started by Jeffreys in 1932, predicting a third observation.
    Fiducial derivation: Seidenfeld, 1995.
    Fisher did not accept Jeffreys’s argument as fiducial, probably because it is blatantly discontinuous.
  • 20. Full Dempster–Hill procedure
    Dempster (fiducial argument) and then Hill (Coolen’s NPI) extended this to n observations.
    Hill’s definition of A_n: given the data y_1, ..., y_n, the probability that the next observation y falls in (y_(i), y_(i+1)) is 1/(n + 1), for each i = 0, ..., n. By definition, y_(0) = −∞ and y_(n+1) = ∞.
    To make it into a fiducial predictive distribution, we need a pivot. To get a continuous pivot, let’s randomize: for τ ∼ U,
    Q(y_past, y) := ( |{i : y_i < y}| + τ + τ|{i : y_i = y}| ) / (n + 1)
    (the last addend takes care of possible ties).
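The randomized Dempster–Hill distribution function above is a one-liner; a minimal Python sketch of the slide's formula (the function name and sample data are ours):

```python
import numpy as np

def dempster_hill_cdf(y_past, y, tau):
    """Randomized Dempster-Hill predictive distribution function:
    Q(y) = (#{i : y_i < y} + tau + tau * #{i : y_i = y}) / (n + 1),
    with tau drawn once from U[0, 1]."""
    y_past = np.asarray(y_past)
    n = len(y_past)
    return (np.sum(y_past < y) + tau + tau * np.sum(y_past == y)) / (n + 1)

y_past = [3.1, 1.4, 2.7, 0.5]
tau = 0.5              # tau ~ U[0, 1]; fixed here for reproducibility
# Each gap between consecutive order statistics carries mass 1/(n+1):
print(dempster_hill_cdf(y_past, 2.0, tau))   # (2 + 0.5)/5 = 0.5
```

With τ fixed, Q jumps by 1/(n + 1) at each distinct observation, matching Hill's A_n.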
  • 21. What the predictive distribution may look like
    [Figure: a plot of the randomized predictive distribution function Q(y) against y; only the axis labels y and Q(y) survive extraction.]
  • 22. My plan
    1. Parametric fiducial prediction
    2. Dempster–Hill procedure
    3. Conformal predictive distributions
       Conformal prediction
       Split-conformal prediction
       Extensions
  • 23. Our setting
    A limitation of the Dempster–Hill procedure (constantly criticized on this account): it does not cover regression or classification; we need (x, y), not just y.
    Our statistical model (for now): the observations are IID; standard in machine learning. Each observation is a pair (x, y) (an object and its label).
    There is a natural conformal pivot (uniform in [0, 1]).
  • 24. Conformity measure
    A conformity measure is a function A mapping observations (z_1, ..., z_l) to conformity scores that is equivariant: for any l and any permutation π of {1, ..., l},
    A(z_1, ..., z_l) = (α_1, ..., α_l)  =⇒  A(z_{π(1)}, ..., z_{π(l)}) = (α_{π(1)}, ..., α_{π(l)}).
    Intuitively, α_i measures how well z_i conforms to the other observations among z_1, ..., z_l.
    Usually A is built on top of some “base” algorithm. A simple example: α_i := y_i − ŷ_i.
  • 25. Conformal pivot
    The conformal pivot determined by a conformity measure A is
    Q(y) := ( |{i : α_i^y < α^y}| + τ + τ|{i : α_i^y = α^y}| ) / (n + 1),
    where i = 1, ..., n and
    (α_1^y, ..., α_n^y, α^y) := A(z_1, ..., z_n, (x, y)).
    The implementation is not as easy as it looks!
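A naive implementation of this pivot re-applies A to the augmented bag for every candidate label y, which is part of why the implementation is "not as easy as it looks". A sketch, with an illustrative conformity measure of our own choosing (the residual from the bag's mean label, not a method prescribed by the talk):

```python
import numpy as np

def conformal_pivot(z_train, xy_test, tau, conformity):
    """Conformal pivot Q(y) of the slide: conformity scores are computed
    on the augmented bag z_1, ..., z_n, (x, y)."""
    scores = conformity(z_train + [xy_test])  # (a_1^y, ..., a_n^y, a^y)
    alpha_y = scores[-1]
    n = len(z_train)
    less = sum(a < alpha_y for a in scores[:-1])
    ties = sum(a == alpha_y for a in scores[:-1])
    return (less + tau + tau * ties) / (n + 1)

def residual_scores(bag):
    """Illustrative conformity measure: alpha_i = y_i - yhat, where yhat
    is the mean label of the whole bag (ignores the objects x)."""
    ys = np.array([y for _, y in bag])
    return list(ys - ys.mean())

train = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]
tau = 0.5
# Evaluate Q on a grid of candidate labels for the test object x = 1.5:
for y in [0.0, 2.5, 5.0]:
    print(y, conformal_pivot(train, (1.5, y), tau, residual_scores))
```

In general Q must be evaluated on a grid of candidate labels, and monotonicity in y has to be checked for the chosen conformity measure (the topic of the LSPM slide below).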
  • 26. Some literature
    Vladimir Vovk, Alex Gammerman, and Glenn Shafer. Algorithmic Learning in a Random World. Springer, New York, 2005.
    Glenn Shafer and Vladimir Vovk. A tutorial on conformal prediction. Journal of Machine Learning Research, 2008.
    Most of the papers mentioned in this talk where I am a co-author are at http://alrw.net.
  • 27. Least Squares Predictive Machine (LSPM)
    The main struggle is for monotonicity in y. Even for α_i := y_i − ŷ_i, it’s not so obvious!
    If ŷ_i is the LS estimate based on all of z_1, ..., z_l (“full residual”), monotonicity can be violated.
    If ŷ_i is the LS estimate based on z_1, ..., z_l with z_i removed (“deleted residual”), monotonicity can also be violated.
    But in an intermediate situation (“studentized residual”), monotonicity always holds.
    We can “kernelize” the LSPM to cover non-linear situations.
  • 28. Efficiency results for LSPM
    Fisher’s ideas (using exhaustive statistics leading to uniqueness) do not seem to work.
    A more promising (albeit more restrictive) idea (Burnaev, Wasserman):
    assume a statistical model under which the base algorithm works perfectly;
    show that the corresponding conformal predictive distributions also work well (so that the guaranteed validity does not cost much).
    For the LSPM, the difference between the true distribution function and the predicted one is O(n^{−1/2}), with precise weak convergence results.
  • 29. Asymptotic efficiency
    Conformal predictive distributions are universally consistent, for a suitable conformity measure.
    They can be built on top of many classical universally consistent algorithms, such as nearest neighbours.
  • 30. Exact statements and proofs
    Vladimir Vovk, Jieli Shen, Valery Manokhin, and Min-ge Xie. Nonparametric predictive distributions based on conformal prediction. Machine Learning, 2019.
    Vladimir Vovk, Ilia Nouretdinov, Valery Manokhin, and Alex Gammerman. Conformal predictive distributions with kernels. In: Braverman’s Readings in Machine Learning, Lecture Notes in Artificial Intelligence, 2018.
    http://alrw.net, Working Paper 18.
  • 31. Split-conformal pivot (1)
    Remember that full conformal predictive distributions may be difficult to compute (this depends on the conformity measure).
    Let us divide the training set z_1, ..., z_n into two parts:
    the training set proper, z_1, ..., z_m, of size m, and
    the calibration set, z_{m+1}, ..., z_n, of size n − m.
  • 32. Split-conformal pivot (2)
    The split-conformal pivot for a test object x is
    Q(y) := ( |{i : α_i < α}| + τ + τ|{i : α_i = α}| ) / (n − m + 1),
    where i ranges over m + 1, ..., n and
    α_i := A(z_i; z_1, ..., z_m),  α := A((x, y); z_1, ..., z_m),
    and there are no restrictions on the split-conformity measure A.
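Because the calibration scores use only the training set proper, they are computed once, and the split-conformal pivot is cheap to evaluate. A sketch, again with a mean-label residual score as our illustrative choice of A:

```python
import numpy as np

def split_conformal_cdf(train_proper, calib, x, y, tau, score):
    """Split-conformal pivot of the slide: scores on the calibration set
    z_{m+1}, ..., z_n use only the training set proper z_1, ..., z_m."""
    alphas = [score(z, train_proper) for z in calib]
    alpha = score((x, y), train_proper)
    less = sum(a < alpha for a in alphas)
    ties = sum(a == alpha for a in alphas)
    return (less + tau + tau * ties) / (len(calib) + 1)

def residual_score(z, train_proper):
    """Illustrative split-conformity measure: alpha = y - yhat, with yhat
    the mean label of the training set proper; it is increasing in y."""
    _, y = z
    yhat = np.mean([yy for _, yy in train_proper])
    return y - yhat

proper = [(0.0, 1.0), (1.0, 2.0)]                 # z_1, ..., z_m
calib = [(2.0, 2.5), (3.0, 3.5), (4.0, 1.0)]      # z_{m+1}, ..., z_n
print(split_conformal_cdf(proper, calib, 1.5, 2.0, 0.5, residual_score))
```

With a score increasing in y (as here), Q(y) is automatically a (randomized) distribution function in y.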
  • 33. Discussion
    Split-conformal predictive distributions are computationally efficient but may lose predictive efficiency as compared with full conformal predictive distributions, which use the full training set as both training set proper and calibration set.
    A way out: divide the training set into a number of folds (as in cross-validation) and use each fold in turn as the calibration set. The resulting cross-conformal predictive distribution loses the guaranteed validity but is well calibrated in practice (unless the base algorithm is wildly randomized).
  • 34. Added flexibility
    A problem with the conformity measure α_i := y_i − ŷ_i: it implicitly assumes homoscedasticity.
    We can use a more flexible base algorithm, but then we face difficult calculations for full conformal predictive distributions.
    Or we can use the split-conformal method, which is trivial; just make sure to make A((x, y); z_1, ..., z_m) an increasing function of y.
  • 35. Conformalizing predictive distributions
    In particular, we can take A((x, y); z_1, ..., z_m) := F(y), where F is a standard predictive distribution function for the label of x computed from z_1, ..., z_m as the training set, such as Nadaraya–Watson.
    The resulting predictive distributions are probabilistically calibrated under the IID assumption. Therefore, “conformalizing” is a way of calibrating predictive distributions.
    A natural version of Dempster–Hill.
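A minimal sketch of such conformalizing, with a Gaussian fit to the training labels standing in for the base predictive distribution F (our simplification: the talk mentions Nadaraya–Watson, which would also use the objects x):

```python
import math

def phi(u):
    """Standard normal distribution function (via math.erf)."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def base_cdf(y, train_ys):
    """Stand-in base predictive distribution function F: a Gaussian
    fitted to the labels of the training set proper (an assumption made
    for illustration only)."""
    m = sum(train_ys) / len(train_ys)
    s = (sum((v - m) ** 2 for v in train_ys) / (len(train_ys) - 1)) ** 0.5
    return phi((y - m) / s)

def conformalized_cdf(train_ys, calib_ys, y, tau):
    """Split-conformal calibration with the score alpha := F(y), which is
    increasing in y as the previous slide requires."""
    alphas = [base_cdf(v, train_ys) for v in calib_ys]
    alpha = base_cdf(y, train_ys)
    less = sum(a < alpha for a in alphas)
    ties = sum(a == alpha for a in alphas)
    return (less + tau + tau * ties) / (len(calib_ys) + 1)

train_ys = [1.0, 2.0, 3.0, 4.0]   # labels of the training set proper
calib_ys = [1.5, 2.5, 3.5]        # labels of the calibration set
print(conformalized_cdf(train_ys, calib_ys, 2.5, 0.5))
```

Even if the Gaussian fit is badly miscalibrated, the conformalized output remains probabilistically calibrated under the IID assumption.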
  • 36. Efficiency
    A primitive efficiency result (http://alrw.net, Working Paper 23):
    if A((x, y); ...) as a function of y is the true distribution function conditional on x, the difference between the true distribution function and the predicted one is O((n − m)^{−1/2}), with precise weak convergence results.
    This can be stated for samples of any size (non-asymptotically).
    This result is also true without the IID assumption. But without the IID assumption we lose the validity guarantee.
  • 37. More conditional calibration?
    If we have a large training set, it is natural to aim for conditional probabilistic calibration.
    If we identify object clusters of a reasonable size from the training set proper, we can do calibration for each cluster separately. (For example, 1000 calibration observations will give accuracy ≈ 1/√1000 ≈ 3% in the estimates of probability.)
    We will have calibration inside each cluster, and the hope is to approach Dawid’s full calibration.
    A. Philip Dawid. Calibration-based empirical probability (with discussion). Annals of Statistics, 1985.
  • 38. Repetitive structures
    The IID assumption (equivalent to exchangeability for an infinite sequence of observations) is a serious limitation of conformal prediction.
    Conformal prediction works in general repetitive structures (Per Martin-Löf, Lauritzen), and there are lots of them apart from the IID model.
    For example, you can have partial exchangeability or hypergraphical models.
  • 39. Another recent extension (1)
    There is one extension that is extremely useful for practical applications.
    Rina Foygel Barber, Emmanuel J. Candès, Aaditya Ramdas, Ryan J. Tibshirani. Conformal prediction under covariate shift. arXiv, April 2019.
    They generalize the exchangeability assumption to weighted exchangeability.
  • 40. Another recent extension (2)
    What to do if the x_i in the test set are generated from a different distribution?
    Example: a drug company looking for new drugs decides to explore closely a specific region of the vast chemical space of all compounds.
    We still have the same distribution of Y given X (the underlying chemistry/biology), but the distribution of X changes.
    Conformal prediction can be adapted to the situation where dP′/dP is known or can be estimated (P and P′ being the old and new distributions of X, respectively).
  • 41. Conclusion
    Key messages of this talk:
    There are notions of validity different from the traditional probabilistic calibration (some of them stronger), and they deserve to be studied for the modern versions of fiducial prediction (Hannig, Martin, ...).
    There are ways to extend fiducial prediction to nonparametric settings, including those useful in regression problems.
    Thank you for your attention!