Fiducial inference has a long history, but was not taken seriously by most statisticians until a recent resurgence of interest. Can new life be breathed into its corpse? I will reprise some ancient analyses aiming to discover just how far fiducial logic can be taken before it collapses under its own weight.
MUMS: Bayesian, Fiducial, and Frequentist Conference - Can a Fiducial Phoenix Rise from the Ashes?, Phil Dawid, April 30, 2019
1. Can a Fiducial Phoenix Rise from the Ashes?
Philip Dawid, University of Cambridge
Keynote Lecture, BFF6
30 April 2019
2. Fisher’s original fiducial argument
n observations from a bivariate normal distribution
r the sample correlation
ρ the population correlation
Distribution of r depends only on (n and) ρ, with cdf F(r; ρ)
For any fixed ρ, distribution of F(r; ρ) is U[0, 1]
(*): Pr{F(r; ρ) ≤ γ | ρ} = γ
{ρ : F(r; ρ) ≤ γ} is a level-γ (upper) confidence interval
After observing r, regard (*) as still valid
Assign “fiducial probability” γ to {ρ : F(r; ρ) ≤ γ}
Fiducial density of ρ: −(∂/∂ρ) F(r; ρ)
Savage: Making the Bayesian omelette without breaking the
Bayesian eggs
Lindley: Not a Bayesian posterior
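As a concrete illustration of the construction above, here is a minimal Monte Carlo sketch (mine, not from the talk): estimate F(r; ρ) on a grid of ρ by simulation, then read off the fiducial density as −∂F/∂ρ. The values of n, r_obs, the grid, and all names are illustrative assumptions.

```python
# Minimal sketch (illustrative, not from the talk): Monte Carlo estimate of
# F(r_obs; rho) on a grid of rho, then fiducial density = -dF/d(rho).
import numpy as np

rng = np.random.default_rng(0)
n, r_obs = 20, 0.6                      # assumed sample size / observed r
rhos = np.linspace(-0.2, 0.95, 60)      # grid of population correlations
F = np.empty_like(rhos)

for k, rho in enumerate(rhos):
    cov = [[1.0, rho], [rho, 1.0]]
    data = rng.multivariate_normal([0.0, 0.0], cov, size=(4000, n))
    # sample correlation of each simulated data set of n pairs
    r = np.array([np.corrcoef(d[:, 0], d[:, 1])[0, 1] for d in data])
    F[k] = np.mean(r <= r_obs)          # estimate of F(r_obs; rho)

fid_density = -np.gradient(F, rhos)     # noisy estimate of -dF/drho
```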
3. Pivotal inference
A pivot e is a function of data x and parameter θ with known
distribution P not depending on θ: e ∼ P, with e ⊥⊥ θ.
e.g. F(r; ρ) ∼ U[0, 1]; (x̄ − µ)/(s/√n) ∼ t_{n−1}
A fiducial distribution is obtained by regarding distribution P for e
as still relevant, after observing the data: “e ⊥⊥ x”
— (sometimes) yields a distribution for θ.
Validity? We should require that the data are “uninformative”
about the pivot (vague!)
No informative prior distribution
Hacking: Require that likelihood function based on data,
expressed as function of the pivot, is same (up to
proportionality) for any data.
Holds iff (possibly after transformation) we have a location
model. Then fiducial = posterior for uniform prior.
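Hacking's condition is satisfied by the Student-t pivot above, so a sketch is immediate: treat (x̄ − µ)/(s/√n) ∼ t_{n−1} as still valid after seeing the data and solve for µ. This is my illustration, with made-up summary statistics.

```python
# Sketch: fiducial distribution of mu from the pivot (xbar - mu)/(s/sqrt(n)) ~ t_{n-1}.
# Regarding the pivot's distribution as still valid after observing the data,
# mu = xbar - (s/sqrt(n)) * T with T ~ t_{n-1}.  xbar, s, n are illustrative.
import numpy as np
from scipy import stats

n, xbar, s = 10, 5.2, 1.3
mu_draws = xbar - (s / np.sqrt(n)) * stats.t(df=n - 1).rvs(100_000, random_state=1)

# central 95% fiducial interval; coincides with the usual t confidence interval
lo, hi = np.percentile(mu_draws, [2.5, 97.5])
```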
4. Multiparameter case
Sample of size n from N(µ, σ²). Usual estimates x̄, s².
A bivariate pivot is
(e1, e2) = ( n^{1/2}(x̄ − µ)/σ , s/σ ),
with e1 ∼ N(0, 1) and e2 ∼ {χ²_{n−1}/(n − 1)}^{1/2}, independently.
Invert:
µ = x̄ − s e1/(n^{1/2} e2), σ = s/e2,
leading to a joint fiducial distribution for (µ, σ).
Alternatively, proceed sequentially: Invert e2 to obtain fiducial
distribution for σ, then, fixing σ, invert e1 to obtain conditional
fiducial distribution for µ given σ
OK?
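A sketch (mine) of the first route, sampling the joint fiducial by inverting the bivariate pivot; the summary statistics are illustrative stand-ins.

```python
# Sketch: draws from the joint fiducial distribution of (mu, sigma), obtained by
# inverting the bivariate pivot above.  xbar, s, n are illustrative.
import numpy as np

rng = np.random.default_rng(2)
n, xbar, s = 10, 5.2, 1.3
N = 100_000
e1 = rng.standard_normal(N)                      # e1 ~ N(0,1)
e2 = np.sqrt(rng.chisquare(n - 1, N) / (n - 1))  # e2 ~ {chi2_{n-1}/(n-1)}^{1/2}
sigma = s / e2
mu = xbar - s * e1 / (np.sqrt(n) * e2)
```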
5. Alternative approaches (Dempster)
Let z = x̄/s, ζ = µ/σ. The distribution of z depends only on
ζ, so we can get a fiducial distribution for ζ by inverting the
cdf. This turns out to agree with marginal for ζ computed
from previous joint fiducial.
marginal consistency (see later)
Now construct the conditional fiducial distribution for σ given
ζ, based on pivot s/σ having its conditional distribution given
z (and ζ). But this differs from the conditional distribution of
σ given ζ computed from joint fiducial.
conditional inconsistency (see later)
6. Marginal inconsistency (Wilkinson)
xi ∼ N(µi , 1) (indep)
Pivot (x1 − µ1, . . . , xn − µn)
Joint fiducial distribution: µi ∼ N(xi , 1) (indep)
Let η = Σi µi², y = Σi xi²
Marginal fiducial distribution of η is non-central χ²: η ∼ χ²_n(y)
In particular Ef(η − y | y) = n (expectation under the fiducial distribution)
But y has sampling distribution χ²_n(η), with E(η − y | η) = −n
Inverting the cdf of y gives a very different (incomplete) fiducial
distribution for η
For n = 50, y = 100 the central 95% fiducial intervals for η are
(reproduced in the sketch below):
(109, 196) from marginalising the joint fiducial to η
(21, 89) from marginalising the data to y
Wilkinson’s noncoherence principle allows joint fiducial to coexist
with above second fiducial distribution for η.
Such fiducial distributions do not obey usual probabilistic rules.
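Both intervals are easy to check numerically; a sketch (mine) using scipy:

```python
# Sketch reproducing the two 95% intervals for eta (n = 50, y = 100).
import numpy as np
from scipy import stats, optimize

n, y = 50, 100.0

# (a) marginalising the joint fiducial: eta ~ noncentral chi^2_n(y)
interval_a = stats.ncx2(df=n, nc=y).ppf([0.025, 0.975])    # approx (109, 196)

# (b) inverting the sampling cdf of y ~ chi^2_n(eta) in eta
lo = optimize.brentq(lambda e: stats.ncx2.cdf(y, n, e) - 0.975, 1e-6, 500.0)
hi = optimize.brentq(lambda e: stats.ncx2.cdf(y, n, e) - 0.025, 1e-6, 500.0)
interval_b = (lo, hi)                                      # approx (21, 89)
```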
7. Non-unique pivotals (Mauldon)
n pairs (Xi1, Xi2) from N(0, Σ)
Sufficient statistic is the SSP matrix S ∼ W(n; Σ)
We can uniquely express S = GGᵀ, Σ = ΓΓᵀ with G, Γ lower triangular
Then E = Γ⁻¹G is pivotal:
E ∼ [ χ_{n−1}    0
      N(0, 1)   χ_{n−2} ]   (independent entries)
Inverting this pivotal (Γ = GE⁻¹) gives a fiducial distribution for Σ = ΓΓᵀ
We can do a similar analysis after interchanging X1 and X2
Different pivotal
Different answers
though both yield same marginal for ρ (agreeing with Fisher)
since, under a monotonicity condition, when x and θ are both
real, all fiducial constructions yield same answer.
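A sketch (mine) of this construction for one ordering of the variables; n and the observed S are illustrative:

```python
# Sketch: fiducial draws of Sigma via Mauldon's pivotal E = Gamma^{-1} G.
import numpy as np

rng = np.random.default_rng(3)
n = 20
S = np.array([[4.0, 1.5], [1.5, 3.0]])   # assumed observed SSP matrix
G = np.linalg.cholesky(S)                # S = G G^T, G lower triangular

def draw_E():
    # pivotal matrix with independent entries, as on the slide
    return np.array([
        [np.sqrt(rng.chisquare(n - 1)), 0.0],
        [rng.standard_normal(),         np.sqrt(rng.chisquare(n - 2))],
    ])

# invert the pivotal: Gamma = G E^{-1}, then Sigma = Gamma Gamma^T.
# Interchanging the two coordinates of the data gives a different pivotal
# and, in general, different draws -- the slide's point.
draws = []
for _ in range(10_000):
    Gamma = G @ np.linalg.inv(draw_E())
    draws.append(Gamma @ Gamma.T)
```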
8. Functional models (Fraser; Dawid & Stone)
Instead of starting from a distributional model P = {Pθ} and
searching for a pivotal function of (x, θ), regard the fundamental
model as
x = f (θ, e) (x ∈ X, θ ∈ Θ, e ∈ E)
with e having a known distribution P (indep of θ) and known
function f . We simply write
x = θ ◦ e.
This induces a distributional model P, but distinct functional models can yield the same P.
Assume invertibility: given x, e, ∃ at most one θ s.t. x = θ ◦ e.
When such a θ exists, call x, e compatible, and write θ = x ◦ e⁻¹.
In the simplest case (SFM) any x and e are compatible. The fiducial
distribution, for data x, is that implied by
θ = x ◦ e⁻¹, e ∼ P.
9. Examples
x, e, θ in group G, with group multiplication x = θe, θ = xe⁻¹
satisfies Hacking’s irrelevance principle
e ∼ U[0, 1], x = F_θ⁻¹(e)
recovers Fisher’s original approach
x = (θe1 + e3)/e2; θ = (xe2 − e3)/e1
Non-pivotal: e not expressible as a function of (x, θ)
Take e1 ∼ χ_{n−1}, e2 ∼ χ_{n−2}, e3 ∼ N(0, 1).
With r := x/√(1 + x²), ρ := θ/√(1 + θ²), we reproduce both the
sampling distribution and Fisher's fiducial distribution for a
correlation coefficient
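The sampling-distribution half of that claim can be checked by simulation; a sketch (mine), with illustrative n and ρ:

```python
# Sketch: check that x = (theta*e1 + e3)/e2, mapped through r = x/sqrt(1+x^2),
# matches the sampling distribution of the sample correlation from n pairs.
import numpy as np

rng = np.random.default_rng(4)
n, rho, reps = 15, 0.5, 20_000
theta = rho / np.sqrt(1 - rho**2)

e1 = np.sqrt(rng.chisquare(n - 1, reps))   # e1 ~ chi_{n-1}
e2 = np.sqrt(rng.chisquare(n - 2, reps))   # e2 ~ chi_{n-2}
e3 = rng.standard_normal(reps)             # e3 ~ N(0,1)
x = (theta * e1 + e3) / e2
r_model = x / np.sqrt(1 + x**2)

data = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=(reps, n))
r_direct = np.array([np.corrcoef(d[:, 0], d[:, 1])[0, 1] for d in data])
# quantiles of r_model and r_direct should agree (up to Monte Carlo error)
```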
10. Model reduction
Suppose there exist ζ = ζ(θ), z = z(x) such that z(θ ◦ e) is a
function of ζ(θ) and e. Write as z = ζ ∗ e.
Then z ∗ e⁻¹ = ζ = ζ(θ) = ζ(x ◦ e⁻¹).
So we get same fiducial distribution for ζ, whether we marginalise
from the full model, or work directly with the reduced model.
However, above condition constrains allowable functions ζ of θ.
Example
(x, s) = (µ + σe1, σe2)
Let ζ = µ/σ, z = x/s.
Then we get the reduced model z = (ζ + e1)/e2,
confirming marginal consistency (Dempster).
Counter-example
xi = θi + ei, z = Σi xi², ζ = Σi θi².
Although (with normality) p(z | θ) depends only on
ζ, z = Σi (θi + ei)² cannot be expressed as a
function of ζ and e. Marginalisation of the joint fiducial
to ζ is not supported.
11. Conditional inconsistency
SFM x = θ ◦ e, reduced model z = ζ ∗ e. Data x = x0, whence z = z0.
How should we compute a conditional fiducial distribution of θ,
given ζ = ζ0?
1. Just condition the fiducial distribution of θ on ζ = ζ0.
Equivalently, since ζ = z0 ∗ e⁻¹, condition the distribution of
e on z0 ∗ e⁻¹ = ζ0 (and then use θ = x0 ◦ e⁻¹).
2. Given that we know ζ = ζ0, we have observed the value z0 of
z = ζ0 ∗ e. So we should condition distribution of e on
ζ0 ∗ e = z0 (and then use θ = x0 ◦ e⁻¹).
These yield different answers in general. Conditioning on
z0 ∗ e⁻¹ = ζ0 is not the same as conditioning on ζ0 ∗ e = z0: even
though the two conditions appear logically equivalent, they are
embedded in different partitions of E.
12. Dempster’s example—reprise
Functional model: x = µ + σe1, s = σe2.
Data x = x0, s = s0.
Fiducial joint model:
µ = x0 − s0 e1/e2, σ = s0/e2.
Define ζ = µ/σ, z = x/s, z0 = x0/s0.
We have reduced model z = (ζ + e1)/e2, with (consistent)
marginal fiducial model
ζ = z0e2 − e1.
Conditioning the joint fiducial on ζ = ζ0 is equivalent to
conditioning e on z0e2 − e1 = ζ0.
Alternatively, knowing ζ = ζ0, we have observed z = (ζ0 + e1)/e2
to take value z0. Conditioning on this is equivalent to conditioning
e on (ζ0 + e1)/e2 = z0.
Same logical information, different partitions, different conditional
distributions.
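One way to see the difference concretely is to realise the two conditionings as limits of conditioning on thin slabs; a Monte Carlo sketch (mine), with illustrative distributions for (e1, e2) and illustrative data:

```python
# Sketch: the two routes as slab conditioning.  Route 1 conditions on
# |z0*e2 - e1 - zeta0| < delta; route 2 on |(zeta0 + e1)/e2 - z0| < delta,
# the same limiting event but a different partition (it reweights by e2).
import numpy as np

rng = np.random.default_rng(5)
n, s0, z0, zeta0, delta = 10, 1.3, 4.0, 3.0, 0.01   # illustrative values
N = 2_000_000
e1 = rng.standard_normal(N)                         # assumed e1 ~ N(0,1)
e2 = np.sqrt(rng.chisquare(n - 1, N) / (n - 1))     # assumed law of e2

d = z0 * e2 - e1 - zeta0
keep1 = np.abs(d) < delta            # route 1: z0*e2 - e1 = zeta0
keep2 = np.abs(d / e2) < delta       # route 2: (zeta0 + e1)/e2 = z0

# compare e.g. the conditional fiducial mean of sigma = s0/e2 under each route
m1 = (s0 / e2[keep1]).mean()
m2 = (s0 / e2[keep2]).mean()         # differs from m1: the Borel paradox
```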
13. Non-simple FMs
Let Ex = {e : x and e are compatible}, i.e. x = θ ◦ e for some
(unique) θ, again denoted by θ = x ◦ e⁻¹.
On observing x, we learn the logical information e ∈ Ex .
We should adjust the distribution of e to account for this.
How?
Even if this event has positive probability, naively conditioning on
it can be problematic. We need to consider how our information
has arisen.
14. Example
Mr Smith tells you: “I have two children, who are not twins.” At
this point you regard each of them as equally likely to be a boy (B)
or a girl (G), independently.
He then says: “One of them is a boy”.
Given this information, what is the probability he has two boys?
Argument 1 Initially you assessed 4 equally likely cases: BB, BG,
GB, GG. The new information rules out GG, leaving
3 cases, just one of which is BB. The conditional
probability is thus 1/3.
Argument 2 You might consider that, if he had 2 boys, he would
have said “They are both boys”. The fact that he did
not then implies a conditional probability of 0.
Moral: When conditioning on information, we must take account
of what other information might have been obtained. Otherwise
put, we must know what question the information answers. Was it
the question “Do you have a boy?”, or the question “How many
boys do you have?”.
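A simulation sketch (mine) of the two protocols makes the moral vivid:

```python
# Sketch: the two answers correspond to two announcement protocols.
import numpy as np

rng = np.random.default_rng(6)
kids = rng.integers(0, 2, size=(1_000_000, 2))   # 1 = boy, 0 = girl

# Protocol 1: he answers "Do you have a boy?" (says so whenever there is one)
says1 = kids.any(axis=1)
p1 = kids[says1].all(axis=1).mean()              # ~ 1/3  (Argument 1)

# Protocol 2: he answers "How many boys do you have?" ("one is a boy" then
# means exactly one, since with two he would have said "both are boys")
says2 = kids.sum(axis=1) == 1
p2 = kids[says2].all(axis=1).mean()              # = 0    (Argument 2)
```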
15. Non-simple FM: Conditioning
Any question defines a random variable, with values the possible
answers. These generate a partition of the sample space.
Conditioning requires a partition, not just an event.
On observing x, we learn e ∈ Ex . Is there a question about e that
this answers?
We might have observed another value y ∈ X, so learning e ∈ Ey .
It is only appropriate to condition on the learned information if
{Ex : x ∈ X} forms a partition of E (i.e., Ex and Ey are either
identical or disjoint)—in which case we term the FM partitionable.
There then exist essentially unique functions a(·), u(·) s.t. x and e
are compatible (e ∈ Ex ) iff a(x) = u(e).
Distribution of a(x) does not depend on θ—functional ancillary
(essentially unique!)
The fiducial distribution is now that of x ◦ e⁻¹, where e ∼ P
conditioned on u(e) = a(x)
A non-partitionable FM does not support fiducial inference.
16. Examples
Structural model
X = E, θ ∈ G, a group of transformations on E.
Partition is by orbits of G.
a(·) = u(·) = maximal invariant under group action.
e.g., location-scale model (see the sketch after this slide)
E = X = Rⁿ, θ = (µ, σ), x = θe = µ1 + σe.
a(x) = ((xi − x̄)/sx).
θ = (x̄ − sx ē/se, sx/se)
Fiducial distribution obtained by applying the distribution of
(ē, se) given (ei − ē)/se = (xi − x̄)/sx, all i.
Variation (reduced model) (z = x/sx, ζ = µ/σ)
E = Rⁿ, Z = {z ∈ Rⁿ : sz = 1}, ζ ∈ R.
zi = (ζ + ei)/se. Partitionable, with
u(e) = ((ei − ē)/se), a(z) = (zi − z̄).
Non-partitionable model
x, e, θ real, θ > 0, x = θ + e. Ex = (−∞, x).
Positive probability—but should one condition?
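For the location-scale model with Gaussian error a sampling sketch (mine) is simple: for ei iid N(0, 1), the pair (ē, se) is independent of the configuration (ei − ē)/se, so conditioning on the observed configuration leaves their joint law unchanged; the data vector is illustrative.

```python
# Sketch: fiducial sampling for the Gaussian location-scale model.  Since
# (ebar, s_e) is independent of the configuration (e_i - ebar)/s_e when the
# e_i are iid N(0,1), the conditioning u(e) = a(x) is vacuous for (ebar, s_e).
import numpy as np

rng = np.random.default_rng(7)
x = np.array([4.1, 5.3, 4.8, 6.0, 5.1])      # illustrative data
n, xbar, sx = len(x), x.mean(), x.std(ddof=1)

e = rng.standard_normal((100_000, n))
ebar, se = e.mean(axis=1), e.std(axis=1, ddof=1)
sigma = sx / se                               # fiducial draws of sigma
mu = xbar - sx * ebar / se                    # fiducial draws of mu
```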
17. Simple non-invertible FM, SNIFM
Let Sx,e := {θ : x = θ ◦ e}. Now allow #{Sx,e} > 1.
Suppose first Sx,e is never empty (x, e always compatible).
Fiducial principle: After observing x, no new logical information
about e. So still take e ∼ P.
Now Sx,e becomes a random set, Sx .
We can use this to define belief and plausibility functions for θ
(Dempster-Shafer):
Belx(θ ∈ A) = P({e : Sx,e ⊆ A})
Plx(θ ∈ A) = P({e : Sx,e ∩ A ≠ ∅})
Recent variations:
Randomly select some θ ∈ Sx,e (Hannig)
Use a belief function Bel*x ≤ Belx (Martin, Zhang, Liu)
Is functional model essential/used (cf. Wilkinson)?
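A concrete SNIFM instance (my illustration, anticipating the example two slides on): take the data to be the binomial count r = #{i : ei ≤ θ} with ei iid U[0, 1]. Then every e is compatible and Sr,e = [e_(r), e_(r+1)), a random interval, from which Bel and Pl follow by Monte Carlo.

```python
# Sketch: Dempster-Shafer Bel/Pl for the SNIFM whose data are the binomial
# count r = #{i: e_i <= theta}, e_i iid U[0,1].  The random set is
# S = [e_(r), e_(r+1)), with e_(0) := 0 and e_(n+1) := 1.
import numpy as np

rng = np.random.default_rng(8)
n, r, t = 10, 3, 0.5                    # illustrative count r; event theta <= t
e = np.sort(rng.uniform(size=(100_000, n)), axis=1)
lo = e[:, r - 1] if r > 0 else np.zeros(len(e))   # e_(r)
hi = e[:, r] if r < n else np.ones(len(e))        # e_(r+1)

bel = (hi <= t).mean()   # Bel(theta <= t) = P(S subset of [0, t])
pl = (lo <= t).mean()    # Pl(theta <= t)  = P(S meets [0, t])
```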
18. General NIFM
Now allow Sx,e = ∅
On observing x, the logical information about e that we learn is
e ∈ Ex := {e : Sx,e ≠ ∅}.
We should condition on this, but again subject to the
partitionability requirement that, for all x, x′ ∈ X, Ex and Ex′ are
either identical or disjoint.
Again, this holds iff ∃ functions a(x), u(e) such that e ∈ Ex iff
u(e) = a(x).
Let Pa be distribution of e given u(e) = a. On observing x, take
e ∼ Pa(x), and use this to define the distribution of the random set
Sx,e (and so Belx , Plx ).
19. Example
xi = 1(ei ≤ θ), i = 1, . . . , n
(if ei are iid U[0, 1], we have n independent Bernoulli(θ) variables)
On observing x, we learn about e:
e ∈ Ex := {e : ei < ej whenever xi = 1, xj = 0}. (*)
Then θ lies in Sx,e := [e_(r), e_(r+1)) (r = Σi xi).
This is a random set — but based on what distribution for e?
We should be conditioning on (∗). How? Is this permissible?
Only if Ex1 , Ex2 are either identical or disjoint.
But this is not so: e.g., for x = (1, . . . , 1) we get Ex = E, which is
not disjoint from (or identical with) any other Ex.
20. Concluding comments
Fiducial inference may never be a fully self-consistent system
Functional models are useful for delimiting what can logically
be said
We need partitionability to account properly for the logical
information in the data
Even when fiducial inference is not valid of itself, it may be
useful for other purposes (e.g., confidence properties)
THANK YOU!
21. References
Dawid, A. P. and Stone, M. (1982). The functional-model basis of
fiducial inference (with Discussion). Ann. Statist. 10, 1054–1074.
Dempster, A. P. (1963). Further examples of inconsistencies in the
fiducial argument. Ann. Math. Statist. 34, 884–891.
Dempster, A. P. (2008). The Dempster-Shafer calculus for
statisticians. Internat. J. Approx. Reason. 48, 365–377.
Fisher, R. A. (1930). Inverse probability. Math. Proc. Cambridge
Philos. Soc. 26, 528–535.
Fraser, D. A. S. (1968). The Structure of Inference. Wiley, New
York.
Hacking, I. (1965). Logic of Statistical Inference. Cambridge
University Press.
22. References (cont.)
Hannig, J. (2009). On generalized fiducial inference. Statistica
Sinica 19, 491–544.
Lindley, D. V. (1958). Fiducial distributions and Bayes’ theorem.
J. Roy. Statist. Soc. B 20, 102–107.
Martin, R., Zhang, J. and Liu, C. (2010). Dempster-Shafer theory
and statistical inference with weak beliefs. Statistical Science 25,
72–87.
Mauldon, J. G. (1955). Pivotal quantities for Wishart’s and related
distributions, and a paradox in fiducial theory. J. Roy. Statist. Soc.
B 17, 79–85.
Wilkinson, G. N. (1977). On resolving the controversy in statistical
inference (with Discussion). J. Roy. Statist. Soc. B 39, 119–171.