“Putting the Brakes on the Breakthrough, or ‘How I used simple logic to uncover a flaw in a controversial 50-year-old ‘theorem’ in statistical foundations taken as a ‘breakthrough’ in favor of Bayesian vs frequentist error statistics’”
 Example 1: Trying and Trying Again: Optional stopping
 Example 2: Two instruments with different precisions
(you shouldn’t get credit/blame for something you didn’t do)
 The Breakthrough: Birnbaumization
 Imaginary dialogue with Allan Birnbaum
Example 1: Trying and Trying Again: Optional stopping
One of the problems with “replication” these days is that researchers
“try and try again” to get an impressive effect.
Researchers might have an effect they want to demonstrate and they
plan to keep sampling until the data differ sufficiently from a “no
effect” null hypothesis (e.g., mean concentration of a toxin).
Keep sampling until H0 is rejected at the .05 level
Ex. Random sample from a Normal distribution:
Yi ~ N(µ, σ²), testing H0: µ = 0 vs. H1: µ ≠ 0,
i.e., keep sampling until the observed mean differs from 0 by ≥ 2
standard deviation units (nominal p-value of .05)
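In symbols, a sketch of the stopping rule as I read it from the slide (assuming the standard deviation σ is known and using the usual 1.96 ≈ 2 cutoff for a two-sided .05 test):

```latex
z_n \;=\; \frac{\sqrt{n}\,\bar{Y}_n}{\sigma},
\qquad
\text{stop at the first } n \text{ with } |z_n| \ge 1.96
\;\;(\text{the nominal two-sided .05 boundary}).
```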
After 10 trials, failing to rack up enough successes, they go to 20
trials; failing yet again, they go to 30 trials, then 10 more, until
finally, say on trial 169, they attain a statistically significant result.
Stopping rule: rule for when to stop sampling
Optional stopping vs fixed sample size
y*: outcome in E2, optional stopping (stops with n = 169)
x*: outcome in fixed sample size experiment E1 (n = 169)
Comparing y* and x* the evidential assessment seems different
For a legitimate .05 assessment:
Pr(P-value < .05; H0) = .05
But with optional stopping, the probability of erroneously rejecting
the null accumulates
Pr(reach a P-value < .05 in optional stopping by n = 169; H0) ≈ .55
No longer improbable under the null, say ~.55
[This is a proper stopping rule: it will stop in finitely many trials, and
it is assured of rejecting the null, even if the null is true.]
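Here is a minimal simulation sketch of that accumulation (my own illustration, not from the slides): it assumes σ = 1, a 1.96 ≈ 2 s.d. cutoff, and a look after every new observation up to n = 169. The exact cumulative rate depends on the looking schedule, but it comes out far above the nominal .05.

```python
import numpy as np

rng = np.random.default_rng(0)

def optional_stopping_rejects(n_max=169, z_crit=1.96):
    """One optional-stopping run under H0 (mu = 0, sigma = 1 assumed).
    Look after every observation; True if any look gives |z_n| >= z_crit."""
    y = rng.standard_normal(n_max)                        # data generated under the null
    z = np.cumsum(y) / np.sqrt(np.arange(1, n_max + 1))   # z_n = sqrt(n) * ybar_n (sigma = 1)
    return bool(np.any(np.abs(z) >= z_crit))

reps = 10_000
rate = sum(optional_stopping_rejects() for _ in range(reps)) / reps
print(f"P(some look up to n = 169 reaches nominal p < .05; H0) ~ {rate:.2f}")  # far above .05
```

With a fixed n = 169 and a single look, the same cutoff rejects with probability .05 under the null.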
Both experiments seek to make inferences about the same parameter
µ, and the statistical model is given:
Yi ~ N(µ, σ²), testing H0: µ = 0 vs. H1: µ ≠ 0.
Throughout this discussion:
E2, optional stopping: y* (stops with a 2 s.d. difference at n = 169)
E1, fixed sample size: x* (finds a 2 s.d. difference with n fixed at 169)
 These two outcomes x* and y* have proportional likelihoods:
 The data can be viewed as a string ⟨z1, z2, z3, …, z169⟩
 Regardless of whether it came from E1 or E2,
Pr(y*; H0) = c Pr(x*; H0), c some constant
 Their likelihoods are proportional over all values of the parameter (a worked sketch follows below).
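A worked sketch of why, for the Normal case (σ taken as known): the likelihood of the string is the same function of µ whichever experiment produced it, since

```latex
L(\mu;\, z_1,\dots,z_{169})
  \;\propto\; \prod_{i=1}^{169} \exp\!\Big(-\frac{(z_i-\mu)^2}{2\sigma^2}\Big)
  \;=\; \exp\!\Big(-\frac{169\,(\bar z-\mu)^2}{2\sigma^2}\Big)\; h(z),
\qquad
h(z) \;=\; \exp\!\Big(-\frac{\sum_{i=1}^{169}(z_i-\bar z)^2}{2\sigma^2}\Big).
```

The stopping rule enters only through factors that depend on the data, not on µ, so Pr(y*; µ) = c Pr(x*; µ) for every value of µ.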
Likelihood principle (LP): If likelihoods are proportional,
Pr(y*; H0) = c Pr(x*; H0) for some constant c, then inferences from the two should be
the same.
Because there are so many “givens” in the LP, I adopt a shorthand:
Abbreviations:
LP-pair: Whenever x* and y* from different experiments E1 and E2
have proportional likelihoods over the parameter, I call them LP pairs.
(The * indicates they are pairs.)
Infr[E,z]: The inference from outcome z from experiment E
Likelihood Principle (LP). If y* and x* are LP pairs, then
Infr[E1,x*] = Infr[E2,y*]
The LP is the controversial principle that I’m discussing.
The controversial discussion stems from statistician Allan Birnbaum
(1962).
He wanted to consider general principles of “informative” inference
(Infr), over all methods and schools: inferences about a parameter θ
within a given statistical model.
Go back to our example:
Example 1: Trying and Trying Again: Optional stopping
The likelihood principle emphasized in Bayesian statistics
implies, … that the rules governing when data collection stops
are irrelevant to data interpretation. (Edwards, Lindman,
Savage, 1963, p. 239).
Savage: The LP restores “simplicity and freedom that had been
lost” with frequentist methods: “optional stopping is no sin”
The LP follows if inference is by way of Bayes's theorem.
Bayesians call this the Stopping Rule Principle (SRP).
Contemporary nonsubjective Bayesians concede they “have to live with some violations of the likelihood and stopping rule principles” (Ghosh, Delampady, and Samanta 2006, 148), since their prior probability distributions are influenced by the sampling distribution. “[O]bjectivity can only be defined relative to a frame of reference, and this frame needs to include the goal of the analysis.” (J. Berger 2006, 394). By contrast, Savage stressed:
In calculating [the posterior], our inference about θ, the only contribution of the data is through the likelihood function…. In particular, if we have two pieces of data x* and y* with [proportional] likelihood function…, the inferences about θ from the two data sets should be the same. This is not usually true in the orthodox [frequentist] theory and its falsity in that theory is an example of its incoherence. (Lindley 1976, p. 36)
LP violation: x* and y* form an LP pair, but
Infr[E1,x*] ≠ Infr[E2,y*]
The inference from x* from E1 differs from the inference from y*
from E2.
While p1 would be .05, p2 would be ~.55.
Note: Birnbaum’s argument is meant to hold for any form of inference:
p-values, confidence levels, posteriors, likelihood ratios. I’m using
p-values for simplicity and since they are used in the initial
argument.
(Frequentist) Error Statistical Methods
Some frequentists argue that the optional stopping example alone is
enough to refute the strong likelihood principle (Cox 1977, p. 54),
since, with probability 1, it will stop with a “nominally”
significant result even though µ = 0.
It violates the principle that we should avoid methods that give
misleading inferences with high or maximal probability (weak
repeated sampling principle).
The error statistical philosophy is in a Popperian spirit:
Discrepancies from a null are “too cheap to be worth
having”; they only count if you would not readily
(probably) have gotten a 2 s.d. difference even if the null is true.
These probabilistic properties of inference procedures are error
frequencies or error probabilities.
Ignoring the stopping rule means failing to control and appraise
the probativeness or severity of tests.
The same goes for multiple testing, cherry picking, and the like.
But someone could say they just don’t care about error probabilities.
But I’m not here to convince you of the error statistical
account that violates the LP!
I only want to talk about a supposed proof, one that seems to
follow from principles we frequentist error statisticians
accept.
Birnbaum seemed to show the frequentist couldn’t
consistently allow these LP violations.
The Birnbaum result was heralded as a breakthrough in statistics!
Savage:
Without any intent to speak with exaggeration it seems to me
that this is really a historic occasion. This paper is a landmark
in statistics … I myself, like other Bayesian statisticians, have
been convinced of the truth of the likelihood principle for a long
time. Its consequences for statistics are very great.
… I can’t stop without saying once more that this paper is
really momentous in the history of statistics. It would be hard to
point to even a handful of comparable events. (Savage, 1962).
All error statistical notions (p-values, significance levels, …)
violate the likelihood principle (ibid.).
(Savage doubted people would stop at the halfway house; he expected
they’d go on to full-blown Bayesianism. Birnbaum never did.)
I have no doubt that revealing the flaw in the alleged proof will not
be greeted with anything like the same recognition (Mayo, 2010).
Sufficiency or Weak LP
Note: If there were only one experiment, then outcomes with
proportional likelihoods are evidentially equivalent:
If y1* and y2* are LP pairs within a single experiment E, then
Infr[E,y1*] = Infr[E,y2*]
Also called sufficiency principle or weak LP.
(Birnbaum will want to make use of this…)
All that was Example 1.
Example 2: Conditionality (CP): You should not get credit (or
be blamed) for something you don’t deserve
Mixture: Cox (1958) Two instruments of different precisions
 We flip a fair coin to decide whether to use a very precise or
imprecise tool: E1 or E2.
 The experiment has 2 steps: first flip the coin to see whether E1
or E2 is performed; then perform it and record the outcome.
Conditionality Principle CP
 CP: Once it is known which Ei produced z, the p-value or other
inferential assessment should be made conditioning on the
experiment actually run.
 This would not be so if the randomizer (the coin flip) were
relevant to the parameter θ; the CP is always applied with a
θ-irrelevant randomizer.
 Example: in observing a normally distributed Z to test a null
hypothesis µ = 0, E1 has variance 1, while that of E2 is 10⁶.
The same z measurement would correspond to a much smaller p-value
if it came from E1 rather than from E2: p1 << p2.
Unconditional or convex combination
What if someone reported the precision of the experiment as the
average of the two:
[p1 + p2]/2
This unconditional assessment is also called the convex
combination of the p-values, averaged over the two experiments.
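A minimal numerical sketch of the contrast (my own illustration: a single observation z, a two-sided test of µ = 0, σ = 1 for E1 and σ = 1000, i.e. variance 10⁶, for E2; the observed z = 3.0 is made up for illustration):

```python
from scipy.stats import norm

def p_two_sided(z, sigma):
    """Two-sided p-value for a single observation z under H0: mu = 0, known sigma."""
    return 2 * norm.sf(abs(z) / sigma)

z_obs = 3.0                            # the recorded measurement, whichever tool was used
p1 = p_two_sided(z_obs, sigma=1)       # conditional on the precise instrument E1
p2 = p_two_sided(z_obs, sigma=1000)    # conditional on the imprecise instrument E2 (variance 10**6)

print(f"p1 (condition on E1)        = {p1:.4f}")   # very small
print(f"p2 (condition on E2)        = {p2:.4f}")   # close to 1
print(f"unconditional (p1 + p2) / 2 = {(p1 + p2) / 2:.4f}")
```

The CP says to report p1 or p2 according to which instrument was actually used; the convex combination averages in an experiment that was not performed.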
Suppose we know we observed E2 with its much larger variance:
The unconditional test says that we can assign this a higher level
of significance than we ordinarily do, because if we were to
repeat the experiment, we might sample some quite different
distribution. But this fact seems irrelevant to the interpretation
of an observation which we know came from a distribution
[with the larger variance] (Cox 1958, 361).
[The measurement using the imprecise tool shouldn't get
credit merely because the measurer might have used a precise tool;
the precise guy is blamed a bit because he could have used the
lousy imprecise tool.]
CP: Once it is known which Ei has produced z, condition on the Ei
that produced the result.
Do not use the unconditional formulation.
It’s obvious.
If it’s so obvious why was Cox’s paper considered one of the most
important criticisms of (certain) frequentist methods?
CP is not a mathematical principle, but one thought to reflect
intuitions about evidence.
There are contexts (not so blatant) where the frequentist would (in
effect) report the unconditional assessment, e.g., if there were going
to be many applications and you were reporting average precision
or the like.
The example only arose because Cox wanted to emphasize that in
using frequentist methods for inference (as opposed to long-run
decisions) you should condition if the randomizer is irrelevant to
the parameter of interest.
The surprising upshot of the controversial argument I will consider
is that it claims to show that if you condition on the randomizer, you have to
go all the way down to conditioning on the data point, which means no error probabilities
(the LP).
That’s what Birnbaum’s result purports: CP entails LP!
It is not uncommon to see statistics texts argue that in
frequentist theory one is faced with the following dilemma:
either to deny the appropriateness of conditioning on the
precision of the tool chosen by the toss of a coin, or else to
embrace the strong likelihood principle, which entails that
frequentist sampling distributions are irrelevant to inference
once the data are obtained. This is a false dilemma. . . . The
‘dilemma’ argument is therefore an illusion. (Cox and Mayo
2010, 298)
But the illusion is not so easy to dispel; thus this paper.
NOW FOR THE BREAKTHROUGH
Shorthand: x* = (E1,x*); y* = (E2,y*)
LP: If x* and y* are LP pairs, then Infr[E1,x*] = Infr[E2,y*]
Whenever x* and y* form an LP pair but Infr[E1,x*] ≠ Infr[E2,y*], we
have a violation of the Likelihood Principle (LP).
We saw in Example 1:
y* (a 2 s.d. result) in E2, optional stopping (stops with n = 169), had a p-value ~.55
x* (a 2 s.d. result) in E1, fixed n = 169, had a p-value = .05
The two outcomes form an LP pair, so we have an LP violation.
(Every use of error probabilities is an LP violation.)
Birnbaumization
Step 1: Birnbaum Experiment EB:
Birnbaum will describe a funny kind of ‘mixture’ experiment
based on an LP pair;
You observed y* from experiment E2 (optional stopping)
 Enlarge it to an imaginary mixture: I am to imagine y* resulted
from performing E2 after getting tails on the toss of a fair coin, where heads
would have meant performing the fixed sample size experiment
with n = 169 from the start.
 Erasure: Next, erase the fact that y* came from E2 and report it
as if it came from (E1,x*)
 Report the convex combination
o [p1 + p2]/2
p1 = .05 (E1, fixed sample size, n = 169)
p2 = .55 (E2, optional stopping, stops at n = 169)
So the average is ~.3
(the ½ comes from the imagined fair coin)
I said it was a funny kind of mixture; a few things make it funny:
 It didn’t happen, you only observed y* from E2
 You are to report an outcome as x* from E1 even though you
actually observed y* from E2
 The inference is based on the unconditional mixture
We may call it Birnbaumizing the result you got.
You are to do it for any LP pair; we’re focusing on just these two.
Having thus “Birnbaumized” the particular LP pair that you
actually observed, it appears that you must treat x* as evidentially
equivalent to its LP pair, y* (after all: you report y* as x*)
Premise 1: Inferences from x* and y*, within EB using the convex
combination, are equivalent (Birnbaumization):
Infr[x*] = Infr[y*]; both are equal to (p1 + p2)/2
Premise 2 (a): From CP: An inference from y* conditioning on E2
(the experiment that produced it), is p2
Infr[y*] = p2
Premise 2 (b): An inference from x* conditioning on E1 is p1
Infr[x*] = p1
Conclusion: p1 = p2
So it appears that if we hold the CP we cannot have the LP violation.
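To make the structure explicit, here is a schematic sketch in my own shorthand (not Birnbaum's notation): premise 1 treats the inference as the Birnbaumized (unconditional) report, premise 2 as the conditional report, using the Example 1 p-values.

```python
p1, p2 = 0.05, 0.55          # Example 1: fixed-n E1 and optional-stopping E2 p-values

def infer_birnbaumized(outcome):
    """Premise 1: within EB, x* and y* both get the unconditional (averaged) report."""
    return (p1 + p2) / 2     # the convex combination, ~0.30

def infer_conditional(outcome):
    """Premise 2 (CP): condition on the experiment that actually produced the outcome."""
    return p1 if outcome == "x*" else p2

for outcome in ("x*", "y*"):
    print(outcome, infer_birnbaumized(outcome), infer_conditional(outcome))
# Premise 1 equates the two reports (0.30 and 0.30); premise 2 gives .05 and .55.
# When p1 != p2 the two premises cannot describe one and the same inference rule.
```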
But x* and y* form an LP violation, so p1 ≠ p2.
Thus it would appear the frequentist is led into a contradiction.
He will do the same for any x* , y* LP pair: generally,
Infr[y*] = Infr[x*]
(i.e., no LP violations are possible)
One way to formulate the argument is this:
Premise 1: If we use Birnbaumization (EB):
Infr[x*] = Infr[y*]; both are equal to (p1 + p2)/2
Premise 2 (a): If we use conditionality (CP):
Infr[y*] = p2
Premise 2 (b): Infr[x*] = p1
He wants to infer: p1 = p2
But we can have p1 ≠ p2:
whenever we have an LP violation, p1 ≠ p2.
You can uphold the two principles, but not apply them in such
a way as to violate self-consistency.
Construed this way, it’s an invalid argument; formulated as a
valid argument, it will not be sound.
Instead of continuing with the formal paper, let me sketch an
exchange I post on my blog each New Year’s Eve, one I imagine as
“Midnight With Birnbaum,” akin to the movie Midnight in Paris.
Just as that writer gets to go back in time to get appraisals from
famous past writers, I imagine checking my criticism of Birnbaum
with him.
Just to zero in on the main parts…
ERROR STATISTICAL PHILOSOPHER: It’s wonderful to meet
you Professor Birnbaum….I happen to be writing on your
famous argument about the likelihood principle (LP).
BIRNBAUM: Ultimately you know I rejected the LP as failing to
control error probabilities (except in the trivial case of testing
point against point hypotheses, provided they are
predesignated).
ES PHILOSOPHER: I suspected you found the flaw in your
argument…? It doesn’t seem to work…
BIRNBAUM: Well, I shall happily invite you to take any case that
violates the LP and allow me to demonstrate that the
frequentist is led to inconsistency, provided she also wishes to
adhere to the Conditionality Principle CP
Whoever wins this little argument pays for this whole bottle of
vintage champagne.
ES PHILOSOPHER: I really don’t mind paying for the bottle.
BIRNBAUM: Good, you will have to. Take any LP violation.
In two-sided Normal testing, take y* from optional stopping E2 (stops
with a 2 s.d. difference at n = 169).
I will show you that y* must be inferentially equivalent to
x* from the corresponding fixed sample size experiment E1 (n = 169).
i.e. Infr[x*] = Infr[y*]
(We assume both experiments seek to make inferences about the
same parameter µ and the model is given.)
Do you agree that, for a frequentist, y* is not evidentially
equivalent to x*?
ES PHILOSOPHER: Yes, that’s a clear case where we reject the
LP, and it makes perfect sense to distinguish their
corresponding p-values: the searching optional stopping
experiment makes the p-value quite a bit higher than with the
fixed sample size.
For optional stopping p2 = .55
For fixed p1 = .05
I don’t see how you can make them equal.
BIRNBAUM: Well, I’ll show you’ll have to renounce another
principle you hold, the CP.
ES PHILOSOPHER: Laughs.
BIRNBAUM: Suppose you’ve observed y* from E2. You admit,
do you not, that this outcome could have occurred as a result of
a different experiment? It could have been that a fair coin was
flipped where it is agreed that heads instructs you to perform
E1 and tails instructs you to perform the optional stopping test
E2, and you happened to get tails and managed to stop at 169.
A mixture experiment.
ES PHILOSOPHER: Well, that is not how I got y*, but ok, it
could have occurred that way.
BIRNBAUM: Good. Then you must grant further that your result
could have come from a special variation on this mixture that I
have dreamt up, call it a Birnbaum experiment EB.
In an EB, if the experiment you actually performed has an outcome
with a likelihood proportional to one in some other experiment not
performed, E1, then we say
that your result has an “LP pair”. In that case, EB stipulates
that you are to report y* as if you had determined whether to
run E1 or E2 by flipping a fair coin (given as irrelevant to the
parameter of interest).
ES PHILOSOPHER: So let’s see if I understand a Birnbaum
experiment EB. I run E2, and when I stop I report that my observed
2 s.d. difference came from E1, as the result of this strange
type of mixture experiment.
BIRNBAUM: Yes, or equivalently you could just report
y*: my result is a 2 s.d. difference and it could have come from
either E1 (fixed sampling, n = 169) or E2 (optional stopping,
which happens to stop at the 169th trial). The proof is often put
this way; it makes no difference.
ES PHILOSOPHER: You’re saying if my result has an LP pair in
the experiment not performed, I should act as if I accept the LP
and so if the likelihoods are proportional in the two
experiments (both testing the same mean), the outcomes yield
the same inference.
BIRNBAUM: Well, wait: experiment EB, as an imagined mixture, is
a single experiment, so really you only need to apply the
sufficiency principle (the weak LP), which everyone
accepts. Yes?
ES PHILOSOPHER: So Birnbaumization has turned the two
experiments into one. But how do I calculate the p-value within
your Birnbaumized experiment?
BIRNBAUM: I don’t think anyone has ever called it that.
ES PHILOSOPHER: I just wanted to have a shorthand for the
operation you are describing, … So how do I calculate the p-
value within EB?
BIRNBAUM: You would report the overall (unconditional) p-
value, which would be the average over the sampling
distributions: (p1+ p2)/2.
(The convex combination.) Say p1 is .05 and p2 is ~.55;
anyway, we know they are different; that’s what makes this a
violation of the LP.
ES PHILOSOPHER: So you’re saying that if I observe a 2-s.d.
difference from E2, I do not report the associated p-value .55,
but instead I am to report the average p-value, averaging over
some other experiment E1 that could have given rise to an
outcome with a proportional likelihood to the one I observed,
even though I didn’t obtain it this way?
BIRNBAUM: I’m saying that you have to grant that y* from an
optional stopping experiment E2 could have been generated
through Birnbaum experiment EB.
If you follow the rules of EB, then y* is evidentially equivalent
to x*. This is premise 1.
Premise 1: Within EB,
the inference from y* = the inference from x*; both are
equal to (p1 + p2)/2
(defn of EB and sufficiency)
ES PHILOSOPHER: But this is just a matter of your definition of
inference within your Birnbaumized experiment EB . We still
have p1 different from p2 so I don’t see the contradiction with
my rejecting the LP.
BIRNBAUM: Hold on; I’m about to get to the second premise,
premise 2. So far all of this was premise 1.
ES PHILOSOPHER: Oy, what is premise 2?
BIRNBAUM: Premise 2 is this: Surely, you agree (following CP),
that once you know from which experiment the observed 2 s.d.
difference actually came, you ought to report the p-value
corresponding to that experiment. You ought not to report the
average (p1 + p2)/2 as you were instructed in EB.
This gives us:
Premise 2 (a): From CP:
The inference from y* is p2
ES PHILOSOPHER: So, having first insisted that I imagine myself in
a Birnbaumized experiment EB and report an average p-value, I’m to
return to my senses and “condition” to get back to where I was
to begin with?
BIRNBAUM: Yes, at least if you hold to the conditionality
principle CP (of D. R. Cox)
Likewise, if you knew it came from E1, then the p-value is
Premise 2 (b): From CP:
The inference from x* is p1
So you arrive at (2a) and (2b), yes? Surely you accept Cox’s CP; I
know you’ve been working with him.
ES PHILOSOPHER: Well, the CP is defined for actual mixtures,
where one flips a coin to determine if E1 or E2 is performed.
One cannot perform the following: Toss a fair coin. If it
lands heads, perform an experiment E1 that yields a
member of an LP pair x*, if tails observe E2 that yields y*.
We do not know what would have resulted from the
unperformed experiment, much less that it would form an
LP pair with the observed y*. There is a single experiment,
and CP stipulates that we know which was performed and
what its outcome was.
BIRNBAUM: Wasn’t I clear that I’m dealing with hypothetical or
mathematical mixtures (rather than actual ones)?
Mathematical or Hypothetical Mixtures
Envision the mathematical universe of LP pairs, each imagined
to have been generated from a θ-irrelevant mixture (for the
inference context at hand). When we observe y*, we pluck the
x* companion needed for the argument. If the outcome doesn’t
have an LP pair, just proceed as normal; I’m only out to show
that
If x* and y* are LP pairs,
then the inference from x* = the inference from y*.
That's the LP, right?
A lot of people—including some brilliant statisticians—have
tried to block my argument because it deals in mathematical
mixtures, but I say there’s no clear way to distinguish actual (or
performable) mixtures from mathematical ones.
ES PHILOSOPHER: I want to give your argument maximal
mileage (it’s been around for 50 years). I will allow that CP is
definable for hypothetical θ-irrelevant mixtures out there, and
that we can Birnbaumize an experimental result. I cannot even
determine what the LP pair would need to be until after I’ve
observed y*, but I guess I can pluck it down from the universe
of LP pairs as needed.
BIRNBAUM: Exactly, then premises (1), (2a) and (2b) yield the
strong LP!
ES PHILOSOPHER: Clever, but your “proof” is obviously
unsound. Let me try to put it in valid form:
Premise 1: In EB, the inference from y* = the inference from x*
[both are equal to (p1 + p2)/2]
Premise 2 (a): from CP (conditionality)
In EB (given y*), the inference from y* = p2
Premise 2 (b): from CP:
In EB , (given x*), the inference from x* = p1
Conclusion: The inference from y* = the inference from x*.
( p1= p2)
The premises about EB can’t both be true, so the argument is unsound.
CP says the inference from the known result y* should not be the
average over experiments not performed: in other words, it
instructs me not to Birnbaumize, as premise 1 requires.
If Premise 1 is true, 2a and b are false (unless it happens that CP
makes no difference and there’s no LP violation).
 Keeping to this valid formulation, the premises can hold true
only in a special case: when there is no LP violation.
 With LP violations there are three different sampling distributions: E1, E2, EB.
 To show the LP holds in general you’d have to start out assuming
there can be no LP violations, which would make it circular.
 It certainly doesn’t show the frequentist is open to
contradiction (by dint of accepting CP while denying the LP).
 Alternatively, (as shown earlier) I can get the premises to come
out true, but then the conclusion is false (for any LP
violation)—so it is invalid.
Every LP violation will be a case where the argument is unsound
(either invalid or premises conflict).
Invalid
Premise 1: If we use Birnbaumization (EB):
Infr[x*] = Infr[y*]; both are equal to (p1 + p2)/2
Premise 2 (a): If we use conditionality (CP):
Infr[y*] = p2
Premise 2 (b):
Infr[x*] = p1
Conclusion: p1 = p2
Putting the premises as “if-thens,” they can both be true while we can
still have p1 ≠ p2.
BIRNBAUM: I later came to see that merely blocking the LP
isn’t enough: “Durbin’s formulation (CP′), although weaker
than CP, is nevertheless too strong (implies too much of the
content of LP to be compatible with standard (non-
Bayesian) statistical concepts and techniques).” Same for
Kalbfleisch and Evans’ recent treatment. In fact you still get
that the stopping rule is irrelevant.
ES PHILOSOPHER: My argument shows that any violation of
LP in frequentist sampling theory necessarily results in an
illicit substitution in the argument; it is in no danger of
implying “too much” of the LP: what was an LP violation
remains one.
BIRNBAUM: Yet some people still think it is a breakthrough!
Why has it escaped detection? I think in the case of LP pairs, it is
easy to fall into an equivocation:
(I) Given the measurement (z*), it doesn’t matter if it came
from E1 or E2 (since it’s a member of an LP pair)
(II) Given y* is known to have come from E2, it doesn’t
matter if E2 was the result of an irrelevant randomizer
(tossing a coin) to choose between E1 and E2, or E2 was
fixed all along.
The first, (I), corresponds to Birnbaumization; the second, (II), to
CP.
The clue was that LP is shown logically equivalent to CP. So I
knew a proof from CP to LP was circular, but it was hard to
demonstrate.
 Example 1: Trying and Trying Again: Optional stopping
 Example 2: Two instruments with different precisions
(you shouldn’t get credit/blame for something you didn’t do)
 The Breakthrough: Birnbaumization
 Imaginary dialogue with Allan Birnbaum
REFERENCES
Armitage, P. (1975). Sequential Medical Trials, 2nd ed. New York: John Wiley & Sons.
Birnbaum, A. (1962). On the Foundations of Statistical Inference (with discussion),
Journal of the American Statistical Association, 57: 269–326.
Birnbaum, A. (1969). Concepts of Statistical Evidence. In Philosophy, Science, and
Method: Essays in Honor of Ernest Nagel, edited by S. Morgenbesser, P. Suppes,
and M. White, New York: St. Martin’s Press: 112-143.
Berger, J. O., and Wolpert, R. L. (1988). The Likelihood Principle, Institute of
Mathematical Statistics, Hayward, CA.
Cox, D.R. (1977). “The Role of Significance Tests (with Discussion),” Scandinavian
Journal of Statistics, 4: 49–70.
Cox D. R. and Mayo. D. (2010). "Objectivity and Conditionality in Frequentist
Inference" in Error and Inference: Recent Exchanges on Experimental Reasoning,
Reliability and the Objectivity and Rationality of Science, edited by D Mayo and A.
Spanos, Cambridge: Cambridge University Press: 276-304.
Edwards, W., Lindman, H., and Savage, L. (1963). Bayesian Statistical Inference for
Psychological Research, Psychological Review, 70: 193-242.
Jaynes, E. T. (1976). Common Sense as an Interface. In Foundations of Probability
Theory, Statistical Inference and Statistical Theories of Science, Volume 2, edited by
W. L. Harper and C. A. Hooker, Dordrecht, The Netherlands: D. Reidel: 218-257.
Joshi, V. M. (1976). “A Note on Birnbaum’s Theory of the Likelihood Principle.”
Journal of the American Statistical Association 71, 345-346.
Joshi, V. M. (1990). “Fallacy in the Proof of Birnbaum’s Theorem.” Journal of
Statistical Planning and Inference 26, 111-112.
Lindley, D. V. (1976). Bayesian Statistics. In Foundations of Probability Theory,
Statistical Inference and Statistical Theories of Science, Volume 2, edited by W. L.
Harper and C. A. Hooker, Dordrecht, The Netherlands: D. Reidel: 353-362.
Mayo, D. (1996). Error and the Growth of Experimental Knowledge. The University of
Chicago Press (Series in Conceptual Foundations of Science).
Mayo, D. (2010). "An Error in the Argument from Conditionality and Sufficiency to the
Likelihood Principle." In Error and Inference: Recent Exchanges on Experimental
Reasoning, Reliability and the Objectivity and Rationality of Science, edited by D.
Mayo and A. Spanos, Cambridge University Press. 305-314.
Mayo, D. and D. R. Cox. (2011) “Statistical Scientist Meets a Philosopher of Science: A
Conversation with Sir David Cox.” Rationality, Markets and Morals (RMM):
Studies at the Intersection of Philosophy and Economics. Edited by M. Albert, H.
Kliemt and B. Lahno. An open access journal published by the Frankfurt School:
Verlag. Volume 2, (2011), 103-114. http://www.rmm-journal.de/htdocs/st01.html
Mayo D. and A. Spanos, eds. (2010). Error and Inference: Recent Exchanges on
Experimental Reasoning, Reliability and the Objectivity and Rationality of Science,
Cambridge: Cambridge University Press.
Pratt, J. W., H. Raiffa, and R. Schlaifer (1995). Introduction to Statistical Decision
Theory. Cambridge, MA: The MIT Press.
Savage, L., ed. (1962a), The Foundations of Statistical Inference: A Discussion. London:
Methuen & Co.
Savage, L. (1962b). “Discussion on Birnbaum (1962),” Journal of the American
Statistical Association, 57: 307–8.
Biting mechanism of poisonous snakes.pdf
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 
Science lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lessonScience lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lesson
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 

D. Mayo: Putting the brakes on the breakthrough: An informal look at the argument for the Likelihood Principle

Mayo: May 2
9
Go back to our example:
Example 1: Trying and Trying Again: Optional stopping
The likelihood principle emphasized in Bayesian statistics implies, …
that the rules governing when data collection stops are irrelevant to
data interpretation. (Edwards, Lindman, Savage, 1963, p. 239)
Savage: The LP restores “simplicity and freedom that had been lost”
with frequentist methods: “optional stopping is no sin”
The LP follows if inference is by way of Bayes’s theorem.
Bayesians call this the Stopping Rule Principle (SRP).
Mayo: May 2
10
Contemporary nonsubjective Bayesians concede they “have to live with
some violations of the likelihood and stopping rule principles” (Ghosh,
Delampady, and Samanta 2006, 148), since their prior probability
distributions are influenced by the sampling distribution.
“…[O]bjectivity can only be defined relative to a frame of reference,
and this frame needs to include the goal of the analysis.” (J. Berger
2006, 394)
By contrast, Savage stressed:
In calculating [the posterior], our inference about θ, the only
contribution of the data is through the likelihood function…. In
particular, if we have two pieces of data x* and y* with [proportional]
likelihood function…. the inferences about θ from the two data sets
should be the same. This is not usually true in the orthodox
[frequentist] theory and its falsity in that theory is an example of
its incoherence. (Lindley 1976, p. 36)
Mayo: May 2
11
LP violation: x* and y* form an LP pair, but Infr[E1,x*] ≠ Infr[E2,y*]
The inference from x* from E1 differs from the inference from y* from E2.
While p1 would be .05, p2 would be ~.55.
Note: Birnbaum’s argument is to hold for any form of inference:
p-values, confidence levels, posteriors, likelihood ratios. I’m using
p-values for simplicity and since they are used in the initial argument.
Mayo: May 2
12
(Frequentist) Error Statistical Methods
Some frequentists argue the optional stopping example alone is “enough
to refute the strong likelihood principle” (Cox, 1977, p. 54), since,
with probability 1, it will stop with a “nominally” significant result
even though µ = 0.
It violates the principle that we should avoid methods that give
misleading inferences with high or maximal probability (weak repeated
sampling principle).
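Here is a minimal Monte Carlo sketch of that error-probability inflation
(not from the slides): it assumes σ = 1 is known and that the nominal .05
two-sided test is checked after every observation from n = 1 on, so the
simulated figure depends on those conventions and need not match the
~.55 quoted for this example exactly.

import numpy as np

rng = np.random.default_rng(1)

def rejects_by(n_max, sigma=1.0, z_crit=1.96):
    # Simulate one optional-stopping run under H0: mu = 0; after each
    # observation compute the z-statistic of the running mean and report
    # whether the nominal .05 test ever rejects by n_max.
    draws = rng.normal(0.0, sigma, n_max)
    n = np.arange(1, n_max + 1)
    z = np.cumsum(draws) / (sigma * np.sqrt(n))
    return bool(np.any(np.abs(z) >= z_crit))

reps = 20000
rate = sum(rejects_by(169) for _ in range(reps)) / reps
print("P(reject at nominal .05 somewhere by n = 169; H0):", rate)

The fixed-sample-size test has type I error .05 by construction; the
try-and-try-again rule rejects far more often, and the rate keeps
climbing toward 1 as the maximum sample size grows.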
Mayo: May 2
13
The error statistical philosophy is in a Popperian spirit:
Discrepancies from a null are “too cheap to be worth having”; they only
count if you would not have readily (probably) gotten a 2 s.d.
difference even if the null is true.
These probabilistic properties of inference procedures are error
frequencies or error probabilities.
Ignoring the stopping rule means failing to control and appraise the
probativeness or severity of tests. Same for multiple testing, cherry
picking and the like.
But someone could say they don’t care about error probabilities anyway.
Mayo: May 2
14
But I’m not here to convince you of the error statistical account that
violates the LP!
I only want to talk about a supposed proof, one that seems to follow
from principles we frequentist error statisticians accept.
Birnbaum seemed to show the frequentist couldn’t consistently allow
these LP violations.
Mayo: May 2
15
The Birnbaum result was heralded as a breakthrough in statistics!
Savage:
Without any intent to speak with exaggeration it seems to me that this
is really a historic occasion. This paper is a landmark in statistics …
I myself, like other Bayesian statisticians, have been convinced of the
truth of the likelihood principle for a long time. Its consequences for
statistics are very great. … I can’t stop without saying once more that
this paper is really momentous in the history of statistics. It would
be hard to point to even a handful of comparable events. (Savage, 1962)
All error statistical notions, p-values, significance levels, … all
violate the likelihood principle (ibid.)
Mayo: May 2
16
(Savage doubted people would stop at the halfway house, but that they’d
go on to full blown Bayesianism; Birnbaum never did.)
I have no doubt that revealing the flaw in the alleged proof will not
be greeted with anything like the same recognition (Mayo, 2010).
Mayo: May 2
17
Sufficiency or Weak LP
Note: If there were only one experiment, then outcomes with
proportional likelihoods are evidentially equivalent:
If y1* and y2* are LP pairs within a single experiment E, then
Infr[E,y1*] = Infr[E,y2*]
Also called sufficiency principle or weak LP.
(Birnbaum will want to make use of this…)
All that was Example 1.
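Before turning to Example 2, here is a tiny sketch of the weak LP using
a Bernoulli example that is not in the slides (the data are
hypothetical): within a single fixed-n experiment, two outcome
sequences with the same number of successes have identical likelihood
functions, so they are LP pairs and the weak LP says they license the
same inference.

from math import prod

def bernoulli_likelihood(seq, theta):
    # Likelihood of a 0/1 sequence under independent Bernoulli(theta) trials
    return prod(theta if z == 1 else 1 - theta for z in seq)

# Two different outcomes of one fixed-n experiment, each with 3 successes in 5 trials
y1 = [1, 1, 0, 0, 1]
y2 = [0, 1, 1, 1, 0]

for theta in (0.2, 0.5, 0.8):
    print(theta, bernoulli_likelihood(y1, theta), bernoulli_likelihood(y2, theta))
# The likelihoods agree at every theta (constant c = 1), so Infr[E,y1*] = Infr[E,y2*].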
Mayo: May 2
18
Example 2: Conditionality (CP): You should not get credit (or be
blamed) for something you don’t deserve
Mixture: Cox (1958)
Two instruments of different precisions
 We flip a fair coin to decide whether to use a very precise or an
imprecise tool: E1 or E2.
 The experiment has 2 steps: first flip the coin to see whether E1 or
E2 is performed, then perform it and record the outcome.
Mayo: May 2
19
Conditionality Principle CP
 CP: Once it is known which Ei produced z, the p-value or other
inferential assessment should be made conditioning on the experiment
actually run.
 This would not be so if the randomizer (the flipping) were relevant
to the parameter; the CP always involves a randomizer irrelevant to the
parameter.
 Example: in observing a normally distributed Z in testing a null
hypothesis µ = 0, E1 has a variance of 1, while that of E2 is 10^6.
The same z measurement would correspond to a much smaller p-value if
from E1 rather than E2: p1 << p2.
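A minimal numerical sketch of these conditional assessments (the
observed measurement z = 2.0 is a hypothetical value chosen for
illustration; standard deviations 1 and 1000 correspond to variances 1
and 10^6):

from scipy.stats import norm

z_obs = 2.0                    # hypothetical observed measurement
sd_E1, sd_E2 = 1.0, 1000.0     # precise vs imprecise instrument

# Two-sided p-values for H0: mu = 0, conditional on the instrument actually used
p1 = 2 * norm.sf(abs(z_obs) / sd_E1)
p2 = 2 * norm.sf(abs(z_obs) / sd_E2)
print("p-value conditioning on E1 (precise):  ", round(p1, 4))
print("p-value conditioning on E2 (imprecise):", round(p2, 4))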
Mayo: May 2
20
Unconditional or convex combination
What if someone reported the precision of the experiment as the average
of the two:
[p1 + p2]/2
This unconditional assessment is also called the convex combination of
the p-values, averaged over the two experiments.
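Continuing the same hypothetical numbers, the unconditional
(convex-combination) report would average the two, regardless of which
instrument was actually used (a self-contained sketch):

from scipy.stats import norm

z_obs = 2.0                              # same hypothetical measurement as above
p1 = 2 * norm.sf(abs(z_obs) / 1.0)       # conditional p-value under E1
p2 = 2 * norm.sf(abs(z_obs) / 1000.0)    # conditional p-value under E2

# Unconditional report: average over the fair coin that picked the instrument
print("[p1 + p2]/2 =", round((p1 + p2) / 2, 4))

A single intermediate number is reported whichever instrument produced
the measurement, which is exactly what Cox objects to next.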
Mayo: May 2
21
Suppose we know we observed E2 with its much larger variance:
The unconditional test says that we can assign this a higher level of
significance than we ordinarily do, because if we were to repeat the
experiment, we might sample some quite different distribution. But this
fact seems irrelevant to the interpretation of an observation which we
know came from a distribution [with the larger variance]. (Cox 1958, 361)
[The measurement using the imprecise tool shouldn’t get credit because
the measurer might have used a precise tool; the precise guy is blamed
a bit because he could have used the lousy imprecise tool.]
Mayo: May 2
22
CP: Once it is known which Ei has produced z, condition on the Ei that
produced the result.
Do not use the unconditional formulation.
It’s obvious.
Mayo: May 2
23
If it’s so obvious, why was Cox’s paper considered one of the most
important criticisms of (certain) frequentist methods?
CP is not a mathematical principle, but one thought to reflect
intuitions about evidence.
There are contexts (not so blatant) where the frequentist would (in
effect) report the unconditional assessment, e.g., if there were going
to be many applications and you were reporting average precision or the
like.
The example only arose because Cox wanted to emphasize that in using
frequentist methods for inference (as opposed to long-run decisions)
you should condition if the randomizer is irrelevant to the parameter
of interest.
Mayo: May 2
24
The surprising upshot of the controversial argument I will consider is
that it claims to show that if you condition on the randomizer, you
have to go all the way to the data point, which means no error
probabilities (the LP).
That’s what Birnbaum’s result purports: CP entails LP!
It is not uncommon to see statistics texts argue that in frequentist
theory one is faced with the following dilemma: either to deny the
appropriateness of conditioning on the precision of the tool chosen by
the toss of a coin, or else to embrace the strong likelihood principle,
which entails that frequentist sampling distributions are irrelevant to
inference once the data are obtained. This is a false dilemma. . . .
The ‘dilemma’ argument is therefore an illusion. (Cox and Mayo 2010, 298)
But the illusion is not so easy to dispel; thus this paper.
Mayo: May 2
25
NOW FOR THE BREAKTHROUGH
Shorthand: x* = (E1,x*); y* = (E2,y*)
LP: If x* and y* are LP pairs, then Infr[E1,x*] = Infr[E2,y*]
Whenever x* and y* form an LP pair, but Infr[E1,x*] ≠ Infr[E2,y*], we
have a violation of the Likelihood Principle (LP).
We saw in Example 1:
y* (a 2 s.d. result) in E2 optional stopping (stops with n = 169) had a
p-value ~.55
x* (a 2 s.d. result) in E1, fixed n = 169, had a p-value = .05
The two outcomes form an LP pair, so we have an LP violation.
(Every use of error probabilities is an LP violation.)
Mayo: May 2
26
Birnbaumization
Step 1: Birnbaum Experiment EB: Birnbaum will describe a funny kind of
‘mixture’ experiment based on an LP pair;
You observed y* from experiment E2 (optional stopping)
 Enlarge it to an imaginary mixture: I am to imagine it resulted from
E2 getting tails on the toss of a fair coin, where heads would have
meant performing the fixed sample size experiment with n = 169 from the
start.
 Erasure: Next, erase the fact that y* came from E2 and report it as
if it came from (E1,x*)
Mayo: May 2
27
 Report the convex combination
o [p1 + p2]/2
p1 = .05 (E1, fixed sample size n = 169)
p2 = .55 (E2, optional stopping, stops at n = 169)
So the average is ~.3 (the ½ comes from the imagined fair coin)
Mayo: May 2
28
I said it was a funny kind of mixture; there are two things that make
it funny:
 It didn’t happen, you only observed y* from E2
 You are to report an outcome as x* from E1 even though you actually
observed y* from E2
 The inference is based on the unconditional mixture
We may call it Birnbaumizing the result you got. You are to do it for
any LP pair; we’re focusing on just these two.
Having thus “Birnbaumized” the particular LP pair that you actually
observed, it appears that you must treat x* as evidentially equivalent
to its LP pair, y* (after all: you report y* as x*)
Mayo: May 2
29
Premise 1: Inferences from x* and y*, within EB using the convex
combination, are equivalent (Birnbaumization):
Infr[x*] = Infr[y*]
both are equal to (p1 + p2)/2
Premise 2 (a): From CP: An inference from y*, conditioning on E2 (the
experiment that produced it), is p2
Infr[y*] = p2
Premise 2 (b): An inference from x*, conditioning on E1, is p1
Infr[x*] = p1
Conclusion: p1 = p2
So it appears if we hold CP we cannot have the LP violation.
Mayo: May 2
30
But x* and y* form an LP violation, so p1 ≠ p2.
Thus it would appear the frequentist is led into a contradiction.
He will do the same for any x*, y* LP pair: generally,
Infr[y*] = Infr[x*]
(i.e., no LP violations are possible)
One way to formulate the argument is this:
Mayo: May 2
31
Premise 1: If we use Birnbaumization EB:
Infr[x*] = Infr[y*]
both are equal to (p1 + p2)/2
Premise 2 (a): If we use conditionality CP:
Infr[y*] = p2
Premise 2 (b): Infr[x*] = p1
He wants to infer: p1 = p2
But we can have p1 ≠ p2: whenever we have an LP violation, p1 ≠ p2.
You can uphold the two principles, but not apply them in such a way as
to violate self-consistency.
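As a toy illustration of this structure (using the p-values from
Example 1, .05 and ~.55, and not intended as anyone’s actual
procedure), the “Birnbaumized” rule and the conditional rule are simply
two different inference functions:

p = {"x*": 0.05, "y*": 0.55}   # conditional p-values given E1 (fixed n) and E2 (optional stopping)

def infer_conditional(outcome):
    # CP-style inference: condition on the experiment actually performed
    return p[outcome]

def infer_birnbaumized(outcome):
    # EB-style inference: report the convex combination, whichever member
    # of the LP pair was observed
    return (p["x*"] + p["y*"]) / 2

print(infer_birnbaumized("x*"), infer_birnbaumized("y*"))   # premise 1: both ~0.3
print(infer_conditional("y*"), infer_conditional("x*"))     # premises 2(a), 2(b): .55 and .05
print(infer_conditional("x*") == infer_conditional("y*"))   # False: the LP violation remains

Premise 1 is true of the Birnbaumized rule and premises 2(a), 2(b) are
true of the conditional rule, but no single rule satisfies all three
unless p1 = p2, i.e., unless there was no LP violation to begin with.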
Mayo: May 2
32
Construed this way, it’s an invalid argument; formulated as a valid
argument, it will not be sound.
Instead of continuing with the formal paper, let me sketch an exchange
I post on my blog each New Year’s Eve, one I imagine, “Midnight With
Birnbaum,” akin to that Midnight in Paris movie. Just as that writer
gets to go back in time to get appraisals from famous past writers, I
imagine checking my criticism of Birnbaum with him.
Just to zero in on the main parts…
Mayo: May 2
33
ERROR STATISTICAL PHILOSOPHER: It’s wonderful to meet you Professor
Birnbaum…. I happen to be writing on your famous argument about the
likelihood principle (LP).
BIRNBAUM: Ultimately you know I rejected the LP as failing to control
error probabilities (except in the trivial case of testing point
against point hypotheses, provided they are predesignated).
ES PHILOSOPHER: I suspected you found the flaw in your argument…? It
doesn’t seem to work…
BIRNBAUM: Well, I shall happily invite you to take any case that
violates the LP and allow me to demonstrate that the frequentist is led
to inconsistency, provided she also wishes to adhere to the
Conditionality Principle CP.
Mayo: May 2
34
Whoever wins this little argument pays for this whole bottle of vintage
champagne.
ES PHILOSOPHER: I really don’t mind paying for the bottle.
BIRNBAUM: Good, you will have to. Take any LP violation. In two-sided
Normal testing, y* from optional stopping E2 (stops with a 2 s.d.
difference at n = 169). I will show you that y* must be inferentially
equivalent to x* from the corresponding fixed sample size E1 (n = 169),
i.e. Infr[x*] = Infr[y*].
(We assume both experiments seek to make inferences about the same
parameter µ and the model is given.)
Do you agree that, for a frequentist, y* is not evidentially equivalent
to x*?
Mayo: May 2
35
ES PHILOSOPHER: Yes, that’s a clear case where we reject the LP, and it
makes perfect sense to distinguish their corresponding p-values: the
searching optional stopping experiment makes the p-value quite a bit
higher than with the fixed sample size.
For optional stopping p2 = .55
For fixed p1 = .05
I don’t see how you can make them equal.
BIRNBAUM: Well, I’ll show you’ll have to renounce another principle you
hold, the CP.
ES PHILOSOPHER: Laughs.
Mayo: May 2
36
BIRNBAUM: Suppose you’ve observed y* from E2. You admit, do you not,
that this outcome could have occurred as a result of a different
experiment? It could have been that a fair coin was flipped where it is
agreed that heads instructs you to perform E1 and tails instructs you
to perform the optional stopping test E2, and you happened to get tails
and managed to stop at 169. A mixture experiment.
ES PHILOSOPHER: Well, that is not how I got y*, but ok, it could have
occurred that way.
BIRNBAUM: Good. Then you must grant further that your result could have
come from a special variation on this mixture that I have dreamt up,
call it a Birnbaum experiment EB. In an EB, if the outcome from the
experiment you actually performed has a proportional likelihood to one
in some other experiment not performed, E1, then we say
Mayo: May 2
37
that your result has an “LP pair”. In that case, EB stipulates that you
are to report y* as if you had determined whether to run E1 or E2 by
flipping a fair coin (given as irrelevant to the parameter of interest).
ES PHILOSOPHER: So let’s see if I understand a Birnbaum experiment EB.
I run E2 and when I stop I report that my observed 2 s.d. difference
came from E1, as a result of this strange type of mixture experiment.
BIRNBAUM: Yes, or equivalently you could just report y*: my result is a
2 s.d. difference and it could have come from either E1 (fixed sample
size, n = 169) or E2 (optional stopping, which happens to stop at the
169th trial). The proof is often put this way; it makes no difference.
Mayo: May 2
38
ES PHILOSOPHER: You’re saying that if my result has an LP pair in the
experiment not performed, I should act as if I accept the LP, and so if
the likelihoods are proportional in the two experiments (both testing
the same mean), the outcomes yield the same inference.
BIRNBAUM: Well, wait: experiment EB, as an imagined mixture, is a
single experiment, so really you only need to apply the sufficiency
principle (the weak LP), which everyone accepts. Yes?
ES PHILOSOPHER: So Birnbaumization has turned the two experiments into
one. But how do I calculate the p-value within your Birnbaumized
experiment?
BIRNBAUM: I don’t think anyone has ever called it that.
Mayo: May 2
39
ES PHILOSOPHER: I just wanted to have a shorthand for the operation you
are describing… So how do I calculate the p-value within EB?
BIRNBAUM: You would report the overall (unconditional) p-value, which
would be the average over the sampling distributions: (p1 + p2)/2. (The
convex combination.) Say p1 is .05 and p2 is ~.55; anyway, we know they
are different, that’s what makes this a violation of the LP.
ES PHILOSOPHER: So you’re saying that if I observe a 2 s.d. difference
from E2, I do not report the associated p-value .55, but instead I am
to report the average p-value, averaging over some other experiment E1
that could have given rise to an outcome with a proportional likelihood
to the one I observed, even though I didn’t obtain it this way?
Mayo: May 2
40
BIRNBAUM: I’m saying that you have to grant that y* from an optional
stopping experiment E2 could have been generated through a Birnbaum
experiment EB. If you follow the rules of EB, then y* is evidentially
equivalent to x*. This is premise 1.
Premise 1: Within EB the inference from y* = the inference from x*;
both are equal to (p1 + p2)/2 (defn of EB and sufficiency)
ES PHILOSOPHER: But this is just a matter of your definition of
inference within your Birnbaumized experiment EB. We still have p1
different from p2, so I don’t see the contradiction with my rejecting
the LP.
Mayo: May 2
41
BIRNBAUM: Hold on; I’m about to get to the second premise, premise 2.
So far all of this was premise 1.
ES PHILOSOPHER: Oy, what is premise 2?
BIRNBAUM: Premise 2 is this: Surely you agree (following CP) that once
you know from which experiment the observed 2 s.d. difference actually
came, you ought to report the p-value corresponding to that experiment.
You ought not to report the average (p1 + p2)/2 as you were instructed
in EB.
Mayo: May 2
42
This gives us:
Premise 2 (a): From CP: The inference from y* is p2
ES PHILOSOPHER: So, having first insisted I imagine myself in a
Birnbaumized experiment, EB, and report an average p-value, I’m to
return to my senses and “condition” to get back to where I was to begin
with?
BIRNBAUM: Yes, at least if you hold to the conditionality principle CP
(of D. R. Cox). Likewise, if you knew it came from E1, then the p-value
is
Premise 2 (b): From CP: The inference from x* is p1
Mayo: May 2
43
So you arrive at (2a) and (2b), yes? Surely you accept Cox’s CP, I know
you’ve been working with him.
ES PHILOSOPHER: Well, the CP is defined for actual mixtures, where one
flips a coin to determine if E1 or E2 is performed. One cannot perform
the following: Toss a fair coin. If it lands heads, perform an
experiment E1 that yields a member of an LP pair x*; if tails, observe
E2 which yields y*. We do not know what would have resulted from the
unperformed experiment, much less that it would form an LP pair with
the observed y*. There is a single experiment, and CP stipulates that
we know which was performed and what its outcome was.
BIRNBAUM: Wasn’t I clear that I’m dealing with hypothetical or
mathematical mixtures (rather than actual ones)?
Mayo: May 2
44
Mathematical or Hypothetical Mixtures
Envision the mathematical universe of LP pairs, each imagined to have
been generated from a θ-irrelevant mixture (for the inference context
at hand). When we observe y* we pluck the x* companion needed for the
argument. If the outcome doesn’t have an LP pair, just proceed as
normal; I’m only out to show that
If x* and y* are LP pairs, then the inference from x* = the inference
from y*.
That’s the LP, right?
A lot of people—including some brilliant statisticians—have tried to
block my argument because it deals in mathematical mixtures, but I say
there’s no clear way to distinguish actual (or performable) mixtures
from mathematical ones.
Mayo: May 2
45
ES PHILOSOPHER: I want to give your argument maximal mileage (it’s been
around for 50 years). I will allow that CP is definable for
hypothetical θ-irrelevant mixtures out there, and that we can
Birnbaumize an experimental result. I cannot even determine what the LP
pair would need to be until after I’ve observed y*, but I guess I can
pluck it down from the universe of LP pairs as needed.
BIRNBAUM: Exactly, then premises (1), (2a) and (2b) yield the strong LP!
ES PHILOSOPHER: Clever, but your “proof” is obviously unsound. Let me
try to put it in valid form:
Mayo: May 2
46
Premise 1: In EB, the inference from y* = the inference from x*
[both are equal to (p1 + p2)/2]
Premise 2 (a): from CP (conditionality): In EB (given y*), the
inference from y* = p2
Premise 2 (b): from CP: In EB (given x*), the inference from x* = p1
Conclusion: The inference from y* = the inference from x* (p1 = p2)
The premises about EB can’t both be true, so it’s unsound.
CP says the inference from the known result y* should not be the
average over experiments not performed: in other words, it instructs me
not to Birnbaumize, as premise 1 requires.
If Premise 1 is true, 2a and 2b are false (unless it happens that CP
makes no difference and there’s no LP violation).
Mayo: May 2
47
 Keeping to this valid formulation, the premises can hold true only in
a special case: when there is no LP violation.
 With LP violations there are three different distributions: E1, E2, EB.
 To show the LP holds in general, you’d have to start out assuming
there can be no LP violations, which would make it circular.
 It certainly doesn’t show the frequentist is open to contradiction
(by dint of accepting CP and denying the LP).
 Alternatively (as shown earlier), I can get the premises to come out
true, but then the conclusion is false (for any LP violation)—so it is
invalid.
Every LP violation will be a case where the argument is unsound (either
invalid or premises conflict).
Mayo: May 2
48
Invalid
Premise 1: If we use Birnbaumization EB:
Infr[x*] = Infr[y*]
both are equal to (p1 + p2)/2
Premise 2 (a): If we use conditionality CP:
Infr[y*] = p2
Premise 2 (b): Infr[x*] = p1
Conclusion: p1 = p2
Putting the premises as “if-thens,” they can both be true, but we can
have p1 ≠ p2.
Mayo: May 2
49
BIRNBAUM: I later came to see that merely blocking the LP isn’t enough:
“Durbin’s formulation (CP′), although weaker than CP, is nevertheless
too strong (implies too much of the content of LP to be compatible with
standard (non-Bayesian) statistical concepts and techniques).” Same for
Kalbfleisch and Evans’ recent treatment. In fact you still get that the
stopping rule is irrelevant.
ES PHILOSOPHER: My argument shows that any violation of LP in
frequentist sampling theory necessarily results in an illicit
substitution in the argument; it is in no danger of implying “too much”
of the LP: what was an LP violation remains one.
BIRNBAUM: Yet some people still think it is a breakthrough!
Mayo: May 2
50
Why has it escaped detection?
I think in the case of LP pairs, it is easy to fall into an
equivocation:
(I) Given the measurement (z*), it doesn’t matter if it came from E1 or
E2 (since it’s a member of an LP pair)
(II) Given y* is known to have come from E2, it doesn’t matter if E2
was the result of an irrelevant randomizer (tossing a coin) to choose
between E1 and E2, or E2 was fixed all along.
The first, (I), corresponds to Birnbaumization; the second, (II), to CP.
Mayo: May 2
51
The clue was that the LP is shown to be logically equivalent to CP. So
I knew a proof from CP to LP was circular, but it was hard to
demonstrate.
Mayo: May 2
52
 Example 1: Trying and Trying Again: Optional stopping
 Example 2: Two instruments with different precisions
(you shouldn’t get credit/blame for something you didn’t do)
 The Breakthrough: Birnbaumization
 Imaginary dialogue with Allan Birnbaum
Mayo: May 2
53
REFERENCES
Armitage, P. (1975). Sequential Medical Trials, 2nd ed. New York: John Wiley & Sons.
Berger, J. O., and Wolpert, R. L. (1988). The Likelihood Principle. Hayward, CA: Institute of Mathematical Statistics.
Birnbaum, A. (1962). On the Foundations of Statistical Inference (with discussion), Journal of the American Statistical Association, 57: 269–326.
Birnbaum, A. (1969). Concepts of Statistical Evidence. In Philosophy, Science, and Method: Essays in Honor of Ernest Nagel, edited by S. Morgenbesser, P. Suppes, and M. White. New York: St. Martin’s Press: 112–143.
Cox, D. R. (1958). Some Problems Connected with Statistical Inference, Annals of Mathematical Statistics, 29: 357–372.
Cox, D. R. (1977). The Role of Significance Tests (with Discussion), Scandinavian Journal of Statistics, 4: 49–70.
Cox, D. R. and Mayo, D. (2010). Objectivity and Conditionality in Frequentist Inference. In Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science, edited by D. Mayo and A. Spanos. Cambridge: Cambridge University Press: 276–304.
Edwards, W., Lindman, H., and Savage, L. (1963). Bayesian Statistical Inference for Psychological Research, Psychological Review, 70: 193–242.
Jaynes, E. T. (1976). Common Sense as an Interface. In Foundations of Probability Theory, Statistical Inference and Statistical Theories of Science, Volume 2, edited by W. L. Harper and C. A. Hooker. Dordrecht, The Netherlands: D. Reidel: 218–257.
Mayo: May 2
54
Joshi, V. M. (1976). A Note on Birnbaum’s Theory of the Likelihood Principle, Journal of the American Statistical Association, 71: 345–346.
Joshi, V. M. (1990). Fallacy in the Proof of Birnbaum’s Theorem, Journal of Statistical Planning and Inference, 26: 111–112.
Lindley, D. V. (1976). Bayesian Statistics. In Foundations of Probability Theory, Statistical Inference and Statistical Theories of Science, Volume 2, edited by W. L. Harper and C. A. Hooker. Dordrecht, The Netherlands: D. Reidel: 353–362.
Mayo, D. (1996). Error and the Growth of Experimental Knowledge. Chicago: The University of Chicago Press (Series in Conceptual Foundations of Science).
Mayo, D. (2010). An Error in the Argument from Conditionality and Sufficiency to the Likelihood Principle. In Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science, edited by D. Mayo and A. Spanos. Cambridge: Cambridge University Press: 305–314.
Mayo, D. and Cox, D. R. (2011). Statistical Scientist Meets a Philosopher of Science: A Conversation with Sir David Cox. Rationality, Markets and Morals (RMM): Studies at the Intersection of Philosophy and Economics, edited by M. Albert, H. Kliemt and B. Lahno. Frankfurt School Verlag, Volume 2: 103–114. http://www.rmm-journal.de/htdocs/st01.html
Mayo, D. and Spanos, A., eds. (2010). Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science. Cambridge: Cambridge University Press.
Mayo: May 2
55
Pratt, J. W., Raiffa, H., and Schlaifer, R. (1995). Introduction to Statistical Decision Theory. Cambridge, MA: The MIT Press.
Savage, L., ed. (1962a). The Foundations of Statistical Inference: A Discussion. London: Methuen & Co.
Savage, L. (1962b). Discussion on Birnbaum (1962), Journal of the American Statistical Association, 57: 307–308.