Views of the role of hypothesis falsification in statistical testing do not divide as cleanly between frequentist and Bayesian views as is commonly supposed. This can be shown by considering the two major variants of the Bayesian approach to statistical inference and the two major variants of the frequentist one.
A good case can be made that the Bayesian, de Finetti, just like Popper, was a falsificationist. A thumbnail view of de Finetti’s theory of learning, which is not just a caricature, is that your subjective probabilities are modified through experience by noticing which of your predictions are wrong, striking out the sequences that involved them and renormalising.
On the other hand, in the formal frequentist Neyman-Pearson approach to hypothesis testing, you can, if you wish, switch the roles of the conventional null and alternative hypotheses, making the latter the straw man and, by ‘disproving’ it, asserting the former.
The frequentist, Fisher, however, at least in his approach to testing hypotheses, seems to have taken a strong view that the null hypothesis was quite different from any other and that there was a strong asymmetry in the inferences that followed from the application of significance tests.
Finally, to complete a quartet, the Bayesian geophysicist Jeffreys, inspired by Broad, specifically developed his approach to significance testing in order to be able to ‘prove’ scientific laws.
By considering the controversial case of equivalence testing in clinical trials, where the object is to prove that ‘treatments’ do not differ from each other, I shall show that there are fundamental differences between ‘proving’ and falsifying a hypothesis and that this distinction does not disappear by adopting a Bayesian philosophy. I conclude that falsificationism is important for Bayesians also, although it is an open question as to whether it is enough for frequentists.
Unfortunately, some have interpreted Numbers Needed to Treat as indicating the proportion of patients on whom the treatment has had a causal effect. This interpretation is very rarely, if ever, necessarily correct. It is certainly inappropriate if based on a responder dichotomy. I shall illustrate the problem using simple causal models.
One also sometimes encounters the claim that the extent to which two distributions of outcomes overlap from a clinical trial indicates how many patients benefit. This is also false and can be traced to a similar causal confusion.
The Seven Habits of Highly Effective Statisticians
Stephen Senn
If you know why the title of this talk is extremely stupid, then you clearly know something about control, data and reasoning: in short, you have most of what it takes to be a statistician. If you have studied statistics then you will also know that a large amount of anything, and this includes successful careers, is luck.
In this talk I shall try to share some of my experiences of being a statistician in the hope that it will help you make the most of whatever luck life throws at you. In so doing, I shall try my best to overcome the distorting influence of that easiest of sciences, hindsight. Without giving too much away, I shall be recommending that you read, listen, think, calculate, understand, communicate, and do. I shall give you some examples of what I think works and what I think doesn’t.
In all of this you should never forget the power of negativity and also the joy of being able to wake up every day and say to yourself ‘I love the smell of data in the morning’.
Sample size determination in clinical trials is considered from various ethical and practical perspectives. It is concluded that cost is a missing dimension and that the value of information is key.
When estimating sample sizes for clinical trials there are several different views that might be taken as to what definition and meaning should be given to the sought-for treatment effect. However, if the concept of a ‘minimally important difference’ (MID) does have relevance to interpreting clinical trials (which can be disputed) then its value cannot be the same as the ‘clinically relevant difference’ (CRD) that would be used for planning them.
A doubly pernicious use of the MID is as a means of classifying patients as responders and non-responders. Not only does such an analysis lead to an increase in the necessary sample size but it misleads trialists into making causal distinctions that the data cannot support and has been responsible for exaggerating the scope for personalised medicine.
In this talk these statistical points will be explained using a minimum of technical detail.
Personalised medicine: a sceptical view
Stephen Senn
Some grounds for believing that the current enthusiasm about personalised medicine is exaggerated, founded on poor statistics and represents a disappointing loss of ambition.
Clinical trials: quo vadis in the age of COVID?
Stephen Senn
A discussion of the role of clinical trials in the age of COVID. My contribution to the phastar 2020 life sciences summit https://phastar.com/phastar-life-science-summit
This year marks the 70th anniversary of the Medical Research Council randomised clinical trial (RCT) of streptomycin in tuberculosis led by Bradford Hill. This is widely regarded as a landmark in clinical research. Despite its widespread use in drug regulation and in clinical research more widely, and its high standing with the evidence based medicine movement, the RCT continues to attract criticism. I show that many of these criticisms are traceable to a failure to understand two key concepts in statistics: probabilistic inference and design efficiency. To these methodological misunderstandings can be added the practical one of failing to appreciate that entry into clinical trials is not simultaneous but sequential.
I conclude that although randomisation should not be used as an excuse for ignoring prognostic variables, it is valuable and that many standard criticisms of RCTs are invalid.
Minimisation is an approach to allocating patients to treatment in clinical trials that forces a greater degree of balance than does randomisation. Here I explain why I dislike it.
What should we expect from reproducibility?
Stephen Senn
Is there really a reproducibility crisis and, if so, are P-values to blame? Choose any statistic you like and carry out two identical independent studies, reporting this statistic for each. In advance of collecting any data, you ought to expect that it is just as likely that statistic 1 will be smaller than statistic 2 as vice versa. Once you have seen statistic 1, things are not so simple, but if they are not so simple it is because you have other information in some form. However, it is at least instructive that you need to be careful in jumping to conclusions about what to expect from reproducibility. Furthermore, the forecasts of good Bayesians ought to obey a martingale property. On average you should be in the future where you are now but, of course, your inferential random walk may lead to some peregrination before it homes in on “the truth”. But you certainly can’t generally expect that a probability will get smaller as you continue. P-values, like other statistics, are a position not a movement. Although often claimed, there is no such thing as a trend towards significance.
Using these and other philosophical considerations I shall try to establish what it is we want from reproducibility. I shall conclude that we statisticians should probably be paying more attention to checking that standard errors are being calculated appropriately and rather less to the inferential framework.
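The martingale property referred to here can be stated in one line. For a hypothesis H and accumulating data X_1, X_2, ..., the law of iterated expectations gives

E[ P(H | X_1, ..., X_{n+1}) | X_1, ..., X_n ] = P(H | X_1, ..., X_n),

so today’s posterior probability is the expected value of tomorrow’s: a coherent forecaster cannot expect the probability to drift in any particular direction as data accumulate.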
How to combine results from randomised clinical trials on the additive scale with real world data to provide predictions on the clinically relevant scale for individual patients
The Rothamsted school meets Lord's paradox
Stephen Senn
Lord’s ‘paradox’ is a notoriously difficult puzzle that is guaranteed to provoke discussion, dissent and disagreement. Two statisticians analyse some observational data and come to radically different conclusions, each of which has acquired defenders over the years since Lord first proposed his puzzle in 1967. It features in the recent Book of Why by Pearl and Mackenzie, who use it to demonstrate the power of Pearl’s causal calculus, obtaining a solution they claim is unambiguously right. They also claim that statisticians have failed to get to grips with causal questions for well over a century, in fact ever since Karl Pearson developed Galton’s idea of correlation and warned the scientific world that correlation is not causation.
However, only two years before Lord published his paradox, John Nelder outlined a powerful causal calculus for analysing designed experiments based on a careful distinction between block and treatment structure. This represents an important advance in formalising the approach to analysing complex experiments that started with Fisher 100 years ago, when he proposed splitting variability using the square of the standard deviation, which he called the variance; it was continued by Yates and has been developed since the 1960s by Rosemary Bailey, amongst others. This tradition might be referred to as the Rothamsted School. It is fully implemented in Genstat® but, as far as I am aware, not in any other package.
With the help of Genstat®, I demonstrate how the Rothamsted School would approach Lord’s paradox and come to a solution that is not the same as the one reached by Pearl and Mackenzie, although given certain strong but untestable assumptions it would reduce to it. I conclude that the statistical tradition may have more to offer in this respect than has been supposed.
Presidents' invited lecture ISCB Vigo 2017
Discusses various issues to do with how randomised clinical trials should be analysed. See also https://errorstatistics.com/2017/07/01/s-senn-fishing-for-fakes-with-fisher-guest-post/
An early and overlooked causal revolution in statistics was the development of the theory of experimental design, initially associated with the Rothamsted School. An important stage in the evolution of this theory was the experimental calculus developed by John Nelder in the 1960s, with its clear distinction between block and treatment factors in designed experiments. This experimental calculus produced appropriate models automatically from more basic formal considerations but was, unfortunately, only ever implemented in Genstat®, a package widely used in agriculture but rarely so in medical research. In consequence, its importance has not been appreciated and the approach of many statistical packages to designed experiments is poor. A key feature of the Rothamsted School approach is that identification of the appropriate components of variation for judging treatment effects is simple and automatic.
The impressive, more recent causal revolution in epidemiology, associated with Judea Pearl, seems to have no place for components of variation, however. By considering the application of Nelder’s experimental calculus to Lord’s Paradox, I shall show that solutions that have been proposed using the more modern causal calculus are problematic. I shall also show that lessons from designed clinical trials have important implications for the use of historical data and big data more generally.
It is argued that, when it comes to nuisance parameters, an assumption of ignorance is harmful. On the other hand, this raises problems as to how far one should go in searching for further data when combining evidence.
The statistical revolution of the 20th century was largely concerned with developing methods for analysing small datasets. Student’s paper of 1908 was the first in the English literature to address the problem of second order uncertainty (uncertainty about the measures of uncertainty) seriously and was hailed by Fisher as heralding a new age of statistics. Much of what Fisher did was concerned with problems of what might be called ‘small data’, not only as regards efficient analysis but also as regards efficient design and in addition paying close attention to what was necessary to measure uncertainty validly.
I shall consider the history of some of these developments, in particular those that are associated with what might be called the Rothamsted School, starting with Fisher and having its apotheosis in John Nelder’s theory of General Balance and see what lessons they hold for the supposed ‘big data’ revolution of the 21st century.
In Search of Lost Infinities: What is the “n” in big data?
Stephen Senn
In designing complex experiments, agricultural scientists, with the help of their statistician collaborators, soon came to realise that variation at different levels had very different consequences for estimating different treatment effects, depending on how the treatments were mapped onto the underlying block structure. This was a key feature of the Rothamsted approach to design and analysis and a strong thread running through the work of Fisher, Yates and Nelder, being expressed in topics such as split-plot designs, recovering inter-block information and fractional factorials. The null block structure of an experiment is key to this philosophy of design and analysis. However, modern techniques for analysing experiments stress models rather than symmetries, and this modelling approach requires much greater care in analysis, with the consequence that you can easily make mistakes and often will.
In this talk I shall underline the obvious, but often unintentionally overlooked, fact that understanding variation at the various levels at which it occurs is crucial to analysis. I shall take three examples, an application of John Nelder’s theory of general balance to Lord’s Paradox, the use of historical data in drug development and a hybrid randomised non-randomised clinical trial, the TARGET study, to show that the data that many, including those promoting a so-called causal revolution, assume to be ‘big’ may actually be rather ‘small’. The consequence is that there is a danger that the size of standard errors will be underestimated or even that the appropriate regression coefficients for adjusting for confounding may not be identified correctly.
I conclude that an old but powerful experimental design approach holds important lessons for observational data about limitations in interpretation that mere numbers cannot overcome. Small may be beautiful, after all.
Talk given at RSS 2016 Manchester
I consider the problems that the ASA faced in getting a P-value statement together, not in terms of the process, but by looking at the expressed opinions of 21 published commentaries on the agreed statement. I then trace the history of the development of P-values. I show that the perceived problem with P-values is not just one of a supposed inadequacy of frequentist statistics but reflects a struggle at the very heart of Bayesian inference. I conclude that replacing P-values by automatic Bayesian approaches is unlikely to abolish controversy. It may be better to try to embrace diversity than to pretend it is not there.
There are many questions one might ask of a clinical trial, ranging from ‘what was the effect in the patients studied?’ to ‘what might the effect be in future patients?’ via ‘what was the effect in individual patients?’. The extent to which the answers to these questions are similar depends on the various assumptions made, and in some cases the design used may not permit any meaningful answer to be given at all.
A related issue is confusion between randomisation, random sampling, linear modelling and true multivariate-based modelling. These distinctions don’t matter much for some purposes and under some circumstances, but for others they do.
P Values and Replication: the problem is not what you think
Lecture at MRC Brain Science & Cognition, Cambridge 16 December 2015
Abstract
It has been claimed that there is a crisis of replication in science. Prominent amongst the many factors that have been fingered as being responsible is the humble and ubiquitous P-value. One journal has even gone so far as to ban all inferential statistics. However, it is one thing to banish measures of uncertainty and another to banish uncertainty from your measures. I shall claim that the apparent discrepancy between P-values and posterior probabilities is as much a discrepancy between two approaches to Bayesian inference as it is between frequentist and Bayesian frameworks, and that a further problem has been misunderstandings regarding predictive probabilities. I conclude that banning P-values won’t make all published results repeatable and that it is possibly undesirable that it should.
Stephen Senn slides: “‘Repligate’: reproducibility in statistical studies. What does it mean and in what sense does it matter?”, presented May 23 at the session on “The Philosophy of Statistics: Bayesianism, Frequentism and the Nature of Inference” at the 2015 APS Annual Convention in NYC
The history of P-values is covered to try to shed light on a mystery: why did Student and Fisher agree numerically but disagree in terms of interpretation?
The replication crisis: are P-values the problem and are Bayes factors the so...
Stephen Senn
Today’s posterior is tomorrow’s prior. (Dennis Lindley [1], p. 2)
It has been claimed that science is undergoing a replication crisis and that when looking for culprits, the cult of significance is the chief suspect. It has also been claimed that Bayes factors might provide a solution.
In my opinion, these claims are misleading and part of the problem is our understanding of the purpose and nature of replication, which has only recently been subject to formal analysis [2]. What we are or should be interested in is truth. Replication is a coherence not a correspondence requirement [3] and one that has a strong dependence on the size of the replication study [4].
Consideration of Bayes factors raises a puzzling question. Should the Bayes factor for a replication study be calculated as if it were the initial study? If the answer is yes, the approach is not fully Bayesian and furthermore the Bayes factors will be subject to exactly the same replication ‘paradox’ as P-values. If the answer is no, then in what sense can an initially found Bayes factor be replicated and what are the implications for how we should view replication of P-values?
A further issue is that little attention has been paid to false negatives and, by extension, to true negatives. Yet, as is well known from the theory of diagnostic tests, it is meaningless to consider the performance of a test in terms of false positives alone.
I shall argue that we are in danger of confusing evidence with the conclusions we draw and that any reforms of scientific practice should concentrate on producing evidence that is as reliable as it can be qua evidence. There are many basic scientific practices in need of reform. Pseudoreplication [5], for example, and the routine destruction of information through dichotomisation [6] are far more serious problems than many matters of inferential framing that seem to have excited statisticians.
References
1. Lindley DV. Bayesian statistics: A review. SIAM; 1972.
2. Devezer B, Navarro DJ, Vandekerckhove J, Ozge Buzbas E. The case for formal methodology in scientific reform. R Soc Open Sci. Mar 31 2021;8(3):200805. doi:10.1098/rsos.200805
3. Walker RCS. Theories of Truth. In: Hale B, Wright C, Miller A, eds. A Companion to the Philosophy of Language. John Wiley & Sons; 2017:532-553: chap 21.
4. Senn SJ. A comment on replication, p-values and evidence by S.N.Goodman, Statistics in Medicine 1992; 11:875-879. Letter. Statistics in Medicine. 2002;21(16):2437-44.
5. Hurlbert SH. Pseudoreplication and the design of ecological field experiments. Ecological monographs. 1984;54(2):187-211.
6. Senn SJ. Being Efficient About Efficacy Estimation. Statistics in Biopharmaceutical Research. 2013;5(3):204-210. doi:10.1080/19466315.2012.754726
What is the significance of the P-value when reporting a statistical analysis? Is there an alternative to Fisher's approach and, if so, what is it? These are some of the issues addressed here.
There are many questions one might ask of a clinical trial, ranging from ‘what was the effect in the patients studied?’ to ‘what might the effect be in future patients?’ via ‘what was the effect in individual patients?’. The extent to which the answers to these questions are similar depends on the various assumptions made, and in some cases the design used may not permit any meaningful answer to be given at all.
A related issue is confusion between randomisation, random sampling, linear modelling and true multivariate-based modelling. These distinctions don’t matter much for some purposes and under some circumstances, but for others they do.
A yet further issue is that causal analysis in epidemiology, which has brought valuable insights in many cases, has tended to stress point estimates and ignore standard errors. This has potentially misleading consequences.
An understanding of components of variation is key. Unfortunately, the development of two particular topics in recent years, evidence synthesis by the evidence based medicine movement and personalised medicine by bench scientists, has paid scant attention to components of variation, or to the questions being asked, or both, resulting in confusion about many issues.
For instance, it is often claimed that numbers needed to treat indicate the proportion of patients for whom treatments work, that inclusion criteria determine the generalisability of results and that heterogeneity means that a random effects meta-analysis is required. None of these is true. The scope for personalised medicine has very plausibly been exaggerated and an important cause of variation in the healthcare system, physicians, is often overlooked.
I shall argue that thinking about questions is important.
The response to the COVID-19 crisis by various vaccine developers has been extraordinary, both in terms of speed of response and the delivered efficacy of the vaccines. It has also raised some fascinating issues of design, analysis and interpretation. I shall consider some of these issues, taking as my examples five vaccines: Pfizer/BioNTech, AstraZeneca/Oxford, Moderna, Novavax and J&J Janssen, but concentrating mainly on the first two. Among matters covered will be concurrent control, efficient design, issues of measurement raised by two-shot vaccines and implications for roll-out, and the surprising effectiveness of simple analyses. Differences between the five development programmes as they affect statistics will be covered but some essential similarities will also be discussed.
Talk given at ISCB 2016 Birmingham
For indications and treatments where their use is possible, n-of-1 trials represent a promising means of investigating potential treatments for rare diseases. Each patient permits repeated comparison of the treatments being investigated and this both increases the number of observations and reduces their variability compared to conventional parallel group trials.
However, depending on whether the framework used for analysis is randomisation-based or model-based, puzzling differences in inference arise. This can easily be shown by starting, on the one hand, with the randomisation philosophy associated with the Rothamsted school of inference and building up the analysis through the block + treatment structure approach associated with John Nelder’s theory of general balance (as implemented in GenStat®), or starting, on the other hand, with a plausible variance component approach through a mixed model. However, it can be shown that these differences are related not so much to the modelling approach per se as to the questions one attempts to answer: ranging from testing whether there was a difference between treatments in the patients studied, to predicting the true difference for a future patient, via making inferences about the effect in the average patient.
This in turn yields interesting insight into the long-running debate over the use of fixed-effect or random-effects meta-analysis.
Some practical issues of analysis will also be covered in R and SAS®, in which languages some functions and macros to facilitate analysis have been written. It is concluded that n-of-1 trials hold great promise for investigating chronic rare diseases but that careful consideration of matters of purpose, design and analysis is necessary to make best use of them.
Acknowledgement
This work is partly supported by the European Union’s 7th Framework Programme for research, technological development and demonstration under grant agreement no. 602552. “IDEAL”
History of how and why a complex cross-over trial was designed to prove the equivalence of two formulations of a beta-agonist and what the eventual results were. Presented at the Newton Institute 28 July 2008. Warning: following the important paper by Kenward & Roger (Biostatistics, 2010), I no longer think the random effects analysis is appropriate, although, in fact, the results are pretty much the same as for the fixed effects analysis.
De Finetti meets Popper
1. De Finetti meets Popper
or Should Bayesians care about falsificationism?
Stephen Senn, Edinburgh
(C) Stephen Senn 2019
Lecture at the Popper Symposium on 7 August 2019 at the 16th International Congress on Logic, Methodology & Philosophy of Science, Prague
2. Basic thesis / Outline
Basic thesis:
The distinction between refuting and ‘corroborating’ a hypothesis is fundamental. It does not become irrelevant by adopting a Bayesian approach to inference. It has no direct bearing on the choice of meaning for probability: subjective, relative frequency, propensity, logical etc. Various practical problems in analysing clinical trials illustrate this.
Outline:
Basic background
• De Finetti’s falsificationism
• Simple illustration
• Jeffreys’s alternative approach
• Inspired by Broad’s challenge
Falsificationist issues in clinical trials
• Bioequivalence
• Equivalence and falsificationism
• Blinding
• Competence
• Causal analysis versus prediction in clinical trials
Conclusions
(C) Stephen Senn 2019
3. A puzzle to keep you thinking
(C) Stephen Senn 2019
Suppose we are to have 1 million independent trials with a binary outcome. We wish to decide, in advance of beginning the trials, which of the following is more likely:
A: 1 million successes and no failures
B: 500,000 successes and 500,000 failures in any order
We use a Bayesian approach with a uniform prior for the binary outcome (such as would have been employed by Laplace).
What is the correct answer?
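A sketch of the calculation behind the puzzle (not on the slides): under Laplace's uniform prior on the success probability \(\theta\), the marginal probability of seeing exactly k successes in n trials is

\[
P(k \text{ successes}) \;=\; \binom{n}{k}\int_0^1 \theta^{k}(1-\theta)^{\,n-k}\,d\theta \;=\; \binom{n}{k}\,\frac{k!\,(n-k)!}{(n+1)!} \;=\; \frac{1}{n+1},
\]

whatever the value of k. With n = 1,000,000 both A and B therefore have probability 1/1,000,001: on these assumptions the two are equally likely.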
5. “The acquisition of a further piece of information, H - in other words experience, since experience is nothing more than the acquisition of further information - acts always and only in the way we have just described: suppressing the alternatives that turn out to be no longer possible.”
Popper?
No, de Finetti
(C) Stephen Senn 2019
6. Example
• A man has a CD of popular music with 12 tracks on it
• He can play tracks in random order (Shuffle) or in sequential order (Play)
• On a particular occasion he thinks he has pressed Shuffle (that was his intention) but the first track played is the first track, F, on the CD
• What is the probability that he did, in fact, press Shuffle as intended?
(C) Stephen Senn 2019
7. We can put this together as follows

“Hypothesis” | Prior Probability P | Evidence | Likelihood | P x L
Shuffle      | 9/10                | F        | 1/12       | 9/120
Shuffle      | 9/10                | X        | 11/12      | 99/120
Play         | 1/10                | F        | 1          | 12/120
Play         | 1/10                | X        | 0          | 0
TOTAL        |                     |          |            | 120/120 = 1

(C) Stephen Senn 2019
Note that in de Finetti’s theory the relevant historical process is that of the individual’s thought process, not “real world” events
8. After seeing (hearing) the evidence, however, only two rows remain

“Hypothesis” | Prior Probability P | Evidence | Likelihood | P x L
Shuffle      | 9/10                | F        | 1/12       | 9/120
Shuffle      | 9/10                | X        | 11/12      | 99/120
Play         | 1/10                | F        | 1          | 12/120
Play         | 1/10                | X        | 0          | 0
TOTAL        |                     |          |            | 21/120

(C) Stephen Senn 2019
9. So we rescale by dividing by the total probability

“Hypothesis” | Prior Probability P | Evidence | Likelihood | P x L   | Posterior Probability
Shuffle      | 9/10                | F        | 1/12       | 9/120   | (9/120)/(21/120) = 9/21
Shuffle      | 9/10                | X        | 11/12      | 99/120  |
Play         | 1/10                | F        | 1          | 12/120  | (12/120)/(21/120) = 12/21
Play         | 1/10                | X        | 0          | 0       |
TOTAL        |                     |          |            | 21/120  | 21/21 = 1

(C) Stephen Senn 2019
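The same prior-times-likelihood-then-renormalise calculation written out in code (a minimal sketch, not part of the slides; the numbers are those of the CD example above):

```python
# Shuffle/Play example from slides 6-9: prior x likelihood, then renormalise
# over the hypotheses that remain possible after the evidence.

priors = {"Shuffle": 9 / 10, "Play": 1 / 10}
# Likelihood of the observed evidence (first track F played first) under each hypothesis
likelihoods = {"Shuffle": 1 / 12, "Play": 1.0}

joint = {h: priors[h] * likelihoods[h] for h in priors}   # the P x L column
total = sum(joint.values())                                # 21/120
posterior = {h: joint[h] / total for h in joint}           # rescale by the total

print(posterior)  # {'Shuffle': 0.4285..., 'Play': 0.5714...}, i.e. 9/21 and 12/21
```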
10. Returning to de Finetti’s general approach
• Suppose we declare all possible sequences of some binary outcome (say S = success and F = failure) equally likely
• Then no learning is possible
• This is because, for any sequence consisting of a number of S and F outcomes, every possible forward sequence of S and F is also equally likely
• Thus, observing which sequences have not occurred and renormalising changes nothing
• Caution is required!
• This is one reason why de Finetti was sceptical about any automatic approaches to Bayesian inference
(C) Stephen Senn 2019
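A sketch of why no learning occurs (not on the slides): if all \(2^n\) sequences of S and F are judged equally likely, then for any observed initial segment \(s_1, \dots, s_k\),

\[
P(\text{next outcome} = S \mid s_1, \dots, s_k) \;=\; \frac{2^{\,n-k-1}}{2^{\,n-k}} \;=\; \frac{1}{2},
\]

since exactly half of the equally likely continuations begin with S. Whatever is observed, the prediction is unchanged.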
12. C. D. Broad, 1918
(C) Stephen Senn 2019
[Images of formulae from Broad (1918), pp. 393-394]
As m goes to infinity the first approaches 1
If n is much greater than m the latter is small
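The two expressions themselves appear only as images on the original slide. On the assumption that they are the standard Laplacean results Broad discussed, they are

\[
P(\text{next instance has the property} \mid m \text{ instances, all with it}) = \frac{m+1}{m+2}, \quad
P(\text{all } n \text{ members have it} \mid \text{the same evidence}) = \frac{m+1}{n+1};
\]

the first tends to 1 as m grows, while the second is small whenever n is much greater than m.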
13. The Economist gets it wrong
(C) Stephen Senn 2019
“The canonical example is to imagine that a precocious newborn observes his first sunset, and wonders whether the sun will rise again or not. He assigns equal prior probabilities to both possible outcomes, and represents this by placing one white and one black marble into a bag. The following day, when the sun rises, the child places another white marble in the bag. The probability that a marble plucked randomly from the bag will be white (ie, the child’s degree of belief in future sunrises) has thus gone from a half to two-thirds. After sunrise the next day, the child adds another white marble, and the probability (and thus the degree of belief) goes from two-thirds to three-quarters. And so on. Gradually, the initial belief that the sun is just as likely as not to rise each morning is modified to become a near-certainty that the sun will always rise.”
The Economist, ‘In praise of Bayes’, September 2000
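One way of seeing why the quoted account misleads, using the same Laplacean assumptions the passage itself invokes (a sketch, not on the slides): after m sunrises in m days the probability of a sunrise tomorrow is \((m+1)/(m+2)\), which does approach 1, but the probability that the sun rises on every one of the next n days is

\[
\int_0^1 \theta^{\,n}\,(m+1)\,\theta^{\,m}\,d\theta \;=\; \frac{m+1}{m+n+1},
\]

which tends to 0 as n grows. High confidence in the next instance is not near-certainty that the sun will always rise.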
14. Jeffreys’s solution
• The fact that ‘laws’ cannot be proved using Bayes theorem if the Laplacian approach to choosing prior distributions is adopted means that the choice of prior distribution is wrong
• His solution is to place a mass of probability on the hypothesis being true
• This gives simpler representations of the world more prior weight than more complex ones
• In his view this is necessary to permit induction to work
• Prior probability replaces (or reflects) parsimony as a principle
(C) Stephen Senn 2019
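A toy version of the device (a sketch, not on the slides): put prior mass \(\pi_0\) on the law ‘all instances are positive’ (\(\theta = 1\)) and spread the remaining probability uniformly over \(\theta\) in (0, 1). After m confirming instances the marginal likelihood is 1 under the law and \(1/(m+1)\) under the uniform alternative, so

\[
P(\text{law} \mid m \text{ confirmations}) \;=\; \frac{\pi_0}{\pi_0 + (1-\pi_0)/(m+1)} \;\to\; 1 \quad \text{as } m \to \infty.
\]

With the lump of prior probability in place, accumulating confirmations can drive the probability of the law towards 1 in a way that Laplace's prior cannot.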
15. Falsificationist issues in clinical trials
Rather more technical – again please accept my apologies
(C) Stephen Senn 2019
16. Equivalence studies (including bioequivalence)
• Studies in which one tries to prove that treatments do not differ
• The most extreme example is so-called bioequivalence studies
• The molecule is the same but the formulation differs
• The same manufacturer may wish to replace one route of administration by another
• For example a suppository by a pill
• Or a single-dose inhaler with a multi-dose one
• Or a different, so-called generic, manufacturer may wish to supply the market with its version of a now off-patent brand-name product
• Or a manufacturer may wish, for labelling reasons, to prove that a drug does not differ whether given with or without food
(C) Stephen Senn 2019
17. But surely, a drug is a drug?
• In fact, no, changing the formulation can have dramatic effects on the potency of a drug
• Here is an example I was involved with
• Bronchodilator in asthma
• Seven treatments compared over twelve hours using forced expiratory volume in one second (FEV1)
• Placebo
• 6, 12 and 24 µg of the new formulation (MTA)
• 6, 12 and 24 µg of the old formulation (ISF)
• Other details omitted for the sake of brevity
• The results follow (high values of FEV1 are good)
(C) Stephen Senn 2019
Senn, S.J., et al., An incomplete blocks cross-over in asthma: a case study in collaboration, in Cross-over Clinical Trials, J. Vollmar and L.A. Hothorn, Editors. 1997, Fischer: Stuttgart. p. 3-26.
18. (C) Stephen Senn 2019
[Figure: FEV1 (L) against time (0-720 minutes) for Placebo, MT&A 6, MT&A 12 and MT&A 24]
Placebo and the 3 doses of the new formulation
19. (C) Stephen Senn 2019
[Figure: FEV1 (L) against time (0-720 minutes) for Placebo and the 3 doses each of MT&A and ISF]
With the 3 doses of reference formulation added
20. Bioequivalence in terms of confidence
intervals
What is considered ‘proven’
A: neither equivalence nor difference proven
B: exact equivalence rejected
C: inconclusive
D & E: practical equivalence proven
F: practical equivalence proven but exact equivalence
rejected
G: exact and practical equivalence rejected
(C) Stephen Senn 2019
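The figure’s logic can be stated mechanically: practical equivalence is ‘proven’ when the whole confidence interval lies inside the equivalence region, exact equivalence is rejected when the interval excludes zero, and the two judgements are made separately, which is how case F can arise. A minimal Python sketch, with an invented equivalence margin and invented intervals purely for illustration (the mapping to case labels follows the descriptions above):

def classify(lower, upper, margin):
    # Classify a confidence interval (lower, upper) for a treatment difference
    # against an equivalence region (-margin, +margin).
    practical_equivalence = (-margin < lower) and (upper < margin)  # CI inside the region
    difference_shown = (lower > 0) or (upper < 0)                   # CI excludes zero
    if practical_equivalence and difference_shown:
        return "practical equivalence proven but exact equivalence rejected (case F)"
    if practical_equivalence:
        return "practical equivalence proven (cases D and E)"
    if difference_shown and (lower >= margin or upper <= -margin):
        return "exact and practical equivalence rejected (case G)"
    if difference_shown:
        return "exact equivalence rejected (case B)"
    return "neither equivalence nor difference proven / inconclusive (cases A and C)"

# Invented examples, margin of 0.2 on some suitable scale
for ci in [(-0.5, 0.1), (0.1, 0.6), (-0.3, 0.3), (-0.1, 0.1), (0.05, 0.15), (0.3, 0.7)]:
    print(ci, "->", classify(*ci, margin=0.2))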
21. (C) Stephen Senn 2019
First issue: Blinding and Equivalence
• Running a double blind trial does not protect you against a false conclusion
of equivalence
• You do not need to know the treatment code to bias results towards
equivalence
• Consider a particularly simple (and very common) form of trial in which
two oral formulations of a molecule are compared by looking at the
concentration time profile in a cross-over trial
• Equivalence of these profiles is taken to mean equivalence of the
formulations
• “The blood is a gate through which the drug must pass”
22. (C) Stephen Senn 2019
The Unscrupulous Pharmacokineticist
• Take the 12 test tubes from day one for a given
volunteer
• hours 1, 2, … 12
• Take the 12 test tubes from day two for the same
volunteer
• hours 1, 2, … 12
• Mix each pair (by hour) together
• Divide each mixture into two
• Et voilà
• Perfect equivalence without having to unblind (a small simulation follows)
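The reason the trick works is that, after mixing hour by hour, the two ‘formulations’ share literally the same concentration at every sampling time, so any summary of the profiles (AUC, Cmax or anything else) is identical by construction. A small illustrative simulation with made-up concentration values:

import numpy as np

rng = np.random.default_rng(1)
hours = np.arange(1, 13)

# Made-up concentration-time profiles for the two administrations;
# formulation B is deliberately very different (roughly half the exposure of A).
conc_a = 10 * hours * np.exp(-0.5 * hours) + rng.normal(0, 0.2, 12)
conc_b = 5 * hours * np.exp(-0.5 * hours) + rng.normal(0, 0.2, 12)

def auc(conc):
    # Trapezoidal area under the concentration-time curve (unit spacing in hours)
    return float(np.sum((conc[:-1] + conc[1:]) / 2))

# Mix each pair of tubes (by hour) and divide the mixture into two:
# both 'samples' now carry the average concentration at every hour.
mixed = (conc_a + conc_b) / 2
faked_a, faked_b = mixed.copy(), mixed.copy()

print("true AUC ratio :", auc(conc_b) / auc(conc_a))    # roughly 0.5
print("faked AUC ratio:", auc(faked_b) / auc(faked_a))  # exactly 1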
23. (C) Stephen Senn 2019
Fanciful?
• In fact blinding does not protect against false conclusions of
equivalence
• Pharmaceutical companies commonly prosecute cheating doctors
• Reason
• Trial fails to show any effect whereas others do
• Explanation
• The trial never took place
• The data have been invented
• This will produce a conclusion of equivalence
24. (C) Stephen Senn 2019
Second issue: Competence
• Experiment is fair if treatments are handled equivalently
• in all aspects except those that form the essence (definition) of the treatment
• cannot be determined by looking at outcomes
• Competence is the ability to detect differences
• can only partly be determined on external grounds
• can be established if difference is detected
• It is a matter of “assay sensitivity”
25. (C) Stephen Senn 2019
A Model for Competence
States of nature: the treatments are equivalent or inequivalent; the trial is competent or not competent; the evidence is an observed difference or no difference
Likelihoods: for each of the four combinations of (in)equivalence and (in)competence, the probabilities of observing a difference and of observing no difference sum to 1
"Priors": P(E), the prior probability of equivalence, and P(C | not E), the probability of competence given non-equivalence
See Senn, S.J., Inherent difficulties
with active control equivalence
studies. Statistics in Medicine, 1993.
12(24): p. 2367-75.
26. (C) Stephen Senn 2019
Interpretation of These Parameters
• Two of the likelihood parameters reflect the ‘precision’ of ‘competent’ experiments
• Their converses are analogous to type I and type II error rates
• These error rates can be reduced by more and more precise experiments
• A further parameter represents the probability that, where a difference between
treatments really does exist, a poor (not competent) experiment will
indicate that it exists
• The joint effect of the two ‘prior’ parameters represents factors beyond our control
• One is the probability that ‘Nature’ has decided the two treatments are
equivalent
• The other is the probability that the trial is competent given that the treatments are
not equivalent
27. (C) Stephen Senn 2019
Notes
Under this formulation of the likelihoods it is irrelevant whether
the trial is competent if the treatments are equivalent.
We could treat the combination EC as impossible.
We require the probability that a competent trial detects a real difference to be greater than the corresponding probability for an incompetent trial, but this is a linguistic convention.
28. (C) Stephen Senn 2019
For those who like formulae
Expressions, obtained from Bayes' theorem and the priors and likelihoods above, for the posterior probabilities of equivalence and non-equivalence given that a difference is, or is not, observed: P(E | D), P(not E | D), P(E | no D) and P(not E | no D). (A reconstruction in assumed notation is sketched below.)
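The Greek symbols on the original slide did not survive extraction, so what follows is only a hedged reconstruction. It simply applies Bayes' theorem to the priors and likelihoods of the two previous slides, writing γ = P(E) for the prior probability of equivalence, θ = P(C | E̅) for the probability of competence given non-equivalence, π = P(D | E̅C) for the probability that a competent trial detects a real difference (this symbol is confirmed in the notes further below), ε = P(D | E̅C̅) for the corresponding probability in an incompetent trial, and α = P(D | E) for the probability that a difference is declared when the treatments are in fact equivalent. The names γ, θ, ε and α are my assumptions; the parameterisation on the original slide and in Senn (1993) may differ.

\begin{align*}
P(E \mid D) &= \frac{\alpha\gamma}{\alpha\gamma + \pi\theta(1-\gamma) + \varepsilon(1-\theta)(1-\gamma)},\\
P(\bar{E} \mid D) &= \frac{\pi\theta(1-\gamma) + \varepsilon(1-\theta)(1-\gamma)}{\alpha\gamma + \pi\theta(1-\gamma) + \varepsilon(1-\theta)(1-\gamma)},\\
P(E \mid \bar{D}) &= \frac{(1-\alpha)\gamma}{(1-\alpha)\gamma + (1-\pi)\theta(1-\gamma) + (1-\varepsilon)(1-\theta)(1-\gamma)},\\
P(\bar{E} \mid \bar{D}) &= 1 - P(E \mid \bar{D}).
\end{align*}

Under this parameterisation P(E̅ | D) can be pushed towards 1 by driving α down (a well designed trial rarely declares a difference when there is none), whereas P(E | D̅) cannot be pushed to 1 merely by making the trial more precise; it also requires the probability of competence θ to be high, which is the asymmetry discussed on the next slide.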
32. Consequences
• Asymmetry between concluding equivalence and difference
• The former is more problematic
• Not just a matter of reformulating the problem
• Conditional on an assumption of competence we can conclude
equivalence
• However, if we have any doubts about competence, these doubts increase by
finding a difference
• Speculation: this is a concrete instance of the more general point
made by Popper and Miller 1987
(C) Stephen Senn 2019
33. Hunt the thimble
• You are looking for a thimble in a room
• Consider two cases
• You find the thimble
• You search but don’t find the thimble
• Inferences about whether the thimble is in the room or not are
fundamentally different in the two cases
• In the first case, you conclude it is, and your competence as a searcher for
thimbles is irrelevant to this conclusion
• In the second case, you may believe that the thimble is not in the room but
this belief depends on your competence as a thimble-searcher, about
which you may come to have doubts
(C) Stephen Senn 2019
34. Third issue: causal versus predictive inference
• Clinical trials can be used to try and answer a number of very
different questions
• Two examples are
• Did the treatment have an effect in these patients?
• A causal purpose
• What will the effect be in future patients?
• A predictive purpose
• Unfortunately, in practice, an answer is produced without stating
what the question was
• Given certain assumptions these questions can be answered using the
same analysis but the assumptions are strong and rarely stated
(C) Stephen Senn 2019
35. Two models
Predictive
• The population is taken to be ‘patients in
general’
• Of course this really means future patients
• They are the ones to whom the treatment
will be applied
• We treat the patients in the trial as an
appropriate selection from this
population
• This does not require them to be typical
but it does require additivity of the
treatment effect
Causal
• We take the patients as fixed
• We want to know what the effect
was for them
• Unfortunately there are missing
counterfactuals
• What would have happened to
control patients given intervention
and vice-versa
• The population is the population of
all possible allocations to the
patients studied
(C) Stephen Senn 2019
36. Coverage probabilities for two questions
Average treatment effect in the population is 300 ml FEV1
Panels: Predictive (LHS) and Causal (RHS)
Horizontal dashed line is the population average effect (LHS & RHS). Blue horizontal bar is the true
trial effect (RHS). Black CIs cover the true effect; red ones do not. (An illustrative simulation is sketched below.)
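A sketch of the sort of simulation that can produce such a figure (all numerical values here are assumptions for illustration, not the ones used for the slide): trial-level true effects are drawn around a population mean of 300 ml, patient responses are drawn around each trial's own effect, and every trial's 95% confidence interval is checked both against the population mean (the predictive question) and against the trial's own effect (the causal question).

import numpy as np

rng = np.random.default_rng(42)
n_trials, n_per_arm = 60, 30
population_mean = 300.0     # population average treatment effect (ml FEV1)
between_trial_sd = 150.0    # assumed variation of the 'local' true effect between trials
within_sd = 400.0           # assumed patient-to-patient variability

covers_population, covers_local = 0, 0
for _ in range(n_trials):
    local_effect = rng.normal(population_mean, between_trial_sd)  # this trial's true effect
    treated = rng.normal(local_effect, within_sd, n_per_arm)
    control = rng.normal(0.0, within_sd, n_per_arm)
    estimate = treated.mean() - control.mean()
    se = np.sqrt(treated.var(ddof=1) / n_per_arm + control.var(ddof=1) / n_per_arm)
    low, high = estimate - 1.96 * se, estimate + 1.96 * se        # normal approximation
    covers_population += low <= population_mean <= high           # predictive question
    covers_local += low <= local_effect <= high                   # causal question

print("intervals covering the population mean  :", covers_population, "of", n_trials)  # typically well below 95%
print("intervals covering the trial's own effect:", covers_local, "of", n_trials)      # close to 95%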
37. Conclusion
• There is a fundamental difference between
• Demonstrating that things are different
• Demonstrating they are the same
• There is a fundamental difference between
• Concluding something had an effect
• Concluding it must always have this effect
• Many features of clinical trials reflect this
• The value of blinding
• Competence (assay sensitivity)
• Causal versus predictive inference
• These are not a consequence of being frequentist
• They are not vanquished by becoming Bayesian
• The choice of a Bayesian or frequentist framework does not depend on this
(C) Stephen Senn 2019
39. The answer to the puzzle
(C) Stephen Senn 2019
Both are equally likely
The prior distribution is uniform.
By the time we have completed the trials the relative frequency will be the probability
But the prior distribution says every probability is equally likely
Therefore it is hardly surprising that every relative frequency will be equally likely (a quick check is given below)
Senn, S.J., Dicing with Death. 2003,
Cambridge: Cambridge University Press.
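The 'hardly surprising' step can be checked directly: under a uniform prior on the success probability, the prior predictive probability of k successes in n trials is the integral of C(n, k) p^k (1 − p)^(n−k) over [0, 1], which equals 1/(n + 1) whatever the value of k. A quick numerical confirmation (the value of n is arbitrary):

from fractions import Fraction
from math import comb, factorial

def prior_predictive(n, k):
    # Integral of p**k * (1 - p)**(n - k) over [0, 1] is the Beta function
    # B(k + 1, n - k + 1) = k! (n - k)! / (n + 1)!
    beta_integral = Fraction(factorial(k) * factorial(n - k), factorial(n + 1))
    return comb(n, k) * beta_integral

n = 10
print([prior_predictive(n, k) for k in range(n + 1)])  # eleven copies of Fraction(1, 11)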
Editor's Notes
In other words, falsificationism is a valuable perspective for Bayesians and Frequentist statisticians alike
See Dicing with Death, Cambridge, 2003, chapter 4
Some general discussion of what it means to be a Bayesian (and also a frequentist) will be found in
Senn, S.J., You may believe you are a Bayesian but you are probably wrong. Rationality, Markets and Morals, 2011. 2: p. 48-66.
See
http://www.frankfurt-school-verlag.de/rmm/downloads/Article_Senn.pdf
de Finetti, B., Theory of Probability, Volume 1. 1974, Chichester: Wiley, p. 141.
This is based on a real example. I was playing the CD Hysteria by Def Leppard when this happened. The example is discussed in more detail in chapter 4 of Statistical Issues in Drug Development.
The four rows give the combinations of the two hypotheses and the two sorts of evidence
The P column gives the marginal prior probability of the “hypothesis”
The evidence column has two sorts of evidence indicated. F for first track on CD and X for any other track.
The Likelihood column gives the conditional probability of the evidence given the hypothesis
The column headed P x L gives the joint probability of a given hypothesis and evidence combination
Strictly speaking, in the de Finetti view, P x L exists directly
The probabilities of the two cases which remain do not add up to 1.
However, since these two cases cover all the possibilities which remain, their combined probability must be 1.
Therefore, we rescale the individual probabilities to make them add to 1.
We can do this without changing their relative value by dividing by their total, 21/120.
This has been done in the table below.
This completes the Bayesian solution and the posterior probability is given in the extra final column
“In an article entitled, "In praise of Bayes", that appeared in The Economist in September 2000, the unnamed author tried to show how a newborn baby could, through successively observed sunrises and the application of Laplace's Law of succession, acquire increasing certainty that the sun would always rise. As The Economist put it, "Gradually, the initial belief that the sun is just as likely as not to rise each morning is modified to become a near-certainty that the sun will always rise". This is false: not so much praise as hype. The Economist had confused the probability that the sun will rise tomorrow with the probability that it will always rise. One can only hope this astronomical confusion at that journal does not also attach to beliefs about share prices.
In praise of Bayes. September 2000.”
Dicing with Death, 2003 p77
See
https://errorstatistics.com/2015/05/09/stephen-senn-double-jeopardy-judge-jeffreys-upholds-the-law-guest-post/
and also
http://www.senns.demon.co.uk/Papers/Comment%20on%20Robert.pdf
See also
http://www.senns.demon.co.uk/Papers/Falsificationism.pdf
There has been a surprising amount of disagreement amongst frequentists as well as amongst Bayesians and of course between the two major camps as to how to analyse such studies. There is no time here to go over all this. However, see http://www.senns.demon.co.uk/Papers/Bioequivalence%20SiM.pdf
for an overview
Also this blog
https://errorstatistics.com/2014/06/05/stephen-senn-blood-simple-the-complicated-and-controversial-world-of-bioequivalence-guest-post/
gives an overview
In fact this was a so-called incomplete blocks cross-over design in which each patient received five of the seven treatments on a total of five days (one day for each treatment) separated by a suitable wash-out. Twenty-one sequences were chosen so that each treatment was used equally often, each of the 21 pairs of treatments was studied in the same number of patients and each treatment appeared equally often in each period. The trial was double blind and six-fold replication was targeted (6 x 21 patients were planned to be recruited). Many different centres were employed and in the event more patients were recruited than planned.
The model to analyse the treatment effect used “patient” and “period” (that is to say day 1, 2, 3, 4 or 5) in addition to treatment as factors.
Rather than presenting the confidence intervals for the difference here I shall just show the time curves for FEV1 (appropriately adjusted for other effects), since these are sufficient to make the point.
A full description of the trials will be found in
Senn, S.J., et al., An incomplete blocks cross-over in asthma: a case study in collaboration, in Cross-over Clinical Trials, J. Vollmar and L.A. Hothorn, Editors. 1997, Fischer: Stuttgart. p. 3-26.
http://www.senns.demon.co.uk/Papers/SELIPATI.pdf
This is the time course in which patient and period effects have been eliminated (in other words it is a fair comparison). Only placebo and the three doses of the new treatment (MTA) are shown. The efficacy of MTA is clearly shown and there is a gratifying dose response.
Unfortunately the highest dose of MTA has an observed effect that is lower than the lowest dose of ISF. The conclusion was, much to everyone’s surprise and dismay, that the formulations differed in potency by a factor of 4 to 1.
“In the case of trial A, the treatment estimate lies outside the region of equivalence. However, the confidence intervals are so wide that exact equality of the treatments is not ruled out. In case B exact equality is ruled out (if the conventions of hypothesis testing are accepted), since there is a significant difference, but the possibility that the true treatment difference lies within the region of equivalence is not. In case C, no treatment difference is observed, but the confidence intervals are so wide that values outside the regions of equivalence are still plausible. In cases D, E and F, practical equivalence is ‘demonstrated’. However, in case E it corresponds to no observed difference at all, whereas in case F the treatments are significantly different (confidence interval does not straddle zero) even though practical equivalence appears to have been demonstrated (confidence interval lies within region of equivalence). In case G there is a significant difference and equivalence may be rejected. ”
Statistical Issues in Drug Development (2nd edition, 2007), Chapter 15
This concrete illustration was first proposed to me by Joachim Roehmel.
See also http://www.senns.demon.co.uk/Papers/Fisher%27s%20game%20with%20the%20Devil.pdf
In other words, to fake results to produce a conclusion that two treatments are different, you would have to know which treatment was which.
To fake results that two treatments are equivalent you do not need to know which treatment is which.
The difference is that in the first case you wish to assert that two distributions are necessary. Thus assignment to the correct distribution is crucial.
In the second case you wish to assert that only one distribution is needed.
“The value of blinding in clinical trials, is essentially this: despite making sure that there are no superficial and nonpharmacological differences which enable us to distinguish one treatment from another (the trial is double-blind), the labels ‘experimental’ and ‘control’ do have an importance for prognosis. Thus, for a conventional trial where such a difference between groups is observed, because the trial has been run double-blind, we are able to assert that the difference between the groups cannot be due to prejudice and must therefore be due either to pharmacology or to chance. The whole purpose of ACES, however, is to be able to assert that there is no difference between treatment and clearly, therefore, blinding does not protect us against the prejudice that all patients ought to have similar outcomes. The point can be illustrated quite simply by considering the task of a statistician who has been ordered to fake equivalence by simulating suitable data. It is clear that he does not even need to know what the treatment codes are. All he needs to do is simulate data from a single Normal distribution with a suitable standard deviation (Senn, 1994). Whatever the allocation of patients, he is almost bound to demonstrate equivalence. If he is required to prove that one treatment is superior to another, however, such a strategy will not work. He needs to know the treatment codes.” Statistical Issues in Drug Development (2nd edition, 2007), Chapter 15
“There is a paradox of competence associated with equivalence trials and that is that the more we tend to provide proof within a trial of the equivalence of the two treatments, the more we ought to suspect that we have not been looking at the issue in the correct way: that the trial is incapable of finding a difference where it exists. In other words, there is more to a proof of equivalence than the matter of reversing the usual roles of null and alternative hypotheses. Even if in a given trial the test results indicated that the effects of the treatments being compared were very similar (as, say, in case D) the possibility could not be ruled out that a trial with different patients, or alternative measurements or some different approach altogether would have succeeded in finding a difference. No probabilistic calculation on the data in hand has anything to say about this possibility: it is essentially a matter of data not collected. There is a difference in kind between ‘proving’ that drugs are similar and proving that they are not similar. This difference is analogous to the difference which exists in principle between a proof of marital infidelity and fidelity. The first may be provided simply enough (in principle) by evidence; the second, if at all, only by a repeated failure to find the evidence which the first demands.”
Statistical Issues in Drug Development (2nd edition, 2007), Chapter 15
See Senn, S.J., Inherent difficulties with active control equivalence studies. Statistics in Medicine, 1993. 12(24): p. 2367-75.
http://www.senns.demon.co.uk/Papers/ACES%20SiM%201993.pdf
In the paper different symbols were used, but the change has been made here to avoid confusion, since α and β are often used for type I and type II error rates.
One could argue that it is the joint effect of three of the parameters that reflects matters beyond our control.
On the other hand, our knowledge of statistics (and experimental design) enables us to fix the remaining two, the error rates of a competent trial
This will probably be skipped over in the lecture
Reminder: one parameter is the prior probability of equivalence, the other is the probability of competence given non-equivalence
It is assumed that a ‘difference’ has been observed
The horizontal axis gives the probability, P(D | E′C) = π, of observing a ‘difference’ given that the trial is competent (C) and that non-equivalence (E′) obtains. Other things being equal, we expect that the more precise the experiment, the bigger this value will be
The vertical axis gives the posterior probability of non-equivalence
A limit is reached as π approaches 1 but this is because the other parameters do not increase
Note to self. The program is “ACES Bayesian.gen” and the location is C:\Users\Stephen\Documents\Genstat\GenStat Files\Research\Equivalence
Now that the relevant error rate has been reduced, the limit for the posterior probability is much higher. In principle, simply by designing better experiments, we can make better and better inferences regarding differences.
However, this slide shows that the same is not true of equivalence. There is a limit to what we can conclude unless we can make a judgement of competence that relies on external matters.
This may seem puzzling, since what is equivalence but that which applies when non-equivalence does not? The real reason is that three alternatives are involved: ‘equivalent’, ‘not competent’ and ‘different’. It is distinguishing between the first two that is the problem.
Popper, K. and Miller, D., ‘Why probabilistic support is not inductive’, Philosophical Transactions of the
Royal Society of London, Series A, 321, 569-591 (1987).
See also The Jealous Husband’s dilemma , Dicing With Death, chapter 4.
Example of a trial in asthma comparing a bronchodilator to placebo using forced expiratory volume in one second (FEV1) in mL
This is a simulation to illustrate the issues. In the simulation a population of patients for whom the treatment effect is not identical has been considered. Each clinical trial has a different average treatment effect because it involves a different (possibly unidentifiable) sub-population of patients. This is achieved by drawing a common patient effect for the trial from an overall distribution.
Sixty trials are simulated
Once this value has been established for the trial, then individual patient values are simulated from the distribution for the trial.
The point estimates (diamonds) and 95% confidence intervals (whiskers) are calculated.
On the LHS the confidence intervals are judged according to whether they cover the population value (given by the horizontal line at 300 mL). Black, yes, red, no. It can be seen that the claimed 95% coverage does not apply.
On the RHS coverage is judged by whether they cover the ‘true’ local effect (which is given by the small blue horizontal bar, which varies from trial to trial). The theory holds up well and in fact, 3 out of 60, that is to say 5%, of the true values are not within the intervals.
Of course formal proofs, using either calculus (integrating out) or mathematical induction, are possible. To understand what your choice of prior distribution commits you to, you have to see the answer. This is an example of what Popper once wrote about scientists liking ‘weak’ proofs because they often bring more understanding.
The example is discussed and a proof by induction is given in chapter 4 of Dicing with Death.