# Reading Birnbaum's (1962) paper, by Li Chenlu

Presentation of Birnbaum's Likelihood Principle foundational paper at the Reading Statistical Classics seminar, Jan. 20, 2013, Université Paris-Dauphine

### Transcript

• 1. On the Foundations of Statistical Inference, by Allan Birnbaum. Presented by LI Chenlu, 2013.01.22
• 5. Contents
  1. Introduction
  2. Part 1
     • Statistical Evidence
     • The Principle of Sufficiency
     • The Principle of Conditionality
     • The Likelihood Principle
  3. Part 2
     • Binary Experiments
     • Finite Parameter Spaces
     • More General Parameter Spaces
     • Bayesian Methods: An Interpretation of the Principle of Insufficient Reason
  4. Conclusion
• 6. Introduction
  The paper studies the likelihood principle (LP) and how the likelihood function can be used to measure the evidence in the data about an unknown parameter.
  • The main aim of the paper is to show, and to discuss the implications of, the fact that the LP is a consequence of the concepts of sufficiency and of conditional frames of reference.
  • The second aim of the paper is to describe how and why these principles are appropriate ways to characterize statistical evidence in parametric models for inference purposes.
• 7. Part 1
  1. Statistical Evidence
  2. The Principle of Sufficiency
  3. The Principle of Conditionality
  4. The Likelihood Principle
• 8. Statistical Evidence
  An experiment $E$ is defined as $E = \{\Omega, S, f(x, \theta)\}$, where $f$ is a density, $\theta$ is the unknown parameter, $\Omega$ is the parameter space, and $S$ is the sample space of outcomes $x$ of $E$. The likelihood function determined by an observed outcome $x$ is $L_x(\theta) = f(x, \theta)$.
  Birnbaum states that the central purpose of the paper is to clarify the essential structure and properties of statistical evidence, termed the evidential meaning of $(E, x)$ and denoted by $\mathrm{Ev}(E, x)$, in various instances. $\mathrm{Ev}(E, x)$ is the evidence about $\theta$ supplied by $x$ and $E$.
• 9. The Principle of Sufficiency
  The Principle of Sufficiency (S): Let $E$ be any experiment with sample space $\{x\}$, and let $t(x)$ be any sufficient statistic. Let $E'$ denote the derived experiment, having the same parameter space, such that when any outcome $x$ of $E$ is observed, the corresponding outcome $t = t(x)$ of $E'$ is observed. Then for each $x$, $\mathrm{Ev}(E, x) = \mathrm{Ev}(E', t)$, where $t = t(x)$.
  In other words, if $t(x)$ is a sufficient statistic for $\theta$, then any inference about $\theta$ should depend on the sample $x$ only through the value $t(x)$.
• 10. The Principle of Sufficiency
  • If $x$ is any specified outcome of any specified experiment $E$, the likelihood function determined by $x$ is the function of $\theta$ given by $c\,f(x, \theta)$, where $c$ is any positive constant.
  • If, for some positive constant $c$, we have $f(x, \theta) = c\,g(y, \theta)$ for all $\theta$, then $x$ and $y$ are said to determine the same likelihood function.
  • If two outcomes $x, x'$ of one experiment determine the same likelihood function, that is, $f(x, \theta) = c\,f(x', \theta)$ for all $\theta$, then there exists a sufficient statistic $t$ such that $t(x) = t(x')$.
• 11. The Principle of Sufficiency
  Lemma 1: If two outcomes $x, x'$ of any experiment $E$ determine the same likelihood function, then they have the same evidential meaning: $\mathrm{Ev}(E, x) = \mathrm{Ev}(E, x')$.
• 12. The Principle of Conditionality
  Definition of a mixture experiment: an experiment $E$ is called a mixture, with components $\{E_h\}$, if it is mathematically equivalent to a two-stage experiment of the following form:
  1. An observation $h$ is taken on a random variable $H$ having a fixed and known distribution $G$ ($G$ does not depend on unknown parameter values).
  2. The corresponding component experiment $E_h$ is carried out, yielding an outcome $x_h$.
  Thus each outcome of $E$ is a pair $(E_h, x_h)$.
• 13. The Principle of Conditionality
  The Principle of Conditionality (C): If an experiment $E$ is a mixture $G$ of components $\{E_h\}$, with possible outcomes $(E_h, x_h)$, then $\mathrm{Ev}(E, (E_h, x_h)) = \mathrm{Ev}(E_h, x_h)$.
  That is, the evidential meaning of any outcome $(E_h, x_h)$ of any experiment $E$ having a mixture structure is the same as the evidential meaning of the corresponding outcome $x_h$ of the corresponding component experiment $E_h$, ignoring otherwise the overall structure of the original experiment $E$.
• 14. The Principle of Conditionality
  Example: suppose that two instruments ($h = 1$ or $2$) are available for use in an experiment, with respective probabilities $p_1 = 0.73$ and $p_2 = 0.27$ of being selected for use. Each instrument gives the observation $y = 1$ or $y = 0$.
  Consider the assertion $\mathrm{Ev}(E, (E_1, 1)) = \mathrm{Ev}(E_1, 1)$. Accepting the experimental conditions, suppose that $E$ leads to selection of the first instrument ($h = 1$). In the hypothetical situation where $E_1$ alone had been carried out, one would be prepared to report either $(E_1, 0)$ or $(E_1, 1)$ as a complete description of the statistical evidence obtained.
• 15. The Principle of Conditionality
  Example (continued): for purposes of informative inference, if $y = 1$ is observed with the first instrument, then the report $(E_1, 1)$ seems to be an appropriate and complete description of the statistical evidence obtained. The "more complete" report $(E, (E_1, 1))$ seems to differ from it only by the addition of recognizably redundant elements, irrelevant to the evidential meaning and evidential interpretation of this outcome of $E$.
• 16. The Likelihood Principle
  The Likelihood Principle (L): If $E$ and $E'$ are any two experiments with a common parameter space, and if $x$ and $y$ are any respective outcomes which determine likelihood functions satisfying $f(x, \theta) = c\,g(y, \theta)$ for some positive constant $c = c(x, y)$ and all $\theta$, then $\mathrm{Ev}(E, x) = \mathrm{Ev}(E', y)$.
  That is, the evidential meaning $\mathrm{Ev}(E, x)$ of any outcome $x$ of any experiment $E$ is characterized completely by the likelihood function $c\,f(x, \theta)$, and is otherwise independent of the structure of $(E, x)$.
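A small numerical illustration of the proportionality condition in (L) may help. The sketch below is not from the slides: it assumes the textbook pairing of a binomial experiment (12 trials fixed in advance, 3 successes observed) with a negative binomial experiment (sampling until the 3rd success, which arrives on trial 12). The function names and the particular numbers are illustrative choices.

```python
from math import comb

def binomial_lik(theta, n=12, x=3):
    """f(x, theta): probability of x successes in n fixed trials."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

def neg_binomial_lik(theta, r=3, n=12):
    """g(y, theta): probability that the r-th success occurs on trial n."""
    return comb(n - 1, r - 1) * theta**r * (1 - theta)**(n - r)

# The ratio f / g should be the same constant c for every theta,
# which is exactly the condition f(x, theta) = c g(y, theta) in (L).
thetas = [0.1, 0.25, 0.5, 0.75, 0.9]
ratios = [binomial_lik(t) / neg_binomial_lik(t) for t in thetas]
print(ratios)
```

Under these assumptions the ratio is the constant comb(12, 3) / comb(11, 2), so (L) would assign the two outcomes the same evidential meaning even though the sampling plans differ.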
• 17. The Likelihood Principle
  Lemma 2: (S) and (C) $\iff$ (L).
  Proof of $\Leftarrow$:
  • That (L) implies (C) follows immediately from the fact that in all cases the likelihood functions determined respectively by $(E, (E_h, x_h))$ and $(E_h, x_h)$ are proportional.
  • That (L) implies (S) follows immediately from Lemma 1.
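• 18. The Likelihood Principle
  Lemma 2: (S) and (C) $\iff$ (L).
  Proof of $\Rightarrow$: Let $E$ and $E'$ denote any two experiments having the same parameter space $\Omega = \{\theta\}$, represented by probability density functions $f(x, \theta)$ and $g(y, \theta)$ on their respective sample spaces $S = \{x\}$ and $S' = \{y\}$. Consider the mixture experiment $E^*$ whose components are just $E$ and $E'$, taken with equal probabilities. Let $z$ denote the sample point of $E^*$, and let $C$ denote any set of points $z$; then $C = A \cup B$, where $A \subset S$ and $B \subset S'$, and
  $$\mathrm{Prob}(Z \in C \mid \theta) = \tfrac{1}{2}\,\mathrm{Prob}(A \mid \theta, E) + \tfrac{1}{2}\,\mathrm{Prob}(B \mid \theta, E').$$
  Let the probability density function representing $E^*$ be denoted by
  $$h(z, \theta) = \begin{cases} \tfrac{1}{2} f(x, \theta) & \text{if } z = x \in S, \\ \tfrac{1}{2} g(y, \theta) & \text{if } z = y \in S'. \end{cases}$$
  From (C), it follows that
  $$\mathrm{Ev}(E^*, (E, x)) = \mathrm{Ev}(E, x) \text{ for each } x \in S, \qquad \mathrm{Ev}(E^*, (E', y)) = \mathrm{Ev}(E', y) \text{ for each } y \in S'. \tag{a}$$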
• 19. The Likelihood Principle
  Proof of $\Rightarrow$ (continued): Let $x, y$ be any two outcomes of $E, E'$ respectively which determine the same likelihood function: $f(x, \theta) = c\,g(y, \theta)$ for all $\theta$, where $c$ is some positive constant. Then $h(x, \theta) = c\,h(y, \theta)$ for all $\theta$, so the two outcomes $(E, x)$ and $(E', y)$ of $E^*$ determine the same likelihood function. It then follows from (S) and Lemma 1 that
  $$\mathrm{Ev}(E^*, (E, x)) = \mathrm{Ev}(E^*, (E', y)). \tag{b}$$
  From (a) and (b) it follows that $\mathrm{Ev}(E, x) = \mathrm{Ev}(E', y)$.
  The result states that any two outcomes $x, y$ of any two experiments $E, E'$ (with the same parameter space) have the same evidential meaning if they determine the same likelihood function.
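The mixture construction used in the proof can be checked on a toy case. The sketch below is only an illustration under stated assumptions: the two experiments, their probabilities, and all function names are hypothetical and not taken from the slides. Each experiment is stored as a matrix with one row per parameter value, and the equal-probability mixture is formed by halving and concatenating rows.

```python
from fractions import Fraction as F

# Hypothetical experiments on a two-point parameter space;
# each row is the outcome distribution under one parameter value.
E = [[F(2, 10), F(8, 10)],
     [F(6, 10), F(4, 10)]]
E_prime = [[F(1, 10), F(5, 10), F(4, 10)],
           [F(3, 10), F(3, 10), F(4, 10)]]

def mixture(exp_a, exp_b):
    """Equal-probability mixture: half of exp_a's row followed by half of exp_b's row."""
    return [[p / 2 for p in row_a] + [p / 2 for p in row_b]
            for row_a, row_b in zip(exp_a, exp_b)]

def likelihood(experiment, outcome):
    """Column of the matrix: the likelihood function of one outcome."""
    return [row[outcome] for row in experiment]

def proportionality_constant(lik_a, lik_b):
    """Return c with lik_a = c * lik_b for every theta, or None if not proportional."""
    ratios = {a / b for a, b in zip(lik_a, lik_b)}
    return ratios.pop() if len(ratios) == 1 else None

E_star = mixture(E, E_prime)
# Outcome x = 0 of E and outcome y = 0 of E' have proportional likelihoods ...
c_components = proportionality_constant(likelihood(E, 0), likelihood(E_prime, 0))
# ... and so do the corresponding outcomes (E, x) and (E', y) of the mixture
# (columns 0 and 2 of E_star), with the same constant c.
c_mixture = proportionality_constant(likelihood(E_star, 0), likelihood(E_star, 2))
print(c_components, c_mixture)
```

In this toy case the constant carries over unchanged to the mixture, which is the step $h(x, \theta) = c\,h(y, \theta)$ that lets (S) be applied inside $E^*$.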
• 20. The Likelihood Principle
  Impact of the principle:
  • The implication $\Rightarrow$ is the most important part of the equivalence, because it means that if you do not accept (L), you have to discard either (S) or (C), two widely accepted principles.
  • The most important consequence of (L) seems to be that evidential measures based on a specific experimental frame of reference (like p-values and confidence levels) are somewhat unsatisfactory.
  • In other words, (L) eliminates the need to consider the sample space, or any part of it, once the data are observed. Lemma 2 truly was a "breakthrough" in the foundations of statistical inference and made (L) stand on its own ground, independent of a Bayesian argument.
• 21. Part 2
  1. Binary Experiments
  2. Finite Parameter Spaces
  3. More General Parameter Spaces
  4. Bayesian Methods
• 22. Binary Experiments
  Let $\Omega = \{\theta_1, \theta_2\}$. In this case, (L) means that all the information lies in the likelihood ratio $\lambda(x) = f(x, \theta_2) / f(x, \theta_1)$.
  The question is now: what evidential meaning can we attach to the number $\lambda(x)$?
  To answer this, Birnbaum first considers a binary experiment in which the sample space has only two points, denoted $(+)$ and $(-)$, and such that $p(+ \mid \theta_1) = p(- \mid \theta_2) = \alpha$ for some $\alpha \le \tfrac{1}{2}$. Such an experiment is called a symmetric simple binary experiment and is characterized by the "error" probability $\alpha$.
• 23. Binary Experiments
  For such an experiment, $\lambda(+) = (1 - \alpha)/\alpha \ge 1$, so that $\alpha = 1/(1 + \lambda(+))$, and $\lambda(-) = \alpha/(1 - \alpha) \le 1$. The important point now is that, according to (L), two experiments with the same value of $\lambda$ have the same evidential meaning about the value of $\theta$.
  Therefore, the evidential meaning of $\lambda(x) \ge 1$ from any binary experiment $E$ is the same as the evidential meaning of the $(+)$ outcome from a symmetric simple binary experiment with $\alpha(x) = 1/(1 + \lambda(x))$.
  $\alpha(x)$ is called the intrinsic significance level and is a measure of evidence that satisfies (L).
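The mapping from a likelihood ratio to an intrinsic significance level is simple enough to sketch directly. The function names and the numeric inputs below are illustrative assumptions, and rejecting $\lambda < 1$ with an exception is a choice made for this sketch (the slides only define $\alpha(x)$ for $\lambda(x) \ge 1$).

```python
def likelihood_ratio(f_theta1, f_theta2):
    """lambda(x) = f(x, theta_2) / f(x, theta_1)."""
    return f_theta2 / f_theta1

def intrinsic_significance_level(lam):
    """alpha(x) = 1 / (1 + lambda(x)), defined here only for lambda(x) >= 1."""
    if lam < 1:
        raise ValueError("this sketch only handles lambda(x) >= 1")
    return 1 / (1 + lam)

# The (+) outcome of a symmetric simple binary experiment with alpha = 0.05 ...
alpha_symmetric = intrinsic_significance_level((1 - 0.05) / 0.05)
# ... and an outcome of some other (hypothetical) binary experiment whose
# densities at theta_1 and theta_2 are 0.02 and 0.38, giving the same ratio.
alpha_other = intrinsic_significance_level(likelihood_ratio(0.02, 0.38))
print(alpha_symmetric, alpha_other)
```

Both outcomes have a likelihood ratio of about 19, so under (L) they carry the same evidence and receive the same intrinsic significance level.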
• 24. Finite Parameter Spaces
  Let $E$ be any experiment with a parameter space containing only a finite number $k$ of points, $\theta = i$, $i = 1, 2, \ldots, k$. Any observed outcome $x$ of $E$ determines a likelihood function $L(i) = c\,f(x, i)$, $i = 1, \ldots, k$. We can assume that
  $$\sum_{i=1}^{k} L(i) = 1.$$
  Any experiment $E$ with a finite sample space $j = 1, \ldots, m$ and a finite parameter space is represented by a stochastic matrix
  $$E = (p_{ij}) = \begin{pmatrix} p_{11} & \cdots & p_{1m} \\ \vdots & \ddots & \vdots \\ p_{k1} & \cdots & p_{km} \end{pmatrix},$$
  where $\sum_{j=1}^{m} p_{ij} = 1$ and $p_{ij} = \mathrm{Prob}[j \mid i]$ for each $i, j$. Here the $i$-th row is the discrete probability distribution $p_{ij}$ given by parameter value $i$, and the $j$-th column is proportional to the likelihood function $L(i) = L(i \mid j) = c\,p_{ij}$, $i = 1, \ldots, k$, determined by outcome $j$.
• 25. Finite Parameter Spaces: Qualitative evidential interpretation
  Example 1: consider an experiment with only two sample points, $j = 1, 2$. We can define $\mathrm{Prob}[j = 1 \mid i] = L(i)$ and $\mathrm{Prob}[j = 2 \mid i] = 1 - L(i)$ for $i = 1, \ldots, k$. For example, the likelihood function $L(i) = \tfrac{1}{3}$, $i = 1, 2, 3$, represents the possible outcome $j = 1$ of the experiment
  $$E = \begin{pmatrix} \tfrac{1}{3} & \tfrac{2}{3} \\ \tfrac{1}{3} & \tfrac{2}{3} \\ \tfrac{1}{3} & \tfrac{2}{3} \end{pmatrix}.$$
  • Since this experiment gives the same distribution on the two-point sample space under each hypothesis, it is completely uninformative. According to the likelihood principle, we can therefore conclude that the given likelihood function has a simple evidential interpretation, regardless of the structure of the experiment: it represents a completely uninformative outcome.
• 26. Finite Parameter Spaces: Qualitative evidential interpretation
  Example 2: the likelihood function $(\tfrac{1}{2}, \tfrac{1}{2}, 0)$, that is, $L(1) = L(2) = \tfrac{1}{2}$ and $L(3) = 0$ on the three-point parameter space $i = 1, 2, 3$, represents the possible outcome $j = 1$ of the experiment
  $$E = \begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & \tfrac{1}{2} \\ 0 & 1 \end{pmatrix}.$$
  • This outcome of $E$ is impossible under $i = 3$, and hence supports without risk of error the conclusion that $i \ne 3$.
  • $E$ prescribes identical distributions under $i = 1$ and $i = 2$, and hence the experiment $E$, and each of its possible outcomes, is completely uninformative as between $i = 1$ and $i = 2$.
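• 27. Finite Parameter Spaces: Qualitative evidential interpretation
  Example 3: some likelihood functions on a given parameter space can be compared and ordered in a natural way. Consider the likelihood functions $(0.8, 0.1, 0.1)$ and $(0.45, 0.275, 0.275)$. The interpretation that the first is more informative than the second is supported as follows. The first is the outcome $j = 1$ of
  $$E = (p_{ij}) = \begin{pmatrix} 0.8 & 0.2 \\ 0.1 & 0.9 \\ 0.1 & 0.9 \end{pmatrix}.$$
  When outcome $j = 2$ of $E$ is observed, we report $w = 1$ with probability $\tfrac{1}{2}$ and $w = 2$ with probability $\tfrac{1}{2}$. When outcome $j = 1$ of $E$ is observed, the report $w = 1$ is given. The reports $w$ then follow the experiment
  $$E' = (p_{iw}) = \begin{pmatrix} 0.9 & 0.1 \\ 0.55 & 0.45 \\ 0.55 & 0.45 \end{pmatrix},$$
  whose outcome $w = 1$ has likelihood function proportional to $(0.9, 0.55, 0.55)$, that is, $(0.45, 0.275, 0.275)$ after normalization. The experiment $E'$ is less informative than $E$.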
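The arithmetic behind Example 3 can be reproduced with a few lines of exact rational arithmetic. This is a minimal sketch: the reporting rule is encoded as a matrix named `M` (a name introduced here, not in the slides), and $E'$ is obtained as the matrix product of $E$ and `M`.

```python
from fractions import Fraction as F

# Experiment E from Example 3: rows are parameter values i, columns outcomes j.
E = [[F(8, 10), F(2, 10)],
     [F(1, 10), F(9, 10)],
     [F(1, 10), F(9, 10)]]

# Reporting rule: M[j][w] is the probability of reporting w given outcome j.
# Outcome j = 1 always gives w = 1; outcome j = 2 gives w = 1 or 2 with prob. 1/2.
M = [[F(1), F(0)],
     [F(1, 2), F(1, 2)]]

def compose(experiment, rule):
    """Distribution of the report w under each parameter value (matrix product)."""
    return [[sum(row[j] * rule[j][w] for j in range(len(rule)))
             for w in range(len(rule[0]))]
            for row in experiment]

def normalized_likelihood(experiment, outcome):
    """Column of the matrix, rescaled so that its entries sum to 1."""
    column = [row[outcome] for row in experiment]
    total = sum(column)
    return [p / total for p in column]

E_prime = compose(E, M)
lik_first = normalized_likelihood(E, 0)
lik_second = normalized_likelihood(E_prime, 0)
print(E_prime, lik_first, lik_second)
```

The second likelihood function arises from the first experiment followed by a randomization that does not depend on the parameter, which is the sense in which $E'$ is less informative than $E$.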
• 28. Finite Parameter Spaces: Intrinsic confidence methods
  Example 4: consider the likelihood function $(0.9, 0.09, 0.01)$ defined on the parameter space $i = 1, 2, 3$. This represents the possible outcome $j = 1$ of the experiment
  $$E = (p_{ij}) = \begin{pmatrix} 0.9 & 0.01 & 0.09 \\ 0.09 & 0.9 & 0.01 \\ 0.01 & 0.09 & 0.9 \end{pmatrix}.$$
  • In this experiment, a confidence set estimator of the parameter $i$ is given by taking, for each possible outcome $j$, the two values of $i$ having the greatest likelihoods $L(i \mid j)$.
  • We can verify that under each value of $i$, the probability is 0.99 that the confidence set determined in this way contains $i$; that is, these confidence sets have confidence coefficient 0.99.
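The coverage claim in Example 4 can be verified mechanically. The sketch below uses 0-based indices for both parameter values and outcomes, and the function names are illustrative.

```python
from fractions import Fraction as F

# Experiment E from Example 4: rows are parameter values i, columns outcomes j.
E = [[F(90, 100), F(1, 100), F(9, 100)],
     [F(9, 100), F(90, 100), F(1, 100)],
     [F(1, 100), F(9, 100), F(90, 100)]]

def confidence_set(experiment, j, size=2):
    """The `size` parameter values (0-based) with greatest likelihood L(i | j)."""
    column = [row[j] for row in experiment]
    ranked = sorted(range(len(column)), key=lambda i: column[i], reverse=True)
    return set(ranked[:size])

def coverage(experiment, i, size=2):
    """Probability, under parameter value i, that the confidence set contains i."""
    return sum(p for j, p in enumerate(experiment[i])
               if i in confidence_set(experiment, j, size))

coverages = [coverage(E, i) for i in range(3)]
print(coverages)
```

Each of the three coverage probabilities comes out as 99/100, matching the confidence coefficient stated on the slide.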
• 29. Finite Parameter Spaces: Intrinsic confidence methods
  The general form of the intrinsic confidence methods applies to any likelihood function $L(i)$ defined on a finite parameter space $i = 1, \ldots, k$ and such that
  $$\sum_{i=1}^{k} L(i) = 1.$$
  If there is a unique least likely value $i_1$ of $i$, let $c_1 = 1 - L(i_1)$. Then the remaining $(k - 1)$ parameter points are called an intrinsic confidence set with intrinsic confidence coefficient $c_1$. If there is a pair of values of $i$, say $i_1, i_2$, with likelihoods strictly smaller than those of the remaining $(k - 2)$ points, call the latter set of points an intrinsic confidence set with intrinsic confidence level $c_2 = 1 - L(i_1) - L(i_2)$; and so on. The corresponding constructed experiment is
  $$E = (p_{ij}) = \begin{pmatrix} L(1) & L(k) & L(k-1) & \cdots \\ L(2) & L(1) & L(k) & \cdots \\ L(3) & L(2) & L(1) & \cdots \\ \vdots & \vdots & \vdots & \\ L(k) & L(k-1) & L(k-2) & \cdots \end{pmatrix}.$$
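A short sketch of this construction follows. It assumes that the matrix on the slide is the cyclic one in which each column is the previous column shifted down by one position, which is consistent with the rows shown and with Example 4; function names and 0-based indexing are choices made here.

```python
from fractions import Fraction as F

def cyclic_experiment(lik):
    """Stochastic matrix whose first column is `lik` and whose later columns
    are cyclic shifts of it: entry (i, j) is lik[(i - j) mod k]."""
    k = len(lik)
    return [[lik[(i - j) % k] for j in range(k)] for i in range(k)]

def intrinsic_confidence_set(lik, n_excluded=1):
    """Drop the n_excluded least likely parameter values (0-based indices).
    Returns the remaining set and the coefficient 1 - sum of dropped likelihoods."""
    order = sorted(range(len(lik)), key=lambda i: lik[i])
    dropped, kept = order[:n_excluded], order[n_excluded:]
    if max(lik[i] for i in dropped) >= min(lik[i] for i in kept):
        raise ValueError("excluded values must be strictly less likely than the rest")
    return set(kept), 1 - sum(lik[i] for i in dropped)

L = [F(90, 100), F(9, 100), F(1, 100)]   # likelihood function of Example 4
E = cyclic_experiment(L)
set_1, c_1 = intrinsic_confidence_set(L, 1)
set_2, c_2 = intrinsic_confidence_set(L, 2)
print(E, set_1, c_1, set_2, c_2)
```

For the likelihood function of Example 4 the constructed matrix coincides with the experiment given there, and the two intrinsic confidence coefficients are 0.99 and 0.9.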
• 30. Finite Parameter Spaces
  • For finite parameter spaces, significance levels, confidence sets, and confidence levels can be based on the observed $L_x(\theta)$ alone, hence satisfying (L). They are defined as the regular methods and concepts of that kind for a constructed experiment with a likelihood function identical to $L_x(\theta)$.
  • Therefore, in the case of finite parameter spaces, a clear and logical evidential interpretation of the likelihood function can be given through intrinsic methods and concepts.
• 31. More General Parameter Spaces
  • This section deals mainly with the case where $\Omega$ is the real line. Given $E$, $x$, and $L_x(\theta)$, a hypothetical experiment $E'$ consisting of a single observation of $Y$ with density $g(y, \theta) = c\,L_x(\theta - y)$ is then constructed.
  • Then $(E, x)$ has the same likelihood function as $(E', 0)$, and (L) implies that the same inference should be used in $(E, x)$ as in $(E', 0)$. For example, if a regular $(1 - \alpha)$ confidence interval in $E'$ is used, then this interval estimate (for $y = 0$) should be the one used also for $(E, x)$, and it is called a $(1 - \alpha)$ intrinsic confidence interval for $(E, x)$.
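As a hedged illustration of the construction above, suppose the observed likelihood is that of a normal mean with known standard deviation, so that the normalized likelihood $c\,L_x(\cdot)$ is a normal density centered at the sample mean. In $E'$ the quantity $\theta - Y$ then has density $c\,L_x$, and an equal-tailed interval evaluated at $y = 0$ is just a pair of quantiles of the normalized likelihood. The normal model, the equal-tailed choice, and the function name are all assumptions of this sketch, not part of the slides.

```python
from statistics import NormalDist

def intrinsic_interval_normal(xbar, sigma, n, alpha=0.05):
    """Equal-tailed (1 - alpha) intrinsic confidence interval, assuming the
    likelihood L_x(theta) is proportional to a N(xbar, sigma^2 / n) density."""
    # Normalized likelihood c * L_x, viewed as a density in theta.
    normalized_lik = NormalDist(mu=xbar, sigma=sigma / n ** 0.5)
    # Quantiles of theta - Y in E'; at y = 0 they are the interval endpoints.
    lower = normalized_lik.inv_cdf(alpha / 2)
    upper = normalized_lik.inv_cdf(1 - alpha / 2)
    return lower, upper

lower, upper = intrinsic_interval_normal(xbar=10.0, sigma=2.0, n=16)
print(lower, upper)
```

In this particular model the intrinsic interval coincides numerically with the usual interval $\bar{x} \pm z_{1-\alpha/2}\,\sigma/\sqrt{n}$, although its justification here runs only through the likelihood function.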
• 32. More General Parameter Spaces
  As a general comment, Birnbaum emphasizes that intrinsic methods and concepts can, in light of (L), be nothing more than methods of expressing evidential meaning already implicit in $L_x(\theta)$ itself.
  In the discussion, Birnbaum does not recommend intrinsic methods as statistical methods in practice. The value of these methods is conceptual, and the main use of intrinsic concepts is to show that likelihood functions as such are evidentially meaningful.
• 33. Bayesian Methods: An Interpretation of the Principle of Insufficient Reason
  Birnbaum views the Bayes approach as not directed to informative inference, but rather as a way to determine an appropriate final synthesis of available information, based on prior information and data. It is observed that in determining the posterior distribution, the contribution of the data and $E$ is $L_x(\theta)$ only, so the Bayes approach implies (L).
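The remark that the posterior depends on the data only through $L_x(\theta)$ can be illustrated on a finite parameter space. In the sketch below the prior and the two likelihood functions are hypothetical numbers chosen so that the likelihoods are proportional; the function name is illustrative.

```python
from fractions import Fraction as F

def posterior(prior, lik):
    """Bayes' rule on a finite parameter space: prior times likelihood, normalized."""
    joint = [p * l for p, l in zip(prior, lik)]
    total = sum(joint)
    return [q / total for q in joint]

prior = [F(1, 2), F(3, 10), F(1, 5)]          # hypothetical prior on i = 1, 2, 3
lik_a = [F(8, 10), F(1, 10), F(1, 10)]        # likelihood from one experiment
lik_b = [F(16, 100), F(2, 100), F(2, 100)]    # proportional likelihood (lik_a = 5 * lik_b)

post_a = posterior(prior, lik_a)
post_b = posterior(prior, lik_b)
print(post_a, post_b)
```

The constant of proportionality cancels in the normalization, so the two posteriors are identical, which is the sense in which the Bayes approach implies (L).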
• 34. Conclusion
  • Birnbaum's main result, that the LP follows from the sufficiency and conditionality principles that most statisticians accept, must be regarded as one of the deepest theorems of theoretical statistics, yet the proof is unbelievably simple.
  • The result had a decisive influence on how many statisticians came to view the likelihood function as a basic quantity in statistical analysis.
  • It has also affected in a general way how we view the science of statistics. Birnbaum introduced principles of equivalence within and between experiments, showing various relationships between these principles. This made it possible to discuss the different concepts from alternative viewpoints.
• 35. References
  • Allan Birnbaum, "On the foundations of statistical inference: binary experiment", Institute of Mathematical Sciences, New York University.
  • Daniel Steel, "Bayesian confirmation theory and the likelihood principle", Michigan State University.
  • R. Royall, "Statistical Evidence: A Likelihood Paradigm", Chapman and Hall, London.
  • Jan F. Bjørnstad, "Breakthroughs in Statistics, Volume I: Foundations and Basic Theory", The University of Trondheim.
• 36. The end! Thank you!