SlideShare a Scribd company logo
1 of 2
Download to read offline
Sufficient Statistics
Guy Lebanon
May 2, 2006
A sufficient statistics with respect to θ is a statistic T(X1, . . . , Xn) that contains all the information that
is useful for the estimation of θ. It is useful a data reduction tool, and studying its properties leads to other
useful results.
Definition 1. A statistic T is sufficient for θ if p(x1, . . . , xn|T(x1, . . . , xn)) is not a function of θ.
A useful way to visualize it is as a Markov chain θ → T(X1, . . . , Xn) → {X1, . . . , Xn} (although in
classical statistics θ is not a random variable but a specific value). Conditioned on the middle part of the
chain, the front and back are independent.
As mentioned above, the intuition behind the sufficient statistic concept is that it contains all the infor-
mation necessary for estimating θ. Therefore if one is interested in estimating θ, it is perfectly fine to ‘get rid’
of the original data while keeping only the value of the sufficient statistic. The motivation connects to the
formal definition by considering the concept of sampling a ghost sample: Consider a statistician who erased
the original data, but kept the sufficient statistic. Since p(x1, . . . , xn|T(x1, . . . , xn)) is not a function of θ
(which is unknown), we assume that it is a known distribution. The statistician can then sample x1, . . . , xn
from that conditional distribution, and that ghost sample can be used in lieu of the original data that was
thrown away.
The definition of sufficient statistic is very hard to verify. A much easier way to find sufficient statistics
is through the factorization theorem.
Definition 2. Let X1, . . . , Xn be iid RVs whose distribution is the pdf fXi
or the pmf pXi
. The likelihood
function is the product of the pdfs or pmfs
L(x1, . . . , xn|θ) =
n
i=1 fXi (xi) Xi is a continuous RV
n
i=1 pXi
(xi) Xi is a discrete RV
.
The likelihood function is sometimes viewed as a function of x1, . . . , xn (fixing θ) and sometimes as a
function of θ (fixing x1, . . . , xn). In the latter case, the likelihood is sometimes denoted L(θ).
Theorem 1 (Factorization Theorem). T is a sufficient statistic for θ if the likelihood factorizes into the
following form
L(x1, . . . , xn|θ) = g(θ, T(x1, . . . , xn)) · h(x1, . . . , xn)
for some functions g, h.
Proof. We prove the theorem only for the discrete case (the continuous case requires different techniques).
First assume the likelihood factorizes as above. Then
p(x1, . . . , xn|T(x1, . . . , xn)) =
p(x1, . . . , xn, T(x1, . . . , xn))
p(T(x1, . . . , xn))
=
p(x1, . . . , xn)
y:T (y)=T (x) p(y1, . . . , yn)
=
h(x1, . . . , xn)
y:T (y)=T (x) h(y1, . . . , yn)
which is not a function of θ. Conversely, assume that T is a sufficient statistic for θ. Then
L(x1, . . . , xn|θ) = p(x1, . . . , xn|T(x1, . . . , xn), θ)p(T(x1, . . . , xn)|θ) = h(x1, . . . , xn)g(T(x1, . . . , xn), θ).
1
Example: A sufficient statistic for Ber(θ) is Xi since
L(x1, . . . , xn|θ) =
i
θxi
(1 − θ)1−xi
= θ
P
xi
(1 − θ)n−
P
xi
= g θ, xi · 1.
Example: A sufficient statistic for the uniform distribution U([0, θ]) is max(X1, . . . , Xn) since
L(x1, . . . , xn|θ) =
i
1
θ
·1{0≤xi≤θ} = θ−n
·1{max(x1,...,xn)≤θ}·1{min(x1,...,xn)≥0} = g(θ, max(x1, . . . , xn))h(x1, . . . , xn).
In the case that θ is a vector rather than a scalar, the sufficient statistic may be a vector as well. In this
case we say that the sufficient statistic vector is jointly sufficient for the parameter vector θ. The definitions
and factorization theorem carry over with little change.
Example: T = ( Xi, X2
i ) are jointly sufficient statistics for θ = (µ, σ2
) for normally distributed data
X1, . . . , Xn ∼ N(µ, σ2
):
L(x1, . . . , xn|θ) =
i
1
√
2πσ2
e−(xi−µ)2
/(2σ2
)
= (2πσ2
)−n/2
e−
P
i(xi−µ)2
/(2σ2
)
= (2πσ2
)−n/2
e−
P
i x2
i /(2σ2
)+2µ
P
i xi/(2σ2
)−µ2
/(2σ2
)
= g(θ, T) · 1 = g(θ, T) · h(x1, . . . , xn)
Clearly, sufficient statistics are not unique. From the factorization theorem it is easy to see that (i)
the identity function T(x1, . . . , xn) = (x1, . . . , xn) is a sufficient statistic vector and (ii) if T is a sufficient
statistic for θ then so is any 1-1 function of T. A function that is not 1-1 of a sufficient statistic may or may
not be a sufficient statistic. This leads to the notion of a minimal sufficient statistic.
Definition 3. A statistic that is a sufficient statistic and that is a function of all other sufficient statistics
is called a minimal sufficient statistic.
In a sense, a minimal sufficient statistic is the smallest sufficient statistic and therefore it represents the
ultimate data reduction with respect to estimating θ. In general, it may or may not exists.
Example: Since T = ( Xi, X2
i ) are jointly sufficient statistics for θ = (µ, σ2
) for normally distributed
data X1, . . . , Xn ∼ N(µ, σ2
), then so are ( ¯X, S2
) which are a 1-1 function of ( Xi, X2
i ).
The following theorem provides a way of verifying that a sufficient statistic is minimal.
Theorem 2. T is a minimal sufficient statistics if
L(x1, . . . , xn|θ)
L(y1, . . . , yn|θ)
is not a function of θ ⇔ T(x1, . . . , xn) = T(y1, . . . , yn).
Proof. First we show that T is a sufficient statistic. For each element in the range of T, fix a sample
yt
1, . . . , yt
n. For arbitrary x1, . . . , xn denote T(x1, . . . , xn) = t and
L(x1, . . . , xn|θ) =
L(x1, . . . , xn|θ)
L(yt
1, . . . , yt
n|θ)
L(yt
1, . . . , yt
n|θ) = h(x1, . . . , xn)g(T(x1, . . . , xn), θ).
We show that T is a function of some other arbitrary sufficient statistic T . Let x, y be such that T (x1, . . . , xn) =
T (y1, . . . , yn). Since
L(x1, . . . , xn|θ)
L(y1, . . . , yn|θ)
=
g (T (x1, . . . , xn), θ)h (x1, . . . , xn)
g (T (y1, . . . , yn), θ)h (y1, . . . , yn)
=
h (x1, . . . , xn)
h (y1, . . . , yn)
is independent of θ, T(x1, . . . , xn) = T(y1, . . . , yn) and T is a 1-1 function of T .
Example: T = ( Xi, X2
i ) is a minimal sufficient statistic for the Normal distribution since the
likelihood ratio is not a function of θ iff T(x) = T(y)
L(x1, . . . , xn|θ)
L(y1, . . . , yn|θ)
= e− 1
2σ2
P
(xi−µ)2
−(yi−µ)2
= e− 1
2σ2 (
P
x2
i −
P
y2
i )+ µ
σ2 (
P
xi−
P
yi)
.
Since ( ¯X, S2
) is a function of T, it is minimal sufficient statistic as well.
2

More Related Content

What's hot

(Approximate) Bayesian computation as a new empirical Bayes (something)?
(Approximate) Bayesian computation as a new empirical Bayes (something)?(Approximate) Bayesian computation as a new empirical Bayes (something)?
(Approximate) Bayesian computation as a new empirical Bayes (something)?Christian Robert
 
Problem_Session_Notes
Problem_Session_NotesProblem_Session_Notes
Problem_Session_NotesLu Mao
 
from model uncertainty to ABC
from model uncertainty to ABCfrom model uncertainty to ABC
from model uncertainty to ABCChristian Robert
 
Statistics (1): estimation, Chapter 1: Models
Statistics (1): estimation, Chapter 1: ModelsStatistics (1): estimation, Chapter 1: Models
Statistics (1): estimation, Chapter 1: ModelsChristian Robert
 
Batch mode reinforcement learning based on the synthesis of artificial trajec...
Batch mode reinforcement learning based on the synthesis of artificial trajec...Batch mode reinforcement learning based on the synthesis of artificial trajec...
Batch mode reinforcement learning based on the synthesis of artificial trajec...Université de Liège (ULg)
 
Beyond function approximators for batch mode reinforcement learning: rebuildi...
Beyond function approximators for batch mode reinforcement learning: rebuildi...Beyond function approximators for batch mode reinforcement learning: rebuildi...
Beyond function approximators for batch mode reinforcement learning: rebuildi...Université de Liège (ULg)
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
. An introduction to machine learning and probabilistic ...
. An introduction to machine learning and probabilistic .... An introduction to machine learning and probabilistic ...
. An introduction to machine learning and probabilistic ...butest
 
Application of interpolation and finite difference
Application of interpolation and finite differenceApplication of interpolation and finite difference
Application of interpolation and finite differenceManthan Chavda
 
CS229 Machine Learning Lecture Notes
CS229 Machine Learning Lecture NotesCS229 Machine Learning Lecture Notes
CS229 Machine Learning Lecture NotesEric Conner
 
Machine learning (1)
Machine learning (1)Machine learning (1)
Machine learning (1)NYversity
 

What's hot (20)

(Approximate) Bayesian computation as a new empirical Bayes (something)?
(Approximate) Bayesian computation as a new empirical Bayes (something)?(Approximate) Bayesian computation as a new empirical Bayes (something)?
(Approximate) Bayesian computation as a new empirical Bayes (something)?
 
Problem_Session_Notes
Problem_Session_NotesProblem_Session_Notes
Problem_Session_Notes
 
from model uncertainty to ABC
from model uncertainty to ABCfrom model uncertainty to ABC
from model uncertainty to ABC
 
Statistics (1): estimation, Chapter 1: Models
Statistics (1): estimation, Chapter 1: ModelsStatistics (1): estimation, Chapter 1: Models
Statistics (1): estimation, Chapter 1: Models
 
Assignment 2 solution acs
Assignment 2 solution acsAssignment 2 solution acs
Assignment 2 solution acs
 
MUMS Opening Workshop - Emulators for models and Complexity Reduction - Akil ...
MUMS Opening Workshop - Emulators for models and Complexity Reduction - Akil ...MUMS Opening Workshop - Emulators for models and Complexity Reduction - Akil ...
MUMS Opening Workshop - Emulators for models and Complexity Reduction - Akil ...
 
Batch mode reinforcement learning based on the synthesis of artificial trajec...
Batch mode reinforcement learning based on the synthesis of artificial trajec...Batch mode reinforcement learning based on the synthesis of artificial trajec...
Batch mode reinforcement learning based on the synthesis of artificial trajec...
 
Slides lln-risques
Slides lln-risquesSlides lln-risques
Slides lln-risques
 
Convolution
ConvolutionConvolution
Convolution
 
Beyond function approximators for batch mode reinforcement learning: rebuildi...
Beyond function approximators for batch mode reinforcement learning: rebuildi...Beyond function approximators for batch mode reinforcement learning: rebuildi...
Beyond function approximators for batch mode reinforcement learning: rebuildi...
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
. An introduction to machine learning and probabilistic ...
. An introduction to machine learning and probabilistic .... An introduction to machine learning and probabilistic ...
. An introduction to machine learning and probabilistic ...
 
1 - Linear Regression
1 - Linear Regression1 - Linear Regression
1 - Linear Regression
 
Application of interpolation and finite difference
Application of interpolation and finite differenceApplication of interpolation and finite difference
Application of interpolation and finite difference
 
CS229 Machine Learning Lecture Notes
CS229 Machine Learning Lecture NotesCS229 Machine Learning Lecture Notes
CS229 Machine Learning Lecture Notes
 
probability assignment help
probability assignment helpprobability assignment help
probability assignment help
 
Slides econ-lm
Slides econ-lmSlides econ-lm
Slides econ-lm
 
Machine learning (1)
Machine learning (1)Machine learning (1)
Machine learning (1)
 
newton raphson method
newton raphson methodnewton raphson method
newton raphson method
 
stochastic processes assignment help
stochastic processes assignment helpstochastic processes assignment help
stochastic processes assignment help
 

Similar to Sufficient Statistics Definition and Factorization Theorem

Fisher_info_ppt and mathematical process to find time domain and frequency do...
Fisher_info_ppt and mathematical process to find time domain and frequency do...Fisher_info_ppt and mathematical process to find time domain and frequency do...
Fisher_info_ppt and mathematical process to find time domain and frequency do...praveenyadav2020
 
Norraine gadiano report
Norraine gadiano reportNorraine gadiano report
Norraine gadiano reportnorraine
 
Contribution of Fixed Point Theorem in Quasi Metric Spaces
Contribution of Fixed Point Theorem in Quasi Metric SpacesContribution of Fixed Point Theorem in Quasi Metric Spaces
Contribution of Fixed Point Theorem in Quasi Metric SpacesAM Publications,India
 
Fixed Point Theorm In Probabilistic Analysis
Fixed Point Theorm In Probabilistic AnalysisFixed Point Theorm In Probabilistic Analysis
Fixed Point Theorm In Probabilistic Analysisiosrjce
 
PaperNo20-hoseinihabibiPMS1-4-2014-PMS
PaperNo20-hoseinihabibiPMS1-4-2014-PMSPaperNo20-hoseinihabibiPMS1-4-2014-PMS
PaperNo20-hoseinihabibiPMS1-4-2014-PMSMezban Habibi
 
Interpolation techniques - Background and implementation
Interpolation techniques - Background and implementationInterpolation techniques - Background and implementation
Interpolation techniques - Background and implementationQuasar Chunawala
 
Chapter 3 – Random Variables and Probability Distributions
Chapter 3 – Random Variables and Probability DistributionsChapter 3 – Random Variables and Probability Distributions
Chapter 3 – Random Variables and Probability DistributionsJasonTagapanGulla
 
Fixed points of contractive and Geraghty contraction mappings under the influ...
Fixed points of contractive and Geraghty contraction mappings under the influ...Fixed points of contractive and Geraghty contraction mappings under the influ...
Fixed points of contractive and Geraghty contraction mappings under the influ...IJERA Editor
 
Statistics (1): estimation Chapter 3: likelihood function and likelihood esti...
Statistics (1): estimation Chapter 3: likelihood function and likelihood esti...Statistics (1): estimation Chapter 3: likelihood function and likelihood esti...
Statistics (1): estimation Chapter 3: likelihood function and likelihood esti...Christian Robert
 
Chapter 5 interpolation
Chapter 5 interpolationChapter 5 interpolation
Chapter 5 interpolationssuser53ee01
 
this materials is useful for the students who studying masters level in elect...
this materials is useful for the students who studying masters level in elect...this materials is useful for the students who studying masters level in elect...
this materials is useful for the students who studying masters level in elect...BhojRajAdhikari5
 
Probability and Statistics
Probability and StatisticsProbability and Statistics
Probability and StatisticsMalik Sb
 
PaperNo15-HoseiniHabibiSafariGhezelbash-IJMA
PaperNo15-HoseiniHabibiSafariGhezelbash-IJMAPaperNo15-HoseiniHabibiSafariGhezelbash-IJMA
PaperNo15-HoseiniHabibiSafariGhezelbash-IJMAMezban Habibi
 
Overview of Stochastic Calculus Foundations
Overview of Stochastic Calculus FoundationsOverview of Stochastic Calculus Foundations
Overview of Stochastic Calculus FoundationsAshwin Rao
 
Common Fixed Theorems Using Random Implicit Iterative Schemes
Common Fixed Theorems Using Random Implicit Iterative SchemesCommon Fixed Theorems Using Random Implicit Iterative Schemes
Common Fixed Theorems Using Random Implicit Iterative Schemesinventy
 

Similar to Sufficient Statistics Definition and Factorization Theorem (20)

Sufficient statistics
Sufficient statisticsSufficient statistics
Sufficient statistics
 
Fisher_info_ppt and mathematical process to find time domain and frequency do...
Fisher_info_ppt and mathematical process to find time domain and frequency do...Fisher_info_ppt and mathematical process to find time domain and frequency do...
Fisher_info_ppt and mathematical process to find time domain and frequency do...
 
Norraine gadiano report
Norraine gadiano reportNorraine gadiano report
Norraine gadiano report
 
Contribution of Fixed Point Theorem in Quasi Metric Spaces
Contribution of Fixed Point Theorem in Quasi Metric SpacesContribution of Fixed Point Theorem in Quasi Metric Spaces
Contribution of Fixed Point Theorem in Quasi Metric Spaces
 
Fixed Point Theorm In Probabilistic Analysis
Fixed Point Theorm In Probabilistic AnalysisFixed Point Theorm In Probabilistic Analysis
Fixed Point Theorm In Probabilistic Analysis
 
PaperNo20-hoseinihabibiPMS1-4-2014-PMS
PaperNo20-hoseinihabibiPMS1-4-2014-PMSPaperNo20-hoseinihabibiPMS1-4-2014-PMS
PaperNo20-hoseinihabibiPMS1-4-2014-PMS
 
Lagrange’s interpolation formula
Lagrange’s interpolation formulaLagrange’s interpolation formula
Lagrange’s interpolation formula
 
Interpolation techniques - Background and implementation
Interpolation techniques - Background and implementationInterpolation techniques - Background and implementation
Interpolation techniques - Background and implementation
 
CONTINUITY ON N-ARY SPACES
CONTINUITY ON N-ARY SPACESCONTINUITY ON N-ARY SPACES
CONTINUITY ON N-ARY SPACES
 
Chapter 3 – Random Variables and Probability Distributions
Chapter 3 – Random Variables and Probability DistributionsChapter 3 – Random Variables and Probability Distributions
Chapter 3 – Random Variables and Probability Distributions
 
Fixed points of contractive and Geraghty contraction mappings under the influ...
Fixed points of contractive and Geraghty contraction mappings under the influ...Fixed points of contractive and Geraghty contraction mappings under the influ...
Fixed points of contractive and Geraghty contraction mappings under the influ...
 
Statistics (1): estimation Chapter 3: likelihood function and likelihood esti...
Statistics (1): estimation Chapter 3: likelihood function and likelihood esti...Statistics (1): estimation Chapter 3: likelihood function and likelihood esti...
Statistics (1): estimation Chapter 3: likelihood function and likelihood esti...
 
Chapter 5 interpolation
Chapter 5 interpolationChapter 5 interpolation
Chapter 5 interpolation
 
this materials is useful for the students who studying masters level in elect...
this materials is useful for the students who studying masters level in elect...this materials is useful for the students who studying masters level in elect...
this materials is useful for the students who studying masters level in elect...
 
Stochastic Processes - part 6
Stochastic Processes - part 6Stochastic Processes - part 6
Stochastic Processes - part 6
 
Probability and Statistics
Probability and StatisticsProbability and Statistics
Probability and Statistics
 
QMC: Operator Splitting Workshop, Stochastic Block-Coordinate Fixed Point Alg...
QMC: Operator Splitting Workshop, Stochastic Block-Coordinate Fixed Point Alg...QMC: Operator Splitting Workshop, Stochastic Block-Coordinate Fixed Point Alg...
QMC: Operator Splitting Workshop, Stochastic Block-Coordinate Fixed Point Alg...
 
PaperNo15-HoseiniHabibiSafariGhezelbash-IJMA
PaperNo15-HoseiniHabibiSafariGhezelbash-IJMAPaperNo15-HoseiniHabibiSafariGhezelbash-IJMA
PaperNo15-HoseiniHabibiSafariGhezelbash-IJMA
 
Overview of Stochastic Calculus Foundations
Overview of Stochastic Calculus FoundationsOverview of Stochastic Calculus Foundations
Overview of Stochastic Calculus Foundations
 
Common Fixed Theorems Using Random Implicit Iterative Schemes
Common Fixed Theorems Using Random Implicit Iterative SchemesCommon Fixed Theorems Using Random Implicit Iterative Schemes
Common Fixed Theorems Using Random Implicit Iterative Schemes
 

More from mustafa sarac

Uluslararasilasma son
Uluslararasilasma sonUluslararasilasma son
Uluslararasilasma sonmustafa sarac
 
Real time machine learning proposers day v3
Real time machine learning proposers day v3Real time machine learning proposers day v3
Real time machine learning proposers day v3mustafa sarac
 
Latka december digital
Latka december digitalLatka december digital
Latka december digitalmustafa sarac
 
Axial RC SCX10 AE2 ESC user manual
Axial RC SCX10 AE2 ESC user manualAxial RC SCX10 AE2 ESC user manual
Axial RC SCX10 AE2 ESC user manualmustafa sarac
 
Array programming with Numpy
Array programming with NumpyArray programming with Numpy
Array programming with Numpymustafa sarac
 
Math for programmers
Math for programmersMath for programmers
Math for programmersmustafa sarac
 
TEGV 2020 Bireysel bagiscilarimiz
TEGV 2020 Bireysel bagiscilarimizTEGV 2020 Bireysel bagiscilarimiz
TEGV 2020 Bireysel bagiscilarimizmustafa sarac
 
How to make and manage a bee hotel?
How to make and manage a bee hotel?How to make and manage a bee hotel?
How to make and manage a bee hotel?mustafa sarac
 
Cahit arf makineler dusunebilir mi
Cahit arf makineler dusunebilir miCahit arf makineler dusunebilir mi
Cahit arf makineler dusunebilir mimustafa sarac
 
How did Software Got So Reliable Without Proof?
How did Software Got So Reliable Without Proof?How did Software Got So Reliable Without Proof?
How did Software Got So Reliable Without Proof?mustafa sarac
 
Staff Report on Algorithmic Trading in US Capital Markets
Staff Report on Algorithmic Trading in US Capital MarketsStaff Report on Algorithmic Trading in US Capital Markets
Staff Report on Algorithmic Trading in US Capital Marketsmustafa sarac
 
Yetiskinler icin okuma yazma egitimi
Yetiskinler icin okuma yazma egitimiYetiskinler icin okuma yazma egitimi
Yetiskinler icin okuma yazma egitimimustafa sarac
 
Consumer centric api design v0.4.0
Consumer centric api design v0.4.0Consumer centric api design v0.4.0
Consumer centric api design v0.4.0mustafa sarac
 
State of microservices 2020 by tsh
State of microservices 2020 by tshState of microservices 2020 by tsh
State of microservices 2020 by tshmustafa sarac
 
Uber pitch deck 2008
Uber pitch deck 2008Uber pitch deck 2008
Uber pitch deck 2008mustafa sarac
 
Wireless solar keyboard k760 quickstart guide
Wireless solar keyboard k760 quickstart guideWireless solar keyboard k760 quickstart guide
Wireless solar keyboard k760 quickstart guidemustafa sarac
 
State of Serverless Report 2020
State of Serverless Report 2020State of Serverless Report 2020
State of Serverless Report 2020mustafa sarac
 
Dont just roll the dice
Dont just roll the diceDont just roll the dice
Dont just roll the dicemustafa sarac
 

More from mustafa sarac (20)

Uluslararasilasma son
Uluslararasilasma sonUluslararasilasma son
Uluslararasilasma son
 
Real time machine learning proposers day v3
Real time machine learning proposers day v3Real time machine learning proposers day v3
Real time machine learning proposers day v3
 
Latka december digital
Latka december digitalLatka december digital
Latka december digital
 
Axial RC SCX10 AE2 ESC user manual
Axial RC SCX10 AE2 ESC user manualAxial RC SCX10 AE2 ESC user manual
Axial RC SCX10 AE2 ESC user manual
 
Array programming with Numpy
Array programming with NumpyArray programming with Numpy
Array programming with Numpy
 
Math for programmers
Math for programmersMath for programmers
Math for programmers
 
The book of Why
The book of WhyThe book of Why
The book of Why
 
BM sgk meslek kodu
BM sgk meslek koduBM sgk meslek kodu
BM sgk meslek kodu
 
TEGV 2020 Bireysel bagiscilarimiz
TEGV 2020 Bireysel bagiscilarimizTEGV 2020 Bireysel bagiscilarimiz
TEGV 2020 Bireysel bagiscilarimiz
 
How to make and manage a bee hotel?
How to make and manage a bee hotel?How to make and manage a bee hotel?
How to make and manage a bee hotel?
 
Cahit arf makineler dusunebilir mi
Cahit arf makineler dusunebilir miCahit arf makineler dusunebilir mi
Cahit arf makineler dusunebilir mi
 
How did Software Got So Reliable Without Proof?
How did Software Got So Reliable Without Proof?How did Software Got So Reliable Without Proof?
How did Software Got So Reliable Without Proof?
 
Staff Report on Algorithmic Trading in US Capital Markets
Staff Report on Algorithmic Trading in US Capital MarketsStaff Report on Algorithmic Trading in US Capital Markets
Staff Report on Algorithmic Trading in US Capital Markets
 
Yetiskinler icin okuma yazma egitimi
Yetiskinler icin okuma yazma egitimiYetiskinler icin okuma yazma egitimi
Yetiskinler icin okuma yazma egitimi
 
Consumer centric api design v0.4.0
Consumer centric api design v0.4.0Consumer centric api design v0.4.0
Consumer centric api design v0.4.0
 
State of microservices 2020 by tsh
State of microservices 2020 by tshState of microservices 2020 by tsh
State of microservices 2020 by tsh
 
Uber pitch deck 2008
Uber pitch deck 2008Uber pitch deck 2008
Uber pitch deck 2008
 
Wireless solar keyboard k760 quickstart guide
Wireless solar keyboard k760 quickstart guideWireless solar keyboard k760 quickstart guide
Wireless solar keyboard k760 quickstart guide
 
State of Serverless Report 2020
State of Serverless Report 2020State of Serverless Report 2020
State of Serverless Report 2020
 
Dont just roll the dice
Dont just roll the diceDont just roll the dice
Dont just roll the dice
 

Recently uploaded

Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learningmisbanausheenparvam
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAbhinavSharma374939
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 

Recently uploaded (20)

Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learning
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog Converter
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 

Sufficient Statistics Definition and Factorization Theorem

  • 1. Sufficient Statistics Guy Lebanon May 2, 2006 A sufficient statistics with respect to θ is a statistic T(X1, . . . , Xn) that contains all the information that is useful for the estimation of θ. It is useful a data reduction tool, and studying its properties leads to other useful results. Definition 1. A statistic T is sufficient for θ if p(x1, . . . , xn|T(x1, . . . , xn)) is not a function of θ. A useful way to visualize it is as a Markov chain θ → T(X1, . . . , Xn) → {X1, . . . , Xn} (although in classical statistics θ is not a random variable but a specific value). Conditioned on the middle part of the chain, the front and back are independent. As mentioned above, the intuition behind the sufficient statistic concept is that it contains all the infor- mation necessary for estimating θ. Therefore if one is interested in estimating θ, it is perfectly fine to ‘get rid’ of the original data while keeping only the value of the sufficient statistic. The motivation connects to the formal definition by considering the concept of sampling a ghost sample: Consider a statistician who erased the original data, but kept the sufficient statistic. Since p(x1, . . . , xn|T(x1, . . . , xn)) is not a function of θ (which is unknown), we assume that it is a known distribution. The statistician can then sample x1, . . . , xn from that conditional distribution, and that ghost sample can be used in lieu of the original data that was thrown away. The definition of sufficient statistic is very hard to verify. A much easier way to find sufficient statistics is through the factorization theorem. Definition 2. Let X1, . . . , Xn be iid RVs whose distribution is the pdf fXi or the pmf pXi . The likelihood function is the product of the pdfs or pmfs L(x1, . . . , xn|θ) = n i=1 fXi (xi) Xi is a continuous RV n i=1 pXi (xi) Xi is a discrete RV . The likelihood function is sometimes viewed as a function of x1, . . . , xn (fixing θ) and sometimes as a function of θ (fixing x1, . . . , xn). In the latter case, the likelihood is sometimes denoted L(θ). Theorem 1 (Factorization Theorem). T is a sufficient statistic for θ if the likelihood factorizes into the following form L(x1, . . . , xn|θ) = g(θ, T(x1, . . . , xn)) · h(x1, . . . , xn) for some functions g, h. Proof. We prove the theorem only for the discrete case (the continuous case requires different techniques). First assume the likelihood factorizes as above. Then p(x1, . . . , xn|T(x1, . . . , xn)) = p(x1, . . . , xn, T(x1, . . . , xn)) p(T(x1, . . . , xn)) = p(x1, . . . , xn) y:T (y)=T (x) p(y1, . . . , yn) = h(x1, . . . , xn) y:T (y)=T (x) h(y1, . . . , yn) which is not a function of θ. Conversely, assume that T is a sufficient statistic for θ. Then L(x1, . . . , xn|θ) = p(x1, . . . , xn|T(x1, . . . , xn), θ)p(T(x1, . . . , xn)|θ) = h(x1, . . . , xn)g(T(x1, . . . , xn), θ). 1
  • 2. Example: A sufficient statistic for Ber(θ) is Xi since L(x1, . . . , xn|θ) = i θxi (1 − θ)1−xi = θ P xi (1 − θ)n− P xi = g θ, xi · 1. Example: A sufficient statistic for the uniform distribution U([0, θ]) is max(X1, . . . , Xn) since L(x1, . . . , xn|θ) = i 1 θ ·1{0≤xi≤θ} = θ−n ·1{max(x1,...,xn)≤θ}·1{min(x1,...,xn)≥0} = g(θ, max(x1, . . . , xn))h(x1, . . . , xn). In the case that θ is a vector rather than a scalar, the sufficient statistic may be a vector as well. In this case we say that the sufficient statistic vector is jointly sufficient for the parameter vector θ. The definitions and factorization theorem carry over with little change. Example: T = ( Xi, X2 i ) are jointly sufficient statistics for θ = (µ, σ2 ) for normally distributed data X1, . . . , Xn ∼ N(µ, σ2 ): L(x1, . . . , xn|θ) = i 1 √ 2πσ2 e−(xi−µ)2 /(2σ2 ) = (2πσ2 )−n/2 e− P i(xi−µ)2 /(2σ2 ) = (2πσ2 )−n/2 e− P i x2 i /(2σ2 )+2µ P i xi/(2σ2 )−µ2 /(2σ2 ) = g(θ, T) · 1 = g(θ, T) · h(x1, . . . , xn) Clearly, sufficient statistics are not unique. From the factorization theorem it is easy to see that (i) the identity function T(x1, . . . , xn) = (x1, . . . , xn) is a sufficient statistic vector and (ii) if T is a sufficient statistic for θ then so is any 1-1 function of T. A function that is not 1-1 of a sufficient statistic may or may not be a sufficient statistic. This leads to the notion of a minimal sufficient statistic. Definition 3. A statistic that is a sufficient statistic and that is a function of all other sufficient statistics is called a minimal sufficient statistic. In a sense, a minimal sufficient statistic is the smallest sufficient statistic and therefore it represents the ultimate data reduction with respect to estimating θ. In general, it may or may not exists. Example: Since T = ( Xi, X2 i ) are jointly sufficient statistics for θ = (µ, σ2 ) for normally distributed data X1, . . . , Xn ∼ N(µ, σ2 ), then so are ( ¯X, S2 ) which are a 1-1 function of ( Xi, X2 i ). The following theorem provides a way of verifying that a sufficient statistic is minimal. Theorem 2. T is a minimal sufficient statistics if L(x1, . . . , xn|θ) L(y1, . . . , yn|θ) is not a function of θ ⇔ T(x1, . . . , xn) = T(y1, . . . , yn). Proof. First we show that T is a sufficient statistic. For each element in the range of T, fix a sample yt 1, . . . , yt n. For arbitrary x1, . . . , xn denote T(x1, . . . , xn) = t and L(x1, . . . , xn|θ) = L(x1, . . . , xn|θ) L(yt 1, . . . , yt n|θ) L(yt 1, . . . , yt n|θ) = h(x1, . . . , xn)g(T(x1, . . . , xn), θ). We show that T is a function of some other arbitrary sufficient statistic T . Let x, y be such that T (x1, . . . , xn) = T (y1, . . . , yn). Since L(x1, . . . , xn|θ) L(y1, . . . , yn|θ) = g (T (x1, . . . , xn), θ)h (x1, . . . , xn) g (T (y1, . . . , yn), θ)h (y1, . . . , yn) = h (x1, . . . , xn) h (y1, . . . , yn) is independent of θ, T(x1, . . . , xn) = T(y1, . . . , yn) and T is a 1-1 function of T . Example: T = ( Xi, X2 i ) is a minimal sufficient statistic for the Normal distribution since the likelihood ratio is not a function of θ iff T(x) = T(y) L(x1, . . . , xn|θ) L(y1, . . . , yn|θ) = e− 1 2σ2 P (xi−µ)2 −(yi−µ)2 = e− 1 2σ2 ( P x2 i − P y2 i )+ µ σ2 ( P xi− P yi) . Since ( ¯X, S2 ) is a function of T, it is minimal sufficient statistic as well. 2