A Union of Bayesian/Fiducial/Frequentist Approaches:
Uncertainty Quantification by Artificial Samples
Min-ge Xie
Department of Statistics, Rutgers University
VICE Conference on epistemic uncertainty in engineering
Liverpool, UK (online), February 5, 2021
Research supported in part by grants from NSF
The Question: A Union of BFF Inferences?
Statistical inference
Infer the underlying truth (population) based on observed data
(Estimation and testing with a quantified uncertainty)
Three statistical paradigms
Bayesian (oldest; regaining some popularity)
Fiducial (intermediate; least popular)
Frequentist (newest; most popular)
One “sticky” question for all statisticians
Can Bayesian, Frequentist and Fiducial (BFF) inferences come together
in a union?
Bayesian, Fiducial and Frequentist = BFF = Best Friends Forever!
– Through the lenses of “distribution estimation” (confidence distribution, etc.)
and simulation (Monte-Carlo/artificial samples, etc.)
Outline
Introduction: articulate the logic behind the CD developments
– CD is a purely frequentist concept, but links to Bayesian inference
concepts and fiducial distribution
A union of Bayesian, Frequentist and Fiducial (BFF) inferences
– Artificial data sample (“fake data”) for inference
– Two versions of a parameter across the BFF paradigms
Some reflection (a naive take) on epistemic vs aleatory uncertainty
– Could/Should an epistemic Bayesian prior be viewed as an
aleatory summary of historical data (knowledge)?
Introduction to confidence distribution (CD)
Statistical inference (Parameter estimation):
Point estimate
Interval estimate
Distribution estimate (e.g., confidence distribution)
Example: Y1, . . . , Yn i.i.d. follows N(µ, 1)
Point estimate: ȳn = (y1 + · · · + yn)/n
Interval estimate: (ȳn − 1.96/√n, ȳn + 1.96/√n)
Distribution estimate: N(ȳn, 1/n)
The idea of the CD approach is to use a sample-dependent distribution (or
density) function to estimate the parameter of interest.
(Xie & Singh 2013; Schweder & Hjort 2016)
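For concreteness, here is a minimal Python sketch of the three levels of estimation for this example; the simulated data, seed and sample size are illustrative assumptions, not part of the talk.

```python
import numpy as np
from scipy import stats

# Simulated data standing in for y1, ..., yn (seed and mu are assumptions).
rng = np.random.default_rng(0)
n, mu_true = 100, 0.3
y = rng.normal(mu_true, 1.0, size=n)

ybar = y.mean()                                            # point estimate
ci = (ybar - 1.96 / np.sqrt(n), ybar + 1.96 / np.sqrt(n))  # interval estimate
cd = stats.norm(loc=ybar, scale=1 / np.sqrt(n))            # distribution estimate N(ybar, 1/n)

print(ybar, ci)
print(cd.interval(0.95))  # the CD reproduces the same 95% confidence interval
```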
CD is very informative –
Point estimators, confidence intervals, p-values & more
CD can provide meaningful answers for all questions in statistical inference
(cf., Xie & Singh 2013; Singh et al. 2007)
Definition: Confidence Distribution
Definition:
A confidence distribution (CD) is a sample-dependent distribution
function on parameter space that can represent confidence intervals
(regions) of all levels for a parameter of interest.
– Cox (2013, Int. Stat. Rev. ): The CD approach is “to provide
simple and interpretable summaries of what can reasonably be
learned from data (and an assumed model).”
– Efron (2013, Int. Stat. Rev. ): The CD development is “a
grounding process” to help solve “perhaps the most important
unresolved problem in statistical inference” on “the use of Bayes
theorem in the absence of prior information.”
– Wide range of examples: bootstrap distribution, (normalized) likelihood
function, empirical likelihood, p-value functions, fiducial distributions,
some informative priors and Bayesian posteriors, among others
CD examples
Under regularity conditions, we can prove that a normalized likelihood
function (with respect to the parameter θ),
L(θ|data) / ∫ L(θ|data) dθ,
is a density function on Θ (i.e., a confidence density function).
Example: Y1, . . . , Yn i.i.d. follows N(µ, 1)
Likelihood function:
L(µ|data) = Π f(yᵢ|µ) = C exp{−(1/2) Σ(yᵢ − µ)²} = C exp{−(n/2)(ȳn − µ)² − (1/2) Σ(yᵢ − ȳn)²}
Normalized with respect to µ:
L(µ|data) / ∫ L(µ|data) dµ = ... = (1/√(2π/n)) exp{−(n/2)(µ − ȳn)²}
It is the density of N(ȳn, 1/n)!
(cf. Fraser & McDunnough 1984; Singh et al. 2007)
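A quick numerical check of this identity (a sketch with simulated data; the seed and grid are assumptions):

```python
import numpy as np
from scipy.stats import norm

# The N(mu, 1) likelihood, normalized over mu, coincides with the
# N(ybar, 1/n) confidence density.
rng = np.random.default_rng(1)
y = rng.normal(0.3, 1.0, size=100)
n, ybar = len(y), y.mean()

mu = np.linspace(ybar - 0.5, ybar + 0.5, 2001)
loglik = np.array([norm.logpdf(y, m, 1.0).sum() for m in mu])
lik = np.exp(loglik - loglik.max())          # rescale to avoid underflow
dens = lik / (lik.sum() * (mu[1] - mu[0]))   # normalize w.r.t. mu

print(np.abs(dens - norm.pdf(mu, ybar, 1 / np.sqrt(n))).max())  # ~ 0
```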
CD examples
Under regularity conditions, we can prove that the so-called p-value function
is often a confidence distribution.
Example: Y1, . . . , Yn i.i.d. follows N(µ, 1)
One-sided test: K0 : µ = µ0 versus Ka : µ > µ0
p(µ0) = P(Ȳ > ȳn) = 1 − Φ(√n{ȳn − µ0}) = Φ(√n{µ0 − ȳn}).
Varying µ0 ∈ Θ =⇒ the cumulative distribution function of N(ȳn, 1/n) on Θ!
Suppose n = 100 and we observe ȳn = 0.3:

µ0:     −0.1     0        0.1      0.1355   0.2      0.3   0.4645   0.496
p(µ0):  .00002   .00135   .02275   .05      .15866   .5    .95      .975
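The table can be reproduced directly from p(µ0) = Φ(√n(µ0 − ȳn)); a small sketch:

```python
import numpy as np
from scipy.stats import norm

n, ybar = 100, 0.3
mu0 = np.array([-0.1, 0.0, 0.1, 0.1355, 0.2, 0.3, 0.4645, 0.496])
p = norm.cdf(np.sqrt(n) * (mu0 - ybar))   # p(mu0) = Phi(sqrt(n)(mu0 - ybar))
for m, pv in zip(mu0, p):
    print(f"mu0 = {m:7.4f}   p(mu0) = {pv:.5f}")
# Varying mu0 traces out the cdf of N(ybar, 1/n) -- a confidence distribution.
```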
Three forms of CD presentations
[Figure: the confidence density, confidence distribution and confidence curve of N(ȳn, 1/n), each plotted against µ]
Confidence density: in the form of a density function hn(θ),
e.g., N(ȳn, 1/n) as hn(θ) = (1/√(2π/n)) exp{−(n/2)(θ − ȳn)²}
Confidence distribution: in the form of a cumulative distribution function Hn(θ),
e.g., N(ȳn, 1/n) as Hn(θ) = Φ(√n(θ − ȳn))
Confidence curve: CVn(θ) = 2 min{Hn(θ), 1 − Hn(θ)},
e.g., N(ȳn, 1/n) as CVn(θ) = 2 min{Φ(√n(θ − ȳn)), 1 − Φ(√n(θ − ȳn))}
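A sketch computing the three presentations on a grid and reading a 95% interval off the confidence curve (the grid choices are assumptions):

```python
import numpy as np
from scipy.stats import norm

n, ybar = 100, 0.3
theta = np.linspace(0.0, 0.6, 601)

h = norm.pdf(theta, ybar, 1 / np.sqrt(n))   # confidence density h_n
H = norm.cdf(np.sqrt(n) * (theta - ybar))   # confidence distribution H_n
CV = 2 * np.minimum(H, 1 - H)               # confidence curve CV_n

# Reading the 95% CI off the confidence curve: {theta : CV(theta) >= 0.05}.
inside = theta[CV >= 0.05]
print(inside.min(), inside.max())           # about (0.104, 0.496)
```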
Three forms of CD presentations (more)
Figure: Confidence curve CVn(θ) = 2 min{Φ(√n(θ − ȳ)), 1 − Φ(√n(θ − ȳ))}, with horizontal cuts at 1 − confidence level = 0.2, 0.05 and 0.01 marking the 80%, 95% and 99% CIs. The sample data used to generate the plot are from N(1.35, 1) with n = 5. [Plot omitted]
Three forms of CD presentations (more)
Figure: Upper/lower CDs (c-box) and the corresponding CV. Upper CD:
H⁺n(θ) = Σ_{x<k≤n} (n choose k) θᵏ(1 − θ)ⁿ⁻ᵏ;
lower CD:
H⁻n(θ) = Σ_{x≤k≤n} (n choose k) θᵏ(1 − θ)ⁿ⁻ᵏ.
The sample data used to generate the plots are from Binomial(10, 0.3). [Plots omitted]
(c-box: Ferson et al 2003; Ferson et al 2013)
(upper/lower CDs: Cui and Xie 20; Thornton and Xie 20; Luo et al 20)
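A sketch of the two binomial tail sums with scipy (x = 3 successes out of n = 10 is an assumed realization; the Clopper-Pearson connection shown is a standard identity, not a claim from the slide):

```python
import numpy as np
from scipy.stats import binom, beta

n, x = 10, 3                          # e.g., x successes observed out of n
theta = np.linspace(0, 1, 1001)

H_plus = binom.sf(x, n, theta)        # sum over x < k <= n, i.e., P(K > x)
H_minus = binom.sf(x - 1, n, theta)   # sum over x <= k <= n, i.e., P(K >= x)

# Inverting the pair of CDs at 0.025/0.975 gives the Clopper-Pearson
# interval, via the binomial-tail/Beta-cdf identity.
lo = beta.ppf(0.025, x, n - x + 1)    # solves H_minus(theta) = 0.025
hi = beta.ppf(0.975, x + 1, n - x)    # solves H_plus(theta)  = 0.975
print(lo, hi)                         # about (0.067, 0.652)
```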
More examples
Example A: (Normal mean and variance) Assume Y1, . . . , Yn ∼ N(µ, σ²).
Variance σ² is known:
✓ HΦ(µ) = Φ(√n(µ − Ȳ)/σ) (i.e., N(Ȳ, σ²/n)) is a CD for µ
Variance σ² is unknown:
✓ Ht(µ) = F_{t_{n−1}}(√n(µ − Ȳ)/s) is a CD for µ;
✓ H_{χ²}(θ) = 1 − F_{χ²_{n−1}}((n − 1)s²/θ) is a CD for σ²
(Here, F_{t_{n−1}} and F_{χ²_{n−1}} are the cdfs of the t_{n−1} and χ²_{n−1} distributions, respectively.)
◦ Asymptotic CDs are also available in both cases
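A sketch inverting Ht and H_{χ²} for interval estimates (the data are simulated; n, mean and variance are assumptions):

```python
import numpy as np
from scipy.stats import t, chi2

rng = np.random.default_rng(2)
y = rng.normal(5.0, 2.0, size=30)   # simulated data (assumed values)
n, ybar = len(y), y.mean()
s2 = y.var(ddof=1)
s = np.sqrt(s2)

# Invert H_t(mu) = F_{t_{n-1}}(sqrt(n)(mu - ybar)/s) for a 95% CI for mu:
ci_mu = (ybar + t.ppf(0.025, n - 1) * s / np.sqrt(n),
         ybar + t.ppf(0.975, n - 1) * s / np.sqrt(n))

# Invert H_chi2(theta) = 1 - F_{chi2_{n-1}}((n-1)s^2/theta) for sigma^2:
ci_var = ((n - 1) * s2 / chi2.ppf(0.975, n - 1),
          (n - 1) * s2 / chi2.ppf(0.025, n - 1))
print(ci_mu, ci_var)
```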
More examples
Example B: (Bivariate normal correlation) Let ρ denote the correlation
coefficient of a bivariate normal population, and let r be its sample version.
Fisher’s z,
z = (1/2) log{(1 + r)/(1 − r)},
has the limiting distribution N((1/2) log{(1 + ρ)/(1 − ρ)}, 1/(n − 3)) =⇒
Hn(ρ) = 1 − Φ(√(n − 3) [(1/2) log{(1 + r)/(1 − r)} − (1/2) log{(1 + ρ)/(1 − ρ)}])
is an asymptotic CD for ρ, as the sample size n → ∞.
Many more examples – bootstrap distributions, p-value functions,
(normalized) likelihood functions, (normalized) empirical likelihood, Bayesian
posteriors (often), fiducial distributions ...
As long as it can be used to create confidence intervals of all levels –
(parametric & nonparametric; normal & nonnormal; exact & asymptotic ...)
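A sketch of Hn(ρ) and its inversion for a 95% interval (the values of r and n are illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm

# Asymptotic CD for a bivariate-normal correlation rho via Fisher's z.
def corr_cd(rho, r, n):
    """H_n(rho) = 1 - Phi(sqrt(n-3)*(z(r) - z(rho))), z(t) = atanh(t)."""
    return 1 - norm.cdf(np.sqrt(n - 3) * (np.arctanh(r) - np.arctanh(rho)))

r, n = 0.62, 47                      # illustrative values
# Invert H_n for a 95% CI: rho = tanh(z(r) -/+ z_{0.975}/sqrt(n-3)).
lo = np.tanh(np.arctanh(r) - norm.ppf(0.975) / np.sqrt(n - 3))
hi = np.tanh(np.arctanh(r) + norm.ppf(0.975) / np.sqrt(n - 3))
print(corr_cd(lo, r, n), corr_cd(hi, r, n))   # 0.025 and 0.975
```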
CD — a unifying concept for distributional inference
Our understanding/interpretation: Any approach, regardless of being
frequentist, fiducial or Bayesian, can potentially be unified under the
concept of confidence distributions, as long as it can be used to build
confidence intervals of all levels, exactly or asymptotically.
• Provides a union for Bayesian, frequentist, fiducial (BFF) inferences
(today’s focus)
• Supports new methodology developments — providing inference tools for
problems whose solutions were previously unavailable or unknown
◦ From our Rutgers group, for instance –
– New prediction approaches
– New testing methods
– New simulation-based inference schemes
– Combining information from diverse sources through combining
CDs (fusion learning/meta-analysis, split & conquer, etc.)
Two levels of BFF union
• First level (conceptual) union: distribution/function estimators of a
parameter
Analogous (not exact) comparison:
Consistent estimator versus MLE
Confidence distribution versus fiducial & posterior distributions
• Second level (deep-rooted) union: randomness for uncertainty
Artificial data samples (“fake” data) for inference
Two versions (fixed and random) of parameters across BFF
(CD ≈ bootstrap distribution ≈ fiducial distribution ≈ Bayesian posterior)
Distribution/function estimation of parameter
• First level union (conceptual)
CD, Bayesian posterior, likelihood function, bootstrap distribution,
p-value function, fiducial distribution (including belief/plausible
functions) · · · can all be considered as a distribution
estimate/function estimate of the parameter that they target –
Sample-dependent (distribution) functions on the parameter space
Used to make statistical inference (including point estimation,
interval estimation, testing, etc.)
Of course, the CD, posterior and others are defined a little differently ...
Revisit of the CD definition
A confidence distribution (CD) is a sample-dependent distribution
function that can represent confidence intervals (regions) of all levels for
a parameter of interest.
Definition (Formal definition [Θ = parameter space; X = sample space])
A sample-dependent function on the parameter space (i.e., a function on
Θ × X) is called a confidence distribution (CD) for a parameter θ, if:
R1) For each given sample, it is a distribution function on the
parameter space;
R2) The function can provide confidence intervals (regions) of all levels for θ.
Example: N(ȳn, 1/n) on Θ = (−∞, ∞).
——————–
Comparison: Consistent/unbiased estimators ≡ R1) Point (sample) + R2) Performance
Descriptive versus inductive/procedure-wise
The CD definition takes a “behaviorist”/pragmatic viewpoint, only
describing a required property on performance.
Fiducial, Bayesian and other approaches are based on inductive reasoning,
leading to a specific procedure (e.g., solving equations, max/minimization,
the Bayes formula).

                         Descriptive               Procedure-wise
Point estimation         Consistent estimator      MLE; M-estimation; . . .
Distribution estimation  Confidence distribution   Fiducial distribution; p-value function;
                                                   bootstrap; (normalized) likelihood;
                                                   Bayesian posterior; . . .

The take-home message is still – CD, posterior, bootstrap distribution, fiducial
distribution, etc. are all distribution (function) estimates/estimators.
Deeper-rooted union: Artificial randomness for uncertainty
• Second level union: artificial randomness for uncertainty
Artificial data samples (“fake” data) for inference
Two versions (fixed and random) of parameters across BFF
(CD ≈ bootstrap distribution ≈ fiducial distribution ≈ Bayesian posterior)
Deeper-rooted union: Connecting Bootstrap and CD
To some degree, a CD can be viewed as an extension of the bootstrap
distribution, although the CD concept is much broader!
The bootstrap method is a resampling (simulation) approach:
◦ Bootstrap samples: simulated “fake” samples drawn from the observed data;
◦ Bootstrap distributions: derived from the bootstrap samples to
help make statistical inference.
Why does it work? – The bootstrap central limit theorem (Singh 1981; Bickel &
Freedman 1981):
θ̂∗BT − θ̂ | θ̂ (artificial/MC randomness) ∼ θ̂ − θ | θ (sample randomness)
(θ = parameter; θ̂ = parameter estimator; θ̂∗BT = bootstrap estimator)
◦ The normal example earlier: y1, . . . , yn are i.i.d. from N(θ, 1):
(ȳ∗BT − ȳ)/(1/√n) | ȳ ∼ (ȳ − θ)/(1/√n) | θ (both ∼ N(0, 1))
Simulated randomness in θ̂∗BT matches the sampling uncertainty in estimating θ!
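A sketch of this matching statement for the normal example (the data are simulated so that θ is known for checking; seed and sizes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n, theta = 100, 0.3
y = rng.normal(theta, 1.0, size=n)
ybar = y.mean()

# Bootstrap sample means: artificial/MC randomness given the observed data.
boot_means = np.array([rng.choice(y, size=n, replace=True).mean()
                       for _ in range(10_000)])
z_boot = (boot_means - ybar) / (1 / np.sqrt(n))
print(z_boot.mean(), z_boot.std())   # approx 0 and 1, i.e., approx N(0, 1),
                                     # matching (ybar - theta)/(1/sqrt(n))
```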
Deeper-rooted union: Connecting Bootstrap and CD
How about CD?
• A key concept – the CD-random variable (a convenient format for connecting
with bootstrap, fiducial and Bayesian inference, plus more)
For each given sample yn, H(·) is a distribution function on Θ
=⇒ We can simulate a random variable ξCD from ξCD|Yn = yn ∼ H(·)!
◦ We call this ξCD a CD-random variable.
◦ The CD-random variable ξCD is viewed as a “random estimator” of
θ0 (a median-unbiased estimator)
Normal example earlier: Mean parameter θ is estimated by N(ȳ, 1/n):
We simulate a CD-random variable
ξCD|ȳ ∼ N(ȳ, 1/n)
Deeper-rooted union: Connecting Bootstrap and CD
The normal example earlier: the mean parameter θ is estimated by N(ȳ, 1/n).
The CD-random variable ξCD|ȳ ∼ N(ȳ, 1/n) can be re-expressed as
(ξCD − ȳ)/(1/√n) | ȳ (artificial/MC randomness) ∼ (ȳ − θ)/(1/√n) | θ (sample randomness) (both ∼ N(0, 1))
This statement is exactly the same as the key justification for the bootstrap,
with ξCD replaced by a bootstrap sample mean ȳ∗BT:
(ȳ∗BT − ȳ)/(1/√n) | ȳ ∼ (ȳ − θ)/(1/√n) | θ (both ∼ N(0, 1))
=⇒ The ξCD is in essence the same as a bootstrap estimator!
CD-random variable ξCD essentially ≡ bootstrap estimator ȳ∗BT [cf., Xie & Singh 2013]
◦ CD is an extension of the bootstrap distribution, but CD is much broader!
Simulated randomness in ξCD matches the sampling uncertainty in estimating θ!
Deeper-rooted union: Connecting fiducial distribution and CD
Model/structure equation: Normal sample Y ∼ N(θ, 1) (for simplicity, let obs. # n = 1)
Y = θ + U, where U ∼ N(0, 1)    (1)
Fisher’s fiducial argument
Equivalent equation (“inversion”): θ = Y − U
Thus, when we observe Y = y, θ = y − U    (2)
Since U ∼ N(0, 1), so θ ∼ N(y, 1) ⇒ The fiducial distribution of θ is N(y, 1)!
“Hidden subjectivity” (Dempster, 1963) — Continue to regard U as a random
sample from N(0, 1), even after Y = y is observed.
In particular, U|Y = y ≁ U, by equation (1).
(Y and U are completely dependent: given one, the other is also given!)
Deeper-rooted union: Connecting fiducial distribution and CD
A new perspective (my understanding/interpretation): In fact, equation (2)
for the normal sample mean Ȳ ∼ N(θ, 1/n) is
θ = ȳ − u    (2a)
where, when Ȳ = ȳ is realized (and observed), a corresponding error U = u is
also realized (but unobserved) (a realization from U ∼ N(0, 1/n)).
Goal: Make inference for θ
What we know: (1) ȳ is observed; (2) the unknown u is a realization from
U ∼ N(0, 1/n).
An intuitive (appealing) solution:
– Simulate an artificial u∗ ∼ N(0, 1/n) and use u∗ to estimate u.
– Plug it into (2a) to get an artificial θ∗ (a “random estimate” of θ):
θ∗ = ȳ − u∗    (2b)
– Repeating many times, the θ∗ form the fiducial/CD distribution N(ȳ, 1/n)!
(θ∗ is called a fiducial sample and is also a CD-random variable)
Deeper-rooted union: Connecting fiducial distribution and CD
General model/structure equation (G a given general function):
Y = G(θ, U), where the unknown random “error term” U ∼ D(·) (a known distribution).
Realization: y = G(θ, u), with observed y, unobserved realization u (∼ D(·)) and unknown θ.
Our take — a fiducial procedure is essentially to solve the structure equation
for a “random estimate” θ∗ of θ:
y = G(θ∗, u∗) for an independent random variable u∗ ∼ D(·).
(Fiducial “inversion”; incorporates the knowledge of D(·))
Notation: We rewrite θ∗ as θ∗FD.
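A sketch of this inversion for the normal-mean structure equation ȳ = θ + ū, ū ∼ N(0, 1/n) (ȳ and n are assumed observed summaries):

```python
import numpy as np

rng = np.random.default_rng(5)
n, ybar = 100, 0.3

u_star = rng.normal(0, 1 / np.sqrt(n), size=10_000)  # artificial u* ~ D(.)
theta_star_fd = ybar - u_star                         # solves ybar = theta* + u*
print(theta_star_fd.mean(), theta_star_fd.std())      # approx ybar and 1/sqrt(n),
                                                      # i.e., fiducial N(ybar, 1/n)
```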
Deeper-rooted union: Connecting fiducial distribution and CD
Hannig and colleagues developed a general fiducial inversion algorithm, known
as generalized fiducial inference (cf. Hannig, 2009; Hannig et al., 2015)
– Covers general settings, beyond pivot statistics and not-well-defined
inversion problems
– Proved a fiducial Bernstein–von Mises theorem (my rewording):
θ∗FD − θ̂ | θ̂ (artificial/MC randomness) ∼ θ̂ − θ | θ = θ0 (sample randomness), as n → ∞ (both normal)
Recalling our CD-r.v. and bootstrap statements, we find that
ξCD ≡ θ∗BT ≡ θ∗FD (all essentially)
Message: The CD-r.v. (ξCD), bootstrap estimator (θ∗BT) and fiducial sample
(θ∗FD) are in essence the same!
Simulated randomness in θ∗FD matches the sampling uncertainty in estimating θ!
Deeper-rooted union: Connecting Bayesian in BFF
Question: How about a Bayesian method?
General model/structure equation: Y = G(θ, U), where the unknown random
“error term” U ∼ D(·) (a known distribution).
Realization: y = G(θ, u), with observed y, unobserved realization u (∼ D(·)) and unknown θ.
Goal: Make inference for θ
What we know: (1) y is observed; (2) the unknown u is a realization from
U ∼ D(·); (3) the unknown θ is a realization from a given prior θ ∼ π(θ).
Deeper-rooted union: Connecting Bayesian in BFF
A Bayesian solution – the approximate Bayesian computation (ABC) method:
[Step A] Simulate θ∗ ∼ π(θ) and u∗ ∼ D(·), and compute y∗ = G(θ∗, u∗)
[Step B] If y∗ “matches” the observed y (i.e., y∗ ≈ y within a small
tolerance), keep the simulated θ∗; otherwise, repeat Step A.
Effectively, the kept θ∗ solves the equation y ≈ G(θ∗, u∗)
(A Bayesian way of “inversion”; incorporates the knowledge of both π(θ) and D(·))
Repeat the above steps many times to get many θ∗; these θ∗ form a
“distribution estimator” fa(θ|y) (also called an ABC posterior)
Theorem: fa(θ|y) is the posterior or an approximation of the posterior!
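A minimal ABC rejection sketch under the normal-mean structure equation at the summary level (the uniform prior range, tolerance eps and observed ȳ are assumptions; this is an illustration, not the talk's exact algorithm):

```python
import numpy as np

# Normal-mean model at the summary level: ybar = theta + ubar,
# ubar ~ N(0, 1/n), with summary t(y) = ybar.
rng = np.random.default_rng(6)
n, eps, ybar_obs = 100, 0.01, 0.3

kept = []
while len(kept) < 1000:
    theta_star = rng.uniform(-2, 2)             # Step A: theta* ~ prior
    ubar_star = rng.normal(0, 1 / np.sqrt(n))   #         u* ~ D(.)
    ybar_star = theta_star + ubar_star          #         y* = G(theta*, u*)
    if abs(ybar_star - ybar_obs) <= eps:        # Step B: keep if y* matches y
        kept.append(theta_star)

kept = np.array(kept)
print(kept.mean(), kept.std())   # approx 0.3 and 1/sqrt(n) = 0.1
```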
More remarks on ABC
It is impossible (or very difficult) to achieve perfect matches. So practical
ABC methods:
1. Allow a small matching error ε (with ε → 0 in theory)
2. Match a “summary” statistic t(y) instead of the original y
(related/corresponding to a pivotal quantity!)
When t(y) is a sufficient statistic –
Theorem: The ABC posterior fa(θ|y) converges to the posterior, as ε → 0.
What happens when t(y) is not a sufficient statistic?
The ABC posterior fa(θ|y) does NOT converge to the posterior, even as ε → 0.
Theorem: Under mild conditions, the ABC posterior fa(θ|y) converges to a
confidence distribution, as ε → 0. (Thornton, Li and Xie, 2018)
More remarks on ABC
Cauchy example: A sample of size n = 50 from Cauchy(10, 1); flat prior
– Real Cauchy posterior (black curve)
– ABC “posterior” when t(y) ≡ sample median (red curves)
– ABC “posterior” when t(y) ≡ sample mean (blue curves)
[Figure: Cauchy posterior with flat prior (n = 50); thanks to Suzanne Thornton for the figure]
Both the red and blue curves are CDs, and they provide us correct statistical
inference (although they are not efficient)
Deeper-rooted union: Connecting Bayesian in BFF
Let θBY ∼ Post(θ|data) and let θ̂ be a point estimator of θ (e.g., the MLE).
A version of the Bayesian Bernstein–von Mises theorem under some conditions
(reworded version):
θBY − θ̂ | θ̂ (artificial/MC randomness) ∼ θ̂ − θ | θ (sample randomness), as n → ∞ (both normal)
This familiar statement immediately links us to the CD, bootstrap and
fiducial distributions:
ξCD ≡ θ∗BT ≡ θ∗FD ≡ θBY (all essentially)
Message: The CD-r.v. (ξCD), bootstrap estimator (θ∗BT), fiducial sample (θ∗FD)
and posterior sample (θBY) are in essence the same!
(Remark: Higher-order results exist in all settings)
Simulated randomness in θBY matches the sampling uncertainty in estimating θ!
Deeper-rooted union: Parameter θ is both fixed and random!
Each paradigm has two versions of θ!
– A random version (distribution) to describe uncertainty;
– A fixed version for the ‘true value’/‘realization’ (a fixed unknown quantity)
Unified theory for valid statistical inference:
Variability of the random version | data ∼ Sampling uncertainty in estimating the fixed version
——————–
* Bernardo/Berger/Sun (2015): In Bayesian statistics, “parameters must have a
distribution describing available information about their values ... this is
not a description of their variability (since they are fixed unknown
quantities), but a description of the uncertainty about their true values.”
Bayes’s Billiard-table Experiment and Bayesian Inference
Bayesian inference is named after Thomas Bayes, a Presbyterian minister,
philosopher and mathematician.
About 250 years ago in Britain and Europe, there was a debate among
intellectuals (philosophers, priests, mathematicians, etc.):
Q: Based on observations in the real world, can humans figure out what God’s
plan is?
Thomas Bayes designed a billiard-table experiment to answer the question –
and the answer is a yes.
Bayes completed but did not publish his seminal work, which was later
discovered in his desk by his friend Richard Price after his death.
Bayes’s Billiard-table Experiment and Bayesian Inference
Bayes’ billiard-table model (God’s version)
[Figure: unit interval (0, 1) with the black ball landing at θ0 = 0.39814 (say)]
1) Randomly throw one black ball onto a billiard table. Measure its landing
place along the horizontal side of unit length (0, 1).
2) Randomly throw n additional balls onto the billiard table. Record how many
(say y) of the n balls landed on the left side of the black ball.
Bayes’s Billiard-table Experiment and Bayesian Inference
Bayes’ billiard-table model (Humans’ version)
[Figure: unit interval (0, 1) with the black ball’s landing place θ0 = ? unknown]
Data: We observe y = 6 balls (out of n = 13) landed on the left side of the
black ball.
Sample data: Take away the black ball (the “truth”) and only tell you that we
observed y = 6 (out of n = 13) balls on the left side of the black ball.
Goal: Can you infer the truth of where the black ball was?
Bayes’s Billiard-table Model and Bayesian Inference
Bayes’s billiard-table model, in modern mathematical expression –
Model: The landing place θ ∼ U(0, 1); the sample y|θ ∼ Binomial(n, θ)
Observed (and realized) data:
– The landing place θ0 is a realization (unobserved) from U(0, 1)
– The observed yobs is a realization from Binomial(n, θ0)
Goal: We are interested in inferring what θ0 is (i.e., the target value
θ0 = 0.39814 in the demo example).
Bayes’s Billiard-table Model and Bayesian Inference
Bayes’s solution: Calculate the following probability, for any t ∈ (0, 1):
P(θ ≤ t | yobs) = [(n + 1)! / (yobs!(n − yobs)!)] ∫₀ᵗ s^yobs (1 − s)^(n−yobs) ds
Bayes formula: A posterior distribution can be calculated by
p(θ|y) = p(y|θ)π(θ) / ∫ p(y|θ)π(θ) dθ = p(y, θ)/p(y),
where π(θ) is a prior distribution and p(y|θ) is a likelihood (conditional
density) function.
All inferential information is summarized in the posterior distribution.
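Since the U(0, 1) prior and binomial likelihood give a Beta(yobs + 1, n − yobs + 1) posterior, Bayes's integral is a Beta cdf; a sketch using the demo's numbers:

```python
from scipy.stats import beta

n, y_obs = 13, 6
posterior = beta(y_obs + 1, n - y_obs + 1)   # U(0,1) prior + Binomial likelihood

t = 0.39814                      # the demo's hidden landing place
print(posterior.cdf(t))          # P(theta <= t | y_obs), Bayes's integral
print(posterior.interval(0.95))  # a 95% credible interval for theta
```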
A Reflection on Epistemic vs Aleatory Uncertainty
The Bayesian method is “the odd one out” in BFF, because it
– needs an additional model assumption, i.e., a prior distribution;
– thus adopts the system/philosophy of (non-sampling-based) epistemic
probability (since no data are involved in the prior).
However, if a prior comes from historical data (e.g., Bayes knew the prior
θ ∼ U(0, 1) because he had tried many times), then
– the uncertainty described by the prior can be viewed as the (aleatory)
sampling uncertainty of [historical data];
– thus, the uncertainty described by the posterior can be viewed as the
(aleatory) sampling uncertainty of [historical × current data]?
Furthermore, can we also introduce an artificial (Monte-Carlo) sample into
the picture (i.e., the idea of aleatory uncertainty of an artificial sample)?
We have –
– Provided a brief introduction to the confidence distribution (CD) and the
idea of distribution estimation
– Tried to make a case for a union of BFF concepts
– For a valid inference, highlighted the need to match the aleatory sampling
uncertainty (population/model) with the randomness of an artificial data
generation scheme
To our BFFs & the BFF community
Website: http://bff-stat.org/