SAMSI Fall, 2018
Issue arising in several Working Groups:
probabilistic (Bayesian) combination of evidence
A strong suggestion: Combine data from different sources probabilistically
(through Bayes theorem), not through ad hoc methods such as averaging.
Bayesian combination of the three data sources:
• Let θ be the 12 × 30 matrix of annual rainfalls over the specified grid in
the USA.
• For data source i = 1, 2, 3, let Xi (12 × 30) be the corresponding
  matrix of mean observed annual rainfalls.
• Let fi(Xi | θ, σ_i²) be the probability density of the mean observations,
  given θ and the assumed known error variances σ_i² (12 × 30).
  – For instance, if the measurements at each gridpoint (j, k) from each
    data source were independent and normally distributed,

      f_i(X_i \mid \theta, \sigma_i^2) = \prod_{j,k} \frac{1}{\sqrt{2\pi}\,\sigma_{i,j,k}} \exp\{ -(x_{i,j,k} - \theta_{j,k})^2 / (2\sigma_{i,j,k}^2) \} .
  – But this is not reasonable:
    ∗ Typically, spatially close observations have correlated errors.
      (Ground station measurements might be an exception.)
    ∗ Then use models with correlated errors, e.g., Gaussian processes.
• Bayes theorem:
– If the different data sources arose from independent measurements,
– and the prior distribution for θ is π(θ),
∗ often the objective prior π(θ) = 1 would be used
∗ if the σi were partly unknown, they would also need a prior
distribution,
– then Bayes theorem yields that the posterior distribution of θ, given
  the data, is

      \pi(\theta \mid X_1, X_2, X_3) \;\propto\; \pi(\theta) \prod_{i=1}^{3} f_i(X_i \mid \theta, \sigma_i^2) .
Things to note: From π(θ | X1, X2, X3), any question about θ can be
answered.
• Under the complete independence and normality assumption, the
estimate of θ would be the posterior mean, given coordinatewise by
      \hat\theta_{j,k} = \int \theta_{j,k}\, \pi(\theta \mid X_1, X_2, X_3)\, d\theta
        = \sum_{i=1}^{3} \frac{\sigma_{i,j,k}^{-2}}{\sum_{l=1}^{3} \sigma_{l,j,k}^{-2}}\; x_{i,j,k} .
– Associated standard errors (the square roots of the posterior
variances) are available, as are confidence intervals for the θj,k.
• Correlations and non-normal distributions are handled in the same
way; the computations just may need MCMC.
• If the different data sources were given on different grids, there are
  spatial upscaling and downscaling methods for handling this.
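As an illustration of the posterior mean formula above, here is a minimal numerical sketch (not from the slides) that combines three gridded data sources by precision weighting, assuming independent normal errors with known variances; the array shapes, variable names, and simulated values are hypothetical.

```python
# Minimal sketch: precision-weighted combination of three gridded data sources,
# assuming independent normal errors with known (here, per-source constant) variances.
import numpy as np

rng = np.random.default_rng(0)
theta_true = rng.normal(size=(12, 30))               # "true" rainfall field (simulation only)
sigma = np.array([0.5, 1.0, 2.0])                    # per-source error SDs (assumed known)
X = np.stack([theta_true + rng.normal(scale=s, size=(12, 30)) for s in sigma])

prec = np.ones((3, 12, 30)) / sigma[:, None, None] ** 2   # precisions sigma_{i,j,k}^{-2}
theta_hat = (prec * X).sum(axis=0) / prec.sum(axis=0)     # posterior mean (precision-weighted average)
post_sd = np.sqrt(1.0 / prec.sum(axis=0))                 # posterior standard error at each gridpoint

print(theta_hat.shape, float(post_sd[0, 0]))
```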
An Example with different types of densities:
• Data Source 1 observes independent X_{1i} ∼ N(θ, σ²), i = 1, . . . , n.
  – So Data Source 1 has probability density

      f_1(x_{11}, x_{12}, \ldots, x_{1n} \mid \theta, \sigma^2)
        = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x_{1i}-\theta)^2/(2\sigma^2)} .

• The raw incoming information to Data Source 2 is X_{2i} ∼ N(θ, σ²),
  i = 1, . . . , m, but it is an old measuring instrument that only records whether
  X_{2i} > 0 (call this Yi = 1) or X_{2i} < 0 (call this Yi = 0).
  – P(Yi = 1) = P(X_{2i} > 0 | θ) = 1 − Φ(−θ/σ), where Φ is the normal cdf.
  – So Data Source 2 has probability density

      f_2(y_1, y_2, \ldots, y_m \mid \theta, \sigma^2)
        = \prod_{i=1}^{m} \left[ 1 - \Phi\!\left(-\tfrac{\theta}{\sigma}\right) \right]^{y_i}
          \left[ \Phi\!\left(-\tfrac{\theta}{\sigma}\right) \right]^{1-y_i} .

  – With a prior density π(θ, σ²), the posterior density is

      \pi(\theta, \sigma^2 \mid x, y) \;\propto\;
        f_1(x_{11}, x_{12}, \ldots, x_{1n} \mid \theta, \sigma^2)\,
        f_2(y_1, y_2, \ldots, y_m \mid \theta, \sigma^2)\, \pi(\theta, \sigma^2) .
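A minimal sketch (not from the slides) of combining the two likelihoods above on a grid over (θ, σ). The grid evaluation and the illustrative 1/σ² prior are choices made here for the example, not prescriptions from the slides.

```python
# Minimal sketch: unnormalized log posterior for theta and sigma from the two data sources above.
import numpy as np
from scipy.stats import norm

def log_posterior_grid(x, y, theta_grid, sigma_grid):
    th = theta_grid[:, None]                 # shape (T, 1)
    sg = sigma_grid[None, :]                 # shape (1, S)
    # Data Source 1: normal likelihood for the fully observed x's
    ll1 = norm.logpdf(x[:, None, None], loc=th, scale=sg).sum(axis=0)
    # Data Source 2: dichotomized observations, P(Y = 1) = 1 - Phi(-theta/sigma)
    p1 = norm.sf(-th / sg)
    ll2 = y.sum() * np.log(p1) + (len(y) - y.sum()) * np.log(1.0 - p1)
    log_prior = -2.0 * np.log(sg)            # illustrative 1/sigma^2 prior (an assumption here)
    lp = ll1 + ll2 + log_prior
    return lp - lp.max()                     # unnormalized log posterior on the (theta, sigma) grid
```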
Combining dependent information sources:
Examples:
• Combining information from four Triangle weather forecasters.
• Combining information from climate models
• Combining information from market forecasters
• ...

The problem: If the dependent information sources, i = 1, 2, . . . , m, provide
posterior densities π*_i(θ) for the unknown θ, combining them is hard:
• Averaging them (often called the linear opinion pool) is not
  probabilistic, and will be bad if they are nearly independent.
• Multiplying the π*_i(θ) together (often called the independent opinion
  pool) is wrong if they are quite dependent.
Example: Suppose the four Triangle weather forecasters' posterior
distributions, π*_i(θ), were developed as
• π*_i(θ) ∝ fi(xi | θ)π(θ), where π(θ) is the weather model forecast and
  the fi(xi | θ) are independent recent local temperature measurements.
• Then the correct posterior distribution is

      \pi^*(\theta) \;\propto\; \pi(\theta) \prod_{i=1}^{4} f_i(x_i \mid \theta)

  (not \tfrac{1}{4}\sum_{i=1}^{4} \pi_i^*(\theta) or \prod_{i=1}^{4} \pi_i^*(\theta)).
• You could find this, if you independently knew π(θ), since

      \pi^*(\theta) \;\propto\; \frac{\prod_{i=1}^{4} \pi_i^*(\theta)}{\pi(\theta)^3} .
• Usually, however, you cannot untangle dependent information from
different sources.
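A minimal sketch (not from the slides) of the "untangling" identity above, evaluated on a grid: four synthetic forecaster posteriors that share a known prior are recombined correctly, with the two naive pools shown for comparison. All numerical values (forecast means, scales, grid) are made up for illustration.

```python
# Minimal sketch: recovering the combined posterior from dependent forecaster posteriors
# when the shared prior pi(theta) is known. Synthetic densities on a grid.
import numpy as np
from scipy.stats import norm

theta = np.linspace(-10, 40, 2001)                  # grid over temperature theta
prior = norm.pdf(theta, loc=15.0, scale=5.0)        # shared weather-model forecast pi(theta)

# Four forecaster posteriors, each built as prior x own independent likelihood (synthetic)
likelihoods = [norm.pdf(theta, loc=m, scale=2.0) for m in (17.0, 18.5, 16.0, 19.0)]
posteriors = [prior * L / np.trapz(prior * L, theta) for L in likelihoods]

# Correct combination: product of posteriors divided by prior^(m-1), then renormalized
combined = np.prod(posteriors, axis=0) / prior ** (len(posteriors) - 1)
combined /= np.trapz(combined, theta)

# Naive alternatives for comparison (linear and independent opinion pools)
linear_pool = np.mean(posteriors, axis=0)
indep_pool = np.prod(posteriors, axis=0)
indep_pool /= np.trapz(indep_pool, theta)
```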
Hierarchical modeling of the volume/basal-friction relationship
[Figure: coefficient of friction (H/L) versus flow volume (m³), on log scale, for
pyroclastic flows at five volcanoes: Colima, Merapi, Soufriere Hills, Unzen, and Semeru.]
Hierarchical Modeling:
• For the jth pyroclastic flow at the ith volcano, model the relationship of the
  observed basal friction angle b_{ij} to volume V_{ij} by

      \log \tan b_{ij} = a_i + c_i \log V_{ij} + \epsilon_{ij}, \qquad \epsilon_{ij} \sim N(0, \sigma_i^2) .

• Assume that the c_i from different volcanoes arise from a normal
  distribution N(µ_c, σ_c²).
• Assign objective priors to the a_i, the σ_i², and to µ_c, σ_c²:
  – a constant prior for the a_i and µ_c;
  – the prior 1/σ_i² for the σ_i²;
  – the prior 1/σ_c for σ_c².
• Find the posterior distribution of all unknown parameters,
  given the data {b_{ij}, i = 1, . . . , 5; j = 1, . . . , n_i}.
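A minimal sketch of this hierarchical model written in PyMC (not the authors' code). Weakly informative proper priors are used as stand-ins for the objective (improper) priors listed above, and the data arrays logV and logtan_b are hypothetical placeholders.

```python
# Minimal PyMC sketch of the hierarchical volume/basal-friction model (not the authors' code).
import numpy as np
import pymc as pm

def fit_volcano_model(logV, logtan_b):
    """logV, logtan_b: lists of 1-D arrays, one per volcano (hypothetical data)."""
    n_volcanoes = len(logV)
    with pm.Model():
        mu_c = pm.Normal("mu_c", mu=0.0, sigma=10.0)        # stand-in for the flat prior on mu_c
        sigma_c = pm.HalfNormal("sigma_c", sigma=5.0)       # stand-in for the 1/sigma_c prior
        a = pm.Normal("a", mu=0.0, sigma=10.0, shape=n_volcanoes)
        c = pm.Normal("c", mu=mu_c, sigma=sigma_c, shape=n_volcanoes)
        sigma = pm.HalfNormal("sigma", sigma=5.0, shape=n_volcanoes)
        for i in range(n_volcanoes):
            pm.Normal(f"obs_{i}", mu=a[i] + c[i] * logV[i],
                      sigma=sigma[i], observed=logtan_b[i])
        trace = pm.sample(1000, tune=1000)                  # MCMC sampling of all unknowns
    return trace
```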
[Figure: coefficient of friction versus volume (m³) for Semeru, showing the data,
the ordinary linear regression line with its confidence bands (LR), and the
hierarchical-model median regression line with its confidence bands (HM).]
For this mountain, there were only 4 data points, so the ordinary linear regression
gives very wide confidence bands (the dotted lines).
Hierarchical Bayes analysis gives much narrower bands because it uses the
knowledge about the slopes c_i from the other mountains (called "borrowing strength").
Incorporation into the probabilistic risk assessment:
• From the posterior, generate a sample of (a, c) for Montserrat, using a
  version of Markov chain Monte Carlo (MCMC) called Gibbs sampling.
  This yields the volume/basal-friction curves below.
• Combine each curve with an emulator of TITAN2D to predict
  probabilistic risk in the Belham valley.
• The average, over the curves, of the risk is the reported risk.
[Figure: 50 posterior replicates of the basal friction angle φ (degrees) versus
volume (×10⁶ m³) at Montserrat.]
Lecture 3: Essentials of Bayesian Model
Uncertainty
Outline
• Possible goals for Bayesian model uncertainty analysis
• Formulation and notation for Bayesian model uncertainty
• Model averaging
• Attractions of the Bayesian approach
• Challenges with the Bayesian approach
• Approaches to prior choice for model uncertainty
I. Possible goals for Bayesian model
uncertainty analysis
Goal 1. Selection of the true model from a collection of models that
contains the truth.
– decision-theoretically, often viewed as choice of 0-1 loss in model selection
Goal 2. Selection of a model good for prediction of future observations or
phenomena
– decision-theoretically, often viewed as choice of squared error loss for
  predictions, or Kullback-Leibler divergence for predictive distributions
Goal 3. Finding a simple approximate model that is close enough to the
true model to be useful. (This is purely a utility issue.)
Goal 4. Finding important covariates, important relationships between
covariates, and other features of model structure. (Often, the model is
not of primary importance; it is the relationships of covariates in the
model, or whether certain covariates should even be in the model, that
are of interest.)
Caveat 1: Suppose the models under consideration do not contain the
true model, often called the open model scenario.
Positive fact: Bayesian analysis will asymptotically give probability one
to the model that is as close as possible to the true model (in Kullback-Leibler
divergence), among the models considered, so the Bayesian
approach is still viable.
Issues of concern:
• Bayesian internal estimates of accuracy may be misleading (Barron,
2004).
• Model averaging may no longer be optimal.
• There is no obvious reason why models need even be ‘plausible’.
Caveat 2: There are two related but quite different statistical paradigms
involving model uncertainty (not discussed here):
• Model criticism or model checking.
• The process of model development.
II. Formulation and notation for Bayesian
model uncertainty
Models (or hypotheses). Data, x, is assumed to have arisen from one of
several models:
M1: x has density f1(x | θ1)
M2: x has density f2(x | θ2)
...
Mq: x has density fq(x | θq)
Assign prior probabilities, P(Mi), to each model. A common choice is equal
probabilities,

      P(M_i) = \frac{1}{q} ,

but this is inappropriate in many scenarios, such as variable selection
(discussed in Lecture 4).
Under model Mi:
– Prior density of θi: θi ∼ πi(θi)
– Marginal density of x:

      m_i(x) = \int f_i(x \mid \theta_i)\, \pi_i(\theta_i)\, d\theta_i

  measures "how likely is x under Mi".
– Posterior density of θi:

      \pi_i(\theta_i \mid x) = \pi(\theta_i \mid x, M_i) = \frac{f_i(x \mid \theta_i)\, \pi_i(\theta_i)}{m_i(x)}

  quantifies (posterior) beliefs about θi if the true model was Mi.
Bayes factor of Mj to Mi:

      B_{ji} = \frac{m_j(x)}{m_i(x)}

Posterior probability of Mi:

      P(M_i \mid x) = \frac{P(M_i)\, m_i(x)}{\sum_{j=1}^{q} P(M_j)\, m_j(x)}
        = \left[ 1 + \sum_{j \neq i} \frac{P(M_j)}{P(M_i)}\, B_{ji} \right]^{-1}

Particular case, P(Mj) = 1/q:

      P(M_i \mid x) = \bar m_i(x) \equiv \frac{m_i(x)}{\sum_{j=1}^{q} m_j(x)} = \frac{1}{\sum_{j=1}^{q} B_{ji}}

Reporting: It is useful to separately report {mi(x)} and {P(Mi)}, along
with

      P(M_i \mid x) = \frac{P(M_i)\, m_i(x)}{\sum_{j=1}^{q} P(M_j)\, m_j(x)} .
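A minimal sketch (not from the slides) turning marginal likelihoods and prior model probabilities into posterior model probabilities and Bayes factors, following the formulas above; the example marginal likelihood values are hypothetical.

```python
# Minimal sketch: posterior model probabilities and Bayes factors from marginal likelihoods.
import numpy as np

def posterior_model_probs(m, prior=None):
    """m: marginal likelihoods m_i(x); prior: P(M_i), defaulting to equal probabilities."""
    m = np.asarray(m, dtype=float)
    prior = np.full_like(m, 1.0 / len(m)) if prior is None else np.asarray(prior, dtype=float)
    w = prior * m
    return w / w.sum()

def bayes_factor(m, j, i):
    """B_{ji} = m_j(x) / m_i(x)."""
    return m[j] / m[i]

# Example with equal prior probabilities (hypothetical marginal likelihoods):
probs = posterior_model_probs([2.0e-5, 3.0e-6, 2.2e-5])
```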
Example: location-scale
Suppose X1, X2, . . . , Xn are i.i.d. with density

      f(x_i \mid \mu, \sigma) = \frac{1}{\sigma}\, g\!\left( \frac{x_i - \mu}{\sigma} \right)

Several models are entertained:
MN: g is N(0, 1)
MU: g is Uniform(0, 1)
MC: g is Cauchy(0, 1)
ML: g is Left Exponential ( (1/σ) e^{(x−µ)/σ}, x ≤ µ )
MR: g is Right Exponential ( (1/σ) e^{−(x−µ)/σ}, x ≥ µ )
Difficulty: The models are not nested and have no common low
dimensional sufficient statistics; classical model selection would thus
typically rely on asymptotics.
Prior distribution: choose, for i = 1, . . . , 5,

      P(M_i) = \frac{1}{q} = \frac{1}{5} ;

utilize the objective prior πi(θi) = πi(µ, σ) = 1/σ.

Objective improper priors (invariant) are o.k. for the common parameters
in situations having the same invariance structure, if the right-Haar
prior is used (Berger, Pericchi, and Varshavsky, 1998).

Marginal distributions, m^N(x | M), for these models can then be
calculated in closed form.
Here

      m^N(x \mid M) = \int\!\!\int \prod_{i=1}^{n} \frac{1}{\sigma}\,
        g\!\left( \frac{x_i - \mu}{\sigma} \right) \frac{1}{\sigma}\, d\mu\, d\sigma .

For the five models, the marginals are

1. Normal: m^N(x \mid M_N) = \frac{\Gamma((n-1)/2)}{(2\pi)^{(n-1)/2} \sqrt{n}\, \left( \sum_i (x_i - \bar x)^2 \right)^{(n-1)/2}}

2. Uniform: m^N(x \mid M_U) = \frac{1}{n(n-1)\left( x_{(n)} - x_{(1)} \right)^{n-1}}

3. Cauchy: m^N(x \mid M_C) is given in Spiegelhalter (1985).

4. Left Exponential: m^N(x \mid M_L) = \frac{(n-2)!}{n^n \left( x_{(n)} - \bar x \right)^{n-1}}

5. Right Exponential: m^N(x \mid M_R) = \frac{(n-2)!}{n^n \left( \bar x - x_{(1)} \right)^{n-1}}
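A minimal sketch (not the original analysis code) of the closed-form marginals just listed, with their renormalization into posterior model probabilities under equal prior probabilities. The Cauchy marginal requires numerical integration (Spiegelhalter, 1985) and is omitted here, so the probabilities from this sketch would differ from the table below, which includes all five models.

```python
# Minimal sketch: closed-form objective marginals for the location-scale models above.
import math
import numpy as np

def objective_marginals(x):
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    xbar = x.mean()
    S = ((x - xbar) ** 2).sum()              # sum_i (x_i - xbar)^2
    R = x[-1] - x[0]                         # x_(n) - x_(1)
    return {
        "Normal":   math.gamma((n - 1) / 2)
                    / ((2 * math.pi) ** ((n - 1) / 2) * math.sqrt(n) * S ** ((n - 1) / 2)),
        "Uniform":  1.0 / (n * (n - 1) * R ** (n - 1)),
        "LeftExp":  math.factorial(n - 2) / (n ** n * (x[-1] - xbar) ** (n - 1)),
        "RightExp": math.factorial(n - 2) / (n ** n * (xbar - x[0]) ** (n - 1)),
    }

def renormalize(marginals):
    """Posterior model probabilities under equal prior probabilities."""
    total = sum(marginals.values())
    return {model: m / total for model, m in marginals.items()}
```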
Consider four classic data sets:
• Darwin’s data (n = 15)
• Cavendish’s data (n = 29)
• Stigler’s Data Set 9 (n = 20)
• A randomly generated Cauchy sample (n = 14)
[Figure: dot plots of the four data sets — Darwin's data, Cavendish's data,
Stigler's Data Set 9, and the generated Cauchy data.]
The objective posterior probabilities of the five models, for each data set,

      \bar m^N(x \mid M) = \frac{m^N(x \mid M)}
        {m^N(x \mid M_N) + m^N(x \mid M_U) + m^N(x \mid M_C) + m^N(x \mid M_L) + m^N(x \mid M_R)} ,

are as follows:

  Data set     Normal     Uniform    Cauchy    L. Exp.    R. Exp.
  Darwin       .390       .056       .430      .124       .0001
  Cavendish    .986       .010       .004      4×10⁻⁸     .0006
  Stigler 9    7×10⁻⁸     4×10⁻⁵     .994      .006       2×10⁻¹³
  Cauchy       5×10⁻¹³    9×10⁻¹²    .9999     7×10⁻¹⁸    1×10⁻⁴
III. Model Averaging
Accounting for model uncertainty
• Selecting a single model and using it for inference
ignores model uncertainty, resulting in inferior inferences, and
considerable overstatements of accuracy.
• The Bayesian approach incorporates this uncertainty
  by model averaging; if, say, inference concerning ξ is desired, it would
  be based on

      \pi(\xi \mid x) = \sum_{i=1}^{q} P(M_i \mid x)\, \pi(\xi \mid x, M_i).
Note: ξ must have the same meaning across models.
• a useful link, with papers, software, URL’s about BMA
http://www.research.att.com/∼volinsky/bma.html
Example: Vehicle emissions data from McDonald et al. (1995)

Goal: From i.i.d. vehicle emission data X = (X1, . . . , Xn), one desires to
determine the probability that the vehicle type will meet regulatory
standards.

Traditional models for this type of data are the Weibull and lognormal
distributions, given respectively by

      M_1: \; f_W(x \mid \beta, \gamma) = \frac{\gamma}{\beta} \left( \frac{x}{\beta} \right)^{\gamma - 1}
        \exp\!\left\{ -\left( \frac{x}{\beta} \right)^{\gamma} \right\}

      M_2: \; f_L(x \mid \mu, \sigma^2) = \frac{1}{x \sqrt{2\pi\sigma^2}}
        \exp\!\left\{ \frac{-(\log x - \mu)^2}{2\sigma^2} \right\} .

Model Averaging Analysis:
• For the studied data set, P(M1 | x) = .712. Hence,

      P(meeting regulatory standard) = .712 · P(meeting standard | M_1)
        + .288 · P(meeting standard | M_2) .
Model averaging is most used for prediction:

Goal: Predict a future Y, given X.

Optimal prediction is based on model averaging (cf. Draper, 1994). If
one of the Mi is indeed the true model (but unknown), the optimal
predictor is based on

      m(y \mid x) = \sum_{i=1}^{k} P(M_i \mid x)\, m_i(y \mid x, M_i) ,

where

      m_i(y \mid x, M_i) = \int f_i(y \mid \theta_i)\, \pi_i(\theta_i \mid x)\, d\theta_i

is the predictive distribution when Mi is true.
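A minimal sketch (not from the slides) of drawing from a model-averaged predictive distribution: pick a model with probability P(Mi | x), then draw from that model's predictive. The two predictive samplers below are hypothetical stand-ins (loosely suggested by the Weibull/lognormal example above), and all numbers other than the .712/.288 weights are made up.

```python
# Minimal sketch: sampling from a model-averaged predictive distribution.
import numpy as np

def sample_model_averaged_predictive(post_probs, predictive_samplers, size, rng=None):
    """post_probs: P(M_i | x); predictive_samplers: callables drawing one y from m_i(y | x, M_i)."""
    rng = np.random.default_rng() if rng is None else rng
    which = rng.choice(len(post_probs), size=size, p=post_probs)   # choose a model per draw
    return np.array([predictive_samplers[i]() for i in which])     # then draw from its predictive

# Example with two hypothetical predictive samplers:
rng = np.random.default_rng(0)
samplers = [lambda: rng.weibull(1.5) * 2.0,        # stand-in for the Weibull predictive
            lambda: rng.lognormal(0.5, 0.8)]        # stand-in for the lognormal predictive
y_pred = sample_model_averaged_predictive([0.712, 0.288], samplers, size=10_000, rng=rng)
```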
IV. Reasons for adopting the Bayesian
approach to model uncertainty
Reason 1: Ease of interpretation
• Bayes factors ⇝ odds;
  posterior model probabilities ⇝ direct interpretation.
• In the location example and Darwin's data, the posterior model
  probabilities are m̄_N(x) = .390, m̄_U(x) = .056,
  m̄_C(x) = .430, m̄_L(x) = .124, m̄_R(x) = .0001;
  the Bayes factors are B_{NU} = 6.96, B_{NC} = .91, B_{NL} = 3.15, B_{NR} = 3900.
• Interpreting five p-values (or 20, if all pairwise tests are done with
different nulls) is much harder.
Reason 2: Prior information can be incorporated, if desired
• If the default report is mi(x), i = 1, . . . , q, then for any prior
  probabilities

      P(M_i \mid x) = \frac{P(M_i)\, m_i(x)}{\sum_{j=1}^{q} P(M_j)\, m_j(x)} .

Example: For Darwin's data, if symmetric models are twice as likely
as the non-symmetric ones, then
Pr(MN) = Pr(MU) = Pr(MC) = 1/4 and Pr(ML) = Pr(MR) = 1/8,
so that
Pr(MN | x) = .416, Pr(MU | x) = .06, Pr(MC | x) = .458,
Pr(ML | x) = .066, Pr(MR | x) = .00005.
Reason 3: Consistency
A. If one of the Mi is true, then m̄_i(x) → 1 as n → ∞.
B. If some other model M* (not in the original collection of models) is
   true, m̄_i(x) → 1 for that model closest to M* in Kullback-Leibler
   divergence (Berk, 1966; Dmochowski, 1994).

Reason 4: Ockham's razor
Bayes factors automatically seek parsimony; no ad hoc penalties for model
complexity are needed.
Reason 5: Frequentist interpretation
In many important situations involving testing of M1 versus M2,
B12/(1 + B12) and 1/(1 + B12) are ‘optimal’ conditional frequentist error
probabilities of Type I and Type II, respectively. Indeed, Pr(H0 | data x)
is the frequentist type I error probability, conditional on observing data of
the same “strength of evidence” as the actual data x. (Classical
unconditional error probabilities make the mistake of reporting the error
averaged over data of very different strengths.)
See Sellke, Bayarri and Berger (2001) for discussion and earlier references.
Reason 6: The Ability to Account for Model Uncertainty through Model
Averaging
• For non-Bayesians, it is highly problematic to perform inference from
  a model based on the same data used to select the model.
– It is not legitimate to do so in the frequentist paradigm.
– It is very hard to determine how to ‘correct’ the inferences (and
known methods are often extremely conservative).
• Bayesian inference based on model averaging overcomes this problem.
Reason 7: Generality of application
• Bayesian approach is viable for any number of models, regular or
irregular, nested or not, large or small sample sizes.
• Classical approach is most developed for only two models (hypotheses),
and typically requires at least one of: (i) nested models, (ii) standard
distributions, (iii) regular asymptotics.
• Example 1: In the location-scale example, there were 5 non-nested
models, with small sample sizes, irregular asymptotics, and no reduced
sufficient statistics.
• Example 2: Suppose the Xi are i.i.d. from the N(θ, 1) density, where θ =
  "mean effect of T1 − mean effect of T2."
  Standard Testing Formulation:
  H0 : θ = 0 (no difference in treatments) vs. H1 : θ ≠ 0 (a difference exists)
  A More Revealing Formulation:
  H0 : θ = 0 (no difference) vs. H1 : θ < 0 (T2 is better) vs. H2 : θ > 0 (T1 is better)
Reason 8: Standard Bayesian ‘ease of application’ motivations
1. In sequential scenarios, there is no need to ‘spend α’
for looks at the data; posterior probabilities are not affected by the
reason for stopping experimentation (discussed later).
2. The distributions of known censoring variables are
not needed for Bayesian analysis (not discussed).
3. Multiple testing problems can be addressed in a straightforward
manner (not discussed).
V. Challenges with the Bayesian approach to
model uncertainty
Difficulty 1: Inadequacy of common priors, such as (improper)
objective priors, vague proper priors, and conjugate priors.
As in hypothesis testing, improper priors can only be used for “common
parameters” in models and vague proper priors are usually horrible!
The problem with conjugate priors:
Normal Example: Suppose X1, . . . , Xn are i.i.d. N(x | θ, σ²) and it is
desired to test H0 : θ = 0 versus H1 : θ ≠ 0.
• The natural conjugate priors are
  π1(σ²) = 1/σ²; π2(θ | σ²) = N(θ | 0, σ²) and π2(σ²) = 1/σ².
• Bayes factor:

      B_{12} = \sqrt{n+1}\, \left( \frac{1 + t^2/[(n-1)(n+1)]}{1 + t^2/(n-1)} \right)^{n/2},
        \quad \text{where } t = \frac{\sqrt{n}\,\bar x}{s/\sqrt{n-1}} .

• Silly behavior, called 'information inconsistency':

      \text{as } |t| \to \infty, \quad B_{12} \to (n+1)^{-(n-1)/2} > 0 .

  For instance: if n = 3, B12 → 1/4; if n = 5, B12 → 1/36.
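A quick numerical sketch (not from the slides) of the information inconsistency above: the conjugate-prior Bayes factor B12 never goes to 0 as |t| grows, stabilizing at (n+1)^(−(n−1)/2).

```python
# Minimal sketch: information inconsistency of the conjugate-prior Bayes factor above.
import numpy as np

def B12(t, n):
    """Bayes factor of H0 to H1 under the natural conjugate priors (formula above)."""
    ratio = (1 + t**2 / ((n - 1) * (n + 1))) / (1 + t**2 / (n - 1))
    return np.sqrt(n + 1) * ratio ** (n / 2)

for n in (3, 5):
    limit = (n + 1) ** (-(n - 1) / 2)        # (n+1)^{-(n-1)/2}
    print(n, B12(1e6, n), limit)             # B12 at |t| = 1e6 is essentially the limit
# n = 3 -> 1/4, n = 5 -> 1/36, even though such data overwhelmingly contradict H0
```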
Difficulty 2: Computation
Computation of the needed marginal distributions (or their ratios) can be
very challenging in high dimensions. (A useful approximation is
discussed later.)
The total number of models under consideration can be enormous (e.g., in
variable selection), so good search strategies are needed.
Difficulty 3: Posterior Inference
When thousands (or millions) of models all have similarly high posterior
probability (as is common), how does one summarize the information
contained therein?
Model averaging summaries are needed, but summaries of what quantities?
(Lecture 4.)
VI. Approaches to prior choice for model
uncertainty
• Use of ‘conventional priors’ (Lecture 4)
• Use of intrinsic priors (discussed for hypothesis testing in Lecture 1)
• Inducing model priors from a single prior
Inducing Model Priors from a Single Prior
The idea:
• Specify a prior πL(βL) for the ‘largest’ model.
• Use this prior to induce priors on the other models. Possibilities include
– In variable selection, conditioning by setting variables to be removed to
zero. (Logically sound if variables are orthogonal)
– Marginalizing out the variables to be removed. (Not logically sound)
– Matching model parameters for other models with the parameters of the
largest model by, say, minimizing Kullback-Leibler divergence between
models, and then projecting πL(βL). (Probably best, but hard)
An example: Suppose the model parameters are probabilities, with the
largest model having probabilities p1, . . . , pm, with prior
πL(p1, . . . , pm) ∼ Dirichlet(1, . . . , 1) (i.e., the uniform distribution on the
simplex where Σ_{i=1}^{m} p_i = 1).

If model Ml has parameters (p_{i1}, . . . , p_{il}),
• conditioning yields
  π(p_{i1}, . . . , p_{il}, p* | other p_j = 0) = Dirichlet(1, . . . , 1), where
  p* = 1 − Σ_{j=1}^{l} p_{ij}.
  This is sensible.
• marginalizing yields π(p_{i1}, . . . , p_{il}, p*) = Dirichlet(1, . . . , 1, m − l).
  This is typically much too concentrated near zero when m is large,
  with the p_{ik} having means 1/m.
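A minimal simulation sketch (not from the slides) of the contrast above: the marginalized submodel prior concentrates the retained probabilities near zero (means 1/m), while the conditioned prior does not. The dimensions m and l are hypothetical.

```python
# Minimal sketch: conditioning vs. marginalizing a Dirichlet(1, ..., 1) full-model prior.
import numpy as np

rng = np.random.default_rng(0)
m, l = 50, 3                                     # hypothetical dimensions

# Marginalizing: the first l coordinates of a Dirichlet(1, ..., 1) in dimension m
full = rng.dirichlet(np.ones(m), size=100_000)
marginalized = full[:, :l]                       # each retained p has mean ~ 1/m = 0.02

# Conditioning: uniform Dirichlet(1, ..., 1) on the (l+1)-simplex (p_i1, ..., p_il, p*)
conditioned = rng.dirichlet(np.ones(l + 1), size=100_000)[:, :l]   # each has mean 1/(l+1) = 0.25

print(marginalized.mean(axis=0), conditioned.mean(axis=0))
```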
Example. Posterior Model Probabilities for Variable Selection in
Probit Regression
(from CMU Case Studies VII, Viele et al. case study)
Motivating example: Prosopagnosia (face blindness) is a condition
(usually developing after brain trauma) under which the individual cannot
easily distinguish between faces. A psychological study was conducted by
Tarr, Behrmann and Gauthier to address whether this extended to a
difficulty of distinguishing other objects, or was particular to faces.
The study considered 30 control subjects (C) and 2 subjects (S) diagnosed
as having prosopagnosia, and had them try to differentiate between similar
faces, similar ‘Greebles’ and similar ‘objects’ at varying levels of difficulty
and varying comparison time.
Data: Sample size was n = 20,083, the response and covariates being
  C = Subject C
  S = Subject S
  G = Greebles
  O = Object
  D = Difficulty
  B = Brief time
  A = Images match or not
  ⇒ R = Response (answer correct or not).
All variables are binary. All interactions, consisting of all combinations of
C, S, G, O, D, B, A (except {C=S=1} and {G=O=1}), are of interest. There are 3
possible subjects (a control and the two patients), 3 possible stimuli (faces,
objects, or greebles), and 2 possibilities for each of D, B and A, yielding
3 × 3 × 2 × 2 × 2 = 72 possible interactions.
Statistical modeling: For a specified interaction vector Xi, let yi and
ni − yi be the numbers of successes and failures among the responses with
that interaction vector, with probability of success pi assumed to follow the
probit regression model

      p_i = \Phi\!\left( \beta_1 + \sum_{j=2}^{72} X_{ij} \beta_j \right),

with Φ being the normal cdf. The full model likelihood (up to a fixed
proportionality constant) is then

      f(y \mid \beta) = \prod_{i=1}^{72} p_i^{y_i} (1 - p_i)^{n_i - y_i} .

Goal: Select from among the 2^72 submodels which have some of the βj set
equal to zero. (Actually, only models with graphical structure were
considered, i.e., if an interaction term is in the model, all the lower-order
effects must also be there.)
Prior Choice: Conditionally inducing submodel priors from a full-model
prior
• A standard noninformative prior for p = (p1, p2, . . . , p72) is the uniform
  prior, usable here since it is proper.
• Changing variables from p to β yields πL(β) = N72(0, (X′X)⁻¹),
  where X′ = (X′_1, X′_2, . . . , X′_72).
  This is the full model prior.
• For the model with a nonzero subvector β_(j) of dimension kj, with the
  remaining βi = 0, the induced prior would be

      \pi_L(\beta_{(j)} \mid \beta_{(-j)} = 0) = N_{k_j}\!\left( 0, (X_{(j)}' X_{(j)})^{-1} \right),

  where β_(−j) is the subvector of parameters that are zero and X_(j) is
  the corresponding design matrix. (This is a nice conditional property
  of normal distributions.)
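A minimal sketch (not from the case study) numerically checking the conditional property invoked above: conditioning a N(0, (X′X)⁻¹) prior on some coefficients being zero induces a N(0, (X′_(j)X_(j))⁻¹) prior on the rest. The design matrix is random and the index sets are hypothetical.

```python
# Minimal sketch: conditioning a full-model normal prior induces the submodel prior above.
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 6
X = rng.normal(size=(n, p))
Sigma = np.linalg.inv(X.T @ X)               # full-model prior covariance (X'X)^{-1}

keep = [0, 2, 3]                             # indices of the nonzero subvector beta_(j)
drop = [1, 4, 5]                             # coefficients conditioned to zero

# Conditional covariance of beta_keep given beta_drop = 0 (standard multivariate normal formula)
S_kk = Sigma[np.ix_(keep, keep)]
S_kd = Sigma[np.ix_(keep, drop)]
S_dd = Sigma[np.ix_(drop, drop)]
cond_cov = S_kk - S_kd @ np.linalg.inv(S_dd) @ S_kd.T

# The claimed property: this equals (X_(j)' X_(j))^{-1} for the submodel design matrix
Xj = X[:, keep]
print(np.allclose(cond_cov, np.linalg.inv(Xj.T @ Xj)))   # True
```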
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.ppt
 
Pharmaceutics Pharmaceuticals best of brub
Pharmaceutics Pharmaceuticals best of brubPharmaceutics Pharmaceuticals best of brub
Pharmaceutics Pharmaceuticals best of brub
 
A Free 200-Page eBook ~ Brain and Mind Exercise.pptx
A Free 200-Page eBook ~ Brain and Mind Exercise.pptxA Free 200-Page eBook ~ Brain and Mind Exercise.pptx
A Free 200-Page eBook ~ Brain and Mind Exercise.pptx
 
KHUSWANT SINGH.pptx ALL YOU NEED TO KNOW ABOUT KHUSHWANT SINGH
KHUSWANT SINGH.pptx ALL YOU NEED TO KNOW ABOUT KHUSHWANT SINGHKHUSWANT SINGH.pptx ALL YOU NEED TO KNOW ABOUT KHUSHWANT SINGH
KHUSWANT SINGH.pptx ALL YOU NEED TO KNOW ABOUT KHUSHWANT SINGH
 
SWOT analysis in the project Keeping the Memory @live.pptx
SWOT analysis in the project Keeping the Memory @live.pptxSWOT analysis in the project Keeping the Memory @live.pptx
SWOT analysis in the project Keeping the Memory @live.pptx
 

• Multiplying the π_i^*(θ) together (often called the independent opinion pool) is wrong if they are quite dependent.
• 8. Example: Suppose the four Triangle weather forecasters' posterior distributions, π_i^*(θ), were developed as
  • π_i^*(θ) ∝ f_i(x_i | θ) π(θ), where π(θ) is the weather-model forecast and the f_i(x_i | θ) are independent recent local temperature measurements.
  • Then the correct posterior distribution is
        π^*(θ) ∝ π(θ) ∏_{i=1}^{4} f_i(x_i | θ)
    (not (1/4) ∑_{i=1}^{4} π_i^*(θ) or ∏_{i=1}^{4} π_i^*(θ)).
  • You could find this if you independently knew π(θ), since
        π^*(θ) ∝ [ ∏_{i=1}^{4} π_i^*(θ) ] / π(θ)³.
  • Usually, however, you cannot untangle dependent information from different sources.
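A toy numerical sketch of the "untangling" step above (not from the slides; the prior, likelihoods, and observations are synthetic stand-ins): four normal posteriors that all reused the same prior are combined on a grid by multiplying them and dividing by the prior cubed, and the result matches the posterior computed directly from the pooled likelihoods.

import numpy as np
from scipy import stats

theta = np.linspace(-10, 10, 4001)                       # grid for theta
prior = stats.norm.pdf(theta, 0, 2)                      # common prior pi(theta), assumed known

x = [1.2, 0.7, 1.9, 1.1]                                 # one synthetic observation per forecaster
liks = [stats.norm.pdf(xi, theta, 1.0) for xi in x]      # f_i(x_i | theta)

posts = [lik * prior for lik in liks]                    # individual posteriors pi_i*(theta)
posts = [p / p.sum() for p in posts]                     # normalize on the grid

combined = np.prod(posts, axis=0) / prior**3             # prod_i pi_i*(theta) / pi(theta)^3
combined /= combined.sum()

direct = prior * np.prod(liks, axis=0)                   # pi(theta) * prod_i f_i(x_i | theta)
direct /= direct.sum()

print(np.max(np.abs(combined - direct)))                 # ~ 0: the two constructions agree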
• 9. Hierarchical modeling of the volume/basal-friction relationship
  [Scatter plot: coefficient of friction (= H/L) versus volume (m³) for pyroclastic flows at five volcanoes: Colima, Merapi, Soufriere Hill, Unzen, and Semeru.]
• 10. Hierarchical Modeling:
  • For the jth pyroclastic flow at the ith volcano, model the relationship of the observed basal friction angle b_{ij} to volume V_{ij} by
        log tan b_{ij} = a_i + c_i log V_{ij} + ε_{ij},   ε_{ij} ∼ N(0, σ_i²).
  • Assume that the c_i from different volcanoes arise from a normal distribution N(µ_c, σ_c²).
  • Assign objective priors to the a_i, the σ_i², and to µ_c, σ_c²:
    – a constant prior for the a_i and µ_c;
    – the prior 1/σ_i² for the σ_i²;
    – the prior 1/σ_c for σ_c².
  • Find the posterior distribution of all unknown parameters, given the data {b_{ij}, i = 1, . . . , 5; j = 1, . . . , n_i}.
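A minimal sketch of fitting a hierarchical model of this form (not the authors' code; the volcano data are replaced by a small synthetic set, and weakly informative proper priors stand in for the improper objective priors on the slide), assuming PyMC is available:

import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
n_volcano = 5
volcano = np.repeat(np.arange(n_volcano), 8)              # volcano index for each flow (synthetic)
logV = rng.uniform(11, 17, size=volcano.size)             # log volume (synthetic)
y = -0.2 - 0.1 * logV + rng.normal(0, 0.1, volcano.size)  # stand-in for log tan b_ij

with pm.Model():
    a = pm.Normal("a", 0, 10, shape=n_volcano)             # per-volcano intercepts a_i
    mu_c = pm.Normal("mu_c", 0, 10)                        # population mean of the slopes
    sigma_c = pm.HalfNormal("sigma_c", 5)                  # population sd of the slopes
    c = pm.Normal("c", mu=mu_c, sigma=sigma_c, shape=n_volcano)   # c_i ~ N(mu_c, sigma_c^2)
    sigma = pm.HalfNormal("sigma", 5, shape=n_volcano)     # per-volcano error sds
    pm.Normal("obs", mu=a[volcano] + c[volcano] * logV,
              sigma=sigma[volcano], observed=y)
    idata = pm.sample(1000, tune=1000, chains=2)           # MCMC over all unknowns

The "borrowing strength" on the next slides comes from the shared N(µ_c, σ_c²) distribution of the slopes c_i.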
• 11. [Plot for Semeru: coefficient of friction versus volume (m³), showing the data points, the ordinary linear-regression fit with its confidence bands, and the hierarchical-model (HM) median regression line with its confidence bands.]
  For this mountain there were only 4 data points, so ordinary linear regression gives very wide confidence bands (the dotted lines). The hierarchical Bayes analysis gives much narrower bands because it uses the knowledge about the c_i from the other mountains (called "borrowing strength").
• 12. Incorporation into the probabilistic risk assessment:
  • From the posterior, generate a sample of (a, c) for Montserrat, using a version of Markov chain Monte Carlo (MCMC) called Gibbs sampling. This yields the volume/friction curves below.
  • Combine each curve with an emulator of TITAN2D to predict probabilistic risk in the Belham valley.
  • The average, over the curves, of the risk is the reported risk.
  [Plot: 50 replicates of the basal friction (degrees) versus volume (×10⁶ m³) curve at Montserrat.]
• 13. Lecture 3: Essentials of Bayesian Model Uncertainty
• 14. Outline
  • Possible goals for Bayesian model uncertainty analysis
  • Formulation and notation for Bayesian model uncertainty
  • Model averaging
  • Attractions of the Bayesian approach
  • Challenges with the Bayesian approach
  • Approaches to prior choice for model uncertainty
• 15. I. Possible goals for Bayesian model uncertainty analysis
• 16. Goal 1. Selection of the true model from a collection of models that contains the truth.
    – Decision-theoretically, often viewed as choice of 0–1 loss in model selection.
  Goal 2. Selection of a model good for prediction of future observations or phenomena.
    – Decision-theoretically, often viewed as choice of squared-error loss for predictions, or Kullback-Leibler divergence for predictive distributions.
  Goal 3. Finding a simple approximate model that is close enough to the true model to be useful. (This is purely a utility issue.)
  Goal 4. Finding important covariates, important relationships between covariates, and other features of model structure. (Often, the model is not of primary importance; it is the relationships of covariates in the model, or whether certain covariates should even be in the model, that are of interest.)
• 17. Caveat 1: Suppose the models under consideration do not contain the true model, often called the open model scenario.
  Positive fact: Bayesian analysis will asymptotically give probability one to the model that is as close as possible to the true model (in Kullback-Leibler divergence), among the models considered, so the Bayesian approach is still viable.
  Issues of concern:
    • Bayesian internal estimates of accuracy may be misleading (Barron, 2004).
    • Model averaging may no longer be optimal.
    • There is no obvious reason why models need even be 'plausible'.
  Caveat 2: There are two related but quite different statistical paradigms involving model uncertainty (not discussed here):
    • Model criticism or model checking.
    • The process of model development.
• 18. II. Formulation and notation for Bayesian model uncertainty
• 19. Models (or hypotheses). Data, x, is assumed to have arisen from one of several models:
      M_1: x has density f_1(x | θ_1)
      M_2: x has density f_2(x | θ_2)
      ...
      M_q: x has density f_q(x | θ_q)
  Assign prior probabilities, P(M_i), to each model. A common choice is equal probabilities P(M_i) = 1/q, but this is inappropriate in many scenarios, such as variable selection (discussed in Lecture 4).
• 20. Under model M_i:
    – Prior density of θ_i:  θ_i ∼ π_i(θ_i).
    – Marginal density of x:  m_i(x) = ∫ f_i(x | θ_i) π_i(θ_i) dθ_i, which measures "how likely is x under M_i".
    – Posterior density of θ_i:  π_i(θ_i | x) = π(θ_i | x, M_i) = f_i(x | θ_i) π_i(θ_i) / m_i(x), which quantifies (posterior) beliefs about θ_i if the true model was M_i.
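A small self-contained illustration (not from the slides) of computing the marginal density m_i(x) = ∫ f_i(x | θ_i) π_i(θ_i) dθ_i by quadrature, for the toy model x_1, ..., x_n ~ N(θ, 1) with prior θ ~ N(0, 1), checked against the known closed form:

import numpy as np
from scipy import integrate, stats

x = np.array([0.3, -1.2, 0.8, 1.5, 0.1])    # synthetic data
n = len(x)

def integrand(theta):
    # f(x | theta) * pi(theta) at a scalar theta
    return np.prod(stats.norm.pdf(x, loc=theta, scale=1.0)) * stats.norm.pdf(theta)

m_quad, _ = integrate.quad(integrand, -np.inf, np.inf)

# Closed form for this toy model: m(x) = (2π)^{-n/2} (n+1)^{-1/2} exp{-(Σx_i² - (Σx_i)²/(n+1))/2}
S = x.sum()
m_exact = (2 * np.pi) ** (-n / 2) / np.sqrt(n + 1) * np.exp(-0.5 * (np.sum(x**2) - S**2 / (n + 1)))

print(m_quad, m_exact)    # the two agree to numerical precision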
• 21. Bayes factor of M_j to M_i:  B_{ji} = m_j(x) / m_i(x).
  Posterior probability of M_i:
      P(M_i | x) = P(M_i) m_i(x) / ∑_{j=1}^{q} P(M_j) m_j(x) = [ 1 + ∑_{j≠i} (P(M_j)/P(M_i)) B_{ji} ]^{−1}
  Particular case, P(M_j) = 1/q:
      P(M_i | x) = m̄_i(x) = m_i(x) / ∑_{j=1}^{q} m_j(x) = 1 / ∑_{j=1}^{q} B_{ji}
  Reporting: It is useful to separately report {m_i(x)} and {P(M_i)}, along with
      P(M_i | x) = P(M_i) m_i(x) / ∑_{j=1}^{q} P(M_j) m_j(x).
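A minimal helper (illustrative; the function names and the numbers in the example are hypothetical, not from the slides) that turns marginal likelihoods m_i(x) and prior model probabilities P(M_i) into Bayes factors and posterior model probabilities, following the formulas above:

import numpy as np

def posterior_model_probs(marginals, priors=None):
    """marginals: array of m_i(x); priors: array of P(M_i) (default: equal)."""
    m = np.asarray(marginals, dtype=float)
    p = np.full(m.size, 1.0 / m.size) if priors is None else np.asarray(priors, dtype=float)
    weights = p * m
    return weights / weights.sum()

def bayes_factor(marginals, j, i):
    """B_{ji} = m_j(x) / m_i(x)."""
    return marginals[j] / marginals[i]

# Example with equal prior probabilities and hypothetical marginal likelihoods:
m = [2.1e-3, 4.5e-4, 9.0e-4]
print(posterior_model_probs(m))      # posterior probabilities of the three models
print(bayes_factor(m, 0, 1))         # B_{12} in this ordering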
• 22. Example: location-scale. Suppose X_1, X_2, . . . , X_n are i.i.d. with density
      f(x_i | µ, σ) = (1/σ) g((x_i − µ)/σ).
  Several models are entertained:
      M_N: g is N(0, 1)
      M_U: g is Uniform(0, 1)
      M_C: g is Cauchy(0, 1)
      M_L: g is Left Exponential ((1/σ) e^{(x−µ)/σ}, x ≤ µ)
      M_R: g is Right Exponential ((1/σ) e^{−(x−µ)/σ}, x ≥ µ)
• 23. Difficulty: The models are not nested and have no common low-dimensional sufficient statistics; classical model selection would thus typically rely on asymptotics.
  Prior distribution: choose, for i = 1, . . . , 5, P(M_i) = 1/q = 1/5; utilize the objective prior π_i(θ_i) = π_i(µ, σ) = 1/σ.
  Objective improper priors (invariant) are o.k. for the common parameters in situations having the same invariance structure, if the right-Haar prior is used (Berger, Pericchi, and Varshavsky, 1998).
  Marginal distributions, m^N(x | M), for these models can then be calculated in closed form.
• 24. Here m^N(x | M) = ∫∫ [ ∏_{i=1}^{n} (1/σ) g((x_i − µ)/σ) ] (1/σ) dµ dσ. For the five models, the marginals are
      1. Normal:  m^N(x | M_N) = Γ((n−1)/2) / [ 2 π^{(n−1)/2} √n ( ∑_i (x_i − x̄)² )^{(n−1)/2} ]
      2. Uniform:  m^N(x | M_U) = 1 / [ n (n−1) (x_{(n)} − x_{(1)})^{n−1} ]
      3. Cauchy:  m^N(x | M_C) is given in Spiegelhalter (1985).
      4. Left Exponential:  m^N(x | M_L) = (n−2)! / [ n^n (x_{(n)} − x̄)^{n−1} ]
      5. Right Exponential:  m^N(x | M_R) = (n−2)! / [ n^n (x̄ − x_{(1)})^{n−1} ]
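A sketch (illustrative, not the authors' code; the data vector is synthetic) that evaluates the closed-form marginals above, approximates the Cauchy marginal by brute-force numerical integration of the definition, and normalizes everything into posterior model probabilities under equal prior probabilities:

import numpy as np
from math import gamma, factorial, pi, sqrt
from scipy import integrate, stats

def m_normal(x):
    n = len(x); W = np.sum((x - x.mean()) ** 2)
    return gamma((n - 1) / 2) / (2 * pi ** ((n - 1) / 2) * sqrt(n) * W ** ((n - 1) / 2))

def m_uniform(x):
    n = len(x); r = x.max() - x.min()
    return 1.0 / (n * (n - 1) * r ** (n - 1))

def m_left_exp(x):
    n = len(x)
    return factorial(n - 2) / (n ** n * (x.max() - x.mean()) ** (n - 1))

def m_right_exp(x):
    n = len(x)
    return factorial(n - 2) / (n ** n * (x.mean() - x.min()) ** (n - 1))

def m_cauchy(x):
    # Slow brute-force quadrature of  ∫∫ [prod_i (1/σ) g((x_i − µ)/σ)] (1/σ) dµ dσ
    def integrand(mu, sigma):
        return np.prod(stats.cauchy.pdf(x, loc=mu, scale=sigma)) / sigma
    val, _ = integrate.dblquad(integrand, 0, np.inf, lambda s: -np.inf, lambda s: np.inf)
    return val

x = np.array([0.2, -0.5, 1.3, 0.9, -0.1, 2.0])     # synthetic data
marg = np.array([m_normal(x), m_uniform(x), m_cauchy(x), m_left_exp(x), m_right_exp(x)])
print(marg / marg.sum())                           # posterior model probabilities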
• 25. Consider four classic data sets:
  • Darwin's data (n = 15)
  • Cavendish's data (n = 29)
  • Stigler's Data Set 9 (n = 20)
  • A randomly generated Cauchy sample (n = 14)
• 26. [Dot plots of the four data sets: Darwin's Data (n = 15), Cavendish's Data (n = 29), Stigler's Data Set 9 (n = 20), and the Generated Cauchy Data (n = 15).]
• 27. The objective posterior probabilities of the five models, for each data set,
      m̄^N(x | M) = m^N(x | M) / [ m^N(x | M_N) + m^N(x | M_U) + m^N(x | M_C) + m^N(x | M_L) + m^N(x | M_R) ],
  are as follows:

      Data set     Normal     Uniform    Cauchy    L. Exp.    R. Exp.
      Darwin       .390       .056       .430      .124       .0001
      Cavendish    .986       .010       .004      4×10⁻⁸     .0006
      Stigler 9    7×10⁻⁸     4×10⁻⁵     .994      .006       2×10⁻¹³
      Cauchy       5×10⁻¹³    9×10⁻¹²    .9999     7×10⁻¹⁸    1×10⁻⁴
• 29. Accounting for model uncertainty
  • Selecting a single model and using it for inference ignores model uncertainty, resulting in inferior inferences and considerable overstatements of accuracy.
  • The Bayesian approach incorporates this uncertainty by model averaging; if, say, inference concerning ξ is desired, it would be based on
        π(ξ | x) = ∑_{i=1}^{q} P(M_i | x) π(ξ | x, M_i).
    Note: ξ must have the same meaning across models.
  • A useful link, with papers, software, and URLs about BMA: http://www.research.att.com/~volinsky/bma.html
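A small illustration (not from the slides; the model probabilities and posterior draws are hypothetical stand-ins for real MCMC output) of model averaging for inference on a common quantity ξ: posterior draws of ξ under each model are mixed in proportion to the posterior model probabilities P(M_i | x).

import numpy as np

rng = np.random.default_rng(1)
post_probs = np.array([0.60, 0.30, 0.10])            # hypothetical P(M_i | x)

# Hypothetical posterior draws of xi under each model:
draws = [rng.normal(1.0, 0.2, 5000),
         rng.normal(1.3, 0.3, 5000),
         rng.normal(0.8, 0.5, 5000)]

# Model-averaged posterior sample: pick a model per draw, then a draw from that model.
which = rng.choice(len(draws), size=5000, p=post_probs)
xi_avg = np.array([rng.choice(draws[m]) for m in which])

print(xi_avg.mean(), xi_avg.std())                   # model-averaged posterior mean and sd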
• 30. Example: Vehicle emissions data from McDonald et al. (1995)
  Goal: From i.i.d. vehicle emission data X = (X_1, . . . , X_n), one desires to determine the probability that the vehicle type will meet regulatory standards.
  Traditional models for this type of data are Weibull and lognormal distributions, given respectively by
      M_1: f_W(x | β, γ) = (γ/β) (x/β)^{γ−1} exp{ −(x/β)^γ }
      M_2: f_L(x | µ, σ²) = (1 / (x √(2πσ²))) exp{ −(log x − µ)² / (2σ²) }.
  Model Averaging Analysis:
  • For the studied data set, P(M_1 | x) = .712. Hence,
        P(meeting regulatory standard) = .712 P(meeting standard | M_1) + .288 P(meeting standard | M_2).
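A hedged sketch of the model-averaged probability above: the emissions threshold and the plug-in parameter values below are hypothetical numbers for illustration only (a full analysis would also average over the parameter posteriors within each model, not just over the two models).

from math import exp
from scipy import stats

threshold = 2.0                      # hypothetical regulatory limit on emissions
p_M1 = 0.712                         # P(M_1 | x), Weibull, as on the slide
p_M2 = 1.0 - p_M1                    # P(M_2 | x), lognormal

# Hypothetical plug-in parameter estimates:
beta, gamma_shape = 1.1, 1.8         # Weibull scale and shape
mu, sigma = -0.1, 0.6                # lognormal parameters (of log X)

p_meet_W = stats.weibull_min.cdf(threshold, c=gamma_shape, scale=beta)   # P(X <= threshold | M_1)
p_meet_L = stats.lognorm.cdf(threshold, s=sigma, scale=exp(mu))          # P(X <= threshold | M_2)

print(p_M1 * p_meet_W + p_M2 * p_meet_L)   # model-averaged probability of meeting the standard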
• 31. Model averaging is most used for prediction:
  Goal: Predict a future Y, given X.
  Optimal prediction is based on model averaging (cf. Draper, 1994). If one of the M_i is indeed the true model (but unknown), the optimal predictor is based on
      m(y | x) = ∑_{i=1}^{q} P(M_i | x) m_i(y | x, M_i),
  where m_i(y | x, M_i) = ∫ f_i(y | θ_i) π_i(θ_i | x) dθ_i is the predictive distribution when M_i is true.
• 32. IV. Reasons for adopting the Bayesian approach to model uncertainty
• 33. Reason 1: Ease of interpretation
  • Bayes factors ⇝ odds; posterior model probabilities ⇝ direct interpretation.
  • In the location example and Darwin's data, the posterior model probabilities are
        m̄_N(x) = .390, m̄_U(x) = .056, m̄_C(x) = .430, m̄_L(x) = .124, m̄_R(x) = .0001,
    and the Bayes factors are
        B_{NU} = 6.96, B_{NC} = .91, B_{NL} = 3.15, B_{NR} = 3900.
  • Interpreting five p-values (or 20, if all pairwise tests are done with different nulls) is much harder.
• 34. Reason 2: Prior information can be incorporated, if desired
  • If the default report is m_i(x), i = 1, . . . , q, then for any prior probabilities
        P(M_i | x) = P(M_i) m_i(x) / ∑_{j=1}^{q} P(M_j) m_j(x).
  Example: For Darwin's data, if symmetric models are twice as likely as the non-symmetric ones, then Pr(M_N) = Pr(M_U) = Pr(M_C) = 1/4 and Pr(M_L) = Pr(M_R) = 1/8, so that
      Pr(M_N | x) = .416, Pr(M_U | x) = .06, Pr(M_C | x) = .458, Pr(M_L | x) = .066, Pr(M_R | x) = .00005.
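A quick numerical check (illustrative) of the reweighting on this slide, using the normalized Darwin's-data marginals reported on slide 33 together with the new prior probabilities:

import numpy as np

m_bar = np.array([0.390, 0.056, 0.430, 0.124, 0.0001])   # Normal, Uniform, Cauchy, L.Exp, R.Exp
priors = np.array([1/4, 1/4, 1/4, 1/8, 1/8])

weights = priors * m_bar
print(weights / weights.sum())    # ≈ [.416, .060, .458, .066, .00005], matching the slide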
• 35. Reason 3: Consistency
  A. If one of the M_i is true, then m̄_i → 1 as n → ∞.
  B. If some other model, M*, is true (one not in the original collection of models), then m̄_i → 1 for the model closest to M* in Kullback-Leibler divergence (Berk, 1966; Dmochowski, 1994).
  Reason 4: Ockham's razor
  Bayes factors automatically seek parsimony; no ad hoc penalties for model complexity are needed.
• 36. Reason 5: Frequentist interpretation
  In many important situations involving testing of M_1 versus M_2, B_{12}/(1 + B_{12}) and 1/(1 + B_{12}) are 'optimal' conditional frequentist error probabilities of Type I and Type II, respectively.
  Indeed, Pr(H_0 | data x) is the frequentist Type I error probability, conditional on observing data of the same "strength of evidence" as the actual data x. (Classical unconditional error probabilities make the mistake of reporting the error averaged over data of very different strengths.)
  See Sellke, Bayarri and Berger (2001) for discussion and earlier references.
• 37. Reason 6: The Ability to Account for Model Uncertainty through Model Averaging
  • For non-Bayesians, it is highly problematical to perform inference from a model based on the same data used to select the model.
    – It is not legitimate to do so in the frequentist paradigm.
    – It is very hard to determine how to 'correct' the inferences (and known methods are often extremely conservative).
  • Bayesian inference based on model averaging overcomes this problem.
• 38. Reason 7: Generality of application
  • The Bayesian approach is viable for any number of models, regular or irregular, nested or not, large or small sample sizes.
  • The classical approach is most developed for only two models (hypotheses), and typically requires at least one of: (i) nested models, (ii) standard distributions, (iii) regular asymptotics.
  • Example 1: In the location-scale example, there were 5 non-nested models, with small sample sizes, irregular asymptotics, and no reduced sufficient statistics.
  • Example 2: Suppose the X_i are i.i.d. from the N(θ, 1) density, where θ = "mean effect of T1 − mean effect of T2."
      Standard testing formulation:  H_0: θ = 0 (no difference in treatments)  vs.  H_1: θ ≠ 0 (a difference exists)
      A more revealing formulation:  H_0: θ = 0 (no difference)  vs.  H_1: θ < 0 (T2 is better)  vs.  H_2: θ > 0 (T1 is better)
• 39. Reason 8: Standard Bayesian 'ease of application' motivations
  1. In sequential scenarios, there is no need to 'spend α' for looks at the data; posterior probabilities are not affected by the reason for stopping experimentation (discussed later).
  2. The distributions of known censoring variables are not needed for Bayesian analysis (not discussed).
  3. Multiple testing problems can be addressed in a straightforward manner (not discussed).
• 40. V. Challenges with the Bayesian approach to model uncertainty
• 41. Difficulty 1: Inadequacy of common priors, such as (improper) objective priors, vague proper priors, and conjugate priors.
  As in hypothesis testing, improper priors can only be used for "common parameters" in models, and vague proper priors are usually horrible!
  The problem with conjugate priors — Normal Example: Suppose X_1, . . . , X_n are i.i.d. N(x | θ, σ²) and it is desired to test H_0: θ = 0 versus H_1: θ ≠ 0.
  • The natural conjugate priors are π_1(σ²) = 1/σ²; π_2(θ | σ²) = N(θ | 0, σ²) and π_2(σ²) = 1/σ².
  • Bayes factor:
        B_{12} = √(n+1) [ (1 + t²/[(n−1)(n+1)]) / (1 + t²/(n−1)) ]^{n/2},  where t = √n x̄ / (s/√(n−1)).
  • Silly behavior, called 'information inconsistency': as |t| → ∞,
        B_{12} → (n+1)^{−(n−1)/2} > 0.
    For instance: if n = 3, B_{12} → 1/4; if n = 5, B_{12} → 1/36.
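A small numerical check (illustrative) of the 'information inconsistency' on this slide: the conjugate-prior Bayes factor B_{12}, as a function of |t|, flattens out at (n+1)^{−(n−1)/2} instead of going to 0 as the evidence against H_0 grows.

import numpy as np

def B12(t, n):
    # Bayes factor formula from the slide above
    return np.sqrt(n + 1) * ((1 + t**2 / ((n - 1) * (n + 1))) / (1 + t**2 / (n - 1))) ** (n / 2)

n = 5
for t in [2.0, 5.0, 20.0, 1e3, 1e6]:
    print(t, B12(t, n))
print("limit:", (n + 1) ** (-(n - 1) / 2))    # 1/36 for n = 5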
• 42. Difficulty 2: Computation
  Computation of the needed marginal distributions (or their ratios) can be very challenging in high dimensions. (A useful approximation is discussed later.)
  The total number of models under consideration can be enormous (e.g., in variable selection), so good search strategies are needed.
  Difficulty 3: Posterior inference
  When thousands (or millions) of models all have similarly high posterior probability (as is common), how does one summarize the information contained therein? Model-averaging summaries are needed, but summaries of what quantities? (Lecture 4.)
• 43. VI. Approaches to prior choice for model uncertainty
  • Use of 'conventional priors' (Lecture 4)
  • Use of intrinsic priors (discussed for hypothesis testing in Lecture 1)
  • Inducing model priors from a single prior
• 44. Inducing Model Priors from a Single Prior
  The idea:
  • Specify a prior π_L(β_L) for the 'largest' model.
  • Use this prior to induce priors on the other models. Possibilities include
    – In variable selection, conditioning by setting the variables to be removed to zero. (Logically sound if the variables are orthogonal.)
    – Marginalizing out the variables to be removed. (Not logically sound.)
    – Matching the parameters of the other models with the parameters of the largest model by, say, minimizing Kullback-Leibler divergence between models, and then projecting π_L(β_L). (Probably best, but hard.)
• 45. An example: Suppose the model parameters are probabilities, with the largest model having probabilities p_1, . . . , p_m, with prior π_L(p_1, . . . , p_m) ∼ Dirichlet(1, . . . , 1) (i.e., the uniform distribution on the simplex where ∑_{i=1}^{m} p_i = 1). If model M_l has parameters (p_{i_1}, . . . , p_{i_l}),
  • conditioning yields π(p_{i_1}, . . . , p_{i_l}, p* | other p_j = 0) = Dirichlet(1, . . . , 1), where p* = 1 − ∑_{j=1}^{l} p_{i_j}. This is sensible.
  • marginalizing yields π(p_{i_1}, . . . , p_{i_l}, p*) = Dirichlet(1, . . . , 1, m − l). This is typically much too concentrated near zero when m is large, with the p_{i_k} having means 1/m.
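An illustrative comparison (not from the slides; m and l are arbitrary choices) of the two induced priors when m = 50 and the submodel keeps l = 3 probabilities: conditioning gives a uniform Dirichlet(1, 1, 1, 1) on (p_{i_1}, p_{i_2}, p_{i_3}, p*), while marginalizing gives Dirichlet(1, 1, 1, 47), which piles the kept probabilities up near zero.

import numpy as np

rng = np.random.default_rng(0)
m, l = 50, 3

cond = rng.dirichlet([1, 1, 1, 1], size=100_000)         # conditioned prior
marg = rng.dirichlet([1, 1, 1, m - l], size=100_000)     # marginalized prior

print("mean of p_{i1}: conditioned", cond[:, 0].mean(),
      "vs marginalized", marg[:, 0].mean())
# conditioned mean ≈ 1/4; marginalized mean ≈ 1/m = 0.02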
• 46. Example: Posterior Model Probabilities for Variable Selection in Probit Regression (from CMU Case Studies VII, Viele et al. case study)
  Motivating example: Prosopagnosia (face blindness) is a condition (usually developing after brain trauma) under which the individual cannot easily distinguish between faces. A psychological study was conducted by Tarr, Behrmann and Gauthier to address whether this extended to a difficulty in distinguishing other objects, or was particular to faces.
  The study considered 30 control subjects (C) and 2 subjects (S) diagnosed as having prosopagnosia, and had them try to differentiate between similar faces, similar 'Greebles' and similar 'objects' at varying levels of difficulty and varying comparison time.
• 47. Data: The sample size was n = 20,083, the response and covariates being
      C = Subject C, S = Subject S, G = Greebles, O = Object, D = Difficulty, B = Brief time, A = Images match or not  ⇒  R = Response (answer correct or not).
  All variables are binary. All interactions, consisting of all combinations of C, S, G, O, D, B, A (except {C=S=1} and {G=O=1}), are of interest. There are 3 possible subjects (a control and the two patients), 3 possible stimuli (faces, objects, or greebles), and 2 possibilities for each of D, B and A, yielding 3 × 3 × 2 × 2 × 2 = 72 possible interactions.
• 48. Statistical modeling: For a specified interaction vector X_i, let y_i and n_i − y_i be the numbers of successes and failures among the responses with that interaction vector, with the probability of success p_i assumed to follow the probit regression model
      p_i = Φ(β_1 + ∑_{j=2}^{72} X_{ij} β_j),
  with Φ being the normal cdf. The full-model likelihood (up to a fixed proportionality constant) is then
      f(y | β) = ∏_{i=1}^{72} p_i^{y_i} (1 − p_i)^{n_i − y_i}.
  Goal: Select from among the 2^72 submodels which have some of the β_j set equal to zero. (Actually, only models with graphical structure were considered, i.e., if an interaction term is in the model, all the lower-order effects must also be there.)
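A minimal sketch (illustrative; the covariate matrix, counts, and coefficients below are small synthetic stand-ins, not the study's data) of evaluating the probit-regression likelihood on this slide:

import numpy as np
from scipy import stats

def probit_loglik(beta, X, y, n):
    """log f(y | beta) for p_i = Phi(x_i' beta), with y_i successes out of n_i trials."""
    p = stats.norm.cdf(X @ beta)
    return np.sum(y * np.log(p) + (n - y) * np.log1p(-p))

rng = np.random.default_rng(0)
k = 5                                         # small stand-in for the 72 interaction columns
X = np.column_stack([np.ones(10), rng.integers(0, 2, size=(10, k - 1))])
n = rng.integers(50, 300, size=10)            # trials per interaction cell
beta_true = rng.normal(0, 0.5, size=k)
y = rng.binomial(n, stats.norm.cdf(X @ beta_true))

print(probit_loglik(beta_true, X, y, n))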
• 49. Prior Choice: Conditionally inducing submodel priors from a full-model prior
  • A standard noninformative prior for p = (p_1, p_2, . . . , p_72) is the uniform prior, usable here since it is proper.
  • Changing variables from p to β yields that π_L(β) is N_72(0, (X′X)^{−1}), where X′ = (X′_1, X′_2, . . . , X′_72). This is the full-model prior.
  • For the model with a nonzero subvector β_(j) of dimension k_j, with the remaining β_i = 0, the induced prior would be
        π_L(β_(j) | β_(−j) = 0) = N_{k_j}(0, (X′_(j) X_(j))^{−1}),
    where β_(−j) is the subvector of parameters that are zero and X_(j) is the corresponding design matrix. (This is a nice conditional property of normal distributions.)
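A numerical check (illustrative; the design matrix is synthetic) of the "nice conditional property" on this slide: if β ~ N(0, (X′X)^{−1}) and a subset of coordinates is conditioned to be 0, the conditional covariance of the remaining coordinates is (X′_(j) X_(j))^{−1}, the inverse of the corresponding block of the precision matrix X′X.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))                  # small synthetic design matrix
keep = [0, 2, 5]                              # indices of the nonzero subvector beta_(j)
drop = [1, 3, 4]                              # indices conditioned to zero

Sigma = np.linalg.inv(X.T @ X)                # full-model prior covariance
S_kk = Sigma[np.ix_(keep, keep)]
S_kd = Sigma[np.ix_(keep, drop)]
S_dd = Sigma[np.ix_(drop, drop)]
cond_cov = S_kk - S_kd @ np.linalg.inv(S_dd) @ S_kd.T    # covariance given beta_(-j) = 0

induced = np.linalg.inv(X[:, keep].T @ X[:, keep])       # (X_(j)' X_(j))^{-1}
print(np.allclose(cond_cov, induced))                    # True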