This document discusses Bayesian estimation of conditional moment models. It presents several approaches to completing conditional moment models for Bayesian processing, including non-parametric components, empirical likelihood Bayesian tools, and maximum entropy alternatives. It also discusses simplistic ABC alternatives, along with innovative aspects such as the introduction of tolerance parameters for misspecification and the cancellation of conditional aspects. Unconditional and conditional model comparison using empirical likelihoods and Bayes factors is proposed.
A discussion of Chib, Shin, and Simoni (2017, 2018) on Bayesian moment models
1. Bayesian estimation of conditional
moment models: Discussion
Christian P. Robert
(Paris Dauphine PSL & Warwick U.)
10th French Econometrics Conference
2. [conditional] moment models
Model partly defined by moment conditions
Requires completion by a non-parametric part for Bayesian processing
Or use of empirical likelihood Bayesian tools
[Schennach, 2005; Chib, Shin, Simoni, 2017, 2018]
Warning: conditional aspects secondary to the discussion
why is Schennach (2005) unknown to the BNP community?
4. Empirical likelihood
Given a dataset $x_1, \ldots, x_n$ and moment constraints
$\mathbb{E}[g(X, \theta)] = 0$
the empirical likelihood is defined as
$L_{\mathrm{emp}}(\theta \mid x) = \prod_{i=1}^{n} \hat{p}_i$
with $(\hat{p}_1, \ldots, \hat{p}_n)$ minimising
$\sum_{i=1}^{n} p_i \log(p_i)$
under the constraint
$\sum_{i=1}^{n} p_i\, g(x_i, \theta) = 0$
[Owen, 1988]
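The entropy minimisation above has a convenient dual: the weights take the exponential-tilting form $\hat{p}_i \propto \exp\{\lambda^{\mathsf{T}} g(x_i, \theta)\}$, with $\lambda$ obtained by minimising the log of the tilted average. A minimal numpy/scipy sketch, where the moment function $g(x, \theta) = x - \theta$ is an illustrative assumption, not taken from the slides:

```python
import numpy as np
from scipy.optimize import minimize

def etel_weights(g_vals):
    """Exponentially tilted weights p_i ∝ exp(λ·g_i) satisfying Σ p_i g_i = 0.

    λ minimises log(mean exp(λ·g_i)); the gradient of that dual is Σ p_i g_i,
    so the moment constraint holds at the optimum. g_vals: (n, d) array.
    """
    n, d = g_vals.shape

    def dual(lam):
        return np.log(np.mean(np.exp(g_vals @ lam)))

    res = minimize(dual, np.zeros(d), method="BFGS")
    w = np.exp(g_vals @ res.x)
    return w / w.sum()

rng = np.random.default_rng(0)
x = rng.normal(0.3, 1.0, size=200)
theta = 0.0
g = (x - theta).reshape(-1, 1)   # toy moment function g(x, θ) = x − θ
p = etel_weights(g)
print(np.dot(p, x - theta))      # ≈ 0: constraint satisfied at the optimum
```

The dual is convex and solvable whenever 0 lies in the convex hull of the $g(x_i, \theta)$, which is the usual existence condition for these weights.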
6. Maximum entropy alternative?
Given a reference measure $d\mu$, use the least informative sampling distribution [if it exists]
$f_0(x \mid \theta) = \frac{dP_\theta(x)}{d\mu} = \exp\{\lambda(\theta)^{\mathsf{T}} g(x, \theta)\}$
with $\lambda(\theta)$ the Lagrange multiplier for the constraint
[Jaynes, 2008]
and associate with prior $\pi(\theta)$
Warning: Lagrange multiplier differs from BETEL representation of empirical likelihood
Unlikely to be a generative model
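For a concrete case, take $d\mu$ the Lebesgue measure on $[0,1]$ and a single mean constraint $g(x, \theta) = x - \theta$: then $\lambda(\theta)$ solves $\mathbb{E}_\lambda[x] = \theta$ under the tilted density $\propto \exp(\lambda x)$, a one-dimensional root-finding problem. A hedged sketch, where the uniform reference measure and moment choice are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import brentq

GRID = np.linspace(0.0, 1.0, 2001)

def tilted_mean(lam):
    """Mean of the density ∝ exp(λ x) on [0, 1], via Riemann-sum quadrature."""
    w = np.exp(lam * GRID)
    return float((GRID * w).sum() / w.sum())

def lagrange_multiplier(theta):
    """λ(θ) matching E_λ[x] = θ for the maxent density exp{λ(θ) g(x, θ)} (up to normalisation)."""
    return brentq(lambda lam: tilted_mean(lam) - theta, -50.0, 50.0)

lam = lagrange_multiplier(0.7)
print(lam, tilted_mean(lam))   # tilted mean matches θ = 0.7
```

The tilted mean is monotone in $\lambda$ (it is the derivative of a convex log-partition function), so the bracketing root-finder is reliable here.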
8. Simplistic ABC alternative?
In case a generative model is available [standard ABC]
Generate $\theta$ from $\pi(\theta)$
Generate pseudo-sample $y^*(\theta)$
Compute
$\rho(\theta) = \frac{1}{n} \sum_{i=1}^{n} \{g(y_i, \theta) - g(y^*_i(\theta), \theta)\}$
Select smallest 1% in terms of distance $\|\rho(\theta)\|$
[Rubin, 1984; Tavaré et al., 1998]
9. Simplistic ABC alternative?
In case no generative model is available [standard ABC inapplicable]
Generate $\theta$ from $\pi(\theta)$
Compute
$\rho(\theta) = \frac{1}{n} \sum_{i=1}^{n} g(y_i, \theta)$
Select smallest 1% in terms of distance $\|\rho(\theta)\|$
[Rubin, 1984; Tavaré et al., 1998]
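The generative-model-free variant is even simpler, since the distance uses the observed moments alone. The same illustrative setup as before (normal data, $g(y, \theta) = y - \theta$, normal prior) is an assumption, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(2.0, 1.0, size=100)           # observed data
thetas = rng.normal(0.0, 5.0, size=10_000)   # draws from the prior π(θ)

# ρ(θ) = |(1/n) Σ g(y_i, θ)| with the toy moment g(y, θ) = y − θ
rhos = np.abs(np.array([np.mean(y - th) for th in thetas]))
keep = thetas[rhos <= np.quantile(rhos, 0.01)]   # smallest 1% of distances
print(keep.mean())   # concentrates near the sample mean of y
```

With this particular moment function the accepted draws are simply the prior draws closest to the sample mean, which illustrates how the method reduces to matching empirical moments.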
10. Innovative aspects
introduction of tolerance (nuisance) extra-parameters when some constraints do not hold (misspecification)
Chib-Jeliazkov representation of the marginal empirical likelihood
Bernstein-von Mises theorem (aka CLT) under correct specification and under misspecification
cancelling conditional aspects by choice of a (sub) functional basis [with extra parameter K]
choice of K removing conditional structure based solely on asymptotics and greatly increasing the number of constraints, hence presumably lowering efficiency [and increasing cost?]
Role of prior on indicators?
12. Unconditional model comparison
Novel and exciting aspect: to compare models (or rather moment restrictions) by genuine Bayes factors derived from empirical likelihoods.
Grand (encompassing) model obtained by considering all moment restrictions at once and use of a spike & slab prior on indicators of active constraints
[George and Foster, 1997]
At first sounds too restrictive, except the extra-parameters are there to monitor constraints that actually hold
unclear whether or not priors on extra-parameters can be automatically derived from a single prior
how much impact on the value of the Bayes factor?
13. Conditional model comparison
Extension without a grand model is a major improvement, thanks to the candidate's formula
$\log m(y) = \log \pi(\theta^*) + \log p(y \mid \theta^*) - \log \pi(\theta^* \mid y)$
[Besag, 1989; Chib, 1995]
Plus some consistency results
Requires simulation from eMCMC for each subset of constraints, with a potential combinatoric explosion
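The candidate's formula can be checked on a conjugate toy model where every term is available in closed form, e.g. $y_i \sim N(\theta, 1)$ with prior $\theta \sim N(0, 1)$; this setup is purely illustrative and not the empirical-likelihood setting of the paper:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
n = 50
y = rng.normal(1.0, 1.0, size=n)

# Conjugate posterior: θ | y ~ N(m, v) with v = 1/(n + 1), m = v Σ y_i
v = 1.0 / (n + 1.0)
m = v * y.sum()
theta_star = m   # the identity holds at any θ*; the posterior mean is convenient

# Candidate's formula: log m(y) = log π(θ*) + log p(y|θ*) − log π(θ*|y)
log_m = (norm.logpdf(theta_star, 0.0, 1.0)
         + norm.logpdf(y, theta_star, 1.0).sum()
         - norm.logpdf(theta_star, m, np.sqrt(v)))

# Closed-form marginal likelihood for the same model, for comparison
log_m_exact = (-0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(n + 1.0)
               - 0.5 * (np.sum(y**2) - y.sum()**2 / (n + 1.0)))
print(log_m, log_m_exact)   # identical up to floating point
```

In the empirical-likelihood setting the three terms are instead estimated (e.g. via MCMC output for the posterior ordinate), which is the Chib-Jeliazkov route mentioned on the innovative-aspects slide.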