Logit stick-breaking priors for partially
exchangeable count data
Tommaso Rigon
http://tommasorigon.github.io
Bocconi University
SIS 2018, Palermo, 22-06-2018
Introduction
Partial exchangeability
A bivariate sequence $(X_i, Y_j)_{i,j \geq 1}$ is partially exchangeable if
$$(X_1, \dots, X_{n_1}, Y_1, \dots, Y_{n_2}) \overset{d}{=} (X_{\sigma(1)}, \dots, X_{\sigma(n_1)}, Y_{\sigma'(1)}, \dots, Y_{\sigma'(n_2)}),$$
for any $n_1, n_2 \geq 1$ and any permutations $\sigma$ and $\sigma'$.
de Finetti's representation theorem
The sequence $(X_i, Y_j)_{i,j \geq 1}$ is partially exchangeable if and only if
$$P(X_1 \in A_1, \dots, X_{n_1} \in A_{n_1}, Y_1 \in B_1, \dots, Y_{n_2} \in B_{n_2}) = \int_{\mathcal{P}^2} \prod_{i=1}^{n_1} p_1(A_i) \prod_{j=1}^{n_2} p_2(B_j) \, Q_2(\mathrm{d}p_1, \mathrm{d}p_2).$$
Introduction
Partial exchangeability
Thus, a draw from $(X_i, Y_j)_{i,j \geq 1}$ can be expressed hierarchically:
$$(X_i \mid p_1) \overset{iid}{\sim} p_1, \qquad (Y_j \mid p_2) \overset{iid}{\sim} p_2, \qquad (p_1, p_2) \sim Q_2,$$
where each $(X_i \mid p_1)$ is independent of each $(Y_j \mid p_2)$.
The quantity $(p_1, p_2)$ is a vector of random probability measures, and $Q_2$ can be interpreted as their prior law.
If $p_1 \perp\!\!\!\perp p_2$, then the observations $(X_1, \dots, X_{n_1})$ and $(Y_1, \dots, Y_{n_2})$ can be modeled separately and independently.
Dependence between $p_1$ and $p_2$ allows for borrowing of information across the sequences.
Introduction
Partial exchangeability with count data
Let $Y_1, \dots, Y_n \in \mathbb{N}$ be a collection of count response variables, each corresponding to a qualitative covariate $x_i \in \{1, \dots, J\}$.
Each data point $y_i$ is a conditionally independent draw from
$$(Y_i \mid x_i = j) \overset{ind}{\sim} p_j, \qquad i = 1, \dots, n,$$
where $p_j$ denotes the probability mass function of $(Y_i \mid x_i = j)$.
This is an instance of partial exchangeability with count data.
Model elicitation is completed by specifying a prior law $Q_J$ for the vector of random probability distributions $(p_1, \dots, p_J) \sim Q_J$.
Introduction
Desiderata
We seek a Bayesian inferential procedure which:
provides a flexible, i.e. nonparametric, estimate for each law $p_j$;
allows for borrowing of information across the $J$ groups;
is scalable, in the sense that it is computationally feasible for large $n$ or large $p$;
has a reasonable interpretation, thus facilitating the incorporation of prior information.
Introduction
Bayesian nonparametric mixture models
A flexible Bayesian model for density estimation assumes
$$p(y) = \int_\Theta K(y; \theta) \, \mathrm{d}P(\theta),$$
where $K(y; \theta)$ is a known parametric kernel (e.g. Poisson, negative binomial), and $P(\theta)$ is a prior mixing measure.
If the mixing measure is a Dirichlet process (Lo 1984), then, exploiting the stick-breaking construction,
$$p(y) = \int_\Theta K(y; \theta) \, \mathrm{d}P(\theta) = \sum_{h=1}^{\infty} \pi_h K(y; \theta_h), \qquad \pi_h = \nu_h \prod_{l=1}^{h-1} (1 - \nu_l),$$
with $\theta_h \overset{iid}{\sim} P_0$ and $\nu_h \overset{iid}{\sim} \mathrm{Beta}(1, \alpha)$, for $h = 1, 2, \dots$.
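For intuition, the short sketch below (not part of the slides; the concentration $\alpha = 1$, the truncation level $H = 50$ and the Gamma base measure are arbitrary assumptions) simulates from a truncated Dirichlet process mixture of Poisson kernels via the stick-breaking construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical settings: concentration alpha, truncation level H, Gamma base measure P0.
alpha, H = 1.0, 50
nu = rng.beta(1.0, alpha, size=H)              # nu_h ~ Beta(1, alpha)
nu[-1] = 1.0                                   # close the stick so the weights sum to one
pi = nu * np.cumprod(np.concatenate(([1.0], 1.0 - nu[:-1])))   # pi_h = nu_h prod_{l<h}(1 - nu_l)
theta = rng.gamma(2.0, 5.0, size=H)            # atoms theta_h drawn iid from P0

# Draw n observations from the mixture p(y) = sum_h pi_h Pois(y; theta_h).
n = 1000
G = rng.choice(H, size=n, p=pi)                # latent component labels
y = rng.poisson(theta[G])
print(round(pi.sum(), 6), y[:10])
```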
Introduction
The hierarchical Dirichlet process
A popular extension of the Lo model for partially exchangeable data is the hierarchical Dirichlet process (Teh et al. 2006).
In the hierarchical Dirichlet process, for $j = 1, \dots, J$,
$$p_j(y) = \int_\Theta K(y; \theta) \, \mathrm{d}P_j(\theta) = \sum_{h=1}^{\infty} \pi_{hj} K(y; \theta_h),$$
$$(P_j \mid P_0) \overset{iid}{\sim} \mathrm{DP}(\alpha P_0), \qquad P_0 \sim \mathrm{DP}(\alpha_0 P_{00}).$$
Under this specification, different groups share the same atoms, while having different mixture weights ⟹ borrowing of information.
Alternative models? Simple conditional algorithms?
Introduction
Main contributions
We explored computational, interpretational and theoretical aspects of the logit stick-breaking process of Ren et al. (2011) in the partially exchangeable setting, using count data.
The LSBP can be constructed via a sequence of logistic regressions, allowing a clearer interpretation of the parameters involved.
For the LSBP we derived an efficient Gibbs sampler based on a Pólya-gamma data augmentation.
We also provide further theoretical support.
Logit stick-breaking process
The LSBP model
Our proposal has the same structure as the HDP:
$$p_j(y) = \int_\Theta \mathrm{Pois}(y; \theta) \, \mathrm{d}P_j(\theta) = \sum_{h=1}^{\infty} \pi_{hj} \, \mathrm{Pois}(y; \theta_h), \qquad j = 1, \dots, J,$$
with a conditionally conjugate prior for the atoms, $\theta_h \overset{iid}{\sim} \mathrm{Gamma}(a_\theta, b_\theta)$.
The $p_j(y)$'s share the atoms $\theta_h$ and are characterized by group-specific mixing weights.
The mixing weights $\pi_{hj}$ have a stick-breaking representation. Moreover, the prior of the LSBP is different from that of the HDP.
Logit stick-breaking process
Hierarchical representation
Samples from a LSBP model can be obtained hierarchically.
For each data point $y_i$, sample the group indicator $G_i$ denoting the mixture component,
$$\mathrm{pr}(G_i = h \mid x_i = j) = \pi_{hj} = \nu_{hj} \prod_{l=1}^{h-1} (1 - \nu_{lj}).$$
Then, conditionally on $G_i$, sample the count response variable from $(Y_i \mid G_i = h) \sim \mathrm{Pois}(\theta_h)$.
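A minimal generative sketch of this hierarchy follows (illustrative only; the truncation level, the standard normal prior on the $\alpha_{hj}$ and the Gamma distribution of the shared atoms are assumptions, not values from the slides).

```python
import numpy as np
from scipy.special import expit  # logistic function

rng = np.random.default_rng(1)

J, H, n = 2, 20, 500
x = rng.integers(J, size=n)                     # group labels x_i in {0, ..., J-1}

# Hypothetical prior draws: alpha_{hj} ~ N(0, 1), nu_{hj} = logit^{-1}(alpha_{hj}).
alpha = rng.normal(0.0, 1.0, size=(H, J))
nu = expit(alpha)
nu[-1, :] = 1.0                                 # truncation: nu_{Hj} = 1
one_minus = np.vstack([np.ones((1, J)), 1.0 - nu[:-1, :]])
pi = nu * np.cumprod(one_minus, axis=0)         # pi_{hj} = nu_{hj} prod_{l<h}(1 - nu_{lj})

theta = rng.gamma(2.0, 5.0, size=H)             # shared atoms theta_h

# Sample G_i given the group of unit i, then Y_i | G_i = h ~ Pois(theta_h).
G = np.array([rng.choice(H, p=pi[:, x[i]]) for i in range(n)])
y = rng.poisson(theta[G])
print(pi.sum(axis=0), y[:10])
```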
Logit stick-breaking process
Sequential interpretation
Can we interpret the stick-breaking weights $\nu_{hj}$?
Yes, indeed they can be rearranged as
$$\nu_{hj} = \frac{\pi_{hj}}{1 - \sum_{l=1}^{h-1} \pi_{lj}} = \frac{\mathrm{pr}(G_i = h \mid x_i = j)}{\mathrm{pr}(G_i > h - 1 \mid x_i = j)} = \mathrm{pr}(G_i = h \mid G_i > h - 1, x_i = j).$$
Each $\nu_{hj}$ is the probability of being allocated to component $h$, conditionally on the event of having survived the previous components.
Each $\zeta_{ih} = \mathbb{1}(G_i = h)$ is the indicator of the assignment of unit $i$ to the $h$-th component, and
$$\zeta_{ih} = z_{ih} \prod_{l=1}^{h-1} (1 - z_{il}), \qquad (z_{ih} \mid x_i = j) \sim \mathrm{Bern}(\nu_{hj}).$$
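The sequential-Bernoulli representation can be checked numerically: in the sketch below (arbitrary illustrative weights for a single group, with $\nu_H = 1$, not taken from the slides), allocating each unit to the first component whose $z_{ih}$ equals one reproduces the allocation probabilities $\pi_h$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Arbitrary illustrative stick-breaking weights for one group, with nu_H = 1 to close the stick.
nu = np.array([0.3, 0.5, 0.4, 1.0])
pi = nu * np.cumprod(np.concatenate(([1.0], 1.0 - nu[:-1])))

# Sequential Bernoulli scheme: G_i = min{h : z_ih = 1}.
n_sim = 200_000
z = rng.binomial(1, nu, size=(n_sim, nu.size))
G = z.argmax(axis=1)                      # first index with z_ih = 1 (guaranteed since nu_H = 1)
empirical = np.bincount(G, minlength=nu.size) / n_sim

print(np.round(pi, 3))                    # theoretical pi_h
print(np.round(empirical, 3))             # empirical allocation frequencies
```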
Logit stick-breaking process
Continuation-ratio logistic regressions
We need some prior specification for the stick-breaking weights $\nu_{hj}$.
Consistently with classical generalized linear models, a natural choice is to define
$$\mathrm{logit}(\nu_{hj}) = \alpha_{hj}, \qquad \text{with } \alpha_h = (\alpha_{h1}, \dots, \alpha_{hJ})^\top \overset{iid}{\sim} \mathrm{N}_J(\mu_\alpha, \Sigma_\alpha),$$
independently for every $h = 1, 2, \dots$.
If the matrix $\Sigma_\alpha$ is diagonal, then the mixture weights $\pi_{hj}$ are a priori independent across groups.
Stronger borrowing of information, i.e. dependence across the mixing weights, can be induced by non-diagonal choices of $\Sigma_\alpha$.
Logit stick-breaking process
Prior quantities
Prior moments
Let $(P_1, \dots, P_J)$ be a vector of random probability measures induced by the LSBP. Then, for any measurable set $B$, and for any $j$ and $j'$,
$$\mathbb{E}\{P_j(B)\} = P_0(B),$$
$$\mathrm{cov}\{P_j(B), P_{j'}(B)\} = P_0(B)\{1 - P_0(B)\} \, \frac{\mathbb{E}(\nu_{1j} \nu_{1j'})}{\mathbb{E}(\nu_{1j}) + \mathbb{E}(\nu_{1j'}) - \mathbb{E}(\nu_{1j} \nu_{1j'})}.$$
These expectations do not have a closed-form solution, but they can be easily obtained numerically.
The correlation $\mathrm{corr}\{P_j(B), P_{j'}(B)\}$ does not depend on $B$, and therefore it is often interpreted as a global measure of dependence.
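Since the expectations above involve logit-normal moments, they are easy to approximate by Monte Carlo. The sketch below (with arbitrary illustrative values of $\mu_\alpha$ and $\Sigma_\alpha$, not values from the slides) estimates $\mathrm{corr}\{P_j(B), P_{j'}(B)\}$ by applying the covariance formula, and its $j = j'$ version for the variances, so that the $P_0(B)\{1 - P_0(B)\}$ factor cancels.

```python
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(3)

# Illustrative prior hyperparameters for J = 2 groups (assumed values).
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])                 # non-diagonal => dependence across groups

nu1 = expit(rng.multivariate_normal(mu, Sigma, size=1_000_000))  # draws of (nu_{1j}, nu_{1j'})

def cov_factor(a, b):
    """E(ab) / {E(a) + E(b) - E(ab)}: the covariance without the P0(B){1 - P0(B)} term."""
    eab = np.mean(a * b)
    return eab / (a.mean() + b.mean() - eab)

num = cov_factor(nu1[:, 0], nu1[:, 1])
den = np.sqrt(cov_factor(nu1[:, 0], nu1[:, 0]) * cov_factor(nu1[:, 1], nu1[:, 1]))
print(round(num / den, 3))   # corr{P_j(B), P_j'(B)}: free of B, a global measure of dependence
```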
Logit stick-breaking process
Deterministic truncation of the infinite process
The LSBP is an infinite dimensional process ⟹ computational challenges.
We propose a truncated version of the vector of random probability measures $(P_1, \dots, P_J)$, which can be regarded as an approximation of the infinite process.
We induce the truncation by letting $\nu_{Hj} = 1$ for some integer $H > 1$, which guarantees that $\sum_{h=1}^{H} \pi_{hj} = 1$ almost surely.
According to Theorem 1 in Rigon and Durante (2018), the "discrepancy" between the two processes is exponentially decreasing in $H$.
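As an informal numerical illustration of this approximation (not the statement of Theorem 1; the prior hyperparameters below are arbitrary assumptions), one can track how the prior expected mass left beyond the first $H$ components, $\mathbb{E}\{1 - \sum_{h \leq H} \pi_{hj}\} = \mathbb{E}\{\prod_{h \leq H}(1 - \nu_{hj})\}$, shrinks with $H$.

```python
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(4)

# Assumed hyperparameters: alpha_{hj} ~ N(mu, sigma^2) marginally, for a single group j.
mu, sigma, n_sim, H_max = 0.0, 1.0, 100_000, 10
nu = expit(rng.normal(mu, sigma, size=(n_sim, H_max)))   # nu_{hj} draws, h = 1, ..., H_max

# Prior expected residual stick mass after H components: E[prod_{h <= H} (1 - nu_{hj})].
residual = np.cumprod(1.0 - nu, axis=1).mean(axis=0)
for H, r in enumerate(residual, start=1):
    print(f"H = {H:2d}   E[leftover mass] ~ {r:.2e}")   # decays geometrically in H
```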
Posterior inference
The Pólya-gamma data augmentation
The Gibbs sampler is based on the Pólya-gamma data augmentation (Polson et al. 2013), which relies on the integral identity
$$\frac{e^{z_{ih} \psi(x_i)^\top \alpha_h}}{1 + e^{\psi(x_i)^\top \alpha_h}} = \frac{1}{2} \int_{\mathbb{R}^+} f(\omega_{ih}) \exp\left[ (z_{ih} - 0.5) \, \psi(x_i)^\top \alpha_h - \omega_{ih} \{\psi(x_i)^\top \alpha_h\}^2 / 2 \right] \mathrm{d}\omega_{ih},$$
where $f(\omega_{ih})$ is the density of a Pólya-gamma $\mathrm{PG}(1, 0)$ random variable and $\psi(x_i) = \{\mathbb{1}(x_i = 1), \dots, \mathbb{1}(x_i = J)\}^\top$.
The augmented log-likelihood has a quadratic form ⟹ simple computations and conjugacy with Gaussian priors.
The conditional distribution of $(\omega_{ih} \mid -)$ is still in the class of Pólya-gamma distributions ⟹ conjugacy.
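A quick way to sanity-check the identity (purely illustrative, not part of the slides) is to draw $\omega$ from $\mathrm{PG}(1, 0)$ via its standard infinite-sum-of-gammas representation, truncated at a finite number of terms, and compare the Monte Carlo average of the integrand with the logistic left-hand side; the scalar linear predictor below is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(5)

def rpg_one(n, n_terms=200):
    """Approximate PG(1, 0) draws via the truncated series
    omega = (1 / (2 pi^2)) * sum_k g_k / (k - 0.5)^2,  with g_k ~ Exp(1)."""
    k = np.arange(1, n_terms + 1)
    g = rng.exponential(1.0, size=(n, n_terms))
    return (g / (k - 0.5) ** 2).sum(axis=1) / (2.0 * np.pi ** 2)

psi, z = 1.3, 1                        # arbitrary linear predictor and binary indicator
omega = rpg_one(1_000_000)

lhs = np.exp(z * psi) / (1.0 + np.exp(psi))
rhs = 0.5 * np.mean(np.exp((z - 0.5) * psi - omega * psi ** 2 / 2.0))
print(round(lhs, 4), round(rhs, 4))    # the two values should approximately agree
```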
Posterior inference
Posterior inference via Gibbs sampling
For i from 1 to n, update $G_i$ from the discrete distribution with probabilities
$$\mathrm{pr}(G_i = h \mid -) = \frac{\pi_{h x_i} \, \mathrm{Pois}(y_i; \theta_h)}{\sum_{q=1}^{H} \pi_{q x_i} \, \mathrm{Pois}(y_i; \theta_q)}, \qquad h = 1, \dots, H,$$
and from $G_i$ derive the associated $z_{ih}$ indicators.
For h from 1 to $H - 1$, update the logit stick-breaking parameters $\alpha_h$: for every $i$ such that $G_i > h - 1$, sample the Pólya-gamma data $\omega_{ih}$ from
$$(\omega_{ih} \mid -) \sim \mathrm{PG}\{1, \psi(x_i)^\top \alpha_h\},$$
and, given the Pólya-gamma augmented data, update $\alpha_h$ from the full conditional $(\alpha_h \mid -) \sim \mathrm{N}_J(\mu_{\alpha_h}, \Sigma_{\alpha_h})$, as in a standard Bayesian linear regression.
For h from 1 to H, update each kernel parameter $\theta_h$ from
$$(\theta_h \mid -) \sim \mathrm{Gamma}\bigg( a_\theta + \sum_{i : G_i = h} y_i, \; b_\theta + \sum_{i=1}^{n} \mathbb{1}(G_i = h) \bigg).$$
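To make these steps concrete, here is a compact illustrative implementation of the truncated sampler for the Poisson LSBP, run on synthetic data. It is a sketch under assumptions: the Pólya-gamma draws use a truncated-series approximation (a dedicated sampler, e.g. the BayesLogit R package, would be preferable in practice), $\psi(x_i)$ is the group-indicator vector as in the slides, and all hyperparameter values and the synthetic dataset are arbitrary.

```python
import numpy as np
from scipy.special import expit
from scipy.stats import poisson

rng = np.random.default_rng(6)

def rpg(c, n_terms=200):
    """Approximate PG(1, c) draws (one per entry of c) via the truncated series
    omega = (1/(2 pi^2)) sum_k g_k / ((k - 0.5)^2 + c^2 / (4 pi^2)), g_k ~ Exp(1)."""
    c = np.atleast_1d(c)
    k = np.arange(1, n_terms + 1)
    g = rng.exponential(1.0, size=(c.size, n_terms))
    denom = (k - 0.5) ** 2 + (c[:, None] ** 2) / (4.0 * np.pi ** 2)
    return (g / denom).sum(axis=1) / (2.0 * np.pi ** 2)

def lsbp_gibbs(y, x, J, H=10, n_iter=2000, a_theta=1.0, b_theta=0.1,
               mu_alpha=None, Sigma_alpha=None):
    """Illustrative Gibbs sampler for the truncated Poisson LSBP; x takes values 0, ..., J-1."""
    n = y.size
    mu_alpha = np.zeros(J) if mu_alpha is None else mu_alpha
    Sigma_alpha = np.eye(J) if Sigma_alpha is None else Sigma_alpha
    Sigma_inv = np.linalg.inv(Sigma_alpha)

    alpha = np.zeros((H - 1, J))                  # logit stick-breaking parameters
    theta = rng.gamma(a_theta, 1.0 / b_theta, H)  # kernel parameters (numpy uses scale = 1/rate)
    out_theta, out_pi = [], []

    for it in range(n_iter):
        # Stick-breaking weights pi_{hj} from nu_{hj} = expit(alpha_{hj}), with nu_{Hj} = 1.
        nu = np.vstack([expit(alpha), np.ones((1, J))])
        pi = nu * np.cumprod(np.vstack([np.ones((1, J)), 1.0 - nu[:-1]]), axis=0)

        # 1. Update the component indicators G_i.
        logp = np.log(pi[:, x].T) + poisson.logpmf(y[:, None], theta[None, :])
        prob = np.exp(logp - logp.max(axis=1, keepdims=True))
        prob /= prob.sum(axis=1, keepdims=True)
        G = np.array([rng.choice(H, p=prob[i]) for i in range(n)])

        # 2. Update alpha_h, h = 1, ..., H - 1, via Polya-gamma augmentation.
        for h in range(H - 1):
            active = G >= h                       # units "at risk" for component h
            if not active.any():
                alpha[h] = rng.multivariate_normal(mu_alpha, Sigma_alpha)
                continue
            z = (G[active] == h).astype(float)    # z_{ih} indicators
            Psi = np.eye(J)[x[active]]            # rows psi(x_i): group-indicator vectors
            omega = rpg(Psi @ alpha[h])
            V = np.linalg.inv(Psi.T @ (omega[:, None] * Psi) + Sigma_inv)
            m = V @ (Psi.T @ (z - 0.5) + Sigma_inv @ mu_alpha)
            alpha[h] = rng.multivariate_normal(m, V)

        # 3. Update the kernel parameters theta_h.
        for h in range(H):
            in_h = G == h
            theta[h] = rng.gamma(a_theta + y[in_h].sum(), 1.0 / (b_theta + in_h.sum()))

        out_theta.append(theta.copy())
        out_pi.append(pi.copy())
    return np.array(out_theta), np.array(out_pi)

# Usage on synthetic data with J = 2 groups (hypothetical example).
x = rng.integers(2, size=300)
y = rng.poisson(np.where(x == 0, 3.0, 12.0))
theta_chain, pi_chain = lsbp_gibbs(y, x, J=2, H=5, n_iter=500)
print(theta_chain[-1].round(2))
```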
Illustration
Application to the seizure dataset
We apply the LSBP Poisson mixture model to the seizure dataset, which
is also available in the flexmix R package.
The dataset consists of daily myoclonic seizure counts (seizures) for a
single subject, comprising a series of n = 140 days.
After 27 days of baseline observation (Treatment:No), the subject
received monthly infusions of intravenous gamma globulin
(Treatment:Yes).
We aim to compare the J = 2 groups: days with treatment and days
without treatment.
Illustration
Application to the seizure dataset
[Figure: estimated probability mass functions from the LSBP Poisson mixture overlaid on the relative frequencies of the number of seizures, shown separately for the Treatment: Yes and Treatment: No groups.]
Discussion and conclusions
Possible extensions
The LSBP for partially exchangeable random variables could be used as
a building block for more sophisticated models.
For instance, one could use the partially exchangeable LSBP as a prior
for infinite hidden Markov models or for topic modeling, where the HDP
is usually employed.
The computational advantages of the LSBP might lead to major
improvements in those settings.
Discussion and conclusions
Summary
We proposed a Bayesian nonparametric mixture model for partially
exchangeable count data.
We explored some of its theoretical properties and we developed a
simple Gibbs sampler for posterior inference.
References
Polson, N. G., Scott, J. G. and Windle, J. (2013). Bayesian inference for logistic models using Pólya-gamma latent variables. Journal of the American Statistical Association, 108(504), 1339–1349.
Ren, L., Du, L., Carin, L. and Dunson, D. B. (2011). Logistic stick-breaking process. Journal of Machine Learning Research, 12, 203–239.
Rigon, T. and Durante, D. (2018). Logit stick-breaking priors for Bayesian density regression. arXiv preprint.
Rodriguez, A. and Dunson, D. B. (2011). Nonparametric Bayesian models through probit stick-breaking processes. Bayesian Analysis, 6(1), 145–178.
Teh, Y. W., Jordan, M. I., Beal, M. J. and Blei, D. M. (2006). Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476), 1566–1581.