Bayesian solutions to high-dimensional challenges using
hybrid search
Shiqiang Jin
Department of Statistics
Kansas State University, Manhattan, KS
Major advisor
Dr. Gyuhyeong Goh (Statistics)
Committee members:
Dr. Weixing Song (Statistics)
Dr. Wei-Wen Hsu (Statistics)
Dr. Jisang Yu (Agricultural Economics)
Outside chairperson: Dr. Yoon-Jin Lee (Economics)
January 25, 2021
Shiqiang Jin (Department of Statistics, Kansas State University, Manhattan, KS; major advisor: Dr. Gyuhyeong Goh)
Bayesian solutions to high-dimensional challenges using hybrid search, January 25, 2021, 1 / 75
Outline
Chapter 1: Introduction
Chapter 2: Bayesian selection of best subsets via hybrid search
Chapter 3: Fast Bayesian best subset selection for high-dimensional
multivariate regression
Chapter 4: An approximate Bayesian approach to fast high-dimensional
variable selection
Chapter 1
Introduction
Introduction
Figure 1: Data storage equipment 1
1https://www.greenamerica.org/amazon-build-cleaner-and-fairer-cloud
Challenges of high-dimensional data
The high-dimensional data problem arises when the number of predictors (p) is much larger than the sample size (n), i.e. p ≫ n.
With large p, only a few variables are related to the response.
Best subset selection: evaluate all possible combinations of predictors.
- However, it involves a non-convex optimization problem that is computationally intractable when p is large, e.g. 2^40 ≈ 10^12 candidate models for p = 40.
Bayesian subset regression is an efficient way to explore the non-convex model space because it implements a stochastic search based on MCMC computation.
- Limitation: extremely heavy computation and slow convergence.
Challenges in parallel computing
In the Bayesian literature, much effort has been devoted to reducing the computational burden of MCMC.
Shotgun stochastic search (Hans et al., 2007) introduced parallel computing within the MCMC procedure to reduce the computational burden.
A practical issue is that high-performance machines and parallel programming protocols are often not available to individual users and researchers.
Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members
Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 6 / 75
Challenges in multivariate regression
An important issue in high-dimensional data analysis is that the number of response variables may be multiple; such data are called multivariate data.
The multivariate linear regression model (MVRM) is a popular way to connect multiple responses to a common set of predictors.
There is an attempt to extend the Bayesian stochastic search algorithm to the multivariate linear regression setting (Brown et al., 1998).
- However, Brown et al. (1998) still suffers from computational issues in the presence of high-dimensional data.
Objectives
In this dissertation, the main focus is to develop innovative Bayesian methods that can
- identify the best model via fast global optimization;
- run quickly on a single CPU core;
- apply to various types of data (e.g., Gaussian, multivariate, binary, count, and survival data).
Chapter 2
Bayesian selection of best subsets via hybrid search
Linear regression model in high-dimensional data
Consider a linear regression model
    y = Xβ + ε,    (1)
where y = (y1, . . . , yn)^T is a response vector, X = (x1, . . . , xp) ∈ R^(n×p) is a model matrix, β = (β1, . . . , βp)^T is a coefficient vector, and ε ∼ N(0, σ²In).
We assume p ≫ n, i.e. high-dimensional data.
We assume only a small number of predictors are associated with the response, i.e. β is sparse.
Reduced model
To better explain the sparsity of β, we introduce a latent index set γ ⊂ {1, . . . , p} so that Xγ represents the sub-matrix of X containing xj, j ∈ γ.
E.g. γ = {1, 3, 4} ⇒ Xγ = (x1, x3, x4).
The full model in (1) can then be reduced to
    y = Xγβγ + ε.    (2)
Objectives in Chapter 2
Our goals are to obtain:
(i) the k most important predictors, i.e. the best of the (p choose k) candidate models of size k;
(ii) a single best model from among the 2^p candidate models.
Prior specifications
We consider conjugate priors as follows:
    βγ | σ², γ ∼ Normal(0, τσ² I_|γ|),
    σ² ∼ Inverse-Gamma(aσ/2, bσ/2),
    π(γ) ∝ I(|γ| = k),
where |γ| is the number of elements in γ.
Hence, by Bayes' theorem, we have a closed form for the marginal likelihood.
Model posterior distribution
By Bayes' theorem, given model γ, we have
    π(γ|y) ∝ f(y|γ) π(γ)
           ∝ (τ^(-1))^(|γ|/2) |Xγ^T Xγ + τ^(-1) I_|γ||^(-1/2) (y^T Hγ y + bσ)^(-(aσ+n)/2) I(|γ| = k)
           ≡ g(γ) I(|γ| = k),
where f(y|γ) is the marginal likelihood function and Hγ = In − Xγ(Xγ^T Xγ + τ^(-1) I_|γ|)^(-1) Xγ^T.
Hence, our Bayesian model selection procedure simply finds the model γ that maximizes the posterior probability π(γ|y).
For notational simplicity, g(γ) is used as the model selection criterion.
Best subset selection algorithm
According to the best subset selection algorithm, our goals become:
(i) Fixed size: given k, select the best subset model
    Mk = argmax_{|γ|=k} g(γ)
from the (p choose k) candidate models.
(ii) Single best model: M = argmax_γ π(γ|y) from the 2^p candidate models.
A non-convex optimization problem arises.
Deterministic search with a fixed k
Given a model γ with model size k, define two neighborhood spaces:
- addition neighbor N+(γ) = {γ ∪ {j} : j ∉ γ} and
- deletion neighbor N−(γ) = {γ \ {j} : j ∈ γ}.
Note our Goal (i) is to find γ̂ = argmax_{|γ|=k} g(γ).
1. Initialize γ̂ s.t. |γ̂| = k.
2. Repeat  # deterministic search: local optimum
   Update γ̃ ← argmax_{γ∈N+(γ̂)} g(γ);  # N+(γ̂) = {γ̂ ∪ {j} : j ∉ γ̂}
   Update γ̂ ← argmax_{γ∈N−(γ̃)} g(γ);  # N−(γ̃) = {γ̃ \ {j} : j ∈ γ̃}
   until convergence.
3. Return γ̂.
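The add-then-delete loop above can be sketched in Python. This is a minimal illustration with a generic score function; the toy `score` below (overlap with a hypothetical true support, minus a small size-index penalty) is a stand-in for g(γ), not the marginal likelihood from the slides.

```python
import numpy as np

def deterministic_search(g, p, k, gamma=None):
    """Greedy add/delete search for argmax over models of size k of g(gamma).

    g: callable mapping a sorted tuple of indices to a score.
    Alternates (best addition, best deletion) until the model stops changing.
    """
    rng = np.random.default_rng(0)
    if gamma is None:
        gamma = set(rng.choice(p, size=k, replace=False).tolist())
    for _ in range(100):  # iteration cap as a safeguard
        # addition neighbor: best gamma ∪ {j}, j not in gamma
        add = max((j for j in range(p) if j not in gamma),
                  key=lambda j: g(tuple(sorted(gamma | {j}))))
        tilde = gamma | {add}
        # deletion neighbor: best tilde \ {j}, j in tilde
        drop = max(tilde, key=lambda j: g(tuple(sorted(tilde - {j}))))
        new = tilde - {drop}
        if new == gamma:  # converged to a local optimum
            return gamma
        gamma = new
    return gamma

# toy score: reward overlap with a hypothetical "true" support {0, 1, 2}
truth = {0, 1, 2}
score = lambda idx: len(truth & set(idx)) - 0.01 * sum(idx)
best = deterministic_search(score, p=10, k=3)
```

With this toy score, each pass adds a missing "true" predictor and deletes a spurious one, so the loop settles on the true support.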
Hybrid search algorithm
1. Obtain γ̂ from the deterministic search.
2. Set γ(0) = γ̂.
3. Repeat for t = 1, . . . , T:  # stochastic search: global optimum
   i) Sample γ* with probability proportional to g(γ) for γ ∈ N+(γ(t−1));
   ii) Sample γ(t) with probability proportional to g(γ) for γ ∈ N−(γ*);
   iii) If π(γ̂|y) < π(γ(t)|y), then update γ̂ = γ(t), break the loop, and go back to the deterministic search.
4. Return γ̂.
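A sketch of the stochastic counterpart: instead of maximizing over the neighborhoods, neighbors are sampled with probability proportional to the score, which is what lets the chain escape local optima. The positive toy score `g` and the incumbent-tracking at the end are illustrative simplifications, not the algorithm's exact bookkeeping.

```python
import numpy as np

def stochastic_step(g, p, gamma, rng):
    """One stochastic add/delete move: sample neighbors w.p. proportional to g."""
    # sample an addition from N+(gamma)
    cand = [j for j in range(p) if j not in gamma]
    w = np.array([g(tuple(sorted(gamma | {j}))) for j in cand])
    add = cand[rng.choice(len(cand), p=w / w.sum())]
    tilde = gamma | {add}
    # sample a deletion from N-(tilde)
    cand = sorted(tilde)
    w = np.array([g(tuple(sorted(tilde - {j}))) for j in cand])
    drop = cand[rng.choice(len(cand), p=w / w.sum())]
    return tilde - {drop}

rng = np.random.default_rng(1)
g = lambda idx: float(np.exp(len({0, 1} & set(idx))))  # toy positive score
gamma = best = {3, 4}
for _ in range(50):
    gamma = stochastic_step(g, p=6, gamma=gamma, rng=rng)
    if g(tuple(sorted(gamma))) > g(tuple(sorted(best))):
        best = gamma  # the hybrid algorithm would now restart the deterministic search
```

The incumbent `best` can only improve, mirroring step iii) where an improved sample sends the search back to the deterministic phase.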
Idea of hybrid search
Figure 2: Hybrid search enables us to achieve the global maximum efficiently.
Best subset selection with varying k
Note Goal (ii): a single best model from among the 2^p candidate models.
We extend the "fixed" k to a varying k by assigning a prior on k.
Note that the uniform prior, k ∼ Uniform{1, . . . , K}, tends to assign larger probability to larger subsets (see Chen and Chen (2008)).
We therefore define
    π(k) ∝ I(k ≤ K) / (p choose k).
Hybrid best subset search with varying k
Bayesian best subset selection can be done by maximizing
    π(γ, k|y) ∝ g(γ) / (p choose k)    (3)
over (γ, k).
Our algorithm proceeds as follows:
1. Repeat for k = 1, . . . , K: given k, implement the hybrid search algorithm to obtain the best subset model γ̂k.
2. Find the best model γ̂* by
    γ̂* = argmax_{k∈{1,...,K}} g(γ̂k) / (p choose k).    (4)
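The two-stage procedure can be sketched as follows; `best_subset_fixed_k` stands in for the fixed-k hybrid search and `log_g` for log g(γ), both toy assumptions for illustration:

```python
from math import comb, log

def select_varying_k(log_g, best_subset_fixed_k, p, K):
    """Run the fixed-k search for k = 1..K, then compare with a log C(p, k) penalty."""
    best, best_score = None, float('-inf')
    for k in range(1, K + 1):
        gamma_k = best_subset_fixed_k(k)           # stand-in for the hybrid search at size k
        score = log_g(gamma_k) - log(comb(p, k))   # log of g(gamma) / (p choose k)
        if score > best_score:
            best, best_score = gamma_k, score
    return best

# toy stand-ins: hypothetical true support {0, 1}; log_g rewards coverage of the truth
truth = {0, 1}
log_g = lambda g: 5.0 * len(truth & g) - 0.1 * len(g)
fixed_k = lambda k: set(range(k))                  # pretend the size-k search returns {0, ..., k-1}
best = select_varying_k(log_g, fixed_k, p=50, K=5)
```

The log C(p, k) penalty offsets the growth of the model space with k, so oversized models lose the comparison even when their raw score is slightly higher.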
Computational burden in g(γ)
We have to compute g(γ) p times in each iteration ⇒ inefficient.
    g(γ) = (τ^(-1))^(|γ|/2) |Xγ^T Xγ + τ^(-1) I_|γ||^(-1/2) (y^T Hγ y + bσ)^(-(aσ+n)/2),
where Hγ = In − Xγ(Xγ^T Xγ + τ^(-1) I_|γ|)^(-1) Xγ^T.
We propose a method that evaluates g(γ) for all models γ ∈ N+(γ̂) (or γ ∈ N−(γ̂)) simultaneously in a single computation.
Define γ̂ ∪ {j} as a model in the addition neighbor of γ̂; then
    g(γ̂ ∪ {j}) = (τ^(-1))^(|γ̂∪{j}|/2) |X_{γ̂∪{j}}^T X_{γ̂∪{j}} + τ^(-1) I_|γ̂∪{j}||^(-1/2) (y^T H_{γ̂∪{j}} y + bσ)^(-(aσ+n)/2),
where H_{γ̂∪{j}} = In − X_{γ̂∪{j}}(X_{γ̂∪{j}}^T X_{γ̂∪{j}} + τ^(-1) I_|γ̂∪{j}|)^(-1) X_{γ̂∪{j}}^T.
We simplify this via the following Lemma 1, Lemma 2, and the Sherman-Morrison formula:
- Lemma 1: H_γ̂ = (In + τ X_γ̂ X_γ̂^T)^(-1).
- Lemma 2: det(In + U V^T) = det(Im + V^T U) for n × m matrices U and V.
- Sherman-Morrison formula: (A + u v^T)^(-1) = A^(-1) − A^(-1) u v^T A^(-1) / (1 + v^T A^(-1) u) for A ∈ R^(n×n) and u, v ∈ R^n.
We have
    g(γ̂ ∪ {j}) ∝ {y^T H_γ̂ y + bσ − (x_j^T H_γ̂ y)² / (τ^(-1) + x_j^T H_γ̂ x_j)}^(-(aσ+n)/2) × (τ^(-1) + x_j^T H_γ̂ x_j)^(-1/2).    (5)
The vector X^T H_γ̂ y contains x_j^T H_γ̂ y for all j.
The diagonal elements of the matrix X^T H_γ̂ X contain x_j^T H_γ̂ x_j for all j.
Stacking (5) over j (with all operations elementwise) gives
    m+(γ̂) = {(y^T H_γ̂ y + bσ)1p − (X^T H_γ̂ y)² / (τ^(-1) 1p + diag(X^T H_γ̂ X))}^(-(aσ+n)/2) × {τ^(-1) 1p + diag(X^T H_γ̂ X)}^(-1/2).    (6)
m+(γ̂) contains g(γ̂ ∪ {j}) for all j ∉ γ̂.
How many inverse matrices and determinants do we need to compute?
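A sketch of how (6) can be vectorized in practice: one k × k solve yields H_γ̂ applied to y and to all columns of X at once, after which every addition score comes out elementwise, with no per-j inverse. The hyperparameter values (τ, aσ, bσ) and the toy data are illustrative assumptions.

```python
import numpy as np

def score_all_additions(X, y, gamma, tau=10.0, a_sigma=1.0, b_sigma=1.0):
    """Evaluate log g(gamma ∪ {j}) for every j simultaneously, up to a constant.

    Entries for j already in gamma are set to -inf so they are never selected.
    """
    n, p = X.shape
    Xg = X[:, sorted(gamma)]
    k = Xg.shape[1]
    # H_gamma v = v - X_g (X_g^T X_g + tau^{-1} I)^{-1} X_g^T v: one k x k solve
    M = np.linalg.solve(Xg.T @ Xg + np.eye(k) / tau, Xg.T)
    Hy = y - Xg @ (M @ y)               # H_gamma y
    HX = X - Xg @ (M @ X)               # H_gamma X, all columns at once
    xHy = X.T @ Hy                      # x_j^T H y for all j
    xHx = np.einsum('ij,ij->j', X, HX)  # diagonal of X^T H X
    denom = 1.0 / tau + xHx
    quad = y @ Hy + b_sigma - xHy**2 / denom
    logg = -0.5 * (a_sigma + n) * np.log(quad) - 0.5 * np.log(denom)
    logg[sorted(gamma)] = -np.inf
    return logg

rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[[0, 1, 2]] = 2.0                   # toy true support {0, 1, 2}
y = X @ beta + rng.standard_normal(n)
scores = score_all_additions(X, y, gamma={0, 1})
```

With the true support {0, 1, 2} and current model {0, 1}, the highest-scoring addition is the remaining true predictor, index 2.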
Simultaneous computation
Similarly for g(γ̃ \ {j}):
    m−(γ̃) = {(y^T H_γ̃ y + bσ)1p + (X^T H_γ̃ y)² / (τ^(-1) 1p − diag(X^T H_γ̃ X))}^(-(aσ+n)/2) × {τ^(-1) 1p − diag(X^T H_γ̃ X)}^(-1/2).
m−(γ̃) contains g(γ̃ \ {j}) for all j ∈ γ̃.
Applying m+(γ̂) and m−(γ̃) to the hybrid search, we can significantly boost the computational speed.
Simulation study–Data generation
For given n = 100, we generate the data from
    y_i ∼ Normal(∑_{j=1}^p βj x_ij, 1), independently for i = 1, . . . , n,
where
- (x_i1, . . . , x_ip)^T iid∼ Normal(0p, Σ) with Σ = (Σij)_{p×p} and Σij = ρ^|i−j|,
- βj iid∼ Uniform{−1, −2, 1, 2} if j ∈ γ and βj = 0 if j ∉ γ,
- γ is an index set of size 4 randomly selected from {1, 2, . . . , p}.
We consider four scenarios for p and ρ:
    (i) p = 200, ρ = 0.1;  (ii) p = 200, ρ = 0.9;
    (iii) p = 1000, ρ = 0.1;  (iv) p = 1000, ρ = 0.9.
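This data-generating process can be reproduced as follows (a sketch; the defaults correspond to scenario (i)):

```python
import numpy as np

def simulate(n=100, p=200, rho=0.1, k=4, seed=0):
    """Generate (y, X, gamma, beta) following the simulation design in the slides."""
    rng = np.random.default_rng(seed)
    # AR(1)-type covariance: Sigma_ij = rho^{|i-j|}
    idx = np.arange(p)
    Sigma = rho ** np.abs(idx[:, None] - idx[None, :])
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    gamma = rng.choice(p, size=k, replace=False)          # true support of size 4
    beta = np.zeros(p)
    beta[gamma] = rng.choice([-1.0, -2.0, 1.0, 2.0], size=k)
    y = X @ beta + rng.standard_normal(n)                 # unit error variance
    return y, X, np.sort(gamma), beta

y, X, gamma, beta = simulate()
```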
Simulation study-Results
Table 1: 2,000 replications; FDR (false discovery rate), TRUE% (percentage of the true model detected), SIZE (selected model size), HAM (Hamming distance).
Scenario Method FDR (s.e.) TRUE% (s.e.) SIZE (s.e.) HAM (s.e.)
p = 200 Proposed 0.006 (0.001) 96.900 (0.388) 4.032(0.004) 0.032(0.004)
ρ = 0.1 SCAD 0.034 (0.002) 85.200 (0.794) 4.188 (0.011) 0.188 (0.011)
MCP 0.035 (0.002) 84.750 (0.804) 4.191 (0.011) 0.191 (0.011)
ENET 0.016 (0.001) 92.700 (0.582) 4.087 (0.007) 0.087 (0.007)
LASSO 0.020 (0.002) 91.350 (0.629) 4.109 (0.009) 0.109 (0.009)
p = 200 Proposed 0.023(0.002) 88.750(0.707) 3.985(0.006) 0.203(0.014)
ρ = 0.9 SCAD 0.059 (0.003) 74.150 (0.979) 4.107 (0.015) 0.480 (0.022)
MCP 0.137 (0.004) 55.400 (1.112) 4.264 (0.020) 1.098 (0.034)
ENET 0.501 (0.004) 0.300 (0.122) 7.716 (0.072) 5.018 (0.052)
LASSO 0.276 (0.004) 15.550 (0.811) 5.308 (0.033) 2.038 (0.034)
Simulation study-Results
Table 2: 2,000 replications; FDR (false discovery rate), TRUE% (percentage of the true model detected), SIZE (selected model size), HAM (Hamming distance).
Scenario Method FDR (s.e.) TRUE% (s.e.) SIZE (s.e.) HAM (s.e.)
p = 1000 Proposed 0.004(0.001) 98.100 (0.305) 4.020 (0.003) 0.020 (0.003)
ρ = 0.1 SCAD 0.027 (0.002) 87.900 (0.729) 4.145 (0.010) 0.145 (0.010)
MCP 0.031 (0.002) 86.550 (0.763) 4.172 (0.013) 0.172 (0.013)
ENET 0.035 (0.002) 84.850 (0.802) 4.181 (0.013) 0.206 (0.012)
LASSO 0.014 (0.001) 93.850 (0.537) 4.073 (0.007) 0.073 (0.007)
p = 1000 Proposed 0.023(0.002) 89.850 (0.675) 4.005 (0.005) 0.190 (0.013)
ρ = 0.9 SCAD 0.068 (0.003) 74.250 (0.978) 4.196 (0.014) 0.493 (0.023)
MCP 0.152 (0.004) 53.750 (1.115) 4.226 (0.017) 1.202 (0.035)
ENET 0.417 (0.005) 0.150 (0.087) 6.228 (0.068) 4.089 (0.043)
LASSO 0.265 (0.004) 19.500 (0.886) 5.139 (0.029) 1.909 (0.035)
Computational speed comparison
To show the merit of simultaneous computation, we compare the hybrid search implemented with a "for-loop" against the version using simultaneous computation.
Figure 3: Runtime (seconds, 0 to 20) versus p (200 to 1000) for the hybrid search using simultaneous computation and using a "for-loop".
Real data application
Data description
We apply the proposed method to Breast Invasive Carcinoma (BRCA) data
generated by The Cancer Genome Atlas (TCGA) Research Network
http://cancergenome.nih.gov.
The data set contains p = 17,326 gene expression measurements (recorded on the log scale) for n = 526 patients with primary solid tumors.
BRCA1 is a tumor suppressor gene and its mutations predispose women to
breast cancer (Findlay et al., 2018).
Real data application
Our goal here is to identify the best-fitting model for estimating the association between BRCA1 (response variable) and the other genes (independent variables):
    BRCA1 = β1·NBR2 + β2·DTL + · · · + βp·VPS25 + ε.
Results:
Table 3: Model comparison
BIC extended BIC MSPE
Proposed 985.93 1120.87 0.60
SCAD 1104.69 1176.42 0.68
MCP 1104.69 1176.42 0.68
ENET 1110.65 1198.68 0.68
LASSO 1104.69 1176.42 0.68
Real data application
Results (cont.)
Figure 4: Coefficient estimates of the genes selected by each method (Proposed, SCAD, MCP, ENET, LASSO); the displayed genes include KLHL13, DTL, NBR2, C17orf53, TUBG1, CRBN, ARHGAP19, VPS25, GNL1, and GINS1. Except C10orf76, 7 genes are documented as diseases-related genes.
Chapter 3
Fast Bayesian best subset selection for high-dimensional
multivariate regression
Model Setting
Consider fitting the multivariate regression model (MVRM):
    Y_{n×m} = X_{n×p} C_{p×m} + E_{n×m},    (7)
where
- Y = (y1, y2, . . . , yn)^T ∈ R^(n×m) is a response matrix,
- X = (x1, x2, . . . , xp) ∈ R^(n×p) is a model matrix,
- C = (c1, c2, . . . , cp)^T ∈ R^(p×m) is an unknown coefficient matrix,
- E = (e1, e2, . . . , en)^T ∈ R^(n×m) with e_i iid∼ N_m(0, Ω), where Ω ∈ R^(m×m) is an unknown nonsingular covariance matrix.
Ω = cov(y_i) describes the relationship between each pair (y_ij, y_iℓ):
    Ω = cov(y_i) =
        [ σ²11  σ12   · · ·  σ1m  ]
        [ σ21   σ²22  · · ·  σ2m  ]
        [ · · ·  · · ·  · · ·  · · · ]
        [ σm1   σm2   · · ·  σ²mm ]
Row-sparsity
We assume p ≫ n, i.e. a high-dimensional problem arises.
C is row-sparse: only a small number of the xj's are related to Y.
    cj ≠ 0 ⇔ xj selected;  cj = 0 ⇔ xj not selected.
We introduce a latent index set γ ⊂ {1, 2, . . . , p} such that γ = {j : cj ≠ 0}.
For example, if γ = {1, 3}, then Cγ is the 2 × m matrix with rows c1^T and c3^T.
Prior specifications
Given γ, the likelihood function is
    f(Y|Cγ, Ω, γ) ∼ MN(XγCγ, In, Ω).
We consider conjugate prior distributions for the parameters C and Ω:
    π(Cγ|Ω, γ) ∼ MN(0, ζ I_|γ|, Ω),
    π(Ω) ∼ W^(-1)(Ψ, ν),
    π(γ) ∝ I(|γ| = k),
where
- W^(-1) is the inverse-Wishart distribution,
- ζ, Ψ and ν are deterministic hyperparameters.
Objective
Hence, the marginal likelihood function f(Y|γ) can be obtained by integrating out Cγ and Ω:
    f(Y|γ) = ∫∫ f(Y|Cγ, Ω, γ) π(Cγ|Ω, γ) π(Ω) dCγ dΩ
           ∝ ζ^(−m|γ|/2) |Xγ^T Xγ + ζ^(-1) I_|γ||^(−m/2) |Y^T Hγ Y + Ψ|^(−(n+ν)/2)
           ≡ s(Y|γ),
where Hγ = In − Xγ(Xγ^T Xγ + ζ^(-1) I_|γ|)^(-1) Xγ^T.
By Bayes' theorem, the model posterior distribution is
    π(γ|Y) ∝ s(Y|γ) π(γ),
which is our model selection criterion.
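For concreteness, s(Y|γ) can be evaluated on the log scale as below. The hyperparameter defaults (ζ = 10, Ψ = I_m, ν = m + 2) and the toy data are assumptions for illustration; `slogdet` avoids overflow in the determinants.

```python
import numpy as np

def log_s(Y, X, gamma, zeta=10.0, nu=None, Psi=None):
    """log s(Y|gamma) up to an additive constant, for the MVRM criterion."""
    n, m = Y.shape
    nu = m + 2 if nu is None else nu
    Psi = np.eye(m) if Psi is None else Psi
    Xg = X[:, sorted(gamma)]
    k = Xg.shape[1]
    A = Xg.T @ Xg + np.eye(k) / zeta
    # H_gamma Y without forming the n x n matrix H explicitly
    HY = Y - Xg @ np.linalg.solve(A, Xg.T @ Y)
    _, logdetA = np.linalg.slogdet(A)
    _, logdetR = np.linalg.slogdet(Y.T @ HY + Psi)
    return (-0.5 * m * k * np.log(zeta)
            - 0.5 * m * logdetA
            - 0.5 * (n + nu) * logdetR)

rng = np.random.default_rng(0)
n, m, p = 60, 3, 100
X = rng.standard_normal((n, p))
C = np.zeros((p, m))
C[[0, 1], :] = 1.5                       # toy true support {0, 1}
Y = X @ C + rng.standard_normal((n, m))
better = log_s(Y, X, {0, 1})
worse = log_s(Y, X, {5, 6})
```

On this toy data, the true support scores higher than an unrelated model of the same size, which is exactly how the criterion is used inside the search.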
Hybrid best subset search with fixed model size k
1. Define γ̂ as the current model with |γ̂| = k.
2. Deterministic search: repeat the following steps until convergence:
   i) Compute s(Y|γ) for all γ ∈ N+(γ̂); update γ̃ to be the maximizer of s(Y|γ).
   ii) Compute s(Y|γ) for all γ ∈ N−(γ̃); update γ̂ to be the maximizer of s(Y|γ).
3. Return γ̂*.
4. Set γ̂(0) ← γ̂*.
5. Stochastic search: iterate for t = 1, . . . , T:
   i) Sample γ̃(t) with probability proportional to s(Y|γ) for γ ∈ N+(γ̂(t−1));
   ii) Sample γ̂(t) with probability proportional to s(Y|γ) for γ ∈ N−(γ̃(t));
   iii) If γ̂(t) is better than γ̂*, set γ̂ ← γ̂(t), break the loop, and jump to step 2.
6. Return γ̂.
Hybrid best subset search with varying k
To select a single best model, we assign a prior on k:
    π(k) ∝ I(k ≤ K) / (p choose k).
The model selection criterion becomes
    π(γ, k|Y) ∝ s(Y|γ) I(|γ| = k) / (p choose k).
Hybrid best subset search with varying k:
i) Fixed size: given k, select the best subset model Mk = argmax_γ s(Y|γ) via the hybrid search.
ii) Varying size: pick the single best model from M1, . . . , MK by γ̂ = argmax_{1≤k≤K} π(γ, k|Y).
Problem: s(Y|γ) involves an inverse matrix and two determinants, making the hybrid search slow.
Simultaneous Computation
As in Chapter 2, we can show that s(Y|γ̂ ∪ {j}) is an element of the vector
    m+(γ̂) = {ζ^(-1) 1p + diag(X^T H_γ̂ X)}^(−m/2) · {1p − diag(X^T H_γ̂ Y (Y^T H_γ̂ Y + Ψ)^(-1) Y^T H_γ̂ X) / (ζ^(-1) 1p + diag(X^T H_γ̂ X))}^(−(n+ν)/2),
and s(Y|γ̃ \ {ℓ}) is an element of the vector
    m−(γ̃) = {ζ^(-1) 1p − diag(X_γ̃^T H_γ̃ X_γ̃)}^(−m/2) · {1p + diag(X_γ̃^T H_γ̃ Y (Y^T H_γ̃ Y + Ψ)^(-1) Y^T H_γ̃ X_γ̃) / (ζ^(-1) 1p − diag(X_γ̃^T H_γ̃ X_γ̃))}^(−(n+ν)/2),
with all operations elementwise.
Applying m+(γ̂) and m−(γ̃) to the hybrid search, we can significantly improve the hybrid search speed.
Simulation Study
Set n = 100 and m = 5, and generate the data {(y_i, x_i) : i = 1, . . . , n} from
    y_i ind∼ N_m(C^T x_i, Ω),    (8)
where
- x_i iid∼ N_p(0p, Σx) with Σx = (ρx^|i−j|)_{p×p},
- Ω = (2 ρe^|i−j|)_{m×m}.
The true model is γ = {1, 2, 3, 4, 7, 8, 9, 10}.
    c_ij iid∼ Uniform{−1.5, −1, 1, 1.5} for j ∈ γ and c_j = 0 for j ∉ γ.
We consider all combinations of the following scenarios for p, ρe, and ρx:
1. p = 200, 500, and 1000;
2. ρe = 0, 0.2, and 0.5;
3. ρx = 0.2 and 0.8.
Simulation Study
The proposed method is compared with:
- Brown et al. (1998), a well-known Bayesian approach with the same model setup as the proposed method;
- multivariate LASSO (mLASSO) and multivariate ENET (mENET), with tuning parameters selected by
  • AIC (Akaike, 1998),
  • bias-corrected AIC (AICc) (Bedrick, 1994),
  • BIC (Schwarz, 1978),
  • consistent AIC (CAIC) (Bozdogan, 1987).
Table 4: Simulation study of ρe = 0 based on 100 replications.
Method FDR (s.e) TRUE% (s.e) SIZE (s.e) HAM (s.e) TIME (s.e)
p = 200, ρx = 0.2
Proposed 0(0) 100(0) 8(0) 0(0) 7.054(0.048)
Brown 0.002(0.002) 98(1.407) 8.02(0.014) 0.02(0.014) 302.692(0.779)
mLASSO 0(0) 100(0) 8(0) 0(0) 0.433(0.002)
mENET 0(0) 100(0) 8(0) 0(0) 0.427(0.003)
p = 200, ρx = 0.8
Proposed 0(0) 98(1.407) 7.98(0.014) 0.02(0.014) 7.635(0.067)
Brown 0.015(0.004) 84(3.685) 8.1(0.046) 0.18(0.044) 284.772(1.222)
mLASSO 0.024(0.005) 69(4.648) 8.04(0.063) 0.4(0.067) 0.556(0.005)
mENET 0.066(0.009) 49(5.024) 8.49(0.104) 0.79(0.092) 0.536(0.004)
p = 1000, ρx = 0.2
Proposed 0(0) 100(0) 8(0) 0(0) 14.182(0.12)
Brown 0.008(0.003) 93(2.564) 8.07(0.026) 0.07(0.026) 1752.643(5.724)
mLASSO 0.002(0.002) 98(1.407) 8.02(0.014) 0.02(0.014) 0.624(0.004)
mENET 0.002(0.002) 98(1.407) 8.02(0.014) 0.02(0.014) 0.606(0.004)
p = 1000, ρx = 0.8
Proposed 0(0) 94(2.387) 7.93(0.029) 0.07(0.029) 14.691(0.116)
Brown 0.014(0.004) 82(3.861) 8.07(0.046) 0.19(0.042) 1729.196(8.956)
mLASSO 0.025(0.006) 61(4.902) 7.92(0.091) 0.56(0.082) 0.79(0.008)
mENET 0.078(0.009) 39(4.902) 8.45(0.115) 1.05(0.119) 0.765(0.006)
Table 5: Simulation study of ρe = 0.2 based on 100 replications.
Method FDR (s.e) TRUE% (s.e) SIZE (s.e) HAM (s.e) TIME (s.e)
p = 200, ρx = 0.2
Proposed 0(0) 100(0) 8(0) 0(0) 7.067(0.048)
Brown 0.003(0.002) 97(1.714) 8.03(0.017) 0.03(0.017) 303.72(0.817)
mLASSO 0(0) 100(0) 8(0) 0(0) 0.463(0.003)
mENET 0(0) 100(0) 8(0) 0(0) 0.457(0.004)
p = 200, ρx = 0.8
Proposed 0(0) 99(1) 7.99(0.01) 0.01(0.01) 7.799(0.072)
Brown 0.015(0.004) 84(3.685) 8.1(0.046) 0.18(0.044) 286.251(1.186)
mLASSO 0.031(0.006) 67(4.726) 8.13(0.072) 0.45(0.073) 0.544(0.004)
mENET 0.062(0.009) 51(5.024) 8.42(0.102) 0.78(0.096) 0.529(0.004)
p = 1000, ρx = 0.2
Proposed 0(0) 100(0) 8(0) 0(0) 14.323(0.101)
Brown 0.011(0.004) 91(2.876) 8.1(0.033) 0.1(0.033) 1751.256(5.606)
mLASSO 0.002(0.002) 98(1.407) 8.02(0.014) 0.02(0.014) 0.676(0.006)
mENET 0.002(0.002) 98(1.407) 8.02(0.014) 0.02(0.014) 0.645(0.004)
p = 1000, ρx = 0.8
Proposed 0(0) 93(2.564) 7.93(0.026) 0.07(0.026) 14.707(0.122)
Brown 0.017(0.004) 80(4.02) 8.08(0.051) 0.22(0.046) 1750.775(8.792)
mLASSO 0.028(0.007) 61(4.902) 7.94(0.09) 0.6(0.098) 0.776(0.006)
mENET 0.081(0.01) 38(4.878) 8.49(0.123) 1.09(0.121) 0.754(0.006)
Table 6: Simulation study of ρe = 0.5 based on 100 replications.
Method FDR (s.e) TRUE% (s.e) SIZE (s.e) HAM (s.e) TIME (s.e)
p = 200, ρx = 0.2
Proposed 0(0) 100(0) 8(0) 0(0) 6.994(0.048)
Brown 0.002(0.002) 98(1.407) 8.02(0.014) 0.02(0.014) 299.103(0.687)
mLASSO 0(0) 100(0) 8(0) 0(0) 0.452(0.003)
mENET 0(0) 100(0) 8(0) 0(0) 0.444(0.004)
p = 200, ρx = 0.8
Proposed 0(0) 96(1.969) 7.96(0.02) 0.04(0.02) 7.606(0.064)
Brown 0.011(0.003) 87(3.38) 8.06(0.04) 0.14(0.038) 284.063(1.011)
mLASSO 0.037(0.008) 67(4.726) 8.19(0.098) 0.55(0.101) 0.549(0.005)
mENET 0.07(0.009) 51(5.024) 8.51(0.107) 0.85(0.106) 0.529(0.004)
p = 1000, ρx = 0.2
Proposed 0(0) 100(0) 8(0) 0(0) 14.131(0.1)
Brown 0.009(0.003) 93(2.564) 8.08(0.031) 0.08(0.031) 1722.443(4.872)
mLASSO 0.002(0.002) 98(1.407) 8.02(0.014) 0.02(0.014) 0.663(0.006)
mENET 0.001(0.001) 99(1) 8.01(0.01) 0.01(0.01) 0.641(0.004)
p = 1000, ρx = 0.8
Proposed 0(0) 93(2.564) 7.93(0.026) 0.07(0.026) 14.804(0.12)
Brown 0.011(0.003) 85(3.589) 8.04(0.042) 0.16(0.039) 1699.55(7.121)
mLASSO 0.027(0.007) 62(4.878) 7.89(0.099) 0.63(0.098) 0.763(0.006)
mENET 0.084(0.01) 37(4.852) 8.5(0.128) 1.14(0.127) 0.749(0.006)
Real data analysis
We apply the proposed method to Breast Invasive Carcinoma (BRCA) data
generated by The Cancer Genome Atlas (TCGA) Research Network
http://cancergenome.nih.gov.
The data set contains 17,814 gene expression measurements (recorded on
the log scale) of 526 patients with primary solid tumors.
BRCA1 and BRCA2 are well-known genes that account for hereditary
breast cancer (Yang, Lippman, 1999).
Real data analysis
Goal: determine the best-fitting model for the relationship between
BRCA1 & BRCA2 (multiple response variables) and the other genes
(predictors):
[y_BRCA1, y_BRCA2] = X[β1, β2] + [ε1, ε2].
The proposed method is compared with
• Brown et al. (1998),
• mLASSO and mENET,
• the method proposed in Chapter 2 (Ch.2 method),
• LASSO and ENET with EBIC (Chen, Chen, 2008).
We compute AIC, AICc, BIC, CAIC, and the mean squared prediction error
(MSPE) by Monte Carlo cross-validation with 70% training and 30%
testing splits over 500 replications.
Real Data Analysis
Table 7: Model comparison by OLS using Monte Carlo cross-validation.
Method AIC AICc BIC CAIC MSPE
Proposed 368.77 381.15 615.15 678.15 0.692
Brown 917.16 918.57 999.29 1020.29 0.989
Ch.2 method 540.66 548.08 732.28 781.28 0.770
mLASSO-AIC 547.52 607.62 1067.66 1200.66 0.808
mLASSO-AICc 547.52 607.62 1067.66 1200.66 0.808
mLASSO-BIC 703.04 717.08 965.06 1032.06 0.864
mLASSO-CAIC 703.04 717.08 965.06 1032.06 0.864
mENET-AIC 562.47 618.67 1066.97 1195.97 0.810
mENET-AICc 562.47 618.67 1066.97 1195.97 0.810
mENET-BIC 705.52 721.34 983.18 1054.18 0.866
mENET-CAIC 705.52 721.34 983.18 1054.18 0.866
LASSO-EBIC 901.06 902.23 975.37 994.37 0.967
ENET-EBIC 902.36 903.77 984.49 1005.49 0.968
[Bar chart omitted: for each method (Proposed, Brown, Ch.2 method, mLASSO-AIC/AICc/BIC/CAIC, mENET-AIC/AICc/BIC/CAIC, LASSO-EBIC, ENET-EBIC), the number of selected genes and the number of significant coefficients (p < 0.05).]
Figure 5: The number of selected genes and number of significant coefficients.
Chapter 4
An approximate Bayesian approach to fast
high-dimensional variable selection
Model Setting
Consider a regression model:
• y = (y1, y2, . . . , yn)ᵀ is the response vector, e.g. binary, count, or continuous data.
• X is an n × p predictor matrix with p ≫ n, so the high-dimensional problem arises.
• θ = (θ1, θ2, . . . , θp)ᵀ is a sparse coefficient vector.
To express the sparsity of θ, we introduce a latent index set γ = {j : θj ≠ 0},
so γ can be treated as a selected model.
Bayesian model setup and selection
Given the model γ, we define
• f(y|θγ, γ): the likelihood function,
• π(θγ|γ): the prior of θγ,
• π(γ): the prior of model γ.
By Bayes' theorem,
π(γ|y) ∝ m(y|γ)π(γ),
where m(y|γ) = ∫ f(y|θγ, γ)π(θγ|γ)dθγ is the marginal likelihood function.
Goal: find the best model γ* = arg maxγ π(γ|y).
Problem: m(y|γ) = ∫ f(y|θγ, γ)π(θγ|γ)dθγ may not have a closed form.
Laplace approximation
With the Laplace approximation,
m(y|γ) ≈ (2π n^{-1})^{|γ|/2} |V̂γ|^{1/2} f(y|θ̂γ, γ) π(θ̂γ|γ)
       ∝ (2π n^{-1})^{|γ|/2} f(y|θ̂γ, γ) π(θ̂γ|γ),   (9)
where
θ̂γ = arg max_{θγ} f(y|θγ, γ) π(θγ|γ)   (10)
is the posterior mode and |V̂γ| = Op(1) under some regularity conditions.
The posterior mode θ̂γ must be estimated first in order to evaluate (9) in the Laplace
approximation.
Objective
Therefore, the Bayesian model selection criterion becomes
π(γ|y) ∝ (2π n^{-1})^{|γ|/2} f(y|θ̂γ, γ) π(θ̂γ|γ) π(γ) ≡ S(γ),   (11)
where f(y|θ̂γ, γ) is the likelihood and π(θ̂γ|γ) is the prior.
Objective: find the best model γ* = arg maxγ S(γ).
Hybrid search
1. Set γ̂ as the current model and γ̂(0) = γ̂.
2. Repeat for t = 0, 1, 2, . . . , # Deterministic search
   i). Define nbd(γ̂(t));
   ii). Compute S(γ)I{γ ∈ nbd(γ̂(t))}; # sequentially in a for-loop
   iii). Compute γ̂(t+1) = arg max_{γ∈nbd(γ̂(t))} S(γ);
   iv). If S(γ̂(t+1)) > S(γ̂), then update γ̂ ← γ̂(t+1);
        else update γ̂ ← γ̂(t) and break the loop.
3. Set γ̂(0) = γ̂. # Stochastic search
4. Repeat for t = 0, 1, 2, . . . , T, # small T
   i). Define nbd(γ̂(t));
   ii). Compute S(γ)I{γ ∈ nbd(γ̂(t))}; # sequentially in a for-loop
   iii). Compute γ* = arg max_{γ∈nbd(γ̂(t))} S(γ);
   iv). If S(γ*) > S(γ̂), then update γ̂ ← γ* and go back to Step 2 immediately;
        else sample γ̂(t+1) with probability proportional to S(γ) for γ ∈ nbd(γ̂(t)).
5. Return γ̂.
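The deterministic phase (Step 2) and stochastic phase (Step 4) can be sketched generically; the code below is our own minimal illustration, not the dissertation's implementation. It takes an arbitrary score function over models represented as frozensets; since the algorithm assumes S(γ) > 0 for sampling, negative scores are shifted before drawing.

```python
import random

def neighbors(gamma, p):
    """Addition and deletion neighbors of model gamma over predictors 0..p-1."""
    add = [gamma | {j} for j in range(p) if j not in gamma]
    delete = [gamma - {j} for j in gamma]
    return [frozenset(g) for g in add + delete]

def hybrid_search(score, p, gamma0=frozenset(), T=20, seed=1):
    """Hybrid search: deterministic hill climbing plus T stochastic steps;
    returns the best model found (maximizing `score`)."""
    rng = random.Random(seed)
    best = frozenset(gamma0)
    while True:
        cur = best
        while True:                      # deterministic phase (Step 2)
            cand = max(neighbors(cur, p), key=score)
            if score(cand) > score(best):
                best = cur = cand
            else:
                break
        improved = False
        cur = best
        for _ in range(T):               # stochastic phase (Step 4)
            nbd = neighbors(cur, p)
            cand = max(nbd, key=score)
            if score(cand) > score(best):
                best = cand
                improved = True          # jump back to the deterministic phase
                break
            w = [score(g) for g in nbd]  # sample proportional to (shifted) score
            shift = min(w)
            cur = rng.choices(nbd, weights=[x - shift + 1e-9 for x in w])[0]
        if not improved:
            return best
```

For example, with the toy score S(γ) = −|γ Δ {0, 2}| (negative symmetric-difference size), the search recovers {0, 2}.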
Simultaneously evaluate all models in nbd+(γ̂)
θ̂γ̂ is the posterior mode of the current model γ̂.
Consider a model in the addition neighbor: γ̂ ∪ {j} ∈ nbd+(γ̂).
θ_{γ̂∪{j}} = (θγ̂, θj) ⇒ (θ̂γ̂, θj) ⇒ (θ̂γ̂, θ̃j), where
θ̃j = arg max_{θj} f(y | (θ̂γ̂, θj), γ̂ ∪ {j}) × π((θ̂γ̂, θj) | γ̂ ∪ {j})   (12)
(likelihood × prior).
Hence, (θ̂γ̂, θ̃j) is the estimate for θ_{γ̂∪{j}}, and
S̃(γ̂ ∪ {j}) = (2π n^{-1})^{(|γ̂|+1)/2} f(y | θ̂γ̂, θ̃j, γ̂ ∪ {j}) π(θ̂γ̂, θ̃j | γ̂ ∪ {j}) π(γ̂ ∪ {j}).
Simultaneously evaluate all models in nbd+(γ̂)
Three steps to estimate all θ̃j's simultaneously:
1. Given θ̂γ̂, take the logarithm of the following function (likelihood × prior):
   ℓ(θj) = log[ f(y | (θ̂γ̂, θj), γ̂ ∪ {j}) × π((θ̂γ̂, θj) | γ̂ ∪ {j}) ].
2. Take the first derivative of ℓ(θj) with respect to each θj:
   uj(θj) = ∂ℓ(θj)/∂θj.
Thus, maximizing ℓ(θj) is equivalent to finding the root of uj(θj) = 0.
Note: uj(θj) contains only one unknown parameter, θj.
Simultaneously evaluate all models in nbd+(γ̂)
Without loss of generality, assume the current model is γ̂ = {1, 2, . . . , k}.
For j = k + 1, . . . , p, all θ̃j's can be obtained by finding the roots of the system
of equations
u_{k+1}(θ_{k+1}) = 0, u_{k+2}(θ_{k+2}) = 0, . . . , u_p(θ_p) = 0.   (13)
3. Solve (13) using Newton's method by iterating
θ(t+1) = θ(t) − Ju(θ(t))^{-1} u(θ(t)),   (14)
where θ(t) = (θ(t)_{k+1}, θ(t)_{k+2}, . . . , θ(t)_p) and Ju(θ(t)) = [∂uj(θ(t)_j)/∂θ(t)_l]_{j,l∈{k+1,...,p}}.
Note that Ju(θ(t)) is a diagonal matrix ⇒ Ju(θ(t))^{-1} is easy to compute.
Plugging each (θ̂γ̂, θ̃j) into S(γ), we obtain S̃(γ) for every model in the addition
neighbor.
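Because each score equation uj involves only its own θj, the Jacobian in (14) is diagonal and the Newton step reduces to an elementwise division, so all p − k one-dimensional roots are found in a single vectorized iteration. A minimal sketch (our own illustration, with a made-up test problem, not the dissertation's code):

```python
import numpy as np

def vectorized_newton(u, du, theta0, tol=1e-10, max_iter=100):
    """Solve u_j(theta_j) = 0 for all coordinates j at once.
    Since each u_j depends only on theta_j, J_u is diagonal, and the
    Newton update theta - J_u^{-1} u(theta) is an elementwise division."""
    theta = np.array(theta0, dtype=float)
    for _ in range(max_iter):
        step = u(theta) / du(theta)   # diagonal-Jacobian Newton step
        theta -= step
        if np.max(np.abs(step)) < tol:
            break
    return theta
```

For instance, the system θj³ = cj for several j is solved simultaneously by passing u(θ) = θ³ − c and u′(θ) = 3θ².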
Theorem
Theorem. Let S(γ) be defined by (11), where θ̂γ is the posterior mode, and let S̃(γ) be
an approximate estimate of S(γ). Let γ1 and γ2 be two different models. If S̃(γ1) > S(γ2),
then S(γ1) > S(γ2).
Simultaneously evaluate all models in nbd−(γ̂)
If γ̂ = {1, 2, 3}, then nbd−(γ̂) = {{1, 2}, {1, 3}, {2, 3}}.
To estimate θ{1,2}, θ{1,3} and θ{2,3} simultaneously:
• Decompose θ̂γ̂ = (θ̂1, θ̂2, θ̂3).
• Drop one component at a time:
  θ̃{1,2} = (θ̂1, θ̂2),  θ̃{1,3} = (θ̂1, θ̂3),  θ̃{2,3} = (θ̂2, θ̂3).
Plug θ̃{1,2}, θ̃{1,3}, θ̃{2,3} into S(γ) to obtain its approximate
estimate S̃(γ) for each model in the deletion neighbor.
Apply S̃(γ) to hybrid search algorithm–Part I
1. Set γ̂ as the current model and γ̂(0) = γ̂.
2. Repeat for t = 0, 1, 2, . . . , # deterministic search
   i). Simultaneously compute S̃(γ) over the addition and deletion neighbors.
   ii). Select the best models γ̂+ and γ̂− in the addition and deletion neighbors,
        respectively.
   iii). Compute γ̂(t+1) = arg max_{γ∈{γ̂+, γ̂−}} S(γ).
   iv). If S(γ̂(t+1)) > S(γ̂), then γ̂ ← γ̂(t+1); else γ̂ ← γ̂(t) and break the loop.
3. Return γ̂.
Apply S̃(γ) to hybrid search algorithm–Part II
4. Set γ̂(0) = γ̂ for the stochastic search.
5. Repeat for t = 0, 1, 2, . . . , T, # stochastic search
   i). Simultaneously compute S̃(γ) over the addition and deletion neighbors.
   ii). Sample a model γ̂+ with probability proportional to S̃(γ) for γ ∈ nbd+(γ̂(t));
   iii). Sample a model γ̂− with probability proportional to S̃(γ) for γ ∈ nbd−(γ̂(t));
   iv). Compute γ̂(t+1) = arg max_{γ∈{γ̂+, γ̂−}} S(γ);
   v). If S(γ̂(t+1)) > S(γ̂), then γ̂ ← γ̂(t+1) and jump to Step 2 immediately;
       else sample a model γ̂(t+1) with probability proportional to S(γ) for
       γ ∈ {γ̂+, γ̂−}.
6. Return γ̂.
Simulation Study
Given model γ, we have a general framework of the Bayesian model structure as
follows:
E(yi | x_{iγ}ᵀθγ) = g^{-1}(x_{iγ}ᵀθγ),
π(θγ | γ) ∼ N(0, λ I_{|γ|}),
π(γ) ∝ 1 / C(p, |γ|),
where x_{iγ} is the sub-vector of xi indexed by γ, g(·) is the link function, and
C(p, |γ|) is the binomial coefficient.
Simulation Study–Model setting in Case I
In Case I, n = 300, p = 1000, and data sets are generated as follows:
• Gaussian: yi ∼ i.i.d. N(xiᵀβγ, σ²), where γ = {1, 3, 5, 7, 9},
  βγ = (1, −1, 1, −1, 1)ᵀ, σ² = 3, and xi ∼ i.i.d. N(0p, Σx) with Σx = (Σij)p×p
  and Σij = 0.6^{|i−j|}.
• Binary: yi ∼ i.i.d. Bernoulli(pi) with pi = g2(xiᵀθγ), where
  g2(x) = 1/(1 + exp(−x)), γ = {1, 3, 5}, θγ = (0.8, 1, −1)ᵀ, and
  xi ∼ i.i.d. N(0p, Σx) with Σx = (Σij)p×p and Σij = 0.6^{|i−j|}.
• Count: yi ∼ i.i.d. Poisson(µi) with µi = g3(xiᵀθγ), where g3(x) = exp(x),
  γ = {1, 3, 5}, θγ = (0.8, 1, −1)ᵀ, xi = Φ(zi) − 0.5·1p with Φ(·) the CDF of the
  standard normal distribution, and zi ∼ i.i.d. N(0p, Σz) with Σz = (Σij)p×p and
  Σij = 0.6^{|i−j|}.
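The Gaussian setting of Case I can be reproduced in a few lines of numpy. This is a sketch of the data-generating process only (the function name and defaults are ours), typically run with smaller (n, p) than in the study when experimenting:

```python
import numpy as np

def simulate_gaussian_case(n=300, p=1000, seed=0):
    """Case I Gaussian data: x_i ~ N(0, Sigma) with Sigma_ij = 0.6**|i-j|,
    true model gamma = {1,3,5,7,9} (1-based), beta = (1,-1,1,-1,1), sigma^2 = 3."""
    rng = np.random.default_rng(seed)
    idx = np.arange(p)
    Sigma = 0.6 ** np.abs(idx[:, None] - idx[None, :])   # AR(1)-type covariance
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n, method="cholesky")
    cols = np.array([1, 3, 5, 7, 9]) - 1                 # 0-based column indices
    beta = np.array([1.0, -1.0, 1.0, -1.0, 1.0])
    y = X[:, cols] @ beta + rng.normal(scale=np.sqrt(3.0), size=n)
    return X, y, cols
```

The binary and count settings follow the same pattern with the linear predictor pushed through the logistic and exponential links, respectively.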
Simulation Study–Model setting in Case II
In Case II, n = 600, p = 2000, and data sets are generated as follows:
• Gaussian: yi ∼ i.i.d. N(xiᵀβγ, σ²), where γ = {1, 3, 5, 7, 9},
  βγ = (1, −1, 1, −1, 1)ᵀ, σ² = 6, and xi ∼ i.i.d. N(0p, Σx) with Σx = (Σij)p×p
  and Σij = 0.6^{|i−j|}.
• Binary: yi ∼ i.i.d. Bernoulli(pi) with pi = g2(xiᵀθγ), where
  g2(x) = 1/(1 + exp(−x)), γ = {1, 3, 5, 7, 9}, θγ = (1, 0.9, −0.9, 0.9, −0.9)ᵀ, and
  xi ∼ i.i.d. N(0p, Σx) with Σx = (Σij)p×p and Σij = 0.6^{|i−j|}.
• Count: yi ∼ i.i.d. Poisson(µi) with µi = g3(xiᵀθγ), where g3(x) = exp(x),
  γ = {1, 3, 5, 7, 9}, θγ = (0.8, 0.8, −0.8, 0.8, −0.8)ᵀ, xi = Φ(zi) − 0.5·1p with
  Φ(·) the CDF of the standard normal distribution, and zi ∼ i.i.d. N(0p, Σz) with
  Σz = (Σij)p×p and Σij = 0.6^{|i−j|}.
Simulation study–Performance of computation and model
selection
With these model settings, we run
• Simulation study I to investigate computational performance;
• Simulation study II to investigate model selection performance.
Table 8: The comparison of best model search algorithms
Method Speed Maximum Computation Function
Deterministic Very fast Local “for-loop” S(γ)
Stochastic Slow Global “for-loop” S(γ)
Hybrid Fast Global “for-loop” S(γ)
Proposed Very fast Global Simultaneous S̃(γ)
Simulation study I–Speed
[Plots omitted: computation time (minutes) in Cases I and II for the Gaussian, logistic, and Poisson models, comparing the Deterministic, Stochastic, Hybrid, and Proposed algorithms.]
Figure 6: Comparison of computational speed of the four algorithms based on 100
replications.
Simulation study II–Case I
Table 9: Simulation study for Case I (n, p) = (300, 1000) based on 500 replications.
Notation: s.e. — standard error.
Method FPR% (s.e) FNR% (s.e) TRUE% (s.e) SIZE (s.e) HAM (s.e)
Gaussian
Deterministic 0.001(0.001) 10.72(1.128) 82.6(1.697) 4.478(0.057) 0.550(0.057)
Stochastic 0.001(0) 0(0) 98.8(0.487) 5.012(0.005) 0.012(0.005)
Hybrid 0.001(0) 0(0) 98.8(0.487) 5.012(0.005) 0.012(0.005)
Proposed 0.001(0) 0.24(0.138) 98.6(0.526) 4.998(0.007) 0.022(0.009)
Logistic (Binary)
Deterministic 0.001(0) 30.933(1.470) 52.0(2.237) 2.080(0.044) 0.936(0.044)
Stochastic 0(0) 4.733(0.734) 91.4(1.255) 2.862(0.022) 0.146(0.022)
Hybrid 0(0) 4.733(0.734) 91.4(1.255) 2.862(0.022) 0.146(0.022)
Proposed 0.001(0) 4.800(0.736) 91.2(1.268) 2.862(0.022) 0.150(0.022)
Poisson (Count)
Deterministic 0(0) 20.333(1.346) 67.8(2.092) 2.390(0.040) 0.610(0.040)
Stochastic 0(0) 4.867(0.726) 91.0(1.281) 2.854(0.022) 0.146(0.022)
Hybrid 0(0) 4.867(0.726) 91.0(1.281) 2.854(0.022) 0.146(0.022)
Proposed 0(0) 4.867(0.726) 91.0(1.281) 2.854(0.022) 0.146(0.022)
Simulation study II–Case II
Table 10: Simulation study for Case II (n, p) = (600, 2000) based on 500 replications.
Notation: s.e. — standard error.
Method FPR% (s.e) FNR% (s.e) TRUE% (s.e) SIZE (s.e) HAM (s.e)
Gaussian
Deterministic 0.001(0) 2.2(0.485) 94.8(0.994) 4.900(0.024) 0.120(0.025)
Stochastic 0.001(0) 0(0) 99.0(0.445) 5.010(0.004) 0.010(0.004)
Hybrid 0.001(0) 0(0) 98.6(0.526) 5.016(0.006) 0.016(0.006)
Proposed 0.001(0) 0(0) 99.0(0.445) 5.010(0.004) 0.010(0.004)
Logistic (Binary)
Deterministic 0(0) 11.360(1.031) 78.2(1.848) 4.438(0.051) 0.574(0.052)
Stochastic 0(0) 0.800(0.287) 97.8(0.657) 4.964(0.015) 0.044(0.015)
Hybrid 0(0) 0.640(0.270) 98.4(0.562) 4.972(0.014) 0.036(0.014)
Proposed 0(0) 0.480(0.211) 98.4(0.562) 4.980(0.011) 0.028(0.011)
Poisson (Count)
Deterministic 0(0) 13.520(1.028) 72.0(2.010) 4.326(0.051) 0.678(0.051)
Stochastic 0(0) 0.120(0.069) 99.4(0.346) 4.994(0.003) 0.006(0.003)
Hybrid 0(0) 0.120(0.069) 99.4(0.346) 4.994(0.003) 0.006(0.003)
Proposed 0(0) 0.120(0.069) 99.4(0.346) 4.994(0.003) 0.006(0.003)
Real example–Datasets and Comparison
Three real examples and their information are as follows:
Table 11: Three real data examples
Data type Data name (n, p) Data source
Gaussian OV (563, 2000) TCGA
Binary RNA-seq (801, 2059) UCI
Count Communities & Crime (71, 103) UCI
To measure model performance, we consider the following model selection criteria:
• i) BIC,
• ii) EBIC,
• iii) modified BIC (MBIC),
• iv) corrected RIC (RICc), and
• v) modified RIC (MRIC).
We compare the proposed method with LASSO, ENET, MCP and SCAD.
To make a fair comparison, tuning parameters are determined by minimal
cross-validation (CV) errors and minimal EBIC (Chen, Chen, 2008).
Real example: OV data (Gaussian)
Table 12: Model selection performance for Gaussian data.
BIC EBIC MBIC RICc MRIC NUM
Proposed 1074.7382 1141.1623 1107.2998 1139.3635 1119.0809 5
EBIC-LASSO 1095.1960 1167.4999 1127.7668 1166.8092 1148.5235 4
EBIC-ENET 1095.1960 1167.4999 1127.7668 1166.8092 1148.5235 4
EBIC-MCP 1095.1960 1167.4999 1127.7668 1166.8092 1148.5235 4
EBIC-SCAD 1095.1960 1167.4999 1127.7668 1166.8092 1148.5235 4
CV-LASSO 1216.9863 2907.2430 2397.6789 3812.9631 3150.1096 145
CV-ENET 1253.3920 2962.9995 2450.3700 3885.1753 3213.1790 147
CV-MCP 1084.6155 1943.9425 1613.8915 2248.3292 1951.1880 65
CV-SCAD 1146.9390 2084.5993 1733.2139 2435.9757 2106.8347 72
Real example: RNA-seq data (binary)
Table 13: Model selection performance for binary data.
BIC EBIC MBIC RICc MRIC NUM
Proposed 38.8263 93.5041 66.4278 89.3793 73.1226 3
EBIC-LASSO 62.4658 104.6621 83.1682 100.3838 88.1909 2
EBIC-ENET 55.5049 110.1866 83.1081 106.0623 89.8051 3
EBIC-MCP 47.4836 102.1654 75.0868 98.0410 81.7839 3
EBIC-SCAD 62.4658 104.6621 83.1682 100.3838 88.1909 2
CV-LASSO 213.9476 538.6971 434.7732 618.4070 488.3495 31
CV-ENET 501.4396 1139.5007 1018.9996 1449.3914 1144.5692 74
CV-MCP 73.5445 206.3565 149.4533 212.5774 167.8701 10
CV-SCAD 86.9162 240.1280 176.6266 251.2278 198.3920 12
Real example: Communities and Crime data (count)
Table 14: Model selection performance for count data.
BIC EBIC MBIC RICc MRIC NUM
Proposed 185.4760 216.1584 194.6094 217.8657 205.5805 3
EBIC-LASSO 206.4090 242.9842 217.7814 246.7787 231.4429 4
EBIC-ENET 206.0983 242.6735 217.4706 246.4679 231.1321 4
EBIC-MCP 199.0646 235.6399 210.4370 239.4343 224.0985 4
EBIC-SCAD 197.6563 228.2602 206.7542 229.9521 217.6835 3
CV-LASSO 199.2896 256.5730 219.8397 272.1664 244.5245 8
CV-ENET 213.5552 279.6320 238.6720 302.6268 268.8423 10
CV-MCP 199.0646 235.7384 210.4814 239.5518 224.1951 4
CV-SCAD 197.6563 228.3387 206.7897 230.0460 217.7608 3
Summary
In Chapter 2, we develop a fast Bayesian approach to best subsets selection
under the linear regression setting.
In Chapter 3, we extend the method in Chapter 2 to multivariate data.
In Chapter 4, we further extend Chapter 2 to various types of data.
In future work, we plan to extend the approach to multivariate data with various data types.
References I
Akaike, H. (1998). Information theory and an extension of the maximum likelihood principle. In Selected Papers of Hirotugu Akaike, 199-213. Springer, New York, NY.
Bedrick, E. J. (1994). Model selection for multivariate regression in small samples. Biometrics, 226-231.
Bozdogan, H. (1987). Model selection and Akaike's information criterion (AIC): The general theory and its analytical extensions. Psychometrika, 52(3), 345-370.
Brown, P. J., Vannucci, M., and Fearn, T. (1998). Multivariate Bayesian variable selection and prediction. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 60(3), 627-641.
Chen, J. and Chen, Z. (2008). Extended Bayesian information criteria for model selection with large model spaces. Biometrika, 95(3), 759-771.
Findlay, G. M., Daza, R. M., Martin, B., Zhang, M. D., Leith, A. P., Gasperini, M., Janizek, J. D., Huang, X., Starita, L. M., and Shendure, J. (2018). Accurate classification of BRCA1 variants with saturation genome editing. Nature, 562(7726), 217-222.
Hans, C., Dobra, A., and West, M. (2007). Shotgun stochastic search for "large p" regression. Journal of the American Statistical Association, 102(478), 507-516.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461-464.
Yang, X. and Lippman, M. E. (1999). BRCA1 and BRCA2 in breast cancer. Breast Cancer Research and Treatment, 54(1), 1-10.
THANK YOU
Major advisor
- Dr. Gyuhyeong Goh
Committee members:
- Dr. Weixing Song
- Dr. Wei-Wen Hsu
- Dr. Jisang Yu
- Dr. Yoon-Jin Lee

  • 1. Bayesian solutions to high-dimensional challenges using hybrid search Shiqiang Jin Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members: Dr. Weixing Song (Statistics) Dr. Wei-Wen Hsu (Statistics) Dr. Jisang Yu (Agricultural Economics) Outside chairperson: Dr. Yoon-Jin Lee (Economics) January 25, 2021 Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 1 / 75
  • 2. Outline Chapter 1: Introduction Chapter 2: Bayesian selection of best subsets via hybrid search Chapter 3: Fast Bayesian best subset selection for high-dimensional multivariate regression Chapter 4: An approximate Bayesian approach to fast high-dimensional variable selection Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 2 / 75
  • 3. Chapter 1 Introduction Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 3 / 75
  • 4. Introduction Figure 1: Data storage equipment 1 1https://www.greenamerica.org/amazon-build-cleaner-and-fairer-cloud Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 4 / 75
  • 5. Challenges of high-dimensional data High-dimensional data problem arises when number of predictors (p) is much larger than sample size (n), e.g. p > n. With large p, only a few of variables are related to the response. Best subset selection: evaluate all possible combinations of predictors. I However, it involves non-convex optimization problems that are computationally intractable when p is large. e.g: 240 ≈ 1012 . Bayesian subset regression is an efficient way to explore the non-convex model space because it implements stochastic search based on MCMC computation. I Limitation: extremely heavy computation and slow convergence. Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 5 / 75
  • 6. Challenges of high-dimensional data High-dimensional data problem arises when number of predictors (p) is much larger than sample size (n), e.g. p > n. With large p, only a few of variables are related to the response. Best subset selection: evaluate all possible combinations of predictors. I However, it involves non-convex optimization problems that are computationally intractable when p is large. e.g: 240 ≈ 1012 . Bayesian subset regression is an efficient way to explore the non-convex model space because it implements stochastic search based on MCMC computation. I Limitation: extremely heavy computation and slow convergence. Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 5 / 75
  • 7. Challenges of high-dimensional data High-dimensional data problem arises when number of predictors (p) is much larger than sample size (n), e.g. p > n. With large p, only a few of variables are related to the response. Best subset selection: evaluate all possible combinations of predictors. I However, it involves non-convex optimization problems that are computationally intractable when p is large. e.g: 240 ≈ 1012 . Bayesian subset regression is an efficient way to explore the non-convex model space because it implements stochastic search based on MCMC computation. I Limitation: extremely heavy computation and slow convergence. Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 5 / 75
  • 8. Challenges in parallel computing In the Bayesian literature, many efforts have been put to reduce the computational burden of MCMC. Shotgun stochastic search (Hans et al., 2007) introduced parallel computing within MCMC procedure to reduce the computational burden. A practical issue is that the high-performance machines and programming protocols are not available to individual users and researchers. Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 6 / 75
  • 9. Challenges in multivariate regression One of important issues with high-dimensional data analysis is the number of response variables is multiple, called multivariate data. The multivariate linear regression model (MVRM) is a popular way to connect multiple responses to a common set of predictors. There is an attempt to extend Bayesian stochastic search algorithm to multivariate linear regression setting (Brown et al., 1998). I But Brown et al. (1998) still suffers from computational issues in the presence of high-dimensional data. . Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 7 / 75
  • 10. Objectives In this dissertation, the main focus is to develop innovative Bayesian methods that can identify a best model via a fast global optimization; be quickly implemented in a single CPU core; apply to various types of data (e.g., Gaussian, multivariate, binary, count, and survival data). Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 8 / 75
  • 11. Chapter 2 Bayesian selection of best subsets via hybrid search Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 9 / 75
  • 12. Linear regression model in high-dimensional data Consider a linear regression model y = Xβ + , (1) where y = (y1, . . . , yn)T is a response vector, X = (x1, . . . , xp) ∈ Rn×p is a model matrix, β = (β1, . . . , βp)T is a coefficient vector and ∼ N(0, σ2 In). We assume p n, i.e. High-dimensional data. We assume only a few number of predictors are associated with the response, i.e. β is sparse. Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 10 / 75
  • 13. Reduce model To better explain the sparsity of β, we introduce a latent index set γ ⊂ {1, . . . , p} so that Xγ represents a sub-matrix of X containing xj , j ∈ γ. e.g. γ = {1, 3, 4} ⇒ Xγ = (x1, x3, x4). The full model in (1) can be reduced to y = Xγβγ + . (2) Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 11 / 75
  • 14. Objectives in Chapter 2 Our Goals are to obtain: (i) k most important predictors out of p k candidate models; (ii) a single best model from among 2p candidate models. Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 12 / 75
  • 15. Prior specifications We consider conjugate priors as follows: βγ|σ2 , γ ∼ Normal(0, τσ2 I|γ|), σ2 ∼ Inverse-Gamma(aσ/2, bσ/2), π(γ) ∝ I(|γ| = k), where |γ| is number of elements in γ. Hence, by Bayes theorem, we have the closed form of marginal likelihood. Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 13 / 75
  • 16. Model posterior distribution By Bayes theorem, given model γ, we have π(γ|y) ∝ f (y|γ)π(γ) ∝ (τ−1 ) |γ| 2 |XT γXγ + τ−1I|γ|| 1 2 (yT Hγy + bσ) aσ+n 2 I(|γ| = k) ≡ g(γ)I(|γ| = k), where f (y|γ) is the marginal likelihood function and Hγ = In − Xγ(XT γXγ + τ−1 I|γ|)−1 XT γ. Hence, our Bayesian model selection procedure is simply to find the true model γ that maximizes the probability π(γ|y). For notation simplicity, g(γ) is used as model selection criterion. Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 14 / 75
  • 17. Best subset selection algorithm According to best subset selection algorithm, our goals become: (i) Fixed size: given k, select the best subset model by Mk = argγ max |γ|=k g(γ) from p k candidate models. (ii) Single best model: M = arg maxγ π(γ|y) from 2p candidate models. Non-convex optimization problem arises. Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 15 / 75
  • 18. Deterministic search with a fixed k Given model γ with model size k, define two neighborhood spaces: I addition neighbor N+(γ) = {γ ∪ {j} : j / ∈ γ} and I deletion neighbor N−(γ) = {γ {j} : j ∈ γ}. Note our Goal (i) is to find γ̂ = argγ max|γ̂|=k g(γ). 1. Initialize γ̂ s.t. |γ̂| = k. 2. Repeat # deterministic search:local optimum Update γ̃ ← arg maxγ∈N+(γ̂) g(γ) ; # N+(γ̂) = {γ̂ ∪ {j} : j / ∈ γ̂} Update γ̂ ← arg maxγ∈N−(γ̃) g(γ); # N−(γ̃) = {γ̃ {j} : j ∈ γ̃} until convergence. 3. Return γ̂. Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 16 / 75
  • 19. Hybrid search algorithm 1. γ̂ is obtained from deterministic search. 2. Set γ(0) = γ̂. 3. Repeat for t = 1, . . . , T: #stochastic search:global optimum i) Sample γ∗ with probability proportional to g(γ) for γ ∈ N+(γ̂(t−1) ); ii) Sample γ(t) with probability proportional to g(γ) for γ ∈ N−(γ∗ ); iii) If π(γ̂|y) π(γ(t) |y), then update γ̂ = γ(t) , break the loop, and go to deterministic search. 4. Return γ̂. Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 17 / 75
  • 20. Idea of hybrid search Current State Next Update Figure 2: Hybrid search enables us to achieve the global maximum efficiently. Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 18 / 75
  • 21. Best subset selection with varying k Note that Goal (ii): a single best model from among 2p candidate models. We extend “fixed” k to varying k by assigning a prior on k. Note that the uniform prior, k ∼ Uniform{1, . . . , K}, tends to assign larger probability to a larger subset (see Chen, Chen (2008)). We define π(k) ∝ 1/ p k I(k ≤ K). Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 19 / 75
  • 22. Hybrid best subset search with varying k Bayesian best subset selection can be done by maximizing π(γ, k|y) ∝ g(γ)/ p k (3) over (γ, k). Our algorithm proceeds as follows: 1. Repeat for k = 1, . . . , K: Given k, implement the hybrid search algorithm to obtain best subset model γ̂k . 2. Find the best model γ̂∗ obtained by γ̂∗ = arg max k∈{1,...,K} g(γ̂k )/ p k . (4) Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 20 / 75
  • 23. Computational burden in g(γ)
We have to compute g(γ) p times in each iteration ⇒ inefficient, since
  g(γ) = (τ^{-1})^{|γ|/2} |X_γ^T X_γ + τ^{-1} I_{|γ|}|^{-1/2} (y^T H_γ y + b_σ)^{-(a_σ+n)/2},
where H_γ = I_n − X_γ (X_γ^T X_γ + τ^{-1} I_{|γ|})^{-1} X_γ^T.
We propose a method that evaluates g(γ) for all models γ ∈ N+(γ̂) (or γ ∈ N−(γ̂)) simultaneously, in a single computation.
  • 24. Define γ̂ ∪ {j} as a model in the addition neighborhood of γ̂; then
  g(γ̂ ∪ {j}) = (τ^{-1})^{|γ̂∪{j}|/2} |X_{γ̂∪{j}}^T X_{γ̂∪{j}} + τ^{-1} I_{|γ̂∪{j}|}|^{-1/2} (y^T H_{γ̂∪{j}} y + b_σ)^{-(a_σ+n)/2},
where H_{γ̂∪{j}} = I_n − X_{γ̂∪{j}} (X_{γ̂∪{j}}^T X_{γ̂∪{j}} + τ^{-1} I_{|γ̂∪{j}|})^{-1} X_{γ̂∪{j}}^T.
This simplifies by the following Lemma 1, Lemma 2, and the Sherman–Morrison formula:
- Lemma 1: H_γ̂ = (I_n + τ X_γ̂ X_γ̂^T)^{-1}.
- Lemma 2: det(I_n + U V^T) = det(I_m + V^T U) for n × m matrices U and V.
- Sherman–Morrison formula: (A + u v^T)^{-1} = A^{-1} − (A^{-1} u v^T A^{-1}) / (1 + v^T A^{-1} u) for A ∈ R^{n×n} and u, v ∈ R^n.
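All three identities are easy to sanity-check numerically. The sketch below verifies each on random matrices (the dimensions and τ are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, tau = 8, 3, 2.0
X = rng.standard_normal((n, k))

# Lemma 1: I_n - X (X'X + tau^{-1} I_k)^{-1} X'  equals  (I_n + tau X X')^{-1}
H = np.eye(n) - X @ np.linalg.solve(X.T @ X + np.eye(k) / tau, X.T)
assert np.allclose(H, np.linalg.inv(np.eye(n) + tau * X @ X.T))

# Lemma 2 (Sylvester's determinant identity): det(I_n + U V') = det(I_m + V' U)
m = 2
U, V = rng.standard_normal((n, m)), rng.standard_normal((n, m))
assert np.allclose(np.linalg.det(np.eye(n) + U @ V.T),
                   np.linalg.det(np.eye(m) + V.T @ U))

# Sherman-Morrison: (A + u v')^{-1} = A^{-1} - A^{-1} u v' A^{-1} / (1 + v' A^{-1} u)
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # invertible w.p. 1
u, v = rng.standard_normal(n), rng.standard_normal(n)
Ainv = np.linalg.inv(A)
sm = Ainv - np.outer(Ainv @ u, v @ Ainv) / (1.0 + v @ Ainv @ u)
assert np.allclose(sm, np.linalg.inv(A + np.outer(u, v)))
```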
  • 25. We have
  g(γ̂ ∪ {j}) ∝ ( y^T H_γ̂ y + b_σ − (x_j^T H_γ̂ y)^2 / (τ^{-1} + x_j^T H_γ̂ x_j) )^{-(a_σ+n)/2} × (τ^{-1} + x_j^T H_γ̂ x_j)^{-1/2}.    (5)
The vector X^T H_γ̂ y contains x_j^T H_γ̂ y for all j's.
The diagonal elements of the matrix X^T H_γ̂ X contain x_j^T H_γ̂ x_j for all j's. Hence
  m+(γ̂) = [ (y^T H_γ̂ y + b_σ) 1_p − (X^T H_γ̂ y)^2 / (τ^{-1} 1_p + diag(X^T H_γ̂ X)) ]^{-(a_σ+n)/2} × [ τ^{-1} 1_p + diag(X^T H_γ̂ X) ]^{-1/2},    (6)
where all operations are elementwise; m+(γ̂) contains g(γ̂ ∪ {j}) for all j ∉ γ̂.
How many matrix inverses and determinants do we need to compute?
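In code, the single-computation trick replaces a loop over the p candidate variables with a few matrix-vector products, using only the one matrix H for the current model. A minimal numerical check (illustrative; n, p and the hyperparameters τ, a_σ, b_σ are arbitrary) that the vectorized scores of (6) match the one-at-a-time scores of (5) on the log scale:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 30
tau, a_sig, b_sig = 10.0, 1.0, 1.0          # assumed hyperparameter values
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

gamma = [0, 1, 2]                            # current model gamma-hat
Xg = X[:, gamma]
# the only inverse we need: H for the current model
H = np.eye(n) - Xg @ np.linalg.solve(Xg.T @ Xg + np.eye(len(gamma)) / tau, Xg.T)

def log_g_add(j):
    """log of formula (5) for a single candidate j (up to a constant)."""
    d = 1.0 / tau + X[:, j] @ H @ X[:, j]
    q = y @ H @ y + b_sig - (X[:, j] @ H @ y) ** 2 / d
    return -(a_sig + n) / 2 * np.log(q) - 0.5 * np.log(d)

# vectorized formula (6): scores for all p candidate additions at once
Hy, HX = H @ y, H @ X
d = 1.0 / tau + np.einsum('ij,ij->j', X, HX)   # tau^{-1} 1_p + diag(X'HX)
q = y @ Hy + b_sig - (X.T @ Hy) ** 2 / d
log_m_plus = -(a_sig + n) / 2 * np.log(q) - 0.5 * np.log(d)

looped = np.array([log_g_add(j) for j in range(p)])
assert np.allclose(looped, log_m_plus)          # identical scores, one pass
```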
  • 29. Simultaneous computation
A similar formula holds for the deletion scores g(γ̃ \ {j}):
  m−(γ̃) = [ (y^T H_γ̃ y + b_σ) 1_p + (X^T H_γ̃ y)^2 / (τ^{-1} 1_p − diag(X^T H_γ̃ X)) ]^{-(a_σ+n)/2} × [ τ^{-1} 1_p − diag(X^T H_γ̃ X) ]^{-1/2},
again elementwise; m−(γ̃) contains g(γ̃ \ {j}) for all j ∈ γ̃.
Applying m+(γ̂) and m−(γ̃) to the hybrid search, we can significantly boost the computational speed.
  • 30. Simulation study–Data generation
For given n = 100, we generate the data from
  y_i ~ind~ Normal( Σ_{j=1}^p β_j x_ij , 1 ),
where
- (x_i1, . . . , x_ip)^T ~iid~ Normal(0_p, Σ) with Σ = (Σ_ij)_{p×p} and Σ_ij = ρ^{|i−j|},
- β_j ~iid~ Uniform{−1, −2, 1, 2} if j ∈ γ and β_j = 0 if j ∉ γ,
- γ is an index set of size 4 randomly selected from {1, 2, . . . , p},
- we consider four scenarios for p and ρ: (i) p = 200, ρ = 0.1; (ii) p = 200, ρ = 0.9; (iii) p = 1000, ρ = 0.1; (iv) p = 1000, ρ = 0.9.
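This design is straightforward to reproduce. A sketch of one replication (illustrative; defaults correspond to scenario (i), and the seed is arbitrary):

```python
import numpy as np

def generate_data(n=100, p=200, rho=0.1, seed=0):
    """One replication of the simulation design: AR(1)-correlated
    predictors and 4 active coefficients drawn from {-2, -1, 1, 2}."""
    rng = np.random.default_rng(seed)
    # Sigma_ij = rho^{|i-j|}
    Sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    gamma = rng.choice(p, size=4, replace=False)          # true support
    beta = np.zeros(p)
    beta[gamma] = rng.choice([-1.0, -2.0, 1.0, 2.0], size=4)
    y = rng.normal(X @ beta, 1.0)                          # unit error variance
    return X, y, beta, np.sort(gamma)

X, y, beta, gamma = generate_data()
```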
  • 31. Simulation study-Results
Table 1: 2,000 replications; FDR (false discovery rate), TRUE% (percentage of the true model detected), SIZE (selected model size), HAM (Hamming distance).
Scenario          Method    FDR (s.e.)     TRUE% (s.e.)    SIZE (s.e.)    HAM (s.e.)
p = 200, ρ = 0.1  Proposed  0.006 (0.001)  96.900 (0.388)  4.032 (0.004)  0.032 (0.004)
                  SCAD      0.034 (0.002)  85.200 (0.794)  4.188 (0.011)  0.188 (0.011)
                  MCP       0.035 (0.002)  84.750 (0.804)  4.191 (0.011)  0.191 (0.011)
                  ENET      0.016 (0.001)  92.700 (0.582)  4.087 (0.007)  0.087 (0.007)
                  LASSO     0.020 (0.002)  91.350 (0.629)  4.109 (0.009)  0.109 (0.009)
p = 200, ρ = 0.9  Proposed  0.023 (0.002)  88.750 (0.707)  3.985 (0.006)  0.203 (0.014)
                  SCAD      0.059 (0.003)  74.150 (0.979)  4.107 (0.015)  0.480 (0.022)
                  MCP       0.137 (0.004)  55.400 (1.112)  4.264 (0.020)  1.098 (0.034)
                  ENET      0.501 (0.004)  0.300 (0.122)   7.716 (0.072)  5.018 (0.052)
                  LASSO     0.276 (0.004)  15.550 (0.811)  5.308 (0.033)  2.038 (0.034)
  • 32. Simulation study-Results
Table 2: 2,000 replications; FDR (false discovery rate), TRUE% (percentage of the true model detected), SIZE (selected model size), HAM (Hamming distance).
Scenario           Method    FDR (s.e.)     TRUE% (s.e.)    SIZE (s.e.)    HAM (s.e.)
p = 1000, ρ = 0.1  Proposed  0.004 (0.001)  98.100 (0.305)  4.020 (0.003)  0.020 (0.003)
                   SCAD      0.027 (0.002)  87.900 (0.729)  4.145 (0.010)  0.145 (0.010)
                   MCP       0.031 (0.002)  86.550 (0.763)  4.172 (0.013)  0.172 (0.013)
                   ENET      0.035 (0.002)  84.850 (0.802)  4.181 (0.013)  0.206 (0.012)
                   LASSO     0.014 (0.001)  93.850 (0.537)  4.073 (0.007)  0.073 (0.007)
p = 1000, ρ = 0.9  Proposed  0.023 (0.002)  89.850 (0.675)  4.005 (0.005)  0.190 (0.013)
                   SCAD      0.068 (0.003)  74.250 (0.978)  4.196 (0.014)  0.493 (0.023)
                   MCP       0.152 (0.004)  53.750 (1.115)  4.226 (0.017)  1.202 (0.035)
                   ENET      0.417 (0.005)  0.150 (0.087)   6.228 (0.068)  4.089 (0.043)
                   LASSO     0.265 (0.004)  19.500 (0.886)  5.139 (0.029)  1.909 (0.035)
  • 33. Computational speed comparison
To show the merit of simultaneous computation, we compare the hybrid search implemented with a "for-loop" against the hybrid search using simultaneous computation.
[Figure 3: Runtime (sec) versus p for the hybrid search using simultaneous computation (Proposed) versus using a "for-loop".]
  • 34. Real data application
Data description
We apply the proposed method to Breast Invasive Carcinoma (BRCA) data generated by The Cancer Genome Atlas (TCGA) Research Network: http://cancergenome.nih.gov.
The data set contains p = 17,326 gene expression measurements (recorded on the log scale) of n = 526 patients with primary solid tumor.
BRCA1 is a tumor suppressor gene and its mutations predispose women to breast cancer (Findlay et al., 2018).
  • 35. Real data application
Our goal here is to identify the best-fitting model for estimating the association between BRCA1 (response variable) and the other genes (independent variables):
  BRCA1 = β_1 · NBR2 + β_2 · DTL + . . . + β_p · VPS25 + ε.
Results:
Table 3: Model comparison
Method    BIC      extended BIC  MSPE
Proposed  985.93   1120.87       0.60
SCAD      1104.69  1176.42       0.68
MCP       1104.69  1176.42       0.68
ENET      1110.65  1198.68       0.68
LASSO     1104.69  1176.42       0.68
  • 36. Real data application
Results (cont.)
[Figure 4: Coefficient estimates of the genes selected by each method (Proposed, SCAD, MCP, ENET, LASSO), including GINS1, GNL1, VPS25, ARHGAP19, CRBN, TUBG1, C17orf53, NBR2, DTL and KLHL13. Except C10orf76, 7 genes are documented as disease-related genes.]
  • 37. Chapter 3
Fast Bayesian best subset selection for high-dimensional multivariate regression
  • 38. Model Setting
Consider fitting the multivariate regression model (MVRM):
  Y_{n×m} = X_{n×p} C_{p×m} + E_{n×m},    (7)
where
- Y = (y_1, y_2, . . . , y_m)^T ∈ R^{n×m} is a response matrix,
- X = (x_1, x_2, . . . , x_p) ∈ R^{n×p},
- C = (c_1, c_2, . . . , c_p)^T ∈ R^{p×m} is an unknown coefficient matrix,
- E = (e_1, e_2, . . . , e_n)^T ∈ R^{n×m} with e_i ~iid~ N_m(0, Ω), where Ω ∈ R^{m×m} is an unknown nonsingular covariance matrix.
Ω = cov(y_i) describes the relationship between each pair (y_ij, y_iℓ):
  Ω = cov(y_i) =
  [ σ²_11  σ_12   . . .  σ_1m
    σ_21   σ²_22  . . .  σ_2m
    . . .
    σ_m1   σ_m2   . . .  σ²_mm ].
  • 39. Row-sparsity
We assume p ≫ n, i.e., a high-dimensional problem arises.
C is row-sparse: only a few of the x_j's are related to Y.
  c_j ≠ 0 ⇔ x_j selected;  c_j = 0 ⇔ x_j not selected.
We introduce a latent index set γ ⊂ {1, 2, . . . , p} such that γ = {j : c_j ≠ 0}.
For example, if γ = {1, 3}, then C_γ = (c_1, c_3)^T, the submatrix of C keeping rows 1 and 3.
  • 40. Prior specifications
Given γ, the likelihood function is
  Y | C_γ, Ω, γ ∼ MN(X_γ C_γ, I_n, Ω).
We consider conjugate prior distributions for the parameters C and Ω:
  π(C_γ|Ω, γ) ∼ MN(0, ζ I_{|γ|}, Ω),
  π(Ω) ∼ W^{-1}(Ψ, ν),
  π(γ) ∝ I(|γ| = k),
where
- W^{-1} is the inverse-Wishart distribution,
- ζ, Ψ and ν are deterministic hyperparameters.
  • 41. Objective
Hence, the marginal likelihood function f(Y|γ) can be obtained by integrating out C_γ and Ω:
  f(Y|γ) = ∫∫ f(Y|C_γ, Ω, γ) π(C_γ|Ω, γ) π(Ω) dC_γ dΩ
         ∝ ζ^{-m|γ|/2} |X_γ^T X_γ + ζ^{-1} I_{|γ|}|^{-m/2} |Y^T H_γ Y + Ψ|^{-(n+ν)/2}
         ≡ s(Y|γ),
where H_γ = I_n − X_γ (X_γ^T X_γ + ζ^{-1} I_{|γ|})^{-1} X_γ^T.
By Bayes' theorem, the model posterior distribution is given as
  π(γ|Y) ∝ s(Y|γ) π(γ),
which is our model selection criterion.
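Because s(Y|γ) has a closed form, it can be coded directly. A sketch computing log s(Y|γ) up to an additive constant (illustrative; the hyperparameter defaults ζ = 10, Ψ = I_m, ν = m + 2 are arbitrary assumed values, not the dissertation's choices):

```python
import numpy as np

def log_score(Y, X, gamma, zeta=10.0, Psi=None, nu=None):
    """log s(Y | gamma) up to an additive constant, transcribed from the
    closed-form marginal likelihood above."""
    n, m = Y.shape
    Psi = np.eye(m) if Psi is None else Psi
    nu = m + 2 if nu is None else nu
    Xg = X[:, gamma]
    k = Xg.shape[1]
    A = Xg.T @ Xg + np.eye(k) / zeta                     # X_g'X_g + zeta^{-1} I
    H = np.eye(n) - Xg @ np.linalg.solve(A, Xg.T)        # H_gamma
    _, logdet_A = np.linalg.slogdet(A)
    _, logdet_R = np.linalg.slogdet(Y.T @ H @ Y + Psi)   # Y'HY + Psi
    return -m * k / 2 * np.log(zeta) - m / 2 * logdet_A - (n + nu) / 2 * logdet_R

# toy data in which only predictors 0 and 1 drive Y
rng = np.random.default_rng(0)
n, p, m = 60, 10, 2
X = rng.standard_normal((n, p))
C = np.zeros((p, m))
C[0], C[1] = 2.0, -2.0
Y = X @ C + 0.1 * rng.standard_normal((n, m))
```

On such data, the true support {0, 1} attains a higher score than irrelevant or partial supports.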
  • 42. Hybrid best subset search with fixed model size k
1. Define γ̂ as the current model with |γ̂| = k.
2. Deterministic search: repeat the following steps until convergence,
   i) compute s(Y|γ) for all γ ∈ N+(γ̂); update γ̃ to be the maximizer of s(Y|γ);
   ii) compute s(Y|γ) for all γ ∈ N−(γ̃); update γ̂ to be the maximizer of s(Y|γ).
3. Return γ̂*.
4. Set γ̂^(0) ← γ̂*.
5. Stochastic search: iterate over t = 1, . . . , T,
   i) sample γ̃^(t) with probabilities proportional to s(Y|γ) for γ ∈ N+(γ̂^(t−1));
   ii) sample γ̂^(t) with probabilities proportional to s(Y|γ) for γ ∈ N−(γ̃^(t));
   iii) if γ̂^(t) is better than γ̂*, set γ̂ ← γ̂^(t), break the loop, and jump to step 2.
6. Return γ̂.
  • 43. Hybrid best subset search with varying k
To select a single best model, we assign a prior on k: π(k) ∝ I(k ≤ K) / C(p,k).
The model selection criterion becomes
  π(γ, k|Y) ∝ s(Y|γ) I(|γ| = k) / C(p,k).
Hybrid best subset search with varying k:
i) Fixed size: given k, select the best subset model M_k = arg max_γ s(Y|γ) via the hybrid search.
ii) Varying size: pick the single best model from M_1, . . . , M_K via γ̂ = arg max_{1≤k≤K} π(γ, k|Y).
Problem: s(Y|γ) includes an inverse matrix and two determinants, making the hybrid search slow.
  • 45. Simultaneous Computation
As in Chapter 2, we can show that s(Y|γ̂ ∪ {j}) is an element of the vector
  m+(γ̂) = [ ζ^{-1} 1_p + diag(X^T H_γ̂ X) ]^{-m/2} · [ 1_p − diag(X^T H_γ̂ Y (Y^T H_γ̂ Y + Ψ)^{-1} Y^T H_γ̂ X) / (ζ^{-1} 1_p + diag(X^T H_γ̂ X)) ]^{-(n+ν)/2},
and s(Y|γ̃ \ {ℓ}) is an element of the vector
  m−(γ̃) = [ ζ^{-1} 1_p − diag(X_γ̃^T H_γ̃ X_γ̃) ]^{-m/2} · [ 1_p + diag(X_γ̃^T H_γ̃ Y (Y^T H_γ̃ Y + Ψ)^{-1} Y^T H_γ̃ X_γ̃) / (ζ^{-1} 1_p − diag(X_γ̃^T H_γ̃ X_γ̃)) ]^{-(n+ν)/2},
where all operations are elementwise.
Applying m+(γ̂) and m−(γ̃) to the hybrid search, we can significantly improve the hybrid search speed.
  • 46. Simulation Study
Set n = 100 and m = 5, and generate the data {(y_i, x_i) : i = 1, . . . , n} from
  y_i ~ind~ N_m(C^T x_i, Ω),    (8)
where
- x_i ~iid~ N_p(0_p, Σ_x) with Σ_x = (ρ_x^{|i−j|})_{p×p},
- Ω = (2 ρ_e^{|i−j|})_{m×m}.
The true model is γ = {1, 2, 3, 4, 7, 8, 9, 10}; c_ij ~iid~ Uniform{−1.5, −1, 1, 1.5} for j ∈ γ and c_j = 0 for j ∉ γ.
We consider all combinations of the scenarios for p, ρ_e, and ρ_x:
1. p = 200, 500 and 1000;
2. ρ_e = 0, 0.2 and 0.5;
3. ρ_x = 0.2 and 0.8.
  • 47. Simulation Study
The proposed method is compared with:
- Brown et al. (1998), a well-known Bayesian approach with the same model setup as the proposed method;
- multivariate LASSO (mLASSO) and multivariate ENET (mENET) with tuning parameters selected by
  • AIC (Akaike, 1998),
  • bias-corrected AIC (AICc) (Bedrick, 1994),
  • BIC (Schwarz, 1978),
  • consistent AIC (CAIC) (Bozdogan, 1987).
  • 48. Table 4: Simulation study of ρ_e = 0 based on 100 replications.
Scenario            Method    FDR (s.e)     TRUE% (s.e)  SIZE (s.e)   HAM (s.e)    TIME (s.e)
p = 200, ρx = 0.2   Proposed  0(0)          100(0)       8(0)         0(0)         7.054(0.048)
                    Brown     0.002(0.002)  98(1.407)    8.02(0.014)  0.02(0.014)  302.692(0.779)
                    mLASSO    0(0)          100(0)       8(0)         0(0)         0.433(0.002)
                    mENET     0(0)          100(0)       8(0)         0(0)         0.427(0.003)
p = 200, ρx = 0.8   Proposed  0(0)          98(1.407)    7.98(0.014)  0.02(0.014)  7.635(0.067)
                    Brown     0.015(0.004)  84(3.685)    8.1(0.046)   0.18(0.044)  284.772(1.222)
                    mLASSO    0.024(0.005)  69(4.648)    8.04(0.063)  0.4(0.067)   0.556(0.005)
                    mENET     0.066(0.009)  49(5.024)    8.49(0.104)  0.79(0.092)  0.536(0.004)
p = 1000, ρx = 0.2  Proposed  0(0)          100(0)       8(0)         0(0)         14.182(0.12)
                    Brown     0.008(0.003)  93(2.564)    8.07(0.026)  0.07(0.026)  1752.643(5.724)
                    mLASSO    0.002(0.002)  98(1.407)    8.02(0.014)  0.02(0.014)  0.624(0.004)
                    mENET     0.002(0.002)  98(1.407)    8.02(0.014)  0.02(0.014)  0.606(0.004)
p = 1000, ρx = 0.8  Proposed  0(0)          94(2.387)    7.93(0.029)  0.07(0.029)  14.691(0.116)
                    Brown     0.014(0.004)  82(3.861)    8.07(0.046)  0.19(0.042)  1729.196(8.956)
                    mLASSO    0.025(0.006)  61(4.902)    7.92(0.091)  0.56(0.082)  0.79(0.008)
                    mENET     0.078(0.009)  39(4.902)    8.45(0.115)  1.05(0.119)  0.765(0.006)
  • 49. Table 5: Simulation study of ρ_e = 0.2 based on 100 replications.
Scenario            Method    FDR (s.e)     TRUE% (s.e)  SIZE (s.e)   HAM (s.e)    TIME (s.e)
p = 200, ρx = 0.2   Proposed  0(0)          100(0)       8(0)         0(0)         7.067(0.048)
                    Brown     0.003(0.002)  97(1.714)    8.03(0.017)  0.03(0.017)  303.72(0.817)
                    mLASSO    0(0)          100(0)       8(0)         0(0)         0.463(0.003)
                    mENET     0(0)          100(0)       8(0)         0(0)         0.457(0.004)
p = 200, ρx = 0.8   Proposed  0(0)          99(1)        7.99(0.01)   0.01(0.01)   7.799(0.072)
                    Brown     0.015(0.004)  84(3.685)    8.1(0.046)   0.18(0.044)  286.251(1.186)
                    mLASSO    0.031(0.006)  67(4.726)    8.13(0.072)  0.45(0.073)  0.544(0.004)
                    mENET     0.062(0.009)  51(5.024)    8.42(0.102)  0.78(0.096)  0.529(0.004)
p = 1000, ρx = 0.2  Proposed  0(0)          100(0)       8(0)         0(0)         14.323(0.101)
                    Brown     0.011(0.004)  91(2.876)    8.1(0.033)   0.1(0.033)   1751.256(5.606)
                    mLASSO    0.002(0.002)  98(1.407)    8.02(0.014)  0.02(0.014)  0.676(0.006)
                    mENET     0.002(0.002)  98(1.407)    8.02(0.014)  0.02(0.014)  0.645(0.004)
p = 1000, ρx = 0.8  Proposed  0(0)          93(2.564)    7.93(0.026)  0.07(0.026)  14.707(0.122)
                    Brown     0.017(0.004)  80(4.02)     8.08(0.051)  0.22(0.046)  1750.775(8.792)
                    mLASSO    0.028(0.007)  61(4.902)    7.94(0.09)   0.6(0.098)   0.776(0.006)
                    mENET     0.081(0.01)   38(4.878)    8.49(0.123)  1.09(0.121)  0.754(0.006)
  • 50. Table 6: Simulation study of ρ_e = 0.5 based on 100 replications.
Scenario            Method    FDR (s.e)     TRUE% (s.e)  SIZE (s.e)   HAM (s.e)    TIME (s.e)
p = 200, ρx = 0.2   Proposed  0(0)          100(0)       8(0)         0(0)         6.994(0.048)
                    Brown     0.002(0.002)  98(1.407)    8.02(0.014)  0.02(0.014)  299.103(0.687)
                    mLASSO    0(0)          100(0)       8(0)         0(0)         0.452(0.003)
                    mENET     0(0)          100(0)       8(0)         0(0)         0.444(0.004)
p = 200, ρx = 0.8   Proposed  0(0)          96(1.969)    7.96(0.02)   0.04(0.02)   7.606(0.064)
                    Brown     0.011(0.003)  87(3.38)     8.06(0.04)   0.14(0.038)  284.063(1.011)
                    mLASSO    0.037(0.008)  67(4.726)    8.19(0.098)  0.55(0.101)  0.549(0.005)
                    mENET     0.07(0.009)   51(5.024)    8.51(0.107)  0.85(0.106)  0.529(0.004)
p = 1000, ρx = 0.2  Proposed  0(0)          100(0)       8(0)         0(0)         14.131(0.1)
                    Brown     0.009(0.003)  93(2.564)    8.08(0.031)  0.08(0.031)  1722.443(4.872)
                    mLASSO    0.002(0.002)  98(1.407)    8.02(0.014)  0.02(0.014)  0.663(0.006)
                    mENET     0.001(0.001)  99(1)        8.01(0.01)   0.01(0.01)   0.641(0.004)
p = 1000, ρx = 0.8  Proposed  0(0)          93(2.564)    7.93(0.026)  0.07(0.026)  14.804(0.12)
                    Brown     0.011(0.003)  85(3.589)    8.04(0.042)  0.16(0.039)  1699.55(7.121)
                    mLASSO    0.027(0.007)  62(4.878)    7.89(0.099)  0.63(0.098)  0.763(0.006)
                    mENET     0.084(0.01)   37(4.852)    8.5(0.128)   1.14(0.127)  0.749(0.006)
  • 51. Real data analysis
We apply the proposed method to Breast Invasive Carcinoma (BRCA) data generated by The Cancer Genome Atlas (TCGA) Research Network: http://cancergenome.nih.gov.
The data set contains 17,814 gene expression measurements (recorded on the log scale) of 526 patients with primary solid tumor.
BRCA1 and BRCA2 are well-known genes accounting for hereditary breast cancer (Yang and Lippman, 1999).
  • 52. Real data analysis
Goal: determine the best-fitting model for estimating the relationship between BRCA1 and BRCA2 (multiple response variables) and the other genes (predictors):
  [y_BRCA1, y_BRCA2] = X [β_1, β_2] + (ε_1, ε_2).
The proposed method is compared with:
- Brown et al. (1998),
- mLASSO and mENET,
- the method proposed in Chapter 2 ("Ch.2 method"),
- LASSO and ENET with EBIC (Chen and Chen, 2008).
We compute AIC, AICc, BIC, CAIC and the mean squared prediction error (MSPE) by Monte Carlo cross-validation with 70% training and 30% testing data over 500 replications.
  • 55. Real Data Analysis
Table 7: Model comparison by OLS using Monte Carlo cross-validation.
Method        AIC     AICc    BIC      CAIC     MSPE
Proposed      368.77  381.15  615.15   678.15   0.692
Brown         917.16  918.57  999.29   1020.29  0.989
Ch.2 method   540.66  548.08  732.28   781.28   0.770
mLASSO-AIC    547.52  607.62  1067.66  1200.66  0.808
mLASSO-AICc   547.52  607.62  1067.66  1200.66  0.808
mLASSO-BIC    703.04  717.08  965.06   1032.06  0.864
mLASSO-CAIC   703.04  717.08  965.06   1032.06  0.864
mENET-AIC     562.47  618.67  1066.97  1195.97  0.810
mENET-AICc    562.47  618.67  1066.97  1195.97  0.810
mENET-BIC     705.52  721.34  983.18   1054.18  0.866
mENET-CAIC    705.52  721.34  983.18   1054.18  0.866
LASSO-EBIC    901.06  902.23  975.37   994.37   0.967
ENET-EBIC     902.36  903.77  984.49   1005.49  0.968
  • 56. [Figure 5: The number of genes selected by each method and the number of significant coefficients (p < 0.05).]
  • 57. Chapter 4
An approximate Bayesian approach to fast high-dimensional variable selection
  • 58. Model Setting
Consider a regression model where:
- y = (y_1, y_2, . . . , y_n)^T is a response variable, e.g. binary, count, or continuous data;
- X is an n × p predictor matrix with p ≫ n, so a high-dimensional problem arises;
- θ = (θ_1, θ_2, . . . , θ_p)^T is a sparse coefficient vector.
To express the sparsity of θ, we introduce a latent index set γ = {j : θ_j ≠ 0}, so γ can be treated as a selected model.
  • 60. Bayesian model setup and selection
Given the model γ, we define:
- f(y|θ_γ, γ): the likelihood function,
- π(θ_γ|γ): the prior of θ_γ,
- π(γ): the prior of model γ.
By Bayes' theorem, π(γ|y) ∝ m(y|γ) π(γ), where m(y|γ) = ∫ f(y|θ_γ, γ) π(θ_γ|γ) dθ_γ is the marginal likelihood function.
Goal: find the best model γ* = arg max_γ π(γ|y).
Problem: m(y|γ) = ∫ f(y|θ_γ, γ) π(θ_γ|γ) dθ_γ may not be available in closed form.
  • 63. Laplace approximation
With the Laplace approximation,
  m(y|γ) ≈ (2π n^{-1})^{|γ|/2} |V̂_γ|^{1/2} f(y|θ̂_γ, γ) π(θ̂_γ|γ) ∝ (2π n^{-1})^{|γ|/2} f(y|θ̂_γ, γ) π(θ̂_γ|γ),    (9)
where
  θ̂_γ = arg max_{θ_γ} f(y|θ_γ, γ) π(θ_γ|γ)    (10)
is the posterior mode and |V̂_γ| = O_p(1) under some regularity conditions.
The posterior mode θ̂_γ must be estimated first to evaluate (9) in the Laplace approximation.
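To make the approximation concrete, the toy sketch below (not the dissertation's code; a hypothetical one-parameter logistic regression with an assumed N(0, τ) prior) compares the Laplace approximation of the log marginal likelihood with brute-force numerical integration:

```python
import numpy as np

rng = np.random.default_rng(3)
n, tau = 50, 4.0
x = rng.standard_normal(n)
y = (rng.random(n) < 1 / (1 + np.exp(-1.2 * x))).astype(float)

def log_joint(theta):
    """log f(y|theta) + log pi(theta): Bernoulli log-likelihood plus Gaussian log-prior."""
    eta = theta * x
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    logprior = -0.5 * theta**2 / tau - 0.5 * np.log(2 * np.pi * tau)
    return loglik + logprior

# evaluate on a fine grid; locate the posterior mode and its curvature
grid = np.linspace(-5.0, 5.0, 20001)
lj = np.array([log_joint(t) for t in grid])
i = int(np.argmax(lj))
h = grid[1] - grid[0]
curv = -(lj[i + 1] - 2 * lj[i] + lj[i - 1]) / h**2   # -(d^2/dtheta^2) log joint at the mode

# Laplace: log m(y) ~= log f(y|mode) pi(mode) + (1/2) log(2 pi / curv)
laplace = lj[i] + 0.5 * np.log(2 * np.pi / curv)

# "exact" log marginal by brute-force Riemann integration on the same grid
exact = np.log(np.sum(np.exp(lj - lj[i])) * h) + lj[i]
```

With n = 50 observations the two values agree closely, and the approximation error shrinks at rate O(1/n) as the sample size grows.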
  • 65. Objective
Therefore, the Bayesian model selection criterion becomes
  π(γ|y) ∝ (2π n^{-1})^{|γ|/2} f(y|θ̂_γ, γ) π(θ̂_γ|γ) π(γ) ≡ S(γ),    (11)
where f(y|θ̂_γ, γ) is the likelihood and π(θ̂_γ|γ) the prior, each evaluated at the posterior mode.
Objective: find the best model γ* = arg max_γ S(γ).
  • 66. Hybrid search
1. Set γ̂ as the current model and γ̂^(0) = γ̂.
2. Repeat for t = 0, 1, 2, . . . ,  # deterministic search
   i) define nbd(γ̂^(t));
   ii) compute S(γ) for all γ ∈ nbd(γ̂^(t));  # sequentially, in a for-loop
   iii) compute γ̂^(t+1) = arg max_{γ ∈ nbd(γ̂^(t))} S(γ);
   iv) if S(γ̂^(t+1)) > S(γ̂), then update γ̂ ← γ̂^(t+1); else update γ̂ ← γ̂^(t) and break the loop.
3. Set γ̂^(0) = γ̂.  # stochastic search
4. Repeat for t = 0, 1, 2, . . . , T,  # small T
   i) define nbd(γ̂^(t));
   ii) compute S(γ) for all γ ∈ nbd(γ̂^(t));  # sequentially, in a for-loop
   iii) compute γ* = arg max_{γ ∈ nbd(γ̂^(t))} S(γ);
   iv) if S(γ*) > S(γ̂), then update γ̂ ← γ* and go back to Step 2 immediately; else sample γ̂^(t+1) with probability proportional to S(γ) for γ ∈ nbd(γ̂^(t)).
5. Return γ̂.
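The hybrid search above can be sketched as follows. This is an illustrative Python version with hypothetical helper names (`neighbors`, `hybrid_search`): `score` stands for log S(γ), and the stochastic phase samples neighbors with probability proportional to S(γ) by exponentiating the log-scores.

```python
import math
import random

def neighbors(model, p):
    """Addition and deletion neighbors of `model` (a frozenset of indices)."""
    add = [model | {j} for j in range(p) if j not in model]
    drop = [model - {j} for j in model if len(model) > 1]
    return add + drop

def hybrid_search(score, p, init=frozenset({0}), T=20, seed=1):
    """Greedy (deterministic) ascent to a local maximum, then T stochastic
    moves; jump back to the greedy phase whenever a neighbor beats the
    incumbent. `score(gamma)` returns log S(gamma)."""
    rng = random.Random(seed)
    best = cur = frozenset(init)
    while True:
        # Deterministic phase: move to the best neighbor while it improves.
        while True:
            cand = max(neighbors(cur, p), key=score)
            if score(cand) > score(best):
                best = cur = cand
            else:
                break
        # Stochastic phase: try to escape the local maximum.
        improved = False
        for _ in range(T):
            nbd = neighbors(cur, p)
            star = max(nbd, key=score)
            if score(star) > score(best):
                best = cur = star
                improved = True
                break  # back to the deterministic phase
            m = max(score(g) for g in nbd)
            weights = [math.exp(score(g) - m) for g in nbd]  # prop. to S(gamma)
            cur = rng.choices(nbd, weights=weights)[0]
        if not improved:
            return best
```

In practice the scores would be cached rather than recomputed, but the control flow mirrors Steps 1-5 above.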
Simultaneously evaluate all models in nbd₊(γ̂)

θ̂_γ̂ is the posterior mode of the current model γ̂. Consider a model in the addition neighbor: γ̂ ∪ {j} ∈ nbd₊(γ̂). Then

θ_{γ̂∪{j}} = (θ_γ̂, θ_j) ⇒ (θ̂_γ̂, θ_j) ⇒ (θ̂_γ̂, θ̃_j),

where

θ̃_j = arg max_{θ_j} f(y | (θ̂_γ̂, θ_j), γ̂ ∪ {j}) [likelihood] × π((θ̂_γ̂, θ_j) | γ̂ ∪ {j}) [prior].   (12)

Hence, (θ̂_γ̂, θ̃_j) is the estimate for θ_{γ̂∪{j}}, and

S̃(γ̂ ∪ {j}) = (2πn⁻¹)^{(|γ̂|+1)/2} f(y | (θ̂_γ̂, θ̃_j), γ̂ ∪ {j}) [likelihood] × π((θ̂_γ̂, θ̃_j) | γ̂ ∪ {j}) [prior] × π(γ̂ ∪ {j}).
Simultaneously evaluate all models in nbd₊(γ̂)

Three steps to estimate all θ_j's simultaneously:

1. Given θ̂_γ̂, take the logarithm of the following function:
   ℓ(θ_j) = log{ f(y | (θ̂_γ̂, θ_j), γ̂ ∪ {j}) [likelihood] × π((θ̂_γ̂, θ_j) | γ̂ ∪ {j}) [prior] }.
2. Take the first derivative of ℓ(θ_j) with respect to each θ_j:
   u_j(θ_j) = ∂ℓ(θ_j)/∂θ_j.
   Thus, maximizing ℓ(θ_j) is equivalent to finding the root of u_j(θ_j) = 0. Note: u_j(θ_j) contains only one unknown parameter, θ_j.
Simultaneously evaluate all models in nbd₊(γ̂)

Without loss of generality, assume the current model is γ̂ = {1, 2, . . . , k}. For j = k+1, . . . , p, all θ̃_j's can be obtained by finding the roots of the system of equations

u_{k+1}(θ_{k+1}) = 0, u_{k+2}(θ_{k+2}) = 0, . . . , u_p(θ_p) = 0.   (13)

3. Solve (13) using Newton's method by iterating
   θ⁽ᵗ⁺¹⁾ = θ⁽ᵗ⁾ − J_u(θ⁽ᵗ⁾)⁻¹ u(θ⁽ᵗ⁾),   (14)
   where θ⁽ᵗ⁾ = (θ⁽ᵗ⁾_{k+1}, θ⁽ᵗ⁾_{k+2}, . . . , θ⁽ᵗ⁾_p) and J_u(θ⁽ᵗ⁾) = [∂u_j(θ⁽ᵗ⁾_j)/∂θ⁽ᵗ⁾_l]_{j,l ∈ {k+1,...,p}}.

Note that J_u(θ⁽ᵗ⁾) is a diagonal matrix, so J_u(θ⁽ᵗ⁾)⁻¹ is easy to compute. Plugging each (θ̂_γ̂, θ̃_j) into the S(γ) function, we obtain all S̃(γ) in the addition neighbor.
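Because the Jacobian in (14) is diagonal, the Newton update decouples into p − k scalar updates that can be vectorized. A minimal sketch, assuming the user supplies the score function u and its derivative evaluated elementwise (the names `newton_diagonal`, `u`, `u_prime` are illustrative):

```python
import numpy as np

def newton_diagonal(u, u_prime, theta0, tol=1e-10, max_iter=100):
    """Solve u_j(theta_j) = 0 for all j at once.

    Each equation involves only its own coordinate, so the Jacobian is
    diagonal and the Newton step (14) reduces to an elementwise update
    theta <- theta - u(theta) / u'(theta)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        step = u(theta) / u_prime(theta)  # J_u(theta)^{-1} u(theta), elementwise
        theta = theta - step
        if np.max(np.abs(step)) < tol:
            break
    return theta
```

For instance, the toy system u_j(θ_j) = c_j − exp(θ_j) has roots θ_j = log c_j, and the update above solves all coordinates in parallel without forming or inverting any matrix.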
Theorem

Theorem. Let S(γ) be defined by (11), where θ̂_γ is the posterior mode, and let S̃(γ) be an approximate estimate of S(γ). Let γ₁ and γ₂ be two different models. If S̃(γ₁) > S(γ₂), then S(γ₁) > S(γ₂).
Simultaneously evaluate all models in nbd₋(γ̂)

If γ̂ = {1, 2, 3}, then nbd₋(γ̂) = {{1, 2}, {1, 3}, {2, 3}}. To estimate θ_{1,2}, θ_{1,3}, and θ_{2,3} simultaneously:
- Decompose θ̂_γ̂ = (θ̂₁, θ̂₂, θ̂₃).
- Drop one coordinate at a time:
  θ̃_{1,2} = (θ̂₁, θ̂₂), θ̃_{1,3} = (θ̂₁, θ̂₃), θ̃_{2,3} = (θ̂₂, θ̂₃).

Plug {θ̃_{1,2}, θ̃_{1,3}, θ̃_{2,3}} into the S(γ) function to obtain its approximate estimate S̃(γ) in the deletion neighbor.
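The deletion-neighbor construction is just coordinate dropping, as the γ̂ = {1, 2, 3} example shows. A small sketch (the helper name `deletion_estimates` is hypothetical):

```python
import numpy as np

def deletion_estimates(model, theta_hat):
    """Approximate parameter estimates for every deletion neighbor of
    `model` (a tuple of indices): the estimate for model \\ {j} simply
    drops the j-th coordinate of the current posterior mode."""
    out = {}
    for i, j in enumerate(model):
        sub = tuple(v for v in model if v != j)
        out[sub] = np.delete(theta_hat, i)  # theta-tilde for the neighbor
    return out
```

Each resulting vector is then plugged into S(γ) to get S̃(γ) for that deletion neighbor, with no re-optimization.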
Apply S̃(γ) to hybrid search algorithm–Part I

1. Set γ̂ as the current model and γ̂⁽⁰⁾ = γ̂.
2. Repeat for t = 0, 1, 2, . . .   # deterministic search
   i). Simultaneously compute S̃(γ) in the addition and deletion neighbors.
   ii). Select the best models γ̂⁺ and γ̂⁻ in the addition and deletion neighbors, respectively.
   iii). Compute γ̂⁽ᵗ⁺¹⁾ = arg max_{γ ∈ {γ̂⁺, γ̂⁻}} S(γ).
   iv). If S(γ̂⁽ᵗ⁺¹⁾) > S(γ̂), then γ̂ ← γ̂⁽ᵗ⁺¹⁾; else γ̂ ← γ̂⁽ᵗ⁾ and break the loop.
3. Return γ̂.
Apply S̃(γ) to hybrid search algorithm–Part II

4. Set γ̂⁽⁰⁾ = γ̂ for the stochastic search.
5. Repeat for t = 0, 1, 2, . . . , T   # stochastic search
   i). Simultaneously compute S̃(γ) in the addition and deletion neighbors.
   ii). Sample a model γ̂⁺ with probability proportional to S̃(γ) for γ ∈ nbd₊(γ̂⁽ᵗ⁾);
   iii). Sample a model γ̂⁻ with probability proportional to S̃(γ) for γ ∈ nbd₋(γ̂⁽ᵗ⁾);
   iv). Compute γ̂⁽ᵗ⁺¹⁾ = arg max_{γ ∈ {γ̂⁺, γ̂⁻}} S(γ);
   v). If S(γ̂⁽ᵗ⁺¹⁾) > S(γ̂), then γ̂ ← γ̂⁽ᵗ⁺¹⁾ and jump to Step 2 immediately; else sample a model γ̂⁽ᵗ⁺¹⁾ with probability proportional to S(γ) for γ ∈ {γ̂⁺, γ̂⁻}.
6. Return γ̂.
Simulation Study

Given model γ, we have a general framework for the Bayesian model structure:

E(y_i | x_{iγ}, θ_γ) = g⁻¹(x_{iγ}ᵀ θ_γ), π(θ_γ | γ) ∼ N(0, λ I_{|γ|}), π(γ) ∝ 1 / C(p, |γ|),

where x_{iγ} is the sub-vector of x_i indexed by γ, C(p, |γ|) is the binomial coefficient "p choose |γ|", and g(·) is the link function.
Simulation Study–Model setting in Case I

In Case I, n = 300, p = 1000, and data sets are generated as follows:

Gaussian: y_i ~iid N(x_{iγ}ᵀ β_γ, σ²), where γ = {1, 3, 5, 7, 9}, β_γ = (1, −1, 1, −1, 1)ᵀ, σ² = 3, and x_i ~iid N(0_p, Σ_x) with Σ_x = (Σ_ij)_{p×p} and Σ_ij = 0.6^{|i−j|}.

Binary: y_i ~iid Bernoulli(p_i) with p_i = g₂(x_{iγ}ᵀ θ_γ), where g₂(x) = 1/(1 + exp(−x)), γ = {1, 3, 5}, θ_γ = (0.8, 1, −1)ᵀ, and x_i ~iid N(0_p, Σ_x) with Σ_x = (Σ_ij)_{p×p} and Σ_ij = 0.6^{|i−j|}.

Count: y_i ~iid Poisson(µ_i) with µ_i = g₃(x_{iγ}ᵀ θ_γ), where g₃(x) = exp(x), γ = {1, 3, 5}, θ_γ = (0.8, 1, −1)ᵀ, x_i = Φ(z_i) − 0.5·1_p with Φ(·) the CDF of the standard normal distribution, and z_i ~iid N(0_p, Σ_z) with Σ_z = (Σ_ij)_{p×p} and Σ_ij = 0.6^{|i−j|}.
Simulation Study–Model setting in Case II

In Case II, n = 600, p = 2000, and data sets are generated as follows:

Gaussian: y_i ~iid N(x_{iγ}ᵀ β_γ, σ²), where γ = {1, 3, 5, 7, 9}, β_γ = (1, −1, 1, −1, 1)ᵀ, σ² = 6, and x_i ~iid N(0_p, Σ_x) with Σ_x = (Σ_ij)_{p×p} and Σ_ij = 0.6^{|i−j|}.

Binary: y_i ~iid Bernoulli(p_i) with p_i = g₂(x_{iγ}ᵀ θ_γ), where g₂(x) = 1/(1 + exp(−x)), γ = {1, 3, 5, 7, 9}, θ_γ = (1, 0.9, −0.9, 0.9, −0.9)ᵀ, and x_i ~iid N(0_p, Σ_x) with Σ_x = (Σ_ij)_{p×p} and Σ_ij = 0.6^{|i−j|}.

Count: y_i ~iid Poisson(µ_i) with µ_i = g₃(x_{iγ}ᵀ θ_γ), where g₃(x) = exp(x), γ = {1, 3, 5, 7, 9}, θ_γ = (0.8, 0.8, −0.8, 0.8, −0.8)ᵀ, x_i = Φ(z_i) − 0.5·1_p with Φ(·) the CDF of the standard normal distribution, and z_i ~iid N(0_p, Σ_z) with Σ_z = (Σ_ij)_{p×p} and Σ_ij = 0.6^{|i−j|}.
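The Case I/II data-generating mechanisms can be sketched as below. This is an assumed reimplementation (0-based indices, hypothetical function name `simulate`), not the code used in the study; the Poisson case maps predictors through Φ(·) − 0.5 as described.

```python
import math
import numpy as np

def simulate(kind, n, p, gamma, theta, rho=0.6, sigma2=3.0, seed=0):
    """Generate one data set following the slide settings: AR(1)-type
    correlated predictors with corr(x_i, x_j) = 0.6^|i-j| and a sparse
    true model `gamma` (0-based indices here)."""
    rng = np.random.default_rng(seed)
    idx = np.arange(p)
    Sigma = rho ** np.abs(np.subtract.outer(idx, idx))
    Z = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    if kind == "count":
        # x_i = Phi(z_i) - 0.5 keeps the Poisson rate bounded (Count setting)
        X = 0.5 * (1.0 + np.vectorize(math.erf)(Z / math.sqrt(2.0))) - 0.5
    else:
        X = Z
    eta = X[:, gamma] @ theta
    if kind == "gaussian":
        y = eta + rng.normal(scale=math.sqrt(sigma2), size=n)
    elif kind == "binary":
        y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))
    else:  # count
        y = rng.poisson(np.exp(eta))
    return X, y
```

A call such as `simulate("gaussian", 300, 1000, [0, 2, 4, 6, 8], np.array([1, -1, 1, -1, 1.0]))` reproduces the structure of Case I's Gaussian setting.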
Simulation study–Performance of computation and model selection

With these model settings, we run:
- Simulation study I to investigate the performance of computation;
- Simulation study II to investigate the performance of model selection.

Table 8: The comparison of best model search algorithms

Method        | Speed     | Maximum | Computation  | Function
Deterministic | Very fast | Local   | "for-loop"   | S(γ)
Stochastic    | Slow      | Global  | "for-loop"   | S(γ)
Hybrid        | Fast      | Global  | "for-loop"   | S(γ)
Proposed      | Very fast | Global  | Simultaneous | S̃(γ)
Simulation study I–Speed

Figure 6: Comparison of computational speed of the four algorithms based on 100 replications. [Boxplots: panels Gaussian, Logistic, Poisson; x-axis Case I / Case II; y-axis Time (min); legend: Deterministic, Stochastic, Hybrid, Proposed.]
Simulation study II–Case I

Table 9: Simulation study for Case I, (n, p) = (300, 1000), based on 500 replications. Notation: s.e., standard error.

Method | FPR% (s.e) | FNR% (s.e) | TRUE% (s.e) | SIZE (s.e) | HAM (s.e)

Gaussian
Deterministic | 0.001(0.001) | 10.72(1.128) | 82.6(1.697) | 4.478(0.057) | 0.550(0.057)
Stochastic | 0.001(0) | 0(0) | 98.8(0.487) | 5.012(0.005) | 0.012(0.005)
Hybrid | 0.001(0) | 0(0) | 98.8(0.487) | 5.012(0.005) | 0.012(0.005)
Proposed | 0.001(0) | 0.24(0.138) | 98.6(0.526) | 4.998(0.007) | 0.022(0.009)

Logistic (Binary)
Deterministic | 0.001(0) | 30.933(1.470) | 52.0(2.237) | 2.080(0.044) | 0.936(0.044)
Stochastic | 0(0) | 4.733(0.734) | 91.4(1.255) | 2.862(0.022) | 0.146(0.022)
Hybrid | 0(0) | 4.733(0.734) | 91.4(1.255) | 2.862(0.022) | 0.146(0.022)
Proposed | 0.001(0) | 4.800(0.736) | 91.2(1.268) | 2.862(0.022) | 0.150(0.022)

Poisson (Count)
Deterministic | 0(0) | 20.333(1.346) | 67.8(2.092) | 2.390(0.040) | 0.610(0.040)
Stochastic | 0(0) | 4.867(0.726) | 91.0(1.281) | 2.854(0.022) | 0.146(0.022)
Hybrid | 0(0) | 4.867(0.726) | 91.0(1.281) | 2.854(0.022) | 0.146(0.022)
Proposed | 0(0) | 4.867(0.726) | 91.0(1.281) | 2.854(0.022) | 0.146(0.022)
Simulation study II–Case II

Table 10: Simulation study for Case II, (n, p) = (600, 2000), based on 500 replications. Notation: s.e., standard error.

Method | FPR% (s.e) | FNR% (s.e) | TRUE% (s.e) | SIZE (s.e) | HAM (s.e)

Gaussian
Deterministic | 0.001(0) | 2.2(0.485) | 94.8(0.994) | 4.900(0.024) | 0.120(0.025)
Stochastic | 0.001(0) | 0(0) | 99.0(0.445) | 5.010(0.004) | 0.010(0.004)
Hybrid | 0.001(0) | 0(0) | 98.6(0.526) | 5.016(0.006) | 0.016(0.006)
Proposed | 0.001(0) | 0(0) | 99.0(0.445) | 5.010(0.004) | 0.010(0.004)

Logistic (Binary)
Deterministic | 0(0) | 11.360(1.031) | 78.2(1.848) | 4.438(0.051) | 0.574(0.052)
Stochastic | 0(0) | 0.800(0.287) | 97.8(0.657) | 4.964(0.015) | 0.044(0.015)
Hybrid | 0(0) | 0.640(0.270) | 98.4(0.562) | 4.972(0.014) | 0.036(0.014)
Proposed | 0(0) | 0.480(0.211) | 98.4(0.562) | 4.980(0.011) | 0.028(0.011)

Poisson (Count)
Deterministic | 0(0) | 13.520(1.028) | 72.0(2.010) | 4.326(0.051) | 0.678(0.051)
Stochastic | 0(0) | 0.120(0.069) | 99.4(0.346) | 4.994(0.003) | 0.006(0.003)
Hybrid | 0(0) | 0.120(0.069) | 99.4(0.346) | 4.994(0.003) | 0.006(0.003)
Proposed | 0(0) | 0.120(0.069) | 99.4(0.346) | 4.994(0.003) | 0.006(0.003)
Real example–Datasets and Comparison

Three real examples and their information are as follows:

Table 11: Three real data examples

Data type | Data name | (n, p) | Data source
Gaussian | OV | (563, 2000) | TCGA
Binary | RNA-seq | (801, 2059) | UCI
Count | Communities and Crime | (71, 103) | UCI

To measure model performance, we consider the model selection criteria: i) BIC, ii) EBIC, iii) modified BIC (MBIC), iv) corrected RIC (RICc), and v) modified RIC (MRIC).

We compare the proposed method with LASSO, ENET, MCP, and SCAD. To make a fair comparison, tuning parameters are determined by minimal cross-validation (CV) error and minimal EBIC (Chen and Chen, 2008).
Real example: OV data (Gaussian)

Table 12: Model selection performance for Gaussian data.

Method | BIC | EBIC | MBIC | RICc | MRIC | NUM
Proposed | 1074.7382 | 1141.1623 | 1107.2998 | 1139.3635 | 1119.0809 | 5
EBIC-LASSO | 1095.1960 | 1167.4999 | 1127.7668 | 1166.8092 | 1148.5235 | 4
EBIC-ENET | 1095.1960 | 1167.4999 | 1127.7668 | 1166.8092 | 1148.5235 | 4
EBIC-MCP | 1095.1960 | 1167.4999 | 1127.7668 | 1166.8092 | 1148.5235 | 4
EBIC-SCAD | 1095.1960 | 1167.4999 | 1127.7668 | 1166.8092 | 1148.5235 | 4
CV-LASSO | 1216.9863 | 2907.2430 | 2397.6789 | 3812.9631 | 3150.1096 | 145
CV-ENET | 1253.3920 | 2962.9995 | 2450.3700 | 3885.1753 | 3213.1790 | 147
CV-MCP | 1084.6155 | 1943.9425 | 1613.8915 | 2248.3292 | 1951.1880 | 65
CV-SCAD | 1146.9390 | 2084.5993 | 1733.2139 | 2435.9757 | 2106.8347 | 72
Real example: RNA-seq data (binary)

Table 13: Model selection performance for binary data.

Method | BIC | EBIC | MBIC | RICc | MRIC | NUM
Proposed | 38.8263 | 93.5041 | 66.4278 | 89.3793 | 73.1226 | 3
EBIC-LASSO | 62.4658 | 104.6621 | 83.1682 | 100.3838 | 88.1909 | 2
EBIC-ENET | 55.5049 | 110.1866 | 83.1081 | 106.0623 | 89.8051 | 3
EBIC-MCP | 47.4836 | 102.1654 | 75.0868 | 98.0410 | 81.7839 | 3
EBIC-SCAD | 62.4658 | 104.6621 | 83.1682 | 100.3838 | 88.1909 | 2
CV-LASSO | 213.9476 | 538.6971 | 434.7732 | 618.4070 | 488.3495 | 31
CV-ENET | 501.4396 | 1139.5007 | 1018.9996 | 1449.3914 | 1144.5692 | 74
CV-MCP | 73.5445 | 206.3565 | 149.4533 | 212.5774 | 167.8701 | 10
CV-SCAD | 86.9162 | 240.1280 | 176.6266 | 251.2278 | 198.3920 | 12
Real example: Communities and Crime data (count)

Table 14: Model selection performance for count data.

Method | BIC | EBIC | MBIC | RICc | MRIC | NUM
Proposed | 185.4760 | 216.1584 | 194.6094 | 217.8657 | 205.5805 | 3
EBIC-LASSO | 206.4090 | 242.9842 | 217.7814 | 246.7787 | 231.4429 | 4
EBIC-ENET | 206.0983 | 242.6735 | 217.4706 | 246.4679 | 231.1321 | 4
EBIC-MCP | 199.0646 | 235.6399 | 210.4370 | 239.4343 | 224.0985 | 4
EBIC-SCAD | 197.6563 | 228.2602 | 206.7542 | 229.9521 | 217.6835 | 3
CV-LASSO | 199.2896 | 256.5730 | 219.8397 | 272.1664 | 244.5245 | 8
CV-ENET | 213.5552 | 279.6320 | 238.6720 | 302.6268 | 268.8423 | 10
CV-MCP | 199.0646 | 235.7384 | 210.4814 | 239.5518 | 224.1951 | 4
CV-SCAD | 197.6563 | 228.3387 | 206.7897 | 230.0460 | 217.7608 | 3
Summary

In Chapter 2, we develop a fast Bayesian approach to best subset selection under the linear regression setting.
In Chapter 3, we extend the method of Chapter 2 to multivariate data.
In Chapter 4, we further extend Chapter 2 to various types of data.
In future work, we can extend it to multivariate data with various data types.
References

Akaike, H. (1998). Information theory and an extension of the maximum likelihood principle. In Selected Papers of Hirotugu Akaike, pp. 199-213. Springer, New York, NY.
Bedrick, E. J. (1994). Model selection for multivariate regression in small samples. Biometrics, 226-231.
Bozdogan, H. (1987). Model selection and Akaike's information criterion (AIC): The general theory and its analytical extensions. Psychometrika, 52(3), 345-370.
Brown, P. J., Vannucci, M., and Fearn, T. (1998). Multivariate Bayesian variable selection and prediction. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 60(3), 627-641.
Chen, J. and Chen, Z. (2008). Extended Bayesian information criteria for model selection with large model spaces. Biometrika, 95(3), 759-771.
Findlay, G. M., Daza, R. M., Martin, B., Zhang, M. D., Leith, A. P., Gasperini, M., Janizek, J. D., Huang, X., Starita, L. M., and Shendure, J. (2018). Accurate classification of BRCA1 variants with saturation genome editing. Nature, 562(7726), 217-222.
Hans, C., Dobra, A., and West, M. (2007). Shotgun stochastic search for "large p" regression. Journal of the American Statistical Association, 102(478), 507-516.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461-464.
Yang, X. and Lippman, M. E. (1999). BRCA1 and BRCA2 in breast cancer. Breast Cancer Research and Treatment, 54(1), 1-10.
THANK YOU

Major advisor: Dr. Gyuhyeong Goh
Committee members: Dr. Weixing Song, Dr. Wei-Wen Hsu, Dr. Jisang Yu, Dr. Yoon-Jin Lee