Bayesian solutions to high-dimensional challenges using
hybrid search
Shiqiang Jin
Department of Statistics
Kansas State University, Manhattan, KS
Major advisor
Dr. Gyuhyeong Goh (Statistics)
Committee members:
Dr. Weixing Song (Statistics)
Dr. Wei-Wen Hsu (Statistics)
Dr. Jisang Yu (Agricultural Economics)
Outside chairperson: Dr. Yoon-Jin Lee (Economics)
January 25, 2021
Shiqiang Jin (Department of Statistics, Kansas State University, Manhattan, KS; major advisor: Dr. Gyuhyeong Goh)
Bayesian solutions to high-dimensional challenges using hybrid search, January 25, 2021, 1 / 75
Outline
Chapter 1: Introduction
Chapter 2: Bayesian selection of best subsets via hybrid search
Chapter 3: Fast Bayesian best subset selection for high-dimensional
multivariate regression
Chapter 4: An approximate Bayesian approach to fast high-dimensional
variable selection
Chapter 1
Introduction
Introduction
Figure 1: Data storage equipment 1
1https://www.greenamerica.org/amazon-build-cleaner-and-fairer-cloud
Challenges of high-dimensional data
The high-dimensional data problem arises when the number of predictors (p) is much larger than the sample size (n), i.e. p ≫ n.
With large p, only a few variables are related to the response.
Best subset selection: evaluate all possible combinations of predictors.
- However, it involves a non-convex optimization problem that is computationally intractable when p is large, e.g. 2^40 ≈ 10^12 candidate models for p = 40.
Bayesian subset regression is an efficient way to explore the non-convex model space because it implements a stochastic search based on MCMC computation.
- Limitation: extremely heavy computation and slow convergence.
Challenges in parallel computing
In the Bayesian literature, much effort has been devoted to reducing the computational burden of MCMC.
Shotgun stochastic search (Hans et al., 2007) introduced parallel computing within the MCMC procedure to reduce the computational burden.
A practical issue is that high-performance machines and parallel programming protocols are often not available to individual users and researchers.
Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members
Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 6 / 75
Challenges in multivariate regression
An important issue in high-dimensional data analysis is that the number of response variables may be multiple; such data are called multivariate data.
The multivariate linear regression model (MVRM) is a popular way to connect multiple responses to a common set of predictors.
There is an attempt to extend the Bayesian stochastic search algorithm to the multivariate linear regression setting (Brown et al., 1998).
- However, Brown et al. (1998) still suffers from computational issues in the presence of high-dimensional data.
Objectives
In this dissertation, the main focus is to develop innovative Bayesian methods that can
- identify the best model via fast global optimization;
- run quickly on a single CPU core;
- apply to various types of data (e.g., Gaussian, multivariate, binary, count, and survival data).
Chapter 2
Bayesian selection of best subsets via hybrid search
Linear regression model in high-dimensional data
Consider a linear regression model
    y = Xβ + ε,    (1)
where y = (y1, . . . , yn)^T is a response vector, X = (x1, . . . , xp) ∈ R^(n×p) is a model matrix, β = (β1, . . . , βp)^T is a coefficient vector, and ε ∼ N(0, σ²In).
We assume p ≫ n, i.e. high-dimensional data.
We assume only a small number of predictors are associated with the response, i.e. β is sparse.
Reduced model
To better explain the sparsity of β, we introduce a latent index set γ ⊂ {1, . . . , p} so that Xγ represents the sub-matrix of X containing xj, j ∈ γ.
E.g. γ = {1, 3, 4} ⇒ Xγ = (x1, x3, x4).
The full model in (1) can then be reduced to
    y = Xγβγ + ε.    (2)
Objectives in Chapter 2
Our goals are to obtain:
(i) the k most important predictors, i.e. the best of the (p choose k) candidate models of size k;
(ii) a single best model from among the 2^p candidate models.
Prior specifications
We consider conjugate priors as follows:
    βγ | σ², γ ∼ Normal(0, τσ² I_|γ|),
    σ² ∼ Inverse-Gamma(aσ/2, bσ/2),
    π(γ) ∝ I(|γ| = k),
where |γ| is the number of elements in γ.
Hence, by Bayes' theorem, we have a closed form for the marginal likelihood.
Model posterior distribution
By Bayes' theorem, given model γ, we have
    π(γ|y) ∝ f(y|γ) π(γ)
           ∝ (τ^(-1))^(|γ|/2) |Xγ^T Xγ + τ^(-1) I_|γ||^(-1/2) (y^T Hγ y + bσ)^(-(aσ+n)/2) I(|γ| = k)
           ≡ g(γ) I(|γ| = k),
where f(y|γ) is the marginal likelihood function and Hγ = In − Xγ(Xγ^T Xγ + τ^(-1) I_|γ|)^(-1) Xγ^T.
Hence, our Bayesian model selection procedure simply finds the model γ that maximizes the posterior probability π(γ|y).
For notational simplicity, g(γ) is used as the model selection criterion.
Best subset selection algorithm
According to the best subset selection algorithm, our goals become:
(i) Fixed size: given k, select the best subset model
    Mk = argmax_{|γ|=k} g(γ)
from the (p choose k) candidate models.
(ii) Single best model: M = argmax_γ π(γ|y) from the 2^p candidate models.
A non-convex optimization problem arises.
Deterministic search with a fixed k
Given a model γ with model size k, define two neighborhood spaces:
- addition neighbor N+(γ) = {γ ∪ {j} : j ∉ γ} and
- deletion neighbor N−(γ) = {γ \ {j} : j ∈ γ}.
Note our Goal (i) is to find γ̂ = argmax_{|γ|=k} g(γ).
1. Initialize γ̂ s.t. |γ̂| = k.
2. Repeat  # deterministic search: local optimum
   Update γ̃ ← argmax_{γ∈N+(γ̂)} g(γ);  # N+(γ̂) = {γ̂ ∪ {j} : j ∉ γ̂}
   Update γ̂ ← argmax_{γ∈N−(γ̃)} g(γ);  # N−(γ̃) = {γ̃ \ {j} : j ∈ γ̃}
   until convergence.
3. Return γ̂.
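The add-then-delete loop above can be sketched in Python. This is a minimal illustration with a generic score function; the toy `score` below (overlap with a hypothetical true support, minus a small size-index penalty) is a stand-in for g(γ), not the marginal likelihood from the slides.

```python
import numpy as np

def deterministic_search(g, p, k, gamma=None):
    """Greedy add/delete search for argmax over models of size k of g(gamma).

    g: callable mapping a sorted tuple of indices to a score.
    Alternates (best addition, best deletion) until the model stops changing.
    """
    rng = np.random.default_rng(0)
    if gamma is None:
        gamma = set(rng.choice(p, size=k, replace=False).tolist())
    for _ in range(100):  # iteration cap as a safeguard
        # addition neighbor: best gamma ∪ {j}, j not in gamma
        add = max((j for j in range(p) if j not in gamma),
                  key=lambda j: g(tuple(sorted(gamma | {j}))))
        tilde = gamma | {add}
        # deletion neighbor: best tilde \ {j}, j in tilde
        drop = max(tilde, key=lambda j: g(tuple(sorted(tilde - {j}))))
        new = tilde - {drop}
        if new == gamma:  # converged to a local optimum
            return gamma
        gamma = new
    return gamma

# toy score: reward overlap with a hypothetical "true" support {0, 1, 2}
truth = {0, 1, 2}
score = lambda idx: len(truth & set(idx)) - 0.01 * sum(idx)
best = deterministic_search(score, p=10, k=3)
```

With this toy score, each pass adds a missing "true" predictor and deletes a spurious one, so the loop settles on the true support.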
Hybrid search algorithm
1. Obtain γ̂ from the deterministic search.
2. Set γ(0) = γ̂.
3. Repeat for t = 1, . . . , T:  # stochastic search: global optimum
   i) Sample γ* with probability proportional to g(γ) for γ ∈ N+(γ(t−1));
   ii) Sample γ(t) with probability proportional to g(γ) for γ ∈ N−(γ*);
   iii) If π(γ̂|y) < π(γ(t)|y), then update γ̂ = γ(t), break the loop, and go back to the deterministic search.
4. Return γ̂.
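A sketch of the stochastic counterpart: instead of maximizing over the neighborhoods, neighbors are sampled with probability proportional to the score, which is what lets the chain escape local optima. The positive toy score `g` and the incumbent-tracking at the end are illustrative simplifications, not the algorithm's exact bookkeeping.

```python
import numpy as np

def stochastic_step(g, p, gamma, rng):
    """One stochastic add/delete move: sample neighbors w.p. proportional to g."""
    # sample an addition from N+(gamma)
    cand = [j for j in range(p) if j not in gamma]
    w = np.array([g(tuple(sorted(gamma | {j}))) for j in cand])
    add = cand[rng.choice(len(cand), p=w / w.sum())]
    tilde = gamma | {add}
    # sample a deletion from N-(tilde)
    cand = sorted(tilde)
    w = np.array([g(tuple(sorted(tilde - {j}))) for j in cand])
    drop = cand[rng.choice(len(cand), p=w / w.sum())]
    return tilde - {drop}

rng = np.random.default_rng(1)
g = lambda idx: float(np.exp(len({0, 1} & set(idx))))  # toy positive score
gamma = best = {3, 4}
for _ in range(50):
    gamma = stochastic_step(g, p=6, gamma=gamma, rng=rng)
    if g(tuple(sorted(gamma))) > g(tuple(sorted(best))):
        best = gamma  # the hybrid algorithm would now restart the deterministic search
```

The incumbent `best` can only improve, mirroring step iii) where an improved sample sends the search back to the deterministic phase.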
Idea of hybrid search
Figure 2: Hybrid search enables us to achieve the global maximum efficiently.
Best subset selection with varying k
Note Goal (ii): a single best model from among the 2^p candidate models.
We extend the "fixed" k to a varying k by assigning a prior on k.
Note that the uniform prior, k ∼ Uniform{1, . . . , K}, tends to assign larger probability to larger subsets (see Chen and Chen (2008)).
We therefore define
    π(k) ∝ I(k ≤ K) / (p choose k).
Hybrid best subset search with varying k
Bayesian best subset selection can be done by maximizing
    π(γ, k|y) ∝ g(γ) / (p choose k)    (3)
over (γ, k).
Our algorithm proceeds as follows:
1. Repeat for k = 1, . . . , K: given k, implement the hybrid search algorithm to obtain the best subset model γ̂k.
2. Find the best model γ̂* by
    γ̂* = argmax_{k∈{1,...,K}} g(γ̂k) / (p choose k).    (4)
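The two-stage procedure can be sketched as follows; `best_subset_fixed_k` stands in for the fixed-k hybrid search and `log_g` for log g(γ), both toy assumptions for illustration:

```python
from math import comb, log

def select_varying_k(log_g, best_subset_fixed_k, p, K):
    """Run the fixed-k search for k = 1..K, then compare with a log C(p, k) penalty."""
    best, best_score = None, float('-inf')
    for k in range(1, K + 1):
        gamma_k = best_subset_fixed_k(k)           # stand-in for the hybrid search at size k
        score = log_g(gamma_k) - log(comb(p, k))   # log of g(gamma) / (p choose k)
        if score > best_score:
            best, best_score = gamma_k, score
    return best

# toy stand-ins: hypothetical true support {0, 1}; log_g rewards coverage of the truth
truth = {0, 1}
log_g = lambda g: 5.0 * len(truth & g) - 0.1 * len(g)
fixed_k = lambda k: set(range(k))                  # pretend the size-k search returns {0, ..., k-1}
best = select_varying_k(log_g, fixed_k, p=50, K=5)
```

The log C(p, k) penalty offsets the growth of the model space with k, so oversized models lose the comparison even when their raw score is slightly higher.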
Computational burden in g(γ)
We have to compute g(γ) p times in each iteration ⇒ inefficient.
    g(γ) = (τ^(-1))^(|γ|/2) |Xγ^T Xγ + τ^(-1) I_|γ||^(-1/2) (y^T Hγ y + bσ)^(-(aσ+n)/2),
where Hγ = In − Xγ(Xγ^T Xγ + τ^(-1) I_|γ|)^(-1) Xγ^T.
We propose a method that evaluates g(γ) for all models γ ∈ N+(γ̂) (or γ ∈ N−(γ̂)) simultaneously in a single computation.
Define γ̂ ∪ {j} as a model in the addition neighbor of γ̂; then
    g(γ̂ ∪ {j}) = (τ^(-1))^(|γ̂∪{j}|/2) |X_{γ̂∪{j}}^T X_{γ̂∪{j}} + τ^(-1) I_|γ̂∪{j}||^(-1/2) (y^T H_{γ̂∪{j}} y + bσ)^(-(aσ+n)/2),
where H_{γ̂∪{j}} = In − X_{γ̂∪{j}}(X_{γ̂∪{j}}^T X_{γ̂∪{j}} + τ^(-1) I_|γ̂∪{j}|)^(-1) X_{γ̂∪{j}}^T.
We simplify this via the following Lemma 1, Lemma 2, and the Sherman-Morrison formula:
- Lemma 1: H_γ̂ = (In + τ X_γ̂ X_γ̂^T)^(-1).
- Lemma 2: det(In + U V^T) = det(Im + V^T U) for n × m matrices U and V.
- Sherman-Morrison formula: (A + u v^T)^(-1) = A^(-1) − A^(-1) u v^T A^(-1) / (1 + v^T A^(-1) u) for A ∈ R^(n×n) and u, v ∈ R^n.
We have
    g(γ̂ ∪ {j}) ∝ {y^T H_γ̂ y + bσ − (x_j^T H_γ̂ y)² / (τ^(-1) + x_j^T H_γ̂ x_j)}^(-(aσ+n)/2) × (τ^(-1) + x_j^T H_γ̂ x_j)^(-1/2).    (5)
The vector X^T H_γ̂ y contains x_j^T H_γ̂ y for all j.
The diagonal elements of the matrix X^T H_γ̂ X contain x_j^T H_γ̂ x_j for all j.
Stacking (5) over j (with all operations elementwise) gives
    m+(γ̂) = {(y^T H_γ̂ y + bσ)1p − (X^T H_γ̂ y)² / (τ^(-1) 1p + diag(X^T H_γ̂ X))}^(-(aσ+n)/2) × {τ^(-1) 1p + diag(X^T H_γ̂ X)}^(-1/2).    (6)
m+(γ̂) contains g(γ̂ ∪ {j}) for all j ∉ γ̂.
How many inverse matrices and determinants do we need to compute?
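A sketch of how (6) can be vectorized in practice: one k × k solve yields H_γ̂ applied to y and to all columns of X at once, after which every addition score comes out elementwise, with no per-j inverse. The hyperparameter values (τ, aσ, bσ) and the toy data are illustrative assumptions.

```python
import numpy as np

def score_all_additions(X, y, gamma, tau=10.0, a_sigma=1.0, b_sigma=1.0):
    """Evaluate log g(gamma ∪ {j}) for every j simultaneously, up to a constant.

    Entries for j already in gamma are set to -inf so they are never selected.
    """
    n, p = X.shape
    Xg = X[:, sorted(gamma)]
    k = Xg.shape[1]
    # H_gamma v = v - X_g (X_g^T X_g + tau^{-1} I)^{-1} X_g^T v: one k x k solve
    M = np.linalg.solve(Xg.T @ Xg + np.eye(k) / tau, Xg.T)
    Hy = y - Xg @ (M @ y)               # H_gamma y
    HX = X - Xg @ (M @ X)               # H_gamma X, all columns at once
    xHy = X.T @ Hy                      # x_j^T H y for all j
    xHx = np.einsum('ij,ij->j', X, HX)  # diagonal of X^T H X
    denom = 1.0 / tau + xHx
    quad = y @ Hy + b_sigma - xHy**2 / denom
    logg = -0.5 * (a_sigma + n) * np.log(quad) - 0.5 * np.log(denom)
    logg[sorted(gamma)] = -np.inf
    return logg

rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[[0, 1, 2]] = 2.0                   # toy true support {0, 1, 2}
y = X @ beta + rng.standard_normal(n)
scores = score_all_additions(X, y, gamma={0, 1})
```

With the true support {0, 1, 2} and current model {0, 1}, the highest-scoring addition is the remaining true predictor, index 2.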
Simultaneous computation
Similarly for g(γ̃ \ {j}):
    m−(γ̃) = {(y^T H_γ̃ y + bσ)1p + (X^T H_γ̃ y)² / (τ^(-1) 1p − diag(X^T H_γ̃ X))}^(-(aσ+n)/2) × {τ^(-1) 1p − diag(X^T H_γ̃ X)}^(-1/2).
m−(γ̃) contains g(γ̃ \ {j}) for all j ∈ γ̃.
Applying m+(γ̂) and m−(γ̃) to the hybrid search, we can significantly boost the computational speed.
Simulation study–Data generation
For given n = 100, we generate the data from
    y_i ∼ Normal(∑_{j=1}^p βj x_ij, 1), independently for i = 1, . . . , n,
where
- (x_i1, . . . , x_ip)^T iid∼ Normal(0p, Σ) with Σ = (Σij)_{p×p} and Σij = ρ^|i−j|,
- βj iid∼ Uniform{−1, −2, 1, 2} if j ∈ γ and βj = 0 if j ∉ γ,
- γ is an index set of size 4 randomly selected from {1, 2, . . . , p}.
We consider four scenarios for p and ρ:
    (i) p = 200, ρ = 0.1;  (ii) p = 200, ρ = 0.9;
    (iii) p = 1000, ρ = 0.1;  (iv) p = 1000, ρ = 0.9.
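This data-generating process can be reproduced as follows (a sketch; the defaults correspond to scenario (i)):

```python
import numpy as np

def simulate(n=100, p=200, rho=0.1, k=4, seed=0):
    """Generate (y, X, gamma, beta) following the simulation design in the slides."""
    rng = np.random.default_rng(seed)
    # AR(1)-type covariance: Sigma_ij = rho^{|i-j|}
    idx = np.arange(p)
    Sigma = rho ** np.abs(idx[:, None] - idx[None, :])
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    gamma = rng.choice(p, size=k, replace=False)          # true support of size 4
    beta = np.zeros(p)
    beta[gamma] = rng.choice([-1.0, -2.0, 1.0, 2.0], size=k)
    y = X @ beta + rng.standard_normal(n)                 # unit error variance
    return y, X, np.sort(gamma), beta

y, X, gamma, beta = simulate()
```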
Simulation study-Results
Table 1: 2,000 replications; FDR (false discovery rate), TRUE% (percentage of the true model detected), SIZE (selected model size), HAM (Hamming distance).
Scenario Method FDR (s.e.) TRUE% (s.e.) SIZE (s.e.) HAM (s.e.)
p = 200 Proposed 0.006 (0.001) 96.900 (0.388) 4.032(0.004) 0.032(0.004)
ρ = 0.1 SCAD 0.034 (0.002) 85.200 (0.794) 4.188 (0.011) 0.188 (0.011)
MCP 0.035 (0.002) 84.750 (0.804) 4.191 (0.011) 0.191 (0.011)
ENET 0.016 (0.001) 92.700 (0.582) 4.087 (0.007) 0.087 (0.007)
LASSO 0.020 (0.002) 91.350 (0.629) 4.109 (0.009) 0.109 (0.009)
p = 200 Proposed 0.023(0.002) 88.750(0.707) 3.985(0.006) 0.203(0.014)
ρ = 0.9 SCAD 0.059 (0.003) 74.150 (0.979) 4.107 (0.015) 0.480 (0.022)
MCP 0.137 (0.004) 55.400 (1.112) 4.264 (0.020) 1.098 (0.034)
ENET 0.501 (0.004) 0.300 (0.122) 7.716 (0.072) 5.018 (0.052)
LASSO 0.276 (0.004) 15.550 (0.811) 5.308 (0.033) 2.038 (0.034)
Simulation study-Results
Table 2: 2,000 replications; FDR (false discovery rate), TRUE% (percentage of the true model detected), SIZE (selected model size), HAM (Hamming distance).
Scenario Method FDR (s.e.) TRUE% (s.e.) SIZE (s.e.) HAM (s.e.)
p = 1000 Proposed 0.004(0.001) 98.100 (0.305) 4.020 (0.003) 0.020 (0.003)
ρ = 0.1 SCAD 0.027 (0.002) 87.900 (0.729) 4.145 (0.010) 0.145 (0.010)
MCP 0.031 (0.002) 86.550 (0.763) 4.172 (0.013) 0.172 (0.013)
ENET 0.035 (0.002) 84.850 (0.802) 4.181 (0.013) 0.206 (0.012)
LASSO 0.014 (0.001) 93.850 (0.537) 4.073 (0.007) 0.073 (0.007)
p = 1000 Proposed 0.023(0.002) 89.850 (0.675) 4.005 (0.005) 0.190 (0.013)
ρ = 0.9 SCAD 0.068 (0.003) 74.250 (0.978) 4.196 (0.014) 0.493 (0.023)
MCP 0.152 (0.004) 53.750 (1.115) 4.226 (0.017) 1.202 (0.035)
ENET 0.417 (0.005) 0.150 (0.087) 6.228 (0.068) 4.089 (0.043)
LASSO 0.265 (0.004) 19.500 (0.886) 5.139 (0.029) 1.909 (0.035)
Computational speed comparison
To show the merit of simultaneous computation, we compare the hybrid search implemented with a "for-loop" against the version using simultaneous computation.
Figure 3: Runtime (seconds, 0 to 20) versus p (200 to 1000) for the hybrid search using simultaneous computation and using a "for-loop".
Real data application
Data description
We apply the proposed method to Breast Invasive Carcinoma (BRCA) data
generated by The Cancer Genome Atlas (TCGA) Research Network
http://cancergenome.nih.gov.
The data set contains p = 17,326 gene expression measurements (recorded on the log scale) for n = 526 patients with primary solid tumors.
BRCA1 is a tumor suppressor gene and its mutations predispose women to
breast cancer (Findlay et al., 2018).
Real data application
Our goal here is to identify the best-fitting model for estimating the association between BRCA1 (response variable) and the other genes (independent variables):
    BRCA1 = β1·NBR2 + β2·DTL + · · · + βp·VPS25 + ε.
Results:
Table 3: Model comparison
BIC extended BIC MSPE
Proposed 985.93 1120.87 0.60
SCAD 1104.69 1176.42 0.68
MCP 1104.69 1176.42 0.68
ENET 1110.65 1198.68 0.68
LASSO 1104.69 1176.42 0.68
Real data application
Results (cont.)
Figure 4: Coefficient estimates of the genes selected by each method (Proposed, SCAD, MCP, ENET, LASSO); the displayed genes include KLHL13, DTL, NBR2, C17orf53, TUBG1, CRBN, ARHGAP19, VPS25, GNL1, and GINS1. Except C10orf76, 7 genes are documented as diseases-related genes.
Chapter 3
Fast Bayesian best subset selection for high-dimensional
multivariate regression
Model Setting
Consider fitting the multivariate regression model (MVRM):
    Y_{n×m} = X_{n×p} C_{p×m} + E_{n×m},    (7)
where
- Y = (y1, y2, . . . , yn)^T ∈ R^(n×m) is a response matrix,
- X = (x1, x2, . . . , xp) ∈ R^(n×p) is a model matrix,
- C = (c1, c2, . . . , cp)^T ∈ R^(p×m) is an unknown coefficient matrix,
- E = (e1, e2, . . . , en)^T ∈ R^(n×m) with e_i iid∼ N_m(0, Ω), where Ω ∈ R^(m×m) is an unknown nonsingular covariance matrix.
Ω = cov(y_i) describes the relationship between each pair (y_ij, y_iℓ):
    Ω = cov(y_i) =
        [ σ²11  σ12   · · ·  σ1m  ]
        [ σ21   σ²22  · · ·  σ2m  ]
        [ · · ·  · · ·  · · ·  · · · ]
        [ σm1   σm2   · · ·  σ²mm ]
Row-sparsity
We assume p ≫ n, i.e. a high-dimensional problem arises.
C is row-sparse: only a small number of the xj's are related to Y.
    cj ≠ 0 ⇔ xj selected;  cj = 0 ⇔ xj not selected.
We introduce a latent index set γ ⊂ {1, 2, . . . , p} such that γ = {j : cj ≠ 0}.
For example, if γ = {1, 3}, then Cγ is the 2 × m matrix with rows c1^T and c3^T.
Prior specifications
Given γ, the likelihood function is
    f(Y|Cγ, Ω, γ) ∼ MN(XγCγ, In, Ω).
We consider conjugate prior distributions for the parameters C and Ω:
    π(Cγ|Ω, γ) ∼ MN(0, ζ I_|γ|, Ω),
    π(Ω) ∼ W^(-1)(Ψ, ν),
    π(γ) ∝ I(|γ| = k),
where
- W^(-1) is the inverse-Wishart distribution,
- ζ, Ψ and ν are deterministic hyperparameters.
Objective
Hence, the marginal likelihood function f(Y|γ) can be obtained by integrating out Cγ and Ω:
    f(Y|γ) = ∫∫ f(Y|Cγ, Ω, γ) π(Cγ|Ω, γ) π(Ω) dCγ dΩ
           ∝ ζ^(−m|γ|/2) |Xγ^T Xγ + ζ^(-1) I_|γ||^(−m/2) |Y^T Hγ Y + Ψ|^(−(n+ν)/2)
           ≡ s(Y|γ),
where Hγ = In − Xγ(Xγ^T Xγ + ζ^(-1) I_|γ|)^(-1) Xγ^T.
By Bayes' theorem, the model posterior distribution is
    π(γ|Y) ∝ s(Y|γ) π(γ),
which is our model selection criterion.
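For concreteness, s(Y|γ) can be evaluated on the log scale as below. The hyperparameter defaults (ζ = 10, Ψ = I_m, ν = m + 2) and the toy data are assumptions for illustration; `slogdet` avoids overflow in the determinants.

```python
import numpy as np

def log_s(Y, X, gamma, zeta=10.0, nu=None, Psi=None):
    """log s(Y|gamma) up to an additive constant, for the MVRM criterion."""
    n, m = Y.shape
    nu = m + 2 if nu is None else nu
    Psi = np.eye(m) if Psi is None else Psi
    Xg = X[:, sorted(gamma)]
    k = Xg.shape[1]
    A = Xg.T @ Xg + np.eye(k) / zeta
    # H_gamma Y without forming the n x n matrix H explicitly
    HY = Y - Xg @ np.linalg.solve(A, Xg.T @ Y)
    _, logdetA = np.linalg.slogdet(A)
    _, logdetR = np.linalg.slogdet(Y.T @ HY + Psi)
    return (-0.5 * m * k * np.log(zeta)
            - 0.5 * m * logdetA
            - 0.5 * (n + nu) * logdetR)

rng = np.random.default_rng(0)
n, m, p = 60, 3, 100
X = rng.standard_normal((n, p))
C = np.zeros((p, m))
C[[0, 1], :] = 1.5                       # toy true support {0, 1}
Y = X @ C + rng.standard_normal((n, m))
better = log_s(Y, X, {0, 1})
worse = log_s(Y, X, {5, 6})
```

On this toy data, the true support scores higher than an unrelated model of the same size, which is exactly how the criterion is used inside the search.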
Hybrid best subset search with fixed model size k
1. Define γ̂ as the current model with |γ̂| = k.
2. Deterministic search: repeat the following steps until convergence:
   i) Compute s(Y|γ) for all γ ∈ N+(γ̂); update γ̃ to be the maximizer of s(Y|γ).
   ii) Compute s(Y|γ) for all γ ∈ N−(γ̃); update γ̂ to be the maximizer of s(Y|γ).
3. Return γ̂*.
4. Set γ̂(0) ← γ̂*.
5. Stochastic search: iterate for t = 1, . . . , T:
   i) Sample γ̃(t) with probability proportional to s(Y|γ) for γ ∈ N+(γ̂(t−1));
   ii) Sample γ̂(t) with probability proportional to s(Y|γ) for γ ∈ N−(γ̃(t));
   iii) If γ̂(t) is better than γ̂*, set γ̂ ← γ̂(t), break the loop, and jump to step 2.
6. Return γ̂.
Hybrid best subset search with varying k
To select a single best model, we assign a prior on k:
    π(k) ∝ I(k ≤ K) / (p choose k).
The model selection criterion becomes
    π(γ, k|Y) ∝ s(Y|γ) I(|γ| = k) / (p choose k).
Hybrid best subset search with varying k:
i) Fixed size: given k, select the best subset model Mk = argmax_γ s(Y|γ) via the hybrid search.
ii) Varying size: pick the single best model from M1, . . . , MK by γ̂ = argmax_{1≤k≤K} π(γ, k|Y).
Problem: s(Y|γ) involves an inverse matrix and two determinants, making the hybrid search slow.
Simultaneous Computation
As in Chapter 2, we can show that s(Y|γ̂ ∪ {j}) is an element of the vector
    m+(γ̂) = {ζ^(-1) 1p + diag(X^T H_γ̂ X)}^(−m/2) · {1p − diag(X^T H_γ̂ Y (Y^T H_γ̂ Y + Ψ)^(-1) Y^T H_γ̂ X) / (ζ^(-1) 1p + diag(X^T H_γ̂ X))}^(−(n+ν)/2),
and s(Y|γ̃ \ {ℓ}) is an element of the vector
    m−(γ̃) = {ζ^(-1) 1p − diag(X_γ̃^T H_γ̃ X_γ̃)}^(−m/2) · {1p + diag(X_γ̃^T H_γ̃ Y (Y^T H_γ̃ Y + Ψ)^(-1) Y^T H_γ̃ X_γ̃) / (ζ^(-1) 1p − diag(X_γ̃^T H_γ̃ X_γ̃))}^(−(n+ν)/2),
with all operations elementwise.
Applying m+(γ̂) and m−(γ̃) to the hybrid search, we can significantly improve the hybrid search speed.
Simulation Study
Set n = 100 and m = 5, and generate the data {(y_i, x_i) : i = 1, . . . , n} from
    y_i ind∼ N_m(C^T x_i, Ω),    (8)
where
- x_i iid∼ N_p(0p, Σx) with Σx = (ρx^|i−j|)_{p×p},
- Ω = (2 ρe^|i−j|)_{m×m}.
The true model is γ = {1, 2, 3, 4, 7, 8, 9, 10}.
    c_ij iid∼ Uniform{−1.5, −1, 1, 1.5} for j ∈ γ and c_j = 0 for j ∉ γ.
We consider all combinations of the following scenarios for p, ρe, and ρx:
1. p = 200, 500, and 1000;
2. ρe = 0, 0.2, and 0.5;
3. ρx = 0.2 and 0.8.
Simulation Study
The proposed method is compared with:
- Brown et al. (1998), a well-known Bayesian approach with the same model setup as the proposed method;
- multivariate LASSO (mLASSO) and multivariate ENET (mENET), with tuning parameters selected by
  • AIC (Akaike, 1998),
  • bias-corrected AIC (AICc) (Bedrick, 1994),
  • BIC (Schwarz, 1978),
  • consistent AIC (CAIC) (Bozdogan, 1987).
Table 4: Simulation study of ρe = 0 based on 100 replications.
Method FDR (s.e) TRUE% (s.e) SIZE (s.e) HAM (s.e) TIME (s.e)
p = 200, ρx = 0.2
Proposed 0(0) 100(0) 8(0) 0(0) 7.054(0.048)
Brown 0.002(0.002) 98(1.407) 8.02(0.014) 0.02(0.014) 302.692(0.779)
mLASSO 0(0) 100(0) 8(0) 0(0) 0.433(0.002)
mENET 0(0) 100(0) 8(0) 0(0) 0.427(0.003)
p = 200, ρx = 0.8
Proposed 0(0) 98(1.407) 7.98(0.014) 0.02(0.014) 7.635(0.067)
Brown 0.015(0.004) 84(3.685) 8.1(0.046) 0.18(0.044) 284.772(1.222)
mLASSO 0.024(0.005) 69(4.648) 8.04(0.063) 0.4(0.067) 0.556(0.005)
mENET 0.066(0.009) 49(5.024) 8.49(0.104) 0.79(0.092) 0.536(0.004)
p = 1000, ρx = 0.2
Proposed 0(0) 100(0) 8(0) 0(0) 14.182(0.12)
Brown 0.008(0.003) 93(2.564) 8.07(0.026) 0.07(0.026) 1752.643(5.724)
mLASSO 0.002(0.002) 98(1.407) 8.02(0.014) 0.02(0.014) 0.624(0.004)
mENET 0.002(0.002) 98(1.407) 8.02(0.014) 0.02(0.014) 0.606(0.004)
p = 1000, ρx = 0.8
Proposed 0(0) 94(2.387) 7.93(0.029) 0.07(0.029) 14.691(0.116)
Brown 0.014(0.004) 82(3.861) 8.07(0.046) 0.19(0.042) 1729.196(8.956)
mLASSO 0.025(0.006) 61(4.902) 7.92(0.091) 0.56(0.082) 0.79(0.008)
mENET 0.078(0.009) 39(4.902) 8.45(0.115) 1.05(0.119) 0.765(0.006)
Table 5: Simulation study of ρe = 0.2 based on 100 replications.
Method FDR (s.e) TRUE% (s.e) SIZE (s.e) HAM (s.e) TIME (s.e)
p = 200, ρx = 0.2
Proposed 0(0) 100(0) 8(0) 0(0) 7.067(0.048)
Brown 0.003(0.002) 97(1.714) 8.03(0.017) 0.03(0.017) 303.72(0.817)
mLASSO 0(0) 100(0) 8(0) 0(0) 0.463(0.003)
mENET 0(0) 100(0) 8(0) 0(0) 0.457(0.004)
p = 200, ρx = 0.8
Proposed 0(0) 99(1) 7.99(0.01) 0.01(0.01) 7.799(0.072)
Brown 0.015(0.004) 84(3.685) 8.1(0.046) 0.18(0.044) 286.251(1.186)
mLASSO 0.031(0.006) 67(4.726) 8.13(0.072) 0.45(0.073) 0.544(0.004)
mENET 0.062(0.009) 51(5.024) 8.42(0.102) 0.78(0.096) 0.529(0.004)
p = 1000, ρx = 0.2
Proposed 0(0) 100(0) 8(0) 0(0) 14.323(0.101)
Brown 0.011(0.004) 91(2.876) 8.1(0.033) 0.1(0.033) 1751.256(5.606)
mLASSO 0.002(0.002) 98(1.407) 8.02(0.014) 0.02(0.014) 0.676(0.006)
mENET 0.002(0.002) 98(1.407) 8.02(0.014) 0.02(0.014) 0.645(0.004)
p = 1000, ρx = 0.8
Proposed 0(0) 93(2.564) 7.93(0.026) 0.07(0.026) 14.707(0.122)
Brown 0.017(0.004) 80(4.02) 8.08(0.051) 0.22(0.046) 1750.775(8.792)
mLASSO 0.028(0.007) 61(4.902) 7.94(0.09) 0.6(0.098) 0.776(0.006)
mENET 0.081(0.01) 38(4.878) 8.49(0.123) 1.09(0.121) 0.754(0.006)
Table 6: Simulation study of ρe = 0.5 based on 100 replications.
Method FDR (s.e) TRUE% (s.e) SIZE (s.e) HAM (s.e) TIME (s.e)
p = 200, ρx = 0.2
Proposed 0(0) 100(0) 8(0) 0(0) 6.994(0.048)
Brown 0.002(0.002) 98(1.407) 8.02(0.014) 0.02(0.014) 299.103(0.687)
mLASSO 0(0) 100(0) 8(0) 0(0) 0.452(0.003)
mENET 0(0) 100(0) 8(0) 0(0) 0.444(0.004)
p = 200, ρx = 0.8
Proposed 0(0) 96(1.969) 7.96(0.02) 0.04(0.02) 7.606(0.064)
Brown 0.011(0.003) 87(3.38) 8.06(0.04) 0.14(0.038) 284.063(1.011)
mLASSO 0.037(0.008) 67(4.726) 8.19(0.098) 0.55(0.101) 0.549(0.005)
mENET 0.07(0.009) 51(5.024) 8.51(0.107) 0.85(0.106) 0.529(0.004)
p = 1000, ρx = 0.2
Proposed 0(0) 100(0) 8(0) 0(0) 14.131(0.1)
Brown 0.009(0.003) 93(2.564) 8.08(0.031) 0.08(0.031) 1722.443(4.872)
mLASSO 0.002(0.002) 98(1.407) 8.02(0.014) 0.02(0.014) 0.663(0.006)
mENET 0.001(0.001) 99(1) 8.01(0.01) 0.01(0.01) 0.641(0.004)
p = 1000, ρx = 0.8
Proposed 0(0) 93(2.564) 7.93(0.026) 0.07(0.026) 14.804(0.12)
Brown 0.011(0.003) 85(3.589) 8.04(0.042) 0.16(0.039) 1699.55(7.121)
mLASSO 0.027(0.007) 62(4.878) 7.89(0.099) 0.63(0.098) 0.763(0.006)
mENET 0.084(0.01) 37(4.852) 8.5(0.128) 1.14(0.127) 0.749(0.006)
Real data analysis
We apply the proposed method to Breast Invasive Carcinoma (BRCA) data
generated by The Cancer Genome Atlas (TCGA) Research Network
http://cancergenome.nih.gov.
The data set contains 17,814 gene expression measurements (recorded on
the log scale) of 526 patients with primary solid tumors.
BRCA1 and BRCA2 are well-known genes that account for hereditary
breast cancer (Yang, Lippman, 1999).
Real data analysis
Goal: determine the best-fitting model for the relationship between
BRCA1 & BRCA2 (multiple response variables) and the other genes
(predictors):
[y_BRCA1, y_BRCA2] = X[β1, β2] + [ε1, ε2].
The proposed method is compared with
• Brown et al. (1998),
• mLASSO and mENET,
• the method proposed in Chapter 2 (Ch.2 method),
• LASSO and ENET with EBIC (Chen, Chen, 2008).
We compute AIC, AICc, BIC, CAIC, and the mean squared prediction error
(MSPE) by Monte Carlo cross-validation with 70% training and 30%
testing splits over 500 replications.
Real Data Analysis
Table 7: Model comparison by OLS using Monte Carlo cross-validation.
Method AIC AICc BIC CAIC MSPE
Proposed 368.77 381.15 615.15 678.15 0.692
Brown 917.16 918.57 999.29 1020.29 0.989
Ch.2 method 540.66 548.08 732.28 781.28 0.770
mLASSO-AIC 547.52 607.62 1067.66 1200.66 0.808
mLASSO-AICc 547.52 607.62 1067.66 1200.66 0.808
mLASSO-BIC 703.04 717.08 965.06 1032.06 0.864
mLASSO-CAIC 703.04 717.08 965.06 1032.06 0.864
mENET-AIC 562.47 618.67 1066.97 1195.97 0.810
mENET-AICc 562.47 618.67 1066.97 1195.97 0.810
mENET-BIC 705.52 721.34 983.18 1054.18 0.866
mENET-CAIC 705.52 721.34 983.18 1054.18 0.866
LASSO-EBIC 901.06 902.23 975.37 994.37 0.967
ENET-EBIC 902.36 903.77 984.49 1005.49 0.968
[Bar chart omitted: for each method (Proposed, Brown, Ch.2 method, mLASSO-AIC/AICc/BIC/CAIC, mENET-AIC/AICc/BIC/CAIC, LASSO-EBIC, ENET-EBIC), the number of selected genes and the number of significant coefficients (p < 0.05).]
Figure 5: The number of selected genes and number of significant coefficients.
Chapter 4
An approximate Bayesian approach to fast
high-dimensional variable selection
Model Setting
Consider a regression model:
• y = (y1, y2, . . . , yn)ᵀ is the response vector, e.g. binary, count, or continuous data.
• X is an n × p predictor matrix with p ≫ n, so the high-dimensional problem arises.
• θ = (θ1, θ2, . . . , θp)ᵀ is a sparse coefficient vector.
To express the sparsity of θ, we introduce a latent index set γ = {j : θj ≠ 0},
so γ can be treated as a selected model.
Bayesian model setup and selection
Given the model γ, we define
• f(y|θγ, γ): the likelihood function,
• π(θγ|γ): the prior of θγ,
• π(γ): the prior of model γ.
By Bayes' theorem,
π(γ|y) ∝ m(y|γ)π(γ),
where m(y|γ) = ∫ f(y|θγ, γ)π(θγ|γ)dθγ is the marginal likelihood function.
Goal: find the best model γ* = arg maxγ π(γ|y).
Problem: m(y|γ) = ∫ f(y|θγ, γ)π(θγ|γ)dθγ may not have a closed form.
Laplace approximation
With the Laplace approximation,
m(y|γ) ≈ (2π n^{-1})^{|γ|/2} |V̂γ|^{1/2} f(y|θ̂γ, γ) π(θ̂γ|γ)
       ∝ (2π n^{-1})^{|γ|/2} f(y|θ̂γ, γ) π(θ̂γ|γ),   (9)
where
θ̂γ = arg max_{θγ} f(y|θγ, γ) π(θγ|γ)   (10)
is the posterior mode and |V̂γ| = Op(1) under some regularity conditions.
The posterior mode θ̂γ must be estimated first in order to evaluate (9) in the Laplace
approximation.
Objective
Therefore, the Bayesian model selection criterion becomes
π(γ|y) ∝ (2π n^{-1})^{|γ|/2} f(y|θ̂γ, γ) π(θ̂γ|γ) π(γ) ≡ S(γ),   (11)
where f(y|θ̂γ, γ) is the likelihood and π(θ̂γ|γ) is the prior.
Objective: find the best model γ* = arg maxγ S(γ).
Hybrid search
1. Set γ̂ as the current model and γ̂(0) = γ̂.
2. Repeat for t = 0, 1, 2, . . . , # Deterministic search
   i). Define nbd(γ̂(t));
   ii). Compute S(γ)I{γ ∈ nbd(γ̂(t))}; # sequentially in a for-loop
   iii). Compute γ̂(t+1) = arg max_{γ∈nbd(γ̂(t))} S(γ);
   iv). If S(γ̂(t+1)) > S(γ̂), then update γ̂ ← γ̂(t+1);
        else update γ̂ ← γ̂(t) and break the loop.
3. Set γ̂(0) = γ̂. # Stochastic search
4. Repeat for t = 0, 1, 2, . . . , T, # small T
   i). Define nbd(γ̂(t));
   ii). Compute S(γ)I{γ ∈ nbd(γ̂(t))}; # sequentially in a for-loop
   iii). Compute γ* = arg max_{γ∈nbd(γ̂(t))} S(γ);
   iv). If S(γ*) > S(γ̂), then update γ̂ ← γ* and go back to Step 2 immediately;
        else sample γ̂(t+1) with probability proportional to S(γ) for γ ∈ nbd(γ̂(t)).
5. Return γ̂.
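The deterministic phase (Step 2) and stochastic phase (Step 4) can be sketched generically; the code below is our own minimal illustration, not the dissertation's implementation. It takes an arbitrary score function over models represented as frozensets; since the algorithm assumes S(γ) > 0 for sampling, negative scores are shifted before drawing.

```python
import random

def neighbors(gamma, p):
    """Addition and deletion neighbors of model gamma over predictors 0..p-1."""
    add = [gamma | {j} for j in range(p) if j not in gamma]
    delete = [gamma - {j} for j in gamma]
    return [frozenset(g) for g in add + delete]

def hybrid_search(score, p, gamma0=frozenset(), T=20, seed=1):
    """Hybrid search: deterministic hill climbing plus T stochastic steps;
    returns the best model found (maximizing `score`)."""
    rng = random.Random(seed)
    best = frozenset(gamma0)
    while True:
        cur = best
        while True:                      # deterministic phase (Step 2)
            cand = max(neighbors(cur, p), key=score)
            if score(cand) > score(best):
                best = cur = cand
            else:
                break
        improved = False
        cur = best
        for _ in range(T):               # stochastic phase (Step 4)
            nbd = neighbors(cur, p)
            cand = max(nbd, key=score)
            if score(cand) > score(best):
                best = cand
                improved = True          # jump back to the deterministic phase
                break
            w = [score(g) for g in nbd]  # sample proportional to (shifted) score
            shift = min(w)
            cur = rng.choices(nbd, weights=[x - shift + 1e-9 for x in w])[0]
        if not improved:
            return best
```

For example, with the toy score S(γ) = −|γ Δ {0, 2}| (negative symmetric-difference size), the search recovers {0, 2}.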
Simultaneously evaluate all models in nbd+(γ̂)
θ̂γ̂ is the posterior mode of the current model γ̂.
Consider a model in the addition neighbor: γ̂ ∪ {j} ∈ nbd+(γ̂).
θ_{γ̂∪{j}} = (θγ̂, θj) ⇒ (θ̂γ̂, θj) ⇒ (θ̂γ̂, θ̃j), where
θ̃j = arg max_{θj} f(y | (θ̂γ̂, θj), γ̂ ∪ {j}) × π((θ̂γ̂, θj) | γ̂ ∪ {j})   (12)
(likelihood × prior).
Hence, (θ̂γ̂, θ̃j) is the estimate for θ_{γ̂∪{j}}, and
S̃(γ̂ ∪ {j}) = (2π n^{-1})^{(|γ̂|+1)/2} f(y | θ̂γ̂, θ̃j, γ̂ ∪ {j}) π(θ̂γ̂, θ̃j | γ̂ ∪ {j}) π(γ̂ ∪ {j}).
Simultaneously evaluate all models in nbd+(γ̂)
Three steps to estimate all θ̃j's simultaneously:
1. Given θ̂γ̂, take the logarithm of the following function (likelihood × prior):
   ℓ(θj) = log[ f(y | (θ̂γ̂, θj), γ̂ ∪ {j}) × π((θ̂γ̂, θj) | γ̂ ∪ {j}) ].
2. Take the first derivative of ℓ(θj) with respect to each θj:
   uj(θj) = ∂ℓ(θj)/∂θj.
Thus, maximizing ℓ(θj) is equivalent to finding the root of uj(θj) = 0.
Note: uj(θj) contains only one unknown parameter, θj.
Simultaneously evaluate all models in nbd+(γ̂)
Without loss of generality, assume the current model is γ̂ = {1, 2, . . . , k}.
For j = k + 1, . . . , p, all θ̃j's can be obtained by finding the roots of the system
of equations
u_{k+1}(θ_{k+1}) = 0, u_{k+2}(θ_{k+2}) = 0, . . . , u_p(θ_p) = 0.   (13)
3. Solve (13) using Newton's method by iterating
θ(t+1) = θ(t) − Ju(θ(t))^{-1} u(θ(t)),   (14)
where θ(t) = (θ(t)_{k+1}, θ(t)_{k+2}, . . . , θ(t)_p) and Ju(θ(t)) = [∂uj(θ(t)_j)/∂θ(t)_l]_{j,l∈{k+1,...,p}}.
Note that Ju(θ(t)) is a diagonal matrix ⇒ Ju(θ(t))^{-1} is easy to compute.
Plugging each (θ̂γ̂, θ̃j) into S(γ), we obtain S̃(γ) for every model in the addition
neighbor.
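Because each score equation uj involves only its own θj, the Jacobian in (14) is diagonal and the Newton step reduces to an elementwise division, so all p − k one-dimensional roots are found in a single vectorized iteration. A minimal sketch (our own illustration, with a made-up test problem, not the dissertation's code):

```python
import numpy as np

def vectorized_newton(u, du, theta0, tol=1e-10, max_iter=100):
    """Solve u_j(theta_j) = 0 for all coordinates j at once.
    Since each u_j depends only on theta_j, J_u is diagonal, and the
    Newton update theta - J_u^{-1} u(theta) is an elementwise division."""
    theta = np.array(theta0, dtype=float)
    for _ in range(max_iter):
        step = u(theta) / du(theta)   # diagonal-Jacobian Newton step
        theta -= step
        if np.max(np.abs(step)) < tol:
            break
    return theta
```

For instance, the system θj³ = cj for several j is solved simultaneously by passing u(θ) = θ³ − c and u′(θ) = 3θ².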
Theorem
Theorem. Let S(γ) be defined by (11), where θ̂γ is the posterior mode, and let S̃(γ) be
an approximate estimate of S(γ). Let γ1 and γ2 be two different models. If S̃(γ1) > S(γ2),
then S(γ1) > S(γ2).
Simultaneously evaluate all models in nbd−(γ̂)
If γ̂ = {1, 2, 3}, then nbd−(γ̂) = {{1, 2}, {1, 3}, {2, 3}}.
To estimate θ{1,2}, θ{1,3} and θ{2,3} simultaneously:
• Decompose θ̂γ̂ = (θ̂1, θ̂2, θ̂3).
• Drop one component at a time:
  θ̃{1,2} = (θ̂1, θ̂2),  θ̃{1,3} = (θ̂1, θ̂3),  θ̃{2,3} = (θ̂2, θ̂3).
Plug θ̃{1,2}, θ̃{1,3}, θ̃{2,3} into S(γ) to obtain its approximate
estimate S̃(γ) for each model in the deletion neighbor.
Apply S̃(γ) to hybrid search algorithm–Part I
1. Set γ̂ as the current model and γ̂(0) = γ̂.
2. Repeat for t = 0, 1, 2, . . . , # deterministic search
   i). Simultaneously compute S̃(γ) over the addition and deletion neighbors.
   ii). Select the best models γ̂+ and γ̂− in the addition and deletion neighbors,
        respectively.
   iii). Compute γ̂(t+1) = arg max_{γ∈{γ̂+, γ̂−}} S(γ).
   iv). If S(γ̂(t+1)) > S(γ̂), then γ̂ ← γ̂(t+1); else γ̂ ← γ̂(t) and break the loop.
3. Return γ̂.
Apply S̃(γ) to hybrid search algorithm–Part II
4. Set γ̂(0) = γ̂ for the stochastic search.
5. Repeat for t = 0, 1, 2, . . . , T, # stochastic search
   i). Simultaneously compute S̃(γ) over the addition and deletion neighbors.
   ii). Sample a model γ̂+ with probability proportional to S̃(γ) for γ ∈ nbd+(γ̂(t));
   iii). Sample a model γ̂− with probability proportional to S̃(γ) for γ ∈ nbd−(γ̂(t));
   iv). Compute γ̂(t+1) = arg max_{γ∈{γ̂+, γ̂−}} S(γ);
   v). If S(γ̂(t+1)) > S(γ̂), then γ̂ ← γ̂(t+1) and jump to Step 2 immediately;
       else sample a model γ̂(t+1) with probability proportional to S(γ) for
       γ ∈ {γ̂+, γ̂−}.
6. Return γ̂.
Simulation Study
Given model γ, we have a general framework of the Bayesian model structure as
follows:
E(yi | x_{iγ}ᵀθγ) = g^{-1}(x_{iγ}ᵀθγ),
π(θγ | γ) ∼ N(0, λ I_{|γ|}),
π(γ) ∝ 1 / C(p, |γ|),
where x_{iγ} is the sub-vector of xi indexed by γ, g(·) is the link function, and
C(p, |γ|) is the binomial coefficient.
Simulation Study–Model setting in Case I
In Case I, n = 300, p = 1000, and data sets are generated as follows:
• Gaussian: yi ∼ i.i.d. N(xiᵀβγ, σ²), where γ = {1, 3, 5, 7, 9},
  βγ = (1, −1, 1, −1, 1)ᵀ, σ² = 3, and xi ∼ i.i.d. N(0p, Σx) with Σx = (Σij)p×p
  and Σij = 0.6^{|i−j|}.
• Binary: yi ∼ i.i.d. Bernoulli(pi) with pi = g2(xiᵀθγ), where
  g2(x) = 1/(1 + exp(−x)), γ = {1, 3, 5}, θγ = (0.8, 1, −1)ᵀ, and
  xi ∼ i.i.d. N(0p, Σx) with Σx = (Σij)p×p and Σij = 0.6^{|i−j|}.
• Count: yi ∼ i.i.d. Poisson(µi) with µi = g3(xiᵀθγ), where g3(x) = exp(x),
  γ = {1, 3, 5}, θγ = (0.8, 1, −1)ᵀ, xi = Φ(zi) − 0.5·1p with Φ(·) the CDF of the
  standard normal distribution, and zi ∼ i.i.d. N(0p, Σz) with Σz = (Σij)p×p and
  Σij = 0.6^{|i−j|}.
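The Gaussian setting of Case I can be reproduced in a few lines of numpy. This is a sketch of the data-generating process only (the function name and defaults are ours), typically run with smaller (n, p) than in the study when experimenting:

```python
import numpy as np

def simulate_gaussian_case(n=300, p=1000, seed=0):
    """Case I Gaussian data: x_i ~ N(0, Sigma) with Sigma_ij = 0.6**|i-j|,
    true model gamma = {1,3,5,7,9} (1-based), beta = (1,-1,1,-1,1), sigma^2 = 3."""
    rng = np.random.default_rng(seed)
    idx = np.arange(p)
    Sigma = 0.6 ** np.abs(idx[:, None] - idx[None, :])   # AR(1)-type covariance
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n, method="cholesky")
    cols = np.array([1, 3, 5, 7, 9]) - 1                 # 0-based column indices
    beta = np.array([1.0, -1.0, 1.0, -1.0, 1.0])
    y = X[:, cols] @ beta + rng.normal(scale=np.sqrt(3.0), size=n)
    return X, y, cols
```

The binary and count settings follow the same pattern with the linear predictor pushed through the logistic and exponential links, respectively.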
Simulation Study–Model setting in Case II
In Case II, n = 600, p = 2000, and data sets are generated as follows:
• Gaussian: yi ∼ i.i.d. N(xiᵀβγ, σ²), where γ = {1, 3, 5, 7, 9},
  βγ = (1, −1, 1, −1, 1)ᵀ, σ² = 6, and xi ∼ i.i.d. N(0p, Σx) with Σx = (Σij)p×p
  and Σij = 0.6^{|i−j|}.
• Binary: yi ∼ i.i.d. Bernoulli(pi) with pi = g2(xiᵀθγ), where
  g2(x) = 1/(1 + exp(−x)), γ = {1, 3, 5, 7, 9}, θγ = (1, 0.9, −0.9, 0.9, −0.9)ᵀ, and
  xi ∼ i.i.d. N(0p, Σx) with Σx = (Σij)p×p and Σij = 0.6^{|i−j|}.
• Count: yi ∼ i.i.d. Poisson(µi) with µi = g3(xiᵀθγ), where g3(x) = exp(x),
  γ = {1, 3, 5, 7, 9}, θγ = (0.8, 0.8, −0.8, 0.8, −0.8)ᵀ, xi = Φ(zi) − 0.5·1p with
  Φ(·) the CDF of the standard normal distribution, and zi ∼ i.i.d. N(0p, Σz) with
  Σz = (Σij)p×p and Σij = 0.6^{|i−j|}.
Simulation study–Performance of computation and model
selection
With these model settings, we run
• Simulation study I to investigate computational performance;
• Simulation study II to investigate model selection performance.
Table 8: The comparison of best model search algorithms
Method Speed Maximum Computation Function
Deterministic Very fast Local “for-loop” S(γ)
Stochastic Slow Global “for-loop” S(γ)
Hybrid Fast Global “for-loop” S(γ)
Proposed Very fast Global Simultaneous S̃(γ)
Simulation study I–Speed
[Plots omitted: computation time (minutes) in Cases I and II for the Gaussian, logistic, and Poisson models, comparing the Deterministic, Stochastic, Hybrid, and Proposed algorithms.]
Figure 6: Comparison of computational speed of the four algorithms based on 100
replications.
Simulation study II–Case I
Table 9: Simulation study for Case I (n, p) = (300, 1000) based on 500 replications.
Notation: s.e. — standard error.
Method FPR% (s.e) FNR% (s.e) TRUE% (s.e) SIZE (s.e) HAM (s.e)
Gaussian
Deterministic 0.001(0.001) 10.72(1.128) 82.6(1.697) 4.478(0.057) 0.550(0.057)
Stochastic 0.001(0) 0(0) 98.8(0.487) 5.012(0.005) 0.012(0.005)
Hybrid 0.001(0) 0(0) 98.8(0.487) 5.012(0.005) 0.012(0.005)
Proposed 0.001(0) 0.24(0.138) 98.6(0.526) 4.998(0.007) 0.022(0.009)
Logistic (Binary)
Deterministic 0.001(0) 30.933(1.470) 52.0(2.237) 2.080(0.044) 0.936(0.044)
Stochastic 0(0) 4.733(0.734) 91.4(1.255) 2.862(0.022) 0.146(0.022)
Hybrid 0(0) 4.733(0.734) 91.4(1.255) 2.862(0.022) 0.146(0.022)
Proposed 0.001(0) 4.800(0.736) 91.2(1.268) 2.862(0.022) 0.150(0.022)
Poisson (Count)
Deterministic 0(0) 20.333(1.346) 67.8(2.092) 2.390(0.040) 0.610(0.040)
Stochastic 0(0) 4.867(0.726) 91.0(1.281) 2.854(0.022) 0.146(0.022)
Hybrid 0(0) 4.867(0.726) 91.0(1.281) 2.854(0.022) 0.146(0.022)
Proposed 0(0) 4.867(0.726) 91.0(1.281) 2.854(0.022) 0.146(0.022)
Simulation study II–Case II
Table 10: Simulation study for Case II (n, p) = (600, 2000) based on 500 replications.
Notation: s.e. — standard error.
Method FPR% (s.e) FNR% (s.e) TRUE% (s.e) SIZE (s.e) HAM (s.e)
Gaussian
Deterministic 0.001(0) 2.2(0.485) 94.8(0.994) 4.900(0.024) 0.120(0.025)
Stochastic 0.001(0) 0(0) 99.0(0.445) 5.010(0.004) 0.010(0.004)
Hybrid 0.001(0) 0(0) 98.6(0.526) 5.016(0.006) 0.016(0.006)
Proposed 0.001(0) 0(0) 99.0(0.445) 5.010(0.004) 0.010(0.004)
Logistic (Binary)
Deterministic 0(0) 11.360(1.031) 78.2(1.848) 4.438(0.051) 0.574(0.052)
Stochastic 0(0) 0.800(0.287) 97.8(0.657) 4.964(0.015) 0.044(0.015)
Hybrid 0(0) 0.640(0.270) 98.4(0.562) 4.972(0.014) 0.036(0.014)
Proposed 0(0) 0.480(0.211) 98.4(0.562) 4.980(0.011) 0.028(0.011)
Poisson (Count)
Deterministic 0(0) 13.520(1.028) 72.0(2.010) 4.326(0.051) 0.678(0.051)
Stochastic 0(0) 0.120(0.069) 99.4(0.346) 4.994(0.003) 0.006(0.003)
Hybrid 0(0) 0.120(0.069) 99.4(0.346) 4.994(0.003) 0.006(0.003)
Proposed 0(0) 0.120(0.069) 99.4(0.346) 4.994(0.003) 0.006(0.003)
Real example–Datasets and Comparison
Three real examples and their information are as follows:
Table 11: Three real data examples
Data type Data name (n, p) Data source
Gaussian OV (563, 2000) TCGA
Binary RNA-seq (801, 2059) UCI
Count Communities & Crime (71, 103) UCI
To measure model performance, we consider the following model selection criteria:
• i) BIC,
• ii) EBIC,
• iii) modified BIC (MBIC),
• iv) corrected RIC (RICc), and
• v) modified RIC (MRIC).
We compare the proposed method with LASSO, ENET, MCP and SCAD.
To make a fair comparison, tuning parameters are determined by minimal
cross-validation (CV) errors and minimal EBIC (Chen, Chen, 2008).
Real example: OV data (Gaussian)
Table 12: Model selection performance for Gaussian data.
BIC EBIC MBIC RICc MRIC NUM
Proposed 1074.7382 1141.1623 1107.2998 1139.3635 1119.0809 5
EBIC-LASSO 1095.1960 1167.4999 1127.7668 1166.8092 1148.5235 4
EBIC-ENET 1095.1960 1167.4999 1127.7668 1166.8092 1148.5235 4
EBIC-MCP 1095.1960 1167.4999 1127.7668 1166.8092 1148.5235 4
EBIC-SCAD 1095.1960 1167.4999 1127.7668 1166.8092 1148.5235 4
CV-LASSO 1216.9863 2907.2430 2397.6789 3812.9631 3150.1096 145
CV-ENET 1253.3920 2962.9995 2450.3700 3885.1753 3213.1790 147
CV-MCP 1084.6155 1943.9425 1613.8915 2248.3292 1951.1880 65
CV-SCAD 1146.9390 2084.5993 1733.2139 2435.9757 2106.8347 72
Real example: RNA-seq data (binary)
Table 13: Model selection performance for binary data.
BIC EBIC MBIC RICc MRIC NUM
Proposed 38.8263 93.5041 66.4278 89.3793 73.1226 3
EBIC-LASSO 62.4658 104.6621 83.1682 100.3838 88.1909 2
EBIC-ENET 55.5049 110.1866 83.1081 106.0623 89.8051 3
EBIC-MCP 47.4836 102.1654 75.0868 98.0410 81.7839 3
EBIC-SCAD 62.4658 104.6621 83.1682 100.3838 88.1909 2
CV-LASSO 213.9476 538.6971 434.7732 618.4070 488.3495 31
CV-ENET 501.4396 1139.5007 1018.9996 1449.3914 1144.5692 74
CV-MCP 73.5445 206.3565 149.4533 212.5774 167.8701 10
CV-SCAD 86.9162 240.1280 176.6266 251.2278 198.3920 12
Real example: Communities and Crime data (count)
Table 14: Model selection performance for count data.
BIC EBIC MBIC RICc MRIC NUM
Proposed 185.4760 216.1584 194.6094 217.8657 205.5805 3
EBIC-LASSO 206.4090 242.9842 217.7814 246.7787 231.4429 4
EBIC-ENET 206.0983 242.6735 217.4706 246.4679 231.1321 4
EBIC-MCP 199.0646 235.6399 210.4370 239.4343 224.0985 4
EBIC-SCAD 197.6563 228.2602 206.7542 229.9521 217.6835 3
CV-LASSO 199.2896 256.5730 219.8397 272.1664 244.5245 8
CV-ENET 213.5552 279.6320 238.6720 302.6268 268.8423 10
CV-MCP 199.0646 235.7384 210.4814 239.5518 224.1951 4
CV-SCAD 197.6563 228.3387 206.7897 230.0460 217.7608 3
Summary
In Chapter 2, we develop a fast Bayesian approach to best subsets selection
under the linear regression setting.
In Chapter 3, we extend the method in Chapter 2 to multivariate data.
In Chapter 4, we further extend Chapter 2 to various types of data.
In future work, we plan to extend the approach to multivariate data with various data types.
References I
Akaike, H. (1998). Information theory and an extension of the maximum likelihood principle. In Selected Papers of Hirotugu Akaike, 199-213. Springer, New York, NY.
Bedrick, E. J. (1994). Model selection for multivariate regression in small samples. Biometrics, 226-231.
Bozdogan, H. (1987). Model selection and Akaike's information criterion (AIC): The general theory and its analytical extensions. Psychometrika, 52(3), 345-370.
Brown, P. J., Vannucci, M., and Fearn, T. (1998). Multivariate Bayesian variable selection and prediction. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 60(3), 627-641.
Chen, J. and Chen, Z. (2008). Extended Bayesian information criteria for model selection with large model spaces. Biometrika, 95(3), 759-771.
Findlay, G. M., Daza, R. M., Martin, B., Zhang, M. D., Leith, A. P., Gasperini, M., Janizek, J. D., Huang, X., Starita, L. M., and Shendure, J. (2018). Accurate classification of BRCA1 variants with saturation genome editing. Nature, 562(7726), 217-222.
Hans, C., Dobra, A., and West, M. (2007). Shotgun stochastic search for "large p" regression. Journal of the American Statistical Association, 102(478), 507-516.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461-464.
Yang, X. and Lippman, M. E. (1999). BRCA1 and BRCA2 in breast cancer. Breast Cancer Research and Treatment, 54(1), 1-10.
THANK YOU
Major advisor
- Dr. Gyuhyeong Goh
Committee members:
- Dr. Weixing Song
- Dr. Wei-Wen Hsu
- Dr. Jisang Yu
- Dr. Yoon-Jin Lee

  • 1. Bayesian solutions to high-dimensional challenges using hybrid search Shiqiang Jin Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members: Dr. Weixing Song (Statistics) Dr. Wei-Wen Hsu (Statistics) Dr. Jisang Yu (Agricultural Economics) Outside chairperson: Dr. Yoon-Jin Lee (Economics) January 25, 2021 Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 1 / 75
  • 2. Outline Chapter 1: Introduction Chapter 2: Bayesian selection of best subsets via hybrid search Chapter 3: Fast Bayesian best subset selection for high-dimensional multivariate regression Chapter 4: An approximate Bayesian approach to fast high-dimensional variable selection Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 2 / 75
  • 3. Chapter 1 Introduction Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 3 / 75
  • 4. Introduction Figure 1: Data storage equipment 1 1https://www.greenamerica.org/amazon-build-cleaner-and-fairer-cloud Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 4 / 75
  • 5. Challenges of high-dimensional data High-dimensional data problem arises when number of predictors (p) is much larger than sample size (n), e.g. p > n. With large p, only a few of variables are related to the response. Best subset selection: evaluate all possible combinations of predictors. I However, it involves non-convex optimization problems that are computationally intractable when p is large. e.g: 240 ≈ 1012 . Bayesian subset regression is an efficient way to explore the non-convex model space because it implements stochastic search based on MCMC computation. I Limitation: extremely heavy computation and slow convergence. Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 5 / 75
  • 6. Challenges of high-dimensional data High-dimensional data problem arises when number of predictors (p) is much larger than sample size (n), e.g. p > n. With large p, only a few of variables are related to the response. Best subset selection: evaluate all possible combinations of predictors. I However, it involves non-convex optimization problems that are computationally intractable when p is large. e.g: 240 ≈ 1012 . Bayesian subset regression is an efficient way to explore the non-convex model space because it implements stochastic search based on MCMC computation. I Limitation: extremely heavy computation and slow convergence. Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 5 / 75
  • 7. Challenges of high-dimensional data High-dimensional data problem arises when number of predictors (p) is much larger than sample size (n), e.g. p > n. With large p, only a few of variables are related to the response. Best subset selection: evaluate all possible combinations of predictors. I However, it involves non-convex optimization problems that are computationally intractable when p is large. e.g: 240 ≈ 1012 . Bayesian subset regression is an efficient way to explore the non-convex model space because it implements stochastic search based on MCMC computation. I Limitation: extremely heavy computation and slow convergence. Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 5 / 75
  • 8. Challenges in parallel computing In the Bayesian literature, many efforts have been put to reduce the computational burden of MCMC. Shotgun stochastic search (Hans et al., 2007) introduced parallel computing within MCMC procedure to reduce the computational burden. A practical issue is that the high-performance machines and programming protocols are not available to individual users and researchers. Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 6 / 75
  • 9. Challenges in multivariate regression One of important issues with high-dimensional data analysis is the number of response variables is multiple, called multivariate data. The multivariate linear regression model (MVRM) is a popular way to connect multiple responses to a common set of predictors. There is an attempt to extend Bayesian stochastic search algorithm to multivariate linear regression setting (Brown et al., 1998). I But Brown et al. (1998) still suffers from computational issues in the presence of high-dimensional data. . Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 7 / 75
  • 10. Objectives In this dissertation, the main focus is to develop innovative Bayesian methods that can identify a best model via a fast global optimization; be quickly implemented in a single CPU core; apply to various types of data (e.g., Gaussian, multivariate, binary, count, and survival data). Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 8 / 75
  • 11. Chapter 2 Bayesian selection of best subsets via hybrid search Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 9 / 75
  • 12. Linear regression model in high-dimensional data Consider a linear regression model y = Xβ + , (1) where y = (y1, . . . , yn)T is a response vector, X = (x1, . . . , xp) ∈ Rn×p is a model matrix, β = (β1, . . . , βp)T is a coefficient vector and ∼ N(0, σ2 In). We assume p n, i.e. High-dimensional data. We assume only a few number of predictors are associated with the response, i.e. β is sparse. Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 10 / 75
  • 13. Reduce model To better explain the sparsity of β, we introduce a latent index set γ ⊂ {1, . . . , p} so that Xγ represents a sub-matrix of X containing xj , j ∈ γ. e.g. γ = {1, 3, 4} ⇒ Xγ = (x1, x3, x4). The full model in (1) can be reduced to y = Xγβγ + . (2) Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 11 / 75
  • 14. Objectives in Chapter 2 Our Goals are to obtain: (i) k most important predictors out of p k candidate models; (ii) a single best model from among 2p candidate models. Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 12 / 75
  • 15. Prior specifications We consider conjugate priors as follows: βγ|σ2 , γ ∼ Normal(0, τσ2 I|γ|), σ2 ∼ Inverse-Gamma(aσ/2, bσ/2), π(γ) ∝ I(|γ| = k), where |γ| is number of elements in γ. Hence, by Bayes theorem, we have the closed form of marginal likelihood. Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 13 / 75
  • 16. Model posterior distribution By Bayes theorem, given model γ, we have π(γ|y) ∝ f (y|γ)π(γ) ∝ (τ−1 ) |γ| 2 |XT γXγ + τ−1I|γ|| 1 2 (yT Hγy + bσ) aσ+n 2 I(|γ| = k) ≡ g(γ)I(|γ| = k), where f (y|γ) is the marginal likelihood function and Hγ = In − Xγ(XT γXγ + τ−1 I|γ|)−1 XT γ. Hence, our Bayesian model selection procedure is simply to find the true model γ that maximizes the probability π(γ|y). For notation simplicity, g(γ) is used as model selection criterion. Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 14 / 75
  • 17. Best subset selection algorithm According to best subset selection algorithm, our goals become: (i) Fixed size: given k, select the best subset model by Mk = argγ max |γ|=k g(γ) from p k candidate models. (ii) Single best model: M = arg maxγ π(γ|y) from 2p candidate models. Non-convex optimization problem arises. Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 15 / 75
  • 18. Deterministic search with a fixed k Given model γ with model size k, define two neighborhood spaces: I addition neighbor N+(γ) = {γ ∪ {j} : j / ∈ γ} and I deletion neighbor N−(γ) = {γ {j} : j ∈ γ}. Note our Goal (i) is to find γ̂ = argγ max|γ̂|=k g(γ). 1. Initialize γ̂ s.t. |γ̂| = k. 2. Repeat # deterministic search:local optimum Update γ̃ ← arg maxγ∈N+(γ̂) g(γ) ; # N+(γ̂) = {γ̂ ∪ {j} : j / ∈ γ̂} Update γ̂ ← arg maxγ∈N−(γ̃) g(γ); # N−(γ̃) = {γ̃ {j} : j ∈ γ̃} until convergence. 3. Return γ̂. Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 16 / 75
  • 19. Hybrid search algorithm 1. γ̂ is obtained from deterministic search. 2. Set γ(0) = γ̂. 3. Repeat for t = 1, . . . , T: #stochastic search:global optimum i) Sample γ∗ with probability proportional to g(γ) for γ ∈ N+(γ̂(t−1) ); ii) Sample γ(t) with probability proportional to g(γ) for γ ∈ N−(γ∗ ); iii) If π(γ̂|y) π(γ(t) |y), then update γ̂ = γ(t) , break the loop, and go to deterministic search. 4. Return γ̂. Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 17 / 75
  • 20. Idea of hybrid search Current State Next Update Figure 2: Hybrid search enables us to achieve the global maximum efficiently. Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 18 / 75
  • 21. Best subset selection with varying k Note that Goal (ii): a single best model from among 2p candidate models. We extend “fixed” k to varying k by assigning a prior on k. Note that the uniform prior, k ∼ Uniform{1, . . . , K}, tends to assign larger probability to a larger subset (see Chen, Chen (2008)). We define π(k) ∝ 1/ p k I(k ≤ K). Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 19 / 75
  • 22. Hybrid best subset search with varying k Bayesian best subset selection can be done by maximizing π(γ, k|y) ∝ g(γ)/ p k (3) over (γ, k). Our algorithm proceeds as follows: 1. Repeat for k = 1, . . . , K: Given k, implement the hybrid search algorithm to obtain best subset model γ̂k . 2. Find the best model γ̂∗ obtained by γ̂∗ = arg max k∈{1,...,K} g(γ̂k )/ p k . (4) Shiqiang Jin (Department of Statistics Kansas State University, Manhattan, KS Major advisor Dr. Gyuhyeong Goh (Statistics) Committee members Bayesian solutions to high-dimensional challenges using hybrid search January 25, 2021 20 / 75
  • 23. Computational burden in g(γ)
We have to compute g(γ) p times in each iteration ⇒ inefficient, since
  g(γ) = (τ^{-1})^{|γ|/2} |X_γ^T X_γ + τ^{-1} I_{|γ|}|^{-1/2} (y^T H_γ y + b_σ)^{-(a_σ+n)/2},
where H_γ = I_n − X_γ (X_γ^T X_γ + τ^{-1} I_{|γ|})^{-1} X_γ^T.
We propose a method that evaluates g(γ) for all models γ ∈ N+(γ̂) (or γ ∈ N−(γ̂)) simultaneously, in a single computation.
  • 24. Define γ̂ ∪ {j} as a model in the addition neighborhood of γ̂; then
  g(γ̂ ∪ {j}) = (τ^{-1})^{|γ̂∪{j}|/2} |X_{γ̂∪{j}}^T X_{γ̂∪{j}} + τ^{-1} I_{|γ̂∪{j}|}|^{-1/2} (y^T H_{γ̂∪{j}} y + b_σ)^{-(a_σ+n)/2},
where H_{γ̂∪{j}} = I_n − X_{γ̂∪{j}} (X_{γ̂∪{j}}^T X_{γ̂∪{j}} + τ^{-1} I_{|γ̂∪{j}|})^{-1} X_{γ̂∪{j}}^T.
This simplifies by the following Lemma 1, Lemma 2, and the Sherman–Morrison formula:
- Lemma 1: H_γ̂ = (I_n + τ X_γ̂ X_γ̂^T)^{-1}.
- Lemma 2: det(I_n + U V^T) = det(I_m + V^T U) for n × m matrices U and V.
- Sherman–Morrison formula: (A + u v^T)^{-1} = A^{-1} − (A^{-1} u v^T A^{-1}) / (1 + v^T A^{-1} u) for A ∈ R^{n×n} and u, v ∈ R^n.
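All three identities are easy to sanity-check numerically. The sketch below verifies each on random matrices (the dimensions and τ are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, tau = 8, 3, 2.0
X = rng.standard_normal((n, k))

# Lemma 1: I_n - X (X'X + tau^{-1} I_k)^{-1} X'  equals  (I_n + tau X X')^{-1}
H = np.eye(n) - X @ np.linalg.solve(X.T @ X + np.eye(k) / tau, X.T)
assert np.allclose(H, np.linalg.inv(np.eye(n) + tau * X @ X.T))

# Lemma 2 (Sylvester's determinant identity): det(I_n + U V') = det(I_m + V' U)
m = 2
U, V = rng.standard_normal((n, m)), rng.standard_normal((n, m))
assert np.allclose(np.linalg.det(np.eye(n) + U @ V.T),
                   np.linalg.det(np.eye(m) + V.T @ U))

# Sherman-Morrison: (A + u v')^{-1} = A^{-1} - A^{-1} u v' A^{-1} / (1 + v' A^{-1} u)
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # invertible w.p. 1
u, v = rng.standard_normal(n), rng.standard_normal(n)
Ainv = np.linalg.inv(A)
sm = Ainv - np.outer(Ainv @ u, v @ Ainv) / (1.0 + v @ Ainv @ u)
assert np.allclose(sm, np.linalg.inv(A + np.outer(u, v)))
```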
  • 25. We have
  g(γ̂ ∪ {j}) ∝ ( y^T H_γ̂ y + b_σ − (x_j^T H_γ̂ y)^2 / (τ^{-1} + x_j^T H_γ̂ x_j) )^{-(a_σ+n)/2} × (τ^{-1} + x_j^T H_γ̂ x_j)^{-1/2}.    (5)
The vector X^T H_γ̂ y contains x_j^T H_γ̂ y for all j's.
The diagonal elements of the matrix X^T H_γ̂ X contain x_j^T H_γ̂ x_j for all j's. Hence
  m+(γ̂) = [ (y^T H_γ̂ y + b_σ) 1_p − (X^T H_γ̂ y)^2 / (τ^{-1} 1_p + diag(X^T H_γ̂ X)) ]^{-(a_σ+n)/2} × [ τ^{-1} 1_p + diag(X^T H_γ̂ X) ]^{-1/2},    (6)
where all operations are elementwise; m+(γ̂) contains g(γ̂ ∪ {j}) for all j ∉ γ̂.
How many matrix inverses and determinants do we need to compute?
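In code, the single-computation trick replaces a loop over the p candidate variables with a few matrix-vector products, using only the one matrix H for the current model. A minimal numerical check (illustrative; n, p and the hyperparameters τ, a_σ, b_σ are arbitrary) that the vectorized scores of (6) match the one-at-a-time scores of (5) on the log scale:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 30
tau, a_sig, b_sig = 10.0, 1.0, 1.0          # assumed hyperparameter values
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

gamma = [0, 1, 2]                            # current model gamma-hat
Xg = X[:, gamma]
# the only inverse we need: H for the current model
H = np.eye(n) - Xg @ np.linalg.solve(Xg.T @ Xg + np.eye(len(gamma)) / tau, Xg.T)

def log_g_add(j):
    """log of formula (5) for a single candidate j (up to a constant)."""
    d = 1.0 / tau + X[:, j] @ H @ X[:, j]
    q = y @ H @ y + b_sig - (X[:, j] @ H @ y) ** 2 / d
    return -(a_sig + n) / 2 * np.log(q) - 0.5 * np.log(d)

# vectorized formula (6): scores for all p candidate additions at once
Hy, HX = H @ y, H @ X
d = 1.0 / tau + np.einsum('ij,ij->j', X, HX)   # tau^{-1} 1_p + diag(X'HX)
q = y @ Hy + b_sig - (X.T @ Hy) ** 2 / d
log_m_plus = -(a_sig + n) / 2 * np.log(q) - 0.5 * np.log(d)

looped = np.array([log_g_add(j) for j in range(p)])
assert np.allclose(looped, log_m_plus)          # identical scores, one pass
```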
  • 29. Simultaneous computation
A similar formula holds for the deletion scores g(γ̃ \ {j}):
  m−(γ̃) = [ (y^T H_γ̃ y + b_σ) 1_p + (X^T H_γ̃ y)^2 / (τ^{-1} 1_p − diag(X^T H_γ̃ X)) ]^{-(a_σ+n)/2} × [ τ^{-1} 1_p − diag(X^T H_γ̃ X) ]^{-1/2},
again elementwise; m−(γ̃) contains g(γ̃ \ {j}) for all j ∈ γ̃.
Applying m+(γ̂) and m−(γ̃) to the hybrid search, we can significantly boost the computational speed.
  • 30. Simulation study–Data generation
For given n = 100, we generate the data from
  y_i ~ind~ Normal( Σ_{j=1}^p β_j x_ij , 1 ),
where
- (x_i1, . . . , x_ip)^T ~iid~ Normal(0_p, Σ) with Σ = (Σ_ij)_{p×p} and Σ_ij = ρ^{|i−j|},
- β_j ~iid~ Uniform{−1, −2, 1, 2} if j ∈ γ and β_j = 0 if j ∉ γ,
- γ is an index set of size 4 randomly selected from {1, 2, . . . , p},
- we consider four scenarios for p and ρ: (i) p = 200, ρ = 0.1; (ii) p = 200, ρ = 0.9; (iii) p = 1000, ρ = 0.1; (iv) p = 1000, ρ = 0.9.
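This design is straightforward to reproduce. A sketch of one replication (illustrative; defaults correspond to scenario (i), and the seed is arbitrary):

```python
import numpy as np

def generate_data(n=100, p=200, rho=0.1, seed=0):
    """One replication of the simulation design: AR(1)-correlated
    predictors and 4 active coefficients drawn from {-2, -1, 1, 2}."""
    rng = np.random.default_rng(seed)
    # Sigma_ij = rho^{|i-j|}
    Sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    gamma = rng.choice(p, size=4, replace=False)          # true support
    beta = np.zeros(p)
    beta[gamma] = rng.choice([-1.0, -2.0, 1.0, 2.0], size=4)
    y = rng.normal(X @ beta, 1.0)                          # unit error variance
    return X, y, beta, np.sort(gamma)

X, y, beta, gamma = generate_data()
```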
  • 31. Simulation study-Results
Table 1: 2,000 replications; FDR (false discovery rate), TRUE% (percentage of the true model detected), SIZE (selected model size), HAM (Hamming distance).
Scenario          Method    FDR (s.e.)     TRUE% (s.e.)    SIZE (s.e.)    HAM (s.e.)
p = 200, ρ = 0.1  Proposed  0.006 (0.001)  96.900 (0.388)  4.032 (0.004)  0.032 (0.004)
                  SCAD      0.034 (0.002)  85.200 (0.794)  4.188 (0.011)  0.188 (0.011)
                  MCP       0.035 (0.002)  84.750 (0.804)  4.191 (0.011)  0.191 (0.011)
                  ENET      0.016 (0.001)  92.700 (0.582)  4.087 (0.007)  0.087 (0.007)
                  LASSO     0.020 (0.002)  91.350 (0.629)  4.109 (0.009)  0.109 (0.009)
p = 200, ρ = 0.9  Proposed  0.023 (0.002)  88.750 (0.707)  3.985 (0.006)  0.203 (0.014)
                  SCAD      0.059 (0.003)  74.150 (0.979)  4.107 (0.015)  0.480 (0.022)
                  MCP       0.137 (0.004)  55.400 (1.112)  4.264 (0.020)  1.098 (0.034)
                  ENET      0.501 (0.004)  0.300 (0.122)   7.716 (0.072)  5.018 (0.052)
                  LASSO     0.276 (0.004)  15.550 (0.811)  5.308 (0.033)  2.038 (0.034)
  • 32. Simulation study-Results
Table 2: 2,000 replications; FDR (false discovery rate), TRUE% (percentage of the true model detected), SIZE (selected model size), HAM (Hamming distance).
Scenario           Method    FDR (s.e.)     TRUE% (s.e.)    SIZE (s.e.)    HAM (s.e.)
p = 1000, ρ = 0.1  Proposed  0.004 (0.001)  98.100 (0.305)  4.020 (0.003)  0.020 (0.003)
                   SCAD      0.027 (0.002)  87.900 (0.729)  4.145 (0.010)  0.145 (0.010)
                   MCP       0.031 (0.002)  86.550 (0.763)  4.172 (0.013)  0.172 (0.013)
                   ENET      0.035 (0.002)  84.850 (0.802)  4.181 (0.013)  0.206 (0.012)
                   LASSO     0.014 (0.001)  93.850 (0.537)  4.073 (0.007)  0.073 (0.007)
p = 1000, ρ = 0.9  Proposed  0.023 (0.002)  89.850 (0.675)  4.005 (0.005)  0.190 (0.013)
                   SCAD      0.068 (0.003)  74.250 (0.978)  4.196 (0.014)  0.493 (0.023)
                   MCP       0.152 (0.004)  53.750 (1.115)  4.226 (0.017)  1.202 (0.035)
                   ENET      0.417 (0.005)  0.150 (0.087)   6.228 (0.068)  4.089 (0.043)
                   LASSO     0.265 (0.004)  19.500 (0.886)  5.139 (0.029)  1.909 (0.035)
  • 33. Computational speed comparison
To show the merit of simultaneous computation, we compare the hybrid search implemented with a "for-loop" against the hybrid search using simultaneous computation.
[Figure 3: Runtime (sec) versus p for the hybrid search using simultaneous computation (Proposed) versus using a "for-loop".]
  • 34. Real data application
Data description
We apply the proposed method to Breast Invasive Carcinoma (BRCA) data generated by The Cancer Genome Atlas (TCGA) Research Network: http://cancergenome.nih.gov.
The data set contains p = 17,326 gene expression measurements (recorded on the log scale) of n = 526 patients with primary solid tumor.
BRCA1 is a tumor suppressor gene and its mutations predispose women to breast cancer (Findlay et al., 2018).
  • 35. Real data application
Our goal here is to identify the best-fitting model for estimating the association between BRCA1 (response variable) and the other genes (independent variables):
  BRCA1 = β_1 · NBR2 + β_2 · DTL + . . . + β_p · VPS25 + ε.
Results:
Table 3: Model comparison
Method    BIC      extended BIC  MSPE
Proposed  985.93   1120.87       0.60
SCAD      1104.69  1176.42       0.68
MCP       1104.69  1176.42       0.68
ENET      1110.65  1198.68       0.68
LASSO     1104.69  1176.42       0.68
  • 36. Real data application
Results (cont.)
[Figure 4: Coefficient estimates of the genes selected by each method (Proposed, SCAD, MCP, ENET, LASSO), including GINS1, GNL1, VPS25, ARHGAP19, CRBN, TUBG1, C17orf53, NBR2, DTL and KLHL13. Except C10orf76, 7 genes are documented as disease-related genes.]
  • 37. Chapter 3
Fast Bayesian best subset selection for high-dimensional multivariate regression
  • 38. Model Setting
Consider fitting the multivariate regression model (MVRM):
  Y_{n×m} = X_{n×p} C_{p×m} + E_{n×m},    (7)
where
- Y = (y_1, y_2, . . . , y_m)^T ∈ R^{n×m} is a response matrix,
- X = (x_1, x_2, . . . , x_p) ∈ R^{n×p},
- C = (c_1, c_2, . . . , c_p)^T ∈ R^{p×m} is an unknown coefficient matrix,
- E = (e_1, e_2, . . . , e_n)^T ∈ R^{n×m} with e_i ~iid~ N_m(0, Ω), where Ω ∈ R^{m×m} is an unknown nonsingular covariance matrix.
Ω = cov(y_i) describes the relationship between each pair (y_ij, y_iℓ):
  Ω = cov(y_i) =
  [ σ²_11  σ_12   . . .  σ_1m
    σ_21   σ²_22  . . .  σ_2m
    . . .
    σ_m1   σ_m2   . . .  σ²_mm ].
  • 39. Row-sparsity
We assume p ≫ n, i.e., a high-dimensional problem arises.
C is row-sparse: only a few of the x_j's are related to Y.
  c_j ≠ 0 ⇔ x_j selected;  c_j = 0 ⇔ x_j not selected.
We introduce a latent index set γ ⊂ {1, 2, . . . , p} such that γ = {j : c_j ≠ 0}.
For example, if γ = {1, 3}, then C_γ = (c_1, c_3)^T, the submatrix of C keeping rows 1 and 3.
  • 40. Prior specifications
Given γ, the likelihood function is
  Y | C_γ, Ω, γ ∼ MN(X_γ C_γ, I_n, Ω).
We consider conjugate prior distributions for the parameters C and Ω:
  π(C_γ|Ω, γ) ∼ MN(0, ζ I_{|γ|}, Ω),
  π(Ω) ∼ W^{-1}(Ψ, ν),
  π(γ) ∝ I(|γ| = k),
where
- W^{-1} is the inverse-Wishart distribution,
- ζ, Ψ and ν are deterministic hyperparameters.
  • 41. Objective
Hence, the marginal likelihood function f(Y|γ) can be obtained by integrating out C_γ and Ω:
  f(Y|γ) = ∫∫ f(Y|C_γ, Ω, γ) π(C_γ|Ω, γ) π(Ω) dC_γ dΩ
         ∝ ζ^{-m|γ|/2} |X_γ^T X_γ + ζ^{-1} I_{|γ|}|^{-m/2} |Y^T H_γ Y + Ψ|^{-(n+ν)/2}
         ≡ s(Y|γ),
where H_γ = I_n − X_γ (X_γ^T X_γ + ζ^{-1} I_{|γ|})^{-1} X_γ^T.
By Bayes' theorem, the model posterior distribution is given as
  π(γ|Y) ∝ s(Y|γ) π(γ),
which is our model selection criterion.
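Because s(Y|γ) has a closed form, it can be coded directly. A sketch computing log s(Y|γ) up to an additive constant (illustrative; the hyperparameter defaults ζ = 10, Ψ = I_m, ν = m + 2 are arbitrary assumed values, not the dissertation's choices):

```python
import numpy as np

def log_score(Y, X, gamma, zeta=10.0, Psi=None, nu=None):
    """log s(Y | gamma) up to an additive constant, transcribed from the
    closed-form marginal likelihood above."""
    n, m = Y.shape
    Psi = np.eye(m) if Psi is None else Psi
    nu = m + 2 if nu is None else nu
    Xg = X[:, gamma]
    k = Xg.shape[1]
    A = Xg.T @ Xg + np.eye(k) / zeta                     # X_g'X_g + zeta^{-1} I
    H = np.eye(n) - Xg @ np.linalg.solve(A, Xg.T)        # H_gamma
    _, logdet_A = np.linalg.slogdet(A)
    _, logdet_R = np.linalg.slogdet(Y.T @ H @ Y + Psi)   # Y'HY + Psi
    return -m * k / 2 * np.log(zeta) - m / 2 * logdet_A - (n + nu) / 2 * logdet_R

# toy data in which only predictors 0 and 1 drive Y
rng = np.random.default_rng(0)
n, p, m = 60, 10, 2
X = rng.standard_normal((n, p))
C = np.zeros((p, m))
C[0], C[1] = 2.0, -2.0
Y = X @ C + 0.1 * rng.standard_normal((n, m))
```

On such data, the true support {0, 1} attains a higher score than irrelevant or partial supports.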
  • 42. Hybrid best subset search with fixed model size k
1. Define γ̂ as the current model with |γ̂| = k.
2. Deterministic search: repeat the following steps until convergence,
   i) compute s(Y|γ) for all γ ∈ N+(γ̂); update γ̃ to be the maximizer of s(Y|γ);
   ii) compute s(Y|γ) for all γ ∈ N−(γ̃); update γ̂ to be the maximizer of s(Y|γ).
3. Return γ̂*.
4. Set γ̂^(0) ← γ̂*.
5. Stochastic search: iterate over t = 1, . . . , T,
   i) sample γ̃^(t) with probabilities proportional to s(Y|γ) for γ ∈ N+(γ̂^(t−1));
   ii) sample γ̂^(t) with probabilities proportional to s(Y|γ) for γ ∈ N−(γ̃^(t));
   iii) if γ̂^(t) is better than γ̂*, set γ̂ ← γ̂^(t), break the loop, and jump to step 2.
6. Return γ̂.
  • 43. Hybrid best subset search with varying k
To select a single best model, we assign a prior on k: π(k) ∝ I(k ≤ K) / C(p,k).
The model selection criterion becomes
  π(γ, k|Y) ∝ s(Y|γ) I(|γ| = k) / C(p,k).
Hybrid best subset search with varying k:
i) Fixed size: given k, select the best subset model M_k = arg max_γ s(Y|γ) via the hybrid search.
ii) Varying size: pick the single best model from M_1, . . . , M_K via γ̂ = arg max_{1≤k≤K} π(γ, k|Y).
Problem: s(Y|γ) includes an inverse matrix and two determinants, making the hybrid search slow.
  • 45. Simultaneous Computation
As in Chapter 2, we can show that s(Y|γ̂ ∪ {j}) is an element of the vector
  m+(γ̂) = [ ζ^{-1} 1_p + diag(X^T H_γ̂ X) ]^{-m/2} · [ 1_p − diag(X^T H_γ̂ Y (Y^T H_γ̂ Y + Ψ)^{-1} Y^T H_γ̂ X) / (ζ^{-1} 1_p + diag(X^T H_γ̂ X)) ]^{-(n+ν)/2},
and s(Y|γ̃ \ {ℓ}) is an element of the vector
  m−(γ̃) = [ ζ^{-1} 1_p − diag(X_γ̃^T H_γ̃ X_γ̃) ]^{-m/2} · [ 1_p + diag(X_γ̃^T H_γ̃ Y (Y^T H_γ̃ Y + Ψ)^{-1} Y^T H_γ̃ X_γ̃) / (ζ^{-1} 1_p − diag(X_γ̃^T H_γ̃ X_γ̃)) ]^{-(n+ν)/2},
where all operations are elementwise.
Applying m+(γ̂) and m−(γ̃) to the hybrid search, we can significantly improve the hybrid search speed.
  • 46. Simulation Study
Set n = 100 and m = 5, and generate the data {(y_i, x_i) : i = 1, . . . , n} from
  y_i ~ind~ N_m(C^T x_i, Ω),    (8)
where
- x_i ~iid~ N_p(0_p, Σ_x) with Σ_x = (ρ_x^{|i−j|})_{p×p},
- Ω = (2 ρ_e^{|i−j|})_{m×m}.
The true model is γ = {1, 2, 3, 4, 7, 8, 9, 10}; c_ij ~iid~ Uniform{−1.5, −1, 1, 1.5} for j ∈ γ and c_j = 0 for j ∉ γ.
We consider all combinations of the scenarios for p, ρ_e, and ρ_x:
1. p = 200, 500 and 1000;
2. ρ_e = 0, 0.2 and 0.5;
3. ρ_x = 0.2 and 0.8.
  • 47. Simulation Study
The proposed method is compared with:
- Brown et al. (1998), a well-known Bayesian approach with the same model setup as the proposed method;
- multivariate LASSO (mLASSO) and multivariate ENET (mENET) with tuning parameters selected by
  • AIC (Akaike, 1998),
  • bias-corrected AIC (AICc) (Bedrick, 1994),
  • BIC (Schwarz, 1978),
  • consistent AIC (CAIC) (Bozdogan, 1987).
  • 48. Table 4: Simulation study of ρ_e = 0 based on 100 replications.
Scenario            Method    FDR (s.e)     TRUE% (s.e)  SIZE (s.e)   HAM (s.e)    TIME (s.e)
p = 200, ρx = 0.2   Proposed  0(0)          100(0)       8(0)         0(0)         7.054(0.048)
                    Brown     0.002(0.002)  98(1.407)    8.02(0.014)  0.02(0.014)  302.692(0.779)
                    mLASSO    0(0)          100(0)       8(0)         0(0)         0.433(0.002)
                    mENET     0(0)          100(0)       8(0)         0(0)         0.427(0.003)
p = 200, ρx = 0.8   Proposed  0(0)          98(1.407)    7.98(0.014)  0.02(0.014)  7.635(0.067)
                    Brown     0.015(0.004)  84(3.685)    8.1(0.046)   0.18(0.044)  284.772(1.222)
                    mLASSO    0.024(0.005)  69(4.648)    8.04(0.063)  0.4(0.067)   0.556(0.005)
                    mENET     0.066(0.009)  49(5.024)    8.49(0.104)  0.79(0.092)  0.536(0.004)
p = 1000, ρx = 0.2  Proposed  0(0)          100(0)       8(0)         0(0)         14.182(0.12)
                    Brown     0.008(0.003)  93(2.564)    8.07(0.026)  0.07(0.026)  1752.643(5.724)
                    mLASSO    0.002(0.002)  98(1.407)    8.02(0.014)  0.02(0.014)  0.624(0.004)
                    mENET     0.002(0.002)  98(1.407)    8.02(0.014)  0.02(0.014)  0.606(0.004)
p = 1000, ρx = 0.8  Proposed  0(0)          94(2.387)    7.93(0.029)  0.07(0.029)  14.691(0.116)
                    Brown     0.014(0.004)  82(3.861)    8.07(0.046)  0.19(0.042)  1729.196(8.956)
                    mLASSO    0.025(0.006)  61(4.902)    7.92(0.091)  0.56(0.082)  0.79(0.008)
                    mENET     0.078(0.009)  39(4.902)    8.45(0.115)  1.05(0.119)  0.765(0.006)
  • 49. Table 5: Simulation study of ρ_e = 0.2 based on 100 replications.
Scenario            Method    FDR (s.e)     TRUE% (s.e)  SIZE (s.e)   HAM (s.e)    TIME (s.e)
p = 200, ρx = 0.2   Proposed  0(0)          100(0)       8(0)         0(0)         7.067(0.048)
                    Brown     0.003(0.002)  97(1.714)    8.03(0.017)  0.03(0.017)  303.72(0.817)
                    mLASSO    0(0)          100(0)       8(0)         0(0)         0.463(0.003)
                    mENET     0(0)          100(0)       8(0)         0(0)         0.457(0.004)
p = 200, ρx = 0.8   Proposed  0(0)          99(1)        7.99(0.01)   0.01(0.01)   7.799(0.072)
                    Brown     0.015(0.004)  84(3.685)    8.1(0.046)   0.18(0.044)  286.251(1.186)
                    mLASSO    0.031(0.006)  67(4.726)    8.13(0.072)  0.45(0.073)  0.544(0.004)
                    mENET     0.062(0.009)  51(5.024)    8.42(0.102)  0.78(0.096)  0.529(0.004)
p = 1000, ρx = 0.2  Proposed  0(0)          100(0)       8(0)         0(0)         14.323(0.101)
                    Brown     0.011(0.004)  91(2.876)    8.1(0.033)   0.1(0.033)   1751.256(5.606)
                    mLASSO    0.002(0.002)  98(1.407)    8.02(0.014)  0.02(0.014)  0.676(0.006)
                    mENET     0.002(0.002)  98(1.407)    8.02(0.014)  0.02(0.014)  0.645(0.004)
p = 1000, ρx = 0.8  Proposed  0(0)          93(2.564)    7.93(0.026)  0.07(0.026)  14.707(0.122)
                    Brown     0.017(0.004)  80(4.02)     8.08(0.051)  0.22(0.046)  1750.775(8.792)
                    mLASSO    0.028(0.007)  61(4.902)    7.94(0.09)   0.6(0.098)   0.776(0.006)
                    mENET     0.081(0.01)   38(4.878)    8.49(0.123)  1.09(0.121)  0.754(0.006)
  • 50. Table 6: Simulation study of ρ_e = 0.5 based on 100 replications.
Scenario            Method    FDR (s.e)     TRUE% (s.e)  SIZE (s.e)   HAM (s.e)    TIME (s.e)
p = 200, ρx = 0.2   Proposed  0(0)          100(0)       8(0)         0(0)         6.994(0.048)
                    Brown     0.002(0.002)  98(1.407)    8.02(0.014)  0.02(0.014)  299.103(0.687)
                    mLASSO    0(0)          100(0)       8(0)         0(0)         0.452(0.003)
                    mENET     0(0)          100(0)       8(0)         0(0)         0.444(0.004)
p = 200, ρx = 0.8   Proposed  0(0)          96(1.969)    7.96(0.02)   0.04(0.02)   7.606(0.064)
                    Brown     0.011(0.003)  87(3.38)     8.06(0.04)   0.14(0.038)  284.063(1.011)
                    mLASSO    0.037(0.008)  67(4.726)    8.19(0.098)  0.55(0.101)  0.549(0.005)
                    mENET     0.07(0.009)   51(5.024)    8.51(0.107)  0.85(0.106)  0.529(0.004)
p = 1000, ρx = 0.2  Proposed  0(0)          100(0)       8(0)         0(0)         14.131(0.1)
                    Brown     0.009(0.003)  93(2.564)    8.08(0.031)  0.08(0.031)  1722.443(4.872)
                    mLASSO    0.002(0.002)  98(1.407)    8.02(0.014)  0.02(0.014)  0.663(0.006)
                    mENET     0.001(0.001)  99(1)        8.01(0.01)   0.01(0.01)   0.641(0.004)
p = 1000, ρx = 0.8  Proposed  0(0)          93(2.564)    7.93(0.026)  0.07(0.026)  14.804(0.12)
                    Brown     0.011(0.003)  85(3.589)    8.04(0.042)  0.16(0.039)  1699.55(7.121)
                    mLASSO    0.027(0.007)  62(4.878)    7.89(0.099)  0.63(0.098)  0.763(0.006)
                    mENET     0.084(0.01)   37(4.852)    8.5(0.128)   1.14(0.127)  0.749(0.006)
  • 51. Real data analysis
We apply the proposed method to Breast Invasive Carcinoma (BRCA) data generated by The Cancer Genome Atlas (TCGA) Research Network: http://cancergenome.nih.gov.
The data set contains 17,814 gene expression measurements (recorded on the log scale) of 526 patients with primary solid tumor.
BRCA1 and BRCA2 are well-known genes accounting for hereditary breast cancer (Yang and Lippman, 1999).
  • 52. Real data analysis
Goal: determine the best-fitting model for estimating the relationship between BRCA1 and BRCA2 (multiple response variables) and the other genes (predictors):
  [y_BRCA1, y_BRCA2] = X [β_1, β_2] + (ε_1, ε_2).
The proposed method is compared with:
- Brown et al. (1998),
- mLASSO and mENET,
- the method proposed in Chapter 2 ("Ch.2 method"),
- LASSO and ENET with EBIC (Chen and Chen, 2008).
We compute AIC, AICc, BIC, CAIC and the mean squared prediction error (MSPE) by Monte Carlo cross-validation with 70% training and 30% testing data over 500 replications.
  • 55. Real Data Analysis
Table 7: Model comparison by OLS using Monte Carlo cross-validation.
Method        AIC     AICc    BIC      CAIC     MSPE
Proposed      368.77  381.15  615.15   678.15   0.692
Brown         917.16  918.57  999.29   1020.29  0.989
Ch.2 method   540.66  548.08  732.28   781.28   0.770
mLASSO-AIC    547.52  607.62  1067.66  1200.66  0.808
mLASSO-AICc   547.52  607.62  1067.66  1200.66  0.808
mLASSO-BIC    703.04  717.08  965.06   1032.06  0.864
mLASSO-CAIC   703.04  717.08  965.06   1032.06  0.864
mENET-AIC     562.47  618.67  1066.97  1195.97  0.810
mENET-AICc    562.47  618.67  1066.97  1195.97  0.810
mENET-BIC     705.52  721.34  983.18   1054.18  0.866
mENET-CAIC    705.52  721.34  983.18   1054.18  0.866
LASSO-EBIC    901.06  902.23  975.37   994.37   0.967
ENET-EBIC     902.36  903.77  984.49   1005.49  0.968
  • 56. [Figure 5: The number of genes selected by each method and the number of significant coefficients (p < 0.05).]
  • 57. Chapter 4
An approximate Bayesian approach to fast high-dimensional variable selection
  • 58. Model Setting
Consider a regression model where:
- y = (y_1, y_2, . . . , y_n)^T is a response variable, e.g. binary, count, or continuous data;
- X is an n × p predictor matrix with p ≫ n, so a high-dimensional problem arises;
- θ = (θ_1, θ_2, . . . , θ_p)^T is a sparse coefficient vector.
To express the sparsity of θ, we introduce a latent index set γ = {j : θ_j ≠ 0}, so γ can be treated as a selected model.
  • 60. Bayesian model setup and selection
Given the model γ, we define:
- f(y|θ_γ, γ): the likelihood function,
- π(θ_γ|γ): the prior of θ_γ,
- π(γ): the prior of model γ.
By Bayes' theorem, π(γ|y) ∝ m(y|γ) π(γ), where m(y|γ) = ∫ f(y|θ_γ, γ) π(θ_γ|γ) dθ_γ is the marginal likelihood function.
Goal: find the best model γ* = arg max_γ π(γ|y).
Problem: m(y|γ) = ∫ f(y|θ_γ, γ) π(θ_γ|γ) dθ_γ may not be available in closed form.
  • 63. Laplace approximation
With the Laplace approximation,
  m(y|γ) ≈ (2π n^{-1})^{|γ|/2} |V̂_γ|^{1/2} f(y|θ̂_γ, γ) π(θ̂_γ|γ) ∝ (2π n^{-1})^{|γ|/2} f(y|θ̂_γ, γ) π(θ̂_γ|γ),    (9)
where
  θ̂_γ = arg max_{θ_γ} f(y|θ_γ, γ) π(θ_γ|γ)    (10)
is the posterior mode and |V̂_γ| = O_p(1) under some regularity conditions.
The posterior mode θ̂_γ must be estimated first to evaluate (9) in the Laplace approximation.
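To make the approximation concrete, the toy sketch below (not the dissertation's code; a hypothetical one-parameter logistic regression with an assumed N(0, τ) prior) compares the Laplace approximation of the log marginal likelihood with brute-force numerical integration:

```python
import numpy as np

rng = np.random.default_rng(3)
n, tau = 50, 4.0
x = rng.standard_normal(n)
y = (rng.random(n) < 1 / (1 + np.exp(-1.2 * x))).astype(float)

def log_joint(theta):
    """log f(y|theta) + log pi(theta): Bernoulli log-likelihood plus Gaussian log-prior."""
    eta = theta * x
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    logprior = -0.5 * theta**2 / tau - 0.5 * np.log(2 * np.pi * tau)
    return loglik + logprior

# evaluate on a fine grid; locate the posterior mode and its curvature
grid = np.linspace(-5.0, 5.0, 20001)
lj = np.array([log_joint(t) for t in grid])
i = int(np.argmax(lj))
h = grid[1] - grid[0]
curv = -(lj[i + 1] - 2 * lj[i] + lj[i - 1]) / h**2   # -(d^2/dtheta^2) log joint at the mode

# Laplace: log m(y) ~= log f(y|mode) pi(mode) + (1/2) log(2 pi / curv)
laplace = lj[i] + 0.5 * np.log(2 * np.pi / curv)

# "exact" log marginal by brute-force Riemann integration on the same grid
exact = np.log(np.sum(np.exp(lj - lj[i])) * h) + lj[i]
```

With n = 50 observations the two values agree closely, and the approximation error shrinks at rate O(1/n) as the sample size grows.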
  • 65. Objective
Therefore, the Bayesian model selection criterion becomes
  π(γ|y) ∝ (2π n^{-1})^{|γ|/2} f(y|θ̂_γ, γ) π(θ̂_γ|γ) π(γ) ≡ S(γ),    (11)
where f(y|θ̂_γ, γ) is the likelihood and π(θ̂_γ|γ) the prior, each evaluated at the posterior mode.
Objective: find the best model γ* = arg max_γ S(γ).
  • 66. Hybrid search
1. Set γ̂ as the current model and γ̂^(0) = γ̂.
2. Repeat for t = 0, 1, 2, . . . ,  # deterministic search
   i) define nbd(γ̂^(t));
   ii) compute S(γ) for all γ ∈ nbd(γ̂^(t));  # sequentially, in a for-loop
   iii) compute γ̂^(t+1) = arg max_{γ ∈ nbd(γ̂^(t))} S(γ);
   iv) if S(γ̂^(t+1)) > S(γ̂), then update γ̂ ← γ̂^(t+1); else update γ̂ ← γ̂^(t) and break the loop.
3. Set γ̂^(0) = γ̂.  # stochastic search
4. Repeat for t = 0, 1, 2, . . . , T,  # small T
   i) define nbd(γ̂^(t));
   ii) compute S(γ) for all γ ∈ nbd(γ̂^(t));  # sequentially, in a for-loop
   iii) compute γ* = arg max_{γ ∈ nbd(γ̂^(t))} S(γ);
   iv) if S(γ*) > S(γ̂), then update γ̂ ← γ* and go back to Step 2 immediately; else sample γ̂^(t+1) with probability proportional to S(γ) for γ ∈ nbd(γ̂^(t)).
5. Return γ̂.
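The hybrid search above can be sketched as follows. This is an illustrative Python version with hypothetical helper names (`neighbors`, `hybrid_search`): `score` stands for log S(γ), and the stochastic phase samples neighbors with probability proportional to S(γ) by exponentiating the log-scores.

```python
import math
import random

def neighbors(model, p):
    """Addition and deletion neighbors of `model` (a frozenset of indices)."""
    add = [model | {j} for j in range(p) if j not in model]
    drop = [model - {j} for j in model if len(model) > 1]
    return add + drop

def hybrid_search(score, p, init=frozenset({0}), T=20, seed=1):
    """Greedy (deterministic) ascent to a local maximum, then T stochastic
    moves; jump back to the greedy phase whenever a neighbor beats the
    incumbent. `score(gamma)` returns log S(gamma)."""
    rng = random.Random(seed)
    best = cur = frozenset(init)
    while True:
        # Deterministic phase: move to the best neighbor while it improves.
        while True:
            cand = max(neighbors(cur, p), key=score)
            if score(cand) > score(best):
                best = cur = cand
            else:
                break
        # Stochastic phase: try to escape the local maximum.
        improved = False
        for _ in range(T):
            nbd = neighbors(cur, p)
            star = max(nbd, key=score)
            if score(star) > score(best):
                best = cur = star
                improved = True
                break  # back to the deterministic phase
            m = max(score(g) for g in nbd)
            weights = [math.exp(score(g) - m) for g in nbd]  # prop. to S(gamma)
            cur = rng.choices(nbd, weights=weights)[0]
        if not improved:
            return best
```

In practice the scores would be cached rather than recomputed, but the control flow mirrors Steps 1-5 above.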
Simultaneously evaluate all models in nbd₊(γ̂)

θ̂_γ̂ is the posterior mode of the current model γ̂. Consider a model in the addition neighbor: γ̂ ∪ {j} ∈ nbd₊(γ̂). Then

θ_{γ̂∪{j}} = (θ_γ̂, θ_j) ⇒ (θ̂_γ̂, θ_j) ⇒ (θ̂_γ̂, θ̃_j),

where

θ̃_j = arg max_{θ_j} f(y | (θ̂_γ̂, θ_j), γ̂ ∪ {j}) [likelihood] × π((θ̂_γ̂, θ_j) | γ̂ ∪ {j}) [prior].   (12)

Hence, (θ̂_γ̂, θ̃_j) is the estimate for θ_{γ̂∪{j}}, and

S̃(γ̂ ∪ {j}) = (2πn⁻¹)^{(|γ̂|+1)/2} f(y | (θ̂_γ̂, θ̃_j), γ̂ ∪ {j}) [likelihood] × π((θ̂_γ̂, θ̃_j) | γ̂ ∪ {j}) [prior] × π(γ̂ ∪ {j}).
Simultaneously evaluate all models in nbd₊(γ̂)

Three steps to estimate all θ_j's simultaneously:

1. Given θ̂_γ̂, take the logarithm of the following function:
   ℓ(θ_j) = log{ f(y | (θ̂_γ̂, θ_j), γ̂ ∪ {j}) [likelihood] × π((θ̂_γ̂, θ_j) | γ̂ ∪ {j}) [prior] }.
2. Take the first derivative of ℓ(θ_j) with respect to each θ_j:
   u_j(θ_j) = ∂ℓ(θ_j)/∂θ_j.
   Thus, maximizing ℓ(θ_j) is equivalent to finding the root of u_j(θ_j) = 0. Note: u_j(θ_j) contains only one unknown parameter, θ_j.
Simultaneously evaluate all models in nbd₊(γ̂)

Without loss of generality, assume the current model is γ̂ = {1, 2, . . . , k}. For j = k+1, . . . , p, all θ̃_j's can be obtained by finding the roots of the system of equations

u_{k+1}(θ_{k+1}) = 0, u_{k+2}(θ_{k+2}) = 0, . . . , u_p(θ_p) = 0.   (13)

3. Solve (13) using Newton's method by iterating
   θ⁽ᵗ⁺¹⁾ = θ⁽ᵗ⁾ − J_u(θ⁽ᵗ⁾)⁻¹ u(θ⁽ᵗ⁾),   (14)
   where θ⁽ᵗ⁾ = (θ⁽ᵗ⁾_{k+1}, θ⁽ᵗ⁾_{k+2}, . . . , θ⁽ᵗ⁾_p) and J_u(θ⁽ᵗ⁾) = [∂u_j(θ⁽ᵗ⁾_j)/∂θ⁽ᵗ⁾_l]_{j,l ∈ {k+1,...,p}}.

Note that J_u(θ⁽ᵗ⁾) is a diagonal matrix, so J_u(θ⁽ᵗ⁾)⁻¹ is easy to compute. Plugging each (θ̂_γ̂, θ̃_j) into the S(γ) function, we obtain all S̃(γ) in the addition neighbor.
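Because the Jacobian in (14) is diagonal, the Newton update decouples into p − k scalar updates that can be vectorized. A minimal sketch, assuming the user supplies the score function u and its derivative evaluated elementwise (the names `newton_diagonal`, `u`, `u_prime` are illustrative):

```python
import numpy as np

def newton_diagonal(u, u_prime, theta0, tol=1e-10, max_iter=100):
    """Solve u_j(theta_j) = 0 for all j at once.

    Each equation involves only its own coordinate, so the Jacobian is
    diagonal and the Newton step (14) reduces to an elementwise update
    theta <- theta - u(theta) / u'(theta)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        step = u(theta) / u_prime(theta)  # J_u(theta)^{-1} u(theta), elementwise
        theta = theta - step
        if np.max(np.abs(step)) < tol:
            break
    return theta
```

For instance, the toy system u_j(θ_j) = c_j − exp(θ_j) has roots θ_j = log c_j, and the update above solves all coordinates in parallel without forming or inverting any matrix.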
Theorem

Theorem. Let S(γ) be defined by (11), where θ̂_γ is the posterior mode, and let S̃(γ) be an approximate estimate of S(γ). Let γ₁ and γ₂ be two different models. If S̃(γ₁) > S(γ₂), then S(γ₁) > S(γ₂).
Simultaneously evaluate all models in nbd₋(γ̂)

If γ̂ = {1, 2, 3}, then nbd₋(γ̂) = {{1, 2}, {1, 3}, {2, 3}}. To estimate θ_{1,2}, θ_{1,3}, and θ_{2,3} simultaneously:
- Decompose θ̂_γ̂ = (θ̂₁, θ̂₂, θ̂₃).
- Drop one coordinate at a time:
  θ̃_{1,2} = (θ̂₁, θ̂₂), θ̃_{1,3} = (θ̂₁, θ̂₃), θ̃_{2,3} = (θ̂₂, θ̂₃).

Plug {θ̃_{1,2}, θ̃_{1,3}, θ̃_{2,3}} into the S(γ) function to obtain its approximate estimate S̃(γ) in the deletion neighbor.
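The deletion-neighbor construction is just coordinate dropping, as the γ̂ = {1, 2, 3} example shows. A small sketch (the helper name `deletion_estimates` is hypothetical):

```python
import numpy as np

def deletion_estimates(model, theta_hat):
    """Approximate parameter estimates for every deletion neighbor of
    `model` (a tuple of indices): the estimate for model \\ {j} simply
    drops the j-th coordinate of the current posterior mode."""
    out = {}
    for i, j in enumerate(model):
        sub = tuple(v for v in model if v != j)
        out[sub] = np.delete(theta_hat, i)  # theta-tilde for the neighbor
    return out
```

Each resulting vector is then plugged into S(γ) to get S̃(γ) for that deletion neighbor, with no re-optimization.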
Apply S̃(γ) to hybrid search algorithm–Part I

1. Set γ̂ as the current model and γ̂⁽⁰⁾ = γ̂.
2. Repeat for t = 0, 1, 2, . . .   # deterministic search
   i). Simultaneously compute S̃(γ) in the addition and deletion neighbors.
   ii). Select the best models γ̂⁺ and γ̂⁻ in the addition and deletion neighbors, respectively.
   iii). Compute γ̂⁽ᵗ⁺¹⁾ = arg max_{γ ∈ {γ̂⁺, γ̂⁻}} S(γ).
   iv). If S(γ̂⁽ᵗ⁺¹⁾) > S(γ̂), then γ̂ ← γ̂⁽ᵗ⁺¹⁾; else γ̂ ← γ̂⁽ᵗ⁾ and break the loop.
3. Return γ̂.
Apply S̃(γ) to hybrid search algorithm–Part II

4. Set γ̂⁽⁰⁾ = γ̂ for the stochastic search.
5. Repeat for t = 0, 1, 2, . . . , T   # stochastic search
   i). Simultaneously compute S̃(γ) in the addition and deletion neighbors.
   ii). Sample a model γ̂⁺ with probability proportional to S̃(γ) for γ ∈ nbd₊(γ̂⁽ᵗ⁾);
   iii). Sample a model γ̂⁻ with probability proportional to S̃(γ) for γ ∈ nbd₋(γ̂⁽ᵗ⁾);
   iv). Compute γ̂⁽ᵗ⁺¹⁾ = arg max_{γ ∈ {γ̂⁺, γ̂⁻}} S(γ);
   v). If S(γ̂⁽ᵗ⁺¹⁾) > S(γ̂), then γ̂ ← γ̂⁽ᵗ⁺¹⁾ and jump to Step 2 immediately; else sample a model γ̂⁽ᵗ⁺¹⁾ with probability proportional to S(γ) for γ ∈ {γ̂⁺, γ̂⁻}.
6. Return γ̂.
Simulation Study

Given model γ, we have a general framework for the Bayesian model structure:

E(y_i | x_{iγ}, θ_γ) = g⁻¹(x_{iγ}ᵀ θ_γ), π(θ_γ | γ) ∼ N(0, λ I_{|γ|}), π(γ) ∝ 1 / C(p, |γ|),

where x_{iγ} is the sub-vector of x_i indexed by γ, C(p, |γ|) is the binomial coefficient "p choose |γ|", and g(·) is the link function.
Simulation Study–Model setting in Case I

In Case I, n = 300, p = 1000, and data sets are generated as follows:

Gaussian: y_i ~iid N(x_{iγ}ᵀ β_γ, σ²), where γ = {1, 3, 5, 7, 9}, β_γ = (1, −1, 1, −1, 1)ᵀ, σ² = 3, and x_i ~iid N(0_p, Σ_x) with Σ_x = (Σ_ij)_{p×p} and Σ_ij = 0.6^{|i−j|}.

Binary: y_i ~iid Bernoulli(p_i) with p_i = g₂(x_{iγ}ᵀ θ_γ), where g₂(x) = 1/(1 + exp(−x)), γ = {1, 3, 5}, θ_γ = (0.8, 1, −1)ᵀ, and x_i ~iid N(0_p, Σ_x) with Σ_x = (Σ_ij)_{p×p} and Σ_ij = 0.6^{|i−j|}.

Count: y_i ~iid Poisson(µ_i) with µ_i = g₃(x_{iγ}ᵀ θ_γ), where g₃(x) = exp(x), γ = {1, 3, 5}, θ_γ = (0.8, 1, −1)ᵀ, x_i = Φ(z_i) − 0.5·1_p with Φ(·) the CDF of the standard normal distribution, and z_i ~iid N(0_p, Σ_z) with Σ_z = (Σ_ij)_{p×p} and Σ_ij = 0.6^{|i−j|}.
Simulation Study–Model setting in Case II

In Case II, n = 600, p = 2000, and data sets are generated as follows:

Gaussian: y_i ~iid N(x_{iγ}ᵀ β_γ, σ²), where γ = {1, 3, 5, 7, 9}, β_γ = (1, −1, 1, −1, 1)ᵀ, σ² = 6, and x_i ~iid N(0_p, Σ_x) with Σ_x = (Σ_ij)_{p×p} and Σ_ij = 0.6^{|i−j|}.

Binary: y_i ~iid Bernoulli(p_i) with p_i = g₂(x_{iγ}ᵀ θ_γ), where g₂(x) = 1/(1 + exp(−x)), γ = {1, 3, 5, 7, 9}, θ_γ = (1, 0.9, −0.9, 0.9, −0.9)ᵀ, and x_i ~iid N(0_p, Σ_x) with Σ_x = (Σ_ij)_{p×p} and Σ_ij = 0.6^{|i−j|}.

Count: y_i ~iid Poisson(µ_i) with µ_i = g₃(x_{iγ}ᵀ θ_γ), where g₃(x) = exp(x), γ = {1, 3, 5, 7, 9}, θ_γ = (0.8, 0.8, −0.8, 0.8, −0.8)ᵀ, x_i = Φ(z_i) − 0.5·1_p with Φ(·) the CDF of the standard normal distribution, and z_i ~iid N(0_p, Σ_z) with Σ_z = (Σ_ij)_{p×p} and Σ_ij = 0.6^{|i−j|}.
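The Case I/II data-generating mechanisms can be sketched as below. This is an assumed reimplementation (0-based indices, hypothetical function name `simulate`), not the code used in the study; the Poisson case maps predictors through Φ(·) − 0.5 as described.

```python
import math
import numpy as np

def simulate(kind, n, p, gamma, theta, rho=0.6, sigma2=3.0, seed=0):
    """Generate one data set following the slide settings: AR(1)-type
    correlated predictors with corr(x_i, x_j) = 0.6^|i-j| and a sparse
    true model `gamma` (0-based indices here)."""
    rng = np.random.default_rng(seed)
    idx = np.arange(p)
    Sigma = rho ** np.abs(np.subtract.outer(idx, idx))
    Z = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    if kind == "count":
        # x_i = Phi(z_i) - 0.5 keeps the Poisson rate bounded (Count setting)
        X = 0.5 * (1.0 + np.vectorize(math.erf)(Z / math.sqrt(2.0))) - 0.5
    else:
        X = Z
    eta = X[:, gamma] @ theta
    if kind == "gaussian":
        y = eta + rng.normal(scale=math.sqrt(sigma2), size=n)
    elif kind == "binary":
        y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))
    else:  # count
        y = rng.poisson(np.exp(eta))
    return X, y
```

A call such as `simulate("gaussian", 300, 1000, [0, 2, 4, 6, 8], np.array([1, -1, 1, -1, 1.0]))` reproduces the structure of Case I's Gaussian setting.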
Simulation study–Performance of computation and model selection

With these model settings, we run:
- Simulation study I to investigate the performance of computation;
- Simulation study II to investigate the performance of model selection.

Table 8: The comparison of best model search algorithms

Method        | Speed     | Maximum | Computation  | Function
Deterministic | Very fast | Local   | "for-loop"   | S(γ)
Stochastic    | Slow      | Global  | "for-loop"   | S(γ)
Hybrid        | Fast      | Global  | "for-loop"   | S(γ)
Proposed      | Very fast | Global  | Simultaneous | S̃(γ)
Simulation study I–Speed

Figure 6: Comparison of computational speed of the four algorithms based on 100 replications. [Boxplots: panels Gaussian, Logistic, Poisson; x-axis Case I / Case II; y-axis Time (min); legend: Deterministic, Stochastic, Hybrid, Proposed.]
Simulation study II–Case I

Table 9: Simulation study for Case I, (n, p) = (300, 1000), based on 500 replications. Notation: s.e., standard error.

Method | FPR% (s.e) | FNR% (s.e) | TRUE% (s.e) | SIZE (s.e) | HAM (s.e)

Gaussian
Deterministic | 0.001(0.001) | 10.72(1.128) | 82.6(1.697) | 4.478(0.057) | 0.550(0.057)
Stochastic | 0.001(0) | 0(0) | 98.8(0.487) | 5.012(0.005) | 0.012(0.005)
Hybrid | 0.001(0) | 0(0) | 98.8(0.487) | 5.012(0.005) | 0.012(0.005)
Proposed | 0.001(0) | 0.24(0.138) | 98.6(0.526) | 4.998(0.007) | 0.022(0.009)

Logistic (Binary)
Deterministic | 0.001(0) | 30.933(1.470) | 52.0(2.237) | 2.080(0.044) | 0.936(0.044)
Stochastic | 0(0) | 4.733(0.734) | 91.4(1.255) | 2.862(0.022) | 0.146(0.022)
Hybrid | 0(0) | 4.733(0.734) | 91.4(1.255) | 2.862(0.022) | 0.146(0.022)
Proposed | 0.001(0) | 4.800(0.736) | 91.2(1.268) | 2.862(0.022) | 0.150(0.022)

Poisson (Count)
Deterministic | 0(0) | 20.333(1.346) | 67.8(2.092) | 2.390(0.040) | 0.610(0.040)
Stochastic | 0(0) | 4.867(0.726) | 91.0(1.281) | 2.854(0.022) | 0.146(0.022)
Hybrid | 0(0) | 4.867(0.726) | 91.0(1.281) | 2.854(0.022) | 0.146(0.022)
Proposed | 0(0) | 4.867(0.726) | 91.0(1.281) | 2.854(0.022) | 0.146(0.022)
Simulation study II–Case II

Table 10: Simulation study for Case II, (n, p) = (600, 2000), based on 500 replications. Notation: s.e., standard error.

Method | FPR% (s.e) | FNR% (s.e) | TRUE% (s.e) | SIZE (s.e) | HAM (s.e)

Gaussian
Deterministic | 0.001(0) | 2.2(0.485) | 94.8(0.994) | 4.900(0.024) | 0.120(0.025)
Stochastic | 0.001(0) | 0(0) | 99.0(0.445) | 5.010(0.004) | 0.010(0.004)
Hybrid | 0.001(0) | 0(0) | 98.6(0.526) | 5.016(0.006) | 0.016(0.006)
Proposed | 0.001(0) | 0(0) | 99.0(0.445) | 5.010(0.004) | 0.010(0.004)

Logistic (Binary)
Deterministic | 0(0) | 11.360(1.031) | 78.2(1.848) | 4.438(0.051) | 0.574(0.052)
Stochastic | 0(0) | 0.800(0.287) | 97.8(0.657) | 4.964(0.015) | 0.044(0.015)
Hybrid | 0(0) | 0.640(0.270) | 98.4(0.562) | 4.972(0.014) | 0.036(0.014)
Proposed | 0(0) | 0.480(0.211) | 98.4(0.562) | 4.980(0.011) | 0.028(0.011)

Poisson (Count)
Deterministic | 0(0) | 13.520(1.028) | 72.0(2.010) | 4.326(0.051) | 0.678(0.051)
Stochastic | 0(0) | 0.120(0.069) | 99.4(0.346) | 4.994(0.003) | 0.006(0.003)
Hybrid | 0(0) | 0.120(0.069) | 99.4(0.346) | 4.994(0.003) | 0.006(0.003)
Proposed | 0(0) | 0.120(0.069) | 99.4(0.346) | 4.994(0.003) | 0.006(0.003)
Real example–Datasets and Comparison

Three real examples and their information are as follows:

Table 11: Three real data examples

Data type | Data name | (n, p) | Data source
Gaussian | OV | (563, 2000) | TCGA
Binary | RNA-seq | (801, 2059) | UCI
Count | Communities and Crime | (71, 103) | UCI

To measure model performance, we consider the model selection criteria: i) BIC, ii) EBIC, iii) modified BIC (MBIC), iv) corrected RIC (RICc), and v) modified RIC (MRIC).

We compare the proposed method with LASSO, ENET, MCP, and SCAD. To make a fair comparison, tuning parameters are determined by minimal cross-validation (CV) error and minimal EBIC (Chen and Chen, 2008).
Real example: OV data (Gaussian)

Table 12: Model selection performance for Gaussian data.

Method | BIC | EBIC | MBIC | RICc | MRIC | NUM
Proposed | 1074.7382 | 1141.1623 | 1107.2998 | 1139.3635 | 1119.0809 | 5
EBIC-LASSO | 1095.1960 | 1167.4999 | 1127.7668 | 1166.8092 | 1148.5235 | 4
EBIC-ENET | 1095.1960 | 1167.4999 | 1127.7668 | 1166.8092 | 1148.5235 | 4
EBIC-MCP | 1095.1960 | 1167.4999 | 1127.7668 | 1166.8092 | 1148.5235 | 4
EBIC-SCAD | 1095.1960 | 1167.4999 | 1127.7668 | 1166.8092 | 1148.5235 | 4
CV-LASSO | 1216.9863 | 2907.2430 | 2397.6789 | 3812.9631 | 3150.1096 | 145
CV-ENET | 1253.3920 | 2962.9995 | 2450.3700 | 3885.1753 | 3213.1790 | 147
CV-MCP | 1084.6155 | 1943.9425 | 1613.8915 | 2248.3292 | 1951.1880 | 65
CV-SCAD | 1146.9390 | 2084.5993 | 1733.2139 | 2435.9757 | 2106.8347 | 72
Real example: RNA-seq data (binary)

Table 13: Model selection performance for binary data.

Method | BIC | EBIC | MBIC | RICc | MRIC | NUM
Proposed | 38.8263 | 93.5041 | 66.4278 | 89.3793 | 73.1226 | 3
EBIC-LASSO | 62.4658 | 104.6621 | 83.1682 | 100.3838 | 88.1909 | 2
EBIC-ENET | 55.5049 | 110.1866 | 83.1081 | 106.0623 | 89.8051 | 3
EBIC-MCP | 47.4836 | 102.1654 | 75.0868 | 98.0410 | 81.7839 | 3
EBIC-SCAD | 62.4658 | 104.6621 | 83.1682 | 100.3838 | 88.1909 | 2
CV-LASSO | 213.9476 | 538.6971 | 434.7732 | 618.4070 | 488.3495 | 31
CV-ENET | 501.4396 | 1139.5007 | 1018.9996 | 1449.3914 | 1144.5692 | 74
CV-MCP | 73.5445 | 206.3565 | 149.4533 | 212.5774 | 167.8701 | 10
CV-SCAD | 86.9162 | 240.1280 | 176.6266 | 251.2278 | 198.3920 | 12
Real example: Communities and Crime data (count)

Table 14: Model selection performance for count data.

Method | BIC | EBIC | MBIC | RICc | MRIC | NUM
Proposed | 185.4760 | 216.1584 | 194.6094 | 217.8657 | 205.5805 | 3
EBIC-LASSO | 206.4090 | 242.9842 | 217.7814 | 246.7787 | 231.4429 | 4
EBIC-ENET | 206.0983 | 242.6735 | 217.4706 | 246.4679 | 231.1321 | 4
EBIC-MCP | 199.0646 | 235.6399 | 210.4370 | 239.4343 | 224.0985 | 4
EBIC-SCAD | 197.6563 | 228.2602 | 206.7542 | 229.9521 | 217.6835 | 3
CV-LASSO | 199.2896 | 256.5730 | 219.8397 | 272.1664 | 244.5245 | 8
CV-ENET | 213.5552 | 279.6320 | 238.6720 | 302.6268 | 268.8423 | 10
CV-MCP | 199.0646 | 235.7384 | 210.4814 | 239.5518 | 224.1951 | 4
CV-SCAD | 197.6563 | 228.3387 | 206.7897 | 230.0460 | 217.7608 | 3
Summary

In Chapter 2, we develop a fast Bayesian approach to best subset selection under the linear regression setting.
In Chapter 3, we extend the method of Chapter 2 to multivariate data.
In Chapter 4, we further extend Chapter 2 to various types of data.
In future work, we can extend it to multivariate data with various data types.
References

Akaike, H. (1998). Information theory and an extension of the maximum likelihood principle. In Selected Papers of Hirotugu Akaike, pp. 199-213. Springer, New York, NY.
Bedrick, E. J. (1994). Model selection for multivariate regression in small samples. Biometrics, 226-231.
Bozdogan, H. (1987). Model selection and Akaike's information criterion (AIC): The general theory and its analytical extensions. Psychometrika, 52(3), 345-370.
Brown, P. J., Vannucci, M., and Fearn, T. (1998). Multivariate Bayesian variable selection and prediction. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 60(3), 627-641.
Chen, J. and Chen, Z. (2008). Extended Bayesian information criteria for model selection with large model spaces. Biometrika, 95(3), 759-771.
Findlay, G. M., Daza, R. M., Martin, B., Zhang, M. D., Leith, A. P., Gasperini, M., Janizek, J. D., Huang, X., Starita, L. M., and Shendure, J. (2018). Accurate classification of BRCA1 variants with saturation genome editing. Nature, 562(7726), 217-222.
Hans, C., Dobra, A., and West, M. (2007). Shotgun stochastic search for "large p" regression. Journal of the American Statistical Association, 102(478), 507-516.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461-464.
Yang, X. and Lippman, M. E. (1999). BRCA1 and BRCA2 in breast cancer. Breast Cancer Research and Treatment, 54(1), 1-10.
THANK YOU

Major advisor: Dr. Gyuhyeong Goh
Committee members: Dr. Weixing Song, Dr. Wei-Wen Hsu, Dr. Jisang Yu, Dr. Yoon-Jin Lee