Hacettepe Journal of Mathematics and Statistics
Volume 31 (2002), 63–76
BAYESIAN VARIABLE SELECTION IN
LINEAR REGRESSION AND A COMPARISON
Atilla Yardımcı∗
and Aydın Erar†
Received 17. 04. 2002
Abstract
In this study, Bayesian approaches, such as Zellner, Occam’s Window
and Gibbs sampling, have been compared in terms of selecting the correct
subset for variable selection in a linear regression model. The aim of
this comparison is to analyze Bayesian variable selection and the behavior
of the classical criteria, taking into consideration different values of β and
σ and different levels of prior information.
Key Words: Bayesian variable selection, Prior distribution, Gibbs Sampling, Markov
Chain Monte Carlo.
1. Introduction
A multiple linear regression equation with n observations and k independent vari-
ables can be defined as
   Y = Xβ + ε,                                                            (1)

where Y (n × 1) is the response vector; X (n × k′) is a non-stochastic input matrix
of rank k′, with k′ = k + 1; β (k′ × 1) is an unknown vector of coefficients; and
ε (n × 1) is an error vector with E(ε) = 0 and E(εε') = σ²In. Estimation and
prediction equations can be defined as Y = Xβ̂ + e and Ŷ = Xβ̂, where e is the
residual vector of the estimated model and β̂ is the estimator of the model
parameters obtained by the Least Squares Error (LSE) method or the Maximum
Likelihood (ML) method under the assumption that ε is normally distributed. Thus
the joint distribution function of the Y's is obtained by maximizing the likelihood
function

   L(β, σ/Y) = f(Y/β, σ, X) = (2πσ²)^(-n/2) exp{ -(1/(2σ²)) (Y - Xβ)'(Y - Xβ) },

which gives

   β̂ = (X'X)^(-1) X'Y                                                    (2)
   σ̂² = (Y - Xβ̂)'(Y - Xβ̂)/(n - k′) = SSE(β̂)/(n - k′)                     (3)
∗Başkent University, Faculty of Science and Letters, Department of Statistics and Computer
Science, Bağlıca, Ankara.
†Hacettepe University, Faculty of Science, Department of Statistics, Beytepe, Ankara.
where Y ∼ N(Xβ, σ²In) and σ̂² is corrected so that E(σ̂²) = σ² [4]. Variable
selection in regression analysis can be used to obtain estimates with minimum
cost and smaller MSE, to leave the non-informative independent variables out of
the model, and to obtain low standard errors in the presence of variables with a
high degree of collinearity.
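
As a small illustration of (2) and (3), the following Python sketch (ours, not part
of the original study; it uses only NumPy) computes the LSE estimate and the
corrected error variance for a given design matrix.

    import numpy as np

    def ols_fit(X, y):
        """Least squares estimate (2) and corrected error variance (3).
        X is the n x k' design matrix (first column of ones), y the response vector."""
        n, k_prime = X.shape
        beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # (X'X)^(-1) X'Y
        resid = y - X @ beta_hat                       # e = Y - X beta_hat
        sse = float(resid @ resid)                     # SSE(beta_hat)
        sigma2_hat = sse / (n - k_prime)               # corrected so that E(sigma2_hat) = sigma^2
        return beta_hat, sigma2_hat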
2. Classical Variable Selection Criteria
The stepwise and all-possible-subset methods are often used for variable selection
in a linear regression model. The all-possible-subset method is used in conjunction
with the mean square error, the coefficient of determination, the F statistic, Cp and
PRESSp [4]. In recent years the AIC, AICc and BIC criteria, which are based on
information measures, have also been used in variable selection. If p is the number
of parameters in the subset, AIC and AICc aim to select the subset which minimizes

   AIC = n log(MSE + 1) + 2p                                              (4)
   AICc = AIC + 2(p + 1)(p + 2)/(n - p - 2).                              (5)

If f(Y/β̂p, σp, X) is the likelihood function for the subset p, the criterion BIC
aims to select the subset which maximizes [2]

   BIC = log f(Y/β̂p, σp, X) - (p/2) log n.
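
These criteria are straightforward to evaluate for a candidate subset. The sketch
below (ours; it reuses the ols_fit sketch from Section 1 and reads the residual mean
square as SSE/n, a convention the paper does not spell out) computes AIC, AICc
and BIC for the subset whose column indices are given in cols.

    import numpy as np

    def subset_criteria(X, y, cols):
        """AIC (4), AICc (5) and BIC for the subset with column indices `cols`."""
        n = len(y)
        Xp = np.column_stack([np.ones(n), X[:, cols]])     # intercept + chosen variables
        p = Xp.shape[1]                                    # number of parameters in the subset
        beta_hat, _ = ols_fit(Xp, y)
        resid = y - Xp @ beta_hat
        mse = float(resid @ resid) / n                     # residual mean square (assumed SSE/n)
        aic = n * np.log(mse + 1.0) + 2 * p                # equation (4)
        aicc = aic + 2 * (p + 1) * (p + 2) / (n - p - 2)   # equation (5)
        loglik = -0.5 * n * (np.log(2 * np.pi * mse) + 1)  # Gaussian log-likelihood at the ML fit
        bic = loglik - 0.5 * p * np.log(n)                 # BIC as defined above
        return aic, aicc, bic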
Posterior model probabilities and risk criteria have also been used occasionally
in variable selection. Posterior model probabilities have been assessed for all the
subsets using the logarithmic approximation given by Wasserman [16] under the
assumption of equal prior subset probabilities. The risk criterion has been adapted
to variable selection with the help of the equation

   R(β̂) = σ² tr(V(β̂))                                                    (6)

given by Toutenburg [15], where V(β̂) is the variance-covariance matrix of β̂.
The value of (6) computed for each candidate subset corresponds to the risk of its
estimates; the aim is to select the model with the smallest risk.
On the other hand, Yardımcı [17] argues that the risk and the posterior probability
should be evaluated together in the interpretation. Ideally, the aim is to find a
subset that has high posterior probability and low risk. Since it is hard to satisfy
both conditions at once, preferences are made within acceptable limits [17].
Furthermore, the Risk Inflation Criterion (RIC) has been computed for all candidate
subsets using the equation defined by Foster and George [5],

   RIC = SSEp + 2p MSEp log(k),                                           (7)

where SSEp and MSEp are the error sum of squares and the mean square error for
the subset p, respectively. It is claimed that RIC produces the lowest values for,
and thus gives more consistent indications of, the correct variables [17].
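
For completeness, a sketch of the two Group II quantities just described; it is our
reading of (6) and (7), with σ² replaced by the estimate from the fitted subset and
V(β̂) taken literally as the estimated covariance matrix σ̂²(X'X)^(-1). It again
reuses the ols_fit sketch from Section 1.

    import numpy as np

    def risk_and_ric(X, y, cols, k_total):
        """Risk criterion (6) and RIC (7) for the subset with column indices `cols`;
        k_total is the number of candidate variables k."""
        n = len(y)
        Xp = np.column_stack([np.ones(n), X[:, cols]])
        p = Xp.shape[1]
        beta_hat, sigma2_hat = ols_fit(Xp, y)
        resid = y - Xp @ beta_hat
        sse_p = float(resid @ resid)
        mse_p = sse_p / (n - p)
        v_beta = sigma2_hat * np.linalg.inv(Xp.T @ Xp)     # V(beta_hat), estimated
        risk = sigma2_hat * np.trace(v_beta)               # equation (6), read literally
        ric = sse_p + 2 * p * mse_p * np.log(k_total)      # equation (7)
        return risk, ric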
3. Bayesian Variable Selection Approaches
Bayesian variable selection approaches make use of hypothesis testing and stochastic
search methods in linear regression. Apart from these, there are various methods
based on selecting the subset whose posterior probability is highest.
3.1. Posterior Odds and the Zellner Approach
Bayesian approaches to hypothesis testing are quite different from classical prob-
ability ratio tests. The difference comes from the use of prior probabilities and
distributions. The researcher has to define prior probabilities P(H0) and P(H1)
for the correctness of the hypothesis. Here H0 symbolizes the null hypothesis and
H1 stands for the alternative hypothesis. To choose one of the hypotheses, pos-
terior probabilities have to be assessed after defining the prior probabilities. The
Bayes Factor of H1 against H0 has been given by Lee [11] as follows:
BF =
p(H0/y)
p(H1/y)
P(H1)
P(H2)
. (8)
Thus, the Bayes factor is the ratio of H0’s posterior odds to its prior odds. If
the prior probabilities that verify the hypotheses H0 and H1 are equal, that is
P(H0) = P(H1) = 0.5, then Bayes factor is equal to the posterior odds of H0.
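For illustration (the numbers are ours, not from the paper): if P(H0) = P(H1) = 0.5
and the data yield posterior probabilities p(H0/y) = 0.8 and p(H1/y) = 0.2, then
BF = (0.8/0.2)/(0.5/0.5) = 4, so the data favour H0 by four to one; with unequal
priors only the prior-odds denominator changes.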
Although the interpretation of Bayesian hypothesis testing depends on the nature
of the research, Kass and Raftery [9] tables are often used. In the Zellner approach,
k independent variables in the model are divided into two sets. The variables in
the first set are those that are needed in the model and are shown by X1 and β1.
The variables in the second set are those that should be removed from the model
and are shown by X2 and β2. Here, X1 and X2 are n × k1 and n × k2 matrices
respectively, where k = k1 + k2, and the vectors β1 and β2 have dimensions k1 × 1
and k2 × 1. I denotes the unit vector (a vector of ones). The model can be
rewritten as

   Y = αI + X1η + Vβ2 + ε,

where V = [In - X1(X1'X1)^(-1)X1']X2, η = β1 + (X1'X1)^(-1)X1'X2β2 and X1'V = 0
[19]. The hypotheses to be tested when the variables are divided into two sets may
be written as H0 : β2 = 0 and H1 : β2 ≠ 0.
In conclusion, when the prior probabilities for the rejection or acceptance of
the hypothesis are defined to be equal and the prior beliefs for the parameters are
defined as noninformative, the posterior odds ratio BF01 is given by Moulton [13]
as follows:

   BF01 = b (v1/2)^(k2/2) (v1 s1² / v0 s0²)^((v1-1)/2)
        = b (v1/2)^(k2/2) [1 + (k2/v1) F(k2,v1)]^(-(v1-1)/2),             (9)

where

   ȳ = (1/n) Σi yi,   η̂ = (X1'X1)^(-1) X1'Y,
   v0 s0² = (Y - ȳI - X1η̂)'(Y - ȳI - X1η̂),   v0 = n - k1 - 1,
   β̂2 = (V'V)^(-1) V'Y,
   v1 s1² = (Y - ȳI - X1η̂ - Vβ̂2)'(Y - ȳI - X1η̂ - Vβ̂2),   v1 = n - k1 - k2 - 1,
   b = √π / Γ[(k2 + 1)/2],

and Fh = F(k2,v1) = β̂2'V'Vβ̂2/(k2 s1²) is the F-test statistic. Thus the posterior
odds ratio given by equation (9) can be used as the Bayesian test for the variables
to be included in the model and those to be removed from the model [19].
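
A minimal Python sketch of equation (9), mirroring the definitions above (the
function name and interface are ours). X1 holds the variables kept in the model
and X2 those tested for removal, neither including the constant term.

    import numpy as np
    from math import gamma, pi, sqrt

    def zellner_posterior_odds(X1, X2, y):
        """Posterior odds BF01 of H0: beta2 = 0 against H1: beta2 != 0, equation (9)."""
        n, k1 = X1.shape
        k2 = X2.shape[1]
        ybar = y.mean()
        eta_hat = np.linalg.solve(X1.T @ X1, X1.T @ y)       # (X1'X1)^(-1) X1'Y
        V = X2 - X1 @ np.linalg.solve(X1.T @ X1, X1.T @ X2)  # part of X2 orthogonal to X1
        beta2_hat = np.linalg.solve(V.T @ V, V.T @ y)
        r1 = y - ybar - X1 @ eta_hat - V @ beta2_hat
        v1 = n - k1 - k2 - 1
        s1sq = float(r1 @ r1) / v1
        F = float(beta2_hat @ (V.T @ V) @ beta2_hat) / (k2 * s1sq)   # F_h
        b = sqrt(pi) / gamma((k2 + 1) / 2)
        bf01 = b * (v1 / 2) ** (k2 / 2) * (1 + (k2 / v1) * F) ** (-(v1 - 1) / 2)
        return bf01, F

A large BF01 (equivalently a small Fh) supports H0, that is, it supports dropping
the X2 variables and retaining only those in X1.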
3.2. Occam’s Window Approach
In this approach, if a model is not as strong as the others in terms of their posterior
odds, then the model and its sub models are eliminated from consideration. If the
posterior probabilities of two models are equal, the simpler one is accepted (the
parsimony rule). Hoeting [8] has claimed that the Occam's Window (OW) approach
is useful when the aim is to interpret the relationships between the variables and to
make quick assessments. If q = 2^k is the number of all possible models, let
M = {M1, . . . , Mq} denote the set of all possible subsets. The marginal probability
of the data under the subset Mm is obtained as

   p(Y/Mm) = ∫ p(Y/β, Mm) p(β/Mm) dβ.                                     (10)

Here p(Y/β, Mm) is the likelihood under the subset Mm, β is the parameter vector,
and p(β/Mm) is the prior distribution of β for the subset Mm. With the help of
Bayes' theorem, (10) can be combined with

   p(∆/Y, Mm) = ∫ p(∆/Y, βm, Mm) p(βm/Y, Mm) dβm,

where βm is the vector of coefficients of the independent variables in the subset
Mm [9]. The OW approach has two main characteristics. First, a subset whose
support from the data is much weaker than that of the best subset under
consideration is eliminated [12]; following Hoeting [8], the class of subsets retained
in this first step is

   A = { Mm : p(Mm/Y) / max_i p(Mi/Y) ≥ OL },

where OL is determined by the researcher. The second characteristic is that a
subset is also eliminated when a simpler alternative receives stronger support from
the data; for this search the class

   B = { Mm : ∃ Mi ∈ A, Mi ⊂ Mm, p(Mi/Y)/p(Mm/Y) > OR }

of subsets to be removed is used [14]. Note that OR is also determined by the
researcher and is generally taken to be 1 [8]. Further, ∆ denotes a quantity of
interest, such as a future observation or the loss attached to an event; averaging
over the subsets retained in A, it is written as [8]

   p(∆/Y) = Σ_{Mm∈A} p(∆/Mm, Y) p(Y/Mm) p(Mm) / Σ_{Mm∈A} p(Y/Mm) p(Mm).
In the OW approach, the values OL and OR are determined by subjective belief.
For example, Madigan and Raftery [12] take OL = 1/20 and OR = 1. In an
application of this approach, model selection is done in two stages: first a plausible
model is chosen from all 2^k - 1 possible models; then "up or down" algorithms
are used to add or remove variables from the models [8].
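
The two rules can be stated compactly in code. The sketch below is our reading of
the procedure, with OL interpreted, as in Madigan and Raftery [12], as the smallest
acceptable ratio of a subset's posterior probability to that of the best subset; the
dictionaries and their names are hypothetical.

    def occams_window(post_prob, submodels, OL=1/20, OR=1.0):
        """Screen candidate models by the two Occam's Window rules described above.
        post_prob maps a model label to p(model/Y); submodels maps a model label
        to the set of labels of its proper submodels."""
        best = max(post_prob.values())
        # rule 1: keep a model only if it is not too far below the best one
        kept = {m for m, p in post_prob.items() if p / best >= OL}
        # rule 2: drop a model if one of its submodels is better supported
        kept = {m for m in kept
                if not any(s in post_prob and post_prob[s] / post_prob[m] > OR
                           for s in submodels.get(m, ()))}
        return kept

Model averaging, as in the last display above, is then carried out only over the
subsets that survive this screening.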
3.3. The Gibbs Sampling Approach
Bayesian approaches concentrate on the determination of posterior probabilities
or distributions. However, in some cases the analytic evaluation of the integrals
needed to obtain the posterior moments is impossible or hard to achieve. In such
cases, approaches involving the creation of Markov chains and
the use of convergence characteristics to obtain the posterior distributions are used
[17]. Thus, by using Markov Chain and stochastic simulation techniques together,
Markov Chain Monte Carlo (MCMC) approaches have been developed.
The aim of the MCMC approach is to create a random walk which converges
to the target posterior distribution [1]. In the MCMC approach, a sequence θ^(j)
converging to the posterior distribution is produced, in which θ^(j+1) depends on
the previous value θ^(j). With this in mind, sampling is done using the Markov
transition distribution g(θ/θ^(j)) [7]. For a given posterior distribution, there are
various methods of drawing samples from the transition distribution and processing
this information. The Metropolis algorithm and Gibbs sampling are used to show
how samples taken from a posterior distribution acquire the character of a Markov
chain.
In Gibbs sampling, starting points (x1^(0), x2^(0), . . . , xk^(0)) are determined at
j = 0, and then the point x^(j) = (x1^(j), x2^(j), . . . , xk^(j)) is produced from the
state x^(j-1). Setting j = j + 1, the process continues until convergence is obtained
[6]. When convergence is achieved, the points x^(j) correspond to draws from
p(x1, . . . , xk/y). There are various Bayesian variable selection approaches
that depend on Gibbs sampling. Here the approach of Kuo and Mallick [10], that
is used in this study, will be explained.
In Gibbs sampling, an indicator vector γ = (γ1, . . . , γk) is used to identify which
of the 2^k - 1 possible subsets of independent variables is in the model: if the
variable under consideration is in the model then γj = 1, otherwise γj = 0. Since γ
is unknown, it has to be included in the Bayesian formulation. Thus the joint prior
used in the variable selection process is defined as

   p(β, σ, γ) = p(β/σ, γ) p(σ/γ) p(γ).

Here the prior defined by Kuo and Mallick [10] for the β coefficients is taken to be

   β/σ, γ ∼ N(β0, D0 I).

In addition, a noninformative prior for σ is defined by

   p(σ) ∝ 1/σ,

which is obtained as the limit of an inverted gamma distribution as its parameters
α → 0 and η → 0. The noninformative prior for γ is given as

   γj ∼ Bernoulli(1, p).
The deductions concentrate on these distributions since the marginal posterior dis-
tribution p(γ/Y) contains all the information that is needed for variable selection.
In Kuo and Mallick [10] it is shown that p(γ/Y) can be estimated using Gibbs
sampling. The least squares estimates for the full model, with γj = 1 (j = 1, . . . , k),
can be used as the starting values β^(0) and σ^(0) needed for this process; that is,
initially we may take γ^(0) = (1, 1, . . . , 1)'. The full conditional posterior density
of γj can then be given as

   p(γj/γ-j, β, σ, Y) = B(1, p̃j),   j = 1, 2, . . . , k,
where, if γ−j = (γ1, . . . , γj−1, γj+1, . . . , γk), then
   p̃j = cj / (cj + dj),
   cj = pj exp{ -(1/(2σ²)) (Y - Xv*j)'(Y - Xv*j) },
   dj = (1 - pj) exp{ -(1/(2σ²)) (Y - Xv**j)'(Y - Xv**j) }.

Here the vector v*j is the vector ν with its jth entry replaced by βj, and similarly
v**j is the vector ν with its jth entry replaced by 0. Thus cj is proportional to the
probability of accepting the jth variable and dj to the probability of rejecting it.
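
The scheme described in this section can be sketched in a few lines of Python. This
is our minimal implementation of the sampler as we read Kuo and Mallick [10]
(function name, interface and the log-scale stabilisation are ours), not the BARVAS
code used in the study; it returns the sampled γ vectors, from which p(γ/Y) is
estimated by the relative frequencies of the distinct γ values.

    import numpy as np

    def kuo_mallick_gibbs(X, y, n_iter=10_000, p_incl=0.5, beta0=0.0, D0=16.0, seed=0):
        """Gibbs sampler for variable selection with the priors given above:
        beta ~ N(beta0, D0 I), p(sigma) proportional to 1/sigma, gamma_j ~ Bernoulli(p_incl)."""
        rng = np.random.default_rng(seed)
        n, k = X.shape
        gamma = np.ones(k, dtype=int)                        # start from the full model
        beta = np.linalg.lstsq(X, y, rcond=None)[0]          # LSE of the full model as beta^(0)
        sigma2 = float(np.sum((y - X @ beta) ** 2) / (n - k))
        draws = np.zeros((n_iter, k), dtype=int)
        for it in range(n_iter):
            # beta given gamma, sigma: Gaussian, with the excluded columns zeroed out
            Xg = X * gamma
            prec = Xg.T @ Xg / sigma2 + np.eye(k) / D0
            mean = np.linalg.solve(prec, Xg.T @ y / sigma2 + (beta0 / D0) * np.ones(k))
            beta = mean + np.linalg.cholesky(np.linalg.inv(prec)) @ rng.standard_normal(k)
            # sigma^2 given beta, gamma: inverse chi-square form implied by p(sigma) ~ 1/sigma
            sigma2 = float(np.sum((y - Xg @ beta) ** 2) / rng.chisquare(n))
            # gamma_j given the rest: Bernoulli(p~_j) with c_j, d_j as defined above
            for j in range(k):
                nu = beta * gamma
                nu_in, nu_out = nu.copy(), nu.copy()
                nu_in[j], nu_out[j] = beta[j], 0.0           # v*_j and v**_j
                a_in = -np.sum((y - X @ nu_in) ** 2) / (2 * sigma2)
                a_out = -np.sum((y - X @ nu_out) ** 2) / (2 * sigma2)
                m = max(a_in, a_out)                         # common factor cancels in p~_j
                c = p_incl * np.exp(a_in - m)
                d = (1 - p_incl) * np.exp(a_out - m)
                gamma[j] = int(rng.random() < c / (c + d))
            draws[it] = gamma
        return draws

For example, for the artificial data of Section 4 the posterior probability of subset 5
would be estimated by the proportion of draws (after a burn-in period) equal to
(1, 1, 1, 0, 0).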
4. Comparison of the Selection Criteria with Artificial Data
For a comparison of the selection criteria, and in parallel with the studies of Kuo
and Mallick [10] and George and McCulloch [7], artificial data were produced in
this study for n = 20 and k = 5. The variables Xj are assumed to be distributed as
N(0, 1). Four types of β structure (Beta 1, . . . , Beta 4) have been specified in order
to observe the effect of the importance of the variables in the model on selection:
Beta 1 : (3, 2, 1, 0, 0) Beta 2 : (1, 1, 1, 0, 0)
Beta 3 : (3, −2, 1, 0, 0) Beta 4 : (1, −1, 1, 0, 0)
On the other hand, the levels σ² = 0.01, 1.0, 10 were chosen to observe the effect of
σ² on the results; the results for σ² = 1.0 are not shown in the tables, but they are
mentioned in the interpretations. To observe the effect of the level of prior
information on the results, the prior assumptions given in Table 4.1 are used.
Furthermore, the errors are assumed to satisfy ei ∼ N(0, σ²). The list of all possible
subsets for the 5 independent variables is given in the Appendix.
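
For concreteness, one way to generate such a data set in Python (our sketch; the
paper does not report its random-number setup or seeds):

    import numpy as np

    def simulate_data(beta, sigma2, n=20, seed=0):
        """One artificial data set as described above: X_j ~ N(0, 1), e_i ~ N(0, sigma^2)."""
        rng = np.random.default_rng(seed)
        X = rng.standard_normal((n, len(beta)))
        y = X @ np.asarray(beta, dtype=float) + np.sqrt(sigma2) * rng.standard_normal(n)
        return X, y

    # the four beta structures and the error variance levels considered in the study
    betas = {"Beta 1": (3, 2, 1, 0, 0), "Beta 2": (1, 1, 1, 0, 0),
             "Beta 3": (3, -2, 1, 0, 0), "Beta 4": (1, -1, 1, 0, 0)}
    sigma2_levels = (0.01, 1.0, 10.0)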
Table 4.1: Prior Assumptions

   Bayesian Approach    Prior Assumption
   Zellner              Equal subset prior probability
   Occam's Window       a. Equal subset prior probability.
                        b. For the 2nd and 5th subsets, prior probability is 0.10.
                        c. For the 26th subset, prior probability is 0.10.
   Gibbs Sampling       a. D0 = 16 and D0 = 1.
                        b. Equal prior probability for the variables.
                        c. Different prior probabilities for the variables.
During the comparison of these approaches, the BARVAS (Bayesian Regression-
Variable Selection) computer program, developed specifically for the Bayesian
variable selection process by Yardımcı [17], is used. BARVAS can be used for
Bayesian regression and variable selection in linear regression, and it is a software
package in which the classical variable selection criteria can also be used alongside
the Bayesian ones [18].
4.1. Classical Variable Selection
In the comparison of classical variable selection, the well-known criteria (Cp, AICc,
BIC, Press) will be referred to as Group I, and posterior model probability, Risk,
RIC as Group II. The results of classical variable selection for the data are summa-
rized in Table 4.2. In this table, the criteria are classified among themselves and
the best three subsets are given. It has been observed that the selection criteria in
Group I are not affected by β and σ². For all the β structures under consideration,
every criterion in Group I finds the best subset, subset number 5; subsets 10 and 26
also appear among the best selections for all criteria. Because of its structure, the
AICc criterion is more appropriate for models with fewer variables. The Group II
criteria give results parallel to the other criteria. The subset posterior model
probability (P) gave the same ranking as Cp in all structural cases. The risk
criterion favours subsets with fewer variables: although the subsets with the lowest
risk agree with the other criteria, they have lower dimension when σ² = 0.01. RIC
gives results similar to Group I; across the different rankings the RIC values are
observed to be very close to one another.
4.2. Bayesian Approaches
The results of the Bayesian variable selection approaches, computed with the
BARVAS program, are given in Table 4.2. The numbers in the table are the
numbers of the selected models. The detailed results for the Bayesian approaches
are given as follows.
Table 4.2: Numbers of the Selected Models for the Classical and Bayesian Approaches
(see the Appendix for the subset numbers; for each σ² and β structure the best
three subsets are listed)

                 ----------- Classical criteria -----------   - Zellner -    OW     -- Gibbs --
 σ²    Beta    Cp  AICc  BIC  Press   P   Risk  RIC            BF    Fh    (equal   D0=1  D0=16
                                                                            prior)
 0.01   1       5    2    5     5     5     1    5              5    10      10       5     2
               10    1   10     2    10     2   10              2    26       5       2     5
                2    5   26    13    26     5   26             10     5      21      21     1
        2       5    3    5     5     5     3    5              5    10       5       5     5
               10    5   10    26    10     4   10             10    26      10      21     3
               26    4   26    10    26     7   26             26     5      26       3     6
        3       5    5    5     5     5     5    5              5    26      26       5     5
               26    2   26    26    26    10   10             10    10       5      21     2
               10   26   10    10    10     2   26             26     5      21       2    18
        4       5    5    5     5     5     7    5              5    26      26       5     5
               26    6   26    26    26     6   10             26    10      21      21     2
               10   26   10    10    10     5   26             10     5       5      26     3
 10     1       5    5    5     5     5     5    5              5    10      10       5     5
               10    2   10    26    10    10   26             19    26      21      10     2
               26   10   26    10    26    26   10             26     5       5      21    10
        2       5    5    5     5     5     5    5              5    10      10       5     5
               10    1   10    26    10    26   26             10    26      26      21     1
               26    2   26    10    26    10   10             26     5      21      10     2
        3       5    2    5     5     5     5    5              5    26       5       5     5
               26    5   26    10    26    26   26             10    10      26      21     2
               10   13   10     2    10     2   10             26     5      10       2    13
        4       5    5    5     5     5     5    5              5    10      10       5     5
               10   10   10    26    10    10   10             10    26       5      21     3
               26   26   26    10    26    26   26             26     5      21      26    21
The Zellner Approach
The Zellner approach strongly preferred the best subset, number 5, for all the
different cases of beta and σ². The BF values for subset 5 are clearly separated
from those of the alternative subsets 10 and 26, and this difference is also reflected
in the interpretation scales of Jeffreys and of Kass and Raftery. The results of the
Zellner approach on standardized data are given in Table 4.3. When the Fh values
are considered, the alternative subsets 10 and 26 are preferred; since these subsets
have a larger dimension than the best subset (both contain subset 5), they still give
reasonable results for the purpose of this study. In general, when the Zellner
approach is used with normally distributed variables, β and σ² have no effect on
the selection.
Table 4.3: Results of the Zellner Approach

                       σ² = 0.01                                  σ² = 10.0
             for max BF          for min Fh             for max BF          for min Fh
 Beta   Subset   BF    J  KR   Subset   Fh   T     Subset   BF    J  KR   Subset   Fh   T
         No.                    No.                 No.                    No.
  1        5   13.68   S   S     10    0.09  A        5   12.16   S   H     10    0.02  A
           2    7.17   H   H     26    0.31  A       10    4.46   H   H     26    0.57  A
          10    4.50   H   H      5    0.39  A       26    3.61   H   H      5    0.65  A
  2        5   15.29   S   S     10    0.06  A        5   13.37   S   S     10    0.11  A
          10    4.55   H   H     26    0.11  A       10    4.48   H   H     26    0.20  A
          26    4.46   H   H      5    0.14  A       26    4.29   H   H      5    0.44  A
  3        5   14.52   S   S     26    0.04  A        5   15.50   S   S     26    0.00  A
          10    4.59   H   H     10    0.24  A       10    4.69   H   H     10    0.10  A
          26    4.21   H   H      5    0.25  A       26    4.47   H   H      5    0.11  A
  4        5    0.38   H   H     26    0.04  A        5   13.74   S   S     10    0.02  A
          26    4.60   H   H     10    1.22  A       10    4.65   H   H     26    0.37  A
          10    2.27   W   H      5    1.25  A       26    3.96   H   H      5    0.38  A

 J: Jeffreys, KR: Kass and Raftery, T: according to the F table.
 A: Acceptance, S: Strong, H: High, W: Weak, R: Rejection.
Occam’s Window Approach
The OW approach primarily determines the subsets to be studied, rather than
determining the best subset. As a result, the researcher chooses the suitable subset
for his aims. For the artificial data, the posterior odds are calculated with OL = 0.5
and OR = 1, and at the first stage the prior probabilities of all subsets are taken to
be equal. The OW approach evaluates the full model as a good one and, in general,
concentrates on subsets of larger dimension. The results for the OW approach are
given in Table 4.4. In this study the best subset, number 5, is generally selected,
although the posterior probabilities of the subsets in the ranking are close to one
another. It may be observed that the OW approach is not much affected by changes
in the β and σ² structures. At the second stage, Cases I and II are defined to
observe the effect of the prior probabilities on the results. In Case I, a prior
probability of 0.10 is given to subsets 2 and 5; in Case II, a prior probability of 0.10
is given to subset 26. The prior probabilities of the remaining subsets are 0.90/29
in Case I and 0.90/30 in Case II. It may be observed that the subset prior
probabilities have a considerable effect on the OW results, and the effect is larger
in Case I than in Case II. In Case I the prior probability assigned to subset 2 is not
sufficient for it to enter the ranking, whereas subset 5 attains a high posterior
probability.
The Gibbs Sampling Approach
The Gibbs sampling approach of Kuo and Mallick [10] is applied using the BARVAS
program, and the results are given in Table 4.5. Gibbs sampling is applied to all
cases using the simulated data, with 10,000 iterations; the sampling took
approximately two minutes for each combination of the σ², β and D0 levels. By
taking the prior parameter estimate β0 = 0, noninformative prior information has
been used in the Kuo and Mallick approach to Gibbs sampling.
Table 4.4: Results for the Occam's Window Approach

 σ² = 0.01
                Equal prior             Case 1                  Case 2
 Beta      Subset No.  Pos. P     Subset No.  Pos. P     Subset No.  Pos. P
  1            10       0.25           5       0.50          26       0.47
                5       0.23          10       0.15          10       0.17
               21       0.21          21       0.13           5       0.15
  2             5       0.27           5       0.58          26       0.52
               10       0.25          10       0.15           5       0.17
               26       0.24          26       0.14          10       0.16
  3            26       0.27           5       0.56          26       0.56
                5       0.26          26       0.16           5       0.16
               21       0.23          21       0.14          21       0.14
  4            26       0.36           5       0.44          26       0.65
               21       0.30          26       0.24          21       0.17
                5       0.18          21       0.21           5       0.09

 σ² = 10.0
                Equal prior             Case 1                  Case 2
 Beta      Subset No.  Pos. P     Subset No.  Pos. P     Subset No.  Pos. P
  1            10       0.31           5       0.51          26       0.46
               21       0.26          10       0.20          10       0.21
                5       0.22          21       0.17          21       0.17
  2            10       0.27           5       0.53          26       0.52
               26       0.25          10       0.17          10       0.16
               21       0.24          26       0.15          21       0.15
  3             5       0.26           5       0.56          26       0.53
               26       0.25          26       0.15           5       0.17
               10       0.23          10       0.14          10       0.15
  4            10       0.28           5       0.55          26       0.48
                5       0.25          10       0.17          10       0.19
               21       0.24          21       0.14           5       0.17

 Case 1: a prior probability of 0.10 is given to subsets 2 and 5.
 Case 2: a prior probability of 0.10 is given to subset 26.
 Pos. P: subset posterior probability.
The effect of the prior information level on the results has been examined by using
the prior variance values D0 = 1 and D0 = 16. For each run, the BARVAS program
reports the number of zero subsets reached (Zero) and the number of subsets
processed (Processed) during Gibbs sampling. It may be observed that the number
of zero subsets is affected by the β structure: considerable computation was spent
on zero subsets for σ² = 0.01 and σ² = 1.0 with Beta 2, and for σ² = 10 with Beta 4.
It may also be observed that D0, which represents the level of prior information,
has no effect on the number of processed subsets but does affect the occurrence of
zero subsets. In this study β and σ² are not very influential in the subsets obtained;
however, for σ² = 10 the posterior model probability of the correct subset is greater
than the others. During the study, high and unequal prior inclusion probabilities
were also assigned to the redundant variables in all β structures in order to observe
the effect of the variable priors on the process. As a result of this application, it
has been observed that prior probabilities assigned to the variables in this contrary
manner have no effect on the results.
Table 4.5: Results for the Gibbs Sampling Approach

                      σ² = 0.01                                   σ² = 10
              D0 = 1              D0 = 16              D0 = 1              D0 = 16
 Beta   Subset   Pos. P    Subset   Pos. P     Subset   Pos. P    Subset   Pos. P
         No.                No.                 No.                No.
  1        5      0.37        2      0.63         5      0.62        5      0.70
           2      0.31        5      0.27        10      0.12        2      0.21
          21      0.24        1      0.05        21      0.10       10      0.05
        Zero = 4,          Zero = 0,           Zero = 48,         Zero = 9,
        Processed = 16     Processed = 14      Processed = 14     Processed = 10
  2        5      0.72        5      0.66         5      0.74        5      0.70
          21      0.17        3      0.22        21      0.15        1      0.20
           3      0.05        6      0.03        10      0.05        2      0.05
        Zero = 46,         Zero = 389,         Zero = 57,         Zero = 125,
        Processed = 25     Processed = 19      Processed = 19     Processed = 19
  3        5      0.70        5      0.75         5      0.71        5      0.68
          21      0.10        2      0.19        21      0.14        2      0.26
           2      0.05       18      0.02         2      0.09       13      0.03
        Zero = 0,          Zero = 5,           Zero = 0,          Zero = 16,
        Processed = 13     Processed = 13      Processed = 10     Processed = 9
  4        5      0.64        5      0.59         5      0.80        5      0.88
          21      0.13        2      0.15        21      0.16        3      0.06
          26      0.11        3      0.12        26      0.03       21      0.03
        Zero = 0,          Zero = 26,          Zero = 48,         Zero = 448,
        Processed = 16     Processed = 18      Processed = 14     Processed = 19
5. Application
An application to air pollution data [3] is also used to compare the criteria. The
dependent variable is the quantity of SO2; there are 8 independent variables and
255 possible subsets for these data. In our study, the classical criteria Cp, BIC,
PRESSp, P (posterior model probability) and RIC selected the same subsets, given
as model numbers 162 and 163 in Table 5.1; AICc selected models 156 and 163.
The results for the Bayesian criteria are also shown in Table 5.1. The models
selected by the Bayesian criteria are quite similar to those of the classical selection.
While the Occam's Window method tends to select the larger subsets, the Gibbs
sampling approach of Kuo and Mallick tends to select subsets with fewer variables.
An advantage of Gibbs sampling is that it processed only 54 of the 255 subsets for
D0 = 1, and 42 subsets for D0 = 4. If models with fewer variables are preferred,
according to the parsimony rule or for the purpose of prediction, models 156 or 163
can be used; the parameters of these models can then be estimated by least squares
or by Bayesian methods.
Table 5.1: The Variables and Results for Bayesian Selection

                              Zellner             Occam's Window             Gibbs
 No.  Variables          Max. BF   Min. Fh      Pos. P1    Pos. P2    Pos. P (D0 = 1)  Pos. P (D0 = 4)
99 x2, x5, x7 0.22
156 x2, x5, x7, x8 200.0 0.07 0.20 0.33
162 x1, x2, x5, x6, x7, x8 162.2 0.07 0.179 0.160 0.22
163 x2, x5, x6, x7, x8 255.1 0.31 0.23
165 x1, x2, x3, x5, x6, x7, x8 0.02 0.179 0.160
170 x1, x2, x3, x4, x5, x6, x7, x8 0.176 0.157
1) Posterior probability for equal prior, 2) 0.10 prior probability for model 156.
6. Conclusion
As a result of this study, it can be claimed that the Bayesian approaches are
affected by changes in the priors, whereas the β and σ² structures generally have
no effect on the results. Moreover, the Bayesian approaches identify the correct
subset most of the time.
In Gibbs sampling, the prior inclusion probabilities defined for variable selection are
not influential. However, the prior information used in parameter estimation
directly affects the results, and the level of prior information also influences the
occurrence of zero subsets. It is further observed that in Gibbs sampling only a few
subsets, those with high posterior probability, need to be dealt with.
It has been observed that the risk criterion proposed by Yardımcı [17] prefers
subsets of smaller dimension than the other criteria, and that these subsets have
moderate posterior probabilities. In other words, analyzing and interpreting the
posterior probabilities of the low-risk subsets together will help the researcher to
determine the good subsets. In the Zellner and Occam's Window approaches, the
subset prior probabilities have a clear effect on the results. It has also been
observed that Occam's Window prefers subsets of larger dimension than both the
Zellner approach and Gibbs sampling.
The Bayesian approaches are somewhat more effective than all-possible-subset
selection, and they have the advantage that only the subsets with high posterior
probability need to be evaluated.
7. Appendix
List of subsets for k = 5
Subset Variables Subset Variables
No No
1 x1 17 x1, x4, x5
2 x1, x2 18 x1, x2, x4, x5
3 x2 19 x2, x4, x5
4 x2, x3 20 x2, x3, x4, x5
5 x1, x2, x3 21 x1, x2, x3, x4, x5
6 x1, x3 22 x1, x3, x4, x5
7 x3 23 x3, x4, x5
8 x3, x4 24 x3, x5
9 x1, x3, x4 25 x1, x3, x5
10 x1, x2, x3, x4 26 x1, x2, x3, x5
11 x2, x3, x4 27 x2, x3, x5
12 x2, x4 28 x2, x5
13 x1, x2, x4 29 x1, x2, x5
14 x1, x4 30 x1, x5
15 x4 31 x5
16 x4, x5
References
[1] Carlin, B. P. and Chib, S. Bayesian model choice via Markov chain Monte Carlo
methods, J. R. Statist. Soc. B 57 (3), 473–484, 1995.
[2] Cavanaugh, J. E. Unifying the derivations for the Akaike and corrected Akaike
information criteria, Statistics and Probability Letters 33, 201–208, 1997.
[3] Candan, M. The Robust Estimation in Linear Regression Analysis, Unpub-
lished MSc. Thesis (in Turkish), Hacettepe University, Ankara, 2000.
[4] Draper, N. and Smith, H. Applied Regression Analysis, John Wiley, 709 p,
1981.
[5] Foster, D. P. and George, E. I. The risk inflation criterion for multiple regres-
sion, The Annals of Statistics 22 (4), 1947–1975, 1995.
[6] Gamerman, D. Markov Chain Monte Carlo: Stochastic Simulation for
Bayesian Inference, Chapman and Hall, London, 245p, 1997.
[7] George, E. I. and McCulloch, R. E. Approaches for Bayesian variable selec-
tion, Statistica Sinica 7, 339–373, 1997.
[8] Hoeting, J. A. Accounting for model uncertainty in linear regression, PhD.
Thesis, University of Washington, 1994.
[9] Kass, R. E. and Raftery, A. E. Bayes factors, JASA 90, 773–795, 1995.
[10] Kuo, L. and Mallick, B. Variable selection for regression models, Sankhya B
60, 65–81, 1998.
[11] Lee, P. M. Bayesian Statistics: An Introduction, Oxford, 264p, 1989.
[12] Madigan, D. and Raftery, A. E. Model selection and accounting for model
uncertainty in graphical models using Occam’s window, JASA 89, 1535–1546,
1994.
[13] Moulton, B. I. A Bayesian approach to regression selection and estimation,
with application to price index for radio services, Journal of Econometrics 49,
169–193, 1991.
[14] Raftery, A., Madigan, D. and Hoeting, J. Model selection and accounting
for model uncertainty in linear regression models, University of Washington,
Tech. Report. No. 262., 24p, 1993.
[15] Toutenburg, H. Prior information in linear models, John Wiley Inc., 215p,
1982.
[16] Wasserman, L. Bayesian model selection, Symposium on Methods for Model
Selection, Indiana Uni., Bloomington, 1997.
[17] Yardımcı, A. Comparison of Bayesian Approaches to Variable Selection in
Linear Regression, Unpublished PhD. Thesis (in Turkish), Hacettepe Univer-
sity, Ankara, 2000.
[18] Yardımcı, A. and Erar, A. Bayesian variable selection in linear regression
and BARVAS Software Package (in Turkish), Second Statistical Congress,
Antalya, 386–390, 2001.
[19] Zellner, A. and Siow, A. Posterior odds ratios for selected regression hypotheses,
Bayesian Statistics, University Press, Valencia, 585–603, 1980.
 
Caco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorptionCaco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorption
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdf
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 

Bayesian Variable Selection in Linear Regression and A Comparison

  • 1. Hacettepe Journal of Mathematics and Statistics Volume 31 (2002), 63–76 BAYESIAN VARIABLE SELECTION IN LINEAR REGRESSION AND A COMPARISON Atilla Yardımcı∗ and Aydın Erar† Received 17. 04. 2002 Abstract In this study, Bayesian approaches, such as Zellner, Occam’s Window and Gibbs sampling, have been compared in terms of selecting the correct subset for the variable selection in a linear regression model. The aim of this comparison is to analyze Bayesian variable selection and the behavior of classical criteria by taking into consideration the different values of β and σ and prior expected levels. Key Words: Bayesian variable selection, Prior distribution, Gibbs Sampling, Markov Chain Monte Carlo. 1. Introduction A multiple linear regression equation with n observations and k independent vari- ables can be defined as Y = Xβ + , (1) where Yn×1 is a response vector; Xn×k is a non-stochastic input matrix with rank k , where k = k + 1; βk ×1 is an unknown vector of coefficients; n×1 is an error vector with E( ) = 0 and E( ) = σ2 In. Estimation and prediction equations can be defined as Y = Xˆβ + e and ˆY = Xˆβ, where e is the residual vector of the estimated model, and ˆβ is the estimator of the model parameter obtained using the Least Square Error (LSE) method or the Maximum Likelihood (ML) method under the assumption that is normally distributed. Thus the joint distribution function of the Y’s is obtained by maximizing the likelihood function as follows: L(β, σ/Y) = f(Y/β, σ, X) = 1 (2πσ2)n/2 exp − 1 2σ2 (Y − Xβ) (Y − Xβ) ˆβ = (X X)−1 X Y (2) ˆσ2 = (Y − Xˆβ) (Y − Xˆβ)/(n − k ) = SSE(ˆβ)/(n − k ) (3) ∗Ba¸skent University, Faculty of Science and Letters, Department of Statistics and Computer science, Ba˘glıca, Ankara. †Hacettepe University, Faculty of Science, Department of Statistics, Beytepe, Ankara. 63
The risk in (6), computed for every candidate subset, corresponds to the risk of the resulting estimates; the aim is to select the model with the smallest risk. Yardımcı [17] argues that risk and posterior model probability should be interpreted together: ideally one looks for a subset with a high posterior probability and a low risk, and since this condition is rarely satisfied exactly, preferences are made within acceptable limits [17]. Furthermore, the Risk Inflation Criterion (RIC) is evaluated for all candidate subsets using the equation defined by Foster and George [5],

    RIC = SSE_p + 2p\,MSE_p \log(k),                                   (7)

where SSE_p and MSE_p are the error sum of squares and the mean square error for subset p, respectively. RIC is claimed to attain its lowest values for the correct variables and thus to give more consistent conclusions when trying to select them [17].
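To make the use of these subset criteria concrete, the following sketch evaluates AIC, AICc, BIC, the risk in (6) and the RIC in (7) for a single candidate subset. It is only a minimal illustration under a Gaussian likelihood with least-squares fitting, not the code used in this study; the function and variable names (subset_criteria, X, y, cols) are illustrative, and AIC is written in the common n log(SSE/n) + 2p form rather than exactly as in (4).

```python
import numpy as np

def subset_criteria(X, y, cols):
    """Classical selection criteria for the subset given by the column indices `cols`.

    A minimal sketch (Gaussian likelihood, least-squares fit); an intercept, if
    wanted, should be included as a column of ones in X. Names are illustrative.
    """
    n = len(y)
    Xp = X[:, cols]
    p = Xp.shape[1]                                   # number of parameters in the subset
    xtx_inv = np.linalg.inv(Xp.T @ Xp)
    beta = xtx_inv @ Xp.T @ y                         # least squares estimate for the subset
    resid = y - Xp @ beta
    sse = float(resid @ resid)                        # SSE_p
    mse = sse / (n - p)                               # MSE_p
    loglik = -0.5 * n * (np.log(2 * np.pi * sse / n) + 1)   # maximised Gaussian log-likelihood

    aic = n * np.log(sse / n) + 2 * p                        # a common Gaussian form of AIC
    aicc = aic + 2 * (p + 1) * (p + 2) / (n - p - 2)         # small-sample correction, cf. (5)
    bic = loglik - 0.5 * p * np.log(n)                       # Schwarz approximation, to be maximised
    risk = mse * np.trace(mse * xtx_inv)                     # sigma^2 tr(V(beta_hat)), cf. (6), with MSE_p for sigma^2
    ric = sse + 2 * p * mse * np.log(X.shape[1])             # Risk Inflation Criterion, cf. (7)
    return {"AIC": aic, "AICc": aicc, "BIC": bic, "Risk": risk, "RIC": ric}
```

Ranking all candidate subsets on each criterion, as is done later for the 31 subsets of the artificial data, then reduces to a loop over the possible choices of cols.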
3. Bayesian Variable Selection Approaches

Bayesian variable selection approaches make use of hypothesis testing and stochastic search methods in linear regression. Apart from these, there are various methods based on selecting the subset whose posterior subset probability is highest.

3.1. Posterior Odds and the Zellner Approach

Bayesian approaches to hypothesis testing are quite different from classical likelihood ratio tests. The difference comes from the use of prior probabilities and distributions. The researcher has to define prior probabilities P(H_0) and P(H_1) for the correctness of the hypotheses, where H_0 denotes the null hypothesis and H_1 the alternative hypothesis. To choose one of the hypotheses, posterior probabilities have to be assessed after the prior probabilities have been defined. The Bayes factor of H_0 against H_1 has been given by Lee [11] as

    BF = \frac{p(H_0 \mid y)\,/\,p(H_1 \mid y)}{P(H_0)\,/\,P(H_1)}.                              (8)

Thus the Bayes factor is the ratio of the posterior odds of H_0 to its prior odds. If the prior probabilities of H_0 and H_1 are equal, that is P(H_0) = P(H_1) = 0.5, then the Bayes factor equals the posterior odds of H_0. Although the interpretation of Bayesian hypothesis testing depends on the nature of the research, the scale of Kass and Raftery [9] is often used.

In the Zellner approach, the k independent variables in the model are divided into two sets. The first set contains the variables that are needed in the model and is denoted by X_1 and \beta_1; the second set contains the variables that should be removed from the model and is denoted by X_2 and \beta_2. Here X_1 and X_2 are n \times k_1 and n \times k_2 matrices respectively, where k = k_1 + k_2; the vectors \beta_1 and \beta_2 have dimensions k_1 \times 1 and k_2 \times 1, and I denotes the unit vector. The model can be rewritten as

    Y = \alpha I + X_1 \eta + V \beta_2 + \varepsilon,

where V = [I - X_1(X_1'X_1)^{-1}X_1']X_2, \eta = \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2 and X_1'V = 0 [19]. The hypotheses to be tested for this division of the variables into two sets are H_0 : \beta_2 = 0 and H_1 : \beta_2 \neq 0. When the prior probabilities for the rejection or acceptance of the hypothesis are defined to be equal and the prior beliefs about the parameters are noninformative, the posterior odds ratio BF_{01} is given by Moulton [13] as

    BF_{01} = b\,(v_1/2)^{k_2/2}\left(\frac{v_1 s_1^2}{v_0 s_0^2}\right)^{(v_1-1)/2}
            = \frac{b\,(v_1/2)^{k_2/2}}{\left[1 + (k_2/v_1)\,F_{k_2,v_1}\right]^{(v_1-1)/2}},        (9)

where

    \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i,  \qquad  \hat{\eta} = (X_1'X_1)^{-1}X_1'Y,
    v_0 s_0^2 = (Y - \bar{y}I - X_1\hat{\eta})'(Y - \bar{y}I - X_1\hat{\eta}),  \qquad  v_0 = n - k_1 - 1,
    \hat{\beta}_2 = (V'V)^{-1}V'Y,
    v_1 s_1^2 = (Y - \bar{y}I - X_1\hat{\eta} - V\hat{\beta}_2)'(Y - \bar{y}I - X_1\hat{\eta} - V\hat{\beta}_2),  \qquad  v_1 = n - k_1 - k_2 - 1,
    b = \sqrt{\pi}\,/\,\Gamma[(k_2+1)/2],

and F_h = F_{k_2,v_1} = \hat{\beta}_2'V'V\hat{\beta}_2 / (k_2 s_1^2) is the F-test statistic. Thus the posterior odds ratio given by equation (9) can be used as the Bayesian test for deciding which variables are to be included in the model and which are to be removed [19].
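The posterior odds in (9) are easy to evaluate once the two residual sums of squares are available. The sketch below is a minimal illustration of that computation, not the program used in this study; it assumes the intercept α is handled implicitly by centring the response, and the names (zellner_bf01, X1, X2, y) are illustrative.

```python
import numpy as np
from math import gamma, pi, sqrt

def zellner_bf01(X1, X2, y):
    """Posterior odds BF01 for H0: beta2 = 0 against H1: beta2 != 0, following (9).

    A rough sketch under the equal-prior-odds, noninformative setting described
    above: X1 holds the k1 retained variables and X2 the k2 doubtful ones
    (neither containing the intercept column).
    """
    n, k1 = X1.shape
    k2 = X2.shape[1]
    eta_hat = np.linalg.solve(X1.T @ X1, X1.T @ y)
    r0 = y - y.mean() - X1 @ eta_hat                        # residuals under H0
    V = X2 - X1 @ np.linalg.solve(X1.T @ X1, X1.T @ X2)     # part of X2 orthogonal to X1
    beta2_hat = np.linalg.solve(V.T @ V, V.T @ y)
    r1 = r0 - V @ beta2_hat                                 # residuals under H1
    v1 = n - k1 - k2 - 1
    s1_sq = float(r1 @ r1) / v1
    b = sqrt(pi) / gamma((k2 + 1) / 2)
    Fh = float(beta2_hat @ (V.T @ V) @ beta2_hat) / (k2 * s1_sq)
    bf01 = b * (v1 / 2) ** (k2 / 2) / (1 + (k2 / v1) * Fh) ** ((v1 - 1) / 2)
    return bf01, Fh
```

Large values of BF01 favour removing the doubtful block, while the accompanying Fh can be read against the F table, as in the comparisons of Table 4.3.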
3.2. Occam's Window Approach

In this approach, if a model is not as strong as the others in terms of its posterior odds, then the model and its submodels are eliminated from consideration. If the posterior probabilities of two models are equal, the simpler one is accepted (the parsimony rule). Hoeting [8] claims that the Occam's Window (OW) approach is useful when the aim is to interpret the relationship between the variables and to make quick assessments.

If q = 2^k is the number of all possible models, then M = \{M_1, \ldots, M_q\} denotes the set of all possible subsets. The posterior probability of the subset M_m is obtained from

    p(Y \mid M_m) = \int p(Y \mid \beta, M_m)\, p(\beta \mid M_m)\, d\beta.                      (10)

Here p(Y \mid \beta, M_m) is the marginal likelihood of the subset M_m, \beta is the parameter vector, and p(\beta \mid M_m) is the prior distribution of \beta under the subset M_m. With the help of Bayes' theorem, equation (10) can be rewritten for a quantity of interest \Delta, such as a future observation or the loss attached to an event, as

    p(\Delta \mid Y, M_m) = \int p(\Delta \mid Y, \beta_m, M_m)\, p(\beta_m \mid Y, M_m)\, d\beta_m,

where \beta_m is the coefficient vector of the subset M_m [9]. The OW approach has two main characteristics. First, a subset that is much less well supported than the best subset under consideration is eliminated [12]; the set of eliminated subsets is defined by Hoeting [8] as

    A = \Bigl\{ M_m : \frac{\max_i p(M_i \mid Y)}{p(M_m \mid Y)} > O_L \Bigr\},

where O_L is determined by the researcher. The second characteristic is that it identifies, with the support of the data, simpler alternative subsets; for this search

    B = \Bigl\{ M_m : \exists\, M_i \in A,\; M_i \subset M_m,\; \frac{p(M_i \mid Y)}{p(M_m \mid Y)} > O_R \Bigr\}

is used [14]. Note that O_R is also determined by the researcher and is generally taken to be 1 [8]. The posterior distribution of \Delta is then averaged over the retained models as [8]

    p(\Delta \mid Y) = \frac{\sum_{M_m \in A} p(\Delta \mid M_m, Y)\, p(Y \mid M_m)\, p(M_m)}{\sum_{M_m \in A} p(Y \mid M_m)\, p(M_m)}.

In the OW approach the bounds O_L and O_R are determined by subjective belief; for example, Madigan and Raftery [12] accepted O_L = 1/20 and O_R = 1. In an application of this approach, model selection is done in two stages: first a plausible set of models is chosen from all 2^k - 1 possible models, and then "up or down" algorithms are used to add variables to, or remove variables from, these models [8].
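Reading the two conditions in the usual Madigan and Raftery sense (a model is discarded when it is far less probable than the best model, or when one of its own sub-models is at least as well supported), the screening stage can be sketched as follows. This is only an illustration under those assumptions, with pre-computed posterior model probabilities and purely illustrative names (occams_window, post_probs, subsets); it is not the two-stage up-or-down search mentioned above.

```python
def occams_window(post_probs, subsets, OL=1/20, OR=1.0):
    """Screen candidate models in the spirit of Occam's Window.

    A minimal sketch: `post_probs[m]` is a pre-computed posterior model
    probability p(M_m | Y) and `subsets[m]` is the set of variable indices of
    model m.  A model is dropped when it is much less probable than the best
    model (threshold OL) or when one of its own sub-models is at least OR
    times as probable (parsimony).
    """
    best = max(post_probs.values())
    kept = []
    for m, pm in post_probs.items():
        if pm / best < OL:          # far weaker than the best model
            continue
        simpler_wins = any(
            subsets[i] < subsets[m] and post_probs[i] / pm > OR
            for i in post_probs
        )
        if not simpler_wins:
            kept.append(m)
    return kept
```

The surviving models are the ones carried into the model-averaged quantity p(Δ | Y) above, with weights proportional to p(Y | M_m) p(M_m).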
3.3. The Gibbs Sampling Approach

Bayesian approaches concentrate on the determination of posterior probabilities or distributions. In some cases, however, the integrals needed to evaluate the posterior moments are analytically intractable or hard to compute. In such cases, approaches based on constructing Markov chains and exploiting their convergence properties are used to obtain the posterior distributions [17]. Combining Markov chains with stochastic simulation in this way leads to the Markov Chain Monte Carlo (MCMC) approaches. The aim of MCMC is to create a random walk that converges to the target posterior distribution [1]. In an MCMC scheme, a sequence \theta^{(j)} converging to the posterior distribution is produced, in which \theta^{(j+1)} depends on the previous value \theta^{(j)}; sampling is carried out using the Markov transition distribution g(\theta \mid \theta^{(j)}) [7]. For a given posterior distribution, there are various ways of drawing samples from the transition distribution and processing this information; Metropolis algorithms and Gibbs sampling are the standard constructions by which such samples form a Markov chain with the required limiting distribution.

In Gibbs sampling, starting values (x_1^{(0)}, x_2^{(0)}, \ldots, x_k^{(0)}) are chosen at j = 0, and new points x^{(j)} = (x_1^{(j)}, x_2^{(j)}, \ldots, x_k^{(j)}) are generated from the state x^{(j-1)}; setting j = j + 1, the process continues until convergence is obtained [6]. Once convergence is achieved, the points x^{(j)} are draws from p(x_1, \ldots, x_k \mid y).

There are various Bayesian variable selection approaches that depend on Gibbs sampling. Here the approach of Kuo and Mallick [10], which is used in this study, will be explained. An indicator vector \gamma = (\gamma_1, \ldots, \gamma_k) is used to label the 2^k - 1 possible subsets of independent variables in the model: \gamma_j = 1 if the j-th variable is in the model and \gamma_j = 0 otherwise. Since \gamma is unknown, it has to be included in the Bayesian formulation, and the joint prior used in the variable selection process is defined as

    p(\beta, \sigma, \gamma) = p(\beta \mid \sigma, \gamma)\, p(\sigma \mid \gamma)\, p(\gamma).

Here the prior defined by Kuo and Mallick [10] for the coefficients is assumed to be \beta \mid \sigma, \gamma \sim N(\beta_0, D_0 I). In addition, a noninformative prior for \sigma is defined by

    p(\sigma) \propto \frac{1}{\sigma}.
The prior distribution of \sigma is obtained from the inverted gamma distribution under the limiting assumptions \alpha \to 0 and \eta \to 0. The noninformative prior for \gamma is given as \gamma_j \sim \mathrm{Bernoulli}(p_j). Inference concentrates on these distributions, since the marginal posterior distribution p(\gamma \mid Y) contains all the information needed for variable selection, and Kuo and Mallick [10] show that p(\gamma \mid Y) can be estimated using Gibbs sampling. Moreover, the least squares estimates for the full model can be used as the starting values \beta^{(0)} and \sigma^{(0)} needed for this process, with \gamma_j = 1 (j = 1, \ldots, k); initially we may take \gamma^{(0)} = (1, 1, \ldots, 1)'. The full conditional density of \gamma_j can then be given as

    p(\gamma_j \mid \gamma_{-j}, \beta, \sigma, Y) = \mathrm{Bernoulli}(\tilde{p}_j), \qquad j = 1, 2, \ldots, k,

where, with \gamma_{-j} = (\gamma_1, \ldots, \gamma_{j-1}, \gamma_{j+1}, \ldots, \gamma_k),

    \tilde{p}_j = \frac{c_j}{c_j + d_j},
    c_j = p_j \exp\Bigl[-\frac{1}{2\sigma^2}(Y - X v_j^{*})'(Y - X v_j^{*})\Bigr],
    d_j = (1 - p_j) \exp\Bigl[-\frac{1}{2\sigma^2}(Y - X v_j^{**})'(Y - X v_j^{**})\Bigr].

Here the vector v_j^{*} is the vector \nu, whose elements are \gamma_j\beta_j, with its j-th entry replaced by \beta_j, and similarly v_j^{**} is the vector \nu with its j-th entry replaced by 0. Thus c_j is proportional to the probability of accepting the j-th variable and d_j to the probability of rejecting it.

4. Comparison of the Selection Criteria with Artificial Data

For a comparison of the selection criteria, in parallel to the studies of Kuo and Mallick [10] and George and McCulloch [7], artificial data were produced in this study for n = 20 and k = 5. The variables X_j were assumed to be normally distributed, N(0, 1). Four types of β structure (Beta 1, ..., Beta 4) were determined in order to observe the effect of the relative importance of the variables in the model on selection:

    Beta 1 : (3, 2, 1, 0, 0)
    Beta 2 : (1, 1, 1, 0, 0)
    Beta 3 : (3, -2, 1, 0, 0)
    Beta 4 : (1, -1, 1, 0, 0)

On the other hand, the levels σ² = 0.01, 1.0, 10 were chosen to see the effect of σ² on the results; the results for σ² = 1.0 are not given in the tables, but they are mentioned in the interpretations. To observe the effect of the prior information level on the results, the prior assumptions given in Table 4.1 were used. Furthermore, the errors were assumed to satisfy e_i \sim N(0, \sigma^2). The list of all possible subsets for 5 independent variables is given in the Appendix.
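Before turning to the prior settings and the results, the indicator update of Section 3.3 can be sketched for one artificial data set of the kind just described (the Beta 1 structure, with σ² = 1 assumed here). It is only a rough illustration, not the BARVAS implementation: for brevity, β and σ are held fixed at their full-model least-squares values, which the text uses only as starting values, whereas a full sampler would also draw them from their conditional distributions at every iteration; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Artificial data in the spirit of Section 4 (Beta 1 structure; sigma^2 = 1 assumed here).
n, k = 20, 5
beta_true = np.array([3.0, 2.0, 1.0, 0.0, 0.0])
X = rng.standard_normal((n, k))
y = X @ beta_true + rng.standard_normal(n)

# Full-model least-squares values used as starting (and, in this sketch, fixed) values.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
sigma2 = float((y - X @ beta_hat) @ (y - X @ beta_hat)) / (n - k)

p_prior = np.full(k, 0.5)          # equal prior probability for each variable
gamma = np.ones(k, dtype=int)      # gamma^(0) = (1, ..., 1)'
counts = np.zeros(k)
n_iter, burn_in = 10_000, 1_000

for it in range(n_iter):
    for j in range(k):
        nu = gamma * beta_hat
        v_star, v_2star = nu.copy(), nu.copy()
        v_star[j], v_2star[j] = beta_hat[j], 0.0
        # Full conditional of gamma_j: Bernoulli(c_j / (c_j + d_j)); the comparison is
        # done on the log scale for numerical stability.
        log_c = np.log(p_prior[j]) - ((y - X @ v_star) ** 2).sum() / (2 * sigma2)
        log_d = np.log(1 - p_prior[j]) - ((y - X @ v_2star) ** 2).sum() / (2 * sigma2)
        p_tilde = 1.0 / (1.0 + np.exp(log_d - log_c))
        gamma[j] = rng.binomial(1, p_tilde)
    if it >= burn_in:
        counts += gamma

print("posterior inclusion frequencies:", counts / (n_iter - burn_in))
```

Indicator configurations visited along the chain correspond to visited subsets, and their relative frequencies estimate p(γ | Y); this is how posterior subset probabilities of the kind reported in Table 4.5 are obtained.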
Table 4.1. Prior assumptions

    Bayesian approach    Prior assumption
    Zellner              Equal subset prior probability
    Occam's Window       a. Equal subset prior probability.
                         b. For the 2nd and 5th subsets, the prior probability is 0.10.
                         c. For the 26th subset, the prior probability is 0.10.
    Gibbs Sampling       a. D0 = 1 and D0 = 16.
                         b. Equal prior probability for the variables.
                         c. Different prior probabilities for the variables.

During the comparison of these approaches, the BARVAS (Bayesian Regression-Variable Selection) computer program, developed especially for the Bayesian variable selection process by Yardımcı [17], is used. The BARVAS program can be used for Bayesian regression and variable selection in linear regression, and it is a package program in which the classical variable selection criteria can also be used alongside the Bayesian ones [18].

4.1. Classical Variable Selection

In the comparison of classical variable selection, the well-known criteria (Cp, AICc, BIC, Press) will be referred to as Group I, and posterior model probability, Risk and RIC as Group II. The results of classical variable selection for the data are summarized in Table 4.2, where the criteria are classified among themselves and the best three subsets are given. It has been observed that the selection criteria in Group I are not affected by β and σ². For the β structures under consideration, all the criteria in Group I find the best subset, which is subset number 5; subsets 10 and 26 also appear among the best selections for all criteria. The AICc criterion, because of its structure, is more appropriate for a model with fewer variables. The Group II criteria give results parallel to the other criteria. The subset posterior model probability (P) gave the same classification as Cp in all structural cases. The risk criterion gives better results for subsets with fewer variables: although the subsets with lower risk accord with the other criteria, these subsets have lower dimension for σ² = 0.01. RIC gives results similar to Group I, and an analysis of the different classifications shows that the RIC values are very close to one another.

4.2. Bayesian Approaches

The Bayesian variable selection approaches, computed using the BARVAS program, are also given in Table 4.2; the numbers in the table are the numbers of the selected models. The detailed results for the Bayesian approaches are given as follows.
Table 4.2. Numbers of the models selected through the classical and Bayesian approaches (see the Appendix for these numbers); each cell gives the best three subsets. Columns Cp to RIC are the classical criteria; BF and Fh refer to the Zellner approach, OW to Occam's Window with equal priors, and the last two columns to Gibbs sampling with D0 = 1 and D0 = 16.

    σ²     Beta   Cp  AICc  BIC  Press   P   Risk  RIC   BF   Fh   OW   Gibbs(D0=1)  Gibbs(D0=16)
    0.01    1      5    2    5     5     5    1     5     5   10   10        5            2
                  10    1   10     2    10    2    10     2   26    5        2            5
                   2    5   26    13    26    5    26    10    5   21       21            1
            2      5    3    5     5     5    3     5     5   10    5        5            5
                  10    5   10    26    10    4    10    10   26   10       21            3
                  26    4   26    10    26    7    26    26    5   26        3            6
            3      5    5    5     5     5    5     5     5   26   26        5            5
                  26    2   26    26    26   10    10    10   10    5       21            2
                  10   26   10    10    10    2    26    26    5   21        2           18
            4      5    5    5     5     5    7     5     5   26   26        5            5
                  26    6   26    26    26    6    10    26   10   21       21            2
                  10   26   10    10    10    5    26    10    5    5       26            3
    10      1      5    5    5     5     5    5     5     5   10   10        5            5
                  10    2   10    26    10   10    26    19   26   21       10            2
                  26   10   26    10    26   26    10    26    5    5       21           10
            2      5    5    5     5     5    5     5     5   10   10        5            5
                  10    1   10    26    10   26    26    10   26   26       21            1
                  26    2   26    10    26   10    10    26    5   21       10            2
            3      5    2    5     5     5    5     5     5   26    5        5            5
                  26    5   26    10    26   26    26    10   10   26       21            2
                  10   13   10     2    10    2    10    26    5   10        2           13
            4      5    5    5     5     5    5     5     5   10   10        5            5
                  10   10   10    26    10   10    10    10   26    5       21            3
                  26   26   26    10    26   26    26    26    5   21       26           21

The Zellner Approach

The Zellner approach strongly preferred the best subset, number 5, for all the different cases of β and σ². The BF values for subset 5 are clearly separated from those of the alternative subsets 10 and 26; this difference is also reflected in the interpretations of Jeffreys and of Kass and Raftery. The results of the Zellner approach on the standardized data are given in Table 4.3. When the Fh values are considered, it is observed that the alternative subsets 10 and 26 are preferred; since these subsets have a larger dimension than the best subset, they give better results for the purpose of this study. In general, however, when the Zellner approach is used with normally distributed variables, β and σ² have no effect.
Table 4.3. Results of the Zellner approach

                   σ² = 0.01                                σ² = 10.0
            for max BF          for min Fh           for max BF          for min Fh
    Beta  Subset   BF    J  KR  Subset   Fh    T    Subset   BF    J  KR  Subset   Fh    T
           No.                   No.                 No.                   No.
      1      5   13.68   S  S     10    0.09   A       5   12.16   S  H     10    0.02   A
             2    7.17   H  H     26    0.31   A      10    4.46   H  H     26    0.57   A
            10    4.50   H  H      5    0.39   A      26    3.61   H  H      5    0.65   A
      2      5   15.29   S  S     10    0.06   A       5   13.37   S  S     10    0.11   A
            10    4.55   H  H     26    0.11   A      10    4.48   H  H     26    0.20   A
            26    4.46   H  H      5    0.14   A      26    4.29   H  H      5    0.44   A
      3      5   14.52   S  S     26    0.04   A       5   15.50   S  S     26    0.00   A
            10    4.59   H  H     10    0.24   A      10    4.69   H  H     10    0.10   A
            26    4.21   H  H      5    0.25   A      26    4.47   H  H      5    0.11   A
      4      5    0.38   H  H     26    0.04   A       5   13.74   S  S     10    0.02   A
            26    4.60   H  H     10    1.22   A      10    4.65   H  H     26    0.37   A
            10    2.27   W  H      5    1.25   A      26    3.96   H  H      5    0.38   A

    J: Jeffreys, KR: Kass and Raftery, T: according to the F table; A: Acceptance, S: Strong, H: High, W: Weak, R: Rejection.

Occam's Window Approach

The OW approach primarily determines the subsets to be studied, rather than a single best subset; the researcher then chooses the subset suited to his aims. For the artificial data, the posterior odds were calculated with OL = 0.5 and OR = 1, and at the first stage the prior probability of every subset was taken to be equal. The OW approach evaluates the full set as a good model, but it concentrates more on larger dimensional subsets. The results for the OW approach are given in Table 4.4. In this study, the best subset, number 5, is generally selected, although the posterior probabilities of the subsets in the classification are close to one another. It may be observed that the OW approach is not much affected by changes in the β and σ² structures. At the second stage, Cases 1 and 2 were defined to observe the effect of the prior probabilities on the results. In Case 1, the prior probability of subsets 2 and 5 is 0.10; in Case 2, a prior probability of 0.10 is given to subset 26; the prior probabilities of the remaining subsets are 0.90/29 in Case 1 and 0.90/30 in Case 2. It may be observed that the subset prior probabilities have a considerable effect on the OW results, the effect in Case 1 being larger than in Case 2. In Case 1, the prior probability assigned to subset 2 is not sufficient for it to enter the classification, but subset 5 attains a high posterior probability.

The Gibbs Sampling Approach

The Gibbs sampling approach of Kuo and Mallick [10] is applied using the BARVAS program, and the results are given in Table 4.5. Gibbs sampling is applied for all cases using the simulated data over 10,000 iterations; the sampling time was approximately two minutes for each combination of the σ², β and D0 levels.
Since the prior parameter estimate is taken as β0 = 0, noninformative prior information is used in the Kuo and Mallick approach to Gibbs sampling.

Table 4.4. The results for the Occam's Window approach

    σ² = 0.01
             Equal prior              Case 1                  Case 2
    Beta  Subset No.  Pos. P    Subset No.  Pos. P      Subset No.  Pos. P
      1       10       0.25         5        0.50           26       0.47
               5       0.23        10        0.15           10       0.17
              21       0.21        21        0.13            5       0.15
      2        5       0.27         5        0.58           26       0.52
              10       0.25        10        0.15            5       0.17
              26       0.24        26        0.14           10       0.16
      3       26       0.27         5        0.56           26       0.56
               5       0.26        26        0.16            5       0.16
              21       0.23        21        0.14           21       0.14
      4       26       0.36         5        0.44           26       0.65
              21       0.30        26        0.24           21       0.17
               5       0.18        21        0.21            5       0.09

    σ² = 10.0
             Equal prior              Case 1                  Case 2
    Beta  Subset No.  Pos. P    Subset No.  Pos. P      Subset No.  Pos. P
      1       10       0.31         5        0.51           26       0.46
              21       0.26        10        0.20           10       0.21
               5       0.22        21        0.17           21       0.17
      2       10       0.27         5        0.53           26       0.52
              26       0.25        10        0.17           10       0.16
              21       0.24        26        0.15           21       0.15
      3        5       0.26         5        0.56           26       0.53
              26       0.25        26        0.15            5       0.17
              10       0.23        10        0.14           10       0.15
      4       10       0.28         5        0.55           26       0.48
               5       0.25        10        0.17           10       0.19
              21       0.24        21        0.14            5       0.17

    Case 1: a prior probability of 0.10 is given to subsets 2 and 5; Case 2: a prior probability of 0.10 is given to subset 26. Pos. P: subset posterior probability.

The effect on the results has been observed by using the prior variance values D0 = 1 and 16. The number of zero subsets reached (Zero) and the number of subsets actually processed (Processed) are reported by the BARVAS program for Gibbs sampling. It may be observed that the number of zero subsets is affected by the β structure.
A considerable amount of computation time was needed to deal with the zero subsets for σ² = 0.01 and σ² = 1.0 with Beta 2, and for σ² = 10 with Beta 4. It may be observed that D0, which represents the prior information level, has no effect on the number of processed subsets but does affect the appearance of zero subsets. It may also be observed that β and σ² are not very influential on the subsets obtained; however, for σ² = 10 the posterior model probability of the correct subset is greater than the others. During this study, high and differing prior probabilities were assigned to the redundant variables under all β structures in order to observe the effect of the individual variables on the process; it was observed that the prior probabilities assigned to the variables in the model in this way have no effect on the results.

Table 4.5. The results for the Gibbs sampling approach

                      σ² = 0.01                                  σ² = 10
               D0 = 1              D0 = 16               D0 = 1              D0 = 16
    Beta   Subset  Pos. P      Subset  Pos. P        Subset  Pos. P      Subset  Pos. P
            No.                 No.                   No.                 No.
      1       5     0.37          2     0.63            5     0.62          5     0.70
              2     0.31          5     0.27           10     0.12          2     0.21
             21     0.24          1     0.05           21     0.10         10     0.05
           Zero = 4,           Zero = 0,             Zero = 48,          Zero = 9,
           Processed = 16      Processed = 14        Processed = 14      Processed = 10
      2       5     0.72          5     0.66            5     0.74          5     0.70
             21     0.17          3     0.22           21     0.15          1     0.20
              3     0.05          6     0.03           10     0.05          2     0.05
           Zero = 46,          Zero = 389,           Zero = 57,          Zero = 125,
           Processed = 25      Processed = 19        Processed = 19      Processed = 19
      3       5     0.70          5     0.75            5     0.71          5     0.68
             21     0.10          2     0.19           21     0.14          2     0.26
              2     0.05         18     0.02            2     0.09         13     0.03
           Zero = 0,           Zero = 5,             Zero = 0,           Zero = 16,
           Processed = 13      Processed = 13        Processed = 10      Processed = 9
      4       5     0.64          5     0.59            5     0.80          5     0.88
             21     0.13          2     0.15           21     0.16          3     0.06
             26     0.11          3     0.12           26     0.03         21     0.03
           Zero = 0,           Zero = 26,            Zero = 48,          Zero = 448,
           Processed = 16      Processed = 18        Processed = 14      Processed = 19

5. Application

An application using air pollution data [3] is also used for comparing the criteria. The dependent variable is the quantity of SO2; there are 8 independent variables and hence 255 subsets for these data. In our study, the classical criteria Cp, BIC, PRESSp, P (posterior model probability) and RIC selected the same subsets, given as models 162 and 163 in Table 5.1.
AICc selected models 156 and 163. The results of the Bayesian criteria are also shown in Table 5.1; the models selected by the Bayesian criteria are fairly similar to those of the classical selection. While the Occam's Window method tends to select the larger subsets, the Gibbs sampling of Kuo and Mallick tends to select subsets with fewer variables. An advantage of Gibbs sampling is that it processed only 54 of the 255 subsets for D0 = 1, and 42 subsets for D0 = 4. If a model with fewer variables is preferred, according to the parsimony rule or for the purpose of prediction, models 156 or 163 can be used; the parameters of these models can then be estimated by least squares or by Bayesian methods.

Table 5.1. The variables and results for the Bayesian selection; the reported quantities are, for the Zellner approach, Max. BF and Min. Fh, for Occam's Window, Pos. P 1) and Pos. P 2), and for Gibbs sampling, Pos. P with D0 = 1 and with D0 = 4.

    No.   Variables                              Reported values
     99   x2, x5, x7                             0.22
    156   x2, x5, x7, x8                         200.0, 0.07, 0.20, 0.33
    162   x1, x2, x5, x6, x7, x8                 162.2, 0.07, 0.179, 0.160, 0.22
    163   x2, x5, x6, x7, x8                     255.1, 0.31, 0.23
    165   x1, x2, x3, x5, x6, x7, x8             0.02, 0.179, 0.160
    170   x1, x2, x3, x4, x5, x6, x7, x8         0.176, 0.157

    1) Posterior probability under equal priors; 2) a prior probability of 0.10 is given to model 156.

6. Conclusion

As a result of this study, it can be claimed that the Bayesian approaches are affected by changes in the priors. Generally, the β and σ² structures do not have any effect on the results, and the Bayesian approaches identify the correct subset most of the time. In Gibbs sampling, the prior probabilities defined for variable selection are not effective; however, it can be said that the prior information used in parameter estimation directly affects the results, and the prior information level is also influential in the appearance of zero subsets. In Gibbs sampling it is further observed that only a few subsets with high posterior probability are dealt with. The risk criterion proposed by Yardımcı [17] prefers subsets of smaller dimension than the other criteria, and these subsets have average posterior probabilities; in other words, analyzing and interpreting the posterior probabilities of subsets with low risk together will help the researcher to determine the good subsets. In the Zellner and Occam's Window approaches, the subset prior probability has an effect on the results, and Occam's Window prefers subsets of larger dimension than both the Zellner approach and Gibbs sampling. The Bayesian approaches are somewhat, though not greatly, more effective than all possible subset selection, and they have the advantage that only the subsets with a high posterior probability need to be evaluated.
7. Appendix

List of subsets for k = 5:

    Subset No.  Variables              Subset No.  Variables
        1       x1                         17       x1, x4, x5
        2       x1, x2                     18       x1, x2, x4, x5
        3       x2                         19       x2, x4, x5
        4       x2, x3                     20       x2, x3, x4, x5
        5       x1, x2, x3                 21       x1, x2, x3, x4, x5
        6       x1, x3                     22       x1, x3, x4, x5
        7       x3                         23       x3, x4, x5
        8       x3, x4                     24       x3, x5
        9       x1, x3, x4                 25       x1, x3, x5
       10       x1, x2, x3, x4             26       x1, x2, x3, x5
       11       x2, x3, x4                 27       x2, x3, x5
       12       x2, x4                     28       x2, x5
       13       x1, x2, x4                 29       x1, x2, x5
       14       x1, x4                     30       x1, x5
       15       x4                         31       x5
       16       x4, x5

References

[1] Carlin, B. P. and Chib, S. Bayesian model choice via Markov chain Monte Carlo methods, J. R. Statist. Soc. B 57 (3), 473–484, 1995.
[2] Cavanaugh, J. E. Unifying the derivations for the Akaike and corrected Akaike information criteria, Statistics and Probability Letters 33, 201–208, 1997.
[3] Candan, M. The Robust Estimation in Linear Regression Analysis, Unpublished MSc. Thesis (in Turkish), Hacettepe University, Ankara, 2000.
[4] Draper, N. and Smith, H. Applied Regression Analysis, John Wiley, 709 p., 1981.
[5] Foster, D. P. and George, E. I. The risk inflation criterion for multiple regression, The Annals of Statistics 22 (4), 1947–1975, 1995.
[6] Gamerman, D. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference, Chapman and Hall, London, 245 p., 1997.
[7] George, E. I. and McCulloch, R. E. Approaches for Bayesian variable selection, Statistica Sinica 7, 339–373, 1997.
[8] Hoeting, J. A. Accounting for Model Uncertainty in Linear Regression, PhD. Thesis, University of Washington, 1994.
[9] Kass, R. E. and Raftery, A. E. Bayes factors, JASA 90, 773–795, 1995.
[10] Kuo, L. and Mallick, B. Variable selection for regression models, Sankhya B 60, 65–81, 1998.
[11] Lee, P. M. Bayesian Statistics: An Introduction, Oxford, 264 p., 1989.
[12] Madigan, D. and Raftery, A. E. Model selection and accounting for model uncertainty in graphical models using Occam's window, JASA 89, 1535–1546, 1994.
[13] Moulton, B. I. A Bayesian approach to regression selection and estimation, with application to a price index for radio services, Journal of Econometrics 49, 169–193, 1991.
[14] Raftery, A., Madigan, D. and Hoeting, J. Model Selection and Accounting for Model Uncertainty in Linear Regression Models, University of Washington, Tech. Report No. 262, 24 p., 1993.
[15] Toutenburg, H. Prior Information in Linear Models, John Wiley Inc., 215 p., 1982.
[16] Wasserman, L. Bayesian model selection, Symposium on Methods for Model Selection, Indiana University, Bloomington, 1997.
[17] Yardımcı, A. Comparison of Bayesian Approaches to Variable Selection in Linear Regression, Unpublished PhD. Thesis (in Turkish), Hacettepe University, Ankara, 2000.
[18] Yardımcı, A. and Erar, A. Bayesian variable selection in linear regression and the BARVAS software package (in Turkish), Second Statistical Congress, Antalya, 386–390, 2001.
[19] Zellner, A. and Siow, A. Posterior odds ratios for selected regression hypotheses, Bayesian Statistics, University Press, Valencia, 585–603, 1980.