A Note on Confidence Band for Linear Regression Means
Dipak K Dey, Junfeng Liu, Nalini Ravishanker, Edwards Qiang Zhang (07-24-2015)
ABSTRACT. We are often interested in estimating the overall set of population means (e.g., a
curve or surface) defined by the corresponding set of predictor values (e.g., across certain temporal
and/or spatial domains). When the model is correctly specified, we study a simple confidence band
built upon least squares regression.
1 Introduction
We consider the linear regression model
$$y_i = x_i\beta + \epsilon_i, \quad \epsilon_i \sim N(0, \sigma^2), \quad i = 1, \ldots, n.$$
Let $X$ (dimension $n \times p$) be the design matrix that collects the $p$-dimensional subject-specific predictor vectors $\{x_i\}$ for sample size $n$. The ordinary least squares estimates of the $p$-dimensional coefficient vector $\beta$ and the noise variance $\sigma^2$ are denoted by $\hat\beta$ and $\hat\sigma^2_{n-p}$, respectively. It is known that the $(1-\alpha)$ confidence ellipsoid for $C^T\beta$ (rank$(C) = s \le p$) is
$$\{C^T\beta : (C^T\hat\beta - C^T\beta)^T [C^T (X^TX)^{-1} C]^{-1} (C^T\hat\beta - C^T\beta) \le s\,\hat\sigma^2_{n-p} F_{s,n-p,\alpha}\}.$$
Specifically, the $(1-\alpha)$ confidence ellipsoid for $\beta$ is
$$\{\beta : (\hat\beta - \beta)^T (X^TX) (\hat\beta - \beta) \le p\,\hat\sigma^2_{n-p} F_{p,n-p,\alpha}\}. \qquad (1)$$
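As a quick check on how (1) follows from the general ellipsoid for $C^T\beta$, take $C = I_p$ (so that $s = p$). The lines below are only a worked specialization of the formula already stated, not an additional result:

```latex
\begin{align*}
C = I_p,\ s = p: \quad
& (C^T\hat\beta - C^T\beta)^T [C^T (X^TX)^{-1} C]^{-1} (C^T\hat\beta - C^T\beta) \\
&= (\hat\beta - \beta)^T [(X^TX)^{-1}]^{-1} (\hat\beta - \beta) \\
&= (\hat\beta - \beta)^T (X^TX) (\hat\beta - \beta)
 \le p\,\hat\sigma^2_{n-p} F_{p,n-p,\alpha}.
\end{align*}
```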
A number of methods on multiple comparison and/or testing are available in the literature (e.g.,
Ravishanker and Dey, 2001).
The rest of the article is organized as follows. Section 2 studies the confidence band for response
mean set estimation; Section 3 compares the confidence band and ellipsoid approaches with regard
to the power for hypothesis testing; and Section 4 concludes with future directions.
2 Confidence band for response means
Practice often calls for a statistical estimate of the joint set of population means across a specified domain (e.g., a continuous temporal/spatial curve/surface), along with a confidence band around it. We start with two special examples to highlight the relationship between single-point coverage and multiple-point coverage of the response mean.
Example 1. Let $y_i = \mu_i + \epsilon_i$ with $\mu_i = \mu$ (a constant) and $\epsilon_i \sim N(0, \sigma^2)$ independently ($i = 1, \ldots, n$). The design matrix $X$ is the $n$-dimensional vector $1_n = (1, \ldots, 1)^T$ and the regression coefficient $\beta$ is $\mu$. The simultaneous coverage of the $n$ response means by the following C-scaling confidence band amounts to individual coverage, with coverage rate
$$\Pr\left(|\hat\mu - \mu| \le C\, t_{n-1,1-\alpha/2}\, n^{-1/2} \left(\frac{\epsilon^T (I - J_n/n)\,\epsilon}{n-1}\right)^{1/2}\right) = \Pr(F_{1,n-1} \le C^2 F_{1,n-1,1-\alpha}),$$
where $J_n = 1_n 1_n^T$ and $t_{1-\alpha/2,\,n-p}$ denotes the $1 - \alpha/2$ quantile of Student's t-distribution (degrees of freedom $= n - p$). C-specific and n-specific simultaneous coverage rate profiles (screen set = sample set with size $n$; $p = 1$, $\alpha = 10\%$) are displayed in Figure 1.
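The equality in Example 1 rests on a standard fact about the t and F distributions. It is spelled out here as a reasoning step; the statistic $T$ is notation introduced only for this sketch:

```latex
\begin{align*}
T &= \frac{\hat\mu - \mu}{\hat\sigma_{n-1}/\sqrt{n}} \sim t_{n-1}, \qquad
\hat\sigma^2_{n-1} = \frac{\epsilon^T (I - J_n/n)\,\epsilon}{n-1}, \\
T^2 &\sim F_{1,n-1}, \qquad t^2_{n-1,1-\alpha/2} = F_{1,n-1,1-\alpha}, \\
\Pr\bigl(|T| \le C\, t_{n-1,1-\alpha/2}\bigr)
  &= \Pr\bigl(T^2 \le C^2\, t^2_{n-1,1-\alpha/2}\bigr)
   = \Pr\bigl(F_{1,n-1} \le C^2 F_{1,n-1,1-\alpha}\bigr).
\end{align*}
```

Because every $\mu_i$ equals $\mu$, the band covers all $n$ means exactly when it covers one of them, and at $C = 1$ the rate is $1 - \alpha$.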
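Example 1*. We consider the model $y_i = i\beta + \epsilon_i$, $\epsilon_i \sim N(0, \sigma^2)$ ($i = 1, \ldots, n$), so that the design matrix is $X = (1, \ldots, n)^T$. With a C-scaling confidence band derived from least squares estimation, the simultaneous coverage amounts to individual coverage, i.e., for all $1 \le i \le n$ we have
$$\Pr\left(|i\hat\beta - i\beta| \le C\, t_{n-1,1-\alpha/2}\, (i (X^TX)^{-1} i)^{1/2} \left(\frac{\epsilon^T (I - P_X)\,\epsilon}{n-1}\right)^{1/2}\right)$$
$$= \Pr\left(|\hat\beta - \beta| \le C\, t_{n-1,1-\alpha/2}\, (X^TX)^{-1/2} \left(\frac{\epsilon^T (I - P_X)\,\epsilon}{n-1}\right)^{1/2}\right)$$
$$= \Pr(F_{1,n-1} \le C^2 F_{1,n-1,1-\alpha}),$$
where $P_X = X (X^TX)^{-1} X^T$ is the projection matrix onto the column space of $X$. Using the Z quantile (standard normal approximation) instead, we have
$$\Pr\left(|i\hat\beta - i\beta| \le C\, Z_{1-\alpha/2}\, (i (X^TX)^{-1} i)^{1/2} \left(\frac{\epsilon^T (I - P_X)\,\epsilon}{n-1}\right)^{1/2}\right) = \Pr(F_{1,n-1} \le C^2 \chi^2_{1,1-\alpha}).$$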
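The Z-quantile version of Example 1* can be checked by simulation. The following is a minimal Monte Carlo sketch using only the Python standard library; the function name `coverage_z_band` and the default values of `beta`, `sigma`, `reps` and `seed` are illustrative assumptions, not settings taken from the paper.

```python
import math
import random
from statistics import NormalDist


def coverage_z_band(n, C, alpha=0.10, beta=2.0, sigma=1.0, reps=4000, seed=0):
    """Monte Carlo coverage of the C-scaled Z band for y_i = i*beta + eps_i.

    Returns (simultaneous, single_point) coverage estimates.
    beta, sigma, reps and seed are arbitrary illustrative choices.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)      # Z_{1-alpha/2}
    xs = list(range(1, n + 1))                   # design column x_i = i
    sxx = sum(x * x for x in xs)                 # X^T X (a scalar since p = 1)
    hit_all = hit_one = 0
    for _ in range(reps):
        ys = [x * beta + rng.gauss(0.0, sigma) for x in xs]
        b_hat = sum(x * y for x, y in zip(xs, ys)) / sxx
        rss = sum((y - x * b_hat) ** 2 for x, y in zip(xs, ys))
        s_hat = math.sqrt(rss / (n - 1))         # residual standard deviation
        # Half-width of the band at x_i = i: C * z * s_hat * sqrt(i^2 / sxx).
        covered = [
            abs(x * b_hat - x * beta) <= C * z * s_hat * math.sqrt(x * x / sxx)
            for x in xs
        ]
        hit_all += all(covered)                  # all n means covered at once
        hit_one += covered[0]                    # only the first mean covered
    return hit_all / reps, hit_one / reps


if __name__ == "__main__":
    for c in (0.5, 1.0, 1.5):
        print(c, coverage_z_band(30, c))
```

The two returned rates should agree, since the factor $i$ cancels on both sides of the inequality. At $C = 1$ the estimate is expected to fall slightly below $1 - \alpha$, because the normal quantile is smaller than the corresponding t quantile.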
In view of the link between single-point and multiple-point coverage, we proceed to study a general case. With the prescribed target coverage rate $(1-\alpha)$, the Bonferroni multiplicity adjustment (across the sample space) amounts to the following discrete n-adaptive fusion of individual confidence intervals:
$$\text{Band} = x_i\hat\beta \pm t_{1-\alpha/(2n),\,n-p}\,\hat\sigma_{n-p}\,(x_i (X^TX)^{-1} x_i^T)^{1/2}. \qquad (2)$$
This Bonferroni adjustment is n-specific, and (2) is prescribed using the sample size used for modeling (Bonferroni adjustment I). The consequence is less clear when we apply the Bonferroni adjustment to an arbitrary number ($n^* \ge$ sample size $n$) of data points screened to decide whether overall coverage holds across the domain (Bonferroni adjustment II). For instance, $n^* = 1000$ leads to
$$\text{Band} = x_i\hat\beta \pm t_{1-\alpha/2000,\,n-p}\,\hat\sigma_{n-p}\,(x_i (X^TX)^{-1} x_i^T)^{1/2},$$
whose width approaches $\infty$ as $n^*$ increases to infinity. Since the Bonferroni adjustment likely leads to an actual coverage probability $> 1 - \alpha$, we resort to the simple scaled fusion of individual confidence intervals
$$\text{Band} = x_i\hat\beta \pm C \cdot t_{1-\alpha/2,\,n-p}\,\hat\sigma_{n-p}\,(x_i (X^TX)^{-1} x_i^T)^{1/2}. \qquad (3)$$
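The multipliers in front of $\hat\sigma_{n-p}(x_i (X^TX)^{-1} x_i^T)^{1/2}$ in the Bonferroni and C-scaling bands can be compared numerically. The sketch below uses only the standard library: the t quantile is obtained by Simpson integration of the density plus bisection, which is a stand-in chosen for self-containment (a statistics library would normally be used), and the function names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps_per_unit=200):
    """CDF for x >= 0 via composite Simpson integration of the density on [0, x]."""
    m = max(2, 2 * int(math.ceil(x * steps_per_unit / 2)))  # even interval count
    h = x / m
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 + total * h / 3


def t_quantile(prob, df):
    """Quantile for prob > 0.5, found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < prob:      # widen the bracket until it holds the quantile
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def band_multipliers(n, p=3, alpha=0.10, C=1.5, n_star=1000):
    """Multipliers of the four bands; p, alpha, C follow the Figure 2 setting."""
    df = n - p
    t_ind = t_quantile(1 - alpha / 2, df)
    return {
        "individual": t_ind,
        "C-scaling": C * t_ind,
        "Bonferroni I": t_quantile(1 - alpha / (2 * n), df),
        "Bonferroni II": t_quantile(1 - alpha / (2 * n_star), df),
    }


if __name__ == "__main__":
    for name, value in band_multipliers(30).items():
        print(f"{name:14s} {value:.3f}")
```

For $n = 30$ the multipliers should be ordered individual < C-scaling < Bonferroni I < Bonferroni II, which matches the description of the Bonferroni bands as the wider ones.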
The tuning parameter $C$ (for multiplicity adjustment) is to be determined so as to achieve the prescribed coverage probability $(1-\alpha)$ exactly. We are interested in the coverage probability of the band (3) for the underlying response means $\{x_i\beta\}$ across a continuous domain (e.g., $x_i = (1, t_i, t_i^2, \ldots, t_i^{p-1})$, $t_i = i \times 10^{-3}$, $1 \le i \le 10^3$). The non-coverage rate resembles the family-wise error rate (FWER) in the multiple testing scenario, i.e., the probability of at least one false rejection when all hypotheses are null. The confidence band width comparison among Bonferroni adjustment I ($t_{1-\alpha/(2n),\,n-p}$), Bonferroni adjustment II ($t_{1-\alpha/2000,\,n-p}$), individual ($t_{1-\alpha/2,\,n-p}$) and C-scaling ($C\,t_{1-\alpha/2,\,n-p}$) with $p = 3$, $C = 1.5$ and $\alpha = 10\%$ is shown in Figure 2. Since the C-scaling band (3) achieves the prescribed coverage probability $(1-\alpha)$ at $n = 30$, both Bonferroni adjustments (I and II) are conservative. Conditional on $n$ and $p$, the C-scaling band (3) is equivalent to a $C^*$-scaling Bonferroni adjustment (I) with $C^*$ determined from the corresponding $C$.

A confidence band example for model fitting using cosine basis functions is given in Figure 3. Under the correct models (with fixed $p$), the confidence band coverage probability profiles (interwoven by $n$ and $C$) integrate into an intricate pattern (as $n$ increases from $p + 1$), which shows clusters of n-specific profiles converging to the normal approximation band. This profile-model ($p$) correspondence does not depend on the basis function type (e.g., polynomial, radial, cosine, continuous, discontinuous, etc.) under additive models (Figures 4, 5, 6, 7). A segment of these plots with detailed intersection points is enlarged in Figure 8. The across-the-board coverage rate profiles [$C$, $n$ (sample size) $= n^*$ (screen size), $p$ varies] are plotted in Figure 9. After fitting the model with sample size $n = p + 1$, we calculate the overall coverage probability by screening a certain number of data points ($n^* = p + 1$ upward). The results are in Figure 10.
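The simultaneous coverage probability of band (3) can be approximated by simulation. The sketch below assumes a polynomial basis with $p = 3$, $n = 30$ equally spaced sample points on $(0, 1]$, and a screening grid of 200 points; these design choices, the function names, and the hard-coded quantile $t_{0.95,27} \approx 1.703$ (taken from standard t tables for $\alpha = 10\%$, $n - p = 27$) are assumptions made for illustration, not the paper's exact setup.

```python
import math
import random


def solve_inverse(a):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    k = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(a)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        d = aug[col][col]
        aug[col] = [v / d for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def poly_row(t, p):
    """Design row (1, t, ..., t^(p-1))."""
    return [t ** j for j in range(p)]


def band_coverage(n=30, p=3, C=1.5, t_q=1.703, sigma=1.0,
                  n_screen=200, reps=1500, seed=1):
    """Fraction of simulated data sets whose band (3) covers every screened mean.

    t_q must be the t quantile t_{1-alpha/2, n-p}; 1.703 matches alpha = 10%, n - p = 27.
    """
    rng = random.Random(seed)
    beta = [float(j + 1) for j in range(p)]          # f(x) = 1 + 2x + 3x^2 for p = 3
    X = [poly_row((i + 1) / n, p) for i in range(n)]
    grid = [poly_row((i + 1) / n_screen, p) for i in range(n_screen)]
    xtx = [[sum(r[a] * r[b] for r in X) for b in range(p)] for a in range(p)]
    inv = solve_inverse(xtx)
    # sqrt(x (X^T X)^{-1} x^T) for every screened point
    lev = [math.sqrt(sum(g[a] * inv[a][b] * g[b]
                         for a in range(p) for b in range(p))) for g in grid]
    hits = 0
    for _ in range(reps):
        y = [sum(xa * ba for xa, ba in zip(r, beta)) + rng.gauss(0.0, sigma)
             for r in X]
        xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(p)]
        b_hat = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
        rss = sum((yi - sum(xa * ba for xa, ba in zip(r, b_hat))) ** 2
                  for r, yi in zip(X, y))
        s_hat = math.sqrt(rss / (n - p))
        diff = [bh - b for bh, b in zip(b_hat, beta)]
        hits += all(abs(sum(ga * da for ga, da in zip(g, diff))) <= C * t_q * s_hat * h
                    for g, h in zip(grid, lev))
    return hits / reps


if __name__ == "__main__":
    print(band_coverage())
```

Running the function over a range of `C` values gives one coverage-versus-$C$ profile of the kind summarized in the figures; repeating it for several `n` (with the matching `t_q`) would give the family of profiles.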
2.1 Estimation under model mis-specification
The consequences of model over-fitting are demonstrated in Figure 11, where the basis function is the polynomial $f(x) = \sum_{j=1}^{p} j\,x^{j-1}$. The consequences of model under-fitting are demonstrated in Figures 12 and 13, where the basis functions are the polynomials $f(x) = \sum_{j=1}^{p} j\,x^{j-1}$ and $f(x) = \sum_{j=1}^{p} 10j\,x^{j-1}$, respectively. Over-fitting has a much less serious consequence than under-fitting. The consequence of under-fitting depends on the specific model specification. As an example, we model and estimate a brain image (http://en.wikipedia.org/wiki/Medical_imaging) contour (Figure 14). The left panel is a crude “3+3” partition of the top and bottom halves and the right panel is an adaptive “3+4” partition segmented by pre-specified landmarks. This confidence band estimation is for illustration only, since some segments contain only a small number of data points for modeling and estimation.
3 Hypothesis test
We study the hypothesis test ($H_0$: $\beta = \beta_0$ vs. $H_a$: $\beta \ne \beta_0$) using two approaches: the confidence ellipsoid (1) and the C-scaling confidence band. The latter rejects $H_0$ whenever the confidence band does not cover the overall response mean curve under $H_0$. With $\alpha = 10\%$, we study three different alternative hypotheses ($H_0$: $\beta_0 = (1, 2, 3)$ vs. $H_a$: $\beta_a = (1, 2.1, 3), (1, 2.5, 3), (1, 3, 3)$). The powers are compared in Figure 15. $C = 1.47$ achieves $(1-\alpha)$ coverage under $H_0$ with sample size $n = 30$. For this example, the powers of the two approaches are similar in each of these three cases. Note that none of these confidence band construction and hypothesis testing procedures depend on the design matrix $X$ and/or $\sigma^2$.
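For reference, the two rejection rules compared in this section can be written side by side. This is a restatement based on the ellipsoid (1) and the C-scaling band (3); the symbol $\mathcal{D}$ for the set of screened predictor vectors is notation assumed here for illustration:

```latex
\begin{align*}
\text{Ellipsoid test: reject } H_0 \text{ if} \quad
& (\hat\beta - \beta_0)^T (X^TX) (\hat\beta - \beta_0)
  > p\,\hat\sigma^2_{n-p} F_{p,n-p,\alpha}, \\
\text{Band test: reject } H_0 \text{ if} \quad
& \max_{x \in \mathcal{D}}
  \frac{|x\hat\beta - x\beta_0|}{\hat\sigma_{n-p}\,(x (X^TX)^{-1} x^T)^{1/2}}
  > C\, t_{1-\alpha/2,\,n-p}.
\end{align*}
```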
4 Conclusion
Under additive models, for a specified configuration (e.g., model dimension (p), error rate threshold (α), sample size (n)), individual confidence intervals scaled by a constant C are fused into a continuous confidence band for studying the coverage probability of the underlying overall mean function. In the real world, with large amounts of data at hand, the fundamental motivation is to extract decisive information from data contaminated with noise. Pursuing correct model specification (e.g., basis functions and dimension) requires substantial effort in data processing, description and information (feature) extraction. On the one hand, we should draw on subject-matter experience, such as clear-cut specification of segments, landmarks and curve functions in image analysis scenarios. On the other hand, we expect more sophisticated methodologies from the statistics and/or machine learning point of view, such as training and validation, model selection and goodness-of-fit tests, and the development of adaptive real-time modeling and prediction protocols tailored to diverse application platforms.
5 APPENDIX
References
[1] Nalini Ravishanker and Dipak K. Dey (2001). A First Course in Linear Model Theory. Chapman & Hall/CRC, Boca Raton.
Figure 1: C-specific and n-specific simultaneous coverage rate profiles (screen set = sample set with size n; p = 1, α = 10%). [Two panels, model µ or iβ: coverage rate against sample size n (0 to 300), one curve per C (from 1/20 in steps of 1/20); coverage rate against C (0 to 2.5), one curve per n (2 to 300).]
Figure 2: Confidence band width comparison among Bonferroni adjustment I ($t_{1-\alpha/(2n),\,n-p}$), Bonferroni adjustment II ($t_{1-\alpha/2000,\,n-p}$), individual ($t_{1-\alpha/2,\,n-p}$) and C-scaling ($C\,t_{1-\alpha/2,\,n-p}$), with p = 3, C = 1.5 and α = 10%. Note that C = 1.5 arises from the configuration (p = 3, n = 30, α = 10%) to achieve the coverage probability 1 − α exactly. [Plot: band coefficient against n (5 to 40); legend: Bonferroni (sample size), Bonferroni (n = 1000), individual, C-scaling.]
Figure 3: A confidence band example for model fitting using cosine basis functions. The raw data points are represented by “+”. [Plot: response against time (0 to 20), showing observations, true curve, fitted curve and band; n (sample) = 30, p = 3, C = 1.5, σ = 10.0, α = 10%, f(x) = 1 + 2cos(x) + 3cos(2x).]
Figure 4: Across-the-board simultaneous coverage rate profile (p = 3 with polynomial or cosine basis: $f(x) = 1 + 2x + 3x^2$ (left panel) or $f(x) = 1 + 2\cos(x) + 3\cos(2x)$ (right panel), $x \in [0, 1]$). [Both panels: coverage rate against C (0 to 3), one curve per n (from p + 1 to 300), α = 10%.]
Figure 5: Across-the-board simultaneous coverage rate profile (p = 3 with cosine basis: $f(x) = 1 + 2\cos(x) + 3\cos(2x^2)$ (left panel) or $f(x) = 1 + 2\cos(x^3) + 3\cos(2x^6)$ (right panel), $x \in [0, 1]$). [Both panels: coverage rate against C (0 to 3), one curve per n (from p + 1 to 300), α = 10%.]
Figure 6: Across-the-board simultaneous coverage rate profile (p = 3 with polynomial or cosine basis: f(x) = 1 + 2sin(x − 1/2 | < 1/2) + 2sin((x − 1/2)^2 | ≥ 1/2) + 3cos(2x^2) (not differentiable, left panel) or f(x) = 1 + 2(x − 1/2 | < 1/2) + 2(x − 1 | ≥ 1/2) + 3x^2 (discontinuous, right panel), x ∈ [0, 1]). [Both panels: coverage rate against C (0 to 3), one curve per n (from p + 1 to 300), α = 10%.]
Figure 7: Across-the-board simultaneous coverage rate profile (p = 3 with polynomial or cosine basis: f(x) = 1 + 2sin(100(x − 1/2) | < 1/2) + 2sin((100(x − 1/2))^2 | ≥ 1/2) + 3cos(2x^2) (not differentiable, left panel) or f(x) = 1 + 2(100(x − 1/2) | < 1/2) + 2(100(x − 1) | ≥ 1/2) + 3(100x)^2 (discontinuous, right panel), x ∈ [0, 1]). [Both panels: coverage rate against C (0 to 3), one curve per n (from p + 1 to 300), α = 10%.]
Figure 8: Clusters of coverage rate profiles at the turning point (n varies, p = 3, α = 10%). [Plot: coverage rate (0.75 to 0.95) against C (1.20 to 1.55), one curve per n (from p + 1 to 100).]
Figure 9: Across-the-board coverage rate profiles [C, n (sample size) = n* (screen size), p varies]. [Plot: coverage rate against C (0 to 3), n = 30, α = 10%, one curve per p (1 to 10).]
Figure 10: Overall coverage probability with model-fitting sample size n = p + 1 and the number of screened points varying (n* = p + 1 upward). [Plot: coverage rate against C (0 to 3), α = 10%; curves for p = 2 to 10 and n (screen) from p + 1 to 50.]
Figure 14: Brain contour estimation example. The “+” represents each captured raw data point with noise. The estimated smooth curves along with two-sided confidence bands are displayed (the polynomial function has dimension p = 5, C = 2.1, α = 10%). The left panel is a crude “3+3” partition and the right panel is a landmark-based adaptive “3+4” partition. The disconnection at the left endpoint indicates an edge effect. [Both panels: raw data and fitted curve, Y against X, both axes from 0 to 1.]