Linear Regression & Correlation
HCM University of Technology
Dzung Hoang Nguyen
October 1, 2020
Outline
1 Learning Objectives
2 Regression
Empirical Models
Simple Linear Regression
Hypothesis Tests in Simple Linear Regression
Confidence Intervals
3 Correlation
4 Transformation and Logistic Regression
Learning Objectives
• Use simple linear regression to build empirical models for engineering and scientific data.
• Understand how the method of least squares is used to estimate
the parameters in a linear regression model.
• Analyze residuals to determine if the regression model is an
adequate fit to the data or to see if any underlying assumptions
are violated.
• Test the statistical hypotheses and construct confidence intervals
on the regression model parameters.
• Use the regression model to make a prediction of a future
observation and construct an appropriate prediction interval on
the future observation.
• Apply the correlation model.
• Use simple transformations to achieve a linear regression model.
Empirical Models
• Many problems in engineering and science involve exploring the
relationships between two or more variables.
• Regression analysis is a statistical technique that is very useful
for these types of problems.
• For example, in a chemical process, suppose that the yield of the
product is related to the process-operating temperature.
• Regression analysis can be used to build a model to predict yield
at a given temperature level.
Oxygen Purity
Example
Simple Linear Regression
Definition
• Simple linear regression considers a single regressor or predictor x and a dependent or response variable Y.
• The expected value of Y at each level of x is assumed to follow the straight-line relationship E(Y | x) = β0 + β1x.
• We assume that each observation Y can be described by the model Y = β0 + β1x + ε, where ε is a random error term with mean zero and variance σ².
Simple Linear Regression
Example
Fig. 11-3: Deviation of the data from the estimated regression model.
Least Squares Estimates
L = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2

\left.\frac{\partial L}{\partial \beta_0}\right|_{\hat\beta_0,\hat\beta_1} = -2 \sum_{i=1}^{n} \left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right) = 0

\left.\frac{\partial L}{\partial \beta_1}\right|_{\hat\beta_0,\hat\beta_1} = -2 \sum_{i=1}^{n} \left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right) x_i = 0

Simplifying gives the normal equations:

n\hat\beta_0 + \hat\beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i

\hat\beta_0 \sum_{i=1}^{n} x_i + \hat\beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i
Least Squares Estimates
The least-squares estimates of the intercept and slope in the simple linear regression model are:

\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}

\hat\beta_1 = \frac{\sum_{i=1}^{n} x_i y_i - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2}

where \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i and \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.
Least Squares Estimates
Definition
The fitted or estimated regression line is therefore

\hat{y} = \hat\beta_0 + \hat\beta_1 x

Note that each pair of observations satisfies the relationship

y_i = \hat\beta_0 + \hat\beta_1 x_i + e_i, \quad i = 1, 2, \dots, n

where e_i = y_i - \hat{y}_i is called the residual; it describes the error in the fit of the model to the i-th observation y_i.
Some notations
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2

S_{xy} = \sum_{i=1}^{n} y_i (x_i - \bar{x}) = \sum_{i=1}^{n} x_i y_i - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)

\hat\beta_1 = \frac{S_{xy}}{S_{xx}}
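As a quick illustration of these formulas, here is a minimal Python sketch that computes Sxx, Sxy and the least-squares estimates from a pair of arrays; the data values are made up for illustration and are not the Table 11-1 data.

```python
import numpy as np

# Made-up illustrative data (not the Table 11-1 oxygen purity data).
x = np.array([1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1, 7.8])

n = len(x)
x_bar, y_bar = x.mean(), y.mean()

# Corrected sums of squares and cross-products.
Sxx = np.sum(x**2) - np.sum(x)**2 / n            # = sum((x - x_bar)**2)
Sxy = np.sum(x * y) - np.sum(x) * np.sum(y) / n  # = sum(y * (x - x_bar))

# Least-squares estimates of the slope and intercept.
beta1_hat = Sxy / Sxx
beta0_hat = y_bar - beta1_hat * x_bar
print(f"beta1_hat = {beta1_hat:.4f}, beta0_hat = {beta0_hat:.4f}")
```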
Oxygen Purity
Example
Oxygen Purity
We will fit a simple linear regression model to the oxygen purity data
in Table 11 − 1. The following quantities may be computed:
• n = 20; \sum_{i=1}^{20} x_i = 23.92; \sum_{i=1}^{20} y_i = 1843.21
• \bar{x} = 1.1960; \bar{y} = 92.1605
• \sum_{i=1}^{20} y_i^2 = 170044.5321; \sum_{i=1}^{20} x_i^2 = 29.2892
• \sum_{i=1}^{20} x_i y_i = 2214.6566
Oxygen Purity-continued
We will fit a simple linear regression model to the oxygen purity data
in Table 11 − 1. The following quantities may be computed:
S_{xx} = \sum_{i=1}^{20} x_i^2 - \frac{\left(\sum_{i=1}^{20} x_i\right)^2}{20} = 29.2892 - \frac{(23.92)^2}{20} = 0.68088

S_{xy} = \sum_{i=1}^{20} x_i y_i - \frac{\left(\sum_{i=1}^{20} x_i\right)\left(\sum_{i=1}^{20} y_i\right)}{20} = 2214.6566 - \frac{(23.92)(1843.21)}{20} = 10.17744

\hat\beta_1 = \frac{S_{xy}}{S_{xx}}
Oxygen Purity-continued
Therefore, the least squares estimates of the slope and intercept are:
\hat\beta_1 = \frac{S_{xy}}{S_{xx}} = \frac{10.17744}{0.68088} = 14.94748

and

\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} = 92.1605 - (14.94748)(1.196) = 74.28331
The fitted simple linear regression model (with the coefficients
reported to three decimal places) is
ŷ = 74.283 + 14.947x
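The slide's numbers can be checked directly from the summary statistics quoted above; a small Python sketch using only the values quoted on the slides:

```python
# Summary statistics quoted on the previous slides (Table 11-1 summaries).
n = 20
sum_x, sum_y = 23.92, 1843.21
sum_x2, sum_xy = 29.2892, 2214.6566

Sxx = sum_x2 - sum_x**2 / n           # ~0.68088
Sxy = sum_xy - sum_x * sum_y / n      # ~10.17744

beta1_hat = Sxy / Sxx                           # ~14.947
beta0_hat = sum_y / n - beta1_hat * sum_x / n   # ~74.283
print(f"beta1_hat = {beta1_hat:.5f}, beta0_hat = {beta0_hat:.5f}")
```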
Estimating σ2
The residuals e_i = y_i - \hat{y}_i are used to obtain an estimate of \sigma^2. The sum of squares of the residuals, often called the error sum of squares, is

SS_E = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

The expected value of the error sum of squares is E(SS_E) = (n - 2)\sigma^2. Therefore, an unbiased estimator of \sigma^2 is

\hat\sigma^2 = \frac{SS_E}{n - 2}

where SS_E can be easily computed using SS_E = SS_T - \hat\beta_1 S_{xy}.
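A short numerical check of σ̂² for the oxygen purity fit, using SST = 173.38 (the total sum of squares quoted in the ANOVA example later in the deck) together with β̂1 and Sxy from above:

```python
# sigma^2 estimate for the oxygen purity fit.  SST = 173.38 is the total
# sum of squares quoted in the ANOVA example later in the deck.
n = 20
SST, beta1_hat, Sxy = 173.38, 14.947, 10.17744

SSE = SST - beta1_hat * Sxy        # ~21.25
sigma2_hat = SSE / (n - 2)         # ~1.18, unbiased estimator of sigma^2
print(f"SSE = {SSE:.2f}, sigma2_hat = {sigma2_hat:.2f}")
```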
Estimating σ2
SS_E = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad \hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i, \quad \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}

Substituting the fitted values,

SS_E = \sum_{i=1}^{n} \left[(y_i - \bar{y}) - \hat\beta_1 (x_i - \bar{x})\right]^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2 + \hat\beta_1^2 \sum_{i=1}^{n} (x_i - \bar{x})^2 - 2\hat\beta_1 \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})

Since \hat\beta_1 = S_{xy}/S_{xx}, we have \hat\beta_1^2 S_{xx} = \hat\beta_1 S_{xy}, where S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 and S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}). Therefore

SS_E = SS_T - \hat\beta_1 S_{xy}

where SS_T = \sum_{i=1}^{n} (y_i - \bar{y})^2.
Properties of the Least Squares Estimators
Slope Properties

E(\hat\beta_1) = \beta_1, \qquad V(\hat\beta_1) = \frac{\sigma^2}{S_{xx}}

Intercept Properties

E(\hat\beta_0) = \beta_0, \qquad V(\hat\beta_0) = \sigma^2\left[\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right]
Hypothesis Tests in Simple Linear Regression
Use of t-Tests
Suppose we wish to test

H_0: \beta_1 = \beta_{1,0} \qquad H_1: \beta_1 \neq \beta_{1,0}

An appropriate test statistic would be

t_0 = \frac{\hat\beta_1 - \beta_{1,0}}{\sqrt{\hat\sigma^2 / S_{xx}}}
Hypothesis Tests in Simple Linear Regression
Use of t-Tests
The test statistic could also be written as

t_0 = \frac{\hat\beta_1 - \beta_{1,0}}{se(\hat\beta_1)}

We would reject the null hypothesis if |t_0| > t_{\alpha/2,\,n-2}.
Hypothesis Tests in Simple Linear Regression
Use of t-Tests
Suppose we wish to test

H_0: \beta_0 = \beta_{0,0} \qquad H_1: \beta_0 \neq \beta_{0,0}

An appropriate test statistic would be

t_0 = \frac{\hat\beta_0 - \beta_{0,0}}{\sqrt{\hat\sigma^2\left[\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right]}} = \frac{\hat\beta_0 - \beta_{0,0}}{se(\hat\beta_0)}
Hypothesis Tests in Simple Linear Regression
Use of t-Tests
We would reject the null hypothesis if
|t_0| > t_{\alpha/2,\,n-2}
Hypothesis Tests in Simple Linear Regression
Use of t-Tests
An important special case of the hypotheses above is

H_0: \beta_1 = 0 \qquad H_1: \beta_1 \neq 0
These hypotheses relate to the significance of regression.
Failure to reject H0 is equivalent to concluding that there is no linear
relationship between x and Y .
Hypothesis Tests in Simple Linear Regression
Example
We will test for significance of regression using the model for the oxygen purity data from the earlier example. The hypotheses are

H_0: \beta_1 = 0 \qquad H_1: \beta_1 \neq 0

and we will use α = 0.01. From the earlier example and the data table we have

\hat\beta_1 = 14.947, \quad n = 20, \quad S_{xx} = 0.68088, \quad \hat\sigma^2 = 1.18

so the t-statistic becomes

t_0 = \frac{\hat\beta_1}{\sqrt{\hat\sigma^2 / S_{xx}}} = \frac{\hat\beta_1}{se(\hat\beta_1)} = \frac{14.947}{\sqrt{1.18/0.68088}} = 11.35
Hypothesis Tests in Simple Linear Regression
Example
Practical Interpretation: Since the reference value of t is
t0.005,18 = 2.88, the value of the test statistic is very far into the
critical region, implying that H0 : β1 = 0 should be rejected.
There is strong evidence to support this claim. The P-value for this test is P \approx 1.23 \times 10^{-9}. This was obtained manually with a calculator.
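The same t-test can be reproduced with scipy; a sketch using the summary quantities quoted on the slide:

```python
import math
from scipy import stats

# t-test of H0: beta1 = 0 for the oxygen purity model.
n, Sxx, sigma2_hat, beta1_hat = 20, 0.68088, 1.18, 14.947

se_beta1 = math.sqrt(sigma2_hat / Sxx)
t0 = beta1_hat / se_beta1                        # ~11.35
p_value = 2 * stats.t.sf(abs(t0), df=n - 2)      # ~1.2e-09
t_crit = stats.t.ppf(1 - 0.01 / 2, n - 2)        # ~2.88 for alpha = 0.01
print(f"t0 = {t0:.2f}, p = {p_value:.2e}, reject H0: {abs(t0) > t_crit}")
```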
Hypothesis Tests in Simple Linear Regression
ANOVA approach to test Significance of Regression
The ANOVA identity is

\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

Symbolically,

SS_T = SS_R + SS_E
Hypothesis Tests in Simple Linear Regression
ANOVA approach to test Significance of Regression
If the null hypothesis H_0: \beta_1 = 0 is true, the statistic

F_0 = \frac{SS_R/1}{SS_E/(n-2)} = \frac{MS_R}{MS_E}

follows the F_{1,\,n-2} distribution, and we would reject H_0 if f_0 > f_{\alpha,1,n-2}. The quantities MS_R and MS_E are called mean squares.
Hypothesis Tests in Simple Linear Regression
ANOVA approach to test Significance of Regression
ANOVA table

Source of Variation | Sum of Squares                    | Degrees of Freedom | Mean Square | F_0
Regression          | SS_R = \hat\beta_1 S_{xy}         | 1                  | MS_R        | MS_R / MS_E
Error               | SS_E = SS_T - \hat\beta_1 S_{xy}  | n - 2              | MS_E        |
Total               | SS_T                              | n - 1              |             |

Note that MS_E = \hat\sigma^2.
ANOVA example
Example
We will use the analysis of variance approach to test for significance
of regression using the oxygen purity data model from Example 11-1.
Recall that SS_T = 173.38, \hat\beta_1 = 14.947, S_{xy} = 10.17744, and n = 20. The regression sum of squares is

SS_R = \hat\beta_1 S_{xy} = 14.947 \times 10.17744 = 152.13

and the error sum of squares is

SS_E = SS_T - SS_R = 173.38 - 152.13 = 21.25

Completing the ANOVA table for testing H_0: \beta_1 = 0 is left as an exercise.
ANOVA example
Example
The test statistic is

f_0 = \frac{MS_R}{MS_E} = \frac{152.13}{1.18} = 128.86

for which we find that the P-value is P \approx 1.23 \times 10^{-9}, so we conclude that \beta_1 is not zero.
There are frequently minor differences in terminology among
computer packages. For example, sometimes the regression sum of
squares is called the ”model” sum of squares, and the error sum of
squares is called the ”residual” sum of squares.
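A sketch of the same ANOVA computation in Python, using the sums of squares quoted above:

```python
from scipy import stats

# ANOVA for significance of regression, oxygen purity data.
n = 20
SST, SSR = 173.38, 152.13
SSE = SST - SSR                # ~21.25

MSR = SSR / 1
MSE = SSE / (n - 2)            # ~1.18 (equals sigma2_hat)
F0 = MSR / MSE                 # ~128.9
p_value = stats.f.sf(F0, 1, n - 2)
print(f"F0 = {F0:.2f}, p = {p_value:.2e}")
```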
Confidence Intervals on the Slope and Intercept
Definition
Under the assumption that the observations are normally and independently distributed, a 100(1 − α)% confidence interval on the slope β1 in simple linear regression is

\hat\beta_1 - t_{\alpha/2,\,n-2}\sqrt{\frac{\hat\sigma^2}{S_{xx}}} \le \beta_1 \le \hat\beta_1 + t_{\alpha/2,\,n-2}\sqrt{\frac{\hat\sigma^2}{S_{xx}}}
Confidence Intervals on the Slope and Intercept
Definition
Similarly, a 100(1 − α)% confidence interval on the intercept β0 is

\hat\beta_0 - t_{\alpha/2,\,n-2}\sqrt{\hat\sigma^2\left[\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right]} \le \beta_0 \le \hat\beta_0 + t_{\alpha/2,\,n-2}\sqrt{\hat\sigma^2\left[\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right]}
Confidence Intervals
Example
We will find a 95% confidence interval on the slope of the regression
line using the data in Example 11-1. Recall that \hat\beta_1 = 14.947, S_{xx} = 0.68088, and \hat\sigma^2 = 1.18. Then we find

\hat\beta_1 - t_{0.025,18}\sqrt{\frac{\hat\sigma^2}{S_{xx}}} \le \beta_1 \le \hat\beta_1 + t_{0.025,18}\sqrt{\frac{\hat\sigma^2}{S_{xx}}}

or

14.947 - 2.101\sqrt{\frac{1.18}{0.68088}} \le \beta_1 \le 14.947 + 2.101\sqrt{\frac{1.18}{0.68088}}
Confidence Intervals
Example
14.947 - 2.101\sqrt{\frac{1.18}{0.68088}} \le \beta_1 \le 14.947 + 2.101\sqrt{\frac{1.18}{0.68088}}
This simplifies to:
12.181 ≤ β1 ≤ 17.713
Practical Interpretation: This CI does not include zero, so there is
strong evidence (at α = 0.05) that the slope is not zero. The CI is
reasonably narrow (±2.766) because the error variance is fairly small.
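The interval can be reproduced numerically; a minimal sketch using scipy for the t quantile:

```python
import math
from scipy import stats

# 95% confidence interval on the slope for the oxygen purity model.
n, Sxx, sigma2_hat, beta1_hat = 20, 0.68088, 1.18, 14.947

t_crit = stats.t.ppf(0.975, n - 2)                 # ~2.101
half_width = t_crit * math.sqrt(sigma2_hat / Sxx)  # ~2.77
print(f"{beta1_hat - half_width:.3f} <= beta1 <= {beta1_hat + half_width:.3f}")
```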
Confidence Intervals on the Mean Response
\hat\mu_{Y|x_0} = \hat\beta_0 + \hat\beta_1 x_0

Definition

A 100(1 − α)% confidence interval on the mean response at the value x = x_0, say \mu_{Y|x_0}, is given by

\hat\mu_{Y|x_0} - t_{\alpha/2,\,n-2}\sqrt{\hat\sigma^2\left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]} \le \mu_{Y|x_0} \le \hat\mu_{Y|x_0} + t_{\alpha/2,\,n-2}\sqrt{\hat\sigma^2\left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]}

where \hat\mu_{Y|x_0} = \hat\beta_0 + \hat\beta_1 x_0 is computed from the fitted regression model.
Confidence Intervals
Example
We will construct a 95% confidence interval about the mean response
for the data in Example 11-1. The fitted model is
\hat\mu_{Y|x_0} = 74.283 + 14.947\,x_0, and the 95% confidence interval on \mu_{Y|x_0} is found as

\hat\mu_{Y|x_0} \pm 2.101\sqrt{1.18\left[\frac{1}{20} + \frac{(x_0 - 1.1960)^2}{0.68088}\right]}

Suppose that we are interested in predicting the mean oxygen purity when x_0 = 1.00%. Then \hat\mu_{Y|1.00} = 74.283 + 14.947(1.00) = 89.23 and the 95% confidence interval is

89.23 \pm 2.101\sqrt{1.18\left[\frac{1}{20} + \frac{(1.00 - 1.1960)^2}{0.68088}\right]}
Confidence Intervals
Example
89.23 \pm 2.101\sqrt{1.18\left[\frac{1}{20} + \frac{(1.00 - 1.1960)^2}{0.68088}\right]}

or

89.23 \pm 0.75

Therefore, the 95% CI on \mu_{Y|1.00} is 88.48 \le \mu_{Y|1.00} \le 89.98, a reasonably narrow CI.
Prediction New Observations
A 100(1 − α)% prediction interval on a future observation Y_0 at the value x_0 is given by

\hat{y}_0 - t_{\alpha/2,\,n-2}\sqrt{\hat\sigma^2\left[1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]} \le Y_0 \le \hat{y}_0 + t_{\alpha/2,\,n-2}\sqrt{\hat\sigma^2\left[1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]}

The value \hat{y}_0 is computed from the fitted regression model \hat{y}_0 = \hat\beta_0 + \hat\beta_1 x_0.
Prediction New Observations
Example
To illustrate the construction of a prediction interval, suppose we use
the data in Example 11 − 1 and find a 95% prediction interval on the
next observation of oxygen purity at x_0 = 1.00%. Recalling that \hat{y}_0 = 89.23, we find that the prediction interval is

89.23 \pm 2.101\sqrt{1.18\left[1 + \frac{1}{20} + \frac{(1.00 - 1.1960)^2}{0.68088}\right]}
which simplifies to:
86.83 ≤ y0 ≤ 91.63
This is a reasonably narrow prediction interval.
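Both intervals at x0 = 1.00 can be computed from the same quantities; a short sketch using the numbers quoted on the slides:

```python
import math
from scipy import stats

# 95% CI on the mean response and 95% prediction interval at x0 = 1.00
# for the oxygen purity model.
n, x_bar, Sxx, sigma2_hat = 20, 1.1960, 0.68088, 1.18
beta0_hat, beta1_hat = 74.283, 14.947
x0 = 1.00

y0_hat = beta0_hat + beta1_hat * x0               # ~89.23
t_crit = stats.t.ppf(0.975, n - 2)                # ~2.101
leverage = 1 / n + (x0 - x_bar) ** 2 / Sxx

ci_half = t_crit * math.sqrt(sigma2_hat * leverage)        # ~0.75
pi_half = t_crit * math.sqrt(sigma2_hat * (1 + leverage))  # ~2.40
print(f"mean response CI: {y0_hat - ci_half:.2f} to {y0_hat + ci_half:.2f}")
print(f"prediction interval: {y0_hat - pi_half:.2f} to {y0_hat + pi_half:.2f}")
```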
Adequacy of the Regression Model
Fitting a regression model requires several assumptions:
• Errors are uncorrelated random variables with mean zero;
• Errors have constant variance; and,
• Errors are normally distributed.
The analyst should always consider the validity of these assumptions
to be doubtful and conduct analyses to examine the adequacy of the
model.
Residual Analysis
• The residuals from a regression model are ei = yi − ŷi , where yi
is an actual observation and ŷi is the corresponding fitted value
from the regression model.
• Analysis of the residuals is frequently helpful in checking the
assumption that the errors are approximately normally distributed
with constant variance, and in determining whether additional
terms in the model would be useful.
Adequacy of the Regression Model
Example
The regression model for the oxygen purity data is:
ŷ = 74.283 + 14.947x
Table 11 − 4 presents the observed and predicted values of y at each
value of x from this data set, along with the corresponding residual.
Adequacy of the Regression Model
Example
A normal probability plot of the residuals is shown in Fig.11 − 10.
Since the residuals fall approximately along a straight line in the
figure, we conclude that there is no severe departure from normality.
The residuals are also plotted against the predicted value ŷi in
Fig.11 − 11...
Adequacy of the Regression Model
Example
.... and against the hydrocarbon levels xi in Fig.11 − 12. These plots
do not indicate any serious model inadequacies.
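A minimal sketch of the residual diagnostics described above (normal probability plot and residuals versus fitted values), assuming x and y are NumPy arrays of paired observations; the data here are made up for illustration:

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Made-up illustrative data; replace with the observed x and y.
x = np.array([1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1, 7.8])

beta1_hat, beta0_hat = np.polyfit(x, y, 1)          # least-squares straight line
fitted = beta0_hat + beta1_hat * x
residuals = y - fitted

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
stats.probplot(residuals, plot=ax1)                 # normal probability plot
ax1.set_title("Normal probability plot of residuals")
ax2.scatter(fitted, residuals)                      # residuals vs fitted values
ax2.axhline(0, linestyle="--")
ax2.set_xlabel("fitted value")
ax2.set_ylabel("residual")
plt.tight_layout()
plt.show()
```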
Coefficient of Determination-R2
• The quantity R^2 = \frac{SS_R}{SS_T} = 1 - \frac{SS_E}{SS_T} is called the coefficient of determination and is often used to judge the adequacy of a regression model.
• 0 \le R^2 \le 1.
• We often refer (loosely) to R^2 as the amount of variability in the data explained or accounted for by the regression model.
Coefficient of Determination-R2
• For the oxygen purity regression model: R^2 = \frac{SS_R}{SS_T} = \frac{152.13}{173.38} = 0.877
• Thus, the model accounts for 87.7% of the variability in the data.
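For a quick check, scipy.stats.linregress reports the sample correlation r along with the fitted coefficients, and r² is the same quantity as the R² = SSR/SST discussed here; a sketch with made-up data:

```python
import numpy as np
from scipy import stats

# Made-up illustrative data (not the oxygen purity data).
x = np.array([1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1, 7.8])

fit = stats.linregress(x, y)
print(f"slope = {fit.slope:.3f}, intercept = {fit.intercept:.3f}")
print(f"R^2 = {fit.rvalue**2:.3f}")   # same quantity as SSR/SST
```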
Correlation Coefficient
Definition
We assume that the joint distribution of X_i and Y_i is the bivariate normal distribution, where \mu_Y and \sigma_Y^2 are the mean and variance of Y, \mu_X and \sigma_X^2 are the mean and variance of X, and \rho is the correlation coefficient between Y and X. Recall that the correlation coefficient is defined as

\rho = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}

where \sigma_{XY} is the covariance between Y and X. The conditional distribution of Y for a given value of X = x is

f_{Y|x}(y) = \frac{1}{\sqrt{2\pi}\,\sigma_{Y|x}} \exp\left[-\frac{1}{2}\left(\frac{y - \beta_0 - \beta_1 x}{\sigma_{Y|x}}\right)^2\right]

where

\beta_0 = \mu_Y - \mu_X \rho\,\frac{\sigma_Y}{\sigma_X}, \qquad \beta_1 = \frac{\sigma_Y}{\sigma_X}\,\rho
Definition
It is possible to draw inferences about the correlation coefficient \rho in this model. The estimator of \rho is the sample correlation coefficient

R = \frac{\sum_{i=1}^{n} Y_i (X_i - \bar{X})}{\left[\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2\right]^{1/2}} = \frac{S_{XY}}{(S_{XX}\, SS_T)^{1/2}}

Note that

\hat\beta_1 = \left(\frac{SS_T}{S_{XX}}\right)^{1/2} R

We may also write

R^2 = \hat\beta_1^2\,\frac{S_{XX}}{SS_T} = \frac{\hat\beta_1 S_{XY}}{SS_T} = \frac{SS_R}{SS_T}

where SS_T = \sum_{i=1}^{n} (Y_i - \bar{Y})^2.
It is often useful to test the hypotheses

H_0: \rho = 0 \qquad H_1: \rho \neq 0

The appropriate test statistic for these hypotheses is

T_0 = \frac{R\sqrt{n-2}}{\sqrt{1 - R^2}}

which has the t distribution with n − 2 degrees of freedom if H_0 is true. Reject H_0 if |t_0| > t_{\alpha/2,\,n-2}.
The test procedure for the hypotheses

H_0: \rho = \rho_0 \qquad H_1: \rho \neq \rho_0

where \rho_0 \neq 0, is somewhat more complicated. In this case, the appropriate test statistic is

Z_0 = \left(\operatorname{arctanh}(R) - \operatorname{arctanh}(\rho_0)\right)(n - 3)^{1/2}

Reject H_0 if |z_0| > z_{\alpha/2}. The approximate 100(1 − α)% confidence interval for \rho is

\tanh\left(\operatorname{arctanh}(r) - \frac{z_{\alpha/2}}{\sqrt{n-3}}\right) \le \rho \le \tanh\left(\operatorname{arctanh}(r) + \frac{z_{\alpha/2}}{\sqrt{n-3}}\right)

where \tanh(u) = (e^u - e^{-u})/(e^u + e^{-u}).
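A small Python sketch of the Fisher z-transform test and the corresponding approximate confidence interval; the function name and the numbers passed to it are illustrative, not from the slides:

```python
import math
from scipy import stats

def rho_test_and_ci(r, n, rho0=0.0, alpha=0.05):
    """Fisher z-transform test of H0: rho = rho0 and approximate CI for rho."""
    z0 = (math.atanh(r) - math.atanh(rho0)) * math.sqrt(n - 3)
    p_value = 2 * stats.norm.sf(abs(z0))
    z_crit = stats.norm.ppf(1 - alpha / 2)
    lo = math.tanh(math.atanh(r) - z_crit / math.sqrt(n - 3))
    hi = math.tanh(math.atanh(r) + z_crit / math.sqrt(n - 3))
    return z0, p_value, (lo, hi)

# Illustrative call with made-up values.
print(rho_test_and_ci(r=0.90, n=25, rho0=0.5))
```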
Correlation-Example
Example
Wire Bond Pull Strength: an application of regression analysis is
described in which an engineer at a semiconductor assembly plant is
investigating the relationship between pull strength of a wire bond
and two factors: wire length and die height. In this example, we will
consider only one of the factors, the wire length. A random sample of
25 units is selected and tested, and the wire bond pull strength and
wire length are observed for each unit. The data are shown in Table
1-2. We assume that pull strength and wire length are jointly
normally distributed. Figure 11-13 shows a scatter diagram of wire
bond strength versus wire length.
Correlation-Example
Correlation-Example continued
Example
Now S_{XX} = 698.56 and S_{XY} = 2027.7132, and the sample correlation coefficient is

r = \frac{S_{XY}}{[S_{XX}\, SS_T]^{1/2}} = \frac{2027.7132}{[(698.560)(6105.9)]^{1/2}} = 0.9818

Note that r^2 = 0.9818^2 = 0.9640, so approximately 96.40% of the variability in pull strength is explained by the linear relationship to wire length.
Correlation-Example continued
Now suppose that we wish to test the hypotheses

H_0: \rho = 0 \qquad H_1: \rho \neq 0

with α = 0.05. We can compute the t-statistic as

t_0 = \frac{r\sqrt{n-2}}{\sqrt{1 - r^2}} = \frac{0.9818\sqrt{23}}{\sqrt{1 - 0.9640}} = 24.8

Because t_{0.025,23} = 2.069, we reject H_0 and conclude that the correlation coefficient \rho \neq 0.
Correlation-Example continued
Finally, we may construct an approximate 95% confidence interval on \rho. Since \operatorname{arctanh}(r) = \operatorname{arctanh}(0.9818) = 2.3452:

\tanh\left(2.3452 - \frac{1.96}{\sqrt{22}}\right) \le \rho \le \tanh\left(2.3452 + \frac{1.96}{\sqrt{22}}\right)

which reduces to

0.9585 \le \rho \le 0.9921
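The wire bond numbers above can be reproduced directly; a short sketch using the quoted Sxx, Sxy, and SST:

```python
import math

# Wire bond pull strength example, using the quantities quoted on the slides.
n, Sxx, Sxy, SST = 25, 698.560, 2027.7132, 6105.9

r = Sxy / math.sqrt(Sxx * SST)                    # ~0.9818
t0 = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)   # ~24.8

z = math.atanh(r)                                 # ~2.345
half = 1.96 / math.sqrt(n - 3)
lo, hi = math.tanh(z - half), math.tanh(z + half) # ~0.9585, ~0.9921
print(f"r = {r:.4f}, t0 = {t0:.1f}, 95% CI for rho: ({lo:.4f}, {hi:.4f})")
```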
Example
A research engineer is investigating the use of a windmill to generate electricity and has collected data on the DC output from this windmill and the corresponding wind velocity. If we fit a straight-line model to the data, the regression model is \hat{y} = 0.1309 + 0.2411x. The summary statistics for this model are R^2 = 0.8745, MS_E = \hat\sigma^2 = 0.0557, and F_0 = 160.26 (P < 0.0001).
Definition
We occasionally find that the straight-line regression model Y = β0 + β1x + ε is inappropriate because the true regression function
is nonlinear. Sometimes nonlinearity is visually determined from the
scatter diagram, and sometimes, because of prior experience or
underlying theory, we know in advance that the model is nonlinear.
Occasionally, a scatter diagram will exhibit an apparent nonlinear
relationship between Y and x. In some of these situations, a nonlinear
function can be expressed as a straight line by using a suitable
transformation. Such nonlinear models are called intrinsically linear.
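As an illustration of an intrinsically linear model (not taken from the slides), the exponential model Y = a·e^{bx}·ε becomes a straight line after taking logarithms, ln Y = ln a + bx + ln ε, and can then be fit by ordinary least squares on the transformed data:

```python
import numpy as np

# Illustration: the intrinsically linear model Y = a * exp(b*x) * eps
# becomes ln Y = ln a + b*x + ln eps, a straight line in (x, ln y).
rng = np.random.default_rng(0)
x = np.linspace(0.5, 5.0, 30)
y = 2.0 * np.exp(0.8 * x) * rng.lognormal(sigma=0.1, size=x.size)

b_hat, ln_a_hat = np.polyfit(x, np.log(y), 1)   # fit the transformed data
a_hat = np.exp(ln_a_hat)
print(f"a_hat = {a_hat:.2f}, b_hat = {b_hat:.2f}")   # close to 2.0 and 0.8
```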
More Related Content

What's hot

Numerical Investigation of Higher Order Nonlinear Problem in the Calculus Of ...
Numerical Investigation of Higher Order Nonlinear Problem in the Calculus Of ...Numerical Investigation of Higher Order Nonlinear Problem in the Calculus Of ...
Numerical Investigation of Higher Order Nonlinear Problem in the Calculus Of ...IOSR Journals
 
Numerical solution of linear volterra fredholm integro-
Numerical solution of linear volterra fredholm integro-Numerical solution of linear volterra fredholm integro-
Numerical solution of linear volterra fredholm integro-Alexander Decker
 
Slides csm
Slides csmSlides csm
Slides csmychaubey
 
A Generalized Sampling Theorem Over Galois Field Domains for Experimental Des...
A Generalized Sampling Theorem Over Galois Field Domains for Experimental Des...A Generalized Sampling Theorem Over Galois Field Domains for Experimental Des...
A Generalized Sampling Theorem Over Galois Field Domains for Experimental Des...csandit
 
Introductory maths analysis chapter 00 official
Introductory maths analysis   chapter 00 officialIntroductory maths analysis   chapter 00 official
Introductory maths analysis chapter 00 officialEvert Sandye Taasiringan
 
Symbolic Computation via Gröbner Basis
Symbolic Computation via Gröbner BasisSymbolic Computation via Gröbner Basis
Symbolic Computation via Gröbner BasisIJERA Editor
 
A machine learning method for efficient design optimization in nano-optics
A machine learning method for efficient design optimization in nano-opticsA machine learning method for efficient design optimization in nano-optics
A machine learning method for efficient design optimization in nano-opticsJCMwave
 
A NEW STUDY TO FIND OUT THE BEST COMPUTATIONAL METHOD FOR SOLVING THE NONLINE...
A NEW STUDY TO FIND OUT THE BEST COMPUTATIONAL METHOD FOR SOLVING THE NONLINE...A NEW STUDY TO FIND OUT THE BEST COMPUTATIONAL METHOD FOR SOLVING THE NONLINE...
A NEW STUDY TO FIND OUT THE BEST COMPUTATIONAL METHOD FOR SOLVING THE NONLINE...mathsjournal
 
A machine learning method for efficient design optimization in nano-optics
A machine learning method for efficient design optimization in nano-optics A machine learning method for efficient design optimization in nano-optics
A machine learning method for efficient design optimization in nano-optics JCMwave
 
On solving fuzzy delay differential equationusing bezier curves
On solving fuzzy delay differential equationusing bezier curves  On solving fuzzy delay differential equationusing bezier curves
On solving fuzzy delay differential equationusing bezier curves IJECEIAES
 
A Novel Approach for the Solution of a Love’s Integral Equations Using Bernst...
A Novel Approach for the Solution of a Love’s Integral Equations Using Bernst...A Novel Approach for the Solution of a Love’s Integral Equations Using Bernst...
A Novel Approach for the Solution of a Love’s Integral Equations Using Bernst...IOSRJM
 
Expectation Maximization Algorithm with Combinatorial Assumption
Expectation Maximization Algorithm with Combinatorial AssumptionExpectation Maximization Algorithm with Combinatorial Assumption
Expectation Maximization Algorithm with Combinatorial AssumptionLoc Nguyen
 
Introductory maths analysis chapter 14 official
Introductory maths analysis   chapter 14 officialIntroductory maths analysis   chapter 14 official
Introductory maths analysis chapter 14 officialEvert Sandye Taasiringan
 
Introductory maths analysis chapter 01 official
Introductory maths analysis   chapter 01 officialIntroductory maths analysis   chapter 01 official
Introductory maths analysis chapter 01 officialEvert Sandye Taasiringan
 

What's hot (19)

Numerical Investigation of Higher Order Nonlinear Problem in the Calculus Of ...
Numerical Investigation of Higher Order Nonlinear Problem in the Calculus Of ...Numerical Investigation of Higher Order Nonlinear Problem in the Calculus Of ...
Numerical Investigation of Higher Order Nonlinear Problem in the Calculus Of ...
 
Numerical solution of linear volterra fredholm integro-
Numerical solution of linear volterra fredholm integro-Numerical solution of linear volterra fredholm integro-
Numerical solution of linear volterra fredholm integro-
 
Slides csm
Slides csmSlides csm
Slides csm
 
A Generalized Sampling Theorem Over Galois Field Domains for Experimental Des...
A Generalized Sampling Theorem Over Galois Field Domains for Experimental Des...A Generalized Sampling Theorem Over Galois Field Domains for Experimental Des...
A Generalized Sampling Theorem Over Galois Field Domains for Experimental Des...
 
Introductory maths analysis chapter 00 official
Introductory maths analysis   chapter 00 officialIntroductory maths analysis   chapter 00 official
Introductory maths analysis chapter 00 official
 
Factorial Experiments
Factorial ExperimentsFactorial Experiments
Factorial Experiments
 
Chapter 6 - Matrix Algebra
Chapter 6 - Matrix AlgebraChapter 6 - Matrix Algebra
Chapter 6 - Matrix Algebra
 
Symbolic Computation via Gröbner Basis
Symbolic Computation via Gröbner BasisSymbolic Computation via Gröbner Basis
Symbolic Computation via Gröbner Basis
 
A machine learning method for efficient design optimization in nano-optics
A machine learning method for efficient design optimization in nano-opticsA machine learning method for efficient design optimization in nano-optics
A machine learning method for efficient design optimization in nano-optics
 
A NEW STUDY TO FIND OUT THE BEST COMPUTATIONAL METHOD FOR SOLVING THE NONLINE...
A NEW STUDY TO FIND OUT THE BEST COMPUTATIONAL METHOD FOR SOLVING THE NONLINE...A NEW STUDY TO FIND OUT THE BEST COMPUTATIONAL METHOD FOR SOLVING THE NONLINE...
A NEW STUDY TO FIND OUT THE BEST COMPUTATIONAL METHOD FOR SOLVING THE NONLINE...
 
Project 7
Project 7Project 7
Project 7
 
A machine learning method for efficient design optimization in nano-optics
A machine learning method for efficient design optimization in nano-optics A machine learning method for efficient design optimization in nano-optics
A machine learning method for efficient design optimization in nano-optics
 
On solving fuzzy delay differential equationusing bezier curves
On solving fuzzy delay differential equationusing bezier curves  On solving fuzzy delay differential equationusing bezier curves
On solving fuzzy delay differential equationusing bezier curves
 
Chapter 1 - Applications and More Algebra
Chapter 1 - Applications and More AlgebraChapter 1 - Applications and More Algebra
Chapter 1 - Applications and More Algebra
 
A Novel Approach for the Solution of a Love’s Integral Equations Using Bernst...
A Novel Approach for the Solution of a Love’s Integral Equations Using Bernst...A Novel Approach for the Solution of a Love’s Integral Equations Using Bernst...
A Novel Approach for the Solution of a Love’s Integral Equations Using Bernst...
 
Expectation Maximization Algorithm with Combinatorial Assumption
Expectation Maximization Algorithm with Combinatorial AssumptionExpectation Maximization Algorithm with Combinatorial Assumption
Expectation Maximization Algorithm with Combinatorial Assumption
 
Introductory maths analysis chapter 14 official
Introductory maths analysis   chapter 14 officialIntroductory maths analysis   chapter 14 official
Introductory maths analysis chapter 14 official
 
NCM Graph theory talk
NCM Graph theory talkNCM Graph theory talk
NCM Graph theory talk
 
Introductory maths analysis chapter 01 official
Introductory maths analysis   chapter 01 officialIntroductory maths analysis   chapter 01 official
Introductory maths analysis chapter 01 official
 

Similar to Reg n corr

Lecture Notes in Econometrics Arsen Palestini.pdf
Lecture Notes in Econometrics Arsen Palestini.pdfLecture Notes in Econometrics Arsen Palestini.pdf
Lecture Notes in Econometrics Arsen Palestini.pdfMDNomanCh
 
Basics of Regression analysis
 Basics of Regression analysis Basics of Regression analysis
Basics of Regression analysisMahak Vijayvargiya
 
Materi_Business_Intelligence_1.pdf
Materi_Business_Intelligence_1.pdfMateri_Business_Intelligence_1.pdf
Materi_Business_Intelligence_1.pdfHasan Dwi Cahyono
 
On Continuous Approximate Solution of Ordinary Differential Equations
On Continuous Approximate Solution of Ordinary Differential EquationsOn Continuous Approximate Solution of Ordinary Differential Equations
On Continuous Approximate Solution of Ordinary Differential EquationsWaqas Tariq
 
The linear regression model: Theory and Application
The linear regression model: Theory and ApplicationThe linear regression model: Theory and Application
The linear regression model: Theory and ApplicationUniversity of Salerno
 
Regression Analysis by Muthama JM
Regression Analysis by Muthama JM Regression Analysis by Muthama JM
Regression Analysis by Muthama JM Japheth Muthama
 
Regression analysis by Muthama JM
Regression analysis by Muthama JMRegression analysis by Muthama JM
Regression analysis by Muthama JMJapheth Muthama
 
Numerical Solutions of Burgers' Equation Project Report
Numerical Solutions of Burgers' Equation Project ReportNumerical Solutions of Burgers' Equation Project Report
Numerical Solutions of Burgers' Equation Project ReportShikhar Agarwal
 
Comparison of the optimal design
Comparison of the optimal designComparison of the optimal design
Comparison of the optimal designAlexander Decker
 
Newbold_chap12.ppt
Newbold_chap12.pptNewbold_chap12.ppt
Newbold_chap12.pptcfisicaster
 
columbus15_cattaneo.pdf
columbus15_cattaneo.pdfcolumbus15_cattaneo.pdf
columbus15_cattaneo.pdfAhmadM65
 
Econ 103 Homework 2Manu NavjeevanAugust 15, 2022S
Econ 103 Homework 2Manu NavjeevanAugust 15, 2022SEcon 103 Homework 2Manu NavjeevanAugust 15, 2022S
Econ 103 Homework 2Manu NavjeevanAugust 15, 2022SEvonCanales257
 

Similar to Reg n corr (20)

Estimation rs
Estimation rsEstimation rs
Estimation rs
 
Lecture Notes in Econometrics Arsen Palestini.pdf
Lecture Notes in Econometrics Arsen Palestini.pdfLecture Notes in Econometrics Arsen Palestini.pdf
Lecture Notes in Econometrics Arsen Palestini.pdf
 
Basics of Regression analysis
 Basics of Regression analysis Basics of Regression analysis
Basics of Regression analysis
 
Materi_Business_Intelligence_1.pdf
Materi_Business_Intelligence_1.pdfMateri_Business_Intelligence_1.pdf
Materi_Business_Intelligence_1.pdf
 
On Continuous Approximate Solution of Ordinary Differential Equations
On Continuous Approximate Solution of Ordinary Differential EquationsOn Continuous Approximate Solution of Ordinary Differential Equations
On Continuous Approximate Solution of Ordinary Differential Equations
 
The linear regression model: Theory and Application
The linear regression model: Theory and ApplicationThe linear regression model: Theory and Application
The linear regression model: Theory and Application
 
Regression Analysis by Muthama JM
Regression Analysis by Muthama JM Regression Analysis by Muthama JM
Regression Analysis by Muthama JM
 
Regression analysis by Muthama JM
Regression analysis by Muthama JMRegression analysis by Muthama JM
Regression analysis by Muthama JM
 
Cost indexes
Cost indexesCost indexes
Cost indexes
 
Chapter6.pdf.pdf
Chapter6.pdf.pdfChapter6.pdf.pdf
Chapter6.pdf.pdf
 
Seattle.Slides.7
Seattle.Slides.7Seattle.Slides.7
Seattle.Slides.7
 
Class9_PCA_final.ppt
Class9_PCA_final.pptClass9_PCA_final.ppt
Class9_PCA_final.ppt
 
Regression
RegressionRegression
Regression
 
Numerical Solutions of Burgers' Equation Project Report
Numerical Solutions of Burgers' Equation Project ReportNumerical Solutions of Burgers' Equation Project Report
Numerical Solutions of Burgers' Equation Project Report
 
Regression Analysis
Regression AnalysisRegression Analysis
Regression Analysis
 
D026017036
D026017036D026017036
D026017036
 
Comparison of the optimal design
Comparison of the optimal designComparison of the optimal design
Comparison of the optimal design
 
Newbold_chap12.ppt
Newbold_chap12.pptNewbold_chap12.ppt
Newbold_chap12.ppt
 
columbus15_cattaneo.pdf
columbus15_cattaneo.pdfcolumbus15_cattaneo.pdf
columbus15_cattaneo.pdf
 
Econ 103 Homework 2Manu NavjeevanAugust 15, 2022S
Econ 103 Homework 2Manu NavjeevanAugust 15, 2022SEcon 103 Homework 2Manu NavjeevanAugust 15, 2022S
Econ 103 Homework 2Manu NavjeevanAugust 15, 2022S
 

Recently uploaded

Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxUnboundStockton
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxDr.Ibrahim Hassaan
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaVirag Sontakke
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.arsicmarija21
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxEyham Joco
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitolTechU
 

Recently uploaded (20)

Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docx
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptx
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of India
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptx
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptx
 

Reg n corr

  • 1. Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 1 / 67 Linear Regression & Correlation HCM University of Technology Dzung Hoang Nguyen October 1, 2020
  • 2. Outline 1 Learning Objectives 2 Regression Empirical Models Simple Linear Regression Hypothesis Tests in Simple Linear Regression Confidence Intervals 3 Correlation 4 Transformation and Logistic Regression Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 2 / 67
  • 3. Learning Objectives • Use simple linear regression for building empirical models to engineering and scientific data. • Understand how the method of least squares is used to estimate the parameters in a linear regression model. • Analyze residuals to determine if the regression model is an adequate fit to the data or to see if any underlying assumptions are violated. • Test the statistical hypotheses and construct confidence intervals on the regression model parameters. • Use the regression model to make a prediction of a future observation and construct an appropriate prediction interval on the future observation. • Apply the correlation model. • Use simple transformations to achieve a linear regression model. Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 3 / 67
  • 4. Next Subsection 1 Learning Objectives 2 Regression Empirical Models Simple Linear Regression Hypothesis Tests in Simple Linear Regression Confidence Intervals 3 Correlation 4 Transformation and Logistic Regression Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 4 / 67
  • 5. Empirical Models • Many problems in engineering and science involve exploring the relationships between two or more variables. • Regression analysis is a statistical technique that is very useful for these types of problems. • For example, in a chemical process, suppose that the yield of the product is related to the process-operating temperature. • Regression analysis can be used to build a model to predict yield at a given temperature level. Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 5 / 67
  • 6. Oxygen Purity Example Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 6 / 67
  • 7. Next Subsection 1 Learning Objectives 2 Regression Empirical Models Simple Linear Regression Hypothesis Tests in Simple Linear Regression Confidence Intervals 3 Correlation 4 Transformation and Logistic Regression Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 7 / 67
  • 8. Simple Linear Regression Definition • The simple linear regression considers a single regressor or predictor x and a dependent or response variable Y. • The expected value of Y at each level of x is a random variable: E(Y |x) = β0 + β1x • We assume that each observation, Y, can be described by the model: Y = β0 + β1x + Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 8 / 67
  • 9. Simple Linear Regression Example Fig 11 − 3 : deviation of the data from the estimated regression model Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 9 / 67
  • 10. Least Squares Estimates L = n X i=1 2 i = n X i=1 (yi − β0 − β1xi )2 ∂L ∂β0
  • 11.
  • 12.
  • 13. β0,β1 = −2 n X i=1 yi − ˆ β0 − ˆ β1xi = 0 ∂L ∂β1
  • 14.
  • 15.
  • 16. β0,β1 = −2 n X i=1 yi − ˆ β0 − ˆ β1xi xi = 0 n ˆ β0 + ˆ β1 n X i=1 xi = n X i=1 yi ˆ β0 n X i=1 xi + ˆ β1 n X i=1 x2 i = n X i=1 yi xi Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 10 / 67
  • 17. Least Squares Estimates The least-squares estimates of the intercept and slope in the simple linear regression model are: ˆ β0 = ȳ − ˆ β1x̄ ˆ β1 = n X i=1 yi xi − n X i=1 yi ! n X i=1 xi ! n n X i=1 x2 i − n X i=1 xi !2 n where ȳ = 1 n n X i=1 yi ; x̄ = 1 n n X i=1 xi Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 11 / 67
  • 18. Least Squares Estimates Definition The fitted or estimated regression line is therefore ŷ = ˆ β0 + ˆ β1x Note that each pair of observations satisfies the relationship: yi = ˆ β0 + ˆ β1xi + ei , i = 1, 2, ...n where ei = yi − ˆ yi is called the residual that describes the error in the fit of the model to the ith observation yi . Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 12 / 67
  • 19. Some notations Sxx = n X i=1 (xi − x̄)2 = n X i=1 x2 i − n X i=1 xi !2 n Sxy = n X i=1 yi (xi − x̄)2 = n X i=1 xi yi − n X i=1 xi ! n X i=1 yi ! n ˆ β1 = Sxy Sxx Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 13 / 67
  • 20. Oxygen Purity Example Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 14 / 67
  • 21. Oxygen Purity We will fit a simple linear regression model to the oxygen purity data in Table 11 − 1. The following quantities may be computed: • n = 20; 20 X i=1 xi = 23.92; 20 X i=1 yi = 1843.21 • x̄ = 1.1960; ȳ = 92.1605 • 20 X i=1 y2 i = 170044.5321 20 X i=1 x2 i = 29.2892 • 20 X i=1 xi yi = 2214.6566 Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 15 / 67
  • 22. Oxygen Purity-continued We will fit a simple linear regression model to the oxygen purity data in Table 11 − 1. The following quantities may be computed: Sxx = 20 X i=1 x2 i − 20 X i=1 xi !2 20 = 29.2892 − (23.92)2 20 = 0.68088 Sxy = 20 X i=1 xi yi − 20 X i=1 xi ! 20 X i=1 yi ! 20 = 2214.6566 − 23.92 ∗ 1843.21 20 = 10.17744 ˆ β1 = Sxy Sxx Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 16 / 67
  • 23. Oxygen Purity-continued Therefore, the least squares estimates of the slope and intercept are: ˆ β1 = Sxy Sxx = 10.17744 0.68088 = 14.94748 and ˆ β0 = ȳ − ˆ β1x̄ = 92.1605 − (14.94748)1.196 = 74.28331 The fitted simple linear regression model (with the coefficients reported to three decimal places) is ŷ = 74.283 + 14.947x Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 17 / 67
  • 24. Estimating σ2 The residuals ei = yi − ˆ yi are used to obtain an estimate of σ2 . The sum of squares of the residuals, often called the error sum of squares, is SSE = n X i=1 e2 i = n X i=1 (yi − ˆ yi )2 The expected value of the error sum of squares is E(SSE ) = (n − 2)σ2 Therefore, an unbiased estimator of σ2 is σ̂2 = SSE n − 2 where SSE can be easily computed using SSE = SST − ˆ β1Sxy Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 18 / 67
  • 25. Estimating σ2 SSE = n X i=1 e2 i = n X i=1 (yi − ˆ yi )2 ˆ yi = ˆ β0 + ˆ β1xi ; ˆ β0 = ȳ − ˆ β1x̄ = n X i=1 [(yi − ȳ) + ˆ β1(xi − x̄)]2 SSE = n X i=1 (yi − ȳ)2 + ˆ β1 2 n X i=1 (xi − x̄)2 − 2 ˆ β1 n X i=1 (xi − x̄)(yi − ȳ) ˆ β1 2 = ˆ β1 Sxy Sxx ; Sxx = n X i=1 (xi − x̄)2 ; Sxy = n X i=1 (xi − x̄)(yi − ȳ) SSE = SST − ˆ β1Sxy Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 19 / 67
  • 26. Properties of the Least Squares Estimators Slope Properties E( ˆ β1) = β1 V ( ˆ β1) = σ2 Sxx Intercept Properties E( ˆ β0) = β0 V ( ˆ β0) = σ2 [ 1 n + x̄2 Sxx ] Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 20 / 67
  • 27. Hypothesis Tests in Simple Linear Regression Use of t-Tests Suppose we wish to test H0 : β1 = β1,0 H1 : β1 6= β1,0 An appropriate test statistic would be t0 = ˆ β1 − β1,0 p σ̂2/Sxx Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 21 / 67
  • 28. Next Subsection 1 Learning Objectives 2 Regression Empirical Models Simple Linear Regression Hypothesis Tests in Simple Linear Regression Confidence Intervals 3 Correlation 4 Transformation and Logistic Regression Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 22 / 67
  • 29. Hypothesis Tests in Simple Linear Regression Use of t-Tests The test statistic could also be written as: t0 = ˆ β1 − β1,0 se( ˆ β1) We would reject the null hypothesis if |t0| tα/2,n−2 Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 23 / 67
  • 30. Hypothesis Tests in Simple Linear Regression Use of t-Tests Suppose we wish to test H0 : β0 = β0,0 H1 : β0 6= β0,0 An appropriate test statistic would be t0 = ˆ β0 − β0,0 p σ̂2/Sxx = ˆ β0 − β0,0 se( ˆ β0) Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 24 / 67
  • 31. Hypothesis Tests in Simple Linear Regression Use of t-Tests We would reject the null hypothesis if |t0| tα/2,n−2 Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 25 / 67
  • 32. Hypothesis Tests in Simple Linear Regression Use of t-Tests An important special case of the hypotheses of Equation is H0 : β1 = 0 H1 : β1 6= 0 These hypotheses relate to the significance of regression. Failure to reject H0 is equivalent to concluding that there is no linear relationship between x and Y . Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 26 / 67
  • 33. Hypothesis Tests in Simple Linear Regression Example We will test for significance of regression using the model for the oxygen purity data from Example. The hypotheses are: H0 : β1 = 0 H1 : β1 6= 0 and we will use α = 0.01. From Example and dat Table we have ˆ β1 = 14.947 n = 20 Sxx = 0.68088 σ̂2 = 1.18 so the t-statistic become t0 = ˆ β1 p σ̂2/Sxx = ˆ β1 se( ˆ β1) = 14.974 p 1.18/0.68088 = 11.35 Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 27 / 67
  • 34. Hypothesis Tests in Simple Linear Regression Example Practical Interpretation: Since the reference value of t is t0.005,18 = 2.88, the value of the test statistic is very far into the critical region, implying that H0 : β1 = 0 should be rejected. There is strong evidence to support this claim. The P-value for this test is P ' 1.23 ∗ 10−9 . This was obtained manually with a calculator. Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 28 / 67
  • 35. Hypothesis Tests in Simple Linear Regression ANOVA approach to test Significance of Regression The ANOVA identity is n X i=1 (yi − ȳ)2 = n X i=1 (ˆ yi − ȳ)2 + n X i=1 (yi − ˆ yi )2 Symbolically, SST = SSR + SSE Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 29 / 67
  • 36. Hypothesis Tests in Simple Linear Regression ANOVA approach to test Significance of Regression If the null hypothesis, H0 : β1 = 0 is true, the statistic: F0 = SSR/1 SSE /(n − 2) = MSR MSE follows the F1,n−2 distribution and we would reject if F0 fα,1,n−2 The quantities, MSR and MSE are called mean squares. Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 30 / 67
  • 37. Hypothesis Tests in Simple Linear Regression ANOVA approach to test Significance of Regression ANOVA table Source of Sum of Squares Degrees of Mean Square F0 Variation Freedom Regression SSR = ˆ β1Sxy 1 MSR MSR /MSE Error SSE = SST − ˆ β1Sxy n − 2 MSE Total SST n − 1 Note that MSE = σ̂2 Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 31 / 67
  • 38. ANOVA example Example We will use the analysis of variance approach to test for significance of regression using the oxygen purity data model from Example 11-1. Recall that SST = 173.38; ˆ β1 = 14.947; Sxy = 10.17744, and n = 20. The regression sum of squares is: SSR = ˆ β1Sxy = 14.947 ∗ 10.17744 = 152.13 and the error sum of squares is: SSE = SST − SSR = 173.28 − 152.13 = 21.25 The ANOVA for testing H0 : β1 = 0 is your task. Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 32 / 67
  • 39. ANOVA example Example The test statistic is f0 = MSR MSE = 152.13 1.18 = 128.86 for which we find that the P−value is P ' 1.2310−9 , so we conclude that β1 is not zero. There are frequently minor differences in terminology among computer packages. For example, sometimes the regression sum of squares is called the ”model” sum of squares, and the error sum of squares is called the ”residual” sum of squares. Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 33 / 67
  • 40. Next Subsection 1 Learning Objectives 2 Regression Empirical Models Simple Linear Regression Hypothesis Tests in Simple Linear Regression Confidence Intervals 3 Correlation 4 Transformation and Logistic Regression Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 34 / 67
  • 41. Confidence Intervals on the Slope and Intercept Definition Under the assumption that the observations are normally and independently distributed, a 100(1 − α)% confidence interval on the slope β1 in simple linear regression is: β̂1 − tα/2,n−2 √(σ̂2/Sxx) ≤ β1 ≤ β̂1 + tα/2,n−2 √(σ̂2/Sxx) Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 35 / 67
  • 42. Confidence Intervals on the Slope and Intercept Definition Similarly, a 100(1 − α)% confidence interval on the intercept β0 is: β̂0 − tα/2,n−2 √(σ̂2[1/n + x̄2/Sxx]) ≤ β0 ≤ β̂0 + tα/2,n−2 √(σ̂2[1/n + x̄2/Sxx]) Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 36 / 67
  • 43. Confidence Intervals Example We will find a 95% confidence interval on the slope of the regression line using the data in Example 11-1. Recall that β̂1 = 14.947, Sxx = 0.68088, and σ̂2 = 1.18. Then we find: β̂1 − t0.025,18 √(σ̂2/Sxx) ≤ β1 ≤ β̂1 + t0.025,18 √(σ̂2/Sxx) or 14.947 − 2.101 √(1.18/0.68088) ≤ β1 ≤ 14.947 + 2.101 √(1.18/0.68088) Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 37 / 67
  • 44. Confidence Intervals Example 14.947 − 2.101 √(1.18/0.68088) ≤ β1 ≤ 14.947 + 2.101 √(1.18/0.68088) This simplifies to: 12.181 ≤ β1 ≤ 17.713 Practical Interpretation: This CI does not include zero, so there is strong evidence (at α = 0.05) that the slope is not zero. The CI is reasonably narrow (±2.766) because the error variance is fairly small. Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 38 / 67
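A sketch of this interval computed from the quoted summary statistics (variable names are mine, not from the slides):

```python
# Sketch of the 95% confidence interval on the slope.
from math import sqrt
from scipy import stats

beta1_hat, sigma2_hat, sxx, n = 14.947, 1.18, 0.68088, 20
t_crit = stats.t.ppf(0.975, df=n - 2)          # t_{0.025,18} = 2.101
half_width = t_crit * sqrt(sigma2_hat / sxx)   # about 2.766

print(beta1_hat - half_width, beta1_hat + half_width)  # about (12.18, 17.71)
```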
  • 45. Confidence Intervals on the Mean Response µ̂Y|x0 = β̂0 + β̂1 x0 Definition A 100(1 − α)% confidence interval about the mean response at the value x = x0, say µY|x0, is given by µ̂Y|x0 − tα/2,n−2 √(σ̂2[1/n + (x0 − x̄)2/Sxx]) ≤ µY|x0 ≤ µ̂Y|x0 + tα/2,n−2 √(σ̂2[1/n + (x0 − x̄)2/Sxx]) where µ̂Y|x0 = β̂0 + β̂1 x0 is computed from the fitted regression model. Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 39 / 67
  • 46. Confidence Intervals Example We will construct a 95% confidence interval about the mean response for the data in Example 11-1. The fitted model is µ̂Y|x0 = 74.283 + 14.947 x0, and the 95% confidence interval on µY|x0 is found as: µ̂Y|x0 ± 2.101 √(1.18[1/20 + (x0 − 1.1960)2/0.68088]) Suppose that we are interested in predicting the mean oxygen purity when x0 = 1.00%. Then µ̂Y|1.00 = 74.283 + 14.947(1.00) = 89.23 and the 95% confidence interval is: 89.23 ± 2.101 √(1.18[1/20 + (1.00 − 1.1960)2/0.68088]) Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 40 / 67
  • 47. Confidence Intervals Example 89.23 ± 2.101 √(1.18[1/20 + (1.00 − 1.1960)2/0.68088]) or 89.23 ± 0.75 Therefore, the 95% CI on µY|1.00 is 88.48 ≤ µY|1.00 ≤ 89.98, a reasonably narrow CI. Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 41 / 67
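A sketch of this mean-response interval computed from the quoted summaries (names are mine):

```python
# Sketch of the 95% CI on the mean response at x0 = 1.00.
from math import sqrt
from scipy import stats

beta0_hat, beta1_hat = 74.283, 14.947
sigma2_hat, sxx, x_bar, n = 1.18, 0.68088, 1.1960, 20
x0 = 1.00

mu_hat = beta0_hat + beta1_hat * x0          # about 89.23
t_crit = stats.t.ppf(0.975, df=n - 2)        # 2.101
half = t_crit * sqrt(sigma2_hat * (1 / n + (x0 - x_bar) ** 2 / sxx))  # about 0.75

print(mu_hat - half, mu_hat + half)          # about (88.48, 89.98)
```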
  • 48. Prediction New Observations A 100(1 − α)% prediction interval on a future observation Y0 at the value x0 is given by ŷ0 − tα/2,n−2 √(σ̂2[1 + 1/n + (x0 − x̄)2/Sxx]) ≤ Y0 ≤ ŷ0 + tα/2,n−2 √(σ̂2[1 + 1/n + (x0 − x̄)2/Sxx]) The value ŷ0 is computed from the regression model ŷ0 = β̂0 + β̂1 x0. Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 42 / 67
  • 49. Prediction New Observations Example To illustrate the construction of a prediction interval, suppose we use the data in Example 11-1 and find a 95% prediction interval on the next observation of oxygen purity at x0 = 1.00%. Recalling that ŷ0 = 89.23, we find that the prediction interval is: 89.23 ± 2.101 √(1.18[1 + 1/20 + (1.00 − 1.1960)2/0.68088]) which simplifies to: 86.83 ≤ y0 ≤ 91.63 This is a reasonably narrow prediction interval. Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 43 / 67
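The corresponding prediction-interval sketch differs from the mean-response CI only by the extra "1 +" under the square root (again, names are mine):

```python
# Sketch of the 95% prediction interval on a new observation at x0 = 1.00.
from math import sqrt
from scipy import stats

y0_hat, sigma2_hat, sxx, x_bar, n = 89.23, 1.18, 0.68088, 1.1960, 20
x0 = 1.00

t_crit = stats.t.ppf(0.975, df=n - 2)        # 2.101
half = t_crit * sqrt(sigma2_hat * (1 + 1 / n + (x0 - x_bar) ** 2 / sxx))  # about 2.40

print(y0_hat - half, y0_hat + half)          # about (86.83, 91.63)
```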
  • 50. Adequacy of the Regression Model Fitting a regression model requires several assumptions: • Errors are uncorrelated random variables with mean zero; • Errors have constant variance; and • Errors are normally distributed. The analyst should always consider the validity of these assumptions to be doubtful and conduct analyses to examine the adequacy of the model. Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 44 / 67
  • 51. Residual Analysis • The residuals from a regression model are ei = yi − ŷi, where yi is an actual observation and ŷi is the corresponding fitted value from the regression model. • Analysis of the residuals is frequently helpful in checking the assumption that the errors are approximately normally distributed with constant variance, and in determining whether additional terms in the model would be useful. Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 45 / 67
  • 52. Adequacy of the Regression Model Example The regression model for the oxygen purity data is: ŷ = 74.283 + 14.947x Table 11-4 presents the observed and predicted values of y at each value of x from this data set, along with the corresponding residual. Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 46 / 67
  • 53. Adequacy of the Regression Model Example A normal probability plot of the residuals is shown in Fig. 11-10. Since the residuals fall approximately along a straight line in the figure, we conclude that there is no severe departure from normality. The residuals are also plotted against the predicted value ŷi in Fig. 11-11... Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 47 / 67
  • 54. Adequacy of the Regression Model Example ... and against the hydrocarbon levels xi in Fig. 11-12. These plots do not indicate any serious model inadequacies. Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 48 / 67
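To make the residual checks concrete, here is a generic sketch in Python (not part of the slides); x and y stand in for the hydrocarbon-level and purity columns of Table 11-4, which are not reproduced here, and the function name is my own:

```python
# Generic residual-checking sketch: fit a simple linear regression by least
# squares, then plot the residuals three ways as described on the slides.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def check_residuals(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    fitted = b0 + b1 * x
    resid = y - fitted

    fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))
    stats.probplot(resid, dist="norm", plot=axes[0])    # normality check
    axes[1].scatter(fitted, resid); axes[1].axhline(0)  # residuals vs fitted values
    axes[2].scatter(x, resid);      axes[2].axhline(0)  # residuals vs regressor
    plt.show()
```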
  • 55. Coefficient of Determination-R2 • The quantity R2 = SSR/SST = 1 − SSE/SST is called the coefficient of determination and is often used to judge the adequacy of a regression model • 0 ≤ R2 ≤ 1 • We often refer (loosely) to R2 as the amount of variability in the data explained or accounted for by the regression model Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 49 / 67
  • 56. Coefficient of Determination-R2 • For the oxygen purity regression model: R2 = SSR/SST = 152.13/173.38 = 0.877 • Thus, the model accounts for 87.7% of the variability in the data. Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 50 / 67
  • 57. Correlation Coefficient Definition We assume that the joint distribution of Xi and Yi is the bivariate normal distribution, where µY and σ2Y are the mean and variance of Y, µX and σ2X are the mean and variance of X, and ρ is the correlation coefficient between Y and X. Recall that the correlation coefficient is defined as ρ = σXY/(σX σY), where σXY is the covariance between Y and X. The conditional distribution of Y for a given value of X = x is fY|x(y) = (1/(√(2π) σY|x)) exp[−(1/2)((y − β0 − β1x)/σY|x)2] where β0 = µY − µX ρ (σY/σX) and β1 = (σY/σX) ρ Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 51 / 67
  • 58. Definition It is possible to draw inferences about the correlation coefficient ρ in this model. The estimator of ρ is the sample correlation coefficient: R = Σ Yi(Xi − X̄) / [Σ(Xi − X̄)2 Σ(Yi − Ȳ)2]1/2 = SXY/(SXX SST)1/2, with sums over i = 1, …, n. Note that: β̂1 = (SST/SXX)1/2 R We may also write: R2 = β̂1^2 (SXX/SYY) = β̂1 SXY/SST = SSR/SST Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 52 / 67
  • 59. It is often useful to test the hypothesis: H0 : ρ = 0, H1 : ρ ≠ 0 The appropriate test statistic for these hypotheses is T0 = R√(n − 2)/√(1 − R2) Reject H0 if |t0| > tα/2,n−2 Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 53 / 67
  • 60. The test procedure for the hypothesis H0 : ρ = ρ0, H1 : ρ ≠ ρ0, where ρ0 ≠ 0, is somewhat more complicated. In this case, the appropriate test statistic is Z0 = (arctanh(R) − arctanh(ρ0))(n − 3)1/2 Reject H0 if |z0| > zα/2 The approximate 100(1 − α)% confidence interval is tanh(arctanh(r) − zα/2/√(n − 3)) ≤ ρ ≤ tanh(arctanh(r) + zα/2/√(n − 3)) where tanh(u) = (eu − e−u)/(eu + e−u) Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 54 / 67
  • 61. Correlation-Example Example Wire Bond Pull Strength: an application of regression analysis is described in which an engineer at a semiconductor assembly plant is investigating the relationship between the pull strength of a wire bond and two factors: wire length and die height. In this example, we will consider only one of the factors, the wire length. A random sample of 25 units is selected and tested, and the wire bond pull strength and wire length are observed for each unit. The data are shown in Table 1-2. We assume that pull strength and wire length are jointly normally distributed. Figure 11-13 shows a scatter diagram of wire bond strength versus wire length. Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 55 / 67
  • 62. Correlation-Example Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 56 / 67
  • 63. Correlation-Example continued Example Now SXX = 698.56 and SXY = 2027.7132, and the sample correlation coefficient is r = SXY/[SXX SST]1/2 = 2027.7132/[(698.560)(6105.9)]1/2 = 0.9818 Note that r2 = 0.98182 = 0.9640, or that approximately 96.40% of the variability in pull strength is explained by the linear relationship to wire length. Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 57 / 67
  • 64. Correlation-Example continued Now suppose that we wish to test the hypotheses: H0 : ρ = 0, H1 : ρ ≠ 0 with α = 0.05. We can compute the t-statistic as t0 = r√(n − 2)/√(1 − r2) = 0.9818√23/√(1 − 0.9640) = 24.8 Because t0.025,23 = 2.069, we reject H0 and conclude that the correlation coefficient ρ ≠ 0. Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 58 / 67
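A sketch of the same test in Python (the names are mine, not from the slides). With the raw 25 observations available, scipy.stats.pearsonr(x, y) would return r together with a two-sided P-value directly; here we work from the quoted r and n:

```python
# Sketch of the correlation significance test for the wire bond data,
# using the sample correlation quoted on the slides.
from math import sqrt
from scipy import stats

r, n = 0.9818, 25
t0 = r * sqrt(n - 2) / sqrt(1 - r ** 2)       # about 24.8
p_value = 2 * stats.t.sf(abs(t0), df=n - 2)   # two-sided P-value
t_crit = stats.t.ppf(0.975, df=n - 2)         # t_{0.025,23} = 2.069

print(t0, p_value, abs(t0) > t_crit)          # reject H0: rho = 0
```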
  • 65. Correlation-Example continued Finally, we may construct an approximate 95% confidence interval on ρ. Since arctanh(r) = arctanh(0.9818) = 2.3452: tanh(2.3452 − 1.96/√22) ≤ ρ ≤ tanh(2.3452 + 1.96/√22) which reduces to 0.9585 ≤ ρ ≤ 0.9921 Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 59 / 67
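The Fisher z-transform interval can be sketched as follows (not part of the slides; numpy's arctanh/tanh implement the transform used above):

```python
# Sketch of the approximate 95% CI on rho via the Fisher z-transform.
import numpy as np
from scipy import stats

r, n, alpha = 0.9818, 25, 0.05
z = np.arctanh(r)                        # about 2.345
z_crit = stats.norm.ppf(1 - alpha / 2)   # 1.96
lo = np.tanh(z - z_crit / np.sqrt(n - 3))
hi = np.tanh(z + z_crit / np.sqrt(n - 3))

print(lo, hi)                            # about (0.9585, 0.9921)
```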
  • 66. Example A research engineer is investigating the use of a windmill to generate electricity and has collected data on the DC output from this windmill and the corresponding wind velocity. If we fit a straight-line model to the data, the regression model is: ŷ = 0.1309 + 0.2411x. The summary statistics for this model are as follows: R2 = 0.8745, MSE = σ̂2 = 0.0557, F0 = 160.26 (P < 0.0001) Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 60 / 67
  • 67. Definition We occasionally find that the straight-line regression model Y = β0 + β1x + ε is inappropriate because the true regression function is nonlinear. Sometimes nonlinearity is visually determined from the scatter diagram, and sometimes, because of prior experience or underlying theory, we know in advance that the model is nonlinear. Occasionally, a scatter diagram will exhibit an apparent nonlinear relationship between Y and x. In some of these situations, a nonlinear function can be expressed as a straight line by using a suitable transformation. Such nonlinear models are called intrinsically linear. Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 61 / 67
  • 68. Transformation and Logistic Regression Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 62 / 67
  • 69. Example Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 63 / 67
  • 70. A plot of the residuals from the transformed model versus ŷi is shown in Figure 11-17. This plot does not reveal any serious problem with inequality of variance. The normal probability plot, shown in Figure 11-18, gives a mild indication that the errors come from a distribution with heavier tails than the normal (notice the slight upward and downward curve at the extremes). This normal probability plot has the z-score value plotted on the horizontal axis. Since there is no strong signal of model inadequacy, we conclude that the transformed model is satisfactory. Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 64 / 67
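As a sketch of how such a transformed fit could be computed, the snippet below assumes the reciprocal transform x' = 1/x that is commonly used for this classic windmill data set; the slides do not state the transform explicitly, so treat this purely as an illustration (the function and variable names are mine):

```python
# Sketch of fitting an intrinsically linear model by transforming the
# regressor before ordinary least squares (assumed transform: x' = 1/x).
import numpy as np

def fit_transformed(x, y):
    xt = 1.0 / np.asarray(x, float)   # transformed regressor x' = 1/x
    y = np.asarray(y, float)
    b1, b0 = np.polyfit(xt, y, 1)     # least squares fit of y on x'
    fitted = b0 + b1 * xt
    resid = y - fitted                # inspect these as in Figs. 11-17 and 11-18
    return b0, b1, resid
```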
  • 71. Important Terms and Concepts • Analysis of variance test in regression • Confidence interval on mean response • Correlation coefficient • Empirical model • Confidence intervals on model parameters • Intrinsically linear model • Least squares estimation of regression model parameters • Logistic regression • Model adequacy checking • Odds ratio • Prediction interval on a future observation • Regression analysis • Residual plots • Residuals • Scatter diagram • Simple linear regression model • Standard error • Statistical test on model parameters • Transformations Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 65 / 67
  • 72. Important Terms and Concepts Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 66 / 67
  • 73. Last Page Summary End of my talk with a tidy TU Delft lay-out. Thank you! Dzung Hoang Nguyen (SensoryLab) Regression and Correlation talk October 1, 2020 67 / 67