Elementary Statistics
Chapter 10: Correlation and Regression
10.2 Regression
Chapter 10: Correlation and Regression
10.1 Correlation
10.2 Regression
10.3 Prediction Intervals and Variation
10.4 Multiple Regression
10.5 Nonlinear Regression
Objectives:
• Draw a scatter plot for a set of ordered pairs.
• Compute the correlation coefficient.
• Test the hypothesis H0: ρ = 0.
• Compute the equation of the regression line & the coefficient of determination.
• Compute the standard error of the estimate & a prediction interval.
Key Concepts: If the value of the correlation coefficient is significant, determine the equation of the regression line.
Find the equation of the straight line that best fits the points in a scatterplot of paired sample data. That best-fitting straight line is called the regression line, and its equation is called the regression equation. The regression equation expresses a relationship between x (called the independent variable, predictor variable, or explanatory variable) and y (called the dependent variable or response variable). The typical equation of a straight line is expressed in the form y = mx + b, where b is the y-intercept and m is the slope.
Regression Line: Given a collection of paired sample data, the regression line (or line of best fit, or least-squares line) is the straight line that “best” fits the scatterplot of the data.
If there is not a significant linear correlation, the best predicted y-value is the sample mean $\bar{y}$.
If there is a significant linear correlation, the best predicted y-value is found by substituting the x-value into the regression equation.
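Regression equation: $\hat{y} = b_0 + b_1 x$, or $\hat{y} = a + bx$
Slope: $b_1 = \dfrac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$
Y-intercept: $b_0 = \bar{y} - b_1\bar{x}$, where $\bar{y} = \dfrac{\sum y}{n}$ and $\bar{x} = \dfrac{\sum x}{n}$
Population Parameter: $y = \beta_0 + \beta_1 x$
Sample Statistic: $\hat{y} = b_0 + b_1 x$
TI calculator: $\hat{y} = a + bx$
Also: $b_1 = b = r\,\dfrac{s_y}{s_x}$ and $b_0 = a = \bar{y} - b_1\bar{x}$, where
r is the linear correlation coefficient,
$s_y$ is the standard deviation of the sample y values,
$s_x$ is the standard deviation of the sample x values.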
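The two slope formulas above can be checked against each other with a short Python sketch. The function names are illustrative only (they do not come from the slides), and the data are the four pairs used in Example 1 of this deck. The second function computes r as the sum of z-score products divided by n − 1.

```python
from statistics import mean, stdev

def regression_coefficients(xs, ys):
    """Return (b0, b1) for y-hat = b0 + b1*x using the sum formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * sum_x / n  # b0 = y-bar - b1 * x-bar
    return b0, b1

def slope_from_r(xs, ys):
    """Return b1 = r * s_y / s_x, with r computed from z-scores."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
            for x, y in zip(xs, ys)) / (n - 1)
    return r * s_y / s_x

xs = [1, 1, 3, 5]
ys = [2, 8, 6, 4]
b0, b1 = regression_coefficients(xs, ys)
print(round(b0, 3), round(b1, 3))  # 5.455 -0.182
```

Both routes should give the same slope up to floating-point rounding, which is why the slides list them as interchangeable.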
10.2 Regression, Making Predictions
Regression equations are often useful for predicting the value of one variable, given some specific value of the other variable:
1. Bad Model: If the regression equation does not appear to be useful for making predictions, don’t use it for making predictions. For bad models, the best predicted value of a variable is simply its sample mean, $\bar{y}$.
2. Good Model: Use the regression equation for predictions only if the graph of the regression line on the scatterplot confirms that the regression line fits the points reasonably well.
3. Correlation: Use the regression equation for predictions only if the linear correlation coefficient r indicates that there is a linear correlation between the two variables.
4. Scope: Use the regression line for predictions only if the x-value used for the prediction does not go much beyond the scope of the available sample data.
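As a rough illustration of guidelines 1 and 3, the decision between the sample mean and the regression equation can be written as a small Python function. This is a sketch under stated assumptions: the function name and its parameters are hypothetical, and the critical value of r must be supplied by the caller from a correlation table. The usage line takes r = −0.135 and the critical value 0.950 from Example 1 of this deck.

```python
def best_predicted_y(x0, xs, ys, r, r_critical):
    """Return the best predicted y for x0.

    If |r| does not exceed the table's critical value, the correlation is
    not significant and the sample mean of y is returned. Otherwise the
    least-squares regression equation is evaluated at x0.
    """
    n = len(xs)
    y_bar = sum(ys) / n
    if abs(r) <= r_critical:
        return y_bar  # bad model: fall back to the sample mean
    x_bar = sum(xs) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    return b0 + b1 * x0

# Example 1 data: r = -0.135 is well inside the critical values of +/-0.950
print(best_predicted_y(2, [1, 1, 3, 5], [2, 8, 6, 4], r=-0.135, r_critical=0.950))  # 5.0
```

The function does not check the scope guideline; a caller would still need to confirm that x0 lies near the range of the sample x values.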
Best fit means that the sum of the squares of the vertical distances (residuals) from each point to the line is at a minimum.
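One way to see what "at a minimum" means is to compare the sum of squared residuals for the least-squares line with the same sum for slightly different lines. The sketch below uses the Example 1 data from this deck; the size of the nudges (0.1 and 0.05) is an arbitrary choice made for illustration.

```python
def sum_squared_residuals(xs, ys, b0, b1):
    """Sum of squared vertical distances from the points to y = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 1, 3, 5]
ys = [2, 8, 6, 4]

# Least-squares coefficients for these data (see Example 1c in this deck)
b1 = -8 / 44
b0 = 5 - b1 * 2.5

best = sum_squared_residuals(xs, ys, b0, b1)

# Nudge the intercept or the slope and recompute the sum each time
perturbed = [
    sum_squared_residuals(xs, ys, b0 + db0, b1 + db1)
    for db0, db1 in [(0.1, 0), (-0.1, 0), (0, 0.05), (0, -0.05)]
]
print(round(best, 3), [round(p, 3) for p in perturbed])
```

Every perturbed sum should come out larger than the least-squares sum, which is the property the slide describes.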
10.2 Regression
Regression equation: $\hat{y} = b_0 + b_1 x$, or $\hat{y} = a + bx$
Slope: $b = b_1 = \dfrac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$
Y-intercept: $a = b_0 = \dfrac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n\sum x^2 - (\sum x)^2}$
Also: $b_1 = b = r\,\dfrac{s_y}{s_x}$ and $b_0 = a = \bar{y} - b_1\bar{x}$
Linear correlation coefficient:
$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}, \quad \text{or: } r = \frac{\sum (z_x z_y)}{n - 1}$$
Example 1
Given the sample data:
x   1   1   3   5
y   2   8   6   4
a. Find the value of the linear correlation coefficient r.
b. Test the claim that there is a linear correlation between the two variables x and y. Use both (a) Method 1 and (b) Method 2. (α = 0.05)
c. Find the regression equation.
d. Find the best predicted value of y when x is equal to 2.
TI Calculator:
How to enter data:
1. Stat
2. Edit
3. ClrList L1
4. Or Highlight & Clear
5. Type in your data in L1, ..
TI Calculator:
Linear Regression - test
1. Stat
2. Tests
3. LinRegTTest
4. Enter L1 & L2
5. Freq = 1
6. Choose ≠
7. Calculate
Social science Statistics Calculator Tab: https://www.socscistatistics.com/tests/
Correlation Coefficient Calculator: https://www.socscistatistics.com/tests/pearson/default.aspx
Linear Regression Calculator: https://www.socscistatistics.com/tests/regression/default.aspx
a. Computing r:
x   y   x•y   x²   y²
1   2   2     1    4
1   8   8     1    64
3   6   18    9    36
5   4   20    25   16
∑x = 10, ∑y = 20, ∑xy = 48, ∑x² = 36, ∑y² = 120
$$r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{n(\sum x^2) - (\sum x)^2}\,\sqrt{n(\sum y^2) - (\sum y)^2}} = \frac{4(48) - (10)(20)}{\sqrt{4(36) - 10^2}\,\sqrt{4(120) - 20^2}} = \frac{-8}{\sqrt{44 \cdot 80}} = -0.135$$
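The hand calculation of r for Example 1 can be reproduced with a few lines of Python. This is a minimal sketch of the sum formula for r; the function name `pearson_r` is chosen here for illustration and is not part of the slides.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Linear correlation coefficient from the sum formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

r = pearson_r([1, 1, 3, 5], [2, 8, 6, 4])
print(round(r, 3))  # -0.135
```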
Example 1b
1) Null & Alternative hypotheses: H0: ρ = 0, H1: ρ ≠ 0 (claim), two-tailed test (2TT).
2) Test statistic (TS), with r = −0.135 and n = 4:
$$t = \frac{r - \mu_r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{-0.135 - 0}{\sqrt{\dfrac{1 - (-0.135)^2}{4 - 2}}} = \frac{-0.135}{0.70064} = -0.1927$$
3) Distribution, critical value (CV), rejection region (RR) & non-rejection region (NRR), with α = 0.05:
Method 1 (t-test): df = n − 2 = 2 → CV: t = ±4.303
Method 2 (use r, from the Correlation Table): n = 4 → CV: r = ±0.950
4) Make a decision: t = −0.1927 lies between −4.303 and 4.303, and r = −0.135 lies between −0.950 and 0.950.
Decision:
a. Do not reject H0.
b. The claim is not supported.
c. There is not sufficient evidence of a linear correlation between the 2 variables.
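The test statistic in Method 1 is simple enough to script. The sketch below assumes the critical value is read from a t table by hand (4.303 for df = 2 and α = 0.05, as on the slide) rather than computed; the function name is illustrative.

```python
from math import sqrt

def t_statistic(r, n):
    """Test statistic for H0: rho = 0, with df = n - 2."""
    return r / sqrt((1 - r ** 2) / (n - 2))

t = t_statistic(-0.135, 4)
t_critical = 4.303  # from a t table: two-tailed, alpha = 0.05, df = 2

reject_h0 = abs(t) > t_critical
print(round(t, 4), reject_h0)  # -0.1927 False
```

The same function applied to r = 0.922 and n = 10 should return a value near 6.735, the statistic used in Example 3 of this deck.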
Example 1c, d
∑x = 10, ∑y = 20, ∑xy = 48, ∑x² = 36, ∑y² = 120, n = 4, r = −0.135
c) Regression Equation: $\hat{y} = b_0 + b_1 x$, or $\hat{y} = a + bx$
$$b_1 = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} = \frac{4(48) - (10)(20)}{4(36) - 10^2} = \frac{-8}{44} = -0.182$$
$$\bar{y} = \frac{\sum y}{n} = \frac{20}{4} = 5, \quad \bar{x} = \frac{\sum x}{n} = \frac{10}{4} = 2.5$$
$$b_0 = \bar{y} - b_1\bar{x} = 5 - (-0.181818)(2.5) = 5.455$$
$$\hat{y} = -0.182x + 5.455$$
d) No significant linear correlation (r = −0.135): the best predicted y-value is $\bar{y} = 5$.
Example 2: Finding r Using the following Formula
The data shown is for car rental companies in the United States for a recent year. Find the correlation coefficient, the equation of the regression line for the data, and graph the line on the scatter plot.
Company   Cars x (in 10,000s)   Income y (in billions)   xy       x²        y²
A         63.0                  7.0                      441.00   3969.00   49.00
B         29.0                  3.9                      113.10   841.00    15.21
C         20.8                  2.1                      43.68    432.64    4.41
D         19.1                  2.8                      53.48    364.81    7.84
E         13.4                  1.4                      18.76    179.56    1.96
F         8.5                   1.5                      12.75    72.25     2.25
Σx = 153.8, Σy = 18.7, Σxy = 682.77, Σx² = 5859.26, Σy² = 80.67, n = 6
$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} = \frac{6(682.77) - (153.8)(18.7)}{\sqrt{6(5859.26) - 153.8^2}\,\sqrt{6(80.67) - 18.7^2}}$$
r = 0.982 (strong positive relationship)
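The table-building steps for the car rental data (products, squares, column sums, then r) can be sketched in Python as follows. Variable names are illustrative; the numbers are the six pairs from the table above.

```python
from math import sqrt

cars = [63.0, 29.0, 20.8, 19.1, 13.4, 8.5]   # x, in 10,000s
income = [7.0, 3.9, 2.1, 2.8, 1.4, 1.5]      # y, in billions

# Build the xy, x^2 and y^2 columns, then sum each column
xy = [x * y for x, y in zip(cars, income)]
x2 = [x * x for x in cars]
y2 = [y * y for y in income]

n = len(cars)
sum_x, sum_y = sum(cars), sum(income)
sum_xy, sum_x2, sum_y2 = sum(xy), sum(x2), sum(y2)

# Substitute the column sums into the formula for r
r = (n * sum_xy - sum_x * sum_y) / (
    sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
)
print(round(sum_xy, 2), round(sum_x2, 2), round(sum_y2, 2), round(r, 3))
```

Rounding is applied only when printing, so the intermediate sums keep full floating-point precision.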
Example 2 Continued:
The data shown is for car rental companies in the United States for a recent year. Find the equation of the regression line for the data, and graph the line on the scatter plot.
Σx = 153.8, Σy = 18.7, Σxy = 682.77, Σx² = 5859.26, Σy² = 80.67, n = 6
Regression equation: $\hat{y} = b_0 + b_1 x$, or $y' = a + bx$
$$b_1 = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} = \frac{6(682.77) - (153.8)(18.7)}{6(5859.26) - (153.8)^2} = 0.106$$
$$b_0 = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n\sum x^2 - (\sum x)^2} = \frac{(18.7)(5859.26) - (153.8)(682.77)}{6(5859.26) - (153.8)^2} = 0.396$$
Or, using the Y-intercept formula $b_0 = \bar{y} - b_1\bar{x}$ with $\bar{y} = \dfrac{\sum y}{n}$ and $\bar{x} = \dfrac{\sum x}{n}$:
$$b_0 = \frac{18.7}{6} - b_1\cdot\frac{153.8}{6} \approx 0.396 \quad \text{(using the unrounded slope } b_1 \approx 0.10613\text{)}$$
→ $y' = 0.396 + 0.106x$
Find two points to sketch the graph of the regression line.
Example 2 Continued:
Regression equation: $y' = 0.396 + 0.106x$
To sketch the line, choose any two x values within the range of the data (between 8.5 and 63), for example x = 15 and x = 40:
$y'(15) = 0.396 + 0.106(15) = 1.986$ → (15, 1.986)
$y'(40) = 0.396 + 0.106(40) = 4.636$ → (40, 4.636)
Plot (15, 1.986) and (40, 4.636), and sketch the resulting line.
Predict the income of a car rental agency that has 200,000 automobiles.
The linear correlation is significant, so substitute into the regression equation. Because x is measured in units of 10,000 cars, 200,000 automobiles corresponds to x = 20:
$y'(20) = 0.396 + 0.106(20) = 2.516$
When a rental agency has 200,000 automobiles, its predicted income is approximately $2.516 billion.
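The prediction step, together with the scope guideline from earlier in the deck, can be sketched as a small Python function. The function name and the strict range check are assumptions made for illustration: the slides only say not to go "much beyond" the sample data, so refusing every x outside 8.5 to 63 is a deliberately conservative stand-in.

```python
def predict_income(cars_in_10k, b0=0.396, b1=0.106, x_min=8.5, x_max=63.0):
    """Predicted income (billions) from y' = b0 + b1*x.

    Raises ValueError when x falls outside the sample range, a strict
    version of the "scope" guideline (an assumption, not the slide's rule).
    """
    if not x_min <= cars_in_10k <= x_max:
        raise ValueError("x is outside the scope of the sample data")
    return b0 + b1 * cars_in_10k

# 200,000 automobiles corresponds to x = 20 (units of 10,000 cars)
print(round(predict_income(20), 3))  # 2.516
print(round(predict_income(15), 3), round(predict_income(40), 3))  # 1.986 4.636
```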
10.2 Regression, Marginal Change, Outlier & Influential Points
Marginal Change: In working with two variables related by a regression equation, the marginal change in a variable is the amount that it changes when the other variable changes by exactly one unit. The slope $b_1$ in the regression equation represents the marginal change in y that occurs when x changes by one unit.
For example, in $\hat{y} = b_0 + b_1 x = -3.37 + 2.49x$, the slope of 2.49 tells us that if we increase x by 1, the predicted value of y will increase by 2.49.
Outlier (O): In a scatterplot, an outlier is a point lying far away from the other data points.
Influential Points (IP): Paired sample data may include one or more influential points, which are points that strongly affect the graph of the regression line.
[Figure: two scatterplots with regression lines.] The first scatterplot shows the regression line for the original data. If we include an additional pair of data, x = 50 and y = 0, we get the regression line shown in the second scatterplot. The additional point (50, 0) is an influential point because the graph of the regression line changes considerably. It is also an outlier because it is far from the other points.
Essentially, an influential point is an outlier that significantly affects the slope of the regression line. Because that single outlier changes the slope of the regression line greatly, and with it the position of the line, the outlier is considered an influential point. (In this sense, all IPs are Os, but not all Os are IPs.)
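The effect of an influential point can be made concrete by fitting a line with and without the extra pair (50, 0). The slides do not list the data behind the scatterplots, so the sketch below assumes the chocolate and Nobel data tabulated later in this deck as a stand-in; the helper name is also illustrative.

```python
def slope_intercept(points):
    """Least-squares (b0, b1) for a list of (x, y) pairs."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    s_xx = sum((x - x_bar) ** 2 for x, _ in points)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1

# Assumed stand-in data: the 23 chocolate/Nobel pairs from later in the deck
chocolate_nobel = [
    (4.5, 5.5), (10.2, 24.3), (4.4, 8.6), (2.9, 0.1), (3.9, 6.1), (0.7, 0.1),
    (8.5, 25.3), (7.3, 7.6), (6.3, 9.0), (11.6, 12.7), (2.5, 1.9), (8.8, 12.7),
    (3.7, 3.3), (1.8, 1.5), (4.5, 11.4), (9.4, 25.5), (3.6, 3.1), (2.0, 1.9),
    (3.6, 1.7), (6.4, 31.9), (11.9, 31.5), (9.7, 18.9), (5.3, 10.8),
]

b0, b1 = slope_intercept(chocolate_nobel)
b0_inf, b1_inf = slope_intercept(chocolate_nobel + [(50.0, 0.0)])
print(round(b1, 4), round(b1_inf, 4))
```

With these data the slope without the extra point should be about 2.49, and adding (50, 0) should pull it down to a small fraction of that, which is the behavior the slide calls influential.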
Example 3
Given the sample data (the numbers of registered boats are in tens of thousands):
Year                  1991  1992  1993  1994  1995  1996  1997  1998  1999  2000
x: Boats (10,000s)    68    68    67    70    71    73    76    81    83    84
y: Manatee Deaths     53    38    35    49    42    60    54    67    82    78
a. Find the value of the linear correlation coefficient r.
b. Test the claim that there is a linear correlation between the two variables x and y. Use both (a) Method 1 and (b) Method 2. (α = 0.05)
c. Find the regression equation.
d. Assume that in 2001 there were 850,000 registered boats. Because the table lists the numbers of registered boats in tens of thousands, this means that for 2001 we have x = 85. Given that x = 85, find the best predicted value of y, the number of manatee deaths from boats.
e. Using the above pairs and the value of r, what proportion of the variation in numbers of manatee deaths can be explained by the variation in the number of registered boats?
Example 3 Continued
a. r = 0.922
b. Hypothesis test:
1) Null & Alternative hypotheses: H0: ρ = 0, H1: ρ ≠ 0 (claim), two-tailed test (2TT).
2) Test statistic (TS), using r = 0.922 and n = 10:
$$t = r\sqrt{\frac{n - 2}{1 - r^2}}, \quad \text{or: } t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}, \quad df = n - 2$$
$$t = \frac{0.922 - 0}{\sqrt{\dfrac{1 - 0.922^2}{10 - 2}}} = \frac{0.922}{0.13689} = 6.7352$$
3) Distribution, RR & NRR, with α = 0.05:
Method 1 (t-test): df = n − 2 = 8 → CV: t = ±2.306
Method 2 (CV from the Pearson Correlation Coefficient table): n = 10 → CV: r = ±0.632
P-value = 0.000151 < α = 0.05
4) Make a decision:
Decision:
a. Reject H0.
b. The claim is supported.
c. There is a significant linear correlation between the 2 variables.
c. $\hat{y} = a + bx = -112.71 + 2.274x$
d. Significant linear correlation → substitute x = 85: $\hat{y} = -112.71 + 2.274(85) = 80.58$, or about 81 manatee deaths.
e. r = 0.9215 → r² = 0.8492 = 84.92%. About 84.92% of the variation in manatee deaths can be explained by the variation in the number of registered boats.
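Parts a, c, d and e of Example 3 can be reproduced from the raw data with a short Python sketch. It uses the deviation form of the formulas (sums of squared and cross deviations), which is algebraically equivalent to the sum formulas on the slides; the variable names are illustrative.

```python
from math import sqrt

boats = [68, 68, 67, 70, 71, 73, 76, 81, 83, 84]   # x, in 10,000s
deaths = [53, 38, 35, 49, 42, 60, 54, 67, 82, 78]  # y

n = len(boats)
x_bar = sum(boats) / n
y_bar = sum(deaths) / n

# Sums of squared deviations and cross deviations
s_xx = sum((x - x_bar) ** 2 for x in boats)
s_yy = sum((y - y_bar) ** 2 for y in deaths)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(boats, deaths))

r = s_xy / sqrt(s_xx * s_yy)   # part a
b1 = s_xy / s_xx               # part c: slope
b0 = y_bar - b1 * x_bar        # part c: intercept
predicted = b0 + b1 * 85       # part d: x = 85 means 850,000 boats
r_squared = r ** 2             # part e: proportion of variation explained

print(round(r, 3), round(b0, 2), round(b1, 3))
print(round(predicted, 2), round(r_squared, 4))
```

The prediction from unrounded coefficients differs slightly in the second decimal from the slide's 80.58, which was computed with the rounded values −112.71 and 2.274.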
Example 4:
a. Use the table to the right and the regression line to predict the y value when x is 10.
b. Predict the IQ score of an adult who is exactly 175 cm tall. (IQ scores have a mean of 100.)
Solution (a): Good Model: Use the Regression Equation for Predictions. Why?
$\hat{y} = b_0 + b_1 x = -3.37 + 2.49x$
$\hat{y}(10) = -3.37 + 2.49(10) = 21.53 \approx 21.5$
Solution (b): Bad Model: Use $\bar{y}$ for predictions.
Knowing that there is no correlation between height and IQ score, we know that a regression equation is not a good model, so the best predicted value of IQ score is the mean, which is 100.
10.2 Regression, Least-Squares Property & Residual Plots
Least-Squares Property: A straight line satisfies the least-squares property if the sum of the squares of the residuals is the smallest sum possible.
Residual: For a pair of sample x and y values, the residual is the difference between the observed sample value of y and the y value that is predicted by using the regression equation.
Residual = $y - \hat{y}$. A residual plot is the collection of pairs $(x, y - \hat{y})$. The residual plot should not have any obvious pattern, and it should not become much wider (or thinner) when viewed from left to right.
Example 5: Given the regression equation $\hat{y} = b_0 + b_1 x = 1 + x$ and the sample data below: a. Find the residual value for the sample point with coordinates of (8, 4). b. Draw the Residual Plot. c. What is the value of the Marginal Change?
x   8   12   20   24
y   4   24   8    32
a. For x = 8, the predicted value is $\hat{y} = 1 + 8 = 9$ and the observed value is y = 4.
Residual: $y - \hat{y} = 4 - 9 = -5$
c. Marginal Change = Slope = 1
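The points needed for the residual plot in part b can be computed directly from the regression equation ŷ = 1 + x and the four sample pairs. The sketch below only produces the (x, residual) pairs; drawing them is left to whatever plotting tool is at hand.

```python
xs = [8, 12, 20, 24]
ys = [4, 24, 8, 32]

def predict(x, b0=1, b1=1):
    """Regression equation from the example: y-hat = 1 + x."""
    return b0 + b1 * x

# Residual = observed y minus predicted y
residuals = [y - predict(x) for x, y in zip(xs, ys)]
residual_plot_points = list(zip(xs, residuals))

print(residual_plot_points)  # [(8, -5), (12, 11), (20, -13), (24, 7)]
print(sum(residuals))        # 0
```

The residuals of a least-squares line sum to zero, which is a quick sanity check on the arithmetic.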
10.2 Regression Summary
Finding the Correlation Coefficient and the Regression Line Equation
Step 1 Make a table with columns for x, y, xy, x², and y².
Step 2 Find the values of xy, x², and y². Place them in the appropriate columns and sum each column.
Step 3 Substitute in the formula to find the value of r:
$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$$
Step 4 When r is significant, substitute in the formulas to find the values of a and b for the regression line equation y' = a + bx.
Regression equation: $\hat{y} = b_0 + b_1 x$, or $\hat{y} = a + bx$
Slope: $b_1 = b = \dfrac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$
Y-intercept: $b_0 = a = \bar{y} - b_1\bar{x}$, where $\bar{y} = \dfrac{\sum y}{n}$ and $\bar{x} = \dfrac{\sum x}{n}$
Example 5: (Skip)
Find the equation of the regression line in which the explanatory variable (or x
variable) is chocolate consumption and the response variable (or y variable) is the
corresponding Nobel Laureate rate. The table of data is on the next slide.
Chocolate (x)   Nobel (y)
4.5             5.5
10.2            24.3
4.4             8.6
2.9             0.1
3.9             6.1
0.7             0.1
8.5             25.3
7.3             7.6
6.3             9.0
11.6            12.7
2.5             1.9
8.8             12.7
3.7             3.3
1.8             1.5
4.5             11.4
9.4             25.5
3.6             3.1
2.0             1.9
3.6             1.7
6.4             31.9
11.9            31.5
9.7             18.9
5.3             10.8
Solution: Requirement check
(1) The data are assumed to be a simple random sample (SRS).
(2) The scatterplot is very roughly a straight-line pattern.
(3) There are no outliers.
Example 5:
Use the formulas $b_1 = r\,\dfrac{s_y}{s_x}$ and $b_0 = \bar{y} - b_1\bar{x}$ to find the equation of the regression line in which the explanatory variable (or x variable) is chocolate consumption and the response variable (or y variable) is the corresponding Nobel Laureate rate.
Here r is the linear correlation coefficient, $s_y$ is the standard deviation of the sample y values, and $s_x$ is the standard deviation of the sample x values.
Find the slope $b_1$ as follows:
$$b_1 = r\,\frac{s_y}{s_x} = 0.80061 \cdot \frac{10.2116}{3.2792} = 2.4931$$
$$b_0 = \bar{y} - b_1\bar{x} = 11.10435 - 2.4931 \cdot 5.8043 = -3.3667 \quad \text{(computed with unrounded values)}$$
$$\hat{y} = b_0 + b_1 x = -3.3667 + 2.4931x$$
Graphing the Regression Line: The Minitab display of the scatterplot with the graph of the regression line shows that the regression line fits the points well, but the points are not very close to the line.
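The summary-statistic route to the regression line can be checked with a few lines of Python. The inputs below are the rounded values quoted in the example (r, the two standard deviations, and the two means), so the intercept may differ from the slide's −3.3667 in the last decimal place.

```python
# Rounded summary statistics quoted in the example
r = 0.80061
s_y = 10.2116    # standard deviation of the sample y values
s_x = 3.2792     # standard deviation of the sample x values
y_bar = 11.10435
x_bar = 5.8043

b1 = r * s_y / s_x        # slope from b1 = r * s_y / s_x
b0 = y_bar - b1 * x_bar   # intercept from b0 = y-bar - b1 * x-bar

print(round(b1, 4), round(b0, 3))
```

Carrying more digits in the inputs, or computing from the raw data, is what brings the intercept to the −3.3667 shown on the slide.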
10.2 Regression (For later courses!)
Population Parameter: $y = \beta_0 + \beta_1 x$
Sample Statistic: $\hat{y} = b_0 + b_1 x$
TI calculator: $\hat{y} = a + bx$
Slope: $b = b_1 = \dfrac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$
Y-intercept: $a = b_0 = \dfrac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n\sum x^2 - (\sum x)^2}$
Also: $b_1 = b = r\,\dfrac{s_y}{s_x}$ and $b_0 = a = \bar{y} - b_1\bar{x}$
Observed Value: $Y_i$
Observed Average Value: $\bar{Y}$
Predicted Value (from the regression equation $\hat{y} = a + bx$): $\hat{Y}_i$
$$SS_{Total} \text{ or } SS_y = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = SS_{reg} + SS_{error}$$
$$SS_{reg} = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$$
$$SS_{error} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$
Worksheet columns: $x_i - \bar{x}$, $y_i - \bar{y}$, $(x_i - \bar{x})^2$, $(y_i - \bar{y})^2$, $SP = (x_i - \bar{x})(y_i - \bar{y})$, $\hat{y}$, $\hat{y} - \bar{y}$, $(\hat{y} - \bar{y})^2$, $y_i - \hat{y}$, $(y_i - \hat{y})^2$
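The decomposition of the total sum of squares into a regression part and an error part can be verified numerically. The sketch below uses the four pairs from Example 1 of this deck; the variable names are illustrative.

```python
xs = [1, 1, 3, 5]
ys = [2, 8, 6, 4]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares coefficients in deviation form
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in xs]   # predicted values

ss_total = sum((y - y_bar) ** 2 for y in ys)
ss_reg = sum((yh - y_bar) ** 2 for yh in y_hat)
ss_error = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))

print(round(ss_total, 4), round(ss_reg, 4), round(ss_error, 4))  # 20.0 0.3636 19.6364
print(round(ss_reg / ss_total, 4))  # 0.0182
```

The ratio of the regression sum of squares to the total equals r² (here about 0.0182, the square of r ≈ −0.135), which links this decomposition to the coefficient of determination.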

Regression

  • 1. Elementary Statistics Chapter 10: Correlation and Regression 10.2 Regression 1
  • 2. Chapter 10: Correlation and Regression 10.1 Correlation 10.2 Regression 10.3 Prediction Intervals and Variation 10.4 Multiple Regression 10.5 Nonlinear Regression 2 Objectives: • Draw a scatter plot for a set of ordered pairs. • Compute the correlation coefficient. • Test the hypothesis H0: ρ = 0. • Compute the equation of the regression line & the coefficient of determination. • Compute the standard error of the estimate & a prediction interval.
  • 3. Key Concepts: If the value of the correlation coefficient is significant, determine the equation of the regression line. Find the equation of the straight line that best fits the points in a scatterplot of paired sample data. That best-fitting straight line is called the regression line, and its equation is called the regression equation. The regression equation expresses a relationship between x (called the independent variable, predictor variable or explanatory variable), and y (called the dependent variable or response variable). The typical equation of a straight line is expressed in the form of y = mx + b, where b is the y-intercept and m is the slope. Regression Line: Given a collection of paired sample data, the regression line (or line of best fit, or least-squares line) is the straight line that “best” fits the scatterplot of the data. If there is not a significant linear correlation, the best predicted y-value is 𝑦. If there is a significant linear correlation, the best predicted y-value is found by substituting the x- value into the regression equation. 10.2 Regression 3 𝑦 = 𝑏0 + 𝑏1𝑥, OR 𝑦 = 𝑎 + 𝑏𝑥, 𝑆𝑙𝑜𝑝𝑒: 𝑏1 = 𝑛 𝑥𝑦 − 𝑥 𝑦 𝑛 𝑥2 − 𝑥 2 𝑌 − int 𝑒 𝑟𝑐𝑒𝑝𝑡: 𝑏0 = 𝑦 − 𝑏1𝑥, 𝑦 = 𝑦 𝑛 , 𝑥 = 𝑥 𝑛 Population Parameter: 𝒚 = 𝜷𝟎 + 𝜷𝟏𝒙 Sample Statistic : 𝒚 = 𝒃𝟎 + 𝒃𝟏𝒙 Ti calculator : 𝒚 = 𝒂 + 𝒃𝒙 Also: 𝑏1 = 𝑏 = 𝑟 𝑠𝑦 𝑠𝑥 , 𝑏0 = 𝑎 = 𝑦 − 𝑏1𝑥 r is the linear correlation coefficient sy is the standard deviation of the sample y values sx is the standard deviation of the sample x values.
  • 4. Regression equations are often useful for predicting the value of one variable, given some specific value of the other variable: 1. Bad Model: If the regression equation does not appear to be useful for making predictions, don’t use the regression equation for making predictions. For bad models, the best predicted value of a variable is simply its sample mean: 𝒚. 2. Good Model: Use the regression equation for predictions only if the graph of the regression line on the scatterplot confirms that the regression line fits the points reasonably well. 3. Correlation: Use the regression equation for predictions only if the linear correlation coefficient r indicates that there is a linear correlation between the two variables. 4. Scope: Use the regression line for predictions only if the data do not go much beyond the scope of the available sample data. 4 10.2 Regression, Making Predictions
  • 5. Best fit means that the sum of the squares of the vertical distance (residuals) from each point to the line is at a minimum. 5 10.2 Regression Population Parameter: 𝒚 = 𝜷𝟎 + 𝜷𝟏𝒙 Sample Statistic : 𝒚 = 𝒃𝟎 + 𝒃𝟏𝒙 Ti calculator : 𝒚 = 𝒂 + 𝒃𝒙 𝑦 = 𝑏0 + 𝑏1𝑥, OR 𝑦 = 𝑎 + 𝑏𝑥, 𝑆𝑙𝑜𝑝𝑒: 𝑏 = 𝑏1 = 𝑛 𝑥𝑦 − 𝑥 𝑦 𝑛 𝑥2 − 𝑥 2 𝑌 − int 𝑒 𝑟𝑐𝑒𝑝𝑡: 𝑎 = 𝑏0 = 𝑦 𝑥2 − 𝑥 𝑥𝑦 𝑛 𝑥2 − 𝑥 2 Also : 𝑏1 = 𝑏 = 𝑟 𝑠𝑦 𝑠𝑥 , 𝑏0 = 𝑎 = 𝑦 − 𝑏1𝑥 oefficient ent e y values es the sample x values. he sample x values. 𝑟 = 𝑛 𝑥𝑦 − 𝑥 𝑦 𝑛 𝑥2 − 𝑥 2 𝑛 𝑦2 − 𝑦 2 , 𝑂𝑟: 𝑟 = (𝑍𝑥𝑍𝑦) 𝑛 − 1
  • 6. x 1 1 3 5 y 2 8 6 4 6 x y x•y x² y² 1 2 2 1 4 1 8 8 1 64 3 6 18 9 36 5 4 20 25 16 𝑟 = 4 • 48 − 10 • 20 4(36) − 102 4(120) − 202 = −8 44 • 80 = −0.135 𝑟 = 𝑛( 𝑥𝑦) − 𝑥 • 𝑦 𝑛( 𝑥 2 ) − ( 𝑥)2 𝑛( 𝑦 2 ) − ( 𝑦)2 TI Calculator: How to enter data: 1. Stat 2. Edit 3. ClrList 𝑳𝟏 4. Or Highlight & Clear 5. Type in your data in L1, .. TI Calculator: Linear Regression - test 1. Stat 2. Tests 3. LinRegTTest 4. Enter 𝑳𝟏 & 𝑳𝟐 5. Freq = 1 6. Choose ≠ 7. Calculate ∑x = 10 ∑y = 20 ∑xy = 48 ∑x² = 36 ∑y² = 120 Example 1 Given the sample data: a. Find the value of the linear correlation coefficient r b. Test the claim that there is a linear correlation between the two variables x and y. Use both (a) Method 1 and (b) Method 2. ( = 0.05) c. Find the regression equation. d. Find the best predicted value of y, when x is equal to 2. Social science Statistics Calculator Tab: https://www.socscistatistics.com/tests/ Correlation Coefficient Calculator: https://www.socscistatistics.com/tests/pearson/default.aspx Linear Regression Calculator: https://www.socscistatistics.com/tests/regression/default.aspx
  • 7. Example 1b. 1) Null & alternative hypotheses: H0: ρ = 0, H1: ρ ≠ 0 (claim), two-tailed test (2TT). 2) Test statistic (TS): $t = \frac{r - \mu_r}{\sqrt{\frac{1 - r^2}{n - 2}}} = \frac{-0.135 - 0}{\sqrt{\frac{1 - (-0.135)^2}{4 - 2}}} = \frac{-0.135}{0.70064} = -0.1927$. 3) Distribution, CV, RR & NRR: Method 1 (t-test): α = 0.05, df = n − 2 = 2, CV: t = ±4.303. Method 2 (use r, from the correlation table): n = 4, α = 0.05, CV: r = ±0.950, with r = −0.135. 4) Make a decision: a. Do not reject H0. b. The claim is not supported. c. There is not sufficient evidence of a linear correlation between the two variables.
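As a rough check of the two decision methods in Example 1b, the test statistic and both comparisons can be sketched as follows. The critical values are copied from the slide (they are table look-ups, not computed here), and the names `t_statistic`, `T_CRIT`, and `R_CRIT` are illustrative only.

```python
import math

def t_statistic(r, n):
    """t = r / sqrt((1 - r^2) / (n - 2)), with df = n - 2 and mu_r = 0."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

r, n = -0.135, 4
t = t_statistic(r, n)

# Critical values quoted on the slide for alpha = 0.05 (two-tailed).
T_CRIT = 4.303   # Method 1: t table, df = 2
R_CRIT = 0.950   # Method 2: correlation table, n = 4

reject_by_t = abs(t) > T_CRIT   # Method 1 decision
reject_by_r = abs(r) > R_CRIT   # Method 2 decision
print(round(t, 4), reject_by_t, reject_by_r)
```

Both methods lead to the same decision, which is expected because the critical r value is derived from the same t distribution.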
  • 8. Example 1b (critical values). Method 1 (t-test): α = 0.05, df = n − 2 = 2, CV: t = ±4.303. Method 2 (use r, from the correlation table): n = 4, α = 0.05, CV: r = ±0.950.
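  • 9. Example 1c, d. Given: ∑x = 10, ∑y = 20, ∑xy = 48, ∑x² = 36, ∑y² = 120, n = 4, r = −0.135. c) Regression equation: $\hat{y} = b_0 + b_1 x$ (or $\hat{y} = a + bx$), with slope $b_1 = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$ and y-intercept $b_0 = \bar{y} - b_1\bar{x}$, where $\bar{y} = \frac{\sum y}{n}$ and $\bar{x} = \frac{\sum x}{n}$. Here $b_1 = \frac{4(48) - 10(20)}{4(36) - 10^2} = \frac{-8}{44} = -0.182$, $\bar{y} = \frac{20}{4} = 5$, $\bar{x} = \frac{10}{4} = 2.5$, and $b_0 = 5 - (-0.181818)(2.5) = 5.455$, so $\hat{y} = -0.182x + 5.455$. d) No significant linear correlation (r = −0.135): the best predicted y-value is $\bar{y} = 5$.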
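The rule used in part d (fall back to the sample mean when the correlation is not significant) can be written as a tiny helper. This is a sketch: `best_prediction` and its `significant` flag are hypothetical names, and the significance decision itself is assumed to come from the hypothesis test in Example 1b.

```python
def best_prediction(x, b0, b1, y_bar, significant):
    """Use the regression line only when the linear correlation is significant;
    otherwise the best predicted value is the sample mean of y."""
    return b0 + b1 * x if significant else y_bar

ys = [2, 8, 6, 4]                 # y values from Example 1
y_bar = sum(ys) / len(ys)

# Example 1: r = -0.135 was not significant, so the mean is used for x = 2.
pred = best_prediction(2, 5.455, -0.182, y_bar, significant=False)

# For comparison only: what the line would give if it were a good model.
line_value = best_prediction(2, 5.455, -0.182, y_bar, significant=True)
print(pred, round(line_value, 3))
```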
  • 10. Example 2: Finding r Using the Following Formula. The data shown are for car rental companies in the United States for a recent year. Find the correlation coefficient, the equation of the regression line for the data, and graph the line on the scatter plot. Table (Company: cars x in 10,000s, income y in billions, xy, x², y²): A: 63.0, 7.0, 441.00, 3969.00, 49.00; B: 29.0, 3.9, 113.10, 841.00, 15.21; C: 20.8, 2.1, 43.68, 432.64, 4.41; D: 19.1, 2.8, 53.48, 364.81, 7.84; E: 13.4, 1.4, 18.76, 179.56, 1.96; F: 8.5, 1.5, 12.75, 72.25, 2.25. Sums: Σx = 153.8, Σy = 18.7, Σxy = 682.77, Σx² = 5859.26, Σy² = 80.67, n = 6. $r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} = \frac{6(682.77) - (153.8)(18.7)}{\sqrt{6(5859.26) - 153.8^2}\,\sqrt{6(80.67) - 18.7^2}} = 0.982$ (strong positive relationship). TI Calculator: How to enter data: 1. Stat 2. Edit 3. ClrList L1 4. Or Highlight & Clear 5. Type in your data in L1, .. TI Calculator: Linear Regression test & correlation coefficient r: 1. Stat 2. Tests 3. LinRegTTest 4. Enter L1 & L2 5. Freq = 1 6. Choose ≠ 7. Calculate.
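The column sums and the value of r in Example 2 can be reproduced directly from the six (x, y) pairs. The sketch below assumes only the data in the table; variable names such as `cars` and `income` are chosen for illustration.

```python
import math

cars = [63.0, 29.0, 20.8, 19.1, 13.4, 8.5]   # x, in 10,000s (companies A-F)
income = [7.0, 3.9, 2.1, 2.8, 1.4, 1.5]      # y, in billions

n = len(cars)
sum_x = sum(cars)
sum_y = sum(income)
sum_xy = sum(x * y for x, y in zip(cars, income))
sum_x2 = sum(x * x for x in cars)
sum_y2 = sum(y * y for y in income)

# Correlation coefficient from the sums, as in the slide formula.
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(round(sum_x, 2), round(sum_y, 2), round(sum_xy, 2),
      round(sum_x2, 2), round(sum_y2, 2), round(r, 3))
```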
  • 11. Example 2 Continued. The data shown are for car rental companies in the United States for a recent year. Find the equation of the regression line for the data, and graph the line on the scatter plot. Sums (cars x in 10,000s, income y in billions): Σx = 153.8, Σy = 18.7, Σxy = 682.77, Σx² = 5859.26, Σy² = 80.67, n = 6. Slope: $b_1 = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} = \frac{6(682.77) - (153.8)(18.7)}{6(5859.26) - (153.8)^2} = 0.106$. Y-intercept: $b_0 = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n\sum x^2 - (\sum x)^2} = \frac{(18.7)(5859.26) - (153.8)(682.77)}{6(5859.26) - (153.8)^2} = 0.396$. Or: $b_0 = \bar{y} - b_1\bar{x} = \frac{18.7}{6} - (0.106)\frac{153.8}{6}$, where $\bar{y} = \frac{\sum y}{n}$ and $\bar{x} = \frac{\sum x}{n}$. Regression line: $\hat{y} = b_0 + b_1 x$ (or $y' = a + bx$) → $y' = 0.396 + 0.106x$. TI Calculator: Linear Regression test & correlation coefficient r: 1. Stat 2. Tests 3. LinRegTTest 4. Enter L1 & L2 5. Freq = 1 6. Choose ≠ 7. Calculate.
  • 12. Example 2 Continued. Regression line: $\hat{y} = b_0 + b_1 x$ (or $y' = a + bx$), here $y' = 0.396 + 0.106x$. Find two points to sketch the graph of the regression line: use any x values between 10 and 60 (the sample x values run from 8.5 to 63). Let x = 15 and x = 40: $y'(15) = 0.396 + 0.106(15) = 1.986$ → (15, 1.986); $y'(40) = 0.396 + 0.106(40) = 4.636$ → (40, 4.636). Plot (15, 1.986) and (40, 4.636), and sketch the resulting line. Predict the income of a car rental agency that has 200,000 automobiles: there is a significant linear correlation, so plug in x = 20 (cars are in 10,000s): $y'(20) = 0.396 + 0.106(20) = 2.516$. When a rental agency has 200,000 automobiles, its revenue will be approximately $2.516 billion.
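The plotting points and the 200,000-car prediction can be checked with a short sketch. It uses the rounded coefficients from the slide (0.396 and 0.106); the `in_scope` helper is a hypothetical way of expressing the earlier "scope" guideline, using the smallest and largest sample x values.

```python
def predict(x, b0=0.396, b1=0.106):
    """Predicted income (billions) for x cars (in 10,000s), using the rounded line."""
    return b0 + b1 * x

X_MIN, X_MAX = 8.5, 63.0   # range of the sample x values in Example 2

def in_scope(x):
    """True when x lies inside the range of the sample data."""
    return X_MIN <= x <= X_MAX

# Two points for sketching the line, and the prediction for 200,000 cars (x = 20).
points = [(x, round(predict(x), 3)) for x in (15, 40)]
y_200k = round(predict(20), 3)
print(points, y_200k, in_scope(20))
```

A value such as x = 100 (one million cars) would fail the `in_scope` check, which is a reminder that the line should not be trusted far outside the sampled range.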
  • 13. 10.2 Regression, Marginal Change, Outlier & Influential Points. Marginal Change: In working with two variables related by a regression equation, the marginal change in a variable is the amount that it changes when the other variable changes by exactly one unit. The slope $b_1$ in the regression equation represents the marginal change in y that occurs when x changes by one unit. For example, in $\hat{y} = b_0 + b_1 x = -3.37 + 2.49x$, the slope of 2.49 tells us that if we increase x by 1, the predicted value of y will increase by 2.49. Outlier (O): In a scatterplot, an outlier is a point lying far away from the other data points. Influential Points (IP): Paired sample data may include one or more influential points, which are points that strongly affect the graph of the regression line. The scatterplot located to the left shows the regression line. If we include an additional pair of data, x = 50 and y = 0, we get the regression line shown to the right. The additional point (50, 0) is an influential point because the graph of the regression line changes considerably, as shown. It is also an outlier because it is far from the other points. Essentially, an influential point is an outlier that significantly affects the slope of the regression line: because of that single point, the slope changes greatly, and so does the position of the line. (All IPs are Os, but not all Os are IPs.)
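The effect of the influential point (50, 0) can be illustrated numerically. The sketch below assumes that the scatterplot on this slide uses the chocolate/Nobel data listed later in the deck (the quoted line −3.37 + 2.49x matches that example); `fit_line` is a hypothetical helper that applies the least-squares formulas in their deviation form.

```python
def fit_line(points):
    """Least-squares intercept and slope (b0, b1) for a list of (x, y) pairs."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# (chocolate consumption, Nobel Laureate rate) pairs from the later example.
chocolate_nobel = [
    (4.5, 5.5), (10.2, 24.3), (4.4, 8.6), (2.9, 0.1), (3.9, 6.1), (0.7, 0.1),
    (8.5, 25.3), (7.3, 7.6), (6.3, 9.0), (11.6, 12.7), (2.5, 1.9), (8.8, 12.7),
    (3.7, 3.3), (1.8, 1.5), (4.5, 11.4), (9.4, 25.5), (3.6, 3.1), (2.0, 1.9),
    (3.6, 1.7), (6.4, 31.9), (11.9, 31.5), (9.7, 18.9), (5.3, 10.8),
]

b0, b1 = fit_line(chocolate_nobel)                    # original line
b0_ip, b1_ip = fit_line(chocolate_nobel + [(50, 0)])  # with the extra point
print(round(b0, 2), round(b1, 2))
print(round(b0_ip, 2), round(b1_ip, 2))
```

Under that assumption, the single added point flattens the slope dramatically, which is what makes it influential rather than merely unusual.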
  • 14. Example 3. Given the sample data (x = number of registered boats, in tens of thousands; y = manatee deaths from boats), as (year: x, y): 1991: 68, 53; 1992: 68, 38; 1993: 67, 35; 1994: 70, 49; 1995: 71, 42; 1996: 73, 60; 1997: 76, 54; 1998: 81, 67; 1999: 83, 82; 2000: 84, 78. a. Find the value of the linear correlation coefficient r. b. Test the claim that there is a linear correlation between the two variables x and y. Use both (a) Method 1 and (b) Method 2. (α = 0.05) c. Find the regression equation. d. Assume that in 2001 there were 850,000 registered boats. Because the table lists the numbers of registered boats in tens of thousands, this means that for 2001 we have x = 85. Given that x = 85, find the best predicted value of y, the number of manatee deaths from boats. e. Using the above pairs and the value of r, what proportion of the variation in the number of manatee deaths can be explained by the variation in the number of registered boats?
  • 15. Example 3 Continued. Given the sample data: a. Find r. b. Test the claim. c. Find the regression equation. d. For x = 85, find the best predicted value of y. e. What proportion of the variation in the number of manatee deaths is explained by the variation in the number of boats? Solution: a. r = 0.922. b. P-value = 0.000151 < α = 0.05. Decision: Reject H0; the claim is true; there is a significant linear correlation between the two variables. c. $\hat{y} = a + bx = -112.71 + 2.274x$. d. Significant linear correlation, so plug in x = 85: $\hat{y} = -112.71 + 2.274(85) = 80.58 \approx 81$ manatee deaths. e. r = 0.9215 → r² = 0.84920 = 84.92%.
  • 16. Example 3 Continued. 1) Null & alternative hypotheses: H0: ρ = 0, H1: ρ ≠ 0 (claim), two-tailed test (2TT). 2) Test statistic (TS): $t = r\sqrt{\frac{n - 2}{1 - r^2}}$ with df = n − 2, or equivalently $t = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}}$. With r = 0.922: $t = \frac{0.922 - 0}{\sqrt{\frac{1 - 0.922^2}{10 - 2}}} = \frac{0.922}{0.13689} = 6.7352$. 3) Distribution, RR & NRR: Method 1 (t-test): α = 0.05, df = n − 2 = 8, CV: t = ±2.306. Method 2 (use r, from the Pearson correlation coefficient table): n = 10, α = 0.05 → CV: r = ±0.632, with r = 0.922. 4) Make a decision: a. Reject H0. b. The claim is true. c. There is a significant linear correlation between the two variables.
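The results quoted for Example 3 can be reproduced from the ten (boats, deaths) pairs. The sketch below uses the deviation form of the formulas (algebraically the same as the sum-based formulas in the deck) and keeps full precision, so the test statistic comes out slightly below the slide's 6.7352, which was computed from the rounded r = 0.922. The critical values are copied from the slide.

```python
import math

boats = [68, 68, 67, 70, 71, 73, 76, 81, 83, 84]    # x, in 10,000s
deaths = [53, 38, 35, 49, 42, 60, 54, 67, 82, 78]   # y

n = len(boats)
x_bar = sum(boats) / n
y_bar = sum(deaths) / n
sxx = sum((x - x_bar) ** 2 for x in boats)
syy = sum((y - y_bar) ** 2 for y in deaths)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(boats, deaths))

r = sxy / math.sqrt(sxx * syy)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
t = r * math.sqrt((n - 2) / (1 - r ** 2))   # test statistic, df = n - 2
y_hat_85 = b0 + b1 * 85                     # prediction for 850,000 boats
r_squared = r ** 2                          # proportion of variation explained

T_CRIT, R_CRIT = 2.306, 0.632               # table values quoted on the slide
print(round(r, 4), round(b0, 2), round(b1, 3), round(t, 2),
      round(y_hat_85, 2), round(r_squared, 4))
print(abs(t) > T_CRIT, abs(r) > R_CRIT)
```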
  • 17. Example 4. a. Use the regression line for the table to the right to predict the y value when x is 10. Solution: Good Model: Use the regression equation for predictions. Why? $\hat{y} = b_0 + b_1 x = -3.37 + 2.49x$, so $\hat{y}(10) = -3.37 + 2.49(10) = 21.5$. b. Predict the IQ score of an adult who is exactly 175 cm tall. (IQ scores have a mean of 100.) Solution: Bad Model: Use $\bar{y}$ for predictions. Knowing that there is no correlation between height and IQ score, we know that a regression equation is not a good model, so the best predicted value of IQ score is the mean, which is 100. Formulas: $\hat{y} = b_0 + b_1 x$ (or $\hat{y} = a + bx$), slope $b_1 = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$, y-intercept $b_0 = \bar{y} - b_1\bar{x}$, with $\bar{y} = \frac{\sum y}{n}$ and $\bar{x} = \frac{\sum x}{n}$.
  • 18. 10.2 Regression, Least-Squares Property & Residual Plots. Least-Squares Property: A straight line satisfies the least-squares property if the sum of the squares of the residuals is the smallest sum possible. Residual: For a pair of sample x and y values, the residual is the difference between the observed sample value of y and the y value that is predicted by using the regression equation: residual $= y - \hat{y}$. A residual plot is the collection of pairs $(x, y - \hat{y})$. The residual plot should not have any obvious pattern, and it should not become much wider (or thinner) when viewed from left to right. Example 5: Data (x, y): (8, 4), (12, 24), (20, 8), (24, 32), with regression line $\hat{y} = b_0 + b_1 x = 1 + x$. a. Find the residual value for the sample point with coordinates (8, 4). b. Draw the residual plot. c. What is the value of the marginal change? Solution: a. For x = 8, the predicted value is $\hat{y} = 1 + 8 = 9$ and the observed value is y = 4, so the residual is $y - \hat{y} = 4 - 9 = -5$. c. Marginal change = slope = 1.
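The residuals for all four points of Example 5, and the least-squares property itself, can be checked with a small sketch. The helper `sse` is a hypothetical name; the comparison line 2 + x is an arbitrary alternative chosen only to show that a different line gives a larger sum of squared residuals.

```python
xs = [8, 12, 20, 24]
ys = [4, 24, 8, 32]

def sse(b0, b1):
    """Sum of squared residuals for the line y_hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Residuals y - y_hat for the regression line y_hat = 1 + x.
residuals = [y - (1 + x) for x, y in zip(xs, ys)]

# Points (x, y - y_hat) that would be drawn in the residual plot.
residual_plot_points = list(zip(xs, residuals))

best = sse(1, 1)    # the regression line from the slide
other = sse(2, 1)   # an arbitrary competing line, for comparison
print(residuals, best, other)
```

The residuals of a least-squares line sum to zero, which is a quick sanity check on the arithmetic in part a.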
  • 19. 10.2 Regression Summary: Finding the Correlation Coefficient and the Regression Line Equation. Step 1: Make a table, as shown in step 2. Step 2: Find the values of xy, x², and y². Place them in the appropriate columns and sum each column. Step 3: Substitute in the formula to find the value of r: $r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$. Step 4: When r is significant, substitute in the formulas to find the values of a and b for the regression line equation y' = a + bx: $\hat{y} = b_0 + b_1 x$ (or $\hat{y} = a + bx$), slope $b_1 = b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$, y-intercept $b_0 = a = \bar{y} - b_1\bar{x}$, with $\bar{y} = \frac{\sum y}{n}$ and $\bar{x} = \frac{\sum x}{n}$.
  • 20. Example 5: (Skip) Find the equation of the regression line in which the explanatory variable (or x variable) is chocolate consumption and the response variable (or y variable) is the corresponding Nobel Laureate rate. Data, as (chocolate x, Nobel y) pairs: (4.5, 5.5), (10.2, 24.3), (4.4, 8.6), (2.9, 0.1), (3.9, 6.1), (0.7, 0.1), (8.5, 25.3), (7.3, 7.6), (6.3, 9.0), (11.6, 12.7), (2.5, 1.9), (8.8, 12.7), (3.7, 3.3), (1.8, 1.5), (4.5, 11.4), (9.4, 25.5), (3.6, 3.1), (2.0, 1.9), (3.6, 1.7), (6.4, 31.9), (11.9, 31.5), (9.7, 18.9), (5.3, 10.8). Solution: REQUIREMENT CHECK: (1) The data are assumed to be a simple random sample (SRS). (2) The scatterplot is very roughly a straight-line pattern. (3) There are no outliers.
  • 21. Example 5: Use the first formulas for $b_1$ and $b_0$ to find the equation of the regression line in which the explanatory variable (or x variable) is chocolate consumption and the response variable (or y variable) is the corresponding number of Nobel Laureates. Formulas: $\hat{y} = b_0 + b_1 x$ (or $\hat{y} = a + bx$); slope $b_1 = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$; y-intercept $b_0 = \bar{y} - b_1\bar{x}$, with $\bar{y} = \frac{\sum y}{n}$ and $\bar{x} = \frac{\sum x}{n}$. Also: $b_1 = b = r\,\frac{s_y}{s_x}$ and $b_0 = a = \bar{y} - b_1\bar{x}$, where r is the linear correlation coefficient, $s_y$ is the standard deviation of the sample y values, and $s_x$ is the standard deviation of the sample x values. Find the slope $b_1$ as follows: $b_1 = r\,\frac{s_y}{s_x} = 0.80061 \cdot \frac{10.2116}{3.2792} = 2.4931$, and then $b_0 = \bar{y} - b_1\bar{x} = 11.10435 - 2.4931 \cdot 5.8043 = -3.3667$, so $\hat{y} = b_0 + b_1 x = -3.3667 + 2.4931x$. Graphing the Regression Line: Shown below is the Minitab display of the scatterplot with the graph of the regression line included. We can see that the regression line fits the points well, but the points are not very close to the line.
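The quantities used in this calculation (the means, the standard deviations, r, and then the slope and intercept) can be recomputed from the 23 data pairs of the chocolate/Nobel example. This sketch relies only on Python's standard `statistics` module; r is computed by hand from the sample covariance so that no particular Python version is assumed.

```python
import statistics

chocolate = [4.5, 10.2, 4.4, 2.9, 3.9, 0.7, 8.5, 7.3, 6.3, 11.6, 2.5, 8.8,
             3.7, 1.8, 4.5, 9.4, 3.6, 2.0, 3.6, 6.4, 11.9, 9.7, 5.3]
nobel = [5.5, 24.3, 8.6, 0.1, 6.1, 0.1, 25.3, 7.6, 9.0, 12.7, 1.9, 12.7,
         3.3, 1.5, 11.4, 25.5, 3.1, 1.9, 1.7, 31.9, 31.5, 18.9, 10.8]

n = len(chocolate)
x_bar = statistics.mean(chocolate)
y_bar = statistics.mean(nobel)
s_x = statistics.stdev(chocolate)   # sample standard deviation of x
s_y = statistics.stdev(nobel)       # sample standard deviation of y

# r as the sample covariance divided by the product of the standard deviations.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(chocolate, nobel))
r = sxy / ((n - 1) * s_x * s_y)

b1 = r * s_y / s_x        # slope
b0 = y_bar - b1 * x_bar   # y-intercept
print(round(r, 5), round(s_x, 4), round(s_y, 4), round(b1, 4), round(b0, 4))
```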
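  • 22. 10.2 Regression (For later courses!) Population parameter: $y = \beta_0 + \beta_1 x$. Sample statistic: $\hat{y} = b_0 + b_1 x$. TI calculator: $\hat{y} = a + bx$. Slope: $b = b_1 = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$. Y-intercept: $a = b_0 = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n\sum x^2 - (\sum x)^2}$. Also: $b_1 = b = r\,\frac{s_y}{s_x}$ and $b_0 = a = \bar{y} - b_1\bar{x}$, where r is the linear correlation coefficient, $s_y$ is the standard deviation of the sample y values, and $s_x$ is the standard deviation of the sample x values. Observed value: $Y_i$. Observed average value: $\bar{Y}$. Predicted value (from the regression equation $\hat{y} = a + bx$): $\hat{Y}_i$. $SS_{total}$ (or $SS_y$) $= \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = SS_{reg} + SS_{error}$, where $SS_{reg} = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$ and $SS_{error} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$. Worksheet columns: $x_i - \bar{x}$, $y_i - \bar{y}$, $(x_i - \bar{x})^2$, $(y_i - \bar{y})^2$, $SP = (x_i - \bar{x})(y_i - \bar{y})$, $\hat{y}$, $\hat{y} - \bar{y}$, $(\hat{y} - \bar{y})^2$, $y_i - \hat{y}$, $(y_i - \hat{y})^2$.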
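The decomposition of the total sum of squares can be illustrated with a small data set. The sketch below reuses the four points and the line ŷ = 1 + x from the residual example earlier in the deck (an illustrative choice, since this slide gives no data of its own), and checks that the regression and error sums of squares add up to the total.

```python
xs = [8, 12, 20, 24]
ys = [4, 24, 8, 32]
b0, b1 = 1, 1    # least-squares line y_hat = 1 + x for these points

y_bar = sum(ys) / len(ys)
y_hats = [b0 + b1 * x for x in xs]

ss_total = sum((y - y_bar) ** 2 for y in ys)               # sum of (Y_i - Y_bar)^2
ss_reg = sum((yh - y_bar) ** 2 for yh in y_hats)           # sum of (Y_hat_i - Y_bar)^2
ss_error = sum((y - yh) ** 2 for y, yh in zip(ys, y_hats)) # sum of (Y_i - Y_hat_i)^2

print(ss_total, ss_reg, ss_error, ss_reg / ss_total)
```

The ratio `ss_reg / ss_total` is the coefficient of determination r², the proportion of the variation in y explained by the regression line. The identity SS_total = SS_reg + SS_error holds only when the line is the least-squares line for the data.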