This Slideshare presentation is a partial preview of the full business document. To view and download the full document, please go here:
http://flevy.com/browse/business-document/059correlation-and-simple-regression-1160
DESCRIPTION
Regression is one of the most widely and successfully used data analysis methods. At its core is correlation. Correlation explains how much linear association exists between two variables x and y, while Linear Regression provides a prediction equation describing the relationship between x and y.
The objectives of this module are to: understand the basics of correlation; measure the strength of, and test the statistical significance of, the correlation between two variables; understand and perform simple linear regression; validate its assumptions; and understand and use the resulting regression equation.
This material is suitable for independent study or formal classroom training and includes an exercise, list of tools and quiz questions.
2. Correlation Coefficient
• Correlation coefficient, r, for two variables x and y,
• r = Sample correlation coefficient
• ρ = Population correlation coefficient
• r is an estimate of ρ
• Coefficient values fall between –1 and +1
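$$
r_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)
$$
where:
• \bar{x}, \bar{y} = sample means of the 1st and 2nd variables
• s_x, s_y = sample standard deviations of the 1st and 2nd variables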
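The formula above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `pearson_r` and the demo data are assumptions chosen for the example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient r, following the slide's formula."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (n - 1 in the denominator).
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Sum of products of standardized deviations, divided by n - 1.
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical illustration data, not taken from the slides.
x_demo = [1, 2, 3, 4, 5]
y_demo = [2, 4, 5, 4, 5]
r_demo = pearson_r(x_demo, y_demo)
print(f"r = {r_demo:.4f}")  # prints: r = 0.7746
```

The result always falls between -1 and +1, as the slide states.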
3. Correlation Basics
• Correlation does not prove causation. Correlation
between two variables may be caused by a third,
unidentified variable.
• True causation can only be identified through
properly controlled experiments.
• The correlation coefficient only measures linear
relationships. A nonlinear relationship may exist
even if r = 0.
[Figure: Scatterplot of Y vs X; X axis from -3 to 3, Y axis from 0 to 16]
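The point that r can be 0 even when a strong nonlinear relationship exists can be checked numerically. The sketch below uses assumed data (Y = X² on a symmetric range of X); the slide's actual data points are not available in this preview.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient via sums of squares and cross-products."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Assumed data: Y is completely determined by X, but not linearly.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(f"r = {abs(r):.3f}")
```

Here Y is an exact function of X, yet r is 0 (up to floating-point rounding), because the positive and negative halves of the curve cancel in the cross-product sum.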
4. Example: Correlation
3. Continued . . .
Study Hours    Score
44             93
36             86
32             81
28             80
24             78
42             89
34             85
30             82
26             83
46             90
40             88
38             90
[Figure: plot of Exam Score vs Study Hours]
5. Example: Correlation
6. Determine r_critical from the table, with DF = N - 2 = 12 - 2 = 10
r_critical = 0.6581
7. r_calc = 0.924 > r_critical = 0.658, therefore reject H0
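Steps 6 and 7 can be reproduced with the twelve (Study Hours, Score) pairs from the example. This is a sketch under stated assumptions: the critical value 0.6581 is copied from the slide rather than computed, H0 is presumed to be "no correlation" (ρ = 0), and the t statistic is a standard equivalent form of the test that the slides in this preview do not show.

```python
from math import sqrt

# Data from the example table in this document.
study_hours = [44, 36, 32, 28, 24, 42, 34, 30, 26, 46, 40, 38]
exam_scores = [93, 86, 81, 80, 78, 89, 85, 82, 83, 90, 88, 90]

n = len(study_hours)
x_bar = sum(study_hours) / n
y_bar = sum(exam_scores) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(study_hours, exam_scores))
s_xx = sum((x - x_bar) ** 2 for x in study_hours)
s_yy = sum((y - y_bar) ** 2 for y in exam_scores)

r_calc = s_xy / sqrt(s_xx * s_yy)

df = n - 2
r_critical = 0.6581  # table value quoted on the slide for DF = 10
reject_h0 = abs(r_calc) > r_critical

# Equivalent t statistic for H0: rho = 0 (assumed form, not from the slides).
t_stat = r_calc * sqrt(df) / sqrt(1 - r_calc ** 2)

print(f"r_calc = {r_calc:.3f}, DF = {df}, reject H0: {reject_h0}")
print(f"t = {t_stat:.3f}")
```

The t statistic obtained this way is the same as the t Stat that a simple regression of Score on Study Hours reports for the slope.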
6. Regression
7. Regression Terms
• Response Variable: also known as dependent variable.
Y in the expression Y=f(X)
• Predictor Variable: also known as independent variable (X); used to
predict values of the dependent or response variable
• Regression Equation: algebraic representation of the
regression line used to describe the relationship
between X and Y
• Fits: the predicted (fitted) value for the response at
the combination of predictor settings (X’s) you
requested
• Residuals: difference between the observed response
and the predicted response (Fit)
• Method of Least Squares: the line is drawn such that
the sum of the squared vertical distances between the
points and the line is minimized
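The terms above can be tied together in a short sketch. It fits a least-squares line to the Study Hours and Score data from the correlation example, then computes the fits and residuals. The function name `least_squares_fit` is an assumption for illustration; the closed-form slope and intercept are the standard simple-regression formulas, which the preview does not spell out.

```python
def least_squares_fit(xs, ys):
    """Return (b0, b1) minimizing the sum of squared vertical distances."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx           # slope
    b0 = y_bar - b1 * x_bar    # intercept
    return b0, b1

# Predictor (X) and response (Y) from the example table in this document.
study_hours = [44, 36, 32, 28, 24, 42, 34, 30, 26, 46, 40, 38]
exam_scores = [93, 86, 81, 80, 78, 89, 85, 82, 83, 90, 88, 90]

b0, b1 = least_squares_fit(study_hours, exam_scores)

# Fits: predicted response at each observed predictor setting.
fits = [b0 + b1 * x for x in study_hours]
# Residuals: observed response minus fit.
residuals = [y - f for y, f in zip(exam_scores, fits)]

print(f"b0 = {b0:.5f}, b1 = {b1:.5f}")
print(f"sum of residuals = {sum(residuals):.6f}")
```

For a least-squares line with an intercept, the residuals sum to zero up to rounding, which is a quick sanity check on the fit.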
8. Example: Regression
A Master Black Belt is interested in the relationship
between the number of hours each student spent
preparing for their Six Sigma exam and the resulting
exam scores.
• Create a regression equation
• Use the regression equation to predict the score when
the hours spent preparing were 39
9. Example: Regression
Excel (alternative) Method:
1. Data > Data Analysis > Regression
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.923980238
R Square             0.853739481
Adjusted R Square    0.839113429
Standard Error       1.877141357
Observations         12

ANOVA
              df    SS             MS             F
Regression    1     205.6800699    205.6800699    58.3711507
Residual      10    35.23659674    3.523659674
Total         11    240.9166667

                Coefficients    Standard Error    t Stat         P-value
Intercept       64.42890443     2.799988827       23.01041483    5.4276E-10
X Variable 1    0.59965035      0.078487223       7.640101487    1.7581E-05
Y = 64.43 + 0.5996X
1. B0 & B1 appear under Coefficients (B0 = Intercept, B1 = X Variable 1)
2. P-value < α confirms the statistical significance of the
coefficient values
3. R² = 0.854
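The key figures in the Excel output can be recomputed from the raw data as a cross-check. The sketch below is an independent recalculation in Python, not the Excel procedure itself; the variable names are assumptions for illustration.

```python
from math import sqrt

# Data from the example table in this document.
study_hours = [44, 36, 32, 28, 24, 42, 34, 30, 26, 46, 40, 38]
exam_scores = [93, 86, 81, 80, 78, 89, 85, 82, 83, 90, 88, 90]

n = len(study_hours)
x_bar = sum(study_hours) / n
y_bar = sum(exam_scores) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(study_hours, exam_scores))
s_xx = sum((x - x_bar) ** 2 for x in study_hours)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
fits = [b0 + b1 * x for x in study_hours]

# ANOVA decomposition: total = regression + residual.
ss_total = sum((y - y_bar) ** 2 for y in exam_scores)
ss_resid = sum((y - f) ** 2 for y, f in zip(exam_scores, fits))
ss_regr = ss_total - ss_resid

df_regr, df_resid = 1, n - 2
ms_regr = ss_regr / df_regr
ms_resid = ss_resid / df_resid

f_stat = ms_regr / ms_resid
r_squared = ss_regr / ss_total
std_error = sqrt(ms_resid)  # "Standard Error" in Regression Statistics

print(f"SS regression = {ss_regr:.4f}, SS residual = {ss_resid:.4f}")
print(f"F = {f_stat:.4f}, R Square = {r_squared:.4f}, Standard Error = {std_error:.4f}")
```

The values agree with the SUMMARY OUTPUT table to the displayed precision.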
10. P-Value & R²
• P-value appears in two places:
    Coefficient table: the p-value tests the significance of the
    observed relationship between the response and each predictor
    ANOVA table: the p-value tests whether the regression as a whole is
    significant, i.e. whether at least one of the coefficients is
    significantly different from 0
• R² describes the proportion of variation in the observed
response values that is explained by the model
    The higher the R², the better the prediction equation is assumed
    to be
    Adjusted R² is a modified R² that has been adjusted for the
    number of terms in the model. Unlike R², adjusted R² may get
    smaller when you add unnecessary terms to the model
• It is possible to have a p-value below α with an R²
value that is still relatively low. This suggests that there
may be additional sources of variation (X's) that should be
included in the model.
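As a worked illustration, the two statistics can be evaluated with the figures from the regression example (n = 12 observations, p = 1 predictor). The formulas are the standard definitions and are assumed here, since the preview does not state them explicitly.

$$
R^2 = \frac{SS_{\text{Regression}}}{SS_{\text{Total}}} = \frac{205.68}{240.92} \approx 0.854
$$

$$
R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1} = 1 - (1 - 0.8537)\,\frac{11}{10} \approx 0.839
$$

Both values match the "R Square" and "Adjusted R Square" entries in the Excel output.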
11. Example Results
Score = 64.43 + 0.5996 * Study Hours
Score = 64.43 + 0.5996 * 39 ≈ 87.82 (evaluated with the unrounded coefficients from the regression output)
By studying 39 hours, the exam taker is predicted to
receive a score of 87.82 ≅ 88%
How certain can we be in this prediction?
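The prediction can be reproduced by refitting the line and evaluating it at 39 hours. This is a minimal sketch; the helper name `predict_score` is hypothetical, and the coefficients are recomputed from the example data rather than copied from the rounded equation.

```python
# Data from the example table in this document.
study_hours = [44, 36, 32, 28, 24, 42, 34, 30, 26, 46, 40, 38]
exam_scores = [93, 86, 81, 80, 78, 89, 85, 82, 83, 90, 88, 90]

n = len(study_hours)
x_bar = sum(study_hours) / n
y_bar = sum(exam_scores) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(study_hours, exam_scores))
s_xx = sum((x - x_bar) ** 2 for x in study_hours)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

def predict_score(hours):
    """Point prediction from the fitted regression equation."""
    return b0 + b1 * hours

predicted = predict_score(39)
print(f"Predicted score at 39 hours: {predicted:.2f}")
# prints: Predicted score at 39 hours: 87.82

# 39 lies inside the observed range of study hours, so this is interpolation.
in_range = min(study_hours) <= 39 <= max(study_hours)
```

This gives only a point estimate. Quantifying how certain the prediction is requires the residual analysis and interval estimates, which this sketch does not attempt.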
12. Analyzing Residuals
• A residual is the difference between the actual Y and
the Y predicted by the regression equation. The
residuals in a regression model can be analyzed to
reveal inadequacies in the model
• In order to use and trust the results of a regression
analysis, the following assumptions about residuals
must be verified:
The residuals are independent
The residuals are normally distributed
The residuals have equal variance
These assumptions are the same as those for ANOVA
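A first numerical look at the residuals for the Study Hours example might look like the sketch below. It is a starting point under stated assumptions, not a full validation: independence, normality and equal variance are normally judged from residual plots, which this code does not draw. The "scaled residual" here is simply the residual divided by the residual standard error (leverage is ignored), and the cut-off of 2 is a common rule of thumb, not a value from the slides.

```python
from math import sqrt

# Data from the example table in this document.
study_hours = [44, 36, 32, 28, 24, 42, 34, 30, 26, 46, 40, 38]
exam_scores = [93, 86, 81, 80, 78, 89, 85, 82, 83, 90, 88, 90]

n = len(study_hours)
x_bar = sum(study_hours) / n
y_bar = sum(exam_scores) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(study_hours, exam_scores))
s_xx = sum((x - x_bar) ** 2 for x in study_hours)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

fits = [b0 + b1 * x for x in study_hours]
residuals = [y - f for y, f in zip(exam_scores, fits)]

# Residual standard error: sqrt(SS_residual / (n - 2)).
s = sqrt(sum(e ** 2 for e in residuals) / (n - 2))
scaled = [e / s for e in residuals]

# Rule-of-thumb flag for unusually large residuals (assumed threshold).
flagged = [(x, y) for x, y, z in zip(study_hours, exam_scores, scaled)
           if abs(z) > 2]

# Listing residuals in order of fit makes a changing spread easier to spot.
print("    fit  residual  scaled")
for f, e, z in sorted(zip(fits, residuals, scaled)):
    print(f"{f:7.2f}  {e:8.2f}  {z:6.2f}")
print("points with |scaled residual| > 2:", flagged)
```

For this data set no point is flagged, and the residuals change sign throughout the range of fits rather than drifting in one direction.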
13. Exercise
In a recent wave of Green Belts, the students asked their instructor how much
they have to study in order to pass the exam. The instructor looked at previous
records of those who took the exam to see if there was a relationship between
the number of hours each student studied and their final score.
Name             Hours of Study    GB Test Score
Annie Leung      4                 83
BH Tan           7                 88
Chen Weizhang    3                 83
CK Wong          6                 89
Jason Kuo        10                91
John Anock       4                 85
John Zhang       2                 77
Karl Jiang       8                 90
Determine if a relationship between hours studied and test score exists.
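One possible starting point for the exercise is sketched below. It computes the sample correlation coefficient and the equivalent t statistic for the eight students, following the same steps as the correlation example. The variable names are assumptions, and the final decision is left open: the critical value for DF = 6 at the chosen α must still be looked up in a table, since it is not given in this preview.

```python
from math import sqrt

# Data from the exercise table in this document.
hours_of_study = [4, 7, 3, 6, 10, 4, 2, 8]
gb_test_score = [83, 88, 83, 89, 91, 85, 77, 90]

n = len(hours_of_study)
x_bar = sum(hours_of_study) / n
y_bar = sum(gb_test_score) / n
s_xy = sum((x - x_bar) * (y - y_bar)
           for x, y in zip(hours_of_study, gb_test_score))
s_xx = sum((x - x_bar) ** 2 for x in hours_of_study)
s_yy = sum((y - y_bar) ** 2 for y in gb_test_score)

r_calc = s_xy / sqrt(s_xx * s_yy)
df = n - 2

# Equivalent t statistic for H0: rho = 0 (standard form, assumed here).
t_stat = r_calc * sqrt(df) / sqrt(1 - r_calc ** 2)

print(f"r_calc = {r_calc:.3f}, DF = {df}, t = {t_stat:.3f}")
# Compare |r_calc| with r_critical from a table for DF = 6 to decide on H0.
```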