SlideShare a Scribd company logo
1 of 35
iStockphoto/Thinkstock
chapter 10
Linear Regression
Learning Objectives
After reading this chapter, you will be able to. . .
1 explain the relationship between correlation and regression.
2. understand the importance of prediction using regression.
3. identify the two values that determine the regression line in
least-squares regression.
4 predict the value of a criterion based on a predictor value.
5. describe the extension of bivariate regression to multiple
regression.
6. report and interpret results of multiple regression in APA
format.
CN
CO_LO
CO_TX
CO_NL
CT
CO_CRD
suk85842_10_c10.indd 367 10/23/13 1:43 PM
CHAPTER 10Section 10.1 Regression and Correlation
Regression is a powerful analytical tool that involves using
relationships between vari-ables to predict one from a measure
of the other. Generally speaking, people make
regression-like predictions often. The presence of clouds in the
morning sky prompts us
to take an umbrella to work, for example, or a phone call from
unexpected guests prompts
us to prepare extra food for a meal. This chapter follows the
same thinking, except that the
predictions are mathematical.
Regression topics represent a bit of a puzzle. Sometimes readers
shy away from regression
topics, thinking that they are difficult to understand or at least
difficult to master. How-
ever, regression, as the reader will see, is neither difficult to
understand nor to execute.
Social scientists rely on regression in virtually every advanced
statistical procedure. Most
of those high-end statistical techniques—such as multivariate
analysis of variance, dis-
criminant function analysis, and structural equations
modeling—are beyond the scope of
an introductory text, but regression analysis is an essential part
of the preparation for each
of them. In the meantime, we will learn to use regression for its
own purposes, to detect
significant independent variables known as predictors on a
dependent outcome variable
known as the criterion.
10.1 Regression and Correlation
In Chapter 9, we made the point that when two variables are
correlated, it is because they both contain some of the same
information. If intelligence and reading comprehension
are correlated, it is because to some degree they both measure
the same characteristic. The
more highly they are correlated, the greater the quantity of
whatever is measured that the
two characteristics have in common.
Recall that the coefficient of determination (rxy
2) indicates the proportion of one variable,
the x variable, for example, that can be explained by the other,
which is designated y.
If intelligence (x) and reading comprehension (y) are correlated
rxy 5 .8, then rxy
2 5 .64.
The coefficient of determination tells us that 64% of whatever
reading comprehension
measures can be explained by differences in intelligence.
Another way to say this is that
the coefficient of determination indicates how much information
two correlated variables
have in common.
If correlated variables share information, it seems logical that if
we had information about
the value of one of those variables, we could make a better-
than-chance prediction of what
the value of the other would be.
• If age and height are correlated for teenagers and we know
how old someone is,
we should be able to make a better-than-chance prediction of
how tall that per-
son is, right? Or perhaps it can be turned around, and knowing
how tall a teen is
can predict that teen’s age.
• If education and income are correlated and we know how
many years of school-
ing someone has had, we should be able to make an educated
guess about that
person’s level of income, right?
• If soldiers’ exposure to combat situations is correlated with
their manifestation of
posttraumatic stress disorder (PTSD), can the severity of PTSD
be predicted from
the number of times a soldier has been in combat?
H1
TX_DC
BLF
TX
BL
BLL
suk85842_10_c10.indd 368 10/23/13 1:43 PM
CHAPTER 10Section 10.1 Regression and Correlation
Questions like these are at the heart of regression, and people
have been asking such ques-
tions for a long time. Karl Gauss, the man who defined the
characteristics of the normal
(Gaussian) distribution, began developing the mathematics
behind regression in the early
part of the 19th century. Many others have also made
contributions to this form of quan-
titative analysis. Collectively, their work has allowed experts in
a variety of fields to use
regression procedures in their decision making for many years.
• Economists gather data on unemployment rates, wholesale
inventories, and con-
sumer spending in order to predict the rate at which the
economy will grow. This
approach is effective because each of those variables correlates
with economic
expansion.
• Meteorologists use changes in barometric pressure to predict
the kind of weather
that is coming. Because drops in barometric pressure are
predictors of violent
storms in the Great Plains states and in the southeastern part of
the United States,
meteorologists watch particularly for dramatic drops in air
pressure.
• Odds makers use data on a team’s past performance, any
injuries to key players,
and the quality of the opponent to predict the outcomes in
games.
• Psychologists can use genetic and social factors, including the
history of alcohol
abuse in a family, to predict an individual’s predisposition to
abuse drugs.
• Human resource professionals can use current employee
performance measures
to predict the effectiveness of future employee hires.
Each of these scenarios is possible because of correlations
between variables. Correlations
pave the way for prediction. The point is not that the people in
these examples necessar-
ily sit down with mathematical models to calculate the
probability
that certain results will emerge (which is something we will do
in this
chapter), but they could. In fact, a review of the professional
literature
provides ample evidence that scholars perform analyses like
these
frequently. They are important because prediction allows those
who
must act to be proactive. Rather than waiting for some
important con-
dition to emerge, its timing can be anticipated with some
precision,
and then an appropriate action can be taken. Prediction is the
basis for
sound decision making.
The Language of Regression
There are many types of regression procedures. However, even
though we are interested
in just one in this chapter, the concepts and much of the
language used here are common
to the different approaches. Because variables share
information, one can be used to pre-
dict the other.
Those terms are common enough in statistics, but in regression
discussions the danger
is that the words independent and dependent make it very easy
to begin thinking of the
relationship between the antecedent variables and that
dependent variable as causal.
Although we also used this language with t-tests and ANOVA,
the risk is greater in cor-
relation discussions because the discussion begins with the
assumption of a relationship
between the variables. To avoid this slippery slope, we will
make an adjustment in the
regression discussion. Rather than the terms dependent variable
and independent variable,
as common as those terms are elsewhere, we will refer to the
variable to be predicted
in a regression procedure as the criterion variable and the
variable used to make the
A From the
standpoint of
regression, why does
the size of a correlation
coefficient matter?
Try It!
suk85842_10_c10.indd 369 10/23/13 1:43 PM
CHAPTER 10Section 10.1 Regression and Correlation
prediction as the predictor variable. This language is adopted to
minimize the risk of
confusing correlations with causal relationships.
Note that the suggestion is not that there is not a causal
relationship. There may be just
such a connection. In fact, the relationship between exposure to
combat and the develop-
ment of posttraumatic stress probably is causal. The point is
that the correlation alone—
and correlation is the foundation for pursuing regression—is not
sufficient by itself to
establish causality.
Although the terms criterion and predictor are used here for
descriptive purposes, there
needs to be some shorthand indicators as well. The symbols
used in regression are the
same as those used with the Pearson correlation in Chapter 9: x
and y for the correlated
variables. The symbols will be
• x for the predictor variable and
• y for the criterion variable.
Which Is the Predictor?
The confusion that can occur when equating correlation with
causation is increased by
recognizing that either variable in a significant correlation can
be used to predict the
other. If there is a correlation between the degree of
posttraumatic stress disorder and the
length of exposure to combat, it means that each variable is
equally related to the other.
Someone could
• predict the degree of PTSD from the length of combat
exposure or
• turn it around and predict the length of combat exposure from
the degree of PTSD.
Either variable in a statistically significant correlation can
predict the other. From the point
of view of the mathematics involved, it does not matter,
although practical considerations
may dictate which variable must be the predictor and which
must be the criterion. Some-
times one of the variables will prove more elusive than the other
because the data involved
is more difficult to gather. In such cases, the difficulty involved
may dictate that the more
accessible variable becomes the predictor and that the less
available variable be predicted
rather than gathered.
If reading comprehension scores are significantly correlated
with intelligence scores and
someone wishes to predict the value of one from the other, it
makes sense that the reading
scores would be the predictor variable. Reading scores are
easier to collect than intelli-
gence scores. Most major intelligence tests must be
administered one on one by someone
who has been trained with the instrument, which makes the
process expensive and time-
consuming. Reading tests are typically less demanding to
administer.
There are other considerations. Perhaps aptitude scores from a
college admissions test
that students take in high school are correlated with the grades
that students earn dur-
ing their first year of college study. From the standpoint of the
correlation, scores can be
predicted from grades quite as readily as grades can be
predicted from scores. It does not
make much sense to predict backwards. It will generally be the
students’ future rather
than their past performance that someone will be interested in
knowing.
suk85842_10_c10.indd 370 10/23/13 1:43 PM
CHAPTER 10Section 10.1 Regression and Correlation
Picturing Regression
Scatterplots are an easy way to graphically represent the
correlation between variables. In
Chapter 9 the scatterplot was used to illustrate the correlation
between verbal ability and
intelligence. Recall that each point in the scatterplot
represented one person’s scores on
both of the measured characteristics. When we use the
scatterplot for regression purposes,
the vertical axis will indicate the level of the criterion variable,
y, and the horizontal axis
will indicate the level of the predictor variable, x:
• x signifies the predictor variable in a scatterplot.
• y is the criterion variable.
As noted in Chapter 9, when two variables are highly
correlated, there tends to be little
scatter from left to right along the ranges of the two variables.
For example, a researcher
randomly selects a group of 20 first-year college students at the
end of their first year of
study at a large university and gathers from them two types of
data: (a) the number of hours
per week they typically study and (b) their recorded grade
averages at the end of their first
semester. The researcher wants to determine how well the
number of hours the student
studies per week will predict the student’s grades. This means
that the hours studied is the
predictor variable, x, and grade average is the criterion variable,
y. The data follows:
Hours Studied (x) Grade Average (y)
1 1 1.5
2 2 1.8
3 3 2.2
4 3 2.0
5 5 2.0
6 5 2.1
7 7 2.4
8 7 2.2
9 8 2.4
10 10 2.7
11 10 2.6
12 11 2.9
13 13 3.0
14 15 3.0
15 16 3.1
16 16 2.7
17 16 3.3
18 17 3.0
19 18 3.4
20 20 4.0
suk85842_10_c10.indd 371 10/23/13 1:43 PM
CHAPTER 10Section 10.1 Regression and Correlation
This data can be used to create the same type of graph
completed in Chapter 9. If the data
is placed in columns in an Excel spreadsheet just as they are
here (the numbering down
the left is automatic in Excel and not considered one of the data
columns), the commands
for creating the scatterplot are then simply Insert and then
Scatter. With some additions
to label the axes and title the graph, the result is Figure 10.1.
To plot the data in this example, draw the vertical and
horizontal axes of the graph, and
then for each individual subject indicate the confluence of the
individual’s grade (vertical
axis) with the same individual’s study time (horizontal axis).
Once all subjects are plotted,
at least three conclusions can be drawn:
1. There is a correlation between the number of hours studied
and the students’
first-semester grades. If this was not the case, the dots would
have no particular
pattern and would have just been randomly scattered. However,
the dots array
fairly consistently from lower left to upper right.
2. The correlation is substantial; there is not much scatter in the
scatterplot, and
the trend from lower left to upper right is easy to recognize. If
the relationship
had been negative (the more they studied, the poorer they
performed), the data
pattern would have been from the upper left to the lower right.
In that case, as
the value of one variable increased, the other would decrease.
(This might be the
case, for example, if this was a correlation of the number of
hours students spend
playing video games and their grades. The more they play, the
less well they do
in their classes.)
3. The relationship appears to be linear. There is not a point in
the pattern where it
appears that as one variable seems to increase, the other tends
to level off or even
diminish.
Figure 10.1: A scatterplot for the relationship between study
time
and grades
0
2.0
2.5
1.0
1.5
0.5
3.0
3.5
4.0
4.5
0 5 10 15 20 25
Hours Studied Per Week
F
ir
st
T
e
rm
G
ra
d
e
s
o
n
a
4
.0
S
ca
le
suk85842_10_c10.indd 372 10/23/13 1:43 PM
CHAPTER 10Section 10.1 Regression and Correlation
About Linearity
The third point about the relationship between x and y being
linear is particularly impor-
tant. The type of regression discussed in this chapter is based on
the assumption that the
association between the predictor and criterion variables is
linear. There are other regres-
sion techniques that are used when the relationship between
variables is not linear, but
in the reasoning used in this chapter, there needs to be linearity.
Because there is a linear
connection, a straight line drawn through or as close as possible
to as many of the data
points as possible might look like the line in the graph in Figure
10.2A.
Figure 10.2: Regression lines
That line through the data points is called a regression line. If it
is positioned so that it is
as close as possible to as many of the 20 data points as it can be
and still be a straight line, it
can be used to determine any value of y from a specified value
of x (or for that matter, any
value of x from a specified value of y). For example, someone
using the graph can select any
value of x along the horizontal axis, go vertically from that x
value up to the line, and then
move left horizontally from the line to where the y-axis is. The
value of the point at which
the y-axis is encountered will be the value of y for the specified
x value. See Figure 10.2B.
0
4
5
2
3
1
0 5 10 15 20 25
Hours Studied Per Week
F
ir
st
T
e
rm
G
ra
d
e
s
o
n
a
4
.0
S
ca
le
A. A Regression Line through a Scatterplot
0
4
5
2
3
1
0 5 10 15 20 25
Hours Studied Per Week
F
ir
st
T
e
rm
G
ra
d
e
s
o
n
a
4
.0
S
ca
le
B. Using the Regression Line to Determine y from x
suk85842_10_c10.indd 373 10/23/13 1:43 PM
CHAPTER 10Section 10.1 Regression and Correlation
No one in the sample of 20 indicated a study time of 12.5 hours
per week, but perhaps
one of the researcher’s colleagues who knows what kind of
analysis is under way asks, “I
know someone who studies 12.5 hours every week. What is your
best guess for this stu-
dent’s grade average?” If the regression line is positioned
accurately, the researcher can
find 12.5 on the x-axis, travel vertically up to the regression
line, and then move left to the
y-axis to determine that 12.5 hours per week of study time
predicts a grade point average
of about 2.9.
The researcher gathered data from only 20 students, which is a
little risky, but they were
randomly selected, and sampling theory tells us that a randomly
selected sample will
differ from the population of all freshman students only by
chance. In spite of the small
sample size, the researcher may be lucky and find that the
sampling error is minimal.
Coping with Less-Than-Perfect Correlations
Besides the fact that the graph in Figure 10.2 does not provide
very detailed markings (the
researcher had to guess that 12.5 hours studied will produce a
grade average of “about
2.9”), other factors affect prediction accuracy. The researcher
cannot be exactly precise
about the grades because the correlation between the two
variables is not perfect. Although
a grade average of 2.9 might be the best possible prediction
given this data, it is quite
likely that the prediction will not be exactly accurate. Maybe
for a particular student who
studies 12.5 hours per week, a GPA of 2.8 would turn out to be
a better prediction, or per-
haps 3.0.
Without yet calculating the correlation, the evidence for rxy is
the scatter in the data points.
For example, note that three students reported studying the
same amount, 16 hours per
week, but all of them ended the semester with different grade
averages. Of course, an
imperfect correlation such as this one reflects the fact that
besides probably including
some error in the way the variables are measured, grades are
affected
by more than just study time. The researcher has not accounted
for
differences in academic ability or differences in the quality of
the
study time. Those problems aside, as long as the correlation
between
the predictor and criterion variables is statistically significant,
the pre-
dicted value of y from the value of x will be more accurate over
time
than a number of random predictions.
Error in prediction is something we learn to tolerate. No one
sues the United States
Commerce Department if the prediction for job growth is off by
one-tenth for the year.
We do not petition the television station to fire the
meteorologist when the forecast high
for the day is wrong by a couple of degrees. Error is inevitable
when predictions have to
be based on imperfectly correlated variables, but we can at least
have some measure of
how extensive the error is likely to be. Later in the chapter, we
will learn to calculate the
amount of error and then include it with the predicted value in
order to have a gauge of
prediction accuracy.
B What is the
visual evidence in
a scatterplot for a weak
correlation?
Try It!
suk85842_10_c10.indd 374 10/23/13 1:43 PM
CHAPTER 10Section 10.1 Regression and Correlation
Understanding the Least-Squares Criterion
A scatterplot is a helpful way to introduce the idea of
regression, but relying on the scat-
tered points in a graph in order to predict one variable from the
other is not practical. It is
a conceptual model for what will actually be done
mathematically with a regression equa-
tion as depicted in Figure 10.3. The equation is derived in a way
that satisfies the require-
ments for what is called the least-squares criterion. The least-
squares criterion (think of
criterion in this case as a requirement) is this: The regression
line must be positioned so
that the sum of all possible prediction errors (y 2 ŷ) has the
lowest possible value. In
other words, the distance (depicted as gray lines in the graph)
from the data point to the
regression line is as minimal as possible. These errors that are
represented in the difference
between y and ŷ are termed residual scores as shown in Figure
10.3.
Figure 10.3: The least-squares criterion graph
Source: Used by permission Alan B. Hale, Ph.D.
Describing the Prediction Error
Because of error in the regression solution, at least some of the
predictions are going to be
inaccurate. These errors emerge as differences between the
criterion variable’s predicted
value and what its actual value would be with all possible data
and if there was no need
to rely on the figure or on mathematics for the prediction. If
x is the actual value of the predictor variable, and
y is the actual value of the criterion variable, we can use
y-intercept
x1
x (independent)
y1
y2
y2
y2 � y2
y
(d
e
p
e
n
d
e
n
t)
^
^
Minimize: �(yi � yi)2
Least Squares Method
n
i = 1
^
Line: y = a + bx
suk85842_10_c10.indd 375 10/23/13 1:43 PM
CHAPTER 10Section 10.2 Ordinary Least-Squares Regression
With One Predictor
ŷ (y-hat) to symbolize the predicted value of the criterion
variable and
y 2 ŷ as a way to indicate the prediction error or residual scores
in a particular
solution.
Look at Figure 10.2 again. If the correlation between hours
studied and end-of-term grades
was perfect, the data points would form a straight line. With no
scatter in the line, each
value of x would result in just one corresponding value of y.
Those three people who
reported 16 hours per week of study time would all have had the
same grades at the end
of the semester.
But with scatter in the data points, establishing the regression
line is not a matter of just
connecting the dots. Because the regression line is a straight
line and it has to be posi-
tioned so as to minimize the prediction errors, its positioning
involves some compromise.
It is a line of best fit, not a line of perfect fit.
Using the Residual Scores
If someone were to rely on Figure 10.2 to make a series of
predictions and then go through
university records to determine what the actual grade averages
were for those 20 students
after their first semester, any difference between what was
predicted and what the grades
actually were would be reflected in the residual scores. The
residual scores would indicate
the amount of prediction error.
If all of those residual scores were added up, what would they
total? This is how the ques-
tion looks in symbols:
a ( y 2 ŷ) 5 ?
When a number of predictions are made, some of the residual
scores will be positive (the
actual value of y will be larger than the predicted value, ŷ), and
some will be negative (y
is smaller than ŷ). And if all the residual scores were summed,
their total would be 0. The
positive and negative residual scores would cancel each other
out and sum to 0.
The zero sum is not very helpful to someone who needs to know
how much error there
is in a regression solution. A better indicator of prediction error
would be to square the
residual scores, which would make all the negative residual
scores positive, and then sum
them. If the sum of those squared values has its lowest possible
value, the solution meets
the least-squares criterion. The regression equation used here
was developed to satisfy
that requirement. That is why this particular form of regression
is called ordinary least-
squares regression.
10.2 Ordinary Least-Squares Regression With One Predictor
Theoretically, there can be any number of predictors, x1, x2, x3,
. . ., xn in a regression problem rather than just the single x.
Toward the end of the chapter, there is a descrip-
tion of how multiple regression works. This form of multiple
regression is just ordinary
suk85842_10_c10.indd 376 10/23/13 1:43 PM
CHAPTER 10Section 10.2 Ordinary Least-Squares Regression
With One Predictor
least-squares regression with multiple predictors. The focus
here is on regression with just
one predictor. It is sometimes called simple regression or
bivariate regression because
there are only two variables involved.
The regression equation provides the math needed to position
the regression line. So that
the positioning of the regression line meets the least-squares
criterion, there are two ques-
tions to be answered:
1. At what point does the regression line cross the y-axis in the
graph?
2. How much does the criterion variable (y) change when the
predictor variable (x)
increases by 1.0?
The answer to the first question establishes the value of the
intercept. The intercept indi-
cates the value of y where the regression line intercepts the
vertical axis. Put more suc-
cinctly, the intercept is the value y when x 5 0. If you look at
Figure 10.2 again, it appears
that the regression line crosses the y-axis at about y 5 1.2.
Following is an equation to
calculate this value, which will indicate how close we were with
the estimate.
The answer to the second question requires a value that
represents the slope of the line,
or just the slope. Refer to Figure 10.2. It is possible to estimate
from the way the line is
positioned that if x increases by 1.0, it appears that y increases
by about .2. The slope value
reveals how radically the regression line inclines or declines.
The Regression Equation
The bivariate regression equation has this form:
ŷ 5 a 1 bx 1 e Formula 10.1
Where
ŷ 5 the predicted value of the criterion variable
a 5 the intercept
b 5 the slope of the regression line
x 5 the value of the predictor variable
e 5 prediction error
The equation shows that for a given value of the predictor
variable (x), the best prediction
for ŷ is the value of the intercept plus the product of the slope
times the predictor plus
some component of error. Calculating the amount of error in the
regression problem is
a separate process rather than part of the initial calculation. The
e indicates that there is
always error in the prediction. Because the error will be
calculated separately, we can take
the e out of the equation, which makes the formula
ŷ 5 a 1 bx
suk85842_10_c10.indd 377 10/23/13 1:43 PM
CHAPTER 10Section 10.2 Ordinary Least-Squares Regression
With One Predictor
Before calculating a regression solution, we need to know the
intercept value, a, and the
slope, b. They each have their own equations, but most of the
terms involved are statistics
that are already familiar. First, the formula for the intercept:
a 5 My 2 bMx Formula 10.2
Where
a 5 the intercept
b 5 (this is the new part) the slope of the regression line
My 5 the mean of the criterion variable, y
Mx 5 the mean of the predictor variable, x
The intercept value is
1. the mean of the criterion variable minus
2. the slope value (once that is determined) times the mean of
the predictor variable.
Because the intercept formula includes the slope of the
regression line, or the regression
coefficient, we need to start there. That formula is
b 5 rxy a
sy
sx
b Formula 10.3
Where
b 5 the slope of the regression line
rxy 5 the correlation coefficient for the two variables
sy 5 the standard deviation of the criterion variable, y
sx 5 the standard deviation of the predictor variable, x
Calculating a Regression
Solution
Using the study-time and semester-grades data, calculate a
regression solution. Remem-
ber that the researcher estimated that someone who studied 12.5
hours per week would
probably have a semester grade average of about 2.9. How
accurate was that estimate?
The process will be as follows:
1. Calculate the means and standard deviations for x and y.
2. Calculate the correlation of x and y.
3. Calculate the slope of the line, or the regression coefficient,
b.
4. Calculate the regression intercept, a.
5. Calculate the value of ŷ.
suk85842_10_c10.indd 378 10/23/13 1:43 PM
CHAPTER 10Section 10.2 Ordinary Least-Squares Regression
With One Predictor
1. For the two variables, verify that
Hours Studied (x) Grade Average (y)
Mean 10.15 2.615
Standard deviation 5.941 613
2. Using Formula 9.2, calculate the correlation as follows:
rxy 5
n a xy 2 ( a x)( a y)
Å e cn a x
2 2 ( a x)
2 d cn a y2 2 ( a y2 d f
rxy 5
201596.32 2 12032 152.32
"532012,7312 2 12032 2 4 3 1201143.912 2 152.32 2 2 46
rxy 5
1,309.4
"3 113,4112 1142.912 4
5 .946
Checking this value against the critical value from Table 9.1,
rxy .05(18) 5 .444, indi-
cates that the correlation is statistically significant.
To do the correlation on Excel, remember that having set up
the data in the two
columns, the commands are DataSData AnalysisSCorrelation.
3. The slope of the line is
b 5 rxy a
sy
sx
b
5 .946 a .613
5.941
b
5 .098
This value indicates that y increases .098 for every 1.0
increase in x. Earlier we
guessed that the slope of the line in the graph in Figure 10.2
might be about .2,
which turns out to be incorrect.
4. The regression line intercept is
a 5 My 2 bMx
5 2.615 2 (.098)(10.15)
5 1.620
This value indicates that if x 5 0, then y 5 1.620. Based on the
visual best fit that
we made to the graph in Figure 10.2, we guessed that the
intercept would be about
y 5 1.2. That estimate was not very close either.
suk85842_10_c10.indd 379 10/23/13 1:43 PM
CHAPTER 10Section 10.2 Ordinary Least-Squares Regression
With One Predictor
5. What grades would students likely earn if they study 12.5
hours per week? The
estimate based on that visually placed regression line in Figure
10.2 was a grade
average of about 2.9. Check this “guesstimate” by solving for ŷ
and comparing the
two solutions.
ŷ 5 a 1 bx
5 1.620 1 (.098)(12.5)
5 2.845
6. Based on the data from the 20 students, the best prediction of
the grade average
someone who studies 12.5 hours per week will earn is 2.845.
Interestingly, that
value is not too far from the earlier prediction.
Notice that the data from two correlated variables were helpful
in predicting a value from
one of the two variables that was unknown, from a known value
of the other variable. It
took a few descriptive statistics (the means, standard deviations,
and the correlation coef-
ficient) and the relatively straightforward equations for the
slope, the intercept, and for
the predicted value of y.
Practicing the Regression

More Related Content

Similar to iStockphotoThinkstockchapter 10Linear RegressionL.docx

Topic Two Biostatistics.pptx
Topic Two  Biostatistics.pptxTopic Two  Biostatistics.pptx
Topic Two Biostatistics.pptxkihembopamelah
 
Real Estate Data Set
Real Estate Data SetReal Estate Data Set
Real Estate Data SetSarah Jimenez
 
BUS308 – Week 5 Lecture 1 A Different View Expected Ou.docx
BUS308 – Week 5 Lecture 1 A Different View Expected Ou.docxBUS308 – Week 5 Lecture 1 A Different View Expected Ou.docx
BUS308 – Week 5 Lecture 1 A Different View Expected Ou.docxcurwenmichaela
 
Analyzing quantitative data
Analyzing quantitative dataAnalyzing quantitative data
Analyzing quantitative datamostafasharafiye
 
Topic Learning TeamNumber of Pages 2 (Double Spaced)Num.docx
Topic Learning TeamNumber of Pages 2 (Double Spaced)Num.docxTopic Learning TeamNumber of Pages 2 (Double Spaced)Num.docx
Topic Learning TeamNumber of Pages 2 (Double Spaced)Num.docxAASTHA76
 
DQ2Patrick QueisneOne of the greatest barriers that the orga
DQ2Patrick QueisneOne of the greatest barriers that the orgaDQ2Patrick QueisneOne of the greatest barriers that the orga
DQ2Patrick QueisneOne of the greatest barriers that the orgaDustiBuckner14
 
RELIABILITY.pptx
RELIABILITY.pptxRELIABILITY.pptx
RELIABILITY.pptxrupasi13
 
Classification via Logistic Regression
Classification via Logistic RegressionClassification via Logistic Regression
Classification via Logistic RegressionTaweh Beysolow II
 
Page 266LEARNING OBJECTIVES· Explain how researchers use inf.docx
Page 266LEARNING OBJECTIVES· Explain how researchers use inf.docxPage 266LEARNING OBJECTIVES· Explain how researchers use inf.docx
Page 266LEARNING OBJECTIVES· Explain how researchers use inf.docxkarlhennesey
 
Xue paper-01-13-12
Xue paper-01-13-12Xue paper-01-13-12
Xue paper-01-13-12Yuhong Xue
 
Simple regressionand correlation (2).pdf
Simple regressionand correlation (2).pdfSimple regressionand correlation (2).pdf
Simple regressionand correlation (2).pdfyadavrahulrahul799
 
Correlational research
Correlational researchCorrelational research
Correlational researchAiden Yeh
 
raprap-opaw.pptx
raprap-opaw.pptxraprap-opaw.pptx
raprap-opaw.pptxLouieCase
 
Not sure how to do this case analysis please help me do it!1.Are t.pdf
Not sure how to do this case analysis please help me do it!1.Are t.pdfNot sure how to do this case analysis please help me do it!1.Are t.pdf
Not sure how to do this case analysis please help me do it!1.Are t.pdfamitbagga0808
 
Research methods 2 operationalization & measurement
Research methods 2   operationalization & measurementResearch methods 2   operationalization & measurement
Research methods 2 operationalization & measurementattique1960
 
16 USING LINEAR REGRESSION PREDICTING THE FUTURE16 MEDIA LIBRAR.docx
16 USING LINEAR REGRESSION PREDICTING THE FUTURE16 MEDIA LIBRAR.docx16 USING LINEAR REGRESSION PREDICTING THE FUTURE16 MEDIA LIBRAR.docx
16 USING LINEAR REGRESSION PREDICTING THE FUTURE16 MEDIA LIBRAR.docxhyacinthshackley2629
 
16 USING LINEAR REGRESSION PREDICTING THE FUTURE16 MEDIA LIBRAR.docx
16 USING LINEAR REGRESSION PREDICTING THE FUTURE16 MEDIA LIBRAR.docx16 USING LINEAR REGRESSION PREDICTING THE FUTURE16 MEDIA LIBRAR.docx
16 USING LINEAR REGRESSION PREDICTING THE FUTURE16 MEDIA LIBRAR.docxnovabroom
 

Similar to iStockphotoThinkstockchapter 10Linear RegressionL.docx (20)

Topic Two Biostatistics.pptx
Topic Two  Biostatistics.pptxTopic Two  Biostatistics.pptx
Topic Two Biostatistics.pptx
 
Real Estate Data Set
Real Estate Data SetReal Estate Data Set
Real Estate Data Set
 
BUS308 – Week 5 Lecture 1 A Different View Expected Ou.docx
BUS308 – Week 5 Lecture 1 A Different View Expected Ou.docxBUS308 – Week 5 Lecture 1 A Different View Expected Ou.docx
BUS308 – Week 5 Lecture 1 A Different View Expected Ou.docx
 
Analyzing quantitative data
Analyzing quantitative dataAnalyzing quantitative data
Analyzing quantitative data
 
Topic Learning TeamNumber of Pages 2 (Double Spaced)Num.docx
Topic Learning TeamNumber of Pages 2 (Double Spaced)Num.docxTopic Learning TeamNumber of Pages 2 (Double Spaced)Num.docx
Topic Learning TeamNumber of Pages 2 (Double Spaced)Num.docx
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 
DQ2Patrick QueisneOne of the greatest barriers that the orga
DQ2Patrick QueisneOne of the greatest barriers that the orgaDQ2Patrick QueisneOne of the greatest barriers that the orga
DQ2Patrick QueisneOne of the greatest barriers that the orga
 
MSTHESIS_Fuzzy
MSTHESIS_FuzzyMSTHESIS_Fuzzy
MSTHESIS_Fuzzy
 
RELIABILITY.pptx
RELIABILITY.pptxRELIABILITY.pptx
RELIABILITY.pptx
 
Classification via Logistic Regression
Classification via Logistic RegressionClassification via Logistic Regression
Classification via Logistic Regression
 
Page 266LEARNING OBJECTIVES· Explain how researchers use inf.docx
Page 266LEARNING OBJECTIVES· Explain how researchers use inf.docxPage 266LEARNING OBJECTIVES· Explain how researchers use inf.docx
Page 266LEARNING OBJECTIVES· Explain how researchers use inf.docx
 
Xue paper-01-13-12
Xue paper-01-13-12Xue paper-01-13-12
Xue paper-01-13-12
 
Simple regressionand correlation (2).pdf
Simple regressionand correlation (2).pdfSimple regressionand correlation (2).pdf
Simple regressionand correlation (2).pdf
 
Correlational research
Correlational researchCorrelational research
Correlational research
 
Final analysis & Discussion_Volen
Final analysis & Discussion_VolenFinal analysis & Discussion_Volen
Final analysis & Discussion_Volen
 
raprap-opaw.pptx
raprap-opaw.pptxraprap-opaw.pptx
raprap-opaw.pptx
 
Not sure how to do this case analysis please help me do it!1.Are t.pdf
Not sure how to do this case analysis please help me do it!1.Are t.pdfNot sure how to do this case analysis please help me do it!1.Are t.pdf
Not sure how to do this case analysis please help me do it!1.Are t.pdf
 
Research methods 2 operationalization & measurement
Research methods 2   operationalization & measurementResearch methods 2   operationalization & measurement
Research methods 2 operationalization & measurement
 
16 USING LINEAR REGRESSION PREDICTING THE FUTURE16 MEDIA LIBRAR.docx
16 USING LINEAR REGRESSION PREDICTING THE FUTURE16 MEDIA LIBRAR.docx16 USING LINEAR REGRESSION PREDICTING THE FUTURE16 MEDIA LIBRAR.docx
16 USING LINEAR REGRESSION PREDICTING THE FUTURE16 MEDIA LIBRAR.docx
 
16 USING LINEAR REGRESSION PREDICTING THE FUTURE16 MEDIA LIBRAR.docx
16 USING LINEAR REGRESSION PREDICTING THE FUTURE16 MEDIA LIBRAR.docx16 USING LINEAR REGRESSION PREDICTING THE FUTURE16 MEDIA LIBRAR.docx
16 USING LINEAR REGRESSION PREDICTING THE FUTURE16 MEDIA LIBRAR.docx
 

More from vrickens

1000 words, 2 referencesBegin conducting research now on your .docx
1000 words, 2 referencesBegin conducting research now on your .docx1000 words, 2 referencesBegin conducting research now on your .docx
1000 words, 2 referencesBegin conducting research now on your .docxvrickens
 
1000 words only due by 5314 at 1200 estthis is a second part to.docx
1000 words only due by 5314 at 1200 estthis is a second part to.docx1000 words only due by 5314 at 1200 estthis is a second part to.docx
1000 words only due by 5314 at 1200 estthis is a second part to.docxvrickens
 
1000 words with refernceBased on the American constitution,” wh.docx
1000 words with refernceBased on the American constitution,” wh.docx1000 words with refernceBased on the American constitution,” wh.docx
1000 words with refernceBased on the American constitution,” wh.docxvrickens
 
10.1. In a t test for a single sample, the samples mean.docx
10.1. In a t test for a single sample, the samples mean.docx10.1. In a t test for a single sample, the samples mean.docx
10.1. In a t test for a single sample, the samples mean.docxvrickens
 
100 WORDS OR MOREConsider your past experiences either as a studen.docx
100 WORDS OR MOREConsider your past experiences either as a studen.docx100 WORDS OR MOREConsider your past experiences either as a studen.docx
100 WORDS OR MOREConsider your past experiences either as a studen.docxvrickens
 
1000 to 2000 words Research Title VII of the Civil Rights Act of.docx
1000 to 2000 words Research Title VII of the Civil Rights Act of.docx1000 to 2000 words Research Title VII of the Civil Rights Act of.docx
1000 to 2000 words Research Title VII of the Civil Rights Act of.docxvrickens
 
1000 word essay MlA Format.. What is our personal responsibility tow.docx
1000 word essay MlA Format.. What is our personal responsibility tow.docx1000 word essay MlA Format.. What is our personal responsibility tow.docx
1000 word essay MlA Format.. What is our personal responsibility tow.docxvrickens
 
100 wordsGoods and services that are not sold in markets.docx
100 wordsGoods and services that are not sold in markets.docx100 wordsGoods and services that are not sold in markets.docx
100 wordsGoods and services that are not sold in markets.docxvrickens
 
100 word responseChicago style citingLink to textbook httpbo.docx
100 word responseChicago style citingLink to textbook httpbo.docx100 word responseChicago style citingLink to textbook httpbo.docx
100 word responseChicago style citingLink to textbook httpbo.docxvrickens
 
100 word response to the followingBoth perspectives that we rea.docx
100 word response to the followingBoth perspectives that we rea.docx100 word response to the followingBoth perspectives that we rea.docx
100 word response to the followingBoth perspectives that we rea.docxvrickens
 
100 word response to the followingThe point that Penetito is tr.docx
100 word response to the followingThe point that Penetito is tr.docx100 word response to the followingThe point that Penetito is tr.docx
100 word response to the followingThe point that Penetito is tr.docxvrickens
 
100 word response to the folowingMust use Chicago style citing an.docx
100 word response to the folowingMust use Chicago style citing an.docx100 word response to the folowingMust use Chicago style citing an.docx
100 word response to the folowingMust use Chicago style citing an.docxvrickens
 
100 word response using textbook Getlein, Mark. Living with Art, 9t.docx
100 word response using textbook Getlein, Mark. Living with Art, 9t.docx100 word response using textbook Getlein, Mark. Living with Art, 9t.docx
100 word response using textbook Getlein, Mark. Living with Art, 9t.docxvrickens
 
100 word response to the following. Must cite properly in MLA.Un.docx
100 word response to the following. Must cite properly in MLA.Un.docx100 word response to the following. Must cite properly in MLA.Un.docx
100 word response to the following. Must cite properly in MLA.Un.docxvrickens
 
100 original, rubric, word count and required readings must be incl.docx
100 original, rubric, word count and required readings must be incl.docx100 original, rubric, word count and required readings must be incl.docx
100 original, rubric, word count and required readings must be incl.docxvrickens
 
100 or more wordsFor this Discussion imagine that you are speaki.docx
100 or more wordsFor this Discussion imagine that you are speaki.docx100 or more wordsFor this Discussion imagine that you are speaki.docx
100 or more wordsFor this Discussion imagine that you are speaki.docxvrickens
 
10. (TCOs 1 and 10) Apple, Inc. a cash basis S corporation in Or.docx
10. (TCOs 1 and 10) Apple, Inc. a cash basis S corporation in Or.docx10. (TCOs 1 and 10) Apple, Inc. a cash basis S corporation in Or.docx
10. (TCOs 1 and 10) Apple, Inc. a cash basis S corporation in Or.docxvrickens
 
10-12 slides with Notes APA Style ReferecesThe prosecutor is getti.docx
10-12 slides with Notes APA Style ReferecesThe prosecutor is getti.docx10-12 slides with Notes APA Style ReferecesThe prosecutor is getti.docx
10-12 slides with Notes APA Style ReferecesThe prosecutor is getti.docxvrickens
 
10-12 page paer onDiscuss the advantages and problems with trailer.docx
10-12 page paer onDiscuss the advantages and problems with trailer.docx10-12 page paer onDiscuss the advantages and problems with trailer.docx
10-12 page paer onDiscuss the advantages and problems with trailer.docxvrickens
 
10. Assume that you are responsible for decontaminating materials in.docx
10. Assume that you are responsible for decontaminating materials in.docx10. Assume that you are responsible for decontaminating materials in.docx
10. Assume that you are responsible for decontaminating materials in.docxvrickens
 

More from vrickens (20)

1000 words, 2 referencesBegin conducting research now on your .docx
1000 words, 2 referencesBegin conducting research now on your .docx1000 words, 2 referencesBegin conducting research now on your .docx
1000 words, 2 referencesBegin conducting research now on your .docx
 
1000 words only due by 5314 at 1200 estthis is a second part to.docx
1000 words only due by 5314 at 1200 estthis is a second part to.docx1000 words only due by 5314 at 1200 estthis is a second part to.docx
1000 words only due by 5314 at 1200 estthis is a second part to.docx
 
1000 words with refernceBased on the American constitution,” wh.docx
1000 words with refernceBased on the American constitution,” wh.docx1000 words with refernceBased on the American constitution,” wh.docx
1000 words with refernceBased on the American constitution,” wh.docx
 
10.1. In a t test for a single sample, the samples mean.docx
10.1. In a t test for a single sample, the samples mean.docx10.1. In a t test for a single sample, the samples mean.docx
10.1. In a t test for a single sample, the samples mean.docx
 
100 WORDS OR MOREConsider your past experiences either as a studen.docx
100 WORDS OR MOREConsider your past experiences either as a studen.docx100 WORDS OR MOREConsider your past experiences either as a studen.docx
100 WORDS OR MOREConsider your past experiences either as a studen.docx
 
1000 to 2000 words Research Title VII of the Civil Rights Act of.docx
1000 to 2000 words Research Title VII of the Civil Rights Act of.docx1000 to 2000 words Research Title VII of the Civil Rights Act of.docx
1000 to 2000 words Research Title VII of the Civil Rights Act of.docx
 
1000 word essay MlA Format.. What is our personal responsibility tow.docx
1000 word essay MlA Format.. What is our personal responsibility tow.docx1000 word essay MlA Format.. What is our personal responsibility tow.docx
1000 word essay MlA Format.. What is our personal responsibility tow.docx
 
100 wordsGoods and services that are not sold in markets.docx
100 wordsGoods and services that are not sold in markets.docx100 wordsGoods and services that are not sold in markets.docx
100 wordsGoods and services that are not sold in markets.docx
 
100 word responseChicago style citingLink to textbook httpbo.docx
100 word responseChicago style citingLink to textbook httpbo.docx100 word responseChicago style citingLink to textbook httpbo.docx
100 word responseChicago style citingLink to textbook httpbo.docx
 
100 word response to the followingBoth perspectives that we rea.docx
100 word response to the followingBoth perspectives that we rea.docx100 word response to the followingBoth perspectives that we rea.docx
100 word response to the followingBoth perspectives that we rea.docx
 
100 word response to the followingThe point that Penetito is tr.docx
100 word response to the followingThe point that Penetito is tr.docx100 word response to the followingThe point that Penetito is tr.docx
100 word response to the followingThe point that Penetito is tr.docx
 
100 word response to the folowingMust use Chicago style citing an.docx
100 word response to the folowingMust use Chicago style citing an.docx100 word response to the folowingMust use Chicago style citing an.docx
100 word response to the folowingMust use Chicago style citing an.docx
 
100 word response using textbook Getlein, Mark. Living with Art, 9t.docx
100 word response using textbook Getlein, Mark. Living with Art, 9t.docx100 word response using textbook Getlein, Mark. Living with Art, 9t.docx
100 word response using textbook Getlein, Mark. Living with Art, 9t.docx
 
100 word response to the following. Must cite properly in MLA.Un.docx
100 word response to the following. Must cite properly in MLA.Un.docx100 word response to the following. Must cite properly in MLA.Un.docx
100 word response to the following. Must cite properly in MLA.Un.docx
 
100 original, rubric, word count and required readings must be incl.docx
100 original, rubric, word count and required readings must be incl.docx100 original, rubric, word count and required readings must be incl.docx
100 original, rubric, word count and required readings must be incl.docx
 
100 or more wordsFor this Discussion imagine that you are speaki.docx
100 or more wordsFor this Discussion imagine that you are speaki.docx100 or more wordsFor this Discussion imagine that you are speaki.docx
100 or more wordsFor this Discussion imagine that you are speaki.docx
 
10. (TCOs 1 and 10) Apple, Inc. a cash basis S corporation in Or.docx
10. (TCOs 1 and 10) Apple, Inc. a cash basis S corporation in Or.docx10. (TCOs 1 and 10) Apple, Inc. a cash basis S corporation in Or.docx
10. (TCOs 1 and 10) Apple, Inc. a cash basis S corporation in Or.docx
 
10-12 slides with Notes APA Style ReferecesThe prosecutor is getti.docx
10-12 slides with Notes APA Style ReferecesThe prosecutor is getti.docx10-12 slides with Notes APA Style ReferecesThe prosecutor is getti.docx
10-12 slides with Notes APA Style ReferecesThe prosecutor is getti.docx
 
10-12 page paer onDiscuss the advantages and problems with trailer.docx
10-12 page paer onDiscuss the advantages and problems with trailer.docx10-12 page paer onDiscuss the advantages and problems with trailer.docx
10-12 page paer onDiscuss the advantages and problems with trailer.docx
 
10. Assume that you are responsible for decontaminating materials in.docx
10. Assume that you are responsible for decontaminating materials in.docx10. Assume that you are responsible for decontaminating materials in.docx
10. Assume that you are responsible for decontaminating materials in.docx
 

Recently uploaded

Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...M56BOOKSTORE PRODUCT/SERVICE
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfUmakantAnnand
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 

Recently uploaded (20)

Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.Compdf
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 

iStockphotoThinkstockchapter 10Linear RegressionL.docx

  • 1. iStockphoto/Thinkstock chapter 10 Linear Regression Learning Objectives After reading this chapter, you will be able to. . . 1 explain the relationship between correlation and regression. 2. understand the importance of prediction using regression. 3. identify the two values that determine the regression line in least-squares regression. 4 predict the value of a criterion based on a predictor value. 5. describe the extension of bivariate regression to multiple regression. 6. report and interpret results of multiple regression in APA format. CN CO_LO CO_TX CO_NL
  • 2. CT CO_CRD suk85842_10_c10.indd 367 10/23/13 1:43 PM CHAPTER 10Section 10.1 Regression and Correlation Regression is a powerful analytical tool that involves using relationships between vari-ables to predict one from a measure of the other. Generally speaking, people make regression-like predictions often. The presence of clouds in the morning sky prompts us to take an umbrella to work, for example, or a phone call from unexpected guests prompts us to prepare extra food for a meal. This chapter follows the same thinking, except that the predictions are mathematical. Regression topics represent a bit of a puzzle. Sometimes readers shy away from regression topics, thinking that they are difficult to understand or at least difficult to master. How- ever, regression, as the reader will see, is neither difficult to understand nor to execute. Social scientists rely on regression in virtually every advanced statistical procedure. Most of those high-end statistical techniques—such as multivariate analysis of variance, dis- criminant function analysis, and structural equations modeling—are beyond the scope of an introductory text, but regression analysis is an essential part of the preparation for each
  • 3. of them. In the meantime, we will learn to use regression for its own purposes, to detect significant independent variables known as predictors on a dependent outcome variable known as the criterion. 10.1 Regression and Correlation In Chapter 9, we made the point that when two variables are correlated, it is because they both contain some of the same information. If intelligence and reading comprehension are correlated, it is because to some degree they both measure the same characteristic. The more highly they are correlated, the greater the quantity of whatever is measured that the two characteristics have in common. Recall that the coefficient of determination (rxy 2) indicates the proportion of one variable, the x variable, for example, that can be explained by the other, which is designated y. If intelligence (x) and reading comprehension (y) are correlated rxy 5 .8, then rxy 2 5 .64. The coefficient of determination tells us that 64% of whatever reading comprehension measures can be explained by differences in intelligence. Another way to say this is that the coefficient of determination indicates how much information two correlated variables have in common. If correlated variables share information, it seems logical that if we had information about
  • 4. the value of one of those variables, we could make a better- than-chance prediction of what the value of the other would be. • If age and height are correlated for teenagers and we know how old someone is, we should be able to make a better-than-chance prediction of how tall that per- son is, right? Or perhaps it can be turned around, and knowing how tall a teen is can predict that teen’s age. • If education and income are correlated and we know how many years of school- ing someone has had, we should be able to make an educated guess about that person’s level of income, right? • If soldiers’ exposure to combat situations is correlated with their manifestation of posttraumatic stress disorder (PTSD), can the severity of PTSD be predicted from the number of times a soldier has been in combat? H1 TX_DC BLF TX BL BLL
  • 5. suk85842_10_c10.indd 368 10/23/13 1:43 PM CHAPTER 10Section 10.1 Regression and Correlation Questions like these are at the heart of regression, and people have been asking such ques- tions for a long time. Karl Gauss, the man who defined the characteristics of the normal (Gaussian) distribution, began developing the mathematics behind regression in the early part of the 19th century. Many others have also made contributions to this form of quan- titative analysis. Collectively, their work has allowed experts in a variety of fields to use regression procedures in their decision making for many years. • Economists gather data on unemployment rates, wholesale inventories, and con- sumer spending in order to predict the rate at which the economy will grow. This approach is effective because each of those variables correlates with economic expansion. • Meteorologists use changes in barometric pressure to predict the kind of weather that is coming. Because drops in barometric pressure are predictors of violent storms in the Great Plains states and in the southeastern part of the United States, meteorologists watch particularly for dramatic drops in air pressure. • Odds makers use data on a team’s past performance, any
  • 6. injuries to key players, and the quality of the opponent to predict the outcomes in games. • Psychologists can use genetic and social factors, including the history of alcohol abuse in a family, to predict an individual’s predisposition to abuse drugs. • Human resource professionals can use current employee performance measures to predict the effectiveness of future employee hires. Each of these scenarios is possible because of correlations between variables. Correlations pave the way for prediction. The point is not that the people in these examples necessar- ily sit down with mathematical models to calculate the probability that certain results will emerge (which is something we will do in this chapter), but they could. In fact, a review of the professional literature provides ample evidence that scholars perform analyses like these frequently. They are important because prediction allows those who must act to be proactive. Rather than waiting for some important con- dition to emerge, its timing can be anticipated with some precision, and then an appropriate action can be taken. Prediction is the basis for sound decision making. The Language of Regression
  • 7. There are many types of regression procedures. However, even though we are interested in just one in this chapter, the concepts and much of the language used here are common to the different approaches. Because variables share information, one can be used to pre- dict the other. Those terms are common enough in statistics, but in regression discussions the danger is that the words independent and dependent make it very easy to begin thinking of the relationship between the antecedent variables and that dependent variable as causal. Although we also used this language with t-tests and ANOVA, the risk is greater in cor- relation discussions because the discussion begins with the assumption of a relationship between the variables. To avoid this slippery slope, we will make an adjustment in the regression discussion. Rather than the terms dependent variable and independent variable, as common as those terms are elsewhere, we will refer to the variable to be predicted in a regression procedure as the criterion variable and the variable used to make the A From the standpoint of regression, why does the size of a correlation coefficient matter? Try It!
  • 8. suk85842_10_c10.indd 369 10/23/13 1:43 PM CHAPTER 10Section 10.1 Regression and Correlation prediction as the predictor variable. This language is adopted to minimize the risk of confusing correlations with causal relationships. Note that the suggestion is not that there is not a causal relationship. There may be just such a connection. In fact, the relationship between exposure to combat and the develop- ment of posttraumatic stress probably is causal. The point is that the correlation alone— and correlation is the foundation for pursuing regression—is not sufficient by itself to establish causality. Although the terms criterion and predictor are used here for descriptive purposes, there needs to be some shorthand indicators as well. The symbols used in regression are the same as those used with the Pearson correlation in Chapter 9: x and y for the correlated variables. The symbols will be • x for the predictor variable and • y for the criterion variable. Which Is the Predictor? The confusion that can occur when equating correlation with causation is increased by recognizing that either variable in a significant correlation can
  • 9. be used to predict the other. If there is a correlation between the degree of posttraumatic stress disorder and the length of exposure to combat, it means that each variable is equally related to the other. Someone could • predict the degree of PTSD from the length of combat exposure or • turn it around and predict the length of combat exposure from the degree of PTSD. Either variable in a statistically significant correlation can predict the other. From the point of view of the mathematics involved, it does not matter, although practical considerations may dictate which variable must be the predictor and which must be the criterion. Some- times one of the variables will prove more elusive than the other because the data involved is more difficult to gather. In such cases, the difficulty involved may dictate that the more accessible variable becomes the predictor and that the less available variable be predicted rather than gathered. If reading comprehension scores are significantly correlated with intelligence scores and someone wishes to predict the value of one from the other, it makes sense that the reading scores would be the predictor variable. Reading scores are easier to collect than intelli- gence scores. Most major intelligence tests must be administered one on one by someone who has been trained with the instrument, which makes the process expensive and time-
  • 10. consuming. Reading tests are typically less demanding to administer. There are other considerations. Perhaps aptitude scores from a college admissions test that students take in high school are correlated with the grades that students earn dur- ing their first year of college study. From the standpoint of the correlation, scores can be predicted from grades quite as readily as grades can be predicted from scores. It does not make much sense to predict backwards. It will generally be the students’ future rather than their past performance that someone will be interested in knowing. suk85842_10_c10.indd 370 10/23/13 1:43 PM CHAPTER 10Section 10.1 Regression and Correlation Picturing Regression Scatterplots are an easy way to graphically represent the correlation between variables. In Chapter 9 the scatterplot was used to illustrate the correlation between verbal ability and intelligence. Recall that each point in the scatterplot represented one person’s scores on both of the measured characteristics. When we use the scatterplot for regression purposes, the vertical axis will indicate the level of the criterion variable, y, and the horizontal axis will indicate the level of the predictor variable, x:
  • 11. • x signifies the predictor variable in a scatterplot. • y is the criterion variable. As noted in Chapter 9, when two variables are highly correlated, there tends to be little scatter from left to right along the ranges of the two variables. For example, a researcher randomly selects a group of 20 first-year college students at the end of their first year of study at a large university and gathers from them two types of data: (a) the number of hours per week they typically study and (b) their recorded grade averages at the end of their first semester. The researcher wants to determine how well the number of hours the student studies per week will predict the student’s grades. This means that the hours studied is the predictor variable, x, and grade average is the criterion variable, y. The data follows: Hours Studied (x) Grade Average (y) 1 1 1.5 2 2 1.8 3 3 2.2 4 3 2.0 5 5 2.0 6 5 2.1 7 7 2.4
  • 12. 8 7 2.2 9 8 2.4 10 10 2.7 11 10 2.6 12 11 2.9 13 13 3.0 14 15 3.0 15 16 3.1 16 16 2.7 17 16 3.3 18 17 3.0 19 18 3.4 20 20 4.0 suk85842_10_c10.indd 371 10/23/13 1:43 PM CHAPTER 10Section 10.1 Regression and Correlation This data can be used to create the same type of graph completed in Chapter 9. If the data is placed in columns in an Excel spreadsheet just as they are here (the numbering down
  • 13. the left is automatic in Excel and not considered one of the data columns), the commands for creating the scatterplot are then simply Insert and then Scatter. With some additions to label the axes and title the graph, the result is Figure 10.1. To plot the data in this example, draw the vertical and horizontal axes of the graph, and then for each individual subject indicate the confluence of the individual’s grade (vertical axis) with the same individual’s study time (horizontal axis). Once all subjects are plotted, at least three conclusions can be drawn: 1. There is a correlation between the number of hours studied and the students’ first-semester grades. If this was not the case, the dots would have no particular pattern and would have just been randomly scattered. However, the dots array fairly consistently from lower left to upper right. 2. The correlation is substantial; there is not much scatter in the scatterplot, and the trend from lower left to upper right is easy to recognize. If the relationship had been negative (the more they studied, the poorer they performed), the data pattern would have been from the upper left to the lower right. In that case, as the value of one variable increased, the other would decrease. (This might be the case, for example, if this was a correlation of the number of hours students spend playing video games and their grades. The more they play, the less well they do
  • 14. in their classes.) 3. The relationship appears to be linear. There is not a point in the pattern where it appears that as one variable seems to increase, the other tends to level off or even diminish. Figure 10.1: A scatterplot for the relationship between study time and grades 0 2.0 2.5 1.0 1.5 0.5 3.0 3.5 4.0 4.5 0 5 10 15 20 25 Hours Studied Per Week
  • 15. F ir st T e rm G ra d e s o n a 4 .0 S ca le suk85842_10_c10.indd 372 10/23/13 1:43 PM CHAPTER 10Section 10.1 Regression and Correlation About Linearity The third point about the relationship between x and y being
  • 16. linear is particularly impor- tant. The type of regression discussed in this chapter is based on the assumption that the association between the predictor and criterion variables is linear. There are other regres- sion techniques that are used when the relationship between variables is not linear, but in the reasoning used in this chapter, there needs to be linearity. Because there is a linear connection, a straight line drawn through or as close as possible to as many of the data points as possible might look like the line in the graph in Figure 10.2A. Figure 10.2: Regression lines That line through the data points is called a regression line. If it is positioned so that it is as close as possible to as many of the 20 data points as it can be and still be a straight line, it can be used to determine any value of y from a specified value of x (or for that matter, any value of x from a specified value of y). For example, someone using the graph can select any value of x along the horizontal axis, go vertically from that x value up to the line, and then move left horizontally from the line to where the y-axis is. The value of the point at which the y-axis is encountered will be the value of y for the specified x value. See Figure 10.2B. 0 4 5
  • 17. 2 3 1 0 5 10 15 20 25 Hours Studied Per Week F ir st T e rm G ra d e s o n a 4 .0 S ca
  • 18. le A. A Regression Line through a Scatterplot 0 4 5 2 3 1 0 5 10 15 20 25 Hours Studied Per Week F ir st T e rm G ra d e s o
  • 19. n a 4 .0 S ca le B. Using the Regression Line to Determine y from x suk85842_10_c10.indd 373 10/23/13 1:43 PM CHAPTER 10Section 10.1 Regression and Correlation No one in the sample of 20 indicated a study time of 12.5 hours per week, but perhaps one of the researcher’s colleagues who knows what kind of analysis is under way asks, “I know someone who studies 12.5 hours every week. What is your best guess for this stu- dent’s grade average?” If the regression line is positioned accurately, the researcher can find 12.5 on the x-axis, travel vertically up to the regression line, and then move left to the y-axis to determine that 12.5 hours per week of study time predicts a grade point average of about 2.9. The researcher gathered data from only 20 students, which is a little risky, but they were randomly selected, and sampling theory tells us that a randomly
  • 20. selected sample will differ from the population of all freshman students only by chance. In spite of the small sample size, the researcher may be lucky and find that the sampling error is minimal. Coping with Less-Than-Perfect Correlations Besides the fact that the graph in Figure 10.2 does not provide very detailed markings (the researcher had to guess that 12.5 hours studied will produce a grade average of “about 2.9”), other factors affect prediction accuracy. The researcher cannot be exactly precise about the grades because the correlation between the two variables is not perfect. Although a grade average of 2.9 might be the best possible prediction given this data, it is quite likely that the prediction will not be exactly accurate. Maybe for a particular student who studies 12.5 hours per week, a GPA of 2.8 would turn out to be a better prediction, or per- haps 3.0. Without yet calculating the correlation, the evidence for rxy is the scatter in the data points. For example, note that three students reported studying the same amount, 16 hours per week, but all of them ended the semester with different grade averages. Of course, an imperfect correlation such as this one reflects the fact that besides probably including some error in the way the variables are measured, grades are affected by more than just study time. The researcher has not accounted
  • 21. for differences in academic ability or differences in the quality of the study time. Those problems aside, as long as the correlation between the predictor and criterion variables is statistically significant, the pre- dicted value of y from the value of x will be more accurate over time than a number of random predictions. Error in prediction is something we learn to tolerate. No one sues the United States Commerce Department if the prediction for job growth is off by one-tenth for the year. We do not petition the television station to fire the meteorologist when the forecast high for the day is wrong by a couple of degrees. Error is inevitable when predictions have to be based on imperfectly correlated variables, but we can at least have some measure of how extensive the error is likely to be. Later in the chapter, we will learn to calculate the amount of error and then include it with the predicted value in order to have a gauge of prediction accuracy. B What is the visual evidence in a scatterplot for a weak correlation? Try It! suk85842_10_c10.indd 374 10/23/13 1:43 PM
  • 22. CHAPTER 10Section 10.1 Regression and Correlation Understanding the Least-Squares Criterion A scatterplot is a helpful way to introduce the idea of regression, but relying on the scat- tered points in a graph in order to predict one variable from the other is not practical. It is a conceptual model for what will actually be done mathematically with a regression equa- tion as depicted in Figure 10.3. The equation is derived in a way that satisfies the require- ments for what is called the least-squares criterion. The least- squares criterion (think of criterion in this case as a requirement) is this: The regression line must be positioned so that the sum of all possible prediction errors (y 2 ŷ) has the lowest possible value. In other words, the distance (depicted as gray lines in the graph) from the data point to the regression line is as minimal as possible. These errors that are represented in the difference between y and ŷ are termed residual scores as shown in Figure 10.3. Figure 10.3: The least-squares criterion graph Source: Used by permission Alan B. Hale, Ph.D. Describing the Prediction Error Because of error in the regression solution, at least some of the predictions are going to be inaccurate. These errors emerge as differences between the
  • 23. criterion variable’s predicted value and what its actual value would be with all possible data and if there was no need to rely on the figure or on mathematics for the prediction. If x is the actual value of the predictor variable, and y is the actual value of the criterion variable, we can use y-intercept x1 x (independent) y1 y2 y2 y2 � y2 y (d e p e n d e n t) ^
  • 24. ^ Minimize: �(yi � yi)2 Least Squares Method n i = 1 ^ Line: y = a + bx suk85842_10_c10.indd 375 10/23/13 1:43 PM CHAPTER 10Section 10.2 Ordinary Least-Squares Regression With One Predictor ŷ (y-hat) to symbolize the predicted value of the criterion variable and y 2 ŷ as a way to indicate the prediction error or residual scores in a particular solution. Look at Figure 10.2 again. If the correlation between hours studied and end-of-term grades was perfect, the data points would form a straight line. With no scatter in the line, each value of x would result in just one corresponding value of y. Those three people who reported 16 hours per week of study time would all have had the same grades at the end
  • 25. of the semester. But with scatter in the data points, establishing the regression line is not a matter of just connecting the dots. Because the regression line is a straight line and it has to be posi- tioned so as to minimize the prediction errors, its positioning involves some compromise. It is a line of best fit, not a line of perfect fit. Using the Residual Scores If someone were to rely on Figure 10.2 to make a series of predictions and then go through university records to determine what the actual grade averages were for those 20 students after their first semester, any difference between what was predicted and what the grades actually were would be reflected in the residual scores. The residual scores would indicate the amount of prediction error. If all of those residual scores were added up, what would they total? This is how the ques- tion looks in symbols: a ( y 2 ŷ) 5 ? When a number of predictions are made, some of the residual scores will be positive (the actual value of y will be larger than the predicted value, ŷ), and some will be negative (y is smaller than ŷ). And if all the residual scores were summed, their total would be 0. The positive and negative residual scores would cancel each other out and sum to 0.
  • 26. The zero sum is not very helpful to someone who needs to know how much error there is in a regression solution. A better indicator of prediction error would be to square the residual scores, which would make all the negative residual scores positive, and then sum them. If the sum of those squared values has its lowest possible value, the solution meets the least-squares criterion. The regression equation used here was developed to satisfy that requirement. That is why this particular form of regression is called ordinary least- squares regression. 10.2 Ordinary Least-Squares Regression With One Predictor Theoretically, there can be any number of predictors, x1, x2, x3, . . ., xn in a regression problem rather than just the single x. Toward the end of the chapter, there is a descrip- tion of how multiple regression works. This form of multiple regression is just ordinary suk85842_10_c10.indd 376 10/23/13 1:43 PM CHAPTER 10Section 10.2 Ordinary Least-Squares Regression With One Predictor least-squares regression with multiple predictors. The focus here is on regression with just one predictor. It is sometimes called simple regression or bivariate regression because there are only two variables involved.
  • 27. The regression equation provides the math needed to position the regression line. So that the positioning of the regression line meets the least-squares criterion, there are two ques- tions to be answered: 1. At what point does the regression line cross the y-axis in the graph? 2. How much does the criterion variable (y) change when the predictor variable (x) increases by 1.0? The answer to the first question establishes the value of the intercept. The intercept indi- cates the value of y where the regression line intercepts the vertical axis. Put more suc- cinctly, the intercept is the value y when x 5 0. If you look at Figure 10.2 again, it appears that the regression line crosses the y-axis at about y 5 1.2. Following is an equation to calculate this value, which will indicate how close we were with the estimate. The answer to the second question requires a value that represents the slope of the line, or just the slope. Refer to Figure 10.2. It is possible to estimate from the way the line is positioned that if x increases by 1.0, it appears that y increases by about .2. The slope value reveals how radically the regression line inclines or declines. The Regression Equation The bivariate regression equation has this form:
  • 28. ŷ 5 a 1 bx 1 e Formula 10.1 Where ŷ 5 the predicted value of the criterion variable a 5 the intercept b 5 the slope of the regression line x 5 the value of the predictor variable e 5 prediction error The equation shows that for a given value of the predictor variable (x), the best prediction for ŷ is the value of the intercept plus the product of the slope times the predictor plus some component of error. Calculating the amount of error in the regression problem is a separate process rather than part of the initial calculation. The e indicates that there is always error in the prediction. Because the error will be calculated separately, we can take the e out of the equation, which makes the formula ŷ 5 a 1 bx suk85842_10_c10.indd 377 10/23/13 1:43 PM CHAPTER 10Section 10.2 Ordinary Least-Squares Regression With One Predictor Before calculating a regression solution, we need to know the
  • 29. intercept value, a, and the slope, b. They each have their own equations, but most of the terms involved are statistics that are already familiar. First, the formula for the intercept: a 5 My 2 bMx Formula 10.2 Where a 5 the intercept b 5 (this is the new part) the slope of the regression line My 5 the mean of the criterion variable, y Mx 5 the mean of the predictor variable, x The intercept value is 1. the mean of the criterion variable minus 2. the slope value (once that is determined) times the mean of the predictor variable. Because the intercept formula includes the slope of the regression line, or the regression coefficient, we need to start there. That formula is b 5 rxy a sy sx b Formula 10.3 Where b 5 the slope of the regression line
  • 30. rxy 5 the correlation coefficient for the two variables sy 5 the standard deviation of the criterion variable, y sx 5 the standard deviation of the predictor variable, x Calculating a Regression Solution Using the study-time and semester-grades data, calculate a regression solution. Remem- ber that the researcher estimated that someone who studied 12.5 hours per week would probably have a semester grade average of about 2.9. How accurate was that estimate? The process will be as follows: 1. Calculate the means and standard deviations for x and y. 2. Calculate the correlation of x and y. 3. Calculate the slope of the line, or the regression coefficient, b. 4. Calculate the regression intercept, a.
  • 31. 5. Calculate the value of ŷ. suk85842_10_c10.indd 378 10/23/13 1:43 PM CHAPTER 10Section 10.2 Ordinary Least-Squares Regression With One Predictor 1. For the two variables, verify that Hours Studied (x) Grade Average (y) Mean 10.15 2.615 Standard deviation 5.941 613 2. Using Formula 9.2, calculate the correlation as follows: rxy 5 n a xy 2 ( a x)( a y) Å e cn a x 2 2 ( a x)
  • 32. 2 d cn a y2 2 ( a y2 d f rxy 5 201596.32 2 12032 152.32 "532012,7312 2 12032 2 4 3 1201143.912 2 152.32 2 2 46 rxy 5 1,309.4 "3 113,4112 1142.912 4 5 .946 Checking this value against the critical value from Table 9.1, rxy .05(18) 5 .444, indi- cates that the correlation is statistically significant. To do the correlation on Excel, remember that having set up the data in the two columns, the commands are DataSData AnalysisSCorrelation. 3. The slope of the line is b 5 rxy a sy
  • 33. sx b 5 .946 a .613 5.941 b 5 .098 This value indicates that y increases .098 for every 1.0 increase in x. Earlier we guessed that the slope of the line in the graph in Figure 10.2 might be about .2, which turns out to be incorrect. 4. The regression line intercept is a 5 My 2 bMx 5 2.615 2 (.098)(10.15) 5 1.620 This value indicates that if x 5 0, then y 5 1.620. Based on the
  • 34. visual best fit that we made to the graph in Figure 10.2, we guessed that the intercept would be about y 5 1.2. That estimate was not very close either. suk85842_10_c10.indd 379 10/23/13 1:43 PM CHAPTER 10Section 10.2 Ordinary Least-Squares Regression With One Predictor 5. What grades would students likely earn if they study 12.5 hours per week? The estimate based on that visually placed regression line in Figure 10.2 was a grade average of about 2.9. Check this “guesstimate” by solving for ŷ and comparing the two solutions. ŷ 5 a 1 bx 5 1.620 1 (.098)(12.5) 5 2.845
  • 35. 6. Based on the data from the 20 students, the best prediction of the grade average someone who studies 12.5 hours per week will earn is 2.845. Interestingly, that value is not too far from the earlier prediction. Notice that the data from two correlated variables were helpful in predicting a value from one of the two variables that was unknown, from a known value of the other variable. It took a few descriptive statistics (the means, standard deviations, and the correlation coef- ficient) and the relatively straightforward equations for the slope, the intercept, and for the predicted value of y. Practicing the Regression