Regression
Making predictions using data
Limitations of correlations
Correlations measure the strength of the relationship between two variables within a population
There are two important limitations associated with correlations:
They cannot predict scores on one variable from knowledge of the other
They cannot measure relationships among more than two variables
Linear regression is a more flexible statistical technique that allows you to answer both types of questions
For example, with correlations alone:
Knowing how much bacon a person consumes does not let you predict their exact risk of heart disease
You cannot estimate how bacon consumption, exercise, and alcohol intake combine to predict heart disease
Linear regression
Unlike Pearson correlations, linear regressions formalize the relationship between the two variables using a line:
Y = bX + a
(You may recognize this from algebra as Y = mX + b; statistics writes the same line as Y = bX + a)
The components of this equation each have special meaning:
Y = value of the Y variable – also called the outcome variable
X = value of the X variable – also called the predictor variable
b = slope of the line – how changes in X produce changes in Y
a = intercept – the value of Y when X = 0
A regression line is an algorithm that maps scores on the predictor variable to scores on the outcome variable
Linear regression
But there are many possible lines that can capture the relationship between two variables
How do we determine the best line to represent a given set of data?
[Scatterplot of the example data: X = 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, 1.1, 1.3; Y = 3, 8, 6, 9, 3, 6, 11, 10]
Each potential regression equation has a certain amount of error
Error = the distance between the regression line and each datapoint – these distances are also called residuals
The line of best fit is the line that minimizes the (squared) residuals
No other line can produce a smaller total error
Line of best fit
To specify the equation for the line, Y = bX + a, we must estimate two values: the slope (b) and the intercept (a)
The derivations for these are complicated (matrix algebra), but the final forms of the equations are easy to use
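The slide refers to final-form equations without listing them in the extracted text; the standard least-squares estimates (consistent with the coefficients computed later in this deck) are:

```latex
b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^{2}} = \frac{SP}{SS_X},
\qquad
a = \bar{Y} - b\,\bar{X}
```

Here b is the sum of cross-products (SP) divided by the sum of squared deviations of X, and a then forces the line through the point (X̄, Ȳ).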
We can use the equation for the line of best fit to predict scores on the outcome variable for any value of the predictor variable
Predicted scores are represented with Ŷ
Ŷ = bX + a
Let's do an example!

Height (X)   Rated deepness of voice (Y)
48           1
60           2
72           5
78           7
M = 64.5     M = 3.75
Results: Predicting Deepness

Variable             Coefficient   t-value   p-value
Intercept/Constant   -9.186        -3.88     .061
Height               .201          5.54      .031
Intercept/Constant = predicted deepness if someone were literally zero height
Doesn't make sense here, but could in other cases
Height coefficient = ratings of deepness increase by .201 for every inch increase in height
The height coefficient IS significant (p = .031), so we can conclude there really is a relationship between height and deepness of voice
The intercept is NOT significant (p = .061), so we cannot conclude it's actually different from zero
BUT we still must include it to compute predicted values
How deep would a 5.5 ft (66 in) person be rated?
Ŷ = .201 * 66 – 9.186 = 4.08
How deep would a 7 ft (84 in) person be rated?
Ŷ = .201 * 84 – 9.186 = 7.698
How deep would a 6 ft (72 in) person be rated?
Ŷ = .201 * 72 – 9.186 = 5.286
Be cautious outside of the original range of the data (heights here ran from 48 to 78 inches, so the 84-inch prediction is an extrapolation)
Predictions won't exactly equal the raw data
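As a sketch, the slope and intercept above can be reproduced from the four example datapoints with a few lines of Python (the variable and function names are my own, not from the lecture):

```python
# Least-squares fit of deepness ratings on height, using the four
# datapoints from the example: (48, 1), (60, 2), (72, 5), (78, 7).
heights = [48, 60, 72, 78]
deepness = [1, 2, 5, 7]

n = len(heights)
mean_x = sum(heights) / n   # 64.5
mean_y = sum(deepness) / n  # 3.75

# b = SP / SS_X: sum of cross-products over sum of squared deviations of X.
sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, deepness))
ss_x = sum((x - mean_x) ** 2 for x in heights)
b = sp / ss_x             # ≈ .201
a = mean_y - b * mean_x   # ≈ -9.186

def predict(height):
    """Return the predicted deepness rating (Y-hat) for a height in inches."""
    return b * height + a

print(round(b, 3), round(a, 3))  # 0.201 -9.186
print(round(predict(66), 2))     # 4.05 (the slide's 4.08 uses the rounded coefficients)
```

Note the small discrepancy for the 66-inch prediction: plugging in the unrounded b and a gives 4.05, while the slide's 4.08 comes from using .201 and –9.186 directly.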
Multiple regression
Regression can also be used to evaluate the effect of multiple predictor variables on the outcome variable
This technique is called multiple regression
Multiple regressions are commonly used in two situations:
When you expect many predictor variables to play a significant role in predicting the outcome variable (e.g., age and experience)
When you have a group of predictor variables and want to decide which have the strongest relationships with the outcome variable (e.g., is sex or height the better predictor of voice pitch?)
Adding multiple predictors to the regression equation changes the interpretation of the regression coefficients
Just as partial correlations have a different interpretation than simple correlations
Multiple regression
Regression equations with multiple predictors must specify a different coefficient for each predictor:
Ŷ = b1X1 + b2X2 + a
X1, X2 = predictor variables 1 and 2
b1 = coefficient relating X1 and Y
b2 = coefficient relating X2 and Y
These coefficients can be used to estimate the value of the outcome associated with a set of scores on the predictors
For example, using the equation Ŷ = 5X1 + 10X2 − 5X3 + 2X4 + 100, what value of Y would be predicted from the following scores?
X1 = 5
X2 = 2
X3 = 7
X4 = 10
Ŷ = 5(5) + 10(2) − 5(7) + 2(10) + 100 = 130
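The plug-in arithmetic above can be sketched as a one-line Python function (the coefficients are the example's; the function name is my own):

```python
# Prediction from the example multiple-regression equation:
# Y-hat = 5*X1 + 10*X2 - 5*X3 + 2*X4 + 100
def predict_y(x1, x2, x3, x4):
    """Apply each coefficient to its predictor, then add the intercept."""
    return 5 * x1 + 10 * x2 - 5 * x3 + 2 * x4 + 100

print(predict_y(5, 2, 7, 10))  # 130
```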
Calculating these coefficients by hand is extremely tedious…
…so we won't bother doing it in this class
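In practice the coefficients come from statistical software. A minimal sketch of what that looks like, using NumPy's least-squares solver on made-up data (every name and number below is illustrative, not from the lecture):

```python
import numpy as np

# Made-up data: 5 observations, 2 predictors. The outcome is constructed
# to follow y = 2 + 3*x1 - 1*x2 exactly, so the solver should recover
# those coefficients.
x1 = np.array([5, 3, 6, 2, 7])
x2 = np.array([2, 8, 1, 9, 4])
y = 2 + 3 * x1 - 1 * x2

# Design matrix: a column of 1s (for the intercept a) plus one column
# per predictor.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Solve for [a, b1, b2] by least squares.
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(coefs, 3))  # ≈ [2, 3, -1]
```

With real (noisy) data the recovered coefficients would not match any generating equation exactly; they are simply the least-squares solution.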
Predicting deepness in multiple regression

Variable             Coefficient   t-value   p-value
Intercept/Constant   8.211         2.261     .152
Height               -.123         -1.905    .197
Sex (M=1/F=0)        7.00          5.085     .037

Intercept/Constant = predicted deepness if someone were literally zero height and is a woman
Again, doesn't make sense here, but could in other cases
Height coefficient = ratings of deepness decrease by .123 for every inch increase in height, once sex is controlled for. But it is NOT significant, so we CANNOT conclude height influences voice pitch. Again, we still include it.
Sex coefficient = men receive a 7-unit increase in their ratings compared to women, once height is controlled for. It IS significant, so we can conclude that sex influences pitch – this coefficient really is different from zero. BUT we must still include all coefficients to compute predicted values.
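To see how these coefficients combine, here is a small sketch applying them to a hypothetical case (the 72-inch inputs and the function name are my own, not from the lecture):

```python
# Fitted model from the table above:
# deepness = 8.211 - .123*height + 7.00*sex   (sex coded M=1, F=0)
def predict_deepness(height_in, sex):
    """Predicted deepness rating for a given height (inches) and sex code."""
    return 8.211 - 0.123 * height_in + 7.00 * sex

# Hypothetical: a 72-inch (6 ft) man vs. a 72-inch woman.
print(round(predict_deepness(72, 1), 3))  # 6.355
print(round(predict_deepness(72, 0), 3))  # -0.645 (exactly 7.00 lower)
```

The two predictions differ by exactly the sex coefficient, 7.00, because height is held constant – which is what "controlling for height" means here.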
General multiple regression equation, and the worked example from above:
Ŷ = b1X1 + b2X2 + b3X3 + … + bnXn + a
Ŷ = 5X1 + 10X2 − 5X3 + 2X4 + 100
Ŷ = 5(5) + 10(2) − 5(7) + 2(10) + 100 = 130