2. Correlation and Regression
Correlation describes the strength of a
linear relationship between two variables.
Regression tells us how to draw the straight
line described by the correlation.
3. Correlation and Regression
• For example:
A sociologist may be interested in the relationship
between education and self-esteem, or between family
income and the number of children in a family.
Independent Variables: Education, Family Income
Dependent Variables: Self-Esteem, Number of Children
4. Correlation and Regression
• For example:
• May expect: As education increases, self-esteem
increases (positive relationship).
• May expect: As family income increases, the number
of children in families declines (negative relationship).
Independent Variables → Dependent Variables
Education → Self-Esteem (+)
Family Income → Number of Children (-)
6. Correlation
• Correlation is a statistical technique used to
determine the degree to which two variables
are related.
• A correlation is a relationship between two
variables. The data can be represented by the
ordered pairs (x, y) where x is the independent
(or explanatory) variable, and y is the
dependent (or response) variable.
7. Correlation
A scatter plot can be used to determine
whether a linear (straight line) correlation
exists between two variables.
Example:
x    1    2    3    4    5
y   -4   -2   -1    0    2
[Figure: scatter plot of the (x, y) pairs in the table]
8. Linear Correlation
[Figure: four scatter plots of y against x]
Negative Linear Correlation: As x increases, y tends to decrease.
No Correlation
Positive Linear Correlation: As x increases, y tends to increase.
Nonlinear Correlation
10. The sign of r denotes the nature of the
association, while the value of r denotes the
strength of the association.
11. If the sign is positive, the relation is
direct (an increase in one variable is
associated with an increase in the other
variable, and a decrease in one variable is
associated with a decrease in the other variable).
If the sign is negative, the relation is
inverse or indirect (an increase in one
variable is associated with a decrease in the other).
12. The value of r ranges between -1 and +1.
The value of r denotes the strength of the
association, as illustrated by the following diagram.
 -1        -0.75           -0.25     0     0.25           0.75        1
  |  strong  |  intermediate  |  weak  |  weak  |  intermediate  |  strong  |
r = -1: perfect correlation; r = 0: no relation; r = 1: perfect correlation
Negative side: indirect. Positive side: direct.
13. If r = 0, there is no association or
correlation between the two variables.
If 0 < r < 0.25: weak correlation.
If 0.25 ≤ r < 0.75: intermediate correlation.
If 0.75 ≤ r < 1: strong correlation.
If r = 1: perfect correlation.
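The cut-offs listed here can be sketched as a small lookup. The function name `classify_strength` is hypothetical, and applying the same cut-offs to the magnitude of a negative r is an assumption based on the symmetric scale shown in the diagram.

```python
def classify_strength(r):
    """Label the strength of a correlation coefficient r.

    Assumption: the slide's cut-offs for positive r are applied to |r|,
    so negative (indirect) correlations get the same strength labels.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size == 0:
        return "no correlation"
    if size < 0.25:
        return "weak"
    if size < 0.75:
        return "intermediate"
    if size < 1:
        return "strong"
    return "perfect"


# The sign gives the direction, the label gives the strength.
for r in (0.886, -0.3, 0.1):
    direction = "direct" if r > 0 else "indirect"
    print(r, classify_strength(r), direction)
```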
14. Linear Correlation
[Figure: four scatter plots of y against x]
Strong negative correlation: r = -0.91
Weak positive correlation: r = 0.42
Strong positive correlation: r = 0.88
Nonlinear Correlation: r = 0.07
16. Calculating a Correlation Coefficient
In Words                                         In Symbols
4. Square each x-value and find the sum.         $\sum x^2$
5. Square each y-value and find the sum.         $\sum y^2$
6. Use these five sums to calculate the          $r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$
   correlation coefficient.
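The steps can be sketched in code as follows. This is a minimal illustration that assumes the standard Pearson product-moment formula built from the five sums (Σx, Σy, Σxy, Σx², Σy²); the function names and the sample data are hypothetical and not taken from the slides.

```python
from math import sqrt


def five_sums(xs, ys):
    """Return n and the five sums: sum x, sum y, sum xy, sum x^2, sum y^2."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    return n, sum_x, sum_y, sum_xy, sum_x2, sum_y2


def pearson_r(xs, ys):
    """Combine the five sums into the correlation coefficient r."""
    n, sum_x, sum_y, sum_xy, sum_x2, sum_y2 = five_sums(xs, ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


# Hypothetical data lying exactly on the line y = 2x, so r should be close to 1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

If all x-values (or all y-values) are identical, the denominator is zero and r is undefined; this sketch does not guard against that case.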
17. Correlation Coefficient
Example:
Calculate the correlation coefficient r for the following
data.
  x     y    xy   x^2   y^2
  1    -3    -3     1     9
  2    -1    -2     4     1
  3     0     0     9     0
  4     1     4    16     1
  5     2    10    25     4
$\sum x = 15$, $\sum y = -1$, $\sum xy = 9$, $\sum x^2 = 55$, $\sum y^2 = 15$
19. Correlation Coefficient
Example:
The following data represent the number of hours 12
different students watched television during the
weekend and the score of each student on a test taken
the following Monday.
Hours, x         0   1   2   3   3   5   5   5   6   7   7  10
Test score, y   96  85  82  74  95  68  76  84  58  65  75  50
a) Display the scatter plot.
b) Calculate the correlation coefficient r.
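Part b) can be checked with a short sketch. It uses the deviation-from-the-mean form of the Pearson coefficient, which is algebraically equivalent to the five-sums form; the variable and function names are illustrative, and the scatter plot of part a) is not drawn here.

```python
from math import sqrt

# Data copied from the table: hours of TV and test scores for 12 students.
hours = [0, 1, 2, 3, 3, 5, 5, 5, 6, 7, 7, 10]
scores = [96, 85, 82, 74, 95, 68, 76, 84, 58, 65, 75, 50]


def pearson_r(xs, ys):
    """Pearson r computed from deviations about the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


r = pearson_r(hours, scores)
print(round(r, 3))  # -0.831
```

The negative sign indicates an inverse relationship: more hours of television are associated with lower test scores.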
23. Example:
A sample of 6 children was selected, and their age in
years and weight in kilograms were recorded as shown
in the following table. It is required to find
the correlation between age and weight.
Serial no.   Age (years)   Weight (kg)
1            7             12
2            6             8
3            8             12
4            5             10
5            6             11
6            9             13
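As a worked check, the sums computed from the table are n = 6, Σx = 41, Σy = 66, Σxy = 461, Σx² = 291 and Σy² = 742, with x as age and y as weight. Assuming the standard Pearson formula:

```latex
\begin{aligned}
r &= \frac{6(461) - (41)(66)}{\sqrt{6(291) - 41^2}\,\sqrt{6(742) - 66^2}} \\
  &= \frac{2766 - 2706}{\sqrt{65}\,\sqrt{96}}
   = \frac{60}{\sqrt{6240}} \approx 0.76
\end{aligned}
```

This indicates a strong direct correlation between age and weight in this sample.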
29. Example
[Figure: scatter plot of Tree Height, y, against Trunk Diameter, x]
• r = 0.886 → relatively
strong positive linear
association between x
and y
32. Regression Analyses
• Regression technique is concerned with
predicting some variables by knowing others
• The process of predicting variable Y using
variable X
33. Types of Regression Models
[Figure: four scatter plots]
Positive Linear Relationship
Negative Linear Relationship
Relationship NOT Linear
No Relationship
34. Regression
Uses a variable (x) to predict some outcome
variable (y)
Tells you how values in y change as a
function of changes in values of x
35. Regression minimizes residuals
The regression line makes the sum of the squares of the
residuals smaller than for any other line.
[Figure: scatter plot of SBP (mmHg) against Wt (kg)]
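The claim can be illustrated numerically. The sketch below fits a least squares line and compares its sum of squared residuals with that of a few nudged lines. It uses the age and weight data from the children example elsewhere in these slides (the SBP and weight values behind the figure are not given), and the function names are hypothetical.

```python
# Age (years) and weight (kg) for the sample of 6 children.
ages = [7, 6, 8, 5, 6, 9]
weights = [12, 8, 12, 10, 11, 13]


def least_squares(xs, ys):
    """Return intercept a and slope b of the least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    return a, b


def sum_squared_residuals(a, b, xs, ys):
    """Sum of squared vertical distances between the points and y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


a, b = least_squares(ages, weights)
best = sum_squared_residuals(a, b, ages, weights)

# Nudge the intercept and slope; each nudged line should fit worse.
nudges = [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.1), (0.0, -0.1), (0.5, -0.1)]
others = [sum_squared_residuals(a + da, b + db, ages, weights) for da, db in nudges]

for (da, db), ssr in zip(nudges, others):
    print(f"a={a + da:.2f}, b={b + db:.2f}: SSR={ssr:.3f} (least squares: {best:.3f})")
```

A handful of nudged lines is only a spot check, not a proof; the general result follows from minimizing the sum of squares with respect to a and b.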
36. By using the least squares method (a procedure that
minimizes the vertical deviations of plotted points
surrounding a straight line) we are able to construct a
best-fitting straight line to the scatter diagram points
and then formulate a regression equation in the form of:
$\hat{y} = a + bX$
$\hat{y} = \bar{y} + b(x - \bar{x})$
$b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$
The regression equation describes the regression line
mathematically by showing the intercept and slope.
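The two forms of the regression equation, ŷ = a + bX and ŷ = ȳ + b(x − x̄), describe the same line. Expanding the second form shows how the intercept follows from the slope and the two means (a short derivation using the same symbols):

```latex
\begin{aligned}
\hat{y} &= \bar{y} + b\,(x - \bar{x}) \\
        &= (\bar{y} - b\,\bar{x}) + b\,x \\
\Rightarrow\quad a &= \bar{y} - b\,\bar{x}
\end{aligned}
```

So once b is known, the intercept is obtained from the means of x and y, and the line always passes through the point (x̄, ȳ).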
37. Correlation and Regression
• The statistics equation for a line:
$\hat{Y} = a + bX$
Where:
$\hat{Y}$ = the line’s position on the vertical axis at any point
(estimated value of the dependent variable)
X = the line’s position on the
horizontal axis at any point (value of
the independent variable for which you
want an estimate of Y)
b = the slope of the line
(called the coefficient)
a = the intercept with the Y axis,
where X equals zero
39. Exercise
A sample of 6 persons was selected; their ages
(x variable) and weights (y variable) are
shown in the following table. Find the
regression equation and the predicted
weight when age is 8.5 years.
Serial no.   Age (x)   Weight (y)
1            7         12
2            6         8
3            8         12
4            5         10
5            6         11
6            9         13
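One way to work this exercise is sketched below. It assumes the least squares slope b = (Σxy − ΣxΣy/n) / (Σx² − (Σx)²/n) and intercept a = ȳ − b·x̄; the function name `fit_line` is hypothetical, and the weight unit (kg) is taken from the earlier children example with the same data.

```python
# Age (x) and weight (y) from the exercise table.
ages = [7, 6, 8, 5, 6, 9]
weights = [12, 8, 12, 10, 11, 13]


def fit_line(xs, ys):
    """Return (a, b) for the least squares line y_hat = a + b*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    a = sum_y / n - b * sum_x / n  # a = mean(y) - b * mean(x)
    return a, b


a, b = fit_line(ages, weights)
predicted = a + b * 8.5

print(f"y_hat = {a:.3f} + {b:.3f} x")  # y_hat = 4.692 + 0.923 x
print(f"predicted weight at 8.5 years: {predicted:.2f} kg")  # 12.54 kg
```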
43. We create a regression line by plotting two estimated
values of y against their x values, then extending
the line right and left.
44. Regression Line
Example:
a) Find the equation of the regression line.
b) Use the equation to find the expected value of y
when the value of x is 2.3.
  x     y    xy   x^2   y^2
  1    -3    -3     1     9
  2    -1    -2     4     1
  3     0     0     9     0
  4     1     4    16     1
  5     2    10    25     4
$\sum x = 15$, $\sum y = -1$, $\sum xy = 9$, $\sum x^2 = 55$, $\sum y^2 = 15$
46. Regression Line
Example:
The following data represent the number of hours 12
different students watched television during the
weekend and the score of each student on a
test taken the following Monday.
a) Find the equation of the regression line.
b) Use the equation to find the expected test score
for a student who watches 9 hours of TV.
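A worked solution for the television data given earlier in these slides (hours x: 0, 1, 2, 3, 3, 5, 5, 5, 6, 7, 7, 10; scores y: 96, 85, 82, 74, 95, 68, 76, 84, 58, 65, 75, 50). The sums computed from those values are n = 12, Σx = 54, Σy = 908, Σxy = 3724 and Σx² = 332; the least squares slope and intercept formulas are assumed:

```latex
\begin{aligned}
b &= \frac{3724 - \frac{(54)(908)}{12}}{332 - \frac{54^2}{12}}
   = \frac{3724 - 4086}{332 - 243}
   = \frac{-362}{89} \approx -4.067 \\
a &= \bar{y} - b\,\bar{x} \approx 75.67 + 4.067(4.5) \approx 93.97 \\
\hat{y} &\approx 93.97 - 4.067x \\
\hat{y}(9) &\approx 93.97 - 4.067(9) \approx 57.4
\end{aligned}
```

So a student who watches 9 hours of television would be expected to score about 57 on the test, under this linear model.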
48. Exercise
• Find the correlation between age and blood
pressure using the simple (Pearson's) and Spearman's
correlation coefficients, and comment.
• Find the regression equation.
• What is the predicted blood pressure for a
man aged 25 years?