This document summarizes key concepts from Chapter 5 of Jamri AB on correlation and simple linear regression. It introduces correlation as a measure of the strength of the linear relationship between two variables. It discusses scatter diagrams, the coefficient of correlation (r), and Pearson's product-moment correlation coefficient and Spearman's rank correlation coefficient as methods to calculate r. It also covers the coefficient of determination (r^2), linear regression analysis to predict relationships, and calculating the regression equation coefficients a and b. Examples are provided to demonstrate calculating r and the regression equation from sets of data.
1. Jamri AB Chapter 5 : Correlation and Simple Linear Regression
CHAPTER 5: CORRELATION AND SIMPLE LINEAR REGRESSION
5.1 INTRODUCTION
In this chapter, we will learn about correlation, which is a measure of the strength of the linear
relationship between two variables only. For example:
Expenditure (y) and revenue (x)
Price (x) and sales (y)
Advertising (x) and sales (y)
Quantity/output (x) and cost of production (y)
5.2 THE SCATTER DIAGRAM
The purpose of the scatter diagram, as you know, is to illustrate diagrammatically any relationship
that may exist between the dependent and independent variables.
To the extent that it succeeds it can help the analyst in three ways:
It indicates generally whether or not there appears to be a relationship between the two
variables.
If there is a relationship, it may indicate whether it is linear or non-linear.
If the relationship is linear, the scatter diagram will show whether it is positive or negative.
5.3 COEFFICIENT OF CORRELATION
Apart from plotting scatter diagrams, it is often useful to have an actual measure of the amount of
correlation that exists between two given variables such as weight and height, turnover and profit,
age and salary and so on. This measure of correlation is called a coefficient of correlation and is
normally given the symbol “r”.
It can only lie between –1 and +1.
Two methods are commonly used to give reasonable approximations to r.
1. Pearson’s product-moment correlation coefficient
2. Spearman’s rank correlation coefficient.
5.3.1 PEARSON’S PRODUCT MOMENT CORRELATION COEFFICIENT
Pearson’s correlation coefficient tells us about two aspects of the relationship between two
variables. The sign of r (negative or positive) identifies the kind of relationship, and the magnitude
of r describes the strength of the relationship.
The formula is as follows.
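\[
r = \frac{\sum xy - \dfrac{(\sum x)(\sum y)}{n}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)\left(\sum y^2 - \dfrac{(\sum y)^2}{n}\right)}}
\]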
Example.
Suppose we record the height and weight of a random sample of six adults. It is reasonable to
assume that these variables are normally distributed, so the Pearson correlation coefficient is the
appropriate measure of the degree of association between height and weight.
Height (cm) Weight (kg)
170 57
175 64
176 70
178 76
183 71
185 82
Here one of our variables is the x variable, the other is the y variable, and n is the number of
individuals.
In correlation, it is an arbitrary decision as to which variable we call x and which we call y.
Solution
x      y     x²       y²      xy
170    57    28900    3249    9690
175    64    30625    4096    11200
176    70    30976    4900    12320
178    76    31684    5776    13528
183    71    33489    5041    12993
185    82    34225    6724    15170
Σx = 1067    Σy = 420    Σx² = 189899    Σy² = 29786    Σxy = 74901
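\[
r = \frac{\sum xy - \dfrac{(\sum x)(\sum y)}{n}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)\left(\sum y^2 - \dfrac{(\sum y)^2}{n}\right)}}
  = \frac{74901 - \dfrac{(1067)(420)}{6}}{\sqrt{\left(189899 - \dfrac{(1067)^2}{6}\right)\left(29786 - \dfrac{(420)^2}{6}\right)}}
\]
r = 0.874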
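The calculation above can be checked with a short script. The following is a minimal Python sketch, not part of the original notes; the function name pearson_r is a hypothetical helper, and it assumes the same sum-based formula for r, applied to the height and weight data.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from the sum-based formula (hypothetical helper)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = sum_xy - sum_x * sum_y / n
    denominator = math.sqrt(
        (sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n)
    )
    return numerator / denominator

heights = [170, 175, 176, 178, 183, 185]  # cm, from the example
weights = [57, 64, 70, 76, 71, 82]        # kg, from the example
r = pearson_r(heights, weights)
print(round(r, 3))  # 0.874
```

Swapping the two lists gives the same value, which matches the remark that the choice of x and y is arbitrary in correlation.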
5.3.2 COEFFICIENT OF DETERMINATION.
The coefficient of determination, r², is the ratio of the explained variation to the total variation.
The term r² is expressed as a percentage.
For example, if r = 0.9349, then r² = 0.8740, which means that 87.40% of the total variation in y
can be explained by the variation in x.
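As a small illustration, the step from r to r² can be checked numerically. This is a sketch only; the helper name percent_explained is an assumption made for this example.

```python
def percent_explained(r):
    """Return r squared, expressed as a percentage of the total variation in y."""
    return r ** 2 * 100

print(round(percent_explained(0.9349), 2))  # 87.4
print(round(percent_explained(0.874), 2))   # 76.39, for the height and weight example
```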
5.4 LINEAR REGRESSION ANALYSIS
The primary objective of regression analysis is to make predictions.
In simple linear regression, "linear" implies that the relationship between x and y is a
straight-line relationship. In this simple case, the equation which best fits the data can
be written in the form y = a + bx.
The formulas for ‘a’ and ‘b’ are stated at the end of this section and are applied in the example below.
Example
A local travel agency collected data on the number of bookings made and the total payment received
from organizing trips within Malaysia from January 2007 through June 2007.
Number of bookings made Total payment received (RM’00)
20 60
2 25
4 26
23 66
18 49
14 48
Find the regression equation.
Solution
x     y     x²     y²      xy
20    60    400    3600    1200
2     25    4      625     50
4     26    16     676     104
23    66    529    4356    1518
18    49    324    2401    882
14    48    196    2304    672
Σx = 81    Σy = 274    Σx² = 1469    Σy² = 13962    Σxy = 4426
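\[
b = \frac{\sum xy - \dfrac{(\sum x)(\sum y)}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}}
  = \frac{4426 - \dfrac{(81)(274)}{6}}{1469 - \dfrac{(81)^2}{6}}
  = 1.936
\]
\[
a = \frac{\sum y}{n} - b\,\frac{\sum x}{n}
  = \frac{274}{6} - 1.936\left(\frac{81}{6}\right)
  = 19.530
\]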
Thus, the regression line is y = 19.530 + 1.936x
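The same fit can be reproduced in code. The following is a minimal Python sketch under the assumption that the sum-based formulas for a and b are used; the function name fit_line and the value of 10 bookings used for the prediction are hypothetical and do not come from the original example.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + bx (hypothetical helper)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    a = sum_y / n - b * sum_x / n
    return a, b

bookings = [20, 2, 4, 23, 18, 14]     # x, from the example
payments = [60, 25, 26, 66, 49, 48]   # y, in RM'00, from the example
a, b = fit_line(bookings, payments)
print(f"y = {a:.3f} + {b:.3f}x")      # y = 19.530 + 1.936x

# Prediction for a hypothetical month with 10 bookings (not in the data).
predicted = a + b * 10
print(f"{predicted:.2f}")             # 38.89, i.e. about RM 3,889
```

Since the primary objective of regression is prediction, the last two lines show how the fitted line would be used once a and b are known.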
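In general, the formulas for b and a are:
\[
b = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2}
\quad\text{or}\quad
b = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sum X^2 - \dfrac{(\sum X)^2}{n}}
\]
\[
a = \bar{Y} - b\bar{X}
\quad\text{or}\quad
a = \frac{\sum y}{n} - b\,\frac{\sum x}{n}
\]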
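The two ways of writing b give the same value, which can be seen by multiplying the numerator and denominator of the second form by n. As a check, the sums from the travel agency example are substituted into the first form below.
\[
b = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sum X^2 - \dfrac{(\sum X)^2}{n}} \cdot \frac{n}{n}
  = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2}
  = \frac{6(4426) - (81)(274)}{6(1469) - (81)^2}
  = \frac{4362}{2253}
  = 1.936
\]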
5.5 SPEARMAN’S RANK CORRELATION COEFFICIENT
Spearman’s rank correlation coefficient can also be used on quantitative data, but the values of
each variable must first be ranked. The value of rs is calculated from these rankings.
\[
r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}
\]
Example
The manufacturer of Physio exercise equipment wants to study the relationship between the number
of months since the exercise equipment was purchased and the number of hours the equipment was
used. A random sample of 7 persons who purchased the equipment yielded the following:
Person A B C D E F G
Months purchased 2 6 9 7 8 4 5
Hours exercised 10 8 5 5 3 8 5
Compute Spearman’s rank correlation and interpret its value.
Solution
Months purchased (x)   Hours exercised (y)   Rx   Ry    d      d²
2                      10                    1    7     -6.0   36
6                      8                     4    5.5   -1.5   2.25
9                      5                     7    3     4      16
7                      5                     5    3     2      4
8                      3                     6    1     5      25
4                      8                     2    5.5   -3.5   12.25
5                      5                     3    3     0      0
Σd = 0    Σd² = 95.5
\[
r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(95.5)}{7(49 - 1)} = -0.705
\]
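The ranking step and the final calculation can be sketched in Python as follows. This is an illustrative sketch, not part of the original notes: the helper names average_ranks and spearman_rs are hypothetical, and tied values are assumed to share the mean of the rank positions they occupy, as in the table above.

```python
def average_ranks(values):
    """Rank from smallest (rank 1) upward; tied values share their mean rank."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1            # lowest rank position held by v
        count = ordered.count(v)                # number of values tied with v
        ranks.append(first + (count - 1) / 2)   # mean of the tied positions
    return ranks

def spearman_rs(xs, ys):
    """Spearman's rs from the 1 - 6*sum(d^2) / (n(n^2 - 1)) formula."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((p - q) ** 2 for p, q in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

months = [2, 6, 9, 7, 8, 4, 5]    # months purchased, persons A to G
hours = [10, 8, 5, 5, 3, 8, 5]    # hours exercised, persons A to G
print(average_ranks(hours))                  # [7.0, 5.5, 3.0, 3.0, 1.0, 5.5, 3.0]
print(round(spearman_rs(months, hours), 3))  # -0.705
```

A value near -0.7 suggests a fairly strong negative association: the longer ago the equipment was purchased, the fewer hours it tends to be used. With tied ranks this formula is an approximation, so a routine that computes Pearson's r on the ranks may return a slightly different value.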