1. CORRELATION
Correlation is a statistical measure of
the relationship between two variables
such that a change in one variable is
accompanied by a change in the other; such
variables are said to be correlated.
Thus correlation analysis is a
mathematical tool used to measure the
degree to which one variable is linearly
related to another.
2. DIRECT OR POSITIVE CORRELATION
If an increase (or decrease) in one
variable results in a corresponding
increase (or decrease) in the other, the
correlation is said to be direct or
positive.
INVERSE OR NEGATIVE CORRELATION
If an increase (or decrease) in one
variable results in a corresponding
decrease (or increase) in the other, the
correlation is said to be inverse or
negative.
3. For example, the correlation
between income and expenditure
is positive, and the correlation
between the volume and pressure
of a perfect gas is negative.
4. LINEAR CORRELATION
A relation in which changes in the
values of two variables bear a
constant ratio is called linear
correlation (or perfect correlation).
NON LINEAR CORRELATION
A relation in which changes in the
values of two variables do not bear
a constant ratio is called non
linear correlation.
5. Karl Pearson’s Coefficient of
Correlation-
Correlation coefficient between two
variables x and y is denoted by r(x,y)
and it is a numerical measure of linear
relationship between them.
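$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{n \, \sigma_x \sigma_y}$
where r = correlation coefficient
between x and y
$\bar{x}$, $\bar{y}$ = means of x and y
$\sigma_x$ = standard deviation of x
$\sigma_y$ = standard deviation of y
n = no. of observations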
6. Properties of coefficient of
correlation-
(i) It measures the degree of correlation between the two variables.
(ii) The value of r(x,y) lies between -1 and 1.
(iii) If r = 1, then the correlation is perfect
positive.
(iv) If r = -1, then the correlation is perfect
negative.
(v) If r = 0, then the variables are uncorrelated,
i.e. there is no linear correlation. (Independent
variables have r = 0, but r = 0 does not by itself
imply independence.)
7. (vi) The correlation coefficient is
independent of change of origin and
scale.
If X and Y are random variables and
a, b, c, d are any numbers with
a ≠ 0, c ≠ 0 and a, c of the same sign, then
r(aX+b, cY+d) = r(X,Y)
(If a and c have opposite signs, only the sign of r changes.)
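The invariance property can be checked numerically. The following is a minimal sketch; the helper `pearson_r` and the small data set are illustrative assumptions and do not come from the slides. It also shows that the sign of r flips when the two scale factors a and c have opposite signs.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Illustrative data only (an assumption, not taken from the slides).
xs = [1.0, 2.0, 4.0, 7.0, 9.0]
ys = [2.0, 3.0, 3.5, 8.0, 9.5]

r = pearson_r(xs, ys)

# Same-sign scale factors (a = 3, c = 0.5): r is unchanged.
r_same = pearson_r([3 * x + 10 for x in xs], [0.5 * y - 4 for y in ys])

# Opposite-sign scale factors (a = -3, c = 0.5): only the sign of r flips.
r_flip = pearson_r([-3 * x + 10 for x in xs], [0.5 * y - 4 for y in ys])

print(math.isclose(r, r_same), math.isclose(r, -r_flip))
```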
8. Example:- Calculate the correlation
coefficient of the following heights (in inches)
of fathers (X) and their sons (Y):
X : 65 66 67 67 68 69 70 72
Y : 67 68 65 68 72 72 69 71
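12. Taking U = X - 68 and V = Y - 69 (deviations from the means),
$\bar{U} = \frac{1}{n}\sum U = 0$
$\bar{V} = \frac{1}{n}\sum V = 0$
$r(U,V) = \frac{\sum UV}{\sqrt{\sum U^2 \, \sum V^2}}$
On putting in the values ($\sum UV = 24$, $\sum U^2 = 36$, $\sum V^2 = 44$) we get
r(U,V) = 0.603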
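The arithmetic of this example can be reproduced with a short script. This is a sketch that takes deviations from the means of the father and son heights given above; the variable names are illustrative.

```python
import math

fathers = [65, 66, 67, 67, 68, 69, 70, 72]  # X, heights in inches
sons = [67, 68, 65, 68, 72, 72, 69, 71]     # Y, heights in inches

n = len(fathers)
mean_x = sum(fathers) / n  # 68.0
mean_y = sum(sons) / n     # 69.0

# Deviations from the means; each list sums to zero.
u = [x - mean_x for x in fathers]
v = [y - mean_y for y in sons]

sum_uv = sum(a * b for a, b in zip(u, v))
sum_uu = sum(a * a for a in u)
sum_vv = sum(b * b for b in v)

r = sum_uv / math.sqrt(sum_uu * sum_vv)
print(round(r, 3))  # 0.603
```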
13. RANK CORRELATION-
Let $(x_i, y_i)$, $i = 1, 2, \ldots, n$, be the ranks of n
individuals in a group for the characteristics A
and B respectively.
The coefficient of correlation between the ranks is
called the rank correlation coefficient between the
characteristics A and B for that group of
individuals.
$r = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$
where $d_i = x_i - y_i$ denotes the difference in ranks of the ith
individual.
14. EXAMPLE-
Compute the rank correlation co-efficient for the following
data-
Person          : A  B  C  D  E  F  G  H  I  J
Rank in Maths   : 9 10  6  5  7  2  4  8  1  3
Rank in Physics : 1  2  3  4  5  6  7  8  9 10
15.
Person   R1   R2   d = R1 - R2   d^2
A         9    1        8         64
B        10    2        8         64
C         6    3        3          9
D         5    4        1          1
E         7    5        2          4
F         2    6       -4         16
G         4    7       -3          9
H         8    8        0          0
I         1    9       -8         64
J         3   10       -7         49
TOTAL                            280
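The table total can be substituted into the formula $r = 1 - 6\sum d^2 / (n(n^2 - 1))$. A minimal sketch of that step, using the ranks from the table (variable names are illustrative):

```python
maths = [9, 10, 6, 5, 7, 2, 4, 8, 1, 3]     # R1, persons A..J
physics = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # R2, persons A..J

n = len(maths)

# Squared rank differences, one per person.
d_squared = [(r1 - r2) ** 2 for r1, r2 in zip(maths, physics)]
sum_d2 = sum(d_squared)

# Spearman's formula for ranks without ties.
r = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(sum_d2, round(r, 3))  # 280 -0.697
```

The negative value indicates that a high rank in Maths tends to go with a low rank in Physics for this group.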
17. Repeated Ranks
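$r = 1 - \frac{6\left[\sum d^2 + \frac{1}{12} m_1(m_1^2 - 1) + \frac{1}{12} m_2(m_2^2 - 1) + \ldots + \frac{1}{12} m_k(m_k^2 - 1)\right]}{n(n^2 - 1)}$
where $m_1, m_2, \ldots, m_k$ are the numbers of times the repeated values occur,
one term for each group of tied observations in either series.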
Example: Obtain the rank correlation coefficient for the following data:
X 68 64 75 50 64 80 75 40 55 64
Y 62 58 68 45 81 60 68 48 50 70
18.
X             68   64   75   50   64   80   75   40   55   64
Y             62   58   68   45   81   60   68   48   50   70
Ranks in X     4    6  2.5    9    6    1  2.5   10    8    6
Ranks in Y     5    7  3.5   10    1    6  3.5    9    8    2
d = x - y     -1   -1   -1   -1    5   -5   -1    1    0    4   Total: 0
d^2            1    1    1    1   25   25    1    1    0   16   Total: 72
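Here $m_1 = 2$ (75 occurs twice in X), $m_2 = 3$ (64 occurs three times in X) and $m_3 = 2$ (68 occurs twice in Y).
$r = 1 - \frac{6\left[\sum d^2 + \frac{1}{12} m_1(m_1^2 - 1) + \frac{1}{12} m_2(m_2^2 - 1) + \frac{1}{12} m_3(m_3^2 - 1)\right]}{n(n^2 - 1)}$
$r = 1 - \frac{6\left[72 + \frac{1}{12} \cdot 2(2^2 - 1) + \frac{1}{12} \cdot 3(3^2 - 1) + \frac{1}{12} \cdot 2(2^2 - 1)\right]}{10(10^2 - 1)}$
$r = 1 - \frac{6 \times 75}{990} = \frac{6}{11} = 0.545$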
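The repeated-ranks calculation can be sketched in code. The helpers `average_ranks` and `tie_correction` below are hypothetical names introduced for illustration; they assign the mean rank to tied values (largest value = rank 1) and add $m(m^2 - 1)/12$ for each group of m tied values, as in the formula for repeated ranks.

```python
from collections import Counter

def average_ranks(values):
    """Rank in descending order (largest = 1); tied values share the mean rank."""
    ordered = sorted(values, reverse=True)
    first_pos = {}
    for pos, val in enumerate(ordered, start=1):
        first_pos.setdefault(val, pos)
    counts = Counter(values)
    # A value occupying positions p .. p+m-1 receives the mean rank p + (m-1)/2.
    return [first_pos[v] + (counts[v] - 1) / 2 for v in values]

def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over every group of m tied values."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)

x = [68, 64, 75, 50, 64, 80, 75, 40, 55, 64]
y = [62, 58, 68, 45, 81, 60, 68, 48, 50, 70]

rx, ry = average_ranks(x), average_ranks(y)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
correction = tie_correction(x) + tie_correction(y)

n = len(x)
r = 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))
print(sum_d2, correction, round(r, 3))  # 72.0 3.0 0.545
```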
19. Regression Analysis
The term regression means some
sort of functional relationship
between two or more variables.
Regression measures the nature
and extent of correlation.
Regression is the estimation or
prediction of unknown values of one
variable from known values of
another variable.
20. CURVE OF REGRESSION AND
REGRESSION EQUATION
If two variates x and y are correlated, then
the scatter diagram will be more or less
concentrated round a curve. This curve is
called the curve of regression.
The mathematical equation of the
regression curve is called regression
equation.
21. LINEAR REGRESSION
When the points of the scatter
diagram concentrate round a
straight line, the regression is called
linear and this straight line is known
as the line of regression.
22. LINES OF REGRESSION
In the case of n pairs (x, y), either x or y may be
taken as the independent variable, and either of
the two may be estimated for given values of the
other. Thus if we want to estimate y for given
values of x, we have the regression equation of
the form y = a + bx, called the regression line of
y on x. And if we wish to estimate x from given
values of y, we have the regression line of the
form x = A + By, called the regression line of
x on y.
Thus in general, we always have two lines of
regression.
26. where $b_{xy}$ is the regression coefficient of x on y:
$b_{xy} = r\frac{\sigma_x}{\sigma_y} = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - \left(\sum y\right)^2}$
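27. Theorem :- The correlation coefficient is the geometric mean of the
regression coefficients.
The coefficients of regression are
$b_{yx} = r\frac{\sigma_y}{\sigma_x}$ and $b_{xy} = r\frac{\sigma_x}{\sigma_y}$
Then geometric mean $= \sqrt{b_{yx} \, b_{xy}} = \sqrt{r\frac{\sigma_y}{\sigma_x} \cdot r\frac{\sigma_x}{\sigma_y}} = r$
= coefficient of correlation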
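The theorem can be checked numerically. The sketch below reuses the fathers-and-sons heights from the earlier correlation example and computes the regression coefficients from sums of deviations, which is equivalent to $r\sigma_y/\sigma_x$ and $r\sigma_x/\sigma_y$. Note that the square root returns the magnitude of r; r itself takes the common sign of the two regression coefficients.

```python
import math

fathers = [65, 66, 67, 67, 68, 69, 70, 72]  # x
sons = [67, 68, 65, 68, 72, 72, 69, 71]     # y

n = len(fathers)
mean_x = sum(fathers) / n
mean_y = sum(sons) / n

sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(fathers, sons))
sxx = sum((x - mean_x) ** 2 for x in fathers)
syy = sum((y - mean_y) ** 2 for y in sons)

b_yx = sxy / sxx  # regression coefficient of y on x
b_xy = sxy / syy  # regression coefficient of x on y
r = sxy / math.sqrt(sxx * syy)

geometric_mean = math.sqrt(b_yx * b_xy)
print(round(geometric_mean, 3), round(r, 3))  # 0.603 0.603
```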
28. EXAMPLE-
Find the line of regression of y on x for the data given below:
X:  1.53   1.78   2.60   2.95   3.42
Y: 33.50  36.30  40.00  45.80  53.50
29. Solution:
x        y        x^2        xy
1.53    33.50     2.3409     51.255
1.78    36.30     3.1684     64.614
2.60    40.00     6.7600    104.000
2.95    45.80     8.7025    135.110
3.42    53.50    11.6964    182.970
$\sum x = 12.28$, $\sum y = 209.1$, $\sum x^2 = 32.67$, $\sum xy = 537.95$
30. Here n = 5
$b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = 9.726$
Then the line of regression of y on x is
$y - \bar{y} = b_{yx}(x - \bar{x})$
i.e. y = 17.932 + 9.726x,
which is the required line of regression of y on x.
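The slope and intercept can be recomputed from the raw data as a check. This sketch uses the x values as they appear in the solution table and keeps full precision throughout, so the last digit may differ slightly from the rounded figures on the slide.

```python
xs = [1.53, 1.78, 2.60, 2.95, 3.42]
ys = [33.50, 36.30, 40.00, 45.80, 53.50]

n = len(xs)
sum_x = sum(xs)
sum_y = sum(ys)
sum_x2 = sum(x * x for x in xs)
sum_xy = sum(x * y for x, y in zip(xs, ys))

# Regression coefficient of y on x.
b_yx = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# The line passes through (mean_x, mean_y), which fixes the intercept.
intercept = sum_y / n - b_yx * sum_x / n

print(f"y = {intercept:.2f} + {b_yx:.2f} x")
```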
31. Question:
For 10 observations on price (x) and
supply (y), the following data were
obtained:
$\sum x = 130$, $\sum y = 220$, $\sum x^2 = 2288$, $\sum y^2 = 5506$, $\sum xy = 3467$
Obtain the two lines of regression
and estimate the supply when the price
is 16 units.
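32. Solution:
$n = 10$, $\bar{x} = \frac{\sum x}{n} = 13$, $\bar{y} = \frac{\sum y}{n} = 22$
Regression coefficient of y on x:
$b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = 1.015$
Regression line of y on x is
$y - \bar{y} = b_{yx}(x - \bar{x})$
i.e. y = 1.015x + 8.805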
33. Since we are to estimate supply (y) when price (x)
is given, we use the regression line of y on x.
When x = 16 units,
y = 1.015(16) + 8.805
= 25.045
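Both regression lines for the price and supply data can be obtained directly from the given sums. The sketch below applies the same formula for $b_{yx}$ and the analogous one for $b_{xy}$ (with $\sum y^2$ in the denominator). It keeps full precision, so intermediate values can differ from the rounded slide figures in the last digit.

```python
n = 10
sum_x, sum_y = 130, 220
sum_x2, sum_y2, sum_xy = 2288, 5506, 3467

mean_x = sum_x / n  # 13.0
mean_y = sum_y / n  # 22.0

numerator = n * sum_xy - sum_x * sum_y         # shared by both coefficients
b_yx = numerator / (n * sum_x2 - sum_x ** 2)   # y on x
b_xy = numerator / (n * sum_y2 - sum_y ** 2)   # x on y

# Each line passes through (mean_x, mean_y).
a_yx = mean_y - b_yx * mean_x   # intercept of y = a_yx + b_yx * x
a_xy = mean_x - b_xy * mean_y   # intercept of x = a_xy + b_xy * y

# Supply is estimated from price, so the line of y on x is used.
supply_at_16 = a_yx + b_yx * 16

print(f"y on x: y = {b_yx:.3f} x + {a_yx:.3f}")
print(f"x on y: x = {b_xy:.3f} y + {a_xy:.3f}")
print(f"estimated supply at price 16: {supply_at_16:.2f}")
```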
34. Ques:- From the following data, find the most likely
value of y when x=24:
Mean (x)=18.1, mean (y)=985.8
S.D (x)=2, S.D (y)=36.4,
r=0.58
35. Ex. In a partially destroyed laboratory
record of an analysis of correlation data, only
the following results are legible:
Variance of x = 9
Regression equations: $8x - 10y + 66 = 0$, $40x - 18y = 214$.
What were (a) the mean values of x and y,
(b) the standard deviations of x and y and the
coefficient of correlation between x and y?
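36. (i) Since both the lines of regression pass through the point $(\bar{x}, \bar{y})$, therefore
$8\bar{x} - 10\bar{y} + 66 = 0$
$40\bar{x} - 18\bar{y} - 214 = 0$
Solving these equations, $\bar{x} = 13$ and $\bar{y} = 17$.
(ii) Variance of x $= \sigma_x^2 = 9$, so $\sigma_x = 3$.
The equations of the lines of regression can be written as
$y = 0.8x + 6.6$ and $x = 0.45y + 5.35$
so that $b_{yx} = 0.8$ and $b_{xy} = 0.45$.
$r^2 = b_{yx} \, b_{xy} = 0.8 \times 0.45 = 0.36$, so $r = 0.6$
$b_{yx} = r\frac{\sigma_y}{\sigma_x}$, so $\sigma_y = \frac{b_{yx} \, \sigma_x}{r} = \frac{0.8 \times 3}{0.6} = 4$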
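The steps of this solution can be verified with a few lines of code. The sketch solves the two regression equations for the means by Cramer's rule, then takes the first equation as the line of y on x and the second as the line of x on y. That assignment is an assumption checked in the code: the opposite choice would give a product of regression coefficients greater than 1, which is impossible since the product equals $r^2$.

```python
import math

# Regression equations in the form a*x + b*y = c.
a1, b1, c1 = 8, -10, -66    # 8x - 10y + 66 = 0
a2, b2, c2 = 40, -18, 214   # 40x - 18y = 214

# Both lines pass through (mean_x, mean_y): solve the 2x2 system.
det = a1 * b2 - a2 * b1
mean_x = (c1 * b2 - c2 * b1) / det
mean_y = (a1 * c2 - a2 * c1) / det

# Assumed roles: equation 1 is y on x, equation 2 is x on y.
b_yx = -a1 / b1   # slope of y = 0.8x + 6.6
b_xy = -b2 / a2   # slope of x = 0.45y + 5.35
assert b_yx * b_xy <= 1  # required, because the product is r squared

r = math.sqrt(b_yx * b_xy)       # positive, as both coefficients are positive
sigma_x = math.sqrt(9)           # variance of x is 9
sigma_y = b_yx * sigma_x / r     # from b_yx = r * sigma_y / sigma_x

print(mean_x, mean_y, round(r, 2), round(sigma_y, 2))
```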
37. Ques.: If the regression coefficients are 0.8 and
0.2, what is the value of the coefficient of
correlation?
38. Ques.: The equations of the two lines of regression
obtained in a correlation analysis of 60 observations
are 5x = 6y + 24 and 1000y = 768x - 3608.
(a) What is the coefficient of correlation?
(b) What are the mean values of x and y?
(c) What is the ratio of the variances of x and y?