Correlation-Regression
Correlation deals with the association between two or more variables.
Correlation analysis measures the covariation between two or more variables.
Types
1. Positive or negative
2. Simple or multiple
3. Linear or non-linear
Methods of Measuring Correlation
1. Graphic Method
2. Diagrammatic Method: Scatter Diagram
3. Algebraic Method
a. Karl Pearson’s Coefficient of Correlation
b. Spearman’s Rank Correlation Coefficient
c. Coefficient of Concurrent Deviations
d. Least Squares Method
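Karl Pearson’s Coefficient of Correlation
$$\gamma = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \, \sum dy^2}} = \frac{\sum dx\,dy}{N \sigma_x \sigma_y}$$
where γ (gamma) is the coefficient of correlation and
dx = x - x̄ (deviation of x from its arithmetic mean)
dy = y - ȳ (deviation of y from its arithmetic mean)
Σ dx dy = sum of products of the deviations from the respective arithmetic means of both series
N = number of paired items; σx, σy = standard deviations of the x and y series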
Karl Pearson’s Coefficient of Correlation (assumed-mean method)
When deviations are taken from assumed (working) means Ax and Ay, i.e. dx = x - Ax and dy = y - Ay:
$$\gamma = \frac{N \sum dx\,dy - (\sum dx)(\sum dy)}{\sqrt{\left[N \sum dx^2 - (\sum dx)^2\right]\left[N \sum dy^2 - (\sum dy)^2\right]}}$$
Σ dx dy = total of products of deviations from the assumed means of the x and y series
Σ dx = total of deviations of the x series
Σ dy = total of deviations of the y series
Σ dx² = total of squared deviations of the x series
Σ dy² = total of squared deviations of the y series
N = number of items (number of paired items)
Karl Pearson’s Coefficient of Correlation (assumed-mean method, equivalent form)
With deviations taken from assumed (working) means Ax and Ay, the same coefficient can be written as
$$\gamma = \frac{\sum dx\,dy - \dfrac{(\sum dx)(\sum dy)}{N}}{\sqrt{\left[\sum dx^2 - \dfrac{(\sum dx)^2}{N}\right]\left[\sum dy^2 - \dfrac{(\sum dy)^2}{N}\right]}}$$
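The assumed-mean form can be sketched in a few lines of Python. This is only an illustration: the function name `pearson_assumed_mean` and the small data set are hypothetical and do not come from the slides. The point it shows is that the result does not depend on which working means Ax and Ay are chosen.

```python
from math import sqrt

def pearson_assumed_mean(xs, ys, ax, ay):
    """Pearson's coefficient from deviations about assumed means ax and ay."""
    n = len(xs)
    dx = [x - ax for x in xs]          # deviations of x from the assumed mean
    dy = [y - ay for y in ys]          # deviations of y from the assumed mean
    s_dxdy = sum(a * b for a, b in zip(dx, dy))
    s_dx, s_dy = sum(dx), sum(dy)
    s_dx2 = sum(a * a for a in dx)
    s_dy2 = sum(b * b for b in dy)
    num = s_dxdy - s_dx * s_dy / n
    den = sqrt((s_dx2 - s_dx ** 2 / n) * (s_dy2 - s_dy ** 2 / n))
    return num / den

# Hypothetical illustrative data (not from the slides)
xs = [10, 12, 15, 19, 24]
ys = [7, 9, 8, 14, 17]

r_working = pearson_assumed_mean(xs, ys, ax=15, ay=10)  # arbitrary working means
r_origin = pearson_assumed_mean(xs, ys, ax=0, ay=0)     # deviations from zero
r_actual = pearson_assumed_mean(xs, ys, ax=16, ay=11)   # the actual means
```

All three calls return the same value up to floating-point rounding, which is why a convenient round number can be used as the working mean in hand calculation.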
Assumptions of Karl Pearson’s Coefficient of Correlation
1. A linear relationship exists between the variables
Properties of Karl Pearson’s Coefficient of Correlation
1. Its value lies between +1 and -1
2. Zero means no linear correlation
3. γ = ±√(bxy × byx), where bxy and byx are the two regression coefficients and γ takes their common sign
Merit
Convenient for accurate interpretation, as it gives both the degree and the direction of the relationship between two variables
Limitations
1. Assumes a linear relationship, even though the actual relationship may not be linear
2. The method of calculation is laborious and time consuming
3. Affected by extreme values in the distribution
Probable Error of Karl Pearson’s Coefficient of Correlation
$$\text{P.E.}(\gamma) = 0.6745 \times \frac{1 - \gamma^2}{\sqrt{N}}$$
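As a small sketch, the probable error formula translates directly into Python. The function name `probable_error` is hypothetical, and the inputs γ = 0.70 and N = 12 are used purely as illustrative values.

```python
from math import sqrt

def probable_error(r, n):
    """Probable error of a correlation coefficient r computed from n pairs."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)

# Illustrative inputs: r = 0.70 from N = 12 paired observations
pe = probable_error(0.70, 12)
print(round(pe, 4))
```

For these inputs the probable error is roughly 0.099.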
Q7. Calculate the coefficient of correlation for the following data
X    65 63 67 64 68 62 70 66 68 67 69 71
Y    68 66 68 65 69 66 68 65 71 67 68 70
Ans.
$$\gamma = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \, \sum dy^2}} = \frac{\sum dx\,dy}{N \sigma_x \sigma_y}$$
Item     X     Y   dx = x - x̄   dy = y - ȳ      dx²      dy²    dx·dy
   1    65    68      -1.67         0.42       2.78     0.17    -0.69
   2    63    66      -3.67        -1.58      13.44     2.51     5.81
   3    67    68       0.33         0.42       0.11     0.17     0.14
   4    64    65      -2.67        -2.58       7.11     6.67     6.89
   5    68    69       1.33         1.42       1.78     2.01     1.89
   6    62    66      -4.67        -1.58      21.78     2.51     7.39
   7    70    68       3.33         0.42      11.11     0.17     1.39
   8    66    65      -0.67        -2.58       0.44     6.67     1.72
   9    68    71       1.33         3.42       1.78    11.67     4.56
  10    67    67       0.33        -0.58       0.11     0.34    -0.19
  11    69    68       2.33         0.42       5.44     0.17     0.97
  12    71    70       4.33         2.42      18.78     5.84    10.47
 Sum   800   811                              84.67    38.92    40.33
x̄ = 800/12 = 66.67, ȳ = 811/12 = 67.58
Σ dx² × Σ dy² = 84.67 × 38.92 = 3294.9
√(Σ dx² × Σ dy²) = 57.40
Coefficient of correlation γ = 40.33 / 57.40 = 0.70
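The Q7 working can be checked with a short Python sketch that follows the same deviation-from-mean steps. The function name `pearson_r` is an assumption made for illustration; only the X and Y values come from the question.

```python
from math import sqrt

# Data from Q7
x = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]
y = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]

def pearson_r(xs, ys):
    """Pearson's coefficient using deviations from the arithmetic means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    dx = [v - x_bar for v in xs]
    dy = [v - y_bar for v in ys]
    s_dxdy = sum(a * b for a, b in zip(dx, dy))   # Σ dx dy
    s_dx2 = sum(a * a for a in dx)                # Σ dx²
    s_dy2 = sum(b * b for b in dy)                # Σ dy²
    return s_dxdy / sqrt(s_dx2 * s_dy2)

r = pearson_r(x, y)
print(round(r, 4))
```

Carried to four decimals the coefficient is about 0.7027, which agrees with the 0.70 obtained in the table.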
Q8. The following data give the ages of husbands and wives. Find the correlation coefficient.
Husband    23 27 28 29 30 31 33 35 36 39
Wife       18 22 23 24 25 26 28 29 30 32
Ans. γ ≈ 0.9955 (0.99 when truncated to two decimals, 1.00 when rounded)
Item     X     Y   dx = x - x̄   dy = y - ȳ      dx²      dy²    dx·dy
   1    23    18      -8.10        -7.70      65.61    59.29    62.37
   2    27    22      -4.10        -3.70      16.81    13.69    15.17
   3    28    23      -3.10        -2.70       9.61     7.29     8.37
   4    29    24      -2.10        -1.70       4.41     2.89     3.57
   5    30    25      -1.10        -0.70       1.21     0.49     0.77
   6    31    26      -0.10         0.30       0.01     0.09    -0.03
   7    33    28       1.90         2.30       3.61     5.29     4.37
   8    35    29       3.90         3.30      15.21    10.89    12.87
   9    36    30       4.90         4.30      24.01    18.49    21.07
  10    39    32       7.90         6.30      62.41    39.69    49.77
 Sum   311   257                             202.9    158.1    178.3
x̄ = 311/10 = 31.10, ȳ = 257/10 = 25.70
Σ dx² × Σ dy² = 202.9 × 158.1 = 32078.49
√(Σ dx² × Σ dy²) = 179.10
Coefficient of correlation γ = 178.3 / 179.10 = 0.9955 ≈ 1.00
Rank Correlation: sometimes variables are not quantitative in nature but can be arranged in serial order.
This is especially so when dealing with attributes such as honesty, beauty, character, morality etc.
To deal with such situations, Charles Edward Spearman in 1904 developed a formula for obtaining the correlation coefficient between the ranks of n individuals in two attributes under study, or between the ranks given by two or three judges
Rank coefficient of correlation
$$\rho = 1 - \frac{6 \sum d^2}{N^3 - N} = 1 - \frac{6 \sum d^2}{N(N^2 - 1)}$$
Σ d² = total of squared differences between paired ranks
N = number of items
Q9. Ten competitors in a cooking competition are ranked by three judges (P, Q and R) in the following way. Using the rank correlation method, find out which pair of judges has the nearest approach.
Competitor    P    Q    R
    1         1    3    6
    2         6    5    4
    3         5    8    9
    4        10    4    8
    5         3    7    1
    6         2   10    2
    7         4    2    3
    8         9    1   10
    9         7    6    5
   10         8    9    7
Competitor    P    Q    R   Rp-Rq  d²pq   Rq-Rr  d²qr   Rp-Rr  d²pr
    1         1    3    6     -2     4      -3     9      -5    25
    2         6    5    4      1     1       1     1       2     4
    3         5    8    9     -3     9      -1     1      -4    16
    4        10    4    8      6    36      -4    16       2     4
    5         3    7    1     -4    16       6    36       2     4
    6         2   10    2     -8    64       8    64       0     0
    7         4    2    3      2     4      -1     1       1     1
    8         9    1   10      8    64      -9    81      -1     1
    9         7    6    5      1     1       1     1       2     4
   10         8    9    7     -1     1       2     4       1     1
  Sum                               200            214            60
N = 10, N³ = 1000, N³ - N = 990
Pair of judges      P, Q      Q, R      P, R
Σ d²                 200       214        60
6 Σ d²              1200      1284       360
6 Σ d² / (N³ - N)   1.21      1.297     0.3636
ρ (rho)            -0.21     -0.297     0.6364
Since ρ is highest for the pair P and R, judges P and R have the nearest approach.
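The Q9 working can be reproduced with a short Python sketch. The function name `spearman_rho` and the dictionary layout are assumptions made for illustration; the formula is the one given above and presumes there are no tied ranks, as is the case here.

```python
from itertools import combinations

# Ranks given by judges P, Q and R to the ten competitors (from Q9)
ranks = {
    "P": [1, 6, 5, 10, 3, 2, 4, 9, 7, 8],
    "Q": [3, 5, 8, 4, 7, 10, 2, 1, 6, 9],
    "R": [6, 4, 9, 8, 1, 2, 3, 10, 5, 7],
}

def spearman_rho(r1, r2):
    """Spearman's rank coefficient: 1 - 6 Σd² / (N³ - N), no tied ranks."""
    n = len(r1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

# Coefficient for every pair of judges
rho = {
    (a, b): spearman_rho(ranks[a], ranks[b])
    for a, b in combinations(ranks, 2)
}

# The pair with the largest coefficient has the nearest approach
best_pair = max(rho, key=rho.get)

for pair, value in rho.items():
    print(pair, round(value, 4))
print("nearest approach:", best_pair)
```

The pair with the largest ρ is taken as the one in closest agreement, which is the same criterion used in the table.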
Regression Analysis is the process of developing a statistical model that is used to predict the value of a dependent variable from an independent variable
Application
Advertising v/s sales revenue
First used by Sir Francis Galton in 1877 for the study of the heights of sons with respect to the heights of their fathers
Regression means going back, reverting to a former condition, or returning
It refers to the functional relationship between x and y, and gives estimates of the value of the dependent variable y for given values of the independent variable x
Example: relationship between the income of employees and their savings
Regression coefficients can be used to calculate the correlation coefficient: γ = ±√(bxy × byx), with γ taking the sign of the regression coefficients
Types of Regression
1. Simple & Multiple Regression
2. Total or Partial
3. Linear / Non-linear
Methods of Regression Analysis
1. Scatter Diagram
2. Regression Equations
3. Regression Lines
Line of Regression of y on x: y = a + bx
Coefficient b is the slope of the line of regression of y on x.
It represents the increment in the value of the dependent variable y for a unit change in the value of the independent variable x, i.e. the rate of change of y w.r.t. x. It is written as byx.
Regression coefficient (coefficient of regression) of y on x
$$b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum dx\,dy}{\sum dx^2}$$
Equation of the line of regression of y on x
$$y - \bar{y} = b_{yx}\,(x - \bar{x})$$
Line of Regression of x on y: x = a + by
Coefficient b is the slope of the line of regression of x on y.
It represents the increment in the value of the dependent variable x for a unit change in the value of the independent variable y, i.e. the rate of change of x w.r.t. y. It is written as bxy.
Regression coefficient (coefficient of regression) of x on y
$$b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{\sum dx\,dy}{\sum dy^2}$$
Equation of the line of regression of x on y
$$x - \bar{x} = b_{xy}\,(y - \bar{y})$$
Q2. From the data given below find
1. the two regression coefficients
2. the two regression equations
3. the coefficient of correlation between marks in Economics and Statistics
4. the most likely marks in Statistics when marks in Economics are 30
Let marks in Economics be x and marks in Statistics be y
Marks in Eco     25 28 35 32 31 36 29 38 34 32
Marks in Stat    43 46 49 41 36 32 31 30 33 39
Item   x (Eco)   y (Stat)   dx = x - 32   dy = y - 38    dx²    dy²   dx·dy
   1      25        43          -7             5          49     25    -35
   2      28        46          -4             8          16     64    -32
   3      35        49           3            11           9    121     33
   4      32        41           0             3           0      9      0
   5      31        36          -1            -2           1      4      2
   6      36        32           4            -6          16     36    -24
   7      29        31          -3            -7           9     49     21
   8      38        30           6            -8          36     64    -48
   9      34        33           2            -5           4     25    -10
  10      32        39           0             1           0      1      0
 Sum     320       380           0             0         140    398    -93
x̄ = 320/10 = 32, ȳ = 380/10 = 38
Σx = 320, Σy = 380, Σdx = 0, Σdy = 0, Σdx² = 140, Σdy² = 398, Σdx dy = -93
Regression coefficient (coefficient of regression) of y on x
$$b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum dx\,dy}{\sum dx^2} = \frac{-93}{140} = -0.6643$$
Equation of regression of y on x
y - ȳ = byx (x - x̄)
y - 38 = -0.6643 (x - 32)
y - 38 = -0.6643x + 0.6643 × 32
y = -0.6643x + 38 + 0.6643 × 32
y = -0.6643x + 38 + 21.2576
y = -0.6643x + 59.2576
Regression coefficient (coefficient of regression) of x on y
$$b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{\sum dx\,dy}{\sum dy^2} = \frac{-93}{398} = -0.2337$$
Equation of regression of x on y
x - x̄ = bxy (y - ȳ)
x - 32 = -0.2337 (y - 38)
x - 32 = -0.2337y + 0.2337 × 38
x - 32 = -0.2337y + 8.8806
x = -0.2337y + 32 + 8.8806
x = -0.2337y + 40.8806
Correlation coefficient γ = ±√(bxy × byx)
= ±√(-0.2337 × -0.6643) = ±√0.1552 = ±0.394
Since byx and bxy are both negative, γ takes the negative sign: γ = -0.394
To estimate the most likely marks in Statistics (y) when marks in Economics (x) are 30, we use the line of regression of y on x.
The required estimate is given by
y = -0.6643 × 30 + 59.2576 = -19.929 + 59.2576 = 39.3286
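The whole of Q2 can be retraced with a short Python sketch. The variable and function names (`byx`, `bxy`, `predict_y`, `predict_x`) are chosen for illustration only; the marks are the ones given in the question.

```python
from math import sqrt, copysign

# Data from Q2: x = marks in Economics, y = marks in Statistics
eco = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]
stat = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]

n = len(eco)
x_bar = sum(eco) / n            # 32
y_bar = sum(stat) / n           # 38
dx = [x - x_bar for x in eco]
dy = [y - y_bar for y in stat]

s_dxdy = sum(a * b for a, b in zip(dx, dy))   # Σ dx dy
s_dx2 = sum(a * a for a in dx)                # Σ dx²
s_dy2 = sum(b * b for b in dy)                # Σ dy²

byx = s_dxdy / s_dx2            # regression coefficient of y on x
bxy = s_dxdy / s_dy2            # regression coefficient of x on y

# Correlation coefficient: square root of the product, with the common sign
r = copysign(sqrt(bxy * byx), byx)

def predict_y(x):
    """Line of regression of y on x: y - ȳ = byx (x - x̄)."""
    return y_bar + byx * (x - x_bar)

def predict_x(y):
    """Line of regression of x on y: x - x̄ = bxy (y - ȳ)."""
    return x_bar + bxy * (y - y_bar)

print(round(byx, 4), round(bxy, 4), round(r, 3), round(predict_y(30), 2))
```

Both regression lines pass through the point of means (32, 38), which gives a quick sanity check on the fitted equations.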
Sum of squares of x and y (sum of products)
$$SS_{xy} = \sum (x - \bar{x})(y - \bar{y}) = \sum dx\,dy = \sum xy - \frac{(\sum x)(\sum y)}{n}$$
Sum of squares of x
$$SS_{xx} = \sum (x - \bar{x})^2 = \sum dx^2 = \sum x^2 - \frac{(\sum x)^2}{n}$$
Sales and advertising expenses (in Rs. 1000) are given below. Develop a regression model.
Advt (x)   Sales (y)
   92         930
   94         900
   97        1020
   98         990
  100        1100
  102        1050
  104        1150
  105        1120
  105        1130
  107        1200
  107        1250
  110        1220
$$b = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum dx\,dy}{\sum dx^2}$$
y = a + bx
Summing over all n observations:
Σ y = Σ a + b Σ x
Σ y = n·a + b Σ x
n·a = Σ y - b Σ x
$$a = \frac{\sum y - b \sum x}{n} = \frac{\sum y}{n} - \frac{b \sum x}{n}$$
x = advt, y = sales, ŷ = predicted value (fit), y - ŷ = residual
   x       y       x²        xy     y - ȳ    (y - ȳ)²        ŷ     y - ŷ   (y - ŷ)²     ŷ - ȳ    (ŷ - ȳ)²
  92     930     8464     85560   -158.33    25069.44    902.40     27.60     761.76   -185.93    34571.20
  94     900     8836     84600   -188.33    35469.44    940.54    -40.54    1643.49   -147.79    21842.87
  97    1020     9409     98940    -68.33     4669.44    997.75     22.25     495.06    -90.58     8205.34
  98     990     9604     97020    -98.33     9669.44   1016.82    -26.82     719.31    -71.51     5114.16
 100    1100    10000    110000     11.67      136.11   1054.96     45.04    2028.60    -33.37     1113.78
 102    1050    10404    107100    -38.33     1469.44   1093.10    -43.10    1857.61      4.77       22.72
 104    1150    10816    119600     61.67     3802.78   1131.24     18.76     351.94     42.91     1840.98
 105    1120    11025    117600     31.67     1002.78   1150.31    -30.31     918.70     61.98     3841.11
 105    1130    11025    118650     41.67     1736.11   1150.31    -20.31     412.50     61.98     3841.11
 107    1200    11449    128400    111.67    12469.44   1188.45     11.55     133.40    100.12    10023.35
 107    1250    11449    133750    161.67    26136.11   1188.45     61.55    3788.40    100.12    10023.35
 110    1220    12100    134200    131.67    17336.11   1245.66    -25.66     658.44    157.33    24751.68
1221   13060   124581   1335420      0.00   138966.67  13059.99      0.01   13769.21     -0.01   125191.6
Last row: column totals (Σx, Σy, Σx², Σxy, Σ(y - ȳ), Σ(y - ȳ)², Σŷ, Σ(y - ŷ), Σ(y - ŷ)², Σ(ŷ - ȳ), Σ(ŷ - ȳ)²)
$$\bar{x} = \frac{1221}{12} = 101.75$$
$$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 1335420 - \frac{1221 \times 13060}{12} = 6565$$
$$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 124581 - \frac{(1221)^2}{12} = 344.25$$
$$b = \frac{SS_{xy}}{SS_{xx}} = \frac{6565}{344.25} = 19.0704$$
y = a + bx
Σ y = Σ a + b Σ x
Σ y = n·a + b Σ x
n·a = Σ y - b Σ x
$$a = \frac{\sum y - b \sum x}{n} = \frac{\sum y}{n} - \frac{b \sum x}{n} = \frac{13060}{12} - \frac{19.0704 \times 1221}{12} = -852.08$$
Equation of the simple regression line of y on x
y = a + bx
y = -852.08 + 19.0704 x
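The fit above can be reproduced with a short Python sketch built on the SSxy and SSxx shortcut formulas. The function name `fit_line` is hypothetical; the advertising and sales figures are those given in the data table.

```python
# Advertising (x) and sales (y), both in Rs. 1000
advt = [92, 94, 97, 98, 100, 102, 104, 105, 105, 107, 107, 110]
sales = [930, 900, 1020, 990, 1100, 1050, 1150, 1120, 1130, 1200, 1250, 1220]

def fit_line(xs, ys):
    """Least-squares line y = a + b x via the SSxy / SSxx shortcut formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    ss_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    b = ss_xy / ss_xx                 # slope
    a = (sum_y - b * sum_x) / n       # intercept
    return a, b, ss_xy, ss_xx

a, b, ss_xy, ss_xx = fit_line(advt, sales)
print(round(ss_xy, 2), round(ss_xx, 2), round(b, 4), round(a, 2))
```

A usage example: the predicted sales for an advertising spend of 100 is `a + b * 100`, which should agree with the fitted value of about 1054.96 shown for x = 100 in the working table.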
For testing the fit
yi = recorded (actual) value of y in the given data
ȳ = mean (average) of y
ŷ = predicted value of y from the regression line
Deviation = (yi - ȳ) = difference between the actual value of y and the mean
Residual = (yi - ŷ) = gap (error) between the actual value of y and the predicted value calculated from the regression line
Deviation of predicted value from mean = (ŷ - ȳ)
a = intercept on the y-axis
b = slope of the regression line
Total sum of squares
$$SST = \sum (y_i - \bar{y})^2$$
Regression sum of squares
$$SSR = \sum (\hat{y} - \bar{y})^2$$
Error sum of squares
$$SSE = \sum (y_i - \hat{y})^2$$
Coefficient of determination
$$\gamma^2 = \frac{SSR}{SST}$$
Standard error of estimate
$$S_{yx} = \sqrt{\frac{SSE}{n - 2}}$$
To determine whether a significant linear relationship exists between the independent variable x and the dependent variable y, we test whether the population slope β is zero
H0: The slope of the regression line is zero (β = 0)
H1: The slope of the regression line is not zero (β ≠ 0)
$$t = \frac{b - \beta}{S_b}, \qquad S_b = \text{standard error of } b = \frac{S_{yx}}{\sqrt{SS_{xx}}}$$
$$S_{yx} = \sqrt{\frac{SSE}{n - 2}} = \sqrt{\frac{\sum (y_i - \hat{y})^2}{n - 2}} = \sqrt{\frac{13769.21}{12 - 2}} = \sqrt{1376.92} = 37.1068$$
$$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 124581 - \frac{(1221)^2}{12} = 344.25$$
$$S_b = \frac{S_{yx}}{\sqrt{SS_{xx}}} = \frac{37.1068}{\sqrt{344.25}} = 2.00$$
$$t = \frac{b - \beta}{S_b} = \frac{19.07 - 0}{37.1068 / \sqrt{344.25}} = 9.53$$
As the calculated value of t is more than the table value of t for 12 - 2 = 10 degrees of freedom (2.228 at the 5% level, two-tailed), the null hypothesis is rejected
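The slope test can be sketched in Python as follows. The variable names are illustrative, and the critical value 2.228 is hard-coded as an assumption taken from a standard t-table (two-tailed, 5% level, 10 degrees of freedom); the slides themselves do not state which significance level was used.

```python
from math import sqrt

# Advertising (x) and sales (y), both in Rs. 1000
advt = [92, 94, 97, 98, 100, 102, 104, 105, 105, 107, 107, 110]
sales = [930, 900, 1020, 990, 1100, 1050, 1150, 1120, 1130, 1200, 1250, 1220]

n = len(advt)
sum_x, sum_y = sum(advt), sum(sales)
ss_xy = sum(x * y for x, y in zip(advt, sales)) - sum_x * sum_y / n
ss_xx = sum(x * x for x in advt) - sum_x ** 2 / n
b = ss_xy / ss_xx                    # sample slope
a = (sum_y - b * sum_x) / n          # intercept

# Error sum of squares and standard error of estimate
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(advt, sales))
s_yx = sqrt(sse / (n - 2))

# Standard error of the slope and t statistic under H0: beta = 0
s_b = s_yx / sqrt(ss_xx)
t_stat = (b - 0) / s_b

# Assumed critical value: two-tailed, 5% level, n - 2 = 10 degrees of freedom
T_CRITICAL = 2.228
reject_h0 = abs(t_stat) > T_CRITICAL

print(round(s_yx, 4), round(s_b, 4), round(t_stat, 2), reject_h0)
```

Because the residuals here are computed from the unrounded slope and intercept, the figures may differ from the hand calculation in the last decimal place.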
Coefficient of Determination: Definition
The coefficient of determination, also known as R squared, is interpreted as the goodness of fit of a regression.
The higher the coefficient of determination, the larger the share of the variance of the dependent variable that is explained by the independent variable.
The coefficient of determination is an overall measure of the usefulness of a regression.
For example, if r² is 0.95, then 95% of the variation in the dependent variable is explained by the independent variable. That is a good regression.
The coefficient of determination can be calculated as the regression sum of squares, SSR, divided by the total sum of squares, SST
$$\gamma^2 = \frac{SSR}{SST}$$
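As an illustration, the coefficient of determination for the advertising and sales data can be computed with the Python sketch below. The variable names are assumptions, and the value it produces is not stated on the slides; it follows from applying SSR / SST to the fitted line.

```python
# Advertising (x) and sales (y), both in Rs. 1000
advt = [92, 94, 97, 98, 100, 102, 104, 105, 105, 107, 107, 110]
sales = [930, 900, 1020, 990, 1100, 1050, 1150, 1120, 1130, 1200, 1250, 1220]

n = len(advt)
sum_x, sum_y = sum(advt), sum(sales)
y_bar = sum_y / n

# Least-squares slope and intercept
b = (sum(x * y for x, y in zip(advt, sales)) - sum_x * sum_y / n) / (
    sum(x * x for x in advt) - sum_x ** 2 / n
)
a = (sum_y - b * sum_x) / n
fits = [a + b * x for x in advt]     # predicted values ŷ

sst = sum((y - y_bar) ** 2 for y in sales)              # total sum of squares
ssr = sum((f - y_bar) ** 2 for f in fits)               # regression sum of squares
sse = sum((y - f) ** 2 for y, f in zip(sales, fits))    # error sum of squares

r_squared = ssr / sst

print(round(sst, 2), round(ssr, 2), round(sse, 2), round(r_squared, 4))
```

With unrounded fitted values SSR comes out near 125197, slightly above the 125191.6 in the working table, which was built from fits rounded to two decimals; SST = SSR + SSE holds in the sketch up to floating-point error, and about 90% of the variation in sales is explained by advertising.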
