Regression
The term regression was first used by the British biometrician Sir Francis Galton. While studying the relation between the heights of parents and the average height of their children, Galton found that the offspring of abnormally tall or short parents tend to regress, or step back, towards the average population height. In the course of time the meaning of the word "regression" has become wider, and it now stands for a measure of the average relationship between variables. If there are only two variables under study, one is taken as the independent variable and the other as the dependent variable, and regression analysis explains how, on average, the values of the dependent variable change with a change in the values of the independent variable.
Lines of regression:
If the two variables in a bivariate distribution are related, the dots in the scatter diagram will cluster in the neighbourhood of a curve called the regression curve. If the curve is a straight line, we say that there is a linear regression between the variables; otherwise the regression is said to be curvilinear.
Suppose that in the bivariate distribution $(x_i, y_i)$, $i = 1, 2, \ldots, n$, Y is the dependent variable, and let the line of regression of Y on X be Y = a + bX. The values of a and b are obtained by the principle of least squares, which gives the following two normal equations:
$$\sum_{i=1}^{n} Y_i = na + b\sum_{i=1}^{n} X_i \qquad \text{(a)}$$
$$\sum_{i=1}^{n} X_i Y_i = a\sum_{i=1}^{n} X_i + b\sum_{i=1}^{n} X_i^2 \qquad \text{(b)}$$
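The normal equations (a) and (b) are a 2×2 linear system in a and b, so they can be solved directly from the column sums. A minimal Python sketch, assuming plain lists of numbers (the helper name `fit_normal_equations` is hypothetical, not from these notes):

```python
def fit_normal_equations(xs, ys):
    """Solve normal equations (a) and (b) for the line Y = a + bX."""
    n = len(xs)
    sx = sum(xs)                                   # sum of X_i
    sy = sum(ys)                                   # sum of Y_i
    sxy = sum(x * y for x, y in zip(xs, ys))       # sum of X_i * Y_i
    sxx = sum(x * x for x in xs)                   # sum of X_i^2
    # (a): sy  = n*a  + b*sx
    # (b): sxy = a*sx + b*sxx
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("all X values are equal; the slope is undefined")
    b = (n * sxy - sx * sy) / det                  # Cramer's rule for b
    a = (sy - b * sx) / n                          # back-substitute into (a)
    return a, b

# Example with the data of Problem 1 in these notes
a, b = fit_normal_equations([65, 66, 67, 68, 69, 70, 71], [67, 68, 66, 69, 72, 72, 69])
print(f"Y = {a:.2f} + {b:.4f}X")  # Y = 20.43 + 0.7143X
```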
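Dividing (a) by n gives $\bar{y} = a + b\bar{x}$, so the line of regression passes through the point of means $(\bar{x}, \bar{y})$. Solving (a) and (b) for b gives $b = \mu_{11}/\sigma_x^2$, where $\mu_{11} = \frac{1}{n}\sum X_i Y_i - \bar{x}\bar{y}$ is the covariance of X and Y and $\sigma_x^2$ is the variance of X. Hence the line of regression of Y on X is

$$Y - \bar{y} = b(X - \bar{x}) = \frac{\mu_{11}}{\sigma_x^2}(X - \bar{x}) \qquad \text{(g)}$$

Since the correlation coefficient is

$$r = \frac{\mu_{11}}{\sigma_x \sigma_y},$$

we have $\mu_{11}/\sigma_x^2 = r\sigma_y/\sigma_x$, and (g) can be written as

$$Y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{x}) \qquad \text{(h)}$$

Similarly, starting with X = a + bY, we get the line of regression of X on Y:

$$X - \bar{x} = \frac{\mu_{11}}{\sigma_y^2}(Y - \bar{y}) \qquad \text{(i)}$$
$$X - \bar{x} = r\frac{\sigma_x}{\sigma_y}(Y - \bar{y}) \qquad \text{(j)}$$

The slopes in (h) and (j), $b_{yx} = r\sigma_y/\sigma_x$ and $b_{xy} = r\sigma_x/\sigma_y$, are called the regression coefficients. Their product is $b_{yx} b_{xy} = r^2$, and both have the same sign as r.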
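To see formulas (g) to (j) side by side, here is a minimal Python sketch (the helper name `regression_coefficients` is hypothetical) that computes $\sigma_x^2$, $\sigma_y^2$ and $\mu_{11}$ with divisor n, as in the text, then both regression coefficients and r, and checks numerically that $b_{yx} b_{xy} = r^2$:

```python
from math import sqrt

def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy, r) using the divisor-n moments of the text."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    var_x = sum((x - x_bar) ** 2 for x in xs) / n                       # sigma_x^2
    var_y = sum((y - y_bar) ** 2 for y in ys) / n                       # sigma_y^2
    mu11 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n   # covariance
    b_yx = mu11 / var_x              # slope in (g): Y on X
    b_xy = mu11 / var_y              # slope in (i): X on Y
    r = mu11 / sqrt(var_x * var_y)   # correlation coefficient
    return b_yx, b_xy, r

b_yx, b_xy, r = regression_coefficients([65, 66, 67, 68, 69, 70, 71],
                                        [67, 68, 66, 69, 72, 72, 69])
print(f"b_yx = {b_yx:.4f}, b_xy = {b_xy:.4f}, r^2 = {r * r:.4f}, product = {b_yx * b_xy:.4f}")
# b_yx = 0.7143, b_xy = 0.6250, r^2 = 0.4464, product = 0.4464
```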
Q.1) The two regression equations of the variables x and y are x = 19.13 - 0.87y and y = 11.64 - 0.50x. Find:
i. the mean of the x's,
ii. the mean of the y's,
iii. the correlation coefficient between x and y.
Solution: Since the point of means $(\bar{x}, \bar{y})$ lies on both regression lines, we have
$$\bar{x} = 19.13 - 0.87\bar{y} \qquad \text{(i)}$$
$$\bar{y} = 11.64 - 0.50\bar{x} \qquad \text{(ii)}$$
Multiplying (ii) by 0.87 and subtracting from (i), we have
$$[1 - (0.87)(0.50)]\,\bar{x} = 19.13 - (11.64)(0.87)$$
or $0.565\,\bar{x} = 9.0032$, so $\bar{x} \approx 15.935$.
$$\therefore \bar{y} = 11.64 - (0.50)(15.935) \approx 3.67$$
The regression coefficient of x on y is $b_{xy} = -0.87$ and that of y on x is $b_{yx} = -0.50$. Since the coefficient of correlation is the geometric mean of the two regression coefficients, taken with their common sign,
$$r = -\sqrt{(-0.87)(-0.50)} = -\sqrt{0.435} \approx -0.66$$
[The negative sign is taken since both regression coefficients are negative.]
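The same steps can be sketched in Python, assuming the lines are given as x = c1 + d1·y (x on y) and y = c2 + d2·x (y on x). The function name `means_and_r` and its parameters are hypothetical; the logic is the substitution and geometric-mean argument used in Q.1:

```python
from math import copysign, sqrt

def means_and_r(c1, d1, c2, d2):
    """Given x = c1 + d1*y (x on y) and y = c2 + d2*x (y on x),
    return (x_bar, y_bar, r)."""
    if d1 * d2 < 0:
        raise ValueError("regression coefficients must have the same sign")
    if d1 * d2 >= 1:
        raise ValueError("b_xy * b_yx = r^2 must be below 1 for two distinct lines")
    # (x_bar, y_bar) lies on both lines: substitute y_bar = c2 + d2*x_bar
    # into x_bar = c1 + d1*y_bar and solve for x_bar.
    x_bar = (c1 + d1 * c2) / (1 - d1 * d2)
    y_bar = c2 + d2 * x_bar
    # r is the geometric mean of b_xy = d1 and b_yx = d2, with their common sign
    r = copysign(sqrt(d1 * d2), d2)
    return x_bar, y_bar, r

x_bar, y_bar, r = means_and_r(19.13, -0.87, 11.64, -0.50)
print(f"x_bar = {x_bar:.3f}, y_bar = {y_bar:.2f}, r = {r:.2f}")
# x_bar = 15.935, y_bar = 3.67, r = -0.66
```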
Q.2) In a partially destroyed laboratory record, only the lines of regression of y on x and of x on y are available, as 4x - 5y + 33 = 0 and 20x - 9y = 107 respectively. Calculate $\bar{x}$, $\bar{y}$ and the coefficient of correlation between x and y.
Solution: Since both regression lines pass through $(\bar{x}, \bar{y})$,
$$4\bar{x} - 5\bar{y} + 33 = 0$$
$$20\bar{x} - 9\bar{y} = 107$$
Multiplying the first equation by 5 and subtracting it from the second gives $16\bar{y} = 272$, so $\bar{y} = 17$ and then $4\bar{x} = 5(17) - 33 = 52$, i.e. $\bar{x} = 13$.
Rewriting the line of regression of y on x as $y = \frac{4}{5}x + \frac{33}{5}$, we get
$$b_{yx} = r\frac{\sigma_y}{\sigma_x} = \frac{4}{5} \qquad \text{(i)}$$
Rewriting the line of regression of x on y as $x = \frac{9}{20}y + \frac{107}{20}$, we get
$$b_{xy} = r\frac{\sigma_x}{\sigma_y} = \frac{9}{20} \qquad \text{(ii)}$$
Multiplying (i) and (ii), we get
$$r^2 = \frac{4}{5} \times \frac{9}{20} = 0.36$$
Hence r = 0.6, the positive sign being taken because $b_{yx}$ and $b_{xy}$ are both positive.
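When the lines are recorded in the general form Ax + By = C, as in Q.2, the point of means follows from Cramer's rule and each regression coefficient from solving its line for the dependent variable. A minimal sketch; the function `from_general_form` and the (A, B, C) tuple convention are assumptions made for illustration:

```python
from math import sqrt

def from_general_form(line_yx, line_xy):
    """Each line is (A, B, C) meaning A*x + B*y = C.
    line_yx is the regression of y on x, line_xy that of x on y."""
    A1, B1, C1 = line_yx
    A2, B2, C2 = line_xy
    det = A1 * B2 - A2 * B1
    if det == 0:
        raise ValueError("the lines are parallel or identical")
    # Cramer's rule for the point of means, which lies on both lines
    x_bar = (C1 * B2 - C2 * B1) / det
    y_bar = (A1 * C2 - A2 * C1) / det
    b_yx = -A1 / B1   # slope after solving line_yx for y
    b_xy = -B2 / A2   # slope after solving line_xy for x
    # A product above 1 usually means the two lines were assigned the wrong way round
    if b_yx * b_xy < 0 or b_yx * b_xy > 1:
        raise ValueError("not a valid pair: check which line is y on x")
    r = sqrt(b_yx * b_xy)
    return x_bar, y_bar, b_yx, b_xy, (r if b_yx > 0 else -r)

# Q.2: 4x - 5y + 33 = 0 is written as 4x - 5y = -33
x_bar, y_bar, b_yx, b_xy, r = from_general_form((4, -5, -33), (20, -9, 107))
print(f"x_bar = {x_bar:.2f}, y_bar = {y_bar:.2f}, r = {r:.2f}")
# x_bar = 13.00, y_bar = 17.00, r = 0.60
```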
1) Find the line of regression of Y on X from the data given below, and also obtain the value of Y for X = 71.

X    65   66   67   68   69   70   71
Y    67   68   66   69   72   72   69

Solution:

X     Y     X·Y     X²      Y²
65    67    4355    4225    4489
66    68    4488    4356    4624
67    66    4422    4489    4356
68    69    4692    4624    4761
69    72    4968    4761    5184
70    72    5040    4900    5184
71    69    4899    5041    4761

∑X = 476    ∑Y = 483    ∑XY = 32864    ∑X² = 32396    ∑Y² = 33359
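$$\bar{X} = \frac{\sum X}{n} = \frac{476}{7} = 68, \qquad \bar{Y} = \frac{\sum Y}{n} = \frac{483}{7} = 69$$
$$\sigma_x = \sqrt{\frac{1}{n}\sum X^2 - \bar{X}^2} = \sqrt{\frac{1}{7}(32396) - 4624} = \sqrt{4} = 2$$
$$\sigma_y = \sqrt{\frac{1}{n}\sum Y^2 - \bar{Y}^2} = \sqrt{\frac{1}{7}(33359) - 4761} \approx 2.138$$
$$r = \frac{\frac{1}{n}\sum XY - \bar{X}\bar{Y}}{\sqrt{\left(\frac{1}{n}\sum X^2 - \bar{X}^2\right)\left(\frac{1}{n}\sum Y^2 - \bar{Y}^2\right)}} = \frac{4694.857 - (68)(69)}{(2)(2.138)} \approx 0.6682$$
Now, the regression line of Y on X is
$$Y - \bar{Y} = b_{yx}(X - \bar{X}), \qquad b_{yx} = r\frac{\sigma_y}{\sigma_x} = 0.6682\left(\frac{2.138}{2}\right) \approx 0.7143$$
$$Y - 69 = 0.7143(X - 68)$$
$$Y = 0.7143X + 20.43$$
Now, for X = 71: Y = 69 + 0.7143(71 - 68) ≈ 71.14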
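The hand computation of Problem 1 can be reproduced step by step in Python. This is a sketch that mirrors the formulas used above (divisor-n standard deviations, r, then $b_{yx} = r\sigma_y/\sigma_x$); the function name `regression_y_on_x` is hypothetical:

```python
from math import sqrt

# Data of Problem 1
X = [65, 66, 67, 68, 69, 70, 71]
Y = [67, 68, 66, 69, 72, 72, 69]

def regression_y_on_x(xs, ys):
    """Follow the hand computation: means, sigmas, r, then b_yx and the intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sigma_x = sqrt(sum(x * x for x in xs) / n - x_bar ** 2)
    sigma_y = sqrt(sum(y * y for y in ys) / n - y_bar ** 2)
    cov = sum(x * y for x, y in zip(xs, ys)) / n - x_bar * y_bar
    r = cov / (sigma_x * sigma_y)
    b_yx = r * sigma_y / sigma_x
    intercept = y_bar - b_yx * x_bar
    return x_bar, y_bar, sigma_x, sigma_y, r, b_yx, intercept

x_bar, y_bar, sx, sy, r, b_yx, a = regression_y_on_x(X, Y)
print(f"r = {r:.4f}, Y = {b_yx:.4f}X + {a:.2f}")          # r = 0.6682, Y = 0.7143X + 20.43
print(f"Y at X = 71: {y_bar + b_yx * (71 - x_bar):.2f}")  # Y at X = 71: 71.14
```

Predicting from the point-of-means form $\bar{Y} + b_{yx}(X - \bar{X})$ avoids carrying a rounded intercept into the answer.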
2) A company wants to predict the annual value of its total sales based on the national income of the country where it does business. The relationship is represented in the following table.

X    189   190   208   227   239   252   257   274   293   308   316
Y    402   404   412   425   429   436   440   447   458   469   469

X represents the national income in millions of dollars and Y represents the company's sales in thousands of dollars, for the period from 2000 to 2010.
Calculate:
a. The regression line of Y on X.
b. If in 2011 the country's national income was 325 million dollars, what would be the prediction for the company's sales?
Solution:

X      Y      X·Y        X²        Y²
189    402    75978      35721     161604
190    404    76760      36100     163216
208    412    85696      43264     169744
227    425    96475      51529     180625
239    429    102531     57121     184041
252    436    109872     63504     190096
257    440    113080     66049     193600
274    447    122478     75076     199809
293    458    134194     85849     209764
308    469    144452     94864     219961
316    469    148204     99856     219961
Total: 2753   4791   1209720   708933   2092421
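$$\bar{X} = \frac{2753}{11} = 250.27, \qquad \bar{Y} = \frac{4791}{11} = 435.55$$
$$\sigma_x = \sqrt{\frac{1}{n}\sum X^2 - \bar{X}^2} = \frac{\sqrt{n\sum X^2 - (\sum X)^2}}{n} = \frac{\sqrt{11(708933) - 2753^2}}{11} = \frac{\sqrt{219254}}{11} \approx 42.57$$
$$\sigma_y = \sqrt{\frac{1}{n}\sum Y^2 - \bar{Y}^2} = \frac{\sqrt{11(2092421) - 4791^2}}{11} = \frac{\sqrt{62950}}{11} \approx 22.81$$
$$r = \frac{\frac{1}{n}\sum XY - \bar{X}\bar{Y}}{\sqrt{\left(\frac{1}{n}\sum X^2 - \bar{X}^2\right)\left(\frac{1}{n}\sum Y^2 - \bar{Y}^2\right)}} = \frac{11(1209720) - (2753)(4791)}{\sqrt{(219254)(62950)}} = \frac{117297}{\sqrt{(219254)(62950)}} \approx 0.998$$
Now, the regression line of Y on X is
$$Y - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X}), \qquad r\frac{\sigma_y}{\sigma_x} \approx 0.998\left(\frac{22.81}{42.57}\right) \approx 0.535$$
$$Y - 435.55 = 0.535(X - 250.27)$$
$$Y = 0.535X + 301.66$$
Now, for X = 325: Y = (0.535)(325) + 301.66 ≈ 475.5, i.e. predicted sales of about 475.5 thousand dollars.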

