# Module 8

Published in: Technology, Economy & Finance

### Module 8

#### Unit 8: Regression Analysis

#### Introduction

So far we have studied correlation analysis, which measures the direction and strength of the relationship between two variables. Once a correlation between two variables has been established, one may wish to estimate the value of one variable from the value of the other. The statistical method by which we estimate or predict the unknown value of one variable from the known value of another variable is called regression. Regression therefore follows correlation: once the relationship between the two variables is established, regression analysis proceeds to estimate probable values.

Sir Francis Galton, a British biometrician, introduced the concept of regression in 1877 while studying the correlation between the heights of sons and their fathers. He concluded: "Tall fathers tend to have tall sons and short fathers short sons. The average height of the sons of a group of tall fathers is less than that of the fathers, while the average height of the sons of a group of short fathers is greater than that of the fathers." In other words, the coming generations of tall or short parents tend to step back towards the average height of the population. Modern statisticians prefer to use the term regression in the sense of estimation, which is an important statistical tool in economics and business.

#### Meaning

Regression means returning or stepping back to the average value. In statistics, the term regression simply means the average relationship. With a regression technique we can predict or estimate the value of the dependent variable from given values of the independent variable. Regression studies the nature of the relationship in order to estimate the most probable values, and it establishes a functional relationship between the independent and dependent variables.
#### Definition

According to Blair, "Regression is the measure of the average relationship between two or more variables in terms of the original units of the data."

According to Taro Yamane, "One of the most frequently used techniques in economics and business research, to find a relation between two or more variables that are related causally, is regression analysis."

According to Wallis and Roberts, "It is often more important to find out what the relation actually is, in order to estimate or predict one variable; and the statistical technique appropriate in such a case is called regression analysis."

#### Uses of Regression Analysis

Regression analysis is of great practical use, even more than correlation analysis. Some of its uses are:

1. Regression analysis helps in establishing a functional relationship between two or more variables. Once this is established, it can be used for various advanced analytic purposes.
2. With the use of electronic machines and computers, the tedium of calculating regression equations, particularly those expressing multiple and non-linear relationships, has been greatly reduced.
3. Since most problems of economic analysis are based on cause-and-effect relationships, regression analysis is a highly valuable tool in economic and business research.
4. Regression analysis is very useful for prediction. Once a functional relationship is known, the value of the dependent variable can be predicted from a given value of the independent variable.

#### Correlation and Regression

These two techniques are directed towards a common purpose, establishing the degree and direction of the relationship between two or more variables, but their methods differ. The choice of one or the other depends on the purpose. In spite of certain similarities, there are some basic differences between the two approaches, which are summarized below:
| No. | Correlation | Regression |
|---|---|---|
| 1 | Correlation literally means related or sympathetic movements between variables. | Regression literally means return to the normal, which is true on account of the average of relationship. |
| 2 | There is a sort of interdependence, which is mutual. | It establishes a functional relationship, which is mathematical, showing dependence of one variable on the other. |
| 3 | There is no cause-and-effect relationship. It only shows the existence of some association in the movement of variables. | It may have a cause-and-effect relationship. |
| 4 | It may be a spurious correlation if the sympathetic movement is on account of the influence of an outside variable which has no relevance. | It is a mathematical relationship, which should be interpreted suitably. |
| 5 | It is a relative measure showing association between variables. | It is an absolute measure of relationship. |
| 6 | It is used only for testing and verification of the relationship. It tenders only limited information. | Besides verification, it can also be used for estimation and prediction. It tenders more comprehensive information. |
| 7 | It is not very useful for further mathematical treatment. | It is very useful for further mathematical treatment. |

#### Methods of Regression Analysis

There are two methods:

1. Graphic method (not included in the syllabus)
2. Algebraic method

The algebraic method for simple linear regression can be broadly divided into the following:

A. Regression lines
B. Regression equations
C. Regression coefficients

#### A. Regression Lines

In graphical terms, a regression line is a straight line fitted to the data by the method of least squares. It indicates the best probable mean value of one variable corresponding to a given value of the other. Since a regression line is the line of best fit for estimating one particular variable, it cannot be used conversely; therefore two regression lines are always constructed for the relationship between two variables X and Y.
Thus one regression line shows the regression of X upon Y and the other shows the regression of Y upon X. The regression line of X on Y gives the most probable values of X for any given value of Y. In the same manner, the regression line of Y on X gives the most probable values of Y for any given value of X. Thus there are two regression lines in the case of two variables.

#### B. Regression Equations

A regression equation is the algebraic expression of a regression line. As there are two regression lines, there are two regression equations for the two variables X and Y: the regression equation of X on Y and the regression equation of Y on X.

I. Regression equation of X on Y:

$$X - \bar{X} = r\,\frac{\sigma_x}{\sigma_y}\,(Y - \bar{Y})$$

II. Regression equation of Y on X:

$$Y - \bar{Y} = r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X})$$
#### Application of Regression Equations When All Required Values Are Given

##### Illustration 01

From the following results, obtain the two regression equations and estimate the yield of crops when the rainfall is 29 cm and the rainfall when the yield is 600 kg.

| | Y: Yield (kg) | X: Rainfall (cm) |
|---|---|---|
| Mean | 508.4 | 26.7 |
| S.D. | 36.8 | 4.6 |

Coefficient of correlation between yield and rainfall = 0.52

Solution: To estimate the yield of crops, we use the regression equation of Y on X.

$$
\begin{aligned}
Y - \bar{Y} &= r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X}) \\
Y - 508.4 &= 0.52 \times \frac{36.8}{4.6}\,(X - 26.7) \\
Y - 508.4 &= 4.16\,(X - 26.7) = 4.16X - 111.072 \\
Y &= 4.16X + 397.328
\end{aligned}
$$

When X = 29: Y = 4.16 × 29 + 397.328 = 120.64 + 397.328 = 517.968 kg.

Similarly, to estimate rainfall, we use the regression equation of X on Y.

$$
\begin{aligned}
X - \bar{X} &= r\,\frac{\sigma_x}{\sigma_y}\,(Y - \bar{Y}) \\
X - 26.7 &= 0.52 \times \frac{4.6}{36.8}\,(Y - 508.4) \\
X - 26.7 &= 0.065\,(Y - 508.4) = 0.065Y - 33.046 \\
X &= 0.065Y - 6.346
\end{aligned}
$$

When Y = 600 kg: X = 0.065 × 600 - 6.346 = 39 - 6.346 = 32.654 cm.
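The substitution steps of Illustration 01 can be sketched in a few lines of Python. This is only a minimal sketch: the function name `regression_lines` and its argument order are assumptions made for illustration, not part of the original notes.

```python
def regression_lines(mean_x, mean_y, sd_x, sd_y, r):
    """Return (slope, intercept) for the Y-on-X line and the X-on-Y line.

    Hypothetical helper: builds both lines from the means, standard
    deviations and the coefficient of correlation r.
    """
    b_yx = r * sd_y / sd_x  # change in Y per unit change in X
    b_xy = r * sd_x / sd_y  # change in X per unit change in Y
    y_on_x = (b_yx, mean_y - b_yx * mean_x)
    x_on_y = (b_xy, mean_x - b_xy * mean_y)
    return y_on_x, x_on_y


# Illustration 01: X = rainfall (cm), Y = yield (kg)
y_on_x, x_on_y = regression_lines(26.7, 508.4, 4.6, 36.8, 0.52)

yield_at_29cm = y_on_x[0] * 29 + y_on_x[1]        # about 517.968 kg
rainfall_at_600kg = x_on_y[0] * 600 + x_on_y[1]   # about 32.654 cm
print(round(yield_at_29cm, 3), round(rainfall_at_600kg, 3))
```

Each line is returned as a slope and an intercept so that any further estimate is a single multiplication and addition.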
##### Illustration 02

Find the regression equation showing the regression of capacity utilization on production from the following data.

| | Average | Standard deviation |
|---|---|---|
| Production (in lakh units) | 35.6 | 10.5 |
| Capacity utilization (in percentage) | 84.8 | 8.5 |

Coefficient of correlation = 0.62

Estimate the production when the capacity utilization is 70%.

Solution: Let production and capacity utilization be denoted by X and Y respectively. Then we are given $\bar{X} = 35.6$, $\bar{Y} = 84.8$, $\sigma_x = 10.5$, $\sigma_y = 8.5$ and $r = 0.62$.

To estimate production, we use the regression equation of X on Y.

$$
\begin{aligned}
X - \bar{X} &= r\,\frac{\sigma_x}{\sigma_y}\,(Y - \bar{Y}) \\
X - 35.6 &= 0.62 \times \frac{10.5}{8.5}\,(Y - 84.8) \\
X - 35.6 &= 0.7658\,(Y - 84.8) = 0.7658Y - 64.94 \\
X &= 0.7658Y - 29.34
\end{aligned}
$$

When Y = 70%: X = 0.7658 × 70 - 29.34 = 53.606 - 29.34 = 24.266 lakh units.

##### Illustration 03

Karl Pearson's coefficient of correlation between the ages of brothers and sisters in a community was found to be 0.8. The average of the brothers' ages was 25 years and that of the sisters' ages was 22 years. Their standard deviations were 4 and 5 respectively. Find:

a. The expected age of the brother when the sister's age is 12 years.
b. The expected age of the sister when the brother's age is 33 years.

Solution:

| | Brother (X) | Sister (Y) |
|---|---|---|
| Mean age | 25 years | 22 years |
| Standard deviation | 4 | 5 |

Coefficient of correlation = 0.8

To estimate the brother's age (X = ? when Y = 12), we use the regression equation of X on Y:

$$X - \bar{X} = r\,\frac{\sigma_x}{\sigma_y}\,(Y - \bar{Y})$$
$$
\begin{aligned}
X - 25 &= 0.8 \times \frac{4}{5}\,(Y - 22) \\
X - 25 &= 0.64\,(Y - 22) = 0.64Y - 14.08 \\
X &= 0.64Y + 10.92
\end{aligned}
$$

When Y = 12: X = 0.64 × 12 + 10.92 = 18.6 years (brother's age).

To estimate the sister's age (Y = ? when X = 33 years), we use the regression equation of Y on X.

$$
\begin{aligned}
Y - \bar{Y} &= r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X}) \\
Y - 22 &= 0.8 \times \frac{5}{4}\,(X - 25) \\
Y - 22 &= 1.0\,(X - 25) \\
Y &= X - 3
\end{aligned}
$$

When X = 33: Y = 33 - 3 = 30 years (sister's age).

##### Illustration 04

Given the following data, estimate:

1. The value of Y when X = 70
2. The value of X when Y = 90

| | X-series | Y-series |
|---|---|---|
| Mean | 18 | 100 |
| Standard deviation | 14 | 20 |

Coefficient of correlation = 0.8

Solution:

I. Y = ? when X = 70. Use the regression equation of Y on X.

$$
\begin{aligned}
Y - \bar{Y} &= r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X}) \\
Y - 100 &= 0.8 \times \frac{20}{14}\,(X - 18) \\
Y - 100 &= 1.143\,(X - 18) = 1.143X - 20.574 \\
Y &= 1.143X + 79.426
\end{aligned}
$$

When X = 70: Y = 1.143 × 70 + 79.426 = 80.01 + 79.426 = 159.436.

II. X = ? when Y = 90. Use the regression equation of X on Y.

$$
\begin{aligned}
X - \bar{X} &= r\,\frac{\sigma_x}{\sigma_y}\,(Y - \bar{Y}) \\
X - 18 &= 0.8 \times \frac{14}{20}\,(Y - 100) \\
X - 18 &= 0.56\,(Y - 100) = 0.56Y - 56 \\
X &= 0.56Y - 38
\end{aligned}
$$

When Y = 90: X = 0.56 × 90 - 38 = 50.4 - 38 = 12.4.
##### Illustration 05

To study the relationship between expenditure on accommodation (X) and expenditure on food (Y), an enquiry into 50 families gave the following results:

$\sum X = 8500$, $\sum Y = 9600$, $\sigma_x = 60$, $\sigma_y = 20$, $r = 0.60$

Estimate the expenditure on food when expenditure on accommodation is Rs. 200.

Solution: To estimate expenditure on food, we use the regression equation of Y on X.

$$\bar{X} = \frac{\sum X}{n} = \frac{8500}{50} = 170, \qquad \bar{Y} = \frac{\sum Y}{n} = \frac{9600}{50} = 192$$

$$
\begin{aligned}
Y - \bar{Y} &= r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X}) \\
Y - 192 &= 0.6 \times \frac{20}{60}\,(X - 170) \\
Y - 192 &= 0.2\,(X - 170) = 0.2X - 34 \\
Y &= 0.2X + 158
\end{aligned}
$$

When X = 200: Y = 0.2 × 200 + 158 = 40 + 158 = 198.

Rs. 198 is required to be spent on food.

##### Illustration 06

Obtain the two regression equations from the following:

| | X-series | Y-series |
|---|---|---|
| Mean | 20 | 25 |
| Variance | 4 | 9 |

Coefficient of correlation = 0.75

Solution: The standard deviations are the square roots of the variances:

$$\sigma_x = \sqrt{4} = 2, \qquad \sigma_y = \sqrt{9} = 3$$

Let $b_{xy}$ be the regression coefficient of X on Y and $b_{yx}$ the regression coefficient of Y on X, with $b_{xy} = r\,\sigma_x/\sigma_y$ and $b_{yx} = r\,\sigma_y/\sigma_x$.

Regression equation of X on Y:

$$
\begin{aligned}
X - \bar{X} &= b_{xy}\,(Y - \bar{Y}) \\
X - 20 &= 0.75 \times \frac{2}{3}\,(Y - 25) \\
X - 20 &= 0.5\,(Y - 25) = 0.5Y - 12.5 \\
X &= 0.5Y + 7.5
\end{aligned}
$$

Regression equation of Y on X:

$$
\begin{aligned}
Y - \bar{Y} &= b_{yx}\,(X - \bar{X}) \\
Y - 25 &= 0.75 \times \frac{3}{2}\,(X - 20) \\
Y - 25 &= 1.125\,(X - 20) = 1.125X - 22.5 \\
Y &= 1.125X + 2.5
\end{aligned}
$$
##### Illustration 07

You are given the following data.

| | X-series | Y-series |
|---|---|---|
| Mean | 47 | 96 |
| Variance | 64 | 81 |

Coefficient of correlation = 0.36

Calculate Y when X is 50, and X when Y is 88.

Solution:

$$\sigma_x = \sqrt{64} = 8, \qquad \sigma_y = \sqrt{81} = 9$$

Regression equation of X on Y, with $b_{xy} = r\,\sigma_x/\sigma_y$:

$$
\begin{aligned}
X - \bar{X} &= b_{xy}\,(Y - \bar{Y}) \\
X - 47 &= 0.36 \times \frac{8}{9}\,(Y - 96) \\
X - 47 &= 0.32\,(Y - 96) = 0.32Y - 30.72 \\
X &= 0.32Y + 16.28
\end{aligned}
$$

When Y = 88: X = 0.32 × 88 + 16.28 = 28.16 + 16.28 = 44.44.

Regression equation of Y on X, with $b_{yx} = r\,\sigma_y/\sigma_x$:

$$
\begin{aligned}
Y - \bar{Y} &= b_{yx}\,(X - \bar{X}) \\
Y - 96 &= 0.36 \times \frac{9}{8}\,(X - 47) \\
Y - 96 &= 0.405\,(X - 47) = 0.405X - 19.035 \\
Y &= 0.405X + 76.965
\end{aligned}
$$

When X = 50: Y = 0.405 × 50 + 76.965 = 20.25 + 76.965 = 97.215.

##### Illustration 08

The following results for heights and weights of 100 men were calculated.

| | Mean | Standard deviation |
|---|---|---|
| Weights | 150 lbs | 20 lbs |
| Heights | 68" | 2.5" |

Coefficient of correlation = 0.60

Estimate:

1. The weight of a man whose height is 5' (5' = 60")
2. The height of a man whose weight is 200 lbs

Solution: Let X = weight and Y = height.
Regression equation of X on Y:

$$
\begin{aligned}
X - \bar{X} &= b_{xy}\,(Y - \bar{Y}) \\
X - 150 &= 0.6 \times \frac{20}{2.5}\,(Y - 68) \\
X - 150 &= 4.8\,(Y - 68) = 4.8Y - 326.4 \\
X &= 4.8Y - 176.4
\end{aligned}
$$

When Y = 60": X = 4.8 × 60 - 176.4 = 288 - 176.4 = 111.6 lbs.

Regression equation of Y on X:

$$
\begin{aligned}
Y - \bar{Y} &= b_{yx}\,(X - \bar{X}) \\
Y - 68 &= 0.6 \times \frac{2.5}{20}\,(X - 150) \\
Y - 68 &= 0.075\,(X - 150) = 0.075X - 11.25 \\
Y &= 0.075X + 56.75
\end{aligned}
$$

When X = 200 lbs: Y = 0.075 × 200 + 56.75 = 15 + 56.75 = 71.75 inches.

#### C. Regression Coefficients

A regression coefficient is denoted by b. There are two regression equations and therefore two regression coefficients. A regression coefficient measures the change in one series corresponding to a unit change in the other series.

The regression coefficient of X on Y,

$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y},$$

gives the value by which the X-variable changes for a unit change in the value of the Y-variable. In terms of deviations dx and dy taken from assumed means:

$$b_{xy} = \frac{n\sum dx\,dy - \sum dx \sum dy}{n\sum dy^2 - \left(\sum dy\right)^2}$$

Similarly, the regression coefficient of Y on X,

$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x},$$

refers to the value by which the Y-variable changes for a unit change in the X-variable:

$$b_{yx} = \frac{n\sum dx\,dy - \sum dx \sum dy}{n\sum dx^2 - \left(\sum dx\right)^2}$$

These two coefficients measure the change in the dependent variable corresponding to a unit change in the independent variable. They also help in the direct calculation of the coefficient of correlation: the square root of the product of the two regression coefficients gives the value of the correlation, as under.

$$b_{xy} \times b_{yx} = r\,\frac{\sigma_x}{\sigma_y} \times r\,\frac{\sigma_y}{\sigma_x} = r^2 \qquad \therefore\; r = \pm\sqrt{b_{xy} \times b_{yx}}$$

The sign of r is the same as the common sign of the two regression coefficients.
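The deviation formulas for the two regression coefficients translate directly into code. The following is a minimal sketch, assuming raw paired data and arbitrary assumed means; the function names are hypothetical. It is applied here to the age and blood-pressure data of the illustration that follows, with assumed means 47 and 128.

```python
from math import sqrt


def regression_coefficients(xs, ys, assumed_x, assumed_y):
    """Return (b_xy, b_yx) using deviations from assumed means."""
    n = len(xs)
    dx = [x - assumed_x for x in xs]
    dy = [y - assumed_y for y in ys]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    numerator = n * sum_dxdy - sum_dx * sum_dy
    b_xy = numerator / (n * sum_dy2 - sum_dy ** 2)  # X on Y
    b_yx = numerator / (n * sum_dx2 - sum_dx ** 2)  # Y on X
    return b_xy, b_yx


def correlation_from_coefficients(b_xy, b_yx):
    """r is the square root of the product, carrying the coefficients' sign."""
    r = sqrt(b_xy * b_yx)
    return -r if b_xy < 0 else r


ages = [56, 42, 72, 36, 63, 47, 55, 49, 38, 42, 68, 60]
bp = [147, 125, 160, 118, 149, 128, 150, 145, 115, 140, 152, 155]

b_xy, b_yx = regression_coefficients(ages, bp, 47, 128)
r = correlation_from_coefficients(b_xy, b_yx)
print(round(b_xy, 4), round(b_yx, 3), round(r, 3))
```

The choice of assumed means does not affect the result, since both numerator and denominators are invariant to a shift of origin; any convenient round values can be used.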
#### Calculation of Regression Coefficients and Estimation of Unknown Values

Individual series: when actual data are given and deviations are taken from an assumed mean.

##### Illustration 09

From the data given below find:

a. The regression coefficients
b. The regression equations
c. An estimate of the age when B.P. is 130
d. An estimate of the B.P. when age is 50 years
e. The coefficient of correlation through the regression coefficients

| Age | 56 | 42 | 72 | 36 | 63 | 47 | 55 | 49 | 38 | 42 | 68 | 60 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| B.P. | 147 | 125 | 160 | 118 | 149 | 128 | 150 | 145 | 115 | 140 | 152 | 155 |

Solution:

| Age (X) | dx = X - 47 | dx² | B.P. (Y) | dy = Y - 128 | dy² | dx·dy |
|---|---|---|---|---|---|---|
| 56 | 9 | 81 | 147 | 19 | 361 | 171 |
| 42 | -5 | 25 | 125 | -3 | 9 | 15 |
| 72 | 25 | 625 | 160 | 32 | 1024 | 800 |
| 36 | -11 | 121 | 118 | -10 | 100 | 110 |
| 63 | 16 | 256 | 149 | 21 | 441 | 336 |
| 47 | 0 | 0 | 128 | 0 | 0 | 0 |
| 55 | 8 | 64 | 150 | 22 | 484 | 176 |
| 49 | 2 | 4 | 145 | 17 | 289 | 34 |
| 38 | -9 | 81 | 115 | -13 | 169 | 117 |
| 42 | -5 | 25 | 140 | 12 | 144 | -60 |
| 68 | 21 | 441 | 152 | 24 | 576 | 504 |
| 60 | 13 | 169 | 155 | 27 | 729 | 351 |
| N = 12 | Σdx = 64 | Σdx² = 1892 | | Σdy = 148 | Σdy² = 4326 | Σdxdy = 2554 |

$$\bar{X} = A + \frac{\sum dx}{N} = 47 + \frac{64}{12} = 52.33, \qquad \bar{Y} = A + \frac{\sum dy}{N} = 128 + \frac{148}{12} = 128 + 12.33 = 140.33$$

Regression coefficient of X on Y:

$$b_{xy} = \frac{n\sum dx\,dy - \sum dx \sum dy}{n\sum dy^2 - (\sum dy)^2} = \frac{2554 \times 12 - 64 \times 148}{4326 \times 12 - 148^2} = \frac{30648 - 9472}{51912 - 21904} = \frac{21176}{30008} = 0.7057$$

Regression coefficient of Y on X:

$$b_{yx} = \frac{n\sum dx\,dy - \sum dx \sum dy}{n\sum dx^2 - (\sum dx)^2} = \frac{2554 \times 12 - 64 \times 148}{1892 \times 12 - 64^2} = \frac{21176}{22704 - 4096} = \frac{21176}{18608} = 1.138$$

Regression equation of X on Y: $X - \bar{X} = b_{xy}(Y - \bar{Y})$, that is, $X - 52.33 = 0.7057\,(Y - 140.33)$.

Regression equation of Y on X: $Y - \bar{Y} = b_{yx}(X - \bar{X})$, that is, $Y - 140.33 = 1.138\,(X - 52.33)$.
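Expanding the regression equation of X on Y:

$$
\begin{aligned}
X - 52.33 &= 0.7057Y - 99.031 \\
X &= 0.7057Y - 99.031 + 52.33 \\
X &= 0.7057Y - 46.701
\end{aligned}
$$

Expanding the regression equation of Y on X:

$$
\begin{aligned}
Y - 140.33 &= 1.138X - 59.55 \\
Y &= 1.138X - 59.55 + 140.33 \\
Y &= 1.138X + 80.78
\end{aligned}
$$

Estimation of age (X) when B.P. (Y) is 130: X = 0.7057 × 130 - 46.701 = 91.741 - 46.701 = 45.04 years.

Estimation of B.P. (Y) when age (X) is 50 years: Y = 1.138 × 50 + 80.78 = 56.9 + 80.78 = 137.68.

Coefficient of correlation: $r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{0.7057 \times 1.138} = 0.896$

##### Illustration 10

From the following data, obtain the two regression equations. Also calculate the coefficient of correlation based on the regression coefficients.

| Sales (X) | 91 | 97 | 108 | 121 | 67 | 124 | 51 | 73 | 111 | 57 |
|---|---|---|---|---|---|---|---|---|---|---|
| Purchases (Y) | 71 | 75 | 69 | 97 | 70 | 91 | 39 | 61 | 80 | 47 |

Solution:

| X | dx = X - 67 | dx² | Y | dy = Y - 70 | dy² | dx·dy |
|---|---|---|---|---|---|---|
| 91 | 24 | 576 | 71 | 1 | 1 | 24 |
| 97 | 30 | 900 | 75 | 5 | 25 | 150 |
| 108 | 41 | 1681 | 69 | -1 | 1 | -41 |
| 121 | 54 | 2916 | 97 | 27 | 729 | 1458 |
| 67 | 0 | 0 | 70 | 0 | 0 | 0 |
| 124 | 57 | 3249 | 91 | 21 | 441 | 1197 |
| 51 | -16 | 256 | 39 | -31 | 961 | 496 |
| 73 | 6 | 36 | 61 | -9 | 81 | -54 |
| 111 | 44 | 1936 | 80 | 10 | 100 | 440 |
| 57 | -10 | 100 | 47 | -23 | 529 | 230 |
| N = 10 | Σdx = 230 | Σdx² = 11650 | | Σdy = 0 | Σdy² = 2868 | Σdxdy = 3900 |

$$\bar{X} = A + \frac{\sum dx}{N} = 67 + \frac{230}{10} = 90, \qquad \bar{Y} = A + \frac{\sum dy}{N} = 70 + \frac{0}{10} = 70$$

Regression coefficient of X on Y:

$$b_{xy} = \frac{n\sum dx\,dy - \sum dx \sum dy}{n\sum dy^2 - (\sum dy)^2} = \frac{3900 \times 10 - 230 \times 0}{2868 \times 10 - 0^2} = \frac{39000}{28680} = 1.3598$$

Regression coefficient of Y on X:

$$b_{yx} = \frac{n\sum dx\,dy - \sum dx \sum dy}{n\sum dx^2 - (\sum dx)^2} = \frac{3900 \times 10 - 230 \times 0}{11650 \times 10 - 230^2} = \frac{39000}{116500 - 52900} = \frac{39000}{63600} = 0.6132$$

Regression equation of X on Y:

$$
\begin{aligned}
X - \bar{X} &= b_{xy}\,(Y - \bar{Y}) \\
X - 90 &= 1.3598\,(Y - 70) = 1.3598Y - 95.186 \\
X &= 1.3598Y - 5.186
\end{aligned}
$$

Regression equation of Y on X:

$$
\begin{aligned}
Y - \bar{Y} &= b_{yx}\,(X - \bar{X}) \\
Y - 70 &= 0.6132\,(X - 90) = 0.6132X - 55.188 \\
Y &= 0.6132X + 14.812
\end{aligned}
$$

Coefficient of correlation: $r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{1.3598 \times 0.6132} = 0.913$

##### Illustration 11

The following data relate to the ages of husbands and wives. Obtain the two regression equations and estimate the most likely age of the husband when the age of the wife is 25 years.

| Ages of husbands | 25 | 28 | 30 | 32 | 35 | 36 | 38 | 39 | 42 | 55 |
|---|---|---|---|---|---|---|---|---|---|---|
| Ages of wives | 20 | 26 | 29 | 30 | 25 | 18 | 26 | 35 | 35 | 46 |

Solution:

| X | dx = X - 36 | dx² | Y | dy = Y - 29 | dy² | dx·dy |
|---|---|---|---|---|---|---|
| 25 | -11 | 121 | 20 | -9 | 81 | 99 |
| 28 | -8 | 64 | 26 | -3 | 9 | 24 |
| 30 | -6 | 36 | 29 | 0 | 0 | 0 |
| 32 | -4 | 16 | 30 | 1 | 1 | -4 |
| 35 | -1 | 1 | 25 | -4 | 16 | 4 |
| 36 | 0 | 0 | 18 | -11 | 121 | 0 |
| 38 | 2 | 4 | 26 | -3 | 9 | -6 |
| 39 | 3 | 9 | 35 | 6 | 36 | 18 |
| 42 | 6 | 36 | 35 | 6 | 36 | 36 |
| 55 | 19 | 361 | 46 | 17 | 289 | 323 |
| N = 10 | Σdx = 0 | Σdx² = 648 | | Σdy = 0 | Σdy² = 598 | Σdxdy = 494 |

$$\bar{X} = A + \frac{\sum dx}{N} = 36 + \frac{0}{10} = 36, \qquad \bar{Y} = A + \frac{\sum dy}{N} = 29 + \frac{0}{10} = 29$$

Regression coefficient of X on Y, $b_{xy} = r\,\sigma_x/\sigma_y$:

$$b_{xy} = \frac{n\sum dx\,dy - \sum dx \sum dy}{n\sum dy^2 - (\sum dy)^2} = \frac{494 \times 10 - 0 \times 0}{598 \times 10 - 0^2} = \frac{4940}{5980} = 0.8261$$

Regression coefficient of Y on X, $b_{yx} = r\,\sigma_y/\sigma_x$:

$$b_{yx} = \frac{n\sum dx\,dy - \sum dx \sum dy}{n\sum dx^2 - (\sum dx)^2} = \frac{494 \times 10 - 0 \times 0}{648 \times 10 - 0^2} = \frac{4940}{6480} = 0.7623$$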
Regression equation of X on Y:

$$
\begin{aligned}
X - \bar{X} &= b_{xy}\,(Y - \bar{Y}) \\
X - 36 &= 0.8261\,(Y - 29) = 0.8261Y - 23.9569 \\
X &= 0.8261Y + 12.0431
\end{aligned}
$$

Regression equation of Y on X:

$$
\begin{aligned}
Y - \bar{Y} &= b_{yx}\,(X - \bar{X}) \\
Y - 29 &= 0.7623\,(X - 36) = 0.7623X - 27.4428 \\
Y &= 0.7623X + 1.5572
\end{aligned}
$$

If the wife's age (Y) is 25: X = 0.8261 × 25 + 12.0431 = 20.6525 + 12.0431 = 32.6956. The husband's age is 32.6956 years.

Coefficient of correlation: $r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{0.8261 \times 0.7623} = 0.7935$

##### Illustration 12

A panel of two judges, P and Q, graded dramatic performances by independently awarding marks as follows.

| Performance | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Marks by P | 46 | 42 | 44 | 40 | 43 | 41 | 45 |
| Marks by Q | 40 | 38 | 36 | 35 | 39 | 37 | 41 |

The eighth performance, which judge Q could not attend, was awarded 37 marks by judge P. If judge Q had also been present, how many marks could he be expected to have awarded to the eighth performance?

Solution: Let the marks awarded by judge P be represented by X and those awarded by judge Q by Y. We have to find the value of Y when X = 37. This can be done by finding the regression equation of Y on X.

| X | dx = X - 43 | dx² | Y | dy = Y - 38 | dy² | dx·dy |
|---|---|---|---|---|---|---|
| 46 | 3 | 9 | 40 | 2 | 4 | 6 |
| 42 | -1 | 1 | 38 | 0 | 0 | 0 |
| 44 | 1 | 1 | 36 | -2 | 4 | -2 |
| 40 | -3 | 9 | 35 | -3 | 9 | 9 |
| 43 | 0 | 0 | 39 | 1 | 1 | 0 |
| 41 | -2 | 4 | 37 | -1 | 1 | 2 |
| 45 | 2 | 4 | 41 | 3 | 9 | 6 |
| N = 7 | Σdx = 0 | Σdx² = 28 | | Σdy = 0 | Σdy² = 28 | Σdxdy = 21 |

$$\bar{X} = A + \frac{\sum dx}{N} = 43 + \frac{0}{7} = 43, \qquad \bar{Y} = A + \frac{\sum dy}{N} = 38 + \frac{0}{7} = 38$$

Regression equation of Y on X: $Y - \bar{Y} = b_{yx}(X - \bar{X})$, that is, $Y - 38 = b_{yx}(X - 43)$, where

$$b_{yx} = \frac{n\sum dx\,dy - \sum dx \sum dy}{n\sum dx^2 - (\sum dx)^2} = \frac{21 \times 7 - 0}{28 \times 7 - 0} = \frac{147}{196} = 0.75$$

$$
\begin{aligned}
Y - 38 &= 0.75\,(X - 43) = 0.75X - 32.25 \\
Y &= 0.75X + 5.75
\end{aligned}
$$

When X = 37: Y = 0.75 × 37 + 5.75 = 33.5.
If judge Q had been present, he would have awarded 33.5 marks.

#### Regression Equations in a Bivariate Grouped Frequency Distribution

The procedure is the same as that followed for individual series. The modified formulas, in which $c_x$ and $c_y$ are the class intervals of X and Y, are as under.

Regression coefficient of X on Y, $b_{xy} = r\,\sigma_x/\sigma_y$:

$$b_{xy} = \frac{N\sum f\,dx\,dy - \sum f\,dx \sum f\,dy}{N\sum f\,dy^2 - \left(\sum f\,dy\right)^2} \times \frac{c_x}{c_y}$$

Regression coefficient of Y on X, $b_{yx} = r\,\sigma_y/\sigma_x$:

$$b_{yx} = \frac{N\sum f\,dx\,dy - \sum f\,dx \sum f\,dy}{N\sum f\,dx^2 - \left(\sum f\,dx\right)^2} \times \frac{c_y}{c_x}$$

Coefficient of correlation: $r = \sqrt{b_{xy} \times b_{yx}}$

##### Illustration 13

The following table gives the ages of husbands and wives for 50 newly married couples. Find the two regression lines. Also estimate (a) the age of the husband when the wife is 20 and (b) the age of the wife when the husband is 30.

| Age of wives | Husbands 20-25 | Husbands 25-30 | Husbands 30-35 | Total |
|---|---|---|---|---|
| 16-20 | 9 | 14 | - | 23 |
| 20-24 | 6 | 11 | 3 | 20 |
| 24-28 | - | - | 7 | 7 |
| Total | 15 | 25 | 10 | 50 |

Solution: The class interval for the age of the husband (X) is 5 and the class interval for the age of the wife (Y) is 4. The step deviations are:

$$dx = \frac{X - 27.5}{5}, \qquad dy = \frac{Y - 22}{4}$$
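For X: A = 27.5, C = 5. For Y: A = 22, C = 4.

| Y | Mid value | dy | X: 20-25 (22.5, dx = -1) | X: 25-30 (27.5, dx = 0) | X: 30-35 (32.5, dx = 1) | f | f·dy | f·dy² | f·dx·dy |
|---|---|---|---|---|---|---|---|---|---|
| 16-20 | 18 | -1 | 9 | 14 | - | 23 | -23 | 23 | 9 |
| 20-24 | 22 | 0 | 6 | 11 | 3 | 20 | 0 | 0 | 0 |
| 24-28 | 26 | 1 | - | - | 7 | 7 | 7 | 7 | 7 |
| Total f | | | 15 | 25 | 10 | N = 50 | Σfdy = -16 | Σfdy² = 30 | Σfdxdy = 16 |
| f·dx | | | -15 | 0 | 10 | Σfdx = -5 | | | |
| f·dx² | | | 15 | 0 | 10 | Σfdx² = 25 | | | |
| f·dx·dy | | | 9 | 0 | 7 | Σfdxdy = 16 | | | |

$$\bar{X} = A + \frac{\sum f\,dx}{N} \times C = 27.5 + \frac{-5}{50} \times 5 = 27$$

$$\bar{Y} = A + \frac{\sum f\,dy}{N} \times C = 22 + \frac{-16}{50} \times 4 = 22 - 1.28 = 20.72$$

Regression coefficient of X on Y:

$$b_{xy} = \frac{N\sum f\,dx\,dy - \sum f\,dx \sum f\,dy}{N\sum f\,dy^2 - (\sum f\,dy)^2} \times \frac{c_x}{c_y} = \frac{16 \times 50 - (-5)(-16)}{30 \times 50 - (-16)^2} \times \frac{5}{4} = \frac{800 - 80}{1500 - 256} \times \frac{5}{4} = \frac{720}{1244} \times \frac{5}{4} = \frac{3600}{4976} = 0.723$$

Regression coefficient of Y on X:

$$b_{yx} = \frac{N\sum f\,dx\,dy - \sum f\,dx \sum f\,dy}{N\sum f\,dx^2 - (\sum f\,dx)^2} \times \frac{c_y}{c_x} = \frac{16 \times 50 - (-5)(-16)}{25 \times 50 - (-5)^2} \times \frac{4}{5} = \frac{720}{1225} \times \frac{4}{5} = \frac{2880}{6125} = 0.47$$

Regression equation of X on Y:

$$
\begin{aligned}
X - \bar{X} &= b_{xy}\,(Y - \bar{Y}) \\
X - 27 &= 0.723\,(Y - 20.72) = 0.723Y - 14.98 \\
X &= 0.723Y + 12.02
\end{aligned}
$$

Estimate of the husband's age when Y = 20: X = 0.723 × 20 + 12.02 = 26.48 years.

Regression equation of Y on X:

$$
\begin{aligned}
Y - \bar{Y} &= b_{yx}\,(X - \bar{X}) \\
Y - 20.72 &= 0.47\,(X - 27) = 0.47X - 12.69 \\
Y &= 0.47X + 8.03
\end{aligned}
$$

Estimate of the wife's age when X = 30: Y = 0.47 × 30 + 8.03 = 14.10 + 8.03 = 22.13 years.

Coefficient of correlation: $r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{0.723 \times 0.47} = 0.5829$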
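The step-deviation procedure used for the husbands-and-wives frequency table above can be sketched as a short Python function. This is a sketch under stated assumptions: the function name, the nested-list layout of the table (rows are Y classes, columns are X classes) and the use of 0 for empty cells are choices made for illustration.

```python
def grouped_regression(table, dx, dy, class_x, class_y):
    """Return (b_xy, b_yx) for a bivariate frequency table.

    table[i][j] is the frequency of Y-class i and X-class j;
    dx[j] and dy[i] are the step deviations of the class mid values.
    """
    cells = [(f, dx[j], dy[i])
             for i, row in enumerate(table)
             for j, f in enumerate(row)]
    n = sum(f for f, _, _ in cells)
    s_fdx = sum(f * a for f, a, _ in cells)
    s_fdy = sum(f * b for f, _, b in cells)
    s_fdx2 = sum(f * a * a for f, a, _ in cells)
    s_fdy2 = sum(f * b * b for f, _, b in cells)
    s_fdxdy = sum(f * a * b for f, a, b in cells)
    numerator = n * s_fdxdy - s_fdx * s_fdy
    # Rescale from step units back to original units with the class intervals.
    b_xy = numerator / (n * s_fdy2 - s_fdy ** 2) * class_x / class_y
    b_yx = numerator / (n * s_fdx2 - s_fdx ** 2) * class_y / class_x
    return b_xy, b_yx


# Rows: wife's age 16-20, 20-24, 24-28; columns: husband's age 20-25, 25-30, 30-35
couples = [[9, 14, 0],
           [6, 11, 3],
           [0, 0, 7]]
b_xy, b_yx = grouped_regression(couples, dx=[-1, 0, 1], dy=[-1, 0, 1],
                                class_x=5, class_y=4)
print(round(b_xy, 3), round(b_yx, 3))
```

The ratio of class intervals is applied last because the sums are computed in step units, where one step of X is worth 5 years and one step of Y is worth 4 years.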
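##### Illustration 14

The following are the marks obtained by 132 students in Test X and Test Y. Calculate (a) the regression coefficients, (b) the two regression equations and (c) the coefficient of correlation.

| Y | X: 30-40 | X: 40-50 | X: 50-60 | X: 60-70 | X: 70-80 | Total |
|---|---|---|---|---|---|---|
| 20-30 | 2 | 5 | 3 | - | - | 10 |
| 30-40 | 1 | 8 | 12 | 6 | - | 27 |
| 40-50 | - | 5 | 22 | 14 | 1 | 42 |
| 50-60 | - | 2 | 16 | 9 | 2 | 29 |
| 60-70 | - | 1 | 8 | 6 | 1 | 16 |
| 70-80 | - | - | 2 | 4 | 2 | 8 |
| Total | 3 | 21 | 63 | 39 | 6 | 132 |

Solution: For X: A = 55, C = 10. For Y: A = 45, C = 10.

| Y | Mid value | dy | X: 30-40 (35, dx = -2) | X: 40-50 (45, dx = -1) | X: 50-60 (55, dx = 0) | X: 60-70 (65, dx = 1) | X: 70-80 (75, dx = 2) | f | f·dy | f·dy² | f·dx·dy |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 20-30 | 25 | -2 | 2 | 5 | 3 | - | - | 10 | -20 | 40 | 18 |
| 30-40 | 35 | -1 | 1 | 8 | 12 | 6 | - | 27 | -27 | 27 | 4 |
| 40-50 | 45 | 0 | - | 5 | 22 | 14 | 1 | 42 | 0 | 0 | 0 |
| 50-60 | 55 | 1 | - | 2 | 16 | 9 | 2 | 29 | 29 | 29 | 11 |
| 60-70 | 65 | 2 | - | 1 | 8 | 6 | 1 | 16 | 32 | 64 | 14 |
| 70-80 | 75 | 3 | - | - | 2 | 4 | 2 | 8 | 24 | 72 | 24 |
| Total f | | | 3 | 21 | 63 | 39 | 6 | N = 132 | Σfdy = 38 | Σfdy² = 232 | Σfdxdy = 71 |
| f·dx | | | -6 | -21 | 0 | 39 | 12 | Σfdx = 24 | | | |
| f·dx² | | | 12 | 21 | 0 | 39 | 24 | Σfdx² = 96 | | | |
| f·dx·dy | | | 10 | 14 | 0 | 27 | 20 | Σfdxdy = 71 | | | |

$$\bar{X} = A + \frac{\sum f\,dx}{N} \times C = 55 + \frac{24}{132} \times 10 = 55 + \frac{240}{132} = 55 + 1.82 = 56.82$$

$$\bar{Y} = A + \frac{\sum f\,dy}{N} \times C = 45 + \frac{38}{132} \times 10 = 45 + \frac{380}{132} = 45 + 2.878 = 47.878$$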
Regression coefficient of X on Y:

$$b_{xy} = \frac{N\sum f\,dx\,dy - \sum f\,dx \sum f\,dy}{N\sum f\,dy^2 - (\sum f\,dy)^2} \times \frac{c_x}{c_y} = \frac{71 \times 132 - 24 \times 38}{232 \times 132 - 38^2} \times \frac{10}{10} = \frac{9372 - 912}{30624 - 1444} = \frac{8460}{29180} = 0.29$$

Regression coefficient of Y on X:

$$b_{yx} = \frac{N\sum f\,dx\,dy - \sum f\,dx \sum f\,dy}{N\sum f\,dx^2 - (\sum f\,dx)^2} \times \frac{c_y}{c_x} = \frac{71 \times 132 - 24 \times 38}{96 \times 132 - 24^2} \times \frac{10}{10} = \frac{8460}{12672 - 576} = \frac{8460}{12096} = 0.699$$

Regression equation of X on Y:

$$
\begin{aligned}
X - \bar{X} &= b_{xy}\,(Y - \bar{Y}) \\
X - 56.82 &= 0.29\,(Y - 47.88) = 0.29Y - 13.8852 \\
X &= 0.29Y + 42.93
\end{aligned}
$$

Regression equation of Y on X (taking $b_{yx} \approx 0.7$):

$$
\begin{aligned}
Y - \bar{Y} &= b_{yx}\,(X - \bar{X}) \\
Y - 47.88 &= 0.7\,(X - 56.82) = 0.7X - 39.774 \\
Y &= 0.7X + 8.11
\end{aligned}
$$

Coefficient of correlation: $r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{0.29 \times 0.7} = 0.450$

##### Illustration 15

Following is the distribution of students according to their height and weight.

| Height in inches (X) | Weight in lbs (Y): 90-100 | 100-110 | 110-120 | 120-130 | Total |
|---|---|---|---|---|---|
| 50-55 | 4 | 7 | 5 | 2 | 18 |
| 55-60 | 6 | 10 | 7 | 4 | 27 |
| 60-65 | 6 | 12 | 10 | 7 | 35 |
| 65-70 | 3 | 8 | 6 | 3 | 20 |
| Total | 19 | 37 | 28 | 16 | 100 |

From the above:

a) Estimate the weight when height is 63 inches
b) Estimate the height when weight is 115 lbs
c) Calculate the coefficient of correlation

Solution: Let X be the height in inches and Y the weight in lbs.

$$\bar{X} = A + \frac{\sum f\,dx}{N} \times C = 62.5 + \frac{-43}{100} \times 5 = 62.5 - \frac{215}{100} = 60.35$$

$$\bar{Y} = A + \frac{\sum f\,dy}{N} \times C = 115 + \frac{-59}{100} \times 10 = 115 - \frac{590}{100} = 109.1$$
| X | Mid value | dx | Y: 90-100 (95, dy = -2) | Y: 100-110 (105, dy = -1) | Y: 110-120 (115, dy = 0) | Y: 120-130 (125, dy = 1) | f | f·dx | f·dx² | f·dx·dy |
|---|---|---|---|---|---|---|---|---|---|---|
| 50-55 | 52.5 | -2 | 4 | 7 | 5 | 2 | 18 | -36 | 72 | 26 |
| 55-60 | 57.5 | -1 | 6 | 10 | 7 | 4 | 27 | -27 | 27 | 18 |
| 60-65 | 62.5 | 0 | 6 | 12 | 10 | 7 | 35 | 0 | 0 | 0 |
| 65-70 | 67.5 | 1 | 3 | 8 | 6 | 3 | 20 | 20 | 20 | -11 |
| Total f | | | 19 | 37 | 28 | 16 | N = 100 | Σfdx = -43 | Σfdx² = 119 | Σfdxdy = 33 |
| f·dy | | | -38 | -37 | 0 | 16 | Σfdy = -59 | | | |
| f·dy² | | | 76 | 37 | 0 | 16 | Σfdy² = 129 | | | |
| f·dx·dy | | | 22 | 16 | 0 | -5 | Σfdxdy = 33 | | | |

Regression coefficient of X on Y, $b_{xy} = r\,\sigma_x/\sigma_y$:

$$b_{xy} = \frac{N\sum f\,dx\,dy - \sum f\,dx \sum f\,dy}{N\sum f\,dy^2 - (\sum f\,dy)^2} \times \frac{c_x}{c_y} = \frac{33 \times 100 - (-43)(-59)}{129 \times 100 - (-59)^2} \times \frac{5}{10} = \frac{3300 - 2537}{12900 - 3481} \times 0.5 = \frac{763}{9419} \times 0.5 = 0.0405$$

Regression coefficient of Y on X, $b_{yx} = r\,\sigma_y/\sigma_x$:

$$b_{yx} = \frac{N\sum f\,dx\,dy - \sum f\,dx \sum f\,dy}{N\sum f\,dx^2 - (\sum f\,dx)^2} \times \frac{c_y}{c_x} = \frac{33 \times 100 - (-43)(-59)}{119 \times 100 - (-43)^2} \times \frac{10}{5} = \frac{3300 - 2537}{11900 - 1849} \times 2 = \frac{763}{10051} \times 2 = 0.1518$$

Regression equation of X on Y:

$$
\begin{aligned}
X - \bar{X} &= b_{xy}\,(Y - \bar{Y}) \\
X - 60.35 &= 0.0405\,(Y - 109.1) = 0.0405Y - 4.41855 \\
X &= 0.0405Y + 55.93145
\end{aligned}
$$

Estimation of height (X) when weight (Y) is 115 lbs: X = 0.0405 × 115 + 55.93145 = 4.6575 + 55.93145 = 60.6 inches.

Regression equation of Y on X:

$$
\begin{aligned}
Y - \bar{Y} &= b_{yx}\,(X - \bar{X}) \\
Y - 109.1 &= 0.1518\,(X - 60.35) = 0.1518X - 9.16113 \\
Y &= 0.1518X + 99.93887
\end{aligned}
$$

Estimation of weight (Y) when height (X) is 63 inches: Y = 0.1518 × 63 + 99.93887 = 9.5634 + 99.93887 = 109.5 lbs.

Coefficient of correlation: $r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{0.0405 \times 0.1518} = 0.0784$
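As a cross-check on the step-deviation shortcut, the same coefficients for the height-and-weight table can be computed directly from the class mid values, using frequency-weighted means, variances and covariance. This swaps in a different route from the one used in the notes; the function name and table layout (rows are X classes, columns are Y classes) are assumptions for illustration.

```python
def coefficients_from_midpoints(table, x_mid, y_mid):
    """Return (mean_x, mean_y, b_xy, b_yx) from class mid values.

    table[i][j] is the frequency of X-class i and Y-class j.
    Uses b_xy = cov / var_y and b_yx = cov / var_x.
    """
    cells = [(f, x_mid[i], y_mid[j])
             for i, row in enumerate(table)
             for j, f in enumerate(row)]
    n = sum(f for f, _, _ in cells)
    mean_x = sum(f * x for f, x, _ in cells) / n
    mean_y = sum(f * y for f, _, y in cells) / n
    cov = sum(f * (x - mean_x) * (y - mean_y) for f, x, y in cells) / n
    var_x = sum(f * (x - mean_x) ** 2 for f, x, _ in cells) / n
    var_y = sum(f * (y - mean_y) ** 2 for f, _, y in cells) / n
    return mean_x, mean_y, cov / var_y, cov / var_x


# Rows: height 50-55 ... 65-70; columns: weight 90-100 ... 120-130
students = [[4, 7, 5, 2],
            [6, 10, 7, 4],
            [6, 12, 10, 7],
            [3, 8, 6, 3]]
mean_x, mean_y, b_xy, b_yx = coefficients_from_midpoints(
    students, x_mid=[52.5, 57.5, 62.5, 67.5], y_mid=[95, 105, 115, 125])
print(round(mean_x, 2), round(mean_y, 1), round(b_xy, 4), round(b_yx, 4))
```

Agreement between the two routes confirms that the choice of assumed mean and the scaling by class interval do not change the regression coefficients.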
##### Illustration 16

From the following data find:

a) The most probable value of Y when X is 60
b) The most probable value of X when Y is 40
c) The coefficient of correlation

$\bar{X} = 53.2$, $\bar{Y} = 27.9$, $b_{yx} = -1.5$ and $b_{xy} = -0.2$

Solution:

Regression equation of Y on X:

$$
\begin{aligned}
Y - \bar{Y} &= b_{yx}\,(X - \bar{X}) \\
Y - 27.9 &= -1.5\,(X - 53.2) = -1.5X + 79.8 \\
Y &= -1.5X + 107.7
\end{aligned}
$$

If X is 60: Y = -1.5 × 60 + 107.7 = -90 + 107.7 = 17.7.

Regression equation of X on Y:

$$
\begin{aligned}
X - \bar{X} &= b_{xy}\,(Y - \bar{Y}) \\
X - 53.2 &= -0.2\,(Y - 27.9) = -0.2Y + 5.58 \\
X &= -0.2Y + 58.78
\end{aligned}
$$

If Y is 40: X = -0.2 × 40 + 58.78 = 50.78.

Coefficient of correlation: since both regression coefficients are negative, r is negative.

$$r = -\sqrt{b_{xy} \times b_{yx}} = -\sqrt{(-0.2) \times (-1.5)} = -0.5477$$

#### Theoretical Questions (5, 10 and 15 Marks)

1. What is meant by regression? How is this concept useful to business forecasting?
2. Distinguish clearly between correlation and regression analysis.
3. What is regression analysis? State its uses.
4. Define regression and explain its importance.
5. Briefly explain: a. Regression line b. Regression equation c. Regression coefficient

#### Practical Problems

##### Problem 06

Given the following data, calculate:

a. The expected value of Y when X = 60
b. The expected value of X when Y = 120

| | X | Y |
|---|---|---|
| Mean | 65 | 120 |
| S.D. | 5 | 10 |

Coefficient of correlation = 0.6

[Answers: X = 65, Y = 114]

##### Problem 07

Given the following data, estimate the marks in Mathematics for a student who has secured 60 marks in English.

| | |
|---|---|
| Arithmetic average of marks in Mathematics | 80 |
| Arithmetic average of marks in English | 50 |
| S.D. of marks in Mathematics | 15 |
| S.D. of marks in English | 10 |
| Coefficient of correlation | 0.4 |

[Answer: 86]
##### Problem 08

Find the most likely price in Bangalore corresponding to the price of Rs. 70 at Mysore from the following data.

| | |
|---|---|
| Average price at Mysore | Rs. 65 |
| Average price at Bangalore | Rs. 67 |
| S.D. of price at Mysore | Rs. 2.5 |
| S.D. of price at Bangalore | Rs. 3.5 |

The coefficient of correlation between the two prices of the commodity in the two cities is 0.8. Also estimate the price at Mysore corresponding to the price of Rs. 50 at Bangalore.

[Answer: 72.6 and 55.3]

##### Problem 09

You are given the following data.

| | X | Y |
|---|---|---|
| Mean | 36 | 85 |
| S.D. | 11 | 8 |

Coefficient of correlation = 0.66

1. Find the two regression equations
2. Estimate the value of X when Y = 75

[Answer: X = 26.92 when Y = 75]

##### Problem 10

The following are the marks in Statistics (X) and Mathematics (Y) of ten students.

| X | 56 | 55 | 58 | 58 | 57 | 56 | 60 | 64 | 69 | 57 |
|---|---|---|---|---|---|---|---|---|---|---|
| Y | 68 | 67 | 67 | 70 | 65 | 68 | 70 | 66 | 68 | 66 |

Calculate the coefficient of correlation based on $b_{xy}$ and $b_{yx}$. Also estimate the marks in Mathematics of a student who scores 62 marks in Statistics.

[Answer: r = 0.78, bxy = 0.0294, Y = 67.59]

##### Problem 11

From the following data, obtain both the regression equations and estimate the demand (Y) if the price (X) is 75.

| Price (X) | 60 | 63 | 66 | 69 | 72 | 78 | 81 | 90 | 96 | 99 |
|---|---|---|---|---|---|---|---|---|---|---|
| Demand (Y) | 85 | 87 | 84 | 80 | 82 | 79 | 78 | 73 | 70 | 72 |

##### Problem 12

From the data given below, find:

a. The two regression equations
b. The coefficient of correlation between the marks in Economics and Statistics
c. The most likely marks in Statistics when marks in Economics are 30

| Marks in Economics (X) | 25 | 28 | 35 | 32 | 31 | 36 | 39 | 38 | 34 | 32 |
|---|---|---|---|---|---|---|---|---|---|---|
| Marks in Statistics (Y) | 43 | 46 | 49 | 41 | 36 | 32 | 31 | 30 | 33 | 39 |

[Ans: X = 40.892 - 1.234Y, Y = 59.248 - 0.664X, r = 0.394, Y = 39]

##### Problem 13

The following data relate to the price and demand of a commodity.

a) Estimate demand when price is Rs. 30
b) Estimate price when demand is 65 units
c) Calculate the coefficient of correlation
| Demand in units | 20 | 22 | 25 | 23 | 18 | 16 | 14 | 17 | 21 | 19 |
|---|---|---|---|---|---|---|---|---|---|---|
| Price in Rs. | 50 | 45 | 38 | 42 | 55 | 58 | 59 | 54 | 49 | 57 |

[Answer: a) 29.6 b) 13.21 c) r = -0.94]

##### Problem 14

The following table shows the frequency distribution of couples classified according to their ages.

a) Obtain the two regression coefficients.
b) Estimate the age of the husband when the wife's age is 28 years.
c) Calculate the coefficient of correlation.

| Wife's age in years (Y) | Husband's age in years (X): 20-25 | 25-30 | 30-35 | 35-40 | Total |
|---|---|---|---|---|---|
| 15-20 | 20 | 10 | 3 | 2 | 35 |
| 20-25 | 4 | 18 | 6 | 4 | 32 |
| 25-30 | - | 5 | 11 | - | 16 |
| 30-35 | - | - | 2 | - | 2 |
| 35-40 | - | - | - | 5 | 5 |
| Total | 24 | 33 | 22 | 11 | 90 |

[Answers: r = 0.612, X = 22.5, Y = 28.6, b = 31.7, byx = 0.558]

##### Problem 15

From the following data, a) estimate X when Y = 30 and b) estimate Y when X = 20.

| Y | X: 5-15 | X: 15-25 | X: 25-35 | X: 35-45 | Total |
|---|---|---|---|---|---|
| 0-10 | 1 | 1 | - | - | 2 |
| 10-20 | 3 | 6 | 5 | 1 | 15 |
| 20-30 | 1 | 8 | 9 | 2 | 20 |
| 30-40 | - | 3 | 9 | 3 | 15 |
| 40-50 | - | - | 4 | 4 | 8 |
| Total | 5 | 18 | 27 | 10 | 60 |

[Answer: a) 28.7 b) 22.31]

##### Problem 16

From the following data, calculate a) the regression coefficients and b) the coefficient of correlation based on $b_{xy}$ and $b_{yx}$.

| X | Y: 30-35 | Y: 35-40 | Y: 40-45 | Y: 45-50 | Total |
|---|---|---|---|---|---|
| 25-30 | 20 | 10 | 3 | 2 | 35 |
| 30-35 | 4 | 28 | 6 | 4 | 42 |
| 35-40 | - | 5 | 11 | - | 16 |
| 40-45 | - | - | 2 | - | 2 |
| 45-50 | - | - | - | 5 | 5 |
| Total | 24 | 43 | 22 | 11 | 100 |

[Answer: X = 32.5, Y = 38.5, bxy = 0.6744, byx = 0.5576, r = 0.6132]

##### Problem 17

Calculate the two regression coefficients. Estimate the value of X when Y = 49. Also calculate the coefficient of correlation based on $b_{xy}$ and $b_{yx}$.

| X | 43 | 44 | 46 | 40 | 44 | 42 | 45 | 42 | 38 | 40 | 42 | 57 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Y | 29 | 31 | 19 | 18 | 19 | 27 | 27 | 29 | 41 | 30 | 26 | 10 |

[Answer: X = 64.8, Y = ?, bxy = -0.44, byx = -1.2198, r = -0.732]

##### Problem 18

From the following bivariate table calculate:

a) The two regression coefficients
b) The coefficient of correlation based on $b_{xy}$ and $b_{yx}$

| Y | X: 59.9 | 79.5 | 99.5 | 119.5 | 139.5 | 159.5 | 179.5 | Total |
|---|---|---|---|---|---|---|---|---|
| 2.25 | 3 | 4 | 3 | 6 | 2 | 1 | 1 | 20 |
| 7.25 | 2 | 3 | 5 | 10 | 3 | 1 | 1 | 25 |
| 12.25 | 5 | 4 | 6 | 11 | 5 | 3 | 3 | 37 |
| 17.25 | 10 | 11 | 12 | 15 | 12 | 15 | 10 | 85 |
| 22.25 | 4 | 2 | 3 | 10 | 7 | 5 | 6 | 37 |
| 27.25 | 1 | 1 | 2 | 8 | 8 | 5 | 4 | 29 |
| 32.25 | 1 | 1 | 1 | 10 | 5 | 4 | 5 | 27 |
| Total | 26 | 26 | 32 | 70 | 42 | 34 | 30 | 260 |
[Answer: X = 17.80, Y = 122.42, bxy = 0.05, byx = 1.06, r = 0.230]