Stats 4


Published in: Technology, Economy & Finance

  1. REGRESSION

     Literally, the word regression means 'return to the origin'. In statistics, the word is used in a different sense. If two variables are correlated, the unknown value of one variable can be estimated from the known value of the other. The estimated value may not equal the actually observed value, but it will be close to it.

     Regression analysis, in the general sense, means the estimation or prediction of the unknown value of one variable from the known value of another. Regression analysis confined to two variables at a time is termed simple regression. Quite often, however, a phenomenon is affected by a multiplicity of causes. Regression analysis that studies more than two variables at a time is known as multiple regression.

     In regression analysis there are two types of variables. The variable whose value is influenced, or is to be predicted, is called the dependent variable. The variable that influences the values, or is used for prediction, is called the independent variable. The independent variable is also known as the regressor, predictor or explanatory variable, while the dependent variable is also known as the regressed or explained variable.

     LINEAR & NON-LINEAR REGRESSION

     If the given bivariate data are plotted on a graph, the points will more or less concentrate around a curve, called the "curve of regression". The mathematical equation of the regression curve is called the regression equation. If the regression curve is a straight line, we say that there is linear regression between the variables under study. If the curve of regression is not a straight line, the regression is termed curved or non-linear regression.

     The tendency of the actual value to lie close to the estimated value is called regression. In a wider usage, regression is the theory of estimating the unknown value of a variable with the help of known values of other variables. The theory of regression was first introduced and developed by Sir Francis Galton in the field of genetics.
  2. In regression analysis, a mathematical relation between the two variables is framed first. This relation, called the regression equation, is obtained by the method of least squares. It may be linear or non-linear.

     For bivariate data on x and y, the regression equation obtained on the assumption that x depends on y is called the regression of x on y:

     $$x - \bar{x} = b_{xy}\,(y - \bar{y})$$

     The regression equation obtained on the assumption that y depends on x is called the regression of y on x:

     $$y - \bar{y} = b_{yx}\,(x - \bar{x})$$

     Here $\bar{x}$ and $\bar{y}$ are the arithmetic means (AM) of x and y. The regression coefficients are given by the following formulas, where $d_x = x - \bar{x}$ and $d_y = y - \bar{y}$:

     $$b_{xy} = \frac{r\,\sigma_x}{\sigma_y} = \frac{\mathrm{Cov}(x,y)}{\sigma_y^2} = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2} = \frac{\sum d_x d_y}{\sum d_y^2}$$

     $$b_{yx} = \frac{r\,\sigma_y}{\sigma_x} = \frac{\mathrm{Cov}(x,y)}{\sigma_x^2} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{\sum d_x d_y}{\sum d_x^2}$$

     The regression of x on y is used to estimate x values, and the regression of y on x is used to estimate y values. The graphs of the regression equations are the regression lines.

     PROPERTIES OF REGRESSION

     Regression coefficients are the coefficients of the independent variables in the regression equations.

     1. The regression coefficient $b_{xy}$ is the change occurring in x for a unit change in y. The regression coefficient $b_{yx}$ is the change occurring in y for a unit change in x.
  3. PROPERTIES OF REGRESSION (continued)

     2. The regression coefficients are independent of the origin of measurement of the variables, but they depend on the scale.
     3. The geometric mean of the regression coefficients is numerically equal to the coefficient of correlation.
     4. The regression coefficients cannot be of opposite signs. If r is positive, both regression coefficients are positive. If r is negative, both are negative. If r is zero, both are zero.
     5. Since the coefficient of correlation cannot numerically exceed 1, the product of the regression coefficients cannot exceed 1.

     PROPERTIES OF REGRESSION LINES

     There are two regression lines.

     1. The regression lines intersect at $(\bar{x}, \bar{y})$, the point of means.
     2. The regression lines have positive slope if the variables are positively correlated, and negative slope if the variables are negatively correlated.
     3. If there is perfect correlation, the regression lines coincide (there is only one regression line).

     LINES OF REGRESSION

     A line of regression is the line that gives the best estimate of one variable for any given value of the other variable. With two variables x and y, there are two regression equations: x on y, and y on x.

     The line of regression of y on x gives the best estimate of y for any specified value of x. The line of regression of x on y gives the best estimate of x for any specified value of y.
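The properties of the regression coefficients listed above can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the function name `regression_coefficients` and the five data points are hypothetical, chosen only for illustration. It computes both coefficients from deviations about the means and then tests the geometric-mean, sign, origin and scale properties.

```python
from math import sqrt, isclose

def regression_coefficients(x, y):
    """Return (bxy, byx, r) computed from deviations about the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    s_xy = sum(a * b for a, b in zip(dx, dy))
    s_xx = sum(a * a for a in dx)
    s_yy = sum(b * b for b in dy)
    bxy = s_xy / s_yy            # change in x per unit change in y
    byx = s_xy / s_xx            # change in y per unit change in x
    r = s_xy / sqrt(s_xx * s_yy) # coefficient of correlation
    return bxy, byx, r

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

bxy, byx, r = regression_coefficients(x, y)

# Geometric mean of the coefficients equals |r|.
gm_matches_r = isclose(sqrt(bxy * byx), abs(r))

# Both coefficients carry the same sign as r.
same_sign = (bxy > 0) == (byx > 0) == (r > 0)

# Shifting the origin of either variable leaves the coefficients unchanged.
bxy_shift, byx_shift, _ = regression_coefficients(
    [xi + 100 for xi in x], [yi - 7 for yi in y]
)
origin_independent = isclose(bxy, bxy_shift) and isclose(byx, byx_shift)

# Rescaling x by 10 multiplies bxy by 10 and divides byx by 10.
bxy_scale, byx_scale, _ = regression_coefficients([10 * xi for xi in x], y)
scale_dependent = isclose(bxy_scale, 10 * bxy) and isclose(byx_scale, byx / 10)

print(bxy, byx, r)
print(gm_matches_r, same_sign, origin_independent, scale_dependent)
```

For this small data set the deviations give Σdx·dy = 6, Σdx² = 10 and Σdy² = 6, so the product of the coefficients stays below 1, as property 5 requires.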
  4. LINE OF REGRESSION OF y ON x

     $$y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x})$$

     LINE OF REGRESSION OF x ON y

     $$x - \bar{x} = r\,\frac{\sigma_x}{\sigma_y}\,(y - \bar{y})$$

     REMEMBER

     a. When r = 0, i.e. when x and y are uncorrelated, the lines of regression of y on x and of x on y are $y - \bar{y} = 0$ and $x - \bar{x} = 0$. The lines are perpendicular to each other.
     b. When r = ±1, the two lines coincide.
     c. If the value of r is significant, the lines of regression can be used for estimation and prediction.
     d. If r is not significant, the linear model is not a good fit, and the line of regression should not be used for prediction.

     COEFFICIENTS OF REGRESSION

     a. $b_{xy}$ is the coefficient of regression of x on y.
     b. $b_{yx}$ is the coefficient of regression of y on x.

     THEOREMS ON REGRESSION COEFFICIENTS

     a. The correlation coefficient is the geometric mean of the regression coefficients, i.e. $r^2 = b_{xy}\,b_{yx}$.
     b. The sign to be taken before the square root is the same as that of the regression coefficients.
     c. If one of the regression coefficients is numerically greater than one, the other must be numerically less than one.
     d. The arithmetic mean of the absolute values of the regression coefficients is greater than or equal to the absolute value of the correlation coefficient, which is their geometric mean.
  5. THEOREMS ON REGRESSION COEFFICIENTS (continued)

     e. Regression coefficients are independent of a change of origin but not of a change of scale.

     Problem 1:

           X      Y     dx     dy     dx²    dy²   dx·dy
          91     71      1      1       1      1       1
          97     75      7      5      49     25      35
         108     69     18     -1     324      1     -18
         121     97     31     27     961    729     837
          67     70    -23      0     529      0       0
         124     91     34     21    1156    441     714
          51     39    -39    -31    1521    961    1209
          73     61    -17     -9     289     81     153
         111     80     21     10     441    100     210
          57     47    -33    -23    1089    529     759
         ---    ---    ---    ---    ----   ----    ----
         900    700      0      0    6360   2868    3900

     Here $d_x = X - \bar{X}$ and $d_y = Y - \bar{Y}$, with $\bar{X} = 900/10 = 90$ and $\bar{Y} = 700/10 = 70$.

     $$b_{xy} = \frac{\sum d_x d_y}{\sum d_y^2} = \frac{3900}{2868} \approx 1.3598 \qquad b_{yx} = \frac{\sum d_x d_y}{\sum d_x^2} = \frac{3900}{6360} \approx 0.6132$$

     Regression of x on y: $(x - 90) = 1.3598\,(y - 70)$, i.e. $x = 1.3598\,y - 5.19$.
     Regression of y on x: $(y - 70) = 0.6132\,(x - 90)$, i.e. $y = 0.6132\,x + 14.812$.

     Problem 2: The data on the sales and advertisement expenditure of a firm are given below:

                              Sales    Advertisement expenditure
         Mean                   40               6
         Standard deviation     10               1.5
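  6. Problem 2 (continued): The coefficient of correlation is 0.9.

     o Estimate the likely sales for a proposed advertisement expenditure of Rs. 10 crores.
     o What should the advertisement expenditure be if the firm proposes a sales target of Rs. 60 crores?

     Answer: Let x denote sales and y denote advertisement expenditure, both in Rs. crores.

     $$b_{xy} = \frac{r\,\sigma_x}{\sigma_y} = \frac{0.9 \times 10}{1.5} = 6 \qquad b_{yx} = \frac{r\,\sigma_y}{\sigma_x} = \frac{0.9 \times 1.5}{10} = 0.135$$

     Regression of x on y: $(x - 40) = 6\,(y - 6)$, i.e. $x = 6y + 4$. For y = 10, $x = 6 \times 10 + 4 = 64$, so the likely sales are Rs. 64 crores.
     Regression of y on x: $(y - 6) = 0.135\,(x - 40)$, i.e. $y = 0.135x + 0.6$. For x = 60, $y = 0.135 \times 60 + 0.6 = 8.7$, so the required advertisement expenditure is Rs. 8.7 crores.

     Problem 3: Point out the inconsistency, if any, in the following statement: "The regression equation of y on x is 2y + 3x = 4 and the correlation coefficient between x and y is 0.8."

     Answer: Refer to the properties of regression coefficients. Rewriting the equation as y = 2 − 1.5x gives $b_{yx} = -1.5$, which is negative, whereas r = 0.8 is positive. Since a regression coefficient and r must have the same sign, the statement is inconsistent.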
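The arithmetic in Problems 1 and 2 can be reproduced with a short script. This is a minimal Python sketch rather than anything from the original slides. The helper `regression_line` and all variable names are hypothetical, and the third X value is taken as 108, an assumption that agrees with the tabulated deviation of 18 and the column total of 900.

```python
# Problem 1 data. The third X value is taken as 108 (an assumption that
# agrees with dx = 18 and with the column total of 900).
X = [91, 97, 108, 121, 67, 124, 51, 73, 111, 57]
Y = [71, 75, 69, 97, 70, 91, 39, 61, 80, 47]

n = len(X)
mean_x = sum(X) / n
mean_y = sum(Y) / n

# Deviations from the means and the three sums used in the table.
dx = [x - mean_x for x in X]
dy = [y - mean_y for y in Y]
sum_dx2 = sum(d * d for d in dx)
sum_dy2 = sum(d * d for d in dy)
sum_dxdy = sum(a * b for a, b in zip(dx, dy))

bxy_p1 = sum_dxdy / sum_dy2  # regression coefficient of x on y
byx_p1 = sum_dxdy / sum_dx2  # regression coefficient of y on x
print(f"Problem 1: bxy = {bxy_p1:.4f}, byx = {byx_p1:.4f}")

def regression_line(mean_dep, mean_ind, sd_dep, sd_ind, r):
    """Slope and intercept of the regression of the dependent variable
    on the independent one, from means, standard deviations and r."""
    slope = r * sd_dep / sd_ind
    intercept = mean_dep - slope * mean_ind
    return slope, intercept

# Problem 2: x = sales, y = advertisement expenditure (Rs. crores).
sales_slope, sales_intercept = regression_line(40, 6, 10, 1.5, 0.9)  # x on y
adv_slope, adv_intercept = regression_line(6, 40, 1.5, 10, 0.9)      # y on x

estimated_sales = sales_slope * 10 + sales_intercept  # for y = 10
required_adv = adv_slope * 60 + adv_intercept         # for x = 60
print(f"Problem 2: sales = {estimated_sales:.1f}, advertisement = {required_adv:.1f}")
```

Working from summary statistics, as in Problem 2, needs only the two means, the two standard deviations and r, which is why a single helper can serve both regression lines by swapping the roles of the variables.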
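  7. Problem 4: Using the following data, find the two lines of regression and from them compute Karl Pearson's coefficient of correlation:

     $\sum X = 250$; $\sum Y = 300$; $\sum XY = 7900$; $\sum X^2 = 6500$; $\sum Y^2 = 10000$; $n = 10$

     Answer:

     $$b_{xy} = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2} = \frac{10 \times 7900 - 250 \times 300}{10 \times 10000 - (300)^2} = \frac{4000}{10000} = 0.4$$

     $$b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{10 \times 7900 - 250 \times 300}{10 \times 6500 - (250)^2} = \frac{4000}{2500} = 1.6$$

     With $\bar{x} = 250/10 = 25$ and $\bar{y} = 300/10 = 30$, the regression of x on y is $(x - 25) = 0.4\,(y - 30)$, i.e. $x = 0.4y + 13$, and the regression of y on x is $(y - 30) = 1.6\,(x - 25)$, i.e. $y = 1.6x - 10$.

     $$r_{xy}^2 = b_{xy}\,b_{yx} = 0.4 \times 1.6 = 0.64, \qquad r_{xy} = 0.8$$

     The positive root is taken because both regression coefficients are positive.

     Problem 5: Find the two regression coefficients and hence r.

     $n = 5$; $\bar{X} = 10$; $\bar{Y} = 20$; $\sum (X-4)^2 = 100$; $\sum (Y-10)^2 = 160$; $\sum (X-4)(Y-10) = 80$

     Answer: Let $U = X - 4$. Then $\bar{U} = \bar{X} - 4 = 6$ and $\sum U = n\bar{U} = 30$. Similarly, with $V = Y - 10$, $\bar{V} = 10$ and $\sum V = 50$.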
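  8. Problem 5 (continued): Regression coefficients are independent of a change of origin, so they can be computed from U and V:

     $$b_{yx} = \frac{n\sum UV - \sum U \sum V}{n\sum U^2 - (\sum U)^2} = \frac{5 \times 80 - 30 \times 50}{5 \times 100 - (30)^2} = \frac{-1100}{-400} = \frac{11}{4}$$

     $$b_{xy} = \frac{n\sum UV - \sum U \sum V}{n\sum V^2 - (\sum V)^2} = \frac{5 \times 80 - 30 \times 50}{5 \times 160 - (50)^2} = \frac{-1100}{-1700} = \frac{11}{17}$$

     $$r = \sqrt{\frac{11}{4} \times \frac{11}{17}} \approx 1.33$$

     This is impossible, because r cannot numerically exceed 1. The given data are therefore inconsistent.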
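Problems 4 and 5 both start from raw sums, so the same few lines can compute the coefficients and flag inconsistent data. The sketch below is a hedged Python illustration, not the slides' own method: the function `coefficients_from_sums` and its consistency rule (positive denominators and a coefficient product of at most 1) are assumptions introduced here, and it assumes neither denominator is exactly zero.

```python
from math import sqrt

def coefficients_from_sums(n, sx, sy, sxy, sxx, syy):
    """Return (bxy, byx, product, consistent) from raw sums.

    Assumes neither denominator is zero. The data are treated as
    consistent only if both denominators are positive (they are n^2
    times a variance) and the product of the coefficients, which
    equals r^2, does not exceed 1.
    """
    numerator = n * sxy - sx * sy
    den_x = n * sxx - sx ** 2
    den_y = n * syy - sy ** 2
    byx = numerator / den_x
    bxy = numerator / den_y
    product = bxy * byx
    consistent = den_x > 0 and den_y > 0 and product <= 1
    return bxy, byx, product, consistent

# Problem 4: sums of X, Y, XY, X^2, Y^2 with n = 10.
bxy4, byx4, prod4, ok4 = coefficients_from_sums(10, 250, 300, 7900, 6500, 10000)
r4 = sqrt(prod4)  # positive root, since both coefficients are positive
print(f"Problem 4: bxy = {bxy4:.2f}, byx = {byx4:.2f}, r = {r4:.2f}, consistent = {ok4}")

# Problem 5: sums of U = X - 4 and V = Y - 10 with n = 5.
bxy5, byx5, prod5, ok5 = coefficients_from_sums(5, 30, 50, 80, 100, 160)
print(f"Problem 5: bxy = {bxy5:.4f}, byx = {byx5:.4f}, "
      f"sqrt(product) = {sqrt(prod5):.2f}, consistent = {ok5}")
```

In Problem 5 both denominators come out negative, which already signals that the stated sums cannot belong to real data, before the impossible value of r is reached.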