
# Course pack unit 5

Course: MCA
Subject: Computer Oriented Numerical Statistical Methods
Unit-5

### Course pack unit 5

1. RAI UNIVERSITY, AHMEDABAD. Course: MCA. Subject: Computer Oriented Numerical Statistical Methods. Unit-5.
2. Unit-V: Regression. Topics covered:

    1. Introduction and Definition of Regression Analysis
    2. Regression lines, Properties and its explanation
    3. Regression coefficients and its Properties
    4. Difference between Regression and Correlation
    5. Example based on the Regression line and Regression Coefficients
    6. Example based on the fitting of regression line and estimation for bivariate frequency distribution
    7. Advantage and limitations of Regression Analysis
    8. References
    9. Exercise
3. Unit-V: Regression

    1.1 Introduction: If two variables are significantly correlated, and if there is some theoretical basis for doing so, it is possible to predict (estimate) values of one variable from the other. This observation leads to a very important concept known as "Regression Analysis".

    For example, if we know that advertising and sales are correlated, we can find the expected amount of sales for a given advertising expenditure, or the expenditure required for attaining a given amount of sales. Similarly, if we know that the yield of rice and rainfall are closely related, we can find the amount of rain required to achieve a certain production figure.

    In general, regression analysis means the estimation or prediction of the unknown value of one variable from the known value of the other variable. It is one of the most important statistical tools and is used extensively in almost all sciences: natural, social and physical.

    1.2 Definition: The dictionary meaning of "regression" is returning or going back. The term "regression" was first used by Sir Francis Galton (1822-1911) in 1877 while studying the relationship between the heights of fathers and sons. He introduced the term in the paper "Regression towards Mediocrity in Hereditary Stature". M. M. Blair explained regression analysis as follows: "Regression analysis is a mathematical measure of the average relationship between two or more variables in terms of the original units of the data".
4. 2.1 Regression Line: A regression line is the line that gives the best estimate of one variable from the value of any other given variable. It expresses the average relationship between the two variables in mathematical form.

    2.2 The regression line has the following properties, where $y_c$ denotes the value of $y$ estimated from the line:

    a) $\sum (y - y_c) = 0$, and

    b) $\sum (y - y_c)^2$ is a minimum.

    For two variables $x$ and $y$, there are always two lines of regression.

    (A) Regression line of $x$ on $y$: It gives the best estimate of the value of $x$ for any specific given value of $y$.

    $$x = a + by$$

    where $a$ = $x$-intercept, $b$ = slope of the line, $x$ = dependent variable, $y$ = independent variable.

    (B) Regression line of $y$ on $x$:
5. It gives the best estimate of the value of $y$ for any specific given value of $x$.

    $$y = a + bx$$

    where $a$ = $y$-intercept, $b$ = slope of the line, $y$ = dependent variable, $x$ = independent variable.

    2.3 The Explanation of Regression Line

    - In case of perfect correlation (positive or negative), the two lines of regression coincide.
    - If the two regression lines are far from each other, the degree of correlation is low, and vice versa.
    - The mean values of $x$ and $y$ are given by the point of intersection of the two regression lines.
    - The higher the degree of correlation between the variables, the smaller the angle between the lines, and vice versa.

    2.4 Regression Equation / Line & Method of Least Squares

    2.4.1 Regression equation of $y$ on $x$:

    $$y = a + bx$$

    The values of $a$ and $b$ are obtained from the normal equations:

    $$\sum y = na + b\sum x$$

    $$\sum xy = a\sum x + b\sum x^2$$

    2.4.2 Regression equation of $x$ on $y$:

    $$x = a + by$$
6. The values of $a$ and $b$ are obtained from the normal equations:

    $$\sum x = na + b\sum y$$

    $$\sum xy = a\sum y + b\sum y^2$$

    3.1 Regression Coefficients: The regression coefficient between two variables is a numerical measure showing the change in the value of one variable for a unit change in the value of the other variable.

    3.1.1 Formula for finding the regression coefficient $b_{yx}$:

    Regression equation of $y$ on $x$:

    $$y - \bar{y} = b_{yx}(x - \bar{x})$$

    With the deviations $X = x - \bar{x}$ and $Y = y - \bar{y}$:

    $$b_{yx} = \frac{\sum XY}{\sum X^2}$$

    $$b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$$

    Also, by using the formulas for $r$, $\sigma_x$ and $\sigma_y$ we get

    $$b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$

    3.1.2 Formula for finding the regression coefficient $b_{xy}$:

    Regression equation of $x$ on $y$:

    $$x - \bar{x} = b_{xy}(y - \bar{y})$$

    $$b_{xy} = \frac{\sum XY}{\sum Y^2}$$

    $$b_{xy} = r\,\frac{\sigma_x}{\sigma_y}$$

    Also, by using the formulas for $r$, $\sigma_x$ and $\sigma_y$ we get

    $$b_{xy} = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - \left(\sum y\right)^2}$$

    3.2 Properties of Regression Coefficients:
7. (1) The product of the regression coefficients is equal to the square of the correlation coefficient. Since $b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$ and $b_{xy} = r\,\frac{\sigma_x}{\sigma_y}$,

    $$b_{yx} \times b_{xy} = r^2 \quad\Rightarrow\quad |r| = \sqrt{b_{yx} \times b_{xy}}$$

    Thus the correlation coefficient is the geometric mean of the two regression coefficients.

    (2) $b_{yx}$, $b_{xy}$ and $r$ always have the same sign. Since $\sum X^2$ and $\sum Y^2$ are always positive, the signs of $b_{xy}$, $b_{yx}$ and $r$ depend on the sign of $\sum XY$. If $\sum XY$ is positive, then $b_{xy}$, $b_{yx}$ and $r$ are positive; if $\sum XY$ is negative, then $b_{xy}$, $b_{yx}$ and $r$ are negative. Thus all three always have the same sign.

    (3) If two variables have a perfect relationship, one regression coefficient is the reciprocal of the other. For a perfect relationship $r = \pm 1$, so

    $$b_{yx} \times b_{xy} = r^2 = (\pm 1)^2 = 1 \quad\therefore\quad b_{yx} = \frac{1}{b_{xy}}$$

    (4) The product of the regression coefficients is $r^2$, which cannot exceed 1. Hence if one regression coefficient is greater than 1 in magnitude, the other must be less than 1 in magnitude.

    (5) The regression coefficients are independent of change of origin but not of scale.
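The properties above can be checked numerically. The following is a minimal sketch, not part of the original course pack: it computes $b_{yx}$, $b_{xy}$ and $r$ from raw sums using the formulas of section 3.1, on the five data pairs that the unit uses in Example 5.1. The function name `regression_coefficients` and the particular shift and scale factors are illustrative assumptions.

```python
from math import sqrt, isclose

def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy, r) computed from raw sums (section 3.1 formulas)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    num = n * sum_xy - sum_x * sum_y      # shared numerator
    den_x = n * sum_x2 - sum_x ** 2       # denominator of b_yx
    den_y = n * sum_y2 - sum_y ** 2       # denominator of b_xy
    return num / den_x, num / den_y, num / sqrt(den_x * den_y)

# Data of Example 5.1
xs = [6, 2, 10, 4, 8]
ys = [9, 11, 5, 8, 7]
b_yx, b_xy, r = regression_coefficients(xs, ys)

# Property (1): the product of the coefficients equals r squared
print(isclose(b_yx * b_xy, r ** 2))

# Property (2): all three share the same sign (negative for this data)
print(b_yx < 0, b_xy < 0, r < 0)

# Property (5): a change of origin leaves the coefficients unchanged ...
shifted = regression_coefficients([x + 100 for x in xs], [y - 3 for y in ys])
print(all(isclose(s, t) for s, t in zip(shifted, (b_yx, b_xy, r))))

# ... but a change of scale (x multiplied by 10) does not; only r is unaffected
scaled = regression_coefficients([10 * x for x in xs], ys)
print(isclose(scaled[0], b_yx / 10), isclose(scaled[1], b_xy * 10), isclose(scaled[2], r))
```

Working with the raw sums keeps every intermediate quantity an integer for this data, so the only rounding comes from the final divisions and the square root.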
8. 4.1 Difference between correlation and regression:

    | Correlation | Regression |
    |---|---|
    | It gives a numerical measure of the linear relationship between the variables. | It gives a functional relationship between the variables, which helps us estimate the value of one variable for a given value of another variable. |
    | The correlation coefficient always lies between -1 and +1. | One regression coefficient can be greater than 1. |
    | The correlation coefficient is independent of change of origin and scale. | Regression coefficients are independent of change of origin but not of scale. |
    | The correlation coefficient can be obtained from the regression coefficients. | A regression coefficient cannot be obtained from the correlation coefficient alone. |

    5.1 Example: From the following data, obtain the two regression equations, and also calculate them by taking deviations of the items from the means of the $x$ and $y$ series.

    | $x$ | 6 | 2 | 10 | 4 | 8 |
    |---|---|---|---|---|---|
    | $y$ | 9 | 11 | 5 | 8 | 7 |

    Solution:

    Method-1

    OBTAINING REGRESSION EQUATION

    | $x$ | $y$ | $xy$ | $x^2$ | $y^2$ |
    |---|---|---|---|---|
    | 6 | 9 | 54 | 36 | 81 |
    | 2 | 11 | 22 | 4 | 121 |
    | 10 | 5 | 50 | 100 | 25 |
    | 4 | 8 | 32 | 16 | 64 |
    | 8 | 7 | 56 | 64 | 49 |
    | $\sum x = 30$ | $\sum y = 40$ | $\sum xy = 214$ | $\sum x^2 = 220$ | $\sum y^2 = 340$ |

9. Regression equation of $y$ on $x$: $y = a + bx$. The normal equations are

    $$\sum y = na + b\sum x$$

    $$\sum xy = a\sum x + b\sum x^2$$

    Substituting the values:

    $$40 = 5a + 30b \quad \text{(i)}$$

    $$214 = 30a + 220b \quad \text{(ii)}$$

    Multiplying equation (i) by 6:

    $$240 = 30a + 180b \quad \text{(iii)}$$

    $$214 = 30a + 220b \quad \text{(iv)}$$

    Subtracting equation (iv) from (iii), we get $-40b = 26$, or $b = -0.65$.

    Substituting the value of $b$ in equation (i): $40 = 5a + 30(-0.65)$, or $5a = 40 + 19.5 = 59.5$, or $a = 11.9$.

    Putting the values of $a$ and $b$ in the equation, the regression line of $y$ on $x$ is

    $$y = 11.9 - 0.65x$$

    Regression equation of $x$ on $y$: $x = a + by$. The normal equations are

    $$\sum x = na + b\sum y$$

    $$\sum xy = a\sum y + b\sum y^2$$

    Substituting the values:

    $$30 = 5a + 40b \quad \text{(i)}$$

    $$214 = 40a + 340b \quad \text{(ii)}$$

    Multiplying equation (i) by 8:
10. $240 = 40a + 320b \quad \text{(iii)}$

    $214 = 40a + 340b \quad \text{(iv)}$

    From equations (iii) and (iv): $-20b = 26$, or $b = -1.3$.

    Substituting the value of $b$ in equation (i): $30 = 5a + 40(-1.3)$, or $5a = 30 + 52 = 82$, so $a = 16.4$.

    Putting the values of $a$ and $b$ in the equation, the regression line of $x$ on $y$ is

    $$x = 16.4 - 1.3y$$

    Now we find the regression lines by using the second method.

    Method-2

    Here we use the form of the regression line that contains the regression coefficients.

    CALCULATION OF REGRESSION EQUATIONS

    | $x$ | $X = x - \bar{x}$ | $X^2$ | $y$ | $Y = y - \bar{y}$ | $Y^2$ | $XY$ |
    |---|---|---|---|---|---|---|
    | 6 | 0 | 0 | 9 | +1 | 1 | 0 |
    | 2 | -4 | 16 | 11 | +3 | 9 | -12 |
    | 10 | +4 | 16 | 5 | -3 | 9 | -12 |
    | 4 | -2 | 4 | 8 | 0 | 0 | 0 |
    | 8 | +2 | 4 | 7 | -1 | 1 | -2 |
    | $\sum x = 30$ | $\sum X = 0$ | $\sum X^2 = 40$ | $\sum y = 40$ | $\sum Y = 0$ | $\sum Y^2 = 20$ | $\sum XY = -26$ |

    $$\bar{x} = \frac{30}{5} = 6; \qquad \bar{y} = \frac{40}{5} = 8$$
11. The line of regression of $x$ on $y$ is

    $$x - \bar{x} = r\,\frac{\sigma_x}{\sigma_y}\,(y - \bar{y})$$

    $$r\,\frac{\sigma_x}{\sigma_y} = \frac{\sum XY}{\sum Y^2} = \frac{-26}{20} = -1.3$$

    $$x - 6 = -1.3(y - 8) = -1.3y + 10.4$$

    $$x = -1.3y + 10.4 + 6$$

    $$x = 16.4 - 1.3y$$

    The line of regression of $y$ on $x$ is

    $$y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x})$$

    $$r\,\frac{\sigma_y}{\sigma_x} = \frac{\sum XY}{\sum X^2} = \frac{-26}{40} = -0.65$$

    $$y - 8 = -0.65(x - 6) = -0.65x + 3.9$$

    $$y = -0.65x + 3.9 + 8$$

    $$y = 11.9 - 0.65x$$

    Thus we obtain the same answers as before. However, the calculations are much simpler without the use of the normal equations.

    5.2 Example: The following information is obtained from the results of an examination.

    | | Marks in Mathematics ($x$) | Marks in English ($y$) |
    |---|---|---|
    | Average | 39.5 | 47.5 |
    | S.D. | 10.8 | 16.8 |

    Correlation coefficient between $x$ and $y$ = 0.42
12. Obtain the two regression lines and hence estimate $y$ for $x = 50$ and $x$ for $y = 30$.

    Solution: The equation of the regression line of $y$ on $x$ is $y = a + b_{yx}\,x$, where

    $$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = 0.42 \times \frac{16.8}{10.8} = 0.653$$

    and

    $$a = \bar{y} - b_{yx}\,\bar{x} = 47.5 - 0.653(39.5) = 47.5 - 25.79 = 21.71$$

    $\therefore\ y = 21.71 + 0.653x$ is the regression line of $y$ on $x$.

    The equation of the regression line of $x$ on $y$ is $x = a + b_{xy}\,y$, where

    $$b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = 0.42 \times \frac{10.8}{16.8} = 0.27$$

    and

    $$a = \bar{x} - b_{xy}\,\bar{y} = 39.5 - 0.27(47.5) = 39.5 - 12.82 = 26.68$$

    $\therefore\ x = 26.68 + 0.27y$ is the regression line of $x$ on $y$.

    When $x = 50$, the estimated value of $y$ is

    $$y = 21.71 + 0.653(50)$$

    $$y = 21.71 + 32.65$$
13. $y = 54.36$

    When $y = 30$, the estimated value of $x$ is

    $$x = 26.68 + 0.27(30) = 26.68 + 8.10 = 34.78$$

    5.3 Example: The following information is obtained for two variables $x$ and $y$. Find the regression equation of $y$ on $x$.

    $$n = 10; \quad \sum x = 130, \quad \sum y = 220, \quad \sum x^2 = 2288; \quad \sum xy = 3467$$

    Solution: Suppose the regression line of $y$ on $x$ is $y = a + b_{yx}\,x$. Here

    $$\bar{x} = \frac{\sum x}{n} = \frac{130}{10} = 13, \qquad \bar{y} = \frac{\sum y}{n} = \frac{220}{10} = 22$$

    $$b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{10(3467) - (130)(220)}{10(2288) - (130)^2} = \frac{34670 - 28600}{22880 - 16900} = \frac{6070}{5980} = 1.015$$

    and

    $$a = \bar{y} - b_{yx}\,\bar{x} = 22 - 1.015(13) = 8.805$$
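14. $\therefore\ y = 8.805 + 1.015x$ is the regression line of $y$ on $x$.

    6. Fitting of regression lines and estimation for bivariate frequency distribution:

    6.1 Example: Find the two lines of regression from the following bivariate table (variables: Age of Wife, Age of Husband). In the solution, $x$ denotes the column variable and $y$ the row variable.

    | $y$ \ $x$ | 10-20 | 20-30 | 30-40 | 40-50 | 50-60 |
    |---|---|---|---|---|---|
    | 15-25 | 6 | 3 | - | - | - |
    | 25-35 | 3 | 16 | 10 | - | - |
    | 35-45 | - | 10 | 15 | 7 | - |
    | 45-55 | - | - | 7 | 10 | 4 |
    | 55-65 | - | - | - | 4 | 5 |

    Solution: Let $u = (x - A)/C_x$ and $v = (y - B)/C_y$, where $A = 35$ and $B = 40$ are the assumed means and $C_x = C_y = 10$ are the class widths; M.V. denotes the mid-value of a class. In each cell the frequency $f$ is followed by the product $fuv$ in parentheses.

    | $y$ \ $x$ | 10-20 | 20-30 | 30-40 | 40-50 | 50-60 | $f_v$ | M.V. $y$ | $v$ | $v f_v$ | $v^2 f_v$ | $fuv$ |
    |---|---|---|---|---|---|---|---|---|---|---|---|
    | 15-25 | 6 (24) | 3 (6) | - | - | - | 9 | 20 | -2 | -18 | 36 | 30 |
    | 25-35 | 3 (6) | 16 (16) | 10 (0) | - | - | 29 | 30 | -1 | -29 | 29 | 22 |
    | 35-45 | - | 10 (0) | 15 (0) | 7 (0) | - | 32 | 40 | 0 | 0 | 0 | 0 |
    | 45-55 | - | - | 7 (0) | 10 (10) | 4 (8) | 21 | 50 | 1 | 21 | 21 | 18 |
    | 55-65 | - | - | - | 4 (8) | 5 (20) | 9 | 60 | 2 | 18 | 36 | 28 |
    | $f_u$ | 9 | 29 | 32 | 21 | 9 | 100 | | | -8 | 122 | 98 |
    | M.V. $x$ | 15 | 25 | 35 | 45 | 55 | | | | | | |
    | $u$ | -2 | -1 | 0 | 1 | 2 | | | | | | |
    | $u f_u$ | -18 | -29 | 0 | 21 | 18 | -8 | | | | | |
    | $u^2 f_u$ | 36 | 29 | 0 | 21 | 36 | 122 | | | | | |
    | $fuv$ | 30 | 22 | 0 | 18 | 28 | 98 | | | | | |

15. Here

    $$\bar{x} = A + \frac{\sum u f_u}{n} \times C_x = 35 + \frac{-8}{100} \times 10 = 34.2$$

    $$\bar{y} = B + \frac{\sum v f_v}{n} \times C_y = 40 + \frac{-8}{100} \times 10 = 39.2$$

    Suppose the regression equation of $y$ on $x$ is $y = a + b_{yx}\,x$, where

    $$b_{yx} = \frac{n\sum fuv - \sum u f_u \sum v f_v}{n\sum u^2 f_u - \left(\sum u f_u\right)^2} \times \frac{C_y}{C_x}$$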
16. $b_{yx} = \dfrac{100 \times 98 - (-8)(-8)}{100 \times 122 - (-8)^2} \times \dfrac{10}{10} = \dfrac{9800 - 64}{12200 - 64} \times 1 = \dfrac{9736}{12136} = 0.802$

    and

    $$a = \bar{y} - b_{yx}\,\bar{x} = 39.2 - 0.802(34.2) = 39.2 - 27.43 = 11.77$$

    $\therefore\ y = 11.77 + 0.802x$ is the regression line of $y$ on $x$.

    Now suppose the regression line of $x$ on $y$ is $x = a + b_{xy}\,y$, where

    $$b_{xy} = \frac{n\sum fuv - \sum u f_u \sum v f_v}{n\sum v^2 f_v - \left(\sum v f_v\right)^2} \times \frac{C_x}{C_y} = \frac{9736}{100 \times 122 - (-8)^2} \times \frac{10}{10} = 0.802$$

    and

    $$a = \bar{x} - b_{xy}\,\bar{y} = 34.2 - 0.802(39.2) = 34.2 - 31.44 = 2.76$$

    $\therefore\ x = 2.76 + 0.802y$ is the regression line of $x$ on $y$.

    6.2 Example:
17. Calculate $b_{yx}$, $b_{xy}$ and $r$ using the following:

    | Given value | Estimated value |
    |---|---|
    | $x = 10$ | $y = 22$ |
    | $x = 20$ | $y = 34$ |
    | $y = 30$ | $x = 17$ |
    | $y = 50$ | $x = 23$ |

    Solution: Let the regression equation of $y$ on $x$ be $y = a + b_{yx}\,x$. Then

    $$22 = a + 10\,b_{yx}$$

    $$34 = a + 20\,b_{yx}$$

    Subtracting, $-12 = -10\,b_{yx}$, so $b_{yx} = 1.2$.

    Let the regression equation of $x$ on $y$ be $x = a + b_{xy}\,y$. Then

    $$17 = a + 30\,b_{xy}$$

    $$23 = a + 50\,b_{xy}$$

    Subtracting, $-6 = -20\,b_{xy}$, so $b_{xy} = \frac{6}{20} = 0.3$.

    Now, taking the positive root because both regression coefficients are positive,

    $$r = \sqrt{b_{yx} \cdot b_{xy}} = \sqrt{(1.2)(0.3)} = \sqrt{0.36} = 0.6$$
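The solution of Example 6.2 can be retraced in a few lines. This is a small sketch under stated assumptions rather than part of the original solution: the helper `slope` is a hypothetical name, and the final step (recovering the means as the intersection of the two lines, as stated in section 2.3) goes slightly beyond what the example asks for.

```python
from math import sqrt, copysign

def slope(p, q):
    """Slope of the straight line through two (given, estimated) pairs."""
    (g1, e1), (g2, e2) = p, q
    return (e2 - e1) / (g2 - g1)

# (given x, estimated y) pairs lie on the regression line of y on x
b_yx = slope((10, 22), (20, 34))
# (given y, estimated x) pairs lie on the regression line of x on y
b_xy = slope((30, 17), (50, 23))

# r is the geometric mean of the two coefficients and carries their common sign
r = copysign(sqrt(b_yx * b_xy), b_yx)

# Intercepts, obtained from one point on each line
a_yx = 22 - b_yx * 10     # y = a_yx + b_yx * x
a_xy = 17 - b_xy * 30     # x = a_xy + b_xy * y

# The two lines intersect at (x_bar, y_bar): substitute one line into the other
x_bar = (a_xy + b_xy * a_yx) / (1 - b_yx * b_xy)
y_bar = a_yx + b_yx * x_bar

print(round(b_yx, 3), round(b_xy, 3), round(r, 3))
print(round(x_bar, 4), round(y_bar, 4))
```

Using `copysign` makes the sign rule of property (2) explicit, so the same lines would also work for data where both coefficients are negative.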
18. 7.1 Advantages of Regression Analysis:

    1. The estimates of the unknown parameters obtained from linear least squares regression are the optimal estimates from a broad class of possible parameter estimates under the usual assumptions used for process modeling.
    2. It uses data very efficiently. Good results can be obtained with relatively small data sets.
    3. The theory associated with linear regression is well understood and allows for the construction of different types of easily interpretable statistical intervals for predictions, calibrations, and optimizations.

    7.2 Limitations of Regression Analysis:

    1. In making an estimate from a regression equation, it is important to remember the underlying assumption that the relationship has not changed since the regression equation was computed. Another point worth remembering is that the relationship shown by the scatter diagram may not hold if the equation is extended beyond the values used in computing it.

        For example, there may be a close linear relationship between the yield of a crop and the amount of fertilizer applied, with the yield increasing as the amount of fertilizer is increased. It would not be logical, however, to extend this equation beyond the limits of the experiment, for it is quite likely that if the amount of fertilizer were increased indefinitely, the yield would eventually decline as too much fertilizer was applied.
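The fertilizer example can be made concrete with a short sketch. Everything numerical here is an assumption made purely for illustration: the response curve `true_yield` is a hypothetical quadratic, not experimental data, and the dose values are invented. The line is fitted by the deviation method of Example 5.1 on doses 0 to 4 only and is then evaluated inside and far outside that range.

```python
def fit_line(xs, ys):
    """Least-squares line ys = a + b * xs, using deviations from the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b

def true_yield(f):
    # HYPOTHETICAL response curve: rises, peaks at f = 6, then declines
    return 20 + 6 * f - 0.5 * f ** 2

doses = [0, 1, 2, 3, 4]                      # range covered by the "experiment"
yields = [true_yield(f) for f in doses]
a, b = fit_line(doses, yields)

for f in (2, 12):                            # inside vs. far outside the range
    print(f, a + b * f, true_yield(f))
```

Inside the fitted range the line stays close to the assumed curve, while at a dose of 12 it keeps predicting a rising yield even though the assumed curve has already fallen back, which is exactly the extrapolation risk described above.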