
Correlation & regression (2)

Correlation & regression - Unitedworld School of Business


  1. 1. Correlation-Regression
  2. 2. Correlation analysis deals with the association (covariation) between two or more variables.
      Types of correlation:
      1. Positive or negative
      2. Simple or multiple
      3. Linear or non-linear
  3. 3. Methods of measuring correlation:
      1. Graphic method
      2. Diagrammatic method (scatter diagram)
      3. Algebraic methods:
         a. Karl Pearson's coefficient of correlation
         b. Spearman's rank coefficient of correlation
         c. Coefficient of concurrent deviations
         d. Least squares method
  4. 4. Karl Pearson's coefficient of correlation:
      γ (Gamma) = Σ(dx·dy) / √(Σdx² · Σdy²) = Σ(dx·dy) / (N·σx·σy)
      where dx = x − x̄, dy = y − ȳ, and Σ(dx·dy) is the sum of the products of deviations from the respective arithmetic means of the two series.
  5. 5. Karl Pearson's coefficient of correlation after choosing assumed (working) means Ax and Ay:
      γ = [N·Σ(dx·dy) − (Σdx)(Σdy)] / √{[N·Σdx² − (Σdx)²] · [N·Σdy² − (Σdy)²]}
      where Σ(dx·dy) = total of products of deviations from the assumed means of the x and y series, Σdx and Σdy = totals of deviations of the x and y series, Σdx² and Σdy² = totals of squared deviations of the x and y series, and N = number of paired items.
  6. 6. Equivalent form with assumed (working) means Ax and Ay:
      γ = [Σ(dx·dy) − (Σdx·Σdy)/N] / √{[Σdx² − (Σdx)²/N] · [Σdy² − (Σdy)²/N]}
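The two assumed-mean forms above are algebraically equivalent to the direct formula on slide 4. As an editor's illustration (no code appears in the original deck), here is a minimal Python sketch of both the direct form and the assumed-mean shortcut; the function names are hypothetical.

```python
# Minimal sketch of Pearson's coefficient: direct deviations from the actual
# means, and the assumed-mean shortcut (slide 6 form). Editor's illustration.
from math import sqrt

def pearson_r(x, y):
    """r = sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2)), with dx = x - mean(x), dy = y - mean(y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [xi - mx for xi in x]
    dy = [yi - my for yi in y]
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def pearson_r_assumed_means(x, y, ax, ay):
    """Same coefficient, but with deviations taken from assumed means ax and ay."""
    n = len(x)
    dx = [xi - ax for xi in x]
    dy = [yi - ay for yi in y]
    sxy = sum(a * b for a, b in zip(dx, dy)) - sum(dx) * sum(dy) / n
    sxx = sum(a * a for a in dx) - sum(dx) ** 2 / n
    syy = sum(b * b for b in dy) - sum(dy) ** 2 / n
    return sxy / sqrt(sxx * syy)
```

Both functions return the same value for any choice of assumed means, which is why the shortcut is convenient when the actual means are awkward fractions.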
  7. 7. Assumption of Karl Pearson's coefficient of correlation:
      1. A linear relationship exists between the variables.
      Properties of Karl Pearson's coefficient of correlation:
      1. Its value lies between +1 and −1.
      2. A value of zero means no correlation.
      3. γ = √(bxy × byx), where bxy and byx are the regression coefficients.
      Merit: convenient for interpretation, as it gives both the degree and the direction of the relationship between two variables.
  8. 8. Limitations:
      1. Assumes a linear relationship even when one may not exist.
      2. The method of calculation is laborious and time-consuming.
      3. It is affected by extreme values in the distribution.
  9. 9. Probable error of Karl Pearson's coefficient of correlation:
      P.E.(γ) = 0.6745 × (1 − γ²) / √N
  10. 10. Q7. Calculate the coefficient of correlation for the following data:
      X: 65 63 67 64 68 62 70 66 68 67 69 71
      Y: 68 66 68 65 69 66 68 65 71 67 68 70
      Using γ = Σ(dx·dy) / √(Σdx² · Σdy²) = Σ(dx·dy) / (N·σx·σy)
  11. 11. Working (deviations from the actual means, x̄ = 66.67, ȳ = 67.58):
      X:      65    63    67    64    68    62    70    66    68    67    69    71   (ΣX = 800)
      Y:      68    66    68    65    69    66    68    65    71    67    68    70   (ΣY = 811)
      dx:   -1.67 -3.67  0.33 -2.67  1.33 -4.67  3.33 -0.67  1.33  0.33  2.33  4.33
      dy:    0.42 -1.58  0.42 -2.58  1.42 -1.58  0.42 -2.58  3.42 -0.58  0.42  2.42
      dx²:   2.78 13.44  0.11  7.11  1.78 21.78 11.11  0.44  1.78  0.11  5.44 18.78  (Σdx² = 84.67)
      dy²:   0.17  2.51  0.17  6.67  2.01  2.51  0.17  6.67 11.67  0.34  0.17  5.84  (Σdy² = 38.92)
      dx·dy:-0.69  5.81  0.14  6.89  1.89  7.39  1.39  1.72  4.56 -0.19  0.97 10.47  (Σdx·dy = 40.33)
      Σdx² × Σdy² = 3294.9, √(Σdx² × Σdy²) = 57.40
      Coefficient of correlation γ = 40.33 / 57.40 = 0.70
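As a quick cross-check of this working, and of the probable-error formula on slide 9, here is a small Python sketch (an editor's illustration, not part of the deck):

```python
# Sketch verifying slide 11 (Q7) and the probable error from slide 9.
from math import sqrt

x = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]
y = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
dx = [xi - mx for xi in x]
dy = [yi - my for yi in y]
sum_dxdy = sum(a * b for a, b in zip(dx, dy))   # ~40.33
sum_dx2 = sum(a * a for a in dx)                # ~84.67
sum_dy2 = sum(b * b for b in dy)                # ~38.92
r = sum_dxdy / sqrt(sum_dx2 * sum_dy2)          # ~0.70
pe = 0.6745 * (1 - r ** 2) / sqrt(n)            # probable error, ~0.10
print(round(r, 2), round(pe, 3))
```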
  12. 12. Q8. The following data give the ages of husbands and wives. Find the correlation coefficient.
      Husband: 23 27 28 29 30 31 33 35 36 39
      Wife:    18 22 23 24 25 26 28 29 30 32
      Answer: γ = 0.99
  13. 13. Working (x̄ = 31.10, ȳ = 25.70):
      X:      23    27    28    29    30    31    33    35    36    39   (ΣX = 311)
      Y:      18    22    23    24    25    26    28    29    30    32   (ΣY = 257)
      dx:   -8.10 -4.10 -3.10 -2.10 -1.10 -0.10  1.90  3.90  4.90  7.90
      dy:   -7.70 -3.70 -2.70 -1.70 -0.70  0.30  2.30  3.30  4.30  6.30
      dx²:  65.61 16.81  9.61  4.41  1.21  0.01  3.61 15.21 24.01 62.41  (Σdx² = 202.9)
      dy²:  59.29 13.69  7.29  2.89  0.49  0.09  5.29 10.89 18.49 39.69  (Σdy² = 158.1)
      dx·dy:62.37 15.17  8.37  3.57  0.77 -0.03  4.37 12.87 21.07 49.77  (Σdx·dy = 178.3)
      Σdx² × Σdy² = 32078.49, √(Σdx² × Σdy²) = 179.10
      Coefficient of correlation γ = 178.3 / 179.10 ≈ 0.99
  14. 14. Rank correlation: sometimes variables are not quantitative in nature but can still be arranged in serial order, especially when dealing with attributes such as honesty, beauty, character, or morality. To handle such situations, Charles Edward Spearman developed, in 1904, a formula for the correlation coefficient between the ranks of n individuals in two attributes under study, or between the ranks given by two or more judges.
  15. 15. Rank coefficient of correlation:
      ρ (rho) = 1 − 6·Σd² / (N³ − N) = 1 − 6·Σd² / [N(N² − 1)]
      where Σd² = total of the squared differences between ranks and N = number of items.
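A minimal Python sketch of this formula (editor's illustration; the helper name spearman_rho is hypothetical and the code assumes there are no tied ranks):

```python
# Spearman's rank correlation for two rank lists with no ties:
# rho = 1 - 6 * sum(d^2) / (N^3 - N)
def spearman_rho(ranks_a, ranks_b):
    n = len(ranks_a)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d2 / (n ** 3 - n)
```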
  16. 16. Q9. Ten competitors in a cooking competition are ranked by three judges as follows. Using the rank correlation method, find which pair of judges has the nearest approach.
      Competitor: 1  2  3  4  5   6  7  8  9 10
      Judge P:    1  6  5 10  3   2  4  9  7  8
      Judge Q:    3  5  8  4  7  10  2  1  6  9
      Judge R:    6  4  9  8  1   2  3 10  5  7
  17. 17. Working:
      Competitor:  1   2   3   4   5   6   7   8   9  10   Total
      P:           1   6   5  10   3   2   4   9   7   8
      Q:           3   5   8   4   7  10   2   1   6   9
      R:           6   4   9   8   1   2   3  10   5   7
      d(P−Q):     -2   1  -3   6  -4  -8   2   8   1  -1
      d(P−Q)²:     4   1   9  36  16  64   4  64   1   1   200
      d(Q−R):     -3   1  -1  -4   6   8  -1  -9   1   2
      d(Q−R)²:     9   1   1  16  36  64   1  81   1   4   214
      d(P−R):     -5   2  -4   2   2   0   1  -1   2   1
      d(P−R)²:    25   4  16   4   4   0   1   1   4   1    60
      With N = 10, N³ − N = 990:
      ρ(P,Q) = 1 − 6(200)/990 = 1 − 1.212 = −0.21
      ρ(Q,R) = 1 − 6(214)/990 = 1 − 1.297 = −0.30
      ρ(P,R) = 1 − 6(60)/990  = 1 − 0.364 = 0.64
      Judges P and R have the nearest approach.
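The same calculation in Python (editor's illustration; the spearman_rho helper from the previous sketch is repeated so the snippet runs on its own):

```python
# Rank correlation for each pair of judges in Q9 (slide 17).
def spearman_rho(ranks_a, ranks_b):
    n = len(ranks_a)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d2 / (n ** 3 - n)

P = [1, 6, 5, 10, 3, 2, 4, 9, 7, 8]
Q = [3, 5, 8, 4, 7, 10, 2, 1, 6, 9]
R = [6, 4, 9, 8, 1, 2, 3, 10, 5, 7]
print(spearman_rho(P, Q))   # ~ -0.21
print(spearman_rho(Q, R))   # ~ -0.30
print(spearman_rho(P, R))   # ~  0.64 -> judges P and R agree most closely
```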
  18. 18. Regression analysis is the process of developing a statistical model used to predict the value of a dependent variable from an independent variable. Application: advertising expenditure vs. sales revenue. It was first used by Sir Francis Galton in 1877 in studying the heights of sons with respect to the heights of their fathers.
  19. 19. Regression means going back, i.e. reverting to a former condition. Regression analysis refers to the functional relationship between x and y and gives estimates of the dependent variable y for given values of the independent variable x, e.g. the relationship between employees' income and savings. The regression coefficients can be used to calculate the correlation coefficient: γ = √(bxy × byx).
  20. 20. Types of regression:
      1. Simple and multiple regression
      2. Total or partial
      3. Linear or non-linear
      Methods of regression analysis:
      1. Scatter diagram
      2. Regression equations
      3. Regression lines
      Regression of y on x: y = a + bx; regression of x on y: x = a + by
  21. 21. Regression coefficients:
      Coefficient of regression of x on y: bxy = Σ(x − x̄)(y − ȳ) / Σ(y − ȳ)² = Σ(dx·dy) / Σdy²
      Coefficient of regression of y on x: byx = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² = Σ(dx·dy) / Σdx²
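A minimal Python sketch of these two coefficients (editor's illustration; the function name is hypothetical):

```python
# Regression coefficients from slide 21:
# bxy = sum(dx*dy) / sum(dy^2), byx = sum(dx*dy) / sum(dx^2)
def regression_coefficients(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [xi - mx for xi in x]
    dy = [yi - my for yi in y]
    sdxdy = sum(a * b for a, b in zip(dx, dy))
    bxy = sdxdy / sum(b * b for b in dy)   # regression of x on y
    byx = sdxdy / sum(a * a for a in dx)   # regression of y on x
    return bxy, byx
```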
  22. 22. Q2. From the data given below find: the two regression coefficients, the two regression equations, the coefficient of correlation between marks in Economics and Statistics, and the most likely marks in Statistics when the marks in Economics are 30. Let marks in Economics be x and marks in Statistics be y.
      Marks in Eco:  25 28 35 32 31 36 29 38 34 32
      Marks in Stat: 43 46 49 41 36 32 31 30 33 39
  23. 23. Marks in Eco (x):  25 28 35 32 31 36 29 38 34 32   (Σx = 320, x̄ = 32)
      Marks in Stat (y): 43 46 49 41 36 32 31 30 33 39   (Σy = 380, ȳ = 38)
  24. 24. Marks in Eco (x):  25 28 35 32 31 36 29 38 34 32   (Σx = 320, x̄ = 32)
      Marks in Stat (y): 43 46 49 41 36 32 31 30 33 39   (Σy = 380, ȳ = 38)
      dx = x − x̄ = x − 32:  -7 -4  3  0 -1  4 -3  6  2  0   (Σdx = 0)
      dy = y − ȳ = y − 38:   5  8 11  3 -2 -6 -7 -8 -5  1   (Σdy = 0)
  25. 25. Marks in Eco (x):  25 28 35 32 31 36 29 38 34 32   (Σx = 320, x̄ = 32)
      Marks in Stat (y): 43 46 49 41 36 32 31 30 33 39   (Σy = 380, ȳ = 38)
      dx = x − 32:   -7  -4   3  0  -1   4  -3   6   2  0   (Σdx = 0)
      dy = y − 38:    5   8  11  3  -2  -6  -7  -8  -5  1   (Σdy = 0)
      dx²:           49  16   9  0   1  16   9  36   4  0   (Σdx² = 140)
      dy²:           25  64 121  9   4  36  49  64  25  1   (Σdy² = 398)
      dx·dy:        -35 -32  33  0   2 -24  21 -48 -10  0   (Σdx·dy = -93)
  26. 26. Regression coefficients:
      Coefficient of regression of x on y: bxy = Σ(dx·dy) / Σdy² = -93 / 398 = -0.2337
      Coefficient of regression of y on x: byx = Σ(dx·dy) / Σdx² = -93 / 140 = -0.6643
  27. 27. Regression of x on y:
      x − x̄ = bxy (y − ȳ)
      x − 32 = -0.2337 (y − 38) = -0.2337y + 0.2337 × 38 = -0.2337y + 8.8806
      x = -0.2337y + 32 + 8.8806
      x = -0.2337y + 40.8806
  28. 28. Correlation coefficient γ = √(bxy × byx) = √((-0.2337) × (-0.6643)) = √0.1552 = 0.394, taken with a negative sign (γ = -0.394) since byx and bxy are both negative.
  29. 29. Regression of y on x:
      y − ȳ = byx (x − x̄)
      y − 38 = -0.6643 (x − 32)
      y = -0.6643x + 38 + 0.6643 × 32
      y = -0.6643x + 38 + 21.2576
      y = -0.6643x + 59.2576
  30. 30. To estimate the most likely marks in Statistics (y) when the marks in Economics (x) are 30, we use the line of regression of y on x:
      y = -0.6643 × 30 + 59.2576 = -19.929 + 59.2576 = 39.3286
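The whole worked example (slides 22 to 30) can be reproduced in a few lines of Python; this sketch is an editor's illustration and uses hypothetical variable names:

```python
# Marks in Economics (x) vs. Statistics (y): regression coefficients,
# correlation coefficient, and the estimate of y at x = 30.
from math import sqrt

eco  = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]   # x
stat = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]   # y
n = len(eco)
mx, my = sum(eco) / n, sum(stat) / n              # 32, 38
dx = [xi - mx for xi in eco]
dy = [yi - my for yi in stat]
sdxdy = sum(a * b for a, b in zip(dx, dy))        # -93
bxy = sdxdy / sum(b * b for b in dy)              # -93/398 ~ -0.2337
byx = sdxdy / sum(a * a for a in dx)              # -93/140 ~ -0.6643
r = -sqrt(bxy * byx)                              # negative, since both b's are negative
y_at_30 = my + byx * (30 - mx)                    # ~39.33 marks in Statistics
print(round(bxy, 4), round(byx, 4), round(r, 3), round(y_at_30, 4))
```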
  31. 31. Sums of squares:
      SSxy = Σ(x − x̄)(y − ȳ) = Σxy − (Σx)(Σy)/n
      SSxx = Σ(x − x̄)² = Σx² − (Σx)²/n
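A minimal Python sketch of these shortcut formulas (editor's illustration; the helper names are hypothetical):

```python
# Shortcut sums of squares from slide 31.
def ss_xy(x, y):
    n = len(x)
    return sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n

def ss_xx(x):
    n = len(x)
    return sum(a * a for a in x) - sum(x) ** 2 / n
```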
  32. 32. Sales and advertising expenses (in Rs. 1000). Develop a regression model.
      Advt:   92  94   97  98  100  102  104  105  105  107  107  110
      Sales: 930 900 1020 990 1100 1050 1150 1120 1130 1200 1250 1220
  33. 33. b = SSxy / SSxx
      For the intercept, from y = a + bx:
      Σy = Σa + bΣx
      Σy = n·a + bΣx
      n·a = Σy − bΣx
      a = (Σy − bΣx)/n = Σy/n − bΣx/n
  34. 34. Working table (x = advt, y = sales):
      x:      92    94    97    98   100    102    104    105    105    107    107    110   (Σx = 1221)
      y:     930   900  1020   990  1100   1050   1150   1120   1130   1200   1250   1220   (Σy = 13060)
      x²:   8464  8836  9409  9604 10000  10404  10816  11025  11025  11449  11449  12100   (Σx² = 124581)
      xy:  85560 84600 98940 97020 110000 107100 119600 117600 118650 128400 133750 134200  (Σxy = 1335420)
      ŷ (fits):  902.40 940.54 997.75 1016.82 1054.96 1093.10 1131.24 1150.31 1150.31 1188.45 1188.45 1245.66 (Σŷ = 13059.99)
      Residuals (y − ŷ): 27.6 -40.54 22.25 -26.82 45.04 -43.1 18.76 -30.31 -20.31 11.55 61.55 -25.66 (Σ = 0.01)
  35. 35. Complete working (ȳ = 1088.33):
      x:         92      94      97      98     100     102     104     105     105     107     107     110    (Σx = 1221)
      y:        930     900    1020     990    1100    1050    1150    1120    1130    1200    1250    1220    (Σy = 13060)
      x²:      8464    8836    9409    9604   10000   10404   10816   11025   11025   11449   11449   12100   (Σx² = 124581)
      xy:     85560   84600   98940   97020  110000  107100  119600  117600  118650  128400  133750  134200   (Σxy = 1335420)
      y − ȳ: -158.33 -188.33  -68.33  -98.33   11.67  -38.33   61.67   31.67   41.67  111.67  161.67  131.67   (Σ = 0.00)
      (y − ȳ)²: 25069.44 35469.44 4669.44 9669.44 136.11 1469.44 3802.78 1002.78 1736.11 12469.44 26136.11 17336.11 (Σ = 138966.67)
      ŷ (fits): 902.40  940.54  997.75 1016.82 1054.96 1093.10 1131.24 1150.31 1150.31 1188.45 1188.45 1245.66 (Σ = 13059.99)
      y − ŷ:     27.6  -40.54   22.25  -26.82   45.04  -43.1    18.76  -30.31  -20.31   11.55   61.55  -25.66  (Σ = 0.01)
      (y − ŷ)²: 761.76 1643.49  495.06  719.31 2028.60 1857.61  351.94  918.70  412.50  133.40 3788.40  658.44 (Σ = 13769.21)
      ŷ − ȳ:  -185.93 -147.79  -90.58  -71.51  -33.37    4.77   42.91   61.98   61.98  100.12  100.12  157.33  (Σ = -0.01)
      (ŷ − ȳ)²: 34571.20 21842.87 8205.34 5114.16 1113.78 22.72 1840.98 3841.11 3841.11 10023.35 10023.35 24751.68 (Σ = 125191.6)
  36. 36. x̄ = 1221 / 12 = 101.75
      SSxy = Σxy − (Σx·Σy)/n = 1335420 − (1221 × 13060)/12 = 6565
      SSxx = Σx² − (Σx)²/n = 124581 − (1221)²/12 = 344.25
  37. 37. b = SSxy / SSxx = 6565 / 344.25 = 19.0704
      From y = a + bx: Σy = n·a + bΣx, so n·a = Σy − bΣx and
      a = Σy/n − bΣx/n = 13060/12 − (19.0704 × 1221)/12 = -852.08
  38. 38. Equation of the simple regression line (regression of y on x): y = a + bx = -852.08 + 19.0704x
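Fitting the advertising/sales data of slide 32 with these formulas can be sketched in Python as follows (an editor's illustration, not from the deck):

```python
# Least-squares fit of sales on advertising (slides 32-38).
advt  = [92, 94, 97, 98, 100, 102, 104, 105, 105, 107, 107, 110]
sales = [930, 900, 1020, 990, 1100, 1050, 1150, 1120, 1130, 1200, 1250, 1220]
n = len(advt)
ss_xy = sum(x * y for x, y in zip(advt, sales)) - sum(advt) * sum(sales) / n   # 6565
ss_xx = sum(x * x for x in advt) - sum(advt) ** 2 / n                          # 344.25
b = ss_xy / ss_xx                                                              # ~19.07
a = sum(sales) / n - b * sum(advt) / n                                         # ~-852.08
fits = [a + b * x for x in advt]                                               # predicted values y^
print(round(b, 4), round(a, 2))
```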
  39. 39. For testing the fit:
      yi = recorded (actual) value of y in the given data
      ȳ  = mean (average) of y
      ŷ  = value predicted from the regression line
      Deviation (yi − ȳ) = difference of the actual value of y from the mean
      Residual (yi − ŷ) = gap (error) between the actual value of y and the value predicted from the regression line
      Deviation of the predicted value from the mean = (ŷ − ȳ)
      a = intercept on the y-axis, b = slope of the regression line
  40. 40. Total sum of squares: SST = Σ(yi − ȳ)²
      Regression sum of squares: SSR = Σ(ŷ − ȳ)²
      Error sum of squares: SSE = Σ(yi − ŷ)²
      Coefficient of determination: γ² = SSR / SST
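Continuing the advertising/sales example, here is a Python sketch of these fit diagnostics (editor's illustration; the intercept and slope are taken from slide 37):

```python
# SST, SSR, SSE and the coefficient of determination for the sales fit.
advt  = [92, 94, 97, 98, 100, 102, 104, 105, 105, 107, 107, 110]
sales = [930, 900, 1020, 990, 1100, 1050, 1150, 1120, 1130, 1200, 1250, 1220]
a, b = -852.08, 19.0704                                  # intercept and slope (slide 37)
y_bar = sum(sales) / len(sales)
fits = [a + b * x for x in advt]                         # predicted values y^
sst = sum((y - y_bar) ** 2 for y in sales)               # total sum of squares, ~138967
ssr = sum((f - y_bar) ** 2 for f in fits)                # regression sum of squares, ~125192
sse = sum((y - f) ** 2 for y, f in zip(sales, fits))     # error sum of squares, ~13769
r_squared = ssr / sst                                    # coefficient of determination, ~0.90
print(round(r_squared, 3))
```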
  41. 41. Standard error of estimate: Syx = √(SSE / (n − 2))
      To determine whether a significant linear relationship exists between the independent variable x and the dependent variable y, we test whether the population slope β is zero:
      t = (b − β) / Sb, where Sb = standard error of b = Syx / √SSxx
  42. 42. H0: the slope of the regression line is zero. H1: the slope of the regression line is not zero.
  43. 43. Syx = standard error of estimate = √(SSE / (n − 2)) = √(Σ(yi − ŷ)² / (n − 2)) = √(13769.21 / (12 − 2)) = √1376.92 = 37.1068
      SSxx = Σx² − (Σx)²/n = 124581 − (1221)²/12 = 344.25
      Sb = standard error of b = Syx / √SSxx
  44. 44. Sb = standard error of b = Syx / √SSxx
      t = (b − β) / Sb = (19.07 − 0) / (37.1068 / √344.25) = 9.53
      Since the calculated value of t exceeds the table value of t for 12 − 2 = 10 degrees of freedom, the null hypothesis is rejected.
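A Python sketch of this t-test (editor's illustration; SSE and SSxx are carried over from slides 35-36 and 43):

```python
# t-test for the slope of the sales-on-advertising regression.
from math import sqrt

n, b = 12, 19.0704
sse = 13769.21                      # error sum of squares (slide 35)
ss_xx = 344.25                      # sum of squares of x (slide 36)
s_yx = sqrt(sse / (n - 2))          # standard error of estimate, ~37.11
s_b = s_yx / sqrt(ss_xx)            # standard error of the slope, ~2.0
t = (b - 0) / s_b                   # ~9.53, well above the 5% critical t for 10 d.f. (~2.23)
print(round(s_yx, 4), round(t, 2))
```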
  45. 45. Coefficient of Determination. Definition: the coefficient of determination, also known as R squared, is interpreted as the goodness of fit of a regression. The higher the coefficient of determination, the greater the proportion of the variance in the dependent variable that is explained by the independent variable. It is the overall measure of the usefulness of a regression. For example, if r² = 0.95, then 95% of the variation in the dependent variable is explained by the independent variable, which indicates a good regression.
  46. 46. The coefficient of determination can be calculated as the regression sum of squares, SSR, divided by the total sum of squares, SST:
      γ² = SSR / SST
  47. 47. Campus Overview.
      Ahmedabad: 907/A Uvarshad, Gandhinagar Highway, Ahmedabad – 382422.
      Kolkata: Infinity Benchmark, 10th Floor, Plot G1, Block EP & GP, Sector V, Salt Lake, Kolkata – 700091.
      Mumbai: Goldline Business Centre, Linkway Estate, Next to Chincholi Fire Brigade, Malad (West), Mumbai – 400 064.
  48. 48. Thank You
