Correlation & regression uwsb


Published in: Education, Technology

  1. 1. Correlation-Regression
  2. 2. Correlation deals with the association between two or more variables. Correlation analysis deals with the covariation between two or more variables.
     Types:
     1. Positive or negative
     2. Simple or multiple
     3. Linear or non-linear
  3. 3. Methods of Measuring Correlation
     1. Graphic method
     2. Diagrammatic method: scatter diagram
     3. Algebraic method
        a. Karl Pearson's coefficient of correlation
        b. Spearman's rank coefficient of correlation
        c. Coefficient of concurrent deviations
        d. Least squares method
  4. 4. Karl Pearson's Coefficient of Correlation
     $$\gamma\ (\text{gamma}) = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \sum dy^2}} = \frac{\sum dx\,dy}{N \sigma_x \sigma_y}$$
     where $dx = x - \bar{x}$, $dy = y - \bar{y}$, and $\sum dx\,dy$ is the sum of products of deviations from the respective arithmetic means of the two series.
  5. 5. Karl Pearson's Coefficient of Correlation
     After calculating assumed or working means $A_x$ and $A_y$, with $dx = x - A_x$ and $dy = y - A_y$:
     $$\gamma\ (\text{gamma}) = \frac{N \sum dx\,dy - (\sum dx)(\sum dy)}{\sqrt{\left[N \sum dx^2 - (\sum dx)^2\right]\left[N \sum dy^2 - (\sum dy)^2\right]}}$$
     $\sum dx\,dy$ = total of products of deviations from the assumed means of the x and y series
     $\sum dx$ = total of deviations of the x series
     $\sum dy$ = total of deviations of the y series
     $\sum dx^2$ = total of squared deviations of the x series
     $\sum dy^2$ = total of squared deviations of the y series
     $N$ = number of items (number of paired items)
  6. 6. Karl Pearson's Coefficient of Correlation
     After calculating assumed or working means $A_x$ and $A_y$:
     $$\gamma\ (\text{gamma}) = \frac{\sum dx\,dy - \dfrac{(\sum dx)(\sum dy)}{N}}{\sqrt{\left[\sum dx^2 - \dfrac{(\sum dx)^2}{N}\right]\left[\sum dy^2 - \dfrac{(\sum dy)^2}{N}\right]}}$$
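
The assumed-mean formula should give the same coefficient as the formula based on deviations from the true means, whatever working means are chosen. The sketch below checks this numerically. The function names and the small data set are illustrative assumptions, not taken from the slides.

```python
import math

def pearson_direct(xs, ys):
    """Coefficient from deviations about the true arithmetic means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den

def pearson_assumed_mean(xs, ys, ax, ay):
    """Coefficient from deviations about assumed (working) means ax, ay."""
    n = len(xs)
    dx = [x - ax for x in xs]
    dy = [y - ay for y in ys]
    s_dx, s_dy = sum(dx), sum(dy)
    s_dxdy = sum(a * b for a, b in zip(dx, dy))
    s_dx2 = sum(a * a for a in dx)
    s_dy2 = sum(b * b for b in dy)
    num = n * s_dxdy - s_dx * s_dy
    den = math.sqrt((n * s_dx2 - s_dx ** 2) * (n * s_dy2 - s_dy ** 2))
    return num / den

# Illustrative data only (not from the slides).
xs = [2, 4, 5, 7, 9]
ys = [3, 5, 4, 8, 10]

r_direct = pearson_direct(xs, ys)
r_assumed = pearson_assumed_mean(xs, ys, ax=5, ay=6)
print(r_direct, r_assumed)
```

Changing `ax` and `ay` to any other values should leave the result unchanged, which is why a convenient round number can be used as the working mean in hand calculation.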
  7. 7. Assumptions of Karl Pearson's Coefficient of Correlation
     1. A linear relationship exists between the variables.
     Properties of Karl Pearson's Coefficient of Correlation
     1. Its value lies between +1 and -1.
     2. Zero means no linear correlation.
     3. $\gamma = \sqrt{b_{xy} \times b_{yx}}$, where $b_{xy}$ and $b_{yx}$ are the two regression coefficients; $\gamma$ takes their common sign.
     Merit
     It is convenient for accurate interpretation because it gives both the degree and the direction of the relationship between two variables.
  8. 8. Limitations
     1. It assumes a linear relationship, even though the relationship may not be linear.
     2. The method and process of calculation are difficult and time consuming.
     3. It is affected by extreme values in the distribution.
  9. 9. Probable Error of Karl Pearson's Coefficient of Correlation
     $$\text{Probable Error of } \gamma = 0.6745 \times \frac{1 - \gamma^2}{\sqrt{N}}$$
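
As a quick arithmetic illustration of the probable error formula, the sketch below evaluates it for a coefficient of 0.70 on 12 pairs (the result of Q7 in this deck). The function name is an assumption made for the example.

```python
import math

def probable_error(r, n):
    """Probable error of a correlation coefficient r computed from n pairs."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

# r = 0.70 with N = 12, as in the Q7 worked example.
pe = probable_error(0.70, 12)
print(round(pe, 4))
```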
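  10. 10. Q7. Calculate the coefficient of correlation for the following data.

      | X | 65 | 63 | 67 | 64 | 68 | 62 | 70 | 66 | 68 | 67 | 69 | 71 |
      |---|---|---|---|---|---|---|---|---|---|---|---|---|
      | Y | 68 | 66 | 68 | 65 | 69 | 66 | 68 | 65 | 71 | 67 | 68 | 70 |

      Ans.
      $$\gamma\ (\text{gamma}) = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \sum dy^2}} = \frac{\sum dx\,dy}{N \sigma_x \sigma_y}$$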
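  11. 11. Worked table for Q7

      | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | Sum | Mean |
      |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
      | X | 65 | 63 | 67 | 64 | 68 | 62 | 70 | 66 | 68 | 67 | 69 | 71 | 800 | 66.67 |
      | Y | 68 | 66 | 68 | 65 | 69 | 66 | 68 | 65 | 71 | 67 | 68 | 70 | 811 | 67.58 |
      | dx = x - x̄ | -1.67 | -3.67 | 0.33 | -2.67 | 1.33 | -4.67 | 3.33 | -0.67 | 1.33 | 0.33 | 2.33 | 4.33 | | |
      | dy = y - ȳ | 0.42 | -1.58 | 0.42 | -2.58 | 1.42 | -1.58 | 0.42 | -2.58 | 3.42 | -0.58 | 0.42 | 2.42 | | |
      | dx² | 2.78 | 13.44 | 0.11 | 7.11 | 1.78 | 21.78 | 11.11 | 0.44 | 1.78 | 0.11 | 5.44 | 18.78 | 84.67 | |
      | dy² | 0.17 | 2.51 | 0.17 | 6.67 | 2.01 | 2.51 | 0.17 | 6.67 | 11.67 | 0.34 | 0.17 | 5.84 | 38.92 | |
      | dx·dy | -0.69 | 5.81 | 0.14 | 6.89 | 1.89 | 7.39 | 1.39 | 1.72 | 4.56 | -0.19 | 0.97 | 10.47 | 40.33 | |

      $\sum dx^2 \times \sum dy^2 = 3294.9$
      $\sqrt{\sum dx^2 \sum dy^2} = 57.40$
      Coefficient of correlation $= \dfrac{40.33}{57.40} = 0.70$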
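  12. 12. Q8. The following information gives the ages of husbands and wives. Find the correlation coefficient.

      | Husband | 23 | 27 | 28 | 29 | 30 | 31 | 33 | 35 | 36 | 39 |
      |---|---|---|---|---|---|---|---|---|---|---|
      | Wife | 18 | 22 | 23 | 24 | 25 | 26 | 28 | 29 | 30 | 32 |

      Answer: $\gamma\ (\text{gamma}) \approx 0.99$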
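  13. 13. Worked table for Q8 (X = husband's age, Y = wife's age)

      | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Sum | Mean |
      |---|---|---|---|---|---|---|---|---|---|---|---|---|
      | X | 23 | 27 | 28 | 29 | 30 | 31 | 33 | 35 | 36 | 39 | 311 | 31.10 |
      | Y | 18 | 22 | 23 | 24 | 25 | 26 | 28 | 29 | 30 | 32 | 257 | 25.70 |
      | dx = x - x̄ | -8.10 | -4.10 | -3.10 | -2.10 | -1.10 | -0.10 | 1.90 | 3.90 | 4.90 | 7.90 | | |
      | dy = y - ȳ | -7.70 | -3.70 | -2.70 | -1.70 | -0.70 | 0.30 | 2.30 | 3.30 | 4.30 | 6.30 | | |
      | dx² | 65.61 | 16.81 | 9.61 | 4.41 | 1.21 | 0.01 | 3.61 | 15.21 | 24.01 | 62.41 | 202.9 | |
      | dy² | 59.29 | 13.69 | 7.29 | 2.89 | 0.49 | 0.09 | 5.29 | 10.89 | 18.49 | 39.69 | 158.1 | |
      | dx·dy | 62.37 | 15.17 | 8.37 | 3.57 | 0.77 | -0.03 | 4.37 | 12.87 | 21.07 | 49.77 | 178.3 | |

      $\sum dx^2 \times \sum dy^2 = 32078.49$
      $\sqrt{\sum dx^2 \sum dy^2} = 179.10$
      Coefficient of correlation $= \dfrac{178.3}{179.10} = 0.9955$, i.e. 0.99 when truncated and 1.00 when rounded to two decimal places.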
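
The two worked answers (Q7 and Q8) can be re-checked with a few lines of Python. This is a minimal sketch using only the standard library; the helper name `pearson` and the variable names are assumptions made for the example.

```python
import math

def pearson(xs, ys):
    """Karl Pearson's coefficient from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_dxdy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_dx2 = sum((x - x_bar) ** 2 for x in xs)
    s_dy2 = sum((y - y_bar) ** 2 for y in ys)
    return s_dxdy / math.sqrt(s_dx2 * s_dy2)

# Q7 data
q7_x = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]
q7_y = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]

# Q8 data (ages of husbands and wives)
q8_husband = [23, 27, 28, 29, 30, 31, 33, 35, 36, 39]
q8_wife = [18, 22, 23, 24, 25, 26, 28, 29, 30, 32]

r_q7 = pearson(q7_x, q7_y)
r_q8 = pearson(q8_husband, q8_wife)
print(round(r_q7, 4), round(r_q8, 4))
```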
  14. 14. Rank Correlation
      Sometimes variables are not quantitative in nature but can be arranged in serial order, especially when dealing with attributes such as honesty, beauty, character and morality.
      To deal with such situations, Charles Edward Spearman developed in 1904 a formula for obtaining the correlation coefficient between the ranks of n individuals in the two attributes under study, or between the ranks given by two or three judges.
  15. 15. Rank Coefficient of Correlation
      $$\rho\ (\text{rho}) = 1 - \frac{6 \sum d^2}{N^3 - N} = 1 - \frac{6 \sum d^2}{N(N^2 - 1)}$$
      $\sum d^2$ = total of squared differences between ranks
      $N$ = number of items
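  16. 16. Q9. Ten competitors in a cooking competition are ranked by three judges (P, Q and R) in the following way. Using the rank correlation method, find out which pair of judges has the nearest approach.

      | Competitor | P | Q | R |
      |---|---|---|---|
      | 1 | 1 | 3 | 6 |
      | 2 | 6 | 5 | 4 |
      | 3 | 5 | 8 | 9 |
      | 4 | 10 | 4 | 8 |
      | 5 | 3 | 7 | 1 |
      | 6 | 2 | 10 | 2 |
      | 7 | 4 | 2 | 3 |
      | 8 | 9 | 1 | 10 |
      | 9 | 7 | 6 | 5 |
      | 10 | 8 | 9 | 7 |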
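  17. 17. Worked table for Q9

      | Competitor | P | Q | R | Rp-Rq | d²pq | Rq-Rr | d²qr | Rp-Rr | d²pr |
      |---|---|---|---|---|---|---|---|---|---|
      | 1 | 1 | 3 | 6 | -2 | 4 | -3 | 9 | -5 | 25 |
      | 2 | 6 | 5 | 4 | 1 | 1 | 1 | 1 | 2 | 4 |
      | 3 | 5 | 8 | 9 | -3 | 9 | -1 | 1 | -4 | 16 |
      | 4 | 10 | 4 | 8 | 6 | 36 | -4 | 16 | 2 | 4 |
      | 5 | 3 | 7 | 1 | -4 | 16 | 6 | 36 | 2 | 4 |
      | 6 | 2 | 10 | 2 | -8 | 64 | 8 | 64 | 0 | 0 |
      | 7 | 4 | 2 | 3 | 2 | 4 | -1 | 1 | 1 | 1 |
      | 8 | 9 | 1 | 10 | 8 | 64 | -9 | 81 | -1 | 1 |
      | 9 | 7 | 6 | 5 | 1 | 1 | 1 | 1 | 2 | 4 |
      | 10 | 8 | 9 | 7 | -1 | 1 | 2 | 4 | 1 | 1 |
      | Σ | | | | 0 | 200 | 0 | 214 | 0 | 60 |

      | | P and Q | Q and R | P and R |
      |---|---|---|---|
      | 6 Σd² | 1200 | 1284 | 360 |
      | N³ - N | 990 | 990 | 990 |
      | 6 Σd² / (N³ - N) | 1.21 | 1.297 | 0.3636 |
      | ρ (rho) | -0.21 | -0.297 | 0.636364 |

      Judges P and R have the highest rank correlation, so this pair has the nearest approach.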
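
The rank-correlation working for Q9 can be reproduced as follows. This is a minimal sketch of the formula $\rho = 1 - 6\sum d^2/(N^3 - N)$, which assumes there are no tied ranks; the function and variable names are illustrative.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rank coefficient for two rankings without ties."""
    n = len(rank_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

# Ranks given by the three judges to competitors 1..10 (Q9 data)
P = [1, 6, 5, 10, 3, 2, 4, 9, 7, 8]
Q = [3, 5, 8, 4, 7, 10, 2, 1, 6, 9]
R = [6, 4, 9, 8, 1, 2, 3, 10, 5, 7]

rho = {
    "PQ": spearman_rho(P, Q),
    "QR": spearman_rho(Q, R),
    "PR": spearman_rho(P, R),
}

# The pair with the largest rho has the nearest approach.
closest = max(rho, key=rho.get)
print(rho, closest)
```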
  18. 18. Regression analysis is the process of developing a statistical model that is used to predict the value of a dependent variable from an independent variable.
      Application: advertising vs. sales revenue.
      First used by Sir Francis Galton in 1877 for the study of the height of sons w.r.t. the height of fathers.
  19. 19. Regression Analysis
      "Regression" means going back, reverting to the former condition, or returning.
      It refers to the functional relationship between x and y and to estimates of the value of the dependent variable y for given values of the independent variable x.
      Example: the relationship between the income of employees and their savings.
      Regression coefficients can be used to calculate the correlation coefficient:
      $$\gamma\ (\text{gamma}) = \sqrt{b_{xy} \times b_{yx}}$$
  20. 20. Types of Regression
      1. Simple and multiple regression
      2. Total or partial
      3. Linear / non-linear
      Methods of Regression Analysis
      1. Scatter diagram
      2. Regression equations
      3. Regression lines
  21. 21. Line of Regression of y on x: $y = a + bx$
      The coefficient b is the slope of the line of regression of y on x. It represents the increment in the value of the dependent variable y for a unit change in the value of the independent variable x, i.e. the rate of change of y w.r.t. x. It is written as $b_{yx}$.
      Regression coefficient (coefficient of regression) of y on x:
      $$b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum dx\,dy}{\sum dx^2}$$
      Equation of the line of regression of y on x:
      $$y - \bar{y} = b_{yx}\,(x - \bar{x})$$
  22. 22. Line of Regression of x on y: $x = a + by$
      The coefficient b is the slope of the line of regression of x on y. It represents the increment in the value of the dependent variable x for a unit change in the value of the independent variable y, i.e. the rate of change of x w.r.t. y. It is written as $b_{xy}$.
      Regression coefficient (coefficient of regression) of x on y:
      $$b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{\sum dx\,dy}{\sum dy^2}$$
      Equation of the line of regression of x on y:
      $$x - \bar{x} = b_{xy}\,(y - \bar{y})$$
  23. 23. Q2. From the data given below, find:
      1. the two regression coefficients
      2. the two regression equations
      3. the coefficient of correlation between marks in Economics and Statistics
      4. the most likely marks in Statistics when marks in Economics are 30
      Let marks in Economics be x and marks in Statistics be y.

      | Marks in Eco | 25 | 28 | 35 | 32 | 31 | 36 | 29 | 38 | 34 | 32 |
      |---|---|---|---|---|---|---|---|---|---|---|
      | Marks in Stat | 43 | 46 | 49 | 41 | 36 | 32 | 31 | 30 | 33 | 39 |
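  24. 24. Totals and means for Q2

      | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Sum | Mean |
      |---|---|---|---|---|---|---|---|---|---|---|---|---|
      | Marks in Eco (x) | 25 | 28 | 35 | 32 | 31 | 36 | 29 | 38 | 34 | 32 | Σx = 320 | x̄ = 32 |
      | Marks in Stat (y) | 43 | 46 | 49 | 41 | 36 | 32 | 31 | 30 | 33 | 39 | Σy = 380 | ȳ = 38 |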
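  25. 25. Deviations from the means for Q2

      | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Sum | Mean |
      |---|---|---|---|---|---|---|---|---|---|---|---|---|
      | Marks in Eco (x) | 25 | 28 | 35 | 32 | 31 | 36 | 29 | 38 | 34 | 32 | Σx = 320 | x̄ = 32 |
      | Marks in Stat (y) | 43 | 46 | 49 | 41 | 36 | 32 | 31 | 30 | 33 | 39 | Σy = 380 | ȳ = 38 |
      | dx = x - x̄ = x - 32 | -7 | -4 | 3 | 0 | -1 | 4 | -3 | 6 | 2 | 0 | Σdx = 0 | |
      | dy = y - ȳ = y - 38 | 5 | 8 | 11 | 3 | -2 | -6 | -7 | -8 | -5 | 1 | Σdy = 0 | |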
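  26. 26. Full working table for Q2

      | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Sum | Mean |
      |---|---|---|---|---|---|---|---|---|---|---|---|---|
      | Marks in Eco (x) | 25 | 28 | 35 | 32 | 31 | 36 | 29 | 38 | 34 | 32 | Σx = 320 | x̄ = 32 |
      | Marks in Stat (y) | 43 | 46 | 49 | 41 | 36 | 32 | 31 | 30 | 33 | 39 | Σy = 380 | ȳ = 38 |
      | dx = x - x̄ = x - 32 | -7 | -4 | 3 | 0 | -1 | 4 | -3 | 6 | 2 | 0 | Σdx = 0 | |
      | dy = y - ȳ = y - 38 | 5 | 8 | 11 | 3 | -2 | -6 | -7 | -8 | -5 | 1 | Σdy = 0 | |
      | dx² | 49 | 16 | 9 | 0 | 1 | 16 | 9 | 36 | 4 | 0 | Σdx² = 140 | |
      | dy² | 25 | 64 | 121 | 9 | 4 | 36 | 49 | 64 | 25 | 1 | Σdy² = 398 | |
      | dx·dy | -35 | -32 | 33 | 0 | 2 | -24 | 21 | -48 | -10 | 0 | Σdx dy = -93 | |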
  27. 27. Regression coefficient (coefficient of regression) of y on x:
      $$b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum dx\,dy}{\sum dx^2} = \frac{-93}{140} = -0.6643$$
      Equation of regression of y on x:
      $$y - \bar{y} = b_{yx}\,(x - \bar{x})$$
      $$y - 38 = -0.6643\,(x - 32)$$
      $$y - 38 = -0.6643x + 0.6643 \times 32$$
      $$y = -0.6643x + 38 + 21.2576$$
      $$y = -0.6643x + 59.2576$$
  28. 28. Coefficient of regression of x on y:
      $$b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{\sum dx\,dy}{\sum dy^2} = \frac{-93}{398} = -0.2337$$
      Equation of regression of x on y:
      $$x - \bar{x} = b_{xy}\,(y - \bar{y})$$
      $$x - 32 = -0.2337\,(y - 38) = -0.2337y + 0.2337 \times 38 = -0.2337y + 8.8806$$
      $$x = -0.2337y + 32 + 8.8806$$
      $$x = -0.2337y + 40.8806$$
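  29. 29. Correlation coefficient
      $$|\gamma| = \sqrt{b_{xy} \times b_{yx}} = \sqrt{(-0.2337) \times (-0.6643)} = \sqrt{0.1552} = 0.394$$
      Since $b_{yx}$ and $b_{xy}$ are both negative, the correlation coefficient takes the negative sign: $\gamma = -0.394$.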
  30. 30. To estimate the most likely marks in Statistics (y) when marks in Economics (x) are 30, we use the line of regression of y on x. The required estimate is given by
      $$y = -0.6643 \times 30 + 59.2576 = -19.929 + 59.2576 = 39.3286$$
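
The whole of Q2 (both regression coefficients, the correlation coefficient and the estimate at x = 30) can be checked with the short sketch below. It follows the deviation formulas used in the slides; the variable and function names are assumptions made for the example.

```python
import math

eco = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]    # x: marks in Economics
stat = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]   # y: marks in Statistics

n = len(eco)
x_bar = sum(eco) / n
y_bar = sum(stat) / n

s_dxdy = sum((x - x_bar) * (y - y_bar) for x, y in zip(eco, stat))
s_dx2 = sum((x - x_bar) ** 2 for x in eco)
s_dy2 = sum((y - y_bar) ** 2 for y in stat)

b_yx = s_dxdy / s_dx2   # regression coefficient of y on x
b_xy = s_dxdy / s_dy2   # regression coefficient of x on y

# The correlation coefficient carries the common sign of b_yx and b_xy.
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

def predict_stat(x):
    """Estimate y from the line of regression of y on x."""
    return y_bar + b_yx * (x - x_bar)

y_at_30 = predict_stat(30)
print(round(b_yx, 4), round(b_xy, 4), round(r, 3), round(y_at_30, 4))
```

Working from unrounded coefficients, the estimate agrees with the slide's 39.3286 to about three decimal places.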
  31. 31. Sum of squares, x and y:
      $$SS_{xy} = \sum (x - \bar{x})(y - \bar{y}) = \sum dx\,dy = \sum xy - \frac{(\sum x)(\sum y)}{n}$$
      Sum of squares, xx:
      $$SS_{xx} = \sum (x - \bar{x})^2 = \sum dx^2 = \sum x^2 - \frac{(\sum x)^2}{n}$$
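  32. 32. Sales and advertising expenses in Rs. 1000. Develop a regression model.

      | advt | sales |
      |---|---|
      | 92 | 930 |
      | 94 | 900 |
      | 97 | 1020 |
      | 98 | 990 |
      | 100 | 1100 |
      | 102 | 1050 |
      | 104 | 1150 |
      | 105 | 1120 |
      | 105 | 1130 |
      | 107 | 1200 |
      | 107 | 1250 |
      | 110 | 1220 |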
  33. 33. Sum of squares, x and y:
      $$SS_{xy} = \sum (x - \bar{x})(y - \bar{y}) = \sum dx\,dy = \sum xy - \frac{(\sum x)(\sum y)}{n}$$
      Sum of squares, xx:
      $$SS_{xx} = \sum (x - \bar{x})^2 = \sum dx^2 = \sum x^2 - \frac{(\sum x)^2}{n}$$
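  34. 34. Slope and intercept of the regression line $y = a + bx$
      $$b = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum dx\,dy}{\sum dx^2}$$
      Summing $y = a + bx$ over the n observations:
      $$\sum y = \sum a + b \sum x = n\,a + b \sum x$$
      $$n\,a = \sum y - b \sum x$$
      $$a = \frac{\sum y - b \sum x}{n} = \frac{\sum y}{n} - \frac{b \sum x}{n}$$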
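  35. 35. Working table: sums, fitted values and residuals (the deviation and squared columns are completed in the next slide)

      | advt $x_i$ | sales $y_i$ | $x^2$ | $xy$ | predicted $\hat{y}$ (fits) | residual $y_i - \hat{y}$ |
      |---|---|---|---|---|---|
      | 92 | 930 | 8464 | 85560 | 902.4 | 27.6 |
      | 94 | 900 | 8836 | 84600 | 940.54 | -40.54 |
      | 97 | 1020 | 9409 | 98940 | 997.75 | 22.25 |
      | 98 | 990 | 9604 | 97020 | 1016.82 | -26.82 |
      | 100 | 1100 | 10000 | 110000 | 1054.96 | 45.04 |
      | 102 | 1050 | 10404 | 107100 | 1093.1 | -43.1 |
      | 104 | 1150 | 10816 | 119600 | 1131.24 | 18.76 |
      | 105 | 1120 | 11025 | 117600 | 1150.31 | -30.31 |
      | 105 | 1130 | 11025 | 118650 | 1150.31 | -20.31 |
      | 107 | 1200 | 11449 | 128400 | 1188.45 | 11.55 |
      | 107 | 1250 | 11449 | 133750 | 1188.45 | 61.55 |
      | 110 | 1220 | 12100 | 134200 | 1245.66 | -25.66 |
      | Σx = 1221 | Σy = 13060 | Σx² = 124581 | Σxy = 1335420 | 13059.99 | 0.01 |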
  36. 36. Full working table

      | advt $x_i$ | sales $y_i$ | $x^2$ | $xy$ | $y_i - \bar{y}$ | $(y_i - \bar{y})^2$ | predicted $\hat{y}$ (fits) | residual $y_i - \hat{y}$ | $(y_i - \hat{y})^2$ | $\hat{y} - \bar{y}$ | $(\hat{y} - \bar{y})^2$ |
      |---|---|---|---|---|---|---|---|---|---|---|
      | 92 | 930 | 8464 | 85560 | -158.33 | 25069.44 | 902.4 | 27.6 | 761.76 | -185.93 | 34571.20 |
      | 94 | 900 | 8836 | 84600 | -188.33 | 35469.44 | 940.54 | -40.54 | 1643.49 | -147.79 | 21842.87 |
      | 97 | 1020 | 9409 | 98940 | -68.33 | 4669.44 | 997.75 | 22.25 | 495.06 | -90.58 | 8205.34 |
      | 98 | 990 | 9604 | 97020 | -98.33 | 9669.44 | 1016.82 | -26.82 | 719.31 | -71.51 | 5114.16 |
      | 100 | 1100 | 10000 | 110000 | 11.67 | 136.11 | 1054.96 | 45.04 | 2028.60 | -33.37 | 1113.78 |
      | 102 | 1050 | 10404 | 107100 | -38.33 | 1469.44 | 1093.1 | -43.1 | 1857.61 | 4.77 | 22.72 |
      | 104 | 1150 | 10816 | 119600 | 61.67 | 3802.78 | 1131.24 | 18.76 | 351.94 | 42.91 | 1840.98 |
      | 105 | 1120 | 11025 | 117600 | 31.67 | 1002.78 | 1150.31 | -30.31 | 918.70 | 61.98 | 3841.11 |
      | 105 | 1130 | 11025 | 118650 | 41.67 | 1736.11 | 1150.31 | -20.31 | 412.50 | 61.98 | 3841.11 |
      | 107 | 1200 | 11449 | 128400 | 111.67 | 12469.44 | 1188.45 | 11.55 | 133.40 | 100.12 | 10023.35 |
      | 107 | 1250 | 11449 | 133750 | 161.67 | 26136.11 | 1188.45 | 61.55 | 3788.40 | 100.12 | 10023.35 |
      | 110 | 1220 | 12100 | 134200 | 131.67 | 17336.11 | 1245.66 | -25.66 | 658.44 | 157.33 | 24751.68 |
      | Σx = 1221 | Σy = 13060 | Σx² = 124581 | Σxy = 1335420 | 0.00 | 138966.667 | 13059.99 | 0.01 | 13769.21 | -0.01 | 125191.6 |
  37. 37. Mean and sums of squares
      $$\bar{x} = \frac{1221}{12} = 101.75$$
      $$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 1335420 - \frac{1221 \times 13060}{12} = 6565$$
      $$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 124581 - \frac{(1221)^2}{12} = 344.25$$
  38. 38. Slope and intercept
      $$b = \frac{SS_{xy}}{SS_{xx}} = \frac{6565}{344.25} = 19.0704$$
      From $y = a + bx$:
      $$\sum y = \sum a + b \sum x = n\,a + b \sum x$$
      $$n\,a = \sum y - b \sum x$$
      $$a = \frac{\sum y - b \sum x}{n} = \frac{\sum y}{n} - \frac{b \sum x}{n} = \frac{13060}{12} - \frac{19.0704 \times 1221}{12} = -852.08$$
  39. 39. Equation of the simple regression line $y = a + bx$, for the regression of y on x:
      $$y = -852.08 + 19.0704\,x$$
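
The slope and intercept above can be reproduced from the raw advertising and sales figures using the shortcut formulas for $SS_{xy}$ and $SS_{xx}$. A minimal sketch, with variable names chosen for the example:

```python
advt = [92, 94, 97, 98, 100, 102, 104, 105, 105, 107, 107, 110]
sales = [930, 900, 1020, 990, 1100, 1050, 1150, 1120, 1130, 1200, 1250, 1220]

n = len(advt)
sum_x = sum(advt)
sum_y = sum(sales)
sum_x2 = sum(x * x for x in advt)
sum_xy = sum(x * y for x, y in zip(advt, sales))

# Shortcut (raw-sum) forms of the sums of squares
ss_xy = sum_xy - sum_x * sum_y / n
ss_xx = sum_x2 - sum_x ** 2 / n

b = ss_xy / ss_xx                  # slope
a = sum_y / n - b * sum_x / n      # intercept

print(ss_xy, ss_xx, round(b, 4), round(a, 2))
```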
  40. 40. For Testing the Fit
      $y_i$ = value of y recorded in the given data
      $\bar{y}$ = mean (average) of y
      $\hat{y}$ = predicted value from the regression line
      Deviation = $(y_i - \bar{y})$ = difference between the actual value of y and the mean
      Residual = $(y_i - \hat{y})$ = gap (error, difference) between the actual value of y and the predicted value calculated from the regression line
      Deviation of predicted value from mean = $(\hat{y} - \bar{y})$
      a = intercept on the y-axis
      b = slope of the regression line
  41. 41. Total sum of squares: $SST = \sum (y_i - \bar{y})^2$
      Regression sum of squares: $SSR = \sum (\hat{y} - \bar{y})^2$
      Error sum of squares: $SSE = \sum (y_i - \hat{y})^2$
      Coefficient of determination:
      $$\gamma^2 = \frac{SSR}{SST}$$
  42. 42. Standard error of estimate:
      $$S_{yx} = \sqrt{\frac{SSE}{n - 2}}$$
      To determine whether a significant linear relationship exists between the independent variable x and the dependent variable y, we test whether the population slope $\beta$ is zero:
      $$t = \frac{b - \beta}{S_b}$$
      where the standard error of b is
      $$S_b = \frac{S_{yx}}{\sqrt{SS_{xx}}}$$
  43. 43. H0: The slope of the regression line is zero.
      H1: The slope of the regression line is not zero.
  44. 44. Standard error of estimate (n = 12 observations, so n - 2 = 10):
      $$S_{yx} = \sqrt{\frac{SSE}{n - 2}} = \sqrt{\frac{\sum (y_i - \hat{y})^2}{n - 2}} = \sqrt{\frac{13769.21}{12 - 2}} = \sqrt{1376.92} = 37.1068$$
      $$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 124581 - \frac{(1221)^2}{12} = 344.25$$
      Standard error of b:
      $$S_b = \frac{S_{yx}}{\sqrt{SS_{xx}}}$$
  45. 45. Standard error of b:
      $$S_b = \frac{S_{yx}}{\sqrt{SS_{xx}}}$$
      $$t = \frac{b - \beta}{S_b} = \frac{19.07 - 0}{37.1068 / \sqrt{344.25}} = 9.53$$
      As the calculated value of t is more than the table value of t for 12 - 2 = 10 degrees of freedom, the null hypothesis is rejected.
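
The t statistic for the slope can be recomputed from the raw data as sketched below. The slides do not state a significance level; the critical value 2.228 used here is the standard two-tailed 5% table value for 10 degrees of freedom, and choosing the 5% level is an assumption made for this example.

```python
import math

advt = [92, 94, 97, 98, 100, 102, 104, 105, 105, 107, 107, 110]
sales = [930, 900, 1020, 990, 1100, 1050, 1150, 1120, 1130, 1200, 1250, 1220]

n = len(advt)
x_bar = sum(advt) / n
y_bar = sum(sales) / n

ss_xx = sum((x - x_bar) ** 2 for x in advt)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(advt, sales))

b = ss_xy / ss_xx          # sample slope
a = y_bar - b * x_bar      # intercept

fitted = [a + b * x for x in advt]
sse = sum((y - f) ** 2 for y, f in zip(sales, fitted))

s_yx = math.sqrt(sse / (n - 2))    # standard error of estimate
s_b = s_yx / math.sqrt(ss_xx)      # standard error of the slope
t_stat = (b - 0) / s_b             # H0: population slope is zero

# Assumption: 5% two-tailed test; 2.228 is the table value for 10 d.f.
T_CRIT = 2.228
reject_h0 = abs(t_stat) > T_CRIT
print(round(s_yx, 4), round(s_b, 4), round(t_stat, 2), reject_h0)
```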
  46. 46. Coefficient of Determination: Definition
      The coefficient of determination, also known as R squared, is interpreted as the goodness of fit of a regression.
      The higher the coefficient of determination, the larger the share of the variance in the dependent variable that is explained by the independent variable.
      The coefficient of determination is the overall measure of the usefulness of a regression.
      For example, suppose $r^2$ is 0.95. This means that 95% of the variation in the dependent variable is explained by the independent variable. That is a good regression.
  47. 47. The coefficient of determination can be calculated as the regression sum of squares, SSR, divided by the total sum of squares, SST:
      $$\gamma^2 = \frac{SSR}{SST}$$
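
Applied to the advertising and sales data used earlier in the deck, the ratio SSR/SST can be computed as in the sketch below. It also checks the decomposition SST = SSR + SSE for the least-squares line. Variable names are assumptions made for the example.

```python
advt = [92, 94, 97, 98, 100, 102, 104, 105, 105, 107, 107, 110]
sales = [930, 900, 1020, 990, 1100, 1050, 1150, 1120, 1130, 1200, 1250, 1220]

n = len(advt)
x_bar = sum(advt) / n
y_bar = sum(sales) / n

ss_xx = sum((x - x_bar) ** 2 for x in advt)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(advt, sales))
b = ss_xy / ss_xx
a = y_bar - b * x_bar
fitted = [a + b * x for x in advt]

sst = sum((y - y_bar) ** 2 for y in sales)               # total sum of squares
ssr = sum((f - y_bar) ** 2 for f in fitted)              # regression sum of squares
sse = sum((y - f) ** 2 for y, f in zip(sales, fitted))   # error sum of squares

r_squared = ssr / sst
print(round(sst, 2), round(ssr, 2), round(sse, 2), round(r_squared, 4))
```

The SSR obtained this way differs slightly from the 125191.6 in the working table because the table's fitted values were rounded to two decimals.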
