Simple linear regression and correlation

Name                                       Shakeel Nouman
Religion                                  Christian
Domicile                            Punjab (Lahore)
Contact #                            0332-4462527. 0321-9898767
E.Mail                                sn_gcu@yahoo.com
sn_gcu@hotmail.com

Upload Details

Uploaded as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Simple linear regression and correlation: Presentation Transcript

  • Slide 1. Simple Linear Regression and Correlation. By Shakeel Nouman, M.Phil Statistics, Govt. College University Lahore, Statistical Officer
  • Slide 2. Chapter 10: Simple Linear Regression and Correlation
      • Using Statistics
      • The Simple Linear Regression Model
      • Estimation: The Method of Least Squares
      • Error Variance and the Standard Errors of Regression Estimators
      • Correlation
      • Hypothesis Tests about the Regression Relationship
      • How Good is the Regression?
      • Analysis of Variance Table and an F Test of the Regression Model
      • Residual Analysis and Checking for Model Inadequacies
      • Use of the Regression Model for Prediction
      • Summary and Review of Terms
  • Slide 3. 10-1 Using Statistics
    [Figure: Scatterplot of Advertising Expenditures (X) and Sales (Y); x-axis: Advertising, y-axis: Sales.]
    This scatterplot locates pairs of observations of advertising expenditures on the x-axis and sales on the y-axis. We notice that:
      • Larger (smaller) values of sales tend to be associated with larger (smaller) values of advertising.
      • The scatter of points tends to be distributed around a positively sloped straight line.
      • The pairs of values of advertising expenditures and sales are not located exactly on a straight line.
      • The scatterplot reveals a more or less strong tendency rather than a precise linear relationship.
      • The line represents the nature of the relationship on average.
  • Slide 4. Examples of Other Scatterplots
    [Figure: several example scatterplots of Y against X.]
  • Slide 5. Model Building
    The inexact nature of the relationship between advertising and sales suggests that a statistical model might be useful in analyzing the relationship. A statistical model separates the systematic component of a relationship from the random component:
    Data = Systematic component + Random errors
    In ANOVA, the systematic component is the variation of means between samples or treatments (SSTR) and the random component is the unexplained variation (SSE). In regression, the systematic component is the overall linear relationship, and the random component is the variation around the line.
  • Slide 6. 10-2 The Simple Linear Regression Model
    The population simple linear regression model is $Y = \beta_0 + \beta_1 X + \varepsilon$, in which $\beta_0 + \beta_1 X$ is the nonrandom (systematic) component and $\varepsilon$ is the random component, where
      • $Y$ is the dependent variable, the variable we wish to explain or predict.
      • $X$ is the independent variable, also called the predictor variable.
      • $\varepsilon$ is the error term, the only random component in the model, and thus the only source of randomness in $Y$.
      • $\beta_0$ is the intercept of the systematic component of the regression relationship.
      • $\beta_1$ is the slope of the systematic component.
    The conditional mean of Y: $E[Y \mid X] = \beta_0 + \beta_1 X$
  • Slide 7. Picturing the Simple Linear Regression Model
    [Figure: Regression plot of the line $E[Y] = \beta_0 + \beta_1 X$, marking the intercept $\beta_0$, the slope $\beta_1$ (rise per unit of X), and the error $\varepsilon_i$ as the vertical gap between an observed $Y_i$ and the line at $X_i$.]
    The simple linear regression model gives an exact linear relationship between the expected or average value of Y, the dependent variable, and X, the independent or predictor variable: $E[Y_i] = \beta_0 + \beta_1 X_i$
    Actual observed values of Y differ from the expected value by an unexplained or random error: $Y_i = E[Y_i] + \varepsilon_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
  • Slide 8. Assumptions of the Simple Linear Regression Model
      • The relationship between X and Y is a straight-line relationship.
      • The values of the independent variable X are assumed fixed (not random); the only randomness in the values of Y comes from the error term $\varepsilon_i$.
      • The errors $\varepsilon_i$ are normally distributed with mean 0 and variance $\sigma^2$. The errors are uncorrelated (not related) in successive observations. That is: $\varepsilon \sim N(0, \sigma^2)$
    [Figure: the line $E[Y] = \beta_0 + \beta_1 X$ with identical normal distributions of errors, all centered on the regression line.]
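The assumptions on Slide 8 can be made concrete by simulating data from the model. The following is only a sketch: the function name `simulate_responses`, the X values, and the parameter values are invented for illustration and do not come from the slides.

```python
import random

def simulate_responses(xs, beta0, beta1, sigma, seed=0):
    """Draw Y_i = beta0 + beta1 * x_i + eps_i with eps_i ~ N(0, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    # X is treated as fixed; the only randomness comes from the error term.
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

xs = [10, 20, 30, 40, 50]  # made-up fixed X values
ys = simulate_responses(xs, beta0=5.0, beta1=2.0, sigma=3.0)
exact = simulate_responses(xs, beta0=5.0, beta1=2.0, sigma=0.0)

print(ys)     # noisy observations scattered around the line
print(exact)  # with sigma = 0 every point sits exactly on the line
```

Setting `sigma=0.0` removes the random component entirely, which shows that the systematic component alone is the straight line.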
  • Slide 9. 10-3 Estimation: The Method of Least Squares
    Estimation of a simple linear regression relationship involves finding estimated or predicted values of the intercept and slope of the linear regression line.
    The estimated regression equation: $Y = b_0 + b_1 X + e$, where $b_0$ estimates the intercept of the population regression line, $\beta_0$; $b_1$ estimates the slope of the population regression line, $\beta_1$; and $e$ stands for the observed errors, the residuals from fitting the estimated regression line $b_0 + b_1 X$ to a set of n points.
    The estimated regression line: $\hat{Y} = b_0 + b_1 X$, where $\hat{Y}$ (Y-hat) is the value of Y lying on the fitted regression line for a given value of X.
  • Slide 10. Fitting a Regression Line
    [Figure, four panels of Y against X: the data; three errors from a fitted line; three errors from the least squares regression line; errors from the least squares regression line are minimized.]
  • Slide 11. Errors in Regression
    [Figure: the observed data point $Y_i$ at $X_i$, the fitted regression line $\hat{Y} = b_0 + b_1 X$, and $\hat{Y}_i$, the predicted value of Y for $X_i$.]
    Error: $e_i = Y_i - \hat{Y}_i$
  • Slide 12. Least Squares Regression
    The sum of squared errors in regression is: $SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
    The least squares regression line is that which minimizes the SSE with respect to the estimates $b_0$ and $b_1$.
    The normal equations:
      $\sum_{i=1}^{n} y_i = n b_0 + b_1 \sum_{i=1}^{n} x_i$
      $\sum_{i=1}^{n} x_i y_i = b_0 \sum_{i=1}^{n} x_i + b_1 \sum_{i=1}^{n} x_i^2$
    [Figure: the SSE surface over $b_0$ and $b_1$; at the least squares $b_0$ and $b_1$, SSE is minimized with respect to both.]
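The normal equations on Slide 12 follow from setting the two partial derivatives of SSE to zero. The slide does not show this step; the short derivation below is a standard reconstruction under the slide's notation, not a quotation from the deck.

```latex
\begin{align*}
SSE(b_0, b_1) &= \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 \\
\frac{\partial SSE}{\partial b_0} = -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0
  &\;\Longrightarrow\; \sum_{i=1}^{n} y_i = n b_0 + b_1 \sum_{i=1}^{n} x_i \\
\frac{\partial SSE}{\partial b_1} = -2 \sum_{i=1}^{n} x_i (y_i - b_0 - b_1 x_i) = 0
  &\;\Longrightarrow\; \sum_{i=1}^{n} x_i y_i = b_0 \sum_{i=1}^{n} x_i + b_1 \sum_{i=1}^{n} x_i^2
\end{align*}
```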
  • Slide 13. Sums of Squares, Cross Products, and Least Squares Estimators
    Sums of squares and cross products:
      $SS_X = \sum (x - \bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n}$
      $SS_Y = \sum (y - \bar{y})^2 = \sum y^2 - \frac{(\sum y)^2}{n}$
      $SS_{XY} = \sum (x - \bar{x})(y - \bar{y}) = \sum xy - \frac{(\sum x)(\sum y)}{n}$
    Least-squares regression estimators:
      $b_1 = \frac{SS_{XY}}{SS_X}$
      $b_0 = \bar{y} - b_1 \bar{x}$
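The formulas on Slide 13 translate almost line for line into code. This is a minimal sketch; the function name `least_squares` and the four-point toy data set are assumptions made for illustration.

```python
def least_squares(xs, ys):
    """Return (b0, b1, ss_x, ss_y, ss_xy) using the shortcut formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_y = sum(y * y for y in ys) - sum_y ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    b1 = ss_xy / ss_x                  # slope estimate
    b0 = sum_y / n - b1 * sum_x / n    # intercept estimate: y-bar - b1 * x-bar
    return b0, b1, ss_x, ss_y, ss_xy

# Toy data lying exactly on y = 1 + 2x, so the fit should recover that line.
b0, b1, ss_x, ss_y, ss_xy = least_squares([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1, ss_x, ss_y, ss_xy)
```

The shortcut forms avoid a second pass over the data to subtract the means, which is why the slides carry the column totals rather than deviations.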
  • Example 10-1 Mils Dollars Mils 2MilsDollars 1211 1802 1466521 2182222 1345 2405 1809025 3234725 1422 2005 2022084 2851110 1687 2511 2845969 4236057 1849 2332 3418801 4311868 2026 2305 4104676 4669930 2133 3016 4549689 6433128 2253 3385 5076009 7626405 2400 3090 5760000 7416000 2468 3694 6091024 9116792 2699 3371 7284601 9098329 2806 3998 7873636 11218388 3082 3555 9498724 10956510 3209 4692 10297681 15056628 3466 4244 12013156 14709704 3643 5298 13271449 19300614 3852 4801 14837904 18493452 4033 5147 16265089 20757852 4267 5738 18207288 24484046 4498 6420 20232004 28877160 4533 6059 20548088 27465448 4804 6426 23078416 30870504 5090 6321 25908100 32173890 5233 7026 27384288 36767056 5439 6964 29582720 37877196 79448 106605 293426946 390185014 2 SS x   x  Slide 14  x 2 n  293 , 426 ,946  SS xy   xy  79 , 448  x ( y ) 2  40 ,947 ,557 .84 25 n  390 ,185 ,014  (79 , 448 )(106 ,605 )  51, 402 ,852 .4 25 SS 51, 402 ,852 .4 b  XY   1.255333776  1.26 1 SS 40 ,947 ,557 .84 X b  y b x  0 1 106 ,605 25  79,448   25    (1.255333776 )  274 .85 Simple Linear Regression and Correlation By Shakeel Nouman M.Phil Statistics Govt. College University Lahore, Statistical Officer
  • Slide 15. Template (partial output) that can be used to carry out a Simple Regression
  • Slide 16. Template (continued) that can be used to carry out a Simple Regression
  • Slide 17. Template (continued) that can be used to carry out a Simple Regression
    Residual Analysis. The plot shows the absence of a relationship between the residuals and the X-values (miles).
  • Slide 18. Template (continued) that can be used to carry out a Simple Regression
    Note: The normal probability plot is approximately linear. This would indicate that the normality assumption for the errors has not been violated.
  • Slide 19. Total Variance and Error Variance
    [Figure, two panels of Y against X: what you see when looking at the total variation of Y; what you see when looking along the regression line at the error variance of Y.]
  • Slide 20. 10-4 Error Variance and the Standard Errors of Regression Estimators
    Degrees of freedom in regression: $df = n - 2$ (n total observations less one degree of freedom for each parameter estimated, $b_0$ and $b_1$).
    Square and sum all regression errors to find SSE:
      $SSE = \sum (Y - \hat{Y})^2 = SS_Y - \frac{(SS_{XY})^2}{SS_X} = SS_Y - b_1 SS_{XY}$
    An unbiased estimator of $\sigma^2$, denoted by $s^2$:
      $MSE = \frac{SSE}{n - 2}$
    Example 10-1:
      $SSE = SS_Y - b_1 SS_{XY} = 66{,}855{,}898 - (1.255333776)(51{,}402{,}852.4) = 2{,}328{,}161.2$
      $MSE = \frac{SSE}{n - 2} = \frac{2{,}328{,}161.2}{23} = 101{,}224.4$
      $s = \sqrt{MSE} = \sqrt{101{,}224.4} = 318.158$
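The Example 10-1 error variance on Slide 20 can be checked from the summary statistics alone. This sketch takes $SS_X$, $SS_Y$, and $SS_{XY}$ as given on the slides; the variable names are illustrative.

```python
import math

n = 25
ss_x = 40_947_557.84   # from Slide 14
ss_y = 66_855_898.0    # from Slide 20
ss_xy = 51_402_852.4   # from Slide 14

b1 = ss_xy / ss_x
sse = ss_y - b1 * ss_xy   # SSE = SSy - b1 * SSxy
mse = sse / (n - 2)       # two parameters estimated, so n - 2 degrees of freedom
s = math.sqrt(mse)        # standard error of the regression

print(f"SSE = {sse:.1f}, MSE = {mse:.1f}, s = {s:.3f}")
```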
  • Slide 21. Standard Errors of Estimates in Regression
    The standard error of $b_0$ (intercept): $s(b_0) = s \sqrt{\frac{\sum x^2}{n \, SS_X}}$, where $s = \sqrt{MSE}$
    The standard error of $b_1$ (slope): $s(b_1) = \frac{s}{\sqrt{SS_X}}$
    Example 10-1:
      $s(b_0) = s \sqrt{\frac{\sum x^2}{n \, SS_X}} = 318.158 \sqrt{\frac{293{,}426{,}946}{(25)(40{,}947{,}557.84)}} = 170.338$
      $s(b_1) = \frac{s}{\sqrt{SS_X}} = \frac{318.158}{\sqrt{40{,}947{,}557.84}} = 0.04972$
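A short numerical check of the two standard errors on Slide 21, using the Example 10-1 quantities quoted in the slides. The variable names are assumptions for illustration.

```python
import math

n = 25
sum_x2 = 293_426_946    # sum of squared miles (Slide 14)
ss_x = 40_947_557.84    # SSx (Slide 14)
mse = 101_224.4         # MSE (Slide 20)

s = math.sqrt(mse)
s_b0 = s * math.sqrt(sum_x2 / (n * ss_x))  # standard error of the intercept
s_b1 = s / math.sqrt(ss_x)                 # standard error of the slope

print(f"s(b0) = {s_b0:.3f}, s(b1) = {s_b1:.5f}")
```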
  • Slide 22. Confidence Intervals for the Regression Parameters
    A $(1 - \alpha)100\%$ confidence interval for $\beta_0$: $b_0 \pm t_{\alpha/2,\,(n-2)} \, s(b_0)$
    A $(1 - \alpha)100\%$ confidence interval for $\beta_1$: $b_1 \pm t_{\alpha/2,\,(n-2)} \, s(b_1)$
    Example 10-1, 95% confidence intervals:
      $b_0 \pm t_{0.025,\,(25-2)} \, s(b_0) = 274.85 \pm (2.069)(170.338) = 274.85 \pm 352.43 = [-77.58,\ 627.28]$
      $b_1 \pm t_{0.025,\,(25-2)} \, s(b_1) = 1.25533 \pm (2.069)(0.04972) = 1.25533 \pm 0.10287 = [1.15246,\ 1.35820]$
    [Figure: the slope drawn as height per unit length (Height = Slope, Length = 1), with the least-squares point estimate $b_1 = 1.25533$; 0 lies outside the interval, so it is not a possible value of the regression slope at 95%.]
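The two 95% intervals of Example 10-1 can be reproduced with a small helper. This is a sketch: `confidence_interval` is a name invented here, and the critical value 2.069 is taken from the slide rather than computed.

```python
def confidence_interval(estimate, std_err, t_crit):
    """Return (lower, upper) for estimate +/- t_crit * std_err."""
    half_width = t_crit * std_err
    return estimate - half_width, estimate + half_width

t_crit = 2.069  # t(0.025, 23) as quoted on the slide

ci_b0 = confidence_interval(274.85, 170.338, t_crit)
ci_b1 = confidence_interval(1.25533, 0.04972, t_crit)

print(f"b0: [{ci_b0[0]:.2f}, {ci_b0[1]:.2f}]")
print(f"b1: [{ci_b1[0]:.5f}, {ci_b1[1]:.5f}]")
print("0 inside slope interval:", ci_b1[0] <= 0 <= ci_b1[1])
```

The intercept interval includes negative values while the slope interval excludes 0, which is the point the slide makes about the slope.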
  • Slide 23. Template (partial output) that can be used to obtain Confidence Intervals for $\beta_0$ and $\beta_1$
  • Slide 24. 10-5 Correlation
    The correlation between two random variables, X and Y, is a measure of the degree of linear association between the two variables. The population correlation, denoted by $\rho$, can take on any value from -1 to 1.
      • $\rho = -1$ indicates a perfect negative linear relationship
      • $-1 < \rho < 0$ indicates a negative linear relationship
      • $\rho = 0$ indicates no linear relationship
      • $0 < \rho < 1$ indicates a positive linear relationship
      • $\rho = 1$ indicates a perfect positive linear relationship
    The absolute value of $\rho$ indicates the strength or exactness of the relationship.
  • Slide 25. Illustrations of Correlation
    [Figure: six scatterplots of Y against X illustrating $\rho = -1$, $\rho = 0$ (two panels), $\rho = 1$, and two panels with $|\rho| = 0.8$.]
  • Slide 26. Covariance and Correlation
    The covariance of two random variables X and Y: $Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$, where $\mu_X$ and $\mu_Y$ are the population means of X and Y respectively.
    The population correlation coefficient: $\rho = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}$
    The sample correlation coefficient*: $r = \frac{SS_{XY}}{\sqrt{SS_X \, SS_Y}}$
    Example 10-1: $r = \frac{SS_{XY}}{\sqrt{SS_X \, SS_Y}} = \frac{51{,}402{,}852.4}{\sqrt{(40{,}947{,}557.84)(66{,}855{,}898)}} = \frac{51{,}402{,}852.4}{52{,}321{,}943.29} = 0.9824$
    *Note: if $r < 0$ then $b_1 < 0$; if $r = 0$ then $b_1 = 0$; if $r > 0$ then $b_1 > 0$.
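The sample correlation of Example 10-1 follows directly from the three sums of squares. The sketch below uses the values quoted in the slides and also checks the sign relationship with the slope; variable names are illustrative.

```python
import math

ss_x = 40_947_557.84
ss_y = 66_855_898.0
ss_xy = 51_402_852.4

r = ss_xy / math.sqrt(ss_x * ss_y)   # sample correlation coefficient
b1 = ss_xy / ss_x                    # least squares slope

# r and b1 share the numerator SSxy, so they always have the same sign.
same_sign = (r > 0) == (b1 > 0)

print(f"r = {r:.4f}, b1 = {b1:.4f}, same sign: {same_sign}")
```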
  • Slide 27. Hypothesis Tests for the Correlation Coefficient
    $H_0$: $\rho = 0$ (No linear relationship)
    $H_1$: $\rho \neq 0$ (Some linear relationship)
    Test statistic: $t_{(n-2)} = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}}$
    Example 10-1: $t_{(n-2)} = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}} = \frac{0.9824}{\sqrt{\frac{1 - 0.9651}{25 - 2}}} = \frac{0.9824}{0.0389} = 25.25$
    $t_{0.005} = 2.807 < 25.25$, so $H_0$ is rejected at the 1% level.
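The test statistic on Slide 27 is a one-line computation once r is known. In this sketch r is recomputed from the sums of squares rather than rounded to 0.9824, so the result agrees with the slide's 25.25 only up to rounding; the critical value 2.807 is copied from the slide.

```python
import math

n = 25
ss_x, ss_y, ss_xy = 40_947_557.84, 66_855_898.0, 51_402_852.4

r = ss_xy / math.sqrt(ss_x * ss_y)
t_stat = r / math.sqrt((1 - r ** 2) / (n - 2))  # t with n - 2 degrees of freedom

t_crit = 2.807  # t(0.005, 23) as quoted on the slide
reject_h0 = abs(t_stat) > t_crit

print(f"t = {t_stat:.2f}, reject H0 at the 1% level: {reject_h0}")
```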
  • Slide 28. 10-6 Hypothesis Tests about the Regression Relationship
    [Figure, three panels of Y against X: Constant Y; Unsystematic Variation; Nonlinear Relationship.]
    A hypothesis test for the existence of a linear relationship between X and Y:
      $H_0$: $\beta_1 = 0$
      $H_1$: $\beta_1 \neq 0$
    Test statistic for the existence of a linear relationship between X and Y: $t_{(n-2)} = \frac{b_1}{s(b_1)}$, where $b_1$ is the least-squares estimate of the regression slope and $s(b_1)$ is the standard error of $b_1$. When the null hypothesis is true, the statistic has a t distribution with n - 2 degrees of freedom.
  • Slide 29. Hypothesis Tests for the Regression Slope
    Example 10-1:
      $H_0$: $\beta_1 = 0$; $H_1$: $\beta_1 \neq 0$
      $t_{(n-2)} = \frac{b_1}{s(b_1)} = \frac{1.25533}{0.04972} = 25.25$
      $t_{(0.005,\,23)} = 2.807 < 25.25$
      $H_0$ is rejected at the 1% level and we may conclude that there is a relationship between charges and miles traveled.
    Example 10-4:
      $H_0$: $\beta_1 = 1$; $H_1$: $\beta_1 \neq 1$
      $t_{(n-2)} = \frac{b_1 - 1}{s(b_1)} = \frac{1.24 - 1}{0.21} = 1.14$
      $t_{(0.05,\,58)} = 1.671 > 1.14$
      $H_0$ is not rejected at the 10% level. We may not conclude that the beta coefficient is different from 1.
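Both slope tests on Slide 29 use the same statistic with a different hypothesized value. The helper below is a sketch; the name `slope_t_statistic` and its `beta1_null` parameter are invented here, and the critical values are copied from the slide.

```python
def slope_t_statistic(b1, s_b1, beta1_null=0.0):
    """t statistic for H0: beta1 = beta1_null, with n - 2 degrees of freedom."""
    return (b1 - beta1_null) / s_b1

# Example 10-1: H0: beta1 = 0, critical value t(0.005, 23) = 2.807
t_10_1 = slope_t_statistic(1.25533, 0.04972)
reject_10_1 = abs(t_10_1) > 2.807

# Example 10-4: H0: beta1 = 1, critical value t(0.05, 58) = 1.671
t_10_4 = slope_t_statistic(1.24, 0.21, beta1_null=1.0)
reject_10_4 = abs(t_10_4) > 1.671

print(f"Example 10-1: t = {t_10_1:.2f}, reject H0: {reject_10_1}")
print(f"Example 10-4: t = {t_10_4:.2f}, reject H0: {reject_10_4}")
```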
  • Slide 30. 10-7 How Good is the Regression?
    The coefficient of determination, $r^2$, is a descriptive measure of the strength of the regression relationship, a measure of how well the regression line fits the data.
    [Figure: for one data point, the total deviation $y - \bar{y}$ split into the unexplained deviation $y - \hat{y}$ and the explained deviation $\hat{y} - \bar{y}$.]
    Total deviation = Unexplained deviation (error) + Explained deviation (regression):
      $(y - \bar{y}) = (y - \hat{y}) + (\hat{y} - \bar{y})$
      $\sum (y - \bar{y})^2 = \sum (y - \hat{y})^2 + \sum (\hat{y} - \bar{y})^2$
      $SST = SSE + SSR$
    $r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$ is the percentage of total variation explained by the regression.
  • Slide 31. The Coefficient of Determination
    [Figure, three panels of Y against X showing the split of SST into SSE and SSR for $r^2 = 0$, $r^2 = 0.50$, and $r^2 = 0.90$.]
    Example 10-1: $r^2 = \frac{SSR}{SST} = \frac{64{,}527{,}736.8}{66{,}855{,}898} = 0.96518$
    [Figure: scatterplot of Dollars against Miles with the fitted line.]
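The decomposition SST = SSE + SSR and the resulting $r^2$ can be verified numerically on any data set. The five-point data set below is made up for illustration and is not from the slides.

```python
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]   # made-up toy data

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
ss_x = sum((x - x_bar) ** 2 for x in xs)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b1 = ss_xy / ss_x
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in xs]

sst = sum((y - y_bar) ** 2 for y in ys)                # total variation
sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))    # unexplained variation
ssr = sum((f - y_bar) ** 2 for f in fitted)            # explained variation
r_squared = ssr / sst

print(f"SST = {sst:.2f}, SSE = {sse:.2f}, SSR = {ssr:.2f}, r^2 = {r_squared:.2f}")
```

For this toy data the regression explains 60% of the variation in y, compared with about 96.5% in Example 10-1.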
  • Slide 32. 10-8 Analysis of Variance and an F Test of the Regression Model
        Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F Ratio
        Regression            SSR              1                    MSR           MSR/MSE
        Error                 SSE              n - 2                MSE
        Total                 SST              n - 1                MST
    Example 10-1:
        Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F Ratio   p Value
        Regression            64527736.8       1                    64527736.8    637.47    0.000
        Error                 2328161.2        23                   101224.4
        Total                 66855898.0       24
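The F ratio in the Example 10-1 table is the regression mean square divided by the error mean square. The sketch below recomputes it from the sums of squares in the table; in simple regression it also equals the square of the slope's t statistic, which is a standard identity and not a statement from the slides.

```python
n = 25
ssr = 64_527_736.8   # regression sum of squares (from the table)
sse = 2_328_161.2    # error sum of squares (from the table)

sst = ssr + sse
msr = ssr / 1            # one degree of freedom for the regression
mse = sse / (n - 2)      # n - 2 degrees of freedom for error
f_ratio = msr / mse

t_slope = 1.25533 / 0.04972   # slope t statistic from Example 10-1

print(f"SST = {sst:.1f}, MSE = {mse:.1f}, F = {f_ratio:.2f}")
print(f"t^2 = {t_slope ** 2:.2f}")
```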
  • Slide 33. Template (partial output) that displays Analysis of Variance and an F Test of the Regression Model
  • Slide 34. 10-9 Residual Analysis and Checking for Model Inadequacies
    [Figure, four residual plots:]
      • Residuals against $x$ or $\hat{y}$. Homoscedasticity: residuals appear completely random. No indication of model inadequacy.
      • Residuals against $x$ or $\hat{y}$. Heteroscedasticity: variance of residuals changes when x changes.
      • Residuals against time. Residuals exhibit a linear trend with time.
      • Residuals against $x$ or $\hat{y}$. Curved pattern in residuals resulting from underlying nonlinear relationship.
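The residual plots on Slide 34 are built from the residuals $e_i = y_i - \hat{y}_i$. The sketch below computes them for a made-up data set (not from the slides) and checks two properties that hold for any least squares fit with an intercept: the residuals sum to zero and are uncorrelated with x. Patterns such as changing spread or curvature still have to be judged from a plot.

```python
def residuals(xs, ys):
    """Fit y = b0 + b1 * x by least squares and return the residuals."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_x
    b0 = y_bar - b1 * x_bar
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]   # made-up toy data
res = residuals(xs, ys)

sum_res = sum(res)                              # zero up to rounding
sum_x_res = sum(x * e for x, e in zip(xs, res)) # also zero up to rounding

print([round(e, 2) for e in res])
```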
  • Slide 35. Normal Probability Plot of the Residuals: Flatter than Normal
  • Slide 36. Normal Probability Plot of the Residuals: More Peaked than Normal
  • Slide 37. Normal Probability Plot of the Residuals: More Positively Skewed than Normal
  • Slide 38. Normal Probability Plot of the Residuals: More Negatively Skewed than Normal
  • Slide 39. 10-10 Use of the Regression Model for Prediction
      • Point Prediction: a single-valued estimate of Y for a given value of X, obtained by inserting the value of X in the estimated regression equation.
      • Prediction Interval
          For a value of Y given a value of X:
            » Variation in regression line estimate
            » Variation of points around regression line
          For an average value of Y given a value of X:
            » Variation in regression line estimate
  • Slide 40. Errors in Predicting E[Y|X]
    [Figure, two panels of Y against X:]
      1) Uncertainty about the slope of the regression line (regression line with upper and lower limits on the slope)
      2) Uncertainty about the intercept of the regression line (regression line with upper and lower limits on the intercept)
  • Slide 41. Prediction Interval for E[Y|X]
    [Figure: regression line with the prediction band for E[Y|X].]
      • The prediction band for E[Y|X] is narrowest at the mean value of X.
      • The prediction band widens as the distance from the mean of X increases.
      • Predictions become very unreliable when we extrapolate beyond the range of the sample itself.
  • Slide 42. Additional Error in Predicting Individual Value of Y
    [Figure, two panels of Y against X:]
      3) Variation around the regression line
      Regression line with the prediction band for E[Y|X] and the wider prediction band for Y
  • Slide 43. Prediction Interval for a Value of Y
    A $(1 - \alpha)100\%$ prediction interval for Y: $\hat{y} \pm t_{\alpha/2} \, s \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{SS_X}}$
    Example 10-1 (X = 4,000):
      $\{274.85 + (1.2553)(4{,}000)\} \pm (2.069)(318.16) \sqrt{1 + \frac{1}{25} + \frac{(4{,}000 - 3{,}177.92)^2}{40{,}947{,}557.84}} = 5296.05 \pm 676.62 = [4619.43,\ 5972.67]$
  • Slide 44. Prediction Interval for the Average Value of Y
    A $(1 - \alpha)100\%$ prediction interval for $E[Y \mid X]$: $\hat{y} \pm t_{\alpha/2} \, s \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{SS_X}}$
    Example 10-1 (X = 4,000):
      $\{274.85 + (1.2553)(4{,}000)\} \pm (2.069)(318.16) \sqrt{\frac{1}{25} + \frac{(4{,}000 - 3{,}177.92)^2}{40{,}947{,}557.84}} = 5296.05 \pm 156.48 = [5139.57,\ 5452.53]$
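The two Example 10-1 intervals at X = 4,000 differ only by the extra 1 under the square root for an individual value. The sketch below reproduces both; the function name `interval_half_width` and its `individual` flag are invented for illustration, and the inputs are the rounded figures used on the slides.

```python
import math

def interval_half_width(x, n, x_bar, ss_x, s, t_crit, individual):
    """Half-width of the interval for Y (individual=True) or E[Y|X] (False)."""
    core = 1 / n + (x - x_bar) ** 2 / ss_x
    if individual:
        core += 1  # extra variation of a single point around the line
    return t_crit * s * math.sqrt(core)

n, x_bar, ss_x = 25, 3177.92, 40_947_557.84
s, t_crit = 318.16, 2.069        # values as quoted on the slides
b0, b1 = 274.85, 1.2553

x_new = 4000
y_hat = b0 + b1 * x_new
half_y = interval_half_width(x_new, n, x_bar, ss_x, s, t_crit, individual=True)
half_mean = interval_half_width(x_new, n, x_bar, ss_x, s, t_crit, individual=False)

print(f"point prediction: {y_hat:.2f}")
print(f"interval for Y:      [{y_hat - half_y:.2f}, {y_hat + half_y:.2f}]")
print(f"interval for E[Y|X]: [{y_hat - half_mean:.2f}, {y_hat + half_mean:.2f}]")
```

Because the squared distance from the mean of X appears in both half-widths, both bands are narrowest at X = 3,177.92 and widen as X moves away from it.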
  • Slide 45. Template Output with Prediction Intervals
  • Slide 46. 10-11 The Solver Method for Regression
    The Solver macro available in Excel can also be used to conduct a simple linear regression. See the text for instructions.
  • Slide 47. About the author
      Name: Shakeel Nouman
      Religion: Christian
      Domicile: Punjab (Lahore)
      Contact #: 0332-4462527, 0321-9898767
      E.Mail: sn_gcu@yahoo.com, sn_gcu@hotmail.com
      M.Phil (Statistics): GC University (Degree awarded by GC University)
      M.Sc (Statistics): GC University (Degree awarded by GC University)
      Statistical Officer (BS-17), Economics & Marketing Division, Livestock Production Research Institute Bahadurnagar (Okara), Livestock & Dairy Development Department, Govt. of Punjab