Linear Regression


Published in: Education, Technology


  1. 1. Linear regression<br />Ryan Sain, Ph.D.<br />
  2. 2. Regression Introduced<br />Regression is about prediction<br />Predicting an unknown point based on observations (or measurements)<br />Widgets sold based on advertising<br />We can explore known relationships<br />We can explore unknown relationships<br />
  3. 3. The Variables<br />Outcome Variable<br />The thing we are predicting<br />Number of widgets sold<br />Predictor Variable (simple regression)<br />The variable that you know about<br />Advertising dollars<br />Predictor Variables (multiple regression)<br />We predict values of a dependent variable (outcome) using one or more independent variables (predictors)<br />
  4. 4. The model<br />Any prediction follows the basic formula:<br />$\text{outcome}_i = (\text{model}) + \text{error}_i$<br />In regression our model contains several things:<br />Slope of the line (that best fits the data measured) = $b_1$<br />Intercept of the line (at the Y axis) = $b_0$<br />So our model is $Y_i = (b_0 + b_1 X_i) + \text{error}_i$<br />Do you recognize this equation?<br />The model is simply a line<br />
  5. 5. So how do we calculate this line?<br />The Method of Least Squares<br />We want the line that is closest to all the data points<br />Residuals = deviations (the vertical distance from each actual data point to the line)<br />Square these residuals to get rid of negatives<br />Then sum them.<br />The best-fitting line is the one with the smallest sum of squared residuals.<br />
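
The least-squares idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's own procedure: the data values are made up, the function names `fit_line` and `sum_squared_residuals` are hypothetical, and the closed-form slope and intercept used here are the standard ones for a single predictor.

```python
# Hypothetical data: advertising spend (predictor) and widgets sold (outcome).
advertising = [1.0, 2.0, 3.0, 4.0, 5.0]
widgets_sold = [2.0, 4.0, 5.0, 4.0, 5.0]


def fit_line(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: how x and y vary together, divided by how much x varies.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = s_xy / s_xx
    # The least-squares line passes through the point (mean_x, mean_y).
    b0 = mean_y - b1 * mean_x
    return b0, b1


def sum_squared_residuals(x, y, b0, b1):
    """Square each residual (observed minus predicted) and add them up."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


b0, b1 = fit_line(advertising, widgets_sold)
ss_r = sum_squared_residuals(advertising, widgets_sold, b0, b1)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, sum of squared residuals = {ss_r:.2f}")
```

Nudging either coefficient away from the fitted values and calling `sum_squared_residuals` again should give a larger total, which is what "best fit" means here.
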
  6. 6. How well does this line fit?<br />No line is perfect (there are always residuals)<br />If our line is a good one it should be significantly better than a basic line<br />We compare our line to a basic line:<br />Deviation = $\sum (\text{observed} - \text{model})^2$<br />The basic model is simply the mean of the outcome<br />The mean is an awful predictor<br />It says that no matter how much you spend on adverts, the sales of your widgets are the same<br />
  7. 7. Fitness continued<br />$SS_T$ = total sum of squared differences (using the mean)<br />$SS_R$ = residual sum of squares (using our best fit model)<br />Represents the degree of inaccuracy that remains in our model<br />$SS_M$ (model sum of squares) = $SS_T - SS_R$<br />A large $SS_M$ means our model is a big improvement over the simple (mean) model<br />Proportion of improvement:<br />$R^2 = SS_M / SS_T$<br />The proportion of variation in the outcome that can be explained by our model (multiply by 100 for a percentage)<br />
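
As a rough worked example of these sums of squares, the sketch below computes the total, residual, and model sums of squares and R squared for a small made-up data set. The data and variable names are assumptions for illustration only; the line is fitted with the standard closed-form least-squares formulas for one predictor.

```python
# Hypothetical data: advertising spend (x) and widgets sold (y).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept for a single predictor.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = mean_y - b1 * mean_x

# Total sum of squares: error when the mean is used as the model.
ss_t = sum((yi - mean_y) ** 2 for yi in y)
# Residual sum of squares: error left over when the fitted line is used.
ss_r = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
# Model sum of squares: the improvement of the line over the mean.
ss_m = ss_t - ss_r
r_squared = ss_m / ss_t

print(f"SSt = {ss_t:.2f}, SSr = {ss_r:.2f}, SSm = {ss_m:.2f}, R^2 = {r_squared:.2f}")
```

For this made-up data the line accounts for about 60% of the variation in sales; the remaining 40% is left in the residuals.
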
  8. 8. More fitness<br />You can assess this using an F test as well<br />F is simply systematic variance / unsystematic variance<br />In regression that means:<br />The improvement due to the model ($SS_M$, systematic) compared with the difference between the model and the observed data ($SS_R$, unsystematic)<br />But we need to look at mean squares<br />Because an F test uses average sums of squares.<br />So we divide each sum of squares by its degrees of freedom<br />For $SS_M$: the number of predictors in the model<br />For $SS_R$: the number of observations minus the number of parameters being estimated (the beta coefficients, including the intercept)<br />$MS_M = SS_M / df_M$ and $MS_R = SS_R / df_R$<br />$F = MS_M / MS_R$<br />
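
The mean-square arithmetic can be written out as a small helper. This is only a sketch: `f_statistic` is a hypothetical function name, and the sums of squares passed in are made-up values rather than results from the slides. It assumes the model includes an intercept, so the residual degrees of freedom are the number of observations minus the number of predictors minus one.

```python
def f_statistic(ss_m, ss_r, n_observations, n_predictors):
    """Return (F, df_model, df_residual) from the model and residual sums of squares."""
    df_model = n_predictors
    # One parameter per predictor plus the intercept is estimated from the data.
    df_residual = n_observations - n_predictors - 1
    ms_m = ss_m / df_model      # mean square for the model (systematic)
    ms_r = ss_r / df_residual   # mean square for the residuals (unsystematic)
    return ms_m / ms_r, df_model, df_residual


# Hypothetical sums of squares for 5 observations and 1 predictor.
f_value, df_m, df_r = f_statistic(ss_m=3.6, ss_r=2.4, n_observations=5, n_predictors=1)
print(f"F({df_m}, {df_r}) = {f_value:.2f}")
```

Judging whether that F value is significant means comparing it against the F distribution with those two degrees of freedom, using a table or a statistics package.
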
  9. 9. Individual Predictors<br />The coefficient b is essentially the gradient of the line<br />If the predictor is not valuable then the outcome will not change as the predictor changes.<br />This would be b = 0<br />This is what the mean does<br />If the predictor is valuable, then its b will be significantly different from 0.<br />
  10. 10. Individual Predictors cont.<br />To test if b is different from 0 we will use a t-test.<br />We are comparing how big the b value is in comparison to the amount of error in that estimate.<br />We measure that error with the standard error of the b value.<br />$t = (b_{\text{observed}} - b_{\text{expected}}) / SE_b$<br />Since the expected value is 0 (no change), we simply divide the observed b value by the standard error of b to get the t score.<br />Degrees of freedom are calculated as $N - p - 1$ (p = number of predictors)<br />
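
A minimal sketch of this t-test for a single predictor follows. The slides do not say how the standard error of b is computed, so the code assumes the usual formula for simple regression: the square root of the residual mean square divided by the sum of squared deviations of the predictor. The data and the function name `slope_t_test` are hypothetical.

```python
import math

# Hypothetical data: advertising spend (x) and widgets sold (y).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def slope_t_test(x, y):
    """Return (b1, se_b1, t, df) for the slope of a one-predictor regression."""
    n = len(x)
    p = 1  # number of predictors
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = mean_y - b1 * mean_x

    # Residual mean square: leftover error per degree of freedom.
    df = n - p - 1
    ss_r = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ms_r = ss_r / df

    # Assumed standard-error formula for the slope in simple regression.
    se_b1 = math.sqrt(ms_r / s_xx)

    # Expected b under "no relationship" is 0, so t is just b divided by its SE.
    t = (b1 - 0.0) / se_b1
    return b1, se_b1, t, df


b1, se_b1, t, df = slope_t_test(x, y)
print(f"b = {b1:.2f}, SE = {se_b1:.3f}, t({df}) = {t:.2f}")
```

The resulting t is then compared against the t distribution with N - p - 1 degrees of freedom. With only one predictor, squaring this t gives the same value as the F ratio for the whole model.
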