
S6 w2 linear regression

Published in: Technology, Economy & Finance

  1. Purpose: Determine whether one or more independent variables (IVs) can predict a dependent variable (DV). Examples:
     • Does your height (IV) predict how much money you will spend (DV)?
     • Does the number of store managers (IV) predict how often the machine will break down (DV)?
     • Do the number of clicks (IV1) and the number of comments (IV2) on the blog predict the size of its revenue (DV)?
  2. Research question → inferential statistic:
     • Compare the means of 2 numeric variables → t test
     • Relate 2 categorical variables → Pearson chi-square
     • Relate 2 numeric variables → Pearson correlation r
     • Use 1+ IVs to explain 1 numeric DV → Regression
  3. Correlation tells us how strongly X relates to Y in the data we already have (in the past).
     Simple regression gives an equation for predicting Y from X (in the future).
     • E.g., does AvgDailyClicks predict DirectSalesRevenue?
     Multiple regression tells us how X1, X2, X3, … together predict Y.
     • E.g., do NumberBlogAuthors and AvgDailyClicks predict SponsorRevenue?
  4. Assumptions of linear regression:
     • The relationship between the Xs and Y is linear.
     • If you have 2 or more Xs, they are not perfectly correlated with each other.
     • The Xs are not correlated with external variables (variables left out of the model).
     • Independence: any two observations should be independent of each other.
     • Errors are normally distributed.
     • And a few others.
  5. Example: Does the Number of Stupid Customers predict the Self-Checkout Error Rate?
     When we use X to predict Y:
     • X = the predictor = the independent variable (IV)
     • Y = the predicted (outcome) variable = the dependent variable (DV), because the value of Y depends on the predictor X
     • You’re basically building a linear model between X and Y: Y = Constant + B*X + error
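  6. Y = Constant + B*X + error
     Example line: Y = 1 + 2*X, so Constant = 1 (where the line crosses the Y axis) and slope B = 2 (Y rises by 2 for each 1-unit increase in X).
     (Figure source: Wikipedia)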
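  7. Who is the best-fitting model? (Hint: not Kate Moss.)
     The line that’s closest to all the dots, i.e., in ordinary least squares, the line that minimizes the sum of squared vertical distances between the dots and the line.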
  8. Goodness of fit (R²): How well does the line fit the data? (How well does Kate fit the average woman?)
     [Figure: regression line with its constant and slope B labeled; the vertical distances from the dots to the regression line are the errors.]
     Good fit = small errors.
  9. Y = Constant + B*X + error
     DirectSalesRevenue = 19.466 - .003*AvgDailyClicks + error
     • The constant (19.466) is significantly greater than zero.
     • The slope (-.003) is significantly less than zero.
     • Goodness of fit (R²): the model explains 59% of the variation in DirectSalesRevenue.
  10. The number of average daily clicks significantly predicted direct sales revenue, b = -.003, t(39) = 14.72, p < .001. The number of average daily clicks also explained a significant proportion of variance in direct sales revenue, R² = .59, F(1, 38) = 42.64, p < .001. These findings suggest that websites with more average daily clicks tend to have lower direct sales revenue.
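  11. Y = 200X (R² = 45%)
      This does not mean that, given any X, we can predict the value of Y with 45% accuracy. It means that X explains 45% of the variation in Y; the remaining 55% is unexplained (error).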
  12. Assumptions: the Xs are somewhat independent of each other; Y values are independent; errors (and therefore the Y values at any given X) are normally distributed; X-Y relations are linear; there are no outliers.
      • Example: time-series data are NOT independent. Today’s stock price depends on yesterday’s, which in turn depends on the price the day before, and so on.
      Multiple regression is just an extension of simple regression.
      • Use multiple Xs (e.g., both AvgDailyClicks and NumberAuthors) to predict Y.
      • When you have a condition (e.g., customer choice depends on gender; brand awareness depends on communication channel; number of applications depends on program of study), you need to create an interaction term (next class).
      When an X is categorical (e.g., whether the blog host is Google or WordPress):
      • Code X in numbers, e.g., 0 = Google, 1 = WordPress.
      When Y is categorical (e.g., whether the blog won the Outstanding Blog Award):
      • Code Y in numbers, e.g., 0 = No, 1 = Yes, and use logistic regression.
  13. What is your Y (the value you want to predict)?
      • Is your Y categorical? If so, do you need logistic regression? See the instructor for help.
      What is your X (your predictor variable)?
      • How many Xs do you have?
      • Are any of your Xs categorical? If so, do you have a coding scheme?
      • Do you have a condition (e.g., customer choice depends on gender; brand awareness depends on communication channel; number of applications depends on program of study)? If so, see the instructor for help.
  14. Research question → inferential statistic:
      • Compare the means of 2 numeric variables → t test
      • Relate 2 numeric variables → Pearson correlation r
      • Relate 2 categorical variables → Pearson chi-square
      • Use 1+ IVs to explain 1 numeric DV → Regression
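Slides 5 and 6 describe the model Y = Constant + B*X + error and the example line Y = 1 + 2*X. A small sketch, using made-up observed points, of how the error term is simply the observed Y minus the line's prediction:

```python
# The example line from slide 6: Constant = 1, slope B = 2
CONSTANT = 1.0
SLOPE_B = 2.0

def predict(x):
    """Point on the line Y = Constant + B*X (the error term is what the line misses)."""
    return CONSTANT + SLOPE_B * x

# Hypothetical observed (X, Y) pairs, made up for illustration
observed = [(0, 1.5), (1, 2.5), (2, 5.5), (3, 6.0)]

errors = []
for x, y in observed:
    error = y - predict(x)  # error = observed Y minus predicted Y
    errors.append(error)
    print(f"X = {x}: predicted {predict(x):.1f}, observed {y:.1f}, error {error:+.1f}")
```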
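Slide 7's "line that's closest to all dots" is usually found by ordinary least squares. A minimal sketch of the closed-form solution for one X (slope = Sxy / Sxx, constant = mean of Y minus slope times mean of X) on made-up data, checking that nudging the line only increases the sum of squared errors:

```python
def fit_least_squares(xs, ys):
    """Return (constant, slope) of the line minimizing the sum of squared errors."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    constant = y_bar - slope * x_bar
    return constant, slope

def sse(xs, ys, constant, slope):
    """Sum of squared vertical distances from the dots to the line."""
    return sum((y - (constant + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, made up for illustration
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

c, b = fit_least_squares(xs, ys)
best = sse(xs, ys, c, b)

# Moving the constant or the slope away from the least-squares values raises the SSE
for dc, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    assert sse(xs, ys, c + dc, b + db) > best

print(f"best line: Y = {c:.3f} + {b:.3f}*X, SSE = {best:.3f}")
```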
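The fitted equation on slide 9 can be used directly for prediction. A sketch that simply plugs values into that equation; the click values are hypothetical, the slides do not state the units of DirectSalesRevenue, and predictions far outside the range of clicks in the original data (including the constant itself, at zero clicks) are extrapolations:

```python
def predicted_direct_sales_revenue(avg_daily_clicks):
    """Fitted equation from slide 9: DirectSalesRevenue = 19.466 - 0.003 * AvgDailyClicks."""
    return 19.466 - 0.003 * avg_daily_clicks

for clicks in (0, 500, 1000):
    revenue = predicted_direct_sales_revenue(clicks)
    print(f"AvgDailyClicks = {clicks:>4}: predicted DirectSalesRevenue = {revenue:.3f}")

# Each additional average daily click lowers predicted revenue by 0.003 units,
# so 1,000 more clicks lower it by about 3 units.
drop = predicted_direct_sales_revenue(0) - predicted_direct_sales_revenue(1000)
print(f"Drop per 1,000 extra clicks: {drop:.3f}")
```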
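Slide 10 reports b, t, R², and F for a simple regression. In simple regression the t test for the slope and the overall F test are equivalent: F = t², both use n − 2 residual degrees of freedom, and R² = F / (F + n − 2). A sketch that computes these quantities from made-up data; p-values are left out because Python's standard library has no t or F distribution:

```python
import math

def simple_regression_report(xs, ys):
    """Slope b, its t statistic, F, and R^2 for an ordinary least-squares line Y = Constant + B*X."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                    # slope
    constant = y_bar - b * x_bar
    sse = sum((y - (constant + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    df_resid = n - 2                 # residual degrees of freedom
    mse = sse / df_resid             # mean squared error
    t = b / math.sqrt(mse / sxx)     # slope divided by its standard error
    f = (sst - sse) / mse            # F(1, n - 2): explained vs. unexplained variance
    r2 = 1 - sse / sst
    return {"b": b, "t": t, "df": df_resid, "F": f, "R2": r2}

# Hypothetical data, made up for illustration
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [3.1, 4.0, 4.4, 6.1, 6.0, 7.9, 8.2, 8.8]

rep = simple_regression_report(xs, ys)
print(f"b = {rep['b']:.3f}, t({rep['df']}) = {rep['t']:.2f}, "
      f"R2 = {rep['R2']:.2f}, F(1, {rep['df']}) = {rep['F']:.2f}")
```

These identities give a quick consistency check on a written-up result: the t degrees of freedom should match the second F degrees of freedom, and t² should reproduce F.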
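Slide 12 describes multiple regression with a dummy-coded categorical X and an interaction term for a "condition". A minimal sketch, not the course's software workflow: it fits ordinary least squares through the normal equations with a hand-written solver, uses hypothetical columns (clicks, a 0 = Google / 1 = WordPress host dummy, and their product as the interaction), and builds noise-free revenue from assumed coefficients so the fit can be checked. Logistic regression for a categorical Y is not shown.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(rows, ys):
    """OLS coefficients from the normal equations (X'X) beta = X'y; each row starts with 1 for the constant."""
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical predictors: AvgDailyClicks and a dummy-coded host (0 = Google, 1 = WordPress)
clicks = [100, 200, 300, 400, 100, 200, 300, 400]
host = [0, 0, 0, 0, 1, 1, 1, 1]

# Noise-free revenue built from assumed coefficients (20, -0.003, 1.5, -0.002) so the fit can be checked
revenue = [20 - 0.003 * c + 1.5 * h - 0.002 * c * h for c, h in zip(clicks, host)]

# Columns: constant, X1 = clicks, X2 = host dummy, interaction X1 * X2
rows = [[1.0, c, h, c * h] for c, h in zip(clicks, host)]
coef = fit_ols(rows, revenue)

print("constant, clicks, host, clicks*host:", [round(v, 4) for v in coef])
print("clicks slope for Google:", round(coef[1], 4))
print("clicks slope for WordPress:", round(coef[1] + coef[3], 4))
```

With the interaction term, the slope for clicks differs by host: coef[1] for Google and coef[1] + coef[3] for WordPress. Without it, the model would force the same clicks slope on both hosts.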
