# S6 w2 linear regression

Published in: Technology, Economy & Finance

1. Purpose: determine whether one or more IVs can predict a DV. Examples:
   - Does your height (IV) predict how much money you will spend (DV)?
   - Does the number of store managers (IV) predict how often the machine will break down (DV)?
   - Do the number of clicks (IV1) and the number of comments (IV2) on the blog predict the size of revenue (DV)?
2. Research questions and matching inferential statistics:

   | Research question | Inferential statistic |
   |---|---|
   | Compare means of 2 numeric variables | t test |
   | Relate 2 categorical variables | Pearson chi-square |
   | Relate 2 numeric variables | Pearson correlation r |
   | Use 1+ IVs to explain 1 numeric DV | Regression |
3. Correlation vs. regression:
   - Correlation tells us how X relates to Y (in the past).
   - Simple regression tells us how X predicts Y (in the future), e.g., does AvgDailyClicks predict DirectSalesRevenue?
   - Multiple regression tells us how X1, X2, X3, … together predict Y, e.g., do NumberBlogAuthors and AvgDailyClicks predict SponsorRevenue?
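
The link between correlation and simple regression can be made concrete. Below is a minimal Python sketch (standard library only, using made-up clicks and revenue numbers rather than the course data) that computes Pearson's r and the simple-regression slope B for the same X and Y. The slope is r rescaled by the ratio of the standard deviations, B = r · s_Y / s_X, so both describe the same linear relationship; regression adds an equation you can plug new X values into.

```python
from math import sqrt


def mean(v):
    return sum(v) / len(v)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ols_slope(x, y):
    """Least-squares slope B for the model Y = Constant + B*X + error."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def sd(v):
    """Sample standard deviation (n - 1 denominator)."""
    m = mean(v)
    return sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))


# Hypothetical data: X = AvgDailyClicks, Y = DirectSalesRevenue (illustrative only)
clicks = [120, 340, 560, 800, 950]
revenue = [19.0, 18.2, 17.9, 16.5, 16.3]

r = pearson_r(clicks, revenue)
b = ols_slope(clicks, revenue)
# The slope is the correlation rescaled from standard-deviation units to Y-per-X units.
assert abs(b - r * sd(revenue) / sd(clicks)) < 1e-9
print(f"r = {r:.3f}, B = {b:.5f}")
```
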
4. Assumptions of linear regression:
   - The relationship between the Xs and Y is linear.
   - If you have 2 or more Xs, they are not perfectly correlated with each other.
   - The Xs are not correlated with external variables (variables left out of the model).
   - Independence: any two observations are independent of each other.
   - Errors are normally distributed.
   - And a few others.
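
The assumption that the Xs are not perfectly correlated can be screened before fitting. The sketch below is a hedged illustration, not a formal diagnostic: it computes Pearson's r between two hypothetical predictors and flags the pair when |r| exceeds a cut-off. The 0.9 threshold, the `check_predictors` helper, and the data are assumptions chosen for the example; statistics packages offer more formal checks such as variance inflation factors.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def check_predictors(x1, x2, threshold=0.9):
    """Return (r, ok); ok is False when the two predictors are too highly correlated.

    The 0.9 cut-off is an illustrative rule of thumb, not a fixed standard.
    """
    r = pearson_r(x1, x2)
    return r, abs(r) < threshold


authors = [1, 2, 2, 3, 5, 4]              # hypothetical NumberBlogAuthors
clicks = [150, 420, 300, 610, 700, 380]   # hypothetical AvgDailyClicks
doubled = [2 * a for a in authors]        # a perfectly correlated copy of authors

print(check_predictors(authors, clicks))
print(check_predictors(authors, doubled))  # r is 1 (up to rounding): violates the assumption
```
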
5. Example: Does the Number of Stupid Customers predict the Self-Checkout Error Rate? When we use X to predict Y:
   - X = the predictor = the independent variable (IV).
   - Y = the predicted value = the dependent variable (DV), because the value of Y depends on the predictor X.
   - You're basically building a linear model between X and Y: Y = Constant + B*X + error.
6. Y = Constant + B*X + error. For the line Y = 1 + 2*X, the constant (intercept) is 1 and the slope B is 2. (Figure source: Wikipedia)
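
The model can be read as a recipe: plug in X, get the predicted Y, and whatever is left over is the error. A minimal Python sketch using the line Y = 1 + 2*X from this slide; the observed points are invented for illustration.

```python
def predict(x, constant=1.0, slope=2.0):
    """Predicted Y from the linear model Y = Constant + B*X (the error term is not predictable)."""
    return constant + slope * x


# Hypothetical observations (x, observed y) scattered around the line Y = 1 + 2*X
observations = [(0, 1.4), (1, 2.7), (2, 5.3), (3, 6.8)]

for x, y in observations:
    y_hat = predict(x)
    error = y - y_hat  # the error term: observed Y minus predicted Y
    print(f"x={x}  predicted={y_hat:.1f}  observed={y}  error={error:+.1f}")
```
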
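7. Which is the best-fitting model? (Hint: not Kate Moss.) The line that's closest to all the dots, i.e., the line that minimizes the sum of squared vertical distances from the dots to the line (ordinary least squares).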
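
"Closest to all the dots" has a precise meaning in ordinary least squares: the chosen line minimizes the sum of squared vertical distances (errors). The sketch below uses the standard closed-form formulas, B = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and Constant = ȳ − B·x̄, on hypothetical data, then checks that nudging the slope or the constant only makes the total squared error larger. The helper names `fit_line` and `sse` are illustrative.

```python
def fit_line(x, y):
    """Ordinary least squares: return (constant, slope) minimizing the sum of squared errors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    constant = my - slope * mx  # the fitted line always passes through (mean X, mean Y)
    return constant, slope


def sse(x, y, constant, slope):
    """Sum of squared vertical distances from the dots to a candidate line."""
    return sum((b - (constant + slope * a)) ** 2 for a, b in zip(x, y))


x = [0, 1, 2, 3, 4]
y = [1.2, 2.9, 5.1, 7.2, 8.8]  # hypothetical dots

c, b = fit_line(x, y)
best = sse(x, y, c, b)
# Any other line (here: a nudged slope or constant) has a larger total squared error.
assert best < sse(x, y, c, b + 0.1)
assert best < sse(x, y, c + 0.1, b)
print(f"Y = {c:.2f} + {b:.2f}*X, SSE = {best:.3f}")
```
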
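8. Goodness of fit (R²): How well does the line fit the data? (How well does Kate fit the average woman?) The vertical distances from the dots to the regression line are the errors; a good fit means small errors. (Figure: regression line labeled with the constant and slope B, with the distances to the line marked as error.)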
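
R² can be computed directly from the errors: R² = 1 − SSE/SST, where SSE is the sum of squared distances from the dots to the regression line and SST is the sum of squared distances from the dots to the mean of Y. Small errors make SSE small and push R² toward 1. A minimal sketch with two invented data sets, one tightly clustered around a line and one loosely scattered; the `r_squared` helper is an illustrative name.

```python
def r_squared(x, y):
    """R^2 = 1 - SSE/SST for the least-squares line through (x, y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    constant = my - slope * mx
    sse = sum((b - (constant + slope * a)) ** 2 for a, b in zip(x, y))  # errors around the line
    sst = sum((b - my) ** 2 for b in y)  # total variation of Y around its mean
    return 1 - sse / sst


x = [0, 1, 2, 3, 4]
tight = [1.2, 2.9, 5.1, 7.2, 8.8]  # dots close to a line: small errors
loose = [3.0, 1.0, 7.5, 4.0, 9.0]  # dots far from any line: large errors

print(f"tight R^2 = {r_squared(x, tight):.3f}")
print(f"loose R^2 = {r_squared(x, loose):.3f}")
```
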
9. Y = Constant + B*X + error, estimated here as DirectSalesRevenue = 19.466 - .003*AvgDailyClicks + error.
   - The constant (19.466) is significantly greater than zero.
   - The slope (-.003) is significantly less than zero.
   - Goodness of fit (R²): the model explains 59% of the variation in DirectSalesRevenue.
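
Once the coefficients are estimated, prediction is just arithmetic. The sketch below plugs a few AvgDailyClicks values into the fitted equation from this slide; the click values are arbitrary, and revenue is in whatever units the original data used. Each additional daily click lowers predicted revenue by .003, so 1,000 extra clicks lower it by 3.

```python
def predict_revenue(avg_daily_clicks):
    """Predicted DirectSalesRevenue from the fitted equation on this slide (error term omitted)."""
    return 19.466 - 0.003 * avg_daily_clicks


# Arbitrary example inputs, chosen only to show the arithmetic
for clicks in (0, 1000, 2000):
    print(clicks, round(predict_revenue(clicks), 3))
```

Predictions are only trustworthy within the range of AvgDailyClicks observed in the data; the constant 19.466 is the prediction at zero clicks, which is meaningful only if zero clicks lies within that range.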
10. The number of average daily clicks significantly predicted direct sales revenue, b = -.003, t(39) = 14.72, p < .001. The number of average daily clicks also explained a significant proportion of variance in direct sales revenue, R² = .59, F(1, 38) = 42.64, p < .001. These findings suggest that websites with more average daily clicks tend to have lower direct sales revenue.
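
In a simple regression the statistics in a write-up like this are tied together: the t test of the slope and the F test of the model share the residual degrees of freedom n - 2, F equals t² (with t carrying the sign of b), and R² = F / (F + df). The sketch below, on hypothetical data rather than the data behind this slide, computes all of them from scratch so these identities can be checked. The `simple_regression_stats` helper is an illustrative name, and p-values are omitted because they require the t or F distribution (available, e.g., in SciPy).

```python
from math import sqrt


def simple_regression_stats(x, y):
    """Slope b, its t statistic, F, R^2 and residual df for Y = Constant + B*X + error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sxx
    const = my - b * mx
    sse = sum((c - (const + b * a)) ** 2 for a, c in zip(x, y))
    sst = sum((c - my) ** 2 for c in y)
    df_resid = n - 2                    # two estimated parameters: constant and slope
    mse = sse / df_resid                # residual variance estimate
    t = b / sqrt(mse / sxx)             # slope divided by its standard error
    f = (sst - sse) / mse               # regression mean square (df = 1) over MSE
    return {"b": b, "t": t, "F": f, "R2": 1 - sse / sst, "df": df_resid}


# Hypothetical data (illustrative only, not the data behind the slide)
x = [100, 250, 400, 550, 700, 850]
y = [19.2, 18.9, 18.1, 17.8, 16.9, 16.7]
s = simple_regression_stats(x, y)
print(f"b = {s['b']:.4f}, t({s['df']}) = {s['t']:.2f}, "
      f"R2 = {s['R2']:.2f}, F(1, {s['df']}) = {s['F']:.2f}")
```
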
11. Y = 200X (R² = 45%): X explains 45% of the variance in Y. This does not mean that, given any X, we can predict the value of Y with 45% accuracy.
12. Assumptions: the Xs are somewhat independent of each other; Y values are independent; Y values are normally distributed; errors are normally distributed; X-Y relations are linear; there are no outliers.
    - Example: time-series data are NOT independent. The stock price today depends on the stock price yesterday, which depends on the stock price the day before, and so on.
    - Multiple regression is just an extension of simple regression: use multiple Xs (e.g., both AvgDailyClicks and NumberAuthors) to predict Y.
    - When you have a condition, i.e., the relationship between X and Y depends on another variable (e.g., customer choice depends on gender; brand awareness depends on communication channel; number of applications depends on program of study), you need to create an interaction term (covered next class).
    - When an X is categorical (e.g., whether the blog host is Google or WordPress), code X in numbers, e.g., 0 = Google, 1 = WordPress.
    - When Y is categorical (e.g., whether the blog won the Outstanding Blog Award), code Y in numbers, e.g., 0 = No, 1 = Yes, and use logistic regression.
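
Multiple Xs and a 0/1-coded categorical X can be combined in one model. A minimal standard-library sketch, assuming made-up data: it fits Revenue = Constant + B1*AvgDailyClicks + B2*WordPress by solving the normal equations, where WordPress is the 0/1 code for the blog host (0 = Google, 1 = WordPress). Because the revenue values are generated from known coefficients without noise, the fit should recover them, which makes the result easy to check. The helpers `solve` and `fit_multiple` are illustrative; in practice you would use a statistics package (e.g., statsmodels or R's `lm`) rather than hand-rolled linear algebra.

```python
def solve(a, b):
    """Solve the square linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def fit_multiple(rows, y):
    """OLS for Y = Constant + B1*X1 + B2*X2 + ... via the normal equations (X'X) beta = X'y."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 gives the constant
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical blogs: (AvgDailyClicks, host coded 0 = Google, 1 = WordPress)
rows = [(100, 0), (200, 0), (300, 0), (100, 1), (200, 1), (300, 1)]
revenue = [10 + 0.02 * c + 3 * h for c, h in rows]  # built from known coefficients, no noise

const, b_clicks, b_host = fit_multiple(rows, revenue)
print(f"Revenue = {const:.2f} + {b_clicks:.3f}*Clicks + {b_host:.2f}*WordPress")
```

B2 is read as the expected difference in revenue between WordPress and Google blogs with the same number of clicks. An interaction term would simply add the product Clicks*WordPress as one more column in each row.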
13. Questions to answer about your own analysis:
    - What is your Y (the value you want to predict)? Is your Y categorical? → Do you need logistic regression? See the instructor for help.
    - What is your X (your predictor variable)? How many Xs do you have?
    - Is any of your Xs categorical? → Do you have a coding scheme?
    - Do you have a condition (e.g., customer choice depends on gender; brand awareness depends on communication channel; number of applications depends on program of study)? → See the instructor for help.
14. Research questions and matching inferential statistics:

    | Research question | Inferential statistic |
    |---|---|
    | Compare means of 2 numeric variables | t test |
    | Relate 2 numeric variables | Pearson correlation r |
    | Relate 2 categorical variables | Pearson chi-square |
    | Use 1+ IVs to explain 1 numeric DV | Regression |