Assumptions: Check yo'self before you wreck yourself

Predicting the future is hard and it requires a lot of assumptions, also known as beliefs, also known as faith. In “Assumptions: Check yo self, before you wreck yo self” we explore the consequences of beliefs when constructing predictive models. We’ll walk through the process of developing a demand forecast for Evo, a Seattle-based outdoor recreation retailer, and discuss how assumptions influence the behavior of your application and ultimately the decisions you make.

Slide 1: Assumptions: Check yo self, before you wreck yo self. Erin Shellman (@erinshellman), Seattle Software Craftsmanship, August 28, 2014

Slide 2: Assumptions: Making an ass out of you and me. Erin Shellman (@erinshellman), Seattle Software Craftsmanship, August 28, 2014

Slide 3: I’m Erin, and I’m a data scientist.

Slide 4: How much should this cost?

Slide 5: What about these?

Slide 6: What about these? …and when?

Slide 7: Price optimization

Slide 8: Price optimization. 1. Git yer Big Data!

Slide 9: Price optimization. 1. Git yer Big Data! 2. Forecast demand.

Slide 10: Price optimization. 1. Git yer Big Data! 2. Forecast demand. 3. Optimize price.

Slide 11: Price optimization. 1. Big Data! 2. demand 3. price 4. Profit!!!!!

Slide 12: Price optimization. 1. Git yer Big Data! 2. Forecast demand: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$. 3. Optimize price: $\max \sum \text{revenue}$.
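
Steps 2 and 3 connect through a short calculation. As a worked example, assuming the simplest case where $x$ is the price $p$, $y$ is units sold, and the regression on this slide is the demand forecast $\hat{y}(p) = \beta_0 + \beta_1 p$, expected revenue is price times forecast demand, and setting its derivative to zero gives the revenue-maximizing price:

$$
R(p) = p\,\hat{y}(p) = p\,(\beta_0 + \beta_1 p) = \beta_0 p + \beta_1 p^2
$$

$$
\frac{dR}{dp} = \beta_0 + 2\beta_1 p = 0
\quad\Longrightarrow\quad
p^{*} = -\frac{\beta_0}{2\beta_1},
\qquad
\frac{d^2R}{dp^2} = 2\beta_1
$$

So $p^{*}$ is a maximum only when $\beta_1 < 0$, i.e. when demand falls as price rises. With $\beta_1 \ge 0$ the revenue curve never turns down and there is no finite optimal price, which is why the quality of the forecast decides the quality of the price.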

Slide 13: The key is a good forecast.

Slide 14: Do the easiest thing. • Subset the data and focus on one category of product, e.g. Alpine ski bindings. • Prototype & validate in R. $\text{Units Sold}_i = \alpha + \beta_1\,\text{price}_i + \varepsilon_i$

Slide 15: As slide 14, with the error term $\varepsilon_i$ labelled “Residual”.
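
A minimal sketch of that first model in R. The slides do not show the Evo data, so the `bindings` data frame below is simulated with made-up coefficients purely for illustration; only the `lm()` call and the way the residuals are pulled out would carry over to real data.

```r
# Hypothetical stand-in for one product category (e.g. Alpine ski bindings):
# simulated prices and units sold with made-up coefficients, NOT Evo's data.
set.seed(42)
bindings <- data.frame(price = runif(200, min = 50, max = 400))
bindings$units_sold <- 150 - 0.25 * bindings$price + rnorm(200, sd = 10)

# Units Sold_i = alpha + beta_1 * price_i + epsilon_i, fit by ordinary least squares
slr <- lm(units_sold ~ price, data = bindings)

alpha_hat <- coef(slr)[["(Intercept)"]]
beta1_hat <- coef(slr)[["price"]]

# The residuals estimate epsilon_i: observed minus fitted units sold
epsilon_hat <- residuals(slr)

print(c(alpha = alpha_hat, beta1 = beta1_hat))
```

With real data, `bindings` would be replaced by the subset for the chosen category; everything downstream (diagnostics, optimization) starts from the fitted `slr` object.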

Slide 16: Assumptions of simple linear regression (SLR). • We assume that the residuals: 1. Are normally distributed, with mean zero. 2. Are not autocorrelated. 3. Are unrelated to the predictors.
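
Each of these assumptions can be turned into a number instead of a plot. A sketch using only base R (`stats`), on simulated data so that it runs on its own. The Ljung-Box test from `Box.test()` is swapped in here as the autocorrelation check (the talk uses the Breusch-Godfrey test from the lmtest package), and it only makes sense if the rows are in time order, which is an assumption about the data.

```r
# Hypothetical simulated data, standing in for the real sales history
set.seed(1)
sales <- data.frame(price = runif(150, min = 50, max = 400))
sales$units_sold <- 150 - 0.25 * sales$price + rnorm(150, sd = 10)
fit <- lm(units_sold ~ price, data = sales)
res <- residuals(fit)

# 1. Normal with mean zero: Shapiro-Wilk p-value (small => evidence against normality)
normality_p <- shapiro.test(res)$p.value
mean_resid  <- mean(res)  # ~0 by construction when the model has an intercept

# 2. Not autocorrelated: Ljung-Box test on the residual series (assumes time order)
autocorr_p <- Box.test(res, lag = 1, type = "Ljung-Box")$p.value

# 3. Unrelated to the predictor: correlation between residuals and price
resid_price_cor <- cor(res, sales$price)

checks <- c(normality_p = normality_p, mean_resid = mean_resid,
            autocorr_p = autocorr_p, resid_price_cor = resid_price_cor)
print(round(checks, 4))
```

One caveat on check 3: ordinary least squares makes the residuals uncorrelated (up to rounding) with every predictor that is in the model, so the in-sample correlation with `price` in a model that uses `price` is always about zero. The check only has teeth against a transformed version of the predictor or a variable left out of the model.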

Slide 17: Checking assumptions is hard • …and boring! • For statistical methods, assumption testing traditionally relies on visually inspecting plots (and let's be real, most people don’t even do that).

Slide 18: [Figure: diagnostic plots for the fitted model: Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage (with Cook's distance contours).]

Slide 19: “Of all the practices you can leverage to assist your craftsmanship, you will get the most benefit from testing.” (Stephen Vance)
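
Slide 20: test_that assumption!

```r
library(testthat)

# model_object (a fitted lm) and data are assumed to exist before this file runs;
# the slide does not show where they come from.
context("Check assumptions of SLR")

test_that("The residuals are normally distributed", {
  # Shapiro-Wilk: p <= 0.05 is evidence against normality.
  # (Current testthat spells this expect_gt(p, 0.05).)
  expect_that(shapiro.test(model_object$residuals)$p.value, is_more_than(0.05))
})

test_that("There is no autocorrelation", {
  # Breusch-Godfrey test for serial correlation, from the lmtest package
  expect_that(lmtest::bgtest(model_object)$p.value, is_more_than(0.05))
})

test_that("The residuals are unrelated to the predictor", {
  # OLS residuals are uncorrelated with any predictor that is in the model,
  # so this only bites for covariates entered in a different form or left out.
  expect_that(cor(model_object$residuals, data$covariates), equals(0))
})
```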

Slide 21: Tests pass!

```
> test_file("./tests/test_slr.R")
Check assumptions of SLR : [1] "units_sold ~ price"
...
```

Slide 22: Psych.

```
> test_file("./tests/test_slr.R")
Check assumptions of SLR : [1] "units_sold ~ price"
1..
1. Failure(@test_slr.R#12): The residuals are normally distributed ------------------------
shapiro.test(model_object$residuals)$p.value not more than 0.05. Difference: 0.05
```

Slide 23: Linear? Eh. • We assumed the functional form was linear, but there are several common forms that might better fit the data. [Figure: units sold vs. price in dollars, 0 to 500.]

Slide 24: [Figure: units sold vs. price under four functional forms: Linear, Log-log, Linear-log, Log-linear.]

Slide 25: [Figure: the same four panels, annotated “Linear response to change in price.”, “Much more sensitive to change in price.”, “More gradual response to changes in price” and “Sensitive initially, then gradual”.]
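
Written out, with $Q$ for units sold, $p$ for price, and the same `+ 1` offsets used in the code on the next slide (they keep the logarithm defined when a price or a sales count is zero), the four candidate forms and the change in demand per dollar each one implies are as follows. The derivatives are standard calculus, not results from the talk's data, and $\beta_0, \beta_1$ are fitted separately for each form:

$$
\begin{aligned}
\text{Linear:}\quad & Q = \beta_0 + \beta_1 p, && \frac{dQ}{dp} = \beta_1 \\
\text{Log-log:}\quad & \log(Q+1) = \beta_0 + \beta_1 \log(p+1), && \frac{dQ}{dp} = \beta_1\,\frac{Q+1}{p+1} \\
\text{Linear-log:}\quad & Q = \beta_0 + \beta_1 \log(p+1), && \frac{dQ}{dp} = \frac{\beta_1}{p+1} \\
\text{Log-linear:}\quad & \log(Q+1) = \beta_0 + \beta_1 p, && \frac{dQ}{dp} = \beta_1\,(Q+1)
\end{aligned}
$$

In the log-log form the price elasticity $(dQ/dp)(p/Q)$ is approximately $\beta_1$ at every price (ignoring the offsets). If $\beta_1 > -1$, revenue $pQ \propto p^{1+\beta_1}$ keeps rising with price and has no finite maximum.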

Slide 26:

```r
library(testthat)

# Automagically explore SLR with common functional forms
candidate_models = list(linear    = 'units_sold ~ price',
                        loglog    = 'log(units_sold + 1) ~ log(price + 1)',
                        linearlog = 'units_sold ~ log(price + 1)',
                        loglinear = 'log(units_sold + 1) ~ price')

# generate_forecast(), optimizer() and plot_price() are helper functions
# that are not shown on the slide.
run = function(candidate_models, input_data) {
  forecasts = list()
  test_input = data.frame(price = 0:1000)

  # Forecast
  for (model in candidate_models) {
    test_environment = new.env()

    # Generate the forecast
    forecasts[[model]] = generate_forecast(model, input_data)

    # Save off current value of things for testing
    assign("model", forecasts[[model]], envir = test_environment)
    assign("errors", forecasts[[model]]$residuals, envir = test_environment)
    assign("covariate", input_data$price, envir = test_environment)
    assign("label", model, envir = test_environment)

    save(test_environment, file = 'env_to_test.Rda')

    # Run assumption tests
    test_file("./tests/test_slr.R")

    #### OPTIMIZE PRICE!!! ####
    opt_results = optimizer(forecasts[[model]], test_input)

    # Multiply the predicted demand by the price for expected revenue
    opt_results$expected_revenue = test_input$price * opt_results$predicted_units_sold

    # One PDF of the revenue curve per candidate model
    pdf(paste(model, ".pdf", sep = ""))
    plot_price(opt_results)
    dev.off()
  }

  return(forecasts)
}
```
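
The `run()` function above calls helpers it does not define. Below is one hypothetical way to write two of them: `generate_forecast()` as a thin wrapper around `lm()`, and `optimizer()` as a prediction over the price grid that undoes the `log(units_sold + 1)` transform when the formula's response uses it. These are illustrations under those assumptions, not the talk's actual code; the data are simulated. A grid search over the same 0 to 1000 price grid then reads off the revenue-maximizing price.

```r
# Hypothetical helper: fit one candidate formula (given as a string) by OLS
generate_forecast <- function(model, input_data) {
  lm(as.formula(model), data = input_data)
}

# Hypothetical helper: predict units sold at each price in test_input.
# If the response was modelled as log(units_sold + 1), back-transform with exp(x) - 1.
optimizer <- function(fit, test_input) {
  pred <- unname(predict(fit, newdata = test_input))
  response <- deparse(formula(fit)[[2]])
  if (startsWith(response, "log(")) pred <- exp(pred) - 1
  data.frame(price = test_input$price,
             predicted_units_sold = pmax(pred, 0))  # demand cannot be negative
}

# Usage on simulated (made-up) data with a linear demand curve
set.seed(7)
slr_data <- data.frame(price = runif(300, min = 50, max = 400))
slr_data$units_sold <- 150 - 0.25 * slr_data$price + rnorm(300, sd = 10)

test_input  <- data.frame(price = 0:1000)
fit         <- generate_forecast("units_sold ~ price", slr_data)
opt_results <- optimizer(fit, test_input)
opt_results$expected_revenue <- test_input$price * opt_results$predicted_units_sold

# Grid search: the price on the grid with the highest expected revenue
best_price <- opt_results$price[which.max(opt_results$expected_revenue)]
print(best_price)
```

For the linear form the grid answer should land within one dollar of $-\hat\beta_0 / (2\hat\beta_1)$; for the other forms the grid search is what makes the comparison possible without a closed-form optimum.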

Slide 27: rut roh…

```
> run(candidate_models, slr_data)
Check assumptions of SLR : [1] "units_sold ~ price"
1..
1. Failure(@test_slr.R#12): The residuals are normally distributed ---------------------------------
shapiro.test(linear$residuals)$p.value not more than 0.05. Difference: 0.05

Check assumptions of SLR : [1] "log(units_sold + 1) ~ log(price + 1)"
1.2
1. Failure(@test_slr.R#12): The residuals are normally distributed ---------------------------------
shapiro.test(linear$residuals)$p.value not more than 0.05. Difference: 0.05

2. Failure(@test_slr.R#24): The residuals are unrelated to the predictor ---------------------------
cor(test_environment$errors, test_environment$covariate) not equal to 0
Mean absolute difference: 0.05545615

Check assumptions of SLR : [1] "units_sold ~ log(price + 1)"
1.2
1. Failure(@test_slr.R#12): The residuals are normally distributed ---------------------------------
shapiro.test(linear$residuals)$p.value not more than 0.05. Difference: 0.05

2. Failure(@test_slr.R#24): The residuals are unrelated to the predictor ---------------------------
cor(test_environment$errors, test_environment$covariate) not equal to 0
Mean absolute difference: 0.04201906

Check assumptions of SLR : [1] "log(units_sold + 1) ~ price"
1..
1. Failure(@test_slr.R#12): The residuals are normally distributed ---------------------------------
shapiro.test(linear$residuals)$p.value not more than 0.05. Difference: 0.05
```

Slide 28: [Figure: expected revenue vs. price (0 to 1000 dollars), one panel per functional form: Linear, Log-log, Linear-log, Log-linear.]

Slide 29: [Figure: the same four panels, annotated Optimal Price = $322, Optimal Price > $1000, Optimal Price = $∞ and Optimal Price = $779.]

Slide 30: [Figure: histogram of price (dollars, roughly 100 to 400) against counts, annotated Mean = 185.]

Slide 31: In conclusion, these forecasts suck. We are just getting warmed up!

Slide 32: [Figure: units sold vs. price (0 to 500 dollars), one panel per ability level: Beginner-Intermediate, Intermediate-Advanced, Advanced-Expert.]

Slide 33: [Figure: units sold by date, June 2011 to February 2014.]

Slide 34: [Figure: the same time series, annotated “TIME?!”]

Slide 35: Try something a little smarter: $\text{Units Sold}_i = \alpha + \beta_1\,\text{price}_i + \beta_2\,\text{ability}_i + \beta_3\,\text{month}_i + \varepsilon_i$
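
In R the richer model is still one `lm()` call. A sketch, assuming `ability` (the three skier-ability levels from the earlier plots) and `month` (1 to 12) are treated as categorical variables, which means each one gets a set of dummy-variable coefficients rather than the single $\beta_2$ and $\beta_3$ written on the slide. The data, effect sizes and column names are simulated assumptions.

```r
set.seed(3)
n <- 600
levels_ability <- c("Beginner-Intermediate", "Intermediate-Advanced", "Advanced-Expert")
sales <- data.frame(
  price   = runif(n, min = 50, max = 500),
  ability = factor(sample(levels_ability, n, replace = TRUE)),
  month   = factor(sample(1:12, n, replace = TRUE), levels = 1:12)
)

# Made-up demand: a price effect, an ability shift and a winter (Nov-Feb) bump
winter <- sales$month %in% c("11", "12", "1", "2")
sales$units_sold <- 100 - 0.15 * sales$price +
  ifelse(sales$ability == "Advanced-Expert", 20, 0) +
  ifelse(winter, 30, 0) + rnorm(n, sd = 8)

# Units Sold_i = alpha + beta_1 price_i + (ability dummies) + (month dummies) + epsilon_i
fit_smarter <- lm(units_sold ~ price + ability + month, data = sales)

# One intercept, one price slope, 2 ability dummies and 11 month dummies
print(length(coef(fit_smarter)))
```

Treating `month` as a factor lets each month have its own level without assuming a smooth seasonal shape; a numeric `month` would force a single linear trend across the year, which does not fit a winter sport.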

Slide 36: [Figure: expected revenue vs. price (0 to 1000 dollars), in a grid of panels by ability level (columns: Beginner-Intermediate, Intermediate-Advanced, Advanced-Expert) and month (rows: 1 to 12).]

Slide 37: Yeah, but who cares? • Do we need to throw everything out just because some assumptions are invalidated? • What is our goal? • Is it still better than what we did previously?
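
One concrete way to answer the last question is to score the old and new models on data neither was fit on. A sketch, assuming a random 80/20 split and root-mean-squared error as the yardstick (for sales data a time-ordered split would be more honest); the data and names are simulated and hypothetical.

```r
set.seed(11)
n <- 500
sales <- data.frame(price = runif(n, min = 50, max = 500),
                    month = factor(sample(1:12, n, replace = TRUE), levels = 1:12))
winter <- sales$month %in% c("11", "12", "1", "2")
sales$units_sold <- 100 - 0.15 * sales$price + ifelse(winter, 30, 0) + rnorm(n, sd = 8)

# Hold out 20% of rows that neither model sees during fitting
train_rows <- sample(n, size = 0.8 * n)
train <- sales[train_rows, ]
test  <- sales[-train_rows, ]

# Root-mean-squared error of a fitted model on new data
rmse <- function(fit, data) sqrt(mean((data$units_sold - predict(fit, newdata = data))^2))

baseline <- lm(units_sold ~ price, data = train)          # the "easiest thing"
smarter  <- lm(units_sold ~ price + month, data = train)  # adds seasonality

scores <- c(baseline = rmse(baseline, test), smarter = rmse(smarter, test))
print(round(scores, 2))
```

The baseline model earns its keep here: even if its assumptions fail, its held-out error is the number the richer model has to beat.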

Slide 38: Wrap it up. 1. Do the easiest thing first, and do it well. It’s how you’re going to learn the domain, and it’s your benchmark for improvement. 2. Test your assumptions, and invest time in building the tools needed to do that effectively. 3. Be cool, stay in school.

Slide 39: Thanks bros!! Nathan Decker, Brian Pratt & the Evo crew; Jason Gowans & Bryan Mayer; Elissa “Downtown” Brown, forecasting genius; John Foreman, MailChimp; #nordstromdatalab

Slide 40: Click-bait!
1. Data Carpentry: http://mimno.infosci.cornell.edu/b/articles/carpentry/
2. Getting started with testthat: http://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf
3. Clean Code: http://www.amazon.com/Clean-Code-Handbook-Software-Craftsmanship/dp/0132350882/
4. Quality Code: http://www.amazon.com/Quality-Code-Software-Principles-Practices/dp/0321832981
5. Revenue Management: http://www.amazon.com/Practice-Management-International-Operations-Research/dp/0387243763/
6. Pricing and Revenue Optimization: http://www.amazon.com/Pricing-Revenue-Optimization-Robert-Phillips-ebook/dp/B005JTDOVE/
7. Original G, Rob Hyndman: https://www.otexts.org/fpp and http://robjhyndman.com/hyndsight/
