
Assumptions: Check yo'self before you wreck yourself

Predicting the future is hard and it requires a lot of assumptions, also known as beliefs, also known as faith. In “Assumptions: Check yo self, before you wreck yo self” we explore the consequences of beliefs when constructing predictive models. We’ll walk through the process of developing a demand forecast for Evo, a Seattle-based outdoor recreation retailer, and discuss how assumptions influence the behavior of your application and ultimately the decisions you make.

Assumptions: 
Check yo self, before 
you wreck yo self. 
Erin Shellman @erinshellman 
Seattle Software Craftsmanship 
August 28, 2014 


Assumptions: Check yo'self before you wreck yourself

  • 1. Assumptions: Check yo self, before you wreck yo self. Erin Shellman @erinshellman Seattle Software Craftsmanship August 28, 2014
  • 2. Assumptions: Making an ass out of you and me. Erin Shellman @erinshellman Seattle Software Craftsmanship August 28, 2014
  • 3. I’m Erin, and I’m a data scientist.
  • 4. How much should this cost?
  • 6. …and when? What about these?
  • 8. Price optimization 1. Git yer Big Data!
  • 9. Price optimization 1. Git yer Big Data! 2. Forecast demand
  • 10. Price optimization 1. Git yer Big Data! 2. Forecast demand 3. Optimize price
  • 11. Price optimization 1. Big Data! 2. demand 3. price 4. Profit!!!!!
  • 12. Price optimization 1. Git yer Big Data! 2. Forecast demand: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ 3. Optimize price: $\max \sum \text{revenue}$
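To make step 3 concrete, here is a worked sketch under the simplest possible assumption: that $x$ is price, $y$ is units sold, and the forecast is the linear model on the slide with the error term dropped. The slide does not spell out this mapping, so treat it as an assumption. Expected revenue is then price times forecast demand, and the revenue-maximizing price follows from setting the derivative to zero:

```latex
\begin{aligned}
R(x) &= x\,(\beta_0 + \beta_1 x) \\
\frac{dR}{dx} &= \beta_0 + 2\beta_1 x = 0 \\
x^{*} &= -\frac{\beta_0}{2\beta_1}, \qquad \text{a maximum when } \beta_1 < 0
\end{aligned}
```

The condition $\beta_1 < 0$ simply says demand falls as price rises; without it the revenue curve has no peak.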
  • 13. The key is a good forecast.
  • 15. Do the easiest thing • Subset the data and focus on one category of product. • e.g. Alpine ski bindings. • Prototype & validate in R. $\text{Units Sold}_i = \alpha + \beta_1(\text{price}_i) + \varepsilon_i$
  • 16. Do the easiest thing • Subset the data and focus on one category of product. • e.g. Alpine ski bindings. • Prototype & validate in R. $\text{Units Sold}_i = \alpha + \beta_1(\text{price}_i) + \varepsilon_i$, where $\varepsilon_i$ is the residual.
  • 17. Assumptions of SLR (simple linear regression) • We assume that the residuals: 1. Are normal, with mean zero. 2. Are not autocorrelated. 3. Are unrelated to the predictors.
  • 18. Checking assumptions is hard • …and boring! • For statistical methods, assumption testing traditionally relies on visually inspecting plots (and let’s be real, most people don’t even do that).
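As a rough illustration of what each of the three assumptions looks like as a number rather than a plot, here is a minimal sketch in base R on synthetic data. The data, the variable names (`slr_data`, `fit`, `skewed_units`), and the use of `Box.test` (a Ljung-Box test, swapped in for a Breusch-Godfrey test so the sketch needs no extra packages) are all assumptions for illustration, not the talk's setup.

```r
# Sketch: check the three SLR residual assumptions on synthetic data.
# All names and numbers here are illustrative assumptions.
set.seed(42)

n <- 200
price <- runif(n, min = 50, max = 500)
# Linear demand with normal noise, so the assumptions hold by construction.
units_sold <- 120 - 0.2 * price + rnorm(n, mean = 0, sd = 5)
slr_data <- data.frame(price = price, units_sold = units_sold)

fit <- lm(units_sold ~ price, data = slr_data)
res <- residuals(fit)

# 1. Normality: Shapiro-Wilk test (null hypothesis: residuals are normal).
normality_p <- shapiro.test(res)$p.value

# 2. Autocorrelation: Ljung-Box test (null hypothesis: no autocorrelation).
autocorr_p <- Box.test(res, lag = 1, type = "Ljung-Box")$p.value

# 3. Relationship to the predictor: Pearson correlation.
res_price_cor <- cor(res, slr_data$price)

# Counterexample: skewed (exponential) noise should fail the normality check.
skewed_units <- 120 - 0.2 * price + rexp(n, rate = 1 / 5)
skewed_p <- shapiro.test(residuals(lm(skewed_units ~ price)))$p.value

print(c(normality_p = normality_p, autocorr_p = autocorr_p,
        res_price_cor = res_price_cor, skewed_p = skewed_p))
```

One caveat worth knowing: for a least-squares fit that includes an intercept, the sample correlation between the residuals and the fitted predictor is zero by construction (up to floating-point error). The third check is therefore most informative when the covariate being compared is not exactly the term the model was fitted on.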
  • 19. [Figure: R regression diagnostic plots in four panels: Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage (with Cook's distance contours).]
  • 20. “Of all the practices you can leverage to assist your craftsmanship, you will get the most benefit from testing.” Stephen Vance
  • 21. test_that assumption!

      context("Check assumptions of SLR")

      test_that("The residuals are normally distributed", {
        expect_that(shapiro.test(model_object$residuals)$p.value, is_more_than(0.05))
      })

      test_that("There is no autocorrelation", {
        expect_that(lmtest::bgtest(model_object)$p.value, is_more_than(0.05))
      })

      test_that("The residuals are unrelated to the predictor", {
        expect_that(cor(model_object$residuals, data$covariates), equals(0))
      })
  • 22. Tests pass!

      > test_file("./tests/test_slr.R")
      Check assumptions of SLR : [1] "units_sold ~ price"
      ...

  • 23. Psych.

      > test_file("./tests/test_slr.R")
      Check assumptions of SLR : [1] "units_sold ~ price"
      1..

      1. Failure(@test_slr.R#12): The residuals are normally distributed ------------------------
      shapiro.test(model_object$residuals)$p.value not more than 0.05. Difference: 0.05
  • 24. Linear? Eh. • We assumed the functional form was linear, but there are several common forms that might better fit the data. [Figure: scatter plot of Units Sold vs. Price ($).]
  • 25. [Figure: four panels of Units Sold vs. Price ($): Linear, Log-log, Linear-log, Log-linear.]
  • 26. [Figure: four panels of Units Sold vs. Price ($), annotated in order: (1) Linear response to change in price. (2) Much more sensitive to change in price. (3) More gradual response to changes in price. (4) Sensitive initially, then gradual.]
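Written out as equations, the four candidate forms might look like the following, taking $y$ as units sold and $x$ as price. This is a sketch: the talk's code further below adds 1 inside each log to avoid $\log(0)$, which is omitted here for readability.

```latex
\begin{aligned}
\text{Linear:} \quad & y_i = \alpha + \beta x_i + \varepsilon_i \\
\text{Log-log:} \quad & \log y_i = \alpha + \beta \log x_i + \varepsilon_i \\
\text{Linear-log:} \quad & y_i = \alpha + \beta \log x_i + \varepsilon_i \\
\text{Log-linear:} \quad & \log y_i = \alpha + \beta x_i + \varepsilon_i
\end{aligned}
```

In the log-log form, $\beta$ can be read as the price elasticity of demand: the approximate percentage change in units sold for a one percent change in price.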
  • 30. # Automagically explore SLR with common functional forms

      candidate_models = list(linear    = 'units_sold ~ price',
                              loglog    = 'log(units_sold + 1) ~ log(price + 1)',
                              linearlog = 'units_sold ~ log(price + 1)',
                              loglinear = 'log(units_sold + 1) ~ price')

      run = function(candidate_models, input_data) {
        forecasts = list()
        test_input = data.frame(price = 0:1000)

        # Forecast
        for (model in candidate_models) {
          test_environment = new.env()

          # Generate the forecast
          forecasts[[model]] = generate_forecast(model, input_data)

          # Save off current value of things for testing
          assign("model", forecasts[[model]], envir = test_environment)
          assign("errors", forecasts[[model]]$residuals, envir = test_environment)
          assign("covariate", input_data$price, envir = test_environment)
          assign("label", model, envir = test_environment)

          save(test_environment, file = 'env_to_test.Rda')

          # Run assumption tests
          test_file("./tests/test_slr.R")

          #### OPTIMIZE PRICE!!! ####
          opt_results = optimizer(forecasts[[model]], test_input)

          # Multiply the predicted demand by the price for expected revenue
          opt_results$expected_revenue = test_input$price * opt_results$predicted_units_sold

          # Plot to a PDF named after the model, then close the device
          pdf(paste(model, ".pdf", sep = ''))
          plot_price(opt_results)
          dev.off()
        }

        return(forecasts)
      }
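The slide calls `generate_forecast()` and `optimizer()` without showing them. The sketch below offers hypothetical stand-ins so the flow can be tried end to end; the function bodies, the back-transform of the logged response, and the synthetic `slr_data` are all assumptions, not the author's implementations.

```r
# Hypothetical stand-ins for the helpers called by run(); the originals are
# not shown in the slides, so everything below is an assumption.

# Fit one candidate model given its formula as a string.
generate_forecast <- function(model, input_data) {
  lm(as.formula(model), data = input_data)
}

# Predict demand over a grid of prices, undoing the log(units_sold + 1)
# transform when the response was logged.
optimizer <- function(fit, test_input) {
  predicted <- predict(fit, newdata = test_input)
  response <- deparse(formula(fit)[[2]])[1]
  if (startsWith(response, "log(")) {
    predicted <- exp(predicted) - 1
  }
  data.frame(price = test_input$price, predicted_units_sold = predicted)
}

# Synthetic data for illustration only.
set.seed(1)
slr_data <- data.frame(price = runif(300, 20, 400))
slr_data$units_sold <- round(400 - 0.8 * slr_data$price + rnorm(300, sd = 20))

fit <- generate_forecast("units_sold ~ price", slr_data)
opt_results <- optimizer(fit, data.frame(price = 0:1000))
opt_results$expected_revenue <- opt_results$price * opt_results$predicted_units_sold

# Grid search: the price with the highest expected revenue.
best <- opt_results[which.max(opt_results$expected_revenue), ]
print(best)
```

A grid search over `0:1000` mirrors the `test_input` used on the slide. It is crude but makes boundary solutions easy to spot, since an optimum sitting at the edge of the grid suggests the revenue curve never peaks.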
  • 31. rut roh…

      > run(candidate_models, slr_data)
      Check assumptions of SLR : [1] "units_sold ~ price"
      1..

      1. Failure(@test_slr.R#12): The residuals are normally distributed ---------------------------------
      shapiro.test(linear$residuals)$p.value not more than 0.05. Difference: 0.05

      Check assumptions of SLR : [1] "log(units_sold + 1) ~ log(price + 1)"
      1.2

      1. Failure(@test_slr.R#12): The residuals are normally distributed ---------------------------------
      shapiro.test(linear$residuals)$p.value not more than 0.05. Difference: 0.05

      2. Failure(@test_slr.R#24): The residuals are unrelated to the predictor ---------------------------
      cor(test_environment$errors, test_environment$covariate) not equal to 0
      Mean absolute difference: 0.05545615

      Check assumptions of SLR : [1] "units_sold ~ log(price + 1)"
      1.2

      1. Failure(@test_slr.R#12): The residuals are normally distributed ---------------------------------
      shapiro.test(linear$residuals)$p.value not more than 0.05. Difference: 0.05

      2. Failure(@test_slr.R#24): The residuals are unrelated to the predictor ---------------------------
      cor(test_environment$errors, test_environment$covariate) not equal to 0
      Mean absolute difference: 0.04201906

      Check assumptions of SLR : [1] "log(units_sold + 1) ~ price"
      1..

      1. Failure(@test_slr.R#12): The residuals are normally distributed ---------------------------------
      shapiro.test(linear$residuals)$p.value not more than 0.05. Difference: 0.05
  • 32. [Figure: Expected Revenue vs. Price ($) from $0 to $1000, in four panels: Linear, Log-log, Linear-log, Log-linear.]
  • 33. [Figure: Expected Revenue vs. Price ($) from $0 to $1000, in four panels: Linear, Log-log, Linear-log, Log-linear. Panel annotations: Optimal Price = $322; Optimal Price > $1000; Optimal Price = $∞; Optimal Price = $779.]
  • 35. [Figure: histogram of counts by Price ($), with Mean = 185.]
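One way to see why some of these curves have no sensible peak is to work out the revenue-maximizing price analytically for the three logged forms. The following is a sketch, not taken from the slides: it ignores the error term and the $+1$ offsets used in the code, and writes $R(x) = x \cdot y(x)$ for expected revenue at price $x$.

```latex
\begin{aligned}
\text{Log-log:} \quad & R(x) = e^{\alpha} x^{1+\beta}, && \text{increasing in } x \text{ whenever } \beta > -1, \text{ so no finite maximum} \\
\text{Log-linear:} \quad & R(x) = x\, e^{\alpha + \beta x}, && R'(x) = e^{\alpha + \beta x}(1 + \beta x) = 0 \;\Rightarrow\; x^{*} = -\tfrac{1}{\beta} \quad (\beta < 0) \\
\text{Linear-log:} \quad & R(x) = x\,(\alpha + \beta \log x), && R'(x) = \alpha + \beta + \beta \log x = 0 \;\Rightarrow\; x^{*} = e^{-(\alpha + \beta)/\beta} \quad (\beta < 0)
\end{aligned}
```

Under these assumptions, a fitted log-log model with inelastic demand ($\beta > -1$) will always recommend raising the price further, which is one reason an optimizer can return an unbounded answer.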
  • 36. In conclusion, these forecasts suck. We are just getting warmed up!
  • 37. [Figure: Units Sold vs. Price ($), one panel per ability level: Beginner-Intermediate, Intermediate-Advanced, Advanced-Expert.]
  • 38. [Figure: Units Sold by Date, 2011-06-01 through 2014-02-01.]
  • 39. [Figure: Units Sold by Date, 2011-06-01 through 2014-02-01.] TIME?!
  • 40. Try something a little smarter $\text{Units Sold}_i = \alpha + \beta_1(\text{price}_i) + \beta_2(\text{ability}_i) + \beta_3(\text{month}_i) + \varepsilon_i$
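In R, a model of this shape might be fitted as below. This is a sketch on synthetic data: the column names, the effect sizes, and the decision to treat ability and month as categorical factors are assumptions for illustration, with the ability labels borrowed from the earlier figure.

```r
# Sketch of the richer model on synthetic data. Column names, level labels,
# and effect sizes are illustrative assumptions, not the talk's data.
set.seed(7)

n <- 600
ability_levels <- c("Beginner-Intermediate", "Intermediate-Advanced", "Advanced-Expert")
sales <- data.frame(
  price   = runif(n, 50, 500),
  ability = factor(sample(ability_levels, n, replace = TRUE), levels = ability_levels),
  month   = factor(sample(1:12, n, replace = TRUE), levels = 1:12)
)

# Assumed seasonal bump in winter months (November through February).
winter <- sales$month %in% c("11", "12", "1", "2")
sales$units_sold <- 200 - 0.3 * sales$price +
  20 * (as.integer(sales$ability) - 1) +
  60 * winter +
  rnorm(n, sd = 10)

# ability and month enter as factors, so lm() estimates one coefficient
# per non-reference level rather than a single slope for each.
fit <- lm(units_sold ~ price + ability + month, data = sales)
print(round(coef(fit), 2))
```

Treating month as a factor lets each month have its own level shift, which is one simple way to capture seasonality without assuming sales move linearly with the month number.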
  • 41. [Figure: Expected Revenue vs. Price ($) from $0 to $1000, one panel per ability level (Beginner-Intermediate, Intermediate-Advanced, Advanced-Expert) and month (1 through 12).]
  • 42. Yeah, but who cares? • Do we need to throw everything out just because some assumptions are invalidated? • What is our goal? • Is it still better than what we did previously?
  • 43. Wrap it up. 1. Do the easiest thing first, and do it well. It’s how you’re going to learn the domain, and it’s your benchmark for improvement. 2. Test your assumptions, and invest time in building the tools needed to do that effectively. 3. Be cool, stay in school.
  • 44. Thanks bros!! Nathan Decker, Brian Pratt & the Evo crew; Jason Gowans & Bryan Mayer; Elissa “Downtown” Brown, forecasting genius; John Foreman, MailChimp; #nordstromdatalab
  • 45. Click-bait! 1. Data Carpentry: http://mimno.infosci.cornell.edu/b/articles/carpentry/ 2. Getting started with testthat: http://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf 3. Clean Code: http://www.amazon.com/Clean-Code-Handbook-Software-Craftsmanship/dp/0132350882/ 4. Quality Code: http://www.amazon.com/Quality-Code-Software-Principles-Practices/dp/0321832981 5. Revenue Management: http://www.amazon.com/Practice-Management-International-Operations-Research/dp/0387243763/ 6. Pricing and Revenue Optimization: http://www.amazon.com/Pricing-Revenue-Optimization-Robert-Phillips-ebook/dp/B005JTDOVE/ 7. Original G, Rob Hyndman: https://www.otexts.org/fpp and http://robjhyndman.com/hyndsight/