7. Understand the Decision Process
Business Scenario | Key Decision | Data Science Question
Predictive Maintenance | Should I service this piece of equipment? | What is the probability this equipment will fail within the next X days?
Energy Forecasting | Should I buy or sell energy contracts? | What will be the long/short term demand for energy in a region?
Customer Churn | Which customers should I prioritize to reduce churn? | What is the probability of churn within X days for each customer?
Personalized Marketing | What product should I offer first? | What is the probability that the customer will purchase each product?
Product Feedback | Which service/product needs attention? | What is the social media sentiment for each service/product?
13. Example – Establish Performance Metrics
1. Establish a Qualitative Objective (e.g., increase reliability at fixed maintenance cost)
2. Translate into a Quantifiable Metric (e.g., reduce failure rate by 50%)
3. Quantify the value of an improvement in the metric (e.g., 10% fewer failures → savings of $1MM/year)
4. Establish a baseline (e.g., current failure rate = 10% per year)
5. Establish how to measure the improvement in the metric with the data science solution (e.g., 80% of the equipment maintained based on the predictive model)
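The example figures on this slide can be tied together with a little arithmetic. The sketch below is illustrative only: it assumes that savings scale linearly with the reduction in failures, which the slide does not state, and the function names are hypothetical.

```python
# Figures taken from the slide's examples.
baseline_failure_rate = 0.10          # current failure rate: 10% per year
target_reduction = 0.50               # goal: reduce the failure rate by 50%
savings_per_10pct_fewer = 1_000_000   # 10% fewer failures -> $1MM/year


def target_failure_rate(baseline, reduction):
    """Failure rate after applying a relative reduction to the baseline."""
    return baseline * (1.0 - reduction)


def estimated_savings(reduction, savings_per_step, step=0.10):
    """ASSUMPTION (not from the slide): savings grow linearly with the reduction."""
    return savings_per_step * (reduction / step)


target_rate = target_failure_rate(baseline_failure_rate, target_reduction)
savings = estimated_savings(target_reduction, savings_per_10pct_fewer)

print(f"Target failure rate: {target_rate:.1%} per year")
print(f"Estimated savings:   ${savings:,.0f} per year")
```

Under the linear assumption, the 50% reduction corresponds to about five times the savings quoted for a 10% reduction. In practice the value curve would need to be established with the business owner.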
15. Experiment – Iterate, Learn and Fail Fast
1. Laser focus on a key metric
2. Measure the rate of exploration (count of experiments)
3. Empower everyone to explore (100X people)
16. Experiment – Iterate, Learn and Fail Fast
1. Learn from experiments
• Both successes and failures
2. Share the learnings
3. Promote successful experiments to production
4. Move on to the next hypothesis to experiment
5. Failure is a valid outcome of an experiment
6. Learn and refine the next experiment
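One way to make the loop above concrete is a small experiment log that counts experiments, keeps the learnings from successes and failures alike, and picks a candidate to promote. This is a minimal sketch: the class names, fields, and metric values are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Experiment:
    hypothesis: str
    metric: Optional[float] = None  # key metric value; None if the run failed
    learning: str = ""              # what was learned, recorded even on failure


@dataclass
class ExperimentLog:
    baseline: float                 # key metric value of the current solution
    experiments: List[Experiment] = field(default_factory=list)

    def record(self, experiment: Experiment) -> None:
        self.experiments.append(experiment)

    def count(self) -> int:
        """Rate of exploration, measured as the number of experiments."""
        return len(self.experiments)

    def learnings(self) -> List[str]:
        """Learnings to share, from successful and failed experiments alike."""
        return [e.learning for e in self.experiments if e.learning]

    def candidate_for_production(self) -> Optional[Experiment]:
        """Best experiment that beats the baseline, or None if none does."""
        better = [e for e in self.experiments
                  if e.metric is not None and e.metric > self.baseline]
        return max(better, key=lambda e: e.metric, default=None)


# Hypothetical usage with made-up metric values.
log = ExperimentLog(baseline=0.80)
log.record(Experiment("add recency feature", 0.84, "recency helps"))
log.record(Experiment("add raw text feature", None, "pipeline ran out of memory"))
log.record(Experiment("tune tree depth", 0.82, "small gain only"))

best = log.candidate_for_production()
print(log.count(), best.hypothesis if best else "nothing to promote")
```

The failed run still contributes a learning, which matches the point that failure is a valid outcome of an experiment.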
19. Example - E2E Solution
1. Set up the end-to-end solution and the metrics
2. Launch with a baseline/simple model
3. Act on the recommendations of the solution
4. Measure and iterate
21. Example – AI Toolbox of Tricks
1. Data Exploration
2. RFM – User Behavior Modeling
3. Hyperparameter tuning
4. Auto Featurization
Note: Domain expertise is still helpful
22. KDD Cup 2015
Data: student logs in online university courses (8M rows, 7 columns)
Goal: predict student churn
23. 2 Challenges
Challenge 1 – How do we build an initial experiment that will be within 1% of the accuracy of the winning solution?
Challenge 2 – How do we build an initial experiment that is within the top 30 on the leaderboard?
25. RFM
Has the student attended class recently?
How many hours has the student spent on the course?
How many problems has the student solved?
How many videos has the student watched?
Enrollment ID’s last timestamp.
Count(events) grouped by enrollment ID.
Number of unique hours in which the enrollment ID has had an event.
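The three feature definitions above (last timestamp, event count, and number of unique active hours per enrollment ID) can be sketched in a few lines. This is a minimal illustration, not the solution used in the competition: the row layout, sample values, and output key names are assumptions, and the real KDD Cup 2015 logs have 7 columns.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical log rows: (enrollment_id, ISO timestamp, event type).
events = [
    (1, "2014-06-01T09:15:00", "video"),
    (1, "2014-06-01T09:40:00", "problem"),
    (1, "2014-06-03T20:05:00", "video"),
    (2, "2014-06-02T11:00:00", "access"),
]


def rfm_features(rows):
    """Compute the three per-enrollment features listed on the slide."""
    last_seen = {}                    # enrollment ID's last timestamp
    event_count = defaultdict(int)    # count(events) grouped by enrollment ID
    active_hours = defaultdict(set)   # distinct clock hours with an event

    for enrollment_id, timestamp, _event in rows:
        t = datetime.fromisoformat(timestamp)
        if enrollment_id not in last_seen or t > last_seen[enrollment_id]:
            last_seen[enrollment_id] = t
        event_count[enrollment_id] += 1
        # Truncate to the hour so two events in the same hour count once.
        active_hours[enrollment_id].add(t.replace(minute=0, second=0, microsecond=0))

    return {
        eid: {
            "last_timestamp": last_seen[eid],
            "event_count": event_count[eid],
            "unique_hours": len(active_hours[eid]),
        }
        for eid in last_seen
    }


features = rfm_features(events)
for eid, row in features.items():
    print(eid, row)
```

The same aggregation could be restricted to one event type (for example, only video or problem events) to answer the per-activity questions on the slide.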
26. 800+ entries
Winner – AUC 90.9
• Merger of 9 teams
• ~250 iterations
RFM solution – AUC 90.1
• Quick configuration
• 26th place
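The scores above are AUC values (area under the ROC curve), apparently reported on a 0 to 100 scale. As a reminder of what the metric measures, here is a minimal pairwise sketch: AUC is the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The labels and scores are made up for illustration.

```python
def auc(labels, scores):
    """Pairwise AUC: P(score of a random positive > score of a random negative)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical churn labels (1 = churned) and model scores.
labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.3, 0.4, 0.8, 0.5]
print(f"AUC = {auc(labels, scores):.3f}")
```

The pairwise loop is quadratic in the number of examples, so it is suitable only for small illustrations. A rank-based implementation would be preferable on data of the size described earlier.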
28. Source: What’s your ML test score? A rubric for ML production systems, Sculley et al., NeurIPS 2016
33. [Diagram: model lifecycle linking the Data Scientist and the App Developer]
Roles and tools: Data Scientist (IDE), App Developer (IDE), Cloud Services, Model Store, DevOps Pipeline, Apps, Edge Devices
Steps shown: Customize Model, Publish Model, Consume Model, Validate & Flight Model + App, Deploy Model, Deploy Application, Update Application, Predict, Collect Feedback, Model Telemetry, Retrain Model
Sample prediction output: [ { "cat": 0.99218, "feline": 0.81242 } ]
35. Keep a Human in the Loop
1. How to interpret the model?
2. Importance of Features
3. Bias in the model
4. Interpreting predictions per instance
5. What-if analysis
Empower ALL to perform like the BEST
Automate repetitive human tasks
Embed expert knowledge into the solution
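Per-instance interpretation and what-if analysis from the list above can be illustrated with a small linear scoring model. This is a sketch under stated assumptions: the feature names, weights, and bias are hypothetical values chosen for illustration, not taken from the slides or fitted to any data.

```python
import math

# Hypothetical logistic churn model; weights are illustrative, not fitted.
WEIGHTS = {
    "days_since_last_event": 0.15,
    "event_count": -0.02,
    "unique_hours": -0.10,
}
BIAS = -0.5


def churn_probability(x):
    """Logistic score for one instance given as a feature dictionary."""
    z = BIAS + sum(WEIGHTS[name] * x[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def contributions(x):
    """Per-instance contribution of each feature to the log-odds."""
    return {name: WEIGHTS[name] * x[name] for name in WEIGHTS}


def what_if(x, **changes):
    """Change in predicted probability if some feature values were different."""
    modified = {**x, **changes}
    return churn_probability(modified) - churn_probability(x)


student = {"days_since_last_event": 10, "event_count": 40, "unique_hours": 5}

print(f"Predicted churn probability: {churn_probability(student):.3f}")
for name, value in contributions(student).items():
    print(f"  {name}: {value:+.2f} log-odds")
delta = what_if(student, days_since_last_event=2)
print(f"What if the student had been active 2 days ago? {delta:+.3f}")
```

For a linear model the contributions are exact. For non-linear models a dedicated attribution method would be needed, and a human reviewer should still check whether the explanation makes sense in the domain.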