Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Predicting Azure Churn with
Deep Learning and Explaining
Predictions with LIME
Feng Zhu (Speaker)
Chao Zhong
Shijing Fang
...
 Azure customers and churn
 Customer data and features
 Current ML pipelines and challenges
 Apply deep learning to im...
Azure Customers and Churn
Customer attrition is expensive!
Industry growth leaders (>35% YoY) spend ~40% of
revenue on acquiring new customers via s...
A Predictive Model for Churn is Needed
To identify customers at high risk of churning
• The model outputs a ranked list of...
Customer Features and ML Pipelines
Customer and Churn Definitions
Subscription
Data
Subscription
Level
Customer
Level
Customer ID 1
Subscription ID 1
Billing...
Mar Apr May
Customer Data and Features
Snapshot
Data
• Offer Type
• Tenure Age
• Segmentation
Time Series
Data
• Overall U...
Feature Extraction from Time Series Data
Time Series
(20+)
• FreeUsage
• Usage of Cloud Services
• …
Extraction
Functions
...
Churn V1 ML Pipeline
Customer
Data
• Time series
usage
• Snapshot billing
status
Feature
Engineering
• Extract features
fr...
Churn V2 ML Pipeline
Customer Usage
Data
• Time series usage
• Snapshot billing
status
Deep Learning
Model
• Deep neural
n...
Deep Learning Models
What is Deep Learning (DL)?
 Deep learning is a class of neural
network based ML algorithms
- Transform raw inputs throug...
Overview of DL Model Architecture
DNN Layer 1
DNN Layer 2
DNN Layer 3
DNN Layer 4
DNN Layer 5
RNN Layer 1
RNN Layer 2
Outp...
DNN Layers (Left Path)
Output Layer
 Five identical DNN layers are stacked to
transform static features.
- Input dimensio...
RNN Layers (Right Path)
h1 h2 … h5
…
…
…
LSTM Layer 1
Time Series Input
LSTM Layer 2
x1 x2 … x56
 Two identical RNN (LSTM...
Model Performance Improved
Model AUC Precision-
Recall
AUC
Recall
(@Precision
=0.562)
Random Forest
(V1 model)
0.836 0.325...
Business Impact
 The payout curve shows significant potential
margin from DL model.
- Select top X percent of customer ra...
Explaining Model Predictions Using LIME
Customer Usage
Data
• Time series usage
• Snapshot billing
status
Deep Learning
Model
• Deep neural
networks
• Recurrent n...
LIME Algorithm
 LIME is an interface between scores and
end users.
- Local Interpretable Model-agnostic
Explanations
- Us...
An Example Using LIME
 LIME takes a scoring function and a test
instance as input.
- Assume a model is trained and its
sc...
Explaining Churn Score of A Customer
 LIME can provide the evidences of
churning and reduce “information
overwhelm”
- Pic...
Summary
Lessons Learned and Future Plans
Lessons learned
- “In Deep Learning, Architecture Engineering is the New Feature
Enginee...
Thanks! Questions?
Upcoming SlideShare
Loading in …5
×

Predicting Azure Churn with Deep Learning and Explaining Predictions with LIME

4,582 views

Published on

Although deep learning has proved to be very powerful, few results are reported on its application to business-focused problems. Feng Zhu and Val Fontama explore how Microsoft built a deep learning-based churn predictive model and demonstrate how to explain the predictions using LIME—a novel algorithm published in KDD 2016—to make the black box models more transparent and accessible.

Published in: Technology
  • Today, I want to share with you my own "unfair advantage" ... An honest crack at an insider's edge that's so effective it's nothing less than performance enhancing for your own bottom line profits! ●●● http://t.cn/A6hPRSfx
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • USA Today Has Proof That Lotto Is NOT Random ●●● https://tinyurl.com/t2onem4
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here

Predicting Azure Churn with Deep Learning and Explaining Predictions with LIME

  1. 1. Predicting Azure Churn with Deep Learning and Explaining Predictions with LIME Feng Zhu (Speaker) Chao Zhong Shijing Fang Ryan Bouchard Val Fontama (Speaker)
  2. 2.  Azure customers and churn  Customer data and features  Current ML pipelines and challenges  Apply deep learning to improve model performance  Explain model predictions using LIME
  3. 3. Azure Customers and Churn
  4. 4. Customer attrition is expensive! Industry growth leaders (>35% YoY) spend ~40% of revenue on acquiring new customers via sales and marketing. For renewal customers, they incentivize their sales team with 7%+ commission. The fastest growing companies generate up to 40% of their revenue from upsells (avg. is 16%) with goals of <3% monthly attrition. Not all customers are acquired equallyAcquiring customers is expensive Sales & marketing spend Customer acquisition cost Source: 2015 Pacific Crest SaaS Survey (parts 1+2): http://www.forentrepreneurs.com/2015-saas-survey-part-1/
  5. 5. A Predictive Model for Churn is Needed To identify customers at high risk of churning • The model outputs a ranked list of customers with the likelihood of churning • Help prioritize a list of customers for churn intervention • Help optimize resource allocation and ROI Business impact • Increase customer retention and Azure market share • Increase profitability of Azure business • Increase customer satisfaction and loyalty
  6. 6. Customer Features and ML Pipelines
  7. 7. Customer and Churn Definitions Subscription Data Subscription Level Customer Level Customer ID 1 Subscription ID 1 Billing status data Time series usage data… Subscription ID n  Each customer can have multiple subscriptions across multiple cloud services  Most usage and billing data are generated at subscription level and then aggregated to customer level.  Two types of churns - System churn: No active usage in 28 days - Usage churn: Cancel all subscriptions Mar Apr May Jun
  8. 8. Mar Apr May Customer Data and Features Snapshot Data • Offer Type • Tenure Age • Segmentation Time Series Data • Overall Usage • Usage of Different Services • Free Usage Churn Labels • Usage Churn Label • System Churn Label  Customer profiles consists of two types of data. - Snapshot data (billing info) - Time series data (detailed usage info)  Use 8 weeks of snapshot and time series data to build customer feature sets.  Label customers as churn or not churn in next 4 weeks
  9. 9. Feature Extraction from Time Series Data Time Series (20+) • FreeUsage • Usage of Cloud Services • … Extraction Functions (20+) • Min Max Mean SD Median • Number of Zero Values • Slope • Sum of Last Interval Values • … Extracted Features (400+) • FreeUsage.min • TotalUnits.max • … … Customer ID 1
  10. 10. Churn V1 ML Pipeline Customer Data • Time series usage • Snapshot billing status Feature Engineering • Extract features from time series • Combine all features together Random Forest Model • Classification model • 400+ features Churn Risk Scores • Weekly scores for all active customers End Users • Take actions based on prediction scores 1.“Feature explosion?” – As we increase # of time series, features extracted from time series will grow exponentially. 1 2 3 3. “Should I trust model/scores?” - If end users do not trust a model or a prediction, they will not use it 2. “Squeeze more juice?” – Can we further improve the model performance?
  11. 11. Churn V2 ML Pipeline Customer Usage Data • Time series usage • Snapshot billing status Deep Learning Model • Deep neural networks • Recurrent neural networks Risk Scores • Weekly scores for all active customers Model Explaining • LIME algorithms • Explain the scores at customer level End Users • Trust the model more • Easy to start the conversion • Take actions based on churn scores 1.Deep learning model – 1) “automated” feature engineering 2) boosts prediction accuracy 1 2 2. LIME – An interface to explain model and predictions to end users
  12. 12. Deep Learning Models
  13. 13. What is Deep Learning (DL)?  Deep learning is a class of neural network based ML algorithms - Transform raw inputs through multiple layers of neural networks - “unreasonably effective” in speech/image recognition and NLP. - DL excels with unstructured data  Deep learning can be applied to many business problems. - Churn prediction - Recommender https://blogs.microsoft.com/next/2016/01/25/microsoft-releases-cntk-its-open-source-deep-learning-toolkit-on-git
  14. 14. Overview of DL Model Architecture DNN Layer 1 DNN Layer 2 DNN Layer 3 DNN Layer 4 DNN Layer 5 RNN Layer 1 RNN Layer 2 Output Layer Static Feature Input Time Series Input Merge Layers DNN Layer 6  The model is a “hybrid” model consisting of two types of neural networks. - Deep neural networks (DNN) layers - Takes static features as input - Fully connected MLP with add-ons - Recurrent neural networks (RNN) layers - Takes time series as input - Long Short Term Memory (LSTM) networks “In Deep Learning, Architecture Engineering is the New Feature Engineering.”
  15. 15. DNN Layers (Left Path) Output Layer  Five identical DNN layers are stacked to transform static features. - Input dimension of DNN layer 1 equals input dimension of static feature input - DNN 2-5 have the same input and output dimensions - Batch normalization layer is for faster convergence and “gradient vanishing” mitigation - Highway-like structure is to mitigate “gradient vanishing” and improve accuracy - The output of DNN layers is the merged output from each layer DNN Layer 1 … DNN Layer 5 Dense Layer Batch Normalization Activation Layer Merge Layer (concat) Static Feature Input
  16. 16. RNN Layers (Right Path) h1 h2 … h5 … … … LSTM Layer 1 Time Series Input LSTM Layer 2 x1 x2 … x56  Two identical RNN (LSTM) layers are stacked to transform time series data. - The input dimension is 18*56. (18 time series and 56 time steps). - Each LSTM layer has 128*56 hidden states. (128 hidden nodes and 56 time steps) - Highway-like structure is also added to mitigate “gradient vanishing” and improve accuracy - The final output of RNN layers is the merged states from the last time step.
  17. 17. Model Performance Improved Model AUC Precision- Recall AUC Recall (@Precision =0.562) Random Forest (V1 model) 0.836 0.325 0.283 DNN Layers only 0.854 (+1.8%) 0.352 (+2.7%) 0.32 (+3.7%) DNN + RNN Layers 0.863 (+2.7%) 0.383 (+5.8%) 0.343 (+6%)
  18. 18. Business Impact  The payout curve shows significant potential margin from DL model. - Select top X percent of customer ranked by the scores - Apply churn intervention to these customers - Assume revenue per churning customer is $m, cost of intervention per customer is $n - Y = Payout = total revenue saved – total cost of intervention - Payout of new model - Payout of old model
  19. 19. Explaining Model Predictions Using LIME
  20. 20. Customer Usage Data • Time series usage • Snapshot billing status Deep Learning Model • Deep neural networks • Recurrent neural networks Risk Scores • Weekly scores for all active customers Model Explaining • LIME algorithms • Explain the scores at customer level End Users • Trust the model more • Easy to start the conversion • Take actions based on churn scores
  21. 21. LIME Algorithm  LIME is an interface between scores and end users. - Local Interpretable Model-agnostic Explanations - Use a simple model to approximate the scoring logic of a black-box classifier - Fit the original classifier locally using sparse linear models through sampling for local exploration
  22. 22. An Example Using LIME  LIME takes a scoring function and a test instance as input. - Assume a model is trained and its scoring function (classifier.predict_proba) is obtained. - Choose a test instance (test_example) which needs to explain. - A sparse linear model is built from the perturbations of the test instance. - Show the coefficients as supporting evidences of predictions for this test instance.
  23. 23. Explaining Churn Score of A Customer  LIME can provide the evidences of churning and reduce “information overwhelm” - Pick on customer for explaining - Supporting features are shown along with the score - Visualize only the related time series (out of many time series) - End users can make informed decisions on if they should reach out to the customer.
  24. 24. Summary
  25. 25. Lessons Learned and Future Plans Lessons learned - “In Deep Learning, Architecture Engineering is the New Feature Engineering.” - Quantify the business value to motivate stakeholders - Making sense of predictions helps operationalize the model. Future Plans - Explore other network architectures for faster training and higher accuracy - Incorporate reinforcement learning to recommend the best churn intervention action
  26. 26. Thanks! Questions?

×