Customer Choice Probability Prediction
Machine Learning: Process Walkthrough
June 2019
Who Are We
Walkthrough of Production Modeling Solution
Machine Learning is a “Process”
Problem Formulation → Label Preparation → Feature Engineering → Model Learning → Model Deployment → Model Management
Case Study – B2C Modeling
Problem Formulation → Label Preparation → Feature Engineering → Model Learning → Model Deployment → Model Management
Problem Formulation
Business questions:
• Which customers are going to renew less, and what can we do?
• What is the best channel to acquire?
• Which customers have upsell potential, why, and for what product?
We frame this as a binary classification problem: let y_i represent the product status of customer i:
y_i = 1 if the customer churns, 0 if the customer renews.
Challenges of this problem:
• Class imbalance.
• The definition of churn can vary for predictive purposes.
• Data evolves dynamically (time-series events).
• Data is sparse and noisy.
Machine Learning is a “Process”
Problem Formulation → Label Preparation → Feature Engineering → Model Learning → Model Deployment → Model Management
Label Preparation
Label: Close/Renew
• Negative outcome, labeled 1: customer churned (closed).
• Positive outcome, labeled 0: customer renewed.
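The 1/0 convention above can be made concrete with a small sketch; the event names ("close", "renew") follow the slide, while the account records and field names are illustrative assumptions:

```python
# Sketch of label preparation: map each account's observed outcome to a
# binary churn label (1 = churned/closed, 0 = renewed), per the slide's
# convention. Account records and field names are illustrative.

def churn_label(outcome: str) -> int:
    """Return 1 for a churn ('close') event, 0 for a renewal."""
    mapping = {"close": 1, "renew": 0}
    if outcome not in mapping:
        raise ValueError(f"unlabelable outcome: {outcome!r}")
    return mapping[outcome]

accounts = [
    {"account_id": "A1", "outcome": "renew"},
    {"account_id": "A2", "outcome": "close"},
]
labels = {a["account_id"]: churn_label(a["outcome"]) for a in accounts}
# labels == {"A1": 0, "A2": 1}
```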
Machine Learning is a “Process”
Problem Formulation → Label Preparation → Feature Engineering → Model Learning → Model Deployment → Model Management
Feature Collecting
• A feature is an individual measurable property or characteristic of a phenomenon being observed.
Demographic Features
• Demographics
• Personal interests
• Professional interests
• Etc.
Behavioral Features
• Pageviews
• Account behavior
• Campaign behavior
Feature Engineering
• Numeric values:
  • Compute basic statistics such as sum / average / coverage / percentiles.
  • Define anomalies with context: seasonality, product evolution, etc.
  • Approaches: percentage change, Z-score, etc.; be aware of each method's statistical assumptions.
  • Outliers in usage data, e.g., 25 kWh vs. 30,000 kWh.
• Categorical values:
  • Convert to numbers:
    • One-hot encoding: a binary indicator for each categorical value.
    • Ordered categorical (ordinal): bin and map to midpoints, e.g., 1-10 -> 5, 20-30 -> 25.
  • Marketing channel data is categorical with too many levels.
    o Ex. product channel: product_channel_is_ptc: {0,1}, product_channel_is_energyorgre: {0,1}
• Interactions:
  • Cross-products of feature types, ex.
    o {energy rate_SFH} × {usage amount_Brand}
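A minimal sketch of two of the steps above: one-hot encoding a channel variable and Z-score outlier flagging. The channel names echo the slide; the usage values, the use of the population standard deviation, and the 1.5 threshold are illustrative assumptions:

```python
# Sketch of two feature-engineering steps from the slide: one-hot encoding
# of a categorical (marketing/product channel) and a Z-score outlier flag
# for numeric usage. Values and the |z| threshold are illustrative.
from statistics import mean, pstdev

channels = ["ptc", "energyorgre", "ptc", "web"]
levels = sorted(set(channels))
# One-hot: one binary indicator column per categorical level.
one_hot = [{f"product_channel_is_{lvl}": int(c == lvl) for lvl in levels}
           for c in channels]

usage_kwh = [25.0, 900.0, 1100.0, 30000.0]  # note the extreme outlier
mu, sigma = mean(usage_kwh), pstdev(usage_kwh)  # population std dev
z_scores = [(u - mu) / sigma for u in usage_kwh]
# Flag |z| > 1.5 as anomalous; the threshold (and the distributional
# assumption behind Z-scores) should be checked against the real data.
outlier_flags = [abs(z) > 1.5 for z in z_scores]
```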
Machine Learning is a “Process”
Problem Formulation → Label Preparation → Feature Engineering → Model Learning → Model Deployment → Model Management
Model Learning
• We wanted a stable model, so we chose Gradient Boosting Machines (GBM).
• GBM differs from other decision-tree algorithms in that it builds models sequentially, giving higher weight to the cases that were poorly predicted by previous models.
• Up-weighting poorly predicted cases improves accuracy incrementally, instead of simply taking an average of all models as a random forest does.
• By reducing the error iteratively to produce what will become the final model, GBM is an efficient and powerful algorithm for classification and regression problems.
• Hyperparameter tips:
  o Number of trees:
    ❑ Large datasets need many trees.
    ❑ Many features need many trees.
    ❑ More trees reduce bias but come with more computational cost.
• Compare the error rate on the training set against the validation set to catch possible overfitting/bias.
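The train-versus-validation comparison can be sketched as follows. This is not the production model: it is a small scikit-learn GBM on synthetic imbalanced data, with all hyperparameters illustrative:

```python
# Sketch: fit a small GBM on synthetic imbalanced data and compare train
# vs. validation loss as trees are added -- the overfitting check the
# slide recommends. Dataset and hyperparameters are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1],  # imbalanced, like churn
                           random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=0)

gbm = GradientBoostingClassifier(n_estimators=200, learning_rate=0.05,
                                 max_depth=3, random_state=0)
gbm.fit(X_tr, y_tr)

# staged_predict_proba yields predictions after each additional tree, so
# we can watch the train/validation gap evolve as capacity increases.
train_curve = [log_loss(y_tr, p[:, 1]) for p in gbm.staged_predict_proba(X_tr)]
valid_curve = [log_loss(y_va, p[:, 1]) for p in gbm.staged_predict_proba(X_va)]
best_n_trees = int(np.argmin(valid_curve)) + 1  # trees minimizing val loss
```

Training loss keeps falling with more trees; the validation curve is what tells you when to stop.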
Model Learning - Evaluation
• Standard AUC: 0.74
  o Diagonal line: random guess.
  o Above the diagonal line:
    ❑ Normal prediction.
    ❑ Curves closer to the perfect-prediction corner perform better than those close to the baseline.
  o Below the diagonal line:
    ❑ Poor prediction (worse than random).
• Check feature importances to see if they pass the smell test.
• Compare renewal/recontract rates between models.
• Compare voluntary/involuntary rates between models.
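What the diagonal and the curves mean can be made concrete with a from-scratch AUC on toy scores, using the rank-statistic view: AUC is the probability that a randomly chosen positive outranks a randomly chosen negative (scores below are illustrative):

```python
# Sketch of what AUC measures, computed from the Mann-Whitney rank view.
# 0.5 corresponds to the random-guess diagonal; below 0.5 is worse than
# random. Toy labels and scores are illustrative.

def roc_auc(y_true, scores):
    """ROC AUC as P(positive score > negative score), ties count 0.5."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 1, 0, 0, 0]
good = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]   # mostly ranks churners higher
auc = roc_auc(y_true, good)              # above the diagonal
inverted = roc_auc(y_true, [1 - s for s in good])  # below the diagonal
```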
Performance Summary
Methods to Predict Our Churners
• Random Forest: robust to outliers with good performance, but slow on our large dataset and less accurate than the alternatives.
• L2 (Ridge) Regression / Elastic Net: minimizes multicollinearity while also reducing the variance of the model.
• Artificial Neural Network: deep learning that acquires knowledge through learning at its nodes. Performs best for tasks like clustering, classification, and pattern recognition.
• Extreme Gradient Boosting Trees: a decision-tree-based machine learning technique that optimizes fit by modeling the remaining error of multiple prior weaker/simpler models.
Machine Learning is a “Process”
Problem Formulation → Label Preparation → Feature Engineering → Model Learning → Model Deployment → Model Management
Spark ML Pipelines
Model Deployment Paths
MLeap
• Better for real-time prediction of a small number of records.
• Doesn't require a Spark session; portable to apps/devices that support the JVM.
Spark ML Persistence
• Appropriate for batch jobs, scoring lots of records at once.
• Requires a Spark session.
Model Deployment Paths
Towards a better deployment story
Data Scientist: Hey, this logistic regression churn model is ready to go! Here is the parquet file, and here is the documentation you need to use it.
Big Data Engineer: Awesome! We won't need to write a bazillion if-else statements to recreate the model!
When the model needs updating...
Data Scientist: We decided to use a GBM instead for better log loss error, here's the updated bundle file.
Big Data Engineer: Fantabulous! All we need to do is update the model directory!
Source: https://sais2018.netlify.com/#39
Streams of Work
• Data: bring in (link to modelling data-set) the remaining items on the Data Sources List.
• Modelling:
  • New ML approaches
  • Model non-linearities (e.g., cubic splines) in key continuous variables
  • Model factor interactions
  • Further tuning of models
  • Combination/ensemble models
  • Time-series models for precision tuning of portfolio-level predictions
• Extended Problem (2019 H2+ start): estimate the effects of the main factors on choice probabilities (customer-specific factors, DE-controlled factors, and external factors).
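The cubic-spline idea for modelling non-linearities in a continuous variable can be sketched with a truncated power basis; the knot locations and values below are illustrative, not from the deck:

```python
# Sketch of cubic-spline features via the truncated power basis: expand a
# continuous variable x into [x, x^2, x^3, (x - k)^3_+ for each knot k],
# letting a linear model fit a smooth non-linear effect. Knots are
# illustrative; in practice they might come from the variable's quantiles.

def cubic_spline_basis(x: float, knots):
    base = [x, x ** 2, x ** 3]
    base += [max(x - k, 0.0) ** 3 for k in knots]  # truncated cubic terms
    return base

knots = [500.0, 1500.0]          # e.g., usage breakpoints in kWh
row = cubic_spline_basis(1000.0, knots)
# row == [1000.0, 1000000.0, 1000000000.0, 125000000.0, 0.0]
```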
Goal and Success Criteria
Probability forecasting for customer choice (i.e., stay, churn, re-contract, and renew) at the individual level, for all ERCOT customers, over 30-day and 120-day windows.
Models will be judged and selected based on both cross-sectional out-of-sample performance and (monthly) time-series performance against actual choice events.
The final model must have the best discrimination power between customers and must roll up into an accurate aggregated portfolio choice predictor (by business segment). A proper model scoring rule is required. We will use:
• Log-Loss (a.k.a. cross entropy): principal scoring rule (proper).
• Brier score (a.k.a. mean squared loss): secondary use (proper for binary event prediction only).
• ROC AUC (area under the curve): ranking accuracy rule (not proper, but informative on discrimination power).
• Decile band prediction matching (not proper, but informative on the distributional match of predicted probabilities to realized choice events).
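The first two scoring rules and the decile matching can be written out directly from their definitions; a pure-Python sketch on toy data (values illustrative):

```python
# Sketch of the scoring rules listed above, implemented from their
# definitions on toy data. Clipping in log_loss avoids log(0).
import math

def log_loss(y, p, eps=1e-15):
    p = [min(max(pi, eps), 1 - eps) for pi in p]
    return -sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
                for yi, pi in zip(y, p)) / len(y)

def brier(y, p):
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)

def decile_match(y, p, bands=10):
    """Per-band (mean predicted probability, realized event rate) pairs;
    a well-calibrated model has the two close in every band."""
    order = sorted(range(len(y)), key=lambda i: p[i])
    size = len(y) // bands
    out = []
    for b in range(bands):
        idx = order[b * size:(b + 1) * size] if b < bands - 1 else order[b * size:]
        out.append((sum(p[i] for i in idx) / len(idx),
                    sum(y[i] for i in idx) / len(idx)))
    return out

y = [0, 0, 1, 1]
p = [0.1, 0.2, 0.8, 0.9]
ll, bs = log_loss(y, p), brier(y, p)
```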
DO NOT FORWARD | CONFIDENTIAL
Machine Learning is a “Process”
Problem Formulation → Label Preparation → Feature Engineering → Model Learning → Model Deployment → Model Management
Model Deployment - Management
• Schedule and run the scoring weekly.
• Score existing customer accounts as well as “new” customer accounts for completeness. Do customers with an invoice score higher?
• After each scoring run, do some sniff tests, e.g., are early-tenure customers lining up as expected?
• Monitor model/feature performance.
• Refresh the model as needed.
• Review new wins/losses by segment weekly.
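For the monitoring bullet, one common technique, assumed here rather than taken from the deck, is a Population Stability Index (PSI) check comparing this week's score distribution to the training baseline:

```python
# Sketch of score-distribution monitoring via the Population Stability
# Index (PSI), an assumed technique, not one named in the deck. Inputs
# are pre-binned score fractions; the 0.1/0.25 thresholds are the usual
# rules of thumb, and the bin fractions below are illustrative.
import math

def psi(expected_frac, actual_frac, eps=1e-6):
    """PSI over binned fractions; roughly <0.1 stable, >0.25 investigate."""
    return sum((a - e) * math.log((a + eps) / (e + eps))
               for e, a in zip(expected_frac, actual_frac))

baseline = [0.25, 0.25, 0.25, 0.25]   # score-quartile fractions at training
this_week = [0.24, 0.26, 0.25, 0.25]  # near-identical -> small PSI
shifted = [0.10, 0.20, 0.30, 0.40]    # drifted scores -> large PSI
stable = psi(baseline, this_week) < 0.1
drifted = psi(baseline, shifted) > 0.1
```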
Model Interpretation
Roxanne - Score = 0.9
• Voluntary score = 0.6
• Involuntary score = 0.2
• Recontract score = 0.1
Aimee - Score = 0.9
• Voluntary score = 0.1
• Involuntary score = 0.3
• Renewal score = 0.5
…Take advantage of our summer savings with a X$ bill discount. … Do you know about DE’s new Echo Dot plan?
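A sketch of how such component scores might drive message selection; the score names follow the slide, while the offer texts and the pick-the-max rule are illustrative assumptions:

```python
# Sketch: route each customer to a message targeting the churn driver
# with the highest component score. Offer texts are illustrative, not
# the deck's actual campaigns.

OFFERS = {
    "voluntary": "Take advantage of our summer savings with a bill discount.",
    "involuntary": "Ask about payment-plan options.",
    "recontract": "Do you know about our new plan options?",
}

def pick_message(component_scores: dict) -> str:
    """Target the churn driver with the highest component score."""
    driver = max(component_scores, key=component_scores.get)
    return OFFERS.get(driver, "No targeted offer.")

roxanne = {"voluntary": 0.6, "involuntary": 0.2, "recontract": 0.1}
msg = pick_message(roxanne)  # voluntary is highest -> discount offer
```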
Common Pitfalls and Challenges
Pitfalls at each stage of the process:
• Label Preparation: label quality/noise; class imbalance.
• Feature Engineering: data quality; categorical data; missing data; outliers; high dimensionality.
• Model Learning: overfitting; scalability, speed, fast iteration.
• Model Deployment: model interpretation; A/B testing; dependencies.
• Model Management: model degradation; feature quality monitoring.
Wrap Up
Inspirations / other talks to check out:
• “Big data analytics and machine learning techniques to drive and grow business” by Micheal Lie, Chi-Yi Kuan, Wei Di, Burcu Baran
• “From Prototyping to Deployment at Scale with R and sparklyr” (https://sais2018.netlify.com/#1) by Kevin Kuo
• “MLeap and Combust ML” (https://youtu.be/MGZDF6E41r4) by Hollin Wilkins and Mikhail Semeniuk
