Solving Churn challenge in Big data
environment
Jelena Pekez
Principal Business Consultant, Lead Data Scientist
Comtrade System Integrations
CHURN IS A CHALLENGE IN EVERY INDUSTRY, THE
DIFFERENCE IS HOW IT IS MANAGED
Examine how this model will be used
Business
focus
High margin
customers
Strategy
relevant
segments
At Risk
customers
How fresh does the data need to be?
Monthly / daily / real time?
Outcome
Identified Key Patterns of Behavior that
Lead to Churn
Enabled pro-active outreach to save
profitable at-risk customers
The Goal
Enhance churn prediction through multi-
channel customer behavior analytics and
find an incremental number of high risk
churners compared to the traditional
statistical models
BUSINESS GOAL AND OUTCOMES
The goal of retention strategy is to keep churn under control.
• Domain specific features
• Special models outputs as
new features
• Balancing techniques
Evaluation
Data
Deployment
Modeling
Data
preparation
Data
understanding
Business
understanding
CRISP METHODOLOGY
BUSINESS UNDERSTANDING
What is the goal?
Relevant segment
Challenge the business
definition
Formal definition is often
not relevant for targeting
purposes
e.g. Churn 90 days → 10 days
Are there already
tested campaigns
Results, take rate
e.g. do we have integrated
campaigns results,
experiences
Existing reports
review
Reduces data analysis and
understanding
Helps to set expectations
and feasibility of prediction
Trend and seasonality
understanding
Which population is
relevant
Exclude inactive customers
Find irrelevant groups and
black list
Examine existing
segments / behavior
groups
Define metrics
for success
Evaluation metrics and
expectations
What will be product
offering
Target list size for
campaign
Frequency
• Set objectives
• Produce Project Plan
• Business success criteria / DS success criteria
• Assess the current situation
• Risks, assumptions, constraints and contingencies
• Terminology
• Cost and benefits
BUSINESS
UNDERSTANDING
POTENTIAL
ANALYSIS / MODELS
• Churn prediction model creation
• Sequence of impacting events
• Content Categorization
• Competitor calls recognition
• CEI – Experience Index
• Social Network Analytics (SNA)
• Behavior clusters
• Offer optimization
DATA PREPARATION
PHASE
1. Data understanding
2. Data integration:
• Data integration from different data sources
• Data quality report
3. Data preparation:
• Deriving new attributes and trend variables
• Balancing data set
• Handling nulls and outliers
• Normalization and standardization of data
• Data reduction techniques
4. Feature selection
5. Create Event Tables
• Generating and investigating events
• Creating Event History Table
• Fine-tuning of event definitions based on their correlation with churn
6. Create Event Sequences
• Generating event sequences from event table
• Generating subpaths from event sequences
• Analyzing temporal churn effects of event paths
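Steps 5 and 6 can be illustrated with a minimal sketch. The event names, the table layout `(customer_id, day, event_name)` and the churn labels below are hypothetical and only show the mechanics: order events per customer, cut the sequences into contiguous subpaths, and compare how often each subpath ends in churn.

```python
from collections import Counter, defaultdict

# Hypothetical event history table: (customer_id, day, event_name).
EVENTS = [
    ("A", 1, "bill_shock"),
    ("A", 3, "call_center_call"),
    ("A", 7, "competitor_call"),
    ("B", 2, "dropped_call"),
    ("B", 4, "call_center_call"),
    ("C", 1, "bill_shock"),
    ("C", 2, "call_center_call"),
]
CHURNED = {"A", "C"}  # hypothetical churn labels


def build_sequences(events):
    """Order each customer's events by day: {customer: [event, ...]}."""
    by_customer = defaultdict(list)
    for customer, _day, name in sorted(events, key=lambda e: (e[0], e[1])):
        by_customer[customer].append(name)
    return dict(by_customer)


def subpaths(sequence, length):
    """Return all contiguous subpaths of the given length."""
    return [tuple(sequence[i:i + length])
            for i in range(len(sequence) - length + 1)]


def churn_rate_by_subpath(sequences, churned, length=2):
    """Share of customers showing a subpath who churned afterwards."""
    seen, churn = Counter(), Counter()
    for customer, seq in sequences.items():
        for path in set(subpaths(seq, length)):  # count each customer once
            seen[path] += 1
            if customer in churned:
                churn[path] += 1
    return {path: churn[path] / seen[path] for path in seen}


sequences = build_sequences(EVENTS)
rates = churn_rate_by_subpath(sequences, CHURNED)
```

Subpaths with a churn rate well above the base rate are candidates for event features or triggers; on real data the counts per subpath also need a minimum support threshold.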
Features from different data sources
DWH
Lifecycle stage (near contract
termination indicator)
Drop calls and Silent calls
Products and Discounts
Spending and profitability
Device info
Contract history
NPS score
Close friend churned (based on
freq. calls)
Network KPI-s
Calls to Competition
CRM
Shop visits
Handset service
Campaigns available to the customer
(Upsell, NBA, X-sell)
Previous termination requests
Call Center Activity
IVR
Call logs (frequency, recency,
duration, branch)
Text mining, text segmentation
Complaints (network, device,
contract…)
Web/ App usage
Web/App Categories browsing
Voice, data, SMS usage and limits
Bill shock
Web/App keyword search
CHURN IMPACT THROUGH „BIG DATA“ FEED
Non traditional data
(raw) CDR:
Competitor CC, Poaching
calls, Usage change…
Market Research:
Satisfaction surveys,
competitor new offer
(conjoint)…
Call Center:
Compliance, compliance
path, operator data, …
POS:
Visits, inquiries…
Web:
Self Service portal,
browsing behavior, …
CRM:
Campaign/Response
history, Opt In/Out,
Customer data change, …
Provisioning:
Not successful activations,
…
Network:
All bad network events
(malfunction, dropped calls,
silent calls…)
Network External
Process Interaction
Event triggers
SPECIAL FEATURES EXTRACTION
Customer email
Call record
Call summary note
SN comments
Define key words Recognize intent
CHURN
NON
CHURN
Web crawled numbers
to all POS and agents
of direct competition
Find trend of calls
Find sequence of calls and SMS to
these numbers for
relevant groups
NON
CHURN
CHURN
Content Categorization
Like: Tariff, product,
service, competitor,..
Competitors calls recognition
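The competitor calls recognition idea can be sketched as follows. This is an illustration only: the phone numbers, the CDR row layout `(customer, dialled_number, week)` and the simple "rising trend" rule are assumptions, not the method used in the project.

```python
from collections import Counter

# Hypothetical list of web-crawled competitor POS / agent numbers.
COMPETITOR_NUMBERS = {"+38111000111", "+38111000222"}

# Hypothetical CDR rows: (customer, dialled_number, week_number).
CDR = [
    ("A", "+38111000111", 1),
    ("A", "+38111000222", 2),
    ("A", "+38111000111", 2),
    ("A", "+38160123456", 2),
    ("B", "+38160123456", 1),
]


def weekly_competitor_calls(cdr, competitor_numbers):
    """Count calls and SMS to competitor numbers per (customer, week)."""
    counts = Counter()
    for customer, dialled, week in cdr:
        if dialled in competitor_numbers:
            counts[(customer, week)] += 1
    return counts


def is_rising(counts, customer, weeks):
    """True if the weekly count never decreases and ends higher than it started."""
    series = [counts[(customer, w)] for w in weeks]
    return all(a <= b for a, b in zip(series, series[1:])) and series[-1] > series[0]


counts = weekly_competitor_calls(CDR, COMPETITOR_NUMBERS)
```

The weekly counts and the trend flag can then be fed to the churn model as features for the relevant groups.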
AGGREGATED CUSTOMER EXPERIENCE INDEX
Individual Customer Experience Index varies from 0 to 1 and is determined
by the following parameters:
CEI = N + P + C + U
Can take any value between 0 and 1, where 1 is Excellent and 0 is Awful
N: Network Exp. Measured by number of drops & failures
C: Call center Exp. Measured by voice sentiments (positive, neutral, negative)
P: Product Exp. Measured by number of attempts to search competitors' products or sites
U: Usage Exp. Measured by apps usage
Calculated DAILY
At SUBSCRIBER level
Benchmarked against AVERAGE CEM score
With ALARMS if score suddenly drops
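A minimal sketch of how such an index could be computed per subscriber and day. The slide does not state how the four components are weighted or normalized, so the equal weights, the assumption that each component is already scaled to 0..1, and the alarm threshold of 0.2 are all illustrative choices.

```python
def experience_index(network, product, call_center, usage,
                     weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of four component scores, each assumed to lie in [0, 1].

    Equal weights are an assumption; they keep the result in [0, 1].
    """
    parts = (network, product, call_center, usage)
    if any(not 0.0 <= p <= 1.0 for p in parts):
        raise ValueError("component scores must be between 0 and 1")
    return sum(w * p for w, p in zip(weights, parts))


def sudden_drop_alarms(daily_scores, max_drop=0.2):
    """Indexes of days where the score fell by more than max_drop vs. the previous day."""
    return [i for i in range(1, len(daily_scores))
            if daily_scores[i - 1] - daily_scores[i] > max_drop]


cei_today = experience_index(network=0.8, product=0.6, call_center=0.4, usage=1.0)
alarms = sudden_drop_alarms([0.80, 0.78, 0.50, 0.52])
```

Benchmarking against the average score would then be a comparison of `cei_today` with the mean index over all subscribers for the same day.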
Using CDR data and modern tools for data integration, we can create graph of customers interactions and calculate different relationship metrics.
Combine social groups with Geo-location calculations
Features for model
• Size of network
(number of nodes)
• Number of links+
• unique links
• Leadership score
• Role in community
• Community shape
• Centrality
• Density
SOCIAL NETWORK ANALYTICS FEATURES
• Who contacts whom?
• How often?
• How long?
• Both directions?
Identify the social network
• Who influences whom?
• Who works together?
• Close people
Identify important people, calling
circles
SNA: graph analysis where nodes are metrics
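A few of the listed graph features can be derived from CDR rows with nothing more than dictionaries. The sketch below assumes a hypothetical row layout `(caller, callee, duration_seconds)` and treats customers as nodes and calls as links; leadership score, community role and community shape need a community detection step that is not shown here.

```python
from collections import defaultdict

# Hypothetical CDR rows: (caller, callee, duration_seconds).
CDR = [
    ("A", "B", 120), ("B", "A", 60), ("A", "C", 30),
    ("A", "B", 300), ("D", "A", 45),
]


def sna_features(cdr):
    """Network size, link counts, density and simple per-customer metrics."""
    neighbours = defaultdict(set)   # undirected contacts per customer
    calls = defaultdict(int)        # number of calls per direction
    for caller, callee, _duration in cdr:
        neighbours[caller].add(callee)
        neighbours[callee].add(caller)
        calls[(caller, callee)] += 1

    n = len(neighbours)
    unique_links = sum(len(v) for v in neighbours.values()) // 2
    density = 2 * unique_links / (n * (n - 1)) if n > 1 else 0.0

    per_customer = {}
    for node, nbrs in neighbours.items():
        per_customer[node] = {
            "unique_links": len(nbrs),
            "degree_centrality": len(nbrs) / (n - 1) if n > 1 else 0.0,
            # contacts where calls went in both directions
            "reciprocal_links": sum(
                1 for other in nbrs
                if calls.get((node, other), 0) and calls.get((other, node), 0)
            ),
        }
    return {
        "nodes": n,
        "links": len(cdr),
        "unique_links": unique_links,
        "density": density,
        "per_customer": per_customer,
    }


features = sna_features(CDR)
```

On production volumes the same metrics would normally come from a distributed graph engine; the arithmetic stays the same.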
CUSTOMER PROFILE – E.G. GAMER
Network:
- Capabilities
- Access
- Bandwidth
- …
Social:
- Social Media
- Gaming forum
- Social Network
- Multi-Gaming
identity
- …
Consumption:
- Data volume
- Messaging
- VoIP
- …
Devices:
- Multi vs Single
- Online / Offline
- …
Sources of Experience:
- Profile
(demographics)
- Behavior (Usage,
CDRs)
- Interaction (CRM)
- Price plan, add-on
services
- History
- …
Traditional
sources:
Areas of importance: (AoI)
- MMO1 vs. Single player
- Online vs. Offline - / multi-screen
- VoIP
- Game communication
- Data volume, Latency
- Access method
- Gaming forum, youtube channels
- …
Experience:
1 MMO – Massively-Multiplayer Online Game
DATA INTEGRATION
From Analytical Data Mart to training table
DWH
Training table
Evaluation table
Scoring table
Features
Engineering
The metric trap: if any of the values is zero, the model is
biased
The Goal
Is to get
curve like
this
[Bar chart: number of customers per class, Non-churn (0) vs. Churn (1); vertical axis from 0 to 600000]
Share of churn in relevant population is
less than 10% in majority of cases
even less than 1% in some cases
TYPICAL CHALLENGE IS HIGHLY IMBALANCED DATASET
Confusion Matrix (≈96% ACCURACY, yet no churner is detected)
                 Predicted No    Predicted Yes
Observed No      114700          0
Observed Yes     4334            0
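The trap can be checked directly from the four cells of the confusion matrix: a model that never predicts "Yes" keeps a high accuracy while recall and precision for the churn class are zero. The helper name below is illustrative.

```python
def classification_metrics(tn, fp, fn, tp):
    """Accuracy, recall and precision for the positive (churn) class."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        # share of real churners that the model finds
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # share of predicted churners that really churn
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }


# Cells taken from the confusion matrix on the slide.
metrics = classification_metrics(tn=114700, fp=0, fn=4334, tp=0)
```

This is why lift, gain charts, recall and precision are better evaluation metrics than accuracy for churn models.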
Gain Chart: % of events (vertical axis) against % of data sets (horizontal axis), both from 0 to 100
RESAMPLING TECHNIQUES
UNDERSAMPLING
Removing samples from the
majority class
https://www.kaggle.com/rafjaa/resampling-strategies-for-imbalanced-datasets
OVERSAMPLING
Adding more examples from
the minority class
Weaknesses:
1. Loss of information (undersampling)
2. Overfitting (oversampling)
Useful only with big
enough data sets,
e.g. when 1% is actually
more than 10 thousand
units.
Tools:
• SQL / Python
• imbalanced-learn
OVER-SAMPLING FOLLOWED BY UNDER-SAMPLING
SMOTE ADASYN consists of synthesizing elements for the
minority class, based on those that already exist. It is based on the
nearest neighbors:
• Randomly pick a point from the minority class
• Computing the k-nearest neighbors for this point
• The synthetic points are added between the chosen point and its
neighbors
• Adds a random small values to the points
• TOMEK LINKS are pairs of very close instances, but of opposite
classes. Removing the instances of the majority class of each pair
increases the space between the two classes, facilitating the
classification process.
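To make the two mechanisms concrete, here is a from-scratch sketch in plain Python. It is not the imbalanced-learn implementation: it only shows SMOTE-style interpolation between a minority point and one of its k nearest minority neighbors, and the detection of Tomek links as mutual nearest neighbors of opposite classes. The toy points are made up, and the extra random noise step is left out.

```python
import math
import random


def nearest_neighbours(point, candidates, k):
    """The k candidates closest to point (excluding the point itself)."""
    others = [c for c in candidates if c is not point]
    return sorted(others, key=lambda c: math.dist(point, c))[:k]


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points on segments between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbour = rng.choice(nearest_neighbours(base, minority, k))
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def tomek_links(majority, minority):
    """Pairs (majority_point, minority_point) that are mutual nearest neighbours."""
    labelled = [(p, 0) for p in majority] + [(p, 1) for p in minority]

    def nearest(i):
        return min((j for j in range(len(labelled)) if j != i),
                   key=lambda j: math.dist(labelled[i][0], labelled[j][0]))

    links = []
    for i, (point, label) in enumerate(labelled):
        j = nearest(i)
        if label == 0 and labelled[j][1] == 1 and nearest(j) == i:
            links.append((point, labelled[j][0]))
    return links


majority = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.9, 0.9)]
minority = [(1.0, 1.0), (1.2, 1.1), (1.1, 1.3), (1.3, 1.2)]

synthetic = smote(minority, n_new=4)
links = tomek_links(majority, minority)
# Drop the majority member of every Tomek link.
linked_majority = {maj for maj, _ in links}
cleaned_majority = [p for p in majority if p not in linked_majority]
```

In practice the `imbalanced-learn` package listed under Tools provides tested versions of both steps; the brute-force neighbor search above is only suitable for toy data.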
EXAMPLE OF BALANCING TECHNIQUES COMBINATION
0 1
12 months
historical data
10:1 ratio
Boost minority
class with
SMOTE
Eliminate similar
points with
TomekLinks
0 1
5:1 ratio
More
balanced
training set
SMOTETomek
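The move from a 10:1 to a 5:1 ratio is simple arithmetic, sketched below with hypothetical class counts. The function name and the counts are illustrative; with imbalanced-learn, the same target is usually expressed as the minority-to-majority ratio (here 0.2) passed as `sampling_strategy`, and the Tomek-link cleaning afterwards shifts the final ratio slightly.

```python
import math


def synthetic_needed(n_majority, n_minority, target_ratio):
    """Minority samples to add so that majority:minority becomes target_ratio:1."""
    target_minority = math.ceil(n_majority / target_ratio)
    return max(0, target_minority - n_minority)


# Hypothetical counts with a 10:1 ratio in 12 months of history.
n_majority, n_minority = 100_000, 10_000
to_generate = synthetic_needed(n_majority, n_minority, target_ratio=5)
minority_to_majority = (n_minority + to_generate) / n_majority
```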
• XGBoost offers fast computing speed
combined with explainable results with
regard to ranking feature importances.
• Compatible with the SHAP framework, offering
even more in-depth explanations of model
predictions
MODEL DEVELOPMENT USING XGBoost ALGORITHM
IS THE BEST PRACTICE FOR IMBALANCED DATASET
XGBoost
Regularization for
avoiding
Overfitting
(both Lasso and Ridge)
Efficient handling
of missing data
(?)
Cache awareness
and out-of-core
computing
Parallelized
processing
In-built
cross-validation
capability
Tree pruning
using depth-first
approach
Sequentially learning algorithm that is based on function approximation by
optimizing specific loss functions as well as applying several regularization
techniques.
LatBill shock= 1,15
MODEL INTERPRETATION IS VITAL FOR
FINE TUNING OF OFFERING
1. Overall interpretation
Understanding the most important features with feature
importance plot.
2. Local interpretation:
1. understand for an individual case the reasons of the
prediction.
2. understand on a filtered population the most frequent
reasons of their prediction
SHAP summary plot
3 variables with most contribution
ID        Probability  Class      1st variable     2nd variable     3rd variable
          to churn     predicted  Name    Impact   Name    Impact   Name     Impact
12098321  95%          1          Reb_1   +34      Bill_3  +19%     Lat_2    +8%
12098322  88%          1          Bill_1  +25      NPS_2   +14%     Sill_c3  +13%
12098323  35%          0          Inf_7   -27      Lat_2   -23%     Reb_1    -12%
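Given per-customer feature contributions (for example SHAP values exported from the model), the "3 variables with most contribution" columns reduce to a sort by absolute impact. The contribution values below are hypothetical and the feature names are only borrowed from the table for readability.

```python
def top_contributors(contributions, n=3):
    """The n features with the largest absolute contribution, strongest first."""
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:n]


# Hypothetical contributions for one customer (positive pushes towards churn).
customer_contributions = {
    "Reb_1": 0.34,
    "Bill_3": 0.19,
    "Lat_2": 0.08,
    "NPS_2": -0.02,
    "Inf_7": 0.01,
}
top3 = top_contributors(customer_contributions)
```

Running the same function over a filtered population and counting the names that appear most often gives the second kind of local interpretation listed above.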
ANALYTICAL
OBJECTIVE
MODEL PERFORMANCE
EVALUATION
• Lift on top 1%, 10%, and 20% most likely
churners
• Campaign performance evaluation (A/B
testing):
• Churn rate in different model
percentiles
• Churn rate DNC vs. TGT
• Offer response rate DNC vs. TGT
• Churn rate old vs. BD approach
• Offer response rate old vs. BD
approach
• Monthly level measurement
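Lift on the top share of scored customers compares the churn rate among the highest-scored customers with the churn rate in the whole eligible population. The sketch below uses a tiny made-up sample; the function name and the rounding of the cut-off are illustrative choices.

```python
def lift_at(scores, labels, top_share):
    """Churn rate in the top_share highest-scored customers divided by the base rate."""
    ranked = sorted(zip(scores, labels), key=lambda sl: sl[0], reverse=True)
    n_top = max(1, round(len(ranked) * top_share))
    top_rate = sum(label for _, label in ranked[:n_top]) / n_top
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate if base_rate else 0.0


# Hypothetical scores and observed churn flags (1 = churned).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

lift_top_20 = lift_at(scores, labels, 0.20)
lift_top_50 = lift_at(scores, labels, 0.50)
```

The same function applied to the target group (TGT) and the control group (DNC) separately supports the campaign comparisons listed above.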
Assign a churn score to all customers in the eligible
segment
Automatically target top X% of customers with high
probability with special offer
The score should be recalculated on a daily level
New events should trigger near real-time scoring
Optimize offer type and price for individual customer
MODEL DEPLOYMENT IN AIRFLOW ENVIRONMENT
BENEFITS OF BIG DATA PLATFORM
1. Include better granularity of specific features.
2. Quickly calculate daily attributes and longer history from more data sources
3. Combine results of different analytical models faster to optimize process and value
4. Recompute score in real time based on the latest customer activity / event
5. Efficient monitoring of model performance and execution
THANK YOU
Jelena.pekez@comtrade.com
Copyright © 2019 Comtrade. All rights reserved.
The content of this presentation is copyright protected.
Any reproduction, distribution, or modification is not allowed.
The information, solutions, and opinions contained in this presentation are of informative nature only and are not
intended to be a comprehensive study, nor should they be relied on or treated as a means to provide a complete
solution or advice, since we may not be aware of all specific circumstances of the case. We try to provide quality
information, but we make no claims, promises, or guaranties about the accuracy, completeness, or adequacy of the
information contained herein.
www.comtradeintegration.com


Editor's Notes

  • #18 from imblearn.combine import SMOTETomek. SMOTE stands for Synthetic Minority Oversampling Technique.
  • #20 XGBoost4j on Scala-Spark. Early stopping may still contain bugs.
  • #25 Real-time triggers, examples: 1. Bill shock + call to the Call Center: the customer has a bill shock, the call to the CC triggers real-time scoring, and the agent can see the new score for that customer during the call. 2. Reclamation + low NPS score: the customer submits a reclamation and gives a low NPS score, which triggers real-time rescoring for that customer.
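The two trigger examples can be written as simple event-pair rules. This is only a sketch: the event names and the rule table are hypothetical, and a production setup would evaluate such rules in the streaming layer before calling the scoring service.

```python
# Hypothetical rules: (earlier event, event that triggers rescoring).
TRIGGER_RULES = {
    "bill_shock_then_call_center": ("bill_shock", "call_center_call"),
    "reclamation_then_low_nps": ("reclamation", "low_nps"),
}


def should_rescore(history, new_event):
    """True if new_event completes any rule whose first event is already in history."""
    return any(
        new_event == second and first in history
        for first, second in TRIGGER_RULES.values()
    )


# A customer with a bill shock calls the call center: rescore during the call.
rescore_now = should_rescore(["bill_shock"], "call_center_call")
```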