© 2014 PayPal Inc. All rights reserved. Confidential and proprietary.
Consumer Churn Program
Framework, capabilities, and lessons learned (well, at least so far…)
Before and after…
The thinking around churn

Before:
• Wait, the consumer hasn’t churned yet; we’ll do xx after they churn
• Churn happens when we find out someone hasn’t transacted

After:
• Let’s assign a probability every day and figure out, today, whether someone is going to churn within the next pre-defined churn period. It’s OK if you’re not super accurate
• A consumer churned on the day of their last transaction (probably), not on the day we found out
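The “after” framing amounts to a labeling rule: on each snapshot day, a consumer is labeled as churning if they make no transaction in the upcoming churn window, and their churn date is backdated to their last transaction. A minimal sketch in Python, where the 90-day window and all names are illustrative assumptions, not PayPal’s actual definitions:

```python
from datetime import date, timedelta

CHURN_WINDOW = timedelta(days=90)  # assumed pre-defined churn period

def churn_label(txn_dates, snapshot):
    """1 if the consumer makes no transaction in the churn window
    after `snapshot`, else 0."""
    future = [d for d in txn_dates if snapshot < d <= snapshot + CHURN_WINDOW]
    return 0 if future else 1

def churn_date(txn_dates, snapshot):
    """Backdate churn to the day of the last observed transaction."""
    past = [d for d in txn_dates if d <= snapshot]
    return max(past) if past else None

txns = [date(2014, 1, 5), date(2014, 3, 2)]
snap = date(2014, 4, 1)
print(churn_label(txns, snap))  # no txn in the next 90 days -> 1
print(churn_date(txns, snap))   # backdated churn date: 2014-03-02
```

Running this rule daily is what produces the per-consumer probability targets the model trains against.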
Rough idea of end product
What do we think will resonate with our internal customers?

Cust   Segment                    P(churn)
C1     Month of prior txn         0.945
C2     Days since your last txn   0.883
C3     Days since/max. gap        0.657
C4     Lifetime spend             0.760
Rough idea of audience
How will our internal customers use the product?

Churn model output feeds three personas:

Executive:
• Consistency (can’t change 12-month churn to 45.87 days, or refer to churn as “brief hiatus”)
• Aggregates and segments
• May be related to but different from what drives action for other personas, so code needs to be written
• Easy to put into PowerPoint, email, Excel

Marketing:
• Moderately fast tool to size a population
• Must have filters on region and country
• Actual population is much smaller
• Test/control clarity and a size estimator

Analysts:
• Data, documentation
Predictive Modeling Exercise
Mission Statement to Data Product
Exploratory Data Analysis:
• Feature engineering and reduction
• Tools: SQL, Pig, Python, JMP, R, scikit-learn
• Transaction variables: very important; behavioral variables: moderately important; demographic variables: meh
• Automation is critical; it saves time in the long run
• Optimize SQL or MapReduce now; don’t wait until production
• JDBC >> ODBC

Modeling:
• Further feature reduction, fitting, tuning, validation
• Tools: R, H2O
• Ensemble models rock! Validate sample size, go multiprocessing early, QC your data
• Train/test/validate data sets
• AUC to set the threshold
• Focus on confusion-matrix metrics (accuracy, in-class error, recall, …) to compare models
• Build an MVP for time/accuracy and iterate

Production:
• Tools: R, H2O, C3 (PayPal’s S3), HTML, Tableau, FEXP
• Scale with C3 and a Unix cluster-management tool
• An HTML wrapper helps keep things organized and version controlled
• I/O is time consuming; FEXP on a DT ETL box is super fast
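The “AUC to set the threshold” bullet can be made concrete: once you have the ROC curve behind the AUC, one common way to pick an operating threshold is to maximize Youden’s J (TPR minus FPR). A minimal pure-Python sketch; the toy scores and the choice of Youden’s J are illustrative assumptions, since the slides don’t say which criterion was used:

```python
def best_threshold(scores, labels):
    """Scan candidate thresholds along the ROC curve and pick the one
    maximizing Youden's J = TPR - FPR."""
    best_t, best_j = 0.0, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        tpr = tp / (tp + fn) if tp + fn else 0.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        if tpr - fpr > best_j:
            best_j, best_t = tpr - fpr, t
    return best_t

scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]  # toy model scores
labels = [0,   0,   1,    1,   1,   0]    # toy churn labels
print(best_threshold(scores, labels))     # -> 0.35
```

In practice the threshold would be chosen on the validation set, not the training data, so the confusion-matrix metrics below reflect out-of-sample behavior.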
Modeling
Performance

Evaluation setup (from the slide’s diagram): train on a 365-day window with cross-validation folds CV1, CV2, …, CV5; validate on the following 90 days.

Metric      Value
F1          0.87
Precision   0.86
Recall      0.88
Accuracy    0.87

Train: 2 million samples; Validation: 1 million samples

Precision = TP/(TP+FP): the % of actual wolves among the times I cried “Wolf”
Recall = TP/(TP+FN): the % of wolves I actually identified
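The four reported metrics all derive from the same confusion-matrix counts. A minimal sketch; the toy counts are assumptions chosen only to land near the slide’s values, not the actual confusion matrix:

```python
def confusion_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1, and accuracy from
    confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

# toy counts (assumed, not the slide's actual confusion matrix)
p, r, f1, acc = confusion_metrics(tp=88, fp=14, fn=12, tn=86)
print(round(p, 2), round(r, 2), round(f1, 2), round(acc, 2))
# -> 0.86 0.88 0.87 0.87
```

Note that F1 is the harmonic mean of precision and recall, which is why it sits between the two.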
Modeling
Benchmarking on Random Forest and H2O’s Distributed Random Forest
• R, ODBC (1 processor, 32 GB RAM): modeling 6 hrs, scoring 72 hrs. Train on hundreds of thousands of rows, score on entire consumer base
• Revolution R, ODBC (8 processors, 32 GB RAM): modeling 1 hr, scoring 48+ hrs (did not complete). Same train/score sizes
• H2O, JDBC (3 machines, 24 processors, 50 GB): modeling 30 min, scoring 12 hrs (mainly I/O). Same train/score sizes
• H2O, JDBC (16 machines, 128 processors, 300 GB): modeling 20 min, scoring 25 min (unzip). Same train/score sizes
• H2O, Hadoop (20 nodes): modeling 10 min, scoring 5 min (about 4 min is I/O). Train and score on the entire consumer base!

Goal: modeling under 30 min, scoring under 1 hour. This enables multiple models daily: a true forecast!
Production
Process used for identifying individual features
Current:
• Normalize feature importance
• Normalize features per consumer: standard score = (feature value - mean) / standard deviation
• Sort feature columns by feature importance * standard score for each feature
• Works for most cases, but misses obvious branching in corner cases
• OK for an MVP, but not a great process

Enhancement:
• Multiple runs of the same model, each with 1 feature removed
• Evaluate the difference in predicted probability for each run
• Order the differences by feature to find the most impactful features
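The “Current” process above can be sketched in a few lines: rank each consumer’s features by importance times how unusual that consumer’s value is. All feature names, values, and the use of the absolute standard score are illustrative assumptions (the slide doesn’t specify sign handling):

```python
from statistics import mean, stdev

def top_features(consumer, population, importance):
    """Rank a consumer's features by importance * |standard score|,
    as in the 'Current' process above."""
    scores = {}
    for f, value in consumer.items():
        col = [row[f] for row in population]
        z = (value - mean(col)) / stdev(col)  # per-consumer standard score
        scores[f] = importance[f] * abs(z)
    return sorted(scores, key=scores.get, reverse=True)

# hypothetical consumers and feature importances
population = [
    {"days_since_txn": 10, "lifetime_spend": 500},
    {"days_since_txn": 40, "lifetime_spend": 450},
    {"days_since_txn": 70, "lifetime_spend": 550},
]
importance = {"days_since_txn": 0.6, "lifetime_spend": 0.2}
print(top_features(population[2], population, importance))
# -> ['days_since_txn', 'lifetime_spend']
```

The enhanced leave-one-feature-out approach would instead retrain (or rescore) once per dropped feature and rank features by the resulting change in predicted probability, which captures the branching behavior this heuristic misses.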
So what?

Data science matters!
I can’t share $$ impact, so here are some proxies:
• Resources dedicated to the overall program: budget, headcount, and tech spend
• Feature-importance output fed into an enterprise-level framework
• Ongoing program built around the model (literally, around the output of the Random Forest and GBM); no longer a prototype (I need to figure out a way to productionize this stuff, quickly)

H2O World - Solving Customer Churn with Machine Learning - Julian Bharadwaj
