Churn prediction is one of the most popular Big Data use cases in business. Everyone tackles it, from online shops and telecom operators to game developers and ticket services. It is widely believed that churn prediction is well solved by standard machine learning methods. In this talk we discuss how to relate machine learning metrics (ROC AUC, F1, log loss) to business metrics, how to design an experiment that estimates the effect of introducing machine learning, and which characteristics the business should have.
2. Alexei Chernobrovov
● [company logo] is the largest internet business, developing e-commerce and leading Russian social networks.
● [company logo] is one of the biggest online schools in Europe for learning English.
● [company logo] is the central securities depository of the Russian Federation.
● [company logo] is the largest sporting goods retailer in the world.
● [company logo] is one of the biggest carmakers in the world.
● Member of the expert council of the Runet Prize.
4. Game platform on Ok.ru
The goals:
● to encourage a user to make more purchases
● to keep a user in a game as long as possible
5. Churn Prediction
We should divide users into 2 groups: churned and not churned.
We need to keep users on the platform and minimize user churn.
7. Business Metrics
● Website traffic
● Unique users
● Session duration
● NPS (net promoter score)
● Average Order Value
● Conversion
● Churn rate
● MONEY!!!
8. What does Business need from ML?
● Possibility to implement
● Real business improvement
● Return-On-Investment
9. ML and Business Aims
We need to predict churn before it happens.
ML aim: to predict that a user is about to churn.
Business aim: to take action to retain that user.
10. Confusion Matrix
Predicted \ Actual    Churned           Not churned       Action
Churned               True Positive     False Positive    Discount
Not churned           False Negative    True Negative     No action
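The four cells of the confusion matrix can be counted directly from labels. The sketch below is only an illustration: the function name `confusion_counts` and the six example users are hypothetical and do not come from the talk.

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Count TP, FP, FN, TN where the positive class (True) means 'churned'."""
    pairs = Counter(zip(predicted, actual))
    return {
        "TP": pairs[(True, True)],    # predicted churn, actually churned
        "FP": pairs[(True, False)],   # predicted churn, actually stayed
        "FN": pairs[(False, True)],   # predicted stay, actually churned
        "TN": pairs[(False, False)],  # predicted stay, actually stayed
    }

# Hypothetical labels for six users (True = churned).
actual = [True, True, False, False, True, False]
predicted = [True, False, True, False, True, False]

counts = confusion_counts(actual, predicted)
print(counts)
```

Dividing each count by the number of users gives the percentage form used on the following slides.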
11. Business Metrics
Predicted \ Actual    Churned                                 Not churned           Action
Churned               LTV of a client - amount of discount    Amount of discount    Discount
Not churned           LTV of a client                         0                     No action
15. AB-test
Discount is not provided (A)
Predicted \ Actual    Churned    Not churned
Churned               28%        6%
Not churned           6%         60%
Discount is provided (B)
Predicted \ Actual    Churned    Not churned
Churned               25%        9%
Not churned           6%         60%
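17. AB-test
Discount is not provided (A)
Predicted \ Actual    Churned    Remained
Churned               28%        6%
Not churned           6%         60%
Discount is provided (B)
Predicted \ Actual    Churned    Remained
Churned               25%        9%
Not churned           6%         60%
Difference: 3% of all users moved from "Churned" to "Remained" in the "predicted churned" row.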
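18. Real effect from the Experiment
69% - 66% = 3% of all users (3%/28% = 10.7% of those who intended to churn) didn't leave the service thanks to ML and the retention action!
Discount is not provided (A)
Predicted \ Actual    Churned    Not churned
Churned               28%        6%
Not churned           6%         60%
Not churned in total: 66%
Discount is provided (B)
Predicted \ Actual    Churned    Not churned
Churned               25%        9%
Not churned           6%         60%
Not churned in total: 69%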
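The arithmetic behind the 3% and 10.7% figures can be reproduced as follows. This is a minimal sketch: the dictionary layout and the helper `retention` are assumptions made for illustration, while the percentages are the ones from the two AB-test tables.

```python
# Shares of all users keyed by (predicted, actual); values from the AB-test tables.
group_a = {  # discount is not provided
    ("churned", "churned"): 0.28, ("churned", "stayed"): 0.06,
    ("stayed", "churned"): 0.06, ("stayed", "stayed"): 0.60,
}
group_b = {  # discount is provided to users predicted to churn
    ("churned", "churned"): 0.25, ("churned", "stayed"): 0.09,
    ("stayed", "churned"): 0.06, ("stayed", "stayed"): 0.60,
}

def retention(group):
    """Share of all users who actually stayed, regardless of the prediction."""
    return sum(share for (_, actual), share in group.items() if actual == "stayed")

# Real effect of the experiment: extra users retained in group B.
uplift = retention(group_b) - retention(group_a)

# Discount acceptance (DA): retained users relative to the true positives in A.
da = uplift / group_a[("churned", "churned")]

print(f"uplift = {uplift:.1%}, DA = {da:.1%}")
```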
20. The right business metric: what does Business get as a result?
A: 66%⋅LTV
B: 69%⋅LTV - 9%⋅(amount of discount)
The real effect (B - A): 3%⋅LTV - 9%⋅(amount of discount)
21. Let’s assume:
LTV = 115 €
Amount of discount = 5 €
ML implementation = 30,000 €
Is the Experiment cost-effective?
What is the minimum number of users required to recover the investment?
22. Answer: 10,000 users
Simple calculation:
3%⋅115 € - 9%⋅5 € = +3 €
That is 3 € from each user.
30,000 € / 3 € = 10,000 users
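The break-even calculation can be checked with a few lines of code. This is a sketch under the assumptions stated on the slide (LTV, discount, implementation cost); the variable names are illustrative, and `Decimal` is used only to keep the money arithmetic exact.

```python
from decimal import Decimal
import math

# Assumptions from the slide.
ltv = Decimal("115")          # lifetime value of a client, in euros
discount = Decimal("5")       # amount of discount, in euros
ml_cost = Decimal("30000")    # one-off ML implementation cost, in euros

# Shares from the AB-test: 3% extra users retained, 9% of users paid a discount.
retained_share = Decimal("0.03")
discounted_share = Decimal("0.09")

# Real effect (B - A) per user: 3%*LTV - 9%*(amount of discount).
gain_per_user = retained_share * ltv - discounted_share * discount

# Minimum number of users needed to recover the implementation cost.
break_even_users = math.ceil(ml_cost / gain_per_user)

print(gain_per_user, break_even_users)
```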
23. Correct Business-Metric
(LTV - amount of discount)⋅TP⋅DA - (amount of discount)⋅FP → max
Where TP and FP are the true positive and false positive shares from the confusion matrix, and DA is the percentage of users who accept the discount among those who intended to leave our service.
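The metric can be written as a small function and maximized over candidate models or decision thresholds. In the sketch below, the function name `business_value` and the three candidate (TP, FP) pairs are hypothetical; only the middle pair (28%, 6%), DA = 3%/28%, LTV = 115 € and the 5 € discount come from the slides.

```python
def business_value(tp, fp, da, ltv, discount):
    """Expected gain per user: (LTV - discount)*TP*DA - discount*FP."""
    return (ltv - discount) * tp * da - discount * fp

LTV = 115.0
DISCOUNT = 5.0
DA = 3 / 28  # discount acceptance measured in the AB-test

# Hypothetical operating points: threshold -> (TP share, FP share).
candidates = {
    0.3: (0.31, 0.20),
    0.5: (0.28, 0.06),  # the confusion matrix from the slides
    0.7: (0.20, 0.02),
}

values = {
    threshold: business_value(tp, fp, DA, LTV, DISCOUNT)
    for threshold, (tp, fp) in candidates.items()
}
best_threshold = max(values, key=values.get)

print(values, best_threshold)
```

For the matrix from the slides the function gives about 3 € per user, which agrees with the 3%⋅LTV - 9%⋅(amount of discount) calculation.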
25. We Can Calculate!
Suppose we give the discount to everyone, without any ML.
LTV: (28% + 6%)⋅10.7% ≈ 3.6% [DA = 3%/28% ≈ 10.7%]
(in other words, we save an additional 0.6% of LTV on top of the 3% from the AB-test)
Amount of discount: -0.6% - 9% - 60% = -69.6%
Expressed in money (using the unrounded shares 3.64% and 69.64%):
3.6%⋅LTV - 69.6%⋅(discount) = 3.6%⋅115 € - 69.6%⋅5 € ≈ +0.71 € per user.
And the ML implementation costs 0 €!
So if we give a discount to everyone, we earn +0.71 € per user. That is more attractive than giving no discount to anyone at all!
This strategy should be treated as a baseline.
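The "discount to everyone" baseline can be recomputed from the confusion matrix shares. This is a rough sketch that assumes, as the slide does, that the discount acceptance measured on predicted churners applies equally to all churners; the variable names are illustrative.

```python
LTV = 115.0
DISCOUNT = 5.0

# Shares from the confusion matrix without a discount (group A).
tp, fp, fn, tn = 0.28, 0.06, 0.06, 0.60

# Discount acceptance measured in the AB-test: 3% retained out of 28%.
da = 0.03 / tp

# Everyone gets the offer, so every churner (TP + FN) may accept it.
saved_share = (tp + fn) * da

# The discount is paid by everyone who stays: saved churners plus loyal users.
discounted_share = saved_share + fp + tn

gain_per_user = saved_share * LTV - discounted_share * DISCOUNT

print(f"saved = {saved_share:.2%}, discounted = {discounted_share:.2%}, "
      f"gain = {gain_per_user:.2f} EUR per user")
```

The result depends on rounding: with the shares rounded to 3.6% and 69.6% the gain comes out near 0.66 €, while the unrounded shares give the 0.71 € quoted on the slide.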
27. Discount Acceptance (DA) Forecast
That's it. Churn might be predicted perfectly, but if the business has no tool to act on the prediction, the result might be zero.
First, we need to estimate whether we can influence the users who are about to churn.
That might matter even more than a good ML prediction.
28. The right Design of the Experiment
1. Build a confusion matrix.
2. Take a sub-sample (N% of users) and provide a discount to everyone in it.
3. Calculate the percentage of users retained after the discount (in other words, the error offset ∆FP/(TP+FN)). This is our DA.
4. Optimize the ML algorithm using the results from step 3.
5. Compare the ML algorithm with 2 (!) baselines (provide a discount to everyone; do not provide it to anyone).
6. Calculate the real effect of implementing the algorithm.
AND ONLY IF IT PAYS OFF:
7. Implement the ML algorithm.
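Steps 5 to 7 amount to comparing three strategies at a given user-base size. The sketch below reuses the per-user gains from the earlier slides (3 € for the ML-targeted discount, 0.71 € for the discount to everyone) and the 30,000 € implementation cost; the function names and the two example user counts are assumptions for illustration only.

```python
def net_profit(gain_per_user, users, fixed_cost=0.0):
    """Total gain over the user base minus any one-off cost."""
    return gain_per_user * users - fixed_cost

def best_strategy(users, ml_gain=3.0, everyone_gain=0.71, ml_cost=30_000.0):
    """Pick the most profitable of the ML algorithm and the two baselines."""
    options = {
        "no discount": 0.0,
        "discount to everyone": net_profit(everyone_gain, users),
        "ML-targeted discount": net_profit(ml_gain, users, ml_cost),
    }
    return max(options, key=options.get)

# Hypothetical user-base sizes.
small = best_strategy(10_000)
large = best_strategy(20_000)
print(small, large)
```

With these numbers, recovering the implementation cost is not enough: the ML algorithm must also beat the free "discount to everyone" baseline, which requires a larger user base than the plain break-even point.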
29. To Sum up
• Test whether you can retain a customer before you start churn prediction;
• Choose the right metrics for churn prediction;
• Estimate the project's cost-efficiency before you launch it.