2. I will explain…
• Abstract
• Prepaid Lines
• Background
• RFM
• CLV
• Pareto/NBD
• Offer Management
• Data Set Definition
• Benchmark Results
• With Log Reg
• Pareto/NBD Model Results
• Proposed Offer Model & Results
• Conclusion
3. Abstract
• In today's challenging market conditions, mobile operators have to
manage their costs while proposing offers to their customers.
• A mathematical model was developed for offer management, which
uses:
• Expected future transactions and customer lifetime value, calculated
with a predictive model based on refill transactions.
• Logistic regression was also used to benchmark the «Refill
Features» against the «Subscriber Features».
5. Why?
There are 2 subscription types for mobile users:
• Prepaid : Subscribers pay before usage with discrete refill actions
• Postpaid : Subscribers pay after usage on a monthly basis
Postpaid subscriptions are preferable to prepaid ones.
However, prepaid subscribers are also valuable in today's market
conditions, where the penetration rate is high.
6. Mobile Prepaid Lines
• Prepaid lines are as important as
postpaid lines.
• Over time, the two converge.
Prepaid: 36.2 million lines, ARPU 14.6 TL
Postpaid: 40.3 million lines, ARPU 40.8 TL
• Prepaid lines are potential postpaid lines.
• A correctly calculated CLV is valuable
information about prepaid subscribers.
* https://www.btk.gov.tr/File/?path=ROOT%2f1%2fDocuments%2fSayfalar%2fPazar_Verileri%2f2017-Q1.pdf
7. Mobile Subscriber Penetration
Excluding the 0-9 age population, the penetration of mobile
subscribers is 108% at the end of the 2nd quarter of 2017.
* https://www.btk.gov.tr/File/?path=ROOT%2f1%2fDocuments%2fSayfalar%2fPazar_Verileri%2f2017-Q1.pdf
Figure: Population and % Penetration
8. Prepaid Subscribers Behavior
• Require less paperwork
• Advantage of controlling the usage cost
• Preferred more than postpaid lines
• Operators never know when a subscriber will end the relationship:
• Non-contractual relationship
• Discrete purchase transactions
• Drop-out period unknown
• Can we define the behaviour of prepaid subscribers with RFM data?
9. Literature Review
«RFM and CLV: Using iso-value curves for customer base analysis», Fader et al., 2005, which uses the
assumptions of the study «Counting your customers: Who are they and what will they do next?», Schmittlein
et al., 1987
NBD = Poisson × Gamma : Customer Purchase Transactions
• Poisson : While active, customers can place orders whenever they want to.
• Gamma : Purchase rates vary independently across customers.
Pareto = Exponential × Gamma : Customer Aliveness
• Exponential : Customers go through two stages in their “lifetime” with a specific firm: they are active for
some period of time, then become permanently inactive. The point at which a customer becomes inactive
is unobserved by the firm.
• Gamma : “Dropout” rates vary independently across customers.
Monetary = Gamma × Gamma
• Gamma : The amount of a customer’s given transaction varies randomly around his average transaction
value.
• Gamma : The distribution of average transaction values across customers is independent of the
transaction process.
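For reference, the two mixtures above can be written out as follows. The symbols r, α (gamma shape and rate of the purchase rate λ) and s, β (gamma shape and rate of the dropout rate µ) follow the usual Pareto/NBD notation; they are assumptions here, since the slides do not name them.

```latex
% NBD: Poisson purchase counts over (0, t], purchase rate \lambda ~ Gamma(r, \alpha)
P(X = x \mid r, \alpha, t)
  = \int_0^\infty \frac{(\lambda t)^x e^{-\lambda t}}{x!}\,
    \frac{\alpha^r \lambda^{r-1} e^{-\alpha\lambda}}{\Gamma(r)}\, d\lambda
  = \frac{\Gamma(r + x)}{\Gamma(r)\, x!}
    \left(\frac{\alpha}{\alpha + t}\right)^{r}
    \left(\frac{t}{\alpha + t}\right)^{x}

% Pareto: exponential lifetime \tau, dropout rate \mu ~ Gamma(s, \beta)
P(\tau > t \mid s, \beta)
  = \int_0^\infty e^{-\mu t}\,
    \frac{\beta^s \mu^{s-1} e^{-\beta\mu}}{\Gamma(s)}\, d\mu
  = \left(\frac{\beta}{\beta + t}\right)^{s}
```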
10. Literature Review
«A dynamic model for valuing customers: A case study», H.-S. Hwang, 2015
There are many studies about CLV in the literature.
According to this study, CLV models can be categorized into four main
focus areas and 8 different models.
Following Fader's study, we also used a non-contractual, discrete
approach in our CLV calculation.
Focus | Category | Model
Structural Model | Customer Unit | Individual model; Segment (Customer base) model
Structural Model | Prediction data | Retrospective model; Prospective model
Structural Model | Transaction | Contractual model; Non-Contractual model
Structural Model | Purchase cycle | Discrete model; Continuous model
Strategic Model | Strategic use of CLV in management |
Normative Model | Relationship between Duration and cost |
Analytic Model | Resource allocation (Budget allocation) / Pricing |
12. RFM Data
Quintiles
Scores from 555 to 111 are assigned to all customers.
Customers in the 555 RFM bucket are expected to have the highest
probability of responding to campaigns.
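A minimal sketch of quintile-based RFM scoring, to make the 555 to 111 codes concrete. The function names and the tie handling (ties broken by input order) are assumptions for illustration, not the method used in the study.

```python
def quintile_scores(values, higher_is_better=True):
    """Return a score from 1 to 5 per value; 5 marks the best fifth."""
    n = len(values)
    # Rank from worst to best, so that rank 0 gets score 1.
    order = sorted(range(n), key=lambda i: values[i],
                   reverse=not higher_is_better)
    scores = [0] * n
    for rank, i in enumerate(order):
        scores[i] = 1 + (rank * 5) // n
    return scores


def rfm_codes(recency_days, frequency, monetary):
    """Combine the three quintile scores into codes such as '555'."""
    r = quintile_scores(recency_days, higher_is_better=False)  # recent is better
    f = quintile_scores(frequency)
    m = quintile_scores(monetary)
    return [f"{a}{b}{c}" for a, b, c in zip(r, f, m)]


# Hypothetical data for five customers.
codes = rfm_codes([1, 10, 20, 30, 40], [9, 7, 5, 3, 1], [50, 40, 30, 20, 10])
print(codes)
```

The first customer refilled most recently, most often and for the largest amount, so it lands in the 555 bucket; the last one lands in 111.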
13. Some Distributions used in the model
Poisson : λ is the expected number of occurrences, x = 0, 1, 2, …
$f(x) = \frac{e^{-\lambda} \lambda^{x}}{x!}$
Exponential : µ is the mean time between events in a Poisson process, µ = 1/λ
$f(x) = \lambda e^{-\lambda x}$
Gamma : Sum of n independent exponential variables (α = n), x > 0
$f(x) = \frac{\lambda^{\alpha} x^{\alpha - 1} e^{-\lambda x}}{\Gamma(\alpha)}$
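The three densities can be checked numerically with a short sketch that uses only the standard library. The function names are illustrative.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson count with expected value lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def exponential_pdf(x, lam):
    """Density of the waiting time between Poisson events (mean 1/lam)."""
    return lam * math.exp(-lam * x)


def gamma_pdf(x, alpha, lam):
    """Density of a gamma variable with shape alpha and rate lam, x > 0."""
    return lam ** alpha * x ** (alpha - 1) * math.exp(-lam * x) / math.gamma(alpha)


# With alpha = 1 the gamma density reduces to the exponential density.
print(gamma_pdf(0.5, 1, 2.0), exponential_pdf(0.5, 2.0))
```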
14. Poisson / Exponential / Gamma
Poisson Distribution : Purchase count while alive, λ
Exponential Distribution : Lifetime of customer, µ
Gamma Distribution : Across customers, purchase rates and dropout rates follow gamma distributions
Figure: Refill Transactions
15. Pareto / NBD
Purchase Count
Follows a Poisson distribution: this corresponds to a random
purchasing pattern with constant rate λ.
Lifetime
The customer lifetime is modeled with an exponential
distribution with latent parameter µ.
Monetary Value
The monetary value of each purchase is modeled with a
gamma distribution with parameters ρ and ν.
Heterogeneity Across Customers
The Pareto/NBD model assumes gamma distributions for the
parameter priors.
NBD = Poisson × Gamma, Pareto = Exponential × Gamma (Schmittlein 1987)
Gamma × Gamma Monetary Model (Fader 2005)
16. Refill Data Sample (RFM Data)
The only customer-level information required by this
model is recency and frequency. It is represented by the
notation (X = x, t_x, T):
x Transaction count in (0, T]
t_x Time of the last transaction, in weeks
(0 < t_x ≤ T)
T Max weeks (length of the observation period)
Tenure Calculated based on the transaction date
SUBSCRIBER_ID X T_X T_CAL
2559188 2 31.57142857 34.71428571
13664295 3 1.428571429 1.714285714
130628981 2 11 34.85714286
130664562 1 7.714285714 14.71428571
130753128 3 12.85714286 38.28571429
130885479 3 28.57142857 32.57142857
130912253 3 28.28571429 39.71428571
130929251 6 42.85714286 47.71428571
Transactions on the same day are
merged into a single
transaction for the model.
The gamma parameters are
estimated from the training
period data by the maximum
likelihood method.
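A minimal sketch of how one subscriber's refill dates could be turned into the (x, t_x, T) summary. It assumes that time is measured from the subscriber's first refill day, so x counts the later refill days, and that t_x is set to 0 when there is no later refill. The function name and these conventions are assumptions, not taken from the study.

```python
from datetime import date


def rfm_summary(refill_dates, period_end):
    """Return (x, t_x, T) in weeks for one subscriber."""
    days = sorted(set(refill_dates))  # same-day refills merge into one
    first = days[0]                   # assumed time origin
    repeats = days[1:]
    x = len(repeats)
    t_x = (repeats[-1] - first).days / 7 if repeats else 0.0
    T = (period_end - first).days / 7
    return x, t_x, T


# Hypothetical subscriber with two refills on the same day.
refills = [date(2016, 1, 1), date(2016, 1, 1), date(2016, 1, 8), date(2016, 1, 22)]
print(rfm_summary(refills, date(2016, 1, 29)))
```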
17. CLV
CLV can be simply described as the
net present value of the future
cash flows associated with a
customer.
Example with a 5% discount rate:
Year 1st 2nd 3rd 4th 5th
Future Value 1000 $ 1000 $ 1000 $ 1000 $ 1000 $
Present Value 952 $ 907 $ 863 $ 822 $ 783 $
NPV (Net Present Value) = 4,329 $
CLV = margin × (revenue / transaction) × DET, where DET is the discounted expected transactions
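The discounting example can be reproduced with a few lines. This is a sketch that assumes end-of-year cash flows; the function names are illustrative.

```python
def present_values(cash_flows, rate):
    """Discount each end-of-year cash flow back to today."""
    return [cf / (1 + rate) ** year
            for year, cf in enumerate(cash_flows, start=1)]


def npv(cash_flows, rate):
    """Net present value: the sum of the discounted cash flows."""
    return sum(present_values(cash_flows, rate))


flows = [1000] * 5  # 1000 $ per year for five years
print([int(v) for v in present_values(flows, 0.05)])
print(round(npv(flows, 0.05)))
```

Truncating each present value to whole dollars gives the yearly figures on the slide, and the rounded total matches the NPV.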
19. Offer Management Process
Figure: offer management process. Diagram labels: Data Warehouse (DWH); Offer Management; Campaign Response Collection; Central Decision Hub; Campaign Management System; Optimization Rules and Variables (Maximize Response, Maximize Profit); Sending Lists; Optimized Sending Lists; Daily Interaction Plan; Response Calculation; Analytic Results; Subscriber Info.
20. Main Data Set
Subscriber Activation & Refill Date Period
2-year period: 1st of Feb 2015 to 31st of Jan 2017
Segment info for each subscriber is included
21. Main Data Set Preparation Method
• The data is stored in an Oracle Database.
• The ora_hash function, keyed on subscriber id, is used to select the subscriber sets.
• 15 buckets are created with this hash function.
• The subsets are selected randomly from these buckets.
• Subscribers are distinct across the selected subsets.
• Only 6 of them were chosen for the analysis.
• The transactions are selected based on these subscriber
sets.
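The bucketing idea can be sketched outside the database as follows. This is an illustration only: it uses SHA-256 from the standard library in place of Oracle's ora_hash, so the bucket numbers will not match Oracle's, and the function names and the seed are hypothetical.

```python
import hashlib
import random


def bucket_of(subscriber_id, n_buckets=15):
    """Map a subscriber id to a stable bucket in 0..n_buckets-1."""
    digest = hashlib.sha256(str(subscriber_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets


def pick_subsets(subscriber_ids, n_buckets=15, n_chosen=6, seed=0):
    """Split ids into hash buckets, then pick some buckets at random."""
    buckets = {b: [] for b in range(n_buckets)}
    for sid in subscriber_ids:
        buckets[bucket_of(sid, n_buckets)].append(sid)
    chosen = random.Random(seed).sample(range(n_buckets), n_chosen)
    return {b: buckets[b] for b in chosen}


subsets = pick_subsets(range(1000))
print({b: len(ids) for b, ids in subsets.items()})
```

Because each id hashes to exactly one bucket, the chosen subsets cannot share a subscriber.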
22. The basic statistics
Main Set : 2.7M
6 sub sets taken
Sub Set : ~50K
Figure: two box plots, labelled «Trans Cnt» and «Days Between».
Box plot 1: Min 1, Q25 ~19, Median ~25, Q75 ~30, Max ~150+, Variance 129, SD 11.3
Box plot 2: Min 1, Q25 ~10, Median ~25, Q75 ~35, Max ~550+
Refill Transactions Main & Sub Data Sets
Set Name Trans Cnt Min Max Range Mean Var Stddev
Main 2,700,349 1 360 359 24 129 11
Set 1 53,320 2 180 178 24 122 11
Set 2 52,683 2 360 358 24 123 11
Set 3 53,091 1 180 179 24 123 11
Set 4 51,920 1 180 179 24 124 11
Set 5 53,900 1 360 359 25 130 11
Set 6 53,248 2 360 358 24 147 12
23. Refill Amount Quintiles Main & Sub Sets
Set # Subs Q5 Q25 Median Q75 Q95
Main 2,700,349 10 19 25 30 40
Set 1 53,320 10 19 25 30 40
Set 2 52,683 10 19 25 30 40
Set 3 53,091 10 19 25 30 40
Set 4 51,920 10 19 25 30 45
Set 5 53,900 10 19 25 30 45
Set 6 53,248 10 19 25 30 40
24. Selected Subset
Transaction Count 53,248
Distinct Subscriber Count 6,480
Minimum Refill Date 01.03.2015
Maximum Refill Date 31.01.2017
Days Between Dates 708
26. Logistic Regression Features
• RFM Features
• Usage behaviors (Data, SMS usage, VAS usage)
• Call Center Transactions
• SNA Calculations
• Refill Transactions
• Equipment information
• Subscription information (ARPU, Payment type, Customer_type, tariff etc.)
• Number portability transactions
• Call Behavior (Incoming & Outgoing call counts)
395
27. Benchmark Logistic Reg. For RFM & Features
Result 1: Log Regression with RFM
Result 2: Log Regression with Extra Features, 10 features
Result 3: Log Regression with Extra Features and RFM, 10 features + RFM
Result 4: Log Regression with Extra Features, 50 features
Result 5: Log Regression with Extra Features and RFM, 50 features + RFM
28. Cutoff Result Set Name Accuracy TP Rate FP Rate Precision
0.5
Result1 RFM 95% 98% 24% 96%
Result2 Var10 91% 100% 64% 90%
Result3 Var10 + RFM 95% 100% 31% 95%
Result4 Var50 92% 100% 56% 91%
Result5 Var50 + RFM 96% 100% 28% 95%
0.6
Result1 RFM 94% 97% 22% 96%
Result2 Var10 92% 100% 50% 92%
Result3 Var10 + RFM 96% 100% 24% 96%
Result4 Var50 93% 99% 41% 93%
Result5 Var50 + RFM 96% 100% 23% 96%
0.7
Result1 RFM 93% 95% 20% 96%
Result2 Var10 94% 99% 34% 94%
Result3 Var10 + RFM 97% 100% 20% 97%
Result4 Var50 95% 99% 30% 95%
Result5 Var50 + RFM 97% 100% 18% 97%
0.8
Result1 RFM 92% 93% 17% 97%
Result2 Var10 94% 98% 29% 95%
Result3 Var10 + RFM 97% 99% 15% 97%
Result4 Var50 95% 98% 24% 96%
Result5 Var50 + RFM 97% 99% 14% 98%
0.9
Result1 RFM 86% 86% 13% 97%
Result2 Var10 95% 98% 24% 96%
Result3 Var10 + RFM 98% 99% 12% 98%
Result4 Var50 95% 98% 20% 97%
Result5 Var50 + RFM 98% 99% 11% 98%
Benchmark Logistic Reg. For RFM & Features
• As the cutoff increases, the accuracy of the RFM results
decreases.
• For cutoffs 0.5 and 0.6, the accuracy with RFM is higher
than with the selected features alone.
• However, for 0.7, 0.8 and 0.9 the selected features gave better
results than the RFM variables.
• The model gave stronger results when run with the
features and RFM variables together.
• This suggests that RFM data is sufficient for analysing the
churn behaviour of a subscriber.
• Depending on the preferred cutoff, RFM data alone can
be used for the logistic regression.
Accuracy = (TP + TN) / Total Population
TP Rate = TP / (TP + FN) (Recall)
FP Rate = FP / (FP + TN) (Fallout)
Precision = TP / (TP + FP)
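The four measures can be computed from a confusion matrix with a short sketch. The counts in the example are made up for illustration and are not results from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, TP rate (recall), FP rate (fallout) and precision."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "tp_rate": tp / (tp + fn),
        "fp_rate": fp / (fp + tn),
        "precision": tp / (tp + fp),
    }


# Hypothetical counts: 100 actual positives and 100 actual negatives.
metrics = classification_metrics(tp=90, fp=5, tn=95, fn=10)
print(metrics)
```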
29. Pareto/NBD Model Results
With segment and tenure
information, the model
accuracy is much higher than
without this information.
Cutoff Result Set Name Accuracy TP Rate FP Rate Precision
0.5
Result6 w/o Segment + Tenure 88% 98% 69% 89%
Result7 Segment 95% 99% 29% 95%
Result8 Segment + Tenure 94% 98% 29% 95%
0.6
Result6 w/o Segment + Tenure 88% 97% 64% 90%
Result7 Segment 96% 99% 18% 97%
Result8 Segment + Tenure 96% 98% 18% 97%
0.7
Result6 w/o Segment + Tenure 89% 97% 58% 91%
Result7 Segment 96% 98% 13% 98%
Result8 Segment + Tenure 96% 97% 13% 98%
0.8
Result6 w/o Segment + Tenure 90% 96% 46% 92%
Result7 Segment 97% 98% 10% 98%
Result8 Segment + Tenure 96% 97% 10% 98%
0.9
Result6 w/o Segment + Tenure 91% 93% 18% 97%
Result7 Segment 97% 97% 6% 99%
Result8 Segment + Tenure 96% 96% 6% 99%
30. Pareto /NBD Model Results For Expected Future Transactions
Figure: expected future transactions for the MASS, Youth and Other segments, in two rows labelled «Only Segment» and «Segment & Tenure».
Annotations on the figure:
• Model failed for the test period
• With tenure information the model gave expected results
31. Proposed Offer Model
• clv_base_i = Base Customer Lifetime Value of customer i. This amount should be
interpreted as the expected net present worth of customer i in the following cases: i) the
customer receives no offer or ii) the customer refuses the proposed offer,
• p_i = Churn probability of customer i before receiving the offer,
• o_i = Upper bound of discount for customer i,
• w_j = Upper bound of offer j,
• c_j = Discount rate of offer j,
• f_i = Monthly payment of customer i to the company before receiving the offer,
• f_avg = Monthly average payment of customers,
• β_ij = Probability of customer i accepting offer j,
• ρ_ij = Churn probability of customer i after taking offer j,
• clv_ij = Modified Customer Lifetime Value of customer i after taking offer j,
• t = Monthly budget for discount.
33. Offer Model
• GAMS was used with the CPLEX solver for the model.
• For 6,480 customers, the model finished in 23 sec.
• When the number of customers was increased to:
• 115,932, the model time was 78 sec
• 225,387, the model time was 152 sec
• With these customer numbers, the model can be said to perform well.
34. Offer Model Scenarios
• Different scenarios were tested to maximize the objective function.
• The budget and offer limit were varied to find an efficient budget and offer limit.
• If no offer limit is applied in the model, more customers accept offers than
in the model with a limited offer.
• In the last scenario, only customers with churn probability > 0.5 were put into the
model, and most of them accepted the offer.
Budget Limit Offer Limit Cutoff # of Customers # of Received
Scenario1 10,000 300 0 6,480 741
Scenario2 10,000 inf 0 6,480 3,398
Scenario3 5,000 300 0 6,480 532
Scenario4 5,000 inf 0 6,480 1,498
Scenario5 1,000 300 0 6,480 168
Scenario6 1,000 inf 0 6,480 168
Scenario7 10,000 inf 0.5 781 562
35. Offer & Churn Distributions
• In Scenario1, mostly churner customers accept the offers.
• If the offer limit is removed from the constraints, almost all the budget is spent on
non-churner customers. This is also true with the limited budget in Scenario3 and 4.
• If the budget gets too narrow, only the customers with low churn probability get offers.
• In the last scenario, only churner customers are put into the model and 75% accept the
proposed offers.
Discount Offers
Offer1 10% Offer2 20% Offer3 30% Offer4 40% Offer5 50% Total Subs #
Scenario1 300 57 39 60 285 741
Scenario2 3,352 10 2 11 23 3,398
Scenario3 300 22 13 35 162 532
Scenario4 1,495 3 1,498
Scenario5 168 168
Scenario6 168 168
Scenario7 121 57 39 60 285 562
Churn Rates
p<0.5 p>=0.5
Scenario1 294 447
Scenario2 3,331 67
Scenario3 294 238
Scenario4 1,485 13
Scenario5 168
Scenario6 168
Scenario7 562
36. Which Option?
• There are 2 options to choose from:
• The first is to increase the loyalty of existing customers and
the revenue of the company.
• The second is to keep the churner customers and try
to lower acquisition costs.
37. Conclusion
The Pareto/NBD model can be used to predict the
expected transactions of a subscriber for CLV
calculation.
RFM data is sufficient for prepaid
subscriber analysis.
Depending on the budget limit, a mobile
operator can choose between increasing loyalty and
keeping churner customers.