2. Customer Activation
Focus on Equity
Objectives
1. To predict activity levels of each customer in the near future (Current Model: 90 days)
2. To profile customer activity over time (i.e., activity states with durations)
3. To determine the recommendations to activate customers
Problem Dimensions
• People
– Who are the people likely to be inactive in the next month?
• Activity State
– What are the different states in customer life cycle?
– What is the customer behaviour in a particular state?
• State Duration
– How long will the customer stay in a particular state?
– What will be the transition time for a particular customer?
• Recommendation
– What strategy will be effective in preventing a particular customer from becoming inactive?
– What strategy can bring a customer back from the inactive state to the active state?
3. Analysis Process
Pipeline: ETL → Merge → Filter → Visualize
• Storage: ACMIIL (Trades)
• Data formats: dates, categories, numeric value ranges, etc.
• File formats: comma-, tilde-, or tab-delimited
• Customer types: Individual vs. Institutions, etc.
• Transaction types: Buying/Selling, first-time inactive, current inactive
• Identifiers: Client Code, CommonClient Code
• Timeline: Daily, Monthly
• Aggregates: counts, sums of EQ buy, sums of EQ sell
• Distributions: inactive period behaviour, life cycle of customer
• Comparative views: first-time inactive vs. current inactive, inactive vs. active customer life cycle
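The merge and aggregate steps listed for the analysis process can be illustrated with a small sketch. It reads a delimited trade file and builds one record per client and month with a trade count and the sums of EQ buy and EQ sell amounts. The column names (`client_code`, `trade_date`, `side`, `amount`), the date format, and the `B`/`S` side codes are assumptions made for illustration; the actual trade file layout is not given in these slides.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def monthly_aggregates(lines, delimiter=","):
    """Aggregate trades into one record per (client, month)."""
    agg = defaultdict(lambda: {"n_trades": 0, "eq_buy": 0.0, "eq_sell": 0.0})
    for row in csv.DictReader(lines, delimiter=delimiter):
        # Hypothetical column names and date format; adapt to the real file.
        month = datetime.strptime(row["trade_date"], "%Y-%m-%d").strftime("%Y-%m")
        rec = agg[(row["client_code"], month)]
        rec["n_trades"] += 1
        amount = float(row["amount"])
        if row["side"] == "B":  # assumed code for a buy trade
            rec["eq_buy"] += amount
        else:                   # everything else is treated as a sell
            rec["eq_sell"] += amount
    return dict(agg)


# Tilde-delimited toy input, one of the file formats mentioned above.
sample = io.StringIO(
    "client_code~trade_date~side~amount\n"
    "C001~2014-01-03~B~1000\n"
    "C001~2014-01-20~S~400\n"
    "C001~2014-02-11~B~250\n"
    "C002~2014-01-15~B~700\n"
)
result = monthly_aggregates(sample, delimiter="~")
for key in sorted(result):
    print(key, result[key])
```

Passing the delimiter as a parameter lets the same function handle comma-, tilde-, and tab-delimited files.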
4. Activity Modelling - Outline
Trades Data
• Summary
• Discovery
Model
• Setup
• Application
• Code
• Results
• Setup - Next Steps
• Application – Next Steps
Future Work
5. Data: Summary
DQ Issues
• Text data: mismatches
• Numerical data: unreal ranges
• Numerical data: spurious values
• Statistical measures (e.g., mean) errors
– Units field has negative values
– Too large or small values
Sizing for technology
• ~7M EQ and ~1M DER trades per year
• ~100k trading customers currently on platform, and one-third transacted in the last 6 months
Analysis caution
• Data distributions highly skewed, e.g., a few high-amount transactions by one or two individuals
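Two of the data-quality issues above (negative values in the units field, and values that are too large) can be checked with a few lines of code. The sketch below is one possible approach, not the checks actually used in this analysis: the field names `units` and `amount` and the factor of 100 are hypothetical. It compares amounts against the median rather than the mean, because the mean is distorted by the skew noted under "Analysis caution".

```python
from statistics import median


def dq_flags(trades, amount_factor=100.0):
    """Return (row_index, issue) pairs for two simple data-quality checks."""
    flags = []
    # The median is robust to a handful of extreme transactions.
    typical = median(abs(t["amount"]) for t in trades)
    for i, t in enumerate(trades):
        if t["units"] < 0:
            flags.append((i, "negative units"))
        # Assumed rule: flag amounts more than amount_factor times the median.
        if typical > 0 and abs(t["amount"]) > amount_factor * typical:
            flags.append((i, "amount far above median"))
    return flags


trades = [
    {"units": 10, "amount": 1_000.0},
    {"units": -5, "amount": 1_200.0},
    {"units": 20, "amount": 900.0},
    {"units": 15, "amount": 5_000_000.0},
]
flags = dq_flags(trades)
print(flags)
```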
6. Data: Discovery
Inactivity Count
[Figure: histogram of inactivity count per client (x-axis: inactivity count; y-axis: frequency)]
Insights
• All clients have been inactive at least once for greater than 91 days
Data Particulars
Data Duration: 2012 Apr - 2015 Sep
Each Row: Client-month
Client Category: Individual and HUF
# of Rows: 366421
# of Columns: 31
# of Unique Clients: 49444
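The inactivity count and the average inactivity duration per client can be derived from the dates on which a client traded. The following is a minimal sketch under stated assumptions: an inactive period is taken to be any gap of more than 91 days between consecutive trade dates, and the stretch from the last trade to the end of the observation window is counted as a gap as well. The function name and the toy dates are illustrative only.

```python
from datetime import date

INACTIVITY_DAYS = 91  # threshold quoted in the slides


def inactive_periods(trade_dates, observation_end):
    """Return the lengths (in days) of all gaps longer than the threshold."""
    # Append the end of the observation window so the trailing gap is measured too.
    points = sorted(set(trade_dates)) + [observation_end]
    gaps = [(later - earlier).days for earlier, later in zip(points, points[1:])]
    return [g for g in gaps if g > INACTIVITY_DAYS]


trades = [date(2013, 1, 10), date(2013, 2, 1), date(2013, 12, 1), date(2014, 1, 5)]
periods = inactive_periods(trades, observation_end=date(2014, 6, 30))
count = len(periods)
average = sum(periods) / count if count else 0.0
print(count, periods, average)
```

Running this per client gives the two quantities whose histograms are discussed in the discovery slides.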
7. Data: Discovery
Average Inactive Duration
[Figure: histogram of average inactivity duration (x-axis: average inactivity duration in days; y-axis: frequency); peak marked at 300 days]
Insights
• The histogram of average inactivity duration has its maximum frequency at 300 days
8. Data: Discovery
First Time Inactive vs. Currently Inactive
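[Figure: two histograms of customer vintage, one for first-time inactive customers and one for currently inactive customers (x-axis: vintage in years; y-axis: frequency); annotated values: 5 yrs and 7 yrs]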
Insights
Currently inactive customers are a mix of first-time inactive customers and customers in later inactive periods, which makes it harder to study current inactivity alone. Each activity level or state therefore needs to be studied separately.
13. Data: Discovery Insights
• All clients have been inactive (> 91 days inactivity) at least once
• The most likely inactivity duration is ~300 days, i.e., once a customer becomes inactive, there is a high chance of a long inactivity period
• Customer behaviour differs before the various inactive states
• Each inactive state (i.e., first time, second time, etc.) needs to be modelled separately
• There are different trend curves that customers follow over their life cycles
• The trend curves may be grouped together into a finite set of
representative trend curves
• All the above may be modelled using a State-space approach
• A simple binary approximation is the Logistic regression model
14. Test Data
Model: Setup
Data Set Creation
Three Year Trade Data (total available data)
• 60% used for training the model
• 20% used for validating the model
• 20% used for testing the model
[Figure: customer timeline starting at the account opening date, with active periods labelled 0 and inactive periods labelled 1; the first-time inactive period and a later inactive period are marked]
Inactivity: defined as 0 transactions in 91 consecutive days
Hypothesis: A customer's state can be predicted using transactions data
Logistic Regression Model
• To find predictive variables
• To predict the next state of the customer
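As an illustration of how a logistic regression model maps transaction features to a probability of the inactive state, here is a small self-contained sketch. It is not the model fitted in this work: the two features (trades in the last month, months since the last trade), the toy data, the learning rate, and the number of epochs are all assumptions chosen for the example, and plain batch gradient descent stands in for whatever fitting routine was actually used.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    """Probability of state 1 (inactive); w[0] is the bias term."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights by batch gradient descent on the log-loss."""
    w = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            err = predict_proba(w, x) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w


# Hypothetical features: [trades in last month, months since last trade].
rows = [[8, 0], [6, 0], [5, 1], [7, 0], [0, 3], [1, 2], [0, 4], [1, 3]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = inactive in the next period
weights = fit_logistic(rows, labels)
print(predict_proba(weights, [7, 0]), predict_proba(weights, [0, 4]))
```

The sign and size of each fitted weight indicate how strongly a variable pushes the prediction towards the inactive state, which is the sense in which the model helps to find predictive variables.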
16. Model: Application
[Figure: example sequences of actual states (0 0 1 0 0 0 1) and predicted states (0 0 0 0 1 0 1), with one inactive state miss and one active state miss highlighted]

                   Predicted Positive   Predicted Negative
Actual Positive            a                    b
Actual Negative            c                    d

a - True Positive
b - False Negative
c - False Positive
d - True Negative

$H_a = \frac{d}{N_0}$
$M_a = \frac{b}{N_0}$
$H_i = \frac{a}{N_1}$
$M_i = \frac{c}{N_1}$

$H_a$ - Active state hit rate
$M_a$ - Active state miss rate
$H_i$ - Inactive state hit rate
$M_i$ - Inactive state miss rate
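The hit and miss rates can be computed directly from a pair of actual and predicted state sequences. The slides do not define N0 and N1; the sketch below assumes N0 = b + d (states predicted active) and N1 = a + c (states predicted inactive), which is the reading under which each hit rate and its miss rate sum to one and which matches the "correct/wrong predicted" chart labels. The positive class is taken to be state 1 (inactive). Treat both choices as assumptions.

```python
def state_rates(actual, predicted):
    """Hit and miss rates for the active (0) and inactive (1) states."""
    pairs = list(zip(actual, predicted))
    a = sum(1 for y, p in pairs if y == 1 and p == 1)  # true positive
    b = sum(1 for y, p in pairs if y == 1 and p == 0)  # false negative
    c = sum(1 for y, p in pairs if y == 0 and p == 1)  # false positive
    d = sum(1 for y, p in pairs if y == 0 and p == 0)  # true negative
    n0 = b + d  # ASSUMPTION: number of states predicted active
    n1 = a + c  # ASSUMPTION: number of states predicted inactive
    return {
        "H_a": d / n0 if n0 else None,
        "M_a": b / n0 if n0 else None,
        "H_i": a / n1 if n1 else None,
        "M_i": c / n1 if n1 else None,
    }


# The example sequences shown on the slide.
actual = [0, 0, 1, 0, 0, 0, 1]
predicted = [0, 0, 0, 0, 1, 0, 1]
rates = state_rates(actual, predicted)
print(rates)
```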
17. Model: Results
[Figure: for each threshold, two bar charts showing counts of correctly vs. wrongly predicted active states and correctly vs. wrongly predicted inactive states; bars are labelled with the confusion-matrix cells a, b, c, d]

Threshold   $M_a$     $H_i$
0.25        0.01%     84.5%
0.35        60.8%     93.1%
0.50        40.3%     0.0009%

$M_a$ - Active state miss rate
$H_i$ - Inactive state hit rate
18. Model: Application (next steps)
Multi-period
Hypothesis:
- Error rates can be decreased by taking multiple periods into account for predictions
[Figure: actual states (0 0 1 0 0 0 1) and predicted states (0 0 0 0 1 0 1) over active (0) and inactive (1) periods]
[Flowchart: Model predicts 1 → check the customer's transactions in the next 30 days → decision "Tx = 0?" with True and False branches leading to "Model output is 1" and "Model output is 0"]
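The multi-period check can be written as a small rule. The extracted flowchart does not make clear which branch leads to which output, so the sketch assumes the natural reading: a predicted 1 (inactive) is kept only when the customer makes no transactions in the following 30 days, and is otherwise overridden to 0. A predicted 0 is passed through unchanged. The function name and the toy transaction counts are hypothetical.

```python
def multi_period_output(model_prediction, tx_next_30_days):
    """Confirm a predicted inactive state with the next 30 days of activity."""
    # ASSUMPTION: Tx = 0 confirms the inactive prediction; any trade overrides it.
    if model_prediction == 1 and tx_next_30_days == 0:
        return 1
    return 0


predictions = [0, 0, 0, 0, 1, 0, 1]    # predicted states from the slide
tx_counts = [3, 1, 0, 2, 4, 5, 0]      # illustrative 30-day transaction counts
outputs = [multi_period_output(p, tx) for p, tx in zip(predictions, tx_counts)]
print(outputs)
```

Under this reading the rule can only remove false alarms for the inactive state; it does not recover inactive periods that the model predicted as active.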
20. Technical Model: State-space Model
[Figure: state diagram with the states on-boarded, active, inactive, and closed]
• In the applied model we have taken only two states: 0 for active and 1 for inactive
• Between the active and inactive states, a customer can transit through many different states, as shown in the state-space model above
• By applying a state-space model, the complete life cycle of a customer will be profiled:
i. Previous state
ii. Next state
iii. Time spent in a particular state
iv. Behaviour of the customer in a particular state
v. Behaviour of the customer just before a transition
vi. Behaviour of the customer before going off-board, etc.
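Items i to iii of the profile (previous state, next state, time in a state) can be read off a customer's state sequence by counting transitions and run lengths. The sketch below assumes one state label per month and uses the four state names from the diagram; it is a simple descriptive summary, not a fitted state-space model, and the example sequence is invented.

```python
from collections import Counter, defaultdict


def profile_states(monthly_states):
    """Count state transitions and collect the duration of each stay."""
    transitions = Counter()       # (previous state, next state) -> count
    durations = defaultdict(list)  # state -> list of stay lengths in months
    run_state, run_len = monthly_states[0], 1
    for state in monthly_states[1:]:
        if state == run_state:
            run_len += 1
        else:
            transitions[(run_state, state)] += 1
            durations[run_state].append(run_len)
            run_state, run_len = state, 1
    durations[run_state].append(run_len)  # close the final run
    return transitions, dict(durations)


sequence = [
    "on-boarded", "active", "active", "active", "inactive", "inactive",
    "active", "inactive", "inactive", "inactive", "closed",
]
transitions, durations = profile_states(sequence)
print(dict(transitions))
print(durations)
```

Dividing each transition count by the total number of transitions out of the same state would give empirical transition probabilities.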
24. Model: Setup (next steps)
Customer Sampling
For the current model, the training, validation, and testing datasets were created by sampling rows, where each row holds one customer's transaction amounts aggregated for one month.
Alternatively, the training, validation, and testing datasets can be created by sampling customers, so that all rows of a customer fall into the same dataset.
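Customer-level sampling can be sketched as follows: the unique client codes are shuffled and split 60/20/20, and every client-month row then follows its client. The field name `client_code`, the fixed seed, and the toy rows are assumptions for illustration.

```python
import random


def split_by_customer(rows, seed=0, train=0.6, valid=0.2):
    """Split client-month rows so that each client lands in exactly one set."""
    clients = sorted({r["client_code"] for r in rows})
    random.Random(seed).shuffle(clients)  # seeded for reproducibility
    n_train = round(len(clients) * train)
    n_valid = round(len(clients) * valid)
    groups = {
        "train": set(clients[:n_train]),
        "valid": set(clients[n_train:n_train + n_valid]),
        "test": set(clients[n_train + n_valid:]),
    }
    return {
        name: [r for r in rows if r["client_code"] in ids]
        for name, ids in groups.items()
    }


# Ten hypothetical clients with three monthly rows each.
rows = [
    {"client_code": f"C{c:03d}", "month": m}
    for c in range(10)
    for m in ("2014-01", "2014-02", "2014-03")
]
parts = split_by_customer(rows)
print({name: len(part) for name, part in parts.items()})
```

Because a client's months never straddle two datasets, the validation and test sets contain only customers the model has not seen during training.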