The document discusses building a model to predict customer redemptions from mutual funds. It analyzes historical transaction data to identify predictive variables and builds a logistic regression model, which is reported to predict redemptions with 76% accuracy on test data and 71% on validation data. Future work focuses on uplift modeling to optimize sales and reduce costs. The document also analyzes customer characteristics to identify which equity account holders are most likely to purchase mutual funds, as a cross-selling opportunity.
2. Problem Statement
• Redemption of invested funds by customers is a scenario that capital market players such as securities firms and mutual funds must prepare for and face on a daily basis. According to a SEBI report, the Indian mutual fund industry is ₹14 trillion in size, with close to 80% of volume traded in open-ended funds.
3. Objective
• To study the life cycle of an open-ended fund in the mutual fund market and, based on investor characteristics and transaction history, predict future redemptions (current model: 90-day horizon)
• To segment investors on the basis of their behavioural characteristics and demographics
5. Analytics Objective
A customer may redeem for the following reasons:
• Lower returns in the portfolio
• High risk due to market volatility
• Poor fund services
• Higher capital gains taxes and changes in interest rates
6. Data Available
Historical Transaction Data
• Transaction type, date, mutual fund plan and scheme
• Number of units purchased, amount, NAV, price
• Unique identifiers: ClientCode, CommonClientCode
Customer Data
• Demographic details
• Account opening date, balance, ledger details
7. Analytics Approach
• Data Cleansing:
– Removing outliers
– Removing usage records after redemption
• Data Integration:
– For each ID, combine client and ledger data
– Merge the dataset with Sensex and NIFTY historical data
– Apply the redemption date if the customer has redeemed the fund
– Calculate derived variables to capture user behaviour
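The cleansing and integration steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the original pipeline. Each transaction is assumed to be a record with a client code, a transaction type, a date and an amount. The `Txn` record, the type labels `"PURCHASE"`/`"REDEMPTION"`, the 1.5 × IQR (Tukey fence) outlier rule and the `index_by_date` lookup (a date to Sensex or NIFTY close mapping) are all hypothetical choices made for the example. Client/ledger merging and derived variables are omitted.

```python
from dataclasses import dataclass, replace
from datetime import date
from statistics import quantiles
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Txn:
    client_code: str
    txn_type: str                           # hypothetical labels: "PURCHASE", "REDEMPTION", ...
    txn_date: date
    amount: float
    index_close: Optional[float] = None     # market index close on txn_date (Sensex or NIFTY)
    redemption_date: Optional[date] = None  # first redemption date of the client, if any


def first_redemptions(txns: List[Txn]) -> Dict[str, date]:
    """Earliest redemption date per client."""
    first: Dict[str, date] = {}
    for t in txns:
        if t.txn_type == "REDEMPTION":
            prev = first.get(t.client_code)
            if prev is None or t.txn_date < prev:
                first[t.client_code] = t.txn_date
    return first


def remove_outliers(txns: List[Txn], k: float = 1.5) -> List[Txn]:
    """Keep transactions whose amount lies inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles([t.amount for t in txns], n=4)
    iqr = q3 - q1
    return [t for t in txns if q1 - k * iqr <= t.amount <= q3 + k * iqr]


def prepare(txns: List[Txn], index_by_date: Dict[date, float]) -> List[Txn]:
    """Cleanse, then attach the redemption date and market index to each record."""
    redeemed = first_redemptions(txns)        # computed before outlier removal
    out = []
    for t in remove_outliers(txns):
        r = redeemed.get(t.client_code)
        if r is not None and t.txn_date > r:
            continue                          # drop usage records after redemption
        out.append(replace(t, redemption_date=r,
                           index_close=index_by_date.get(t.txn_date)))
    return out
```

Redemption dates are computed before outlier removal. Otherwise an unusually large redemption could be filtered out, and that client would wrongly look active.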
10. Derived Variables
• Scheme Returns
• Asset ratio
• Beta
• R Squared
• Scheme Risk
• Market returns
• Market Risk
• Vintage
• Broker profit
• SIP Tenure
• STP Tenure
• Transaction count
• Age of customer
• Switch Out Ratio
• Mean NAV
Total variables calculated: 23
Variables considered in the model: 10
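Several of the market-related variables have standard textbook definitions. The sketch below assumes simple period-over-period returns computed from aligned NAV and index series. Under that assumption it computes:
• scheme and market returns over the whole window
• risk, as the sample standard deviation of returns
• beta, as covariance with the market divided by market variance
• R squared, as the squared correlation

The deck does not state which exact formulas were used, and the function names are hypothetical. Variables such as vintage, broker profit or SIP/STP tenure are not covered.

```python
from statistics import mean, stdev
from typing import Dict, List


def simple_returns(prices: List[float]) -> List[float]:
    """Period-over-period returns from a NAV or index price series."""
    return [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]


def sample_cov(a: List[float], b: List[float]) -> float:
    """Sample covariance (n - 1 denominator) of two equal-length series."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def risk_metrics(scheme_nav: List[float], market_index: List[float]) -> Dict[str, float]:
    """Return, risk, beta and R squared of a scheme against a market index."""
    rs, rm = simple_returns(scheme_nav), simple_returns(market_index)
    cov = sample_cov(rs, rm)
    return {
        "scheme_return": scheme_nav[-1] / scheme_nav[0] - 1.0,    # over the whole window
        "market_return": market_index[-1] / market_index[0] - 1.0,
        "scheme_risk": stdev(rs),                                  # volatility of scheme returns
        "market_risk": stdev(rm),
        "beta": cov / sample_cov(rm, rm),                          # cov(scheme, market) / var(market)
        "r_squared": (cov / (stdev(rs) * stdev(rm))) ** 2,         # squared correlation
    }
```

A scheme whose returns are always exactly twice the market's gets a beta of 2 and an R squared of 1.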
11. Building Model
• Total available data: eight years of transaction data (2007-15)
– 60% training data
– 40% testing data
• Validation data: Aug-Oct
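The data split can be sketched as follows. This is a minimal sketch that assumes a simple, unstratified random 60/40 split of the 2007-15 history. It also assumes an out-of-time validation window passed in as explicit start and end dates. The function names, the fixed seed and the `date_of` accessor are illustrative assumptions, not details from the deck.

```python
import random
from datetime import date
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(rows: Sequence[T], train_frac: float = 0.6,
                     seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle once with a fixed seed, then cut into training and testing parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def split_out_of_time(rows: Sequence[T], date_of: Callable[[T], date],
                      start: date, end: date) -> Tuple[List[T], List[T]]:
    """Separate rows dated within [start, end] (validation) from the rest (history)."""
    history = [r for r in rows if not start <= date_of(r) <= end]
    validation = [r for r in rows if start <= date_of(r) <= end]
    return history, validation
```

Fixing the seed makes the split reproducible across runs. Because redeemers are a minority class, a stratified split that preserves their share in both parts would be a reasonable alternative.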
12. Logistic Regression
Model setup:
• Purpose: to find predictive variables and to predict redemption of a fund by the customer
• Target variable: 0 = No Redemption, 1 = Redemption
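Logistic regression models the probability of the target (1 = Redemption) as the logistic function of a weighted sum of the predictors. The sketch below fits it with plain batch gradient descent on the log-loss, using only the standard library. It illustrates the technique under assumed defaults (learning rate, epoch count and a 0.5 decision threshold). The deck does not name its solver, so this is not that solver. In practice a library implementation with regularization would normally be used.

```python
from math import exp
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Batch gradient descent on the mean log-loss; returns [bias, w1, ..., wk]."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            # prediction error p - y drives the gradient of the log-loss
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def predict(w: Sequence[float], x: Sequence[float], threshold: float = 0.5) -> int:
    """1 = Redemption, 0 = No Redemption."""
    p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))
    return int(p >= threshold)
```

The sign and size of each fitted weight show how strongly a variable pushes towards redemption. This is how the model doubles as a tool for finding predictive variables.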
17. Confusion Matrix
Test Data
                Predicted 0   Predicted 1      Sum
Actual 0             320060         26480   346540
Actual 1              11705         51064    62769
Sum                  331765         77544   409309

Validation Data
                Predicted 0   Predicted 1      Sum
Actual 0              18272           524    18796
Actual 1               1037          3879     4916
Sum                   19309          4403    23712
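The standard classification metrics follow directly from the four cells of each confusion matrix, with Redemption (1) as the positive class. A minimal sketch:

```python
from typing import Dict


def metrics(tn: int, fp: int, fn: int, tp: int) -> Dict[str, float]:
    """Standard metrics; class 1 (Redemption) is the positive class."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),      # predicted redeemers who actually redeemed
        "recall": tp / (tp + fn),         # actual redeemers the model caught
        "specificity": tn / (tn + fp),    # non-redeemers correctly predicted as 0
    }


test = metrics(tn=320060, fp=26480, fn=11705, tp=51064)
validation = metrics(tn=18272, fp=524, fn=1037, tp=3879)
for name, m in (("test", test), ("validation", validation)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

On the counts in the matrices, this gives the following:
• Test data: accuracy ≈ 0.907, precision ≈ 0.659, recall ≈ 0.814, specificity ≈ 0.924
• Validation data: accuracy ≈ 0.934, precision ≈ 0.881, recall ≈ 0.789, specificity ≈ 0.972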
22. Future Work
• Use the uplift modelling technique to optimize mutual fund sales, reduce customer acquisition costs, and enhance customer satisfaction and retention.
• Focus on customers who will purchase after a price reduction.
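Uplift modelling estimates the incremental effect of an action, such as a price reduction or an offer, rather than the raw purchase probability. The sketch below shows one simple variant under assumed inputs. Within each segment, it compares purchase rates between customers who received the treatment and a control group. The segment labels and the `(segment, treated, purchased)` record layout are hypothetical. More elaborate approaches, for example two separate response models, follow the same idea.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def segment_uplift(records: Iterable[Tuple[str, bool, bool]]) -> Dict[str, float]:
    """Uplift per segment = P(purchase | treated) - P(purchase | control).

    Each record is (segment, received_treatment, purchased).
    """
    # per segment: [treated_n, treated_buys, control_n, control_buys]
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0, 0, 0])
    for seg, treated, bought in records:
        c = counts[seg]
        if treated:
            c[0] += 1
            c[1] += bought
        else:
            c[2] += 1
            c[3] += bought
    return {
        seg: c[1] / c[0] - c[3] / c[2]
        for seg, c in counts.items()
        if c[0] and c[2]          # both arms are needed to estimate uplift
    }
```

Ranking segments by this difference singles out those whose purchases are driven by the price reduction rather than those who would have bought anyway.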
24. Hypothesis
Figure: customers with only EQ vs. customers with both EQ and MF; characteristics of customers with only EQ compared with characteristics of cross-sell customers
Customers who bought only EQ but have characteristics (in terms of the variables) similar to those of customers who bought both EQ and MF are more likely to buy MF as well.
25. Approach-1
Based on the demographics of customers
• Age: frequency is maximum between 33 and 43
• Vintage: frequency is maximum between 4 and 6
• Education: Graduation has the highest frequency
• State: Maharashtra has the highest frequency amongst all states
Figure: frequency charts by age, vintage, education and state
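The Approach-1 findings are modal values of frequency distributions. Assuming raw per-customer ages (or vintages) and categorical fields such as education and state, they can be reproduced with a simple histogram count. The deck does not state the bin width or start point, so they are parameters here, and the helper names are hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple


def modal_band(values: Sequence[float], width: float,
               start: float = 0.0) -> Tuple[float, float]:
    """Return the [lo, hi) bin of the given width that holds the most values."""
    bins = Counter(int((v - start) // width) for v in values)
    b, _ = bins.most_common(1)[0]
    return start + b * width, start + (b + 1) * width


def top_category(values: Sequence[str]) -> str:
    """Most frequent category, e.g. education level or state."""
    return Counter(values).most_common(1)[0][0]
```

`Counter.most_common` breaks ties by first occurrence, so equal counts resolve to the category or bin seen first.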
26. Approach-2
Based on the activity of customers
• Table 1: customers with Equity or both EQ and MF
• Table 2: customers with both EQ and MF
Equity activity:
$$e_a = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Inactivity ratio:
$$i_a = \frac{1}{n}\sum_{i=1}^{n} y_i$$
For each customer in Table 1, find the distance from the centroid $c_b$ of the Table 2 customers.
Based on this distance, rank the customers from minimum to maximum distance.
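The Approach-2 steps can be sketched as below, under several assumptions:
• The deck does not define $x_i$ and $y_i$. Here each customer is assumed to have two per-period series, for example 1 if the customer traded equity in period $i$ and 1 if the account was inactive in period $i$. Then $e_a$ and $i_a$ are simply their means.
• Distance is Euclidean in the $(e_a, i_a)$ plane.
• $c_b$ is the centroid of the Table 2 customers.

The function names are hypothetical.

```python
from math import dist
from statistics import mean
from typing import Dict, List, Sequence, Tuple

Features = Tuple[float, float]  # (e_a, i_a)


def activity_features(x: Sequence[float], y: Sequence[float]) -> Features:
    """e_a = mean of x_i, i_a = mean of y_i over the customer's n periods."""
    return mean(x), mean(y)


def centroid(points: Sequence[Features]) -> Features:
    """Component-wise mean of the feature vectors (e.g. Table 2 customers)."""
    return mean(p[0] for p in points), mean(p[1] for p in points)


def rank_by_distance(table1: Dict[str, Features],
                     c_b: Features) -> List[Tuple[str, float]]:
    """Table 1 customers ordered from nearest to farthest from c_b."""
    scored = [(cid, dist(f, c_b)) for cid, f in table1.items()]
    return sorted(scored, key=lambda item: item[1])
```

Under this assumption both features are ratios in [0, 1], so no rescaling is applied before computing distances. If further variables were added, standardizing them first would stop any one variable from dominating the distance.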