This presentation was given at a meetup called "Business of Data Science". Data science is used extensively in Banking and Financial Services, and this presentation gives a brief overview of the fundamentals of analytics in this evergreen field.
2. HELLO!
I am Arul Bharathi
I am here because I love Analytics & BFS
Currently : Master’s in Big Data @ SFU
2017 : PG. Diploma in Data Analytics (Banking & Fin Serv. Specialization)
2013-2017: BI App Developer at Bank of New York Mellon
Connect with me -
linkedin.com/in/arulbharathi
4. • Number of people employed by banks in 2016 = 275,450
• Tax paid by 6 largest banks of Canada = $10.3 billion
• Percentage of population who have a favorable
impression of banks in Canada = 84%
• Banks contribute approximately 3.4% to Canada’s GDP
• Number of Bank Branches in Canada = 6200
• Number of Banks in Canada = 85
• Online banking transactions completed with the 6
largest banks in Canada in 2016 = $583 million
• Mobile banking transactions completed with the 6
largest banks in Canada in 2016 = $278 million
8. Data Mining
Non-trivial extraction of implicit, previously unknown and
potentially useful information from data
Machine Learning
A mechanism of teaching a machine to learn concepts using
data without explicit programming
Data Analytics
Descriptive Analytics
• Summarizes the data
• Describes past trends
• Mean, median, quartiles etc.
Exploratory Analytics
• Exploring and figuring out patterns
• Discovering correlations
• Formulating hypotheses
Confirmatory Analytics
• To test and confirm hypotheses
• Hypothesis testing
Predictive Analytics
• Predicting the future by understanding past patterns
• Regression, classification etc.
Data Science
A multidisciplinary field of scientific methods, processes, and
systems to extract knowledge from data.
9. “Torture the data, and it will confess to anything.”
Ronald Coase, winner of the Nobel Prize in Economics
10. 2. CRISP-DM Framework
A robust process to solve virtually any
analytics problem in any industry
11. Cross Industry Standard Process for Data Mining
▪ Business Understanding
▪ Data Understanding
▪ Data Preparation
▪ Data Modelling
▪ Model Evaluation
▪ Model Deployment
12. Business Understanding
▪ Understanding the Business Objectives
▪ Defining the semantics of objectives
▪ Define the Goals of Analysis
▪ Establish contacts with subject-matter experts (SMEs)
13. Data Understanding
▪ Collection of relevant data
▪ Describe the data (data dictionary)
▪ Explore the data using plots
▪ Validate the data quality
15. Data Preparation
▪ Selecting relevant data (There may be several data sources –
Unstructured & Structured)
▪ Integrating the data into a database store – (Merging different
files into a single source)
▪ Cleaning the data – (Missing Value Treatment, Outlier Treatment,
Homonyms, Synonyms)
▪ Constructing the data – (Extracting year from date, latitudes &
longitudes)
16. Data Preparation – Best Practices
▪ Data Integration – Merging is performed using primary keys,
composite primary keys
▪ Missing Value Imputation – Removal, Mean, Median, Mode, Data
Prediction
▪ Data Construction– APIs, Macros, Programming scripts
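The three practices above can be sketched in a few lines of Python. This is a minimal illustration only: the `customers` and `accounts` records, their field names, and the choice of median imputation are all assumptions made for the example, not data or choices from the presentation.

```python
from datetime import date
from statistics import median

# Hypothetical records; customer_id plays the role of the primary key.
customers = [
    {"customer_id": 1, "age": 34},
    {"customer_id": 2, "age": None},  # missing value
    {"customer_id": 3, "age": 52},
]
accounts = [
    {"customer_id": 1, "opened": date(2015, 3, 14)},
    {"customer_id": 2, "opened": date(2016, 11, 2)},
    {"customer_id": 3, "opened": date(2012, 7, 30)},
]

# Data integration: merge the two sources on the primary key.
accounts_by_id = {row["customer_id"]: row for row in accounts}
merged = [{**c, **accounts_by_id[c["customer_id"]]} for c in customers]

# Missing value imputation: replace missing ages with the median known age.
known_ages = [row["age"] for row in merged if row["age"] is not None]
median_age = median(known_ages)
for row in merged:
    if row["age"] is None:
        row["age"] = median_age

# Data construction: derive the year from the account-opening date.
for row in merged:
    row["opened_year"] = row["opened"].year
```

Mean or mode imputation would follow the same pattern with a different summary function.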
17. Data Modelling
▪ The heart of Data Analytics is the “Data Model”
▪ A model should be succinct, mathematically sound, efficient and easy to use.
▪ Regression, Classification, Clustering, Time Series models etc.
18. How Banks make money
A brief walk through of revenue generation in
banks to understand the underlying analytics
19. Banks
Deposits
▪ Fixed Deposit 7%-8%
▪ Savings 4%-6%
▪ Chequing 0%
Lending
▪ Secured Loans 9%-10%
▪ Unsecured Loans 14%-18%
Taking money at low interest rates and lending it at higher rates makes money for the banks.
Net Interest Margin is a measure of the difference between the interest received and the interest paid, relative to the amount of interest-earning assets.
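As a rough worked example, net interest margin is commonly computed as (interest received - interest paid) divided by interest-earning assets. The sketch below uses that definition with made-up balances; the 5% and 10% rates are picked from the ranges on the slide, and the function name and amounts are assumptions for illustration.

```python
def net_interest_margin(interest_received, interest_paid, earning_assets):
    """NIM = (interest received - interest paid) / interest-earning assets."""
    return (interest_received - interest_paid) / earning_assets

# Hypothetical one-year figures: the bank takes 1,000 in savings deposits
# at 5% and lends the full amount as secured loans at 10%.
deposits = 1000.0
loans = 1000.0
interest_paid = deposits * 0.05   # paid to depositors
interest_received = loans * 0.10  # received from borrowers

nim = net_interest_margin(interest_received, interest_paid, loans)
print(f"Net interest margin: {nim:.1%}")
```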
20. What’s with the P & L of a Bank?
▪ To know the individual product level P&L
▪ How are you impacting the P&L?
▪ What side of the P&L are you impacting?
Revenues
▪ Interest(Primary)
▪ Risk-based fees
▪ Affinity Rebates
▪ Cross-sell
▪ Annual fees
Expenses
▪ Net Credit Loss
▪ Operating Expenses
▪ Loan Loss reserve
21. Customer Lifecycle
Customer Acquisition
• How to create prospect segments
• How to reach out to the different prospect segments
• How to quantify the quality of prospects
Customer Engagement
• Selling new/upgraded products to customers
• Retaining the customers or avoiding attrition
Risk Management
• Managing the risk associated with customer defaulting
• Identifying credit, operational, market risk etc.
23. Frequent Datasets used in Acquisition Analytics
▪ Census data: To understand the demographic and geographic
information of your target prospects
▪ Market research and primary survey data: Additional
information for specific target groups of prospects
▪ Publicly available data: Provided by the government, related to
certain schemes and policies
▪ Credit bureau data: In common terms, it provides an entire ABC
of past transaction history (for example, previous loan status,
credit card history, default status) of prospects
25. Market Segmentation
▪ Segment the prospects into various clusters (clustering) and deploy different marketing strategies for each of them.
▪ Divide and conquer is the go-to strategy for banks. This comes under market segmentation, which is an extremely important strategy to optimize marketing costs.
▪ There are two fundamental strategies that organizations can undertake to market their products and services:
▫ Mass marketing: The same marketing message is communicated to everyone through channels like TV and newspaper ads. In most cases, mass marketing is expensive.
▫ Targeted marketing: Personalized marketing is always preferred as each individual has different needs but, at the same time, the volume is too high for customization at an individual level. Hence, there is a need to identify segments where people have the same behavior and sell a specific product to a particular group of customers having similar interests.
26. Segment Prioritisation
Once the entire addressable prospect universe is broken down into segments, the next step is to identify which segments are more important given the company's overall business objective. The key questions are:
▪ Who are the best customers?
▪ What products or services does each segment need?
▪ When do they need these products or services?
▪ How do they want to interact with the bank?
▪ Why do they behave the way they do?
Segment prioritization is all about answering these questions.
27. Segment Prioritisation Use case
HIGH
INCOME
MEDIUM
INCOME
LOW INCOME
Digital Online
Channel
PRIORITY 1
PRIORITY 2 PRIORITY 4
Traditional
Branch
Channel
PRIORITY 3 PRIORITY 5
28. ▪ Objective
What is the business objective? Is it future profitability, or increasing
the current market share, or something else?
▪ Acquisition of the right prospects
Which segments have income/age/attributes suited for the product?
▪ Retention and loyalty
Which segments are likely to be retained for long?
Which segments could be cross-sold newer products easily later?
▪ Risk
Which of the customers are risky from a profitability perspective?
Dividing your analysis across the lifecycle gives you a helpful bird’s-eye view
of the business strategy.
29. Channel Preferences
In many cases, more than one channel is used or the same channel is utilized twice to reinforce the offer.
▪ Direct Mailer: A marketing effort that uses a mail service to deliver a promotional printed piece to the target audience.
▪ Telemarketing: Although it costs more than direct mailers, telemarketing provides much higher returns on investment.
▪ Email marketing: The biggest advantages of email marketing as a direct marketing channel are its flexibility and low cost, and it is widely used in the banking sector for promoting new products.
30. Acquisition Analytics - Evaluation Metrics
• Cost per customer
• Sales per customer
• Customer Response Rate
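A minimal sketch of how these three metrics might be computed for a single campaign. The exact definitions are an assumption here (cost and sales are taken per acquired customer, response rate per prospect contacted), and all figures are invented for illustration.

```python
def campaign_metrics(prospects_contacted, responders, total_cost, total_sales):
    """Return the three acquisition metrics for one campaign."""
    return {
        # Share of contacted prospects who responded.
        "response_rate": responders / prospects_contacted,
        # Campaign spend divided by the customers actually acquired.
        "cost_per_customer": total_cost / responders,
        # Sales generated divided by the customers actually acquired.
        "sales_per_customer": total_sales / responders,
    }

# Hypothetical campaign: 10,000 prospects contacted, 500 responded.
metrics = campaign_metrics(
    prospects_contacted=10_000,
    responders=500,
    total_cost=20_000.0,
    total_sales=150_000.0,
)
print(metrics)
```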
32. Let's assume Scotiabank conducted a telemarketing campaign for a term-deposit product somewhere around late 2011.
The marketing team of the bank wants to launch yet another telemarketing campaign for the same product.
33. Problem Analysis
You, an analyst at the bank, want to answer the following questions using the past
data:
• Which prospects are more likely to buy the product (i.e. to respond)?
• Which attributes determine the propensity to buy a term-deposit?
• Once you predict the likelihood of response, how many prospects should you
target for telemarketing?
• By how much can you reduce the marketing cost using the model, and how many
prospects will you acquire?
34. Business and Data Understanding
Problem and Business Objective
•To reduce the customer acquisition cost by targeting the ones who are likely to
buy
•To improve the response rate, i.e. the fraction of prospects who respond to the
campaign
Data
•Customer data: Demographic data, data about other financial products like home
loan, personal loan etc.
•Campaign data: Data about previous campaigns (number of previous calls, no. of
days since the last call was made, etc.)
•Macroeconomic data
•Target variable: Response (Yes/No)
35. Initial Insights
To summarize, you got an intuitive understanding of the data. The most important points are:
• The age distribution suggested a wide variation in age
• The data was capped to treat outliers
• The age variable was binned for easier understanding
• The response rate varies significantly across the age groups, with the 16-20 group and senior citizens having high response rates
• Month of contact seems to be a significant predictor of response rate
• Day of month does not seem very important
• Call duration seems to be an important predictor
• The more time you spend on the call, the higher the response rate
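Capping, binning, and the per-bin response rate can be sketched as follows. The records, the cap limits (16 and 80), and the bin edges are hypothetical choices for illustration; the presentation does not state the values actually used.

```python
from collections import defaultdict

# Hypothetical (age, responded) pairs; 1 means the prospect responded.
records = [(17, 1), (19, 1), (25, 0), (33, 0), (41, 0),
           (47, 1), (58, 0), (66, 1), (95, 1)]

def cap(value, lower, upper):
    """Clamp a value into [lower, upper] to limit the effect of outliers."""
    return min(max(value, lower), upper)

def age_bin(age):
    """Assign an age to a coarse bin; the bin edges are illustrative."""
    if age <= 20:
        return "16-20"
    if age <= 40:
        return "21-40"
    if age <= 60:
        return "41-60"
    return "60+"

# Count responders and prospects in each age bin.
counts = defaultdict(lambda: [0, 0])  # bin -> [responders, prospects]
for age, responded in records:
    label = age_bin(cap(age, 16, 80))
    counts[label][0] += responded
    counts[label][1] += 1

response_rate_by_bin = {label: r / n for label, (r, n) in counts.items()}
print(response_rate_by_bin)
```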
36. Data Modeling and Evaluation
• Model Types : Regression, Classification, Clustering etc.
In the context of this case, which metrics are the most important to assess:
• How well the model fits the training data?
• How well the model spots the responders?
• How well the model separates the responders from the non-responders?
37. Financial Benefit Analysis
How many (high likelihood responders) should the bank target? The top 30%, 40%
or 50% etc.?
• If you target the top X% responders, what will be the marketing cost and the
expected response rate?
How can you assess the financial benefit of the project?
• In targeting the top X% responders, how many will you acquire and at what
cost?
• By how much can you reduce the total marketing cost and the cost per
response using the model compared to without using it (i.e. targeting all the
prospects)?
38. Gain Chart
Tells us the number of responders captured (y-axis) as a function of the number of prospects targeted (x-axis)
Example: In the 4th decile (40% of people targeted), we can capture about 99% of the responders
If you market to only the top 3 deciles (30% of the customers), you will capture more than 90% of the responders.
• You can acquire more than 90% of the customers at less than 30% of the cost, or
• The cost of acquisition per customer has come down from approximately 8.87 units/person to approximately 2.72 units/person
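A gain chart of this kind can be computed by ranking prospects by model score and accumulating responders decile by decile. The sketch below assumes a list of model scores and 0/1 response flags; the 20 prospects are toy data, not the campaign data behind the slide.

```python
def cumulative_gain(scores, responded, n_bins=10):
    """Fraction of all responders captured in the top 1..n_bins score bins."""
    # Rank prospects from highest to lowest model score.
    ranked = [r for _, r in sorted(zip(scores, responded),
                                   key=lambda pair: pair[0], reverse=True)]
    total_responders = sum(ranked)
    n = len(ranked)
    gains = []
    for k in range(1, n_bins + 1):
        cutoff = round(k * n / n_bins)  # prospects targeted up to bin k
        gains.append(sum(ranked[:cutoff]) / total_responders)
    return gains

# Toy data: 20 prospects with distinct scores, 4 of whom responded.
scores = [i / 20 for i in range(20, 0, -1)]
responded = [1, 1, 1, 0, 1] + [0] * 15

gains = cumulative_gain(scores, responded)
print(gains)
```

Plotting `gains` against the share of prospects targeted (10%, 20%, ...) gives the gain chart.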
39. Lift chart
Compares the response rate with and without using the model
• Compares the ‘lift in response rate’ you get by targeting with the model against targeting the entire population (without using the model)
• Contains lift (on y-axis) and the number of prospects targeted (on x-axis)
• Example: Our original response rate is about 11.7% (called the baseline); if at the 4th decile on the x-axis you get a lift of about 2.5, it means that you can get a response rate of about 2.5 x 11.7% ≈ 29% by targeting the top 40% of prospects using the model
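Cumulative lift can be derived directly from the cumulative gain: at each decile it is the share of responders captured divided by the share of prospects targeted. In the sketch below, the 11.7% baseline and the gains at the 3rd and 4th deciles (0.92 and 0.99) follow the figures quoted on the slides; the remaining gains are assumed values for illustration.

```python
def cumulative_lift(gains):
    """Lift at bin k = share of responders captured / share of prospects targeted."""
    n_bins = len(gains)
    return [gain / ((k + 1) / n_bins) for k, gain in enumerate(gains)]

# Cumulative gains for deciles 1..10 (partly assumed, see the note above).
gains = [0.45, 0.75, 0.92, 0.99, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
baseline_response_rate = 0.117

lift = cumulative_lift(gains)
# Expected response rate among the top 40% of prospects.
response_rate_top_40 = lift[3] * baseline_response_rate
print(f"Lift at 4th decile: {lift[3]:.2f}, response rate: {response_rate_top_40:.1%}")
```

Note that cumulative lift at the 4th decile cannot exceed 1 / 0.4 = 2.5, and it always falls back to 1 when the whole population is targeted.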
41. Customer Engagement
Cross-selling and upselling
• How to increase revenues by selling new products to existing customers
• How to measure the value (i.e. financial benefit) corresponding to each customer using
customer lifetime value
• Cross-selling and upselling are aimed primarily at increasing revenue. Some examples
are:
Cross-selling: Selling credit cards to home loan customers
Upselling: Selling premium credit cards to regular credit card holders
Retention management
• How to identify customers who are likely to leave a product or service
• What retention strategies can you employ to retain your customers
42. Don’t Cross Sell Too Much
Among the pitfalls, the most
common one is lending credit to
risky customers or the ones who
are unlikely to use it (as setting
aside capital for this service is
also a cost). Apart from lending
unutilized credit, cross-selling
sometimes also results in
acquiring customers who cost
more than the profit they bring
in.
43. Types of Cross-Selling
Sequential Cross-sell
• Selling products one after the other in a sequence.
Bundled Cross-sell
• Grouping different items together and selling them at a lower price than what they would have sold for separately.
44. Customer Lifetime Value
• To measure the value generated by a customer
• The value the bank will get from a customer over the life cycle of the customer
• Used in a wide variety of industries including retail banking
• Used to execute targeted campaigns focussed on high value customers
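One common simplified way to estimate customer lifetime value is to sum the expected yearly margin from a customer, weighted by the chance the customer is still retained and discounted back to today. This formulation, the function name, and all input values are assumptions for illustration; the presentation does not specify a formula.

```python
def customer_lifetime_value(annual_margin, retention_rate, discount_rate, years):
    """Sum of expected yearly margins, discounted to today.

    Year t contributes annual_margin * retention_rate**t / (1 + discount_rate)**t,
    with t = 0 for the first year.
    """
    return sum(
        annual_margin * retention_rate ** t / (1 + discount_rate) ** t
        for t in range(years)
    )

# Hypothetical customer: 200 margin per year, 80% yearly retention,
# 10% discount rate, 5-year horizon.
clv = customer_lifetime_value(
    annual_margin=200.0, retention_rate=0.8, discount_rate=0.1, years=5
)
print(f"Estimated CLV: {clv:.2f}")
```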
45. Retention and Loyalty Management
• Aims at reducing customer attrition.
• Used in a wide variety of industries including retail banking
• Used to execute targeted campaigns focussed on high value customers
Types of customer attrition: Voluntary and Involuntary
47. Types of Cost
• Risk management works with the cost side of asset products
Credit Loss
• Loss of credit when customers default on payments
• Largest component
Provisioning Loss
• Expected Loss
• Unexpected Loss
Cost of Capital
• Money set aside for provisioning cannot be used for other business and there is a cost attached to it.
50. Acquisition Risk Analytics
▪ The “fair lending” concept should be practised
▪ Can be used to classify whether to Approve/Reject customer
▪ Can be used to predict the amount of loan, the tenure, and interest rate
▪ If a customer is denied a service then it is an “Adverse Action”
▪ Reasons should be given for Adverse Action
51. Types of Datasets
▪ Application data: From the application form submitted by the customer
▫ Demographic data
▪ Credit bureau data
▫ Payment history
▫ Types of loan (paid and unpaid)
▫ Enquiries for credit
▪ Internal information: Available only to the bank where the customer already has an account
▪ Alternate sources: Social media, travel pattern, and any other external sources of data
52. Types of Risk Models
▪ Commercial Credit Models
▫ Probability of default(PD) models
▫ Loss given default (LGD) models
▫ Exposure at default(EAD) models
▪ Consumer Credit Models
▫ Default models
▫ Bankruptcy models
▫ Behavioral models
▫ Loss given default(LGD) models
▫ Exposure at default(EAD) models
PD – Probability of default
LGD – Amount of money a bank or financial institution loses when borrower defaults
EAD – Total value that a bank is exposed to at time of a loan’s default
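The three quantities are typically combined into an expected loss as PD x LGD x EAD. The sketch below assumes LGD is expressed as a fraction of the exposure (rather than as an amount, as worded above); the two loans and their parameters are hypothetical.

```python
def expected_loss(pd, lgd, ead):
    """Expected loss = PD x LGD x EAD, with LGD given as a fraction of EAD."""
    return pd * lgd * ead

# Hypothetical loans: (probability of default, LGD fraction, exposure at default).
portfolio = [
    (0.02, 0.45, 100_000.0),
    (0.10, 0.60, 20_000.0),
]

losses = [expected_loss(pd, lgd, ead) for pd, lgd, ead in portfolio]
portfolio_expected_loss = sum(losses)
print(f"Portfolio expected loss: {portfolio_expected_loss:.2f}")
```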
53. Model Validation
▪ Models need validation because there is potential for error in modeling, which can lead to poor management decisions.
▪ The existence of potential errors in modeling is called model risk
Why does model risk exist?
▪ At some level a model is always incorrect
▪ Poor decisions from erroneous results
▫ Actual losses
▫ Foregone income from opportunity costs
▪ Some of the worst of the risks center around implementation
55. Metrics to measure Acquisition Risk Analytics
▪ Auto Approval Rate – Approval rate calculated by algorithms
▪ Manual Approval Rate – Approval rate calculated manually
▪ Override Rate – How often a decision given by the black box is manually overridden
Reject Inferencing –
▪ Data available for acquisition contains only approved candidates
▪ To avoid sampling bias, Reject Inferencing is performed
▪ Perform small experiments on rejected population
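A minimal sketch of how approval and override rates might be tallied from a decision log. The record layout (`model` and `final` decisions per application) and the choice of all applications as the denominator are assumptions made for this example.

```python
# Hypothetical decision log: the model's decision and the final decision
# after any manual review.
applications = [
    {"model": "approve", "final": "approve"},
    {"model": "approve", "final": "reject"},   # manual override
    {"model": "reject", "final": "reject"},
    {"model": "reject", "final": "approve"},   # manual override
    {"model": "approve", "final": "approve"},
]

n = len(applications)

# Share of applications the model approved on its own.
auto_approval_rate = sum(a["model"] == "approve" for a in applications) / n

# Share of applications approved after manual review.
final_approval_rate = sum(a["final"] == "approve" for a in applications) / n

# Share of applications where the final decision differs from the model's.
override_rate = sum(a["model"] != a["final"] for a in applications) / n

print(auto_approval_rate, final_approval_rate, override_rate)
```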
56. Existing Customer Management
▪ How will you tag customers – good or bad?
▫ Delinquency – DPD (Days past due)
▫ 90+ DPD is default, 120+ is written off
▫ Roll-Rate Matrix
▪ How to assess the riskiness of a portfolio?
▫ Roll-Rate Matrix
▫ Roll overs are bad, rollbacks may be good
▪ Optimal time to declare a default?
▫ Vintage Curve
57. ▪ Matrix created on a month-on-month basis
▪ Out of all those who were 60 DPD in Dec-15, 3.10% rolled back to 30 DPD
▪ Out of all those who were 60 DPD in Dec-15, 85.50% rolled over to 90 DPD
▪ The diagonal elements represent the customers who stay in the same bucket
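A roll-rate matrix like this can be built by counting month-on-month bucket transitions per account. In the sketch below, the account counts are hypothetical, chosen so that the 60 DPD row reproduces the 3.10% and 85.50% figures above; the share staying at 60 DPD (11.40%) is an assumption, since the original matrix is not shown.

```python
from collections import Counter, defaultdict

def roll_rate_matrix(transitions):
    """Map each starting bucket to {ending bucket: share of accounts}."""
    counts = defaultdict(Counter)
    for start, end in transitions:
        counts[start][end] += 1
    return {
        start: {end: c / sum(ends.values()) for end, c in ends.items()}
        for start, ends in counts.items()
    }

# Hypothetical (DPD bucket last month, DPD bucket this month) pairs,
# one per account that was 60 DPD last month.
transitions = [(60, 30)] * 31 + [(60, 60)] * 114 + [(60, 90)] * 855

matrix = roll_rate_matrix(transitions)
print(matrix[60])
```

Here `matrix[60][30]` is the rollback rate, `matrix[60][60]` the diagonal element, and `matrix[60][90]` the roll-over rate.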
58. Collection and Recovery – Objectives & Modeling
▪ What should you do with early delinquent customers?
▫ Self-cures should not be disturbed
▪ What actions to take on bad customers?
▫ Send SMS, court-notice etc.
▪ What to do with write-offs?
▫ Sell-off
▪ What to do with collaterals in secured loans?
▫ Repossession
Self-cure models
• Will customer pay on his own?
Bucket rollover
• Who will roll over to higher DPD?
Recovery Models
• The amount of money that can be recovered.
Pay/No-pay models
• Whether the person will pay?