TRANSACTIONS BASED ANALYTICS
Vijay Desai, SAS Institute
Presented at the Kelley School of Business, Bloomington, IN
Nov. 8, 2010
AGENDA
 Transactions landscape
 Transactions data
 Problems to tackle
 Transactions analytics
 Types of techniques
 Performance measurement
 Target definition
 Fraud
 Credit risk
 Attrition
 Deploying the solution
 Using the scores in production
 Monitoring the production system
TRANSACTIONS LANDSCAPE
Transactions Data
Problems to Tackle
TRANSACTIONS DATA
 Credit/Debit cards
 Authorizations
 Payments
 Statements
 Non-monetary data
 Bureau data
 Demographic data
 Campaign data
 Clickstream data
 Wire transfers
 Financial transactions
PROBLEMS TO TACKLE
 First party fraud
 Second party fraud
 Third party fraud
 Credit risk, bankruptcy
 Product offers, pricing
 Money laundering
 Financial trading violations
 Bio-terrorism
 Intrusion detection
PREDICTION VERSUS DETECTION
 Detection examples
 Credit card fraud
 Tax under-filing
 Bio-terror attack
 Prediction examples
 Charge-off, serious delinquency
 Cross-sell, up-sell propensity
 ???
 Attrition
 Fraud rings
 Network intrusion
(Diagram: a timeline around an event; prediction scores the risk before the event occurs, detection identifies it after it has begun.)
TARGET DEFINITION AND AVAILABILITY
(Diagram: problems arranged by how well their targets can be defined and how readily target data becomes available: attrition, credit risk, credit card fraud, tax under-filers, network intrusion, bio-terror attack.)
FIRST PARTY FRAUD
 Committed on own account
 Victimless fraud
 Examples
 Fictitious identities
 Check kiting
 Bust out fraud
 Tax under-filing
SECOND PARTY FRAUD
 Committed by someone known to or close to the genuine account holder
 Examples
 Employee abuse of corporate cards
 Relatives abusing cards/data of children, siblings, parents
 Caregivers abusing cards/data of senior citizens
THIRD PARTY FRAUD
 Committed on someone else’s account
 Impersonation of genuine identity
 Examples
 Identity theft
 Lost/stolen cards/accounts
 Stolen data/account information
 Online fraud with infected PCs
FRAUD TYPES: DEBIT CARD EXAMPLE
Source: First Data Resources
GLOBAL CARD FRAUD
US CARD FRAUD LOSSES
Source: Kansas City Federal Reserve
CARD FRAUD LOSSES FOR SELECT COUNTRIES
Source: Kansas City Federal Reserve
CREDIT RISK
 Existing accounts
 Serious delinquency
 Bankruptcy
 Charge-off
 New accounts
 Delinquency in first six months
 Bankruptcy in first six months
 Charge-off in first six months
CREDIT LIMIT AND BALANCES
DELINQUENCY STATUS
ATTRITION/CHURN RISK
 Closed/Cancelled account
 Loss of revenue due to a sharp and lasting reduction in balance and activity
NUMBER OF ACCOUNTS
OPENED AND CLOSED ACCOUNTS
TRANSACTIONS ANALYTICS
Techniques
Performance measurement
Target definitions
TYPES OF TECHNIQUES
 Rules
 Supervised learning models
 Regression, decision trees, neural networks, SVM
 Unsupervised learning models
 Clustering, PCA, neural networks
 Semi-supervised learning models
 Association rules/Market basket analysis
 Optimization
PREDICTION/DETECTION TECHNIQUES
 Un-supervised
 Semi-supervised
 Supervised
(Diagrams: neural-network architectures for each learning type, running from an input layer through feature/hidden layers, with an output layer in the semi-supervised and supervised cases.)
TYPICAL RULE BASED SYSTEM
 Pros
 Easy to understand
 Can be a batch or automated system
 Effective in catching the obvious cases
 Cons
 Too many false-positives
 Likely to miss many risky cases
 Does not provide priority for investigation
 Difficult to maintain
RULES FOR MEASURING SUCCESS
(Diagram: a spectrum of measurement settings, ranging from all “goods” and “bads” unknown to all “goods” and “bads” known.)
PERFORMANCE MEASURES
 How good is the score at separating the two classes of goods and bads?
 Information value
 Kolmogorov–Smirnov statistic
 Lift curve
 ROC curve
 Gini coefficient
 Somers’ D-concordance statistic
 How good is the score as a probability forecast?
 Binomial and Normal tests
 Hosmer-Lemeshow test
 How good are the score and cut-offs in business decisions?
 Error rates
 Swap set analysis
INFORMATION VALUE
 Divide the score range into bands indexed by i (formula below)
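The slide’s formula image is not preserved in this extraction; the standard definition, assumed here to match the talk, with g_i (b_i) denoting the fraction of all goods (bads) falling in band i, is:

    IV = \sum_i ( g_i - b_i ) \ln\frac{g_i}{b_i}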
PERFORMANCE DEFINITIONS
 Let F(s|G) (respectively F(s|B)) be the distribution function of the scores s of the goods G (respectively the bads B) in a scorecard
KOLMOGOROV–SMIRNOV STATISTIC
 Kolmogorov–Smirnov statistic (KS); formula below
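The slide’s formula image is not preserved; in terms of the distribution functions defined on the previous slide, the statistic is

    KS = \max_s | F(s|G) - F(s|B) |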
LIFT CURVE
 Plots the percentage of bads rejected versus the percentage rejected
 Ideal score given by ABC, where B represents the population bad rate
 Random score represented by AC
 Accuracy ratio AR = 2 × (area of curve above diagonal) / (area of ABC)
ROC CURVE
 ABC represents the ideal score
 Diagonal represents a random score
 Gini coefficient (GC) is twice the ratio of the area between the curve and the diagonal to the area of ABC
 GC = 1 corresponds to a perfect score
 GC = 0 represents a random score
 Somers’ D-concordance (SD)
 If a “good” and a “bad” are chosen at random, the good will have a lower score/probability of being bad than the bad
 AUROC is the area under the ROC curve
 GC = 2·AUROC − 1 = SD
BINOMIAL TEST
 Checks whether the predicted bad rate in a given bin i is correct or underestimated
 Let there be k bads among the n_i observations of bin i, and let p_i be the probability of a borrower in that band being good
 The predicted bad rate in bin i is taken as correct if the number of bads k in bin i is less than or equal to the critical value sketched below
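The slide’s formula is not preserved; a standard form of the test (an assumption here) accepts the predicted bad rate at confidence level α as long as the binomial tail probability of seeing k or more bads stays above 1 − α:

    \sum_{j=k}^{n_i} \binom{n_i}{j} (1 - p_i)^j \, p_i^{n_i - j} > 1 - \alpha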
NORMAL TEST
 Approximation of the Binomial test
 The predicted bad rate in bin i is taken as correct if the number of bads k in bin i is less than the threshold sketched below, where Φ^{-1} is the inverse of the cumulative normal distribution
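The slide’s formula is not preserved; the usual normal approximation (an assumption here), at confidence level α and with 1 − p_i the predicted bad probability, is

    k < n_i (1 - p_i) + \Phi^{-1}(\alpha) \sqrt{ n_i \, p_i (1 - p_i) }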
HOSMER-LEMESHOW TEST
 Assesses whether observed bad rates match expected bad rates in each of ten bins
 A chi-square test statistic, sketched below
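The “Let …” definitions on the slide are not preserved; the standard statistic (an assumption here), with n_i observations, k_i observed bads, and p̄_i the average predicted bad probability in bin i, is

    HL = \sum_{i=1}^{10} \frac{ (k_i - n_i \bar{p}_i)^2 }{ n_i \bar{p}_i (1 - \bar{p}_i) }

compared against a chi-square distribution with 8 degrees of freedom.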
SIMPLE SAS EXAMPLE-I
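The original example is not preserved in this extraction; a minimal sketch of the kind of step it likely showed, computing the two-sample Kolmogorov–Smirnov statistic with PROC NPAR1WAY (the data set and variable names are hypothetical):

    /* KS separation of score distributions for goods versus bads;  */
    /* SCORED is assumed to hold a model score and a 0/1 bad flag.  */
    proc npar1way data=scored edf;
      class bad;     /* 0 = good, 1 = bad                           */
      var score;     /* EDF option prints the KS statistics         */
    run;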
SIMPLE SAS EXAMPLE-II
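Again a hypothetical sketch rather than the original code: the LACKFIT option of PROC LOGISTIC produces the Hosmer-Lemeshow test (ten bins by default); the predictor names are invented:

    /* Fit a scorecard-style logistic model and request the         */
    /* Hosmer-Lemeshow goodness-of-fit test via LACKFIT.            */
    proc logistic data=develop;
      model bad(event='1') = utilization delinq_cnt payment_ratio / lackfit;
    run;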
ERROR RATES
 Account False Positive Ratio (AFPR): the ratio of good to bad accounts at a given cut-off score
 A ratio of 10:1 indicates that out of 11 accounts above the cut-off, 1 is bad and 10 are good
 Account Detection Rate (ADR): the ratio of detected bad accounts to the total number of bad accounts for the period at a given cut-off score
 If there are 100 bad accounts in the time period and 30 of them score at or above the cut-off at some time during the period, the ADR is 30%
 Value Detection Rate (VDR): the percentage of dollars saved on detected bad accounts at a given cut-off score
 If the total losses on all accounts are $1,000,000 and the system saves $600,000 of these, the VDR is 60%
SWAP SET ANALYSIS
 Used to compare two competing scores
 Choose the top x% of accounts using score1 and score2
 Eliminate the bads and goods common to both selections
 Compare the two remaining sets to identify bads caught by score1 but not score2, and vice versa
 Score1 is better than score2 if it has a higher bad rate in its swap set (see the sketch below)
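A minimal SAS sketch of this comparison, under hypothetical names (data set ACCTS with score1, score2 and a 0/1 bad flag; the top 5% taken as the cut):

    /* Percentile-rank both scores; DESCENDING puts the highest      */
    /* scores in group 0, so pct < 5 is the top 5%.                  */
    proc rank data=accts out=ranked groups=100 descending;
      var score1 score2;
      ranks pct1 pct2;
    run;

    /* Keep only the disagreements: the swap sets.                   */
    data swap1 swap2;
      set ranked;
      if pct1 < 5 and pct2 >= 5 then output swap1;      /* score1 only */
      else if pct2 < 5 and pct1 >= 5 then output swap2; /* score2 only */
    run;

    /* Bad rate in each swap set; the higher rate wins.              */
    proc means data=swap1 mean; var bad; run;
    proc means data=swap2 mean; var bad; run;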
TARGET DEFINITION: CARD FRAUD
(Timeline: pre-fraud period, fraud window, post-block period.)
 Pre-fraud: all transactions are legitimate
 Fraud window: runs from the date/time of the first fraudulent transaction to the block date/time. Fraud activity has not yet been detected or confirmed; the approved fraudulent transactions during this window are the fraud losses, and legitimate transactions can also occur in this period. (For a fraud case with no loss, there is no fraud window.)
 Post block: all transactions are declined/blocked
TARGET DEFINITION: CREDIT RISK
 Bad: account becomes any of the following at any time during the outcome window
 3+ cycles delinquent
 Bankrupt
 Charged-off
 Indeterminate accounts
 Maximum of 2 cycles delinquent in the outcome window
 Fraud or Transfer status in the outcome window
 Inactive accounts
 Indeterminate accounts are excluded from off-sample validation and off-time validation
 All other accounts are Good
TARGET DEFINITION: ATTRITION RISK
 Account closure
 Many banks/vendors use this to define “Bad” accounts
 Silent attrition
 Many banks/vendors use this to define “Bad” accounts
 Silent attrition is defined as a sharp and lasting drop in the economic value (balance and activity) of accounts that were valuable in prior periods
 Many banks/vendors refine this definition to exclude accounts that have other reasons for a change in economic value
 Many banks/vendors use both to define “Bad” accounts
 All other non-fraudulent, active, current accounts are classified as “Good” accounts
DEPLOYING THE SOLUTION
Using the scores in production
Monitoring the production system
SCORE USES
 Typical use of scores is in strategies to manage decisions concerning:
 Whether to approve/decline authorizations
 Whether to approve/decline over-limit requests
 Actions to make delinquent accounts current
 Increasing/decreasing credit limits
 Whether to reissue credit cards
 Collections-related actions
 The credit risk score is the most frequently used score for the above strategies; some banks also use attrition, revenue and profit scores
 Scores are also used in other strategies such as retention, balance transfer, balance build, convenience checks, and cross-sell/up-sell optimization
 Fraud scores are used for approve/decline/refer decisions
BENEFITS FROM REAL TIME SCORING
WHY DO BOTH RULES AND SCORING?
 Rules allow the input of client-specific intellectual property and operational constraints
 Rules allow tracking of, and adjustment for, new or short-term risk patterns
 Models pick up non-obvious risk patterns and behaviors
 Output from advanced models is easy to translate into probability and log-odds scores
 Scores can be used very easily to rank-order entities
 The combination of rules and scores provides a better detection rate and better-quality referrals
 Business implication: with the same amount of resources, catch more risky activity, and catch it earlier
 Faster way to get a good ROI
AUTHORIZATION STRATEGY EXAMPLE
OVERLIMIT STRATEGY EXAMPLE
RETAIL/CHECK STRATEGY EXAMPLE
CREDIT LIMIT STRATEGY
Risk Score Low Medium High
Credit Limit Utilization Low High Low High
Delinquency Status Clean Dirty Clean Dirty Clean Dirty Clean Dirty
Credit Line Inc. 0 500 0 1000 500 1500 1000 5000 2500
 Implemented in the form of decision trees/strategies
 Champion/challenger framework for improving strategies over time
 Randomly assign accounts to the champion or a challenger strategy
 Measure performance over time
 Takes six to twelve months to evaluate each challenger strategy
 Only a very small number of potential champion strategies can be tested at a given time
 Difficult to analyze why a particular challenger strategy worked
EXPANDING BEYOND THE “COMFORT ZONE”
Risk Score Low Medium High
Credit Limit Utilization Low High Low High
Delinquency Status Clean Dirty Clean Dirty Clean Dirty Clean Dirty
Champion Credit Line Inc. 0 500 0 1000 500 1500 1000 5000 2500
Test Group 1 0 0 0 500 0 500 0 2500 1000
Test Group 2 0 0 0 500 0 1500 0 3000 1500
Test Group 3 0 0 0 1500 0 2000 1500 4000 2000
Test Group 4 500 1000 500 2500 1000 3000 2000 7000 3000
Test Group 5 500 1500 1000 3000 1500 4000 2500 8000 4000
Test Group 6 500 2000 1500 4000 2500 5000 3000 9000 5000
NON-LINEAR PROGRAMMING EXAMPLE (A)
 Credit limit increases are a continuous variable
 Randomly choose a small number of accounts for optimization
 Use Lagrangian relaxation techniques
 Adding more constraints can make the solution more difficult
 Map the optimal solution to a decision tree to score all accounts
 Deploying the decision tree in lieu of the solution can result in a significant loss of the benefit of the whole effort
LINEAR PROGRAMMING EXAMPLE (B)
 Only discrete credit limit increases allowed
 A subset of LP problems has integer solutions most of the time
 Account-level optimization is possible
 Solve the relaxed LP problem and check feasibility for the remaining constraints (see the sketch below)
 No need to map the optimal solution to a score
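Neither slide’s formulation survives the extraction; a hypothetical PROC OPTMODEL sketch of an account-level credit-line LP of this general shape (the data set, parameters and bounds are all invented):

    proc optmodel;
      set <num> ACCTS;
      num benefit {ACCTS};  /* expected profit per dollar of increase */
      num loss {ACCTS};     /* expected loss per dollar of increase   */
      read data acct_data into ACCTS=[acct_id] benefit loss;

      var Inc {ACCTS} >= 0 <= 5000;          /* credit line increase  */

      max Profit = sum {i in ACCTS} benefit[i] * Inc[i];
      con Budget:  sum {i in ACCTS} Inc[i] <= 50000000;
      con LossCap: sum {i in ACCTS} loss[i] * Inc[i] <= 2000000;

      solve with lp;
      create data line_inc from [acct_id] Inc;  /* per-account result */
    quit;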
MONITORING THE SYSTEM
 Monitoring the scoring system
 Stability index of score
 Stability index of input fields
 Remedies for score deterioration
 Monitoring the portfolio
 Population stability report
 Characteristic analysis report
 Final score report
 Delinquency distribution report
 Roll rates
 Vintage analysis
 Reports by portfolio segments, risky segments
CHARACTERISTIC ANALYSIS REPORT
 Stability index (formula below)
 Characteristic reports
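The slide’s formula is not preserved; the stability index as commonly computed (an assumption here), with A_i the actual (current) and E_i the expected (development-sample) proportion of accounts in bin i, is

    SI = \sum_i (A_i - E_i) \ln\frac{A_i}{E_i}

Values below roughly 0.1 are usually read as stable and values above 0.25 as a significant shift.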
REMEDIES FOR SCORE DETERIORATION
 Score shelf life depends upon the problem
 Fraud scores have a shorter shelf life because fraudsters constantly change techniques
 Credit scores have a longer shelf life because the underlying causes do not change much over time
 Remedies
 Recalibrate the score
 Least expensive, easiest to implement
 A table mapping the old score to a new score
 Retrain the model
 More expensive, straightforward to implement
 Keep the same variables, simply change the weights/coefficients
 Rebuild the model
 Most expensive, needs the full implementation cycle
 New models with new variables and new weights/coefficients
QUARTERLY REPORTS
 Population stability report
 Measures change in the score distribution over time
 Characteristic analysis report
 Measures changes in individual input fields over time
 Final score report
 Measures how closely the score is followed in production
 E.g., shows the number of accepts and rejects by application score band
 Delinquency distribution report
 Measures portfolio quality by score range
QUARTERLY REPORT EXAMPLE
 Monitor change in population
NET FLOW RATE REPORT
Month Total Active 0 Days 30 Days 0 to 30 60 Days 30 to 60 90 Days 60 to 90 120 Days 90 to 120 Charge-off 120 to Charge-off
Jan-02 5,000,000 3,223,095 2708576 138010 62592 20993 15504 20304
Feb-02 4953109 3,042,517 2572243 135248 4.99% 53557 38.81% 22461 35.88% 20993 100.00% 15504 100.00%
Mar-02 4904891 3,113,894 2540610 149907 5.83% 50032 36.99% 20013 37.37% 20384 90.75% 10391 49.50%
Apr-02 5053111 2,871,802 2372516 156405 6.16% 32108 21.42% 15676 31.33% 12809 64.00% 16991 83.35%
May-02 4757579 3,499,756 3020579 107666 4.54% 49620 31.73% 30997 96.54% 15676 100.00% 12029 93.91%
Jun-02 4797435 2,705,767 2319788 159521 5.28% 35672 33.13% 23269 46.89% 10495 33.86% 12967 82.72%
Jul-02 4893318 3,413,728 2916158 146442 6.31% 49193 30.84% 21039 58.98% 16096 69.17% 10495 100.00%
Aug-02 4873484 2,995,243 2565883 91843 3.15% 48012 32.79% 26098 53.05% 21039 100.00% 15735 97.76%
Sep-02 4782782 3,474,030 2804788 173177 6.75% 44291 48.22% 33136 69.02% 21253 81.44% 14616 69.47%
Oct-02 4988121 3,365,931 2999460 118388 4.22% 38906 22.47% 23146 52.26% 15841 47.81% 14074 66.22%
Nov-02 5239903 2,991,770 2584154 152951 5.10% 46657 39.41% 17197 44.20% 14658 63.33% 15841 100.00%
Dec-02 4943682 3,204,539 2734118 141276 5.47% 48221 31.53% 23593 50.57% 12658 73.61% 14658 100.00%
4.99% of current accounts in Jan ’02 become 30 days delinquent in Feb ‘02 (135,248 of the 2,708,576 accounts that were current in January)
3,223,095 active accounts roll into 12,967 charge-offs, an annualized charge-off rate of 4.8%
VINTAGE CURVE REPORT
(Chart: cumulative % losses versus months on books, 0 to 40 months, for three vintage cohorts.)
Q&A