2. AGENDA
Transactions landscape
Transactions data
Problems to tackle
Transactions analytics
Types of techniques
Performance measurement
Target definition
Fraud
Credit risk
Attrition
Deploying the solution
Using the scores in production
Monitoring the production system
4. TRANSACTIONS DATA
Credit/Debit cards
Authorisations
Payments
Statements
Non-monetary data
Bureau data
Demographic data
Campaign data
Clickstream data
Wire transfers
Financial transactions
5. PROBLEMS TO TACKLE
First party fraud
Second party fraud
Third party fraud
Credit risk, bankruptcy
Product offers, pricing
Money laundering
Financial trading violations
Bio-terrorism
Intrusion detection
8. FIRST PARTY FRAUD
Committed on one's own account
Victimless fraud
Examples
Fictitious identities
Check kiting
Bust-out fraud
Tax under-filing
9. SECOND PARTY FRAUD
Committed by someone known or close to the genuine account holder
Examples
Employee abuse of corporate cards
Relatives abusing cards/data of children, siblings, or parents
Caregivers abusing cards/data of senior citizens
10. THIRD PARTY FRAUD
Committed on someone else’s account
Impersonation of a genuine identity
Examples
Identity theft
Lost/stolen cards/accounts
Stolen data/account information
Online fraud via infected PCs
13. US CARD FRAUD LOSSES
[Chart omitted] Source: Kansas City Federal Reserve
14. CARD FRAUD LOSSES FOR SELECT COUNTRIES
[Chart omitted] Source: Kansas City Federal Reserve
15. CREDIT RISK
Existing accounts
Serious delinquency
Bankruptcy
Charge-off
New accounts
Delinquency in first six months
Bankruptcy in first six months
Charge-off in first six months
24. TYPICAL RULE BASED SYSTEM
Pros
Easy to understand
Can run as a batch or automated system
Effective in catching the obvious cases
Cons
Too many false positives
Likely to miss many risky cases
Does not prioritize cases for investigation
Difficult to maintain
25. RULES FOR MEASURING SUCCESS
All “goods” and “bads” unknown
All “goods” and “bads” known
26. PERFORMANCE MEASURES
How well does the score separate the two classes, goods and bads?
Information value
Kolmogorov–Smirnov statistic
Lift curve
ROC curve
Gini coefficient
Somers' D concordance statistic
How good is the score as a probability forecast?
Binomial and normal tests
Hosmer–Lemeshow test
How good are the score and cut-offs in business decisions?
Error rates
Swap set analysis
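As a concrete illustration of one separation measure, here is a minimal pure-Python sketch of the Kolmogorov–Smirnov statistic; the score samples are hypothetical, and a higher score is taken to mean lower risk:

```python
def ks_statistic(goods, bads):
    """Kolmogorov-Smirnov statistic: the maximum vertical distance between
    the empirical CDFs of the good and bad score distributions."""
    def ecdf(sample, t):
        # fraction of the sample scoring at or below threshold t
        return sum(1 for s in sample if s <= t) / len(sample)
    thresholds = sorted(set(goods) | set(bads))
    return max(abs(ecdf(goods, t) - ecdf(bads, t)) for t in thresholds)

# Hypothetical scores: bads tend to score lower than goods
goods = [640, 660, 690, 700, 710, 720]
bads = [560, 580, 600, 640]
ks = ks_statistic(goods, bads)   # maximum ECDF gap, here at score 640
```

A KS near 1 means the two score distributions barely overlap; a KS near 0 means the score barely separates them.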
30. LIFT CURVE
Plots the percentage of bads rejected versus the percentage of all accounts rejected
The ideal score is given by curve ABC, where B corresponds to the population bad rate
A random score is represented by the diagonal AC
Accuracy ratio: AR = (area between the curve and the diagonal AC) / (area between the ideal curve ABC and the diagonal)
31. ROC CURVE
Curve ABC represents the ideal score
The diagonal represents a random score
Gini coefficient (GC): the ratio of the area between the ROC curve and the diagonal to the area between the ideal curve ABC and the diagonal (equivalently, twice the area between the curve and the diagonal)
GC = 1 corresponds to a perfect score
GC = 0 corresponds to a random score
Somers' D concordance (SD): if a “good” and a “bad” are chosen at random, the probability that the good has the lower score (probability of being bad) minus the probability that the bad does
AUROC is the area under the ROC curve
GC = 2·AUROC − 1 = SD
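The identity GC = 2·AUROC − 1 = SD can be checked directly with a small pairwise sketch (toy scores, not from the deck; a higher score again means lower risk):

```python
def auroc(goods, bads):
    """AUROC via the Mann-Whitney statistic: the fraction of (good, bad)
    pairs in which the good receives the higher score, ties counting half.
    Equivalently, the probability that a randomly chosen good outscores a
    randomly chosen bad."""
    wins = sum(1.0 if g > b else 0.5 if g == b else 0.0
               for g in goods for b in bads)
    return wins / (len(goods) * len(bads))

goods = [640, 700, 660, 720, 680]
bads = [600, 640, 620]
a = auroc(goods, bads)
gini = 2 * a - 1    # Gini coefficient = Somers' D
```

The pairwise form is O(n²) and only meant to make the definition transparent; production code would use a rank-based computation instead.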
32. BINOMIAL TEST
Checks whether the predicted bad rate in a given bin i is correct or underestimated
Let n_i be the number of observations in bin i and p_i the probability of a borrower in that bin being good
At confidence level q, the predicted bad rate in bin i is accepted if the number of bads k in bin i is less than the critical value
k* = min{ k : Σ_{j=k..n_i} C(n_i, j) (1 − p_i)^j p_i^(n_i − j) ≤ 1 − q }
33. NORMAL TEST
A normal approximation of the binomial test
The predicted bad rate in bin i is accepted if the number of bads k in bin i is less than
k* = n_i (1 − p_i) + Φ⁻¹(q) √( n_i p_i (1 − p_i) )
where Φ⁻¹ is the inverse of the cumulative normal distribution
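Both calibration tests can be sketched in a few lines; the bin size, good rate and confidence level in the example are hypothetical:

```python
import math
from statistics import NormalDist

def binomial_critical(n, p_good, q=0.99):
    """Smallest k* with P(X >= k*) <= 1 - q for X ~ Binomial(n, 1 - p_good),
    the number of bads among n borrowers.  The forecast bad rate passes the
    test at confidence level q if the observed bads k satisfy k < k*."""
    p_bad = 1.0 - p_good
    tail = 1.0                      # starts as P(X >= 0)
    for k in range(n + 1):
        if tail <= 1 - q:
            return k
        tail -= math.comb(n, k) * p_bad**k * p_good**(n - k)
    return n + 1

def normal_critical(n, p_good, q=0.99):
    """Normal approximation: n(1-p) + inv_Phi(q) * sqrt(n p (1-p))."""
    p_bad = 1.0 - p_good
    return n * p_bad + NormalDist().inv_cdf(q) * math.sqrt(n * p_bad * p_good)

# Hypothetical bin: 1000 borrowers, forecast good rate 98%, confidence 99%
k_exact = binomial_critical(1000, 0.98)
k_approx = normal_critical(1000, 0.98)   # about 30.3
```

The exact and approximate critical values agree to within a couple of bads here; the approximation degrades for small bins or very low bad rates.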
34. HOSMER–LEMESHOW TEST
Assesses whether observed bad rates match expected bad rates in each of ten bins
A chi-square test statistic
Let n_i, k_i and p_i be the number of observations, the number of bads and the probability of being good in bin i; then
HL = Σ_{i=1..10} (k_i − n_i (1 − p_i))² / ( n_i p_i (1 − p_i) )
37. ERROR RATES
Account False Positive Ratio (AFPR): the ratio of good to bad accounts at a given cut-off score
A ratio of 10:1 indicates that of 11 flagged accounts, 1 is bad and 10 are good
Account Detection Rate (ADR): the ratio of detected bad accounts to the total number of bad accounts for the period at a given cut-off score
If there are 100 bad accounts in the period and 30 of them score at or above the cut-off at some time during the period, the ADR is 30%
Value Detection Rate (VDR): the percentage of dollars saved on detected bad accounts at a given cut-off score
If total losses on all accounts are $1,000,000 and $600,000 of these are saved by the system, the VDR is 60%
38. SWAP SET ANALYSIS
Used to compare two competing scores
Choose the top x% of accounts using score1 and again using score2
Eliminate the bads and goods common to both selections
Compare the two remaining sets to identify bads caught by score1 but not score2, and vice versa
Score1 is better than score2 if it has the higher bad rate in the swap set
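A minimal sketch of the comparison (the account tuples are fabricated; a higher score means riskier):

```python
def swap_set_bad_rates(accounts, frac):
    """accounts: list of (acct_id, score1, score2, is_bad).  Flags the top
    frac of accounts under each score, drops the accounts common to both
    selections, and returns the bad rate of each score's swap set."""
    k = int(len(accounts) * frac)
    top1 = {a[0] for a in sorted(accounts, key=lambda a: -a[1])[:k]}
    top2 = {a[0] for a in sorted(accounts, key=lambda a: -a[2])[:k]}
    bad_ids = {a[0] for a in accounts if a[3]}
    only1, only2 = top1 - top2, top2 - top1
    def rate(ids):
        return len(ids & bad_ids) / len(ids) if ids else float("nan")
    return rate(only1), rate(only2)

# Fabricated accounts: score1 ranks the bads higher than score2 does
accounts = [(0, 9, 1, True), (1, 8, 9, True), (2, 7, 8, False),
            (3, 6, 2, True), (4, 5, 3, False), (5, 4, 7, False),
            (6, 3, 6, False), (7, 2, 5, False), (8, 1, 4, False),
            (9, 0, 0, False)]
r1, r2 = swap_set_bad_rates(accounts, 0.3)   # score1's swap set is all bad
```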
39. TARGET DEFINITION: CARD FRAUD
Pre-fraud period: all transactions are legitimate
Fraud window: starts at the date/time of the first fraudulent transaction and ends at the block date/time; fraud activity has not yet been detected or confirmed; the approved fraudulent transactions during this window are the fraud losses; legitimate transactions can also occur in this period (for a fraud case with no loss, there is no fraud window)
Post-block period: starts at the block date/time; all transactions are declined/blocked
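The three periods can be sketched as a labelling function over transaction timestamps (the datetimes in the example are fabricated):

```python
from datetime import datetime

def label_transaction(txn_time, first_fraud_time, block_time):
    """Assign a transaction on a confirmed fraud case to its period."""
    if txn_time < first_fraud_time:
        return "pre-fraud"       # all transactions legitimate
    if txn_time < block_time:
        return "fraud-window"    # approved fraudulent txns here are the losses
    return "post-block"          # everything is declined / blocked

first_fraud = datetime(2024, 3, 1, 10, 0)
block = datetime(2024, 3, 3, 9, 0)
label = label_transaction(datetime(2024, 3, 2, 12, 0), first_fraud, block)
```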
40. TARGET DEFINITION: CREDIT RISK
Bad: account becomes, at any time during the outcome window:
3+ cycles delinquent
Bankrupt
Charged-off
Indeterminate accounts:
At most 2 cycles delinquent in the outcome window
Fraud or transfer status in the outcome window
Inactive accounts
Indeterminate accounts are excluded from out-of-sample and out-of-time validation
All other accounts are Good
41. TARGET DEFINITION: ATTRITION RISK
Account closure
Many banks/vendors use this alone to define “Bad” accounts
Silent attrition
Many banks/vendors use this alone to define “Bad” accounts
Silent attrition is defined as a sharp and lasting drop in the economic value (balance and activity) of accounts that were valuable in prior periods
Many banks/vendors refine this definition to exclude accounts with other reasons for the change in economic value
Many banks/vendors use both to define “Bad” accounts
All other non-fraudulent, active, current accounts are classified as “Good” accounts
42. DEPLOYING THE SOLUTION
Using the scores in production
Monitoring the production system
43. SCORE USES
Typical use of scores is in strategies to manage decisions concerning:
Whether to approve/decline authorizations
Whether to approve/decline over-limit requests
Actions to make delinquent accounts current
Whether to increase/decrease credit limits
Whether to reissue credit cards
Collections-related actions
The credit risk score is the most frequently used score for the above strategies; some banks also use attrition, revenue and profit scores
Scores are also used in other strategies such as retention, balance transfer, balance build, convenience checks, and cross-sell/up-sell optimization
Fraud scores are used for approve/decline/refer decisions
45. WHY DO BOTH RULES AND SCORING?
Rules allow the input of client-specific intellectual property and operational constraints
Rules allow tracking of, and adjustment for, new or short-term risk patterns
Models pick up non-obvious risk patterns and behaviors
Output from advanced models is easy to translate into probability and log-odds scores
Scores can be used very easily to rank-order entities
The combination of rules and scores provides a better detection rate and better-quality referrals
Business implication: with the same amount of resources,
Catch more risky activity
Catch it earlier
A faster way to get a good ROI
49. CREDIT LIMIT STRATEGY

Risk Score                 Low   Medium        Medium        High          High
Credit Limit Utilization    –    Low           High          Low           High
Delinquency Status          –    Clean Dirty   Clean Dirty   Clean Dirty   Clean Dirty
Credit Line Increase        0    500   0       1000  500     1500  1000    5000  2500

Implemented in the form of decision trees/strategies
Champion/challenger framework for improving strategies over time
Randomly assign accounts to the champion or a challenger strategy
Measure performance over time
It takes six to twelve months to evaluate each challenger strategy
Only a very small number of potential champion strategies can be tested at a given time
Difficult to analyze why a particular challenger strategy worked
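The random assignment step can be sketched as follows; the 10% challenger share and the seed are assumptions for illustration:

```python
import random

def assign_strategy(account_ids, challenger_share=0.10, seed=42):
    """Randomly route a small share of accounts to the challenger strategy;
    the rest stay on the champion.  Performance of the two groups is then
    compared over the following six to twelve months."""
    rng = random.Random(seed)
    return {acct: ("challenger" if rng.random() < challenger_share
                   else "champion")
            for acct in account_ids}

assignment = assign_strategy(range(10_000))
share = sum(v == "challenger" for v in assignment.values()) / len(assignment)
```

Random assignment is what makes the later champion-versus-challenger comparison causal rather than a selection artifact.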
50. EXPANDING BEYOND THE “COMFORT ZONE”

Risk Score                   Low   Medium        Medium        High          High
Credit Limit Utilization      –    Low           High          Low           High
Delinquency Status            –    Clean Dirty   Clean Dirty   Clean Dirty   Clean Dirty
Champion Credit Line Inc.     0    500   0       1000  500     1500  1000    5000  2500
Test Group 1                  0    0     0       500   0       500   0       2500  1000
Test Group 2                  0    0     0       500   0       1500  0       3000  1500
Test Group 3                  0    0     0       1500  0       2000  1500    4000  2000
Test Group 4                  500  1000  500     2500  1000    3000  2000    7000  3000
Test Group 5                  500  1500  1000    3000  1500    4000  2500    8000  4000
Test Group 6                  500  2000  1500    4000  2500    5000  3000    9000  5000
51. NON-LINEAR PROGRAMMING EXAMPLE (A)
Credit limit increases are treated as a continuous variable
Randomly choose a small number of accounts for optimization
Use Lagrangian relaxation techniques
Adding more constraints can make the solution more difficult
Map the optimal solution to a decision tree in order to score all accounts
Deploying the decision tree in lieu of the exact solution can result in a significant loss of the benefit of the whole effort
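The Lagrangian-relaxation idea can be sketched on a toy version of the problem: each account has a menu of (increase, expected profit, expected loss) options, total profit is maximised subject to a portfolio loss budget, and the budget constraint is dualised with a multiplier found by bisection. The option menus and the budget are fabricated, and a small duality gap is possible in general:

```python
def optimize_lines(accounts, loss_budget, iters=60):
    """accounts: list of option menus; each option is a tuple
    (credit_line_increase, expected_profit, expected_loss).
    Maximise total profit subject to total expected loss <= loss_budget."""
    def pick(lam):
        # with the loss constraint dualised, accounts decouple: each one
        # independently picks the option maximising profit - lam * loss
        return [max(opts, key=lambda o: o[1] - lam * o[2]) for opts in accounts]
    lo, hi = 0.0, 1e6
    for _ in range(iters):           # bisection on the multiplier
        lam = (lo + hi) / 2
        if sum(o[2] for o in pick(lam)) > loss_budget:
            lo = lam                 # too much risk: penalise loss harder
        else:
            hi = lam
    return pick(hi)                  # hi always yields a feasible plan

# Two identical toy accounts; the loss budget forces the middle option
menus = [[(0, 0, 0), (1000, 50, 20), (2000, 80, 60)]] * 2
plan = optimize_lines(menus, loss_budget=40)
```

Because each account decides independently once the multiplier is fixed, the method scales to large portfolios, which is why it suits the sampled-accounts setting described on the slide.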
52. LINEAR PROGRAMMING EXAMPLE (B)
Only discrete credit limit increases are allowed
A subset of LP problems has integer solutions most of the time
Account-level optimization is possible
Solve the relaxed LP problem and check feasibility for the remaining constraints
No need to map the optimal solution to a score
53. MONITORING THE SYSTEM
Monitoring the scoring system
Stability index of the score
Stability index of input fields
Remedies for score deterioration
Monitoring the portfolio
Population stability report
Characteristic analysis report
Final score report
Delinquency distribution report
Roll rates
Vintage analysis
Reports by portfolio segments and risky segments
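The stability index referred to here is commonly computed as the Population Stability Index; a sketch, with fabricated band proportions and a widely used rule of thumb (not from the deck) noted in the docstring:

```python
import math

def psi(expected, actual):
    """Population Stability Index between the score distribution at model
    development (expected) and the current population (actual), computed
    over the same score bands; each list holds band proportions summing to 1.
    Common rule of thumb: < 0.10 stable, 0.10-0.25 moderate shift,
    > 0.25 major shift."""
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))

dev = [0.10, 0.20, 0.40, 0.20, 0.10]   # fabricated development distribution
now = [0.06, 0.16, 0.38, 0.24, 0.16]   # fabricated current distribution
drift = psi(dev, now)
```

The same formula applied to one input field at a time gives the characteristic analysis in the next slides.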
55. REMEDIES FOR SCORE DETERIORATION
Score shelf life depends on the problem
Fraud scores have a shorter shelf life because fraudsters constantly change techniques
Credit scores have a longer shelf life because the underlying causes do not change much over time
Remedies
Recalibrate the score
Least expensive, easiest to implement
A table mapping the old score to a new score
Retrain the model
More expensive, but straightforward to implement
Keep the same variables; only change the weights/coefficients
Rebuild the model
Most expensive; needs the full implementation cycle
New models with new variables and new weights/coefficients
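The "table mapping the old score to a new score" can be sketched as follows: for each old-score band, compute the recently observed good:bad odds and place them on a points-to-double-odds scale. The scaling constants (base score 600 at 50:1 odds, 20 points to double the odds) and the observations are assumptions for illustration:

```python
import math

def recalibration_table(band_edges, observations,
                        pdo=20, base_score=600, base_odds=50):
    """observations: list of (old_score, is_bad) from a recent window.
    Returns {(lo, hi): new_score} where the new score reflects the band's
    observed good:bad odds on the given points-to-double-odds (PDO) scale."""
    table = {}
    for lo, hi in band_edges:
        in_band = [bad for s, bad in observations if lo <= s < hi]
        bads = sum(in_band)
        goods = len(in_band) - bads
        odds = (goods + 0.5) / (bads + 0.5)   # smoothed good:bad odds
        table[(lo, hi)] = base_score + pdo * math.log2(odds / base_odds)
    return table

# Fabricated recent outcomes: the higher band performs much better
obs = ([(550, True)] * 10 + [(550, False)] * 90 +
       [(650, True)] * 1 + [(650, False)] * 99)
table = recalibration_table([(500, 600), (600, 700)], obs)
```

Recalibration of this kind leaves the rank ordering untouched and only realigns the score-to-odds relationship, which is why it is the cheapest remedy.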
56. QUARTERLY REPORTS
Population stability report
Measures change in the score distribution over time
Characteristic analysis report
Measures changes in individual input fields over time
Final score report
Measures how closely the score is followed in production
E.g., shows the number of accepts and rejects by application score band
Delinquency distribution report
Measures portfolio quality across score ranges