1330 keynote Shahapurkar
- 1. © 2016 Fair Isaac Corporation. Confidential. 1
© 2016 Fair Isaac Corporation. Confidential.
This presentation is provided for the recipient only and cannot be reproduced or shared without Fair Isaac Corporation’s express consent.
A Consortium and Its Data
Fraud Screening for 2/3rds of All Card Transactions
Scott Zoldi, PhD
Chief Analytics Officer, FICO
ScottZoldi@fico.com
@ScottZoldi
Predictive Analytics World for Business
San Francisco
May 16, 2017
- 2.
About Me
• Responsible for analytic development of FICO’s
product and technology solutions, including Falcon
Fraud Manager
• 17 years at FICO
• Author of 77 patents
─ 38 granted and 39 in process
• Recent focus on self-learning analytics for real-time
detection of cybersecurity attacks, AML detection,
and mobile device analytics
• Ph.D. in theoretical physics from Duke University
- 3.
History of Fraud Detection at FICO
• 90%: percentage of the US payment cards covered by FICO fraud solutions
• 9,000: banks participating in FICO’s fraud data consortium
• 2.6B: active financial accounts protected by FICO worldwide
• 10 ms: average response time for fraud decisions rendered by Falcon
[Chart: fraud losses by year, 1990 to 2014 (vertical axis 0 to 20, units not shown), with the introduction of Falcon marked]
- 4.
Profiles Summarize Customer Transaction History
Recursive analytic features to efficiently summarize history
Customer History (too big!):
12:31:05, MCC123, USD, ...
13:01:15, MCC234, USD, ...
13:32:07, MCC345, USD, ...
14:03:25, ATM223, USD, ...
...
18:42:27, MCC567, USD, ...
...
Profile(t-2) → Profile(t-1) → Profile(t)
• Profile(t-2): numTrx(t-2), numCashTrx(t-2), avgAmount4hr(t-2)
• Profile(t-1): numTrx(t-1), numCashTrx(t-1), avgAmount4hr(t-1)
• Profile(t): numTrx(t), numCashTrx(t), avgAmount4hr(t)
[Chart: amount vs. time, with transactions labeled clothing expense, restaurant expense, and cash withdrawal at ATM]
Patents 14/796,547 (USA), 14/613,300 (USA)
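The recursive idea on this slide can be sketched as follows: Profile(t) is computed from Profile(t-1) and the new transaction alone, so the raw history never has to be stored. This is a minimal illustration, not FICO's implementation. The `Profile` class, the `update_profile` function, and the exponential decay used to approximate a 4-hour average are all assumptions introduced here.

```python
import math
from dataclasses import dataclass


@dataclass
class Profile:
    """Fixed-size summary of one customer's history (names loosely follow the slide)."""
    num_trx: int = 0
    num_cash_trx: int = 0
    decayed_sum: float = 0.0     # time-decayed sum of amounts (assumed mechanism)
    decayed_weight: float = 0.0  # time-decayed transaction count (assumed mechanism)
    last_time_s: float = 0.0

    @property
    def avg_amount_4hr(self) -> float:
        if self.decayed_weight == 0.0:
            return 0.0
        return self.decayed_sum / self.decayed_weight


def update_profile(p, time_s, amount, is_cash, window_s=4 * 3600.0):
    """Return Profile(t) using only Profile(t-1) and the new transaction."""
    dt = max(0.0, time_s - p.last_time_s)
    decay = math.exp(-dt / window_s)  # older activity fades with a 4-hour time constant
    return Profile(
        num_trx=p.num_trx + 1,
        num_cash_trx=p.num_cash_trx + int(is_cash),
        decayed_sum=p.decayed_sum * decay + amount,
        decayed_weight=p.decayed_weight * decay + 1.0,
        last_time_s=time_s,
    )


# Two hypothetical transactions: a card purchase, then a cash withdrawal 30 minutes later.
profile = Profile()
for time_s, amount, is_cash in [(0.0, 20.0, False), (1800.0, 100.0, True)]:
    profile = update_profile(profile, time_s, amount, is_cash)
```

Memory per customer stays constant no matter how long the history grows, which is the property the slide highlights.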
- 5.
Fraud Detection Through Neural Networks
Powerful detection of known patterns of fraud using supervised learning
Feature Extraction → Input Layer → Hidden Layer → Output Layer
Model Inputs
• Features based on raw transactions
Hidden Layer
• Computational unit takes input and generates output
• Uses an activation function
Model Output
• Single output indicating fraud/non-fraud
Activation function
$f(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$
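A minimal sketch of the forward pass this slide describes: input features, one hidden layer using the slide's activation function (the hyperbolic tangent), and a single output. The weights and feature values below are invented for illustration, and applying the same activation at the output is an assumption, since the slide does not say how the output unit is computed.

```python
import math


def activation(z: float) -> float:
    """f(z) = (e^z - e^-z) / (e^z + e^-z), as written on the slide (equals tanh)."""
    # Note: math.exp overflows for very large |z|; fine for this small illustration.
    return (math.exp(z) - math.exp(-z)) / (math.exp(z) + math.exp(-z))


def forward(features, hidden_weights, hidden_biases, out_weights, out_bias):
    """One hidden layer and a single output in the open interval (-1, 1)."""
    hidden = [
        activation(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(hidden_weights, hidden_biases)
    ]
    return activation(sum(w * h for w, h in zip(out_weights, hidden)) + out_bias)


# Hypothetical inputs: three features and two hidden units.
score = forward(
    features=[0.2, 1.0, -0.5],
    hidden_weights=[[0.1, 0.4, -0.3], [-0.2, 0.3, 0.5]],
    hidden_biases=[0.0, 0.1],
    out_weights=[0.7, -0.4],
    out_bias=0.05,
)
```

In supervised training the weights would be fitted to labeled fraud and non-fraud examples; that step is omitted here.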
- 6.
Fraud Detection With Unsupervised Learning
Effective detection of unknown and changing patterns of fraud
Streaming self-calibration
• Track 95% and 99% points automatically
• For each feature, create outlier model
• Memory & time efficient, no historical data storage
• Real-time adapting to every transaction
[Chart: probability distribution of a feature, with the current feature value ranging from low risk to high risk and the 95% and 99% points marked]
Multi-layered Self-Calibrating (MLSC) Score
• Combine the outlier models from all features
• Features in hidden nodes are selected to minimize correlation
• Weights based on:
─ Expert knowledge
─ Limited data
• Scores 1–999
[Diagram: Input Layer → Hidden Layer → Output Layer (Multi-Layer Self-Calibrating Score), with weight tuning]
Patents 8,027,439, 8,041,597, 13/367,344 (USA), 14/796,547 (USA), 14/613,300 (USA)
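The streaming self-calibration bullets can be illustrated with a constant-memory quantile tracker and a scaled outlier value. This is a sketch under stated assumptions, not the patented method. The update rule is a generic stochastic-approximation quantile estimator swapped in for illustration, and the names `StreamingQuantile` and `outlier_value`, the step size, and the cap are hypothetical.

```python
class StreamingQuantile:
    """Tracks one quantile with O(1) memory; no historical data is stored."""

    def __init__(self, p: float, step: float = 0.2):
        self.p = p
        self.step = step    # assumed fixed step; it should match the feature's scale
        self.value = None   # initialized from the first observation

    def update(self, x: float) -> float:
        if self.value is None:
            self.value = x
        elif x > self.value:
            self.value += self.step * self.p          # nudge the estimate up
        else:
            self.value -= self.step * (1.0 - self.p)  # nudge the estimate down
        return self.value


def outlier_value(x: float, q95: float, q99: float, cap: float = 6.0) -> float:
    """0 at or below the 95% point, 1 at the 99% point, capped above."""
    spread = max(q99 - q95, 1e-9)
    return min(max((x - q95) / spread, 0.0), cap)


# Hypothetical feature stream: a repeating shuffle of the values 0..99.
q95, q99 = StreamingQuantile(0.95), StreamingQuantile(0.99)
for i in range(20000):
    x = float((i * 37) % 100)
    q95.update(x)
    q99.update(x)

risk = outlier_value(120.0, q95.value, q99.value)
```

Each feature would get its own pair of trackers, and the per-feature outlier values would then be combined with weights into a single score, as the MLSC bullets describe.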
- 7.
Auto-Encoder Learns a Data Representation
• Deep Learning algorithm that sets target values equal to input and applies unsupervised learning to minimize the reconstruction error
─ Provides a compressed distributed representation (encoding) of original data.
$E_R = \lVert x - x_R \rVert^2$
[Diagram: input $x$ → encoder weights $W_E$ → latent features → decoder weights $W_D$ → reconstruction $x_R$; learning minimizes the reconstruction error $E_R$]
Reconstructed image source: https://commons.wikimedia.org/w/index.php?curid=488211
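To make the reconstruction error concrete, the sketch below encodes an input with $W_E$, decodes it with $W_D$, and computes $E_R = \lVert x - x_R \rVert^2$. It assumes a purely linear auto-encoder with hand-picked weights rather than learned ones, so it only illustrates the error computation, not training.

```python
def encode(x, w_e):
    """Latent features: one dot product per encoder row."""
    return [sum(w * v for w, v in zip(row, x)) for row in w_e]


def decode(latent, w_d):
    """Reconstruction x_R: one dot product per decoder row."""
    return [sum(w * h for w, h in zip(row, latent)) for row in w_d]


def reconstruction_error(x, w_e, w_d):
    """E_R = ||x - x_R||^2."""
    x_r = decode(encode(x, w_e), w_d)
    return sum((a - b) ** 2 for a, b in zip(x, x_r))


# Hypothetical weights: 2-D input compressed to 1 latent feature along (0.6, 0.8).
W_E = [[0.6, 0.8]]
W_D = [[0.6], [0.8]]

on_pattern = reconstruction_error([3.0, 4.0], W_E, W_D)    # lies along the encoded direction
off_pattern = reconstruction_error([4.0, -3.0], W_E, W_D)  # orthogonal to it
```

An input that matches the structure captured by the encoding reconstructs almost perfectly (error near 0), while the orthogonal input is lost entirely (error near 25). That contrast is what makes the reconstruction error usable as a signal.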
- 8.
Power of Pooling
Data Consortium
- 9.
Data Consortium
Percentage of credit card accounts in the world that are covered by FICO fraud solutions
• Most transactions are genuine
─ Fraud: a rare class problem
─ Genuine cases could be non-representative
─ Hard to create a great model
• Consortium
─ Pools data from across the globe
─ More fraud cases
─ More diverse data
─ Superior models
Clients benefit from the pooled data
- 10.
Challenges Working With Consortium
Terabytes of raw data received each month
• Media & Frequency
• Processing power & time
• Data-format
• Data Security & Compliance
• Cross-Contamination
• Data-quality
• Client-uniqueness
- 11.
Consortium Data Flow: 4 Steps
Receive File → Process File → Process Data → Clean Data
- 12.
(1) Data Transmission
Data → FICO “Landing Zone”
• Electronic, disc, etc.
• Daily, weekly, monthly
• Process 50,000+ Consortium files per month
• About 1 petabyte of payment card data in 5 years
- 13.
(2) Data Security and Basic ETL
• Encrypt and obfuscate (hash) PII and sensitive client data
• Quarantine data that fails the FICO Data Security Analysis
• File ETL: archive, inventory, join, transform, and tag
Files Receipt
• Encryption Check
• Extract
• File Tagging
• Assign Client & Build Archives
Process File
• Receipt
• Security
• Basic file processing
• Archive data
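The hash-and-quarantine bullets can be sketched as below. This is an illustration under assumptions, not the FICO Data Security Analysis. The keyed hash (HMAC-SHA256), the record layout, the function names, and the rule that any remaining 13 to 19 digit run counts as a raw card number are all hypothetical.

```python
import hashlib
import hmac
import re

# Assumed rule: a standalone run of 13-19 digits looks like a raw card number (PAN).
PAN_PATTERN = re.compile(r"\b\d{13,19}\b")


def obfuscate_pan(pan: str, key: bytes) -> str:
    """Keyed hash: the same card always maps to the same token, without exposing the PAN."""
    return hmac.new(key, pan.encode("utf-8"), hashlib.sha256).hexdigest()


def secure_record(record: dict, key: bytes) -> dict:
    """Return a copy of the record with the PAN field replaced by its token."""
    out = dict(record)
    out["pan"] = obfuscate_pan(record["pan"], key)
    return out


def needs_quarantine(record: dict) -> bool:
    """Hypothetical check: flag any string field that still looks like a raw PAN."""
    return any(isinstance(v, str) and PAN_PATTERN.search(v) for v in record.values())


key = b"example-key-not-for-production"
raw = {"pan": "4111111111111111", "amount": "12.50", "mcc": "5812"}
safe = secure_record(raw, key)
```

Because the token is deterministic for a given key, transactions on the same card can still be joined after obfuscation, which recursive profiles need.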
- 14.
(3) High Level Statistically-Based Alerting and Trend Analysis
• FICO processes about 20 billion Falcon records/month
• Each file checked against global and client-specific statistical distributions
• Health checks
• Statistical analysis
• Tabulate
[Chart: data checks on transaction volume over time, with missing data marked]
Example alert:
To: analystA@fico.com
Subject: Data Issue
File x has invalid fraud dates
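One way to picture a file being checked against a client-specific distribution is shown below. It is a simplified sketch. The `check_file` function, the z-score test on record counts, and both thresholds are assumptions chosen for illustration, not the checks FICO runs.

```python
import statistics


def check_file(record_count, history_counts, missing_rate,
               z_limit=3.0, missing_limit=0.05):
    """Return a list of alert strings for one incoming file."""
    alerts = []
    mean = statistics.mean(history_counts)
    stdev = statistics.stdev(history_counts)
    # Compare this file's volume with the client's own history.
    z = (record_count - mean) / stdev if stdev > 0 else 0.0
    if abs(z) > z_limit:
        alerts.append(f"volume z-score {z:.1f} outside +/-{z_limit}")
    if missing_rate > missing_limit:
        alerts.append(f"missing rate {missing_rate:.1%} above {missing_limit:.0%}")
    return alerts


# Hypothetical client history of daily record counts.
history = [1000, 1020, 980, 1010, 990]
normal_alerts = check_file(1005, history, missing_rate=0.01)
bad_alerts = check_file(400, history, missing_rate=0.20)
```

A non-empty alert list would feed a notification such as the example e-mail on the slide.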
- 15.
(4) Apply Domain and Client Specific Transformations
• Data fixes
• Automatic, client-specific
• Cross-client uniformity
[Diagram: raw file → clean file. Record lengths shown: 500, 492, 494, then 500 for the remaining seven records. Example field labels: ATM, currency code, Spain, CVV2 valid, customer not present, .com merchant, eCommerce]
Regularly review transformations for relevance and sunset if obsolete
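A small sketch of automatic, client-specific transformations with a sunset date. It assumes the record lengths on the slide illustrate short records being brought to a uniform length of 500; that reading, the padding rule, and the `Transformation` registry are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional


@dataclass
class Transformation:
    name: str
    client: str
    apply: Callable[[str], str]
    sunset: Optional[date] = None  # set when a review finds the fix obsolete


def pad_record(record: str, length: int = 500) -> str:
    """Right-pad short records with spaces; full-length records are unchanged."""
    return record.ljust(length)


def clean_file(records, client, transformations, today):
    """Apply every transformation that is registered for the client and not yet sunset."""
    active = [t for t in transformations
              if t.client == client and (t.sunset is None or today < t.sunset)]
    cleaned = []
    for rec in records:
        for t in active:
            rec = t.apply(rec)
        cleaned.append(rec)
    return cleaned


# Hypothetical client feed with two short records.
registry = [Transformation("pad-to-500", "client_a", pad_record)]
raw = ["x" * 500, "x" * 492, "x" * 494]
cleaned = clean_file(raw, "client_a", registry, today=date(2017, 5, 16))
```

Keeping the sunset date next to each fix makes the periodic review on the slide a data change rather than a code change.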
- 16.
Model Governance
Applications of Supervised and Unsupervised Learning Technologies
- 17.
Model Governance Is Serious Business
Modeling Data
• Model Inputs
• Development Data
• Data Quality
• Sensitivity Analysis
Model Specifications
• Model Structure
• Model Assumptions
• Benchmark/Alternative Architectures
• Model Updates
Model Validation
• Data Validation
• Model Validation
Deployment Validation
• Post Implementation Validation
• Performance Monitoring
[Diagram: FICO Model Governance, with labels Regulators, Customers, Internal Audit, OCC, FICO, and Client Requests]
- 18.
The OLD: Data Quality Reporting
Check basic data integrity
─ Data quality reports
─ Red flags: Missing records, fields, or incorrect data types
Monitor before and during model deployment
─ Data Statistics
─ Score Distributions
─ Model Performance snapshots
- 19.
The NEW: Auto-Encoders in Model & Data Governance
1. Data Feed Validation
2. New Client Model Selection
3. Trigger Model-Retrain
4. Anomalous Transaction Identification
[Diagram: Production Data → Production Model]
- 20.
1. Data Feed Validation
Statistical analysis is often too generic to point to data integrity issues
• An auto-encoder can identify sets of transactions across clients whose reconstruction errors differ from the rest; these sets point to key data integrity issues
• Cluster reconstruction errors → per-cluster root cause analysis
[Chart: frequency vs. transaction amount, contrasting wrong currency conversion with correct currency conversion]
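As a rough illustration of how reconstruction errors can single out a bad feed, the sketch below groups errors by feed and flags feeds whose typical error is far above the others. Grouping by feed is a simplification swapped in for the clustering step named on the slide. The `flag_feeds` function, the median comparison, and the ratio limit are assumptions.

```python
import statistics
from collections import defaultdict


def flag_feeds(errors_by_txn, ratio_limit=3.0):
    """errors_by_txn: iterable of (feed_id, reconstruction_error) pairs."""
    by_feed = defaultdict(list)
    for feed_id, err in errors_by_txn:
        by_feed[feed_id].append(err)
    feed_medians = {f: statistics.median(v) for f, v in by_feed.items()}
    # Baseline: the typical per-feed error across all feeds.
    baseline = statistics.median(feed_medians.values())
    return sorted(f for f, m in feed_medians.items() if m > ratio_limit * baseline)


# Hypothetical errors: feed "d" behaves as if its amounts were converted wrongly.
observations = [
    ("a", 0.08), ("a", 0.10), ("a", 0.12),
    ("b", 0.11), ("b", 0.12), ("b", 0.14),
    ("c", 0.07), ("c", 0.09), ("c", 0.10),
    ("d", 4.50), ("d", 5.00), ("d", 5.50),
]
suspect_feeds = flag_feeds(observations)
```

Each flagged feed would then go to root cause analysis, for example checking its currency conversion.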
- 21.
2. New Client Model Selection
Identify the model trained on consortium data that has the minimal reconstruction error on the new client’s transaction data
[Figure: candidate models compared by reconstruction error, with the minimum reconstruction error highlighted; example label “Indonesia?”]
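The selection rule on this slide amounts to an argmin over candidate models. The sketch below assumes each candidate exposes a reconstruct function for its companion auto-encoder. The two toy reconstruct functions and the model names are placeholders, not real consortium models.

```python
def mean_reconstruction_error(reconstruct, transactions):
    """Average squared reconstruction error over the new client's transactions."""
    total = 0.0
    for x in transactions:
        x_r = reconstruct(x)
        total += sum((a - b) ** 2 for a, b in zip(x, x_r))
    return total / len(transactions)


def select_model(candidates, transactions):
    """candidates: dict of model name -> reconstruct function. Returns (best name, errors)."""
    errors = {name: mean_reconstruction_error(fn, transactions)
              for name, fn in candidates.items()}
    return min(errors, key=errors.get), errors


# Placeholder auto-encoders: one reconstructs poorly, one reconstructs well.
candidates = {
    "region_a": lambda x: [0.0 for _ in x],
    "region_b": lambda x: [0.9 * v for v in x],
}
new_client_data = [[1.0, 2.0], [3.0, 4.0]]
best, errors = select_model(candidates, new_client_data)
```

The model whose auto-encoder reconstructs the new client's data best is the one whose training population most resembles that client.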
- 22.
3. Trigger Model-Retrain
Learn a companion auto-encoder network based on the same data as the unsupervised model
─ The unsupervised model and the auto-encoder network are packaged together and installed in the production environment.
[Charts: reconstruction error over the timeline. No significant deviation: continue to score. Significant deviation: rebuild.]
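A retrain trigger of this kind can be sketched as a rolling comparison between production reconstruction errors and the errors seen at build time. The `RetrainMonitor` class, the window size, and the z-limit on the window mean are assumptions made for illustration; the slide does not specify how "significant deviation" is measured.

```python
import statistics
from collections import deque


class RetrainMonitor:
    """Signals a rebuild when recent reconstruction errors drift from the baseline."""

    def __init__(self, baseline_errors, window=100, z_limit=3.0):
        self.mean = statistics.mean(baseline_errors)
        self.stdev = statistics.stdev(baseline_errors)
        self.recent = deque(maxlen=window)
        self.z_limit = z_limit

    def observe(self, error: float) -> bool:
        """Add one production error; return True when a rebuild should be triggered."""
        self.recent.append(error)
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough production data yet
        window_mean = statistics.mean(self.recent)
        # Standard error of a window mean under the baseline distribution.
        stderr = self.stdev / (len(self.recent) ** 0.5)
        return abs(window_mean - self.mean) > self.z_limit * stderr


# Hypothetical errors: stable at first, then a shift in the production data.
monitor = RetrainMonitor([0.9, 1.0, 1.1, 1.0, 0.95, 1.05], window=4)
stable = [monitor.observe(e) for e in [1.0, 0.95, 1.05, 1.0]]
shifted = [monitor.observe(2.0) for _ in range(4)]
```

Packaging the monitor with the model, as the slide describes, lets the production environment raise the rebuild signal without access to the original training data.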
- 23.
4. Anomalous Transaction Identification
• Score or rule-triggered review
• Outlier detection
• Feature engineering
[Chart: reconstruction error over the timeline, with above-threshold errors marked]
- 24.
Surfacing Patterns
Leveraging Consortium Data to Inform Predictive Modeling
- 25.
$1 Transactions: Investigative Analysis
Three large US credit issuers for 2014 and 2015
MCC = Merchant Category Code
[Chart: number of transactions (Tran#) vs. $ amount, annotated “Anomaly emerges”]
• 8% of $1 CNP
- 26.
Investigating – Eating Places Analysis
• 75% of $1 CNP Fraud
• 2 Food Vendors
What can we learn about the Vendors…
- 27.
Investigating – Eating Places Analysis
• Vendor 1: 32% Ep-MCC
• Vendor 2: 42% Ep-MCC
• 7,791 Fraud PANs
• 19,199 Fraud PANs
Why?
- 28.
Investigating – Visibility Makes a Huge Difference
Compromise Detection
• Vendor 1: 32% Ep-MCC
• Vendor 2: 42% Ep-MCC
• ~55% of PANs closed on the 1st day (industry avg.)
• ~90% closed on the 1st day
• Specifically targeted 1 issuer!
Effective Strategy by Credit Issuer!
- 29.
Investigating New Card Testing Scheme
Fraudsters do “test” transactions
─ Usually $1
─ Fraudsters invent new testing schemes to evade detection
• $22M loss could have been reduced by Vendor 1
• US CNP fraud ~$3.8 billion/year
CNP = Card Not Present, i.e., online, phone, etc.
Source: Aite Group.
[Chart: example trend of fraud dollars after a “test” transaction; logarithmic dollar axis from $1 to $1,000 over transactions 1 to 16, with the “Test” transaction marked]
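The test-then-spend pattern in the chart can be expressed as a simple rule over one card's transaction sequence. This sketch is a hypothetical hand-written rule, not the detection logic described in the talk. The thresholds (`test_max`, `jump_ratio`, `lookahead`) are invented for illustration.

```python
def find_test_then_spend(transactions, test_max=1.00, jump_ratio=20.0, lookahead=5):
    """transactions: time-ordered list of (amount, is_cnp) for one card.

    Returns the indexes of small card-not-present transactions that are followed,
    within `lookahead` transactions, by an amount at least `jump_ratio` times larger.
    """
    hits = []
    for i, (amount, is_cnp) in enumerate(transactions):
        if not is_cnp or amount <= 0 or amount > test_max:
            continue  # not a candidate "test" transaction
        later = transactions[i + 1:i + 1 + lookahead]
        if any(a >= jump_ratio * amount for a, _ in later):
            hits.append(i)
    return hits


# Hypothetical card: a normal purchase, a $1 CNP "test", then escalating CNP spend.
card = [(25.0, False), (1.0, True), (80.0, True), (400.0, True)]
suspicious = find_test_then_spend(card)
```

A fixed rule like this catches only the scheme it was written for. The following slide argues for adaptive models because fraudsters keep inventing new testing schemes.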
- 30.
Investigating New Card Testing Scheme
• Detecting and responding to schemes
─ Fraudsters will continuously attempt new attack methods
─ Data science informs model design
• Data Science
─ Brute-force statistics can miss changes
─ Autoencoders can detect shifts in fraud patterns in real time
• Model design
─ Adaptive models to learn new fraud patterns
─ Entity profiles respond at the entity level
- 31.
Consortium, Deep Learning, and Data Science step up the fight against Cybercrime
- 32.
Thank You
Scott Zoldi, Chief Analytics Officer
FICO
scottzoldi@fico.com
@ScottZoldi