1
Fighting Financial Fraud
with Artificial Intelligence
© 2017 Teradata
2
Ron Bodkin
VP & GM, Artificial
Intelligence, Teradata
@ronbodkin
Nadeem Gulzar
Head of Global Analytics,
Danske Bank
@Nadeem_Gulzar
3
Who Is Think Big?
1st Big Data provider 100% focused around open source.
• So far we have delivered 150+ successful projects for 100+ clients worldwide.
• 500+ employees (2017)
• Vendor-neutral with an open source focus.
• Full spectrum consulting, data engineering, data science & support.
• Apache Hadoop and cloud ecosystem integration.
• Founded in 2010; industry thought leader.
• Fixed fee offerings for data science and engineering.
4
Who Is Danske Bank?
Danske Bank is a Nordic universal bank with strong local roots and bridges to the rest of the world.
We help our customers be financially confident and achieve their ambitions by making daily banking and important financial decisions easy.
• 19,000 employees (2017)
• 2.7 million personal customers
• 236,000 small and medium-sized business customers
• 1,800 corporate and institutional customers
• Danske Bank covers all Personal Banking, Business Banking, Corporates & Institutions and Wealth Management.
• Making banking easier for over 145 years.
5
Data Driven Approach to Fight Fraud
Challenges for
Fraud Detection
• Low Detection Rate
• Many False Positives
• High Fraud Loss
• Fast Growing Competition
Ambitions for
Fraud Project
Danske Bank advanced
analytics blueprint
Data driven approach to
real time scoring of
transactions
Reduce false-positives &
Enhance fraud detection rate
6
Data Driven Approach to Fight Fraud
Challenges for Fraud Detection
• Low Detection Rate: ONLY ~40% of fraud cases are detected
• Many False Positives: 99.5% of cases are not fraud related
• High Fraud Loss: Tens of millions of € lost each month
• Fast evolving fraud sophistication
Ambitions for Fraud Project
• Danske Bank advanced analytics blueprint
• Data driven approach to real time scoring of transactions
• Reduce false-positives & enhance fraud detection rate
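A quick worked example of what the false-positive figure means for investigators. It reads the 99.5% as the share of investigated cases that turn out not to be fraud, which is how the appendix slide phrases it. The function name is illustrative only.

```python
def cases_per_confirmed_fraud(false_positive_share: float) -> float:
    """Average number of investigated cases needed to find one real fraud."""
    if not 0.0 <= false_positive_share < 1.0:
        raise ValueError("false_positive_share must be in [0, 1)")
    precision = 1.0 - false_positive_share  # share of investigated cases that are fraud
    return 1.0 / precision


# 99.5% false positives means roughly 200 investigations per confirmed fraud.
print(round(cases_per_confirmed_fraud(0.995)))
```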
7
Fraud Types – Customer Initiated
• Nigeria/Investment Scam
• CEO Fraud
• Beneficiary Account Change
• Fake Invoice
• Rental Scam/Goods Not Received
8
Fraud Types – Fraudster Initiated
• ID Theft
• Spear Phishing
• Vishing/Support Scam
• Malware
• Phishing/Smishing
9
Modeling Challenges
• Class imbalance
(100,000:1 non-fraud vs. fraud)
• Assigning fraud labels from
historic data
• Fraud is ambiguous
• Not all features available in
real-time
• Most machine learning sees
transactions atomically
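One common way to cope with a class imbalance like this is to weight the rare class more heavily during training. The sketch below computes inverse-frequency class weights (the same heuristic scikit-learn calls "balanced"). The deck does not say how the project handled imbalance, so this is an illustration, not the team's method.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Synthetic labels with the 100,000:1 ratio quoted on the slide.
labels = [0] * 100_000 + [1]
weights = inverse_frequency_weights(labels)
ratio = weights[1] / weights[0]  # how much more one fraud example counts
print(round(ratio))
```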
10
Advanced Platform for Fraud: Data-Driven Approach
How It Works:
• Understanding the domain
• Gathering and preparing the data
• Automatically generating the rules and recognizing fraudulent patterns
by training models on historical data
• Automatically maintaining the engine by retraining the model
Pros:
• Automatic/data-driven/objective
inference of the rules
• Ability to detect patterns in a high
dimensional data input
• Fast detection of new/changing
fraudulent patterns
Cons:
• Can be unintuitive and hard to interpret
• Data preparation and feature aggregation are time-consuming
11
Phase One
12
Teamwork to Deliver Value in Each Project Iteration
Timeline: September 2016 to July 2017
• Project kick-off (30th September)
• Kick-off: Data Scientist track
• Go-Live plan set for the project (19th December)
• Kick-off: Engineering track
• Models successfully in Shadow Production (4th March)
• First round trip test transaction
• First production virtual machine
• Production Hardening for HA and Security
• Full productionisation of the Fraud Engine
• Cross-functional Collaboration
• Cont. Deep Learning modeling
13
Banking Anti-Fraud Solution
By leveraging the power of a thoughtful and strong data and analytics strategy, we unleash high impact business outcomes.
Model Management Framework
• Multiple models running in production at the same time
• Mix of traditional and advanced deep learning methods
• AnalyticsOps: Deploying machine learning models in production
Data Modelling, Pipeline and Ingestion
• Organisation and silos of data
• Real-time data integration
• Security and Procedures: following existing bank procedures
Machine Learning and Artificial Intelligence
• Hard to operationalize insights
• Availability of analytic capabilities/skills and data
• Interpreting the results of machine learning models
14
Advanced Platform for Fraud: Data-Driven Approach
[Diagram: payment flow through the fraud platform]
• Payment channels: eBanking, Mobile Pay, Business Online
• Entry point: Global Payment Interface
• Scoring components: Central Fraud Engine, Advanced Analytics Platform for Fraud (each with a "Fraud?" decision)
• Data inputs: Weblog Data, Basic Customer Data, Historic Transactions, Customer Product Data, Aggregated Customer Data
• Decision branches: No, Yes/Maybe, Yes!
• Actions: Process Payment to Beneficiary, Create Verification, Manual Evaluation, Return Payment to Customer, Update Fraud Data
• Other label: Fraud Payment
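The three decision branches on this slide can be sketched as a simple routing function. The mapping of branches to actions is a reading of the diagram labels, and both thresholds are hypothetical values chosen only for illustration.

```python
def route_payment(fraud_probability, review_threshold=0.5, block_threshold=0.9):
    """Map a model score to one of three actions (thresholds are hypothetical)."""
    if not 0.0 <= fraud_probability <= 1.0:
        raise ValueError("fraud_probability must be in [0, 1]")
    if fraud_probability >= block_threshold:
        return "return_payment_to_customer"      # assumed "Yes!" branch
    if fraud_probability >= review_threshold:
        return "manual_evaluation"               # assumed "Yes/Maybe" branch
    return "process_payment_to_beneficiary"      # assumed "No" branch


for score in (0.02, 0.60, 0.97):
    print(score, route_payment(score))
```

In practice the thresholds would be tuned against the cost of a missed fraud versus the cost of a manual review.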
15
Framework to Enhance Danske Bank AnalyticsOps Capabilities
[Diagram: model management framework]
• Components: Source Systems, Real Time Scoring System, Model Management, Data Science Lab, Data Management, Development & Deployment Management
• Roles: Data Managers, Data Scientists, Developers & Release Managers, Production Operations
• Activities: Log Transactions, Add Historic Data, Data Analysis & Model Development, Model Performance, Promote Models
16
Key Requirement: Model Interpretation
• We have deployed LIME (Local Interpretable Model-agnostic Explanations) for customers
– Improves trust
– Compliance with EU’s General Data Protection Regulation (GDPR)
17.6% Fraud Probability
Customer Amount
Debit Amount
Avg. # Trans Cred Acc
# Prev to This Dest
# Xfer Accounts
X% score due to:
+ transfer amount
+ destination country
+ last year monthly spend
What features are most
important to this decision?
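To make the "X% score due to" idea concrete, here is a minimal sketch of an additive breakdown for a logistic-regression score. This is not LIME itself: LIME fits a local linear surrogate around one prediction of an arbitrary model, whereas a model that is already linear in its logit can be read off directly. All weights, feature names and values below are made up.

```python
import math


def explain_linear_score(weights, bias, x, baseline):
    """Return (probability, ranked contributions) for a logistic-regression score."""
    logit = bias + sum(w * x[name] for name, w in weights.items())
    probability = 1.0 / (1.0 + math.exp(-logit))
    # Contribution of each feature to the logit, relative to a baseline customer.
    contributions = {name: w * (x[name] - baseline[name]) for name, w in weights.items()}
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return probability, ranked


# Hypothetical weights and feature values, for illustration only.
weights = {"transfer_amount": 0.8, "new_destination_country": 1.5, "monthly_spend": -0.3}
baseline = {"transfer_amount": 0.0, "new_destination_country": 0.0, "monthly_spend": 0.0}
x = {"transfer_amount": 2.0, "new_destination_country": 1.0, "monthly_spend": 1.0}

probability, ranked = explain_linear_score(weights, bias=-4.0, x=x, baseline=baseline)
for name, value in ranked:
    print(f"{name}: {value:+.2f}")
```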
17
Machine Learning Results
(Live System: 60 transactions/sec.)
Ensemble of boosted decision trees
and logistic regression.
From online validation of the model:
● 25-30% false positive reduction, with
over 35% increase in detection rate
● Opportunity to expand model with
additional features, retrain on recent
data and add additional models to
the ensemble.
● Models can be expanded to
additional channels
[Chart: Rule Engine on validation set]
18
Deep Learning
19
Deep Learning Opportunity
• Current models can only catch ~70% of all fraud cases
• Traditional ML models view transactions atomically
• Missed fraud transactions are often part of a series
• Capturing correlation across many features
20
Three Deep Learning Architectures to Deliver Value
ConvNet
• Designed for spatially correlated features, but by transforming transactions into a 2D image, we can learn temporally correlated features.
• A deeper ConvNet allows learning more complex & general features.
Goal: Learn kernels from temporal & static features to gain insight into the characteristics of fraud.
LSTM
• Learn temporal information and classify whether the sequence of transactions contains fraud.
• Shares knowledge across time steps.
Goal: Learn transaction patterns within a window. Two solutions can be tested: flag fraud, or predict the next transaction and define an error.
Auto-Encoders
• Learn how to generate normal transactions, using potentially large volumes of non-fraud data.
• Auto-encoders provide a low-level representation of the data.
Goal: Build a model that learns how to generate non-fraud data. To detect fraud, define a reconstruction error rate for the fraud cases.
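The "patterns within a window" framing for the sequence model can be sketched as a data-preparation step: slide a fixed-length window over one customer's time-ordered transactions and label each window by whether it contains fraud. The window length, function name and amounts are assumptions made for illustration.

```python
def make_windows(transactions, labels, window=3):
    """Slide a fixed-length window over time-ordered transactions.

    A window is labelled 1 if any transaction inside it is fraud, else 0.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    sequences, targets = [], []
    for start in range(len(transactions) - window + 1):
        end = start + window
        sequences.append(transactions[start:end])
        targets.append(int(any(labels[start:end])))
    return sequences, targets


amounts = [10.0, 12.5, 9.0, 950.0, 11.0]  # made-up transaction amounts
is_fraud = [0, 0, 0, 1, 0]
sequences, targets = make_windows(amounts, is_fraud, window=3)
print(targets)
```

The resulting sequences and targets are the kind of input a sequence classifier would be trained on; real transactions would carry many features per step rather than a single amount.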
21
How Can We Create an Image From Bank Transactions?
[Figure: building a 2D image from bank transactions]
Input (raw features): one row per time step t0, t1, ..., ts (spacing dt), each holding the features X_0, X_1, ... X_n.
Top k features correlation: each feature is placed in a 3 x 3 block together with its most correlated features, which are added in a clock-wise manner.
Output: the stacked blocks form one image per sequence of transactions.
Image size is: [10 x 3, 50 x 3, 1]
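A minimal sketch of one way to build such an image, assuming each feature sits at the centre of a 3 x 3 block and its 8 most correlated features fill the ring clockwise from the top-left corner. The correlation measure (absolute Pearson), the ring start position and the synthetic data are all assumptions; the slide does not specify them.

```python
import math

# Ring positions inside a 3 x 3 block, clockwise starting at the top-left corner.
RING = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def neighbours(history, k=8):
    """For each feature, indices of the k other features most correlated with it."""
    cols = list(zip(*history))
    result = []
    for i, col in enumerate(cols):
        others = [j for j in range(len(cols)) if j != i]
        others.sort(key=lambda j: abs(pearson(col, cols[j])), reverse=True)
        result.append(others[:k])
    return result


def to_image(window, neigh):
    """Turn T rows x F features into a (3T) x (3F) grid of 3 x 3 blocks."""
    t_steps, n_feat = len(window), len(window[0])
    image = [[0.0] * (3 * n_feat) for _ in range(3 * t_steps)]
    for t, row in enumerate(window):
        for f in range(n_feat):
            image[3 * t + 1][3 * f + 1] = row[f]        # the feature itself, centre
            for (dr, dc), j in zip(RING, neigh[f]):
                image[3 * t + dr][3 * f + dc] = row[j]  # correlated features around it
    return image


# Synthetic history: 12 time steps x 50 features (values are arbitrary).
history = [[math.sin(0.1 * t * (f + 1)) for f in range(50)] for t in range(12)]
neigh = neighbours(history)
image = to_image(history[:10], neigh)
print(len(image), len(image[0]))
```

With 10 time steps and 50 features this gives a 30 x 150 grid, matching the image size quoted on the slide.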
22
Convolutional Layers for Trans2D
[Figure: First Convolutional Layer Architecture]
• Input axes: features and time
• Kernels of 3x3
• Strides of 3 and strides of 1
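The effect of the two stride settings can be checked with the standard convolution output-size formula. The sketch assumes no padding and applies it to the 30 x 150 image size given earlier; which stride the first layer uses along which axis is not clear from the slide, so both are shown.

```python
def conv_output_size(input_size, kernel, stride, padding=0):
    """Standard output length of a convolution along one axis."""
    return (input_size + 2 * padding - kernel) // stride + 1


height, width = 10 * 3, 50 * 3  # image size from the earlier slide

# Stride 3 with a 3x3 kernel visits each 3 x 3 block exactly once.
stride3 = (conv_output_size(height, 3, 3), conv_output_size(width, 3, 3))
# Stride 1 slides across block boundaries.
stride1 = (conv_output_size(height, 3, 1), conv_output_size(width, 3, 1))
print(stride3, stride1)
```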
23
2D Transaction Image Example
[Figure: example transaction images for non-fraud and fraud cases. X-axis: features, Y-axis: time]
24
Network Architecture for CNNs
[Figure: two CNN architecture diagrams, each ending in a Fraud / Normal output. Layer size labels shown on the diagrams: 50, 30, 25, 15, 13, 8]
25
Inside the ResNet model
[Figure: 64 filters; activations after the CNN residual blocks, shown for fraud and non-fraud examples]
26
Deep Learning First Results on the fraud verification dataset
Comparison of the three deep learning models and the traditional machine learning ensemble model.
• Ensemble model (AUC 0.89)
• ConvNets (AUC 0.95)
• LSTM (AUC 0.90)
• ResNet (AUC 0.94)
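For readers less familiar with the metric: AUC is the probability that a randomly chosen fraud case receives a higher score than a randomly chosen non-fraud case. A minimal sketch of that definition, with made-up labels and scores (not the project's data):

```python
def roc_auc(labels, scores):
    """AUC as the share of (fraud, non-fraud) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.2, 0.8]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

Because it depends only on the ranking of scores, AUC is not distorted by class imbalance in the way plain accuracy is.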
27
Lessons Learned: Take-Aways From Danske Bank
• Deep learning adoption from pictures to financial transactions
• Enhancement of data quality & cluster capabilities with data ingestion
• Building Analytics Ops capabilities to support business units
• Leveraging experience from Fraud advanced analytics to deliver extra use cases
28
Big Data Team – Lessons Learned
Success: From PowerPoint to production in 8 sprints
Team effort: Thorough collaboration across IMD, GFU and Think Big
Synergy: Successfully spearheaded innovation in all involved systems
Inspiration: Incorporating the Danske Bank advanced analytics blueprint sets a generic scene for combatting new challenges in advanced analytics
Agile influence: Using an agile approach, we were able to deliver quickly within the challenging timeframe.
29
Ron Bodkin
VP & GM, Artificial
Intelligence, Teradata
@ronbodkin
Nadeem Gulzar
Head of Global Analytics,
Danske Bank
@Nadeem_Gulzar
30
Appendix
31
Advanced Analytics Fraud Platform
• Develop a scalable and expandable platform which follows the Danske Bank blueprint of digitalization
• 100% data-driven approach to find patterns in the data and complement the existing fraud engine
• Use Hadoop to handle the large data volumes for training models on transaction data
• Implement a real-time solution that can score live transactions such as e-banking, credit card and mobile payments
• Reduce the number of false positives by at least 20-40%
32
Ambitions for the Advanced Analytics project
• The goal is to enhance fraud detection and reduce false positives across products.
• Fraud is only the first of many possible advanced analytics use cases.
• The project leads to the creation of the Danske Bank advanced analytics blueprint.
• Danske Bank's ambition is to become one of the leading banks in advanced analytics capabilities.
Challenges in Fraud Detection at Danske Bank
• Low detection rate: Existing human-written
rule engine catches ~40% of fraud cases.
• Many false positives: 99.5% of all cases
investigated are not fraud related.
• High fraud loss: Tens of millions of € total
fraud per month.
• Mobile payments are quickly growing in
number.
• Fraud evolving rapidly, with increased
sophistication: Danske Bank must
modernize its anti-fraud arsenal.
