A joint report between EY and LSE, with a contribution from Seldon. This report describes research undertaken by The London School of Economics and Political Science on behalf of EY Financial Services to investigate the use of Artificial Intelligence and Machine Learning, and to provide one use case for each of the following sectors: Insurance, Banking & Capital Markets, and Wealth & Asset Management.
In this new Accenture Finance & Risk presentation we explore machine learning as a solution to some of the most important challenges faced by the banking sector today. To learn more, read our blog on Machine Learning in Banking: https://accntu.re/2oTVJiX
Loan Default Prediction with Machine Learning (Aayush Kumar)
Default-Loan-Prediction-Project-Using-Random-Forest-and-Decision-Tree
Default Loan Prediction Project Using Random Forest and Decision Tree. In this project we use loan data from Lending Club. For this project we will be exploring publicly available data from LendingClub.com. Lending Club connects people who need money (borrowers) with people who have money (investors). As an investor, you would want to invest in people whose profile shows a high probability of paying you back. We will try to create a model that helps predict this.
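A minimal sketch of the kind of model the project describes; the column names (`not.fully.paid` as the default label, `purpose` as a categorical feature) are assumptions about a LendingClub-style schema, not the actual dataset:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

def train_default_model(df: pd.DataFrame) -> RandomForestClassifier:
    # One-hot encode categorical columns (e.g. loan purpose); numeric features pass through.
    X = pd.get_dummies(df.drop(columns=["not.fully.paid"]), drop_first=True)
    y = df["not.fully.paid"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42, stratify=y
    )
    model = RandomForestClassifier(n_estimators=300, random_state=42)
    model.fit(X_train, y_train)
    # Report precision/recall per class on the held-out split.
    print(classification_report(y_test, model.predict(X_test)))
    return model
```

A decision tree baseline would follow the same shape with `DecisionTreeClassifier` in place of the forest.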
Deep Credit Risk Ranking with LSTM, with Kyle Grove (Databricks)
Find out how Teradata and some of the world’s largest financial institutions are innovating credit risk ranking with deep learning techniques and AnalyticOps. With the AnalyticOps framework, these organizations have built models with increased accuracy that drive more profitable lending decisions while remaining explainable to regulators.
Join us for a live session and learn about:
A machine learning ensemble including LSTM that achieves 90%+ accuracy at predicting delinquency/default, exceeding conventional credit risk methods by more than 20%.
A model management accelerator that is used to build and deploy the models in an integrated cloud platform, based on TensorFlow and Spark, and supports Keras, DeepLearning4J and SparkML models.
An innovative technique for model interpretability that obviates LIME’s need to generate synthetic examples.
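The talk's specific LIME-free interpretability technique is not described in this abstract. As an illustration of one model-agnostic method that likewise avoids generating synthetic neighborhood samples, the sketch below uses permutation importance, which shuffles real feature columns and measures the resulting score drop:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance

# Toy stand-in for a credit-risk dataset.
X, y = make_classification(n_samples=500, n_features=6, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# How much accuracy drops when each real feature column is shuffled
# approximates that feature's importance; no synthetic rows are created.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i in np.argsort(result.importances_mean)[::-1]:
    print(f"feature {i}: {result.importances_mean[i]:.3f}")
```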
Regulating Generative AI: LLMOps Pipelines with Transparency (Debmalya Biswas)
The growing adoption of Generative AI, especially LLMs, has re-ignited the discussion around AI regulations, which aim to ensure that AI/ML systems are responsibly trained and deployed. Unfortunately, this effort is complicated by multiple governmental organizations and regulatory bodies releasing their own guidelines and policies, with little to no agreement on the definition of terms.
Rather than trying to understand and regulate all types of AI, in this talk we recommend a different (and practical) approach based on AI Transparency: transparently outlining the capabilities of the AI system based on its training methodology, and setting realistic expectations with respect to what it can (and cannot) do.
We outline LLMOps architecture patterns and show how the proposed approach can be integrated at different stages of the LLMOps pipeline, capturing the model's capabilities. In addition, the AI system provider also specifies scenarios where (they believe) the system can make mistakes, and recommends a ‘safe’ approach with guardrails for those scenarios.
Whether you’re just getting started with AI or you’re a deep learning expert, this session will provide a meaningful overview of how to get started with Artificial Intelligence on the AWS Cloud. In particular, we will explore AWS cloud-native machine learning and deep learning technologies that address a range of different use cases and needs. These include Amazon Lex, which provides natural language understanding (NLU) and automatic speech recognition (ASR); Amazon Rekognition, which provides visual search and image recognition capabilities; Amazon Polly for text-to-speech (TTS) capabilities; and Amazon Machine Learning tools. The session will also cover the AWS Deep Learning AMI, which lets you run deep learning in the cloud at any scale.
If you're based in South East Asia, join us for upcoming AWS Webinar Series https://aws.amazon.com/events/asean/webinars/
Machine Learning (ML) for Fraud Detection.
- fraud is a big problem (big data, big cost)
- ML on bigger data produces better results
- Industry standard today (for detecting fraud)
- How to improve fraud detection!
Measuring and Managing Credit Risk With Machine Learning and Artificial Intelligence (Accenture)
In recent years, banks have analyzed technological developments in depth, but we are still far from mature levels, both methodologically and in the credit granting, monitoring and control processes. Banks should equip themselves with new and more structured Model Risk frameworks to manage new Machine Learning model validation paradigms. Learn more from Accenture Finance & Risk: https://accntu.re/2qGUUMx
Managing and Versioning Machine Learning Models in Python (Simon Frid)
Practical machine learning is becoming messy: while there are lots of algorithms, a lot of infrastructure is still needed to manage and organize the models and datasets. Estimators and Django-Estimators are two Python packages that can help version datasets and models for deployment and an effective workflow.
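The actual Estimators / Django-Estimators APIs are not shown in this abstract; as a minimal stand-in for the idea, this sketch versions fitted models by content hash using joblib, so each saved artifact gets a stable, reproducible version id:

```python
import hashlib
import os
import joblib

def save_versioned(model, directory="model_store"):
    """Persist a model and derive a short version id from its serialized bytes."""
    os.makedirs(directory, exist_ok=True)
    tmp_path = os.path.join(directory, "_tmp.joblib")
    joblib.dump(model, tmp_path)
    with open(tmp_path, "rb") as f:
        version = hashlib.sha256(f.read()).hexdigest()[:12]
    final_path = os.path.join(directory, f"model_{version}.joblib")
    os.replace(tmp_path, final_path)
    return version, final_path

def load_versioned(version, directory="model_store"):
    """Reload a previously saved model by its version id."""
    return joblib.load(os.path.join(directory, f"model_{version}.joblib"))
```

Dataset files can be versioned the same way, which is the core of what such packages automate.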
ZIGRAM is a high-impact organization which operates in the Data Asset space.
Our team is made up of professionals from varied domains like data science, technology, sales, financial services, research and business consulting.
Our aim is to deliver value to clients by building and managing Data Assets across use cases, thereby boosting revenues and reducing the cost of doing business in a data-driven world.
This slide deck is a compilation of slides from various sources that stitches together a gentle introduction to Artificial Intelligence, Machine Learning and Deep Learning.
Synthetic Data Generation for Machine Learning (QuantUniversity)
As machine learning becomes more pervasive in the industry, data scientists and quants are realizing the challenges and limitations of machine learning models. One of the primary reasons machine learning applications fail is the lack of rich, diverse and clean datasets needed to build models. Datasets may have missing values, may not include enough samples for all use cases (for example, fraudulent transaction records to train a model) and may not be easily sharable due to privacy concerns. While there are many data cleansing techniques to fix data-related issues, and we can always try to get new and rich datasets, the cost is at times prohibitive or impractical, leading many institutions to abandon machine learning and fall back on rule-based methods.
Synthetic datasets and simulations are used to enrich and augment existing datasets, providing comprehensive samples for training machine learning models. In addition, synthetic datasets can be used for comprehensive scenario analysis, missing-value imputation and privacy protection when building models. The advent of novel techniques like Deep Learning has rekindled interest in using GANs and encoder-decoder architectures for financial synthetic data generation.
In this workshop, we will discuss the state of the art in synthetic data generation and illustrate the various techniques and methods that can be used. Through examples using QuSynthesize & QuSandbox, we will demonstrate how these techniques can be realized in practice.
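One of the simplest techniques in this family is SMOTE-style oversampling: new minority-class samples are interpolated between real minority samples and their nearest neighbors. This hand-rolled sketch illustrates the idea (production work would use a library such as imbalanced-learn):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_like(X_minority: np.ndarray, n_new: int, k: int = 5, seed: int = 0):
    """Generate n_new synthetic rows by interpolating between neighbors."""
    rng = np.random.default_rng(seed)
    k = min(k, len(X_minority) - 1)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_minority)
    _, idx = nn.kneighbors(X_minority)  # idx[:, 0] is the point itself
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_minority))
        j = idx[i, rng.integers(1, k + 1)]   # pick a random real neighbor
        lam = rng.random()                    # interpolation weight in [0, 1)
        synthetic.append(X_minority[i] + lam * (X_minority[j] - X_minority[i]))
    return np.array(synthetic)
```

GAN- and encoder-decoder-based generators mentioned above replace this linear interpolation with a learned generative model, but serve the same augmentation role.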
Feature Store as a Data Foundation for Machine Learning (Provectus)
Looking to design and build a centralized, scalable Feature Store for your Data Science & Machine Learning teams? Come and learn how from the experts at Provectus and Amazon Web Services (AWS)!
A Feature Store is a key component of the ML stack and data infrastructure that enables feature engineering and management. With a Feature Store, organizations can save massive amounts of resources, innovate faster, and drive ML processes at scale. In this webinar, you will learn how to build a Feature Store with a data mesh pattern, see how to achieve consistency between real-time and training features, and improve reproducibility with time travel for data.
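The "time travel" idea can be illustrated with a toy, in-memory sketch: training jobs read feature values as of a past timestamp, so they see exactly what the online model would have seen at serving time. (A real feature store backs this with a data lake; the class and method names here are illustrative, not any product's API.)

```python
import bisect
from collections import defaultdict

class TinyFeatureStore:
    def __init__(self):
        # (entity, feature) -> parallel sorted lists of timestamps and values
        self._times = defaultdict(list)
        self._values = defaultdict(list)

    def put(self, entity, feature, timestamp, value):
        key = (entity, feature)
        i = bisect.bisect_right(self._times[key], timestamp)
        self._times[key].insert(i, timestamp)
        self._values[key].insert(i, value)

    def get_asof(self, entity, feature, timestamp):
        """Latest value written at or before `timestamp`, or None."""
        key = (entity, feature)
        i = bisect.bisect_right(self._times[key], timestamp)
        return self._values[key][i - 1] if i else None
```

Point-in-time retrieval like this is what prevents training/serving skew: a training row at time T never sees features written after T.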
Agenda
- Modern Data Lakes & Modern ML Infrastructure
- Existing and Emerging Architectural Shifts
- Feature Store: Overview and Reference Architecture
- AWS Perspective on Feature Store
Intended Audience
Technology executives & decision makers, manager-level tech roles, data architects & analysts, data engineers & data scientists, ML practitioners & ML engineers, and developers
Presenters
- Stepan Pushkarev, Chief Technology Officer, Provectus
- Gandhi Raketla, Senior Solutions Architect, AWS
- German Osin, Senior Solutions Architect, Provectus
Feel free to share this presentation with your colleagues and don't hesitate to reach out to us at info@provectus.com if you have any questions!
REQUEST WEBINAR: https://provectus.com/webinar-feature-store-as-data-foundation-for-ml-nov-2020/
Credit Card Fraud Detection Using ML in Databricks (Databricks)
For credit card companies, illegitimate card usage is a serious problem, creating the need to accurately distinguish fraudulent from non-fraudulent transactions. All organizations can be hugely impacted by fraud and fraudulent activities, especially those in financial services. The threat can originate internally or externally, but the effects can be devastating, including loss of consumer confidence, incarceration for those involved, and even the downfall of a corporation. Regular fraud prevention measures are in place, but they are constantly being put to the test by attempts to beat the system.
Fraud detection is the task of predicting whether a card transaction was actually made by the cardholder. One way to recognize fraudulent card usage is to leverage Machine Learning (ML) models. To detect fraudulent transactions more dynamically, one can train ML models on a dataset that includes credit card transaction information as well as card and demographic information about the account owner. This is the goal of our project, which leverages Databricks.
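A minimal sketch of such a classifier, assuming a pre-built numeric feature matrix (the specific features are not given in the abstract); `class_weight="balanced"` compensates for fraud being a rare class:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def train_fraud_model(X: np.ndarray, y: np.ndarray):
    """y: 1 = fraudulent transaction, 0 = legitimate."""
    model = make_pipeline(
        StandardScaler(),  # transaction amounts and demographics vary in scale
        LogisticRegression(class_weight="balanced", max_iter=1000),
    )
    return model.fit(X, y)
```

Inside Databricks, the same model could be trained on a Spark DataFrame converted to pandas/NumPy, or replaced with an equivalent Spark ML pipeline for larger data.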
Customer Churn Prediction Using Machine Learning Techniques: The Case of Lion Insurance (IIJSR Journal)
The growth of an insurance company is measured by the number of policies purchased by customers. To keep the company growing and attracting more customers, a customer churn prediction model is crucial to maintaining its competitiveness. Even if the company has good service delivery, it is important to identify customer behavior and be able to predict future churners. The main contribution of our work is the development of a predictive model that can proactively identify customers who will leave the insurance company. The model developed in this study uses machine learning techniques on Lion Insurance data. Another main contribution of this study is the labeling of the data using an unsupervised algorithm: from 12,007 rows with 9 features, 2 clusters were generated using the K-means++ algorithm. As the cluster results were imbalanced, the synthetic minority oversampling technique was applied to the training dataset. Two years of customer data were obtained from Lion Insurance and used to train, test, and evaluate the model. Randomized optimization was used to tune each algorithm. The best results were obtained by a deep neural network with a structure of (9-55-55-55-55-55-1), reaching an accuracy of 98.81%; this algorithm was selected for classification in this churn prediction study.
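The study's unsupervised labeling step can be sketched as follows: K-means++ clusters the customer feature rows, and the resulting cluster ids serve as churn/no-churn labels for the downstream classifier. The data here is a synthetic placeholder, not the Lion Insurance dataset:

```python
import numpy as np
from sklearn.cluster import KMeans

def label_by_clustering(X: np.ndarray, n_clusters: int = 2, seed: int = 0):
    """Assign each customer row to a cluster; cluster ids become labels."""
    km = KMeans(n_clusters=n_clusters, init="k-means++", n_init=10,
                random_state=seed)
    return km.fit_predict(X)
```

In the paper's pipeline these labels would then be rebalanced with SMOTE before training the deep neural network.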
GenerativeAI and Automation - IEEE ACSOS 2023 (Allen Chan)
Generative AI has been rapidly evolving, enabling different and more sophisticated interactions with Large Language Models (LLMs) like those available in IBM watsonx.ai or Meta Llama 2. In this session, we will take a use-case-based approach to look at how we can leverage LLMs together with existing automation technologies like Workflow, Content Management, and Decisions to enable new solutions.
Artificial Intelligence in the Financial Industries (Gerardo Salandra)
As Artificial Intelligence makes its way into our lives, many financial institutions are faced with the difficult question “Should AI be embraced?”. While the eagerness to integrate AI into the financial sector has waxed and waned over the past few decades, it now appears that Fintech is ready to dive head-first into AI as a standard for handling customer transactions, financial risk assessment and regulatory compliance, and for reducing institutional costs.
There is no doubt that AI can be invaluable for the financial industry, but it comes at a price. We expect to witness both success stories and tragic failures over the course of the next few years. With any first-generation technology, there are going to be bugs to solve, and a learning curve before intimate industry familiarity with AI is obtained.
AI is not only going to revolutionize the financial industry but become the industry itself.
Pixels.camp - Machine Learning: Building Successful Products at Scale (António Alegria)
See video at: https://www.youtube.com/watch?v=p7s1lcaeoZk
How to build Machine Learning products that scale and autonomously evolve using open source technologies like Spark, Cassandra, Hadoop and many others.
While data technologies have been exploding and becoming commoditized, using them effectively to build a product that delivers real value to users can be a mysterious art. A lot of companies still follow a “gather data, think about it later” approach, but then fail to put that data to work.
Let’s demystify the Data Science lifecycle of machine learning systems (from data, to production, to a continuously evolving system) and explore the fundamental recipe for building data-learning products that put data to work and provide experiences that are, ironically, more human.
Large Scale Graph Processing & Machine Learning Algorithms for Payment Fraud Prevention (DataWorks Summit)
PayPal is at the forefront of applying large-scale graph processing and machine learning algorithms to keep fraudsters at bay. In this talk, I’ll present how advanced graph processing and machine learning algorithms such as Deep Learning and Gradient Boosting are applied at PayPal for fraud prevention. I’ll elaborate on the specific challenges in applying large-scale graph processing and machine learning techniques to payment fraud prevention, and explain how we employ sophisticated machine learning tools, both open source and developed in-house. I will also present results from experiments conducted on a very large graph dataset containing millions of edges and vertices.
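The PayPal pipeline itself is proprietary; as a toy illustration of turning a transaction graph into per-account model features, this sketch computes each account's transaction count and distinct-counterparty count from raw (payer, payee) edges:

```python
from collections import defaultdict

def account_graph_features(edges):
    """edges: iterable of (payer, payee) pairs.
    Returns {account: (txn_count, n_counterparties)}."""
    txn_count = defaultdict(int)
    counterparties = defaultdict(set)
    for payer, payee in edges:
        txn_count[payer] += 1
        txn_count[payee] += 1
        counterparties[payer].add(payee)
        counterparties[payee].add(payer)
    return {a: (txn_count[a], len(counterparties[a])) for a in txn_count}
```

Features like these (e.g. many transactions concentrated on very few counterparties) can then feed a gradient-boosting or deep learning classifier, which is the general pattern the talk describes.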
Webinar: Fighting Fraud with Graph Databases (DataStax)
Modern fraud detection poses significant engineering challenges, from managing ingestion at scale to analyzing fraud patterns in real time. We'll first take a look at how DataStax Enterprise Graph, powered by the industry’s best version of Apache Cassandra™, can meet those requirements to help you save the day.
How to Create 80% of a Big Data Pilot Project (Greg Makowski)
When evaluating open source software, or other software of a certain size or complexity, organizations frequently want to conduct a pilot project, or proof of concept (POC). This talk describes a process for shortening the pilot by carrying configurations over from performance testing into the POC's starting configurations.
Flink Forward Berlin 2017: Bas Geerdink, Martijn Visser - Fast Data at ING ... (Flink Forward)
ING is using Apache Flink to create streaming analytics ('fast data') solutions. We created a platform with Flink and Kafka that offers high throughput and low latency, ideally suited for complex and demanding use cases in an international bank, such as customer notifications and fraud detection. These use cases require fast data processing and a business rules engine and/or machine learning evaluation system. Integrating these components in an always-on, distributed architecture can be challenging. In this talk, we'll start with a brief overview of the use cases. You'll learn why ING chose Flink for these use cases, and see the architecture of the streaming data platform in depth. Finally, we'll share some lessons learned and useful insights for organizations who embark on a similar journey.
Real-Time With AI: The Convergence of Big Data and AI, by Colin MacNaughton (Synerzip)
Making AI real-time to meet mission-critical system demands puts a new spin on your architecture. Delivering AI-based applications that scale as your data grows takes a new approach, one where the data doesn’t become the bottleneck. We all know that the deeper the data, the better the results and the lower the risk. However, doing thousands of computations on big data requires new data structures and messaging used together to deliver real-time AI. During this session we will look at real reference architectures and review the new techniques that were needed to make AI real-time.
Python & Serverless: Refactor Your Monolith Piece by Piece (Giuseppe Vallarelli)
The introduction of the Function as a Service (Serverless) technologies is facilitating the adoption of a microservices based architecture. In this talk we will discuss why this might be useful (scalability / cost opportunities / choosing the right tool for the job) and what strategies we can follow to either extract independent services or add new capabilities using an event driven architecture.
IT organizations adopting agile development often struggle when applying agile to anything other than small, mid-sized, or non-critical applications. Because IT organizations must deal with the myriad business rules, non-functional requirements, industry regulations, and associated audits, the software requirements and resulting user stories can easily become too complex and interrelated. Tony Higgins says that approaches are surfacing which allow complex IT environments to improve upfront scoping, promote reuse, embrace living documentation, and deal with continuous requirements from a testing perspective. Join Tony as he shares his experiences on how requirements and tests can become one, and user stories exist as executable tests using behavior-driven design. See how all this provides testers with what's needed up front and results in better support for agile testing within IT.
IWMW 2000: Self Evident Applications for Universities (IWMW)
Slides for the plenary talk on "Self Evident Applications for Universities" presented at the IWMW 2000 event held at the University of Bath on 6-8 September 2000.
See http://www.ukoln.ac.uk/web-focus/events/workshops/webmaster-2000/sessions.html#smart
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments for AI Technologies (MLconf)
Social and Equity Impact Assessments have broad applications: they can be a useful tool to explore and mitigate Machine Learning fairness issues, and they can be applied to product-specific questions as a way to generate insights about users, as well as about the broader societal impacts of deploying new and emerging technologies.
In this presentation, my goal is to advocate for and highlight the need for community consultation and external stakeholder engagement to develop a new knowledge base and understanding of the human and social consequences of algorithmic decision making, and to introduce principles, methods and processes for these types of impact assessments.
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding (MLconf)
Like the visual cortex, the regions of the brain involved in understanding language represent information hierarchically. But whereas the visual cortex organizes things into a spatial hierarchy, the language regions encode information into a hierarchy of timescale. This organization is key to our uniquely human ability to integrate semantic information across narratives. More and more, deep learning-based approaches to natural language understanding embrace models that incorporate contextual information at varying timescales. This has not only led to state-of-the-art performance on many difficult natural language tasks, but also to breakthroughs in our understanding of brain activity.
In this talk, we will discuss the important connection between language understanding and context at different timescales. We will explore how different deep learning architectures capture timescales in language and how closely their encodings mimic the brain. Along the way, we will uncover some surprising discoveries about what depth does and doesn’t buy you in deep recurrent neural networks. And we’ll describe a new, more flexible way to think about these architectures and ease design space exploration. Finally, we’ll discuss some of the exciting applications made possible by these breakthroughs.
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Recycling Stream (MLconf)
With China’s recent refusal of most foreign recyclables, North American waste haulers are scrambling to figure out how to make on-shore recycling cost-effective in order to continue providing recycling services. Recyclables that were once being shipped to China for manual sorting are now primarily being redirected to landfills or incinerators. Without a solution, a nearly $5 billion annual recycling market could come to a halt.
Purity in the recycling stream is key to this effort as contaminants in the stream can increase the cost of operations, damage equipment and reduce the ability to create pure commodities suitable for creating recycled goods. This market disruption as a result of China’s new regulations, however, provides us the chance to re-examine and improve our current disposal & collection habits with modern monitoring & artificial intelligence technology.
Using images from our in-dumpster cameras, Compology has developed an ML-based process that helps identify, measure and alert for contaminants in recycling containers before they are picked-up, helping keep the recycling stream clean.
Our convolutional neural network flags potential instances of contamination inside a dumpster, enabling garbage haulers to know which containers have the wrong type of material inside. This allows them to provide targeted, timely education, and when appropriate, assess fines, to improve recycling compliance at the businesses and residences they serve, helping keep recycling services financially viable.
In this presentation, we will walk through our ML-based contamination measurement and scoring process by showing how Waste Management, a national waste hauler, has experienced a 57% contamination reduction across nearly 2,000 containers over six months. This progress shows significant strides towards financially viable recycling services.
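Compology’s actual model and training pipeline are proprietary; as a toy, library-free illustration of the convolution-based flagging idea described above (the kernel is hand-set, standing in for learned CNN features, and the images are synthetic):

```python
import numpy as np

def conv2d(img, kernel):
    """Valid-mode 2D cross-correlation, the core op of a CNN layer."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def flag(img, kernel, thresh=1.5):
    """Flag a container image when the strongest filter response exceeds a
    threshold (a stand-in for the CNN's contamination score)."""
    return np.abs(conv2d(img, kernel)).max() > thresh

edge_kernel = np.array([[1.0, -1.0], [1.0, -1.0]])  # hand-set vertical-edge filter
clean = np.zeros((8, 8))          # empty dumpster image
contaminated = np.zeros((8, 8))
contaminated[2:6, 3] = 1.0        # a stray object in the frame
```

Here `flag(clean, edge_kernel)` comes out false while `flag(contaminated, edge_kernel)` comes out true; a real deployment would learn many such filters end to end from labeled in-dumpster images.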
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush (MLconf)
Quantum computers promise a significant step up in computational power over conventional computers, but also suffer a number of counterintuitive limitations --- both in their computational model and in leading lab implementations. In this talk, we review how quantum computers compete with conventional computers and how conventional computers try to hold their ground. Then we outline what stands in the way of successful quantum ML applications.
Josh Wills - Data Labeling as Religious Experience (MLconf)
One of the most common places to deploy a production machine learning system is as a replacement for a legacy rules-based system that is having a hard time keeping up with new edge cases and requirements. I'll be walking through the process and tooling we used to help us design, train, and deploy a model to replace a set of static rules we had for handling invite spam at Slack, talk about what we learned, and discuss some problems to solve in order to make these migrations easier for everyone.
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gait kinematics (MLconf)
The emergence of the upright human bipedal gait can be traced back 4 to 2.8 million years ago, to the now-extinct hominin Australopithecus afarensis. Fine-grained analysis of gait using the modern MEMS sensors found on all smartphones not only reveals a lot about a person’s orthopedic and neuromuscular health status, but also carries enough idiosyncratic clues to be harnessed as a passive biometric. While the machine learning community has made many siloed attempts to model bipedal gait sensor data, these were done with small datasets, often collected in restricted academic environs. In this talk, we will introduce the ImageNet moment for human gait analysis by presenting 'Project GaitNet', the largest planet-scale motion-sensor-based human bipedal gait dataset curated to date. We’ll also present the associated state-of-the-art results in classifying humans using novel deep neural architectures, and the related success stories we have enjoyed in transfer learning into disparate domains of human kinematics analysis.
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disease from Speech and Language (MLconf)
Alzheimer's disease affects millions of people worldwide, and it is important to predict the disease as early and as accurately as possible. In this talk, I will discuss the development of novel ML models that help classify healthy people versus those who develop Alzheimer's, using short samples of human speech. As input to the model, features of different modalities are extracted from speech audio samples and transcriptions: (1) syntactic measures, such as production rules extracted from syntactic parse trees, (2) lexical measures, such as features of lexical richness and complexity and lexical norms, and (3) acoustic measures, such as standard Mel-frequency cepstral coefficients. I will present an ML model that detects cognitive impairment by reaching agreement among modalities. The resulting model is able to achieve state-of-the-art performance in both supervised and semi-supervised settings, using manual transcripts of human speech. Additionally, I will discuss potential limitations of any fully automated speech-based Alzheimer's disease detection model, focusing mostly on the analysis of the impact of a not-so-accurate automatic speech recognition (ASR) system on classification performance. To illustrate this, I will present experiments with controlled amounts of artificially generated ASR errors and explain why deletion errors affect Alzheimer's detection performance the most, due to their impact on features of syntactic and lexical complexity.
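The abstract does not spell out how "reaching agreement among modalities" is implemented; one simple way to picture late fusion across the three feature families is a consensus vote over per-modality classifier outputs. All probabilities and names below are invented for illustration, not the speaker's actual model:

```python
import numpy as np

def agreement_fusion(p_syntactic, p_lexical, p_acoustic, threshold=0.5):
    """Toy late fusion: each modality's classifier votes 'impaired' when its
    probability exceeds the threshold; we predict impairment only when at
    least two of the three modalities agree."""
    votes = np.stack([p_syntactic, p_lexical, p_acoustic]) > threshold
    consensus = votes.mean(axis=0)   # fraction of modalities voting 'impaired'
    return consensus >= 2 / 3, consensus

# Two hypothetical speech samples, scored by three hypothetical classifiers.
pred, consensus = agreement_fusion(
    np.array([0.8, 0.2]),   # syntactic (e.g. parse-tree production rules)
    np.array([0.7, 0.4]),   # lexical (e.g. richness/complexity norms)
    np.array([0.3, 0.1]),   # acoustic (e.g. MFCC-based)
)
```

Here the first sample is flagged (two of three modalities agree) and the second is not; the model presented in the talk is of course more sophisticated than a hard vote.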
Meghana Ravikumar - Optimized Image Classification on the Cheap (MLconf)
In this talk, we anchor on building an image classifier trained on the Stanford Cars dataset to evaluate two approaches to transfer learning, fine-tuning and feature extraction, and the impact of hyperparameter optimization on these techniques. Once we define the most performant transfer learning technique for Stanford Cars, we will double the size of the dataset through image augmentation to boost the classifier’s performance. We will use Bayesian optimization to learn the hyperparameters associated with image transformations, using the downstream image classifier’s performance as the guide. In conjunction with model performance, we will also focus on the features of these augmented images and the downstream implications for our image classifier.
To both maximize model performance on a budget and explore the impact of optimization on these methods, we apply a particularly efficient implementation of Bayesian optimization to each of these architectures in this comparison. Our goal is to draw on a rigorous set of experimental results that can help us answer the question: how can resource-constrained teams make trade-offs between efficiency and effectiveness using pre-trained models?
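The specific optimizer behind the talk is not named in the abstract, but the shape of the Bayesian-optimization loop it relies on can be sketched with a tiny Gaussian-process surrogate and an upper-confidence-bound acquisition. A toy 1D objective stands in for "validation accuracy as a function of one augmentation magnitude"; everything here is illustrative, not the speaker's tooling:

```python
import numpy as np

def rbf(a, b, length=0.2):
    """RBF kernel between two 1D point sets."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

def gp_posterior(x_train, y_train, x_grid, noise=1e-6):
    """GP regression posterior mean/stddev on a grid (zero-mean prior)."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_grid)
    K_inv = np.linalg.inv(K)
    mu = Ks.T @ K_inv @ y_train
    var = 1.0 - np.sum(Ks * (K_inv @ Ks), axis=0)   # diag of posterior cov
    return mu, np.sqrt(np.maximum(var, 1e-12))

def objective(x):
    # Stand-in for the expensive step: train with augmentation magnitude x,
    # return validation performance (a toy curve peaking at x = 0.6).
    return -(x - 0.6) ** 2

x_obs = np.array([0.1, 0.9])            # two initial evaluations
y_obs = objective(x_obs)
grid = np.linspace(0.0, 1.0, 101)
for _ in range(10):                     # 10 sequential BO iterations
    mu, sd = gp_posterior(x_obs, y_obs, grid)
    x_next = grid[np.argmax(mu + 2.0 * sd)]   # upper-confidence-bound pick
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, objective(x_next))
best = x_obs[np.argmax(y_obs)]          # ends up near the true optimum, 0.6
```

The point of the sketch is the loop structure: each iteration spends one expensive model training where the surrogate says the payoff or the uncertainty is highest, which is what makes the approach attractive for resource-constrained teams.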
Noam Finkelstein - The Importance of Modeling Data Collection (MLconf)
Data sets used in machine learning are often collected in a systematically biased way - certain data points are more likely to be collected than others. We call this "observation bias". For example, in health care, we are more likely to see lab tests when the patient is feeling unwell than otherwise. Failing to account for observation bias can, of course, result in poor predictions on new data. By contrast, properly accounting for this bias allows us to make better use of the data we do have.
In this presentation, we discuss practical and theoretical approaches to dealing with observation bias. When the nature of the bias is known, there are simple adjustments we can make to nonparametric function estimation techniques, such as Gaussian Process models. We also discuss the scenario where the data collection model is unknown. In this case, there are steps we can take to estimate it from observed data. Finally, we demonstrate that having a small subset of data points that are known to be collected at random - that is, in an unbiased way - can vastly improve our ability to account for observation bias in the rest of the data set.
My hope is that attendees of this presentation will be aware of the perils of observation bias in their own work, and be equipped with tools to address it.
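When the observation model is known, one standard adjustment of the kind described above is to reweight each observed point by the inverse of its observation probability. A toy sketch with a made-up lab-test example (the observation model here is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# True population: lab values uniform on [0, 10], so the true mean is 5.
# Sicker patients (higher values) are more likely to get tested, so the
# observed sample over-represents high values.
values = rng.uniform(0.0, 10.0, 100_000)
p_observe = 0.05 + 0.09 * values        # known observation model: P(tested | value)
observed = rng.random(values.size) < p_observe
sample = values[observed]

naive = sample.mean()                   # biased upward (about 6.5 here)
ipw = np.average(sample, weights=1.0 / p_observe[observed])  # recovers about 5
```

The same inverse-weighting idea carries over to nonparametric estimators such as the Gaussian Process models mentioned in the abstract, where observation probabilities can enter as per-point weights.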
The Uncanny Valley of ML
Every so often, the conundrum of the Uncanny Valley re-emerges as advanced technologies evolve from clearly experimental products to refined accepted technologies. We have seen its effects in robotics, computer graphics, and page load times. The debate of how to handle the new technology detracts from its benefits. When machine learning is added to human decision systems a similar effect can be measured in increased response time and decreased accuracy. These systems include radiology, judicial assignments, bus schedules, housing prices, power grids and a growing variety of applications. Unfortunately, the Uncanny Valley of ML can be hard to detect in these systems and can lead to degraded system performance when ML is introduced, at great expense. Here, we'll introduce key design principles for introducing ML into human decision systems to navigate around the Uncanny Valley and avoid its pitfalls.
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks (MLconf)
Recognizing and distinguishing specific semantic relations from other types of semantic relations is an essential part of language understanding systems. Identifying expressions with similar and contrasting meanings is valuable for NLP systems which go beyond recognizing semantic relatedness and need to identify specific semantic relations. In this talk, I will first present novel techniques for creating labelled datasets required for training deep learning models for classifying semantic relations between phrases. I will further present various neural network architectures that integrate morphological features into integrated path-based and distributional relation detection algorithms and demonstrate that this model outperforms state-of-the-art models in distinguishing semantic relations and is capable of efficiently handling multi-word expressions.
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global Deep Learned Recommender System Model (MLconf)
At Netflix, our main goal is to maximize our members’ enjoyment of the selected show by minimizing the amount of time it takes for them to find it. We try to achieve this goal by personalizing almost all the aspects of our product -- from what shows to recommend, to how to present these shows and construct their home-pages to what images to select per show, among many other things. Everything is recommendations for us and as an applied Machine Learning group, we spend our time building models for personalization that will eventually increase the joy and satisfaction of our members. In this talk we will primarily focus our attention on a) making a global deep learned recommender model that is regional tastes and popularity aware and b) adapting this model to changing taste preferences as well as dynamic catalog availability.
We will first go through some standard recommender system models that use Matrix Factorization and Topic Models and then compare and contrast them with more powerful and higher capacity deep learning based models such as sequence models that use recurrent neural networks. We will show what it entails to build a global model that is aware of regional taste preferences and catalog availability. We will show how models that are built on the simple Maximum Likelihood principle fail to do that. We will then describe one solution that we have employed in order to enable the global deep learned models to focus their attention on capturing regional taste preferences and the changing catalog. In the latter half of the talk, we will discuss how we do incremental learning of deep learned recommender system models. Why do we need to do that? Everything changes with time. Users’ tastes change with time. What’s available on Netflix and what’s popular also change over time. Therefore, updating or improving recommendation systems over time is necessary to bring more joy to users. In addition to how we apply incremental learning, we will discuss some of the challenges we face involving large-scale data preparation, infrastructure setup for incremental model training as well as pipeline scheduling. The incremental training enables us to serve fresher models trained on fresher and larger amounts of data. This helps our recommender system to nicely and quickly adapt to catalog and users’ taste changes, and improve overall performance.
Vito Ostuni - The Voice: New Challenges in a Zero UI World (MLconf)
The adoption of voice-enabled devices has seen an explosive growth in the last few years and music consumption is among the most popular use cases. Music personalization and recommendation plays a major role at Pandora in providing a daily delightful listening experience for millions of users. In turn, providing the same perfectly tailored listening experience through these novel voice interfaces brings new interesting challenges and exciting opportunities. In this talk we will describe how we apply personalization and recommendation techniques in three common voice scenarios which can be defined in terms of request types: known-item, thematic, and broad open-ended. We will describe how we use deep learning slot filling techniques and query classification to interpret the user intent and identify the main concepts in the query.
We will also present the differences and challenges regarding evaluation of voice powered recommendation systems. Since pure voice interfaces do not contain visual UI elements, relevance labels need to be inferred through implicit actions such as play time, query reformulations or other types of session level information. Another difference is that while the typical recommendation task corresponds to recommending a ranked list of items, a voice play request translates into a single item play action. Thus, some considerations about closed feedback loops need to be made. In summary, improving the quality of voice interactions in music services is a relatively new challenge and many exciting opportunities for breakthroughs still remain. There are many new aspects of recommendation system interfaces to address to bring a delightful and effortless experience for voice users. We will share a few open challenges to solve for the future.
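The production system uses deep slot-filling and query-classification models; the three request types themselves can still be illustrated with a deliberately naive rule-based stand-in (the vocabularies and names below are invented):

```python
# Toy stand-in for the deep slot-filling / query-classification stack:
# map a voice query to one of the three request types and its slots.
KNOWN_ARTISTS = {"adele", "drake"}            # invented known-item vocabulary
THEMES = {"relaxing", "workout", "holiday"}   # invented thematic vocabulary

def parse_query(query):
    q = query.lower()
    for artist in KNOWN_ARTISTS:      # known-item: a specific entity is named
        if artist in q:
            return "known_item", {"artist": artist}
    for theme in THEMES:              # thematic: a mood or activity is named
        if theme in q:
            return "thematic", {"theme": theme}
    return "broad_open_ended", {}     # e.g. "play something good"
```

For example, "play some relaxing music" parses as a thematic request with a `theme` slot, while "play something" falls through to the broad open-ended intent, where the recommender has the most freedom.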
State of ICS and IoT Cyber Threat Landscape Report 2024 preview - Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio’s cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
UiPath Test Automation using UiPath Test Suite series, part 4 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
The Art of the Pitch: WordPress Relationships and Sales - Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips and strategies for successful relationship building that leads to closing the deal.
Epistemic Interaction - tuning interfaces to provide information for AI support - Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024 - Tobias Schneck
As AI technology pushes into IT, I was wondering, as an “infrastructure container Kubernetes guy”, how does this fancy AI technology get managed from an infrastructure operations point of view? Is it possible to apply our lovely cloud-native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and provide you a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I already have working for real.
Smart TV Buyer Insights Survey 2024 - 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Neuro-symbolic is not enough, we need neuro-*semantic* - Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this is illustrated with link prediction over knowledge graphs, but the argument is general.
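As a concrete instance of link prediction where inference is "predictable", here is a minimal TransE-style scorer: a triple (head, relation, tail) is judged plausible when head + relation ≈ tail in embedding space. The two-dimensional embeddings below are hand-set toys, not learned, and the entities are chosen only for illustration:

```python
import numpy as np

entity = {
    "paris":   np.array([0.0, 1.0]),
    "france":  np.array([1.0, 1.0]),
    "berlin":  np.array([0.0, 2.0]),
    "germany": np.array([1.0, 2.0]),
}
relation = {"capital_of": np.array([1.0, 0.0])}

def score(head, rel, tail):
    """TransE score: distance of head + relation from tail (lower = more plausible)."""
    return np.linalg.norm(entity[head] + relation[rel] - entity[tail])

# Link prediction: rank candidate tails for (berlin, capital_of, ?).
candidates = ["france", "germany"]
best = min(candidates, key=lambda t: score("berlin", "capital_of", t))
```

Here `best` comes out as "germany", and the inference is predictable in exactly the operational sense argued for: the relation vector has a fixed geometric meaning that constrains what the model can infer.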
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality - Inflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
Accelerate your Kubernetes clusters with Varnish Caching - Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.