
Nitin Sharma - Deep Learning Applications to Online Payment Fraud Detection






  1. Deep Learning Applications to Online Payments Fraud Detection
  2. Agenda
  3. Part 1 - Problem Background & Motivation
  4. PayPal Ecosystem (1): A Complex Social Graph of Consumers & Merchants. ©2017 PayPal Inc. Confidential and proprietary.
     • Establish confidence/trust for millions of account holders to connect and transact in different modes, at scale, in markets all over the world.
     • Personal accounts (PayPal Personal Account, PayPal Mobile App): send money, receive money, make purchases, defer payments (PayPal Credit).
     • Business accounts serve the different needs of different users, collecting funds in exchange for goods/services; they connect at cash registers through mobile for web-based checkouts, app-based checkouts, or credit card readers. Examples: a person selling goods online, a food truck collecting payments on a tablet, landscaping services taking payment on a phone, major retailers with full checkout flows.
     • The ecosystem is heterogeneous along several axes. Who? Person or business, on either side. What? Money transfer; goods (digital or tangible); services. Where? Online or in-store; web or mobile; local/small scale or retail/large scale; US-based or international; credit. And for every actor: good user or fraudster?
  5. PayPal Ecosystem (2): Massive Scale of E-Commerce. 2018 Full-Year Statistics:
     • $15.45B revenue; $578B total payment volume; 9.9B transactions.
     • $227B mobile payment volume; 3.7B mobile payment transactions.
     • 246M consumer accounts and 21M merchant accounts, across multiple countries/regions, currencies, and funding instruments.
  6. Problem Formulation: Fraud Detection (the business bottom line)
     • Reliably facilitate large-scale e-commerce between buyers and sellers: protect the identity of transacting entities, establish trust between them, scale across countries, currencies, products, and modes of transaction, and facilitate e-commerce or money exchange swiftly.
     • This boils down to reliably separating good customers from potentially bad ones:
     • Maximize decline of bad transactions or fraudulent entities, addressing complex fraud patterns across countries, currencies, products, and modes of transaction, as well as temporally evolving fraud patterns on different platforms.
     • Maximize approval of good transactions or legitimate entities: approve good transactions up-front quickly for the best user experience, and reduce false positives (good-user declines).
     • Key question: what is the modus operandi, the behavior, of good and bad customers, and how does it manifest?
  7. Complexity of Risks in the PayPal Ecosystem: how different fraud behaviors manifest
     • Account Takeover: gain unauthorized access to an existing account and transact, or log in and out without transacting (attack preparation, masking with stolen financials, layering). Monetization: use the account's funding sources to buy goods from legitimate sellers, send money to themselves from the account, or sell the account to others. The identity is stolen via data breaches and phishing.
     • Stolen Financials: steal a credit/debit card or bank credential and add it to a new account. Monetization: buy goods from legitimate sellers, send money to another PayPal or bank account, or age the account. The financials are stolen via data breaches and phishing.
     • Credit Fraud: use a stolen identity to apply for credit.
     • Credit Risk: will consumers/merchants pay on time? Assess credit-worthiness and allocate credit lines; modeling here is heavily regulated. Related: bounced checks and Not Sufficient Funds, since bank transfers take time and there is no immediate response (does the account exist? does it have enough balance?).
     • Buyer Abuse: collusion (item not received, or different from described), friendly fraud, abuse of the protections policy, malicious intent, bankruptcy; seller-side identity fraud also occurs.
  8. Market for Fraudsters: credentials are available online for a price. Source: SecureWorks Underground Hacker Markets Annual Report, April 2016.
  9. Sustaining Model Performance: performance deteriorates with time. [Figure: a model trained offline on Jan-Dec 2016 data and tested on April 2017 shows TPR/FPR degrading across live months May-Nov 2017 (snapshots at 05/17, 07/17, 09/17), as the conceptual population P(x, Y) drifts.]
  10. Time-Varying Ecosystem: why does the population change?
     • Technology changes; gradual ramp-up of new features or products; evolving fraud (desktop/mobile); short- and long-term seasonality.
     • Domain: how is fraud data different? There is no pixel-like consistency in representation, and temporality affects both X and Y.
     • Features: manual, time-based feature engineering bakes in assumptions (a round-about view of time/memory, long/short-term seasonal distinctions, anomalies). The initial feature space is high-dimensional, with correlation and redundancy, and it is unclear how features change as fraud evolves systemically. Feature generation is removed from the training process.
     • Model (traditional ML, F(x)): areas of improvement include robust features across time/distribution shifts, cross-domain learning (discovering a common representation across domains), explicitly reducing good-user declines, learning from past intelligence, and addressing class imbalance.
  11. Part 2 - Applications of Deep Learning Architectures
  12. Addressing Class Imbalance: given the low ratio of fraudulent to legitimate transactions (a small bad-to-good ratio), the modeling context poses a class-imbalance problem. Two broad remedies:
     1. SMOTE and variants (Chawla et al., 2002):
     • Introduce synthetic examples along the line segments joining a minority-class sample to its k minority-class nearest neighbors; depending on how much oversampling is needed, neighbors are randomly chosen from those k.
     • Take the difference between a feature vector (sample) and its nearest neighbor, multiply it by a uniform random number in (0, 1), and add the result to the feature vector under consideration. This forces the decision region of the minority class to be more general.
     • Consider the dollar value of fraud and high-risk regions when biasing the sampling.
     • Cleaning variants: Edited NN (remove instances whose class label differs from the majority of their k nearest neighbors) and Tomek links (remove pairs of nearest-neighbor examples with different classes, dropping only the majority-class instance).
     • Related variants: ADAptive SYNthetic (ADASYN), Adaptive Neighbor Synthetic (ANS), Borderline-SMOTE, Safe-Level SMOTE, DBSMOTE.
     2. Weighted loss functions.
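The interpolation step described above can be sketched in a few lines of NumPy. This is a minimal illustration of the core SMOTE idea, not the talk's production code; the function name and toy parameters are ours.

```python
import numpy as np

def smote(X_min, n_synthetic, k=5, rng=None):
    """Minimal SMOTE sketch (Chawla et al., 2002): synthesize minority
    samples along segments joining a sample to its k nearest minority
    neighbors."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                # exclude self-distance
    nn = np.argsort(d, axis=1)[:, :k]          # k nearest neighbors per sample
    synth = np.empty((n_synthetic, X_min.shape[1]))
    for i in range(n_synthetic):
        j = rng.integers(n)                    # pick a minority sample
        nb = X_min[rng.choice(nn[j])]          # pick one of its k neighbors
        gap = rng.random()                     # uniform random number in (0, 1)
        synth[i] = X_min[j] + gap * (nb - X_min[j])
    return synth
```

Each synthetic point is a convex combination of two real minority points, so it stays inside the minority class's bounding region, which is what generalizes the decision boundary.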
  13. Opportunities for Improvement: manual feature engineering, the prologue
     • Example: the time property. Features are built from an event sequence in time order (events E8, E9, E10 falling into fixed time windows w1, w2, w3 with constant weights), taking an event-level perspective on temporality or raw data.
     • These event features are created BEFORE, and independently of, model training, which raises correlation and redundancy issues. And should the window weights always decay?
     • Can we instead learn the function, and all its underlying complexities, from scratch? That is representation learning for temporality.
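To make the "manual feature engineering" baseline concrete, here is a hypothetical hand-built temporal feature of the kind the slide critiques: event counts over fixed look-back windows chosen before any training. The function name and window values are illustrative assumptions, not PayPal's actual features.

```python
import numpy as np

def window_counts(event_times, now, windows=(1.0, 7.0, 30.0)):
    """Hypothetical hand-engineered temporal features: transaction counts
    in fixed look-back windows (in days), fixed BEFORE model training.
    The window boundaries and implied decay are assumptions the model
    can never revise."""
    ages = now - np.asarray(event_times)       # how long ago each event was
    return np.array([(ages >= 0) & (ages <= w) for w in windows]).sum(axis=1)
```

Representation learning replaces this fixed scheme by letting the network decide which time spans matter.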
  14. Temporal Representation Learning Using LSTM: event-driven deep learning (Yuan et al., 2017)
     • LSTM cells learn long-term dependencies and remember event behaviors over arbitrary time intervals, so we can leverage long sequences of (good or bad) user behavior and classify behaviors despite lags of unknown duration between key events (specific fraud behaviors).
     • Event sequences (homogeneous or heterogeneous) are the input; predict either future sequences or labels. Using raw event sequences places no restriction on the learned function or on time decay, and the discovered features replace manually engineered features built on assumptions.
     • Setup: use payment-attempt event data (raw features) for all transactions; replace the manually generated features with less than half as many raw features; sequence-train the LSTM architecture on the raw features; then train another model on the newly discovered feature hierarchies plus other features.
     • Result: approximately a 7-10% relative increase in performance. Relative performance of M3 (LSTM feature learning + NN) across six periods: P1 1.0747, P2 1.0665, P3 1.0419, P4 1.0720, P5 1.1374, P6 1.1094.
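The folding of a raw event sequence into a fixed-size learned representation can be sketched as a bare LSTM forward pass. This NumPy toy (our own, not the talk's implementation) shows the mechanism: gated cells carry state across arbitrary intervals, and the final hidden state stands in for the hand-engineered window features.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM step (hidden size H): gates decide what to remember or
    forget about event behavior over arbitrary time intervals."""
    H = h.shape[0]
    z = W @ x + U @ h + b                      # all four gates stacked, shape (4H,)
    i = 1 / (1 + np.exp(-z[:H]))               # input gate
    f = 1 / (1 + np.exp(-z[H:2*H]))            # forget gate
    o = 1 / (1 + np.exp(-z[2*H:3*H]))          # output gate
    g = np.tanh(z[3*H:])                       # candidate cell state
    c = f * c + i * g                          # cell state: long-term memory
    h = o * np.tanh(c)                         # hidden state: the representation
    return h, c

def encode_events(events, W, U, b, H):
    """Fold a raw event sequence into a fixed-size vector; this learned
    representation replaces manually engineered time-window features."""
    h, c = np.zeros(H), np.zeros(H)
    for x in events:
        h, c = lstm_step(x, h, c, W, U, b)
    return h
```

In practice the weights W, U, b are trained end-to-end (e.g., to predict fraud labels), and the encoder's outputs feed the downstream model, as in the M3 setup above.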
  15. Robust Feature Learning to Address Post-Deployment Shifts: discover stable feature spaces to boost robustness
     • Train a stacked denoising autoencoder to reconstruct the input from a corrupted version of it. Corruption is based on past systemic behavior, or random; for example, build models that are robust to IP corruption. This simulates post-deployment feature shifts/scenarios.
     • Corruption forces the hidden layer to discover more robust features, and hence more stable models.
     • Use the learned weights, instead of random initialization, for a second-stage supervised multi-task learning problem.
     • Combine with feature-selection ensembles and recursive feature elimination during training.
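A single-layer version of the idea can be written out in NumPy: corrupt the input, reconstruct the clean version, and keep the encoder weights to initialize the supervised stage. This is a minimal sketch under our own simplifications (one hidden layer, random dropout-style corruption, squared-error loss), not the stacked architecture used in production.

```python
import numpy as np

def train_dae(X, hidden, corrupt=0.3, lr=0.05, epochs=100, seed=0):
    """Minimal denoising autoencoder sketch: reconstruct clean inputs from
    corrupted copies (simulating post-deployment feature shifts, e.g. a
    corrupted IP feature), forcing the hidden layer toward robust features."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0, 0.1, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.1, (hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        mask = rng.random(X.shape) > corrupt   # randomly zero out features
        Xc = X * mask                          # corrupted input
        H = np.tanh(Xc @ W1 + b1)              # encoder
        R = H @ W2 + b2                        # linear decoder
        err = R - X                            # reconstruct the CLEAN input
        # backpropagation of the squared-error loss
        gW2 = H.T @ err / n; gb2 = err.mean(0)
        dH = err @ W2.T * (1 - H ** 2)
        gW1 = Xc.T @ dH / n; gb1 = dH.mean(0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return W1, b1  # encoder weights: use these to initialize the supervised net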
  16. Multi-Task & Transfer Learning: multi-input multi-output modeling architectures
     • Stacked architectures learn robust hierarchical features from long-term fraud patterns, feeding multi-task cross-domain learners and hard-example-mining learners, with short-term, modus-operandi-specific models on top. This is iteratively better than learning ensembles from sub-sampling and then weighting scores linearly.
     • Cross-Stitch Networks (Misra et al., 2016): at each layer, learn a linear combination of the activation maps from each task, so the next layer's filters operate on a shared representation.
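The cross-stitch operation itself is just a small learned mixing matrix applied across tasks. A NumPy sketch of one unit for two tasks (our illustration of the Misra et al. mechanism, with an assumed 2x2 mixing matrix `alpha`):

```python
import numpy as np

def cross_stitch(a_task1, a_task2, alpha):
    """Cross-stitch unit sketch (Misra et al., 2016): each task's input to
    the next layer is a learned linear combination of BOTH tasks'
    activation maps. alpha is a 2x2 matrix of learnable mixing weights;
    an identity alpha keeps the tasks fully separate."""
    stacked = np.stack([a_task1, a_task2])        # shape (2, ...activations)
    mixed = np.tensordot(alpha, stacked, axes=1)  # mix across the task axis
    return mixed[0], mixed[1]
```

During training, alpha is updated by gradient descent along with the layer weights, so the network itself decides how much representation the tasks share.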
  17. Model Performance Comparison: robust feature learning using hybridized architectures. Stability metrics (standard deviation, proportion of periods above cutoff1, proportion above cutoff2), measured monthly over 18 months and weekly over 78 weeks:
     Model    | Monthly (18 m): Std Dev, >cutoff1, >cutoff2 | Weekly (78 w): Std Dev, >cutoff1, >cutoff2
     M01_AE   | x,     2.50x,  1.39x                        | x,     2.61x,  1.71x
     BM 1     | 1.98x, 1.33x,  0.98x                        | 2.24x, 1.28x,  1.05x
     BM 2     | 2.85x, x,      x                            | 2.26x, x,      x
  18. Reducing Good User Declines (1): explore the DNN search space with custom cost functions
     • General objective of a machine learning algorithm: find parameters or weights that optimize (here, minimize) a loss function, which measures how far the prediction is from the ground truth; gradient search is directed to optimize that loss.
     • Beyond canned loss functions: can a loss function be designed that explicitly penalizes false positives? The search is then directed to minimize the gap between ground truth and prediction while constraining itself to regions of the space where false positives are lower.
     • For the fraud context: improve TP (maximize the fraud catch rate) while constraining the search to the low-FPR region (lowest good-user declines) within the optimal-catch region.
     • Caveat: in some cases there is no free lunch (FP vs. FN trade-off).
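One simple way to realize such a loss is to up-weight errors on the legitimate class, so gradient descent is pushed away from regions that decline good users. This is a generic sketch of the idea (the function name and default weight are ours; the talk does not disclose its actual cost function):

```python
import numpy as np

def fp_weighted_bce(y_true, p_pred, fp_weight=5.0, eps=1e-7):
    """Sketch of a loss that explicitly penalizes false positives:
    standard binary cross-entropy, but errors on legitimate (y=0)
    transactions, the source of good-user declines, are up-weighted
    by fp_weight."""
    p = np.clip(p_pred, eps, 1 - eps)          # numerical stability
    loss = -(y_true * np.log(p)                # miss-a-fraud term (FN side)
             + fp_weight * (1 - y_true) * np.log(1 - p))  # decline-a-good-user term (FP side)
    return loss.mean()
```

Raising fp_weight trades some catch rate for fewer good-user declines, which is exactly the no-free-lunch caveat above.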
  19. Reducing Good User Declines (2): transfer learning using generative modeling contexts
     • Discriminative models M1 ... Mk answer: what is the probability of a transaction being fraudulent? Scores beyond the cutoff (X >= k) fall in the rejection region, which declines fraudsters but also good users, including some far from the decision boundary.
     • Generative models answer instead: what is the distribution of features that generates fraud, and what is the distribution of features that generates the good users who get declined by M1 ... Mk?
     • Approaches: deep autoencoder learning; transfer learning (as feature learners, or for prediction override).
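As a toy stand-in for the deep generative learners on this slide, one can fit a density model to the features of declined-but-good transactions and use its log-density as an override signal. The diagonal-Gaussian choice and function names below are our simplifying assumptions, not the talk's method:

```python
import numpy as np

def fit_gaussian(X):
    """Fit a diagonal Gaussian to features of good users declined by the
    discriminative models M1 ... Mk (toy stand-in for a deep generative
    or autoencoder learner)."""
    return X.mean(0), X.var(0) + 1e-6          # small floor avoids zero variance

def log_density(x, mu, var):
    """Log-density under the fitted model: a high value suggests the
    transaction looks like a declined-but-good user, so the decline
    could be overridden or the score transferred as a feature."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
```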
  20. Reducing Good User Declines (3): hard example mining, adapted from object detection (Shrivastava et al., 2016)
     • Here the hard examples are good users who got declined (P(Y = 1 | X) > k, yet truly good). Two passes are used: first identify the good users who get declined, then improve the classifier so it re-classifies these hard examples as good.
     • Procedure: train the model; freeze it and identify hard examples; create a minibatch of them (with variations based on segmentation, risky business domain, or dollar value of fraud); then unfreeze and continue training, backpropagating only the hard examples.
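The minibatch-selection step of that procedure reduces to picking the highest-loss examples from a frozen forward pass. A minimal sketch (our own helper, illustrating the OHEM selection rule rather than the full training loop):

```python
import numpy as np

def hard_example_minibatch(losses, X, y, k):
    """Online hard example mining sketch (Shrivastava et al., 2016):
    after a frozen forward pass produced per-example losses, keep only
    the k highest-loss examples (e.g., good users the model declined)
    for the backward pass."""
    hard = np.argsort(losses)[-k:]             # indices of the k hardest examples
    return X[hard], y[hard]
```

In the full loop, only this minibatch is backpropagated once the network is unfrozen, so capacity is spent on exactly the cases the current model gets wrong.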
  21. Model Performance Comparison (Catch vs. FPR): catch-rate ratios, with FPR ratios in parentheses
     Model     | P1              | P2              | P3              | P4              | P5
     DNN_CFU   | 1.000 (1.000)   | 1.000 (1.000)   | 1.000 (1.000)   | 1.000 (1.000)   | 1.000 (1.000)
     DNN_RRFL  | 1.0074 (0.9488) | 1.0052 (0.9702) | 1.0108 (0.9701) | 0.9900 (1.0474) | 1.0186 (0.8362)
     DNN_OHEM  | 1.0131 (0.8279) | 1.0141 (0.9007) | 1.0229 (0.8856) | 1.0141 (0.7905) | 1.0342 (0.6595)
     • Online hard example mining consistently delivers low FPR while retaining a high catch rate, beating the status-quo champion.
     • Cost-function-based optimizers involve locally weighting data batch by batch and need significant tuning; they often cause variability in FPR.
     • Rejection feature learning needs further tuning; the current combination is essentially a feature learner.
  22. Part 3 - Conclusions
  23. Deep Learning Applications to Fraud Detection: conclusions
     • Key conclusions: deep learning delivers the next step-function increase in performance and scales that performance robustly to rapidly evolving fraud patterns.
     • Deep learning architectures offer a significant performance boost: a far smaller trade-off between performance and robustness, and performance that scales very well with more data or better hardware.
     • No pre-training initial assumptions (unlike legacy systems): temporally and systemically robust features are learned during training, greatly reducing manual feature engineering (assumption-driven, with static definitions), and cross-domain features are learned with less domain-centric restriction (segmentation, tagging).
     • Past intelligence is better utilized through transfer learning and domain adaptation; the catch rate is boosted while good-user declines are also reduced.
     • Whereas traditional ML trades performance against stability, DNN architectures trained on raw event data across domains sit in the performance-stability sweet zone.
  24. References
  25. References
     [1] Abhinav Shrivastava, Abhinav Gupta, and Ross Girshick. "Training Region-based Object Detectors with Online Hard Example Mining," arXiv:1604.03540 [cs.CV], 2016.
     [2] Ishan Misra, Abhinav Shrivastava, Abhinav Gupta, and Martial Hebert. "Cross-stitch Networks for Multi-task Learning," arXiv:1604.03539 [cs.CV], 2016.
     [3] Dell SecureWorks. "Underground Hacker Markets Annual Report," April 2016. http://online.wsj.com/public/resources/documents/secureworks_hacker_annualreport.pdf
     [4] Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. "SMOTE: Synthetic Minority Over-sampling Technique," Journal of Artificial Intelligence Research, 16:321-357, 2002 (arXiv:1106.1813 [cs.AI]).
     [5] Shuhan Yuan, Panpan Zheng, Xintao Wu, and Yang Xiang. "Wikipedia Vandal Early Detection: From User Behavior to User Embedding," arXiv:1706.00887 [cs.CR], 2017.
