Liubomyr Bregman "Financial Crime Detection using Advanced Analytics"


Data Science Practice

Published in: Business
  1. Fin Crime detection using autoencoders. Liubomyr Bregman, Richard Bobek. L'viv, Ukraine, 3 Nov 2018, AI & Big Data Day
  2. Global Payment Fraud – lessons learnt and investigations highlights (PwC)
     Bangladesh cyber heist: The attackers attempted to steal $951m in 35 separate fraudulent transactions. 30 orders (worth $850m) were stopped by the US Fed, but 5 orders (worth $101m) went through. A further $20m was blocked by a recipient bank in Sri Lanka.
     "If Hollywood releases another iteration of the 'Oceans 11' franchise, they should base it on the recent attack against the Central Bank of Bangladesh (BB)" (BAE Systems)
     Vietnam Swift fraud attempt: Further analysis of the Bangladesh cyber heist led to the conclusion that the same attackers appear to have struck previously, using similar tools written to target a bank in Vietnam just a couple of months before the Bangladesh attack.
     Lessons learnt: 1. Skilled operators 2. Less opportunity for "insider" or "opportunistic" attack 3. Need for 'out-of-band' systems for notifications
     Lessons learnt: 1. Dedicated scenarios within FIs 2. Leverage the convergence between cyber, fraud and ML 3. Leverage advanced analytics – evolving threat landscape
  3. There are many "creative" new strategies in fin crimes. Some of the known financial crime strategies: cheque fraud; credit card fraud; mortgage fraud; medical fraud; corporate fraud; securities fraud (including insider trading); bank fraud; insurance fraud; market manipulation; payment (point of sale) fraud; health care fraud; theft; scams or confidence tricks; tax evasion; bribery; embezzlement; identity theft; money laundering; forgery and counterfeiting.
  4. There are many fraud detection and AML software products on the market. By type, financial fraud detection software can be split into: Anti Money Laundering Detection Software; Identity Theft Detection Software; Credit/Debit Card Fraud Detection Software; Wire Transfer Fraud Detection Software; Others.
  5. Traditionally, fin crime is approached through reporting, expert knowledge and assessment. The steps are usually:
     1. A report of alerts is generated by a rule decision engine*. The report shows the transactions / clients detected by (usually) orthogonal rules.
     2. Experts assess the alerts and decide on appropriate action, for example an investigation of the client's activities.
     * Historically these can be large financial abuse management systems, transaction monitoring systems, in-house scripts, etc.
  6. Most financial institutions struggle with similar problems in detecting financial crime: huge streams of data; scenarios are far from perfect; fraud schemes keep developing; investigations are costly; the number of scenarios is limited and scenarios are costly; more data science is better.
  7. Machine learning approaches aim to increase the automation and recall of the process. Approaches: rule based / expert based; supervised (investigation needed); unsupervised. Items as listed on the slide: 1. Optimal rules; 1. Segmentations; 2. Anomaly detection; 3. Semi-supervised approach; 2. Deep learning approaches; a) Pattern discovery; a) Rule based models creation; b) Threshold optimizations; c) Rule optimization ways; d) Alert prioritization.
  8. The key problem is the unbalanced dataset (and some terminology): 0.1% true positives and 99.9% false positives. Only 13 scenarios; around 90 features; ~12 segments; ~700 thresholds; 600M transactions; 2M alerts; 2K SARs (suspicious activity reports).
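
The imbalance quoted on this slide can be checked with a few lines of arithmetic. This is a minimal sketch that assumes a SAR counts as a true positive and every other alert as a false positive; the variable names are illustrative and not from the deck.

```python
# Figures quoted on the slide.
transactions = 600_000_000
alerts = 2_000_000
sars = 2_000

# Assumption: an alert that ends in a SAR is a true positive.
alert_rate = alerts / transactions          # share of transactions that raise an alert
alert_precision = sars / alerts             # share of alerts that end in a SAR
false_positive_share = 1 - alert_precision  # share of alerts that do not

print(f"alert rate:           {alert_rate:.2%}")
print(f"alert precision:      {alert_precision:.2%}")
print(f"false positive share: {false_positive_share:.2%}")
```

Under that assumption, 2K SARs out of 2M alerts reproduces the 0.1% / 99.9% split stated on the slide.
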
  9. More precise numbers from past projects in different banks

     Benchmarking for alert volumes (alerting and escalation)

     | Peer | Customers | L1 (Alerts) | Alert Rate (L1/Cust.) | L2 (Cases) | L1 to L2 Rate | L3 (SAR-Rec) | L2 to L3 Rate | SAR Rate (SARs/Cust.) |
     |---|---|---|---|---|---|---|---|---|
     | Peer 1 | 55,000,000 | 320,000 | 0.58% | 32,000 | 10% | 6,400 | 20% | 0.012% |
     | Peer 2 | 4,500,000 | 60,500 | 1.34% | 6,340 | 10% | 3,340 | 53% | 0.074% |
     | Peer 3 | 9,900,000 | 148,000 | 1.49% | 40,000 | 27% | 670 | 2% | 0.007% |
     | Peer 4 | 40,000,000 | 50,000 | 0.13% | 12,000 | 24% | 375 | 3% | 0.001% |
     | Peer Average | | | 0.89% | | 17.88% | | 19.37% | |

     Benchmarking for AML TM investigations

     | Peer | Number FTE | Annual spend | Maturity (0 (low) to 3 (high)) |
     |---|---|---|---|
     | Peer 1 | 12,000 | $2bn | 2+ |
     | Peer 2 | 5,000 | $800m | 2 |
     | Peer 3 | 10,000 | $1.2bn | 3 |
     | Peer 4 | 210 | $50m | 0-1 |
     | Peer 5 | 150 | $125m | 1-2 |
     | Peer 6 | 2,500 | $300m | 1-2 |
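
The escalation rates in the alert-volume benchmark follow directly from the raw counts. The sketch below recomputes them from the customer, L1, L2 and L3 figures on the slide; the function and dictionary names are illustrative, and the peer average is assumed to be the unweighted mean of the four peers.

```python
# (customers, L1 alerts, L2 cases, L3 SAR recommendations) as quoted on the slide.
peers = {
    "Peer 1": (55_000_000, 320_000, 32_000, 6_400),
    "Peer 2": (4_500_000, 60_500, 6_340, 3_340),
    "Peer 3": (9_900_000, 148_000, 40_000, 670),
    "Peer 4": (40_000_000, 50_000, 12_000, 375),
}

def funnel_rates(customers, alerts, cases, sars):
    """Return the four ratios reported in the benchmark table."""
    return {
        "alert_rate": alerts / customers,
        "l1_to_l2": cases / alerts,
        "l2_to_l3": sars / cases,
        "sar_rate": sars / customers,
    }

rates = {name: funnel_rates(*row) for name, row in peers.items()}

# Assumption: "Peer Average" is the unweighted mean across peers.
peer_average = {
    key: sum(r[key] for r in rates.values()) / len(rates)
    for key in ("alert_rate", "l1_to_l2", "l2_to_l3")
}

for name, r in rates.items():
    print(name, {k: f"{v:.3%}" for k, v in r.items()})
print("Peer Average", {k: f"{v:.2%}" for k, v in peer_average.items()})
```

With the unweighted mean, the averages come out at about 0.89%, 17.88% and 19.37%, matching the slide.
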
  10. What is normality? Normal or not?
  11. How does it work: Normality. Normality is a measure of concentration (density), separated from anomaly by a sensitivity threshold. (Figure labels: Anomaly, Sensitivity, Density, Normal.)
  12. How does it work: Abnormality of anomalies. How far from normality? How far from other abnormalities? (Figure labels: Normal.)
  13. How does it work: Similarity of anomalies. Some anomalies are similar and form a separate cluster. Investigating one anomaly and finding a fraud makes the other anomalies in that cluster more likely to be fraud. (Figure labels: Anomaly cluster, Normal.)
  14. How does it work: Stability of normal and anomalous patterns. When the definition of normality remains "stable" over time, the analytical set is considered "operational". (Figure labels: Anomaly cluster, Normal.)
  15. There are a lot of ways to detect an anomaly
     Non parametric
     • Density-based techniques (k-nearest neighbor, local outlier factor, and many more variations of this concept).
     • Fuzzy logic-based outlier detection.
     • Cluster analysis-based outlier detection.
     Parametric
     • Subspace- and correlation-based outlier detection for high-dimensional data.
     • Bayesian Networks.
     • Deviations from association rules and frequent item sets.
     & more
     • Ensemble techniques, using feature bagging, score normalization and different sources of diversity.
  16. Non parametric: Density-based techniques
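
One simple member of the density-based family is the mean distance to the k nearest neighbours: points in sparse regions get a high score. The sketch below is illustrative only (pure Python, toy 2-D points, `k=3` chosen arbitrarily) and is not necessarily the variant used in the talk.

```python
import math

def knn_distance_scores(points, k=3):
    """Score each point by its mean distance to its k nearest neighbours.

    Higher score = lower local density = more anomalous.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

# Toy data: a tight cluster plus one isolated point (made up for illustration).
points = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.2, 1.0), (1.0, 1.2), (5.0, 5.0)]
scores = knn_distance_scores(points, k=3)
most_anomalous = max(range(len(points)), key=scores.__getitem__)
print(points[most_anomalous], round(scores[most_anomalous], 3))
```

The isolated point receives the highest score; a sensitivity threshold on the score then separates "normal" from "anomaly".
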
  17. Parametric: Prediction. The error term is the anomaly score.
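
The parametric idea can be sketched with the simplest possible model: fit a line, then use the absolute prediction error of each observation as its anomaly score. Ordinary least squares and the toy data below are assumptions made for illustration; the slide does not name a specific model.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Toy data: roughly y = 2x, with one injected outlier at x = 6.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 8.0, 9.8, 30.0, 14.1, 16.0]

slope, intercept = fit_line(xs, ys)

# Anomaly score = absolute error between observation and prediction.
scores = [abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)]
most_anomalous = max(range(len(xs)), key=scores.__getitem__)
print(xs[most_anomalous], round(scores[most_anomalous], 2))
```

An autoencoder follows the same logic, with the reconstruction of the input playing the role of the prediction.
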
  18. Autoencoders
  19. Autoencoders present a powerful method for anomaly detection in financial crimes
     What is this? x: input (observation); H: internal representation (neural network, hidden layer); R: output (reconstruction).
     Approach to training: the target output is the observed input: H = f(x), R = g(H) = g(f(x)), trained so that R ≈ x.
     Measure of quality: loss function L(x, g(f(x))), e.g. RMSE.
     Traditionally used for anomaly detection & dimension reduction.
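
To make the encoder f, decoder g and reconstruction loss concrete, here is a deliberately tiny sketch: a linear autoencoder with two inputs and a single hidden unit, trained by plain gradient descent on points that lie on the line x2 = 2·x1. Everything in it (architecture, learning rate, starting weights, data) is an assumption chosen so the example runs with the standard library only; a real model would be deeper and non-linear.

```python
import math

def encode(x, enc):
    """f(x): project the 2-D input onto one hidden unit."""
    return enc[0] * x[0] + enc[1] * x[1]

def decode(h, dec):
    """g(H): map the hidden value back to 2-D."""
    return [dec[0] * h, dec[1] * h]

def rmse(x, enc, dec):
    """Reconstruction error L(x, g(f(x))) as root mean squared error."""
    r = decode(encode(x, enc), dec)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, r)) / len(x))

def train(data, epochs=2000, lr=0.05):
    enc, dec = [0.5, 0.5], [0.5, 0.5]  # fixed start for reproducibility
    n = len(data)
    for _ in range(epochs):
        g_enc, g_dec = [0.0, 0.0], [0.0, 0.0]
        for x in data:
            h = encode(x, enc)
            resid = [x[i] - dec[i] * h for i in range(2)]
            back = resid[0] * dec[0] + resid[1] * dec[1]
            for i in range(2):
                g_dec[i] += -2 * resid[i] * h / n   # d(loss)/d(dec_i)
                g_enc[i] += -2 * back * x[i] / n    # d(loss)/d(enc_i)
        for i in range(2):
            enc[i] -= lr * g_enc[i]
            dec[i] -= lr * g_dec[i]
    return enc, dec

# "Normal" observations all follow x2 = 2 * x1.
normal = [(i / 10, 2 * i / 10) for i in range(-10, 11)]
enc, dec = train(normal)

normal_error = max(rmse(x, enc, dec) for x in normal)
anomaly_error = rmse((2.0, -1.0), enc, dec)  # breaks the learned relationship
print(f"max normal RMSE: {normal_error:.6f}, anomaly RMSE: {anomaly_error:.3f}")
```

Observations that follow the learned structure reconstruct almost perfectly, while the one that violates it cannot be reconstructed and receives a large RMSE.
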
  20. Is it the same principle as compression? No: compression usually has no loss; compression is generic; autoencoding is trained on specific cases. (Figure: original observation (Labrador with brown collar) → encoding → code → decoding → decoded observation (still a dog).)
  21. Which strategy should we apply? Strategies: 1. Train on goods only and predict anomality by the loss (the difference between the input and its reconstruction). 2. Train on all transactions, assuming that the neural network will not learn the bads due to their low number of observations. (Figure: transactions split into goods, bads and unknown bads.)
  22. Why do we need a model of g(f(x)) = x? We do not. We need the internal representation H of x. With a deep H (multiple layers), an autoencoder can approximate any mapping from X to R arbitrarily well (Hinton & Salakhutdinov, 2006). (Figure: X → H → R; a neuron with inputs x1, x2, x3 and a bias computes $\frac{1}{1 + e^{-(a_1 W_1 + a_2 W_2 + \text{bias})}}$.)
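
The neuron formula on this slide is a weighted sum passed through a logistic (sigmoid) function. A minimal sketch, with illustrative argument names and made-up example values:

```python
import math

def neuron(activations, weights, bias):
    """Logistic neuron: 1 / (1 + exp(-(sum_i a_i * W_i + bias)))."""
    z = sum(a * w for a, w in zip(activations, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Example values are made up for illustration.
print(neuron([1.0, 2.0], [0.5, -0.25], bias=0.0))  # weighted sum is 0, output 0.5
print(neuron([1.0, 2.0], [0.5, 0.5], bias=1.0))
```

The output always lies strictly between 0 and 1, and stacking layers of such units gives the encoder and decoder their non-linearity.
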
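  23. How can I understand what's happening inside? Why do you want to? OK, then: we simulate. Sweep x1 over its observed range, x1 ∈ {min(x1) : max(x1)}, while holding x2 and x3 at their observed values (x2 = x2, x3 = x3), and watch how the output changes.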
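
The simulation described on this slide can be sketched as a one-feature sweep: vary one input across its range, keep the others fixed, and record the model output at each step. The helper name `sweep_feature` and the stand-in scoring function are hypothetical; in practice the score would be the trained autoencoder's reconstruction error.

```python
def sweep_feature(score, observation, index, low, high, steps=5):
    """Vary one feature from low to high while holding the others fixed."""
    results = []
    for s in range(steps):
        value = low + (high - low) * s / (steps - 1)
        probe = list(observation)
        probe[index] = value
        results.append((value, score(probe)))
    return results

def toy_score(x):
    """Hypothetical stand-in for a model's anomaly score."""
    return (x[0] - 2 * x[1]) ** 2 + 0.1 * x[2]

# Sweep x1 over [0, 2] with x2 = 0.5 and x3 = 0.0 held constant.
curve = sweep_feature(toy_score, [1.0, 0.5, 0.0], index=0, low=0.0, high=2.0)
for value, out in curve:
    print(f"x1 = {value:.1f} -> score {out:.2f}")
```

Plotting such a curve per feature shows which inputs the anomaly score is most sensitive to.
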
  24. So what do I get using this? 1. It learns only the probable inputs. 2. You can play with the loss function. Those combined force H to capture information about the structure of the data-generating distribution. The autoencoder is able to learn the structure of the manifold. As a result we get a powerful anomaly detector. Applying expert knowledge through the loss, e.g. $L = \frac{1}{k} \sum_{n=1}^{k} W_n (x_n - \hat{x}_n)^2$ with weights such as $W_n = 0.5;\ 3;\ \dots$
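
A feature-weighted reconstruction loss is one way to inject expert knowledge: features the experts consider important get a larger weight, so errors on them count more. The sketch below assumes the loss is the weighted mean of squared per-feature errors; the exact form on the original slide is not fully recoverable, so treat this as one plausible reading.

```python
def weighted_reconstruction_loss(x, x_hat, weights):
    """Mean over k features of W_n * (x_n - x_hat_n)^2."""
    k = len(x)
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, x_hat)) / k

# Illustrative values: the second feature is weighted six times more heavily.
x = [1.0, 2.0]
x_hat = [1.5, 1.0]
weights = [0.5, 3.0]
print(weighted_reconstruction_loss(x, x_hat, weights))
```

With these numbers the loss is (0.5 · 0.25 + 3 · 1) / 2 = 1.5625, dominated by the error on the heavily weighted feature.
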
  25. How do we say in the end what is an anomaly? (Figure: input X1, X2, X3 → hidden → output X1, X2, X3.) 1. Comparison of input & output: the RMSE is compared with an anomality threshold that separates goods from bads. 2. Classification: using the final layer of the encoder as input for the classifier.
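
The first option, thresholding the reconstruction error, can be sketched as follows. Picking the threshold so that a fixed share of observations is flagged is an assumption made here for illustration; the function name and the 1% default are not from this slide.

```python
def flag_anomalies(errors, top_share=0.01):
    """Flag the top_share of observations with the highest reconstruction error."""
    n_flagged = max(1, round(len(errors) * top_share))
    threshold = sorted(errors, reverse=True)[n_flagged - 1]
    return threshold, [e >= threshold for e in errors]

# Made-up RMSE values: 99 ordinary observations and one badly reconstructed one.
errors = [0.01 * i for i in range(1, 100)] + [5.0]
threshold, flags = flag_anomalies(errors, top_share=0.01)
print(threshold, sum(flags))
```

Lowering `top_share` makes the detector stricter; raising it sends more cases to investigators.
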
  26. Finally, we train and validate a classification algorithm to predict anomalies in advance
     1. Anomaly labeling with autoencoders (RMSE separates anomaly from normality).
     2. Time series (measured attribute X over time).
     3. Translating the problem to classification: sliding windows over the time series give positive examples and counter examples.
     4. Boosted decision tree to predict failure and define predictive rules.
     5. Validating the results (ROC: TP vs FP).
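
Step 3 above, turning a labeled time series into classification examples, can be sketched with a sliding window: the last few values form the features and the anomaly label of the following time step is the target. The window length, the helper name `make_windows` and the toy series are assumptions for illustration.

```python
def make_windows(series, labels, window=3):
    """Build (features, target) rows: `window` past values -> next step's label."""
    rows = []
    for end in range(window, len(series)):
        rows.append((series[end - window:end], labels[end]))
    return rows

# Toy series; label 1 marks the step the autoencoder flagged as anomalous.
series = [1.0, 1.1, 0.9, 1.0, 1.4, 2.0, 5.0, 1.0]
labels = [0, 0, 0, 0, 0, 0, 1, 0]

rows = make_windows(series, labels, window=3)
for features, target in rows:
    kind = "positive example" if target else "counter example"
    print(features, "->", kind)
```

The resulting rows can be fed to any classifier, such as the boosted decision tree named on the slide, and evaluated with a ROC curve.
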
  27. Case study Asian Bank: a deep neural network was built for anomaly detection. Neural network illustration: layers of 12, 32, 8, 8, 32 and 12 nodes. (Figure: accuracy and loss of the resulting solution.)
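
To get a feel for the size of this network, the number of trainable parameters can be counted from the layer widths. The sketch assumes fully connected layers with one bias per node and treats the first 12-node layer as the input; the slide does not state either detail.

```python
# Layer widths as listed on the slide (input first, reconstruction last).
layer_sizes = [12, 32, 8, 8, 32, 12]

def dense_parameter_count(sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

print(dense_parameter_count(layer_sizes))
```

Under these assumptions the network has 1,436 parameters, which is small compared with the hundreds of millions of transactions available for training.
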
  28. Case study Asian Bank. Comparison of input & output of the neural network: anomaly and actual SAR vs anomaly but not a fraud. Sensitivity: top 1% of transactions. The final ROC curve results in 80% AUC. Prioritization is possible.
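
AUC, the metric quoted here, is the probability that a randomly chosen positive case (a SAR) receives a higher anomaly score than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison on made-up scores; it illustrates the metric only and does not reproduce the case-study result.

```python
def roc_auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up anomaly scores; label 1 = confirmed SAR, 0 = no fraud found.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 0, 1, 0, 0, 0]
print(roc_auc(scores, labels))
```

An AUC above 0.5 means that sorting alerts by anomaly score puts true cases nearer the top than random ordering would, which is what makes prioritization possible.
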
  29. Measuring the sensitivity of the autoencoder to its input
  30. This publication has been prepared for general guidance on matters of interest only, and does not constitute professional advice. You should not act upon the information contained in this publication without obtaining specific professional advice. No representation or warranty (express or implied) is given as to the accuracy or completeness of the information contained in this publication, and, to the extent permitted by law, PricewaterhouseCoopers Česká republika, s.r.o., its members, employees and agents do not accept or assume any liability, responsibility or duty of care for any consequences of you or anyone else acting, or refraining to act, in reliance on the information contained in this publication or for any decision based on it. © 2018 PricewaterhouseCoopers Česká republika, s.r.o. All rights reserved. "PwC" is the brand under which member firms of PricewaterhouseCoopers International Limited (PwCIL) operate and provide services. Together, these firms form the PwC network. Each firm in the network is a separate legal entity and does not act as agent of PwCIL or any other member firm. PwCIL does not provide any services to clients. PwCIL is not responsible or liable for the acts or omissions of any of its member firms nor can it control the exercise of their professional judgment or bind them in any way. Thank you!