
Reducing False Positives - BSA AML Transaction Monitoring Re-Tuning Approach


Written by Mayank Johri and Erik De Monte

Introduction

Institutions waste millions per year analyzing false positives generated by models with low efficacy. In an era of heightened regulatory scrutiny, coupled with institutions’ desire to control compliance costs, there is a need for a sound methodology to improve the overall efficacy of alerts. High efficacy and sound methodology allow institutions to better channel their time and resources toward truly suspicious activity and improve the overall quality of a BSA/AML program.

Proposed solutions to this problem include automated alert closures, whitelists, etc. These solutions do not address the underlying problem of false positives and do not represent sound principles for a robust BSA/AML program. Replacing “out-of-the-box” rules from the transaction monitoring software with custom rules that encapsulate multiple scenarios, and using automated learned behavior (based on past dispositions), customer segmentation, and peer group analysis, may help improve the efficacy of the alerts; however, these still have to be tuned to determine the most effective thresholds. Below is a summary of the steps that can withstand the scrutiny of examiners and fulfill the objective of generating quality alerts.

Approach

Assessment & Prioritization

On a regular basis, evaluate the efficacy of the current suspicious activity detection rules in production, rank the rules from lowest to highest efficacy, and create a prioritization list. This list then drives the tuning schedule/plan.

Data Acquisition

Three sets of data are pulled for the re-tuning analysis:

1) All historic transaction data since the most recent tuning was implemented.
2) For the rule in question, all historic alerted transactions and the subsequent disposition (escalated cases and SAR) data.
This data can be collected by querying the backend databases of the transaction monitoring system.
3) Relevant customer data elements for the alerted customers, such as entity/consumer, cash intensive business, AI, etc.

Data Analysis

Stratify the data as required (such as grouping like-attributes or ‘non-tunable parameters’ such as entity/consumer, cash intensive businesses, etc.) to account for like-attribute behavior patterns. After stratification, perform a series of data analyses to better understand the data. This
data analysis consists of, but is not limited to, identifying whether suitable transaction codes and details are all available, confirming the completeness and accuracy of the data set, and performing a series of correlation tests to identify whether certain data elements are correlated. This stage helps the institution understand the data specific to its own customers and data set. For example, two data elements may prove to be correlated for one institution and not for another. If two data elements are found to be correlated, it may be in the institution’s best interest (from a resource or time perspective) to run data analyses against those two elements in parallel.

Build Detection Engine

Using the transaction monitoring manual as a guide, recreate the rule in an object-oriented programming language (a statistically driven language is preferable; R, MATLAB or Python recommended) to build an external engine that performs analysis on the rule thresholds. A threshold range is determined for each threshold being tuned, and a matrix is created containing every combination of the possible threshold values. As mentioned above, if two thresholds are found to be directly correlated, anchor them together as one to eliminate unnecessary noise in the permutation matrix.

Determine a de-minimis value to serve as the lowest threshold value in the re-tuning range for that threshold. Professional judgment is used to identify the highest threshold value in the range, but it will typically mirror, in the opposite direction, the delta between the current threshold value and the de-minimis value. For some rules, this permutation matrix can easily contain upwards of a thousand different threshold combinations. A simple example is included below to visualize the permutation matrix discussed above.

Threshold   Current Threshold   Lower Range (de-minimis)   Upper Range
1           10,000              9,500                      10,500
2           4                   2                          6

Figure 1.1 Sample Thresholds and Ranges

Permutation   Threshold 1   Threshold 2
1             9,500         2
2             9,500         4
3             9,500         6
4             10,000        2
5             10,000        4
6             10,000        6
7             10,500        2
8             10,500        4
9             10,500        6

Figure 1.2 Sample Permutation Matrix (of Thresholds and Ranges from Figure 1.1)
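
As an illustration, the permutation matrix in Figure 1.2 can be generated mechanically from the ranges in Figure 1.1. The following is a minimal Python sketch rather than the authors' implementation; the names `threshold_ranges` and `build_permutation_matrix` are hypothetical and chosen only for this example.

```python
from itertools import product

# Candidate values per threshold, taken from Figure 1.1
# (lower range, current value, upper range).
# The dictionary keys are illustrative names, not identifiers from the paper.
threshold_ranges = {
    "threshold_1": [9_500, 10_000, 10_500],
    "threshold_2": [2, 4, 6],
}


def build_permutation_matrix(ranges):
    """Return one dict per combination of candidate threshold values."""
    names = list(ranges)
    combos = product(*(ranges[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


permutation_matrix = build_permutation_matrix(threshold_ranges)

# Print the matrix in the same row order as Figure 1.2.
for number, row in enumerate(permutation_matrix, start=1):
    print(number, row["threshold_1"], row["threshold_2"])
```

Adding a third threshold with three candidate values would grow the matrix from 9 to 27 rows, which is why anchoring correlated thresholds together keeps the matrix manageable.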

Once both the permutation matrix and the rule engine have been built, the transaction data is clustered, and all transactions falling into clusters outside of the threshold ranges are excluded. The two sets of transaction data (the full set of transaction data, and the transaction data related to historic alerts, cases, and SARs) are then run through the rule engine in a loop over all threshold combinations in the permutation matrix.

First, the full set of all transactions is run through the engine to output a count of events, or “alerts”, for each permutation. Before proceeding, identify the threshold combination in the matrix that contains all current thresholds and compare its count against the actual alert count per the historic transaction monitoring system data. This provides a completeness check over the data pull and validates the rule engine’s accuracy. Once confirmed, the second set of transaction data, linked to the historic cases and SARs, is run through the engine and logged as separate event counts in new columns in the matrix, as shown below.

Permutation   Threshold 1   Threshold 2   Transaction Alert – Historic   All Transaction Data – Count   Case Event Count   SAR Event Count
1 (Current)   10,000        4             65                             65                             14                 2
2 (New)       10,000        2             65                             58                             11                 2
3 (New)       10,500        2             65                             51                             10                 1
4 (New)       …             …             …                              …                              …                  …

Figure 1.3 Re-Tuning Permutation Matrix Event Counts

As seen above, the “Transaction Alert – Historic” count and the “All Transaction Data – Count” for permutation 1 (the current thresholds) are equal, which confirms that the rule engine is simulating the rule accurately. In permutation 2, where the thresholds have been adjusted to a new combination, there is a slight decline in the “All Transaction Data – Count”, as expected with the adjusted thresholds. (Note that the “Transaction Alert – Historic” count is anchored at 65, as this logic only produces the alert count at the current thresholds.)
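
The completeness check described above can be sketched in a few lines of Python. This is a hedged illustration, not the authors' engine: it uses ten rows taken from the Appendix sample file, assumes the same rule logic as the Appendix shell code (an event fires when all three attributes meet or exceed their thresholds), and the names `Txn`, `fires`, `count_events` and `run_engine` are hypothetical.

```python
from itertools import product
from typing import NamedTuple, Optional


class Txn(NamedTuple):
    key: str
    alert: Optional[str]
    case: Optional[str]
    sar: Optional[str]
    attr_1: float
    attr_2: float
    attr_3: float


# Ten rows from the Appendix sample file (all seven alerted rows plus three others).
TRANSACTIONS = [
    Txn("TXN001", None, None, None, 6, 0, 70000),
    Txn("TXN003", "ALRT001", None, None, 11, 2, 1300000),
    Txn("TXN008", None, None, None, 2, 1, 9765),
    Txn("TXN010", "ALRT002", "CASE001", "SAR001", 16, 5, 1000001),
    Txn("TXN011", "ALRT003", None, None, 15, 3, 1800765),
    Txn("TXN013", None, None, None, 3, 0, 765889),
    Txn("TXN018", "ALRT004", None, None, 12, 2, 1005678),
    Txn("TXN021", "ALRT005", "CASE002", None, 18, 4, 1009876),
    Txn("TXN023", "ALRT006", None, None, 10, 5, 1200000),
    Txn("TXN024", "ALRT007", "CASE003", "SAR002", 20, 10, 34087264),
]

# Current production thresholds assumed in the Appendix example.
CURRENT = (10, 2, 1_000_000)


def fires(txn, thresholds):
    """Assumed rule logic: every attribute meets or exceeds its threshold."""
    t1, t2, t3 = thresholds
    return txn.attr_1 >= t1 and txn.attr_2 >= t2 and txn.attr_3 >= t3


def count_events(txns, thresholds):
    return sum(1 for txn in txns if fires(txn, thresholds))


def run_engine(txns, permutations):
    """Return one result row per threshold combination."""
    cases = [t for t in txns if t.case]
    sars = [t for t in txns if t.sar]
    historic_alerts = sum(1 for t in txns if t.alert)
    return [
        {
            "thresholds": thresholds,
            "historic_alerts": historic_alerts,
            "all_count": count_events(txns, thresholds),
            "case_count": count_events(cases, thresholds),
            "sar_count": count_events(sars, thresholds),
        }
        for thresholds in permutations
    ]


permutations = list(product([7, 10, 12], [1, 2, 3], [800_000, 1_000_000, 1_200_000]))
results = run_engine(TRANSACTIONS, permutations)

# Completeness check: the simulated count at the current thresholds
# must equal the historic alert count before any other row is trusted.
baseline = next(r for r in results if r["thresholds"] == CURRENT)
if baseline["all_count"] != baseline["historic_alerts"]:
    raise ValueError("engine does not reproduce the historic alert count")
print(baseline)
```

If the check fails, the likely causes are an incomplete data pull or a rule definition that differs from production, and both should be resolved before the case and SAR counts are interpreted.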
It is worth noting that the SAR count of 2 is used as an anchor in the analysis of the results to set the rule threshold or parameter. Best practice holds that recent SARs serve as a benchmark for tuning thresholds and should be heavily considered in the analysis. As seen above in permutation 3, the threshold combination would cause one of the historic SARs to evade detection, and thus this permutation (and any other permutation that does not detect both historic SARs) should be eliminated from consideration for re-tuning. A sample transaction data set and shell code (written in R) for the detection engine discussed above are provided in the Appendix.

Quantitative Analysis

Identify the remaining permutations and focus the analysis on the case and SAR retention proportions (the SAR proportion is usually weighed most heavily in the analysis). Any threshold combinations in the matrix with undesirable SAR and/or case retention ratios are eliminated from the list of possibilities. No one specific line of demarcation is identified at the end of the quantitative
analysis for a re-tuning exercise. Instead, all remaining threshold combinations in the permutation matrix continue through to the qualitative assessment, and the subsequent qualitative analysis is performed to solidify a new proposed line of demarcation.

Qualitative Analysis

During the quantitative analysis, use the historical data to set indicators for Above-the-Line (ATL) and Below-the-Line (BTL) and pull the qualitative samples to be reviewed by the FIU. Samples flagged as ‘ATL’ are essentially pseudo alerts and are treated as such in the FIU’s investigative analysis. BTL samples are included to further validate the threshold line, as the expectation is that less than x% of BTL samples (this percentage will depend on the institution’s risk appetite) would be returned as escalated cases.

Sampling

Determine the appropriate sample size using hypergeometric sampling (binomial sampling without replacement). The number of transactions that fall into the ATL or BTL category determines the number of random samples required for a statistically significant qualitative assessment. A large enough random sample of the same size would have roughly the same chance of producing a similar result. The formula used for determining the sample size follows the sample size example below:

Variable   Value   Description
N          620     Through data segmentation analysis (e.g., clustering, etc.) the BTL population is determined.
CI         1.96    Target confidence level is 95%; in this case the associated factor (“z-value”) is 1.96. In MS Excel this can be calculated using “=NORM.S.INV(1-((1-0.95)/2))”.
Prec       0.05    Precision is set by risk appetite. The smaller the value of this variable, the larger the sample size needs to be.
P          10%     Occurrence rate which needs to be detected.
n          113     Based on the values listed above, n = 113.

\[
n = \frac{\dfrac{CI^{2} \cdot P\,Q}{Prec^{2}}}{1 + \dfrac{1}{N}\left(\dfrac{CI^{2} \cdot P\,Q}{Prec^{2}} - 1\right)}
\]

Legend
N = population size
P = expected occurrence rate of an attribute
Q = 1 - P
Prec = desired precision level
CI = associated factor at a given confidence level
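
The sample size calculation can be checked with a short Python sketch. It reads the formula above as n = n0 / (1 + (n0 - 1) / N) with n0 = CI² · P · Q / Prec², a reading that reproduces the stated n = 113. The function names are hypothetical, and rounding to the nearest integer is an assumption made to match the example; rounding up would be the more conservative choice.

```python
from statistics import NormalDist


def z_value(confidence_level):
    """Two-sided z-value, equivalent to Excel's NORM.S.INV(1-((1-c)/2))."""
    return NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)


def sample_size(population, precision, occurrence_rate, z):
    """Sample size with a finite population adjustment.

    population      -- N, size of the ATL or BTL population
    precision       -- Prec, desired precision level
    occurrence_rate -- P, expected occurrence rate of the attribute
    z               -- CI, the factor associated with the confidence level
    """
    q = 1 - occurrence_rate
    n0 = (z ** 2) * occurrence_rate * q / precision ** 2
    n = n0 / (1 + (n0 - 1) / population)
    # Assumption: round to the nearest integer, which matches n = 113.
    return round(n)


print(round(z_value(0.95), 2))             # 1.96
print(sample_size(620, 0.05, 0.10, 1.96))  # 113
```

Because the adjustment term depends on N, the same precision and occurrence rate require proportionally fewer samples from a small BTL population than from a large one.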

The table below shows how each variable impacts sample size:

N     Prec   P     CI     n
620   0.05   0.1   1.96   113
620   0.03   0.1   1.96   237
620   0.05   0.2   1.96   176
620   0.05   0.1   1.64   84

Figure 1.4 Sample Size

Investigator Analysis

The purpose of generating these samples is for the FIU to qualitatively evaluate the efficacy of the quantitatively calculated thresholds. A group of investigators should be selected for the exercise and randomly assigned pseudo ‘alerts’ to review as if they were authentic alerts from the transaction monitoring system. In theory, if the threshold is appropriately tuned, a transaction marked ‘ATL’ should most likely also be classified as ‘suspicious’ during this qualitative analysis, and sample transactions marked ‘BTL’ should be flagged as not suspicious.

The investigator’s evaluation must consider the intent of each rule. Investigators will generally evaluate each transaction through a lens akin to: “Given what is known from KYC, origin/destination of funds, beneficiary, etc., is it explainable that this consumer/entity would transact this dollar amount at this frequency, velocity, pattern, etc.?” To maintain the integrity of the assessment, the investigator does not base the qualitative assessment only on the value of the flagged transaction, but rather looks holistically at various qualities of the transaction, such as who the transaction is from/to (is it a wire transfer between two branches of the same company, or a similar commodity like computers and semi-conductors), and whether any fields, such as an individual’s last name, contain key words that caused the rule to misinterpret the field and produce a false positive.

Proportion and Efficacy Tests

All threshold combinations need to be reviewed to identify which combination has the best efficacy from both a quantitative and a qualitative perspective.
The outcome of the investigator’s qualitative analysis and the subsequent statistical analysis decide whether the line of demarcation determined during the quantitative analysis remains at the current level or is revised. The risk appetite determines the acceptable magnitude of the proportion defective (proportion of suspicious transactions), also known as the “efficacy rate”. The range of outcomes and the corresponding decisions are listed below.

1. BTL has an acceptable proportion of suspicious transactions, and the ATL proportion is significantly different from (i.e., larger than) the BTL proportion. The threshold remains at the current level: it meaningfully separates the BTL and ATL populations, and the separation is at the “correct” level in terms of the risk appetite.
2. Both BTL and ATL proportions are low. Regardless of the statistical difference between the two populations, if the proportions are low, the threshold most likely needs to become less stringent to reduce the level of false positives.
3. Both BTL and ATL proportions are higher than the acceptable level of suspicious transactions. The threshold needs to become more stringent.

Approval and Implementation

Per the institution’s review and approval process, obtain all necessary approvals from key personnel before making any changes in production. Once all pertinent parties are in agreement, create a functional specification document that includes a brief overview of the rule change, what is currently configured, and the desired configuration changes to be made. It is imperative that the functional specification document is thoroughly vetted and signed off, validating that it provides all necessary and accurate information to make the desired implementation changes.

Authors

Mayank Johri and Erik De Monte both work in the Bank Secrecy Act/Anti-Money Laundering Analytics group at First Republic Bank in San Francisco, California. Their contact information is included below.

Mayank Johri, Vice President Analytics
https://www.linkedin.com/in/johrim

Erik De Monte, Data Scientist
https://www.linkedin.com/in/edemonte

Appendix: Detection Engine Shell Code (R)

Included below are a sample of transaction data and detection engine shell code, written in R, through which the data can be run to illustrate the methodology discussed above. The table below should be saved, with its headers, as a comma-separated (CSV) file named “Transactions.csv”. The R code was built using RStudio Version 0.99.902 and has been commented to guide the user through each step of the methodology.

Sample Transaction File (Save as “Transactions.csv”)

Transaction_Key   Date        Alert_Nbr   Case_Nbr   SAR_Nbr   Attribute_01   Attribute_02   Attribute_03
TXN001            1/1/2016    NULL        NULL       NULL      6              0              70000
TXN002            1/15/2016   NULL        NULL       NULL      1              1              40
TXN003            2/1/2016    ALRT001     NULL       NULL      11             2              1300000
TXN004            2/15/2016   NULL        NULL       NULL      5              1              340
TXN005            3/1/2016    NULL        NULL       NULL      7              0              126
TXN006            3/15/2016   NULL        NULL       NULL      7              0              986
TXN007            4/1/2016    NULL        NULL       NULL      5              0              1400
TXN008            4/15/2016   NULL        NULL       NULL      2              1              9765
TXN009            5/1/2016    NULL        NULL       NULL      3              0              2098
TXN010            5/15/2016   ALRT002     CASE001    SAR001    16             5              1000001
TXN011            6/1/2016    ALRT003     NULL       NULL      15             3              1800765
TXN012            6/15/2016   NULL        NULL       NULL      3              1              65433
TXN013            1/1/2016    NULL        NULL       NULL      3              0              765889
TXN014            1/15/2016   NULL        NULL       NULL      4              1              12
TXN015            2/1/2016    NULL        NULL       NULL      7              1              2345
TXN016            2/15/2016   NULL        NULL       NULL      9              0              97800
TXN017            3/1/2016    NULL        NULL       NULL      6              0              5422
TXN018            3/15/2016   ALRT004     NULL       NULL      12             2              1005678
TXN019            4/1/2016    NULL        NULL       NULL      6              1              9845
TXN020            4/15/2016   NULL        NULL       NULL      3              0              998
TXN021            5/1/2016    ALRT005     CASE002    NULL      18             4              1009876
TXN022            5/15/2016   NULL        NULL       NULL      4              0              12333
TXN023            6/1/2016    ALRT006     NULL       NULL      10             5              1200000
TXN024            6/15/2016   ALRT007     CASE003    SAR002    20             10             34087264

Detection Engine Shell Code (R)

#//////////////////////////////////////////////////////////////////////
# Name: Re-Tuning Permutation Analysis - Example R Script
# Date: October 2016
# Developers: Erik De Monte, Mayank Johri
#//////////////////////////////////////////////////////////////////////
# Assumptions:
#
# i. There are 4 tables of Transactions available to be run through the engine:
#    - All transactions for the date period identified
#    - All transactions related to historic alerts for the date period identified
#    - All transactions related to historic alerts that were escalated to case
#    - All transactions related to historic alerts that were escalated to SAR
#
# ii. The data for the relevant thresholds being re-tuned are available.
#//////////////////////////////////////////////////////////////////////
# 0. Preliminary Procedures
#//////////////////////////////////////////////////////////////////////

# This example uses only functions that ship with R (read.csv, expand.grid,
# subset, cbind), so no additional packages need to be loaded.

# Upload and Format Data Frame
transactions <- read.csv(file = 'Transactions.csv', sep = ',', header = TRUE,
                         stringsAsFactors = FALSE)
transactions[,1] <- as.character(transactions[,1])
transactions[,2] <- as.Date(transactions[,2], format = "%m/%d/%Y")
transactions[,3] <- as.character(transactions[,3])
transactions[,4] <- as.character(transactions[,4])
transactions[,5] <- as.character(transactions[,5])
transactions[,6] <- as.numeric(transactions[,6])
transactions[,7] <- as.numeric(transactions[,7])
transactions[,8] <- as.numeric(transactions[,8])

#//////////////////////////////////////////////////////////////////////
# 1. Create a reference table for the permutation matrix.
#//////////////////////////////////////////////////////////////////////

# 1a. Define Threshold Variables
# For the sake of this example, let us assume that the current thresholds are set at:
# threshold_01 = 10
# threshold_02 = 2
# threshold_03 = 1000000
# To define exact values for a threshold, assign them to a vector ("c")
# To define a sequence of values, use the "seq" function with the syntax:
# threshold = seq(a,b,c) ; Go from a to b in increments of c
threshold_01 = c(7, 10, 12)
threshold_02 = c(1, 2, 3)
threshold_03 = seq(800000, 1200000, 200000)

# 1b. Create the Threshold Table
x_Threshold_Table <- expand.grid(threshold_01, threshold_02, threshold_03)

# 1c. Accurately define the columns in the new table
names(x_Threshold_Table)[names(x_Threshold_Table) == 'Var1'] <- 'Example_Threshold_01'
names(x_Threshold_Table)[names(x_Threshold_Table) == 'Var2'] <- 'Example_Threshold_02'
names(x_Threshold_Table)[names(x_Threshold_Table) == 'Var3'] <- 'Example_Threshold_03'

# 1d. Clean up your environment and remove unnecessary variables.
rm(threshold_01)
rm(threshold_02)
rm(threshold_03)

#//////////////////////////////////////////////////////////////////////
# 2. Loop transactions through each permutation in the Permutation Matrix (x_Threshold_Table)
#    Count the number of events
#//////////////////////////////////////////////////////////////////////
# Count of Transactions - Current Thresholds
#//////////////////////////////////////////////////////////////////////

# 2a. Set the baseline alert count based on current transactions
# For the sake of this example, let us assume that the current thresholds are set at:
# threshold_01 = 10
# threshold_02 = 2
# threshold_03 = 1000000
# In this example, there are 7 historic alerts for the transaction set.
alerts <- subset(transactions, transactions$Alert_Nbr != 'NULL')
alert_count <- as.numeric(length(alerts$Transaction_Key))
x_Final <- data.frame(x_Threshold_Table[1:3], alert_count)
names(x_Final)[names(x_Final) == 'alert_count'] <- 'Transaction Alert - Historic'
rm(alert_count)

#//////////////////////////////////////////////////////////////////////
# Count of Transactions - Permutation Thresholds
#//////////////////////////////////////////////////////////////////////

# 2b. Create a variable which logs the number of events which fit the respective loop
Var_Event <- rep(NA, nrow(x_Threshold_Table))

# 2c. Loop through all threshold permutation combinations and create a subset
#     of the transactions that would alert
# var_index temporarily holds the alerting transactions for the current loop
for (i in 1:nrow(x_Threshold_Table)){
  var_index <- subset(transactions,
                      ( (transactions$Attribute_01 >= x_Threshold_Table$Example_Threshold_01[i])
                      & (transactions$Attribute_02 >= x_Threshold_Table$Example_Threshold_02[i])
                      & (transactions$Attribute_03 >= x_Threshold_Table$Example_Threshold_03[i]) ))
  # Count
  Var_Event[i] <- as.numeric(length(var_index$Transaction_Key))
  rm(var_index)
}
Event_Count = as.matrix(Var_Event)
x_Final = cbind(x_Final, Event_Count)
names(x_Final)[names(x_Final) == 'Event_Count'] <- 'Transaction Data - Count'
rm(Event_Count)
rm(Var_Event)
rm(i)

#//////////////////////////////////////////////////////////////////////
# Count of Historic Case Transactions
#//////////////////////////////////////////////////////////////////////
# Emulate the logic above using only the transactions related to historic cases.
# Append ("cbind") the results to the final permutation table as done above.
# Name it "Case Event Count"
cases <- subset(transactions, transactions$Case_Nbr != 'NULL')
Var_Event <- rep(NA, nrow(x_Threshold_Table))
for (i in 1:nrow(x_Threshold_Table)){
  var_index <- subset(cases,
                      ( (cases$Attribute_01 >= x_Threshold_Table$Example_Threshold_01[i])
                      & (cases$Attribute_02 >= x_Threshold_Table$Example_Threshold_02[i])
                      & (cases$Attribute_03 >= x_Threshold_Table$Example_Threshold_03[i]) ))
  # Count
  Var_Event[i] <- as.numeric(length(var_index$Transaction_Key))
  rm(var_index)
}
Event_Count = as.matrix(Var_Event)
x_Final = cbind(x_Final, Event_Count)
names(x_Final)[names(x_Final) == 'Event_Count'] <- 'Case Event Count'
rm(Event_Count)
rm(Var_Event)
rm(i)

#//////////////////////////////////////////////////////////////////////
# Count of Historic SAR Transactions
#//////////////////////////////////////////////////////////////////////
# Emulate the logic above using only the transactions related to historic SARs.
# Append ("cbind") the results to the final permutation table as done above.
# Name it "SAR Event Count"
sars <- subset(transactions, transactions$SAR_Nbr != 'NULL')
Var_Event <- rep(NA, nrow(x_Threshold_Table))
for (i in 1:nrow(x_Threshold_Table)){
  var_index <- subset(sars,
                      ( (sars$Attribute_01 >= x_Threshold_Table$Example_Threshold_01[i])
                      & (sars$Attribute_02 >= x_Threshold_Table$Example_Threshold_02[i])
                      & (sars$Attribute_03 >= x_Threshold_Table$Example_Threshold_03[i]) ))
  # Count
  Var_Event[i] <- as.numeric(length(var_index$Transaction_Key))
  rm(var_index)
}
Event_Count = as.matrix(Var_Event)
x_Final = cbind(x_Final, Event_Count)
names(x_Final)[names(x_Final) == 'Event_Count'] <- 'SAR Event Count'
rm(Event_Count)
rm(Var_Event)
rm(i)

#//////////////////////////////////////////////////////////////////////
# Anchor your analysis to the number of SARs filed; remove any combinations
# which would have missed a previously filed SAR.
sar_count <- as.numeric(length(sars$Transaction_Key))
x_Final <- subset(x_Final, x_Final$`SAR Event Count` >= sar_count)
rm(sar_count)
#//////////////////////////////////////////////////////////////////FIN.
