Reducing False Positives in BSA/AML Transaction Monitoring
Reducing False Positives:
BSA/AML Transaction Monitoring Tuning Approach
Written by Mayank Johri Ph.D. and Erik De Monte MS
Introduction
Institutions waste millions per year analyzing false positives generated by models with low efficacy.
In an era of heightened regulatory scrutiny coupled with institutions’ desire to control compliance
costs, there is a need for a sound methodology to improve the overall efficacy of alerts. High efficacy
and a sound methodology allow institutions to better channel their time and resources toward truly
suspicious activities and improve the overall quality of a BSA/AML program.
Certain proposed solutions to this problem include automated alert closures, whitelists, etc. These
solutions do not address the underlying cause of false positives and do not represent
sound principles for a robust BSA/AML program.
Replacing the “out-of-the-box” rules of the transaction monitoring software with custom rules that
encapsulate multiple scenarios and use automated learned behavior (based on past dispositions),
customer segmentation, and peer group analysis may help improve the efficacy of the alerts;
however, these rules still have to be tuned to determine the most effective thresholds.
Below is a summary of an approach that can withstand the scrutiny of examiners and fulfill the
desired objective of generating quality alerts.
Approach
Assessment & Prioritization
On a regular basis, evaluate the efficacy of the current suspicious activity detection rules in
production, rank the rules from lowest to highest efficacy, and create a prioritization list.
This list then drives the tuning schedule/plan.
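The prioritization step can be sketched in a few lines of Python (one of the languages recommended later for the detection engine). This is a minimal sketch, not a prescribed method: the paper does not define how efficacy is computed, so the SAR-per-alert ratio used here is an assumption, and the rule names and counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RuleStats:
    rule_id: str
    alerts: int  # alerts generated in the review period
    sars: int    # alerts that ultimately resulted in a SAR

    @property
    def efficacy(self) -> float:
        # Assumed definition: share of alerts that led to a SAR
        # (0.0 when the rule produced no alerts at all).
        return self.sars / self.alerts if self.alerts else 0.0


def prioritize(rules):
    """Order rules from lowest to highest efficacy; tune the first ones first."""
    return sorted(rules, key=lambda rule: rule.efficacy)


# Hypothetical figures for illustration only.
rules = [
    RuleStats("RULE_C", alerts=150, sars=15),
    RuleStats("RULE_A", alerts=400, sars=12),
    RuleStats("RULE_B", alerts=65, sars=2),
]
tuning_order = [rule.rule_id for rule in prioritize(rules)]
```

An institution that weighs escalated cases as well as SARs could substitute its own efficacy measure in the `efficacy` property without changing the ordering logic.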
Data Acquisition
Three sets of data are pulled for the re-tuning analysis:
1) All historic transaction data since the most recent tuning was implemented;
2) For the rule in question, all historic alerted transactions and subsequent disposition
(escalated cases and SAR) data. This data can be collected by querying the backend
databases of the transaction monitoring system;
3) Various relevant customer data elements, such as entity/consumer, cash-intensive business,
AI, etc., for the customers alerted.
Data Analysis
Stratify the data as required (such as grouping like attributes or ‘non-tunable parameters’ such as
entity/consumer, cash-intensive businesses, etc.) to account for like-attribute behavior patterns.
Subsequent to stratification, perform a series of data analyses to better understand the data. This
data analysis consists of, but is not limited to, identifying whether suitable transaction codes and details are
all available, confirming the completeness and accuracy of the data set, and performing a series of
correlation tests to identify whether certain data elements are correlated. This stage helps the institution
understand the data specific to its customers and data set. For example, two data elements may
prove to be correlated for one institution and not for another.
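As an illustration of the correlation tests mentioned above, the following Python sketch computes a Pearson correlation coefficient between two data elements. The choice of Pearson correlation is an assumption (the paper does not name a specific test), and the two example series are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data elements for one customer segment:
# monthly cash deposit total and monthly cash deposit count.
monthly_cash_total = [12000, 18500, 9000, 22000, 15000]
monthly_cash_count = [3, 5, 2, 6, 4]

r = pearson(monthly_cash_total, monthly_cash_count)
```

A coefficient close to 1 or -1 would suggest that the two elements move together for this data set, which is the situation in which two thresholds might later be tuned as a pair.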
Build Detection Engine
Using the transaction monitoring manual as a guide, recreate the rule in an object-oriented
programming language (a statistically driven language is preferable; R, MATLAB, or Python are
recommended) to build an external engine that performs analysis on the rule thresholds.
A threshold range is determined for each threshold being tuned, and a matrix is created containing all
combinations of the possible threshold values. As mentioned above, in the event
that two thresholds are discovered to be directly correlated, anchor these two thresholds
together as one to eliminate unnecessary noise in the permutation matrix.
Determine a de-minimis value to serve as the lowest threshold value in the re-tuning range for that
threshold. Professional judgment is used to identify the highest threshold value in the re-tuning
range, but it will typically mirror, in the opposite direction, the delta between the current threshold
value and the de-minimis threshold value.
For some rules, it is expected that this permutation matrix can easily create upwards of a thousand
different threshold combinations. A simple example is included below to visualize the permutation
matrix discussed above.
Threshold   Current Threshold   Lower Range (de-minimis)   Upper Range
1           10,000              9,500                      10,500
2           4                   2                          6
Figure 1.1 Sample Thresholds and Ranges
Permutation   Threshold 1   Threshold 2
1             9,500         2
2             9,500         4
3             9,500         6
4             10,000        2
5             10,000        4
6             10,000        6
7             10,500        2
8             10,500        4
9             10,500        6
Figure 1.2 Sample Permutation Matrix
(of Thresholds and Ranges from Figure 1.1)
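The ranges in Figure 1.1 and the matrix in Figure 1.2 can be reproduced with a short Python sketch. The helper name `mirrored_range` is hypothetical; it encodes the typical case described above, in which the upper bound mirrors the delta between the current value and the de-minimis value.

```python
from itertools import product


def mirrored_range(current, de_minimis):
    """Return (lower, current, upper), where upper mirrors the lower delta."""
    delta = current - de_minimis
    return (de_minimis, current, current + delta)


# Values taken from Figure 1.1.
threshold_1 = mirrored_range(10_000, 9_500)  # (9500, 10000, 10500)
threshold_2 = mirrored_range(4, 2)           # (2, 4, 6)

# Every combination of the candidate values, numbered as in Figure 1.2.
permutations = [
    {"permutation": number, "threshold_1": t1, "threshold_2": t2}
    for number, (t1, t2) in enumerate(product(threshold_1, threshold_2), start=1)
]
```

With three candidate values per threshold the matrix has 3 x 3 = 9 rows; adding thresholds or finer steps multiplies the row count, which is how a rule can quickly reach a thousand or more combinations.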
Once both the permutation matrix and the rule engine have been built, the transaction data is
clustered, and all transactions falling into clusters outside of the threshold ranges are excluded.
The two sets of transaction data (the full set of transaction data and the transaction data related to
historic alerts, cases, and SARs) are then run through the rule engine in a loop over all threshold
combinations in the permutation matrix.
First, the full set of all transactions is run through the engine to output a count of events, or
“alerts”, for each permutation. Before proceeding, identify the threshold combination
in the matrix which contains all current thresholds and compare its count against the actual alert
count per the historic transaction monitoring system data. This provides a check for completeness
of the data pull and validates the rule engine’s accuracy.
Once confirmed, the second set of transaction data, linked to the historic cases and SARs, is run
through the engine and logged as separate event counts in new columns in the matrix, as shown
below.
Permutation   Threshold 1   Threshold 2   Transaction Alert   All Transaction   Case Event   SAR Event
                                          – Historic          Data – Count      Count        Count
1 (Current)   10,000        4             65                  65                14           2
2 (New)       10,000        2             65                  58                11           2
3 (New)       10,500        2             65                  51                10           1
4 (New)       …             …             …                   …                 …            …
Figure 1.3 Re-Tuning Permutation Matrix Event Counts
As seen above, the “Transaction Alert – Historic” count and the “All Transaction Data – Count” for
permutation 1 (the current thresholds) are equal, which confirms that the rule engine is simulating
the rule accurately. In permutation 2, where the thresholds have been adjusted to a new combination,
there is a slight decline in the “All Transaction Data – Count”, as expected with the adjusted
thresholds. (Note that the “Transaction Alert – Historic” count stays anchored at 65, as this logic only
produces the alert count at the current thresholds.)
It is worth noting that the SAR count of 2 is used as an anchor in the analysis of the
results when setting the rule threshold or parameter. Best practice dictates that recent SARs serve as a
benchmark for tuning thresholds and should be heavily considered in the analysis. As seen above in
permutation 3, the threshold combination would cause one of the historic SARs to evade detection,
and thus this permutation (and any other permutation which does not detect both historic SARs)
should be eliminated from consideration for re-tuning.
A sample transaction data set and shell code (written in R) for the detection engine discussed above
are provided in the Appendix.
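The loop, the baseline check, and the SAR anchor described in this section can also be sketched compactly in Python. This is an illustrative sketch only: the transaction records, field names, and two-threshold rule are hypothetical and do not reflect any vendor schema, and the clustering/exclusion step is omitted for brevity.

```python
from itertools import product

# Hypothetical transactions; "alerted", "case" and "sar" carry the historic disposition.
transactions = [
    {"key": "T1", "amount": 10_800, "count": 5, "alerted": True, "case": True, "sar": True},
    {"key": "T2", "amount": 10_200, "count": 4, "alerted": True, "case": True, "sar": False},
    {"key": "T3", "amount": 10_050, "count": 4, "alerted": True, "case": False, "sar": False},
    {"key": "T4", "amount": 9_700, "count": 3, "alerted": False, "case": False, "sar": False},
    {"key": "T5", "amount": 10_600, "count": 2, "alerted": False, "case": False, "sar": False},
    {"key": "T6", "amount": 10_100, "count": 6, "alerted": True, "case": True, "sar": True},
]
CURRENT_THRESHOLDS = (10_000, 4)


def count_events(rows, amount_min, count_min):
    """Count the rows that would alert under the given threshold combination."""
    return sum(1 for r in rows if r["amount"] >= amount_min and r["count"] >= count_min)


cases = [r for r in transactions if r["case"]]
sars = [r for r in transactions if r["sar"]]
historic_alerts = sum(1 for r in transactions if r["alerted"])

# Run all three data sets through the engine for every threshold combination.
matrix = []
for amount_min, count_min in product((9_500, 10_000, 10_500), (2, 4, 6)):
    matrix.append({
        "thresholds": (amount_min, count_min),
        "all_count": count_events(transactions, amount_min, count_min),
        "case_count": count_events(cases, amount_min, count_min),
        "sar_count": count_events(sars, amount_min, count_min),
    })

# Completeness/accuracy check: the current thresholds must reproduce the historic alerts.
baseline = next(row for row in matrix if row["thresholds"] == CURRENT_THRESHOLDS)
if baseline["all_count"] != historic_alerts:
    raise ValueError("engine does not reproduce the historic alert count")

# SAR anchor: drop every combination that would have missed a historic SAR.
retained = [row for row in matrix if row["sar_count"] >= len(sars)]
```

In this toy data set only four of the nine combinations still detect both historic SARs, so only those four would move on to the quantitative analysis.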
Quantitative Analysis
Identify the remaining permutation combinations and focus the analysis on the case and SAR
retention proportions (the SAR proportion is usually weighted most heavily in the analysis). Any threshold
combinations in the matrix with undesirable SAR and/or case retention ratios are eliminated from
the list of possibilities. No one specific line of demarcation is identified at the end of the quantitative
analysis for a re-tuning exercise. Instead, all remaining threshold combinations in the permutation
matrix continue through to the qualitative assessment, where subsequent qualitative analysis is
performed to solidify a new proposed line of demarcation.
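To make the retention proportions concrete, the Python sketch below applies them to the counts shown in Figure 1.3. The ratio definitions (candidate count divided by the count at the current thresholds) are an assumption about how retention is measured; the paper does not spell out a formula.

```python
# Event counts from Figure 1.3.
baseline = {"alerts": 65, "cases": 14, "sars": 2}  # permutation 1 (current)
candidates = {
    2: {"alerts": 58, "cases": 11, "sars": 2},
    3: {"alerts": 51, "cases": 10, "sars": 1},
}


def retention(candidate, reference):
    """Alert reduction and case/SAR retention relative to the current thresholds."""
    return {
        "alert_reduction": 1 - candidate["alerts"] / reference["alerts"],
        "case_retention": candidate["cases"] / reference["cases"],
        "sar_retention": candidate["sars"] / reference["sars"],
    }


results = {number: retention(counts, baseline) for number, counts in candidates.items()}
```

Under these assumed definitions, permutation 2 removes roughly 11% of alerts while keeping about 79% of cases and all SARs, whereas permutation 3 removes about 22% of alerts but keeps only half of the SARs.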
Qualitative Analysis
Using the historical data from the quantitative analysis, set indicators for Above-the-
Line (ATL) and Below-the-Line (BTL) and pull the qualitative samples to be reviewed by the FIU.
Samples flagged as ‘ATL’ are essentially pseudo alerts and are treated as such in the
FIU’s investigative analysis. BTL samples are included to further validate the threshold
line, as the expectation is that fewer than x% of BTL samples (this percentage will depend on the
institution’s risk appetite) would return as escalated cases.
Sampling
Determine the appropriate sample size for random sampling without replacement (hypergeometric
sampling, i.e., a binomial sample size adjusted for the finite population). The number of transactions
which fall into the ATL or BTL category will determine the number of random samples required for a
statistically significant qualitative assessment. Any other random sample of the same size would have
roughly the same chance of producing a similar result. Below is the formula to be used for determining
sample size.
$$
n = \frac{\dfrac{CI^{2} \, P \, Q}{Prec^{2}}}{1 + \dfrac{1}{N}\left(\dfrac{CI^{2} \, P \, Q}{Prec^{2}} - 1\right)}
$$
Legend
N = population size
P = expected occurrence rate of an attribute
Q = 1 - P
Prec = desired precision level
CI = associated factor at a given confidence level
n = required sample size
Included below is a sample size example:
N      620    The BTL population is determined through data segmentation analysis (e.g., clustering, etc.).
CI     1.96   The target confidence level is 95% (a significance level of 5%); in this case the associated
              factor (“z-value”) is 1.96.
              In MS Excel this can be calculated using “=NORM.S.INV(1-((1-0.95)/2))”.
Prec   0.05   Precision is set by risk appetite.
              The smaller the value of this variable, the larger the sample size needs to be.
P      10%    Occurrence rate which needs to be detected.
n      113    Based on the values listed above, n = 113.
The table below shows how each variable impacts sample size:
N     Prec   P     CI     n
620   0.05   0.1   1.96   113
620   0.03   0.1   1.96   237
620   0.05   0.2   1.96   176
620   0.05   0.1   1.64   135
Figure 1.4 Sample Size
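The sample size calculation can be checked with a short Python sketch. It assumes the formula is the binomial sample size CI² · P · Q / Prec² with a finite population correction, and that results are rounded to the nearest whole number; the function name and the BTL identifiers are hypothetical. Evaluated this way, the formula reproduces the first three rows of Figure 1.4 (113, 237, 176); for the fourth row (CI = 1.64) it yields about 84 rather than 135, so that row should be re-checked before it is relied upon.

```python
import random


def sample_size(population, precision, p, z):
    """Sample size for estimating a proportion in a finite population."""
    q = 1 - p
    n0 = (z ** 2) * p * q / precision ** 2  # size for an unlimited population
    n = n0 / (1 + (n0 - 1) / population)    # finite population correction
    return round(n)


# (N, Prec, P, CI) for the first three rows of Figure 1.4.
rows = [(620, 0.05, 0.10, 1.96), (620, 0.03, 0.10, 1.96), (620, 0.05, 0.20, 1.96)]
sizes = [sample_size(*row) for row in rows]

# Draw the BTL sample without replacement from hypothetical transaction IDs.
rng = random.Random(2016)  # fixed seed so that the draw can be reproduced
btl_population = [f"BTL-{i:04d}" for i in range(1, 621)]
btl_sample = rng.sample(btl_population, sizes[0])
```

Fixing the random seed is a design choice rather than a requirement: it lets a reviewer or examiner regenerate exactly the same sample later.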
Investigator Analysis
The purpose of generating these samples is for the FIU to qualitatively evaluate the efficacy of the
quantitatively calculated thresholds. A group of investigators should be selected for the exercise and
randomly assigned pseudo ‘alerts’ to review as if they were authentic alerts from the transaction
monitoring system. In theory, if the threshold is appropriately tuned, then a transaction marked
‘ATL’ should most likely also be classified as ‘suspicious’ during this qualitative analysis, and all
sample transactions that are marked ‘BTL’ should be flagged as not suspicious.
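The random assignment of pseudo alerts to investigators might look like the following Python sketch. The round-robin allocation and the function name are assumptions made for illustration; the paper only requires that the assignment be random.

```python
import random


def assign_samples(sample_ids, investigators, seed=None):
    """Shuffle the pseudo alerts and deal them out evenly across investigators."""
    if not investigators:
        raise ValueError("at least one investigator is required")
    rng = random.Random(seed)
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    assignments = {name: [] for name in investigators}
    for position, sample_id in enumerate(shuffled):
        assignments[investigators[position % len(investigators)]].append(sample_id)
    return assignments


# Hypothetical sample IDs and investigator names.
pseudo_alerts = [f"S-{i:03d}" for i in range(1, 11)]
workload = assign_samples(pseudo_alerts, ["inv_a", "inv_b", "inv_c"], seed=7)
```

Keeping the ATL/BTL label out of the identifiers handed to investigators, as done here, is one way to ensure each sample is reviewed as if it were an authentic alert.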
The investigators’ evaluation must consider the intent of each rule, and they will
generally evaluate each transaction through a lens akin to “Given what is known from KYC,
origin/destination of funds, beneficiary, etcetera, is it explainable that this consumer/entity would
transact this dollar amount at this frequency, velocity, pattern, etc.?” To maintain the integrity of
this assessment, the investigator does not base the qualitative assessment only on the value
of the flagged transaction, but rather looks holistically at various qualities of the transaction, such as
who the transaction is from/to (for example, whether it is a wire transfer between two branches of the
same company, or between parties dealing in a similar commodity such as computers and
semiconductors), and whether any fields, such as an individual’s last name, contain key words which
caused the rule to misinterpret the field and produce a false positive.
Proportion and Efficacy Tests
All threshold combinations need to be reviewed to identify which combination has the best
efficacy from both a quantitative and a qualitative perspective.
The outcome of the investigators’ qualitative analysis and the subsequent statistical analysis decide whether
the line of demarcation determined during the quantitative analysis remains at the current level or is
revised. The risk appetite determines the acceptable magnitude of the proportion defective (the proportion
of suspicious transactions), also known as the “efficacy rate”. The range of outcomes and the
corresponding decisions are listed below.
1. BTL has an acceptable proportion of suspicious transactions and the ATL proportion is
significantly different from (i.e., larger than) the BTL proportion. The threshold remains at the current
level: it meaningfully separates the BTL and ATL populations and the separation is
at the “correct” level (in terms of the risk appetite).
2. Both BTL and ATL proportions are low. Regardless of the statistical difference between the
two populations, if the proportions are low, most likely the threshold needs to become less
stringent (that is, alert on fewer transactions) to reduce the number of false positives.
3. Both BTL and ATL proportions are higher than the acceptable level of suspicious
transactions. The threshold needs to become more stringent (that is, alert on more transactions).
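The three outcomes above can be expressed as a small decision routine. The Python sketch below is one possible reading, not the authors' stated procedure: the paper does not name the statistical test, so a one-sided two-proportion z-test is assumed, and the cut-offs `acceptable` and `low` as well as the review counts are hypothetical stand-ins for the institution's risk appetite.

```python
import math


def two_proportion_z(x_atl, n_atl, x_btl, n_btl):
    """One-sided z-test of H1: ATL suspicious proportion > BTL suspicious proportion."""
    p_atl, p_btl = x_atl / n_atl, x_btl / n_btl
    pooled = (x_atl + x_btl) / (n_atl + n_btl)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_atl + 1 / n_btl))
    if se == 0:
        raise ValueError("test is undefined when no or all samples are suspicious")
    z = (p_atl - p_btl) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability P(Z >= z)
    return z, p_value


def decide(p_atl, p_btl, p_value, acceptable, low, alpha=0.05):
    """Map the ATL/BTL proportions to one of the outcomes listed above."""
    if p_atl > acceptable and p_btl > acceptable:
        return "more stringent"   # outcome 3
    if p_atl < low and p_btl < low:
        return "less stringent"   # outcome 2
    if p_btl <= acceptable and p_atl > p_btl and p_value < alpha:
        return "keep"             # outcome 1
    return "review"               # combination not covered by the three outcomes


# Hypothetical review results: 30 of 113 ATL and 4 of 113 BTL samples found suspicious.
z, p_value = two_proportion_z(30, 113, 4, 113)
decision = decide(30 / 113, 4 / 113, p_value, acceptable=0.05, low=0.02)
```

The "review" branch is an addition for completeness: it catches combinations of proportions that the three listed outcomes do not address, so that they are escalated for judgment rather than silently classified.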
Approval and Implementation
Per the institution’s review and approval process, receive all necessary approvals from key personnel
prior to making any changes in production. Once all pertinent parties are in agreement, create a
functional specification document which includes a brief overview of the rule change, what is
currently configured, and the desired configuration changes to be made. It is imperative that the
functional specification document be thoroughly vetted and signed off, validating that the document
provides all necessary and accurate information to make the desired implementation changes.
Authors
Mayank Johri and Erik De Monte both work in the BSA/AML Analytics group at First Republic
Bank in San Francisco, California. Their contact information is included below.
Mayank Johri, Vice President Analytics
https://www.linkedin.com/in/johrim
Erik De Monte, Data Scientist
https://www.linkedin.com/in/edemonte
Appendix: Detection Engine Shell Code (R)
Included below is a sample of transaction data and a detection engine shell code written in R that
the data can be run through to depict the methodology discussed above. Please note that the table
below should be saved as a comma-separated file (CSV) with the headers included as
“Transactions.csv”.
The R code was built using RStudio Version 0.99.902 and has been commented to navigate the user
through each step of the methodology.
Detection Engine Shell Code (R)
#//////////////////////////////////////////////////////////////////////
# Name: Re-Tuning Permutation Analysis - Example R Script
# Date: October 2016
# Developers: Erik De Monte, Mayank Johri
#//////////////////////////////////////////////////////////////////////
# Assumptions:
#
# i. There are 4 tables of Transactions available to be run through the engine:
# - All transactions for the date period identified
# - All transactions related to historic alerts for the date period identified
# - All transactions related to historic alerts that were escalated to case
# - All transactions related to historic alerts that were escalated to SAR
#
# ii. The data available for the relevant thresholds being re-tuned are available.
#//////////////////////////////////////////////////////////////////////
#0. Preliminary Procedures
#//////////////////////////////////////////////////////////////////////
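# Load relevant preinstalled R Packages
# Note: the shell code below only calls base R functions (read.csv, expand.grid, subset, cbind);
# the additional packages are loaded for use when the engine is extended.
library(cluster)
library(doBy)
library(lubridate)
library(RODBC)
library(reshape)
library(dplyr)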
# Upload and Format Data Frame
transactions <- read.csv(file='Transactions.csv', sep=',', header=TRUE, stringsAsFactors = FALSE)
transactions[,1] <- as.character(transactions[,1])
transactions[,2] <- as.Date(transactions[,2], format = "%m/%d/%Y")
transactions[,3] <- as.character(transactions[,3])
transactions[,4] <- as.character(transactions[,4])
transactions[,5] <- as.character(transactions[,5])
transactions[,6] <- as.numeric(transactions[,6])
transactions[,7] <- as.numeric(transactions[,7])
transactions[,8] <- as.numeric(transactions[,8])
#//////////////////////////////////////////////////////////////////////
# 1. Create a reference table for permutation matrix.
#//////////////////////////////////////////////////////////////////////
# 1a. Define Threshold Variables
# For the sake of this example, let us assume that the current thresholds are set at:
# threshold_01 = 10
# threshold_02 = 2
# threshold_03 = 1000000
# To define exact values to a threshold, assign it to a vector ("c")
# To define a sequence of values, use the "seq" function under the syntax:
# threshold = seq(a,b,c) ; Go from a to b in increments of c
threshold_01 = c(7, 10, 12)
threshold_02 = c(1,2,3)
threshold_03 = seq(800000,1200000,200000)
# 1b. Create the Threshold Table
x_Threshold_Table <- expand.grid(threshold_01,threshold_02,threshold_03)
# 1c. Accurately define the columns in the new table
names(x_Threshold_Table)[names(x_Threshold_Table) == 'Var1'] <- 'Example_Threshold_01'
names(x_Threshold_Table)[names(x_Threshold_Table) == 'Var2'] <- 'Example_Threshold_02'
names(x_Threshold_Table)[names(x_Threshold_Table) == 'Var3'] <- 'Example_Threshold_03'
# 1d. Clean up your Environment and remove unnecessary variables.
rm(threshold_01)
rm(threshold_02)
rm(threshold_03)
#//////////////////////////////////////////////////////////////////////
# 2. Loop transactions through each permutation in the Permutation Matrix (x_Threshold_Table)
# Count the number of events
#//////////////////////////////////////////////////////////////////////
# Count of Transactions - Current Thresholds
#//////////////////////////////////////////////////////////////////////
# 2a. Set the baseline alert count based on current transactions
# For the sake of this example, let us assume that the current thresholds are set at:
# threshold_01 = 10
# threshold_02 = 2
# threshold_03 = 1000000
# In this example, there are 7 historic alerts for the transaction set.
alerts <- subset(transactions, transactions$Alert_Nbr != 'NULL')
alert_count <- as.numeric(length(alerts$Transaction_Key))
x_Final <- data.frame(x_Threshold_Table[1:3], alert_count)
names(x_Final)[names(x_Final) == 'alert_count'] <- 'Transaction Alert - Historic'
rm(alert_count)
#//////////////////////////////////////////////////////////////////////
# Count of Transactions - Permutation Thresholds
#//////////////////////////////////////////////////////////////////////
# 2b. Create a variable which logs the number of events which fit the respective loop
Var_Event <- rep(NA,nrow(x_Threshold_Table))
# 2c. Loop through all threshold permutation combinations and create a subset of the
# transactions that would alert
# var_index is used to temporarily hold the subset of alerting transactions between loops
for (i in 1:nrow(x_Threshold_Table)){
  var_index <- subset(transactions, (
    (transactions$Attribute_01 >= x_Threshold_Table$Example_Threshold_01[i])
    & (transactions$Attribute_02 >= x_Threshold_Table$Example_Threshold_02[i])
    & (transactions$Attribute_03 >= x_Threshold_Table$Example_Threshold_03[i])
  ))
  # Count
  Var_Event[i] <- as.numeric(length(var_index$Transaction_Key))
  rm(var_index)
}
Event_Count=as.matrix(Var_Event)
x_Final = cbind(x_Final, Event_Count)
names(x_Final)[names(x_Final) == 'Event_Count'] <- 'Transaction Data - Count'
rm(Event_Count)
rm(Var_Event)
rm(i)
#//////////////////////////////////////////////////////////////////////
# Count of Historic Case Transactions
#//////////////////////////////////////////////////////////////////////
# Emulate the logic above using only the transactions related to historic cases.
# Append ("cbind") the results to the final permutation table as done above.
# Name it "Case Event Count"
cases <- subset(transactions, transactions$Case_Nbr != 'NULL')
Var_Event <- rep(NA,nrow(x_Threshold_Table))
for (i in 1:nrow(x_Threshold_Table)){
  var_index <- subset(cases, (
    (cases$Attribute_01 >= x_Threshold_Table$Example_Threshold_01[i])
    & (cases$Attribute_02 >= x_Threshold_Table$Example_Threshold_02[i])
    & (cases$Attribute_03 >= x_Threshold_Table$Example_Threshold_03[i])
  ))
  # Count
  Var_Event[i] <- as.numeric(length(var_index$Transaction_Key))
  rm(var_index)
}
Event_Count=as.matrix(Var_Event)
x_Final = cbind(x_Final, Event_Count)
names(x_Final)[names(x_Final) == 'Event_Count'] <- 'Case Event Count'
rm(Event_Count)
rm(Var_Event)
rm(i)
#//////////////////////////////////////////////////////////////////////
# Count of Historic SAR Transactions
#//////////////////////////////////////////////////////////////////////
# Emulate the logic above using only the transactions related to historic SARs.
# Append ("cbind") the results to the final permutation table as done above.
# Name it "SAR Event Count"
sars <- subset(transactions, transactions$SAR_Nbr != 'NULL')
Var_Event <- rep(NA,nrow(x_Threshold_Table))
for (i in 1:nrow(x_Threshold_Table)){
  var_index <- subset(sars, (
    (sars$Attribute_01 >= x_Threshold_Table$Example_Threshold_01[i])
    & (sars$Attribute_02 >= x_Threshold_Table$Example_Threshold_02[i])
    & (sars$Attribute_03 >= x_Threshold_Table$Example_Threshold_03[i])
  ))
  # Count
  Var_Event[i] <- as.numeric(length(var_index$Transaction_Key))
  rm(var_index)
}
Event_Count=as.matrix(Var_Event)
x_Final = cbind(x_Final, Event_Count)
names(x_Final)[names(x_Final) == 'Event_Count'] <- 'SAR Event Count'
rm(Event_Count)
rm(Var_Event)
rm(i)
#//////////////////////////////////////////////////////////////////////
# Anchor your analysis to the number of SARs filed, remove any combinations which would have
# missed a prior filed SAR.
sar_count <- as.numeric(length(sars$Transaction_Key))
x_Final <- subset(x_Final, x_Final$`SAR Event Count` >= sar_count)
rm(sar_count)
#//////////////////////////////////////////////////////////////////////
#//////////////////////////////////////////////////////////////////////
#//////////////////////////////////////////////////////////////////////
#//////////////////////////////////////////////////////////////////FIN.