Reducing False Positives:
BSA AML Transaction Monitoring Tuning Approach
Written by Mayank Johri, Ph.D., and Erik De Monte, MS
Introduction
Institutions waste millions of dollars per year analyzing false positives generated by low-efficacy models. In an era of heightened regulatory scrutiny, coupled with institutions’ desire to control compliance costs, there is a need for a sound methodology to improve the overall efficacy of alerts. High efficacy and sound methodology allow institutions to better channel their time and resources toward truly suspicious activities and improve the overall quality of a BSA/AML program.
Certain proposed solutions to this problem include automated alert closures, whitelists, etc. These shortcuts do not genuinely address the false-positive problem and do not represent sound principles for a robust BSA/AML program.
Instead of using “out-of-the-box” rules from the transaction monitoring software, custom rules that encapsulate multiple scenarios, combined with automated learned behavior (based on past dispositions), customer segmentation, and peer group analysis, may improve the efficacy of alerts; however, these rules still have to be tuned to determine the most effective thresholds.
Below is a summary of the steps in an approach that can stand the scrutiny of examiners and fulfil the desired objective of generating quality alerts.
Approach
Assessment & Prioritization
On a regular basis, evaluate the efficacy of the current suspicious activity detection rules in production, rank the rules from lowest to highest efficacy, and create a prioritization list. This list then drives the tuning schedule/plan. A minimal sketch of such a ranking is shown below.
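The sketch below ranks rules by a simple efficacy metric (SARs filed per alert); the rule_stats data frame and its values are hypothetical placeholders, not part of the Appendix data.

# Hypothetical per-rule summary: alerts generated and SARs ultimately filed
rule_stats <- data.frame(
  rule        = c("RULE_A", "RULE_B", "RULE_C"),
  alert_count = c(1200, 450, 300),
  sar_count   = c(3, 9, 1)
)
# Efficacy as SAR yield per alert; lowest-efficacy rules are tuned first
rule_stats$efficacy <- rule_stats$sar_count / rule_stats$alert_count
rule_stats[order(rule_stats$efficacy), ]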
Data Acquisition
Three sets of data are pulled for the re-tuning analysis:
1) All historic transaction data since the most recent tuning was implemented;
2) For the rule in question, all historic alerted transactions and their subsequent disposition (escalated cases and SARs). This data can be collected by querying the backend databases of the transaction monitoring system (see the sketch after this list);
3) Various relevant customer data elements (e.g., entity/consumer, cash intensive business, AI, etc.) on the customers alerted.
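As an illustration of item 2, the alerted transactions and dispositions might be pulled via RODBC; the data source name, table names, and column names below are hypothetical and must be adapted to the institution's schema.

library(RODBC)
# Hypothetical ODBC data source and schema; adapt to the monitoring system
ch <- odbcConnect("TM_SYSTEM_DSN")
alerted_txns <- sqlQuery(ch, "
  SELECT t.Transaction_Key, t.Tran_Date, a.Alert_Nbr, a.Case_Nbr, a.SAR_Nbr
  FROM   Transactions t
  JOIN   Alerts a ON a.Transaction_Key = t.Transaction_Key
  WHERE  t.Tran_Date >= '2016-01-01'  -- since the most recent tuning
")
odbcClose(ch)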
Data Analysis
Stratify the data as required (such as grouping like attributes, or ‘non-tunable parameters’, such as entity/consumer, cash intensive businesses, etc.) to account for like-attribute behavior patterns.
Subsequent to stratification, perform a series of data analyses to better understand the data. This data analysis consists of, but is not limited to, identifying whether suitable transaction codes and details are all available, confirming the completeness and accuracy of the data set, and performing a series of correlation tests to identify whether certain data elements are correlated. This stage helps the institution understand the data specific to its clients and data set. For example, two data elements may prove to be correlated at one institution and not at another.
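As a simple illustration using the sample “Transactions.csv” from the Appendix (loaded as transactions), pairwise correlations between the tunable attributes can be checked with cor():

# Rank correlation between the three sample attributes; strongly
# correlated pairs are candidates for anchoring together during tuning
cor(transactions[, c("Attribute_01", "Attribute_02", "Attribute_03")],
    method = "spearman")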
Build Detection Engine
Using the transaction monitoring manual as a guide, recreate the rule in an object-oriented programming language (a statistically driven language is preferable; R, MATLAB, or Python are recommended) to build an external engine for analyzing the rule thresholds.
A threshold range is determined for each threshold being tuned, and a matrix is created for all combinations of the different possible threshold values. As mentioned above, in the event that two thresholds are discovered to be directly correlated, anchor these two thresholds together as one to eliminate unnecessary noise in the permutation matrix.
Determine a de-minimis value to serve as the lowest threshold value in the re-tuning range for each threshold. Professional judgment is used to identify the highest threshold value in the re-tuning range, which will typically mirror, in the opposite direction, the delta between the current threshold value and the de-minimis threshold value.
For some rules, this permutation matrix can easily produce upwards of a thousand different threshold combinations. A simple example is included below to visualize the permutation matrix.
Threshold   Current Threshold   Lower Range (de-minimis)   Upper Range
1           10,000              9,500                      10,500
2           4                   2                          6
Figure 1.1 Sample Thresholds and Ranges
Permutation Threshold 1 Threshold 2
1 9,500 2
2 9,500 4
3 9,500 6
4 10,000 2
5 10,000 4
6 10,000 6
7 10,500 2
8 10,500 4
9 10,500 6
Figure 1.2 Sample Permutation Matrix
(of Thresholds and Ranges from Figure 1.1)
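In R, the permutation matrix of Figure 1.2 can be generated with expand.grid, the same approach used in the Appendix (the row ordering may differ from the figure):

threshold_1 <- c(9500, 10000, 10500)  # lower range, current, upper range
threshold_2 <- c(2, 4, 6)
perm_matrix <- expand.grid(Threshold_1 = threshold_1,
                           Threshold_2 = threshold_2)
nrow(perm_matrix)  # 9 combinations, matching Figure 1.2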
Once both the permutation matrix and the rule engine have been built, the transaction data is clustered, and all transactions falling into clusters outside of the threshold ranges are excluded. The two sets of transaction data (the full set of transactions and the transactions related to historic alerts, cases, and SARs) are then run through the rule engine against a loop of all threshold combinations in the permutation matrix.
The full set of transactions is run through the engine first to output a count of events, or “alerts”, for each permutation combination. Before proceeding, identify the threshold combination in the matrix which contains all current thresholds and compare its count against the actual alert count per the historic transaction monitoring system data. This provides a check for completeness over the data pull and validates the rule engine’s accuracy.
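A minimal sketch of this completeness check, assuming x_Final has been built as in the Appendix (with the sample data, both counts equal 7):

# Locate the permutation that matches the current production thresholds
current <- subset(x_Final,
                  Example_Threshold_01 == 10 &
                  Example_Threshold_02 == 2 &
                  Example_Threshold_03 == 1000000)
# The simulated count must equal the historic alert count
stopifnot(current$`Transaction Data - Count` ==
          current$`Transaction Alert - Historic`)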
Once confirmed, the second set of transaction data, linked to the historic cases and SARs, is run through the engine and logged as separate event counts in new columns in the matrix, as shown below.
Permutation   Threshold 1   Threshold 2   Transaction Alert – Historic   All Transaction Data – Count   Case Event Count   SAR Event Count
1 (Current)   10,000        4             65                             65                             14                 2
2 (New)       10,000        2             65                             58                             11                 2
3 (New)       10,500        2             65                             51                             10                 1
4 (New)       …             …             …                              …                              …                  …
Figure 1.3 Re-Tuning Permutation Matrix Event Counts
As seen above, the “Transaction Alert – Historic” count and the “All Transaction Data – Count” for permutation 1 (the current thresholds) are equal, which confirms the rule engine is simulating the rule accurately. In permutation 2, when the thresholds are adjusted to a new combination, there is a slight decline in the “All Transaction Data – Count”, as expected with the adjusted thresholds. (Note that “Transaction Alert – Historic” remains anchored at 65, as this column only reflects the alert count at the current thresholds.)
It is worth noting that the SAR count of 2 is used as an anchor when analyzing the results to set the rule thresholds. Best practice holds that recent SARs serve as a benchmark for tuning thresholds and should be weighed heavily in the analysis. As seen above in permutation 3, the threshold combination would cause one of the historic SARs to evade detection; this permutation (and any other permutation which does not detect both historic SARs) should therefore be eliminated from consideration for re-tuning.
A sample transaction data set and shell code (written in R) for the detection engine discussed above are provided in the Appendix.
Quantitative Analysis
Identify the remaining permutation combinations and focus the analysis on the case and SAR retention proportions (the SAR proportion is usually weighed the most). Any threshold combinations in the matrix with undesirable SAR and/or case retention ratios are eliminated from the list of possibilities. No single line of demarcation is identified at the end of the quantitative analysis. Instead, all remaining threshold combinations in the permutation matrix continue through to the qualitative assessment, and the subsequent qualitative analysis is performed to solidify a new proposed line of demarcation.
Qualitative Analysis
Using the historical data from the quantitative analysis, set indicators for Above-the-Line (ATL) and Below-the-Line (BTL) transactions and pull the qualitative samples to be reviewed by the FIU. Samples flagged as ‘ATL’ are essentially pseudo alerts and are treated as such in the FIU’s investigative analysis. BTL samples are included to further validate the threshold line, as the expectation is that less than x% of BTL samples (the percentage depends on the institution’s risk appetite) would return as escalated cases.
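Once the ATL/BTL indicators are set, the samples themselves can be drawn at random. A minimal sketch, assuming a hypothetical data frame btl_population of BTL transactions and the sample size n = 113 computed in the next section:

set.seed(42)  # fix the seed so the sample pull is reproducible
btl_sample <- btl_population[sample(nrow(btl_population), size = 113), ]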
Sampling
Determine the appropriate sample size using hypergeometric sampling (binomial sampling without replacement). The number of transactions which fall into the ATL or BTL category determines the number of random samples required for a statistically significant qualitative assessment; another random sample of the same size would be expected to produce a similar result. The sample size formula and a worked example are included below.
Included below is a sample size example:
N    620   Through data segmentation analysis (e.g., clustering), the BTL population is determined.
CI   1.96  Target confidence level is 95%; the associated factor (“z-value”) is 1.96.
           In MS Excel this can be calculated using “=NORM.S.INV(1-((1-0.95)/2))”.
Prec 0.05  Precision is set by risk appetite. The smaller this value, the larger the sample size needs to be.
P    10%   Occurrence rate which needs to be detected.
n    113   Based on the values listed above, n = 113.
n0 = (CI^2 * P * Q) / Prec^2
n  = n0 / (1 + (n0 - 1) / N)

Legend
N    = population size
P    = expected occurrence rate of an attribute
Q    = 1 - P
Prec = desired precision level
CI   = associated factor at a given confidence level

The table below shows how each variable impacts sample size:
N     Prec   P     CI     n
620   0.05   0.1   1.96   113
620   0.03   0.1   1.96   237
620   0.05   0.2   1.96   176
620   0.05   0.1   1.64   84
Figure 1.4 Sample Size
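A minimal R implementation of the sample-size formula above reproduces Figure 1.4 (results are rounded to the nearest integer to match the published values):

sample_size <- function(N, Prec, P, CI) {
  Q  <- 1 - P
  n0 <- CI^2 * P * Q / Prec^2       # infinite-population sample size
  round(n0 / (1 + (n0 - 1) / N))    # finite population correction
}
sample_size(N = 620, Prec = 0.05, P = 0.1, CI = 1.96)  # 113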
Investigator Analysis
The purpose of generating these samples is for the FIU to qualitatively evaluate the efficacy of the quantitatively calculated thresholds. A group of investigators should be selected for the exercise and randomly assigned pseudo ‘alerts’ to review as if they were authentic alerts from the transaction monitoring system. In theory, if the threshold is appropriately tuned, a transaction marked ‘ATL’ should most likely also be classified as ‘suspicious’ during this qualitative analysis, and sample transactions marked ‘BTL’ would be flagged as not suspicious.
The investigators’ evaluation must consider the intent of each rule, and they will generally evaluate each transaction through a lens akin to: “Given what is known from KYC, origin/destination of funds, beneficiary, etc., is it explainable that this consumer/entity would transact this dollar amount at this frequency, velocity, or pattern?” To maintain the integrity of the assessment, the investigator does not judge a transaction based only on the value of the flagged transaction, but looks holistically at its various qualities, such as who the transaction is from/to (for example, a wire transfer between two branches of the same company, or between firms dealing in similar commodities such as computers and semiconductors), and whether any fields, such as an individual’s last name, contain key words that caused the rule to misinterpret the field and generate a false positive.
Proportion and Efficacy Tests
All threshold combinations need a review to identify which combination has the best efficacy from both a quantitative and a qualitative perspective.
The outcome of the investigators’ qualitative analysis and the subsequent statistical analysis decides whether the line of demarcation determined during the quantitative analysis remains at the current level or is revised. The risk appetite determines the acceptable proportion defective (the proportion of suspicious transactions), also known as the “efficacy rate”. The range of outcomes and the corresponding decisions are listed below; a statistical sketch follows the list.
1. BTL has an acceptable proportion of suspicious transactions and the ATL proportion is significantly different (i.e., larger) than the BTL proportion; the threshold remains at the current level: the threshold meaningfully separates the BTL and ATL populations and the separation is at the “correct” level (in terms of the risk appetite).
2. Both BTL and ATL proportions are low. Regardless of the statistical difference between the two populations, if the proportions are low, the threshold most likely needs to become less stringent to reduce the level of false positives.
3. Both BTL and ATL proportions are higher than the acceptable level of suspicious transactions. The threshold needs to become more stringent.
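These outcomes can be assessed with a standard two-proportion test. A hedged sketch, assuming hypothetical review results (14 of 113 ATL samples and 3 of 113 BTL samples deemed suspicious by the FIU):

# Test whether the ATL suspicious proportion significantly exceeds BTL's
prop.test(x = c(14, 3), n = c(113, 113), alternative = "greater")
# A small p-value supports outcome 1; if both proportions fall outside
# the risk appetite, outcomes 2 or 3 apply instead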
Approval and Implementation
Per the institution’s review and approval process, receive all necessary approvals from key personnel prior to making any changes to production. Once all pertinent parties are in agreement, create a functional specification document which should include a brief overview of the rule change, what is currently configured, and the desired configuration changes. It is imperative that the functional specification document is thoroughly vetted and signed off, validating that it provides all necessary and accurate information to make the desired implementation changes.
Authors
Mayank Johri and Erik De Monte both work in the BSA/AML Analytics group at First Republic Bank in San Francisco, California. Their contact information is included below.
Mayank Johri, Vice President Analytics
https://www.linkedin.com/in/johrim
Erik De Monte, Data Scientist
https://www.linkedin.com/in/edemonte
Appendix: Detection Engine Shell Code (R)
Included below are a sample of transaction data and detection engine shell code written in R, which the data can be run through to illustrate the methodology discussed above. Please note that the table below should be saved as a comma-separated (CSV) file named “Transactions.csv”, with the headers included.
The R code was built using RStudio Version 0.99.902 and has been commented to walk the user through each step of the methodology.
Sample Transaction File (Save as “Transactions.csv”)
Transaction_Key Date Alert_Nbr Case_Nbr SAR_Nbr Attribute_01 Attribute_02 Attribute_03
TXN001 1/1/2016 NULL NULL NULL 6 0 70000
TXN002 1/15/2016 NULL NULL NULL 1 1 40
TXN003 2/1/2016 ALRT001 NULL NULL 11 2 1300000
TXN004 2/15/2016 NULL NULL NULL 5 1 340
TXN005 3/1/2016 NULL NULL NULL 7 0 126
TXN006 3/15/2016 NULL NULL NULL 7 0 986
TXN007 4/1/2016 NULL NULL NULL 5 0 1400
TXN008 4/15/2016 NULL NULL NULL 2 1 9765
TXN009 5/1/2016 NULL NULL NULL 3 0 2098
TXN010 5/15/2016 ALRT002 CASE001 SAR001 16 5 1000001
TXN011 6/1/2016 ALRT003 NULL NULL 15 3 1800765
TXN012 6/15/2016 NULL NULL NULL 3 1 65433
TXN013 1/1/2016 NULL NULL NULL 3 0 765889
TXN014 1/15/2016 NULL NULL NULL 4 1 12
TXN015 2/1/2016 NULL NULL NULL 7 1 2345
TXN016 2/15/2016 NULL NULL NULL 9 0 97800
TXN017 3/1/2016 NULL NULL NULL 6 0 5422
TXN018 3/15/2016 ALRT004 NULL NULL 12 2 1005678
TXN019 4/1/2016 NULL NULL NULL 6 1 9845
TXN020 4/15/2016 NULL NULL NULL 3 0 998
TXN021 5/1/2016 ALRT005 CASE002 NULL 18 4 1009876
TXN022 5/15/2016 NULL NULL NULL 4 0 12333
TXN023 6/1/2016 ALRT006 NULL NULL 10 5 1200000
TXN024 6/15/2016 ALRT007 CASE003 SAR002 20 10 34087264
Detection Engine Shell Code (R)
#//////////////////////////////////////////////////////////////////////
# Name: Re-Tuning Permutation Analysis - Example R Script
# Date: October 2016
# Developers: Erik De Monte, Mayank Johri
#//////////////////////////////////////////////////////////////////////
# Assumptions:
#
# i. There are 4 tables of Transactions available to be run through the engine:
# - All transactions for the date period identified
# - All transactions related to historic alerts for the date period identified
# - All transactions related to historic alerts that were escalated to case
# - All transactions related to historic alerts that were escalated to SAR
#
# ii. The data available for the relevant thresholds being re-tuned are available.
#//////////////////////////////////////////////////////////////////////
#0. Preliminary Procedures
#//////////////////////////////////////////////////////////////////////
# Load relevant R packages
library(cluster)
library(doBy)
library(lubridate)
library(utils)
library(RODBC)   # available for pulling data directly from a backend database
library(reshape)
library(dplyr)
# Upload and Format Data Frame
transactions <- read.csv(file='Transactions.csv', sep=',', header=TRUE, stringsAsFactors = FALSE)
transactions[,1] <- as.character(transactions[,1])
transactions[,2] <- as.Date(transactions[,2], format = "%m/%d/%Y")
transactions[,3] <- as.character(transactions[,3])
transactions[,4] <- as.character(transactions[,4])
transactions[,5] <- as.character(transactions[,5])
transactions[,6] <- as.numeric(transactions[,6])
transactions[,7] <- as.numeric(transactions[,7])
transactions[,8] <- as.numeric(transactions[,8])
#//////////////////////////////////////////////////////////////////////
# 1. Create a reference table for permutation matrix.
#//////////////////////////////////////////////////////////////////////
# 1a. Define Threshold Variables
# For the sake of this example, let us assume that the current thresholds are set at:
# threshold_01 = 10
# threshold_02 = 2
# threshold_03 = 1000000
# To define exact values to a threshold, assign it to a vector ("c")
# To define a sequence of values, use the "seq" function under the syntax:
# threshold = seq(a,b,c) ; Go from a to b in increments of c
threshold_01 = c(7, 10, 12)
threshold_02 = c(1,2,3)
threshold_03 = seq(800000,1200000,200000)
# 1b. Create the Threshold Table
x_Threshold_Table <- expand.grid(threshold_01,threshold_02,threshold_03)
# 1c. Accurately define the columns in the new table
names(x_Threshold_Table)[names(x_Threshold_Table) == 'Var1'] <- 'Example_Threshold_01'
names(x_Threshold_Table)[names(x_Threshold_Table) == 'Var2'] <- 'Example_Threshold_02'
names(x_Threshold_Table)[names(x_Threshold_Table) == 'Var3'] <- 'Example_Threshold_03'
# 1d. Clean up your environment and remove unnecessary variables.
rm(threshold_01)
rm(threshold_02)
rm(threshold_03)
#//////////////////////////////////////////////////////////////////////
# 2. Loop transactions through each permutation in the Permutation Matrix (x_Threshold_Table)
# Count the number of events
#//////////////////////////////////////////////////////////////////////
# Count of Transactions - Current Thresholds
#//////////////////////////////////////////////////////////////////////
# 2a. Set the baseline alert count based on current transactions
# For the sake of this example, let us assume that the current thresholds are set at:
# threshold_01 = 10
# threshold_02 = 2
# threshold_03 = 1000000
# In this example, there are 7 historic alerts for the transaction set.
alerts <- subset(transactions, transactions$Alert_Nbr != 'NULL')
alert_count <- as.numeric(length(alerts$Transaction_Key))
x_Final <- data.frame(x_Threshold_Table[1:3], alert_count)
names(x_Final)[names(x_Final) == 'alert_count'] <- 'Transaction Alert - Historic'
rm(alert_count)
#//////////////////////////////////////////////////////////////////////
# Count of Transactions - Permutation Thresholds
#//////////////////////////////////////////////////////////////////////
# 2b. Create a variable which logs the number of events which fit the respective loop
Var_Event <- rep(NA,nrow(x_Threshold_Table))
# 2c. Loop through all threshold permutation combinations and create a subset of the transactions that would alert
# var_index is used to temporarily hold the count of alerts between loops
for (i in 1:nrow(x_Threshold_Table)){
var_index <- subset(transactions, (
(transactions$Attribute_01 >= x_Threshold_Table$Example_Threshold_01[i])
& (transactions$Attribute_02 >= x_Threshold_Table$Example_Threshold_02[i])
& (transactions$Attribute_03 >= x_Threshold_Table$Example_Threshold_03[i])
))
#Count
Var_Event[i] <- as.numeric(length(var_index$Transaction_Key))
rm(var_index)
}
Event_Count=as.matrix(Var_Event)
x_Final = cbind(x_Final, Event_Count)
names(x_Final)[names(x_Final) == 'Event_Count'] <- 'Transaction Data - Count'
rm(Event_Count)
rm(Var_Event)
rm(i)
#//////////////////////////////////////////////////////////////////////
# Count of Historic Case Transactions
#//////////////////////////////////////////////////////////////////////
# Emulate the logic above using only the transactions related to historic cases.
# Append ("cbind") the results to the final permutation table as done above.
# Name it "Case Event Count"
cases <- subset(transactions, transactions$Case_Nbr != 'NULL')
Var_Event <- rep(NA,nrow(x_Threshold_Table))
for (i in 1:nrow(x_Threshold_Table)){
var_index <- subset(cases, (
(cases$Attribute_01 >= x_Threshold_Table$Example_Threshold_01[i])
& (cases$Attribute_02 >= x_Threshold_Table$Example_Threshold_02[i])
& (cases$Attribute_03 >= x_Threshold_Table$Example_Threshold_03[i])
))
#Count
Var_Event[i] <- as.numeric(length(var_index$Transaction_Key))
rm(var_index)
}
Event_Count=as.matrix(Var_Event)
x_Final = cbind(x_Final, Event_Count)
names(x_Final)[names(x_Final) == 'Event_Count'] <- 'Case Event Count'
rm(Event_Count)
rm(Var_Event)
rm(i)
#//////////////////////////////////////////////////////////////////////
# Count of Historic SAR Transactions
#//////////////////////////////////////////////////////////////////////
# Emulate the logic above using only the transactions related to historic SARs.
# Append ("cbind") the results to the final permutation table as done above.
# Name it "SAR Event Count"
sars <- subset(transactions, transactions$SAR_Nbr != 'NULL')
Var_Event <- rep(NA,nrow(x_Threshold_Table))
for (i in 1:nrow(x_Threshold_Table)){
var_index <- subset(sars, (
(sars$Attribute_01 >= x_Threshold_Table$Example_Threshold_01[i])
& (sars$Attribute_02 >= x_Threshold_Table$Example_Threshold_02[i])
& (sars$Attribute_03 >= x_Threshold_Table$Example_Threshold_03[i])
))
#Count
Var_Event[i] <- as.numeric(length(var_index$Transaction_Key))
rm(var_index)
}
Event_Count=as.matrix(Var_Event)
x_Final = cbind(x_Final, Event_Count)
names(x_Final)[names(x_Final) == 'Event_Count'] <- 'SAR Event Count'
rm(Event_Count)
rm(Var_Event)
rm(i)
#//////////////////////////////////////////////////////////////////////
# Anchor your analysis to the number of SARs filed, remove any combinations which would have
# missed a prior filed SAR.
sar_count <- as.numeric(length(sars$Transaction_Key))
x_Final <- subset(x_Final, x_Final$`SAR Event Count` >= sar_count)
rm(sar_count)
#//////////////////////////////////////////////////////////////////////
#//////////////////////////////////////////////////////////////////////
#//////////////////////////////////////////////////////////////////////
#//////////////////////////////////////////////////////////////////FIN.
More Related Content

What's hot

Application of ordinal logistic regression in the study of students’ performance
Application of ordinal logistic regression in the study of students’ performanceApplication of ordinal logistic regression in the study of students’ performance
Application of ordinal logistic regression in the study of students’ performanceAlexander Decker
 
Multiple Regression and Logistic Regression
Multiple Regression and Logistic RegressionMultiple Regression and Logistic Regression
Multiple Regression and Logistic RegressionKaushik Rajan
 
Winters Method
Winters MethodWinters Method
Winters Method3abooodi
 
Factor analysis
Factor analysisFactor analysis
Factor analysis緯鈞 沈
 
Regression diagnostics
Regression diagnosticsRegression diagnostics
Regression diagnosticsdermengles
 
SAMPLING AND ESTIMATION PPT.pptx
SAMPLING AND ESTIMATION PPT.pptxSAMPLING AND ESTIMATION PPT.pptx
SAMPLING AND ESTIMATION PPT.pptxYashikaSaini24
 
Regression Study: Boston Housing
Regression Study: Boston HousingRegression Study: Boston Housing
Regression Study: Boston HousingRavish Kalra
 
Time Series Analysis - Modeling and Forecasting
Time Series Analysis - Modeling and ForecastingTime Series Analysis - Modeling and Forecasting
Time Series Analysis - Modeling and ForecastingMaruthi Nataraj K
 
New power point hemodynamic
New power point hemodynamicNew power point hemodynamic
New power point hemodynamic0000memo
 
Multiple Linear Regression II and ANOVA I
Multiple Linear Regression II and ANOVA IMultiple Linear Regression II and ANOVA I
Multiple Linear Regression II and ANOVA IJames Neill
 
Week 4 forecasting - time series - smoothing and decomposition - m.awaluddin.t
Week 4   forecasting - time series - smoothing and decomposition - m.awaluddin.tWeek 4   forecasting - time series - smoothing and decomposition - m.awaluddin.t
Week 4 forecasting - time series - smoothing and decomposition - m.awaluddin.tMaling Senk
 
Lecture 5: Interval Estimation
Lecture 5: Interval Estimation Lecture 5: Interval Estimation
Lecture 5: Interval Estimation Marina Santini
 

What's hot (20)

Application of ordinal logistic regression in the study of students’ performance
Application of ordinal logistic regression in the study of students’ performanceApplication of ordinal logistic regression in the study of students’ performance
Application of ordinal logistic regression in the study of students’ performance
 
Seasonal ARIMA
Seasonal ARIMASeasonal ARIMA
Seasonal ARIMA
 
Multiple Regression and Logistic Regression
Multiple Regression and Logistic RegressionMultiple Regression and Logistic Regression
Multiple Regression and Logistic Regression
 
Winters Method
Winters MethodWinters Method
Winters Method
 
Factor analysis
Factor analysisFactor analysis
Factor analysis
 
Regression diagnostics
Regression diagnosticsRegression diagnostics
Regression diagnostics
 
SAMPLING AND ESTIMATION PPT.pptx
SAMPLING AND ESTIMATION PPT.pptxSAMPLING AND ESTIMATION PPT.pptx
SAMPLING AND ESTIMATION PPT.pptx
 
Time series analysis
Time series analysisTime series analysis
Time series analysis
 
Regression Study: Boston Housing
Regression Study: Boston HousingRegression Study: Boston Housing
Regression Study: Boston Housing
 
Demand forecasting
Demand forecastingDemand forecasting
Demand forecasting
 
Time Series Analysis - Modeling and Forecasting
Time Series Analysis - Modeling and ForecastingTime Series Analysis - Modeling and Forecasting
Time Series Analysis - Modeling and Forecasting
 
New power point hemodynamic
New power point hemodynamicNew power point hemodynamic
New power point hemodynamic
 
Confirmatory Factor Analysis
Confirmatory Factor AnalysisConfirmatory Factor Analysis
Confirmatory Factor Analysis
 
Survival analysis
Survival analysisSurvival analysis
Survival analysis
 
Ordinal Logistic Regression
Ordinal Logistic RegressionOrdinal Logistic Regression
Ordinal Logistic Regression
 
Multiple Linear Regression II and ANOVA I
Multiple Linear Regression II and ANOVA IMultiple Linear Regression II and ANOVA I
Multiple Linear Regression II and ANOVA I
 
Week 4 forecasting - time series - smoothing and decomposition - m.awaluddin.t
Week 4   forecasting - time series - smoothing and decomposition - m.awaluddin.tWeek 4   forecasting - time series - smoothing and decomposition - m.awaluddin.t
Week 4 forecasting - time series - smoothing and decomposition - m.awaluddin.t
 
Logistic Regression Analysis
Logistic Regression AnalysisLogistic Regression Analysis
Logistic Regression Analysis
 
1.2 types of data
1.2 types of data1.2 types of data
1.2 types of data
 
Lecture 5: Interval Estimation
Lecture 5: Interval Estimation Lecture 5: Interval Estimation
Lecture 5: Interval Estimation
 

Similar to Reducing False Positives in BSA/AML Transaction Monitoring

Reducing False Positives - BSA AML Transaction Monitoring Re-Tuning Approach
Reducing False Positives - BSA AML Transaction Monitoring Re-Tuning ApproachReducing False Positives - BSA AML Transaction Monitoring Re-Tuning Approach
Reducing False Positives - BSA AML Transaction Monitoring Re-Tuning ApproachErik De Monte
 
BSA_AML Rule Tuning
BSA_AML Rule TuningBSA_AML Rule Tuning
BSA_AML Rule TuningMayank Johri
 
Open06
Open06Open06
Open06butest
 
Churn in the Telecommunications Industry
Churn in the Telecommunications IndustryChurn in the Telecommunications Industry
Churn in the Telecommunications Industryskewdlogix
 
Softhandover criteria
Softhandover criteriaSofthandover criteria
Softhandover criteriaDian Azizi
 
Smart E-Logistics for SCM Spend Analysis
Smart E-Logistics for SCM Spend AnalysisSmart E-Logistics for SCM Spend Analysis
Smart E-Logistics for SCM Spend AnalysisIRJET Journal
 
Detection of credit card fraud
Detection of credit card fraudDetection of credit card fraud
Detection of credit card fraudBastiaan Frerix
 
Integration of a Predictive, Continuous Time Neural Network into Securities M...
Integration of a Predictive, Continuous Time Neural Network into Securities M...Integration of a Predictive, Continuous Time Neural Network into Securities M...
Integration of a Predictive, Continuous Time Neural Network into Securities M...Chris Kirk, PhD, FIAP
 
Quality management information system
Quality management information systemQuality management information system
Quality management information systemselinasimpson341
 
Prediction of customer propensity to churn - Telecom Industry
Prediction of customer propensity to churn - Telecom IndustryPrediction of customer propensity to churn - Telecom Industry
Prediction of customer propensity to churn - Telecom IndustryPranov Mishra
 
Market basket predictive_model
Market basket predictive_modelMarket basket predictive_model
Market basket predictive_modelFatima Khalid
 
Solve Production Allocation and Reconciliation Problems using the same Network
Solve Production Allocation and Reconciliation Problems using the same NetworkSolve Production Allocation and Reconciliation Problems using the same Network
Solve Production Allocation and Reconciliation Problems using the same NetworkAlkis Vazacopoulos
 
Risk based quality management
Risk based quality managementRisk based quality management
Risk based quality managementselinasimpson2301
 
Quality assurance management
Quality assurance managementQuality assurance management
Quality assurance managementselinasimpson0301
 
Final SAS Day 2015 Poster
Final SAS Day 2015 PosterFinal SAS Day 2015 Poster
Final SAS Day 2015 PosterReuben Hilliard
 
Lecture 3 Statistical ProcessControl (SPC).docx
Lecture 3 Statistical ProcessControl (SPC).docxLecture 3 Statistical ProcessControl (SPC).docx
Lecture 3 Statistical ProcessControl (SPC).docxsmile790243
 
Intelligent Supermarket using Apriori
Intelligent Supermarket using AprioriIntelligent Supermarket using Apriori
Intelligent Supermarket using AprioriIRJET Journal
 
Purpose of quality management system
Purpose of quality management systemPurpose of quality management system
Purpose of quality management systemselinasimpson1801
 
Guide for building GLMS
Guide for building GLMSGuide for building GLMS
Guide for building GLMSAli T. Lotia
 

Similar to Reducing False Positives in BSA/AML Transaction Monitoring (20)

Reducing False Positives - BSA AML Transaction Monitoring Re-Tuning Approach
Reducing False Positives - BSA AML Transaction Monitoring Re-Tuning ApproachReducing False Positives - BSA AML Transaction Monitoring Re-Tuning Approach
Reducing False Positives - BSA AML Transaction Monitoring Re-Tuning Approach
 
BSA_AML Rule Tuning
BSA_AML Rule TuningBSA_AML Rule Tuning
BSA_AML Rule Tuning
 
Open06
Open06Open06
Open06
 
Churn in the Telecommunications Industry
Churn in the Telecommunications IndustryChurn in the Telecommunications Industry
Churn in the Telecommunications Industry
 
Softhandover criteria
Softhandover criteriaSofthandover criteria
Softhandover criteria
 
Smart E-Logistics for SCM Spend Analysis
Smart E-Logistics for SCM Spend AnalysisSmart E-Logistics for SCM Spend Analysis
Smart E-Logistics for SCM Spend Analysis
 
Detection of credit card fraud
Detection of credit card fraudDetection of credit card fraud
Detection of credit card fraud
 
Integration of a Predictive, Continuous Time Neural Network into Securities M...
Integration of a Predictive, Continuous Time Neural Network into Securities M...Integration of a Predictive, Continuous Time Neural Network into Securities M...
Integration of a Predictive, Continuous Time Neural Network into Securities M...
 
Quality management information system
Quality management information systemQuality management information system
Quality management information system
 
Prediction of customer propensity to churn - Telecom Industry
Prediction of customer propensity to churn - Telecom IndustryPrediction of customer propensity to churn - Telecom Industry
Prediction of customer propensity to churn - Telecom Industry
 
Market basket predictive_model
Market basket predictive_modelMarket basket predictive_model
Market basket predictive_model
 
Solve Production Allocation and Reconciliation Problems using the same Network
Solve Production Allocation and Reconciliation Problems using the same NetworkSolve Production Allocation and Reconciliation Problems using the same Network
Solve Production Allocation and Reconciliation Problems using the same Network
 
Risk based quality management
Risk based quality managementRisk based quality management
Risk based quality management
 
Quality assurance management
Quality assurance managementQuality assurance management
Quality assurance management
 
Final SAS Day 2015 Poster
Final SAS Day 2015 PosterFinal SAS Day 2015 Poster
Final SAS Day 2015 Poster
 
Quality management kpi
Quality management kpiQuality management kpi
Quality management kpi
 
Lecture 3 Statistical ProcessControl (SPC).docx
Lecture 3 Statistical ProcessControl (SPC).docxLecture 3 Statistical ProcessControl (SPC).docx
Lecture 3 Statistical ProcessControl (SPC).docx
 
Intelligent Supermarket using Apriori
Intelligent Supermarket using AprioriIntelligent Supermarket using Apriori
Intelligent Supermarket using Apriori
 
Purpose of quality management system
Purpose of quality management systemPurpose of quality management system
Purpose of quality management system
 
Guide for building GLMS
Guide for building GLMSGuide for building GLMS
Guide for building GLMS
 

Recently uploaded

Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Bookvip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Bookmanojkuma9823
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxFurkanTasci3
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...ThinkInnovation
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 

Recently uploaded (20)

Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Bookvip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptx
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 

Reducing False Positives in BSA/AML Transaction Monitoring

  • 1. 1 Reducing False Positives: BSA AML Transaction Monitoring Tuning Approach Written by Mayank Johri Ph.D. and Erik De Monte MS Introduction Institutions waste millions per year analyzing false positives due to models which return low efficacy. In an era of heightened regulatory scrutiny coupled with institutions’ desire to control compliance costs, there is need for a sound methodology to improve the overall efficacy of alerts. High efficacy and sound methodology allow institutions to better channel their time and resources to true suspicious activities and improve the overall quality of a BSA/AML program. Certain proposed solutions to this problem include automated alert closures, whitelists, etc. These solutions do not in any way ameliorate the issue of reducing false positives and do not represent sound principles for a robust BSA/AML program. Instead of using “out-of-the-box” rules from the transaction monitoring software, custom rules that encapsulate multiple scenarios and using automated learned behavior (based on past disposition), customer segmentation, and peer group analysis may help improve the efficacy of the alerts; however, these still have to be tuned to determine most effective thresholds. Below is summarization of the steps/approach that can stand the scrutiny of examiners and fulfil the desired objective of generating quality alerts. Approach Assessment & Prioritization On a regular basis evaluate the efficacyof the current suspicious activitydetection rules in production, identify the rules with the lowest to the highest efficacy and create a prioritization list. This list then drives the tuning schedule/plan. Data Acquisition Three sets of data are pulled for the re-tuning analysis: 1) All historic transaction data since the most previous tuning was implemented; 2) For the rule in question, all historic alerted transactions, and subsequent disposition (escalated casesand SAR) data. This data can be collected by querying the backend databases of the transaction monitoring system. 3) Various relevant customer data elementslike entity/consumer, cash intensivebusiness, AI, etc. on the customers alerted. Data Analysis Stratify the data as required (such as grouping like-attributes or ‘non-tunable parameters’ such as entity/consumer, cash intensive businesses, etc.)to account for like-attributebehavior patterns. Subsequent to stratification, perform a series of data analyses to better understand the data. This
  • 2. 2 data analysis consists of, but is not limited to, identifying if suitable transaction codes and details are all available, confirming the completeness and accuracy of the data set, and performing a series of correlation tests to identify if certain data elements are correlated. This stage will help the institution to understand the data specific to your client and data set. For example, two data elements may prove to be correlated for an institution and not for another. Build Detection Engine Using the transaction monitoring manual as a guide, recreate the rule using an object oriented programming language (statisticallydriven language preferable; R, MATLAB or Python recommended) to build an external engine to perform analysis on the rule thresholds. A threshold range is determined for each threshold being tuned, and a matrix is created for all combinations of each of the different possible threshold values. As mentioned above, in the event that two thresholds are discovered to be directly correlated, choose to anchor these two thresholds together as one to eliminate unnecessary noise in the permutation matrix. Determine a de-minimis value to serve as the lowest threshold value in the re-tuning range for that threshold. Professional judgment is used to identify the highest threshold value in the re-tuning range for that threshold but typically will mirror the same delta between the current threshold value and the de-minimis threshold value in the opposite direction. For some rules, it is expected that this permutation matrix can easily create upwards of a thousand different threshold combinations. A simple example is included below to visualize the permutation matrix discussed above. Threshold Current Threshold Current Threshold – Lower Range (de-minimis) Current Threshold – Upper Range 1 10,000 9,500 10,500 2 4 2 6 Figure 1.1 Sample Thresholds and Ranges Permutation Threshold 1 Threshold 2 1 9,500 2 2 9,500 4 3 9,500 6 4 10,000 2 5 10,000 4 6 10,000 6 7 10,500 2 8 10,500 4 9 10,500 6 Figure 1.2 Sample Permutation Matrix (of Thresholds and Ranges from Figure 1.1)
  • 3. 3 Once both the permutation matrix and the rule engine have been built, the transaction data is clustered and all transactions falling into clusters outside of the threshold ranges are excluded and the two sets of transaction data (full set of transaction data and the transaction data related to historic alerts, cases, and SARs) are run through the rule engine against a loop of all threshold combinations in the permutation matrix. The first full set of all transactions are run through the engine to output a count of events, or “alerts”, for each permutation combination. Before proceeding, identify the threshold combination in the matrix which contains all current thresholds and compare this count against the actual alert count per the historic transaction monitoring system data. This provides a check for completeness over the data pull as well as validates the risk engine’s accuracy. Once confirmed, the second set of transaction data linked to the historic cases and SARs are run through the engine and logged as separate event counts in new columns in the matrix as show below. Permutation Threshold 1 Threshold 2 Transaction Alert – Historic All Transaction Data – Count Case Event Count SAR Event Count 1 (Current) 10,000 4 65 65 14 2 2 (New) 10,000 2 65 58 11 2 3 (New) 10,500 2 65 51 10 1 4 (New) … … … … … … Figure 1.3 Re-Tuning Permutation Matrix Event Counts As seen above, the “Transaction Alert – Historic” count and the “All Transaction Data- Count” for permutation 1 (the current threshold) are equal which would confirm the rule engine is simulating the rule accurately. In permutation 2, when the thresholds have been adjusted to a new combination there is a slight decline in the “All Transaction Data – Count”, as expected with the adjusted thresholds (Note that the “Transaction Alert – Historic” will be anchored at 65 as this logic only produces the alert count at the current thresholds). It is notable to mention that the SAR count of 2 will be used as an anchor in the analysis of the results to set the rule threshold or parameter. Best practice instructs that recent SARs serve as a benchmark for tuning thresholds and should heavily considered in the analysis. As seen above in permutation 3, the threshold combination would cause one of the historic SARs to evade detection, and thus this permutation (and any additional permutations which do not detect 2 historic SARs) should be subsequently eliminated from any consideration for re-tuning. A sample transaction data set and shell code (written in R) for the detection engine discussed above is provided in the Appendix. Quantitative Analysis Identify the remaining permutation combinations and focus the analysis on the case and SAR retention proportions (SAR proportion is usually weighed the most in the analysis). Any threshold combinations in the matrix with undesirably SAR and/or case retention ratios are eliminated from the list of possibilities. No one specific line of demarcation is identified at the end of the quantitative analysis for a re-tuning exercise. Instead, all remaining threshold combinations in the permutation
  • 4. 4 matrix continue through to the qualitative assessment and subsequent qualitative analysis is performed to solidify a new proposed line of demarcation. Qualitative Analysis Determine during the quantitative analysishistorical data in order to set indicators for Above-the- Line (ATL) and Below-the-Line (BTL) and pull the qualitative samples to be reviewed by the FIU. These samples, when flagged as ‘ATL’ are essentially the pseudo alerts, and are treated as such in the FIU’s investigative analysis. BTL samples are included in the sample to further validate the threshold line as the expectation is that less than x% of BTL samples (this percentage will depend on institutions risk appetite) would return as escalated cases. Sampling Determine the appropriate sample size using a hypergeometric binomial sampling without replacement. The number of transactions which fall into the ATL or BTL category will determine the number of random samples required for a statistically significantqualitative assessment. A large enough random sample of the same size would have roughly the same chance of producing a similar result. Below is the formula to be used for determining sample size. Included below is a sample size example: N 620 Throughdatasegmentationanalysis (e.g.,clustering,etc.) BTL populationis determined. CI 1.96 Target significancelevel (or confidence interval) is 95%;in this case associatedfactor(“z-value”) is 1.96. In MS Excel this can be calculatedusing“=NORM.S.INV(1-((1-0.95)/2))” Prec 0.05 Precisionis set by risk appetite. The smallerthe valueof this variablethe larger thesample size needs to be. P 10% Occurrencerate whichneeds to be detected n 113 Based on these values listedabove,n = 113 The table below shows how each variable impacts sample size:                  1 Prec PQCI N 1 1 Prec PQCI n 2 2 2 2 Legend N = populationsize P = expected occurrencerate of an attribute Q = l - P Prec = desiredprecision level CI = associated factor at agiven confidence level
  • 5. 5 N Prec P CI n 620 0.05 0.1 1.96 113 620 0.03 0.1 1.96 237 620 0.05 0.2 1.96 176 620 0.05 0.1 1.64 135 Figure 1.4 Sample Size Investigator Analysis The purpose of generating these samples is for the FIU to qualitatively evaluatethe efficacy of the quantitatively calculated thresholds. A group of investigators should be selected for the exercise and randomly assigned pseudo ‘alerts’ to review as if they were authentic alerts from the transaction monitoring system. In theory, if the threshold is appropriately tuned, then a transaction marked ‘ATL’ should most likely also be classified as ‘suspicious’ during this qualitative analysis, and all sample transactions that are marked ‘BTL’ would be flagged as not suspicious. The investigator’s evaluation must include consideration for the intent of each rule, and they will generally evaluate each transaction through a lens akin to “Given what is known from KYC, origin/destination of funds, beneficiary, etcetera, is it explainable that this consumer/entity would transact this dollar amount at this ...frequency, velocity, pattern etc...” To maintain the integrity of this assessment, the investigator does not make this qualitative assessment based only on the value of the flagged transaction, but rather looks holistically at various qualities of the transaction such as who the transaction is from/to (is it a wire transfer between two branches of the same company or a similar commodity like computers and semi-conductors), and if there are any fields such as an individual’s last name which contain key words which caused the rule to misinterpret a field as a false positive. Proportion and Efficacy Tests All threshold combinations will need a review to identify which threshold combination has the best efficacy both from a quantitative and qualitativeperspective. The outcome of the investigator’s qualitative analysisand the subsequent statistical analysisdecide if the line of demarcation determined during the quantitative analysis remains at the current level or is revised. The risk appetite determines the acceptable magnitude of proportion defective (proportion of suspicious transactions), also known as the “efficacy rate”. The range of outcomes and the corresponding decisions are listed below. 1. BTL has acceptable proportion of suspicious transactions and ATL proportion is significantly different (i.e., larger) than BTL’s proportion; threshold remains at the current level: the threshold meaningfully separates BTL and ATL populations and the separation is at the “correct” level (in terms of the risk appetite). 2. Both BTL and ATL proportions are low. Regardless of the statisticaldifference between the two populations, if the proportions are low, most likely the threshold needs to become less stringent to reduce the level of false positive. 3. Both BTL and ATL proportions are higher than what is the acceptable level of suspicious transactions. Threshold needs to become more stringent.
  • 6. 6 Approval and Implementation Per the institution’s review and approval process, receive allnecessary approvals from key personnel prior to making any changes into production. Once all pertinent parties are in agreement, create a functional specification document which should include a brief overview of the rule change, what is currently configured, and the desired configuration changes to be made. It is imperative that the functional specification document is thoroughly vetted and signed off validating that the document provides all necessary and accurate information to make the desired implementation changes. Authors Mayank Johri and Erik De Monte both work in the BSA/AML Analytics group at First Republic Bank in San Francisco, California. Their contact information is included below Mayank Johri, Vice President Analytics https://www.linkedin.com/in/johrim Erik De Monte, Data Scientist https://www.linkedin.com/in/edemonte
Appendix: Detection Engine Shell Code (R)
Included below is a sample of transaction data and a detection engine shell code, written in R, that the data can be run through to depict the methodology discussed above. Please note that the table below should be saved as a comma-separated file (CSV), with the headers included, as "Transactions.csv". The R code was built using RStudio Version 0.99.902 and has been commented to navigate the user through each step of the methodology.
Sample Transaction File (Save as "Transactions.csv")

Transaction_Key  Date       Alert_Nbr  Case_Nbr  SAR_Nbr  Attribute_01  Attribute_02  Attribute_03
TXN001           1/1/2016   NULL       NULL      NULL     6             0             70000
TXN002           1/15/2016  NULL       NULL      NULL     1             1             40
TXN003           2/1/2016   ALRT001    NULL      NULL     11            2             1300000
TXN004           2/15/2016  NULL       NULL      NULL     5             1             340
TXN005           3/1/2016   NULL       NULL      NULL     7             0             126
TXN006           3/15/2016  NULL       NULL      NULL     7             0             986
TXN007           4/1/2016   NULL       NULL      NULL     5             0             1400
TXN008           4/15/2016  NULL       NULL      NULL     2             1             9765
TXN009           5/1/2016   NULL       NULL      NULL     3             0             2098
TXN010           5/15/2016  ALRT002    CASE001   SAR001   16            5             1000001
TXN011           6/1/2016   ALRT003    NULL      NULL     15            3             1800765
TXN012           6/15/2016  NULL       NULL      NULL     3             1             65433
TXN013           1/1/2016   NULL       NULL      NULL     3             0             765889
TXN014           1/15/2016  NULL       NULL      NULL     4             1             12
TXN015           2/1/2016   NULL       NULL      NULL     7             1             2345
TXN016           2/15/2016  NULL       NULL      NULL     9             0             97800
TXN017           3/1/2016   NULL       NULL      NULL     6             0             5422
TXN018           3/15/2016  ALRT004    NULL      NULL     12            2             1005678
TXN019           4/1/2016   NULL       NULL      NULL     6             1             9845
TXN020           4/15/2016  NULL       NULL      NULL     3             0             998
TXN021           5/1/2016   ALRT005    CASE002   NULL     18            4             1009876
TXN022           5/15/2016  NULL       NULL      NULL     4             0             12333
TXN023           6/1/2016   ALRT006    NULL      NULL     10            5             1200000
TXN024           6/15/2016  ALRT007    CASE003   SAR002   20            10            34087264
Detection Engine Shell Code (R)

#//////////////////////////////////////////////////////////////////////
# Name: Re-Tuning Permutation Analysis - Example R Script
# Date: October 2016
# Developers: Erik De Monte, Mayank Johri
#//////////////////////////////////////////////////////////////////////
# Assumptions:
#
# i. There are 4 tables of transactions available to be run through the engine:
#    - All transactions for the date period identified
#    - All transactions related to historic alerts for the date period identified
#    - All transactions related to historic alerts that were escalated to a case
#    - All transactions related to historic alerts that were escalated to a SAR
#
# ii. The data for the relevant thresholds being re-tuned are available.
#//////////////////////////////////////////////////////////////////////
# 0. Preliminary Procedures
#//////////////////////////////////////////////////////////////////////

# Load relevant preinstalled R packages (not all are required for this example;
# the base package is attached by default and does not need to be loaded)
library(cluster)
library(doBy)
library(lubridate)
library(utils)
library(RODBC)
library(reshape)
library(dplyr)

# Upload and format the data frame
transactions <- read.csv(file = 'Transactions.csv', sep = ',', header = TRUE,
                         stringsAsFactors = FALSE)
transactions[, 1] <- as.character(transactions[, 1])                  # Transaction_Key
transactions[, 2] <- as.Date(transactions[, 2], format = "%m/%d/%Y")  # Date
transactions[, 3] <- as.character(transactions[, 3])                  # Alert_Nbr
transactions[, 4] <- as.character(transactions[, 4])                  # Case_Nbr
transactions[, 5] <- as.character(transactions[, 5])                  # SAR_Nbr
transactions[, 6] <- as.numeric(transactions[, 6])                    # Attribute_01
transactions[, 7] <- as.numeric(transactions[, 7])                    # Attribute_02
transactions[, 8] <- as.numeric(transactions[, 8])                    # Attribute_03

#//////////////////////////////////////////////////////////////////////
# 1. Create a reference table for the permutation matrix.
#//////////////////////////////////////////////////////////////////////

# 1a. Define threshold variables
# For the sake of this example, assume the current thresholds are set at:
#   threshold_01 = 10
#   threshold_02 = 2
#   threshold_03 = 1000000
# To assign exact values to a threshold, use a vector ("c").
# To define a sequence of values, use the "seq" function with the syntax
#   threshold = seq(a, b, c)  # go from a to b in increments of c
threshold_01 = c(7, 10, 12)
threshold_02 = c(1, 2, 3)
threshold_03 = seq(800000, 1200000, 200000)

# 1b. Create the threshold table
x_Threshold_Table <- expand.grid(threshold_01, threshold_02, threshold_03)

# 1c. Accurately define the columns in the new table
names(x_Threshold_Table)[names(x_Threshold_Table) == 'Var1'] <- 'Example_Threshold_01'
names(x_Threshold_Table)[names(x_Threshold_Table) == 'Var2'] <- 'Example_Threshold_02'
names(x_Threshold_Table)[names(x_Threshold_Table) == 'Var3'] <- 'Example_Threshold_03'

# 1d. Clean up the environment and remove unnecessary variables.
rm(threshold_01)
rm(threshold_02)
rm(threshold_03)

#//////////////////////////////////////////////////////////////////////
# 2. Loop transactions through each permutation in the permutation matrix
#    (x_Threshold_Table) and count the number of events.
#//////////////////////////////////////////////////////////////////////
# Count of Transactions - Current Thresholds
#//////////////////////////////////////////////////////////////////////

# 2a. Set the baseline alert count based on current transactions
# For the sake of this example, assume the current thresholds are set at:
#   threshold_01 = 10
#   threshold_02 = 2
#   threshold_03 = 1000000
# In this example, there are 7 historic alerts in the transaction set.
alerts <- subset(transactions, transactions$Alert_Nbr != 'NULL')
alert_count <- as.numeric(length(alerts$Transaction_Key))
x_Final <- data.frame(x_Threshold_Table[1:3], alert_count)
names(x_Final)[names(x_Final) == 'alert_count'] <- 'Transaction Alert - Historic'
rm(alert_count)

#//////////////////////////////////////////////////////////////////////
# Count of Transactions - Permutation Thresholds
#//////////////////////////////////////////////////////////////////////

# 2b. Create a variable which logs the number of events which fit the respective loop
Var_Event <- rep(NA, nrow(x_Threshold_Table))

# 2c. Loop through all threshold permutation combinations and create a subset of the
#     transactions that would alert; var_index temporarily holds the alerting subset
#     within each iteration of the loop.
for (i in 1:nrow(x_Threshold_Table)) {
  var_index <- subset(transactions,
                      (transactions$Attribute_01 >= x_Threshold_Table$Example_Threshold_01[i]) &
                      (transactions$Attribute_02 >= x_Threshold_Table$Example_Threshold_02[i]) &
                      (transactions$Attribute_03 >= x_Threshold_Table$Example_Threshold_03[i]))
  # Count
  Var_Event[i] <- as.numeric(length(var_index$Transaction_Key))
  rm(var_index)
}
Event_Count = as.matrix(Var_Event)
x_Final = cbind(x_Final, Event_Count)
names(x_Final)[names(x_Final) == 'Event_Count'] <- 'Transaction Data - Count'
rm(Event_Count)
rm(Var_Event)
rm(i)

#//////////////////////////////////////////////////////////////////////
# Count of Historic Case Transactions
#//////////////////////////////////////////////////////////////////////

# Emulate the logic above using only the transactions related to historic cases.
# Append ("cbind") the results to the final permutation table as done above.
# Name it "Case Event Count".
cases <- subset(transactions, transactions$Case_Nbr != 'NULL')
Var_Event <- rep(NA, nrow(x_Threshold_Table))
for (i in 1:nrow(x_Threshold_Table)) {
  var_index <- subset(cases,
                      (cases$Attribute_01 >= x_Threshold_Table$Example_Threshold_01[i]) &
                      (cases$Attribute_02 >= x_Threshold_Table$Example_Threshold_02[i]) &
                      (cases$Attribute_03 >= x_Threshold_Table$Example_Threshold_03[i]))
  # Count
  Var_Event[i] <- as.numeric(length(var_index$Transaction_Key))
  rm(var_index)
}
Event_Count = as.matrix(Var_Event)
x_Final = cbind(x_Final, Event_Count)
names(x_Final)[names(x_Final) == 'Event_Count'] <- 'Case Event Count'
rm(Event_Count)
rm(Var_Event)
rm(i)

#//////////////////////////////////////////////////////////////////////
# Count of Historic SAR Transactions
#//////////////////////////////////////////////////////////////////////

# Emulate the logic above using only the transactions related to historic SARs.
# Append ("cbind") the results to the final permutation table as done above.
# Name it "SAR Event Count".
sars <- subset(transactions, transactions$SAR_Nbr != 'NULL')
Var_Event <- rep(NA, nrow(x_Threshold_Table))
for (i in 1:nrow(x_Threshold_Table)) {
  var_index <- subset(sars,
                      (sars$Attribute_01 >= x_Threshold_Table$Example_Threshold_01[i]) &
                      (sars$Attribute_02 >= x_Threshold_Table$Example_Threshold_02[i]) &
                      (sars$Attribute_03 >= x_Threshold_Table$Example_Threshold_03[i]))
  # Count
  Var_Event[i] <- as.numeric(length(var_index$Transaction_Key))
  rm(var_index)
}
Event_Count = as.matrix(Var_Event)
x_Final = cbind(x_Final, Event_Count)
names(x_Final)[names(x_Final) == 'Event_Count'] <- 'SAR Event Count'
rm(Event_Count)
rm(Var_Event)
rm(i)

#//////////////////////////////////////////////////////////////////////
# Anchor the analysis to the number of SARs filed: remove any combinations
# which would have missed a prior filed SAR.
sar_count <- as.numeric(length(sars$Transaction_Key))
x_Final <- subset(x_Final, x_Final$`SAR Event Count` >= sar_count)
rm(sar_count)

#//////////////////////////////////////////////////////////////////////
#//////////////////////////////////////////////////////////////////FIN.
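As a possible extension of the shell code above (our addition, not part of the original script), the surviving permutations can be ranked by simple quantitative yield measures before they are handed to investigators for the qualitative review described earlier. The column names follow the script above; the yield definitions are illustrative assumptions.

# Possible extension (illustrative, not in the original script): for each surviving
# permutation, compute the share of would-be alerting transactions that historically
# reached a case or a SAR, then rank the permutations by SAR yield, highest first.
x_Final$`Case Yield` <- x_Final$`Case Event Count` / x_Final$`Transaction Data - Count`
x_Final$`SAR Yield`  <- x_Final$`SAR Event Count`  / x_Final$`Transaction Data - Count`
x_Final <- x_Final[order(-x_Final$`SAR Yield`), ]
head(x_Final)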