Approach to AML Rule Thresholds
By Mayank Johri, Amin Ahmadi, Kevin Kinkade, Sam Day, Michael Spieler, Erik DeMonte
January 12, 2016
Introduction
Institutions constantly face the challenge of managing growing alert volumes from automated transaction
monitoring systems, new money laundering typologies to surveil, and increasingly robust regulatory guidance. The question is
how BSA/AML departments will scale to meet this demand while managing compliance cost. To effectively set
baseline thresholds when configuring new detection scenarios, or to improve the efficacy of existing scenarios, statistical
techniques and industry standards are applied to identify the cut-off between “normal” and “abnormal” or “suspicious”
activity. These estimated thresholds are then either challenged or reinforced by the qualitative judgement of professional
investigators during a simulated (‘pseudo’) investigation, or ‘qualitative assessment’.
An effective AML transaction monitoring program includes a standardized process for tuning, optimizing, and testing
AML scenarios/typologies that is understandable, repeatable and consistent.
An appropriately tuned or optimized scenario balances maximizing the identification of suspicious activity against
maximizing resource efficiency. The two competing objectives of tuning and optimization, which must remain in constant
balance, are:
(1) Reduce the number of ‘false positives’: alerts generated on transactions that do not require further investigation
or the filing of a Suspicious Activity Report (SAR).
(2) Reduce the number of ‘false negatives’: transactions that were not alerted but that do require further
investigation or the filing of a SAR.
Phases
The following outlines the phased process (Phase 0 through Phase 8) for the initial tuning:
• Phase 0 | Planning. The Policy Office (PO) works closely with the Analytics team to strategize the scenario,
stratification, and parameters that will be used to conduct a threshold analysis.
• Phase 1 | Assess Data. Analytics communicates to Information Technology (IT) which data fields will be required to
perform the analysis. IT then determines whether the ETL of these fields into the Transaction Monitoring System is
a near- or long-term effort.
• Phase 2 | Query Data. Analytics queries the required transactional data for analysis.
• Phase 3 | Quantitative Analysis. Analytics stratifies the data as required (for example, grouping by like attributes or
‘non-tunable parameters’ such as entity/consumer, cash-intensive business (CIB)/non-CIB, credit/debit, and
high-/medium-risk destinations) to account for like-attribute behavior patterns.
Transformation
Once the data is stratified, Analytics applies transformations as required (such as a 90-day rolling count, sum, or standard
deviation).
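As a minimal sketch of the stratification and rolling-window transformations described above, assuming a pandas extract of transactions with illustrative column names (customer_type, cr_db_flag, customer_id, txn_date, amount) that are not prescribed by this methodology:

```python
import pandas as pd

# Hypothetical transaction extract; file name and column names are illustrative assumptions.
txns = pd.read_csv("transactions.csv", parse_dates=["txn_date"])
txns = txns.sort_values("txn_date").set_index("txn_date")

# Stratify by non-tunable parameters (e.g., entity vs. consumer, credit vs. debit),
# then compute the 90-day rolling count / sum / standard deviation per customer.
rolled = (
    txns.groupby(["customer_type", "cr_db_flag", "customer_id"])["amount"]
        .rolling("90D")
        .agg(["count", "sum", "std"])
        .rename(columns={"count": "cnt_90d", "sum": "amt_90d", "std": "std_90d"})
        .reset_index()
)
```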
Exploratory Data Analysis
Analytics performs a variety of visual and statistical exploratory data analysis (EDA) techniques to understand the
correlation among parameters and the impact that one or more parameters may have on the scenario, and therefore ultimately
on alert-to-case efficacy. The objective of EDA is to further explore the recommended parameters (count, amount,
standard deviation, etc.) proposed during the planning phase and to determine, with greater statistical precision, the best
combination of parameters.
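For example, a lightweight EDA pass over the derived parameters might look like the following sketch, reusing the hypothetical rolled feature table from the transformation sketch above:

```python
import pandas as pd
import matplotlib.pyplot as plt

# 'rolled' is the stratified feature table from the previous sketch (assumed columns).
params = rolled[["cnt_90d", "amt_90d", "std_90d"]].dropna()

# Summary statistics at the percentiles of interest, plus pairwise correlations,
# to see which candidate parameters are redundant and which add discriminating power.
print(params.describe(percentiles=[0.50, 0.75, 0.85, 0.95]))
print(params.corr(method="spearman"))  # rank correlation is robust to skewed data

# Quick visual EDA: histograms and pairwise scatter plots of the candidate parameters.
pd.plotting.scatter_matrix(params, figsize=(8, 8), diagonal="hist")
plt.show()
```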
Segmentation
Once the data is stratified and transformed, Analytics clusters the ‘tunable parameters’ to account for skewness in the data
population caused by outliers, in order to yield a statistically sound threshold representative of the 85th
percentile.
The 85th percentile is used as a standard when establishing a new rule to set an initial baseline threshold defining the
cutoff between “normal” and “unusual” transactional data. For normally distributed data with a bell-shaped curve (as
depicted in the middle diagram of Figure 1.1 below), the mean value (i.e., the “expected” value) represents the central
tendency of the data, and the 85th percentile lies approximately one standard deviation (σ) above this central tendency
(for a normal distribution, +1σ corresponds to roughly the 84th percentile). The 85th percentile therefore represents a
reasonably conservative cutoff line, or “threshold”, for unusual activity. This baseline simply provides a starting point for
further analysis and is later refined through qualitative judgement and alert-to-case efficacy.
If transactional data were always normally distributed, it would be easy to calculate one standard deviation above the
mean to identify where to draw the line at roughly the 85th percentile of the data (a technique often referred to as
‘quantiling’), thus establishing the threshold. In real-world applications, however, transactional data is often not normally
distributed. It is frequently skewed by outliers (such as uniquely high-value customers); if statistical techniques that
assume a normal distribution are applied to determine the 85th percentile (approximately +1 standard deviation from the
mean), the result is a misrepresentative ‘threshold’ that is pulled toward the outlier(s).
Figure 1.1 Distribution affected by Skewness
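A small numerical sketch (with synthetic, illustrative amounts) shows how an outlier-heavy population separates the normality-based estimate from the empirical percentile:

```python
import numpy as np

rng = np.random.default_rng(42)

# Mostly "normal" customer amounts plus a handful of uniquely high-value outliers.
amounts = np.concatenate([
    rng.normal(loc=2_000, scale=500, size=9_950),     # typical activity
    rng.normal(loc=250_000, scale=50_000, size=50),   # outlier customers
])

# Threshold under the normality assumption: mean + 1 standard deviation (~85th percentile).
normal_assumption = amounts.mean() + amounts.std()

# Empirical 85th percentile, which does not assume any distribution.
empirical_p85 = np.percentile(amounts, 85)

print(f"mean + 1 sigma : {normal_assumption:,.0f}")   # inflated by the outliers
print(f"empirical P85  : {empirical_p85:,.0f}")       # close to typical activity
```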
Clustering
To account for skewness in the data, the clustering technique known as ‘Partitioning Around Medoids’ (PAM) is employed,
or more specifically its large-sample variant, ‘Clustering Large Applications’ (CLARA). Clustering is an alternative method
of data segmentation that is not predicated on the assumption that the data is normally distributed or has constant variance.
Clustering works by breaking the dataset into distinct groups (clusters), each formed around a common data point that
represents the group. This partitioning allows a boundary (such as a target threshold to distinguish normal from unusual
activity) to be assigned more accurately.
The first step of the clustering model is to determine the number of clusters into which to partition the data. The methodology
used to identify the optimal number of clusters takes into account two variables:
• Approximation – how well the clustering model fits the current data set (“error measure”)
• Generalization – the cost of how well the clustering model could be re-performed on another, similar data set
The model for clustering can be seen in the figure below. As the number of clusters (x-axis) increases, the model
becomes more complex and thus less stable. Increasing the number of clusters creates a more customized model catered
to the current data set, resulting in a high level of approximation. However, the generalization cost also increases, because
re-performing the model on a similar data set becomes more difficult. Conversely, the fewer the clusters, the less
representative the model is of the current data set, but the more scalable it is to future similar data sets. An objective
function curve is plotted to map the tradeoff between these two competing objectives. This modelling methodology is
used to identify the inflection point of the objective function of the two variables: the optimal number of clusters that
accommodates both the current data set (approximation) and future data sets (generalization). Refer to Figure 1.2
below for a conceptual visual of the modelling methodology used to identify the optimal number of clusters.
Figure 1.2 Cluster Modeling – Identification of Number of Clusters
The basic approach to CLARA clustering is to partition objects/observations into several similar subsets. Data is
partitioned based on Euclidean distance to a common data point called a medoid. A medoid, rather than being a
calculated quantity (as is the case with the mean), is the data point in the cluster that has the minimal average
dissimilarity to all other data points assigned to the same cluster. Euclidean distance is the most common
measure of dissimilarity. The advantage of medoid-based cluster analysis is that no assumption is made
about the structure of the data. In mean-based cluster analysis, by contrast, one makes the implicit, restrictive
assumption that the data follows a Gaussian (bell-shaped) distribution.
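The following sketch illustrates the idea with a simplified medoid-based partitioner and a cost-versus-k scan; it is a stand-in for PAM/CLARA (which use more sophisticated swap and sampling steps), and the synthetic two-parameter events are illustrative only:

```python
import numpy as np
from scipy.spatial.distance import cdist

def k_medoids(X: np.ndarray, k: int, n_iter: int = 20, seed: int = 0):
    """Simplified medoid-based partitioning (illustrative stand-in for PAM/CLARA)."""
    rng = np.random.default_rng(seed)
    medoids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each observation to its nearest medoid (Euclidean dissimilarity).
        labels = cdist(X, medoids).argmin(axis=1)
        # Move each medoid to the member with the minimal average dissimilarity.
        new_medoids = []
        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:               # empty cluster: keep the old medoid
                new_medoids.append(medoids[j])
                continue
            avg_dissim = cdist(members, members).mean(axis=1)
            new_medoids.append(members[avg_dissim.argmin()])
        new_medoids = np.array(new_medoids)
        if np.allclose(new_medoids, medoids):
            break
        medoids = new_medoids
    d = cdist(X, medoids)
    labels = d.argmin(axis=1)
    total_cost = d.min(axis=1).sum()            # total dissimilarity to the medoids
    return labels, medoids, total_cost

# Synthetic two-parameter events (90-day count and amount); values are illustrative.
rng = np.random.default_rng(1)
events = np.column_stack([
    rng.poisson(lam=8, size=2_000).astype(float),
    rng.lognormal(mean=8.0, sigma=1.0, size=2_000),
])

# Cost-versus-k scan: the "elbow" of this curve balances approximation (fit to the
# current data set) against generalization (re-performance on similar data sets).
for k in range(2, 9):
    _, _, cost = k_medoids(events, k)
    print(k, round(cost, 2))
```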
The next step is to determine the number of dimensions for parameter threshold analysis and to translate the
transactional data into ‘events’. An event is defined as a unique combination of all parameters for the identified scenario
or rule. The full transactional data set is translated into a population of events. Event bands are formed based on the
distribution of total events within the clusters. Event bands can be thought of as the boundaries between the clusters
(such that one or more parameters exhibit similarity).
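Continuing the earlier pandas sketches (and still assuming the hypothetical rolled feature table), translating transactions into events can be as simple as grouping by the unique combinations of tunable parameter values:

```python
# An "event" is a unique combination of the scenario's tunable parameter values.
# Grouping the feature table by those values yields the population of events
# and the volume observed for each one.
events = (
    rolled.groupby(["cnt_90d", "amt_90d"])
          .size()
          .rename("event_volume")
          .reset_index()
)
```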
Event Banding with One Parameter
When a scenario has only one tunable parameter (such as ‘amount’), bands for this parameter are ideally generated in 5%
increments beginning at the 50th percentile, resulting in ten candidate bands – P50, P55, P60, P65, P70, P75, P80, P85, P90, and
P95. The 50th percentile is chosen as a starting point to allow room for adjustment toward a more conservative
cluster/threshold, pending the results of the qualitative analysis. In other words, it is important to include clusters well
below, but still within reasonable range of, the target threshold defining the transaction activity that will be
considered quantitatively suspicious. Refer to Figure 1.3 below.
Figure 1.3 85th Percentile and BTL/ATL
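A sketch of the one-parameter banding, using synthetic skewed amounts purely for illustration:

```python
import numpy as np

# One tunable parameter: e.g., 90-day aggregate amount per customer (synthetic, skewed).
rng = np.random.default_rng(7)
amount = rng.lognormal(mean=8.0, sigma=1.0, size=10_000)

# Candidate event bands in 5% increments from the 50th through the 95th percentile.
levels = np.arange(50, 100, 5)                       # 50, 55, ..., 95
bands = {f"P{p}": np.percentile(amount, p) for p in levels}

for name, value in bands.items():
    print(f"{name}: {value:,.0f}")

# P85 is the usual starting baseline; BTL/ATL sampling then tests whether activity
# just below and just above this value is judged suspicious by investigators.
baseline = bands["P85"]
```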
Some parameters, such as ‘transaction count’, have a discrete range of values, so the bands may not be able to
be established exactly at the desired percentile levels. In these cases, judgment is necessary to establish reasonable bands.
Band values are often rounded to nearby numbers of a similar order of magnitude that are more easily socialized with
internal and external business partners. Each of these bands corresponds to a parameter value to be tested as a
prospective threshold for the scenario.
If the resulting clusters have ranges that are drastically different from one another, adjustment to the bands may be necessary
to make the clusters more reasonable while still maintaining a relatively even distribution of volume across the event
bands. This process is subjective and will differ from scenario to scenario, especially where a specific value for a
parameter is inherent in the essence of the rule (e.g., $10,000 for cash structuring). In many cases the nature of the
customer segment and activity being monitored may support creating fewer event bands due to the low volume of
activity for that segment.
Figure 1.4 Event Banding of 1 Parameter ‘Amount’
Event Banding with Two Parameters
When a scenario has two tunable parameters (such as ‘count’ and ‘amount’), an independent set of bands needs to be
established for each parameter, similar to the method used for one parameter.
Analysis of two tunable parameters may be thought of as ‘two-dimensional’: whereas one-parameter event banding is
based on a single parameter (one axis), event banding with two parameters is affected by two axes (x and y). For
example, ‘count’ may represent the x-axis, while ‘amount’ may represent the y-axis. In this sense, the ultimate threshold
is determined by a combination of both axes, and so are the event bands. Including additional parameters likewise
adds dimensions and complexity.
As discussed above, while the 85th percentile is used to determine the threshold line, bands are created through
clustering techniques starting at the 50th percentile to account for data points below, but still within reasonable range of,
the target threshold defining the transaction activity that will be considered quantitatively suspicious. In
the diagram below, we see banding across two parameters, count and value. Once the data is clustered, the 85th
percentile is identified from the distribution (upper-right table in Figure 1.5 below), and qualitative judgement is
exercised to set exact thresholds within a range that keeps the model conducive to re-performance (refer to the
discussion of “generalization” in the clustering modelling section above).
Figure 1.5 Event Banding of 2 Parameters ‘Value’/‘Count’
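A sketch of two-parameter banding with synthetic data, cross-tabulating event volumes across independently derived count and amount bands:

```python
import numpy as np
import pandas as pd

# Synthetic events with two tunable parameters: 90-day count and 90-day amount.
rng = np.random.default_rng(11)
events = pd.DataFrame({
    "count": rng.poisson(lam=8, size=10_000),
    "amount": rng.lognormal(mean=8.0, sigma=1.0, size=10_000),
})

# Independent band edges for each parameter, P50 through P95 in 5% increments.
levels = np.arange(50, 100, 5)
count_edges = np.unique(np.percentile(events["count"], levels))   # discrete: dedupe edges
amount_edges = np.percentile(events["amount"], levels)

# Cross-tabulate event volumes over the two sets of bands. The ultimate threshold is a
# combination of both axes (a count band AND an amount band), typically anchored near
# P85 and refined through qualitative review. Events outside the P50-P95 region fall
# outside the grid and are omitted here.
events["count_band"] = pd.cut(events["count"], bins=count_edges, include_lowest=True)
events["amount_band"] = pd.cut(events["amount"], bins=amount_edges, include_lowest=True)
print(pd.crosstab(events["count_band"], events["amount_band"]))
```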
Event Banding with more than Two Parameters
When a scenario has more than two tunable parameters (such as count, amount, and standard deviation), an independent
set of bands needs to be established for each parameter, similar to the method used for two parameters.
Select Threshold(s)
The output of Phase 3 is the ‘threshold’, or the ‘event’ characteristics (a combination of thresholds in the case of
multiple parameters), which serves as the baseline for ‘suspicious’ activity. If the threshold is set too low, too many alerts
may be generated, creating extraneous noise and straining BSA/AML investigative resources. Conversely, if the threshold
is set too high, suspicious activity may not generate alerts.
• Phase 4 | Sampling. Analytics applies the thresholds determined during the quantitative analysis phase to the
historical data in order to identify potential events for Above-the-Line (ATL) and Below-the-Line (BTL) analysis.
Events flagged ‘ATL’ are essentially alerts; because they are generated against historical data, they are referred to as
‘pseudo alerts’. The number of transactions that fall into the ATL or BTL category determines the number of random
samples required for a statistically significant qualitative assessment (see the sample-size sketch after this phase list).
The purpose of the samples is to evaluate the efficacy of Analytics’ calculated thresholds: if the threshold is
appropriately tuned, a larger percentage of events marked ‘ATL’ should be classified as ‘suspicious’ by an independent
FIU investigator than of the ‘BTL’ events. Analytics then packages these sample ATL and BTL transactions into a
format that is understandable and readable by an FIU investigator (samples must include the transactional detail
fields the FIU needs to determine the nature of the transactions).
• Phase 5 | Training. Analytics orients the FIU investigators to the scenario, parameters, and overall intent/spirit of
each rule so that during the qualitative analysis phase, the FIU investigators render appropriate independent
judgements for ATL and BTL samples.
• Phase 6 | Qualitative Analysis. The FIU assesses the sampled transactions from a qualitative perspective. During this
phase, an independent FIU investigator analyzes each sampled pseudo alert as they would a real alert (without
any bias from the alert’s classification as ATL or BTL). The investigator’s evaluation must include
consideration of the intent of each rule, and should include an assessment of both the qualitative and quantitative
fields associated with each alert. The FIU investigator will generally evaluate each transaction through a lens akin to:
“Given what is known from KYC, the origin/destination of funds, the beneficiary, etc., is it explainable that this
consumer/entity would transact this dollar amount at this frequency, velocity, pattern, and so on?” The FIU provides
feedback to Analytics for each pseudo alert, classified as (a) ‘Escalate-to-Case’, (b) ‘Alert Cleared – No Investigation
Required (false positive)’, (c) ‘Alert Cleared – Error’, or (d) ‘Insufficient Information’. If the efficacy is deemed
appropriate, a Business Review Session is scheduled to vote the rule into production.
• Phase 7 | Business Review Session. PO, Analytics and FIU present their findings for business review to voting
members.
• Phase 8 | Implementation. Analytics provides functional specifications to IT to implement the scenario within the
Transaction Monitoring System.
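The document does not prescribe a particular sample-size calculation for the ATL/BTL testing referenced in Phase 4. As one common approach, a sample size for estimating a proportion at a given confidence level, with a finite-population correction, can be sketched as follows; the confidence level, margin of error, assumed proportion, and population counts are all illustrative assumptions:

```python
import math

def sample_size(population: int, confidence_z: float = 1.96,
                margin_of_error: float = 0.05, p: float = 0.5) -> int:
    """Sample size for estimating a proportion, with finite-population correction.

    Defaults (95% confidence, 5% margin of error, p = 0.5 worst case) are
    illustrative assumptions; the actual testing standard is set by the institution.
    """
    n0 = (confidence_z ** 2) * p * (1 - p) / (margin_of_error ** 2)
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

# Example: pseudo-alert populations produced by applying a candidate threshold to
# historical data (hypothetical counts).
atl_population = 4_200   # events at or above the candidate threshold
btl_population = 1_150   # events just below the threshold, within the band range

print("ATL samples:", sample_size(atl_population))   # 353 with the defaults above
print("BTL samples:", sample_size(btl_population))   # 289 with the defaults above
```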