Using Machine Learning in Anti Money Laundering - Part 1
Background
Machine learning is being used, or experimented with, in all sorts of areas. Financial institutions are
leveraging (or looking to leverage) machine learning and artificial intelligence to improve how they run
their business.
In my desire to learn and understand machine learning, I decided to use an Anti Money Laundering (AML)
use case to see how machine learning can be applied to a real business scenario. AML activities include
Know Your Customer, Customer Due Diligence, Transaction Monitoring, Suspicious Activity Report (SAR)
filing, Sanctions Screening, etc.
Customer Risk Rating
During Customer Due Diligence, financial institutions perform a customer risk assessment to determine
the overall risk rating of a customer. This is typically done using the risk rating methodology defined
by the Compliance group. A customer is assessed against several risk factors and given a score. Based on
the calculated score, the customer is assigned a risk rating. The risk factors broadly fall into
Geography risk, Industry risk, Product risk, Channel risk, Relationship risk, Political risk, etc.
The customer risk rating is determined using a rules-based score, so one could argue that it is not an
ideal candidate for a machine learning use case. However, that is precisely why I want to use it: because
the rules produce a known rating for every customer, I can try various machine learning models and
measure how accurately each one reproduces it.
I have used a customer risk rating model with a limited number of risk factors. The risk factors that I
have used are:
1. Politically Exposed Person
2. Country of Residence
3. Length of Relationship
4. Number of Products
5. Net worth
6. Primary Product
Based on these risk factors, a risk score is calculated and the customer is classified as a Low, Medium
or High risk customer.
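To make the idea of a rules-based score concrete, here is a minimal sketch of how the six factors could be turned into a score and then a rating. The article does not give the actual rules, so every weight, threshold, country name and product name below is a hypothetical placeholder chosen only for illustration.

```python
# Hypothetical scoring rules: every weight, threshold and name below is an
# illustrative assumption, not the methodology used for the article's data.
HIGH_RISK_COUNTRIES = {"Country X", "Country Y"}  # placeholder names
HIGH_RISK_PRODUCTS = {"Private Banking"}          # placeholder name


def risk_score(is_pep, residence_country, relationship_years,
               num_products, net_worth, primary_product):
    """Add up the points contributed by each of the six risk factors."""
    score = 0.0
    if is_pep:
        score += 50  # assumed: PEP status alone reaches the High band
    if residence_country in HIGH_RISK_COUNTRIES:
        score += 20
    if relationship_years < 2:
        score += 10  # assumed: newer relationships are treated as riskier
    if num_products > 3:
        score += 5
    if net_worth > 1_000_000:
        score += 10
    if primary_product in HIGH_RISK_PRODUCTS:
        score += 10
    return score


def risk_class(score):
    """Map a numeric score to a Low, Medium or High rating."""
    if score >= 50:
        return "High"
    if score >= 20:
        return "Medium"
    return "Low"
```

For example, `risk_class(risk_score(True, "Country Z", 10, 1, 1000, "Checking"))` evaluates to "High" under these assumed weights, because the PEP flag alone contributes 50 points.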
Algorithms
Before I get into the machine learning experiments, I want to thank Microsoft for making Azure Machine
Learning Studio available for learning. I also want to thank edX.org for the machine learning classes that
are made available on edX.org.
There are many machine learning algorithms available, and I am going to experiment with the following
broad categories of algorithms (this is still a work in progress and my experiments will continue):
- Classification
- Regression
- Clustering
Classification is supervised learning that is used to predict a category. In this case the category is the
customer risk classification. There are three risk classifications: Low, Medium and High. Because there
are more than two categories, I used multi-class classification models.
Regression algorithms are used when a value is being predicted. In my learning, I will predict the risk
score and then use the risk score to risk rate a customer.
Clustering is an unsupervised learning technique that is used to segment data into clusters of similar
records. I will do this after the classification and regression experiments.
Preparing Data
Preparing data to train machine learning models usually consumes a lot of time. Since I created the data
myself, there was no real data quality, munging or cleansing work to do. However, I still had to do some
data prep work before I could start on my experiments. The data work that I did was:
- Remove one of the columns that I am not going to use
- Set the datatype of
  o IsPEP, Residence Country, Primary Product and Risk Class to String
  o Relationship Length, Number of Products and Networth to Integer
  o Risk Score to Float
- Set IsPEP, Residence Country, Primary Product to Categorical variables
- Set IsPEP, Residence Country, Relationship Length, Number of Products, Networth, Primary Product, Risk Score as Features
- Set Risk Class as label
- Normalized Relationship Length using MinMax transformation for values between 0 and 1
- Normalized Networth using ZScore transformation
- Risk Score was not normalized
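The two normalizations in the list above can be illustrated outside Azure Machine Learning Studio. The following is a plain-Python sketch of the standard min-max and z-score formulas, not the Studio module itself; the sample values are made up, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def min_max(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Subtract the mean and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population rather than sample std
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Made-up sample columns, for illustration only.
relationship_length = [1, 5, 10, 20]                   # years
net_worth = [50_000, 200_000, 1_000_000, 5_000_000]

scaled_length = min_max(relationship_length)  # all values now in [0, 1]
scaled_worth = z_score(net_worth)             # mean 0, standard deviation 1
```

Min-max keeps every value inside a fixed range, while z-score leaves the range open, so a single very large net worth does not squeeze all the other values toward zero.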
A quick note on features and labels. Features are the fields that a machine learning algorithm uses as
inputs to make a prediction. The label is the target variable that is to be predicted.
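As a small illustration of that split, the sketch below separates one customer record into its features and its label. The column names follow the data prep list; the values, including the "Y" encoding for IsPEP, are made up for the example.

```python
# One made-up customer record; only the column names come from the article.
record = {
    "IsPEP": "Y",
    "Residence Country": "Country X",
    "Relationship Length": 4,
    "Number of Products": 2,
    "Networth": 250_000,
    "Primary Product": "Checking",
    "Risk Score": 35.0,
    "Risk Class": "Medium",
}

LABEL_COLUMN = "Risk Class"


def split_features_label(row, label_column=LABEL_COLUMN):
    """Return (features, label): every column except the label, and the label."""
    features = {k: v for k, v in row.items() if k != label_column}
    return features, row[label_column]


features, label = split_features_label(record)
```

Here `label` holds the Risk Class and `features` holds the remaining seven columns that a model would be given as input.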
More on the classification experiments in Part 2.
Sundries
The data that I am using is dummy data. I created it based on my experience so that it reflects
real-life scenarios. E.g., if a customer is a PEP, that customer would in all likelihood be classified as
High risk.
The experiments done and the outcomes documented are my personal views and don’t reflect the views of
any organization.