Using Machine Learning in Anti Money Laundering – Part 1
Background
Machine learning is being used, or experimented with, in all sorts of areas. Financial institutions are leveraging (or looking to leverage) machine learning and artificial intelligence to improve how they run their business.
To learn and understand machine learning, I decided to use an anti-money laundering (AML) use case to see how machine learning can be applied to a real business scenario. AML activities include Know Your Customer (KYC), Customer Due Diligence (CDD), Transaction Monitoring, Suspicious Activity Report (SAR) filing, Sanctions Screening, etc.
Customer Risk Rating
During Customer Due Diligence, financial institutions perform a customer risk assessment to determine the overall risk rating of a customer. This is typically done using a risk rating methodology defined by the Compliance group. A customer is assessed against several risk factors and given a score, and based on that score the customer is assigned a risk rating. The risk factors broadly fall into Geography risk, Industry risk, Product risk, Channel risk, Relationship risk, Political risk, etc.
The customer risk rating is determined using a rules-based score, so one could argue that it is not an ideal candidate for a machine learning use case. However, that is precisely why I chose it: because the correct rating is known for every customer, I can try various machine learning models and measure how accurately each one reproduces it.
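Because the rules-based rating is known for every customer, judging a model reduces to comparing its predictions with those ratings. The following is a minimal Python sketch of that comparison (overall accuracy plus a confusion matrix); the function names and the sample labels are hypothetical and are not taken from Azure Machine Learning Studio.

```python
from collections import Counter

CLASSES = ("Low", "Medium", "High")


def accuracy(actual, predicted):
    """Fraction of customers whose predicted class matches the rules-based class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must be the same length")
    matches = sum(a == p for a, p in zip(actual, predicted))
    return matches / len(actual)


def confusion_matrix(actual, predicted, classes=CLASSES):
    """Rows are the actual (rules-based) classes, columns are the predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]


# Hypothetical example: rules-based labels versus one model's predictions.
actual = ["Low", "Low", "Medium", "High", "High", "Medium"]
predicted = ["Low", "Medium", "Medium", "High", "Medium", "Medium"]
print(accuracy(actual, predicted))
for row in confusion_matrix(actual, predicted):
    print(row)
```

The confusion matrix matters in AML because not all mistakes are equal: a High risk customer predicted as Medium is usually a more serious error than the reverse, and overall accuracy alone hides that.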
I built a customer risk rating model using a limited number of risk factors. The risk factors I used are:
1. Politically Exposed Person
2. Country of Residence
3. Length of Relationship
4. Number of Products
5. Net worth
6. Primary Product
Based on these risk factors, a risk score is calculated and the customer is classified as a Low, Medium or High risk customer.
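To make the idea of a rules-based score concrete, here is a minimal Python sketch built on the six risk factors above. Every weight, country list, product list and band threshold in it is a hypothetical placeholder, not the methodology behind this data set.

```python
from dataclasses import dataclass

# Hypothetical placeholders: not the actual risk rating methodology.
HIGH_RISK_COUNTRIES = {"CountryA", "CountryB"}
HIGH_RISK_PRODUCTS = {"Private Banking", "Correspondent Banking"}


@dataclass
class Customer:
    is_pep: bool
    residence_country: str
    relationship_years: int
    number_of_products: int
    net_worth: int
    primary_product: str


def risk_score(c: Customer) -> float:
    """Add up hypothetical points for each risk factor (higher means riskier)."""
    score = 0.0
    score += 50 if c.is_pep else 0
    score += 25 if c.residence_country in HIGH_RISK_COUNTRIES else 5
    score += 10 if c.relationship_years < 2 else 0  # new relationships score higher
    score += min(c.number_of_products, 5) * 2       # capped contribution
    score += 15 if c.net_worth > 5_000_000 else 0
    score += 20 if c.primary_product in HIGH_RISK_PRODUCTS else 0
    return score


def risk_class(score: float) -> str:
    """Map a score onto the Low / Medium / High bands (hypothetical cut-offs)."""
    if score >= 50:
        return "High"
    if score >= 25:
        return "Medium"
    return "Low"


examples = {
    "long-standing customer": Customer(False, "CountryC", 10, 1, 100_000, "Checking"),
    "new customer, high-risk country": Customer(False, "CountryA", 1, 2, 100_000, "Checking"),
    "PEP": Customer(True, "CountryC", 10, 1, 100_000, "Checking"),
}
for name, customer in examples.items():
    score = risk_score(customer)
    print(f"{name}: score={score}, class={risk_class(score)}")
```

The PEP weight is set high enough that a PEP always reaches the High band in this sketch; a real methodology would document its own weights and overrides.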
Algorithms
Before I get into the machine learning experiments, I want to thank Microsoft for making Azure Machine Learning Studio available for learning. I also want to thank edX.org for the machine learning classes it makes available.
There are many machine learning algorithms available. I am going to experiment with the following broad categories of algorithms (this is still a work in progress, and my experiments will continue):
- Classification
- Regression
- Clustering
Classification is a supervised learning technique used to predict a category. In this case, the category is the customer risk classification. There are three risk classes (Low, Medium and High), and because there are more than two categories, I used multi-class classification models.
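Azure Machine Learning Studio provides ready-made multi-class modules. Purely to show what a multi-class classifier does with three labels, here is a nearest-centroid classifier in plain Python. It is a deliberately simple algorithm chosen for illustration, not one of the models used in these experiments, and the feature vectors (PEP flag, high-risk-country flag, normalized relationship length) are invented.

```python
import math
from collections import defaultdict


def fit_centroids(rows, labels):
    """Average the feature vectors of each class to get one centroid per class."""
    sums = {}
    counts = defaultdict(int)
    for row, label in zip(rows, labels):
        total = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            total[i] += value
        counts[label] += 1
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}


def predict(centroids, row):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(row, centroids[label]))


# Hypothetical features: [is_pep, high_risk_country, normalized relationship length]
rows = [[0, 0, 0.9], [0, 0, 0.8], [0, 1, 0.5], [0, 1, 0.4], [1, 1, 0.1], [1, 0, 0.2]]
labels = ["Low", "Low", "Medium", "Medium", "High", "High"]
centroids = fit_centroids(rows, labels)
print(predict(centroids, [1, 0, 0.3]))
print(predict(centroids, [0, 0, 0.7]))
```

The same fit/predict shape applies to the richer multi-class models (decision forests, neural networks, logistic regression); only the rule that turns features into a class changes.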
Regression algorithms are used when a numeric value is being predicted. In my experiments, I will predict the risk score and then use the predicted score to assign the customer's risk rating.
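The predict-then-band approach can be sketched with ordinary least squares on a single feature, followed by a mapping from the predicted score to a risk class. This is a simplification under stated assumptions: the one-feature training data and the 25/50 cut-offs are hypothetical, and a real experiment would regress the score on all of the features at once.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def score_to_class(score, medium_at=25.0, high_at=50.0):
    """Turn a predicted risk score into a rating using hypothetical cut-offs."""
    if score >= high_at:
        return "High"
    if score >= medium_at:
        return "Medium"
    return "Low"


# Hypothetical training data: one numeric feature and the rules-based risk score.
xs = [0, 1, 2, 3, 4]
ys = [5.0, 17.0, 29.0, 41.0, 53.0]
slope, intercept = fit_line(xs, ys)

predicted = slope * 2.5 + intercept  # predict the score for an unseen customer
print(predicted, score_to_class(predicted))
```

One practical consequence of this design is that regression errors only matter near the band boundaries: a score predicted as 38 instead of 35 still yields Medium, while 48 versus 52 flips the rating.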
Clustering is an unsupervised learning technique used to segment data into clusters of similar records. I will try it after the classification and regression experiments.
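Since the clustering experiments have not been run yet, the following is only a sketch of what k-means, one common clustering algorithm, does: it alternates between assigning each customer to the nearest cluster centre and moving each centre to the mean of its members. The two-feature points (think normalized net worth and normalized relationship length) and the choice of k = 2 are hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: returns (centroids, cluster index for each point)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centres
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignment


# Hypothetical customers: (normalized net worth, normalized relationship length)
points = [(0.1, 0.2), (0.15, 0.1), (0.2, 0.15), (0.9, 0.8), (0.85, 0.95), (0.95, 0.9)]
centroids, assignment = kmeans(points, k=2)
print(assignment)
```

Unlike classification, nothing here uses the Risk Class label; whether the resulting segments line up with Low, Medium and High is something to check afterwards.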
Preparing Data
Preparing data to train machine learning models usually consumes a lot of time. Since I created the data myself, there was no real data quality, munging or cleansing work to do. However, I still had to do some data preparation before I could start my experiments. The steps I took were:
- Removed one column that I was not going to use
- Set the datatype of
o IsPEP, Residence Country, Primary Product and Risk Class to String
o Relationship Length, Number of Products and Networth to Integer
o Risk Score to Float
- Set IsPEP, Residence Country and Primary Product as categorical variables
- Set IsPEP, Residence Country, Relationship Length, Number of Products, Networth, Primary Product and Risk Score as features
- Set Risk Class as the label
- Normalized Relationship Length using a MinMax transformation, scaling values to between 0 and 1
- Normalized Networth using a ZScore transformation
- Left Risk Score unnormalized
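The two normalizations above can be written in a few lines of Python. This is a sketch of the standard formulas, min-max as (x - min) / (max - min) and z-score as (x - mean) / standard deviation, rather than a reproduction of Azure Machine Learning Studio's Normalize Data module; the sample values are invented.

```python
import statistics


def min_max(values):
    """Rescale linearly so the minimum becomes 0 and the maximum becomes 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Centre on the mean and divide by the (population) standard deviation."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return [0.0 for _ in values]
    return [(v - mean) / stdev for v in values]


relationship_length = [1, 3, 5, 9]      # hypothetical years
networth = [100_000, 250_000, 400_000]  # hypothetical amounts
print(min_max(relationship_length))
print(z_score(networth))
```

The sketch uses the population standard deviation (`statistics.pstdev`); some tools use the sample standard deviation instead, which gives slightly different z-scores on small data sets.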
A quick note on features and labels: features are the fields a machine learning algorithm uses as inputs to make a prediction, and the label is the target variable being predicted.
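The feature and label settings above can be sketched as turning each customer record into a numeric feature vector plus a label. The column names follow this article, but the two records are invented, and one-hot encoding of the categorical columns is an assumption about how a tool might encode them, not a description of what Azure Machine Learning Studio does internally.

```python
FEATURES = ["IsPEP", "Residence Country", "Relationship Length",
            "Number of Products", "Networth", "Primary Product", "Risk Score"]
CATEGORICAL = {"IsPEP", "Residence Country", "Primary Product"}
LABEL = "Risk Class"

# Two invented records; Relationship Length and Networth are shown already normalized.
ROWS = [
    {"IsPEP": "Yes", "Residence Country": "CountryA", "Relationship Length": 0.2,
     "Number of Products": 3, "Networth": 1.1, "Primary Product": "Private Banking",
     "Risk Score": 72.0, "Risk Class": "High"},
    {"IsPEP": "No", "Residence Country": "CountryB", "Relationship Length": 0.9,
     "Number of Products": 1, "Networth": -0.5, "Primary Product": "Checking",
     "Risk Score": 12.0, "Risk Class": "Low"},
]


def to_features_and_labels(rows):
    """Split records into numeric feature vectors (X) and labels (y)."""
    # Observed values of each categorical column, in a fixed (sorted) order.
    vocab = {col: sorted({row[col] for row in rows}) for col in CATEGORICAL}
    X, y = [], []
    for row in rows:
        vector = []
        for col in FEATURES:
            if col in CATEGORICAL:
                # One-hot: one 0/1 indicator per observed category value.
                vector.extend(1.0 if row[col] == value else 0.0 for value in vocab[col])
            else:
                vector.append(float(row[col]))
        X.append(vector)
        y.append(row[LABEL])
    return X, y


X, y = to_features_and_labels(ROWS)
print(y)
print(X[0])
```

Because Risk Class is derived from Risk Score, keeping the score as a feature lets a classifier recover the label almost directly; dropping it is an easy variation to try when comparing models.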
More on the classification experiments in Part 2.
Sundries
The data that I am using is dummy data. I created it based on my experience, and it reflects real-life scenarios. For example, if a customer is a PEP, that customer would in all likelihood be classified as High risk.
The experiments and the documented outcomes reflect my personal views and not the views of any organization.
