Banks, payment providers, and capital markets firms are under intense regulatory mandates to process huge amounts of transaction-related data from both traditional and non-traditional sources. Compliance teams need to constantly analyze data in motion (wires, fund transfers, banking transactions) and data at rest (years' worth of historical data) for the actionable intelligence required for Suspicious Activity Reports, in order to discover illegal activity and provide detailed reporting to authorities. Annual estimates of global money laundering flows range anywhere from $1 trillion to $2 trillion, almost 5% of global GDP. Almost all of this is laundered via retail and merchant banks, payment networks, securities and futures firms, casino services and clubs, and the like, which explains why annual AML-related fines on banking organizations run into the billions and are increasing every year. However, the number of SARs (Suspicious Activity Reports) filed by banking institutions is much higher as a category than the number filed by these other businesses. In this presentation we will discuss the business imperatives, the value drivers, and the woeful inadequacy of current technology architectures and approaches in tackling AML. We will then pivot to a deep dive into how Big Data and Predictive Analytics can ease and solve these vexing challenges that banking executives are grappling with globally.
The talk will have three parts: an overview of practical applications of AI and ML in the FinTech industry, with a short explanation of the PSD2 directive and the disruption it caused; applications of AI/ML from the perspective of the end user (personal financial health, financial coaching, etc.); and an overview of the architecture, technologies, and frameworks used, with practical examples from the Zuper company.
Credit Card Fraudulent Transaction Detection Research Paper - Garvit Burad
Credit card fraudulent transaction detection research paper using machine learning technologies like Logistic Regression, Random Forest, and Feature Engineering, plus various techniques to deal with a highly skewed dataset.
Towards the Next Generation Financial Crimes Platform - How Data, Analytics, ... - Molly Alexander
Towards the Next Generation Financial Crimes Platform - How Data, Analytics, & ML Are Transforming the Fight Against Fraud, AML & Cybersecurity - Nadeem Asghar
Enterprise Fraud Management: How Banks Need to Adapt - Capgemini
Fraud prevention is becoming one of the biggest areas of concern for the financial services industry, but first-generation fraud management systems are falling short. By moving towards a more enterprise-wide approach to fraud management, financial institutions can combat the increasingly treacherous fraud and cybercrime landscape while reaping numerous benefits for the organization.
Measuring and Managing Credit Risk With Machine Learning and Artificial Intel... - Accenture
In recent years, technological developments have undergone in-depth analysis among banks, but we are still far from attaining mature levels, both methodologically and in the credit granting, monitoring, and control processes. Banks should equip themselves with new, more structured Model Risk frameworks to manage new machine learning model validation paradigms. Learn more from Accenture Finance & Risk: https://accntu.re/2qGUUMx
Tutorial on 'Explainability for NLP' given at the first ALPS (Advanced Language Processing) winter school: http://lig-alps.imag.fr/index.php/schedule/
The talk introduces the concepts of 'model understanding' as well as 'decision understanding' and provides examples of approaches from the areas of fact checking and text classification.
Exercises to go with the tutorial are available here: https://github.com/copenlu/ALPS_2021
Fraud detection is a popular application of machine learning, but it is not as obvious or as common as it seems. I'll tell how QuantUp implemented it for the WARTA insurance company (a subsidiary of Talanx International AG).
The models developed reduced losses by between 10% and 30%. The project was not a simple one because of the complex claims-handling process and the really rich dataset involved. The tools applied were R (modeling) and DataWalk (data preparation). You will learn what is important in the development of such solutions in general, what was difficult in this particular project, and how to overcome possible difficulties in similar projects.
Fraud continues to proliferate across financial institutions, through multiple lines of business and banking channels. Increasingly sophisticated criminal tactics and the proliferation of organized crime rings make detecting fraud difficult and preventing it nearly impossible. Adding to the complexity is increased globalization and growth through mergers and acquisitions, which makes it harder to effectively monitor multiple portfolios and business lines. The presentation discusses best practices and ideas around the prevention, investigation, and detection of possible fraudulent activities across multiple industries.
Lifting the Barriers to Retail Innovation in ASEAN | A.T. Kearney - Kearney
Rising incomes and growing demand for consumer goods and services in ASEAN create rich opportunities for retailers in the region, which is especially significant as member nations join forces to become an economic powerhouse. Yet ASEAN retailers have been slow to innovate, and as this market opens up, stepping up innovation is required to capitalize fully on the opportunities.
AI-powered decision making in banks - how banks today are using advanced analytics in credit decisioning to enhance customer lifetime value, lower operating costs, and strengthen customer acquisition.
Crime sensing with big data - Singapore perspective - Benjamin Ang
This presentation examines the potential and limitations of using big data for crime estimation. Singapore laws discussed include the Personal Data Protection Act (PDPA), the Penal Code, and the Criminal Procedure Code (CPC). Topics covered include crime analysis, crime prediction, algorithm bias, and other risks. The video of this presentation can be found at https://youtu.be/kctB3lRLh2U
Smarter Fraud Detection With Graph Data Science - Neo4j
Join us for this 20-minute webinar to hear from Nick Johnson, Product Marketing Manager for Graph Data Science, to learn the basics of Neo4j Graph Data Science and how it can help you to identify fraudulent activities faster.
Build Intelligent Fraud Prevention with Machine Learning and Graphs - Neo4j
See how financial services, banking and retail are using graph-enhanced machine learning to thwart fraud. Fraudsters are becoming increasingly sophisticated, organized and adaptive; traditional, rule-based solutions are not broad or nimble enough to deal with this reality. This session will cover several demonstrations and real-world technical examples including preventing credit card fraud, identifying money laundering and reducing false positives.
The Covid-19 pandemic necessitated a facelift for the payments industry, sparked by novel approaches from new-age players, fostered by industry consolidation, and driven by customers' demand for end-to-end experience. Crossing the threshold, the industry is entering a new era - Payments 4.X - where payments are embedded, invisible, and an enabling function for a frictionless customer experience. As customers make a permanent shift to next-gen payment methods, Digital IDs are critical for a seamless payment experience. The B2B payments segment is witnessing rapid digitization. BigTechs, PayTechs, and industry newcomers are ready to jump in with newfangled solutions to help underserved small to medium-sized businesses (SMBs).
As incumbents struggle with profits, new-age firms are forging ahead to take the lead in the Payments 4.X era by riding the success of non-card products and services. The new era demands collaboration and platformification, and firms can unleash full market potential only by embracing API-based business models and open ecosystems. Data prowess and enhanced payment processing capabilities are essential to thrive. The clock is ticking for banks and traditional payments firms because competitive advantage is not guaranteed forever. As industry players seek economies of scale, consolidations loom, and non-banks explore new territories to threaten incumbents' market share. While all these 2022 trends are at play, central bank digital currency (CBDC) is emerging globally and might open a new chapter in the current payments landscape.
This second machine age has seen the rise of artificial intelligence (AI), or “intelligence” that is not the result of human cogitation. It is now ubiquitous in many commercial products, from search engines to virtual assistants. AI is the result of exponential growth in computing power, memory capacity, cloud computing, distributed and parallel processing, open-source solutions, and global connectivity of both people and machines. The massive volume and velocity of structured and unstructured (e.g., text, audio, video, sensor) data being generated has made it a necessity to process that data speedily and generate meaningful, actionable insights from it.
The journey from open banking to open finance+: the evolution of API-based open banking as it stands now and where it could go from here, plus risks and opportunities for market participants.
With volatile markets, an edgy economy, organizational change, and an evolving regulatory landscape, finance divisions are caught up in a rapid increase in public scrutiny and change. All the while, the need for cost cutting and transparent reporting remains constant. Rolta's financial analytics solution CFO Impact helps you bring cost-effective and sustainable transformation to financial processes and systems with the help of big data analytics technologies.
An overview of the FRAUD solution specific to the GCC market. Includes specific policy rules, negative data, and scorecards built upon 350,000 historical accounts.
This presentation explores what the future of commerce may look like given current trends in mobile devices, digital payments, social commerce, and security, including tokenization and new forms of identity verification.
Webinar Deck: Efficient Methods for Managing Global Cash in Today's Regulator... - Kyriba Corporation
Check out our PowerPoint for Efficient Methods for Managing Global Cash in Today's Regulatory Regime, where the expert speakers explored proven liquidity and intercompany cash management strategies, as well as tax/treasury collaborative initiatives that can help optimize global cash in an ever-changing, complex environment.
Early Stage Fintech Investment Thesis (Sept 2016) - Earnest Sweat
Here is an example of a personal investment thesis that I created to share with venture capital firms. In this example, I provide my personal perspective on the fintech sector. For details on how I built this thesis, check out my blog (https://goo.gl/CU4Qid).
Note: Some of the confidential information has been redacted for privacy.
Corporate Treasurers Focus on Cyber Security - Joan Weber
Treasury departments at large U.S. companies rank IT security as their top priority for 2015 - ahead of such critical issues as cost management and regulatory/compliance challenges.
These findings come from the Greenwich Associates 2014 U.S. Large Corporate Finance Study, for which the firm interviewed CFOs or treasury department representatives at more than 500 large U.S. companies.
The study results suggest that U.S. companies are taking action to address security concerns and other IT issues, with 63% of participants saying their treasury departments will increase technology spending in the year ahead.
In this session we will discuss the business case for a proactive, real-time fraud prevention strategy which enables you to maximize revenue opportunities whilst minimizing fraud. During the session we will create a fraud management check list which combines People, Processes and Technology, underpinned by data, analysis and tailored rules.
Sample Report: Fraud and Security in Global Online Payments 2016 - yStats.com
Free report samples for our publication "Fraud and Security in Global Online Payments 2016".
Find the full report available for purchase at: https://ystats.com/shop/fraud-and-security-in-global-online-payments-2022/
Deloitte Dbriefs Program Guide | April - June 2014 - Franco Ferrario
Object: Anticipating tomorrow's complex issues and new strategies is a challenge. Stay tuned in with Dbriefs live webcasts that give you valuable insights on important developments affecting your business.
Uploaded by Franco Ferrario, Technology Executive; Deloitte Evangelist
ActiveInsight offers real-time, value-based detection of and reaction to complex event patterns. This presentation gives an overview of the business needs, ActiveInsight's features, and several relevant use cases. See http://www.activeinsight.net for more information.
Introduction: This workshop will provide a hands-on introduction to Machine Learning (ML) with an overview of Deep Learning (DL).
Format: An introductory lecture on several supervised and unsupervised ML techniques, followed by a light introduction to DL and a short discussion of the current state of the art. Several Python code samples using the scikit-learn library will be introduced that users will be able to run in the Cloudera Data Science Workbench (CDSW).
Objective: To provide a quick, hands-on introduction to ML with Python's scikit-learn library. The environment in CDSW is interactive, and the step-by-step guide will walk you through everything from setting up your environment to exploring datasets and training and evaluating models on popular datasets. By the end of the crash course, attendees will have a high-level understanding of popular ML algorithms and the current state of DL, know what problems they can solve, and walk away with basic hands-on experience training and evaluating ML models.
Prerequisites: For the hands-on portion, registrants must bring a laptop with a Chrome or Firefox web browser. These labs will be done in the cloud; no installation is needed. Everyone will be able to register and start using CDSW after the introductory lecture concludes (about 1 hr in). Basic knowledge of Python is highly recommended.
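As a taste of the hands-on portion, here is a minimal sketch of the kind of scikit-learn exercise the labs walk through; the dataset and model choices are illustrative, not the workshop's actual lab code.

```python
# Minimal sketch: train and evaluate a classifier with scikit-learn.
# Illustrative only -- the workshop's actual labs run inside CDSW.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```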
Floating on a RAFT: HBase Durability with Apache Ratis - DataWorks Summit
In a world with a myriad of distributed storage systems to choose from, the majority of Apache HBase clusters still rely on Apache HDFS. Theoretically, any distributed file system could be used by HBase. One major reason HDFS is predominantly used is the specific durability requirements of HBase's write-ahead log (WAL), and HDFS provides that guarantee correctly. However, HBase's use of HDFS for WALs can be replaced with sufficient effort.
This talk will cover the design of a "Log Service" which can be embedded inside of HBase and provides the level of durability that HBase requires for WALs. Apache Ratis (incubating) is a library implementation of the RAFT consensus protocol in Java and is used to build this Log Service. We will cover the design choices of the Ratis Log Service, comparing and contrasting it to other log-based systems that exist today. Next, we'll cover how the Log Service "fits" into HBase and the necessary changes to HBase which enable this. Finally, we'll discuss how the Log Service can simplify the operational burden of HBase.
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi - DataWorks Summit
Utilizing Apache NiFi, we read various open data REST APIs and camera feeds to ingest crime and related data in real time, streaming it into HBase and Phoenix tables. HBase makes an excellent storage option for our real-time time-series data sources. We can immediately query our data utilizing Apache Zeppelin against Phoenix tables, as well as Hive external tables over HBase.
Apache Phoenix tables also make a great option since we can easily put microservices on top of them for application usage. I have an example Spring Boot application that reads from our Philadelphia crime table for front-end web applications as well as RESTful APIs.
Apache NiFi makes it easy to push records with schemas to HBase and insert into Phoenix SQL tables.
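To make the Phoenix side concrete, here is a minimal sketch using the phoenixdb Python client against the Phoenix Query Server; the crime_events table and its columns are hypothetical stand-ins for the actual crime schema.

```python
# Minimal sketch: create and upsert into a Phoenix table through the
# Phoenix Query Server. Table name and columns are hypothetical.
import datetime
import phoenixdb

conn = phoenixdb.connect("http://localhost:8765/", autocommit=True)
cur = conn.cursor()
cur.execute(
    "CREATE TABLE IF NOT EXISTS crime_events ("
    "  event_id VARCHAR PRIMARY KEY,"
    "  event_time TIMESTAMP,"
    "  offense VARCHAR)"
)
cur.execute(
    "UPSERT INTO crime_events VALUES (?, ?, ?)",
    ("evt-0001", datetime.datetime(2019, 5, 1, 12, 0), "theft"),
)
cur.execute("SELECT offense, COUNT(*) FROM crime_events GROUP BY offense")
print(cur.fetchall())
```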
Resources:
https://community.hortonworks.com/articles/54947/reading-opendata-json-and-storing-into-phoenix-tab.html
https://community.hortonworks.com/articles/56642/creating-a-spring-boot-java-8-microservice-to-read.html
https://community.hortonworks.com/articles/64122/incrementally-streaming-rdbms-data-to-your-hadoop.html
HBase Tales From the Trenches - Short stories about most common HBase operati... - DataWorks Summit
Whilst HBase is the most logical answer for use cases requiring random, real-time read/write access to big data, it may not be so trivial to design applications that make the most of it, nor the simplest to operate. As it depends on and integrates with other components from the Hadoop ecosystem (ZooKeeper, HDFS, Spark, Hive, etc.) or external systems (Kerberos, LDAP), and its distributed nature requires a "Swiss clockwork" infrastructure, many variables are to be considered when observing anomalies or even outages. Adding to the equation, there's also the fact that HBase is still an evolving product, with different release versions in use currently, some of which carry genuine software bugs. In this presentation, we'll go through the most common HBase issues faced by different organisations, describing the identified causes and resolution actions from my last 5 years supporting HBase for our heterogeneous customer base.
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac... - DataWorks Summit
LocationTech GeoMesa enables spatial and spatiotemporal indexing and queries for HBase and Accumulo. In this talk, after an overview of GeoMesa’s capabilities in the Cloudera ecosystem, we will dive into how GeoMesa leverages Accumulo’s Iterator interface and HBase’s Filter and Coprocessor interfaces. The goal will be to discuss both what spatial operations can be pushed down into the distributed database and also how the GeoMesa codebase is organized to allow for consistent use across the two database systems.
OCLC has been using HBase since 2012 to enable single-search-box access to over a billion items from your library and the world’s library collection. This talk will provide an overview of how HBase is structured to provide this information and some of the challenges they have encountered to scale to support the world catalog and how they have overcome them.
Many individuals/organizations have a desire to utilize NoSQL technology, but often lack an understanding of how the underlying functional bits can be utilized to enable their use case. This situation can result in drastic increases in the desire to put the SQL back in NoSQL.
Since the initial commit, Apache Accumulo has provided a number of examples to help jumpstart comprehension of how some of these bits function as well as potentially help tease out an understanding of how they might be applied to a NoSQL friendly use case. One very relatable example demonstrates how Accumulo could be used to emulate a filesystem (dirlist).
In this session we will walk through the dirlist implementation. Attendees should come away with an understanding of the supporting table designs, a simple text search supporting a single wildcard (on file/directory names), and how the dirlist elements work together to accomplish its feature set. Attendees should (hopefully) also come away with a justification for sometimes keeping the SQL out of NoSQL.
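To give a flavor of the table design before the session: the dirlist example encodes each path's depth into the row ID so that all entries at one level of a directory tree sort together and can be scanned as a single range. A simplified Python illustration of that scheme (not the Accumulo example's actual code):

```python
# Simplified illustration of a dirlist-style row-key scheme: prefix
# each path with its zero-padded depth so one directory level forms a
# contiguous, scannable range in a sorted key-value store.
def dirlist_row(path: str) -> str:
    depth = 0 if path == "/" else path.rstrip("/").count("/")
    return f"{depth:03d}{path}"

rows = sorted(dirlist_row(p) for p in ["/", "/etc", "/home", "/home/alice", "/home/bob"])
print(rows)
# ['000/', '001/etc', '001/home', '002/home/alice', '002/home/bob']
```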
HBase Global Indexing to support large-scale data ingestion at Uber - DataWorks Summit
Data serves as the platform for decision-making at Uber. To facilitate data-driven decisions, many datasets at Uber are ingested into a Hadoop Data Lake and exposed to querying via Hive. Analytical queries joining various datasets are run to better understand business data at Uber.
Data ingestion, in its most basic form, is about organizing data to balance efficient reading and writing of newer data. Data organization for efficient reading involves factoring in query patterns to partition data so that read amplification stays low. Data organization for efficient writing involves factoring in the nature of the input data - whether it is append-only or updatable.
At Uber we ingest terabytes of data into many critical tables, such as trips, that are updatable. These tables are a fundamental part of Uber's data-driven solutions and act as the source of truth for all analytical use cases across the entire company. Datasets such as trips constantly receive updates apart from inserts. To ingest such datasets we need a critical component that is responsible for bookkeeping the data layout and annotates each incoming change with the location in HDFS where the data should be written. This component is called Global Indexing. Without it, all records get treated as inserts and get re-written to HDFS instead of being updated, which leads to duplication of data and breaks data correctness and user queries. This component is key to scaling our jobs, where we now handle greater than 500 billion writes a day in our current ingestion systems. It needs to have strong consistency and provide high throughput for index writes and reads.
At Uber, we have chosen HBase as the backing store for the Global Indexing component, which is critical in allowing us to scale our jobs to the more than 500 billion writes a day our ingestion systems now handle. In this talk, we will discuss data at Uber and expound on why we built the global index using Apache HBase and how it helps scale our cluster usage. We'll give details on why we chose HBase over other storage systems; how and why we came up with a creative solution to load HFiles directly into the backend, circumventing the normal write path when bootstrapping our ingestion tables to avoid QPS constraints; and other learnings from bringing this system into production at the scale of data that Uber encounters daily.
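At its core, such a global index is a consistent key-to-location lookup. A minimal sketch of the idea with the happybase Python client (the table and column names are hypothetical; Uber's actual component is considerably more involved):

```python
# Minimal sketch of a global-index lookup in HBase via happybase: map
# each record key to the file that currently holds the record, so an
# incoming change is routed as an update rather than re-inserted.
import happybase

conn = happybase.Connection("localhost")  # HBase Thrift server
index = conn.table("record_index")        # hypothetical index table

def annotate(record_key: bytes, new_file: bytes) -> bytes:
    row = index.row(record_key)
    location = row.get(b"loc:file")
    if location is None:                   # unseen key: treat as insert
        index.put(record_key, {b"loc:file": new_file})
        return new_file
    return location                        # known key: route as update
```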
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix - DataWorks Summit
Recently, Apache Phoenix has been integrated with Apache (incubator) Omid transaction processing service, to provide ultra-high system throughput with ultra-low latency overhead. Phoenix has been shown to scale beyond 0.5M transactions per second with sub-5ms latency for short transactions on industry-standard hardware. On the other hand, Omid has been extended to support secondary indexes, multi-snapshot SQL queries, and massive-write transactions.
These innovative features make Phoenix an excellent choice for translytics applications, which allow converged transaction processing and analytics. We share the story of building the next-gen data tier for advertising platforms at Verizon Media that exploits Phoenix and Omid to support multi-feed real-time ingestion and AI pipelines in one place, and discuss the lessons learned.
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi - DataWorks Summit
Cybersecurity requires an organization to collect data, analyze it, and alert on cyber anomalies in near real time. This is a challenging endeavor when considering the variety of data sources which need to be collected and analyzed. Everything from application logs, network events, authentication systems, IoT devices, business events, cloud service logs, and more needs to be taken into consideration. In addition, multiple data formats need to be transformed and conformed to be understood by both humans and ML/AI algorithms.
To solve this problem, the Aetna Global Security team developed the Unified Data Platform based on Apache NiFi, which allows them to remain agile and adapt to new security threats and the onboarding of new technologies in the Aetna environment. The platform currently has over 60 different data flows with 95% doing real-time ETL and handles over 20 billion events per day. In this session learn from Aetna’s experience building an edge to AI high-speed data pipeline with Apache NiFi.
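The "transform and conform" step amounts to mapping each source's fields onto one common event schema. A hedged Python sketch of the idea, with invented field names (in the platform itself this logic lives in NiFi processors, not application code):

```python
# Illustrative only: conform two hypothetical source formats into one
# common event schema, as a NiFi flow's transform stage would.
def conform_auth_event(raw: dict) -> dict:
    return {
        "timestamp": raw["ts"],
        "source": "auth",
        "user": raw["userName"],
        "outcome": "success" if raw["result"] == 0 else "failure",
    }

def conform_firewall_event(raw: dict) -> dict:
    return {
        "timestamp": raw["event_time"],
        "source": "firewall",
        "user": raw.get("user", "unknown"),
        "outcome": raw["action"],
    }
```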
In the healthcare sector, data security, governance, and quality are crucial for maintaining patient privacy and ensuring the highest standards of care. At Florida Blue, the leading health insurer of Florida serving over five million members, there is a multifaceted network of care providers, business users, sales agents, and other divisions relying on the same datasets to derive critical information for multiple applications across the enterprise. However, maintaining consistent data governance and security for protected health information and other extended data attributes has always been a complex challenge that did not easily accommodate the wide range of needs for Florida Blue’s many business units. Using Apache Ranger, we developed a federated Identity & Access Management (IAM) approach that allows each tenant to have their own IAM mechanism. All user groups and roles are propagated across the federation in order to determine users’ data entitlement and access authorization; this applies to all stages of the system, from the broadest tenant levels down to specific data rows and columns. We also enabled audit attributes to ensure data quality by documenting data sources, reasons for data collection, date and time of data collection, and more. In this discussion, we will outline our implementation approach, review the results, and highlight our “lessons learned.”
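For a sense of what automating such policies can look like, here is a hedged sketch that creates a simple Hive access policy through Ranger's public REST API; the endpoint and JSON shape follow the v2 public API but should be verified against your Ranger version, and every host, service, and user name below is a placeholder.

```python
# Hedged sketch: create a Hive access policy via Ranger's public REST
# API. All host, service, and user names are placeholders.
import requests

policy = {
    "service": "hive_service",      # Ranger service name (placeholder)
    "name": "claims_readonly",
    "resources": {
        "database": {"values": ["claims"]},
        "table": {"values": ["members"]},
        "column": {"values": ["*"]},
    },
    "policyItems": [{
        "users": ["analyst"],
        "accesses": [{"type": "select", "isAllowed": True}],
    }],
}
resp = requests.post(
    "https://ranger.example.com:6182/service/public/v2/api/policy",
    json=policy,
    auth=("admin", "password"),
)
resp.raise_for_status()
```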
Presto: Optimizing Performance of SQL-on-Anything Engine - DataWorks Summit
Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Proven at scale in a variety of use cases at Airbnb, Bloomberg, Comcast, Facebook, FINRA, LinkedIn, Lyft, Netflix, Twitter, and Uber, in the last few years Presto experienced an unprecedented growth in popularity in both on-premises and cloud deployments over Object Stores, HDFS, NoSQL and RDBMS data stores.
With the ever-growing list of connectors to new data sources such as Azure Blob Storage, Elasticsearch, Netflix Iceberg, Apache Kudu, and Apache Pulsar, the recently introduced Cost-Based Optimizer in Presto must account for heterogeneous inputs with differing and often incomplete data statistics. This talk will explore this topic in detail as well as discuss best use cases for Presto across several industries. In addition, we will present recent Presto advancements such as geospatial analytics at scale and the project roadmap going forward.
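One concrete way to exercise the "SQL-on-anything" idea is a federated join across two catalogs. A minimal sketch with the presto-python-client; the host, catalogs, and table names are placeholders:

```python
# Minimal sketch: join Hive and MySQL catalogs in a single Presto
# query via the presto-python-client. All names are placeholders.
import prestodb

conn = prestodb.dbapi.connect(
    host="presto.example.com", port=8080,
    user="analyst", catalog="hive", schema="default",
)
cur = conn.cursor()
cur.execute("""
    SELECT o.order_id, c.segment
    FROM hive.sales.orders AS o
    JOIN mysql.crm.customers AS c ON o.customer_id = c.id
    LIMIT 10
""")
for row in cur.fetchall():
    print(row)
```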
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl... - DataWorks Summit
Specialized tools for machine learning development and model governance are becoming essential. MLflow is an open source platform for managing the machine learning lifecycle. Just by adding a few lines of code in the function or script that trains their model, data scientists can log parameters, metrics, artifacts (plots, miscellaneous files, etc.) and a deployable packaging of the ML model. Every time that function or script is run, the results will be logged automatically as a byproduct of those lines of code being added, even if the party doing the training run makes no special effort to record the results. MLflow application programming interfaces (APIs) are available for the Python, R and Java programming languages, and MLflow sports a language-agnostic REST API as well. Over a relatively short time period, MLflow has garnered more than 3,300 stars on GitHub, almost 500,000 monthly downloads and 80 contributors from more than 40 companies. Most significantly, more than 200 companies are now using MLflow. We will demo the MLflow Tracking, Projects and Models components with Azure Machine Learning (AML) Services and show you how easy it is to get started with MLflow on-prem or in the cloud.
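The "few lines of code" claim is easy to illustrate. A minimal sketch of MLflow tracking wrapped around a scikit-learn training run (the parameter and dataset choices are illustrative):

```python
# Minimal sketch: log a parameter, a metric, and the fitted model with
# MLflow tracking; each run is recorded automatically.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
with mlflow.start_run():
    C = 0.5
    mlflow.log_param("C", C)
    model = LogisticRegression(C=C, max_iter=200).fit(X, y)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    mlflow.sklearn.log_model(model, "model")
```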
Extending Twitter's Data Platform to Google Cloud - DataWorks Summit
Twitter's Data Platform is built using multiple complex open source and in-house projects to support data analytics on hundreds of petabytes of data. Our platform supports storage, compute, data ingestion, discovery, and management, along with various tools and libraries that help users with both batch and real-time analytics. Our Data Platform operates on multiple clusters across different data centers to help thousands of users discover valuable insights. As we were scaling our Data Platform to multiple clusters, we also evaluated various cloud vendors to support use cases outside of our data centers. In this talk we share our architecture and how we extend our data platform to use the cloud as another data center. We walk through our evaluation process and the challenges we faced supporting data analytics at Twitter scale on the cloud, and we present our current solution. Extending Twitter's Data Platform to the cloud was a complex task, which we deep-dive into in this presentation.
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi - DataWorks Summit
At Comcast, our team has been architecting a customer experience platform which is able to react to near-real-time events and interactions and deliver appropriate and timely communications to customers. By combining the low latency capabilities of Apache Flink and the dataflow capabilities of Apache NiFi we are able to process events at high volume to trigger, enrich, filter, and act/communicate to enhance customer experiences. Apache Flink and Apache NiFi complement each other with their strengths in event streaming and correlation, state management, command-and-control, parallelism, development methodology, and interoperability with surrounding technologies. We will trace our journey from starting with Apache NiFi over three years ago and our more recent introduction of Apache Flink into our platform stack to handle more complex scenarios. In this presentation we will compare and contrast which business and technical use cases are best suited to which platform and explore different ways to integrate the two platforms into a single solution.
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger - DataWorks Summit
Companies are increasingly moving to the cloud to store and process data. One of the challenges companies have is securing data across hybrid environments with an easy way to centrally manage policies. In this session, we will talk through how companies can use Apache Ranger to protect access to data both in on-premise and in cloud environments. We will go into detail on the challenges of the hybrid environment and how Ranger can solve them. We will also talk through how companies can further enhance security by leveraging Ranger to anonymize or tokenize data while moving it into the cloud and to de-anonymize it dynamically using Apache Hive, Apache Spark, or when accessing data from cloud storage systems. We will also deep-dive into Ranger's integration with AWS S3, AWS Redshift, and other cloud-native systems. We will wrap up with an end-to-end demo showing how policies can be created in Ranger and used to manage access to data in different systems, anonymize or de-anonymize data, and track where data is flowing.
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory... - DataWorks Summit
Advanced Big Data Processing frameworks have been proposed to harness the fast data transmission capability of Remote Direct Memory Access (RDMA) over high-speed networks such as InfiniBand, RoCEv1, RoCEv2, iWARP, and OmniPath. However, with the introduction of the Non-Volatile Memory (NVM) and NVM express (NVMe) based SSD, these designs along with the default Big Data processing models need to be re-assessed to discover the possibilities of further enhanced performance. In this talk, we will present, NRCIO, a high-performance communication runtime for non-volatile memory over modern network interconnects that can be leveraged by existing Big Data processing middleware. We will show the performance of non-volatile memory-aware RDMA communication protocols using our proposed runtime and demonstrate its benefits by incorporating it into a high-performance in-memory key-value store, Apache Hadoop, Tez, Spark, and TensorFlow. Evaluation results illustrate that NRCIO can achieve up to 3.65x performance improvement for representative Big Data processing workloads on modern data centers.
Background: Some early applications of Computer Vision in Retail arose from e-commerce use cases - but increasingly, it is being used in physical stores in a variety of new and exciting ways, such as:
● Optimizing merchandising execution, in-stocks and sell-thru
● Enhancing operational efficiencies, enabling real-time customer engagement
● Enhancing loss prevention capabilities, response time
● Creating frictionless experiences for shoppers
Abstract: This talk will cover the use of Computer Vision in Retail, the implications to the broader Consumer Goods industry and share business drivers, use cases and benefits that are unfolding as an integral component in the remaking of an age-old industry.
We will also take a ‘peek under the hood’ of Computer Vision and Deep Learning, sharing technology design principles and skill set profiles to consider before starting your CV journey.
Deep learning has matured considerably in the past few years to produce human or superhuman abilities in a variety of computer vision paradigms. We will discuss ways to recognize these paradigms in retail settings, collect and organize data to create actionable outcomes with the new insights and applications that deep learning enables.
We will cover the basics of object detection, then move into the advanced processing of images, describing possible ways a retail store of the near future could operate: identifying various storefront situations by attaching a deep learning system to a camera stream - such things as identifying item stocks on shelves, a shelf in need of organization, or perhaps a wandering customer in need of assistance.
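A minimal sketch of the detection building block behind such scenarios, using a pretrained torchvision model; a generic COCO-trained detector and a single image file stand in here for a retail-trained model on a live camera stream.

```python
# Minimal sketch: run a pretrained COCO object detector on one frame.
# A production system would use a retail-trained model and video.
import torch
import torchvision
from torchvision.io import read_image
from torchvision.transforms.functional import convert_image_dtype

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

frame = convert_image_dtype(read_image("shelf_camera_frame.jpg"), torch.float)
with torch.no_grad():
    detections = model([frame])[0]

for box, score in zip(detections["boxes"], detections["scores"]):
    if score > 0.8:  # keep only confident detections
        print(box.tolist(), float(score))
```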
We will also cover how to use a computer vision system to automatically track customer purchases to enable a streamlined checkout process, and how deep learning can power plausible wardrobe suggestions based on what a customer is currently wearing or purchasing.
Finally, we will cover the various technologies that are powering these applications today: deep learning tools for research and development, production tools to distribute that intelligence to an entire inventory of cameras situated around a retail location, and tools for exploring and understanding the new data streams produced by the computer vision systems.
By the end of this talk, attendees should understand the impact Computer Vision and Deep Learning are having in the Consumer Goods industry, key use cases, techniques and key considerations leaders are exploring and implementing today.
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark - DataWorks Summit
Whole genome shotgun based next-generation transcriptomics and metagenomics studies often generate 100 to 1000 gigabytes (GB) of sequence data derived from tens of thousands of different genes or microbial species. De novo assembling these data requires an ideal solution that both scales with data size and optimizes for individual genes or genomes. Here we developed an Apache Spark-based scalable sequence clustering application, SparkReadClust (SpaRC), that partitions reads based on their molecule of origin to enable downstream assembly optimization. SpaRC produces high clustering performance on transcriptomics and metagenomics test datasets from both short-read and long-read sequencing technologies. It achieved near-linear scalability with respect to input data size and number of compute nodes. SpaRC can run on different cloud computing environments without modification while delivering similar performance. In summary, our results suggest SpaRC provides a scalable solution for clustering billions of reads from next-generation sequencing experiments, and Apache Spark represents a cost-effective solution with rapid development/deployment cycles for similar big data genomics problems.
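To give a flavor of the approach: SpaRC-style clustering rests on grouping reads that share k-mers. A toy PySpark sketch of that first step, with made-up reads (the real pipeline builds an overlap graph and partitions it on top of this):

```python
# Toy sketch of the k-mer grouping step behind read clustering: emit
# (k-mer, read_id) pairs, then collect read ids that share a k-mer.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kmer-grouping").getOrCreate()
reads = spark.sparkContext.parallelize(
    [("r1", "ACGTACGT"), ("r2", "CGTACGTT"), ("r3", "TTTTAAAA")]
)
K = 5

def kmers(read):
    read_id, seq = read
    return [(seq[i:i + K], read_id) for i in range(len(seq) - K + 1)]

shared = (reads.flatMap(kmers)
               .groupByKey()
               .mapValues(set)
               .filter(lambda kv: len(kv[1]) > 1))  # k-mers linking reads
print(shared.collect())  # r1 and r2 overlap; r3 stands alone
```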
PHP Frameworks: I want to break free (IPC Berlin 2024) - Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... - UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
- See how to accelerate model training and optimize model performance with active learning
- Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
- Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... - James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, along with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Accelerate your Kubernetes clusters with Varnish Caching - Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
DevOps and Testing slides at DASA Connect - Kari Kakkonen
Slides by me and Rik Marselis from the DASA Connect conference on 30 May 2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We finished with a lovely workshop in which participants tried to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
A tale of scale & speed: How the US Navy is enabling software delivery from l... - sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... - DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
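For the Python side, a minimal sketch with the pypowsybl binding: load a bundled test network and run an AC power flow, which mirrors the kind of step the notebook walks through.

```python
# Minimal sketch: build the bundled IEEE 14-bus test network with
# pypowsybl and run an AC power flow on it.
import pypowsybl as pp

network = pp.network.create_ieee14()
results = pp.loadflow.run_ac(network)
print(results[0].status)           # convergence status of the main component
print(network.get_buses().head())  # bus data after the power flow
```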
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Essentials of Automations: Optimizing FME Workflows with Parameters - Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf - Paige Cruz
Monitoring and observability aren't traditionally found in software curricula, and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is part of our current company's observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring and observability to the purview of ops, infra, and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on:
Transcript: Selling digital books in 2024: Insights from industry leaders - T... - BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Epistemic Interaction - tuning interfaces to provide information for AI support - Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.