Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money Laundering Rings

Sept 30, 2020
Victor Lee, TigerGraph
Kumar Deepak, Xilinx

| GRAPHAIWORLD.COM | #GRAPHAIWORLD |
Our Presenters
Victor Lee
Head of Product Strategy &
Developer Relations
● BS in Electrical Engineering and
Computer Science from UC Berkeley,
MS in Electrical Engineering from
Stanford University
● PhD in Computer Science from Kent
State University focused on graph data
mining
● 20+ years in tech industry
Kumar Deepak
Distinguished Engineer
● BS in Electronics and Communication
Engineering from Indian Institute of
Technology, Kharagpur
● Leads Xilinx engineering efforts to
accelerate database and analytics
● 20+ years of experience in architecting
and developing large-scale complex
software and hardware systems

Agenda
● How graph analytics provide better and faster insights
● How FPGAs amplify the speed and value of analytics
● Use Case: Fraud Detection and Money Laundering
- Finding Connected Communities for fraud detection
● How FPGAs work
● Louvain Modularity run on FPGA
● Benchmark

Graph-Powered Analytics & Machine Learning
Richer Data
● Relationships are 1st Class Citizens
● Connects different datasets and silos
Deeper Questions
● Look for semantic patterns of relationship
● Search far and wide more easily
Additional Computational Options
● Graph algorithms
● Graph-enhanced machine learning
Explainable Results
● Semantic data model, queries, and answers
● Visual exploration and results

The TigerGraph Difference
Real-Time Deep-Link Querying (5 to 10+ hops deep)
● Design difference: native graph design; C++ engine for high performance; storage architecture
● Benefit: uncovers hard-to-find patterns; operational, real-time; HTAP: Transactions+Analytics
Handling Massive Scale
● Design difference: distributed DB architecture; massively parallel processing; compressed storage reduces footprint and messaging
● Benefit: integrates all your data; automatic partitioning; elastic scaling of resource usage
In-Database Analytics
● Design difference: GSQL, a high-level yet Turing-complete language; user-extensible graph algorithm library that runs in-DB; ACID (OLTP) and Accumulators (OLAP)
● Benefit: avoids transferring data; richer graph context; in-DB machine learning

TigerGraph Platform: Deploy Anywhere
[Architecture diagram]
● Input data: operational data, master data, DBs, Spark, Kafka, files (loaded through the ETL Data Loader)
● Core engines: Graph Storage Engine (GSE) and Graph Processing Engine (GPE), with graph data storage, ID service, indexing, parallel query processing, and data snapshots
● Access: GSQL Server (GSQL queries, user queries, graph algorithms), GraphStudio Server (visual design UI), RESTPP (RESTful APIs)
● Message queuing: Spark / Kafka, Zookeeper
● Consumers: business intelligence, analytics, visualization, dashboards, reports, data warehouses, master data stores, machine learning

TigerGraph Platform: Deploy Anywhere
[Same architecture diagram as the previous slide, with one addition: C++ UDF on Alveo]

TigerGraph + XILINX = faster, deeper, and wider insights.
Healthcare
● TigerGraph use case: Member Journey / Customer 360
● XILINX acceleration: “Show similar members” via Cosine Similarity, 400X faster on Alveo U50
● Customer benefit: $150M/year call center savings
Financial Services
● TigerGraph use case: Anti-fraud / Anti-Money Laundering
● XILINX acceleration: “Show fraud ring activity” via Louvain Community Detection, ~20X faster on Alveo U50 (WIP)
● Customer benefit: $500M credit card fraud prevention
Manufacturing
● TigerGraph use case: Supply Chain Optimization
● XILINX acceleration: “Balance portfolio forecast”, Soon…
● Customer benefit: £400M supply chain savings

Graph Specific Algorithms + ML
[Figure: map of graph algorithms and machine learning techniques]
● GRAPH labels: Clustering, Betweenness, Similarity, Degree, Page Rank, Recommend, Shortest Path, Connected, Centrality, Detection, Louvain
● Machine Learning labels: Graph Convolutional Networks (GCN), Temporal Pattern Detect, Dependency Networks (RPN), Markov Networks (RDN), Probabilistic Models (PRM)
https://www.geeksforgeeks.org/graph-data-structure-and-algorithms/

Fraud Detection with Graph-enhanced ML
● Sophisticated fraud is multi-step, multi-actor,
orchestrated
● Graph Algorithms & ML both provide valuable
detection and investigative capabilities

Graph Algorithms for Fraud Detection
Shortest Path
• Is this entity closely connected
to known suspicious/risky
entities?
Community Detection
• Narrow the focus of the
investigation
• How many high-risk entities
are in the community?
Cycle Detection
• Is there a closed loop of related
entities where there
shouldn’t be (conflicts of
interest, etc.)?
• Is there a closed loop in
money flow (money
laundering)?
Other valuable algorithms: PageRank, Cosine Similarity, etc.
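The Shortest Path question above ("is this entity closely connected to known suspicious entities?") can be sketched as a multi-source breadth-first search. This is a minimal illustration, not the TigerGraph or Xilinx implementation: the `Graph` alias, `hopsToNearestFlagged`, and `makeUndirected` are hypothetical names, and the graph is assumed to be unweighted and small enough to hold in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical adjacency-list graph: adj[v] lists the neighbors of vertex v.
using Graph = std::vector<std::vector<int>>;

// Returns, for every vertex, the number of hops to the nearest flagged
// (known suspicious) vertex, or -1 if no flagged vertex is reachable.
// Multi-source BFS: all flagged vertices start in the queue at distance 0.
std::vector<int> hopsToNearestFlagged(const Graph& adj,
                                      const std::vector<int>& flagged) {
    std::vector<int> dist(adj.size(), -1);
    std::queue<int> frontier;
    for (int s : flagged) {
        assert(s >= 0 && static_cast<std::size_t>(s) < adj.size());
        if (dist[s] == -1) {
            dist[s] = 0;
            frontier.push(s);
        }
    }
    while (!frontier.empty()) {
        int v = frontier.front();
        frontier.pop();
        for (int w : adj[v]) {
            if (dist[w] == -1) {  // first visit is the shortest hop count
                dist[w] = dist[v] + 1;
                frontier.push(w);
            }
        }
    }
    return dist;
}

// Illustrative helper: builds an undirected graph from an edge list.
Graph makeUndirected(int n, const std::vector<std::pair<int, int>>& edges) {
    Graph adj(n);
    for (const auto& e : edges) {
        adj[e.first].push_back(e.second);
        adj[e.second].push_back(e.first);
    }
    return adj;
}
```

An investigator could then rank entities by hop count, for example treating anything within two or three hops of a flagged account as worth a closer look. The threshold is a policy choice, not something this sketch decides.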

Louvain Modularity Method for Community Detection
● Suppose we partition a graph into communities:
first try ⇒ Mod(case 1); better ⇒ Mod(case 2)
● Modularity score measures how good a particular graph partition is:
Mod ~ (% of edges that are in-group) minus
(expected % of in-group edges, if edges were randomized in a certain way)
● Task: Find the partitioning that has the highest modularity
● Challenge: Exponential number of possible partitionings
● Solution: Louvain is one of the fastest methods for modularity-based partitioning
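To make the modularity score concrete, the sketch below computes it for an undirected, unweighted graph using the standard form Q = Σ_c [ L_c / m − (d_c / 2m)² ], where L_c is the number of edges inside community c, d_c is the total degree of c, and m is the number of edges. The function name `modularity` and its parameters are illustrative assumptions, and community ids are assumed to lie in [0, numVertices).

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Modularity of a partition of an undirected, unweighted graph:
//   Q = sum over communities c of  L_c / m  -  (d_c / (2m))^2
// L_c = edges with both endpoints in c, d_c = total degree of c, m = edge count.
// Assumption: community ids lie in [0, numVertices).
double modularity(int numVertices,
                  const std::vector<std::pair<int, int>>& edges,
                  const std::vector<int>& community) {
    assert(static_cast<int>(community.size()) == numVertices);
    assert(!edges.empty());
    std::vector<double> inGroupEdges(numVertices, 0.0);
    std::vector<double> totalDegree(numVertices, 0.0);
    for (const auto& e : edges) {
        int cu = community[e.first];
        int cv = community[e.second];
        totalDegree[cu] += 1.0;  // each edge adds one degree per endpoint
        totalDegree[cv] += 1.0;
        if (cu == cv) inGroupEdges[cu] += 1.0;
    }
    const double m = static_cast<double>(edges.size());
    double q = 0.0;
    for (int c = 0; c < numVertices; ++c) {
        double expected = totalDegree[c] / (2.0 * m);
        q += inGroupEdges[c] / m - expected * expected;
    }
    return q;
}
```

For two triangles joined by a single edge (7 edges in total), putting every vertex in one community scores 0, while splitting the graph into the two triangles scores 5/14 ≈ 0.357, so the second partition is the better one under this measure.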

© Copyright 2020 Xilinx
What is an FPGA (Field Programmable Gate Array)?
• Logic blocks
‒ Look-up tables (LUTs): combinatorial logic
‒ Flip-flops: sequential logic
• DSP (Digital Signal Processing)
‒ Pre-adder, Multiplier, Accumulator
‒ AND, OR, NOT, NAND, NOR, XOR, XNOR
‒ Pattern Detector
• Writable Memory
‒ LUTRAM (Look-up table RAM)
‒ BRAM (Block RAM)
‒ URAM (Ultra RAM)
• Communication
‒ I/O, Transceiver, PCIe, Ethernet
• Programmable Interconnect
Xilinx VU9P FPGA has:
• LUTs: 1.2M
• Flip-Flops: 2.4M
• Writable Memory: 47 MB
• DSP Units: 6800
Credit: https://towardsdatascience.com/introduction-to-fpga-and-its-architecture-20a62c14421c

Configuring an FPGA
[Figure: an unprogrammed configuration memory with its unconfigured logic circuit, next to a ‘programmed’ configuration memory with its ‘configured’ logic circuit]
Credit: ‘Bebop to the Boolean Boogie: An Unconventional Guide to Electronics’

Computing Devices
                    | CPU                  | GPU                | FPGA             | ASIC
Example             | AMD EPYC 7702        | NVIDIA A100        | Xilinx Alveo U50 | Google TPU
Architecture        | Instruction Set      | Instruction Set    | Domain Specific  | Domain Specific
Purpose             | General Purpose      | General Purpose    | Domain Specific  | Domain Specific
Workload Types      | Serialized Workloads | Parallel Workloads | Any workload     | Single Workload
Ease of Programming | Easy                 | Medium             | Medium           | No programmability
Energy Efficiency   | Low                  | Medium             | High             | Very High

High-Performance FPGA Applications: Think “Parallel”
˃ Data-level parallelism
• Processing different blocks of a data set in parallel
˃ Task-level parallelism
• Executing different tasks in parallel
• Executing different tasks in a pipelined fashion
˃ Instruction-level parallelism
• Parallel instructions (superscalar)
• Pipelined instructions
˃ Bit-level parallelism
• Custom word width
[Figure: task graph of four functions: funcA, funcB, funcC, funcD]
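The data-level and instruction-level parallelism listed above can be sketched in HLS-style C++. This is an illustrative example rather than code from the talk: `dotProduct`, `kLen`, and `kLanes` are hypothetical names, and the `#pragma HLS PIPELINE` and `#pragma HLS UNROLL` directives are hints for the HLS tool that an ordinary C++ compiler ignores. Whether the tool actually reaches the requested initiation interval depends on the device and tool version.

```cpp
#include <cstddef>

constexpr std::size_t kLen = 1024;  // hypothetical fixed vector length
constexpr int kLanes = 4;           // hypothetical number of parallel lanes

// Integer dot product written so an HLS compiler can exploit parallelism:
// kLanes independent partial sums are updated in each outer iteration.
int dotProduct(const int a[kLen], const int b[kLen]) {
    int partial[kLanes] = {0, 0, 0, 0};
    for (std::size_t i = 0; i < kLen; i += kLanes) {
#pragma HLS PIPELINE II=1
        // Pipelining: request a new outer iteration every clock cycle.
        for (int l = 0; l < kLanes; ++l) {
#pragma HLS UNROLL
            // Unrolling: the kLanes multiply-adds become parallel hardware.
            partial[l] += a[i + l] * b[i + l];
        }
    }
    // Combine the independent partial sums into the final result.
    int sum = 0;
    for (int l = 0; l < kLanes; ++l) {
        sum += partial[l];
    }
    return sum;
}
```

On a CPU this compiles to an ordinary loop. The point of the structure is that the lanes have no dependencies on each other, which is what lets an HLS tool replicate the arithmetic in hardware.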

Using C, C++ or OpenCL to Program FPGAs
˃ Xilinx pioneered C to FPGA compilation technology (aka “HLS”) in 2011
˃ No need for low-level hardware description languages
˃ FPGAs are “Software Programmable”
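loop_main: for (int j = 0; j < NUM_SIMGROUPS; j += 2) {
    loop_share: for (uint k = 0; k < NUM_SIMS; k++) {
        loop_parallel: for (int i = 0; i < NUM_RNGS; i++) {
            // Each generator produces two samples per call (Box-Muller)
            mt_rng[i].BOX_MULLER(&num1[i][k], &num2[i][k], ratio4, ratio3);
            float payoff1 = expf(num1[i][k]) - 1.0f;
            float payoff2 = expf(num2[i][k]) - 1.0f;
            // Positive samples add to the call totals, the rest to the put totals
            if (num1[i][k] > 0.0f)
                pCall1[i][k] += payoff1;
            else
                pPut1[i][k] -= payoff1;
            if (num2[i][k] > 0.0f)
                pCall2[i][k] += payoff2;
            else
                pPut2[i][k] -= payoff2;
        }
    }
}
[Figure: the C/C++ code is compiled by the Vitis Compiler (v++) for the FPGA]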

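Software Programmability: FPGA Development in C/C++
[Figure: an x86 CPU host connected to an FPGA over PCIe]
● Host (x86 CPU): user application code in C/C++, compiled with GCC into the host application, running on the acceleration API, runtime, and drivers
● FPGA: accelerated functions written in synthesizable C/C++, compiled with VITIS, running on the Xilinx acceleration platform (DMA engine, AXI interfaces)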
Xilinx Accelerated TigerGraph
[Figure: software stack]
● TigerGraph: graph algorithms and User Defined Functions (UDFs), including Louvain Modularity (C++)
● Vitis accelerated libraries, including the BLAS library
● Vitis core development kit: compilers, analyzers, debuggers
● Vitis drivers & runtime (XRT)
● Vitis target platforms: U50, U200, U280, U250; cloud and on-premise

Louvain Modularity Algorithm
˃ Modularity Q: measures the stability of the current
clustering
˃ ΔQ: the simple criterion for moving a vertex to another
community (“job hopping”)
˃ Main challenge: integrating large-size variables while scanning the graph as
input
The algorithm is like a group of people clustering and then
job hopping until stable.
Coloring vertices can relieve dependencies.
Fig. 2 Parallel Louvain Algorithm flow
[Figure: Input graph ⇒ Phase-1 ⇒ Phase-2 ⇒ … ⇒ Phase-n; each phase runs Coloring, then Clustering, then Building]
• Clustering loop: Get Cid[v{e}] ⇒ Find Best Target ⇒ Update Cid, TOT, cSize
• Coloring: same-color vertices are at distance > 1; no need to color a small graph
• Clustering: iterate until ΔQ is small enough (Q: modularity)
• One iteration scans all vertices
• The 1st phase always takes most of the time (>80%)
• The smaller the graph, the fewer the vertices, the faster the iteration
• Building phases: communities are merged into a smaller graph
• The flow ends when no more clustering happens
• >90% of the workload can be accelerated
• For FPGA: (Done), (…)
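The coloring step in the flow above can be illustrated with a simple greedy coloring, which guarantees that two vertices of the same color are never adjacent (distance > 1). This is a sketch under assumptions, not the coloring used in the Xilinx kernel: `Graph`, `greedyColoring`, and `colorBatches` are hypothetical names, and the graph is assumed to be undirected with adjacency lists stored in both directions.

```cpp
#include <cassert>
#include <vector>

// Hypothetical adjacency-list graph: adj[v] lists the neighbors of vertex v.
using Graph = std::vector<std::vector<int>>;

// Greedy coloring: each vertex takes the smallest color not already used
// by one of its colored neighbors, so adjacent vertices never share a color.
std::vector<int> greedyColoring(const Graph& adj) {
    const int n = static_cast<int>(adj.size());
    std::vector<int> color(n, -1);
    // usedBy[c] == v means color c is taken by some neighbor of v.
    std::vector<int> usedBy(n + 1, -1);
    for (int v = 0; v < n; ++v) {
        for (int w : adj[v]) {
            assert(w >= 0 && w < n);
            if (color[w] != -1) usedBy[color[w]] = v;
        }
        int c = 0;
        while (usedBy[c] == v) ++c;
        color[v] = c;
    }
    return color;
}

// Groups vertices by color. No two vertices in a batch are adjacent,
// so a batch is a candidate set for concurrent processing.
std::vector<std::vector<int>> colorBatches(const std::vector<int>& color) {
    int numColors = 0;
    for (int c : color) {
        if (c + 1 > numColors) numColors = c + 1;
    }
    std::vector<std::vector<int>> batches(numColors);
    for (int v = 0; v < static_cast<int>(color.size()); ++v) {
        batches[color[v]].push_back(v);
    }
    return batches;
}
```

Processing one color batch at a time means a vertex never evaluates a move while a direct neighbor is moving. Vertices in the same batch can still target the same community, so shared totals such as TOT would need careful updates, which is why coloring relieves dependencies rather than removing them.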

Benchmark: Louvain Modularity for a 50M-node network
Xilinx Alveo U50 PCIe Accelerator Card
8GB HBM, 75W
Dataset: europe_osm, Number of vertices: 50912018, Number of edges: 54054660
Nimbix Cloud

Demo: Louvain Modularity for a 50M-node network

Time (seconds) to calculate Louvain Modularity
20x faster than CPU
Using one Alveo U50

Thank You!
● Contact Us
○ TigerGraph: Victor Lee, victor@tigergraph.com
○ Xilinx: Dan Eaton, daniele@xilinx.com
Q&A
