Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Analytics with Spark AI

Benyue (Emma) Liu, TigerGraph Inc.

“Graph analysis is possibly the single most effective competitive differentiator for organizations pursuing data-driven operations and decisions after the design of data capture.”

Graph is HOW WE THINK

Common TigerGraph Use Cases
Increase Revenue: analyze all interactions in real time to sell more
• Recommendation Engine
• Real-time Customer 360/MDM
• Product & Service Marketing
Reduce Costs & Manage Risks: reduce costs and assess and monitor risks effectively
• Fraud Detection
• Anti-Money Laundering (AML)
• Risk Assessment & Monitoring
• Cyber Security
Improve Operational Efficiency: manage resources for maximum output
• Enterprise Knowledge Graph
• Network, IT and Cloud Resource Optimization
• Energy Management System
• Supply Chain Analysis
Foundational Use Cases: Geospatial Analysis, Time Series Analysis, AI and Machine Learning

7 Key Data Science Capabilities Powered By a Native Parallel Graph
1. Deep Link Analysis: from a set of entities (e.g. devices, customers, accounts, doctors), show all links or connections
2. Relational Commonality Discovery and Computation: given 2 entities (e.g. customers, businesses), follow their relationships to find commonality
3. Multi-dimensional Entity & Pattern Matching: given a pattern (e.g. referring business to a relative), find similar patterns in the graph
4. Hub & Community Detection: find the most influential members of a group (customers, doctors, citizens) and detect the community around them
5. Geospatial Graph Analysis: analyze changes in entities & relationships with location data
6. Temporal (Time-Series) Graph Analysis: analyze changes in entities & relationships over time
7. Machine Learning Feature Generation & Explainable AI: extract graph-based features to feed as training data for machine learning; power Explainable AI

Power Explainable AI with TigerGraph

Spark + TigerGraph Data Pipeline

Typical Spark + TigerGraph Integration
● Data Preparation and Integration (TigerGraph/Spark)
● Unsupervised Learning (TigerGraph)
● Feature Extraction for Supervised Learning (TigerGraph/Spark)
● Model Training (Spark)
● Validate and Apply Model (TigerGraph)
● Visualize and Explore Interconnected Data (TigerGraph)

Machine Learning with TigerGraph
China Mobile Anti-Fraud/Scam Detection

Real-Time Phone-Based Fraud Detection
Massive, Worldwide Problem
● 18 billion robocalls in the US in 2017 (hiya.com)
● Spam/scam: agile, spoofed numbers
Customer:
● 600M subscribers
● 300M calls/day, peak 10K calls/sec
● Need: real-time detection of various types of phone-based fraud

Real-Time Phone Anti-Spam/Scam Detection
TigerGraph Solution: Real-time graph-based machine learning and decision system
Graph Analytics
● Real-time machine learning
○ 118 graph features per call
○ Retrained periodically with 2M calls
● Real-time decisions
○ Call recipient sees an alert if the ML system says the call is suspicious
● In production since Dec 2016
Graph Database
● 600M phone numbers (inside and outside network)
● 15B phone-phone call edges (2-month sliding window)
○ Time
○ Duration
● Real-time graph updates, peak 10K+ calls/sec
● 118 graph features per phone

Examples of Graph Features for Machine Learning
Good Phone Features
(1) High call back phone
(2) Stable group
(3) Long term phone
(4) Many in-group connections
(5) 3-step friend relation
Bad Phone Features
(1) Short term call duration
(2) Empty stable group
(3) No call back phone
(4) Many rejected calls
(5) Average distance > 3

China Mobile - Detecting Phone-Based Fraud by Analyzing Network or Graph Pattern Features
• Each phone node has a fraud flag indicating whether it is a good phone or a bad phone and, if bad, the type of fraud: scam, harassment, or advertisement
• Run a real-time GSQL query for each call:
○ Collect 118 features
○ Compute composite score
○ Update fraud flag
○ Return fraud type
Diagram: a real-time call event (caller, callee, time) triggers the query, which returns the fraud type; call detail records (caller, callee, time, duration) feed continuous graph updates.

Phone Fraud Real-Time Detection System
phone vertex
- fraud flag
- expiration time
phone_phone edge (drawn from the phone vertex to target1 through target4 in the diagram)
- num of call
- total duration
- call date list
- num of rejection
● 600 Million Vertices
● 15+ Billion Edges
● 300 Million Daily Updates
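The vertex and edge attributes above could be modeled as follows. This is a minimal Python sketch for illustration only, not TigerGraph's GSQL schema. The class names, field names and the record_call helper are hypothetical, and the choice to count rejected calls in num_calls is an assumption.

```python
from dataclasses import dataclass, field
from datetime import date, datetime
from typing import List, Optional


@dataclass
class PhoneVertex:
    """A phone number, inside or outside the carrier's network."""
    number: str
    fraud_flag: Optional[str] = None              # e.g. "scam"; None means not flagged
    flag_expiration: Optional[datetime] = None    # when the flag stops being trusted


@dataclass
class PhonePhoneEdge:
    """Aggregated call history from one phone to another."""
    caller: str
    callee: str
    num_calls: int = 0
    total_duration: int = 0                       # seconds (the unit is an assumption)
    call_dates: List[date] = field(default_factory=list)
    num_rejections: int = 0

    def record_call(self, when: datetime, duration: int, rejected: bool) -> None:
        """Fold one call detail record into the edge's running aggregates."""
        self.num_calls += 1
        self.total_duration += duration
        self.call_dates.append(when.date())
        if rejected:
            self.num_rejections += 1
```

Keeping aggregates on the edge, rather than one edge per call, is one way a per-call query could read counts and durations without scanning raw call records.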

Case 1: Call type was recently flagged
Diagram: a real-time call event (call time, caller ID, callee ID) reaches the query. If the caller was recently flagged as “bad”, the existing flag is used, and the feature collection and classifier steps shown in the diagram are skipped.
Case 2: Call needs to be classified
Diagram: a real-time call event (call time, caller ID, callee ID) reaches the query. The caller was not recently flagged as “bad”, so the query collects the caller’s graph features in real time and passes them to the classifier. If the caller is classified as “bad”, the fraud flag is updated.
Input: list of calls with phone pairs and call time (batch)
Output: 1. Call fraud type; 2. Scoring and feature vector of fraud calls as supporting evidence (Explainable AI)
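The two cases can be sketched as a single decision function. This is a hedged Python illustration of the control flow only. The production system runs this logic as a GSQL query, and the names handle_call, FlagStore, collect_features, classify and the 24-hour flag lifetime are all assumptions made for the example.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, Optional, Tuple

# Hypothetical in-memory store: phone number -> (fraud type, flag expiration).
FlagStore = Dict[str, Tuple[str, datetime]]


def handle_call(
    caller: str,
    call_time: datetime,
    flags: FlagStore,
    collect_features: Callable[[str], Dict[str, float]],
    classify: Callable[[Dict[str, float]], Optional[str]],
    flag_ttl: timedelta = timedelta(hours=24),    # assumed lifetime of a flag
) -> Optional[str]:
    """Return the fraud type for this call, or None if the caller looks fine."""
    # Case 1: the caller was flagged recently and the flag has not expired.
    cached = flags.get(caller)
    if cached is not None and cached[1] > call_time:
        return cached[0]

    # Case 2: collect the caller's graph features and run the classifier.
    fraud_type = classify(collect_features(caller))
    if fraud_type is not None:
        flags[caller] = (fraud_type, call_time + flag_ttl)   # update the flag
    else:
        flags.pop(caller, None)                              # drop a stale flag
    return fraud_type
```

Checking the flag first means the comparatively expensive feature collection runs only for callers without a current flag.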

China Mobile Machine Learning Workflow
1. Data labels come from police reports and online third-party sources
2. A total of 118 graph features are analyzed to build the fraud detection model
3. All 118 graph features are collected by one GSQL query
4. Training data’s features are collected in GSQL in batch processing and stored as a CSV file for future model training
5. TigerGraph performs fraud scoring with multiple machine learning models in real time
6. Machine learning models are trained offline, and model parameters are stored as configuration files for GSQL to use for real-time scoring
(Future: training ML models in Spark)
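Steps 5 and 6 can be illustrated with a small scoring sketch. The slides do not say which model types or file format are used, so everything here is an assumption: logistic-regression models, a JSON configuration, averaging as the way to combine scores, and the feature names num_rejected_calls and avg_call_duration.

```python
import json
import math
from typing import Dict, List


def load_models(config_text: str) -> List[dict]:
    """Parse model parameters exported by offline training (JSON is an assumption)."""
    return json.loads(config_text)["models"]


def score(model: dict, features: Dict[str, float]) -> float:
    """Logistic-regression score in (0, 1); features absent from the vector count as 0."""
    z = model["bias"] + sum(w * features.get(name, 0.0)
                            for name, w in model["weights"].items())
    return 1.0 / (1.0 + math.exp(-z))


def composite_score(models: List[dict], features: Dict[str, float]) -> float:
    """Average the per-model scores (one simple way to combine them)."""
    return sum(score(m, features) for m in models) / len(models)


# Hypothetical configuration with two models and made-up parameters.
CONFIG = """
{"models": [
  {"name": "scam",       "bias": -2.0, "weights": {"num_rejected_calls": 0.4}},
  {"name": "harassment", "bias": -1.0, "weights": {"avg_call_duration": -0.1}}
]}
"""

models = load_models(CONFIG)
risk = composite_score(models, {"num_rejected_calls": 5.0, "avg_call_duration": 12.0})
```

Because the parameters live in a configuration file, retraining offline only requires replacing the file, not changing the scoring code.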

Machine Learning with TigerGraph
Real-time Scoring with Multiple ML models in GSQL
● Fast: real-time response for both feature collection and scoring
● Efficient: aggregation during traversal - multiple features in one
● Easy: collect complex features without multiple RDBMS joins

China Mobile Anti-Fraud Results from TigerGraph Machine Learning Solutions
• 3.2 million fraud notifications in Shandong Province (Dec 2016 – July 2019)
• Potential losses prevented: ~39.86 million RMB (~6 million US dollars)

Why TigerGraph + Spark For Machine Learning?
● Parallel processing and distributed systems in training, ETL & feature collection, at scale
● Capture business moments with real-time response and explainable AI, at scale
● Enrich machine learning with complex graph features, at scale

Spark and TigerGraph Data Pipeline
Diagram components: static data sources, streaming data sources, JDBC driver, TigerGraph.

JDBC Driver (v1.2)
● Type 4 driver
● Supports bi-directional data flow (read and write) with TigerGraph
● Read: converts ResultSet to DataFrame
● Write: loads DataFrames and files to vertices/edges in TigerGraph
● Supports REST endpoints of built-in, compiled and interpreted GSQL queries from TigerGraph
● Open source: https://github.com/tigergraph/ecosys/tree/master/etl/tg-jdbc-driver
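A read through the driver from PySpark might be configured roughly as below. This is a sketch under stated assumptions: the driver class name, the jdbc:tg URL scheme, the port and the option keys are recalled from the driver's README and should be checked against the version in use. The helper tigergraph_read_options, the graph name phone_graph and the vertex type phone are hypothetical.

```python
from typing import Dict


def tigergraph_read_options(host: str, graph: str, dbtable: str,
                            username: str, password: str,
                            port: int = 14240) -> Dict[str, str]:
    """Build the option map passed to spark.read.format("jdbc")."""
    return {
        "driver": "com.tigergraph.jdbc.Driver",    # assumed driver class name
        "url": f"jdbc:tg:http://{host}:{port}",    # assumed URL scheme and port
        "graph": graph,
        "dbtable": dbtable,                         # e.g. "vertex phone"
        "username": username,
        "password": password,
    }


opts = tigergraph_read_options("127.0.0.1", "phone_graph", "vertex phone",
                               "tigergraph", "tigergraph")

# With a SparkSession named `spark` and the driver JAR on the classpath:
#   df = spark.read.format("jdbc").options(**opts).load()
```

Building the options in one place keeps connection details out of the feature-extraction code and makes them easy to swap between environments.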

DEMO
Graph Feature Extraction from TigerGraph
to Spark Via TigerGraph’s JDBC Driver

Graph Features: Stable Group & InGroup Connection
• Stable Group: phones in the target group that have regular calls (stable connection) with the source phone
• Stable InGroup Connections: phones in the target group that have regular calls (stable connection) among themselves
A stable connection is defined as one that
● Has both call and callback
● Has a number of calls larger than a given limit
● Has a total duration larger than a given limit
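The three conditions translate into a small predicate. This Python sketch assumes the limits apply to each direction separately, which is how the appendix pseudocode reads (a stable outgoing edge and a stable incoming edge); the names CallEdge and is_stable_connection are hypothetical.

```python
from typing import NamedTuple, Optional


class CallEdge(NamedTuple):
    """Aggregates for the calls made in one direction between two phones."""
    num_calls: int
    total_duration: int   # seconds (the unit is an assumption)


def is_stable_connection(call: Optional[CallEdge], callback: Optional[CallEdge],
                         min_calls: int, min_duration: int) -> bool:
    """Apply the three stable-connection conditions to one pair of phones."""
    if call is None or callback is None:           # needs both call and callback
        return False
    # Both limits are strict: "larger than", not "at least".
    return all(e.num_calls > min_calls and e.total_duration > min_duration
               for e in (call, callback))
```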

Resources
• TigerGraph Cloud Machine Learning Starter Kit
a. Register at tgcloud.us
• JDBC Driver (Open Source)
a. https://github.com/tigergraph/ecosys/tree/master/etl/tg-jdbc-driver
• Contact me at emma.liu@tigergraph.com

More … TigerGraph & Neural Network
Training data: https://www.coursera.org/learn/machine-learning
Watch Graph Guru Episode 19
https://info.tigergraph.com/graph-gurus-19
Contact Me:
emma.liu@tigergraph.com

Real-time deep-link graph analytics at scale is the differentiator for your machine learning pipeline!

Stable Group Pseudocode
Step 1: start from a given phone vertex (source) and find its 1-step neighbors (target1 to target4 in the diagram)
Step 2: check whether each target has both a stable outgoing edge (phone_phone) and a stable incoming edge (phone_phone_reversed)
Edge attributes used in the check:
- num of call
- total duration
- call date list
- num of rejection
Stable InGroup Connections Pseudocode
Step 1: starting from a given phone vertex (source), find its 1-step neighbors (the target group)
Step 2: for each vertex in the target group, find its 1-step neighbors and check for stable connections
Step 3: check the stable target for each vertex in the target group
Edge attributes used in the check (phone_phone and phone_phone_reversed edges):
- num of call
- total duration
- call date list
- num of rejection
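One possible reading of the three steps is sketched below in Python. Step 3 is interpreted here as "keep a stable connection only if its other endpoint is also in the target group", which is an assumption. The inputs are hypothetical: neighbors is an adjacency map, and stable_pairs is a precomputed set of phone pairs that already satisfy the stable-connection conditions.

```python
from typing import Dict, FrozenSet, Set


def stable_ingroup_connections(neighbors: Dict[str, Set[str]],
                               stable_pairs: Set[FrozenSet[str]],
                               source: str) -> Set[FrozenSet[str]]:
    """Return the stable connections among the source's 1-step neighbors."""
    # Step 1: the target group is the source's 1-step neighborhood.
    group = neighbors.get(source, set())
    found: Set[FrozenSet[str]] = set()
    for member in group:
        # Step 2: look at each member's own 1-step neighbors.
        for other in neighbors.get(member, set()):
            pair = frozenset((member, other))
            # Step 3: keep the pair if it is stable and stays inside the group.
            if other in group and other != member and pair in stable_pairs:
                found.add(pair)
    return found
```

Storing each pair as a frozenset counts a connection once regardless of which endpoint the traversal reaches first, so the size of the result can serve as the in-group connection count.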
