Graph Gurus Episode 7
Connecting the Dots in Real-Time:
Deep Link Analysis with a Native Parallel Graph Database
© 2018 TigerGraph. All Rights Reserved
Welcome
● Attendees are muted but you can talk to us via Chat in Zoom
● Send questions at any time using the Q&A tab in the Zoom menu
● We will have 10 min for Q&A at the end
● The webinar will be recorded and sent via email
● A link to the presentation and reproducible steps will be emailed
ZOOM ISSUES: update to the latest version of Zoom and if you are using
multiple monitors disable “use dual monitors” in settings
Developer Edition Available
We now offer Docker versions and VirtualBox versions of the TigerGraph
Developer Edition, so you can now run on
● MacOS
● Windows 10
● Linux
Developer Edition Download https://www.tigergraph.com/developer/
Today's Gurus
Emma Liu
Product Manager
● BS in Engineering from Harvey Mudd College, MS in Engineering Systems from MIT
● Prior work experience at Oracle and MarkLogic
● Focus - Cloud, Containers, Enterprise Infra, Monitoring, Management, Connectors
Huiting Su
Software Engineer
● Masters in Industrial Engineering from Purdue
● Focus - Graph Algorithms and Analytics, Machine Learning
● Resident GSQL Expert
Agenda
● Graph Algorithms Overview
● What’s Link Analysis?
● Why Real-time Deep Link Analysis with TigerGraph?
● TigerGraph GSQL Graph Algorithms Library
● TigerGraph Differentiated Architecture
● TigerGraph Benchmark Report
● Demos for Graph Solution Use Cases
Graph Algorithms, Part 3
Part 1 discussed PageRank (Graph Gurus Episode 5).
Part 2 discussed Community Detection (Graph Gurus Episode 6).
Picking the Right Graph Analytics Tools
● Supervised learning:
look for particular patterns/features, then correlate to known fraud
○ Pattern matching
○ Path finding, shortest path
● Unsupervised learning:
look for frequent/infrequent patterns & groupings
○ Community detection and clustering. Outliers.
○ Frequent subgraphs
What is Link Analysis?
Stage One: Crawl - Finding Relationships Among Entities (1 to 2 hops)
Stage Two: Walk - Go Across Multi-Hop (3 to 4 hops)
Stage Three: Run - Real-time Multi-Hop (6+ hops)
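To make the hop counts concrete, here is a minimal Python sketch (not TigerGraph code) that collects every vertex reachable within k hops of a starting vertex by expanding one hop at a time. The function name `k_hop_neighbors` and the sample adjacency list are hypothetical and exist only for illustration.

```python
def k_hop_neighbors(graph, start, k):
    """Return the vertices reachable from `start` in at most k hops (start excluded).

    `graph` is a dict mapping each vertex to a list of its out-neighbors.
    """
    visited = {start}
    frontier = {start}
    for _ in range(k):
        next_frontier = set()
        for vertex in frontier:
            for neighbor in graph.get(vertex, ()):
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.add(neighbor)
        if not next_frontier:
            break  # nothing new to explore, stop early
        frontier = next_frontier
    return visited - {start}


# Hypothetical directed sample graph.
SAMPLE_GRAPH = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": ["F"],
    "F": ["G"],
}
```

Calling `k_hop_neighbors(SAMPLE_GRAPH, "A", 2)` covers the "Crawl" range, while larger values of k reach progressively more of the graph, which is why deeper traversals cost more.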
8 Hop Deep Link Analysis Graph Pattern in Healthcare
• Look for alternative connections between prescriber/physician & referred facilities
• Assign risk rating based on amount of referral & strength of alternative connections
• Compare billing across facilities to identify “upcoding” / overbilling
• Improve efficiency for case investigation for potential violations with intuitive user interface
[Figure: graph diagram of the 8-hop pattern. Node labels: Dr. Thomas, Administrator, Street Address (twice), Patient 1, Patient 2, Patient 3, John, Susan, Ravi, Drug Rehabilitation Center, Claim 1 through Claim 6. Edges are labeled Hop 1 through Hop 8. Callout: Third Party Data Source (integrated with internal data).]
Why Real-time Deep Link Analysis with
TigerGraph?
Why Real-time Deep Link Analysis?
Drive actions with timely insights
• Prevent Loss
• Example: Detect Fraudulent Credit Card Transaction (Ride Sharing)
Capture business moments
• Increase Revenue
• Example: Wish.com Product Recommendation
Go deeper to find new insights
• Exponential Knowledge Growth
• Example: Social Network Knowledge
Common Link Analysis Questions
RDBMS
• Real World Challenge:
• Not designed for discovery/exploratory type of analytics
• No easy SQL for answering these questions
TigerGraph
• Real World Benefit:
• Designed for discovery/exploratory type of link analytics
• GSQL can naturally and efficiently express and solve the query
Is there a connection between Customer A and Customer B?
What is the shortest path between Location A and Location B?
Is Driver A in the vicinity of Passenger A, and B around the same location B?
NEW! Open Source GSQL Graph Algorithm Library
Each graph algorithm is a GSQL query
● May have zero or more input parameters
● Typically 3 variations:
○ Standard JSON output
○ Write to a CSV file
○ Save to vertex attributes (requires that the attributes exist)
https://github.com/tigergraph/ecosys/tree/master/graph_algorithms
Single-Source Shortest Path, Unweighted
shortest_ss(VERTEX v, BOOL display)
shortest_ss_file(VERTEX v, BOOL display, STRING filepath)
shortest_ss_attr(VERTEX v, BOOL display)
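For readers unfamiliar with the underlying technique, unweighted single-source shortest paths can be computed with breadth-first search. The Python sketch below illustrates that idea on an in-memory adjacency list; it is not the GSQL library code, and the names `bfs_shortest_paths` and `path_to` are hypothetical.

```python
from collections import deque


def bfs_shortest_paths(graph, source):
    """Breadth-first search from `source` on an unweighted graph.

    Returns (dist, prev): hop distance to each reachable vertex and the
    predecessor of each vertex on one shortest path.
    """
    dist = {source: 0}
    prev = {}
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        for neighbor in graph.get(vertex, ()):
            if neighbor not in dist:  # first visit is the shortest in hops
                dist[neighbor] = dist[vertex] + 1
                prev[neighbor] = vertex
                queue.append(neighbor)
    return dist, prev


def path_to(prev, source, target):
    """Rebuild one shortest path; raises KeyError if target is unreachable."""
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]
```

Each vertex and edge is handled once, so the running time is linear in the size of the reachable part of the graph.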
Single-Source Shortest Path, Weighted
shortest_path_any_wt(VERTEX v, INT maxDepth, BOOL display)
shortest_path_any_wt_file(VERTEX v, INT maxDepth, BOOL display, STRING filepath)
shortest_path_any_wt_attr(VERTEX v, INT maxDepth, BOOL display)
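One classical algorithm for weighted shortest paths that tolerates negative edge weights is Bellman-Ford. The Python sketch below shows a depth-bounded variant; whether the library queries work this way is an assumption, as is the reading of `maxDepth` as a cap on the number of edges in a path. The function name `bounded_bellman_ford` is hypothetical.

```python
import math


def bounded_bellman_ford(edges, source, max_depth):
    """Cheapest path cost from `source` using at most `max_depth` edges.

    `edges` is a list of (u, v, weight) tuples; weights may be negative.
    Negative cycles are not detected; the depth bound simply caps the work.
    """
    dist = {source: 0.0}
    for _ in range(max_depth):
        snapshot = dict(dist)  # relax from last round's values only
        updated = False
        for u, v, weight in edges:
            if u in snapshot and snapshot[u] + weight < dist.get(v, math.inf):
                dist[v] = snapshot[u] + weight
                updated = True
        if not updated:
            break  # distances have converged before reaching the bound
    return dist
```

Relaxing from a snapshot of the previous round guarantees that after round i every distance corresponds to a path of at most i edges, which is what makes the depth bound meaningful.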
All Pairs Shortest Path
CREATE QUERY all_pairs_shortest(INT maxDepth, BOOL display, STRING fileBase)
{
  Start = {Node.*};
  Result = SELECT s FROM Start:s
           POST-ACCUM
             shortest_ss_any_wt_file(s, maxDepth, display, fileBase+s);
}
Triangle Counting
tri_count_fast()
tri_count_fast_file(FILE filepath)
tri_count_fast_attr()
Triangle Counting Output
{
  "error": false,
  "message": "",
  "version": {
    "schema": 0,
    "api": "v2"
  },
  "results": [
    {"num_triangles": 4}
  ]
}
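As a rough illustration of what a triangle count measures, the Python sketch below counts triangles in an undirected graph by intersecting neighbor sets. It is not the algorithm used by the `tri_count_fast` queries; the function name `count_triangles` and the sample edge list are hypothetical, and vertex ids are assumed to be mutually comparable.

```python
def count_triangles(edges):
    """Count triangles in an undirected graph given as (u, v) pairs."""
    adjacency = {}
    for u, v in edges:
        if u == v:
            continue  # self-loops cannot form triangles
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    total = 0
    for u, neighbors in adjacency.items():
        for v in neighbors:
            if u < v:
                # Count each triangle once by requiring u < v < w.
                common = neighbors & adjacency[v]
                total += sum(1 for w in common if w > v)
    return total


# A complete graph on four vertices contains four triangles.
K4_EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

Ordering the vertices (u < v < w) avoids counting the same triangle six times, once per permutation of its corners.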
How does a “Native Parallel Database”
Empower Deep Link Analysis?
TigerGraph
TigerGraph Key Differentiated Architecture
Well-Designed Parallel Computation
● Automatic computational parallelism based on Massively Parallel Processing (MPP)
○ Each node/edge = a unit of storage + a computational unit
○ Each node/edge is processed in parallel
TigerGraph Empowers Real-time Deep Link Analysis
TigerGraph engine automatically scales the computation across all threads and CPU cores available
TigerGraph Benchmark
For Real-time Deep Link Analysis
Twitter Dataset Benchmark - 2 Hop Path Query Time
Measure     TigerGraph  Neo4j  Neptune1  Neptune2  JanusGraph  ArangoDB.r  ArangoDB.m
Time (ms)   0.46        18.34  26.17     27.40     27.78       28.98       N/A
Normalized  1           40     57        60        60          63          N/A
Timeouts    0           0      63        60        48          101         N/A
Twitter Dataset: User-Follower Directed Graph (41.6M Vertices / 1.47B Edges)
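The "Normalized" row appears to be each system's query time divided by TigerGraph's time and rounded to the nearest integer. The short Python check below reproduces the row from the reported times under that assumption; the variable names are illustrative only.

```python
# Reported 2-hop query times in milliseconds, copied from the table.
times_ms = {
    "TigerGraph": 0.46,
    "Neo4j": 18.34,
    "Neptune1": 26.17,
    "Neptune2": 27.40,
    "JanusGraph": 27.78,
    "ArangoDB.r": 28.98,
}

# Assumed normalization: divide by the TigerGraph time, round to an integer.
baseline = times_ms["TigerGraph"]
normalized = {name: round(t / baseline) for name, t in times_ms.items()}

for name, factor in normalized.items():
    print(f"{name}: {factor}x")
```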
Twitter Dataset Benchmark - 6 Hop Path Query Time
Measure     TigerGraph  Neo4j  Neptune1  Neptune2  JanusGraph  ArangoDB.r  ArangoDB.m
Time (sec)  63.06       N/A    N/A       N/A       N/A         N/A         N/A
Normalized  1           N/A    N/A       N/A       N/A         N/A         N/A
% OOM       0           0      100%      100%      0           0           N/A
% Timeouts  0           100%   0         0         100%        100%        N/A
Twitter Dataset: User-Follower Directed Graph (41.6M Vertices / 1.47B Edges)
Why Deep Link Analysis in your Industry?
Use Case in Finance: Finding False Negatives in AML with Graph
Traditional AML Solution: Flagged as a low-risk AML alert, as the new customer has no financial transaction history, no previous alerts or SARs, and is not in a high-risk geography.
Graph-based AML Solution: Elevated to a high-risk AML alert, as the new account shares a phone number with four customers who have alerts turned into SARs.
[Figure: graph traversal with edges labeled 1st Hop through 5th Hop.]
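The shared-phone-number check can be pictured as a lookup over linked customers. The Python sketch below is a deliberately simplified illustration, not an AML product rule: the record layout, the function name `aml_risk`, and the `min_sar_links` threshold are all hypothetical.

```python
def aml_risk(new_account, customers, min_sar_links=1):
    """Rate a new account by its links to customers with SARs.

    `customers` maps a customer id to {"phone": str, "has_sar": bool}.
    Returns (risk_level, linked_ids), where linked_ids are the existing
    customers that share the new account's phone number and have a SAR.
    """
    linked_ids = [
        customer_id
        for customer_id, record in customers.items()
        if record["phone"] == new_account["phone"] and record["has_sar"]
    ]
    level = "high" if len(linked_ids) >= min_sar_links else "low"
    return level, linked_ids


# Hypothetical data: four existing customers share one phone number.
CUSTOMERS = {
    "c1": {"phone": "555-0100", "has_sar": True},
    "c2": {"phone": "555-0100", "has_sar": True},
    "c3": {"phone": "555-0100", "has_sar": True},
    "c4": {"phone": "555-0100", "has_sar": True},
    "c5": {"phone": "555-0199", "has_sar": False},
}
```

An account with no transaction history looks harmless in isolation; it is only the link to other entities that changes the rating, which is the point of the graph-based approach.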
Use Case Demo
Huiting Su, Solution Engineer
Demo Overview
● Demo Query 1: 2 hop traversal (“Walk”)
● Demo Query 2: 5 hop deep link analysis (“Run”)
● Demo Query 3: 3-6 hop circle detection - scalability (“Walk/Run”)
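Circle detection over a hop range can be sketched as a depth-limited search that looks for paths returning to the starting vertex. The Python example below is an illustrative sketch only, not the demo's GSQL query; the function name `find_cycles` and the sample graph are hypothetical.

```python
def find_cycles(graph, start, min_hops, max_hops):
    """Find simple cycles through `start` whose length is within a hop range.

    `graph` maps each vertex to a list of out-neighbors. Each cycle is
    returned as a list of vertices that begins and ends with `start`.
    """
    cycles = []

    def dfs(vertex, path):
        # `path` holds the vertices visited so far, so closing the cycle
        # from here would take exactly len(path) hops.
        for neighbor in graph.get(vertex, ()):
            if neighbor == start:
                if min_hops <= len(path) <= max_hops:
                    cycles.append(path + [start])
            elif neighbor not in path and len(path) < max_hops:
                dfs(neighbor, path + [neighbor])

    dfs(start, [start])
    return cycles


# Hypothetical graph with a 3-hop and a 5-hop cycle through "A".
CYCLE_GRAPH = {
    "A": ["B"],
    "B": ["C"],
    "C": ["A", "D"],
    "D": ["E"],
    "E": ["A"],
}
```

The number of candidate paths grows quickly with the hop limit, which is why the deeper end of the range is the harder scalability test.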
Takeaways
Summary
● Real-time Deep Link Analysis is critical to businesses that need Graph
● TigerGraph is designed from the ground up for deep link analysis
● GSQL Graph Algorithms for real-time deep link analysis
● Graph Solution Use Case Demo
Q&A
Please send your questions via the Q&A menu in Zoom
Episode 8:
DECEMBER 19 AT 11:00 A.M. PT / 2:00 P.M. ET
Location, Location, Location -
Geospatial analysis with a Graph Database
https://info.tigergraph.com/graph-gurus-8
SEE MORE WEBINARS AT
https://www.tigergraph.com/webinars-and-events/
Additional Resources
New Developer Portal
https://www.tigergraph.com/developers/
Download the Developer Edition or Enterprise Free Trial
https://www.tigergraph.com/download/
Guru Scripts
https://github.com/tigergraph/ecosys/tree/master/guru_scripts
Join our Developer Forum
https://groups.google.com/a/opengsql.org/forum/#!forum/gsql-users
@TigerGraphDB youtube.com/tigergraph facebook.com/TigerGraphDB linkedin.com/company/TigerGraph
