Big Graph Analytics Engine
Yinglong Xia
6/23/2016
8th Linked Data Benchmark Council TUC Meeting @ Oracle Conference Center
2
Introduction
3
Introduction
http://www.huawei.com/en/about-huawei/corporate-governance/corporate-governance
4
Recent Growth
Revenue
Net Profits
Cash flow from operating activities
http://www.huawei.com/en/about-huawei
5
Collaboration
Industrial Partners
Universities
Standards
Technical organizations
Global Research Institute & Labs
Open Source
6
Graph Analytics for Smart Big Data
Big Data Analytics & Management
Graph
Machine Learning
NLP
Deep Learning
7
Graph in ONOS
HotSDN’2014
8
Topology Impact on Information Propagation
9
Explore the Variety in Graph Analytics
Graph
10
Challenges
● Very large scale graphs for analysis
• 10B to 1000B vertices
• A few hundred properties per graph, both static and dynamic
• distributed communication introduces additional overhead
● Irregularity in graph data access
• Low data locality results in high disk/communication IO overhead
• Data access patterns are diverse among graph analysis algorithms
● Near real-time requirement
• Incorporate incremental graph updates
• Approximate query & analysis should be considered
● Efficiency and productivity must be balanced
11
Graph Platform for Smart Big Data
Infrastructure: Single Machine, Cluster, GPU Server, Cloud
Data Management: Structure Management, Property Management, Metadata Management, Permission Control
Graph engines: Basic Engine, Streaming Graph, Graphical Model, Hyper Graph
Analytics: Bayes Net, Community, Label propagation, Centrality, Anomaly detection, Matching, Ego Feature, Max Flow
Visualization: Dynamic Graph Vis, Property Vis, Large Graph Vis
Incremental Update
12
Graph Platform
Data Source
Ingestion
Graph Topology and Property (KC/KV Store): V→Adjacency, V→Vertex Prop, E→Edge Prop
Property indices: Prop Idx, Encoding, External (Solr/CLucene)
Concurrency Control
Main Storage
Dynamic Graph: online query/modification, bi-temp query
Streaming graph storage (KC/KV Store): V→TimeStamp→Adjacency, V→TimeStamp→Vprop, V→TimeStamp→Eprop, Sliding Window
CSR: sparse subgraphs
Dense Subgraph: dense subgraphs
GPU Offload, Direct Solver, Iterative Solver
Snapshots, Double buffering, Batch processing
Streaming Graph, TripleStore, Streaming algorithms
Graph Inference, Inference Tools (Virtuoso, Jena, etc.), Knowledge Graph
Online update property graph
Periodically updated static graph snapshots
Probabilistic Graphical Model & Inference
Offline Batch Processing, online/offline analysis
MVCC, KV Store, Snapshot Management
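The V→TimeStamp→Adjacency layout with a sliding window can be illustrated with a small in-memory sketch. This is not the platform's implementation: the class name `TemporalAdjacency`, its methods, and the use of integer timestamps are assumptions made for illustration, and a Python dictionary stands in for the key-value store.

```python
import bisect
from collections import defaultdict


class TemporalAdjacency:
    """Hypothetical V -> TimeStamp -> Adjacency store kept sorted by timestamp."""

    def __init__(self):
        self._times = defaultdict(list)  # vertex -> sorted edge timestamps
        self._nbrs = defaultdict(list)   # vertex -> neighbors, aligned with _times

    def add_edge(self, src, dst, ts):
        # Insert at the position that keeps the per-vertex timestamps sorted.
        i = bisect.bisect_right(self._times[src], ts)
        self._times[src].insert(i, ts)
        self._nbrs[src].insert(i, dst)

    def neighbors(self, src, start, end):
        # Neighbors reached by edges with start <= timestamp <= end.
        times = self._times.get(src, [])
        nbrs = self._nbrs.get(src, [])
        lo = bisect.bisect_left(times, start)
        hi = bisect.bisect_right(times, end)
        return nbrs[lo:hi]

    def expire(self, cutoff):
        # Slide the window forward: drop edges with timestamp < cutoff.
        for v in list(self._times):
            k = bisect.bisect_left(self._times[v], cutoff)
            del self._times[v][:k]
            del self._nbrs[v][:k]


g = TemporalAdjacency()
g.add_edge(1, 2, ts=10)
g.add_edge(1, 3, ts=20)
g.add_edge(1, 4, ts=15)
print(g.neighbors(1, 12, 20))  # [4, 3]
g.expire(15)
print(g.neighbors(1, 0, 100))  # [4, 3]
```

Keeping each vertex's edges sorted by timestamp makes both the window query and the expiry a binary search plus a slice, which is the property a sliding-window store needs.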
13
Unified Graph Data Access Patterns
[Figure: a weighted directed graph on vertices 1 to 6, drawn as a 6x6 adjacency matrix and as the equivalent set of three shards below. The shards are shown again for step 1, step 2, and step 3 of iteration i.]

shard 1 (1, 2)     shard 2 (3, 4)     shard 3 (5, 6)
src dst value      src dst value      src dst value
1   2   0.3        1   3   0.4        2   5   0.6
3   2   0.2        2   3   0.3        3   5   0.9
4   1   1.4        3   4   0.8        3   6   1.2
5   1   0.5        5   3   0.2        4   5   0.3
5   2   0.6        6   4   1.9        5   6   1.1
6   2   0.8

Observation on PSW data access patterns inspires a highly efficient sharding representation
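The shard tables on this slide can be reproduced with a short sketch. It assumes that PSW refers to the parallel sliding windows scheme, that each shard holds the edges whose destination falls in the shard's vertex interval, and that edges inside a shard are sorted by source. The function names `build_shards` and `sliding_window` are illustrative only.

```python
# Edge list (src, dst, value) read off the shard tables on this slide.
EDGES = [
    (1, 2, 0.3), (3, 2, 0.2), (4, 1, 1.4), (5, 1, 0.5), (5, 2, 0.6), (6, 2, 0.8),
    (1, 3, 0.4), (2, 3, 0.3), (3, 4, 0.8), (5, 3, 0.2), (6, 4, 1.9),
    (2, 5, 0.6), (3, 5, 0.9), (3, 6, 1.2), (4, 5, 0.3), (5, 6, 1.1),
]
INTERVALS = [(1, 2), (3, 4), (5, 6)]  # vertex interval owned by each shard


def build_shards(edges, intervals):
    """Assign each edge to the shard owning its destination, sorted by source."""
    shards = [[] for _ in intervals]
    for src, dst, val in edges:
        for p, (lo, hi) in enumerate(intervals):
            if lo <= dst <= hi:
                shards[p].append((src, dst, val))
                break
    for shard in shards:
        shard.sort(key=lambda e: (e[0], e[1]))
    return shards


def sliding_window(shard, interval):
    """Edges of a shard whose source lies in the given interval.

    Because the shard is sorted by source, these edges form one contiguous run.
    """
    lo, hi = interval
    return [e for e in shard if lo <= e[0] <= hi]


shards = build_shards(EDGES, INTERVALS)
# Processing interval (1, 2): all of shard 1 gives the in-edges, and one
# window from each other shard gives the out-edges.
print(shards[0])
print(sliding_window(shards[1], INTERVALS[0]))  # [(1, 3, 0.4), (2, 3, 0.3)]
print(sliding_window(shards[2], INTERVALS[0]))  # [(2, 5, 0.6)]
```

Sorting by source is what turns the out-edge lookup into a sequential read: as the processed interval advances, the window in every shard only moves forward.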
14
Construct Edge-set Flows
[Figure: the 6x6 weighted adjacency matrix of the example graph (rows and columns 1 to 6), shown before and after a row permutation and a column permutation. In the permuted matrices the rows are listed in the order 3, 5, 1, 2, 4, 6. The matrix is tiled into a 3x3 grid of physical edge-sets:

1 2 3
4 5 6
7 8 9

Edge-set labels shown along the flows: 1 4 7 1 2 3 2 5 8 4 5 6. An arrow marks the flow direction.]
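One reading consistent with the labels 1 4 7, 1 2 3, 2 5 8, 4 5 6 is that edge-sets are numbered row by row, with rows indexed by source interval and columns by destination interval, and that processing interval k touches column k (in-edges) and then row k (out-edges). The sketch below encodes that reading. The orientation, the unpermuted intervals, and the function names are assumptions, not details confirmed by the slide.

```python
INTERVALS = [(1, 2), (3, 4), (5, 6)]  # assumed vertex intervals, before permutation


def interval_index(v, intervals):
    """0-based index of the interval containing vertex v."""
    for p, (lo, hi) in enumerate(intervals):
        if lo <= v <= hi:
            return p
    raise ValueError(f"vertex {v} is outside all intervals")


def edge_set_id(src, dst, intervals):
    """1-based, row-major id: row = source interval, column = destination interval."""
    n = len(intervals)
    return interval_index(src, intervals) * n + interval_index(dst, intervals) + 1


def edge_sets_for_interval(k, n):
    """Edge-set ids touched when processing interval k (0-based) of n intervals."""
    column = [r * n + k + 1 for r in range(n)]  # in-edges of interval k
    row = [k * n + c + 1 for c in range(n)]     # out-edges of interval k
    return column, row


print(edge_set_id(5, 1, INTERVALS))   # 7
print(edge_sets_for_interval(0, 3))   # ([1, 4, 7], [1, 2, 3])
print(edge_sets_for_interval(1, 3))   # ([2, 5, 8], [4, 5, 6])
```

Under this reading a column of the grid is exactly one shard from the previous slide, and a row is the set of sliding windows taken across all shards.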
15
Preliminary Experiments - Preproc.
Graph Ingestion/Preprocessing Time
Create the data in our format
16
Preliminary Experiments - Comp.
PageRank w/o Loading Time
Decent speedup achieved w/ or w/o loading time
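For reference, the computation being timed can be sketched as a plain power-iteration PageRank over an edge list. This is a textbook formulation, not the engine's implementation. The damping factor, the iteration count, and the assumption that every vertex has at least one out-edge are choices made for illustration.

```python
def pagerank(edges, num_vertices, damping=0.85, iterations=20):
    """Power-iteration PageRank over (src, dst) pairs with vertices 1..num_vertices.

    Assumes every vertex has at least one out-edge, so no rank mass is lost.
    """
    out_degree = [0] * (num_vertices + 1)
    for src, _dst in edges:
        out_degree[src] += 1

    rank = [0.0] + [1.0 / num_vertices] * num_vertices  # index 0 is unused
    for _ in range(iterations):
        nxt = [0.0] + [(1.0 - damping) / num_vertices] * num_vertices
        # One sequential pass over the edges per iteration.
        for src, dst in edges:
            nxt[dst] += damping * rank[src] / out_degree[src]
        rank = nxt
    return rank[1:]


# A directed 3-cycle is symmetric, so every vertex ends with rank 1/3.
print(pagerank([(1, 2), (2, 3), (3, 1)], num_vertices=3))
```

Each iteration is a single pass over all edges, which is why the edge layout on disk or in memory dominates the running time.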
17
Preliminary Experiments
PageRank Total Time
18
Conclusion
● Many big data problems involve links among a lot of entities,
naturally represented as a graph
● Property graph is highly expressive
● Industry is looking for graph/graphical model engines for complex
network analysis, streaming graph, probabilistic graphical models,
and RDF graph computing
● Efficiency is the key in many industry graph analysis systems,
especially when the data volume is big
● Eventually, the graph engine should serve AI business systems
Thanks
Yinglong Xia
yinglong.xia.2010@ieee.org

8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine
