Trinity: A Distributed Graph Engine
on a Memory Cloud
Speaker: LIN Qian
http://www.comp.nus.edu.sg/~linqian/
Graph applications
Online query processing → low latency
Offline graph analytics → high throughput
Online queries
Random data access
e.g., BFS, sub-graph matching, …
Offline computations
Performed iteratively
Insight: keep the graph in memory
(at least the topology)
Trinity
Online query + Offline analytics
Random data access problem
in large graph computation
Globally addressable distr. memory
Random access abstraction
Belief
High-speed networks are more available
DRAM is getting cheaper
In-memory solutions become practical
“Trinity itself is not a system that comes with
comprehensive built-in graph computation
modules.”
Trinity cluster
Stack of Trinity system modules
User define:
Graph schema, Communication protocols, Computation paradigms
Memory cloud
Partition memory space into trunks
Hashing
Memory trunks
2^p > m (more trunks than machines)
1. Trunk level parallelism
2. Efficient hashing
Hashing
Key-value store
p-bit hash value i ∈ [0, 2^p − 1]
Inner trunk hash table
Data partitioning and addressing
Benefit:
Scalability
Fault-tolerance
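The two-level addressing above can be sketched as follows. This is an illustrative model, not Trinity's implementation: the constants and the round-robin addressing table are assumptions, and the paper's actual hash function is unspecified here.

```python
# Sketch of Trinity-style two-level addressing: a 64-bit cell ID is
# hashed to a p-bit trunk ID (2^p trunks, with 2^p > m machines); a
# shared addressing table maps trunk -> machine, and each trunk would
# keep its own inner hash table from cell ID to a byte offset.
import hashlib

P = 8                       # p bits -> 2^p = 256 memory trunks
MACHINES = 5                # m machines, 2^p > m

def trunk_id(cell_id: int) -> int:
    """Hash a cell ID to a p-bit trunk ID in [0, 2^p - 1]."""
    digest = hashlib.sha1(cell_id.to_bytes(8, "little")).digest()
    return int.from_bytes(digest[:4], "little") % (1 << P)

# Addressing table: trunk -> machine (round-robin here for illustration).
# Fault tolerance: on machine failure, only this table needs updating,
# remapping the dead machine's trunks to live machines.
addressing_table = [t % MACHINES for t in range(1 << P)]

def locate(cell_id: int) -> tuple[int, int]:
    """Return (machine, trunk) currently holding a cell."""
    t = trunk_id(cell_id)
    return addressing_table[t], t
```

Because every machine serves multiple trunks, the scheme gives trunk-level parallelism and lets load be rebalanced by editing the addressing table alone.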
Modeling graph
Cell: value + schema
Represent a node in a cell
TSL
Object-oriented cell manipulation
Data integration
Network communication
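As a rough analogy for the cell model (this is plain Python, not TSL; the field names are illustrative), a node-as-cell pairs a schema with its stored value, and TSL-generated accessors would let user code manipulate the underlying blob as an object:

```python
# Illustrative cell model: each graph node is one cell holding its
# adjacency list (the topology) plus an opaque data payload.
from dataclasses import dataclass, field

@dataclass
class NodeCell:
    cell_id: int
    links: list[int] = field(default_factory=list)  # outgoing edges
    data: bytes = b""                               # associated payload

def add_edge(store: dict, src: int, dst: int) -> None:
    """Append an edge, creating the source cell on first use."""
    store.setdefault(src, NodeCell(src)).links.append(dst)
```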
Online queries
Traversal-based
New paradigm
Vertex-centric
Offline analytics
Restrictive vertex-centric model
Message passing
Optimization
Create a bipartite partition of the
local graph
Buffer hub vertices
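The vertex-centric, message-passing style of offline analytics can be sketched with a toy BSP loop (an assumed API, not Trinity's): in each superstep, every active vertex consumes incoming messages and sends messages to its neighbors, here propagating BFS levels.

```python
# Minimal vertex-centric BSP sketch: one loop iteration = one
# superstep; `messages` holds the values delivered across the barrier.
def bfs_levels(adj: dict[int, list[int]], source: int) -> dict[int, int]:
    level = {source: 0}
    messages = {source: 0}      # vertices activated this superstep
    step = 0
    while messages:
        step += 1
        outgoing: dict[int, int] = {}
        for v in messages:                  # "compute" on each active vertex
            for nbr in adj.get(v, []):
                if nbr not in level:        # first message wins
                    level[nbr] = step
                    outgoing[nbr] = step
        messages = outgoing                 # barrier, then deliver messages
    return level
```

In a distributed run, the bipartite-partitioning and hub-buffering optimizations above would batch these per-edge messages so that each remote machine receives one aggregated message per hub vertex instead of one per edge.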
A new paradigm for
offline analytics
1. Aggregate answers from local
computations
2. Employ probabilistic inference
Circular memory management
• Aims to avoid memory gaps among a large number of key-value pairs
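A toy sketch of the idea (assumptions, not Trinity's code): key-value pairs are appended at a moving head so live data stays contiguous, and when the head runs out of room, still-live entries are relocated to squeeze out the gaps left by deleted pairs.

```python
# Toy circular-trunk allocator: append-at-head with compaction on wrap.
class CircularTrunk:
    def __init__(self, size: int):
        self.buf = bytearray(size)
        self.head = 0                     # next append offset
        self.index = {}                   # key -> (offset, length)

    def put(self, key: int, value: bytes) -> None:
        if self.head + len(value) > len(self.buf):
            self._compact()               # out of room: close the gaps
        off = self.head
        self.buf[off:off + len(value)] = value
        self.index[key] = (off, len(value))
        self.head = off + len(value)

    def get(self, key: int) -> bytes:
        off, n = self.index[key]
        return bytes(self.buf[off:off + n])

    def delete(self, key: int) -> None:
        del self.index[key]               # leaves a gap until compaction

    def _compact(self) -> None:
        """Copy live entries to the front, squeezing out dead gaps."""
        new = bytearray(len(self.buf))
        pos = 0
        for key, (off, n) in sorted(self.index.items(), key=lambda kv: kv[1][0]):
            new[pos:pos + n] = self.buf[off:off + n]
            self.index[key] = (pos, n)
            pos += n
        self.buf, self.head = new, pos
```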
Fault tolerance
Heartbeat-based failure detection
BSP: checkpointing
Async.: “periodical interruption”
Performance
Performance (cont.)