Trinity: A Distributed Graph Engine on a Memory Cloud

Speaker: LIN Qian
http://www.comp.nus.edu.sg/~linqian/

Graph applications
Online query processing → low latency
Offline graph analytics → high throughput

Online queries
Random data access
e.g., BFS, sub-graph matching, …

Offline computations
Performed iteratively

Insight: keep the graph in memory, at least its topology

Trinity
Online query + Offline analytics

Random data access problem in large graph computation
Globally addressable distributed memory
Random-access abstraction

Belief
High-speed networks are more widely available
DRAM is cheaper
In-memory solutions become practical

“Trinity itself is not a system that comes with comprehensive built-in graph computation modules.”

Stack of Trinity system modules
Users define:
Graph schemas, communication protocols, computation paradigms

Memory cloud
Partition memory space into trunks
Hashing

Memory trunks
2^p > m: 2^p trunks spread over m machines, so each machine hosts several trunks
1. Trunk-level parallelism
2. Efficient hashing
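A minimal Python sketch of trunk-level parallelism, assuming each trunk owns its own hash table and its own lock. The trunk count (2^4 = 16), the machine count (3), the modulo hash, and the round-robin placement are hypothetical choices; the point is that with more trunks than machines, every machine hosts several trunks, and work on different trunks never waits on the same lock. Python threads only illustrate the locking structure here, not real speedup.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

P = 4                 # hypothetical: 2**P = 16 trunks
M = 3                 # hypothetical: 3 machines, so 2**P > M
NUM_TRUNKS = 2 ** P

class Trunk:
    """One memory trunk with its own hash table (a dict) and its own lock."""

    def __init__(self) -> None:
        self.cells = {}
        self.lock = threading.Lock()

    def put(self, key: int, value: str) -> None:
        # Only threads writing to this same trunk ever wait on this lock.
        with self.lock:
            self.cells[key] = value

trunks = [Trunk() for _ in range(NUM_TRUNKS)]

# Round-robin trunk placement (an assumption): machine id for each trunk.
placement = {t: t % M for t in range(NUM_TRUNKS)}

def load_trunk(t: int, num_keys: int = 1000) -> None:
    # Stand-in hash: key k lives in trunk k % NUM_TRUNKS, so this worker
    # only ever touches trunk t and never contends with the other workers.
    for k in range(t, num_keys, NUM_TRUNKS):
        trunks[t].put(k, f"cell-{k}")

with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(load_trunk, range(NUM_TRUNKS)))
```

With 16 trunks on 3 machines, round-robin placement gives each machine five or six trunks.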

Hashing
Key-value store
Hash each key to a p-bit value i ∈ [0, 2^p − 1], the trunk index
Inner-trunk hash table locates the cell inside trunk i
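The two-level lookup can be sketched in Python as follows: the first hash picks a trunk, and a per-trunk table finds the cell. The 64-bit key width, BLAKE2b as the hash, p = 8, and the names `trunk_index`, `put`, and `get` are assumptions for illustration, not Trinity's actual hash function or API.

```python
import hashlib
from typing import Optional

P = 8  # hypothetical width of the trunk index: 2**8 = 256 trunks

def trunk_index(key: int, p: int = P) -> int:
    """Level 1: hash a 64-bit cell key to a p-bit value i in [0, 2**p - 1]."""
    digest = hashlib.blake2b(key.to_bytes(8, "little"), digest_size=8).digest()
    return int.from_bytes(digest, "little") & ((1 << p) - 1)

# Level 2: each trunk has its own inner hash table (a dict stands in for it).
trunks = [{} for _ in range(1 << P)]

def put(key: int, value: bytes) -> None:
    trunks[trunk_index(key)][key] = value

def get(key: int) -> Optional[bytes]:
    return trunks[trunk_index(key)].get(key)
```

Because each inner table only holds the keys of one trunk, it stays much smaller than a single machine-wide table would.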

Data partitioning and addressing
Benefits:
Scalability
Fault tolerance
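A hypothetical sketch of why this addressing scheme helps both benefits: if a table maps each trunk index to the machine that hosts it, then scaling out or recovering from a failure only rewrites table entries, while the key-to-trunk hash never changes. The class name, the round-robin initial placement, and the reassignment policy are assumptions, not Trinity's implementation.

```python
from typing import Dict, List

class AddressingTable:
    """Maps each trunk index to the machine that currently hosts it."""

    def __init__(self, num_trunks: int, machines: List[str]) -> None:
        # Initial round-robin placement (an assumption for this sketch).
        self.slots: List[str] = [machines[i % len(machines)] for i in range(num_trunks)]

    def machine_for(self, trunk: int) -> str:
        return self.slots[trunk]

    def reassign(self, failed: str, survivors: List[str]) -> Dict[int, str]:
        """Move every trunk of a failed machine onto the survivors; return the moves."""
        moves: Dict[int, str] = {}
        for i, owner in enumerate(self.slots):
            if owner == failed:
                # Spread the orphaned trunks over the survivors in turn.
                self.slots[i] = survivors[len(moves) % len(survivors)]
                moves[i] = self.slots[i]
        return moves
```

A key still hashes to the same trunk after `reassign`; only the machine that serves that trunk changes.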

Modeling a graph
Cell: value + schema
Each node is represented as a cell
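To make "value + schema" concrete, here is a minimal Python sketch of a node cell stored as a flat byte blob whose layout is fixed by a schema. The field set (an id plus out-neighbor ids) and the little-endian layout are hypothetical; in Trinity the schema would be declared in TSL rather than hand-written like this.

```python
import struct
from dataclasses import dataclass
from typing import List

@dataclass
class NodeCell:
    """Hypothetical node schema: the node id and its out-neighbor ids."""
    node_id: int
    outlinks: List[int]

    def to_blob(self) -> bytes:
        # Assumed layout: uint64 id | uint32 count | count x uint64 neighbor ids
        n = len(self.outlinks)
        return struct.pack(f"<QI{n}Q", self.node_id, n, *self.outlinks)

    @classmethod
    def from_blob(cls, blob: bytes) -> "NodeCell":
        node_id, count = struct.unpack_from("<QI", blob, 0)
        outlinks = list(struct.unpack_from(f"<{count}Q", blob, 12))  # header is 12 bytes
        return cls(node_id, outlinks)
```

Storing `cell.to_blob()` under `cell.node_id` in a key-value store then gives one cell per node.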

TSL (Trinity Specification Language)
Object-oriented cell manipulation
Data integration
Network communication
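One way to read "object-oriented cell manipulation" is an accessor object that reads and writes schema fields directly inside the stored blob, without decoding the whole cell first. The Python class below is a hypothetical analogue of such an accessor for an id-plus-neighbors layout (uint64 id, uint32 count, then uint64 neighbor ids); it is not code generated by TSL.

```python
import struct

class NodeAccessor:
    """Hypothetical in-place view over a node blob laid out as
    uint64 id | uint32 count | count x uint64 neighbor ids."""

    def __init__(self, buf: bytearray) -> None:
        self._buf = memoryview(buf)   # no copy: reads and writes hit the blob itself

    @property
    def node_id(self) -> int:
        return struct.unpack_from("<Q", self._buf, 0)[0]

    @property
    def degree(self) -> int:
        return struct.unpack_from("<I", self._buf, 8)[0]

    def _offset(self, i: int) -> int:
        if not 0 <= i < self.degree:
            raise IndexError(i)
        return 12 + 8 * i             # skip the 12-byte header

    def neighbor(self, i: int) -> int:
        return struct.unpack_from("<Q", self._buf, self._offset(i))[0]

    def set_neighbor(self, i: int, value: int) -> None:
        struct.pack_into("<Q", self._buf, self._offset(i), value)
```

Reading `neighbor(1)` touches only those eight bytes, which matters when cells are large and a query needs one field.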

Online queries
Traversal-based
New paradigm
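A traversal-based online query can be sketched as a hop-limited breadth-first search in which every neighbor lookup is one random key-value access. In this Python sketch a dict stands in for the distributed memory cloud; the function name and the k-hop query are illustrative assumptions.

```python
from collections import deque
from typing import Dict, List, Set

def k_hop_neighbors(graph: Dict[int, List[int]], start: int, k: int) -> Set[int]:
    """Return every node reachable from `start` within k hops (excluding start).

    Each graph.get(v) is one random key-value access, which is the
    access pattern that keeping the graph in memory makes cheap.
    """
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        v, depth = frontier.popleft()
        if depth == k:
            continue                      # hop budget used up on this path
        for u in graph.get(v, []):
            if u not in seen:
                seen.add(u)
                frontier.append((u, depth + 1))
    seen.discard(start)
    return seen
```

For example, on `{1: [2, 3], 2: [4], 3: [4, 5], 4: [6], 5: [], 6: []}` the 2-hop neighborhood of node 1 is `{2, 3, 4, 5}`.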

Vertex-centric offline analytics
Restrictive vertex-centric model
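Assuming the restriction means each vertex sends messages only to a fixed, known set of vertices (here, its out-neighbors), a synchronous PageRank fits the model: each superstep is a send phase, a barrier, and a receive phase. This is a single-process Python sketch; 0.85 is the conventional PageRank damping factor, every edge target must appear as a key, and dangling vertices are not handled.

```python
from typing import Dict, List

def pagerank_supersteps(graph: Dict[int, List[int]], steps: int,
                        d: float = 0.85) -> Dict[int, float]:
    """Synchronous vertex-centric PageRank.

    Each vertex only ever messages its own out-neighbors, so the set of
    message destinations is known before the computation starts.
    """
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    for _ in range(steps):
        inbox = {v: 0.0 for v in graph}
        # Send phase: each vertex splits its rank evenly along its out-edges.
        for v, outs in graph.items():
            for u in outs:
                inbox[u] += rank[v] / len(outs)
        # Receive phase (after the barrier): each vertex updates from its inbox.
        rank = {v: (1 - d) / n + d * inbox[v] for v in graph}
    return rank
```

Because destinations are fixed in advance, a runtime can plan message buffers and batching before the first superstep.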

Message passing optimization
Create a bipartite partition of the local graph
Buffer hub vertices
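The following Python sketch shows one plausible reading of these two steps, under stated assumptions: the bipartite view links each local vertex to the remote vertices it receives messages from, and a "hub" is taken to be a remote vertex whose messages are needed by at least `threshold` local vertices, so buffering its message once saves repeated transfers. Both the hub criterion and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set

def bipartite_view(in_edges: Dict[int, List[int]], local: Set[int]) -> Dict[int, Set[int]]:
    """Map each local vertex to the remote vertices it receives messages from.

    `in_edges` gives each local vertex's in-neighbors (hypothetical input format).
    Local-to-local edges are dropped: those messages never leave the machine.
    """
    return {v: {u for u in srcs if u not in local} for v, srcs in in_edges.items()}

def pick_hubs(view: Dict[int, Set[int]], threshold: int) -> Set[int]:
    """Remote vertices needed by at least `threshold` local vertices (assumed hub rule).

    One buffered copy of each hub's message can serve all of its local readers.
    """
    demand = Counter(u for srcs in view.values() for u in srcs)
    return {u for u, c in demand.items() if c >= threshold}
```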

A new paradigm for offline analytics
1. Aggregate answers from local computations
2. Employ probabilistic inference
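As a simple illustration of step 1 (step 2, probabilistic inference, is not modeled here), the sketch below finds the k highest-degree vertices: each machine reports its local top k, and a coordinator merges them. The merge is exact because every vertex is stored on exactly one machine, so each global winner appears in its own machine's local answer. The top-k task and function names are assumptions chosen for illustration.

```python
import heapq
from typing import Dict, List, Tuple

def local_top_k(partition: Dict[int, List[int]], k: int) -> List[Tuple[int, int]]:
    """Per machine: the k highest-degree vertices it stores, as (degree, vertex)."""
    return heapq.nlargest(k, ((len(nbrs), v) for v, nbrs in partition.items()))

def global_top_k(local_answers: List[List[Tuple[int, int]]], k: int) -> List[Tuple[int, int]]:
    """Coordinator: merge the small local answers instead of scanning the whole graph."""
    return heapq.nlargest(k, (pair for answer in local_answers for pair in answer))
```

Only k pairs per machine cross the network, regardless of graph size.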

Circular memory management
• Aims to avoid memory gaps between a large number of key-value pairs
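A minimal Python sketch of the circular idea, assuming a trunk is a fixed-size ring buffer: new cells are appended at the head, deletions and overwrites leave gaps, and compaction walks from the tail, copying live cells to the head so the space behind them becomes free. The class and method names and the byte-at-a-time copying are simplifications for illustration, not Trinity's allocator.

```python
from typing import Dict, List, Tuple

class CircularTrunk:
    """A fixed-size ring buffer of cells with tail-to-head compaction (sketch).

    head and tail are ever-increasing logical offsets; the physical byte
    position is offset % capacity. Bytes in [tail, head) may still be in use.
    """

    def __init__(self, capacity: int) -> None:
        self.buf = bytearray(capacity)
        self.capacity = capacity
        self.head = 0                                # next logical write offset
        self.tail = 0                                # oldest logical offset not yet freed
        self.index: Dict[str, Tuple[int, int]] = {}  # key -> (offset, length) of live copy
        self.log: List[Tuple[str, int, int]] = []    # every write, in offset order

    def _write(self, data: bytes) -> int:
        if self.head - self.tail + len(data) > self.capacity:
            raise MemoryError("trunk full; call compact()")
        start = self.head
        for i, b in enumerate(data):
            self.buf[(start + i) % self.capacity] = b  # wraps around the ring
        self.head += len(data)
        return start

    def put(self, key: str, value: bytes) -> None:
        off = self._write(value)
        self.index[key] = (off, len(value))          # any older copy becomes a gap
        self.log.append((key, off, len(value)))

    def get(self, key: str) -> bytes:
        off, n = self.index[key]
        return bytes(self.buf[(off + i) % self.capacity] for i in range(n))

    def delete(self, key: str) -> None:
        del self.index[key]                          # bytes stay until compaction

    def compact(self) -> None:
        """Free every gap by moving live cells from the tail to the head."""
        old_log, self.log = self.log, []
        for key, off, n in old_log:
            live = self.index.get(key) == (off, n)
            data = self.get(key) if live else b""
            self.tail = off + n                      # region [off, off + n) is now free
            if live:
                self.put(key, data)
```

After compaction the live cells sit contiguously behind the head, so the freed space is one unbroken run rather than scattered gaps.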

Fault tolerance
Heartbeat-based failure detection
BSP: checkpointing
Asynchronous: “periodical interruption”
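Heartbeat-based detection can be sketched as a monitor that declares a machine failed when its last heartbeat is older than a timeout; a BSP job can pair this with a checkpoint taken at the barrier every few supersteps. The timeout, the checkpoint interval, and all names below are hypothetical, and time is passed in explicitly so the logic is deterministic.

```python
from typing import Dict, List

class HeartbeatMonitor:
    """Flags a machine as failed when its last heartbeat is older than `timeout`."""

    def __init__(self, machines: List[str], timeout: float, now: float = 0.0) -> None:
        self.timeout = timeout
        # Every machine starts as if it had just sent a heartbeat.
        self.last_seen: Dict[str, float] = {m: now for m in machines}

    def heartbeat(self, machine: str, now: float) -> None:
        self.last_seen[machine] = now

    def failed(self, now: float) -> List[str]:
        return sorted(m for m, t in self.last_seen.items() if now - t > self.timeout)

def should_checkpoint(superstep: int, interval: int) -> bool:
    """BSP policy (assumed): snapshot at the barrier after every `interval` supersteps."""
    return superstep > 0 and superstep % interval == 0
```

On a reported failure, a BSP job can roll back to the latest checkpoint and replay from there.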
