Trinity: A Distributed Graph Engine
on a Memory Cloud
Speaker: LIN Qian
http://www.comp.nus.edu.sg/~linqian/
Graph applications
Online query processing → low latency
Offline graph analytics → high throughput
Online queries
Random data access
e.g., BFS, sub-graph matching, …
Offline computations
Performed iteratively
Insight: keep the graph in memory
(at least its topology)
Trinity
Online query + Offline analytics
Random data access problem
in large graph computation
Globally addressable distributed memory
Random access abstraction
Belief
High-speed networks are increasingly available
DRAM is getting cheaper
In-memory solutions become practical
“Trinity itself is not a system that comes with
comprehensive built-in graph computation
modules.”
Trinity cluster
Stack of Trinity system modules
Users define:
Graph schema, Communication protocols, Computation paradigms
Memory cloud
Partition memory space into trunks
Hashing
Memory trunks
2^p trunks across m machines, with 2^p > m
1. Trunk-level parallelism
2. Efficient hashing
Hashing
Key-value store: (cell id, cell) pairs
Cell id hashed to a p-bit value i ∈ [0, 2^p – 1] → trunk i
In-trunk hash table: cell id → offset within the trunk
Data partitioning and addressing
Benefit:
Scalability
Fault-tolerance
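A minimal Python sketch of the two-level addressing described on these slides; the constants, table layout, and helper names are illustrative assumptions, not Trinity's actual code. A cell id is hashed to a p-bit trunk index, a replicated addressing table maps the trunk to its owning machine, and an in-trunk hash table maps the cell id to an offset.

```python
# Illustrative sketch of the two-level addressing above; constants,
# names, and the trunk->machine assignment are assumptions, not
# Trinity's actual code.

P = 10                      # p hash bits -> 2**P trunks
NUM_MACHINES = 8            # m machines, chosen so that 2**P > m

# Addressing table: trunk index i in [0, 2**P - 1] -> owning machine.
# Small and replicated on every machine, so any machine can resolve
# any cell id; moving a trunk only means updating this table.
addressing_table = [i % NUM_MACHINES for i in range(2 ** P)]

# One in-trunk hash table per trunk: cell id -> byte offset in the trunk.
in_trunk_tables = [dict() for _ in range(2 ** P)]

def locate(cell_id: int):
    """Resolve a cell id to (machine, trunk, offset)."""
    trunk = hash(cell_id) & (2 ** P - 1)          # p-bit hash value i
    machine = addressing_table[trunk]             # which machine owns trunk i
    offset = in_trunk_tables[trunk].get(cell_id)  # lookup done on that machine
    return machine, trunk, offset
```

Because the addressing table is small and replicated, scaling out and recovering from a failure reduce to remapping trunks in the table rather than rehashing every cell; having more trunks than machines (2^p > m) also gives each machine several trunks to serve in parallel.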
Modeling the graph
Cell: value + schema
Each graph node is represented as a cell
TSL (Trinity Specification Language)
Object-oriented cell manipulation
Data integration
Network communication
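The slides do not show TSL syntax, so here is a hedged Python stand-in for the cell idea (value + schema); the cell and field names are hypothetical, and in Trinity the schema would be declared in TSL and compiled into object-oriented accessors over raw bytes in the memory cloud.

```python
# Hypothetical sketch of a cell (value + schema) in Python; this is not
# TSL syntax.

from dataclasses import dataclass, field
from typing import List

CellId = int   # cells are addressed by id through the memory cloud

@dataclass
class MovieCell:                         # one graph node stored as one cell
    name: str                            # node attribute
    actors: List[CellId] = field(default_factory=list)   # outgoing edges

cell = MovieCell(name="Example movie", actors=[101, 102])
```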
Online queries
Traversal-based
New paradigm
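A minimal sketch of a traversal-based online query (the BFS case from the earlier slide); `neighbors()` stands in for a hypothetical memory-cloud accessor that loads a cell by id and returns its adjacency list.

```python
# Minimal sketch of a traversal-based online query (BFS).

from collections import deque

def bfs(start, neighbors, max_hops=3):
    """Breadth-first expansion from one cell, up to max_hops hops.
    Low latency relies on every hop being a random in-memory access."""
    visited = {start}
    frontier = deque([(start, 0)])
    while frontier:
        cell_id, hops = frontier.popleft()
        if hops == max_hops:
            continue
        for nbr in neighbors(cell_id):
            if nbr not in visited:
                visited.add(nbr)
                frontier.append((nbr, hops + 1))
    return visited

# Example on a toy adjacency map standing in for the memory cloud:
toy = {1: [2, 3], 2: [4], 3: [], 4: []}
print(bfs(1, lambda v: toy[v]))          # -> {1, 2, 3, 4}
```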
Vertex-centric
offline analytics
Restrictive vertex-centric model
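A single-machine sketch of the restrictive vertex-centric (BSP) model: the same user-defined compute function runs on every vertex in each superstep, and messages are delivered only at the superstep barrier. The runner, its signatures, and the example compute function are illustrative, not Trinity's API.

```python
# Single-machine sketch of the vertex-centric (BSP) model.

def run_vertex_centric(graph, compute, num_supersteps):
    """graph: vertex -> list of neighbors.
    compute(v, neighbors, state, messages, send) returns the new state;
    send(u, m) queues a message that u receives in the NEXT superstep."""
    state = {v: None for v in graph}
    inbox = {v: [] for v in graph}
    for _ in range(num_supersteps):
        outbox = {v: [] for v in graph}
        def send(u, m, _outbox=outbox):
            _outbox[u].append(m)
        for v in graph:
            state[v] = compute(v, graph[v], state[v], inbox[v], send)
        inbox = outbox                    # superstep barrier
    return state

# Example compute: propagate the minimum vertex id (connected components).
def min_label(v, neighbors, state, messages, send):
    label = min([v if state is None else state] + messages)
    for u in neighbors:
        send(u, label)
    return label

graph = {1: [2], 2: [1, 3], 3: [2], 4: []}
print(run_vertex_centric(graph, min_label, num_supersteps=4))
# -> {1: 1, 2: 1, 3: 1, 4: 4}
```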
Message passing
optimization
Create a bipartite partition of the
local graph
Buffer hub vertices
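The sketch below illustrates only the buffering half of this optimization: messages destined for high-degree (hub) vertices are combined on the sender side before anything crosses the network. The threshold, the `degree()`/`owner()` helpers, and the combiner are assumptions, not Trinity's implementation.

```python
# Illustrative sketch of sender-side buffering for hub vertices.

from collections import defaultdict

HUB_DEGREE = 1000           # assumed cutoff: vertices above this are hubs

def route_messages(out_msgs, degree, owner, combine=sum):
    """out_msgs: iterable of (dst_vertex, value) pairs produced locally.
    Returns {machine: [(dst_vertex, value), ...]}, with messages to hub
    vertices pre-combined so each hub gets one message per sender."""
    hub_buffer = defaultdict(list)        # hub vertex -> locally buffered values
    per_machine = defaultdict(list)       # machine -> outgoing messages
    for dst, val in out_msgs:
        if degree(dst) >= HUB_DEGREE:
            hub_buffer[dst].append(val)   # buffer instead of sending directly
        else:
            per_machine[owner(dst)].append((dst, val))
    for dst, vals in hub_buffer.items():
        per_machine[owner(dst)].append((dst, combine(vals)))
    return per_machine
```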
A new paradigm for
offline analytics
1. Aggregate answers from local
computations
2. Employ probabilistic inference
Circular memory management
• Aim to avoid memory gaps among the large
number of key-value pairs stored in a trunk
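A toy sketch of the stated goal, keeping key-value pairs packed in a trunk with no gaps; it uses append-plus-compaction and is only an illustration of the idea, not Trinity's actual circular allocator.

```python
# Toy sketch of gap-free key-value storage inside one trunk.

class CircularTrunk:
    def __init__(self, size):
        self.buf = bytearray(size)
        self.head = 0                 # next append position
        self.index = {}               # key -> (offset, length)

    def put(self, key, value: bytes):
        if self.head + len(value) > len(self.buf):
            self._compact()           # reclaim gaps left by old versions
        if self.head + len(value) > len(self.buf):
            raise MemoryError("trunk full")
        off = self.head
        self.buf[off:off + len(value)] = value
        self.index[key] = (off, len(value))   # old version becomes a gap
        self.head += len(value)

    def get(self, key):
        off, length = self.index[key]
        return bytes(self.buf[off:off + length])

    def _compact(self):
        """Copy live pairs to the front, squeezing out gaps left by
        overwritten or deleted values."""
        live = [(k, self.get(k)) for k in self.index]
        self.head = 0
        self.index.clear()
        for k, v in live:
            self.buf[self.head:self.head + len(v)] = v
            self.index[k] = (self.head, len(v))
            self.head += len(v)
```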
Fault tolerance
Heartbeat-based failure detection
BSP: checkpointing
Async.: "periodic interruption"
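A minimal sketch of heartbeat-based failure detection; the timeout value and class layout are assumptions about the general technique, not Trinity's protocol.

```python
# Minimal sketch of heartbeat-based failure detection.

import time

HEARTBEAT_TIMEOUT = 5.0     # seconds (assumed)

class FailureDetector:
    def __init__(self, machines):
        now = time.monotonic()
        self.last_seen = {m: now for m in machines}

    def heartbeat(self, machine):
        """Called whenever a heartbeat message arrives from a machine."""
        self.last_seen[machine] = time.monotonic()

    def failed_machines(self):
        """Machines silent past the timeout; their memory trunks would
        then be reloaded on healthy machines."""
        now = time.monotonic()
        return [m for m, t in self.last_seen.items()
                if now - t > HEARTBEAT_TIMEOUT]
```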
Performance
Performance (cont.)