Gluecon InfiniteGraph Presentation: Scaling the Social Graph in the Cloud
1. Complex graph traversal
across large, distributed datasets
Glue Conference 2010, Colorado
Darren Wood, Lead Architect
InfiniteGraph
Copyright © InfiniteGraph
3. Graph Databases (Quickly)
• Optimized around data relationships
• Small focused API (typically not SQL)
• Typical use cases:
– Social Graph Analysis
– Catching Bad Guys (see Booth 16)
– Fraud / Financial (more bad guys)
– Data Intensive Science
– Web / Advertising Analytics
4. Graph Databases (Almost Done)
Vertex alice = myGraph.addVertex(new Person("Alice"));
Vertex bob = myGraph.addVertex(new Person("Bob"));
Vertex carlos = myGraph.addVertex(new Person("Carlos"));
Vertex charlie = myGraph.addVertex(new Person("Charlie"));
alice.addEdge(new Meeting("Denver", "5-27-10"), bob);
bob.addEdge(new Call(timestamp), carlos);
carlos.addEdge(new Payment(100000.00), charlie);
bob.addEdge(new Call(timestamp), charlie);
[Diagram: Alice -Meets-> Bob; Bob -Calls-> Carlos; Carlos -Pays-> Charlie; Bob -Calls-> Charlie]
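The snippet on this slide relies on the InfiniteGraph API and on classes (Person, Meeting, Call, Payment) that the deck does not define. As a rough, self-contained stand-in, the same four people and four relationships can be modelled with a plain adjacency list. TinyGraph and its methods are hypothetical names used only for illustration; this is not the InfiniteGraph API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in; not the InfiniteGraph API.
class TinyGraph {
    // vertex name -> outgoing edges, each rendered as "label->target"
    private final Map<String, List<String>> out = new LinkedHashMap<>();

    void addVertex(String name) {
        out.putIfAbsent(name, new ArrayList<>());
    }

    void addEdge(String from, String label, String to) {
        addVertex(from);
        addVertex(to);
        out.get(from).add(label + "->" + to);
    }

    List<String> edgesFrom(String name) {
        return out.getOrDefault(name, new ArrayList<>());
    }

    // Builds the graph from the slide: Alice, Bob, Carlos, Charlie.
    static TinyGraph slideExample() {
        TinyGraph g = new TinyGraph();
        g.addEdge("Alice", "Meets", "Bob");
        g.addEdge("Bob", "Calls", "Carlos");
        g.addEdge("Carlos", "Pays", "Charlie");
        g.addEdge("Bob", "Calls", "Charlie");
        return g;
    }

    public static void main(String[] args) {
        TinyGraph g = slideExample();
        System.out.println("Bob: " + g.edgesFrom("Bob"));
    }
}
```

Running main prints `Bob: [Calls->Carlos, Calls->Charlie]`, i.e. the two outgoing Calls edges from the slide.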
5. What’s So Difficult Then?
• Graphs grow quickly
– Billions of phone calls / day in US
– Emails, social media events, IP Traffic
– Financial transactions
• Some analytics require navigation of large sections of the graph
• Each step (often) depends on the last
• Must distribute data and go parallel
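To make the "each step depends on the last" point concrete, here is a minimal hop-by-hop traversal sketch in plain Java. HopTraversal, within and slideGraph are hypothetical names, and the adjacency map simply mirrors the Alice/Bob/Carlos/Charlie example from slide 4. The frontier for hop k+1 is only known once hop k has been fully expanded, which is what makes the navigation hard to parallelise across hops.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a hop-limited breadth-first traversal.
class HopTraversal {
    // Returns the vertices reachable from start within maxHops, in visit order.
    static List<String> within(Map<String, List<String>> adj, String start, int maxHops) {
        Set<String> seen = new LinkedHashSet<>();
        seen.add(start);
        List<String> frontier = new ArrayList<>();
        frontier.add(start);
        for (int hop = 0; hop < maxHops && !frontier.isEmpty(); hop++) {
            List<String> next = new ArrayList<>();
            for (String v : frontier) {
                for (String n : adj.getOrDefault(v, Collections.<String>emptyList())) {
                    if (seen.add(n)) {
                        next.add(n);
                    }
                }
            }
            frontier = next; // the next hop cannot start until this one is known
        }
        return new ArrayList<>(seen);
    }

    // Adjacency for the example people on slide 4.
    static Map<String, List<String>> slideGraph() {
        Map<String, List<String>> adj = new HashMap<>();
        adj.put("Alice", Arrays.asList("Bob"));
        adj.put("Bob", Arrays.asList("Carlos", "Charlie"));
        adj.put("Carlos", Arrays.asList("Charlie"));
        return adj;
    }

    public static void main(String[] args) {
        System.out.println(within(slideGraph(), "Alice", 2));
    }
}
```

With two hops from Alice, main prints `[Alice, Bob, Carlos, Charlie]`; with one hop the result would stop at Bob.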
6. First Some Good News…
• Graph algorithms naturally branch
• Can be automated or guided
[Diagram: the Alice, Bob, Carlos, Charlie graph (Meets, Calls, Pays, Calls edges) extended with Chuck, Dave and Eve, connected by Lives With and Meets edges]
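The branching idea can be sketched as follows: within a single hop, every vertex in the frontier opens an independent branch, so the branches can be expanded concurrently, and a predicate can "guide" the walk by pruning edges. This is only an illustration in plain Java. BranchingTraversal, expand and sample are hypothetical names, and the edges leading to Chuck, Dave and Eve are assumed, since the transcript does not preserve the diagram's exact connections.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.BiPredicate;
import java.util.stream.Collectors;

// Hypothetical sketch: expand one frontier in parallel, optionally guided.
class BranchingTraversal {
    // Each frontier vertex is an independent branch, so a parallel stream
    // can expand them concurrently. guide decides which edges to follow.
    static Set<String> expand(Map<String, List<String>> adj, Set<String> frontier,
                              BiPredicate<String, String> guide) {
        return frontier.parallelStream()
                .flatMap(v -> adj.getOrDefault(v, Collections.<String>emptyList()).stream()
                        .filter(n -> guide.test(v, n)))
                .collect(Collectors.toCollection(TreeSet::new));
    }

    // Assumed edges, for illustration only.
    static Map<String, List<String>> sample() {
        Map<String, List<String>> adj = new HashMap<>();
        adj.put("Alice", Arrays.asList("Bob", "Chuck"));
        adj.put("Bob", Arrays.asList("Carlos", "Charlie"));
        adj.put("Chuck", Arrays.asList("Dave", "Eve"));
        return adj;
    }

    public static void main(String[] args) {
        // Automated: follow every edge.
        Set<String> hop1 = expand(sample(), Collections.singleton("Alice"), (from, to) -> true);
        // Guided: ignore anything reached through Chuck.
        Set<String> hop2 = expand(sample(), hop1, (from, to) -> !from.equals("Chuck"));
        System.out.println(hop1 + " " + hop2);
    }
}
```

The results are collected into a sorted set so the output does not depend on thread scheduling; main prints `[Bob, Chuck] [Carlos, Charlie]`.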
7. Big Distributed Data
(Traditional - Huge Generalization)
Application(s)
Distributed API
Processor Processor Processor Processor
Partition 1 Partition 2 Partition 3 Partition ...n
8. Big Distributed Data
(Graph)
Application(s)
Distributed API
Processor Processor Processor Processor
Partition 1 Partition 2 Partition 3 Partition ...n
9. So What Are The Answers?
Best Effort Partitioning
Distributed API
Processor Processor
Partition 1 Partition 2
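The deck does not spell out how best effort partitioning is done. One common heuristic, shown here purely as an assumed sketch and not as InfiniteGraph's actual placement logic, is greedy placement: put each new vertex on the partition that already holds most of its neighbours, and fall back to the least-loaded partition when that one is full. BestEffortPlacer, place, demo and the per-partition capacity are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical greedy placer; an assumed heuristic, not the product's algorithm.
class BestEffortPlacer {
    private final int partitions;
    private final int capacity;   // assumed limit on vertices per partition
    private final int[] load;
    private final Map<String, Integer> home = new HashMap<>();

    BestEffortPlacer(int partitions, int capacity) {
        this.partitions = partitions;
        this.capacity = capacity;
        this.load = new int[partitions];
    }

    // Prefers the non-full partition holding the most already-placed neighbours;
    // ties go to the less loaded partition.
    int place(String vertex, List<String> neighbours) {
        int[] votes = new int[partitions];
        for (String n : neighbours) {
            Integer p = home.get(n);
            if (p != null) {
                votes[p]++;
            }
        }
        int best = -1;
        for (int p = 0; p < partitions; p++) {
            if (load[p] >= capacity) {
                continue; // full: best effort means we accept a worse location
            }
            if (best == -1 || votes[p] > votes[best]
                    || (votes[p] == votes[best] && load[p] < load[best])) {
                best = p;
            }
        }
        if (best == -1) {
            throw new IllegalStateException("all partitions are full");
        }
        home.put(vertex, best);
        load[best]++;
        return best;
    }

    // Places the four people from slide 4 on two partitions of capacity two.
    static List<Integer> demo() {
        BestEffortPlacer placer = new BestEffortPlacer(2, 2);
        List<Integer> result = new ArrayList<>();
        result.add(placer.place("Alice", Collections.<String>emptyList()));
        result.add(placer.place("Bob", Arrays.asList("Alice")));
        result.add(placer.place("Carlos", Arrays.asList("Bob")));
        result.add(placer.place("Charlie", Arrays.asList("Carlos", "Bob")));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In the demo, Bob lands next to Alice on partition 0, but Carlos cannot join Bob because that partition is full, so the Bob-Carlos edge ends up crossing partitions. That is the "best effort" part: locality is pursued but not guaranteed. main prints `[0, 0, 1, 1]`.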
10. So What Are The Answers?
The Look Ahead Example
Application
Distributed API
Processor Processor
[Diagram: vertices A, B, C, D, E, X and Y distributed across Partition 1 and Partition 2]
Partition 1 Partition 2
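The transcript keeps only the vertex names from this diagram, so the following is one plausible reading of look ahead rather than the deck's stated mechanism: when a traversal has to reach into a remote partition, the remote side also returns the vertices a few hops further along on that same partition, saving later round trips. LookAheadFetch and its methods are hypothetical, and both the edges and the placement of A, B, C, D, E, X and Y are assumed for illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical simulation that counts remote round trips for a traversal.
class LookAheadFetch {
    private final Map<String, List<String>> adj = new HashMap<>();
    private final Map<String, Integer> home = new HashMap<>();
    private final Set<String> cache = new HashSet<>();
    private int roundTrips = 0;

    void vertex(String name, int partition, String... neighbours) {
        home.put(name, partition);
        adj.put(name, Arrays.asList(neighbours));
    }

    // One simulated remote request: returns v plus the vertices on the same
    // partition that lie within lookAhead further hops.
    private void fetch(String v, int lookAhead) {
        roundTrips++;
        int partition = home.get(v);
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(v);
        cache.add(v);
        for (int hop = 0; hop < lookAhead; hop++) {
            Deque<String> next = new ArrayDeque<>();
            for (String u : frontier) {
                for (String n : adj.get(u)) {
                    if (home.get(n) == partition && cache.add(n)) {
                        next.add(n);
                    }
                }
            }
            frontier = next;
        }
    }

    // Walks everything reachable from start and returns the round-trip count.
    int traverse(String start, int localPartition, int lookAhead) {
        roundTrips = 0;
        cache.clear();
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        seen.add(start);
        while (!queue.isEmpty()) {
            String v = queue.poll();
            if (home.get(v) != localPartition && !cache.contains(v)) {
                fetch(v, lookAhead);
            }
            for (String n : adj.get(v)) {
                if (seen.add(n)) {
                    queue.add(n);
                }
            }
        }
        return roundTrips;
    }

    // Assumed layout: A and B on partition 1, the rest on partition 2.
    static LookAheadFetch sample() {
        LookAheadFetch g = new LookAheadFetch();
        g.vertex("A", 1, "B", "C");
        g.vertex("B", 1);
        g.vertex("C", 2, "D", "X");
        g.vertex("D", 2, "E");
        g.vertex("X", 2, "Y");
        g.vertex("E", 2);
        g.vertex("Y", 2);
        return g;
    }

    public static void main(String[] args) {
        for (int k = 0; k <= 2; k++) {
            System.out.println("lookAhead=" + k + " roundTrips=" + sample().traverse("A", 1, k));
        }
    }
}
```

On this assumed layout the traversal from A needs 5 round trips with no look ahead, 3 with one hop and 1 with two hops. The trade-off is that a deeper look ahead may ship vertices the traversal never uses.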
11. Which of These Work?
• A carefully orchestrated combination of various options
• Can be tuned (degree of look ahead)
• Healing the graph can be expensive (write cost)
• This can also be tuned/configured (external edge thresholds)
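The deck does not define how an external edge threshold is applied. A minimal sketch under one assumed interpretation: a vertex is only relocated ("healed") when the share of its edges that cross to other partitions exceeds a configurable threshold, because every move costs writes. EdgeThresholdHealer and targetPartition are hypothetical names, and the rule itself is an assumption.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical healing rule based on an external-edge threshold.
class EdgeThresholdHealer {
    // Returns the partition the vertex should live on. It stays where it is
    // unless the share of external edges exceeds threshold (0.0 to 1.0).
    static int targetPartition(int current, List<Integer> neighbourPartitions, double threshold) {
        if (neighbourPartitions.isEmpty()) {
            return current;
        }
        Map<Integer, Integer> counts = new HashMap<>();
        int external = 0;
        for (int p : neighbourPartitions) {
            counts.merge(p, 1, Integer::sum);
            if (p != current) {
                external++;
            }
        }
        double externalShare = (double) external / neighbourPartitions.size();
        if (externalShare <= threshold) {
            return current; // not worth the write cost of moving
        }
        // Move only if some other partition holds strictly more neighbours.
        int best = current;
        int bestCount = counts.getOrDefault(current, 0);
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Integer> neighbours = Arrays.asList(2, 2, 2, 1);
        System.out.println(targetPartition(1, neighbours, 0.5));
        System.out.println(targetPartition(1, neighbours, 0.8));
    }
}
```

With three of four neighbours on partition 2, the vertex moves when the threshold is 0.5 (prints 2) and stays put when it is 0.8 (prints 1). Raising the threshold trades traversal locality for fewer writes.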