Gluecon InfiniteGraph Presentation: Scaling the Social Graph in the Cloud

Darren Wood, InfiniteGraph Lead Architect, addresses the developer audience at Glue Conference 2010 in Colorado (May 27, 2010). He discusses the technical challenges of running queries that traverse relationships across 4, 5, or more degrees of separation over extremely large graph datasets in the cloud.

Published in: Technology

  1. Complex graph traversal across large, distributed datasets
     Glue Conference 2010, Colorado
     Darren Wood, Lead Architect, InfiniteGraph
     Copyright © InfiniteGraph
  2. Scaling the [Social] Graph in the [Cloud]
     Darren Wood, Lead Architect, InfiniteGraph
  3. Graph Databases (Quickly)
     • Optimized around data relationships
     • Small focused API (typically not SQL)
     • Typical Use Cases:
       – Social Graph Analysis
       – Catching Bad Guys (see Booth 16)
       – Fraud / Financial (more bad guys)
       – Data Intensive Science
       – Web / Advertising Analytics
  4. Graph Databases (Almost Done)
       Vertex alice = myGraph.addVertex(new Person("Alice"));
       Vertex bob = myGraph.addVertex(new Person("Bob"));
       Vertex carlos = myGraph.addVertex(new Person("Carlos"));
       Vertex charlie = myGraph.addVertex(new Person("Charlie"));
       alice.addEdge(new Meeting("Denver", "5-27-10"), bob);
       bob.addEdge(new Call(timestamp), carlos);
       carlos.addEdge(new Payment(100000.00), charlie);
       bob.addEdge(new Call(timestamp), charlie);
     [Diagram: Alice -[Meets]-> Bob; Bob -[Calls]-> Carlos; Carlos -[Pays]-> Charlie; Bob -[Calls]-> Charlie]
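The slide's snippet depends on the InfiniteGraph API, which is not reproduced here. As a rough, self-contained sketch of the same four-person graph, the following uses a hypothetical in-memory class (`TinyGraph`, its `Edge` record, and all method names are assumptions made for illustration, not part of InfiniteGraph). It requires Java 16 or later for records.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for illustration; NOT the InfiniteGraph API.
final class TinyGraph {
    // One labeled, directed edge between two named vertices.
    record Edge(String from, String label, String to) {}

    // Outgoing edges keyed by vertex name, in insertion order.
    private final Map<String, List<Edge>> outgoing = new LinkedHashMap<>();

    // Registers a vertex if it is not already present.
    void addVertex(String name) {
        outgoing.putIfAbsent(name, new ArrayList<>());
    }

    // Adds a directed edge; both endpoints are registered on demand.
    void addEdge(String from, String label, String to) {
        addVertex(from);
        addVertex(to);
        outgoing.get(from).add(new Edge(from, label, to));
    }

    // Returns the outgoing edges of a vertex (empty if the vertex is unknown).
    List<Edge> edgesFrom(String name) {
        return outgoing.getOrDefault(name, List.of());
    }

    // Builds the four-person example from the slide.
    static TinyGraph slideExample() {
        TinyGraph g = new TinyGraph();
        g.addEdge("Alice", "Meets", "Bob");
        g.addEdge("Bob", "Calls", "Carlos");
        g.addEdge("Carlos", "Pays", "Charlie");
        g.addEdge("Bob", "Calls", "Charlie");
        return g;
    }

    public static void main(String[] args) {
        TinyGraph g = slideExample();
        for (Edge e : g.edgesFrom("Bob")) {
            System.out.println(e.from() + " -[" + e.label() + "]-> " + e.to());
        }
    }
}
```

Edge payloads such as the meeting place, call timestamp, or payment amount are left out to keep the sketch short; a fuller model would carry them on the edge.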
  5. What’s So Difficult Then?
     • Graphs grow quickly
       – Billions of phone calls / day in US
       – Emails, social media events, IP Traffic
       – Financial transactions
     • Some analytics require navigation of large sections of the graph
     • Each step (often) depends on the last
     • Must distribute data and go parallel
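The point that each step depends on the last can be illustrated with a breadth-first expansion: the frontier at hop k+1 cannot be computed until the frontier at hop k is known. The sketch below is a minimal single-machine version over a plain adjacency map; the class and method names are hypothetical, and the sample data reuses the directed edges from the four-person example on slide 4.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; names are assumptions, not an InfiniteGraph API.
final class DegreesOfSeparation {
    // Returns the vertices first reached at exactly `degrees` hops from `start`.
    static Set<String> atDegree(Map<String, List<String>> adjacency,
                                String start, int degrees) {
        Set<String> visited = new HashSet<>(Set.of(start));
        Set<String> frontier = Set.of(start);
        for (int hop = 0; hop < degrees; hop++) {
            Set<String> next = new HashSet<>();
            // The next frontier is derived entirely from the current one.
            for (String v : frontier) {
                for (String n : adjacency.getOrDefault(v, List.of())) {
                    if (visited.add(n)) {
                        next.add(n);
                    }
                }
            }
            frontier = next;
        }
        return frontier;
    }

    // Directed adjacency for the Alice/Bob/Carlos/Charlie example.
    static Map<String, List<String>> slideExample() {
        return Map.of(
            "Alice", List.of("Bob"),
            "Bob", List.of("Carlos", "Charlie"),
            "Carlos", List.of("Charlie"));
    }

    public static void main(String[] args) {
        Map<String, List<String>> g = slideExample();
        for (int d = 0; d <= 3; d++) {
            System.out.println(d + " hop(s) from Alice: " + atDegree(g, "Alice", d));
        }
    }
}
```

Within one hop the inner loop over frontier vertices has no ordering requirement, which is the part that can be spread across processors; the hop-to-hop dependency is what remains sequential.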
  6. First Some Good News…
     • Graph algorithms naturally branch
     • Can be automated or guided
     [Diagram: people Alice, Bob, Carlos, Charlie, Chuck, Dave, Eve; edge labels Meets, Calls, Pays, Lives With]
  7. Big Distributed Data (Traditional - Huge Generalization)
     [Diagram: Application(s); Distributed API; four Processors; Partition 1, Partition 2, Partition 3, Partition ...n]
  8. Big Distributed Data (Graph)
     [Diagram: Application(s); Distributed API; four Processors; Partition 1, Partition 2, Partition 3, Partition ...n]
  9. So What Are The Answers? Best Effort Partitioning
     [Diagram: Distributed API; two Processors; Partition 1, Partition 2]
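The slide names best-effort partitioning without giving an algorithm. One common greedy heuristic, shown here purely as an assumption about what such a scheme could look like, places each new vertex on the partition that already holds most of its neighbors, subject to a capacity limit, and falls back to the least-loaded partition on ties. All names below are hypothetical, and the neighbor map is assumed to be symmetric (undirected).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Greedy placement sketch; an assumed heuristic, not InfiniteGraph's method.
final class BestEffortPartitioner {
    // Assigns vertices in the given order; returns vertex -> partition index.
    static Map<String, Integer> assign(List<String> order,
                                       Map<String, List<String>> neighbors,
                                       int partitions, int capacity) {
        Map<String, Integer> placement = new HashMap<>();
        int[] load = new int[partitions];
        for (String v : order) {
            // Count already-placed neighbors on each partition.
            int[] colocated = new int[partitions];
            for (String n : neighbors.getOrDefault(v, List.of())) {
                Integer p = placement.get(n);
                if (p != null) {
                    colocated[p]++;
                }
            }
            // Prefer most neighbors, then lowest load, skipping full partitions.
            int best = -1;
            for (int p = 0; p < partitions; p++) {
                if (load[p] >= capacity) {
                    continue;
                }
                if (best < 0
                        || colocated[p] > colocated[best]
                        || (colocated[p] == colocated[best] && load[p] < load[best])) {
                    best = p;
                }
            }
            if (best < 0) {
                throw new IllegalStateException("all partitions are full");
            }
            placement.put(v, best);
            load[best]++;
        }
        return placement;
    }

    // Counts undirected edges whose endpoints sit on different partitions.
    static int crossEdges(Map<String, List<String>> neighbors,
                          Map<String, Integer> placement) {
        int directed = 0;
        for (Map.Entry<String, List<String>> e : neighbors.entrySet()) {
            for (String n : e.getValue()) {
                if (!placement.get(e.getKey()).equals(placement.get(n))) {
                    directed++;
                }
            }
        }
        return directed / 2; // each undirected edge was seen from both ends
    }

    // Two triangles (A-B-C and X-Y-Z) joined by a single edge C-X.
    static Map<String, List<String>> exampleNeighbors() {
        return Map.of(
            "A", List.of("B", "C"),
            "B", List.of("A", "C"),
            "C", List.of("A", "B", "X"),
            "X", List.of("C", "Y", "Z"),
            "Y", List.of("X", "Z"),
            "Z", List.of("X", "Y"));
    }

    public static void main(String[] args) {
        Map<String, List<String>> n = exampleNeighbors();
        Map<String, Integer> p = assign(List.of("A", "X", "B", "Y", "C", "Z"), n, 2, 3);
        System.out.println(p + " cross-partition edges: " + crossEdges(n, p));
    }
}
```

The result depends on arrival order and capacity, which is why this kind of placement is best effort: it reduces cross-partition edges but cannot guarantee the minimum.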
  10. So What Are The Answers? The Look Ahead Example
     [Diagram: Application; Distributed API; two Processors; vertices A, C, D, B, E, Y, X; Partition 1, Partition 2]
  11. Which of These Work?
     • A carefully orchestrated combination of various options
     • Can be tuned (degree of look ahead)
     • Healing the graph can be expensive (write cost)
     • This can also be tuned/configured (external edge thresholds)
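The slides do not define "external edge thresholds". One possible reading, sketched below as an assumption rather than a description of InfiniteGraph, is that a vertex becomes a candidate for relocation ("healing") only when the fraction of its edges leading to other partitions exceeds a configured threshold. The class, method, and parameter names are hypothetical, and every neighbor is assumed to already have a partition assigned.

```java
import java.util.List;
import java.util.Map;
import java.util.OptionalInt;

// Hypothetical healing policy; one interpretation of an external edge threshold.
final class HealingPolicy {
    // Suggests a new partition for `vertex`, or empty if it should stay put.
    static OptionalInt suggestMove(String vertex,
                                   Map<String, List<String>> neighbors,
                                   Map<String, Integer> placement,
                                   int partitions,
                                   double externalEdgeThreshold) {
        List<String> adjacent = neighbors.getOrDefault(vertex, List.of());
        if (adjacent.isEmpty()) {
            return OptionalInt.empty();
        }
        int home = placement.get(vertex);
        // Count this vertex's neighbors on each partition.
        int[] perPartition = new int[partitions];
        for (String n : adjacent) {
            perPartition[placement.get(n)]++;
        }
        int external = adjacent.size() - perPartition[home];
        double externalRatio = (double) external / adjacent.size();
        // Below the threshold the write cost of moving is not considered worthwhile.
        if (externalRatio <= externalEdgeThreshold) {
            return OptionalInt.empty();
        }
        // Otherwise propose the partition holding the most neighbors.
        int best = home;
        for (int p = 0; p < partitions; p++) {
            if (perPartition[p] > perPartition[best]) {
                best = p;
            }
        }
        return best == home ? OptionalInt.empty() : OptionalInt.of(best);
    }

    public static void main(String[] args) {
        Map<String, List<String>> neighbors = Map.of("E", List.of("A", "X", "Y"));
        Map<String, Integer> placement = Map.of("E", 0, "A", 0, "X", 1, "Y", 1);
        System.out.println(suggestMove("E", neighbors, placement, 2, 0.5));
        System.out.println(suggestMove("E", neighbors, placement, 2, 0.8));
    }
}
```

Raising the threshold makes moves rarer and keeps write cost down at the price of more cross-partition hops during traversal; lowering it does the opposite.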
  12. Thank you!
     darren.wood@infinitegraph.com
     twitter.com/infinitegraph
