Graph in Apache Cassandra. The World’s Most Scalable Graph Database

Graph databases are everywhere right now. The explosive growth in the graph market coupled with the hype of solving graph problems is causing both excitement and confusion. From labeled property graphs to RDF to pure graph analytics to multi-model databases, the breadth of graph offerings is staggering.

The good news? DataStax has been listening—and building.

In this session, we’ll show you how DataStax Graph is architected into Apache Cassandra to deliver the world’s most scalable graph database. You’ll learn how to integrate Cassandra data into mixed workloads, design scalable property graphs, and even turn your existing tables into graphs.

With your high throughput time series data distributed next to its relationships, what will you build next?

Published in: Data & Analytics

  1. How to build a scalable graph database: The smart way. Bryn Cooke
  2. In this talk: (1) What does it take to build a graph database? (2) Why you shouldn’t do this at home. (3) What do you use this for?
  3. Graph family tree
  4. Graph database recipe: (1) Model (2) Language (3) Storage
  5. Model: Property Graph and RDF. [Diagram labels: bob, steph, age: 30, age: 34, knows (since: 2001), known; RDF form: bob: knows :steph]
  6. Language: g.V().has('name', 'marko').out('knows').values('name')
  7. Storage
  8. The adjacency list. Table columns: Vertex, Adjacent to. Entries as extracted: A B, D, E B C B D C E D, F F. [Diagram: vertices A, B, C, E, D, F]
  9. //TODO • Storage • Indexing • Commit log • Drivers • Caching • Schema • Metrics • Backup/Restore • Logging • Security • Testing • Support • Failover • QoS • Paging • Partitioning • Sorting • Compaction • Repair • Community • Bug fixing • Optimisation
  10. Storage - Cassandra • Fast • Distributed • Scalable • Reliable • 11 years of development • 54 committers (listed on Apache) • 274 contributors (listed on GitHub)
  11. The adjacency list (in Cassandra)
  12. Here's what you could do. [Diagram: four clients, "My Graph Database", and five Cassandra (C*) nodes]
  13. Here's what you could do. [Diagram: one client, "My Graph Database", and five Cassandra (C*) nodes]
  14. Here's what you should do. [Diagram: four clients, "DS Graph", and five Cassandra (C*) nodes]
  15. Deep integration with DataStax Enterprise • DataStax Enterprise scalability > Cassandra scalability. • Analytics integration. • Search integration. • Thread optimisation. • Continuous paging. • Prefetching. • First class schema integration.
  16. Today’s Graph Database Market: Graph Problems > Graph Databases
  17. Typical customer 360 queries • Find me Jenny. • Find me all people with similar names to 'Jenny'. • Tell me if there are duplicate Jennys. • Find how Jenny and John are connected. • Find how influential Jenny is in my application. [Chart: response time (machine fast, human fast, offline fast) against query complexity (simple to complex), with regions labeled CQL, Search, Analytics, and DSE, plus a no-go zone]
  18. Find me Jenny. How complex? Simple. How fast? Machine. What? CQL. Why? Single partition lookup, single iteration.
  19. Find me all people with similar names to 'Jenny'. How complex? Medium. How fast? Human fast. What? Search, Graph. Why? Single index lookup, single iteration.
  20. Tell me if there are duplicate Jennys. How complex? Medium. How fast? Offline. What? Analytics, Graph. Why? Aggregation, multiple iterations.
  21. Find how Jenny and John are connected. How complex? Complex. How fast? Machine. What? Graph. Why? Multiple partition lookups, multiple iterations.
  22. Find how influential Jenny is in my application. How complex? Complex. How fast? Offline. What? Spark Analytics, Graph via PageRank. Why? Full scan, unknown number of iterations.
  23. Typical customer 360 queries (recap) • Find me Jenny. • Find me all people with similar names to 'Jenny'. • Tell me if there are duplicate Jennys. • Find how Jenny and John are connected. • Find how influential Jenny is in my application.
  24. Summary: (1) What it takes to create a graph database: (a) Model (b) Language (c) Storage. (2) How you can leverage an existing storage engine, and why Cassandra is a great choice. (3) Solving graph problems requires more than just the basics. Search and Analytics are essential tools, especially for a graph database.
  25. Don't try this at home. Do not try to replicate 100 person-years of dev effort creating your own storage engine. Creating a graph database that scales is tough enough.
  26. Try it now: https://downloads.datastax.com/#labs (Labs)
  27. Thank You
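
As a rough illustration of the model / language / storage recipe in the slides, here is a minimal in-memory sketch in Python. It is not how DataStax Graph or Cassandra is implemented. The vertex data, the dictionary layout (one entry per vertex, loosely standing in for one partition per vertex), and the helper names `has`, `out`, `values`, and `shortest_path` are all assumptions made for this example; the first three only imitate the shape of the Gremlin traversal on slide 6.

```python
from collections import deque

# Hypothetical sample data: vertex id -> properties.
VERTICES = {
    1: {"name": "jenny"},
    2: {"name": "bob"},
    3: {"name": "steph"},
    4: {"name": "john"},
}

# Adjacency list: vertex id -> outgoing (edge label, adjacent vertex id) pairs.
# Each key loosely stands in for one partition holding a vertex's edges.
OUT_EDGES = {
    1: [("knows", 2), ("knows", 3)],
    2: [("knows", 3)],
    3: [("knows", 4)],
    4: [],
}


def has(key, value):
    """Return ids of vertices whose property `key` equals `value` (full scan here)."""
    return [vid for vid, props in VERTICES.items() if props.get(key) == value]


def out(vertex_ids, label):
    """Follow outgoing edges with the given label: one lookup per input vertex."""
    return [
        dst
        for vid in vertex_ids
        for (edge_label, dst) in OUT_EDGES.get(vid, [])
        if edge_label == label
    ]


def values(vertex_ids, key):
    """Project a single property from each vertex."""
    return [VERTICES[vid][key] for vid in vertex_ids]


def shortest_path(src, dst, label):
    """Breadth-first search: repeated adjacency lookups until dst is reached."""
    parents = {src: None}
    queue = deque([src])
    while queue:
        current = queue.popleft()
        if current == dst:
            path = []
            while current is not None:
                path.append(current)
                current = parents[current]
            return path[::-1]
        for nxt in out([current], label):
            if nxt not in parents:
                parents[nxt] = current
                queue.append(nxt)
    return None  # no connection found


if __name__ == "__main__":
    # Same shape as g.V().has('name', ...).out('knows').values('name')
    print(values(out(has("name", "jenny"), "knows"), "name"))
    # "Find how Jenny and John are connected"
    print(values(shortest_path(1, 4, "knows"), "name"))
```

The one-hop query touches a single adjacency entry, while the path search keeps issuing lookups until it reaches the target, which mirrors the "single partition lookup" versus "multiple partition lookups, multiple iterations" distinction in the customer 360 slides.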
