Graph in Apache Cassandra. The World’s Most Scalable Graph Database

How to build a scalable graph database
The smart way
Bryn Cooke

In this talk
1. What does it take to build a graph database?
2. Why you shouldn't do this at home.
3. What do you use this for?

Graph database recipe
1. Model
2. Language
3. Storage

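Model
[Figure: "bob knows steph" drawn two ways. Property Graph: vertices bob and steph with age properties (30 and 34) and a knows edge with since: 2001. RDF: the same relationship as a bob, knows, steph triple.]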

Language
g.V().has('name', 'marko').out('knows').values('name')
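To make the traversal above concrete, here is a minimal sketch of what each step does, written in plain Python rather than Gremlin. The toy data and the helper names (`has`, `out`, `values`, `friends_of`) are assumptions for illustration only; they are not a TinkerPop or DSE Graph API.

```python
# Hypothetical toy property graph; the names and edges are illustrative only.
vertices = {
    1: {"name": "marko"},
    2: {"name": "vadas"},
    3: {"name": "josh"},
}
# Directed edges stored as (source, label, target) triples.
edges = [(1, "knows", 2), (1, "knows", 3)]


def has(vertex_ids, key, value):
    """Keep only the vertices whose property `key` equals `value`."""
    return [v for v in vertex_ids if vertices[v].get(key) == value]


def out(vertex_ids, label):
    """Follow outgoing edges with the given label to the adjacent vertices."""
    return [dst for v in vertex_ids
            for (src, lbl, dst) in edges if src == v and lbl == label]


def values(vertex_ids, key):
    """Read one property from every vertex that has it."""
    return [vertices[v][key] for v in vertex_ids if key in vertices[v]]


def friends_of(name):
    """Rough equivalent of g.V().has('name', name).out('knows').values('name')."""
    start = has(list(vertices), "name", name)
    return values(out(start, "knows"), "name")


print(friends_of("marko"))  # ['vadas', 'josh']
```

Each Gremlin step maps to one function that takes the current set of vertices and returns the next one, which is why the steps chain so naturally.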

The adjacency list
Vertex   Adjacent to
A        B, D, E
B        -
C        B
D        C
E        D, F
F        -
[Figure: the corresponding graph of vertices A to F.]
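The adjacency list on this slide can be held directly as a mapping from each vertex to its adjacent vertices. The sketch below assumes the edges are directed (the slide does not say), and the helper names `neighbours` and `in_neighbours` are made up for illustration.

```python
# Adjacency list from the slide: vertex -> vertices it is adjacent to.
# Assumption: edges are directed from the key to each listed vertex.
adjacency = {
    "A": ["B", "D", "E"],
    "B": [],
    "C": ["B"],
    "D": ["C"],
    "E": ["D", "F"],
    "F": [],
}


def neighbours(vertex):
    """Outgoing neighbours: a single lookup in the adjacency list."""
    return adjacency[vertex]


def in_neighbours(vertex):
    """Incoming neighbours: requires scanning every entry."""
    return sorted(src for src, targets in adjacency.items() if vertex in targets)


print(neighbours("E"))     # ['D', 'F']
print(in_neighbours("B"))  # ['A', 'C']
```

Outgoing lookups are cheap while incoming lookups need a full scan, which is one reason a real graph store usually keeps edges indexed in both directions.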

//TODO
• Storage
• Indexing
• Commit log
• Drivers
• Caching
• Schema
• Metrics
• Backup/Restore
• Logging
• Security
• Testing
• Support
• Failover
• QoS
• Paging
• Partitioning
• Sorting
• Compaction
• Repair
• Community
• Bug fixing
• Optimisation

Storage - Cassandra
• Fast
• Distributed
• Scalable
• Reliable
• 11 years of development
• 54 committers (listed on apache)
• 274 contributors (listed on github)

The adjacency list (in Cassandra)
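The slide content for this layout did not survive extraction. As a rough sketch only, one common way to think about an adjacency list in a partitioned wide-row store is: the source vertex acts as the partition key, and each edge is a row sorted by a clustering key. The Python below simulates that idea in memory. It is an assumption for illustration, not DSE Graph's actual schema, and `add_edge` and `out_edges` are hypothetical helpers.

```python
from bisect import insort
from collections import defaultdict

# Simulated table: partition key (source vertex) -> rows kept sorted by
# clustering key (edge label, target vertex). Illustrative layout only.
partitions = defaultdict(list)


def add_edge(source, label, target):
    """Insert one edge row into the source vertex's partition, keeping order."""
    row = (label, target)
    if row not in partitions[source]:
        insort(partitions[source], row)


def out_edges(source, label=None):
    """Read one partition, optionally restricted to a single edge label."""
    rows = partitions.get(source, [])
    if label is None:
        return list(rows)
    return [row for row in rows if row[0] == label]


# Load the adjacency list from the earlier slide, inserted out of order.
for target in ["E", "B", "D"]:
    add_edge("A", "adjacent", target)
add_edge("E", "adjacent", "F")
add_edge("E", "adjacent", "D")

print(out_edges("A"))  # [('adjacent', 'B'), ('adjacent', 'D'), ('adjacent', 'E')]
```

With this shape, reading all edges of one vertex touches a single partition, which matches the "single partition lookup" idea used later in the talk.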

Here's what you could do
[Diagram: four clients, "My Graph Database", and a cluster of five Cassandra (C*) nodes.]

Here's what you could do
[Diagram: a single client, "My Graph Database", and a cluster of five Cassandra (C*) nodes.]

Here's what you should do
[Diagram: four clients, DS Graph, and a cluster of five Cassandra (C*) nodes.]

Deep integration with DataStax Enterprise
DataStax Enterprise
• DataStax Enterprise scalability > Cassandra scalability.
• Analytics integration.
• Search integration.
• Thread optimisation.
• Continuous paging.
• Prefetching.
• First class schema integration.

Today’s Graph Database Market
Graph Problems > Graph Databases

Typical customer 360 queries
[Chart: response time (machine fast, human fast, offline fast) against query complexity (simple to complex), with regions labelled CQL, Search, Analytics and DSE, plus a "no go zone".]
• Find me Jenny.
• Find me all people with similar names to 'Jenny'.
• Tell me whether there are duplicate Jennys.
• Find how Jenny and John are connected.
• Find how influential Jenny is in my application.

Find me Jenny
[Chart: response time against query complexity, with regions labelled CQL, Search, Analytics and DSE, plus a "no go zone".]
How complex?
• Simple
How fast?
• Machine fast
What?
• CQL
Why?
• Single partition lookup
• Single iteration
Find me all people with similar names to 'Jenny'
[Chart: response time against query complexity, with regions labelled CQL, Search, Analytics and DSE, plus a "no go zone".]
How complex?
• Medium
How fast?
• Human fast
What?
• Search
• Graph
Why?
• Single index lookup
• Single iteration
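As a small illustration of the "similar names" query, the sketch below uses Python's standard `difflib` as a stand-in for a real search index. This is a swapped-in technique for demonstration, not how DSE Search works, and the name list and the `similar_names` helper are invented for the example.

```python
from difflib import get_close_matches

# Hypothetical set of names that would normally live in a search index.
names = ["Jenny", "Jenni", "Jenna", "John", "Penny"]


def similar_names(query, candidates, cutoff=0.6):
    """Return candidates that resemble the query, excluding exact matches."""
    matches = get_close_matches(query, candidates, n=10, cutoff=cutoff)
    return [name for name in matches if name != query]


print(similar_names("Jenny", names))
```

The point of the slide still holds in this toy form: the lookup is a single pass over one index, with no graph hops needed to answer it.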

Tell me whether there are duplicate Jennys
[Chart: response time against query complexity, with regions labelled CQL, Search, Analytics and DSE, plus a "no go zone".]
How complex?
• Medium
How fast?
• Offline
What?
• Analytics
• Graph
Why?
• Aggregation
• Multiple iterations
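Duplicate detection is an aggregation: group the records by some normalised key and report groups with more than one member. The sketch below shows that shape in plain Python. The records, the normalisation rule, and the `find_duplicates` helper are all assumptions for illustration; a real job would run over the full dataset in an analytics engine.

```python
from collections import defaultdict

# Hypothetical person records; two of them differ only in case and spacing.
people = [
    {"id": 1, "name": "Jenny Smith"},
    {"id": 2, "name": "jenny  smith"},
    {"id": 3, "name": "John Doe"},
]


def normalise(name):
    """Lower-case the name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def find_duplicates(records):
    """Group record ids by normalised name and keep groups larger than one."""
    groups = defaultdict(list)
    for record in records:
        groups[normalise(record["name"])].append(record["id"])
    return {name: ids for name, ids in groups.items() if len(ids) > 1}


print(find_duplicates(people))  # {'jenny smith': [1, 2]}
```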

Find how Jenny and John are connected
[Chart: response time against query complexity, with regions labelled CQL, Search, Analytics and DSE, plus a "no go zone".]
How complex?
• Complex
How fast?
• Machine fast
What?
• Graph
Why?
• Multiple partition lookup
• Multiple iteration
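"How are Jenny and John connected" is a path-finding problem. A minimal sketch using breadth-first search over an in-memory adjacency list follows. The `knows` data and the `shortest_path` helper are hypothetical, and edges are assumed to be directed. Each hop reads a different vertex's adjacency list, which corresponds to the multiple partition lookups and multiple iterations named on the slide.

```python
from collections import deque

# Hypothetical "knows" edges, directed from key to each listed person.
knows = {
    "Jenny": ["Alice", "Bob"],
    "Alice": ["Carol"],
    "Bob": ["John"],
    "Carol": ["John"],
    "John": [],
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the shortest path as a list, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # visited vertex -> vertex we reached it from
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in graph.get(current, []):  # one adjacency lookup per hop
            if nxt in previous:
                continue
            previous[nxt] = current
            if nxt == goal:
                # Walk the back-pointers to rebuild the path.
                path = [nxt]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None


print(shortest_path(knows, "Jenny", "John"))  # ['Jenny', 'Bob', 'John']
```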

Find how influential Jenny is in my application
[Chart: response time against query complexity, with regions labelled CQL, Search, Analytics and DSE, plus a "no go zone".]
How complex?
• Complex
How fast?
• Offline
What?
• Spark Analytics
• Graph via PageRank
Why?
• Full scan
• Unknown iterations
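To show why PageRank implies a full scan and an unknown number of iterations, here is a small power-iteration sketch in plain Python. It is a textbook-style version under stated assumptions (damping 0.85, dangling vertices spread their rank evenly, every vertex appears as a key), not the Spark-based implementation the slide refers to. The `follows` data is invented.

```python
def pagerank(graph, damping=0.85, tolerance=1e-9, max_iterations=1000):
    """Iterate until the ranks stop changing; every pass scans all vertices."""
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    for _ in range(max_iterations):
        new_rank = {v: (1.0 - damping) / n for v in graph}
        for v, targets in graph.items():
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling vertex: spread its rank evenly over all vertices.
                share = damping * rank[v] / n
                for t in graph:
                    new_rank[t] += share
        change = sum(abs(new_rank[v] - rank[v]) for v in graph)
        rank = new_rank
        if change < tolerance:
            break  # converged; the iteration count is not known up front
    return rank


# Hypothetical "follows" edges; every vertex must appear as a key.
follows = {
    "Jenny": ["John"],
    "John": ["Jenny"],
    "Alice": ["Jenny"],
    "Bob": ["Jenny"],
}

ranks = pagerank(follows)
print(max(ranks, key=ranks.get))  # Jenny
```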

Summary
1. What it takes to create a graph database
a. Model
b. Language
c. Storage
2. How you can leverage an existing storage engine, and why Cassandra is a great choice.
3. Solving graph problems requires more than just the basics. Search and Analytics are essential tools, especially alongside a graph database.

Don't try this at home
Do not try to replicate 100 person-years of dev effort by creating your own storage engine.
Creating a graph database that scales is tough enough.

Try it now
https://downloads.datastax.com/#labs
Labs
