The Native Graph Advantage
Dr. Jim Webber
Chief Scientist, Neo4j
• A cheeky bit of computer science
• Database architecture from 30,000ft
• Why Neo4j is graph native, and why it matters
• Quantitative performance advantages
• Finish
Overview
Applied to data, native data formats or communication protocols are those
supported by a certain computer hardware or software, with maximal
consistency and minimal amount of additional components.
-- Wikipedia
Native: A Definition
Those who can imagine anything,
can create the impossible.
An Unordered Singly Linked-List
27 → 1657 → 5674
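The speaker notes call this list O(1) for writes and O(N) for reads. A minimal Java sketch of that trade-off (class and method names are mine, for illustration):

public class UnorderedLinkedList {
    private static final class Node {
        final long value;
        final Node next;
        Node(long value, Node next) { this.value = value; this.next = next; }
    }

    private Node head;

    // O(1) write: no ordering to maintain, so just prepend at the head.
    public void add(long value) {
        head = new Node(value, head);
    }

    // O(N) read: worst case walks every node to find (or miss) a value.
    public boolean contains(long value) {
        for (Node n = head; n != null; n = n.next) {
            if (n.value == value) return true;
        }
        return false;
    }
}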
A Write-Centric Database?
(diagram: a Client writing to the list 27 → 1657 → 5674)
Impractical Design
(diagram: several Clients all writing to the list 27 → 1657 → 5674)
Every client contends for the write lock in a naïve implementation
CRDTs to the Rescue
(diagram: three Clients each write into their own replica of the list, namely 27 → 1657 → 5674, 5674 → 7689 → 6, and 1657 → 5674 → 66; the replicas merge into 27 1657 5674 7689 6 66)
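The notes describe replicas with well-known merge rules; one way to sketch that in Java is a grow-only set (G-Set), the simplest CRDT (my choice of example, not one the talk names):

import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class GSetReplica {
    private final Set<Long> local = new LinkedHashSet<>();

    // Cheap local write: each client writes into its own replica.
    public void add(long value) { local.add(value); }

    // The merge rule is set union: commutative, associative and idempotent,
    // so replicas can be combined in any order, any number of times.
    public static Set<Long> merge(List<GSetReplica> replicas) {
        Set<Long> merged = new LinkedHashSet<>();
        for (GSetReplica r : replicas) merged.addAll(r.local);
        return merged;
    }
}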
Trees
(diagram: the values 27, 1657, 5674, 7689, 6, 66 from the list arranged as a binary search tree)
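A toy Java binary search tree making the note's point: reads and writes are both log(n) on a balanced tree, and a write pays for a log(n) descent first. (This toy can degenerate to O(n); real databases use balanced B-tree variants.)

public class BinarySearchTree {
    private static final class Node {
        final long value;
        Node left, right;
        Node(long value) { this.value = value; }
    }

    private Node root;

    // log(n) write on a balanced tree: the descent is itself a log(n) read.
    public void insert(long value) {
        root = insert(root, value);
    }

    private Node insert(Node n, long value) {
        if (n == null) return new Node(value);
        if (value < n.value) n.left = insert(n.left, value);
        else if (value > n.value) n.right = insert(n.right, value);
        return n;  // duplicates are ignored
    }

    // log(n) read on a balanced tree.
    public boolean contains(long value) {
        Node n = root;
        while (n != null) {
            if (value == n.value) return true;
            n = value < n.value ? n.left : n.right;
        }
        return false;
    }
}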
Minimise contention
(diagram: four Clients write concurrently into different parts of the tree of 27, 1657, 5674, 7689, 6, 66; contention is federated through the structure)
• Classic B-trees are a common pattern for on-disk databases
• “Index” levels in memory, leaf nodes in files on disk
• B+ trees for linear scans are neat! But…
Databases Usually <3 Trees
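The speaker notes add that Neo4j uses its tree-based indexes mostly to find starting points, after which traversals are O(1). An illustrative Java sketch of that idea (not Neo4j's actual storage format):

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class IndexFreeAdjacency {
    static final class Node {
        final long id;
        final List<Node> out = new ArrayList<>();  // direct references, no index
        Node(long id) { this.id = id; }
    }

    // Tree-based index: log(n), paid once to find the traversal's start.
    private final Map<Long, Node> byId = new TreeMap<>();

    public void add(Node n) { byId.put(n.id, n); }

    // After the single lookup, every hop is O(1) pointer chasing.
    public int countFriendsOfFriends(long startId) {
        Node start = byId.get(startId);  // the only index lookup
        if (start == null) return 0;
        int count = 0;
        for (Node friend : start.out) count += friend.out.size();
        return count;
    }
}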
Pick the right tool for the job
• It could be tables or columns or KV or documents…
• Each database is likely very good for that model
• Evolution driven by its primary workload in its primary market
• Any add-on doesn’t benefit from this
• Unloved
• Opportunistic (e.g. “multi model”)
• Models don’t compose easily
All Databases have a native model
Why jump on the graph bandwagon?
Graph Layer
• Take existing data store
• Bolt on a graph-like API from third-party open source
• Declare victory
Graph Operator
• Take existing data store
• Add graph features into the query language
• Declare victory
Two Non-Native Approaches to Graph
Non-Native Architectures
(diagram: the two non-native stacks side by side)
Graph Layer approach: Graph API → Graph Layer → Other DBMS (e.g. Column Store)
Graph Operator approach: Other QL + Graph Operator → Other DBMS (e.g. Document Store)
No Cypher!
Requires convention at user level
Denormalization
Does not understand graphs
Cannot prevent dangling relationships, logical corruption, etc.
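A toy Java illustration of that last point, the dangling-reference problem (hypothetical document shapes, not a real document database API):

import java.util.HashMap;
import java.util.Map;

public class DanglingLinks {
    public static void main(String[] args) {
        Map<String, Map<String, Object>> documents = new HashMap<>();
        documents.put("order-1", Map.of("customerId", "cust-42"));
        documents.put("cust-42", Map.of("name", "Alice"));

        // No referential constraint fires: the engine sees only documents.
        documents.remove("cust-42");

        // order-1 still "links" to cust-42, which no longer exists.
        String ref = (String) documents.get("order-1").get("customerId");
        System.out.println("dangling: " + !documents.containsKey(ref)); // true
    }
}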
• Engine and store are not designed for graphs
• Graphs are not motivating workload
• Denormalization only works within modest limits (e.g. depth 3)
• Operational concerns: schema rigidity, evolution
Graph Layer Drawbacks
Popular Implementation: Column Store
http://javahungry.blogspot.com/2013/08/hashing-how-hash-map-works-in-java-or.html
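The speaker notes describe this as a hashmap-of-hashmaps. An illustrative Java sketch of that shape (my own mock-up, not any vendor's engine), showing why each extra hop is another round of hashing:

import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashmapOfHashmaps {
    // rowKey -> (columnName -> value); a "likes" column holds neighbour IDs.
    private final Map<String, Map<String, Object>> rows = new HashMap<>();

    public void putRow(String key, Map<String, Object> columns) {
        rows.put(key, columns);
    }

    @SuppressWarnings("unchecked")
    public List<String> neighbours(String rowKey) {
        Map<String, Object> row = rows.get(rowKey);  // one hash lookup
        if (row == null) return List.of();
        return (List<String>) row.getOrDefault("likes", List.of());
    }

    // Each extra hop re-enters the top-level map: n hops means n rounds of
    // hashing with no data locality (plus network trips once sharded).
    public long countAtDepth2(String rowKey) {
        long count = 0;
        for (String n1 : neighbours(rowKey)) count += neighbours(n1).size();
        return count;
    }
}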
• Works by convention only
• Underlying engine cannot enforce integrity
• Data structures and store formats are designed for another job entirely
• Performance concerns
Graph Operator Drawbacks
Popular Implementation: B-Trees!
http://zhangliyong.github.io/posts/2014/02/19/mongodb-index-internals.htm
Do one thing, do it well
OLTP
OLAP
HTAP = OLTP + OLAP
(diagram: transactional servers handle writes while read-only replicas, isolated from them, take the OLAP workload)
(diagram: an OLTP application and an OLAP application both connect through the Neo4j Drivers)
neo4j.conf:
causal_clustering.server_groups=olap1
causal_clustering.load_balancing.config.server_policies.OLAP=groups(olap1)

Application code:
GraphDatabase.driver( "bolt+routing://server?policy=OLAP" );
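Putting the two halves together: a minimal sketch with the 1.x Java driver (host, credentials and query are placeholders):

import org.neo4j.driver.v1.AccessMode;
import org.neo4j.driver.v1.AuthTokens;
import org.neo4j.driver.v1.Driver;
import org.neo4j.driver.v1.GraphDatabase;
import org.neo4j.driver.v1.Session;

public class OlapRouting {
    public static void main(String[] args) {
        // The policy name in the URI matches the server_policies entry
        // configured in neo4j.conf above.
        try (Driver driver = GraphDatabase.driver(
                "bolt+routing://server?policy=OLAP",
                AuthTokens.basic("neo4j", "secret"))) {
            // Work in this session is routed per the OLAP policy, i.e. to
            // the servers in group olap1.
            try (Session session = driver.session(AccessMode.READ)) {
                long nodes = session.run("MATCH (n) RETURN count(n) AS c")
                                    .single().get("c").asLong();
                System.out.println(nodes + " nodes");
            }
        }
    }
}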
OLTP Application
OLAP
Application
HTAP Application
(diagram: the Application sends Cypher to the Cypher Query Engine, which queries the graph)
Graph Algorithms lead to:
• Actions
• Insights
Clustering
• Label Propagation
• Union Find / Weakly Connected Components
• Strongly Connected Components
• Triangle Count / Clustering Coefficient
Centrality
• PageRank
• Betweenness
• Closeness
• Degree
Path Finding
• Breadth-first search
• Depth-first search
• Single-source shortest path
• All-pairs shortest path
• Minimum-weight spanning tree
(diagram: the Graph Algorithms are packaged as procedures that sit alongside the Cypher Query Engine, behind the same Cypher interface to the Application)
CALL algo.pageRank
CREATE (:User)
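Both statements go through the same Cypher surface. For instance, from the Java driver (the algo.pageRank.stream signature follows the graph-algorithms library of the time as best I recall it, so treat it as approximate):

import org.neo4j.driver.v1.AuthTokens;
import org.neo4j.driver.v1.Driver;
import org.neo4j.driver.v1.GraphDatabase;
import org.neo4j.driver.v1.Record;
import org.neo4j.driver.v1.Session;

public class CallPageRank {
    public static void main(String[] args) {
        try (Driver driver = GraphDatabase.driver(
                "bolt://localhost", AuthTokens.basic("neo4j", "secret"));
             Session session = driver.session()) {
            // A procedure call is just Cypher: CALL, YIELD, then ordinary
            // Cypher over the streamed results.
            for (Record row : session.run(
                    "CALL algo.pageRank.stream('Page', 'Link', {iterations: 20}) "
                  + "YIELD nodeId, score "
                  + "RETURN nodeId, score ORDER BY score DESC LIMIT 10").list()) {
                System.out.println(row.get("nodeId") + "\t" + row.get("score"));
            }
        }
    }
}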
PageRank
PR(p) = (1 - d)/N + d × Σ_{q ∈ M(p)} PR(q)/L(q)
N, the node count: MATCH (n) RETURN count(n)
M(p), the nodes linking to p: MATCH (p)<--(q) RETURN q
L(p), the outgoing link count: MATCH (p)-[r]->() RETURN count(r)
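A plain-Java power-iteration sketch of that formula (nothing Neo4j-specific; dangling-node handling omitted for brevity):

import java.util.Arrays;

public class PageRankSketch {
    // graph[q] lists the pages that page q links to, so graph[q].length = L(q).
    static double[] pageRank(int[][] graph, int iterations, double d) {
        int n = graph.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);

        for (int i = 0; i < iterations; i++) {
            double[] next = new double[n];
            Arrays.fill(next, (1 - d) / n);                   // the (1-d)/N term
            for (int q = 0; q < n; q++) {
                if (graph[q].length == 0) continue;           // sinks ignored here
                double share = d * rank[q] / graph[q].length; // d * PR(q)/L(q)
                for (int p : graph[q]) next[p] += share;      // q is in M(p)
            }
            rank = next;
        }
        return rank;
    }
}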
(:Page)-[:Link]->(:Page)
• 11 million nodes
• 116 million relationships
• 20 iterations
• < 10 seconds
DBPedia
• Combine OLTP and OLAP in the same cluster
• Work on up-to-date data, with no complex ETL or warehousing
• Mix with graph algorithms
Neo4j is an HTAP Database
Quantitative Analysis
• Asymptotic benchmarking effort for native graph tech
• “What can Neo4j do when it’s pushed to its limits?”
• The results are impressive
Pushing Neo4j to the Limits
Traversals
Realistic retail dataset from Amazon
Commodity dual-Xeon server
Social recommendation (Java procedure) equivalent to:
MATCH (you)-[:BOUGHT]->(something)<-[:BOUGHT]-(other)-[:BOUGHT]->(reco)
WHERE id(you)={id}
RETURN reco
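A sketch of how that hop pattern might look inside a procedure body against the Neo4j 3.x core API (simplified: no deduplication, scoring, or filtering out of things you already bought):

import java.util.ArrayList;
import java.util.List;

import org.neo4j.graphdb.Direction;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.RelationshipType;

public class Recommend {
    static final RelationshipType BOUGHT = RelationshipType.withName("BOUGHT");

    // (you)-[:BOUGHT]->(something)<-[:BOUGHT]-(other)-[:BOUGHT]->(reco)
    static List<Node> recommend(Node you) {
        List<Node> recos = new ArrayList<>();
        for (Relationship r1 : you.getRelationships(Direction.OUTGOING, BOUGHT)) {
            Node something = r1.getOtherNode(you);
            for (Relationship r2 : something.getRelationships(Direction.INCOMING, BOUGHT)) {
                Node other = r2.getOtherNode(something);
                if (other.equals(you)) continue;  // skip yourself
                for (Relationship r3 : other.getRelationships(Direction.OUTGOING, BOUGHT)) {
                    recos.add(r3.getOtherNode(other));  // the reco
                }
            }
        }
        return recos;
    }
}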
• Can comfortably handle 1 trillion relationships on a single server
• 24 × 2TB SSDs, 33TB on disk
• Compiled Cypher query
• Random reads
• Sustains over 100k user transactions/sec
• Even with 99.8% page faults, because of a modest 512GB of RAM
Read Scale
• Import Friendster dataset
• 1.8 billion relationships take around 20 minutes
• That is 1M writes/second!
Write Scale
>50M traversals/sec
1,000,000 writes/sec
10¹² Records
Comparison on a ~10M node, ~100M relationship graph

Workload                       | Non-native graph DB: 6 machines, each with 48 vCPUs, 256 GB disk and 256 GB of RAM | Neo4j 3.3: single machine
Count nodes                    | 201s  | < 1ms
Count outgoing rels            | 202s  | < 1ms
Count outgoing rels at depth 2 | 276s  | 56s
Count outgoing rels at depth 3 | 511s  | 423s
Group nodes by property val    | 212s  | 25s
Group rels by type             | 198s  | 26s
Count depth 2 knows-likes      | 324s  | 133s
Page Rank                      | 2571s | 27s
Just one more thing…
Amazing Native Graph Performance
Thanks for coming today!
Drinks on the 13th Floor
@jimwebber

GraphTour - Closing Keynote

Editor's Notes

  • #5 The Neo4j database fits this definition: a small number of modules, each dedicated to some part of graph storage and query. There’s no other DBMS underneath requiring translation into and out of the native world. And that provides serious benefits to the end user.
  • #7 Science time. A reminder of algorithms and data structures
  • #8 What are the properties of this list? If you want to add something, it’s easy to just insert. If you want to find something (or find out it’s not there), it’s laborious. It is O(1) for writes = great. It is O(N) for reads = sucky.
  • #9 Pop it in a box. Put an API on it. Voilà, it’s a database! Yes, it’s a crappy database, but for our purposes it suits. If you squint it could even be blockchain. Works great for one client, but…
  • #10 Works great for one client, but…
  • #11 Conflict-free Replicated Data Type. A CRDT is a data structure that has well-known merge rules. We can write into several concurrent copies (on different servers) and merge them all later. Great! Because we don’t care about ordering this is easy peasy, and a CRDT library can even do it for us. But this database is still awful for reads: reads get slower the more data you add. You could even go another route and assume that you rarely read, and do something like large fast ring buffers. Lots of options. But reading what you’ve written is always expensive.
  • #12 Let’s try again. Binary tree: log(n) for reads, and log(n) for writes, because a write first has to do a log(n) read to find its spot.
  • #13 Can read anywhere in principle. Can write across the leading edge of the tree. Contention is generally federated through the structure.
  • #15 When you’ve got trees you get lots of logs: that is, log(n) lookups, and m log(n) traversal cost for graphs. This model isn’t a good choice for graph workloads.
  • #16 Of course Neo4j has some indexes that are tree-based, but most of the time we only use them to find starting points in the graph. Traversals in Neo4j are O(1). Native graph = far fewer log(n) penalties.
  • #17 The linked list is great for writes, less good for reads. B-trees strike a balance between reads and writes. Your design and implementation choices empower you for your native model; those same choices limit you for other use cases.
  • #18 Caveat emptor – buyer beware. Models don’t compose easily. You can make documents from graphs conveniently, but not so much the other way. The non-native Sea Lamprey costs $500k per year to control in NY state! It’s not a native part of the ecosystem.
  • #19 The graph trend is enormous and outstripping all other models. If you’re a vendor in one of the slower growing models, you need some graph *story* Bandwagon jumping
  • #20 Some vendors have spotted the enormous graph trend and are simply jumping on the bandwagon Let’s take a look at their non-native architecture.
  • #21 Architecture
  • #22 We’ve seen two approaches in the market where a non-graph vendor has tried to stretch their data structures to graph
  • #24 Today most non-native graphs have their own APIs – not Cypher, not openCypher. That excludes them from an amazing ecosystem of tools and people, to their detriment. I also think that Cypher is by far the best graph query language – by definition, it builds on the learning of earlier languages: SQL, Gremlin, SPARQL.
  • #27 [Japanese Knotweed] The Graph API suffers because most of the data store is focussed on the existing data model. The data structures aren’t designed for graphs, nor are the store formats. Graphs are a hobby, a tick box, something to answer RFPs. Graphs are not the motivating workload. The motivating workload doesn’t even have relationships, and therefore the DB engine will not optimise for them. Upper levels try to compensate but generally can only do so for a few hops. How many hops even to traverse your data center? Or your train ride? Or your Mars mission?
  • #28 A column store provides a nested hashmap data structure: a hashmap-of-hashmaps. Theoretical O(1) lookup per item seems great! But it is O(n) in practice because of collisions, and pathologically O(n²) for inserting n objects! And it is not mechanically sympathetic: hashing distributes data to avoid clashes, but performance comes from data locality. You work at disk speed if unoptimized, at RAM speed if optimized, but you have to denormalize. Serious limitations (e.g. only queries up to depth 3 are optimised). And then add in network latency for the distributed hash ring.
  • #29 [Himalayan Balsam] Add a graph lookup operator to the query language. Use some conventions in the existing model to infer linkage that the new operator can use. But no native support for links means slow: the data structures aren’t designed for graphs, nor are the store formats. It also means you need clever workarounds, and you reach the limits of those workarounds quickly. Again: how many hops even to traverse your data center? Or your train ride? Or your Mars mission? And if you disobey those conventions – no graph, and there is nothing to enforce them.
  • #30 The underlying model knows nothing about links, so: It is not that good for general-purpose graphs, because you can’t denormalize for all possible use cases. Deleting documents leaves dangling links (the document engine doesn’t have referential constraints). More generally, the user has to ensure conventions are upheld to make the graph features work. It is easy to unintentionally disable graph features when other folks have only a document view of the data. And then add network latency for all lookups. Poor performance at modest search depth, difficult governance (the engine does not respect the graph), poor expressivity for any reasonable graph problem.
  • #31 Non-native stores serve two (or more) domains, and always prefer their primary domain: it’s what most of their users need. So while there are CS and engineering considerations, there’s also the notion of doing one thing well that underpins Neo4j. Neo4j supports graph workloads natively, from bottom to top. It is not a document store, or a column store; it is a native graph database. Let’s see how we do it.
  • #32 For us that one thing is graphs But graphs are useful in a variety of processing contexts.
  • #33 First of all, Online Transaction Processing. OLTP. What OLTP typically means for a graph is reading or writing small part of the whole graph.
  • #34 The second way we see people using graphs is for Online Analytical Processing. Analytics typically means processing much larger sections of the graph, and often, in fact, processing the whole graph. For the last few decades, the trend has been for specialist technology to handle analytic workloads - different systems, different data models maybe - and isolated from OLTP systems. Well now there’s a new trend
  • #35 Recently there’s been lots of talk about something called HTAP – *Hybrid* transactional and analytical processing. The idea is that if you could have one system that serves both workloads, you can run your analytics on up-to-date or nearly up-to-date data, so that you can respond to things faster. Also, maybe it’s just not worth the complexity of two totally different systems. What are we doing about this at Neo4j?
  • #36 Since Neo4j 3.1 the cluster architecture has supported dividing the cluster into different groups. Here I’m showing 5 servers on the left that handle the transactional workload, updating the graph, and a read-only replica, which is useful for read-heavy workloads.
  • #37 What this gives you is a part of the cluster that is perfect for OLAP workloads. Mostly isolated from the main transactional cluster, work over here won’t impact the transactional workload.
  • #38 You can also specialize the hardware for each workload - for example use machines with more RAM or CPU cores for the OLAP workload. How do you use this cluster?
  • #39 Well, you use the Neo4j Drivers to talk to all the servers in this cluster. If you’ve got an OLAP workload that you want to go just to the OLAP-specialized machines, you can do this purely through configuration.
  • #40 When you create a Neo4j Driver in your application, you specify a policy. And on the servers you say what that policy means, which groups it should send queries to, and which servers are in those groups.
  • #41 So that gives us our workload directed to the right servers in the cluster. I don’t really need to have two different applications.
  • #42 I can have one application doing a mixture of OLTP and OLAP and still have the work routed to the right place. Now let’s look at the work itself in more detail.
  • #43 This is a model for using Neo4j: You have an application. It sends Cypher queries. They get run by the query engine, which queries the graph model. This model is great, but now we’re adding something else into the picture.
  • #44 Graph algorithms. They’re firmly on the analytics side of things. They look at a whole graph. You run them and they lead to actions like “this transaction seems fraudulent, you should investigate” or they lead to insights like “this is the type of customer we do well selling to, we should tune our business around them”.
  • #45 There are two broad categories of algorithms available with 3.3. Centrality algorithms identify nodes that have significant positions in the network. Clustering algorithms are about detecting groups or clusters of nodes.
  • #46 So if we want to run these graph algorithms, how do they fit into the picture?
  • #47 Well, we’ve packaged the algorithms as a set of procedures. This means they sit alongside the Cypher query engine, behind exactly the same Cypher interface.
  • #48 To run one of the algorithms, it’s just a call to the relevant procedure. It works just the same way as running a normal Cypher query. Now let’s have a look at one of the algorithms in more detail.
  • #49 I’ve picked PageRank because it’s quite well known. PageRank scores the importance of each node according to the importance of the other nodes that link to it. So it’s a kind of recursive definition
  • #50 Practically, what that means is that you have to iterate: consider all the nodes and all the relationships in the graph, many times over.
  • #51 As the algorithm
  • #53 Efficiency for graph operations is paramount. You don’t need huge macho clusters to do this.
  • #54 I think these are incredibly useful building blocks for your next-gen systems – I’m looking forward to seeing the kinds of applications that get built with this stuff.
  • #55 And on a scalability note, Neo4j is light enough to scale down to some really interesting edge compute cases – like Stefan Armbruster’s RasPi cluster! But let’s dig down a bit further. Cypher is at the heart of Neo4j and we’ve heard a lot about it today. I’d like to invite Tobias Lindaaker to the stage to talk about advances in the Cypher runtime that translate into performance advantages for you.
  • #56 But now let’s reflect on what it means practically to choose graph native technology
  • #57 So let’s zoom in on the lowest levels: what are the performance advantages of native graph? And what can we do when we really push the envelope – work the machinery as hard as possible? Lots. Our CTO Johan decided to push the machinery to its limit and see what it can do.
  • #58 Tease Johan.
  • #60 User transaction means real units of work that are meaningful and valuable to the application. Lots of traversals involved. Not an artificial to-first-byte delivery benchmark. Random reads are the hardest for a database to optimise so this is a truly challenging benchmark.
  • #61 This is soon to be outdated – our new highly parallel importer will be far faster. For transactional updates even on my modest laptop I can get several thousand ACID tx/sec online.
  • #62 You can get so much work done so quickly with numbers like those.
  • #63 You don’t have to follow me on this path though. You take the blue pill, the story ends. You wake up in your data centre, shoe-horning connected data into those same DBMS systems not designed for it. You take the red pill, and stay in graph land. And I show you how deep traversals can go in the real world. We’re taking the red pill.
  • #64 I saw this on the internet and thought it looked like a neat challenge. We had the DBpedia dataset to hand, which is comparable in size (slightly larger but from the real world: 11M nodes, 116M links). Theirs was synthetic, slightly smaller. The original experiment ran on 288 cores with 1.5TB RAM. Neo4j ran on a single workstation with 128GB RAM for the database in total – thanks to Michael Hunger for running the experiment. That itself is a remarkable illustration of how efficient Neo4j can be. Sure, it’s macho to run 6 large machines, but it’s more sensible not to. *** Describe what’s going on *** then: This is not really a fair comparison. The work undertaken by the non-native store is far higher than the work undertaken by Neo4j. But that’s the whole point! Because Neo4j can optimise for graphs all the way down the stack, we can and have implemented all kinds of shortcuts that databases optimised for tables or columns or keys-and-values or documents can’t do. If you saw a similar table a year ago: the Neo4j column is even faster now, in some cases 2x faster.
  • #66 One more thing…
  • #67 The Neo4j engineering team has done some fantastic stuff in the last couple of years: that’s a 3B-node, 18B-relationship graph PageRanked with 20 iterations in less than 2 hours using the graph algos. On commodity hardware. Imagine what we can do with Cypher for Apache Spark too! We also measure ourselves on the standard LDBC 100 benchmark: Running since March 2016, “SF100 Read” has improved *~2x* (~2800 tx/s --> ~5000 tx/s) and “SF100 Write” has improved *~4x* (~5000 tx/s --> ~20000 tx/s).
  • #68 Just remains for me to invite you to join us for drinks
  • #69 OK, I guess this is more accurate.