The Native Graph Advantage
Dr. Jim Webber
Chief Scientist, Neo4j
• A cheeky bit of computer science
• Database architecture from 30,000ft
• Why Neo4j is graph native, and why it matters
• Quantitative performance advantages
• Finish
Overview
Applied to data, native data formats or communication protocols are those
supported by a certain computer hardware or software, with maximal
consistency and minimal amount of additional components.
-- Wikipedia
Native: A Definition
Those who can imagine anything,
can create the impossible.
An Unordered Singly Linked-List
27 → 1657 → 5674
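The speaker notes call this list O(1) for writes and O(N) for reads. A minimal Java sketch of that trade-off (class and method names are mine, for illustration):

public class UnorderedLinkedList {
    private static final class Node {
        final long value;
        final Node next;
        Node(long value, Node next) { this.value = value; this.next = next; }
    }

    private Node head;

    // O(1) write: no ordering to maintain, so just prepend at the head.
    public void add(long value) {
        head = new Node(value, head);
    }

    // O(N) read: worst case walks every node to find (or miss) a value.
    public boolean contains(long value) {
        for (Node n = head; n != null; n = n.next) {
            if (n.value == value) return true;
        }
        return false;
    }
}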
A Write-Centric Database?
(diagram: a Client writing to the list 27 → 1657 → 5674)
Impractical Design
(diagram: several Clients all writing to the list 27 → 1657 → 5674)
Every client contends for the write lock in a naïve implementation
CRDTs to the Rescue
(diagram: three Clients each write into their own replica of the list, namely 27 → 1657 → 5674, 5674 → 7689 → 6, and 1657 → 5674 → 66; the replicas merge into 27 1657 5674 7689 6 66)
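The notes describe replicas with well-known merge rules; one way to sketch that in Java is a grow-only set (G-Set), the simplest CRDT (my choice of example, not one the talk names):

import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class GSetReplica {
    private final Set<Long> local = new LinkedHashSet<>();

    // Cheap local write: each client writes into its own replica.
    public void add(long value) { local.add(value); }

    // The merge rule is set union: commutative, associative and idempotent,
    // so replicas can be combined in any order, any number of times.
    public static Set<Long> merge(List<GSetReplica> replicas) {
        Set<Long> merged = new LinkedHashSet<>();
        for (GSetReplica r : replicas) merged.addAll(r.local);
        return merged;
    }
}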
Trees
(diagram: the values 27, 1657, 5674, 7689, 6, 66 from the list arranged as a binary search tree)
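A toy Java binary search tree making the note's point: reads and writes are both log(n) on a balanced tree, and a write pays for a log(n) descent first. (This toy can degenerate to O(n); real databases use balanced B-tree variants.)

public class BinarySearchTree {
    private static final class Node {
        final long value;
        Node left, right;
        Node(long value) { this.value = value; }
    }

    private Node root;

    // log(n) write on a balanced tree: the descent is itself a log(n) read.
    public void insert(long value) {
        root = insert(root, value);
    }

    private Node insert(Node n, long value) {
        if (n == null) return new Node(value);
        if (value < n.value) n.left = insert(n.left, value);
        else if (value > n.value) n.right = insert(n.right, value);
        return n;  // duplicates are ignored
    }

    // log(n) read on a balanced tree.
    public boolean contains(long value) {
        Node n = root;
        while (n != null) {
            if (value == n.value) return true;
            n = value < n.value ? n.left : n.right;
        }
        return false;
    }
}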
Minimise contention
(diagram: four Clients write concurrently into different parts of the tree of 27, 1657, 5674, 7689, 6, 66; contention is federated through the structure)
• Classic B-trees are a common pattern for on-disk databases
• “Index” levels in memory, leaf nodes in files on disk
• B+ trees for linear scans are neat! But…
Databases Usually <3 Trees
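The speaker notes add that Neo4j uses its tree-based indexes mostly to find starting points, after which traversals are O(1). An illustrative Java sketch of that idea (not Neo4j's actual storage format):

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class IndexFreeAdjacency {
    static final class Node {
        final long id;
        final List<Node> out = new ArrayList<>();  // direct references, no index
        Node(long id) { this.id = id; }
    }

    // Tree-based index: log(n), paid once to find the traversal's start.
    private final Map<Long, Node> byId = new TreeMap<>();

    public void add(Node n) { byId.put(n.id, n); }

    // After the single lookup, every hop is O(1) pointer chasing.
    public int countFriendsOfFriends(long startId) {
        Node start = byId.get(startId);  // the only index lookup
        if (start == null) return 0;
        int count = 0;
        for (Node friend : start.out) count += friend.out.size();
        return count;
    }
}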
Pick the right tool for the job
• It could be tables or columns or KV or documents…
• Each database is likely very good for that model
• Evolution driven by its primary workload in its primary market
• Any add-on doesn’t benefit from this
• Unloved
• Opportunistic (e.g. “multi model”)
• Models don’t compose easily
All Databases have a native model
Why jump on the graph bandwagon?
Graph Layer
• Take existing data store
• Bolt on a graph-like API from third-party open source
• Declare victory
Graph Operator
• Take existing data store
• Add graph features into the query language
• Declare victory
Two Non-Native Approaches to Graph
Non-Native Architectures
(diagram: the two non-native stacks side by side)
Graph Layer approach: Graph API → Graph Layer → Other DBMS (e.g. Column Store)
Graph Operator approach: Other QL + Graph Operator → Other DBMS (e.g. Document Store)
No Cypher!
Requires convention at user level
Denormalization
Does not understand graphs
Cannot prevent dangling relationships, logical corruption, etc.
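A toy Java illustration of that last point, the dangling-reference problem (hypothetical document shapes, not a real document database API):

import java.util.HashMap;
import java.util.Map;

public class DanglingLinks {
    public static void main(String[] args) {
        Map<String, Map<String, Object>> documents = new HashMap<>();
        documents.put("order-1", Map.of("customerId", "cust-42"));
        documents.put("cust-42", Map.of("name", "Alice"));

        // No referential constraint fires: the engine sees only documents.
        documents.remove("cust-42");

        // order-1 still "links" to cust-42, which no longer exists.
        String ref = (String) documents.get("order-1").get("customerId");
        System.out.println("dangling: " + !documents.containsKey(ref)); // true
    }
}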
• Engine and store are not designed for graphs
• Graphs are not motivating workload
• Denormalization only works within modest limits (e.g. depth 3)
• Operational concerns: schema rigidity, evolution
Graph Layer Drawbacks
Popular Implementation: Column Store
http://javahungry.blogspot.com/2013/08/hashing-how-hash-map-works-in-java-or.html
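The speaker notes describe this as a hashmap-of-hashmaps. An illustrative Java sketch of that shape (my own mock-up, not any vendor's engine), showing why each extra hop is another round of hashing:

import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashmapOfHashmaps {
    // rowKey -> (columnName -> value); a "likes" column holds neighbour IDs.
    private final Map<String, Map<String, Object>> rows = new HashMap<>();

    public void putRow(String key, Map<String, Object> columns) {
        rows.put(key, columns);
    }

    @SuppressWarnings("unchecked")
    public List<String> neighbours(String rowKey) {
        Map<String, Object> row = rows.get(rowKey);  // one hash lookup
        if (row == null) return List.of();
        return (List<String>) row.getOrDefault("likes", List.of());
    }

    // Each extra hop re-enters the top-level map: n hops means n rounds of
    // hashing with no data locality (plus network trips once sharded).
    public long countAtDepth2(String rowKey) {
        long count = 0;
        for (String n1 : neighbours(rowKey)) count += neighbours(n1).size();
        return count;
    }
}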
• Works by convention only
• Underlying engine cannot enforce integrity
• Data structures and store formats are designed for another job entirely
• Performance concerns
Graph Operator Drawbacks
Popular Implementation: B-Trees!
http://zhangliyong.github.io/posts/2014/02/19/mongodb-index-internals.htm
Do one thing, do it well
OLTP
OLAP
HTAP = OLTP + OLAP
(diagram: transactional servers handle writes while read-only replicas, isolated from them, take the OLAP workload)
(diagram: an OLTP application and an OLAP application both connect through the Neo4j Drivers)
neo4j.conf:
causal_clustering.server_groups=olap1
causal_clustering.load_balancing.config.server_policies.OLAP=groups(olap1)

Application code:
GraphDatabase.driver( "bolt+routing://server?policy=OLAP" );
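Putting the two halves together: a minimal sketch with the 1.x Java driver (host, credentials and query are placeholders):

import org.neo4j.driver.v1.AccessMode;
import org.neo4j.driver.v1.AuthTokens;
import org.neo4j.driver.v1.Driver;
import org.neo4j.driver.v1.GraphDatabase;
import org.neo4j.driver.v1.Session;

public class OlapRouting {
    public static void main(String[] args) {
        // The policy name in the URI matches the server_policies entry
        // configured in neo4j.conf above.
        try (Driver driver = GraphDatabase.driver(
                "bolt+routing://server?policy=OLAP",
                AuthTokens.basic("neo4j", "secret"))) {
            // Work in this session is routed per the OLAP policy, i.e. to
            // the servers in group olap1.
            try (Session session = driver.session(AccessMode.READ)) {
                long nodes = session.run("MATCH (n) RETURN count(n) AS c")
                                    .single().get("c").asLong();
                System.out.println(nodes + " nodes");
            }
        }
    }
}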
OLTP Application
OLAP
Application
HTAP Application
(diagram: the Application sends Cypher to the Cypher Query Engine, which queries the graph)
Graph Algorithms lead to:
• Actions
• Insights
Clustering
• Label Propagation
• Union Find / Weakly Connected Components
• Strongly Connected Components
• Triangle Count / Clustering Coefficient
Centrality
• PageRank
• Betweenness
• Closeness
• Degree
Path Finding
• Breadth-first search
• Depth-first search
• Single-source shortest path
• All-pairs shortest path
• Minimum-weight spanning tree
(diagram: the Graph Algorithms are packaged as procedures that sit alongside the Cypher Query Engine, behind the same Cypher interface to the Application)
CALL algo.pageRank
CREATE (:User)
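Both statements go through the same Cypher surface. For instance, from the Java driver (the algo.pageRank.stream signature follows the graph-algorithms library of the time as best I recall it, so treat it as approximate):

import org.neo4j.driver.v1.AuthTokens;
import org.neo4j.driver.v1.Driver;
import org.neo4j.driver.v1.GraphDatabase;
import org.neo4j.driver.v1.Record;
import org.neo4j.driver.v1.Session;

public class CallPageRank {
    public static void main(String[] args) {
        try (Driver driver = GraphDatabase.driver(
                "bolt://localhost", AuthTokens.basic("neo4j", "secret"));
             Session session = driver.session()) {
            // A procedure call is just Cypher: CALL, YIELD, then ordinary
            // Cypher over the streamed results.
            for (Record row : session.run(
                    "CALL algo.pageRank.stream('Page', 'Link', {iterations: 20}) "
                  + "YIELD nodeId, score "
                  + "RETURN nodeId, score ORDER BY score DESC LIMIT 10").list()) {
                System.out.println(row.get("nodeId") + "\t" + row.get("score"));
            }
        }
    }
}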
PageRank
PR(p) = (1 - d)/N + d × Σ_{q ∈ M(p)} PR(q)/L(q)
N, the node count: MATCH (n) RETURN count(n)
M(p), the nodes linking to p: MATCH (p)<--(q) RETURN q
L(p), the outgoing link count: MATCH (p)-[r]->() RETURN count(r)
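A plain-Java power-iteration sketch of that formula (nothing Neo4j-specific; dangling-node handling omitted for brevity):

import java.util.Arrays;

public class PageRankSketch {
    // graph[q] lists the pages that page q links to, so graph[q].length = L(q).
    static double[] pageRank(int[][] graph, int iterations, double d) {
        int n = graph.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);

        for (int i = 0; i < iterations; i++) {
            double[] next = new double[n];
            Arrays.fill(next, (1 - d) / n);                   // the (1-d)/N term
            for (int q = 0; q < n; q++) {
                if (graph[q].length == 0) continue;           // sinks ignored here
                double share = d * rank[q] / graph[q].length; // d * PR(q)/L(q)
                for (int p : graph[q]) next[p] += share;      // q is in M(p)
            }
            rank = next;
        }
        return rank;
    }
}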
(:Page)-[:Link]->(:Page)
• 11 million nodes
• 116 million relationships
• 20 iterations
• < 10 seconds
DBPedia
• Combine OLTP and OLAP in the same cluster
• Work on up-to-date data, with no complex ETL or warehousing
• Mix with graph algorithms
Neo4j is an HTAP Database
Quantitative Analysis
• Asymptotic benchmarking effort for native graph tech
• “What can Neo4j do when it’s pushed to its limits?”
• The results are impressive
Pushing Neo4j to the Limits
Traversals
Realistic retail dataset from Amazon
Commodity dual-Xeon server
Social recommendation (Java procedure) equivalent to:
MATCH (you)-[:BOUGHT]->(something)<-[:BOUGHT]-(other)-[:BOUGHT]->(reco)
WHERE id(you)={id}
RETURN reco
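A sketch of how that hop pattern might look inside a procedure body against the Neo4j 3.x core API (simplified: no deduplication, scoring, or filtering out of things you already bought):

import java.util.ArrayList;
import java.util.List;

import org.neo4j.graphdb.Direction;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.RelationshipType;

public class Recommend {
    static final RelationshipType BOUGHT = RelationshipType.withName("BOUGHT");

    // (you)-[:BOUGHT]->(something)<-[:BOUGHT]-(other)-[:BOUGHT]->(reco)
    static List<Node> recommend(Node you) {
        List<Node> recos = new ArrayList<>();
        for (Relationship r1 : you.getRelationships(Direction.OUTGOING, BOUGHT)) {
            Node something = r1.getOtherNode(you);
            for (Relationship r2 : something.getRelationships(Direction.INCOMING, BOUGHT)) {
                Node other = r2.getOtherNode(something);
                if (other.equals(you)) continue;  // skip yourself
                for (Relationship r3 : other.getRelationships(Direction.OUTGOING, BOUGHT)) {
                    recos.add(r3.getOtherNode(other));  // the reco
                }
            }
        }
        return recos;
    }
}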
• Can comfortably handle 1 trillion relationships on a single server
• 24 × 2TB SSDs, 33TB on disk
• Compiled Cypher query
• Random reads
• Sustains over 100k user transactions/sec
• Even with 99.8% page faults, because of a modest 512GB of RAM
Read Scale
• Import Friendster dataset
• 1.8 billion relationships take around 20 minutes
• That is 1M writes/second!
Write Scale
>50M traversals/sec
1,000,000 writes/sec
10¹² Records
Comparison on a ~10M node, ~100M relationship graph

Workload                       | Non-native graph DB: 6 machines, each with 48 vCPUs, 256 GB disk and 256 GB of RAM | Neo4j 3.3: single machine
Count nodes                    | 201s  | < 1ms
Count outgoing rels            | 202s  | < 1ms
Count outgoing rels at depth 2 | 276s  | 56s
Count outgoing rels at depth 3 | 511s  | 423s
Group nodes by property val    | 212s  | 25s
Group rels by type             | 198s  | 26s
Count depth 2 knows-likes      | 324s  | 133s
Page Rank                      | 2571s | 27s
Just one more thing…
Amazing Native Graph Performance
Thanks for coming today!
Drinks on the 13th Floor
@jimwebber

GraphTour - Closing Keynote

Editor's Notes

  • #5 The Neo4j database fits this definition: a small number of modules, each dedicated to some part of graph storage and query. There’s no other DBMS underneath requiring translation into and out of the native world. And that provides serious benefits to the end user.
  • #7 Science time. A reminder of algorithms and data structures
  • #8 What are the properties of this list? If you want to add something, it’s easy to just insert. If you want to find something (or find out it’s not there), it’s laborious. It is O(1) for writes = great. It is O(N) for reads = sucky.
  • #9 Pop it in a box. Put an API on it. Voilà, it’s a database! Yes, it’s a crappy database, but for our purposes it suits. If you squint it could even be blockchain. Works great for one client, but…
  • #10 Works great for one client, but…
  • #11 Conflict-free Replicated Data Type. A CRDT is a data structure that has well-known merge rules. We can write into several concurrent copies (on different servers) and merge them all later. Great! Because we don’t care about ordering this is easy peasy, and a CRDT library can even do it for us. But this database is still awful for reads: reads get slower the more data you add. You could even go another route and assume that you rarely read, and do something like large fast ring buffers. Lots of options. But reading what you’ve written is always expensive.
  • #12 Let’s try again. Binary tree: log(n) for reads, and log(n) for writes, because a write first has to do a log(n) read to find its spot.
  • #13 Can read anywhere in principle. Can write across the leading edge of the tree. Contention is generally federated through the structure.
  • #15 When you’ve got trees you get lots of logs: that is, log(n) lookups, and m log(n) traversal cost for graphs. This model isn’t a good choice for graph workloads.
  • #16 Of course Neo4j has some indexes that are tree-based, but most of the time we only use them to find starting points in the graph. Traversals in Neo4j are O(1). Native graph = far fewer log(n) penalties.
  • #17 The linked list is great for writes, less good for reads. B-trees strike a balance between reads and writes. Your design and implementation choices empower you for your native model; those same choices limit you for other use cases.
  • #18 Caveat emptor – buyer beware. Models don’t compose easily. You can make documents from graphs conveniently, but not so much the other way. The non-native Sea Lamprey costs $500k per year to control in NY state! It’s not a native part of the ecosystem.
  • #19 The graph trend is enormous and outstripping all other models. If you’re a vendor in one of the slower growing models, you need some graph *story* Bandwagon jumping
  • #20 Some vendors have spotted the enormous graph trend and are simply jumping on the bandwagon Let’s take a look at their non-native architecture.
  • #21 Architecture
  • #22 We’ve seen two approaches in the market where a non-graph vendor has tried to stretch their data structures to graph
  • #24 Today most non-native graphs have their own APIs – not Cypher, not openCypher. That excludes them from an amazing ecosystem of tools and people, to their detriment. I also think that Cypher is by far the best graph query language – by definition, it builds on the learning of earlier languages: SQL, Gremlin, SPARQL.
  • #27 [Japanese Knotweed] The Graph API suffers because most of the data store is focussed on the existing data model. The data structures aren’t designed for graphs, nor are the store formats. Graphs are a hobby, a tick box, something to answer RFPs. Graphs are not the motivating workload. The motivating workload doesn’t even have relationships, and therefore the DB engine will not optimise for them. Upper levels try to compensate but generally can only do so for a few hops. How many hops even to traverse your data center? Or your train ride? Or your Mars mission?
  • #28 A column store provides a nested hashmap data structure: a hashmap-of-hashmaps. Theoretical O(1) lookup per item seems great! But it is O(n) in practice because of collisions, and pathologically O(n²) for inserting n objects! And it is not mechanically sympathetic: hashing distributes data to avoid clashes, but performance comes from data locality. You work at disk speed if unoptimized, at RAM speed if optimized, but you have to denormalize. Serious limitations (e.g. only queries up to depth 3 are optimised). And then add in network latency for the distributed hash ring.
  • #29 [Himalayan Balsam] Add a graph lookup operator to the query language. Use some conventions in the existing model to infer linkage that the new operator can use. But no native support for links means slow: the data structures aren’t designed for graphs, nor are the store formats. It also means you need clever workarounds, and you reach the limits of those workarounds quickly. Again: how many hops even to traverse your data center? Or your train ride? Or your Mars mission? And if you disobey those conventions – no graph, and there is nothing to enforce them.
  • #30 The underlying model knows nothing about links, so: It is not that good for general-purpose graphs, because you can’t denormalize for all possible use cases. Deleting documents leaves dangling links (the document engine doesn’t have referential constraints). More generally, the user has to ensure conventions are upheld to make the graph features work. It is easy to unintentionally disable graph features when other folks have only a document view of the data. And then add network latency for all lookups. Poor performance at modest search depth, difficult governance (the engine does not respect the graph), poor expressivity for any reasonable graph problem.
  • #31 Non-native stores serve two (or more) domains, and always prefer their primary domain: it’s what most of their users need. So while there are CS and engineering considerations, there’s also the notion of doing one thing well that underpins Neo4j. Neo4j supports graph workloads natively, from bottom to top. It is not a document store, or a column store; it is a native graph database. Let’s see how we do it.
  • #32 For us that one thing is graphs But graphs are useful in a variety of processing contexts.
  • #33 First of all, Online Transaction Processing. OLTP. What OLTP typically means for a graph is reading or writing small part of the whole graph.
  • #34 The second way we see people using graphs is for Online Analytical Processing. Analytics typically means processing much larger sections of the graph, and often, in fact, processing the whole graph. For the last few decades, the trend has been for specialist technology to handle analytic workloads - different systems, different data models maybe - and isolated from OLTP systems. Well now there’s a new trend
  • #35 Recently there’s been lots of talk about something called HTAP – *Hybrid* transactional and analytical processing. The idea is that if you could have one system that serves both workloads, you can run your analytics on up-to-date or nearly up-to-date data, so that you can respond to things faster. Also, maybe it’s just not worth the complexity of two totally different systems. What are we doing about this at Neo4j?
  • #36 Since Neo4j 3.1 the cluster architecture has supported dividing the cluster into different groups. Here I’m showing 5 servers on the left that handle the transactional workload, updating the graph, and a read-only replica, which is useful for read-heavy workloads.
  • #37 What this gives you is a part of the cluster that is perfect for OLAP workloads. Mostly isolated from the main transactional cluster, work over here won’t impact the transactional workload.
  • #38 You can also specialize the hardware for each workload - for example use machines with more RAM or CPU cores for the OLAP workload. How do you use this cluster?
  • #39 Well, you use the Neo4j Drivers to talk to all the servers in this cluster. If you’ve got an OLAP workload that you want to go just to the OLAP-specialized machines, you can do this purely through configuration.
  • #40 When you create a Neo4j Driver in your application, you specify a policy. And on the servers you say what that policy means, which groups it should send queries to, and which servers are in those groups.
  • #41 So that gives us our workload directed to the right servers in the cluster. I don’t really need to have two different applications.
  • #42 I can have one application doing a mixture of OLTP and OLAP and still have the work routed to the right place. Now let’s look at the work itself in more detail.
  • #43 This is a model for using Neo4j: You have an application. It sends Cypher queries. They get run by the query engine, which queries the graph model. This model is great, but now we’re adding something else into the picture.
  • #44 Graph algorithms. They’re firmly on the analytics side of things. They look at a whole graph. You run them and they lead to actions like “this transaction seems fraudulent, you should investigate” or they lead to insights like “this is the type of customer we do well selling to, we should tune our business around them”.
  • #45 There are two broad categories of algorithms available with 3.3. Centrality algorithms identify nodes that have significant positions in the network. Clustering algorithms are about detecting groups or clusters of nodes.
  • #46 So if we want to run these graph algorithms, how do they fit into the picture?
  • #47 Well, we’ve packaged the algorithms as a set of procedures. This means they sit alongside the Cypher query engine, behind exactly the same Cypher interface.
  • #48 To run one of the algorithms, it’s just a call to the relevant procedure. It works just the same way as running a normal Cypher query. Now let’s have a look at one of the algorithms in more detail.
  • #49 I’ve picked PageRank because it’s quite well known. PageRank scores the importance of each node according to the importance of the other nodes that link to it. So it’s a kind of recursive definition
  • #50 Practically, what that means is that you have to iterate: consider all the nodes and all the relationships in the graph, many times over.
  • #51 As the algorithm
  • #53 Efficiency for graph operations is paramount. You don’t need huge macho clusters to do this.
  • #54 I think these are incredibly useful building blocks for your next-gen systems – I’m looking forward to seeing the kinds of applications that get built with this stuff.
  • #55 And on a scalability note, Neo4j is light enough to scale down to some really interesting edge compute cases – like Stefan Armbruster’s RasPi cluster! But let’s dig down a bit further. Cypher is at the heart of Neo4j and we’ve heard a lot about it today. I’d like to invite Tobias Lindaaker to the stage to talk about advances in the Cypher runtime that translate into performance advantages for you.
  • #56 But now let’s reflect on what it means practically to choose graph native technology
  • #57 So let’s zoom in on the lowest levels: what are the performance advantages of native graph? And what can we do when we really push the envelope – work the machinery as hard as possible? Lots. Our CTO Johan decided to push the machinery to its limit and see what it can do.
  • #58 Tease Johan.
  • #60 User transaction means real units of work that are meaningful and valuable to the application. Lots of traversals involved. Not an artificial to-first-byte delivery benchmark. Random reads are the hardest for a database to optimise so this is a truly challenging benchmark.
  • #61 This is soon to be outdated – our new highly parallel importer will be far faster. For transactional updates even on my modest laptop I can get several thousand ACID tx/sec online.
  • #62 You can get so much work done so quickly with numbers like those.
  • #63 You don’t have to follow me on this path though. You take the blue pill, the story ends. You wake up in your data centre, shoe-horning connected data into those same DBMS systems not designed for it. You take the red pill, and stay in graph land. And I show you how deep traversals can go in the real world. We’re taking the red pill.
  • #64 I saw this on the internet and thought it looked like a neat challenge. We had the DBpedia dataset to hand, which is comparable in size (slightly larger but from the real world: 11M nodes, 116M links). Theirs was synthetic, slightly smaller. The original experiment ran on 288 cores with 1.5TB RAM. Neo4j ran on a single workstation with 128GB RAM for the database in total – thanks to Michael Hunger for running the experiment. That itself is a remarkable illustration of how efficient Neo4j can be. Sure, it’s macho to run 6 large machines, but it’s more sensible not to. *** Describe what’s going on *** then: This is not really a fair comparison. The work undertaken by the non-native store is far higher than the work undertaken by Neo4j. But that’s the whole point! Because Neo4j can optimise for graphs all the way down the stack, we can and have implemented all kinds of shortcuts that databases optimised for tables or columns or keys-and-values or documents can’t do. If you saw a similar table a year ago: the Neo4j column is even faster now, in some cases 2x faster.
  • #66 One more thing…
  • #67 The Neo4j engineering team has done some fantastic stuff in the last couple of years: that’s a 3B-node, 18B-relationship graph PageRanked with 20 iterations in less than 2 hours using the graph algos. On commodity hardware. Imagine what we can do with Cypher for Apache Spark too! We also measure ourselves on the standard LDBC 100 benchmark: Running since March 2016, “SF100 Read” has improved *~2x* (~2800 tx/s --> ~5000 tx/s) and “SF100 Write” has improved *~4x* (~5000 tx/s --> ~20000 tx/s).
  • #68 Just remains for me to invite you to join us for drinks
  • #69 OK, I guess this is more accurate.