Neo4j Database Overview
Andreas Kollegger
February 15 2018
Key Takeaways
1. Key product characteristics and ingredients
2. Where Neo4j fits into the larger data ecosystem
3. Latest innovations
Neo4j: Key Components
Key Architecture Components
1. Index-Free Adjacency: in memory and on flash/disk
2. ACID Foundation: required for safe writes
3. Full-Stack Clustering: causal consistency
4. Language, Drivers, Tooling: developer experience, graph efficiency, type safety
5. Graph Engine: cost-based optimizer, graph statistics, Cypher runtime, …
6. Hardware Optimizations: for next-gen infrastructure
Index-Free Adjacency
At write time: data is connected as it is stored.
At read time: lightning-fast retrieval of data and relationships via pointer chasing.
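A minimal sketch of the idea in Python, assuming a toy in-memory model (the class and function names are illustrative, and Neo4j's actual store format is organized differently): each node keeps direct references to its relationships, so data is connected at write time, and a traversal simply follows those references instead of consulting a global index.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.out_rels = []  # direct references to Relationship objects

class Relationship:
    def __init__(self, rel_type, start, end):
        self.rel_type = rel_type
        self.start = start
        self.end = end

def connect(start, rel_type, end):
    """Write time: create the relationship and link it from its start node."""
    rel = Relationship(rel_type, start, end)
    start.out_rels.append(rel)
    return rel

def neighbours(node, rel_type):
    """Read time: follow stored references (pointer chasing), no index lookup."""
    return [rel.end for rel in node.out_rels if rel.rel_type == rel_type]

def names_at_depth(node, rel_type, depth):
    """Names of nodes exactly `depth` hops away along rel_type relationships."""
    frontier = [node]
    for _ in range(depth):
        frontier = [n for f in frontier for n in neighbours(f, rel_type)]
    return [n.name for n in frontier]

alice, bob, carol = Node("Alice"), Node("Bob"), Node("Carol")
connect(alice, "KNOWS", bob)
connect(bob, "KNOWS", carol)
print(names_at_depth(alice, "KNOWS", 2))  # ['Carol']
```

Each hop costs time proportional to the relationships of the nodes actually visited rather than to the total size of the data set, which is the intuition behind the performance comparison that follows.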
“Minutes to Milliseconds” Real-Time Query Performance
[Chart: response time vs. connectedness and size of data set]
• Relational and other NoSQL databases: 0 to 2 hops, 0 to 3 degrees, thousands of connections
• Neo4j: tens to hundreds of hops, thousands of degrees, billions of connections
• 1000x advantage: “minutes to milliseconds”
Cypher Query Language
Example HR Query in SQL
The Same Query using Cypher
// For John Doe and everyone up to 3 levels below him, count the reports 1 to 3 levels below each
MATCH (boss)-[:MANAGES*0..3]->(sub),
      (sub)-[:MANAGES*1..3]->(report)
WHERE boss.name = "John Doe"
RETURN sub.name AS Subordinate,
       count(report) AS Total
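To make the query's semantics concrete, here is a plain-Python sketch of what it computes over a hypothetical org chart (the names are made up) that is assumed to be a tree: every `sub` within 0 to 3 MANAGES hops of John Doe (including John Doe himself) is paired with each `report` 1 to 3 hops below it, and the rows are counted per subordinate. Subordinates with no reports produce no rows. In a hierarchy that is not a tree, Cypher counts matched paths, so totals could differ from this sketch.

```python
from collections import defaultdict

# Hypothetical org chart: manager -> direct reports (assumed to be a tree).
MANAGES = {
    "John Doe": ["Ann", "Bob"],
    "Ann": ["Cid", "Dee"],
    "Bob": [],
    "Cid": ["Eve"],
    "Dee": [],
    "Eve": [],
}

def within(start, min_hops, max_hops):
    """Yield nodes reachable from start in min_hops..max_hops MANAGES steps."""
    frontier = [start]
    for depth in range(max_hops + 1):
        if depth >= min_hops:
            yield from frontier
        frontier = [r for m in frontier for r in MANAGES.get(m, [])]

def subordinate_totals(boss):
    totals = defaultdict(int)
    for sub in within(boss, 0, 3):          # (boss)-[:MANAGES*0..3]->(sub)
        for _report in within(sub, 1, 3):   # (sub)-[:MANAGES*1..3]->(report)
            totals[sub] += 1                # count(report), grouped by sub.name
    return dict(totals)

print(subordinate_totals("John Doe"))  # {'John Doe': 5, 'Ann': 3, 'Cid': 1}
```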
Project Impact
Less time writing queries:
• More time understanding the answers
• Leaving time to ask the next question
Less time debugging queries:
• More time writing the next piece of code
• Improved quality of overall code base
Code that’s easier to read:
• Faster ramp-up for new project members
• Improved maintainability & troubleshooting
ACID Graph Writes: A Requirement for Graph Transactions
• Graph transactions over ACID consistency: maintains integrity over time (guaranteed graph consistency)
• Graph transactions over non-graph-ACID DBMSs: becomes corrupt over time (not good enough for graphs)
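Why atomicity matters for graphs can be illustrated with a toy store (an assumption for illustration, not Neo4j's storage engine) in which every relationship is recorded on both of its endpoints. If a write fails between the two updates, the unsafe version leaves a half-linked relationship behind; the transactional version keeps an undo log and rolls back, so the graph stays consistent. Only the atomicity part of ACID is modelled here.

```python
class Store:
    """Toy graph store: each relationship is recorded on both endpoints."""

    def __init__(self, *nodes):
        self.out = {n: set() for n in nodes}  # node -> outgoing neighbours
        self.inc = {n: set() for n in nodes}  # node -> incoming neighbours

    def consistent(self):
        """Every outgoing pointer must be mirrored by an incoming one, and vice versa."""
        for a, targets in self.out.items():
            for b in targets:
                if a not in self.inc[b]:
                    return False
        for b, sources in self.inc.items():
            for a in sources:
                if b not in self.out[a]:
                    return False
        return True

def connect_unsafe(store, a, b, fail=False):
    store.out[a].add(b)                  # first half of the write
    if fail:
        raise RuntimeError("crash between the two updates")
    store.inc[b].add(a)                  # second half of the write

def connect_atomic(store, a, b, fail=False):
    undo = []                            # undo log for this transaction
    try:
        if b not in store.out[a]:
            store.out[a].add(b)
            undo.append(lambda: store.out[a].discard(b))
        if fail:
            raise RuntimeError("crash between the two updates")
        if a not in store.inc[b]:
            store.inc[b].add(a)
            undo.append(lambda: store.inc[b].discard(a))
    except Exception:
        for step in reversed(undo):      # roll back: all or nothing
            step()
        raise

for connect in (connect_unsafe, connect_atomic):
    s = Store("x", "y")
    try:
        connect(s, "x", "y", fail=True)
    except RuntimeError:
        pass
    print(connect.__name__, s.consistent())
# connect_unsafe False
# connect_atomic True
```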
Common Integration Patterns Inside the Enterprise
• From disparate silos to cross-silo connections
• From tabular data to connected data
• From data lake analytics to real-time operations
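The “from tabular data to connected data” pattern can be sketched as turning foreign-key columns into explicit relationships. The tables, the `Customer`/`Order` labels and the `PLACED` relationship type below are hypothetical. Neo4j also ships import tools (such as `LOAD CSV`) for this job; the sketch only shows the shape of the transformation.

```python
# Hypothetical tabular data: orders reference customers by foreign key.
customers = [{"id": 1, "name": "Acme"}, {"id": 2, "name": "Globex"}]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 1},
    {"order_id": 12, "customer_id": 2},
]

def to_graph(customers, orders):
    """Rows become nodes; each foreign key becomes an explicit relationship."""
    nodes = {("Customer", c["id"]): {"name": c["name"]} for c in customers}
    nodes.update({("Order", o["order_id"]): {} for o in orders})
    rels = [(("Customer", o["customer_id"]), "PLACED", ("Order", o["order_id"]))
            for o in orders]
    return nodes, rels

def orders_of(rels, customer_id):
    """Read the stored PLACED relationships for one customer (a linear scan in this toy)."""
    return [end[1] for start, rel_type, end in rels
            if rel_type == "PLACED" and start == ("Customer", customer_id)]

nodes, rels = to_graph(customers, orders)
print(orders_of(rels, 1))  # [10, 11]
```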
Neo4j: Latest Innovations
Neo4j 3.3: Current Release (Oct ’17)
Neo4j 3.3 Release Highlights
• Performance & Scalability
• Security & Operations
• Developer Productivity
Neo4j 3.3 Performance Improvements
• “Least Connected” load balancing
• Faster & more memory-efficient runtime
• Batch generation of IDs
• Schema operations now take local locks
• Page cache metadata moved off heap
• Native GB+ Tree numeric indexes
• Bulk importer paging & memory improvements
• Dynamically reload config settings without restarting Neo4j
Areas covered: Admin & Config, Storage & Indexing, Memory Management, Kernel & Transactions, Cypher Engine, Drivers & Bolt Protocol
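The “least connected” strategy itself is simple: send each new piece of work to the server that currently has the fewest active connections. A minimal sketch, assuming a hypothetical pool that tracks in-flight connections per server (the Neo4j drivers' real bookkeeping is not reproduced here):

```python
class LeastConnectedPool:
    """Route each request to the server with the fewest active connections."""

    def __init__(self, servers):
        # Dicts keep insertion order, so ties go to the earliest-listed server.
        self.active = {server: 0 for server in servers}

    def acquire(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1

pool = LeastConnectedPool(["core1", "core2", "core3"])
first = [pool.acquire() for _ in range(3)]  # one connection on each server
pool.release("core2")                        # core2 finishes its work
print(first, pool.acquire())  # ['core1', 'core2', 'core3'] core2
```

Compared with plain round-robin, this adapts to servers that are slow to finish their work, since they keep a higher connection count and receive fewer new requests.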
Neo4j 3.3 Performance Improvements
[Chart: Neo4j 3.2 vs. Neo4j 3.3 response times for Query 1 (ms), Query 2 (μs) and Query 3 (μs); the Neo4j 3.3 bars are labeled 53% faster, 139% faster and 242% faster]
Concurrent/Transactional Write Performance (Simulates Real-World Workloads)
[Chart: write performance (y-axis 0 to 25000) for Neo4j 2.2, 2.3, 3.0, 3.1, 3.2 and 3.3, with improvement labels of 69%, 31%, 59%, 38% and 55%]
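If the five percentage labels are read as the gain of each release over the one before it (an assumption; the chart residue does not say so explicitly), the gains compound multiplicatively rather than adding up:

```python
import math

# Assumed reading of the chart labels: improvement of each release over its
# predecessor, 2.2 -> 2.3 -> 3.0 -> 3.1 -> 3.2 -> 3.3.
gains_pct = [69, 31, 59, 38, 55]

# Compound: multiply the per-release factors (1 + gain).
cumulative = math.prod(1 + g / 100 for g in gains_pct)
print(f"Neo4j 3.3 vs 2.2: about {cumulative:.1f}x")  # about 7.5x
```

Under that reading, 3.3 would handle roughly 7.5 times the write load of 2.2, well above the 252% that a simple sum of the labels would suggest.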
[Diagram: a Neo4j cluster spanning London and New York, each data center with read replicas (RR); core servers (C) shown in New York]
• Multi-Data Center Clustering (Neo4j 3.2)
• Encryption: Intra-Cluster Encryption (Neo4j 3.3)
Developer Productivity
• Neo4j Drivers 1.5
• Neo4j OGM (Object-to-Graph Mapper) 3.0
• Spring Data Neo4j 5.0 Release
• Neo4j Desktop 1.0
Governance & Security Foundation
• Strong Security: LDAP & Active Directory, User & Roles, Procedure Access Controls, Kerberos, Intra-Cluster Encryption
• Schema Constraints: Node Keys (diagram example: key properties *businessUnit and *customerID, alongside other infoProperties)
Thanks!
Excluded
• DB Engines curve
• eBay Shopbot
Excluded because Emil is likely to cover them in the intro
Neo4j - The Graph Company
Industry’s Largest Dedicated Investment in Graphs
• Creator of the Neo4j Graph Platform
• ~200 employees
• HQ in Silicon Valley; other offices include London, Munich, Paris and Malmö (Sweden)
• $80M in funding from Fidelity, Sunstone, Conor, Creandum, and Greenbridge Capital
• Over 10M downloads
• 270+ enterprise subscription customers, over half with >$1B in revenue
Adoption (top retail firms, top financial firms, top software vendors): 7/10, 12/25, 8/10
Ecosystem (startups in program, enterprise customers, partners, meetup members, events per year): figures shown include 720+, 53K+, 100+, 270+ and 450+
Neo4j: The Graph Technology Pioneer
Timeline, 2010 to 2017:
• Invented Cypher, the leading language for graph queries
• First open source GA version of a property graph database
• O’Reilly Graph Databases: the first definitive book for graph professionals
• Introduced labels to simplify graph modeling
• 2014: Visual Graph Query Browser
• openCypher Project: open-sourced Cypher to create the de facto standard
• 2016: Causal Consistency for Graphs
• Launched the industry’s first Graph Platform
Neo4j Awards and Headlines
• 100 Best in Show 2014
• ODBMS Magic Quadrant 2014
• Who’s Who in NoSQL DBMSs 2013
• Technology of the Year 2013, 2014
• Bossie Award for Big Data 2013
• 100 Companies that Matter the Most in Data 2013
• Big Data 100 in Data Management 2013
• “The leading system among Graph DBMSs is Neo4j” 2014
• “Neo's GraphConnect shows graph databases coming into their own”, Matt Aslett 2013
• “Neo Technology – The Rise of the Graph Database”, Robin Bloor 2013
• O’Reilly publications: Graph Databases, authored by Neo Technology staff
Graph-Boosted Artificial Intelligence
The Next Phase in AI: Leveraging Connections in Data
• Knowledge Graphs: provide rich context for AI
• AI Visibility: human-friendly graph visualization
• Graph Enhanced AI Models: faster, more accurate development
• Graph Execution of AI: operationalize real-time OLAP and monitoring
• Graph Analytics: enrich AI inputs with graph algorithms
• Graph System of Record: maintain a source of connected AI truth
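“Enrich AI inputs with graph algorithms” typically means computing per-node graph features and feeding them to a model alongside ordinary attributes. A plain-Python sketch, assuming a small directed edge list and using PageRank as one common example of such an algorithm (the edge data and feature names are illustrative):

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a directed edge list."""
    nodes = sorted({n for edge in edges for n in edge})
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    size = len(nodes)
    rank = {n: 1 / size for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1 - damping) / size for n in nodes}
        for n in nodes:
            targets = out[n] or nodes          # dangling node: spread its rank evenly
            share = damping * rank[n] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank

def node_features(edges):
    """Per-node graph features that could be appended to an ML model's inputs."""
    pr = pagerank(edges)
    return {
        n: {
            "in_degree": sum(1 for _, dst in edges if dst == n),
            "out_degree": sum(1 for src, _ in edges if src == n),
            "pagerank": pr[n],
        }
        for n in pr
    }

edges = [("a", "hub"), ("b", "hub"), ("c", "hub")]
features = node_features(edges)
top = max(features, key=lambda n: features[n]["pagerank"])
print(top, features[top]["in_degree"])  # hub 3
```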
Performance & High Availability: Neo4j Causal Clustering
• Architected to guarantee graph consistency inside instances and across the cluster
• No single points of failure
• Seamless integration with drivers, Bolt protocol and cluster: no external load balancer required
• Optimized for maximum query throughput and response time
• Choice of application guarantees: “Read your own writes” vs. “Read any”
ENTERPRISE HIGH AVAILABILITY & SCALABILITY FEATURES
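The “read your own writes” guarantee can be pictured with a toy model (an illustration, not Neo4j's implementation): each committed write returns a bookmark marking its position in the transaction log, and a read that presents that bookmark may only be served by a member that has already applied the log up to that point. Neo4j's drivers expose this idea through bookmarks; the class and method names below are hypothetical.

```python
class ToyCausalCluster:
    """Leader appends to a log; read replicas apply it with some lag."""

    def __init__(self, replicas):
        self.log = []                               # committed writes, in order
        self.applied = {r: 0 for r in replicas}     # log entries applied per replica

    def write(self, value):
        self.log.append(value)
        return len(self.log)                        # bookmark for this commit

    def catch_up(self, replica):
        self.applied[replica] = len(self.log)

    def read(self, bookmark=0):
        """bookmark=0 means "read any"; otherwise the replica must be caught up."""
        for replica, applied in self.applied.items():
            if applied >= bookmark:
                return replica, self.log[:applied]
        return None, []                             # a real system would wait or reroute

cluster = ToyCausalCluster(["replica_a", "replica_b"])
bookmark = cluster.write("create (:Person)")
cluster.catch_up("replica_b")
print(cluster.read())           # ('replica_a', [])  read any: may be stale
print(cluster.read(bookmark))   # ('replica_b', ['create (:Person)'])
```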
Neo4j Causal Clustering: Multi-Data Center Capability*
Causal Clusters can now span data centers
• Clusters can be subdivided into groups and spread across DCs
• Read-time choice of consistency at global scale: “Read Any”, “Read-your-own-Writes”
Tiered subclusters boost performance
• Speeds local reads and writes
• Replica servers pull from the nearest replicas, minimizing WAN traffic
Topology-aware stack insulates developers & apps from the many complexities of clustering
Improved cloud delivery via RPM, Azure and AWS EC2
[Diagram: cluster members split into a dc1 group and a dc2 group]
*Included in Neo4j’s Enterprise Bundle
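How a tiered, group-aware topology keeps traffic local can be sketched as a selection rule: a read replica pulls updates from members of its own data-center group when any exist, and falls back to the wider cluster otherwise. This is a hypothetical illustration of the idea; the server names and groups are made up, and Neo4j configures the real behavior through its own server-group settings, which are not reproduced here.

```python
def choose_upstreams(replica_group, members):
    """Return the members a replica should pull from, nearest group first.

    members: list of (name, group) pairs describing the rest of the cluster.
    """
    local = [name for name, group in members if group == replica_group]
    return local or [name for name, _ in members]   # fall back across the WAN

members = [("core1", "dc1"), ("core2", "dc1"), ("rr_dc2_a", "dc2")]
print(choose_upstreams("dc2", members))  # ['rr_dc2_a']
print(choose_upstreams("dc3", members))  # ['core1', 'core2', 'rr_dc2_a']
```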
Productivity & Governance:
Schema Constraints
• Database-enforced schema
• Node Keys: enforce that a specified set of properties exists and is unique in combination
• Property Existence Constraints: ensure that specified properties always exist for
given nodes & relationships
• Improves developer productivity & data quality
• Avoids need to encode data rules into the application
• Helps ensure data consistency within large teams
• Eases data integration across other enterprise systems
ENTERPRISE SCHEMA & GOVERNANCE FEATURES
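A node key combines two rules: every node with the label must have all the key properties, and no two such nodes may share the same combination of values. A toy enforcement sketch, using the `businessUnit` and `customerID` properties from the Node Keys example on the Governance & Security Foundation slide (the class names are hypothetical). In Neo4j 3.3 Enterprise the database enforces this itself once a constraint is declared in Cypher, along the lines of `CREATE CONSTRAINT ON (c:Customer) ASSERT (c.businessUnit, c.customerID) IS NODE KEY`, where the `Customer` label is an assumption.

```python
class NodeKeyViolation(Exception):
    pass

class NodeKeyStore:
    """Toy store for one label, enforcing a composite node key."""

    def __init__(self, key_props):
        self.key_props = tuple(key_props)
        self.keys = set()     # composite key values already in use
        self.nodes = []

    def create(self, props):
        # Existence: a missing or null key property is rejected.
        missing = [p for p in self.key_props if props.get(p) is None]
        if missing:
            raise NodeKeyViolation(f"missing key properties: {missing}")
        # Uniqueness is checked on the combination, not on each property alone.
        key = tuple(props[p] for p in self.key_props)
        if key in self.keys:
            raise NodeKeyViolation(f"duplicate node key: {key}")
        self.keys.add(key)
        self.nodes.append(dict(props))

customers = NodeKeyStore(["businessUnit", "customerID"])
customers.create({"businessUnit": "EU", "customerID": 42, "name": "Acme"})
customers.create({"businessUnit": "US", "customerID": 42})  # OK: different combination
for bad in ({"businessUnit": "EU", "customerID": 42}, {"customerID": 7}):
    try:
        customers.create(bad)
    except NodeKeyViolation as err:
        print(err)
# duplicate node key: ('EU', 42)
# missing key properties: ['businessUnit']
```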
Neo4j Security Foundation: Safeguards Data & Addresses Compliance
• Multiple users -> flexible authentication options: Active Directory/LDAP or native users
• Role-based authorization: assign permissions to users and groups
• List and terminate running queries: users can manage their own queries; admins can manage all queries
• Access controls for user-defined procedures: enables subgraph access control
• Support for extended features: Kerberos add-on (available in Neo4j 3.2); OGM-based property-level encryption*
Enables Sarbanes-Oxley, HIPAA, PCI-DSS, et al. compliance
Neo4j Advantage – Security
*Source: https://neo4j.com/blog/neo4j-data-encryption-ogm/
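The “users manage their own queries, admins manage all queries” rule can be sketched as a permission check on query ownership. Everything here is hypothetical and simplified: the role names echo Neo4j's built-in `reader`, `publisher` and `admin` roles, but the permission sets are illustrative, and in Neo4j itself listing and terminating queries goes through procedures such as `dbms.listQueries()` and `dbms.killQuery()`.

```python
# Illustrative role -> permissions table (not Neo4j's real permission model).
ROLE_PERMISSIONS = {
    "reader": {"read"},
    "publisher": {"read", "write"},
    "admin": {"read", "write", "manage_all_queries"},
}

class QueryRegistry:
    """Tracks running queries and who owns them."""

    def __init__(self):
        self.running = {}   # query id -> owning user
        self.next_id = 1

    def start(self, user):
        query_id = self.next_id
        self.next_id += 1
        self.running[query_id] = user
        return query_id

    def _can_manage(self, user, role, query_id):
        is_admin = "manage_all_queries" in ROLE_PERMISSIONS[role]
        return is_admin or self.running.get(query_id) == user

    def list_queries(self, user, role):
        return [q for q in self.running if self._can_manage(user, role, q)]

    def kill(self, user, role, query_id):
        if query_id not in self.running:
            return False
        if not self._can_manage(user, role, query_id):
            raise PermissionError(f"{user} may not terminate query {query_id}")
        del self.running[query_id]
        return True

registry = QueryRegistry()
q1, q2 = registry.start("alice"), registry.start("bob")
print(registry.list_queries("alice", "reader"))  # [1]
print(registry.list_queries("root", "admin"))    # [1, 2]
print(registry.kill("root", "admin", q2))        # True
```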
Telia Zone (Telecommunications): Smart Home / Internet of Things
Enterprise Edition customer since 2016 Q4
Background
• Large Nordic telecom provider
• 1M broadband routers deployed in Sweden
• Half of subscribers are over 55 years old
• Each household connects 10 devices
• Goal: improve customer experience
Business Problem
• Broadband router enhancement to improve customer experience
• Context-based in-home services
• How to build a smart home platform that allows vendors to build new “home-centric” apps
Solution and Benefits
• New features deployed to 1M homes
• API-based platform for easy apps that:
  • Automatically assemble Spotify playlists based on who is in the house
  • Notify parents when children get home
  • Build smart shopping lists
Airbnb Dataportal (Travel Technology): Knowledge Graph, Metadata Management
Community Edition users since 2017
Background
• SF-based C2C rental platform
• Dataportal democratizes data access for a growing number of employees while improving discoverability and trust
• Data was strewn everywhere: in silos and segmented departments, with nothing universally accessible
Business Problem
• Data-driven culture hampered by the variety and dependability of data, tribal knowledge and word-of-mouth distribution
• Needed visibility into information usage, context, lineage and popularity across a company of 3,000+ employees
Solution and Benefits
• Offers search with context & metadata, plus user- and team-centric pages for origin & lineage
• Nodes are resources: data tables, dashboards, reports, users, teams, business outcomes, etc.
• Relationships reflect consumption, production, association, etc.
• Neo4j, Elasticsearch, Python
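Modeling resources as nodes and consumption or production as relationships turns a lineage question into a graph traversal. A sketch under stated assumptions: the resources and the `PRODUCES`/`CONSUMES`/`OWNS` relationship types below are hypothetical, not Airbnb's actual schema. Walking against the direction of data flow finds everything a dashboard depends on.

```python
from collections import deque

# Hypothetical metadata graph: (start, relationship type, end).
EDGES = [
    ("ingest_job", "PRODUCES", "raw_bookings"),
    ("clean_job", "CONSUMES", "raw_bookings"),
    ("clean_job", "PRODUCES", "bookings"),
    ("revenue_dashboard", "CONSUMES", "bookings"),
    ("alice", "OWNS", "revenue_dashboard"),
]

def upstream(resource):
    """Breadth-first walk listing everything `resource` depends on, nearest first."""
    # A consumer depends on what it consumes; a produced resource depends on its producer.
    deps = {}
    for start, rel_type, end in EDGES:
        if rel_type == "CONSUMES":
            deps.setdefault(start, []).append(end)
        elif rel_type == "PRODUCES":
            deps.setdefault(end, []).append(start)
    seen, queue, order = {resource}, deque([resource]), []
    while queue:
        for dep in deps.get(queue.popleft(), []):
            if dep not in seen:
                seen.add(dep)
                order.append(dep)
                queue.append(dep)
    return order

print(upstream("revenue_dashboard"))
# ['bookings', 'clean_job', 'raw_bookings', 'ingest_job']
```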
“Graph analysis is possibly the single most
effective competitive differentiator for
organizations pursuing data-driven operations
and decisions after the design of data capture.”
“In a recent Forrester survey, 51% of global data
and analytics technology decision makers either
are implementing, have already implemented, or
are upgrading their graph databases."
“We expect the graph database market to grow
significantly as organizations look to new
approaches in dealing with silos of data.”
Source: Vendor Landscape: Graph Databases, October 6, 2017
“By the end of 2018, 70% of leading organizations
will have one or more pilot or proof-of-concept
efforts underway utilizing graph databases.”
Source: Making Big Data Normal with Graph Analysis for the Masses, 2015
Source: IT Market Clock for Database Management Systems, 2014

GraphTour - Neo4j Database Overview
