A whirlwind tour of graph databases

GraphDB
Whirlwind Tour
Michael Hunger
Code Days - OOP

(Michael Hunger)-[:WORKS_FOR]->(Neo4j)
michael@neo4j.com | @mesirii | github.com/jexp | jexp.de/blog
Michael Hunger - Head of Developer Relations @Neo4j

• Why Graphs?
• Use Cases
• Data Model
• Querying
• Neo4j

Why Graphs?
Because the World is a Graph!

Everything and Everyone is Connected
• people, places, events
• companies, markets
• countries, history, politics
• sciences, art, teaching
• technology, networks, machines, applications, users
• software, code, dependencies, architecture, deployments
• criminals, fraudsters and their behavior

Value from Data Relationships
Common Use Cases
Internal Applications
• Master Data Management
• Network and IT Operations
• Fraud Detection
Customer-Facing Applications
• Real-Time Recommendations
• Graph-Based Search
• Identity and Access Management

The Rise of Connections in Data
• Networks of People: e.g., employees, customers, suppliers, partners, influencers
• Business Processes: e.g., risk management, supply chain, payments
• Knowledge Networks: e.g., enterprise content, domain-specific content, eCommerce content
Data connections are increasing as rapidly as data volumes

Harnessing Connections Drives Business Value
Enhanced Decision Making
Hyper Personalization
Massive Data Integration
Data Driven Discovery & Innovation
Product Recommendations
Personalized Health Care
Media and Advertising
Fraud Prevention
Network Analysis
Law Enforcement
Drug Discovery
Intelligence and Crime Detection
Product & Process Innovation
360 view of customer
Compliance
Optimize Operations
Connected Data at the Center
AI & Machine Learning
Price optimization
Product Recommendations
Resource allocation
Digital Transformation Megatrends

Newcomers in the last 3 years
• DSE Graph
• Agens Graph
• IBM Graph
• JanusGraph
• Tibco GraphDB
• Microsoft CosmosDB
• TigerGraph
• MemGraph
• AWS Neptune
• SAP HANA Graph

Database Technology Architectures
Right Tool for the Job
• Discrete Data: Relational DBMS, Other NoSQL
• Connected Data: Graph DB

The impact of Graphs
How Graphs are changing the World

Cancer Research - Candiolo Cancer Institute
“Our application relies on complex hierarchical data, which required a more flexible model than the one provided by the traditional relational database model,” said Andrea Bertotti, MD
neo4j.com/case-studies/candiolo-cancer-institute-ircc/

Graph Databases in Healthcare and Life Sciences
14 Presenters from all around Europe on:
• Genome
• Proteome
• Human Pathway
• Reactome
• SNP
• Drug Discovery
• Metabolic Symbols
• ...
neo4j.com/blog/neo4j-life-sciences-healthcare-workshop-berlin/

Common Graph Technology Use Cases
• Real-Time Recommendations
• Fraud Detection
• Network & IT Operations
• Master Data Management
• Knowledge Graph
• Identity & Access Management
Airbnb

Handling Large Graph Workloads for Enterprises
Real-time promotion recommendations
• Record “Cyber Monday” sales
• About 35M daily transactions
• Each transaction is 3-22 hops
• Queries executed in 4ms or less
• Replaced IBM WebSphere Commerce
Marriott’s Real-time Pricing Engine
• 300M pricing operations per day
• 10x transaction throughput on half the hardware compared to Oracle
• Replaced Oracle database
Handling Package Routing in Real-Time
• Large postal service with over 500k employees
• Neo4j routes 7M+ packages daily at peak, with peaks of 5,000+ routing operations per second

Software
Financial Services
Telecom
Retail & Consumer Goods
Media & Entertainment
Other Industries
Airbus

Machine Learning is Based on Graphs

The Property Graph
Model, Import, Query

The Whiteboard Model Is the Physical Model
• Eliminates graph-to-relational mapping
• In your data model: bridge the gap between business and IT models
• In your application: greatly reduce the need for application code

Property Graph Model Components
Nodes
• The objects in the graph
• Can have name-value properties
• Can be labeled
Relationships
• Relate nodes by type and direction
• Can have name-value properties
Example graph
• PERSON (name: “Dan”, born: May 29, 1970, twitter: “@dan”)
• PERSON (name: “Ann”, born: Dec 5, 1975)
• CAR (brand: “Volvo”, model: “V70”)
• Relationships: LOVES, LOVES, LIVES WITH; one relationship carries the property since: Jan 10, 2011
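The components above can be sketched in a few lines of Python. This is only an illustration of the model, not how Neo4j stores data: the `Node` and `Relationship` classes and the `outgoing` helper are hypothetical, and the `DRIVES` relationship type and the direction of `LIVES_WITH` are assumptions, since the slide text does not name them.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node
    type: str
    end: Node
    properties: dict = field(default_factory=dict)


# Nodes: labeled objects with name-value properties.
dan = Node({"Person"}, {"name": "Dan", "born": "May 29, 1970", "twitter": "@dan"})
ann = Node({"Person"}, {"name": "Ann", "born": "Dec 5, 1975"})
car = Node({"Car"}, {"brand": "Volvo", "model": "V70"})

# Relationships: typed, directed, optionally carrying properties.
relationships = [
    Relationship(dan, "LOVES", ann),
    Relationship(ann, "LOVES", dan),
    Relationship(dan, "LIVES_WITH", ann),                         # direction assumed
    Relationship(dan, "DRIVES", car, {"since": "Jan 10, 2011"}),  # type assumed
]


def outgoing(node, rel_type):
    """Return the end nodes of relationships of rel_type that start at node."""
    return [r.end for r in relationships if r.start is node and r.type == rel_type]


print([n.properties["name"] for n in outgoing(dan, "LOVES")])
```

Because relationships are first-class records with their own type, direction and properties, a question like "whom does Dan love?" is a filter over relationships rather than a join.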

Cypher: Powerful and Expressive Query Language
MATCH (:Person {name: "Dan"})-[:LOVES]->(:Person {name: "Ann"})
Each NODE pattern, e.g. (:Person {name: "Dan"}), combines a LABEL (Person) with a PROPERTY (name); -[:LOVES]-> is the relationship from Dan to Ann.

Relational Versus Graph Models
Relational Model Graph Model
KNOWS
ANDREAS
TOBIAS
MICA
DELIA
Person Friend Person-Friend
ANDREAS
DELIA
TOBIAS
MICA

Our starting point – Northwind ER

Building Relationships in Graphs
ORDERED
Customer OrderOrder

(FKs)-[:BECOME]->(Relationships) & Correct Directions

Simple Join Tables Become Relationships

Attributed Join Tables Become Relationships with Properties
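A minimal sketch of these mapping rules, assuming a few made-up Northwind-style rows (the IDs, the `Quantity` column and the `to_relationships` helper are hypothetical): a foreign key turns into a relationship, and a join-table row turns into a relationship whose remaining columns become properties.

```python
# Hypothetical sample rows, loosely following the Northwind tables.
orders = [{"OrderID": 10643, "CustomerID": "ALFKI"}]
order_details = [{"OrderID": 10643, "ProductID": 48, "Quantity": 5}]


def to_relationships(orders, order_details):
    """Translate FK columns and join-table rows into (start, TYPE, end, props)."""
    rels = []
    # Foreign key: orders.CustomerID becomes (Customer)-[:ORDERED]->(Order).
    for o in orders:
        rels.append((("Customer", o["CustomerID"]), "ORDERED",
                     ("Order", o["OrderID"]), {}))
    # Attributed join table: each row becomes (Order)-[:INCLUDES]->(Product),
    # and the non-key columns become relationship properties.
    for od in order_details:
        props = {k: v for k, v in od.items() if k not in ("OrderID", "ProductID")}
        rels.append((("Order", od["OrderID"]), "INCLUDES",
                     ("Product", od["ProductID"]), props))
    return rels


for rel in to_relationships(orders, order_details):
    print(rel)
```

Note that the relationship direction is chosen to read naturally (customer ORDERED order), not to mirror which table held the key.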

(:You)-[:QUERY]->(:Data)
in a graph

You all know SQL
SELECT distinct c.CompanyName
FROM customers AS c
JOIN orders AS o
ON (c.CustomerID = o.CustomerID)
JOIN order_details AS od
ON (o.OrderID = od.OrderID)
JOIN products AS p
ON (od.ProductID = p.ProductID)
WHERE p.ProductName = 'Chocolat'

Apache Tinkerpop 3.3.x - Gremlin
g = graph.traversal();
g.V().hasLabel('Product')
.has('productName','Chocolat')
.in('INCLUDES')
.in('ORDERED')
.values('companyName').dedup();

W3C SPARQL
PREFIX sales_db: <http://sales.northwind.com/>
SELECT DISTINCT ?company_name WHERE {
  ?c sales_db:CompanyName ?company_name .
  ?c sales_db:ORDERED ?o .
  ?o sales_db:ITEMS ?od .
  ?od sales_db:INCLUDES ?p .
  ?p sales_db:ProductName "Chocolat" .
}

openCypher
MATCH (c:Customer)-[:ORDERED]->(o)
      -[:INCLUDES]->(p:Product)
WHERE p.productName = 'Chocolat'
RETURN DISTINCT c.companyName
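To make the traversal behind this pattern concrete, here is a rough Python equivalent over an in-memory edge list. The sample data and the helper names `incoming` and `customers_who_ordered` are made up for illustration; a real database would follow stored relationships instead of scanning a list.

```python
# Hypothetical sample data: (start, TYPE, end) triples.
EDGES = [
    ("Alfreds", "ORDERED", "order-1"),
    ("Bon app", "ORDERED", "order-2"),
    ("Alfreds", "ORDERED", "order-3"),
    ("order-1", "INCLUDES", "Chocolat"),
    ("order-2", "INCLUDES", "Tofu"),
    ("order-3", "INCLUDES", "Chocolat"),
]


def incoming(node, rel_type):
    """Start nodes of all rel_type relationships that end at node."""
    return [start for (start, rtype, end) in EDGES if rtype == rel_type and end == node]


def customers_who_ordered(product):
    """Walk product <-INCLUDES- order <-ORDERED- customer, then de-duplicate."""
    companies = set()
    for order in incoming(product, "INCLUDES"):
        companies.update(incoming(order, "ORDERED"))
    return sorted(companies)


print(customers_who_ordered("Chocolat"))  # ['Alfreds']
```

The set plays the role of DISTINCT: Alfreds ordered Chocolat twice but is returned once.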

Basic Pattern: Customer's Orders?
MATCH (:Customer {customerName: "Delicatessen"})-[:ORDERED]->(order:Order) RETURN order
(:Customer {customerName: "Delicatessen"}) is a NODE with a LABEL and a PROPERTY; -[:ORDERED]-> is the REL; (order:Order) is a NODE with a VAR and a LABEL.

Basic Query: Customer's Orders?
MATCH (c:Customer)-[:ORDERED]->(order)
WHERE c.customerName = 'Delicatessen'
RETURN *

Basic Query: Customer's Frequent Purchases?
MATCH (c:Customer)-[:ORDERED]->
()-[:INCLUDES]->(p:Product)
WHERE c.customerName = 'Delicatessen'
RETURN p.productName, count(*) AS freq
ORDER BY freq DESC LIMIT 10;

openCypher - Recommendation
MATCH
(c:Customer)-[:ORDERED]->(o1)-[:INCLUDES]->(p),
(peer)-[:ORDERED]->(o2)-[:INCLUDES]->(p),
(peer)-[:ORDERED]->(o3)-[:INCLUDES]->(reco)
WHERE c.customerId = $customerId
AND NOT (c)-[:ORDERED]->()-[:INCLUDES]->(reco)
RETURN reco.productName, count(*) AS freq
ORDER BY freq DESC LIMIT 10
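The idea of this recommendation query (find peers who bought something I bought, then rank what else they bought) can be sketched in plain Python. The purchase data and the `recommend` function are hypothetical, and the counting is simplified: it counts one hit per peer order containing the product, whereas Cypher's count(*) counts every matching path.

```python
from collections import Counter

# Hypothetical purchase history: customer id -> list of orders (product names).
ORDERS = {
    "c1": [["Chocolat", "Tofu"]],
    "c2": [["Chocolat"], ["Chai", "Tofu"]],
    "c3": [["Chocolat", "Chai"], ["Ikura"]],
    "c4": [["Konbu"]],
}


def recommend(customer_id, limit=10):
    """Rank products bought by peers who share a product with the customer."""
    mine = {p for order in ORDERS[customer_id] for p in order}
    freq = Counter()
    for peer, peer_orders in ORDERS.items():
        if peer == customer_id:
            continue
        peer_products = {p for order in peer_orders for p in order}
        if not mine & peer_products:
            continue  # not a peer: no product in common
        for order in peer_orders:
            for product in order:
                if product not in mine:  # mirrors the NOT (c)-[...]->(reco) filter
                    freq[product] += 1
    return freq.most_common(limit)


print(recommend("c1"))  # [('Chai', 2), ('Ikura', 1)]
```

Customer c4 shares nothing with c1, so Konbu never shows up as a recommendation.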

Product Cross-Sell
MATCH
  (:Product {productName: 'Chocolat'})<-[:INCLUDES]-(:Order)
  <-[:SOLD]-(employee)-[:SOLD]->(o2)-[:INCLUDES]->(cross:Product)
RETURN
  employee.firstName, cross.productName,
  count(DISTINCT o2) AS freq
ORDER BY freq DESC LIMIT 5;

openCypher...
...is a community effort to evolve Cypher, and to make it the most useful language for querying property graphs
openCypher implementations
SAP HANA Graph, Redis, Agens Graph, Cypher.PL, Neo4j

github.com/opencypher Language Artifacts
● Cypher 9 specification
● ANTLR and EBNF Grammars
● Formal Semantics (SIGMOD)
● TCK (Cucumber test suite)
● Style Guide
Implementations & Code
● openCypher for Apache Spark
● openCypher for Gremlin
● open source frontend (parser)
● ...

Cypher 10
● Next version of Cypher
● Actively working on natural language specification
● New features
○ Subqueries
○ Multiple graphs
○ Path patterns
○ Configurable pattern matching semantics

Extending Neo4j - User Defined Procedures & Functions
[Diagram: Applications connect via Bolt to the Neo4j Execution Engine, which calls User Defined Procedures and User Defined Functions]
User Defined Procedures & Functions let you write custom code that is:
• Written in any JVM language
• Deployed to the Database
• Accessed by applications via Cypher

Procedure Examples
Built-In
• Metadata Information
• Index Management
• Security
• Cluster Information
• Query Listing & Cancellation
• ...
Libraries
• APOC (std library)
• Spatial
• RDF (neosemantics)
• NLP
• ...
neo4j.com/developer/procedures-functions

Example: Data(base) Integration

Graph Analytics
Neo4j Graph Algorithms

The Impact of Connected Data
“Graph analysis is possibly the single most effective competitive differentiator for organizations pursuing data-driven operations and decisions”

Existing Options (so far)
• Data Processing
  • Spark with GraphX, Flink with Gelly
  • Gremlin Graph Computer
• Dedicated Graph Processing
  • Urika, GraphLab, Giraph, Mosaic, GPS, Signal-Collect, Gradoop
• Data Scientist Toolkit
  • igraph, NetworkX, Boost in Python, R, C

Goal: Iterate Quickly
• Combine data from sources into one graph
• Project to relevant subgraphs
• Enrich data with algorithms
• Traverse, collect, filter, aggregate with queries
• Visualize, Explore, Decide, Export
• From all APIs and Tools

Usage
1. Call as Cypher procedure
2. Pass in specification (Label, Prop, Query) and configuration
3. The ~.stream variant returns (a lot of) results
CALL algo.<name>.stream('Label','TYPE',{conf})
YIELD nodeId, score
4. The non-stream variant writes results to the graph and returns statistics
CALL algo.<name>('Label','TYPE',{conf})
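To give a feel for what a streaming algorithm call yields, here is a small pure-Python PageRank that produces (nodeId, score) pairs one at a time. It is a simplified power iteration written for this illustration, not the implementation behind the procedures above; the function name, the default damping factor and the iteration count are assumptions, and nodes without outgoing relationships are not handled specially.

```python
def pagerank_stream(edges, damping=0.85, iterations=50):
    """Yield (node_id, score) pairs, mimicking a streaming result."""
    nodes = sorted({n for edge in edges for n in edge})
    out_degree = {n: 0 for n in nodes}
    for source, _target in edges:
        out_degree[source] += 1
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Every node keeps a base share, then receives rank from its in-neighbors.
        nxt = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for source, target in edges:
            nxt[target] += damping * score[source] / out_degree[source]
        score = nxt
    for n in nodes:
        yield n, score[n]


# Hypothetical graph: a -> b, c -> b, b -> a
for node_id, node_score in pagerank_stream([("a", "b"), ("c", "b"), ("b", "a")]):
    print(node_id, round(node_score, 3))
```

A generator fits the streaming idea: the caller can consume, filter or stop early without the full result being materialized first, whereas the non-stream variant would store each score on its node and return only summary statistics.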

Cypher Projection
Pass in Cypher statements for node and relationship lists.
CALL algo.<name>(
  'MATCH ... RETURN id(n)',
  'MATCH (n)-->(m)
   RETURN id(n) AS source,
          id(m) AS target', {graph:'cypher'})

Neo4j Fits into Your Environment
[Diagram: the Application and its End Users work against the Graph Database Cluster (Neo4j, Neo4j, Neo4j) for Data Storage and Business Rules Execution; Data Scientists run Ad Hoc Analysis; the Bulk Analytic Infrastructure (Graph Compute Engine, EDW …) handles Data Mining and Aggregation; other Databases: Relational, NoSQL, Hadoop]

Official Language Drivers
• Foundational drivers for popular programming languages
• Bolt: streaming binary wire protocol
• Authoritative mapping to native type system, uniform across drivers
• Pluggable into richer frameworks
Drivers (over Bolt): JavaScript, Java, .NET, Python, PHP, ...

Bolt + Official Language Drivers
http://neo4j.com/developer/ http://neo4j.com/developer/language-guides/

Using Bolt: Official Language Drivers all look the same
With JavaScript
// assumes `neo4j` is the imported neo4j-driver module
var driver = neo4j.driver("bolt://localhost");
var session = driver.session();
var result = session.run("MATCH (u:User) RETURN u.name");

neo4j.com/developer/spring-data-neo4j
Spring Data Neo4j Neo4j OGM
@NodeEntity
public class Talk {
    @Id @GeneratedValue
    Long id;
    String title;
    Slot slot;
    Track track;
    // A Talk is the target of PRESENTS relationships coming from Person nodes
    @Relationship(type = "PRESENTS", direction = Relationship.INCOMING)
    Set<Person> speaker = new HashSet<>();
}

Spring Data Neo4j Neo4j OGM
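interface TalkRepository extends Neo4jRepository<Talk, Long> {
    // The query is split across two concatenated string literals
    @Query("MATCH (t:Talk)<-[rating:RATED]-(user) " +
           "WHERE t.id = {talkId} RETURN rating")
    List<Rating> getRatings(@Param("talkId") Long talkId);

    // Derived finder: the query is generated from the method name
    List<Talk> findByTitleContaining(String title);
}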

github.com/neo4j-contrib/neo4j-spark-connector
Neo4j Spark Connector

github.com/neo4j-contrib/neo4j-jdbc
Neo4j JDBC Driver

Neo4j
THE Graph Database Platform

[Platform diagram: Graph Transactions, Graph Analytics, Data Integration, Development & Admin, Analytics Tooling, Drivers & APIs, Discovery & Visualization. Audiences: Developers, Admins, Applications, Business Users, Data Analysts, Data Scientists]

Neo4j: Graph Platform
Real-time Transactional and Analytic Processing
• Operational workloads
• Analytics workloads
Discovery and Visualization
• Interactive graph exploration
• Graph representation of data
Agility
• Native property graph model
• Dynamic schema
Developer Productivity
• Cypher - Declarative query language
• Procedural language extensions
• Worldwide developer community
Hardware efficiency
• 10x less CPU with index-free adjacency
• 10x less hardware than other platforms
Performance
• Index-free adjacency
• Millions of hops per second

Native Graph Architecture
Index-free adjacency
• Index-free adjacency ensures lightning-fast retrieval of data and relationships
• Unlike other database models, Neo4j connects data as it is stored
Neo4j Query Planner
Cost based Query Planner since Neo4j
• Uses transactional database statistics
• High performance Query Engine
• Bytecode compiled queries
• Future: Parallelism

Architecture Components
• Index-Free Adjacency: in memory and on flash/disk
• ACID Foundation: required for safe writes
• Full-Stack Clustering: causal consistency, security
• Language, Drivers, Tooling: developer experience, graph efficiency
• Graph Engine: cost-based optimizer, graph statistics, Cypher runtime
• Hardware Optimizations: for next-gen infrastructure

Neo4j – allows you to connect the dots
• Was built to efficiently
• store,
• query and
• manage highly connected data
• Transactional, ACID
• Real-time OLTP
• Open source
• Highly scalable on few machines

High Query Performance: Some Numbers
• Traverse 2-4M+ relationships per second per core
• Cost-based query optimizer: complex queries return in milliseconds
• Import 100K-1M records per second transactionally
• Bulk import tens of billions of records in a few hours

How do I get it? Desktop – Container – Cloud
http://neo4j.com/download/
docker run neo4j

Neo4j Cluster Deployment Options
• Developer: Neo4j Desktop (free Enterprise License)
• On premise – Standalone or via OS package
• Containerized with official Docker Image
• In the Cloud
  • AWS, GCE, Azure
• Using Resource Managers
  • DC/OS – Marathon
  • Kubernetes
  • Docker Swarm

Active Community
• Downloads: 10M+ (3M+ from Neo4j Distribution, 7M+ from Docker)
• Events: 400+ (approximate number of Neo4j events per year)
• Meetups: 50k+ (number of Meetup members globally)
• Trained Developers: 50k+ (trained/certified Neo4j professionals)

Summary: Graphs allow you to ...
• Keep your rich data model
• Handle relationships efficiently
• Write queries easily
• Develop applications quickly
• Have fun

Thank You!
Questions?!
@neo4j | neo4j.com
@mesirii | Michael Hunger

Causal Clustering
Core & Replica Servers Causal Consistency

Causal Clustering - Features
• Two Zones – Core + Edge
• Group of Core Servers – Consistent and Partition tolerant (CP)
• Transactional Writes
• Quorum Writes, Cluster Membership, Leader via Raft Consensus
• Scale out with Read Replicas
• Smart Bolt Drivers with
• Routing, Read & Write Sessions
• Causal Consistency with Bookmarks

Replica
• For massive query throughput
• Read-only replicas
• Not involved in Consensus Commit
Core
• Small group of Neo4j databases
• Fault-tolerant Consensus Commit
• Responsible for data safety

Writing to the Core Cluster
Neo4j
Driver
✓
✓
✓
Success
Neo4j
Cluster

Application
Server
Neo4j
Driver
Max
Jim
Jane
Mark

Routed write statements
Driver driver = GraphDatabase.driver( "bolt+routing://aCoreServer" );
try ( Session session = driver.session( AccessMode.WRITE ) )
{
    try ( Transaction tx = session.beginTransaction() )
    {
        // The routing driver sends this write to the current leader
        tx.run( "MERGE (user:User {userId: {userId}})",
                parameters( "userId", userId ) );
        tx.success();
    }
}

Bookmark
• Session token
• String (for portability)
• Opaque to application
• Represents ultimate user’s most recent view of the graph
• More capabilities to come
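The bookmark idea can be sketched as follows: a write returns a token, and a later read presents that token so a replica only answers once it has applied at least that transaction. Everything here is a simplified assumption for illustration: real bookmarks are opaque strings, real drivers wait or route elsewhere instead of raising, and the `Core` and `Replica` classes are hypothetical.

```python
class Core:
    """Stands in for the core cluster: keeps writes in commit order."""

    def __init__(self):
        self.log = []  # committed (key, value) writes

    def write(self, key, value):
        self.log.append((key, value))
        return len(self.log)  # hypothetical bookmark: position in the log


class Replica:
    """Read replica that applies the core's log asynchronously."""

    def __init__(self):
        self.applied = 0
        self.data = {}

    def catch_up(self, core):
        for key, value in core.log[self.applied:]:
            self.data[key] = value
        self.applied = len(core.log)

    def read(self, key, bookmark=0):
        # Refuse to answer from a state older than the caller's bookmark.
        if self.applied < bookmark:
            raise RuntimeError("replica has not applied the bookmarked transaction yet")
        return self.data.get(key)


core, replica = Core(), Replica()
bookmark = core.write("user:1", "Max")
replica.catch_up(core)
print(replica.read("user:1", bookmark=bookmark))  # Max
```

Passing the bookmark along is what lets a user read their own writes even when the read is served by a different server than the write.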

DataMassive High
3.0
Bigger Clusters
Consensus
Commit
Built-in load
balancing
3.1 Causal Clustering

Neo4j 3.0: High Availability Cluster
• Master-Slave architecture
• Paxos consensus used for master election
• Transaction committed once written durably on the master
• Practical deployments: 10s of servers
Neo4j 3.1: Causal Cluster
• Two-part cluster: writeable Core and read-only Read Replicas
• Raft protocol used for leader election, membership changes and commitment of all transactions
• Transaction committed once written durably on a majority of the core members
• Practical deployments: 100s of servers
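The majority rule for commits comes down to simple arithmetic, sketched here with hypothetical helper names. It shows only the counting, not Raft itself.

```python
def majority(core_size):
    """Smallest number of core members that forms a majority."""
    return core_size // 2 + 1


def is_committed(durable_acks, core_size):
    """A transaction counts as committed once a majority has written it durably."""
    return durable_acks >= majority(core_size)


def tolerated_failures(core_size):
    """Core members that can be lost while a majority can still be formed."""
    return core_size - majority(core_size)


for size in (3, 5, 7):
    print(size, majority(size), tolerated_failures(size))
```

This is why core clusters are usually sized with an odd number of members: going from 3 to 4 cores raises the majority from 2 to 3 without tolerating any additional failure.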

Writing to the Core Cluster – Raft Consensus Commits
Neo4j
Driver
✓
✓
✓
Success
Neo4j
Cluster


Flexible Authentication Options
Choose authentication method
• Built-in native users repository
Testing/POC, single-instance deployments
• LDAP connector to Active Directory
or openLDAP
Production deployments
• Custom auth provider plugins
Special deployment scenarios
[Diagram: Neo4j authenticates against Built-in Native Users, against Active Directory or openLDAP through the LDAP connector, or against a Custom Plugin through the Auth Plugin Extension Module]

Flexible Authentication Options
LDAP Group to Role Mapping
dbms.security.ldap.authorization.group_to_role_mapping=\
    "CN=Neo4j Read Only,OU=groups,DC=example,DC=com" = reader; \
    "CN=Neo4j Read-Write,OU=groups,DC=example,DC=com" = publisher; \
    "CN=Neo4j Schema Manager,OU=groups,DC=example,DC=com" = architect; \
    "CN=Neo4j Administrator,OU=groups,DC=example,DC=com" = admin; \
    "CN=Neo4j Procedures,OU=groups,DC=example,DC=com" = allowed_role
./conf/neo4j.conf
BASE DN: DC=example,DC=com
  OU=people
    CN=Bob Smith
    CN=Carl Junior
  OU=groups (map to Neo4j permissions)
    CN=Neo4j Read Only
    CN=Neo4j Read-Write
    CN=Neo4j Schema Manager
    CN=Neo4j Administrator
    CN=Neo4j Procedures
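How such a group-to-role mapping resolves can be sketched in Python. The parser below only mirrors the shape of the setting shown above (semicolon-separated `"group DN" = role` pairs); it is not Neo4j's own parsing code, and the `parse_mapping` and `roles_for` helpers are hypothetical.

```python
GROUP_TO_ROLE_MAPPING = (
    '"CN=Neo4j Read Only,OU=groups,DC=example,DC=com" = reader; '
    '"CN=Neo4j Read-Write,OU=groups,DC=example,DC=com" = publisher; '
    '"CN=Neo4j Schema Manager,OU=groups,DC=example,DC=com" = architect; '
    '"CN=Neo4j Administrator,OU=groups,DC=example,DC=com" = admin; '
    '"CN=Neo4j Procedures,OU=groups,DC=example,DC=com" = allowed_role'
)


def parse_mapping(text):
    """Turn '"group DN" = role; ...' into a {group DN: role} dictionary."""
    mapping = {}
    for entry in text.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        # The DN itself contains '=' signs, so split on the last one only.
        group, _, role = entry.rpartition("=")
        mapping[group.strip().strip('"')] = role.strip()
    return mapping


def roles_for(user_groups, mapping):
    """Roles granted to a user, given the LDAP groups the user belongs to."""
    return sorted({mapping[g] for g in user_groups if g in mapping})


mapping = parse_mapping(GROUP_TO_ROLE_MAPPING)
print(roles_for(["CN=Neo4j Read-Write,OU=groups,DC=example,DC=com"], mapping))  # ['publisher']
```

A user such as CN=Bob Smith gets Neo4j roles only through group membership, so access is managed in the directory rather than in the database.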

Case Study: Knowledge Graphs at eBay

Bags

Men’s Backpack
Handbag

Case study: Solving real-time recommendations for the world’s largest retailer
Challenge
• In its drive to provide the best web experience for its customers, Walmart wanted to optimize its online recommendations.
• Walmart recognized the challenge it faced in delivering recommendations with traditional relational database technology.
Use of Neo4j
• Walmart uses Neo4j to quickly query customers’ past purchases, as well as instantly capture any new interests shown in the customers’ current online visit – essential for making real-time recommendations.
Result/Outcome
• With Neo4j, Walmart could replace a heavy batch process with a simple and real-time graph database.
“As the current market leader in graph databases, and with enterprise features for scalability and availability, Neo4j is the right choice to meet our demands”.
- Marcos Vada, Walmart

Case study: eBay Now Tackles eCommerce Delivery Service Routing with Neo4j
Challenge
• The queries used to select the best courier for eBay’s routing system were simply taking too long, and eBay needed a solution to maintain a competitive service.
• The MySQL joins being used created a code base too slow and complex to maintain.
Use of Neo4j
• eBay is now using Neo4j’s graph database platform to redefine e-commerce, by making delivery of online and mobile orders quick and convenient.
Result/Outcome
• With Neo4j, eBay managed to eliminate the biggest roadblock between retailers and online shoppers: the option to have your item delivered the same day.
• The schema-flexible nature of the database allowed easy extensibility, speeding up development.
• The Neo4j solution was more than 1000x faster than the prior MySQL solution.
“Our Neo4j solution is literally thousands of times faster than the prior MySQL solution, with queries that require 10-100 times less code.”
– Volker Pacher, eBay

Top Tier US Retailer
Case study: Solving real-time promotions for a top US retailer
Challenge
• Suffered significant revenue loss due to legacy infrastructure.
• Particularly challenging when handling transaction volumes on peak shopping occasions such as Thanksgiving and Cyber Monday.
Use of Neo4j
• Neo4j is used to revolutionize and reinvent its real-time promotions engine.
• On average, Neo4j processes 90% of this retailer’s 35M+ daily transactions, each 3-22 hops, in 4ms or less.
Result/Outcome
• Reached an all-time high in online revenues, due to the Neo4j-based friction-free solution.
• Neo4j also enabled the company to be one of the first retailers to provide the same promotions across both online and traditional retail channels.
“On an average Neo4j processes 90% of this retailer’s 35M+ daily transactions, each 3-22 hops, in 4ms or less.”
– Top Tier US Retailer

Relational DBs Can’t Handle Relationships Well
• Cannot model or store data and relationships without complexity
• Performance degrades with number and levels of relationships, and database size
• Query complexity grows with need for JOINs
• Adding new types of data and relationships requires schema redesign, increasing time to market
… making traditional databases inappropriate when data relationships are valuable in real-time
Slow development
Poor performance
Low scalability
Hard to maintain

Unlocking Value from Your Data Relationships
• Model your data as a graph of data and relationships
• Use relationship information in real-time to transform your business
• Add new relationships on the fly to adapt to your changing business

MATCH (sub)-[:REPORTS_TO*0..3]->(boss),
(report)-[:REPORTS_TO*1..3]->(sub)
WHERE boss.name = "Andrew K."
RETURN sub.name AS Subordinate,
count(report) AS Total
Express Complex Queries Easily with Cypher
Find all direct reports and how
many people they manage,
up to 3 levels down
Cypher Query
SQL Query
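What the variable-length pattern in the Cypher query computes can be spelled out in Python. This is a sketch over a made-up reporting chain (the names other than "Andrew K." and the helpers `chain` and `team_sizes` are hypothetical); it assumes every person reports to at most one manager, so each path is unique.

```python
# Hypothetical reporting lines: employee -> manager (REPORTS_TO).
REPORTS_TO = {"Bob": "Andrew K.", "Carol": "Andrew K.", "Dave": "Bob", "Eve": "Dave"}


def chain(person, max_hops):
    """Yield the managers reached by following REPORTS_TO up to max_hops times."""
    hops = 0
    while person in REPORTS_TO and hops < max_hops:
        person = REPORTS_TO[person]
        hops += 1
        yield person


def team_sizes(boss):
    """For the boss and everyone up to 3 levels below, count reports up to 3 levels down."""
    people = set(REPORTS_TO) | set(REPORTS_TO.values())
    result = {}
    for sub in people:
        # (sub)-[:REPORTS_TO*0..3]->(boss): zero hops means sub is the boss.
        if sub != boss and boss not in chain(sub, 3):
            continue
        # (report)-[:REPORTS_TO*1..3]->(sub)
        total = sum(1 for report in people if sub in chain(report, 3))
        if total:  # like MATCH, people without any reports produce no row
            result[sub] = total
    return result


print(team_sizes("Andrew K."))
```

The nested loops and hop counters here are what the single `*0..3` and `*1..3` patterns express declaratively.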
