Neo4j 20 minutes introduction

THE REAL VALUE IS IN THE RELATIONSHIPS
• Google: Knowledge Graph
• Facebook: Unicorn
• Twitter: FlockDB
• ....

WHAT IS THE PROBLEM WITH RDBMS? (PART 1)
The base question of all recommendation systems:
“User 99 has bought products 1, 2, 3 and 765 so far. List the other products that other users bought together with products 1, 2, 3 or 765, in descending order of popularity.”
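For comparison, here is a rough sketch of how this question might be phrased as a graph query in Cypher, Neo4j's query language. The User and Product labels, the BOUGHT relationship type and the id property are assumptions made for illustration; the slide does not define a data model.

```cypher
// Assumed model: (:User {id})-[:BOUGHT]->(:Product {id})
// Products 1, 2, 3 and 765 are reached through user 99's BOUGHT relationships.
MATCH (me:User {id: 99})-[:BOUGHT]->(mine:Product)<-[:BOUGHT]-(other:User),
      (other)-[:BOUGHT]->(rec:Product)
// skip products user 99 already owns
WHERE NOT (me)-[:BOUGHT]->(rec)
// popularity: number of distinct other users who bought the product
RETURN rec.id AS product, count(DISTINCT other) AS popularity
ORDER BY popularity DESC;
```

The equivalent SQL typically needs several self-joins on the purchase table, which is the pain point this slide is hinting at.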

WHAT IS THE PROBLEM WITH RDBMS? (PART 2)
“Who are Bob’s friends-of-friends-of-friends?”
“What is the shortest path between two specific friends?”
...?
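The first question can be sketched in Cypher with a variable-length pattern. This assumes a hypothetical model in which Person nodes carry a name property and are linked by FRIEND relationships; none of these names come from the slide.

```cypher
// Assumed model: (:Person {name})-[:FRIEND]-(:Person)
// People reachable from Bob over a path of exactly three FRIEND relationships.
// Such a person may also be a closer friend of Bob via a shorter path.
MATCH (bob:Person {name: "Bob"})-[:FRIEND*3]-(fofof:Person)
WHERE fofof <> bob
RETURN DISTINCT fofof.name;
```

In a relational schema each extra level of friendship usually means one more self-join on the friendship table, whereas here only the number after the asterisk changes.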

BASICS: WHAT IS A GRAPH?
• Origin: Euler, 18th century
• It contains nodes and relationships.
• Nodes contain properties (key-value pairs).
• Nodes can be labeled with one or more labels.
• Relationships are named and directed, and always have a start and end node.
• Relationships can also contain properties.
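The building blocks listed above map directly onto Cypher syntax. The following is a minimal sketch; the Customer label and the age and since properties are made-up examples, not part of the original deck.

```cypher
// Two nodes, each with one or more labels and key-value properties
CREATE (joe:Person:Customer {name: "Joe", age: 34})
CREATE (bob:Person {name: "Bob"})
// A named, directed relationship (start node joe, end node bob)
// that carries a property of its own
CREATE (joe)-[:FRIEND {since: 2010}]->(bob);
```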

GRAPH DATABASES ON THE MARKET
• Non-native storage: data stored in a general-purpose DB
• Native processing: index-free adjacency

SOCIAL NETWORK SPEED TEST
1 000 000 people each with approximately 50 friends:

USE CASES *
• Fraud Detection
• Graph-Based Search
• Identity and Access Management
• Master Data Management
• Network and IT Operations
• Real-Time Recommendations
• Social Network
* Detailed examples from Neo4j

DATA MODELING
Relational database:
• concept -> logical model -> physical model
• big gap between concept and DB
• structure and data volume determine query speed
• hard to change schema
Graph database:
• concept directly to DB
• no gap between concept and DB
• query speed largely independent of structure and total data volume
• easy to change connections

CYPHER – GRAPH DATABASE QUERY LANGUAGE
[Diagram: two Person nodes, one with Name: Joe and one with Name: Bob, connected by a FRIEND relationship from Joe to Bob]
(:Person {name:"Joe"})-[:FRIEND]->(:Person {name:"Bob"})
• Other query languages: SPARQL, Gremlin ...
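• Case sensitive for labels, relationship types and property names; keywords are case insensitive
• The most human-friendly of these languages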

CREATING SOME TEST DATA IN CYPHER
// creating nodes
create(:Person{name:"Tom Hanks"});
....
// creating relation between two specific nodes
match (a:Person),(b:Movie)
where
a.name='Ron Howard'
and b.title = 'The Da Vinci Code'
create (a)-[r:DIRECTED]->(b) return r;
....
// set relation property
match (:Person{name:"Tom Hanks"})-[n:KNOWS]->(:Person{name:"Ron Howard"})
set n.since=1987;
....
// delete relation
match (a)-[r:KNOWS]->(b)
where
a.name='Matt Damon'
and b.name='Matt Damon'
delete r;

QUERYING DATA IN CYPHER
// whom does Tom Hanks know?
match (:Person{name:"Tom Hanks"})-[r:KNOWS]->(b) return b;
// who knows Steven Spielberg?
match (:Person{name:"Steven Spielberg"})<-[:KNOWS]-(b) return b;
// which films has Tom Hanks acted in?
match (:Person{name:"Tom Hanks"})-[:ACTED_IN]-(b) return b;
// delete by id
match (n) where ID(n)=11 delete n;
// get Steven Spielberg's acquaintances 3 levels deep
match (:Person{name:"Steven Spielberg"})
-[:KNOWS]-(b)
-[:KNOWS]-(c)
-[:KNOWS]-(d)
return b, c, d

A BIGGER EXAMPLE
MATCH (tom:Person {name:"Tom Hanks"})
-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors)
RETURN tom, m, coActors
Tom Hanks’ co-actors:

FINDING THE SHORTEST PATH
MATCH p=shortestPath(
(kevin:Person {name:"Kevin Bacon"})-[*]-(meg:Person {name:"Meg Ryan"})
)
RETURN p
The shortest path between Kevin Bacon and Meg Ryan:

RECOMMENDING CO-ACTORS TO TOM HANKS
MATCH
// coActors: acted in the same movies as Tom
// cocoActors: acted in movies with the coActors, but never in a movie with Tom
(tom:Person {name:"Tom Hanks"})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors),
(coActors)-[:ACTED_IN]->(m2)<-[:ACTED_IN]-(cocoActors)
WHERE NOT (tom)-[:ACTED_IN]->()<-[:ACTED_IN]-(cocoActors)
AND tom <> cocoActors
RETURN
cocoActors.name AS Recommended,
// strength: how many times the same cocoActor was found
count(*) AS Strength ORDER BY Strength DESC
Find co-actors who haven't worked with Tom Hanks (co-co-actors):
[Diagram: Tom -ACTED_IN-> m (movie) <-ACTED_IN- coActor -ACTED_IN-> m2 (movie) <-ACTED_IN- cocoActor]

NEO4J CLUSTER ARCHITECTURE
• Automatic master election
• Writing to slaves is possible, but writing to the master is faster
• Full replication (data redundancy); graph sharding is under development
• Single server capacity: 34 billion nodes, 34 billion relationships, 65 thousand relationship types and 68 billion properties
• Cluster requires a quorum in order to serve write load
• Reads are served by slaves, so read capacity scales linearly
• Exceptionally high write loads: queuing and vertical scaling
• Large graph that does not fit in RAM: cache sharding by routing queries
• Online backups supported (full and incremental)
• Reporting instances are slaves that will never be elected to be master

DEVELOPMENT
Query tuning:
• execution plan
• profiling
Indexing on properties
Accessing:
• web interface
• REST API
• shell
• embedding in Java applications
• Mazerunner extension (Using Apache Spark and Neo4j for Big Data Graph Analytics)
Utilities
• neo4j-shell
• neo4j-import
• neo4j-backup
• neo4j-arbiter
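The query tuning and indexing items above might look like this in practice. This is a sketch that assumes the Person and Movie data used elsewhere in this deck; the index syntax shown is the one used by Neo4j 2.x and 3.x, which this deck appears to target.

```cypher
// Index the name property of Person nodes.
// Neo4j 2.x/3.x syntax; newer releases use:
//   CREATE INDEX FOR (p:Person) ON (p.name)
CREATE INDEX ON :Person(name);

// EXPLAIN shows the execution plan without running the query
EXPLAIN MATCH (p:Person {name: "Tom Hanks"}) RETURN p;

// PROFILE runs the query and reports rows and database hits per operator
PROFILE MATCH (p:Person {name: "Tom Hanks"})-[:ACTED_IN]->(m) RETURN m.title;
```

Comparing the PROFILE output before and after creating the index is a simple way to check whether the lookup by name actually uses it.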

RESOURCES
Good official manual
From Relational to Graph: A Developer's Guide
