Design Decisions
Declarative
Most of the time, Neo4j knows better than you
Imperative Declarative
follow relationship specify starting point
breadth-first vs depth-first specify desired outcome
explicit algorithm algorithm adaptable
based on query
MATCH
SELECT skills.*
FROM users
JOIN skills ON users.id = skills.user_id
WHERE users.id = 101
START user = node(101)
MATCH user --> skills
RETURN skills
Optional MATCH
SELECT skills.*
FROM users
LEFT JOIN skills ON users.id = skills.user_id
WHERE users.id = 101
START user = node(101)
MATCH user –[?]-> skills
RETURN skills
SELECT skills.*, user_skill.*
FROM users
JOIN user_skill ON users.id = user_skill.user_id
JOIN skills ON user_skill.skill_id = skill.id WHERE
users.id = 1
START user = node(1)
MATCH user -[user_skill]-> skill
RETURN skill, user_skill
Indexes
Used as multiple starting points, not to speed
up any traversals
START a = node:nodes_index(type='User') MATCH
a-[r:knows]-b
RETURN ID(a), ID(b), r.weight
Complicated Match
Some UGLY recursive self join on the groups
table
START max=node:person(name=“Max")
MATCH group <-[:BELONGS_TO*]- max
RETURN group
Where
SELECT person.*
FROM person
WHERE person.age >32
OR person.hair = "bald"
START person = node:persons("name:*") WHERE
person.age >32
OR person.hair = "bald"
RETURN person
Return
SELECT person.name, count(*)
FROM Person
GROUP BY person.name
ORDER BY person.name
START person=node:persons("name:*") RETURN
person.name, count(*)
ORDER BY person.name
Order By, Parameters
Same as SQL
{node_id} expected as part of request
START me = node({node_id})
MATCH (me)-[?:follows]->(friends)-[?:follows]->(fof)-[?:follows]->(fofof)-
[?:follows]->others
RETURN me.name, friends.name, fof.name, fofof.name, count(others)
ORDER BY friends.name, fof.name, fofof.name, count(others) DESC
Graph Functions
Some UGLY multiple recursive self and inner joins on
the user and all related tables
START lucy=node(1000), kevin=node(759) MATCH p
= shortestPath( lucy-[*]-kevin ) RETURN p
Aggregate Functions
ID: get the neo4j assigned identifier
Count: add up the number of occurrences
Min: get the lowest value
Max: get the highest value
Avg: get the average of a numeric value
Distinct: remove duplicates
START me = node:nodes_index(type = 'user')
MATCH (me)-[r?:wrote]-()
RETURN ID(me), me.name, count(r), min(r.date), max(r.date)" ORDER
BY ID(me)
Functions
Collect: put all values in a list
START a = node:nodes_index(type='User')
MATCH a-[:follows]->b
RETURN a.name, collect(b.name)
Combine Functions
Collect the ID of friends
START me = node:nodes_index(type = 'user')"
MATCH (me)<-[r?:wrote]-(friends)
RETURN ID(me), me.name, collect(ID(friends)), collect(r.date)
ORDER BY ID(me)
Uses
Six Degrees of Kevin Bacon
Length: counts the number of nodes along a path
Extract: gets the nodes/relationships from a path
START me=node({start_node_id}),
them=node({destination_node_id})
MATCH path = allShortestPaths( me-[?*]->them )
RETURN length(path),
extract(person in nodes(path) : person.name)
Uses
Similar Users
Users who rated same items within 2 points.
Abs: gets absolute numeric value
START me = node(user1)
MATCH (me)-[myRating:RATED]->(i)<-[otherRating:RATED]-(u)
WHERE abs(myRating.rating-otherRating.rating)<=2
RETURN u
Boolean Operations
Items with a rating > 7 that similar users rated, but I have not
And: this and that are true
Or: this or that is true
Not: this is false
START me=node(user1),
similarUsers=node(3) (result received in the first query)
MATCH (similarUsers)-[r:RATED]->(item)
WHERE r.rating > 7 AND NOT((me)-[:RATED]->(item))
RETURN item
http://thought-bytes.blogspot.com/2012/02/similarity-based-recommendation
Predicates
ALL: closure is true for all items
ANY: closure is true for any item
NONE: closure is true for no items
SINGLE: closure is true for exactly 1 item
START london = node(1), moscow = node(2)
MATCH path = london -[*]-> moscow
WHERE all(city in nodes(path) where
city.capital = true)
Design Decisions
Parsed, not an internal DSL
Execution Semantics Serialisation
Type System Portability
Design Decisions
Database vs Application
Design Goal: single user
interaction expressible as
single query
Queries have enough logic to
find required data, not enough
to process it
Implementation
• Recursive matching with backtracking
START x=... MATCH x-->y, x-->z, y-->z, z-->a-->b, z-->b
Implementation
Execution Plan
start n=node(0) Cypher is Pipes
return n
lazily evaluated
Parameters() pulling from pipes underneath
Nodes(n)
Extract([n])
ColumnFilter([n])
Implementation
Execution Plan
start n=node(0)
match n-[*]-> b
return n.name, n, count(*)
order by n.age
Parameters()
Nodes(n)
PatternMatch(n-[*]->b)
Extract([n.name, n])
EagerAggregation( keys: [n.name, n], aggregates: [count(*)])
Extract([n.age])
Sort(n.age ASC)
ColumnFilter([n.name,n,count(*)])
Implementation
Execution Plan
start n=node(0)
match n-[*]-> b
return n.name, n, count(*)
order by n.name
Parameters()
Nodes(n)
PatternMatch(n-[*]->b)
Extract([n.name, n])
Sort(n.name ASC,n ASC)
EagerAgregation( keys: [n.name, n], aggregates: [count(*)])
ColumnFilter([n.name,n,count(*)])
There existed a number of different ways to query a graph database. This one aims to make querying easy, and to produce queries that are readable. We looked at alternatives - SPARQL, SQL, Gremlin and other...