Cypher Query Language
    Chicago Graph Database Meet-Up
             Max De Marzi
What is Cypher?


• Graph Query Language for Neo4j
• Aims to make querying simple
Why Cypher?


  • Existing Neo4j query mechanisms were not
    simple enough

   • Too verbose (Java API)
   • Too prescriptive (Gremlin)
SQL?


  • Unable to express paths
    • these are crucial for graph-based
       reasoning

  • Neo4j is schema/table free
SPARQL?

  • SPARQL designed for a different data
    model

   • namespaces
   • properties as nodes
   • steep learning curve
Design
Design Decisions

  Declarative
  Most of the time, Neo4j knows better than you how to traverse the graph

   Imperative                       Declarative
   follow relationship              specify starting point
   breadth-first vs depth-first     specify desired outcome
   explicit algorithm               algorithm adaptable based on query
Design Decisions
 Pattern matching


                       A


                   B       C
Design Decisions
 ASCII-art patterns




           () --> ()
Design Decisions
 Directed relationship


             A           B

        (A) --> (B)
Design Decisions
 Undirected relationship


             A             B

         (A) -- (B)
Design Decisions
 specific relationships

                   LOVES
             A             B

          A -[:LOVES]-> B
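The three patterns above differ only in direction and relationship type. Below is a minimal Python sketch of what each one binds. It assumes a toy in-memory list of `(start, type, end)` tuples. `match_pair` is a hypothetical helper, not part of Neo4j or Cypher.

```python
from typing import List, Optional, Tuple

# Hypothetical toy store: each relationship is (start_node, type, end_node).
Rel = Tuple[str, str, str]


def match_pair(rels: List[Rel], directed: bool = True,
               rel_type: Optional[str] = None) -> List[Tuple[str, str]]:
    """(A, B) bindings for (A) --> (B), (A) -- (B) or A -[:TYPE]-> B."""
    bindings = []
    for start, typ, end in rels:
        if rel_type is not None and typ != rel_type:
            continue  # -[:TYPE]-> only accepts relationships of that type
        bindings.append((start, end))
        if not directed:
            # (A) -- (B) ignores direction, so B --> A also matches
            bindings.append((end, start))
    return bindings


rels = [("A", "LOVES", "B"), ("B", "KNOWS", "C")]
print(match_pair(rels))                    # (A) --> (B)
print(match_pair(rels, directed=False))    # (A) -- (B)
print(match_pair(rels, rel_type="LOVES"))  # A -[:LOVES]-> B
```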
Design Decisions
 Joined paths


      A            B   C

     A --> B --> C
Design Decisions
 multiple paths

                       A


                   B       C

     A --> B --> C, A --> C
      A --> B --> C <-- A
Design Decisions
 Variable length paths
             A           B

         A                   B

  A                              B
             ...
         A -[*]-> B
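Below is a rough Python model of `A -[*]-> B`. It assumes a toy list of `(start, end)` relationships, one or more hops, and relationship uniqueness (a relationship is not reused within one match). `var_length_paths` is a hypothetical helper that enumerates every qualifying path by depth-first search. It is not how Neo4j traverses the graph internally.

```python
from typing import Dict, List, Set, Tuple


def var_length_paths(rels: List[Tuple[str, str]], start: str,
                     end: str) -> List[List[str]]:
    """Every directed path start -[*]-> end that never reuses a relationship."""
    outgoing: Dict[str, List[Tuple[int, str]]] = {}
    for rel_id, (a, b) in enumerate(rels):
        outgoing.setdefault(a, []).append((rel_id, b))
    found: List[List[str]] = []

    def walk(node: str, path: List[str], used: Set[int]) -> None:
        if node == end and len(path) > 1:  # [*] means one or more hops
            found.append(list(path))
        for rel_id, nxt in outgoing.get(node, []):
            if rel_id in used:
                continue  # relationship uniqueness keeps cycles finite
            used.add(rel_id)
            path.append(nxt)
            walk(nxt, path, used)
            path.pop()
            used.remove(rel_id)

    walk(start, [start], set())
    return found


rels = [("A", "B"), ("A", "X"), ("X", "B"), ("X", "Y"), ("Y", "B")]
for p in var_length_paths(rels, "A", "B"):
    print(" --> ".join(p))
```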
Design Decisions
 Optional relationships


             A            B

         A -[?]-> B
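The optional marker behaves like an outer join. Below is a small Python sketch of that behaviour. `optional_match` and its data are hypothetical, and `None` stands in for Cypher's null.

```python
from typing import List, Optional, Tuple


def optional_match(starts: List[str], rels: List[Tuple[str, str]]
                   ) -> List[Tuple[str, Optional[str]]]:
    """Rows for A -[?]-> B: a start node with no outgoing relationship still
    yields a row, with B bound to None (Cypher's null)."""
    rows: List[Tuple[str, Optional[str]]] = []
    for a in starts:
        targets = [b for (s, b) in rels if s == a]
        if targets:
            rows.extend((a, b) for b in targets)
        else:
            rows.append((a, None))  # like a SQL LEFT JOIN with no match
    return rows


print(optional_match(["A", "C"], [("A", "B")]))  # [('A', 'B'), ('C', None)]
```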
Design Decisions
 Familiar for SQL users


                SQL            Cypher
                select         start
                from           match
                where          where
                group by       return
                order by
START
SELECT *
FROM Person
WHERE firstName = 'Max'


START max=node:persons(firstName = "Max")
RETURN max
MATCH
SELECT skills.*
FROM users
JOIN skills ON users.id = skills.user_id
WHERE users.id = 101

START user = node(101)
MATCH user --> skills
RETURN skills
Optional MATCH
SELECT skills.*
FROM users
LEFT JOIN skills ON users.id = skills.user_id
WHERE users.id = 101

START user = node(101)
MATCH user -[?]-> skills
RETURN skills
SELECT skills.*, user_skill.*
FROM users
JOIN user_skill ON users.id = user_skill.user_id
JOIN skills ON user_skill.skill_id = skills.id
WHERE users.id = 1
START user = node(1)
MATCH user -[user_skill]-> skill
RETURN skill, user_skill
Indexes

Used to look up starting points (possibly many), not to speed
up traversals


START a = node:nodes_index(type='User')
MATCH a-[r:knows]-b
RETURN ID(a), ID(b), r.weight
http://maxdemarzi.com/2012/03/16/jung-in-neo4j-par
Complicated Match

Some UGLY recursive self join on the groups
table


START max=node:person(name="Max")
MATCH group <-[:BELONGS_TO*]- max
RETURN group
Where
SELECT person.*
FROM person
WHERE person.age >32
 OR person.hair = "bald"

START person = node:persons("name:*")
WHERE person.age > 32
   OR person.hair = "bald"
RETURN person
Return
SELECT person.name, count(*)
FROM Person
GROUP BY person.name
ORDER BY person.name


START person=node:persons("name:*")
RETURN person.name, count(*)
ORDER BY person.name
Order By, Parameters
Same as SQL

{node_id} is a parameter supplied as part of the request


START me = node({node_id})
MATCH (me)-[?:follows]->(friends)-[?:follows]->(fof)-[?:follows]->(fofof)-[?:follows]->others
RETURN me.name, friends.name, fof.name, fofof.name, count(others)
ORDER BY friends.name, fof.name, fofof.name, count(others) DESC
http://maxdemarzi.com/2012/02/13/visualizing-a-netw
Graph Functions

Some UGLY multiple recursive self and inner joins on
the user and all related tables



START lucy=node(1000), kevin=node(759)
MATCH p = shortestPath( lucy-[*]-kevin )
RETURN p
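`shortestPath` replaces the recursive joins with a single function call. Breadth-first search is the textbook way to find one shortest path in an unweighted graph. The Python sketch below uses that technique on a toy undirected edge list with made-up names. It does not claim to mirror Neo4j's internal algorithm.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


def shortest_path(rels: List[Tuple[str, str]], src: str,
                  dst: str) -> Optional[List[str]]:
    """Breadth-first search for one shortest src -[*]- dst path (either direction)."""
    neighbours: Dict[str, List[str]] = {}
    for a, b in rels:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)  # -[*]- ignores direction
    parent: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path: List[str] = []
            cur: Optional[str] = node
            while cur is not None:  # walk parent links back to src
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for nxt in neighbours.get(node, []):
            if nxt not in parent:  # first visit is along a shortest route
                parent[nxt] = node
                queue.append(nxt)
    return None  # unreachable: the query would return no row


rels = [("lucy", "tom"), ("tom", "kevin"),
        ("lucy", "ann"), ("ann", "bob"), ("bob", "kevin")]
print(shortest_path(rels, "lucy", "kevin"))  # ['lucy', 'tom', 'kevin']
```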
Aggregate Functions
ID: get the Neo4j-assigned identifier
Count: add up the number of occurrences
Min: get the lowest value
Max: get the highest value
Avg: get the average of a numeric value
Distinct: remove duplicates

START me = node:nodes_index(type = 'user')
MATCH (me)-[r?:wrote]-()
RETURN ID(me), me.name, count(r), min(r.date), max(r.date)
ORDER BY ID(me)
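Here is the same aggregation in plain Python. It assumes a hypothetical list of users and a list of `(user_id, date)` pairs standing in for the `wrote` relationships. Dates are stored as ISO-8601 strings so that `min` and `max` compare chronologically.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, str, int, Optional[str], Optional[str]]


def writes_summary(users: List[Tuple[int, str]],
                   wrote: List[Tuple[int, str]]) -> List[Row]:
    """One row per user: (id, name, count, earliest date, latest date)."""
    dates: Dict[int, List[str]] = {uid: [] for uid, _ in users}
    for uid, date in wrote:
        dates[uid].append(date)
    rows: List[Row] = []
    for uid, name in sorted(users):  # ORDER BY ID(me)
        ds = dates[uid]
        # With -[r?:wrote]-, a user with no relationships keeps a row:
        # count(r) is 0 and min/max over no values are null (None).
        rows.append((uid, name, len(ds),
                     min(ds) if ds else None, max(ds) if ds else None))
    return rows


users = [(2, "Ann"), (1, "Max")]
wrote = [(1, "2012-03-01"), (1, "2012-01-15")]
for row in writes_summary(users, wrote):
    print(row)
```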
Functions

Collect: put all values in a list



START a = node:nodes_index(type='User')
MATCH a-[:follows]->b
RETURN a.name, collect(b.name)
http://maxdemarzi.com/2012/02/02/graph-visualizatio
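`collect` groups rows by the non-aggregated columns and gathers the remaining values into a list. Below is a minimal Python equivalent over made-up `(follower, followed)` name pairs. `collect_follows` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def collect_follows(follows: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """a.name is the grouping key; collect(b.name) gathers every b into a list."""
    rows: Dict[str, List[str]] = {}
    for a, b in follows:
        rows.setdefault(a, []).append(b)
    return rows


follows = [("Max", "Ann"), ("Max", "Bob"), ("Ann", "Bob")]
print(collect_follows(follows))  # {'Max': ['Ann', 'Bob'], 'Ann': ['Bob']}
```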
Combine Functions

Collect the ID of friends



START me = node:nodes_index(type = 'user')
MATCH (me)<-[r?:wrote]-(friends)
RETURN ID(me), me.name, collect(ID(friends)), collect(r.date)
ORDER BY ID(me)
http://maxdemarzi.com/2012/03/08/connections-in-time/
Uses
Recommend Friends

START me = node({node_id})
MATCH (me)-[:friends]->(friend)-[:friends]->(foaf)
RETURN foaf.name
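Below is a literal Python reading of this two-hop pattern. It assumes a toy list of directed `friends` relationships with made-up names and does not reuse a relationship within one match. The query as written returns one row per path, so the same person can appear more than once. `me` can also come back through a friendship that points back.

```python
from typing import List, Tuple


def friends_of_friends(friends: List[Tuple[str, str]], me: str) -> List[str]:
    """(me)-[:friends]->(friend)-[:friends]->(foaf), RETURN foaf.name."""
    rows: List[str] = []
    for i, (a, friend) in enumerate(friends):
        if a != me:
            continue
        for j, (b, foaf) in enumerate(friends):
            if j != i and b == friend:  # second hop must be a different relationship
                rows.append(foaf)  # one row per matching path, duplicates kept
    return rows


friends = [("Max", "Ann"), ("Max", "Bob"), ("Ann", "Cat"),
           ("Bob", "Cat"), ("Ann", "Max")]
print(friends_of_friends(friends, "Max"))  # ['Cat', 'Max', 'Cat']
```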
Uses
Six Degrees of Kevin Bacon

Length: counts the number of relationships along a path
Extract: applies an expression to each element of a collection, e.g. the nodes of a path


START me=node({start_node_id}),
     them=node({destination_node_id})
MATCH path = allShortestPaths( me-[?*]->them )
RETURN length(path),
        extract(person in nodes(path) : person.name)
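Kevin Bacon-style queries need every shortest path, not just one. Below is a Python sketch using breadth-first search that records every equally short predecessor, run on a toy directed edge list with made-up names. It ignores the `?` marker and is not Neo4j's implementation. The length printed is the number of relationships, and the node list stands in for `extract(...)`.

```python
from collections import deque
from typing import Dict, List, Tuple


def all_shortest_paths(rels: List[Tuple[str, str]], src: str,
                       dst: str) -> List[List[str]]:
    """Every minimum-length directed path src -[*]-> dst."""
    outgoing: Dict[str, List[str]] = {}
    for a, b in rels:
        outgoing.setdefault(a, []).append(b)
    dist: Dict[str, int] = {src: 0}
    parents: Dict[str, List[str]] = {src: []}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in outgoing.get(node, []):
            if nxt not in dist:  # first time reached: shortest distance known
                dist[nxt] = dist[node] + 1
                parents[nxt] = [node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                parents[nxt].append(node)  # another equally short way in
    if dst not in dist:
        return []

    def unwind(node: str) -> List[List[str]]:
        if node == src:
            return [[src]]
        return [p + [node] for par in parents[node] for p in unwind(par)]

    return unwind(dst)


rels = [("me", "x"), ("me", "y"), ("x", "them"), ("y", "them"),
        ("me", "z"), ("z", "w"), ("w", "them")]
for path in all_shortest_paths(rels, "me", "them"):
    print(len(path) - 1, path)  # relationships along the path, then its nodes
```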
Uses
Similar Users

Users who rated the same items within 2 points of my rating.

Abs: gets absolute numeric value


START me = node(user1)
MATCH (me)-[myRating:RATED]->(i)<-[otherRating:RATED]-(u)
WHERE abs(myRating.rating-otherRating.rating)<=2
RETURN u
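Here is the same filter in Python. It assumes ratings are held in a hypothetical `user -> {item: rating}` dictionary. `RETURN u` emits one row per shared item, but this sketch returns a set.

```python
from typing import Dict, Set

Ratings = Dict[str, Dict[str, int]]  # user -> {item: rating}


def similar_users(ratings: Ratings, me: str, max_diff: int = 2) -> Set[str]:
    """Users who rated an item I rated, within max_diff points of my rating."""
    found: Set[str] = set()
    for item, mine in ratings[me].items():
        for user, theirs in ratings.items():
            if user == me or item not in theirs:
                continue
            if abs(mine - theirs[item]) <= max_diff:  # WHERE abs(...) <= 2
                found.add(user)
    return found


ratings: Ratings = {"me": {"film1": 8, "film2": 3},
                    "ann": {"film1": 9},
                    "bob": {"film1": 4, "film2": 2},
                    "cat": {"film3": 8}}
print(sorted(similar_users(ratings, "me")))  # ['ann', 'bob']
```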
Boolean Operations
Items rated above 7 by similar users that I have not rated
And: this and that are true
Or: this or that is true
Not: this is false

similarUsers is node 3, the result received from the first query:
START me=node(user1),
       similarUsers=node(3)
MATCH (similarUsers)-[r:RATED]->(item)
WHERE r.rating > 7 AND NOT((me)-[:RATED]->(item)) 
RETURN item



http://thought-bytes.blogspot.com/2012/02/similarity-based-recommendation
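Below is a Python sketch of this second query. It reuses the same hypothetical `user -> {item: rating}` layout and takes the similar users from the first step as an input list.

```python
from typing import Dict, Iterable, Set

Ratings = Dict[str, Dict[str, int]]  # user -> {item: rating}


def recommend(ratings: Ratings, me: str, similar: Iterable[str],
              min_rating: int = 7) -> Set[str]:
    """Items a similar user rated above min_rating that I have not rated."""
    mine = set(ratings.get(me, {}))
    return {item
            for user in similar
            for item, score in ratings.get(user, {}).items()
            if score > min_rating     # r.rating > 7
            and item not in mine}     # AND NOT((me)-[:RATED]->(item))


ratings: Ratings = {"me": {"film1": 8},
                    "ann": {"film1": 9, "film2": 8, "film3": 5}}
print(recommend(ratings, "me", ["ann"]))  # {'film2'}
```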
Predicates
ALL: the predicate is true for all items
ANY: the predicate is true for at least one item
NONE: the predicate is true for no items
SINGLE: the predicate is true for exactly one item


START london = node(1), moscow = node(2)
MATCH path = london -[*]-> moscow
WHERE all(city in nodes(path) where city.capital = true)
RETURN path
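The four predicates map directly onto Python's `all` and `any`. Below is a sketch over a made-up path of city records. `all_`, `any_`, `none_` and `single` are hypothetical names, and Cypher's handling of null is not modelled.

```python
from typing import Any, Callable, Dict, List

City = Dict[str, Any]
Pred = Callable[[City], bool]


def all_(items: List[City], pred: Pred) -> bool:
    return all(pred(x) for x in items)  # true for every item


def any_(items: List[City], pred: Pred) -> bool:
    return any(pred(x) for x in items)  # true for at least one item


def none_(items: List[City], pred: Pred) -> bool:
    return not any(pred(x) for x in items)  # true for no item


def single(items: List[City], pred: Pred) -> bool:
    return sum(1 for x in items if pred(x)) == 1  # true for exactly one item


def is_capital(city: City) -> bool:
    return city["capital"]


path = [{"name": "London", "capital": True},
        {"name": "Hamburg", "capital": False},
        {"name": "Moscow", "capital": True}]
print(all_(path, is_capital), any_(path, is_capital),
      none_(path, is_capital), single(path, lambda c: not c["capital"]))
# False True False True
```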
Design Decisions
 Parsed, not an internal DSL



    • Execution Semantics
    • Serialisation
    • Type System
    • Portability
Design Decisions
 Database vs Application
  Design goal: a single user interaction should be expressible
  as a single query

  Queries have enough logic to find the required data, but not
  enough to process it
Implementation
        • Recursive matching with backtracking
START x=... MATCH x-->y, x-->z, y-->z, z-->a-->b, z-->b
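Below is a naive Python sketch of recursive matching with backtracking for the pattern above. It assumes integer node ids, a toy edge list and relationship uniqueness within a match. Each step binds one pattern relationship, and the step is undone on a dead end. Here every step scans all relationships; a real matcher would be smarter about which ones it tries.

```python
from typing import Dict, List, Set, Tuple

# Pattern from the slide: x-->y, x-->z, y-->z, z-->a-->b, z-->b
PATTERN: List[Tuple[str, str]] = [("x", "y"), ("x", "z"), ("y", "z"),
                                  ("z", "a"), ("a", "b"), ("z", "b")]


def match(pattern: List[Tuple[str, str]], rels: List[Tuple[int, int]],
          start: Dict[str, int]) -> List[Dict[str, int]]:
    """Bind one pattern relationship at a time; undo the step on a dead end."""
    results: List[Dict[str, int]] = []
    binding = dict(start)  # START x=... fixes some identifiers up front

    def step(i: int, used: Set[int]) -> None:
        if i == len(pattern):
            results.append(dict(binding))  # every relationship matched
            return
        left, right = pattern[i]
        for rel_id, (a, b) in enumerate(rels):
            if rel_id in used:
                continue  # each graph relationship used once per match
            added: List[str] = []
            ok = True
            for var, node in ((left, a), (right, b)):
                if var in binding:
                    ok = ok and binding[var] == node
                else:
                    binding[var] = node
                    added.append(var)
            if ok:
                used.add(rel_id)
                step(i + 1, used)
                used.remove(rel_id)
            for var in added:  # backtrack: undo this step's bindings
                del binding[var]

    step(0, set())
    return results


rels = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (3, 5)]
print(match(PATTERN, rels, {"x": 1}))
# [{'x': 1, 'y': 2, 'z': 3, 'a': 4, 'b': 5}]
```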
Implementation

 Execution Plan
start n=node(0)
return n

Parameters()
Nodes(n)
Extract([n])
ColumnFilter([n])

Cypher is Pipes: lazily evaluated, pulling from the pipes underneath
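Python generators are a natural way to model pipes that pull rows lazily. The sketch below is a hypothetical model of the four pipes for a variant query, `start n=node(0,1,2) return n.name`. The extra ids and the `name` values are made up. It shows that building the plan does no work, and each `next` pulls exactly one row through.

```python
from typing import Any, Callable, Dict, Iterator, List

Row = Dict[str, Any]


def parameters() -> Iterator[Row]:
    yield {}  # one empty row for later pipes to extend


def nodes(source: Iterator[Row], name: str, ids: List[int],
          log: List[str]) -> Iterator[Row]:
    for row in source:
        for node_id in ids:
            log.append(f"load node {node_id}")  # record when work actually happens
            yield {**row, name: {"id": node_id, "name": f"node{node_id}"}}


def extract(source: Iterator[Row],
            exprs: Dict[str, Callable[[Row], Any]]) -> Iterator[Row]:
    for row in source:
        yield {**row, **{key: fn(row) for key, fn in exprs.items()}}


def column_filter(source: Iterator[Row], keep: List[str]) -> Iterator[Row]:
    for row in source:
        yield {key: row[key] for key in keep}


log: List[str] = []
plan = column_filter(
    extract(nodes(parameters(), "n", [0, 1, 2], log),
            {"n.name": lambda row: row["n"]["name"]}),
    ["n.name"])
print(log)         # [] : building the pipes does no work
print(next(plan))  # {'n.name': 'node0'}
print(log)         # ['load node 0'] : only one row was pulled through
```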
Implementation

 Execution Plan
start n=node(0)
match n-[*]-> b
return n.name, n, count(*)
order by n.age

Parameters()
Nodes(n)
PatternMatch(n-[*]->b)
Extract([n.name, n])
EagerAggregation( keys: [n.name, n], aggregates: [count(*)])
Extract([n.age])
Sort(n.age ASC)
ColumnFilter([n.name,n,count(*)])
Implementation

 Execution Plan
start n=node(0)
match n-[*]-> b
return n.name, n, count(*)
order by n.name


Parameters()
Nodes(n)
PatternMatch(n-[*]->b)
Extract([n.name, n])
Sort(n.name ASC,n ASC)
EagerAggregation( keys: [n.name, n], aggregates: [count(*)])
ColumnFilter([n.name,n,count(*)])
Thanks for Listening!
  Questions?

maxdemarzi.com

Editor's Notes

  • #3 There existed a number of different ways to query a graph database. This one aims to make querying easy, and to produce queries that are readable. We looked at alternatives - SPARQL, SQL, Gremlin and other...