Title: Neo4j: The World's Leading Graph DB
Speaker: George Eleftheriadis (https://gr.linkedin.com/in/george-eleftheriadis-4526ba51/)
Date: Monday, April 18, 2016
Event: https://meetup.com/Athens-Big-Data/events/229812890/
3rd Athens Big Data Meetup - 2nd Talk - Neo4j: The World's Leading Graph DB
1. The world's leading graph DB
Georgios Eleftheriadis
Software/Database Engineer
2. What is NOSQL?
It’s not “No to SQL”
It’s not “Never SQL”
It’s “Not Only SQL”, since NOSQL databases may also support SQL-like query languages
NOSQL describes an ongoing trend in which developers increasingly opt for non-relational
databases to solve their problems, in an effort to use the right tool for the right job.
NOSQL example databases
Document Oriented (CouchDB, MongoDB)
Key-Value (Memcached, Redis)
Graph Database (Neo4j, InfiniteGraph)
Multi-model (ArangoDB, OrientDB)
3. Graphs are everywhere
Relationships in
Politics, Economics, History, Science, Transportation
Biology, Chemistry, Physics, Sociology
Body, Ecosphere, Reaction, Interactions
Internet
Hardware, Software, Interaction
Social Networks
Family, Friends
Work, Communities
Neighbors, Cities, Society
4. A sample social graph
Database             # persons    Query time
Relational database  1,000        2000 ms
Neo4j                1,000        2 ms
Neo4j                1,000,000    2 ms
with ~1,000 persons
average 50 friends per person
pathExists(a,b) limited to depth 4
caches warmed up to eliminate disk I/O
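The pathExists(a,b) check above can be illustrated outside any database. What follows is a minimal Python sketch, not Neo4j's implementation: it assumes the friendship graph is held in memory as a dictionary of adjacency sets, and the names path_exists, friends and max_depth are made up for this illustration. It runs a breadth-first search that refuses to expand beyond 4 hops.

```python
from collections import deque

def path_exists(friends, a, b, max_depth=4):
    """Return True if b is reachable from a in at most max_depth hops."""
    if a == b:
        return True
    seen = {a}
    queue = deque([(a, 0)])  # (person, hops from a)
    while queue:
        person, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for friend in friends.get(person, ()):
            if friend == b:
                return True
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, depth + 1))
    return False

# Hypothetical toy graph: a chain of friendships p0 - p1 - ... - p6.
friends = {f"p{i}": set() for i in range(7)}
for i in range(6):
    friends[f"p{i}"].add(f"p{i + 1}")
    friends[f"p{i + 1}"].add(f"p{i}")

print(path_exists(friends, "p0", "p4"))  # p4 is 4 hops away
print(path_exists(friends, "p0", "p5"))  # p5 is 5 hops away
```

The work done depends on how many people sit within 4 hops of the start, not on the total number of persons stored.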
7. SQL VS Cypher
MATCH (keanu:Person { name: 'Keanu Reeves' })-[:ACTED_IN]->(movie:Movie),
(director:Person)-[:DIRECTED]->(movie)
RETURN director.name, count(*)
ORDER BY count(*) DESC
SELECT director.name, count(*) FROM person AS keanu
JOIN acted_in ON keanu.id = acted_in.person_id
JOIN directed ON acted_in.movie_id = directed.movie_id
JOIN person AS director ON directed.person_id = director.id
WHERE keanu.name = 'Keanu Reeves'
GROUP BY director.name ORDER BY count(*) DESC
Now let’s find out a bit about the directors in movies that Keanu Reeves acted in. We want to
know how many of those movies each of them directed.
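To try the relational version of this question without a database server, the SQL can be run against an in-memory SQLite database from Python. This is only a sketch under assumptions: the table and column names follow the query on this slide, the movie table and the handful of sample rows are added purely for illustration, and SQLite stands in for whatever relational database the comparison had in mind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person   (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE movie    (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE acted_in (person_id INTEGER, movie_id INTEGER);
CREATE TABLE directed (person_id INTEGER, movie_id INTEGER);

-- Illustrative sample rows only.
INSERT INTO person VALUES (1, 'Keanu Reeves'), (2, 'Lana Wachowski'),
                          (3, 'Lilly Wachowski'), (4, 'Jan de Bont');
INSERT INTO movie VALUES (10, 'The Matrix'), (11, 'The Matrix Reloaded'),
                         (12, 'Speed');
INSERT INTO acted_in VALUES (1, 10), (1, 11), (1, 12);
INSERT INTO directed VALUES (2, 10), (3, 10), (2, 11), (3, 11), (4, 12);
""")

# Same shape as the SQL on the slide: three joins to reach the directors.
query = """
SELECT director.name, count(*)
FROM person AS keanu
JOIN acted_in ON keanu.id = acted_in.person_id
JOIN directed ON acted_in.movie_id = directed.movie_id
JOIN person AS director ON directed.person_id = director.id
WHERE keanu.name = 'Keanu Reeves'
GROUP BY director.name
ORDER BY count(*) DESC
"""
rows = conn.execute(query).fetchall()
for name, movie_count in rows:
    print(name, movie_count)
```

Each hop in the Cypher pattern corresponds to one JOIN through a link table here.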
8. Two Ways to Work with Neo4j
Embeddable on JVM
Java
JRuby
Scala
Tomcat
Rails
Server with REST API
every language on the planet
flexible deployment scenarios
DIY server or cloud managed
Embedded capability == Server capability
same scalability, transactionality, and availability
9. Pros & Cons with Neo4j Database
Pros
A good fit when most of the data is connected
High performance when accessing connected data
Little or no impact when the data model changes
Can be mixed with JPA using Spring Data Neo4j
Integrates with Spring
Cons
Cannot reuse existing SQL queries
Migration can be painful
Not effective if the data is not connected
10. Enterprise VS Community Edition
Feature                               Enterprise  Community
Property Graph Model                  YES         YES
Native Graph Processing & Storage     YES         YES
Cypher – Graph Query Language         YES         YES
Language Drivers                      YES         YES
REST & High-Performance Native API    YES         YES
Enterprise Lock Manager               YES         NO
Cache Sharding                        YES         NO
Clustered Replication                 YES         NO
Hot Backups                           YES         NO
Advanced Monitoring                   YES         NO
12. Cypher Query Language
Declarative query language
Describe what you want, not how
Based on pattern matching
13. CQL MATCH
MATCH (a)-->(b)
RETURN a, b;
MATCH (a)-->()
RETURN a.name;
MATCH (n)-[r]->(m)
RETURN n, r, m;
MATCH (a)-[r]->()
RETURN id(a), labels(a), keys(a), type(r);
MATCH (a)-[r:ACTED_IN]->(m)
RETURN a.name, r.roles, m.title;
14. CQL MATCH
MATCH (a)-[:ACTED_IN]->(m)<-[:DIRECTED]-(d)
RETURN a.name, m.title, d.name;
MATCH (a)-[:ACTED_IN]->(m)<-[:DIRECTED]-(d)
RETURN a.name AS actor, m.title AS movie, d.name AS director;
MATCH (a)-[:ACTED_IN]->(m)
OPTIONAL MATCH (d)-[:DIRECTED]->(m)
RETURN a.name, m.title, d.name;
MATCH (a)-[:ACTED_IN]->(m), (d)-[:DIRECTED]->(m)
RETURN a.name, m.title, d.name;
MATCH p=(a)-[:ACTED_IN]->(m)<-[:DIRECTED]-(d)
RETURN nodes(p);
15. CQL MATCH Functions
count(x) - count the number of occurrences
min(x) - get the lowest value
max(x) - get the highest value
avg(x) - get the average of a numeric value
collect(x) - collect all the occurrences into a list
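For readers more at home in a general-purpose language, the five aggregates above have rough Python counterparts. The sketch below is only an analogy, not how Cypher evaluates them; the released list stands in for rows a MATCH might return, and its name and contents are chosen for illustration.

```python
from statistics import mean

# Stand-in for matched rows: (movie title, release year).
released = [("The Matrix", 1999), ("The Matrix Reloaded", 2003), ("Speed", 1994)]
years = [year for _, year in released]

summary = {
    "count": len(released),                       # count(x)
    "min": min(years),                            # min(x)
    "max": max(years),                            # max(x)
    "avg": mean(years),                           # avg(x)
    "collect": [title for title, _ in released],  # collect(x)
}
print(summary)
```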
16. CQL WHERE
MATCH (tom{name:"Tom Hanks"})-[:ACTED_IN]->(movie)
WHERE movie.released < 1992
RETURN DISTINCT movie.title;
MATCH (actor{name:"Keanu Reeves"})-[r:ACTED_IN]->(movie)
WHERE "Neo" IN r.roles
RETURN DISTINCT movie.title;
MATCH (tom{name:"Tom Hanks"})-[:ACTED_IN]->(movie)<-[:ACTED_IN]-(a)
WHERE a.born < tom.born
RETURN DISTINCT a.name, (tom.born - a.born) AS diff;
MATCH (kevin {name:"Kevin Bacon"})-[:ACTED_IN]->(movie)
RETURN DISTINCT movie.title;
MATCH (kevin)-[:ACTED_IN]->(movie)
WHERE kevin.name =~ '.*Kevin.*' // Regular expressions
RETURN DISTINCT movie.title;
17. CQL WHERE
MATCH (gene {name:"Gene Hackman"})-[:ACTED_IN]->(movie)<-[:ACTED_IN]-(n)
WHERE (n)-[:DIRECTED]->()
RETURN DISTINCT n.name;
MATCH (a)-[:ACTED_IN]->()
RETURN a.name, count(*) AS count
ORDER BY count DESC LIMIT 5;
MATCH (keanu {name:"Keanu Reeves"})-[:ACTED_IN]->()<-[:ACTED_IN]-(c),
(c)-[:ACTED_IN]->()<-[:ACTED_IN]-(coc)
WHERE NOT((keanu)-[:ACTED_IN]->()<-[:ACTED_IN]-(coc)) AND coc <> keanu
RETURN coc.name, count(coc)
ORDER BY count(coc) DESC LIMIT 3;
// Recommend 3 actors that Keanu Reeves should work with (but hasn't)
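The idea behind the recommendation query (co-actors of co-actors, minus people already worked with) can be sketched in plain Python. This is an illustration under assumptions, not a translation of the Cypher: the cast dictionary, the recommend function and the placeholder names are hypothetical, and candidates are scored by the number of distinct connecting co-actors, whereas count(coc) in the query counts matched paths.

```python
from collections import Counter

def recommend(cast, actor, limit=3):
    """Rank co-actors of co-actors that `actor` has not worked with yet."""
    def co_actors(person):
        # Everyone who shares at least one movie with `person`.
        return {p for members in cast.values() if person in members
                for p in members if p != person}

    direct = co_actors(actor)
    scores = Counter()
    for c in direct:
        for coc in co_actors(c):
            if coc != actor and coc not in direct:
                scores[coc] += 1  # one more co-actor connects actor to coc
    return scores.most_common(limit)

# Hypothetical cast lists: movie -> set of actors.
cast = {
    "M1": {"Ann", "Bob"},
    "M2": {"Bob", "Cy", "Dee"},
    "M3": {"Ann", "Eve"},
    "M4": {"Eve", "Cy"},
}
print(recommend(cast, "Ann"))
```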
18. CQL CREATE
CREATE ({title:"Mystic River", released:2003});
MATCH (movie {title:"Mystic River"})
SET movie.tagline = "We bury our sins here, Dave. We wash them clean."
RETURN movie;
MATCH (kevin {name:"Kevin Bacon"}),(movie {title:"Mystic River"})
CREATE UNIQUE (kevin)-[:ACTED_IN {roles:["Sean"]}]->(movie);
MATCH (kevin {name:"Kevin Bacon"})-[r:ACTED_IN]->(movie {title:"Mystic River"})
SET r.roles = ["Sean Devine"]
RETURN r.roles;
MATCH (clint {name:"Clint Eastwood"})-[r:ACTED_IN]->(movie {title:"Mystic River"})
CREATE UNIQUE (clint)-[:DIRECTED]->(movie);
19. CQL DELETE
MATCH (matrix {title:"The Matrix"})<-[r:ACTED_IN]-(a)
WHERE "Emil" IN r.roles
RETURN a;
// Emil Eifrem is CEO of Neo Technology and co-founder of the Neo4j project
MATCH (emil{name:"Emil Eifrem"})
DELETE emil;
MATCH (emil{name:"Emil Eifrem"}) -[r]-()
DELETE r;
MATCH (emil{name:"Emil Eifrem"}) -[r]-()
DELETE r, emil;
MATCH (node) WHERE id(node) = 1
OPTIONAL MATCH (node)-[r]-()
DELETE r, node;
20. CQL INDEXES
There is usually no need to specify which indexes to use in a query; Cypher will figure that out by itself. Indexes
are used automatically for equality comparisons and inequality (range) comparisons of an indexed
property in the WHERE clause. The USING clause can influence the planner's decisions when it builds an
execution plan for a query.
CREATE INDEX ON :Person(name)
MATCH (person:Person { name: 'Keanu Reeves' })
RETURN person
MATCH (person:Person)
WHERE person.name = 'Keanu Reeves'
RETURN person
MATCH (person:Person)
WHERE person.name > 'Keanu'
RETURN person
MATCH (person:Person)
WHERE person.name STARTS WITH 'Kea' // CONTAINS, ENDS WITH
RETURN person
23. CQL Do-It-Yourself
Add KNOWS relationships between all actors who were in the same movie
MATCH (a)-[:ACTED_IN]->()<-[:ACTED_IN]-(b)
CREATE UNIQUE (a)-[:KNOWS]->(b);
MATCH (a)-[:ACTED_IN|DIRECTED]->()<-[:ACTED_IN|DIRECTED]-(b)
CREATE UNIQUE (a)-[:KNOWS]->(b);
24. CQL Useful Tricks
Find friends of friends
MATCH (keanu{name:"Keanu Reeves"})-[:KNOWS*2]->(fof)
WHERE NOT((keanu)-[:KNOWS]-(fof))
RETURN DISTINCT fof.name;
Find shortest path
MATCH p=shortestPath(
(charlize{name:"Charlize Theron"})-[:KNOWS*]->(bacon{name:"Kevin Bacon"}))
RETURN length(rels(p));
Return the names of the people joining Charlize to Kevin.
MATCH p=shortestPath(
(charlize{name:"Charlize Theron"})-[:KNOWS*]->(bacon{name:"Kevin Bacon"}))
RETURN extract(n IN nodes(p)| n.name) AS names
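What shortestPath computes on an unweighted graph can be illustrated with a breadth-first search that remembers each node's predecessor. The Python sketch below is an assumption-laden illustration, not Neo4j's algorithm: the knows dictionary models directed KNOWS relationships, and the intermediate names "Actor A" to "Actor C" are placeholders.

```python
from collections import deque

def shortest_path(knows, start, goal):
    """Return the names on one shortest path, or None if goal is unreachable."""
    parent = {start: None}  # node -> node it was first reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk predecessors back to start
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in knows.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

# Hypothetical directed KNOWS relationships.
knows = {
    "Charlize Theron": ["Actor A", "Actor B"],
    "Actor A": ["Actor C"],
    "Actor B": ["Kevin Bacon"],
    "Actor C": ["Kevin Bacon"],
}
names = shortest_path(knows, "Charlize Theron", "Kevin Bacon")
print(names)           # names along the path
print(len(names) - 1)  # number of relationships on the path
```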
25. CQL Useful Tricks
Find movies and actors up to 4 "hops" away from Kevin Bacon
MATCH (bacon:Person {name:"Kevin Bacon"})-[*1..4]-(hollywood)
RETURN DISTINCT hollywood
Find someone to introduce Tom Hanks to Tom Cruise
MATCH (tom:Person {name:"Tom Hanks"})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors),
(coActors)-[:ACTED_IN]->(m2)<-[:ACTED_IN]-(cruise:Person {name:"Tom Cruise"})
RETURN tom, m, coActors, m2, cruise
26. Social Network Database
Relationship Type    Properties
IS_FRIEND            since
LIKES
WROTE_COMMENT
HAS_COMMENT
UPLOADED
SENT_MESSAGE
RECEIVED_MESSAGE
TAGGED_IN
Node Label           Properties
Person               name, email, dob
Photo                caption, date
Status               text, date
Comment              text, date
Message              text, date