3rd Athens Big Data Meetup - 2nd Talk - Neo4j: The World's Leading Graph DB

The world's leading graph DB
Georgios Eleftheriadis
Software/Database Engineer

What is NOSQL?
 It’s not “No to SQL”
 It’s not “Never SQL”
 It’s “Not Only SQL” as they may support SQL-like query languages
NOSQL describes an ongoing trend in which developers increasingly opt for non-relational
databases to help solve their problems, in an effort to use the right tool for the right job.
 NOSQL example databases
 Document Oriented (CouchDB, MongoDB)
 Key-Value (Memcached, Redis)
 Graph Database (Neo4j, InfiniteGraph)
 Multi-model (ArangoDB, OrientDB)

Graphs are everywhere
 Relationships in
 Politics, Economics, History, Science, Transportation
 Biology, Chemistry, Physics, Sociology
 Body, Ecosphere, Reaction, Interactions
 Internet
 Hardware, Software, Interaction
 Social Networks
 Family, Friends
 Work, Communities
 Neighbors, Cities, Society

A sample social graph
Database             # persons   query time
Relational database  1,000       2000ms
Neo4j                1,000       2ms
Neo4j                1,000,000   2ms
 with ~1,000 persons
 average 50 friends per person
 pathExists(a,b) limited to depth 4
 caches warmed up to eliminate disk I/O
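The pathExists(a,b) check from the benchmark can be pictured as a breadth-first search that gives up after four hops. The sketch below is a plain-Java illustration over an in-memory adjacency map and is not Neo4j's implementation; the class name, the method signature and the sample data are assumptions made for this example.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Illustrative only: a depth-limited breadth-first search over an in-memory
// friendship map. Names and data are hypothetical, not taken from Neo4j.
class DepthLimitedPathExists {

    // Returns true if 'to' can be reached from 'from' in at most maxDepth hops.
    static boolean pathExists(Map<String, List<String>> friends,
                              String from, String to, int maxDepth) {
        if (from.equals(to)) {
            return true;
        }
        Set<String> visited = new HashSet<>();
        visited.add(from);
        Queue<String> frontier = new ArrayDeque<>();
        frontier.add(from);
        // Expand one level of friends per iteration, up to maxDepth levels.
        for (int depth = 1; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            Queue<String> next = new ArrayDeque<>();
            for (String person : frontier) {
                for (String friend : friends.getOrDefault(person, List.of())) {
                    if (friend.equals(to)) {
                        return true;
                    }
                    if (visited.add(friend)) {
                        next.add(friend);
                    }
                }
            }
            frontier = next;
        }
        return false;
    }

    public static void main(String[] args) {
        // A small chain: ann - bob - cid - dan - eve - fay
        Map<String, List<String>> friends = Map.of(
                "ann", List.of("bob"),
                "bob", List.of("ann", "cid"),
                "cid", List.of("bob", "dan"),
                "dan", List.of("cid", "eve"),
                "eve", List.of("dan", "fay"),
                "fay", List.of("eve"));
        System.out.println(pathExists(friends, "ann", "eve", 4)); // eve is 4 hops from ann
        System.out.println(pathExists(friends, "ann", "fay", 4)); // fay is 5 hops from ann
    }
}
```

The work done depends on how many persons lie within four hops of the start, not on the total number of persons stored, which is one way to read the flat Neo4j timings in the table.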

SQL VS Cypher
Now let's find out a bit about the directors of the movies that Keanu Reeves acted in. We want to
know how many of those movies each of them directed.
SQL:
SELECT director.name, count(*) FROM person AS keanu
JOIN acted_in ON keanu.id = acted_in.person_id
JOIN directed ON acted_in.movie_id = directed.movie_id
JOIN person AS director ON directed.person_id = director.id
WHERE keanu.name = 'Keanu Reeves'
GROUP BY director.name ORDER BY count(*) DESC
Cypher:
MATCH (keanu:Person { name: 'Keanu Reeves' })-[:ACTED_IN]->(movie:Movie),
(director:Person)-[:DIRECTED]->(movie)
RETURN director.name, count(*)
ORDER BY count(*) DESC

Two Ways to Work with Neo4j
 Embeddable on JVM
 Java
 JRuby
 Scala
 Tomcat
 Rails
 Server with REST API
 every language on the planet
 flexible deployment scenarios
 DIY server or cloud managed
Embedded capability == Server capability
same scalability, transactionality, and availability

Pros & Cons with Neo4j Database
 Pros
 Well suited when most of the data is connected
 High performance when accessing connected data
 Little or no impact when the data model changes
 Can be mixed with JPA through Spring Data Neo4j
 Integrates with Spring
 Cons
 Existing SQL queries cannot be reused
 Migration can be painful
 Not effective if the data is not connected

Enterprise VS Community Edition
Features                             Enterprise   Community
Property Graph Model                 YES          YES
Native Graph Processing & Storage    YES          YES
Cypher – Graph Query Language        YES          YES
Language Drivers                     YES          YES
REST & High-Performance Native API   YES          YES
Enterprise Lock Manager              YES          NO
Cache Sharding                       YES          NO
Clustered Replication                YES          NO
Hot Backups                          YES          NO
Advanced Monitoring                  YES          NO

RDBMS vs Graph DB
ID    Name       Grade
1510  Jordan     9
1689  Gabriel    9
1381  Tiffany    9
1709  Cassandra  9
1101  Haley      10
1782  Andrew     10
1468  Kris       10
1641  Brittany   10
1247  Alexis     11
1316  Austin     11
1911  Gabriel    11
1501  Jessica    11
1304  Jordan     12
1025  John       12
1934  Kyle       12
1661  Logan      12

ID1   ID2
1510  1381
1510  1689
1689  1709
1381  1247
1709  1247
1689  1782
1782  1468
1782  1316
1782  1304
1468  1101
1468  1641
1101  1641
…     …

ID1   ID2
1689  1709
1709  1689
1782  1709
1911  1247
1247  1468
1641  1468
1316  1304
1501  1934
1934  1501
1025  1101

Cypher Query Language
 Declarative query language
 Describe what you want, not how
 Based on pattern matching

CQL MATCH
 MATCH (a)-->(b)
RETURN a, b;
 MATCH (a)-->()
RETURN a.name;
 MATCH (n)-[r]->(m)
RETURN n, r, m;
 MATCH (a)-[r]->()
RETURN id(a), labels(a), keys(a), type(r);
 MATCH (a)-[r:ACTED_IN]->(m)
RETURN a.name, r.roles, m.title;

CQL MATCH
 MATCH (a)-[:ACTED_IN]->(m)<-[:DIRECTED]-(d)
RETURN a.name, m.title, d.name;
 MATCH (a)-[:ACTED_IN]->(m)<-[:DIRECTED]-(d)
RETURN a.name AS actor, m.title AS movie, d.name AS director;
 MATCH (a)-[:ACTED_IN]->(m)
OPTIONAL MATCH (d)-[:DIRECTED]->(m)
RETURN a.name, m.title, d.name;
 MATCH (a)-[:ACTED_IN]->(m), (d)-[:DIRECTED]->(m)
RETURN a.name, m.title, d.name;
 MATCH p=(a)-[:ACTED_IN]->(m)<-[:DIRECTED]-(d)
RETURN nodes(p);

CQL MATCH Functions
 count(x) - add up the number of occurrences
 min(x) - get the lowest value
 max(x) - get the highest value
 avg(x) - get the average of a numeric value
 collect(x) - collect all the occurrences into a list
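For readers more at home in Java, the aggregation functions listed above behave roughly like the stream reductions in the sketch below. It is a plain-Java analogy over a hypothetical list of release years and does not call Neo4j; the class and method names are assumptions made for this example.

```java
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative analogy only: Java stream reductions that mirror
// count(x), min(x), max(x), avg(x) and collect(x).
class AggregationSketch {

    // One pass over the values yields count, min, max and average together.
    static IntSummaryStatistics summarize(List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).summaryStatistics();
    }

    // Like collect(x): gather every occurrence into a list, duplicates included.
    static List<Integer> collect(List<Integer> values) {
        return values.stream().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> released = List.of(1999, 2003, 2003, 2008); // hypothetical values
        IntSummaryStatistics stats = summarize(released);
        System.out.println("count   = " + stats.getCount());
        System.out.println("min     = " + stats.getMin());
        System.out.println("max     = " + stats.getMax());
        System.out.println("avg     = " + stats.getAverage());
        System.out.println("collect = " + collect(released));
    }
}
```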

CQL WHERE
 MATCH (tom{name:"Tom Hanks"})-[:ACTED_IN]->(movie)
WHERE movie.released < 1992
RETURN DISTINCT movie.title;
 MATCH (actor{name:"Keanu Reeves"})-[r:ACTED_IN]->(movie)
WHERE "Neo" IN r.roles
RETURN DISTINCT movie.title;
 MATCH (tom{name:"Tom Hanks"})-[:ACTED_IN]->(movie)<-[:ACTED_IN]-(a)
WHERE a.born < tom.born
RETURN DISTINCT a.name, (tom.born - a.born) AS diff;
 MATCH (kevin {name:"Kevin Bacon"})-[:ACTED_IN]->(movie)
RETURN DISTINCT movie.title;
 MATCH (kevin)-[:ACTED_IN]->(movie)
WHERE kevin.name =~ '.*Kevin.*' // Regular expressions
RETURN DISTINCT movie.title;

CQL WHERE
 MATCH (gene {name:"Gene Hackman"})-[:ACTED_IN]->(movie)<-[:ACTED_IN]-(n)
WHERE (n)-[:DIRECTED]->()
RETURN DISTINCT n.name;
 MATCH (a)-[:ACTED_IN]->()
RETURN a.name, count(*) AS count
ORDER BY count DESC LIMIT 5;
 MATCH (keanu {name:"Keanu Reeves"})-[:ACTED_IN]->()<-[:ACTED_IN]-(c),
(c)-[:ACTED_IN]->()<-[:ACTED_IN]-(coc)
WHERE NOT((keanu)-[:ACTED_IN]->()<-[:ACTED_IN]-(coc)) AND coc <> keanu
RETURN coc.name, count(coc)
ORDER BY count(coc) DESC LIMIT 3;
// Recommend 3 actors that Keanu Reeves should work with (but hasn't)

CQL CREATE
 CREATE ({title:"Mystic River", released:2003});
 MATCH (movie {title:"Mystic River"})
SET movie.tagline = "We bury our sins here, Dave. We wash them clean."
RETURN movie;
 MATCH (kevin {name:"Kevin Bacon"}),(movie {title:"Mystic River"})
CREATE UNIQUE (kevin)-[:ACTED_IN {roles:["Sean"]}]->(movie);
 MATCH (kevin {name:"Kevin Bacon"})-[r:ACTED_IN]->(movie {title:"Mystic River"})
SET r.roles = ["Sean Devine"]
RETURN r.roles;
 MATCH (clint {name:"Clint Eastwood"}),(movie {title:"Mystic River"})
CREATE UNIQUE (clint)-[:DIRECTED]->(movie);

CQL DELETE
 MATCH (matrix {title:"The Matrix"})<-[r:ACTED_IN]-(a)
WHERE "Emil" IN r.roles
RETURN a;
// Emil Eifrem is CEO of Neo Technology and co-founder of the Neo4j project
 MATCH (emil{name:"Emil Eifrem"})
DELETE emil;
 MATCH (emil{name:"Emil Eifrem"}) -[r]-()
DELETE r;
 MATCH (emil{name:"Emil Eifrem"}) -[r]-()
DELETE r, emil;
 MATCH (node) WHERE id(node) = 1
OPTIONAL MATCH (node)-[r]-()
DELETE r, node;

CQL INDEXES
There is usually no need to specify which indexes to use in a query; Cypher works that out by itself. Indexes
are used automatically for equality comparisons and inequality (range) comparisons of an indexed
property in the WHERE clause. The USING clause can be used to influence the planner's decisions when it builds an
execution plan for a query.
 CREATE INDEX ON :Person(name)
 MATCH (person:Person { name: 'Keanu Reeves' })
RETURN person
 MATCH (person:Person)
WHERE person.name = 'Keanu Reeves'
RETURN person
 MATCH (person:Person)
WHERE person.name > 'Keanu'
RETURN person
 MATCH (person:Person)
WHERE person.name STARTS WITH 'Kea' // CONTAINS, ENDS WITH
RETURN person

Embedded example code
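import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

enum RelTypes implements RelationshipType { KNOWS }

GraphDatabaseService graphDb = new GraphDatabaseFactory().newEmbeddedDatabase("var/neo-4j");
try (Transaction tx = graphDb.beginTx()) {
    // Create two nodes and connect them with a KNOWS relationship
    Node firstNode = graphDb.createNode();
    firstNode.setProperty("message", "Hello, ");
    Node secondNode = graphDb.createNode();
    secondNode.setProperty("message", "World!");
    Relationship relationship = firstNode.createRelationshipTo(secondNode, RelTypes.KNOWS);
    relationship.setProperty("message", "brave Neo4j ");
    // Prints "Hello, brave Neo4j World!"
    System.out.print(firstNode.getProperty("message"));
    System.out.print(relationship.getProperty("message"));
    System.out.print(secondNode.getProperty("message"));
    tx.success(); // mark the transaction as successful so that it commits on close
}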

Shortest path example code
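import org.neo4j.graphalgo.GraphAlgoFactory;
import org.neo4j.graphalgo.PathFinder;
import org.neo4j.graphdb.*;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
import org.neo4j.graphdb.index.Index;
import org.neo4j.graphdb.traversal.Paths;

GraphDatabaseService graphDb = new GraphDatabaseFactory().newEmbeddedDatabase("var/neo-4j");
try (Transaction tx = graphDb.beginTx()) {
    Index<Node> indexService = graphDb.index().forNodes("nodes");
    Node neo = indexService.get("name", "Neo").getSingle();
    Node agentSmith = indexService.get("name", "Agent Smith").getSingle();
    // Follow KNOWS relationships in either direction, at most 4 hops deep
    PathFinder<Path> finder = GraphAlgoFactory.shortestPath(
        PathExpanders.forTypeAndDirection(RelTypes.KNOWS, Direction.BOTH), 4);
    Path foundPath = finder.findSinglePath(neo, agentSmith);
    // NAME_KEY is assumed to be a String constant naming the node property to print
    System.out.println(Paths.simplePathToString(foundPath, NAME_KEY));
}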

CQL Do-It-Yourself
 Add KNOWS relationships between all actors who were in the same movie
 MATCH (a)-[:ACTED_IN]->()<-[:ACTED_IN]-(b)
CREATE UNIQUE (a)-[:KNOWS]->(b);
 MATCH (a)-[:ACTED_IN|DIRECTED]->()<-[:ACTED_IN|DIRECTED]-(b)
CREATE UNIQUE (a)-[:KNOWS]->(b);

CQL Useful Tricks
 Find friends of friends
 MATCH (keanu{name:"Keanu Reeves"})-[:KNOWS*2]->(fof)
WHERE NOT((keanu)-[:KNOWS]-(fof))
RETURN DISTINCT fof.name;
 Find shortest path
 MATCH p=shortestPath(
(charlize{name:"Charlize Theron"})-[:KNOWS*]->(bacon{name:"Kevin Bacon"}))
RETURN length(rels(p));
 Return the names of the people joining Charlize to Kevin.
 MATCH p=shortestPath(
(charlize{name:"Charlize Theron"})-[:KNOWS*]->(bacon{name:"Kevin Bacon"}))
RETURN extract(n IN nodes(p)| n.name) AS names
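What shortestPath computes on an unweighted graph can be sketched as a breadth-first search that remembers each node's predecessor. The example below is plain Java over an in-memory map of directed KNOWS edges and is not how Neo4j implements it; the class name, method name and sample edges are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Illustrative only: unweighted shortest path via breadth-first search.
class ShortestPathSketch {

    // Returns the node names along one shortest path from start to goal,
    // or an empty list when goal cannot be reached.
    static List<String> shortestPath(Map<String, List<String>> knows,
                                     String start, String goal) {
        Map<String, String> cameFrom = new HashMap<>(); // node -> predecessor
        cameFrom.put(start, null);
        Queue<String> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(goal)) {
                // Walk the predecessor links back to the start, then reverse.
                List<String> path = new ArrayList<>();
                for (String n = goal; n != null; n = cameFrom.get(n)) {
                    path.add(n);
                }
                Collections.reverse(path);
                return path;
            }
            for (String next : knows.getOrDefault(current, List.of())) {
                if (!cameFrom.containsKey(next)) {
                    cameFrom.put(next, current);
                    queue.add(next);
                }
            }
        }
        return List.of();
    }

    public static void main(String[] args) {
        // Hypothetical directed KNOWS edges; x, y and z are placeholder persons.
        Map<String, List<String>> knows = Map.of(
                "charlize", List.of("x", "y"),
                "x", List.of("bacon"),
                "y", List.of("z"),
                "z", List.of("bacon"));
        List<String> names = shortestPath(knows, "charlize", "bacon");
        System.out.println("names = " + names);
        System.out.println("hops  = " + (names.size() - 1));
    }
}
```

The returned list plays the role of the names column in the last query, and the number of hops (list size minus one) corresponds to the relationship count returned by the query before it.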

CQL Useful Tricks
 Find movies and actors up to 4 "hops" away from Kevin Bacon
 MATCH (bacon:Person {name:"Kevin Bacon"})-[*1..4]-(hollywood)
RETURN DISTINCT hollywood
 Find someone to introduce Tom Hanks to Tom Cruise
 MATCH (tom:Person {name:"Tom Hanks"})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors),
(coActors)-[:ACTED_IN]->(m2)<-[:ACTED_IN]-(cruise:Person {name:"Tom Cruise"})
RETURN tom, m, coActors, m2, cruise

Social Network Database
Relationship Type   Properties
IS_FRIEND           since
LIKES
WROTE_COMMENT
HAS_COMMENT
UPLOADED
SENT_MESSAGE
RECEIVED_MESSAGE
TAGGED_IN

Node Labels   Properties
Person        name, email, dob
Photo         caption, date
Status        text, date
Comment       text, date
Message       text, date

Social Network Database (2)
 CREATE (ann:Person { name: 'Ann', email:'ann@neo.4j', dob:487119060000 })
RETURN ann;
 CREATE (john:Person { name: 'John', email:'john@neo.4j', dob:435679060000 })
RETURN john;
 MATCH (ann:Person { name: 'Ann' }), (john:Person { name: 'John' })
CREATE UNIQUE (ann)-[:IS_FRIEND{since:'2009'}]-(john);
 MATCH (ann:Person { name: 'Ann' })
CREATE (ann)-[:UPLOADED]->(status:Status{text:'Happy Birthday', date:1451610010000});
 MATCH (john:Person { name: 'John' }), (status:Status{text:'Happy Birthday'})
CREATE (john)-[:LIKES]->(status);

 MATCH (john:Person { name: 'John' })
CREATE (john)-[:UPLOADED]->(photo:Photo{caption:'Birthday Party', date:1452410019386});
 MATCH (ann:Person { name: 'Ann' }), (photo:Photo{caption:'Birthday Party'})
CREATE (ann)-[:LIKES]->(photo)
CREATE (ann)-[:WROTE_COMMENT]->(comment:Comment{text:'Happy Birthday. The party was great!', date:1452410478569})<-[:HAS_COMMENT]-(photo)
CREATE (ann)-[:TAGGED_IN]->(photo);

 MATCH (john:Person { name: 'John' })
RETURN (john)-[:UPLOADED]->(:Photo);
 MATCH (ann:Person { name: 'Ann' }) -[:LIKES]->(photo:Photo)
RETURN ann, photo;
 MATCH (ann:Person { name: 'Ann' })-[:IS_FRIEND]-(:Person)-[:UPLOADED]-(photo:Photo)
RETURN ann, photo
 MATCH (ann:Person { name: 'Ann' })-[:IS_FRIEND]-(friend:Person),
(friend:Person)-[:UPLOADED]-(photo:Photo)
RETURN ann, photo, friend

How to get started?
 Documentation
 http://neo4j.com/docs/ - tutorials & reference
 Neo4j in Action
 Graph Databases by O'Reilly
 Get Neo4j
 http://neo4j.org/download
 http://elements.heroku.com/addons/graphenedb
 Participate
 http://groups.google.com/group/neo4j
 http://neo4j.meetup.com
 http://stackoverflow.com/questions/tagged/neo4j
