GraphAware
TM
Modelling Data in Neo4j
plus a few best practices and lessons learned
by Michal Bachman

Contents
Quick intro
1x mistake
1x experiment
1x FAQ
1x case-study

Data Has Changed
Larger Volumes
Less Structured
More Interconnected
Polyglot Persistence

NoSQL
Key-Value Stores
Column-Family Stores
Document Databases
Graph Databases

Graph Databases
The first three use aggregate data models; graph databases work with simple records and complex interconnections.

Neo4j
Open-source
Schema-less
JVM-based
Fully ACID
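
Property Graph
[Figure: a property graph of the movie example]
Nodes:
  name: "Pulp Fiction", year: 1994, type: "movie"
  name: "Quentin Tarantino", type: "person"
  name: "Samuel L. Jackson", type: "person"
  name: "Drama", type: "genre"
  name: "Thriller", type: "genre"
  name: "Director", type: "occupation"
  name: "Actor", type: "occupation"
Relationships:
  (Quentin Tarantino) -[DIRECTED]-> (Pulp Fiction)
  (Quentin Tarantino) -[ACTED_IN, role: "Jimmie Dimmick"]-> (Pulp Fiction)
  (Samuel L. Jackson) -[ACTED_IN, role: "Jules Winnfield"]-> (Pulp Fiction)
  (Pulp Fiction) -[IS_OF_GENRE]-> (Drama)
  (Pulp Fiction) -[IS_OF_GENRE]-> (Thriller)
  (Quentin Tarantino) -[IS_A]-> (Director)
  (Quentin Tarantino) -[IS_A]-> (Actor)
  (Samuel L. Jackson) -[IS_A]-> (Actor)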

Traversal
[Figure: the Pulp Fiction property graph from the Property Graph slide, shown again]

Modelling Data as Graphs
There is no single correct way.

One Way
[Figure: the Pulp Fiction property graph again. Genres and occupations are separate nodes, linked by IS_OF_GENRE and IS_A; roles are properties of the ACTED_IN relationships.]
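
Another Way
Nodes:
  name: "Pulp Fiction", year: 1994, type: "movie", genres: "Drama", "Thriller"
  name: "Quentin Tarantino", type: "person", occupation: "Actor", "Director"
  name: "Samuel L. Jackson", type: "person", occupation: "Actor"
  name: "Jules Winnfield", type: "role"
  name: "Jimmie Dimmick", type: "role"
Relationships:
  (Quentin Tarantino) -[DIRECTED]-> (Pulp Fiction)
  (Quentin Tarantino) -[ACTED_AS]-> (Jimmie Dimmick)
  (Samuel L. Jackson) -[ACTED_AS]-> (Jules Winnfield)
  (Jules Winnfield) -[CHARACTER_IN]-> (Pulp Fiction)
  (Jimmie Dimmick) -[CHARACTER_IN]-> (Pulp Fiction)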

Bidirectional Relationships
a common mistake

Ice Hockey
(Czech Republic) -[DEFEATED]-> (Sweden)

Ice Hockey (Implied Relationship)
(Czech Republic) -[DEFEATED]-> (Sweden)
(Sweden) -[DEFEATED_BY]-> (Czech Republic)

Traversals
In Neo4j, the speed of traversal does not depend on the direction of the relationships being traversed.

Why?

Neo4j Data Layout
Node record in the node store (9 bytes); first bit = inUse flag:
  next relationship (35 bits)
  next property (36 bits)
Relationship record in the relationship store (33 bytes); first bit = inUse flag, second bit unused:
  first node (35 bits)
  second node (35 bits)
  type (16 bits)
  first node's previous relationship (35 bits)
  first node's next relationship (35 bits)
  second node's previous relationship (35 bits)
  second node's next relationship (35 bits)
  next property (36 bits)
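
The record layout above suggests why direction does not matter for traversal speed: each relationship record carries pointers for both of its nodes, so a node's chain contains its outgoing and incoming relationships alike. The following is a minimal in-memory sketch of that idea, not Neo4j's actual storage code. The class and method names (RelationshipChainSketch, Rel, relationshipsOf, sampleStore) are hypothetical, and only the "next" pointers are modelled.

```java
import java.util.ArrayList;
import java.util.List;

public class RelationshipChainSketch {

    /** Simplified stand-in for a relationship record; -1 means "no next relationship". */
    public static final class Rel {
        final int id;
        final int firstNode;   // node the relationship starts at
        final int secondNode;  // node the relationship ends at
        int firstNext = -1;    // next relationship in the first node's chain
        int secondNext = -1;   // next relationship in the second node's chain

        Rel(int id, int firstNode, int secondNode) {
            this.id = id;
            this.firstNode = firstNode;
            this.secondNode = secondNode;
        }
    }

    /** Walks the chain of one node and returns the ids of all its relationships. */
    public static List<Integer> relationshipsOf(int node, int firstRel, Rel[] store) {
        List<Integer> ids = new ArrayList<>();
        int current = firstRel;
        while (current != -1) {
            Rel r = store[current];
            ids.add(r.id);
            // Follow the pointer that belongs to this node, whether the
            // relationship is outgoing (first node) or incoming (second node).
            current = (r.firstNode == node) ? r.firstNext : r.secondNext;
        }
        return ids;
    }

    /** Hypothetical sample data: relationship 0 goes 0 -> 1, relationship 1 goes 2 -> 0. */
    public static Rel[] sampleStore() {
        Rel r0 = new Rel(0, 0, 1);
        Rel r1 = new Rel(1, 2, 0);
        r0.firstNext = 1; // node 0's chain: relationship 0, then relationship 1
        return new Rel[] { r0, r1 };
    }

    public static void main(String[] args) {
        Rel[] store = sampleStore();
        System.out.println(relationshipsOf(0, 0, store)); // one outgoing, one incoming
        System.out.println(relationshipsOf(1, 0, store)); // one incoming
        System.out.println(relationshipsOf(2, 1, store)); // one outgoing
    }
}
```

In this sketch, reaching a relationship costs one pointer hop whichever end of it the walk starts from, which is the point the slide makes.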

Company Partnership (Naturally Bidirectional)
[Figure build: Neo Technology and GraphAware joined by a PARTNER relationship, first drawn in one direction, then as two PARTNER relationships (one in each direction), and finally as a single PARTNER relationship.]

Why?
Neo4j APIs allow developers to completely ignore relationship direction when querying the graph.

Cypher
MATCH (neo)-[:PARTNER]->(partner)
MATCH (neo)<-[:PARTNER]-(partner)
MATCH (neo)-[:PARTNER]-(partner)

Qualifying Relationships
performance comparison

Qualifying by Properties
[Figure: Michal, Mark and Daniela each have a RATED relationship to Pulp Fiction; the rating property values shown on the three relationships are 5, 1 and 4.]

Who liked Pulp Fiction? (Cypher)
START  pulpFiction=node({id})
MATCH  (pulpFiction)<-[r:RATED]-(fan)
WHERE  r.rating > 3
RETURN fan

Who liked Pulp Fiction? (Java)
for (Relationship r : pulpFiction.getRelationships(INCOMING, RATED)) {
    if ((int) r.getProperty("rating") > 3) {
        Node fan = r.getStartNode(); // do something with it
    }
}

Qualifying by Relationship Type
[Figure: Michal, Mark and Daniela are connected to Pulp Fiction by LOVED, HATED and LIKED relationships instead of RATED relationships with a rating property.]

Who liked Pulp Fiction? (Cypher)
START  pulpFiction=node({id})
MATCH  (pulpFiction)<-[r:LIKED|LOVED]-(fan)
RETURN fan

Who liked Pulp Fiction? (Java)
for (Relationship r : pF.getRelationships(INCOMING, LIKED, LOVED)) {
    Node fan = r.getStartNode(); // do something with it
}

Winner!
[Figure: the relationship-type model again, with LOVED, HATED and LIKED relationships to Pulp Fiction.]
Other interesting info?

Hardware Sizing
frequently asked question

Neo4j Architecture
[Figure: layered architecture diagram. Labels: HDD (Record Files, Transaction Log); Operating System (File System Cache); JVM / Neo4j (Object Cache, Core API, Other APIs, Transaction Management); Nodes, Relationships, Properties, Relationship Types.]

Disk Space
> cd data
> ls -ah
drwxr-xr-x  5 bachmanm  wheel  170B 19 Oct 12:56 index
-rw-r--r--  1 bachmanm  wheel   31K 19 Oct 12:56 messages.log
-rw-r--r--  1 bachmanm  wheel   69B 19 Oct 12:56 neostore
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.id
-rw-r--r--  1 bachmanm  wheel  8.8K 19 Oct 12:56 neostore.nodestore.db
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.nodestore.db.id
-rw-r--r--  1 bachmanm  wheel   39M 19 Oct 12:56 neostore.propertystore.db
-rw-r--r--  1 bachmanm  wheel  153B 19 Oct 12:56 neostore.propertystore.db.arrays
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.propertystore.db.arrays.id
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.propertystore.db.id
-rw-r--r--  1 bachmanm  wheel   43B 19 Oct 12:56 neostore.propertystore.db.index
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.propertystore.db.index.id
-rw-r--r--  1 bachmanm  wheel  140B 19 Oct 12:56 neostore.propertystore.db.index.keys
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.propertystore.db.index.keys.id
-rw-r--r--  1 bachmanm  wheel  154B 19 Oct 12:56 neostore.propertystore.db.strings
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.propertystore.db.strings.id
-rw-r--r--  1 bachmanm  wheel   31M 19 Oct 12:56 neostore.relationshipstore.db
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.relationshipstore.db.id
-rw-r--r--  1 bachmanm  wheel   38B 19 Oct 12:56 neostore.relationshiptypestore.db
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.relationshiptypestore.db.id
-rw-r--r--  1 bachmanm  wheel  140B 19 Oct 12:56 neostore.relationshiptypestore.db.names
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.relationshiptypestore.db.names.id

Disk Space
node          9B
relationship 33B
property     41B

Disk Space (Example)
    1,000 nodes x  9B =   8.8 kB
1,000,000 rels  x 33B =  31.5 MB
2,010,000 props x 41B =  78.6 MB
TOTAL                   110.1 MB
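
The arithmetic in the example can be checked with a few lines of Java. This is a rough sizing sketch that uses only the per-record sizes quoted on the slide; the class and method names (StoreSizeEstimate, estimateBytes, toMiB) are hypothetical, and the slide's "kB" and "MB" figures are assumed to be binary units (1 MB = 1024 x 1024 bytes), which is what reproduces its totals.

```java
import java.util.Locale;

public class StoreSizeEstimate {

    // Record sizes in bytes, as quoted on the slide.
    static final long NODE_BYTES = 9;
    static final long RELATIONSHIP_BYTES = 33;
    static final long PROPERTY_BYTES = 41;

    /** Estimated store size in bytes for the given record counts. */
    public static long estimateBytes(long nodes, long relationships, long properties) {
        return nodes * NODE_BYTES
                + relationships * RELATIONSHIP_BYTES
                + properties * PROPERTY_BYTES;
    }

    /** Converts bytes to binary megabytes (assumed to be the unit used on the slide). */
    public static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long bytes = estimateBytes(1_000, 1_000_000, 2_010_000);
        // Prints: 115419000 bytes = 110.1 MB
        System.out.println(String.format(Locale.ROOT, "%d bytes = %.1f MB", bytes, toMiB(bytes)));
    }
}
```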

Low Level Cache
How about low level cache? Any guesses?
Same as disk space

High Level Cache
node         344B
relationship 208B
property     116B
...
Other interesting info?

Java API vs. Cypher
case study

Data Model
[Figure: User 1, User 2, User 3 and User 4 connected by weighted TRAVELLED_WITH, TRAVELLED_TOGETHER and FRIEND relationships; the weight values shown are 5, 1, 3 and 4.]

Java API vs. Cypher
START  from=node:node_auto_index(user_id="{FROM}"),
       to=node:node_auto_index(user_id="{TO}")
MATCH  p = from-[r*1..5]->to
RETURN extract(n in nodes(p) : n.user_id),
       extract(rel in relationships(p) : rel.weight),
       extract(rel in relationships(p) : type(rel))
ORDER BY length(p),
       reduce(totalWeight = 0, rel in relationships(p) :
       totalWeight + rel.weight)
LIMIT  3
> 1 second
10 - 20 ms
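
The slides do not show the Java version of this query. As an illustration of the ranking step only (ORDER BY length, then total weight, then LIMIT 3), here is a plain-Java sketch with no Neo4j dependency. It assumes the candidate paths have already been found and represents each path simply as an array of its relationship weights; the names PathRankingSketch, totalWeight and best are hypothetical and not taken from the talk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class PathRankingSketch {

    /** Sum of the relationship weights along one path (the reduce(...) in the query). */
    public static int totalWeight(int[] weights) {
        int sum = 0;
        for (int w : weights) {
            sum += w;
        }
        return sum;
    }

    /** Orders paths by length, then by total weight, and keeps at most 'limit' of them. */
    public static List<int[]> best(List<int[]> paths, int limit) {
        List<int[]> sorted = new ArrayList<>(paths);
        sorted.sort(Comparator
                .comparingInt((int[] p) -> p.length)
                .thenComparingInt(PathRankingSketch::totalWeight));
        return new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
    }

    public static void main(String[] args) {
        // Hypothetical candidate paths, each given as its relationship weights.
        List<int[]> paths = Arrays.asList(
                new int[] {5},
                new int[] {1, 3},
                new int[] {4, 1},
                new int[] {1},
                new int[] {3, 4, 1});
        for (int[] p : best(paths, 3)) {
            System.out.println(Arrays.toString(p) + " total weight " + totalWeight(p));
        }
    }
}
```

With the sample data, the two single-relationship paths come first (lower total weight first), followed by the lighter of the two-relationship paths.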

Java API vs. Cypher
Cypher is great!
Cypher is improving
But don’t be afraid of writing some Java

Thanks!
www.graphaware.com
@graph_aware

Modelling Data in Neo4j (plus a few tips)