Modelling Data in Neo4j
plus a few best practices and lessons learned
by Michal Bachman

GraphAware

TM
Contents
Quick intro
1x mistake
1x experiment
1x FAQ
1x case-study

Data Has Changed
Larger Volumes
Less Structured
More Interconnected
Polyglot Persistence

NoSQL
Key-Value Stores
Column-Family Stores
Document Databases
Graph Databases

Graph Databases
The first three use aggregate data models; graph databases work with simple records and complex interconnections.

Neo4j
Open-source
Schema-less
JVM-based
Fully ACID

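Property Graph
[Diagram: a property graph built around the movie "Pulp Fiction".
Nodes: name: "Thriller", type: "genre"; name: "Drama", type: "genre"; name: "Pulp Fiction", year: 1994, type: "movie"; name: "Director", type: "occupation"; name: "Actor", type: "occupation"; name: "Samuel L. Jackson", type: "person"; name: "Quentin Tarantino", type: "person".
Relationships: IS_OF_GENRE, IS_A, DIRECTED, ACTED_IN (role: "Jimmie Dimmick"; role: "Jules Winnfield").]
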
Traversal
[Diagram: the same "Pulp Fiction" property graph as on the Property Graph slide, used to illustrate a traversal.]

Modeling Data as Graphs
There is no single correct way.

One Way
[Diagram: the "Pulp Fiction" property graph again, with genres and occupations modelled as separate nodes that are reached through IS_OF_GENRE and IS_A relationships.]

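Another Way
[Diagram: genres and occupations modelled as properties, roles modelled as nodes.
Nodes: name: "Pulp Fiction", year: 1994, type: "movie", genres: "Drama", "Thriller"; name: "Quentin Tarantino", type: "person", occupation: "Actor", "Director"; name: "Samuel L. Jackson", type: "person", occupation: "Actor"; name: "Jimmie Dimmick", type: "role"; name: "Jules Winnfield", type: "role".
Relationships: DIRECTED, ACTED_AS, CHARACTER_IN.]
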
Bidirectional Relationships
a common mistake

Ice Hockey
[Diagram: (Czech Republic)-[:DEFEATED]->(Sweden)]

Ice Hockey (Implied Relationship)
[Diagram: (Czech Republic)-[:DEFEATED]->(Sweden) and (Sweden)-[:DEFEATED_BY]->(Czech Republic)]

Traversals
In Neo4j, the speed of traversal does not depend on the direction of the relationships being traversed.

Why?
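
Neo4j Data Layout

Node Record in the Node Store (9 bytes), first bit = inUse flag
  next relationship (35 bits)
  next property (36 bits)

Relationship Record in the Relationship Store (33 bytes), first bit = inUse flag, second bit unused
  first node (35 bits)
  second node (35 bits)
  type (16 bits)
  first node's previous relationship (35 bits)
  first node's next relationship (35 bits)
  second node's previous relationship (35 bits)
  second node's next relationship (35 bits)
  next property (36 bits)
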
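The stated record sizes can be checked by adding up the field widths from the data-layout slide. The following is a minimal Java sketch of that arithmetic only; `RecordSizes` and its methods are names made up for illustration and are not Neo4j classes.

```java
// Sums the field widths quoted on the "Neo4j Data Layout" slide.
// RecordSizes is a hypothetical helper, not part of Neo4j.
final class RecordSizes {

    // Node record: inUse flag + next relationship + next property.
    static int nodeRecordBits() {
        return 1 + 35 + 36;
    }

    // Relationship record: inUse flag and one unused bit, the two end nodes,
    // the type, four relationship-chain pointers and the next-property pointer.
    static int relationshipRecordBits() {
        int flags = 1 + 1;
        int endNodes = 35 + 35;
        int type = 16;
        int chainPointers = 4 * 35;
        int nextProperty = 36;
        return flags + endNodes + type + chainPointers + nextProperty;
    }

    static int toBytes(int bits) {
        return bits / 8;
    }

    public static void main(String[] args) {
        System.out.println("node record bytes: " + toBytes(nodeRecordBits()));
        System.out.println("relationship record bytes: " + toBytes(relationshipRecordBits()));
    }
}
```

Both totals land exactly on byte boundaries (72 and 264 bits). Each relationship record carries previous/next pointers for both of its nodes, so the same record can be reached from either end, which fits the claim that direction does not affect traversal speed.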
Company Partnership (Naturally Bidirectional)
[Two diagrams, each linking Neo Technology and GraphAware with a single PARTNER relationship.]

Company Partnership (Naturally Bidirectional)
[Diagram: Neo Technology and GraphAware linked by two PARTNER relationships.]

Company Partnership (Naturally Bidirectional)
[Diagram: Neo Technology and GraphAware linked by a single PARTNER relationship.]

Why?
Neo4j APIs allow developers to completely ignore relationship direction when querying the graph.

Cypher
MATCH (neo)-[:PARTNER]->(partner)

Cypher
MATCH (neo)<-[:PARTNER]-(partner)

Cypher
MATCH (neo)-[:PARTNER]-(partner)

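To make the point about ignoring direction concrete, here is a small self-contained Java sketch. It is a toy in-memory graph, not the Neo4j API: `TinyGraph`, `connect` and `neighbours` are hypothetical names. It assumes a relationship is stored once and registered under both of its end nodes, so a single PARTNER relationship can be found from either side.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory graph for illustration only; every name here is hypothetical.
final class TinyGraph {

    enum Direction { OUTGOING, INCOMING, BOTH }

    static final class Rel {
        final String start;
        final String type;
        final String end;

        Rel(String start, String type, String end) {
            this.start = start;
            this.type = type;
            this.end = end;
        }
    }

    // Each relationship is created once but registered under both end nodes.
    // Self-loops are not handled in this sketch.
    private final Map<String, List<Rel>> relsByNode = new HashMap<>();

    void connect(String start, String type, String end) {
        Rel rel = new Rel(start, type, end);
        relsByNode.computeIfAbsent(start, k -> new ArrayList<>()).add(rel);
        relsByNode.computeIfAbsent(end, k -> new ArrayList<>()).add(rel);
    }

    // Returns the nodes at the other end of the matching relationships.
    List<String> neighbours(String node, String type, Direction direction) {
        List<String> result = new ArrayList<>();
        for (Rel rel : relsByNode.getOrDefault(node, Collections.<Rel>emptyList())) {
            if (!rel.type.equals(type)) {
                continue;
            }
            boolean outgoing = rel.start.equals(node);
            if (direction == Direction.BOTH
                    || (direction == Direction.OUTGOING && outgoing)
                    || (direction == Direction.INCOMING && !outgoing)) {
                result.add(outgoing ? rel.end : rel.start);
            }
        }
        return result;
    }
}
```

With one call to `connect("Neo Technology", "PARTNER", "GraphAware")`, asking for `Direction.BOTH` from either company returns the other one, which mirrors the undirected `(neo)-[:PARTNER]-(partner)` pattern. A second relationship in the opposite direction adds nothing to that query.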
Qualifying Relationships
performance comparison

Qualifying by Properties
[Diagram: (Daniela)-[:RATED {rating: 4}]->(Pulp Fiction), (Michal)-[:RATED {rating: 5}]->(Pulp Fiction), (Mark)-[:RATED {rating: 1}]->(Pulp Fiction)]

Who liked Pulp Fiction? (Cypher)
START pulpFiction=node({id})
MATCH (pulpFiction)<-[r:RATED]-(fan)
WHERE r.rating > 3
RETURN fan

Who liked Pulp Fiction? (Java)
for (Relationship r : pulpFiction.getRelationships(INCOMING, RATED)) {
    if ((int) r.getProperty("rating") > 3) {
        Node fan = r.getStartNode(); // do something with it
    }
}

Qualifying by Relationship Type
[Diagram: (Daniela)-[:LIKED]->(Pulp Fiction), (Michal)-[:LOVED]->(Pulp Fiction), (Mark)-[:HATED]->(Pulp Fiction)]

Who liked Pulp Fiction? (Cypher)
START pulpFiction=node({id})
MATCH (pulpFiction)<-[r:LIKED|LOVED]-(fan)
RETURN fan

Who liked Pulp Fiction? (Java)
for (Relationship r : pF.getRelationships(INCOMING, LIKED, LOVED)) {
    Node fan = r.getStartNode(); // do something with it
}

Winner!
[Diagram: the relationship-type model: (Daniela)-[:LIKED]->(Pulp Fiction), (Michal)-[:LOVED]->(Pulp Fiction), (Mark)-[:HATED]->(Pulp Fiction)]
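One plausible reason the relationship-type model comes out ahead is that the type sits in the relationship record itself, while a rating has to be fetched from a separate property record. The Java sketch below is a toy model under that assumption, not Neo4j code and not the benchmark from the talk; `FanLookup`, `Rel`, `rated` and `propertyReads` are hypothetical names. It only counts how many property reads each model needs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model for illustration only: it counts property reads, nothing more.
final class FanLookup {

    static final class Rel {
        final String fan;
        final String type;
        final Map<String, Object> properties = new HashMap<>();

        Rel(String fan, String type) {
            this.fan = fan;
            this.type = type;
        }
    }

    // Convenience factory for the property-based model.
    static Rel rated(String fan, int rating) {
        Rel rel = new Rel(fan, "RATED");
        rel.properties.put("rating", rating);
        return rel;
    }

    int propertyReads = 0;

    // Model 1: one RATED type, qualified by a "rating" property.
    List<String> fansByProperty(List<Rel> incoming) {
        List<String> fans = new ArrayList<>();
        for (Rel rel : incoming) {
            if (!rel.type.equals("RATED")) {
                continue;
            }
            propertyReads++; // every RATED relationship needs its property loaded
            if ((Integer) rel.properties.get("rating") > 3) {
                fans.add(rel.fan);
            }
        }
        return fans;
    }

    // Model 2: the qualification is encoded in the relationship type.
    List<String> fansByType(List<Rel> incoming) {
        List<String> wanted = Arrays.asList("LIKED", "LOVED");
        List<String> fans = new ArrayList<>();
        for (Rel rel : incoming) {
            if (wanted.contains(rel.type)) { // no property read needed
                fans.add(rel.fan);
            }
        }
        return fans;
    }
}
```

For the three raters on the slides, both methods return Daniela and Michal, but only the property-based lookup increments `propertyReads`, once per RATED relationship.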

Other interesting info?
Hardware Sizing
frequently asked question

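Neo4j Architecture
[Diagram labels, top to bottom:
JVM: Other APIs, Core API, Transaction Management, Object Cache
Operating System: File System Cache
HDD: Record Files (Nodes, Relationships, Relationship Types, Properties), Neo4j Transaction Log]
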
Disk Space
> cd data
> ls -ah

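Disk Space
drwxr-xr-x  5 bachmanm  wheel  170B 19 Oct 12:56 index
-rw-r--r--  1 bachmanm  wheel   31K 19 Oct 12:56 messages.log
-rw-r--r--  1 bachmanm  wheel   69B 19 Oct 12:56 neostore
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.id
-rw-r--r--  1 bachmanm  wheel  8.8K 19 Oct 12:56 neostore.nodestore.db
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.nodestore.db.id
-rw-r--r--  1 bachmanm  wheel   39M 19 Oct 12:56 neostore.propertystore.db
-rw-r--r--  1 bachmanm  wheel  153B 19 Oct 12:56 neostore.propertystore.db.arrays
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.propertystore.db.arrays.id
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.propertystore.db.id
-rw-r--r--  1 bachmanm  wheel   43B 19 Oct 12:56 neostore.propertystore.db.index
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.propertystore.db.index.id
-rw-r--r--  1 bachmanm  wheel  140B 19 Oct 12:56 neostore.propertystore.db.index.keys
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.propertystore.db.index.keys.id
-rw-r--r--  1 bachmanm  wheel  154B 19 Oct 12:56 neostore.propertystore.db.strings
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.propertystore.db.strings.id
-rw-r--r--  1 bachmanm  wheel   31M 19 Oct 12:56 neostore.relationshipstore.db
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.relationshipstore.db.id
-rw-r--r--  1 bachmanm  wheel   38B 19 Oct 12:56 neostore.relationshiptypestore.db
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.relationshiptypestore.db.id
-rw-r--r--  1 bachmanm  wheel  140B 19 Oct 12:56 neostore.relationshiptypestore.db.names
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.relationshiptypestore.db.names.id
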
Disk Space
node          9B
relationship  33B
property      41B

Disk Space (Example)
1,000 nodes      x 9B  =   8.8 kB
1,000,000 rels   x 33B =  31.5 MB
2,010,000 props  x 41B =  78.6 MB
TOTAL                  = 110.1 MB

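The example is a straight multiplication of record counts by record sizes, which a few lines of Java can reproduce. This is a back-of-the-envelope sketch only: `DiskSpaceEstimate` is a hypothetical helper, and it assumes fixed-size records and ignores string and array stores, indexes and logs.

```java
// Back-of-the-envelope estimate using the record sizes quoted on the slides.
// DiskSpaceEstimate is a hypothetical helper, not part of Neo4j.
final class DiskSpaceEstimate {

    static final long NODE_BYTES = 9;
    static final long RELATIONSHIP_BYTES = 33;
    static final long PROPERTY_BYTES = 41;

    static long bytes(long nodes, long relationships, long properties) {
        return nodes * NODE_BYTES
                + relationships * RELATIONSHIP_BYTES
                + properties * PROPERTY_BYTES;
    }

    // The slide's kB and MB figures match 1024-based units.
    static double toMegabytes(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long total = bytes(1_000, 1_000_000, 2_010_000);
        System.out.println(total + " bytes, roughly " + toMegabytes(total) + " MB");
    }
}
```

For 1,000 nodes, 1,000,000 relationships and 2,010,000 properties this gives 115,419,000 bytes, which is about 110.1 MB when a megabyte is taken as 1024 x 1024 bytes.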
Low Level Cache
How about low level cache? Any guesses?

Low Level Cache
Same as disk space

High Level Cache
node          344B
relationship  208B
property      116B
...
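The same counts used in the disk space example can be plugged into the high level cache sizes to get a feel for heap requirements. A rough Java sketch, assuming every object of the example graph is cached and using only the three per-object sizes listed on the slide (the entries elided by "..." are left out); `ObjectCacheEstimate` is a hypothetical name.

```java
// Rough heap estimate for caching the whole example graph as objects.
// Uses only the three per-object sizes listed on the slide; anything else is ignored.
final class ObjectCacheEstimate {

    static final long NODE_BYTES = 344;
    static final long RELATIONSHIP_BYTES = 208;
    static final long PROPERTY_BYTES = 116;

    static long bytes(long nodes, long relationships, long properties) {
        return nodes * NODE_BYTES
                + relationships * RELATIONSHIP_BYTES
                + properties * PROPERTY_BYTES;
    }

    public static void main(String[] args) {
        long total = bytes(1_000, 1_000_000, 2_010_000);
        System.out.println("object cache estimate in bytes: " + total);
    }
}
```

For 1,000 nodes, 1,000,000 relationships and 2,010,000 properties the estimate is 441,504,000 bytes, close to four times the on-disk figure, so the object cache is the more demanding of the two when sizing memory.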

Other interesting info?
Java API vs. Cypher
case study

Data Model
[Diagram: User 1, User 2, User 3 and User 4 connected by weighted relationships: TRAVELLED_WITH (weight: 3), TRAVELLED_WITH (weight: 4), TRAVELLED_TOGETHER (weight: 1), FRIEND (weight: 5).]

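START from=node:node_auto_index(user_id="{FROM}"),
      to=node:node_auto_index(user_id="{TO}")
MATCH p = from-[r*1..5]->to
RETURN extract(n in nodes(p) : n.user_id),
       extract(rel in relationships(p) : rel.weight),
       extract(rel in relationships(p) : type(rel))
ORDER BY length(p),
         reduce(totalWeight = 0, rel in relationships(p) : totalWeight + rel.weight)
LIMIT 3

> 1 second
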
10 - 20 ms
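The Cypher query in this case study looks for directed paths of one to five relationships between two users, orders them by length and then by total weight, and keeps the first three. The sketch below expresses that logic in plain Java on a toy adjacency map. It is not the Neo4j Java API and not the code behind the timings above; `WeightedPaths`, `connect` and `bestPaths` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the query's logic on a toy graph; all names are hypothetical.
final class WeightedPaths {

    static final class Rel {
        final String from;
        final String type;
        final String to;
        final int weight;

        Rel(String from, String type, String to, int weight) {
            this.from = from;
            this.type = type;
            this.to = to;
            this.weight = weight;
        }
    }

    private final Map<String, List<Rel>> outgoing = new HashMap<>();

    void connect(String from, String type, String to, int weight) {
        outgoing.computeIfAbsent(from, k -> new ArrayList<>()).add(new Rel(from, type, to, weight));
    }

    // Up to `limit` directed paths of 1..maxLength relationships,
    // shortest first, ties broken by lowest total weight.
    List<List<Rel>> bestPaths(String from, String to, int maxLength, int limit) {
        List<List<Rel>> found = new ArrayList<>();
        collect(from, to, maxLength, new ArrayList<Rel>(), found);
        found.sort(Comparator.<List<Rel>>comparingInt(List::size)
                .thenComparingInt(WeightedPaths::totalWeight));
        return new ArrayList<>(found.subList(0, Math.min(limit, found.size())));
    }

    // Depth-first enumeration that never reuses a relationship within one path.
    private void collect(String current, String target, int remaining,
                         List<Rel> path, List<List<Rel>> found) {
        if (!path.isEmpty() && current.equals(target)) {
            found.add(new ArrayList<>(path));
        }
        if (remaining == 0) {
            return;
        }
        for (Rel rel : outgoing.getOrDefault(current, Collections.<Rel>emptyList())) {
            if (path.contains(rel)) {
                continue;
            }
            path.add(rel);
            collect(rel.to, target, remaining - 1, path, found);
            path.remove(path.size() - 1);
        }
    }

    static int totalWeight(List<Rel> path) {
        int sum = 0;
        for (Rel rel : path) {
            sum += rel.weight;
        }
        return sum;
    }
}
```

A hand-written traversal like this can stop early or prune by weight, which is the kind of control that writing some Java gives over a declarative query. The sketch enumerates every path up to the depth limit before sorting, so it is meant to show the logic, not to be fast.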
Java API vs. Cypher
Cypher is great!
Cypher is improving
But don’t be afraid of writing some Java

Thanks!
www.graphaware.com
@graph_aware

Modelling Data in Neo4j, bidirectional relationships, qualifying relationships with properties vs. relationship types (performance comparison), Neo4j hardware sizing, Cypher vs. Java API

Published in: Technology, Education

Modelling Data in Neo4j (plus a few tips)

  1. 1. Modelling Data in Neo4j plus a few best practices and lessons learned by Michal Bachman GraphAware TM
  8. 8. Contents Quick intro 1x mistake 1x experiment 1x FAQ 1x case-study GraphAware TM
  13. 13. Data Has Changed Larger Volumes Less Structured More Interconnected Polyglot Persistence GraphAware TM
  18. 18. NoSQL Key-Value Stores Column-Family Stores Document Databases Graph Databases GraphAware TM
  19. 19. Graph Databases The first three use aggregate data models; graph databases work with simple records and complex interconnections. GraphAware TM
  24. 24. Neo4j Open-source Schema-less JVM-based Fully ACID GraphAware TM
  25. 25. Property Graph [Diagram: a property graph built around the movie "Pulp Fiction". Nodes: name: "Thriller", type: "genre"; name: "Drama", type: "genre"; name: "Pulp Fiction", year: 1994, type: "movie"; name: "Director", type: "occupation"; name: "Actor", type: "occupation"; name: "Samuel L. Jackson", type: "person"; name: "Quentin Tarantino", type: "person". Relationships: IS_OF_GENRE, IS_A, DIRECTED, ACTED_IN (role: "Jimmie Dimmick"; role: "Jules Winnfield").] GraphAware TM
  26. 26. Traversal [Diagram: the same "Pulp Fiction" property graph as on slide 25, used to illustrate a traversal.] GraphAware TM
  27. 27. Modeling Data as Graphs There is no single correct way. GraphAware TM
  28. 28. One Way [Diagram: the "Pulp Fiction" property graph again, with genres and occupations modelled as separate nodes that are reached through IS_OF_GENRE and IS_A relationships.] GraphAware TM
  29. 29. Another Way [Diagram: genres and occupations modelled as properties, roles modelled as nodes. Nodes: name: "Pulp Fiction", year: 1994, type: "movie", genres: "Drama", "Thriller"; name: "Quentin Tarantino", type: "person", occupation: "Actor", "Director"; name: "Samuel L. Jackson", type: "person", occupation: "Actor"; name: "Jimmie Dimmick", type: "role"; name: "Jules Winnfield", type: "role". Relationships: DIRECTED, ACTED_AS, CHARACTER_IN.] GraphAware TM
  30. 30. Bidirectional Relationships a common mistake GraphAware TM
  31. 31. Ice Hockey Czech Republic DEFEATED Sweden GraphAware TM
  33. 33. Ice Hockey (Implied Relationship) DEFEATED Czech Republic Sweden DEFEATED_BY GraphAware TM
  35. 35. Traversals In Neo4j, the speed of traversal does not depend on the direction of the relationships being traversed. GraphAware TM
  36. 36. Why? GraphAware TM TM
  37. 37. Neo4j Data Layout. Node Record in the Node Store (9 bytes), first bit = inUse flag: next relationship (35 bits), next property (36 bits). Relationship Record in the Relationship Store (33 bytes), first bit = inUse flag, second bit unused: first node (35 bits), second node (35 bits), type (16 bits), first node's previous relationship (35 bits), first node's next relationship (35 bits), second node's previous relationship (35 bits), second node's next relationship (35 bits), next property (36 bits). GraphAware TM
  38. 38. Company Partnership (Naturally Bidirectional) Neo Technology PARTNER GraphAware Neo Technology PARTNER GraphAware GraphAware TM
  39. 39. Company Partnership (Naturally Bidirectional) PARTNER Neo Technology GraphAware PARTNER GraphAware TM
  41. 41. Company Partnership (Naturally Bidirectional) Neo Technology PARTNER GraphAware GraphAware TM
  43. 43. Why? Neo4j APIs allow developers to completely ignore relationship direction when querying the graph. GraphAware TM
  44. 44. Cypher MATCH (neo)-[:PARTNER]->(partner) GraphAware TM
  45. 45. Cypher MATCH (neo)<-[:PARTNER]-(partner) GraphAware TM
  46. 46. Cypher MATCH (neo)-[:PARTNER]-(partner) GraphAware TM
  47. 47. Qualifying Relationships performance comparison GraphAware TM
  48. 48. Qualifying by Properties [Diagram: (Daniela)-[:RATED {rating: 4}]->(Pulp Fiction), (Michal)-[:RATED {rating: 5}]->(Pulp Fiction), (Mark)-[:RATED {rating: 1}]->(Pulp Fiction)] GraphAware TM
  49. 49. Who liked Pulp Fiction? (Cypher) START pulpFiction=node({id}) MATCH (pulpFiction)<-[r:RATED]-(fan) WHERE r.rating > 3 RETURN fan GraphAware TM
  50. 50. Who liked Pulp Fiction? (Java) for (Relationship r : pulpFiction.getRelationships(INCOMING, RATED)) { if ((int) r.getProperty("rating") > 3) { Node fan = r.getStartNode(); /* do something with it */ } } GraphAware TM
  51. 51. Qualifying by Relationship Type [Diagram: (Daniela)-[:LIKED]->(Pulp Fiction), (Michal)-[:LOVED]->(Pulp Fiction), (Mark)-[:HATED]->(Pulp Fiction)] GraphAware TM
  52. 52. Who liked Pulp Fiction? (Cypher) START pulpFiction=node({id}) MATCH (pulpFiction)<-[r:LIKED|LOVED]-(fan) RETURN fan GraphAware TM
  53. 53. Who liked Pulp Fiction? (Java) for (Relationship r : pF.getRelationships(INCOMING, LIKED, LOVED)) { Node fan = r.getStartNode(); /* do something with it */ } GraphAware TM
  56. 56. Winner! [Diagram: the relationship-type model: (Daniela)-[:LIKED]->(Pulp Fiction), (Michal)-[:LOVED]->(Pulp Fiction), (Mark)-[:HATED]->(Pulp Fiction)] GraphAware TM
  57. 57. Other interesting info?
  58. 58. Hardware Sizing frequently asked question GraphAware TM
  59. 59. Neo4j Architecture [Diagram labels, top to bottom: JVM (Other APIs, Core API, Transaction Management, Object Cache); Operating System (File System Cache); HDD (Record Files: Nodes, Relationships, Relationship Types, Properties; Neo4j Transaction Log)] GraphAware TM
  60. 60. Disk Space > cd data > ls -ah GraphAware TM
  61. 61. Disk Space drwxr-xr-x 5 bachmanm wheel 170B 19 Oct 12:56 index; -rw-r--r-- 1 bachmanm wheel 31K 19 Oct 12:56 messages.log; -rw-r--r-- 1 bachmanm wheel 69B 19 Oct 12:56 neostore; -rw-r--r-- 1 bachmanm wheel 9B 19 Oct 12:56 neostore.id; -rw-r--r-- 1 bachmanm wheel 8.8K 19 Oct 12:56 neostore.nodestore.db; -rw-r--r-- 1 bachmanm wheel 9B 19 Oct 12:56 neostore.nodestore.db.id; -rw-r--r-- 1 bachmanm wheel 39M 19 Oct 12:56 neostore.propertystore.db; -rw-r--r-- 1 bachmanm wheel 153B 19 Oct 12:56 neostore.propertystore.db.arrays; -rw-r--r-- 1 bachmanm wheel 9B 19 Oct 12:56 neostore.propertystore.db.arrays.id; -rw-r--r-- 1 bachmanm wheel 9B 19 Oct 12:56 neostore.propertystore.db.id; -rw-r--r-- 1 bachmanm wheel 43B 19 Oct 12:56 neostore.propertystore.db.index; -rw-r--r-- 1 bachmanm wheel 9B 19 Oct 12:56 neostore.propertystore.db.index.id; -rw-r--r-- 1 bachmanm wheel 140B 19 Oct 12:56 neostore.propertystore.db.index.keys; -rw-r--r-- 1 bachmanm wheel 9B 19 Oct 12:56 neostore.propertystore.db.index.keys.id; -rw-r--r-- 1 bachmanm wheel 154B 19 Oct 12:56 neostore.propertystore.db.strings; -rw-r--r-- 1 bachmanm wheel 9B 19 Oct 12:56 neostore.propertystore.db.strings.id; -rw-r--r-- 1 bachmanm wheel 31M 19 Oct 12:56 neostore.relationshipstore.db; -rw-r--r-- 1 bachmanm wheel 9B 19 Oct 12:56 neostore.relationshipstore.db.id; -rw-r--r-- 1 bachmanm wheel 38B 19 Oct 12:56 neostore.relationshiptypestore.db; -rw-r--r-- 1 bachmanm wheel 9B 19 Oct 12:56 neostore.relationshiptypestore.db.id; -rw-r--r-- 1 bachmanm wheel 140B 19 Oct 12:56 neostore.relationshiptypestore.db.names; -rw-r--r-- 1 bachmanm wheel 9B 19 Oct 12:56 neostore.relationshiptypestore.db.names.id GraphAware TM
  62. 62. Disk Space node 9B relationship 33B property 41B GraphAware TM
  63. 63. Disk Space (Example) 1,000 nodes x 9B = 8.8 kB; 1,000,000 rels x 33B = 31.5 MB; 2,010,000 props x 41B = 78.6 MB; TOTAL = 110.1 MB GraphAware TM
  64. 64. Low Level Cache How about low level cache? Any guesses? GraphAware TM
  65. 65. Low Level Cache Same as disk space GraphAware TM
  66. 66. High Level Cache node 344B relationship 208B property 116B ... GraphAware TM
  67. 67. Other interesting info?
  68. 68. Java API vs. Cypher case study GraphAware TM
  69. 69. Data Model User 2 TRAVELLED_WITH weight: 3 User 4 TRAVELLED_WITH weight: 4 TRAVELLED_TOGETHER weight: 1 User 3 FRIEND weight: 5 User 1 GraphAware TM
  70. 70. START from=node:node_auto_index(user_id="{FROM}"), to=node:node_auto_index(user_id="{TO}") MATCH p = from-[r*1..5]->to RETURN extract(n in nodes(p) : n.user_id), extract(rel in relationships(p) : rel.weight), extract(rel in relationships(p) : type(rel)) ORDER BY length(p), reduce(totalWeight = 0, rel in relationships(p) : totalWeight + rel.weight) LIMIT 3 GraphAware TM
  71. 71. START from=node:node_auto_index(user_id="{FROM}"), to=node:node_auto_index(user_id="{TO}") MATCH p = from-[r*1..5]->to RETURN extract(n in nodes(p) : n.user_id), extract(rel in relationships(p) : rel.weight), extract(rel in relationships(p) : type(rel)) ORDER BY length(p), reduce(totalWeight = 0, rel in relationships(p) : totalWeight + rel.weight) LIMIT 3 > 1 second GraphAware TM
  72. 72. 10 - 20 ms
  73. 73. Java API vs. Cypher GraphAware TM
  74. 74. Java API vs. Cypher Cypher is great! GraphAware TM
  75. 75. Java API vs. Cypher Cypher is great! Cypher is improving GraphAware TM
  76. 76. Java API vs. Cypher Cypher is great! Cypher is improving But don’t be afraid of writing some Java GraphAware TM
  77. 77. Thanks! www.graphaware.com @graph_aware GraphAware TM