GRAPH DATABASES
WHO AM I?
• David Simons
• @SwamWithTurtles
• github.com/SwamWithTurtles
• Technical Lead at Softwire and part-time hacker
• Statistician in a past life
Graphical Modelling
Neo4j: The What And Why?
“Cypher” Query Language
Let’s See It In Action!
The Graphing Ecosystem
GRAPHICAL MODELLING
WHAT IS A GRAPH?
Example graph taken from Jim Webber’s Dr. Who dataset
{ (V, E) : V = [n], E ⊆ V^(2) }
• Made up of two parts, “V” and “E”
• V, the vertex set, is a set of n items
• E, the edge set, is made up of pairs of elements of V (ordered and not necessarily distinct)
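As a minimal sketch of the definition above, a graph can be held in Python as a pair of sets. The names `V`, `E` and `is_graph` and the value n = 4 are illustrative choices, not from the talk; edges are ordered pairs and may join a vertex to itself, as the slide allows.

```python
# V = [n] = {1, ..., n}: the vertex set (n = 4 is an arbitrary choice)
n = 4
V = set(range(1, n + 1))

# E: ordered pairs of elements of V; (3, 3) is a loop, so the two
# elements of a pair are "not necessarily distinct"
E = {(1, 2), (2, 1), (2, 3), (3, 3)}


def is_graph(vertices, edges):
    """Check that every edge is a pair of elements of the vertex set."""
    return all(
        len(e) == 2 and e[0] in vertices and e[1] in vertices
        for e in edges
    )


print(is_graph(V, E))         # True
print(is_graph(V, {(1, 5)}))  # False: 5 is not in V
```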
WHAT IS GRAPHICAL MODELLING?
Giving real-world meanings to V and E
BRIDGES AT KÖNIGSBERG
V = bits of land
E = bridges
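The Königsberg model can be sketched in a few lines, assuming the classic layout of four land masses and seven bridges. The labels below are illustrative. Euler's observation was that a walk crossing every bridge exactly once needs zero or two land masses with an odd number of bridges; the sketch just counts them.

```python
from collections import Counter

# V: the four land masses (labels are illustrative)
land = {"north bank", "south bank", "island", "east"}

# E: the seven bridges, kept as a list because two pairs of bridges
# join the same land masses (a multigraph)
bridges = [
    ("north bank", "island"), ("north bank", "island"),
    ("south bank", "island"), ("south bank", "island"),
    ("north bank", "east"), ("south bank", "east"),
    ("island", "east"),
]

# Degree = number of bridge ends touching each land mass
degree = Counter()
for a, b in bridges:
    degree[a] += 1
    degree[b] += 1

odd = [v for v in land if degree[v] % 2 == 1]
print(len(odd))  # 4, so no walk crosses every bridge exactly once
```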
SIX DEGREES OF … KEVIN BACON
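One way to picture the "six degrees" game as a graph problem is a breadth-first search over co-star links. This is a sketch on made-up data: the actor and film names, the `starred_in` dictionary and the `degrees_of_separation` helper are all hypothetical.

```python
from collections import deque

# Hypothetical toy data: which (made-up) actors starred in which films
starred_in = {
    "Film A": ["Actor 1", "Actor 2"],
    "Film B": ["Actor 2", "Actor 3"],
    "Film C": ["Actor 3", "Actor 4"],
}

# V = actors; E = "shared a film" links, stored as adjacency sets
co_stars = {}
for cast in starred_in.values():
    for actor in cast:
        co_stars.setdefault(actor, set()).update(
            a for a in cast if a != actor
        )


def degrees_of_separation(start, goal):
    """Breadth-first search: fewest co-star links from start to goal."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        actor, dist = queue.popleft()
        if actor == goal:
            return dist
        for other in co_stars[actor] - seen:
            seen.add(other)
            queue.append((other, dist + 1))
    return None  # the two actors are not connected


print(degrees_of_separation("Actor 1", "Actor 4"))  # 3
```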
THE PROBLEM
There is no open election data
ELECTION DATA
V = elections, constituencies, years, politicians and parties
E = (e.g.) member of, held in, stood in…
NEO4J: THE WHAT AND WHY?
WORLD’S LEADING GRAPH DB: NEO4J
WHERE DID IT COME FROM?
• First version 2010, v2 came out December 2013.
"embedded, disk-based, fully transactional Java persistence engine that stores data structured in graphs rather than in tables"
DATA STORAGE
• Nodes and edges are all:
  • Stored as first-class objects on the file system
  • “typed”
  • Key-value stores
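The three properties above can be pictured with a small data model. This is only a sketch of the idea of typed nodes and edges that each carry key-value data, not Neo4j's storage format; the `Node` and `Edge` classes, the film title and the `billing` property are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str  # the node's "type", e.g. "Actor"
    properties: dict = field(default_factory=dict)  # key-value store


@dataclass
class Edge:
    type: str  # the edge's "type", e.g. "starred_in"
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)  # edges carry data too


kevin = Node("Actor", {"name": "Kevin Bacon"})
film = Node("Movie", {"title": "Some Film"})            # hypothetical film
role = Edge("starred_in", kevin, film, {"billing": 1})  # hypothetical property

print(role.start.properties["name"], role.type, role.end.properties["title"])
```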
COMMUNITY EDITION
• Free for hacking around in
ENTERPRISE EDITION
• Bespoke prices, but includes:
  • Higher performance for concurrent querying
  • Clustering
  • Hot backups
  • Advanced monitoring
OTHER GRAPH DATABASES
• ArangoDB
• OrientDB
• New: Graph Engine
BUT… WHAT’S WRONG WITH SQL?
NOTHING*
*If you use it for the right job
DATA IN THE RELATIONS
• “Joins” are first-class objects in the database, so following a relationship does not need a join to be computed at query time
• Queries that would need joins in SQL become trivial
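The contrast can be sketched in plain Python. This is an illustration of the idea only: a real relational database would use indexes rather than a linear scan, and the ids, names and helper functions here are hypothetical.

```python
# Relational style: relationships live in a separate table, and each
# query has to search it (a linear scan stands in for the join here)
movies = {10: "Film A", 11: "Film B"}
starred_in_rows = [(1, 10), (1, 11), (2, 11)]  # (actor_id, movie_id)


def films_via_join(actor_id):
    return [movies[m] for a, m in starred_in_rows if a == actor_id]


# Graph style: each node holds direct references to its relationships,
# so following them does not involve searching a separate structure
class Node:
    def __init__(self, name):
        self.name = name
        self.starred_in = []  # outgoing relationships


film_a, film_b = Node("Film A"), Node("Film B")
actor_1 = Node("Actor 1")
actor_1.starred_in = [film_a, film_b]


def films_via_traversal(actor):
    return [film.name for film in actor.starred_in]


print(films_via_join(1))             # ['Film A', 'Film B']
print(films_via_traversal(actor_1))  # ['Film A', 'Film B']
```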
PROTOTYPING
• Easy to see and work with data
• Schemaless
• Active community with a lot of libraries
NEO4J USERS
CASE STUDIES
• Real-time Recommendations
• Logistics & Delivery Organisation
• Online Dating
“CYPHER” QUERY LANGUAGE
WHAT IS CYPHER?
• Neo4j’s own query language
• Declarative
• Designed to be readable and easy to learn
ASCII ART SYNTAX: NODES
(n)
(n:Actor)
(n:Actor {name: "Kevin Bacon"})
ASCII ART SYNTAX: EDGES
-[r:starred_in]->
<-[r:starred_in]-
-[r:starred_in]-
(n:Actor)-[r:starred_in]->(m:Movie)
(n:Actor)-[r:starred_in]->(m:Movie)<-[s:starred_in]-(a:Actor)
MATCH & RETURN
MATCH {pattern} RETURN {variables}

MATCH (n:Actor)-[r:starred_in]->(m:Movie)
RETURN n, r, m

MATCH (n:Actor {name: "Kevin Bacon"})-[r:starred_in]->(m:Movie)
RETURN m
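To make the pattern-matching idea concrete, here is a rough Python analogue of the last query over a tiny in-memory graph. It says nothing about how Neo4j actually executes Cypher; the data, the `nodes`/`edges` layout and `match_films` are hypothetical.

```python
# Hypothetical in-memory graph: nodes are (label, properties) pairs,
# edges are (start_id, type, end_id) triples
nodes = {
    1: ("Actor", {"name": "Kevin Bacon"}),
    2: ("Actor", {"name": "Someone Else"}),
    3: ("Movie", {"title": "Film A"}),
    4: ("Movie", {"title": "Film B"}),
}
edges = [(1, "starred_in", 3), (2, "starred_in", 3), (2, "starred_in", 4)]


def match_films(actor_name):
    """Rough analogue of matching an Actor with the given name, following
    outgoing starred_in edges, and returning the Movie at the other end."""
    results = []
    for start, rel_type, end in edges:
        n_label, n_props = nodes[start]
        m_label, m_props = nodes[end]
        if (n_label == "Actor" and n_props.get("name") == actor_name
                and rel_type == "starred_in" and m_label == "Movie"):
            results.append(m_props)
    return results


print(match_films("Kevin Bacon"))  # [{'title': 'Film A'}]
```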
PERSISTENCE
CREATE (n:Actor {name: "David"})
RETURN n

MATCH (m:Movie), (a:Actor {name: "David"})
CREATE (a)-[:starred_in]->(m)
RETURN a, m
AGGREGATION
MATCH (n:Actor)-[:starred_in]->(m:Movie)
RETURN n, sum(m.revenue)
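In Cypher the non-aggregated items in RETURN act as the grouping key, so the query above yields one total per actor. A small Python sketch of the same grouping, using made-up rows and revenue figures:

```python
from collections import defaultdict

# Hypothetical rows, one per (actor, movie) match of the pattern;
# the revenue figures are made up
matches = [
    ("Actor 1", {"title": "Film A", "revenue": 100}),
    ("Actor 1", {"title": "Film B", "revenue": 50}),
    ("Actor 2", {"title": "Film B", "revenue": 50}),
]

totals = defaultdict(int)
for actor, movie in matches:
    totals[actor] += movie["revenue"]  # sum(m.revenue), grouped by n

print(dict(totals))  # {'Actor 1': 150, 'Actor 2': 50}
```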
LOAD FROM CSV
LOAD CSV FROM 'file:///foo.csv' AS line
CREATE (:Actor {name: line[1]})
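Without headers, each `line` is a list of strings, and `line[1]` is the second column because list indexing starts at 0. The same row-by-row shape in Python, with an in-memory stand-in for `foo.csv` whose contents are made up:

```python
import csv
import io

# Stand-in for foo.csv (contents are made up); no header row
foo_csv = io.StringIO("1,Actor 1\n2,Actor 2\n")

actors = []
for line in csv.reader(foo_csv):
    # line is a list of strings; index 1 is the second column
    actors.append({"label": "Actor", "name": line[1]})

print([a["name"] for a in actors])  # ['Actor 1', 'Actor 2']
```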
LET’S SEE IT IN ACTION!
THE GRAPHING ECOSYSTEM
DIFFERENT LANGUAGE SUPPORT
• Java
  • Spring Data for full ORM
  • Hibernate OGM
  • Embedded Java API
  • Kundera
• .NET - Neo4jClient
• JavaScript - Seraph.js, node-neo4j
• Clojure - Neocons
• Haskell, Go, PHP and more…
GRAPHENEDB
• Remote hosting of Neo4j on Heroku, AWS or Azure
• Monitoring, support, backups, scalability
VISUALISATION TOOLS
• Lots of tools out there to take subgraphs and turn them into pretty views.
VISUALISATION TOOLS: ALCHEMYJS
VISUALISATION TOOLS: LINKURIOUS
GRAPHAWARE
• Java libraries that make developing with graphs easier:
  • “TimeTree”
  • “GraphGen”
  • “Reco”
IN CONCLUSION…
• Graphs model a lot of domains more accurately
• Neo4j is a robust and mature way of storing them
• It’s got a thriving ecosystem and community
• Go forth and play!
ANY QUESTIONS?
@SwamWithTurtles
swamwithturtles.com
