Intro to Graph Databases in a NOSQL world
19th of May 2015
  
Agenda
•  About Graphs
•  About Graph Databases
   –  About Neo4j
•  Graph Querying
   –  Short demonstration
•  Case Studies
•  Q&A
  
Introduction: about Graphs
  
Meet Leonhard Euler (again?)
•  Swiss mathematician
•  Inventor of Graph Theory (1736)
  
Königsberg (Prussia), 1736
[Figure: the four land masses A, B, C and D, first on their own and then joined by the seven bridges, numbered 1 to 7]
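Euler's insight was that the bridge puzzle depends only on how many bridges touch each land mass: a walk crossing every bridge exactly once exists only if zero or two land masses touch an odd number of bridges. The minimal Python sketch below checks that condition. The assignment of bridges to land masses is an assumption taken from Euler's own 1736 labelling (A the island, B and C the two river banks, D the land between the river's branches), because the slide's numbering of bridges 1 to 7 cannot be recovered from the text.

```python
from collections import Counter

# Assumed layout (Euler's labelling): A is the island, B and C the two
# river banks, D the land between the river's branches.
# Each tuple is one bridge; parallel bridges appear more than once.
BRIDGES = [
    ("A", "B"), ("A", "B"),  # two bridges between island and bank B
    ("A", "C"), ("A", "C"),  # two bridges between island and bank C
    ("A", "D"),
    ("B", "D"),
    ("C", "D"),
]


def degrees(edges):
    """Count how many bridge ends touch each land mass (its degree)."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def has_euler_walk(edges):
    """Euler's condition, assuming the graph is connected: a walk using
    every edge exactly once exists iff 0 or 2 vertices have odd degree."""
    odd = [node for node, d in degrees(edges).items() if d % 2 == 1]
    return len(odd) in (0, 2)


print(dict(degrees(BRIDGES)))  # {'A': 5, 'B': 3, 'C': 3, 'D': 3}
print(has_euler_walk(BRIDGES))  # False: all four land masses have odd degree
```

Adding a single hypothetical bridge between B and C would leave only A and D with odd degree, and the walk would become possible.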

About Graph Databases
  
Complementing Relational Databases
[Figure axes: VOLUME, COMPLEXITY]
  
NOT ONLY SQL
  
Living in a NOSQL World
[Figure: database families placed by Size and Complexity: Key-Value Store, Column Family, Document Databases, Graph Databases, Relational Databases (RDBMS) and Navigational Databases, with the annotation “90% of use cases”]
  
So what is a graph database?
•  OLTP database
   –  “end-user” transactions
•  Model, store, manage data as a graph
  
What is a graph?
[Figure labels: Vertex, Edge]
  
What is a graph?
[Figure labels: Node, Relationship]
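Both vocabularies describe the same structure: a set of vertices (nodes) and the edges (relationships) between them. A minimal Python sketch, assuming a directed graph held in memory as adjacency lists; the `Graph` class and the node names are hypothetical illustrations, not a Neo4j API.

```python
from collections import defaultdict


class Graph:
    """A minimal directed graph: nodes (vertices) plus relationships
    (edges), stored as adjacency lists keyed by the start node."""

    def __init__(self):
        self.nodes = set()
        self.out_edges = defaultdict(list)  # node -> list of end nodes

    def add_node(self, node):
        self.nodes.add(node)

    def add_edge(self, start, end):
        # Adding an edge also registers both endpoints as nodes.
        self.add_node(start)
        self.add_node(end)
        self.out_edges[start].append(end)

    def neighbours(self, node):
        # .get() avoids creating empty entries in the defaultdict.
        return list(self.out_edges.get(node, []))


g = Graph()
g.add_edge("Alice", "Bob")
g.add_edge("Alice", "Carol")
print(g.neighbours("Alice"))  # ['Bob', 'Carol']
```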
  
Contrast with Relational
Graphs are often referred to as “Whiteboard Friendly”: the data model reflects the way a domain expert would naturally draw their data on a whiteboard.
“The schema is the data.” Schema flexibility allows the system to change in response to a changing environment.
  
Neo4j is a Graph Database
•  JVM-based
•  ACID transactions
•  Rich Java APIs
•  Query language
•  Using the Labeled Property Graph model
  
Cypher: THE graph query language
•  Learning from RDBMS’ evolution
   –  Introduction of SQL!
•  Key characteristics
   –  Declarative: tell it what you want, not how to get it
   –  Expressive: optimize for reading
   –  Pattern matching: easy on your brain!
   –  Idempotent: state changes expressed idempotently
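The idempotence point is easiest to see by contrast. Cypher's MERGE clause matches an existing pattern or creates it if it is absent, so re-running the same statement leaves the graph unchanged, whereas CREATE always adds new data. The Python sketch below mimics that difference on plain containers; `merge_node`, `create_node` and the `Person`/`alice` values are hypothetical illustrations, not Neo4j's implementation.

```python
def create_node(nodes: list, label: str, key: str) -> None:
    """Not idempotent: every call appends another node."""
    nodes.append({"label": label, "key": key})


def merge_node(store: dict, label: str, key: str) -> dict:
    """Idempotent get-or-create, in the spirit of MERGE: the node
    identified by (label, key) is created on the first call and
    returned unchanged on every later call."""
    node_id = (label, key)
    if node_id not in store:
        store[node_id] = {"label": label, "key": key}
    return store[node_id]


created = []
create_node(created, "Person", "alice")
create_node(created, "Person", "alice")  # a retried statement duplicates data
print(len(created))  # 2

merged = {}
merge_node(merged, "Person", "alice")
merge_node(merged, "Person", "alice")  # a retried statement changes nothing
print(len(merged))  # 1
```

This is why idempotent writes matter in practice: a client can safely retry a statement after a timeout without knowing whether the first attempt succeeded.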
  
Labeled Property Graph Model
[Figure: example graph whose nodes are labelled Author, Book and Reader]

Labeled Property Graph Summary
•  Nodes
   –  Containers for properties
   –  Grouped together in subgraphs by “Labels”
•  Properties
   –  Key-value pairs
   –  Primitive and array values
•  Relationships
   –  Name
   –  Direction
   –  May also contain properties
•  Relationships (ctd.)
   –  Must have a start node and an end node (no dangling relationships)
   –  Start node and end node can be the same (e.g. ‘self’ relationships)
   –  Nodes can be connected by more than one relationship
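The rules in this summary map directly onto a small data structure. The Python sketch below is one possible encoding using dataclasses; the class names, the `WROTE`/`REVIEWED`/`CITES` relationship names and the example properties are hypothetical, and Neo4j's own APIs look different.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality keeps nodes hashable
class Node:
    labels: set  # e.g. {"Author"}: labels group nodes into subgraphs
    properties: dict = field(default_factory=dict)  # primitive or array values


@dataclass(eq=False)
class Relationship:
    name: str  # the relationship's name (its type)
    start: Node  # direction runs from start to end
    end: Node
    properties: dict = field(default_factory=dict)

    def __post_init__(self):
        # No dangling relationships: both endpoints must be nodes.
        if not isinstance(self.start, Node) or not isinstance(self.end, Node):
            raise ValueError("a relationship needs a start node and an end node")


author = Node({"Author"}, {"name": "Alice", "languages": ["en", "nl"]})
book = Node({"Book"}, {"title": "Graph Basics"})

wrote = Relationship("WROTE", author, book, {"role": "main author"})
reviewed = Relationship("REVIEWED", author, book)  # second relationship, same pair
cites = Relationship("CITES", book, book)  # start and end are the same node

try:
    Relationship("WROTE", author, None)  # dangling: rejected
except ValueError as err:
    print(err)
```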
  
What are graphs good for?
Complexity
  
Data Complexity
complexity = f(size, semi-structure, connectedness)

The Real Complexity
  
Semi-Structure
Email: rik@neotechnology.com
Email: rik@vanbruggen.be
Twitter: @rvanbruggen
Skype: rvanbruggen

[Figure: relational tables USER, CONTACT and CONTACT_TYPE]
The same details as one sparse row:
FIRST_NAME  LAST_NAME    USER_ID  EMAIL_1                EMAIL_2            TWITTER       FACEBOOK  SKYPE
Rik         Van Bruggen  315      rik@neotechnology.com  rik@vanbruggen.be  @rvanbruggen  NULL      rvanbruggen
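A graph avoids both the NULL cells and the fixed EMAIL_1/EMAIL_2 columns by turning each contact detail into its own node. The Python sketch below converts the sparse row above into that shape; the `User`/`Contact` labels and the `HAS_CONTACT` relationship name are hypothetical choices, not part of the original model.

```python
# The sparse row from the slide; None stands for NULL.
row = {
    "FIRST_NAME": "Rik", "LAST_NAME": "Van Bruggen", "USER_ID": 315,
    "EMAIL_1": "rik@neotechnology.com", "EMAIL_2": "rik@vanbruggen.be",
    "TWITTER": "@rvanbruggen", "FACEBOOK": None, "SKYPE": "rvanbruggen",
}

# Which column holds which kind of contact detail.
CONTACT_COLUMNS = {
    "EMAIL_1": "Email", "EMAIL_2": "Email", "TWITTER": "Twitter",
    "FACEBOOK": "Facebook", "SKYPE": "Skype",
}


def to_graph(row):
    """Return a User node plus (relationship name, Contact node) pairs.
    A NULL column produces no node at all instead of an empty cell."""
    user = {"labels": {"User"}, "user_id": row["USER_ID"],
            "first_name": row["FIRST_NAME"], "last_name": row["LAST_NAME"]}
    contacts = []
    for column, contact_type in CONTACT_COLUMNS.items():
        value = row.get(column)
        if value is None:
            continue
        contact = {"labels": {"Contact"}, "type": contact_type, "value": value}
        contacts.append(("HAS_CONTACT", contact))
    return user, contacts


user, contacts = to_graph(row)
for rel, contact in contacts:
    print(rel, contact["type"], contact["value"])
```

A third e-mail address would simply be one more Contact node, with no new column and no schema change.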
  
  
Examples of Connectedness
  
When Should I Use Graph Databases?
•  Densely-connected, semi-structured domains
   –  Lots of join tables? Connectedness
   –  Lots of sparse tables? Semi-structure
•  Data model volatility
   –  Easy to evolve
•  “Graphy” query patterns
•  Deep join complexity and performance
   –  Pathfinding operations
   –  Millions of ‘joins’ per second
   –  Consistent query times as the dataset grows
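For the pathfinding bullet, the classic building block is breadth-first search, which finds a path with the fewest hops in an unweighted graph. A minimal Python sketch over an adjacency dictionary; the people in the example are hypothetical, and this is the textbook algorithm rather than a description of how Neo4j implements Cypher's `shortestPath()`.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Breadth-first search over an unweighted, directed graph. Returns
    the nodes on one shortest path from start to goal, or None."""
    previous = {start: None}  # node -> node it was reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the 'previous' links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in adjacency.get(node, ()):
            if neighbour not in previous:  # first visit is the shortest
                previous[neighbour] = node
                queue.append(neighbour)
    return None


friends = {"ann": ["bob", "cat"], "bob": ["dan"], "cat": ["dan"], "dan": ["eve"]}
print(shortest_path(friends, "ann", "eve"))  # ['ann', 'bob', 'dan', 'eve']
print(shortest_path(friends, "eve", "ann"))  # None
```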
  
Graph Querying
  
Querying a Graph
•  “Graph local” vs “Graph global”
   –  Contextualized “ego-centric” queries
•  “Parachute” into the graph
   –  Start node(s)
   –  Found through index lookups
•  Crawl the surrounding graph
   –  2 million+ joins per second
   –  No more index lookups: index-free adjacency
Queries: Pattern Matching
[Figure: a graph pattern, labelled “Pattern”]
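In Cypher a pattern is written much as it would be drawn, for example `MATCH (a:Author)-[:WROTE]->(b:Book)<-[:READ]-(r:Reader) RETURN a, b, r`. The labels Author, Book and Reader come from the model figure earlier; the relationship names WROTE and READ and the data below are hypothetical. The naive Python sketch shows what such a match computes; a real engine would instead anchor on a start node and expand outward.

```python
# Hypothetical data: node id -> label, plus typed, directed relationships.
labels = {"a1": "Author", "a2": "Author", "b1": "Book", "b2": "Book", "r1": "Reader"}
rels = [("a1", "WROTE", "b1"), ("a2", "WROTE", "b2"), ("r1", "READ", "b1")]


def match_read_authors(labels, rels):
    """Find every (author, book, reader) such that
    (author:Author)-[:WROTE]->(book:Book)<-[:READ]-(reader:Reader)."""
    results = []
    for author, rel, book in rels:
        if rel != "WROTE" or labels.get(author) != "Author" or labels.get(book) != "Book":
            continue
        for reader, rel2, target in rels:
            if rel2 == "READ" and target == book and labels.get(reader) == "Reader":
                results.append((author, book, reader))
    return results


print(match_read_authors(labels, rels))  # [('a1', 'b1', 'r1')]
```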
  
Short demo
  
Case Studies
  
www.neo4j.com
www.meetup.com/graphdb-belgium
rik@neotechnology.com or +32 478 686800

Q&A, Conclusion, Next Steps
  

Intro to Graphs for Fedict
