Neo4j in Depth
Internals and comparing Neo4j vs RDF plus some graph database use cases.

Published in: Education
  • Hi, it is great material. I have a question about recommendation in graph databases. I think that recommendation is a special area in which several techniques and specialized engines already exist, such as collaborative filtering and other machine learning techniques. I wonder what the real strength of recommendation using Neo4j is compared to these recommendation methods.
  • Hi Max, really love your slides :) I do wonder though, about the topic of relational db joins. You've assumed that the joins will be indexed with a BTree - but it's possible that they use a hash index, right? If so, I wouldn't imagine there'd be performance degradation?

Neo4j in Depth

  1. Neo4j in Depth
     Max De Marzi
  2. About Me
     • Max De Marzi - Neo4j Field Engineer
     • My Blog: http://maxdemarzi.com
     • Find me on Twitter: @maxdemarzi
     • Email me: maxdemarzi@gmail.com
     • GitHub: http://github.com/maxdemarzi
  3. TLDR:
  4. Property Graph Data Model
  5. What you already know
  6. The Problem
     • All JOINs are executed every time you query (traverse) the relationship
     • Executing a JOIN means searching for a key in another table
     • With indices, executing a JOIN means looking up a key
     • B-Tree index: O(log(n))
     • More entries => more lookups => slower JOINs
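
The cost described on this slide can be sketched in a few lines of Python. This is only an illustration, not how any particular database implements JOINs: a sorted key list searched with `bisect` stands in for a B-Tree index, the rows are the People / Conferences / Attend example from this deck, and the function names `index_lookup` and `conferences_attended` are made up for the sketch.

```python
from bisect import bisect_left

# Rows from the Conferences and Attend tables in this deck.
conferences = [(326, "Big Data Tech Con"), (725, "NoSQL Now"), (981, "Chariot Data IO")]
attend = [(143, 981), (143, 725), (143, 326)]  # (person id, conference id)

# A sorted key list stands in for a B-Tree index on the conference id.
conference_keys = [key for key, _ in conferences]


def index_lookup(key):
    """Binary search over the sorted keys: O(log n) work per lookup."""
    pos = bisect_left(conference_keys, key)
    if pos < len(conference_keys) and conference_keys[pos] == key:
        return conferences[pos][1]
    return None


def conferences_attended(person_id):
    """Emulate the JOIN: one index lookup for every matching Attend row."""
    return [index_lookup(conf_id) for pid, conf_id in attend if pid == person_id]


print(conferences_attended(143))
```

Every call repeats the lookups, and each lookup gets slower as the index grows, which is the point the slide makes about JOINs being executed on every query.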
  7. [Figure: three relational tables]
     • People: 143 Max
     • Conferences: 326 Big Data Tech Con; 725 NoSQL Now; 981 Chariot Data IO
     • Attend: (143, 981); (143, 725); (143, 326)
  8. [Figure: the same rows drawn as records, with the Attend pairs (143, 981), (143, 725) and (143, 326) linking Max to Big Data Tech Con, NoSQL Now and Chariot Data IO]
  9. A Property Graph
     Nodes:
     • uid: MDM, name: Max
     • uid: BDTC, where: Burlingame
     • uid: NSN, where: San Francisco
     • uid: CDIO, where: Philadelphia
     Relationships:
     • member (three of them)
  10. Neo4j Secret Sauce
      • Pointers instead of look-ups
      • Fixed-size records for fast access
      • Do all your “Joining” on creation
      • Spin spin spin through this data structure
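
A rough Python sketch of the idea behind these bullets follows. The record layouts are hypothetical and far simpler than Neo4j's real store files; the names `create_node`, `create_relationship` and `neighbours` are invented for the illustration. It shows fixed-size records addressed by `id * record_size`, relationships linked into a chain when they are written, and reads that only follow pointers.

```python
import struct

# Hypothetical, simplified layouts; the real store format is more involved.
NODE = struct.Struct("<i")    # id of the node's first relationship (-1 = none)
REL = struct.Struct("<iii")   # start node, end node, next relationship of the start node

node_store = bytearray()
rel_store = bytearray()


def create_node():
    """Append a fixed-size node record and return its id."""
    node_store.extend(NODE.pack(-1))
    return len(node_store) // NODE.size - 1


def create_relationship(start, end):
    """Do the 'joining' at write time: link the new record into start's chain."""
    (first_rel,) = NODE.unpack_from(node_store, start * NODE.size)
    rel_store.extend(REL.pack(start, end, first_rel))
    rel_id = len(rel_store) // REL.size - 1
    NODE.pack_into(node_store, start * NODE.size, rel_id)
    return rel_id


def neighbours(node_id):
    """Follow record pointers; each hop is an offset computation, not an index search."""
    (rel_id,) = NODE.unpack_from(node_store, node_id * NODE.size)
    found = []
    while rel_id != -1:
        _, end, rel_id = REL.unpack_from(rel_store, rel_id * REL.size)
        found.append(end)
    return found


max_node = create_node()
conference_nodes = [create_node() for _ in range(3)]
for conference in conference_nodes:
    create_relationship(max_node, conference)

print(neighbours(max_node))
```

The sketch only chains outgoing relationships from the start node, which is enough to show why traversal cost depends on the number of relationships visited rather than on the total size of the store.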
  11. Relational Databases Can’t Handle Relationships Well
      • Cannot model or store data and relationships without complexity
      • Performance degrades with number & levels of relationships, and database size
      • Query complexity grows with need for JOINs
      • Adding new types of data and relationships requires schema redesign, increasing time to market
      … making traditional databases inappropriate when relationships are valuable in real-time
      Slow development. Poor performance. Low scalability. Hard to maintain.
  12. NoSQL Databases Don’t Handle Relationships
      • No data structures to model or store relationships
      • No query constructs to support relationships
      • Relating data requires “JOIN logic” in the application
      • No ACID support for transactions
      … making NoSQL databases inappropriate when relationships are valuable in real-time
  13. Real-Time Query Performance
      Performance must hold steady with scale
      [Chart: response time against connectedness and size of data set, comparing “Relational and Other NoSQL Databases” with Neo4j]
      • Low end of the axis: 0 to 2 hops, 0 to 3 degrees, thousands of connections
      • High end of the axis: tens to hundreds of hops, thousands of degrees, billions of connections
      • Neo4j is 1000x faster; reduces minutes to milliseconds
  14. Re-Imagine Your Data as a Graph
      Neo4j is an enterprise-grade graph database that enables you to:
      • Model and store your data as a graph
      • Query relationships with ease and in real-time
      • Seamlessly evolve applications to support new requirements by adding new kinds of data and relationships
      Agile development. High performance. Vertical and horizontal scale. Seamless evolution.
  15. Neo4j Overview
      Product
      • Neo4j - World’s leading graph database
      • 1M+ downloads, adding 50k+ per month
      • 150+ enterprise subscription customers including over 50 of the Global 2000
      Company
      • Neo Technology, Creator of Neo4j
      • 80 employees with HQ in Silicon Valley, London, Munich, Paris and Malmö
      • $45M in funding from Fidelity, Sunstone, Conor, Creandum, Dawn Capital
  16. Neo4j: The Graph Database Leader
      [Timeline figure with year marks 2000, 2003, 2007, 2009, 2011, 2013, 2014 and 2015 and three tracks: Commercial Leadership, Funding, Technical Leadership. The transcript does not preserve which milestone belongs to which year; the milestones are listed here as extracted.]
      • GraphConnect, first conference for graph DBs
      • First Global 2000 customer
      • Introduced Cypher, a declarative query language for property graphs
      • Published O’Reilly book on Graph Databases
      • $11M Series A from Fidelity, Sunstone and Conor
      • $11M Series B from Fidelity, Sunstone and Conor
      • First native graph DB in 24/7 production
      • Invented property graph model
      • Contributed first graph DB to open source
      • $2.5M Seed Round from Sunstone and Conor
      • Extended graph data model to labeled property graph
      • 150+ customers, 50K+ monthly downloads, 500+ graph DB events worldwide
      • $20M Series C led by Creandum, with Dawn and existing investors
  17. Neo4j Leads the Graph Database Revolution
      • “Forrester estimates that over 25% of enterprises will be using graph databases by 2017”
      • “Neo4j is the current market leader in graph databases.”
      • “Graph analysis is possibly the single most effective competitive differentiator for organizations pursuing data-driven operations and decisions after the design of data capture.”
      Sources:
      1. IT Market Clock for Database Management Systems, 2014
      2. TechRadar™: Enterprise DBMS, Q1 2014
      3. Graph Databases – and Their Potential to Transform How We Capture Interdependencies (Enterprise Management Associates)
  18. Building a Recommendation Engine in 2 Minutes with Neo4j
      Developer Experience: Neo4j UI with Cypher Query Language
      Two-Minute Video Demo: https://www.youtube.com/watch?v=qbZ_Q-YnHYo
  19. Neo4j: Key Product Features
      • Native Graph Storage: ensures data consistency and performance
      • Native Graph Processing: millions of hops per second, in real time
      • “Whiteboard Friendly” Data Modeling: model data as it naturally occurs
      • High Data Integrity: fully ACID transactions
      • The Graph Query Language, Cypher: requires 10x to 100x less code than SQL
      • Scalability and High Availability: vertical and horizontal scaling optimized for graphs
      • Built-in ETL: seamless import from other databases
      • Integration: drivers and APIs for popular languages
  20. Property Graph Model Components
      Nodes
      • The objects in the graph
      • Can have properties
      • Can be labeled
      Relationships
      • Relate nodes by type and direction
      • Can have properties
      [Diagram: two PERSON nodes (name: “Dan”, born: May 29, 1970, twitter: “@dan”; name: “Ann”, born: Dec 5, 1975) and a CAR node (brand: “Volvo”, model: “V70”), connected by LOVES, LOVES, LIVES WITH, DRIVES and OWNS relationships; one relationship carries since: Jan 10, 2011]
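
The components on this slide map naturally onto two small record types. The Python sketch below is an illustration only, not Neo4j's API. The class names and the `outgoing` helper are invented, and because the transcript lists the relationship types without their endpoints, the wiring between Dan, Ann and the car (including which relationship carries `since`) is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    type: str          # relationships relate nodes by type...
    start: Node        # ...and direction (start -> end)
    end: Node
    properties: dict = field(default_factory=dict)


dan = Node({"PERSON"}, {"name": "Dan", "born": "May 29, 1970", "twitter": "@dan"})
ann = Node({"PERSON"}, {"name": "Ann", "born": "Dec 5, 1975"})
car = Node({"CAR"}, {"brand": "Volvo", "model": "V70"})

# Assumed wiring: the slide text gives the types but not the endpoints.
relationships = [
    Relationship("LOVES", dan, ann),
    Relationship("LOVES", ann, dan),
    Relationship("LIVES_WITH", dan, ann),
    Relationship("DRIVES", ann, car, {"since": "Jan 10, 2011"}),
    Relationship("OWNS", dan, car),
]


def outgoing(node, rel_type):
    """Relationships of one type that start at the given node."""
    return [r for r in relationships if r.start is node and r.type == rel_type]


drives = outgoing(ann, "DRIVES")[0]
print(drives.end.properties["brand"], drives.properties["since"])
```

Properties live on both record types, so a fact such as `since` sits on the relationship itself rather than on either node.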
  21. Triple Store/RDF Model
      • Resource Description Framework
      • Subject, Predicate, Object
      • Standard Data Model
      • Names for subjects, predicates, objects must be URIs
      • Names must be Global
      • No properties on the Relationships
      • Like “3rd Normal Form” for Relational Databases (but really more like 5/6th)
  22. Property Graph Data Model (Movies)
  23. RDF Data Model (Movies)
  24. Property Graph Vs Triple Store
      • Property Graph is a more generic case of the Triple Store
      • The lack of properties on relationships in Triple Stores reduces (or complicates) their expressive power
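
The second bullet can be made concrete with a small Python sketch. It contrasts a property-graph relationship that carries a `since` property with one common triple-based workaround, in which the relationship is turned into a resource of its own. The `ex:` names and both helper functions are invented for the illustration; real RDF data would use full URIs, and other encodings exist.

```python
# Property graph: the relationship itself carries the property.
edge = {"start": "Ann", "type": "DRIVES", "end": "Car",
        "properties": {"since": "2011-01-10"}}

# Triples have no slot for that, so the relationship becomes its own resource.
# The "ex:" names are made up for this sketch.
triples = [
    ("ex:drives1", "ex:driver", "ex:Ann"),
    ("ex:drives1", "ex:vehicle", "ex:Car"),
    ("ex:drives1", "ex:since", "2011-01-10"),
]


def since_from_edge(relationship):
    """One step: read the property straight off the relationship."""
    return relationship["properties"]["since"]


def since_from_triples(statements, driver, vehicle):
    """Find the resource that links driver and vehicle, then read its 'since'."""
    facts = {}
    for subject, predicate, obj in statements:
        facts.setdefault(subject, {})[predicate] = obj
    for props in facts.values():
        if props.get("ex:driver") == driver and props.get("ex:vehicle") == vehicle:
            return props.get("ex:since")
    return None


print(since_from_edge(edge), since_from_triples(triples, "ex:Ann", "ex:Car"))
```

One relationship with one property becomes three statements and an extra lookup step, which is the complication the slide refers to.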
  25. Query Languages
      Graph Databases:
      • Cypher - declarative, pattern matching, easy to understand
      • Gremlin - imperative, step driven, math inspired
      • Native APIs (Java, REST)
      Triple Stores:
      • SPARQL (standard)
      • PROLOG (or prolog-like languages)
  26. General Use Cases
      Graph Databases:
      • Local Queries (anchor on a node or set of nodes then traverse)
      • Realtime (<20ms) requirements
      • Complex, deep traversals
      • Flexible graph models
      Triple Stores:
      • Global Queries (find pattern in large volume of information)
      • Browsing Content
      • Inference Discovery
  27. How do you model Flight Data?
  28. How do you model Flight Data?
  29. How do you model Flight Data?
  30. How do you model Flight Data?
  31. How do you model Flight Data?
  32. How do you model Flight Data?
  33. How do you model Flight Data?
  34. How do you model Comic Books?
      How do you model a world where anything can happen?
  35. Graph Databases allow Model Flexibility
      Watch the presentation at: https://vimeo.com/79399404
  36. Java CORE API
      Direct access to Nodes and Relationships
  37. Java Core API
      • Step by Step from GraphDatabaseService
      • Start a transaction (reads and writes)
      • findNode(Label, Property, Value)
      • findNodes(Label, Property, Value)
      • findNodes(Label)
      • getNodeById(Long)
      • getRelationships(Direction, Type)
      • getProperty(Property, (optional) Default Value)
  38. Example (get the friends of a user)
  39. Traversal API
      Describe Traversals
  40. Traversal API
      • Start with the Simple Defaults (order, relationships, depth, uniqueness, etc)
      • Custom Expanders
        • Where should I go next
      • Custom Evaluators
        • I’ve gone there… should I accept this path?
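
The split between expanders and evaluators can be sketched without any Neo4j dependency. The Python below is a conceptual illustration, not the Traversal API itself: the tiny graph, the names Ann, Dan and Eve, and the functions `friends_expander`, `depth_two_evaluator` and `traverse` are all invented. The expander answers "where should I go next", and the evaluator decides whether a path is accepted and whether to keep going past it.

```python
from collections import deque

# Invented sample graph: node -> list of (relationship type, target node).
graph = {
    "Max": [("FRIENDS", "Ann"), ("FRIENDS", "Dan")],
    "Ann": [("FRIENDS", "Eve")],
    "Dan": [("WORKS_WITH", "Eve")],
    "Eve": [],
}


def friends_expander(path):
    """Expander: where should I go next? Here, only along FRIENDS relationships."""
    return [target for rel_type, target in graph[path[-1]] if rel_type == "FRIENDS"]


def depth_two_evaluator(path):
    """Evaluator: returns (accept this path?, continue past it?)."""
    depth = len(path) - 1
    return depth == 2, depth < 2


def traverse(start, expander, evaluator):
    """Breadth-first traversal that visits each node at most once."""
    results, seen, queue = [], {start}, deque([[start]])
    while queue:
        path = queue.popleft()
        accept, keep_going = evaluator(path)
        if accept:
            results.append(path)
        if keep_going:
            for nxt in expander(path):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])
    return results


print(traverse("Max", friends_expander, depth_two_evaluator))
```

Swapping in a different expander or evaluator changes what is explored and what is returned without touching the traversal loop, which is the design the slide describes.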
  41. Traversal API Example
  42. Cypher Query Language
      ASCII Art Pattern Matching
  43. Cypher: Powerful and Expressive Query Language
        MATCH (:Person { name:"Dan"} ) -[:LOVES]-> (:Person { name:"Ann"} )
      [Diagram: Dan LOVES Ann; each node is annotated with its label and property]
  44. Express Complex Queries Easily with Cypher
      Find all direct reports and how many people they manage, up to 3 levels down
      Cypher Query:
        MATCH (boss)-[:MANAGES*0..3]->(sub),
              (sub)-[:MANAGES*1..3]->(report)
        WHERE boss.name = "John Doe"
        RETURN sub.name AS Subordinate, count(report) AS Total
      SQL Query: [not captured in this transcript]
  45. Hello World Recommendation
  46. Hello World Recommendation
  47. Movie Data Model
  48. Cypher Query: Movie Recommendation
      What are the Top 25 Movies
      • that I haven't seen
      • with the same genres as Toy Story
      • given high ratings
      • by people who liked Toy Story
        MATCH (watched:Movie {title:"Toy Story"}) <-[r1:RATED]- () -[r2:RATED]-> (unseen:Movie)
        WHERE r1.rating > 7 AND r2.rating > 7
        AND watched.genres = unseen.genres
        AND NOT( (:Person {username:"maxdemarzi"}) -[:RATED|WATCHED]-> (unseen) )
        RETURN unseen.title, COUNT(*)
        ORDER BY COUNT(*) DESC
        LIMIT 25
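
To make the logic of this query easier to follow, here is a plain-Python sketch of the same filtering and counting over a small in-memory data set. The sample ratings and genres are invented, the function name `recommend` is made up, and a dictionary of ratings stands in for both RATED and WATCHED relationships; it mirrors the query's steps rather than how Neo4j executes them.

```python
from collections import Counter

# Invented sample data: genres[title] and ratings[person][title].
genres = {
    "Toy Story": frozenset({"Animation", "Comedy"}),
    "Monsters, Inc.": frozenset({"Animation", "Comedy"}),
    "Shrek": frozenset({"Animation", "Comedy"}),
    "Heat": frozenset({"Crime"}),
}
ratings = {
    "maxdemarzi": {"Toy Story": 9, "Shrek": 8},
    "alice": {"Toy Story": 8, "Monsters, Inc.": 9, "Heat": 10},
    "bob": {"Toy Story": 10, "Monsters, Inc.": 8, "Shrek": 9},
    "carol": {"Toy Story": 5, "Monsters, Inc.": 10},
}


def recommend(user, watched, limit=25, threshold=7):
    """Count high ratings of same-genre movies by people who liked `watched`."""
    seen = set(ratings[user])  # stands in for the RATED|WATCHED check
    counts = Counter()
    for person, rated in ratings.items():
        if rated.get(watched, 0) <= threshold:
            continue  # r1.rating > 7: this person did not like the watched movie
        for title, rating in rated.items():
            if (title != watched
                    and rating > threshold                  # r2.rating > 7
                    and genres[title] == genres[watched]    # same genres
                    and title not in seen):                 # not already seen
                counts[title] += 1
    return counts.most_common(limit)  # ORDER BY COUNT(*) DESC LIMIT 25


print(recommend("maxdemarzi", "Toy Story"))
```

In the sample data, carol's low rating of Toy Story excludes her, and Shrek is dropped because the user has already rated it.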
  49. Movie Data Model
  50. Cypher Query: k-NN Recommendation
      What are the Top 25 Movies
      • that Zoltan Varju has not seen
      • using the average rating
      • by my top 3 neighbors
        MATCH (m:Movie) <-[r:RATED]- (b:Person) -[s:SIMILARITY]- (p:Person {name:'Zoltan Varju'})
        WHERE NOT( (p) -[:RATED|WATCHED]-> (m) )
        WITH m, s.similarity AS similarity, r.rating AS rating
        ORDER BY m.name, similarity DESC
        WITH m.name AS movie, COLLECT(rating)[0..3] AS ratings
        WITH movie, REDUCE(s = 0, i IN ratings | s + i)*1.0 / LENGTH(ratings) AS recommendation
        ORDER BY recommendation DESC
        RETURN movie, recommendation
        LIMIT 25
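
The steps of this query can be traced in plain Python. The sketch below uses invented similarity scores and ratings and a made-up function name, `knn_recommend`; it follows the query as written, which for each unseen movie keeps the ratings of the three most similar neighbours who rated it and averages them.

```python
# Invented sample data.
similarity = {"ann": 0.9, "bob": 0.8, "cat": 0.4, "dov": 0.1}  # neighbour -> similarity to the target user
ratings = {
    "ann": {"Alien": 9, "Brazil": 6},
    "bob": {"Alien": 8},
    "cat": {"Alien": 7, "Brazil": 8},
    "dov": {"Alien": 1, "Casablanca": 10},
}
already_seen = {"Casablanca"}  # movies the target user has rated or watched


def knn_recommend(k=3, limit=25):
    """Average the ratings of the k most similar neighbours per unseen movie."""
    by_movie = {}
    for person, sim in similarity.items():
        for movie, rating in ratings.get(person, {}).items():
            if movie not in already_seen:
                by_movie.setdefault(movie, []).append((sim, rating))
    scored = []
    for movie, pairs in by_movie.items():
        pairs.sort(key=lambda pair: pair[0], reverse=True)   # ORDER BY similarity DESC
        top = [rating for _, rating in pairs[:k]]            # COLLECT(rating)[0..3]
        scored.append((movie, sum(top) / len(top)))          # average of the kept ratings
    scored.sort(key=lambda item: item[1], reverse=True)      # ORDER BY recommendation DESC
    return scored[:limit]


print(knn_recommend())
```

Note that the neighbours are chosen per movie, so a movie rated by only two neighbours is averaged over those two.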
  51. Neo4j Interface
      Server, Service, Library
  52. High Speed Fraud - 1000 R/S
      http://maxdemarzi.com/2014/02/12/online-payment-risk-management-with-neo4j/
  53. High Speed Fraud - 8000 R/S
      http://maxdemarzi.com/2014/02/27/neo4j-at-ludicrous-speed/
  54. High Speed Fraud - 28000 R/S
      http://maxdemarzi.com/2014/03/10/its-over-9000-neo4j-on-websockets/
  55. Neo4j
      Additional Features
  56. Neo4j Clustering
      Architecture Optimized for Speed & Availability at Scale
      Performance Benefits:
      • No network hops within queries
      • Real-time operations with fast and consistent response times
      • Cache sharding spreads cache across cluster for very large graphs
      Clustering Features:
      • Master-slave replication with master re-election and failover
      • Each instance has its own local cache
      • Horizontal scaling & disaster recovery
      [Diagram: a load balancer in front of three Neo4j instances]
  57. Getting Data into Neo4j
      Cypher-Based “LOAD CSV” Capability
      • Transactional (ACID) writes
      • Initial and incremental loads of up to 10 million nodes and relationships
      Command-Line Bulk Loader (neo4j-import)
      • For initial database population
      • For loads with 10B+ records
      • Up to 1M records per second
      4.58 million things and their relationships… Loads in 100 seconds!
  58. Neo4j Fits into Your Enterprise Environment
      [Architecture diagram. Labels: Databases; Data Storage and Business Rules Execution; Data Mining and Aggregation; Application; Graph Database Cluster (Neo4j, Neo4j, Neo4j); Ad Hoc Analysis; ETL; Bulk Analytic Infrastructure (Graph Compute Engine, Hadoop, EDW, …); Data Scientist; End User]
  59. Value from Relationships: Common Use Cases
      Internal Applications
      • Master Data Management
      • Network and IT Operations
      • Fraud Detection
      Customer-Facing Applications
      • Real-time Recommendations
      • Graph-based Search
      • Identity and Access Management
  60. Open Corporates
      Uses Neo4j
  61. Open Corporates
  62. Open Corporates
      Uses Neo4j
      https://skillsmatter.com/skillscasts/4097-case-study-how-opencorporates-uses-neo4j-to-provide-insight
  63. Open Source Examples
      http://maxdemarzi.com/2012/10/18/matches-are-the-new-hotness/
  64. What are the Top 10 Jobs for me
      • that are in the same location I’m in
      • for which I have the necessary qualifications
  65. Partial Subgraph Search
  66. Recommend Love
      Find your soulmate in the graph
      • Are they energetic?
      • Do they like dogs?
      • Have a good sense of humor?
      • Neat and tidy, but not crazy about it?
      What are the Top 10 Potential Mates for me
      • that are in the same location
      • are sexually compatible
      • have traits I want
      • want traits I have
  67. Love Recommendation
  68. Two Party Partial Subgraph Search
      http://maxdemarzi.com/2013/04/19/match-making-with-neo4j/
  69. Real-Time Recommendations with Neo4j
      • Social Recommendations
      • Products and Services
      • Content Routing
  70. Walmart: BUSINESS CASE
      World’s largest company by revenue
      World’s largest retailer and private employer
      SF-based global e-commerce division manages several websites
      Founded in 1969, Bentonville, Arkansas
      • Needed online customer recommendations to keep pace with competition
      • Data connections provided predictive context, but were not in a usable format
      • Solution had to serve many millions of customers and products while maintaining superior scalability and performance
  71. Walmart: SOLUTION
      • Brings customers, preferences, purchases, products and locations into a graph model
      • Uses connections to make product recommendations
      • Solution deployed across Walmart divisions and websites
  72. Global Courier: BUSINESS CASE
      World’s largest courier
      480,000 employees
      €55 billion in revenue
      Needed new B2C and B2B parcel routing system for its logistics practice
      Legacy system supported neither the full network nor the shift to online demands
      Needed to replace aging B2B and B2C parcel routing system whose requirements include:
      • 24x7 availability
      • Peak loads of 5M parcels per day, 3K per second
      • Support for complex and diverse software stack
      • Predictable performance with linear scalability
      • Daily changes to logistics networks
      • Route from any point to any point
      • Single point of truth for entire network
  73. Global Courier: SOLUTION
      Neo4j provides the ideal domain fit since a logistics network is a graph
      • High availability and performance via Neo4j clustering
      • Greatly simplified Cypher queries for routing versus relational SQL queries
      • Flexible data model that reflects the real logistics world far better than relational
      • Easy-to-grasp whiteboard-friendly model
  74. eBay: BUSINESS CASE
      C2C and B2C retail network
      Full e-commerce functionality for individuals and businesses
      Integrated with logistics vendors for product deliveries
      • Needed an offering to compete with Amazon Prime
      • Enable customer-selected delivery inside 90 minutes
      • Calculate best route option in real-time
      • Scale to enable a variety of services
      • Offer more predictable delivery times
  75. eBay Now: SOLUTION
      • Acquired UK-based Shutl, a leader in same-day delivery
      • Used Neo4j to create eBay Now
      • 1000 times faster than the prior MySQL-based solution
      • Faster time-to-market
      • Improved code quality with 10 to 100 times less query code
  76. Classmates: BUSINESS CASE
      Online yearbook connecting friends from school, work and military in US and Canada
      Founded as Memory Lane in Seattle
      Develop new social networking capabilities to monetize yearbook-related offerings
      • Show all the people I know in a yearbook
      • Show yearbooks my friends appear in most often
      • Show sections of a yearbook that my friends appear most in
      • Show me other schools my friends attended
  77. Classmates: SOLUTION
      Neo4j provides a robust and scalable graph database solution
      • 3-instance cluster with cache sharding and disaster-recovery
      • 18ms response time for top 4 queries
      • 100M nodes and 600M relationships in initial graph, including people, images, schools, yearbooks and pages
      • Projected to grow to 1B nodes and 6B relationships
  78. National Geographic: BUSINESS CASE
      Non-profit scientific and educational institution founded in 1888
      Covers geography, archaeology, natural science, environment and historical conservation
      Journals, online media, radio, TV, documentaries, live events and consumer content and goods
      • Improve poor performance of PostgreSQL app
      • Increase user engagement by linking to 100+ years of multimedia content
      • Improve targeting by understanding subscribers’ interests better
      • Recommend content and services to users based on their interests
  79. National Geographic: SOLUTION
      • Enabled complex real-time analytics across eight million users and a century of content
      • Delivered robust performance by eliminating triple-nested SQL joins
      • Cross-refers users among content, live events, travel, goods and causes
      • Neo4j solution much less cumbersome and easier to maintain than previous SQL system
  80. Curaspan: BUSINESS CASE
      Leader in patient management for discharges and referrals
      Manages patient referrals for 4600+ health care facilities
      Connects providers, payers via web-based patient management platform
      Founded in 1999 in Newton, Massachusetts
      • Improve poor performance of Oracle solution
      • Support more complexity including granular, role-based access control
      • Satisfy complex Graph Search queries by discharge nurses and intake coordinators
      Example query: find a skilled nursing facility within n miles of a given location, belonging to health care group XYZ, offering speech therapy and cardiac care, and optionally Italian language services
  81. Curaspan: SOLUTION
      • Met fast, real-time performance demands
      • Supported queries that span multiple hierarchies including provider and employee-permissions graphs
      • Improved data model to handle adding more dimensions to the data such as insurance networks, service areas and care organizations
      • Greatly simplified queries, reducing multi-page SQL statements to one Neo4j function
  82. FiftyThree: BUSINESS CASE
      Maker of Paper, one of the top apps in Apple’s App Store, with millions of users
      Based in New York City
      • Add social capabilities to digital-paper app
      • Support social collaboration across millions of users in new Mix app
      • Enable seamless interaction between social and content-asset networks
      • Ensure new apps are robust, scalable and fast
  83. FiftyThree: SOLUTION
      • Neo4j data model ideal for social network, content management and access control
      • Users create, publish and share designs simply
      • Easy to develop and evolve Neo4j-based app
      • Integrates well with FiftyThree EC2 architecture
      See the Neo4j solution in action: Betting the Company (Literally) on a Graph Database
      http://aseemk.com/talks/neo4j-lessons-learned#/
      App Store Editor’s Choice. 2012 iPad App of Year. Apple Best Apps of 2014.
  84. Users Love Neo4j
      jQuery Inventor; Heroku Founder
  85. THANK YOU
