Leveraging relations at scale with Neo4j

An introduction to graphs, graph databases, Neo4j and the Cypher Query language

Published in: Technology

  1. Leveraging relations at scale with Neo4j. Madrid.rb - July 2013. Alberto Perdomo, GrapheneDB. alberto@graphenedb.com | @albertoperdomo
  2. About me: co-founder of Aentos; Ruby developer; GrapheneDB: Neo4j as a Service
  3. Origin of Graphs
  4. Leonhard Euler, 1736
  5. Königsberg Bridge Problem
  6. Euler’s Technique
  7. Euler’s Technique
  8. Königsberg Problem Graph
  9. A little Graph Theory
  10. The Math: G = (V, E)
  11. Types of Graphs
  12. Undirected Graph (diagram: nodes A, B, C labeled Adam, Michael, John). Example: Facebook friendships
  13. Directed Graph (diagram: nodes A, B, C labeled Adam, Michael, John). Example: Twitter follows
  14. Weighted Graph (diagram: nodes A, B, C labeled Adam, John, Star Wars: Episode IV; edge weights 0.6 and 0.8). Example: movie ratings
  15. Labeled Graph (diagram: user nodes Adam and Michael, fan_page node LA Lakers; edge labels friend_of and fan_of). Example: Facebook friendships + fan pages
  16. Property Graph (diagram: node {Type: Cast Member, Name: George Lucas, Born_At: 1944-05-14}; node {Type: Movie, Title: Star Wars Episode IV - A New Hope, Release: 1977}; node {Type: User, Name: Adam, Age: 34, Country: USA}; edges: rated: 0.6, directed, wrote). Example: IMDB
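
To make the property graph idea concrete, here is a minimal in-memory sketch in plain Ruby. It is not Neo4j code: `PropertyNode`, `PropertyRelation` and `sources` are hypothetical names chosen for illustration, and the data simply mirrors the IMDB example from the slide.

```ruby
# A tiny property graph: nodes and relations both carry a hash of properties.
# All names here are illustrative assumptions, not part of any Neo4j API.
PropertyNode = Struct.new(:id, :props)
PropertyRelation = Struct.new(:from, :to, :type, :props)

LUCAS = PropertyNode.new(1, { type: "Cast Member", name: "George Lucas", born_at: "1944-05-14" })
MOVIE = PropertyNode.new(2, { type: "Movie", title: "Star Wars Episode IV - A New Hope", release: 1977 })
ADAM  = PropertyNode.new(3, { type: "User", name: "Adam", age: 34, country: "USA" })

# G = (V, E): the set of vertices and the set of edges.
VERTICES = [LUCAS, MOVIE, ADAM].freeze
EDGES = [
  PropertyRelation.new(LUCAS, MOVIE, :directed, {}),
  PropertyRelation.new(LUCAS, MOVIE, :wrote, {}),
  PropertyRelation.new(ADAM, MOVIE, :rated, { rating: 0.6 })
].freeze

# Returns the nodes that point at `target` through a relation of the given type.
def sources(edges, target, type)
  edges.select { |rel| rel.to == target && rel.type == type }.map(&:from)
end

puts sources(EDGES, MOVIE, :directed).map { |node| node.props[:name] }.inspect
```

The relation type plays the role of the edge label, and the `props` hash on a relation covers the weighted case (the rating).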
  17. Graph Databases
  18. Graph DB: Definition. Uses a graph as its primary data structure. Property graph: stores data as nodes, relations and properties
  19. Graph DBs vs other DBs
  20. A graph can be modeled with almost any technology
  21. MySQL vs Neo4j: 1M users; friends of friends for 1K users. Execution time in seconds:
        Depth   MySQL                    Neo4j
        2       0.016                    0.010
        3       30.267                   0.168
        4       1,543.505                1.359
        5       Not finished in 1 hour   2.132
      Sources: http://www.neotechnology.com/how-much-faster-is-a-graph-database-really/, http://www.manning.com/partner/
  22. Conventional DBs: index lookup to find adjacent nodes; the cost depends on the total number of vertices and edges in the DB (global)
  23. Graph DB: Definition. Any system that provides index-free adjacency [1]. Linear cost to retrieve adjacent nodes: it depends only on the number of local neighbours. [1] http://www.slideshare.net/slidarko/problemsolving-using-graph-traversals-searching-scoring-ranking-and-recommendation
  24. As the DB grows, the cost of a local step remains the same
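
The difference between a global lookup and index-free adjacency can be sketched in plain Ruby. This is only an illustration under simplifying assumptions: a full scan of an edge table stands in for the conventional database's global lookup, and `IndexFreeNode`, `neighbours_by_scan` and `build_nodes` are hypothetical names, not Neo4j internals.

```ruby
# Each node keeps direct references to its neighbours (index-free adjacency).
class IndexFreeNode
  attr_reader :name, :neighbours

  def initialize(name)
    @name = name
    @neighbours = []
  end

  def connect(other)
    @neighbours << other
    self
  end
end

# Global approach: search one table holding every edge in the database.
# The work grows with the total number of edges, not with the node's degree.
def neighbours_by_scan(edge_table, name)
  edge_table.select { |from, _to| from == name }.map { |_from, to| to }
end

# Builds linked node objects from the same edge table.
def build_nodes(edge_table)
  nodes = Hash.new { |hash, name| hash[name] = IndexFreeNode.new(name) }
  edge_table.each { |from, to| nodes[from].connect(nodes[to]) }
  nodes
end

FRIEND_EDGES = [[:adam, :michael], [:adam, :john], [:michael, :john]].freeze
FRIEND_NODES = build_nodes(FRIEND_EDGES)

puts neighbours_by_scan(FRIEND_EDGES, :adam).inspect
# Local approach: once we hold the node, a step only touches its own neighbours.
puts FRIEND_NODES[:adam].neighbours.map(&:name).inspect
```

Both calls return the same neighbours; what changes is how much of the dataset has to be touched to find them.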
  25. Modeling connected data is natural
  26. When to use a graph?
  27. High density of relations
  28. A search engine for relations
  29. Graph analysis: recommend vertices to user x; search for y given x; score x given its local neighbourhood; rank x relative to y
  30. Recommendations
  31. Social Graph: a representation of the relationships between people and other people
  32. Social Graph: Facebook; Twitter; LinkedIn. “Since you have many friends in common, you might know X.”
  33. Interest Graph: a representation of the relationships between people and things
  34. Interest Graph: Pinterest; Instagram; Quora; Spotify. “Many people who like x, as you do, also like y.”
  35. Pinterest Interest Graph: http://engineering.pinterest.com/post/55272557617/building-a-follower-model-from-scratch
  36. e-commerce
  37. e-commerce: Upselling
  38. Recommendations (diagram: nodes A, B, C; “Many users” linked by “bought” edges to “Star Wars I DVD” and “Star Wars Trilogy DVD Pack”; “A user” linked by a “looking at” edge). “Customers who bought a also bought b”
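
A rough sketch of the "customers who bought a also bought b" idea, in plain Ruby rather than against a graph database. The purchase data, the customer names and the extra "Indiana Jones DVD" item are invented for illustration, and `also_bought` is a hypothetical helper.

```ruby
# Purchases as a bipartite graph: customer -> items bought.
# All data below is made up for the example.
PURCHASES = {
  "ann"  => ["Star Wars I DVD", "Star Wars Trilogy DVD Pack"],
  "bob"  => ["Star Wars I DVD", "Star Wars Trilogy DVD Pack"],
  "carl" => ["Star Wars I DVD", "Indiana Jones DVD"],
  "dora" => ["Indiana Jones DVD"]
}.freeze

# For a given item, counts how often each other item shares a buyer with it.
# Returns [item, count] pairs, most frequently co-purchased first.
def also_bought(purchases, item)
  counts = Hash.new(0)
  purchases.each_value do |items|
    next unless items.include?(item)
    (items - [item]).each { |other| counts[other] += 1 }
  end
  counts.sort_by { |other, count| [-count, other] }
end

also_bought(PURCHASES, "Star Wars I DVD").each do |other, count|
  puts "#{other}: #{count}"
end
```

In graph terms this is a two-step walk: from the item back to its buyers, then forward to everything else those buyers bought.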
  39. Rank x: rank nodes based on their neighbourhood/network; examples: Klout, PageRank
  40. Geospatial problems: Travelling Salesman; route for delivery of parcels; optimize route for duration, distance, traffic flow, etc.; need not be a physical path, for example connecting people
  41. Recognize patterns: fraud detection; debt compensation systems; text analysis; chain of exchanges
  42. Visualize connected data
  43. Your domain model determines what you can do
  44. Chances are high that your data is a graph
  45. The Neo4j Graph Database
  46. Data modeling
  47. White board
  48. Then Add Complexity
  49. Process, Tips: model facts as nodes; use relations to model relations between facts; refactor (schema-less!)
  50. Neo4j: graph DB written in Java; Java API + HTTP/REST + Embedded; full ACID; built-in indexing (or roll your own); scale: 32B nodes, 32B relations
  51. The Cypher Query Language
  52. Cypher: Neo4j’s graph query language; declarative pattern matching; “SQL for graphs”; ASCII art
  53. Pattern matching
  54. Pattern matching
  55. Basic Syntax (diagram: A -> B)
        (a) --> (b)
  56. Basic Syntax
        START a=node(*)
        MATCH (a)-->(b)
        RETURN a,b;
  57. (a) --> (b)
  58. Relations (diagram: A -ACTED IN-> B)
        (a) -[:ACTED_IN]-> (b)
  59. Syntax
        START a=node(*)
        MATCH (a)-[:ACTED_IN]->(b)
        RETURN a.name, b.title;
  60. Syntax
        START a=node(*)
        MATCH (a)-[r:ACTED_IN]->(b)
        RETURN a.name, r.roles, b.title;
  61. Syntax (diagram: A -> B <- C)
        (a) --> (b) <-- (c)
  62. Syntax
        START a=node(*)
        MATCH (a)-[:ACTED_IN]->(m)<-[:DIRECTED]-(d)
        RETURN a.name, m.title, d.name;
  63. Sort & Limit
        START a=node(*)
        MATCH (a)-[:ACTED_IN]->(m)<-[:DIRECTED]-(d)
        RETURN a.name, d.name, count(*) AS count
        ORDER BY count DESC
        LIMIT 5;
  64. Starting point: All nodes
        START n=node(*)
        RETURN n;
  65. Starting point: Where
        START n=node(*)
        WHERE has(n.name) AND n.name = "George Lucas"
        RETURN n;
  66. Starting point: Auto Index
        START n=node:node_auto_index(name="George Lucas")
        RETURN n;
  67. Starting point: multiple nodes
        START lucas=node:node_auto_index(name="George Lucas"),
              ford=node:node_auto_index(name="Harrison Ford")
        MATCH (lucas)-[:DIRECTED]->(m)<-[:ACTED_IN]-(ford)
        RETURN m.title;
  68. Multiple relations
        MATCH (a)-[:ACTED_IN|DIRECTED]->()
  69. Constraints with comparison
        START a=node:node_auto_index(name="Alberto Perdomo")
        MATCH (a)-[:KNOWS]->(b)
        WHERE b.born < a.born
        RETURN a.name;
  70. Constraints with patterns
        MATCH (alberto)-[:KNOWS*2]->(fof)
        WHERE NOT((ferblape)-[:KNOWS]-(fof))
  71. Variable length paths
        MATCH (alberto)-[:KNOWS*2]->(fof)
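
What a two-hop pattern such as `-[:KNOWS*2]->` walks over can be approximated in plain Ruby. This is a sketch, not how Cypher evaluates the pattern: the KNOWS data and the names other than alberto are invented, `friends_of_friends` is a hypothetical helper, and the filter used here (dropping people already known directly, and the person themselves) is one possible constraint, a variation on the `WHERE NOT` pattern shown in the slides.

```ruby
# Directed KNOWS relations as adjacency lists. Sample data is made up.
KNOWS = {
  "alberto" => ["ana", "luis"],
  "ana"     => ["luis", "marta"],
  "luis"    => ["pedro", "alberto"],
  "marta"   => [],
  "pedro"   => []
}.freeze

# People exactly two KNOWS hops away, excluding direct friends and the person.
def friends_of_friends(knows, person)
  direct = knows.fetch(person, [])
  two_hops = direct.flat_map { |friend| knows.fetch(friend, []) }
  (two_hops - direct - [person]).uniq
end

puts friends_of_friends(KNOWS, "alberto").inspect
```

Each extra hop repeats the same local step from the nodes reached so far, which is why the depth of the traversal, rather than the size of the whole dataset, drives the cost.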
  72. Aggregation: count(x); min(x); max(x); avg(x); collect(x); filter(x)
  73. Updating the graph: create, set, delete nodes; create, set, delete relations
  74. Neo4j: More features
  75. Built-in Graph Algos: shortest path; allSimplePaths; allPaths; dijkstra
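
As an illustration of what a shortest path algorithm computes, here is a breadth-first search over an unweighted, directed graph in plain Ruby. It is a generic textbook sketch and makes no claim about how Neo4j implements its built-in algorithms; `shortest_path` and the `ROUTES` data are hypothetical.

```ruby
# Breadth-first search: returns the list of nodes on one shortest path
# from start to goal, or nil when the goal cannot be reached.
def shortest_path(adjacency, start, goal)
  return [start] if start == goal

  previous = { start => nil } # also serves as the visited set
  queue = [start]

  until queue.empty?
    current = queue.shift
    adjacency.fetch(current, []).each do |neighbour|
      next if previous.key?(neighbour)

      previous[neighbour] = current
      if neighbour == goal
        # Walk the predecessor links back to the start.
        path = [goal]
        path.unshift(previous[path.first]) while previous[path.first]
        return path
      end
      queue << neighbour
    end
  end

  nil
end

# Sample graph, made up for the example.
ROUTES = { a: [:b, :c], b: [:d], c: [:d], d: [:e], e: [] }.freeze

puts shortest_path(ROUTES, :a, :e).inspect
```

For weighted graphs, where edges carry a cost such as distance or duration, breadth-first search is not enough and an algorithm such as Dijkstra's is needed.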
  76. Extending Neo4j: Plugins. Provide extra API endpoints to run external code, packaged as JAR files. Examples: Neo4j-Spatial; Neo4j-Sparql
  77. Neo4j from Ruby: Neography (wrapper around the REST API); Neo4j.rb (language binding for JRuby; ActiveModel, Mixins; embedded Neo4j w/ GPL license (not only?)); other?
  78. Neography
        require 'neography'
        @neo = Neography::Rest.new

        # Node creation:
        node1 = @neo.create_node("age" => 31, "name" => "Max")
        node2 = @neo.create_node("age" => 33, "name" => "Roel")

        # Node properties:
        @neo.set_node_properties(node1, {"weight" => 200})

        # Relationships between nodes:
        @neo.create_relationship("coding_buddies", node1, node2)

        # Get node relationships:
        @neo.get_node_relationships(node2, "in", "coding_buddies")

        # Use indexes:
        @neo.add_node_to_index("people", "name", "max", node1)
        @neo.get_node_index("people", "name", "max")

        # Cypher queries:
        @neo.execute_query("start n=node(0) return n")
  79. Neo4j.rb ActiveModel
        class User < Neo4j::Rails::Model
          attr_accessor :password
          attr_accessible :email, :password, :password_confirmation

          after_save :encrypt_password

          email_regex = /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i

          # add an exact lucene index on the email property
          property :email, index: :exact

          has_one(:avatar).to(Avatar)
          accepts_nested_attributes_for :avatar, allow_destroy: true
        end
  80. Neo4j.rb Mixin
        class Person
          include Neo4j::NodeMixin

          property :name, index: :exact
          property :city

          has_n :friends
          has_one :address
        end
  81. Neo4j Licensing. Community: GPL. Advanced: Commercial + AGPL (monitoring; support). Enterprise: Commercial + AGPL (monitoring + HA clustering + online backups; support)
  82. Getting Started: www.neo4j.org/learn/try; www.neo4j.org/download; download -> unpack -> start; http://localhost:7474
  83. Built in Web Admin: stats; console & browser; indexes
  84. Neo4j Resources. Code & Issues: github.com/neo4j/neo4j. Resources: www.neo4j.org/learn. Mailing List: groups.google.com/forum/#!forum/neo4j. Questions: stackoverflow.com/questions/tagged/neo4j. Meetups: www.neo4j.org/participate/events/meetups. Free download: http://graphdatabases.com/
  85. GrapheneDB: Neo4j as a Service
  86. Thanks! @albertoperdomo | alberto@graphenedb.com
