Data Modeling with Neo4j

12,640 views


This presentation covers several aspects of modeling data and domains with a graph database like Neo4j. The graph data model allows high-fidelity modeling. The first-class relationships of the graph model allow much higher degrees of normalization than you would use in a relational database.

Video here: https://vimeo.com/67371996

Published in: Technology
3 Comments
  • Slide#80, node on the right should be Language?
  • Typo on slide 80: the node on the right should be "Language", not another "Country". Great stuff, though, and thanks!
  • Nice presentation, which could have a more precise context if the term 'data modelling' were used in a more qualified manner. 'Data modelling' here appears reduced to graph modelling vs. relational database modelling, which pertains to physical modelling of a relational database. All physical data models and databases, however, are best derived from a platform-independent data model (PIM), such as UML, conceptual Entity-Relationship Diagrams (ERD), or other graph models. At the PIM level, no foreign keys exist, and UML and ERD capture the same classes/entities/nodes and their properties and interrelationships as any other graph model does.
    The real modelling difference does not show until the platform-specific models (PSM) are derived from the PIM.
    Getting the PIM right is paramount regardless of the target platform, and neither a graph model nor a graph database substitutes for a PIM. Therefore, graph modelling is not easier than relational modelling. Instead, graph modelling is a precursor to PSM modelling. Implementing a graph database, however, could well be easier than implementing a relational database.

Data Modeling with Neo4j

  1. Data Modeling with Neo4j. Michael Hunger, Neo Technology, @neo4j | @mesirii | michael@neo4j.org
  2. (Michael)-[:WORKS_ON]->(Neo4j): ME, Spring, Cloud, Community, Cypher console, community graph, Server
  3-6. (Neo4j) is a NOSQL Graph Database
  7-12. A graph database... NO: not for charts & diagrams, or vector artwork. YES: for storing data that is structured as a graph. Remember linked lists, trees? Graphs are the general-purpose data structure. “A relational database may tell you the average age of everyone in this place, but a graph database will tell you who is most likely to buy you a beer.”
  13-28. You know relational: tables foo and bar, linked through the join table foo_bar. Now consider relationships...
  29-34. We're talking about a Property Graph ๏ Nodes ๏ Relationships ๏ Properties (each a key+value) ๏ + Indexes (for easy look-ups). Example: people (Emil, Andrés, Lars, Johan, Allison, Peter, Michael, Tobias, Andreas, Ian, Mica, Delia) connected by "knows" relationships
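The four building blocks on these slides can be sketched as plain Python data structures. This is a toy in-memory model, not Neo4j's storage engine or API; the class and method names (`PropertyGraph`, `add_node`, `relate`, `lookup`, `neighbours`) are hypothetical. Relationships are kept in a per-node list so that following them does not require a global search, and the index is used only to find a starting node.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    props: dict = field(default_factory=dict)   # key/value properties


@dataclass
class Relationship:
    start: int
    end: int
    type: str                                   # e.g. "KNOWS"
    props: dict = field(default_factory=dict)   # relationships carry properties too


class PropertyGraph:
    """Toy property graph: nodes, typed relationships, properties on both,
    and an exact-match index used only to find where a traversal starts."""

    def __init__(self):
        self.nodes = {}                          # node id -> Node
        self.rels_by_node = defaultdict(list)    # node id -> its Relationships
        self.index = {}                          # (key, value) -> [node ids]

    def add_node(self, **props):
        node_id = len(self.nodes)
        self.nodes[node_id] = Node(node_id, props)
        for key, value in props.items():
            self.index.setdefault((key, value), []).append(node_id)
        return node_id

    def relate(self, start, end, rel_type, **props):
        rel = Relationship(start, end, rel_type, props)
        # Store the relationship with both endpoints so it can be followed either way.
        self.rels_by_node[start].append(rel)
        self.rels_by_node[end].append(rel)
        return rel

    def lookup(self, key, value):
        return self.index.get((key, value), [])

    def neighbours(self, node_id, rel_type=None):
        # Follow this node's own relationships, optionally filtered by type.
        for rel in self.rels_by_node[node_id]:
            if rel_type is not None and rel.type != rel_type:
                continue
            yield rel.end if rel.start == node_id else rel.start


g = PropertyGraph()
emil = g.add_node(name="Emil")
andres = g.add_node(name="Andrés")
ian = g.add_node(name="Ian")
g.relate(emil, andres, "KNOWS")
g.relate(ian, emil, "KNOWS")

start = g.lookup("name", "Emil")[0]
print(sorted(g.nodes[n].props["name"] for n in g.neighbours(start, "KNOWS")))
# ['Andrés', 'Ian']
```

Indexing every property is a simplification chosen for brevity; in practice you would index only the keys you use to find starting points.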
  35. Aggregate vs. Connected Data Model
  36. NOSQL Databases: Relational (MySQL, Postgres), Graph (Neo4j), Document (Mongo, Couch), Key-Value (Riak, Redis), Column-oriented (Cassandra)
  37. Aggregate Oriented Model: “There is a significant downside - the whole approach works really well when data access is aligned with the aggregates, but what if you want to look at the data in a different way? Order entry naturally stores orders as aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the database is that it allows you to slice and dice your data different ways for different audiences. This is why aggregate-oriented stores talk so much about map-reduce.” (Martin Fowler)
  38. Connected Data Model: “The connected data model is based on fine-grained elements that are richly connected; the emphasis is on extracting many dimensions and attributes as elements. Connections are cheap and can be used not only for the domain-level relationships but also for additional structures that allow efficient access for different use-cases. The fine-grained model requires an external scope for mutating operations that ensures Atomicity, Consistency, Isolation and Durability (ACID), also known as transactions.” (Michael Hunger)
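To make Fowler's point concrete, the sketch below stores the same made-up orders twice: once as self-contained order aggregates, and once as fine-grained order and product nodes connected by `CONTAINS` relationships that carry the quantity. The data, the function names and the relationship type are illustrative assumptions. Reading one order is easy in both models; totaling sales per product has to open every aggregate in the first model but can start at the product node in the second.

```python
from collections import Counter

# Aggregate-oriented model: each order is a self-contained document (made-up data).
orders = [
    {"id": "o1", "lines": [{"product": "book", "qty": 2}, {"product": "pen", "qty": 1}]},
    {"id": "o2", "lines": [{"product": "book", "qty": 1}]},
]


def items_in_order(order_id):
    # Access aligned with the aggregate: fetch one order, read its lines.
    order = next(o for o in orders if o["id"] == order_id)
    return sum(line["qty"] for line in order["lines"])


def product_sales_from_aggregates():
    # Cuts across the aggregates: every order has to be opened and scanned.
    sales = Counter()
    for order in orders:
        for line in order["lines"]:
            sales[line["product"]] += line["qty"]
    return sales


# Connected model: orders and products are separate nodes; each order line
# becomes a CONTAINS relationship (order -> product) carrying the quantity.
contains = [("o1", "book", 2), ("o1", "pen", 1), ("o2", "book", 1)]

# Store each relationship with its product node so it can be followed backwards.
incoming_contains = {}
for order_id, product, qty in contains:
    incoming_contains.setdefault(product, []).append((order_id, qty))


def product_sales_connected(product):
    # Start at the product node and follow only its own relationships.
    return sum(qty for _order_id, qty in incoming_contains.get(product, []))
```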
  39. Data Modeling
  40. Why Data Modeling ๏ What is modeling? ๏ Aren't we schema free? ๏ How does it work in a graph? ๏ Where should modeling happen? DB or application?
  41. Data Models
  42. Model mismatch: Real World Model
  43. Model mismatch: Application Model, Database Model
  44. Trinity of models
  45-49. Whiteboard --> Data: Andreas, Peter, Emil and Allison, connected by "knows" relationships. // Cypher query - friend of a friend: START n=node(0) MATCH (n)--()--(foaf) RETURN foaf
  50-55. You traverse the graph. // lookup starting point in an index: START n=node:People(name = 'Andreas') // then traverse to find results: START me=node:People(name = 'Andreas') MATCH (me)-[:FRIEND]-(friend)-[:FRIEND]-(friend2) RETURN friend2
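The two-step pattern on these slides (look up a starting node in an index, then traverse) can be sketched in Python as follows. The slide does not show which of Andreas, Peter, Emil and Allison know each other, so the edge list here is an assumption, and the dictionaries only stand in for the `People` index and the `FRIEND` relationships.

```python
from collections import defaultdict

# Node ids are list positions; names come from the whiteboard slide.
people = ["Andreas", "Peter", "Emil", "Allison"]

# Stand-in for the People index: used only to find the starting node.
people_index = {name: node_id for node_id, name in enumerate(people)}

# FRIEND relationships (assumed); stored with both endpoints because the
# query matches them without a direction.
friends = defaultdict(set)
for a, b in [("Andreas", "Peter"), ("Peter", "Emil"),
             ("Peter", "Allison"), ("Emil", "Allison")]:
    friends[people_index[a]].add(people_index[b])
    friends[people_index[b]].add(people_index[a])


def friends_of_friends(name):
    me = people_index[name]                 # 1. look up the starting point
    result = set()
    for friend in friends[me]:              # 2. (me)-[:FRIEND]-(friend)
        for friend2 in friends[friend]:     # 3. (friend)-[:FRIEND]-(friend2)
            if friend2 != me:               # don't walk straight back to the start
                result.add(people[friend2])
    return result


print(sorted(friends_of_friends("Andreas")))  # ['Allison', 'Emil']
```

Because each pair has at most one FRIEND relationship here, skipping the start node yields the same names as the Cypher pattern, which never reuses a relationship within one match. Like the query, the sketch can still return a direct friend when that person is also reachable through another friend.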
  56. SQL: SELECT skills.*, user_skill.* FROM users JOIN user_skill ON users.id = user_skill.user_id JOIN skills ON user_skill.skill_id = skills.id WHERE users.id = 1 | Cypher: START user = node(1) MATCH (user)-[user_skill]->(skill) RETURN skill, user_skill
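The corrected SQL query can be run as-is against an in-memory SQLite database, next to a graph-style equivalent in which the user's relationships are read directly from the user node. The sample rows, the `level` column and the `HAS_SKILL` relationship type are hypothetical; the slide's Cypher query does not name a relationship type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users      (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE skills     (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE user_skill (user_id INTEGER REFERENCES users(id),
                         skill_id INTEGER REFERENCES skills(id),
                         level INTEGER);
INSERT INTO users VALUES (1, 'alice');
INSERT INTO skills VALUES (10, 'Java'), (11, 'Cypher'), (12, 'Ruby');
INSERT INTO user_skill VALUES (1, 10, 5), (1, 11, 4);
""")

# The slide's query with the alias fixed: skills.id, not skill.id.
rows = conn.execute("""
    SELECT skills.*, user_skill.*
    FROM users
    JOIN user_skill ON users.id = user_skill.user_id
    JOIN skills ON user_skill.skill_id = skills.id
    WHERE users.id = 1
""").fetchall()

# Graph-style equivalent: the user node holds its own relationships, and each
# relationship carries what the join-table row held (here, level).
user_rels = {
    1: [("HAS_SKILL", {"level": 5}, "Java"), ("HAS_SKILL", {"level": 4}, "Cypher")],
}
graph_rows = [(skill, props) for _rel_type, props, skill in user_rels[1]]
```

Two joins through an intermediate table in SQL become a single hop from the user node in the graph version.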
  57. An Example
  58-60. What language do they speak here? Language, Country
  61. Tables: Language (language_code, language_name, word_count), Country (country_code, country_name, flag_uri)
  62. Need to model the relationship: Country gets a language_code column
  63. What if the cardinality changes? Language gets a country_code column instead
  64. Or we go many-to-many? A LanguageCountry join table (language_code, country_code)
  65. Or we want to qualify the relationship? LanguageCountry (language_code, country_code, primary)
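The final relational version (slide 65) can be sketched with SQLite: a `LanguageCountry` join table with a `primary` flag, queried through two joins. Table and column names follow the slides; the rows are illustrative only, the helper `languages_spoken_in` is hypothetical, and `primary` is quoted because it is an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Language (language_code TEXT PRIMARY KEY, language_name TEXT, word_count INTEGER);
CREATE TABLE Country  (country_code TEXT PRIMARY KEY, country_name TEXT, flag_uri TEXT);
-- Many-to-many join table, qualified with a "primary" flag
CREATE TABLE LanguageCountry (
    language_code TEXT REFERENCES Language(language_code),
    country_code  TEXT REFERENCES Country(country_code),
    "primary"     INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (language_code, country_code)
);
INSERT INTO Language (language_code, language_name) VALUES ('en', 'English'), ('fr', 'French');
INSERT INTO Country (country_code, country_name) VALUES ('CA', 'Canada'), ('FR', 'France');
INSERT INTO LanguageCountry VALUES ('en', 'CA', 1), ('fr', 'CA', 1), ('fr', 'FR', 1), ('en', 'FR', 0);
""")


def languages_spoken_in(country_code, primary_only=False):
    # Two joins are needed just to get from a country to its languages.
    sql = """
        SELECT l.language_name
        FROM Country c
        JOIN LanguageCountry lc ON lc.country_code = c.country_code
        JOIN Language l ON l.language_code = lc.language_code
        WHERE c.country_code = ?
    """
    if primary_only:
        sql += ' AND lc."primary" = 1'
    sql += " ORDER BY l.language_name"
    return [name for (name,) in conn.execute(sql, (country_code,))]
```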
  66. Start talking about Graphs
  67. Explicit Relationship: (Language {name, word_count})-[:IS_SPOKEN_IN]->(Country {name, flag_uri})
  68. Relationship Properties: (Language {name, word_count})-[:IS_SPOKEN_IN {as_primary}]->(Country {name, flag_uri})
  69. What's different? The LanguageCountry join table (language_code, country_code, primary) between Language and Country is replaced by an IS_SPOKEN_IN relationship
  70. What's different? ๏ Maintaining relationships is left to the database ๏ Artificial keys disappear or are unnecessary ๏ Relationships get an explicit name • can be navigated in both directions
  71. Relationship specialisation: Language (name, word_count), Country (name, flag_uri), IS_SPOKEN_IN, as_primary
  72. Bidirectional relationships: Language (name, word_count) and Country (name, flag_uri), connected by IS_SPOKEN_IN and PRIMARY_LANGUAGE
  73. Weighted relationships: (Country {name, flag_uri})-[:POPULATION_SPEAKS {population_fraction}]->(Language {name, word_count})
  74. Keep on adding relationships: POPULATION_SPEAKS {population_fraction}, SIMILAR_TO, ADJACENT_TO
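The graph side of the same example can be sketched with typed relationships that carry properties. Node names such as `CountryA` and `LangX` and all `population_fraction` values are placeholders, not real statistics, and the function names are hypothetical. The last function shows why adding `ADJACENT_TO` and `SIMILAR_TO` can pay off: it answers a question (which similar languages are spoken next door?) that none of the tables above were designed for.

```python
from collections import defaultdict

# (start, TYPE, end, properties); all names and numbers are placeholders.
relationships = [
    ("CountryA", "POPULATION_SPEAKS", "LangX", {"population_fraction": 0.75}),
    ("CountryA", "POPULATION_SPEAKS", "LangY", {"population_fraction": 0.25}),
    ("CountryB", "POPULATION_SPEAKS", "LangZ", {"population_fraction": 0.5}),
    ("CountryA", "ADJACENT_TO", "CountryB", {}),
    ("LangX", "SIMILAR_TO", "LangZ", {}),
]

# Keep every relationship with both of its nodes so it can be followed either way.
outgoing, incoming = defaultdict(list), defaultdict(list)
for start, rel_type, end, props in relationships:
    outgoing[start].append((rel_type, end, props))
    incoming[end].append((rel_type, start, props))


def spoken_by(country, min_fraction=0.0):
    """Languages spoken by at least min_fraction of the country's population."""
    return sorted(lang for rel_type, lang, props in outgoing[country]
                  if rel_type == "POPULATION_SPEAKS"
                  and props["population_fraction"] >= min_fraction)


def related(node, rel_type):
    """Nodes linked by a symmetric relationship type, whichever way it is stored."""
    out = [end for t, end, _props in outgoing[node] if t == rel_type]
    inc = [start for t, start, _props in incoming[node] if t == rel_type]
    return sorted(out + inc)


def similar_languages_next_door(country):
    """Languages spoken in adjacent countries that are SIMILAR_TO one of ours."""
    ours = spoken_by(country)
    found = set()
    for neighbour in related(country, "ADJACENT_TO"):
        for lang in spoken_by(neighbour):
            if any(lang in related(mine, "SIMILAR_TO") for mine in ours):
                found.add(lang)
    return sorted(found)
```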
  75. EMBRACE the paradigm
  76. Use the building blocks ๏ Nodes ๏ Relationships (RELATIONSHIP_NAME) ๏ Properties (name: value)
  77. Anti-pattern: rich properties: name: “Canada”, languages_spoken: “[ ‘English’, ‘French’ ]”
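The cost of the rich-property anti-pattern shows up as soon as you query by language. In this sketch (illustrative data, hypothetical function names) the string-valued `languages_spoken` property has to be parsed for every country, while the normalized version starts at the language and follows its `SPEAKS` relationships.

```python
import ast
from collections import defaultdict

# Anti-pattern: a list packed into one string property (format copied from the slide).
countries = [
    {"name": "Canada", "languages_spoken": "[ 'English', 'French' ]"},
    {"name": "France", "languages_spoken": "[ 'French' ]"},
]


def speakers_rich(language):
    # Every country must be read and its string parsed before it can be tested.
    return [c["name"] for c in countries
            if language in ast.literal_eval(c["languages_spoken"])]


# Normalized: languages become nodes connected to countries by SPEAKS relationships.
speaks = [("Canada", "English"), ("Canada", "French"), ("France", "French")]
spoken_in = defaultdict(list)   # Language node -> countries with a SPEAKS relationship to it
for country, language in speaks:
    spoken_in[language].append(country)


def speakers_graph(language):
    # Start at the Language node and follow its incoming SPEAKS relationships.
    return spoken_in.get(language, [])
```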
  78. Normalize Nodes
  79. Anti-Pattern: Node represents multiple concepts: one Country node with name, flag_uri, language_name, number_of_words, yes_in_language, no_in_language, currency_code, currency_name
  80. Split up in separate concepts: (Country {name, flag_uri})-[:SPEAKS]->(Language {name, number_of_words, yes, no}) and (Country)-[:USES_CURRENCY]->(Currency {currency_code, currency_name})
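Splitting the multi-concept node from slide 79 into the three concepts of slide 80 can be sketched as a small normalization step. The input records and their `flag_uri` values are hypothetical, `number_of_words` is left as `None` because no value is given, and the `normalize` helper is illustrative; property and relationship names follow the slides. Because nodes are keyed by concept, two countries that share a language and a currency end up pointing at one `Language` node and one `Currency` node.

```python
from collections import Counter

# Hypothetical flat records in the anti-pattern's shape; flag_uri values are placeholders.
flat_countries = [
    {"name": "Austria", "flag_uri": "flags/at.png", "language_name": "German",
     "number_of_words": None, "yes_in_language": "Ja", "no_in_language": "Nein",
     "currency_code": "EUR", "currency_name": "Euro"},
    {"name": "Germany", "flag_uri": "flags/de.png", "language_name": "German",
     "number_of_words": None, "yes_in_language": "Ja", "no_in_language": "Nein",
     "currency_code": "EUR", "currency_name": "Euro"},
]


def normalize(records):
    """Split each record into Country, Language and Currency nodes joined by
    SPEAKS and USES_CURRENCY; shared concepts collapse into a single node."""
    nodes = {}   # (label, key) -> properties
    rels = []    # (start key, TYPE, end key)

    def node(label, key, **props):
        # Reuse the existing node for the same concept instead of duplicating it.
        return nodes.setdefault((label, key), {"label": label, **props})

    for r in records:
        node("Country", r["name"], name=r["name"], flag_uri=r["flag_uri"])
        node("Language", r["language_name"], name=r["language_name"],
             number_of_words=r["number_of_words"],
             yes=r["yes_in_language"], no=r["no_in_language"])
        node("Currency", r["currency_code"], currency_code=r["currency_code"],
             currency_name=r["currency_name"])
        rels.append((r["name"], "SPEAKS", r["language_name"]))
        rels.append((r["name"], "USES_CURRENCY", r["currency_code"]))
    return nodes, rels


nodes, rels = normalize(flat_countries)
labels = Counter(props["label"] for props in nodes.values())
```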
  81. Challenge: Property or Relationship? ๏ Can every property be replaced by a relationship? • Hint: triple stores. Are they easy to use? ๏ Should all entities with the same property values be connected?
  82. Object Mapping ๏ Similar to how you would map objects to a relational database, using an ORM such as Hibernate ๏ Generally simpler and easier to reason about ๏ Examples • Java: Spring Data Neo4j • Ruby: Active Model ๏ Why Map? • Do you use mapping because you are scared of SQL? • Following DDD, could you write your repositories directly against the graph API?
  83. CONNECT for fast access: In-Graph Indices
  84. Relationships for querying ๏ Like in other databases • the same structure for different use-cases (OLTP and OLAP) doesn't work • a graph lets you add more structures ๏ Relationships should be the primary means to access nodes in the database ๏ Traversing relationships is cheap; that's the whole design goal of a graph database ๏ Use lookups only to find starting nodes for a query. See the Data Modeling examples in the Manual
  85. Anti-pattern: unconnected graph: eleven separate nodes, each with name: “Jones”, and no relationships between them
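One way to connect the isolated "Jones" nodes is an in-graph index: a single node per surname, linked to every person carrying it, so that a lookup finds one starting node and the rest is a traversal. This is a sketch under stated assumptions: the `MEMBER` relationship type and the helper names are hypothetical, and the `surname_nodes` dictionary stands in for the one lookup needed to find the starting node. Whether such a structure is worth maintaining depends on your queries, which is the question slide 81 raises.

```python
from collections import defaultdict

nodes = {}                 # node id -> properties
rels = defaultdict(list)   # node id -> [(TYPE, other node id)]


def add_node(**props):
    node_id = len(nodes)
    nodes[node_id] = props
    return node_id


def relate(start, rel_type, end):
    rels[start].append((rel_type, end))


# The anti-pattern: eleven "Jones" nodes with no relationships, plus two others.
joneses = [add_node(name="Jones") for _ in range(11)]
others = [add_node(name=name) for name in ("Smith", "Brown")]


def find_by_scan(name):
    # Without connections, every lookup has to look at every node.
    return [i for i, props in nodes.items() if props.get("name") == name]


# In-graph index: one node per surname, linked to each person (hypothetical MEMBER type).
surname_nodes = {}   # stands in for the single lookup that finds the start node
for person in joneses + others:
    surname = nodes[person]["name"]
    if surname not in surname_nodes:
        surname_nodes[surname] = add_node(surname=surname)
    relate(surname_nodes[surname], "MEMBER", person)


def find_via_index(name):
    # One lookup for the surname node, then follow its MEMBER relationships.
    if name not in surname_nodes:
        return []
    return [end for rel_type, end in rels[surname_nodes[name]] if rel_type == "MEMBER"]
```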
  86. Pattern: Linked List
  87. Pattern: Multiple Relationships
  88. Pattern-Trees: Tags and Categories
  89. Pattern-Tree: Multi-Level-Tree
  90. Pattern-Trees: R-Tree (spatial)
  91. Example: Activity Stream
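The linked-list pattern is a natural fit for an activity stream: the user points at the newest event, and each event points at the one before it, so reading the latest n events is a walk of n steps. The sketch below assumes this layout, with hypothetical `LATEST` and `NEXT` relationships modeled as Python references; the slides name the pattern but do not spell out the relationship types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    text: str
    next: Optional["Event"] = None   # NEXT: the previous (older) event


class ActivityStream:
    """The user's side of the pattern: LATEST points at the newest event."""

    def __init__(self):
        self.latest: Optional[Event] = None

    def post(self, text):
        # Prepend in O(1): the new event links to the old head, then becomes LATEST.
        self.latest = Event(text, self.latest)

    def recent(self, n):
        # Walk NEXT links from the head; cost depends on n, not on history length.
        texts, event = [], self.latest
        while event is not None and len(texts) < n:
            texts.append(event.text)
            event = event.next
        return texts


stream = ActivityStream()
for text in ("joined", "posted a photo", "liked a post"):
    stream.post(text)
print(stream.recent(2))  # ['liked a post', 'posted a photo']
```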
  92. Graph Evolution
  93. Evolution: Relationship to Node: Peter SENT_EMAIL Michael becomes an Email node with EMAIL_FROM (Peter), EMAIL_TO (Michael), EMAIL_CC (Emil) and TAGGED (Community) relationships . . . see Hyperedges
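Promoting a relationship to a node can be sketched as follows. A `SENT_EMAIL` relationship can only connect two nodes, whereas an `Email` node can be connected to a sender, a recipient, CC recipients and tags. Relationship types follow the slide; the `Email` class, the `promote` helper and the subject line are hypothetical.

```python
from dataclasses import dataclass, field

# Before: one relationship, so only two participants can ever be recorded.
sent_email = ("Peter", "SENT_EMAIL", "Michael")


@dataclass
class Email:
    subject: str
    rels: list = field(default_factory=list)   # (TYPE, connected node)

    def participants(self):
        # Everyone connected through one of the addressing relationships.
        return sorted(node for rel_type, node in self.rels
                      if rel_type in ("EMAIL_FROM", "EMAIL_TO", "EMAIL_CC"))


def promote(rel, subject):
    """Turn a SENT_EMAIL relationship into an Email node with two relationships."""
    sender, _rel_type, recipient = rel
    return Email(subject, [("EMAIL_FROM", sender), ("EMAIL_TO", recipient)])


email = promote(sent_email, subject="Meetup")
email.rels.append(("EMAIL_CC", "Emil"))       # not expressible with a plain relationship
email.rels.append(("TAGGED", "Community"))
```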
  94. Combine multiple Domains in a Graph ๏ You start with a single domain ๏ Add more connected domains as your system evolves ๏ More domains let you ask different queries ๏ One domain "indexes" the other ๏ Example: Facebook Graph Search • social graph • location graph • activity graph • favorite graph • ...
  95. Notes on the Graph Data Model ๏ Schema free, but constraints ๏ Model your graph with a whiteboard and a wise man ๏ Nodes as main entities, but useless without connections ๏ Relationships are first-class citizens in the model and database ๏ Normalize more than in a relational database ๏ Use meaningful relationship types, not generic ones like IS_ ๏ Use in-graph structures to allow different access paths ๏ Evolve your graph to your needs, incremental growth
  96. Real-world Examples
  97-101. Real World Use Cases: • [A] ACL from Hell • [B] Timely recommendations • [C] Global collaboration
  102-105. [A] ACL from Hell ๏ Customer: • leading consumer utility company with tons and tons of users ๏ Goal: • comprehensive access control administration for customers ๏ Benefits: • Flexible and dynamic architecture • Exceptional performance • Extensible data model supports new applications and features • Low cost • A reliable access control administration system for 5 million customers, subscriptions and agreements • Complex dependencies between groups, companies, individuals, accounts, products, subscriptions, services and agreements • Broad and deep graphs (master customers with 1000s of customers, subscriptions & agreements). Example diagram: name: Andreas, account: 9758352794, subscription: sports, subscription: local, service: NFL, service: Ravens, agreement: ultimate, group: graphistas, promotion: fall, company: Neo Technology; relationships: owns, subscribes to, has plan, includes, provides, member of, offered, discounts, works with, gets discount on
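An access check in a graph like the one on this slide can be expressed as a reachability question: can the user reach the target by following only relationship types that grant access? The sketch below assumes a wiring for some of the slide's nodes (the slide does not spell it out), and which types grant access is likewise an assumption; it is not the customer's actual rule set.

```python
from collections import deque

# Assumed wiring for some nodes from the slide; types follow its edge labels.
edges = {
    "name:Andreas": [("OWNS", "account:9758352794"), ("MEMBER_OF", "group:graphistas")],
    "account:9758352794": [("SUBSCRIBES_TO", "subscription:sports")],
    "subscription:sports": [("INCLUDES", "service:NFL")],
    "group:graphistas": [("OFFERED", "promotion:fall")],
}

# Assumption: only these relationship types grant access along a path.
ACCESS_TYPES = {"OWNS", "SUBSCRIBES_TO", "INCLUDES"}


def can_access(user, target):
    """Breadth-first search that follows only access-granting relationships."""
    seen, queue = {user}, deque([user])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for rel_type, other in edges.get(node, []):
            if rel_type in ACCESS_TYPES and other not in seen:
                seen.add(other)
                queue.append(other)
    return False
```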
  106-109. [B] Timely Recommendations ๏ Customer: • a professional social network • 35 million users, adding 30,000+ each day ๏ Goal: up-to-date recommendations • Scalable solution with real-time end-user experience • Low-maintenance and reliable architecture • 8-week implementation ๏ Problem: • Real-time recommendations are imperative to attract new users and maintain positive user retention • Clustered MySQL solution not scalable or fast enough to support real-time requirements ๏ Upgrade from running a batch job • initially an hour-long batch job • but then success happened, and it became a day • then two days ๏ With Neo4j, real-time recommendations. Example diagram: people connected by "knows" relationships, each with a job: Andreas (talking), Allison (plumber), Tobias (coding), Peter (building), Emil (plumber), Stephen (DJ), Delia (barking), Tiberius (dancer)
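A common way to compute friend recommendations in a graph like this one is to count mutual friends among people two `knows` hops away. This is a generic sketch, not the customer's algorithm, which the slides do not describe; the `knows` edges between the people on the slide are assumed, and the function names are hypothetical.

```python
from collections import Counter

# Assumed, undirected "knows" relationships between people from the slide.
knows = {
    ("Andreas", "Allison"), ("Andreas", "Tobias"), ("Allison", "Peter"),
    ("Tobias", "Peter"), ("Tobias", "Emil"), ("Peter", "Stephen"),
}


def friends(person):
    return {b if a == person else a for a, b in knows if person in (a, b)}


def recommend(person, limit=3):
    """People two hops away who are not yet friends, ranked by mutual friends."""
    mine = friends(person)
    mutual = Counter()
    for friend in mine:
        for candidate in friends(friend):
            if candidate != person and candidate not in mine:
                mutual[candidate] += 1
    # Most mutual friends first; ties broken by name for a stable result.
    return sorted(mutual, key=lambda c: (-mutual[c], c))[:limit]


print(recommend("Andreas"))  # ['Peter', 'Emil']
```

Scanning the whole `knows` set in `friends` is a shortcut for brevity; in a graph database each person's relationships are read directly from that person's node.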
  110-114. [C] Collaboration on Global Scale ๏ Customer: a worldwide software leader • highly collaborative end-users ๏ Goal: offer an online platform for global collaboration • Highly flexible data analysis • Sub-second results for large, densely-connected data • User experience - competitive advantage • Massive amounts of data tied to members, user groups, member content, etc., all interconnected • Infer collaborative relationships through user-generated content • Worldwide availability (diagram: Asia, North America, Europe)
  115-132. How to get started? ๏ Documentation • neo4j.org ‣ http://www.neo4j.org/learn/nosql • docs.neo4j.org - tutorials+reference ‣ Data Modeling Examples • http://console.neo4j.org • Neo4j in Action • Good Relationships ๏ Worldwide one-day Neo4j Trainings ๏ Get Neo4j • http://neo4j.org/download • http://addons.heroku.com/neo4j/
  133-135. Really, once you start thinking in graphs it's hard to stop: Recommendations, MDM, Systems Management, Geospatial, Social computing, Business intelligence, Biotechnology, Making sense of all that data, your brain, access control, linguistics, catalogs, genealogy, routing, compensation, market vectors. What will you build?
  136. Thank You! Questions?
