This presentation covers several aspects of modeling data and domains with a graph database like Neo4j. The graph data model allows high-fidelity modeling. The first-class relationships of the graph model let you use much higher forms of normalization than you would use in a relational database.

Video here: https://vimeo.com/67371996

  • Typo on slide 80 and node on right should be "Language" and not another "Country". Great stuff, though and thanks!
  • Nice presentation, which could have a more precise context if the term 'data modelling' was used in a more qualified manner. 'Data modelling' here appears reduced to graph modelling vs. relational database modelling, which pertains to physical modelling of a relational database. All physical data models and databases however are best derived from a platform-independent data model (PIM), such as UML, or conceptual Entity-Relationship-Diagrams (ERD), or other graph models. At the PIM level, no foreign-keys exist, and UML and ERD are capturing the same classes/entities/nodes and their properties and interrelationships as any other graph model does.
    The real modelling difference does not show until the platform-specific models (PSM) are derived from the PIM.
    Getting the PIM right is paramount regardless of the target platform, and a graph model and database is not substituting a PIM. Therefore, graph modelling is not easier than relational modelling. Instead, graph modelling is a precursor to PSM modelling. Implementing a graph database however could well be easier than a relational database.

Transcript

  • 1. Data Modeling with Neo4j 1 Michael Hunger, Neo Technology @neo4j | @mesirii | michael@neo4j.org
  • 2. (Michael) -[:WORKS_ON]-> (Neo4j) ME Spring Cloud Community Cypher console community graph Server 2
  • 3. 3
  • 4. is a 4
  • 5. 5 NOSQL
  • 6. Graph Database 6
  • 12. A graph database... NO: not for charts & diagrams, or vector artwork. YES: for storing data that is structured as a graph. Remember linked lists, trees? Graphs are the general-purpose data structure. “A relational database may tell you the average age of everyone in this place, but a graph database will tell you who is most likely to buy you a beer.”
  • 13. 8
  • 18. You know relational: tables foo and bar, plus the join table foo_bar
  • 22. You know relational 8 now consider relationships...
  • 28. 8
  • 29. 9
  • 34. We're talking about a Property Graph: Nodes, Relationships, Properties (each a key+value), + Indexes (for easy look-ups). [Figure: person nodes Emil, Andrés, Lars, Johan, Allison, Peter, Michael, Tobias, Andreas, Ian, Mica and Delia, connected by "knows" relationships]
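The building blocks on slide 34 can be sketched in plain Python. This is only an illustration, not Neo4j's API: the `Node` and `Relationship` classes, the `people_by_name` index and the `since` property value are all hypothetical names and sample data.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Properties are plain key+value pairs attached to the node.
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node
    rel_type: str
    end: Node
    # Relationships can carry properties as well (sample value below).
    properties: dict = field(default_factory=dict)


andreas = Node({"name": "Andreas"})
peter = Node({"name": "Peter"})
emil = Node({"name": "Emil"})

relationships = [
    Relationship(andreas, "knows", peter, {"since": 2010}),  # made-up value
    Relationship(peter, "knows", emil),
]

# An index is only a look-up aid for finding a starting node by property value.
people_by_name = {n.properties["name"]: n for n in (andreas, peter, emil)}

start = people_by_name["Andreas"]
known = [r.end.properties["name"] for r in relationships if r.start is start]
print(known)
```

The index is used once to find the starting node; everything after that follows relationships.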
  • 35. Aggregate vs. Connected Data-Model 10
  • 36. NOSQL Relational Graph Document KeyValue Riak Column oriented 11 Redis Cassandra Mongo Couch Neo4j MySQL Postgres NOSQL Databases
  • 37. Aggregate Oriented Model: “There is a significant downside - the whole approach works really well when data access is aligned with the aggregates, but what if you want to look at the data in a different way? Order entry naturally stores orders as aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the database is that it allows you to slice and dice your data different ways for different audiences. This is why aggregate-oriented stores talk so much about map-reduce.” (Martin Fowler)
  • 38. Connected Data Model: The connected data model is based on fine-grained elements that are richly connected; the emphasis is on extracting many dimensions and attributes as elements. Connections are cheap and can be used not only for the domain-level relationships but also for additional structures that allow efficient access for different use-cases. The fine-grained model requires an external scope for mutating operations that ensures Atomicity, Consistency, Isolation and Durability (ACID), also known as transactions. (Michael Hunger)
  • 39. Data Modeling 14
  • 40. Why Data Modeling ๏ What is modeling? ๏ Aren't we schema free? ๏ How does it work in a graph? ๏ Where should modeling happen? DB or Application
  • 41. Data Models 16
  • 42. Model mis-match Real World Model
  • 43. Model mis-match Application Model Database Model
  • 44. Trinity of models
  • 49. Whiteboard --> Data: Andreas, Peter, Emil and Allison, connected by "knows" relationships. // Cypher query - friend of a friend: start n=node(0) match (n)--()--(foaf) return foaf
  • 50. 21
  • 54. You traverse the graph. // lookup starting point in an index: START n=node:People(name = 'Andreas') // then traverse to find results: START me=node:People(name = 'Andreas') MATCH (me)-[:FRIEND]-(friend)-[:FRIEND]-(friend2) RETURN friend2
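The two steps on this slide (look up a starting node, then traverse) can be sketched in plain Python. This is not how Neo4j executes the query. The `friends` adjacency map and the particular FRIEND edges between the four whiteboard people are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical undirected FRIEND relationships between the whiteboard people.
edges = [
    ("Andreas", "Peter"),
    ("Andreas", "Emil"),
    ("Peter", "Allison"),
    ("Emil", "Allison"),
]

# Store each relationship under both endpoints so it can be walked either way.
friends = defaultdict(set)
for a, b in edges:
    friends[a].add(b)
    friends[b].add(a)


def friends_of_friends(me):
    """Walk two FRIEND hops from the starting node, never stepping back to it."""
    result = set()
    for friend in friends[me]:
        for friend2 in friends[friend]:
            if friend2 != me:
                result.add(friend2)
    return result


# Step 1: the look-up by name plays the role of the index.
# Step 2: everything else is traversal.
print(friends_of_friends("Andreas"))
```

Unlike the Cypher query, which returns one row per matching path, this sketch collects results in a set, so a person reachable through two different friends appears only once.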
  • 55. 21
  • 56. SQL: SELECT skills.*, user_skill.* FROM users JOIN user_skill ON users.id = user_skill.user_id JOIN skills ON user_skill.skill_id = skills.id WHERE users.id = 1 | Cypher: START user = node(1) MATCH user -[user_skill]-> skill RETURN skill, user_skill
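To make the comparison on slide 56 concrete, here is a small sketch that runs the SQL join against an in-memory SQLite database and then answers the same question by walking relationships stored with the user. The table layout follows the slide; the `level` column, the `outgoing` map and all rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE skills (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_skill (user_id INTEGER, skill_id INTEGER, level TEXT);
    INSERT INTO users VALUES (1, 'Andreas'), (2, 'Peter');
    INSERT INTO skills VALUES (10, 'Cypher'), (11, 'Java');
    INSERT INTO user_skill VALUES (1, 10, 'expert'), (1, 11, 'good'), (2, 11, 'expert');
""")

# Relational: two joins through the user_skill join table.
rows = conn.execute("""
    SELECT skills.name, user_skill.level
    FROM users
    JOIN user_skill ON users.id = user_skill.user_id
    JOIN skills ON user_skill.skill_id = skills.id
    WHERE users.id = 1
    ORDER BY skills.id
""").fetchall()

# Graph-style: relationships are kept with the user node, so the query is a
# walk over that node's outgoing relationships and their properties.
outgoing = {
    1: [("Cypher", {"level": "expert"}), ("Java", {"level": "good"})],
    2: [("Java", {"level": "expert"})],
}
traversed = [(skill, rel["level"]) for skill, rel in outgoing[1]]

print(rows)
print(traversed)
```

Both approaches return the same two skills for user 1; the difference is that the join table disappears and its columns become relationship properties.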
  • 57. An Example 23
  • 58. What language do they speak here? Language Country
  • 61. Tables: Language (language_code, language_name, word_count); Country (country_code, country_name, flag_uri)
  • 62. Need to model the relationship: Language (language_code, language_name, word_count); Country (country_code, country_name, flag_uri, language_code)
  • 63. What if the cardinality changes? Language (language_code, language_name, word_count, country_code); Country (country_code, country_name, flag_uri)
  • 64. Or we go many-to-many? Language (language_code, language_name, word_count); Country (country_code, country_name, flag_uri); LanguageCountry (language_code, country_code)
  • 65. Or we want to qualify the relationship? Language (language_code, language_name, word_count); Country (country_code, country_name, flag_uri); LanguageCountry (language_code, country_code, primary)
  • 66. Start talking about Graphs
  • 67. Explicit Relationship: Language (name, word_count); Country (name, flag_uri); relationship: IS_SPOKEN_IN
  • 68. Relationship Properties: Language (name, word_count); Country (name, flag_uri); relationship: IS_SPOKEN_IN (as_primary)
  • 69. What’s different? Relational: Language (language_code, language_name, word_count); Country (country_code, country_name, flag_uri); LanguageCountry (language_code, country_code, primary). Graph: IS_SPOKEN_IN
  • 70. What’s different? ๏ Implementation of maintaining relationships is left up to the database ๏ Artificial keys disappear or are unnecessary ๏ Relationships get an explicit name • can be navigated in both directions
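The points on slide 70 can be illustrated with a short Python sketch: each relationship is stored once, carries its own name and properties, needs no artificial keys, and can be followed from either end. The `outgoing` and `incoming` maps and the helper functions are hypothetical names, and the `as_primary` values are sample data only.

```python
from collections import defaultdict

# Each relationship is stored once: (start, type, end, properties).
relationships = [
    ("English", "IS_SPOKEN_IN", "Canada", {"as_primary": True}),
    ("French", "IS_SPOKEN_IN", "Canada", {"as_primary": True}),
    ("French", "IS_SPOKEN_IN", "France", {"as_primary": True}),
]

# Maintaining both directions is the store's job, not the modeler's.
outgoing = defaultdict(list)
incoming = defaultdict(list)
for start, rel_type, end, props in relationships:
    outgoing[start].append((rel_type, end, props))
    incoming[end].append((rel_type, start, props))


def countries_for(language):
    """Follow IS_SPOKEN_IN in its stored direction."""
    return [end for rel_type, end, _ in outgoing[language] if rel_type == "IS_SPOKEN_IN"]


def languages_for(country):
    """Follow the same relationships against their direction."""
    return [start for rel_type, start, _ in incoming[country] if rel_type == "IS_SPOKEN_IN"]


print(countries_for("French"))
print(languages_for("Canada"))
```

No language_code or country_code columns appear anywhere: the relationship itself is the link.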
  • 71. Relationship specialisation: Language (name, word_count); Country (name, flag_uri); relationship: IS_SPOKEN_IN (as_primary)
  • 72. Bidirectional relationships: Language (name, word_count); Country (name, flag_uri); relationships: IS_SPOKEN_IN, PRIMARY_LANGUAGE
  • 73. Weighted relationships: Language (name, word_count); Country (name, flag_uri); relationship: POPULATION_SPEAKS (population_fraction)
  • 74. Keep on adding relationships: Language (name, word_count); Country (name, flag_uri); relationships: POPULATION_SPEAKS (population_fraction), SIMILAR_TO, ADJACENT_TO
  • 75. EMBRACE the paradigm
  • 76. Use the building blocks ๏ Nodes ๏ Relationships ๏ Properties name: value RELATIONSHIP_NAME
  • 77. Anti-pattern: rich properties name: “Canada” languages_spoken: “[ ‘English’, ‘French’ ]”
  • 78. Normalize Nodes
  • 79. Anti-Pattern: Node represents multiple concepts: Country (name, flag_uri, language_name, number_of_words, yes_in_language, no_in_language, currency_code, currency_name)
  • 80. Split up in separate concepts: Country (name, flag_uri); Language (name, number_of_words, yes, no); Currency (currency_code, currency_name); relationships: SPEAKS, USES_CURRENCY
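As a rough sketch of the normalization step on slides 77 to 80, the rich `languages_spoken` string property from slide 77 can be split into separate Language nodes connected by SPEAKS. The dictionary layout and the `label` key are assumptions for illustration only.

```python
import ast

# Anti-pattern from slide 77: a list hidden inside a string property.
country = {"name": "Canada", "languages_spoken": "[ 'English', 'French' ]"}

# Parse the embedded list once...
language_names = ast.literal_eval(country["languages_spoken"])

# ...then give each concept its own node and connect them explicitly.
country_node = {"label": "Country", "name": country["name"]}
language_nodes = [{"label": "Language", "name": name} for name in language_names]
relationships = [
    (country_node["name"], "SPEAKS", language["name"]) for language in language_nodes
]

print(relationships)
```

After the split, a question such as "which countries speak French?" becomes a traversal instead of a string search inside a property.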
  • 81. Challenge: Property or Relationship? ๏ Can every property be replaced by a relationship? • Hint: triple stores. Are they easy to use? ๏ Should all entities with the same property values be connected?
  • 82. Object Mapping ๏ Similar to how you would map objects to a relational database, using an ORM such as Hibernate ๏ Generally simpler and easier to reason about ๏ Examples • Java: Spring Data Neo4j • Ruby: Active Model ๏ Why Map? • Do you use mapping because you are scared of SQL? • Following DDD, could you write your repositories directly against the graph API?
  • 83. CONNECT for fast access In-Graph Indices
  • 84. Relationships for querying ๏ like in other databases • same structure for different use-cases (OLTP and OLAP) doesn't work • graph allows: add more structures ๏ Relationships should be the primary means to access nodes in the database ๏ Traversing relationships is cheap – that’s the whole design goal of a graph database ๏ Use lookups only to find starting nodes for a query. Data Modeling examples in Manual
  • 85. Anti-pattern: unconnected graph [Figure: many unconnected nodes, each with name: “Jones”]
  • 86. Pattern: Linked List
  • 87. Pattern: Multiple Relationships
  • 88. Pattern-Trees: Tags and Categories
  • 89. Pattern-Tree: Multi-Level-Tree
  • 90. Pattern-Trees: R-Tree (spatial)
  • 91. Example: Activity Stream
  • 92. Graph Evolution
  • 93. Evolution: Relationship to Node. Before: Peter, SENT_EMAIL, Michael. After: an Email node connected by EMAIL_FROM (Peter), EMAIL_TO (Michael), EMAIL_CC (Emil) and TAGGED (Community) . . . see Hyperedges
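The evolution on slide 93 can be sketched as a small refactoring function: a binary SENT_EMAIL relationship is replaced by an Email node that can connect any number of participants. The function name `promote_to_node`, the `Email:1` identifier and the direction of the new relationships (from the Email node outward) are assumptions; the slide text does not specify them.

```python
# Before: a single relationship can only connect two nodes.
before = [("Peter", "SENT_EMAIL", "Michael")]


def promote_to_node(rel, email_id, cc=(), tags=()):
    """Replace one SENT_EMAIL relationship with an Email node and its edges."""
    sender, _, recipient = rel
    email = f"Email:{email_id}"  # hypothetical node identifier
    edges = [(email, "EMAIL_FROM", sender), (email, "EMAIL_TO", recipient)]
    edges += [(email, "EMAIL_CC", person) for person in cc]
    edges += [(email, "TAGGED", tag) for tag in tags]
    return edges


# After: the email is a node, so CC recipients and tags have somewhere to attach.
after = promote_to_node(before[0], 1, cc=["Emil"], tags=["Community"])
for edge in after:
    print(edge)
```

Once the relationship is a node, further participants or properties can be added without changing the existing edges.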
  • 94. Combine multiple Domains in a Graph ๏ you start with a single domain ๏ add more connected domains as your system evolves ๏ more domains let you ask different queries ๏ one domain “indexes” the other ๏ Example: Facebook Graph Search • social graph • location graph • activity graph • favorite graph • ...
  • 95. Notes on the Graph Data Model ๏ Schema free, but constraints ๏ Model your graph with a whiteboard and a wise man ๏ Nodes as main entities but useless without connections ๏ Relationships are first-class citizens in the model and database ๏ Normalize more than in a relational database ๏ use meaningful relationship-types, not generic ones like IS_ ๏ use in-graph structures to allow different access paths ๏ evolve your graph to your needs, incremental growth
  • 96. Real-World Examples
  • 101. Real World Use Cases: • [A] ACL from Hell • [B] Timely recommendations • [C] Global collaboration
  • 105. [A] ACL from Hell ๏ Customer: • leading consumer utility company with tons and tons of users ๏ Goal: • comprehensive access control administration for customers ๏ Benefits: • Flexible and dynamic architecture • Exceptional performance • Extensible data model supports new applications and features • Low cost • A reliable access control administration system for 5 million customers, subscriptions and agreements • Complex dependencies between groups, companies, individuals, accounts, products, subscriptions, services and agreements • Broad and deep graphs (master customers with 1000s of customers, subscriptions & agreements) [Figure: example graph with nodes name: Andreas, account: 9758352794, agreement: ultimate, subscription: sports, subscription: local, service: NFL, service: Ravens, group: graphistas, promotion: fall, company: Neo Technology; relationships: owns, subscribes to, has plan, includes, provides, member of, offered discounts, works with, gets discount on]
  • 109. [B] Timely Recommendations ๏ Customer: • a professional social network • 35 million users, adding 30,000+ each day ๏ Goal: up-to-date recommendations • Scalable solution with real-time end-user experience • Low maintenance and reliable architecture • 8-week implementation ๏ Problem: • Real-time recommendation imperative to attract new users and maintain positive user retention • Clustered MySQL solution not scalable or fast enough to support real-time requirements ๏ Upgrade from running a batch job • initial hour-long batch job • but then success happened, and it became a day • then two days ๏ With Neo4j, real-time recommendations [Figure: person nodes connected by "knows" relationships: Andreas (job: talking), Allison (job: plumber), Tobias (job: coding), Peter (job: building), Emil (job: plumber), Stephen (job: DJ), Delia (job: barking), Tiberius (job: dancer)]
  • 114. [C] Collaboration on Global Scale ๏ Customer: a worldwide software leader • highly collaborative end-users ๏ Goal: offer an online platform for global collaboration • Highly flexible data analysis • Sub-second results for large, densely-connected data • User experience - competitive advantage • Massive amounts of data tied to members, user groups, member content, etc. all interconnected • Infer collaborative relationships through user-generated content • Worldwide Availability [Figure: regions Asia, North America, Europe]
  • 132. How to get started? ๏ Documentation • neo4j.org ‣ http://www.neo4j.org/learn/nosql • docs.neo4j.org - tutorials+reference ‣ Data Modeling Examples • http://console.neo4j.org • Neo4j in Action • Good Relationships ๏ Worldwide one-day Neo4j Trainings ๏ Get Neo4j • http://neo4j.org/download • http://addons.heroku.com/neo4j/
  • 135. Really, once you start thinking in graphs it's hard to stop: Recommendations, MDM, Systems Management, Geospatial, Social computing, Business intelligence, Biotechnology, Making Sense of all that data, your brain, access control, linguistics, catalogs, genealogy, routing, compensation, market vectors. What will you build?
  • 136. Thank You! Questions?