
20141216 graph database prototyping ams meetup



This presentation outlines how you can go about prototyping a graph database quickly and efficiently.



  1. Graph Database Prototyping @ AMS GraphDB meetup
  2. Agenda for Tonight
     • Building a Graph Database Prototype
     • 3 parts
       – Graph database & modeling concepts
       – Prototyping tools & import
       – Graph querying with Cypher
  3. Data Modeling With Neo4j
  4. Topics
     • Graph model building blocks
     • Quick intro to Cypher
     • Example modeling process
     • Modeling tips
     • Recipes for common modeling scenarios
     • Refactoring
     • Test-driven data modeling
  5. Graph Model Building Blocks
  6. Property Graph Data Model
  7. Four Building Blocks
     • Nodes
     • Relationships
     • Properties
     • Labels
  8. Nodes
  9. Nodes
     • Used to represent entities and complex value types in your domain
     • Can contain properties
       – Used to represent entity attributes and/or metadata (e.g. timestamps, version)
       – Key-value pairs
         • Java primitives
         • Arrays
         • null is not a valid value
       – Every node can have different properties
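The property rules on the Nodes slide can be sketched as a small validator. This is a hypothetical Python stand-in, not Neo4j's own type checker: the set of Python scalar types standing in for "Java primitives" is an assumption, as are the sample names and values.

```python
from typing import Any

# Assumed Python stand-ins for the "Java primitives" mentioned on the slide.
SCALAR_TYPES = (bool, int, float, str)


def validate_properties(props: dict[str, Any]) -> dict[str, Any]:
    """Reject values a node property cannot hold: null, maps, mixed or nested arrays."""
    for key, value in props.items():
        if value is None:
            raise ValueError(f"{key!r}: null is not a valid property value")
        if isinstance(value, list):
            # Arrays must contain scalars, all of the same type.
            if not all(isinstance(v, SCALAR_TYPES) for v in value) or len({type(v) for v in value}) > 1:
                raise ValueError(f"{key!r}: arrays must hold scalars of one type")
        elif not isinstance(value, SCALAR_TYPES):
            raise ValueError(f"{key!r}: unsupported type {type(value).__name__}")
    return dict(props)


class Node:
    """A node is a bag of key-value properties; each node chooses its own keys."""

    def __init__(self, **props: Any) -> None:
        self.props = validate_properties(props)


# Two nodes of the same kind with different property sets (hypothetical data).
ian = Node(name="Ian", skills=["Java", "C#", "Neo4j"])
lucy = Node(name="Lucy", joined=2012)
```

Absence of a value is expressed by leaving the key out, which is why `lucy` simply has no `skills` key instead of a null.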
  10. Entities and Value Types
     • Entities
       – Have unique conceptual identity
       – Change attribute values, but identity remains the same
     • Value types
       – No conceptual identity
       – Can substitute for each other if they have the same value
         • Simple: single value (e.g. colour, category)
         • Complex: multiple attributes (e.g. address)
  11. Relationships
  12. Relationships
     • Every relationship has a name and a direction
       – Add structure to the graph
       – Provide semantic context for nodes
     • Can contain properties
       – Used to represent quality or weight of relationship, or metadata
     • Every relationship must have a start node and an end node
       – No dangling relationships
  13. Relationships (continued)
     • Nodes can have more than one relationship
     • Nodes can be connected by more than one relationship
     • Self relationships are allowed
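The relationship rules on slides 12 and 13 (named, directed, no dangling ends, multiple and self relationships allowed) can be captured in a tiny in-memory property graph. This is a hypothetical sketch, not Neo4j's API; `Graph`, `relate`, `outgoing`, the `KNOWS`/`WORKS_WITH` types and the `since` value are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(eq=False)  # eq=False: nodes compare by identity, not by value
class Node:
    labels: set[str]
    props: dict[str, Any] = field(default_factory=dict)


@dataclass(eq=False)
class Relationship:
    type: str    # every relationship has a name...
    start: Node  # ...and a direction, from start to end
    end: Node
    props: dict[str, Any] = field(default_factory=dict)


class Graph:
    def __init__(self) -> None:
        self.nodes: list[Node] = []
        self.rels: list[Relationship] = []

    def add_node(self, *labels: str, **props: Any) -> Node:
        node = Node(set(labels), props)
        self.nodes.append(node)
        return node

    def relate(self, start: Node, rel_type: str, end: Node, **props: Any) -> Relationship:
        # No dangling relationships: both endpoints must already be in this graph.
        if start not in self.nodes or end not in self.nodes:
            raise ValueError("start and end node must exist in the graph")
        rel = Relationship(rel_type, start, end, props)
        self.rels.append(rel)
        return rel

    def outgoing(self, node: Node, rel_type: str) -> list[Node]:
        return [r.end for r in self.rels if r.start is node and r.type == rel_type]


g = Graph()
ian = g.add_node("Person", name="Ian")
lucy = g.add_node("Person", name="Lucy")
g.relate(ian, "KNOWS", lucy)
g.relate(ian, "WORKS_WITH", lucy, since=2013)  # a second relationship between the same pair
g.relate(ian, "KNOWS", ian)                    # a self relationship is allowed
```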
  14. Variable Structure
     • Relationships are defined with regard to node instances, not classes of nodes
       – Two nodes representing the same kind of "thing" can be connected in very different ways
     • Allows for structural variation in the domain
       – Contrast with relational schemas, where foreign key relationships apply to all rows in a table
     • No need to use null to represent the absence of a connection
  15. Labels
  16. Labels
     • Every node can have zero or more labels
     • Used to represent roles (e.g. user, product, company)
       – Group nodes
       – Allow us to associate indexes and constraints with groups of nodes
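To make "associate indexes and constraints with groups of nodes" concrete, here is a hypothetical in-memory store in which labels group nodes and a uniqueness constraint is attached to a (label, property) pair. It only illustrates the idea; `LabelledStore` and its methods are invented names, not Neo4j's constraint API.

```python
from collections import defaultdict
from typing import Any


class LabelledStore:
    """Hypothetical store: labels group nodes; unique constraints apply per (label, property)."""

    def __init__(self) -> None:
        self.nodes: list[dict[str, Any]] = []
        self.by_label: dict[str, list[dict[str, Any]]] = defaultdict(list)
        self.unique: set[tuple[str, str]] = set()

    def add_unique_constraint(self, label: str, key: str) -> None:
        self.unique.add((label, key))

    def create(self, *labels: str, **props: Any) -> dict[str, Any]:
        # Enforce only the constraints whose label this node carries.
        for label, key in self.unique:
            if label in labels and key in props:
                if any(n.get(key) == props[key] for n in self.by_label[label]):
                    raise ValueError(f"{label}.{key} = {props[key]!r} already exists")
        node = dict(props)
        self.nodes.append(node)
        for label in labels:  # zero or more labels per node
            self.by_label[label].append(node)
        return node


store = LabelledStore()
store.add_unique_constraint("Person", "name")
store.create("Person", name="Ian")
store.create("Person", "Employee", name="Lucy")
store.create(name="Ian")  # no label, so the Person constraint does not apply
```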
  17. Four Building Blocks
     • Nodes
       – Entities
     • Relationships
       – Connect entities and structure domain
     • Properties
       – Entity attributes, relationship qualities, and metadata
     • Labels
       – Group nodes by role
  18. Designing a Graph Model
  19. Models
     • Purposeful abstraction of a domain designed to satisfy particular application/end-user goals
     (Images: en.wikipedia.org)
  20. Design for Queryability (diagram: Model, Query)
  21. Method
     1. Identify application/end-user goals
     2. Figure out what questions to ask of the domain
     3. Identify entities in each question
     4. Identify relationships between entities in each question
     5. Convert entities and relationships to paths
        – These become the basis of the data model
     6. Express questions as graph patterns
        – These become the basis for queries
  22. Application/End-User Goals
     As an employee
     I want to know who in the company has similar skills to me
     So that we can exchange knowledge
  23. Questions To Ask of the Domain
     As an employee
     I want to know who in the company has similar skills to me
     So that we can exchange knowledge
     → Which people, who work for the same company as me, have similar skills to me?
  24. Identify Entities
     Which people, who work for the same company as me, have similar skills to me?
     Person, Company, Skill
  25. Identify Relationships Between Entities
     Which people, who work for the same company as me, have similar skills to me?
     Person WORKS_FOR Company
     Person HAS_SKILL Skill
  26. Convert to Cypher Paths
     Person WORKS_FOR Company → (:Person)-[:WORKS_FOR]->(:Company)
     Person HAS_SKILL Skill → (:Person)-[:HAS_SKILL]->(:Skill)
     (Diagram callouts: entities become labels; relationships become relationship types.)
  27. Consolidate Paths
     (:Person)-[:WORKS_FOR]->(:Company), (:Person)-[:HAS_SKILL]->(:Skill)
     → (:Company)<-[:WORKS_FOR]-(:Person)-[:HAS_SKILL]->(:Skill)
  28. Create Person Subgraph
     MERGE (c:Company{name:'Acme'})
     MERGE (p:Person{name:'Ian'})
     MERGE (s1:Skill{name:'Java'})
     MERGE (s2:Skill{name:'C#'})
     MERGE (s3:Skill{name:'Neo4j'})
     CREATE UNIQUE (c)<-[:WORKS_FOR]-(p),
                   (p)-[:HAS_SKILL]->(s1),
                   (p)-[:HAS_SKILL]->(s2),
                   (p)-[:HAS_SKILL]->(s3)
     RETURN c, p, s1, s2, s3
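The point of using MERGE and CREATE UNIQUE here is that the statement is idempotent: running it twice should not duplicate Ian, Acme or the skills. The following is a hypothetical Python stand-in for that get-or-create behaviour, not how Neo4j implements it; `MergeGraph`, `merge_node` and `merge_rel` are invented names.

```python
from typing import Any


class MergeGraph:
    """Hypothetical in-memory stand-in for MERGE / CREATE UNIQUE semantics."""

    def __init__(self) -> None:
        self.nodes: list[tuple[str, dict[str, Any]]] = []
        self.rels: set[tuple[int, str, int]] = set()

    def merge_node(self, label: str, **props: Any) -> int:
        # Reuse a node with this label and these property values; create it only if none exists.
        for node_id, (existing_label, existing_props) in enumerate(self.nodes):
            if existing_label == label and all(existing_props.get(k) == v for k, v in props.items()):
                return node_id
        self.nodes.append((label, dict(props)))
        return len(self.nodes) - 1

    def merge_rel(self, start: int, rel_type: str, end: int) -> None:
        # Relationships are (start, type, end) triples in a set, so re-adding one is a no-op.
        self.rels.add((start, rel_type, end))


def create_person_subgraph(g: MergeGraph) -> None:
    """Mirrors the slide's statement: Ian works for Acme and has three skills."""
    c = g.merge_node("Company", name="Acme")
    p = g.merge_node("Person", name="Ian")
    skills = [g.merge_node("Skill", name=n) for n in ("Java", "C#", "Neo4j")]
    g.merge_rel(p, "WORKS_FOR", c)
    for s in skills:
        g.merge_rel(p, "HAS_SKILL", s)


g = MergeGraph()
create_person_subgraph(g)
create_person_subgraph(g)  # running it again changes nothing
```

Because both node and relationship creation are get-or-create, subgraphs for other people can be loaded with the same statement and will share the existing Company and Skill nodes.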
  29. Candidate Data Model
     (:Company)<-[:WORKS_FOR]-(:Person)-[:HAS_SKILL]->(:Skill)
  30. Express Question as Graph Pattern
     Which people, who work for the same company as me, have similar skills to me?
  31. Cypher Query
     Which people, who work for the same company as me, have similar skills to me?
     MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
           (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
     WHERE me.name = {name}
     RETURN colleague.name AS name, count(skill) AS score, collect(skill.name) AS skills
     ORDER BY score DESC
  32. Graph Pattern
     MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
           (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
  33. Anchor Pattern in Graph
     WHERE me.name = {name}
     • If an index for Person.name exists, Cypher will use it
  34. Create Projection of Results
     RETURN colleague.name AS name, count(skill) AS score, collect(skill.name) AS skills
     ORDER BY score DESC
  35. First Match
  36. Second Match
  37. Third Match
  38. Running the Query
     +-----------------------------------+
     | name   | score | skills           |
     +-----------------------------------+
     | "Lucy" | 2     | ["Java","Neo4j"] |
     | "Bill" | 1     | ["Neo4j"]        |
     +-----------------------------------+
     2 rows
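To see how the MATCH, count and collect steps produce that table, here is the same logic in plain Python over hypothetical sample data (Lucy's and Bill's skill lists and the Sarah/Globex entry are invented so that the output matches the two rows above). It is a sketch of the query's semantics, not of how Neo4j executes it.

```python
from collections import defaultdict

# Hypothetical sample data, chosen to reproduce the two rows shown on the slide.
WORKS_FOR = {"Ian": "Acme", "Lucy": "Acme", "Bill": "Acme", "Sarah": "Globex"}
HAS_SKILL = {
    "Ian": ["Java", "C#", "Neo4j"],
    "Lucy": ["Java", "Neo4j", "Python"],
    "Bill": ["Neo4j", "Scala"],
    "Sarah": ["Java", "Neo4j"],  # shares skills with Ian but works for another company
}


def similar_colleagues(me: str) -> list[tuple[str, int, list[str]]]:
    company = WORKS_FOR[me]
    shared: dict[str, list[str]] = defaultdict(list)
    for colleague, employer in WORKS_FOR.items():
        # Skip me: in Cypher a MATCH cannot bind the same relationship twice,
        # so `me` does not show up as its own colleague.
        if colleague == me or employer != company:
            continue
        for skill in HAS_SKILL[me]:
            if skill in HAS_SKILL[colleague]:
                shared[colleague].append(skill)  # like collect(skill.name)
    rows = [(name, len(skills), skills) for name, skills in shared.items()]  # count(skill) AS score
    return sorted(rows, key=lambda row: row[1], reverse=True)  # ORDER BY score DESC


print(similar_colleagues("Ian"))
```

Colleagues with no skills in common never match the pattern, so they are absent from the result rather than listed with a score of 0.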
  39. From User Story to Model and Query
     User story: As an employee / I want to know who in the company has similar skills to me / So that we can exchange knowledge
     Question: Which people, who work for the same company as me, have similar skills to me?
     Entities and relationships: Person WORKS_FOR Company, Person HAS_SKILL Skill
     Model: (:Company)<-[:WORKS_FOR]-(:Person)-[:HAS_SKILL]->(:Skill)
     Query:
       MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
             (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
       WHERE me.name = {name}
       RETURN colleague.name AS name, count(skill) AS score, collect(skill.name) AS skills
       ORDER BY score DESC
  40. Modeling Tips
  41. Properties Versus Relationships
  42. Use Relationships When…
     • You need to specify the weight, strength, or some other quality of the relationship
     • AND/OR the attribute value comprises a complex value type (e.g. address)
     • Examples:
       – Find all my colleagues who are expert (relationship quality) at a skill (attribute value) we have in common
       – Find all recent orders delivered to the same delivery address (complex value type)
  43. Use Properties When…
     • There's no need to qualify the relationship
     • AND the attribute value comprises a simple value type (e.g. colour)
     • Examples:
       – Find those projects written by contributors to my projects that use the same language (attribute value) as my projects
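The first "use relationships" example combines both ideas: the skill is a shared node, while the expertise level qualifies the HAS_SKILL relationship. A hypothetical Python sketch of that query, assuming a `level` property on each HAS_SKILL relationship and treating everyone in the list as a colleague; the levels and names are invented sample data.

```python
# Hypothetical data: (person, skill, level). The level lives on the HAS_SKILL
# relationship (a quality of the relationship); the skill is a node that
# several people can share (the attribute value).
HAS_SKILL = [
    ("Ian", "Java", "expert"),
    ("Ian", "Neo4j", "beginner"),
    ("Lucy", "Java", "expert"),
    ("Lucy", "Neo4j", "advanced"),
    ("Bill", "Neo4j", "expert"),
]


def expert_colleagues(me: str) -> list[tuple[str, str]]:
    """Colleagues who are expert at a skill they have in common with me."""
    my_skills = {skill for person, skill, _ in HAS_SKILL if person == me}
    return sorted(
        (person, skill)
        for person, skill, level in HAS_SKILL
        if person != me and skill in my_skills and level == "expert"
    )


print(expert_colleagues("Ian"))
```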
  44. If Performance is Critical…
     • Looking up a small property on a node will be quicker than traversing a relationship
       – But traversing a relationship is still faster than a SQL join…
     • However, many small properties on a node, or a lookup on a large string or large array property, will impact performance
       – Always performance test against a representative dataset
  45. Relationship Granularity
  46. Align With Use Cases
     • Relationships are the "royal road" into the graph
     • When querying, well-named relationships help discover only what is absolutely necessary
       – And eliminate unnecessary portions of the graph from consideration
  47. General Relationships
     • Qualified by property
  48. Specific Relationships
  49. Best of Both Worlds
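The diagrams for slides 47 to 49 did not survive, so the relationship names below (`ADDRESS` with a `kind` property versus `HOME_ADDRESS` and `DELIVERY_ADDRESS`) are hypothetical. The sketch contrasts a general relationship qualified by a property with specific relationship types, using an in-memory list rather than Neo4j.

```python
from typing import NamedTuple


class Rel(NamedTuple):
    start: str
    type: str
    end: str
    props: dict


# Hypothetical graph: Alice's two addresses, modelled both ways at once.
RELS = [
    Rel("Alice", "ADDRESS", "addr1", {"kind": "home"}),      # general type, qualified by a property
    Rel("Alice", "ADDRESS", "addr2", {"kind": "delivery"}),
    Rel("Alice", "HOME_ADDRESS", "addr1", {}),                # specific types
    Rel("Alice", "DELIVERY_ADDRESS", "addr2", {}),
]


def via_general(person: str, kind: str) -> list[str]:
    # Every ADDRESS relationship must be inspected, then filtered on its property.
    return [r.end for r in RELS
            if r.start == person and r.type == "ADDRESS" and r.props.get("kind") == kind]


def via_specific(person: str, rel_type: str) -> list[str]:
    # The relationship type alone selects the answer; no property check is needed.
    return [r.end for r in RELS if r.start == person and r.type == rel_type]
```

One reading of "Best of Both Worlds" is keeping both forms, as the sample graph does: specific types for targeted traversals, and the general type for "any address" questions, at the cost of an extra relationship per address.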
  50. Model and Query Recipes
  51. Events and Actions
     • Often involve multiple parties
     • Can include other circumstantial detail, which may be common to multiple events
     • Examples
       – Patrick worked for Acme from 2001 to 2005 as a Software Developer
       – Sarah sent an email to Lucy, copying in David and Claire
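An event such as the email example involves four people, more than a single relationship can connect, so a common approach is to model the event itself as a node. A hypothetical sketch in Python; the `SENT`, `TO` and `CC` relationship names and the `email1` id are assumptions, not taken from the slides.

```python
# Hypothetical model: the email is its own node, so one event can link any number of people.
RELS = [
    ("Sarah", "SENT", "email1"),
    ("email1", "TO", "Lucy"),
    ("email1", "CC", "David"),
    ("email1", "CC", "Claire"),
]


def participants(event: str) -> dict[str, list[str]]:
    """Everyone connected to the event node, grouped by relationship type."""
    roles: dict[str, list[str]] = {}
    for start, rel_type, end in RELS:
        if end == event:
            roles.setdefault(rel_type, []).append(start)
        elif start == event:
            roles.setdefault(rel_type, []).append(end)
    return roles


print(participants("email1"))
```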
  52. Timeline Trees
     • Discrete events
       – No natural relationships to other events
     • You need to find events at differing levels of granularity
       – Between two days
       – Between two months
       – Between two minutes
  53. Example Timeline Tree
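The example tree's diagram is not recoverable, but the idea can be sketched in memory: year, month and day nodes form a tree, events hang off day nodes, and a range query prunes whole year and month subtrees. This hypothetical version stops at day granularity for brevity; the event names and the 2015 date for the next meetup are assumed sample data.

```python
from datetime import date


class TimelineTree:
    """In-memory timeline tree: year -> month -> day nodes, with events attached to day nodes."""

    def __init__(self) -> None:
        self.years: dict[int, dict[int, dict[int, list[str]]]] = {}

    def add_event(self, day: date, event: str) -> None:
        # Create the missing year, month and day nodes on demand, then attach the event.
        months = self.years.setdefault(day.year, {})
        days = months.setdefault(day.month, {})
        days.setdefault(day.day, []).append(event)

    def between(self, start: date, end: date) -> list[str]:
        """Events from start to end (inclusive), in chronological order."""
        found: list[str] = []
        for year in sorted(self.years):
            if not start.year <= year <= end.year:
                continue  # prune a whole year subtree
            for month in sorted(self.years[year]):
                if not (start.year, start.month) <= (year, month) <= (end.year, end.month):
                    continue  # prune a whole month subtree
                for d in sorted(self.years[year][month]):
                    if start <= date(year, month, d) <= end:
                        found.extend(self.years[year][month][d])
        return found


tree = TimelineTree()
tree.add_event(date(2014, 11, 3), "planning call")
tree.add_event(date(2014, 12, 16), "AMS meetup")
tree.add_event(date(2015, 1, 22), "next meetup")
```

Extending the tree with hour and minute levels follows the same pattern, one more nested level per granularity.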
  54. Pitfalls and Anti-Patterns
  55. Modeling Entities as Relationships
     • Limits data model evolution
       – A relationship connects two things
       – Modeling an entity as a relationship prevents it from being related to more than two things
     • Smells:
       – Lots of attribute-like properties
       – Heavy use of relationship indexes
     • Entities hidden in verbs:
       – E.g. emailed, reviewed
  56. Example: Movie Reviews
     • Initial requirements:
       – People review films
       – Application aggregates reviews from multiple sites
  57. Initial Model
  58. New Requirements
     • Allow users to comment on each other's reviews
       – Can't connect a review to a third entity
  59. Revised Model
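The revised model's diagram is lost, but the fix the slides point to is promoting the hidden entity (the review) from a relationship to a node. A hypothetical in-memory sketch; the node ids and the `WROTE`, `REVIEW_OF`, `PUBLISHED_ON` and `COMMENT_ON` relationship names are illustrative assumptions.

```python
# Hypothetical revised model: the review is a node, so it can connect to as many things as needed.
RELS = [
    ("alice", "WROTE", "review1"),
    ("review1", "REVIEW_OF", "film1"),
    ("review1", "PUBLISHED_ON", "site1"),
    ("bob", "WROTE", "comment1"),           # new requirement: comments on reviews
    ("comment1", "COMMENT_ON", "review1"),
]


def neighbours(node: str) -> list[str]:
    """All nodes directly connected to `node`, in either direction."""
    out = {end for start, _, end in RELS if start == node}
    inc = {start for start, _, end in RELS if end == node}
    return sorted(out | inc)


def comments_on(review: str) -> list[str]:
    return [start for start, rel_type, end in RELS if rel_type == "COMMENT_ON" and end == review]
```

As a relationship, a review could only join its author and the film; as a node, it is connected to four things here, and new requirements add relationships rather than forcing a remodel.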
  60. Model Actions in Terms of Products
  61. Now for Some Prototyping!
  62. Draw a Model! E.g. using Visio, www.apcjones.com/arrows, http://graphjson.io, OmniGraffle
  63. Creating a prototype DB out of our model?
  64. Now for Some Queries!
  65. Next meetup!
     • January 22nd: how to create an APPLICATION on top of our newly created database
  66. BACKUP slides: Cypher Query Language
  67. Nodes and Relationships
     ()-->()
  68. Labels and Relationship Types
     (:Person)-[:FRIEND]->(:Person)
  69. Properties
     (:Person{name:'Peter'})-[:FRIEND]->(:Person{name:'Lucy'})
  70. Identifiers
     (p1:Person{name:'Peter'})-[r:FRIEND]->(p2:Person{name:'Lucy'})
  71. Cypher
     MATCH graph_pattern
     WHERE binding_and_filter_criteria
     RETURN results
  72. Cypher
     MATCH (p:Person)-[:FRIEND]->(friends)
     WHERE p.name = 'Peter'
     RETURN friends
  73. Lookup Using Identifier + Label
     MATCH (p:Person)-[:FRIEND]->(friends)
     WHERE p.name = 'Peter'
     RETURN friends
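The MATCH / WHERE / RETURN shape on the backup slides boils down to two steps: anchor on nodes found via label and property, then expand along the pattern's relationships. A rough Python analogue over a hypothetical sample graph (the node ids, the `Dog` node and its relationship are invented to show the label narrowing the lookup); it returns friend names rather than nodes for readability.

```python
from typing import Any

# Hypothetical sample graph: node id -> labels and properties; relationships as (start, type, end).
NODES: dict[int, dict[str, Any]] = {
    1: {"labels": {"Person"}, "name": "Peter"},
    2: {"labels": {"Person"}, "name": "Lucy"},
    3: {"labels": {"Person"}, "name": "Bill"},
    4: {"labels": {"Dog"}, "name": "Peter"},  # same name, different label
}
RELS = [(1, "FRIEND", 2), (1, "FRIEND", 3), (4, "FRIEND", 2)]


def match_friends(name: str) -> list[str]:
    """Rough analogue of:
    MATCH (p:Person)-[:FRIEND]->(friends) WHERE p.name = 'Peter' RETURN friends
    """
    # Anchor: bind p to nodes carrying the Person label whose name matches.
    anchors = {nid for nid, n in NODES.items() if "Person" in n["labels"] and n["name"] == name}
    # Expand: follow outgoing FRIEND relationships from each anchor.
    return [NODES[end]["name"] for start, rel_type, end in RELS
            if start in anchors and rel_type == "FRIEND"]


print(match_friends("Peter"))
```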
