Graph db

  1. Using Graph Databases for Insights into Connected Data
     Gagan Agrawal, Xebia
  2. Agenda
     - High level view of the graph space
     - Comparison with RDBMS and other NoSQL stores
     - Data modeling
     - Cypher: graph query language
     - Graphs in the real world
     - Graph database internals
  3. What is a Graph?
  4. Graph
  5. What is a Graph?
     - A collection of vertices and edges: a set of nodes and the relationships that connect them.
     - A graph represents:
       - entities as NODES
       - the ways those entities relate to the world as RELATIONSHIPS
     - Allows us to model all kinds of scenarios:
       - road systems
       - medical history
       - supply chain management
       - data centers
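A minimal Python sketch of the definition above: a graph as a set of nodes plus directed, named relationships between them. The three users and the FOLLOWS relationship type are hypothetical, chosen only to echo the Twitter example; this illustrates the model, not how any graph database stores it.

```python
from collections import defaultdict

# Hypothetical nodes and directed, named relationships (start, type, end).
nodes = {"Alice", "Bob", "Carol"}
relationships = [
    ("Alice", "FOLLOWS", "Bob"),
    ("Bob", "FOLLOWS", "Carol"),
    ("Carol", "FOLLOWS", "Alice"),
]

# Group outgoing relationships by start node so we can ask
# "which nodes does X relate to, and how?"
outgoing = defaultdict(list)
for start, rel_type, end in relationships:
    outgoing[start].append((rel_type, end))

def related(node, rel_type):
    """Return the end nodes of all `rel_type` relationships leaving `node`."""
    return [end for t, end in outgoing[node] if t == rel_type]

print(related("Alice", "FOLLOWS"))  # ['Bob']
```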
  6. Example – Twitter's Data
  7. Example – Twitter's Data
  8. High Level View of the Graph Space
     - Graph databases: technologies used primarily for transactional online graph persistence (OLTP).
     - Graph compute engines: technologies used primarily for offline graph analytics (OLAP).
  9. Graph Databases
     - Online database management systems with Create, Read, Update and Delete (CRUD) methods that expose a graph data model.
     - Built for use with transactional (OLTP) systems.
     - Used for richly connected data.
     - Querying is performed through traversals.
     - Can perform millions of traversal steps per second.
     - A traversal step resembles a join in an RDBMS.
  10. Graph Database Properties
     - The underlying storage: native / non-native
     - The processing engine: native / non-native
  11. Graph DB – The Underlying Storage
     - Native graph storage: optimized and designed for storing and managing graphs.
     - Non-native graph storage: serializes the graph data into a relational database, an object-oriented database, or some other general-purpose data store.
  12. Native Graph Storage
  13. Graph DB – The Processing Engine
     - Index-free adjacency: connected nodes physically point to each other in the database.
  14. Non-Native: Index Look-Up
  15. Native: Index-Free Adjacency
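The contrast between index look-up (slide 14) and index-free adjacency (slide 15) can be sketched in Python. This is a toy, in-memory illustration under our own assumptions (`IndexedGraph` and `Node` are hypothetical names), not the on-disk layout of Neo4j or any other product: the non-native version finds neighbours through a global sorted index, paying an O(log n) search per hop, while the native version keeps direct references on each node, so a hop is just following a pointer.

```python
import bisect

class IndexedGraph:
    """Non-native style: relationships live in one global sorted index of
    (start_id, end_id) pairs, so every hop costs an O(log n) index search."""

    def __init__(self, edges):
        self.index = sorted(edges)

    def neighbours(self, node_id):
        # (node_id,) sorts before every (node_id, end_id) tuple, so this
        # finds the first index entry for node_id.
        i = bisect.bisect_left(self.index, (node_id,))
        result = []
        while i < len(self.index) and self.index[i][0] == node_id:
            result.append(self.index[i][1])
            i += 1
        return result

class Node:
    """Native style (index-free adjacency): each node holds direct
    references to its adjacent nodes, so a hop needs no index at all."""

    def __init__(self, name):
        self.name = name
        self.out = []  # direct references to adjacent Node objects

    def neighbours(self):
        return [n.name for n in self.out]

g = IndexedGraph([(1, 2), (1, 3), (2, 3)])
print(g.neighbours(1))  # [2, 3]

a, b, c = Node("a"), Node("b"), Node("c")
a.out += [b, c]
print(a.neighbours())  # ['b', 'c']
```

The per-hop cost of the indexed version grows with the size of the whole index, whereas the pointer-based version only touches the node at hand.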
  16. Graph Databases
  17. Power of Graph Databases
     - Performance
     - Flexibility
     - Agility
  18. Comparison
     - Relational databases
     - NoSQL databases
     - Graph databases
  19. Relational Databases Lack Relationships
     - Initially designed to codify paper forms and tabular structures.
     - Deal poorly with relationships.
     - Rising connectedness translates into more joins, and therefore lower performance.
     - Difficult to adapt to changing business needs.
  20. RDBMS
  21. RDBMS
     - What products did a customer buy?
     - Which customers bought this product?
     - Which customers who bought this product also bought that product?
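To show how each of these questions turns into joins, here is a small sketch using Python's built-in `sqlite3` module. The customer/orders/line_item/product schema and its data are hypothetical, not the schema pictured on the slides.

```python
import sqlite3

# Hypothetical order schema: customer -> orders -> line_item -> product.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product   (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders    (id INTEGER PRIMARY KEY, customer_id INTEGER);
CREATE TABLE line_item (order_id INTEGER, product_id INTEGER);
INSERT INTO customer  VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO product   VALUES (10, 'Book'), (20, 'Lamp');
INSERT INTO orders    VALUES (100, 1), (101, 2);
INSERT INTO line_item VALUES (100, 10), (100, 20), (101, 10);
""")

# Every question walks the same chain of three joins.
JOIN_CHAIN = """
    FROM customer c
    JOIN orders o     ON o.customer_id = c.id
    JOIN line_item li ON li.order_id = o.id
    JOIN product p    ON p.id = li.product_id
"""

def products_bought_by(customer_name):
    sql = "SELECT DISTINCT p.name" + JOIN_CHAIN + "WHERE c.name = ? ORDER BY p.name"
    return [row[0] for row in conn.execute(sql, (customer_name,))]

def customers_who_bought(product_name):
    sql = "SELECT DISTINCT c.name" + JOIN_CHAIN + "WHERE p.name = ? ORDER BY c.name"
    return [row[0] for row in conn.execute(sql, (product_name,))]

def customers_who_bought_both(first, second):
    # "Also bought": run the whole join chain once per product and intersect.
    one = "SELECT c.name" + JOIN_CHAIN + "WHERE p.name = ?"
    sql = one + " INTERSECT " + one + " ORDER BY 1"
    return [row[0] for row in conn.execute(sql, (first, second))]

print(customers_who_bought_both("Book", "Lamp"))  # ['Alice']
```

Each additional "who also bought" condition repeats the join chain, which is the cost the slide is pointing at.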
  22. RDBMS
  23. Query to Find Friends-of-Friends
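The slide's actual query is an image, so here is a hedged stand-in: a friends-of-friends query over a hypothetical `person`/`person_friend` schema, run with Python's `sqlite3`. Reaching depth 2 already requires joining the friendship table to itself.

```python
import sqlite3

# Hypothetical schema: one row per directed friendship.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person        (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE person_friend (person_id INTEGER, friend_id INTEGER);
INSERT INTO person VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol'), (4, 'Dave');
INSERT INTO person_friend VALUES (1, 2), (2, 1), (2, 3), (3, 4);
""")

def friends_of_friends(name):
    # Depth 2 needs a self-join of person_friend; each further level of
    # depth would add yet another join to this query.
    sql = """
        SELECT DISTINCT p2.name
        FROM person p1
        JOIN person_friend pf1 ON pf1.person_id = p1.id
        JOIN person_friend pf2 ON pf2.person_id = pf1.friend_id
        JOIN person p2         ON p2.id = pf2.friend_id
        WHERE p1.name = ? AND p2.id <> p1.id
        ORDER BY p2.name
    """
    return [row[0] for row in conn.execute(sql, (name,))]

print(friends_of_friends("Alice"))  # ['Carol']
```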
  24. NoSQL Databases Also Lack Relationships
     - NoSQL databases (e.g. key-value, document or column-oriented stores) store sets of disconnected values/documents/columns.
     - This makes it difficult to use them for connected data and graphs.
     - One solution is to embed an aggregate's identifier inside a field belonging to another aggregate.
       - This effectively introduces foreign keys.
       - It requires joining aggregates at the application level.
  25. NoSQL DB
  26. NoSQL DB
     - Relationships between aggregates aren't first-class citizens in the data model.
     - Foreign aggregate "links" are not reflexive: they point in one direction only.
     - Asking the database "Who has bought a particular product?" is an expensive operation.
     - Such processing needs external compute infrastructure, e.g. Hadoop.
     - Consistency of connected data is not maintained.
     - Index-free adjacency is not supported.
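A minimal Python sketch of why one-way aggregate links make the reverse question expensive. The "document store" here is just two hypothetical dictionaries; no real NoSQL client API is implied.

```python
# Hypothetical document store: each customer aggregate embeds the ids of the
# products it bought, so the link points one way only (customer -> product).
customers = {
    "c1": {"name": "Alice", "bought": ["p1", "p2"]},
    "c2": {"name": "Bob", "bought": ["p1"]},
}
products = {
    "p1": {"name": "Book"},
    "p2": {"name": "Lamp"},
}

def products_bought_by(customer_id):
    # Cheap: follow the embedded ids, then "join" in application code.
    return [products[pid]["name"] for pid in customers[customer_id]["bought"]]

def who_bought(product_id):
    # Expensive: there is no reverse link, so every customer aggregate
    # must be scanned (cost grows with the number of customers).
    return [c["name"] for c in customers.values() if product_id in c["bought"]]

print(who_bought("p1"))  # ['Alice', 'Bob']
```

Making the reverse lookup cheap would mean maintaining a second, mirrored link in every product document, and keeping both sides consistent is left to the application.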
  27. NoSQL DB
  28. Graph DB Embraces Relationships
  29. Graph DB
     - Find friends-of-friends in a social network, to a maximum depth of 5.
     - Total records: 1,000,000
     - Each with approximately 50 friends
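The benchmark query above amounts to a depth-limited traversal. Below is a minimal Python sketch using breadth-first search over a hypothetical adjacency dictionary; the benchmark's own results are not reproduced here. The work done is proportional to the neighbourhood actually visited, not to the total number of records.

```python
from collections import deque

def friends_within_depth(graph, start, max_depth):
    """Breadth-first search returning {person: depth} for everyone reachable
    from `start` within `max_depth` hops (start itself excluded)."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if depth[person] == max_depth:
            continue  # do not expand beyond the depth limit
        for friend in graph.get(person, ()):
            if friend not in depth:  # first visit = shortest hop count
                depth[friend] = depth[person] + 1
                queue.append(friend)
    del depth[start]
    return depth

# Hypothetical chain a -> b -> ... -> g; depth 5 stops at f.
chain = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"], "e": ["f"], "f": ["g"]}
print(friends_within_depth(chain, "a", 5))
```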
  30. Graph DB
  31. NoSQL Comparison
  32. Data Modeling with Graph
  33. Data Modeling
     - "Whiteboard" friendly.
     - The typical whiteboard view of a problem is a GRAPH.
     - What we sketch in our creative and analytical modes maps closely to the data model inside the database.
  34. The Property Graph Model
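As a rough sketch of the property graph model as commonly described for Neo4j: nodes hold key-value properties, and relationships are named, directed (with a start and an end node) and may carry properties of their own. The Python classes below are hypothetical illustrations of that shape, not any database's API.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # compare nodes by identity, not by property values
class Node:
    # Nodes hold arbitrary key-value properties.
    properties: dict = field(default_factory=dict)

@dataclass(eq=False)
class Relationship:
    # Relationships are named and directed (start -> end) and can also
    # hold their own key-value properties.
    name: str
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)

alice = Node({"name": "Alice"})
bob = Node({"name": "Bob"})
knows = Relationship("KNOWS", alice, bob, {"since": 2010})
print(knows.start.properties["name"], knows.name, knows.end.properties["name"])
```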
  35. Cypher: Graph Query Language
     - Pattern-matching query language
     - Humane language
     - Expressive
     - Declarative: say what you want, not how
     - Borrows from well-known query languages
     - Aggregation, ordering, limit
     - Can update the graph
  36. Cypher
     Cypher representation:
       (c)-[:KNOWS]->(b)-[:KNOWS]->(a), (c)-[:KNOWS]->(a)
     or, equivalently:
       (c)-[:KNOWS]->(b)-[:KNOWS]->(a)<-[:KNOWS]-(c)
  37. Cypher
       START c=node:user(name='Michael')
       MATCH (c)-[:KNOWS]->(b)-[:KNOWS]->(a), (c)-[:KNOWS]->(a)
       RETURN a, b
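To make the semantics of this MATCH concrete, here is a brute-force Python sketch that finds every (a, b) satisfying (c)-[:KNOWS]->(b)-[:KNOWS]->(a) together with (c)-[:KNOWS]->(a) for a given c. The KNOWS data is hypothetical, and this says nothing about how Cypher's engine actually evaluates the pattern.

```python
# Hypothetical KNOWS relationships as (start, end) pairs.
knows = {
    ("Michael", "Anna"), ("Anna", "Zoe"), ("Michael", "Zoe"),
    ("Michael", "Bob"), ("Bob", "Tom"),
}

def mutual_friends_pattern(c):
    """Return (a, b) pairs matching
    (c)-[:KNOWS]->(b)-[:KNOWS]->(a), (c)-[:KNOWS]->(a)."""
    out = {}
    for start, end in knows:
        out.setdefault(start, set()).add(end)
    results = []
    for b in sorted(out.get(c, ())):          # (c)-[:KNOWS]->(b)
        for a in sorted(out.get(b, ())):      # (b)-[:KNOWS]->(a)
            if a in out[c]:                   # closing edge (c)-[:KNOWS]->(a)
                results.append((a, b))
    return results

print(mutual_friends_pattern("Michael"))  # [('Zoe', 'Anna')]
```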
  38. Other Cypher Clauses
     - WHERE: provides criteria for filtering pattern-matching results.
     - CREATE and CREATE UNIQUE: create nodes and relationships.
     - DELETE: removes nodes, relationships and properties.
     - SET: sets property values.
  39. Other Cypher Clauses
     - FOREACH: performs an updating action for each graph element in a list.
     - UNION: merges results from two or more queries.
     - WITH: chains subsequent query parts and forwards results from one to the next, similar to piping commands in UNIX.
  40. Comparison of Relational and Graph Modeling
  41. Systems Management Domain
  42. Entity Relationship Diagram
  43. Tables and Relationships
  44. Graph Representation
  45. Query to Find Faulty Equipment
  46. Matched Paths
  47. Fine-Grained vs Generic Relationships
     DELIVERY_ADDRESS vs ADDRESS {type: 'delivery'}
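A Python sketch of the two modeling styles on this slide, over hypothetical relationship tuples. With a fine-grained relationship the type name alone selects the edge. With a generic ADDRESS relationship every candidate's properties must also be read and checked. That extra property read is the usual argument for fine-grained types on hot traversal paths, while the generic form makes "all addresses, of any kind" a single-type query.

```python
# Hypothetical relationships as (start, type, end, properties) tuples.
rels = [
    ("alice", "DELIVERY_ADDRESS", "addr1", {}),
    ("alice", "HOME_ADDRESS", "addr2", {}),
    ("bob", "ADDRESS", "addr3", {"type": "delivery"}),
    ("bob", "ADDRESS", "addr4", {"type": "home"}),
]

def delivery_fine_grained(person):
    # Fine-grained: the relationship type alone identifies the edge.
    return [end for start, t, end, _ in rels
            if start == person and t == "DELIVERY_ADDRESS"]

def delivery_generic(person):
    # Generic: every ADDRESS relationship must also have its property checked.
    return [end for start, t, end, props in rels
            if start == person and t == "ADDRESS" and props.get("type") == "delivery"]

print(delivery_fine_grained("alice"), delivery_generic("bob"))  # ['addr1'] ['addr3']
```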
  50. Graphs in the Real World
  51. Common Use Cases
     - Social
     - Recommendations
     - Geo
     - Logistics networks: package routing, finding shortest paths
     - Financial transaction graphs: fraud detection
     - Master data management
     - Bioinformatics: Era7 uses graphs to relate a complex web of information that includes genes, proteins and enzymes
     - Authorization and access control: Adobe Creative Cloud, Telenor
  52. Graph Database Internals
  53. Non-Functional Characteristics
     - Transactions: fully ACID
     - Recoverability
     - Availability
     - Scalability
  54. Scalability
     - Capacity (graph size)
     - Latency (response time)
     - Read and write throughput
  55. Capacity
     - The 1.9 release of Neo4j can support single graphs having tens of billions of nodes, relationships and properties.
     - The Neo4j team has publicly expressed the intention to support 100B+ nodes/relationships/properties in a single graph as part of its 2013 roadmap.
  56. Latency
     - In an RDBMS, more data in tables/indexes results in longer join operations.
     - A graph DB doesn't suffer the same latency problem:
       - An index is used to find the starting node.
       - The traversal then uses a combination of pointer chasing and pattern matching to search the data.
     - Performance therefore does not depend on the total size of the dataset, only on the data being queried.
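A toy Python check of the latency argument above: an index (here a plain dictionary, by assumption) locates the starting node, and the traversal then only follows adjacency lists. Adding 100,000 unrelated nodes changes neither the nodes reached nor the number of relationships followed. This is an in-memory illustration only; real latency also depends on caching and storage layout.

```python
from collections import deque

def traverse(adjacency, name_index, start_name, max_depth):
    """Find the start node via the index, then follow adjacency lists.
    Returns (set of reached node ids, number of relationships followed)."""
    start = name_index[start_name]  # the index is used only to find the start
    seen = {start: 0}
    queue = deque([start])
    steps = 0
    while queue:
        node = queue.popleft()
        if seen[node] == max_depth:
            continue
        for nbr in adjacency[node]:
            steps += 1  # one relationship followed ("pointer chase")
            if nbr not in seen:
                seen[nbr] = seen[node] + 1
                queue.append(nbr)
    return set(seen) - {start}, steps

def build(extra_nodes):
    # A small neighbourhood around node 0, plus a chain of unrelated nodes.
    adjacency = {0: [1, 2], 1: [3], 2: [3], 3: []}
    last = 4 + extra_nodes
    for i in range(4, last):
        adjacency[i] = [i + 1] if i + 1 < last else []
    return adjacency, {"start": 0}

small = traverse(*build(10), "start", 3)
large = traverse(*build(100_000), "start", 3)
print(small == large)  # True: same nodes reached, same number of steps
```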
  57. Throughput
     - Throughput stays roughly constant irrespective of graph size.
  58. Who uses Neo4j?
  59. Resources
  60. Thank You
