NATC 2013 - Using Graph Databases for Insights into Connected Data

NASSCOM Annual Technology Conference 2013

Session: Using Graph Databases for Insights into Connected Data

Speaker: Gagan Agarwal, Xebia

Published in: Technology, Business

  1. Using Graph Databases For Insights Into Connected Data. Gagan Agrawal, Xebia India
  2. Agenda
     • High-level view of the graph space
     • Comparison with RDBMS and other NoSQL stores
     • Data Modeling
     • Cypher: Graph Query Language
     • Graph Database Internals
     • Graphs in the Real World
  3. What is a Graph?
  4. Graph
  5. What is a Graph?
     • A collection of vertices and edges.
     • A set of nodes and the relationships that connect them.
     • A graph represents:
         • Entities as NODES
         • The way those entities relate to the world as RELATIONSHIPS
     • Allows modeling of all kinds of scenarios:
         • Road systems
         • Medical histories
         • Supply chain management
         • Data centers
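The node-and-relationship view on this slide can be sketched in a few lines of Python. This is only an illustration: the city names, the `ROAD_TO` relationship type, and the `neighbours` helper are hypothetical, chosen to match the road-system scenario.

```python
# Entities are nodes; the way they relate is a list of typed relationships.
# All names below are made up for illustration.
nodes = {"Delhi", "Jaipur", "Agra"}

relationships = [
    ("Delhi", "ROAD_TO", "Jaipur"),
    ("Delhi", "ROAD_TO", "Agra"),
    ("Agra", "ROAD_TO", "Jaipur"),
]


def neighbours(node, kind):
    """Return the nodes reachable from `node` over one relationship of type `kind`."""
    return sorted(end for start, rel, end in relationships
                  if start == node and rel == kind)


print(neighbours("Delhi", "ROAD_TO"))
```

A real graph database stores the same two kinds of records, but keeps them in a form that makes following a relationship cheap rather than scanning a list.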
  6. Example – Twitter's Data
  7. Example – Twitter's Data
  8. High Level view of Graph Space
     • Graph Databases: technologies used primarily for transactional online graph persistence (OLTP).
     • Graph Compute Engines: technologies used primarily for offline graph analytics (OLAP).
  9. Graph Databases
     • An online database management system with Create, Read, Update and Delete methods that expose a graph data model.
     • Built for use with transactional (OLTP) systems.
     • Used for richly connected data.
     • Querying is performed through traversals.
     • Can perform millions of traversal steps per second.
     • A traversal step resembles a join in an RDBMS.
  10. Graph Database Properties
     • The underlying storage: native / non-native
     • The processing engine: native / non-native
  11. Graph DB – The Underlying Storage
     • Native graph storage: optimized and designed for storing and managing graphs.
     • Non-native graph storage: serializes the graph data into a relational database, an object-oriented database, or some other general-purpose data store.
  12. Native Graph Storage
  13. Graph DB – The Processing Engine
     • Index-free adjacency: connected nodes physically point to each other in the database.
  14. Non-Native: Index Look-Up
  15. Native: Index-Free Adjacency
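The difference between the two lookup styles on slides 14 and 15 can be sketched roughly as follows. This is a toy model, not how any particular database lays out its storage: the sorted `edges` table, the `Node` class, and both helper functions are assumptions made for illustration.

```python
import bisect

# Non-native style: relationships live in a sorted (start, end) table and
# every hop begins with an index search (here, a binary search).
edges = sorted([("alice", "bob"), ("alice", "carol"), ("bob", "carol")])


def friends_via_index(name):
    """Find neighbours by searching the sorted edge table."""
    pos = bisect.bisect_left(edges, (name, ""))
    result = []
    while pos < len(edges) and edges[pos][0] == name:
        result.append(edges[pos][1])
        pos += 1
    return result


# Native style: each node holds direct references to its neighbours,
# so a hop is just following a pointer.
class Node:
    def __init__(self, name):
        self.name = name
        self.friends = []  # direct references to other Node objects


alice, bob, carol = Node("alice"), Node("bob"), Node("carol")
alice.friends = [bob, carol]
bob.friends = [carol]


def friends_via_adjacency(node):
    """Find neighbours by following the references stored on the node."""
    return [friend.name for friend in node.friends]


print(friends_via_index("alice"), friends_via_adjacency(alice))
```

In the index-based version the cost of each hop grows with the size of the edge table; in the adjacency version it depends only on how many neighbours the node has.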
  16. Graph Databases
  17. Power of Graph Databases
     • Performance
     • Flexibility
     • Agility
  18. Comparison
     • Relational Databases
     • NoSQL Databases
     • Graph Databases
  19. Relational Databases Lack Relationships
     • Initially designed to codify paper forms and tabular structures.
     • Deal poorly with relationships.
     • The rise in connectedness translates into increased joins.
     • Lower performance.
     • Difficult to cater for changing business needs.
  20. RDBMS
  21. Query to find friends-of-friends
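The query shown on slide 21 was an image and is not preserved in this transcript. As a stand-in, here is a minimal sketch of what a relational friends-of-friends query tends to look like, using Python's built-in `sqlite3` module. The `person` and `person_friend` tables, the sample rows, and the `friends_of_friends` helper are all hypothetical and are not taken from the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person_friend (person_id INTEGER, friend_id INTEGER);
    INSERT INTO person VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Zach');
    INSERT INTO person_friend VALUES (1, 2), (2, 1), (2, 3);
""")

# Each extra level of friendship adds another self-join on person_friend.
FRIENDS_OF_FRIENDS = """
    SELECT DISTINCT p2.name
    FROM person p1
    JOIN person_friend pf1 ON pf1.person_id = p1.id
    JOIN person_friend pf2 ON pf2.person_id = pf1.friend_id
    JOIN person p2 ON p2.id = pf2.friend_id
    WHERE p1.name = ? AND p2.id <> p1.id
    ORDER BY p2.name
"""


def friends_of_friends(name):
    """Return names two friendship hops away from `name`, excluding `name` itself."""
    return [row[0] for row in conn.execute(FRIENDS_OF_FRIENDS, (name,))]


print(friends_of_friends("Alice"))
```

Going one level deeper means adding yet another join, which is the growth in join work that slide 19 refers to.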
  22. NoSQL Databases also lack Relationships
     • NoSQL databases (key-value, document, or column-oriented) store sets of disconnected values, documents, or columns.
     • This makes it difficult to use them for connected data and graphs.
     • One solution is to embed an aggregate's identifier inside a field belonging to another aggregate.
         • This effectively introduces foreign keys.
         • It requires joining aggregates at the application level.
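Embedding one aggregate's identifier inside another, and then joining in application code, might look like the sketch below. The `users` store, its field names, and the `friend_names` helper are hypothetical; a plain dictionary stands in for a document or key-value store.

```python
# A dictionary standing in for a document store: each value is one aggregate.
users = {
    "u1": {"name": "Alice", "friend_ids": ["u2", "u3"]},  # embedded identifiers
    "u2": {"name": "Bob", "friend_ids": ["u3"]},
    "u3": {"name": "Carol", "friend_ids": []},
}


def friend_names(user_id):
    """Application-level join: one extra lookup per embedded identifier."""
    user = users[user_id]
    return [users[friend_id]["name"] for friend_id in user["friend_ids"]]


print(friend_names("u1"))
```

The store itself knows nothing about the link; the application has to resolve every identifier with a separate read.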
  23. NoSQL DB
     • Relationships between aggregates aren't first-class citizens in the data model.
     • Foreign aggregate "links" are not reflexive.
     • Need to use some external compute infrastructure, e.g. Hadoop, for such processing.
     • Do not maintain consistency of connected data.
     • Do not support index-free adjacency.
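The point that embedded links are not reflexive can be illustrated with a small sketch: following a link forwards is a single lookup, but asking the reverse question means examining every aggregate. The `profiles` data, the `follows` field, and the `who_links_to` helper are hypothetical.

```python
# Each aggregate records whom it points to, but not who points to it.
profiles = {
    "u1": {"name": "Alice", "follows": ["u2", "u3"]},
    "u2": {"name": "Bob", "follows": ["u3"]},
    "u3": {"name": "Carol", "follows": []},
}


def who_links_to(target_id):
    """Reverse lookup: scan every aggregate and count how many were examined."""
    examined = 0
    found = []
    for user_id, doc in profiles.items():
        examined += 1
        if target_id in doc["follows"]:
            found.append(user_id)
    return found, examined


print(who_links_to("u3"))
```

With millions of aggregates this scan is the kind of work that gets pushed out to a batch system such as Hadoop.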
  24. NoSQL DB
  25. Graph DB Embraces Relationships
  26. Graph DB
     • Find friends-of-friends in a social network, to a maximum depth of 5.
         • Total records: 1,000,000
         • Each with approximately 50 friends
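A depth-limited friends-of-friends search like the one described on this slide can be sketched as a breadth-first traversal. This is a generic illustration over an in-memory adjacency dictionary, not the benchmark code behind the slide; the `friends_within` function and the sample `chain` graph are assumptions.

```python
from collections import deque


def friends_within(graph, start, max_depth):
    """Return {person: depth} for everyone reachable from `start` in at most `max_depth` hops."""
    seen = {start}
    reached = {}
    queue = deque([(start, 0)])
    while queue:
        person, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for friend in graph.get(person, ()):
            if friend not in seen:
                seen.add(friend)
                reached[friend] = depth + 1
                queue.append((friend, depth + 1))
    return reached


# Hypothetical chain of acquaintances: a -> b -> c -> d -> e -> f -> g
chain = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"], "e": ["f"], "f": ["g"]}
print(friends_within(chain, "a", 5))
```

Only the people within the depth limit are ever touched, so `g`, six hops away, is never visited.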
  27. NoSQL Comparison
  28. Data Modeling with Graph
  29. Data Modeling
     • "Whiteboard" friendly.
     • The typical whiteboard view of a problem is a GRAPH.
     • What we sketch in our creative and analytical modes maps closely to the data model inside the database.
  30. The Property Graph Model
  31. Cypher: Graph Query Language
     • Pattern-matching query language
     • Humane language
     • Expressive
     • Declarative: say what you want, not how
     • Borrows from well-known query languages
     • Aggregation, ordering, limit
     • Update the graph
  32. Cypher
     • Cypher representation (two equivalent forms of the same pattern):
         (c)-[:KNOWS]->(b)-[:KNOWS]->(a), (c)-[:KNOWS]->(a)
         (c)-[:KNOWS]->(b)-[:KNOWS]->(a)<-[:KNOWS]-(c)
  33. Cypher
         START c=node:user(name='Michael')
         MATCH (c)-[:KNOWS]->(b)-[:KNOWS]->(a), (c)-[:KNOWS]->(a)
         RETURN a, b
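To make the pattern on slide 33 concrete, the sketch below finds the same shape by hand in plain Python: starting from `c`, it looks for a `b` that `c` knows and an `a` that both `b` and `c` know. It roughly mirrors the query but is not how Cypher evaluates it. Only the name Michael comes from the slide; the other names, the `knows` dictionary, and `match_triangles` are made up for illustration.

```python
# Outgoing KNOWS relationships per person (hypothetical data).
knows = {
    "Michael": {"Jim", "Sara"},
    "Jim": {"Sara"},
    "Sara": set(),
}


def match_triangles(c):
    """Return (a, b) pairs where c KNOWS b, b KNOWS a, and c KNOWS a."""
    known_by_c = knows.get(c, set())
    return sorted(
        (a, b)
        for b in known_by_c
        for a in knows.get(b, set())
        if a in known_by_c
    )


print(match_triangles("Michael"))
```

The declarative query states only the shape to find; the nested loops here are the "how" that Cypher leaves to the database.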
  34. Other Cypher Clauses
     • WHERE: provides criteria for filtering pattern-matching results.
     • CREATE and CREATE UNIQUE: create nodes and relationships.
     • DELETE: removes nodes, relationships and properties.
     • SET: sets property values.
  35. Other Cypher Clauses
     • FOREACH: performs an updating action for each graph element in a list.
     • UNION: merges results from two or more queries.
     • WITH: chains subsequent query parts and forwards results from one to the next, similar to piping commands in UNIX.
  36. Comparison of Relational and Graph Modeling
  37. Systems Management Domain
  38. Tables and Relationships
  39. Graph Representation
  40. Query to find faulty Equipment
  41. Matched Paths
  42. Graph Database Internals
  43. Non-Functional Characteristics
     • Transactions
         • Fully ACID
         • Recoverability
         • Availability
         • Scalability
  44. Scalability
     • Capacity (graph size)
     • Latency (response time)
     • Read and write throughput
  45. Capacity
     • The 1.9 release of Neo4j can support single graphs with tens of billions of nodes, relationships and properties.
     • The Neo4j team has publicly expressed the intention to support 100B+ nodes/relationships/properties in a single graph.
  46. Latency
     • RDBMS: more data in tables and indexes results in longer join operations.
     • A graph DB doesn't suffer the same latency problem.
     • An index is used to find the starting node.
     • Traversal uses a combination of pointer chasing and pattern matching to search the data.
     • Performance does not depend on the total size of the dataset.
     • It depends only on the data being queried.
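The claim that traversal cost depends on the data being queried, not on total graph size, can be illustrated with a toy experiment. The ring-shaped graph, `build_ring`, and `visited_within` below are assumptions made for the sketch, and a Python dictionary stands in for both the start-node index and the stored adjacency.

```python
def build_ring(n):
    """Build a graph of n nodes where node i points to i+1 and i+2 (wrapping around)."""
    return {i: [(i + 1) % n, (i + 2) % n] for i in range(n)}


def visited_within(graph, start, depth):
    """Count the nodes touched when expanding `depth` hops out from `start`."""
    seen = {start}
    frontier = {start}
    for _ in range(depth):
        frontier = {nb for node in frontier for nb in graph[node]} - seen
        seen |= frontier
    return len(seen)


small = build_ring(100)
large = build_ring(10_000)
print(visited_within(small, 0, 2), visited_within(large, 0, 2))
```

Both graphs give the same count for a two-hop query, even though one is a hundred times larger, because the traversal only touches the neighbourhood of the starting node.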
  47. Throughput
     • Constant performance irrespective of graph size.
  48. Graphs in the Real World
  49. Common Use Cases
     • Social
     • Recommendations
     • Geo
     • Logistics networks: for package routing, finding the shortest path
     • Financial transaction graphs: for fraud detection
     • Master data management
     • Bioinformatics: Era7, to relate a complex web of information that includes genes, proteins and enzymes
     • Authorization and access control: Adobe Creative Cloud, Telenor
  50. Who uses Neo4j?
  51. Resources
  52. Thank You
