Using Graph Databases For Insights
Into Connected Data

Gagan Agrawal

Xebia India

Graph databases address one of the great macroscopic business trends of today: leveraging complex and dynamic relationships in highly connected data to generate insight and competitive advantage. Whether we want to understand relationships between customers, elements in a telephone or data center network, entertainment producers and consumers, or genes and proteins, the ability to understand and analyze vast graphs of highly connected data will be key in determining which companies outperform their competitors over the coming decade. In this session, I cover the following graph database concepts, mainly with respect to Neo4j:

High level view of Graph Space
Power of Graph Databases
Data Modeling with Graphs
Cypher : Graph Query language
Building a Graph Database Application
Graphs in Real World / Common Use cases
Predictive Analysis with Graph Theory

Published in: Technology
  • Services should include hadoop consulting rather
  • Transcript of "Using Graph Databases For Insights Into Connected Data."

    1. Using Graph Databases For Insights Into Connected Data. Gagan Agrawal, Xebia India
    2. SOFTWARE DEVELOPMENT DONE RIGHT. Netherlands | USA | India | France | UK
    3. Agenda
         • High level view of Graph Space
         • Comparison with RDBMS and other NoSQL stores
         • Data Modeling
         • Cypher : Graph Query Language
         • Graph Database Internals
         • Graphs in the Real World
    4. What is a Graph?
    5. What is a Graph?
         • A collection of vertices and edges.
         • A set of nodes and the relationships that connect them.
         • A graph represents entities as NODES and the way those entities relate to the world as RELATIONSHIPS.
         • Allows modeling all kinds of scenarios: road systems, medical histories, supply chain management, data centers.
    6. High Level view of Graph Space
         • Graph Databases - Technologies used primarily for transactional online graph persistence (OLTP).
         • Graph Compute Engines - Technologies used primarily for offline graph analytics (OLAP).
    7. Graph Databases
         • Online database management system with Create, Read, Update, Delete methods that expose a graph data model.
         • Built for use with transactional (OLTP) systems.
         • Used for richly connected data.
         • Querying is performed through traversals.
         • Can perform millions of traversal steps per second.
         • A traversal step resembles a join in an RDBMS.
    8. Graph Database Properties
         • The Underlying Storage : Native / Non-Native
         • The Processing Engine : Native / Non-Native
    9. Graph DB – The Underlying Storage
         • Native Graph Storage – Optimized and designed for storing and managing graphs.
         • Non-Native Graph Storage – Serializes the graph data into a relational database, an object-oriented database, or some other general-purpose data store.
    10. Graph DB – The Processing Engine
         • Index-free adjacency – Connected nodes physically point to each other in the database.
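The idea of index-free adjacency on slide 10 can be illustrated with a minimal Python sketch. It is not how Neo4j stores data; the `Node` class, the `neighbours_via_index` helper and the sample names are assumptions made up for this illustration. The first style follows direct references from node to node, the second resolves every hop through a global lookup structure, as a non-native engine might.

```python
class Node:
    """A graph node that keeps direct references to its neighbours."""

    def __init__(self, name):
        self.name = name
        self.neighbours = []  # direct object references, no lookup table

    def connect(self, other):
        self.neighbours.append(other)


def neighbours_via_index(name, index, edges):
    """Non-native style: scan an edge list, then resolve names through an index."""
    return [index[dst] for src, dst in edges if src == name]


alice, bob, carol = Node("alice"), Node("bob"), Node("carol")
alice.connect(bob)
bob.connect(carol)

# Index-free adjacency: each hop is just a reference dereference.
two_hops = [n2.name for n1 in alice.neighbours for n2 in n1.neighbours]

# The same data held as an edge list plus a name index.
index = {"alice": alice, "bob": bob, "carol": carol}
edges = [("alice", "bob"), ("bob", "carol")]
one_hop_indexed = [n.name for n in neighbours_via_index("alice", index, edges)]

print(two_hops, one_hop_indexed)
```

In the first style the cost of a hop depends only on the number of neighbours of the current node; in the second it also depends on the size of the edge list or index being searched.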
    11. Power of Graph Databases
         • Performance
         • Flexibility
         • Agility
    12. Comparison
         • Relational Databases
         • NoSQL Databases
         • Graph Databases
    13. Relational Databases Lack Relationships
         • Initially designed to codify paper forms and tabular structures.
         • Deal poorly with relationships.
         • The rise in connectedness translates into increased joins.
         • Lower performance.
         • Difficult to cater for changing business needs.
    14. NoSQL Databases also lack Relationships
         • NoSQL databases (e.g. key-value, document or column-oriented) store sets of disconnected values/documents/columns.
         • This makes it difficult to use them for connected data and graphs.
         • One solution is to embed an aggregate's identifier inside a field belonging to another aggregate.
              • Effectively introduces foreign keys.
              • Requires joining aggregates at the application level.
    15. NoSQL DB
         • Relationships between aggregates aren't first-class citizens in the data model.
         • Foreign aggregate "links" are not reflexive.
         • Need to use some external compute infrastructure, e.g. Hadoop, for such processing.
         • Do not maintain consistency of connected data.
         • Do not support index-free adjacency.
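Slides 14 and 15 can be made concrete with a small Python sketch of an aggregate store. The `users` dictionary, its field names and both helper functions are hypothetical and stand in for any document or key-value store; the point is only that the join happens in application code and that a link stored on one side cannot be followed backwards without a scan.

```python
# Hypothetical document-store contents: each aggregate is a standalone record
# that embeds the identifiers of other aggregates (a foreign key in disguise).
users = {
    "u1": {"name": "Alice", "friend_ids": ["u2", "u3"]},
    "u2": {"name": "Bob", "friend_ids": []},
    "u3": {"name": "Carol", "friend_ids": ["u1"]},
}


def friends_of(user_id, store):
    """Join in application code: one extra lookup per embedded identifier."""
    return [store[fid]["name"] for fid in store[user_id]["friend_ids"]]


def who_lists(user_id, store):
    """Reverse direction: the link lives on one side only, so scan every record."""
    return sorted(uid for uid, doc in store.items() if user_id in doc["friend_ids"])


print(friends_of("u1", users))  # forward direction, cheap
print(who_lists("u1", users))   # backward direction, full scan
```

The backward query touches every record in the store, which is why larger deployments tend to push this kind of processing to external infrastructure.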
    16. Graph DB
         • Find friends-of-friends in a social network, to a maximum depth of 5.
         • Total records : 1,000,000
         • Each with approximately 50 friends
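The friends-of-friends query on slide 16 is a depth-limited traversal. A minimal sketch in Python, assuming the social network is held as an in-memory adjacency dictionary (the `friends_within` function and the `toy` graph are invented for illustration and say nothing about the benchmark setup itself):

```python
from collections import deque


def friends_within(graph, start, max_depth):
    """Return everyone reachable from `start` in at most `max_depth` hops."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        person, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for friend in graph.get(person, ()):
            if friend not in seen:
                seen.add(friend)
                frontier.append((friend, depth + 1))
    seen.discard(start)
    return seen


# A toy chain a -> b -> c -> d -> e -> f -> g.
toy = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"], "e": ["f"], "f": ["g"]}
print(sorted(friends_within(toy, "a", 5)))
```

With a depth limit of 5 the traversal stops at `f`; `g` is six hops away and is never visited.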
    17. Data Modeling with Graph
    18. Data Modeling
         • “Whiteboard” friendly.
         • The typical whiteboard view of a problem is a GRAPH.
         • What we sketch in our creative and analytical modes maps closely to the data model inside the database.
    19. Cypher : Graph Query Language
         • Pattern-matching query language
         • Humane language
         • Expressive
         • Declarative : say what you want, not how
         • Borrows from well-known query languages
         • Aggregation, Ordering, Limit
         • Update the Graph
    20. Cypher
         • Cypher representation :
              (c)-[:KNOWS]->(b)-[:KNOWS]->(a), (c)-[:KNOWS]->(a)
              (c)-[:KNOWS]->(b)-[:KNOWS]->(a)<-[:KNOWS]-(c)
    21. Cypher
              START c=node:user(name='Michael')
              MATCH (c)-[:KNOWS]->(b)-[:KNOWS]->(a), (c)-[:KNOWS]->(a)
              RETURN a, b
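To show what the pattern on slide 21 asks for, here is a rough Python equivalent over an adjacency dictionary. It is a sketch of the pattern's meaning, not of how Cypher executes it; the `knows` data and the `match_triangles` function are made up for this example.

```python
# Hypothetical KNOWS relationships: key knows every name in its set.
knows = {
    "Michael": {"Bob", "Anna"},
    "Bob": {"Anna", "Dave"},
    "Anna": set(),
    "Dave": set(),
}


def match_triangles(knows, c):
    """Yield (a, b) where c knows b, b knows a, and c also knows a."""
    for b in sorted(knows.get(c, ())):
        for a in sorted(knows.get(b, ())):
            if a in knows.get(c, ()):
                yield a, b


result = list(match_triangles(knows, "Michael"))
print(result)
```

The declarative query states only the shape to find; the nested loops above are one of several ways an engine could satisfy it.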
    22. Other Cypher Clauses
         • WHERE : Provides criteria for filtering pattern matching results.
         • CREATE and CREATE UNIQUE : Create nodes and relationships.
         • DELETE : Removes nodes, relationships and properties.
         • SET : Sets property values.
    23. Other Cypher Clauses
         • FOREACH : Performs an updating action for each graph element in a list.
         • UNION : Merges results from two or more queries.
         • WITH : Chains subsequent query parts and forwards results from one to the next. Similar to piping commands in UNIX.
    24. Comparison of Relational and Graph Modeling
    25. Graph Database Internals
    26. Non Functional Characteristics
         • Transactions : Fully ACID
         • Recoverability
         • Availability
         • Scalability
    27. Scalability
         • Capacity (Graph Size)
         • Latency (Response Time)
         • Read and Write Throughput
    28. Capacity
         • The 1.9 release of Neo4j can support single graphs having tens of billions of nodes, relationships and properties.
         • The Neo4j team has publicly expressed the intention to support 100B+ nodes/relationships/properties in a single graph.
    29. Latency
         • RDBMS – more data in tables/indexes results in longer join operations.
         • A graph DB doesn't suffer the same latency problem.
         • An index is used to find the starting node.
         • Traversal uses a combination of pointer chasing and pattern matching to search the data.
         • Performance does not depend on the total size of the dataset.
         • It depends only on the data being queried.
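The latency claim on slide 29 can be checked on a toy model: a depth-limited traversal visits only the neighbourhood of its starting node, however many unrelated nodes the graph holds. The sketch below assumes an in-memory adjacency dictionary; `count_touched`, `build` and the node names are hypothetical, and the dictionary lookup of the start node plays the role of the index.

```python
def count_touched(graph, start, depth):
    """Count the nodes a depth-limited traversal visits from `start`."""
    seen, frontier = {start}, [start]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for neighbour in graph.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return len(seen)


def build(extra_nodes):
    """A fixed neighbourhood around 'start' plus `extra_nodes` unrelated chain nodes."""
    graph = {"start": ["x", "y"], "x": ["z"], "y": [], "z": []}
    for i in range(extra_nodes):
        graph[f"n{i}"] = [f"n{i + 1}"]
    return graph


small, large = build(10), build(100_000)
print(count_touched(small, "start", 2), count_touched(large, "start", 2))
```

Both calls visit the same four nodes, so the work done by the query tracks the data being queried rather than the total graph size.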
    30. Throughput
         • Constant performance irrespective of graph size.
    31. Graphs in the Real World
    32. Common Use Cases
         • Social
         • Recommendations
         • Geo
         • Logistics Networks : for package routing, finding the shortest path
         • Financial Transaction Graphs : for fraud detection
         • Master Data Management
         • Bioinformatics : Era7, to relate a complex web of information that includes genes, proteins and enzymes
         • Authorization and Access Control : Adobe Creative Cloud, Telenor
    33. Thank You
    34. BigData & Real Time Analytics
         • Services : Visualization (Tableau); Analytics Framework (Mahout); Integration (Sqoop, Flume, Storm); Hadoop Powered Solutions (Pig, Hive, Oozie, HBase, Impala) (Solr, Elastic Search); Core Hadoop (HDFS, MapReduce, Zookeeper, Cloudera)
         • Trainings : Cloudera Data Analyst / Developer / Admin Training
         • Products : Divolte, Wearable Sensors
         • Solutions : Big data warehousing, Scalable big data ETL, High volume web analytics
    35. Contact us @
         • Websites : www.xebia.in, www.xebia.com, www.xebia.fr
         • Xebia India : infoindia@xebia.com
         • Thought Leadership : http://xebee.xebia.in, http://blog.xebia.com, http://podcast.xebia.com