Graph db

Using Graph Databases For Insights Into Connected Data
Gagan Agrawal
Xebia

Agenda

High level view of Graph Space

Comparison with RDBMS and other NoSQL stores

Data Modeling

Cypher : Graph Query Language

Graphs in Real World

Graph Database Internals

What is a Graph?

A collection of vertices and edges.

Set of nodes and the relationships that connect
them.

A graph represents:

Entities as NODES

The way those entities relate to the world as
RELATIONSHIPS

Graphs can model all kinds of scenarios:

Road systems

Medical history

Supply chain management

Data Center
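The node/relationship model can be sketched in a few lines of Python. This is a minimal in-memory illustration, not any database's API: the `Node` and `Relationship` classes and the sample people are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity, such as a person, a road junction or a server."""
    node_id: int
    labels: set[str]
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A named, directed connection between two nodes, which can carry properties."""
    rel_type: str
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)


# Hypothetical sample data: two people and how they relate.
alice = Node(1, {"Person"}, {"name": "Alice"})
bob = Node(2, {"Person"}, {"name": "Bob"})
knows = Relationship("KNOWS", alice, bob, {"since": 2010})

print(f"({knows.start.properties['name']})-[:{knows.rel_type}]->({knows.end.properties['name']})")
# prints (Alice)-[:KNOWS]->(Bob)
```

Relationships are first-class objects here: they have a type, a direction and their own properties, just like nodes.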

High Level view of Graph Space

Graph Databases - Technologies used primarily
for transactional online graph persistence –
OLTP.

Graph Compute Engines - Technologies used
primarily for offline graph analytics - OLAP.

Graph Databases

An online database management system with Create,
Read, Update and Delete (CRUD) methods that
expose a graph data model.

Built for use with transactional (OLTP) systems.

Used for richly connected data.

Querying is performed through traversals.

Can perform millions of traversal steps per
second.

A traversal step resembles a join in an RDBMS.

Graph Database Properties

The Underlying Storage : Native / Non-Native

The Processing Engine : Native / Non-Native

Graph DB – The Underlying Storage

Native Graph Storage – Optimized and designed
for storing and managing graphs.

Non-Native Graph Storage – Serializes the graph
data into a relational database, an object-oriented
database, or some other general-purpose data
store.
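To make non-native storage concrete, here is a sketch that serializes a tiny graph into two relational tables using Python's built-in `sqlite3` module. The table layout (`nodes`, `relationships`) and the `neighbours` helper are hypothetical illustrations, not how any particular product stores graphs.

```python
import sqlite3

# Hypothetical relational layout for a graph: a node table and a relationship table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE relationships (start_id INTEGER, end_id INTEGER, type TEXT);
    -- Without this index, a hop would typically scan the whole relationships table.
    CREATE INDEX rel_by_start ON relationships(start_id);
""")
conn.executemany("INSERT INTO nodes VALUES (?, ?)",
                 [(1, "Alice"), (2, "Bob"), (3, "Carol")])
conn.executemany("INSERT INTO relationships VALUES (?, ?, ?)",
                 [(1, 2, "KNOWS"), (2, 1, "KNOWS"), (2, 3, "KNOWS")])


def neighbours(node_id, rel_type):
    """One traversal hop = an index lookup on relationships plus a join to nodes."""
    return [name for (name,) in conn.execute(
        """SELECT n.name FROM relationships r
           JOIN nodes n ON n.id = r.end_id
           WHERE r.start_id = ? AND r.type = ?
           ORDER BY n.name""",
        (node_id, rel_type))]


print(neighbours(2, "KNOWS"))  # ['Alice', 'Carol']
```

Every hop goes through the general-purpose store's index and join machinery, which is the cost a native graph store is designed to avoid.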

Graph DB – The Processing Engine

Index-free adjacency – connected nodes
physically point to each other in the database,
so a traversal does not need an index lookup per hop.
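A minimal Python sketch of index-free adjacency, assuming an in-memory model where each node object keeps direct references to its neighbours. `GraphNode` and `friends_of_friends` are hypothetical names; in a native store the references would be physical record addresses rather than Python objects.

```python
class GraphNode:
    """Node that stores direct references to its neighbours (index-free adjacency)."""

    def __init__(self, name):
        self.name = name
        self.out = []  # direct references to adjacent GraphNode objects

    def connect(self, other):
        self.out.append(other)


def friends_of_friends(start):
    """Follow stored references two hops out; no global index is consulted."""
    result = set()
    for friend in start.out:      # hop 1: dereference the stored references
        for fof in friend.out:    # hop 2: the same again
            if fof is not start:
                result.add(fof.name)
    return result


alice, bob, carol, dave = (GraphNode(n) for n in ["Alice", "Bob", "Carol", "Dave"])
alice.connect(bob)
bob.connect(carol)
bob.connect(dave)
bob.connect(alice)

print(sorted(friends_of_friends(alice)))  # ['Carol', 'Dave']
```

Each hop costs one dereference per neighbour, independent of how many other nodes exist in the graph.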

Native : Index-Free Adjacency

Power of Graph Databases

Performance

Flexibility

Agility

Comparison

Relational Databases

NoSQL Databases

Graph Databases

Relational Databases Lack Relationships

Initially designed to codify paper forms and
tabular structures.

Deal poorly with relationships.

The rise in connectedness translates into
increased joins.

More joins mean lower performance.

Schemas are difficult to adapt to changing business needs.

RDBMS

What products did a customer buy?

Which customers bought this product?

Which customers who bought this product also
bought that product?
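The last of these questions already needs a chain of joins. A sketch using Python's `sqlite3`, assuming a hypothetical `customers` / `products` / `purchases` schema and made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products  (id INTEGER PRIMARY KEY, name TEXT);
    -- Join table linking customers to the products they bought.
    CREATE TABLE purchases (customer_id INTEGER, product_id INTEGER);
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Raj"), (3, "Li")])
conn.executemany("INSERT INTO products VALUES (?, ?)", [(10, "Book"), (20, "Lamp")])
conn.executemany("INSERT INTO purchases VALUES (?, ?)", [(1, 10), (1, 20), (2, 10), (3, 20)])

# "Which customers who bought this product also bought that product?"
# Each condition on a purchase relationship adds another pair of joins.
query = """
    SELECT c.name
    FROM customers c
    JOIN purchases p1 ON p1.customer_id = c.id
    JOIN products  a  ON a.id = p1.product_id AND a.name = ?
    JOIN purchases p2 ON p2.customer_id = c.id
    JOIN products  b  ON b.id = p2.product_id AND b.name = ?
"""
print(conn.execute(query, ("Book", "Lamp")).fetchall())  # [('Ann',)]
```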

Query to find friends-of-friends
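A friends-of-friends query in SQL is a self-join on the friendship table. This sketch uses Python's `sqlite3` with a hypothetical `person` / `person_friend` schema and sample rows; it returns people exactly two hops from the start, excluding the start itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    -- One row per directed friendship.
    CREATE TABLE person_friend (person_id INTEGER, friend_id INTEGER);
""")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(1, "Alice"), (2, "Bob"), (3, "Carol"), (4, "Dave"), (5, "Eve")])
conn.executemany("INSERT INTO person_friend VALUES (?, ?)",
                 [(1, 2), (2, 1), (2, 3), (2, 4), (3, 5)])

# Two self-joins on person_friend; every extra level of depth adds another join.
fof = conn.execute("""
    SELECT DISTINCT p2.name
    FROM person p1
    JOIN person_friend f1 ON f1.person_id = p1.id
    JOIN person_friend f2 ON f2.person_id = f1.friend_id
    JOIN person p2 ON p2.id = f2.friend_id
    WHERE p1.name = ? AND p2.id <> p1.id
    ORDER BY p2.name
""", ("Alice",)).fetchall()
print(fof)  # [('Carol',), ('Dave',)]
```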

NoSQL Databases also lack Relationships

NoSQL databases, e.g. key-value, document or
column-oriented stores, store sets of disconnected
values/documents/columns.

Makes it difficult to use them for connected data
and graphs.

One solution is to embed an aggregate's
identifier inside a field belonging to another
aggregate.

Effectively introducing foreign keys

Requires joining aggregates at the application
level.
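A sketch of such an application-level join, assuming hypothetical document-store aggregates modelled as Python dicts: customers embed order ids and orders embed product ids, i.e. foreign keys the store itself knows nothing about.

```python
# Hypothetical aggregates keyed by id; the embedded ids act as foreign keys.
customers = {
    "c1": {"name": "Ann", "order_ids": ["o1"]},
    "c2": {"name": "Raj", "order_ids": ["o2"]},
}
orders = {
    "o1": {"product_ids": ["p1", "p2"]},
    "o2": {"product_ids": ["p1"]},
}
products = {
    "p1": {"name": "Book"},
    "p2": {"name": "Lamp"},
}


def products_bought_by(customer_id):
    """Application-level join: resolve each embedded id with a separate lookup."""
    names = []
    for order_id in customers[customer_id]["order_ids"]:
        for product_id in orders[order_id]["product_ids"]:
            names.append(products[product_id]["name"])
    return names


print(products_bought_by("c1"))  # ['Book', 'Lamp']
```

The database only stores the ids; following them, and keeping them consistent, is the application's job.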

NoSQL DB

Relationships between aggregates aren't first class
citizens in the data model.

Foreign aggregate "links" are not reflexive: they
cannot be followed backwards.

Asking the database "Who has bought a
particular product" is an expensive operation.

Need to use some external compute
infrastructure, e.g. Hadoop, for such processing.

Do not maintain consistency of connected data.

Do not support index-free adjacency.
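Because the links point one way only, the reverse question "who has bought a particular product?" forces a scan. A sketch with the same kind of hypothetical dict-based aggregates; `who_bought` also reports how many orders it had to inspect.

```python
# Hypothetical aggregates: links point only customer -> order -> product.
customers = {
    "c1": {"name": "Ann", "order_ids": ["o1"]},
    "c2": {"name": "Raj", "order_ids": ["o2"]},
    "c3": {"name": "Li", "order_ids": ["o3"]},
}
orders = {
    "o1": {"product_ids": ["p1", "p2"]},
    "o2": {"product_ids": ["p1"]},
    "o3": {"product_ids": ["p2"]},
}


def who_bought(product_id):
    """Products hold no back-links, so every customer's orders must be scanned."""
    buyers = []
    inspected = 0
    for customer in customers.values():      # full scan over all customers
        for order_id in customer["order_ids"]:
            inspected += 1
            if product_id in orders[order_id]["product_ids"]:
                buyers.append(customer["name"])
                break                        # count each customer once
    return buyers, inspected


print(who_bought("p2"))  # (['Ann', 'Li'], 3)
```

The work grows with the total number of customers and orders, not with the number of actual buyers.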

Graph DB Embraces Relationships

Graph DB

Find friends-of-friends in a social network, to a
maximum depth of 5.

Total records : 1,000,000

Each with approximately 50 friends
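A depth-limited friends-of-friends search is a breadth-first traversal that stops expanding at the maximum depth. This is a minimal sketch over a hypothetical in-memory adjacency map, not the benchmark's code; `within_depth` returns each reachable person with the hop count at which it was first reached.

```python
from collections import deque


def within_depth(adjacency, start, max_depth):
    """Breadth-first traversal returning every node reachable from `start`
    in 1..max_depth hops, mapped to the hop count at which it was first seen."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbour in adjacency.get(node, ()):
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    del depth[start]
    return depth


# Hypothetical toy network; the benchmark above used 1,000,000 people.
friends = {
    "Ann": ["Bob"],
    "Bob": ["Ann", "Cat"],
    "Cat": ["Bob", "Dan"],
    "Dan": ["Cat", "Eve"],
    "Eve": ["Dan"],
}
print(within_depth(friends, "Ann", 2))  # {'Bob': 1, 'Cat': 2}
```

Only nodes within the depth limit are ever touched; the rest of the network is never read.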

Data Modeling

“Whiteboard” friendly

The typical whiteboard view of a problem is a
GRAPH.

What we sketch in our creative and analytical
modes maps closely to the data model inside the
database.

Cypher : Graph Query Language

Pattern-Matching Query Language

Humane language

Expressive

Declarative : Say what you want, not how

Borrows from well-known query languages

Aggregation, Ordering, Limit

Update the Graph

Cypher

Cypher Representation :
(c)-[:KNOWS]->(b)-[:KNOWS]->(a), (c)-[:KNOWS]->(a)
or, equivalently, as a single path :
(c)-[:KNOWS]->(b)-[:KNOWS]->(a)<-[:KNOWS]-(c)

Cypher
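// Legacy Neo4j 1.x syntax: START looks up the starting node in the "user" index
START c=node:user(name='Michael')
MATCH (c)-[:KNOWS]->(b)-[:KNOWS]->(a), (c)-[:KNOWS]->(a)
RETURN a, b
// Equivalent in current Cypher, assuming users carry a hypothetical :User label
MATCH (c:User {name: 'Michael'})-[:KNOWS]->(b)-[:KNOWS]->(a), (c)-[:KNOWS]->(a)
RETURN a, b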
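To show what this pattern asks for, here is a naive Python matcher over a hypothetical set of KNOWS pairs. It is only an illustration of the pattern's meaning, not how Neo4j executes it, and it ignores Cypher's rule that a relationship is matched at most once per pattern.

```python
# Hypothetical KNOWS relationships as (from, to) pairs.
knows = {
    ("Michael", "Ana"), ("Ana", "Raj"), ("Michael", "Raj"),
    ("Ana", "Zoe"),
}


def match_triangle(edges, c):
    """Return (a, b) pairs for (c)-[:KNOWS]->(b)-[:KNOWS]->(a), (c)-[:KNOWS]->(a)."""
    out = {}
    for src, dst in edges:
        out.setdefault(src, set()).add(dst)
    results = []
    for b in out.get(c, ()):           # (c)-[:KNOWS]->(b)
        for a in out.get(b, ()):       # (b)-[:KNOWS]->(a)
            if a in out[c]:            # (c)-[:KNOWS]->(a)
                results.append((a, b))
    return sorted(results)


print(match_triangle(knows, "Michael"))  # [('Raj', 'Ana')]
```

Ana knows Zoe, but Michael does not, so Zoe is filtered out by the second part of the pattern.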

Other Cypher Clauses

WHERE

Provides criteria for filtering pattern matching
results.

CREATE and CREATE UNIQUE

Create nodes and relationships

DELETE

Removes nodes, relationships and properties

SET

Sets property values

Other Cypher Clauses

FOREACH

Performs an updating action for each graph element
in a list.

UNION

Combines results from two or more queries.

WITH

Chains subsequent query parts and forwards
results from one to the next. Similar to piping
commands in UNIX.

Comparison of Relational and Graph Modeling

Entity Relationship Diagram

Query to find faulty Equipment

Fine-Grained vs Generic Relationships

DELIVERY_ADDRESS vs ADDRESS {type : 'delivery'}
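The trade-off can be sketched with plain Python lists of hypothetical (type, properties, target) relationships: with a fine-grained type the relationship type alone selects delivery addresses, while the generic form needs a property check on every ADDRESS relationship.

```python
# Hypothetical data: the same two addresses modelled both ways.
# Each relationship is (type, properties, target address).
fine_grained = [
    ("DELIVERY_ADDRESS", {}, "12 High St"),
    ("HOME_ADDRESS", {}, "3 Elm Rd"),
]
generic = [
    ("ADDRESS", {"type": "delivery"}, "12 High St"),
    ("ADDRESS", {"type": "home"}, "3 Elm Rd"),
]

# Fine-grained: the relationship type alone identifies delivery addresses.
delivery_fine = [addr for rtype, _, addr in fine_grained if rtype == "DELIVERY_ADDRESS"]

# Generic: each ADDRESS relationship's properties must be inspected as well.
delivery_generic = [addr for rtype, props, addr in generic
                    if rtype == "ADDRESS" and props.get("type") == "delivery"]

# Conversely, "all addresses" is simplest in the generic model.
all_generic = [addr for rtype, _, addr in generic if rtype == "ADDRESS"]

print(delivery_fine, delivery_generic, all_generic)
```

Fine-grained types let a traversal skip irrelevant relationships entirely; generic types make queries over all variants simpler.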

Common Use Cases

Social

Recommendations

Geo

Logistics Networks : for package routing and finding
shortest paths

Financial Transaction Graphs : for fraud detection

Master Data Management

Bioinformatics : Era7 uses graphs to relate a complex web of
information that includes genes, proteins and enzymes

Authorization and Access Control : Adobe Creative
Cloud, Telenor

Non-Functional Characteristics

Transactions

Fully ACID

Recoverability

Availability

Scalability

Scalability

Capacity (Graph Size)

Latency (Response Time)

Read and Write Throughput

Capacity

The 1.9 release of Neo4j can support single graphs
having tens of billions of nodes, relationships
and properties.

The Neo4j team has publicly expressed the
intention to support 100B+
nodes/relationships/properties in a single graph
as part of its 2013 roadmap.

Latency

RDBMS – more data in tables/indexes results in
longer join operations.

Graph DB doesn't suffer the same latency
problem.

An index is used to find the starting node.

Traversal uses a combination of pointer chasing
and pattern matching to search the data.

Performance does not depend on the total size of
the dataset.

It depends only on the portion of the graph being queried.
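The latency argument can be illustrated with a toy experiment, assuming an in-memory adjacency map: once the start node is found (a dict lookup standing in for the index), a depth-limited traversal visits the same number of nodes however much unrelated data the graph holds. `build_graph` and `traverse` are hypothetical helpers, and this counts nodes visited rather than measuring wall-clock time.

```python
from collections import deque


def traverse(adjacency, start, max_depth):
    """Depth-limited BFS; returns how many nodes were visited."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, d = queue.popleft()
        if d == max_depth:
            continue
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return len(seen)


def build_graph(unrelated_nodes):
    """A fixed 4-node neighbourhood plus a separate chain of `unrelated_nodes` nodes."""
    adjacency = {"me": ["a", "b"], "a": ["c"], "b": [], "c": []}
    for i in range(unrelated_nodes - 1):
        adjacency[f"x{i}"] = [f"x{i + 1}"]
    return adjacency


for size in (10, 1_000, 100_000):
    print(size, traverse(build_graph(size), "me", 3))  # visits 4 nodes each time
```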

Throughput

Constant performance irrespective of graph size.
