What is a Graph?
A collection of vertices and edges.
A set of nodes and the relationships that connect them.
A graph represents -
Entities as NODES
The ways those entities relate to the world as RELATIONSHIPS
Allows modeling all kinds of scenarios
Road systems
Supply chain management
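The node-and-relationship model above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular graph database stores data; the class names, the `ROAD_TO` relationship type, and the city names A, B, C are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity, e.g. a city in a road system."""
    name: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A named, directed connection between two nodes."""
    start: Node
    kind: str
    end: Node
    properties: dict = field(default_factory=dict)


class Graph:
    def __init__(self):
        self.nodes = {}          # name -> Node
        self.relationships = []  # all Relationship objects

    def add_node(self, name, **properties):
        node = Node(name, properties)
        self.nodes[name] = node
        return node

    def relate(self, start, kind, end, **properties):
        rel = Relationship(self.nodes[start], kind, self.nodes[end], properties)
        self.relationships.append(rel)
        return rel

    def neighbours(self, name, kind):
        """Names of nodes reachable from `name` over one `kind` relationship."""
        return [r.end.name for r in self.relationships
                if r.start.name == name and r.kind == kind]


# Hypothetical road system: three cities and the roads between them.
roads = Graph()
for city in ("A", "B", "C"):
    roads.add_node(city)
roads.relate("A", "ROAD_TO", "B")
roads.relate("A", "ROAD_TO", "C")
roads.relate("B", "ROAD_TO", "C")

print(roads.neighbours("A", "ROAD_TO"))
```

The same two building blocks would model a supply chain by swapping cities for suppliers and warehouses and `ROAD_TO` for something like `SHIPS_TO`.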
High Level view of Graph Space
Graph Databases - Technologies used primarily
for transactional online graph persistence - OLTP.
Graph Compute Engines - Technologies used
primarily for offline graph analytics - OLAP.
Online database management system with
Create, Read, Update, and Delete (CRUD)
methods that expose a graph data model.
Built for use with transactional (OLTP) systems.
Used for richly connected data.
Querying is performed through traversals.
Can perform millions of traversal steps per second.
A traversal step resembles a join in an RDBMS.
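To make "querying through traversals" concrete, here is a small breadth-first traversal that counts each relationship it follows as one traversal step. It is only a sketch over an in-memory adjacency list; the `KNOWS` data and the `traverse` function are hypothetical and do not reflect any database's API.

```python
from collections import deque

# Hypothetical adjacency lists: each node points directly at its neighbours.
KNOWS = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": [],
}


def traverse(adjacency, start, max_depth):
    """Breadth-first traversal up to max_depth hops.

    Returns (depth at which each node was first reached, steps taken).
    """
    depth_of = {start: 0}
    steps = 0
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if depth_of[current] == max_depth:
            continue  # do not expand beyond the requested depth
        for neighbour in adjacency[current]:
            steps += 1  # following one relationship = one traversal step
            if neighbour not in depth_of:
                depth_of[neighbour] = depth_of[current] + 1
                queue.append(neighbour)
    return depth_of, steps


reached, steps = traverse(KNOWS, "alice", max_depth=2)
print(reached, steps)
```

Each step here does the work a relational system would do with a join against a relationship table, which is the sense in which the two resemble each other.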
Graph DB – The Underlying Storage
Native Graph Storage – Optimized and designed
for storing and managing graphs.
Non-Native Graph Storage – Serializes the graph
data into a relational database, an object-oriented
database, or some other general-purpose data store.
Relational Databases Lack Relationships
Initially designed to codify paper forms and tabular structures.
Deal poorly with relationships.
The rise in connectedness translates into increased joins.
Difficult to cater for changing business needs.
NoSQL Databases also lack Relationships
NoSQL databases, e.g. key-value, document, or
column-oriented, store sets of disconnected
documents, values, or columns.
Makes it difficult to use them for connected data
One solution is to embed an aggregate's
identifier inside a field belonging to another aggregate.
Effectively introducing foreign keys
Requires joining aggregates at the application level.
Relationships between aggregates aren't first class
citizens in the data model.
Foreign aggregate "links" are not reflexive.
Asking the database "Who has bought a
particular product" is an expensive operation.
Need to use some external compute
infrastructure, e.g. Hadoop, for such processing.
Do not maintain consistency of connected data.
Do not support index-free adjacency.
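The cost of non-reflexive links can be seen in a small sketch. Assume a hypothetical document store where each user aggregate embeds the identifiers of purchased products; the `users` and `products` dictionaries and both function names below are invented for illustration.

```python
# Hypothetical aggregates in a document/key-value store, keyed by id.
users = {
    "u1": {"name": "Alice", "purchased": ["p1", "p2"]},
    "u2": {"name": "Bob", "purchased": ["p2"]},
    "u3": {"name": "Carol", "purchased": []},
}
products = {"p1": {"title": "Book"}, "p2": {"title": "Lamp"}}


def products_bought_by(user_id):
    """Forward direction: follow the embedded ids, joining in application code."""
    return [products[pid]["title"] for pid in users[user_id]["purchased"]]


def buyers_of(product_id):
    """Reverse direction: no link points back, so every user is scanned."""
    scanned = 0
    buyers = []
    for user in users.values():
        scanned += 1
        if product_id in user["purchased"]:
            buyers.append(user["name"])
    return buyers, scanned


print(products_bought_by("u1"))
print(buyers_of("p2"))
```

The forward lookup touches only the aggregates it needs, while the reverse question inspects every user aggregate. With a large store that full scan is what pushes the work out to external batch infrastructure.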
Cypher : Graph Query Language
Pattern-Matching Query Language
Declarative : Say what you want, not how
Borrows from well-known query languages
Aggregation, Ordering, Limit
Update the Graph
MATCH (c)-[:KNOWS]->(b)-[:KNOWS]->(a), (c)-
RETURN a, b
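The query above is cut off in the source, so the sketch below mimics only its visible part, the two-hop chain (c)-[:KNOWS]->(b)-[:KNOWS]->(a). It is a hand-written Python loop over made-up data, shown to contrast the imperative "how" with the declarative pattern; it is not how a Cypher engine evaluates patterns.

```python
# Hypothetical KNOWS relationships: key knows each name in the list.
KNOWS = {
    "carol": ["bob"],
    "bob": ["alice"],
    "alice": [],
}


def two_hop_matches(knows):
    """Return (a, b) for every chain c -KNOWS-> b -KNOWS-> a."""
    matches = []
    for c, c_knows in knows.items():
        for b in c_knows:                 # first hop:  c -> b
            for a in knows.get(b, []):    # second hop: b -> a
                matches.append((a, b))
    return matches


print(two_hop_matches(KNOWS))
```

In the declarative form the nested loops disappear: the pattern states the shape of the subgraph and the engine decides how to find it.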
Other Cypher Clauses
WHERE
Provides criteria for filtering pattern matching results
CREATE and CREATE UNIQUE
Create nodes and relationships
DELETE
Removes nodes, relationships and properties
SET
Sets property values
FOREACH
Performs an updating action for each element in a list.
UNION
Merges results from two or more queries.
WITH
Chains subsequent query parts and forwards
results from one to the next. Similar to piping
commands in UNIX.
Comparison of Relational and Graph Modeling
Common Use Cases
Logistics Networks : for package routing, finding the shortest path
Financial Transaction Graphs : for fraud detection
Master Data Management
Bioinformatics : Era7 to relate complex web of information
that includes genes, proteins and enzymes
Authorization and Access Control : Adobe Creative Cloud
The 1.9 release of Neo4j can support single graphs
with tens of billions of nodes, relationships, and properties.
The Neo4j team has publicly expressed the
intention to support 100B+
nodes/relationships/properties in a single graph
as part of its 2013 roadmap.
RDBMS – more data in tables/indexes results in
longer join operations.
Graph DB doesn't suffer the same latency
Index is used to find starting node.
Traversal uses a combination of pointer chasing
and pattern matching to search the data.
Performance does not depend on the total size of the dataset.
Depends only on the data being queried.
Query time stays roughly constant irrespective of total graph size.
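The claim that cost follows the data being queried rather than the size of the graph can be illustrated with a toy model. In this sketch a Python dict stands in for the index that finds the starting node, and direct object references stand in for pointer chasing; the node names and helper functions are assumptions, and the model ignores real-world effects such as disk access and caching.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.out = []  # direct references to neighbour nodes


def build_graph(extra_nodes):
    """A small connected cluster plus `extra_nodes` unrelated nodes."""
    index = {}
    for name in ("start", "x", "y"):
        index[name] = Node(name)
    index["start"].out.append(index["x"])
    index["x"].out.append(index["y"])
    for i in range(extra_nodes):
        index[f"other{i}"] = Node(f"other{i}")
    return index


def nodes_touched(index, start_name):
    """Count the nodes visited when traversing outward from one start node."""
    start = index[start_name]  # one index lookup to find the starting node
    touched, stack, seen = 0, [start], {start.name}
    while stack:
        node = stack.pop()
        touched += 1
        for nxt in node.out:   # pointer chasing: no global lookup needed
            if nxt.name not in seen:
                seen.add(nxt.name)
                stack.append(nxt)
    return touched


small = nodes_touched(build_graph(10), "start")
large = nodes_touched(build_graph(100_000), "start")
print(small, large)
```

Both graphs yield the same count because the traversal only ever sees the neighbourhood of the starting node, however many unrelated nodes exist elsewhere.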