Agenda
High level view of Graph Space
Comparison with RDBMS and other NoSQL stores
Data Modeling
Cypher : Graph Query Language
Graphs in the Real World
Graph Database Internals
What is a Graph?
A collection of vertices and edges.
A set of nodes and the relationships that connect them.
A graph represents:
Entities as NODES
The way those entities relate to the world as RELATIONSHIPS
Allows us to model all kinds of scenarios:
Road systems
Medical history
Supply chain management
Data centers
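As a minimal sketch of the idea (in Python, with hypothetical town names and road lengths chosen only for illustration), a road system can be written down as nodes for the entities and typed, directed relationships that carry their own properties. The `Node` and `Relationship` classes are illustrative, not any database's API.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """An entity, e.g. a town in a road system."""
    node_id: int
    labels: set
    properties: dict = field(default_factory=dict)

@dataclass
class Relationship:
    """A typed, directed connection between two nodes; it can carry properties too."""
    rel_type: str
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)

# Hypothetical towns and road lengths, for illustration only.
town_a = Node(1, {"Town"}, {"name": "A"})
town_b = Node(2, {"Town"}, {"name": "B"})
town_c = Node(3, {"Town"}, {"name": "C"})

roads = [
    Relationship("ROAD", town_a, town_b, {"km": 40}),
    Relationship("ROAD", town_b, town_c, {"km": 25}),
]

def reachable_by_road(node, relationships):
    """Names of towns one outgoing ROAD relationship away from `node`."""
    return [r.end.properties["name"] for r in relationships
            if r.start is node and r.rel_type == "ROAD"]
```

For example, `reachable_by_road(town_a, roads)` returns `["B"]`: the relationship, not a table row, is what connects the two entities.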
High Level view of Graph Space
Graph Databases - Technologies used primarily for transactional online graph persistence (OLTP).
Graph Compute Engines - Technologies used primarily for offline graph analytics (OLAP).
Graph Databases
An online database management system with Create, Read, Update and Delete (CRUD) methods that expose a graph data model.
Built for use with transactional (OLTP) systems.
Used for richly connected data.
Querying is performed through traversals.
Can perform millions of traversal steps per second.
A traversal step resembles a join in an RDBMS.
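A sketch of why a traversal step resembles a join, assuming a tiny hypothetical FRIEND dataset: the relational version matches rows of a join table (a naive scan stands in for the join here; a real RDBMS would usually use an index), while the graph version follows each node's own relationship list. Both produce the same neighbours, hop by hop.

```python
# Hypothetical FRIEND data as (person_id, friend_id) rows.
friend_rows = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]

def hop_via_join(person_ids, rows):
    """One hop as a relational join: match join-table rows on person_id."""
    return {f for p, f in rows if p in person_ids}

# Graph view: each node keeps direct references to its neighbours.
adjacency = {}
for p, f in friend_rows:
    adjacency.setdefault(p, []).append(f)

def hop_via_traversal(person_ids, adj):
    """One traversal step: follow each node's own relationship list."""
    return {f for p in person_ids for f in adj.get(p, [])}

# Two hops = two chained joins, or two chained traversal steps.
two_hops_join = hop_via_join(hop_via_join({1}, friend_rows), friend_rows)
two_hops_graph = hop_via_traversal(hop_via_traversal({1}, adjacency), adjacency)
```

In this naive form the join reads every row of the table on each hop, whereas the traversal reads only the relationships of the nodes it is currently standing on.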
Graph DB – The Underlying Storage
Native Graph Storage – Optimized and designed for storing and managing graphs.
Non-Native Graph Storage – Serializes the graph data into a relational database, an object-oriented database, or some other general-purpose data store.
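To make "native" storage concrete, here is a simplified sketch, loosely modelled on fixed-size record designs such as Neo4j's but not its actual on-disk format: each node record stores the id of its first relationship, and each relationship record stores the next relationship for each of its two endpoints. A node's relationships are then found by following record ids rather than by index lookups. `RecordStore` and its fields are hypothetical.

```python
NIL = -1  # "no record" marker

class RecordStore:
    """Hypothetical, simplified native layout with per-node relationship chains."""

    def __init__(self):
        self.node_first_rel = []  # per node: id of its first relationship record
        self.rels = []            # per relationship: [start, end, next_for_start, next_for_end]

    def add_node(self):
        self.node_first_rel.append(NIL)
        return len(self.node_first_rel) - 1

    def add_rel(self, start, end):
        rel_id = len(self.rels)
        # Prepend the new record to both endpoints' relationship chains.
        self.rels.append([start, end, self.node_first_rel[start], self.node_first_rel[end]])
        self.node_first_rel[start] = rel_id
        self.node_first_rel[end] = rel_id
        return rel_id

    def neighbours(self, node):
        """Walk the node's chain by following record ids; no index lookups."""
        result = []
        rel_id = self.node_first_rel[node]
        while rel_id != NIL:
            start, end, next_start, next_end = self.rels[rel_id]
            if start == node:
                result.append(end)
                rel_id = next_start
            else:
                result.append(start)
                rel_id = next_end
        return result
```

Because every relationship record sits in both endpoints' chains, it can be followed from either end, which is the property the non-native approaches have to emulate with joins or extra lookups.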
Relational Databases Lack Relationships
Initially designed to codify paper forms and tabular structures.
Deal poorly with relationships.
The rise in connectedness translates into more joins.
More joins mean lower performance.
Difficult to adapt to changing business needs.
RDBMS
What products did a customer buy?
Which customers bought this product?
Which customers who bought this product also bought that product?
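The three questions above can be sketched in SQL against a hypothetical three-table schema (customer, product, and a purchase join table), run here through Python's built-in `sqlite3` module. The table names and rows are made up; note how each extra hop in the question adds another pair of joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product  (id INTEGER PRIMARY KEY, name TEXT);
-- Join table: one row per (customer, product) purchase.
CREATE TABLE purchase (customer_id INTEGER REFERENCES customer(id),
                       product_id  INTEGER REFERENCES product(id));
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO product  VALUES (10, 'tent'), (20, 'stove'), (30, 'map');
INSERT INTO purchase VALUES (1, 10), (1, 20), (2, 10), (3, 20), (3, 30);
""")

def products_of(customer_name):
    """What products did a customer buy?"""
    rows = conn.execute("""
        SELECT p.name FROM customer c
        JOIN purchase pu ON pu.customer_id = c.id
        JOIN product p   ON p.id = pu.product_id
        WHERE c.name = ? ORDER BY p.name""", (customer_name,))
    return [r[0] for r in rows]

def buyers_of(product_name):
    """Which customers bought this product?"""
    rows = conn.execute("""
        SELECT c.name FROM product p
        JOIN purchase pu ON pu.product_id = p.id
        JOIN customer c  ON c.id = pu.customer_id
        WHERE p.name = ? ORDER BY c.name""", (product_name,))
    return [r[0] for r in rows]

def buyers_of_both(first, second):
    """Which customers who bought `first` also bought `second`?"""
    rows = conn.execute("""
        SELECT DISTINCT c.name FROM customer c
        JOIN purchase pu1 ON pu1.customer_id = c.id
        JOIN product p1   ON p1.id = pu1.product_id AND p1.name = ?
        JOIN purchase pu2 ON pu2.customer_id = c.id
        JOIN product p2   ON p2.id = pu2.product_id AND p2.name = ?
        ORDER BY c.name""", (first, second))
    return [r[0] for r in rows]
```

With this data, `buyers_of_both("tent", "stove")` returns `["Ann"]`, at the cost of four joins for a two-hop question.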
NoSQL Databases also lack Relationships
NoSQL databases (e.g. key-value, document or column-oriented stores) store sets of disconnected values/documents/columns.
This makes it difficult to use them for connected data and graphs.
One solution is to embed an aggregate's identifier inside a field belonging to another aggregate.
This effectively introduces foreign keys.
It requires joining aggregates at the application level.
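A minimal sketch of that workaround, using hypothetical Python dicts as stand-ins for documents in a document store: each customer embeds order ids, each order embeds product ids, and the application itself resolves those ids one lookup at a time.

```python
# Hypothetical aggregates; each one embeds the identifiers of others.
customers = {
    "c1": {"name": "Ann", "orders": ["o1", "o2"]},
    "c2": {"name": "Bob", "orders": ["o3"]},
}
orders = {
    "o1": {"products": ["p1"]},
    "o2": {"products": ["p2", "p3"]},
    "o3": {"products": ["p1"]},
}
products = {"p1": {"name": "tent"}, "p2": {"name": "stove"}, "p3": {"name": "map"}}

def products_bought_by(customer_id):
    """Application-level join: resolve embedded ids, one lookup per hop."""
    names = set()
    for order_id in customers[customer_id]["orders"]:
        for product_id in orders[order_id]["products"]:
            names.add(products[product_id]["name"])
    return sorted(names)
```

Nothing in the store checks that `"o2"` or `"p3"` still exist; keeping these embedded "foreign keys" consistent is left to the application.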
NoSQL DB
Relationships between aggregates aren't first-class citizens in the data model.
Foreign aggregate "links" are not reflexive: they point in one direction only.
Asking the database "Who has bought a particular product?" is an expensive operation.
Such processing needs external compute infrastructure, e.g. Hadoop.
Do not maintain the consistency of connected data.
Do not support index-free adjacency.
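A sketch of the reverse query, using hypothetical customer documents that embed the ids of the products bought: because nothing points back from a product to its customers, "who bought this product?" has to scan every customer aggregate. For contrast, the second half also records each BOUGHT link from the product's side, so the question can start at the product, as a traversal would.

```python
# Hypothetical customer documents that embed the ids of the products bought.
CUSTOMERS = {
    "c1": {"name": "Ann", "bought": ["p1", "p2"]},
    "c2": {"name": "Bob", "bought": ["p1"]},
    "c3": {"name": "Cy", "bought": ["p3"]},
}

def who_bought_by_scan(product_id):
    """Aggregate store: every customer document must be read.
    Returns (buyer names, number of documents scanned)."""
    buyers, scanned = [], 0
    for doc in CUSTOMERS.values():
        scanned += 1
        if product_id in doc["bought"]:
            buyers.append(doc["name"])
    return sorted(buyers), scanned

# Graph-style contrast: make each BOUGHT link followable from the product too.
BOUGHT_BY = {}
for customer_id, doc in CUSTOMERS.items():
    for pid in doc["bought"]:
        BOUGHT_BY.setdefault(pid, []).append(customer_id)

def who_bought_by_traversal(product_id):
    """Start at the product and follow its incoming BOUGHT links."""
    return sorted(CUSTOMERS[c]["name"] for c in BOUGHT_BY.get(product_id, []))
```

The scan's cost grows with the number of customers; the traversal's cost grows only with the number of buyers of that one product.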
Graph DB
Find friends-of-friends in a social network, to a maximum depth of 5.
Total records: 1,000,000
Each with approximately 50 friends
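The query itself can be sketched as a breadth-first traversal with a depth limit. This is a generic Python sketch over a hypothetical adjacency dict, not the benchmark code behind this experiment; its work is proportional to the people and friendships it actually visits.

```python
from collections import deque

def friends_within(graph, start, max_depth):
    """Everyone reachable from `start` in 1..max_depth FRIEND hops,
    excluding `start` itself. `graph` maps person -> list of friends."""
    seen = {start}
    found = set()
    frontier = deque([(start, 0)])
    while frontier:
        person, depth = frontier.popleft()
        if depth == max_depth:
            continue  # depth limit reached: do not expand further
        for friend in graph.get(person, ()):
            if friend not in seen:
                seen.add(friend)
                found.add(friend)
                frontier.append((friend, depth + 1))
    return found
```

The `seen` set stops the traversal from revisiting people reachable along several paths, which matters in a dense social graph.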
Data Modeling
“Whiteboard” friendly
The typical whiteboard view of a problem is a GRAPH.
What we sketch in our creative and analytical modes maps closely to the data model inside the database.
Cypher : Graph Query Language
Pattern-Matching Query Language
Humane language
Expressive
Declarative: say what you want, not how
Borrows from well-known query languages
Aggregation, Ordering, Limit
Update the Graph
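To show what "declarative" saves us, consider a pattern such as `MATCH (me:Person {name: 'Ann'})-[:FRIEND]->()-[:FRIEND]->(fof) RETURN DISTINCT fof.name` (current Cypher syntax; the label, property and data are hypothetical). The Python sketch below spells out the same match by hand, plus a hand-written version of aggregation, ordering and limit, roughly what `RETURN fof.name, count(*) AS paths ORDER BY paths DESC LIMIT 1` expresses.

```python
from collections import Counter

# Hypothetical directed FRIEND relationships.
FRIEND = {
    "Ann": ["Bob", "Cy"],
    "Bob": ["Dee"],
    "Cy": ["Dee", "Eve"],
    "Dee": [],
    "Eve": [],
}

def friends_of_friends(name):
    """Imperative spelling of (me)-[:FRIEND]->()-[:FRIEND]->(fof) ... RETURN DISTINCT."""
    result = set()  # the set plays the role of DISTINCT
    for friend in FRIEND.get(name, []):
        for fof in FRIEND.get(friend, []):
            result.add(fof)
    return sorted(result)

def top_friends_of_friends(name, limit):
    """Aggregation, ordering and limit: count paths to each fof, keep the top ones."""
    paths = Counter(fof for friend in FRIEND.get(name, [])
                    for fof in FRIEND.get(friend, []))
    return paths.most_common(limit)
```

Here `top_friends_of_friends("Ann", 1)` returns `[("Dee", 2)]`, since Dee is reachable through both Bob and Cy. In Cypher the loops, the de-duplication and the counting are all left to the query planner.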
Other Cypher Clauses
WHERE
Provides criteria for filtering pattern-matching results.
CREATE and CREATE UNIQUE
Create nodes and relationships
DELETE
Removes nodes, relationships and properties
SET
Sets property values
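These clauses act on the graph roughly as the methods of the following toy class act on its dicts. `TinyGraph` and its method names are hypothetical, not part of Cypher or any Neo4j API; the comments name the clause each method mirrors. One rule is carried over from Neo4j: a node that still has relationships cannot be deleted until those relationships are deleted first.

```python
import itertools

class TinyGraph:
    """Hypothetical in-memory graph used only to illustrate the clauses."""

    def __init__(self):
        self._ids = itertools.count()
        self.nodes = {}  # node id -> properties
        self.rels = {}   # relationship id -> (type, start id, end id)

    def create_node(self, **props):              # like CREATE (n {...})
        node_id = next(self._ids)
        self.nodes[node_id] = dict(props)
        return node_id

    def create_rel(self, rel_type, start, end):  # like CREATE (a)-[:TYPE]->(b)
        rel_id = next(self._ids)
        self.rels[rel_id] = (rel_type, start, end)
        return rel_id

    def set_prop(self, node_id, key, value):     # like SET n.key = value
        self.nodes[node_id][key] = value

    def where(self, predicate):                  # like MATCH (n) WHERE ...
        return [n for n, props in self.nodes.items() if predicate(props)]

    def delete_rel(self, rel_id):                # like DELETE r
        del self.rels[rel_id]

    def delete_node(self, node_id):              # like DELETE n
        # Refuse to delete a node that still has relationships.
        if any(node_id in (start, end) for _, start, end in self.rels.values()):
            raise ValueError("node still has relationships")
        del self.nodes[node_id]
```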
Other Cypher Clauses
FOREACH
Performs an updating action for each graph element in a list.
UNION
Merges results from two or more queries.
WITH
Chains subsequent query parts and forwards results from one to the next, similar to piping commands in UNIX.
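The UNIX-pipe analogy for WITH can be sketched with Python generators: a first part emits matched rows, a WITH-like stage aggregates them and passes rows on, and a later stage filters on the aggregated value. The rows and function names are hypothetical. The two union helpers mimic Cypher's UNION, which removes duplicate rows, and UNION ALL, which keeps them; the order of the combined rows is not something to rely on in Cypher.

```python
from collections import Counter

# Hypothetical (person, city) rows produced by a first MATCH.
VISITS = [("Ann", "Oslo"), ("Bob", "Oslo"), ("Cy", "Rome"),
          ("Ann", "Rome"), ("Dee", "Oslo")]

def match_visits():
    """First query part: emit the matched rows."""
    yield from VISITS

def with_visitor_count(rows):
    """Like `WITH city, count(*) AS visitors`: aggregate, then pass rows on."""
    counts = Counter(city for _, city in rows)
    yield from counts.items()

def keep_popular(rows, minimum):
    """Like a WHERE placed after WITH: filter on the aggregated value."""
    for city, visitors in rows:
        if visitors >= minimum:
            yield city

def union(*results):
    """UNION: combine results and drop duplicate rows."""
    return list(dict.fromkeys(row for result in results for row in result))

def union_all(*results):
    """UNION ALL: combine results and keep duplicates."""
    return [row for result in results for row in result]
```

Usage: `list(keep_popular(with_visitor_count(match_visits()), 3))` pipes the three stages together, much like `cmd1 | cmd2 | cmd3`.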
Common Use Cases
Social
Recommendations
Geo
Logistics Networks: package routing, finding shortest paths
Financial Transaction Graphs: fraud detection
Master Data Management
Bioinformatics: Era7 uses graphs to relate a complex web of information that includes genes, proteins and enzymes
Authorization and Access Control: Adobe Creative Cloud, Telenor
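For the logistics case, finding the shortest path over routes with non-negative costs is commonly done with Dijkstra's algorithm. A minimal Python sketch over a hypothetical depot network (the depot names and costs are made up; this is not a specific product's routing API):

```python
import heapq

# Hypothetical depots and route costs (non-negative).
DEPOTS = {
    "A": {"B": 4, "C": 1},
    "B": {"D": 1},
    "C": {"B": 2, "D": 7},
    "D": {},
}

def shortest_route(graph, source, target):
    """Dijkstra's algorithm. Returns (total cost, path), or (inf, []) if unreachable."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    visited = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry; a cheaper cost was already settled
        visited.add(node)
        if node == target:
            break
        for neighbour, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_cost
                prev[neighbour] = node
                heapq.heappush(heap, (new_cost, neighbour))
    if target not in visited:
        return float("inf"), []
    # Walk the predecessor links back from the target to rebuild the route.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]
```

Here `shortest_route(DEPOTS, "A", "D")` prefers the three-leg route through C and B (cost 4) over the direct-looking A, B, D route (cost 5).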
Capacity
The 1.9 release of Neo4j can support single graphs with tens of billions of nodes, relationships and properties.
The Neo4j team has publicly expressed the intention to support 100B+ nodes/relationships/properties in a single graph as part of its 2013 roadmap.
Latency
RDBMS – more data in tables/indexes results in longer join operations.
A graph DB doesn't suffer from the same latency problem.
An index is used to find the starting node.
Traversal uses a combination of pointer chasing and pattern matching to search the data.
Performance does not depend on the total size of the dataset.
It depends only on the data being queried.
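A small experiment makes the last point concrete. In this Python sketch a plain dict stands in for the index that finds the starting node, and a breadth-first traversal then follows adjacency lists; the graph and names are hypothetical. Adding 100,000 unrelated nodes changes neither the answer nor the number of nodes the query touches.

```python
def build_graph(n_unrelated):
    """Hypothetical graph: a small neighbourhood around 'alice' plus
    `n_unrelated` nodes in a separate ring that the query never reaches."""
    names = ["alice", "bob", "cy", "dee"] + [f"other{i}" for i in range(n_unrelated)]
    index = {name: node_id for node_id, name in enumerate(names)}  # name -> node id
    adjacency = [[] for _ in names]
    adjacency[index["alice"]] = [index["bob"], index["cy"]]
    adjacency[index["bob"]] = [index["dee"]]
    for i in range(n_unrelated):
        adjacency[4 + i] = [4 + (i + 1) % n_unrelated]
    return names, index, adjacency

def reachable_from(name, max_depth, names, index, adjacency):
    """Index lookup finds the start node; traversal then only touches the
    nodes it reaches. Returns (reachable names, number of nodes touched)."""
    start = index[name]
    seen = {start}
    frontier = [start]
    for _ in range(max_depth):
        next_frontier = []
        for node in frontier:
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return sorted(names[n] for n in seen if n != start), len(seen)
```

Both `reachable_from("alice", 2, *build_graph(10))` and `reachable_from("alice", 2, *build_graph(100_000))` touch the same four nodes.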