Subject: NoSQL
Module 5.2
Module Name: Graph-Based Databases
Version Code: NSDB3
Release Date: 4-OCT-2019
AIM:
To understand the concept of Graph-Based Databases.
Objective
The objectives of this module are to:
• Understand the features of Graph-Based Databases.
• Understand the working of Graph-Based Databases.
Outcome
At the end of this module, you are expected to:
• Define the features of Graph-Based Databases.
• Gain experience in developing and experimenting with real systems in a realistic
environment; the goal is not the material itself.
Content
• Introduction
• Features of Graph-Based Databases
• Working of Graph-Based Databases
• What Makes Graph Databases Unique?
• Conclusion
Introduction
• Graph databases allow you to store entities and the relationships between these entities.
• Entities are also known as nodes, which have properties. Think of a node as an instance of
an object in the application.
• Relationships are known as edges, which can also have properties. Edges have directional
significance; nodes are organized by relationships, which allows you to find interesting
patterns between the nodes.
• The organization of the graph lets the data be stored once and then interpreted in
different ways based on the relationships.
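The node-and-edge model described above can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: the class names GraphNode, GraphEdge and PropertyGraphDemo are hypothetical and do not belong to any graph database API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model, not the API of any graph database.
class GraphNode {
    final Map<String, Object> properties = new HashMap<>();
    final List<GraphEdge> outgoing = new ArrayList<>();
    final List<GraphEdge> incoming = new ArrayList<>();

    GraphNode(String name) {
        properties.put("name", name);
    }

    // Creates a directed, typed edge from this node to the target node.
    GraphEdge connectTo(GraphNode target, String type) {
        GraphEdge edge = new GraphEdge(this, target, type);
        outgoing.add(edge);
        target.incoming.add(edge);
        return edge;
    }
}

// An edge has a direction (start -> end), a type, and its own properties.
class GraphEdge {
    final GraphNode start;
    final GraphNode end;
    final String type;
    final Map<String, Object> properties = new HashMap<>();

    GraphEdge(GraphNode start, GraphNode end, String type) {
        this.start = start;
        this.end = end;
        this.type = type;
    }
}

class PropertyGraphDemo {
    public static void main(String[] args) {
        GraphNode reader = new GraphNode("Anna");
        GraphNode book = new GraphNode("NoSQL Distilled");
        GraphEdge likes = reader.connectTo(book, "LIKES");
        likes.properties.put("since", 2012);

        // The same stored edge answers two different questions.
        System.out.println("Edges starting at the reader: " + reader.outgoing.size());
        System.out.println("Edges ending at the book: " + book.incoming.size());
    }
}
```

The edge is stored once but is reachable from both of its end nodes, which is the sense in which the data can be interpreted in different ways.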
Introduction
• The graph paradigm goes well beyond databases and application development; it
is a reimagining of what is possible around the idea of connections.
• And just like any new problem-solving framework, approaching a challenge from a
different dimension often produces an order-of-magnitude change in possible
solutions.
Features
The following are the features of graph-based
databases.
• Consistency
• Transactions
• Availability
• Query Features
• Scaling
Consistency
• Because graph databases operate on connected nodes, most graph database solutions
do not support distributing the nodes across different servers.
• Some solutions, however, such as Infinite Graph, do support node distribution across a
cluster of servers.
• Within a single server, data is always consistent, especially in Neo4J, which is fully
ACID-compliant. When running Neo4J in a cluster, a write to the master is eventually
synchronized to the slaves, while slaves are always available for reads. Writes to slaves are
allowed and are immediately synchronized to the master; other slaves, however, are not
synchronized immediately and have to wait for the data to propagate from the master.
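The cluster behaviour described above can be made concrete with a toy model. This is a sketch, not Neo4J code: ToyReplica, ToyCluster and their methods are hypothetical names, and the replicated data is reduced to a simple list of strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of master/slave replication; not Neo4J code.
class ToyReplica {
    final List<String> log = new ArrayList<>();
}

class ToyCluster {
    final ToyReplica master = new ToyReplica();
    final List<ToyReplica> slaves = new ArrayList<>();

    ToyReplica addSlave() {
        ToyReplica slave = new ToyReplica();
        slaves.add(slave);
        return slave;
    }

    // A write to the master is visible there at once; slaves lag behind.
    void writeToMaster(String value) {
        master.log.add(value);
    }

    // A write through a slave is committed at the master first and then
    // applied at that slave; the other slaves are not touched yet.
    void writeViaSlave(ToyReplica slave, String value) {
        master.log.add(value);
        catchUp(slave);
    }

    // Later propagation from the master brings every slave up to date.
    void propagate() {
        for (ToyReplica slave : slaves) {
            catchUp(slave);
        }
    }

    // Copies the master log entries that the replica has not seen yet.
    private void catchUp(ToyReplica replica) {
        replica.log.addAll(master.log.subList(replica.log.size(), master.log.size()));
    }

    public static void main(String[] args) {
        ToyCluster cluster = new ToyCluster();
        ToyReplica first = cluster.addSlave();
        ToyReplica second = cluster.addSlave();
        cluster.writeViaSlave(first, "node-1");
        System.out.println("Second slave before propagation: " + second.log);
        cluster.propagate();
        System.out.println("Second slave after propagation: " + second.log);
    }
}
```

Between the write and the call to propagate, a read from the second slave returns stale data, which is what "eventually synchronized" means in practice.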
Consistency
• Graph databases ensure consistency through transactions.
• They do not allow dangling relationships: The start node and end node always
have to exist, and nodes can only be deleted if they don’t have any relationships attached to
them.
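The rule against dangling relationships can be illustrated with a small in-memory store. This is a minimal sketch: SafeGraph is a hypothetical class, and it ignores edge direction because only the fact that a relationship is attached matters for deletion.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy store; real graph databases enforce this rule internally.
class SafeGraph {
    // Node id -> ids of the nodes it shares a relationship with.
    private final Map<String, Set<String>> attached = new HashMap<>();

    void addNode(String id) {
        attached.putIfAbsent(id, new HashSet<>());
    }

    boolean hasNode(String id) {
        return attached.containsKey(id);
    }

    // Both the start node and the end node must already exist.
    void addRelationship(String startId, String endId) {
        if (!hasNode(startId) || !hasNode(endId)) {
            throw new IllegalArgumentException("start and end node must exist");
        }
        attached.get(startId).add(endId);
        attached.get(endId).add(startId);
    }

    void removeRelationship(String startId, String endId) {
        if (hasNode(startId) && hasNode(endId)) {
            attached.get(startId).remove(endId);
            attached.get(endId).remove(startId);
        }
    }

    // A node can only be deleted once no relationships are attached to it.
    void deleteNode(String id) {
        Set<String> relationships = attached.get(id);
        if (relationships == null) {
            throw new IllegalArgumentException("no such node: " + id);
        }
        if (!relationships.isEmpty()) {
            throw new IllegalStateException("node still has relationships: " + id);
        }
        attached.remove(id);
    }

    public static void main(String[] args) {
        SafeGraph graph = new SafeGraph();
        graph.addNode("author");
        graph.addNode("book");
        graph.addRelationship("author", "book");
        try {
            graph.deleteNode("author");
        } catch (IllegalStateException refused) {
            System.out.println("Delete refused: " + refused.getMessage());
        }
    }
}
```

Deleting the node succeeds only after removeRelationship has detached it, so a relationship never points at a missing node.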
Transactions
• Neo4J is ACID-compliant. Before changing any nodes or adding any relationships to existing
nodes, we have to start a transaction. Without wrapping operations in transactions, we
will get a NotInTransactionException. Read operations can be done without initiating
a transaction.
Transaction transaction = database.beginTx();
try {
    Node node = database.createNode();
    node.setProperty("name", "NoSQL Distilled");
    node.setProperty("published", "2012");
    transaction.success();
} finally {
    transaction.finish();
}
Transactions
• In the code, we started a transaction on the database, then created a node and set properties
on it.
• We marked the transaction as successful by calling success and finally completed it by calling
finish. A transaction has to be marked as successful; otherwise Neo4J assumes that it was a
failure and rolls it back when finish is issued.
• Setting success without issuing finish also does not commit the data to the database.
This way of managing transactions has to be remembered when developing, as it differs from
the standard way of doing transactions in an RDBMS.
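The interplay of success and finish described above can be imitated with a stand-in class. This is only a sketch of the protocol: ToyStore and ToyTransaction are hypothetical and share nothing with the Neo4J implementation beyond the two method names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in that only mimics the success()/finish() protocol.
class ToyStore {
    final List<String> committed = new ArrayList<>();

    ToyTransaction beginTx() {
        return new ToyTransaction(this);
    }
}

class ToyTransaction {
    private final ToyStore store;
    private final List<String> pending = new ArrayList<>();
    private boolean markedSuccessful = false;

    ToyTransaction(ToyStore store) {
        this.store = store;
    }

    void write(String value) {
        pending.add(value);
    }

    // Only records the intent to commit; nothing is stored yet.
    void success() {
        markedSuccessful = true;
    }

    // Commits if success() was called, otherwise discards the pending writes.
    void finish() {
        if (markedSuccessful) {
            store.committed.addAll(pending);
        }
        pending.clear();
    }
}

class ToyTransactionDemo {
    public static void main(String[] args) {
        ToyStore store = new ToyStore();

        ToyTransaction forgotten = store.beginTx();
        forgotten.write("lost");
        forgotten.finish(); // no success(): treated as a failure

        ToyTransaction proper = store.beginTx();
        proper.write("kept");
        proper.success();
        proper.finish();

        System.out.println(store.committed);
    }
}
```

Only the write from the second transaction reaches the store, because the first one was finished without being marked as successful.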
Availability
• Neo4J, as of version 1.8, achieves high availability by providing for replicated slaves.
• These slaves can also handle writes: when they are written to, they synchronize the write
to the current master, and the write is committed first at the master and then at the slave. Other
slaves will eventually get the update.
• Other graph databases, such as Infinite Graph and FlockDB, provide for distributed storage of
the nodes.
• Neo4J uses Apache ZooKeeper [ZooKeeper] to keep track of the last transaction IDs persisted
on each slave node and the current master node. When a server starts up, it
communicates with ZooKeeper and finds out which server is the master. If the server is the
first one to join the cluster, it becomes the master; when a master goes down, the cluster
elects a new master from the available nodes, thus providing high availability.
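The start-up and failover rules above can be sketched as follows. The classes ClusterMember and ToyCoordinator are hypothetical and do not use the ZooKeeper API. The election rule is also an assumption: the text only says that a master is elected from the available nodes, so this sketch picks the member with the highest persisted transaction ID.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical cluster member; lastTxId stands for the last transaction ID persisted.
class ClusterMember {
    final String name;
    final long lastTxId;

    ClusterMember(String name, long lastTxId) {
        this.name = name;
        this.lastTxId = lastTxId;
    }
}

// Hypothetical coordinator playing the role the text assigns to ZooKeeper.
class ToyCoordinator {
    private final List<ClusterMember> available = new ArrayList<>();
    private ClusterMember master;

    // The first member to join the cluster becomes the master.
    void join(ClusterMember member) {
        available.add(member);
        if (master == null) {
            master = member;
        }
    }

    // Removes a failed member; if it was the master, a new one is elected.
    void fail(ClusterMember member) {
        available.remove(member);
        if (member == master) {
            master = elect();
        }
    }

    // Assumption: prefer the member with the highest persisted transaction ID.
    private ClusterMember elect() {
        ClusterMember best = null;
        for (ClusterMember candidate : available) {
            if (best == null || candidate.lastTxId > best.lastTxId) {
                best = candidate;
            }
        }
        return best;
    }

    ClusterMember master() {
        return master;
    }

    public static void main(String[] args) {
        ToyCoordinator coordinator = new ToyCoordinator();
        ClusterMember first = new ClusterMember("server-1", 10);
        coordinator.join(first);
        coordinator.join(new ClusterMember("server-2", 7));
        coordinator.join(new ClusterMember("server-3", 9));
        coordinator.fail(first);
        System.out.println("New master: " + coordinator.master().name);
    }
}
```

Preferring the most up-to-date member keeps the amount of lost work small, but other election policies are equally possible.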
Query Features
• Graph databases are supported by query languages such as Gremlin [Gremlin].
• Gremlin is a domain-specific language for traversing graphs; it can traverse all graph
databases that implement the Blueprints [Blueprints] property graph. Neo4J
also has the Cypher [Cypher] query language for querying the graph.
• Outside these query languages, Neo4J allows you to query the graph for properties of the
nodes, traverse the graph, or navigate the relationships of the nodes using language
bindings.
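To show what a traversal computes, independently of Gremlin, Cypher or any language binding, here is a breadth-first walk over a plain adjacency map. ToyTraversal and its reachable method are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; a graph query language would express this declaratively.
class ToyTraversal {
    // Returns the nodes reachable from start within maxDepth outgoing hops,
    // excluding the start node itself.
    static Set<String> reachable(Map<String, List<String>> outgoing, String start, int maxDepth) {
        Set<String> visited = new LinkedHashSet<>();
        visited.add(start);
        List<String> frontier = new ArrayList<>();
        frontier.add(start);
        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            List<String> next = new ArrayList<>();
            for (String node : frontier) {
                List<String> targets = outgoing.getOrDefault(node, Collections.<String>emptyList());
                for (String target : targets) {
                    if (visited.add(target)) {
                        next.add(target);
                    }
                }
            }
            frontier = next;
        }
        visited.remove(start);
        return visited;
    }

    public static void main(String[] args) {
        Map<String, List<String>> friends = new HashMap<>();
        friends.put("Anna", Arrays.asList("Barbara", "Carol"));
        friends.put("Barbara", Arrays.asList("Dawn"));
        System.out.println("Within one hop: " + reachable(friends, "Anna", 1));
        System.out.println("Within two hops: " + reachable(friends, "Anna", 2));
    }
}
```

The depth limit plays the role of a "friends of friends" style restriction; the visited set keeps cycles in the graph from causing endless loops.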
Scaling
• In NoSQL databases, one of the commonly used scaling techniques is sharding, where data is
split and distributed across different servers.
• With graph databases, sharding is difficult, as graph databases are not aggregate-oriented but
relationship-oriented. Since any given node can be related to any other node, storing
related nodes on the same server is better for graph traversal.
• Traversing a graph when the nodes are on different machines is not good for performance.
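The cost of a sharded traversal can be made visible by counting how many steps of a path cross from one server to another. This is a minimal sketch: CrossServerHops and the node-to-server map are hypothetical, and each crossing stands for a network round trip.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that counts server crossings along a traversal path.
class CrossServerHops {
    // serverOf maps every node on the path to the number of the server storing it.
    static int count(List<String> path, Map<String, Integer> serverOf) {
        int crossings = 0;
        for (int i = 1; i < path.size(); i++) {
            Integer previous = serverOf.get(path.get(i - 1));
            Integer current = serverOf.get(path.get(i));
            if (!previous.equals(current)) {
                crossings++;
            }
        }
        return crossings;
    }

    public static void main(String[] args) {
        List<String> path = Arrays.asList("a", "b", "c", "d");

        Map<String, Integer> colocated = new HashMap<>();
        Map<String, Integer> scattered = new HashMap<>();
        int server = 0;
        for (String node : path) {
            colocated.put(node, 0);        // every node on server 0
            scattered.put(node, server);   // nodes alternate between servers 0 and 1
            server = 1 - server;
        }

        System.out.println("Crossings when colocated: " + count(path, colocated));
        System.out.println("Crossings when scattered: " + count(path, scattered));
    }
}
```

With related nodes on one server the traversal stays local; with the same nodes scattered, every step becomes a remote call.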
Scaling
• Generally speaking, there are three ways to scale graph databases.
• Because machines can now come with lots of RAM, we can add enough RAM to the
server so that the working set of nodes and relationships is held entirely in memory.
• We can improve the read scaling of the database by adding more slaves with read-only access
to the data, with all the writes going to the master. This pattern of writing once and
reading from many servers is a proven technique in MySQL clusters and is really useful
when the dataset is too large to fit in a single machine’s RAM, but small
enough to be replicated across multiple machines.
Scaling
• Slaves can also contribute to availability and read scaling, as they can be configured
to never become a master, remaining always read-only.
• When the dataset size makes replication impractical, we can shard the data from the
application side using domain-specific knowledge. For example, nodes that relate to
North America can be created on one server while the nodes that relate to Asia are created
on another. With this application-level sharding, the application needs to understand that
nodes are stored on physically different databases.
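Application-level sharding by region can be sketched as a small router inside the application. RegionShardRouter and the server addresses are hypothetical placeholders; the point is only that the application, not the database, decides where a node lives.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical router; the application decides which database holds which region.
class RegionShardRouter {
    private final Map<String, String> serverByRegion = new HashMap<>();

    void assign(String region, String serverAddress) {
        serverByRegion.put(region, serverAddress);
    }

    // Returns the database server that stores nodes of the given region.
    String serverFor(String region) {
        String server = serverByRegion.get(region);
        if (server == null) {
            throw new IllegalArgumentException("no shard for region: " + region);
        }
        return server;
    }

    // A relationship between two regions may span two physically separate
    // databases, so the application has to handle it itself.
    boolean crossesShards(String regionA, String regionB) {
        return !serverFor(regionA).equals(serverFor(regionB));
    }

    public static void main(String[] args) {
        RegionShardRouter router = new RegionShardRouter();
        router.assign("North America", "graph-na.example.internal"); // placeholder address
        router.assign("Asia", "graph-asia.example.internal");        // placeholder address
        System.out.println("Asia nodes live on: " + router.serverFor("Asia"));
        System.out.println("Crosses shards: " + router.crossesShards("North America", "Asia"));
    }
}
```

Traversals that stay within one region run on a single server; anything that links regions must be stitched together by application code.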
Working of Graph Database
• Unlike other database management systems (DBMS), relationships take first priority
in graph databases.
• In the graph world, connected data is as important as (or more important than) individual
data points.
• This connections-first approach to data means relationships and connections are persisted
(and not just temporarily calculated) through every part of the data lifecycle: from idea to
design in a logical model, to implementation in a physical model, to operation using a
query language, and to persistence within a scalable, reliable database system.
Working of Graph Database
• Unlike other database systems, this approach means your application doesn’t have
to infer data connections using things such as foreign keys or out-of-band
processing such as MapReduce.
• The result: your data models are simpler and yet more expressive than the ones
you would produce with relational databases.
What Makes Graph Databases Unique?
Graph Storage
• Some graph databases use native graph storage that is specifically designed to store
and manage graphs – from bare metal on up.
• Other graph technologies use relational, columnar or object-oriented databases as
their storage layer.
• Non-native storage is often slower than a native approach because all of the
graph connections have to be translated into a different data model.
What Makes Graph Databases Unique?
Graph processing
• Native graph processing (a.k.a. index-free adjacency) is the most efficient means of
processing data in a graph because connected nodes physically point to each other in
the database.
• Non-native graph processing engines use other means, which aren’t optimized for handling
connected data, to process Create, Read, Update, or Delete (CRUD) operations.
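The difference between the two processing styles can be sketched side by side. Both classes are hypothetical simplifications: LinkedNode holds direct references to its neighbours, as in index-free adjacency, while EdgeTable stores relationships as rows and needs an index lookup on every hop.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Native style: each node holds direct references to its neighbours.
class LinkedNode {
    final String id;
    final List<LinkedNode> neighbours = new ArrayList<>();

    LinkedNode(String id) {
        this.id = id;
    }
}

// Non-native style: relationships are rows that are found through an index.
class EdgeTable {
    private final List<String[]> rows = new ArrayList<>(); // each row is {startId, endId}
    private final Map<String, List<Integer>> indexByStart = new HashMap<>();

    void add(String startId, String endId) {
        rows.add(new String[] {startId, endId});
        indexByStart.computeIfAbsent(startId, key -> new ArrayList<>()).add(rows.size() - 1);
    }

    // Every hop costs an index lookup followed by row fetches.
    List<String> neighboursOf(String id) {
        List<String> result = new ArrayList<>();
        List<Integer> rowNumbers = indexByStart.getOrDefault(id, Collections.<Integer>emptyList());
        for (int rowNumber : rowNumbers) {
            result.add(rows.get(rowNumber)[1]);
        }
        return result;
    }
}

class AdjacencyDemo {
    public static void main(String[] args) {
        LinkedNode anna = new LinkedNode("anna");
        LinkedNode book = new LinkedNode("book");
        anna.neighbours.add(book); // following the link needs no lookup

        EdgeTable table = new EdgeTable();
        table.add("anna", "book");

        System.out.println("Native neighbour: " + anna.neighbours.get(0).id);
        System.out.println("Table neighbours: " + table.neighboursOf("anna"));
    }
}
```

Both structures return the same neighbours; the difference is the work done per hop, which adds up over deep traversals.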
Conclusion
• “Use the right database for the job” is the guiding principle of the graph
community, because every graph-based database is specialized for certain use
cases.
• Since there is no evaluation available that answers the question “which
database is the right tool for the job?”, the features of all the different graph stores should
be considered when designing a real-life database system in a realistic environment.
Summary
• This module describes the different features of graph-based databases.
• Consistency, transactions, availability, and scaling are the features that
differentiate graph-based databases from relational databases.
• However, all the features should be used in an optimized way to ensure a good design of
a graph-based database.
References
• NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence, by Pramod
J. Sadalage & Martin Fowler, Pearson Education, Inc. Pages 104-113
• NoSQL for Dummies, by Adam Fowler, John Wiley & Sons, Inc.
Pages 260-269
Self Assessment Questions
1. ________ is the most efficient means of processing data in a graph.
a. Native graph processing
b. Riak
c. Neo4j
Answer: a
2. ________ engines use other means to process Create, Read, Update or Delete (CRUD)
operations that aren’t optimized for handling connected data.
a. Native graph processing
b. Non-native graph processing
Answer: b
3. ________ offers full consistency in a graph-based database.
a. HBase
b. Riak
c. None of the above
Answer: c
Assignment
Describe in detail the features of graph-based
databases.
Document Links
Topic: Graph Databases for Beginners: Why Graph Technology Is the Future
URL: https://neo4j.com/blog/why-graph-databases-are-the-future/
Notes: This document briefly describes what makes graph databases unique.
Video Links
No videos on the features of graph-based databases
E-Book Links
E-Book Name: NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence, by Pramod J. Sadalage & Martin Fowler, Pearson Education, Inc.
URL: https://bigdata-ir.com/wp-content/uploads/2017/04/NoSQL-Distilled.pdf
Pages: 104-113
E-Book Name: NoSQL for Dummies, by Adam Fowler, John Wiley & Sons, Inc.
URL: https://www.academia.edu/21554991/No_SQL_For_Dummies1
Pages: 260-269
