NoSQL 5 2_graph Database Edited - Updated.pptx.pptx

Subject: NoSQL
Module 5.2
Module Name: Graph-Based Databases
Version Code: NSDB3
Released Date: 4-OCT-2019

Aim
To understand the concept of Graph-Based Databases.

Objective
The objectives of this module are to:
• Understand the features of Graph-Based Databases.
• Understand the working of Graph-Based Databases.

Outcome
At the end of this module, you are expected to:
• Define the features of Graph-Based Databases.
• Gain experience in developing and experimenting with real systems in a realistic environment; this experience, rather than the material itself, is the goal.

Content
• Introduction
• Features of Graph-Based Databases
• Working of Graph-Based Databases
• What Makes Graph Databases Unique?
• Conclusion

Introduction
• Graph databases allow you to store entities and the relationships between these entities.
• Entities are also known as nodes, which have properties. Think of a node as an instance of an object in the application.
• Relations are known as edges, which can also have properties. Edges have directional significance; nodes are organized by relationships, which allow you to find interesting patterns between the nodes.
• The organization of the graph lets the data be stored once and then interpreted in different ways based on the relationships.
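
The node, edge, and property vocabulary above can be sketched in plain Java. This is a minimal in-memory illustration under stated assumptions, not the API of any graph database: the class names GraphNode, GraphEdge, and PropertyGraphDemo, and the sample data, are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model, for illustration only.
class GraphNode {
    final Map<String, Object> properties = new HashMap<>();
    final List<GraphEdge> outgoing = new ArrayList<>();
    final List<GraphEdge> incoming = new ArrayList<>();

    GraphNode(String name) {
        properties.put("name", name);
    }

    // Creates a directed, typed edge from this node to the target node.
    GraphEdge connectTo(GraphNode target, String type) {
        GraphEdge edge = new GraphEdge(this, target, type);
        outgoing.add(edge);
        target.incoming.add(edge);
        return edge;
    }
}

// An edge has a direction (start to end), a type, and its own properties.
class GraphEdge {
    final GraphNode start;
    final GraphNode end;
    final String type;
    final Map<String, Object> properties = new HashMap<>();

    GraphEdge(GraphNode start, GraphNode end, String type) {
        this.start = start;
        this.end = end;
        this.type = type;
    }
}

class PropertyGraphDemo {
    public static void main(String[] args) {
        GraphNode anna = new GraphNode("Anna");
        GraphNode book = new GraphNode("NoSQL Distilled");
        GraphEdge likes = anna.connectTo(book, "LIKES");
        likes.properties.put("since", 2012);

        // The edge is stored once but can be read from either end.
        System.out.println("Anna likes: " + anna.outgoing.get(0).end.properties.get("name"));
        System.out.println("Liked by: " + book.incoming.get(0).start.properties.get("name"));
    }
}
```

The single LIKES edge answers both "what does Anna like?" and "who likes this book?", which is the "stored once, interpreted in different ways" idea in miniature.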

Introduction
• The graph paradigm goes well beyond databases and application development; it is a reimagining of what is possible around the idea of connections.
• And just like any new problem-solving framework, approaching a challenge from a different dimension often produces an order-of-magnitude change in possible solutions.

Features
The following are the features of graph-based databases:
• Consistency
• Transactions
• Availability
• Query Features
• Scaling

Consistency
• Since graph databases operate on connected nodes, most graph database solutions do not support distributing the nodes across different servers.
• Some solutions, however, such as InfiniteGraph, support node distribution across a cluster of servers.
• Within a single server, data is always consistent, especially in Neo4j, which is fully ACID-compliant. When running Neo4j in a cluster, a write to the master is eventually synchronized to the slaves, while slaves are always available for reads. Writes to slaves are allowed and are immediately synchronized to the master. Other slaves are not synchronized immediately, though; they have to wait for the data to propagate from the master.

Consistency
• Graph databases ensure consistency through transactions.
• They do not allow dangling relationships: the start node and end node always have to exist, and nodes can only be deleted if they don’t have any relationships attached to them.
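
The no-dangling-relationships rule can be sketched as a toy in-memory store that refuses the two operations that would break it. This is only an illustration under stated assumptions: the class NoDanglingGraph and its methods are hypothetical and do not correspond to any real database API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy store, for illustration only; not a real database API.
class NoDanglingGraph {
    // node id -> ids of the relationships attached to that node
    private final Map<Long, Set<Long>> attached = new HashMap<>();
    private long nextNodeId = 0;
    private long nextRelationshipId = 0;

    long createNode() {
        long id = nextNodeId++;
        attached.put(id, new HashSet<>());
        return id;
    }

    // Both the start node and the end node must already exist.
    long createRelationship(long startId, long endId) {
        if (!attached.containsKey(startId) || !attached.containsKey(endId)) {
            throw new IllegalArgumentException("start and end node must exist");
        }
        long relationshipId = nextRelationshipId++;
        attached.get(startId).add(relationshipId);
        attached.get(endId).add(relationshipId);
        return relationshipId;
    }

    // Detaches the relationship from every node that references it.
    void deleteRelationship(long relationshipId) {
        for (Set<Long> relationships : attached.values()) {
            relationships.remove(relationshipId);
        }
    }

    // A node can be deleted only when no relationship is attached to it.
    void deleteNode(long nodeId) {
        Set<Long> relationships = attached.get(nodeId);
        if (relationships == null) {
            throw new IllegalArgumentException("no such node");
        }
        if (!relationships.isEmpty()) {
            throw new IllegalStateException("node still has relationships");
        }
        attached.remove(nodeId);
    }

    boolean hasNode(long nodeId) {
        return attached.containsKey(nodeId);
    }
}
```

To delete a connected node in this sketch, the caller first deletes its relationships and then the node, so a relationship never points at a missing endpoint.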

Transactions
• Neo4j is ACID-compliant. Before changing any nodes or adding any relationships to existing nodes, we have to start a transaction. Operations that are not wrapped in a transaction throw a NotInTransactionException. In the Neo4j 1.x embedded API used here, read operations can be done without initiating a transaction.
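    // database is a GraphDatabaseService (Neo4j 1.x embedded API)
    Transaction transaction = database.beginTx();
    try {
        Node node = database.createNode();
        node.setProperty("name", "NoSQL Distilled");
        node.setProperty("published", "2012");
        transaction.success();  // mark the transaction as successful
    } finally {
        transaction.finish();   // commit if marked successful, otherwise roll back
    }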

Transactions
• In the code, we started a transaction on the database, then created a node and set properties on it.
• We marked the transaction as successful with success and finally completed it with finish. A transaction has to be marked as successful; otherwise Neo4j assumes that it failed and rolls it back when finish is issued.
• Calling success without issuing finish does not commit the data to the database either.
• This way of managing transactions has to be remembered when developing, as it differs from the standard way of doing transactions in an RDBMS.
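
The mark-then-finish behaviour described above can be mimicked with a small stand-in class. This is a sketch under stated assumptions, not Neo4j code: ToyTransaction and ToyTransactionDemo are hypothetical names, and a list of strings plays the role of the database.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in that mimics the success/finish pattern.
// It is not Neo4j code and does not use the Neo4j library.
class ToyTransaction {
    private final List<String> committed;                 // plays the role of the database
    private final List<String> pending = new ArrayList<>();
    private boolean markedSuccess = false;

    ToyTransaction(List<String> committed) {
        this.committed = committed;
    }

    void write(String value) {
        pending.add(value);
    }

    // Only marks the transaction; nothing is committed yet.
    void success() {
        markedSuccess = true;
    }

    // Commits the pending writes only if success() was called; otherwise discards them.
    void finish() {
        if (markedSuccess) {
            committed.addAll(pending);
        }
        pending.clear();
    }
}

class ToyTransactionDemo {
    public static void main(String[] args) {
        List<String> database = new ArrayList<>();

        ToyTransaction first = new ToyTransaction(database);
        try {
            first.write("NoSQL Distilled");
            first.success();
        } finally {
            first.finish();   // marked successful, so the write is committed
        }

        ToyTransaction second = new ToyTransaction(database);
        try {
            second.write("never stored");   // success() is not called
        } finally {
            second.finish();  // not marked, so the write is rolled back
        }

        System.out.println(database);   // prints [NoSQL Distilled]
    }
}
```

The try/finally shape matters: finish always runs, and whether it commits depends only on whether success was reached before an exception occurred.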

Availability
• Neo4j, as of version 1.8, achieves high availability by providing for replicated slaves.
• These slaves can also handle writes: when they are written to, they synchronize the write to the current master, and the write is committed first at the master and then at the slave. Other slaves will eventually get the update.
• Other graph databases, such as InfiniteGraph and FlockDB, provide for distributed storage of the nodes.
• Neo4j uses Apache ZooKeeper [ZooKeeper] to keep track of the last transaction IDs persisted on each slave node and the current master node. When a server starts up, it communicates with ZooKeeper and finds out which server is the master. If the server is the first one to join the cluster, it becomes the master; when a master goes down, the cluster elects a master from the available nodes, thus providing high availability.
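
The join and election behaviour described above can be sketched with a toy coordinator. This is a hedged illustration, not ZooKeeper or Neo4j code: the class ToyCluster is hypothetical, and choosing the surviving server with the highest persisted transaction ID is an assumption of this sketch, since the text only says that a master is elected from the available nodes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy coordinator standing in for the bookkeeping the text
// attributes to ZooKeeper. Not real ZooKeeper or Neo4j code.
class ToyCluster {
    // server name -> last transaction ID persisted on that server
    private final Map<String, Long> lastTransactionId = new LinkedHashMap<>();
    private String master;

    // The first server to join the cluster becomes the master.
    void join(String server, long lastPersistedTransactionId) {
        lastTransactionId.put(server, lastPersistedTransactionId);
        if (master == null) {
            master = server;
        }
    }

    // When the master goes down, a new master is elected from the remaining servers.
    // Assumption of this sketch: the server with the highest transaction ID wins.
    void fail(String server) {
        lastTransactionId.remove(server);
        if (server.equals(master)) {
            master = null;
            long best = Long.MIN_VALUE;
            for (Map.Entry<String, Long> entry : lastTransactionId.entrySet()) {
                if (entry.getValue() > best) {
                    best = entry.getValue();
                    master = entry.getKey();
                }
            }
        }
    }

    // Returns the current master, or null if no server is available.
    String master() {
        return master;
    }
}
```

With servers joining as s1, s2 (transaction ID 8), and s3 (transaction ID 9), s1 starts as master; if s1 fails, this sketch promotes s3 because it has persisted the most recent transaction.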

Query Features
• Graph databases are supported by query languages such as Gremlin [Gremlin].
• Gremlin is a domain-specific language for traversing graphs; it can traverse all graph databases that implement the Blueprints [Blueprints] property graph. Neo4j also has the Cypher [Cypher] query language for querying the graph.
• Outside these query languages, Neo4j allows you to query the graph for properties of the nodes, traverse the graph, or navigate the relationships of the nodes using language bindings.
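
What "traversing the graph" means can be sketched independently of any query language as a breadth-first walk over outgoing relationships. This is a minimal illustration, not Gremlin, Cypher, or the Neo4j traversal API: the class TraversalSketch, the reachable method, and the sample names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical traversal helper, for illustration only.
class TraversalSketch {
    // Returns the nodes reachable from start by following at most maxDepth
    // outgoing edges, in breadth-first order, excluding start itself.
    static List<String> reachable(Map<String, List<String>> edges, String start, int maxDepth) {
        Set<String> visited = new LinkedHashSet<>();
        visited.add(start);
        List<String> frontier = new ArrayList<>();
        frontier.add(start);
        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            List<String> next = new ArrayList<>();
            for (String node : frontier) {
                for (String neighbor : edges.getOrDefault(node, Collections.emptyList())) {
                    if (visited.add(neighbor)) {   // true only the first time a node is seen
                        next.add(neighbor);
                    }
                }
            }
            frontier = next;
        }
        List<String> result = new ArrayList<>(visited);
        result.remove(start);
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> friends = new HashMap<>();
        friends.put("Anna", Arrays.asList("Barbara", "Carol"));
        friends.put("Barbara", Arrays.asList("Dawn"));
        friends.put("Dawn", Arrays.asList("Elizabeth"));

        // Friends and friends-of-friends of Anna.
        System.out.println(reachable(friends, "Anna", 2));   // prints [Barbara, Carol, Dawn]
    }
}
```

A graph query language expresses the same idea declaratively; the depth limit here corresponds to asking for relationships "up to two hops away".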

Scaling
• In NoSQL databases, one of the commonly used scaling techniques is sharding, where data is split and distributed across different servers.
• With graph databases, sharding is difficult, as graph databases are not aggregate-oriented but relationship-oriented. Since any given node can be related to any other node, storing related nodes on the same server is better for graph traversal.
• Traversing a graph when the nodes are on different machines is not good for performance.
Scaling
• Generally speaking, there are three ways to scale graph databases.
• Since machines can now come with lots of RAM, we can add enough RAM to the server so that the working set of nodes and relationships is held entirely in memory.
• We can improve the read scaling of the database by adding more slaves with read-only access to the data, with all the writes going to the master. This pattern of writing once and reading from many servers is a proven technique in MySQL clusters and is really useful when the dataset is too large to fit in a single machine’s RAM but small enough to be replicated across multiple machines.

Scaling
• Slaves can also contribute to availability and read scaling, as they can be configured to never become a master, remaining always read-only.
• When the dataset size makes replication impractical, we can shard the data from the application side using domain-specific knowledge. For example, nodes that relate to North America can be created on one server while the nodes that relate to Asia are created on another. With this application-level sharding, the application needs to understand that nodes are stored on physically different databases.
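
Application-level sharding by region can be sketched as a small routing table that the application consults before touching a database. This is an illustration under stated assumptions: the class RegionShardRouter and the server names used with it are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical router, for illustration only. Server names are made up.
class RegionShardRouter {
    private final Map<String, String> serverByRegion = new HashMap<>();

    void assign(String region, String server) {
        serverByRegion.put(region, server);
    }

    // The application, not the database, decides where a node lives.
    String serverFor(String region) {
        String server = serverByRegion.get(region);
        if (server == null) {
            throw new IllegalArgumentException("no shard for region: " + region);
        }
        return server;
    }

    // True when two regions live on different servers, in which case a
    // relationship between their nodes cannot be followed inside one database.
    boolean crossesShards(String regionA, String regionB) {
        return !serverFor(regionA).equals(serverFor(regionB));
    }
}
```

For example, after assign("North America", "graph-na") and assign("Asia", "graph-asia"), a call to crossesShards("North America", "Asia") returns true, which signals that the application itself has to combine results from both databases.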

Working of Graph Database
• Unlike in other database management systems (DBMS), relationships take first priority in graph databases.
• In the graph world, connected data is as important as (or more important than) individual data points.
• This connections-first approach to data means relationships and connections are persisted (and not just temporarily calculated) through every part of the data lifecycle: from idea to design in a logical model, to implementation in a physical model, to operation using a query language, and to persistence within a scalable, reliable database system.

Working of Graph Database
• Unlike other database systems, this approach means your application doesn’t have to infer data connections using things like foreign keys or out-of-band processing, like MapReduce.
• The result: your data models are simpler and yet more expressive than the ones you would produce with relational databases.

What Makes Graph Databases Unique?
Graph Storage
• Some graph databases use native graph storage that is specifically designed to store and manage graphs, from bare metal on up.
• Other graph technologies use relational, columnar, or object-oriented databases as their storage layer.
• Non-native storage is often slower than a native approach because all of the graph connections have to be translated into a different data model.

What Makes Graph Databases Unique?
Graph Processing
• Native graph processing (a.k.a. index-free adjacency) is the most efficient means of processing data in a graph because connected nodes physically point to each other in the database.
• Non-native graph processing engines use other means to process Create, Read, Update, or Delete (CRUD) operations that aren’t optimized for handling connected data.
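
The difference between the two approaches can be sketched in a few lines of Java. This is a simplified illustration, not how any particular engine is implemented: AdjacencyNode, EdgeRow, and AdjacencyComparison are hypothetical names, and the table scan stands in for whatever lookup a non-native engine would perform (a real system would typically use an index rather than a full scan).

```java
import java.util.ArrayList;
import java.util.List;

// Native style: each node holds direct references to its neighbors.
class AdjacencyNode {
    final String name;
    final List<AdjacencyNode> neighbors = new ArrayList<>();

    AdjacencyNode(String name) {
        this.name = name;
    }
}

// Non-native style: connections are rows in a separate table.
class EdgeRow {
    final String from;
    final String to;

    EdgeRow(String from, String to) {
        this.from = from;
        this.to = to;
    }
}

// Hypothetical comparison helper, for illustration only.
class AdjacencyComparison {
    // Index-free adjacency: follow the references the node already holds.
    // The cost depends on the number of neighbors, not on the size of the graph.
    static List<String> neighborsNative(AdjacencyNode node) {
        List<String> names = new ArrayList<>();
        for (AdjacencyNode neighbor : node.neighbors) {
            names.add(neighbor.name);
        }
        return names;
    }

    // Stand-in for a non-native engine: search the edge table for matching rows.
    // This sketch scans every row, so the cost grows with the total number of edges.
    static List<String> neighborsFromTable(List<EdgeRow> edgeTable, String name) {
        List<String> names = new ArrayList<>();
        for (EdgeRow row : edgeTable) {
            if (row.from.equals(name)) {
                names.add(row.to);
            }
        }
        return names;
    }
}
```

Both methods return the same neighbors; the point of the sketch is where the work happens. The native version never looks at unrelated data, while the table version has to search a structure that contains every connection in the graph.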

Conclusion
• “Use the right database for the job” is a guiding principle of the graph community, because every graph-based database is specialized for certain use cases.
• Since no single evaluation answers the question “Which database is the right tool for the job?”, the features of all the different graph stores should be considered when designing a real-life database system in a realistic environment.

Summary
• This module describes the different features of graph-based databases.
• Consistency, transactions, scaling, and availability are the features that differentiate graph-based databases from relational databases.
• However, all the features should be used in an optimized way to ensure a good design of a graph-based database.

References
• NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence, by Pramod J. Sadalage & Martin Fowler, Pearson Education, Inc., pages 104-113.
• NoSQL for Dummies, by Adam Fowler, John Wiley & Sons, Inc., pages 260-269.

Self Assessment Questions
1. ______ is the most efficient means of processing data in a graph.
a. Native graph processing
b. Riak
c. Neo4j
Answer: a

2. ______ engines use other means to process Create, Read, Update, or Delete (CRUD) operations that aren’t optimized for handling connected data.
a. Native graph processing
b. Non-native graph processing
Answer: b

3. ______ offers full consistency in a graph-based database.
a. HBase
b. Riak
c. None of the above
Answer: c

Assignment
Describe in detail the features of graph-based databases.

Document Links
Topic: Graph Database for Beginners. Why the Graph Technology is the Future?
URL: https://neo4j.com/blog/why-graph-databases-are-the-future/
Notes: This document briefly describes what makes the graph database unique.

Video Links
No videos on the features of graph-based databases

E-Book Links
E-Book Name: NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence, by Pramod J. Sadalage & Martin Fowler, Pearson Education, Inc.
URL Link: https://bigdata-ir.com/wp-content/uploads/2017/04/NoSQL-Distilled.pdf
Page: 104-113

E-Book Name: NoSQL for Dummies, by Adam Fowler, John Wiley & Sons, Inc.
URL Link: https://www.academia.edu/21554991/No_SQL_For_Dummies1
Page: 260-269
