Relational and Non-
Relational Database
Dr. Zainab Namh
Lect. 9
Relational vs
Non –
Relational
Database
Introduction to NoSQL?
2
NoSQL
• NoSQL is a non-relational DMS, that does not require a
fixed schema, avoids joins, and is easy to scale.
• NoSQL database is used for distributed data stores with
humongous data storage needs.
• NoSQL is used for Big data and real-time web apps. For
example, companies like Twitter, Facebook, Google that
collect terabytes of user data every single day.
• NoSQL database stands for "Not Only SQL" or "Not SQL“
Though a better term would NoREL.
• Carl Strozz introduced the NoSQL concept in 1998.
3
What is Big Data?
• A term for data sets that are so large that traditional
methods storage and processing are inadequate.
• Massive increase in data volume within the last decade or
so
• Social networks, search engines, etc.
• Challenges in storage, capture, analysis, transfer, etc.
4
Advantages and Limitations of
RDBMS
• Better for relational data
• Normalization
• Well known standard language (SQL)
• ACID Compliance
However
• No flexibility in choosing the right DB for the right
application
• Expensive if runs across a cluster
• Integrity is more important than availability and low
latency 5
Why NoSQL?
• The concept of NoSQL databases became popular with
Internet giants like Google, Facebook, Amazon, etc. who
deal with huge volumes of data. The system response time
becomes slow when you use RDBMS for massive volumes
of data.
• To resolve this problem, we could "scale up" our systems
by upgrading our existing hardware. This process is
expensive.
• The alternative for this issue is to distribute database
load on multiple hosts whenever the load increases. This
method is known as "scaling out."
6
Horizontal vs Vertical scaling
7
Features of NoSQL
• Non-relational
• NoSQL databases never follow the relational model
• Never provide tables with flat fixed-column records
• Work with self-contained aggregates
• Doesn't require data normalization
• No complex features like query languages, referential integrity
joins and ACID
• Schema-free
• NoSQL databases are either schema-free or have relaxed
schemas
• Do not require any sort of definition of the schema of the data
• Offers heterogeneous structures of data in the same domain
8
NoSQL – Data Structure
9
Features of NoSQL
• Distributed
• Multiple NoSQL databases can be executed in a distributed
fashion
• Offers auto-scaling and fail-over capabilities
• Often ACID concept can be sacrificed for scalability and
throughput
10
What is the CAP Theorem?
• CAP theorem is also called brewer's theorem. It states
that is impossible for a distributed data store to offer more
than two out of three guarantees:
• Consistency
• Availability
• Partition Tolerance
• As soon as you partition a DB
the data might either:
• Become inconsistent –
one replication holds different data from another
• Become unavailable – to keep consistency, we must propagate
updates, which takes time
11
What is the CAP Theorem?
• Consistency:
• The data should remain consistent even after the execution of an
operation.
• This means once data is written, any future read request should
contain that data.
• For example, after updating the order status, all the clients
should be able to see the same data.
12
What is the CAP Theorem?
• Availability:
• The database should always be available and responsive. It
should not have any downtime.
• Partition Tolerance:
• Partition Tolerance means that the system should continue to
function even if the communication among the servers is not
stable.
• For example, the servers can be partitioned into multiple groups
which may not communicate with each other.
• Here, if part of the database is unavailable, other parts are
always unaffected.
13
Eventual Consistency - BASE
• BASE: Basically Available, Soft state, Eventual
consistency
• Basically available means DB is available all the time as
per CAP theorem
• Soft state means even without an input; the system state
may change
• Eventual consistency means that the system will become
consistent over time
14
Eventual Consistency Example
• If you pay some money into your bank account, you don’t
want some cash machines to know its there and others not
to
• But if you post something on Facebook, you might put up
with the fact that a friend in the UK can see it for a
moment while an American friend, looking at a slightly
older replication cannot. He’ll see it eventually
15
ACID vs BASE
16
NoSQL database types
• There are mainly four categories of NoSQL databases.
Each of these categories has its unique attributes and
limitations:
1. Key-value Pair Based
2. Column-oriented
3. Graphs based
4. Document-oriented
• No specific database is better to solve all problems. You
should select a database based on your product needs.
17
Key Value Store NoSQL Database
• Data is stored in key/value pairs. It is designed in such a
way to handle lots of data and heavy load.
• Key-value pair storage databases store data as a hash
table where each key is unique, and the value can be a
JSON (JavaScript Object Notation), BLOB(Binary Large
Objects), string, etc.
• For example, a key-value pair may contain a key like
"Website" associated with a value like "Guru99".
• When we try and reflect back on the CAP theorem, it
becomes quite clear that key value stores are great around
the Availability and Partition aspects but definitely lack
in Consistency.
18
Example
19
Column-based NoSQL Database
• Column stores in NoSQL are similar at first appearance to
traditional relational DBMS. The concepts of rows and
columns are still there. You also define column families
before loading data into the database, meaning that the
structure of data must be known in advance.
• However, column stores organize data differently than
relational databases do. Instead of storing data in a row
for fast access, data is organized for fast column
operations.
• They deliver high performance on aggregation queries like
SUM, COUNT, AVG, MIN etc. as the data is readily
available in a column.
20
Column-based NoSQL Database
21
Document Store NoSQL Database
• Document-Oriented NoSQL DB stores and retrieves data
as a key value pair but the value part is stored as a
document. The document is stored in JSON or XML
formats.
22
NoSQL - Example
23
NoSQL - Example
24
Graph-Based NoSQL Database
• A graph type database stores entities as well the relations
amongst those entities. The entity is stored as a node with
the relationship as edges. An edge gives a relationship
between nodes. Every node and edge has a unique
identifier.
25
Disadvantages of NoSQL
• No standardization rules
• Limited query capabilities
• RDBMS databases and tools are comparatively mature
• It does not offer any traditional database capabilities, like
consistency when multiple transactions are performed
simultaneously.
• When the volume of data increases it is difficult to
maintain unique values as keys become difficult
• Doesn't work as well with relational data
26
NoSQL Example – Review DB
Product: iPhone 5
Price: £500
Camera: Fine
Screen: Very good
Accessories: Headphone, Case, ....
Product: iPhone 5
Price: £500
Camera: Excellent
Screen: Poor in sunshine
Operating system: Easy to use
27
Example Review in JSON
{
Id:847543,
Name:iPhone5,
Features:[GPS,Retina Display,Siri],
Reviews:[
{Reviewer:458743,Date:12.4.1013,Speed:Slow},
{Reviewer:636534,Date:2.5.1013,Camera:Great},
]
}
28
Possible Review DB Design
• For our product review database, potential units to
aggregate are:
– Users – keep all a person’s reviews together
– Products – keep all the reviews of a product together
29
Review Queries?
• What are Common Queries?
• Which design you choose depends on a few things, for
example, what is the most common query?
– List all reviews from a given customer
– List all reviews of a given product
– Find all products that are ‘fast’
30
Transaction
Concurrency
ACID
Recovery
31
ACID- Atomic, Consistent, Isolated,
Durable
• Atomicity: A transaction is an atomic unit of processing; it
is either performed in its entirety or not performed at all.
• Consistency preservation: A transaction is consistency
preserving if its complete execution take(s) the database
from one consistent state to another.
• Isolation: A transaction should appear as though it is
being executed in isolation from other transactions. That
is, the execution of a transaction should not be interfered
with by any other transactions executing concurrently.
• Durability or permanency: The changes applied to the
database by a committed transaction must persist in the
database. These changes must not be lost because of any
failure. 32
What is a Transaction?
• A transaction is a logical unit of database processing (or a
logical unit of work on the database) that includes one or
more database access operations:
– insertion,
– deletion,
– modification, or
– retrieval operations.
33
What is a Transaction?
• If the database operations in a transaction do not update
the database but only retrieve data, the transaction is
called a read-only transaction.
• Read-only transaction - do not changes the state of a
database, only retrieves data.
• A database system interprets a transaction not as an
application program but a logical sequence of read and
write operations.
34
Concurrent and Interleaving
• Multiprogramming operating systems execute some
commands from one process, then suspend this process
and execute some commands from the next process, and so
on. This type of process execution is called interleaving.
• After a while, the execution of the first process is resumed
at the point where it was suspended.
• Interleaving keeps the CPU busy when a process requires
an input or output operation. The CPU is switched to
execute another process rather than remaining idle during
I/O time.
• Interleaving also prevents a long process from delaying
other processes.
35
Why Interleave?
36
Why concurrency control is needed?
37
Why Recovery Is Needed?
• Whenever a transaction is submitted to a DBMS for
execution, the system is responsible for making sure that
either
(1) all the operations in the transaction are completed
successfully and their effect is recorded permanently in the
database, or
(2) the transaction has no effect whatsoever on the
database or on any other transactions.
• For the purpose of recovery and concurrency control, we
are mainly interested in the read_item and write_item
operations of the transactions, as well as the commit and
abort operations.
38
Successful Schedule
39
References
• https://www.guru99.com/nosql-tutorial.html
• Introduction to NoSQL – Kevin Swingler – 2017.
• Advanced Database Lectures – Prof. Musbah Aqel – 2011.
40

Relational and non relational database 7

  • 1.
    Relational and Non- RelationalDatabase Dr. Zainab Namh Lect. 9 Relational vs Non – Relational Database
  • 2.
  • 3.
    NoSQL • NoSQL isa non-relational DMS, that does not require a fixed schema, avoids joins, and is easy to scale. • NoSQL database is used for distributed data stores with humongous data storage needs. • NoSQL is used for Big data and real-time web apps. For example, companies like Twitter, Facebook, Google that collect terabytes of user data every single day. • NoSQL database stands for "Not Only SQL" or "Not SQL“ Though a better term would NoREL. • Carl Strozz introduced the NoSQL concept in 1998. 3
  • 4.
    What is BigData? • A term for data sets that are so large that traditional methods storage and processing are inadequate. • Massive increase in data volume within the last decade or so • Social networks, search engines, etc. • Challenges in storage, capture, analysis, transfer, etc. 4
  • 5.
    Advantages and Limitationsof RDBMS • Better for relational data • Normalization • Well known standard language (SQL) • ACID Compliance However • No flexibility in choosing the right DB for the right application • Expensive if runs across a cluster • Integrity is more important than availability and low latency 5
  • 6.
    Why NoSQL? • Theconcept of NoSQL databases became popular with Internet giants like Google, Facebook, Amazon, etc. who deal with huge volumes of data. The system response time becomes slow when you use RDBMS for massive volumes of data. • To resolve this problem, we could "scale up" our systems by upgrading our existing hardware. This process is expensive. • The alternative for this issue is to distribute database load on multiple hosts whenever the load increases. This method is known as "scaling out." 6
  • 7.
  • 8.
    Features of NoSQL •Non-relational • NoSQL databases never follow the relational model • Never provide tables with flat fixed-column records • Work with self-contained aggregates • Doesn't require data normalization • No complex features like query languages, referential integrity joins and ACID • Schema-free • NoSQL databases are either schema-free or have relaxed schemas • Do not require any sort of definition of the schema of the data • Offers heterogeneous structures of data in the same domain 8
  • 9.
    NoSQL – DataStructure 9
  • 10.
    Features of NoSQL •Distributed • Multiple NoSQL databases can be executed in a distributed fashion • Offers auto-scaling and fail-over capabilities • Often ACID concept can be sacrificed for scalability and throughput 10
  • 11.
    What is theCAP Theorem? • CAP theorem is also called brewer's theorem. It states that is impossible for a distributed data store to offer more than two out of three guarantees: • Consistency • Availability • Partition Tolerance • As soon as you partition a DB the data might either: • Become inconsistent – one replication holds different data from another • Become unavailable – to keep consistency, we must propagate updates, which takes time 11
  • 12.
    What is theCAP Theorem? • Consistency: • The data should remain consistent even after the execution of an operation. • This means once data is written, any future read request should contain that data. • For example, after updating the order status, all the clients should be able to see the same data. 12
  • 13.
    What is theCAP Theorem? • Availability: • The database should always be available and responsive. It should not have any downtime. • Partition Tolerance: • Partition Tolerance means that the system should continue to function even if the communication among the servers is not stable. • For example, the servers can be partitioned into multiple groups which may not communicate with each other. • Here, if part of the database is unavailable, other parts are always unaffected. 13
  • 14.
    Eventual Consistency -BASE • BASE: Basically Available, Soft state, Eventual consistency • Basically available means DB is available all the time as per CAP theorem • Soft state means even without an input; the system state may change • Eventual consistency means that the system will become consistent over time 14
  • 15.
    Eventual Consistency Example •If you pay some money into your bank account, you don’t want some cash machines to know its there and others not to • But if you post something on Facebook, you might put up with the fact that a friend in the UK can see it for a moment while an American friend, looking at a slightly older replication cannot. He’ll see it eventually 15
  • 16.
  • 17.
    NoSQL database types •There are mainly four categories of NoSQL databases. Each of these categories has its unique attributes and limitations: 1. Key-value Pair Based 2. Column-oriented 3. Graphs based 4. Document-oriented • No specific database is better to solve all problems. You should select a database based on your product needs. 17
  • 18.
    Key Value StoreNoSQL Database • Data is stored in key/value pairs. It is designed in such a way to handle lots of data and heavy load. • Key-value pair storage databases store data as a hash table where each key is unique, and the value can be a JSON (JavaScript Object Notation), BLOB(Binary Large Objects), string, etc. • For example, a key-value pair may contain a key like "Website" associated with a value like "Guru99". • When we try and reflect back on the CAP theorem, it becomes quite clear that key value stores are great around the Availability and Partition aspects but definitely lack in Consistency. 18
  • 19.
  • 20.
    Column-based NoSQL Database •Column stores in NoSQL are similar at first appearance to traditional relational DBMS. The concepts of rows and columns are still there. You also define column families before loading data into the database, meaning that the structure of data must be known in advance. • However, column stores organize data differently than relational databases do. Instead of storing data in a row for fast access, data is organized for fast column operations. • They deliver high performance on aggregation queries like SUM, COUNT, AVG, MIN etc. as the data is readily available in a column. 20
  • 21.
  • 22.
    Document Store NoSQLDatabase • Document-Oriented NoSQL DB stores and retrieves data as a key value pair but the value part is stored as a document. The document is stored in JSON or XML formats. 22
  • 23.
  • 24.
  • 25.
    Graph-Based NoSQL Database •A graph type database stores entities as well the relations amongst those entities. The entity is stored as a node with the relationship as edges. An edge gives a relationship between nodes. Every node and edge has a unique identifier. 25
  • 26.
    Disadvantages of NoSQL •No standardization rules • Limited query capabilities • RDBMS databases and tools are comparatively mature • It does not offer any traditional database capabilities, like consistency when multiple transactions are performed simultaneously. • When the volume of data increases it is difficult to maintain unique values as keys become difficult • Doesn't work as well with relational data 26
  • 27.
    NoSQL Example –Review DB Product: iPhone 5 Price: £500 Camera: Fine Screen: Very good Accessories: Headphone, Case, .... Product: iPhone 5 Price: £500 Camera: Excellent Screen: Poor in sunshine Operating system: Easy to use 27
  • 28.
    Example Review inJSON { Id:847543, Name:iPhone5, Features:[GPS,Retina Display,Siri], Reviews:[ {Reviewer:458743,Date:12.4.1013,Speed:Slow}, {Reviewer:636534,Date:2.5.1013,Camera:Great}, ] } 28
  • 29.
    Possible Review DBDesign • For our product review database, potential units to aggregate are: – Users – keep all a person’s reviews together – Products – keep all the reviews of a product together 29
  • 30.
    Review Queries? • Whatare Common Queries? • Which design you choose depends on a few things, for example, what is the most common query? – List all reviews from a given customer – List all reviews of a given product – Find all products that are ‘fast’ 30
  • 31.
  • 32.
    ACID- Atomic, Consistent,Isolated, Durable • Atomicity: A transaction is an atomic unit of processing; it is either performed in its entirety or not performed at all. • Consistency preservation: A transaction is consistency preserving if its complete execution take(s) the database from one consistent state to another. • Isolation: A transaction should appear as though it is being executed in isolation from other transactions. That is, the execution of a transaction should not be interfered with by any other transactions executing concurrently. • Durability or permanency: The changes applied to the database by a committed transaction must persist in the database. These changes must not be lost because of any failure. 32
  • 33.
    What is aTransaction? • A transaction is a logical unit of database processing (or a logical unit of work on the database) that includes one or more database access operations: – insertion, – deletion, – modification, or – retrieval operations. 33
  • 34.
    What is aTransaction? • If the database operations in a transaction do not update the database but only retrieve data, the transaction is called a read-only transaction. • Read-only transaction - do not changes the state of a database, only retrieves data. • A database system interprets a transaction not as an application program but a logical sequence of read and write operations. 34
  • 35.
    Concurrent and Interleaving •Multiprogramming operating systems execute some commands from one process, then suspend this process and execute some commands from the next process, and so on. This type of process execution is called interleaving. • After a while, the execution of the first process is resumed at the point where it was suspended. • Interleaving keeps the CPU busy when a process requires an input or output operation. The CPU is switched to execute another process rather than remaining idle during I/O time. • Interleaving also prevents a long process from delaying other processes. 35
  • 36.
  • 37.
  • 38.
    Why Recovery IsNeeded? • Whenever a transaction is submitted to a DBMS for execution, the system is responsible for making sure that either (1) all the operations in the transaction are completed successfully and their effect is recorded permanently in the database, or (2) the transaction has no effect whatsoever on the database or on any other transactions. • For the purpose of recovery and concurrency control, we are mainly interested in the read_item and write_item operations of the transactions, as well as the commit and abort operations. 38
  • 39.
  • 40.
    References • https://www.guru99.com/nosql-tutorial.html • Introductionto NoSQL – Kevin Swingler – 2017. • Advanced Database Lectures – Prof. Musbah Aqel – 2011. 40