Modern Databases
AND THEIR CHALLENGES (SQL, NoSQL, NewSQL)
Outline
The objective of databases
The Relational Model and ACID rules
The Distributed Systems and CAP Theorem
The NoSQL Databases
The NewSQL Databases
Why we need databases
Availability : data is made available to a wide variety of users
Integrity : the data in the database is reliable
Security : only authorized users can access the data
Independence : users work with an “abstract view” of the data,
independent of how the data is stored in the database.
The Relational model
The model behind traditional databases such as SQL Server,
Oracle, and MySQL
The idea of the relational model dates from 1970
A very stable architecture
The most popular model for storing data in web and
business applications to date
Has a fixed structure for the database schema
Relational databases depend on transactions and their
ACID rules
ACID Rules
Atomic : A transaction is a logical unit of work which must be
either completed with all of its data modifications, or none of
them is performed.
Consistent : At the end of the transaction, all data must be left in
a consistent state.
Isolated : Modifications of data performed by a transaction must
be independent of any other transaction. Otherwise, the
outcome of a transaction may be erroneous.
Durable : When the transaction is completed, the effects of the
modifications performed by the transaction must be permanent
in the system.
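The Atomic rule can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the account table, the names alice and bob, and the transfer helper are hypothetical and not part of the original slides. The CHECK constraint makes the second update fail, and the whole transaction is rolled back, including the first update.

```python
import sqlite3

# In-memory database with two hypothetical accounts (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one atomic unit of work."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; nothing was applied.
        return False


def balances(conn):
    """Return the current balances as a dict of name -> balance."""
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))
```

A transfer larger than the source balance leaves both rows unchanged: either all modifications are performed or none of them.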
The problem with Relational Model
Some data types are not suitable for traditional
databases (graph, unstructured)
Scalability problem and its cost
Agility challenges that face modern applications
(schema-free data)
Distributed Systems
A distributed system consists of multiple computers and
software components that communicate through a
computer network
A distributed system can consist of any number of possible
configurations, such as mainframes, workstations, personal
computers, and so on.
The computers interact with each other and share the
resources of the system to achieve a common goal.
CAP Theorem
CAP theorem states that there are three basic requirements which exist in a special relation
when designing applications for a distributed architecture:
Consistency - This means that the data in the database remains consistent after the execution of
an operation. For example, after an update operation all clients see the same data.
Availability - This means that the system is always on (every request receives a response), with no
downtime.
Partition Tolerance - This means that the system continues to function even if the communication
among the servers is unreliable (nodes are up, but can't communicate)
Ex: if the network stops delivering messages between two sets of servers, will the system
continue to work correctly?
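The trade-off between the three requirements can be sketched with a toy two-node store. This is a simplified model under assumptions, not a real database: the TwoNodeStore class and its "CP" and "AP" modes are hypothetical names chosen for illustration. During a partition, the "CP" mode refuses writes to stay consistent, while the "AP" mode accepts them and lets the replicas diverge.

```python
class Node:
    """One replica holding a single value."""

    def __init__(self):
        self.value = None


class TwoNodeStore:
    """Toy two-replica register (hypothetical, for illustration only).

    mode="CP": refuse writes during a partition (consistent, not available).
    mode="AP": accept writes on the reachable node (available, may diverge).
    """

    def __init__(self, mode):
        self.mode = mode
        self.nodes = [Node(), Node()]
        self.partitioned = False

    def write(self, node_id, value):
        if not self.partitioned:
            # Healthy network: the write reaches every replica.
            for node in self.nodes:
                node.value = value
            return True
        if self.mode == "CP":
            return False  # request rejected, replicas stay identical
        self.nodes[node_id].value = value  # only the contacted replica changes
        return True

    def read(self, node_id):
        return self.nodes[node_id].value
```

With the network partitioned, the "CP" store answers a write with a failure, and the "AP" store answers successfully but the two nodes return different values.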
CAP Theorem
In theory, it is impossible to fulfill all 3
requirements at the same time
Relational Databases achieve
◦ Availability
Every request receives a (non-error) response
◦ Consistency
All clients always have the same view of the data
◦ No Partition Tolerance
What is NoSQL
Non-relational database management systems
Designed for distributed data stores that need to
store data at very large scale
Scale horizontally
Schema-free
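Horizontal scaling is often done by spreading keys across several servers (shards). The sketch below assumes simple hash-based sharding; the shard_for function and ShardedStore class are hypothetical names, and real systems typically use more elaborate schemes such as consistent hashing. Each shard is modeled as a plain dict, and values need no fixed schema.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy store that spreads keys over several in-memory shards."""

    def __init__(self, num_shards):
        # Each dict stands in for one server.
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)
```

Adding capacity means adding shards rather than buying a bigger machine, although this naive modulo scheme would move most keys whenever the shard count changes.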
Why NoSQL
The huge volume of content generated continuously by
humans, devices, machines, etc.
Certain types of data (graph, unstructured) that do not
fit the relational model
Agility challenges that face modern
applications
History of NoSQL
The term NoSQL was coined by Carlo Strozzi in 1998. He
used this term to name his database, which did not have an SQL
interface.
In early 2009, at an event on open-source distributed
databases, the term was reused to refer to databases that are
non-relational, distributed, and do not conform to ACID.
In the same year, NoSQL was discussed at the "no:sql(east)"
conference held in Atlanta, USA.
After that, discussion and practice of NoSQL gained momentum,
and NoSQL saw unprecedented growth.
NoSQL Consistency
According to CAP Theorem it is impossible to fulfill all 3
requirements
NoSQL databases achieve
◦ No Consistency
some clients may have different views of the data
◦ Availability
every request receives a response
◦ Partition Tolerance
system continues to function even if the communication among the
servers is unreliable
NoSQL Consistency
In 2007, Amazon discovered that every 100ms of latency on the Amazon website cost 1% in
sales. At the time their annual sales were around $14.7 billion, and 1% of $14.7 billion (about
$147 million) is a lot of sales to lose.
They outlined an approach for a new kind of database, one that guaranteed Availability and
Partition tolerance at the expense of Consistency.
It relies on Eventual Consistency, where data becomes consistent in the end (after some
time).
For a bank, where transactions have to be consistent, that just wouldn’t work. For companies
like Google, it’s acceptable.
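Eventual consistency can be sketched as replicas that accept writes independently and later exchange their data. The example below is a simplification under assumptions: it resolves conflicts with a "highest version wins" rule, which is not necessarily what Amazon's design uses, and the LwwReplica class and anti_entropy function are hypothetical names.

```python
class LwwReplica:
    """Replica that keeps, per key, the value with the highest version."""

    def __init__(self):
        self.data = {}  # key -> (version, value)

    def write(self, key, value, version):
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def anti_entropy(a, b):
    """Exchange entries so both replicas end up with the newest versions."""
    for key, (version, value) in list(a.data.items()):
        b.write(key, value, version)
    for key, (version, value) in list(b.data.items()):
        a.write(key, value, version)
```

Before the exchange, two clients reading from different replicas may see different values; after it, both replicas agree.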
Types of NoSQL Databases
Key-Value Store
It has a big hash table of keys and values {Riak,
Amazon S3, Dynamo}
Column-based Store
Each storage block contains data from only one
column {HBase, Cassandra}
Document-based Store
It stores documents made up of tagged elements.
{CouchDB, MongoDB}
Graph-based
A network database that uses edges and nodes to
represent and store data. {Neo4J}
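The four types can be compared by how the same kind of data might be laid out in each. The structures below are plain Python stand-ins, not the storage formats of any product listed above; the sample students, ages, and the FRIEND_OF relationship are made-up illustrative data.

```python
# Key-value: an opaque value looked up by its key.
key_value = {"student:1": '{"FirstName": "Mohamed"}'}

# Column-based: the values of one column are stored together.
columns = {"FirstName": ["Mohamed", "Sara"], "Age": [21, 22]}

# Document-based: self-describing documents; fields may differ per document.
documents = [
    {"_id": 1, "FirstName": "Mohamed", "Courses": ["DB", "OS"]},
    {"_id": 2, "FirstName": "Sara"},
]

# Graph-based: nodes plus labeled edges between them.
nodes = {1: {"FirstName": "Mohamed"}, 2: {"FirstName": "Sara"}}
edges = [(1, "FRIEND_OF", 2)]


def column_average(columns, name):
    """Aggregate one column without touching the others."""
    return sum(columns[name]) / len(columns[name])


def neighbours(edges, node_id):
    """Follow outgoing edges from a node."""
    return [dst for src, _label, dst in edges if src == node_id]
```

Each layout favors a different access pattern: lookups by key, scans over one column, retrieval of whole documents, or traversal of relationships.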
Advantages and Disadvantages
Advantages                                 Disadvantages
High scalability                           No standardization
Distributed computing                      Limited query capabilities (so far)
Lower cost                                 Eventual consistency
Schema flexibility, semi-structured data   Less support and tools compared to relational databases
No complicated relationships
Very easy software development
Example of NoSQL (MongoDB, C#)
[Figure: three numbered screenshots, images not preserved]
Example of NoSQL (MongoDB)
Mongo Query
• db.Student.find({"FirstName": "Mohamed"})
SQL Query
• SELECT * FROM Student WHERE FirstName = 'Mohamed'
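The two queries above express the same filter. The sketch below runs the SQL form against an in-memory SQLite table and mimics the document form with a small helper. The find function here is a hypothetical stand-in written for this example (it only handles simple field equality and is not the MongoDB driver API), and the sample rows are made up.

```python
import sqlite3

# Illustrative sample data; the same records feed both styles of query.
students = [
    {"FirstName": "Mohamed", "LastName": "Ali"},
    {"FirstName": "Sara", "LastName": "Hassan"},
]


def find(collection, query):
    """Return documents whose fields equal every key/value pair in `query`."""
    return [
        doc
        for doc in collection
        if all(doc.get(field) == value for field, value in query.items())
    ]


# Relational side: fixed schema, declarative SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (FirstName TEXT, LastName TEXT)")
conn.executemany("INSERT INTO Student VALUES (:FirstName, :LastName)", students)
sql_rows = conn.execute(
    "SELECT * FROM Student WHERE FirstName = ?", ("Mohamed",)
).fetchall()

# Document side: the query is itself a document describing the match.
doc_rows = find(students, {"FirstName": "Mohamed"})
```

Both approaches return the single matching student; the relational result is a tuple of column values, while the document result is the whole document.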
The NewSQL
NewSQL is a class of modern relational
database management systems that
seek to provide
the same scalable performance as
NoSQL systems for online
transaction processing (OLTP)
read-write workloads while still
maintaining the ACID guarantees of
a traditional database system.
How?
◦ Minimize locking
◦ Avoid table-level or row-level locking
◦ Depend heavily on memory
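One common way to reduce locking is multi-version concurrency control (MVCC): writers append new versions instead of overwriting, and each reader sees the versions that existed when it started. The sketch below is a single-threaded, in-memory toy under assumptions; the MVCCStore class is a hypothetical name and does not represent how any particular NewSQL product implements this.

```python
class MVCCStore:
    """Toy multi-version store: readers use snapshots instead of locks."""

    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), oldest first
        self.clock = 0      # logical commit counter

    def begin(self):
        """Return a snapshot timestamp for a new reader."""
        return self.clock

    def write(self, key, value):
        """Append a new version; older versions stay readable."""
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def read(self, key, snapshot_ts):
        """Return the newest value committed at or before the snapshot."""
        for commit_ts, value in reversed(self.versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None
```

A reader that took its snapshot before a later write keeps seeing the old value, so it never has to wait for the writer.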
NuoDB Architecture
Transaction Engines (TEs): The transaction processing layer is
made up of in-memory process nodes that coordinate with
each other and the storage management layer. Nodes can be
easily added or removed to align with transaction volume.
Storage Managers (SMs): The storage management layer consists
of process nodes that have both in-memory and on-disk storage
components. SMs provide on-disk data durability guarantees, and
multiple SMs can be used to increase data redundancy.
[Figure labels: Multi-version concurrency, Management Tier]
How to choose
It depends on
◦ The degree of consistency and integrity you need
◦ The scalability you need
◦ The complexity of the data model
◦ The query complexity
◦ Your team's knowledge
How about hybrid?
[Figure: diagram labeled SQL, NoSQL, Integration, Application]


Editor's Notes

  • #16 Every type has different properties. Ex: MongoDB and Cassandra have different architectures.