1. Lecture1_NOSQL_Introduction.pdf

Introduction To NOSQL
Databases
1
Lecture 1
Dr. Shaimaa Galal

Course Resources
• Text book: Part I (Chapters 1-7) - copyright 2015
Next Generation Databases
NoSQL, NewSQL, and Big Data
• Lab tool: Apache Cassandra on
https://www.datastax.com/
2

Additional Resources
• YouTube: NoSQL Database Tutorial – Full Course for
Beginners
https://www.coursera.org/learn/nosql-databases
• Coursera : NOSQL Systems
https://www.coursera.org/learn/nosql-databases
3

Background
• Relational databases → mainstay of business
• Web-based applications caused spikes
• explosion of social media sites (Facebook, Twitter) with large data
needs and various data types such as images, videos, and
documents.
• rise of cloud-based solutions such as Amazon S3 (simple storage
solution)
• Hooking RDBMS to web-based application becomes
trouble due to big data.
4

Current day trends
• Big Data
• Cloud Computing
• Solid State Disk
5
The solution is to scale up

Issues with scaling up
• Limits to scaling up (or vertical scaling: make a “single”
machine more powerful) → dataset is just too big!
6

Issues with scaling out
• Scaling out (or horizontal
scaling: adding more
smaller/cheaper servers) is a
better choice
• Different horizontal scaling
approaches (multi-node
database):
• Master/Slave.
• Sharding (partitioning)
7

Scaling out RDBMS: Master/Slave
• Master/Slave
• All writes are written to the master
• All reads performed against the replicated slave databases
• Critical reads may be incorrect as writes may not have been
propagated down
• Large datasets can pose problems as master needs to
duplicate data to slaves
8
Writes
Reads

Multi-Master replication
9
• INSERT only, not UPDATES/DELETES
• No JOINs, thereby reducing query time (this involves de-
normalizing data)

Scaling out RDBMS: Sharding
• Sharding (Partitioning) involves partitioning the data
across multiple databases based on a key-attribute.
• Scales well for both reads and writes
• Not transparent, application needs to be partition-aware
• Can no longer have relationships/joins across partitions
(loss of referential integrity across shards).
10

Scaling out RDBMS: Sharding
11

Other ways to scale out RDBMS
• In-memory databases
• Primarily relies on main memory for data storage.
• Faster than disk-optimized databases because disk
access is slower than memory access and the internal
optimization algorithms are simpler and execute fewer
CPU instructions.
• Accessing data in memory eliminates querying seek time.
12

Other ways to scale out RDBMS
• NewSQL
• Is a database that retain main key characteristics of
RDBMS but different from the common architecture
exhibited by Oracle and SQL.
• There are two designs:
1. H-Store: pure distributed InMemory database.
2. C-Store: Involves columnar database design
13

14
Three waves of database technology

What is NOSQL?
• The Name:
• Stands for Not Only SQL
• The term NOSQL was introduced by Carl Strozzi in 1998 to name
his file-based database
• It was again re-introduced by Eric Evans when an event was
organized to discuss open source distributed databases
• Eric states that “… but the whole point of seeking alternatives is
that you need to solve a problem that relational databases are a
bad fit for. …”
15

What is NOSQL?
• Key features
(advantages):
• Non-relational and doesn’t
require schema.
• Data are replicated to multiple
nodes (fault-tolerant):
• down nodes easily replaced
• no single point of failure
• Horizontal scalable
• cheap, easy to implement
• Massive write performance
• Fast key-value access
17

What is NOSQL?
• Disadvantages:
• Don’t fully support relational features
• no join, group by, order by operations (except within partitions)
• no referential integrity constraints across partitions
• No declarative query language (e.g., SQL) → more programming
• Relaxed ACID (see CAP theorem) → fewer guarantees
• No easy integration with other applications that support SQL
20

Discussion
Are NoSQL databases better than
relational databases?
21

CAP Theorem
23
All client always have the
same view of the data
Consistency
Partition
tolerance
Availability

CAP Theorem
24
Each client always can
read and write.
Consistency
Partition
tolerance
Availability

CAP Theorem
25
A system can continue to
operate in the presence of
a network partitions
Consistency
Partition
tolerance
Availability

CAP Theorem (Consistency)
26
Write X row Read X row Is it the last
value of X?
Client A Client B
Answer is: Maybe,
however eventual
consistency occurs.

CAP Theorem
• Brewer’s CAP Theorem:
• For any system sharing data, it is “impossible” to guarantee
simultaneously all of these three properties
• You can have at most two of these three properties for any
shared-data system
• Very large systems will “partition” at some point:
• That leaves either C or A to choose from (traditional DBMS
prefers C over A and P )
• In almost all cases, you would choose A over C (except in
specific applications such as order processing)
• Example: Apache Cassandra focus on AP, while MongoDB
and Kafka focus on CP.
27

CAP Theorem summary
• ACID
• A DBMS is expected to support “ACID transactions,” processes that
are:
• Atomicity: either the whole process is done or none is
• Consistency: only valid data are written
• Isolation: one operation at a time
• Durability: once committed, it stays that way
• CAP
• Consistency: all data on cluster has the same copies
• Availability: cluster always accepts reads and writes
• Partition tolerance: guaranteed properties are maintained even
when network failures prevent some machines from communicating
with others
28

NO-SQL categories
1. Key-value
• Example: DynamoDB, Voldermort, Scalaris
2. Document-based
• Example: MongoDB, CouchDB
3. Column-based
• Example: BigTable, Cassandra, Hbased
4. Graph-based
• Example: Neo4J, InfoGrid
• “No-schema” is a common characteristics of most NOSQL
storage systems
• Provide “flexible” data types
29

Next Lecture
Key-value Database
30

1. Lecture1_NOSQL_Introduction.pdf

More Related Content

Similar to 1. Lecture1_NOSQL_Introduction.pdf

More from ShaimaaMohamedGalal

Recently uploaded

1. Lecture1_NOSQL_Introduction.pdf