Types of Databases

Where to use what?

So many names and technologies (i.e. confusing):
Azure data warehouse
Blob
Redis
Cassandra
Druid
Graphite
MySQL
MemSQL
… plus tens of other options on the market

Break it down
1. How is data stored: row oriented, column oriented, sorted string, document, object store, in-memory key-value, time series
2. Partitioning - scale up and down
3. Replication - consistency
4. Atomicity - all or none
5. Isolation - consistent view of data throughout the transaction

2. Partitioning
It’s all about the key: every record is a key plus its data.

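2. Partitioning
Writing the data
The key is either specified as a partition key or generated by the system, and is then mapped to a partition using one of these strategies:
Range partition (needs manual intervention to choose and rebalance ranges)
Hash partition (scaling issues: with hash mod N, changing N remaps most keys)
Consistent hashing (avoids reshuffling data during scaling)
Round robin (even distribution)
Strategies are either stateless (the partition can be computed from the key) or stateful (the partition chosen for each key must be recorded).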
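The scaling difference between hash partitioning and consistent hashing shows up when you count how many keys move after adding a partition. The sketch below is a minimal illustration, not any particular database's implementation: it assumes MD5 as the hash function, 100 virtual nodes per partition, and hypothetical node names `node-0` … `node-3`.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


def mod_n_partition(key: str, n: int) -> int:
    """Plain hash partitioning: partition = hash(key) mod N."""
    return stable_hash(key) % n


class ConsistentHashRing:
    """Consistent hashing: nodes and keys are placed on the same hash ring."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, node) points
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each node owns several virtual points so load spreads evenly.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (stable_hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first point at or after the key's hash (wrapping).
        h = stable_hash(key)
        idx = bisect.bisect_left(self.ring, (h, ""))
        return self.ring[idx % len(self.ring)][1]


keys = [f"user:{i}" for i in range(10_000)]

# Hash partitioning: going from 3 to 4 partitions remaps most keys.
moved_mod = sum(mod_n_partition(k, 3) != mod_n_partition(k, 4) for k in keys)

# Consistent hashing: adding a 4th node moves only the keys it takes over.
ring = ConsistentHashRing(["node-0", "node-1", "node-2"])
before = {k: ring.node_for(k) for k in keys}
ring.add_node("node-3")
moved_ring = sum(before[k] != ring.node_for(k) for k in keys)

print(f"mod-N moved {moved_mod / len(keys):.0%} of keys")
print(f"consistent hashing moved {moved_ring / len(keys):.0%} of keys")
```

With mod-N, a key stays put only when its hash gives the same remainder for 3 and 4, so roughly three quarters of the keys move; on the ring, every moved key lands on the new node.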

2. Partitioning
Writing the data
Redirect logic routes each key and its data (key specified as the partition key, or generated by the system) to one of Partition 1, Partition 2, … Partition n.
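A minimal sketch of the redirect logic on the write path, assuming hash partitioning and in-memory dicts standing in for partitions. The `Router` class and its methods are hypothetical names for illustration; a system-generated key is modelled here as a random UUID.

```python
import hashlib
import uuid
from typing import Optional


class Router:
    """Hypothetical redirect logic sitting in front of N partitions."""

    def __init__(self, num_partitions: int):
        # Each partition is modelled as an in-memory dict: key -> data.
        self.partitions = [dict() for _ in range(num_partitions)]

    def _partition_for(self, key: str) -> int:
        # Stable hash of the key, reduced to a partition number.
        digest = hashlib.sha1(key.encode()).digest()
        return int.from_bytes(digest[:4], "big") % len(self.partitions)

    def write(self, data: dict, key: Optional[str] = None) -> str:
        # Use the caller's partition key, or let the system generate one.
        if key is None:
            key = str(uuid.uuid4())
        self.partitions[self._partition_for(key)][key] = data
        return key

    def read(self, key: str):
        # The same routing function locates the record again.
        return self.partitions[self._partition_for(key)].get(key)


router = Router(num_partitions=4)
k1 = router.write({"name": "Ada"}, key="customer:42")  # key specified
k2 = router.write({"name": "Bob"})                     # key generated by system
```

Because routing is a pure function of the key, the caller must keep the generated key to find the record later.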

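2. Partitioning
Reading it back
Key (?) Data: the key may or may not be known at read time.
If the partitioning key columns are specified, the read goes directly to the partition that owns the key.
If the partitioning key columns are NOT specified, every partition must be searched, each using its own local indexes.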
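The two read cases can be sketched as follows, assuming `customer_id` is the partitioning key column, `customer_id mod 3` is the partitioning scheme, and each partition keeps a local secondary index on a `city` column. All names and data here are hypothetical.

```python
from collections import defaultdict


class Partition:
    """One partition holding rows plus a local (per-partition) secondary index."""

    def __init__(self):
        self.rows = {}                      # primary key -> row
        self.city_index = defaultdict(set)  # local index: city -> primary keys

    def put(self, pk, row):
        self.rows[pk] = row
        self.city_index[row["city"]].add(pk)

    def find_by_city(self, city):
        # Uses only this partition's local index.
        return [self.rows[pk] for pk in sorted(self.city_index.get(city, ()))]


NUM_PARTITIONS = 3
partitions = [Partition() for _ in range(NUM_PARTITIONS)]


def partition_of(customer_id: int) -> int:
    # Hypothetical scheme: customer_id is the partitioning key column.
    return customer_id % NUM_PARTITIONS


def get_by_key(customer_id):
    # Partitioning key specified: go straight to the one owning partition.
    return partitions[partition_of(customer_id)].rows.get(customer_id)


def get_by_city(city):
    # Partitioning key NOT specified: every partition is asked,
    # and each answers from its own local index.
    results = []
    for p in partitions:
        results.extend(p.find_by_city(city))
    return results


for cid, city in [(1, "Pune"), (2, "Delhi"), (3, "Pune"), (4, "Pune")]:
    partitions[partition_of(cid)].put(cid, {"id": cid, "city": city})
```

`get_by_key` touches one partition, while `get_by_city` has to touch all of them, which is why filters on non-key columns get more expensive as partitions are added.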

2. Partitioning
Reading it back
When the partitioning key columns are NOT specified, each partition is searched with its local indexes by its own process (Process 1, Process 2, … Process n); the partial results are then collected, rendered, and output.
This is MPP: massively parallel processing.
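The scatter/collect flow above can be sketched with a thread pool standing in for the per-partition processes. This is an illustration under stated assumptions: the partition contents and the `process`/`mpp_sum` names are hypothetical, and a real MPP engine would run each process on separate cores or nodes rather than Python threads.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each partition holds (region, amount) rows.
partitions = [
    [("east", 10), ("west", 5)],
    [("east", 7), ("north", 3)],
    [("west", 2), ("east", 1)],
]


def process(partition, region):
    """Process i: compute a partial aggregate over one partition only."""
    return sum(amount for r, amount in partition if r == region)


def mpp_sum(region):
    # Scatter: one task per partition, run in parallel.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(lambda p: process(p, region), partitions))
    # Collect: merge the partial results into the final answer.
    return sum(partials)


# Render / output.
print("east total:", mpp_sum("east"))
```

Summing is easy to split because partial sums merge by addition; aggregates such as averages need each process to return (sum, count) pairs instead.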

3. Replication
Centralised model - master-slave
Round-robin-based partitioning requires a centralised metastore to keep track of state (which partition holds each key).
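Why round robin needs a metastore is easiest to see in code: the partition cannot be recomputed from the key, so it has to be recorded at write time. A minimal sketch with a hypothetical `RoundRobinStore` class; in a centralised (master-slave) setup, the master would be one natural home for this metastore, but that placement is an assumption.

```python
import itertools


class RoundRobinStore:
    """Round-robin placement: even spread, but locations must be remembered."""

    def __init__(self, num_partitions: int):
        self.partitions = [dict() for _ in range(num_partitions)]
        self._next = itertools.cycle(range(num_partitions))
        # Centralised metastore: key -> partition index. Without it, a read
        # would have to search every partition.
        self.metastore = {}

    def write(self, key, value):
        if key in self.metastore:
            idx = self.metastore[key]  # updates stay where the key already lives
        else:
            idx = next(self._next)     # new keys take the next partition in turn
            self.metastore[key] = idx
        self.partitions[idx][key] = value

    def read(self, key):
        idx = self.metastore.get(key)
        return None if idx is None else self.partitions[idx][key]


store = RoundRobinStore(3)
for i in range(6):
    store.write(f"k{i}", i)
```

Six writes land two per partition, and every read depends on the metastore lookup succeeding.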

3. Replication
Decentralised model - Peer to Peer

How is data stored & CRUD operations
The data format applies to each partition of the data.
Your replication and partitioning strategy is independent of the storage format.

Data storage - Row oriented
Write path

Data storage - Row oriented
Read path
The ordering of columns in an index matters: (a,b,c) is different from (c,b,a).
Writes pay a penalty for updating all index trees for the table.
Statistics refresh can be deferred.
Hybrids
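A minimal sketch of both points, assuming sorted Python lists stand in for B-tree index pages. `RowTable` and `seek_prefix` are hypothetical names: an index can only seek on a leading prefix of its columns, so a lookup on `c` alone needs the (c,b,a) index, and every insert must update every index.

```python
import bisect


class RowTable:
    """Row store: whole rows in a heap plus sorted composite indexes."""

    def __init__(self, index_columns):
        self.rows = []  # row id -> full row (all columns stored together)
        # One sorted list of (key tuple, row id) per composite index.
        self.indexes = {cols: [] for cols in index_columns}

    def insert(self, row: dict) -> None:
        row_id = len(self.rows)
        self.rows.append(row)
        # Write penalty: every index on the table is updated on every insert.
        for cols, entries in self.indexes.items():
            bisect.insort(entries, (tuple(row[c] for c in cols), row_id))

    def seek_prefix(self, cols, prefix):
        """Seek through one index; works only for a leading-column prefix."""
        entries = self.indexes[cols]
        lo = bisect.bisect_left(entries, (prefix,))  # first key >= prefix
        result = []
        for key, row_id in entries[lo:]:
            if key[: len(prefix)] != prefix:
                break
            result.append(self.rows[row_id])
        return result


t = RowTable([("a", "b", "c"), ("c", "b", "a")])
for a, b, c in [(1, 2, 3), (1, 5, 0), (2, 2, 3)]:
    t.insert({"a": a, "b": b, "c": c})

rows_a1 = t.seek_prefix(("a", "b", "c"), (1,))  # a leads in (a, b, c)
rows_c3 = t.seek_prefix(("c", "b", "a"), (3,))  # c leads only in (c, b, a)
```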

Data storage - Row oriented - Examples
Strengths: row-level operations, and multiple types of query searches by virtue of different indexes.
Inefficiencies: analytical querying!
Lots of seeks from disk (range-based queries), while Efficiency(Scan) >>> Efficiency(Seek).
The entire row is fetched to operate on a few columns, a big drawback for analytical queries.

Data storage - Column oriented
Write path: writes land in an in-memory table (memtable), which is flushed to disk once it reaches a threshold.
Read path
Hybrid
Efficient for columnar aggregates and joins (analytical queries)
Efficient for filtering data based on a condition
Inefficient for frequent updates (causes lots of soft deletes/tombstones)
Inefficient for retrieving a select few rows
Compaction overheads
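A minimal sketch of the column-oriented write path and its trade-offs, assuming a row-count flush threshold of 3 and in-memory lists standing in for on-disk column segments. `ColumnStore` and its methods are hypothetical: aggregates read only the column they need, deletes become tombstones because segments are immutable, and compaction rewrites segments to drop them.

```python
class ColumnStore:
    """Column-oriented table: a memtable flushed into per-column segments."""

    def __init__(self, columns, threshold=3):
        self.columns = columns
        self.threshold = threshold  # hypothetical flush threshold (rows)
        self.memtable = []          # recent rows, still row-shaped
        self.segments = []          # flushed: one {column: [values]} per segment
        self.tombstones = set()     # (segment number, position) marked deleted

    def insert(self, row):
        self.memtable.append(row)
        if len(self.memtable) >= self.threshold:
            self.flush()

    def flush(self):
        # Pivot buffered rows into one immutable array per column.
        if self.memtable:
            seg = {c: [r[c] for r in self.memtable] for c in self.columns}
            self.segments.append(seg)
            self.memtable = []

    def delete_where(self, column, value):
        # Segments are immutable, so deletes there are soft: record tombstones.
        for s, seg in enumerate(self.segments):
            for i, v in enumerate(seg[column]):
                if v == value:
                    self.tombstones.add((s, i))
        self.memtable = [r for r in self.memtable if r[column] != value]

    def sum(self, column):
        # Analytical aggregate: reads only the one column array it needs.
        total = sum(r[column] for r in self.memtable)
        for s, seg in enumerate(self.segments):
            total += sum(v for i, v in enumerate(seg[column])
                         if (s, i) not in self.tombstones)
        return total

    def compact(self):
        # Rewrite all segments into one, dropping tombstoned entries.
        merged = {c: [] for c in self.columns}
        for s, seg in enumerate(self.segments):
            for i in range(len(seg[self.columns[0]])):
                if (s, i) not in self.tombstones:
                    for c in self.columns:
                        merged[c].append(seg[c][i])
        self.segments = [merged] if merged[self.columns[0]] else []
        self.tombstones.clear()


cs = ColumnStore(["region", "amount"], threshold=3)
for region, amount in [("east", 10), ("west", 5), ("east", 7), ("north", 3)]:
    cs.insert({"region": region, "amount": amount})
```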

Data storage - Sorted String
Shares the immutable concept of columnar storage, but storage is at row level.
Write path: row-based data is buffered in memory and flushed once it reaches a threshold.
Read path
Conceptually, SSTables => segments
Efficient for range-based queries (sequential scan on disk)
Low-latency inserts
Peer-to-peer protocol; multi-datacenter replication
Inefficient for interleaved reads (filter queries), which can potentially traverse the complete table
Inefficient for aggregates and joins
Compaction overheads
Query-first approach (design tables around the queries you will run)
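The sorted-string write path, read path, range scan, and compaction can be sketched in a few dozen lines. This is a toy model under stated assumptions: a flush threshold of 2 entries, in-memory sorted lists standing in for on-disk SSTable segments, and no delete tombstones; `LSMStore` is a hypothetical name.

```python
import bisect


class LSMStore:
    """Sorted-string-table sketch: memtable + immutable key-sorted segments."""

    def __init__(self, threshold=2):
        self.threshold = threshold  # hypothetical flush threshold (entries)
        self.memtable = {}          # mutable, in memory
        self.sstables = []          # oldest first; each a sorted [(key, value)]

    def put(self, key, value):
        # Inserts are cheap: just an in-memory write.
        self.memtable[key] = value
        if len(self.memtable) >= self.threshold:
            # Flush: write the memtable out as an immutable, key-sorted segment.
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Read path: memtable first, then segments from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None

    def range(self, lo, hi):
        """Range scan over [lo, hi): each segment is sorted, so scan sequentially."""
        merged = {}
        for table in self.sstables:  # oldest first, so newer values overwrite
            i = bisect.bisect_left(table, (lo,))
            while i < len(table) and table[i][0] < hi:
                merged[table[i][0]] = table[i][1]
                i += 1
        for k, v in self.memtable.items():
            if lo <= k < hi:
                merged[k] = v
        return sorted(merged.items())

    def compact(self):
        # Merge all segments into one, keeping only the newest value per key.
        merged = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [sorted(merged.items())] if merged else []


s = LSMStore(threshold=2)
for k, v in [("b", 1), ("a", 2), ("b", 3), ("c", 4), ("d", 5)]:
    s.put(k, v)
```

A point lookup for a missing key has to check the memtable and every segment, which is the cost the slide attributes to filter-style reads; compaction reduces the number of segments to check.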

Data storage - Key, Value = Doc : Document
The data is a document whose schema can vary; JSON is the usual standard format.
Querying may be required on certain fields in the document.
Ability to specify a field within the document as the key for partitioning
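A minimal sketch of a document store that partitions on a user-chosen field, assuming CRC32 as the partition hash and a full scan for field queries. `DocumentStore` and the `user_id` field are hypothetical; real document databases typically add secondary indexes on queried fields instead of scanning.

```python
import json
import zlib


class DocumentStore:
    """Hypothetical document store partitioned on a field chosen by the user."""

    def __init__(self, partition_field, num_partitions=4):
        self.partition_field = partition_field
        self.partitions = [[] for _ in range(num_partitions)]

    def insert(self, doc_json: str):
        doc = json.loads(doc_json)  # documents are JSON; schemas may differ
        key = str(doc[self.partition_field])
        idx = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[idx].append(doc)

    def find(self, field, value):
        # Query on a field inside the document; docs lacking the field are skipped.
        return [d for part in self.partitions for d in part if d.get(field) == value]


store = DocumentStore(partition_field="user_id")
store.insert('{"user_id": 7, "name": "Ada", "tags": ["admin"]}')
store.insert('{"user_id": 9, "name": "Bob", "email": "bob@example.com"}')  # different schema
```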

Data storage - Key, Value = File : Object store
The data is a large file.
Query ability is not required.
Eventual consistency is fine.
A metadata layer provides a file-system look and feel.
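A minimal sketch of how a metadata layer can give a flat key-to-file store a file-system look and feel: keys are plain strings, and "directories" are derived by grouping keys on `/`. The `ObjectStore` and `list_dir` names are hypothetical; the idea mirrors the prefix/delimiter listing found in S3-style APIs, but this is not any vendor's implementation.

```python
class ObjectStore:
    """Flat key -> bytes store; '/' in keys means nothing to the store itself."""

    def __init__(self):
        self.objects = {}

    def put(self, key: str, data: bytes):
        self.objects[key] = data  # whole-object writes, no partial updates

    def get(self, key: str) -> bytes:
        return self.objects[key]


def list_dir(store: ObjectStore, prefix: str):
    """Hypothetical metadata layer: fake directories by grouping keys on '/'."""
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    entries = set()
    for key in store.objects:
        if key.startswith(prefix):
            head, sep, _ = key[len(prefix):].partition("/")
            entries.add(head + sep)  # "name/" for sub-directories, "name" for files
    return sorted(entries)


store = ObjectStore()
store.put("images/cat.png", b"\x89PNG")
store.put("images/raw/cat.cr2", b"RAW")
store.put("readme.txt", b"hi")
```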

Data storage - Key, Value = Minimal data : Cache
Key Data
Data is in few MB’s
Lightweight data structures used for persisting value
In-memory and fast
Ideal for caching use cases
Hybrid
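A minimal in-memory cache sketch, assuming least-recently-used eviction and optional per-entry expiry (TTL) as the policies; `LRUCache` is a hypothetical class, not how Redis or any other product is implemented. The clock is injectable so expiry can be demonstrated without sleeping.

```python
import time
from collections import OrderedDict


class LRUCache:
    """In-memory key-value cache with LRU eviction and optional per-entry TTL."""

    def __init__(self, capacity: int, clock=time.monotonic):
        self.capacity = capacity
        self.clock = clock
        self.data = OrderedDict()  # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        expires = None if ttl is None else self.clock() + ttl
        self.data[key] = (value, expires)
        self.data.move_to_end(key)         # most recently used goes to the end
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used

    def get(self, key, default=None):
        item = self.data.get(key)
        if item is None:
            return default
        value, expires = item
        if expires is not None and self.clock() >= expires:
            del self.data[key]             # lazily drop expired entries
            return default
        self.data.move_to_end(key)
        return value


now = [0.0]  # fake clock for the example
cache = LRUCache(2, clock=lambda: now[0])
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")     # touching "a" makes "b" the least recently used
cache.set("c", 3)  # over capacity: "b" is evicted
```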

Data storage - Key, Value = Periodic : Time Series
Data is captured periodically, at high frequency, from data streams or device measurements.
Values are small. Older values should be aggregated and stored at a coarser granularity.
Uses the concept of a sorted string database.
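Aggregating older values can be sketched as bucketing raw points by time and keeping summary statistics. A minimal sketch, assuming timestamps in seconds, fixed-width buckets, and min/max/average as the kept aggregates; `downsample` and `compact_older_than` are hypothetical names.

```python
from collections import defaultdict


def downsample(points, bucket_seconds):
    """Aggregate (timestamp, value) points into fixed buckets of (start, min, max, avg)."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % bucket_seconds].append(value)
    return [
        (start, min(vals), max(vals), sum(vals) / len(vals))
        for start, vals in sorted(buckets.items())
    ]


def compact_older_than(points, cutoff, bucket_seconds):
    """Keep recent raw points; roll older ones up into coarse buckets."""
    old = [(t, v) for t, v in points if t < cutoff]
    recent = [(t, v) for t, v in points if t >= cutoff]
    return downsample(old, bucket_seconds), recent


# Hypothetical readings every few seconds.
points = [(0, 1.0), (10, 3.0), (20, 2.0), (60, 4.0), (70, 6.0), (120, 5.0)]
rolled_up, raw = compact_older_than(points, cutoff=120, bucket_seconds=60)
```

Keeping min and max alongside the average preserves spikes that an average alone would hide.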

De-Normalisation vs Normalisation
