Datastore PPT.pptx

How to choose the right Database

HELLO!
I am Jatin Chuglani
Product Manager, Zeta
Ex-infoworks, Amazon
LinkedIn
Blog

Who is this PPT for?
▪ Anyone trying to make a decision about the storage
architecture for their application
▪ Students, Engineers, Architects

What is a Database?
A database is an organized collection of data stored and accessed
electronically

Importance of the right database
▪ Databases are a core part of any application
architecture
▪ Database architecture will determine
▫ How many users can be served by the application
▫ Time it takes to respond to a request
▫ How much downtime the system will have
▪ Migrating Databases is an expensive/time-consuming
operation

Questions to ask yourself
▪ Who are my users?
▪ What are my users’ short term and long term goals and
requirements?
▪ Which languages and tools should I be using to meet those
requirements?
▪ How would I grow my team as per my users and their
requirements?

Decision Framework
Maintenance
● Maturity
● Maintainability
● Community Support
Functional Reqs
● Query Patterns
● Evolution
● Language Support
Scale
● Scalability
● Latency
● Reliability
● Consistency

In Focus Today
Relational DBs
▪ MySQL
▪ SQL Server
▪ Oracle
NoSQL DBs
▪ Cassandra
▪ MongoDB
▪ Neo4j
Evented Datastores
▪ Kafka
▪ Azure EventHubs

In Focus Today
▪ Storage structure
▫ How is the data logically stored
▪ Querying support
▫ Ways the data can be accessed
▪ Scalability
▫ How to account for future growth
▪ Use cases
▫ Where to use these databases

Relational Databases
Relational DBs
▪ MySQL
▪ SQL Server
▪ Oracle

Relational Databases Overview
▪ “SQL” (Structured Query Language)
Databases
▪ Tables can have (Foreign Key)
relationships with each other
▪ Allow creation of tables with Fixed
Structure
▪ Powerful Query interface with SQL
▪ Can be vertically scaled easily; horizontal scaling requires more effort

Storage Structure
▪ Foreign Key - Primary Key
Relationship
▪ Columns are assigned a
datatype
▪ Columns can have
constraints
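
The three points above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for the relational databases named in this deck; the customers and orders tables, their columns, and the try_insert helper are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE customers (
    id    INTEGER PRIMARY KEY,          -- primary key
    email TEXT NOT NULL UNIQUE          -- datatype plus constraints
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),  -- foreign key
    amount      REAL NOT NULL CHECK (amount > 0)            -- column constraint
);
""")

conn.execute("INSERT INTO customers (id, email) VALUES (1, 'a@example.com')")
conn.execute("INSERT INTO orders (customer_id, amount) VALUES (1, 25.0)")


def try_insert(sql):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# An order for a customer that does not exist violates the foreign key.
orphan_ok = try_insert("INSERT INTO orders (customer_id, amount) VALUES (99, 10.0)")
# A negative amount violates the CHECK constraint.
negative_ok = try_insert("INSERT INTO orders (customer_id, amount) VALUES (1, -5.0)")
```

Both rejected inserts leave the orders table unchanged, which is the kind of guarantee the fixed structure is meant to provide.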

Querying Capabilities
▪ SQL
▫ Joins!
▫ Subqueries
▫ Filter, etc.
▪ Object Relational Mapper (ORM)
▪ Mature Language Support
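
As a rough illustration of joins and subqueries, the sketch below again uses Python's sqlite3 module as a stand-in; the table names and sample rows are assumptions made up for this example, not data from the deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    amount      REAL NOT NULL
);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben');
INSERT INTO orders VALUES (1, 1, 40.0), (2, 1, 60.0), (3, 2, 10.0);
""")

# Join: combine rows from two tables, then aggregate per customer.
totals = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.name
""").fetchall()

# Subquery: customers with at least one order above the average order amount.
big_spenders = conn.execute("""
    SELECT name FROM customers
    WHERE id IN (
        SELECT customer_id FROM orders
        WHERE amount > (SELECT AVG(amount) FROM orders)
    )
    ORDER BY name
""").fetchall()
```

Neither query had to be planned for when the tables were designed, which is the flexibility in query patterns that SQL offers.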

Scalability
▪ Vertical Scaling (Bigger
Machines)
▪ Horizontal Scaling
▫ Replicas (same data in
multiple machines)
▫ Sharding (distributing
data)
▫ Joins could suffer
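
One common way to distribute data is hash-based sharding. The sketch below is an assumption about how a routing layer might work, with in-memory dicts standing in for database servers; shard_for, put, get and rows_matching are hypothetical helpers, not part of any database named here.

```python
import hashlib

NUM_SHARDS = 4  # assumed number of database servers

# Each dict stands in for one shard (one machine holding part of the data).
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(key, num_shards=NUM_SHARDS):
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def put(user_id, row):
    shards[shard_for(user_id)][user_id] = row


def get(user_id):
    # A lookup by shard key touches exactly one shard.
    return shards[shard_for(user_id)].get(user_id)


def rows_matching(predicate):
    # A query that is not keyed by user_id (for example a join or filter
    # across users) has to visit every shard, which is why joins can suffer.
    return [row for shard in shards for row in shard.values() if predicate(row)]


for i in range(100):
    put(f"user-{i}", {"id": f"user-{i}", "country": "IN" if i % 2 else "US"})
```

Changing NUM_SHARDS in this simple scheme would remap most keys, which is one reason horizontal scaling of relational databases takes more effort than buying a bigger machine.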

Use Cases
▪ Relational Query Requirements
▪ Flexibility in Query Patterns
▫ Data Analytics
▪ Column Constraints Requirements
▫ Transaction Processing
▪ Small to Large Scale*
▫ Boutique Ecommerce Website
*With Sharding and Replicas

NoSQL Databases
NoSQL DBs
▪ Cassandra
▪ MongoDB
▪ Neo4j

NoSQL Databases Overview
▪ “NoSQL” is a loaded term, which loosely refers to any database that does not follow the relational (SQL) model
▪ Started getting popular in late 2000s
▪ Schema is not pre-defined, “Schemaless” or “Schema-on-Read”
▪ Instead of breaking data into multiple units (or tables), the
data is typically stored together as one.

Storage Structure*
Wide Column DBs
▪ Tables, Rows, columns
▪ “Column Families” are stored together
▪ No Relationships
▪ No/limited joins
▪ e.g. Cassandra
Document DBs
▪ JSON-like storage structure
▪ Data stored in one nested structure called “Documents”
▪ e.g. MongoDB
*There are other NoSQL stores like key-value (Redis), graph DBs (Neo4j) and search engines (Elasticsearch); we will skip those in today’s conversation
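
To make the document model concrete, here is a small sketch of one order stored as a single nested structure, using a plain Python dict and the json module rather than a real MongoDB client. The field names (_id, customer, items) and the order_total helper are hypothetical.

```python
import json

# In a relational design this would be split across customers, orders and
# order_items tables. In a document store it is kept together as one unit.
order_document = {
    "_id": "order-1001",
    "customer": {"name": "Asha", "email": "asha@example.com"},
    "items": [
        {"sku": "TSHIRT-M", "qty": 2, "price": 15.0},
        {"sku": "MUG", "qty": 1, "price": 8.0},
    ],
}


def order_total(doc):
    """Compute the total from the nested items, with no join needed."""
    return sum(item["qty"] * item["price"] for item in doc["items"])


# Documents are typically serialized in a JSON-like format.
serialized = json.dumps(order_document)
restored = json.loads(serialized)
```

Because everything about the order travels together, reading it back is a single fetch, but nothing in the store itself forces every order to have the same fields.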

Scalability
▪ Vertical Scaling (Bigger
Machines)
▪ Highly Horizontally Scalable
▫ Tables/documents are self
contained
▫ Load Balancer/router layer

Querying Capabilities
▪ Custom Query Languages
▫ Range from SQL-like (CQL/Cassandra) to very different (JavaScript/MongoDB)
▪ REST API Support
▪ Querying primarily dependent on IDs
▪ Programming language support: mileage may vary
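
The point about ID-dependent querying can be sketched with a toy in-memory store. TinyDocumentStore and its methods are hypothetical and do not correspond to any real database client; the docs_scanned counter exists only to make the cost difference visible.

```python
class TinyDocumentStore:
    """Hypothetical in-memory stand-in for a document store keyed by ID."""

    def __init__(self):
        self._docs = {}
        self.docs_scanned = 0  # how many documents a query had to touch

    def put(self, doc_id, doc):
        self._docs[doc_id] = doc

    def get(self, doc_id):
        # Lookup by ID goes straight to the document.
        self.docs_scanned += 1
        return self._docs.get(doc_id)

    def find_by_field(self, field, value):
        # Without an index on the field, every document must be inspected.
        matches = []
        for doc in self._docs.values():
            self.docs_scanned += 1
            if doc.get(field) == value:
                matches.append(doc)
        return matches


store = TinyDocumentStore()
for i in range(1000):
    store.put(f"player-{i}", {"name": f"Player {i}", "level": i % 50})

by_id = store.get("player-42")
cost_of_id_lookup = store.docs_scanned

store.docs_scanned = 0
level_7 = store.find_by_field("level", 7)
cost_of_field_scan = store.docs_scanned
```

Real systems narrow this gap with secondary indexes where they are supported, but the access pattern still tends to be designed around the key.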

Use Cases
▪ Fixed Access Pattern, cases where everything always comes
together
▪ High Scale and Low latency requirements
▫ Gaming servers
▪ Evolving Data
▫ Product Catalogs

Evented Datastores
Evented DBs
▪ Kafka
▪ Azure
EventHubs

Evented Datastores Overview
▪ Also called “Message Queues”
▪ Storage of “Events”
▪ Event = self-contained, immutable object with timestamp
▪ “Producers” and
“Consumers” of events
▪ Used in conjunction with
SQL/NoSQL stores

Difference from Conventional Databases
▪ Focus on transient data, data usually deleted after some time
▪ Query tools mostly focused on offsets (message number)
▪ No support for secondary indexes / other search capabilities

Storage Structure
▪ Data could be in multiple
formats
▫ JSON
▫ Text
▫ Binary
▪ Schema constraints can
sometimes be added

Querying Capabilities
▪ Consumers read data using offsets, i.e., message numbers
▪ Some high level functions
▫ Data manipulations
▫ Aggregation over small periods of time
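
Offset-based reading can be sketched with a tiny append-only log. This is an in-memory illustration of the idea only, not the Kafka or Azure EventHubs client API; Event, EventLog and Consumer are hypothetical names.

```python
import time
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)  # frozen: an event cannot be modified once created
class Event:
    offset: int       # position of the event in the log (message number)
    timestamp: float
    payload: Any


class EventLog:
    """Append-only list of events; producers append, consumers read."""

    def __init__(self):
        self._events: List[Event] = []

    def append(self, payload) -> int:
        event = Event(len(self._events), time.time(), payload)
        self._events.append(event)
        return event.offset

    def read(self, from_offset: int, max_events: int = 10) -> List[Event]:
        return self._events[from_offset:from_offset + max_events]


class Consumer:
    """Tracks its own position, so each consumer reads at its own pace."""

    def __init__(self, log: EventLog):
        self.log = log
        self.next_offset = 0

    def poll(self, max_events: int = 10) -> List[Event]:
        batch = self.log.read(self.next_offset, max_events)
        self.next_offset += len(batch)
        return batch


log = EventLog()
for status in ["created", "paid", "shipped"]:
    log.append({"order": 1001, "status": status})

consumer = Consumer(log)
first_batch = consumer.poll(max_events=2)
second_batch = consumer.poll(max_events=2)
```

The only way to address an event here is its offset, which mirrors the limited query surface described on the previous slides.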

Scalability
▪ Vertical Scaling (Bigger Machines)
▪ Horizontally Scalable
▫ Partitioning
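
A minimal sketch of key-based partitioning, assuming events with the same key should stay in order relative to each other. PartitionedLog is a hypothetical in-memory class, and the CRC32-modulo rule is an illustrative choice rather than the partitioner of any specific product.

```python
import zlib


class PartitionedLog:
    """Splits one logical event stream across several independent logs."""

    def __init__(self, num_partitions):
        # Each inner list stands in for one partition, possibly on its own machine.
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # The same key always hashes to the same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, payload):
        """Append an event and return (partition index, offset in that partition)."""
        index = self.partition_for(key)
        self.partitions[index].append((key, payload))
        return index, len(self.partitions[index]) - 1


log = PartitionedLog(num_partitions=3)
positions = [log.append("order-1", status) for status in ["created", "paid", "shipped"]]
log.append("order-2", "created")

order_1_partition = positions[0][0]
order_1_events = [
    payload for key, payload in log.partitions[order_1_partition] if key == "order-1"
]
```

Offsets are per partition in this sketch, so ordering is only meaningful among events that share a partition.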

Use Cases
▪ Asynchronous Systems
▫ Order Processing
▪ Stream Processing
▫ Fraud Detection
▪ Keep systems in sync
▫ Change Data Capture
