How to choose the
right Database
HELLO!
I am Jatin Chuglani
Product Manager, Zeta
Ex-Infoworks, Amazon
LinkedIn
Blog
Who is this PPT for?
▪ Anyone trying to make a decision about the storage
architecture for their application
▪ Students, Engineers, Architects
What is a Database?
What is a Database?
A database is an organized collection of data stored and accessed
electronically
Importance of the right database
▪ Databases are a core part of any application
architecture
▪ Database architecture will determine
▫ How many users can be served by the application
▫ Time it takes to respond to a request
▫ How much downtime there would be for the system
▪ Migrating Databases is an expensive/time-consuming
operation
Prep work before the
Decision
Questions to ask yourself
▪ Who are my users?
▪ What are my users’ short-term and long-term goals and
requirements?
▪ Which languages and tools should I be using to meet those
requirements?
▪ How would I grow my team to match my users and their
requirements?
Decision Framework
Maintenance
● Maturity
● Maintainability
● Community Support
Functional Reqs
● Query Patterns
● Evolution
● Language Support
Scale
● Scalability
● Latency
● Reliability
● Consistency
Types of Databases
In Focus Today
Relational DBs
▪ MySQL
▪ SQL Server
▪ Oracle
NoSQL DBs
▪ Cassandra
▪ MongoDB
▪ Neo4j
Evented Datastores
▪ Kafka
▪ Azure EventHubs
In Focus Today
▪ Storage structure
▫ How is the data logically stored
▪ Querying support
▫ Ways the data can be accessed
▪ Scalability
▫ How to account for future growth
▪ Use cases
▫ Where to use these databases
Relational Databases
Relational DBs
▪ MySQL
▪ SQL Server
▪ Oracle
Relational Databases Overview
▪ “SQL” (Structured Query Language)
Databases
▪ Tables can have (Foreign Key)
relationships with each other
▪ Allow creation of tables with Fixed
Structure
▪ Powerful Query interface with SQL
▪ Can be vertically scaled easily;
horizontal scaling requires more
effort
Storage Structure
▪ Foreign Key - Primary Key
Relationship
▪ Columns are assigned a
datatype
▪ Columns can have
constraints
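A minimal sketch of these three ideas, using SQLite through Python's standard library. The `customers` and `orders` tables and their columns are illustrative assumptions, not part of the original deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: every column has a datatype, some have constraints,
# and orders.customer_id is a foreign key to the customers primary key.
conn.executescript("""
CREATE TABLE customers (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    amount      REAL NOT NULL CHECK (amount > 0)
);
""")
conn.execute("INSERT INTO customers (id, email) VALUES (1, 'a@example.com')")


def try_insert(sql: str) -> str:
    """Run an INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


# Valid row: customer 1 exists and the amount is positive.
print(try_insert("INSERT INTO orders (customer_id, amount) VALUES (1, 25.0)"))
# Foreign key violation: there is no customer 99.
print(try_insert("INSERT INTO orders (customer_id, amount) VALUES (99, 10.0)"))
# CHECK constraint violation: the amount must be positive.
print(try_insert("INSERT INTO orders (customer_id, amount) VALUES (1, -5.0)"))
```

The point of the sketch is that the database itself rejects bad rows, so the application does not have to re-check these rules on every write.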
Querying Capabilities
▪ SQL
▫ Joins!
▫ Subqueries
▫ Filter, etc.
▪ Object Relational Mapper (ORM)
▪ Mature Language Support
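As a rough illustration of joins and subqueries, here is a small SQLite session driven from Python. The tables and sample rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    amount      REAL NOT NULL
);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
INSERT INTO orders VALUES (1, 1, 40.0), (2, 1, 60.0), (3, 2, 15.0);
""")

# Join plus aggregation: total spend per customer who has placed an order.
totals = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.name
""").fetchall()

# Subquery: customers who have never placed an order.
no_orders = conn.execute("""
    SELECT name FROM customers
    WHERE id NOT IN (SELECT customer_id FROM orders)
""").fetchall()

print(totals)     # [('Asha', 100.0), ('Ben', 15.0)]
print(no_orders)  # [('Chen',)]
```

Neither query was planned for when the tables were created, which is the flexibility in query patterns that relational databases are known for.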
Scalability
▪ Vertical Scaling (Bigger
Machines)
▪ Horizontal Scaling
▫ Replicas (same data in
multiple machines)
▫ Sharding (distributing
data)
▫ Joins could suffer
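Sharding can be sketched in a few lines of plain Python. This is a simplified illustration that assumes hash-based routing on a string key; the shard count and the helper names (`shard_for`, `put`, `get`, `scan_all`) are hypothetical, and the in-memory dictionaries stand in for separate database servers.

```python
import hashlib

NUM_SHARDS = 4  # assumed number of shards for the sketch
shards = [dict() for _ in range(NUM_SHARDS)]  # each dict stands in for one server


def shard_for(key: str) -> int:
    """Map a key to a shard with a stable hash (the same key always gives the same shard)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def put(key: str, row: dict) -> None:
    shards[shard_for(key)][key] = row


def get(key: str):
    # A lookup by shard key touches exactly one shard.
    return shards[shard_for(key)].get(key)


def scan_all(predicate) -> list:
    # A query that is not keyed on the shard key has to visit every shard.
    return [row for shard in shards for row in shard.values() if predicate(row)]


for i in range(10):
    put(f"user-{i}", {"id": f"user-{i}", "country": "IN" if i % 2 else "US"})

print(get("user-3"))
print(len(scan_all(lambda row: row["country"] == "IN")))  # 5
```

The `scan_all` function hints at why joins could suffer: once related rows live on different shards, combining them means querying several machines and merging the results in the application or a routing layer. Simple modulo hashing like this also moves most keys when the shard count changes, which is one reason production systems tend to use more elaborate schemes.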
Use Cases
▪ Relational Query Requirements
▪ Flexibility in Query Patterns
▫ Data Analytics
▪ Column Constraints Requirements
▫ Transaction Processing
▪ Small to Large Scale*
▫ Boutique Ecommerce Website
*With Sharding and Replicas
NoSQL Databases
NoSQL DBs
▪ Cassandra
▪ MongoDB
▪ Neo4j
NoSQL Databases Overview
▪ “NoSQL” is a loaded term; it loosely refers to any database
that does not follow the relational (SQL) model
▪ Started getting popular in the late 2000s
▪ Schema is not pre-defined: “Schemaless” or “Schema-on-Read”
▪ Instead of breaking data into multiple units (or tables), the
data is typically stored together as one unit.
Storage Structure*
Wide Column DBs
▪ Tables, rows, columns
▪ “Column Families” are stored together
▪ No relationships
▪ No/limited joins
▪ E.g. Cassandra
Document DBs
▪ JSON-like storage structure
▪ Data stored in one nested structure called a “Document”
▪ E.g. MongoDB
*There are other NoSQL stores like key-value (Redis), graph DBs (Neo4j) and search
engines (Elastic), we will skip those in today’s conversation
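To make the two storage shapes concrete, here is a sketch using plain Python data structures. The field names and values are invented for illustration, and the dictionaries only mimic how a document store or a wide-column store organizes data; they are not the API of MongoDB or Cassandra.

```python
import json

# Document model: one order is stored as a single nested, JSON-like document.
order_document = {
    "_id": "order-1001",
    "customer": {"name": "Asha", "email": "asha@example.com"},
    "items": [
        {"sku": "A1", "qty": 2, "price": 10.0},
        {"sku": "B7", "qty": 1, "price": 5.5},
    ],
}


def order_total(doc: dict) -> float:
    """Everything needed is inside the document, so no join is required."""
    return sum(item["qty"] * item["price"] for item in doc["items"])


# Wide-column model (simplified): a partition key maps to rows, and each row
# may carry a different set of columns.
sensor_readings = {
    "sensor-1": {
        "2024-01-01T00:00": {"temp": 20.5},
        "2024-01-01T00:05": {"temp": 20.7, "humidity": 41},
    },
}

print(order_total(order_document))                    # 25.5
print(json.dumps(order_document["customer"]))         # the nested part serializes as JSON
print(sorted(sensor_readings["sensor-1"]))            # rows within one partition
```

In both shapes, data that is read together is stored together, which is what makes lookups by key cheap and cross-entity joins awkward.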
Scalability
▪ Vertical Scaling (Bigger
Machines)
▪ Highly Horizontally Scalable
▫ Tables/documents are self-contained
▫ Load Balancer/router layer
Querying Capabilities
▪ Custom Query Languages
▫ Range from SQL-like
(CQL/Cassandra) to very
different
(JavaScript/MongoDB)
▪ REST API Support
▪ Querying primarily dependent
on IDs
▪ Programming language support
varies from database to database
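Why querying depends primarily on IDs can be shown with a toy in-memory store. This is an assumption-laden sketch: the `documents` dictionary, the `by_category` index, and the function names are hypothetical and do not correspond to any real database's API.

```python
from collections import defaultdict

documents = {}                  # primary storage: document ID -> document
by_category = defaultdict(set)  # a secondary index the application maintains itself


def insert(doc: dict) -> None:
    documents[doc["_id"]] = doc
    by_category[doc["category"]].add(doc["_id"])


def find_by_id(doc_id: str):
    # Lookup by ID is a single direct access.
    return documents.get(doc_id)


def find_by_category(category: str) -> list:
    # Without the index, this would require scanning every document.
    return [documents[i] for i in sorted(by_category.get(category, ()))]


insert({"_id": "p1", "category": "books", "title": "Databases 101"})
insert({"_id": "p2", "category": "games", "title": "Chess Set"})
insert({"_id": "p3", "category": "books", "title": "Streams in Practice"})

print(find_by_id("p2")["title"])                          # Chess Set
print([d["_id"] for d in find_by_category("books")])      # ['p1', 'p3']
```

Many NoSQL stores do offer secondary indexes of their own, but the access paths usually have to be planned up front, which is why these databases suit fixed access patterns.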
Use Cases
▪ Fixed access patterns: cases where the data is always read
together
▪ High Scale and Low latency requirements
▫ Gaming servers
▪ Evolving Data
▫ Product Catalogs
Evented Datastores
Evented DBs
▪ Kafka
▪ Azure EventHubs
Evented Datastores Overview
▪ Also called “Message Queues”
▪ Storage of “Events”
▪ Event = self-contained, immutable
object with a timestamp
▪ “Producers” and
“Consumers” of events
▪ Used in conjunction with
SQL/NoSQL stores
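The producer/consumer model can be sketched as an append-only log in plain Python. The class and method names (`Event`, `EventLog`, `Consumer`, `poll`) are hypothetical; this is a toy model of the idea, not the client API of Kafka or Azure EventHubs.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen makes each event immutable after creation
class Event:
    key: str
    payload: str
    timestamp: datetime


class EventLog:
    """Append-only list of events; an event's position in the list is its offset."""

    def __init__(self) -> None:
        self._events = []

    def append(self, key: str, payload: str) -> int:
        # Producers only ever add to the end of the log.
        self._events.append(Event(key, payload, datetime.now(timezone.utc)))
        return len(self._events) - 1

    def read_from(self, offset: int, max_events: int = 10) -> list:
        return self._events[offset:offset + max_events]


class Consumer:
    """Each consumer remembers its own offset, so consumers do not affect each other."""

    def __init__(self, log: EventLog) -> None:
        self.log = log
        self.offset = 0

    def poll(self, max_events: int = 10) -> list:
        batch = self.log.read_from(self.offset, max_events)
        self.offset += len(batch)
        return batch


log = EventLog()
for payload in ("order placed", "order paid", "order shipped"):
    log.append("order-1", payload)

billing, shipping = Consumer(log), Consumer(log)
print([e.payload for e in billing.poll(max_events=2)])  # first two events
print([e.payload for e in shipping.poll()])             # all three events
print([e.payload for e in billing.poll()])              # the remaining event
```

Because reading does not remove events, several downstream systems can consume the same stream at their own pace.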
Difference from Conventional Databases
▪ Focus on transient data; data is usually deleted after some time
▪ Query tools mostly focused on offsets (message number)
▪ No support for secondary indexes / other search capabilities
Storage Structure
▪ Data could be in multiple
formats
▫ JSON
▫ Text
▫ Binary
▪ Schema constraints can
sometimes be added
Querying Capabilities
▪ Consumers read data using
offsets, i.e., message numbers
(producers append new messages)
▪ Some high level functions
▫ Data manipulations
▫ Aggregation over small
periods of time
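Aggregation over small periods of time is often done with fixed-size ("tumbling") windows. The sketch below shows the idea in plain Python; the function name, the `(timestamp_seconds, amount)` event shape, and the 60-second window are assumptions made for the example, and this is not the syntax of any streaming tool.

```python
from collections import defaultdict


def tumbling_window_totals(events, window_seconds: int = 60) -> dict:
    """Sum event amounts per fixed-size, non-overlapping time window.

    `events` is an iterable of (timestamp_seconds, amount) pairs; the result
    maps each window's start time to the total amount seen in that window.
    """
    totals = defaultdict(float)
    for timestamp, amount in events:
        # Round the timestamp down to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        totals[window_start] += amount
    return dict(sorted(totals.items()))


payments = [(0, 10.0), (30, 5.0), (61, 2.0), (125, 1.0), (179, 4.0)]
print(tumbling_window_totals(payments))  # {0: 15.0, 60: 2.0, 120: 5.0}
```

A fraud-detection job, for instance, could compare each window's total against a threshold as events arrive.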
Scalability
▪ Vertical Scaling (Bigger Machines)
▪ Horizontally Scalable
▫ Partitioning
Use Cases
▪ Asynchronous Systems
▫ Order Processing
▪ Stream Processing
▫ Fraud Detection
▪ Keep systems in sync
▫ Change Data Capture
THANKS!
Any questions?


Editor's Notes

  • #30 (kSQL/Kafka)