Database Decision Framework

Database Decision Making Framework
An Opinionated Guide by Evan Zlotnick

Background
Forty years ago, anyone who wanted to store and query
large amounts of data had one choice: a
Relational Database Management System (RDBMS)
Today, there are multiple alternatives, each with its own
pros and cons
How does an organization decide how to store its data?

RDBMS - Tried and True
Designed and implemented in the 1970s
Useful for transactional data
Logical Units of Work may be submitted to the database
If any piece of the unit raises an error, the entire block is rolled
back to a previous known good state
Depends on a formally defined schema
Usually, data is stored in third normal form (3NF) and the
database enforces strict referential integrity

RDBMS - At Its Best
Let’s say we have a simple e-commerce site
A customer can place an order which can consist of multiple
products and have different quantities of each product
The price of the product will change over time, so we need
to know when the order was placed and what the price was
at the given time

RDBMS - At Its Best (continued)
Typically, the user would pick their products, place them
in client-side storage, such as a cookie or session,
then check out
The user would provide their name, address, phone, and
credit card number
The order would be sent to a server and some high level
validation checks would be made
Every order has at least one line item
The product IDs are valid

RDBMS - At Its Best (continued)
The order would be saved in draft status if the validations pass
Next, the system would start a transaction, lock the affected rows in
the product table, and decrement the stock_quantity by the quantity
specified in the line items
Finally, a third-party credit card API would be invoked
If no errors were raised, all changes would be committed to the database and
the locks released
However, if any step failed, the entire transaction would fail and the tables
would be rolled back to the previous state
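The checkout flow above can be sketched roughly as follows. This is a minimal illustration, not the deck's implementation: it uses Python's built-in sqlite3 module as a stand-in RDBMS (SQLite locks the whole database for a write rather than individual rows), and the `product` columns, `charge_card`, and `place_order` are hypothetical names chosen for the example.

```python
import sqlite3


class PaymentError(Exception):
    """Raised by the stand-in payment call when a charge is declined."""


def charge_card(card_number: str, amount_cents: int) -> None:
    # Hypothetical stand-in for the third-party credit card API.
    if card_number.startswith("0000"):
        raise PaymentError("card declined")


def place_order(conn, line_items, card_number) -> bool:
    """Decrement stock and charge the card as one logical unit of work."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            total_cents = 0
            for product_id, quantity in line_items:
                cur = conn.execute(
                    "UPDATE product SET stock_quantity = stock_quantity - ? "
                    "WHERE id = ? AND stock_quantity >= ?",
                    (quantity, product_id, quantity),
                )
                if cur.rowcount != 1:  # unknown product or not enough stock
                    raise ValueError(f"product {product_id} unavailable")
                price = conn.execute(
                    "SELECT price_cents FROM product WHERE id = ?", (product_id,)
                ).fetchone()[0]
                total_cents += price * quantity
            charge_card(card_number, total_cents)
        return True
    except (ValueError, PaymentError):
        return False  # every decrement made in this attempt was undone


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (id INTEGER PRIMARY KEY, "
    "price_cents INTEGER NOT NULL, stock_quantity INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO product VALUES (?, ?, ?)", [(1, 500, 10), (2, 1200, 1)])
conn.commit()
```

If the second line item is out of stock, or the card is declined, the decrement already applied to the first line item is undone along with everything else in the transaction.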

An Example of an RDBMS Data Model
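The original diagram for this slide is not reproduced here. As a rough substitute, the sketch below shows one schema that would fit the e-commerce description; every table and column name is an assumption made for illustration, and SQLite (via Python's sqlite3 module) stands in for the RDBMS. Storing `unit_price_cents` on each line item is one way to remember what the price was when the order was placed.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not taken from the deck.
SCHEMA = """
CREATE TABLE customer (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    address TEXT NOT NULL,
    phone   TEXT NOT NULL
);
CREATE TABLE product (
    id             INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    price_cents    INTEGER NOT NULL,
    stock_quantity INTEGER NOT NULL CHECK (stock_quantity >= 0)
);
CREATE TABLE customer_order (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    status      TEXT NOT NULL DEFAULT 'draft',
    placed_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE order_line (
    order_id         INTEGER NOT NULL REFERENCES customer_order(id),
    product_id       INTEGER NOT NULL REFERENCES product(id),
    quantity         INTEGER NOT NULL CHECK (quantity > 0),
    unit_price_cents INTEGER NOT NULL,  -- price at the time of the order
    PRIMARY KEY (order_id, product_id)
);
"""


def open_database() -> sqlite3.Connection:
    """Create an in-memory database with foreign key enforcement turned on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn
```

With this in place, inserting an `order_line` row that points at a nonexistent product raises `sqlite3.IntegrityError`, which is the referential integrity the earlier slides describe.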

RDBMS Strengths and Weaknesses
Strengths:
Full ACID compliance
Two-phase commit
Record locking
Use of Structured Query Language (SQL)
Weaknesses:
Difficult to scale
The underlying schema of the database must be designed before data can be inserted

Scalability
There is a lot of talk in the industry about scalability,
or the ability to handle a large increase in the number
of users
Traditionally, an RDBMS was scaled by improving the
underlying physical hardware - more memory, storage,
processor cores, etc.
Ultimately, there is a limit to how far the hardware can be
upgraded. To scale further, an RDBMS must be sharded, which
is beyond the scope of this deck, but is a pain in the
butt...

NOSQL - Incredible Scale At a Cost
Increasing a server’s physical specifications for increased
capacity is called vertical scaling
A different approach is to use horizontal scaling, which means:
Distributing the computational and storage requirements among
multiple physical servers
Duplicating data to allow for quicker access
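As a rough picture of both ideas, the sketch below routes each key to a server by hashing it and keeps a second copy on the next server in the list. The function name, node names, and the simple modulo scheme are assumptions for illustration; production systems generally use consistent hashing so that adding a node does not reshuffle most keys.

```python
import hashlib


def replicas_for(key: str, nodes: list[str], copies: int = 2) -> list[str]:
    """Pick `copies` distinct nodes for a key by hashing the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    count = min(copies, len(nodes))
    # Walk forward from the hashed position so the copies land on distinct nodes.
    return [nodes[(start + offset) % len(nodes)] for offset in range(count)]


NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical server names
placement = replicas_for("user:42", NODES)
```

Any server can compute the same placement for a key without asking a coordinator, which is what lets the work spread across machines.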

Facebook - An Extreme Example
Facebook has over 1 billion users worldwide
At any given minute, over 500k concurrent users are hitting
Facebook’s servers
And yet… The main page loads in under a second
How is this possible?

Cassandra
Facebook has lots and lots of Cassandra servers all over the world
There is no single point of failure, as all nodes are equal
The servers “gossip” with each other, propagating their changes
If I post my status in Texas, my
friends in Australia may not

Eventual Consistency
On the surface, NOSQL databases commit multiple cardinal
sins of database technology
Data is duplicated, the database is not ACID compliant, and
it introduces the concept of “eventual consistency”
A change can be made to any server in the network
Changes are propagated out, but while they are propagating,
different servers may return different query results
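The propagation described above can be simulated in a few lines. This is a toy model under stated assumptions, not how any particular database implements gossip: `Replica`, `gossip_round`, and the version-number rule (the highest version wins) are all invented for the example.

```python
import random


class Replica:
    """One server holding its own copy of the data as key -> (version, value)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data = {}

    def write(self, key, value, version: int) -> None:
        current = self.data.get(key)
        if current is None or version > current[0]:  # newest version wins
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def gossip_with(self, other: "Replica") -> None:
        """Swap state so both replicas hold the newest version of every key."""
        for key, (version, value) in list(self.data.items()):
            other.write(key, value, version)
        for key, (version, value) in list(other.data.items()):
            self.write(key, value, version)


def gossip_round(replicas, rng) -> None:
    """Each replica exchanges state with one randomly chosen peer."""
    for replica in replicas:
        peer = rng.choice([r for r in replicas if r is not replica])
        replica.gossip_with(peer)


def is_consistent(replicas, key) -> bool:
    return len({replica.read(key) for replica in replicas}) == 1


replicas = [Replica(f"node-{i}") for i in range(5)]
replicas[0].write("status", "Hello from Texas", version=1)
stale_reads = [r.read("status") for r in replicas[1:]]  # the others have not heard yet

rng = random.Random(42)
rounds = 0
while not is_consistent(replicas, "status") and rounds < 100:
    gossip_round(replicas, rng)
    rounds += 1
```

Right after the write, only one node can answer with the new status; the rest return nothing until gossip reaches them. That window is the "eventual" in eventual consistency.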

For the Love of All Things Holy, Do Not Use a Non-Transactional Database for Transactional Data!
In our simple e-commerce site example, we locked rows and
decremented quantities in the product table, rejecting
the entire transaction if a product was unavailable
The entire concept depends on a single source of truth
for how much inventory of a product we have
We cannot have one server saying that there are 10 units of
Product A and a different server reporting 4 units
The outcome of the operation would then be dependent on
which server we were randomly querying… Not cool!
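To make the failure concrete, here is a deliberately tiny sketch: two replicas that each still believe one unit is in stock will both accept an order, while a single authoritative copy rejects the second one. `InventoryReplica` and `sell_via` are hypothetical names used only for this illustration.

```python
class InventoryReplica:
    """One server's local view of how many units are in stock."""

    def __init__(self, units: int) -> None:
        self.units = units

    def try_sell(self, quantity: int) -> bool:
        if self.units >= quantity:
            self.units -= quantity
            return True
        return False


def sell_via(replicas, quantity: int):
    """Send one order to each replica before any change has propagated."""
    return [replica.try_sell(quantity) for replica in replicas]


# Two replicas that both still believe the last unit is available.
stale = [InventoryReplica(1), InventoryReplica(1)]
accepted = sell_via(stale, 1)  # both say yes: one unit sold twice

# A single source of truth: the second order is turned away.
single = InventoryReplica(1)
results = [single.try_sell(1), single.try_sell(1)]
```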

On the Other Hand...
While not suitable for transactional data, which requires
row-level record locking and the establishment of a rollback
position,
NOSQL databases scale really, really well and…
Not all data needs this level of ACID compliance
Which is more important:
My friends in Australia see my Facebook post immediately no matter how
long it takes the page to load, or

NOSQL and Unstructured Data
When dealing with a relational database, we typically talk
about the database’s schema
NOSQL databases, on the other hand, are typically schemaless
Depending on the database, data is stored in tables or
collections as key-value pairs
The number or types of key-value pairs can change at any
time with no pesky DDL to write

A NOSQL Example
Let’s say we are running a restaurant and we want our
customers to be able to order online
If a menu item is listed, it is assumed to be available, so
we are not going to track and decrement quantities
Our orders can be stored in a collection which will look a
bit different from the previous relational example...

Customer Order Document
{"first_name": "Evan", "last_name": "Zlotnick", "phone":
"4258675309", "items": [
{"menu_item": "taco", "quantity": 1},
{"menu_item": "soda", "quantity": 1},
{"menu_item": "sopapilla", "quantity": 3}
]
}

Well, that’s, um, different
In the relational model, we split our order and our line
items into two separate tables
In the document model, we can embed any number of line
items in the initial document using nesting
Instead of doing joins and using referential integrity, we
can keep a lot of the data in the same document
Sometimes this is the right approach and sometimes, it
isn’t!
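The contrast between the two models can be shown by taking the order document and flattening it into the rows a relational design would store. This is only a sketch: `to_relational_rows` and the row layouts are assumptions for the example, not a prescribed mapping.

```python
import json

ORDER_DOCUMENT = """
{"first_name": "Evan", "last_name": "Zlotnick", "phone": "4258675309",
 "items": [
   {"menu_item": "taco", "quantity": 1},
   {"menu_item": "soda", "quantity": 1},
   {"menu_item": "sopapilla", "quantity": 3}
 ]
}
"""


def to_relational_rows(document: dict, order_id: int):
    """Split one nested order document into an order row plus line-item rows."""
    order_row = {
        "order_id": order_id,
        "first_name": document["first_name"],
        "last_name": document["last_name"],
        "phone": document["phone"],
    }
    line_rows = [
        {"order_id": order_id, "menu_item": item["menu_item"], "quantity": item["quantity"]}
        for item in document["items"]
    ]
    return order_row, line_rows


order = json.loads(ORDER_DOCUMENT)
order_row, line_rows = to_relational_rows(order, order_id=1)
total_items = sum(item["quantity"] for item in order["items"])
```

In the document model the whole order is read back in one piece; in the relational model the same information needs a join on `order_id`.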

NOSQL Strengths and Weaknesses
Strengths:
Horizontally scalable
The exact data being stored does not need to be predefined - no schema, no DDL to keep track of
Incredibly flexible
Weaknesses:
Not ACID compliant
Depends on eventual consistency
Does not allow for two-phase commits and rollbacks

So Which One Do I Choose???
Like most things, the right answer is, “It depends…”
While picking the right database is not easy, there are
two surefire rules for avoiding the wrong one (I have
seen these rules violated!):
If referential integrity, row level locking, and rollbacks are required
- DO NOT USE NOSQL!
If a model is not well defined and the schema has columns like
custom_field1, custom_field2, …, custom_field50 - you probably wanted
a flexible key-value store

JSON Documents in a Traditional RDBMS
Postgres and MySQL are relational databases, but they now allow a column
to be defined with a JSON type
The other columns behave like they did before, with a strict typing, row
level locking, transactions, rollbacks, etc.
But… the new JSON columns allow for flexibility and can be queried
Instead of the horrible custom_field1, …, custom_field50 we can have a
custom_field which then looks like {"foo": 1, "bar": 2, "baz": 3}
We can then run a query that will only return rows if bar = 2
We do not have tons of columns with nulls in them
No more DDL to run when someone wants to shove another random property into the
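A query of the "only rows where bar = 2" kind might look like the sketch below. Note the substitution: the slide talks about Postgres and MySQL, but this example uses SQLite's `json_extract` function through Python's sqlite3 module so it can run anywhere, and it assumes a SQLite build with the JSON functions available. The table, the sample rows, and `names_where` are invented for illustration; Postgres and MySQL have their own JSON operators and functions.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product ("
    "id INTEGER PRIMARY KEY, "
    "name TEXT NOT NULL, "                    # ordinary, strictly defined column
    "custom_field TEXT NOT NULL DEFAULT '{}'"  # JSON document stored as text
    ")"
)
conn.executemany(
    "INSERT INTO product (id, name, custom_field) VALUES (?, ?, ?)",
    [
        (1, "widget", json.dumps({"foo": 1, "bar": 2, "baz": 3})),
        (2, "gadget", json.dumps({"bar": 5})),
        (3, "gizmo", json.dumps({"color": "red"})),  # new property, no DDL needed
    ],
)
conn.commit()


def names_where(conn, key: str, value):
    """Return product names whose custom_field has `key` equal to `value`."""
    cur = conn.execute(
        "SELECT name FROM product "
        "WHERE json_extract(custom_field, ?) = ? ORDER BY id",
        (f"$.{key}", value),
    )
    return [row[0] for row in cur]


matches = names_where(conn, "bar", 2)
```

Rows that lack the key simply do not match, so there is no need for a column full of nulls.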

Distributed Model
Going back to our e-commerce example, it’s possible to build a transactional
system using a traditional RDBMS complete with rollbacks and locking, but…
Also create a distributed key that is referenced in a separate NOSQL
database that contains information like reviews
Items like reviews do not matter to the transaction and eventual consistency
will not impact the main goal of not accepting an order for an item that
does not have inventory
Reviews can consume a lot of storage and can start to affect the scalability
of the RDBMS
By splitting the two functions into two data stores, it is possible to get
the best of both
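One way to picture the split is sketched below, with heavy assumptions: SQLite stands in for the transactional RDBMS, a small in-memory class stands in for the separate NOSQL store, and the product id plays the role of the key shared between them. `ReviewStore` and `product_page` are hypothetical names.

```python
import sqlite3


class ReviewStore:
    """In-memory stand-in for a separate NOSQL store, keyed by product id."""

    def __init__(self) -> None:
        self._reviews = {}

    def add(self, product_id: int, review: dict) -> None:
        self._reviews.setdefault(product_id, []).append(review)

    def for_product(self, product_id: int) -> list:
        return list(self._reviews.get(product_id, []))


def product_page(conn, reviews: ReviewStore, product_id: int):
    """Combine authoritative stock data with whatever reviews have arrived."""
    row = conn.execute(
        "SELECT name, stock_quantity FROM product WHERE id = ?", (product_id,)
    ).fetchone()
    if row is None:
        return None
    return {
        "name": row[0],
        "in_stock": row[1] > 0,  # must be exact: comes from the RDBMS
        "reviews": reviews.for_product(product_id),  # may lag behind: that is fine
    }


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
    "stock_quantity INTEGER NOT NULL)"
)
conn.execute("INSERT INTO product VALUES (1, 'widget', 3)")
conn.commit()

reviews = ReviewStore()
reviews.add(1, {"stars": 5, "text": "Works great"})
page = product_page(conn, reviews, 1)
```

A review that shows up a few seconds late costs nothing, while the stock count that gates an order still comes from the one place that can lock and roll back.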
