Database Decision-Making Framework
An Opinionated Guide by Evan Zlotnick
Background
Forty years ago, anyone who wanted to store and query
large amounts of data had one choice: a
Relational Database Management System (RDBMS)
Today, there are multiple alternatives, each with its own
pros and cons
How does an organization decide how to store their data?
RDBMS - Tried and True
Designed and implemented in the 1970s
Useful for transactional data
Logical Units of Work may be submitted to the database
If any piece of the unit raises an error, the entire block is rolled
back to a previous known good state
Depends on a formally defined schema
Usually, data is stored in third normal form (3NF) and the
database enforces strict referential integrity
RDBMS - At Its Best
Let’s say we have a simple e-commerce site
A customer can place an order which can consist of multiple
products and have different quantities of each product
The price of the product will change over time, so we need
to know when the order was placed and what the price was
at the given time
RDBMS - At Its Best (continued)
Typically, the user would pick their products, place them
in client-side storage, such as a cookie or session,
then check out
The user would provide their name, address, phone, and
credit card number
The order would be sent to a server and some high level
validation checks would be made
Every order has at least one line item
The product IDs are valid
RDBMS - At Its Best (continued)
The order would be saved in draft status if the validations are passed
Next, the system would start a transaction and put a lock on various rows in
the product table and decrement the stock_quantity by the quantity
specified in the line items
Finally, a third-party credit card API would be invoked
If no errors were raised, all changes would be committed to the database and
the locks released
However, if any step failed, the entire transaction would fail and the tables
would be rolled back to the previous state
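The stock decrement and payment steps above can be sketched as a single unit of work. This is a minimal illustration using Python's built-in sqlite3 module, not the deck's actual implementation: the product table layout, the price_cents column, and the charge_card function (a stand-in for the third-party credit card API) are all assumptions made for the example.

```python
import sqlite3


class PaymentError(Exception):
    """Raised by the stand-in payment call when a charge is declined."""


def charge_card(card_number, amount_cents):
    # Hypothetical stand-in for the third-party credit card API.
    if card_number.startswith("0000"):
        raise PaymentError("card declined")


def make_demo_db():
    # isolation_level=None lets us issue BEGIN/COMMIT/ROLLBACK ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE product ("
        "id INTEGER PRIMARY KEY, "
        "price_cents INTEGER NOT NULL, "
        "stock_quantity INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO product VALUES (?, ?, ?)", [(1, 500, 10), (2, 1200, 1)]
    )
    return conn


def stock(conn, product_id):
    row = conn.execute(
        "SELECT stock_quantity FROM product WHERE id = ?", (product_id,)
    ).fetchone()
    return row[0]


def place_order(conn, line_items, card_number):
    """Decrement stock and charge the card as one unit of work.

    line_items is a list of (product_id, quantity) pairs.
    Returns True if everything committed, False if it was rolled back.
    """
    try:
        # BEGIN IMMEDIATE takes SQLite's write lock up front; a server
        # RDBMS would typically lock only the affected product rows.
        conn.execute("BEGIN IMMEDIATE")
        total = 0
        for product_id, quantity in line_items:
            cur = conn.execute(
                "UPDATE product SET stock_quantity = stock_quantity - ? "
                "WHERE id = ? AND stock_quantity >= ?",
                (quantity, product_id, quantity),
            )
            if cur.rowcount != 1:
                raise ValueError(f"product {product_id} unavailable")
            price = conn.execute(
                "SELECT price_cents FROM product WHERE id = ?", (product_id,)
            ).fetchone()[0]
            total += price * quantity
        charge_card(card_number, total)
        conn.execute("COMMIT")
        return True
    except (ValueError, PaymentError):
        # Any failed step undoes every stock change made so far.
        conn.execute("ROLLBACK")
        return False
```

If the second line item is out of stock, or the card is declined, the decrement already applied to the first product is undone along with everything else.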
An Example of an RDBMS Data Model
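The diagram from this slide is not reproduced in the text. As a rough stand-in, here is one way the e-commerce model described earlier might be laid out, sketched with Python's sqlite3 module. All table and column names are assumptions, and storing the unit price on each line item is just one way to remember what the price was when the order was placed.

```python
import sqlite3

# Hypothetical schema: every name here is illustrative.
SCHEMA = """
CREATE TABLE customer (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    address TEXT NOT NULL,
    phone   TEXT NOT NULL
);
CREATE TABLE product (
    id             INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    price_cents    INTEGER NOT NULL,
    stock_quantity INTEGER NOT NULL CHECK (stock_quantity >= 0)
);
CREATE TABLE customer_order (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    status      TEXT NOT NULL DEFAULT 'draft',
    placed_at   TEXT NOT NULL
);
CREATE TABLE line_item (
    order_id         INTEGER NOT NULL REFERENCES customer_order(id),
    product_id       INTEGER NOT NULL REFERENCES product(id),
    quantity         INTEGER NOT NULL CHECK (quantity > 0),
    unit_price_cents INTEGER NOT NULL,  -- price at the time of the order
    PRIMARY KEY (order_id, product_id)
);
"""


def open_db():
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign key enforcement off unless asked.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

With foreign keys enforced, inserting a line item that points at a product ID that does not exist raises sqlite3.IntegrityError, which is the referential integrity the earlier slides describe.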
RDBMS Strengths and Weaknesses
Strengths:
Full ACID compliance
Two-phase commit
Record locking
Use of Structured Query Language
(SQL)
Weaknesses:
Difficult to scale
The underlying schema of the
database must be designed
before data can be inserted
Scalability
There is a lot of talk in the industry about scalability,
or the ability to handle a large increase in the number
of users
Traditionally, an RDBMS was scaled by improving the
underlying physical hardware - more memory, storage,
processor cores, etc.
Ultimately, there is a limit to how far this can be
pushed. To scale further, an RDBMS must be sharded,
which is beyond the scope of this deck, but is a pain
in the butt...
NOSQL - Incredible Scale At a Cost
Increasing a server’s physical specifications for increased
capacity is called vertical scaling
A different approach is to use horizontal scaling, or:
Distributing the computational and storage requirements amongst
multiple physical servers
Duplication of data to allow for quicker access
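To make the two ideas above concrete, here is a small Python sketch that spreads keys across several servers and keeps each key on more than one of them. It uses simple hash-and-modulo placement, which is an assumption chosen for brevity; real systems usually use more elaborate schemes. The node names and the nodes_for_key helper are hypothetical.

```python
import hashlib


def nodes_for_key(key, nodes, replicas=2):
    """Pick `replicas` distinct nodes for a key by hashing the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    count = min(replicas, len(nodes))
    # Walk the node list from the hashed position so copies land on
    # different servers.
    return [nodes[(start + i) % len(nodes)] for i in range(count)]


NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical servers
placement = nodes_for_key("user:42", NODES)
```

Every client that hashes the same key gets the same answer, so reads and writes for that key can go straight to the servers that hold it.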
Facebook - An Extreme Example
Facebook has over 1 billion global users
At any given minute, over 500k concurrent users are hitting
Facebook’s servers
And yet… The main page loads in under a second
How is this possible?
Cassandra
Facebook has lots and lots of
Cassandra servers all over the
world
There is no single point of
failure, as all nodes are equal
The servers “gossip” with each
other propagating their changes
If I post my status in Texas, my
friends in Australia may not
Eventual Consistency
On the surface, NOSQL databases commit multiple cardinal
sins in database technologies
Data is duplicated, the database is not ACID compliant, and
it introduces the concept of “eventual consistency”
A change can be made to any server in the network
Changes are propagated out, but while they are propagating,
different servers may return different query results
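The behavior described above can be imitated with a toy simulation in Python. This is a sketch under loud assumptions: the Replica class, the version-number rule (highest version wins), and the random peer selection are all invented for illustration and do not describe how any particular database gossips.

```python
import random


class Replica:
    """One server holding its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (version, value)

    def write(self, key, value, version):
        # Assumed conflict rule: keep whichever write has the higher version.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def merge_from(self, other):
        for key, (version, value) in other.data.items():
            self.write(key, value, version)


def gossip_round(replicas, rng):
    """Each replica swaps state with one randomly chosen peer."""
    for replica in replicas:
        peer = rng.choice([r for r in replicas if r is not replica])
        replica.merge_from(peer)
        peer.merge_from(replica)


def gossip_until_converged(replicas, key, rng, max_rounds=100):
    """Run rounds until every replica returns the same value for key."""
    rounds = 0
    while len({r.read(key) for r in replicas}) > 1 and rounds < max_rounds:
        gossip_round(replicas, rng)
        rounds += 1
    return rounds
```

Right after a write lands on one replica, a read against any other replica still returns the old answer. Only after enough gossip rounds do all replicas agree, which is the "eventual" part.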
For the Love of All Things Holy, Do Not Use a
Non-Transactional Database for Transactional Data!
In our simple e-commerce site example, we locked rows and
decremented quantities in the product table, rejecting
the entire transaction if a product was unavailable
The entire concept depends on a single source of truth
for how much inventory of a product we have
We cannot have one server saying that there are 10 units of
Product A and a different server reporting 4 units
The outcome of the operation would then be dependent on
which server we were randomly querying… Not cool!
On the Other Hand...
While not suitable for transactional data, which requires
row-level record locking and the establishment of a
rollback position,
NOSQL databases scale really, really well and…
Not all data needs this level of ACID compliance
Which is more important:
My friends in Australia see my Facebook post immediately no matter how
long it takes the page to load, or
NOSQL and Unstructured Data
When dealing with a relational database, we typically talk
about the database’s schema
NOSQL databases, on the other hand, are schemaless
Depending on the database, data is stored in tables or
collections as key-value pairs
The number or types of key-value pairs can change at any
time with no pesky DDL to write
A NOSQL Example
Let’s say we are running a restaurant and we want our
customers to be able to order online
If a menu item is listed, it is assumed to be available, so
we are not going to track and decrement quantities
Our orders can be stored in a collection which will look a
bit different from the previous relational example...
Customer Order Document
{"first_name": "Evan", "last_name": "Zlotnick", "phone":
"4258675309", "items": [
{"menu_item": "taco", "quantity": 1},
{"menu_item": "soda", "quantity": 1},
{"menu_item": "sopapilla", "quantity": 3}
]
}
Well, that’s, um, different
In the relational model, we split our order and our line
items into two separate tables
In the document model, we can embed any number of line
items in the initial document using nesting
Instead of doing joins and using referential integrity, we
can keep a lot of the data in the same document
Sometimes this is the right approach and sometimes it
isn’t!
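The contrast between the two models can be shown by loading the order document and splitting it the way a relational design would. This Python sketch reuses the restaurant order from the earlier slide; the to_relational_rows helper and the order_id value are hypothetical.

```python
import json

ORDER_JSON = """
{"first_name": "Evan", "last_name": "Zlotnick", "phone": "4258675309",
 "items": [
   {"menu_item": "taco", "quantity": 1},
   {"menu_item": "soda", "quantity": 1},
   {"menu_item": "sopapilla", "quantity": 3}
 ]}
"""


def to_relational_rows(document, order_id):
    """Split one order document into an order row plus line-item rows."""
    order_row = {k: v for k, v in document.items() if k != "items"}
    order_row["order_id"] = order_id
    line_rows = [
        {
            "order_id": order_id,  # the key a join would later match on
            "menu_item": item["menu_item"],
            "quantity": item["quantity"],
        }
        for item in document["items"]
    ]
    return order_row, line_rows


order = json.loads(ORDER_JSON)

# Document model: the line items arrive with the order, no join needed.
total_quantity = sum(item["quantity"] for item in order["items"])

# Relational model: the same data becomes one order row and three
# line-item rows tied together by order_id.
order_row, line_rows = to_relational_rows(order, order_id=1)
```

The document version is one read; the relational version needs the shared order_id to put the pieces back together, but in exchange each line item can be constrained and queried on its own.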
NOSQL Strengths and Weaknesses
Strengths:
Horizontally scalable
The exact data being stored does
not need to be predefined - no
schema, no DDL to keep track of
Incredibly flexible
Weaknesses:
Not ACID compliant
Depends on eventual consistency
Does not allow for two-phase
commits and rollbacks
So Which One Do I Choose???
Like most things, the right answer is, “It depends…”
Picking the right database is not easy, but there are
two surefire ways to pick the wrong one (I have
seen both!):
If referential integrity, row-level locking, and rollbacks are required
- DO NOT USE NOSQL!
If the model is not well defined and the schema has columns like
custom_field1, custom_field2, …, custom_field50 - you probably wanted
a flexible key-value store
A Hybrid Approach
JSON Documents in a Traditional RDBMS
Postgres and MySQL are relational databases, but both now allow a column
to be defined with a JSON type
The other columns behave as they did before, with strict typing,
row-level locking, transactions, rollbacks, etc.
But… the new JSON columns allow for flexibility and can be queried
Instead of the horrible custom_field1, …, custom_field50 we can have a
custom_field which then looks like {“foo”: 1, “bar”: 2, “baz”: 3}
We can then run a query that will only return rows if bar = 2
We do not have tons of columns with nulls in them
No more DDL to run when someone wants to shove another random property into the
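The "return rows only if bar = 2" idea can be tried out locally. The sketch below swaps in SQLite (via Python's sqlite3 module and its json_extract function) as a convenient stand-in for Postgres or MySQL, so the exact syntax is an assumption: MySQL offers a similarly named JSON_EXTRACT, while Postgres uses operators such as ->>. The widget table and its rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Ordinary typed columns sit next to one flexible JSON column.
conn.execute(
    "CREATE TABLE widget ("
    "id INTEGER PRIMARY KEY, "
    "name TEXT NOT NULL, "
    "custom_field TEXT)"  # holds a JSON document, or NULL
)
conn.executemany(
    "INSERT INTO widget (id, name, custom_field) VALUES (?, ?, ?)",
    [
        (1, "a", '{"foo": 1, "bar": 2, "baz": 3}'),
        (2, "b", '{"foo": 1, "bar": 5}'),
        (3, "c", None),
    ],
)


def rows_where_bar_is(conn, value):
    """Return (id, name) for rows whose JSON document has bar == value."""
    cur = conn.execute(
        "SELECT id, name FROM widget "
        "WHERE json_extract(custom_field, '$.bar') = ? ORDER BY id",
        (value,),
    )
    return cur.fetchall()


matches = rows_where_bar_is(conn, 2)
```

Rows with no custom data simply store NULL in the one JSON column, and adding a new property means writing a different document rather than altering the table.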
Distributed Model
Going back to our e-commerce example, it’s possible to build a transactional
system using a traditional RDBMS complete with rollbacks and locking, but…
Create a distributed key that would be referenced in a separate NOSQL
database that contains information like reviews
Items like reviews do not matter to the transaction and eventual consistency
will not impact the main goal of not accepting an order for an item that
does not have inventory
Reviews can consume a lot of storage and can start to affect the scalability
of the RDBMS
By splitting the two functions into two data stores, it is possible to get
the best of both
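A rough sketch of the split, in Python: inventory stays in a transactional relational store (SQLite here), while reviews live in a separate store looked up by a shared product key. The ReviewStore class is only an in-memory stand-in for a NOSQL database, and every name in the example is hypothetical.

```python
import sqlite3


class ReviewStore:
    """In-memory stand-in for a separate NOSQL review database."""

    def __init__(self):
        self._reviews = {}

    def add(self, product_key, review):
        self._reviews.setdefault(product_key, []).append(review)

    def for_product(self, product_key):
        return list(self._reviews.get(product_key, []))


def product_page(conn, reviews, product_id):
    """Combine authoritative stock data with eventually consistent reviews."""
    row = conn.execute(
        "SELECT id, name, stock_quantity FROM product WHERE id = ?",
        (product_id,),
    ).fetchone()
    if row is None:
        return None
    return {
        "id": row[0],
        "name": row[1],
        "in_stock": row[2] > 0,  # must be exact, so it comes from the RDBMS
        "reviews": reviews.for_product(str(row[0])),  # may lag slightly
    }


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
    "stock_quantity INTEGER NOT NULL)"
)
conn.execute("INSERT INTO product VALUES (1, 'widget', 4)")

reviews = ReviewStore()
reviews.add("1", {"stars": 5, "text": "Works as described"})
page = product_page(conn, reviews, 1)
```

A stale or missing review never affects whether an order is accepted, because the stock check only ever reads the relational side.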
