Looking for the right database technology to use? Luckily there are many database technologies to choose from, including relational databases (MySQL, Postgres), NoSQL (MongoDB), columnar databases (Amazon Redshift, BigQuery), and others. Each choice has its own pros and cons, but today let’s walk through how columnar databases are unique, by comparing it against the more traditional row-oriented database (e.g., MySQL).
6. This two-dimensional table would be stored in a row-oriented database like this:
● As you can see, a record’s fields are stored one by one, then the next record’s
fields are stored, then the next, and so on.
8. ● Contrast the previous slide with how a columnar
database would store this data:
● Each field is stored by the column so that each ‘id’ is
stored, then the ‘name’ column, then the ‘zip codes’,
etc.
● So what implications are there when storing data in a
column-oriented fashion?
9. The Advantages of Columnar Databases, and the Disadvantages
● Imagine for example, that you wanted to know the
average age of all your users
● Instead of looking up the age for each record row-
by-row (row-oriented database), you can simply
jump to the area where the “age” data is stored
and read just the data you need.
● So in a nutshell, columnar storage lets you skip
over non-relevant data pretty quickly
10.
11. ● Hence, aggregation queries (queries where you
only need to look up small subsets of your total
data) could become really fast compared to row-
oriented databases .
● Since the data type for each column is similar:
○ Better compression on each column
○ Queries are even faster!
12. Cons
● Querying against a few rows
● Writing new data could take more time
○ You need to write each column one by
one
○ Inserting a record into a row-oriented
database can simply take just one
operation
○ Therefore, loading new data or updating
many values could take much more time.
13. Cons
● This is why you would want a row-oriented
database running the back-end of your web
app, etc.
● Once your app becomes huge, you might
want to consider having a columnar
database
○ like Amazon Redshift!
○ For BI analytics queries!
● We’ve seen many companies that make web
apps or mobile games go through this same
transition.
15. Conclusion
To summarize, columnar databases are good for:
● Queries that involve only a few columns
● Aggregation queries against vast amounts of data
● Column-wise compression
But are not so good at:
● Incremental data loading
● Online transaction processing (OLTP)
● Queries against only a few rows
16. www.flydata.com
Check us out!
-> http://flydata.com
sales@flydata.com
Toll Free: 1-855-427-9787
http://flydata.com
We are an official data integration
partner of Amazon Redshift