SQL vs NoSQL deep dive

As we walked through in the previous topic, I assume you have now built your team
and that it includes someone versed in both SQL and NoSQL, who can get real value
out of this article.
Caution: This article is not intended for beginners or amateurs.
Warm Up:
I will start with a warm-up example before we dive into the comparison.
The following figure is a simple demonstration comparing an RDBMS table with a simple
graph-structured database for friendship relations, as introduced by Healey. On the left
the data is schemaless; the right shows how it can be extended to a conventional
structured schema.
Before diving deep, let us also review the basics with a simple example of each
representation.
Schemaless simply means that two documents (a NoSQL data structure we will
discuss) can have different fields, or common fields that store different types of data,
as in this example:
var cars = [
{ Model: "BMW", Color: "Red", Manufactured: 2016 },
{ Model: "Mercedes", Type: "Coupe", Color: "Black", Manufactured: "1-1-2017" }
];
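To make the consequence of this concrete, here is a minimal sketch of how application code might cope with the two documents above. The helper `manufacturedYear` and the `"unknown"` default are my own illustrative assumptions, not part of any database API.

```javascript
// Hypothetical helper: extract a four-digit year whether the field holds
// a number (2016) or a date-like string ("1-1-2017").
function manufacturedYear(doc) {
  const value = doc.Manufactured;
  if (typeof value === "number") return value;
  if (typeof value === "string") {
    const match = value.match(/\d{4}/);
    return match ? Number(match[0]) : null;
  }
  return null; // field missing or of an unexpected type
}

const cars = [
  { Model: "BMW", Color: "Red", Manufactured: 2016 },
  { Model: "Mercedes", Type: "Coupe", Color: "Black", Manufactured: "1-1-2017" }
];

// The reader of schemaless data has to handle both shapes itself.
const years = cars.map(manufacturedYear);
const types = cars.map((car) => car.Type ?? "unknown"); // Type is optional

console.log(years, types);
```

The point is that the flexibility of a schemaless store moves type checking from the database into the application.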
The basic difference in representation between common SQL and NoSQL technologies
is SQL's support for JOIN: combining two tables is as simple as this.
SELECT Orders.OrderID, Customers.Name, Orders.Date
FROM Orders
INNER JOIN Customers
ON Orders.CustID = Customers.CustID
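For contrast, here is a rough sketch of what the same inner join looks like when the store offers no JOIN and the application has to combine two collections itself. The sample data and the function name `innerJoinOrders` are assumptions made up for illustration.

```javascript
// Hypothetical sample collections mirroring the Orders and Customers tables.
const customers = [
  { CustID: 1, Name: "Alice" },
  { CustID: 2, Name: "Bob" }
];
const orders = [
  { OrderID: 100, CustID: 1, Date: "2017-01-05" },
  { OrderID: 101, CustID: 2, Date: "2017-01-06" },
  { OrderID: 102, CustID: 3, Date: "2017-01-07" } // no matching customer
];

// Application-side inner join: build a lookup on CustID, then keep only
// the orders whose customer exists.
function innerJoinOrders(orderList, customerList) {
  const byId = new Map(customerList.map((c) => [c.CustID, c]));
  return orderList
    .filter((o) => byId.has(o.CustID))
    .map((o) => ({ OrderID: o.OrderID, Name: byId.get(o.CustID).Name, Date: o.Date }));
}

const joined = innerJoinOrders(orders, customers);
console.log(joined);
```

Order 102 is dropped, just as an INNER JOIN would drop it.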
One major difference between NoSQL and SQL is that NoSQL databases are generally
easier to scale horizontally than SQL databases. MongoDB, for example, has built-in
support for replication and sharding (horizontal partitioning of data) to support this
scalability. SQL databases, on the other hand, are harder to scale out, but they ensure
very high consistency and support fault tolerance, built-in journaling, and transaction
management.
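To give a feel for what horizontal partitioning means, here is a minimal sketch of hash-based sharding. It is a generic illustration under my own assumptions (a simple string hash and a fixed shard count), not MongoDB's actual sharding implementation.

```javascript
// Simple deterministic string hash, kept as an unsigned 32-bit integer.
function hashKey(key) {
  let hash = 0;
  for (const ch of String(key)) {
    hash = (hash * 31 + ch.codePointAt(0)) >>> 0;
  }
  return hash;
}

// Route a key to one of shardCount partitions.
function shardFor(key, shardCount) {
  return hashKey(key) % shardCount;
}

// Distribute a list of keys over the shards.
function distribute(keys, shardCount) {
  const shards = Array.from({ length: shardCount }, () => []);
  for (const key of keys) {
    shards[shardFor(key, shardCount)].push(key);
  }
  return shards;
}

const shards = distribute(["user:1", "user:2", "user:3", "user:4"], 3);
console.log(shards);
```

Each key always lands on the same shard, so reads and writes for that key touch a single node; adding nodes spreads the keys further.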

This sounds simple so far, but I assume you have now refreshed your skill set and we
can speak more technically. Let us look at this from the point of view of compaction.
The NoSQL compaction algorithms recently proposed by Ghosh and Gupta show that it is
very challenging to handle the continuous generation of sstables (sorted string tables),
which are files of key-value string pairs sorted by key. The continuous generation of
sstables at a server over time causes a read operation to contact multiple sstables,
creating a disk I/O bottleneck for reads, so reads are slower than writes in such NoSQL
databases. For that reason, NoSQL systems run compaction protocols in the
background. Compaction merges multiple sstables into a single sstable by merge-sorting
the keys, and finding an optimal order for these merges is NP-hard!
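The merge step itself can be sketched in a few lines. This is only an illustration under my own assumptions (sstables as in-memory arrays of `{ key, value }` sorted by key, listed oldest first, with the newest value winning); it is not the algorithm from the cited work and it ignores the hard part, which is choosing the merge order.

```javascript
// Merge two sstables that are each sorted by key.
// On a duplicate key the entry from the newer table wins.
function mergeTwo(older, newer) {
  const out = [];
  let i = 0;
  let j = 0;
  while (i < older.length && j < newer.length) {
    if (older[i].key < newer[j].key) {
      out.push(older[i++]);
    } else if (older[i].key > newer[j].key) {
      out.push(newer[j++]);
    } else {
      out.push(newer[j++]); // same key: keep the newer value
      i++;
    }
  }
  while (i < older.length) out.push(older[i++]);
  while (j < newer.length) out.push(newer[j++]);
  return out;
}

// Compact a list of sstables (oldest first) into a single sstable.
function compact(sstables) {
  return sstables.reduce((merged, table) => mergeTwo(merged, table), []);
}

const compacted = compact([
  [{ key: "a", value: 1 }, { key: "c", value: 3 }],
  [{ key: "b", value: 2 }, { key: "c", value: 30 }],
  [{ key: "a", value: 10 }, { key: "d", value: 4 }]
]);
console.log(compacted);
```

After compaction a read for any key touches one table instead of three, which is exactly the I/O saving compaction is after.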
So, apparently, SQL does not have this complexity deep inside; it has a simpler
implementation, but of course at the cost of scalability.
Scalability:
Now take a deep breath! We are still near the surface, but at least we can now see
the corals, and we have an overview of scalability for both. To discuss this further, I
will turn to Cattell's proposed classification of the different data stores.
Data store type: Key-value store
  Use case: Simple applications with only one kind of object, where you only need
    to look up objects based on one attribute.
  Example: Facebook's user home page live updates.
  Hints/Recommendations: Developer familiarity with memcached is recommended.
    Move to a document store if you intend to make lookups based on multiple
    attributes.

Data store type: Document store
  Use case: Multiple different kinds of objects.
  Example: A Department of Motor Vehicles application (with vehicles and drivers),
    where you need to look up objects based on multiple fields (say, a driver's
    name, license number, owned vehicle, or birth date).
  Hints/Recommendations: Use it when you can tolerate an "eventually consistent"
    model with limited atomicity and isolation. A "quorum-read" mechanism is
    recommended for up-to-date, atomically consistent data.

Data store type: Extensible record store
  Use case: Higher throughput and stronger concurrency, at the cost of slightly more
    complexity than a document store.
  Example: eBay-style applications: partitioning data both vertically and
    horizontally for storing customer information.
  Hints/Recommendations: HBase or Hypertable is used for this partitioning, making
    it easily extensible.

Data store type: Scalable RDBMS
  Use case: Using ACID semantics to free developers from dealing with locks,
    out-of-date data, update collisions, and consistency.
  Example: Applications that do not demand updates or joins that span many nodes.
  Hints/Recommendations: Use clustered SQL systems (MySQL Cluster, VoltDB, and
    Clustrix), as they were benchmarked for improved scalability.
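The "quorum-read" hint in the document store entry deserves a small illustration. The sketch below is a toy simulation under my own assumptions (N = 3 in-memory replicas, a write acknowledged by the first W replicas, a read contacting the last R replicas); the function names are hypothetical. The rule it demonstrates is that a read is guaranteed to see the latest write when R + W > N.

```javascript
// Create n empty replicas, each a Map of key -> { value, version }.
function makeReplicas(n) {
  return Array.from({ length: n }, () => new Map());
}

// Simulate a write that reaches only the first w replicas.
function write(replicas, key, value, version, w) {
  for (const replica of replicas.slice(0, w)) {
    replica.set(key, { value, version });
  }
}

// Simulate a read that contacts the last r replicas and keeps
// the entry with the highest version among the responses.
function read(replicas, key, r) {
  let newest = null;
  for (const replica of replicas.slice(replicas.length - r)) {
    const entry = replica.get(key);
    if (entry && (newest === null || entry.version > newest.version)) {
      newest = entry;
    }
  }
  return newest ? newest.value : undefined;
}

const replicas = makeReplicas(3);
write(replicas, "driver:42", "old address", 1, 3); // reached all replicas
write(replicas, "driver:42", "new address", 2, 2); // reached only two

const quorumResult = read(replicas, "driver:42", 2); // R + W = 4 > N = 3
const singleResult = read(replicas, "driver:42", 1); // R + W = 3, may be stale
console.log(quorumResult, singleResult);
```

With R = 2 the read set must overlap the write set, so the newest version is always among the responses; with R = 1 the read can hit the one replica that missed the update.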
Now it is important to know the benchmarking KPIs for this scalability assessment.
The benchmarking pivots on three main axes: concurrency, data storage, and
replication. Concurrency covers the locking mechanism, multi-version concurrency
control, and ACID. Data storage checks whether the store is in memory (RAM-based)
or disk-based. Replication is split into two types: synchronous and asynchronous.
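Of the concurrency mechanisms listed, multi-version concurrency control is the least obvious, so here is a toy sketch of the idea. The class `VersionedStore` and its logical clock are my own simplifications, not the implementation of any particular database: every write appends a new version, and a reader sees only the versions that existed when it took its snapshot.

```javascript
// Toy multi-version store: writes never overwrite, they append versions.
class VersionedStore {
  constructor() {
    this.versions = new Map(); // key -> array of { ts, value }, oldest first
    this.clock = 0;            // logical timestamp, incremented per write
  }

  write(key, value) {
    this.clock += 1;
    if (!this.versions.has(key)) this.versions.set(key, []);
    this.versions.get(key).push({ ts: this.clock, value });
    return this.clock;
  }

  // A snapshot is just the timestamp at which the reader started.
  snapshot() {
    return this.clock;
  }

  // Return the newest version that is not newer than the snapshot.
  read(key, snapshotTs) {
    const history = this.versions.get(key) || [];
    for (let i = history.length - 1; i >= 0; i--) {
      if (history[i].ts <= snapshotTs) return history[i].value;
    }
    return undefined;
  }
}

const store = new VersionedStore();
store.write("balance", 100);
const snap = store.snapshot();   // a reader starts here
store.write("balance", 80);      // a concurrent writer updates the value

const seenByReader = store.read("balance", snap);
const seenByNewReader = store.read("balance", store.snapshot());
console.log(seenByReader, seenByNewReader);
```

The earlier reader keeps seeing a consistent value without blocking the writer, which is why this mechanism is often preferred over plain locking.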
From this kind of benchmarking and testing, the conclusion was that clustered SQL
databases show promising performance per node as well as the capacity to perform
at scale. Hence scalable SQL RDBMSs still have a competitive advantage over NoSQL
data stores because of the convenience of higher-level SQL and the ACID properties,
but you are about to lose this advantage if your operations span many nodes.
NoSQL Characteristics:
NoSQL databases are usually distinguished from SQL databases in terms of the three
guarantees of the CAP theorem, formulated by computer scientist Eric Brewer, which
states that “it is impossible for a distributed computer system to simultaneously
provide more than two out of three of the following guarantees:
Consistency. Availability. Partition tolerance.”

Indexing:
“I'm trying to create indexes on a table with 308 million rows. It took ~20 minutes to load the
table but 10 days to build indexes on it.” (MySQL bug #9544)
“Select queries were slow until I added an index onto the timestamp field... Adding the index
really helped our reporting, BUT now the inserts are taking forever.” (a comment on
mysqlperformanceblog.com)
SQL databases typically use the B-Tree index structure, which is well known across almost
all DBMSs. NoSQL databases, on the other hand, use key/value pair index structures or
T-trees. To give a better grasp of the T-tree, the next diagram shows its structure, with
pointers in each T-node.
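In code, the idea of a T-node can be sketched roughly as follows. This is a simplified lookup only, under my own assumptions about the node layout (`keys`, `left`, `right`); a real T-tree also handles insertion, deletion and rebalancing, which I leave out.

```javascript
// Binary search inside one node's sorted key array.
function binarySearch(keys, key) {
  let lo = 0;
  let hi = keys.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (keys[mid] === key) return true;
    if (keys[mid] < key) lo = mid + 1;
    else hi = mid - 1;
  }
  return false;
}

// T-tree lookup: each node holds a sorted array of keys plus two child
// pointers. Go left if the key is below the node's minimum, right if it
// is above the maximum, otherwise the key can only be in this node.
function tTreeSearch(root, key) {
  let node = root;
  while (node) {
    const min = node.keys[0];
    const max = node.keys[node.keys.length - 1];
    if (key < min) node = node.left;
    else if (key > max) node = node.right;
    else return binarySearch(node.keys, key);
  }
  return false;
}

// Hand-built example tree with three T-nodes.
const root = {
  keys: [40, 45, 50],
  left: { keys: [10, 20, 30], left: null, right: null },
  right: { keys: [60, 70, 80], left: null, right: null }
};

console.log(tTreeSearch(root, 20), tTreeSearch(root, 47));
```

Because every node stores several keys, the tree stays shallow and most comparisons happen inside a contiguous array, which suits in-memory databases.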

Some systems, such as MongoDB, use B-Tree indexing with memory-mapped files. Further
contributions came from Otoo, Nimako and Kwofie, who developed more advanced indexing
algorithms such as the O2-Tree for in-memory databases.
In general, column store indexes are not like traditional indexes; they are more like
pre-aggregated statistics. The column store index structure was introduced in SQL Server
2012, and it requires you to specify which fields you want to index.
However, it is fair to say that NoSQL systems do need indexing, as their data is more
fragmented in structure than in SQL databases. Cassandra, for instance, has introduced
secondary indexing on top of its single clustering index. The following diagram by Victoria
Malaya summarizes the difference between SQL Server and MongoDB indexing: SQL Server
indexes at the column level, while MongoDB indexes at the collection level and supports
indexing on any field or subfield of the documents in a MongoDB collection.
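To show what indexing on a subfield amounts to, here is a small sketch that builds a lookup over a dotted field path. It uses plain JavaScript objects and a `Map`; the helpers `getPath` and `buildIndex` and the sample documents are my own assumptions and are not the MongoDB API.

```javascript
// Resolve a dotted path such as "license.state" inside a document.
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((value, part) => (value == null ? undefined : value[part]), doc);
}

// Build an index: field value -> positions of the documents holding it.
function buildIndex(collection, path) {
  const index = new Map();
  collection.forEach((doc, position) => {
    const key = getPath(doc, path);
    if (key === undefined) return; // document lacks this field
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(position);
  });
  return index;
}

// Hypothetical driver documents; the third has no license subdocument.
const drivers = [
  { name: "Ann", license: { number: "A-1", state: "NY" } },
  { name: "Ben", license: { number: "B-2", state: "CA" } },
  { name: "Cy" },
  { name: "Dee", license: { number: "D-4", state: "NY" } }
];

const byState = buildIndex(drivers, "license.state");
console.log(byState.get("NY"));
```

Documents that lack the field are simply absent from the index, which is how a schemaless collection can still be indexed on a nested attribute.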
Business state of the art (Hybrid):
The current state of the art and business needs may call for a hybrid structure that includes
NoSQL alongside SQL. A nice representation of that was introduced by Moniruzzaman.
However, there is a strong trend toward NoSQL, driven by the rigid schemas and lack of
flexibility of SQL, its higher latency and lower performance compared to NoSQL, and its
inability to scale out data with the same power.
The authors of SQLite and CouchDB have proposed UnQL as an attempt to create a hybrid
SQL-NoSQL query language. UnQL is based on SQL, with extensions to query NoSQL data.

Complex use cases of SQL vs NoSQL:
Now I will cover a couple of examples to show the difference in implementation for both.
1- SQL (relational database) vs NoSQL (document and graph databases)
The designs below are for an app built for testing purposes that acts as a social networking
portal. It provides the common social media features (friend grouping, private messaging,
microblogging, rating and writing comments, adding tags to topics...). To give a sense of how
this can look, here is a high-level snapshot of the three proposed designs.
1- Relational database (SQL, PostgreSQL)
2- Document database (NoSQL, MongoDB)
3- Graph database (NoSQL, Neo4j)
Later, many performance test cases were applied, using execution time as the benchmark.
That is out of our scope; the designs are shown here only to demonstrate the idea.
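To give a rough feel for how the three designs differ, here is a tiny sketch of the same friendship data in each shape. The field names and the three sample users are hypothetical, the edges are treated as directed for simplicity, and plain in-memory structures stand in for PostgreSQL, MongoDB and Neo4j.

```javascript
// 1- Relational shape: users and friendships in separate "tables".
const users = [
  { id: 1, name: "Ann" },
  { id: 2, name: "Ben" },
  { id: 3, name: "Cy" }
];
const friendships = [
  { userId: 1, friendId: 2 },
  { userId: 2, friendId: 3 }
];

// 2- Document shape: each user document embeds its friend list.
const userDocs = [
  { _id: 1, name: "Ann", friends: [2] },
  { _id: 2, name: "Ben", friends: [3] },
  { _id: 3, name: "Cy", friends: [] }
];

// 3- Graph shape: adjacency list, node id -> neighbouring node ids.
const graph = new Map([[1, [2]], [2, [3]], [3, []]]);

// Relational: friends are found by scanning (or joining) the link table.
function friendsRelational(userId) {
  return friendships.filter((f) => f.userId === userId).map((f) => f.friendId);
}

// Document: friends come back with the single document that is fetched.
function friendsDocument(userId) {
  const doc = userDocs.find((d) => d._id === userId);
  return doc ? doc.friends : [];
}

// Graph: friends of friends is a two-step walk along the edges.
function friendsOfFriendsGraph(userId) {
  const result = new Set();
  for (const friend of graph.get(userId) || []) {
    for (const fof of graph.get(friend) || []) {
      if (fof !== userId) result.add(fof);
    }
  }
  return [...result];
}

console.log(friendsRelational(1), friendsDocument(1), friendsOfFriendsGraph(1));
```

The relational shape needs a join for every hop, the document shape answers one-hop questions in a single fetch, and the graph shape makes multi-hop traversals natural.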
2- Image processing
With NoSQL databases like MongoDB you can do image processing using Node.js modules such
as sharp, Jimp and many more, and this is the mainstream nowadays. What may be more
interesting is using SQL for image processing; you can use something like PixQL, which is an
SQL-inspired command-line image processing tool.
You can also use pure SQL for image processing, but this is more complicated and you will need
a more advanced design like this one:
These diagrams represent the OLTP transactional layer and the ROLAP cube used to store
information about objects.
An example query to calculate image histogram info from the cube:
SELECT
FK_OBJECTS_ID,
1 AS CHANNEL,
RED AS BRIGHTNESS,
1 AS IMG_AREA,
COUNT(*) AS VALUE
FROM
FACT_IMAGES
GROUP BY
FK_OBJECTS_ID,
RED
UNION ALL
SELECT
FK_OBJECTS_ID,
2 AS CHANNEL,
GREEN AS BRIGHTNESS,
1 AS IMG_AREA,
COUNT(*) AS VALUE
FROM
FACT_IMAGES
GROUP BY
FK_OBJECTS_ID,
GREEN
UNION ALL
SELECT
FK_OBJECTS_ID,
3 AS CHANNEL,
BLUE AS BRIGHTNESS,
1 AS IMG_AREA,
COUNT(*) AS VALUE
FROM
FACT_IMAGES
GROUP BY
FK_OBJECTS_ID,
BLUE
UNION ALL
SELECT
FK_OBJECTS_ID,
4 AS CHANNEL,
ALPHA AS BRIGHTNESS,
1 AS IMG_AREA,
COUNT(*) AS VALUE
FROM
FACT_IMAGES
GROUP BY
FK_OBJECTS_ID,
ALPHA
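For comparison, here is roughly what the same histogram computation looks like in application code on the NoSQL side. I am assuming, based on the query, that FACT_IMAGES holds one row per pixel with RED, GREEN, BLUE and ALPHA columns; the three sample rows are made up for illustration.

```javascript
// Hypothetical pixel rows shaped like the FACT_IMAGES table.
const factImages = [
  { FK_OBJECTS_ID: 1, RED: 255, GREEN: 0, BLUE: 0, ALPHA: 255 },
  { FK_OBJECTS_ID: 1, RED: 255, GREEN: 128, BLUE: 0, ALPHA: 255 },
  { FK_OBJECTS_ID: 1, RED: 0, GREEN: 128, BLUE: 64, ALPHA: 255 }
];

// Channel numbers follow the query: 1 = RED, 2 = GREEN, 3 = BLUE, 4 = ALPHA.
const CHANNELS = ["RED", "GREEN", "BLUE", "ALPHA"];

// Count pixels per (object, channel, brightness), like the four
// GROUP BY branches combined with UNION ALL.
function histogram(rows) {
  const counts = new Map();
  for (const row of rows) {
    CHANNELS.forEach((name, i) => {
      const key = `${row.FK_OBJECTS_ID}|${i + 1}|${row[name]}`;
      counts.set(key, (counts.get(key) || 0) + 1);
    });
  }
  return [...counts].map(([key, value]) => {
    const [objectId, channel, brightness] = key.split("|").map(Number);
    return {
      FK_OBJECTS_ID: objectId,
      CHANNEL: channel,
      BRIGHTNESS: brightness,
      IMG_AREA: 1,
      VALUE: value
    };
  });
}

const hist = histogram(factImages);
console.log(hist);
```

The SQL version lets the database do the grouping, while here the application does it in one pass over the pixels.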
CRM Application:
Another application that uses both is a company that splits its CRM across both database
types: the product info and related data in a NoSQL database, and the CRM data in an SQL
database. This topic opens up a lot of discussion, as there is a big debate about this data
bridging: dividing the same application's data between SQL and NoSQL creates a gap in terms
of data access.
Conclusion:
Now that we have explored the corals, you should have clearly noticed that favoring
SQL over NoSQL, or using both alongside each other, depends on your business
context and on how you use the resources you have. Every application and every
business demand has its own feasibility study and criteria. So there is no battle between
SQL and NoSQL; it is a harmony!
