The Rise of NoSQL Databases

A study into the rise of NoSQL Technologies
James Ngondo
BSc (Hons) Software Development
Department of Science and Computing,
Galway-Mayo Institute of Technology, Ireland

ABSTRACT
As data generation has increased sharply over the years, data storage and retrieval have become a
major concern, since relational databases were not designed to cope with big data produced at
very high rates by web applications that store digital photos and videos, and by other devices
such as GPS-enabled cell phones that report geolocation.
This paper examines the new generation of Not only SQL (NoSQL) databases that have arisen to cope
with huge volumes of user data, products and objects that need to be stored and retrieved
concurrently and at relatively high speed. These databases are considered flexible enough to cope
with big data because they are schema-less [4][14]. There are many types of NoSQL databases, with
varying performance. We review some of the most popular NoSQL databases, such as MongoDB,
Cassandra and HBase, and compare them in terms of performance, based on workloads and on the time
required to perform search, update, delete and insert operations.
Key Words: Big Data, MongoDB, NoSQL, Zettabyte (ZB), performance, Cassandra, HBase.
1. Introduction
A database can be briefly described as a collection of related data that is stored in such a way
that it can be easily retrieved [10]. A Database Management System (DBMS) is a software
application that handles the way data is stored and allows the user to interact with and maintain
a database [5][10].
For the past 40 years, relational databases have been widely used by application developers and
have become an integral part of data storage and retrieval in the technology industry. Relational
databases store data in tables with rows and columns, and a DBMS manages how data is stored and
retrieved and also allows users to perform transactions [5]. A transaction in a DBMS is a series
of actions, performed by a single user, that can alter the database, and it has the following
A.C.I.D. properties: a) Atomicity - a transaction either happens completely or does not happen at
all, b) Consistency - a transaction transforms the database from one consistent state to another,
c) Isolation - transactions execute independently of one another, d) Durability - once a
transaction is committed, its effects stay permanently in the database [7]. In spite of all these
properties, horizontal scaling of relational databases has been a challenging and more or less
impossible task [8].
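The atomicity property described above can be illustrated with a small sketch. It uses Python's standard sqlite3 module as a stand-in for a relational DBMS; the accounts table, the transfer function and the sample balances are hypothetical and exist only for this example.

```python
import sqlite3

# In-memory relational database with a rule that balances may not go negative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as a single transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit was undone too.
        return False


def balances(conn):
    """Return the current balances as a dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
```

A transfer that would overdraw the source account fails as a whole: the credit that was already applied to the destination is rolled back, so the database moves from one consistent state to another or does not change at all.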
As a database grows in size or the number of users multiplies, many RDBMS-based sites suffer
serious performance issues [6]. Researchers have also found that changes in application
development and technology infrastructure have led developers to seek alternative database
technologies that can cope with modern web applications and increased volumes of user data and
objects while, at the same time, making full use of the cheap processing power and storage that is
available today. Research has also found that developers are exploring database technologies that
can cope with the agility and scale-out challenges faced by modern applications [1].
NoSQL Databases
Not only SQL (NoSQL) databases have proven to be a solution to what is known as Big Data, as they
follow a schema-less data model and hence provide increased scalability and flexibility compared
to relational databases. In recent years, developers and organizations have experienced a sharp
rise in the volume of user data and products that have to be stored in databases [1]. NoSQL
databases are widely used to store and retrieve very large amounts of data using a key-value
format [15]. These types of databases have emerged as some of the best choices for modern mobile
and web development.
This literature review describes the architecture of NoSQL databases in contrast to the
well-known traditional relational databases. It also addresses some of the most commonly asked
questions: why so many companies and organizations are now opting for NoSQL databases over
relational databases, and how these databases have managed to cope with the rapidly rising volumes
of user data that have to be stored. The review also considers some of the well-known NoSQL
databases: OrientDB and MongoDB, which are Document Store databases; HBase and Cassandra, which
are Column Family databases; Redis, a Key-Value Store database; and Neo4j, a Graph database. This
report also sheds light on how these NoSQL databases have dealt with the challenges that face
modern web and mobile development in terms of scalability and agility. A comparison between the
NoSQL and SQL (relational) database architectures is drawn throughout this review.
2. Literature Review
The literature review drew on different sources that are directly relevant to the topic under
review, including the following:
• academic journal papers published online
• books
• database technology abstracts
• research papers and reports
• Internet publications
EMC, one of the best-known global storage companies, sponsored a study which found that 2.8 ZB
(zettabytes) of data was created and replicated in 2012, that the total amount of data will double
every two years thereafter, and that by 2020 there will be approximately 5,247 GB of data for
every woman, man and child [11][12]. Research suggests that this data will come from data types
such as the following [12]:
• Consumer images and products. These are images that people and organizations have posted on
websites or social media sites.
• Embedded devices. Devices such as sensors, RFID tags, smart meters and trackers generate data
that has to be captured and accessed in real time.
• Surveillance footage. Video footage captured for crime investigation and military
intelligence.
Evidence from the sources cited in this review shows that the aforementioned data types are
placing high data demands on organizations and businesses, such that they are constantly
researching and investing in database technologies that deal with big data in a timely manner and
hence are moving from traditional relational databases to NoSQL databases.
NoSQL databases are proving able to address the challenges and shortcomings that relational
databases face due to increased volumes of user data and modern web applications [3].
A large number of organizations and companies have implemented NoSQL database technologies
because of their improved flexibility, scalability and performance [22].
Table 1. Examples of major companies implementing NoSQL databases
Company Name   NoSQL Name                     NoSQL Storage Type
Adobe          HBase                          Column
Amazon         Dynamo | SimpleDB              Key-Value | Document
BestBuy        Riak                           Key-Value
eBay           Cassandra | MongoDB            Column | Document
Facebook       Cassandra | Neo4j              Column | Graph
Google         BigTable                       Column
LinkedIn       Voldemort                      Key-Value
Lots Of Words  CouchDB                        Document
MongoHQ        MongoDB                        Document
Mozilla        HBase | Riak                   Column | Key-Value
Netflix        SimpleDB | HBase | Cassandra   Document | Column | Column
Twitter        Cassandra                      Column
Source: Fidelis Cyber Security - 2014
Michael Stonebraker [13]: In his article, he discusses the flexibility of NoSQL databases. His
argument is that data should not have to conform to an inflexible relational schema; hence, there
is no need to be bound by the structures of a Relational Database Management System (RDBMS).
Kota Tsuyuzaki, Makoto Onizuka [16]: According to the authors, scalability in NoSQL databases
refers to how the performance of a NoSQL database changes as more machines are added to the
cluster. If machines are added to the cluster in a distributed manner and performance improves,
then the NoSQL database is said to be scalable. In this regard, data is distributed over a large
number of high-performance machines, which spreads the processing load and hence contributes to
improved performance. When new machines are added to a cluster, NoSQL databases automatically
distribute data to those machines. This has become a common feature of NoSQL databases.
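One common way to redistribute data when a machine joins a cluster is consistent hashing. The sketch below is only an illustration of that general technique; the source does not state which scheme any particular database uses, and the HashRing class, the node names and the replicas parameter are assumptions made for this example.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), replicas=64):
        self.replicas = replicas  # virtual points placed on the ring per machine
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> machine name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        """Place a new machine on the ring at `replicas` positions."""
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def node_for(self, key):
        """Return the machine that owns `key`: the next point clockwise."""
        if not self._points:
            raise LookupError("ring has no nodes")
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

With this layout, adding a machine moves only the keys that now fall on the new machine's ring positions; every other key stays where it was, which keeps the amount of data shipped between machines small.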
3. Categories of NoSQL Databases
Veronika Abramova, Jing Han and others have described four categories of NoSQL databases, which
are based on different optimizations [15][17]:
3.1 Key-Value Store
In this category, data is stored in the form of unique key-value pairs, which simply means that
each key corresponds to a value. This structure is sometimes known as a "hash table". Data is
usually retrieved by using a key to access its value. These databases support high volumes of
data, and key-based lookups are relatively fast compared to queries in relational databases. They
also provide a mechanism for high concurrency. With this data model, a large volume of data can be
easily mapped into physical memory, so these databases are well suited to stock and product
management and to real-time data analysis, because of their ability to retrieve data at high speed
by key.
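The key-value model can be sketched in a few lines. This is a minimal illustration built on a Python dictionary (itself a hash table); the KeyValueStore class and the product keys are hypothetical and do not correspond to the API of any particular database.

```python
class KeyValueStore:
    """Minimal in-memory key-value store backed by a hash table."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        """Store `value` under `key`, replacing any earlier value."""
        self._data[key] = value

    def get(self, key, default=None):
        """Look a value up directly by its key."""
        return self._data.get(key, default)

    def delete(self, key):
        """Remove `key`; return True if it was present."""
        if key in self._data:
            del self._data[key]
            return True
        return False


# Example: stock levels looked up by product identifier.
stock = KeyValueStore()
stock.put("product:1001", {"name": "keyboard", "quantity": 12})
stock.put("product:1002", {"name": "mouse", "quantity": 40})
```

Every operation goes through the key, which is what makes lookups fast; the trade-off is that the store cannot answer questions about the values (for example, "all products with fewer than 20 items") without scanning every entry.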
3.1.1 Redis
Redis was reviewed by Jing Han, Haihong E, Guan Le and Jian Du in their paper as a good example
of the Key-Value data model, because the entire data set is loaded into memory when Redis runs.
As a result, all database operations run in memory, and the data is then saved to the hard disk
asynchronously. This means that read and write operations are handled at high speed, which
contributes to high performance when dealing with small amounts of data. Redis is not well suited
to big data, since its operations are limited by physical memory [17].
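The pattern described above, serving all operations from memory and saving to disk asynchronously, can be sketched as follows. This is not how Redis is implemented; it is a simplified illustration in which the SnapshottingStore class, the JSON file format and the background thread are assumptions made for the example.

```python
import json
import os
import tempfile
import threading


class SnapshottingStore:
    """In-memory store that writes snapshots to disk in the background."""

    def __init__(self, path):
        self._path = path
        self._data = {}
        self._lock = threading.Lock()
        if os.path.exists(path):
            # Load the whole data set into memory at start-up.
            with open(path, encoding="utf-8") as fh:
                self._data = json.load(fh)

    def set(self, key, value):
        with self._lock:
            self._data[key] = value  # served entirely from memory

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

    def save_in_background(self):
        """Copy the current state and write it to disk on another thread."""
        with self._lock:
            snapshot = dict(self._data)
        worker = threading.Thread(target=self._write, args=(snapshot,))
        worker.start()
        return worker

    def _write(self, snapshot):
        tmp = self._path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as fh:
            json.dump(snapshot, fh)
        os.replace(tmp, self._path)  # swap the finished file into place


# Usage: write to a temporary directory, snapshot, then reload from disk.
path = os.path.join(tempfile.mkdtemp(), "snapshot.json")
store = SnapshottingStore(path)
store.set("greeting", "hello")
store.save_in_background().join()  # wait for the background write to finish
reloaded = SnapshottingStore(path)
```

Reads and writes never wait for the disk, which is where the speed comes from, but anything written after the last snapshot is lost if the process stops, and the data set can never be larger than the available memory.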
3.2 Document Store
These databases do not have a predefined schema, which makes them more complex than Key-Value
stores. They are more flexible and are designed to support documents in XML, JSON or BSON
(Binary JSON) formats. Document Store databases are a good choice when dealing with huge numbers
of documents [15].
3.2.1 MongoDB
MongoDB is an open-source, non-relational Document Store database developed by 10gen. The name
"mongo" is derived from the word "humongous". It provides high availability, high performance and
automatic scaling, and allows data to be inserted without a predefined schema. A record in MongoDB
is composed of field and value pairs and is similar to a JSON object. The values of fields may
include arrays, other documents and arrays of documents. MongoDB maintains data consistency in the
sense that, after a write operation, subsequent reads retrieve the same value until the next
update. These databases are optimized for read operations. They use a locking mechanism that
contributes to increased execution time as the number of update operations increases [1][9].
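The document model described above can be illustrated without a database server. The sketch below uses plain Python dictionaries to stand in for documents; the collection contents and the find helper are hypothetical and are not the MongoDB driver API.

```python
import json

# Two documents in the same collection with different fields: no schema is
# declared up front. Values may be arrays, nested documents, or arrays of
# documents.
collection = [
    {
        "_id": 1,
        "name": "keyboard",
        "tags": ["usb", "mechanical"],
    },
    {
        "_id": 2,
        "name": "laptop",
        "specs": {"ram_gb": 16, "storage": "ssd"},
        "reviews": [
            {"user": "user-a", "stars": 5},
            {"user": "user-b", "stars": 4},
        ],
    },
]


def find(collection, **criteria):
    """Return the documents whose top-level fields equal the given values."""
    return [
        doc
        for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


# Documents map directly onto JSON text and back.
as_json = json.dumps(collection)
round_tripped = json.loads(as_json)
```

Because each document carries its own structure, a new field can be added to one record without altering the others, which is the flexibility the schema-less model is meant to provide.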
Anam Zahid, Rahat Masood and Muhammad Awais Shibli describe in their paper how MongoDB offers
horizontal scale-out for databases using a technique called sharding. With sharding, data is
distributed across multiple physical partitions known as shards. This was designed to address the
hardware limitations of a single server, which led to bottlenecks in RAM or disk I/O. MongoDB has
sharding functionality built into the database: as the size of the data grows, MongoDB
automatically balances the data across the shards, and it does so again when the size of the
cluster increases or decreases. As a result, the load is balanced dynamically. MongoDB enforces
concurrency control for multiple clients accessing the same database by managing multi-threaded
access to shared objects and data structures [1][18].
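The balancing behaviour described above can be sketched in a simplified form. This is not MongoDB's actual balancer, which applies its own thresholds and migration rules; the balance function, the shard names and the idea of counting equally sized chunks are assumptions made for this illustration.

```python
def balance(shards):
    """Even out chunk counts across shards.

    `shards` maps a shard name to the list of chunk ids it holds. Chunks are
    moved one at a time from the fullest shard to the emptiest until the
    counts differ by at most one. Returns the number of chunks migrated.
    """
    moves = 0
    while True:
        largest = max(shards, key=lambda name: len(shards[name]))
        smallest = min(shards, key=lambda name: len(shards[name]))
        if len(shards[largest]) - len(shards[smallest]) <= 1:
            return moves
        shards[smallest].append(shards[largest].pop())
        moves += 1


# A cluster where all data initially sits on one shard.
cluster = {"shard0": list(range(8)), "shard1": []}
first_round = balance(cluster)

# The cluster grows: a new, empty shard is added and data is rebalanced.
cluster["shard2"] = []
second_round = balance(cluster)
```

The same routine handles both cases mentioned in the text: data that has grown unevenly, and a cluster whose size has changed. In each case the load ends up spread across all available shards.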
3.2.2 CouchDB
CouchDB is a database that is well suited to the web and stores data as JSON documents. Data can
be accessed from a web browser via HTTP, as CouchDB provides a RESTful-style API, and this makes
it work well with modern mobile and web apps. It also supports incremental replication, which
makes it easy to distribute data. It maintains data consistency by complying with ACID
properties. CouchDB uses JavaScript for its queries and is limited to HTTP requests [17].
3.3 Column Family
Column Family databases store data in grouped columns rather than in rows. These columns are
logically grouped together into column families, which may contain a virtually unlimited number
of columns created at runtime. CRUD operations are performed on columns rather than rows.
Examples of Column Family databases are HBase, Yale University's HadoopDB, Facebook's Cassandra,
Hypertable, Google's Bigtable and Yahoo's PNUTS [17].
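The column family layout can be sketched as nested maps: row key, then column family, then column name. This is a minimal in-memory illustration; the ColumnFamilyTable class and the sample families and columns are hypothetical and do not reflect the API of HBase or Cassandra.

```python
from collections import defaultdict


class ColumnFamilyTable:
    """Rows keyed by row key; each row groups columns into fixed families."""

    def __init__(self, families):
        self._families = frozenset(families)  # families are fixed up front
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        """Write one cell; the column itself may be new at runtime."""
        if family not in self._families:
            raise KeyError(f"unknown column family: {family}")
        self._rows[row_key][family][column] = value

    def get(self, row_key, family, column, default=None):
        """Read one cell without creating missing rows or families."""
        return self._rows.get(row_key, {}).get(family, {}).get(column, default)

    def columns(self, row_key, family):
        """List the columns that this row actually has in a family."""
        return sorted(self._rows.get(row_key, {}).get(family, {}))


users = ColumnFamilyTable(families=["profile", "activity"])
users.put("user:1", "profile", "name", "Example User")
users.put("user:1", "activity", "login:2015-10-01", "web")
users.put("user:1", "activity", "login:2015-10-02", "mobile")
users.put("user:2", "profile", "name", "Another User")
```

Each row holds only the columns that were written for it, so two rows in the same family can have entirely different columns, and new columns appear simply by being written.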
3.3.1 Cassandra
Column-oriented databases such as Cassandra have proven to be highly scalable and consistent.
Cassandra is a distributed database system that was designed to manage huge volumes of
structured data spread across multiple server clusters deployed in different geographical
locations. The design and implementation of Cassandra are relatively similar to those of an
RDBMS; they differ in the degree of control over data structure, as Cassandra offers a simple
data model. Cassandra also takes advantage of cheap commodity servers and manages high read and
write throughput. This helps to cut costs and increase business value.
The design goals of Cassandra have largely been achieved. Several companies have adopted and
benefited from Apache Cassandra, including leading ones such as Netflix, Twitter, Cisco, eBay,
Adobe and Comcast [19].
3.4 Graph Databases
Graph databases are suitable for working with highly inter-linked data, for example road maps,
transport routes and social networking sites. Each node of a graph database points directly to
its adjacent nodes, so the database does not need an index over every node of the graph. This is
referred to as index-free adjacency. Social networking sites with heavily related data are best
served by these types of databases. Graph databases are best known for serving the special
purpose of handling relation-heavy data [8][15].
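Index-free adjacency can be illustrated with nodes that hold direct references to their neighbours. The sketch below is a minimal in-memory example; the Node class, the shortest_hops function and the sample social graph are hypothetical and are not the API of any graph database.

```python
from collections import deque


class Node:
    """A graph node that stores direct references to adjacent nodes."""

    def __init__(self, name):
        self.name = name
        self.neighbours = []

    def connect(self, other):
        """Create an undirected relationship between two nodes."""
        self.neighbours.append(other)
        other.neighbours.append(self)


def shortest_hops(start, goal):
    """Breadth-first traversal that follows neighbour references only."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if node is goal:
            return hops
        for nxt in node.neighbours:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None  # no path between the two nodes


# A small social graph: ann - bob - cat, with dan unconnected.
ann, bob, cat, dan = Node("ann"), Node("bob"), Node("cat"), Node("dan")
ann.connect(bob)
bob.connect(cat)
```

The traversal never consults a global index: moving from one person to their friends is a matter of following the references stored on the node, so the cost depends on the part of the graph that is visited rather than on the total number of nodes.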
3.4.1 Neo4j
Neo4j is an open-source NoSQL graph database based on Java. It is considered to be highly
extensible, providing high performance and high reliability when dealing with relationships. A
graph database also provides Create, Read, Update and Delete (CRUD) methods for data management,
and Neo4j is considered to comply with the ACID (Atomicity, Consistency, Isolation and
Durability) properties [20].
4. Analysis
This section analyzes the performance of NoSQL databases and whether they can live up to
expectations. NoSQL databases have proven to be fast and reliable in terms of performance, and
they are also highly scalable compared to traditional relational databases. However, NoSQL
databases have some drawbacks, which are discussed below.
N. Leavitt, Yishan Li and Sathiamoorthy Manoharan [21][22] have noted that some NoSQL databases
are fast for simple tasks but take much longer for complex operations. In addition, NoSQL
databases generally do not offer as high a degree of reliability and consistency, due to a lack of
native support for ACID properties. In other words, there is no guarantee that database
transactions will be processed reliably. The same authors also noted that NoSQL databases are a
new technology; as a result, many organizations are still unfamiliar with them and lack the
knowledge to decide which approach best supports their needs, and customer support tools are
limited [21][22].
Security concerns were also noted by Fidelis Cybersecurity [23], which points to data breaches at
MongoHQ (October 2013) [24] and LinkedIn (June 2012) [25]. These breaches underscore the
importance of NoSQL data security as more and more companies adopt this new family of products.
Although the two incidents were caused by weak encryption of passwords and were not directly
linked to any known NoSQL vulnerability, they point to the fact that NoSQL databases are becoming
targets of attackers who seek valuable information. NoSQL databases may become even more
susceptible to exploits once attackers overcome the learning curve and are able to identify
hidden security or software weaknesses.
Almost all NoSQL databases are still considered to be work-in-progress products. MongoDB's
manual states: "The most effective way to reduce risk for MongoDB is to run your entire MongoDB
deployment in a trusted environment" [18].
5. Conclusion
Today, the popularity of NoSQL databases is increasing because of the huge amounts of data that
are being collected and processed every single moment. These databases have proven better suited
than relational databases to large volumes of semi-structured or unstructured data. Different
types of NoSQL databases have different sets of characteristics and hence also differ in
performance. When deciding which database to use, performance will be the most important factor
to consider. It is therefore necessary to compare different NoSQL databases and their execution
times before drawing a conclusion about performance. In this paper, we reviewed some of the most
popular NoSQL databases: OrientDB and MongoDB, which are Document Store databases; HBase and
Cassandra, which are Column Family databases; Redis, a Key-Value Store database; and Neo4j, a
Graph database.
This report also indicated that some databases depend on the amount of volatile memory in the
computer. This type of storage is much more expensive than disk storage. Document databases such
as MongoDB were designed to address the hardware limitations of a single server, which led to
bottlenecks in Random-Access Memory (RAM) or disk I/O. These databases are optimized for read
operations. They use a locking mechanism that contributes to increased execution time as the
number of update operations increases.
Cassandra and HBase are NoSQL databases that record all the changes that have been performed in
a log. Records are first stored in memory and are subsequently flushed to disk. Data is therefore
written to disk sequentially, and because of this flushing mechanism the number of disk
operations is reduced. Hence these databases are optimized for performing updates, and read
operations become time-consuming compared to Document databases such as MongoDB.
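The write path described above (append to a log, keep recent records in memory, flush them to disk in batches) can be sketched as follows. This is a simplified in-memory illustration and not the implementation of Cassandra or HBase; the LogStructuredStore class, the memtable_limit parameter and the use of Python lists and dictionaries in place of on-disk files are assumptions made for the example.

```python
class LogStructuredStore:
    """Toy write path: commit log, in-memory table, batched sorted flushes."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only record of every change
        self.memtable = {}     # most recent writes, held in memory
        self.flushed = []      # flushed tables, oldest first (stand-in for disk)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        """Log the change, apply it in memory, and flush when full."""
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the in-memory table out as one sorted batch."""
        if self.memtable:
            self.flushed.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key, default=None):
        """Check memory first, then flushed tables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.flushed):
            if key in table:
                return table[key]
        return default


store = LogStructuredStore(memtable_limit=2)
store.put("a", 1)
store.put("b", 2)   # memory table is full, so it is flushed as one batch
store.put("a", 3)   # newer value for "a" now lives only in memory
```

An update is just an append and an in-memory write, which is why this design favours write-heavy workloads. A read may have to look in the memory table and then in several flushed tables before it finds the newest value, which is one reason reads can be slower.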
In a future report, I intend to carry out experiments on these NoSQL databases in order to
compare and analyze their performance, as this will help me to better understand their behavior
when they run in different parallel and distributed environments.
References
[1] MongoDB Inc.: https://www.mongodb.com/nosql-explained
https://docs.mongodb.org/master/MongoDB-manual-master.pdf
[2] http://www.ukessays.com/essays/information-technology/mongodb-is-new-generation-database-information-technology-essay.php
[3] MongoDB - The Leading NoSQL Database [Online]. Available from:
https://www.mongodb.com/leading-nosql-database [Accessed 10 Oct 2015]
[4] Lawrence, R. (10-13 March 2014) Integration and Virtualization of Relational SQL and
NoSQL Systems Including MySQL and MongoDB
[5] https://docs.oracle.com/javase/tutorial/jdbc/overview/database.html
[6] http://www.zdnet.com/article/rdbms-vs-nosql-how-do-you-pick/
[7] Jim Gray (1981) The Transaction Concept: Virtues and Limitations
[8] Vaish, G. (2013) Getting Started with NoSQL. Birmingham – Mumbai: Packt Publishing.
[9] Eric Redmond, Jim R. Wilson (2012) Seven Databases in Seven Weeks
http://media.pragprog.com/titles/rwdata/intro.pdf
[10] Robert J. Robbins (1995) Database Fundamentals. Available from:
http://www.esp.org/db-fund.pdf
[11] EMC Corporation (2012) Big Data. Available from:
http://www.emc.com/big-data/marketing.htm
[12] EMC Corporation (2015) Big Data. Available from:
http://www.emc.com/leadership/digital-universe/2012iview/big-data-2020.htm
[13] Stonebraker, M. (2010) SQL databases vs. NoSQL databases. Communications of the ACM,
53(4): 10-11.
[14] Couchbase (2013) Making the Shift from Relational to NoSQL [Online]. Available from:
http://www.couchbase.com/sites/default/files/uploads/all/whitepapers/Couchbase_Whitepaper_Transitioning_Relational_to_NoSQL.pdf
[Accessed 16 Aug 2014]
[15] Veronika Abramova, Jorge Bernardino, Pedro Furtado (2014) Which NoSQL Database? A
Performance Overview. Open Journal of Databases (OJDB), Volume 1, Issue 2. Available from:
https://www.ronpub.com/publications/OJDB-v1i2n02_Abramova.pdf
[16] Kota Tsuyuzaki, Makoto Onizuka (2015) NoSQL Database Characteristics and Benchmark
System. NTT Technical Review. Available from:
https://www.nttreview.jp/archive/ntttechnical.php?contents=ntr201212fa3.html
[17] Jing Han, Haihong E, Guan Le, Jian Du (2011) Survey on NoSQL Database. Available from:
http://0-ieeexplore.ieee.org.library.gmit.ie/stamp/stamp.jsp?tp=&arnumber=6106531
[18] MongoDB Inc. "MongoDB Architecture Guide." Internet:
http://info.mongodb.com/rs/mongodb/images/MongoDB_Architecture_Guide.pdf, Mar. 2014.
https://docs.mongodb.org/v2.4/core/security-introduction/
[19] Guoxi Wang, Jianfeng Tang (2012) The NoSQL Principles and Basic Application of
Cassandra Model
http://0-ieeexplore.ieee.org.library.gmit.ie/stamp/stamp.jsp?tp=&arnumber=6394574
[20] MA Yisong, WU Zhigang, GUAN Lin, ZHOU Baorong, and LI Rongrong (2014) Study on the
relationship between transmission line failure rate and lightning information based on Neo4j
http://0-ieeexplore.ieee.org.library.gmit.ie/stamp/stamp.jsp?tp=&arnumber=6993713
[21] Yishan Li and Sathiamoorthy Manoharan (2013) A performance comparison of SQL and
NoSQL databases
http://0-ieeexplore.ieee.org.library.gmit.ie/xpl/articleDetails.jsp?arnumber=6625441&newsearch=true&queryText=A%20performance%20comparison%20of%20SQL%20and%20NoSQL%20databases
[22] N. Leavitt (2010) "Will NoSQL databases live up to their promise?" Computer, vol. 43,
no. 2, pp. 12-14.
[23] Fidelis Cybersecurity (2014) Current Data Security Issues of NoSQL Databases
https://www.fidelissecurity.com/files/NDFInsightsWhitePaper.pdf
[24] "Hosting Service MongoHQ Suffers Major Security Breach That Explains Buffer's Hack
Over the Weekend",
http://techcrunch.com/2013/10/29/hosting-service-mongohq-suffers-major-security-breach-that-explains-buffers-hack-over-the-weekend/
[25] "LinkedIn Suffer Data Breach",
http://www.reuters.com/article/2012/06/06/net-us-linkedin-breach-idUSBRE85511820120606