Comparison of HBase and Cassandra: Two NoSQL Databases
Shantanu Deshpande
x18125514
Abstract:
Recent years have seen rapid growth in the digital world, and this has increased the complexity of data in terms of its volume, velocity and variety, collectively termed Big Data. For instance, social media websites now generate terabytes to petabytes of information on a daily basis, which needs to be collected and effectively managed in real time. Read-write operations are performed at an immense rate, with expectations of ever faster retrieval and loading. Traditional relational (SQL) databases are unable to process this new generation of data because they lack the required scalability, structural flexibility and elasticity. Of late, NoSQL databases have surged in popularity because they are claimed to perform better than traditional systems. Two widely used NoSQL databases are HBase and Cassandra. In this paper, we examine the performance of these two databases and compare the results obtained through different operations on an Ubuntu system.
Keywords: Big Data, SQL, NoSQL, HBase, Cassandra, Ubuntu
1) Introduction:
With the advent of the digital age and the growing number of internet users worldwide, there has been an astounding increase in data across the globe. One example is the Internet of Things (IoT), whose sensors continuously gather data for real-time analysis.
Managing all this data is a complex task and a challenge for the companies that own it and need to process it further. Previously, organizations could maintain their data with relational database management systems. However, as the load kept increasing, processing time rose significantly, query latency became high, the data transmission rate dropped and horizontal scalability proved poor. This raised the cost of data processing, increasing company overheads while still delivering poor performance. To address these drawbacks of relational database systems, NoSQL was introduced about a decade ago. Its characteristics include design simplicity, simpler "horizontal" scaling to clusters of machines and improved performance because of its node-to-node architecture.
Because it uses structured storage, the relational (SQL) database can be regarded as a subset of NoSQL. Moreover, unlike the vertical scaling scheme of traditional databases, NoSQL's horizontal scaling results in lower maintenance costs. (Anon., n.d.)
The four types of NoSQL databases are:
Column - data is stored column by column, so each storage block holds data from only one column. Ex. Cassandra and HBase
Document - the document-oriented system uses each document's internal structure to extract metadata for further optimization. Ex. MongoDB
Graph - a database that represents and stores data as nodes, edges and properties using semantic graph structures. Ex. Neo4j
Key-value - a paradigm for storing, retrieving and managing associative arrays, commonly known as hash tables. Ex. Amazon S3.
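To make the four models concrete, the sketch below shapes one hypothetical user record for each of them. Plain Python dictionaries and lists stand in for the stores; the record, field names and the followers_of helper are illustrative assumptions, not the API of Cassandra, HBase, MongoDB, Neo4j or Amazon S3.

```python
# One hypothetical user record expressed in the four NoSQL data models.
# Plain Python structures stand in for each store; no database API is used.

# Key-value: an opaque value looked up by a single key (a hash table).
key_value = {"user:42": '{"name": "Asha", "city": "Dublin"}'}

# Document: the store understands the value's internal structure.
document = {"_id": 42, "name": "Asha", "city": "Dublin", "tags": ["admin"]}

# Column-family (HBase/Cassandra style): row key -> column family -> column -> value.
column_family = {"42": {"profile": {"name": "Asha", "city": "Dublin"}}}

# Graph: nodes with properties, plus edges that carry their own properties.
nodes = {42: {"name": "Asha"}, 7: {"name": "Ravi"}}
edges = [(42, "FOLLOWS", 7, {"since": 2019})]

def followers_of(node_id):
    """Return the names of nodes that have a FOLLOWS edge pointing at node_id."""
    return [nodes[src]["name"] for src, label, dst, _ in edges
            if label == "FOLLOWS" and dst == node_id]
```

Only the key-value store treats the record as a single opaque value; the other three expose its internal structure to the database in different ways.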
2) Key Characteristics:
2.1 HBase:
HBase is an open-source project built on top of the Hadoop Distributed File System (HDFS). It is a distributed, column-oriented database and is horizontally scalable. HBase is not a relational data store, so it does not support a structured query language like SQL. Much like a traditional database, HBase comprises tables that contain rows and columns, and each table must define an element as its primary key (the row key).
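HBase's logical data model can be pictured as a sorted map: row key, then column family and qualifier, then timestamp. The in-memory class below is a hypothetical sketch of that model, not the HBase client API; it shows that a read returns the newest version of a cell and that a scan visits rows in row-key order.

```python
from collections import defaultdict

class ToyHBaseTable:
    """In-memory sketch of HBase's logical data model (not the HBase API).

    A cell is addressed by (row key, "family:qualifier", timestamp); reads
    return the newest version and scans visit rows in sorted row-key order.
    """

    def __init__(self, families):
        self.families = set(families)    # column families are fixed per table
        self.rows = defaultdict(dict)    # row key -> column -> {timestamp: value}

    def put(self, row, column, value, ts):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row].setdefault(column, {})[ts] = value

    def get(self, row, column):
        versions = self.rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]   # the newest timestamp wins

    def scan(self, start_row, stop_row):
        """Yield row keys in [start_row, stop_row) in sorted order."""
        for row in sorted(self.rows):
            if start_row <= row < stop_row:
                yield row
```

For example, writing "Ann" at timestamp 1 and "Anna" at timestamp 2 to the same cell leaves both versions stored, but a get returns "Anna".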
The key characteristics of HBase are:
• Consistency: HBase provides consistent read-write operations, which makes it suitable for high-speed requirements.
• Sharding: the division of a logical database into smaller, more manageable parts called shards. This reduces I/O time and overhead. A split can be done either automatically, when a threshold size is reached, or manually.
• Atomic read and write: while the system is processing one read or write operation on a row, all other processes are prevented from performing another read or write operation on it. This is known as atomic read/write, and HBase performs it at the row level.
• High availability: HBase supports recovery and failover over both LAN and WAN. At its core is a master server, which handles the metadata for the cluster and monitors the region servers.
• Java API: HBase provides an easy-to-use Java API for clients.
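Row-level atomicity can be illustrated with one lock per row. The class below is a hypothetical sketch, not HBase's implementation: a multi-column update to a row, or a read-modify-write such as a counter increment, holds that row's lock, so concurrent clients never see a half-applied change, while operations on different rows proceed independently.

```python
import threading
from collections import defaultdict

class RowAtomicStore:
    """Sketch of row-level atomicity using one lock per row."""

    def __init__(self):
        self._rows = defaultdict(dict)
        self._locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()   # protects creation of per-row locks

    def _lock_for(self, row):
        with self._guard:
            return self._locks[row]

    def mutate_row(self, row, changes):
        """Apply all column changes to one row as a single atomic step."""
        with self._lock_for(row):
            self._rows[row].update(changes)

    def increment(self, row, column, delta=1):
        """Atomic read-modify-write confined to a single row."""
        with self._lock_for(row):
            current = self._rows[row].get(column, 0)
            self._rows[row][column] = current + delta
            return self._rows[row][column]

    def read_row(self, row):
        """Return a snapshot copy of the row taken under its lock."""
        with self._lock_for(row):
            return dict(self._rows[row])
```

Without the per-row lock, two threads incrementing the same counter could both read the old value and one update would be lost.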
Based on the above characteristics, HBase is well suited to write-heavy workloads and to applications that need quick random access to the available data.
2.2 Cassandra:
Cassandra is an open-source, distributed and decentralized system designed to manage very large amounts of data. It provides a highly available service with no single point of failure.
The key characteristics of Cassandra are:
• Always-on architecture: Cassandra has no single point of failure, which ensures that business-critical applications keep running.
• Flexible data storage: Cassandra can accommodate structured, semi-structured and unstructured data, and it can accommodate changes to data structures as requirements evolve.
• Data distribution: data is replicated across multiple data centres, giving the flexibility to distribute data wherever it is required.
• Elastic scalability: one of Cassandra's key characteristics. The cluster can easily be scaled up or down, since any number of nodes can be added or removed without disruption.
• Fast linear-scale performance: throughput increases as nodes are added, so response times stay low as the cluster grows.
• Tunable consistency: Cassandra offers two types of consistency, strong and eventual, and the level can be chosen per operation. With eventual consistency, the client receives an acknowledgement as soon as the cluster accepts a write, and the replicas converge afterwards. Strong consistency, on the other hand, ensures that any update is propagated to all nodes or machines where the data resides.
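Tunable consistency is often reasoned about with a simple overlap rule: with a replication factor RF, if a read contacts R replicas and a write is acknowledged by W replicas, then R + W > RF guarantees that every read reaches at least one replica holding the latest write. The helper functions below are a minimal sketch of that arithmetic; Cassandra exposes the choice through per-request consistency levels such as ONE, QUORUM and ALL, but the function names here are hypothetical.

```python
def quorum(replication_factor):
    """Number of replicas in a QUORUM: a strict majority of the replicas."""
    return replication_factor // 2 + 1

def is_strongly_consistent(read_replicas, write_replicas, replication_factor):
    """True when every read set must overlap every write set (R + W > RF),
    so a read always reaches at least one replica holding the latest write."""
    return read_replicas + write_replicas > replication_factor
```

With RF = 3, QUORUM reads and writes (2 + 2 > 3) behave as strongly consistent, whereas ONE for both reads and writes (1 + 1 is not greater than 3) is only eventually consistent.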
3) Architecture:
3.1 HBase:
There are three important components in the HBase architecture: HMaster, ZooKeeper and Region Server.
HMaster: the HBase HMaster assigns regions to region servers in the Hadoop cluster for uniform load balancing.
Region Server: region servers are the worker nodes that handle transactional queries such as read, write, update and delete from clients. This process runs on every node within the Hadoop cluster.
ZooKeeper: a centralized coordination and monitoring service. It tracks which region servers are alive, so that when a region server crashes, its regions can be recovered by reassigning them to other working region servers.
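The division of labour above can be sketched as two steps: spreading regions evenly over region servers, and moving a crashed server's regions to the survivors once the failure has been detected (in HBase, ZooKeeper is what notices that a server has gone away). The functions below are a simplified, hypothetical illustration; real HBase balancers take more factors into account than the number of regions per server.

```python
def assign_regions(regions, servers):
    """Round-robin assignment so each server holds a near-equal share of regions."""
    assignment = {server: [] for server in servers}
    for i, region in enumerate(regions):
        assignment[servers[i % len(servers)]].append(region)
    return assignment

def handle_server_crash(assignment, dead_server):
    """Reassign a crashed server's regions, one by one, to the least-loaded survivor."""
    orphaned = assignment.pop(dead_server)
    for region in orphaned:
        target = min(assignment, key=lambda server: len(assignment[server]))
        assignment[target].append(region)
    return assignment
```

Six regions on three servers give two regions each; if one server crashes, its two regions are spread over the two survivors, leaving three regions on each.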
3.2 Cassandra:
Cassandra is designed to handle large data workloads across multiple nodes with no single point of failure. Its architecture assumes that both hardware and system failures do occur. Cassandra addresses failures by using a peer-to-peer distributed system across homogeneous nodes, where data is distributed across all nodes in the cluster. All nodes within a cluster play the same role; each node is interconnected with the other nodes and is also independent.
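Cassandra's peer-to-peer distribution is usually described as a token ring: every node owns a position, each key is hashed onto the ring, and the key belongs to the first node clockwise from its hash, with further replicas on the following nodes. The class below is a minimal sketch under simplifying assumptions: MD5 stands in for Cassandra's default Murmur3 partitioner, each node has a single token instead of many virtual nodes, and replicas are placed on the next distinct nodes clockwise, similar to SimpleStrategy.

```python
import bisect
import hashlib

def token(key):
    """Hash a key onto a 32-bit ring. MD5 is used for simplicity; Cassandra's
    default Murmur3 partitioner would give different positions."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % 2**32

class Ring:
    """Token ring with one token per node (no virtual nodes)."""

    def __init__(self, nodes):
        self._entries = sorted((token(node), node) for node in nodes)

    def add_node(self, node):
        bisect.insort(self._entries, (token(node), node))

    def replicas(self, key, replication_factor):
        """The primary owner is the first node whose token is >= the key's token
        (wrapping around the ring); extra replicas are the next nodes clockwise."""
        tokens = [t for t, _ in self._entries]
        start = bisect.bisect_left(tokens, token(key)) % len(tokens)
        count = min(replication_factor, len(self._entries))
        return [self._entries[(start + i) % len(self._entries)][1]
                for i in range(count)]
```

Adding a node moves only the keys whose tokens fall between the new node's token and its predecessor's, which is why nodes can join without reshuffling the whole cluster.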
Key Structure:
Node: the place where data is stored; the basic infrastructure component of Cassandra.
Datacentre: a collection of related nodes.
Cluster: contains one or more datacentres.
Commit log: every write is first recorded in the commit log. Once the data has been transferred to an SSTable, the commit log is deleted, archived or recycled.
SSTable: a sorted string table (SSTable) is an immutable data file to which Cassandra periodically writes memtables.
CQL Table: a collection of ordered columns fetched by table row. A table is made up of columns and has a primary key.
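The relationship between the commit log, memtables and SSTables can be sketched as a small log-structured store. The class below is a hypothetical simplification: the flush threshold counts keys rather than bytes, SSTables are in-memory tuples rather than files, and compaction, tombstones and timestamps are left out.

```python
import bisect

class ToyWritePath:
    """Sketch of a commit log -> memtable -> SSTable write path."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []      # append-only record of writes, for crash recovery
        self.memtable = {}        # mutable in-memory table
        self.sstables = []        # immutable tables, each a tuple sorted by key
        self.flush_threshold = flush_threshold  # hypothetical: counts keys, not bytes

    def write(self, key, value):
        self.commit_log.append((key, value))   # 1. record the write in the log first
        self.memtable[key] = value             # 2. then apply it in memory
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Freeze the memtable as a sorted, read-only SSTable and recycle the log."""
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()

    def read(self, key):
        """Check the memtable first, then SSTables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            keys = [k for k, _ in table]
            i = bisect.bisect_left(keys, key)   # binary search works because keys are sorted
            if i < len(keys) and keys[i] == key:
                return table[i][1]
        return None
```

Because SSTables are never modified, a newer value for a key simply shadows older ones, which is why reads consult the newest tables first.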
4) Comparison between the two:
For the design of distributed database systems, the CAP theorem made designers aware of the trade-offs that need to be considered beforehand. The theorem applies to distributed systems that store data, and CAP stands for Consistency, Availability and Partition tolerance. The key question is which two of these three properties a database achieves. Here, we compare HBase and Cassandra in terms of their scalability, availability, reliability and security.
Scalability:
A database's scalability is its capacity to handle large volumes of data while maintaining high execution efficiency. HBase is highly scalable because, as the database grows, data is distributed evenly across the table's regions. This is well supported because HBase is modelled on Google's Bigtable, and tables are distributed dynamically in HBase. HBase scales horizontally over its Region Servers, which act as slaves in the cluster. In HBase, the region is the basic unit of horizontal scalability.
Regions are subsets of a table's data: each is a contiguous, sorted range of rows stored together. Initially, a table has only one region. As the number of rows increases and the region becomes too large, it is split into two at its middle key, creating two roughly equal halves.
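The split rule described above can be sketched as follows. The function is a hypothetical simplification: it measures a region by its number of rows, whereas HBase splits regions by size (configured through settings such as hbase.hregion.max.filesize), and it splits on the middle row key of the region.

```python
import bisect

def insert_row(regions, key, max_rows):
    """Insert a row key into the region whose key range covers it, splitting
    that region at its middle key once it holds more than max_rows keys.

    regions: list of sorted key lists, ordered by key range; every region
    except possibly the first (when the table is empty) is non-empty.
    """
    # Start keys of regions 1..n; region 0 covers everything before them.
    starts = [region[0] for region in regions[1:]]
    idx = bisect.bisect_right(starts, key)
    region = regions[idx]
    bisect.insort(region, key)
    if len(region) > max_rows:
        mid = len(region) // 2
        # The middle key becomes the first key of the new second region.
        regions[idx:idx + 1] = [region[:mid], region[mid:]]
    return regions
```

Starting from a single empty region and inserting the keys d, a, f, b, e, c with max_rows=3, the table splits once (at key d) and ends with the two regions [a, b, c] and [d, e, f].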
Cassandra, in turn, is linearly scalable: its capacity increases simply by adding new nodes. Cassandra can be scaled horizontally, by adding more nodes, or vertically, by adding more resources (such as CPU, memory or storage) to existing nodes.
Availability:
Availability of a database means that every request sent to the database receives a response from the system, whether success or failure. It also refers to the accessibility of data even when servers or data nodes in the cluster fail. A highly available database leads to fewer interruptions for clients when a server fails. HBase has a master-slave architecture like HDFS, but it can run more than one HMaster (an active master with backups), so that even if one master fails, data transmission is not halted. This may create some inconsistency in the data but, as explained above, under the CAP theorem it is acceptable for a system to satisfy only two of the three properties. Cassandra, on the other hand, has no master-slave relationship: all nodes are equal and no master node controls the others, which avoids a single point of failure. Cassandra also provides replication, so even if a node within the cluster goes down, one or more copies of its data are available on other machines in the cluster. (Anon., n.d.)
Reliability:
The reliability of a database is measured by how well its deliverables match the defined specifications. A highly reliable system shows the same or better performance even when its environment changes or a fault occurs. In HBase, reliability is supported by ZooKeeper, which keeps cluster state in znodes. When a client sends a request, ZooKeeper is consulted to locate the appropriate Region Servers, and the data is stored across them. Experiments have also shown that HBase's performance efficiency increases as the workload increases, which indicates higher reliability. Cassandra, too, is considered reliable owing to its distributed ring structure and the replication of data across nodes.
Security:
HBase:
The key security features available in HBase, according to (Anon., n.d.), are:
1. Authentication:
To gain secure access to a database, a client must authenticate with the server to establish its credentials. The options for authentication are:
• Client authentication: there are numerous security protocols that allow clients to authenticate with the database. For HBase, they are Kerberos and SSL.
• Server authentication: database servers must also authenticate with each other to ensure a secure operating environment. In HBase, a shared keyfile is one such method.
2. Role Based Security:
Role-based security considerably simplifies the administration and operation of security. HBase offers various security role features, such as custom roles and default roles, to ease administration. It is also important to define the scope of roles, which is useful for systems that handle extremely sensitive data.
3. Database Security:
HBase supports database encryption, and encrypting data is highly important in sensitive application domains. Logging is also essential for recording all activities and client interactions with the system for auditing and detailed investigations. Administrators can define which security groups are logged. HBase supports both fixed event logging and configurable event logging.
Cassandra:
According to (Anon., n.d.), the three main components of the security features provided by Cassandra are:
1. TLS/SSL encryption for client and inter-node communication.
There are two encryption options in Cassandra, client-to-node encryption and node-to-node encryption, which are managed separately and must be configured independently. When encryption is enabled, the JVM defaults for supported protocols and cipher suites are used. These can be overridden in the settings, but this is not recommended unless a specific policy requires certain settings.
2. Client Authentication
Authentication is configured in Cassandra using the ‘authenticator’ setting in cassandra.yaml. By default, the system performs no authentication checks and therefore requires no credentials. The package also includes PasswordAuthenticator, which stores encrypted credentials in a system table.
3. Authorization
Similar to encryption, there are two options for authorization. By default, no checks are performed, so all permissions are granted to all roles. Cassandra also includes CassandraAuthorizer, which provides full permission management functionality and stores the related data in Cassandra system tables.
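The difference between the default authorizer and a permission-managing one comes down to default-allow versus default-deny. The two classes below are a hypothetical sketch of that contrast, not Cassandra's implementation; permission names such as SELECT and MODIFY are borrowed from CQL for readability, and grants live in a dictionary rather than in system tables.

```python
from dataclasses import dataclass, field

class AllowAllAuthorizer:
    """Default-allow: every role may perform every action on every resource."""

    def is_authorized(self, role, permission, resource):
        return True

@dataclass
class RoleBasedAuthorizer:
    """Default-deny: only explicitly granted (permission, resource) pairs pass.
    Grants are kept in a dict here; a real system would persist them."""
    grants: dict = field(default_factory=dict)   # role -> set of (permission, resource)

    def grant(self, role, permission, resource):
        self.grants.setdefault(role, set()).add((permission, resource))

    def is_authorized(self, role, permission, resource):
        return (permission, resource) in self.grants.get(role, set())
```

Under the role-based authorizer, a role granted SELECT on a table can read it but cannot modify it, and a role with no grants can do nothing.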
5) Learnings from Literature Review:
With the development of the Internet and cloud computing, databases need to store and process big data effectively, with high performance for both reading and writing, while traditional relational databases confront many new challenges. (Han, 2011)
Especially in large-scale and highly competitive applications such as search engines and social networking services (SNS), relational databases proved inadequate for storing and querying dynamic user data, and NoSQL databases were created in response. With the exponential growth in global data generation, the demands on database technology grew significantly. These include simultaneous reads and writes with low latency, efficient storage of and access to large data, improved scalability and high availability, and lower operating and management costs. Meeting these demands was a key limitation of traditional relational databases. To overcome it, NoSQL emerged as an alternative paradigm for this new non-relational data schema (Dede, 2013). Although the NoSQL features described above are common, in practice each product conforms differently to the various data models and to the CAP theorem. CAP stands for Consistency, Availability and tolerance of network Partitions, and the core idea of the theorem is that a distributed system cannot meet all three needs simultaneously but only two of them (Han, 2011). Depending on project requirements, different storage systems offer different consistency levels, which lets users choose among trade-offs between availability, latency and consistency (Kumar, 2014). Therefore, to judge which storage type suits a particular application, it is essential to assess the performance of each storage system. In (Abubakar, 2014), the author introduces YCSB, an open-source tool from Yahoo that benchmarks multiple systems and compares them by generating workloads. Distributed systems are often more complicated than their single-node counterparts because of the trade-offs that must be balanced against each application's requirements. The author also attempted to extend YCSB so that it could measure stale reads in real time. The model created in that paper can be used to calculate the trade-offs between availability, latency and consistency.
According to (Dede, 2013), Internet applications are growing rapidly and generating enormous amounts of data. To store such large amounts of data, many organizations use NoSQL database systems like HBase and Cassandra as their storage solution. The author tested the Cassandra database based on its performance, discussing how Cassandra's features, such as replication and data partitioning, affect the performance of Apache Hadoop. A test model is then introduced that tests the system on the basis of its performance while taking its architecture and business into account. Finally, these tests are applied at the architecture level to several performance-related elements, such as the column-oriented data model, the split mechanism and the data replication factor. A test procedure is performed and a test scenario is designed for each performance element. With the continuous development of cloud computing, unstructured data storage is also steadily increasing. The schema evaluation was separated into its own unit, known as a schema analyser, so the schema analyser does not have to rely on web applications and can be connected to visualization tools.
Another study (Tang, 2016) compared the performance of five NoSQL databases, including Cassandra and HBase, using YCSB (Yahoo! Cloud Serving Benchmark). The experiment involved three workloads: Workload A (50% read and 50% write), Workload C (100% read) and Workload H (100% write). Each workload was run on around 10,000 operations out of the 100,000 loaded operations. Two experiments were conducted. The first measured the total time taken by each database across all three workloads. Redis proved superior to the other databases, as it took the least time to load and execute the data: it was 1.43 and 3.61 times faster than Cassandra and HBase respectively. The second experiment measured throughput. Notably, all five databases showed a similar trend in this experiment. Here as well, Redis performed significantly better than the other databases, and Cassandra achieved significantly greater throughput than HBase. The experiments showed that Redis is better at loading and executing the workloads, and this study motivated our own. In the following sections, we examine whether these findings are relevant to the study performed in this paper.
6) Performance Test Plan:
To execute the process and compare the two databases, we first created an instance on a cloud-hosted OpenStack platform. We then assigned a floating IP to this instance to access its Ubuntu system. A key pair was generated, with the authorized keys stored in the ssh directory. Next, we installed Hadoop along with HBase. Before installing Hadoop, we installed Java 8 and created a Hadoop group and a user named hduser. We then disabled IPv6, downloaded and unzipped Hadoop, and gave hduser access to the Hadoop files by creating a symbolic link. The configuration files hadoop-env.sh, core-site.xml and hdfs-site.xml were edited according to the manual. Thereafter, we formatted the NameNode and started the DFS and YARN services.
After Hadoop was successfully installed, we installed HBase. As with Hadoop, we downloaded HBase from its website, unzipped it and created a symbolic link. We then edited the hbase-env.sh file, started HBase and created a user table. YCSB, a benchmarking tool, was then installed by following the lab manual. A test harness had already been created for YCSB, specifying the workload type, operation counts, database type and other parameters, and we updated its files as required. The workload types considered were Workload A and Workload C, with three operation counts, 100000, 150000 and 200000, for both HBase and Cassandra. Workload A is a combination of 50% reads and 50% writes, whereas Workload C is 100% reads. Each test was run three times using the runtest.sh script. Cassandra was then downloaded and installed by following the guidelines in an online manual (Anon., n.d.). The files in the test harness were modified as required for Cassandra and the same tests were performed. After the runs had completed successfully for both HBase and Cassandra, the outputs were averaged.
Device Specifications:
• Sony Vaio Fit 14 SVF14A15SNB
• 8GB RAM
• Intel Core I5 (3rd Generation)
• 1.8 GHz, with Turbo Boost up to 2.7 GHz
• 1TB HDD
Databases:
• HBase
• Cassandra
Workload Type:
• Workload A: 50% read and 50% write
• Workload C: 100% read
Operating Environment:
OpenStack
• Name: m1.medium
• vCPUs: 2
• RAM: 4GB
• Disk size: 40GB
• MSc data-net
7) Evaluation and Results:
Here, we performed two workload tests, Workload A and Workload C, against our two databases, HBase and Cassandra, using YCSB as the benchmarking tool. The test specifications are as follows:
Workload A:
1. Read: 50 %
2. Update: 50 %
Workload C:
1. Read: 100 %
7.1 Workload A Results:
7.1.1 Average Insert latency vs. overall throughput
Database   Run      Workload A count  [OVERALL] Throughput(ops/sec)  [INSERT] AverageLatency(us)
Cassandra  Count 1  100000            1830.161054                    471.04802
Cassandra  Count 2  150000            2207.667967                    405.07366
Cassandra  Count 3  200000            2472.157328                    361.238945
HBase      Count 1  100000            1907.632437                    432.87972
HBase      Count 2  150000            2395.821687                    394.2167
HBase      Count 3  200000            2363.256094                    395.25737
• Here, we compare the average insert latency with the overall throughput.
• Lower latency indicates better database performance.
• HBase has lower insert latency than Cassandra at 100000 and 150000 operations, but as the data size increases to 200000 operations, Cassandra's latency (361 us) drops below that of HBase (395 us).
7.1.2 Average Update Latency vs. Update operations
Database   Run      Workload A count  [UPDATE] Operations  [UPDATE] AverageLatency(us)
Cassandra  Count 1  100000            50118                405.3865677
Cassandra  Count 2  150000            75256                394.719039
Cassandra  Count 3  200000            99707                335.3669451
HBase      Count 1  100000            49972                387.6640519
HBase      Count 2  150000            75369                373.7879101
HBase      Count 3  200000            100005               383.7894005
• Here, we compare the average update latency with the number of update operations.
• As the workload, and hence the number of update operations, increases, the update latency of HBase stays roughly flat (between 374 and 388 us), whereas that of Cassandra decreases significantly, from 405 us to 335 us.
• This shows that Cassandra performs better for update operations when the workload is high: at 200000 operations, its update latency is lower than that of HBase.
7.1.3 Read operations vs. Avg. Read latency
Database   Run      Workload A count  [READ] Operations  [READ] AverageLatency(us)
Cassandra  Count 1  100000            49882              499.271621
Cassandra  Count 2  150000            74744              526.7530905
Cassandra  Count 3  200000            100293             439.4176164
HBase      Count 1  100000            50028              334.307208
HBase      Count 2  150000            74631              314.4751645
HBase      Count 3  200000            99995              325.520106
• Here, we compare the number of read operations with the average read latency.
• These results show that the average read latency of HBase stays consistent (about 314 to 334 us) as the workload increases and is lower than that of Cassandra at every count. For Cassandra, latency rises from count 1 to count 2 and then drops significantly at count 3 (from 527 us to 439 us).
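The comparisons in the bullets above can be checked mechanically. The script below copies the average latencies of Workload A from the three tables in Section 7.1 and reports, for each operation count, which database had the lower latency, together with the relative change from the smallest to the largest count. The function names are illustrative.

```python
# Average latencies (us) for Workload A, copied from the Section 7.1 tables,
# listed for operation counts 100000, 150000 and 200000.
LATENCY_A = {
    "insert": {"Cassandra": [471.04802, 405.07366, 361.238945],
               "HBase":     [432.87972, 394.2167, 395.25737]},
    "update": {"Cassandra": [405.3865677, 394.719039, 335.3669451],
               "HBase":     [387.6640519, 373.7879101, 383.7894005]},
    "read":   {"Cassandra": [499.271621, 526.7530905, 439.4176164],
               "HBase":     [334.307208, 314.4751645, 325.520106]},
}

def lower_latency(op):
    """Name of the database with the lower average latency at each count."""
    series = LATENCY_A[op]
    return ["Cassandra" if c < h else "HBase"
            for c, h in zip(series["Cassandra"], series["HBase"])]

def percent_change(values):
    """Relative change from the first to the last count, in percent."""
    return (values[-1] - values[0]) / values[0] * 100
```

Here lower_latency("insert") and lower_latency("update") both return ["HBase", "HBase", "Cassandra"], lower_latency("read") returns "HBase" at every count, and Cassandra's insert latency falls by about 23% between 100000 and 200000 operations.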
7.2 Workload C
7.2.1 Average Insert latency vs. overall throughput
Database   Run      Workload C count  [OVERALL] Throughput(ops/sec)  [INSERT] AverageLatency(us)
Cassandra  Count 1  100000            2072.023538                    412.90974
Cassandra  Count 2  150000            2202.610828                    406.38106
Cassandra  Count 3  200000            2521.718299                    360.39823
HBase      Count 1  100000            2312.726936                    401.83924
HBase      Count 2  150000            2421.893921                    395.4957067
HBase      Count 3  200000            2276.789272                    410.009585
• Here, we compare the average insert latency with the overall throughput for Workload C.
• Cassandra's insert latency decreases steadily as the workload increases (from 413 us to 360 us). It is slightly higher than that of HBase at 100000 and 150000 operations, but at 200000 operations it is clearly lower, as HBase's latency rises to 410 us.
7.2.2 Average Read Latency(us) vs. Overall Throughput(ops/sec)
Database   Run      Workload C count  [OVERALL] Throughput(ops/sec)  [READ] AverageLatency(us)
Cassandra  Count 1  100000            2058.248431                    416.41129
Cassandra  Count 2  150000            2350.581377                    378.94206
Cassandra  Count 3  200000            2494.636531                    366.025005
HBase      Count 1  100000            3037.667072                    283.80343
HBase      Count 2  150000            3758.833258                    254.5179933
HBase      Count 3  200000            4144.734115                    221.76259
• Here, we compare the average read latency with the overall throughput for Workload C.
• For both HBase and Cassandra, the first count has the highest latency, and as the workload increases, the latency of both databases drops significantly while throughput rises. HBase has lower read latency and higher throughput than Cassandra at every count.
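A useful sanity check on the throughput and latency columns is Little's law: the average number of requests in flight equals throughput multiplied by average latency. If the benchmark ran with a fixed number of client threads (the setup above does not state how many), this product should stay roughly constant across counts, so falling latency goes hand in hand with rising throughput. The sketch below applies the law to the Workload C read figures from the table in Section 7.2.2; the function name is illustrative.

```python
# Workload C read results from Section 7.2.2:
# (overall throughput in ops/sec, average read latency in us) per operation count.
WORKLOAD_C_READ = {
    "Cassandra": [(2058.248431, 416.41129), (2350.581377, 378.94206),
                  (2494.636531, 366.025005)],
    "HBase": [(3037.667072, 283.80343), (3758.833258, 254.5179933),
              (4144.734115, 221.76259)],
}

def requests_in_flight(throughput_ops_per_sec, latency_us):
    """Little's law, L = lambda * W, with latency converted from us to seconds."""
    return throughput_ops_per_sec * latency_us / 1_000_000

for db, rows in WORKLOAD_C_READ.items():
    for count, (tput, lat) in zip((100000, 150000, 200000), rows):
        print(f"{db:9} {count:6}: {requests_in_flight(tput, lat):.3f} requests in flight")
```

Every product comes out between roughly 0.86 and 0.96, which would be consistent with about one outstanding request at a time, although the data alone cannot confirm the client configuration.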
8) Conclusions and Discussion:
In this paper, we have explained the underlying concepts of the HBase and Cassandra databases. We used the Yahoo! Cloud Serving Benchmark (YCSB) to determine which database performs better under different workload scenarios. Each database was given the same operation counts: 100000, 150000 and 200000. We used two types of workloads, A and C: Workload A consists of 50% read and 50% write operations, and Workload C of 100% read operations. Upon visualizing the data in Tableau, we found that the latency behaviour of HBase differs from that of Cassandra. Although latency generally decreases in both databases as the workload increases, the decrease is larger for Cassandra than for HBase. In Workload A, as the number of update operations increases, the average update latency of Cassandra falls below that of HBase. Overall, for higher workloads Cassandra shows lower insert and update latency than HBase, so we recommend Cassandra for higher workload requirements, although HBase delivered lower read latency in both workloads. Finally, all the benchmarking parameters we needed were available in YCSB, so we consider it a useful tool for benchmarking NoSQL databases in a cloud environment.
Bibliography
Abubakar, Y., 2014. Performance Evaluation of NoSQL Systems using YCSB in a Resource Austere Environment. ResearchGate.
Anon., n.d. An Evaluation of Cassandra for Hadoop. [Online]
Available at: http://sci-hub.tw/https://ieeexplore.ieee.org/abstract/document/6676732
[Accessed 2019].
Anon., n.d. Cassandra. [Online]
Available at: https://www.rapidvaluesolutions.com/tech_blog/cassandra-the-right-data-store-for-scalability-performance-availability-and-maintainability/
[Accessed 2019].
Anon., n.d. Cassandra Installation. [Online]
Available at: https://www.vultr.com/docs/how-to-install-apache-cassandra-3-11-x-on-ubuntu-16-04-lts
[Accessed 2019].
Anon., n.d. Cassandra-Security. [Online]
Available at: http://cassandra.apache.org/doc/latest/operating/security.html
[Accessed 2019].
Anon., n.d. HBase security features. [Online]
Available at: https://quabase.sei.cmu.edu/mediawiki/index.php/HBase_Security_Features
[Accessed 2019].
Dede, E., 2013. An Evaluation of Cassandra for Hadoop. IEEE.
Han, J., 2011. Survey on NoSQL database. IEEE.
Kumar, S. P., 2014. Evaluating consistency on the fly using YCSB. IEEE.
Tang, E., 2016. Performance Comparison between Five NoSQL Databases. IEEE.

cassandra
Akash R
 
What is Scalability and How can affect on overall system performance of database
What is Scalability and How can affect on overall system performance of databaseWhat is Scalability and How can affect on overall system performance of database
What is Scalability and How can affect on overall system performance of database
Alireza Kamrani
 
04-Introduction-to-CassandraDB-.pdf
04-Introduction-to-CassandraDB-.pdf04-Introduction-to-CassandraDB-.pdf
04-Introduction-to-CassandraDB-.pdf
hothyfa
 
Benchmarking Scalability and Elasticity of DistributedDataba.docx
Benchmarking Scalability and Elasticity of DistributedDataba.docxBenchmarking Scalability and Elasticity of DistributedDataba.docx
Benchmarking Scalability and Elasticity of DistributedDataba.docx
jasoninnes20
 
Comparative study of no sql document, column store databases and evaluation o...
Comparative study of no sql document, column store databases and evaluation o...Comparative study of no sql document, column store databases and evaluation o...
Comparative study of no sql document, column store databases and evaluation o...
ijdms
 
2.Introduction to NOSQL (Core concepts).pptx
2.Introduction to NOSQL (Core concepts).pptx2.Introduction to NOSQL (Core concepts).pptx
2.Introduction to NOSQL (Core concepts).pptx
RushikeshChikane2
 
Performance Analysis of HBASE and MONGODB
Performance Analysis of HBASE and MONGODBPerformance Analysis of HBASE and MONGODB
Performance Analysis of HBASE and MONGODB
Kaushik Rajan
 
Vskills Apache Cassandra sample material
Vskills Apache Cassandra sample materialVskills Apache Cassandra sample material
Vskills Apache Cassandra sample material
Vskills
 
Nosql Presentation.pdf for DBMS understanding
Nosql Presentation.pdf for DBMS understandingNosql Presentation.pdf for DBMS understanding
Nosql Presentation.pdf for DBMS understanding
HUSNAINAHMAD39
 
Data management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunitiesData management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunitiesEditor Jacotech
 
A request skew aware heterogeneous distributed
A request skew aware heterogeneous distributedA request skew aware heterogeneous distributed
A request skew aware heterogeneous distributedJoão Gabriel Lima
 
Uint-5 Big data Frameworks.pdf
Uint-5 Big data Frameworks.pdfUint-5 Big data Frameworks.pdf
Uint-5 Big data Frameworks.pdf
Sitamarhi Institute of Technology
 
Uint-5 Big data Frameworks.pdf
Uint-5 Big data Frameworks.pdfUint-5 Big data Frameworks.pdf
Uint-5 Big data Frameworks.pdf
Sitamarhi Institute of Technology
 
Deep semantic understanding
Deep semantic understandingDeep semantic understanding
Deep semantic understanding
sidra ali
 
No SQL introduction
No SQL introductionNo SQL introduction
No SQL introduction
surabhi_dwivedi
 
A NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRA
A NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRAA NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRA
A NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRA
ijfcstjournal
 
A NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRA
A NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRAA NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRA
A NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRA
ijfcstjournal
 
5266732.ppt
5266732.ppt5266732.ppt
5266732.ppt
hothyfa
 

Similar to Dsm project-h base-cassandra (20)

Cassandra tutorial
Cassandra tutorialCassandra tutorial
Cassandra tutorial
 
EVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMING
EVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMINGEVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMING
EVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMING
 
cassandra
cassandracassandra
cassandra
 
What is Scalability and How can affect on overall system performance of database
What is Scalability and How can affect on overall system performance of databaseWhat is Scalability and How can affect on overall system performance of database
What is Scalability and How can affect on overall system performance of database
 
04-Introduction-to-CassandraDB-.pdf
04-Introduction-to-CassandraDB-.pdf04-Introduction-to-CassandraDB-.pdf
04-Introduction-to-CassandraDB-.pdf
 
Benchmarking Scalability and Elasticity of DistributedDataba.docx
Benchmarking Scalability and Elasticity of DistributedDataba.docxBenchmarking Scalability and Elasticity of DistributedDataba.docx
Benchmarking Scalability and Elasticity of DistributedDataba.docx
 
Comparative study of no sql document, column store databases and evaluation o...
Comparative study of no sql document, column store databases and evaluation o...Comparative study of no sql document, column store databases and evaluation o...
Comparative study of no sql document, column store databases and evaluation o...
 
2.Introduction to NOSQL (Core concepts).pptx
2.Introduction to NOSQL (Core concepts).pptx2.Introduction to NOSQL (Core concepts).pptx
2.Introduction to NOSQL (Core concepts).pptx
 
Performance Analysis of HBASE and MONGODB
Performance Analysis of HBASE and MONGODBPerformance Analysis of HBASE and MONGODB
Performance Analysis of HBASE and MONGODB
 
Vskills Apache Cassandra sample material
Vskills Apache Cassandra sample materialVskills Apache Cassandra sample material
Vskills Apache Cassandra sample material
 
Nosql Presentation.pdf for DBMS understanding
Nosql Presentation.pdf for DBMS understandingNosql Presentation.pdf for DBMS understanding
Nosql Presentation.pdf for DBMS understanding
 
Data management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunitiesData management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunities
 
A request skew aware heterogeneous distributed
A request skew aware heterogeneous distributedA request skew aware heterogeneous distributed
A request skew aware heterogeneous distributed
 
Uint-5 Big data Frameworks.pdf
Uint-5 Big data Frameworks.pdfUint-5 Big data Frameworks.pdf
Uint-5 Big data Frameworks.pdf
 
Uint-5 Big data Frameworks.pdf
Uint-5 Big data Frameworks.pdfUint-5 Big data Frameworks.pdf
Uint-5 Big data Frameworks.pdf
 
Deep semantic understanding
Deep semantic understandingDeep semantic understanding
Deep semantic understanding
 
No SQL introduction
No SQL introductionNo SQL introduction
No SQL introduction
 
A NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRA
A NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRAA NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRA
A NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRA
 
A NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRA
A NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRAA NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRA
A NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRA
 
5266732.ppt
5266732.ppt5266732.ppt
5266732.ppt
 

More from Shantanu Deshpande

Prediction of Corporate Bankruptcy using Machine Learning Techniques
Prediction of Corporate Bankruptcy using Machine Learning Techniques Prediction of Corporate Bankruptcy using Machine Learning Techniques
Prediction of Corporate Bankruptcy using Machine Learning Techniques
Shantanu Deshpande
 
Corporate bankruptcy prediction using Deep learning techniques
Corporate bankruptcy prediction using Deep learning techniquesCorporate bankruptcy prediction using Deep learning techniques
Corporate bankruptcy prediction using Deep learning techniques
Shantanu Deshpande
 
Analyzing financial behavior of a person based on financial literacy
Analyzing financial behavior of a person based on financial literacyAnalyzing financial behavior of a person based on financial literacy
Analyzing financial behavior of a person based on financial literacy
Shantanu Deshpande
 
Pneumonia detection using CNN
Pneumonia detection using CNNPneumonia detection using CNN
Pneumonia detection using CNN
Shantanu Deshpande
 
X18125514 ca2-statisticsfor dataanalytics
X18125514 ca2-statisticsfor dataanalyticsX18125514 ca2-statisticsfor dataanalytics
X18125514 ca2-statisticsfor dataanalytics
Shantanu Deshpande
 
Pharmaceutical store management system
Pharmaceutical store management systemPharmaceutical store management system
Pharmaceutical store management system
Shantanu Deshpande
 
Data-Warehouse-and-Business-Intelligence
Data-Warehouse-and-Business-IntelligenceData-Warehouse-and-Business-Intelligence
Data-Warehouse-and-Business-Intelligence
Shantanu Deshpande
 

More from Shantanu Deshpande (7)

Prediction of Corporate Bankruptcy using Machine Learning Techniques
Prediction of Corporate Bankruptcy using Machine Learning Techniques Prediction of Corporate Bankruptcy using Machine Learning Techniques
Prediction of Corporate Bankruptcy using Machine Learning Techniques
 
Corporate bankruptcy prediction using Deep learning techniques
Corporate bankruptcy prediction using Deep learning techniquesCorporate bankruptcy prediction using Deep learning techniques
Corporate bankruptcy prediction using Deep learning techniques
 
Analyzing financial behavior of a person based on financial literacy
Analyzing financial behavior of a person based on financial literacyAnalyzing financial behavior of a person based on financial literacy
Analyzing financial behavior of a person based on financial literacy
 
Pneumonia detection using CNN
Pneumonia detection using CNNPneumonia detection using CNN
Pneumonia detection using CNN
 
X18125514 ca2-statisticsfor dataanalytics
X18125514 ca2-statisticsfor dataanalyticsX18125514 ca2-statisticsfor dataanalytics
X18125514 ca2-statisticsfor dataanalytics
 
Pharmaceutical store management system
Pharmaceutical store management systemPharmaceutical store management system
Pharmaceutical store management system
 
Data-Warehouse-and-Business-Intelligence
Data-Warehouse-and-Business-IntelligenceData-Warehouse-and-Business-Intelligence
Data-Warehouse-and-Business-Intelligence
 

Recently uploaded

GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 

Recently uploaded (20)

GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 

Dsm project-h base-cassandra

Comparison of HBase and Cassandra: The two NoSQL Databases
Shantanu Deshpande
x18125514

Abstract:
Recent years have seen rapid growth in the digital world. This has increased data complexity in terms of volume, velocity and variety, together termed Big Data. For instance, social media websites now generate terabytes or even petabytes of information on a daily basis, and this data needs to be collected and effectively managed in real time. Read-write operations are performed at an immense rate, with expectations of ever faster retrieval and loading. Traditional methods such as SQL are unable to process this new generation of data because they cannot meet its needs for high scalability, flexible structure and elasticity. Of late, NoSQL databases have surged in popularity, as they are claimed to perform better than traditional methods. Two widely popular NoSQL databases are HBase and Cassandra. In this paper, we examine the performance of these two databases and compare the results obtained through different operations on Ubuntu.
Keywords: Big Data, SQL, NoSQL, HBase, Cassandra, Ubuntu

1) Introduction:
With the advent of the digital age and the growing number of internet users worldwide, there has been an astounding increase in data across the globe. One example is the Internet of Things, whose sensors continuously gather data for real-time analysis.
Managing all this data is a complex task and a challenge for the companies that own it and need to process it further. Previously, organizations could maintain their data with relational database management systems. However, as the load kept increasing, processing time rose significantly. Queries suffered high latency, the data transmission rate dropped, and horizontal scalability was poor. This raised the cost of data processing and therefore company overheads, while performance still remained poor. As a result of these drawbacks of relational database systems, NoSQL was introduced around a decade ago. Its characteristics include design simplicity, simpler "horizontal" scaling to clusters of machines, and improved performance due to its node-to-node architecture. Because NoSQL also covers structured storage, the relational (SQL) database can be viewed as a subset of NoSQL. Unlike the vertical scaling of traditional databases, horizontal scaling results in lower maintenance costs. (Anon., n.d.)
The four types of NoSQL databases are:
• Column: each storage block holds data from only one column. Ex. Cassandra and HBase.
• Document: the document-oriented system extracts metadata from the document's internal structure for further optimization. Ex. MongoDB.
• Graph: a database that represents and stores data as nodes, edges and properties using semantic graph structures. Ex. Neo4j.
• Key-value: a paradigm for storing, retrieving and managing associative arrays, commonly known as hash tables. Ex. Amazon S3.

2) Key Characteristics:
2.1 HBase:
HBase is an open-source project built on top of the Hadoop file system (HDFS). It is a distributed, column-oriented database and is horizontally scalable. HBase is not a relational data store, so it does not support a structured query language such as SQL. Much like a traditional database, HBase consists of tables that contain rows and columns, and each table must define an element as its primary key (the row key). The key characteristics of HBase are:
• Consistency: HBase provides consistent read and write operations, which makes it suitable for high-speed requirements.
• Sharding: the logical database is divided into smaller, more manageable parts called data shards. This reduces I/O time and overhead. The split can happen automatically or manually once a threshold size is reached.
• Atomic read and write: while the system is processing one read or write operation, other processes are prevented from performing another read or write operation on the same data. This is known as atomic read/write, and HBase provides it at the row level.
• High availability: HBase supports recovery and failover across both LAN and WAN deployments. At its core is a master server, which handles the cluster's metadata and monitors the region servers.
• Java client API: HBase offers an easy-to-use Java API for clients.
Given these characteristics, HBase is well suited to write-heavy workloads. It also suits cases that need fast random access to the stored data.

2.2 Cassandra:
Cassandra is an open-source, distributed and decentralized system designed to manage huge amounts of data. It provides a highly available service with no single point of failure. The key characteristics of Cassandra are:
• Always-on architecture: there is no single point of failure, so the failure of one node does not bring down critical business applications.
• Flexible data storage: Cassandra can accommodate structured, semi-structured or unstructured data, and its data structures can be changed as requirements evolve.
• Data distribution: data can be replicated across multiple data centres, which gives the flexibility to distribute data wherever it is required.
• Elastic scalability: this is one of Cassandra's key characteristics. The cluster can easily be scaled up or down, because any number of nodes can be added or removed without disruption.
• Fast, linear-scale performance: throughput increases as nodes are added, so response times stay low as the cluster grows.
• Tunable consistency: Cassandra offers two types of consistency, strong consistency and eventual consistency. With eventual consistency, a write is acknowledged to the client once the cluster accepts it, and the update reaches the remaining replicas over time. Strong consistency, on the other hand, ensures that an update is propagated to all the nodes that hold the relevant data.
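In Cassandra, the trade-off between strong and eventual consistency is made per request by choosing a consistency level such as ONE, QUORUM or ALL. Cassandra's documentation states a rule of thumb: a read reflects the most recent write when the number of replicas written plus the number of replicas read exceeds the replication factor. The minimal sketch below only evaluates that arithmetic. It assumes a single data centre and defines QUORUM as floor(RF / 2) + 1. The helpers `quorum` and `is_strongly_consistent` are hypothetical and are not part of any Cassandra driver.

```python
"""Quorum arithmetic behind Cassandra-style tunable consistency (illustrative sketch)."""


def quorum(replication_factor: int) -> int:
    """Majority of replicas, as used by the QUORUM level: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def is_strongly_consistent(replication_factor: int, replicas_read: int, replicas_written: int) -> bool:
    """True when every read set must overlap every write set, i.e. R + W > RF."""
    for count in (replicas_read, replicas_written):
        if not 1 <= count <= replication_factor:
            raise ValueError("replica counts must lie between 1 and the replication factor")
    return replicas_read + replicas_written > replication_factor


if __name__ == "__main__":
    rf = 3
    # Replica counts for three real Cassandra consistency levels (single data centre assumed).
    levels = {"ONE": 1, "QUORUM": quorum(rf), "ALL": rf}
    for read_level, r in levels.items():
        for write_level, w in levels.items():
            kind = "strong" if is_strongly_consistent(rf, r, w) else "eventual"
            print(f"read={read_level:<6} write={write_level:<6} -> {kind}")
```

Take a replication factor of 3. QUORUM reads combined with QUORUM writes give 2 + 2 > 3, so every read overlaps the latest acknowledged write. ONE/ONE gives only 1 + 1, which means the result is eventually consistent. Lower levels give up that guarantee in exchange for lower latency and higher availability.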
3) Architecture:
3.1 HBase:
There are three important components in the HBase architecture: HMaster, Region Server and ZooKeeper.
HMaster: assigns regions to the region servers in the Hadoop cluster so that the load is balanced evenly.
Region Server: the worker nodes that handle transactional queries from clients, such as read, write, update and delete. A region server process runs on the nodes of the Hadoop cluster.
ZooKeeper: a centralized coordination and monitoring service that keeps track of the servers in the cluster. It supports region assignment and recovery. When a region server crashes, its regions are reloaded onto other working region servers.

3.2 Cassandra:
Cassandra is designed to handle large data workloads across multiple nodes with no single point of failure. Its architecture assumes that both hardware and system failures do occur. Cassandra addresses failures with a peer-to-peer distributed system of homogeneous nodes, in which data is distributed across all the nodes of the cluster. All nodes in a cluster play the same role. Each node is interconnected with the others and yet independent.
Key Structure:
Node: the basic infrastructure component of Cassandra, where data is stored.
Datacentre: a collection of related nodes.
Cluster: contains one or more datacentres.
Commit log: every write is first recorded in the commit log. Once the data has been transferred to an SSTable, the commit log is deleted, archived or recycled.
SSTable: a sorted string table (SSTable) is an immutable data file to which Cassandra periodically writes memtables.
CQL Table: a collection of ordered columns fetched by table row. A table is made up of columns and has a primary key.

4) Comparison between the two:
The CAP theorem made designers of distributed database systems aware of the trade-offs that need to be considered beforehand. The theorem applies to distributed systems that store data. CAP stands for Consistency, Availability and Partition tolerance. A distributed data store can guarantee at most two of these three properties at the same time, so the key question is which two a given database achieves. Here, we compare HBase and Cassandra in terms of scalability, availability, reliability and security.
Scalability: the scalability of a database is its capacity to handle large amounts of data while keeping execution efficiency high. HBase is highly scalable because data is distributed evenly across the tables as the database grows. This is well supported because HBase is modelled on Google's Bigtable, and tables in HBase are distributed dynamically. HBase scales horizontally over its region servers, which act as slaves in the cluster. In HBase, the region is the basic unit of horizontal scalability.
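The region is the unit that HBase distributes across region servers. Locating a row therefore means finding the region whose key range contains that row key. The minimal sketch below illustrates this under stated assumptions. Regions are modelled as contiguous `[start_key, end_key)` ranges sorted by start key. The table layout and the server names `rs1`, `rs2` and `rs3` are hypothetical. The sketch does not use the HBase client API; real clients resolve region locations through the `hbase:meta` catalog table.

```python
"""Locating the HBase-style region that holds a row key (illustrative sketch)."""
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    start_key: str  # inclusive; "" marks the start of the table
    end_key: str    # exclusive; "" marks the end of the table
    server: str     # hypothetical region server name


def find_region(regions: list[Region], row_key: str) -> Region:
    """Return the region whose [start_key, end_key) range contains row_key.

    Assumes the regions are sorted by start_key, contiguous, and that the first
    one starts at "" so that together they cover the whole key space.
    """
    start_keys = [region.start_key for region in regions]
    # bisect_right counts the regions whose start_key <= row_key;
    # the last of those is the region that owns the key.
    index = bisect_right(start_keys, row_key) - 1
    return regions[index]


if __name__ == "__main__":
    table = [
        Region("", "g", "rs1"),
        Region("g", "p", "rs2"),
        Region("p", "", "rs3"),
    ]
    for key in ("cassandra", "hbase", "zookeeper"):
        print(key, "->", find_region(table, key).server)
```

Row keys are compared as sorted strings, so "cassandra" maps to rs1, "hbase" to rs2 and "zookeeper" to rs3. Binary search keeps each lookup logarithmic in the number of regions, which matters for a large table that has been split many times.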
Regions are subsets of a table's data: each region is a contiguous, sorted range of rows that are stored together. Initially, a table has only one region. As the number of rows increases and the region becomes large, it is split into two at its middle key, creating two roughly equal halves.

In the case of Cassandra, the database scales linearly: capacity can be increased simply by adding new nodes. Cassandra can be scaled both horizontally, by adding more nodes to the cluster, and vertically, by adding more resources to the existing nodes.

Availability: Availability means that every request sent to the database receives a response, indicating either success or failure. It also refers to the accessibility of data even when servers or data nodes in the cluster fail. A highly available database causes fewer interruptions for the client in the event of a server failure. HBase has a master-slave architecture like HDFS, but it can run more than one HMaster, with backup masters standing by, so if the active master fails, another takes over and data transmission is not halted. A master failover may briefly interrupt operations, but as explained above, the CAP theorem allows a distributed system to guarantee only two of the three properties, and HBase prioritises consistency and partition tolerance. Cassandra, in contrast, has no master-slave relationship: all nodes are equal and no master node controls the others, which avoids a single point of failure. Cassandra also provides replication, so even if a node within the cluster goes down, one or more copies of its data are available on other machines in the cluster (Anon., n.d.).

Reliability: The reliability of a database is measured by whether it delivers according to its defined specifications. A highly reliable system shows the same or better performance even when the environment changes or a fault occurs in the system. ZooKeeper ensures the reliability of HBase.
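The coordination role of ZooKeeper can be sketched as failure detection followed by region reassignment. The Python below is a simplified, hypothetical model (ToyCoordinator and reassign_regions are illustrative names, not ZooKeeper or HBase APIs): each region server keeps a session alive, much like an ephemeral znode, and once a server's session times out, the master moves that server's regions round-robin to the remaining live servers. Real HBase also replays the failed server's write-ahead log during recovery, which this sketch omits.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCoordinator:
    """Hypothetical stand-in for ZooKeeper; not the ZooKeeper or HBase API."""
    session_timeout: float
    last_heartbeat: dict = field(default_factory=dict)  # server -> last heartbeat time

    def register(self, server: str, now: float) -> None:
        # Analogous to creating an ephemeral znode that lives as long as the session.
        self.last_heartbeat[server] = now

    def heartbeat(self, server: str, now: float) -> None:
        self.last_heartbeat[server] = now

    def live_servers(self, now: float) -> list:
        # A server whose session has timed out is treated as if its znode had vanished.
        return sorted(server for server, seen in self.last_heartbeat.items()
                      if now - seen <= self.session_timeout)


def reassign_regions(assignment: dict, live: list) -> dict:
    """Hypothetical master-side step: keep regions on live servers, move the rest round-robin."""
    orphaned = [region for region, server in sorted(assignment.items())
                if server not in live]
    if orphaned and not live:
        raise RuntimeError("no live region servers left")
    new_assignment = {region: server for region, server in assignment.items()
                      if server in live}
    for i, region in enumerate(orphaned):
        new_assignment[region] = live[i % len(live)]
    return new_assignment


zk = ToyCoordinator(session_timeout=10.0)
for rs in ("rs1", "rs2", "rs3"):
    zk.register(rs, now=0.0)
zk.heartbeat("rs1", now=8.0)
zk.heartbeat("rs2", now=8.0)          # rs3 has stopped sending heartbeats
live = zk.live_servers(now=15.0)
plan = reassign_regions({"r1": "rs1", "r2": "rs3", "r3": "rs3", "r4": "rs2"}, live)
print(plan)
# {'r1': 'rs1', 'r4': 'rs2', 'r2': 'rs1', 'r3': 'rs2'}
```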
ZooKeeper stores its coordination data in znodes, and the region servers register themselves there as subordinate nodes. When a client makes a request, it first contacts ZooKeeper to find where the hbase:meta table is hosted, uses that table to locate the region server holding the requested row, and that region server then serves the request. It has also been observed experimentally that HBase's efficiency increases as the workload increases, which supports its reliability. Cassandra, too, is considered reliable because of its distributed ring structure and the replication of data across nodes.

4.2 Security:
HBase: According to (Anon., n.d.), the key security features available in HBase are:
1. Authentication: To gain secure access to a database, a client must authenticate with the server to establish its credentials. The options for authentication are:
• Client authentication: There are numerous security protocols that allow clients to authenticate with the database. For HBase, they include Kerberos and SSL.
• Server authentication: Database servers must also authenticate with each other to ensure a secure operating environment. In HBase, a shared keyfile is one such method.
2. Role-based security: Role-based security considerably simplifies the administration and operation of security. HBase offers several role features to ease administration, such as custom roles and default roles. It is also important to define the scope of roles, which is useful for systems that hold extremely sensitive data.
3. Database security: HBase supports database encryption, which is highly important in sensitive application domains. Logging is also essential for recording all activities and client interactions with the system for auditing and detailed investigation. The administrator can define which security groups are logged. HBase supports both fixed event logging and configurable event logging.

Cassandra: According to (Anon., n.d.), the three main components of the security features provided by Cassandra are:
1. TLS/SSL encryption for client and inter-node communication: Cassandra offers two encryption options, client-to-node encryption and node-to-node encryption, which are managed separately and must be configured independently. When encryption is enabled, the JVM defaults for cipher suites are used. These can be overridden through settings, but this is not recommended unless a specific policy requires it.
2. Client authentication: Authentication is configured using the authenticator setting in cassandra.yaml. By default (AllowAllAuthenticator), the system performs no authentication checks and requires no credentials. The package also includes PasswordAuthenticator, which stores encrypted credentials in a system table.
3. Authorization: As with authentication, there are two options for authorization. By default (AllowAllAuthorizer), no checks are performed and all permissions are granted to all roles. Cassandra also includes CassandraAuthorizer, which provides full permission management and stores the related data in Cassandra system tables.

5) Learnings from the Literature Review:
With the development of the Internet and cloud computing, databases need to store and process big data effectively, with high read and write performance, while traditional relational databases face many new challenges (Han, 2011). Especially in large-scale, highly competitive applications such as search engines and social networking services (SNS), relational databases proved inadequate for storing and querying dynamic user data, and NoSQL databases were created in response. With the exponential growth in global data generation, the demands on database technology grew significantly. These include reading and writing simultaneously with low latency, efficient storage of and access to large data volumes, improved scalability and high availability, and lower operating and management costs. These were some of the key limitations of traditional relational databases. To overcome them, NoSQL emerged as an alternative paradigm for this new non-relational data schema (Dede, 2013). The NoSQL features described above are common to most products; in practice, however, each product adopts its own data model and its own position with respect to the CAP theorem. CAP stands for Consistency, Availability and tolerance of network Partitions, and its core idea is that a distributed system cannot meet all three needs simultaneously, only two of them (Han, 2011).
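The consistency side of this trade-off can be made concrete with the read/write quorum rule used by Dynamo-style stores such as Cassandra: with N replicas, a read that contacts R replicas is guaranteed to see the latest acknowledged write of W replicas when R + W > N, while requiring fewer replicas lets more node failures be tolerated. The Python sketch below is a simplified illustration with hypothetical function names; it covers only the ONE, QUORUM and ALL levels and ignores datacentre-aware levels such as LOCAL_QUORUM.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Replica acknowledgements needed at a simplified, Cassandra-style consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1   # a majority of the replicas
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def reads_see_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    # If R + W > N, every read replica set overlaps every write replica set,
    # so a read always reaches at least one replica holding the latest write.
    return replicas_required(read_level, rf) + replicas_required(write_level, rf) > rf


def replicas_that_may_fail(level: str, rf: int) -> int:
    # Requests at this level still succeed while this many replicas are down.
    return rf - replicas_required(level, rf)


for read, write in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(read, write, reads_see_latest_write(read, write, rf=3))
```

With a replication factor of 3, QUORUM reads and writes overlap and still tolerate one failed replica, whereas ONE/ONE tolerates two failures but may return stale data: availability is traded against consistency exactly as the CAP discussion above suggests.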
Depending on the project requirements, different storage systems offer different consistency levels. These options let users choose among trade-offs between availability, latency and consistency (Kumar, 2014). Therefore, to understand which system is better, it is essential to assess the performance of each storage system and so judge the appropriate storage type for a particular application. In (Abubakar, 2014), the author introduces YCSB, an open-source tool from Yahoo that allows multiple systems to be benchmarked and compared by generating workloads. Distributed systems are often more complicated than their single-node counterparts because of the trade-offs that must be balanced according to the application's requirements. In this paper, the author attempted to extend YCSB so that it could calculate stale reads in real time. The resulting model can be used to quantify the trade-offs between availability, latency and consistency.

According to (Dede, 2013), Internet applications are increasing rapidly and generating enormous amounts of data. To store such humongous amounts of data, NoSQL database systems such as HBase and Cassandra are widely used by many organizations as their storage solution. The author tested the Cassandra database on the basis of its performance and discussed how Cassandra's features, such as replication and data partitioning, affect the performance of Apache Hadoop. A test model is then introduced that tests the system's performance while taking its architecture and business into account. Finally, these tests are applied at the architecture level to several performance-related elements, such as the column-oriented data model, the split mechanism and the data replication factor. A test procedure is performed for each performance element, and a test scenario is designed for it. Owing to the continuous development of cloud computing, unstructured data storage is also steadily increasing. The schema evaluation was separated into its own unit, known as a schema analyser, so that it does not have to rely on web applications and can be connected to visual tools.

Another study (Tang, 2016) compared the performance of five NoSQL databases, including Cassandra and HBase, using YCSB (Yahoo! Cloud Serving Benchmark). The experiment involved three workloads: Workload A (50% read and 50% write), Workload C (100% read) and Workload H (100% write). Each workload executed around 10,000 operations against the 100,000 records loaded.
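A workload's operation mix can be sketched as a weighted random choice made for each operation. The Python below is a hypothetical illustration, not YCSB's own code (in YCSB, such mixes are set through workload properties such as readproportion and updateproportion). Because each operation is drawn independently, the realised split only approximates the nominal proportions, which is consistent with the Workload A read and update counts reported in Section 7 being close to, but not exactly, 50/50.

```python
import random

# Operation mixes as described for (Tang, 2016); a hypothetical table for illustration.
WORKLOAD_MIXES = {
    "A": {"read": 0.5, "write": 0.5},
    "C": {"read": 1.0},
    "H": {"write": 1.0},
}


def generate_operations(workload: str, count: int, seed: int = 42) -> list:
    """Draw `count` operations at random according to the workload's mix."""
    mix = WORKLOAD_MIXES[workload]
    operations = list(mix)
    weights = [mix[op] for op in operations]
    rng = random.Random(seed)   # seeded so the sequence is reproducible
    return rng.choices(operations, weights=weights, k=count)


ops = generate_operations("A", 10000)
print(ops.count("read") / len(ops))   # close to 0.5, but generally not exactly 0.5
```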
Two experiments were conducted. The first measured the total time each database took to run all three workloads. Redis proved superior to the other databases, taking the least time to load and execute the data: it was 1.43 and 3.61 times faster than Cassandra and HBase respectively. The second experiment measured throughput. All five databases showed a similar trend in this experiment, and again Redis performed significantly better than the others, while Cassandra achieved significantly greater throughput than HBase. The experiments showed that Redis is better at loading and executing the workloads, and this study motivated our own. In the following sections, we examine how these experiments relate to the study performed in this paper.

6) Performance Test Plan:
To run the tests and compare the two databases, we first created an instance on OpenStack, hosted in the cloud. We then assigned a floating IP to this instance to access the Ubuntu system, and generated a key pair with its authorized keys in the .ssh directory. Next, we installed Hadoop along with HBase. Before installing Hadoop, we installed Java 8 and created a Hadoop user group and a user named hduser. We then disabled IPv6, downloaded and unzipped Hadoop, created a symbolic link to the Hadoop directory and assigned it to hduser. The configuration files hadoop-env.sh, core-site.xml and hdfs-site.xml were edited according to the manual. Thereafter, we formatted the NameNode and started DFS and YARN.
After Hadoop was installed successfully, we installed HBase. As with Hadoop, we downloaded HBase from its website, unzipped it and created a symbolic link. We then edited the hbase-env.sh file, started HBase and created a user table.
YCSB, a benchmarking tool, was then installed by following the lab manual. A test harness had already been created in YCSB, specifying the workload type, the operation count, the database type and so on, and we updated its files as required. We considered two workload types, Workload A and Workload C, and three operation counts, 100000, 150000 and 200000, for both HBase and Cassandra. Workload A is a mix of 50% reads and 50% writes, whereas Workload C is 100% reads. The tests were run three times using the runtest.sh script.
Following this, Cassandra was downloaded and installed by following the guidelines in an online manual (Anon., n.d.). The test harness files were modified as required for Cassandra, and the same tests were performed. After the runs had completed for both HBase and Cassandra, the outputs were averaged.
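Averaging the three runs can be automated. The Python sketch below assumes YCSB's default text output, in which each measurement appears on a line of the form "[SECTION], Metric, value" (for example "[OVERALL], Throughput(ops/sec), ..."). The function names are hypothetical, and the sample values in the usage lines are illustrative only, not results from this study.

```python
from collections import defaultdict
from statistics import mean


def parse_ycsb_output(text: str) -> dict:
    """Hypothetical helper: collect '[SECTION], Metric, value' lines into {(section, metric): value}."""
    metrics = {}
    for line in text.splitlines():
        parts = [part.strip() for part in line.split(",")]
        if len(parts) != 3 or not parts[0].startswith("["):
            continue  # skip log lines and anything that is not a measurement
        try:
            metrics[(parts[0], parts[1])] = float(parts[2])
        except ValueError:
            continue
    return metrics


def average_runs(outputs: list) -> dict:
    """Average every metric over several runs of the same test."""
    collected = defaultdict(list)
    for text in outputs:
        for key, value in parse_ycsb_output(text).items():
            collected[key].append(value)
    return {key: mean(values) for key, values in collected.items()}


# Illustrative numbers only, not measurements from this paper.
run_1 = "[OVERALL], Throughput(ops/sec), 1000.0\n[READ], AverageLatency(us), 400.0"
run_2 = "Loading workload...\n[OVERALL], Throughput(ops/sec), 1200.0\n[READ], AverageLatency(us), 300.0"
run_3 = "[OVERALL], Throughput(ops/sec), 1100.0\n[READ], AverageLatency(us), 350.0"
averages = average_runs([run_1, run_2, run_3])
print(averages[("[OVERALL]", "Throughput(ops/sec)")])  # 1100.0
```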
Device Specifications:
• Sony Vaio Fit 14 SVF14A15SNB
• 8GB RAM
• Intel Core i5 (3rd generation)
• 1.8 GHz with Turbo Boost up to 2.7 GHz
• 1TB HDD
Databases:
• HBase
• Cassandra
Workload Type:
• Workload A: 50% read and 50% write
• Workload C: 100% read
Operating Environment:
• OpenStack
• Name: m1.medium
• VCPUs: 2
• RAM: 4GB
• Disk size: 40GB
• MSc data-net

7) Evaluation and Results:
We ran two workload tests, Workload A and Workload C, against our two databases, HBase and Cassandra, using YCSB as the benchmarking tool. The test specifications are:
Workload A:
1. Read: 50%
2. Update: 50%
Workload C:
1. Read: 100%

7.1 Workload A Results:
7.1.1 Average Insert latency vs. overall throughput

Database   Workload A  Op count    [OVERALL] Throughput(ops/sec)  [INSERT] AverageLatency(us)
Cassandra  Count 1     100000      1830.161054                    471.04802
Cassandra  Count 2     150000      2207.667967                    405.07366
Cassandra  Count 3     200000      2472.157328                    361.238945
HBase      Count 1     100000      1907.632437                    432.87972
HBase      Count 2     150000      2395.821687                    394.2167
HBase      Count 3     200000      2363.256094                    395.25737
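The comparison drawn from this table can be checked mechanically. The short Python sketch below (the function names are hypothetical) reports which database had the lower average insert latency at each operation count, using the values copied from the table above.

```python
# [INSERT] AverageLatency(us), Workload A, copied from the table above.
COUNTS = [100000, 150000, 200000]
CASSANDRA_INSERT_US = [471.04802, 405.07366, 361.238945]
HBASE_INSERT_US = [432.87972, 394.2167, 395.25737]


def lower_latency_by_count(counts, cassandra, hbase):
    """Hypothetical helper: name the database with the lower latency at each count (ties go to HBase)."""
    return {count: ("Cassandra" if c < h else "HBase")
            for count, c, h in zip(counts, cassandra, hbase)}


def first_count_where_cassandra_is_lower(counts, cassandra, hbase):
    """Smallest operation count at which Cassandra's latency is below HBase's, or None."""
    for count, c, h in zip(counts, cassandra, hbase):
        if c < h:
            return count
    return None


print(lower_latency_by_count(COUNTS, CASSANDRA_INSERT_US, HBASE_INSERT_US))
# {100000: 'HBase', 150000: 'HBase', 200000: 'Cassandra'}
```

The same functions apply unchanged to the Workload C insert latencies in Section 7.2.1, where they also report HBase as lower at 100000 and 150000 operations and Cassandra as lower at 200000.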
• Here, we compare the average insert latency with the overall throughput.
• Lower latency indicates better database performance.
• HBase's insert latency is lower than Cassandra's at 100000 and 150000 operations, but at 200000 operations Cassandra's latency (361.2 us) drops below HBase's (395.3 us).

7.1.2 Average Update Latency vs. Update operations

Database   Workload A  Op count    [UPDATE] Operations  [UPDATE] AverageLatency(us)
Cassandra  Count 1     100000      50118                405.3865677
Cassandra  Count 2     150000      75256                394.719039
Cassandra  Count 3     200000      99707                335.3669451
HBase      Count 1     100000      49972                387.6640519
HBase      Count 2     150000      75369                373.7879101
HBase      Count 3     200000      100005               383.7894005
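The size of the latency trends in this table can be quantified as the percentage change from the smallest to the largest operation count, together with the spread between the highest and lowest values. The Python sketch below (hypothetical helper names) uses the update latencies copied from the table above.

```python
# [UPDATE] AverageLatency(us), Workload A, copied from the table above.
UPDATE_LATENCY_US = {
    "Cassandra": [405.3865677, 394.719039, 335.3669451],
    "HBase": [387.6640519, 373.7879101, 383.7894005],
}


def percent_change(first: float, last: float) -> float:
    """Relative change from the first to the last value, in percent."""
    return (last - first) / first * 100.0


def summarise(series: list) -> tuple:
    """Hypothetical helper: (percent change from first to last count, max - min spread)."""
    return percent_change(series[0], series[-1]), max(series) - min(series)


for name, series in UPDATE_LATENCY_US.items():
    change, spread = summarise(series)
    print(f"{name}: {change:+.1f}% change, {spread:.1f} us spread")
# Cassandra: -17.3% change, 70.0 us spread
# HBase: -1.0% change, 13.9 us spread
```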
• Here, we compare the average update latency with the number of update operations.
• As the workload, and with it the number of update operations, increases, HBase's update latency stays roughly flat (between 373.8 and 387.7 us), whereas Cassandra's falls significantly, from 405.4 us to 335.4 us.
• This shows that Cassandra performs better for update operations when the workload is high.

7.1.3 Read operations vs. Avg. Read latency

Database   Workload A  Op count    [READ] Operations  [READ] AverageLatency(us)
Cassandra  Count 1     100000      49882              499.271621
Cassandra  Count 2     150000      74744              526.7530905
Cassandra  Count 3     200000      100293             439.4176164
HBase      Count 1     100000      50028              334.307208
HBase      Count 2     150000      74631              314.4751645
HBase      Count 3     200000      99995              325.520106
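The read and update operation counts in Sections 7.1.2 and 7.1.3 can be cross-checked against the nominal 50/50 mix of Workload A. The Python sketch below (hypothetical names, values copied from the two tables) confirms that reads plus updates equal the operation count in every run and prints the realised read fraction.

```python
# (reads, updates) per run of Workload A, copied from Sections 7.1.2 and 7.1.3.
OPERATIONS = {
    ("Cassandra", 100000): (49882, 50118),
    ("Cassandra", 150000): (74744, 75256),
    ("Cassandra", 200000): (100293, 99707),
    ("HBase", 100000): (50028, 49972),
    ("HBase", 150000): (74631, 75369),
    ("HBase", 200000): (99995, 100005),
}


def read_fraction(reads: int, updates: int) -> float:
    """Hypothetical helper: share of the operations that were reads."""
    return reads / (reads + updates)


for (database, count), (reads, updates) in OPERATIONS.items():
    # Every operation in Workload A is either a read or an update.
    assert reads + updates == count, (database, count)
    print(f"{database} {count}: {read_fraction(reads, updates):.3f} reads")
```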
• Here, we compare the read operations with the average read latency.
• From the graph, the average read latency of HBase remains consistent as the workload increases, whereas Cassandra's read latency rises at count 2 and then drops significantly at count 3. HBase's read latency is lower than Cassandra's at every count.

7.2 Workload C
7.2.1 Average Insert latency vs. overall throughput

Database   Workload C  Op count    [OVERALL] Throughput(ops/sec)  [INSERT] AverageLatency(us)
Cassandra  Count 1     100000      2072.023538                    412.90974
Cassandra  Count 2     150000      2202.610828                    406.38106
Cassandra  Count 3     200000      2521.718299                    360.39823
HBase      Count 1     100000      2312.726936                    401.83924
HBase      Count 2     150000      2421.893921                    395.4957067
HBase      Count 3     200000      2276.789272                    410.009585
• Here, we compare the average insert latency with the overall throughput for Workload C.
• The graph shows that Cassandra's insert latency decreases as the workload increases, from 412.9 us to 360.4 us, and at 200000 operations it is lower than HBase's (410.0 us); at 100000 and 150000 operations, HBase's latency is slightly lower.

7.2.2 Average Read Latency(us) vs. Overall Throughput(ops/sec)

Database   Workload C  Op count    [OVERALL] Throughput(ops/sec)  [READ] AverageLatency(us)
Cassandra  Count 1     100000      2058.248431                    416.41129
Cassandra  Count 2     150000      2350.581377                    378.94206
Cassandra  Count 3     200000      2494.636531                    366.025005
HBase      Count 1     100000      3037.667072                    283.80343
HBase      Count 2     150000      3758.833258                    254.5179933
HBase      Count 3     200000      4144.734115                    221.76259
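The throughput gap in this table can be expressed in the same "times faster" form used by (Tang, 2016). The Python sketch below (hypothetical helper name) divides HBase's overall throughput by Cassandra's at each operation count, using the values copied from the table above.

```python
# [OVERALL] Throughput(ops/sec), Workload C, copied from the table above.
COUNTS = [100000, 150000, 200000]
CASSANDRA_OPS = [2058.248431, 2350.581377, 2494.636531]
HBASE_OPS = [3037.667072, 3758.833258, 4144.734115]


def throughput_ratio(numerator: list, denominator: list) -> list:
    """Hypothetical helper: element-wise ratio of two throughput series, rounded to two decimals."""
    return [round(n / d, 2) for n, d in zip(numerator, denominator)]


for count, ratio in zip(COUNTS, throughput_ratio(HBASE_OPS, CASSANDRA_OPS)):
    print(f"{count}: HBase throughput is {ratio}x Cassandra's")
# 100000: HBase throughput is 1.48x Cassandra's
# 150000: HBase throughput is 1.6x Cassandra's
# 200000: HBase throughput is 1.66x Cassandra's
```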
• Here, we compare the average read latency with the overall throughput for Workload C.
• The graph shows that the lowest count has the highest latency for both databases, HBase and Cassandra, and that the latency of both drops significantly as the workload increases.

8) Conclusions and Discussion:
In this paper, we have explained the underlying concepts of the HBase and Cassandra databases. The Yahoo! Cloud Serving Benchmark (YCSB) was used as the benchmarking tool to determine which database performs better under different workload scenarios. Each database was given the same operation counts: 100000, 150000 and 200000. We used two types of workloads, A and C: Workload A consists of 50% read and 50% write operations, and Workload C of 100% read operations. On visualizing the data in Tableau, we found that the latency behaviour of HBase differs from that of Cassandra. Although the latency of both databases decreases as the workload increases, it decreases more in Cassandra than in HBase. In Workload A, as the number of update operations increases, Cassandra's average latency falls below that of HBase. Overall, we observe that at higher workloads Cassandra performs better than HBase, and we recommend Cassandra for high-workload requirements. Also, all the benchmarking parameters were available in the YCSB tool, so we consider it a good tool for benchmarking NoSQL databases in a cloud environment.

Bibliography
Abubakar, Y., 2014. Performance Evaluation of NoSQL Systems using YCSB in a Resource Austere Environment. ResearchGate.
Anon., n.d. An Evaluation of Cassandra for Hadoop. [Online] Available at: http://sci-hub.tw/https://ieeexplore.ieee.org/abstract/document/6676732 [Accessed 2019].
Anon., n.d. Cassandra. [Online] Available at: https://www.rapidvaluesolutions.com/tech_blog/cassandra-the-right-data-store-for-scalability-performance-availability-and-maintainability/ [Accessed 2019].
Anon., n.d. Cassandra Installation. [Online] Available at: https://www.vultr.com/docs/how-to-install-apache-cassandra-3-11-x-on-ubuntu-16-04-lts [Accessed 2019].
Anon., n.d. Cassandra-Security. [Online] Available at: http://cassandra.apache.org/doc/latest/operating/security.html [Accessed 2019].
Anon., n.d. HBase security features. [Online] Available at: https://quabase.sei.cmu.edu/mediawiki/index.php/HBase_Security_Features [Accessed 2019].
Dede, E., 2013. An Evaluation of Cassandra for Hadoop. IEEE.
Han, J., 2011. Survey on NoSQL database. IEEE.
Kumar, S. P., 2014. Evaluating consistency on the fly using YCSB. IEEE.
Tang, E., 2016. Performance Comparison between Five NoSQL Databases. IEEE.