This document analyzes and evaluates the performance of the Riak KV NoSQL database cluster using the Basho-bench benchmark tool. Experiments were conducted on a 5-node Riak KV cluster to test throughput and latency under different workloads, data sizes, and operations (read, write, update). The results found that Riak KV can handle large volumes of data and various workloads effectively with good throughput, though latency increased with larger data sizes. Overall, Riak KV is suitable for distributed big data environments where high availability, scalability and fault tolerance are important.
Analysis and evaluation of Riak KV cluster environment using Basho Bench (StevenChike)
With ongoing technological development, many institutions and companies produce large volumes of structured and unstructured data. Specialized databases are needed to deal with this data, and NoSQL databases emerged to meet that need. They are widely used in cloud databases and distributed systems, and in the era of big data they provide a scalable, highly available solution. New architectures are therefore needed to store ever more diverse kinds of data, and any such architecture must be tested and analyzed in depth using different benchmark tools. In this paper, we experiment with the Riak key-value database to measure its performance in terms of throughput and latency, where huge amounts of data of different sizes are stored and retrieved in a distributed database environment. Throughput and latency of the NoSQL database are compared across different types of experiments and different data sizes, and the results are then discussed.
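Basho Bench drives configurable read/write workloads against a Riak cluster and records throughput and latency. The measurement loop at the heart of such a benchmark can be sketched in Python; the plain dict standing in for a Riak client connection, and all parameter values, are illustrative assumptions rather than the tool's actual implementation.

```python
import time

def run_benchmark(store, n_ops=1000, value_size=100):
    """Run n_ops PUT/GET pairs against a key-value store and report
    throughput (operations/sec) and mean per-pair latency (ms)."""
    payload = b"x" * value_size
    latencies = []
    start = time.perf_counter()
    for i in range(n_ops):
        key = f"key-{i}"
        t0 = time.perf_counter()
        store[key] = payload          # write
        _ = store[key]                # read back
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return {
        "throughput_ops_sec": (2 * n_ops) / elapsed,
        "mean_latency_ms": 1000 * sum(latencies) / len(latencies),
    }

# Stand-in for a Riak connection: a plain in-memory dict.
stats = run_benchmark({}, n_ops=500, value_size=1024)
```

Increasing `value_size` in such a loop is the simplest way to observe the latency growth with larger data sizes that the paper reports.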
This document provides a literature review of NoSQL databases. It discusses how the rise of big data from sources like social media, sensors, and surveillance footage has led organizations to adopt NoSQL databases that can handle large volumes of unstructured data more efficiently than traditional relational databases. The document evaluates several popular NoSQL databases like MongoDB, Cassandra, and HBase, categorizing them as either document stores, column family databases, or key-value stores. It also provides examples of major companies that use NoSQL and discusses factors like flexibility and scalability that have driven adoption.
TOP NEWSQL DATABASES AND FEATURES CLASSIFICATION (ijdms)
The versatility of NewSQL databases lies in achieving low-latency constraints while reducing the cost of commodity nodes. Our work examines how big data is addressed by the top NewSQL databases in light of their features. This paper surveys the features of some of the most widely used NewSQL databases [54]. In the first part, around 11 NewSQL databases are investigated; their features are elicited, compared, and examined to reveal their similarities and differences. Our taxonomy involves four categories describing how NewSQL databases handle and process big data given the technologies they offer or support, and the survey conveys the advantages and disadvantages of each database. In the second part, we present our findings along several aspects: a first taxonomy classifies feature characteristics as either functional or non-functional; a second taxonomy concerns data integrity and data manipulation, where data features are classified as supervised, semi-supervised, or unsupervised; a third taxonomy examines how well each NewSQL database can deal with different types of databases. Notably, NewSQL databases not only process regular (raw) data but are also robust enough to handle diverse data types such as historical, vertically distributed, real-time, streaming, and timestamped data. We further find that NewSQL databases can coexist with and support other database types, such as NoSQL, traditional, distributed, and semi-relational systems, which forms our fourth taxonomy. We visualize our results for these categories using charts. Finally, an analysis of big data throughput lets us classify each database's data handling as good or bad, and we conclude with a couple of suggestions on how to manage big data using predictive analytics and other techniques.
The growth of data and its efficient handling has become a prominent trend in recent years, bringing new challenges and opening new avenues. Data analytics can be done more efficiently with the availability of the distributed architecture of "Not Only SQL" (NoSQL) databases.
This document provides information about big data and its characteristics. It discusses the different types of data that comprise big data, including structured, semi-structured, and unstructured data. It also addresses some of the challenges of big data, such as its increasing volume and the need to process it in real-time for applications like online promotions and healthcare monitoring. Traditional data warehouse architectures may not be well-suited for big data applications.
A survey on data mining and analysis in Hadoop and MongoDB (Alexander Decker)
This document discusses data mining of big data using Hadoop and MongoDB. It provides an overview of Hadoop and MongoDB and their uses in big data analysis. Specifically, it proposes using Hadoop for distributed processing and MongoDB for data storage and input. The document reviews several related works that discuss big data analysis using these tools, as well as their capabilities for scalable data storage and mining. It aims to improve computational time and fault tolerance for big data analysis by mining data stored in Hadoop using MongoDB and MapReduce.
The document discusses the rise of NoSQL databases as an alternative to traditional relational databases. It provides a brief history of NoSQL, noting that new types of applications and data led developers to look for databases that offer more flexibility and scalability. It also describes the main types of NoSQL databases - key-value stores, graph stores, column stores, and document stores - and discusses some of the advantages of NoSQL databases like flexibility, scalability, availability and lower costs.
Big data refers to huge volumes of both structured and unstructured data that are too large and too hard to process using current, traditional database tools and software technologies. The goal of big data storage management is to ensure a high level of data quality and availability for business intelligence and big data analytics applications. The graph database is not yet the most popular NoSQL database compared to relational databases, but it is a powerful one that can handle large volumes of data very efficiently. Managing large volumes of data with traditional technology is difficult, and data retrieval time tends to grow as database size increases; NoSQL databases offer a solution. This paper describes big data storage management, the dimensions of big data, the types of data, structured versus unstructured data, NoSQL databases and their types, the basic structure of graph databases, their advantages, disadvantages, and application areas, and compares various graph databases.
Big data refers to the massive amounts of unstructured data that are growing exponentially. Hadoop is an open-source framework that allows processing and storing large data sets across clusters of commodity hardware. It provides reliability and scalability through its distributed file system HDFS and MapReduce programming model. The Hadoop ecosystem includes components like Hive, Pig, HBase, Flume, Oozie, and Mahout that provide SQL-like queries, data flows, NoSQL capabilities, data ingestion, workflows, and machine learning. Microsoft integrates Hadoop with its BI and analytics tools to enable insights from diverse data sources.
Vikram Andem Big Data Strategy @ IATA Technology Roadmap (IT Strategy Group)
Vikram Andem, Senior Manager, United Airlines, A case for Bigdata Program and Strategy @ IATA Technology Roadmap 2014, October 13th, 2014, Montréal, Canada
2013 NIST Big Data Subgroups Combined Outputs (Bob Marcus)
This document provides an outline for combining deliverables from the NIST Big Data Working Group into an integrated document. It includes sections on introducing a definition of Big Data, describing reference architectures at a high and detailed level, mapping requirements to the architectures, discussing future directions and security/privacy challenges, and concluding with general advice. Appendices provide terminology glossaries, use case examples, roles/actors, and questions from a customer perspective.
What is NoSQL? How does it come to the picture? What are the types of NoSQL? Some basics of different NoSQL types? Differences between RDBMS and NoSQL. Pros and Cons of NoSQL.
What is MongoDB? What are the features of MongoDB? Nexus architecture of MongoDB. Data model and query model of MongoDB? Various MongoDB data management techniques. Indexing in MongoDB. A working example using MongoDB Java driver on Mac OSX.
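MongoDB's query model matches JSON-like documents against filter documents built from operators such as `$gt`. A minimal sketch of that matching, in Python rather than the Java driver the deck uses, and supporting only equality and `$gt`/`$lt` as an assumption:

```python
def matches(doc, query):
    """Return True if a JSON-like document satisfies a MongoDB-style
    query supporting equality and the $gt/$lt operators."""
    for field, cond in query.items():
        value = doc.get(field)
        if isinstance(cond, dict):          # operator form, e.g. {"$gt": 30}
            for op, operand in cond.items():
                if op == "$gt" and not (value is not None and value > operand):
                    return False
                if op == "$lt" and not (value is not None and value < operand):
                    return False
        elif value != cond:                 # plain equality match
            return False
    return True

people = [
    {"name": "Ana", "age": 34},
    {"name": "Bo", "age": 25},
]
adults = [d for d in people if matches(d, {"age": {"$gt": 30}})]
# adults keeps only the "Ana" document
```

A real driver pushes this matching to the server and can use indexes; the in-process filter only illustrates the shape of the query language.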
This deck gives a basic overview of NoSQL technologies, implementation vendors/products, case studies, and some of the core implementation algorithms. The presentation also gives a quick overview of emerging trends like "Polyglot Persistence" and "NewSQL". The deck is targeted at beginners who want an overview of NoSQL databases.
This document discusses big data and SQL Server. It covers what big data is, the Hadoop environment, big data analytics, and how SQL Server fits into the big data world. It describes using Sqoop to load data between Hadoop and SQL Server, and SQL Server features for big data analytics like columnstore and PolyBase. The document concludes that a big data analytics approach is needed for massive, variable data, and that SQL Server 2012 supports this with features like columnstore and tabular SSAS.
No sql databases new millennium database for big data, big users, cloud compu... (eSAT Publishing House)
This document discusses NoSQL databases as a new type of database designed for big data, big users, and cloud computing. It describes how the growth of data volumes, increased numbers of global users, and cloud architectures are driving organizations to adopt NoSQL databases over traditional relational databases. The document provides an overview of the characteristics of NoSQL databases, including how they are classified based on the CAP theorem and how their scale-out architecture provides improved performance and scalability over relational databases. Security challenges of NoSQL databases are also mentioned.
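The CAP-theorem classification mentioned above is often summarized as a mapping from each database to the pair of guarantees its default design favours. A sketch using commonly cited leanings; actual behaviour depends heavily on configuration, so treat these labels as assumptions, not absolutes:

```python
# Commonly cited CAP-theorem leanings (defaults, not absolutes):
# CP = consistency + partition tolerance, AP = availability + partition tolerance.
CAP_CLASS = {
    "HBase": "CP",
    "MongoDB": "CP",
    "Redis": "CP",
    "Cassandra": "AP",
    "Riak": "AP",
    "CouchDB": "AP",
}

def pick_databases(priority):
    """Return databases whose default design favours the given
    pair: 'CP' (consistency) or 'AP' (availability)."""
    return sorted(db for db, cls in CAP_CLASS.items() if cls == priority)

available_first = pick_databases("AP")
```

Riak, the subject of the paper above, sits on the AP side: it stays writable during partitions and reconciles divergent replicas afterwards.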
This document discusses the suitability of NoSQL databases for interactive applications. It notes that traditional databases do not provide the necessary scalability and performance for interactive applications that require reading and writing huge amounts of data swiftly. NoSQL databases are better suited for these tasks as they are scalable, fast, and robust. The document provides an overview of NoSQL databases and how their flexible schema, high performance, availability, and ability to scale horizontally make them a better fit than traditional databases for modern, large-scale, interactive applications.
Implementation of Multi-node Clusters in Column Oriented Database using HDFS (IJEACS)
HBase is a NoSQL database that runs in the Hadoop environment, so it can be called the Hadoop database. It combines the Hadoop distributed file system and MapReduce with a key/value store for real-time data access, leveraging the depth and efficiency of MapReduce. Previous testing used single-node clustering, which improved query performance compared to SQL; yet even with that improvement, data retrieval remained complicated because there were no multi-node clusters and everything was based on SQL queries. In this paper, we use HBase, a column-oriented database that sits on top of HDFS (the Hadoop distributed file system), together with multi-node clustering to increase performance. HBase is a key/value store forming a consistent, distributed, multidimensional, sorted map. HBase stores data in cells, and those cells are grouped by a row key. Our proposal yields better query performance and data retrieval than existing approaches.
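HBase's "consistent, distributed, multidimensional, sorted map" data model can be made concrete with a toy single-process sketch. The class name and the sequence-number stand-in for timestamps are inventions for illustration; nothing here is distributed:

```python
class MiniHBaseTable:
    """Toy sketch of HBase's data model: a sorted, multidimensional
    map keyed by (row key, column family, qualifier, timestamp)."""

    def __init__(self):
        self._cells = {}   # (row, family, qualifier) -> [(ts, value), ...]
        self._seq = 0      # deterministic stand-in for wall-clock timestamps

    def put(self, row, family, qualifier, value):
        self._cells.setdefault((row, family, qualifier), []).append(
            (self._seq, value)
        )
        self._seq += 1

    def get(self, row, family, qualifier):
        """Newest version of a cell wins, mirroring HBase's default read."""
        versions = self._cells.get((row, family, qualifier), [])
        return max(versions)[1] if versions else None

    def scan_row(self, row):
        """All current cells grouped under one row key, in sorted column order."""
        return {
            (family, qualifier): max(versions)[1]
            for (r, family, qualifier), versions in sorted(self._cells.items())
            if r == row
        }

table = MiniHBaseTable()
table.put("user1", "info", "name", "Ada")
table.put("user1", "info", "name", "Ada L.")   # newer version shadows the old
table.put("user1", "stats", "visits", 3)
```

The cells-grouped-by-row-key property shows up in `scan_row`: one row key pulls back every column family and qualifier stored under it.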
Challenges Management and Opportunities of Cloud DBA (inventy)
Research Inventy provides an outlet for research findings and reviews in areas of engineering and computer science found to be relevant for national and international development. Research Inventy is an open-access, peer-reviewed international journal whose primary objective is to publish research and applications related to engineering, to stimulate new research ideas, and to foster practical applications of research findings. The journal publishes original research of high quality so as to attract contributions from the relevant local and international communities.
NoSQL databases allow for a variety of data models like key-value, document, columnar and graph formats. NoSQL stands for "not only SQL" and provides an alternative to relational databases. It is useful for large distributed datasets and prioritizes performance and scalability over rigid data consistency. Common NoSQL databases include key-value stores like Redis and Riak, document databases like MongoDB and CouchDB, wide-column stores like Cassandra and HBase, and graph databases like Neo4j and Titan.
Iaetsd mapreduce streaming over cassandra datasets (Iaetsd)
This document discusses processing large datasets from Denmark's traffic using Apache Cassandra and MapReduce. It begins with an introduction to big data and how the volume, velocity, and variety of data requires alternative processing methods. Apache Cassandra is introduced as a distributed and scalable NoSQL database for storing large amounts of structured and unstructured data across servers. The document then discusses Cassandra's data model and system architecture. It describes how MapReduce can be used for distributed processing of datasets stored in Cassandra. The paper aims to process traffic datasets from Denmark using Cassandra and MapReduce to help the transportation department monitor traffic.
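The MapReduce pattern the paper applies to Cassandra-stored traffic data can be sketched in-process: map each record to key/value pairs, group by key (the "shuffle"), then reduce each group. The traffic rows below are hypothetical:

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Minimal in-process MapReduce: map each record to (key, value)
    pairs, group by key (the 'shuffle'), then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

# Hypothetical traffic rows: (road_segment, vehicle_count)
traffic = [("A1", 40), ("A1", 25), ("B7", 10)]
totals = map_reduce(
    traffic,
    mapper=lambda rec: [(rec[0], rec[1])],
    reducer=lambda key, values: sum(values),
)
# totals -> {"A1": 65, "B7": 10}
```

In the distributed case the map tasks run where Cassandra stores the partitions and only the shuffled groups cross the network; the three-function contract stays the same.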
DATABASE SYSTEMS PERFORMANCE EVALUATION FOR IOT APPLICATIONS (ijdms)
ABSTRACT
The amount of data stored in IoT databases increases as IoT applications extend throughout smart-city appliances, industry, and agriculture. Contemporary database systems must process huge amounts of sensor and actuator data in real time or interactively. Facing this first wave of the IoT revolution, database vendors struggle day by day to gain more market share, develop new capabilities, and overcome the disadvantages of previous releases, while providing features for the IoT.
There are two popular database types, relational database management systems and NoSQL databases, with NoSQL gaining ground for IoT data storage. Both types are examined in this paper. Focusing on open-source databases, the authors experiment on IoT data sets and answer the question of which one performs better than the other. It is a comparative study of the performance of commonly used open-source databases, presenting results for the NoSQL MongoDB database and the SQL databases MySQL and PostgreSQL.
The rising interest in NoSQL technology over the last few years has resulted in an increasing number of evaluations and comparisons among competing NoSQL technologies. From this survey we create a concise and up-to-date comparison of NoSQL engines, identifying their most beneficial uses from the software engineer's point of view.
Big data, agile development, and cloud computing are driving new requirements for database management systems. These requirements are in turn driving the next phase of growth in the database industry, mirroring the evolution of the OLAP industry. This document describes this evolution, the new application workload, and how MongoDB is uniquely suited to address these challenges.
This white paper presents the opportunities laid down by the data lake and advanced analytics, as well as the challenges in integrating, mining, and analyzing the data collected from these sources. It covers the important characteristics of the data lake architecture and the Data and Analytics as a Service (DAaaS) model. It also delves into the features of a successful data lake and its optimal design, and walks through the data, applications, and analytics that are strung together to speed up the insight-brewing process for industry improvements with the help of a powerful architecture for mining and analyzing unstructured data: the data lake.
HYBRID DATABASE SYSTEM FOR BIG DATA STORAGE AND MANAGEMENT (IJCSEA Journal)
Relational database systems have been the standard storage system for the last forty years. Recently, advances in technology have led to an exponential increase in data volume, velocity, and variety beyond what relational databases can handle. Developers are turning to NoSQL, a non-relational database, for data storage and management; however, some core features of database systems, such as ACID, are compromised in NoSQL databases. This work proposes a hybrid database system for the storage and management of extremely voluminous data of diverse components, known as big data, such that the two models are integrated in one system to eliminate the limitations of the individual systems. The system is implemented in MongoDB, a NoSQL database, and SQL. The results obtained reveal that having these two databases in one system can enhance the storage and management of big data, bridging the gap between the relational and NoSQL storage approaches.
HYBRID DATABASE SYSTEM FOR BIG DATA STORAGE AND MANAGEMENT (IJCSEA Journal)
This document proposes a hybrid database system that integrates a NoSQL database (MongoDB) and a relational database (MySQL) to address the limitations of each individual system for big data storage and management. It discusses the properties of big data, reviews the approaches of relational and NoSQL databases, highlights their strengths and weaknesses, and then describes the proposed hybrid system that categorizes data as structured or unstructured and stores it in the appropriate database to leverage the benefits of both models. The system is designed to enhance big data storage and management by bridging the gaps between relational and NoSQL approaches.
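The routing idea described above (classify each record as structured or unstructured, then store it on the relational or NoSQL side) can be sketched with in-memory stand-ins. The fixed schema, class names, and return values are assumptions for illustration, not the paper's actual implementation:

```python
def is_structured(record):
    """Crude classifier: a record counts as 'structured' here if it is a
    dict whose keys exactly match a fixed schema (a sketch assumption)."""
    SCHEMA = {"id", "name", "email"}
    return isinstance(record, dict) and set(record) == SCHEMA

class HybridStore:
    """Route structured records to a relational stand-in and
    everything else to a NoSQL stand-in."""

    def __init__(self):
        self.sql_rows = []    # stand-in for a MySQL table
        self.nosql_docs = []  # stand-in for a MongoDB collection

    def insert(self, record):
        if is_structured(record):
            self.sql_rows.append(record)
            return "sql"
        self.nosql_docs.append(record)
        return "nosql"

store = HybridStore()
store.insert({"id": 1, "name": "Ana", "email": "a@x.org"})  # routed to "sql"
store.insert({"tweet": "big data!", "tags": ["nosql"]})     # routed to "nosql"
```

A production version would replace the schema check with real schema inference and the two lists with actual MySQL and MongoDB connections, but the classify-then-route shape is the core of the proposal.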
Very basic Introduction to Big Data. Touches on what it is, characteristics, some examples of Big Data frameworks. Hadoop 2.0 example - Yarn, HDFS and Map-Reduce with Zookeeper.
The document discusses NoSQL databases as an alternative to traditional SQL databases. It provides an overview of NoSQL databases, including their key features, data models, and popular examples like MongoDB and Cassandra. Some key points:
- NoSQL databases were developed to overcome limitations of SQL databases in handling large, unstructured datasets and high volumes of read/write operations.
- NoSQL databases come in various data models like key-value, column-oriented, and document-oriented. Popular examples discussed are MongoDB and Cassandra.
- MongoDB is a document database that stores data as JSON-like documents and supports flexible querying. Cassandra is a column-oriented database developed by Facebook that is highly scalable.
There are two popular database types: The Relational Database Management Systems and NoSQL databases, with NoSQL gaining ground on IoT data storage. In the context of this paper these two types are examined. Focusing on open source databases, the authors experiment on IoT data sets and pose an answer to the question which one performs better than the other. It is a comparative study on the performance of the commonly market used open source databases, presenting results for the NoSQL MongoDB database and SQL databases of MySQL and PostgreSQL
The rising interest in NoSQL technology over the last few years resulted in an increasing number of evaluations and comparisons among competing NoSQL technologies From survey we create a concise and up-to-date comparison of NoSQL engines, identifying their most beneficial use from the software engineer point of view.
Big data, agile development, and cloud computing
are driving new requirements for database
management systems. These requirements are in turn
driving the next phase of growth in the database
industry, mirroring the evolution of the OLAP
industry. This document describes this evolution, the
new application workload, and how MongoDB is
uniquely suited to address these challenges.
This white paper will present the opportunities laid down by
data lake and advanced analytics, as well as, the challenges
in integrating, mining and analyzing the data collected from
these sources. It goes over the important characteristics of
the data lake architecture and Data and Analytics as a
Service (DAaaS) model. It also delves into the features of a
successful data lake and its optimal designing. It goes over
data, applications, and analytics that are strung together to
speed-up the insight brewing process for industry’s
improvements with the help of a powerful architecture for
mining and analyzing unstructured data – data lake.
HYBRID DATABASE SYSTEM FOR BIG DATA STORAGE AND MANAGEMENTIJCSEA Journal
Relational database systems have been the standard storage system over the last forty years. Recently,
advancements in technologies have led to an exponential increase in data volume, velocity and variety
beyond what relational databases can handle. Developers are turning to NoSQL which is a non- relational
database for data storage and management. Some core features of database system such as ACID have
been compromised in NOSQL databases. This work proposed a hybrid database system for the storage and
management of extremely voluminous data of diverse components known as big data, such that the two
models are integrated in one system to eliminate the limitations of the individual systems. The system is
implemented in MongoDB which is a NoSQL database and SQL. The results obtained, revealed that having
these two databases in one system can enhance storage and management of big data bridging the gap
between relational and NoSQL storage approach.
HYBRID DATABASE SYSTEM FOR BIG DATA STORAGE AND MANAGEMENTIJCSEA Journal
This document proposes a hybrid database system that integrates a NoSQL database (MongoDB) and a relational database (MySQL) to address the limitations of each individual system for big data storage and management. It discusses the properties of big data, reviews the approaches of relational and NoSQL databases, highlights their strengths and weaknesses, and then describes the proposed hybrid system that categorizes data as structured or unstructured and stores it in the appropriate database to leverage the benefits of both models. The system is designed to enhance big data storage and management by bridging the gaps between relational and NoSQL approaches.
Very basic Introduction to Big Data. Touches on what it is, characteristics, some examples of Big Data frameworks. Hadoop 2.0 example - Yarn, HDFS and Map-Reduce with Zookeeper.
The document discusses NoSQL databases as an alternative to traditional SQL databases. It provides an overview of NoSQL databases, including their key features, data models, and popular examples like MongoDB and Cassandra. Some key points:
- NoSQL databases were developed to overcome limitations of SQL databases in handling large, unstructured datasets and high volumes of read/write operations.
- NoSQL databases come in various data models like key-value, column-oriented, and document-oriented. Popular examples discussed are MongoDB and Cassandra.
- MongoDB is a document database that stores data as JSON-like documents. It supports flexible querying. Cassandra is a column-oriented database developed by Facebook that is highly scalable
This document discusses NoSQL databases and compares MongoDB and Cassandra. It begins with an introduction to NoSQL databases and why they were created. It then describes the key features and data models of NoSQL databases including key-value, column-oriented, document, and graph databases. Specific details are provided about MongoDB and Cassandra, including their data structure, query operations, examples of usage, and enhancements. The document provides an in-depth overview of NoSQL databases and a side-by-side comparison of MongoDB and Cassandra.
Big Data is used to store huge volume of both structured and unstructured data which is so large and is
hard to process using current / traditional database tools and software technologies. The goal of Big Data
Storage Management is to ensure a high level of data quality and availability for business intellect and big
data analytics applications. Graph database which is not most popular NoSQL database compare to
relational database yet but it is a most powerful NoSQL database which can handle large volume of data in
very efficient way. It is very difficult to manage large volume of data using traditional technology. Data
retrieval time may be more as per database size gets increase. As solution of that NoSQL databases are
available. This paper describe what is big data storage management, dimensions of big data, types of data,
what is structured and unstructured data, what is NoSQL database, types of NoSQL database, basic
structure of graph database, advantages, disadvantages and application area and comparison of various
graph database.
A Study on Graph Storage Database of NOSQLIJSCAI Journal
Big Data is used to store huge volume of both structured and unstructured data which is so large and is
hard to process using current / traditional database tools and software technologies. The goal of Big Data
Storage Management is to ensure a high level of data quality and availability for business intellect and big
data analytics applications. Graph database which is not most popular NoSQL database compare to
relational database yet but it is a most powerful NoSQL database which can handle large volume of data in
very efficient way. It is very difficult to manage large volume of data using traditional technology. Data
retrieval time may be more as per database size gets increase. As solution of that NoSQL databases are
available. This paper describe what is big data storage management, dimensions of big data, types of data,
what is structured and unstructured data, what is NoSQL database, types of NoSQL database, basic
structure of graph database, advantages, disadvantages and application area and comparison of various
graph database.
A Study on Graph Storage Database of NOSQLIJSCAI Journal
This document summarizes a research paper on graph storage databases in NoSQL. It discusses big data and the need for alternative databases to handle large, diverse datasets. It defines the key aspects of big data including volume, velocity, variety and complexity. It also describes different types of NoSQL databases, focusing on the basic structure of graph databases. Graph databases use nodes and relationships to model connected data. The document compares several graph database systems and discusses advantages like performance and flexibility as well as disadvantages like complexity. It outlines several applications of graph databases in areas like social networks and logistics.
This document discusses data migration in schemaless NoSQL databases. It begins by defining NoSQL databases and comparing them to traditional relational databases. It then covers aggregate data models and the concepts of schemalessness and implicit schemas in NoSQL databases. The main focus is on data migration when an implicit schema changes, including principles, strategies, and test options for ensuring data matches the new implicit schema in applications.
The document discusses the NoSQL movement and non-relational databases. It provides background on the limitations of relational databases that led to the development of NoSQL databases. Examples of NoSQL databases are described like Voldemort, CouchDB, and Cassandra. Benefits of NoSQL databases include horizontal scaling, high availability, and faster performance.
Relational Databases For An Efficient Data Management And...Sheena Crouch
This paper analyzes and compares MySQL and Neo4j databases. MySQL is a relational database that has been used for decades, while Neo4j is a graph database that is part of the emerging NoSQL technology. The paper reviews Neo4j and compares it to MySQL based on features such as ACID properties, replication, availability, and the languages used. The goal is to determine how well each database handles big data and complex relationships between entities. The analysis focuses on the differentiation between relational and graph-based approaches to data management.
This document summarizes a study that compares the performance of time series databases using real-world datasets versus synthetic datasets. The study measures three key performance metrics - data loading throughput, storage space usage, and query latency - for different time series databases when ingesting and querying both real and synthetic time series data. The results show significant differences in performance between real and synthetic datasets for data injection throughput and query execution times. Specifically, databases perform differently when handling real-world versus synthetic datasets, indicating that benchmarks using only synthetic data may not accurately represent real-world database performance for time series applications.
EVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMINGijiert bestjournal
This document summarizes a research paper that evaluates Cassandra and MongoDB NoSQL databases for processing unstructured data using Hadoop streaming. It proposes a system with three stages: data preparation where data is downloaded from Cassandra servers to file systems; data transformation where JSON data is converted to other formats using MapReduce; and data processing where non-Java executables run on the transformed data. The document reviews related work on Cassandra and Hadoop performance and discusses the data models of key-value, document, column-oriented, and graph databases. It concludes that comparing Cassandra and MongoDB can help process unstructured data and outline new approaches.
Relational databases are a technology used universally that enables storage, management and retrieval of
varied data schemas. However, execution of requests can become a lengthy and inefficient process for
some large databases. Moreover, storing large amounts of data requires servers with larger capacities and
scalability capabilities. Relational databases have limitations to deal with scalability for large volumes of
data. On the other hand, non-relational database technologies, also known as NoSQL, were developed to
better meet the needs of key-value storage of large amounts of records. But there is a large amount of
NoSQL candidates, and most have not been compared thoroughly yet. The purpose of this paper is to
compare different NoSQL databases, to evaluate their performance according to the typical use for storing
and retrieving data. We tested 10 NoSQL databases with Yahoo! Cloud Serving Benchmark using a mix of
operations to better understand the capability of non-relational databases for handling different requests,
and to understand how performance is affected by each database type and their internal mechanisms.
A Comparative Study of NoSQL and Relational Database.pdfJennifer Roman
The document compares relational databases and NoSQL databases. It discusses their key features such as scalability, cost, data volume handling, availability, and performance. Relational databases are better for consistency but struggle with scalability, availability and handling large volumes of data. NoSQL databases are better suited for modern web and big data applications as they offer better performance, scalability and can handle large volumes of data, though they lack standardization and have weaker security. The choice of database depends on the nature and requirements of the application. Both database models have strengths and weaknesses and will continue to co-exist to support different application needs.
NoSQL databases have a distributed data structure that provides high availability and scalability compared to relational databases. NoSQL databases are categorized as key-value stores, document stores, extensible record stores, or graph stores depending on how data is stored and accessed. The right NoSQL database choice depends on factors like performance needs, scalability, flexibility, and whether transactions or analytics are more important for a given use case.
The aim of this paper is to evaluate, through indexing techniques, the performance of Neo4j and
OrientDB, both graph databases technologies and to come up with strength and weaknesses os each
technology as a candidate for a storage mechanism of a graph structure. An index is a data structure that
makes the searching faster for a specific node in concern of graph databases. The referred data structure
is habitually a B-tree, however, can be a hash table or some other logic structure as well. The pivotal
point of having an index is to speed up search queries, primarily by reducing the number of nodes in a
graph or table to be examined. Graphs and graph databases are more commonly associated with social
networking or “graph search” style recommendations. Thus, these technologies remarkably are a core
technology platform for some Internet giants like Hi5, Facebook, Google, Badoo, Twitter and LinkedIn.
The key to understanding graph database systems, in the social networking context, is they give equal
prominence to storing both the data (users, favorites) and the relationships between them (who liked
what, who ‘follows’ whom, which post was liked the most, what is the shortest path to ‘reach’ who). By a
suitable application case study, in case a Twitter social networking of almost 5,000 nodes imported in
local servers (Neo4j and Orient-DB), one queried to retrieval the node with the searched data, first
without index (full scan), and second with index, aiming at comparing the response time (statement query
time) of the aforementioned graph databases and find out which of them has a better performance (the
speed of data or information retrieval) and in which case. Thereof, the main results are presented in the
section 6.
A Survey of Non -Relational Databases with Big Datarahulmonikasharma
The paper's objective is to provide classification, characteristics and evaluation of available non relational database systems which may be used in Big Data Predictions and Analytics .Paper describes why Relational Database Bases Management Systems such as IBM’s, DB2, Oracle, and SAP fail to meet the Big Data Analytical and Prediction Requirements. The paper also compares the structured, semi-structured, and unstructured data. The paper also includes the various types of NoSQL databases and their specifications Finally, the operational issues such as scale, performance and availability of data by utilizing these database systems will be compared.
A Survey of Non -Relational Databases with Big Datarahulmonikasharma
The paper's objective is to provide classification, characteristics and evaluation of available non relational database systems which may be used in Big Data Predictions and Analytics .Paper describes why Relational Database Bases Management Systems such as IBM’s, DB2, Oracle, and SAP fail to meet the Big Data Analytical and Prediction Requirements. The paper also compares the structured, semi-structured, and unstructured data. The paper also includes the various types of NoSQL databases and their specifications Finally, the operational issues such as scale, performance and availability of data by utilizing these database systems will be compared.
• To monitor the performance of Riak KV
(throughput, latency) while data is being
read, written and updated.
The rest of this paper is organized as follows:
Section 2 presents background and basic concepts.
Section 3 takes a deeper look at related work. Section
4 provides an overview of the Riak KV NoSQL database
system and its infrastructure. Section 5 describes the
Basho bench benchmarking of Riak KV. Section 6
presents the experimental environment for testing
Riak KV with Basho bench, together with our
experimental results and discussion. Section 7
concludes the paper.
2 BACKGROUND AND BASIC CONCEPTS
In this section, the basic concepts related to big
data and the properties of NoSQL databases are
introduced, along with the challenges associated
with each.
2.1 Big data
In this part, we describe the term big data, which is
closely related to NoSQL database systems. Big data
can be defined as the capability of managing a huge
volume of data at the right time and at the proper speed.
Big data is an evolving term that describes any
voluminous amount of structured, semi-structured and
unstructured data that has the potential to be mined for
information and that cannot be managed using relational
database management systems (RDBMSs) [6,8].
Every day, new data is created from a variety of
sources, including social networks, photos, videos, and
more. Due to this rapid growth, it has become
very difficult to process the data with available
database management systems. One solution proposed
to cope with the fast growth of data has been
better hardware; however, this approach has not been
sufficient, as hardware enhancement has reached a point
where the growth of data volume outpaces computer
resources [5]. Big data can be found in three forms:
• Structured - Any data that can be stored, accessed
and processed in a fixed format is termed
'structured' data. Over time, computer scientists
have achieved great success in developing
techniques for working with this kind of data
(where the format is known well in advance) and
in deriving value from it. There are two sources
of structured data: data generated by human
intervention, such as gaming data and input data,
and data generated by machines, such as sensor
data, web log data and financial data [8,9].
• Unstructured data - Before the current ubiquity
of online and mobile applications, databases
processed direct, structured data. The data forms
were fairly simple and described a set of
relationships between the various data types in the
database. In contrast, unstructured data refers to
data that does not fully fit the traditional
column-and-row structure of a relational
database. In today's big data world, most of the
data created is unstructured; some estimates put
it at more than 95% of all data generated [29].
• Semi-structured data - This data combines
structured and unstructured data. Dealing with
this level of complexity is not easy: big data
and extensive records lead to long-running
queries, so new methods and techniques are
needed to overcome this challenge and manage
large amounts of data [14].
2.2 NoSQL
The term NoSQL ("Not only SQL") describes an entire
class of databases that do not have the characteristics
of traditional relational databases and for which the
standard SQL query language is generally not used.
NoSQL databases are considered next-generation
databases: they support huge data storage, horizontal
scalability, open-source and distributed operation, and
massively parallel data processing. They are
characterized by a less strict, static data structure,
simple support for replication and a simple application
programming interface. They are often associated with
large data sets that need to be accessed and changed
quickly and efficiently on the Web [11,10].
NoSQL databases can be classified into four
categories.
• Key-Value (KV) - In general, NoSQL
databases allow the use of various types of
data tools. These are becoming common in new
business plans and in big data analysis, where
classified data should be stored in a practical
and efficient manner [16]. Within this context,
key-value stores are the simplest NoSQL
databases. They can help developers in the
absence of a predefined schema: different kinds
of objects, data types and data containers are
used to accommodate this [17,15]. High query
speed with a simple structure, where KV is the
data model, brings benefits such as high
concurrency and mass storage. Data modification
and query operations are supported through
primary keys, as in Riak KV [18] and Redis [19].
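The key-value model above can be sketched in a few lines of plain Python. This is an illustrative toy store, not the actual Riak KV client API (which adds buckets, replication and cluster operations); the class and key names are our own:

```python
# Minimal sketch of a key-value store. Values are opaque to the store;
# lookups happen only by primary key, which is what enables the high
# concurrency and simple scaling described above.
class KeyValueStore:
    def __init__(self):
        self._data = {}  # key -> value

    def put(self, key, value):
        self._data[key] = value  # insert or overwrite by primary key

    def get(self, key, default=None):
        return self._data.get(key, default)  # lookup by primary key only

    def delete(self, key):
        self._data.pop(key, None)

store = KeyValueStore()
store.put("user:42", {"name": "Alice", "visits": 3})
print(store.get("user:42"))  # {'name': 'Alice', 'visits': 3}
```

Because the store never interprets the value, any object, document or blob can be stored without a predefined schema.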
• Column-oriented - A table in a column-
oriented database can serve as the data
model; however, it stores tables of
extensible records. It includes columns and
rows, which may be shared by being
divided over nodes. In general, the benefit of
this data model is its suitability for
aggregation and data-warehouse applications;
HBase [20] and Cassandra [21] are examples of
this kind of data store.
INTELLIGENT AUTOMATION AND SOFT COMPUTING
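A simplified sketch of the column-family model, in the spirit of HBase/Cassandra (real systems add timestamps, sharding and on-disk structures such as SSTables; the table and family names here are invented for illustration):

```python
from collections import defaultdict

# row key -> column family -> column -> value.
# Rows are extensible records: each row may carry a different set of
# columns, and data for one family can be aggregated without reading
# the rest of the row.
table = defaultdict(lambda: defaultdict(dict))

def put(row_key, family, column, value):
    table[row_key][family][column] = value

put("row1", "info", "name", "Alice")
put("row1", "metrics", "visits", 3)
put("row2", "info", "name", "Bob")  # no 'metrics' family on this row

# Aggregation touches only the relevant family/column per row:
total = sum(r["metrics"]["visits"] for r in table.values() if "metrics" in r)
print(total)  # 3
```

The per-column layout is what makes aggregation and data-warehouse workloads a good fit for this model.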
• Document data stores - Also known as a document-
oriented database, this kind of database is used to
retrieve, store and manage semi-structured data.
Document databases can usually use secondary
indexes to serve upper-layer applications. The
key-value and document database structures are very
similar; they differ in how they process data. The
name derives from the manner of storage: data is
stored as documents in XML or JSON format [22,23].
CouchDB and MongoDB [24] are examples of
document data stores.
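The difference from a pure key-value store can be sketched as follows: the store can look inside each document and filter on its fields. This is a naive scan in plain Python, not MongoDB's query API; real document databases use secondary indexes for such queries:

```python
# Each record is a self-describing JSON-like document; the schema may
# vary from document to document.
docs = {
    "d1": {"type": "user", "name": "Alice", "tags": ["admin"]},
    "d2": {"type": "user", "name": "Bob"},           # different fields
    "d3": {"type": "order", "user": "Alice", "total": 9.5},
}

def find(field, value):
    # Naive full scan on a document field (a secondary index would
    # avoid touching every document).
    return [k for k, d in docs.items() if d.get(field) == value]

print(find("type", "user"))  # ['d1', 'd2']
```

In a key-value store the values above would be opaque blobs, so a query like `find("type", "user")` would be impossible without fetching and decoding every value in the application.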
• Graph databases - A graph database comprises nodes
that are connected by edges; data can be stored in both
edges and nodes. One advantage of a graph database is
that it can traverse relationships very quickly. Like
the other three types of NoSQL databases mentioned
above, graph databases have some problems with
horizontal scaling: because every node can connect to
any other node, traversing nodes that sit on different
physical machines can have a negative effect on
performance. Another difference from the above three
is that most graph databases support ACID (atomicity,
consistency, isolation, and durability) transactions.
Graph databases are often used to deal with complex
problems such as social networks or path-finding
[25]; Neo4j [26] is an example.
3 RELATED WORK
THERE have been numerous papers, studies, and blogs that
test and evaluate NoSQL databases, discussing features such
as their benefits and trying to identify the most suitable
NoSQL database. Ali Hammood et al. [9] examined recent
versions of three such systems: a testing environment was set
up for each workload, and the responses of the Cassandra,
HBase, and MongoDB database systems were monitored.
According to the results obtained, HBase and Cassandra
worked very well under heavy loads, while MongoDB worked
very well at low throughput but not as well at high throughput;
in the read operation, HBase had lower performance, and the
latency of the newer versions was lower than before for all
operations, particularly for MongoDB. Lazar J. Krstić et al.
[11] used the Database Benchmark tool to test the performance
of five NoSQL databases: BrightstarDB, LevelDB, HamsterDB,
RavenDB, and STSdb 4.0. Database Benchmark, the tool used
to perform the measurements, was configured to run the
NoSQL databases under approximately the same conditions,
so that the obtained measurements could be compared almost
realistically. The authors reported that HamsterDB had the
best performance and BrightstarDB the worst, a conclusion
they had expected before the actual measurements began. The
study by Kuldeep Singh [12] compared Riak, HBase,
Cassandra, and MongoDB from different perspectives; the
experiments used YCSB to compare and evaluate the
performance of these four datastores on a distributed cluster,
applying different workloads to each system in the same test
environment. The thesis concluded that each system responds
differently to a given workload because of differences in their
designs.
Abramova et al. [13] tested the performance of
Cassandra based on a number of factors, including the
number of nodes, workload characteristics, number of
threads, and data size, and analyzed whether it provides
the desired speedup and scalability. Scaling the number
of nodes and the data-set size does not by itself
guarantee better performance; however, Cassandra
handles concurrent request threads well and scales well
with them. In summary, that paper concluded that
increasing the number of nodes in a cluster from 1 or 3
to 6, even for relatively large data sets, cannot
guarantee an improvement in performance.
The authors of [29] presented a method and the results of
a study that selected among three NoSQL database
systems for a large, distributed healthcare organization.
Performance assessment methods and results are
presented for the following databases: MongoDB,
Cassandra, and Riak. The test was based on the YCSB
benchmark for evaluating NoSQL databases. The paper
concludes that the Cassandra database provides the best
throughput performance, albeit with the highest latency.
4 RIAK KEY-VALUE (KV)
RIAK KV is an open-source KV database developed by
Basho in 2007 and written in Erlang and C. A
commercial edition, Riak Enterprise, adds multi-data
center replication, monitoring, and additional support
[22].
Riak KV is a distributed NoSQL database that is highly
scalable, available, and straightforward to work with. It
automatically distributes data across a cluster to ensure
fast performance and fault tolerance, and it guarantees
read and write availability even under hardware failures
or network partitions by supporting both local and
multi-cluster replication; Riak Enterprise's multi-cluster
replication additionally provides low latency and strong
business continuity. Riak KV is designed to handle a
variety of challenges facing big data applications,
including tracking user or session data, storing data from
connected devices, and replicating data around the
world. Its KV design provides a powerful, simple data
model for storing large amounts of unstructured data
[22,18].
Riak KV achieves fast performance and robust business
continuity by automating data distribution across the
cluster: capacity can be added easily without a large
operational burden, and a masterless architecture
guarantees high availability and near-linear scaling on
commodity hardware [18]. Nodes in Riak form a cluster,
which is divided into partitions and virtual nodes
(vnodes) arranged in a ring. The ring is a 160-bit integer
space divided into equally sized partitions, as shown in
Figure 1.
Figure 1. Architecture of the Riak cluster.
Each node (also called a physical node) in the ring runs
a certain number of virtual nodes (vnodes), and each
vnode occupies one partition of the ring. The number of
partitions in the ring is defined when Riak is configured
or when the cluster is initialized [27].
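As a rough illustration of this partitioning scheme, the following sketch hashes a bucket/key pair onto a 160-bit ring divided into equally sized partitions and assigns partitions round-robin to physical nodes. This is a simplified sketch, not Riak's implementation; the partition count, node names, and round-robin assignment are illustrative assumptions (Riak does hash bucket/key pairs with SHA-1 onto such a ring).

```python
import hashlib

RING_SIZE = 2 ** 160          # the ring is a 160-bit integer space
NUM_PARTITIONS = 64           # illustrative; a power of two chosen at cluster setup
PARTITION_SIZE = RING_SIZE // NUM_PARTITIONS

def key_to_partition(bucket: str, key: str) -> int:
    """Hash a bucket/key pair onto the ring and return its partition index."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode()).digest()
    position = int.from_bytes(digest, "big")   # a point on the 160-bit ring
    return position // PARTITION_SIZE

def partition_owner(partition: int, nodes: list) -> str:
    """Assign partitions (vnodes) to physical nodes round-robin (simplification)."""
    return nodes[partition % len(nodes)]

# Hypothetical 5-node cluster, mirroring the experimental setup in this paper.
nodes = [f"riak@node{i}" for i in range(1, 6)]
p = key_to_partition("session", "user42")
print(p, partition_owner(p, nodes))
```

Because the hash spreads keys uniformly over the ring, adding a node only moves the partitions it takes over, which is what allows capacity to be added without a large operational burden.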
5 BENCHMARKING OF NOSQL
THE Basho-bench is a benchmarking tool created to
conduct accurate and repeatable performance and stress
tests and to produce performance graphs. Originally
developed to benchmark Riak, it exposes a pluggable
driver interface and has been extended to serve as a
benchmarking tool across a variety of projects.
Basho-bench focuses on two performance metrics:
throughput and latency [28].
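To make the two metrics concrete, the sketch below computes throughput (completed operations per second of wall-clock time) and mean latency (average per-operation time) from a list of per-operation timings. The sample numbers are invented, and this mirrors only the idea behind the reported metrics, not Basho-bench's own Erlang implementation.

```python
def summarize(latencies_ms, elapsed_s):
    """Throughput = completed ops / elapsed seconds; latency = mean per-op time (ms)."""
    throughput = len(latencies_ms) / elapsed_s
    mean_latency = sum(latencies_ms) / len(latencies_ms)
    return throughput, mean_latency

# Hypothetical 1-second measurement window with five completed get operations.
ops = [2.0, 3.0, 2.5, 4.0, 3.5]          # per-operation latency in milliseconds
tp, lat = summarize(ops, elapsed_s=1.0)
print(tp, lat)                           # 5.0 ops/sec, 3.0 ms mean latency
```

Note that the two metrics can move independently: adding worker threads may raise throughput while also raising per-operation latency, which is exactly the trade-off examined in the experiments below.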
How Does the Benchmark Work?
Each node can be either a traffic generator or a Riak
node. A traffic generator runs one copy of Basho-bench
that generates and sends commands to Riak nodes. A
Riak node contains a complete and independent copy
of the Riak package which is identified by an Internet
Protocol (IP) address and a port number. Figure 2
shows how traffic generators and Riak nodes are
organized inside a cluster. There is one traffic generator
for every three Riak nodes [4].
Figure 2. Riak Nodes and Traffic Generators in Basho-bench.
6 EXPERIMENT ENVIRONMENT
6.1 Experimental Setup
IN this section, we introduce the results of experiments
performed by testing the Riak KV NoSQL database with
Basho-bench. The benchmark is specifically designed for
Riak performance testing and analysis. The Riak
benchmark is performed using Basho's measurement
software, which records the number of transactions
executed per second. The benchmark requires a
configuration file containing the parameters needed to
start the run; it launches the given number of workers,
which together perform the given task. The test was done
with different numbers of keys (10 K, 100 K, 1000 K,
10,000 K, and 200,000 K) and a fixed size of 10,000 KB
for every key.
The experiments were performed in the following
environment: a 5-node cluster with 16 GB of RAM, an
Intel Xeon E3-1241 v3 CPU @ 3.50 GHz × 8, and 1 TB
of ephemeral storage in each unit. Ubuntu 14.04 LTS
(64-bit) was installed on each unit. Figure 3 illustrates
the experimental structure, with details of the primary
components.
Figure 3. Experimental structure.
6.2 Performance Configuration
THE Basho-bench is a test tool that performs reads, updates,
and writes according to a workload and measures performance.
The configuration lists the possible operations the driver will
run, such as [{get,4},{put,4},{delete,1}], which means that out
of every nine operations, on average, get is called four times,
put four times, and delete once. The benchmark package
provides a set of predetermined experiments that can be
executed as follows:
• Experiment #A- Update-heavy. It consists of a 1/1
proportion of reads and updates.
• Experiment #B- Read-mostly. It consists of a 9/1
proportion of reads to updates.
• Experiment #C- Read-only. The workload is 100% reads.
To evaluate the loading time, we generated different numbers of
keys (10 K, 100 K, 1000 K, 10,000 K, and 200,000 K) and
varied the number of threads (4, 8, and 12).
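The weighted operation mix described above can be sketched as weighted random selection. This is an illustrative Python stand-in for the configured ratios (e.g. get/put/delete at 4/4/1), not Basho-bench's actual Erlang driver.

```python
import random

def pick_operation(weights, rng=random):
    """Choose the next operation according to Basho-bench-style {op, weight} pairs."""
    ops, counts = zip(*weights)
    return rng.choices(ops, weights=counts, k=1)[0]

mix = [("get", 4), ("put", 4), ("delete", 1)]
sample = [pick_operation(mix) for _ in range(9000)]
# Out of every 9 operations, on average: 4 get, 4 put, 1 delete.
print(sample.count("get"), sample.count("put"), sample.count("delete"))
```

Under this scheme experiment #A corresponds to a 1/1 read/update weighting, #B to 9/1, and #C to reads only.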
7 EXPERIMENTAL RESULTS AND DISCUSSION
In the following, we devote a section to each experiment,
describing its read/update scenario and illustrating its
results.
7.1 Experiment #A: Update-heavy. It consists of a 1/1
proportion of reads and updates. Figure 4 shows the results.
• Throughput Result
Figure 4. Throughput performance for experiment (A) (1/1
read/update).
We notice from Figure 4 that with 8 threads, when the
number of keys in the cluster increased from 100 K to
1000 K, throughput was similar (about 190 operations/sec),
whereas with 10 K keys it was higher (250 operations/sec).
Overall, with 12 threads, throughput was considerably
higher across all key counts than with the other thread
counts. The underlying figures (operations/sec) are:
Threads    10 K   100 K   1000 K   10,000 K   200,000 K
4           340    320     200        30         20
8           250    190     190        60         10
12          590    400     420        80        100
• Latency Results
Latency is the delay between input to a system and the
desired result; in each context the term is understood
slightly differently, and latency problems vary from one
system to another. Latency strongly affects the usability
of electronic and mechanical
equipment as well as communications. From Figure 5,
we observe that all three cases show high latency for the
update operation. This is expected, because the read
operation usually does not incur latency as great as the
other operations. The highest value, with 4 threads,
reached a mean latency of 66 ms, almost equal to that
of 8 and 12 threads.
Figure 5. Latency for experiment (A) (1/1 read/update).
7.2 Experiment #B: Our second experiment is
read-mostly. It consists of a 9/1 proportion of reads to
updates. Figure 6 shows the results.
• Throughput Result
The experimental results are illustrated in Figure 6.
Figure 6. Throughput performance for experiment (B) (9/1
read/update).
The performance behavior exhibited in experiment (A)
differed from that of experiment (B) (9 read operations
for every 1 update). The throughput in experiment (B)
was higher than in experiment (A) for all thread counts.
Furthermore, for large key counts the performance
decreases as the number of threads increases; for
example, for 10,000 K to 200,000 K keys with 4 and 8
threads, the difference in the number of operations was
not as expected. As can be seen from the figure, the
number of keys has a significant effect on the
performance of Riak KV. For example, in Figure 6, the
number of operations with 12 threads is 710
operations/sec for 10 K keys but only 20 operations/sec
for 200,000 K keys, a very large difference.
• Latency Result
From Figure 7, the results here differed from those of
experiment (C). Latency was high in Figure 7(a),
reaching about 44 ms when the number of records
reached 10,000 K and 70 ms with 200,000 K keys.
[Figure 5 data: mean-get and mean-update latency (ms) across 10 K–200,000 K keys; panels (a)–(c).]
Data of Figure 6 (throughput, operations/sec):
Threads    10 K   100 K   1000 K   10,000 K   200,000 K
4           580    400     300       280        230
8           385    390     300       277        210
12          710    470     410        70         20
Figure 7. Latency for experiment (B) (9/1 read/update).
7.3 Experiment #C: Read-only. The read ratio in this
experiment is 100%. The results are shown in Figure 8.
• Throughput Result
The throughput performance of experiment (C), 100% reads,
is shown in Figure 8. Increasing the number of keys decreases
the throughput, which again confirms that the number of keys
has a significant effect. For example, Figure 8 shows the
number of operations with 4 threads as 625 operations/sec for
10 K keys, decreasing to 475 operations/sec when the number
of keys reaches 200,000 K. With 8 threads, throughput at
200,000 K keys was 320 operations/sec, the lowest of the
three thread counts. In general, the read-only experiment
shows high and stable performance across all key counts
compared to the other experiments (A, B).
Figure 8. Throughput performance for experiment (C) (100% read).
• Latency Result
Figure 9 shows the latency for experiment (C), read
operations only. We note that increasing the number of
threads reduced the latency; with 4 threads, latency was
highest, peaking at 12 ms with 200,000 K keys. The
results show that latency was almost equal across all key
counts from 10 K to 200,000 K while the threads
performed read-only operations.
Figure 9. Latency for experiment (C) (100% read).
In summary of the previous three experiments (A, B,
and C), we note that increasing the number of threads
had a significant effect on the performance of the Riak
KV NoSQL database: more threads generally increased
throughput. The performance measures nevertheless
vary from one experiment to another; when update and
read operations were equal, as in experiment (A),
throughput was low compared to the other experiments.
Figure 10 shows the throughput comparison for the
three experiments.
Data of Figure 8 (throughput, operations/sec):
Threads    10 K   100 K   1000 K   10,000 K   200,000 K
4           625    550     510       500        475
8           420    400     380       360        320
12          680    650     610       475        470
[Figure 9 data: mean read latency (ms) for 4, 8, and 12 threads across 10 K–200,000 K keys. Figure 7 data: mean-get and mean-update latency (ms) across 10 K–200,000 K keys; panels (a)–(c).]
Figure 10. Comparing the throughput of three experiment A, B and C.
8 CONCLUSION
IN this paper, we analyze and evaluate the read/update
throughput, as well as the latency, of a Riak KV NoSQL
database management system in a cluster environment.
To achieve this goal, Basho-bench is used. Benchmarking
NoSQL data stores in a cluster environment and
monitoring factors such as throughput and latency are
important requirements, since NoSQL databases differ
and their utility varies from one application to another.
In addition, system performance remains an important
factor when processing large amounts of data. We took
measurements over three experiments with different
operation mixes: experiments A, B, and C. For each
experiment we measured the read throughput and
latency as well as the update throughput and latency.
We found that performance is affected significantly by
increased data size. We also found that increasing the
number of threads improves throughput and reduces
latency.
REFERENCES
[1] Rakesh Kumar, Shilpi Charu, Somya Bansal.
"Effective Way to Handling Big Data Problems using
NoSQL Database (MongoDB)". Journal of Advanced
Database Management & Systems, ISSN: 2393-8730
(online), Volume 2, Issue 2, 2015.
[2] Rakesh K. Lenka and et al.,”Comparative
Analysis of Spatial Hadoop and GeoSpark for Geospatial
Big Data Analytics”, Published in: 2016 2nd
International Conference on Contemporary Computing
and Informatics (IC3I). Date of Conference: 14-17 Dec.
2016.
[3] Anasuya N Jadagerimath1 and Dr. Prakash. S.
“Efficient IoT Data Management for Cloud Environment
using Mongo DB”. Proc. of Int. Conf. on Current Trends
in Eng., Science and Technology, ICCTEST .2017
[4] Amir Ghaffari, Natalia Chechina, Phil Trinder, Jon
Meredith. (Sep 2013). Scalable Persistent Storage for
Erlang: Theory and Practice. Twelfth ACM SIGPLAN
Workshop on Erlang, Boston, MA, USA.
[5] “Challenges and Opportunities with Big Data”.
CRA.org. Retrieved Jan 2016.
[6] "Big Data For Dummies", Dr. Fern Halper, Marcia
Kaufman, Judith Hurwitz, Alan Nugent, 2013.
[7] Raj R. Parmar and Sudipta Roy. ”MongoDB as an
Efficient Graph Database: An Application of Document
Oriented NOSQL Database”. Data Intensive Computing
Applications for Big Data.2018
[8]
https://www.webopedia.com/TERM/B/big_data.html
[9] A Comparison of NoSQL Database Systems: A
Study on MongoDB, Apache Hbase, and Apache
Cassandra
[10] NoSQL Databases: Critical Analysis and
Comparison
[11] TESTING THE PERFORMANCE OF NoSQL
DATABASES VIA THE DATABASE BENCHMARK
TOOL
[12] Survey of NoSQL Database Engines for Big Data
[13] V. Abramova, J. Bernardino, P. Furtado. (2014).
Which NOSQL database? A performance overview. In
Paper presented at Open Journal Databases, Volume 1,
Issue 2, pp. 17-24.
[14] https://www.techopedia.com/definition/28802/semi-structured-data
[15] Jing Han, Haihong E, Guan Le,Jian Du. Survey
on NoSQL Database. (2011). In IEEE 6th International
Conference on Pervasive Computing and Applications
(ICPCA).
[16] Asadulla Khan Zaki. (2014). NoSQL databases:
new millennium database for big data, big users, cloud
computing and its security challenges. IJRET:
International Journal of Research in Engineering and
Technology. Volume: 03 Special Issue.
[17] Techopedia [Online]. 2018, Retrieved from:
https://www.techopedia.com/definition/26284/key-
value-store.
[18] Riak-kv database [Online]. 2018, Retrieved from:
http://basho.com/products/riak-kv/
[19] Redis database [[Online]. 2018, Retrieved from:
https://redis.io/ .
[20] Hbase database [Online]. 2018, Retrieved from:
http://hbase.apache.org/.
[21] Cassandra database [Online]. 2018, Retrieved
from: http://cassandra.apache.org/.
[22] Man Qi. Digital Forensics and NoSQL
Databases. (2014). In IEEE 11th International
Conference on Fuzzy Systems and Knowledge
Discovery.
[23] Jing Han, Haihong E, Guan Le,Jian Du. Survey
on NoSQL Database. (2011). In IEEE 6th International
Conference on Pervasive Computing and Applications
(ICPCA).
[24] MongodB database [Online]. 2018, Retrieved
from: https://www.mongodb.com/.
[25] Man Qi. Digital Forensics and NoSQL
Databases. (2014). In IEEE 11th International
Conference on Fuzzy Systems and Knowledge
Discovery.
[26] Neo4j database [Online]. 2018, Retrieved from:
https://neo4j.com/
[27] Yousaf Muhammad. (2011). Evaluation and
Implementation of Distributed NoSQL Database for
MMO Gaming Environment. Uppsala University.
Retrieved from:
http://uu.divaportal.org/smash/get/diva2:447210/FULLTEXT01.pdf.
[28] https://github.com/basho/basho_bench.
[29] John Klein, Ian Gorton, Neil Ernst, Patrick
Donohoe, Kim Pham, and Chrisjan Matser. (2015).
Performance Evaluation of NoSQL Databases: A Case
Study. In Proceedings of the 1st Workshop on
Performance Analysis of Big Data Systems (PABS ’15).
ACM, New York, NY, USA, pp. 5-10.