This document provides an overview of NoSQL databases. It discusses the key features of NoSQL, including that it has no fixed schema and avoids ACID properties. Cassandra is presented as a popular example of a NoSQL database, with its ability to handle large amounts of structured data without failures. The document compares NoSQL to SQL databases, noting NoSQL's advantages in scalability and performance.
NoSQL
Thenraja Vettivelraj
Swansea University
Contents List
ABSTRACT
1. INTRODUCTION
2. MAIN FEATURES
2.1 COMPARISON WITH SQL
3. EXAMPLE - CASSANDRA
3.1 MAIN FEATURES OF APACHE CASSANDRA
3.2 WHY APACHE CASSANDRA?
3.3 APPLICATIONS
4. DRAWBACKS OF NOSQL
5. SUMMARY
6. REFERENCES
ABSTRACT
NoSQL is an emerging field in data management. It is a powerful and efficient tool for storing and
manipulating data. It has no fixed schema and no joins, and it avoids the “ACID” properties
[Han, J. et al., 2011]. Its main advantages are that it can be much faster than SQL databases and that
its operational cost is lower than that of a relational database. Current trends create a need for
growth in storage, connectedness, architecture and semi-structured data [Accessed: 25 Feb 2012].
1. INTRODUCTION
The term “NoSQL” has several interpretations. At first many took it to mean a non-relational
database, while others say that “NoSQL” stands for Not Only SQL. Nowadays the term is used as an
umbrella term for all databases and data stores that do not follow the relational model. It is not a
single technology or product but a class of diverse products and a set of ideas about how to store and
manipulate data [Accessed: 24 Feb 2012].
The term first appeared in 1998 [Accessed: 24 Feb 2012], and over the past 3-4 years it has gained its
own place in the market because of its tremendous growth. Its strengths are massive scalability, lower
cost, schema flexibility, massive data stores and high availability [Accessed: 24 Feb 2012]. Some of
the main applications of NoSQL are search engines, data processing and social websites. NoSQL does not
support joins and, as noted in the abstract, it generally does not provide full ACID properties.
There are four main data models in NoSQL namely
Key-Value Stores
Big Table Clones
Document Databases
Graph Databases
From these we have to choose the right one for the job [Accessed: 25 Feb 2012]. A well-known example
of a NoSQL database is Cassandra, which is used by Facebook (a social networking site) and comes under
the key-value store category. It can handle terabytes (TB) of data in a single day, a volume driven by
the number of its users. Bigtable is the example for BigTable clones; Google reasoned that developing
its own database would give it more control over performance and scalability, and it uses Bigtable for
its search engine, Gmail, Orkut and other Google applications. Neo4j is a very good example of a graph
database and is written in Java. Apache CouchDB is an example of a document database and is written in
Erlang. Figure 1 compares the four NoSQL data models in a graph of size versus complexity.
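To make the four data models more concrete, the sketch below represents one small, invented record in each style using plain Python structures. The user ids, names and the `followers_of` helper are hypothetical and exist only for illustration; real products store and index these shapes very differently.

```python
# Hypothetical record used only for illustration.
user_id = "u42"

# Key-value store: an opaque value looked up by a single key.
key_value = {user_id: b'{"name": "Ada", "city": "Swansea"}'}

# BigTable clone: row key -> column family -> column name -> value.
column_family = {user_id: {"profile": {"name": "Ada", "city": "Swansea"}}}

# Document database: one nested, self-describing document per key.
document = {user_id: {"name": "Ada", "address": {"city": "Swansea"}, "tags": ["admin"]}}

# Graph database: nodes plus labelled edges between them.
nodes = {"u42": {"name": "Ada"}, "u43": {"name": "Bob"}}
edges = [("u42", "FOLLOWS", "u43")]


def followers_of(target):
    """Return the ids of all nodes with a FOLLOWS edge pointing at target."""
    return [src for (src, label, dst) in edges if label == "FOLLOWS" and dst == target]


print(followers_of("u43"))
```

The key-value form can only be fetched as a whole, while the other three expose progressively more structure (columns, nested fields, relationships) to the database itself.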
2. MAIN FEATURES
CAP theorem: Consistency, Availability and Partition tolerance. According to [Accessed: 11 Mar 2012]:
“Available, Partition-Tolerant (AP) Systems achieve "eventual consistency" through replication and
verification. Examples of AP systems is Cassandra, CouchDB
Consistency means that each client always has the same view of the data.
Availability means that all clients can always read and write.
Partition tolerance means that the system works well across physical network partitions.”
Figure 1: Comparison of NoSQL data models (axes: size versus complexity)
2.1 COMPARISON WITH SQL
Compared with SQL databases, NoSQL has a slight upper hand in scalability and performance. Instead
of the SQL language, NoSQL systems use other interfaces such as MapReduce or, in the case of
Cassandra, CQL.
3. EXAMPLE - CASSANDRA
Cassandra is one of the best-known NoSQL databases. It is widely used because it can handle large
amounts of structured data with no single point of failure and because it is easy to use. It is
written in Java, so a JVM (Java Virtual Machine) must be installed on the system before the server
is started, and it is of the key-value store type. Cassandra supports CQL (Cassandra Query
Language). DataStax provides one of the third-party distributions of Cassandra, and it includes the
Cassandra CQL shell, in which we create the keyspace and the column family.
Figure 2: Cassandra CQL Shell where the keyspace and column family are created
A keyspace is the outermost grouping of our data. It is a collection of column families, and
typically each application has one keyspace. The keyspace holds the management and configuration
settings for its column families, and the most important of these is the replication factor. In the
example above we set the strategy class to SimpleStrategy; the alternative is
NetworkTopologyStrategy. A cluster can contain multiple nodes. We then created a column family
named example. Normally there are two types of column family, namely
Standard column family and
Super column family
Cassandra offers three simple operations: insert, get and delete.
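The keyspace, column family and the three operations described above can be pictured as nested
maps. The Python sketch below is only an illustration, under the assumption that a keyspace maps
column family names to rows and that each row maps column names to values. The class name
ToyKeyspace, its method signatures and the keyspace name demo are hypothetical; this is not the
Cassandra API and nothing here is stored on disk or replicated.

```python
from collections import defaultdict


class ToyKeyspace:
    """In-memory stand-in for a keyspace (hypothetical, not the Cassandra API)."""

    def __init__(self, name, replication_factor=1):
        self.name = name
        # Kept only as a configuration value; nothing is actually replicated here.
        self.replication_factor = replication_factor
        # column family name -> row key -> {column name: value}
        self.column_families = defaultdict(dict)

    def insert(self, column_family, row_key, column, value):
        """Store one column value in a row, creating the row if needed."""
        row = self.column_families[column_family].setdefault(row_key, {})
        row[column] = value

    def get(self, column_family, row_key, column=None):
        """Return a copy of the whole row, or one column if a name is given."""
        row = self.column_families[column_family].get(row_key, {})
        if column is None:
            return dict(row)
        return row.get(column)

    def delete(self, column_family, row_key, column=None):
        """Remove one column, or the whole row when no column is named."""
        rows = self.column_families[column_family]
        if column is None:
            rows.pop(row_key, None)
        elif row_key in rows:
            rows[row_key].pop(column, None)


# "example" is the column family name used in the text; "demo" is made up.
ks = ToyKeyspace("demo", replication_factor=1)
ks.insert("example", "user1", "name", "Alice")
ks.insert("example", "user1", "city", "London")
name = ks.get("example", "user1", "name")
ks.delete("example", "user1", "city")
row_after_delete = ks.get("example", "user1")
```

A standard column family corresponds to the two-level row/column nesting shown here; a super
column family would add one more level of nesting inside each row.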
Figure 3: Cassandra Data Modelling (panels: standard column family, super column family)
3.1 MAIN FEATURES OF APACHE CASSANDRA
Partitioning
This is one of the main features of Cassandra: the data we store is partitioned dynamically and
distributed over the set of available nodes in the cluster by means of a hash mechanism. Consistent
hashing gives a fixed circular space, or “ring”. Each node is assigned a random value that denotes
its position in the ring, and each data item is assigned to a position in the ring by hashing its
key.
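The ring lookup described above can be sketched in a few lines of Python. This is a minimal
illustration and not Cassandra's partitioner: the class ToyRing, the helper token_for and the node
names are hypothetical, SHA-256 is chosen here only as a convenient hash, and node positions are
derived from the node names (rather than drawn at random) so that the example is repeatable. A key
is assumed to belong to the first node whose token is greater than or equal to the key's token,
wrapping around at the end of the ring.

```python
import bisect
import hashlib


def token_for(value):
    """Map a string to a position on the ring (hash choice is an assumption)."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class ToyRing:
    """Nodes sit at token positions; a key belongs to the next node clockwise."""

    def __init__(self, node_tokens):
        # node_tokens: {node name: token}
        self._ring = sorted((token, node) for node, token in node_tokens.items())
        self._tokens = [token for token, _ in self._ring]

    def owner_of_token(self, token):
        """Return the first node whose token is >= token, wrapping around."""
        idx = bisect.bisect_left(self._tokens, token) % len(self._ring)
        return self._ring[idx][1]

    def owner(self, key):
        """Hash the key onto the ring and return the node that owns it."""
        return self.owner_of_token(token_for(key))


# Hypothetical three-node cluster.
node_tokens = {name: token_for(name) for name in ("node-a", "node-b", "node-c")}
ring = ToyRing(node_tokens)
placement = {key: ring.owner(key) for key in ("user1", "user2", "user3")}
```

Because the owner depends only on the key's hash and the node tokens, any node can compute where a
key lives without consulting a central directory.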
Figure 4: Ring View of Cassandra Test cluster
The ring view of the Cassandra test cluster shown above displays each node's token value together
with other information such as IP address, size and load. It is available by default in the
DataStax web interface (http://localhost:8888/opscenter/index.html).
Scaling the cluster
Cassandra also supports multi-node clusters. When a new node is added to an existing system that
already has one node, it takes over part of the workload of the existing node and becomes
responsible for the same kind of work that the other node does. A node can be added through the
bootstrap algorithm, started either from the command-line utility on a node or from the Cassandra
web dashboard.
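The effect of adding a second node can be illustrated with a toy ring. The sketch below assumes a
deliberately small ring of 100 positions and hypothetical node names and token values; it does not
model Cassandra's bootstrap algorithm itself, only the resulting split of ownership when the new
node takes a position halfway round the ring.

```python
import bisect


def owner_of(token, node_tokens):
    """Return the node owning token: the first node token >= it, wrapping around."""
    ring = sorted((tok, node) for node, tok in node_tokens.items())
    tokens = [tok for tok, _ in ring]
    idx = bisect.bisect_left(tokens, token) % len(ring)
    return ring[idx][1]


# Hypothetical one-node cluster on a ring with positions 0..99.
before = {"node1": 99}
# A second node joins at position 49 and takes over positions 0..49.
after = {"node1": 99, "node2": 49}

data_tokens = range(100)
# Positions whose owner changes when node2 joins.
moved = [t for t in data_tokens if owner_of(t, before) != owner_of(t, after)]
# Number of positions each node owns after the join.
load_after = {
    node: sum(1 for t in data_tokens if owner_of(t, after) == node)
    for node in after
}
```

In this toy setup only the positions handed to the new node change owner, and each node ends up
with half of the ring.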
Figure 5: Cassandra dashboard
3.2 WHY APACHE CASSANDRA?
There are many reasons to choose Cassandra. It can handle terabytes or petabytes of data in a
peer-to-peer architecture. It uses CQL (Cassandra Query Language), which resembles SQL. Data is
replicated to multiple nodes, so there is no single point of failure. It is cloud enabled, and data
can be replicated to more than one location for disaster recovery scenarios, which gives durability
and high availability. Fault detection and recovery are transparent and rely on a gossip protocol.
Finally, it is easy to use and requires no special hardware to run.
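The link between replication and the absence of a single point of failure can be sketched as
follows. This is an illustration under a simple assumed placement rule, in which copies of a data
item go to the node that owns its token and to the next nodes clockwise on the ring. The function
names, node names and token values are hypothetical, and failure detection (gossip) is reduced here
to a given set of nodes that are known to be down.

```python
import bisect


def replicas_for(token, node_tokens, replication_factor):
    """Pick the owner of token plus the next nodes clockwise, without repeats."""
    ring = sorted((tok, node) for node, tok in node_tokens.items())
    tokens = [tok for tok, _ in ring]
    start = bisect.bisect_left(tokens, token)
    # Never place more copies than there are nodes.
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


def live_replicas(token, node_tokens, replication_factor, down_nodes):
    """Return the replicas of token that are still reachable."""
    return [
        node
        for node in replicas_for(token, node_tokens, replication_factor)
        if node not in down_nodes
    ]


# Hypothetical four-node ring with positions 0..99.
nodes = {"node1": 25, "node2": 50, "node3": 75, "node4": 99}
placed = replicas_for(30, nodes, replication_factor=3)
survivors = live_replicas(30, nodes, 3, down_nodes={"node2"})
```

With a replication factor of 3, the item at position 30 is still held by two live nodes after the
node that owns it fails, so reads and writes can continue.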
3.3 APPLICATIONS
Companies such as Accenture, Twitter and Facebook, among many others, use NoSQL databases in one
way or another because of these main features. Not only industry but also the education sector and
government sectors have slowly started using NoSQL databases. For example, “Burt uses Cassandra in
their software to help advertisers and agencies improve the efficiency and effect of online
campaigns” [Accessed: 11 Mar 2012].
4. DRAWBACKS OF NOSQL
Unlike SQL databases, NoSQL databases do not provide ACID properties, so we cannot expect the
degree of reliability that we get from an SQL database. Many people are unfamiliar with the
technology. Unlike commercial SQL databases, NoSQL products often come with only limited vendor
support.
5. SUMMARY
Through its graph databases, key-value databases, BigTable clones and document databases, NoSQL has
made a very big impact in the database field, and most of these products are open source. From my
point of view I am sure that many will soon migrate from SQL towards NoSQL. In the next two to four
years we can expect a major change in the database field because of NoSQL's scalability and other
features, but it is unlikely to replace SQL databases. Each database has its pros and cons, and it
is our duty to choose the right one.
6. REFERENCES
[Accessed: 24 Feb 2012] Slideshare.net (2010) NoSQL databases. [Online] Available at:
http://www.slideshare.net/marin_dimitrov/nosql-databases-3584443
[Accessed: 24 Feb 2012] Perdue, T. (1998) NoSQL - An Overview of NoSQL. [online] Available at:
http://newtech.about.com/od/databasemanagement/a/Nosql.htm
[Accessed: 24 Feb 2012] Tiwari, S. (2011) Professional NoSQL. [e-book] Wrox Programmer to
Programmer. Available through: Google Books
http://books.google.co.uk/books?id=tv5iO9MnObUC&printsec=frontcover&dq=nosql&hl=en&sa=X&ei=5vw_T9CABMG_0QWtzqyPDw&ved=0CEQQ6AEwAg#v=onepage&q=nosql&f=false
[Han, J. et al. , 2011] Han, J. et al. (2011)"Survey on NoSQL database," Pervasive Computing and
Applications (ICPCA), 2011 6th International Conference on , vol., no., pp.363-366, 26-28 Oct. 2011
doi: 10.1109/ICPCA.2011.6106531
[Accessed: 4 Mar 2012] Slideshare.net (2010) NoSQL or not NoSQL? [Online] Available at:
http://www.slideshare.net/ruflin/nosql-or-not-nosql
[Accessed: 25 Feb 2012] Blogs.neotechnology.com (2009) NOSQL: scaling to size and scaling to
complexity - Emil's Neo Thoughts. [Online] Available at:
http://blogs.neotechnology.com/emil/2009/11/nosql-scaling-to-size-and-scaling-to-complexity.html
[Accessed: 25 Feb 2012] Slideshare.net (2011) A NOSQL Overview And The Benefits Of Graph
Databases (nosql east 2009). [Online] Available at:
http://www.slideshare.net/emileifrem/nosql-east-a-nosql-overview-and-the-benefits-of-graph-databases
[Accessed: 25 Feb 2012] Slideshare.net (2011) NOSQL for Dummies. [Online] Available at:
http://www.slideshare.net/thobe/nosql-for-dummies
Leavitt, N.; , "Will NoSQL Databases Live Up to Their Promise?," Computer , vol.43, no.2, pp.12-14,
Feb. 2010 doi: 10.1109/MC.2010.58
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5410700&isnumber=5410692
[Accessed: 11 Mar 2012] Blog.nahurst.com (2010) Visual Guide to NoSQL Systems - Nathan Hurst's
Blog. [Online] Available at: http://blog.nahurst.com/visual-guide-to-nosql-systems
[Accessed: 11 Mar 2012] Datastax.com (2011) Cassandra Users | DataStax. [online] Available at:
http://www.datastax.com/cassandrausers