| Factors | Cassandra | MongoDB | HBase | Neo4j |
|---|---|---|---|---|
| Type | Distributed NoSQL database | Document-oriented NoSQL database | Distributed, scalable NoSQL database built on Hadoop | Graph database |
| Storage | Column-family based | Stores data in JSON-like BSON format | Column-oriented | Node-relationship storage |
| Consistency | Tunable consistency (can be configured per operation) | Eventually consistent with strong consistency options | Strong consistency | ACID-compliant (strong consistency) |
| Scalability | Horizontally scalable, designed for high availability and fault tolerance without a single point of failure | Horizontally scalable using sharding | Horizontally scalable, built to handle large amounts of data across many servers | Scales vertically |
| Replication | Data is replicated across multiple nodes with replication factor | Replica sets for redundancy and high availability | Supports master-slave replication | High availability cluster with master-slave replication |
| Data Model | Wide-column store. Data is organized into tables with rows and columns, where rows can have a dynamic set of columns | Document store. Data is stored in collections as documents (BSON format), which can vary in structure | Wide-column store. Data is stored in tables with rows and column families. | Graph-based. Data is represented as nodes, relationships, and properties |
| Schema | Flexible schema with a primary key for row identification | Schema-less, allowing for flexible and dynamic document structures | Defined by column families; rows within a family can have different columns | Schema-optional, with the flexibility to add properties and relationships without predefined schema |
| Data Distribution | Data is partitioned and distributed across multiple nodes using consistent hashing | Sharding for horizontal scaling, where data is distributed across shards | Data is distributed across region servers using HDFS (Hadoop Distributed File System) | Primarily vertical scaling, with some horizontal scaling capabilities in the Enterprise edition |
| Fault Tolerance | Designed for fault tolerance and high availability with automatic data replication | Replica sets provide redundancy and automatic failover | Built on HDFS, which handles replication and fault tolerance | Provides high availability with master-slave replication |
| Query Language | CQL (Cassandra Query Language) | MongoDB query language (based on JSON-like syntax) | No native query language. Typically accessed through Java API. | Cypher, a declarative graph query language |
| Development Flexibility | Suitable for time-series data, IoT, and applications requiring high write throughput | Ideal for applications requiring flexible, evolving schemas, such as content management systems and e-commerce platforms | Best for real-time read/write access to large datasets, often used in conjunction with Hadoop for big data analytics | Optimal for applications involving complex relationships and graph-based queries, such as social networks, recommendation engines, and fraud detection |

Factor groups: Architecture; Data Model; Data Distribution Model; Development Model.
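
The Cassandra entry under Data Distribution mentions consistent hashing. As a rough illustration of the idea (not Cassandra's actual implementation), the sketch below places nodes on a hash ring and assigns each key to the next nodes clockwise. The class name `HashRing`, the node names, the `vnodes` count, and the use of MD5 are all assumptions chosen for the example; Cassandra's default partitioner is Murmur3-based.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring (illustrative only, not Cassandra's code)."""

    def __init__(self, nodes, vnodes=8):
        self._ring = []  # sorted list of (token, node) pairs
        for node in nodes:
            self.add_node(node, vnodes)

    @staticmethod
    def _token(key):
        # MD5 is used here only as a stable, evenly spread hash function.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node, vnodes=8):
        # Each node gets several positions ("virtual nodes") on the ring.
        for i in range(vnodes):
            bisect.insort(self._ring, (self._token(f"{node}#{i}"), node))

    def replicas(self, key, replication_factor=1):
        """Walk clockwise from the key's token, collecting distinct nodes."""
        distinct = {node for _, node in self._ring}
        wanted = min(replication_factor, len(distinct))
        start = bisect.bisect_left(self._ring, (self._token(key),))
        owners = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == wanted:
                    break
        return owners


ring = HashRing(["node-a", "node-b", "node-c"])
owners = ring.replicas("user:42", replication_factor=2)
```

The property that matters for scaling is that adding a node only takes over the keys that now hash to its positions; every other key keeps its previous owner, so little data has to move.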

Comparison of NoSQL databases: Cassandra, MongoDB, HBase, and Neo4j.
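
The "tunable consistency" entry for Cassandra can be made concrete with the usual quorum arithmetic: a read is guaranteed to overlap the latest acknowledged write when the number of replicas read plus the number of replicas written exceeds the replication factor. The sketch below is a minimal illustration of that rule; the function names are hypothetical, and only the `ONE`, `QUORUM`, and `ALL` levels are modeled.

```python
def replicas_required(level, replication_factor):
    """Replicas that must respond for a given consistency level."""
    level = level.upper()
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # A quorum is a strict majority of the replicas.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def read_overlaps_write(write_level, read_level, replication_factor):
    """True when R + W > RF, so every read set intersects every write set."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


# With three replicas, QUORUM writes plus QUORUM reads overlap (2 + 2 > 3),
# while ONE/ONE trades that guarantee for lower latency (1 + 1 <= 3).
strong = read_overlaps_write("QUORUM", "QUORUM", 3)
weak = read_overlaps_write("ONE", "ONE", 3)
```

Because the level is chosen per operation, an application can use the stricter combination only for the queries that need it.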
