Comparative study of modern databases

Modern Databases in a nutshell

Memory-based Distributed Transactional Databases:
• Removes the main performance bottlenecks of a traditional OLTP database by eliminating overheads such as concurrency control, logging and locking, and by storing the data in main memory
• Uses main memory as the primary data store (a departure from traditional DBs, which store data on disk)
• Uses a technique known as anti-caching (the opposite of caching): main memory holds the data, and cold data is evicted to disk
• Uses command logging (logging the transactions that were executed, rather than the changed data as in traditional DBs)
• Is single-threaded: runs multiple single-threaded engines, each owning one partition of the data
• Runs on a distributed cluster of shared-nothing machines, with high availability
• High throughput: up to 100x faster than a traditional OLTP database, while maintaining ACID transactions
• Supports streaming analytics with millisecond latency
Ref: H-Store, VoltDB
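
Command logging can be illustrated with a small sketch. It assumes deterministic stored procedures (the hypothetical `deposit` and `transfer` below) and one single-threaded partition that runs one transaction at a time, so no locks are needed; durability, snapshots and replication are left out. Only the procedure name and its parameters are logged, and recovery re-executes those commands in order.

```python
import json
from typing import Callable, Dict, List


# Hypothetical stored procedures: deterministic functions of (state, params).
def deposit(state: Dict[str, int], account: str, amount: int) -> None:
    state[account] = state.get(account, 0) + amount


def transfer(state: Dict[str, int], src: str, dst: str, amount: int) -> None:
    # Validate before mutating, so a failed call leaves the state untouched.
    if state.get(src, 0) < amount:
        raise ValueError("insufficient funds")
    state[src] -= amount
    state[dst] = state.get(dst, 0) + amount


PROCEDURES: Dict[str, Callable[..., None]] = {"deposit": deposit, "transfer": transfer}


class Partition:
    """One single-threaded engine: transactions run one at a time, so no locks."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}   # all data lives in main memory
        self.command_log: List[str] = []  # commands, not changed rows

    def execute(self, proc: str, *params) -> None:
        PROCEDURES[proc](self.state, *params)
        # Log only the command (procedure name + parameters) that succeeded.
        self.command_log.append(json.dumps([proc, list(params)]))

    @classmethod
    def recover(cls, command_log: List[str]) -> "Partition":
        # Rebuild the in-memory state by replaying commands in their original
        # order; this only works because the procedures are deterministic.
        partition = cls()
        for entry in command_log:
            proc, params = json.loads(entry)
            partition.execute(proc, *params)
        return partition


p = Partition()
p.execute("deposit", "alice", 100)
p.execute("transfer", "alice", "bob", 30)
restored = Partition.recover(p.command_log)
print(restored.state)  # {'alice': 70, 'bob': 30}
```

Each log entry stays small no matter how many rows a procedure touches; the trade-off is that recovery has to re-run the work, which is why systems of this kind typically pair the command log with periodic snapshots.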

Column stores:
• Stores data by column rather than by row. A column executor processes batches of values from one column at a time (a technique called 'vector processing'), which can make queries much faster (up to 100x) than a traditional row executor in a relational DB
• Achieves high data compression, because values within a column are similar and easy to compress; per-record header overhead is also reduced
• Uses a shared-nothing architecture; each node stores part of the data
• High availability, high elasticity
• Supports an SQL-like query language and has built-in analytical functions
• Integrates with open-source technologies such as Hadoop and R, and is cloud-enabled
Ref: HP Vertica, SAP HANA
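
The column layout and its compression benefit can be sketched in a few lines. This toy version stores each column as its own list, compresses a column with run-length encoding (one common columnar scheme, chosen here as an example) and answers an aggregate by reading only the columns it needs. Real engines add SIMD batch execution, sorted projections and on-disk formats, none of which is shown; the table and function names are hypothetical.

```python
from itertools import groupby
from typing import Dict, List, Tuple

# Hypothetical row-oriented input: one tuple per record (day, region, amount).
rows = [
    ("2016-01-01", "EU", 120),
    ("2016-01-01", "EU", 80),
    ("2016-01-01", "US", 200),
    ("2016-01-02", "US", 50),
]


def to_columns(rows: List[tuple], names: List[str]) -> Dict[str, list]:
    # Column-oriented layout: one contiguous list per column.
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def rle_encode(values: list) -> List[Tuple[object, int]]:
    # Run-length encoding: store (value, run length) pairs. Columns often
    # contain long runs of repeated values, so this compresses well.
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(runs: List[Tuple[object, int]]) -> list:
    return [value for value, count in runs for _ in range(count)]


def sum_by(columns: Dict[str, list], key: str, measure: str) -> Dict[object, int]:
    # Reads only the two columns the query needs; other columns are never touched.
    totals: Dict[object, int] = {}
    for k, m in zip(columns[key], columns[measure]):
        totals[k] = totals.get(k, 0) + m
    return totals


cols = to_columns(rows, ["day", "region", "amount"])
print(rle_encode(cols["day"]))           # [('2016-01-01', 3), ('2016-01-02', 1)]
print(sum_by(cols, "region", "amount"))  # {'EU': 200, 'US': 250}
```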

NoSQL Distributed Document Stores:
• High throughput and performance (millions of operations per second)
• Supports database replication (primary and secondary nodes) and automatic failover
• Data is stored as JSON documents, so objects from a programming language can be persisted as-is ("schema later" paradigm). MongoDB uses BSON, a binary serialization of JSON
• Does not support JOINs or ACID transactions spanning multiple documents (later MongoDB releases added $lookup joins and multi-document transactions)
• Uses a JSON-like query language, with powerful querying and aggregation capabilities, including analytics
Ex: MongoDB, Apache CouchDB
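
The document model can be sketched as a toy in-memory collection: each record is a self-contained dict stored as-is with no declared schema, queried by field values and summarised by a group-and-sum step. The `DocumentCollection` class and its methods are hypothetical and are not MongoDB's API (in pymongo the real counterparts are a collection's `find()` and `aggregate()`).

```python
from typing import Any, Dict, List


class DocumentCollection:
    """Hypothetical in-memory collection; each document is a self-contained dict."""

    def __init__(self) -> None:
        self.docs: List[Dict[str, Any]] = []

    def insert(self, doc: Dict[str, Any]) -> None:
        # No schema is declared up front ("schema later"): any shape is accepted,
        # including nested lists and sub-documents.
        self.docs.append(dict(doc))

    def find(self, **criteria: Any) -> List[Dict[str, Any]]:
        # Equality match on top-level fields.
        return [d for d in self.docs
                if all(d.get(field) == value for field, value in criteria.items())]

    def group_sum(self, key: str, field: str) -> Dict[Any, Any]:
        # A one-stage "aggregation": group documents by `key` and sum `field`.
        totals: Dict[Any, Any] = {}
        for d in self.docs:
            group = d.get(key)
            totals[group] = totals.get(group, 0) + d.get(field, 0)
        return totals


orders = DocumentCollection()
orders.insert({"customer": "alice", "total": 30, "items": [{"sku": "A1", "qty": 2}]})
orders.insert({"customer": "bob", "total": 15})
orders.insert({"customer": "alice", "total": 20, "coupon": "SPRING"})
print(len(orders.find(customer="alice")))     # 2
print(orders.group_sum("customer", "total"))  # {'alice': 50, 'bob': 15}
```

Note that the three documents have different shapes, and nothing had to change to store them side by side.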

NoSQL Distributed Key-value Store:
• High performance and high throughput, especially for writes
• Well suited to geographically distributed data centers, with fast replication between them
• No single point of failure; automatic failover
• Linearly scalable
• Supports an SQL-like query language (CQL), but no joins
• Data is modeled around the queries you will run, using key-value storage organized into column families
Ex: Apache Cassandra
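
The "no single point of failure" and "linearly scalable" properties come largely from spreading data over a hash ring. A simplified sketch, assuming one token per node and replicas placed on the next nodes clockwise (similar in spirit to Cassandra's SimpleStrategy, but without virtual nodes, racks or data centers), is shown below. MD5 is only a stand-in hash here; Cassandra's default partitioner uses Murmur3. The `TokenRing` class is hypothetical.

```python
import bisect
import hashlib
from typing import List


def token(key: str) -> int:
    # Stand-in hash function; Cassandra's default partitioner uses Murmur3.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class TokenRing:
    """Hypothetical simplified ring: one token per node, replicas placed clockwise."""

    def __init__(self, nodes: List[str], replication_factor: int = 3) -> None:
        self.rf = min(replication_factor, len(nodes))
        self.ring = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, partition_key: str) -> List[str]:
        # The first replica is the first node whose token is >= the key's token
        # (wrapping around); the next rf - 1 nodes clockwise hold the copies.
        start = bisect.bisect_left(self.tokens, token(partition_key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]


ring = TokenRing(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("user:42")
print(owners)  # three distinct nodes; which ones depends on the hash values
```

Any of the returned replicas can serve the key, so losing one node does not make the data unavailable, and adding a node only moves the keys that fall into its new slice of the ring.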

NoSQL Distributed Data Store, using Apache Lucene:
• Stores JSON documents (Elasticsearch) and can index multiple document formats such as email and PDF (Apache Solr)
• Can scale out to hundreds of servers and handle petabytes of data while hiding the complexity of distributed systems. Uses shards (primary and replica) to scale horizontally
• Does not support ACID transactions across multiple documents
• Has extensive built-in querying and aggregation functions
• Uses Lucene search with an inverted index (every field is indexed by default), providing very fast text search
• Allows only limited joins (a join can be used to restrict the output for one document type)
• Queries can be sent to any node in the cluster, which triggers a distributed search across all shards, with built-in load balancing
Ex: Elasticsearch, Apache Solr
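
An inverted index maps each term to the documents that contain it, so a text query becomes a lookup plus a set intersection instead of a scan. The sketch below uses a naive lowercase tokenizer and AND semantics, and indexes every field; Lucene's analyzers, per-field postings, term frequencies and relevance ranking are all omitted. The `InvertedIndex` class is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    # Very simple analyzer: lowercase and split on non-alphanumeric characters.
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Hypothetical inverted index: term -> set of document ids."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)
        self.docs: Dict[int, Dict[str, str]] = {}

    def add(self, doc_id: int, doc: Dict[str, str]) -> None:
        self.docs[doc_id] = doc
        # Every field of the document is indexed.
        for value in doc.values():
            for term in tokenize(value):
                self.postings[term].add(doc_id)

    def search(self, query: str) -> List[int]:
        # AND query: intersect the postings of all query terms.
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)


idx = InvertedIndex()
idx.add(1, {"title": "Distributed search", "body": "Shards and replicas"})
idx.add(2, {"title": "Column stores", "body": "Vector processing for analytics"})
idx.add(3, {"title": "Search at scale", "body": "Replicas serve distributed queries"})
print(idx.search("distributed search"))  # [1, 3]
```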

Distributed Data Store, supporting ACID transactions:
• Scalable: distributes data across multiple servers in different geographical locations (proprietary clustering model)
• Supports ACID transactions and uses SQL
• Combines the features of a traditional RDBMS with elastic scalability and availability
• Cloud-enabled
• High performance
Ex: NuoDB
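
Keeping a transaction atomic when its writes land on several servers requires some commit protocol. NuoDB's own mechanism is proprietary and is not reproduced here; as a generic illustration, the sketch below uses textbook two-phase commit: every participant first votes on whether it can apply its writes, and the writes become visible only if all votes are yes. Coordinator failure, timeouts and durable logging are omitted, and the non-negative-value rule is a hypothetical constraint.

```python
from typing import Dict, List, Tuple


class Participant:
    """Hypothetical storage node holding part of the data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, int] = {}
        self.staged: Dict[str, int] = {}

    def prepare(self, writes: Dict[str, int]) -> bool:
        # Phase 1: validate and stage the writes, then vote yes or no.
        if any(value < 0 for value in writes.values()):  # hypothetical constraint
            return False
        self.staged = dict(writes)
        return True

    def commit(self) -> None:
        # Phase 2 (success): make the staged writes visible.
        self.data.update(self.staged)
        self.staged = {}

    def abort(self) -> None:
        # Phase 2 (failure): discard the staged writes.
        self.staged = {}


def two_phase_commit(plan: List[Tuple[Participant, Dict[str, int]]]) -> bool:
    votes = [node.prepare(writes) for node, writes in plan]
    if all(votes):
        for node, _ in plan:
            node.commit()
        return True
    for node, _ in plan:
        node.abort()
    return False


eu, us = Participant("eu"), Participant("us")
eu.data["alice"] = 100
ok = two_phase_commit([(eu, {"alice": 70}), (us, {"bob": 30})])
bad = two_phase_commit([(eu, {"alice": -10}), (us, {"bob": 130})])
print(ok, bad, eu.data, us.data)  # True False {'alice': 70} {'bob': 30}
```

The second transaction is rejected by one node, so neither node applies any of its writes.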

Graph DB:
• For relationship-heavy queries, performance can be orders of magnitude better than an RDBMS, and does not deteriorate as the total amount of data grows, because a query only traverses the part of the graph it touches
• Relationships and node types can be added without changing existing queries
• Provides native graph processing and storage
• Scales horizontally (Neo4j clusters mainly scale reads through replicas)
• Supports ACID transactions
• Has its own powerful query language (Cypher)
• Provides extensive caching features
Ex: Neo4j
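
Why traversal cost tracks the neighbourhood rather than the total data size can be seen in a small sketch: each node keeps direct references to its neighbours (a simplified stand-in for native graph storage), and a breadth-first search only visits nodes reachable within the hop limit. The `Graph` class is hypothetical and is not Neo4j's storage engine.

```python
from collections import deque
from typing import Dict, Set


class Graph:
    """Hypothetical undirected graph stored as adjacency sets."""

    def __init__(self) -> None:
        # Each node points directly at its neighbours, so one hop is a local
        # lookup rather than a join over a global table.
        self.adj: Dict[str, Set[str]] = {}

    def add_edge(self, a: str, b: str) -> None:
        self.adj.setdefault(a, set()).add(b)
        self.adj.setdefault(b, set()).add(a)

    def within_hops(self, start: str, max_hops: int) -> Set[str]:
        # Breadth-first traversal: work done depends on the neighbourhood
        # visited, not on how many nodes the whole graph contains.
        seen = {start}
        frontier = deque([(start, 0)])
        while frontier:
            node, depth = frontier.popleft()
            if depth == max_hops:
                continue
            for neighbour in self.adj.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    frontier.append((neighbour, depth + 1))
        seen.discard(start)
        return seen


g = Graph()
g.add_edge("alice", "bob")
g.add_edge("bob", "carol")
g.add_edge("carol", "dave")
g.add_edge("eve", "frank")
print(sorted(g.within_hops("alice", 2)))  # ['bob', 'carol']
```

In Cypher, a comparable friends-of-friends query could be written roughly as `MATCH (a:Person {name: 'alice'})-[:KNOWS*1..2]-(f) RETURN DISTINCT f`, assuming hypothetical `Person` nodes and `KNOWS` relationships.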

When to use which option:
• Use a main-memory DB when you have the following requirements:
    • Response times need to be very fast
    • You need very high throughput (millions of transactions per second)
    • You need ACID transactions
    • Your data follows a relational pattern
• Use a column-store DB when you have the following requirements:
    • Response time and throughput needs are very high
    • You have data warehouse (DWH) fact tables with a huge number of columns and complex querying requirements
    • Data grows rapidly
    • You need to generate complex analytics with fast response times
• Use a NoSQL document store when you have the following requirements:
    • Your data can be described as self-contained documents, with few or no relations
    • You need to generate analytics using aggregation techniques
    • You need text-centric search capabilities and ranking of results
    • Your data can grow rapidly and you need elastic scalability
    • You do not have ACID transaction requirements when updating data

• Use a distributed NoSQL key-value store when you have the following requirements:
    • Your data centers are geographically far apart
    • You know your queries in advance and can model them by defining column families
    • You have complex querying and analytics requirements
    • You have self-contained data structures with no relations
    • Your data can grow exponentially
    • You need high performance
• Use a distributed transactional DB when you have the following requirements:
    • You have data centers spread across geographical locations
    • Your data can grow rapidly
    • You need to maintain transactions with ACID properties
    • Your data is defined in relational form
• Use a graph database when you have the following requirements:
    • You have highly interconnected data, e.g. social networks and connections
    • Your data can grow exponentially
    • You need to maintain transactions with ACID properties
    • You need fast response times
    • Your data model can change quickly over time
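
The checklists above can be condensed into a rough decision helper. This is a hypothetical sketch: the requirement flags and the order in which rules are checked are assumptions made for illustration, and real choices depend on many more factors than a few booleans.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    # Hypothetical flags summarising the checklists above.
    highly_connected: bool = False
    needs_acid: bool = False
    relational_data: bool = False
    geo_distributed: bool = False
    wide_fact_tables: bool = False
    self_contained_documents: bool = False


def suggest_store(r: Requirements) -> str:
    # Rules are checked in a fixed, hypothetical priority order.
    if r.highly_connected:
        return "graph database"
    if r.needs_acid and r.relational_data:
        # Both options keep ACID over relational data; geography decides.
        return "distributed transactional DB" if r.geo_distributed else "main-memory DB"
    if r.wide_fact_tables:
        return "column store"
    if r.geo_distributed:
        return "distributed key-value store"
    if r.self_contained_documents and not r.needs_acid:
        return "NoSQL document store"
    return "no clear match; revisit the requirements"


print(suggest_store(Requirements(needs_acid=True, relational_data=True)))  # main-memory DB
```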
