NoSQL Databases




A presentation showing some NoSQL databases and Apache Cassandra in detail






Usage Rights

© All Rights Reserved


    NoSQL Databases Presentation Transcript

    • NoSQL Databases ● Eduard Tudenhöfner
    • Overview ● Why NoSQL? ● Classification ● CAP Theorem ● BASE vs ACID ● Cassandra in Action ● Summary
    • Why NoSQL? ● original intention: modern web-scale DBs ○ the amount of data has increased drastically ○ data on the web is less structured ● higher performance requirements ● some problems are easier to solve without the relational approach ● scaling out and running on commodity HW is much cheaper than scaling up
    • Typical Characteristics ● non-relational ● horizontally scalable ● flexible schema ● easy replication support ● simple API ● eventually consistent -> BASE principle
    • Classification source: http://blog.octo.com/wp-content/uploads/2012/07/QuadrantNoSQL.png
    • Classification source: http://www.sics.se/~amir/files/download/dic/NoSQL%20Databases.pdf
    • Key/Value Stores ● data model: collection of key/value pairs ● keys and values can be complex compounds ● based on Amazon’s Dynamo Paper ● designed to handle massive load
    • Key/Value Stores ● no complex query filters ● all joins must be in the code ● easy to distribute across cluster ● very predictable performance -> O(1)
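The "all joins must be in the code" point can be sketched as follows. This is a minimal illustration that uses plain Python dicts as a stand-in for a key/value store; the key names (`user:1`, `city:10`) and the helper `get_user_with_city` are hypothetical and not taken from the slides.

```python
# Hypothetical in-memory stand-in for a key/value store: one dict per bucket.
users = {
    "user:1": {"name": "Alice", "city_id": "city:10"},
    "user:2": {"name": "Bob", "city_id": "city:20"},
}
cities = {
    "city:10": {"name": "Berlin"},
    "city:20": {"name": "Vienna"},
}


def get_user_with_city(user_key):
    """Join a user with its city using two key lookups in application code."""
    user = users.get(user_key)  # first lookup by key
    if user is None:
        return None
    city = cities.get(user["city_id"])  # second lookup: this is the "join"
    return {"name": user["name"], "city": city["name"] if city else None}


print(get_user_with_city("user:1"))
```

Each lookup is a single access by key, which is why performance stays predictable; the cost is that the application has to know how the records relate to each other.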
    • Wide Column Stores ● Tables are similar to RDBMS, but semi-structured ● based on Google’s BigTable ● Rows can have arbitrary columns
    • Wide Column Stores -> BigTable ● <RowKey, ColumnKey, Timestamp> triple as key for lookups, inserts, deletes ● ColumnKey uses syntax family:qualifier ● arbitrary columns on a row-by-row basis ● does not support a relational model ○ no table-wide integrity constraints ○ no multi-row transactions source: http://research.google.com/archive/bigtable.html
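The lookup model on this slide can be sketched as a map keyed by the (RowKey, ColumnKey, Timestamp) triple. The class `TinyBigTable` and its methods are made up for illustration and say nothing about how BigTable is actually implemented; the row and column values are sample data only.

```python
class TinyBigTable:
    """Toy model: a map from (row_key, column_key, timestamp) to a value."""

    def __init__(self):
        self._cells = {}

    def put(self, row_key, column_key, timestamp, value):
        # column_key follows the family:qualifier syntax, e.g. "anchor:cnnsi.com"
        self._cells[(row_key, column_key, timestamp)] = value

    def get_latest(self, row_key, column_key):
        """Return the value with the highest timestamp, or None if absent."""
        versions = [
            (ts, value)
            for (r, c, ts), value in self._cells.items()
            if r == row_key and c == column_key
        ]
        return max(versions, key=lambda v: v[0])[1] if versions else None

    def columns(self, row_key):
        """Columns present on one row; different rows may have different sets."""
        return sorted({c for (r, c, _ts) in self._cells if r == row_key})


table = TinyBigTable()
table.put("com.cnn.www", "contents:", 1, "<html>v1")
table.put("com.cnn.www", "contents:", 2, "<html>v2")
table.put("com.cnn.www", "anchor:cnnsi.com", 1, "CNN")
table.put("org.example", "contents:", 1, "<html>")
print(table.get_latest("com.cnn.www", "contents:"))
print(table.columns("com.cnn.www"), table.columns("org.example"))
```

The two rows end up with different column sets, which is the "arbitrary columns on a row-by-row basis" property; nothing in the model enforces a table-wide constraint.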
    • Document Stores ● inspired by Lotus Notes ● central concept of a Document ● Documents encapsulate/encode data in some format/encoding ● Encodings: ○ XML, YAML, JSON, BSON, PDF
    • Document Stores source: http://www.mongodb.org/
    • Graph Databases ● based on Graph Theory -> G = (V, E) ● designed for data that is well represented in a graph ○ social networks, public transport links, network topologies, road maps ● nodes, edges, properties are used to represent and store data ● graph relationships are queryable
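As a rough sketch of G = (V, E) with properties and a queryable relationship, the snippet below stores nodes and labelled edges in plain Python structures and answers a "who is reachable within n hops" question. The node ids, names, the `KNOWS` label, and both helper functions are invented for illustration; a real graph database exposes its own query language instead.

```python
from collections import deque

# Nodes (V) with properties; ids and names are made up for illustration.
nodes = {
    "a": {"name": "Alice"},
    "b": {"name": "Bob"},
    "c": {"name": "Carol"},
    "d": {"name": "Dave"},
}
# Directed edges (E) as (source, label, target) triples.
edges = [("a", "KNOWS", "b"), ("b", "KNOWS", "c"), ("c", "KNOWS", "d")]


def neighbours(node_id, label):
    """Targets of all edges with the given label that start at node_id."""
    return [dst for src, lbl, dst in edges if src == node_id and lbl == label]


def reachable_within(start, label, max_hops):
    """Breadth-first traversal: names of nodes reachable in <= max_hops edges."""
    seen, queue, found = {start}, deque([(start, 0)]), []
    while queue:
        node_id, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbours(node_id, label):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nodes[nxt]["name"])
                queue.append((nxt, hops + 1))
    return found


print(reachable_within("a", "KNOWS", 2))
```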
    • Graph Databases source: http://www.neo4j.org/
    • Graph Databases source: http://en.wikipedia.org/wiki/Graph_database
    • CAP Theorem source: http://blog.nahurst.com/visual-guide-to-nosql-systems
    • ACID ● Atomicity ○ all-or-nothing approach ● Consistency ○ DB will be in a consistent state before & after a transaction ● Isolation ○ transaction will behave as if it’s the only operation being performed upon the DB ● Durability ○ once a transaction is committed, it is durably preserved ● CA-Systems are ACID-Systems
    • BASE ● an application that works basically all the time, does not have to be consistent all the time, but will be in some known state eventually ● Basically Available ○ achieved by using a highly distributed approach ● Soft State ○ state of the system is always “soft” due to eventual consistency ● Eventual Consistency ○ at some point in the future, the data will be consistent ○ no guarantees are made about when this will occur
    • BASE vs ACID source: http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
    • Cassandra ● initially created by Facebook for Inbox Search ● distributed, horizontally scalable database ● high availability ● very flexible data model ○ data might be structured, semi-structured, unstructured ● commercial support through DataStax
    • Cassandra - Design ● all nodes are equally important ● no Single-Point-of-Failure ● no central controller ● no master/slave relationships ● every node knows how to route requests and where the data lives source: http://cassandra.apache.org/
    • Scales Linearly source: http://www.datastax.com
    • Uses Consistent Hashing ● Murmur3Partitioner generates the hash source: http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureDataDistributeHashing_c.html
    • Uses Consistent Hashing source: http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureDataDistributeHashing_c.html
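The idea behind consistent hashing can be sketched with a small token ring. This is a simplified illustration, not Cassandra's implementation: it uses MD5 from the standard library as a stand-in for Murmur3, gives each node a single token, and the names `token`, `HashRing`, and `node_for` are hypothetical.

```python
import bisect
import hashlib


def token(key):
    """Hash a key onto the ring. MD5 is a stand-in here, not Murmur3."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes):
        # Each node owns one token; the sorted tokens form the ring.
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def node_for(self, key):
        """Walk clockwise to the first node whose token is >= the key's token."""
        i = bisect.bisect_left(self._tokens, token(key))
        return self._ring[i % len(self._ring)][1]  # wrap around past the end


keys = [f"row-{i}" for i in range(1000)]
before = HashRing(["node-a", "node-b", "node-c"])
after = HashRing(["node-a", "node-b", "node-c", "node-d"])
moved = [k for k in keys if before.node_for(k) != after.node_for(k)]
print(f"{len(moved)} of {len(keys)} keys changed owner after adding node-d")
```

The property worth noticing is that every key that changes owner moves to the newly added node; keys owned by the other nodes stay where they are, which is what makes adding capacity cheap.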
    • Writes are very fast ● All writes are sequential ● no reading & seeking before a write ● Each of the N replica nodes performs the following upon receiving the RowMutation message: ○ Append the write to the commit log ○ Update the in-memory Memtable data structure ○ Write is done! ● If the Memtable gets full, it is flushed to disk (SSTable) source: http://www.roman10.net/how-apache-cassandra-write-works/
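The write steps above can be sketched as a toy node. Everything here is a simplification under stated assumptions: the commit log and SSTables are Python lists rather than files, "full" is modelled as a hypothetical row-count threshold (`memtable_limit`), and the class `ToyNode` is not part of any Cassandra API.

```python
class ToyNode:
    """Toy write path: commit log append, memtable update, flush to SSTable."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []  # append-only list; stands in for the on-disk log
        self.memtable = {}  # in-memory structure: row_key -> {column: value}
        self.sstables = []  # immutable, key-sorted snapshots "on disk"
        self.memtable_limit = memtable_limit  # hypothetical flush threshold

    def write(self, row_key, column, value):
        self.commit_log.append((row_key, column, value))  # 1. sequential append
        self.memtable.setdefault(row_key, {})[column] = value  # 2. update memtable
        if len(self.memtable) >= self.memtable_limit:  # memtable is "full"
            self._flush()
        return True  # 3. write is done, no read or seek was needed

    def _flush(self):
        # Write rows sorted by key as one immutable SSTable, then start fresh.
        self.sstables.append(sorted(self.memtable.items(), key=lambda kv: kv[0]))
        self.memtable = {}


node = ToyNode(memtable_limit=2)
node.write("k2", "name", "Bob")
node.write("k1", "name", "Alice")  # reaches the limit and triggers a flush
print(node.sstables)
```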
    • Write Requests ● Client requests can go to any node in the cluster because all nodes are peers ● write consistency level is configurable source: http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureClientRequestsWrite.html
    • Write Requests ● Cassandra chooses one coordinator per remote data center to handle requests to replicas ● the coordinator only needs to forward the write request (WR) to one node in each remote data center source: http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureClientRequestsWrite.html
    • Read Requests ● Two different types of Read Requests ○ direct read request (RR) ○ background read repair request (RRR) ● number of replicas contacted by a RR is determined by Consistency Level ● RRR are sent to any additional nodes that did not get a direct RR ● RRR ensure consistency
    • Read Requests source: http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureClientRequestsRead_c.html
    • Read Requests ● 2 of the 3 replicas for the given row must respond to fulfill the read request source: http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureClientRequestsRead_c.html
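The "2 of 3" figure matches a quorum read with a replication factor of 3, assuming the slide illustrates the QUORUM consistency level (the transcript does not name the level). A minimal sketch of that arithmetic, with hypothetical helper names:

```python
def quorum(replication_factor):
    """Replicas that must respond at QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def read_succeeds(responses, replication_factor):
    """True when enough replicas answered to satisfy a QUORUM read."""
    return responses >= quorum(replication_factor)


print(quorum(3))  # 2
print(read_succeeds(1, 3))  # False
```

With a replication factor of 3, one replica can be down or slow and the read still succeeds; the remaining replica can be brought up to date by a background read repair request.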
    • CQL ● very similar to SQL ● does not support JOINs / subqueries ● no referential integrity ● no cascading operations ● we denormalize the data because joins do not perform well in a distributed system
    • CQL
    • CQL no index, no service :)
    • CQL - Collections ● CQL introduced collections to columns ○ list ○ map ○ set ● Add new collections to the previous example
    • CQL - Collections
    • Cassandra vs MySQL (50GB) ● MySQL ○ writes avg: ~300ms ○ reads avg: ~350ms ● Cassandra ○ writes avg: ~0.12ms ○ reads avg: ~15ms source: http://www.odbms.org/wp-content/uploads/2013/11/cassandra.pdf
    • Summary ● elastic scaling (scaling out instead of up) ● huge amounts of data can be handled while maintaining high throughput rates ● requires fewer DBAs and management resources ○ automatic repairs/data distribution ○ simpler data models ● better economics ○ cost per GB is much lower than for RDBMS due to clusters of commodity HW ○ we handle more data with less money ● flexible data models ○ very relaxed or even non-existent data model restrictions ○ changes to the data model are much cheaper
    • Summary ● might not be mature enough for enterprises ● compatibility issues regarding standards ○ each DB has its own API ○ not easy to switch to another NoSQL DB ● search support is not the same as in RDBMS ● easier to find experienced RDBMS experts than NoSQL experts
    • Which DB for which purpose? ● NoSQL is an alternative ○ addresses certain limitations of the relational DB world ● depends on characteristics of data ○ if data is well structured -> relational DB might be better ○ if data is very complex -> might be difficult to map it to the relational model ● depends on volatility of the data model ○ what if the schema changes daily? ● relational DBs still have their pluses ○ relational model / transactions / query language ○ should be used when multi-row transactions and strict consistency are required
    • Thank you! - Questions?