NoSQL Databases

A presentation showing some NoSQL databases and Apache Cassandra in detail

Published in: Engineering, Technology


  1. Databases
     Eduard Tudenhöfner
  2. Overview
     ● Why NoSQL?
     ● Classification
     ● CAP Theorem
     ● BASE vs ACID
     ● Cassandra in Action
     ● Summary
  4. Why NoSQL?
     ● original intention: modern web-scale DBs
       ○ amount of data drastically increased
       ○ data on the web is less structured
     ● higher requirements regarding performance
     ● some problems are easier to solve without the relational approach
     ● scaling out and running on commodity hardware is much cheaper than scaling up
  5. Typical Characteristics
     ● non-relational
     ● horizontally scalable
     ● flexible schema
     ● easy replication support
     ● simple API
     ● eventually consistent -> BASE principle
  7. Classification
     source: http://blog.octo.com/wp-content/uploads/2012/07/QuadrantNoSQL.png
  8. Classification
     source: http://www.sics.se/~amir/files/download/dic/NoSQL%20Databases.pdf
  9. Key/Value Stores
     ● data model: collection of key/value pairs
     ● keys and values can be complex compounds
     ● based on Amazon’s Dynamo paper
     ● designed to handle massive load
  10. Key/Value Stores
      ● no complex query filters
      ● all joins must be done in application code
      ● easy to distribute across a cluster
      ● very predictable performance -> O(1) lookups by key
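The two Key/Value slides above can be illustrated with a minimal sketch. It assumes a plain Python dict as a stand-in for a distributed store; the key names (`user:1`, `city:10`) and the helper `user_with_city` are hypothetical and only show what "joins must be done in application code" means in practice.

```python
# Hypothetical in-memory key/value stores; plain dicts stand in for a cluster.
users = {
    "user:1": {"name": "alice", "city_id": "city:10"},
    "user:2": {"name": "bob", "city_id": "city:20"},
}
cities = {
    "city:10": {"name": "London"},
    "city:20": {"name": "Helsinki"},
}


def get(store, key):
    """Single-key lookup: the only read operation this toy store offers."""
    return store.get(key)


def user_with_city(user_key):
    """Application-side 'join': two independent key lookups stitched together."""
    user = get(users, user_key)
    if user is None:
        return None
    city = get(cities, user["city_id"])
    return {"name": user["name"], "city": city["name"] if city else None}


result = user_with_city("user:1")
print(result)
```

Each lookup is a hash-table access by key, which is where the predictable O(1) behaviour comes from; anything resembling a filter or a join has to be assembled by the caller.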
  11. Wide Column Stores
      ● Tables are similar to RDBMS, but semi-structured
      ● based on Google’s BigTable
      ● Rows can have arbitrary columns
  12. Wide Column Stores -> BigTable
      ● <RowKey, ColumnKey, Timestamp> triple as key for lookups, inserts, deletes
      ● ColumnKey uses syntax family:qualifier
      ● arbitrary columns on a row-by-row basis
      ● does not support a relational model
        ○ no table-wide integrity constraints
        ○ no multi-row transactions
      source: http://research.google.com/archive/bigtable.html
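The <RowKey, ColumnKey, Timestamp> addressing from the slide above can be sketched as a nested map. This is a toy model, not BigTable's storage format; the class `SparseTable` and its method names are made up for illustration.

```python
from collections import defaultdict


class SparseTable:
    """Toy wide-column table: cells are keyed by (row key, column key, timestamp)."""

    def __init__(self):
        # row key -> column key ("family:qualifier") -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, column_key, timestamp, value):
        self._rows[row_key][column_key][timestamp] = value

    def get(self, row_key, column_key, timestamp=None):
        """Return the value at `timestamp`, or the newest version if omitted."""
        versions = self._rows.get(row_key, {}).get(column_key, {})
        if not versions:
            return None
        if timestamp is None:
            timestamp = max(versions)
        return versions.get(timestamp)

    def columns(self, row_key):
        """Column keys present on this row; other rows may have different ones."""
        return sorted(self._rows.get(row_key, {}))


table = SparseTable()
table.put("com.example.www", "contents:html", 1, "<html>v1</html>")
table.put("com.example.www", "contents:html", 2, "<html>v2</html>")
table.put("com.example.www", "anchor:example.org", 1, "Example")
table.put("org.example", "contents:html", 1, "<html>other</html>")
```

Note how the two rows carry different sets of columns, which is the "arbitrary columns on a row-by-row basis" point, and how nothing in the structure spans more than one row.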
  13. Document Stores
      ● inspired by Lotus Notes
      ● central concept of a Document
      ● Documents encapsulate/encode data in some format/encoding
      ● Encodings:
        ○ XML, YAML, JSON, BSON, PDF
  14. Document Stores
      source: http://www.mongodb.org/
  15. Document Stores
      source: http://www.mongodb.org/
  16. Graph Databases
      ● based on Graph Theory -> G = (V, E)
      ● designed for data that is well represented in a graph
        ○ social networks, public transport links, network topologies, road maps
      ● nodes, edges, properties are used to represent and store data
      ● graph relationships are queryable
  17. Graph Databases
      source: http://www.neo4j.org/
  18. Graph Databases
      source: http://en.wikipedia.org/wiki/Graph_database
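As a rough illustration of the Graph Databases slides above, the sketch below stores G = (V, E) with properties on nodes and edges and answers a "friends of friends" style query by breadth-first traversal. All names and data are invented; a real graph database would expose a query language rather than these helper functions.

```python
from collections import deque

# G = (V, E): nodes and edges both carry properties.
nodes = {
    "alice": {"label": "Person"},
    "bob": {"label": "Person"},
    "carol": {"label": "Person"},
    "dave": {"label": "Person"},
}
edges = [
    ("alice", "bob", {"type": "KNOWS"}),
    ("bob", "carol", {"type": "KNOWS"}),
    ("carol", "dave", {"type": "KNOWS"}),
]


def neighbours(node, edge_type):
    """Nodes reachable from `node` over one outgoing edge of the given type."""
    return [dst for src, dst, props in edges
            if src == node and props["type"] == edge_type]


def within_hops(start, edge_type, max_hops):
    """Breadth-first traversal: every node at most `max_hops` edges from `start`."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbours(node, edge_type):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                queue.append((nxt, depth + 1))
    return found


friends_of_friends = within_hops("alice", "KNOWS", 2)
print(friends_of_friends)
```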
  20. CAP Theorem
      source: http://blog.nahurst.com/visual-guide-to-nosql-systems
  22. ACID
      ● Atomicity
        ○ all-or-nothing approach
      ● Consistency
        ○ DB will be in a consistent state before & after a transaction
      ● Isolation
        ○ transaction will behave as if it’s the only operation being performed upon the DB
      ● Durability
        ○ once a transaction is committed, it is durably preserved
      ● CA-Systems are ACID-Systems
  23. BASE
      ● an application that works basically all the time, does not have to be consistent all the time, but will be in some known state eventually
      ● Basically Available
        ○ achieved by using a highly distributed approach
      ● Soft State
        ○ state of the system is always “soft” due to eventual consistency
      ● Eventual Consistency (in German: schlussendliche Konsistenz)
        ○ at some point in the future, the data will be consistent
        ○ no guarantees are made about when this will occur
  24. BASE vs ACID
      source: http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
  26. Cassandra
      ● initially created by Facebook for Inbox Search
      ● distributed, horizontally scalable database
      ● high availability
      ● very flexible data model
        ○ data might be structured, semi-structured, unstructured
      ● commercial support through DataStax
  27. Cassandra - Design
      ● all nodes are equally important
      ● no Single-Point-of-Failure
      ● no central controller
      ● no master/slave relationships
      ● every node knows how to route requests and where the data lives
      source: http://cassandra.apache.org/
  28. Scales Linearly
      source: http://www.datastax.com
  29. Uses Consistent Hashing
      Murmur3Partitioner generates hash
      source: http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureDataDistributeHashing_c.html
  30. Uses Consistent Hashing
      source: http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureDataDistributeHashing_c.html
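The idea behind the consistent hashing slides above can be sketched as a small hash ring. This is an illustration under stated assumptions, not Cassandra's implementation: MD5 from the standard library stands in for the Murmur3 hash named on the slide, and the `Ring` class, node names, and the `vnodes` parameter are hypothetical.

```python
import bisect
import hashlib


def token(key):
    """Map a string to a ring position. MD5 is a stand-in here; the slide
    names Murmur3Partitioner, which is not in Python's standard library."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class Ring:
    def __init__(self, node_names, vnodes=8):
        # Assumption: each node owns several tokens to smooth the distribution.
        self._ring = sorted(
            (token(f"{name}#{i}"), name)
            for name in node_names
            for i in range(vnodes)
        )
        self._tokens = [t for t, _ in self._ring]

    def owner(self, key):
        """First node clockwise from the key's token, wrapping at the end."""
        idx = bisect.bisect_left(self._tokens, token(key)) % len(self._ring)
        return self._ring[idx][1]


keys = [f"row-{i}" for i in range(1000)]
before = Ring(["node-a", "node-b", "node-c"])
after = Ring(["node-a", "node-b", "node-c", "node-d"])

# Keys whose owner changed once node-d joined the ring.
moved = [k for k in keys if before.owner(k) != after.owner(k)]
print(len(moved), "of", len(keys), "keys moved")
```

The property worth noticing is that adding a node only relocates keys onto the new node; every other key keeps its previous owner, which is what makes scaling out cheap compared with rehashing everything.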
  31. Writes are very fast
      ● All writes are sequential
      ● no reading & seeking before a write
      ● Each of the N nodes performs the following upon receiving the RowMutation message:
        ○ Append write to the commit log
        ○ Update in-memory Memtable data structure
        ○ Write is done!
      ● When the Memtable is full, it is flushed to disk as an SSTable
      source: http://www.roman10.net/how-apache-cassandra-write-works/
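The write path on the slide above can be modelled in a few lines. The sketch is a simplification with invented names (`Node`, `memtable_limit`); it keeps everything in memory, does not model commit log truncation after a flush, and adds a naive read only to show that flushed data stays reachable.

```python
class Node:
    """Toy model of one replica's write path; not Cassandra's storage engine."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []   # append-only list, stands in for the on-disk log
        self.memtable = {}     # in-memory structure, newest value per key
        self.sstables = []     # immutable, sorted snapshots "on disk"
        self.memtable_limit = memtable_limit  # hypothetical flush threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. sequential append
        self.memtable[key] = value            # 2. in-memory update, write is done
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # only when the Memtable is full

    def flush(self):
        # Write the Memtable out as a sorted, immutable table and start afresh.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}

    def read(self, key):
        # Check the Memtable first, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None


node = Node(memtable_limit=3)
for k, v in [("a", 1), ("b", 2), ("c", 3), ("a", 4)]:
    node.write(k, v)
print(node.read("a"), node.read("b"))
```

Nothing on the write path reads existing data first, which is why the slide can claim that writes involve no seeking.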
  32. Write Requests
      ● Client requests can go to any node in the cluster because all nodes are peers
      ● the write consistency level is configurable
      source: http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureClientRequestsWrite.html
  33. Write Requests
      ● Cassandra chooses one coordinator per remote data center to handle requests to replicas
      ● the coordinator only needs to forward the write request (WR) to one node in each remote data center
      source: http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureClientRequestsWrite.html
  34. Read Requests
      ● Two different types of Read Requests
        ○ direct read request (RR)
        ○ background read repair request (RRR)
      ● number of replicas contacted by an RR is determined by the Consistency Level
      ● RRRs are sent to any additional nodes that did not get a direct RR
      ● RRRs ensure consistency
  35. Read Requests
      source: http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureClientRequestsRead_c.html
  36. Read Requests
      ● 2 of the 3 replicas for the given row must respond to fulfill the read request
      source: http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureClientRequestsRead_c.html
  37. Read Requests
      source: http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureClientRequestsRead_c.html
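The "2 of the 3 replicas" case from the Read Requests slides corresponds to a quorum read with a replication factor of 3. The sketch below is a heavily simplified model under assumptions: replicas are plain dicts mapping a key to a (timestamp, value) pair, the newest timestamp wins, and read repair runs inline rather than in the background. The function names are made up for illustration.

```python
def quorum(replication_factor):
    """Number of replicas that must answer for a quorum."""
    return replication_factor // 2 + 1


def read(replicas, key, required):
    """Direct read from `required` replicas, then repair every stale replica.

    Each replica is a dict mapping key -> (timestamp, value).
    """
    contacted = replicas[:required]              # direct read requests (RR)
    answers = [r[key] for r in contacted if key in r]
    if not answers:
        return None
    newest = max(answers, key=lambda a: a[0])    # highest timestamp wins
    for replica in replicas:                     # read repair (RRR), done inline here
        if replica.get(key, (-1, None))[0] < newest[0]:
            replica[key] = newest
    return newest[1]


replicas = [
    {"k": (1, "old")},
    {"k": (2, "new")},
    {"k": (1, "old")},
]
required = quorum(len(replicas))
value = read(replicas, "k", required)
print(required, value)
```

Raising the required count trades latency for stronger consistency, while lowering it does the opposite; that is the knob the slides refer to as the consistency level.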
  38. CQL
      ● very similar to SQL
      ● does not support JOINs or subqueries
      ● no referential integrity
      ● no cascading operations
      ● we denormalize the data because joins are not performant in a distributed system
  39. CQL
  40. CQL
      no index, no service :)
  41. CQL - Collections
      ● CQL introduced collection types for columns
        ○ list
        ○ map
        ○ set
      ● Add new collections to the previous example
  42. CQL - Collections
  43. Cassandra vs MySQL (50GB)
      ● MySQL
        ○ writes avg: ~300ms
        ○ reads avg: ~350ms
      ● Cassandra
        ○ writes avg: ~0.12ms
        ○ reads avg: ~15ms
      source: http://www.odbms.org/wp-content/uploads/2013/11/cassandra.pdf
  45. Summary
      ● elastic scaling (scaling out instead of up)
      ● huge amounts of data can be handled while maintaining high throughput rates
      ● requires fewer DBAs and management resources
        ○ automatic repairs/data distribution
        ○ simpler data models
      ● better economics
        ○ cost per GB is much lower than for RDBMS due to clusters of commodity hardware
        ○ we handle more data with less money
      ● flexible data models
        ○ very relaxed or even non-existent data model restrictions
        ○ changes to data model are much cheaper
  46. Summary
      ● might not be mature enough for enterprises
      ● compatibility issues regarding standards
        ○ each DB has its own API
        ○ not easy to switch to another NoSQL DB
      ● search support is not the same as in RDBMS
      ● easier to find experienced RDBMS experts than NoSQL experts
  47. Which DB for which purpose?
      ● NoSQL is an alternative
        ○ addresses certain limitations of the relational DB world
      ● depends on characteristics of data
        ○ if data is well structured -> relational DB might be better
        ○ if data is very complex -> might be difficult to map it to the relational model
      ● depends on volatility of the data model
        ○ what if schema changes daily?
      ● relational DBs still have their pluses
        ○ relational model / transactions / query language
        ○ should be used when multi-row transactions and strict consistency are required
  48. Thank you! - Questions?
