Introduction to cassandra

Published in: Software

  1. 1. Cassandra - A Decentralized Structured Storage System Nguyen Tuan Quang Saltlux – Vietnam Development Center 2016.03.21
  2. 2. Agenda • Database System Outlines • Cassandra Overview • Data Model & Architecture • Key features • Comparison
  3. 3. Database Market
  4. 4. Relational DBMS • Since 1970 • Use SQL to manipulate data • Excellent for applications such as management (accounting, reservations, staff management, etc)
  5. 5. Relational DBMS • Schemas aren't designed for sparse data • Databases are simply not designed to be distributed
  6. 6. New Trends and Requirements
  7. 7. New Trends and Requirements
  8. 8. CAP Theorem • Consistency: all nodes see the same data at the same time • Availability: every request receives a response about whether it succeeded or failed • Partition tolerance: the system continues to operate despite arbitrary message loss
  9. 9. Consistency Level • Strong (Sequential): After the update completes, any subsequent access will return the updated value • Weak (weaker than Sequential): The system does not guarantee that subsequent accesses will return the updated value • Eventual: All updates will propagate to all of the replicas in a distributed system, but this may take some time. Eventually, all replicas will be consistent.
  10. 10. Cassandra • Apache Cassandra was originally designed at Facebook to power its Inbox Search • It combines the distribution design of Amazon’s highly available Dynamo with the data model of Google’s Bigtable
  11. 11. Use-case: Facebook Inbox Search • Cassandra was developed to address this problem. • It was tested on 50+ TB of user message data in a 150-node cluster. • The index of all user messages can be searched in 2 ways. – Term search: search by a keyword – Interactions search: search by a user ID
  12. 12. Use-cases: Apple • Cassandra is Apple's dominant NoSQL database – MongoDB - 35 job listings (iTunes, Customer Systems Platform, and others) – Couchbase - 4 job listings (iTunes Social) – HBase - 33 job listings (Maps, Siri, iAd, iCloud, and more) – Cassandra - 70 job listings (Maps, iAd, iCloud, iTunes, and more) Replication and Multi Data Center Replication
  13. 13. Use-cases: NetFlix
  14. 14. Use-cases - Apple
  15. 15. Data Model • Keyspace is the outermost container for data in Cassandra • Columns are grouped into Column Families. • Each Column has – Name – Value – Timestamp
  16. 16. Data Model for Tornado Metasearch • Keyspace: metasearch • Column Families: Metasearch_korean • Row 1 Key – TOPIC_URL: URL1 – TOPIC_CONTENT: CONTENT 1 – TOPIC_TITLE: TOPIC_TITLE1 • Row 2 Key – TOPIC_URL: URL2 – TOPIC_CONTENT: CONTENT 2 – TOPIC_TITLE: TOPIC_TITLE2
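
As a rough illustration of the nesting described on the data model slides (keyspace, column family, row key, then columns that each carry a name, value, and timestamp), the sketch below models it with plain Python dictionaries. The `Column` class and the `put` helper are hypothetical names chosen for this example and are not part of any Cassandra driver; the placeholder values come from the slide.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Column:
    """One column: a name, a value, and a write timestamp."""
    name: str
    value: str
    timestamp: float = field(default_factory=time.time)


# keyspace -> column family -> row key -> column name -> Column
keyspace = {"metasearch": {"Metasearch_korean": {}}}


def put(column_family: dict, row_key: str, name: str, value: str) -> None:
    """Hypothetical helper: store one column under a row key."""
    column_family.setdefault(row_key, {})[name] = Column(name, value)


cf = keyspace["metasearch"]["Metasearch_korean"]
put(cf, "Row 1 Key", "TOPIC_URL", "URL1")
put(cf, "Row 1 Key", "TOPIC_CONTENT", "CONTENT 1")
put(cf, "Row 1 Key", "TOPIC_TITLE", "TOPIC_TITLE1")
put(cf, "Row 2 Key", "TOPIC_URL", "URL2")

# Rows are sparse: Row 2 holds only the columns that were written to it.
print(sorted(cf["Row 2 Key"]))
```

Because each row stores only the columns written to it, sparse data costs nothing for the columns a row lacks, which is the contrast with fixed relational schemas drawn earlier in the deck.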
  17. 17. System Architecture • Partitioning: how data is partitioned across nodes • Replication: how data is duplicated across nodes • Cluster Membership: how nodes are added to or removed from the cluster
  18. 18. Partitioning • Nodes are logically structured in a ring topology. • The hashed value of the key associated with a data partition is used to assign it to a node in the ring. • The hash range wraps around after its largest value to form the ring. • Lightly loaded nodes move position on the ring to relieve heavily loaded nodes.
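
The ring lookup on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Cassandra's actual partitioner: the 32-bit ring size, the use of MD5, the evenly spaced node positions, and the names `ring_position` and `owner` are all choices made for the example.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32  # assumed size of the hash space; positions wrap past this


def ring_position(key: str) -> int:
    """Hash a key to a position on the ring (MD5 is an assumption here)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % RING_SIZE


def owner(key: str, tokens: dict) -> str:
    """Return the first node at or clockwise after the key's position."""
    positions = sorted(tokens)
    index = bisect.bisect_left(positions, ring_position(key))
    # Wrapping back to index 0 is what turns the hash range into a ring.
    return tokens[positions[index % len(positions)]]


# Hypothetical cluster: six nodes spaced evenly around the ring.
tokens = {i * RING_SIZE // 6: name for i, name in enumerate("ABCDEF")}
print(owner("key1", tokens), owner("key2", tokens))
```

Moving a lightly loaded node, as the last bullet describes, would correspond to changing that node's position in `tokens` so that it takes over part of a busy neighbour's range.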
  19. 19. Partitioning
  20. 20. Partitioning ?
  21. 21. Partitioning
  22. 22. Partitions, Partition Key
  23. 23. Replication • Each data item is replicated at N nodes, where N is the replication factor. • Different replication policies – Rack Unaware: replicate data at the N-1 successive nodes after its coordinator – Rack Aware: uses ZooKeeper to choose a leader, which tells nodes the ranges they are replicas for – Datacenter Aware: similar to Rack Aware, but the leader is chosen at the datacenter level instead of the rack level.
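
The Rack Unaware policy is simple enough to sketch directly: the coordinator keeps one copy and the next N-1 nodes clockwise keep the rest. The function name `rack_unaware_replicas` and the six-node ring are assumptions made for this example; the rack-aware and datacenter-aware policies need a leader election and are not modelled here.

```python
def rack_unaware_replicas(ring: list, coordinator: str, n: int) -> list:
    """Coordinator plus the next n - 1 nodes clockwise on the ring."""
    if not 1 <= n <= len(ring):
        raise ValueError("replication factor must be between 1 and the ring size")
    start = ring.index(coordinator)
    # The modulo wraps the walk past the end of the list back to the start.
    return [ring[(start + i) % len(ring)] for i in range(n)]


ring = ["A", "B", "C", "D", "E", "F"]  # hypothetical nodes in clockwise order
print(rack_unaware_replicas(ring, "E", 3))  # ['E', 'F', 'A']
```

With N=3, a key coordinated by node E is also stored on F and A, so the data survives the loss of any two of those three nodes.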
  24. 24. Partitioning and Replication [Figure: ring with positions 0, 1/2 and 1, nodes A to F, N=3, keys placed at h(key1) and h(key2)] * Figure taken from the slides of Avinash Lakshman and Prashant Malik (authors of the paper).
  25. 25. Partitioning and Replication
  26. 26. Cassandra Key features • Big Data Scalability – Scalable to petabytes – New nodes = linear performance increase – Add new nodes online
  27. 27. Cassandra Key features • No Single Point of Failure – All nodes are the same – Read/write from any node – Can replicate across different data centers
  28. 28. Cassandra Key features • Easy Replica/Data Distribution – Transparently handled by Cassandra – Multiple data centers are supported – Exploit the benefits of cloud computing
  29. 29. Cassandra Key features • No need for caching software – The peer-to-peer architecture removes the need for a special caching layer – The database cluster uses the memory of its own nodes to cache data
  30. 30. Cassandra Key features • Tunable Data Consistency – Choose between strong and eventual consistency – Can be done on a per-operation basis, for both reads and writes
  31. 31. Cassandra Key features • Tunable Data Consistency – Choose between strong and eventual consistency – Can be done on a per-operation basis, for both reads and writes
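
One common way to reason about the choice between strong and eventual consistency is the replica overlap rule: with replication factor N, a read that contacts R replicas is guaranteed to include the latest acknowledged write to W replicas whenever R + W > N. The slides do not state this rule; it is added here as a standard quorum argument, and the function names below are hypothetical.

```python
def quorum(n: int) -> int:
    """Smallest majority of n replicas."""
    return n // 2 + 1


def reads_see_latest_write(n: int, r: int, w: int) -> bool:
    """True when any r read replicas must overlap any w written replicas."""
    return r + w > n


n = 3  # replication factor
print(reads_see_latest_write(n, r=quorum(n), w=quorum(n)))  # True: 2 + 2 > 3
print(reads_see_latest_write(n, r=1, w=1))                  # False: 1 + 1 <= 3
```

Choosing R and W per operation is what makes the consistency tunable: small values favour latency and availability, while values that satisfy the overlap rule favour strong consistency.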
  32. 32. Mongodb vs. Cassandra
  33. 33. Comparison with MySQL • MySQL, > 50 GB data – Average write: ~300 ms – Average read: ~350 ms • Cassandra, > 50 GB data – Average write: 0.12 ms – Average read: 15 ms • Stats provided by the authors using Facebook data.
  34. 34. Key features Recaps • Distributed and Decentralized – No node needs to be set up as a master to organize other nodes set up as slaves; every node is identical – As a result, there is no single point of failure • High Availability & Fault Tolerance – You can replace failed nodes in the cluster with no downtime, and you can replicate data to multiple data centers to offer improved local performance and prevent downtime if one data center experiences a catastrophe such as fire or flood. • Tunable Consistency – It lets you decide the level of consistency you require, in balance with the level of availability
  35. 35. Key features Recaps • Elastic Scalability – Elastic scalability refers to a special property of horizontal scalability. It means that your cluster can seamlessly scale up and scale back down.
  36. 36. References • https://jaxenter.com/evaluating-nosql-performance-which-database-is-right-for-your-data-107481.html • http://www.slideshare.net/amcsquarelearning/learn-mongo-db-at-amc-square-learning?next_slideshow=1 • https://en.wikipedia.org/wiki/Apache_Cassandra • http://www.datastax.com/ • http://www.slideshare.net/asismohanty/cassandra-basics-20
  37. 37. Thank You
