Cassandra - Distributed Data Store

Why does Big Data need new storage technology?
Linearly scalable
Fully durable
Cassandra has tables, columns, and CQL, but that does not make it SQL
Fully distributed, no SPOF
Tunable consistency
Multi-master, multi-DC

Published in: Technology

  1. 1. Cassandra NoSQL Distributed Key-Value Store
  2. 2. Me: TPSE 2013, TPSE 2015, Agile Thailand 2015, Big Data Conference 2015 @ Taipei, Untitled 2016; full-time software engineer
  3. 3. Why does Big Data need new storage technology?
  4. 4. • 1 Brand • 4 Hotels • 10 Rooms
  5. 5. • 1 Brand • 775,000 Hotels
  6. 6. Cassandra: linearly scalable; fully durable; fully distributed, no SPOF; tunable consistency; multi-master, multi-DC
  7. 7. Linearly scalable
  8. 8. Fully durable
  9. 9. Fully Distributed, no SPOF [diagram: a client says "I need P1"; node labels P1, P2, P6]
  10. 10. Tunable Consistency: Replication Factor; Read Consistency Level (One, Two, Quorum, All); Write Consistency Level (One, Two, Quorum, All)
  11. 11. Replication Factor: INSERT P1
  12. 12. Replication Factor = 3: INSERT P1
  13. 13. Replication Factor = 3: INSERT P1 is written to three replicas (P1, P1, P1)
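  14. 14. Consistency Level = ALL: INSERT P1 to three replicas, which respond in 100 ms, 110 ms, and 200 ms; DONE after 200 ms
  15. 15. Consistency Level = 1: INSERT P1 to three replicas, which respond in 100 ms, 110 ms, and 200 ms; DONE after ??? ms
  16. 16. Consistency Level = ALL: SELECT P1 from three replicas, which respond in 50 ms, 70 ms, and 60 ms; P1 returned after ??? ms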
  17. 17. QUORUM = floor(N / 2) + 1, where N = Replication Factor
  18. 18. N = 2, QUORUM = 2; N = 3, QUORUM = 2; N = 4, QUORUM = 3; N = 5, QUORUM = 3 (N = Replication Factor)
  19. 19. Read Fast or Write Fast (tuning): WRITE ALL -> READ ONE; WRITE ONE -> READ ALL; WRITE QUORUM -> READ QUORUM
  20. 20. Multi DC http://www.slideshare.net/cjohannsen/apache-cassandra-at-the-geek2geek-berlin
  21. 21. Replication Strategies: SimpleStrategy (single datacenter); NetworkTopologyStrategy (multiple datacenters)
  22. 22. Keyspace: Name, Replication Strategy, Replication Factor, Durable Writes, Tables, Index
  23. 23. Tables: Name, Column, Primary Key, Comment
  24. 24. Data Model: Partition Key, Clustering Column, Primary Key
  25. 25. CQL
  26. 26. INDEX BY DESIGN
  27. 27. NoSQL = No! SQL
  28. 28. Cassandra has tables, columns, and CQL, but that does not make it SQL
  29. 29. Message TTL
  30. 30. Insert = Upsert
  31. 31. Key-Value
  32. 32. https://academy.datastax.com/
  33. 33. References: http://www.slideshare.net/jbellis ; http://cassandra.apache.org/ ; http://www.datastax.com/dev/blog/deploying-cassandra-across-multiple-data-centers ; http://www.slideshare.net/cjohannsen/apache-cassandra-at-the-geek2geek-berlin
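
The quorum arithmetic on slides 17 and 18 and the read/write pairings on slide 19 can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the function names are hypothetical, and the overlap rule (read replicas + write replicas > replication factor) is the usual explanation for why those three pairings return the latest write; the slides list the pairings without stating the rule.

```python
def quorum(replication_factor: int) -> int:
    """QUORUM = floor(N / 2) + 1, where N is the replication factor."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def replicas_required(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for a given consistency level."""
    levels = {
        "ONE": 1,
        "TWO": 2,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return levels[level.upper()]


def read_overlaps_write(write_level: str, read_level: str,
                        replication_factor: int) -> bool:
    """True when every read set must share a replica with every write set.

    Assumption (not on the slides): overlap holds when R + W > N.
    """
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    # Reproduce the table on slide 18.
    for n in (2, 3, 4, 5):
        print(f"N = {n}, QUORUM = {quorum(n)}")

    # The three pairings on slide 19, with replication factor 3.
    for write_level, read_level in (("ALL", "ONE"), ("ONE", "ALL"),
                                    ("QUORUM", "QUORUM")):
        print(write_level, read_level,
              read_overlaps_write(write_level, read_level, 3))
```

With replication factor 3, writing and reading at ONE touches only 1 + 1 = 2 replicas, so a read may miss the newest write; that is the trade-off the tuning slide is pointing at.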