Cassandra SF 2015 - Repeatable, Scalable, Reliable, Observable Cassandra

Slides from my talk at Cassandra Summit 2015
http://cassandrasummit-datastax.com/agenda/repeatable-scalable-reliable-observable-cassandra/

thelastpickle.com

  1. CASSANDRA SF 2015 REPEATABLE, SCALABLE, RELIABLE, OBSERVABLE CASSANDRA Aaron Morton @aaronmorton Co-Founder & Principal Consultant Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
  2. About The Last Pickle. Work with clients to deliver and improve Apache Cassandra based solutions. Apache Cassandra Committer, DataStax MVP, Apache Usergrid Committer. Based in New Zealand, Australia, & USA.
  3. Design Development Deployment
  4. Scalable Data Model Use no-look writes to avoid unnecessary reads.
  5. No Look Writes CREATE TABLE user_visits ( user text, day int, // YYYYMMDD PRIMARY KEY (user, day) );
  6. No Look Writes // Bad SELECT * FROM user_visits WHERE user = 'aaron' AND day = 20150924; INSERT INTO user_visits (user, day) VALUES ('aaron', 20150924);
  7. No Look Writes // Better INSERT INTO user_visits (user, day) VALUES ('aaron', 20150924); INSERT INTO user_visits (user, day) VALUES ('aaron', 20150924);
  8. Scalable Data Model Limit Partition size by bounding it in time or space.
  9. Limit Partition Size // Bad CREATE TABLE user_visits ( user text, visit_time timestamp, data blob, // up to 100K PRIMARY KEY (user, visit_time) );
  10. Limit Partition Size // Better CREATE TABLE user_visits ( user text, day_bucket int, // YYYYMMDD visit_time timestamp, data blob, // up to 100K PRIMARY KEY ( (user, day_bucket), visit_time) );
  11. Scalable Data Model Avoid mixed workloads on a single Table to reduce the impact of fragmentation.
  12. Mixed Workloads // Bad CREATE TABLE user ( user text, password text, // when password changed last_visit timestamp, // each page request PRIMARY KEY (user) );
  13. Mixed Workloads // Better CREATE TABLE user_password ( user text, password text, PRIMARY KEY (user) ); CREATE TABLE user_last_visit ( user text, last_visit timestamp, PRIMARY KEY (user) );
  14. Scalable Data Model Use LeveledCompactionStrategy when there are overwrites or Tombstones.
  15. Use LCS for Overwrites CREATE TABLE user_visits ( user text, day int, // YYYYMMDD PRIMARY KEY (user, day) ) WITH COMPACTION = { 'class' : 'LeveledCompactionStrategy' };
  16. Scalable Data Model Create parallel data models so throughput increases with node count.
  17. Parallel Data Models // Bad CREATE TABLE hotel_price ( checkin_day int, // YYYYMMDD hotel_name text, price_data blob, PRIMARY KEY (checkin_day, hotel_name) );
  18. Parallel Data Models // Better CREATE TABLE hotel_price ( checkin_day int, // YYYYMMDD city text, hotel_name text, price_data blob, PRIMARY KEY ( (checkin_day, city), hotel_name) );
  19. Scalable Data Model Use concurrent asynchronous requests to complete tasks.
  20. Concurrent Asynchronous Requests CREATE TABLE hotel_price ( checkin_day int, // YYYYMMDD city text, hotel_name text, price_data blob, PRIMARY KEY ( (checkin_day, city), hotel_name) );
  21. Concurrent Asynchronous Requests // request each city concurrently SELECT * FROM hotel_price WHERE checkin_day = 20150924 AND city = 'Santa Clara'; SELECT * FROM hotel_price WHERE checkin_day = 20150924 AND city = 'San Jose';
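(The slides show the per-city CQL but not the client-side fan-out. A minimal sketch of issuing the queries concurrently with the DataStax Java driver (2.x-era API); the hotelsNearby helper and the Session argument are illustrative, not from the talk:)

    import java.util.ArrayList;
    import java.util.List;
    import com.datastax.driver.core.*;

    // Fire one request per city without waiting, then collect the results.
    static List<Row> hotelsNearby(Session session, int checkinDay, List<String> cities) {
        PreparedStatement ps = session.prepare(
            "SELECT city, hotel_name, price_data FROM hotel_price " +
            "WHERE checkin_day = ? AND city = ?");
        List<ResultSetFuture> futures = new ArrayList<ResultSetFuture>();
        for (String city : cities)
            futures.add(session.executeAsync(ps.bind(checkinDay, city)));  // non-blocking
        List<Row> rows = new ArrayList<Row>();
        for (ResultSetFuture f : futures)
            rows.addAll(f.getUninterruptibly().all());  // block only when collecting
        return rows;
    }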
  22. Scalable Data Model Document when Eventual Consistency, Strong Consistency, or Linearizable Consistency is required.
  23. Scalable Data Model Smoke Test the data model.
  24. Data Model Smoke Test /* * Get Pricing Data */ // Load Data INSERT INTO city_distances (city, distance, nearby_city) VALUES ('Santa Clara', 0, 'Santa Clara'); INSERT INTO city_distances (city, distance, nearby_city) VALUES ('Santa Clara', 1, 'San Jose'); INSERT INTO hotel_price (checkin_day, city, hotel_name, price_data) VALUES (20150924, 'Santa Clara', 'Hilton Santa Clara', 0xFF); INSERT INTO hotel_price (checkin_day, city, hotel_name, price_data) VALUES (20150924, 'San Jose', 'Hyatt San Jose', 0xFF);
  25. Data Model Smoke Test // Step 1 // Get the nearby cities for the one selected by the user SELECT nearby_city FROM city_distances WHERE city = 'Santa Clara' AND distance < 2; // Step 2 // Parallel requests for each city returned. SELECT city, hotel_name, price_data FROM hotel_price WHERE checkin_day = 20150924 AND city = 'Santa Clara'; SELECT city, hotel_name, price_data FROM hotel_price WHERE checkin_day = 20150924 AND city = 'San Jose';
  26. Design Development Deployment
  27. Application Development Ensure read requests are bounded and you know what their size is. (Hint: use auto-paging in 2.0.)
  28. Auto Paging PreparedStatement prepStmt = session.prepare(CQL); BoundStatement boundStmt = new BoundStatement(prepStmt); boundStmt.setFetchSize(100);
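(The snippet ends at setting the fetch size. A minimal sketch of what follows, assuming an existing session: with auto-paging the driver fetches the next page transparently as iteration crosses a page boundary, so only about one page of rows is held in memory at a time:)

    ResultSet rs = session.execute(boundStmt);  // fetches the first page of 100 rows
    for (Row row : rs) {
        // further pages are requested behind the scenes as the iterator advances
        handle(row);  // handle() is a placeholder for application logic
    }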
  29. Application Development Use the appropriate Consistency Level. (See the Data Model Smoke Test.)
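(With the Java driver the consistency level is set per statement. A sketch; the prepared statements selectStmt and insertStmt, and the choice of ONE versus QUORUM, are illustrative:)

    import com.datastax.driver.core.BoundStatement;
    import com.datastax.driver.core.ConsistencyLevel;

    // A read that tolerates eventual consistency.
    BoundStatement read = selectStmt.bind(20150924, "Santa Clara");
    read.setConsistencyLevel(ConsistencyLevel.ONE);

    // A write that must be visible to subsequent QUORUM reads.
    BoundStatement write = insertStmt.bind("aaron", 20150924);
    write.setConsistencyLevel(ConsistencyLevel.QUORUM);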
  30. Application Development Use Token Aware Asynchronous requests with CL ONE where possible.
  31. Token Aware Policy cluster = Cluster.builder() .addContactPoints("10.10.10.10") .withLoadBalancingPolicy(new TokenAwarePolicy( new DCAwareRoundRobinPolicy("DC1"))) .build();
  32. Asynchronous Requests ResultSetFuture f = ses.executeAsync(stmt.bind("fo")); Row row = f.getUninterruptibly().one();
  33. Application Development Avoid DDoS'ing the cluster.
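(An unbounded executeAsync loop is the usual way an application accidentally DDoS'es its own cluster. One common guard, not spelled out in the deck, is a semaphore capping in-flight requests; a sketch, where the limit of 128 and the statements collection are illustrative:)

    import java.util.concurrent.Semaphore;
    import com.google.common.util.concurrent.FutureCallback;
    import com.google.common.util.concurrent.Futures;
    import com.datastax.driver.core.*;

    final Semaphore inFlight = new Semaphore(128);  // cap on concurrent requests

    for (BoundStatement stmt : statements) {
        inFlight.acquireUninterruptibly();          // back-pressure: wait for a free slot
        ResultSetFuture f = session.executeAsync(stmt);
        Futures.addCallback(f, new FutureCallback<ResultSet>() {
            public void onSuccess(ResultSet rs) { inFlight.release(); }
            public void onFailure(Throwable t) { inFlight.release(); }
        });
    }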
  34. Monitoring and Alerting Use what you like and what works for you.
  35. Monitoring and Alerting Some suggestions: OpsCenter, Riemann, Grafana, Logstash, Sensu.
  36. How To Monitor Cluster wide aggregate. All nodes (if possible). Top 3 & Bottom 3 Nodes. Individual Nodes.
  37. How To Monitor Rates 1 Minute Rate Derivative of Counts
  38. How To Monitor Latency 75th Percentile 95th Percentile 99th Percentile
  39. Monitoring Cluster Throughput .o.a.c.m.ClientRequest. Write.Latency.1MinuteRate Read.Latency.1MinuteRate (.o.a.c.m. = org.apache.cassandra.metrics)
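(These dotted names are how the metrics appear once shipped to a graphing system; on the node itself they are JMX MBeans. A minimal sketch of reading the write throughput over JMX, assuming the default JMX port 7199 and the 2.x metrics attribute naming:)

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class WriteRate {
        public static void main(String[] args) throws Exception {
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
            MBeanServerConnection mbs =
                JMXConnectorFactory.connect(url).getMBeanServerConnection();
            ObjectName writeLatency = new ObjectName(
                "org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=Latency");
            // One-minute rate of the write latency timer = current write throughput.
            System.out.println(mbs.getAttribute(writeLatency, "OneMinuteRate"));
        }
    }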
  40. Monitoring Local Table Throughput .o.a.c.m.ColumnFamily. KEYSPACE.TABLE.WriteLatency.1MinuteRate KEYSPACE.TABLE.ReadLatency.1MinuteRate
  41. Monitoring Request Latency .o.a.c.m.ClientRequest. Write.Latency.75percentile Write.Latency.95percentile Write.Latency.99percentile Read.Latency.75percentile…
  42. Monitoring Request Latency Per Table .o.a.c.m.ColumnFamily. KEYSPACE.TABLE.CoordinatorWriteLatency.95percentile KEYSPACE.TABLE.CoordinatorReadLatency.95percentile
  43. Monitoring Local Table Latency .o.a.c.m.ColumnFamily. KEYSPACE.TABLE.WriteLatency.95percentile KEYSPACE.TABLE.ReadLatency.95percentile
  44. Monitoring Read Path .o.a.c.m.ColumnFamily.KEYSPACE.TABLE. LiveScannedHistogram.95percentile TombstoneScannedHistogram.95percentile SSTablesPerReadHistogram.95percentile
  45. Monitoring Inconsistency .o.a.c.m. Storage.TotalHints.count HintedHandOffManager.Hints_created-IP_ADDRESS.count .o.a.c.m.Connection.TotalTimeouts.1MinuteRate
  46. Monitoring Eventual Consistency .o.a.c.m. ReadRepair.RepairedBackground.1MinuteRate ReadRepair.RepairedBlocking.1MinuteRate
  47. Monitoring Client Errors .o.a.c.m.ClientRequest. Write.Unavailables.1MinuteRate Read.Unavailables.1MinuteRate Write.Timeouts.1MinuteRate Read.Timeouts.1MinuteRate
  48. Monitoring Errors .o.a.c.m. Storage.Exceptions.count
  49. Monitoring Disk Usage .o.a.c.m. Storage.Load.count ColumnFamily.KEYSPACE.TABLE.TotalDiskSpaceUsed.count
  50. Monitoring Pending Compactions .o.a.c.m. Compaction.PendingTasks.value ColumnFamily.KEYSPACE.TABLE.PendingCompactions.value Compaction.TotalCompactionsCompleted.1MinuteRate
  51. Monitoring Node Performance .o.a.c.m.ThreadPools.request. MutationStage.PendingTasks.value ReadStage.PendingTasks.value ReplicateOnWriteStage.PendingTasks.value RequestResponseStage.PendingTasks.value
  52. Monitoring Node Performance .o.a.c.m.DroppedMessage. MUTATION.Dropped.1MinuteRate READ.Dropped.1MinuteRate
  53. Design Development Provisioning
  54. Smoke Tests "preliminary testing to reveal simple failures severe enough to reject a prospective software release."
  55. Disk Smoke Tests "Disk Latency and Other Random Numbers" Al Tobey http://tobert.github.io/post/2014-11-13-slides-disk-latency-and-other-random-numbers.html
  56. Cassandra Smoke Test cassandra-stress write cl=quorum -schema replication(factor=3) -mode native prepared cql3 cassandra-stress read cl=quorum -mode native prepared cql3 cassandra-stress mixed cl=quorum ratio(read=1,write=4) -mode native prepared cql3
  57. Run Books Plan now.
  58. Run Books Why are we doing this? What are we doing? How will we do it?
  59. Fire Drills Practice now.
  60. Fire Drill: Short Term Single Node Failure Down for less than Hint Window. Available for QUORUM. No action necessary on return.
  61. Fire Drill: Short Term Multi Node Failure (Break the cluster) Down for less than Hint Window. Available for ONE (maybe). Repair on return.
  62. Fire Drill: Availability Zone / Rack Partition Down for less than Hint Window. Available for QUORUM. Maybe repair on return.
  63. Fire Drill: Medium Term Single Node Failure Down between Hint Window and gc_grace_seconds. Available for QUORUM. Repair on return.
  64. Fire Drill: Long Term Single Node Failure Down longer than gc_grace_seconds. Available for QUORUM. Replace node.
  65. Fire Drill: Rolling Upgrade Repeated short term failure. Available for QUORUM.
  66. Fire Drill: Scale Up Repeated short term failure. Available for QUORUM.
  67. Fire Drill: Scale Out Available for ALL.
  68. Thanks.
  69. Aaron Morton @aaronmorton Co-Founder & Principal Consultant www.thelastpickle.com
