Declarative Benchmarking of Cassandra and Its Data Models

Netflix's large Cassandra footprint includes many interesting data models, both new and evolving, running on different versions of Cassandra.

Hence, developing or evolving a scalable data model takes iterations on application code, schema, and configuration to meet the desired functional and scalability requirements.

I will share use cases and details about how we make it easy for engineers to validate Cassandra data models across versions and configuration tweaks to ensure application scalability.

  1. 1. Declarative Benchmarking of Cassandra and Its Data Models. Monal Daxini (@monaldax), ApacheCon, Las Vegas, 11/11/2019. https://www.linkedin.com/in/monaldaxini
  2. 2. Profile @monaldax: ● Cloud Data Engineering @ Netflix, work on many data stores ● Help engineers build scalable solutions ● Built scalable data platforms using Apache Flink / Kafka / Docker ● Working with distributed systems for 18+ years
  3. 3. Netflix Cassandra Footprint: • 100s of applications using Cassandra (several unique data models / configs) • 10s of thousands of instances • 100s of global C* clusters • > 6 PB of data • Millions of requests / second
  4. 4. Structure Of The Talk: • Challenges developing a scalable data model (Cassandra) • Declarative Cassandra benchmarking tool in action • Tool's philosophy, how it works, & how it can apply to other data stores
  5. 5. Developing a Scalable Cassandra Data Model. For each application: 1. Design data model & schema 2. Design application queries 3. Identify application load & query distribution 4. Prepare test data 5. Prepare query parameter values to run queries efficiently 6. Code an app to execute queries, and instrument it to capture metrics 7. Generate load against the application to run queries with the desired distribution 8. Analyze results (build dashboard) 9. If results are unsatisfactory, iterate from step 1
  6. 6. In addition, we may need to test the application workload on different versions of Cassandra and/or data models.
  7. 7. That's a lot of steps and duplicate effort, and it's cumbersome! We want it to be easy, quick, and ergonomic!
  8. 8. Developing a Scalable Cassandra Data Model. With tooling, for each application: 1. Design data model & schema 2. Design application queries 3. Identify the application load & query distribution 4. Prepare test data (generate) 9. Configure the tool, run the test, and if results are unsatisfactory, iterate from step 1. Heavy lifting in a tool: 5. Prepare query parameter values to run queries efficiently 6. Code an app to execute queries, and instrument it to capture metrics 7. Generate load against the application to run queries with the desired distribution 8. Analyze results (build dashboard)
  9. 9. What is NDBench? ● Generic benchmarking tool ● Supports different data stores via plugins (available plugins) ● Dynamically tunable RPS and configuration ● Load patterns: random, time window, zipfian
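To illustrate how the random and zipfian load patterns named on the slide above differ, here is a minimal Python sketch. It is not NDBench code: the function names, the exponent `s`, and the key-space sizes are assumptions chosen for illustration only.

```python
import random
from collections import Counter


def zipf_weights(n, s=1.0):
    """Unnormalized Zipf weights for ranks 1..n: rank k gets 1 / k**s."""
    return [1.0 / (k ** s) for k in range(1, n + 1)]


def sample_keys(n_keys, n_samples, pattern="random", seed=0):
    """Pick 0-based key indices under a uniform ("random") or zipfian pattern."""
    rng = random.Random(seed)
    if pattern == "random":
        # Every key is equally likely to be requested.
        return [rng.randrange(n_keys) for _ in range(n_samples)]
    if pattern == "zipfian":
        # Low-numbered keys are requested far more often than high-numbered ones.
        return rng.choices(range(n_keys), weights=zipf_weights(n_keys), k=n_samples)
    raise ValueError(f"unknown pattern: {pattern}")


if __name__ == "__main__":
    for pattern in ("random", "zipfian"):
        counts = Counter(sample_keys(1000, 100_000, pattern))
        print(pattern, "hottest key and hit count:", counts.most_common(1)[0])
```

Under the zipfian pattern a small set of keys receives most of the traffic, which is useful for exercising hot partitions; the uniform pattern spreads requests evenly across the key space.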
  10. 10. NDBench In Action [diagram: NDBench app UI; NDBench nodes (EC2 instances); test Cassandra cluster (schema & test data); reads / writes; record metrics]
  11. 11. Cassandra NDBench CQL Plugin: • Emulate application query logic, run against real or generated data • Specify the traffic % distribution • Basic data type coercion for using a query result in another query • Run any CQL statement (SELECT, UPDATE, INSERT, DELETE) & support all CQL types • Support any Cassandra version with CQL support
  12. 12. What Do We Use It For / Plan To Use It For: • Validate scalability of data model and application query workload • Compare the performance of a data model on Cassandra versions 3.x & 2.x • Help certify Cassandra updates / upgrades: test different data models and application workloads • Generate data for a given schema before running queries
  13. 13. Walkthrough of NDBench CQL Plugin In Action: Steps 1-4, 9
  14. 14. Cassandra Schema Of Sample Application (step 1)
  15. 15. Application CQL Queries For API 1 (steps 2, 3). Query Group 1: 70%. SELECT user_id, profile_id FROM user WHERE user_id = ?; SELECT foreign_keys FROM user_index WHERE type = 'profile_id' AND value = ?;
  16. 16. Application CQL Queries For API 2 (steps 2, 3). Query Group 2: 30%. SELECT user_id, profile_id, acc_guid FROM user WHERE user_id = ?; BEGIN BATCH INSERT INTO user_index (create_time, foreign_keys, type, value) VALUES (?, [ ?, ? ], ''profile_id'', ?); INSERT INTO user_index (create_time, foreign_keys, type, value) VALUES (?, [ ? ], ''acc_guid'', ?); APPLY BATCH; INSERT INTO map_test (id, uid_pid) VALUES (''1'', {user_id : ?, profile_id: ?}); INSERT INTO set_test (id, uid_pid) VALUES (''2'', {?});
  17. 17. NDBench CQL Plugin Overview [diagram: NDBench app UI; NDBench nodes with CQL plugin (EC2 instances); perf test profile (ndb_perf_queries); test Cassandra cluster (schema & test data); run queries; record metrics]
  18. 18. NDBench CQL Plugin Perf-Test-Profile Schema (step 9). Callouts: var_* columns point to different sources for query parameter values, only one is used; ordered CQL in group (id)
  19. 19. Modified App Query With Parameter Reference - Group 1 (70%). SELECT user_id, profile_id FROM user WHERE user_id = ?user_id?; SELECT foreign_keys FROM user_index WHERE type = 'profile_id' AND value = ?profile_id?;
  20. 20. Modified App Query With Parameter Reference - Group 2 (30%). SELECT user_id, profile_id, acc_guid FROM user WHERE user_id = ?user_id?; BEGIN BATCH INSERT INTO user_index (create_time, foreign_keys, type, value) VALUES (?:TS?, ?[user_id, profile_id]?, ''profile_id'', ?profile_id?); INSERT INTO user_index (create_time, foreign_keys, type, value) VALUES (?:TS?, ?[user_id]?, ''acc_guid'', ?acc_guid?); APPLY BATCH; INSERT INTO map_test (id, uid_pid) VALUES (''1'', ?{user_id : user_id, profile_id: profile_id}?); INSERT INTO set_test (id, uid_pid) VALUES (''2'', ?s{user_id}s?); Callout: Type Coercion
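The two slides above wrap each parameter in question marks (`?user_id?`, `?:TS?`, `?[user_id, profile_id]?`). The talk does not show how the plugin parses these references, so the following Python sketch is only one plausible approach: it swaps every reference for a plain CQL bind marker and records the reference expressions in order. The regular expression and the name `to_prepared` are assumptions, not the plugin's API.

```python
import re

# Assumed reference syntax, inferred from the slides:
# ?name?, ?:TS?, ?[a, b]?, ?{k : v}?, ?s{a}s?
REFERENCE = re.compile(r"\?([^?]+)\?")


def to_prepared(cql):
    """Return (cql_with_bind_markers, reference_expressions_in_order)."""
    refs = []

    def _swap(match):
        # Remember what the marker referred to, then leave a plain "?" behind.
        refs.append(match.group(1).strip())
        return "?"

    return REFERENCE.sub(_swap, cql), refs


if __name__ == "__main__":
    statement, refs = to_prepared(
        "SELECT user_id, profile_id FROM user WHERE user_id = ?user_id?;"
    )
    print(statement)
    print(refs)
```

The rewritten statement can then be prepared once, and the ordered reference list tells the runner which value to bind to each marker on every execution.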
  21. 21. 00:00 (mm:ss)
  22. 22. NDBench CQL Plugin Perf Test Profile - 2 Query Groups
  23. 23. NDBench CQL Plugin Perf Test Profile - Select source
  24. 24. NDBench CQL Plugin Perf Test Profile - Source Precedence
  25. 25. Summary - Ergonomic Perf Test Profile & Comprehensive Validation: • Total traffic % of query groups must add up to 100 • Supports a different consistency level for each statement • Columns in a CQL statement are inferred, and available from the parameter source • Parameter sources: table, previous query results, SELECT statement • Supports a large number of parameters to perf test CQL queries
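The first rule on the slide above (traffic percentages of all query groups must add up to 100) is easy to check before a test starts. The sketch below is a hypothetical validator, not part of NDBench; the function name and the dictionary shape are assumptions, as is the restriction to positive whole-number percentages.

```python
def validate_traffic(groups):
    """Check a {query_group_id: percent} mapping before running a test.

    Assumes percentages are positive whole numbers, as in the 70% / 30% example.
    """
    for group_id, percent in groups.items():
        if not isinstance(percent, int) or percent <= 0:
            raise ValueError(f"group {group_id}: percent must be a positive integer")
    total = sum(groups.values())
    if total != 100:
        raise ValueError(f"traffic percentages add up to {total}, expected 100")
    return True


if __name__ == "__main__":
    print(validate_traffic({1: 70, 2: 30}))
```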
  26. 26. Run Load Test Spinnaker Pipeline
  27. 27. Run Load Test Spinnaker Pipeline
  28. 28. Run Load Test Spinnaker Pipeline: Manual Judgement, Test Specific Link
  29. 29. NDBenchUI-CQLPlugin: CassCQLPlugin
  30. 30. NDBenchUI-CQLPlugin: CassCQLPlugin
  31. 31. NDBenchUI-CQLPlugin: CassCQLPlugin
  32. 32. 30:00 (mm:ss): 25 min perf test profile table entry, 5 min run test
  33. 33. Run Load Test Spinnaker Pipeline: Manual Judgement, Test Specific Link
  34. 34. Dashboard
  35. 35. Dashboard - CQL Plugin Specific
  36. 36. Dashboard - Query Execution Latency Per Group
  37. 37. Testing C* Data Model For A Critical Service On 2.x & 3.x: • Test scale up to 1.2 million ops / second (1.2 billion parameter rows) • 96 i3.8xl nodes, LCS (compaction), LZ4, mostly read heavy • Found a data model bug slowly leading to wide rows • Client wrapper bugs: slow memory leak, metrics, prepared statement caching not working
  38. 38. We Would Like To Use Plugin To Test Cassandra @ Netflix: Use restores from prod data backups and definitions of CQL Perf Test Profiles, exercised by the NDBench CQL plugin and triggered by Cassandra builds
  39. 39. Under The Hood Of The CQL Plugin
  40. 40. NDBench CQL Plugin Architecture [diagram: NDBench app UI; NDBench nodes with CQL plugin (EC2 instances); perf test profile (ndb_perf_queries); test Cassandra cluster (schema & test data); run queries; record metrics]
  41. 41. High-level Architecture [diagram: NDBench UI; NDBench nodes, each with a SQLite param store; Cassandra cluster (ndb_perf_queries, schema & test data)]. Steps: 0. NDBench UI /init/ all nodes 1. Parse metadata 2. Load from user & store on node in SQLite 3. REST /start/ all nodes (randomize start) 4. Run queries with param values from SQLite & record metrics. Metadata could live on any Cassandra cluster.
  42. 42. High-level Architecture (optimized) [diagram: NDBench UI; NDBench nodes, each with a SQLite param store; Cassandra cluster (ndb_perf_queries, schema & test data)]. Steps: 0. /init/ a node 1. Parse metadata 2. If params are not on S3, load from user & store on 1 node in SQLite 3. Upload SQLite file 4. NDBench UI /init/ all nodes 5. Download SQLite file from each node 6. REST /start/ all nodes (randomize start) 7. Run queries with param values from SQLite & record metrics. Metadata could live on any Cassandra cluster.
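The architecture slides above describe a node-local SQLite store that holds query parameter values. As a rough illustration of that idea, the Python sketch below stores parameter rows in SQLite and picks one at random for a query execution. The table layout (`params` with `user_id` and `profile_id` columns) and both function names are assumptions; the talk does not show the actual schema.

```python
import random
import sqlite3


def build_param_store(rows, path=":memory:"):
    """Create a (hypothetical) parameter table and load (user_id, profile_id) rows."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE params ("
        "row_id INTEGER PRIMARY KEY, user_id TEXT, profile_id TEXT)"
    )
    conn.executemany("INSERT INTO params (user_id, profile_id) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def random_params(conn, rng):
    """Fetch one parameter row by a random row id (ids run 1..count in a fresh table)."""
    (count,) = conn.execute("SELECT COUNT(*) FROM params").fetchone()
    if count == 0:
        raise ValueError("parameter store is empty")
    row_id = rng.randint(1, count)
    return conn.execute(
        "SELECT user_id, profile_id FROM params WHERE row_id = ?", (row_id,)
    ).fetchone()


if __name__ == "__main__":
    store = build_param_store([("u1", "p1"), ("u2", "p2"), ("u3", "p3")])
    print(random_params(store, random.Random(7)))
```

Looking rows up by primary key keeps each fetch cheap even with a very large number of parameter rows, which matters when the lookup sits on the hot path of every query.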
  43. 43. Dashboard - Parameter Values Uploaded and Shared
  44. 44. Lock-free Randomized Deterministic % Query Distribution On Each Node (1). Query Group ID 1: 70%, Query Group ID 2: 30%. 100-element array: 70 1s for Query Group 1, 30 2s for Query Group 2 (1 1 1 1 1 1 1 2 2 2 2)
  45. 45. Lock-free Randomized Deterministic % Query Distribution On Each Node (2). Query Group ID 1: 70%, Query Group ID 2: 30%. Fisher-Yates shuffle; array order over time: 1 2 1 1 2 1 2 1 2 1 1 1
  46. 46. Lock-free Randomized Deterministic % Query Distribution On Each Node (3). Query Group ID 1: 70%, Query Group ID 2: 30%. Shuffled array: 1 2 1 1 2 1 2 1 2 1 1. Thread 1: ThreadLocal array index ... Thread n: ThreadLocal array index
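The three slides above outline the scheme: fill a 100-element array with group ids in proportion to their percentages, shuffle it with Fisher-Yates, and let each thread walk the array with its own index so no lock is needed. A minimal Python sketch of that idea follows. The plugin itself is not shown in the talk, so the class and function names and the explicit `seed` parameter are assumptions.

```python
import random
import threading


def build_schedule(groups, seed):
    """Expand {group_id: percent} into 100 slots, then Fisher-Yates shuffle them."""
    slots = [gid for gid, pct in groups.items() for _ in range(pct)]
    if len(slots) != 100:
        raise ValueError("percentages must add up to 100")
    rng = random.Random(seed)
    # Fisher-Yates: walk from the end, swapping each slot with a random earlier one.
    for i in range(len(slots) - 1, 0, -1):
        j = rng.randint(0, i)
        slots[i], slots[j] = slots[j], slots[i]
    return slots


class GroupPicker:
    """Hands out query group ids; each thread keeps its own position in the array."""

    def __init__(self, schedule):
        self._schedule = schedule          # never modified after construction
        self._local = threading.local()    # per-thread index, so no locking

    def next_group(self):
        index = getattr(self._local, "index", 0)
        self._local.index = (index + 1) % len(self._schedule)
        return self._schedule[index]


if __name__ == "__main__":
    picker = GroupPicker(build_schedule({1: 70, 2: 30}, seed=42))
    picks = [picker.next_group() for _ in range(100)]
    print(picks.count(1), picks.count(2))
```

Because every thread cycles through the same fixed array, each full pass of 100 calls yields exactly 70 executions of group 1 and 30 of group 2, while the shuffle keeps the order within a pass from being clustered.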
  47. 47. Data Generators And Generating Test Data: • ?:TS? is replaced by a timestamp • Add more generators (future): generation of non-collection (bigint, text, uuid, etc.) and collection types • Use generators in INSERT to generate data for a new schema
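The slide above says the `?:TS?` reference is replaced by a timestamp rather than looked up in the parameter data. One way to picture that is a small registry of generators keyed by reference name, sketched below in Python. The registry, the `resolve` function, and the choice of milliseconds since the epoch are all assumptions; the talk does not specify the timestamp unit or the plugin's internals.

```python
import time


def default_generators(clock=time.time):
    """Registry of value generators; only :TS is named in the talk.

    Milliseconds since the epoch is an assumed unit for the timestamp.
    """
    return {":TS": lambda: int(clock() * 1000)}


def resolve(reference, row, generators):
    """Return a bind value: generated for ':NAME' references, else read from the row."""
    if reference.startswith(":"):
        if reference not in generators:
            raise ValueError(f"no generator registered for {reference}")
        return generators[reference]()
    return row[reference]


if __name__ == "__main__":
    gens = default_generators()
    row = {"user_id": "u1", "profile_id": "p1"}
    print(resolve(":TS", row, gens), resolve("user_id", row, gens))
```

Passing the clock in as a parameter keeps the sketch testable; further generators could be added to the same dictionary without touching `resolve`.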
  48. 48. Wrap Up
  49. 49. Summary: • Declarative benchmarking significantly reduces overhead in iterating over schema and Cassandra config to achieve scale • Used to test and benchmark against curated data sets and perf-test-profiles • Supports all data types & LWT (beta) • Randomized deterministic percentage distribution of queries
  50. 50. Future Enhancements (Lazily): • Open source the NDBench CQL plugin (WIP) • Add more generators • Load sharded query parameter data on each NDBench node • UDT support in dynamic collections • Build support for other data stores: leverage the same philosophy & reuse code
  51. 51. End of Season 1: Q & A
