
Cassandra At Wize Commerce

A presentation given at a Cassandra meetup explaining how we used Cassandra internally at Wize Commerce to improve our object cache. I also talked about a performance evaluation we carried out before we moved to Cassandra.


  1. CASSANDRA AT WIZE COMMERCE
     Eran Chinthaka Withana, Eran.Withana@wizecommerce.com
     Cassandra Meetup (07/25/2012)
  2. About me
     • Engineer on the Platform and Infrastructure team at Wize Commerce (formerly Nextag)
     • PMC member and committer at the Apache Software Foundation – contributed to the Web services projects since 2004
     • (in a different life) PhD in Computer Science from Indiana University, Bloomington, Indiana
  3. In the next hour …
     • Wize Commerce
     • Impact of Cassandra on Wize Commerce
       – Object Cache
       – Personalized Search
     • Performance evaluation of Cassandra in a multi-data center, read/write-heavy environment
  4. WIZE COMMERCE
  5. About Wize Commerce
     • Helping companies maximize their eCommerce investments
       – across every channel, device and digital ecosystem
       – an expertise we’ve honed for years with our eCommerce customers
       – providing them with unmatched traffic and monetization services at incredible scale
  6. About Wize Commerce
     • Scale of Wize Commerce
       – We drive over $1.1 billion in annual worldwide sales
       – Our shopping network includes Nextag, guenstiger.de, FanSnap, and Calibex
       – Each week, we manage:
         • 21 million keyword searches
         • 105 million retargeted ads
         • 140 million bot crawls
         • 300 million Facebook ads
         • 700 million keywords
         • 560 million product SKUs
         • 1000s of simultaneous A/B tests
  7. CASSANDRA AT WIZE COMMERCE – CACHE
  8. Cache Architecture
  9. Cache Architecture
     • Multi-tiered read-through cache, optimized for performance
     • TTLs at upper levels to keep the data fresh
     • JMS-based infrastructure to refresh objects on demand
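The multi-tiered read-through pattern above can be sketched in a few lines. This is a minimal illustration, not the actual Wize Commerce implementation: the upper tier here is a local in-process dict, the backing store stands in for Cassandra, and the `invalidate` hook stands in for the JMS-driven on-demand refresh.

```python
import time

class ReadThroughCache:
    """Two-tier read-through cache: a TTL'd local dict in front of a
    backing store (the lower tier, Cassandra in the system described)."""

    def __init__(self, backing_store, ttl_seconds=3600):
        self.backing_store = backing_store   # any dict-like lower tier
        self.ttl = ttl_seconds
        self.local = {}                      # key -> (value, expiry)

    def get(self, key):
        entry = self.local.get(key)
        if entry is not None:
            value, expiry = entry
            if time.time() < expiry:         # fresh: serve from upper tier
                return value
            del self.local[key]              # stale: fall through
        value = self.backing_store[key]      # read through to lower tier
        self.local[key] = (value, time.time() + self.ttl)
        return value

    def invalidate(self, key):
        """On-demand refresh hook (JMS-driven in the system described)."""
        self.local.pop(key, None)
```

On a hit the upper tier answers; a miss or expired TTL falls through to the backing store and repopulates the local tier.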
  10. Cache – Expectations
      • For each object:
        – Less than 30ms 95th-percentile read latency
        – Less than 1 hour of update latency with 30M updates (phase 1, with existing components)
        – 10 minutes with the eventing system integrated
      • Fault tolerance
      • Low maintenance overheads
      • Ability to scale
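The 95th-percentile targets above are computed over measured read-latency samples. A nearest-rank percentile can be sketched as follows (an illustrative helper, not part of the monitoring system described in these slides):

```python
import math

def percentile(samples, p):
    """Nearest-rank p-th percentile: the smallest sample with at least
    p% of all samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))  # 1-based rank
    return ordered[rank - 1]
```

With 100 latency samples of 1..100 ms, the 95th percentile is the 95th-smallest value, 95 ms.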
  11. Cache – Cassandra Integration
  12. Cache – Cassandra Integration (DC1, DC2, DC3, DC4)
      • Replication factors set to provide the required number of copies per region
      • Consistency levels chosen to suit business requirements
      • 6 multi-data center clusters, with total nodes per cluster ranging from 24 to 32
      • In-house monitoring system for continuous monitoring and escalations
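Replication factor and consistency level interact in a simple way: QUORUM needs a majority of replicas, LOCAL_QUORUM counts only the coordinator's data center, and overlapping reads and writes (R + W > RF) yield strongly consistent results. A small sketch of that arithmetic (generic Cassandra semantics, not configuration values from these slides):

```python
def quorum(replication_factor):
    """Replicas that must respond for a QUORUM read or write."""
    return replication_factor // 2 + 1

def local_quorum(rf_per_dc):
    """LOCAL_QUORUM: a quorum among replicas in the local data center only."""
    return quorum(rf_per_dc)

def is_strongly_consistent(rf, read_cl, write_cl):
    """Reads see the latest write when read + write replica counts overlap."""
    return read_cl + write_cl > rf
```

With RF = 3 per data center, LOCAL_QUORUM touches 2 replicas, so QUORUM reads plus QUORUM writes (2 + 2 > 3) overlap.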
  13. Cache – Cassandra Integration
      • Clients
        – Hector with the DynamicLoadBalancing policy
        – Started experimenting with Astyanax
      • Maintenance
        – Weekly repair and compaction tasks
      • Monitoring
        – System health monitoring
        – End-to-end latency
        – Update latency
  14. Cache – Cassandra Integration
      • Ring output of a cluster:
        Address      DC    Rack   Status  State   Load       Owns     Token
                                                                      148873535527910577765226390751398592512
        xx.xx.xx.79  DC1   RAC1   Up      Normal  90.19 GB   12.50%   0
        xx.xx.xx.75  DC2   RAC1   Up      Normal  51.15 GB   0.00%    1
        xx.xx.xx.75  DC3   RAC1   Up      Normal  126.62 GB  0.00%    2
        xx.xx.xx.80  DC1   RAC1   Up      Normal  88.57 GB   12.50%   21267647932558653966460912964485513216
        xx.xx.xx.81  DC1   RAC1   Up      Normal  89.82 GB   12.50%   42535295865117307932921825928971026432
        xx.xx.xx.76  DC2   RAC1   Up      Normal  51.1 GB    0.00%    42535295865117307932921825928971026433
        xx.xx.xx.76  DC3   RAC1   Up      Normal  124.49 GB  0.00%    42535295865117307932921825928971026434
        xx.xx.xx.82  DC1   RAC1   Up      Normal  85.78 GB   12.50%   63802943797675961899382738893456539648
        xx.xx.xx.83  DC1   RAC1   Up      Normal  84.34 GB   12.50%   85070591730234615865843651857942052864
        xx.xx.xx.77  DC2   RAC1   Up      Normal  49.34 GB   0.00%    85070591730234615865843651857942052865
        xx.xx.xx.77  DC3   RAC1   Up      Normal  123.54 GB  0.00%    85070591730234615865843651857942052866
        xx.xx.xx.84  DC1   RAC1   Up      Normal  82.94 GB   12.50%   106338239662793269832304564822427566080
        xx.xx.xx.85  DC1   RAC1   Up      Normal  83.1 GB    12.50%   127605887595351923798765477786913079296
        xx.xx.xx.78  DC2   RAC1   Up      Normal  47.98 GB   0.00%    127605887595351923798765477786913079297
        xx.xx.xx.78  DC3   RAC1   Up      Normal  121.25 GB  0.00%    127605887595351923798765477786913079298
        xx.xx.xx.86  DC1   RAC1   Up      Normal  83.41 GB   12.50%   148873535527910577765226390751398592512
  15. Cache – Cassandra Integration
      • Column family stats of a cluster:
        Keyspace: XXXX
          Read Count: 37060467
          Read Latency: 3.0589244618800944 ms.
          Write Count: 37013052
          Write Latency: 0.05114632081677566 ms.
          Pending Tasks: 0
          Column Family: YYY
            SSTable count: 11
            Space used (live): 71463479840
            Space used (total): 71463479840
            Number of Keys (estimate): 66231424
            Memtable Columns Count: 314964
            Memtable Data Size: 68140546
            Memtable Switch Count: 628
            Read Count: 37060467
            Read Latency: 3.138 ms.
            Write Count: 37013052
            Write Latency: 0.058 ms.
            Pending Tasks: 0
            Bloom Filter False Positives: 10653
            Bloom Filter False Ratio: 0.01611
            Bloom Filter Space Used: 173770024
            Key cache capacity: 60000000
            Key cache size: 13309399
            Key cache hit rate: 0.9210111414757199
            Row cache: disabled
            Compacted row minimum size: 925
            Compacted row maximum size: 8239
            Compacted row mean size: 2488
  16. Cache – Cassandra Datastore Performance
      • 3-6ms average read latency across all objects in all data centers
      • 15-20ms 95th-percentile read latency
      • 30-minute average update latency at 25M updates
      • Zero downtime, even with multiple node failures
  17. Cache – Snapshot of Live System
      • Charts: median read latency; objects scrubbed in the last 24hrs; scrubber latency
  18. Cassandra Integration – Lessons Learned
      • Try to understand the internals, read the code and find solutions on your own before filing support requests
        – Assumption: you have adventurous engineers :D
        – Use IRC channels and user lists
      • Never use RoundRobinLoadBalancingPolicy if you care about performance
        – DynamicLoadBalancingPolicy: routes based on the probability of failure of each node
      • Divide the keyspace within one data center and use the token + 1 method in the other data centers
      • Experiment with different configurations, but make sure to have a quick fallback plan
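The "token + 1" tip can be made concrete. With the RandomPartitioner's 2^127 token space, one data center divides the ring evenly, and each additional data center offsets its tokens by its index. The sketch below (the function name is illustrative) reproduces the token layout visible in the ring output shown earlier: DC1's 8 nodes own evenly spaced tokens, while DC2 and DC3 sit at those tokens + 1 and + 2.

```python
TOKEN_SPACE = 2 ** 127  # RandomPartitioner token range [0, 2**127)

def tokens_for(nodes_per_dc, dc_index):
    """Evenly divide the ring within one data center, then offset every
    other data center's tokens by its index (the "token + N" method)."""
    step = TOKEN_SPACE // nodes_per_dc
    return [i * step + dc_index for i in range(nodes_per_dc)]
```

The offset keeps tokens globally unique while letting each data center own a balanced slice of its own replica set.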
  19. Cassandra Integration – Lessons Learned
      • Compactions are crucial in a read/write-heavy environment
      • 24x7 automated monitoring and alerts
        – Read/write latencies, read misses and node status, at a minimum
      • Consistency levels are important if you expect node failures in a multi-data center environment
      • Concentrate on the key cache and forget about the row cache if you have limited resources
        – Rely on the OS file cache
  20. Cache: Future
  21. Cache: Future
      • Exposing the cache system through an SOA-based infrastructure
        – Thrift services fronting all cache accesses
      • Event-based updates
        – Event-based pipeline for changes to the system of record
        – Based on Storm (Twitter)
      • Getting rid of Memcached
  22. CASSANDRA AT WIZE COMMERCE – PERSONALIZED SEARCH
  23. Personalized Search
      • Aggregates user data from multiple data sources, e.g. site search, banner clicks
      • Uses a statistical model to re-rank search results tailored to the user
      • Decomposes user information into model variables: brand preference, merchant preference, product category preference, etc.
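A re-ranking of this kind can be sketched as a blend of the base relevance score with per-user preference scores over the model variables. Everything below (field names, the linear scoring, the blend weight) is an illustrative guess at the shape of such a model, not the actual Wize Commerce model:

```python
def rerank(results, user_prefs, weight=0.5):
    """Re-rank search results by blending each item's base relevance
    score with the user's preference for its brand, merchant, and
    category. `user_prefs` maps variable -> {value: preference score}."""
    def personalized_score(item):
        pref = sum(user_prefs.get(var, {}).get(item.get(var), 0.0)
                   for var in ("brand", "merchant", "category"))
        return (1 - weight) * item["score"] + weight * pref
    return sorted(results, key=personalized_score, reverse=True)
```

A user with a strong brand preference can thus see a slightly lower-relevance item promoted above a generic best match.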
  24. Personalized Search – Cassandra Integration
      • Serves 30-40 million banner ad impressions daily
      • Before: relied on the user cookie (stores up to 4 weeks of data)
      • After: uses the user cookie for today's data, combined with the Cassandra data store to keep up to 3 months of data
  25. PERFORMANCE EVALUATION OF APACHE CASSANDRA IN A MULTI-DATA CENTER, READ/WRITE-HEAVY ENVIRONMENT
  26. Objectives
      • Understand the limitations of Cassandra when deployed in a multi-data center environment
      • Find the best set of parameters that can be tuned to improve performance
      • Find the limits of the Cassandra cluster for each version
  27. Objectives
      • Understand its scalability characteristics with a varying number of operations per second
        – This will help us understand how much load we can serve without significant performance degradation
      • Understand the implications of node failures on its ability to efficiently serve client requests
  28. Environment Setup
      • Test Metrics
        – Operation latency for a given throughput (set in the client)
          • Average, minimum, maximum, 95th percentile
      • Test Setup
        – Versions: Apache Cassandra 0.8.6 and 1.0.1
        – Node distribution: 12 nodes distributed over three geographically distributed data centers in the US
        – Key distribution: the keyspace is divided into four in each data center, and each node is responsible for 1/4th of the keyspace
        – Replication factor: 3; each data center has a copy of the data
  29. Environment Setup
      • Hardware Setup – Dell R410
        – 2 quad-core CPUs with hyper-threading
        – 8 x 4GB RAM
        – PERC 6/i RAID controller with 4 x 450GB 15k RPM drives
        – GigE network
        – CentOS 5.7
      • Clients
        – Use the Yahoo! Cloud Serving Benchmark (YCSB)
        – Two clients in each data center, for a total of 6 clients
        – Record metrics at 10s intervals
      • Every test case is independent of the others
  30. Workload
      • Read:write ratio is 1:1
      • Thread counts: 256 from each client (total of 6 clients, 2 per data center)
        – Each client contacts Cassandra nodes only in its own data center (no cross-data-center traffic)
      • Key distribution: Zipfian
      • Record count: 100 million
      • Total operations per test case: 1 million
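The Zipfian key distribution above concentrates traffic on a small set of hot keys: key k is drawn with probability proportional to 1/k^s. A naive version of such a sampler can be sketched as follows (YCSB itself uses a scrambled Zipfian generator with skew ≈ 0.99; this unscrambled sketch only illustrates the skew):

```python
import itertools
import random

def zipfian_sampler(n_keys, skew=0.99, seed=42):
    """Return a sampler drawing key indices 1..n_keys with
    P(k) proportional to 1 / k**skew (most traffic hits small k)."""
    weights = [1.0 / (k ** skew) for k in range(1, n_keys + 1)]
    cum = list(itertools.accumulate(weights))  # precomputed cumulative weights
    rng = random.Random(seed)
    def sample():
        return rng.choices(range(1, n_keys + 1), cum_weights=cum, k=1)[0]
    return sample
```

With 1000 keys, roughly 40% of draws land on the 10 hottest keys, which is what makes caching behavior (key cache hit rate, compaction pressure) visible in this benchmark.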
  31. Workload
      • Target operations per second: varies
      • Test Data
        – Columns per row: 10
        – Compacted row minimum size: 150
        – Compacted row maximum size: 1331
        – Compacted row mean size: 736
  32. Test Cases
      • Parameters varied in each test case
        – Apache Cassandra version: 0.8.6 vs 1.0.1
        – Concurrent read and write threads in a Cassandra node
        – Number of keys cached
  33. Test Cases
      Test  Description
      1     Cassandra 0.8.6 binary as-is, with no changes (concurrent reads/writes = 32, keys cached = 200k); base case for 0.8.6
      2     Cassandra 0.8.6 with 64 concurrent reads and writes; keys cached increased to 1 million
      3     Cassandra 0.8.6 with 64 concurrent reads and 32 concurrent writes; keys cached increased to 1 million
      4     Cassandra 1.0.1 with 64 concurrent reads and writes; keys cached increased to 1 million
      5     Cassandra 1.0.1 with 64 concurrent reads and 32 concurrent writes; keys cached increased to 1 million
      6     Failure test: Cassandra 1.0.1 with 64 concurrent reads and 64 concurrent writes; keys cached increased to 1 million
      For each test case, we plot operations per second (varied from 3000 to 24000) vs. read/write latency
  34. Test Cases
      • Failure test
        – Brought down a node in the east coast data center (DC2) and ran the test varying the target operations per second
        – A node going down has three implications for latency:
          1) Our test clients time out after 300 retries to connect to the failed node
          2) Our nodes in DC2 will go to DC3 to serve data that is not available in DC2 due to the node failure
          3) Our nodes in DC3 will have requests coming from the nodes of DC2, putting more load on them
  35. Results – Test Case 1: Varying OPS with Cassandra 0.8.6, Default Configuration
      • Read latency of the default configuration climbs beyond 25ms after 3000 OPS
  36. Results – Test Case 1: Varying OPS with Cassandra 0.8.6, Default Configuration
      • Even though write performance stays almost constant, the poor read performance would be a concern with this configuration
  37. Results – Test Case 2: Varying OPS with Cassandra 0.8.6 and Custom Configuration (64 concurrent reads/writes, 1 million keys cached)
  38. Results – Test Case 2: Varying OPS with Cassandra 0.8.6 and Custom Configuration (64 concurrent reads/writes, 1 million keys cached)
      • Even with good write performance, read latency beyond 12000 OPS exceeds our threshold of 25ms
  39. Results – Test Case 3: Varying OPS with Cassandra 0.8.6 and Custom Configuration (64 concurrent writes, 32 concurrent reads, 1 million keys cached)
      • Read latency goes beyond 25ms after reaching 18000 OPS
  40. Results – Test Case 3: Varying OPS with Cassandra 0.8.6 and Custom Configuration (64 concurrent writes, 32 concurrent reads, 1 million keys cached)
      • Better and more consistent write performance
  41. Results – Test Case 4: Varying OPS with Cassandra 1.0.1 and Custom Configuration (64 concurrent reads/writes, 1 million keys cached)
      • Read performance improved significantly; even at 24000 OPS it stayed well below the 10ms range
  42. Results – Test Case 4: Varying OPS with Cassandra 1.0.1 and Custom Configuration (64 concurrent reads/writes, 1 million keys cached)
      • Better and more consistent write performance
  43. Results – Test Case 5: Varying OPS with Cassandra 1.0.1 and Custom Configuration (64 concurrent writes, 32 concurrent reads, 1 million keys cached)
      • A degradation of read performance compared to test case 4
      • Read latency goes beyond 25ms after reaching 21000 OPS
  44. Results – Test Case 5: Varying OPS with Cassandra 1.0.1 and Custom Configuration (64 concurrent writes, 32 concurrent reads, 1 million keys cached)
      • A degradation of read performance compared to test case 4
      • Read latency goes beyond 25ms after reaching 21000 OPS
  45. Results – Test Case 6: Failure Test – Varying OPS with Cassandra 1.0.1 and Custom Configuration (64 concurrent reads/writes, 1 million keys cached)
      • A node going down has three implications for latency:
        1) Our test clients time out after 300 retries to connect to the failed node
        2) Our nodes in DC2 will go to DC3 to serve data that is not available in DC2 due to the node failure
        3) Our nodes in DC3 will have requests coming from the nodes of DC2, putting more load on them
  46. Results – Test Case 6: Failure Test – Varying OPS with Cassandra 1.0.1 and Custom Configuration (64 concurrent reads/writes, 1 million keys cached) – DC2
  47. Results – Test Case 6: Failure Test – Varying OPS with Cassandra 1.0.1 and Custom Configuration (64 concurrent reads/writes, 1 million keys cached) – DC2
  48. Results – Test Case 6: Failure Test – Varying OPS with Cassandra 1.0.1 and Custom Configuration (64 concurrent reads/writes, 1 million keys cached) – DC3
      • Average latency increased in both DC2 and DC3, but even with the node failure it stayed below 25ms
  49. Results – Test Case 6: Failure Test – Varying OPS with Cassandra 1.0.1 and Custom Configuration (64 concurrent reads/writes, 1 million keys cached) – DC3
      • Average latency increased in both DC2 and DC3, but even with the node failure it stayed below 25ms
  50. Comparisons – Cassandra 0.8.6
  51. Comparisons – Cassandra 0.8.6
  52. Comparisons – Cassandra 1.0.1
      • 64 concurrent reads and writes with 1 million keys cached performed significantly better than the other configurations in terms of read performance
  53. Comparisons – Cassandra 1.0.1
  54. Comparisons – Cassandra 1.0.1 vs 0.8.6: Average Read Performance
      • 64 concurrent reads and writes with 1 million keys cached performed significantly better than the other configurations in terms of read performance
  55. Comparisons – Cassandra 1.0.1 vs 0.8.6: 95th-Percentile Read Performance
      • 64 concurrent reads and writes with 1 million keys cached performed significantly better than the other configurations in terms of read performance
  56. Performance Evaluation: Conclusions
      • With Cassandra 1.0.1, 64 concurrent reads and writes, and 1 million keys cached, we could serve 24000 operations per second under 15ms
      • Node failure tests show that with this configuration we can serve a high load on the cluster with latency under 25ms
      • Even the 95th- and 99th-percentile latency numbers for this configuration are well within our expected limits
  57. Excited about the work? We’re hiring!
  58. Thank you!
  59. Questions? (Presentation is available at http://goo.gl/Ba9o4)
