Cassandra At Wize Commerce

I gave a presentation at a Cassandra meetup explaining how we use Cassandra internally at Wize Commerce to improve our object cache. I also talked about a performance evaluation we carried out before we moved to Cassandra.

  • Meeting Notes (7/25/12 16:58): Datastore will be Cassandra
  • Meeting Notes (7/25/12 16:58): DC2 and DC3 are in the same region
  • Meeting Notes (7/25/12 16:58): Read latency, key cache hit rate, write latency
  • Meeting Notes (7/25/12 17:02): Astyanax

    1. CASSANDRA AT WIZE COMMERCE
       Eran Chinthaka Withana, Eran.Withana@wizecommerce.com
       CASSANDRA AT Wize Commerce – Eran Chinthaka Withana. Cassandra Meetup (07/25/2012)
    2. About me
       • Engineer in the Platform and Infrastructure team at Wize Commerce (formerly Nextag)
       • Member, PMC member, and committer at the Apache Software Foundation
         – Contributing to the Web services project since 2004
       • (in a different life) PhD in Computer Science from Indiana University, Bloomington, Indiana
       • Today
    3. In the next hour …
       • Wize Commerce
       • Impact of Cassandra on Wize Commerce
         – Object Cache
         – Personalized Search
       • Performance evaluation of Cassandra in a multi-data center, read/write heavy environment
    4. WIZE COMMERCE
    5. About Wize Commerce
       • Helping companies maximize their eCommerce investments
         – across every channel, device and digital ecosystem
         – an expertise we’ve honed for years with our eCommerce customers
         – providing them with unmatched traffic and monetization services at incredible scale
    6. About Wize Commerce
       • Scale of Wize Commerce
         – We drive over $1.1 Billion in annual worldwide sales
         – Shopping Network includes Nextag, guenstiger.de, FanSnap, and Calibex
         – Each week, we manage
           • 21 Million Keyword Searches
           • 105 Million Retargeted Ads
           • 140 Million Bot Crawls
           • 300 Million Facebook Ads
           • 700 Million Keywords
           • 560 Million Product SKUs
           • 1000s of Simultaneous A/B Tests
    7. CASSANDRA AT WIZE COMMERCE - CACHE
    8. Cache Architecture
    9. Cache Architecture
       • Multi-tiered read-through cache, optimized for performance
       • TTLs at upper levels to keep the data fresh
       • JMS-based infrastructure to refresh objects on demand
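
The multi-tiered read-through idea on slide 9 can be sketched as follows. This is a minimal illustration, not the Wize Commerce implementation: the class names `TtlTier` and `ReadThroughCache`, the injectable clock, and the `loader` callback (standing in for the backing datastore) are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlTier:
    """One cache tier: an in-memory dict whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: Optional[float],
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds          # None means entries never expire
        self.clock = clock
        self._data: Dict[str, Tuple[float, object]] = {}

    def get(self, key: str):
        entry = self._data.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.ttl is not None and self.clock() - stored_at >= self.ttl:
            del self._data[key]         # stale: drop it so the caller falls through
            return None
        return value

    def put(self, key: str, value) -> None:
        self._data[key] = (self.clock(), value)


class ReadThroughCache:
    """Checks tiers top-down; on a miss in every tier, loads from the backing store."""

    def __init__(self, tiers, loader):
        self.tiers = tiers              # ordered from fastest (upper) to slowest (lower)
        self.loader = loader            # hypothetical callback for the system of record

    def get(self, key: str):
        for depth, tier in enumerate(self.tiers):
            value = tier.get(key)
            if value is not None:
                # Repopulate the tiers above the one that answered.
                for upper in self.tiers[:depth]:
                    upper.put(key, value)
                return value
        value = self.loader(key)
        for tier in self.tiers:
            tier.put(key, value)
        return value
```

A short TTL on the upper tier bounds how long a stale object can be served, while the lower tier absorbs the misses that the expiry causes. An on-demand refresh (the JMS path on the slide) would simply call `put` on the affected tiers.
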
    10. Cache - Expectations
        • For each object
          – Less than 30ms 95th percentile read latency
          – Less than 1 hour of update latency with 30M updates (phase 1, with existing components)
          – 10 minutes with eventing system integrated
        • Fault tolerance
        • Low maintenance overheads
        • Ability to scale
    11. Cache – Cassandra Integration
    12. Cache – Cassandra Integration
        (Diagram: data centers DC1, DC2, DC3, DC4)
        • Replication factors to facilitate required number of copies per region
        • Consistency level to suit business requirements
        • 6 multi-data center clusters with total nodes per cluster ranging from 24 to 32
        • In-house monitoring system for continuous monitoring and escalations
    13. Cache – Cassandra Integration
        • Clients
          – Hector with DynamicLoadBalancingPolicy
          – Started experimenting with Astyanax
        • Maintenance
          – Weekly repair and compaction tasks
        • Monitoring
          – System health monitoring
          – End-to-end latency
          – Update latency
    14. Cache – Cassandra Integration
        • Ring output of a cluster

        Address      DC   Rack  Status  State   Load       Owns    Token
                                                                   148873535527910577765226390751398592512
        xx.xx.xx.79  DC1  RAC1  Up      Normal  90.19 GB   12.50%  0
        xx.xx.xx.75  DC2  RAC1  Up      Normal  51.15 GB   0.00%   1
        xx.xx.xx.75  DC3  RAC1  Up      Normal  126.62 GB  0.00%   2
        xx.xx.xx.80  DC1  RAC1  Up      Normal  88.57 GB   12.50%  21267647932558653966460912964485513216
        xx.xx.xx.81  DC1  RAC1  Up      Normal  89.82 GB   12.50%  42535295865117307932921825928971026432
        xx.xx.xx.76  DC2  RAC1  Up      Normal  51.1 GB    0.00%   42535295865117307932921825928971026433
        xx.xx.xx.76  DC3  RAC1  Up      Normal  124.49 GB  0.00%   42535295865117307932921825928971026434
        xx.xx.xx.82  DC1  RAC1  Up      Normal  85.78 GB   12.50%  63802943797675961899382738893456539648
        xx.xx.xx.83  DC1  RAC1  Up      Normal  84.34 GB   12.50%  85070591730234615865843651857942052864
        xx.xx.xx.77  DC2  RAC1  Up      Normal  49.34 GB   0.00%   85070591730234615865843651857942052865
        xx.xx.xx.77  DC3  RAC1  Up      Normal  123.54 GB  0.00%   85070591730234615865843651857942052866
        xx.xx.xx.84  DC1  RAC1  Up      Normal  82.94 GB   12.50%  106338239662793269832304564822427566080
        xx.xx.xx.85  DC1  RAC1  Up      Normal  83.1 GB    12.50%  127605887595351923798765477786913079296
        xx.xx.xx.78  DC2  RAC1  Up      Normal  47.98 GB   0.00%   127605887595351923798765477786913079297
        xx.xx.xx.78  DC3  RAC1  Up      Normal  121.25 GB  0.00%   127605887595351923798765477786913079298
        xx.xx.xx.86  DC1  RAC1  Up      Normal  83.41 GB   12.50%  148873535527910577765226390751398592512
    15. Cache – Cassandra Integration
        • Column family stats of a cluster

        Keyspace: XXXX
          Read Count: 37060467
          Read Latency: 3.0589244618800944 ms.
          Write Count: 37013052
          Write Latency: 0.05114632081677566 ms.
          Pending Tasks: 0
            Column Family: YYY
            SSTable count: 11
            Space used (live): 71463479840
            Space used (total): 71463479840
            Number of Keys (estimate): 66231424
            Memtable Columns Count: 314964
            Memtable Data Size: 68140546
            Memtable Switch Count: 628
            Read Count: 37060467
            Read Latency: 3.138 ms.
            Write Count: 37013052
            Write Latency: 0.058 ms.
            Pending Tasks: 0
            Bloom Filter False Postives: 10653
            Bloom Filter False Ratio: 0.01611
            Bloom Filter Space Used: 173770024
            Key cache capacity: 60000000
            Key cache size: 13309399
            Key cache hit rate: 0.9210111414757199
            Row cache: disabled
            Compacted row minimum size: 925
            Compacted row maximum size: 8239
            Compacted row mean size: 2488
    16. Cache – Cassandra Datastore Performance
        • 3-6ms average read latency across all objects in all data centers
        • 15-20ms 95th percentile read latency
        • 30 min average update latency at 25M updates
        • Zero downtime even with multiple node failures
    17. Cache – Snapshot of Live System
        (Charts: Median Read Latency; Objects Scrubbed in Last 24hrs; Scrubber Latency)
    18. Cassandra Integration – Lessons Learned
        • Try to understand the internals, read code and find solutions on your own before getting into support requests
          – Assumption: you have adventurous engineers :D
          – Use IRC channels, user lists
        • Never use RoundRobinLoadBalancingPolicy if you care about performance
          – DynamicLoadBalancingPolicy: based on the probability of failure of a node
        • Divide the keyspace within the data center and use the token + 1 method in other data centers
        • Experiment with different configurations, but make sure to have a quick fallback plan
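
The "token + 1" layout from slide 18 can be reproduced with a few lines of arithmetic. The sketch below assumes RandomPartitioner's token range of 2^127 and mirrors the node counts visible in the ring output on slide 14 (8 nodes in DC1, 4 each in DC2 and DC3); the function name `dc_tokens` is made up for the example.

```python
RING_SIZE = 2 ** 127  # token range of Cassandra's RandomPartitioner


def dc_tokens(node_count: int, dc_offset: int) -> list:
    """Evenly spaced tokens for one data center, shifted by dc_offset.

    The offset keeps tokens unique across data centers while each
    data center still divides the full ring evenly among its own nodes.
    """
    return [i * RING_SIZE // node_count + dc_offset for i in range(node_count)]


dc1 = dc_tokens(8, 0)   # first data center: unshifted tokens
dc2 = dc_tokens(4, 1)   # second data center: token + 1
dc3 = dc_tokens(4, 2)   # third data center: token + 2

for name, tokens in (("DC1", dc1), ("DC2", dc2), ("DC3", dc3)):
    for token in tokens:
        print(name, token)
```

With these inputs the computed values match the Token column of the ring output on slide 14, for example 21267647932558653966460912964485513216 for the second DC1 node and 42535295865117307932921825928971026433 for the second DC2 node.
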
    19. Cassandra Integration – Lessons Learned
        • Compactions are crucial in a read/write heavy environment
        • 24 x 7 automated monitoring and alerts
          – At a minimum: read/write latencies, read misses and node status
        • Consistency levels are important if you expect node failures in a multi-data center environment
        • Concentrate on the key cache and forget about the row cache if you have limited resources
          – Rely on the OS file cache
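
To make the point about consistency levels and node failures concrete, here is a small sketch of the quorum arithmetic Cassandra uses (a quorum is floor(replicas / 2) + 1). The replica layout is an assumption for illustration: one replica in each of three data centers, as in the evaluation setup on slide 28. The slides do not say which consistency level the production clusters use.

```python
def quorum(replicas: int) -> int:
    """Replicas that must acknowledge a quorum operation: floor(replicas / 2) + 1."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int, required: int) -> int:
    """How many replicas can be down while `required` acknowledgements are still reachable."""
    return replicas - required


# Assumed layout (hypothetical): one replica in each of three data centers.
replicas_per_dc = {"DC1": 1, "DC2": 1, "DC3": 1}
total_replicas = sum(replicas_per_dc.values())

levels = {
    "ONE": 1,
    "QUORUM": quorum(total_replicas),
    "ALL": total_replicas,
}
for name, required in levels.items():
    print(f"{name}: needs {required} of {total_replicas}, "
          f"tolerates {tolerated_failures(total_replicas, required)} replica(s) down")

for dc, rf in replicas_per_dc.items():
    print(f"LOCAL_QUORUM in {dc}: needs {quorum(rf)} of {rf} local replica(s)")
```

Under this assumed layout, QUORUM survives the loss of one replica but has to cross data centers, while LOCAL_QUORUM stays local but cannot tolerate the loss of the single local replica. That trade-off is why the choice matters when node failures are expected.
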
    20. Cache: Future
    21. Cache: Future
        • Exposing the cache system using SOA-based infrastructure
          – Thrift services enabling all cache accesses
        • Event-based updates
          – Event-based pipeline for system-of-record changes
          – Based on Storm (Twitter)
        • Getting rid of Memcached
    22. CASSANDRA AT WIZE COMMERCE – PERSONALIZED SEARCH
    23. Personalized Search
        • Aggregates user data from multiple data sources, e.g. site search, banner clicks
        • Uses a statistical model to re-rank search results tailored to the user
        • Decomposes user information into model variables: brand preference, merchant preference, product category preference, etc.
    24. Personalized Search – Cassandra Integration
        • Serves 30-40MM banner ad impressions daily
        • Before: rely on the user cookie (stores up to 4 weeks of data)
        • After: use the user cookie for today's data, combined with the Cassandra data store to keep up to 3 months of data
    25. PERFORMANCE EVALUATION OF APACHE CASSANDRA IN A MULTI-DATA CENTER, READ/WRITE HEAVY ENVIRONMENT
    26. Objectives
        • Understand the limitations of Cassandra when deployed in a multi-data center environment
        • Find the best set of parameters that can be tuned to improve performance
        • Find the limits of a Cassandra cluster for each version
    27. Objectives
        • Understand its scalability characteristics with a varying number of operations per second
          – This will help us understand how much load we can serve without significant performance degradation
        • Understand the implications of node failures on its capability to efficiently serve data to client requests
    28. Environment Setup
        • Test Metrics
          – Operation latency for a given throughput (set in the client)
            • Average, Minimum, Maximum, 95th percentile
        • Test Setup
          – Versions: Apache Cassandra 0.8.6 and 1.0.1
          – Node Distribution: 12 nodes distributed over three geographically separate data centers in the US
          – Key Distribution: the keyspace is divided into four in each data center, and each node is responsible for 1/4th of the keyspace
          – Replication Factor: 3. Each data center has a copy of the data.
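
The four test metrics on slide 28 are straightforward to compute from raw latency samples. The sketch below is illustrative only: it uses the nearest-rank definition of the 95th percentile, which may differ from how YCSB derives its own percentile figures, and the sample data is synthetic.

```python
def summarize(latencies_ms):
    """Average, minimum, maximum and nearest-rank 95th percentile of latency samples."""
    ordered = sorted(latencies_ms)
    count = len(ordered)
    # Nearest-rank percentile: smallest value with at least 95% of samples at or below it.
    rank = (95 * count + 99) // 100      # integer ceiling of 0.95 * count
    return {
        "avg": sum(ordered) / count,
        "min": ordered[0],
        "max": ordered[-1],
        "p95": ordered[rank - 1],
    }


# Synthetic samples for illustration: 1 ms, 2 ms, ..., 100 ms.
samples = list(range(1, 101))
stats = summarize(samples)
print(stats)

THRESHOLD_MS = 25                        # the read-latency threshold quoted in the results
print("p95 within threshold:", stats["p95"] < THRESHOLD_MS)
```

Reporting the 95th percentile alongside the average matters because a handful of slow cross-data-center requests can leave the average nearly unchanged while pushing the tail past the threshold.
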
    29. Environment Setup
        • Hardware Setup
          – Dell R410
            • 2 quad-core CPUs with hyper-threading
            • 8 x 4GB RAM
            • PERC 6/i RAID Controller with 4 x 450GB 15k RPM drives
            • GigE Network
            • CentOS 5.7
        • Clients
          – Uses Yahoo Cloud Serving Benchmark (YCSB)
          – Two clients in each data center, for a total of 6 clients
          – Records metrics at 10s intervals
        • Test cases are independent of one another
    30. Workload
        • Read:Write ratio is 1:1
        • Thread Counts: 256 from each client (total of 6 clients, 2 from each data center)
          – Each client contacts Cassandra nodes only in its own data center (no cross-data center traffic)
        • Key Distribution: Zipfian
        • Record Count: 100 million
        • Total Operations Per Test Case: 1 million
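
A Zipfian key distribution means a small set of "hot" keys receives most of the requests. The sketch below shows the effect with a plain weighted sampler. It is not YCSB's generator: the exponent of 0.99 is chosen to resemble YCSB's default constant, and the key count is scaled down from the 100 million records used in the test so the example runs quickly.

```python
import random
from collections import Counter


def zipfian_keys(key_count, samples, exponent=0.99, seed=42):
    """Draw key indices where the key of rank r is picked with weight 1 / r**exponent."""
    rng = random.Random(seed)
    weights = [1.0 / (rank ** exponent) for rank in range(1, key_count + 1)]
    return rng.choices(range(key_count), weights=weights, k=samples)


SAMPLES = 100_000
counts = Counter(zipfian_keys(key_count=1000, samples=SAMPLES))
top_10_share = sum(counts[k] for k in range(10)) / SAMPLES
print(f"share of requests hitting the 10 hottest keys: {top_10_share:.1%}")
```

This skew is why key cache sizing matters in the test cases: a cache far smaller than the record count can still absorb a large share of reads.
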
    31. Workload
        • Target Operations per Second: Varies
        • Test Data
          – Columns per row: 10
          – Compacted row minimum size: 150
          – Compacted row maximum size: 1331
          – Compacted row mean size: 736
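
The workload numbers allow a rough estimate of how much data each node holds. This is back-of-the-envelope arithmetic under stated assumptions: the compacted row sizes are taken to be in bytes, each of the three data centers holds one full copy spread evenly over its four nodes (slide 28), and on-disk overheads such as indexes, bloom filters and uncompacted SSTables are ignored.

```python
RECORD_COUNT = 100_000_000   # records loaded for the test
MEAN_ROW_BYTES = 736         # compacted row mean size, assumed to be bytes
DATA_CENTERS = 3             # one full copy per data center
NODES_PER_DC = 4             # 12 nodes over three data centers

bytes_per_copy = RECORD_COUNT * MEAN_ROW_BYTES
bytes_per_node = bytes_per_copy // NODES_PER_DC
bytes_total = bytes_per_copy * DATA_CENTERS

GB = 10 ** 9
print(f"one copy of the data: {bytes_per_copy / GB:.1f} GB")
print(f"per node:             {bytes_per_node / GB:.1f} GB")
print(f"whole cluster:        {bytes_total / GB:.1f} GB")
```
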
    32. Test Cases
        • Parameters varied in each test case
          – Apache Cassandra version: 0.8.6 vs 1.0.1
          – Concurrent read and write threads in a Cassandra node
          – Number of keys cached
    33. Test Cases

        Test Number  Description
        1            Cassandra 0.8.6 binary as is, with no changes (concurrent reads/writes = 32 and keys cached = 200k). Base case for 0.8.6.
        2            Cassandra 0.8.6 with 64 concurrent reads and writes. Keys cached increased to 1 million.
        3            Cassandra 0.8.6 with 64 concurrent reads and 32 concurrent writes. Keys cached increased to 1 million.
        4            Cassandra 1.0.1 with 64 concurrent reads and writes. Keys cached increased to 1 million.
        5            Cassandra 1.0.1 with 64 concurrent reads and 32 concurrent writes. Keys cached increased to 1 million.
        6            Failure Test: Cassandra 1.0.1 with 64 concurrent reads and 64 concurrent writes. Keys cached increased to 1 million.

        For each test case, we plot operations per second (varied from 3000 to 24000) vs read/write latency.
    34. Test Cases
        • Failure Test
          – Brought down a node in the east coast data center (DC2) and ran the test varying the
          – A node going down has three implications for latency:
            1) Our test clients time out after 300 retries to connect to the failed node.
            2) Our nodes in DC2 will go to DC3 to serve data that is not available in DC2 due to the node failure.
            3) Our nodes in DC3 will receive requests from the nodes in DC2, putting more load on them.
    35. Results
        Test Case 1: Varying OPS with Cassandra 0.8.6 Default Configuration
        • Read latency with the default configuration rises beyond 25ms after 3000 OPS.
    36. Results
        Test Case 1: Varying OPS with Cassandra 0.8.6 Default Configuration
        • Even though write performance stays almost constant, the poor read performance is a concern with this configuration.
    37. Results
        Test Case 2: Varying OPS with Cassandra 0.8.6 and Custom Configuration (64 Concurrent Reads/Writes, 1 million keys cached)
    38. Results
        Test Case 2: Varying OPS with Cassandra 0.8.6 and Custom Configuration (64 Concurrent Reads/Writes, 1 million keys cached)
        • Even with good write performance, read latency goes beyond our 25ms threshold after 12000 OPS.
    39. Results
        Test Case 3: Varying OPS with Cassandra 0.8.6 and Custom Configuration (64 Concurrent Writes, 32 Concurrent Reads and 1 million keys cached)
        • Latency goes beyond 25ms after reaching 18000 OPS.
    40. Results
        Test Case 3: Varying OPS with Cassandra 0.8.6 and Custom Configuration (64 Concurrent Writes, 32 Concurrent Reads and 1 million keys cached)
        • Better and consistent write performance.
    41. Results
        Test Case 4: Varying OPS with Cassandra 1.0.1 and Custom Configuration (64 Concurrent Reads/Writes, 1 million keys cached)
        • Read performance has improved significantly; even at 24000 OPS, latency stays well below 10ms.
    42. Results
        Test Case 4: Varying OPS with Cassandra 1.0.1 and Custom Configuration (64 Concurrent Reads/Writes, 1 million keys cached)
        • Better and consistent write performance.
    43. Results
        Test Case 5: Varying OPS with Cassandra 1.0.1 and Custom Configuration (64 Concurrent Writes, 32 Concurrent Reads, 1 million keys cached)
        • Read performance degrades compared to test case 4.
        • Latency goes beyond 25ms after reaching 21000 OPS.
    44. Results
        Test Case 5: Varying OPS with Cassandra 1.0.1 and Custom Configuration (64 Concurrent Writes, 32 Concurrent Reads, 1 million keys cached)
        • Read performance degrades compared to test case 4.
        • Latency goes beyond 25ms after reaching 21000 OPS.
    45. Results
        Test Case 6: Failure Test - Varying OPS with Cassandra 1.0.1 and Custom Configuration (64 Concurrent Reads/Writes, 1 million keys cached)
        • A node going down has three implications for latency:
          1) Our test clients time out after 300 retries to connect to the failed node.
          2) Our nodes in DC2 will go to DC3 to serve data that is not available in DC2 due to the node failure.
          3) Our nodes in DC3 will receive requests from the nodes in DC2, putting more load on them.
    46. Results
        Test Case 6: Failure Test - Varying OPS with Cassandra 1.0.1 and Custom Configuration (64 Concurrent Reads/Writes, 1 million keys cached)
        (Chart: DC2)
    47. Results
        Test Case 6: Failure Test - Varying OPS with Cassandra 1.0.1 and Custom Configuration (64 Concurrent Reads/Writes, 1 million keys cached)
        (Chart: DC2)
    48. Results
        Test Case 6: Failure Test - Varying OPS with Cassandra 1.0.1 and Custom Configuration (64 Concurrent Reads/Writes, 1 million keys cached)
        (Chart: DC3)
        • Average latency increases in both DC2 and DC3, but even with the node failure it stays below 25ms.
    49. Results
        Test Case 6: Failure Test - Varying OPS with Cassandra 1.0.1 and Custom Configuration (64 Concurrent Reads/Writes, 1 million keys cached)
        (Chart: DC3)
        • Average latency increases in both DC2 and DC3, but even with the node failure it stays below 25ms.
    50. Comparisons
        Cassandra 0.8.6
    51. Comparisons
        Cassandra 0.8.6
    52. Comparisons
        Cassandra 1.0.1
        • 64 concurrent reads and writes with 1 million keys cached performed significantly better than the other configurations in terms of read performance.
    53. Comparisons
        Cassandra 1.0.1
    54. Comparisons
        Cassandra 1.0.1 vs 0.8.6: Average Read Performance Comparison
        • 64 concurrent reads and writes with 1 million keys cached performed significantly better than the other configurations in terms of read performance.
    55. Comparisons
        Cassandra 1.0.1 vs 0.8.6: 95th Percentile Read Performance Comparison
        • 64 concurrent reads and writes with 1 million keys cached performed significantly better than the other configurations in terms of read performance.
    56. Performance Evaluation: Conclusions
        • With Cassandra 1.0.1, 64 concurrent reads and writes, and 1 million keys cached, we could serve 24000 operations per second with latency under 15ms.
        • Node failure tests show that in this configuration we can serve a higher load in the cluster with latency below 25ms.
        • Even the 95th and 99th percentile latencies for this configuration are well within our expected limits.
    57. Excited about the work? We’re hiring!
    58. Thank you!
    59. Questions? (Presentation is available at http://goo.gl/Ba9o4)
