CASSANDRA AT WIZE COMMERCE

 Eran Chinthaka Withana
 Eran.Withana@wizecommerce.com



CASSANDRA AT Wize Commerce – Eran Chinthaka Withana. Cassandra Meetup (07/25/2012)
About me


    • Engineer in Platform and Infrastructure team at Wize
      Commerce (formerly Nextag)
    • Member, PMC member, and committer at the Apache
      Software Foundation
            – Contributed to the Web Services project since 2004
    • (in a different life) PhD in Computer Science from
      Indiana University, Bloomington, Indiana

    • Today

In the next hour …


    • Wize Commerce
    • Impact of Cassandra on Wize Commerce
            – Object Cache
            – Personalized Search
    • Performance evaluation of Cassandra in a multi-data
      center, read/write heavy environment




WIZE COMMERCE




About Wize Commerce

    • Helping companies maximize their eCommerce
      investments
            – across every channel, device and digital ecosystem
            – an expertise we’ve honed for years with our eCommerce
              customers
            – providing them with unmatched traffic and monetization
              services at incredible scale




About Wize Commerce

    • Scale of Wize Commerce
            – We drive over $1.1 Billion in annual worldwide sales
            – Shopping Network includes Nextag, guenstiger.de,
              FanSnap, and Calibex
            – Each week, we manage
                   •   21 Million Keyword Searches
                   •   105 Million Retargeted Ads
                   •   140 Million Bot Crawls
                   •   300 Million Facebook Ads
                   •   700 Million Keywords
                   •   560 Million Product SKUs
                   •   1000s of Simultaneous A/B Tests

CASSANDRA AT WIZE COMMERCE - CACHE




Cache Architecture




    • Multi-tiered read-through cache, optimized for performance
    • TTLs at upper levels to keep the data fresh
    • JMS-based infrastructure to refresh objects on demand
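
A minimal Python sketch of the multi-tiered read-through idea with TTLs on the upper tiers. The class names, the `loader` callback, and the injectable clock are illustrative assumptions, not the production design described in the talk.

```python
import time
from typing import Any, Callable, Dict, List, Optional, Tuple


class Tier:
    """One cache level; entries expire after ttl_seconds (None means never)."""

    def __init__(self, name: str, ttl_seconds: Optional[float],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.name = name
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self.ttl is not None and self.clock() - stored_at > self.ttl:
            del self._data[key]  # stale entry: force a read from the tier below
            return None
        return value

    def put(self, key: str, value: Any) -> None:
        self._data[key] = (value, self.clock())


class ReadThroughCache:
    """Checks tiers top-down; on a total miss, loads from the system of record."""

    def __init__(self, tiers: List[Tier], loader: Callable[[str], Any]) -> None:
        self.tiers = tiers
        self.loader = loader

    def get(self, key: str) -> Any:
        for index, tier in enumerate(self.tiers):
            value = tier.get(key)
            if value is not None:  # note: a stored None is treated as a miss
                for upper in self.tiers[:index]:
                    upper.put(key, value)  # repopulate the faster tiers
                return value
        value = self.loader(key)
        for tier in self.tiers:
            tier.put(key, value)
        return value
```

With a short TTL on the top tier and no TTL on the bottom tier, an expired entry is refilled from the lower tier without calling the loader again.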

Cache - Expectations


    • For each object
            – Less than 30ms 95th percentile read latency
            – Less than 1 hour of update latency with 30M updates
              (phase 1, with existing components)
            – 10 minutes of update latency once the eventing system is integrated
    • Fault tolerance
    • Low maintenance overheads
    • Ability to scale
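
As an illustration of the read-latency expectation above, here is a small Python sketch that checks a 95th percentile against the 30ms target. The nearest-rank method and the function names are assumptions made for this example; the talk does not say how the percentile was computed.

```python
import math
from typing import Sequence


def percentile(samples_ms: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample that has at least
    pct percent of all samples at or below it."""
    if not samples_ms:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def meets_read_target(samples_ms: Sequence[float], target_ms: float = 30.0) -> bool:
    """True when the 95th percentile read latency is below the target."""
    return percentile(samples_ms, 95) < target_ms
```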


Cache – Cassandra Integration

    [Diagram: data centers DC1, DC2, DC3, DC4]




    • Replication factors set to keep the required number of copies per
      region
    • Consistency level chosen to suit business requirements
    • 6 multi-data center clusters, with 24 to 32 nodes per cluster
    • In-house monitoring system for continuous monitoring and
      escalations
Cache – Cassandra Integration




    • Clients
            – Hector with DynamicLoadBalancingPolicy
            – Started experimenting with Astyanax
    • Maintenance
            – Weekly repair and compaction tasks
    • Monitoring
            – System health monitoring
            – End-to-end latency
            – Update latency
Cache – Cassandra Integration


    • Ring output of a cluster
       Address      DC   Rack  Status  State   Load       Owns    Token
                                                                  148873535527910577765226390751398592512
       xx.xx.xx.79  DC1  RAC1  Up      Normal  90.19 GB   12.50%  0
       xx.xx.xx.75  DC2  RAC1  Up      Normal  51.15 GB   0.00%   1
       xx.xx.xx.75  DC3  RAC1  Up      Normal  126.62 GB  0.00%   2
       xx.xx.xx.80  DC1  RAC1  Up      Normal  88.57 GB   12.50%  21267647932558653966460912964485513216
       xx.xx.xx.81  DC1  RAC1  Up      Normal  89.82 GB   12.50%  42535295865117307932921825928971026432
       xx.xx.xx.76  DC2  RAC1  Up      Normal  51.1 GB    0.00%   42535295865117307932921825928971026433
       xx.xx.xx.76  DC3  RAC1  Up      Normal  124.49 GB  0.00%   42535295865117307932921825928971026434
       xx.xx.xx.82  DC1  RAC1  Up      Normal  85.78 GB   12.50%  63802943797675961899382738893456539648
       xx.xx.xx.83  DC1  RAC1  Up      Normal  84.34 GB   12.50%  85070591730234615865843651857942052864
       xx.xx.xx.77  DC2  RAC1  Up      Normal  49.34 GB   0.00%   85070591730234615865843651857942052865
       xx.xx.xx.77  DC3  RAC1  Up      Normal  123.54 GB  0.00%   85070591730234615865843651857942052866
       xx.xx.xx.84  DC1  RAC1  Up      Normal  82.94 GB   12.50%  106338239662793269832304564822427566080
       xx.xx.xx.85  DC1  RAC1  Up      Normal  83.1 GB    12.50%  127605887595351923798765477786913079296
       xx.xx.xx.78  DC2  RAC1  Up      Normal  47.98 GB   0.00%   127605887595351923798765477786913079297
       xx.xx.xx.78  DC3  RAC1  Up      Normal  121.25 GB  0.00%   127605887595351923798765477786913079298
       xx.xx.xx.86  DC1  RAC1  Up      Normal  83.41 GB   12.50%  148873535527910577765226390751398592512
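
The tokens in the ring output above are consistent with dividing the token range evenly within each data center and shifting each additional data center by one. A short Python sketch of that arithmetic, assuming the RandomPartitioner token range of 2^127; the function name and the per-data-center node counts (8, 4, 4) are read off this output rather than stated in the talk.

```python
from typing import Dict, List

RING_SIZE = 2 ** 127  # assumed token range (RandomPartitioner)


def dc_tokens(node_count: int, dc_offset: int) -> List[int]:
    """Evenly spaced tokens for one data center, shifted by dc_offset
    so that no two nodes in the cluster share a token."""
    return [i * RING_SIZE // node_count + dc_offset for i in range(node_count)]


ring: Dict[str, List[int]] = {
    "DC1": dc_tokens(8, 0),  # 0, 2^124, 2 * 2^124, ...
    "DC2": dc_tokens(4, 1),  # 1, 2^125 + 1, ...
    "DC3": dc_tokens(4, 2),  # 2, 2^125 + 2, ...
}

for dc, tokens in ring.items():
    for token in tokens:
        print(dc, token)
```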




Cache – Cassandra Integration


    • Column family stats of a cluster
                             Keyspace: XXXX
                                 Read Count: 37060467
                                 Read Latency: 3.0589244618800944 ms.
                                 Write Count: 37013052
                                 Write Latency: 0.05114632081677566 ms.
                                 Pending Tasks: 0
                                     Column Family: YYY
                                     SSTable count: 11
                                     Space used (live): 71463479840
                                     Space used (total): 71463479840
                                     Number of Keys (estimate): 66231424
                                     Memtable Columns Count: 314964
                                     Memtable Data Size: 68140546
                                     Memtable Switch Count: 628
                                     Read Count: 37060467
                                     Read Latency: 3.138 ms.
                                     Write Count: 37013052
                                     Write Latency: 0.058 ms.
                                     Pending Tasks: 0
                                     Bloom Filter False Postives: 10653
                                     Bloom Filter False Ratio: 0.01611
                                     Bloom Filter Space Used: 173770024
                                     Key cache capacity: 60000000
                                     Key cache size: 13309399
                                     Key cache hit rate: 0.9210111414757199
                                     Row cache: disabled
                                     Compacted row minimum size: 925
                                     Compacted row maximum size: 8239
                                     Compacted row mean size: 2488

Cache – Cassandra Datastore Performance


    • 3-6ms average read latency across all objects in all
      data centers
    • 15-20ms 95th percentile read latency
    • 30-minute average update latency at 25M updates
    • Zero downtime even with multiple node failures




Cache – Snapshot of Live System



    [Dashboard snapshot panels: Median Read Latency; Objects Scrubbed in Last 24hrs; Scrubber Latency]
Cassandra Integration – Lessons Learned

    • Try to understand the internals, read the code, and find solutions
      on your own before filing support requests
            – Assumption: you have adventurous engineers :D
            – Use IRC channels, user lists
    • Never use RoundRobinLoadBalancingPolicy if you care about
      performance
            – DynamicLoadBalancingPolicy: based on the probability of failure
              of a node
    • Divide the keyspace within each data center and use the token + 1
      method in the other data centers
    • Experiment with different configurations, but make sure to have a
      quick fallback plan
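
To illustrate why a score-based policy can beat round robin, here is a simplified Python sketch. It is not Hector's implementation: the latency-based moving-average score, the class name, and the `alpha` parameter are assumptions chosen for the example.

```python
import itertools
from typing import Dict, Iterable, Iterator


def round_robin(hosts: Iterable[str]) -> Iterator[str]:
    """Cycles through hosts regardless of how each one is performing."""
    return itertools.cycle(list(hosts))


class ScoreBasedBalancer:
    """Picks the host with the lowest exponentially weighted latency score."""

    def __init__(self, hosts: Iterable[str], alpha: float = 0.5) -> None:
        if not 0 < alpha <= 1:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.scores: Dict[str, float] = {host: 0.0 for host in hosts}

    def record(self, host: str, latency_ms: float) -> None:
        """Blend the newest observation into the host's running score."""
        previous = self.scores[host]
        self.scores[host] = self.alpha * latency_ms + (1 - self.alpha) * previous

    def pick(self) -> str:
        """Return the host that currently looks healthiest."""
        return min(self.scores, key=self.scores.__getitem__)
```

Round robin keeps sending a fixed share of requests to a slow or failing node, while the score-based picker moves traffic away from it as soon as its observed latency rises.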


Cassandra Integration – Lessons Learned

    • Compactions are crucial in a read/write heavy
      environment
    • 24 x 7 automated monitoring and alerts
            – At a minimum: read/write latencies, read misses and node status
    • Consistency levels are important if you expect node
      failures in a multi-data center environment
    • Concentrate on the key cache and forget about the row
      cache if you have limited resources
            – Rely on the OS file cache
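
The point about consistency levels and node failures can be made concrete with a small Python sketch that checks whether a level is still satisfiable for one key, given the replicas that remain alive in each data center. The function names and the dictionary layout are assumptions for illustration; only the quorum rule (more than half of the replicas) is standard.

```python
from typing import Dict


def quorum(replicas: int) -> int:
    """Smallest majority of a replica set."""
    return replicas // 2 + 1


def satisfiable(level: str, rf_per_dc: Dict[str, int],
                live_per_dc: Dict[str, int], local_dc: str) -> bool:
    """Can a request at this consistency level succeed for one key?"""
    if level == "ONE":
        return sum(live_per_dc.values()) >= 1
    if level == "QUORUM":
        return sum(live_per_dc.values()) >= quorum(sum(rf_per_dc.values()))
    if level == "LOCAL_QUORUM":
        return live_per_dc[local_dc] >= quorum(rf_per_dc[local_dc])
    if level == "EACH_QUORUM":
        return all(live_per_dc[dc] >= quorum(rf) for dc, rf in rf_per_dc.items())
    raise ValueError("unsupported consistency level: " + level)


# One copy per data center; the DC2 replica of this key is down.
rf = {"DC1": 1, "DC2": 1, "DC3": 1}
live = {"DC1": 1, "DC2": 0, "DC3": 1}
for level in ("ONE", "QUORUM", "LOCAL_QUORUM", "EACH_QUORUM"):
    print(level, satisfiable(level, rf, live, local_dc="DC2"))
```

With a single copy per data center, losing the local replica makes LOCAL_QUORUM unsatisfiable for the affected keys, while ONE can still be served from another data center at the cost of cross-data-center latency.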




Cache: Future


    • Exposing the cache system through an SOA-based
      infrastructure
            – Thrift services enabling all cache accesses
    • Event-based updates
            – Event-based pipeline for system-of-record changes
            – Based on Storm (Twitter)
    • Getting rid of Memcached




CASSANDRA AT WIZE COMMERCE – PERSONALIZED SEARCH




Personalized Search


    • Aggregates user data from multiple data sources, e.g.
      site search, banner clicks.
    • Uses a statistical model to re-rank search results,
      tailoring them to the user.
    • Decomposes user information into model variables:
      brand preference, merchant preference, product
      category preference, etc.




Personalized Search – Cassandra Integration


    • Serves 30-40 million banner ad impressions daily
    • Before: relied on the user cookie (stores up to 4 weeks
      of data)
    • After: use the user cookie for today's data, combined
      with the Cassandra data store to keep up to 3 months
      of data




PERFORMANCE EVALUATION OF APACHE CASSANDRA IN A MULTI-DATA CENTER, READ/WRITE HEAVY ENVIRONMENT



Objectives



    • Understand the limitations of Cassandra when
      deployed in a multi-data center environment
    • Find the best set of parameters to tune to
      improve performance
    • Find the limits of a Cassandra cluster for
      each version


Objectives



    • Understand its scalability characteristics with a
      varying number of operations per second
            – This will help us understand how much load
              we can serve without significant
              performance degradation
    • Understand the implications of node failures
      for its ability to serve data efficiently to
      client requests

Environment Setup


    • Test Metrics
            – Operation Latency for a given throughput (set in the client)
                   • Average, Minimum, Maximum, 95th percentile
    • Test Setup
            – Versions: Apache Cassandra 0.8.6 and 1.0.1
            – Node Distribution: 12 nodes spread over three
              geographically distributed data centers in the US
            – Key Distribution: the keyspace is divided into four in each
              data center, so each node is responsible for 1/4 of
              the keyspace
            – Replication Factor: 3. Each data center has a copy of the
              data.
Environment Setup


    • Hardware Setup
            – Dell R410
                   •   2 quad-core CPUs with hyper-threading
                   •   8 x 4GB RAM
                   •   PERC 6/i RAID Controller with 4 x 450GB 15k RPM drives
                   •   GigE Network
                   •   CentOS 5.7
    • Clients
            – Use the Yahoo! Cloud Serving Benchmark (YCSB)
            – Two clients in each data center, for a total of 6 clients
            – Record metrics at 10s intervals
    • Test cases are independent of each other
Workload

    • Read:Write ratio is 1:1.
    • Thread Counts: 256 from each client (total of 6 clients, 2 from
      each data center)
            – Each client contacts Cassandra nodes only in its own data center
              (no cross data center traffic)
    • Key Distribution: Zipfian
    • Record Count: 100 million
    • Total Operations Per Test Case: 1 million
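
For readers unfamiliar with a Zipfian key distribution, the following Python sketch draws keys so that a few keys receive most of the traffic. It is a generic sampler, not YCSB's generator: the class name, the exponent value, and the seed are assumptions, and the precomputed table makes it suitable only for small key counts, not the 100 million records used in the tests.

```python
import bisect
import itertools
import random
from typing import List


class ZipfSampler:
    """Draws key indexes in [0, item_count) with probability proportional
    to 1 / rank**exponent, where rank = index + 1."""

    def __init__(self, item_count: int, exponent: float = 0.99, seed: int = 0) -> None:
        if item_count < 1:
            raise ValueError("item_count must be positive")
        weights = [1.0 / (rank ** exponent) for rank in range(1, item_count + 1)]
        self._cumulative: List[float] = list(itertools.accumulate(weights))
        self._rng = random.Random(seed)

    def next_key(self) -> int:
        # Pick a point on the cumulative weight line and find its bucket.
        point = self._rng.random() * self._cumulative[-1]
        index = bisect.bisect_right(self._cumulative, point)
        return min(index, len(self._cumulative) - 1)


sampler = ZipfSampler(item_count=100)
keys = [sampler.next_key() for _ in range(10_000)]
print("hits on hottest key:", keys.count(0))
print("hits on coldest key:", keys.count(99))
```

A skewed distribution like this matters for the results because hot keys are far more likely to be found in the key cache than they would be under uniform access.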




Workload


    • Target Operations per Second: Varies
    • Test Data
            –   Columns per row: 10
            –   Compacted row minimum size: 150
            –   Compacted row maximum size: 1331
            –   Compacted row mean size: 736




Test Cases


    • Parameters varied in each test case
            – Apache Cassandra version: 0.8.6 vs 1.0.1
            – Concurrent read and write threads in a Cassandra node
            – Number of keys cached




Test Cases

       Test Number   Description
       1             Cassandra 0.8.6 binary as is, with no changes (concurrent reads/writes = 32,
                     keys cached = 200k). Base case for 0.8.6.
       2             Cassandra 0.8.6 with 64 concurrent reads and writes. Keys cached
                     increased to 1 million.
       3             Cassandra 0.8.6 with 64 concurrent reads and 32 concurrent writes. Keys
                     cached increased to 1 million.
       4             Cassandra 1.0.1 with 64 concurrent reads and writes. Keys cached
                     increased to 1 million.
       5             Cassandra 1.0.1 with 64 concurrent reads and 32 concurrent writes. Keys
                     cached increased to 1 million.
       6             Failure test: Cassandra 1.0.1 with 64 concurrent reads and 64 concurrent
                     writes. Keys cached increased to 1 million.


         For each test case, we plot read/write latency against operations per second
         (varied from 3000 to 24000)
Test Cases


    • Failure Test
           • Brought down a node in the east coast data center (DC2) and ran the test
             varying the operations per second
           • A node going down has three implications for latency:
               • Our test clients time out after 300 retries when connecting to the
                 failed node
               • Our nodes in DC2 go to DC3 to serve data that is not
                 available in DC2 due to the node failure
               • Our nodes in DC3 receive requests from the nodes of
                 DC2, putting more load on them




Results

    Test Case 1: Varying OPS with Cassandra 0.8.6 Default Configuration




    • Read latency with the default configuration rises beyond 25ms after 3000 OPS.
    • Although write performance stays almost constant, the poor read
      performance is a concern with this configuration.
Results
    Test Case 2: Varying OPS with Cassandra 0.8.6 and Custom Configuration (64
    Concurrent Reads/Writes, 1 million keys cached)




    • Even with good write performance, read latency goes beyond our 25ms
      threshold after 12000 OPS
Results
    Test Case 3: Varying OPS with Cassandra 0.8.6 and Custom Configuration (64
    Concurrent Writes, 32 Concurrent Reads and 1 million keys cached)




    • Latency goes beyond 25ms after reaching 18000 OPS
    • Better and more consistent write performance

Results
    Test Case 4: Varying OPS with Cassandra 1.0.1 and Custom Configuration (64
    Concurrent Reads/Writes, 1 million keys cached)




    • Read performance has improved significantly; even at 24000 OPS, latency has
      stayed well below 10ms.
    • Better and more consistent write performance

Results
    Test Case 5: Varying OPS with Cassandra 1.0.1 and Custom Configuration (64
    Concurrent Writes, 32 Concurrent Reads, 1 million keys cached)




    • Read performance degrades compared to test case 4
    • Latency goes beyond 25ms after reaching 21000 OPS
Results

    Test Case 6: Failure Test - Varying OPS with Cassandra 1.0.1 and
    Custom Configuration (64 Concurrent Reads/Writes, 1 million
    keys cached)


    • A node going down has three implications for
      latency:
            • Our test clients time out after 300 retries when connecting
              to the failed node
            • Our nodes in DC2 go to DC3 to serve data that is not
              available in DC2 due to the node failure
            • Our nodes in DC3 receive requests from the
              nodes of DC2, putting more load on them

    [Charts: latency vs. operations per second for DC2 and DC3]

    • Average latency increases in both the DC2 and DC3 data centers, but even
      with the node failure the latency stays below 25ms.
Comparisons

    Cassandra 0.8.6




Comparisons

    Cassandra 1.0.1




    • 64 concurrent reads and writes with 1 million keys cached performed
      significantly better than the other configurations in terms of read
      performance
Comparisons

    Cassandra 1.0.1 vs 0.8.6 Average Read Performance Comparison




    • 64 concurrent reads and writes with 1 million keys cached performed
      significantly better than the other configurations in terms of read performance
Comparisons
    Cassandra 1.0.1 vs 0.8.6 95th Percentile Read Performance Comparison




    • 64 concurrent reads and writes with 1 million keys cached performed
      significantly better than the other configurations in terms of read performance
Performance Evaluation: Conclusions


    • With Cassandra 1.0.1, 64 concurrent reads and writes,
      and 1 million keys cached, we could serve
      24000 operations per second with latency under 15ms
    • Node failure tests show that in this configuration we
      can serve higher load in the cluster with latency below
      25ms
    • Even the 95th percentile and 99th percentile latency
      numbers for this configuration are well within our
      expected limits

Excited about the work?

    We’re hiring!




Thank you !!




Questions !!
                           (Presentation is available at http://goo.gl/Ba9o4)




                                                                                     59
CASSANDRA AT Wize Commerce – Eran Chinthaka Withana. Cassandra Meetup (07/25/2012)

Cassandra At Wize Commerce

  • 1.
    CASSANDRA AT WIZECOMMERCE Eran Chinthaka Withana Eran.Withana@wizecommerce.com CASSANDRA AT Wize Commerce – Eran Chinthaka Withana. Cassandra Meetup (07/25/2012)
  • 2.
    About me • Engineer in Platform and Infrastructure team at Wize Commerce (formerly Nextag) • Member, PMC Member and a committer of Apache Software Foundation – Contributed to Web services project since 2004 • (in a different life) PhD in Computer Science from Indiana University, Bloomington, Indiana • Today 2 CASSANDRA AT Wize Commerce – Eran Chinthaka Withana. Cassandra Meetup (07/25/2012)
  • 3.
    In the nexthour … • Wize Commerce • Impact of Cassandra on Wize Commerce – Object Cache – Personalized Search • Performance evaluation of Cassandra in a multi-data center and a read/write heavy environment 3 CASSANDRA AT Wize Commerce – Eran Chinthaka Withana. Cassandra Meetup (07/25/2012)
  • 4.
    WIZE COMMERCE CASSANDRA ATWize Commerce – Eran Chinthaka Withana. Cassandra Meetup (07/25/2012)
  • 5.
    About Wize Commerce • Helping companies maximize their eCommerce investments – across every channel, device and digital ecosystem – an expertise we’ve honed for years with our eCommerce customers – providing them with unmatched traffic and monetization services at incredible scale 5 CASSANDRA AT Wize Commerce – Eran Chinthaka Withana. Cassandra Meetup (07/25/2012)
  • 6.
    About Wize Commerce • Scale of Wize Commerce – We drive over $1.1 Billion in annual worldwide sales – Shopping Network includes Nextag, guenstiger.de, FanSnap, and Calibex – Each week, we manage • 21 Million Keyword Searches • 105 Million Retargeted Ads • 140 Million Bot Crawls • 300 Million Facebook Ads • 700 Million Keywords • 560 Million Product SKUs • 1000s of Simultaneous A/B Test 6 CASSANDRA AT Wize Commerce – Eran Chinthaka Withana. Cassandra Meetup (07/25/2012)
  • 7.
    CASSANDRA AT WIZECOMMERCE - CACHE CASSANDRA AT Wize Commerce – Eran Chinthaka Withana. Cassandra Meetup (07/25/2012)
  • 8.
    Cache Architecture CASSANDRA ATWize Commerce – Eran Chinthaka Withana. Cassandra Meetup (07/25/2012)
  • 9.
    Cache Architecture • Multi-tiered read-through cache, optimized for performance • TTLs at upper levels to keep the data fresh • JMS based infrastructure to refresh objects on-demand 9 CASSANDRA AT Wize Commerce – Eran Chinthaka Withana. Cassandra Meetup (07/25/2012)
  • 10.
    Cache - Expectations • For each object – Less than 30ms 95th percentile read latency – Less than 1-hour of update latency with 30M updates (phase 1, with existing components) – 10 minutes with eventing system integrated • Fault tolerance • Low maintenance overheads • Ability to scale 10 CASSANDRA AT Wize Commerce – Eran Chinthaka Withana. Cassandra Meetup (07/25/2012)
  • 11.
    Cache – Cassandra Integration
  • 12.
    Cache – Cassandra Integration
    [Diagram: DC1, DC2, DC3, DC4]
    • Replication factors to facilitate required number of copies per region
    • Consistency level to suit business requirements
    • 6 multi-data center clusters with total nodes per cluster ranging from 24 to 32
    • In-house monitoring system for continuous monitoring and escalations
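    To make the interplay between replication factor and consistency level concrete, here is a small sketch of how many replicas must respond for a few of Cassandra's consistency levels. The per-data-center replication factors in the usage note are made up for illustration; the slides do not state the values used in production.

```python
def quorum(replicas):
    """Majority of a replica set: floor(n / 2) + 1."""
    return replicas // 2 + 1


def replicas_required(level, rf_per_dc, local_dc):
    """Number of replica responses needed for one request.

    rf_per_dc maps a data center name to its replication factor
    (hypothetical values; not taken from the slides).
    """
    total = sum(rf_per_dc.values())
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return quorum(total)
    if level == "LOCAL_QUORUM":
        return quorum(rf_per_dc[local_dc])
    if level == "EACH_QUORUM":
        return sum(quorum(rf) for rf in rf_per_dc.values())
    if level == "ALL":
        return total
    raise ValueError("unsupported consistency level: " + level)
```

    For example, with an assumed `{"DC1": 3, "DC2": 3}`, `QUORUM` needs 4 of the 6 replicas (so it involves a remote data center), whereas `LOCAL_QUORUM` needs only 2 replicas in the local data center and still succeeds with one local node down.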
  • 13.
    Cache – Cassandra Integration
    • Clients
        – Hector with DynamicLoadBalancing policy
        – Started experimenting with Astyanax
    • Maintenance
        – Weekly repair and compaction tasks
    • Monitoring
        – System health monitoring
        – End-to-end latency
        – Update latency
  • 14.
    Cache – Cassandra Integration
    • Ring output of a cluster
        Address      DC   Rack  Status  State   Load       Owns    Token
                                                                   148873535527910577765226390751398592512
        xx.xx.xx.79  DC1  RAC1  Up      Normal  90.19 GB   12.50%  0
        xx.xx.xx.75  DC2  RAC1  Up      Normal  51.15 GB   0.00%   1
        xx.xx.xx.75  DC3  RAC1  Up      Normal  126.62 GB  0.00%   2
        xx.xx.xx.80  DC1  RAC1  Up      Normal  88.57 GB   12.50%  21267647932558653966460912964485513216
        xx.xx.xx.81  DC1  RAC1  Up      Normal  89.82 GB   12.50%  42535295865117307932921825928971026432
        xx.xx.xx.76  DC2  RAC1  Up      Normal  51.1 GB    0.00%   42535295865117307932921825928971026433
        xx.xx.xx.76  DC3  RAC1  Up      Normal  124.49 GB  0.00%   42535295865117307932921825928971026434
        xx.xx.xx.82  DC1  RAC1  Up      Normal  85.78 GB   12.50%  63802943797675961899382738893456539648
        xx.xx.xx.83  DC1  RAC1  Up      Normal  84.34 GB   12.50%  85070591730234615865843651857942052864
        xx.xx.xx.77  DC2  RAC1  Up      Normal  49.34 GB   0.00%   85070591730234615865843651857942052865
        xx.xx.xx.77  DC3  RAC1  Up      Normal  123.54 GB  0.00%   85070591730234615865843651857942052866
        xx.xx.xx.84  DC1  RAC1  Up      Normal  82.94 GB   12.50%  106338239662793269832304564822427566080
        xx.xx.xx.85  DC1  RAC1  Up      Normal  83.1 GB    12.50%  127605887595351923798765477786913079296
        xx.xx.xx.78  DC2  RAC1  Up      Normal  47.98 GB   0.00%   127605887595351923798765477786913079297
        xx.xx.xx.78  DC3  RAC1  Up      Normal  121.25 GB  0.00%   127605887595351923798765477786913079298
        xx.xx.xx.86  DC1  RAC1  Up      Normal  83.41 GB   12.50%  148873535527910577765226390751398592512
  • 15.
    Cache – Cassandra Integration
    • Column family stats of a cluster
        Keyspace: XXXX
            Read Count: 37060467
            Read Latency: 3.0589244618800944 ms.
            Write Count: 37013052
            Write Latency: 0.05114632081677566 ms.
            Pending Tasks: 0
                Column Family: YYY
                SSTable count: 11
                Space used (live): 71463479840
                Space used (total): 71463479840
                Number of Keys (estimate): 66231424
                Memtable Columns Count: 314964
                Memtable Data Size: 68140546
                Memtable Switch Count: 628
                Read Count: 37060467
                Read Latency: 3.138 ms.
                Write Count: 37013052
                Write Latency: 0.058 ms.
                Pending Tasks: 0
                Bloom Filter False Postives: 10653
                Bloom Filter False Ratio: 0.01611
                Bloom Filter Space Used: 173770024
                Key cache capacity: 60000000
                Key cache size: 13309399
                Key cache hit rate: 0.9210111414757199
                Row cache: disabled
                Compacted row minimum size: 925
                Compacted row maximum size: 8239
                Compacted row mean size: 2488
  • 16.
    Cache – Cassandra Datastore Performance
    • 3-6ms average read latency across all objects in all data centers
    • 15-20ms 95th percentile read latency
    • 30 mins average update latency at 25M updates
    • Zero downtime even with multiple node failures
  • 17.
    Cache – Snapshot of Live System
    [Charts: Median Read Latency; Objects Scrubbed in Last 24hrs; Scrubber Latency]
  • 18.
    Cassandra Integration – Lessons Learned
    • Try to understand the internals, read code and find solutions on your own before getting into support requests
        – Assumption: you have adventurous engineers :D
        – Use IRC channels, user lists
    • Never use RoundRobinLoadBalancingPolicy if you care about performance
        – DynamicLoadBalancingPolicy: based on the probability of failure of a node
    • Divide keyspace within the datacenter and use token + 1 method in other data centers
    • Experiment with different configurations but make sure to have a quick fallback plan
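    The "token + 1" method can be illustrated with a short calculation. The sketch below assumes the RandomPartitioner token space of 2^127 and evenly spaced tokens per data center, then shifts each additional data center by a small offset so that no two nodes share a token. The function name `dc_tokens` is made up for this example.

```python
RING_SIZE = 2 ** 127  # RandomPartitioner token space


def dc_tokens(node_count, offset=0):
    """Evenly spaced initial tokens for one data center.

    `offset` is 0 for the first data center, 1 for the second, and so on,
    which keeps tokens unique across data centers.
    """
    return [i * RING_SIZE // node_count + offset for i in range(node_count)]


if __name__ == "__main__":
    # Layout matching the ring output shown earlier: 8 nodes in DC1,
    # 4 nodes each in DC2 and DC3.
    for name, nodes, offset in [("DC1", 8, 0), ("DC2", 4, 1), ("DC3", 4, 2)]:
        for token in dc_tokens(nodes, offset):
            print(name, token)
```

    With 8 nodes and offset 0 this yields the DC1 tokens from the ring output (0, 21267647932558653966460912964485513216, ...), and with 4 nodes and offsets 1 and 2 it yields the DC2 and DC3 tokens.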
  • 19.
    Cassandra Integration – Lessons Learned
    • Compactions are crucial for read/write heavy environments
    • 24 x 7 automated monitoring and alerts
        – Read/write latencies, read misses and node status at least
    • Consistency levels are important if you expect node failures in a multi-data center environment
    • Concentrate on key cache and forget about row cache if you have limited resources.
        – Rely on OS file cache
  • 20.
    Cache: Future
  • 21.
    Cache: Future
    • Exposing Cache system using SOA based infrastructure
        – Thrift services enabling all cache accesses
    • Event based updates
        – Event based pipeline for changes for system-of-record
        – Based on Storm (Twitter)
    • Getting rid of Memcached
  • 22.
    CASSANDRA AT WIZE COMMERCE – PERSONALIZED SEARCH
  • 23.
    Personalized Search
    • Aggregates user data from multiple data sources, e.g. site search, banner clicks.
    • Uses statistical model to re-rank search results tailored to the user.
    • Decomposes user information into model variables: brand preference, merchant preference, product category preference, etc.
  • 24.
    Personalized Search – Cassandra Integration
    • Serves 30-40MM banner ad impressions daily
    • Before: rely on user cookie (stores up to 4 weeks data)
    • After: use user cookie for today's data, combined with Cassandra Data Store to keep up to 3 months data
  • 25.
    PERFORMANCE EVALUATION OF APACHE CASSANDRA IN A MULTI-DATA CENTER, READ/WRITE HEAVY ENVIRONMENT
  • 26.
    Objectives
    • Understand the limitations of Cassandra when deployed in a multi-data center environment
    • Find out the best set of parameters that can be used and tuned to improve the performance
    • Find out the limits of a Cassandra cluster for each version.
  • 27.
    Objectives
    • Understand its scalability characteristics with a varying number of operations per second
        – This will help us understand how much load we can serve without causing any significant performance degradation.
    • Understand the implications of node failures on its capability to efficiently serve data to client requests
  • 28.
    Environment Setup
    • Test Metrics
        – Operation Latency for a given throughput (set in the client)
            • Average, Minimum, Maximum, 95th percentile
    • Test Setup
        – Versions: Apache Cassandra 0.8.6 and 1.0.1
        – Node Distribution: 12 nodes distributed over three geographically distributed data centers in US
        – Key Distribution: Keyspace is divided into four in each data center and each node in the cluster is responsible for 1/4th of the keyspace
        – Replication Factor: 3. Each datacenter has a copy of the data.
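    The four latency metrics listed above can be computed from raw samples as in the sketch below. It uses the nearest-rank definition of a percentile, which is one common choice and an assumption here; the slides do not say how YCSB or the monitoring system derived the 95th percentile.

```python
def percentile(samples, pct):
    """Nearest-rank percentile; `pct` is an integer between 1 and 100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Ceiling of pct * n / 100 using integer arithmetic to avoid float rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def summarize(latencies_ms):
    """Average, minimum, maximum and 95th percentile of latency samples."""
    return {
        "avg": sum(latencies_ms) / len(latencies_ms),
        "min": min(latencies_ms),
        "max": max(latencies_ms),
        "p95": percentile(latencies_ms, 95),
    }
```

    The 95th percentile is usually the more useful number for a cache SLA, because a handful of slow requests barely move the average.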
  • 29.
    Environment Setup
    • Hardware Setup
        – Dell R410
            • 2 Quad-core with hyper-threading
            • 8 x 4GB RAM
            • PERC 6/i RAID Controller with 4 x 450GB 15k RPM drives
            • GigE Network
            • CentOS 5.7
    • Clients
        – Uses Yahoo Cloud Serving Benchmark (YCSB)
        – Two clients in each data-center, with a total of 6 clients
        – Records metrics at 10s intervals
    • Every test case is independent of each other
  • 30.
    Workload
    • Read:Write ratio is 1:1.
    • Thread Counts: 256 from each client (total of 6 clients, 2 from each data-center)
        – Contacts Cassandra nodes only in its own data-center (no cross data-center traffic)
    • Key Distribution: Zipfian
    • Record Count: 100 million
    • Total Operations Per Test Case: 1 million
  • 31.
    Workload
    • Target Operations per Second: Varies
    • Test Data
        – Columns per row: 10
        – Compacted row minimum size: 150
        – Compacted row maximum size: 1331
        – Compacted row mean size: 736
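    For readers who want to reproduce a similar workload, the parameters above map fairly directly onto YCSB core workload properties. The sketch below renders such a properties file. Several mappings are assumptions rather than facts from the slides: writes are modelled as updates, "columns per row" is mapped to `fieldcount`, the operation count is applied per client, and the workload class name is the one used by YCSB releases of that period.

```python
def ycsb_properties(target_ops_per_sec):
    """Render YCSB core workload properties for one client.

    `target_ops_per_sec` is the throttle applied by this client; the slides
    vary it per run but do not say how it was split across the six clients.
    """
    props = {
        "workload": "com.yahoo.ycsb.workloads.CoreWorkload",
        "recordcount": 100_000_000,        # Record Count: 100 million
        "operationcount": 1_000_000,       # assumed to apply per client
        "readproportion": 0.5,             # Read:Write ratio 1:1
        "updateproportion": 0.5,           # assumption: writes are updates
        "scanproportion": 0,
        "insertproportion": 0,
        "requestdistribution": "zipfian",  # Key Distribution: Zipfian
        "fieldcount": 10,                  # assumption: columns per row
        "threadcount": 256,                # threads per client
        "target": target_ops_per_sec,
    }
    return "\n".join(f"{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    print(ycsb_properties(3000))
```

    Connection settings (such as the list of Cassandra hosts in the client's own data center) are left out because they depend on the binding and version in use.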
  • 32.
    Test Cases
    • Parameters varied in each test case
        – Apache Cassandra version: 0.8.6 vs 1.0.1
        – Concurrent read and write threads in a Cassandra node
        – Number of keys cached
  • 33.
    Test Cases
        Test Number   Description
        1             Cassandra 0.8.6 binary as it is with no changes (concurrent reads/writes = 32 and keys cached = 200k). Base case for 0.8.6.
        2             Cassandra 0.8.6 with 64 concurrent reads and writes. Also keys cached is increased to 1 million.
        3             Cassandra 0.8.6 with 64 concurrent reads and 32 concurrent writes. Also keys cached is increased to 1 million.
        4             Cassandra 1.0.1 with 64 concurrent reads and writes. Also keys cached is increased to 1 million.
        5             Cassandra 1.0.1 with 64 concurrent reads and 32 concurrent writes. Also keys cached is increased to 1 million.
        6             Failure Test: Cassandra 1.0.1 with 64 concurrent reads and 64 concurrent writes. Also keys cached is increased to 1 million.
    For each test case, we plot operations per second (varied from 3000 to 24000) vs read/write latency
  • 34.
    Test Cases
    • Failure Test
        – Brought down a node in east coast data center (DC2) and ran the test varying the
        – Node going down has three implications on the latency.
            • Our test clients time out after 300 retries to connect to the failed node.
            • Our nodes in DC2 will go to DC3 to serve data that are not available in DC2 due to the node failure
            • Our nodes in DC3 will have requests coming from the nodes of DC2, putting more load on them
  • 35.
    Results
    Test Case 1: Varying OPS with Cassandra 0.8.6 Default Configuration
    • Read latency of the default configuration rises beyond 25ms after 3000 OPS.
  • 36.
    Results
    Test Case 1: Varying OPS with Cassandra 0.8.6 Default Configuration
    • Even though write performance stays almost constant, the poor read performance will be a concern with this configuration.
  • 37.
    Results
    Test Case 2: Varying OPS with Cassandra 0.8.6 and Custom Configuration (64 Concurrent Reads/Writes, 1 million keys cached)
  • 38.
    Results
    Test Case 2: Varying OPS with Cassandra 0.8.6 and Custom Configuration (64 Concurrent Reads/Writes, 1 million keys cached)
    • Even with good write performance, read latency goes beyond our threshold of 25ms after 12000 QPS
  • 39.
    Results
    Test Case 3: Varying OPS with Cassandra 0.8.6 and Custom Configuration (64 Concurrent Writes, 32 Concurrent Reads and 1 million keys cached)
    • latency goes beyond 25ms after reaching 18000 OPS
  • 40.
    Results
    Test Case 3: Varying OPS with Cassandra 0.8.6 and Custom Configuration (64 Concurrent Writes, 32 Concurrent Reads and 1 million keys cached)
    • better and consistent write performance
  • 41.
    Results
    Test Case 4: Varying OPS with Cassandra 1.0.1 and Custom Configuration (64 Concurrent Reads/Writes, 1 million keys cached)
    • Read performance has improved significantly; even at 24000 OPS, latency has stayed well below 10ms.
  • 42.
    Results
    Test Case 4: Varying OPS with Cassandra 1.0.1 and Custom Configuration (64 Concurrent Reads/Writes, 1 million keys cached)
    • better and consistent write performance
  • 43.
    Results
    Test Case 5: Varying OPS with Cassandra 1.0.1 and Custom Configuration (64 Concurrent Writes, 32 Concurrent Reads, 1 million keys cached)
    • a degradation of read performance compared to test case 4
    • latency goes beyond 25ms after reaching 21000 QPS.
  • 44.
    Results
    Test Case 5: Varying OPS with Cassandra 1.0.1 and Custom Configuration (64 Concurrent Writes, 32 Concurrent Reads, 1 million keys cached)
    • a degradation of read performance compared to test case 4
    • latency goes beyond 25ms after reaching 21000 QPS.
  • 45.
    Results
    Test Case 6: Failure Test - Varying OPS with Cassandra 1.0.1 and Custom Configuration (64 Concurrent Reads/Writes, 1 million keys cached)
    • Node going down has three implications on the latency.
        – Our test clients time out after 300 retries to connect to the failed node.
        – Our nodes in DC2 will go to DC3 to serve data that are not available in DC2 due to the node failure
        – Our nodes in DC3 will have requests coming from the nodes of DC2, putting more load on them
  • 46.
    Results
    Test Case 6: Failure Test - Varying OPS with Cassandra 1.0.1 and Custom Configuration (64 Concurrent Reads/Writes, 1 million keys cached)
    [Chart: DC2]
  • 47.
    Results
    Test Case 6: Failure Test - Varying QPS with Cassandra 1.0.1 and Custom Configuration (64 Concurrent Reads/Writes, 1 million keys cached)
    [Chart: DC2]
  • 48.
    Results
    Test Case 6: Failure Test - Varying QPS with Cassandra 1.0.1 and Custom Configuration (64 Concurrent Reads/Writes, 1 million keys cached)
    [Chart: DC3]
    • Average latency increases in both the DC2 and DC3 data centers, but even with the node failure the latency has stayed below 25ms.
  • 49.
    Results
    Test Case 6: Failure Test - Varying QPS with Cassandra 1.0.1 and Custom Configuration (64 Concurrent Reads/Writes, 1 million keys cached)
    [Chart: DC3]
    • Average latency increases in both the DC2 and DC3 data centers, but even with the node failure the latency has stayed below 25ms.
  • 50.
    Comparisons
    Cassandra 0.8.6
  • 51.
    Comparisons
    Cassandra 0.8.6
  • 52.
    Comparisons
    Cassandra 1.0.1
    • The configuration with 64 concurrent reads and writes and 1 million keys cached performed significantly better than the other configurations in terms of read performance
  • 53.
    Comparisons
    Cassandra 1.0.1
  • 54.
    Comparisons
    Cassandra 1.0.1 vs 0.8.6 Average Read Performance Comparison
    • The configuration with 64 concurrent reads and writes and 1 million keys cached performed significantly better than the other configurations in terms of read performance
  • 55.
    Comparisons
    Cassandra 1.0.1 vs 0.8.6 95th Percentile Read Performance Comparison
    • The configuration with 64 concurrent reads and writes and 1 million keys cached performed significantly better than the other configurations in terms of read performance
  • 56.
    Performance Evaluation: Conclusions
    • With Cassandra 1.0.1, 64 concurrent reads and writes, and 1 million keys cached, we could serve 24000 operations per second under 15ms
    • Node failure tests prove that in this configuration we can serve higher load in the cluster with less than 25ms latency
    • Even the 95th percentile and 99th percentile latency numbers for this configuration are well within our expected limits
  • 57.
    Excited about the work? We’re hiring !!
  • 58.
    Thank you !!
  • 59.
    Questions !! (Presentation is available at http://goo.gl/Ba9o4)

Editor's Notes

  • #12 ----- Meeting Notes (7/25/12 16:58) ----- Datastore will be Cassandra
  • #13 ----- Meeting Notes (7/25/12 16:58) ----- DC2 and DC3 are in the same region
  • #16 ----- Meeting Notes (7/25/12 16:58) ----- Read latency, key cache hit rate, write latency
  • #19 ----- Meeting Notes (7/25/12 17:02) ----- Astyanax