Tuning Java Driver for Apache Cassandra by Nenad Bozic at Big Data Spain 2017

Apache Cassandra is a distributed, masterless column-store database that is becoming mainstream for analytics and IoT data.

https://www.bigdataspain.org/2017/talk/tuning-java-driver-for-apache-cassandra

Big Data Spain 2017
November 16th - 17th Kinépolis Madrid


  1. 1. Tuning Java Driver for Apache Cassandra November 2017 Nenad Bozic @NenadBozicNs nenad.bozic@smartcat. SmartCat www.smartcat.io
  2. 2. When people start with Apache Cassandra
  3. 3. When people call us for help
  4. 4. Agenda • intro to Apache Cassandra • tuning options in driver • use cases • takeaways and Q&A
  5. 5. Apache Cassandra
  6. 6. Cassandra Overview • partitioned data with tunable consistency • replication factor - how many replicas • masterless architecture • native multi-datacenter support
  7. 7. Architecture Client contact
  8. 8. Architecture Client request Consistency level 1 Replication factor 3
  9. 9. Architecture Client request response Consistency level 1 Replication factor 3
  10. 10. Architecture DC1 DC2 Cluster
  11. 11. Data Modeling • query based modeling • data is denormalized • data is duplicated
  12. 12. Use Cases • when high availability is crucial, and eventual consistency is tolerable • event sourcing • logging continuous streams of data • deep visitor analytics • early prototyping with significant query changes • referential integrity required • dynamic access patterns on data
  13. 13. Tuning options in driver
  14. 14. Drivers for Apache Cassandra
  15. 15. Load balancing https://www.slideshare.net/planetcassandra/apache-cassandra-and-drivers
  16. 16. Data Center Aware Load Balancing https://www.slideshare.net/planetcassandra/apache-cassandra-and-drivers
  17. 17. Token Aware Load Balancing https://www.slideshare.net/planetcassandra/apache-cassandra-and-drivers
  18. 18. Latency Aware Load Balancing
  19. 19. Pooling options • the driver communicates with the cluster through a pool of connections • defaults changed between V2 and V3 of the protocol (core connections lowered to 1) • allowing more requests per connection can put more load on the cluster • monitor in-flight queries on the driver side and tune for your use case
  20. 20. Pooling options
  21. 21. Speculative executions • spawn additional queries to other nodes after a configured delay http://docs.datastax.com/en/developer/java-driver/3.1/manual/speculative_execution/
  22. 22. Speculative executions • constant speculative execution policy • percentile speculative execution policy
  23. 23. Timeouts • driver read timeout vs server read timeout • driver settings for all queries or per-query settings • setReadTimeoutMillis and setConnectTimeoutMillis
  24. 24. Retry policies • fail early and retry • add a retry policy or speculative execution • use a downgrading retry policy if inconsistent data is preferable to no data
  25. 25. Use cases
  26. 26. Click stream and IoT measurements • visualize measurements from many devices • fast access with tolerable inconsistencies • DC-aware and token-aware policies to land on a local node that holds the data • lower consistency level (ONE) or use a downgrading retry policy • use speculative executions to query more nodes if the cluster can handle the load
  27. 27. Mission critical data with tolerable performance • stock data in a warehouse, used to compare with an ERP system • high consistency (read + write > replication factor) • retry and reconnection policies are a must • keep the number of requests per connection low so as not to overload the cluster • set a lower read timeout to fail early and retry
  28. 28. Write heavy low latency read use case • ad serving (store user analytics and serve ads fast) • separate reads and writes so each can be tuned differently • latency-aware policy on reads to always choose fast-performing nodes • lower the read timeout on both driver and server to fail early • increase maximum requests per connection
  29. 29. Conclusion
  30. 30. Conclusion and takeaways • know your use case and know your database • each tuning option requires good monitoring • test, adjust, measure
  31. 31. Links • SmartCat Blog post - Tuning Java driver for Apache Cassandra - part 1 • SmartCat Blog post - Tuning Java driver for Apache Cassandra - part 2 • Use case example - Tuning for heavy write and low latency read scenario
  32. 32. Q&A
  33. 33. Thank you Nenad Bozic @NenadBozicNs SmartCat www.smartcat.io
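
Slide 27 gives the rule of thumb for high consistency: read + write > replication factor. The arithmetic can be sketched in plain Java as below. This is only an illustration, not driver code: the class `ConsistencyCheck` and its method names are hypothetical, and the replica counts stand in for consistency levels (1 for ONE, a majority for QUORUM).

```java
public class ConsistencyCheck {

    // Number of replicas that form a majority (QUORUM) for a given replication factor.
    public static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // True when the replicas contacted on read and on write must overlap,
    // i.e. read + write > replication factor (the rule from slide 27).
    public static boolean readsSeeLatestWrite(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3; // replication factor used in the architecture slides

        // QUORUM reads and QUORUM writes: 2 + 2 > 3
        System.out.println("QUORUM/QUORUM overlaps: " + readsSeeLatestWrite(quorum(rf), quorum(rf), rf));

        // ONE reads and ONE writes: 1 + 1 is not > 3
        System.out.println("ONE/ONE overlaps: " + readsSeeLatestWrite(1, 1, rf));
    }
}
```

With replication factor 3, QUORUM on both sides satisfies the rule, while ONE on both sides does not, which matches the trade-off between slide 26 (lower consistency for speed) and slide 27 (high consistency for mission critical data).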
