Tuning Java Driver for Apache Cassandra
November 2017
Nenad Bozic
@NenadBozicNs
nenad.bozic@smartcat.
SmartCat
www.smartcat.io
When people start with Apache Cassandra
When people call us for help
Agenda
• intro to Apache Cassandra
• tuning options in driver
• use cases
• takeaways and Q&A
Apache Cassandra
Cassandra Overview
• partitioned data with tunable consistency
• replication factor: how many replicas of each partition the cluster keeps
• masterless architecture
• native multi-datacenter support
Architecture
[Diagram: client contact]
Architecture
[Diagram: client request; consistency level 1, replication factor 3]
Architecture
[Diagram: client request and response; consistency level 1, replication factor 3]
Architecture
[Diagram: cluster spanning two datacenters, DC1 and DC2]
Data Modeling
• query based modeling
• data is denormalized
• data is duplicated
Use Cases
Good fit:
• when high availability is crucial and eventual consistency is tolerable
• event sourcing
• logging continuous streams of data
• deep visitor analytics
Poor fit:
• early prototyping with significant query changes
• referential integrity required
• dynamic access patterns on data
Tuning options in driver
Drivers for Apache Cassandra
Load balancing
https://www.slideshare.net/planetcassandra/apache-cassandra-and-drivers
Data Center Aware Load Balancing
https://www.slideshare.net/planetcassandra/apache-cassandra-and-drivers
Token Aware Load Balancing
https://www.slideshare.net/planetcassandra/apache-cassandra-and-drivers
Latency Aware Load Balancing
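The three policies named on the slides above can be nested, each wrapping a child policy. A minimal configuration sketch, assuming DataStax Java driver 3.x on the classpath; the contact point "127.0.0.1", the datacenter name "DC1" and the exclusion threshold are placeholder values, not recommendations from the talk:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.LatencyAwarePolicy;
import com.datastax.driver.core.policies.LoadBalancingPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

class LoadBalancingSketch {

    static Cluster build() {
        // Data center aware: round-robin over nodes of the local DC ("DC1" is an assumed name).
        LoadBalancingPolicy dcAware = DCAwareRoundRobinPolicy.builder()
                .withLocalDc("DC1")
                .build();

        // Latency aware: skip nodes whose average latency is more than
        // 2x that of the fastest node (placeholder threshold).
        LoadBalancingPolicy latencyAware = LatencyAwarePolicy.builder(dcAware)
                .withExclusionThreshold(2.0)
                .build();

        // Token aware: prefer replicas that own the partition being queried.
        LoadBalancingPolicy tokenAware = new TokenAwarePolicy(latencyAware);

        return Cluster.builder()
                .addContactPoint("127.0.0.1") // placeholder contact point
                .withLoadBalancingPolicy(tokenAware)
                .build();
    }
}
```

Token awareness only helps when the driver can compute the routing key, for example with prepared statements that bind the full partition key.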
Pooling options
• driver communicates with the cluster through a pool of connections per host
• defaults changed between protocol V2 and V3 (core connections per host lowered to 1)
• allowing more requests per connection can put more load on the cluster
• monitor in-flight queries on the driver side and tune for your use case
Pooling options
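As an illustration of the pooling bullets, here is a configuration sketch, assuming DataStax Java driver 3.x. The connection counts and the request limit are placeholder numbers to tune against your own measurements, and `logInFlight` is a hypothetical helper name:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Host;
import com.datastax.driver.core.HostDistance;
import com.datastax.driver.core.PoolingOptions;
import com.datastax.driver.core.Session;

class PoolingSketch {

    static Cluster build() {
        PoolingOptions pooling = new PoolingOptions()
                // core = 1, max = 2 connections per local host (placeholder values)
                .setConnectionsPerHost(HostDistance.LOCAL, 1, 2)
                // upper bound of concurrent requests on one connection (placeholder value)
                .setMaxRequestsPerConnection(HostDistance.LOCAL, 2048);

        return Cluster.builder()
                .addContactPoint("127.0.0.1") // placeholder contact point
                .withPoolingOptions(pooling)
                .build();
    }

    // Prints open connections and in-flight queries per host; call it
    // periodically or export the numbers to your monitoring system.
    static void logInFlight(Session session) {
        Session.State state = session.getState();
        for (Host host : state.getConnectedHosts()) {
            System.out.printf("%s connections=%d inFlight=%d%n",
                    host,
                    state.getOpenConnections(host),
                    state.getInFlightQueries(host));
        }
    }
}
```

If in-flight queries stay close to connections times max requests per connection, the pool is saturated; if they stay far below it, raising the limits is unlikely to help.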
Speculative executions
• spawn additional queries to other nodes after configured time
http://docs.datastax.com/en/developer/java-driver/3.1/manual/speculative_execution/
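The mechanics can be shown without the driver. The following is a plain-Java simulation of the idea, not the driver's implementation: `query` is a hypothetical stand-in for a request to one node, and `executeSpeculatively` starts a second attempt once the first has been silent for the configured delay, then returns whichever finishes first.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

class SpeculativeSketch {

    // Hypothetical stand-in for a query sent to one node with a fixed latency.
    static String query(String node, long latencyMs) {
        try {
            Thread.sleep(latencyMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "rows from " + node;
    }

    // Runs primary; if it has not answered within delayMs, also runs backup
    // and returns the first result to arrive.
    static String executeSpeculatively(Supplier<String> primary,
                                       Supplier<String> backup,
                                       long delayMs) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<String> first = CompletableFuture.supplyAsync(primary, pool);
            try {
                return first.get(delayMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException tooSlow) {
                // The first attempt keeps running; the second one races it.
                CompletableFuture<String> second = CompletableFuture.supplyAsync(backup, pool);
                return (String) CompletableFuture.anyOf(first, second).join();
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        } finally {
            pool.shutdownNow(); // abandon the losing attempt
        }
    }
}
```

With a primary node that needs 400 ms, a delay of 50 ms and a backup node that needs 10 ms, the call returns the backup's rows. Both attempts hit the cluster, which is why this only pays off when the cluster has spare capacity.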
Speculative executions
• constant speculative execution policy
• percentile speculative execution policy
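Both policies could be configured roughly as follows, assuming DataStax Java driver 3.1 or later (the percentile tracker additionally needs the HdrHistogram library on the classpath). The delay, percentile and attempt counts are placeholder values:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ClusterWidePercentileTracker;
import com.datastax.driver.core.PercentileTracker;
import com.datastax.driver.core.policies.ConstantSpeculativeExecutionPolicy;
import com.datastax.driver.core.policies.PercentileSpeculativeExecutionPolicy;
import com.datastax.driver.core.policies.SpeculativeExecutionPolicy;

class SpeculativePolicySketch {

    // Constant: start a new attempt every 200 ms, at most 2 extra attempts.
    static SpeculativeExecutionPolicy constant() {
        return new ConstantSpeculativeExecutionPolicy(200, 2);
    }

    // Percentile: start a new attempt once a request is slower than the
    // 99th percentile of latencies observed across the cluster.
    static SpeculativeExecutionPolicy percentile() {
        PercentileTracker tracker = ClusterWidePercentileTracker
                .builder(15000) // highest trackable latency in ms (placeholder)
                .build();
        return new PercentileSpeculativeExecutionPolicy(tracker, 99.0, 2);
    }

    static Cluster build(SpeculativeExecutionPolicy policy) {
        return Cluster.builder()
                .addContactPoint("127.0.0.1") // placeholder contact point
                .withSpeculativeExecutionPolicy(policy)
                .build();
    }
}
```

The driver only runs speculative executions for statements marked idempotent, for example with `statement.setIdempotent(true)`.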
Timeouts
• driver read timeout vs server read timeout
• driver settings for all queries or per query settings
• setReadTimeoutMillis and setConnectTimeoutMillis (on SocketOptions)
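A sketch of both levels of timeout configuration, assuming DataStax Java driver 3.1 or later. The millisecond values, the contact point and the table `ks.measurements` are placeholders invented for illustration:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.SocketOptions;
import com.datastax.driver.core.Statement;

class TimeoutSketch {

    // Driver-wide defaults, applied to every query.
    static Cluster build() {
        SocketOptions socket = new SocketOptions()
                .setConnectTimeoutMillis(3000)  // time allowed to open a connection
                .setReadTimeoutMillis(6000);    // time to wait for a response per request

        return Cluster.builder()
                .addContactPoint("127.0.0.1") // placeholder contact point
                .withSocketOptions(socket)
                .build();
    }

    // Per-query override for a read that should fail early.
    static Statement fastRead(int deviceId) {
        return new SimpleStatement(
                "SELECT * FROM ks.measurements WHERE device_id = ?", deviceId)
                .setReadTimeoutMillis(500);
    }
}
```

The driver read timeout is normally kept above the server-side timeouts in cassandra.yaml, so that the server reports its own timeout before the driver gives up on the connection; a lower per-query value trades that diagnostic for a faster failure.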
Retry policies
• fail early and retry
• add retry policy or speculative execution
• downgrading retry policy if getting possibly inconsistent data is better than getting no data
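The idea behind a downgrading retry can be sketched in plain Java. This is a simplified illustration, not the driver's code: the enum and `downgradeFor` are hypothetical names, and the rule is "retry at the strongest level that the replicas known to be alive can still satisfy". In driver 3.x the built-in counterpart is `DowngradingConsistencyRetryPolicy`.

```java
class DowngradeSketch {

    enum Consistency { ONE, TWO, THREE }

    // Given how many replicas responded or are known alive, returns the
    // consistency level to retry with, or null when retrying is pointless.
    static Consistency downgradeFor(int aliveReplicas) {
        if (aliveReplicas >= 3) {
            return Consistency.THREE;
        }
        if (aliveReplicas == 2) {
            return Consistency.TWO;
        }
        if (aliveReplicas == 1) {
            return Consistency.ONE;
        }
        return null; // no replica available: fail instead of retrying
    }
}
```

A request that asked for QUORUM on five replicas but found only two alive would be retried at TWO; the caller gets data, but no longer the guarantee it originally asked for.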
Use cases
Click stream and IoT measurements
• visualize measurements from many devices
• fast access with tolerable inconsistencies
• DC aware and token aware policy to land on local node with data
• lower consistency level (ONE) or use downgrading retry policy
• use speculative executions to query more nodes if cluster can manage load
Mission critical data with tolerable performance
• stock data in warehouse used to compare with ERP system
• high consistency (read replicas + write replicas > replication factor)
• retry and reconnection policies are a must
• choose a lower maximum of requests per connection so as not to overload the cluster
• set lower read timeout to fail early and retry
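The consistency rule in this list is simple arithmetic. A small worked example in plain Java, with hypothetical helper names, shows why QUORUM reads plus QUORUM writes satisfy it while ONE plus ONE does not:

```java
class QuorumSketch {

    // Number of replicas that form a quorum for a given replication factor.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // True when every read is guaranteed to overlap with the latest write:
    // read replicas + write replicas > replication factor.
    static boolean readsSeeLatestWrite(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }
}
```

With replication factor 3, a quorum is 2 replicas, so QUORUM reads and writes give 2 + 2 > 3. Reading and writing at ONE gives 1 + 1, which is not greater than 3, so a read may miss the latest write.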
Write heavy low latency read use case
• ad serving (store user analytics and serve ads fast)
• separate read and write for different tuning options
• latency aware policy on reads to always choose the fastest-performing nodes
• lower the read timeout on driver and server to fail early
• increase maximum requests per connection
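One possible reading of "separate read and write" is to build two driver instances, each tuned for its own path. The sketch below assumes that reading and DataStax Java driver 3.x; all numeric values and the contact point are placeholders. Each Cluster instance opens its own connections, so this doubles the connection count toward the cluster.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.HostDistance;
import com.datastax.driver.core.PoolingOptions;
import com.datastax.driver.core.SocketOptions;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.LatencyAwarePolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

class ReadWriteSplitSketch {

    // Write path: allow more concurrent requests per connection.
    static Cluster writeCluster() {
        PoolingOptions pooling = new PoolingOptions()
                .setMaxRequestsPerConnection(HostDistance.LOCAL, 4096);
        return Cluster.builder()
                .addContactPoint("127.0.0.1") // placeholder contact point
                .withPoolingOptions(pooling)
                .build();
    }

    // Read path: prefer fast replicas and fail early on slow responses.
    static Cluster readCluster() {
        return Cluster.builder()
                .addContactPoint("127.0.0.1") // placeholder contact point
                .withLoadBalancingPolicy(new TokenAwarePolicy(
                        LatencyAwarePolicy.builder(
                                DCAwareRoundRobinPolicy.builder().build()).build()))
                .withSocketOptions(new SocketOptions().setReadTimeoutMillis(1000))
                .build();
    }
}
```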
Conclusion
Conclusion and takeaways
• know your use case and know your database
• each tuning option requires good monitoring
[Diagram: cycle of TEST, MEASURE, ADJUST]
Links
• SmartCat Blog post - Tuning Java driver for Apache Cassandra - part 1
• SmartCat Blog post - Tuning Java driver for Apache Cassandra - part 2
• Use case example - Tuning for heavy write and low latency read scenario
Q&A
Thank you
Nenad Bozic
@NenadBozicNs
SmartCat
www.smartcat.io

Tuning Java Driver for Apache Cassandra by Nenad Bozic at Big Data Spain 2017