Tuning Java Driver for Apache Cassandra by Nenad Bozic at Big Data Spain 2017

Tuning Java Driver for
Apache Cassandra
November 2017
Nenad Bozic
@NenadBozicNs
nenad.bozic@smartcat.
SmartCat
www.smartcat.io

When people start with Apache Cassandra

Agenda
• intro to Apache Cassandra
• tuning options in driver
• use cases
• takeaways and Q&A

Cassandra Overview
• partitioned data with tunable consistency
• replication factor - how many replicas
• masterless architecture
• native multi-datacenter support

Architecture
Client request
Consistency level 1
Replication factor 3

Architecture
Client request
response
Consistency level 1
Replication factor 3

Data Modeling
• query based modeling
• data is denormalized
• data is duplicated

Use Cases
Good fit:
• when high availability is crucial and eventual consistency is tolerable
• event sourcing
• logging continuous streams of data
• deep visitor analytics
Poor fit:
• early prototyping with significant query changes
• referential integrity required
• dynamic access patterns on data

Load balancing
https://www.slideshare.net/planetcassandra/apache-cassandra-and-drivers

Data Center Aware Load Balancing

Token Aware Load Balancing
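
A minimal sketch of combining the two policies named above, assuming the DataStax Java driver 3.x is on the classpath (the deck links to the 3.1 manual). The contact point "127.0.0.1" and the data center name "dc1" are placeholders, not values from the talk.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.LoadBalancingPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

public class LoadBalancingExample {
    public static Cluster buildCluster() {
        // Prefer nodes in the local data center ("dc1" is a placeholder name).
        LoadBalancingPolicy dcAware = DCAwareRoundRobinPolicy.builder()
                .withLocalDc("dc1")
                .build();
        // Wrap the DC-aware policy so that replicas owning the partition
        // are tried first when the routing key of the statement is known.
        return Cluster.builder()
                .addContactPoint("127.0.0.1") // placeholder contact point
                .withLoadBalancingPolicy(new TokenAwarePolicy(dcAware))
                .build();
    }
}
```

Token awareness only helps when the driver can compute the routing key, which is typically the case for bound prepared statements.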

Pooling options
• driver communicates with the cluster through a pool of connections
• defaults changed between protocol V2 and V3 (core connections lowered to 1)
• allowing more requests per connection can put more load on the cluster
• monitor in-flight queries on the driver side and tune for your use case
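
The pooling bullets above could translate into something like the following sketch, assuming the DataStax Java driver 3.x. The numbers are illustrative only, not recommendations from the talk, and "127.0.0.1" is a placeholder contact point.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Host;
import com.datastax.driver.core.HostDistance;
import com.datastax.driver.core.PoolingOptions;
import com.datastax.driver.core.Session;

public class PoolingExample {
    public static Cluster buildCluster() {
        PoolingOptions pooling = new PoolingOptions()
                // core = 1, max = 2 connections per local host (illustrative values)
                .setConnectionsPerHost(HostDistance.LOCAL, 1, 2)
                // upper bound on concurrent requests per connection (illustrative value)
                .setMaxRequestsPerConnection(HostDistance.LOCAL, 2048);
        return Cluster.builder()
                .addContactPoint("127.0.0.1") // placeholder contact point
                .withPoolingOptions(pooling)
                .build();
    }

    // Prints open connections and in-flight queries per host, e.g. from a scheduled task.
    public static void logInFlight(Session session) {
        Session.State state = session.getState();
        for (Host host : state.getConnectedHosts()) {
            System.out.printf("%s connections=%d inFlight=%d%n",
                    host, state.getOpenConnections(host), state.getInFlightQueries(host));
        }
    }
}
```

Calling logInFlight periodically gives a rough view of how close each host is to the configured request limit before changing it.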

Speculative executions
• spawn additional queries to other nodes after configured time
http://docs.datastax.com/en/developer/java-driver/3.1/manual/speculative_execution/

Speculative executions
• constant speculative execution policy
• percentile speculative execution policy
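
As a sketch of the two policies listed above, assuming the DataStax Java driver 3.1 or later: the delay, percentile, and execution counts are illustrative, and the query text and table name are hypothetical.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ClusterWidePercentileTracker;
import com.datastax.driver.core.PercentileTracker;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.policies.ConstantSpeculativeExecutionPolicy;
import com.datastax.driver.core.policies.PercentileSpeculativeExecutionPolicy;
import com.datastax.driver.core.policies.SpeculativeExecutionPolicy;

public class SpeculativeExecutionExample {
    // Constant policy: start another execution after 500 ms, at most 2 extra executions.
    public static SpeculativeExecutionPolicy constantPolicy() {
        return new ConstantSpeculativeExecutionPolicy(500, 2);
    }

    // Percentile policy: start another execution once a request is slower than
    // the 99th percentile of latencies observed across the cluster.
    public static SpeculativeExecutionPolicy percentilePolicy() {
        PercentileTracker tracker = ClusterWidePercentileTracker
                .builder(15000) // highest trackable latency in milliseconds
                .build();
        return new PercentileSpeculativeExecutionPolicy(tracker, 99.0, 2);
    }

    public static Cluster buildCluster(SpeculativeExecutionPolicy policy) {
        return Cluster.builder()
                .addContactPoint("127.0.0.1") // placeholder contact point
                .withSpeculativeExecutionPolicy(policy)
                .build();
    }

    // Speculative executions are only applied to statements marked idempotent.
    public static Statement idempotentRead() {
        return new SimpleStatement("SELECT * FROM measurements WHERE device_id = ?", "d1") // hypothetical table
                .setIdempotent(true);
    }
}
```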

Timeouts
• driver read timeout vs server read timeout
• driver settings for all queries or per query settings
• setReadTimeoutMillis and setConnectTimeoutMillis
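
A minimal sketch of global and per-query timeouts, assuming the DataStax Java driver 3.1 or later, where the connect timeout setter on SocketOptions is named setConnectTimeoutMillis. The values and the query are illustrative assumptions.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.SocketOptions;
import com.datastax.driver.core.Statement;

public class TimeoutExample {
    // Driver-wide settings applied to every query.
    public static Cluster buildCluster() {
        SocketOptions socket = new SocketOptions()
                .setConnectTimeoutMillis(5000)   // time allowed to open a connection
                .setReadTimeoutMillis(12000);    // time the driver waits for a response
        return Cluster.builder()
                .addContactPoint("127.0.0.1")    // placeholder contact point
                .withSocketOptions(socket)
                .build();
    }

    // Per-query override for a read that should fail early.
    public static Statement fastFailingRead() {
        return new SimpleStatement("SELECT * FROM ads WHERE user_id = ?", "u1") // hypothetical table
                .setReadTimeoutMillis(2000);
    }
}
```

The driver read timeout is usually kept above the server-side read timeout, so that the server gets a chance to report its own timeout before the driver gives up on the connection.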

Retry policies
• fail early and retry
• add retry policy or speculative execution
• downgrading retry policy if possibly inconsistent data is better than no data
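
One way to wire up the downgrading retry policy, sketched under the assumption of the DataStax Java driver 3.x. A reconnection policy is included as well because the use cases later in the deck pair the two; the delays are illustrative.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.DowngradingConsistencyRetryPolicy;
import com.datastax.driver.core.policies.ExponentialReconnectionPolicy;
import com.datastax.driver.core.policies.LoggingRetryPolicy;

public class RetryPolicyExample {
    public static Cluster buildCluster() {
        return Cluster.builder()
                .addContactPoint("127.0.0.1") // placeholder contact point
                // Retry at a lower consistency level when too few replicas respond,
                // and log each retry decision so downgrades stay visible.
                .withRetryPolicy(new LoggingRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE))
                // Reconnect to downed nodes with exponential backoff: 1 s base, 60 s cap.
                .withReconnectionPolicy(new ExponentialReconnectionPolicy(1000, 60000))
                .build();
    }
}
```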

Click stream and IoT measurements
• visualize measurements from many devices
• fast access with tolerable inconsistencies
• DC aware and token aware policy to land on local node with data
• lower consistency level (ONE) or use downgrading retry policy
• use speculative executions to query more nodes if cluster can manage load

Mission critical data with tolerable performance
• stock data in warehouse used to compare with ERP system
• high consistency (read replicas + write replicas > replication factor)
• retry and reconnect policy is a must
• choose a lower max requests per connection to avoid overloading the cluster
• set lower read timeout to fail early and retry
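
The consistency rule in this use case (read + write > replication factor) is plain arithmetic, and the sketch below checks it for a few combinations. The class and method names are made up for illustration and are not part of the driver.

```java
class ConsistencyCheck {
    // Number of replicas that must respond at QUORUM for a given replication factor.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Reads are guaranteed to overlap with the latest write when
    // read replicas + write replicas > replication factor.
    static boolean isStronglyConsistent(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3;
        // QUORUM reads and QUORUM writes: 2 + 2 > 3
        System.out.println(isStronglyConsistent(quorum(rf), quorum(rf), rf)); // prints true
        // ONE reads and ONE writes: 1 + 1 is not greater than 3
        System.out.println(isStronglyConsistent(1, 1, rf)); // prints false
    }
}
```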

Write heavy low latency read use case
• ad serving (store user analytics and serve ads fast)
• separate read and write for different tuning options
• latency aware policy on reads to always choose the fastest-performing nodes
• lower the read timeout on driver and server to fail early
• increase maximum requests per connection
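
A possible shape for separating reads from writes, sketched with two Cluster instances and assuming the DataStax Java driver 3.x. The thresholds, timeouts, request limits, and the data center name "dc1" are illustrative assumptions, not settings from the talk.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.HostDistance;
import com.datastax.driver.core.PoolingOptions;
import com.datastax.driver.core.SocketOptions;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.LatencyAwarePolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

public class ReadWriteSplitExample {
    // Read path: avoid slow nodes and fail early.
    public static Cluster buildReadCluster() {
        LatencyAwarePolicy latencyAware = LatencyAwarePolicy
                .builder(DCAwareRoundRobinPolicy.builder().withLocalDc("dc1").build())
                .withExclusionThreshold(2.0) // skip nodes 2x slower than the fastest one
                .build();
        return Cluster.builder()
                .addContactPoint("127.0.0.1") // placeholder contact point
                .withLoadBalancingPolicy(new TokenAwarePolicy(latencyAware))
                .withSocketOptions(new SocketOptions().setReadTimeoutMillis(2000))
                .build();
    }

    // Write path: allow more concurrent requests per connection.
    public static Cluster buildWriteCluster() {
        PoolingOptions pooling = new PoolingOptions()
                .setMaxRequestsPerConnection(HostDistance.LOCAL, 4096);
        return Cluster.builder()
                .addContactPoint("127.0.0.1") // placeholder contact point
                .withPoolingOptions(pooling)
                .build();
    }
}
```

Each Cluster keeps its own connection pools, so this doubles the connections opened to every node; that trade-off is worth measuring before adopting the split.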

Conclusion and takeaways
• know your use case and know your database
• each tuning option requires good monitoring
• iterate in a cycle: test, measure, adjust

Links
• SmartCat Blog post - Tuning Java driver for Apache Cassandra - part 1
• SmartCat Blog post - Tuning Java driver for Apache Cassandra - part 2
• Use case example - Tuning for heavy write and low latency read scenario

Thank you
Nenad Bozic
@NenadBozicNs
SmartCat
www.smartcat.io
