SlideShare a Scribd company logo
1 of 21
Download to read offline
Tuning Speculative Retries
to Fight Latency
Minh Do - Senior Distributed System Engineer
Michaël Figuière - Senior Distributed System Engineer
Agenda
What are Speculative Retries?
C* Setting & Internal Code
C* Testing and Outcome
Datastax Java Driver
Speculative Retries
• “The Tail at Scale” - Google paper
• Hedged Requests vs. Speculative Retries
Speculative Retries
• Tuning available both in C* and Java Driver
• It is all about the additional read retries to improve
the latency tail
C* Setting & Internals
• Read path only
• Set at table level:
– None, Always, Custom, Percentile
• Using different executors for different behaviors
• Impacted by Read Repair (for simplicity, ignore in
this talk)
No Retry Policy
• Use NeverSpeculatingReadExecutor
• Coordinator computes a list of alive replicas:
– #blockFor
• Sends only one full data request to one replica
• Sends additional digest request(s) to others
depending on Consistency Level (CL)
No Retry Policy
Example using CL_LOCAL_QUORUM (replication factor 3)
Always Retry Policy
• Use AlwaysSpeculatingReadExecutor
• List of replicas: (#blockFor + 1) or all nodes
• Coordinator sends 2 full data requests to 2
replicas
• Sends additional digest requests to others
depending on Consistency Level
Always Retry Policy
Example using CL_ONE
Custom/Percentile Retry Policy
• Use SpeculatingReadExecutor
• List of replicas: #blockFor + 1
• Send 1 full data request and 1 or more digest request to
replicas depending Consistency Level
• Use the last replicas on the list for the retries
• Coordinator waits for a duration to retry on another replica
– Custom: duration in millisecs
– Percentile: the percentile on the sampling latency
distribution
Custom/Percentile Retry Policy
Example using CL_LOCAL_QUORUM (replication factor 3)
C* test - setup
• Cassandra: 12 nodes (8 cores/60Gb RAM), approx 120Gb
data per node
• Client: 6 (8 cores/30 Gb RAM)
• Use Linux Traffic Control (tc) to simulate network glitch on
one C* node using delayed packet transmission technique
C* test - result
• Clients send 50K reads/sec with 3K writes/sec,
• Use TC to force a slow node
• Enable 95th Speculative Retry setting
• Throughput degraded to 30K reads/sec on Coord.
• Avg. latency gets doubled 0.5ms to 1ms
• 95th/99th latencies have many spikes to 2x-10x
C* test - result
• Client 18K reads/sec with 1K writes/sec
• Use TC to force a slow node
• Keep 95th Speculative Retry setting on
• Throughput increased to 20+K reads/sec
• Avg. latency gets doubled 0.5ms to 1ms
• 95th/99th latencies are lower, no spikes or more
stable
C* test - conclusion
• More stable and lower 95th/99th latencies:
– Build cluster with extra capacities and turn on
Speculative Retry
• C* cluster is already near its max capacity
– Disable Speculative Retry as performance getting worse
Client Speculative Retries
• SpeculativeExecutionPolicy
• Speculative retries for both reads and writes
• But only for idempotent requests
Client Speculative Retries
Example using CL_LOCAL_QUORUM (replication factor 3)
NoSpeculativeExecutionPolicy
• Just the normal Driver behavior
• Retries are attempted only after a fixed timeout as
part of Failover
ConstantSpeculativeExecutionPolicy
Trigger a Speculative Retry if the Coordinator
hasn’t answer within a given delay.
PercentileSpeculativeExecutionPolicy
Trigger speculative retries if the chosen
coordinator doesn’t answer within a given
percentile of its typical response time
Questions
?

More Related Content

What's hot

Exadata Performance Optimization
Exadata Performance OptimizationExadata Performance Optimization
Exadata Performance Optimization
Enkitec
 

What's hot (20)

Troubleshooting Complex Performance issues - Oracle SEG$ contention
Troubleshooting Complex Performance issues - Oracle SEG$ contentionTroubleshooting Complex Performance issues - Oracle SEG$ contention
Troubleshooting Complex Performance issues - Oracle SEG$ contention
 
Tajo TPC-H Benchmark Test on AWS
Tajo TPC-H Benchmark Test on AWSTajo TPC-H Benchmark Test on AWS
Tajo TPC-H Benchmark Test on AWS
 
Ceph Performance and Sizing Guide
Ceph Performance and Sizing GuideCeph Performance and Sizing Guide
Ceph Performance and Sizing Guide
 
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data AnalyticsApache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
 
Transparent Data Encryption in PostgreSQL
Transparent Data Encryption in PostgreSQLTransparent Data Encryption in PostgreSQL
Transparent Data Encryption in PostgreSQL
 
Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...
 Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra... Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...
Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at Netflix
 
Exadata
ExadataExadata
Exadata
 
Adapting and adopting spm v04
Adapting and adopting spm v04Adapting and adopting spm v04
Adapting and adopting spm v04
 
How to Monitoring the SRE Golden Signals (E-Book)
How to Monitoring the SRE Golden Signals (E-Book)How to Monitoring the SRE Golden Signals (E-Book)
How to Monitoring the SRE Golden Signals (E-Book)
 
Analyzing and Interpreting AWR
Analyzing and Interpreting AWRAnalyzing and Interpreting AWR
Analyzing and Interpreting AWR
 
Kernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at NetflixKernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at Netflix
 
SFO15-302: Energy Aware Scheduling: Progress Update
SFO15-302: Energy Aware Scheduling: Progress UpdateSFO15-302: Energy Aware Scheduling: Progress Update
SFO15-302: Energy Aware Scheduling: Progress Update
 
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 2
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 2Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 2
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 2
 
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
 
Exadata Performance Optimization
Exadata Performance OptimizationExadata Performance Optimization
Exadata Performance Optimization
 
MySQL Group Replication
MySQL Group ReplicationMySQL Group Replication
MySQL Group Replication
 
주니어의 쿠버네티스 생태계에서 살아남기
주니어의 쿠버네티스 생태계에서 살아남기주니어의 쿠버네티스 생태계에서 살아남기
주니어의 쿠버네티스 생태계에서 살아남기
 
Oracle Performance Tools of the Trade
Oracle Performance Tools of the TradeOracle Performance Tools of the Trade
Oracle Performance Tools of the Trade
 
Average Active Sessions RMOUG2007
Average Active Sessions RMOUG2007Average Active Sessions RMOUG2007
Average Active Sessions RMOUG2007
 

Viewers also liked

What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
DataStax
 
Give sense to your Big Data w/ Apache TinkerPop™ & property graph databases
Give sense to your Big Data w/ Apache TinkerPop™ & property graph databasesGive sense to your Big Data w/ Apache TinkerPop™ & property graph databases
Give sense to your Big Data w/ Apache TinkerPop™ & property graph databases
DataStax
 
Terror & Hysteria: Cost Effective Scaling of Time Series Data with Cassandra ...
Terror & Hysteria: Cost Effective Scaling of Time Series Data with Cassandra ...Terror & Hysteria: Cost Effective Scaling of Time Series Data with Cassandra ...
Terror & Hysteria: Cost Effective Scaling of Time Series Data with Cassandra ...
DataStax
 
The Promise and Perils of Encrypting Cassandra Data (Ameesh Divatia, Baffle, ...
The Promise and Perils of Encrypting Cassandra Data (Ameesh Divatia, Baffle, ...The Promise and Perils of Encrypting Cassandra Data (Ameesh Divatia, Baffle, ...
The Promise and Perils of Encrypting Cassandra Data (Ameesh Divatia, Baffle, ...
DataStax
 
Building Killr Applications with DSE
Building Killr Applications with DSEBuilding Killr Applications with DSE
Building Killr Applications with DSE
DataStax
 

Viewers also liked (20)

There are More Clouds! Azure and Cassandra (Carlos Rolo, Pythian) | C* Summit...
There are More Clouds! Azure and Cassandra (Carlos Rolo, Pythian) | C* Summit...There are More Clouds! Azure and Cassandra (Carlos Rolo, Pythian) | C* Summit...
There are More Clouds! Azure and Cassandra (Carlos Rolo, Pythian) | C* Summit...
 
Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...
Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...
Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...
 
Data Modeling a Scheduling App (Adam Hutson, DataScale) | Cassandra Summit 2016
Data Modeling a Scheduling App (Adam Hutson, DataScale) | Cassandra Summit 2016Data Modeling a Scheduling App (Adam Hutson, DataScale) | Cassandra Summit 2016
Data Modeling a Scheduling App (Adam Hutson, DataScale) | Cassandra Summit 2016
 
Troubleshooting Cassandra (J.B. Langston, DataStax) | C* Summit 2016
Troubleshooting Cassandra (J.B. Langston, DataStax) | C* Summit 2016Troubleshooting Cassandra (J.B. Langston, DataStax) | C* Summit 2016
Troubleshooting Cassandra (J.B. Langston, DataStax) | C* Summit 2016
 
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
 
What is in All of Those SSTable Files Not Just the Data One but All the Rest ...
What is in All of Those SSTable Files Not Just the Data One but All the Rest ...What is in All of Those SSTable Files Not Just the Data One but All the Rest ...
What is in All of Those SSTable Files Not Just the Data One but All the Rest ...
 
Give sense to your Big Data w/ Apache TinkerPop™ & property graph databases
Give sense to your Big Data w/ Apache TinkerPop™ & property graph databasesGive sense to your Big Data w/ Apache TinkerPop™ & property graph databases
Give sense to your Big Data w/ Apache TinkerPop™ & property graph databases
 
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...
 
Webinar: Transforming Customer Experience Through an Always-On Data Platform
Webinar: Transforming Customer Experience Through an Always-On Data PlatformWebinar: Transforming Customer Experience Through an Always-On Data Platform
Webinar: Transforming Customer Experience Through an Always-On Data Platform
 
Webinar - DataStax Enterprise 5.1: 3X the operational analytics speed, help f...
Webinar - DataStax Enterprise 5.1: 3X the operational analytics speed, help f...Webinar - DataStax Enterprise 5.1: 3X the operational analytics speed, help f...
Webinar - DataStax Enterprise 5.1: 3X the operational analytics speed, help f...
 
Webinar: Fighting Fraud with Graph Databases
Webinar: Fighting Fraud with Graph DatabasesWebinar: Fighting Fraud with Graph Databases
Webinar: Fighting Fraud with Graph Databases
 
Terror & Hysteria: Cost Effective Scaling of Time Series Data with Cassandra ...
Terror & Hysteria: Cost Effective Scaling of Time Series Data with Cassandra ...Terror & Hysteria: Cost Effective Scaling of Time Series Data with Cassandra ...
Terror & Hysteria: Cost Effective Scaling of Time Series Data with Cassandra ...
 
The Promise and Perils of Encrypting Cassandra Data (Ameesh Divatia, Baffle, ...
The Promise and Perils of Encrypting Cassandra Data (Ameesh Divatia, Baffle, ...The Promise and Perils of Encrypting Cassandra Data (Ameesh Divatia, Baffle, ...
The Promise and Perils of Encrypting Cassandra Data (Ameesh Divatia, Baffle, ...
 
Stratio's Cassandra Lucene index: Geospatial Use Cases (Andrés de la Peña & J...
Stratio's Cassandra Lucene index: Geospatial Use Cases (Andrés de la Peña & J...Stratio's Cassandra Lucene index: Geospatial Use Cases (Andrés de la Peña & J...
Stratio's Cassandra Lucene index: Geospatial Use Cases (Andrés de la Peña & J...
 
Using Spark to Load Oracle Data into Cassandra (Jim Hatcher, IHS Markit) | C*...
Using Spark to Load Oracle Data into Cassandra (Jim Hatcher, IHS Markit) | C*...Using Spark to Load Oracle Data into Cassandra (Jim Hatcher, IHS Markit) | C*...
Using Spark to Load Oracle Data into Cassandra (Jim Hatcher, IHS Markit) | C*...
 
Building Killr Applications with DSE
Building Killr Applications with DSEBuilding Killr Applications with DSE
Building Killr Applications with DSE
 
Bloor Research & DataStax: How graph databases solve previously unsolvable bu...
Bloor Research & DataStax: How graph databases solve previously unsolvable bu...Bloor Research & DataStax: How graph databases solve previously unsolvable bu...
Bloor Research & DataStax: How graph databases solve previously unsolvable bu...
 
Can My Inventory Survive Eventual Consistency?
Can My Inventory Survive Eventual Consistency?Can My Inventory Survive Eventual Consistency?
Can My Inventory Survive Eventual Consistency?
 
Webinar - Bringing Game Changing Insights with Graph Databases
Webinar - Bringing Game Changing Insights with Graph DatabasesWebinar - Bringing Game Changing Insights with Graph Databases
Webinar - Bringing Game Changing Insights with Graph Databases
 
London Cassandra Meetup 10/23: Apache Cassandra at British Gas Connected Home...
London Cassandra Meetup 10/23: Apache Cassandra at British Gas Connected Home...London Cassandra Meetup 10/23: Apache Cassandra at British Gas Connected Home...
London Cassandra Meetup 10/23: Apache Cassandra at British Gas Connected Home...
 

Similar to Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netflix) | Cassandra Summit 2016

Similar to Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netflix) | Cassandra Summit 2016 (20)

Observer, a "real life" time series application
Observer, a "real life" time series applicationObserver, a "real life" time series application
Observer, a "real life" time series application
 
Training Slides: 202 - Monitoring & Troubleshooting
Training Slides: 202 - Monitoring & TroubleshootingTraining Slides: 202 - Monitoring & Troubleshooting
Training Slides: 202 - Monitoring & Troubleshooting
 
Cloud Performance Benchmarking
Cloud Performance BenchmarkingCloud Performance Benchmarking
Cloud Performance Benchmarking
 
CPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performance
 
Concurrency
ConcurrencyConcurrency
Concurrency
 
Migrating 500 Nodes from Rackspace to Google Cloud with Zero Downtime
Migrating 500 Nodes from Rackspace to Google Cloud with Zero DowntimeMigrating 500 Nodes from Rackspace to Google Cloud with Zero Downtime
Migrating 500 Nodes from Rackspace to Google Cloud with Zero Downtime
 
Aerospike Go Language Client
Aerospike Go Language ClientAerospike Go Language Client
Aerospike Go Language Client
 
Chap2 slides
Chap2 slidesChap2 slides
Chap2 slides
 
(CMP402) Amazon EC2 Instances Deep Dive
(CMP402) Amazon EC2 Instances Deep Dive(CMP402) Amazon EC2 Instances Deep Dive
(CMP402) Amazon EC2 Instances Deep Dive
 
Load Balancing 101
Load Balancing 101Load Balancing 101
Load Balancing 101
 
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
 
Geographically dispersed perconaxtra db cluster deployment
Geographically dispersed perconaxtra db cluster deploymentGeographically dispersed perconaxtra db cluster deployment
Geographically dispersed perconaxtra db cluster deployment
 
IBM Maximo Performance Tuning
IBM Maximo Performance TuningIBM Maximo Performance Tuning
IBM Maximo Performance Tuning
 
20160503 Amazed by AWS | Tips about Performance on AWS
20160503 Amazed by AWS | Tips about Performance on AWS20160503 Amazed by AWS | Tips about Performance on AWS
20160503 Amazed by AWS | Tips about Performance on AWS
 
Training Slides: Intermediate 202: Performing Cluster Maintenance with Zero-D...
Training Slides: Intermediate 202: Performing Cluster Maintenance with Zero-D...Training Slides: Intermediate 202: Performing Cluster Maintenance with Zero-D...
Training Slides: Intermediate 202: Performing Cluster Maintenance with Zero-D...
 
Ceph QoS: How to support QoS in distributed storage system - Taewoong Kim
Ceph QoS: How to support QoS in distributed storage system - Taewoong KimCeph QoS: How to support QoS in distributed storage system - Taewoong Kim
Ceph QoS: How to support QoS in distributed storage system - Taewoong Kim
 
Parallel Distributed Deep Learning on HPCC Systems
Parallel Distributed Deep Learning on HPCC SystemsParallel Distributed Deep Learning on HPCC Systems
Parallel Distributed Deep Learning on HPCC Systems
 
Real-Time Analytics with Kafka, Cassandra and Storm
Real-Time Analytics with Kafka, Cassandra and StormReal-Time Analytics with Kafka, Cassandra and Storm
Real-Time Analytics with Kafka, Cassandra and Storm
 
Devops kc
Devops kcDevops kc
Devops kc
 

More from DataStax

More from DataStax (20)

Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?
 
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
 
Running DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid EnvironmentsRunning DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid Environments
 
Best Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise GraphBest Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise Graph
 
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step JourneyWebinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
 
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...
 
Webinar | Better Together: Apache Cassandra and Apache Kafka
Webinar  |  Better Together: Apache Cassandra and Apache KafkaWebinar  |  Better Together: Apache Cassandra and Apache Kafka
Webinar | Better Together: Apache Cassandra and Apache Kafka
 
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax EnterpriseTop 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise
 
Introduction to Apache Cassandra™ + What’s New in 4.0
Introduction to Apache Cassandra™ + What’s New in 4.0Introduction to Apache Cassandra™ + What’s New in 4.0
Introduction to Apache Cassandra™ + What’s New in 4.0
 
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
 
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud RealitiesWebinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud Realities
 
Designing a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for DummiesDesigning a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for Dummies
 
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid CloudHow to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
 
How to Evaluate Cloud Databases for eCommerce
How to Evaluate Cloud Databases for eCommerceHow to Evaluate Cloud Databases for eCommerce
How to Evaluate Cloud Databases for eCommerce
 
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
 
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
 
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
 
Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)
 
An Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking ApplicationsAn Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking Applications
 
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design ThinkingBecoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
 

Recently uploaded

TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
mohitmore19
 

Recently uploaded (20)

A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 

Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netflix) | Cassandra Summit 2016

  • 1. Tuning Speculative Retries to Fight Latency Minh Do - Senior Distributed System Engineer Michaël Figuière - Senior Distributed System Engineer
  • 2. Agenda What are Speculative Retries? C* Setting & Internal Code C* Testing and Outcome Datastax Java Driver
  • 3. Speculative Retries • “The Tail at Scale” - Google paper • Hedged Requests vs. Speculative Retries
  • 4. Speculative Retries • Tuning available both in C* and Java Driver • It is all about the additional read retries to improve the latency tail
  • 5. C* Setting & Internals • Read path only • Set at table level: – None, Always, Custom, Percentile • Using different executors for different behaviors • Impacted by Read Repair (for simplicity, ignore in this talk)
  • 6. No Retry Policy • Use NeverSpeculatingReadExecutor • Coordinator computes a list of alive replicas: – #blockFor • Sends only one full data request to one replica • Sends additional digest request(s) to others depending on Consistency Level (CL)
  • 7. No Retry Policy Example using CL_LOCAL_QUORUM (replication factor 3)
  • 8. Always Retry Policy • Use AlwaysSpeculatingReadExecutor • List of replicas: (#blockFor + 1) or all nodes • Coordinator sends 2 full data requests to 2 replicas • Sends additional digest requests to others depending on Consistency Level
  • 10. Custom/Percentile Retry Policy • Use SpeculatingReadExecutor • List of replicas: #blockFor + 1 • Send 1 full data request and 1 or more digest request to replicas depending Consistency Level • Use the last replicas on the list for the retries • Coordinator waits for a duration to retry on another replica – Custom: duration in millisecs – Percentile: the percentile on the sampling latency distribution
  • 11. Custom/Percentile Retry Policy Example using CL_LOCAL_QUORUM (replication factor 3)
  • 12. C* test - setup • Cassandra: 12 nodes (8 cores/60Gb RAM), approx 120Gb data per node • Client: 6 (8 cores/30 Gb RAM) • Use Linux Traffic Control (tc) to simulate network glitch on one C* node using delayed packet transmission technique
  • 13. C* test - result • Clients send 50K reads/sec with 3K writes/sec, • Use TC to force a slow node • Enable 95th Speculative Retry setting • Throughput degraded to 30K reads/sec on Coord. • Avg. latency gets doubled 0.5ms to 1ms • 95th/99th latencies have many spikes to 2x-10x
  • 14. C* test - result • Client 18K reads/sec with 1K writes/sec • Use TC to force a slow node • Keep 95th Speculative Retry setting on • Throughput increased to 20+K reads/sec • Avg. latency gets doubled 0.5ms to 1ms • 95th/99th latencies are lower, no spikes or more stable
  • 15. C* test - conclusion • More stable and lower 95th/99th latencies: – Build cluster with extra capacities and turn on Speculative Retry • C* cluster is already near its max capacity – Disable Speculative Retry as performance getting worse
  • 16. Client Speculative Retries • SpeculativeExecutionPolicy • Speculative retries for both reads and writes • But only for idempotent requests
  • 17. Client Speculative Retries Example using CL_LOCAL_QUORUM (replication factor 3)
  • 18. NoSpeculativeExecutionPolicy • Just the normal Driver behavior • Retries are attempted only after a fixed timeout as part of Failover
  • 19. ConstantSpeculativeExecutionPolicy Trigger a Speculative Retry if the Coordinator hasn’t answer within a given delay.
  • 20. PercentileSpeculativeExecutionPolicy Trigger speculative retries if the chosen coordinator doesn’t answer within a given percentile of its typical response time