SlideShare a Scribd company logo
Improving Tail Latency of
Stateful Cloud Services via
GC Control and Load Shedding
Daniel Fireman danielfireman@gmail.com
João Brunet, Raquel Lopes, David Quaresma, Thiago Emmanuel Pereira
Runtime Environments (RTEs)
Manage/Control/Support the program execution
● Platform independence
● Abstractions
○ Memory model
○ Execution
○ Concurrency
● Safety
○ Checks, checks and more checks
○ Automatic memory management
● ...
Cloud loves RTEs
Garbage Collection (GC) impact on latency
CPU Competition
+
Pauses
Overhead
distribution tail:
0.1% of the
requests were
impacted
Scaled out view of the latency impact
1 - 0.99 100
= 63%
Number of endpoints
reached to serve one
request
Considered percentile (when
does a peak happen?)
Which ends up harming
● User experience
● Predictability
● SLAs
You could try
● Tune the GC configuration
● Investigate and fix spots of GC pressure
● Go off-heap
● Switch to manual memory management
● Buy comercial implementations and support
If all that is too hard or expensive
Avoid collecting garbage while processing requests
Garbage Collector Control
Interceptor (GCI)
Overview
Components: Proxy
● Intercepts all requests
● Service/RTE agnostic
● Decides when to check heap
● Decides which requests to shed
github.com/gcinterceptor/gci-proxy
Components: Request Processor (a.k.a Agent)
● Exec. commands from the proxy
● RTE Specific
● Calls RTE’s APIs
○ Check Heap
○ GC
github.com/gcinterceptor/gci-{java,ruby, nodejs, go}
How GCI works: part 1
How GCI works: part 2
How GCI works: part 3
And more!
● Plug and Play
● Adaptive
● Fully decentralized
● Runtime agnostic
○ uses available APIs to trigger garbage collection and check the heap
● Transport agnostic
○ uses the available interception and load shedding mechanisms to avoid
receiving request during collection
Evaluation
Research Question
Does GCI shorten the tail of the latency distribution of
stateful services without significantly penalizing the service
throughput?
Small (4 nodes) and Large (64 clusters)
Stateful services matter
Simulator
github.com/gcinterceptor/gci-simulator
Results - Large Cluster
Does GCI shorten the tail of the latency distribution of
stateful services without significantly penalizing the service
throughput?
Results - Large Cluster
● 99th → 30%
● 99.999th → 47%
● No throughput loss
Yes!
Conclusions and Next Steps
● Avoid processing requests while GC’ing is effective to improve tail latency of
stateful cloud services
GCI is an easy-to-use mechanism to achieve that
● Furthermore, it is adaptive, low overhead and fully distributed!
Next we would like to:
● Decrease the proxy overhead
● Experiment a more realistic application (e.g. Hazelcast) and load (e.g. YCSB)
● Investigate stateless and DSP cloud applications
Thank you
@daniellfireman@danielfireman
github.com/gcinterceptor

More Related Content

What's hot

QE integrated in XTM, by Bob Willans (XTM)
QE integrated in XTM, by Bob Willans (XTM)QE integrated in XTM, by Bob Willans (XTM)
QE integrated in XTM, by Bob Willans (XTM)
TAUS - The Language Data Network
 
Flink Forward Berlin 2018: Krzysztof Zarzycki & Alexey Brodovshuk - "Assistin...
Flink Forward Berlin 2018: Krzysztof Zarzycki & Alexey Brodovshuk - "Assistin...Flink Forward Berlin 2018: Krzysztof Zarzycki & Alexey Brodovshuk - "Assistin...
Flink Forward Berlin 2018: Krzysztof Zarzycki & Alexey Brodovshuk - "Assistin...
Flink Forward
 
Monitoring with riemann
Monitoring with riemannMonitoring with riemann
Monitoring with riemann
Abhishek Amralkar
 
Cassandra Meetup Nov 2019 - Cassandra Resiliency
Cassandra Meetup Nov 2019 -  Cassandra ResiliencyCassandra Meetup Nov 2019 -  Cassandra Resiliency
Cassandra Meetup Nov 2019 - Cassandra Resiliency
Sumanth Pasupuleti
 
ICANN DNS Symposium (IDS 2019): RDAP CDN Distribution Experience
ICANN DNS Symposium (IDS 2019): RDAP CDN Distribution ExperienceICANN DNS Symposium (IDS 2019): RDAP CDN Distribution Experience
ICANN DNS Symposium (IDS 2019): RDAP CDN Distribution Experience
APNIC
 
Flink Forward Berlin 2017: Kostas Kloudas - Complex Event Processing with Fli...
Flink Forward Berlin 2017: Kostas Kloudas - Complex Event Processing with Fli...Flink Forward Berlin 2017: Kostas Kloudas - Complex Event Processing with Fli...
Flink Forward Berlin 2017: Kostas Kloudas - Complex Event Processing with Fli...
Flink Forward
 
Serverless Apps on Google Cloud: more dev, less ops
Serverless Apps on Google Cloud:  more dev, less opsServerless Apps on Google Cloud:  more dev, less ops
Serverless Apps on Google Cloud: more dev, less ops
Joseph Lust
 

What's hot (8)

QE integrated in XTM, by Bob Willans (XTM)
QE integrated in XTM, by Bob Willans (XTM)QE integrated in XTM, by Bob Willans (XTM)
QE integrated in XTM, by Bob Willans (XTM)
 
Flink Forward Berlin 2018: Krzysztof Zarzycki & Alexey Brodovshuk - "Assistin...
Flink Forward Berlin 2018: Krzysztof Zarzycki & Alexey Brodovshuk - "Assistin...Flink Forward Berlin 2018: Krzysztof Zarzycki & Alexey Brodovshuk - "Assistin...
Flink Forward Berlin 2018: Krzysztof Zarzycki & Alexey Brodovshuk - "Assistin...
 
Monitoring with riemann
Monitoring with riemannMonitoring with riemann
Monitoring with riemann
 
Cassandra Meetup Nov 2019 - Cassandra Resiliency
Cassandra Meetup Nov 2019 -  Cassandra ResiliencyCassandra Meetup Nov 2019 -  Cassandra Resiliency
Cassandra Meetup Nov 2019 - Cassandra Resiliency
 
ICANN DNS Symposium (IDS 2019): RDAP CDN Distribution Experience
ICANN DNS Symposium (IDS 2019): RDAP CDN Distribution ExperienceICANN DNS Symposium (IDS 2019): RDAP CDN Distribution Experience
ICANN DNS Symposium (IDS 2019): RDAP CDN Distribution Experience
 
Flink Forward Berlin 2017: Kostas Kloudas - Complex Event Processing with Fli...
Flink Forward Berlin 2017: Kostas Kloudas - Complex Event Processing with Fli...Flink Forward Berlin 2017: Kostas Kloudas - Complex Event Processing with Fli...
Flink Forward Berlin 2017: Kostas Kloudas - Complex Event Processing with Fli...
 
vk1
vk1vk1
vk1
 
Serverless Apps on Google Cloud: more dev, less ops
Serverless Apps on Google Cloud:  more dev, less opsServerless Apps on Google Cloud:  more dev, less ops
Serverless Apps on Google Cloud: more dev, less ops
 

Similar to Improving Tail Latency of Stateful Cloud Services via GC Control and Load Shedding

Container world 2019 Canary Release
Container world 2019 Canary ReleaseContainer world 2019 Canary Release
Container world 2019 Canary Release
Billy Yuen
 
Netflix SRE perf meetup_slides
Netflix SRE perf meetup_slidesNetflix SRE perf meetup_slides
Netflix SRE perf meetup_slides
Ed Hunter
 
Infrastructure setup on Google Cloud using terraform and Ansible
Infrastructure setup on Google Cloud using terraform and AnsibleInfrastructure setup on Google Cloud using terraform and Ansible
Infrastructure setup on Google Cloud using terraform and Ansible
Knoldus Inc.
 
Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3
LibbySchulze
 
Lessons learned from designing QA automation event streaming platform(IoT big...
Lessons learned from designing QA automation event streaming platform(IoT big...Lessons learned from designing QA automation event streaming platform(IoT big...
Lessons learned from designing QA automation event streaming platform(IoT big...
Omid Vahdaty
 
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache KafkaStrata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
confluent
 
AWS Techniques and lessons writing low cost autoscaling GitLab runners
AWS Techniques and lessons writing low cost autoscaling GitLab runnersAWS Techniques and lessons writing low cost autoscaling GitLab runners
AWS Techniques and lessons writing low cost autoscaling GitLab runners
Anthony Scata
 
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ UberKafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
confluent
 
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Brian Brazil
 
Reactive by example (DevOpsDaysTLV 2019)
Reactive by example (DevOpsDaysTLV 2019)Reactive by example (DevOpsDaysTLV 2019)
Reactive by example (DevOpsDaysTLV 2019)
Eran Harel
 
DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric...
DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric...DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric...
DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric...
Haggai Philip Zagury
 
Designing apps for resiliency
Designing apps for resiliencyDesigning apps for resiliency
Designing apps for resiliency
Masashi Narumoto
 
Three Perspectives on Measuring Latency
Three Perspectives on Measuring LatencyThree Perspectives on Measuring Latency
Three Perspectives on Measuring Latency
ScyllaDB
 
Benchmarks, performance, scalability, and capacity what s behind the numbers...
Benchmarks, performance, scalability, and capacity  what s behind the numbers...Benchmarks, performance, scalability, and capacity  what s behind the numbers...
Benchmarks, performance, scalability, and capacity what s behind the numbers...
james tong
 
Benchmarks, performance, scalability, and capacity what's behind the numbers
Benchmarks, performance, scalability, and capacity what's behind the numbersBenchmarks, performance, scalability, and capacity what's behind the numbers
Benchmarks, performance, scalability, and capacity what's behind the numbers
Justin Dorfman
 
2 years into drinking the Microservice kool-aid (Fact and Fiction)
2 years into drinking the Microservice kool-aid (Fact and Fiction)2 years into drinking the Microservice kool-aid (Fact and Fiction)
2 years into drinking the Microservice kool-aid (Fact and Fiction)
roblund
 
Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google
Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, GoogleBringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google
Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google
Ambassador Labs
 
Monitoring and automation
Monitoring and automationMonitoring and automation
Monitoring and automation
Ricardo Bánffy
 
Rate limits and Performance
Rate limits and PerformanceRate limits and Performance
Rate limits and Performance
supergigas
 
Microservices summit talk 1/31
Microservices summit talk   1/31Microservices summit talk   1/31
Microservices summit talk 1/31
Varun Talwar
 

Similar to Improving Tail Latency of Stateful Cloud Services via GC Control and Load Shedding (20)

Container world 2019 Canary Release
Container world 2019 Canary ReleaseContainer world 2019 Canary Release
Container world 2019 Canary Release
 
Netflix SRE perf meetup_slides
Netflix SRE perf meetup_slidesNetflix SRE perf meetup_slides
Netflix SRE perf meetup_slides
 
Infrastructure setup on Google Cloud using terraform and Ansible
Infrastructure setup on Google Cloud using terraform and AnsibleInfrastructure setup on Google Cloud using terraform and Ansible
Infrastructure setup on Google Cloud using terraform and Ansible
 
Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3
 
Lessons learned from designing QA automation event streaming platform(IoT big...
Lessons learned from designing QA automation event streaming platform(IoT big...Lessons learned from designing QA automation event streaming platform(IoT big...
Lessons learned from designing QA automation event streaming platform(IoT big...
 
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache KafkaStrata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
 
AWS Techniques and lessons writing low cost autoscaling GitLab runners
AWS Techniques and lessons writing low cost autoscaling GitLab runnersAWS Techniques and lessons writing low cost autoscaling GitLab runners
AWS Techniques and lessons writing low cost autoscaling GitLab runners
 
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ UberKafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
 
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
 
Reactive by example (DevOpsDaysTLV 2019)
Reactive by example (DevOpsDaysTLV 2019)Reactive by example (DevOpsDaysTLV 2019)
Reactive by example (DevOpsDaysTLV 2019)
 
DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric...
DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric...DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric...
DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric...
 
Designing apps for resiliency
Designing apps for resiliencyDesigning apps for resiliency
Designing apps for resiliency
 
Three Perspectives on Measuring Latency
Three Perspectives on Measuring LatencyThree Perspectives on Measuring Latency
Three Perspectives on Measuring Latency
 
Benchmarks, performance, scalability, and capacity what s behind the numbers...
Benchmarks, performance, scalability, and capacity  what s behind the numbers...Benchmarks, performance, scalability, and capacity  what s behind the numbers...
Benchmarks, performance, scalability, and capacity what s behind the numbers...
 
Benchmarks, performance, scalability, and capacity what's behind the numbers
Benchmarks, performance, scalability, and capacity what's behind the numbersBenchmarks, performance, scalability, and capacity what's behind the numbers
Benchmarks, performance, scalability, and capacity what's behind the numbers
 
2 years into drinking the Microservice kool-aid (Fact and Fiction)
2 years into drinking the Microservice kool-aid (Fact and Fiction)2 years into drinking the Microservice kool-aid (Fact and Fiction)
2 years into drinking the Microservice kool-aid (Fact and Fiction)
 
Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google
Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, GoogleBringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google
Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google
 
Monitoring and automation
Monitoring and automationMonitoring and automation
Monitoring and automation
 
Rate limits and Performance
Rate limits and PerformanceRate limits and Performance
Rate limits and Performance
 
Microservices summit talk 1/31
Microservices summit talk   1/31Microservices summit talk   1/31
Microservices summit talk 1/31
 

Recently uploaded

Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
Octavian Nadolu
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
Deuglo Infosystem Pvt Ltd
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
Drona Infotech
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
Google
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
Donna Lenk
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
lorraineandreiamcidl
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Łukasz Chruściel
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
Aftab Hussain
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
Shane Coughlan
 
Launch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in MinutesLaunch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in Minutes
Roshan Dwivedi
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
Rakesh Kumar R
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Mind IT Systems
 
AI Genie Review: World’s First Open AI WordPress Website Creator
AI Genie Review: World’s First Open AI WordPress Website CreatorAI Genie Review: World’s First Open AI WordPress Website Creator
AI Genie Review: World’s First Open AI WordPress Website Creator
Google
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata
 
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
Alina Yurenko
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
Boni García
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
Adele Miller
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
Neo4j
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
Ayan Halder
 

Recently uploaded (20)

Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
 
Launch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in MinutesLaunch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in Minutes
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
 
AI Genie Review: World’s First Open AI WordPress Website Creator
AI Genie Review: World’s First Open AI WordPress Website CreatorAI Genie Review: World’s First Open AI WordPress Website Creator
AI Genie Review: World’s First Open AI WordPress Website Creator
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
 
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
 

Improving Tail Latency of Stateful Cloud Services via GC Control and Load Shedding

  • 1. Improving Tail Latency of Stateful Cloud Services via GC Control and Load Shedding Daniel Fireman danielfireman@gmail.com João Brunet, Raquel Lopes, David Quaresma, Thiago Emmanuel Pereira
  • 2. Runtime Environments (RTEs) Manage/Control/Support the program execution ● Platform independence ● Abstractions ○ Memory model ○ Execution ○ Concurrency ● Safety ○ Checks, checks and more checks ○ Automatic memory management ● ...
  • 4. Garbage Collection (GC) impact on latency CPU Competition + Pauses Overhead distribution tail: 0.1% of the requests were impacted
  • 5. Scaled out view of the latency impact 1 - 0.99 100 = 63% Number of endpoints reached to serve one request Considered percentile (when does a peak happen?)
  • 6. Which ends up harming ● User experience ● Predictability ● SLAs
  • 7. You could try ● Tune the GC configuration ● Investigate and fix spots of GC pressure ● Go off-heap ● Switch to manual memory management ● Buy comercial implementations and support
  • 8. If all that is too hard or expensive Avoid collecting garbage while processing requests
  • 11. Components: Proxy ● Intercepts all requests ● Service/RTE agnostic ● Decides when to check heap ● Decides which requests to shed github.com/gcinterceptor/gci-proxy
  • 12. Components: Request Processor (a.k.a Agent) ● Exec. commands from the proxy ● RTE Specific ● Calls RTE’s APIs ○ Check Heap ○ GC github.com/gcinterceptor/gci-{java,ruby, nodejs, go}
  • 13. How GCI works: part 1
  • 14. How GCI works: part 2
  • 15. How GCI works: part 3
  • 16. And more! ● Plug and Play ● Adaptive ● Fully decentralized ● Runtime agnostic ○ uses available APIs to trigger garbage collection and check the heap ● Transport agnostic ○ uses the available interception and load shedding mechanisms to avoid receiving request during collection
  • 18. Research Question Does GCI shorten the tail of the latency distribution of stateful services without significantly penalizing the service throughput? Small (4 nodes) and Large (64 clusters)
  • 21. Results - Large Cluster Does GCI shorten the tail of the latency distribution of stateful services without significantly penalizing the service throughput?
  • 22. Results - Large Cluster ● 99th → 30% ● 99.999th → 47% ● No throughput loss Yes!
  • 23. Conclusions and Next Steps ● Avoid processing requests while GC’ing is effective to improve tail latency of stateful cloud services GCI is an easy-to-use mechanism to achieve that ● Furthermore, it is adaptive, low overhead and fully distributed! Next we would like to: ● Decrease the proxy overhead ● Experiment a more realistic application (e.g. Hazelcast) and load (e.g. YCSB) ● Investigate stateless and DSP cloud applications