Paul Balogh
Developer Advocate, Grafana Labs
@javaducky
Embracing Disruption
Adding a Bit of Chaos to Help You Grow!
Overview
1. Why are we here?
2. A brief history of how we test
3. How fault injection can help us
4. Where do we go from here?
Application reliability is hard
● Complex architecture and infrastructure
● Many potential points of failure
● Inadequate tooling and practices
High demands on availability (SLOs)
High demands on usability (UX)
Distributed
Ever-increasing complexity
(Example of Netflix services)
Fragility
Potentially fateful interdependency
https://xkcd.com/2347/
You are not alone
The way we test
Checklist / OLD WAY
Release frequency: Quarterly or biannually
How it’s initiated: Manually
Testing environment: Test and Production
Testing frequency: Before releases
● QA bottleneck
● Lower coverage
● Late in process
The way we test
Checklist / OLD WAY compared with DevOps / MODERN WAY
Release frequency: quarterly or biannually (old); weekly, daily, as needed (modern)
How it’s initiated: manually (old); scheduled, automatically as part of CI/CD (modern)
Testing environment: test and production (old); staging (long-lived) and ephemeral environments (short-lived) (modern)
Testing frequency: before releases (old); nightly, feature branches, continuous with synthetic monitoring (modern)
The way we test
● Unit testing
● Integration testing
● Contract testing
● Functional testing
● E2E testing
● Load testing
The way we test
● Applications instrumented
● Observability platform available
The way we test
● Start simple
● Test frequently
● Continually expand
● Evolve over time
Yet we learn from failure
Fault Injection
A software testing technique that introduces errors into a system to ensure it can withstand and recover from those conditions.
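The definition above can be made concrete with a small simulation. This is only a sketch in plain JavaScript, not the k6 or xk6-disruptor API: the names `withFaults`, `healthyHandler`, `callWithRetry`, and `sequence` are hypothetical, latency is modelled as a number rather than real waiting, and the "random" source is a fixed sequence so the run is repeatable.

```javascript
// Hypothetical sketch: wrap a handler so some calls fail and all calls are slower.
// Every name in this block is illustrative, not part of any real library.
function withFaults(handler, fault, random = Math.random) {
  return function faultyHandler(request) {
    // Inject an error response for roughly `errorRate` of the calls.
    if (random() < fault.errorRate) {
      return { status: fault.errorCode, body: 'injected fault', latencyMs: fault.addedLatencyMs };
    }
    // Otherwise pass the call through and add the injected latency.
    const response = handler(request);
    return { ...response, latencyMs: response.latencyMs + fault.addedLatencyMs };
  };
}

// Stand-in for the real service: always succeeds in 20 ms.
function healthyHandler(request) {
  return { status: 200, body: `hello ${request.name}`, latencyMs: 20 };
}

// The code under test: it should withstand faults by retrying on 5xx responses.
function callWithRetry(handler, request, maxAttempts) {
  let response;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    response = handler(request);
    if (response.status < 500) return { response, attempts: attempt };
  }
  return { response, attempts: maxAttempts };
}

// Deterministic replacement for Math.random so results are predictable.
function sequence(values) {
  let i = 0;
  return () => values[i++ % values.length];
}

// With errorRate 0.5, the values 0.1 and 0.2 trigger faults and 0.9 does not.
const faulty = withFaults(
  healthyHandler,
  { errorRate: 0.5, errorCode: 503, addedLatencyMs: 100 },
  sequence([0.1, 0.2, 0.9])
);
const result = callWithRetry(faulty, { name: 'k6' }, 3);
console.log(result.attempts, result.response.status, result.response.latencyMs);
// prints: 3 200 120
```

The first two calls receive the injected 503, the third passes through with 100 ms of added latency, so the caller recovers on its third attempt.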
Failure happens
[Diagram labels: DEV and OPS loop with Build, Test, Release, Deploy, Operate, Monitor; Production; Resolve, Inform, Learn]
Build more confidence to withstand failures?
[Diagram labels: Chaos Testing; Shift left; Learn from Incidents; Production systems; Development]
“From the distributed system perspective, almost all interesting availability experiments can be driven by affecting latency or response type.”
- Nora Jones and Casey Rosenthal, Chaos Engineering, O’Reilly
Introducing k6 and xk6-disruptor
k6, a reliability testing tool
● Formerly known as Load Impact
● Open Source since 2016
● ~22.4k GitHub stars (as of January 2024)
● Promotes “shift-left” testing
● Acquired by Grafana Labs in 2021
github.com/grafana/k6
xk6-disruptor for fault injection
● Became a project in August 2022, evolved from previous experiments
github.com/grafana/xk6-disruptor
k6: a reliability testing tool
● Open source: OSS is at the heart of what we do and helps leave the world a little better than we found it
● Scriptable: CLI and API designed for automating your tests with pass/fail criteria using JavaScript syntax
● Performant: A k6 engine written in Go, making it one of the best-performing tools available
● Extensible: Use Go(lang) code to add support for new outputs, protocols, and products from within your test scripts
Testing with errors
How effective is testing known errors?
● 92% of the catastrophic system failures are the result of incorrect handling of non-fatal errors
● In 58% of the cases the resulting faults could have been detected through simple testing of error handling code
Source: “Simple Testing Can Prevent Most Critical Failures: An Analysis of Production Failures in Distributed Data-Intensive Systems”, Yuan et al., USENIX OSDI 2014
How to improve error handling?
In 35% of the cases, error handling code falls into one of three patterns:
● Overreactive: aborts the system under non-fatal errors
● Low context: was empty or only contained a log printing statement
● Incomplete: related comments like “FIXME” or “TODO”
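The three patterns are easier to recognize side by side. The sketch below is a made-up illustration in plain JavaScript, not code from the cited study: `TransientError`, `handled`, and `flaky` are hypothetical names, and bounded retry is just one possible way to handle a known non-fatal error. It ends with the kind of simple error-path test the study refers to.

```javascript
// Hypothetical marker for a known, non-fatal error that is worth retrying.
class TransientError extends Error {}

// Overreactive: aborts the whole process on a non-fatal error.
function overreactive(operation) {
  try { return operation(); } catch (err) { process.exit(1); }
}

// Low context: swallows the error and leaves only a log line behind.
function lowContext(operation) {
  try { return operation(); } catch (err) { console.log('error'); }
}

// Incomplete: the handler is still a placeholder.
function incomplete(operation) {
  try { return operation(); } catch (err) { /* TODO: handle this */ }
}

// One possible improvement: retry transient errors a bounded number of times
// and rethrow everything else with context attached.
function handled(operation, maxAttempts = 3) {
  for (let attempt = 1; ; attempt++) {
    try {
      return operation();
    } catch (err) {
      const retryable = err instanceof TransientError && attempt < maxAttempts;
      if (!retryable) {
        throw new Error(`operation failed after ${attempt} attempt(s): ${err.message}`);
      }
    }
  }
}

// Simple test of the error path: fail twice with a transient error, then succeed.
let calls = 0;
function flaky() {
  calls += 1;
  if (calls < 3) throw new TransientError('temporarily unavailable');
  return 'saved';
}

const outcome = handled(flaky);
console.log(outcome, calls);
// prints: saved 3
```

The first three functions are defined only for comparison and are never called here; `overreactive` in particular would terminate the process.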
Introduce chaos testing
● Incorporate chaos engineering principles early in the development process
● Emphasize verification over experimentation
● Change focus from uncovering unknown faults to ensuring proper handling of known faults
Continually improve reliability
[Diagram labels: Chaos Testing, Chaos Experiments, Incident Enacting; Shift left; Progress towards; Improve; Learn from Incidents; Development; Production systems]
Four tenets of Chaos Testing
● Incremental adoption
● Application centric
● Controlled chaos
● Chaos as code
Chaos Testing in action
OpenTelemetry Demo - Astronomy Shop
● Microservices architecture
● HTTP, gRPC, Kafka between services
● Polyglot (Go, Java, JS, …)
● Kubernetes-ready
https://github.com/open-telemetry/opentelemetry-demo
How would an incident affect our services?
Chaos testing principles in action
● Tests can be reused to validate the system under turbulent conditions
● Conditions are defined in familiar terms: latency and error rate
● Tests have a controlled effect on the target service
● Tests are repeatable with results that are predictable
● Fault injection is coordinated from the test code
● Fault injection should not add any operational complexity
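A rough sketch of how these principles fit together follows. It is a plain JavaScript simulation, not the k6 or xk6-disruptor API: `service`, `disrupt`, `runTest`, and the threshold values are all assumptions made for illustration. The same test function runs against a healthy target and a disrupted one, the fault is described only by added latency and error rate, and every Nth request fails so the result is repeatable.

```javascript
// Illustrative simulation only; none of these names come from k6 or xk6-disruptor.

// Simulated target service: always HTTP 200 in 20 ms.
function service() {
  return { status: 200, latencyMs: 20 };
}

// Fault condition in familiar terms: added latency and error rate.
// Failing every Nth request keeps the effect controlled and repeatable.
function disrupt(target, { addedLatencyMs, errorRate, errorCode }) {
  let calls = 0;
  const failEvery = errorRate > 0 ? Math.round(1 / errorRate) : Infinity;
  return () => {
    calls += 1;
    if (calls % failEvery === 0) {
      return { status: errorCode, latencyMs: addedLatencyMs };
    }
    const response = target();
    return { ...response, latencyMs: response.latencyMs + addedLatencyMs };
  };
}

// The reusable test: same requests and same pass/fail thresholds for any target.
function runTest(target, iterations, thresholds) {
  let failures = 0;
  let maxLatencyMs = 0;
  for (let i = 0; i < iterations; i++) {
    const response = target();
    if (response.status >= 500) failures += 1;
    maxLatencyMs = Math.max(maxLatencyMs, response.latencyMs);
  }
  const errorRate = failures / iterations;
  const passed =
    errorRate <= thresholds.maxErrorRate && maxLatencyMs <= thresholds.maxLatencyMs;
  return { errorRate, maxLatencyMs, passed };
}

// Hypothetical thresholds: at most 5% errors and 200 ms worst-case latency.
const thresholds = { maxErrorRate: 0.05, maxLatencyMs: 200 };

// Fault injection is coordinated from the test code itself.
const baseline = runTest(service, 100, thresholds);
const turbulent = runTest(
  disrupt(service, { addedLatencyMs: 100, errorRate: 0.1, errorCode: 500 }),
  100,
  thresholds
);

console.log('baseline', baseline);
console.log('turbulent', turbulent);
```

In this run the baseline passes, while the disrupted run fails the error-rate threshold because nothing in the simulated caller retries or degrades gracefully. That failing result is the signal a chaos test is meant to surface.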
Reliability testing strategy
● Integration Testing
● Contract Testing
● Browser Automation (E2E)
● Load Testing
● Functional Testing
● Chaos Testing
Proactively improve reliability
[Diagram labels: PRE-PRODUCTION, PRODUCTION; Virtual User traffic, Real User traffic; SUT]
Final remarks
● The ability to operate reliably should not be a privilege of the technology elite
● Chaos Engineering can be democratized by promoting the adoption of Chaos Testing
● To be effective, Chaos Testing must be compatible with the existing testing practices used by development teams
Our Goal
Make Chaos Engineering practices accessible to a broad spectrum of organizations by building a solid foundation from which they can progress towards more reliable applications
Thanks for participating!
Connect with Paul as @javaducky or linkedin/in/pabalogh
k6.io/slack grafana/xk6-disruptor
