Distributed system fault injection testing with Docker
Jedrzej Dabrowa
Software Engineer
Engineering optimized for impact
What will I be talking about?
● What fails and why?
● Case study
○ Use of Elasticsearch in Base
○ Elasticsearch Proxy
● Testing microservices
● Fault injection tests toolkit
● Examples
Microservices - quick recap
● Hard to do, even harder to do right
● You get all the benefits of distributed systems…
● … with all the pains:
○ CAP theorem
○ Cloud-native (SCALE!)
○ Network failures (https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing)
○ Hardware failures
○ All kinds of things you would never think about
http://www.theregister.co.uk/2013/06/08/facebook_cloud_versus_cloud/
Elasticsearch proxy - Case study
● Account ID-based sharding
○ Works well for small to medium, similarly sized clients
○ Enter big clients… and problems start
Elasticsearch proxy - Case study
● Need for new solution
○ Solve existing problems...
■ Keep current solution for small accounts
■ Handle big ones differently
○ … as well as those yet to come. And improve!
■ Prioritize interactive traffic for a top-notch user experience
■ Provide SLA/QoS for database access
■ Enable dynamic configuration
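The split between small and big accounts can be pictured as a lookup that sits in front of the existing sharding rule. The sketch below is only an illustration under stated assumptions: the shard count, the routing table, the account IDs, and the target names are all hypothetical, and the slides do not say how the proxy actually makes this decision.

```python
from typing import Dict

# Hypothetical number of shards used by the existing account ID sharding.
NUM_SHARDS = 8

# Hypothetical routing table: big accounts are mapped to dedicated targets.
# In a real system this could be loaded from dynamic configuration.
CUSTOM_ROUTES: Dict[int, str] = {
    1001: "dedicated-index-a",
}


def route(account_id: int) -> str:
    """Return the target for an account.

    Big accounts listed in CUSTOM_ROUTES get custom sharding;
    every other account keeps plain account ID sharding.
    """
    if account_id in CUSTOM_ROUTES:
        return CUSTOM_ROUTES[account_id]
    return f"shard-{account_id % NUM_SHARDS}"
```

Small accounts never notice the extra level of indirection, while a big account can be moved by changing one table entry.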
“All problems in computer science can
be solved by another level of indirection”
(“... except of course for the problem of too many indirections”)
David Wheeler
[Diagram: account ID sharding and custom sharding]
Testing microservices
● Testing a single service is no different from testing a regular application
● Testing a microservices-based system is a whole other story
● New challenges require new approaches
[Figure: Mike Cohn's test pyramid, with cost/effort and execution speed axes and "Our target" marked. Source: https://github.com/xolvio/qualityfaster]
Testing microservices
● Welcome to the non-deterministic world
● Sources of complexity:
○ System space complexity
○ Fault space complexity
● Impossible to efficiently explore every possibility
Testing microservices
● Interesting example: Netflix
● Chaos Monkey
● Mess with production (!)
● Lineage-driven fault injection
○ https://people.ucsc.edu/~palvaro/socc16.pdf
● Found problem in critical place (App Boot):
○ Brute force exploration would take ~2^100 iterations
○ 5 potential failures found in ~200 experiments
● http://techblog.netflix.com/2016/01/automated-failure-testing.html
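To see why the fault space cannot be explored exhaustively, consider a simplified model in which each of n components either fails or does not, giving 2^n combinations. The sketch below is an illustration of that counting argument only; the function names are hypothetical, and it is not the lineage-driven approach itself.

```python
from itertools import combinations
from typing import Iterable, Iterator, Tuple


def fault_space_size(n_components: int) -> int:
    """Number of distinct failure combinations when each component
    can independently be either healthy or failed."""
    return 2 ** n_components


def fault_sets(components: Iterable[str], max_faults: int) -> Iterator[Tuple[str, ...]]:
    """Enumerate failure combinations with at most max_faults failed
    components, a common way to bound the search."""
    items = list(components)
    for k in range(max_faults + 1):
        yield from combinations(items, k)
```

Ten components already give 1,024 combinations; limiting experiments to single failures reduces that to 11 (including the no-failure case), at the cost of missing bugs that need several simultaneous faults.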
ES Proxy Fault Injection tests
● Need: environment
● Solution: Docker
○ Docker Compose
○ Makes it easy to set up the whole environment
○ A relatively complex system can be hosted on a single PC
○ Nice, declarative configuration through compose files
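The slides do not include the actual compose file. As a rough sketch of what such a declarative environment description might contain, the snippet below builds a compose-style structure in Python. The service names, image names, and the `example.invalid` registry are all assumptions made for illustration; only the `version`, `services`, `image`, and `depends_on` keys are taken from the compose file format.

```python
import json


def compose_config() -> dict:
    """Build a hypothetical compose-style description of a test environment."""
    return {
        "version": "2",
        "services": {
            # Image names below are placeholders, not the ones used in the talk.
            "zookeeper": {"image": "zookeeper"},
            "elasticsearch": {"image": "elasticsearch"},
            "es-proxy": {
                "image": "example.invalid/es-proxy:latest",
                "depends_on": ["zookeeper", "elasticsearch"],
            },
        },
    }


if __name__ == "__main__":
    # JSON is a subset of YAML, so this output can be read by YAML tooling.
    print(json.dumps(compose_config(), indent=2))
```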
ES Proxy Fault Injection tests
● Need: microservices binaries
● Solution: Amazon ECR
○ Allows teams to share their services
○ Current version easily recognizable
ES Proxy Fault Injection tests
● Need: Harness tool
● Solution: BATS
○ https://github.com/sstephenson/bats
○ BASH-based tool // :(
○ TAP-compliant
○ Simple
○ Lets you implement isolated test scenarios and set up the environment
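For readers unfamiliar with TAP (Test Anything Protocol), the output is a plan line followed by one `ok` / `not ok` line per test. The sketch below produces that shape in Python purely to illustrate the format; it is not BATS, and the scenario descriptions in the usage example are hypothetical.

```python
from typing import List, Tuple


def tap_report(results: List[Tuple[str, bool]]) -> str:
    """Render (description, passed) pairs as a minimal TAP report."""
    lines = [f"1..{len(results)}"]  # the plan: how many tests follow
    for number, (description, passed) in enumerate(results, start=1):
        status = "ok" if passed else "not ok"
        lines.append(f"{status} {number} - {description}")
    return "\n".join(lines)
```

Because the format is plain text with a fixed shape, any TAP consumer (for example a CI server plugin) can read the results regardless of which tool produced them.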
ES Proxy Fault Injection tests
● Need: a way to inject faults
● Solution: Pumba tool
○ https://github.com/gaia-adm/pumba
○ Based on netem
○ Modifies egress traffic
○ Supports various delay/loss models
○ Relatively fresh (current version: 0.2.6)
ES Proxy Fault Injection tests scenarios
● So… what do we test?
○ Remember the first slide?
○ Focus on the most obvious failure points first
○ What is the potential problem? How should the application behave?
● Scenarios:
○ ES Proxy loses ZK connection
○ ZK partition / quorum loss
○ There are delays in the network
○ Packets are dropped
○ Packets are reordered
Enable global 800 ms delay
Enable global 8% packet loss
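The slides do not show the exact commands behind these two steps. Since Pumba is described as based on netem, the sketch below builds the equivalent plain `tc` netem command lines for a delay and a loss rule. The helper names and the `eth0` interface are assumptions, and the functions only construct the argument lists; actually running them requires network administration privileges in the target container's network namespace.

```python
from typing import List


def netem_delay(interface: str, delay_ms: int) -> List[str]:
    """Build a tc command that delays all egress packets on the interface."""
    return ["tc", "qdisc", "add", "dev", interface, "root",
            "netem", "delay", f"{delay_ms}ms"]


def netem_loss(interface: str, percent: float) -> List[str]:
    """Build a tc command that randomly drops a percentage of egress packets."""
    return ["tc", "qdisc", "add", "dev", interface, "root",
            "netem", "loss", f"{percent:g}%"]


def netem_clear(interface: str) -> List[str]:
    """Build a tc command that removes the netem rule again."""
    return ["tc", "qdisc", "del", "dev", interface, "root", "netem"]
```

For the values on the slide, `netem_delay("eth0", 800)` and `netem_loss("eth0", 8)` give the delay and loss rules; a test would apply one, exercise the system, then call the clear command in teardown so scenarios stay isolated.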
Conclusions
● Failures will happen
● Proper design that keeps fault tolerance in mind gives a pretty good level of confidence
● Fault injection tests improve software reliability in a way that cannot be achieved through other kinds of tests
● Those tests are only as good as you want/need them to be - they’re not exhaustive
Questions?
jedrzej.dabrowa@getbase.com
lab.getbase.com/java
@JeDabrowa @getbaselab
JDD 2016 - Jedrzej Dabrowa - Distributed System Fault Injection Testing With Docker

  • 1. Distributed system fault injection testing with Docker. Jedrzej Dabrowa, Software Engineer
  • 2. What will I be talking about?
      ● What fails and why?
      ● Case study
        ○ Use of Elasticsearch in Base
        ○ Elasticsearch Proxy
      ● Testing microservices
      ● Fault injection tests toolkit
      ● Examples
  • 3-10. Microservices - quick recap
      ● Hard to do, even harder to do right
      ● You get all the benefits of distributed systems…
      ● …with all the pains:
        ○ CAP theorem
        ○ Cloud-native (SCALE!)
        ○ Network failures (https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing)
        ○ Hardware failures
        ○ All kinds of stuff you would never think about
  • 11. http://www.theregister.co.uk/2013/06/08/facebook_cloud_versus_cloud/
  • 12-14. Elasticsearch proxy - Case study
      ● Account ID-based sharding
        ○ Works well for small to medium, similarly sized clients
        ○ Enter big clients… and problems start
  • 17. Elasticsearch proxy - Case study
      ● Need for a new solution
        ○ Solve existing problems...
          ■ Keep current solution for small accounts
          ■ Handle big ones differently
        ○ … as well as those yet to come. And improve!
          ■ Prioritize interactive traffic for top-notch user experience
          ■ Provide SLA/QoS for database access
          ■ Enable dynamic configuration
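
The idea of keeping account ID sharding for small accounts while handling big ones differently could be sketched as below. This is only an illustration, not the ES Proxy implementation: the function name `route`, the `overrides` table, and the `shard-N` / `dedicated-N` naming are all hypothetical, and modulo sharding is an assumed stand-in for whatever scheme Base actually used.

```python
from typing import Dict


def route(account_id: int, num_shards: int, overrides: Dict[int, str]) -> str:
    """Pick a target for an account.

    Accounts listed in `overrides` (the "big" clients) get a dedicated
    target; everyone else falls back to plain account ID sharding.
    """
    if account_id in overrides:
        return overrides[account_id]
    # Assumed default: modulo sharding on the account ID.
    return f"shard-{account_id % num_shards}"


# Hypothetical configuration: account 42 is a big client.
OVERRIDES = {42: "dedicated-42"}

print(route(7, 4, OVERRIDES))   # shard-3
print(route(42, 4, OVERRIDES))  # dedicated-42
```

Keeping the override table as plain data is one way such a routing layer could support the dynamic configuration mentioned on the slide.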
  • 18. “All problems in computer science can be solved by another level of indirection” (“... except of course for the problem of too many indirections”) (David Wheeler)
  • 19. [Diagram labels: account ID sharding; custom sharding]
  • 20. Testing microservices
      ● Testing a single service is no different from testing a regular application
      ● Testing a microservices-based system is a whole other story
      ● New challenges require new approaches
      [Figure labels: "Mike Cohn", "Our target"; axes: cost/effort, execution speed. Source: https://github.com/xolvio/qualityfaster]
  • 21-24. Testing microservices
      ● Welcome to the non-deterministic world
      ● Sources of complexity:
        ○ System space complexity
        ○ Fault space complexity
      ● Impossible to efficiently explore every possibility
  • 25. Testing microservices
      ● Interesting example: Netflix
      ● Chaos Monkey
      ● Mess with production (!)
      ● Lineage-driven fault injection
        ○ https://people.ucsc.edu/~palvaro/socc16.pdf
      ● Found problem in critical place (App Boot):
        ○ Brute force exploration would take ~2^100 iterations
        ○ 5 potential failures found in ~200 experiments
      ● http://techblog.netflix.com/2016/01/automated-failure-testing.html
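
To see why exploring every possibility is infeasible, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that the system has a number of independent failure points that each either fail or not, so the fault space has 2^n combinations; the function names and the rate of 1000 experiments per second are made up.

```python
def fault_combinations(failure_points: int) -> int:
    """Number of fail/ok assignments over independent binary failure points."""
    return 2 ** failure_points


def years_to_explore(failure_points: int, experiments_per_second: float) -> float:
    """Time needed to run every combination once, in years."""
    seconds = fault_combinations(failure_points) / experiments_per_second
    return seconds / (365 * 24 * 3600)


print(fault_combinations(10))  # 1024
# Assumed numbers: 100 failure points, 1000 experiments per second.
print(f"{years_to_explore(100, 1000.0):.2e} years")
```

Even at this optimistic rate the exhaustive run takes astronomically long, which is why targeted approaches matter.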
  • 26. ES Proxy Fault Injection tests
      ● Need: environment
      ● Solution: Docker
        ○ Docker Compose
        ○ Makes it easy to set up the whole environment
        ○ A relatively complex system can be hosted on a single PC
        ○ Nice, declarative configuration through compose files
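
A declarative environment of this kind might look roughly like the sketch below, which builds a Compose configuration as a Python dictionary and prints it as JSON (Compose accepts JSON files, since JSON is valid YAML). The service names, the image references, and especially `example.invalid/es-proxy:latest` are placeholders, not the setup used in the talk.

```python
import json


def compose_config() -> dict:
    """Build a minimal Compose (file format version 2) configuration.

    All service names and images are hypothetical placeholders.
    """
    return {
        "version": "2",
        "services": {
            "zookeeper": {"image": "zookeeper"},
            "elasticsearch": {"image": "elasticsearch"},
            "es-proxy": {
                # Placeholder image reference; a real setup would point
                # at the team's own registry.
                "image": "example.invalid/es-proxy:latest",
                "depends_on": ["zookeeper", "elasticsearch"],
            },
        },
    }


# The output can be saved as e.g. docker-compose.json and passed to
# docker-compose with the -f option.
print(json.dumps(compose_config(), indent=2))
```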
  • 28. ES Proxy Fault Injection tests
      ● Need: microservices binaries
      ● Solution: Amazon ECR
        ○ Allows teams to share their services
        ○ Current version easily recognizable
  • 30. ES Proxy Fault Injection tests
      ● Need: Harness tool
      ● Solution: BATS
        ○ https://github.com/sstephenson/bats
        ○ BASH-based tool // :(
        ○ TAP-compliant
        ○ Simple
        ○ Lets you implement isolated test scenarios and set up the environment
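
Because the harness output is TAP-compliant, results are easy to post-process. The sketch below parses TAP result lines (`ok N description` / `not ok N description`) into tuples. The parser and the sample test names are illustrative assumptions, not part of BATS or of the talk's test suite, and the parser ignores TAP directives such as `# skip`.

```python
import re
from typing import List, Tuple

# Matches "ok 1 description" and "not ok 2 - description".
_RESULT = re.compile(r"^(not ok|ok)\s+(\d+)\s*(?:-\s*)?(.*)$")


def parse_tap(output: str) -> List[Tuple[int, bool, str]]:
    """Return (test number, passed, description) for each TAP result line."""
    results = []
    for line in output.splitlines():
        match = _RESULT.match(line.strip())
        if match:
            status, number, description = match.groups()
            results.append((int(number), status == "ok", description))
    return results


# Hypothetical harness output; the test names are made up.
SAMPLE = """1..2
ok 1 proxy answers while ZK is reachable
not ok 2 proxy keeps serving after ZK connection loss
# (diagnostic lines start with a hash and are skipped)
"""

for number, passed, description in parse_tap(SAMPLE):
    print(number, "PASS" if passed else "FAIL", description)
```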
  • 32. ES Proxy Fault Injection tests
      ● Need: possibility to inject fault
      ● Solution: Pumba tool
        ○ https://github.com/gaia-adm/pumba
        ○ Based on netem
        ○ Modifies egress traffic
        ○ Supports various delay/loss models
        ○ Relatively fresh (current version: 0.2.6)
  • 34-37. ES Proxy Fault Injection test scenarios
      ● So… what do we test?
        ○ Remember the first slide?
        ○ Focus on the most obvious failure points first
        ○ What is the potential problem? How should the application behave?
      ● Scenarios shown:
        ○ ES Proxy loses ZK connection
        ○ ZK partition / quorum loss
        ○ There are delays in the network; packets are dropped; packets are reordered
  • 38-39. Enable global 800 ms delay. Enable global 8% packet loss.
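
For reference, the two faults named above correspond to standard netem rules. The sketch below only builds the `tc` command lines as argument lists; it does not run them. It is not Pumba's own CLI (the slides do not show how Pumba was invoked), and the interface name `eth0` and the helper names are assumptions.

```python
from typing import List


def netem_delay(interface: str, delay_ms: int) -> List[str]:
    """tc command adding a fixed egress delay on the interface."""
    return ["tc", "qdisc", "add", "dev", interface, "root",
            "netem", "delay", f"{delay_ms}ms"]


def netem_loss(interface: str, loss_percent: float) -> List[str]:
    """tc command dropping a percentage of egress packets."""
    return ["tc", "qdisc", "add", "dev", interface, "root",
            "netem", "loss", f"{loss_percent:g}%"]


def netem_clear(interface: str) -> List[str]:
    """tc command removing the netem rule again."""
    return ["tc", "qdisc", "del", "dev", interface, "root", "netem"]


# "eth0" is an assumed interface name inside the target container.
print(" ".join(netem_delay("eth0", 800)))  # tc qdisc add dev eth0 root netem delay 800ms
print(" ".join(netem_loss("eth0", 8)))     # tc qdisc add dev eth0 root netem loss 8%
print(" ".join(netem_clear("eth0")))       # tc qdisc del dev eth0 root netem
```

Running such commands needs sufficient privileges inside the container's network namespace, and they shape egress traffic only, matching the limitation noted for Pumba.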
  • 40-43. Conclusions
      ● Failures will happen
      ● Proper design and keeping fault tolerance in mind give a pretty good level of confidence
      ● Fault injection tests improve software reliability in a way that cannot be achieved through other kinds of tests
      ● Those tests are only as good as you want/need them to be - they’re not exhaustive
  • 45. jedrzej.dabrowa@getbase.com, lab.getbase.com/java, @JeDabrowa, @getbaselab