SlideShare a Scribd company logo
1 of 53
Chaos Engineering at Jet.com
Rachel Reese | @rachelreese | rachelree.se
Jet Technology | @JetTechnology | tech.jet.com
InfoQ.com: News & Community Site
• 750,000 unique visitors/month
• Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
• Post content from our QCon conferences
• News 15-20 / week
• Articles 3-4 / week
• Presentations (videos) 12-15 / week
• Interviews 2-3 / week
• Books 1 / month
Watch the video with slide
synchronization on InfoQ.com!
http://www.infoq.com/presentations
/jet-microservices-testing
Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
Presented at QCon London
www.qconlondon.com
Why do you need chaos testing?
The world is naturally chaotic
But do we need more testing?
Unit Sanity Random Continuous
UsabilityA/BLocalizationAcceptance
Regression Performance Integration Security
You’ve already tested all your
components in multiple ways.
It’s super important to test the interactions in your
environment
Jet? Jet who?
Taking on Amazon!
Launched July 22
• Both Apple & Android named our
app as one of their tops for 2015
• Over 20k orders per day
• Over 10.5 million SKUs
• #4 marketplace worldwide
• 700 microservices
We’re hiring!
http://jet.com/about-us/working-at-jet
Azure Web sites Cloud
services VMs Service bus
queues
Services
bus topics
Blob storage
Table
storage Queues Hadoop DNS Active
directory
SQL Azure R
F# Paket FSharp.Data Chessie Unquote SQLProvider Python
Deedle
FAK
E
FSharp.Async React Node Angular SAS
Storm Elastic
Search
Xamarin Microservices Consul Kafka PDW
Splunk Redis SQL Puppet Jenkins
Apache
Hive
Apache
Tez
Microservices at Jet
Microservices
• An application of the single responsibility principle at the service level.
• Has an input, produces an output.
Easy scalability
Independent releasability
More even distribution of complexity
Benefits
“A class should have one, and only one, reason to change.”
What is chaos engineering?
It’s just wreaking havoc with your code
for fun, right?
Chaos Engineering is…
Controlled experiments on a distributed system
that help you build confidence in the system’s
ability to tolerate the inevitable failures.
Principles of Chaos Engineering
1. Define “normal”
2. Assume ”normal” will continue in both a control group
and an experimental group.
3. Introduce chaos: servers that crash, hard drives that
malfunction, network connections that are severed, etc.
4. Look for a difference in behavior between the control
group and the experimental group.
Going farther
Build a Hypothesis around Normal Behavior
Vary Real-world Events
Run Experiments in Production
Automate Experiments to Run Continuously
From http://principlesofchaos.org/
Benefits of chaos engineering
Benefits of chaos engineering
You're awake Design for failure
Healthy systems Self service
Current examples of chaos engineering
Maybe you meant Netflix’s Chaos Monkey?
How is Jet different?
We’re not testing in prod (yet).
SQL restarts & geo-replication
Start
- Checks the source db for write access
- Renames db on destination server (to create a new one)
- Creates a geo-replication in the destination region
Stop
- Shuts down cloud services writing to source db
- Sets source db as read-only
- Ends continuous copy
- Allows writes to secondary db
Azure & F#
Why F#?
What FP means to us
Prefer immutability
Avoid state changes,
side effects, and
mutable data
Use data in  data out
transformations
Think about mapping
inputs to outputs.
Look at problems
recursively
Consider successively
smaller chunks of the
same problem
Treat functions as
unit of work
Higher-order functions
The F# solution offers us an order of magnitude
increase in productivity and allows one developer to
perform the work [of] a team of dedicated
developers…
Yan Cui
Lead Server Engineer, Gamesys
“
“ “
Concise and powerful code
public abstract class Transport{ }
public abstract class Car : Transport {
public string Make { get; private set; }
public string Model { get; private set; }
public Car (string make, string model) {
this.Make = make;
this.Model = model;
}
}
public abstract class Bus : Transport {
public int Route { get; private set; }
public Bus (int route) {
this.Route = route;
}
}
public class Bicycle: Transport {
public Bicycle() {
}
}
type Transport =
| Car of Make:string * Model:string
| Bus of Route:int
| Bicycle
C# F#
Trivial to pattern match on!
F#patternmatching
C#
Concise and powerful code
public abstract class Transport{ }
public abstract class Car : Transport {
public string Make { get; private set; }
public string Model { get; private set; }
public Car (string make, string model) {
this.Make = make;
this.Model = model;
}
}
public abstract class Bus : Transport {
public int Route { get; private set; }
public Bus (int route) {
this.Route = route;
}
}
public class Bicycle: Transport {
public Bicycle() {
}
}
type Transport =
| Car of Make:string * Model:string
| Bus of Route:int
| Bicycle
| Train of Line:int
let getThereVia (transport:Transport) =
match transport with
| Car (make,model) -> ...
| Bus route -> ...
| Bicycle -> ...
Warning FS0025: Incomplete pattern
matches on this expression. For example,
the value ’Train' may indicate a case not
covered by the pattern(s)
C# F#
Units of Measure
TickSpec – an F# project
Thanks to Scott Wlaschin for his post, Cycles and modularity in the wild
SpecFlow– a comparable C# project
Thanks to Scott Wlaschin for his post, Cycles and modularity in the wild
Chaos code!
type Input =
| Product of Product
type Output =
| ProductPriceNile of Product * decimal
| ProductPriceCheckFailed of PriceCheckFailed
let handle (input:Input) =
async {
return Some(ProductPriceNile({Sku="343434"; ProductId = 17; ProductDescription = "My
amazing product"; CostPer=1.96M}, 3.96M))
}
let interpret id output =
match output with
| Some (Output.ProductPriceNile (e, price)) -> async {()} // write to event store
| Some (Output.ProductPriceCheckFailed e) -> async {()} // log failure
| None -> async.Return ()
let consume = EventStoreQueue.consume (decodeT Input.Product) handle interpret
What do our services look like?
Define inputs
& outputs
Define how input
transforms to output
Define what to do
with output
Read events,
handle, & interpret
Our code!
let selectRandomInstance compute hostedService = async {
try
let! details = getHostedServiceDetails compute hostedService.ServiceName
let deployment = getProductionDeployment details
let instance = deployment.RoleInstances
|> Seq.toArray
|> randomPick
return details.ServiceName, deployment.Name, instance
with e ->
log.error "Failed selecting random instancen%A" e
reraise e
}
Our code!
let restartRandomInstance compute hostedService = async {
try
let! serviceName, deploymentId, roleInstance =
selectRandomInstance compute hostedService
match roleInstance.PowerState with
| RoleInstancePowerState.Stopped ->
log.info "Service=%s Instance=%s is stopped...ignoring...”
serviceName roleInstance.InstanceName
| _ ->
do! restartInstance compute serviceName deploymentId roleInstance.InstanceName
with e ->
log.error "%s" e.Message
}
Our code!
compute
|> getHostedServices
|> Seq.filter ignoreList
|> knuthShuffle
|> Seq.distinctBy (fun a -> a.ServiceName)
|> Seq.map (fun hostedService -> async {
try
return! restartRandomInstance compute hostedService
with
e -> log.warn "failed: service=%s . %A" hostedService.ServiceName e
return ()
})
|> Async.ParallelIgnore 1
|> Async.RunSynchronously
Has it helped?
Elasticsearch restart
Additional chaos finds
- Redis
- Checkpointing
If availability matters, you should be
testing for it.
Azure + F# + Chaos = <3
Chaos Engineering at Jet.com
Rachel Reese | @rachelreese | rachelree.se
Jet Technology | @JetTechnology | tech.jet.com
Nora Jones | @nora_js
Watch the video with slide synchronization on
InfoQ.com!
http://www.infoq.com/presentations/jet-
microservices-testing

More Related Content

Similar to Microservices Chaos Testing at Jet

Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreC4Media
 
Breaking Dependencies Legacy Code - Cork Software Crafters - September 2019
Breaking Dependencies Legacy Code -  Cork Software Crafters - September 2019Breaking Dependencies Legacy Code -  Cork Software Crafters - September 2019
Breaking Dependencies Legacy Code - Cork Software Crafters - September 2019Paulo Clavijo
 
Going open source with small teams
Going open source with small teamsGoing open source with small teams
Going open source with small teamsJamie Thomas
 
AWS Summit Auckland - Application Delivery Patterns for Developers
AWS Summit Auckland - Application Delivery Patterns for DevelopersAWS Summit Auckland - Application Delivery Patterns for Developers
AWS Summit Auckland - Application Delivery Patterns for DevelopersAmazon Web Services
 
Multilanguage Pipelines with Jenkins, Docker and Kubernetes (Commit Conf 2018)
Multilanguage Pipelines with Jenkins, Docker and Kubernetes (Commit Conf 2018)Multilanguage Pipelines with Jenkins, Docker and Kubernetes (Commit Conf 2018)
Multilanguage Pipelines with Jenkins, Docker and Kubernetes (Commit Conf 2018)Jorge Hidalgo
 
Why Kubernetes? Cloud Native and Developer Experience at Zalando - OWL Tech &...
Why Kubernetes? Cloud Native and Developer Experience at Zalando - OWL Tech &...Why Kubernetes? Cloud Native and Developer Experience at Zalando - OWL Tech &...
Why Kubernetes? Cloud Native and Developer Experience at Zalando - OWL Tech &...Henning Jacobs
 
DockerCon SF 2015: Keynote Day 1
DockerCon SF 2015: Keynote Day 1DockerCon SF 2015: Keynote Day 1
DockerCon SF 2015: Keynote Day 1Docker, Inc.
 
Serverless Single Page Apps with React and Redux at ItCamp 2017
Serverless Single Page Apps with React and Redux at ItCamp 2017Serverless Single Page Apps with React and Redux at ItCamp 2017
Serverless Single Page Apps with React and Redux at ItCamp 2017Melania Andrisan (Danciu)
 
Framework engineering JCO 2011
Framework engineering JCO 2011Framework engineering JCO 2011
Framework engineering JCO 2011YoungSu Son
 
The Modern Tech Stack: Microservices - The Dark Side
The Modern Tech Stack: Microservices - The Dark SideThe Modern Tech Stack: Microservices - The Dark Side
The Modern Tech Stack: Microservices - The Dark SideAggregage
 
Cloud continuous integration- A distributed approach using distinct services
Cloud continuous integration- A distributed approach using distinct servicesCloud continuous integration- A distributed approach using distinct services
Cloud continuous integration- A distributed approach using distinct servicesAndré Agostinho
 
2008 - TechDays PT: Building Software + Services with Volta
2008 - TechDays PT: Building Software + Services with Volta2008 - TechDays PT: Building Software + Services with Volta
2008 - TechDays PT: Building Software + Services with VoltaDaniel Fisher
 
Why we don’t use the Term DevOps: the Journey to a Product Mindset - Destinat...
Why we don’t use the Term DevOps: the Journey to a Product Mindset - Destinat...Why we don’t use the Term DevOps: the Journey to a Product Mindset - Destinat...
Why we don’t use the Term DevOps: the Journey to a Product Mindset - Destinat...Henning Jacobs
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareJustin Basilico
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsC4Media
 
Containers and the Docker EE Difference and usecases
Containers and the Docker EE Difference and usecasesContainers and the Docker EE Difference and usecases
Containers and the Docker EE Difference and usecasesAshnikbiz
 
LINQ Inside
LINQ InsideLINQ Inside
LINQ Insidejeffz
 
Keynote: Trends in Modern Application Development - Gilly Dekel, IBM
Keynote: Trends in Modern Application Development - Gilly Dekel, IBMKeynote: Trends in Modern Application Development - Gilly Dekel, IBM
Keynote: Trends in Modern Application Development - Gilly Dekel, IBMCodemotion Tel Aviv
 
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...MLconf
 

Similar to Microservices Chaos Testing at Jet (20)

Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
 
Breaking Dependencies Legacy Code - Cork Software Crafters - September 2019
Breaking Dependencies Legacy Code -  Cork Software Crafters - September 2019Breaking Dependencies Legacy Code -  Cork Software Crafters - September 2019
Breaking Dependencies Legacy Code - Cork Software Crafters - September 2019
 
Going open source with small teams
Going open source with small teamsGoing open source with small teams
Going open source with small teams
 
AWS Summit Auckland - Application Delivery Patterns for Developers
AWS Summit Auckland - Application Delivery Patterns for DevelopersAWS Summit Auckland - Application Delivery Patterns for Developers
AWS Summit Auckland - Application Delivery Patterns for Developers
 
Multilanguage Pipelines with Jenkins, Docker and Kubernetes (Commit Conf 2018)
Multilanguage Pipelines with Jenkins, Docker and Kubernetes (Commit Conf 2018)Multilanguage Pipelines with Jenkins, Docker and Kubernetes (Commit Conf 2018)
Multilanguage Pipelines with Jenkins, Docker and Kubernetes (Commit Conf 2018)
 
Why Kubernetes? Cloud Native and Developer Experience at Zalando - OWL Tech &...
Why Kubernetes? Cloud Native and Developer Experience at Zalando - OWL Tech &...Why Kubernetes? Cloud Native and Developer Experience at Zalando - OWL Tech &...
Why Kubernetes? Cloud Native and Developer Experience at Zalando - OWL Tech &...
 
DockerCon SF 2015: Keynote Day 1
DockerCon SF 2015: Keynote Day 1DockerCon SF 2015: Keynote Day 1
DockerCon SF 2015: Keynote Day 1
 
Serverless Single Page Apps with React and Redux at ItCamp 2017
Serverless Single Page Apps with React and Redux at ItCamp 2017Serverless Single Page Apps with React and Redux at ItCamp 2017
Serverless Single Page Apps with React and Redux at ItCamp 2017
 
Clean Architecture @ Taxibeat
Clean Architecture @ TaxibeatClean Architecture @ Taxibeat
Clean Architecture @ Taxibeat
 
Framework engineering JCO 2011
Framework engineering JCO 2011Framework engineering JCO 2011
Framework engineering JCO 2011
 
The Modern Tech Stack: Microservices - The Dark Side
The Modern Tech Stack: Microservices - The Dark SideThe Modern Tech Stack: Microservices - The Dark Side
The Modern Tech Stack: Microservices - The Dark Side
 
Cloud continuous integration- A distributed approach using distinct services
Cloud continuous integration- A distributed approach using distinct servicesCloud continuous integration- A distributed approach using distinct services
Cloud continuous integration- A distributed approach using distinct services
 
2008 - TechDays PT: Building Software + Services with Volta
2008 - TechDays PT: Building Software + Services with Volta2008 - TechDays PT: Building Software + Services with Volta
2008 - TechDays PT: Building Software + Services with Volta
 
Why we don’t use the Term DevOps: the Journey to a Product Mindset - Destinat...
Why we don’t use the Term DevOps: the Journey to a Product Mindset - Destinat...Why we don’t use the Term DevOps: the Journey to a Product Mindset - Destinat...
Why we don’t use the Term DevOps: the Journey to a Product Mindset - Destinat...
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.js
 
Containers and the Docker EE Difference and usecases
Containers and the Docker EE Difference and usecasesContainers and the Docker EE Difference and usecases
Containers and the Docker EE Difference and usecases
 
LINQ Inside
LINQ InsideLINQ Inside
LINQ Inside
 
Keynote: Trends in Modern Application Development - Gilly Dekel, IBM
Keynote: Trends in Modern Application Development - Gilly Dekel, IBMKeynote: Trends in Modern Application Development - Gilly Dekel, IBM
Keynote: Trends in Modern Application Development - Gilly Dekel, IBM
 
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
 

More from C4Media

Streaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoStreaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoC4Media
 
Next Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileNext Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileC4Media
 
Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020C4Media
 
Understand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsUnderstand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsC4Media
 
Kafka Needs No Keeper
Kafka Needs No KeeperKafka Needs No Keeper
Kafka Needs No KeeperC4Media
 
High Performing Teams Act Like Owners
High Performing Teams Act Like OwnersHigh Performing Teams Act Like Owners
High Performing Teams Act Like OwnersC4Media
 
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaDoes Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaC4Media
 
Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideC4Media
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDC4Media
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine LearningC4Media
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at SpeedC4Media
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsC4Media
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerC4Media
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleC4Media
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeC4Media
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereC4Media
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing ForC4Media
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data EngineeringC4Media
 
Navigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsNavigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsC4Media
 
High Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechHigh Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechC4Media
 

More from C4Media (20)

Streaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoStreaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live Video
 
Next Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileNext Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy Mobile
 
Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020
 
Understand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsUnderstand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java Applications
 
Kafka Needs No Keeper
Kafka Needs No KeeperKafka Needs No Keeper
Kafka Needs No Keeper
 
High Performing Teams Act Like Owners
High Performing Teams Act Like OwnersHigh Performing Teams Act Like Owners
High Performing Teams Act Like Owners
 
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaDoes Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
 
Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate Guide
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CD
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine Learning
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at Speed
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep Systems
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly Compiler
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix Scale
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's Edge
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home Everywhere
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing For
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Navigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsNavigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery Teams
 
High Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechHigh Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in Adtech
 

Recently uploaded

Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 

Recently uploaded (20)

Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

Microservices Chaos Testing at Jet

  • 1. Chaos Engineering at Jet.com Rachel Reese | @rachelreese | rachelree.se Jet Technology | @JetTechnology | tech.jet.com
  • 2. InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations /jet-microservices-testing
  • 3. Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide Presented at QCon London www.qconlondon.com
  • 4. Why do you need chaos testing?
  • 5. The world is naturally chaotic
  • 6. But do we need more testing? Unit Sanity Random Continuous UsabilityA/BLocalizationAcceptance Regression Performance Integration Security
  • 7. You’ve already tested all your components in multiple ways.
  • 8.
  • 9. It’s super important to test the interactions in your environment
  • 11. Taking on Amazon! Launched July 22 • Both Apple & Android named our app as one of their tops for 2015 • Over 20k orders per day • Over 10.5 million SKUs • #4 marketplace worldwide • 700 microservices We’re hiring! http://jet.com/about-us/working-at-jet
  • 12. Azure Web sites Cloud services VMs Service bus queues Services bus topics Blob storage Table storage Queues Hadoop DNS Active directory SQL Azure R F# Paket FSharp.Data Chessie Unquote SQLProvider Python Deedle FAK E FSharp.Async React Node Angular SAS Storm Elastic Search Xamarin Microservices Consul Kafka PDW Splunk Redis SQL Puppet Jenkins Apache Hive Apache Tez
  • 14. Microservices • An application of the single responsibility principle at the service level. • Has an input, produces an output. Easy scalability Independent releasability More even distribution of complexity Benefits “A class should have one, and only one, reason to change.”
  • 15. What is chaos engineering?
  • 16. It’s just wreaking havoc with your code for fun, right?
  • 17.
  • 18. Chaos Engineering is… Controlled experiments on a distributed system that help you build confidence in the system’s ability to tolerate the inevitable failures.
  • 19.
  • 20. Principles of Chaos Engineering 1. Define “normal” 2. Assume ”normal” will continue in both a control group and an experimental group. 3. Introduce chaos: servers that crash, hard drives that malfunction, network connections that are severed, etc. 4. Look for a difference in behavior between the control group and the experimental group.
  • 21. Going farther Build a Hypothesis around Normal Behavior Vary Real-world Events Run Experiments in Production Automate Experiments to Run Continuously From http://principlesofchaos.org/
  • 22. Benefits of chaos engineering
  • 23. Benefits of chaos engineering You're awake Design for failure Healthy systems Self service
  • 24. Current examples of chaos engineering
  • 25. Maybe you meant Netflix’s Chaos Monkey?
  • 26. How is Jet different?
  • 27. We’re not testing in prod (yet).
  • 28. SQL restarts & geo-replication Start - Checks the source db for write access - Renames db on destination server (to create a new one) - Creates a geo-replication in the destination region Stop - Shuts down cloud services writing to source db - Sets source db as read-only - Ends continuous copy - Allows writes to secondary db
  • 31.
  • 32. What FP means to us Prefer immutability Avoid state changes, side effects, and mutable data Use data in  data out transformations Think about mapping inputs to outputs. Look at problems recursively Consider successively smaller chunks of the same problem Treat functions as unit of work Higher-order functions
  • 33. The F# solution offers us an order of magnitude increase in productivity and allows one developer to perform the work [of] a team of dedicated developers… Yan Cui Lead Server Engineer, Gamesys “ “ “
  • 34. Concise and powerful code public abstract class Transport{ } public abstract class Car : Transport { public string Make { get; private set; } public string Model { get; private set; } public Car (string make, string model) { this.Make = make; this.Model = model; } } public abstract class Bus : Transport { public int Route { get; private set; } public Bus (int route) { this.Route = route; } } public class Bicycle: Transport { public Bicycle() { } } type Transport = | Car of Make:string * Model:string | Bus of Route:int | Bicycle C# F# Trivial to pattern match on!
  • 36. Concise and powerful code public abstract class Transport{ } public abstract class Car : Transport { public string Make { get; private set; } public string Model { get; private set; } public Car (string make, string model) { this.Make = make; this.Model = model; } } public abstract class Bus : Transport { public int Route { get; private set; } public Bus (int route) { this.Route = route; } } public class Bicycle: Transport { public Bicycle() { } } type Transport = | Car of Make:string * Model:string | Bus of Route:int | Bicycle | Train of Line:int let getThereVia (transport:Transport) = match transport with | Car (make,model) -> ... | Bus route -> ... | Bicycle -> ... Warning FS0025: Incomplete pattern matches on this expression. For example, the value ’Train' may indicate a case not covered by the pattern(s) C# F#
  • 38. TickSpec – an F# project Thanks to Scott Wlaschin for his post, Cycles and modularity in the wild
  • 39. SpecFlow– a comparable C# project Thanks to Scott Wlaschin for his post, Cycles and modularity in the wild
  • 41.
  • 42. type Input = | Product of Product type Output = | ProductPriceNile of Product * decimal | ProductPriceCheckFailed of PriceCheckFailed let handle (input:Input) = async { return Some(ProductPriceNile({Sku="343434"; ProductId = 17; ProductDescription = "My amazing product"; CostPer=1.96M}, 3.96M)) } let interpret id output = match output with | Some (Output.ProductPriceNile (e, price)) -> async {()} // write to event store | Some (Output.ProductPriceCheckFailed e) -> async {()} // log failure | None -> async.Return () let consume = EventStoreQueue.consume (decodeT Input.Product) handle interpret What do our services look like? Define inputs & outputs Define how input transforms to output Define what to do with output Read events, handle, & interpret
  • 43. Our code! let selectRandomInstance compute hostedService = async { try let! details = getHostedServiceDetails compute hostedService.ServiceName let deployment = getProductionDeployment details let instance = deployment.RoleInstances |> Seq.toArray |> randomPick return details.ServiceName, deployment.Name, instance with e -> log.error "Failed selecting random instancen%A" e reraise e }
  • 44. Our code! let restartRandomInstance compute hostedService = async { try let! serviceName, deploymentId, roleInstance = selectRandomInstance compute hostedService match roleInstance.PowerState with | RoleInstancePowerState.Stopped -> log.info "Service=%s Instance=%s is stopped...ignoring...” serviceName roleInstance.InstanceName | _ -> do! restartInstance compute serviceName deploymentId roleInstance.InstanceName with e -> log.error "%s" e.Message }
  • 45. Our code! compute |> getHostedServices |> Seq.filter ignoreList |> knuthShuffle |> Seq.distinctBy (fun a -> a.ServiceName) |> Seq.map (fun hostedService -> async { try return! restartRandomInstance compute hostedService with e -> log.warn "failed: service=%s . %A" hostedService.ServiceName e return () }) |> Async.ParallelIgnore 1 |> Async.RunSynchronously
  • 48. Additional chaos finds - Redis - Checkpointing
  • 49.
  • 50. If availability matters, you should be testing for it.
  • 51. Azure + F# + Chaos = <3
  • 52. Chaos Engineering at Jet.com Rachel Reese | @rachelreese | rachelree.se Jet Technology | @JetTechnology | tech.jet.com Nora Jones | @nora_js
  • 53. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/jet- microservices-testing