Expect the unexpected:
Anticipate and prepare for failures in
microservices-based architectures
Bhakti Mehta
@bhakti_mehta
Introduction
• Senior Software Engineer at Blue Jeans Network
• Worked at Sun Microsystems/Oracle for 13 years
• Committer to numerous open source projects, including the GlassFish Application Server
My recent book
Previous book
Blue Jeans Network
• Video conferencing in the cloud
• 4000+ customers
• Millions of users
What you will learn
• Monoliths vs. microservices
• Challenges at scale
• Preventing cascading failures
• Resilience planning at various stages
• Dealing with latencies in response
• Real world examples
Monolithic Service
[Diagram: Billing, Notification, Provisioning accounts and Meeting bundled together in a single service]
Scaling monolithic service
Microservices
[Diagram: Billing, Provisioning accounts, Notification and Meeting deployed as separate services]
A microservices-based application puts each element of functionality into a separate service.
Scaling microservices
Microservices
• Advantages
– Simplicity
– Isolation of problems
– Scale up and scale down
– Easy deployment
– Clear separation of concerns
– Heterogeneity and polyglotism
Microservices
• Disadvantages
– Not a free lunch!
– Distributed systems are prone to failures
– Eventual consistency
– More effort for deployments and release management
– Challenges in testing services that evolve independently, regression tests, etc.
API Gateway
Resilient system
• Processes transactions even when there are transient impulses and persistent stresses
• Functions even when component failures disrupt normal processing
• Accepts that failures will happen
• Designs for crumple zones
Kinds of failures
• Challenges at scale
• Integration point failures
– Network errors
– Semantic errors
– Slow responses
– Outright hang
– GC issues
Anticipate failures at scale
• Anticipate growth
• Design for next order of magnitude
• Design for 10x, plan to rewrite for 100x
Resiliency planning Stage 1
• When developing code
– Avoid cascading failures
• Circuit breaker
• Timeouts
• Retry
• Bulkhead
• Cache optimizations
– Avoid malicious clients
• Rate limiting
Resiliency planning Stage 2
• Planning for failures before deploy
– Load testing
– A/B testing
– Longevity testing
Resiliency planning Stage 3
• Watching out for failures after deploy
– health check
– metrics
Cascading failures
• Caused by chain reactions
• For example:
– One node in a load-balanced group fails
– The other nodes need to pick up its work
– Eventually performance can degrade across the whole group
Cascading failures with aggregation
Cascading failure with aggregation
Timeouts
• Clients may prefer a prompt response of any kind over waiting indefinitely:
– failure
– success
– job queued for later
All aggregation requests to microservices should have reasonable timeouts set.
Types of Timeouts
• Connection timeout
– Maximum time to wait for a connection to be established before failing with an error
• Socket timeout
– Maximum time of inactivity between two data packets once the connection is established
Timeouts pattern
• Timeouts and retries go together
• Transient failures can be remedied with fast retries
• However, network problems can last for a while, so the retries are also likely to fail
Timeouts in code
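In JAX-RS, using the Jersey client (ClientProperties is Jersey-specific; values are in milliseconds):
import javax.ws.rs.client.Client;
import javax.ws.rs.client.ClientBuilder;
import org.glassfish.jersey.client.ClientProperties;
Client client = ClientBuilder.newClient();
client.property(ClientProperties.CONNECT_TIMEOUT, 5000); // connection timeout
client.property(ClientProperties.READ_TIMEOUT, 5000);    // socket (read) timeout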
Retry pattern
• Retry in case of network failures, timeouts or server errors
• Helps with transient network errors such as dropped connections or server failover
Retry pattern
• If one of the services is slow or malfunctioning and other services keep retrying, the problem becomes worse
• Solution
– Exponential backoff
– Circuit breaker pattern
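A minimal sketch of retry with exponential backoff in Java, assuming the remote call is wrapped in a Callable and that every exception is worth retrying. RetryWithBackoff, maxAttempts and baseDelayMillis are hypothetical names chosen for illustration, not part of any library.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper: retries a call, doubling the wait after every failure. */
final class RetryWithBackoff {
    private RetryWithBackoff() {}

    static <T> T call(Callable<T> task, int maxAttempts, long baseDelayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                if (attempt == maxAttempts - 1) {
                    break; // out of attempts: do not sleep, rethrow below
                }
                // Exponential backoff: wait base * 2^attempt, plus random jitter so that
                // many callers do not retry in lockstep against a struggling service.
                long backoff = baseDelayMillis << attempt;
                long jitter = ThreadLocalRandom.current().nextLong(baseDelayMillis + 1);
                Thread.sleep(backoff + jitter);
            }
        }
        throw last;
    }
}
```

Only idempotent calls should be retried this way, and a production version would also cap the maximum delay. The jitter keeps a fleet of clients from hitting a recovering service at the same instant.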
Circuit breaker pattern
A circuit breaker is an electrical device used in an electrical panel that monitors and controls the amount of current (amperes, or amps) flowing through a circuit
Circuit breaker pattern
• Safety device
• If a power surge occurs in the electrical wiring, the breaker trips
• It flips from “On” to “Off” and cuts off electrical power from that breaker
Circuit breaker
• Netflix Hystrix follows the circuit breaker pattern
• If a service’s error rate exceeds a threshold, Hystrix trips the circuit breaker and blocks requests to that service for a specific period of time
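To make the states concrete, here is a minimal circuit breaker sketch in Java. It is not Hystrix's implementation: Hystrix trips on an error percentage over a rolling window, while this hypothetical CircuitBreaker class trips after a fixed number of consecutive failures, fails fast with a fallback while open, and lets one trial request through after a cool-down (half-open). The clock is injected as a LongSupplier so the behaviour can be tested without sleeping.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical minimal circuit breaker (not thread-safe; illustration only).
 * Trips after a number of consecutive failures rather than on an error percentage.
 */
final class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openMillis;
    private final LongSupplier clock; // injectable so tests can control time
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    CircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    State state() {
        return state;
    }

    /** Runs the remote call, or returns the fallback at once while the circuit is open. */
    <T> T call(Supplier<T> remoteCall, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < openMillis) {
                return fallback.get();     // fail fast: do not touch the struggling service
            }
            state = State.HALF_OPEN;       // cool-down elapsed: let one trial request through
        }
        try {
            T result = remoteCall.get();
            state = State.CLOSED;          // success closes the circuit again
            consecutiveFailures = 0;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;        // trip the breaker
                openedAt = clock.getAsLong();
            }
            return fallback.get();
        }
    }
}
```

Outside of tests the clock would typically be System::currentTimeMillis, for example new CircuitBreaker(5, 30_000, System::currentTimeMillis).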
Bulkhead
• Avoiding chain reactions by isolating failures
• Helps prevent cascading failures
Bulkhead
• An example of a bulkhead is isolating the database dependencies per service
• Similarly, other infrastructure components, such as the cache infrastructure, can be isolated
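One common way to build a bulkhead in code is a separate semaphore per dependency. The sketch below assumes a hypothetical Bulkhead class with one instance per downstream resource (for example one for the billing database and one for the cache); when a compartment is full, the call is rejected immediately instead of tying up more threads.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/**
 * Hypothetical semaphore-based bulkhead: each downstream dependency gets its own
 * small pool of permits, so a slow dependency can exhaust only its own compartment.
 */
final class Bulkhead {
    private final Semaphore permits;

    Bulkhead(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    <T> T call(Supplier<T> task, Supplier<T> whenFull) {
        if (!permits.tryAcquire()) {
            return whenFull.get();         // compartment full: reject instead of piling up
        }
        try {
            return task.get();
        } finally {
            permits.release();             // always hand the permit back
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

Usage would look like new Bulkhead(10) for the billing database and a separate new Bulkhead(20) for the cache, so saturating one never blocks the other.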
Rate Limiting
• Restricting the number of requests that can be made by a client
• A client can be identified by the access token it uses
• Additionally, clients can be identified by IP address
Rate Limiting
• With JAX-RS, rate limiting can be implemented as a filter
• The filter checks the access count for a client and accepts the request if it is within the limit
• Otherwise, it responds with HTTP 429 (Too Many Requests)
• Code at https://github.com/bhakti-mehta/samples/tree/master/ratelimiting
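The linked repository contains the author's filter, which is not reproduced here. As a sketch of the counting step such a filter could perform, the hypothetical FixedWindowRateLimiter below allows a fixed number of requests per client per time window, using plain Java only. In a JAX-RS ContainerRequestFilter, a false result would translate into aborting the request with ContainerRequestContext.abortWith and a 429 response.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/**
 * Hypothetical fixed-window rate limiter keyed by client identity (for example the
 * access token, or the IP address when no token is present). Allows at most
 * `limit` requests per client in each window of `windowMillis`.
 */
final class FixedWindowRateLimiter {
    private static final class Window {
        long start;
        int count;

        Window(long start) {
            this.start = start;
        }
    }

    private final int limit;
    private final long windowMillis;
    private final LongSupplier clock;
    private final Map<String, Window> windows = new ConcurrentHashMap<>();

    FixedWindowRateLimiter(int limit, long windowMillis, LongSupplier clock) {
        this.limit = limit;
        this.windowMillis = windowMillis;
        this.clock = clock;
    }

    /** Returns true if the request may proceed, false if the client should get HTTP 429. */
    boolean tryAcquire(String clientId) {
        long now = clock.getAsLong();
        Window w = windows.computeIfAbsent(clientId, id -> new Window(now));
        synchronized (w) {
            if (now - w.start >= windowMillis) {   // window expired: start a fresh one
                w.start = now;
                w.count = 0;
            }
            if (w.count >= limit) {
                return false;
            }
            w.count++;
            return true;
        }
    }
}
```

The per-client map grows without bound in this sketch; a real limiter would evict idle clients, or keep counters in a shared store so limits hold across service instances.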
Cache optimizations
• Stores response information for requests in temporary storage for a specific period of time
• Ensures that the server is not burdened with processing those requests again when the responses can be served from the cache
Cache optimizations
[Diagram: lookup order, first the first-level cache, then the second-level cache, and finally the DB]
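The lookup order in the diagram can be sketched as a read-through cache. In this hypothetical TwoLevelCache, the first level is an in-process map with a time-to-live, the second level is a plain Map standing in for a shared, out-of-process cache, and a Function stands in for the database query.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical two-level read-through cache: local TTL map, shared map, then the DB. */
final class TwoLevelCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long expiresAt;

        Entry(T value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<K, Entry<V>> firstLevel = new ConcurrentHashMap<>();
    private final Map<K, V> secondLevel;      // expiry assumed to be handled by the shared cache
    private final Function<K, V> database;
    private final long ttlMillis;
    private final LongSupplier clock;

    TwoLevelCache(Map<K, V> secondLevel, Function<K, V> database, long ttlMillis, LongSupplier clock) {
        this.secondLevel = secondLevel;
        this.database = database;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> local = firstLevel.get(key);
        if (local != null && now < local.expiresAt) {
            return local.value;                   // 1. first-level hit: no network hop
        }
        V value = secondLevel.get(key);           // 2. second-level lookup
        if (value == null) {
            value = database.apply(key);          // 3. miss everywhere: go to the DB
            if (value != null) {
                secondLevel.put(key, value);
            }
        }
        if (value != null) {
            firstLevel.put(key, new Entry<>(value, now + ttlMillis));
        }
        return value;
    }
}
```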
Dealing with latencies in response
• Have a timeout for the aggregation service
• Dispatch requests in parallel and collect the responses
• Associate a priority with each of the collected responses
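A sketch of these three steps with CompletableFuture (completeOnTimeout needs Java 9+). Everything here is hypothetical: Source pairs a downstream call with a priority, sources that miss the deadline or fail are dropped, and the surviving responses are sorted by priority, so the aggregate degrades instead of failing outright.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical aggregator: fan out in parallel, wait at most a deadline, rank by priority. */
final class Aggregator {
    /** A downstream call and the priority of its response (lower number = more important). */
    static final class Source {
        final String name;
        final int priority;
        final Supplier<String> call;

        Source(String name, int priority, Supplier<String> call) {
            this.name = name;
            this.priority = priority;
            this.call = call;
        }
    }

    static final class Result {
        final String name;
        final int priority;
        final String body;

        Result(String name, int priority, String body) {
            this.name = name;
            this.priority = priority;
            this.body = body;
        }
    }

    static List<Result> aggregate(List<Source> sources, long timeoutMillis, Executor executor) {
        List<CompletableFuture<Result>> futures = new ArrayList<>();
        for (Source s : sources) {
            futures.add(CompletableFuture
                    .supplyAsync(() -> new Result(s.name, s.priority, s.call.get()), executor)
                    // Too slow: complete with null instead of waiting any longer.
                    .completeOnTimeout(null, timeoutMillis, TimeUnit.MILLISECONDS)
                    // Failed: drop this source rather than failing the whole aggregate.
                    .exceptionally(ex -> null));
        }
        List<Result> results = new ArrayList<>();
        for (CompletableFuture<Result> f : futures) {
            Result res = f.join();                // bounded by the timeout above
            if (res != null) {
                results.add(res);
            }
        }
        results.sort(Comparator.comparingInt((Result r) -> r.priority));
        return results;
    }
}
```

Calls that time out keep running in the background, so the underlying HTTP client should still have its own connection and read timeouts set.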
Handling partial failures best practices
• One service calls another, which can be slow or unavailable
• Never block indefinitely waiting for the service
• Try to return partial results
• Provide a caching layer and return cached data
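A minimal sketch of the last three points, assuming an ExecutorService is available for outbound calls: the hypothetical FallbackClient waits a bounded time for fresh data, remembers the last good response per key, and returns it (or an empty Optional) when the call is slow or fails.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Hypothetical wrapper: bounded wait on a remote call, then fall back to the last good response. */
final class FallbackClient {
    private final Map<String, String> lastKnownGood = new ConcurrentHashMap<>();
    private final ExecutorService executor;
    private final long timeoutMillis;

    FallbackClient(ExecutorService executor, long timeoutMillis) {
        this.executor = executor;
        this.timeoutMillis = timeoutMillis;
    }

    /** Fresh data if the call succeeds in time, else cached data, else empty. */
    Optional<String> fetch(String key, Supplier<String> remoteCall) {
        Callable<String> task = remoteCall::get;
        Future<String> future = executor.submit(task);
        try {
            String fresh = future.get(timeoutMillis, TimeUnit.MILLISECONDS); // never wait forever
            if (fresh == null) {
                return Optional.ofNullable(lastKnownGood.get(key));
            }
            lastKnownGood.put(key, fresh);
            return Optional.of(fresh);
        } catch (TimeoutException | ExecutionException e) {
            future.cancel(true);                                  // stop waiting on the slow call
            return Optional.ofNullable(lastKnownGood.get(key));   // stale data beats no data
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();                   // preserve the interrupt flag
            return Optional.ofNullable(lastKnownGood.get(key));
        }
    }
}
```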
Asynchronous Patterns
• Pattern to deal with long-running jobs
• Some resources may take a long time to provide results
• The client does not need to wait for the response
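A common shape for this pattern is to accept the job, return an identifier immediately (an HTTP API would typically answer 202 Accepted), and let the client poll for the result. The hypothetical JobService below sketches that with an in-memory map of CompletableFutures; a real service would persist job state so it survives restarts.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Supplier;

/** Hypothetical job registry: submit() returns at once, clients poll status() and result(). */
final class JobService {
    enum Status { PENDING, DONE, FAILED, UNKNOWN }

    private final Map<String, CompletableFuture<String>> jobs = new ConcurrentHashMap<>();
    private final Executor executor;

    JobService(Executor executor) {
        this.executor = executor;
    }

    /** Starts the work in the background and returns a job ID without waiting. */
    String submit(Supplier<String> longRunningWork) {
        String jobId = UUID.randomUUID().toString();
        jobs.put(jobId, CompletableFuture.supplyAsync(longRunningWork, executor));
        return jobId;
    }

    Status status(String jobId) {
        CompletableFuture<String> f = jobs.get(jobId);
        if (f == null) {
            return Status.UNKNOWN;
        }
        if (!f.isDone()) {
            return Status.PENDING;
        }
        return f.isCompletedExceptionally() ? Status.FAILED : Status.DONE;
    }

    /** The job's result if it finished successfully, otherwise null. */
    String result(String jobId) {
        CompletableFuture<String> f = jobs.get(jobId);
        return (f != null && f.isDone() && !f.isCompletedExceptionally()) ? f.join() : null;
    }
}
```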
Reactive programming model
• Use reactive programming constructs such as CompletableFuture in Java 8 or Guava’s ListenableFuture
• RxJava
Asynchronous API
• Reactive patterns
• Message Passing
– Akka actor model
• Message queues
– Communication between services via shared
message queues
– WebSockets
Logging
• Complex distributed systems introduce many points of failure
• Logging helps link events/transactions across the various components that make up an application or a business service
• ELK stack
• Splunk, syslog
• Loggly
• LogEntries
Logging best practices
• Use a detailed, consistent pattern across service logs
• Obfuscate sensitive data
• Identify the caller or initiator in every log entry
• Do not log payloads by default
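These practices can be made concrete with a single formatter shared by every service. This hypothetical LogLine helper emits a fixed key=value layout, always includes the request ID and the caller, masks fields on an assumed list of sensitive keys, and takes no payload argument at all.

```java
import java.time.Instant;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical structured log formatter shared by all services. */
final class LogLine {
    // Assumed set of sensitive keys; each service would maintain its own.
    private static final Set<String> SENSITIVE = Set.of("password", "accessToken", "creditCard");

    private LogLine() {}

    static String format(Instant time, String service, String requestId, String caller,
                         String message, Map<String, String> fields) {
        StringBuilder sb = new StringBuilder()
                .append("ts=").append(time)
                .append(" service=").append(service)
                .append(" requestId=").append(requestId)   // links events across services
                .append(" caller=").append(caller)         // who initiated the request
                .append(" msg=\"").append(message).append('"');
        // TreeMap gives a stable field order, which keeps logs easy to grep and diff.
        for (Map.Entry<String, String> e : new TreeMap<>(fields).entrySet()) {
            String value = SENSITIVE.contains(e.getKey()) ? mask(e.getValue()) : e.getValue();
            sb.append(' ').append(e.getKey()).append('=').append(value);
        }
        return sb.toString();
    }

    /** Keeps only the last 4 characters, e.g. "****3456". */
    static String mask(String value) {
        if (value == null || value.length() <= 4) {
            return "****";
        }
        return "****" + value.substring(value.length() - 4);
    }
}
```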
Best practices when designing APIs for mobile clients
– Avoid chattiness
– Use the aggregator pattern
Resilience planning Stage 2
• Before deploy
– Load testing
– Longevity testing
– Capacity planning
Load testing
• Ensure that you load test your APIs
– JMeter
• Plan for longevity testing
Capacity Planning
• Anticipate growth
• Design for handling exponential growth
Resilience planning Stage 3
• After deploy
– Health check
– Metrics
– Phased rollout of features
Health Check
• Memory
• CPU
• Threads
• Error rate
• If any check exceeds its threshold, send an alert
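A sketch of such a check using only JDK management APIs. The thresholds are hypothetical parameters; CPU is approximated here by the system load average (which the JDK reports as a negative value where it is unavailable), and the error rate is assumed to come from the service's own request counters.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical health check: compares a snapshot of basic signals against thresholds. */
final class HealthCheck {
    static final class Snapshot {
        final double heapUsedRatio;
        final int liveThreads;
        final double systemLoad;   // load average as a CPU proxy; negative if unavailable
        final double errorRate;

        Snapshot(double heapUsedRatio, int liveThreads, double systemLoad, double errorRate) {
            this.heapUsedRatio = heapUsedRatio;
            this.liveThreads = liveThreads;
            this.systemLoad = systemLoad;
            this.errorRate = errorRate;
        }

        /** Reads current JVM values; the error rate must come from the service's own counters. */
        static Snapshot capture(double errorRate) {
            Runtime rt = Runtime.getRuntime();
            double heapUsed = (double) (rt.totalMemory() - rt.freeMemory()) / rt.maxMemory();
            int threads = ManagementFactory.getThreadMXBean().getThreadCount();
            double load = ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
            return new Snapshot(heapUsed, threads, load, errorRate);
        }
    }

    /** Returns one message per breached threshold; a non-empty list should trigger an alert. */
    static List<String> evaluate(Snapshot s, double maxHeapRatio, int maxThreads,
                                 double maxLoad, double maxErrorRate) {
        List<String> alerts = new ArrayList<>();
        if (s.heapUsedRatio > maxHeapRatio) {
            alerts.add("memory: heap " + percent(s.heapUsedRatio) + " used");
        }
        if (s.systemLoad >= 0 && s.systemLoad > maxLoad) {
            alerts.add("cpu: load average " + s.systemLoad);
        }
        if (s.liveThreads > maxThreads) {
            alerts.add("threads: " + s.liveThreads + " live");
        }
        if (s.errorRate > maxErrorRate) {
            alerts.add("errors: rate " + percent(s.errorRate));
        }
        return alerts;
    }

    private static String percent(double ratio) {
        return Math.round(ratio * 100) + "%";
    }
}
```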
Metrics
• Response times, throughput
– Identify slow-running DB queries
• GC rate and pause duration
– Garbage collection can cause slow responses
• Monitor unusual activity
• Third-party library metrics
– For example, Couchbase hits
– atop
Rollout of new features
• Phased rollout of new features
• Have a way to turn features off if they do not behave as expected
• Alerts and more alerts!
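A phased rollout with an off switch can be as small as a percentage per feature. The hypothetical FeatureFlags class below hashes the feature name and user ID with CRC32 into a stable bucket from 0 to 99, so a given user keeps the same answer while the percentage is raised, and disable() turns a misbehaving feature off for everyone.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.zip.CRC32;

/** Hypothetical feature-flag store: a rollout percentage per feature, plus a kill switch. */
final class FeatureFlags {
    private final Map<String, Integer> rolloutPercent = new ConcurrentHashMap<>();

    void setRollout(String feature, int percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be between 0 and 100");
        }
        rolloutPercent.put(feature, percent);
    }

    /** Kill switch: turn a misbehaving feature off for everyone. */
    void disable(String feature) {
        rolloutPercent.put(feature, 0);
    }

    boolean isEnabled(String feature, String userId) {
        int percent = rolloutPercent.getOrDefault(feature, 0); // unknown features are off
        return bucket(feature, userId) < percent;
    }

    /** Stable bucket in [0, 100): users enabled at 20% stay enabled at 50%. */
    static int bucket(String feature, String userId) {
        CRC32 crc = new CRC32();
        crc.update((feature + ":" + userId).getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % 100);
    }
}
```

In a multi-instance deployment the percentages would live in shared configuration rather than an in-memory map, so that switching a feature off takes effect everywhere.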
Real world examples
• Netflix’s Simian Army induces failures of services, and even of whole datacenters, during the working day to test both the application’s resilience and its monitoring
• Latency Monkey simulates slow-running requests
• WireMock to mock services
• Saboteur to create deliberate network mayhem
Takeaway
• Inevitability of failures
– Expect systems will fail
– Failure prevention
References
• https://commons.wikimedia.org/wiki/File:Bulkhead_PSF.png
• https://en.wikipedia.org/wiki/Circuit_breaker#/media/File:Four_1_pole_circuit_breakers_fitted_in_a_meter_box.jpg
• https://www.flickr.com/photos/skynoir/ (Beer in hand: skynoir/Flickr/Creative Commons License)

Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 

Recently uploaded (20)

University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGMANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 

Expect the unexpected: Anticipate and prepare for failures in microservices based architectures

  • 1. Expect the unexpected: Anticipate and prepare for failures in microservices based architectures Bhakti Mehta @bhakti_mehta
  • 2. Introduction • Senior Software Engineer at Blue Jeans Network • Worked at Sun Microsystems/Oracle for 13 years • Committer to numerous open source projects including GlassFish Application Server
  • 5. Blue Jeans Network • Video conferencing in the cloud • 4000+ customers • Millions of users
  • 6. What you will learn • Monoliths vs. microservices • Challenges at scale • Preventing cascading failures • Resilience planning at various stages • Dealing with latencies in responses • Real-world examples
  • 9. Microservices (diagram: Billing, Provisioning accounts, Notification, Meeting) • A microservice-based application puts each element of functionality in a separate service
  • 11. Microservices • Advantages – Simplicity – Isolation of problems – Scale up and scale down – Easy deployment – Clear separation of concerns – Heterogeneity and polyglotism
  • 12. Microservices • Disadvantages – Not a free lunch! – Distributed systems are prone to failures – Eventual consistency – More effort for deployments and release management – Challenges in testing services that evolve independently (regression tests, etc.)
  • 14. Resilient system • Keeps processing transactions even under transient impulses and persistent stresses • Keeps functioning even when component failures disrupt normal processing • Accepts that failures will happen • Designs for crumple zones
  • 15. Kinds of failures • Challenges at scale • Integration point failures – Network errors – Semantic errors – Slow responses – Outright hangs – GC issues
  • 18. Anticipate failures at scale • Anticipate growth • Design for the next order of magnitude • Design for 10x, plan to rewrite for 100x
  • 19. Resiliency planning Stage 1 • When developing code – Avoiding cascading failures • Circuit breaker • Timeouts • Retry • Bulkhead • Cache optimizations – Avoiding malicious clients • Rate limiting
  • 20. Resiliency planning Stage 2 • Planning for failures before deployment – Load testing – A/B testing – Longevity testing
  • 21. Resiliency planning Stage 3 • Watching out for failures after deployment – Health checks – Metrics
  • 23. Cascading failures • Caused by chain reactions • For example, one node in a load-balanced group fails, the others need to pick up its work, and eventually performance can degrade
  • 24. Cascading failures with aggregation
  • 25. Cascading failure with aggregation
  • 27. Timeouts • Clients may prefer a prompt response, whether it is – a failure – a success – a job queued for later • All aggregation requests to microservices should have reasonable timeouts set
  • 28. Types of Timeouts • Connection timeout – Maximum time to wait for a connection to be established before failing with an error • Socket timeout – Maximum period of inactivity between two packets once the connection is established
  • 29. Timeouts pattern • Timeouts and retries go together • Transient failures can be remedied with fast retries • However, network problems can last for a while, so retries are also likely to fail
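  • 30. Timeouts in code • In JAX-RS (ClientProperties is Jersey's; values are in milliseconds):
      import javax.ws.rs.client.Client;
      import javax.ws.rs.client.ClientBuilder;
      import org.glassfish.jersey.client.ClientProperties;
      Client client = ClientBuilder.newClient();
      client.property(ClientProperties.CONNECT_TIMEOUT, 5000);
      client.property(ClientProperties.READ_TIMEOUT, 5000);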
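For comparison, here is a minimal sketch of the same two limits with the JDK's built-in java.net.http client (Java 11+). The class name TimeoutClients is hypothetical. The request timeout bounds the total wait for a response, so it is close to the JAX-RS read timeout above but not identical: it is not a per-packet inactivity limit.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

final class TimeoutClients {
    // Connection timeout: maximum time allowed to establish the connection.
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .connectTimeout(Duration.ofMillis(5000))
                .build();
    }

    // Request timeout: if no response arrives in time, send() throws
    // java.net.http.HttpTimeoutException instead of blocking indefinitely.
    static HttpRequest newRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofMillis(5000))
                .GET()
                .build();
    }
}
```

A caller would then use `TimeoutClients.newClient().send(TimeoutClients.newRequest(url), HttpResponse.BodyHandlers.ofString())` and treat the timeout exception as a failed call. That failed call can in turn be retried or fed to a circuit breaker.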
  • 31. Retry pattern • Retry on network failures, timeouts, or server errors • Helps recover from transient errors such as dropped connections or server failover
  • 32. Retry pattern • If one of the services is slow or malfunctioning and other services keep retrying, the problem becomes worse • Solution – Exponential backoff – Circuit breaker pattern
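Exponential backoff can be sketched as follows, assuming a hypothetical Retry helper that is not from any specific library. The delay doubles after each failed attempt up to a cap, and the actual sleep is randomized ("full jitter") so that many callers do not retry in lockstep against a struggling service. Retries like this are only safe for idempotent operations.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

final class Retry {
    // Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxMillis.
    static long backoffMillis(int attempt, long baseMillis, long maxMillis) {
        long exponential = baseMillis << Math.min(attempt, 30); // bound the shift to avoid overflow
        return Math.min(exponential, maxMillis);
    }

    // Calls task up to maxAttempts times, sleeping a random time in [0, delay]
    // between attempts; rethrows the last failure if every attempt fails.
    static <T> T withBackoff(Callable<T> task, int maxAttempts, long baseMillis, long maxMillis)
            throws Exception {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                if (attempt + 1 < maxAttempts) {
                    long delay = backoffMillis(attempt, baseMillis, maxMillis);
                    Thread.sleep(ThreadLocalRandom.current().nextLong(delay + 1));
                }
            }
        }
        throw last;
    }
}
```

With a base of 100 ms and a cap of 2 s, the delay ceilings run 100, 200, 400, 800, 1600, 2000 ms. Backoff limits the load that retries add, but it does not stop retries against a service that is down for minutes. The circuit breaker covers that case.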
  • 33. Circuit breaker pattern • A circuit breaker is an electrical device in an electrical panel that monitors and controls the current (measured in amperes, or amps) flowing through a circuit
  • 34. Circuit breaker pattern • Safety device • If a power surge occurs in the electrical wiring, the breaker trips • It flips from “On” to “Off” and cuts off electrical power from that breaker
  • 35. Circuit breaker • Netflix Hystrix follows the circuit breaker pattern • If a service’s error rate exceeds a threshold, Hystrix trips the circuit breaker and blocks requests for a specific period of time
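The sketch below shows the state machine behind the pattern. It is a deliberately simplified, hypothetical class, not Hystrix's API. It trips after N consecutive failures, whereas Hystrix uses an error percentage over a rolling window. While open, it fails fast with a fallback. After a cool-down it lets a trial call through (half-open) and closes again on success.

```java
import java.util.concurrent.Callable;
import java.util.function.LongSupplier;

final class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold; // consecutive failures that trip the breaker
    private final long openNanos;       // how long to fail fast before allowing a trial call
    private final LongSupplier clock;   // System::nanoTime in production; a fake clock in tests
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt;

    CircuitBreaker(int failureThreshold, long openNanos, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openNanos = openNanos;
        this.clock = clock;
    }

    synchronized State state() { return state; }

    // Runs the action unless the circuit is open; an open circuit or a failed
    // call returns the fallback instead of propagating the error.
    <T> T call(Callable<T> action, T fallback) {
        if (!allowRequest()) return fallback; // fail fast, leave the sick service alone
        try {
            T result = action.call();
            onSuccess();
            return result;
        } catch (Exception e) {
            onFailure();
            return fallback;
        }
    }

    private synchronized boolean allowRequest() {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < openNanos) return false;
            state = State.HALF_OPEN; // cool-down over: let a trial call through
        }
        return true;
    }

    private synchronized void onSuccess() {
        state = State.CLOSED;
        consecutiveFailures = 0;
    }

    private synchronized void onFailure() {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN; // trip: block requests for openNanos
            openedAt = clock.getAsLong();
        }
    }
}
```

The lock is held only for the state transitions, never during the remote call itself, so a slow dependency does not serialize every caller.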
  • 37. Bulkhead • Avoiding chain reactions by isolating failures • Helps prevent cascading failures
  • 38. Bulkhead • An example of bulkhead could be isolating the database dependencies per service • Similarly other infrastructure components can be isolated such as cache infrastructure
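At the code level, one common way to build a bulkhead is to cap the number of concurrent calls to each dependency. A slow database then cannot tie up every request thread in the service. Below is a minimal semaphore-based sketch with a hypothetical class name; thread-pool isolation, as Hystrix offers, is another option.

```java
import java.util.Optional;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

final class Bulkhead {
    private final Semaphore permits;

    // maxConcurrent: how many calls to this one dependency may be in flight at once.
    Bulkhead(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Runs the call if a slot is free; otherwise rejects immediately (empty result)
    // instead of queueing, so pressure on one dependency stays contained.
    <T> Optional<T> tryCall(Supplier<T> call) {
        if (!permits.tryAcquire()) return Optional.empty();
        try {
            return Optional.ofNullable(call.get());
        } finally {
            permits.release();
        }
    }
}
```

Each dependency gets its own instance, for example one bulkhead for the database and another for the cache infrastructure. Exhausting one compartment then leaves the others untouched. The pool sizes are service-specific assumptions.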
  • 39. Rate Limiting • Restricting the number of requests that can be made by a client • Client can be identified based on the access token used • Additionally clients can be identified based on IP address
  • 40. Rate Limiting • With JAX-RS, rate limiting can be implemented as a filter • The filter checks the access count for a client and, if it is within the limit, accepts the request • Otherwise, it returns HTTP 429 (Too Many Requests) • Code at https://github.com/bhakti-mehta/samples/tree/master/ratelimiting
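Here is one way to sketch the counting logic itself, as a fixed-window counter keyed by client ID (an access token or an IP address). This is a hypothetical class, not the code in the linked repository. In a JAX-RS `ContainerRequestFilter`, a `false` result would typically become `requestContext.abortWith(Response.status(429).build())`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

final class FixedWindowRateLimiter {
    private static final class Window {
        long start;
        int count;
        Window(long start) { this.start = start; }
    }

    private final int limit;           // max requests per client per window
    private final long windowMillis;   // window length
    private final LongSupplier clock;  // System::currentTimeMillis in production
    private final ConcurrentHashMap<String, Window> windows = new ConcurrentHashMap<>();

    FixedWindowRateLimiter(int limit, long windowMillis, LongSupplier clock) {
        this.limit = limit;
        this.windowMillis = windowMillis;
        this.clock = clock;
    }

    // true: within the limit, accept; false: the caller should answer HTTP 429.
    boolean tryAcquire(String clientId) {
        long now = clock.getAsLong();
        Window w = windows.computeIfAbsent(clientId, id -> new Window(now));
        synchronized (w) {
            if (now - w.start >= windowMillis) { // window expired: start counting afresh
                w.start = now;
                w.count = 0;
            }
            if (w.count >= limit) return false;
            w.count++;
            return true;
        }
    }
}
```

A fixed window is the simplest scheme, but it allows bursts of up to twice the limit across a window boundary. Token buckets or sliding windows smooth that out. The map also grows with the number of distinct clients, so a real deployment would expire idle entries.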
  • 41. Cache optimizations • Stores responses to requests in temporary storage for a specific period of time • Ensures that the server is not burdened with processing those requests again when the responses can be served from the cache
  • 42. Cache optimizations (diagram) • Lookup order: first-level cache → second-level cache → DB
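The lookup order in the diagram can be sketched as a read-through cache. This is a minimal sketch, assuming a per-node in-memory first level with a time-to-live and a second level represented by a plain `Map`, which stands in for a shared cache tier. The database is a loader function. All names here are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

final class TwoLevelCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long expiresAt;
        Entry(T value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final Map<K, Entry<V>> l1 = new ConcurrentHashMap<>(); // in-process, per node
    private final Map<K, V> l2;            // stand-in for a shared cache tier (no TTL in this sketch)
    private final Function<K, V> database; // loader used only on a full miss
    private final long ttlMillis;          // how long an L1 entry stays fresh
    private final LongSupplier clock;

    TwoLevelCache(Map<K, V> l2, Function<K, V> database, long ttlMillis, LongSupplier clock) {
        this.l2 = l2;
        this.database = database;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> e = l1.get(key);
        if (e != null && now < e.expiresAt) return e.value; // 1. first-level hit
        V value = l2.get(key);                              // 2. second-level lookup
        if (value == null) {
            value = database.apply(key);                    // 3. fall back to the DB
            if (value != null) l2.put(key, value);
        }
        if (value != null) l1.put(key, new Entry<>(value, now + ttlMillis));
        return value;
    }
}
```

Only a miss in both tiers reaches the database. Once an L1 entry expires, it is refilled from L2 rather than from the DB.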
  • 43. Dealing with latencies in response • Have a timeout for the aggregation service • Dispatch requests in parallel and collect responses • Associate a priority with all the responses collected
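The three points on this slide can be sketched together with `CompletableFuture` (Java 9+ for `completeOnTimeout`). Every downstream call is dispatched at once. Each result is bounded by the same deadline, and the collected parts are ordered by priority. The `Aggregator`, `Call`, and `Part` names are hypothetical, and a `null` body marks a call that failed or missed the deadline.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class Aggregator {
    // One downstream call; a lower priority number means more important.
    static final class Call {
        final String name; final int priority; final Supplier<String> fetch;
        Call(String name, int priority, Supplier<String> fetch) {
            this.name = name; this.priority = priority; this.fetch = fetch;
        }
    }

    // One collected result; body is null when the call failed or timed out.
    static final class Part {
        final String name; final int priority; final String body;
        Part(String name, int priority, String body) {
            this.name = name; this.priority = priority; this.body = body;
        }
    }

    static List<Part> aggregate(List<Call> calls, long timeoutMillis, ExecutorService pool) {
        List<CompletableFuture<Part>> futures = new ArrayList<>();
        for (Call c : calls) {
            // Dispatch everything before waiting on anything: the calls run in parallel
            // and their timers start together, so the total wait is about timeoutMillis.
            futures.add(CompletableFuture.supplyAsync(c.fetch, pool)
                    .thenApply(body -> new Part(c.name, c.priority, body))
                    .completeOnTimeout(new Part(c.name, c.priority, null), timeoutMillis, TimeUnit.MILLISECONDS)
                    .exceptionally(ex -> new Part(c.name, c.priority, null)));
        }
        List<Part> parts = new ArrayList<>();
        for (CompletableFuture<Part> f : futures) parts.add(f.join()); // bounded by the timeout
        parts.sort(Comparator.comparingInt(p -> p.priority));
        return parts;
    }
}
```

The timeout releases the caller, not the worker: a slow task keeps running in the pool until it finishes or is interrupted.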
  • 44. Handling partial failures best practices • One service calls another which can be slow or unavailable • Never block indefinitely waiting for the service • Try to return partial results • Provide a caching layer and return cached data
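The last two practices can be combined: call the service with a deadline, remember the last good answer, and serve it when the service is slow or down. This is a minimal sketch with a hypothetical class name. If nothing has been cached yet, it returns a caller-supplied default, which can be an empty or partial result.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

final class CachedFallback {
    private final Map<String, String> lastGood = new ConcurrentHashMap<>();

    // Never blocks longer than timeoutMillis. Success refreshes the cache;
    // a timeout or error falls back to the cached value, then to defaultValue.
    String get(String key, Supplier<String> remote, long timeoutMillis, String defaultValue) {
        try {
            String value = CompletableFuture.supplyAsync(remote).get(timeoutMillis, TimeUnit.MILLISECONDS);
            if (value != null) lastGood.put(key, value);
            return value != null ? value : lastGood.getOrDefault(key, defaultValue);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return lastGood.getOrDefault(key, defaultValue);
        } catch (ExecutionException | TimeoutException e) {
            return lastGood.getOrDefault(key, defaultValue);
        }
    }
}
```

Cached data can be stale. Whether a stale answer beats no answer depends on the use case, for example a profile picture versus an account balance.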
  • 45. Asynchronous Patterns • A pattern for dealing with long-running jobs • Some resources may take a long time to produce results • The client does not need to wait for the response
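A minimal sketch of the job pattern, using a hypothetical `JobService`: submitting returns a job ID immediately, and the client polls for status. Over HTTP this usually maps to `202 Accepted` with a link to a status resource for that ID. That endpoint design is an assumption, not part of the slide.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Supplier;

final class JobService {
    enum Status { RUNNING, DONE, FAILED }

    private final Map<String, CompletableFuture<String>> jobs = new ConcurrentHashMap<>();
    private final Executor executor;

    JobService(Executor executor) { this.executor = executor; }

    // Starts the work in the background and returns its id without waiting.
    String submit(Supplier<String> work) {
        String id = UUID.randomUUID().toString();
        jobs.put(id, CompletableFuture.supplyAsync(work, executor));
        return id;
    }

    // What a status endpoint would report when the client polls.
    Status status(String id) {
        CompletableFuture<String> f = jobs.get(id);
        if (f == null) throw new IllegalArgumentException("unknown job " + id);
        if (!f.isDone()) return Status.RUNNING;
        return f.isCompletedExceptionally() ? Status.FAILED : Status.DONE;
    }

    // The result once the job is DONE; null while it is running or if it failed.
    String result(String id) {
        CompletableFuture<String> f = jobs.get(id);
        return f != null && f.isDone() && !f.isCompletedExceptionally() ? f.join() : null;
    }
}
```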
  • 46. Reactive programming model • Use reactive constructs such as CompletableFuture in Java 8 or ListenableFuture • RxJava
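For example, `CompletableFuture` lets dependent and independent calls be composed without blocking a thread while waiting. `thenCompose` chains a call that needs a previous result, `thenCombine` joins two independent calls, and `exceptionally` supplies a fallback. The three fetch methods are hypothetical stand-ins for remote services.

```java
import java.util.concurrent.CompletableFuture;

final class ReactiveComposition {
    // Hypothetical async calls standing in for other microservices.
    static CompletableFuture<String> fetchAccount(String userId) {
        return CompletableFuture.supplyAsync(() -> "acct-" + userId);
    }
    static CompletableFuture<Integer> fetchMeetingCount(String accountId) {
        return CompletableFuture.supplyAsync(() -> 3); // placeholder value
    }
    static CompletableFuture<String> fetchPlan(String userId) {
        return CompletableFuture.supplyAsync(() -> "enterprise");
    }

    static CompletableFuture<String> summary(String userId) {
        // Dependent: the meeting count needs the account id first.
        CompletableFuture<Integer> meetings =
                fetchAccount(userId).thenCompose(ReactiveComposition::fetchMeetingCount);
        // Independent: the plan lookup runs concurrently and is joined at the end.
        return meetings.thenCombine(fetchPlan(userId), (count, plan) -> plan + ":" + count + " meetings")
                .exceptionally(ex -> "summary unavailable");
    }
}
```

Here, `summary("u1").join()` yields `"enterprise:3 meetings"`. RxJava expresses the same composition with operators such as `flatMap` and `zip` on its observable types.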
  • 47. Asynchronous API • Reactive patterns • Message passing – Akka actor model • Message queues – Communication between services via shared message queues – WebSockets
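To illustrate the decoupling a message queue provides, here is an in-process sketch that uses a bounded `BlockingQueue` as a stand-in for a real broker. The `EventChannel` class and the event names are hypothetical. The producer never waits on the consumer, and a full queue is reported back to the producer instead of growing without limit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class EventChannel {
    private final BlockingQueue<String> queue;

    EventChannel(int capacity) { this.queue = new LinkedBlockingQueue<>(capacity); }

    // Producer side (e.g. the Meeting service): false means the queue is full (backpressure).
    boolean publish(String event) { return queue.offer(event); }

    // Consumer side (e.g. the Billing service): waits up to waitMillis for the
    // first event, then drains whatever else is already queued.
    List<String> poll(long waitMillis) throws InterruptedException {
        List<String> batch = new ArrayList<>();
        String first = queue.poll(waitMillis, TimeUnit.MILLISECONDS);
        if (first == null) return batch;
        batch.add(first);
        queue.drainTo(batch);
        return batch;
    }
}
```

Across processes, a message broker plays this role and adds durability and redelivery, which an in-memory queue does not provide.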
  • 48. Logging • Complex distributed systems introduce many points of failure • Logging helps link events and transactions across the various components that make up an application or a business service • ELK stack • Splunk, syslog • Loggly • LogEntries
  • 49. Logging best practices • Use a detailed, consistent log pattern across services • Obfuscate sensitive data • Identify the caller or initiator in the logs • Do not log payloads by default
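The four practices can be sketched as one helper that every service uses to build its log lines. The field names, the `****` mask, and the `requestId` correlation ID are assumptions. The ID is presumed to be passed from service to service, for example in a request header, so that one transaction can be followed through the ELK stack or Splunk.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class LogLines {
    // Hypothetical field names that must never appear in clear text.
    private static final Set<String> SENSITIVE = Set.of("password", "token", "email");

    // Same key=value layout in every service: who handled it, which transaction,
    // who called, then extra fields (sorted for a stable order) and the message.
    static String line(String service, String requestId, String caller, String message,
                       Map<String, String> fields, boolean debug) {
        StringBuilder sb = new StringBuilder()
                .append("service=").append(service)
                .append(" requestId=").append(requestId)
                .append(" caller=").append(caller);
        for (Map.Entry<String, String> f : new TreeMap<>(fields).entrySet()) {
            if (f.getKey().equals("payload") && !debug) continue; // payloads only when debugging
            String value = SENSITIVE.contains(f.getKey()) ? "****" : f.getValue();
            sb.append(' ').append(f.getKey()).append('=').append(value);
        }
        return sb.append(" msg=\"").append(message).append('"').toString();
    }
}
```

For example, `LogLines.line("billing", "req-123", "meeting-service", "invoice created", Map.of("amount", "42", "token", "abc", "payload", "{...}"), false)` produces `service=billing requestId=req-123 caller=meeting-service amount=42 token=**** msg="invoice created"`.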
  • 50. Best practices when designing APIs for mobile clients – Avoid chattiness – Use aggregator pattern
  • 51. Resilience planning Stage 2 • Before deploy – Load testing – Longevity testing – Capacity planning
  • 52. Load testing • Make sure you load test your APIs – JMeter • Plan for longevity testing
  • 53. Capacity Planning • Anticipate growth • Design for handling exponential growth
  • 54. Resilience planning Stage 3 • After deploy – Health check – Metrics – Phased rollout of features
  • 56. Health Check • Memory • CPU • Threads • Error rate • If any check exceeds its threshold, send an alert
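A health check along these lines can read memory, CPU, and thread figures from the JVM's standard management beans and compare them with thresholds. The threshold values below are illustrative assumptions. CPU is approximated by the system load average relative to the number of processors. The error rate is passed in, on the assumption that it comes from the service's own request metrics.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

final class HealthCheck {
    // Example thresholds; tune per service.
    static final double MAX_HEAP_RATIO = 0.90;
    static final int MAX_THREADS = 500;
    static final double MAX_ERROR_RATE = 0.05;

    // Returns the checks that exceeded their threshold; an empty list means healthy.
    static List<String> evaluate(double heapRatio, double loadAverage, int cpus, int threads, double errorRate) {
        List<String> alerts = new ArrayList<>();
        if (heapRatio > MAX_HEAP_RATIO) alerts.add("heap");
        if (loadAverage > cpus) alerts.add("cpu"); // load average is -1 where unavailable
        if (threads > MAX_THREADS) alerts.add("threads");
        if (errorRate > MAX_ERROR_RATE) alerts.add("error-rate");
        return alerts;
    }

    // Reads live values from the JVM and evaluates them.
    static List<String> checkJvm(double errorRate) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax() > 0 ? heap.getMax() : heap.getCommitted(); // max is -1 if undefined
        double heapRatio = (double) heap.getUsed() / max;
        double load = ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
        int cpus = ManagementFactory.getOperatingSystemMXBean().getAvailableProcessors();
        int threads = ManagementFactory.getThreadMXBean().getThreadCount();
        return evaluate(heapRatio, load, cpus, threads, errorRate);
    }
}
```

A non-empty result is what would trigger the alert. Keeping `evaluate` separate from the JVM readings makes the threshold logic easy to test.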
  • 58. Metrics • Response times, throughput – Identify slow-running DB queries • GC rate and pause duration – Garbage collection can cause slow responses • Monitor unusual activity • Third-party library metrics – For example, Couchbase hits – atop
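GC rate and time spent in GC can be sampled from the standard `GarbageCollectorMXBean`s. Diffing two samples gives the number of collections and the GC time in each interval. This is a sketch with a hypothetical class name. `getCollectionTime()` is accumulated elapsed time, which for concurrent collectors may include more than stop-the-world pauses. Exact per-pause durations need GC logs or GC notifications.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcMetrics {
    // {total collections, total collection time in ms} across all collectors.
    static long[] totals() {
        long count = 0, timeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            count += Math.max(0, gc.getCollectionCount()); // -1 means undefined for this collector
            timeMs += Math.max(0, gc.getCollectionTime());
        }
        return new long[] {count, timeMs};
    }

    // Fraction of wall-clock time spent in GC between two samples taken intervalMs apart.
    static double gcTimeFraction(long[] before, long[] after, long intervalMs) {
        return intervalMs <= 0 ? 0.0 : (double) (after[1] - before[1]) / intervalMs;
    }
}
```

Sampling `totals()` once a minute and alerting when `gcTimeFraction` stays high is one simple way to connect GC activity to the slow responses mentioned on the slide.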
  • 59. Rollout of new features • Phase the rollout of new features • Have a way to turn features off if they do not behave as expected • Alerts and more alerts!
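A phased rollout with an off switch can be sketched as a percentage flag. Each user is hashed into a stable bucket from 0 to 99 per feature. Raising the percentage therefore only adds users, and setting it to 0 turns the feature off for everyone at once. The class is hypothetical, and CRC32 is simply a convenient stable hash from the JDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.zip.CRC32;

final class FeatureFlags {
    private final Map<String, Integer> rolloutPercent = new ConcurrentHashMap<>();

    // 0 = off for everyone (kill switch), 100 = on for everyone.
    void setRollout(String feature, int percent) {
        rolloutPercent.put(feature, Math.max(0, Math.min(100, percent)));
    }

    // The same user always lands in the same bucket for a given feature,
    // so nobody flips between old and new behavior from one request to the next.
    boolean isEnabled(String feature, String userId) {
        int percent = rolloutPercent.getOrDefault(feature, 0); // unknown features are off
        CRC32 crc = new CRC32();
        crc.update((feature + ":" + userId).getBytes(StandardCharsets.UTF_8));
        return crc.getValue() % 100 < percent;
    }
}
```

A typical sequence is 1%, then 10%, then 50%, then 100%, watching the alerts at each step and dropping back to 0 if they fire.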
  • 60. Real-world examples • Netflix's Simian Army induces failures of services and even datacenters during the working day to test both the application's resilience and its monitoring • Latency Monkey to simulate slow-running requests • WireMock to mock services • Saboteur to create deliberate network mayhem
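In the spirit of Latency Monkey, a dependency call can be wrapped so that it is sometimes delayed and sometimes fails outright. This lets timeouts, retries, and fallbacks be exercised deliberately in a test environment. The sketch below is hypothetical and is not how the Simian Army tools are implemented. A seeded `Random` keeps each experiment reproducible.

```java
import java.util.Random;
import java.util.function.Supplier;

final class FaultInjector {
    private final Random random;
    private final double latencyProbability; // chance of adding a delay to a call
    private final long addedLatencyMillis;   // size of the injected delay
    private final double failureProbability; // chance of failing the call outright

    FaultInjector(long seed, double latencyProbability, long addedLatencyMillis, double failureProbability) {
        this.random = new Random(seed);
        this.latencyProbability = latencyProbability;
        this.addedLatencyMillis = addedLatencyMillis;
        this.failureProbability = failureProbability;
    }

    // Wraps a dependency call, injecting latency and/or a failure before delegating.
    <T> T call(Supplier<T> dependency) {
        if (random.nextDouble() < latencyProbability) {
            try {
                Thread.sleep(addedLatencyMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        if (random.nextDouble() < failureProbability) {
            throw new IllegalStateException("injected failure");
        }
        return dependency.get();
    }
}
```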
  • 61. Takeaway • Inevitability of failures – Expect that systems will fail – Failure prevention

Editor's Notes

  1. The service will have a caching layer and a database layer