SlideShare a Scribd company logo
1 of 68
Resilience Planning and how the
empire strikes back
Bhakti Mehta
@bhakti_mehta
InfoQ.com: News & Community Site
• 750,000 unique visitors/month
• Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
• Post content from our QCon conferences
• News 15-20 / week
• Articles 3-4 / week
• Presentations (videos) 12-15 / week
• Interviews 2-3 / week
• Books 1 / month
Watch the video with slide
synchronization on InfoQ.com!
http://www.infoq.com/presentations
/resilient-systems-blue-jeans-
network
Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
Presented at QCon San Francisco
www.qconsf.com
Introduction
• Senior Software Engineer at Blue Jeans
Network
• Worked at Sun Microsystems/Oracle for 13
years
• Committer to numerous open source projects
including GlassFish Application Server
My recent book
Previous book
Blue Jeans Network
Blue Jeans Network
• Video conferencing in the cloud
• Customers in all segments
• Millions of users
• Interoperable
• Video sharing, Content sharing
• Mobile friendly
• Solutions for large scale events
What you will learn
• Blue Jeans architecture
• Challenges at scale
• Lessons learned, tips and practices to prevent
cascading failures
• Resilience planning at various stages
• Real world examples
Customer B
Top level architecture
INTERNET
Customer A
SIP, H.323
HTTP / HTTPS
Media Node
Web Server
Middleware
services
Cache
Service
discovery
Messaging
DB
Proxy
layer
Connector Node
Micro services architecture
Path to Micro services
• Advantages
– Simplicity
– Isolation of problems
– Scale up and scale down
– Easy deployment
– Clear separation of concerns
– Heterogeneity and polyglotism
Microservices
• Disadvantages
– Not a free lunch!
– Distributed systems prone to failures
– Eventual consistency
– More effort in terms of deployments, release
managements
– Challenges in testing the various services evolving
independently, regression tests etc
Resilient system
• Processes transactions, even when there are
transient impulses, persistent stresses
• Functions even when there are component
failures disrupting normal processing
• Accepts failures will happen
• Designs for crumple zones
Kinds of failures
• Challenges at scale
• Integration point failures
– Network errors
– Semantic errors.
– Slow responses
– Outright hang
– GC issues
Anticipate failures at scale
• Anticipate growth
• Design for next order of magnitude
• Design for 10x plan to rewrite for 100x
Resiliency planning Stage 1
• When developing code
– Avoiding Cascading failures
• Circuit breaker
• Timeouts
• Retry
• Bulkhead
• Cache optimizations
– Avoid malicious clients
• Rate limiting
Resiliency planning Stage 2
• Planning for dealing with failures before
deploy
– load test
– a/b test
– longevity
Resiliency planning Stage 3
• Watching out for failures after deploy
– health check
– metrics
Cascading failures
Caused by Chain reactions
For example
One node in a load balance group fails
Others need to pick up work
Eventually performance can degenerate
Cascading failures with aggregation
Cascading failure with aggregation
Timeouts
• Clients may prefer a response
– failure
– success
– job queued for later
All aggregation requests to microservices should
have reasonable timeouts set
Types of Timeouts
• Connection timeout
– Max time before connection can be established or
Error
• Socket timeout
– Max time of inactivity between two packets once
connection is established
Timeouts pattern
• Timeouts + Retries go together
• Transient failures can be remedied with fast
retries
• However problems in network can last for a
while so probability of retries failing
Timeouts in code
In JAX-RS
Client client = ClientBuilder.newClient();
client.property(ClientProperties.CONNECT_TIMEOUT, 5000);
client.property(ClientProperties.READ_TIMEOUT, 5000)
Retry pattern
• Retry for failures in case of network failures,
timeouts or server errors
• Helps transient network errors such as
dropped connections or server fail over
Retry pattern
• If one of the services is slow or malfunctioning
and other services keep retrying then the
problem becomes worse
• Solution
– Exponential backoff
– Circuit breaker pattern
Circuit breaker pattern
Circuit breaker A circuit breaker is an electrical device used in an
electrical panel that monitors and controls the amount of amperes
(amps) being sent through
Circuit breaker pattern
• Safety device
• If a power surge occurs in the electrical wiring,
the breaker will trip.
• Flips from “On” to “Off” and shuts electrical
power from that breaker
Circuit breaker
• Netflix Hystrix follows circuit breaker pattern
• If a service’s error rate exceeds a threshold it
will trip the circuit breaker and block the
requests for a specific period of time
Bulkhead
Bulkhead
• Avoiding chain reactions by isolating failures
• Helps prevent cascading failures
Bulkhead
• An example of bulkhead could be isolating the
database dependencies per service
• Similarly other infrastructure components can
be isolated such as cache infrastructure
Rate Limiting
• Restricting the number of requests that can be
made by a client
• Client can be identified based on the access
token used
• Additionally clients can be identified based on
IP address
Rate Limiting
• With JAX-RS Rate limiting can be implemented
as a filter
• This filter can check the access count for a
client and if within limit accept the request
• Else throw a 429 Error
• Code at https://github.com/bhakti-
mehta/samples/tree/master/ratelimiting
Cache optimizations
• Stores response information related to
requests in a temporary storage for a specific
period of time
• Ensures that server is not burdened
processing those requests in future when
responses can be fulfilled from the cache
Cache optimizations
Getting from first level cache
Getting from second
level cache
Getting from the DB
Dealing with latencies in response
• Have a timeout for the aggregation service
• Dispatch requests in parallel and collect
responses
• Associate a priority with all the responses
collected
Handling partial failures best practices
• One service calls another which can be slow or
unavailable
• Never block indefinitely waiting for the service
• Try to return partial results
• Provide a caching layer and return cached
data
Asynchronous Patterns
• Pattern to deal with long running jobs
• Some resources may take longer time to
provide results
• Not needing client to wait for the response
Reactive programming model
• Use reactive programming such as
CompletableFuture in Java 8, ListenableFuture
• Rx Java
Asynchronous API
• Reactive patterns
• Message Passing
– Akka actor model
• Message queues
– Communication between services via shared
message queues
– Websockets
Logging
• Complex distributed systems introduce many
points of failure
• Logging helps link events/transactions between
various components that make an application or
a business service
• ELK stack
• Splunk, syslog
• Loggly
• LogEntries
Logging best practices
• Include detailed, consistent pattern across
service logs
• Obfuscate sensitive data
• Identify caller or initiator as part of logs
• Do not log payloads by default
Best practices when designing APIs for
mobile clients
– Avoid chattiness
– Use aggregator pattern
Resilience planning Stage 2
• Before deploy
– Load testing
– Longevity testing
– Capacity planning
Load testing
• Ensure that you test for load on APIs
– Jmeter
• Plan for longevity testing
Capacity Planning
• Anticipate growth
• Design for handling exponential growth
Resilience planning Stage 3
• After deploy
– Health check
– Metrics
– Phased rollout of features
Health Check
• Memory
• CPU
• Threads
• Error rate
• If any of the checks exceed a threshold send
alert
Monitoring
Monitoring
server
Production Environment
CHECKS
ALERTS
Email
Monitoring Stack
• Log Aggregation frameworkApplication
• Newrelic (Java, Python)
OS / Application
Code
• Collectd / GraphiteNetwork, Server
IcingaHealthchecks
Metrics
• Response times, throughput
– Identify slow running DB queries
• GC rate and pause duration
– Garbage collection can cause slow responses
• Monitor unusual activity
• Third party library metrics
– For example Couchbase hits
– atop
Metrics
• Load average
• Uptime
• Log sizes
Rollout of new features
• Phasing rollout of new features
• Have a way to turn features off if not behaving
as expected
• Alerts and more alerts!
Real time examples
• Netflix's Simian Army induces failures of
services and even datacenters during the
working day to test both the application's
resilience and monitoring.
• Latency Monkey to simulate slow running
requests
• Wiremock to mock services
• Saboteur to create deliberate network
mayhem
Takeaway
• Inevitability of failures
– Expect systems will fail
– Failure prevention
References
• https://commons.wikimedia.org/wiki/File:Bulkhead_PSF.png
• https://en.wikipedia.org/wiki/Circuit_breaker#/media/File:Four_1_pole_circuit_breakers_fitted_in_a_met
er_box.jpg
• https://www.flickr.com/photos/skynoir/ Beer in hand: skynoir/Flickr/Creative Commons License
Questions
• Twitter: @bhakti_mehta
• Email: bhakti@bluejeans.com
Watch the video with slide synchronization on
InfoQ.com!
http://www.infoq.com/presentations/resilient-
systems-blue-jeans-network

More Related Content

What's hot

Designing Highly-Available Architectures for OTM
Designing Highly-Available Architectures for OTMDesigning Highly-Available Architectures for OTM
Designing Highly-Available Architectures for OTMMavenWire
 
2013 OTM EU SIG evolv applications Data Management
2013 OTM EU SIG evolv applications Data Management2013 OTM EU SIG evolv applications Data Management
2013 OTM EU SIG evolv applications Data ManagementMavenWire
 
Analyzing OTM Logs and Troubleshooting
Analyzing OTM Logs and TroubleshootingAnalyzing OTM Logs and Troubleshooting
Analyzing OTM Logs and TroubleshootingMavenWire
 
Configlets, compliance, RBAC & reports - Network Configuration Manager
Configlets, compliance, RBAC & reports - Network Configuration ManagerConfiglets, compliance, RBAC & reports - Network Configuration Manager
Configlets, compliance, RBAC & reports - Network Configuration ManagerManageEngine, Zoho Corporation
 
Effective admin and development in iib
Effective admin and development in iibEffective admin and development in iib
Effective admin and development in iibm16k
 
Performance Engineering
Performance EngineeringPerformance Engineering
Performance EngineeringKumar Gupta
 
Training Webinar: Enterprise application performance with distributed caching
Training Webinar: Enterprise application performance with distributed cachingTraining Webinar: Enterprise application performance with distributed caching
Training Webinar: Enterprise application performance with distributed cachingOutSystems
 
New Approaches to Faster Oracle Forms System Performance
New Approaches to Faster Oracle Forms System PerformanceNew Approaches to Faster Oracle Forms System Performance
New Approaches to Faster Oracle Forms System PerformanceCorrelsense
 
Database failover from client perspective
Database failover from client perspectiveDatabase failover from client perspective
Database failover from client perspectivePriit Piipuu
 
MariaDB High Availability Webinar
MariaDB High Availability WebinarMariaDB High Availability Webinar
MariaDB High Availability WebinarMariaDB plc
 
OTM(Oracle Transport Management)
OTM(Oracle Transport Management)OTM(Oracle Transport Management)
OTM(Oracle Transport Management)Cognizant
 
Chef conf-2015-chef-patterns-at-bloomberg-scale
Chef conf-2015-chef-patterns-at-bloomberg-scaleChef conf-2015-chef-patterns-at-bloomberg-scale
Chef conf-2015-chef-patterns-at-bloomberg-scaleBiju Nair
 
How Mature is Your Infrastructure?
How Mature is Your Infrastructure?How Mature is Your Infrastructure?
How Mature is Your Infrastructure?Gary Stafford
 
Monitoring your physical, virtual and cloud infrastructure with Applications ...
Monitoring your physical, virtual and cloud infrastructure with Applications ...Monitoring your physical, virtual and cloud infrastructure with Applications ...
Monitoring your physical, virtual and cloud infrastructure with Applications ...ManageEngine, Zoho Corporation
 
Load and Performance Testing for J2EE - Testing, monitoring and reporting usi...
Load and Performance Testing for J2EE - Testing, monitoring and reporting usi...Load and Performance Testing for J2EE - Testing, monitoring and reporting usi...
Load and Performance Testing for J2EE - Testing, monitoring and reporting usi...Alexandru Ersenie
 
Analyze a SVC, STORWIZE metro/ global mirror performance problem-v58-20150818...
Analyze a SVC, STORWIZE metro/ global mirror performance problem-v58-20150818...Analyze a SVC, STORWIZE metro/ global mirror performance problem-v58-20150818...
Analyze a SVC, STORWIZE metro/ global mirror performance problem-v58-20150818...Michael Pirker
 
Performance Testing from Scratch + JMeter intro
Performance Testing from Scratch + JMeter introPerformance Testing from Scratch + JMeter intro
Performance Testing from Scratch + JMeter introMykola Kovsh
 

What's hot (20)

Designing Highly-Available Architectures for OTM
Designing Highly-Available Architectures for OTMDesigning Highly-Available Architectures for OTM
Designing Highly-Available Architectures for OTM
 
2013 OTM EU SIG evolv applications Data Management
2013 OTM EU SIG evolv applications Data Management2013 OTM EU SIG evolv applications Data Management
2013 OTM EU SIG evolv applications Data Management
 
Analyzing OTM Logs and Troubleshooting
Analyzing OTM Logs and TroubleshootingAnalyzing OTM Logs and Troubleshooting
Analyzing OTM Logs and Troubleshooting
 
Configlets, compliance, RBAC & reports - Network Configuration Manager
Configlets, compliance, RBAC & reports - Network Configuration ManagerConfiglets, compliance, RBAC & reports - Network Configuration Manager
Configlets, compliance, RBAC & reports - Network Configuration Manager
 
Effective admin and development in iib
Effective admin and development in iibEffective admin and development in iib
Effective admin and development in iib
 
Performance Engineering
Performance EngineeringPerformance Engineering
Performance Engineering
 
Performance testing material
Performance testing materialPerformance testing material
Performance testing material
 
Training Webinar: Enterprise application performance with distributed caching
Training Webinar: Enterprise application performance with distributed cachingTraining Webinar: Enterprise application performance with distributed caching
Training Webinar: Enterprise application performance with distributed caching
 
New Approaches to Faster Oracle Forms System Performance
New Approaches to Faster Oracle Forms System PerformanceNew Approaches to Faster Oracle Forms System Performance
New Approaches to Faster Oracle Forms System Performance
 
Database failover from client perspective
Database failover from client perspectiveDatabase failover from client perspective
Database failover from client perspective
 
MariaDB High Availability Webinar
MariaDB High Availability WebinarMariaDB High Availability Webinar
MariaDB High Availability Webinar
 
OTM(Oracle Transport Management)
OTM(Oracle Transport Management)OTM(Oracle Transport Management)
OTM(Oracle Transport Management)
 
Chef conf-2015-chef-patterns-at-bloomberg-scale
Chef conf-2015-chef-patterns-at-bloomberg-scaleChef conf-2015-chef-patterns-at-bloomberg-scale
Chef conf-2015-chef-patterns-at-bloomberg-scale
 
Play With Streams
Play With StreamsPlay With Streams
Play With Streams
 
How Mature is Your Infrastructure?
How Mature is Your Infrastructure?How Mature is Your Infrastructure?
How Mature is Your Infrastructure?
 
Monitoring your physical, virtual and cloud infrastructure with Applications ...
Monitoring your physical, virtual and cloud infrastructure with Applications ...Monitoring your physical, virtual and cloud infrastructure with Applications ...
Monitoring your physical, virtual and cloud infrastructure with Applications ...
 
Load and Performance Testing for J2EE - Testing, monitoring and reporting usi...
Load and Performance Testing for J2EE - Testing, monitoring and reporting usi...Load and Performance Testing for J2EE - Testing, monitoring and reporting usi...
Load and Performance Testing for J2EE - Testing, monitoring and reporting usi...
 
Improving User Experience with Applications Manager
Improving User Experience with Applications ManagerImproving User Experience with Applications Manager
Improving User Experience with Applications Manager
 
Analyze a SVC, STORWIZE metro/ global mirror performance problem-v58-20150818...
Analyze a SVC, STORWIZE metro/ global mirror performance problem-v58-20150818...Analyze a SVC, STORWIZE metro/ global mirror performance problem-v58-20150818...
Analyze a SVC, STORWIZE metro/ global mirror performance problem-v58-20150818...
 
Performance Testing from Scratch + JMeter intro
Performance Testing from Scratch + JMeter introPerformance Testing from Scratch + JMeter intro
Performance Testing from Scratch + JMeter intro
 

Viewers also liked

Navigating the AWS Compliance Framework | AWS Security Roadshow Dublin
Navigating the AWS Compliance Framework | AWS Security Roadshow DublinNavigating the AWS Compliance Framework | AWS Security Roadshow Dublin
Navigating the AWS Compliance Framework | AWS Security Roadshow DublinAmazon Web Services
 
IntelliMedia Netwoks Services
IntelliMedia Netwoks ServicesIntelliMedia Netwoks Services
IntelliMedia Netwoks ServicesRaj Shah
 
2014 Future of Cloud Computing - 4th Annual Survey Results
2014 Future of Cloud Computing - 4th Annual Survey Results2014 Future of Cloud Computing - 4th Annual Survey Results
2014 Future of Cloud Computing - 4th Annual Survey ResultsMichael Skok
 
GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)Kohei KaiGai
 
Security and Privacy of Sensitive Data in Cloud Computing : A Survey of Recen...
Security and Privacy of Sensitive Data in Cloud Computing : A Survey of Recen...Security and Privacy of Sensitive Data in Cloud Computing : A Survey of Recen...
Security and Privacy of Sensitive Data in Cloud Computing : A Survey of Recen...csandit
 
01282016 Aerospike-Docker webinar
01282016 Aerospike-Docker webinar01282016 Aerospike-Docker webinar
01282016 Aerospike-Docker webinarAerospike, Inc.
 
Cloud present, future and trajectory (Amazon Web Services) - JIsc Digifest 2016
Cloud present, future and trajectory (Amazon Web Services) - JIsc Digifest 2016Cloud present, future and trajectory (Amazon Web Services) - JIsc Digifest 2016
Cloud present, future and trajectory (Amazon Web Services) - JIsc Digifest 2016Jisc
 
Using FIORI to enhance user experience on SAP PPM
Using FIORI to enhance user experience on SAP PPMUsing FIORI to enhance user experience on SAP PPM
Using FIORI to enhance user experience on SAP PPMNeeraj Sahu
 
RESILIENT INTERFACE DESIGN FOR SAFETY-CRITICAL EMBEDDED AUTOMOTIVE SOFTWARE
RESILIENT INTERFACE DESIGN FOR SAFETY-CRITICAL EMBEDDED AUTOMOTIVE SOFTWARERESILIENT INTERFACE DESIGN FOR SAFETY-CRITICAL EMBEDDED AUTOMOTIVE SOFTWARE
RESILIENT INTERFACE DESIGN FOR SAFETY-CRITICAL EMBEDDED AUTOMOTIVE SOFTWAREcsandit
 
Expanding the control over the operating system from the database
Expanding the control over the operating system from the databaseExpanding the control over the operating system from the database
Expanding the control over the operating system from the databaseBernardo Damele A. G.
 
Anupam chaturvedi resume latest
Anupam chaturvedi resume  latestAnupam chaturvedi resume  latest
Anupam chaturvedi resume latestAnupam chaturvedi
 
A introduction to Laravel framework
A introduction to Laravel frameworkA introduction to Laravel framework
A introduction to Laravel frameworkPhu Luong Trong
 

Viewers also liked (20)

Navigating the AWS Compliance Framework | AWS Security Roadshow Dublin
Navigating the AWS Compliance Framework | AWS Security Roadshow DublinNavigating the AWS Compliance Framework | AWS Security Roadshow Dublin
Navigating the AWS Compliance Framework | AWS Security Roadshow Dublin
 
IntelliMedia Netwoks Services
IntelliMedia Netwoks ServicesIntelliMedia Netwoks Services
IntelliMedia Netwoks Services
 
2014 Future of Cloud Computing - 4th Annual Survey Results
2014 Future of Cloud Computing - 4th Annual Survey Results2014 Future of Cloud Computing - 4th Annual Survey Results
2014 Future of Cloud Computing - 4th Annual Survey Results
 
GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)
 
Security and Privacy of Sensitive Data in Cloud Computing : A Survey of Recen...
Security and Privacy of Sensitive Data in Cloud Computing : A Survey of Recen...Security and Privacy of Sensitive Data in Cloud Computing : A Survey of Recen...
Security and Privacy of Sensitive Data in Cloud Computing : A Survey of Recen...
 
01282016 Aerospike-Docker webinar
01282016 Aerospike-Docker webinar01282016 Aerospike-Docker webinar
01282016 Aerospike-Docker webinar
 
Nidhi jindal
Nidhi jindalNidhi jindal
Nidhi jindal
 
Cloud present, future and trajectory (Amazon Web Services) - JIsc Digifest 2016
Cloud present, future and trajectory (Amazon Web Services) - JIsc Digifest 2016Cloud present, future and trajectory (Amazon Web Services) - JIsc Digifest 2016
Cloud present, future and trajectory (Amazon Web Services) - JIsc Digifest 2016
 
Using FIORI to enhance user experience on SAP PPM
Using FIORI to enhance user experience on SAP PPMUsing FIORI to enhance user experience on SAP PPM
Using FIORI to enhance user experience on SAP PPM
 
RESILIENT INTERFACE DESIGN FOR SAFETY-CRITICAL EMBEDDED AUTOMOTIVE SOFTWARE
RESILIENT INTERFACE DESIGN FOR SAFETY-CRITICAL EMBEDDED AUTOMOTIVE SOFTWARERESILIENT INTERFACE DESIGN FOR SAFETY-CRITICAL EMBEDDED AUTOMOTIVE SOFTWARE
RESILIENT INTERFACE DESIGN FOR SAFETY-CRITICAL EMBEDDED AUTOMOTIVE SOFTWARE
 
Expanding the control over the operating system from the database
Expanding the control over the operating system from the databaseExpanding the control over the operating system from the database
Expanding the control over the operating system from the database
 
Group report
Group reportGroup report
Group report
 
CV Mihaela Antonescu
CV Mihaela AntonescuCV Mihaela Antonescu
CV Mihaela Antonescu
 
AKASH_RESUME
AKASH_RESUMEAKASH_RESUME
AKASH_RESUME
 
Building IoT solutions in Milton Keynes | Sarah Gonsalves | June 2015
Building IoT solutions in Milton Keynes | Sarah Gonsalves | June 2015Building IoT solutions in Milton Keynes | Sarah Gonsalves | June 2015
Building IoT solutions in Milton Keynes | Sarah Gonsalves | June 2015
 
CAW2016_Resume
CAW2016_ResumeCAW2016_Resume
CAW2016_Resume
 
Ahmed Elsayed Ali
Ahmed Elsayed Ali Ahmed Elsayed Ali
Ahmed Elsayed Ali
 
Anupam chaturvedi resume latest
Anupam chaturvedi resume  latestAnupam chaturvedi resume  latest
Anupam chaturvedi resume latest
 
winter-2014-2015
winter-2014-2015winter-2014-2015
winter-2014-2015
 
A introduction to Laravel framework
A introduction to Laravel frameworkA introduction to Laravel framework
A introduction to Laravel framework
 

Similar to Resilience Planning & How the Empire Strikes Back

Expect the unexpected: Anticipate and prepare for failures in microservices b...
Expect the unexpected: Anticipate and prepare for failures in microservices b...Expect the unexpected: Anticipate and prepare for failures in microservices b...
Expect the unexpected: Anticipate and prepare for failures in microservices b...Bhakti Mehta
 
Ncerc rlmca202 adm m4 ssm
Ncerc rlmca202 adm m4 ssmNcerc rlmca202 adm m4 ssm
Ncerc rlmca202 adm m4 ssmssmarar
 
Azure architecture design patterns - proven solutions to common challenges
Azure architecture design patterns - proven solutions to common challengesAzure architecture design patterns - proven solutions to common challenges
Azure architecture design patterns - proven solutions to common challengesIvo Andreev
 
Performance tuning Grails applications SpringOne 2GX 2014
Performance tuning Grails applications SpringOne 2GX 2014Performance tuning Grails applications SpringOne 2GX 2014
Performance tuning Grails applications SpringOne 2GX 2014Lari Hotari
 
Adding Value in the Cloud with Performance Test
Adding Value in the Cloud with Performance TestAdding Value in the Cloud with Performance Test
Adding Value in the Cloud with Performance TestRodolfo Kohn
 
A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...
A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...
A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...confluent
 
Debugging Microservices - key challenges and techniques - Microservices Odesa...
Debugging Microservices - key challenges and techniques - Microservices Odesa...Debugging Microservices - key challenges and techniques - Microservices Odesa...
Debugging Microservices - key challenges and techniques - Microservices Odesa...Lohika_Odessa_TechTalks
 
Tech talk microservices debugging
Tech talk microservices debuggingTech talk microservices debugging
Tech talk microservices debuggingAndrey Kolodnitsky
 
Mieke Gevers - Performance Testing in 5 Steps - A Guideline to a Successful L...
Mieke Gevers - Performance Testing in 5 Steps - A Guideline to a Successful L...Mieke Gevers - Performance Testing in 5 Steps - A Guideline to a Successful L...
Mieke Gevers - Performance Testing in 5 Steps - A Guideline to a Successful L...TEST Huddle
 
OnPrem Monitoring.pdf
OnPrem Monitoring.pdfOnPrem Monitoring.pdf
OnPrem Monitoring.pdfTarekHamdi8
 
Stay productive_while_slicing_up_the_monolith
Stay productive_while_slicing_up_the_monolithStay productive_while_slicing_up_the_monolith
Stay productive_while_slicing_up_the_monolithMarkus Eisele
 
Metrics Are Not Enough: Monitoring Apache Kafka and Streaming Applications
Metrics Are Not Enough: Monitoring Apache Kafka and Streaming ApplicationsMetrics Are Not Enough: Monitoring Apache Kafka and Streaming Applications
Metrics Are Not Enough: Monitoring Apache Kafka and Streaming Applicationsconfluent
 
Integration strategies best practices- Mulesoft meetup April 2018
Integration strategies   best practices- Mulesoft meetup April 2018Integration strategies   best practices- Mulesoft meetup April 2018
Integration strategies best practices- Mulesoft meetup April 2018Rohan Rasane
 
05. performance-concepts-26-slides
05. performance-concepts-26-slides05. performance-concepts-26-slides
05. performance-concepts-26-slidesMuhammad Ahad
 
Continuous Delivery for the Rest of Us
Continuous Delivery for the Rest of UsContinuous Delivery for the Rest of Us
Continuous Delivery for the Rest of UsC4Media
 

Similar to Resilience Planning & How the Empire Strikes Back (20)

Expect the unexpected: Anticipate and prepare for failures in microservices b...
Expect the unexpected: Anticipate and prepare for failures in microservices b...Expect the unexpected: Anticipate and prepare for failures in microservices b...
Expect the unexpected: Anticipate and prepare for failures in microservices b...
 
JMeter
JMeterJMeter
JMeter
 
Ncerc rlmca202 adm m4 ssm
Ncerc rlmca202 adm m4 ssmNcerc rlmca202 adm m4 ssm
Ncerc rlmca202 adm m4 ssm
 
Kafka PPT.pptx
Kafka PPT.pptxKafka PPT.pptx
Kafka PPT.pptx
 
Azure architecture design patterns - proven solutions to common challenges
Azure architecture design patterns - proven solutions to common challengesAzure architecture design patterns - proven solutions to common challenges
Azure architecture design patterns - proven solutions to common challenges
 
Performance tuning Grails applications SpringOne 2GX 2014
Performance tuning Grails applications SpringOne 2GX 2014Performance tuning Grails applications SpringOne 2GX 2014
Performance tuning Grails applications SpringOne 2GX 2014
 
DevOps 101
DevOps 101DevOps 101
DevOps 101
 
Adding Value in the Cloud with Performance Test
Adding Value in the Cloud with Performance TestAdding Value in the Cloud with Performance Test
Adding Value in the Cloud with Performance Test
 
A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...
A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...
A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...
 
Performance Testing Overview
Performance Testing OverviewPerformance Testing Overview
Performance Testing Overview
 
Debugging Microservices - key challenges and techniques - Microservices Odesa...
Debugging Microservices - key challenges and techniques - Microservices Odesa...Debugging Microservices - key challenges and techniques - Microservices Odesa...
Debugging Microservices - key challenges and techniques - Microservices Odesa...
 
Tech talk microservices debugging
Tech talk microservices debuggingTech talk microservices debugging
Tech talk microservices debugging
 
Mieke Gevers - Performance Testing in 5 Steps - A Guideline to a Successful L...
Mieke Gevers - Performance Testing in 5 Steps - A Guideline to a Successful L...Mieke Gevers - Performance Testing in 5 Steps - A Guideline to a Successful L...
Mieke Gevers - Performance Testing in 5 Steps - A Guideline to a Successful L...
 
ADF Performance Monitor
ADF Performance MonitorADF Performance Monitor
ADF Performance Monitor
 
OnPrem Monitoring.pdf
OnPrem Monitoring.pdfOnPrem Monitoring.pdf
OnPrem Monitoring.pdf
 
Stay productive_while_slicing_up_the_monolith
Stay productive_while_slicing_up_the_monolithStay productive_while_slicing_up_the_monolith
Stay productive_while_slicing_up_the_monolith
 
Metrics Are Not Enough: Monitoring Apache Kafka and Streaming Applications
Metrics Are Not Enough: Monitoring Apache Kafka and Streaming ApplicationsMetrics Are Not Enough: Monitoring Apache Kafka and Streaming Applications
Metrics Are Not Enough: Monitoring Apache Kafka and Streaming Applications
 
Integration strategies best practices- Mulesoft meetup April 2018
Integration strategies   best practices- Mulesoft meetup April 2018Integration strategies   best practices- Mulesoft meetup April 2018
Integration strategies best practices- Mulesoft meetup April 2018
 
05. performance-concepts-26-slides
05. performance-concepts-26-slides05. performance-concepts-26-slides
05. performance-concepts-26-slides
 
Continuous Delivery for the Rest of Us
Continuous Delivery for the Rest of UsContinuous Delivery for the Rest of Us
Continuous Delivery for the Rest of Us
 

More from C4Media

Streaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoStreaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoC4Media
 
Next Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileNext Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileC4Media
 
Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020C4Media
 
Understand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsUnderstand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsC4Media
 
Kafka Needs No Keeper
Kafka Needs No KeeperKafka Needs No Keeper
Kafka Needs No KeeperC4Media
 
High Performing Teams Act Like Owners
High Performing Teams Act Like OwnersHigh Performing Teams Act Like Owners
High Performing Teams Act Like OwnersC4Media
 
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaDoes Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaC4Media
 
Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideC4Media
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDC4Media
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine LearningC4Media
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at SpeedC4Media
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsC4Media
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsC4Media
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerC4Media
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleC4Media
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeC4Media
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereC4Media
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing ForC4Media
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data EngineeringC4Media
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreC4Media
 

More from C4Media (20)

Streaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoStreaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live Video
 
Next Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileNext Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy Mobile
 
Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020
 
Understand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsUnderstand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java Applications
 
Kafka Needs No Keeper
Kafka Needs No KeeperKafka Needs No Keeper
Kafka Needs No Keeper
 
High Performing Teams Act Like Owners
High Performing Teams Act Like OwnersHigh Performing Teams Act Like Owners
High Performing Teams Act Like Owners
 
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaDoes Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
 
Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate Guide
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CD
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine Learning
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at Speed
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep Systems
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.js
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly Compiler
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix Scale
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's Edge
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home Everywhere
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing For
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
 

Recently uploaded

Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 

Recently uploaded (20)

Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 

Resilience Planning & How the Empire Strikes Back

  • 1. Resilience Planning and how the empire strikes back Bhakti Mehta @bhakti_mehta
  • 2. InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations /resilient-systems-blue-jeans- network
  • 3. Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide Presented at QCon San Francisco www.qconsf.com
  • 4. Introduction • Senior Software Engineer at Blue Jeans Network • Worked at Sun Microsystems/Oracle for 13 years • Committer to numerous open source projects including GlassFish Application Server
  • 8. Blue Jeans Network • Video conferencing in the cloud • Customers in all segments • Millions of users • Interoperable • Video sharing, Content sharing • Mobile friendly • Solutions for large scale events
  • 9. What you will learn • Blue Jeans architecture • Challenges at scale • Lessons learned, tips and practices to prevent cascading failures • Resilience planning at various stages • Real world examples
  • 10. Customer B Top level architecture INTERNET Customer A SIP, H.323 HTTP / HTTPS Media Node Web Server Middleware services Cache Service discovery Messaging DB Proxy layer Connector Node
  • 12. Path to Micro services • Advantages – Simplicity – Isolation of problems – Scale up and scale down – Easy deployment – Clear separation of concerns – Heterogeneity and polyglotism
  • 13. Microservices • Disadvantages – Not a free lunch! – Distributed systems prone to failures – Eventual consistency – More effort in terms of deployments, release managements – Challenges in testing the various services evolving independently, regression tests etc
  • 14. Resilient system • Processes transactions, even when there are transient impulses, persistent stresses • Functions even when there are component failures disrupting normal processing • Accepts failures will happen • Designs for crumple zones
  • 15. Kinds of failures • Challenges at scale • Integration point failures – Network errors – Semantic errors. – Slow responses – Outright hang – GC issues
  • 16.
  • 17.
  • 18. Anticipate failures at scale • Anticipate growth • Design for next order of magnitude • Design for 10x plan to rewrite for 100x
  • 19. Resiliency planning Stage 1 • When developing code – Avoiding Cascading failures • Circuit breaker • Timeouts • Retry • Bulkhead • Cache optimizations – Avoid malicious clients • Rate limiting
  • 20. Resiliency planning Stage 2 • Planning for dealing with failures before deploy – load test – a/b test – longevity
  • 21. Resiliency planning Stage 3 • Watching out for failures after deploy – health check – metrics
  • 22.
  • 23. Cascading failures Caused by Chain reactions For example One node in a load balance group fails Others need to pick up work Eventually performance can degenerate
  • 24. Cascading failures with aggregation
  • 25. Cascading failure with aggregation
  • 26.
  • 27. Timeouts • Clients may prefer a response – failure – success – job queued for later All aggregation requests to microservices should have reasonable timeouts set
  • 28. Types of Timeouts • Connection timeout – Max time before connection can be established or Error • Socket timeout – Max time of inactivity between two packets once connection is established
  • 29. Timeouts pattern • Timeouts + Retries go together • Transient failures can be remedied with fast retries • However problems in network can last for a while so probability of retries failing
  • 30. Timeouts in code In JAX-RS Client client = ClientBuilder.newClient(); client.property(ClientProperties.CONNECT_TIMEOUT, 5000); client.property(ClientProperties.READ_TIMEOUT, 5000)
  • 31. Retry pattern • Retry for failures in case of network failures, timeouts or server errors • Helps transient network errors such as dropped connections or server fail over
  • 32. Retry pattern • If one of the services is slow or malfunctioning and other services keep retrying then the problem becomes worse • Solution – Exponential backoff – Circuit breaker pattern
  • 33. Circuit breaker pattern Circuit breaker A circuit breaker is an electrical device used in an electrical panel that monitors and controls the amount of amperes (amps) being sent through
  • 34. Circuit breaker pattern • Safety device • If a power surge occurs in the electrical wiring, the breaker will trip. • Flips from “On” to “Off” and shuts electrical power from that breaker
  • 35. Circuit breaker • Netflix Hystrix follows circuit breaker pattern • If a service’s error rate exceeds a threshold it will trip the circuit breaker and block the requests for a specific period of time
  • 37. Bulkhead • Avoiding chain reactions by isolating failures • Helps prevent cascading failures
  • 38. Bulkhead • An example of bulkhead could be isolating the database dependencies per service • Similarly other infrastructure components can be isolated such as cache infrastructure
  • 39. Rate Limiting • Restricting the number of requests that can be made by a client • Client can be identified based on the access token used • Additionally clients can be identified based on IP address
  • 40. Rate Limiting • With JAX-RS Rate limiting can be implemented as a filter • This filter can check the access count for a client and if within limit accept the request • Else throw a 429 Error • Code at https://github.com/bhakti- mehta/samples/tree/master/ratelimiting
  • 41. Cache optimizations • Stores response information related to requests in a temporary storage for a specific period of time • Ensures that server is not burdened processing those requests in future when responses can be fulfilled from the cache
  • 42. Cache optimizations Getting from first level cache Getting from second level cache Getting from the DB
  • 43. Dealing with latencies in response • Have a timeout for the aggregation service • Dispatch requests in parallel and collect responses • Associate a priority with all the responses collected
  • 44. Handling partial failures best practices • One service calls another which can be slow or unavailable • Never block indefinitely waiting for the service • Try to return partial results • Provide a caching layer and return cached data
  • 45. Asynchronous Patterns • Pattern to deal with long running jobs • Some resources may take longer time to provide results • Not needing client to wait for the response
  • 46. Reactive programming model • Use reactive programming such as CompletableFuture in Java 8, ListenableFuture • Rx Java
  • 47. Asynchronous API • Reactive patterns • Message Passing – Akka actor model • Message queues – Communication between services via shared message queues – Websockets
  • 48. Logging • Complex distributed systems introduce many points of failure • Logging helps link events/transactions between various components that make an application or a business service • ELK stack • Splunk, syslog • Loggly • LogEntries
  • 49. Logging best practices • Include detailed, consistent pattern across service logs • Obfuscate sensitive data • Identify caller or initiator as part of logs • Do not log payloads by default
  • 50. Best practices when designing APIs for mobile clients – Avoid chattiness – Use aggregator pattern
  • 51. Resilience planning Stage 2 • Before deploy – Load testing – Longevity testing – Capacity planning
  • 52. Load testing • Ensure that you test for load on APIs – Jmeter • Plan for longevity testing
  • 53. Capacity Planning • Anticipate growth • Design for handling exponential growth
  • 54. Resilience planning Stage 3 • After deploy – Health check – Metrics – Phased rollout of features
  • 55.
  • 56. Health Check • Memory • CPU • Threads • Error rate • If any of the checks exceed a threshold send alert
  • 57.
  • 59. Monitoring Stack • Log Aggregation frameworkApplication • Newrelic (Java, Python) OS / Application Code • Collectd / GraphiteNetwork, Server IcingaHealthchecks
  • 60. Metrics • Response times, throughput – Identify slow running DB queries • GC rate and pause duration – Garbage collection can cause slow responses • Monitor unusual activity • Third party library metrics – For example Couchbase hits – atop
  • 61. Metrics • Load average • Uptime • Log sizes
  • 62. Rollout of new features • Phasing rollout of new features • Have a way to turn features off if not behaving as expected • Alerts and more alerts!
  • 63. Real time examples • Netflix's Simian Army induces failures of services and even datacenters during the working day to test both the application's resilience and monitoring. • Latency Monkey to simulate slow running requests • Wiremock to mock services • Saboteur to create deliberate network mayhem
  • 64. Takeaway • Inevitability of failures – Expect systems will fail – Failure prevention
  • 65.
  • 67. Questions • Twitter: @bhakti_mehta • Email: bhakti@bluejeans.com
  • 68. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/resilient- systems-blue-jeans-network