The document discusses the circuit breaker pattern, which allows a subsystem to fail gracefully without causing a complete system failure. It is primarily used in microservice architectures to handle failures that are inevitable. A circuit breaker acts like a physical circuit breaker, where it trips open after failures exceed a threshold for a period of time to prevent cascading failures. The document then demonstrates the circuit breaker pattern using Spring Cloud, Eureka, Hystrix and Zuul in a sample application topology.
Introduction to Frontline Systems presented by Vikash Kodati outlines the agenda including key topics on the Circuit Breaker pattern.
Highlights features of microservices: componentization, decentralized data management, infrastructure automation, and design to handle failures.
Emphasizes typical failures in system clusters and the importance of planning for operational resilience, with statistics on various potential failures.
Discusses the need for a fault-tolerant system that ensures continuous operation, high availability, and graceful failure handling.
Outlines strategies for developing fault-tolerant systems, including circuit breakers, timeouts, and load testing during different phases.
Defines Circuit Breaker pattern with examples and configurable thresholds for failures based on service response times.
Illustration of the Circuit Breaker concept, though the details aren't specified in the slide.
Describes the state transitions of the circuit breaker: Closed, Open, and Half-Open with operations during these states.
Visual layout of a demo topology involving web browser, Zuul, Eureka Server, and Reading Service.
Outlines key components involved in Circuit Breaker pattern implementation including service discovery and intelligent routing.
Introduces the Hystrix Dashboard, a tool for monitoring circuit breakers.
Drills down into metrics and performance indicators on the Hystrix Dashboard.
Summarizes the circuit breaker pattern's role in ensuring subsystem resilience against failures.
Concludes the presentation with contact information for further inquiries and a session for questions.
CHARACTERISTICS OF MICROSERVICE
6/13/2016T-MobileConfidential3
• Componentization via services
• Organized around business capabilities
• Products not projects
• Smart endpoints and dumb pipes
• Decentralized Data Management
• Infrastructure Automation
• Design for failure
DESIGN FOR FAILURE
Typical first year for a new cluster:
~0.5 overheating (power down most machines in <5 mins, ~1-2 days to recover)
~1 PDU failure (~500-1000 machines suddenly disappear, ~6 hours to come back)
~1 rack-move (plenty of warning, ~500-1000 machines powered down, ~6 hours)
~1 network rewiring (rolling ~5% of machines down over 2-day span)
~20 rack failures (40-80 machines instantly disappear, 1-6 hours to get back)
~5 racks go wonky (40-80 machines see 50% packet loss)
~8 network maintenances (4 might cause ~30-minute random connectivity losses)
~12 router reloads (takes out DNS and external vips for a couple minutes)
~3 router failures (have to immediately pull traffic for an hour)
~dozens of minor 30-second blips for dns
~1000 individual machine failures
~thousands of hard drive failures
slow disks, bad memory, misconfigured machines, flaky machines, etc.
Note: Data taken from Jeff Dean’s slides
PROBLEM STATEMENT
4/6/2016 T-MobileConfidential5
Given the types of failures that can occur, we need a fault-tolerant
system that
• Continues to operate in the event of failure of a
subset of its components
• Is Highly Available (HA)
• Handles failure gracefully
CIRCUIT BREAKER PATTERN
• If a power surge occurs in the electrical wiring, the breaker will
trip (“On” to “Off”)
• Netflix Hystrix follows the circuit breaker pattern
• If a service’s error rate exceeds a threshold, the circuit breaker
trips and blocks requests to that service for a specific period of
time
• The threshold is configurable, for example:
• Endpoint takes > 1 sec to respond
• Endpoint returns a 500 error
• Endpoint returns a 500 error 6 times in a row
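The trip-and-block behavior described on this slide can be sketched in plain Java. This is a minimal, single-threaded illustration and not Hystrix's API or implementation: the class name `SimpleCircuitBreaker`, its constructor parameters, and the injectable clock are all hypothetical, and it counts consecutive failures (as in the "6 times in a row" example) rather than tracking an error rate over a time window.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical minimal circuit breaker; single-threaded, for illustration only. */
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold; // consecutive failures before tripping
    private final long openMillis;      // how long calls are blocked once open
    private final LongSupplier clock;   // time source in milliseconds (injectable)

    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0L;

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    State getState() {
        return state;
    }

    /** Runs the action through the breaker; returns the fallback when blocked or on failure. */
    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < openMillis) {
                return fallback.get();   // still open: fail fast without calling the service
            }
            state = State.HALF_OPEN;     // wait period elapsed: let one trial call through
        }
        try {
            T result = action.get();
            consecutiveFailures = 0;     // any success resets the count and closes the circuit
            state = State.CLOSED;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;      // trip (or re-trip) the breaker
                openedAt = clock.getAsLong();
            }
            return fallback.get();
        }
    }
}
```

With a threshold of 6 and a wait period of, say, 5 seconds, a caller would wrap each remote call as `breaker.call(() -> client.fetch(), () -> defaultValue)`, where `client.fetch()` stands in for any call that throws on a 500 response or a timeout. Passing `System::currentTimeMillis` as the clock gives wall-clock behavior; injecting the clock mainly makes the state transitions easy to exercise in tests.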
SUMMARY
• Like a physical circuit breaker, the circuit breaker
pattern allows a subsystem to fail gracefully without
a complete system failure
• Failure is inevitable, be prepared for it
• Primarily used in aggregation scenarios
#2 Encourage interactive session
Informal discussion
Eat lunch
My goal is to keep us all on the same page at a conceptual level.
Please stop me and ask questions
#4 It's still hard to come up with a firm definition.
Instead of defining, think about common characteristics.
Most of those who are doing MS will be doing most of these things
Let's go through each of these
#5 Netflix randomly brings down nodes and simulates failures to check resiliency
Bring up CAP theorem here
#6 If its operating quality decreases at all, the decrease is proportional to the severity of the failure, as compared to a naively designed system in which even a small failure can cause total breakdown.