Utilization, Headroom, and All That
Large-scale Production Engineering User Group, January 13, 2011
Daniel Austin, Sr. Principal Architect, Yahoo! Exceptional Performance Team
Capacity, Headroom, Utilization
[Figure labels: Headroom, Utilization, Capacity]
C = H + U = 1
What’s the Problem?
We want to figure out ways to maximize the value we get from our hardware and software, while meeting our performance goals and other requirements.
What does this mean for utilization? Should it be 100%?
We also need to make sure we are not taking big risks if something happens
We want to avoid this!
A Simple Model…
Each node is identical
The system load is distributed equally in parallel across all nodes
The system can lose 1 node without failing or exceeding 100% utilization
(We’ll handle the serial case later)
…and some Definitions
W = peak-to-average workload ratio
Provides an upper bound on the necessary system capacity
Typically 3-5 for HTTP servers, but bursts can reach 100:1
H = Headroom = 1 - maximum (safe) utilization for a single node
This could be 0% in the limit but this is usually unwise!
Should be closer to 30% to avoid nonlinear response
N = The number of nodes (machines)
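As a rough illustration of the definitions above, W can be estimated from a series of load samples as the maximum divided by the mean. This is only a sketch: the function name `peak_to_average` and the sample request rates are hypothetical and not from the talk, and real traffic data would need a sampling interval short enough to capture bursts.

```python
from statistics import mean


def peak_to_average(samples):
    """Return W = max(samples) / mean(samples) for a series of load samples."""
    if not samples:
        raise ValueError("need at least one sample")
    avg = mean(samples)
    if avg <= 0:
        raise ValueError("average load must be positive")
    return max(samples) / avg


# Hypothetical request rates (requests/second), invented for illustration only.
sampled_rps = [100, 100, 200, 400, 200]

w = peak_to_average(sampled_rps)

# H = 0.30 is the headroom suggested above; 1 - H is the maximum safe
# utilization for a single node.
headroom = 0.30
max_safe_utilization = 1 - headroom

print(f"W = {w:.1f}, max safe utilization = {max_safe_utilization:.0%}")
```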
What’s the right level of utilization?
U = ((1 - H) / W) * ((N - 1) / N)
That’s a very simple result!
Example: Let’s say we have 3 web servers and a load balancer. If we lose 1 node we’ll have only 2 left. We know we are CPU-bound, so we’ll choose our headroom to be at least 30% so we don’t get nonlinear response times from an overloaded CPU. At peak our traffic is twice the average, so we have to plan for that too:
N = 3, W = 2, H = 0.3
So the average CPU utilization turns out to be:
U = (0.7 / 2) * (2 / 3) = 0.233 = 23.3%!
But maybe not the expected result!
[Figure: load balancer in front of the 3 web servers]
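The formula can be checked with a few lines of Python. One way to read it, under the simple model above: with one node lost, the N - 1 survivors carry the whole peak load U * W * N, so each runs at U * W * N / (N - 1), and that must not exceed 1 - H. The sketch below assumes this reading; the function name `average_utilization` is illustrative, not from the talk. For the 3-server example, (0.7 / 2) * (2 / 3) evaluates to roughly 0.233.

```python
def average_utilization(headroom, peak_to_average, nodes):
    """Average per-node utilization U = ((1 - H) / W) * ((N - 1) / N).

    Assumes identical nodes, load spread equally in parallel, and
    tolerance for losing exactly one node (the simple model above).
    """
    if nodes < 2:
        raise ValueError("need at least 2 nodes to survive losing one")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    if peak_to_average < 1:
        raise ValueError("peak-to-average ratio must be >= 1")
    return (1 - headroom) / peak_to_average * (nodes - 1) / nodes


# The 3-server example: N = 3, W = 2, H = 0.3
u = average_utilization(headroom=0.3, peak_to_average=2, nodes=3)
print(f"{u:.1%}")  # prints 23.3%

# Sanity check: at peak, with one node down, each survivor sits at 1 - H.
survivor_peak = u * 2 * 3 / (3 - 1)
print(f"{survivor_peak:.0%}")  # prints 70%
```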
But We Can Always Throw Hardware At It, Right?
Not really. Another example: Let’s say we have 30 web servers and a load balancer as shown. If we lose 1 node we’ll have 29 left, no worries. We’ll choose our headroom to be at least 30% again, and keep peak traffic at twice the average:
N = 30, W = 2, H = 0.3
So the average CPU utilization turns out to be:
U = (0.7 / 2) * (29 / 30) = 0.338 = 33.8%!
Lots of machines won’t make a big difference in the utilization.
[Figure: load balancer in front of the web servers, … + 27 more]
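To see why extra hardware helps so little, note that the (N - 1) / N factor approaches 1 as N grows, so U can never exceed (1 - H) / W. The sketch below sweeps a few node counts under the same assumptions as the examples above (H = 0.3, W = 2); the helper name and the choice of node counts are illustrative only.

```python
def average_utilization(headroom, peak_to_average, nodes):
    """U = ((1 - H) / W) * ((N - 1) / N) for the simple one-node-loss model."""
    if nodes < 2:
        raise ValueError("need at least 2 nodes to survive losing one")
    return (1 - headroom) / peak_to_average * (nodes - 1) / nodes


H, W = 0.3, 2

# Upper bound on U as N grows without limit: (1 - H) / W
ceiling = (1 - H) / W

node_counts = (2, 3, 10, 30, 100, 1000)
results = {n: average_utilization(H, W, n) for n in node_counts}

for n, u in results.items():
    # Show each utilization and how far it sits below the ceiling.
    print(f"N={n:>4}  U={u:.3f}  gap to ceiling={ceiling - u:.3f}")
```

With these inputs the ceiling is 35%, so going from 3 nodes to 30 buys about ten points of utilization, and no number of additional nodes can buy much more. Raising the ceiling itself means changing H or W.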
Four Takeaways On Utilization
Utilization for a single member of a set of parallel servers may be very low (20-30%), especially for smaller systems, if we take redundancy and performance into account.
Key metrics are Headroom & peak-to-average Workload