Welcome
Me:
• Dave Rosenthal
• Co-founder of FoundationDB
• Spent last three years building a distributed transactional NoSQL database
• It’s my birthday
Any time you have multiple computers working on a job, you have a load balancing problem!
Warning
There is an ugly downside to learning about load balancing: TSA checkpoints, grocery store lines, and traffic lights may become even more frustrating.
What is load balancing?
Wikipedia: “…methodology to distribute workload across multiple computers … to achieve optimal resource utilization, maximize throughput, minimize response time, and avoid overload”
All part of the latency curve
Goal for real-time systems
Low latency at a given load
[Chart: latency (log scale) vs. jobs/second]
Goal for batch systems
High jobs/second at a reasonable latency
[Chart: latency (log scale) vs. jobs/second]
The latency curve
Better load balancing strategies can dramatically improve both latency and throughput
[Chart: latency (ms, log scale) vs. load (0 to 1)]
Load balancing tensions
• We want to shorten queues in the system to yield better latency
• We want to lengthen queues to keep a “buffer” of work that keeps servers busy during irregular traffic and yields better throughput
• For distributed systems, equalizing queue lengths sounds good
Can we just limit queue sizes?
[Chart: % of dropped jobs (0 to 40) vs. queued job limit (0 to 20)]
Simple strategies
Global job queue: for slow tasks
Round robin: for highly uniform situations
Random: probably won’t screw you
Sticky: for cacheable situations
Fastest of N tries: trades throughput for latency. I recommend N = 2 or 3.
Use a global queue if possible
[Chart: latency under 80% load (log scale) vs. cluster size (1 to 10), comparing random assignment with a global job queue]
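The comparison on this slide can be explored with a small simulation. The sketch below is not the simulation behind the original chart: it assumes exponentially distributed arrivals and job durations with a mean duration of 1.0, and the function name `mean_latency` and its parameters are made up for illustration.

```python
import random


def mean_latency(num_servers, load, global_queue, num_jobs=50000, seed=1):
    """Simulate FIFO service and return the mean job latency (wait + service)."""
    rng = random.Random(seed)
    arrival_rate = load * num_servers  # assumption: mean service time is 1.0
    free_at = [0.0] * num_servers      # time at which each server runs out of work
    now = 0.0
    total = 0.0
    for _ in range(num_jobs):
        now += rng.expovariate(arrival_rate)  # next arrival time
        service = rng.expovariate(1.0)        # this job's duration
        if global_queue:
            # A shared FIFO queue hands each job to the server that frees up first.
            server = min(range(num_servers), key=lambda i: free_at[i])
        else:
            # Random assignment: the job waits behind whatever that server holds.
            server = rng.randrange(num_servers)
        start = max(now, free_at[server])
        free_at[server] = start + service
        total += free_at[server] - now
    return total / num_jobs


for size in (1, 2, 5, 10):
    print(size, mean_latency(size, 0.8, False), mean_latency(size, 0.8, True))
```

Under these assumptions, random assignment should stay roughly flat as the cluster grows, while the global queue should approach the bare service time.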
Options for information transfer
• None (rare)
• Latency (most common)
• Failure detection
• Explicit
 – Load average
 – Queue length
 – Response times
FoundationDB’s approach
1. Send the request to a random one of three servers
2. The server either answers the query or replies “busy” if its queue is longer than the queue limit estimate
3. Queries that got a “busy” reply are sent to a second random server with the “must do” flag set
Queue limit = 25 * 2^(20*P)
• A global queue limit is implicitly shared by estimating the fraction of incoming requests (P) that are flagged “must do”
• Converges to a P(redirect)/queue-size equilibrium
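A minimal sketch of the three steps above, to make the queue-limit formula concrete. This is not FoundationDB’s code: the `Server` class, the `submit` helper, and the way P is estimated (a plain running fraction of all requests seen so far) are assumptions for illustration only.

```python
import random

BASE_LIMIT = 25  # constant from the slide's formula


def queue_limit(must_do_fraction):
    """Queue limit = 25 * 2^(20*P), where P is the 'must do' fraction."""
    return BASE_LIMIT * 2 ** (20 * must_do_fraction)


class Server:
    def __init__(self):
        self.queue = []
        self.seen = 0          # all requests offered to this server
        self.must_do_seen = 0  # requests that arrived flagged "must do"

    def must_do_fraction(self):
        # Assumption: P is estimated as a simple running fraction.
        return self.must_do_seen / self.seen if self.seen else 0.0

    def offer(self, job, must_do=False):
        """Accept the job and return True, or return False to reply 'busy'."""
        self.seen += 1
        if must_do:
            self.must_do_seen += 1
        limit = queue_limit(self.must_do_fraction())
        if not must_do and len(self.queue) > limit:
            return False
        self.queue.append(job)
        return True


def submit(job, servers, rng=random):
    """Try one random server, then a second one with the 'must do' flag."""
    first, second = rng.sample(servers, 2)
    if first.offer(job):
        return first
    second.offer(job, must_do=True)
    return second
```

With no redirects (P = 0) the limit is 25 queued requests; every additional 5% of “must do” traffic doubles it, which is how busy servers collectively raise the shared limit.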
FDB latency curve before/after
[Chart: latency (log scale) vs. operations per second (0 to 1,200,000), before and after]
Tackling load balancing
• Queuing theory: One useful insight
• Simulation: Do this
• Instrumentation: Do this
• Control theory: Know how to avoid this
• Operations research: Read about this for fun
 – Blackett: Shield planes where they are not shot!
The one insight: Little’s law
Q = R*W
• (Q)ueue size = (R)ate * (W)ait-time
• Q is the average number of jobs in the system
• R is the average arrival rate (jobs/second)
• W is the average wait time (seconds)
• Holds for any (!) steady-state system
 – Or sub-systems, or joint systems, or…
Little’s law example 1
Q = R*W
• We get 1,000,000 requests per second (R = 1E6)
• We take 100 ms to service each request (W = 0.100)
• (Q = 1E6 * 0.100)
• Little’s law: Average queue depth is 100,000!
Little’s law example 2
W = Q/R
• We have 100 users in the system making continuous requests (Q = 100)
• We get 10,000 requests per second (R = 10,000)
• (W = 100 / 10,000)
• Little’s law: Average wait time is 10 ms
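Both examples are one-line calculations, sketched here so the units are explicit. The function names are illustrative only.

```python
def queue_size(rate_per_second, wait_seconds):
    """Little's law: Q = R * W (average number of jobs in the system)."""
    return rate_per_second * wait_seconds


def wait_time(jobs_in_system, rate_per_second):
    """Little's law rearranged: W = Q / R (average wait in seconds)."""
    return jobs_in_system / rate_per_second


# Example 1: 1,000,000 requests/second, 100 ms each
print(queue_size(1e6, 0.100))

# Example 2: 100 jobs in the system, 10,000 requests/second
print(wait_time(100, 10_000) * 1000, "ms")
```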
Little’s law ramifications
Q = R*W
• In a distributed system:
 – R scales up
 – W remains the same, or gets a bit worse
• To maintain performance, you’re going to need a whole lot of jobs in flight
The rest of queuing theory
Erlang
• A language
• A man (Agner Krarup Erlang)
• And a unit! (Q from Little’s law, AKA offered load, is measured in dimensionless Erlang units)
• Erlang-B formula (for limited-length queues)
• Erlang-C formula (P(waiting))
Abandon hope
Math for queuing theory
[Chart: complexity of math (log scale) vs. real-world applicability, with labels “Little’s law” and “?”]
Simulation
The best way to explore distributed system behavior
Quiz
Model: Jobs of random durations. 80% load.
Goal: Minimize average job latency.
What to work a bit more on?
• First task received
• Last task received
• Shortest task
• Longest task
• Random task
• Task with least work remaining
• Task with most work remaining
Simulation results at 80% load
[Bar chart: latency (0 to 50) for each strategy: first task received, last task received, shortest task, longest task, random task, task with least work remaining, task with most work remaining]
Simulation results at 95% load
[Bar chart: latency (log scale, 1 to 100,000) for each strategy: first task received, last task received, shortest task, longest task, random task, task with least work remaining, task with most work remaining]
FoundationDB’s approach
• The strategy, validated using simulation, is used for a single server’s fiber scheduling
• High priority: Work on the next task to finish
• But be careful to enqueue incoming work from the network with the highest priority: we want to know about all our jobs to make good decisions
• Low priority: Catch up with housekeeping (e.g. non-log writing)
Load spikes
[Figure panels: Low load system, High load system]
Bursts of job requests can destroy latency. The effect is quadratic: a burst produces a queue of size B that lasts for a time proportional to B. On highly loaded systems, the effect is multiplied by 1/(1 - load), leading to huge latency impacts.
Burst-avoiding tip
1. Search for any delay/interval in your system
2. If system correctness depends on the delay/interval being exact, first fix that
3. Now change that delay/interval to randomly wait 0.8-1.2 times the nominal time on each execution
YMMV, but this tends to diffuse system events more evenly in time and help utilization and latency.
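Step 3 is a one-line change in most code bases. A minimal sketch, assuming a uniform distribution over the 0.8-1.2 range (the slide does not say which distribution to use); the helper name `jittered` is made up for illustration.

```python
import random


def jittered(nominal_seconds, spread=0.2, rng=random):
    """Return the nominal delay scaled by a random factor in [1 - spread, 1 + spread]."""
    return nominal_seconds * rng.uniform(1 - spread, 1 + spread)


# Instead of time.sleep(10) in a periodic loop, sleep for jittered(10).
print([round(jittered(10), 2) for _ in range(5)])
```

Drawing a fresh factor on every execution matters: a single random offset chosen at startup still leaves periodic tasks marching in lockstep afterwards.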
Overload
What happens when work comes in too fast?
• Somewhere in your system a queue is going to get huge. Where?
• Lowered efficiency due to:
 – Sloshing
 – Poor caching
• Unconditional acceptance of new work means no information transfer to previous system!
Overload (cont’d): Sloshing
Loading 10 million rows into a popular NoSQL K/V store shows sloshing
[Chart spanning 12.5 minutes]
Overload (cont’d): No sloshing
Loading 10 million rows into FDB shows smooth behavior.
System queuing
[Diagram: work items A to E and three nodes (Node 1, Node 2, Node 3), each with its own queue]
System queuing
[Diagram: next frame, with work item A now in a node queue and items B to E still waiting]
Internal queue buildup
[Diagram: work items A to E and three nodes (Node 1, Node 2, Node 3), with work piling up in the node queues]
Even queues, external buildup
[Diagram: items A, B, and C spread across the queues of Node 1, Node 2, and Node 3, with D, E, … waiting outside the system]
Our approach
“Ratekeeper”
• Active management of internal queue sizes prevents sloshing
• Avoids every subcomponent needing its own well-tuned load balancing strategy
• Explicitly send queue information at 10 Hz back to a centrally elected control algorithm
• When queues get large, slow system input
• Pushes latency into an external queue at the front of the system using “tickets”
Ratekeeper in action
[Chart: operations per second (0 to 1,400,000) vs. seconds (0 to 600)]
What can go wrong
Well, we are controlling the queue depths of the system, so, basically, everything in control theory…
Namely, oscillation:
Recognizing oscillation
• Something moving up and down :)
 – Look for low utilization of parallel resources
 – Zoom in!
• Think about sources of feedback: is there some way that a machine getting more jobs done feeds either less or more work to that machine in the future? (probably yes)
What oscillation looks like
[Chart: utilization % (0 to 70) of Node A and Node B, x-axis 1 to 5]
What oscillation looks like
[Chart, zoomed in: utilization % (-20 to 120) of Node A and Node B, x-axis 2 to 2.3]
Avoiding oscillation
• This is control theory. Avoid if possible!
• The major thing to know: control gets harder as frequencies get higher (e.g. Bose headphones)
• Two strategies:
 – Control on a longer time scale
 – Introduce a low-pass filter in the control loop (e.g. exponential moving average)
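An exponential moving average is about the simplest low-pass filter there is. The sketch below shows one way to smooth a noisy measurement before a control loop reacts to it; the class name and the `alpha` value are illustrative assumptions, not taken from FoundationDB.

```python
class ExponentialMovingAverage:
    """Low-pass filter: each new sample moves the estimate by a fraction alpha."""

    def __init__(self, alpha):
        self.alpha = alpha  # 0 < alpha <= 1; smaller means smoother and slower
        self.value = None

    def update(self, sample):
        if self.value is None:
            self.value = float(sample)  # seed with the first sample
        else:
            self.value += self.alpha * (sample - self.value)
        return self.value


# A queue length that flips between 0 and 100 on every reading
ema = ExponentialMovingAverage(alpha=0.1)
smoothed = [ema.update(100 if i % 2 else 0) for i in range(200)]
print(smoothed[-2:])
```

The raw signal swings by 100 on every reading, while the filtered one settles into a narrow band around 50. The price is lag: the controller reacts more slowly to real changes, which is the same tradeoff as controlling on a longer time scale.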
Instrumentation
If you can’t measure, you can’t make it better
Things that might be nice to measure:
• Latencies
• Queue lengths
• Causes of latency?
Measuring latencies
Our approach:
• We want information about the distribution, not just the average
• We use a “Distribution” class
 – addSample(X)
 – Stores 500+ samples
 – Throws away half of them when it hits 1000 samples, and halves probability of accepting new samples
 – Also tracks exact min, max, mean, and stddev
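A rough sketch of what such a class could look like. This is a reconstruction from the bullets above, not FoundationDB’s implementation: discarding a random half (rather than, say, every other sample), the `percentile` helper, and the Python method names are all assumptions.

```python
import math
import random


class Distribution:
    def __init__(self, max_samples=1000, rng=None):
        self.max_samples = max_samples
        self.rng = rng or random.Random()
        self.samples = []              # bounded random subset of everything seen
        self.accept_probability = 1.0  # chance that a new sample is stored
        self.count = 0
        self.total = 0.0
        self.total_sq = 0.0
        self.min = math.inf
        self.max = -math.inf

    def add_sample(self, x):
        # Exact statistics are updated for every sample.
        self.count += 1
        self.total += x
        self.total_sq += x * x
        self.min = min(self.min, x)
        self.max = max(self.max, x)
        # Only a shrinking fraction of samples is stored.
        if self.rng.random() < self.accept_probability:
            self.samples.append(x)
        if len(self.samples) >= self.max_samples:
            # Assumption: keep a random half, then halve the acceptance rate
            # so old and new samples stay equally represented.
            self.samples = self.rng.sample(self.samples, self.max_samples // 2)
            self.accept_probability /= 2

    def mean(self):
        return self.total / self.count

    def stddev(self):
        # Naive sum-of-squares formula; fine for a sketch, not for extreme ranges.
        variance = self.total_sq / self.count - self.mean() ** 2
        return math.sqrt(max(variance, 0.0))

    def percentile(self, fraction):
        ordered = sorted(self.samples)
        return ordered[min(int(fraction * len(ordered)), len(ordered) - 1)]
```

Memory stays bounded at 1000 stored values no matter how many samples arrive, while percentiles estimated from the stored subset remain representative of the whole stream.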
Measuring queue lengths
Our approach:
• Track the % of time that a queue is at zero length
• Measure queue length snapshots at intervals
• Watch out for oscillations
 – Slow ones you can see
 – Fast ones look like noise (which, unfortunately, is also what noise looks like)
 – “Zoom in” to exclude the possibility of micro-oscillations
Measuring latency from blocking
• Easy to calculate:
 – L = (b0^2 + b1^2 + … + bN^2) / elapsed
 – Total all squared seconds of blocking time over some interval, divide by the duration of the interval.
• Measures impact of unavailability on mean latency from random traffic
• Example: Is server’s slow latency explained by this lock?
• Doesn’t count catch-up time.
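The formula is short enough to show directly. The sketch below implements it exactly as written on the slide; the function name and the sample numbers are made up for illustration.

```python
def blocking_latency(blocking_seconds, elapsed_seconds):
    """L = (b0^2 + b1^2 + ... + bN^2) / elapsed, with all times in seconds."""
    return sum(b * b for b in blocking_seconds) / elapsed_seconds


# One 1-second stall in a 100-second window...
print(blocking_latency([1.0], 100.0))
# ...versus the same total blocking time split into ten 0.1-second stalls
print(blocking_latency([0.1] * 10, 100.0))
```

Squaring is what makes long stalls dominate: the single 1-second stall scores ten times higher than ten 0.1-second stalls, even though the total blocked time is identical.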
Summary
Thanks for listening, and remember:
• Everything has a latency curve
• Little’s law
• Randomize regular intervals
• Validate designs with simulation
• Instrument
May your queues be small, but not email@example.com
Prioritization/QOS
• Can help in systems under partial load
• Vital in systems that handle batch and real-time loads simultaneously
• Be careful that high priority work doesn’t generate other high priority work plus other jobs in the queue. This can lead to poor utilization analogous to the internal queue buildup case.
Congestion pricing
• My favorite topic
• Priority isn’t just a function of the benefit of your job
• To be a good citizen, you should subtract the costs to others
• For example, jumping into the front of a long queue has costs proportional to the queue size
Other FIFO alternatives?
• LIFO
 – Avoids the reason to line up early
 – In situations where there is adequate capacity to serve everyone, can yield better waiting times for everyone involved