2. Welcome
Me:
• Dave Rosenthal
• Co-founder of FoundationDB
• Spent last three years building a distributed
transactional NoSQL database
• It’s my birthday
Any time you have multiple computers working on a
job, you have a load balancing problem!
3. Warning
There is an ugly downside to learning about load
balancing: TSA checkpoints, grocery store lines,
and traffic lights may become even more
frustrating.
4. What is load balancing?
Wikipedia: “…methodology to distribute
workload across multiple computers … to
achieve optimal resource utilization, maximize
throughput, minimize response time, and avoid
overload”
All part of the latency curve
6. Goal for real-time systems
[Figure: latency (log scale, 1 to 10000) vs. jobs/second; annotation: “Low latency at given load”]
7. Goal for batch systems
[Figure: latency (log scale, 1 to 10000) vs. jobs/second; annotation: “High Jobs/sec at a reasonable latency”]
8. The latency curve
[Figure: latency (ms, log scale, 1 to 1000) vs. load (0 to 1)]
Better load balancing strategies can dramatically improve both
latency and throughput
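A quick way to see why the curve bends up so sharply is the textbook M/M/1 model: Poisson arrivals, exponentially distributed service times, one server. That model is an assumption here, not the system behind the chart. In it, mean latency is the service time divided by (1 - load). The 1 ms service time is a hypothetical parameter.

```python
def mm1_latency(load: float, service_time: float = 1.0) -> float:
    """Mean time in system (queueing + service) for an M/M/1 queue.

    Assumes Poisson arrivals and exponential service; latency = S / (1 - load).
    """
    if not 0.0 <= load < 1.0:
        raise ValueError("load must be in [0, 1)")
    return service_time / (1.0 - load)


if __name__ == "__main__":
    # Print the shape of the latency curve for a 1 ms service time.
    for load in (0.0, 0.2, 0.4, 0.6, 0.8, 0.9, 0.95, 0.99):
        print(f"load={load:.2f}  latency={mm1_latency(load):7.1f} ms")
```

Going from 80% to 90% load doubles latency (5 ms to 10 ms in this model), and 99% load costs 100x the unloaded latency.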
9. Load balancing tensions
• We want to reduce queue lengths in the
system to yield better latency
• We want longer queues, so that a “buffer” of work keeps
servers busy during irregular traffic and yields better throughput
• For distributed systems, equalizing queue
lengths sounds good
10. Can we just limit queue sizes?
[Figure: % of dropped jobs (0 to 40) vs. queued job limit (0 to 20)]
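The tradeoff in that chart can be sketched with the standard M/M/1/K formula for the fraction of arrivals turned away by a bounded queue. The load behind the chart isn't given, so the load passed in is a hypothetical parameter, and the model (Poisson arrivals, exponential service, one server) is an assumption. Here `queue_limit` counts waiting jobs, with one more in service.

```python
def drop_fraction(load: float, queue_limit: int) -> float:
    """Fraction of arrivals rejected by an M/M/1/K queue.

    K = queue_limit + 1: up to queue_limit jobs wait while one is in service.
    Blocking probability p_K = (1 - rho) * rho^K / (1 - rho^(K+1)).
    """
    k = queue_limit + 1
    if abs(load - 1.0) < 1e-12:
        # Limit of the formula as rho -> 1: all K+1 states equally likely.
        return 1.0 / (k + 1)
    return (1.0 - load) * load**k / (1.0 - load ** (k + 1))


if __name__ == "__main__":
    for limit in (0, 1, 2, 5, 10, 20):
        print(f"limit={limit:2d}  dropped={100 * drop_fraction(0.8, limit):5.1f}%")
```

Drops fall off roughly geometrically as the limit grows, but every slot you add is a slot that can hold a waiting job, so latency climbs back toward the unbounded-queue curve.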
11. Simple strategies
Global job queue: for slow tasks
Round robin: for highly uniform situations
Random: probably won’t screw you
Sticky: for cacheable situations
Fastest of N tries: trade off throughput for
latency. I recommend N = 2 or 3.
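A minimal sketch of these simple strategies, assuming servers are plain identifiers and `send(server)` is a hypothetical call returning `(latency, result)`. The class and function names are illustrative, not from any library. For the global job queue, see the simulation under the next slide.

```python
import itertools
import random
import zlib


class RoundRobin:
    """Cycle through servers in order: fine when jobs and servers are uniform."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self, key=None):
        return next(self._cycle)


class RandomPick:
    """Uniform random choice: rarely optimal, rarely terrible."""
    def __init__(self, servers, seed=None):
        self.servers = list(servers)
        self._rng = random.Random(seed)

    def pick(self, key=None):
        return self._rng.choice(self.servers)


class Sticky:
    """Same key always maps to the same server, so that server's cache stays warm."""
    def __init__(self, servers):
        self.servers = list(servers)

    def pick(self, key: str):
        # crc32 is stable across processes, unlike Python's salted str hash().
        return self.servers[zlib.crc32(key.encode()) % len(self.servers)]


def fastest_of_n(send, servers, n=2, rng=random):
    """Send the same request to n distinct servers and keep the quickest reply.

    Costs n times the work (throughput) to cut latency. Issued sequentially
    here for simplicity; a real client would send them concurrently.
    """
    replies = [send(s) for s in rng.sample(servers, n)]
    return min(replies, key=lambda reply: reply[0])
```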
12. Use a global queue if possible
[Figure: latency under 80% load (log scale, 0.1 to 10) vs. cluster size (1 to 10), comparing random assignment with a global job queue]
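The comparison can be reproduced in spirit with a small simulation. It assumes Poisson arrivals and exponential service times with mean 1, which may not match the talk's setup. With random assignment each server sees its own Poisson stream at 80% load, so latency stays at the single-server value however large the cluster gets. A shared FIFO queue hands each job to whichever server frees up first.

```python
import random


def mean_latency(cluster_size, load=0.8, n_jobs=200_000, global_queue=True, seed=42):
    """Mean time in system (wait + service) for a cluster of identical servers.

    Assumes Poisson arrivals and exponential service with mean 1.
    """
    rng = random.Random(seed)
    free_at = [0.0] * cluster_size      # time at which each server next becomes idle
    now = 0.0
    total = 0.0
    for _ in range(n_jobs):
        now += rng.expovariate(load * cluster_size)   # next arrival time
        service = rng.expovariate(1.0)
        if global_queue:
            # FIFO shared queue: the job at the head takes whichever server frees up first.
            s = min(range(cluster_size), key=lambda i: free_at[i])
        else:
            # Random assignment: the job is bound to one server's private queue.
            s = rng.randrange(cluster_size)
        start = max(now, free_at[s])
        free_at[s] = start + service
        total += free_at[s] - now
    return total / n_jobs


if __name__ == "__main__":
    for c in (1, 2, 5, 10):
        print(c, round(mean_latency(c, global_queue=False), 2), round(mean_latency(c), 2))
```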
13. Options for information transfer
• None (rare)
• Latency (most common)
• Failure detection
• Explicit
– Load average
– Queue length
– Response times
14. FoundationDB’s approach
1. Send each request to one of three servers, chosen at random
2. The server either answers the query or replies “busy” if its
queue is longer than the queue limit estimate
3. Queries that got a “busy” reply are sent to a second random
server with the “must do” flag set.
Queue limit = 25 * 2^(20*P)
• A global queue limit is implicitly shared by estimating
the fraction of incoming requests (P) that are flagged
“must do”
• Converges to a P(redirect)/queue-size equilibrium
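Here is a single-process sketch of that protocol. It isn't FoundationDB's code; the names are hypothetical. The queue limit formula is the one on the slide. How each server estimates P is not specified, so the exponential moving average and its weight `alpha` are assumptions.

```python
import random


def queue_limit(p_must_do: float) -> float:
    """Queue limit = 25 * 2^(20*P), where P is the estimated fraction of "must do" requests."""
    return 25 * 2 ** (20 * p_must_do)


class Server:
    """Hypothetical server that answers or replies "busy" based on its queue limit estimate."""

    def __init__(self, alpha: float = 0.01):
        self.queue = []
        self.p_estimate = 0.0   # running estimate of P from incoming traffic
        self.alpha = alpha      # EMA weight: an assumption, not from the talk

    def offer(self, request, must_do: bool = False) -> bool:
        # Every incoming request updates the estimate of P; because requests are
        # spread at random, each server's estimate approximates the global P.
        self.p_estimate += self.alpha * (float(must_do) - self.p_estimate)
        if must_do or len(self.queue) < queue_limit(self.p_estimate):
            self.queue.append(request)
            return True
        return False            # reply "busy"


def route(request, replicas, rng=random):
    """Try a random one of three replicas; if it is busy, a second one must take the job."""
    first, second = rng.sample(replicas, 2)
    if first.offer(request):
        return first
    second.offer(request, must_do=True)
    return second
```

The feedback is what produces the equilibrium: more redirects raise P, which raises every server's queue limit, which causes fewer redirects.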
15. FDB latency curve before/after
[Figure: latency (log scale, 0.1 to 100) vs. operations per second (0 to 1,200,000), before and after]
16. Tackling load balancing
• Queuing theory: One useful insight
• Simulation: Do this
• Instrumentation: Do this
• Control theory: Know how to avoid this
• Operations research: Read about this for fun
– Blackett: Shield planes where they are not shot!
17. The one insight: Little’s law
Q = R*W
• (Q)ueue size = (R)ate * (W)ait-time
• Q is the average number of jobs in the system
• R is the average arrival rate (jobs/second)
• W is the average wait time (seconds)
• For any (!) steady-state system
– Or sub-systems, or joint systems, or…
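Little's law can be checked numerically. This sketch simulates a single FIFO server (Poisson arrivals and exponential service are assumptions; any discipline would do) and measures Q directly, as the time-average number of jobs present. It then compares Q with R*W computed from the same run. Over an interval that starts and ends empty the two agree up to floating-point rounding.

```python
import random


def simulate_queue(n_jobs=100_000, arrival_rate=0.9, service_rate=1.0, seed=7):
    """Return arrival and departure times for a single FIFO server."""
    rng = random.Random(seed)
    t, free_at = 0.0, 0.0
    arrivals, departures = [], []
    for _ in range(n_jobs):
        t += rng.expovariate(arrival_rate)
        start = max(t, free_at)
        free_at = start + rng.expovariate(service_rate)
        arrivals.append(t)
        departures.append(free_at)
    return arrivals, departures


def check_littles_law(arrivals, departures):
    """Return (Q, R, W) measured independently from one run."""
    horizon = max(departures)            # the system is empty again at this instant
    # Q: time-average number of jobs in the system, from a sweep of +1/-1 events.
    events = sorted([(a, 1) for a in arrivals] + [(d, -1) for d in departures])
    area, in_system, last = 0.0, 0, 0.0
    for time, delta in events:
        area += in_system * (time - last)
        in_system += delta
        last = time
    q = area / horizon
    r = len(arrivals) / horizon          # average arrival rate
    w = sum(d - a for a, d in zip(arrivals, departures)) / len(arrivals)
    return q, r, w
```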
18. Little’s law example 1
Q = R*W
• We get 1,000,000 requests per second (R=1E6)
• We take 100 ms to service each request
• (Q = 1E6*0.100)
• Little’s Law: Average queue depth is 100,000!
19. Little’s law example 2
W = Q/R
• We have 100 users in the system making
continuous requests (Q=100)
• We get 10,000 requests per second
• (W = 100 / 10,000)
• Little’s Law: Average wait time is 10 ms
20. Little’s law ramifications
Q = R*W
• In a distributed system:
– R scales up
– W remains the same, or gets a bit worse
• To maintain performance, you’re going to
need a whole lot of jobs in flight
21. The rest of queuing theory
Erlang
• A language
• A man (Agner Krarup Erlang)
• And a unit! (Q from Little’s law, AKA offered load, is
measured in dimensionless Erlang units)
• Erlang-B formula (for limited-length queues)
• Erlang-C formula (P(waiting))
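Both formulas are short in code. This sketch uses the standard numerically stable recursion for Erlang-B and derives Erlang-C from it; offered load is in Erlangs (arrival rate times mean service time). The `mean_wait` helper adds the standard M/M/c result, Wq = C / (c/S - λ); applying it to this talk's systems assumes Poisson arrivals and exponential service.

```python
def erlang_b(servers: int, offered_load: float) -> float:
    """Erlang-B: probability an arrival is blocked when there is no waiting room."""
    b = 1.0
    for k in range(1, servers + 1):
        b = offered_load * b / (k + offered_load * b)
    return b


def erlang_c(servers: int, offered_load: float) -> float:
    """Erlang-C: probability an arrival has to wait (unlimited queue)."""
    if offered_load >= servers:
        return 1.0          # overloaded: everyone eventually waits
    b = erlang_b(servers, offered_load)
    return servers * b / (servers - offered_load * (1.0 - b))


def mean_wait(servers: int, arrival_rate: float, service_time: float) -> float:
    """Mean time spent queued (excluding service) in an M/M/c system."""
    offered = arrival_rate * service_time
    if offered >= servers:
        return float("inf")
    return erlang_c(servers, offered) / (servers / service_time - arrival_rate)
```

For example, five servers at 80% load (4 Erlangs) make about 55% of arrivals wait, for a mean queueing delay of about 0.55 service times; one server at 80% load makes 80% wait, for 4 service times.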
22. Abandon hope
[Figure: “Math for queuing theory”: complexity of math (log scale, 1 to 10000) vs. real-world applicability; labeled points: “Little’s law” and “?”]
24. Quiz
Model: Jobs of random durations. 80% load.
Goal: Minimize average job latency.
Which task should get the next bit of work?
• First task received
• Last task received
• Shortest task
• Longest task
• Random task
• Task with least work remaining
• Task with most work remaining
26. Simulation results at 80% load
[Figure: bar chart of latency (0 to 50) for each quiz strategy: first task received, last task received, shortest task, longest task, random task, task with least work remaining, task with most work remaining]
27. Simulation results at 95% load
[Figure: bar chart of latency (log scale, 1 to 100000) for each quiz strategy: first task received, last task received, shortest task, longest task, random task, task with least work remaining, task with most work remaining]
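A sketch of the kind of simulation behind the quiz. The talk doesn't specify its job-size distribution, so exponential sizes with mean 1 and Poisson arrivals are assumptions, and the numbers won't match the charts exactly. The single server is preemptive: the policy is re-applied whenever a job arrives or finishes. Every policy sees the same arrival sequence.

```python
import random

# Each job is a list: [arrival_time, size, remaining_work]
POLICIES = {
    "first task received": lambda jobs, rng: min(jobs, key=lambda j: j[0]),
    "last task received": lambda jobs, rng: max(jobs, key=lambda j: j[0]),
    "shortest task": lambda jobs, rng: min(jobs, key=lambda j: j[1]),
    "longest task": lambda jobs, rng: max(jobs, key=lambda j: j[1]),
    "random task": lambda jobs, rng: rng.choice(jobs),
    "least work remaining": lambda jobs, rng: min(jobs, key=lambda j: j[2]),
    "most work remaining": lambda jobs, rng: max(jobs, key=lambda j: j[2]),
}


def mean_latency(policy, load=0.8, n_jobs=20_000, seed=1):
    """Mean job latency on one preemptive server under the named policy."""
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    for _ in range(n_jobs):
        t += rng.expovariate(load)       # mean job size is 1, so utilization == load
        arrivals.append((t, rng.expovariate(1.0)))
    choose = POLICIES[policy]
    active, now, nxt, done, total = [], 0.0, 0, 0, 0.0
    while done < n_jobs:
        next_arrival = arrivals[nxt][0] if nxt < n_jobs else float("inf")
        if active:
            job = choose(active, rng)
            if now + job[2] <= next_arrival:   # job finishes before the next arrival
                now += job[2]
                active.remove(job)
                total += now - job[0]
                done += 1
                continue
            job[2] -= next_arrival - now       # partial progress, then re-decide
        now = next_arrival
        arrival, size = arrivals[nxt]
        active.append([arrival, size, size])
        nxt += 1
    return total / n_jobs
```

Run it for each name in `POLICIES` with `load=0.8` and `load=0.95` to compare the strategies; expect the high-load runs of the worst policies to be slow.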
28. FoundationDB’s approach
• Strategy (validated using simulation) used for a
single server’s fiber scheduling
• High priority: Work on the next task to finish
• But be careful to enqueue incoming work
from the network with the highest priority: we
want to know about all our jobs to make good
decisions
• Low priority: Catch up with housekeeping (e.g.
non-log writing)
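The talk doesn't show FoundationDB's fiber scheduler, so the following is only a hypothetical sketch of the three-tier policy described above, built on `heapq`. It also doesn't say how remaining work is estimated; `remaining_estimate` is an assumed input supplied by the caller.

```python
import heapq
import itertools

# Lower value runs first.
NETWORK_INTAKE, NEXT_TO_FINISH, HOUSEKEEPING = 0, 1, 2


class FiberScheduler:
    """Hypothetical scheduler: priority tiers, then least estimated remaining work."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # tie-breaker keeps insertion order stable

    def submit(self, tier, task, remaining_estimate=0.0):
        heapq.heappush(self._heap, (tier, remaining_estimate, next(self._seq), task))

    def next_task(self):
        """Pop the task to run next, or None if nothing is runnable."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[3]
```

Draining the network first means every pending job is visible before the scheduler decides which one is closest to finishing.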
29. Load spikes
[Figure panels: low-load system, high-load system]
Bursts of job requests can destroy latency. The
effect is quadratic: a burst produces a queue of
size B that lasts time proportional to B. On highly
loaded systems, the effect is multiplied by
1/(1 - load), leading to huge latency impacts.
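The quadratic claim follows from a simple fluid approximation, an assumption that ignores randomness in service. A burst leaves B extra jobs queued. Spare capacity drains them at (1 - load) times the service rate, so the backlog lasts B / (service_rate * (1 - load)) and shrinks linearly. The total extra waiting is the area of that triangle.

```python
def burst_extra_wait(burst_size: float, service_rate: float, load: float) -> float:
    """Fluid approximation: total extra job-seconds of waiting caused by a burst.

    The backlog of burst_size jobs drains at (1 - load) * service_rate, so the
    area under it is 0.5 * B * B / (service_rate * (1 - load)).
    """
    if not 0.0 <= load < 1.0:
        raise ValueError("load must be in [0, 1)")
    drain_time = burst_size / (service_rate * (1.0 - load))
    return 0.5 * burst_size * drain_time
```

Doubling the burst quadruples the cost, and going from an idle system to 90% load multiplies it by 10.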
30. Burst-avoiding tip
1. Search for any delay/interval in your system
2. If system correctness depends on the
delay/interval being exact, first fix that
3. Now change that delay/interval to randomly
wait 0.8-1.2 times the nominal time on each
execution
YMMV, but this tends to diffuse system events more
evenly in time and help utilization and latency.
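Step 3 is a one-line change in most code. In this sketch `run_periodically` is a hypothetical loop, and `sleep` is injectable so the jitter can be tested without waiting.

```python
import random
import time


def jittered(nominal: float, rng=random) -> float:
    """Return a delay drawn uniformly from 0.8x to 1.2x the nominal delay."""
    return nominal * rng.uniform(0.8, 1.2)


def run_periodically(task, nominal_interval: float, iterations: int, sleep=time.sleep):
    """Hypothetical periodic loop that re-draws the jitter on every execution."""
    for _ in range(iterations):
        task()
        sleep(jittered(nominal_interval))
```

Re-drawing on every execution matters: a single random offset chosen at startup would still leave every node firing in lockstep at a fixed period.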
32. Overload
What happens when work comes in too fast?
• Somewhere in your system a queue is going to
get huge. Where?
• Lowered efficiency due to:
– Sloshing
– Poor caching
• Unconditional acceptance of new work means
no information flows back to the previous (upstream) system!
38. Even queues, external buildup
[Figure: Node 1, Node 2, and Node 3, each with its own queue; further work (A, B, C, D, E, …) builds up outside the nodes]
39. Our approach
“Ratekeeper”
• Active management of internal queue sizes
prevents sloshing
• Avoids every subcomponent needing its own
well-tuned load balancing strategy
• Explicitly send queue information at 10 Hz back to
a centrally-elected control algorithm
• When queues get large, slow system input
• Pushes latency into an external queue at the
front of the system using “tickets”
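The talk doesn't give Ratekeeper's control law, so this is a hypothetical sketch of the idea only. A controller receives the latest queue sizes (the slide's 10 Hz reports) and sets an admission rate. A token bucket at the front door then hands out tickets at that rate. The proportional rule, `target_queue`, and the bucket size are all assumptions.

```python
class Ratekeeper:
    """Hypothetical central controller: queue sizes in, admission rate out."""

    def __init__(self, max_rate: float, target_queue: float):
        self.max_rate = max_rate
        self.target_queue = target_queue
        self.rate = max_rate

    def update(self, queue_sizes) -> float:
        """Call with the latest reported queue sizes (e.g. 10 times a second)."""
        worst = max(queue_sizes, default=0)
        if worst <= self.target_queue:
            self.rate = self.max_rate
        else:
            # Shrink admissions in proportion to how far the worst queue overshoots.
            self.rate = self.max_rate * self.target_queue / worst
        return self.rate


class TicketGate:
    """Token bucket at the front of the system; requests without a ticket wait outside."""

    def __init__(self, burst: float = 1.0):
        self.burst = burst
        self.tokens = burst
        self.last = 0.0

    def try_acquire(self, now: float, rate: float) -> bool:
        self.tokens = min(self.burst, self.tokens + (now - self.last) * rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Because waiting happens at the gate, internal queues stay near their targets and the backlog is visible in one place instead of sloshing between nodes.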
40. Ratekeeper in action
[Figure: operations per second (0 to 1,400,000) over time (0 to 600 seconds)]
42. What can go wrong
Well, we are controlling the queue depths of the
system, so, basically, everything in control
theory…
Namely, oscillation:
43. Recognizing oscillation
• Something moving up and down :)
– Look for low utilization of parallel resources
– Zoom in!
• Think about sources of feedback: is there
some way that a machine getting more
work done feeds either less or more work to
that machine in the future? (probably yes)
44. What oscillation looks like
[Figure: utilization % (0 to 70) of Node A and Node B over time (1 to 5)]
45. What oscillation looks like
[Figure: utilization % (-20 to 120) of Node A and Node B over a narrower time window (2 to 2.3)]
46. Avoiding oscillation
• This is control theory—avoid if possible!
• The major thing to know: control gets harder
as frequencies get higher (e.g. Bose
headphones)
• Two strategies:
– Control on a longer time scale
– Introduce a low-pass filter in the control loop (e.g.
exponential moving average)
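The exponential moving average mentioned above is a first-order low-pass filter and fits in a few lines. A smaller `alpha` gives a lower cutoff: fast wiggles are damped harder, but the filtered signal also reacts more slowly to real changes. That lag is the price of stability.

```python
class ExponentialMovingAverage:
    """First-order low-pass filter: value += alpha * (sample - value)."""

    def __init__(self, alpha: float):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.value = None

    def update(self, sample: float) -> float:
        if self.value is None:
            self.value = sample     # seed with the first observation
        else:
            self.value += self.alpha * (sample - self.value)
        return self.value
```

For example, feeding a queue length that flips between 0 and 100 on every report through a filter with `alpha=0.1` leaves a ripple of only about 5 around 50, so a controller acting on the filtered value won't chase each swing.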
47. Instrumentation
If you can’t measure, you can’t make it better
Things that might be nice to measure:
• Latencies
• Queue lengths
• Causes of latency?
48. Measuring latencies
Our approach:
• We want information about the distribution, not
just the average
• We use a “Distribution” class
– addSample(X)
– Stores 500+ samples
– Throws away half of them when it hits 1000
samples, and halves probability of accepting new
samples
– Also tracks exact min, max, mean, and stddev
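A sketch of a class following that description, not FoundationDB's source. `add_sample` stands in for the slide's `addSample`. The capacity of 1000 comes from the slide; the random-half discard and Welford's algorithm for exact mean and standard deviation are implementation choices assumed here. Each halving discards a random half and halves the acceptance probability, so every sample seen so far is retained with roughly equal probability.

```python
import math
import random


class Distribution:
    """Bounded-memory latency distribution with exact min/max/mean/stddev."""

    def __init__(self, capacity: int = 1000, seed=None):
        self.capacity = capacity
        self.samples = []
        self.accept_probability = 1.0
        self._rng = random.Random(seed)
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0                  # sum of squared deviations (Welford)
        self.min = math.inf
        self.max = -math.inf

    def add_sample(self, x: float) -> None:
        # Exact running statistics.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)
        self.min = min(self.min, x)
        self.max = max(self.max, x)
        # Subsampled storage for percentiles.
        if self._rng.random() < self.accept_probability:
            self.samples.append(x)
            if len(self.samples) >= self.capacity:
                self._rng.shuffle(self.samples)          # throw away a random half...
                del self.samples[self.capacity // 2:]
                self.accept_probability /= 2             # ...and accept half as often

    @property
    def stddev(self) -> float:
        return math.sqrt(self._m2 / (self.count - 1)) if self.count > 1 else 0.0

    def percentile(self, p: float) -> float:
        """Approximate p-th quantile (0 <= p <= 1) from the retained samples."""
        ordered = sorted(self.samples)
        return ordered[round(p * (len(ordered) - 1))]
```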
49. Measuring queue lengths
Our approach:
• Track the % of time that a queue is at zero length
• Measure queue length snapshots at intervals
• Watch out for oscillations
– Slow ones you can see
– Fast ones look like noise (which, unfortunately, is also
what noise looks like)
– “Zoom in” to exclude the possibility of micro-oscillations
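Tracking the fraction of time at zero length needs time-weighting, not a count of snapshots. This sketch assumes the queue reports every length change with a timestamp. It accumulates time at zero and the integral of length over time, so snapshots can report both the idle fraction and the time-average length. `QueueMonitor` is a hypothetical name.

```python
class QueueMonitor:
    """Time-weighted queue statistics built from length-change events."""

    def __init__(self, start_time: float = 0.0):
        self.length = 0
        self.start = start_time
        self.last_change = start_time
        self.zero_time = 0.0
        self.area = 0.0             # integral of queue length over time

    def _advance(self, now: float) -> None:
        elapsed = now - self.last_change
        self.area += self.length * elapsed
        if self.length == 0:
            self.zero_time += elapsed
        self.last_change = now

    def set_length(self, now: float, length: int) -> None:
        """Record that the queue length changed to `length` at time `now`."""
        self._advance(now)
        self.length = length

    def snapshot(self, now: float):
        """Return (fraction of time at zero length, time-average length) so far."""
        self._advance(now)
        total = now - self.start
        if total <= 0:
            return (1.0 if self.length == 0 else 0.0), float(self.length)
        return self.zero_time / total, self.area / total
```

A queue that is rarely at zero yet has a small average is doing its job: keeping a small buffer of work without building latency.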
50. Measuring latency from blocking
• Easy to calculate:
– L = (b0^2 + b1^2 + … + bN^2) / elapsed
– Sum the squares of every blocking period (in seconds)
over some interval, then divide by the duration of the interval.
• Measures impact of unavailability on mean
latency from random traffic
• Example: is the server’s slow latency explained by
this lock?
• Doesn’t count catch-up time.
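A direct transcription of the formula, plus the reasoning behind the squaring. Suppose requests arrive at uniformly random moments (an assumption). A request lands in block i with probability b_i / elapsed and then waits, on average, b_i / 2 for it to end. So the expected added delay per request is L / 2, proportional to L. Either way, a few long stalls dominate many short ones.

```python
def blocking_latency(block_durations, elapsed: float) -> float:
    """L = (b0^2 + b1^2 + ... + bN^2) / elapsed, with all times in seconds.

    For uniformly random arrivals, the expected added delay per request is L / 2.
    Catch-up time after each block is not counted.
    """
    if elapsed <= 0:
        raise ValueError("elapsed must be positive")
    return sum(b * b for b in block_durations) / elapsed
```

For example, record every time a thread waits on a particular lock over a 10-second window. One 2-second stall scores 0.4, twice the 0.2 of two separate 1-second stalls, even though total blocked time is the same.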
51. Summary
Thanks for listening, and remember:
• Everything has a latency curve
• Little’s law
• Randomize regular intervals
• Validate designs with simulation
• Instrument
May your queues be small, but not empty
david.rosenthal@foundationdb.com
52. Prioritization/QOS
• Can help in systems under partial load
• Vital in systems that handle batch and real-
time loads simultaneously
• Be careful that high priority work doesn’t
generate other high priority work plus other
jobs in the queue. This can lead to poor
utilization analogous to the internal queue
buildup case.
53. Congestion pricing
• My favorite topic
• Priority isn’t just a function of the benefit of
your job
• To be a good citizen, you should subtract the
costs to others
• For example, jumping to the front of a long
queue has costs proportional to the queue
size
54. Other FIFO alternatives?
• LIFO
– Avoids the reason to line up early
– In situations where there is adequate capacity to
serve everyone, can yield better waiting times for
everyone involved