Unless you have a problem that scales easily to many independent tasks, e.g. web services, you may find that the best way to improve throughput is to reduce latency. This talk starts with Little's Law and its consequences for high performance computing.
1. Peter Lawrey
CEO of Higher Frequency Trading
JAX Finance 2015
Low Latency: The best way to high throughput
2. Peter Lawrey
Java Developer/Consultant for hedge fund and trading firms for 6
years.
Most answers for Java and JVM on stackoverflow.com
Founder of the Performance Java User’s Group.
Architect of Chronicle Software
3. Agenda
• Little’s law and concurrency
• Co-ordinated omission
• Why should you use fewer servers?
4. Little’s law
Little’s law states:
The long-term average number of customers in a stable system L
is equal to the long-term average effective arrival rate, λ,
multiplied by the (Palm-)average time a customer spends in the
system, W; or expressed algebraically: L = λW
5. Little’s law as work.
The number of active workers must be at least the average arrival
rate of tasks multiplied by the average time to complete those
tasks.
workers >= tasks/second * seconds to perform task.
Or
throughput <= workers / latency.
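The two inequalities above can be checked with a few lines of arithmetic. This is a minimal sketch; the class name, method names and the sample numbers are illustrative assumptions, not taken from the talk.

```java
public class LittlesLaw {
    /** Minimum workers: arrival rate (tasks/second) times latency (seconds per task), rounded up. */
    static long minWorkers(double tasksPerSecond, double secondsPerTask) {
        return (long) Math.ceil(tasksPerSecond * secondsPerTask);
    }

    /** Upper bound on throughput (tasks/second) for a fixed number of workers. */
    static double maxThroughput(int workers, double secondsPerTask) {
        return workers / secondsPerTask;
    }

    public static void main(String[] args) {
        // Hypothetical load: 100 tasks/second, each taking 0.25 seconds.
        System.out.println(minWorkers(100, 0.25) + " workers needed");

        // Hypothetical system: 4 workers. Halving the latency doubles the bound
        // on throughput without adding a single worker.
        System.out.println(maxThroughput(4, 0.5) + " tasks/second at 0.5 s per task");
        System.out.println(maxThroughput(4, 0.25) + " tasks/second at 0.25 s per task");
    }
}
```

When the number of useful workers is capped by the problem itself, the second method shows why latency is the only remaining lever.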
6. Consequences of Little’s law
• If you have a problem with a high degree of independent tasks,
you can throw more workers at the problem to handle the
load. E.g. web services
• If you have a problem with a low degree of independent tasks,
adding more workers will mean more will be idle. E.g. many
trading systems. The solution is to reduce latency to increase
throughput.
7. Consequences of Little’s law
• Average latency is a function, sometimes the inverse, of the
throughput.
• Throughput focuses on the average experience. The worst cases
are often the ones which will hurt you, but averages are very
good at hiding your worst cases, e.g. long GC pauses.
• Testing with Co-ordinated omission also hides worst case
latencies.
8. Co-ordinated omission
• A term coined by Gil Tene.
• Co-ordinated omission occurs when the system being tested is
allowed to apply back pressure on the system doing the
testing. When the system being tested is slow, it can
effectively pause the test, esp. when averages or latency
percentiles are considered.
9. Co-ordinated omission: Example
• A shop is open 10 hours a day between 8 AM and 6 PM.
• A customer comes every 5 minutes, waits to be served and
leaves.
• When the shop keeper is there, he takes 1 minute to serve.
• But if he takes a 2 hour lunch break, how does this affect the
average latency or the 98th percentile?
10. How not to measure latency.
• You have one person go to the shop and time how long she has
to wait. Once per day she has to wait 2 hours and 1 minute,
but the rest of the day it only takes 1 minute.
• The average of 97 tests is 2.2 minutes. Had the shopkeeper been
there all day, there would have been 120 tests, but one test took
2 hours and 1 minute. Not great, but it doesn’t sound much worse
than 1 minute.
• The 98th percentile is 1 minute.
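The figures on this slide can be reproduced by taking the samples as given: 97 tests, of which one takes 121 minutes and the rest take 1 minute. This is a sketch only; the class name is made up, and the nearest-rank percentile rule is an assumption, since the slide does not say which percentile definition it uses.

```java
import java.util.Arrays;

public class SingleTesterLatency {
    /** The slide's samples in minutes: 96 one-minute waits plus one 121-minute wait, sorted. */
    static long[] samples() {
        long[] waits = new long[97];
        Arrays.fill(waits, 1);
        waits[0] = 121;
        Arrays.sort(waits);
        return waits;
    }

    static double average(long[] values) {
        return Arrays.stream(values).average().orElse(Double.NaN);
    }

    /** Nearest-rank percentile of an ascending-sorted array (assumed definition). */
    static long percentile(long[] sorted, double pct) {
        int rank = (int) Math.ceil(pct / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        long[] waits = samples();
        System.out.println("average minutes = " + average(waits));           // roughly 2.2
        System.out.println("98th percentile = " + percentile(waits, 98));    // the single slow test is hidden
        System.out.println("worst case      = " + percentile(waits, 100));
    }
}
```

One slow sample out of 97 sits above the 98th percentile, which is why the two-hour outage disappears from that statistic.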
11. Avoiding co-ordinated omission
• You have as many people as you need. Most of the time, only
one is waiting. However, because of the lunch break, there are 30
people delayed 121, 117, 113, 109 … 5 mins.
• The average of 120 tests is 16.5 minutes wait time. This is much
higher than the 2.2 minutes calculated previously.
• The 98th percentile is 111 minutes, instead of 1 minute in the
previous test.
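A small queue simulation shows where these numbers come from. It is a sketch under stated assumptions: the lunch break is placed from noon to 2 PM (the slide gives only its length), time is counted in whole minutes, and the percentile uses the nearest-rank rule, so the 98th percentile it reports can differ by a few minutes from the slide's figure, which may have been interpolated. The class and method names are illustrative.

```java
import java.util.Arrays;

public class ShopQueueSimulation {
    /** Returns the time in minutes each of the day's customers spends in the shop. */
    static long[] simulate() {
        final int openMinutes = 600;                  // 8 AM to 6 PM
        final int arrivalEvery = 5, serviceTime = 1;  // from the slide
        final int breakStart = 240, breakEnd = 360;   // assumed: lunch from noon to 2 PM
        long[] latencies = new long[openMinutes / arrivalEvery];
        long serverFreeAt = 0;
        for (int i = 0; i < latencies.length; i++) {
            long arrival = (long) i * arrivalEvery;   // customers keep arriving on schedule
            long start = Math.max(arrival, serverFreeAt);
            if (start >= breakStart && start < breakEnd) {
                start = breakEnd;                     // nobody is served during lunch
            }
            serverFreeAt = start + serviceTime;
            latencies[i] = serverFreeAt - arrival;    // measured from the intended arrival time
        }
        return latencies;
    }

    /** Nearest-rank percentile of an ascending-sorted array (assumed definition). */
    static long percentile(long[] sorted, double pct) {
        int rank = (int) Math.ceil(pct / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        long[] latencies = simulate();
        double average = Arrays.stream(latencies).average().orElse(Double.NaN);
        Arrays.sort(latencies);
        System.out.println("tests           = " + latencies.length);
        System.out.println("average minutes = " + average);
        System.out.println("98th percentile = " + percentile(latencies, 98));
    }
}
```

The key difference from the single-tester measurement is that each latency is measured from the time the customer was scheduled to arrive, so the shop cannot slow down the arrival of new samples.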
12. Why use fewer servers?
• You can buy commodity mid range servers with 38 cores and
512 GB of memory for a reasonable price (< £20K each).
• An increasing number of libraries support off heap storage,
allowing you to support much larger datasets in memory.
13. Why use fewer servers?
• Deploying to one server lowers the cost of development. The
cost of development is often higher than the cost of the
hardware.
• Deploying to one server also reduces the network latency,
increasing the throughput.
14. Even latencies you can’t see add up
Data passing | Latency | Human scale | Throughput one at a time
Method call | Inlined: 0; real call: 50 ns | Eye blink | 20,000,000/sec
Shared memory | 200 ns | Mouse click | 5,000,000/sec
SYSV Shared memory | 2 µs | Drop a phone | 500,000/sec
Low latency network | 8 µs | Fly a paper plane | 125,000/sec
Typical LAN network | 30 µs | Half a minute | 30,000/sec
Typical data grid system | 800 µs | Running three miles | 1,250/sec
60 Hz power flickers | 8 ms | A football game | 120/sec
4G request latency in UK | 55 ms | A summer’s day | 18/sec
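The throughput column is simply the reciprocal of the latency when messages are passed one at a time. A minimal sketch of that arithmetic follows; the class and method names are illustrative, and the slide rounds a couple of rows (30 µs and 8 ms), so the computed values differ slightly there.

```java
public class OneAtATimeThroughput {
    /** Messages per second if each message must finish before the next one starts. */
    static long perSecond(long latencyNanos) {
        return 1_000_000_000L / latencyNanos;
    }

    public static void main(String[] args) {
        String[] names = {"Method call", "Shared memory", "SYSV shared memory",
                "Low latency network", "Typical LAN network", "Typical data grid system",
                "60 Hz power flicker", "4G request latency in UK"};
        long[] latencyNanos = {50, 200, 2_000, 8_000, 30_000, 800_000, 8_000_000, 55_000_000};
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i] + ": " + perSecond(latencyNanos[i]) + "/sec");
        }
    }
}
```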
15. Doesn’t the GC stop the world?
• The GC only pauses the JVM when it has some work to do.
Produce less garbage and it will pause less often.
• Produce less than 1 GB/hour of garbage and you can get less
than one pause per day. (With a 24 GB Eden)
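The one-pause-per-day figure follows from dividing the Eden size by the allocation rate. The sketch below is a simplification that assumes a young collection is triggered only when Eden fills up and ignores every other GC trigger; the class and method names are illustrative.

```java
public class GcIntervalEstimate {
    /** Rough hours between young collections, assuming a collection only happens when Eden is full. */
    static double hoursBetweenCollections(double edenGb, double garbageGbPerHour) {
        return edenGb / garbageGbPerHour;
    }

    public static void main(String[] args) {
        // The slide's figures: a 24 GB Eden and 1 GB/hour of garbage.
        System.out.println(hoursBetweenCollections(24, 1) + " hours between collections");
        // Halving the garbage rate doubles the interval.
        System.out.println(hoursBetweenCollections(24, 0.5) + " hours between collections");
    }
}
```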
16. Do I need to avoid all objects?
• In Java 8 you can have very short lived objects placed on the
stack. This requires your code to be inlined and escape analysis
to kick in. When this happens, no garbage is created and the
code is faster.
• You can have very long lived objects, provided you don’t have
too many.
• The rest of your data you can place in native memory (off
heap).
• You can create 1 GB/hour of garbage and still not GC for about
a day, given a 24 GB Eden.
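The first bullet can be illustrated with a short-lived object that never leaves the method that creates it. This is only a sketch of the kind of code that escape analysis can optimise; whether the JIT actually removes the allocation depends on inlining decisions and the JVM in use, and nothing in the code guarantees it. The `Point` class and method names are made up for illustration.

```java
public class EscapeAnalysisSketch {
    /** Small immutable value type; instances below never leave the method that creates them. */
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        long lengthSquared() { return (long) x * x + (long) y * y; }
    }

    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            // The Point is not stored in a field, returned or passed to code that
            // keeps it, so it is a candidate for elimination once the code is inlined.
            Point p = new Point(i, i + 1);
            sum += p.lengthSquared();
        }
        return sum;
    }

    public static void main(String[] args) {
        long total = 0;
        // Repeat the call so the JIT has a chance to compile and optimise it.
        for (int run = 0; run < 1_000; run++) {
            total += sumOfSquares(10_000);
        }
        System.out.println(total);
    }
}
```

On HotSpot, running this with `-verbose:gc`, with and without `-XX:-DoEscapeAnalysis`, is one way to see whether the allocations are being eliminated.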
17. Low Latency with lots of Lambdas
Chronicle Wire is an API for generic serialization and
deserialization. You determine what you want to read/write, but
the exact wire format can be injected. This works for Yaml, Binary
Yaml, and raw data. It will support XML, FIX, JSON and BSON.
This uses lambdas extensively but the objects associated can be
eliminated.
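The idea of describing what to write once and injecting the format separately can be sketched in plain Java. Everything below is hypothetical: `FieldOut`, `YamlLikeOut` and `JsonLikeOut` are invented for illustration and are not the Chronicle Wire API, and the JSON writer does no string escaping.

```java
import java.util.function.Consumer;

public class PluggableFormatSketch {
    /** Hypothetical minimal writer interface; NOT the Chronicle Wire API. */
    interface FieldOut {
        FieldOut int64(String name, long value);
        FieldOut text(String name, String value);
    }

    /** Renders fields in a YAML-like flow style. */
    static final class YamlLikeOut implements FieldOut {
        private final StringBuilder sb = new StringBuilder();
        private FieldOut field(String name, Object value) {
            sb.append(sb.length() == 0 ? "" : ", ").append(name).append(": ").append(value);
            return this;
        }
        public FieldOut int64(String name, long value) { return field(name, value); }
        public FieldOut text(String name, String value) { return field(name, value); }
        @Override public String toString() { return "{ " + sb + " }"; }
    }

    /** Renders the same fields as JSON-like text (no escaping, for brevity). */
    static final class JsonLikeOut implements FieldOut {
        private final StringBuilder sb = new StringBuilder();
        private FieldOut field(String name, String rendered) {
            sb.append(sb.length() == 0 ? "" : ",").append('"').append(name).append("\":").append(rendered);
            return this;
        }
        public FieldOut int64(String name, long value) { return field(name, Long.toString(value)); }
        public FieldOut text(String name, String value) { return field(name, '"' + value + '"'); }
        @Override public String toString() { return "{" + sb + "}"; }
    }

    /** The message is described once as a lambda; the format is chosen by the caller. */
    static String render(FieldOut out, Consumer<FieldOut> message) {
        message.accept(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Consumer<FieldOut> put = o -> o.int64("key", 1).text("value", "hello");
        System.out.println(render(new YamlLikeOut(), put));
        System.out.println(render(new JsonLikeOut(), put));
    }
}
```

The lambda `put` does not capture any variables, which is the kind of lambda the JVM can typically reuse rather than allocate on every call.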
18. Low Latency with lots of Lambdas
wire.writeDocument(false, out ->
    out.write(() -> "put")
       .marshallable(m ->
           m.write(() -> "key").int64(n)
            .write(() -> "value").text(words[n])));
As Yaml
--- !!data
put: { key: 1, value: hello }
As Binary Yaml
⒗٠٠٠Ãputu0082⒎٠٠٠⒈åhello
19. Next Steps
• Chronicle is open source so you can start right away!
• Working with clients to produce Chronicle Enterprise
• Support contract for Chronicle and consultancy
20. Q & A
Peter Lawrey
@PeterLawrey
http://chronicle.software
http://vanillajava.blogspot.com