Advanced Off Heap IPC
in Java
using OpenHFT
(How does it change your design)
Peter Lawrey
CEO, Higher Frequency Trading.
Who are we
Higher Frequency Trading is a small consulting
and software development house specialising in

Low latency, high throughput software

8 developers in Europe and USA.

Sponsor HFT related open source projects

Core Java engineering
What is our OSS
Key OpenHFT projects

OpenHFT Chronicle, low latency logging,
event store and IPC.
(record / log everything)

OpenHFT Collections, cross process
embedded persisted data stores.
(only need the latest)
Millions of operations per second.
Micro-second latency.
Why use Java?
A rule of thumb is that 90% of the time is spent in
10% of the code.
Writing in Java means that only that critical 10%
of your code may need heavy optimisation.
Writing in C or C++ will mean that 100% of your
code will be harder to write, or you have to use
JNI, JNA, JNR-FFI.
Low level Java works well with natural Java.
Problem: The Java heap size
As the heap gets larger, the worst case GC pauses increase into the seconds.
Solution: Use memory off the heap

This often means using a database.

Embedded data is much faster.

OpenHFT supports embedded data, off heap
shared across multiple processes.
How is off heap memory used?

Memory mapped files

Durable on application restart

One copy in memory.

Can be used without serialization /
deserialization.

Thread safe operations across processes.

Around 8x faster than System V IPC.
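The bullet points above can be sketched with plain JDK APIs: two mappings of the same file share one copy of the data in the page cache, and a write is visible through the other mapping without any serialization. This is a minimal illustration, not the OpenHFT implementation; the file name and field layout are assumptions.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedIpcSketch {
    // Write through one mapping, read through another: one copy in the
    // OS page cache, durable across application restarts.
    static long roundTrip() throws IOException {
        Path file = Files.createTempFile("shared", ".dat");
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer writer = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            MappedByteBuffer reader = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            writer.putLong(0, 42L);   // a raw 8-byte write, no serialization
            return reader.getLong(0); // visible via the second mapping
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip()); // prints 42
    }
}
```

In a real deployment the second mapping would live in another process; the kernel gives both processes the same physical pages.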
Use case: SharedHashMap

Large machine, 240 cores, 3 TB of memory.

80 JVMs sharing 50 GB of data.

One copy in memory.

Between 40 - 350 nano-second latency.
Creating the Map
SharedHashMap<String, BondVOInterface> shm =
    new SharedHashMapBuilder()
        .generatedValueType(true)
        .entrySize(320)
        .create(
            new File("/dev/shm/BondPortfolioSHM"),
            String.class,
            BondVOInterface.class);
Using the Map
// old style map.get, creates objects.
BondVOInterface bond = shm.get("369604101");
// re-using an off heap reference
BondVOInterface bond =
newDirectReference(BondVOInterface.class);
// get or create bond for key
shm.acquireUsing("369604101", bond);
bond.setCoupon(4.25);
double coupon = bond.getCoupon();
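The slides assume a BondVOInterface but never show it. A plausible sketch is below: a plain getter/setter interface which, with generatedValueType(true), OpenHFT backs with a generated flyweight that reads and writes the fields directly in off-heap memory. The field names here are illustrative, not taken from the original demo.

```java
// Hypothetical value interface for the SharedHashMap example above.
// OpenHFT generates an off-heap flyweight implementation of it.
public interface BondVOInterface {
    double getCoupon();
    void setCoupon(double coupon);

    long getMaturityDate();
    void setMaturityDate(long maturityDate);
}
```

Because the generated implementation is a flyweight over shared memory, setCoupon() above is a direct memory write seen by every process mapping the file.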
Problem: You have more data than
memory

Using a heap larger than main memory will kill
performance, if not lock up your machine.

OpenHFT supports dramatic over-committing
with modest impact on performance.

On Linux, sparse files are supported, and data
is swapped asynchronously by the OS.
Over-committing your size.
File file = File.createTempFile("over-sized", "deleteme");
SharedHashMap<String, String> map =
    new SharedHashMapBuilder()
        .entrySize(1024 * 1024)
        .entries(1024 * 1024)
        .create(file, String.class, String.class);
for (int i = 0; i < 1000; i++) {
    char[] chars = new char[i];
    Arrays.fill(chars, '+');
    map.put("key-" + i, new String(chars));
}
By over-committing,
we avoid resizing
System memory: 7.7 GB,
Extents of map: 2199.0 GB,
disk used: 13MB,
addressRange: 7d380b7bd000 - 7f380c000000
This was run on a laptop.
BTW: Only one memory mapping is used.
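The "2199 GB of extents, 13 MB on disk" result relies on sparse files: extending a file without writing to it reserves address space but no disk blocks, and only the pages actually touched are allocated. A minimal JDK sketch of the same effect:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

public class SparseFileSketch {
    // Extend a file to 1 GB without writing data: on sparse-file systems
    // only the touched pages consume disk, which is how a ~2 TB map
    // fits on a laptop.
    static long logicalSize() throws IOException {
        Path file = Files.createTempFile("over-sized", ".deleteme");
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.setLength(1L << 30); // 1 GB of address space, ~0 bytes on disk
            raf.seek(0);
            raf.writeLong(1);        // only this page is actually allocated
            return raf.length();
        } finally {
            Files.delete(file);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(logicalSize()); // 1073741824
    }
}
```

Tools like `du` versus `ls -l` show the difference between blocks used and logical size.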
How does this appear in “top”
Note: the third program “java” has a virtual memory size of 2051 GB.
SharedHashMap design

You can cache data, shared between
processes in a thread safe manner.

Allows you to split your JVMs how you want or
add monitoring or control in an external
process.

Can support more data than main memory.

Avoids the need to resize, or collect garbage.
SHM and throughput
SharedHashMap tested on a machine with 128
GB, 16 cores, 32 threads.
String keys, 64-bit long values.

10 million key-values updated at 37 M/s

500 million key-values updated at 23 M/s

On tmpfs, 2.5 billion key-values at 26 M/s
SHM and latency
For a Map of small key-values (both 64-bit longs)
With an update rate of 1 M/s, one thread.
Percentile            100K entries  1 M entries  10 M entries
50% (typical)         0.1 μsec      0.2 μsec     0.2 μsec
90% (worst 1 in 10)   0.4 μsec      0.5 μsec     0.5 μsec
99% (worst 1 in 100)  4.4 μsec      5.5 μsec     7 μsec
99.9%                 9 μsec        10 μsec      10 μsec
99.99%                10 μsec       12 μsec      13 μsec
worst                 24 μsec       29 μsec      26 μsec
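Latency percentiles like those in the table can be gathered with a simple harness: time each operation, sort the samples, and index into the sorted array. The sketch below uses a plain ConcurrentHashMap as a stand-in, so its numbers will not match SharedHashMap's; the shape of the measurement is the point.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;

public class LatencyPercentiles {
    // Time individual put() calls and report 50%/99%/worst, the same
    // shape of measurement as the table above (stand-in map, one thread).
    static long[] measure(int samples) {
        ConcurrentHashMap<Long, Long> map = new ConcurrentHashMap<>();
        long[] timings = new long[samples];
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            map.put((long) i, (long) i);
            timings[i] = System.nanoTime() - start;
        }
        Arrays.sort(timings);
        return new long[] {
            timings[samples / 2],        // 50% (typical)
            timings[samples * 99 / 100], // 99% (worst 1 in 100)
            timings[samples - 1]         // worst
        };
    }

    public static void main(String[] args) {
        long[] p = measure(100_000);
        System.out.printf("50%%: %d ns, 99%%: %d ns, worst: %d ns%n",
                p[0], p[1], p[2]);
    }
}
```

For serious measurement you would warm up the JIT first and use a histogram rather than storing every sample.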
Problem: your sustained update rate is
too high for your consumers.
Your consumers might be

on limited bandwidth.

Humans
Do you want to control the rate of data sent, but
still ensure the latest data is available ASAP?
SHM replication

Supports TCP replication and/or UDP & TCP.

UDP replication only sends the data once and
doesn't have NACK storms. Uses TCP as
back up.

You control the rate the data is sent.

It always sends the latest values.
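"Controlled rate, always the latest value" describes conflation: between sends, a newer update to a key overwrites the unsent older one, so a slow link receives fewer messages but never stale data. A minimal sketch of the idea (not the OpenHFT replication code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ConflatingSender<K, V> {
    // Pending updates; a new value for a key replaces the unsent old one,
    // so a rate-limited drain always ships the latest state per key.
    private final Map<K, V> pending = new LinkedHashMap<>();

    public synchronized void update(K key, V value) {
        pending.put(key, value);
    }

    // Called at the controlled send rate; returns and clears the batch.
    public synchronized Map<K, V> drain() {
        Map<K, V> batch = new LinkedHashMap<>(pending);
        pending.clear();
        return batch;
    }
}
```

Two updates to the same key before a drain produce one outbound message carrying only the latest value.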
Problem: you want to record
everything, but this is too slow.

Chronicle is designed to support millions of
messages per second, without locking or
garbage.

Build deterministic systems where all the
inputs and outputs are recorded and
reproducible

Downstream systems don't need to interrogate
upstream systems, as they have a complete view
of the state of the system.
Problem: TCP and System V IPC take
many micro-seconds.

Chronicle typically takes micro-seconds,
including serialization and deserialization.

Most messaging solutions don't consider
serialization cost in Java.

Short binary messages can be 200 ns.
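The serialization cost matters because Java object serialization allocates and reflects per message. A short fixed-layout binary message written field by field into a reused buffer allocates nothing, which is the style of encoding that makes 200 ns messages plausible. The message layout below is illustrative, not a Chronicle format.

```java
import java.nio.ByteBuffer;

public class BinaryMessageSketch {
    // Encode a small fixed-layout message into a reused buffer: no objects,
    // no reflection, just field-by-field writes (layout is hypothetical).
    static void write(ByteBuffer buf, long orderId, double price, int qty) {
        buf.clear();
        buf.putLong(orderId).putDouble(price).putInt(qty).flip();
    }

    static long readOrderId(ByteBuffer buf) {
        return buf.getLong(0); // absolute read, no deserialization step
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(64); // reused every message
        write(buf, 369604101L, 4.25, 100);
        System.out.println(readOrderId(buf)); // prints 369604101
    }
}
```

Because the buffer is reused, a steady stream of such messages generates no garbage at all.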
Use for Chronicle

Synchronous text logging.

Synchronous binary data logging
Use for Chronicle

Messaging between processes
via shared memory

Messaging across systems
Use for Chronicle

Supports recording micro-second timestamps
across the systems

Replay of production data in test
Chronicle and replication
Replication is point to point (TCP)
Server A records an event
– replicates to Server B
Server B reads local copy
– B processes the event
Server B stores the result.
– replicates to Server A
Server A replies.
Round trip
25 micro-seconds
99% of the time
GC-free
Lock less
Off heap
Unbounded
How does it recover?
Once finish()
returns, the OS will do
the rest.
If an excerpt is
incomplete, it will be
pruned.
Cache friendly
Data is laid out contiguously, naturally packed.
You can compress some types. One entry
starts at the byte after the previous one.
Problem: A slow consumer,
slows the producer.
No matter how slow the consumer is, the
producer never has to wait. It never needs to
clean up messages before publishing (as a ring
buffer does).
You can start a consumer at the end of the day
e.g. for reporting. The consumer can be more
than the main memory size behind the
producer as a Chronicle is not limited by main
memory.
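This decoupling falls out of the journal design: the producer only appends, and every consumer keeps its own read index into the log, so a late-starting or slow reader affects nobody. A minimal in-memory sketch of the structure (Chronicle keeps the entries in memory-mapped files, which is why a consumer can lag by more than main memory):

```java
import java.util.ArrayList;
import java.util.List;

public class AppendOnlyLog {
    // The producer only appends; it never blocks on, or even knows
    // about, its readers.
    private final List<String> entries = new ArrayList<>();

    public synchronized void append(String entry) {
        entries.add(entry);
    }

    // Advances the given reader's private cursor; null when caught up.
    public synchronized String read(Tailer t) {
        return t.index < entries.size() ? entries.get(t.index++) : null;
    }

    // Each consumer owns its cursor, so one may lag arbitrarily far behind.
    public static final class Tailer {
        int index;
    }
}
```

A reporting consumer created at the end of the day is simply a new Tailer starting at index 0.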
How does it collect garbage?
There is an assumption that your application has a daily
or weekly maintenance cycle.
This is implemented by
closing the files and
creating new ones,
i.e. the whole lot is moved,
compressed or deleted.
Anything which must be
retained can be copied
to the new Chronicle.
Is there a higher level API?
You can hide the low level details with an
interface.
Is there a higher level API?
There is a demo
program with a
simple interface.
This models a “hub”
process which takes in
events, processes
them and publishes
results.
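The hub pattern the demo models can be sketched in a few lines: consumers code against a small interface that hides the transport, and the hub reads events, applies its logic, and publishes results. Everything below (names, the list-backed transport) is a hypothetical stand-in, not the demo program's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class HubSketch {
    // The interface consumers code against; the transport behind it could
    // be a Chronicle, a queue, or (here) a plain list.
    interface EventSink {
        void publish(String event);
    }

    // The "hub": reads events, applies business logic, publishes results.
    static void run(List<String> inbound, UnaryOperator<String> process,
                    EventSink out) {
        for (String event : inbound) {
            out.publish(process.apply(event));
        }
    }

    public static void main(String[] args) {
        List<String> results = new ArrayList<>();
        run(List.of("order:1", "order:2"), e -> e + ":done", results::add);
        System.out.println(results); // [order:1:done, order:2:done]
    }
}
```

Because inputs and outputs both flow through logs in the real system, the whole hub is replayable and deterministic.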
Q & A
https://github.com/OpenHFT/OpenHFT
@PeterLawrey
peter.lawrey@higherfrequencytrading.com
