Disruptor –
Ultrafast communication
March 2012
#theedge2012
Guy Raz Nir
Disruptor
» Introduction
» The problem …
» The (not so good) alternatives
» Architecture
» Summary
Agenda
Riddle! (Quiz!)
Introduction
» QUEUE !
» Communication facility between threads.
» London Multi Asset Exchange platform
» Can handle up to 6,000,000 TPS
▪ Dual socket, 3GHz quad-core Nehalem processors.
» 98% of transactions complete in under 38ms.
» Average transaction length: 9.22ms
LMAX
» To learn about the LMAX Disruptor's capabilities and usability.
» To practice “different thinking” in order to solve complex problems.
Why are we here ?
"Any intelligent fool can make things bigger,
more complex, and more violent.
It takes a touch of genius,
and a lot of courage
to move in the opposite direction."
Albert Einstein
57.3 MB/s
-20%
“Mechanical Sympathy”
Hardware and software working together in harmony *
* Martin Thompson’s blog
The problem …
» Test case: executing 10,000,000 primitive increments.
» Single thread execution: ~ 7ms
▪ No concurrency
Multi-threading test
long value = 0;
while (value < 10000000L) {
    value++;
}
» Synchronized approach:
synchronized (syncObj) {
    value++;
}
» Mutual exclusion approach:
java.util.concurrent.locks.ReentrantLock lock = …
lock.lock();
try {
    value++;
} finally {
    lock.unlock();
}
» AtomicLong:
AtomicLong value = new AtomicLong(0);
long result = value.incrementAndGet();
» Single thread, bare execution (value++)
▪ About 7 milliseconds
» Single thread, AtomicLong
▪ 68 milliseconds (x8.5)
» Single thread with lock
▪ 125 milliseconds (x15.5)
» Single thread with synchronized approach
▪ 450 milliseconds (x56)
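The four single-threaded variants above can be compared with a naive timing harness along the following lines. This is only a sketch: the class name `IncrementCost` and its helper methods are hypothetical, and a plain `System.nanoTime()` loop is not a rigorous benchmark (the JIT may optimise the bare loop away), so the absolute numbers will differ from the slide.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.LongSupplier;

// Hypothetical harness: naive timing of the four single-threaded variants.
class IncrementCost {
    // Bare increment, no synchronization at all.
    static long plain(long n) {
        long value = 0;
        while (value < n) {
            value++;
        }
        return value;
    }

    // Every increment goes through an AtomicLong (CAS-based).
    static long atomic(long n) {
        AtomicLong value = new AtomicLong(0);
        while (value.incrementAndGet() < n) {
            // keep incrementing until n is reached
        }
        return value.get();
    }

    // Every increment takes and releases an uncontended ReentrantLock.
    static long locked(long n) {
        ReentrantLock lock = new ReentrantLock();
        long value = 0;
        while (value < n) {
            lock.lock();
            try {
                value++;
            } finally {
                lock.unlock();
            }
        }
        return value;
    }

    // Every increment enters and leaves an uncontended synchronized block.
    static long synced(long n) {
        Object syncObj = new Object();
        long value = 0;
        while (value < n) {
            synchronized (syncObj) {
                value++;
            }
        }
        return value;
    }

    // Runs the body once and prints the elapsed wall-clock time.
    static void time(String label, LongSupplier body) {
        long start = System.nanoTime();
        long result = body.getAsLong();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(label + ": " + elapsedMs + " ms (final value " + result + ")");
    }

    public static void main(String[] args) {
        long n = 10_000_000L;
        time("bare", () -> plain(n));
        time("AtomicLong", () -> atomic(n));
        time("ReentrantLock", () -> locked(n));
        time("synchronized", () -> synced(n));
    }
}
```

All four variants do the same logical work; only the cost of each individual increment changes.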
» Single thread, bare execution (value++)
▪ About 7 milliseconds
» Two threads, AtomicLong
▪ 270 milliseconds (x33.7)
» Two threads with lock
▪ 298 milliseconds (x37.5)
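A minimal sketch of the contended case, assuming the work is simply split between threads that all hit one shared `AtomicLong`. The class `ContendedIncrement` and its `run` method are hypothetical names, and the timing printed in `main` is indicative only.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo: several threads incrementing one shared counter.
class ContendedIncrement {
    // Starts `threads` workers, each doing `perThread` increments, and
    // returns the final counter value once all workers have finished.
    static long run(int threads, long perThread) {
        AtomicLong value = new AtomicLong(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (long n = 0; n < perThread; n++) {
                    value.incrementAndGet(); // contended CAS on a shared cache line
                }
            });
        }
        for (Thread t : workers) {
            t.start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return value.get();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long result = run(2, 5_000_000L);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("2 threads, AtomicLong: " + elapsedMs + " ms (final value " + result + ")");
    }
}
```

The result is always correct; what suffers under contention is the time each increment takes, because the cores keep passing the counter's cache line back and forth.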
[Chart: Concurrent execution latency (increment). Y axis: time (ms). X axis: number of threads. Series: Synchronized, CAS, Lock]
[Chart: Concurrent execution latency (PI calculation). Y axis: time (ms). X axis: number of threads. Series: Synchronized, Lock]
“A good preliminary design beats any last-minute patch”
Guy (Raz) Nir, The Edge 2012
» 4 cores (#1 to #4), each with 64-bit registers
» L1 cache, one per core: 32KB instructions + 32KB data
» L2 cache, one per core: 256KB
» L3 cache, shared by all cores: 2M – 16M
Model CPU architecture
[Chart: Non-volatile vs volatile. Y axis: time (ms). Bars: Non-volatile, Volatile. Only surviving data label: 7]
The (not so good) alternatives
» Linked-list based queue
▪ Requires re-allocation of units
▪ Memory fragmentation
▪ Garbage collection
▪ Bad contention
#0 #1 #2 #3 #4
» Cyclic array-based queue
▪ Bad contention
#0 #1 #2 #3 #4
Head
Tail
java.util.concurrent.ArrayBlockingQueue
// Put a new element in the queue.
public boolean offer(E e, long timeout, TimeUnit unit) {
    // Acquire 'lock' for writing.
    final ReentrantLock lock = this.lock;
    lock.lock();
    // ...
}
// Take one element from the queue.
public E poll() {
    // Acquire the same 'lock' for reading.
    final ReentrantLock lock = this.lock;
    lock.lock();
    // ...
}
Sun (Oracle) JDK 1.7.0_u2
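From the caller's side, the two methods excerpted above behave as in the following small sketch. The class `BlockingQueueDemo` and its `run` method are hypothetical; `offer(e, timeout, unit)` and `poll()` are the standard `java.util.concurrent.ArrayBlockingQueue` methods.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical demo of the bounded queue seen from the caller's side.
class BlockingQueueDemo {
    // Returns a comma-separated trace of what each call returned.
    static String run() {
        ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<>(2);
        try {
            boolean first = queue.offer("a", 10, TimeUnit.MILLISECONDS);
            boolean second = queue.offer("b", 10, TimeUnit.MILLISECONDS);
            // The queue is full now, so this call waits 10 ms and gives up.
            boolean third = queue.offer("c", 10, TimeUnit.MILLISECONDS);
            String x = queue.poll();
            String y = queue.poll();
            // The queue is empty now, so poll() returns null immediately.
            String z = queue.poll();
            return first + "," + second + "," + third + "," + x + "," + y + "," + z;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "true,true,false,a,b,null"
    }
}
```

Every one of these calls passes through the queue's lock, whether it is a producer or a consumer that makes it.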
» General purpose assumptions:
▪ Multiple readers, multiple writers
▪ Queues can run as big as memory
▪ Other operations that degrade design
» No regard for the hardware
Other problems
Architecture
» Barriers
» Ring buffer
» Sequences
Main components
Disruptors
Barriers
The ring buffer
[Diagram: ring of slots 1 to 4, with "Next read sequence" and "Available sequence" pointers]
MyDataType[] buffer = ...;
int offset = (int) (sequence % buffer.length);
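The mapping from an ever-growing sequence number to an array slot can be sketched as below. `RingIndex` and its two methods are hypothetical names. The bit-mask variant is a common optimisation that only works when the buffer length is a power of two; both variants assume non-negative sequence numbers.

```java
// Hypothetical helper: maps a sequence number onto a slot of a fixed-size ring.
class RingIndex {
    // General case: works for any positive buffer length.
    static int byModulo(long sequence, int length) {
        return (int) (sequence % length);
    }

    // Power-of-two case: a bit mask replaces the (slower) division.
    static int byMask(long sequence, int length) {
        if (Integer.bitCount(length) != 1) {
            throw new IllegalArgumentException("length must be a power of two");
        }
        return (int) (sequence & (length - 1));
    }

    public static void main(String[] args) {
        // Sequence 5 in a ring of 4 slots wraps around to slot 1.
        System.out.println(byModulo(5, 4)); // prints 1
        System.out.println(byMask(5, 4));   // prints 1
    }
}
```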
» Array-based cyclic buffer.
▪ Fast index-based access.
» Allows us to allocate all entries in advance
▪ Saves GC time
▪ Contiguous block allocation
▪ Saves the cost of 'new' at runtime.
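Pre-allocation could look roughly like this: every slot is filled with a mutable entry once, at construction time, and the same objects are reused for the lifetime of the ring. `PreallocatedRing` and `RingMessage` are hypothetical names used for illustration and are not the library's own classes.

```java
import java.util.function.Supplier;

// Hypothetical mutable entry that gets overwritten on every lap of the ring.
class RingMessage {
    String msg;
    long timestamp;
}

// Hypothetical ring whose entries are all created up front.
class PreallocatedRing<T> {
    private final Object[] entries;

    PreallocatedRing(int size, Supplier<T> factory) {
        entries = new Object[size];
        for (int i = 0; i < size; i++) {
            entries[i] = factory.get(); // the only place an entry is allocated
        }
    }

    // Returns the reusable entry that the given sequence number maps to.
    @SuppressWarnings("unchecked")
    T get(long sequence) {
        return (T) entries[(int) (sequence % entries.length)];
    }

    public static void main(String[] args) {
        PreallocatedRing<RingMessage> ring = new PreallocatedRing<>(4, RingMessage::new);
        RingMessage entry = ring.get(1);
        entry.msg = "hello";
        entry.timestamp = System.currentTimeMillis();
        // One lap later, sequence 5 lands on the very same object.
        System.out.println(ring.get(5) == entry); // prints true
    }
}
```

Writers fill in the fields of an existing entry instead of calling `new`, so steady-state operation produces no garbage from the ring itself.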
The ring buffer
[Diagram: ring buffer with slots 1 to 5]
Barriers
[Diagram: ring buffer with slots 1 to 5. Labels: Producer, Consumer, sequence, nextSequence]
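public class StandardProducer {
    public void offer(Object o) {
        // ...
    }
}
public class DisruptorProducer {
    private RingBuffer buffer;
    public void addMessage(String message, long timestamp) {
        // Claim the next sequence number (single-threaded example).
        long seq = buffer.writeSequenceNumber++;
        // Wrap the sequence onto the fixed-size array.
        int index = (int) (seq % buffer.data.length);
        buffer.data[index].msg = message;
        buffer.data[index].timestamp = timestamp;
        // Publish last, so a consumer never sees a half-written entry.
        buffer.availableSequenceNumber = seq;
    }
}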
X
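public class DisruptorConsumer {
    private RingBuffer buffer;
    long nextSequenceNumber;
    public Object take() {
        // Spin until the producer has published the entry we want (single-threaded example).
        while (nextSequenceNumber > buffer.availableSequenceNumber) { .. }
        return buffer.get(nextSequenceNumber++);
    }
}
[Diagram: the consumer's own sequence number ("My sequence number") compared with the buffer's published sequence number]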
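The producer and consumer pseudocode can be combined into one runnable single-producer, single-consumer sketch. This is an illustration under assumptions, not the Disruptor's implementation: `SpscRing` and all of its members are hypothetical, the payload is a plain `long`, and waiting is done with `Thread.yield()` rather than a pluggable wait strategy. The two `volatile` sequence counters are what make the plain array writes visible across threads.

```java
// Hypothetical single-producer / single-consumer ring with two sequence counters.
class SpscRing {
    private final long[] data;
    private volatile long published = -1; // last sequence visible to the consumer
    private volatile long consumed = -1;  // last sequence the consumer is done with
    private long nextToWrite = 0;         // touched by the producer thread only
    private long nextToRead = 0;          // touched by the consumer thread only

    SpscRing(int size) {
        data = new long[size];
    }

    // Producer side: claim a sequence, write the slot, then publish it.
    void put(long value) {
        long seq = nextToWrite++;
        // Do not lap the consumer: the slot is free only once the entry
        // written one lap earlier has been consumed.
        while (seq - consumed > data.length) {
            Thread.yield();
        }
        data[(int) (seq % data.length)] = value;
        published = seq; // volatile write makes the slot contents visible
    }

    // Consumer side: wait for the sequence to be published, then read it.
    long take() {
        long seq = nextToRead++;
        while (seq > published) {
            Thread.yield();
        }
        long value = data[(int) (seq % data.length)];
        consumed = seq; // volatile write hands the slot back to the producer
        return value;
    }

    // Sends 1..count through a ring of 8 slots and returns the sum received.
    static long demo(int count) {
        SpscRing ring = new SpscRing(8);
        Thread producer = new Thread(() -> {
            for (long i = 1; i <= count; i++) {
                ring.put(i);
            }
        });
        producer.start();
        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += ring.take();
        }
        try {
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(demo(1000)); // prints 500500
    }
}
```

No lock is taken anywhere: each counter has exactly one writer, so the two threads only ever read each other's progress.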
Multi consumers
[Diagram: ring buffer with slots 1 to 5. Labels: consumer sequence, sequence barrier, and three consumers with nextSequence = 2, nextSequence = 3 and nextSequence = 4]
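With several consumers, one way to think about the barrier is that the slowest consumer decides how far the producer may advance. The sketch below works through the numbers on this slide. `Gating` and its methods are hypothetical names, and this shows only the general idea, not the library's barrier classes.

```java
// Hypothetical helper: how far a producer may write when several consumers lag behind.
class Gating {
    // The smallest "next sequence to read" among all consumers.
    static long slowest(long[] consumerNextSequences) {
        long min = Long.MAX_VALUE;
        for (long next : consumerNextSequences) {
            min = Math.min(min, next);
        }
        return min;
    }

    // Highest sequence the producer may write without overwriting a slot
    // that some consumer has not read yet.
    static long highestWritable(long[] consumerNextSequences, int ringSize) {
        return slowest(consumerNextSequences) + ringSize - 1;
    }

    public static void main(String[] args) {
        long[] consumers = {2, 3, 4}; // values taken from the slide
        System.out.println(slowest(consumers));            // prints 2
        System.out.println(highestWritable(consumers, 5)); // prints 6
    }
}
```

With a ring of 5 slots, sequence 6 reuses the slot of sequence 1, which every consumer has already passed, while sequence 7 would overwrite sequence 2, which the slowest consumer still has to read.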
» Allows us to fetch multiple elements at once.
» Using event processors
▪ Callbacks
Batches & Events
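Batching with callbacks might be sketched as follows: once the consumer learns the highest available sequence, it hands every pending entry to a callback in one pass and flags the last one. `BatchDemo`, `Handler` and `drain` are hypothetical names, loosely modelled on the idea of an event-processor callback rather than copied from the library.

```java
// Hypothetical batching consumer: drains everything that is available in one go.
class BatchDemo {
    // Callback invoked once per entry; endOfBatch is true for the last one.
    interface Handler<T> {
        void onEvent(T event, long sequence, boolean endOfBatch);
    }

    // Passes entries nextSequence..available (inclusive) to the handler and
    // returns the next sequence number the consumer should read.
    static <T> long drain(T[] ring, long nextSequence, long available, Handler<T> handler) {
        for (long seq = nextSequence; seq <= available; seq++) {
            T event = ring[(int) (seq % ring.length)];
            handler.onEvent(event, seq, seq == available);
        }
        return Math.max(nextSequence, available + 1);
    }

    public static void main(String[] args) {
        String[] ring = {"a", "b", "c", "d"};
        StringBuilder seen = new StringBuilder();
        long next = drain(ring, 1, 3, (event, seq, end) -> seen.append(event).append(end ? "!" : ","));
        System.out.println(seen + " next=" + next); // prints "b,c,d! next=4"
    }
}
```

The end-of-batch flag lets a handler postpone expensive work, such as flushing to disk, until the whole batch has been seen.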
Code sample – Create ring buffer
//
// Create a new ring buffer.
//
RingBuffer<MyEvent> ringBuffer = new RingBuffer<MyEvent>(
new MyOwnFactory(),
new SingleThreadedClaimStrategy(sizeOfRing),
new SleepingWaitStrategy());
Code sample - Producer
// Request the next available sequence number.
long sequence = buffer.next();
// Fetch the object at that location.
MyEvent event = buffer.get(sequence);
//
// ... do something with the event.
//
// Notify the rest of the world this event is ready to be consumed.
buffer.publish(sequence);
Code sample - Consumer
// Extract a consumer's barrier.
SequenceBarrier barrier = ringBuffer.newBarrier();
// Wait for an event to come.
barrier.waitFor(nextSequence);
// Take the event (data).
MyEvent event = ringBuffer.get(nextSequence);
» The Disruptor is a smart queue.
» Latest release is 2.8
» Exploits hardware acceleration points.
» Won the 2011 Duke’s Choice Award for innovation!
Summary
» Google code:
▪ http://code.google.com/p/disruptor/
» Technical paper:
▪ http://disruptor.googlecode.com/files/Disruptor-1.0.pdf
» Martin Thompson’s blog:
▪ http://mechanical-sympathy.blogspot.com
» Trisha Gee’s blog:
▪ http://mechanitis.blogspot.com/
» InfoQ on Disruptor (session video):
▪ http://www.infoq.com/presentations/LMAX
References
Guy Raz Nir
guyn@alphacsp.com

LMAX Disruptor as real-life example


Editor's Notes

  • #7 16 threads. About 375K TPS per thread. About 518 billion transactions per day. 1 cent per transaction = 5 billion euros per day. Won the Duke’s 2011 award for innovation!
  • #14 Sun (Oracle) JDK 1.7. Intel i7 2600K (Sandy Bridge) + overclocking.
  • #21 Intel i7 2600K Sandy Bridge. L1 cache speed: 450GB/sec. CPU to memory speed: about 18GB/sec (x25 slower than L1).
  • #34 Single-threaded example
  • #35 Single-threaded example
  • #36 Producers work in the same way. The Disruptor provides various barriers for various models.