Stuff that did not work for various reasons
1. RDBMS
2. Actors: Receive → Unmarshal → Journal → Replicate → Business Logic → Marshal → Send, with a queue between each pair of stages
3. SEDA
4. J2EE …
[Figure: Service / Transaction Processor]
20.04.12
[Figure: linked-list queue (Size; Node → Node → Node → Node; Add, Remove) compared with an array queue (Add, Remove) laid out across cache lines]
Queue as a data structure

Problems with Queues
1. Reading (take) and writing (add) both modify the queue's state => write contention
2. Write contention is solved with locks
   1. Other solutions include deques
3. Locks lead to context switches into the kernel
   1. Context switches lead to CPU cache misses etc.
   2. The kernel might use the opportunity to do other work as well
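To make the write-contention point concrete, here is a minimal lock-based bounded queue. It is an illustrative sketch, not LMAX code, and the class name `LockedBoundedQueue` is hypothetical. The design is similar in spirit to `java.util.concurrent.ArrayBlockingQueue`, which also guards both ends with a single lock. `take` writes `head` and `count` just as `put` writes `tail` and `count`, so every producer and consumer serializes on the same lock.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative lock-based bounded queue (hypothetical, not LMAX code).
// Both put (add) and take (remove) write shared fields, so they contend on one lock.
final class LockedBoundedQueue<E> {
    private final Object[] items;
    private int head;   // next slot to take from
    private int tail;   // next slot to put into
    private int count;  // number of stored elements, written by both sides
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notEmpty = lock.newCondition();
    private final Condition notFull = lock.newCondition();

    LockedBoundedQueue(int capacity) {
        items = new Object[capacity];
    }

    void put(E e) throws InterruptedException {
        lock.lock();
        try {
            while (count == items.length) {
                notFull.await();          // may park the thread (OS scheduler involved)
            }
            items[tail] = e;
            tail = (tail + 1) % items.length;
            count++;
            notEmpty.signal();
        } finally {
            lock.unlock();
        }
    }

    @SuppressWarnings("unchecked")
    E take() throws InterruptedException {
        lock.lock();
        try {
            while (count == 0) {
                notEmpty.await();
            }
            E e = (E) items[head];
            items[head] = null;           // a take is a write, too
            head = (head + 1) % items.length;
            count--;
            notFull.signal();
            return e;
        } finally {
            lock.unlock();
        }
    }
}
```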
Lock costs according to the LMAX paper

Method                                 Time (ms)
Single thread                                300
Single thread with lock                   10,000
Two threads with lock                    224,000
Single thread with CAS                     5,700
Two threads with CAS                      30,000
Single thread with volatile write          4,700

• “Compare And Swap” (CAS): AtomicReference etc. in Java
• => No context switch
• But it still needs a memory read/write barrier
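The variants in the table can be sketched as follows. This is an illustration under assumptions, not the paper's benchmark harness. `CounterStyles` and `onTwoThreads` are hypothetical names, and timings from this code would depend on the JVM and hardware rather than reproduce the numbers above. The point is the mechanics. The lock version may park a contended thread. The CAS version retries in user space when it loses a race.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the counter variants from the table: plain, locked, CAS, volatile.
final class CounterStyles {
    long plain;                                // unsynchronized: single thread only
    volatile long vol;                         // visible to readers, but ++ is not atomic: single writer only
    private long locked;                       // guarded by lock
    private final ReentrantLock lock = new ReentrantLock();
    final AtomicLong cas = new AtomicLong();   // updated lock-free via compareAndSet

    void incLocked() {
        lock.lock();                           // a contended lock may park the thread
        try {
            locked++;
        } finally {
            lock.unlock();
        }
    }

    long locked() {
        lock.lock();
        try {
            return locked;
        } finally {
            lock.unlock();
        }
    }

    void incCas() {
        long current;
        do {
            current = cas.get();
        } while (!cas.compareAndSet(current, current + 1)); // lost the race: retry instead of blocking
    }

    // Runs the same increment loop on two threads and waits for both to finish.
    static void onTwoThreads(long iterations, Runnable increment) {
        Runnable loop = () -> {
            for (long i = 0; i < iterations; i++) {
                increment.run();
            }
        };
        Thread a = new Thread(loop);
        Thread b = new Thread(loop);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Wrapping a loop over `incLocked()` or `incCas()` with `System.nanoTime()` gives a rough local comparison. Do not expect the paper's absolute numbers.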
LMAX Data Structure – Ring Buffer
[Figure: Event Publisher and Processor on a Ring Buffer]
Pre-Allocation of Buckets
[Figure: ring buffer of 2^5 = 32 pre-allocated slots numbered 0 to 31, with an Event Publisher and a Processor]
• No (or fewer) GC problems
• Objects are near each other in memory => cache friendly
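Here is a minimal sketch of the pre-allocation idea. It assumes a single publisher and ignores consumers. All event objects are created once in the constructor and then reused. Because the size is a power of two (2^5 = 32 on the slide), a sequence number maps to a slot with a bit mask instead of a modulo. `PreallocatedRingBuffer` and `ValueEvent` are hypothetical names. The real Disruptor additionally prevents the publisher from overwriting slots that consumers have not read yet.

```java
import java.util.function.Supplier;

// Hypothetical single-publisher ring buffer with pre-allocated slots.
// No consumer tracking and no wrap protection: illustration only.
final class PreallocatedRingBuffer<E> {
    private final Object[] slots;
    private final int mask;          // size - 1, valid because size is a power of two
    private long nextSequence = 0;   // next sequence the publisher will claim

    PreallocatedRingBuffer(int size, Supplier<E> factory) {
        if (size <= 0 || Integer.bitCount(size) != 1) {
            throw new IllegalArgumentException("size must be a positive power of two");
        }
        slots = new Object[size];
        mask = size - 1;
        for (int i = 0; i < size; i++) {
            slots[i] = factory.get(); // allocate every event object once, up front
        }
    }

    // Same result as sequence % size for non-negative sequences, without a division.
    int indexOf(long sequence) {
        return (int) (sequence & mask);
    }

    @SuppressWarnings("unchecked")
    E get(long sequence) {
        return (E) slots[indexOf(sequence)];
    }

    // Claims the next sequence; the caller fills the pre-allocated event in place.
    long next() {
        return nextSequence++;
    }
}

// Hypothetical mutable event type, reused on every lap around the ring.
final class ValueEvent {
    long value;
}
```

A publisher would call `next()` and write into the object returned by `get(seq)`. Nothing is allocated per event, which is where the GC and locality benefits come from.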
[Figure: HA node with an Input Disruptor (Receiver, Journaler writing to the File System, Replicator, Un-Marshaller) feeding the Business Logic Handler, and an Output Disruptor (Marshaller, Publisher)]
Each stage can have multiple threads.
[Figure: positions of the Receiver, Journaler, Replicator, Un-Marshaller and Business Logic Handler on the input ring buffer]
The Receiver writes on 31. The Journaler and Replicator read on 24 and can move up to the Receiver's sequence, i.e. up to 30. The Un-Marshaller can move beyond the Journaler and Replicator, also up to 30. The Business Logic Handler needs to stay behind all the others.
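The rule on this slide can be sketched with one gate per handler. Each handler may advance only up to the lowest sequence of whatever it depends on. `SequenceGate` is a hypothetical name, `AtomicLong` stands in for the Disruptor's own sequence type, and waiting strategies are left out. The wiring mirrors the slide: the Journaler, Replicator and Un-Marshaller depend on the Receiver, and the Business Logic Handler depends on all three. The Un-Marshaller position of 19 used in the example is an assumption read off the figure.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical gate: a handler may process up to the minimum sequence
// reached by all the handlers (or the publisher) it depends on.
final class SequenceGate {
    private final AtomicLong[] dependencies;

    SequenceGate(AtomicLong... dependencies) {
        this.dependencies = dependencies;
    }

    // Highest sequence the gated handler is allowed to process.
    long highestAvailable() {
        long min = Long.MAX_VALUE;
        for (AtomicLong d : dependencies) {
            min = Math.min(min, d.get());
        }
        return min;
    }
}
```

With the Receiver published up to 30, the Journaler and Replicator at 24 and the Un-Marshaller at 19, the Business Logic Handler's limit is 19. Once the Un-Marshaller catches up to 30, that limit rises to 24.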
Java API
[Figures: dependency graphs between producer P1 and consumers C1, C2, C3 and C4 in four different arrangements; a final graph with P1, C1 and P2]
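Suppose more than one producer (say P1 and P2) writes into the same ring buffer. Then the producers must never claim the same slot. One way to do that, assumed here, is a shared counter advanced with compare-and-swap, the same primitive as in the lock-cost table. `SharedClaimCounter` is a hypothetical name. A real multi-producer setup also has to track which claimed slots have been published and must not overtake the slowest consumer. Both concerns are omitted here.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: several producers claiming sequence numbers for one shared ring buffer.
// compareAndSet hands out each sequence exactly once, without a lock.
// Omitted: publication tracking and protection against wrapping past the slowest consumer.
final class SharedClaimCounter {
    private final AtomicLong next = new AtomicLong(0);

    long claim() {
        long current;
        do {
            current = next.get();
        } while (!next.compareAndSet(current, current + 1)); // lost the race: retry, no blocking
        return current;
    }
}
```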
Demo
LMAX Low Level Ideas
1. Simple code
2. Everything in memory
3. Business logic is single-threaded, one thread per CPU
4. Business logic has no I/O; I/O is done somewhere else
5. The scheduler “knows” the dependencies between handlers
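Ideas 2 to 4 can be illustrated with a tiny handler. The account/deposit domain, `AccountHandler` and `Deposit` are all hypothetical. The handler keeps all state in memory and is only ever called from one thread. It does no I/O, on the assumption that journaling, replication and marshalling happen in other stages.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical business logic handler: in-memory state, single thread, no I/O.
final class AccountHandler {
    // Hypothetical, already unmarshalled input event.
    static final class Deposit {
        final String account;
        final long amount;

        Deposit(String account, long amount) {
            this.account = account;
            this.amount = amount;
        }
    }

    // Plain HashMap: no locks needed because a single thread owns this state.
    private final Map<String, Long> balances = new HashMap<>();

    void onEvent(Deposit d) {
        balances.merge(d.account, d.amount, Long::sum);
    }

    long balance(String account) {
        return balances.getOrDefault(account, 0L);
    }
}
```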
6M TPS? How did LMAX do it?
[Figure: throughput tiers, each step x10: 10K+ TPS, 100K+ TPS, 1000K+ TPS, set against 3 billion instructions on a modern CPU. Factors named on the slide: if you don't do anything stupid; very well modeled domain; clean, organized code; standard libraries; controlled GC; performance testing; custom, cache-friendly collections]
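A back-of-the-envelope budget shows why every instruction matters at this rate. Assume a single business logic thread on one core that executes roughly the slide's 3 billion instructions per second (about one instruction per cycle at 3 GHz, which is an assumption). At 6M TPS, that leaves:

```latex
\[
t_{\text{tx}} = \frac{1\,\text{s}}{6 \times 10^{6}} \approx 1.67 \times 10^{-7}\,\text{s} \approx 167\,\text{ns}
\]
\[
\frac{3 \times 10^{9}\ \text{instructions/s}}{6 \times 10^{6}\ \text{transactions/s}} = 500\ \text{instructions per transaction}
\]
```

A single cache miss to main memory can cost on the order of a hundred nanoseconds, so a budget this small leaves little room for cache misses or context switches.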
Images (CC) from Flickr: nimboo, imjustcreative, gremionis, justonlysteve, John_Scone, Matthias Wicke, irisgodd3ss, TunnelBug, alandd, seasonal wanderer, raulbarraltamayo, Gilmoth, Dunechaser, graftedno1
Sources
• “Disruptor: High performance alternative to bounded queues for exchanging data between concurrent threads”, Martin Thompson, Dave Farley, Michael Barker, Patricia Gee, Andrew Stewart, 2011
• “The LMAX Architecture”, Martin Fowler, 2011, http://martinfowler.com/articles/lmax.html
• “How to do 100K+ TPS at less than 1ms latency”, Martin Thompson, Michael Barker, 2010