LMAX Architecture

LMAX is a concurrent architecture optimized for high throughput. This talk presents it as an alternative to queue-based approaches such as SEDA and Actors.

LMAX Architecture

  1. 1. The LMAX Architecture with Disruptors: 6M Transactions per Second. Stephan Schmidt, Vice CTO, brands4friends
  2. 2. Me: Stephan Schmidt, Vice CTO brands4friends, @codemonkeyism, www.codemonkeyism.com, stephan.schmidt@brands4friends.de
  4. 4. brands4friends: No. 1 shopping club in Germany, > 360k daily visitors, > 4.5M users, an eBay company
  7. 7. Development at brands4friends. Team: Java and web developers, data warehouse developers. Process: Scrum since 2009, Kanban for DWH since 2012
  8. 8. LMAX - The London Multi-Asset Exchange: "We aim to build the highest performance financial exchange in the world"
  9. 9. High Performance Transaction Processing
  10. 10. Diagram: the service / transaction processor pipeline: Receive, Unmarshal, Journal, Replicate, Business Logic, Marshal, Send
  11. 11. Diagram: the same service / transaction processor pipeline with a queue between every stage
  12. 12. Diagram: CPU trends (cores vs. GHz)
  13. 13. Actors? SEDA?
  14. 14. Stuff that did not work for various reasons: 1. RDBMS, 2. Actors, 3. SEDA, 4. J2EE, … (diagram: the queued pipeline from slide 11 again)
  15. 15. LMAX Architecture
  16. 16. Diagram: the queued service / transaction processor pipeline again (Receive, Unmarshal, Journal, Replicate, Business Logic, Marshal, Send with queues in between)
  17. 17. Diagram: linked-list queue (nodes scattered in memory, add and remove at opposite ends) vs. array-backed queue (contiguous elements sharing cache lines)
  18. 18. Queue as a data structure - problems with queues: 1. Reading (take) and writing (add) are both write accesses => write contention. 2. Write contention is solved with locks (other solutions include deques). 3. Locks lead to context switches into the kernel; context switches lead to CPU cache misses etc., and the kernel might use the opportunity to do other work as well
  19. 19. Cost of locks according to the LMAX paper (incrementing a counter) - Method / Time in ms: Single thread 300; Single thread with lock 10,000; Two threads with lock 224,000; Single thread with CAS 5,700; Two threads with CAS 30,000; Single thread with volatile write 4,700. "Compare and swap" (AtomicLong, AtomicReference etc. in Java) avoids the context switch but still pays for a memory read/write barrier. (A small counter-cost sketch in Java appears after the slide list.)
  20. 20. LMAX data structure - the ring buffer. Diagram: Event Publisher, Ring Buffer, Processor
  21. 21. Pre-allocation of buckets. Diagram: ring buffer with 2^5 = 32 pre-allocated slots (0-31) between Event Publisher and Processor. No (or fewer) GC problems; objects are near each other in memory => cache friendly
  22. 22. Coordination. Diagram: the same 32-slot ring buffer with a Claim Strategy and a Wait Strategy between Event Publisher and Processor. Publishing works in three steps: 1. Claim, 2. Write, 3. Make public by advancing the sequence. (A toy claim/write/publish sketch appears after the slide list.)
  23. 23. Latency. Diagram: the queued service / transaction processor pipeline
  24. 24. Diagram: Receive Message, then Journal, Replicate and Unmarshall in parallel, then Business Logic
  25. 25. Diagram: the service / transaction processor pipeline with the queues replaced by shared data structures, one on the input side and one on the output side of the Business Logic
  26. 26. LMAX architecture. Diagram: Input Disruptor, Business Logic Handler, Output Disruptor(s)
  27. 27. Diagram: HA node - Receiver, Journaler, Replicator and Un-Marshaller on the Input Disruptor, the Business Logic Handler, then Marshaller and Publisher on the Output Disruptor; the Journaler writes to the file system. Each stage can have multiple threads
  28. 28. Example of sequence coordination: the Receiver writes at sequence 31. The Journaler and Replicator read at 24 and can move up to the Receiver's sequence at 30. The Un-Marshaller can move past the Journaler and Replicator, also up to 30, since it only depends on the Receiver. The Business Logic Handler has to stay behind all of the others
  29. 29. Java API (a Disruptor DSL sketch appears after the slide list)
  30. 30. Diagram: publisher P1 with consumers C1, C2, C3, C4
  31. 31. Diagram: publisher P1 with consumers C1, C2, C3, C4 (different dependency graph)
  32. 32. Diagram: publisher P1 with consumers C1, C2, C3, C4 (different dependency graph)
  33. 33. Diagram: publisher P1 with consumers C1, C2, C3, C4 (different dependency graph)
  34. 34. Diagram: publishers P1 and P2 with consumer C1
  35. 35. Demo
  36. 36. LMAX low-level ideas: 1. Simple code. 2. Everything in memory. 3. Single-threaded per CPU for business logic. 4. Business logic has no I/O; I/O is done somewhere else. 5. The scheduler "knows" the dependencies of the handlers
  37. 37. 6M TPS? How did LMAX do it? 10K+ TPS: a very well modeled domain, if you don't do anything stupid. x 10 => 100K+ TPS: performance testing, clean organized code, controlled GC, standard libraries. x 10 => 1000K+ TPS: custom, cache-friendly collections. (A modern CPU executes billions of instructions per second.)
  38. 38. We’re looking for very good developers
  39. 39. Thanks! @codemonkeyism, stephan.schmidt@brands4friends.de
  40. 40. Images CC from Flickr: nimboo, imjustcreative, gremionis, justonlysteve, John_Scone, Matthias Wicke, irisgodd3ss, TunnelBug, alandd, seasonal wanderer, raulbarraltamayo, Gilmoth, Dunechaser, graftedno1
  41. 41. Sources: "Disruptor: High performance alternative to bounded queues for exchanging data between concurrent threads", Martin Thompson, Dave Farley, Michael Barker, Patricia Gee, Andrew Stewart, 2011. "The LMAX Architecture", Martin Fowler, 2011, http://martinfowler.com/articles/lmax.html. "How to do 100K+ TPS at less than 1ms latency", Martin Thompson, Michael Barker, 2010
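The cost table on slide 19 is easy to get a feel for with a few lines of Java. This is a minimal sketch, not the LMAX benchmark itself: the class name, the iteration count, and the three variants measured are chosen here for illustration. Absolute numbers depend on the machine and the JIT (a real measurement would use a harness such as JMH), but the ordering - plain single-threaded increment cheapest, CAS more expensive, a lock far more expensive - matches the table.

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal sketch of the counter experiment behind slide 19 (illustrative only).
public class CounterCostSketch {
    static final long ITERATIONS = 500_000_000L; // iteration count chosen for illustration

    public static void main(String[] args) {
        // 1. Plain long on a single thread: no synchronization at all.
        //    (The JIT may partially optimize this loop; printing the result keeps it live.)
        long t0 = System.currentTimeMillis();
        long plain = 0;
        for (long i = 0; i < ITERATIONS; i++) plain++;
        System.out.println("plain increment:  " + (System.currentTimeMillis() - t0) + " ms (" + plain + ")");

        // 2. Single thread, but every increment acquires a lock.
        final Object lock = new Object();
        final long[] locked = {0};
        t0 = System.currentTimeMillis();
        for (long i = 0; i < ITERATIONS; i++) {
            synchronized (lock) { locked[0]++; }
        }
        System.out.println("locked increment: " + (System.currentTimeMillis() - t0) + " ms (" + locked[0] + ")");

        // 3. Single thread with CAS via AtomicLong: no kernel involvement,
        //    but still a memory barrier per increment.
        AtomicLong cas = new AtomicLong();
        t0 = System.currentTimeMillis();
        for (long i = 0; i < ITERATIONS; i++) cas.incrementAndGet();
        System.out.println("CAS increment:    " + (System.currentTimeMillis() - t0) + " ms (" + cas.get() + ")");
    }
}
```

Running two threads against the same lock or the same AtomicLong adds the contention costs shown in the table's two-thread rows.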
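Slides 20-22 describe the core data structure: a pre-allocated ring buffer where the publisher 1. claims a slot, 2. writes into it, and 3. makes it public by advancing a sequence. The toy single-producer sketch below illustrates just that idea; the class and method names are illustrative, not the real Disruptor API. Gating against slow consumers (so the producer never wraps over unread slots) and wait strategies are omitted.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy single-producer ring buffer illustrating claim / write / publish (not the Disruptor API).
public class SimpleRingBuffer<T> {
    private final Object[] slots;   // pre-allocated buckets, contiguous in memory
    private final int mask;         // size is a power of two, e.g. 2^5 = 32
    private long claimed = -1;      // producer-local: highest claimed sequence
    private final AtomicLong published = new AtomicLong(-1); // highest sequence visible to consumers

    public SimpleRingBuffer(int sizePowerOfTwo) {
        this.slots = new Object[sizePowerOfTwo];
        this.mask = sizePowerOfTwo - 1;
    }

    // 1. Claim the next slot (single producer, so no contention on 'claimed').
    public long claim() { return ++claimed; }

    // 2. Write into the pre-allocated bucket for that sequence.
    public void write(long sequence, T value) { slots[(int) (sequence & mask)] = value; }

    // 3. Make it public by advancing the published sequence
    //    (a volatile write, i.e. a memory barrier rather than a lock).
    public void publish(long sequence) { published.set(sequence); }

    // Consumers only read slots whose sequence is <= lastPublished().
    public long lastPublished() { return published.get(); }

    @SuppressWarnings("unchecked")
    public T get(long sequence) { return (T) slots[(int) (sequence & mask)]; }
}
```

A consumer spins (or uses a wait strategy) until lastPublished() reaches the sequence it wants and then reads that slot. Because the buckets are allocated once and sit next to each other in memory, the structure is cache friendly and avoids per-message allocation when the event objects themselves are reused, which is the point of slide 21.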
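Slides 28-29 show how the input disruptor's handlers are coordinated and point at the Java API. Below is a hedged sketch of wiring that dependency graph with the open-source LMAX Disruptor DSL (Disruptor 3.x style); the MessageEvent class, the handler bodies, and the buffer size are placeholder assumptions, not code from the talk.

```java
import com.lmax.disruptor.EventHandler;
import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.util.DaemonThreadFactory;

// Sketch of the input-disruptor dependency graph from slide 28 (placeholder handlers).
public class InputDisruptorSketch {
    // One pre-allocated event type that lives in the ring buffer slots.
    public static class MessageEvent {
        public byte[] rawBytes;
        public Object unmarshalled;
    }

    public static void main(String[] args) {
        EventHandler<MessageEvent> journaler     = (event, seq, endOfBatch) -> { /* write raw bytes to the journal */ };
        EventHandler<MessageEvent> replicator    = (event, seq, endOfBatch) -> { /* send raw bytes to the HA node */ };
        EventHandler<MessageEvent> unmarshaller  = (event, seq, endOfBatch) -> { /* event.unmarshalled = decode(event.rawBytes) */ };
        EventHandler<MessageEvent> businessLogic = (event, seq, endOfBatch) -> { /* single-threaded business logic, no I/O */ };

        Disruptor<MessageEvent> disruptor = new Disruptor<>(
                MessageEvent::new,              // pre-allocate every bucket up front
                1024,                           // ring size, a power of two
                DaemonThreadFactory.INSTANCE);

        // Journaler, replicator and unmarshaller run in parallel behind the receiver;
        // the business logic handler has to stay behind all of them (slide 28).
        disruptor.handleEventsWith(journaler, replicator, unmarshaller)
                 .then(businessLogic);

        RingBuffer<MessageEvent> ringBuffer = disruptor.start();

        // Receiver / publisher side: claim, write, publish (slide 22).
        long seq = ringBuffer.next();
        try {
            ringBuffer.get(seq).rawBytes = new byte[] {1, 2, 3};
        } finally {
            ringBuffer.publish(seq);
        }
    }
}
```

The business logic handler only sees a sequence once the journaler, replicator and unmarshaller have all passed it, which is exactly the gating described on slide 28.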
