SlideShare a Scribd company logo
1 of 54
Download to read offline
Low latency & Mechanical Sympathy:
Issues and solutions
Jean-Philippe BEMPEL @jpbempel
Performance Architect http://jpbempel.blogspot.com
© ULLINK 2016
Low latency order router
2
• pure Java SE application
• FIX protocol / Direct Market Access connectivity (TCP)
• transform to an intermediate form (UL Message)
Performance target
3
process & route orders in
< 100us
Low latency
4
• Process under 1ms
• No (Disk) I/O involved
• Memory is the new disk
• Hardware architecture has a major impact on
performance
• Mechanical Sympathy
Agenda
5
• GC pauses
• Power Management
• NUMA
• Cache pollution
6
GC pauses
Generations
7
• Minor GC & Full GC are Stop the World (STW)
=> All app threads are stopped
• Minor GC pauses: 30-100ms
• Full GC pauses: couple of seconds to minutes
GC pauses
8
• None during trading hours
• Outside of trading hours, maintenance time allowed
• Java heap sized accordingly to avoid Full GC
Full GC
9
Some tricks to reduce both
• frequency
• pause time
Minor GC
10
• Increase Young Gen (512MB)
• But increasing YG can increase pause time
• Increase likelihood of objects die
Minor GC frequency
11
pause time factors:
• Refs traversing
• Card scanning
• Object Copy (survivor or tenured)
Minor GC pause time
12
Card Scanning
13
Young Gen
Card table
C C D
Old Gen
1 2 3 4
GC Roots
1 card = 512 bytes
Object Copy
14
• Need to keep orders during app lifetime
• Orders are relatively large objects
• Orders will be promoted to old generation
=> copy to survivor spaces and then to old gen
• Increase significantly the pause time for minor GC
Object Copy
15
• Access pattern: 90% of the time accessed immediately
after creation
• Weak references: avoid copying orders object
• persistence cache to allow reloading orders without too
much impact
• Allocation should be controlled carefully
• Reduce frequency of minor gc occurrence
• JDK API is not always friendly with GC/allocation
Allocation
16
need to rewrite some part:
conversion int -> char[]
int intToString(int i, char[] buffer, int offset)
avoid intermediate/temporary char[]
Allocation
17
Usage of empty collections & pre-sizing:
Collection<String> process() {
Collection<String> results = Collections.emptyList();
for (String s : list) {
if (results == Collections.<String>emptyList())
results = new ArrayList(list.size());
results.add(s);
}
return results;
}
Allocation
18
CMS
19
• Promotion allocation use Free Lists (like malloc)
• No Compaction of Old Gen
CMS
20
Old Gen
• Increase minor pause time
Full GC fallback unpredictable (fragmentation, promotion
failure, …)
176710.366: [GC 176710.366: [ParNew ( promotion failed):
471872K->455002K(471872K), 2.4424250 secs]176712.809: [CMS:
6998032K->5328833K(7864320K), 53.8558640 secs] 7457624K->5328833K(8336192K),
[CMS Perm : 199377K->198212K(524288K)], 56.2987730 secs] [Times: user=56.60
sys=0.78, real=56.29 secs]
Same problem for G1
CMS
21
• We are partners
• Best GC algorithm in the world
• Our tests show maximum pause time of 2ms
• Some overhead compared to HotSpot (8%)
• False issue syndrom
Zing JVM from Azul?
22
23
Power Management
• few orders per day, most of the time in idle
Why power management matters ?
24
• CPU embeds technologies to reduce power usage
• Frequency scaling during execution (P states)
P0 = nominal frequency
• Deep sleep modes during idle phases (C states)
C0 = Running, C1 = idle
• latency to wakeup
cat /sys/devices/system/cpu/cpu0/cpuidle/state3/latency
200 (us)
Power management technologies
25
BIOS Settings
26
BIOS settings impact
27
BIOS settings impact
28
BIOS settings impact
29
BIOS settings impact
30
• Turbo boost is dynamic frequency scaling
• Depends on thermal envelop
• Reducing cores reduce thermal envelop => higher freq
• Some vendors can fix this higher frequency
• Example:
intel E5 v3 14 cores => 2 cores, 2.6Ghz => 3.6GHz
Cores reduction/Fixed Turbo Boost
31
• Some OS drivers are also very aggressive
# cat /sys/devices/system/cpu/cpuidle/current_driver
intel_idle
• Disabled in boot kernel parameters:
intel_idle.max_cstate=0
Power management at OS level
32
With intel_idle driver, 1 msg/s
33
Without intel_idle driver, 1 msg/s
34
With intel_idle driver, 100 msg/s
35
Without intel_idle driver, 100 msg/s
36
37
NUMA
Uniform Memory Access?
38
CPU
DRAM
Uniform Memory Access?
39
CPU
DRAM
CPU
• Non-Uniform Memory Access
• Starts from 2 cpus (sockets)
• 1 Memory controller per socket (Node)
• Local Memory vs remote memory
NUMA
40
NUMA architecture
41
Socket 0 Socket 1
MC
DRAM
MC
DRAM
QPI QPI
65ns 65ns40ns
• Avoid remote memory access
• Avoid scheduling and allocation on different node
• Bind process on one CPU only (numactl)
Restrictive but effective
Binding on node 0:
numactl --cpunodebind=0 --membind=0 java …
• BIOS: node interleaving disabled
How to deal with NUMA ?
42
43
Cache pollution
Cache hierarchy architecture
44
Socket 0
Core0 Core1
L2
L1
L3
L2
L1
DRAM
• CPU L3 shared cache across cores
• Non critical threads can “pollute” cache
• eviction of data required by critical threads
• => adds latency to re-access those data
• Need to isolate critical threads to non-critical ones
Cache pollution on L3
45
• OS Scheduling cannot work in this case
• Need to pin manually threads on cores
• Thread affinity allow us to do this
• Identify our critical and non-critical threads in application
Thread affinity
46
No thread affinity
47
With thread affinity
48
• Other processes can also pollute L3 cache
• Need to isolate all processes away from critical threads
• The whole CPU need to be dedicated
System processes & cache pollution
49
• No other processes should be scheduled on the same
socket (isolcpus)
• But if more than one thread per core need to use cpuset
Isolating cores
50
Socket 0 Socket 1
Core0 Core2
Core4 Core6
Core1 Core3
Core5 Core7
Scheduler
• Make GC pauses predictable
• Tune BIOS/OS for maximum performance
• Control NUMA effect by binding on one node
• Avoid cache pollution with Thread Affinity and CPU
isolation
Recommendations
51
• Can optimize a process without changing your code
• Know your hardware and your OS
• This is Mechanical Sympathy
Conclusion
52
• Mechanical Sympathy blog
http://mechanical-sympathy.blogspot.com/
• Mechanical Sympathy forum
https://groups.google.com/forum/#!forum/mechanical-sympathy
• Java Thread Affinity library
https://github.com/peter-lawrey/Java-Thread-Affinity
References
53
Q&A
Jean-Philippe BEMPEL @jpbempel
Performance Architect http://jpbempel.blogspot.com

More Related Content

What's hot

Debugging Distributed Systems - Velocity Santa Clara 2016
Debugging Distributed Systems - Velocity Santa Clara 2016Debugging Distributed Systems - Velocity Santa Clara 2016
Debugging Distributed Systems - Velocity Santa Clara 2016Donny Nadolny
 
HKG15-100: What is Linaro working on - core development lightning talks
HKG15-100:  What is Linaro working on - core development lightning talksHKG15-100:  What is Linaro working on - core development lightning talks
HKG15-100: What is Linaro working on - core development lightning talksLinaro
 
LXC on Ganeti
LXC on GanetiLXC on Ganeti
LXC on Ganetikawamuray
 
Kernel Recipes 2015: Speed up your kernel development cycle with QEMU
Kernel Recipes 2015: Speed up your kernel development cycle with QEMUKernel Recipes 2015: Speed up your kernel development cycle with QEMU
Kernel Recipes 2015: Speed up your kernel development cycle with QEMUAnne Nicolas
 
Garbage First Garbage Collector (G1 GC) - Migration to, Expectations and Adva...
Garbage First Garbage Collector (G1 GC) - Migration to, Expectations and Adva...Garbage First Garbage Collector (G1 GC) - Migration to, Expectations and Adva...
Garbage First Garbage Collector (G1 GC) - Migration to, Expectations and Adva...Monica Beckwith
 
netfilter and iptables
netfilter and iptablesnetfilter and iptables
netfilter and iptablesKernel TLV
 
BKK16-410 SoC Idling & CPU Cluster PM
BKK16-410 SoC Idling & CPU Cluster PMBKK16-410 SoC Idling & CPU Cluster PM
BKK16-410 SoC Idling & CPU Cluster PMLinaro
 
Shenandoah GC: Java Without The Garbage Collection Hiccups (Christine Flood)
Shenandoah GC: Java Without The Garbage Collection Hiccups (Christine Flood)Shenandoah GC: Java Without The Garbage Collection Hiccups (Christine Flood)
Shenandoah GC: Java Without The Garbage Collection Hiccups (Christine Flood)Red Hat Developers
 
Varnish Web Accelerator
Varnish Web AcceleratorVarnish Web Accelerator
Varnish Web AcceleratorRahul Ghose
 
Pick diamonds from garbage
Pick diamonds from garbagePick diamonds from garbage
Pick diamonds from garbageTier1 App
 
VLANs in the Linux Kernel
VLANs in the Linux KernelVLANs in the Linux Kernel
VLANs in the Linux KernelKernel TLV
 
CPU Optimizations in the CERN Cloud - February 2016
CPU Optimizations in the CERN Cloud - February 2016CPU Optimizations in the CERN Cloud - February 2016
CPU Optimizations in the CERN Cloud - February 2016Belmiro Moreira
 
Bpf performance tools chapter 4 bcc
Bpf performance tools chapter 4   bccBpf performance tools chapter 4   bcc
Bpf performance tools chapter 4 bccViller Hsiao
 
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)Wei Shan Ang
 
Training Slides: Intermediate 201: Single and Multi-Site Tungsten Clustering ...
Training Slides: Intermediate 201: Single and Multi-Site Tungsten Clustering ...Training Slides: Intermediate 201: Single and Multi-Site Tungsten Clustering ...
Training Slides: Intermediate 201: Single and Multi-Site Tungsten Clustering ...Continuent
 
High performance json- postgre sql vs. mongodb
High performance json- postgre sql vs. mongodbHigh performance json- postgre sql vs. mongodb
High performance json- postgre sql vs. mongodbWei Shan Ang
 
Juggle your data with Tungsten Replicator
Juggle your data with Tungsten ReplicatorJuggle your data with Tungsten Replicator
Juggle your data with Tungsten ReplicatorGiuseppe Maxia
 

What's hot (19)

Debugging Distributed Systems - Velocity Santa Clara 2016
Debugging Distributed Systems - Velocity Santa Clara 2016Debugging Distributed Systems - Velocity Santa Clara 2016
Debugging Distributed Systems - Velocity Santa Clara 2016
 
HKG15-100: What is Linaro working on - core development lightning talks
HKG15-100:  What is Linaro working on - core development lightning talksHKG15-100:  What is Linaro working on - core development lightning talks
HKG15-100: What is Linaro working on - core development lightning talks
 
LXC on Ganeti
LXC on GanetiLXC on Ganeti
LXC on Ganeti
 
Kernel Recipes 2015: Speed up your kernel development cycle with QEMU
Kernel Recipes 2015: Speed up your kernel development cycle with QEMUKernel Recipes 2015: Speed up your kernel development cycle with QEMU
Kernel Recipes 2015: Speed up your kernel development cycle with QEMU
 
Linux Kernel Live Patching
Linux Kernel Live PatchingLinux Kernel Live Patching
Linux Kernel Live Patching
 
Garbage First Garbage Collector (G1 GC) - Migration to, Expectations and Adva...
Garbage First Garbage Collector (G1 GC) - Migration to, Expectations and Adva...Garbage First Garbage Collector (G1 GC) - Migration to, Expectations and Adva...
Garbage First Garbage Collector (G1 GC) - Migration to, Expectations and Adva...
 
netfilter and iptables
netfilter and iptablesnetfilter and iptables
netfilter and iptables
 
BKK16-410 SoC Idling & CPU Cluster PM
BKK16-410 SoC Idling & CPU Cluster PMBKK16-410 SoC Idling & CPU Cluster PM
BKK16-410 SoC Idling & CPU Cluster PM
 
Shenandoah GC: Java Without The Garbage Collection Hiccups (Christine Flood)
Shenandoah GC: Java Without The Garbage Collection Hiccups (Christine Flood)Shenandoah GC: Java Without The Garbage Collection Hiccups (Christine Flood)
Shenandoah GC: Java Without The Garbage Collection Hiccups (Christine Flood)
 
Varnish Web Accelerator
Varnish Web AcceleratorVarnish Web Accelerator
Varnish Web Accelerator
 
Pick diamonds from garbage
Pick diamonds from garbagePick diamonds from garbage
Pick diamonds from garbage
 
VLANs in the Linux Kernel
VLANs in the Linux KernelVLANs in the Linux Kernel
VLANs in the Linux Kernel
 
CPU Optimizations in the CERN Cloud - February 2016
CPU Optimizations in the CERN Cloud - February 2016CPU Optimizations in the CERN Cloud - February 2016
CPU Optimizations in the CERN Cloud - February 2016
 
Corralling Big Data at TACC
Corralling Big Data at TACCCorralling Big Data at TACC
Corralling Big Data at TACC
 
Bpf performance tools chapter 4 bcc
Bpf performance tools chapter 4   bccBpf performance tools chapter 4   bcc
Bpf performance tools chapter 4 bcc
 
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
 
Training Slides: Intermediate 201: Single and Multi-Site Tungsten Clustering ...
Training Slides: Intermediate 201: Single and Multi-Site Tungsten Clustering ...Training Slides: Intermediate 201: Single and Multi-Site Tungsten Clustering ...
Training Slides: Intermediate 201: Single and Multi-Site Tungsten Clustering ...
 
High performance json- postgre sql vs. mongodb
High performance json- postgre sql vs. mongodbHigh performance json- postgre sql vs. mongodb
High performance json- postgre sql vs. mongodb
 
Juggle your data with Tungsten Replicator
Juggle your data with Tungsten ReplicatorJuggle your data with Tungsten Replicator
Juggle your data with Tungsten Replicator
 

Similar to Low latency & mechanical sympathy issues and solutions

Tuning Java GC to resolve performance issues
Tuning Java GC to resolve performance issuesTuning Java GC to resolve performance issues
Tuning Java GC to resolve performance issuesSergey Podolsky
 
Gc and-pagescan-attacks-by-linux
Gc and-pagescan-attacks-by-linuxGc and-pagescan-attacks-by-linux
Gc and-pagescan-attacks-by-linuxCuong Tran
 
Become a Garbage Collection Hero
Become a Garbage Collection HeroBecome a Garbage Collection Hero
Become a Garbage Collection HeroTier1app
 
High Performance Erlang - Pitfalls and Solutions
High Performance Erlang - Pitfalls and SolutionsHigh Performance Erlang - Pitfalls and Solutions
High Performance Erlang - Pitfalls and SolutionsYinghai Lu
 
Macy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-FlightMacy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-FlightDataStax Academy
 
Clr jvm implementation differences
Clr jvm implementation differencesClr jvm implementation differences
Clr jvm implementation differencesJean-Philippe BEMPEL
 
**Understanding_CTS_Log_Messages.pdf
**Understanding_CTS_Log_Messages.pdf**Understanding_CTS_Log_Messages.pdf
**Understanding_CTS_Log_Messages.pdfagnathavasi
 
QCon 2015 Broken Performance Tools
QCon 2015 Broken Performance ToolsQCon 2015 Broken Performance Tools
QCon 2015 Broken Performance ToolsBrendan Gregg
 
Installing Oracle Database on LDOM
Installing Oracle Database on LDOMInstalling Oracle Database on LDOM
Installing Oracle Database on LDOMPhilippe Fierens
 
The Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast StorageThe Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast StorageKernel TLV
 
Playing BBR with a userspace network stack
Playing BBR with a userspace network stackPlaying BBR with a userspace network stack
Playing BBR with a userspace network stackHajime Tazaki
 
Graphics processing uni computer archiecture
Graphics processing uni computer archiectureGraphics processing uni computer archiecture
Graphics processing uni computer archiectureHaris456
 
Oracle Database In-Memory Option in Action
Oracle Database In-Memory Option in ActionOracle Database In-Memory Option in Action
Oracle Database In-Memory Option in ActionTanel Poder
 
In Memory Database In Action by Tanel Poder and Kerry Osborne
In Memory Database In Action by Tanel Poder and Kerry OsborneIn Memory Database In Action by Tanel Poder and Kerry Osborne
In Memory Database In Action by Tanel Poder and Kerry OsborneEnkitec
 
YOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceYOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceBrendan Gregg
 
Hadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GC
Hadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GCHadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GC
Hadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GCErik Krogen
 
Scylla Summit 2018: Make Scylla Fast Again! Find out how using Tools, Talent,...
Scylla Summit 2018: Make Scylla Fast Again! Find out how using Tools, Talent,...Scylla Summit 2018: Make Scylla Fast Again! Find out how using Tools, Talent,...
Scylla Summit 2018: Make Scylla Fast Again! Find out how using Tools, Talent,...ScyllaDB
 
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at RuntimeAdaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at RuntimeDatabricks
 

Similar to Low latency & mechanical sympathy issues and solutions (20)

Tuning Java GC to resolve performance issues
Tuning Java GC to resolve performance issuesTuning Java GC to resolve performance issues
Tuning Java GC to resolve performance issues
 
Gc and-pagescan-attacks-by-linux
Gc and-pagescan-attacks-by-linuxGc and-pagescan-attacks-by-linux
Gc and-pagescan-attacks-by-linux
 
Become a Garbage Collection Hero
Become a Garbage Collection HeroBecome a Garbage Collection Hero
Become a Garbage Collection Hero
 
High Performance Erlang - Pitfalls and Solutions
High Performance Erlang - Pitfalls and SolutionsHigh Performance Erlang - Pitfalls and Solutions
High Performance Erlang - Pitfalls and Solutions
 
Macy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-FlightMacy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-Flight
 
Clr jvm implementation differences
Clr jvm implementation differencesClr jvm implementation differences
Clr jvm implementation differences
 
**Understanding_CTS_Log_Messages.pdf
**Understanding_CTS_Log_Messages.pdf**Understanding_CTS_Log_Messages.pdf
**Understanding_CTS_Log_Messages.pdf
 
lec16-memory.ppt
lec16-memory.pptlec16-memory.ppt
lec16-memory.ppt
 
QCon 2015 Broken Performance Tools
QCon 2015 Broken Performance ToolsQCon 2015 Broken Performance Tools
QCon 2015 Broken Performance Tools
 
Installing Oracle Database on LDOM
Installing Oracle Database on LDOMInstalling Oracle Database on LDOM
Installing Oracle Database on LDOM
 
The Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast StorageThe Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast Storage
 
Playing BBR with a userspace network stack
Playing BBR with a userspace network stackPlaying BBR with a userspace network stack
Playing BBR with a userspace network stack
 
Graphics processing uni computer archiecture
Graphics processing uni computer archiectureGraphics processing uni computer archiecture
Graphics processing uni computer archiecture
 
Linux boot-time
Linux boot-timeLinux boot-time
Linux boot-time
 
Oracle Database In-Memory Option in Action
Oracle Database In-Memory Option in ActionOracle Database In-Memory Option in Action
Oracle Database In-Memory Option in Action
 
In Memory Database In Action by Tanel Poder and Kerry Osborne
In Memory Database In Action by Tanel Poder and Kerry OsborneIn Memory Database In Action by Tanel Poder and Kerry Osborne
In Memory Database In Action by Tanel Poder and Kerry Osborne
 
YOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceYOW2020 Linux Systems Performance
YOW2020 Linux Systems Performance
 
Hadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GC
Hadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GCHadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GC
Hadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GC
 
Scylla Summit 2018: Make Scylla Fast Again! Find out how using Tools, Talent,...
Scylla Summit 2018: Make Scylla Fast Again! Find out how using Tools, Talent,...Scylla Summit 2018: Make Scylla Fast Again! Find out how using Tools, Talent,...
Scylla Summit 2018: Make Scylla Fast Again! Find out how using Tools, Talent,...
 
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at RuntimeAdaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
 

More from Jean-Philippe BEMPEL

Javaday 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneur...
Javaday 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneur...Javaday 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneur...
Javaday 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneur...Jean-Philippe BEMPEL
 
Devoxx Fr 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneu...
Devoxx Fr 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneu...Devoxx Fr 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneu...
Devoxx Fr 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneu...Jean-Philippe BEMPEL
 
Tools in action jdk mission control and flight recorder
Tools in action  jdk mission control and flight recorderTools in action  jdk mission control and flight recorder
Tools in action jdk mission control and flight recorderJean-Philippe BEMPEL
 
Understanding low latency jvm gcs V2
Understanding low latency jvm gcs V2Understanding low latency jvm gcs V2
Understanding low latency jvm gcs V2Jean-Philippe BEMPEL
 
Out ofmemoryerror what is the cost of java objects
Out ofmemoryerror  what is the cost of java objectsOut ofmemoryerror  what is the cost of java objects
Out ofmemoryerror what is the cost of java objectsJean-Philippe BEMPEL
 
OutOfMemoryError : quel est le coût des objets en java
OutOfMemoryError : quel est le coût des objets en javaOutOfMemoryError : quel est le coût des objets en java
OutOfMemoryError : quel est le coût des objets en javaJean-Philippe BEMPEL
 
Lock free programming - pro tips devoxx uk
Lock free programming - pro tips devoxx ukLock free programming - pro tips devoxx uk
Lock free programming - pro tips devoxx ukJean-Philippe BEMPEL
 
Programmation lock free - les techniques des pros (2eme partie)
Programmation lock free - les techniques des pros (2eme partie)Programmation lock free - les techniques des pros (2eme partie)
Programmation lock free - les techniques des pros (2eme partie)Jean-Philippe BEMPEL
 
Programmation lock free - les techniques des pros (1ere partie)
Programmation lock free - les techniques des pros (1ere partie)Programmation lock free - les techniques des pros (1ere partie)
Programmation lock free - les techniques des pros (1ere partie)Jean-Philippe BEMPEL
 
Measuring directly from cpu hardware performance counters
Measuring directly from cpu  hardware performance countersMeasuring directly from cpu  hardware performance counters
Measuring directly from cpu hardware performance countersJean-Philippe BEMPEL
 
Devoxx france 2014 compteurs de perf
Devoxx france 2014 compteurs de perfDevoxx france 2014 compteurs de perf
Devoxx france 2014 compteurs de perfJean-Philippe BEMPEL
 

More from Jean-Philippe BEMPEL (15)

Mastering GC.pdf
Mastering GC.pdfMastering GC.pdf
Mastering GC.pdf
 
Javaday 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneur...
Javaday 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneur...Javaday 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneur...
Javaday 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneur...
 
Devoxx Fr 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneu...
Devoxx Fr 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneu...Devoxx Fr 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneu...
Devoxx Fr 2022 - Remèdes aux oomkill, warm-ups, et lenteurs pour des conteneu...
 
Tools in action jdk mission control and flight recorder
Tools in action  jdk mission control and flight recorderTools in action  jdk mission control and flight recorder
Tools in action jdk mission control and flight recorder
 
Understanding JVM GC: advanced!
Understanding JVM GC: advanced!Understanding JVM GC: advanced!
Understanding JVM GC: advanced!
 
Understanding low latency jvm gcs V2
Understanding low latency jvm gcs V2Understanding low latency jvm gcs V2
Understanding low latency jvm gcs V2
 
Understanding low latency jvm gcs
Understanding low latency jvm gcsUnderstanding low latency jvm gcs
Understanding low latency jvm gcs
 
Understanding jvm gc advanced
Understanding jvm gc advancedUnderstanding jvm gc advanced
Understanding jvm gc advanced
 
Out ofmemoryerror what is the cost of java objects
Out ofmemoryerror  what is the cost of java objectsOut ofmemoryerror  what is the cost of java objects
Out ofmemoryerror what is the cost of java objects
 
OutOfMemoryError : quel est le coût des objets en java
OutOfMemoryError : quel est le coût des objets en javaOutOfMemoryError : quel est le coût des objets en java
OutOfMemoryError : quel est le coût des objets en java
 
Lock free programming - pro tips devoxx uk
Lock free programming - pro tips devoxx ukLock free programming - pro tips devoxx uk
Lock free programming - pro tips devoxx uk
 
Programmation lock free - les techniques des pros (2eme partie)
Programmation lock free - les techniques des pros (2eme partie)Programmation lock free - les techniques des pros (2eme partie)
Programmation lock free - les techniques des pros (2eme partie)
 
Programmation lock free - les techniques des pros (1ere partie)
Programmation lock free - les techniques des pros (1ere partie)Programmation lock free - les techniques des pros (1ere partie)
Programmation lock free - les techniques des pros (1ere partie)
 
Measuring directly from cpu hardware performance counters
Measuring directly from cpu  hardware performance countersMeasuring directly from cpu  hardware performance counters
Measuring directly from cpu hardware performance counters
 
Devoxx france 2014 compteurs de perf
Devoxx france 2014 compteurs de perfDevoxx france 2014 compteurs de perf
Devoxx france 2014 compteurs de perf
 

Recently uploaded

call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceanilsa9823
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 

Recently uploaded (20)

call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 

Low latency & mechanical sympathy issues and solutions

  • 1. Low latency & Mechanical Sympathy: Issues and solutions Jean-Philippe BEMPEL @jpbempel Performance Architect http://jpbempel.blogspot.com © ULLINK 2016
  • 2. Low latency order router 2 • pure Java SE application • FIX protocol / Direct Market Access connectivity (TCP) • transform to an intermediate form (UL Message)
  • 3. Performance target 3 process & route orders in < 100us
  • 4. Low latency 4 • Process under 1ms • No (Disk) I/O involved • Memory is the new disk • Hardware architecture has a major impact on performance • Mechanical Sympathy
  • 5. Agenda 5 • GC pauses • Power Management • NUMA • Cache pollution
  • 8. • Minor GC & Full GC are Stop the World (STW) => All app threads are stopped • Minor GC pauses: 30-100ms • Full GC pauses: couple of seconds to minutes GC pauses 8
  • 9. • None during trading hours • Outside of trading hours, maintenance time allowed • Java heap sized accordingly to avoid Full GC Full GC 9
  • 10. Some tricks to reduce both • frequency • pause time Minor GC 10
  • 11. • Increase Young Gen (512MB) • But increasing YG can increase pause time • Increase likelihood of objects die Minor GC frequency 11
  • 12. pause time factors: • Refs traversing • Card scanning • Object Copy (survivor or tenured) Minor GC pause time 12
  • 13. Card Scanning 13 Young Gen Card table C C D Old Gen 1 2 3 4 GC Roots 1 card = 512 bytes
  • 14. Object Copy 14 • Need to keep orders during app lifetime • Orders are relatively large objects • Orders will be promoted to old generation => copy to survivor spaces and then to old gen • Increase significantly the pause time for minor GC
  • 15. Object Copy 15 • Access pattern: 90% of the time accessed immediately after creation • Weak references: avoid copying orders object • persistence cache to allow reloading orders without too much impact
  • 16. • Allocation should be controlled carefully • Reduce frequency of minor gc occurrence • JDK API is not always friendly with GC/allocation Allocation 16
  • 17. need to rewrite some part: conversion int -> char[] int intToString(int i, char[] buffer, int offset) avoid intermediate/temporary char[] Allocation 17
  • 18. Usage of empty collections & pre-sizing: Collection<String> process() { Collection<String> results = Collections.emptyList(); for (String s : list) { if (results == Collections.<String>emptyList()) results = new ArrayList(list.size()); results.add(s); } return results; } Allocation 18
  • 20. • Promotion allocation use Free Lists (like malloc) • No Compaction of Old Gen CMS 20 Old Gen • Increase minor pause time
  • 21. Full GC fallback unpredictable (fragmentation, promotion failure, …) 176710.366: [GC 176710.366: [ParNew ( promotion failed): 471872K->455002K(471872K), 2.4424250 secs]176712.809: [CMS: 6998032K->5328833K(7864320K), 53.8558640 secs] 7457624K->5328833K(8336192K), [CMS Perm : 199377K->198212K(524288K)], 56.2987730 secs] [Times: user=56.60 sys=0.78, real=56.29 secs] Same problem for G1 CMS 21
  • 22. • We are partners • Best GC algorithm in the world • Our tests show maximum pause time of 2ms • Some overhead compared to HotSpot (8%) • False issue syndrom Zing JVM from Azul? 22
  • 24. • few orders per day, most of the time in idle Why power management matters ? 24
  • 25. • CPU embeds technologies to reduce power usage • Frequency scaling during execution (P states) P0 = nominal frequency • Deep sleep modes during idle phases (C states) C0 = Running, C1 = idle • latency to wakeup cat /sys/devices/system/cpu/cpu0/cpuidle/state3/latency 200 (us) Power management technologies 25
  • 31. • Turbo boost is dynamic frequency scaling • Depends on thermal envelop • Reducing cores reduce thermal envelop => higher freq • Some vendors can fix this higher frequency • Example: intel E5 v3 14 cores => 2 cores, 2.6Ghz => 3.6GHz Cores reduction/Fixed Turbo Boost 31
  • 32. • Some OS drivers are also very aggressive # cat /sys/devices/system/cpu/cpuidle/current_driver intel_idle • Disabled in boot kernel parameters: intel_idle.max_cstate=0 Power management at OS level 32
  • 35. With intel_idle driver, 100 msg/s 35
  • 40. • Non-Uniform Memory Access • Starts from 2 cpus (sockets) • 1 Memory controller per socket (Node) • Local Memory vs remote memory NUMA 40
  • 41. NUMA architecture 41 Socket 0 Socket 1 MC DRAM MC DRAM QPI QPI 65ns 65ns40ns
  • 42. • Avoid remote memory access • Avoid scheduling and allocation on different node • Bind process on one CPU only (numactl) Restrictive but effective Binding on node 0: numactl --cpunodebind=0 --membind=0 java … • BIOS: node interleaving disabled How to deal with NUMA ? 42
  • 44. Cache hierarchy architecture 44 Socket 0 Core0 Core1 L2 L1 L3 L2 L1 DRAM
  • 45. • CPU L3 shared cache across cores • Non critical threads can “pollute” cache • eviction of data required by critical threads • => adds latency to re-access those data • Need to isolate critical threads to non-critical ones Cache pollution on L3 45
  • 46. • OS Scheduling cannot work in this case • Need to pin manually threads on cores • Thread affinity allow us to do this • Identify our critical and non-critical threads in application Thread affinity 46
  • 49. • Other processes can also pollute L3 cache • Need to isolate all processes away from critical threads • The whole CPU need to be dedicated System processes & cache pollution 49
  • 50. • No other processes should be scheduled on the same socket (isolcpus) • But if more than one thread per core need to use cpuset Isolating cores 50 Socket 0 Socket 1 Core0 Core2 Core4 Core6 Core1 Core3 Core5 Core7 Scheduler
  • 51. • Make GC pauses predictable • Tune BIOS/OS for maximum performance • Control NUMA effect by binding on one node • Avoid cache pollution with Thread Affinity and CPU isolation Recommendations 51
  • 52. • Can optimize a process without changing your code • Know your hardware and your OS • This is Mechanical Sympathy Conclusion 52
  • 53. • Mechanical Sympathy blog http://mechanical-sympathy.blogspot.com/ • Mechanical Sympathy forum https://groups.google.com/forum/#!forum/mechanical-sympathy • Java Thread Affinity library https://github.com/peter-lawrey/Java-Thread-Affinity References 53
  • 54. Q&A Jean-Philippe BEMPEL @jpbempel Performance Architect http://jpbempel.blogspot.com