SlideShare a Scribd company logo
When the OS
gets in the way
(and what you can do about it)
Mark Price
@epickrram
LMAX Exchange
When the OS
gets in the way
(and what you can do about it)
Linux
Mark Price
@epickrram
LMAX Exchange
● Linux is an excellent general-purpose OS
● Many target platforms
● Scheduling is actually fairly complicated
● Low-latency is a special use-case
● We need to provide some hints
It’s not the OS’s fault
Why should I care?
Useful in some scenarios
● Low latency applications
● Response times < 1ms
● Compute-intensive workloads
● Long-running jobs
A real-world scenario: LMAX
System
Latency = T1 - T0
Before tuning:
250us / 10+ms
After tuning:
80us / <1ms
(mean / max)
Jitter
● “slight irregular movement, variation, or
unsteadiness, especially in an electrical
signal or electronic device”
● Variation in response time latency
● Long-tail in response time
Dealing with it
● First take care of the low-hanging fruit
○ e.g. Garbage collection (gc-free / Zing)
○ e.g. Slow I/O
● Once response times are < 10ms the fun
begins
● Make sure your code is running!
Measure first
● Need to validate changes are good
● End-to-end tests
● Using realistic load
● Change one thing and observe
● A refresher...
Modern hardware layout
Multi-tasking
● num(tasks) > num(HyperThreads)
● OS must share out hardware resources
● Clever? Dumb? Fast? Slow?
● Fair...
Linux CFS
● Completely Fair Scheduler
● Maintains a task ‘queue’ per HT
● Runs the task with the lowest runtime
● Updates task runtime after execution
● Higher priority implies longer execution time
● Tasks are load-balanced across HTs
An example application ...
Threads
… running on a language runtime
… running on an operating system
Optimise for locality - PCI/memory
Target deployment
How do I start?
● BIOS settings for maximum performance
● That’s a whole other talk...
Start with the metal
● lstopo is a useful tool for looking at hardware
● Provided by the hwloc package
● Displays:
○ HyperThreads
○ Physical cores
○ NUMA nodes
○ PCI locality
Discover what’s available
lstopo
lstopo
HyperThread
Core
Caches
NUMA-local
RAM
● Use isolcpus to reserve cpu resource
● kernel boot parameter
● isolcpus=0-5,10-13
● Use taskset to pin your application to cpus:
● taskset -c 10-13 java …
● Set affinity of hot threads:
● sched_setaffinity(...)
Reserve & use specific resource
Deploy the application
sched_setaffinity() !{isolcpus} taskset
You have no load-balancer
Pile-up
A solution: cpusets
● Create hierarchical sets of reserved
resource
● CPU, memory
● Userland tools: cset (SUSE)
Isolate OS processes
● cset set --set=/system --cpu=6-9
○ create a cpuset with cpus 6-9
○ create it at the path /system
● cset proc --move --from-set=/ --to-set=/system
○ move all processes from / to /system
○ -k => move unbound kernel threads
○ --threads => move child threads
○ --force => erm... force
Run the application
● cset set --cpu=0-5,10-13 --set=/app
● cset proc --exec /app taskset -cp 10-13 java …
○ start a process in the /app cpuset
○ run the program on cpus 10-13
● sched_setaffinity() to pin the hot threads to cpus 1,3,5
Isolated threads
/app
sched_setaffinity()
/system
/app
taskset
No more jitter?
● Sampling tracer
● Static/dynamic trace points
● Very low overhead
● A good starting point for digging deeper
● perf list to view available trace points
● network, file-system, scheduler, etc
perf_events
What’s happening CPU?
● perf record -e "sched:sched_switch" -C 3
○ Sample task switches on CPU 3
● perf report (best for multiple events)
● perf script (best for single events)
Rogue process
java
36049 [003] 3011858.780856: sched:sched_switch: java:
36049 [110] R ==> kworker/3:1:13991 [120]
kworker/3:1
13991 [003] 3011858.780861: sched:sched_switch:
kworker/3:1:13991 [120] S ==> java:36049 [110]
ftrace
● Function tracer
● Static/dynamic trace points
● Higher overhead
● But captures everything
● Can provide function graphs
● trace-cmd is the usable front-end
So what is that kernel thread doing?
● trace-cmd record -P <pid> -p function_graph
○ Trace functions called by process <pid>
● trace-cmd report
○ Display captured trace data
Some things can’t be deferred
kworker/3:1-13991 [003] 3013287.180771: funcgraph_entry: | process_one_work() {
kworker/3:1-13991 [003] 3013287.180772: funcgraph_entry: | cache_reap() {
kworker/3:1-13991 [003] 3013287.180772: funcgraph_entry: 0.137 us | mutex_trylock();
kworker/3:1-13991 [003] 3013287.180772: funcgraph_entry: 0.289 us | drain_array();
kworker/3:1-13991 [003] 3013287.180773: funcgraph_entry: 0.040 us | _cond_resched();
………
………
kworker/3:1-13991 [003] 3013287.180859: funcgraph_exit: +86.735 us | }
+86.735 us
Things to look out for
● cache_reap() - SLAB allocator
● vmstat_update() - kernel stats
● other workqueue events
○ perf record -e “workqueue:*” -C 3
● Interrupts - set affinity in /proc/irq
● Timer ticks - tickless mode
● CPU governor - set to performance
○ /sys/devices/system/cpu/cpuN/cpufreq/scaling_governor
Some numbers
● Inter-thread latency is a good proxy
● 2 busy-spinning threads passing a message
● Time taken between producer & consumer
● Record times over several seconds
● Compare tuned/untuned
Results
== Latency (ns) ==
mean
min
50.00%
90.00%
99.00%
99.90%
99.99%
max
untuned
466
200
464
608
768
992
2432
69632
tuned
216
128
208
288
336
544
1664
69632
tuned vs untuned
tuned vs untuned (log scale)
Results (loaded system)
== Latency (ns) ==
mean
min
50.00%
90.00%
99.00%
99.90%
99.99%
max
untuned
545
144
464
544
736
2944
294913
884739
tuned
332
216
336
352
448
544
704
36864
tuned vs untuned (loaded system)
Summary
● Select threads that need access to CPU
● Isolate CPUs from the OS
● Pin important threads to isolated CPUs
● Don’t forget interrupts
● There will be more things…
● Always test assumptions!
● Run validation tests to ensure tunings are as
expected
Thank you
● lmax.com/blog/staff-blogs/
● epickrram.blogspot.com
● github.com/epickrram/perf-workshop
● @epickrram

More Related Content

What's hot

LAS16-TR04: Using tracing to tune and optimize EAS (English)
LAS16-TR04: Using tracing to tune and optimize EAS (English)LAS16-TR04: Using tracing to tune and optimize EAS (English)
LAS16-TR04: Using tracing to tune and optimize EAS (English)
Linaro
 
Kgdb kdb modesetting
Kgdb kdb modesettingKgdb kdb modesetting
Kgdb kdb modesetting
Kaushal Kumar Gupta
 
Linux monitoring and Troubleshooting for DBA's
Linux monitoring and Troubleshooting for DBA'sLinux monitoring and Troubleshooting for DBA's
Linux monitoring and Troubleshooting for DBA's
Mydbops
 
Interruption Timer Périodique
Interruption Timer PériodiqueInterruption Timer Périodique
Interruption Timer Périodique
Anne Nicolas
 
Le guide de dépannage de la jvm
Le guide de dépannage de la jvmLe guide de dépannage de la jvm
Le guide de dépannage de la jvm
Jean-Philippe BEMPEL
 
Linux kernel debugging
Linux kernel debuggingLinux kernel debugging
Linux kernel debugging
Hao-Ran Liu
 
BKK16-104 sched-freq
BKK16-104 sched-freqBKK16-104 sched-freq
BKK16-104 sched-freq
Linaro
 
Spying on the Linux kernel for fun and profit
Spying on the Linux kernel for fun and profitSpying on the Linux kernel for fun and profit
Spying on the Linux kernel for fun and profit
Andrea Righi
 
Kernel Recipes 2018 - New GPIO interface for linux user space - Bartosz Golas...
Kernel Recipes 2018 - New GPIO interface for linux user space - Bartosz Golas...Kernel Recipes 2018 - New GPIO interface for linux user space - Bartosz Golas...
Kernel Recipes 2018 - New GPIO interface for linux user space - Bartosz Golas...
Anne Nicolas
 
Linux Troubleshooting
Linux TroubleshootingLinux Troubleshooting
Linux Troubleshooting
Keith Wright
 
Smarter Scheduling
Smarter SchedulingSmarter Scheduling
Smarter Scheduling
David Evans
 
Kernel Recipes 2018 - KernelShark 1.0; What's new and what's coming - Steven ...
Kernel Recipes 2018 - KernelShark 1.0; What's new and what's coming - Steven ...Kernel Recipes 2018 - KernelShark 1.0; What's new and what's coming - Steven ...
Kernel Recipes 2018 - KernelShark 1.0; What's new and what's coming - Steven ...
Anne Nicolas
 
Ganglia Monitoring Tool
Ganglia Monitoring ToolGanglia Monitoring Tool
Ganglia Monitoring Tool
sudhirpg
 
Am I reading GC logs Correctly?
Am I reading GC logs Correctly?Am I reading GC logs Correctly?
Am I reading GC logs Correctly?
Tier1 App
 
Embedded Recipes 2018 - Finding sources of Latency In your system - Steven Ro...
Embedded Recipes 2018 - Finding sources of Latency In your system - Steven Ro...Embedded Recipes 2018 - Finding sources of Latency In your system - Steven Ro...
Embedded Recipes 2018 - Finding sources of Latency In your system - Steven Ro...
Anne Nicolas
 
Metrics with Ganglia
Metrics with GangliaMetrics with Ganglia
Metrics with Ganglia
Gareth Rushgrove
 
OSSNA 2017 Performance Analysis Superpowers with Linux BPF
OSSNA 2017 Performance Analysis Superpowers with Linux BPFOSSNA 2017 Performance Analysis Superpowers with Linux BPF
OSSNA 2017 Performance Analysis Superpowers with Linux BPF
Brendan Gregg
 

What's hot (20)

LAS16-TR04: Using tracing to tune and optimize EAS (English)
LAS16-TR04: Using tracing to tune and optimize EAS (English)LAS16-TR04: Using tracing to tune and optimize EAS (English)
LAS16-TR04: Using tracing to tune and optimize EAS (English)
 
Kgdb kdb modesetting
Kgdb kdb modesettingKgdb kdb modesetting
Kgdb kdb modesetting
 
Linux monitoring and Troubleshooting for DBA's
Linux monitoring and Troubleshooting for DBA'sLinux monitoring and Troubleshooting for DBA's
Linux monitoring and Troubleshooting for DBA's
 
Interruption Timer Périodique
Interruption Timer PériodiqueInterruption Timer Périodique
Interruption Timer Périodique
 
Le guide de dépannage de la jvm
Le guide de dépannage de la jvmLe guide de dépannage de la jvm
Le guide de dépannage de la jvm
 
RCU
RCURCU
RCU
 
Linux kernel debugging
Linux kernel debuggingLinux kernel debugging
Linux kernel debugging
 
BKK16-104 sched-freq
BKK16-104 sched-freqBKK16-104 sched-freq
BKK16-104 sched-freq
 
Spying on the Linux kernel for fun and profit
Spying on the Linux kernel for fun and profitSpying on the Linux kernel for fun and profit
Spying on the Linux kernel for fun and profit
 
Kernel Recipes 2018 - New GPIO interface for linux user space - Bartosz Golas...
Kernel Recipes 2018 - New GPIO interface for linux user space - Bartosz Golas...Kernel Recipes 2018 - New GPIO interface for linux user space - Bartosz Golas...
Kernel Recipes 2018 - New GPIO interface for linux user space - Bartosz Golas...
 
Linux Troubleshooting
Linux TroubleshootingLinux Troubleshooting
Linux Troubleshooting
 
Process scheduling linux
Process scheduling linuxProcess scheduling linux
Process scheduling linux
 
Smarter Scheduling
Smarter SchedulingSmarter Scheduling
Smarter Scheduling
 
Kernel Recipes 2018 - KernelShark 1.0; What's new and what's coming - Steven ...
Kernel Recipes 2018 - KernelShark 1.0; What's new and what's coming - Steven ...Kernel Recipes 2018 - KernelShark 1.0; What's new and what's coming - Steven ...
Kernel Recipes 2018 - KernelShark 1.0; What's new and what's coming - Steven ...
 
Ganglia Monitoring Tool
Ganglia Monitoring ToolGanglia Monitoring Tool
Ganglia Monitoring Tool
 
Am I reading GC logs Correctly?
Am I reading GC logs Correctly?Am I reading GC logs Correctly?
Am I reading GC logs Correctly?
 
Embedded Recipes 2018 - Finding sources of Latency In your system - Steven Ro...
Embedded Recipes 2018 - Finding sources of Latency In your system - Steven Ro...Embedded Recipes 2018 - Finding sources of Latency In your system - Steven Ro...
Embedded Recipes 2018 - Finding sources of Latency In your system - Steven Ro...
 
Metrics with Ganglia
Metrics with GangliaMetrics with Ganglia
Metrics with Ganglia
 
OSSNA 2017 Performance Analysis Superpowers with Linux BPF
OSSNA 2017 Performance Analysis Superpowers with Linux BPFOSSNA 2017 Performance Analysis Superpowers with Linux BPF
OSSNA 2017 Performance Analysis Superpowers with Linux BPF
 
test1_361
test1_361test1_361
test1_361
 

Viewers also liked

Performance: Observe and Tune
Performance: Observe and TunePerformance: Observe and Tune
Performance: Observe and Tune
Paul V. Novarese
 
FPGA Applications in Finance
FPGA Applications in FinanceFPGA Applications in Finance
FPGA Applications in Finance
zpektral
 
TMPA-2015: FPGA-Based Low Latency Sponsored Access
TMPA-2015: FPGA-Based Low Latency Sponsored AccessTMPA-2015: FPGA-Based Low Latency Sponsored Access
TMPA-2015: FPGA-Based Low Latency Sponsored Access
Iosif Itkin
 
Writing and testing high frequency trading engines in java
Writing and testing high frequency trading engines in javaWriting and testing high frequency trading engines in java
Writing and testing high frequency trading engines in java
Peter Lawrey
 
Extent3 turquoise equity_trading_2012
Extent3 turquoise equity_trading_2012Extent3 turquoise equity_trading_2012
Extent3 turquoise equity_trading_2012extentconf Tsoy
 
Extent3 exactpro testing_of_hft_gui
Extent3 exactpro testing_of_hft_guiExtent3 exactpro testing_of_hft_gui
Extent3 exactpro testing_of_hft_guiextentconf Tsoy
 
Work4 23
Work4 23Work4 23
Premiare i dipendenti: i nostri 5 suggerimenti
Premiare i dipendenti: i nostri 5 suggerimentiPremiare i dipendenti: i nostri 5 suggerimenti
Premiare i dipendenti: i nostri 5 suggerimenti
Cezanne HR Italia
 
Come (e perché) le HR devono sviluppare la loro resilienza...
Come (e perché) le HR devono sviluppare la loro resilienza...Come (e perché) le HR devono sviluppare la loro resilienza...
Come (e perché) le HR devono sviluppare la loro resilienza...
Cezanne HR Italia
 
Esclarecer o-habitus
Esclarecer o-habitusEsclarecer o-habitus
Esclarecer o-habitus
Miguel Cohapar
 
Levchenko Andrey
Levchenko AndreyLevchenko Andrey
Levchenko Andrey
Ekaterina Vashurina
 
Presupuesto publico
Presupuesto publicoPresupuesto publico
Presupuesto publico
ajuzcategui
 
C3 new
C3 newC3 new
Three NJ High Schools Roll Out New CTEP Marketing Course to Prepare Students ...
Three NJ High Schools Roll Out New CTEP Marketing Course to Prepare Students ...Three NJ High Schools Roll Out New CTEP Marketing Course to Prepare Students ...
Three NJ High Schools Roll Out New CTEP Marketing Course to Prepare Students ...
legalsinger203
 

Viewers also liked (19)

Performance: Observe and Tune
Performance: Observe and TunePerformance: Observe and Tune
Performance: Observe and Tune
 
Tuned
TunedTuned
Tuned
 
FPGA Applications in Finance
FPGA Applications in FinanceFPGA Applications in Finance
FPGA Applications in Finance
 
TMPA-2015: FPGA-Based Low Latency Sponsored Access
TMPA-2015: FPGA-Based Low Latency Sponsored AccessTMPA-2015: FPGA-Based Low Latency Sponsored Access
TMPA-2015: FPGA-Based Low Latency Sponsored Access
 
Writing and testing high frequency trading engines in java
Writing and testing high frequency trading engines in javaWriting and testing high frequency trading engines in java
Writing and testing high frequency trading engines in java
 
Extent3 turquoise equity_trading_2012
Extent3 turquoise equity_trading_2012Extent3 turquoise equity_trading_2012
Extent3 turquoise equity_trading_2012
 
Extent3 exactpro testing_of_hft_gui
Extent3 exactpro testing_of_hft_guiExtent3 exactpro testing_of_hft_gui
Extent3 exactpro testing_of_hft_gui
 
Work4 23
Work4 23Work4 23
Work4 23
 
Premiare i dipendenti: i nostri 5 suggerimenti
Premiare i dipendenti: i nostri 5 suggerimentiPremiare i dipendenti: i nostri 5 suggerimenti
Premiare i dipendenti: i nostri 5 suggerimenti
 
Come (e perché) le HR devono sviluppare la loro resilienza...
Come (e perché) le HR devono sviluppare la loro resilienza...Come (e perché) le HR devono sviluppare la loro resilienza...
Come (e perché) le HR devono sviluppare la loro resilienza...
 
Esclarecer o-habitus
Esclarecer o-habitusEsclarecer o-habitus
Esclarecer o-habitus
 
Levchenko Andrey
Levchenko AndreyLevchenko Andrey
Levchenko Andrey
 
great
greatgreat
great
 
Madhusmita pati
Madhusmita patiMadhusmita pati
Madhusmita pati
 
Presupuesto publico
Presupuesto publicoPresupuesto publico
Presupuesto publico
 
C3 new
C3 newC3 new
C3 new
 
Three NJ High Schools Roll Out New CTEP Marketing Course to Prepare Students ...
Three NJ High Schools Roll Out New CTEP Marketing Course to Prepare Students ...Three NJ High Schools Roll Out New CTEP Marketing Course to Prepare Students ...
Three NJ High Schools Roll Out New CTEP Marketing Course to Prepare Students ...
 
Silabus mtk xii
Silabus mtk xiiSilabus mtk xii
Silabus mtk xii
 
Ayesha tanwir ppt
Ayesha tanwir pptAyesha tanwir ppt
Ayesha tanwir ppt
 

Similar to When the OS gets in the way

BUD17-309: IRQ prediction
BUD17-309: IRQ prediction BUD17-309: IRQ prediction
BUD17-309: IRQ prediction
Linaro
 
OS scheduling and The anatomy of a context switch
OS scheduling and The anatomy of a context switchOS scheduling and The anatomy of a context switch
OS scheduling and The anatomy of a context switch
Daniel Ben-Zvi
 
HKG15-409: ARM Hibernation enablement on SoCs - a case study
HKG15-409: ARM Hibernation enablement on SoCs - a case studyHKG15-409: ARM Hibernation enablement on SoCs - a case study
HKG15-409: ARM Hibernation enablement on SoCs - a case study
Linaro
 
UNIT 3 - General Purpose Processors
UNIT 3 - General Purpose ProcessorsUNIT 3 - General Purpose Processors
UNIT 3 - General Purpose Processors
ButtaRajasekhar2
 
Linux Network Stack
Linux Network StackLinux Network Stack
Linux Network Stack
Adrien Mahieux
 
Embedded_ PPT_4-5 unit_Dr Monika-edited.pptx
Embedded_ PPT_4-5 unit_Dr Monika-edited.pptxEmbedded_ PPT_4-5 unit_Dr Monika-edited.pptx
Embedded_ PPT_4-5 unit_Dr Monika-edited.pptx
ProfMonikaJain
 
Hardware Assisted Latency Investigations
Hardware Assisted Latency InvestigationsHardware Assisted Latency Investigations
Hardware Assisted Latency Investigations
ScyllaDB
 
PERFORMANCE_SCHEMA and sys schema
PERFORMANCE_SCHEMA and sys schemaPERFORMANCE_SCHEMA and sys schema
PERFORMANCE_SCHEMA and sys schema
FromDual GmbH
 
Linux Systems Performance 2016
Linux Systems Performance 2016Linux Systems Performance 2016
Linux Systems Performance 2016
Brendan Gregg
 
CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...
CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...
CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...
PROIDEA
 
Performance Analysis: new tools and concepts from the cloud
Performance Analysis: new tools and concepts from the cloudPerformance Analysis: new tools and concepts from the cloud
Performance Analysis: new tools and concepts from the cloud
Brendan Gregg
 
My First 100 days with an Exadata (PPT)
My First 100 days with an Exadata (PPT)My First 100 days with an Exadata (PPT)
My First 100 days with an Exadata (PPT)
Gustavo Rene Antunez
 
Analyzing OS X Systems Performance with the USE Method
Analyzing OS X Systems Performance with the USE MethodAnalyzing OS X Systems Performance with the USE Method
Analyzing OS X Systems Performance with the USE Method
Brendan Gregg
 
Distributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflowDistributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflow
Emanuel Di Nardo
 
Lec 12-15 mips instruction set processor
Lec 12-15 mips instruction set processorLec 12-15 mips instruction set processor
Lec 12-15 mips instruction set processor
Mayank Roy
 
RTOS implementation
RTOS implementationRTOS implementation
RTOS implementation
Rajan Kumar
 
VMworld 2016: vSphere 6.x Host Resource Deep Dive
VMworld 2016: vSphere 6.x Host Resource Deep DiveVMworld 2016: vSphere 6.x Host Resource Deep Dive
VMworld 2016: vSphere 6.x Host Resource Deep Dive
VMworld
 
Kernel Recipes 2019 - RCU in 2019 - Joel Fernandes
Kernel Recipes 2019 - RCU in 2019 - Joel FernandesKernel Recipes 2019 - RCU in 2019 - Joel Fernandes
Kernel Recipes 2019 - RCU in 2019 - Joel Fernandes
Anne Nicolas
 
Operating Systems: Revision
Operating Systems: RevisionOperating Systems: Revision
Operating Systems: Revision
Damian T. Gordon
 

Similar to When the OS gets in the way (20)

BUD17-309: IRQ prediction
BUD17-309: IRQ prediction BUD17-309: IRQ prediction
BUD17-309: IRQ prediction
 
Optimizing Linux Servers
Optimizing Linux ServersOptimizing Linux Servers
Optimizing Linux Servers
 
OS scheduling and The anatomy of a context switch
OS scheduling and The anatomy of a context switchOS scheduling and The anatomy of a context switch
OS scheduling and The anatomy of a context switch
 
HKG15-409: ARM Hibernation enablement on SoCs - a case study
HKG15-409: ARM Hibernation enablement on SoCs - a case studyHKG15-409: ARM Hibernation enablement on SoCs - a case study
HKG15-409: ARM Hibernation enablement on SoCs - a case study
 
UNIT 3 - General Purpose Processors
UNIT 3 - General Purpose ProcessorsUNIT 3 - General Purpose Processors
UNIT 3 - General Purpose Processors
 
Linux Network Stack
Linux Network StackLinux Network Stack
Linux Network Stack
 
Embedded_ PPT_4-5 unit_Dr Monika-edited.pptx
Embedded_ PPT_4-5 unit_Dr Monika-edited.pptxEmbedded_ PPT_4-5 unit_Dr Monika-edited.pptx
Embedded_ PPT_4-5 unit_Dr Monika-edited.pptx
 
Hardware Assisted Latency Investigations
Hardware Assisted Latency InvestigationsHardware Assisted Latency Investigations
Hardware Assisted Latency Investigations
 
PERFORMANCE_SCHEMA and sys schema
PERFORMANCE_SCHEMA and sys schemaPERFORMANCE_SCHEMA and sys schema
PERFORMANCE_SCHEMA and sys schema
 
Linux Systems Performance 2016
Linux Systems Performance 2016Linux Systems Performance 2016
Linux Systems Performance 2016
 
CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...
CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...
CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...
 
Performance Analysis: new tools and concepts from the cloud
Performance Analysis: new tools and concepts from the cloudPerformance Analysis: new tools and concepts from the cloud
Performance Analysis: new tools and concepts from the cloud
 
My First 100 days with an Exadata (PPT)
My First 100 days with an Exadata (PPT)My First 100 days with an Exadata (PPT)
My First 100 days with an Exadata (PPT)
 
Analyzing OS X Systems Performance with the USE Method
Analyzing OS X Systems Performance with the USE MethodAnalyzing OS X Systems Performance with the USE Method
Analyzing OS X Systems Performance with the USE Method
 
Distributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflowDistributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflow
 
Lec 12-15 mips instruction set processor
Lec 12-15 mips instruction set processorLec 12-15 mips instruction set processor
Lec 12-15 mips instruction set processor
 
RTOS implementation
RTOS implementationRTOS implementation
RTOS implementation
 
VMworld 2016: vSphere 6.x Host Resource Deep Dive
VMworld 2016: vSphere 6.x Host Resource Deep DiveVMworld 2016: vSphere 6.x Host Resource Deep Dive
VMworld 2016: vSphere 6.x Host Resource Deep Dive
 
Kernel Recipes 2019 - RCU in 2019 - Joel Fernandes
Kernel Recipes 2019 - RCU in 2019 - Joel FernandesKernel Recipes 2019 - RCU in 2019 - Joel Fernandes
Kernel Recipes 2019 - RCU in 2019 - Joel Fernandes
 
Operating Systems: Revision
Operating Systems: RevisionOperating Systems: Revision
Operating Systems: Revision
 

Recently uploaded

Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 

Recently uploaded (20)

Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 

When the OS gets in the way

  • 1. When the OS gets in the way (and what you can do about it) Mark Price @epickrram LMAX Exchange
  • 2. When the OS gets in the way (and what you can do about it) Linux Mark Price @epickrram LMAX Exchange
  • 3. ● Linux is an excellent general-purpose OS ● Many target platforms ● Scheduling is actually fairly complicated ● Low-latency is a special use-case ● We need to provide some hints It’s not the OS’s fault
  • 4. Why should I care?
  • 5. Useful in some scenarios ● Low latency applications ● Response times < 1ms ● Compute-intensive workloads ● Long-running jobs
  • 6. A real-world scenario: LMAX System Latency = T1 - T0 Before tuning: 250us / 10+ms After tuning: 80us / <1ms (mean / max)
  • 7. Jitter ● “slight irregular movement, variation, or unsteadiness, especially in an electrical signal or electronic device” ● Variation in response time latency ● Long-tail in response time
  • 8. Dealing with it ● First take care of the low-hanging fruit ○ e.g. Garbage collection (gc-free / Zing) ○ e.g. Slow I/O ● Once response times are < 10ms the fun begins ● Make sure your code is running!
  • 9. Measure first ● Need to validate changes are good ● End-to-end tests ● Using realistic load ● Change one thing and observe ● A refresher...
  • 11. Multi-tasking ● num(tasks) > num(HyperThreads) ● OS must share out hardware resources ● Clever? Dumb? Fast? Slow? ● Fair...
  • 12. Linux CFS ● Completely Fair Scheduler ● Maintains a task ‘queue’ per HT ● Runs the task with the lowest runtime ● Updates task runtime after execution ● Higher priority implies longer execution time ● Tasks are load-balanced across HTs
  • 13. An example application ... Threads
  • 14. … running on a language runtime
  • 15. … running on an operating system
  • 16. Optimise for locality - PCI/memory
  • 18. How do I start?
  • 19. ● BIOS settings for maximum performance ● That’s a whole other talk... Start with the metal
  • 20. ● lstopo is a useful tool for looking at hardware ● Provided by the hwloc package ● Displays: ○ HyperThreads ○ Physical cores ○ NUMA nodes ○ PCI locality Discover what’s available
  • 23. ● Use isolcpus to reserve cpu resource ● kernel boot parameter ● isolcpus=0-5,10-13 ● Use taskset to pin your application to cpus: ● taskset -c 10-13 java … ● Set affinity of hot threads: ● sched_setaffinity(...) Reserve & use specific resource
  • 25. You have no load-balancer Pile-up
  • 26. A solution: cpusets ● Create hierarchical sets of reserved resource ● CPU, memory ● Userland tools: cset (SUSE)
  • 27. Isolate OS processes ● cset set --set=/system --cpu=6-9 ○ create a cpuset with cpus 6-9 ○ create it at the path /system ● cset proc --move --from-set=/ --to-set=/system ○ move all processes from / to /system ○ -k => move unbound kernel threads ○ --threads => move child threads ○ --force => erm... force
  • 28. Run the application ● cset set --cpu=0-5,10-13 --set=/app ● cset proc --exec /app taskset -cp 10-13 java … ○ start a process in the /app cpuset ○ run the program on cpus 10-13 ● sched_setaffinity() to pin the hot threads to cpus 1,3,5
  • 31. ● Sampling tracer ● Static/dynamic trace points ● Very low overhead ● A good starting point for digging deeper ● perf list to view available trace points ● network, file-system, scheduler, etc perf_events
  • 32. What’s happening CPU? ● perf record -e "sched:sched_switch" -C 3 ○ Sample task switches on CPU 3 ● perf report (best for multiple events) ● perf script (best for single events)
  • 33. Rogue process java 36049 [003] 3011858.780856: sched:sched_switch: java: 36049 [110] R ==> kworker/3:1:13991 [120] kworker/3:1 13991 [003] 3011858.780861: sched:sched_switch: kworker/3:1:13991 [120] S ==> java:36049 [110]
  • 34. ftrace ● Function tracer ● Static/dynamic trace points ● Higher overhead ● But captures everything ● Can provide function graphs ● trace-cmd is the usable front-end
  • 35. So what is that kernel thread doing? ● trace-cmd record -P <pid> -p function_graph ○ Trace functions called by process <pid> ● trace-cmd report ○ Display captured trace data
  • 36. Some things can’t be deferred kworker/3:1-13991 [003] 3013287.180771: funcgraph_entry: | process_one_work() { kworker/3:1-13991 [003] 3013287.180772: funcgraph_entry: | cache_reap() { kworker/3:1-13991 [003] 3013287.180772: funcgraph_entry: 0.137 us | mutex_trylock(); kworker/3:1-13991 [003] 3013287.180772: funcgraph_entry: 0.289 us | drain_array(); kworker/3:1-13991 [003] 3013287.180773: funcgraph_entry: 0.040 us | _cond_resched(); ……… ……… kworker/3:1-13991 [003] 3013287.180859: funcgraph_exit: +86.735 us | } +86.735 us
  • 37. Things to look out for ● cache_reap() - SLAB allocator ● vmstat_update() - kernel stats ● other workqueue events ○ perf record -e “workqueue:*” -C 3 ● Interrupts - set affinity in /proc/irq ● Timer ticks - tickless mode ● CPU governor - set to performance ○ /sys/devices/system/cpu/cpuN/cpufreq/scaling_governor
  • 38. Some numbers ● Inter-thread latency is a good proxy ● 2 busy-spinning threads passing a message ● Time taken between producer & consumer ● Record times over several seconds ● Compare tuned/untuned
  • 39. Results == Latency (ns) == mean min 50.00% 90.00% 99.00% 99.90% 99.99% max untuned 466 200 464 608 768 992 2432 69632 tuned 216 128 208 288 336 544 1664 69632
  • 41. tuned vs untuned (log scale)
  • 42. Results (loaded system) == Latency (ns) == mean min 50.00% 90.00% 99.00% 99.90% 99.99% max untuned 545 144 464 544 736 2944 294913 884739 tuned 332 216 336 352 448 544 704 36864
  • 43. tuned vs untuned (loaded system)
  • 44. Summary ● Select threads that need access to CPU ● Isolate CPUs from the OS ● Pin important threads to isolated CPUs ● Don’t forget interrupts ● There will be more things… ● Always test assumptions! ● Run validation tests to ensure tunings are as expected
  • 45. Thank you ● lmax.com/blog/staff-blogs/ ● epickrram.blogspot.com ● github.com/epickrram/perf-workshop ● @epickrram