
Embedded Recipes 2019 - RT is about to make it to mainline. Now what?


The PREEMPT_RT patch (aka the real-time patch) started back in 2004. Since then, much of it has made it into the mainline kernel. The only major part left is the conversion of spinning locks into mutexes; once that lands, the merge of the PREEMPT_RT patch will be complete. This is expected to happen no later than Q1 of 2020. But once it is in, how can you take advantage of it? This talk discusses the basics of writing a real-time application, and some tricks to play with Linux configured with PREEMPT_RT.

Steven Rostedt


  1. RT is about to make it into mainline! Now what?
     Steven Rostedt, Open Source Engineer
     rostedt@goodmis.org / srostedt@vmware.com
     ©2019 VMware, Inc.
  2. PREEMPT_RT (aka The RT Patch)
     ● Started in 2004
     ● Responsible for (in current mainline): mutexes, lockdep, ftrace, high resolution timers, NO_HZ, NO_HZ_FULL, SCHED_DEADLINE, generic interrupts, interrupt threads, the real-time scheduler
  3. Goal of PREEMPT_RT
     ● Making Linux into a hard real-time designed system
       – Too complex to prove it is a true hard real-time system; modeling work is helping us in this regard (see Daniel Bristot de Oliveira's Kernel Recipes talk)
     ● All critical sections have a max execution time
       – Knowing what that time is, I'll leave as an academic exercise for you
     ● No unbounded latency
       – There will always be a bounded latency, which allows the system to be deterministic
  4. When will PREEMPT_RT (The Realtime Patch) be merged?
  5. When will PREEMPT_RT be merged? https://lwn.net/Articles/263129/
  6. When will PREEMPT_RT be merged? https://lwn.net/Articles/311258/
  7. When will PREEMPT_RT be merged? https://lwn.net/Articles/313615/
  8. When will PREEMPT_RT be merged? https://lwn.net/Articles/367638/
  9. When will PREEMPT_RT be merged? https://lwn.net/Articles/368120/
  10. When will PREEMPT_RT (The Realtime Patch) be merged?
      commit a50a3f4b6a313dc76912bd4ad3b8b4f4b479c801
      Author: Thomas Gleixner <tglx@linutronix.de>
      Date:   Wed Jul 17 22:01:49 2019 +0200

          sched/rt, Kconfig: Introduce CONFIG_PREEMPT_RT

          Add a new entry to the preemption menu which enables the real-time
          support for the kernel. The choice is only enabled when an
          architecture supports it. It selects PREEMPT as the RT features
          depend on it. To achieve that the existing PREEMPT choice is renamed
          to PREEMPT_LL which select PREEMPT as well. No functional change.

          Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
          Acked-by: Paul E. McKenney <paulmck@linux.ibm.com>
          Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
          Acked-by: Clark Williams <williams@redhat.com>
          Acked-by: Daniel Bristot de Oliveira <bristot@redhat.com>
          Acked-by: Frederic Weisbecker <frederic@kernel.org>
          Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
          Acked-by: Marc Zyngier <marc.zyngier@arm.com>
          Acked-by: Daniel Wagner <wagi@monom.org>
          Acked-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
          Acked-by: Julia Cartwright <julia@ni.com>
          Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com>
          Acked-by: Gratian Crisan <gratian.crisan@ni.com>
          Acked-by: Sebastian Siewior <bigeasy@linutronix.de>
          Cc: Andrew Morton <akpm@linuxfoundation.org>
          Cc: Christoph Hellwig <hch@lst.de>
          Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
          Cc: Linus Torvalds <torvalds@linux-foundation.org>
          Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
          Cc: Mike Galbraith <efault@gmx.de>
          Cc: Tejun Heo <tj@kernel.org>
          Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1907172200190.1778@nanos.tec.linutronix.de
          Signed-off-by: Ingo Molnar <mingo@kernel.org>
  12. Huh?
  13. When will PREEMPT_RT (The Realtime Patch) be merged?
      +config PREEMPT_RT
      +	bool "Fully Preemptible Kernel (Real-Time)"
      +	depends on EXPERT && ARCH_SUPPORTS_RT
      +	select PREEMPT
      +	help
      +	  This option turns the kernel into a real-time kernel by replacing
      +	  various locking primitives (spinlocks, rwlocks, etc.) with
      +	  preemptible priority-inheritance aware variants, enforcing
      +	  interrupt threading and introducing mechanisms to break up long
      +	  non-preemptible sections. This makes the kernel, except for very
      +	  low level and critical code pathes (entry code, scheduler, low
      +	  level interrupt handling) fully preemptible and brings most
      +	  execution contexts under scheduler control.
      +
      +	  Select this if you are building a kernel for systems which
      +	  require real-time guarantees.
       endchoice
  15. It's almost there (2020 should have it?). Now what?
      ● What to do with the PREEMPT_RT kernel
      ● First, need to get the configs right
  16. Select CONFIG_PREEMPT_RT
      In the -rt patch it is still CONFIG_PREEMPT_RT_FULL
  17. Select CONFIG_NO_HZ_FULL (optional)
      Plan on running an RT task in polling mode? Adds performance overhead.
  18. Select CONFIG_HIGH_RES_TIMERS
      You do care about precision, right?
  19. Select CONFIG_HWLAT_TRACER
      Test your system for SMIs and such
  20. Select CONFIG_SCHED_TRACER
      You may want to see scheduling latency (it doesn't hurt)
  21. Preempt / IRQs Off Latency Tracers
      Good to have, but they add significant performance overhead!
  22. CONFIG_HIST_TRIGGERS
      Can be used to define your own latency measurements
  23. Basic RT application coding
      ● mlockall(): avoid unexpected latency due to page faults
      ● CPU affinity: avoid unexpected latency due to migration
        – The more you control, the more deterministic it is
        – Pin tasks to CPUs when possible (keep variables to a minimum)
      ● Priority inheritance mutex
        – pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT)
  24. Priority Inversion
      (diagram: tasks A, B, and C; the high-priority task A is blocked on a lock while lower-priority tasks are preempted)
  25. Priority Inversion prevented with Priority Inheritance
      (diagram: the lock holder is woken and boosted, releases the lock, and goes back to sleep, unblocking the high-priority task)
  26. Setting up your machine
      ● Configure your system: know what you need to use
      ● What interrupts are important (what devices does your RT task use?)
      ● RT gives you the power to destroy yourself
  27. Threaded Interrupts
      # ps aux | grep irq
      root     9  1.6  0.0  0  0  ?  S  17:37  0:02  [ksoftirqd/0]
      root    21  1.9  0.0  0  0  ?  S  17:37  0:03  [ksoftirqd/1]
      root    28  2.2  0.0  0  0  ?  S  17:37  0:03  [ksoftirqd/2]
      root    35  1.3  0.0  0  0  ?  S  17:37  0:02  [ksoftirqd/3]
      root   148  0.0  0.0  0  0  ?  S  17:37  0:00  [irq/9-acpi]
      root   163  0.0  0.0  0  0  ?  S  17:37  0:00  [irq/14-ata_piix]
      root   164  0.0  0.0  0  0  ?  S  17:37  0:00  [irq/15-ata_piix]
      root   170  0.0  0.0  0  0  ?  S  17:37  0:00  [irq/10-ehci_hcd]
      root   171  0.0  0.0  0  0  ?  S  17:37  0:00  [irq/10-uhci_hcd]
      root   172  0.0  0.0  0  0  ?  S  17:37  0:00  [irq/11-uhci_hcd]
      root   173  0.0  0.0  0  0  ?  S  17:38  0:00  [irq/11-uhci_hcd]
      root   175  0.0  0.0  0  0  ?  S  17:38  0:00  [irq/12-i8042]
      root   176  0.0  0.0  0  0  ?  S  17:38  0:00  [irq/1-i8042]
      root   178  0.0  0.0  0  0  ?  S  17:38  0:00  [irq/8-rtc0]
      root   422  0.0  0.0  0  0  ?  S  17:38  0:00  [irq/11-qxl]
      root   429  0.0  0.0  0  0  ?  S  17:38  0:00  [irq/24-virtio0-]
      root   430  0.0  0.0  0  0  ?  S  17:38  0:00  [irq/25-virtio0-]
      root   431  0.0  0.0  0  0  ?  S  17:38  0:00  [irq/26-virtio1-]
      root   432  0.2  0.0  0  0  ?  S  17:38  0:00  [irq/27-virtio1-]
      root   673  0.0  0.0  0  0  ?  R  17:38  0:00  [irq/10-virtio2]
      root   753  0.0  0.0  0  0  ?  S  17:38  0:00  [irq/28-snd_hda_]
      root   877  0.0  0.0  0  0  ?  S  17:38  0:00  [irq/10-ens9]
      root  1267  0.0  0.0  0  0  ?  S  17:39  0:00  [irq/4-ttyS0]
  30. Threaded Interrupts
      # ps aux | grep irq | awk '{print $2}' | while read a ; do chrt -p $a; done
      pid 9's current scheduling policy: SCHED_OTHER
      pid 9's current scheduling priority: 0
      pid 21's current scheduling policy: SCHED_OTHER
      pid 21's current scheduling priority: 0
      pid 28's current scheduling policy: SCHED_OTHER
      pid 28's current scheduling priority: 0
      pid 35's current scheduling policy: SCHED_OTHER
      pid 35's current scheduling priority: 0
      pid 148's current scheduling policy: SCHED_FIFO
      pid 148's current scheduling priority: 50
      pid 163's current scheduling policy: SCHED_FIFO
      pid 163's current scheduling priority: 50
      pid 164's current scheduling policy: SCHED_FIFO
      pid 164's current scheduling priority: 50
      pid 170's current scheduling policy: SCHED_FIFO
      pid 170's current scheduling priority: 50
      pid 171's current scheduling policy: SCHED_FIFO
      pid 171's current scheduling priority: 50
      pid 172's current scheduling policy: SCHED_FIFO
      pid 172's current scheduling priority: 50
      pid 173's current scheduling policy: SCHED_FIFO
      pid 173's current scheduling priority: 50
      pid 175's current scheduling policy: SCHED_FIFO
  33. Threaded Interrupts
      # ps aux | grep irq | awk '{print $2}' | while read a; do taskset -p $a; done
      pid 9's current affinity mask: 1
      pid 21's current affinity mask: 2
      pid 28's current affinity mask: 4
      pid 35's current affinity mask: 8
      pid 148's current affinity mask: 2
      pid 163's current affinity mask: 4
      pid 164's current affinity mask: 8
      pid 170's current affinity mask: 1
      pid 171's current affinity mask: 1
      pid 172's current affinity mask: 2
      pid 173's current affinity mask: 2
      pid 175's current affinity mask: 4
      pid 176's current affinity mask: 8
      pid 178's current affinity mask: 1
      pid 422's current affinity mask: 2
      pid 429's current affinity mask: 4
      pid 430's current affinity mask: 8
      pid 431's current affinity mask: 1
      pid 432's current affinity mask: 8
      pid 673's current affinity mask: 1
      pid 753's current affinity mask: 2
      pid 877's current affinity mask: 1
      pid 1267's current affinity mask: 2
  34. Threaded Interrupts
      # cat /proc/irq/*/smp_affinity
      f f f f f f f f f f f f f f f f f f f f f
  35. Threaded Interrupts
      # ls /proc/irq/*/smp_affinity | while read a ; do echo 1 > $a ; done
      # cat /proc/irq/*/smp_affinity
      f 1 1 1 1 1 1 1 1 1 1 f 1 f 1 1 1 1 1 1 1
  36. Threaded Interrupts
      # ps aux | grep irq | awk '{print $2}' | while read a; do taskset -p $a; done
      pid 9's current affinity mask: 1
      pid 21's current affinity mask: 2
      pid 28's current affinity mask: 4
      pid 35's current affinity mask: 8
      pid 148's current affinity mask: 2
      pid 163's current affinity mask: 4
      pid 164's current affinity mask: 8
      pid 170's current affinity mask: 1
      pid 171's current affinity mask: 1
      pid 172's current affinity mask: 1
      pid 173's current affinity mask: 1
      pid 175's current affinity mask: 4
      pid 176's current affinity mask: 8
      pid 178's current affinity mask: 1
      pid 422's current affinity mask: 1
      pid 429's current affinity mask: 4
      pid 430's current affinity mask: 8
      pid 431's current affinity mask: 1
      pid 432's current affinity mask: 8
      pid 673's current affinity mask: 1
      pid 753's current affinity mask: 2
      pid 877's current affinity mask: 1
      pid 1267's current affinity mask: 2
  37. Threaded Interrupts
      # ls /proc/irq/*/smp_affinity | while read a ; do echo 2 > $a ; done
      # cat /proc/irq/*/smp_affinity
      f 2 2 2 2 2 2 2 2 2 2 f 2 f 2 2 2 2 2 2 2
  38. Threaded Interrupts
      # ps aux | grep irq | awk '{print $2}' | while read a; do taskset -p $a; done
      pid 9's current affinity mask: 1
      pid 21's current affinity mask: 2
      pid 28's current affinity mask: 4
      pid 35's current affinity mask: 8
      pid 148's current affinity mask: 2
      pid 163's current affinity mask: 4
      pid 164's current affinity mask: 8
      pid 170's current affinity mask: 2
      pid 171's current affinity mask: 2
      pid 172's current affinity mask: 2
      pid 173's current affinity mask: 2
      pid 175's current affinity mask: 4
      pid 176's current affinity mask: 8
      pid 178's current affinity mask: 1
      pid 422's current affinity mask: 2
      pid 429's current affinity mask: 4
      pid 430's current affinity mask: 8
      pid 431's current affinity mask: 1
      pid 432's current affinity mask: 8
      pid 673's current affinity mask: 2
      pid 753's current affinity mask: 2
      pid 877's current affinity mask: 2
      pid 1267's current affinity mask: 2
  41. Check your system
      ● Real-time is more than just a kernel: the hardware must not induce latency
        – Your OS is only as good as the hardware it runs on
      ● Need to worry about System Management Interrupts (SMIs)
        – Triggered by the BIOS
        – The OS (Linux) has no control over them
        – Can be used to monitor thermal (fans) and memory (ECC)
      ● Use the hwlat tracer to detect this
  42. hwlat tracer
      # trace-cmd start -p hwlat
      # trace-cmd show
      # tracer: hwlat
      #
      # entries-in-buffer/entries-written: 39/39   #P:4
      #
      #                                _-----=> irqs-off
      #                               / _----=> need-resched
      #                              | / _----=> need-resched-lazy
      #                              || / _---=> hardirq/softirq
      #                              ||| / _--=> preempt-depth
      #                              ||||/     delay
      #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
      #              | |         |   |||||     |         |
          <...>-1981  [001] d.L....  17454.195055: #1  inner/outer(us):  631/321  ts:1569119307.978708067
          <...>-1981  [002] d.L....  17455.314708: #2  inner/outer(us):  311/244  ts:1569119308.988240898
          <...>-1981  [003] d.L....  17456.220180: #3  inner/outer(us):  699/672  ts:1569119309.996060780
          <...>-1981  [000] d......  17457.264116: #4  inner/outer(us):  165/219  ts:1569119311.003999819
          <...>-1981  [001] d......  17458.344601: #5  inner/outer(us):  32/44  ts:1569119312.012000302
          <...>-1981  [002] d.L....  17459.239703: #6  inner/outer(us):  326/496  ts:1569119313.020149009
          <...>-1981  [000] d......  17461.156198: #8  inner/outer(us):  117/548  ts:1569119315.036214857
          <...>-1981  [001] d......  17462.344629: #9  inner/outer(us):  908/82  ts:1569119316.044070049
          <...>-1981  [003] d......  17463.909470: #11  inner/outer(us):  172/860  ts:1569119318.060040701
          <...>-1981  [000] d......  17465.196280: #12  inner/outer(us):  876/783  ts:1569119319.068065360
          <...>-1981  [001] d......  17466.118113: #13  inner/outer(us):  29/25  ts:1569119320.076075661
          <...>-1981  [002] d......  17466.933660: #14  inner/outer(us):  251/29  ts:1569119321.084216130
          <...>-1981  [003] d......  17467.941405: #15  inner/outer(us):  177/632  ts:1569119322.091966450
          <...>-1981  [000] d......  17469.196170: #16  inner/outer(us):  656/711  ts:1569119323.099677972
          <...>-1981  [001] d......  17470.158097: #17  inner/outer(us):  162/29  ts:1569119324.107980626
          <...>-1981  [002] d......  17470.965666: #18  inner/outer(us):  211/975  ts:1569119325.116225970
          <...>-1981  [003] d.L....  17472.334110: #19  inner/outer(us):  656/845  ts:1569119326.124090524
          <...>-1981  [000] d......  17473.196171: #20  inner/outer(us):  197/64  ts:1569119327.132047281
          <...>-1981  [001] d......  17473.989240: #21  inner/outer(us):  2282/80  ts:1569119328.139772360
          <...>-1981  [002] d......  17475.003012: #22  inner/outer(us):  219/9624  ts:1569119329.153625986
  44. 9624 microsecond latency?
  45. 9624 microsecond latency?
      # cat /proc/cpuinfo | head
      processor    : 0
      vendor_id    : GenuineIntel
      cpu family   : 15
      model        : 6
      model name   : Common KVM processor
      stepping     : 1
      microcode    : 0x1
      cpu MHz      : 1896.000
      cache size   : 16384 KB
      physical id  : 0
  46. NO_HZ_FULL
      ● Good example of seeing how the system is behaving
      ● Command line: isolcpus=2,3 rcu_nocbs=2,3
        – Isolating CPUs 2 and 3
        – Have RCU callbacks for CPUs 2 and 3 run on other CPUs
      ● Test with a user spinner program
        – Does nothing but spin (tests to see if we stay in userspace)
        – Use KernelShark to see if it gets interrupted
  47. NO_HZ_FULL
      # trace-cmd record -e all -p function_graph --max-graph-depth 1 -M 4 taskset -c 2 /work/c/userspin 30
  50. NO_HZ_FULL
      # ls /proc/irq/*/smp_affinity | while read a ; do echo f3 > $a ; done
      # cat /proc/irq/*/smp_affinity
      ff f3 f3 f3 f3 f3 f3 f3 f3 f3 f3 f3 f3 f3 f3 f3 f3 ff f3 f3 f3 f3 f3
  51. NO_HZ_FULL
      # trace-cmd record -e all -p function_graph --max-graph-depth 1 -M 4 taskset -c 2 /work/c/userspin 30
  53. NO_HZ_FULL
      ● Command line: isolcpus=2,3 rcu_nocbs=2,3 nowatchdog
        – Isolating CPUs 2 and 3
        – Have RCU callbacks for CPUs 2 and 3 run on other CPUs
        – Turn off watchdogs (will not detect hard lockups!)
  54. NO_HZ_FULL
      # trace-cmd record -e all -p function_graph --max-graph-depth 1 -M 4 taskset -c 2 /work/c/userspin 30
  55. NO_HZ_FULL
      __visible void __irq_entry smp_call_function_interrupt(struct pt_regs *regs)
      {
      	ipi_entering_ack_irq();
      	trace_call_function_entry(CALL_FUNCTION_VECTOR);
      	inc_irq_stat(irq_call_count);
      	generic_smp_call_function_interrupt();
      	trace_call_function_exit(CALL_FUNCTION_VECTOR);
      	exiting_irq();
      }
  56. NO_HZ_FULL
      # trace-cmd record -e all -p function_graph -g 'generic_smp_call_function_*' -M 4 taskset -c 2 /work/c/userspin 30
  57. NO_HZ_FULL
      # trace-cmd record -e all -p function_graph -g 'generic_smp_call_function_*' -l 'tick_nohz_full*:stacktrace' -M ff taskset -c 2 /work/c/userspin 30
      # trace-cmd report
      [..]
      ksoftirqd/1-20  [001]  1758.326782: timer_start: timer=0xffffffff828fb640 function=clocksource_watchdog expires=4296424801 [timeout=497] cpu=2 idx=109 flags=D|P|I
      ksoftirqd/1-20  [001]  1758.326792: kernel_stack: <stack trace>
      => ftrace_call (ffffffff81a01811)
      => tick_nohz_full_kick_cpu (ffffffff8116d995)
      => add_timer_on (ffffffff81158d71)
      => clocksource_watchdog (ffffffff811614b4)
      => run_timer_softirq (ffffffff811595a0)
      => __do_softirq (ffffffff81c000e6)
      => run_ksoftirqd (ffffffff810d938b)
      => smpboot_thread_fn (ffffffff810fc161)
      => kthread (ffffffff810f7c3d)
      => ret_from_fork (ffffffff81a00215)
  58. NO_HZ_FULL
      ● Command line: isolcpus=2,3 rcu_nocbs=2,3 nowatchdog tsc=reliable
        – Isolating CPUs 2 and 3
        – Have RCU callbacks for CPUs 2 and 3 run on other CPUs
        – Turn off watchdogs (will not detect hard lockups!)
        – Tell the kernel the TSC is reliable (in other words, lie about it!)
  59. NO_HZ_FULL
      # trace-cmd record -e all -p function_graph -g 'generic_smp_call_function_*' -M ff taskset -c 2 /work/c/userspin 30
  60. Thank You
