Linux Scheduler
(Practical View)
Liran Ben Haim
liran@discoversdk.com
Rights to Copy
— Attribution – ShareAlike 2.0
— You are free
to copy, distribute, display, and perform the work
to make derivative works
to make commercial use of the work
Under the following conditions
Attribution. You must give the original author credit.
Share Alike. If you alter, transform, or build upon this work, you may
distribute the resulting work only under a license identical to this
one.
For any reuse or distribution, you must make clear to others the
license terms of this work.
Any of these conditions can be waived if you get permission from
the copyright holder.
Your fair use and other rights are in no way affected by the above.
License text: http://creativecommons.org/licenses/by-sa/2.0/legalcode
— This kit contains work by the
following authors:
— © Copyright 2004-2009
Michael Opdenacker / Free Electrons
michael@free-electrons.com
http://www.free-electrons.com
— © Copyright 2003-2006
Oron Peled
oron@actcom.co.il
http://www.actcom.co.il/~oron
— © Copyright 2004–2008
Codefidence ltd.
info@codefidence.com
http://www.codefidence.com
— © Copyright 2009–2010
Bina ltd.
info@bna.co.il
http://www.bna.co.il
Processes and Threads
A process is an instance of a running program.
Multiple instances of the same program can be running.
Program code (“text section”) memory is shared.
Each process has its own data section, address space, open
files and signal handlers.
A thread is a single flow of execution within a program.
It belongs to a process and shares the common data
section, address space, open files, signal handlers and
process-directed pending signals.
It has its own stack, signal mask, thread-directed pending
signals and state.
It's common to refer to single-threaded programs as
processes.
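A minimal sketch of what is shared and what is private, assuming Linux with glibc and POSIX threads. The helper names (`compare_threads`, `record_view`, `struct thread_view`) are illustrative and not from the original. Both threads report the same process ID but different kernel task IDs (obtained with the `gettid` system call), each local variable lives on a different stack, and both threads update the same global variable in the shared data section.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Global variable: lives in the process data section, shared by all threads. */
static int shared_counter = 0;

/* Hypothetical record of what one thread observes about itself. */
struct thread_view {
    pid_t pid;           /* process ID, identical for all threads */
    pid_t tid;           /* kernel task ID, unique per live thread */
    uintptr_t stack_var; /* address of a local variable on this thread's stack */
};

/* Records what the calling thread sees and bumps the shared counter. */
static void record_view(struct thread_view *v, const int *local)
{
    v->pid = getpid();
    v->tid = (pid_t)syscall(SYS_gettid);
    v->stack_var = (uintptr_t)local;
    __atomic_fetch_add(&shared_counter, 1, __ATOMIC_SEQ_CST);
}

static void *worker(void *arg)
{
    int local = 0; /* allocated on the new thread's own stack */
    record_view(arg, &local);
    return NULL;
}

/* Fills main_view for the calling thread and worker_view for a second
 * thread that runs while the caller is still alive (so task IDs and
 * stacks cannot be recycled in between). Returns 0 on success. */
static int compare_threads(struct thread_view *main_view,
                           struct thread_view *worker_view)
{
    int local = 0; /* allocated on the calling thread's stack */
    pthread_t t;

    record_view(main_view, &local);
    if (pthread_create(&t, NULL, worker, worker_view) != 0)
        return -1;
    pthread_join(t, NULL); /* join also makes the worker's writes visible */
    return 0;
}
```

After `compare_threads(&a, &b)`, `a.pid == b.pid`, `a.tid != b.tid`, `a.stack_var != b.stack_var`, and `shared_counter` is 2.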
The Kernel and Threads
In Linux 2.6, an explicit notion of processes and threads
was introduced to the kernel.
Scheduling is done on a thread-by-thread basis.
The basic object the kernel works with is a task,
which is analogous to a thread.
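Because the kernel schedules tasks, scheduling attributes are per thread. On Linux (unlike what POSIX specifies), the nice value is a per-thread attribute: `setpriority(PRIO_PROCESS, tid, ...)` with a thread ID changes only that task. The sketch below, assuming Linux with glibc, lowers the priority of one new thread and leaves the creating thread untouched. The helper names (`demo_per_thread_nice`, `get_task_nice`, `struct nice_req`) are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Reads the nice value of one task (thread); returns 0 on success. */
static int get_task_nice(pid_t tid, int *nice_out)
{
    errno = 0;
    int n = getpriority(PRIO_PROCESS, (id_t)tid);
    if (n == -1 && errno != 0) /* -1 is a valid nice value, so check errno */
        return -1;
    *nice_out = n;
    return 0;
}

struct nice_req {
    int target;   /* nice value the worker applies to itself */
    int observed; /* nice value the worker reads back */
    int err;      /* nonzero if either call failed */
};

static void *lower_own_priority(void *arg)
{
    struct nice_req *req = arg;
    pid_t tid = (pid_t)syscall(SYS_gettid);

    /* PRIO_PROCESS with a thread ID: on Linux this affects only this task. */
    req->err = setpriority(PRIO_PROCESS, (id_t)tid, req->target) != 0
            || get_task_nice(tid, &req->observed) != 0;
    return NULL;
}

/* Raises the nice value (lowers the priority) of a new thread only and
 * reports both nice values afterwards. Returns 0 on success. */
static int demo_per_thread_nice(int *main_nice, int *worker_nice)
{
    pid_t self = (pid_t)syscall(SYS_gettid);
    int before;
    pthread_t t;

    if (get_task_nice(self, &before) != 0)
        return -1;

    /* Raising the nice value is always allowed; clamp to the maximum, 19. */
    struct nice_req req = { .target = before + 5 > 19 ? 19 : before + 5 };
    if (pthread_create(&t, NULL, lower_own_priority, &req) != 0)
        return -1;
    pthread_join(t, NULL);

    if (req.err != 0 || get_task_nice(self, main_nice) != 0)
        return -1;
    *worker_nice = req.observed;
    return 0;
}
```

The main thread's nice value is unchanged after the call, while the worker ran 5 levels lower (or at 19).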
[Figure: Process 123 with one thread (Thread 1) and Process 124 with four threads (Threads 1 to 4). Each process owns its file descriptors, memory and signal handlers; each thread has its own stack, state, signal mask and priority.]
Linux Priorities
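— Non real-time processes: SCHED_OTHER, SCHED_BATCH, SCHED_IDLE
— Real-time priority 0; SCHED_OTHER and SCHED_BATCH tasks are further weighted by nice level, from -20 (highest priority) to 19 (lowest)
— Real time processes: SCHED_FIFO, SCHED_RR, SCHED_DEADLINE (since Linux 3.14)
— SCHED_FIFO and SCHED_RR use real-time priorities 1 to 99 (99 is the highest)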
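The priority range for each policy can be queried at run time with `sched_get_priority_min()` and `sched_get_priority_max()`. A small sketch, assuming glibc (where `_GNU_SOURCE` exposes `SCHED_BATCH` and `SCHED_IDLE`); the table and function names are illustrative. SCHED_DEADLINE is left out because it is configured with runtime, deadline and period rather than a static priority.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

struct policy_name {
    const char *name;
    int policy;
};

/* Illustrative table of the policies shown on the slide. */
static const struct policy_name policies[] = {
    { "SCHED_OTHER", SCHED_OTHER },
    { "SCHED_BATCH", SCHED_BATCH },
    { "SCHED_IDLE",  SCHED_IDLE  },
    { "SCHED_FIFO",  SCHED_FIFO  },
    { "SCHED_RR",    SCHED_RR    },
};

/* Prints the static priority range the kernel accepts for each policy. */
static void print_priority_ranges(void)
{
    for (size_t i = 0; i < sizeof policies / sizeof policies[0]; i++)
        printf("%-12s min=%d max=%d\n", policies[i].name,
               sched_get_priority_min(policies[i].policy),
               sched_get_priority_max(policies[i].policy));
}
```

On Linux this reports 1 to 99 for SCHED_FIFO and SCHED_RR, and 0 to 0 for the non real-time policies, whose relative weight comes from the nice level instead.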
API
— int sched_setscheduler(pid_t pid, int policy, const struct sched_param *param);
— int setpriority(int which, id_t who, int prio);
— int sched_setparam(pid_t pid, const struct sched_param *param);
— int sched_setattr(pid_t pid, struct sched_attr *attr, unsigned int flags);
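A minimal sketch of the first three calls, assuming Linux and glibc; the wrapper names `make_realtime`, `set_rt_priority` and `set_nice` are hypothetical. A `pid` of 0 means the calling thread. Switching to a real-time policy normally fails with `EPERM` unless the caller has `CAP_SYS_NICE` or a large enough `RLIMIT_RTPRIO`, so the wrappers return a negative errno value instead of assuming success.

```c
#include <errno.h>
#include <sched.h>
#include <string.h>
#include <sys/resource.h>

/* Puts the calling thread under SCHED_FIFO or SCHED_RR at priority prio.
 * Returns 0, or -errno (e.g. -EPERM when unprivileged, -EINVAL when prio
 * is outside 1..99). */
static int make_realtime(int policy, int prio)
{
    struct sched_param param;
    memset(&param, 0, sizeof param);
    param.sched_priority = prio;
    if (sched_setscheduler(0, policy, &param) == -1)
        return -errno;
    return 0;
}

/* Changes only the static priority, keeping the current policy. */
static int set_rt_priority(int prio)
{
    struct sched_param param;
    memset(&param, 0, sizeof param);
    param.sched_priority = prio;
    if (sched_setparam(0, &param) == -1)
        return -errno;
    return 0;
}

/* Sets the nice value of the caller (per thread on Linux).
 * Raising it needs no privileges; lowering it may return -EPERM. */
static int set_nice(int nice)
{
    if (setpriority(PRIO_PROCESS, 0, nice) == -1)
        return -errno;
    return 0;
}
```

Note that a non-zero static priority is rejected for SCHED_OTHER, so `set_rt_priority()` only makes sense after a successful switch to SCHED_FIFO or SCHED_RR.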
Blocking Threads
— A nonblocking infinite loop in a thread scheduled under the
SCHED_FIFO, SCHED_RR, or SCHED_DEADLINE policy can potentially
block all lower-priority threads from running forever
— Solution: limit the CPU usage of real-time and deadline
processes
— /proc/sys/kernel/sched_rt_period_us
— Period that is treated as 100% of CPU time (default:
1000000 µs, i.e. 1 second)
— /proc/sys/kernel/sched_rt_runtime_us
— How much of the period can be used by all real-time
and deadline scheduled processes on the system
(default: 950000 µs, i.e. 95%); -1 removes the limit
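The share of each period left to real-time and deadline tasks is simply runtime divided by period. A short sketch, assuming the two `/proc/sys/kernel` files above are readable; the helper names are hypothetical. With the defaults, 1000000 µs and 950000 µs give 95%, which leaves at least 5% of every second for non real-time tasks.

```c
#include <stdio.h>

/* Reads one integer from a /proc/sys file; returns 0 on success. */
static int read_long(const char *path, long *out)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int ok = fscanf(f, "%ld", out) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

/* Percentage of each period that RT and deadline tasks may consume.
 * A runtime of -1 means no limit (runtime == period). */
static double rt_cpu_share(long period_us, long runtime_us)
{
    if (runtime_us < 0)
        return 100.0;
    if (period_us <= 0)
        return 0.0; /* guard against bad input; the kernel never reports 0 */
    return 100.0 * (double)runtime_us / (double)period_us;
}

/* Prints the current throttling settings, if available. */
static void report_rt_limits(void)
{
    long period, runtime;
    if (read_long("/proc/sys/kernel/sched_rt_period_us", &period) != 0 ||
        read_long("/proc/sys/kernel/sched_rt_runtime_us", &runtime) != 0) {
        puts("RT throttling settings not available");
        return;
    }
    printf("period=%ld us, runtime=%ld us: RT/deadline tasks may use %.1f%%\n",
           period, runtime, rt_cpu_share(period, runtime));
}
```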
Preemption
— Linux is a preemptive operating system
— When a task running in user mode is interrupted and
the interrupt handler wakes up another task, the woken
task can be scheduled as soon as the interrupt handler
returns
— However, if the interrupt arrives while the task is executing
a system call, that system call has to finish before another task
can be scheduled.
— By default (CONFIG_PREEMPT_NONE), the Linux kernel does not
do kernel preemption.
— This means that the delay before the scheduler gets to run
another task is unbounded.
Preemption Models
CONFIG_PREEMPT_NONE
— Kernel code (interrupts, exceptions, system calls) is
never preempted. Default behavior in standard kernels.
— Best for compute-intensive systems, where overall
throughput is key.
— Minimizes task switching (context switches) to maximize
CPU and cache usage.
CONFIG_PREEMPT_VOLUNTARY
— Kernel code can voluntarily give up the CPU at selected points
— Typically for desktop systems, for quicker application
reaction to user input.
— Adds explicit rescheduling points throughout kernel
code, using cond_resched().
— Minor impact on throughput.
— Used in: Ubuntu Desktop 15.04, Ubuntu Server 14.04
CONFIG_PREEMPT
— Most kernel code can be involuntarily preempted at any
time. When a process becomes runnable, there is no longer a
need to wait for kernel code (typically a system call) to return
before running the scheduler.
— Exception: kernel critical sections (code holding spinlocks)
are not preemptible. If a task holding a spinlock on a
uniprocessor system were preempted, another task that tried
to acquire the same spinlock would spin forever.
— Typically for desktop or embedded systems with latency
requirements in the milliseconds range.
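The practical difference between preemption models shows up as wake-up latency. The sketch below is a much simplified take on the cyclictest idea (from the rt-tests suite), not a replacement for it: it sleeps until absolute deadlines on `CLOCK_MONOTONIC` and records how late each wake-up was. The function names are hypothetical; for meaningful numbers it would normally run under SCHED_FIFO on an otherwise loaded system.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

static long long ts_diff_ns(const struct timespec *a, const struct timespec *b)
{
    return (long long)(a->tv_sec - b->tv_sec) * NSEC_PER_SEC
         + (a->tv_nsec - b->tv_nsec);
}

static void ts_add_ns(struct timespec *t, long long ns)
{
    t->tv_nsec += ns;
    while (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec++;
    }
}

/* Wakes up `loops` times, interval_ns apart, and returns the worst
 * observed wake-up latency in nanoseconds, or -1 on error. */
static long long max_wakeup_latency_ns(int loops, long long interval_ns)
{
    struct timespec next, now;
    long long worst = 0;

    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return -1;
    for (int i = 0; i < loops; i++) {
        ts_add_ns(&next, interval_ns);
        /* Absolute deadlines keep the loop's own run time from adding drift. */
        int rc;
        do {
            rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        } while (rc == EINTR);
        if (rc != 0 || clock_gettime(CLOCK_MONOTONIC, &now) != 0)
            return -1;
        long long lat = ts_diff_ns(&now, &next);
        if (lat > worst)
            worst = lat;
    }
    return worst;
}
```

For example, `max_wakeup_latency_ns(1000, 1000000)` samples for about one second at 1 ms intervals.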
CONFIG_PREEMPT_RT
— The PREEMPT_RT patch adds a new level of preemption, called
CONFIG_PREEMPT_RT_FULL
— This level of preemption replaces most kernel spinlocks with mutexes (so-called
sleeping spinlocks)
— Instead of providing mutual exclusion by disabling interrupts and preemption, they
are just normal locks: when contention happens, the process is blocked and
another one is selected by the scheduler.
— Works well with threaded interrupts, since threads can block, while usual interrupt
handlers cannot.
— Some core, carefully controlled, kernel spinlocks remain as normal (raw) spinlocks.
— With CONFIG_PREEMPT_RT_FULL, virtually all kernel code becomes preemptible
— An interrupt can occur at any time; when the interrupt handler returns, the
woken-up process can start immediately.
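To check which model a running kernel uses, a common heuristic is the version string printed by `uname -v`: preemptible kernels usually carry a `PREEMPT` tag, and RT kernels `PREEMPT_RT` (older RT patches used `PREEMPT RT`). This is an assumption about build conventions, not a guaranteed interface; kernels built with CONFIG_PREEMPT_NONE or CONFIG_PREEMPT_VOLUNTARY carry no tag, so the two cannot be told apart this way, and `PREEMPT_DYNAMIC` kernels select the model at boot. The function names below are hypothetical.

```c
#include <string.h>
#include <sys/utsname.h>

/* Heuristic guess of the preemption model from a kernel version string
 * (the text `uname -v` prints). Order matters: "PREEMPT_RT" and
 * "PREEMPT_DYNAMIC" both contain "PREEMPT". */
static const char *guess_preemption_model(const char *version)
{
    if (strstr(version, "PREEMPT_RT") || strstr(version, "PREEMPT RT"))
        return "rt";
    if (strstr(version, "PREEMPT_DYNAMIC"))
        return "dynamic";
    if (strstr(version, "PREEMPT"))
        return "preempt";
    return "none/voluntary";
}

/* Applies the heuristic to the running kernel. */
static const char *running_kernel_model(void)
{
    struct utsname u;
    if (uname(&u) != 0)
        return "unknown";
    return guess_preemption_model(u.version);
}
```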
Thank You
Code examples and more
http://www.discoversdk.com/blog
