2. Multitasking
• Multitasking operating systems come in two flavors: cooperative
multitasking and preemptive multitasking
• Linux implements preemptive multitasking
• In preemptive multitasking, the scheduler decides when a process is to
cease running and a new process is to begin running
3. Process Scheduler
• The Linux kernel has used the Completely Fair Scheduler (CFS) since
version 2.6.23
• CFS was refined further in 2.6.24
• Comparison of process schedulers
• Linux pre-2.6: multilevel feedback queue
• Linux 2.6.0 to 2.6.22: O(1) scheduler
• Linux 2.6.23 and later: Completely Fair Scheduler
• FreeBSD: multilevel feedback queue
• Mac OS X: multilevel feedback queue
• Windows NT: multilevel feedback queue
• Brain Fuck Scheduler (BFS): an alternative Linux scheduler by Con
Kolivas, maintained outside the mainline kernel
4. Policy
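• I/O-bound processes
• spend much of their time submitting and waiting on I/O requests, so
they run often but only for short periods
• Processor-bound processes
• spend much of their time executing code; the scheduler tends to run
such processes less frequently but for longer durations
• Policy in Unix systems tends to explicitly favor I/O-bound processes,
thus providing good process response time
• Linux also favors I/O-bound processes over processor-bound
processes, without neglecting processor-bound ones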
5. Process Priority
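• The Linux kernel implements two separate priority ranges
• Nice value
• A number from -20 to +19 with a default of 0
• Larger nice values mean lower priority
• Real-time priority
• Configurable; the default range is 0 to 99, inclusive
• Higher values mean greater priority, and every real-time process runs
at a higher priority than any normal process
• Real-time priority and nice value are in disjoint value spaces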
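One way to see why the two ranges are disjoint is to fold both onto a single internal scale where a lower number means higher priority. The sketch below assumes the 2.6-era layout (MAX_RT_PRIO = 100, nice mapped as MAX_RT_PRIO + nice + 20, real-time mapped as MAX_RT_PRIO - 1 - rt_priority); treat these formulas as our reading of the kernel, not a verbatim copy.

```python
# Sketch: fold nice values and real-time priorities onto one internal scale.
# ASSUMPTION: constants and formulas follow the 2.6-era NICE_TO_PRIO macro and
# normal_prio() logic as we understand them. Lower result = higher priority.
MAX_RT_PRIO = 100  # internal priorities 0..99 are reserved for real-time tasks


def nice_to_prio(nice):
    """Map a nice value (-20..+19) onto internal priorities 100..139."""
    if not -20 <= nice <= 19:
        raise ValueError("nice must be in -20..+19")
    return MAX_RT_PRIO + nice + 20


def rt_to_prio(rt_priority):
    """Map a real-time priority (0..99, higher = more urgent) onto 99..0."""
    if not 0 <= rt_priority <= MAX_RT_PRIO - 1:
        raise ValueError("rt_priority must be in 0..99")
    return MAX_RT_PRIO - 1 - rt_priority


if __name__ == "__main__":
    for nice in (-20, 0, 19):
        print(f"nice {nice:+d} -> internal prio {nice_to_prio(nice)}")
    for rt in (0, 99):
        print(f"rt_priority {rt} -> internal prio {rt_to_prio(rt)}")
```

Every real-time result (0..99) is numerically below every nice result (100..139), which is why any real-time task outranks any normal task.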
6. Timeslice
• Timeslice is the numeric value that represents how long a task can
run until it is preempted
• Linux’s CFS scheduler does NOT directly assign timeslices to processes
• CFS assigns processes a proportion of the processor
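A small sketch of "a proportion of the processor": each task gets its weight divided by the total weight of runnable tasks. The kernel reads weights from a precomputed table (prio_to_weight[] in 2.6-era kernels); here we assume the approximation 1024 / 1.25**nice, so each nice step changes a weight by roughly 1.25x.

```python
# Sketch: CFS-style processor proportions from per-task weights.
# ASSUMPTION: weights approximated as 1024 / 1.25**nice instead of the
# kernel's exact lookup table.
NICE_0_LOAD = 1024


def weight(nice):
    """Approximate load weight for a nice value (about 1.25x per step)."""
    return NICE_0_LOAD / 1.25 ** nice


def proportions(nices):
    """Fraction of the processor each runnable task receives."""
    weights = [weight(n) for n in nices]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    print(proportions([0, 0]))    # two equal tasks split the processor evenly
    print(proportions([0, 5]))
    print(proportions([10, 15]))  # same nice difference as [0, 5], same split
```

Because only the ratio of weights matters, a pair of tasks five nice levels apart gets the same split whether they sit at 0 and 5 or at 10 and 15.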
8. Scheduling Algorithm
• Traditional Unix systems schedule processes by mapping each nice
value onto an absolute timeslice
• This mapping causes several problems
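• Process A: nice value = 0, timeslice of 100 milliseconds
Process B: nice value = +19, timeslice of 5 milliseconds
Result: A receives 100/105 (about 95%) of the processor and B about 5%
• Process A: nice value = +19, timeslice of 5 milliseconds
Process B: nice value = +19, timeslice of 5 milliseconds
Result: each receives half the processor, but a context switch occurs
every 5 ms
• Process A: nice value = 0, timeslice of 100 milliseconds
Process B: nice value = 0, timeslice of 100 milliseconds
Result: each receives half the processor, with a context switch only every
100 ms, so low-priority (often background batch) tasks switch far more
often than normal ones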
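The arithmetic behind the three cases above can be checked with a few lines, taking the slide's timeslices (100 ms for the default nice value, 5 ms for the lowest priority) as given. Under fixed timeslices, a task's share is its slice divided by one full round, and each round costs two context switches.

```python
# Sketch: processor shares and switch rate for two tasks with fixed timeslices.
def fixed_slice_stats(slice_a_ms, slice_b_ms):
    round_ms = slice_a_ms + slice_b_ms       # one full round: A runs, then B
    share_a = slice_a_ms / round_ms
    share_b = slice_b_ms / round_ms
    switches_per_sec = 2 * 1000 / round_ms   # two context switches per round
    return share_a, share_b, switches_per_sec


if __name__ == "__main__":
    cases = {
        "normal vs lowest": (100, 5),
        "both lowest": (5, 5),
        "both normal": (100, 100),
    }
    for label, (a, b) in cases.items():
        share_a, share_b, rate = fixed_slice_stats(a, b)
        print(f"{label}: A={share_a:.1%} B={share_b:.1%} switches/s={rate:.1f}")
```

The last two cases give identical shares, yet the low-priority pair switches 20 times as often as the normal pair.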
9. Scheduling Algorithm
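• Process A: nice value = 0, timeslice of 100 milliseconds
Process B: nice value = 1, timeslice of 95 milliseconds
Result: nearly equal shares; one nice step changes little
• Process A: nice value = 18, timeslice of 10 milliseconds
Process B: nice value = 19, timeslice of 5 milliseconds
Result: A gets twice B's share; the same one-step difference now halves
B's processor time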
• A nice-to-timeslice mapping must assign absolute timeslices, and each
timeslice must be an integer multiple of the timer tick; as a result,
timeslices change whenever the timer tick frequency changes
• Boosting newly woken tasks to optimize for interactivity opens a
loophole: a process can game the scheduler and gain an unfair amount of
processor time
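To quantify the uneven effect of a single nice step, the sketch below uses the slide's hypothetical mapping (nice 0 to 100 ms, 1 to 95 ms, 18 to 10 ms, 19 to 5 ms) and, for contrast, weights approximated as 1024 / 1.25**nice (an assumption standing in for the kernel's weight table).

```python
# Sketch: effect of one nice step under fixed timeslices vs. weights.
SLICE_MS = {0: 100, 1: 95, 18: 10, 19: 5}  # hypothetical mapping from the slide


def timeslice_ratio(nice_a, nice_b):
    """B's processor time relative to A's under the fixed mapping."""
    return SLICE_MS[nice_b] / SLICE_MS[nice_a]


def weight(nice):
    """ASSUMPTION: approximate CFS-style weight, about 1.25x per nice step."""
    return 1024 / 1.25 ** nice


def weight_ratio(nice_a, nice_b):
    """B's processor time relative to A's when shares follow weights."""
    return weight(nice_b) / weight(nice_a)


if __name__ == "__main__":
    for a, b in ((0, 1), (18, 19)):
        print(f"nice {a} vs {b}: timeslices {timeslice_ratio(a, b):.2f}, "
              f"weights {weight_ratio(a, b):.2f}")
```

With fixed timeslices the ratio jumps from 0.95 to 0.5 depending on where the pair sits; with weights it stays at about 0.8 for any adjacent pair.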
10. Scheduling Algorithm
• The Linux scheduler is modular: different scheduling algorithms,
called scheduler classes, handle different types of processes
• The base scheduler code is defined in kernel/sched.c
• CFS is defined in kernel/sched_fair.c
• CFS basically models an “ideal, precise multi-tasking CPU” on real
hardware
• CFS does away with timeslices completely and assigns each process a
proportion of the processor
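The scheduler-class modularity can be sketched as a list of classes walked in priority order, where the first class that has a runnable task wins. The class names mirror the kernel's rt, fair and idle classes, but SchedClass and pick_next_task here are simplified, hypothetical stand-ins for the kernel's struct sched_class machinery.

```python
# Sketch: core scheduler iterating over scheduler classes in priority order.
from collections import deque


class SchedClass:
    """Toy scheduler class: a name plus a FIFO of runnable task names."""

    def __init__(self, name):
        self.name = name
        self.runqueue = deque()

    def pick_next_task(self):
        """Offer this class's next task, or None if it has nothing runnable."""
        return self.runqueue[0] if self.runqueue else None


def pick_next_task(classes):
    """Walk the classes from highest to lowest priority; first offer wins."""
    for sched_class in classes:
        task = sched_class.pick_next_task()
        if task is not None:
            return task
    raise RuntimeError("no runnable task")  # the idle class normally prevents this


if __name__ == "__main__":
    rt, fair, idle = SchedClass("rt"), SchedClass("fair"), SchedClass("idle")
    idle.runqueue.append("idle task")
    fair.runqueue.append("editor")
    print(pick_next_task([rt, fair, idle]))  # editor
    rt.runqueue.append("audio")
    print(pick_next_task([rt, fair, idle]))  # audio
```

The core loop knows nothing about how each class orders its own tasks; that is what lets CFS and the real-time scheduler live in separate files.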
13. Fair Scheduling
• CFS is called a fair scheduler because it gives each process a fair
share—a proportion—of the processor’s time
• The timeslice allotted to a nice value is NOT an absolute number, but a
proportion of the processor
• CFS is NOT perfectly fair, because it only approximates perfect
multitasking
• But it bounds the unfairness: with n runnable processes, each one runs
roughly once per scheduling period (the targeted latency)
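CFS turns proportions into actual run times through a scheduling period: a targeted latency split among runnable tasks by weight, stretched to n times a minimum granularity when there are too many tasks. The 20 ms latency and 1 ms granularity below are assumed defaults (real kernels tune them and scale them with CPU count), and weights use the 1024 / 1.25**nice approximation.

```python
# Sketch: per-task run time within one CFS scheduling period.
# ASSUMPTIONS: 20 ms targeted latency, 1 ms minimum granularity,
# weights approximated as 1024 / 1.25**nice.
TARGETED_LATENCY_MS = 20.0
MIN_GRANULARITY_MS = 1.0


def weight(nice):
    return 1024 / 1.25 ** nice


def slices(nices):
    """Run time (ms) each task gets in one scheduling period."""
    # The period stretches once n * granularity exceeds the targeted latency.
    period = max(TARGETED_LATENCY_MS, len(nices) * MIN_GRANULARITY_MS)
    weights = [weight(n) for n in nices]
    total = sum(weights)
    return [period * w / total for w in weights]


if __name__ == "__main__":
    print(slices([0, 0]))        # two equal tasks: 10 ms each
    print(slices([0, 5]))        # unequal weights, same 20 ms period
    print(sum(slices([0] * 40))) # 40 tasks: period grows to 40 ms
```

The timeslice is an output of the calculation rather than a fixed input, which is the sense in which CFS "does not assign timeslices".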
14. The Linux Scheduling Implementation
• We discuss four components of CFS
• Time Accounting
• Process Selection
• The Scheduler Entry Point
• Sleeping and Waking Up
15. Time Accounting
• CFS does NOT have the notion of a timeslice, but it must still keep
account of the time that each process runs
• CFS uses the scheduler entity structure, struct sched_entity,
defined in <linux/sched.h>, to keep track of process accounting
• The scheduler entity structure is embedded in the process descriptor,
struct task_struct, as a member variable named se
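The embedding can be modeled with two small records: a task descriptor that owns its scheduler entity in a member named se, and an accounting step that charges elapsed time to that entity. Field names echo struct sched_entity (exec_start, sum_exec_runtime, vruntime), but this is a simplified model, not the kernel's layout; account() is a hypothetical helper loosely modeled on the kernel's update_curr().

```python
# Sketch: a task descriptor embedding its scheduler entity as `se`.
from dataclasses import dataclass, field


@dataclass
class SchedEntity:
    exec_start: int = 0        # timestamp (ns) when the task last started running
    sum_exec_runtime: int = 0  # total actual runtime (ns)
    vruntime: int = 0          # weighted runtime; not updated in this sketch


@dataclass
class TaskStruct:
    pid: int
    se: SchedEntity = field(default_factory=SchedEntity)


def account(task, now):
    """Charge the time since exec_start to the task and restart the clock."""
    delta_exec = now - task.se.exec_start
    task.se.sum_exec_runtime += delta_exec
    task.se.exec_start = now
    return delta_exec


if __name__ == "__main__":
    t = TaskStruct(pid=42)
    t.se.exec_start = 1_000
    account(t, 4_000)
    print(t.se.sum_exec_runtime)  # 3000
```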
16. Time Accounting: Virtual Runtime
• The virtual runtime is used to help us approximate the “ideal
multitasking processor” that CFS is modeling
• CFS uses vruntime to account for how long a process has run and
thus how much longer it ought to run
• The vruntime variable stores the virtual runtime of a process: its actual
runtime weighted by its priority (nice value), so that vruntime advances
more slowly for higher-priority processes
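The weighting can be sketched as scaling real runtime by NICE_0_LOAD / weight, modeled on what the kernel's calc_delta_fair() does; the weight formula 1024 / 1.25**nice is an assumed approximation of the kernel's table.

```python
# Sketch: virtual runtime charged for a slice of real runtime.
# ASSUMPTION: weights approximated as 1024 / 1.25**nice.
NICE_0_LOAD = 1024


def weight(nice):
    """Approximate load weight; the kernel reads this from a table."""
    return NICE_0_LOAD / 1.25 ** nice


def calc_delta_fair(delta_exec, nice):
    """vruntime increment for delta_exec of real runtime at this nice value."""
    return delta_exec * NICE_0_LOAD / weight(nice)


if __name__ == "__main__":
    for nice in (-5, 0, 5):
        print(f"nice {nice:+d}: 10 ms real -> {calc_delta_fair(10.0, nice):.2f} ms virtual")
```

For a nice-0 task vruntime tracks real time; a nice +5 task is charged about three times as much, so after equal real runtime it sits further right in the tree and is picked later.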
17. Process Selection
• CFS uses a red-black tree to manage the list of runnable processes
and efficiently find the process with the smallest vruntime
• Picking the next task
• run the process represented by the leftmost node in the rbtree
• __pick_next_entity()
• Adding processes to the tree
• enqueue_entity()
• Removing processes from the tree
• dequeue_entity()
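The three operations above can be sketched as a timeline ordered by vruntime. Python has no built-in red-black tree, so this substitutes a list kept sorted with bisect: the "leftmost node" is the first element. Method names echo enqueue_entity(), dequeue_entity() and __pick_next_entity(), but they are simplified stand-ins; insertion here is O(n), whereas the kernel's rbtree gives O(log n) updates and caches the leftmost node.

```python
# Sketch: CFS-style process selection over a vruntime-ordered timeline.
import bisect
import itertools


class Timeline:
    def __init__(self):
        self._entries = []             # sorted (vruntime, seq, name) tuples
        self._seq = itertools.count()  # tie-breaker: equal vruntimes keep FIFO order

    def enqueue_entity(self, name, vruntime):
        bisect.insort(self._entries, (vruntime, next(self._seq), name))

    def dequeue_entity(self, name):
        for i, (_, _, entry_name) in enumerate(self._entries):
            if entry_name == name:
                del self._entries[i]
                return
        raise KeyError(name)

    def pick_next_entity(self):
        """Return the task with the smallest vruntime (the leftmost entry)."""
        return self._entries[0][2] if self._entries else None


if __name__ == "__main__":
    tl = Timeline()
    tl.enqueue_entity("video encoder", 120)
    tl.enqueue_entity("text editor", 5)
    print(tl.pick_next_entity())  # text editor
```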
18. The Scheduler Entry Point
• The main entry point into the process scheduler is the function
schedule(), defined in kernel/sched.c
19. Sleeping and Waking Up
• Tasks that are sleeping (blocked) are in a special non-runnable state
• Without this special state, the scheduler would select tasks that did
not want to run
• Sleeping is handled via wait queues
• A wait queue is a simple list of processes waiting for an event to occur
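Sleeping and waking can be modeled as moving a task between the runnable set and a wait queue while flipping its state. The state names mirror TASK_RUNNING and TASK_INTERRUPTIBLE; Task, WaitQueue, go_to_sleep and wake_waiters are hypothetical names for this sketch, not the kernel's API, and the runnable set stands in for the red-black tree.

```python
# Sketch: a wait queue that parks sleeping tasks until an event occurs.
from dataclasses import dataclass

TASK_RUNNING = "TASK_RUNNING"
TASK_INTERRUPTIBLE = "TASK_INTERRUPTIBLE"


@dataclass
class Task:
    name: str
    state: str = TASK_RUNNING


class WaitQueue:
    """A simple list of tasks waiting for one event."""

    def __init__(self):
        self.waiters = []


def go_to_sleep(task, wq, runnable):
    """Mark the task as sleeping, queue it, and remove it from the runnable set."""
    task.state = TASK_INTERRUPTIBLE
    wq.waiters.append(task)
    runnable.discard(task.name)


def wake_waiters(wq, runnable):
    """The awaited event occurred: make every waiter runnable again."""
    for task in wq.waiters:
        task.state = TASK_RUNNING
        runnable.add(task.name)
    wq.waiters.clear()
```

Because a sleeping task is absent from the runnable set, the scheduler cannot pick it, which is exactly the problem the special non-runnable state solves.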
20. Preemption and Context Switching
• Context switching is handled by the context_switch() function,
defined in kernel/sched.c
• It is called by schedule() when a new process has been selected to
run, and does two basic jobs
• Calls switch_mm() to switch the virtual memory mapping from the previous
process’s to that of the new process
• Calls switch_to() to switch the processor state from the previous
process’s to the new process’s
• The kernel provides the need_resched flag to signify whether a
reschedule should be performed
21. Preemption and Context Switching (cont.)
• Upon returning to user-space or returning from an interrupt, the
need_resched flag is checked
• If it is set, the kernel invokes the scheduler before continuing
• In 2.6, the need_resched flag was moved into a single bit of a special
flag variable inside the thread_info structure
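The flag-in-a-bit design can be sketched with a flags word and a bitmask. The kernel names this bit TIF_NEED_RESCHED; the bit position, ThreadInfo, and the helper functions below are hypothetical and only loosely modeled on kernel helpers.

```python
# Sketch: need_resched stored as one bit of a per-thread flags word.
TIF_NEED_RESCHED = 1 << 3  # HYPOTHETICAL bit position, for illustration only


class ThreadInfo:
    def __init__(self):
        self.flags = 0


def set_need_resched(ti):
    ti.flags |= TIF_NEED_RESCHED


def need_resched(ti):
    return bool(ti.flags & TIF_NEED_RESCHED)


def return_to_user(ti, schedule):
    """On the way back to user space, invoke the scheduler if the bit is set."""
    if need_resched(ti):
        # The kernel clears the bit inside schedule(); done here to keep it short.
        ti.flags &= ~TIF_NEED_RESCHED
        schedule()
        return True
    return False
```

Keeping the flag in a per-thread word, rather than a global variable, lets each thread's flag be tested cheaply on its own return path.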
22. User Preemption
• User preemption can occur
• When returning to user-space from a system call
• When returning to user-space from an interrupt handler
23. Kernel Preemption
• Kernel preemption can occur
• When an interrupt handler exits, before returning to kernel-space
• When kernel code becomes preemptible again
• If a task in the kernel explicitly calls schedule()
• If a task in the kernel blocks (which results in a call to schedule())
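The condition "kernel code becomes preemptible again" can be sketched with a per-task counter of held locks: the kernel keeps such a preempt_count, incremented when a lock is taken and decremented when it is released, and preempts only when it is zero and a reschedule is pending. KernelTask and its methods are a simplified, hypothetical model, not the kernel's implementation.

```python
# Sketch: kernel preemption allowed only when no locks are held.
class KernelTask:
    def __init__(self):
        self.preempt_count = 0   # number of locks currently held
        self.need_resched = False
        self.preemptions = 0     # counts simulated calls to schedule()

    def lock(self):
        self.preempt_count += 1

    def unlock(self):
        self.preempt_count -= 1
        # Kernel code may have just become preemptible: check for a reschedule.
        self.maybe_preempt()

    def maybe_preempt(self):
        """Also the check made when an interrupt returns to kernel space."""
        if self.need_resched and self.preempt_count == 0:
            self.need_resched = False
            self.preemptions += 1  # stands in for calling schedule()
            return True
        return False
```

A reschedule requested while a lock is held is deferred until the final unlock rather than lost.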
24. Real-Time Scheduling Policies
• Linux provides two real-time scheduling policies, SCHED_FIFO and
SCHED_RR
• The normal, non-real-time scheduling policy is SCHED_NORMAL
• Real-time policies are managed not by the CFS, but by a special real-
time scheduler, defined in kernel/sched_rt.c
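The difference between the two real-time policies shows up among tasks of equal priority. The simulation assumes each task never blocks or yields; SCHED_FIFO then runs each task to completion, while SCHED_RR rotates them every timeslice. The 100 ms timeslice is an assumed value for illustration.

```python
# Sketch: execution order under SCHED_FIFO vs SCHED_RR at equal priority.
from collections import deque


def simulate(policy, work_ms, timeslice_ms=100):
    """Return the (task, ms) slices the processor executes, in order."""
    queue = deque(work_ms.items())
    trace = []
    while queue:
        name, remaining = queue.popleft()
        if policy == "SCHED_FIFO":
            ran = remaining                       # runs until done (never blocks here)
        elif policy == "SCHED_RR":
            ran = min(remaining, timeslice_ms)    # at most one timeslice per turn
        else:
            raise ValueError(policy)
        trace.append((name, ran))
        if remaining > ran:
            queue.append((name, remaining - ran))  # back of the queue for its level
    return trace


if __name__ == "__main__":
    work = {"a": 250, "b": 100}
    print(simulate("SCHED_FIFO", work))
    print(simulate("SCHED_RR", work))
```

On Linux, a real process can request these policies with Python's os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(50)); this usually requires elevated privileges.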