Process Scheduling
Hao-Ran Liu
Objective
• Decide which process runs, when, and for
how long
• Considering the overhead of context switches, we need to balance between conflicting goals:
– CPU utilization (high throughput)
– Interactive performance (low latency)
Multitasking
• Cooperative
– A process does not stop running until it voluntarily decides to do so
– Any process can monopolize the processor; a hung process that never yields can lock up the entire system
– A technique used in many user-mode threading
libraries.
• Preemptive
– A running process can be suspended at any time
(usually because it exhausts its time slice)
Types of processes
• I/O-bound processes
– spend most of their time waiting for I/O
– should be executed often (for short durations)
when they are runnable
• CPU-bound processes
– spend most of their time executing code; tend
to run until they are preempted
– should be executed for longer durations (to
improve throughput)
Scheduling policies
• Check sched(7) man page for more details
• Normal
• Real-time
Name            Description
SCHED_NORMAL    The standard time-sharing policy for regular tasks
SCHED_BATCH     For CPU-bound tasks that do not preempt often
SCHED_IDLE      For running very low priority background jobs (lower priority than a +19 nice value)
SCHED_FIFO      FIFO without time slices
SCHED_RR        Round robin with a maximum time slice
SCHED_DEADLINE  Earliest Deadline First + Constant Bandwidth Server; a task is accepted only if its periodic job can be completed before its deadline
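As a quick companion to the table above, here is a minimal user-space sketch (not part of the original slides) that reports which policy the calling process is running under. Note that the normal policy is exposed to user space as SCHED_OTHER, and SCHED_BATCH/SCHED_IDLE need _GNU_SOURCE with glibc.

#define _GNU_SOURCE          /* for SCHED_BATCH and SCHED_IDLE in <sched.h> */
#include <stdio.h>
#include <sched.h>

int main(void)
{
    int policy = sched_getscheduler(0);   /* 0 = the calling process */

    switch (policy) {
    case SCHED_OTHER: puts("SCHED_OTHER (SCHED_NORMAL)"); break;
    case SCHED_BATCH: puts("SCHED_BATCH");                break;
    case SCHED_IDLE:  puts("SCHED_IDLE");                 break;
    case SCHED_FIFO:  puts("SCHED_FIFO");                 break;
    case SCHED_RR:    puts("SCHED_RR");                   break;
    default:          printf("policy %d (e.g. SCHED_DEADLINE)\n", policy); break;
    }
    return 0;
}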
Process priority
• Processes with a higher priority
– run before those with a lower priority
– receive a longer time slice
• Priority range [static, dynamic]
– Normal, batch: [always 0, -20~+19], default: [0, 0]. The dynamic priority is the nice value you adjust from user space; a larger nice value corresponds to a lower priority
– FIFO, RR: [0~99, 0], higher value means greater
priority. FIFO, RR processes are at a higher priority
than normal processes
– Deadline: Not applicable. Deadline processes are
always the highest priority in the system
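To illustrate the dynamic priority (nice value) described above, the following user-space sketch (not from the slides) reads and lowers the caller's priority with the standard getpriority()/setpriority() calls; raising priority (a negative nice value) normally requires CAP_SYS_NICE.

#include <stdio.h>
#include <errno.h>
#include <sys/resource.h>

int main(void)
{
    /* getpriority() can legitimately return -1, so clear and check errno. */
    errno = 0;
    int nice_val = getpriority(PRIO_PROCESS, 0);   /* 0 = the calling process */
    if (nice_val == -1 && errno != 0)
        perror("getpriority");
    else
        printf("current nice value: %d\n", nice_val);

    /* Lower our priority by raising the nice value to +10. */
    if (setpriority(PRIO_PROCESS, 0, 10) == -1)
        perror("setpriority");

    return 0;
}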
Time slice
• How long a task can run until it is
preempted
• The value of the time slice:
– higher: better throughput
– lower: better interactive performance (shorter
scheduling latency), but more CPU time
wasted on context switches
– Default value is usually pretty small (for good interactive performance)
Completely Fair Scheduler
• The scheduler for SCHED_NORMAL,
SCHED_BATCH, SCHED_IDLE classes
• CFS assigns a proportion of the processor,
instead of time slices, to processes
– A process with higher nice value receives a smaller
proportion of the CPU
• If a process enters runnable state and has
consumed a smaller proportion of the CPU than
the currently executing one, it runs immediately,
preempting the current one.
CFS scheduler in action
• Two processes
– Video encoder (CPU-bound) and text editor (I/O-bound)
– Both processes have the same nice value
• We want the text editor to preempt the video encoder when the editor is runnable
– the text editor consumes a smaller proportion of the
CPU than the video encoder, so it will preempt the
video encoder once it is runnable.
“timeslice” in CFS
• Target latency
– /proc/sys/kernel/sched_latency_ns
– the period in which all run queue tasks are scheduled at least
once
• Timeslice_CFS = (target latency / number of runnable processes) × nice_weight
– Ex: target latency = 20ms, two runnable processes at the same
priority, each will run for 10ms before preemption
• If the number of runnable processes → ∞, timeslice_CFS → 0
– Unacceptable switching costs
– CFS imposes a floor on the “timeslice”:
/proc/sys/kernel/sched_min_granularity_ns, default value is 1ms
• CFS is not “fair” if the number of processes is extremely
large
CFS example again
• Two processes, with nice values 0 and 5
– The weight for a nice value of 5 is 1/3 (relative to nice 0)
– If target latency = 20ms, the two processes receive 15 ms and 5 ms “timeslices”, respectively
– If we change the nice values to 10 and 15, they still receive the same “timeslices”
• The proportion of processor time that any
process receives is determined only by the
relative difference in niceness between it and
the other runnable processes
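To make the arithmetic of the last two slides concrete, here is an illustrative user-space model (not kernel code): each task’s “timeslice” is its share of the target latency in proportion to its load weight, clamped to the minimum granularity. The real kernel uses integer load weights and nanoseconds; the floating-point relative weights here are a simplification.

#include <stdio.h>

/* slice = target_latency * (task_weight / total_weight), floored at
 * min_granularity; weights are relative (nice 0 = 1.0, nice 5 ~ 1/3). */
static double cfs_timeslice_ms(double target_latency_ms, double task_weight,
                               double total_weight, double min_granularity_ms)
{
    double slice = target_latency_ms * task_weight / total_weight;
    return slice < min_granularity_ms ? min_granularity_ms : slice;
}

int main(void)
{
    /* Two tasks at the same priority, 20 ms target latency: 10 ms each. */
    printf("%.1f ms\n", cfs_timeslice_ms(20.0, 1.0, 2.0, 1.0));

    /* Nice 0 vs nice 5 (relative weight ~1/3): about 15 ms and 5 ms. */
    printf("%.1f ms\n", cfs_timeslice_ms(20.0, 1.0, 1.0 + 1.0 / 3, 1.0));
    printf("%.1f ms\n", cfs_timeslice_ms(20.0, 1.0 / 3, 1.0 + 1.0 / 3, 1.0));
    return 0;
}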
CFS group scheduling
• Sometimes, it may be desirable to group tasks and
provide fair CPU time to each such task group
• Kernel config required:
– CONFIG_FAIR_GROUP_SCHED
– CONFIG_RT_GROUP_SCHED
• Example:
# mount -t tmpfs cgroup_root /sys/fs/cgroup
# mkdir /sys/fs/cgroup/cpu
# mount -t cgroup -ocpu none /sys/fs/cgroup/cpu
# cd /sys/fs/cgroup/cpu
# mkdir multimedia # create "multimedia" group of tasks
# mkdir browser # create "browser" group of tasks
# #Configure the multimedia group to receive twice the CPU bandwidth
# #that of browser group
# echo 2048 > multimedia/cpu.shares
# echo 1024 > browser/cpu.shares
# firefox & # Launch firefox and move it to "browser" group
# echo <firefox_pid> > browser/tasks
# #Launch gmplayer (or your favourite movie player)
# echo <movie_player_pid> > multimedia/tasks
Sporadic task model
deadline scheduling
• Each SCHED_DEADLINE task is characterized by the
"runtime", "deadline", and "period" parameters
• The kernel performs an admission test when setting or changing the SCHED_DEADLINE policy and attributes with the sched_setattr() system call
arrival/wakeup                    absolute deadline
     |    start time                    |
     |        |                         |
     v        v                         v
-----x--------xooooooooooooooooo--------x--------x---
              |<-- Runtime ------->|
     |<----------- Deadline ----------->|
     |<-------------- Period ------------------->|
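As a hedged illustration of how a task requests these parameters, the sketch below (not from the slides) sets SCHED_DEADLINE on the calling process via the raw sched_setattr(2) system call; the struct layout follows the man page, and the 10/30/100 ms values are arbitrary examples. Older glibc versions provide no wrapper, which is why the syscall is invoked directly. The call fails (for example with EBUSY) if the admission test cannot guarantee the requested bandwidth.

#define _GNU_SOURCE
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/sched.h>                 /* SCHED_DEADLINE */

struct sched_attr {                      /* layout per sched_setattr(2) */
    uint32_t size;
    uint32_t sched_policy;
    uint64_t sched_flags;
    int32_t  sched_nice;
    uint32_t sched_priority;
    uint64_t sched_runtime;              /* nanoseconds */
    uint64_t sched_deadline;             /* nanoseconds */
    uint64_t sched_period;               /* nanoseconds */
};

static int sched_setattr(pid_t pid, const struct sched_attr *attr,
                         unsigned int flags)
{
    return syscall(SYS_sched_setattr, pid, attr, flags);
}

int main(void)
{
    struct sched_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.size           = sizeof(attr);
    attr.sched_policy   = SCHED_DEADLINE;
    attr.sched_runtime  = 10 * 1000 * 1000;   /* up to 10 ms of CPU per job */
    attr.sched_deadline = 30 * 1000 * 1000;   /* finish within 30 ms of wakeup */
    attr.sched_period   = 100 * 1000 * 1000;  /* a new job every 100 ms */

    if (sched_setattr(0, &attr, 0) == -1) {   /* kernel admission test runs here */
        perror("sched_setattr");
        return 1;
    }
    /* ... the periodic, deadline-constrained work would run here ... */
    return 0;
}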
Some tools for real-time tasks
• chrt sets or retrieves the real-time scheduling attributes of an existing pid, or runs a command with the given attributes
• Limiting the CPU usage of real-time and deadline
processes
– A nonblocking infinite loop in a thread scheduled under the
FIFO, RR, or DEADLINE policy will block all threads with lower
priority forever
– two /proc files can be used to reserve a certain amount of CPU
time to be used by non-real-time processes.
• /proc/sys/kernel/sched_rt_period_us (default: 1000000)
• /proc/sys/kernel/sched_rt_runtime_us (default: 950000)
chrt [options] [<policy>] <priority> [-p <pid> | <command> [<arg>...]]
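For comparison, this is roughly what chrt -f 50 does before exec'ing its command, written as a minimal C sketch (not from the slides): switch the calling process to SCHED_FIFO with static priority 50. It needs CAP_SYS_NICE or an adequate RLIMIT_RTPRIO, and priority 50 is just an example value.

#include <stdio.h>
#include <sched.h>

int main(void)
{
    struct sched_param sp = { .sched_priority = 50 };

    if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1) {   /* 0 = calling process */
        perror("sched_setscheduler");
        return 1;
    }
    printf("now running under SCHED_FIFO, priority %d\n", sp.sched_priority);

    /* Real-time work goes here; it should block regularly so it does not
     * starve lower-priority tasks (see the sched_rt_* limits above). */
    return 0;
}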
Context switches
• schedule() calls context_switch() when a new process has been selected to run
• context_switch()
– switch_mm(): switch virtual memory mapping
– switch_to(): switch processor state.
• The kernel is informed that a reschedule is needed when the need_resched flag is set
– Set by scheduler_tick() when a process should be preempted
– Set by try_to_wake_up() when a process with a higher priority than the current process is awakened
Example: creating kernel thread
#include <linux/module.h>
#include <linux/kthread.h>

#define DPRINTK(fmt, args...) \
    printk("%s(): " fmt, __func__, ##args)

static struct task_struct *kth_test_task;
static int data;

static int kth_test(void *arg)
{
    unsigned int timeout;
    int *d = (int *) arg;

    while (!kthread_should_stop()) {
        DPRINTK("data=%d\n", ++(*d));
        set_current_state(TASK_INTERRUPTIBLE);
        timeout = schedule_timeout(10 * HZ);
        if (timeout)
            DPRINTK("schedule_timeout returned early.\n");
    }
    DPRINTK("exit.\n");
    return 0;
}

static int __init init_modules(void)
{
    int ret;

    kth_test_task = kthread_create(kth_test, &data, "kth_test");
    if (IS_ERR(kth_test_task)) {
        ret = PTR_ERR(kth_test_task);
        kth_test_task = NULL;
        goto out;
    }
    wake_up_process(kth_test_task);
    return 0;
out:
    return ret;
}

static void __exit exit_modules(void)
{
    /* block until kth_test_task exits */
    kthread_stop(kth_test_task);
}

module_init(init_modules);
module_exit(exit_modules);
Process sleeping
 Processes need to sleep when requests cannot be
satisfied immediately
 Kernel output buffer is full or no data is available
 Rules for sleeping
 Never sleep in an atomic context
 Holding a spinlock, seqlock or RCU lock
 Interrupts are disabled
 Always check to ensure that the condition the process
was waiting for is indeed true after the process wakes up
Wait queue
 Wait queue contains a list of processes, all
waiting for a specific event
 Declaration and initialization of wait queue
// defined and initialized statically with
DECLARE_WAIT_QUEUE_HEAD(name);
// initialized dynamically
wait_queue_head_t my_queue;
init_waitqueue_head(&my_queue);
wait_event macros
// queue: the wait queue head to use. Note that it is passed “by value”
// condition: arbitrary boolean expression, evaluated by the macro before
// and after sleeping until the condition becomes true. It may
// be evaluated an arbitrary number of times, so it should not
// have any side effects.
// timeout: wait for the specific number of clock ticks (in jiffies)
// uninterruptible sleep until a condition gets true
wait_event(queue, condition);
// interruptible sleep until a condition gets true, return -ERESTARTSYS if
// interrupted by a signal, return 0 if condition evaluated to be true
wait_event_interruptible(queue, condition);
// uninterruptible sleep until a condition gets true or a timeout elapses
// return 0 if the timeout elapsed, and the remaining jiffies if the
// condition evaluated to true before the timeout elapsed
wait_event_timeout(queue, condition, timeout);
// interruptible sleep until a condition gets true or a timeout elapses
// return 0 if the timeout elapsed, -ERESTARTSYS if interrupted by a
// signal, and the remaining jiffies if the condition evaluated to true
// before the timeout elapsed
wait_event_interruptible_timeout(queue, condition, timeout);
wake_up macros
// Wake processes that are sleeping on the queue q. The _interruptible
// form wakes only interruptible processes. Normally, only one exclusive
// waiter is awakened (to avoid thundering herd problem), but that
// behavior can be changed with the _nr or _all forms. The _sync version
// does not reschedule the CPU before returning.
void wake_up(wait_queue_head_t *q);
void wake_up_interruptible(wait_queue_head_t *q);
void wake_up_nr(wait_queue_head_t *q, int nr);
void wake_up_interruptible_nr(wait_queue_head_t *q, int nr);
void wake_up_all(wait_queue_head_t *q);
void wake_up_interruptible_all(wait_queue_head_t *q);
void wake_up_interruptible_sync(wait_queue_head_t *q);
 Within a real device driver, a process blocked in a read call is awakened when data arrives; usually the hardware issues an interrupt to signal such an event, and the driver awakens the waiting processes as part of handling the interrupt
A simple example of putting
processes to sleep
 sleepy device behavior: any process that attempts to read from the device is put to sleep. Whenever a process writes to the device, all sleeping processes are awakened
 Note that on a single processor, the second process to wake up would immediately go back to sleep
sleepy’s read and write
ssize_t sleepy_read(struct file *filp, char __user *buf,
                    size_t count, loff_t *pos)
{
    printk(KERN_DEBUG "process %i (%s) going to sleep\n",
           current->pid, current->comm);
    wait_event_interruptible(wq, flag != 0);
    flag = 0;
    printk(KERN_DEBUG "awoken %i (%s)\n", current->pid, current->comm);
    return 0; /* EOF */
}

ssize_t sleepy_write(struct file *filp, const char __user *buf,
                     size_t count, loff_t *pos)
{
    printk(KERN_DEBUG "process %i (%s) awakening the readers...\n",
           current->pid, current->comm);
    flag = 1;
    wake_up_interruptible(&wq);
    return count; /* succeed, to avoid retrial */
}
Implementation of wait_event:
How to implement sleep manually
#define wait_event(wq, condition)                                \
do {                                                             \
    if (condition)                                               \
        break;                                                   \
    __wait_event(wq, condition);                                 \
} while (0)

#define __wait_event(wq, condition)                              \
do {                                                             \
    DEFINE_WAIT(__wait);                                         \
                                                                 \
    for (;;) {                                                   \
        prepare_to_wait(&wq, &__wait, TASK_UNINTERRUPTIBLE);     \
        if (condition)                                           \
            break;                                               \
        schedule();                                              \
    }                                                            \
    finish_wait(&wq, &__wait);                                   \
} while (0)
Implementation of wait_event:
How to implement sleep manually
 prepare_to_wait
 adds the wait queue entry to the wait queue and sets the process state
 finish_wait
 sets the task state to TASK_RUNNING and removes the wait queue entry from the wait queue
 Questions:
 What if the 'if (condition) ...' check is moved in front of prepare_to_wait()?
 What if the wake_up event happens just after the 'if (condition) ...' check but before the call to schedule()?
User Preemption
• User preemption can occur when need_resched is set and the kernel is returning to user-space
– from a system call
– from an interrupt handler
Kernel Preemption
• In nonpreemptive kernels, kernel code runs until
completion.
– The scheduler cannot reschedule a task while it is in
the kernel
– kernel code is scheduled cooperatively, not
preemptively
• Since the 2.6 series, however, the Linux kernel has been preemptive:
– It is now possible to preempt a task at any point, so
long as the kernel is in a state in which it is safe to
reschedule
• Safe => preempt_count == 0 (kernel doesn’t hold any lock
and isn’t in any atomic context like softirq or hardirq)
Kernel Preemption
• preempt_count
– a variable in each process’s thread_info
– Begins at zero, increments when the kernel enters an atomic context, and decrements when it leaves
– If this counter is zero, kernel is preemptible
Cases that need preemption disabled
• Per-CPU data structures
• Some registers must be protected
– On x86, kernel does not save FPU state
except for user tasks. Entering and exiting
FPU mode is a critical section that must occur
while preemption is disabled
struct this_needs_locking tux[NR_CPUS];
tux[smp_processor_id()] = some_value;
/* task is preempted here... */
something = tux[smp_processor_id()];
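The usual fix, sketched below under the assumption that the surrounding code may be preempted: disable preemption across the per-CPU access so the task cannot migrate between the two uses of the processor id. get_cpu() disables preemption and returns the current CPU number; put_cpu() re-enables it.

int cpu = get_cpu();          /* preemption disabled from here on */
tux[cpu] = some_value;
something = tux[cpu];         /* guaranteed to still be on the same CPU */
put_cpu();                    /* preemption enabled again */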
preempt_count
/*
* We put the hardirq and softirq counter into the preemption
* counter. The bitmask has the following meaning:
*
* - bits 0-7 are the preemption count (max preemption depth: 256)
* - bits 8-15 are the softirq count (max # of softirqs: 256)
*
* The hardirq count can in theory reach the same as NR_IRQS.
* In reality, the number of nested IRQS is limited to the stack
* size as well. For archs with over 1000 IRQS it is not practical
* to expect that they will all nest. We give a max of 10 bits for
* hardirq nesting. An arch may choose to give less than 10 bits.
* m68k expects it to be 8.
*
* - bits 16-25 are the hardirq count (max # of nested hardirqs: 1024)
* - bit 26 is the NMI_MASK
* - bit 28 is the PREEMPT_ACTIVE flag
*
* PREEMPT_MASK: 0x000000ff
* SOFTIRQ_MASK: 0x0000ff00
* HARDIRQ_MASK: 0x03ff0000
* NMI_MASK: 0x04000000
*/
include/linux/hardirq.h
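Purely as an illustration of the bit layout quoted above, the user-space sketch below decodes a hypothetical preempt_count value with those masks; the constants are copied from the comment and differ between kernel versions.

#include <stdio.h>

#define PREEMPT_MASK  0x000000ffu
#define SOFTIRQ_MASK  0x0000ff00u
#define HARDIRQ_MASK  0x03ff0000u

int main(void)
{
    unsigned int count = 0x00010102;   /* hypothetical: 1 hardirq, 1 softirq, depth 2 */

    printf("preemption depth: %u\n",  count & PREEMPT_MASK);
    printf("softirq count:    %u\n", (count & SOFTIRQ_MASK) >> 8);
    printf("hardirq count:    %u\n", (count & HARDIRQ_MASK) >> 16);
    printf("kernel preemptible: %s\n", count == 0 ? "yes" : "no");
    return 0;
}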
References
• Linux Kernel Development, 3rd Edition,
Robert Love, 2010
• Linux kernel source, http://lxr.free-electrons.com
