SlideShare a Scribd company logo
1 of 36
Download to read offline
Much Ado About Blocking:
Wait/Wake in the Linux Kernel
LinuxCon China – June 2017. Beijing, China.
Davidlohr Bueso <dave@stgolabs.net>
SUSE Labs.
2
Agenda
1. Fundamentals of Blocking
2.Flavors of blocking
● Wait-queues
● Simple wait-queues
● Lockless wake-queues
● rcuwait
Fundamentals of Blocking
4
Blocking 101
• As opposed to busy waiting, sleeping is required in
scenarios where the wait time can be too long.
‒ Other tasks ought to therefore run instead of wasting cycles.
‒ Overhead of context switch vs wasting cycles.
5
Blocking 101
• As opposed to busy waiting, sleeping is required in
scenarios where the wait time can be too long.
‒ Other tasks ought to therefore run instead of wasting cycles.
‒ Overhead of context switch vs wasting cycles.
• TASK_{UNINTERRUPTIBLE,INTERRUPTIBLE,KILLABLE}
• There are three elements to consider when waiting on
an event:
‒ The wakee (wake_up, wake_up_process, etc).
‒ The waker (schedule, schedule_timeout, io, etc).
‒ The condition/event
6
sleep_on()
CPU0
while (!cond)
sleep_on()
CPU1
cond = true
wake_up()
7
sleep_on()
CPU0
while (!cond)
sleep_on()
CPU1
cond = true
wake_up()
• Inherently racy on SMP.
‒ Missed wakeups
• Synchronization must be provided.
‒ Locking
‒ Memory barriers
8
SMP-safe Blocking
for (;;) {
set_current_state();
if (cond)
break;
schedule();
}
__set_current_state(TASK_RUNNING);
9
SMP-safe Blocking
for (;;) {
set_current_state();
if (cond)
break;
schedule();
}
__set_current_state(TASK_RUNNING);
Pairs with barrier in
wake_up()
Wait-queues
11
Wait-queues
• Provide different wrappers, such that users do not
suffer from races aforementioned.
for (;;) {
long __int = prepare_to_wait_event(&wq, &__wait,
state);
if (condition)
break;
schedule();
}
finish_wait(&wq, &__wait);
12
Wait-queues
• Provide different wrappers, such that users do not
suffer from races aforementioned.
for (;;) {
long __int = prepare_to_wait_event(&wq, &__wait,
state);
if (condition)
break;
schedule();
}
finish_wait(&wq, &__wait);
13
Wait-queues
• Provide different wrappers, such that users do not
suffer from races aforementioned.
for (;;) {
long __int = prepare_to_wait_event(&wq, &__wait,
state);
if (condition)
break;
schedule();
}
finish_wait(&wq, &__wait);
14
Wait-queues
• Provide different wrappers, such that users do not
suffer from races aforementioned.
for (;;) {
long __int = prepare_to_wait_event(&wq, &__wait,
state);
if (condition)
break;
schedule();
}
finish_wait(&wq, &__wait);
removal keeps the process
from seeing multiple wakeups
15
When Wait-queues Were Simple
• Basic linked list of waiting
threads.
• A wake_up() call on a
wait queue would walk
the list, putting each
thread into the runnable
state.
16
Wait-queues
• Exclusive Wait
‒ Only the first N tasks are awoken.
• Callback mechanism were added for asynchronous
I/O facility could step in instead.
17
Wait-queues
• Lockless waitqueue_active() checks
CPU0 - waker CPU1 - waiter
for (;;) {
cond = true; prepare_to_wait(&wq, &wait, state);
smp_mb(); // smp_mb() from set_current_state()
if (waitqueue_active(wq)) if (cond)
wake_up(wq); break;
schedule();
}
finish_wait(&wq, &wait);
18
Wait-queues as Building Blocks
• Completions
‒ Allows processes to wait until another task reaches some point
or state.
‒ Similar to pthread_barrier().
‒ Documents the code very well.
• Bit waiting.
19
Wait-queues and RT
• Waitqueues are a big problem for realtime as they use
spinlocks, which in RT are sleepable.
‒ Cannot convert convert to raw spinlock due to callback
mechanism.
‒ This means we cannot perform wakeups in IRQ context.
Simple Wait-queues (swait)
21
Simple wait-queues
• Similar to the traditional (bulky) waitqueues, yet
guarantees bounded irq and lock hold times.
‒ Taken from PREEMPT_RT.
• To accomplish this, it must remove wq complexities:
‒ Mixing INTERRUPTIBLE and UNINTERRUPTIBLE blocking in
the same queue. Ends up doing O(n) lookups.
‒ Custom callbacks (unknown code execution).
‒ Specific task wakeups (maintaining list order is tricky).
22
Simple wait-queues
• Similar to the traditional (bulky) waitqueues, yet
guarantees bounded irq and lock hold times.
‒ Taken from PREEMPT_RT.
• To accomplish this, it must remove wq complexities:
‒ Mixing INTERRUPTIBLE and UNINTERRUPTIBLE blocking in
the same queue. Ends up doing O(n) lookups.
‒ Custom callbacks (unknown code execution).
‒ Specific task wakeups (maintaining list order is tricky).
• Results in a simpler wq, less memory footprint.
23
Simple wait-queues
• swait_wake_all(): task context
raw_spin_lock_irq(&q->lock);
list_splice_init(&q->task_list, &tmp);
while (!list_empty(&tmp)) {
curr = list_first_entry(&tmp, typeof(*curr), task_list);
wake_up_state(curr->task, TASK_NORMAL);
list_del_init(&curr->task_list);
if (list_empty(&tmp))
break;
raw_spin_unlock_irq(&q->lock);
raw_spin_lock_irq(&q->lock);
}
raw_spin_unlock_irq(&q->lock);
24
Simple wait-queues
• swait_wake_all(): task context
raw_spin_lock_irq(&q->lock);
list_splice_init(&q->task_list, &tmp);
while (!list_empty(&tmp)) {
curr = list_first_entry(&tmp, typeof(*curr), task_list);
wake_up_state(curr->task, TASK_NORMAL);
list_del_init(&curr->task_list);
if (list_empty(&tmp))
break;
raw_spin_unlock_irq(&q->lock);
raw_spin_lock_irq(&q->lock);
}
raw_spin_unlock_irq(&q->lock);
drop the lock (and
IRQ disable)
after every wakeup
Lockless Wake Queues
26
Wake Queues
• Internally acknowledge that one or more tasks are to
be awoken, then call wake_up_process() after
releasing a lock.
‒ wake_q_add()/wake_up_q()
• Hold reference to each task in the list across the
wakeup thus guaranteeing that the memory is still
valid by the time the actual wakeups are performed in
wake_up_q().
27
Wake Queues
• Maintains caller wakeup order.
• Works particularly well for batch wakeups of tasks
blocked on a particular event.
‒ Futexes, locking, ipc.
28
Wake Queues
rcuwait
30
rcuwait
• task_struct is not properly rcu protected unless
dealing with an rcu aware list
‒ find_task_by_*().
• delayed_put_task_struct() (via release_task()) can
drop the last reference to a task.
‒ Bogus wakeups, etc.
• Provides a way of blocking and waking up a single
task in an rcu-safe manner.
31
rcuwait
• But what about task_rcu_dereference()?
‒ Task freeing detection
‒ probe_kernel_read()
‒ May return a new task (false positives)
32
rcuwait
• If we ensure a waiter is blocked on an event before
calling do_exit()/exit_notify(), we can avoid all of
task_rcu_dereference overhead an magic.
• Currently used in percpu-semaphores.
References
34
References
• Documentation/memory-barriers.txt
• Documentation/scheduler/completion.txt
• Simple Wait Queues LWN
• Return of Simple Waitqueues (LWN)
• Source Code:
‒ kernel/sched/{wait,swait}.c
‒ kernel/sched/exit.c (rcuwait)
‒ include/linux/sched/wake_q.h
Thank you.
36

More Related Content

What's hot

PostgreSQL Troubleshoot On-line, (RITfest 2015 meetup at Moscow, Russia).
PostgreSQL Troubleshoot On-line, (RITfest 2015 meetup at Moscow, Russia).PostgreSQL Troubleshoot On-line, (RITfest 2015 meetup at Moscow, Russia).
PostgreSQL Troubleshoot On-line, (RITfest 2015 meetup at Moscow, Russia).Alexey Lesovsky
 
FreeRTOS Xilinx Vivado: Hello World!
FreeRTOS Xilinx Vivado: Hello World!FreeRTOS Xilinx Vivado: Hello World!
FreeRTOS Xilinx Vivado: Hello World!Vincent Claes
 
【Maclean liu技术分享】拨开oracle cbo优化器迷雾,探究histogram直方图之秘 0321
【Maclean liu技术分享】拨开oracle cbo优化器迷雾,探究histogram直方图之秘 0321【Maclean liu技术分享】拨开oracle cbo优化器迷雾,探究histogram直方图之秘 0321
【Maclean liu技术分享】拨开oracle cbo优化器迷雾,探究histogram直方图之秘 0321maclean liu
 
你所不知道的Oracle后台进程Smon功能
你所不知道的Oracle后台进程Smon功能你所不知道的Oracle后台进程Smon功能
你所不知道的Oracle后台进程Smon功能maclean liu
 
Troubleshooting PostgreSQL with pgCenter
Troubleshooting PostgreSQL with pgCenterTroubleshooting PostgreSQL with pgCenter
Troubleshooting PostgreSQL with pgCenterAlexey Lesovsky
 
Java Core | Understanding the Disruptor: a Beginner's Guide to Hardcore Concu...
Java Core | Understanding the Disruptor: a Beginner's Guide to Hardcore Concu...Java Core | Understanding the Disruptor: a Beginner's Guide to Hardcore Concu...
Java Core | Understanding the Disruptor: a Beginner's Guide to Hardcore Concu...JAX London
 
Rtos 3 & 4
Rtos 3 & 4Rtos 3 & 4
Rtos 3 & 4PRADEEP
 
Infinum iOS Talks #1 - Swift under the hood: Method Dispatching by Vlaho Poluta
Infinum iOS Talks #1 - Swift under the hood: Method Dispatching by Vlaho PolutaInfinum iOS Talks #1 - Swift under the hood: Method Dispatching by Vlaho Poluta
Infinum iOS Talks #1 - Swift under the hood: Method Dispatching by Vlaho PolutaInfinum
 
WiredTiger In-Memory vs WiredTiger B-Tree
WiredTiger In-Memory vs WiredTiger B-TreeWiredTiger In-Memory vs WiredTiger B-Tree
WiredTiger In-Memory vs WiredTiger B-TreeSveta Smirnova
 
Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Alexey Lesovsky
 
Масштабируемая конфигурация Nginx, Игорь Сысоев (Nginx)
Масштабируемая конфигурация Nginx, Игорь Сысоев (Nginx)Масштабируемая конфигурация Nginx, Игорь Сысоев (Nginx)
Масштабируемая конфигурация Nginx, Игорь Сысоев (Nginx)Ontico
 
When the OS gets in the way
When the OS gets in the wayWhen the OS gets in the way
When the OS gets in the wayMark Price
 
Jenkins 2を使った究極のpipeline ~ 明日もう一度来てください、本物のpipelineをお見せしますよ ~
Jenkins 2を使った究極のpipeline ~ 明日もう一度来てください、本物のpipelineをお見せしますよ ~Jenkins 2を使った究極のpipeline ~ 明日もう一度来てください、本物のpipelineをお見せしますよ ~
Jenkins 2を使った究極のpipeline ~ 明日もう一度来てください、本物のpipelineをお見せしますよ ~ikikko
 
The Node.js Event Loop: Not So Single Threaded
The Node.js Event Loop: Not So Single ThreadedThe Node.js Event Loop: Not So Single Threaded
The Node.js Event Loop: Not So Single ThreadedBryan Hughes
 

What's hot (20)

Node day 2014
Node day 2014Node day 2014
Node day 2014
 
PostgreSQL Troubleshoot On-line, (RITfest 2015 meetup at Moscow, Russia).
PostgreSQL Troubleshoot On-line, (RITfest 2015 meetup at Moscow, Russia).PostgreSQL Troubleshoot On-line, (RITfest 2015 meetup at Moscow, Russia).
PostgreSQL Troubleshoot On-line, (RITfest 2015 meetup at Moscow, Russia).
 
Node js lecture
Node js lectureNode js lecture
Node js lecture
 
Cuda 2
Cuda 2Cuda 2
Cuda 2
 
Java 8 고급 (4/6)
Java 8 고급 (4/6)Java 8 고급 (4/6)
Java 8 고급 (4/6)
 
FreeRTOS Xilinx Vivado: Hello World!
FreeRTOS Xilinx Vivado: Hello World!FreeRTOS Xilinx Vivado: Hello World!
FreeRTOS Xilinx Vivado: Hello World!
 
【Maclean liu技术分享】拨开oracle cbo优化器迷雾,探究histogram直方图之秘 0321
【Maclean liu技术分享】拨开oracle cbo优化器迷雾,探究histogram直方图之秘 0321【Maclean liu技术分享】拨开oracle cbo优化器迷雾,探究histogram直方图之秘 0321
【Maclean liu技术分享】拨开oracle cbo优化器迷雾,探究histogram直方图之秘 0321
 
Qt Network Explained (Portuguese)
Qt Network Explained (Portuguese)Qt Network Explained (Portuguese)
Qt Network Explained (Portuguese)
 
你所不知道的Oracle后台进程Smon功能
你所不知道的Oracle后台进程Smon功能你所不知道的Oracle后台进程Smon功能
你所不知道的Oracle后台进程Smon功能
 
Osol Pgsql
Osol PgsqlOsol Pgsql
Osol Pgsql
 
Troubleshooting PostgreSQL with pgCenter
Troubleshooting PostgreSQL with pgCenterTroubleshooting PostgreSQL with pgCenter
Troubleshooting PostgreSQL with pgCenter
 
Java Core | Understanding the Disruptor: a Beginner's Guide to Hardcore Concu...
Java Core | Understanding the Disruptor: a Beginner's Guide to Hardcore Concu...Java Core | Understanding the Disruptor: a Beginner's Guide to Hardcore Concu...
Java Core | Understanding the Disruptor: a Beginner's Guide to Hardcore Concu...
 
Rtos 3 & 4
Rtos 3 & 4Rtos 3 & 4
Rtos 3 & 4
 
Infinum iOS Talks #1 - Swift under the hood: Method Dispatching by Vlaho Poluta
Infinum iOS Talks #1 - Swift under the hood: Method Dispatching by Vlaho PolutaInfinum iOS Talks #1 - Swift under the hood: Method Dispatching by Vlaho Poluta
Infinum iOS Talks #1 - Swift under the hood: Method Dispatching by Vlaho Poluta
 
WiredTiger In-Memory vs WiredTiger B-Tree
WiredTiger In-Memory vs WiredTiger B-TreeWiredTiger In-Memory vs WiredTiger B-Tree
WiredTiger In-Memory vs WiredTiger B-Tree
 
Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.
 
Масштабируемая конфигурация Nginx, Игорь Сысоев (Nginx)
Масштабируемая конфигурация Nginx, Игорь Сысоев (Nginx)Масштабируемая конфигурация Nginx, Игорь Сысоев (Nginx)
Масштабируемая конфигурация Nginx, Игорь Сысоев (Nginx)
 
When the OS gets in the way
When the OS gets in the wayWhen the OS gets in the way
When the OS gets in the way
 
Jenkins 2を使った究極のpipeline ~ 明日もう一度来てください、本物のpipelineをお見せしますよ ~
Jenkins 2を使った究極のpipeline ~ 明日もう一度来てください、本物のpipelineをお見せしますよ ~Jenkins 2を使った究極のpipeline ~ 明日もう一度来てください、本物のpipelineをお見せしますよ ~
Jenkins 2を使った究極のpipeline ~ 明日もう一度来てください、本物のpipelineをお見せしますよ ~
 
The Node.js Event Loop: Not So Single Threaded
The Node.js Event Loop: Not So Single ThreadedThe Node.js Event Loop: Not So Single Threaded
The Node.js Event Loop: Not So Single Threaded
 

Similar to Much Ado About Blocking: Wait/Wakke in the Linux Kernel

Show innodb status
Show innodb statusShow innodb status
Show innodb statusjustlooks
 
Dead Lock Analysis of spin_lock() in Linux Kernel (english)
Dead Lock Analysis of spin_lock() in Linux Kernel (english)Dead Lock Analysis of spin_lock() in Linux Kernel (english)
Dead Lock Analysis of spin_lock() in Linux Kernel (english)Sneeker Yeh
 
Embedded_ PPT_4-5 unit_Dr Monika-edited.pptx
Embedded_ PPT_4-5 unit_Dr Monika-edited.pptxEmbedded_ PPT_4-5 unit_Dr Monika-edited.pptx
Embedded_ PPT_4-5 unit_Dr Monika-edited.pptxProfMonikaJain
 
OS Process Synchronization, semaphore and Monitors
OS Process Synchronization, semaphore and MonitorsOS Process Synchronization, semaphore and Monitors
OS Process Synchronization, semaphore and Monitorssgpraju
 
semaphore & mutex.pdf
semaphore & mutex.pdfsemaphore & mutex.pdf
semaphore & mutex.pdfAdrian Huang
 
Node.js Event Loop & EventEmitter
Node.js Event Loop & EventEmitterNode.js Event Loop & EventEmitter
Node.js Event Loop & EventEmitterSimen Li
 
Wait queue
Wait queueWait queue
Wait queueRoy Lee
 
Java util concurrent
Java util concurrentJava util concurrent
Java util concurrentRoger Xia
 
Concurrency 2010
Concurrency 2010Concurrency 2010
Concurrency 2010敬倫 林
 
Synchronization problem with threads
Synchronization problem with threadsSynchronization problem with threads
Synchronization problem with threadsSyed Zaid Irshad
 
Wait for your fortune without Blocking!
Wait for your fortune without Blocking!Wait for your fortune without Blocking!
Wait for your fortune without Blocking!Roman Elizarov
 
Algorithm analysis.pptx
Algorithm analysis.pptxAlgorithm analysis.pptx
Algorithm analysis.pptxDrBashirMSaad
 
Async and parallel patterns and application design - TechDays2013 NL
Async and parallel patterns and application design - TechDays2013 NLAsync and parallel patterns and application design - TechDays2013 NL
Async and parallel patterns and application design - TechDays2013 NLArie Leeuwesteijn
 
Process Scheduler and Balancer in Linux Kernel
Process Scheduler and Balancer in Linux KernelProcess Scheduler and Balancer in Linux Kernel
Process Scheduler and Balancer in Linux KernelHaifeng Li
 
IOT Firmware: Best Pratices
IOT Firmware:  Best PraticesIOT Firmware:  Best Pratices
IOT Firmware: Best Praticesfarmckon
 
Profiling the logwriter and database writer
Profiling the logwriter and database writerProfiling the logwriter and database writer
Profiling the logwriter and database writerEnkitec
 

Similar to Much Ado About Blocking: Wait/Wakke in the Linux Kernel (20)

Show innodb status
Show innodb statusShow innodb status
Show innodb status
 
Dead Lock Analysis of spin_lock() in Linux Kernel (english)
Dead Lock Analysis of spin_lock() in Linux Kernel (english)Dead Lock Analysis of spin_lock() in Linux Kernel (english)
Dead Lock Analysis of spin_lock() in Linux Kernel (english)
 
Embedded_ PPT_4-5 unit_Dr Monika-edited.pptx
Embedded_ PPT_4-5 unit_Dr Monika-edited.pptxEmbedded_ PPT_4-5 unit_Dr Monika-edited.pptx
Embedded_ PPT_4-5 unit_Dr Monika-edited.pptx
 
FreeRTOS Slides annotations.pdf
FreeRTOS Slides annotations.pdfFreeRTOS Slides annotations.pdf
FreeRTOS Slides annotations.pdf
 
Os3
Os3Os3
Os3
 
OS Process Synchronization, semaphore and Monitors
OS Process Synchronization, semaphore and MonitorsOS Process Synchronization, semaphore and Monitors
OS Process Synchronization, semaphore and Monitors
 
semaphore & mutex.pdf
semaphore & mutex.pdfsemaphore & mutex.pdf
semaphore & mutex.pdf
 
Node.js Event Loop & EventEmitter
Node.js Event Loop & EventEmitterNode.js Event Loop & EventEmitter
Node.js Event Loop & EventEmitter
 
Wait queue
Wait queueWait queue
Wait queue
 
Java util concurrent
Java util concurrentJava util concurrent
Java util concurrent
 
Concurrency 2010
Concurrency 2010Concurrency 2010
Concurrency 2010
 
Synchronization problem with threads
Synchronization problem with threadsSynchronization problem with threads
Synchronization problem with threads
 
Wait for your fortune without Blocking!
Wait for your fortune without Blocking!Wait for your fortune without Blocking!
Wait for your fortune without Blocking!
 
Algorithm analysis.pptx
Algorithm analysis.pptxAlgorithm analysis.pptx
Algorithm analysis.pptx
 
FPGA Tutorial - LCD Interface
FPGA Tutorial - LCD InterfaceFPGA Tutorial - LCD Interface
FPGA Tutorial - LCD Interface
 
Kernel
KernelKernel
Kernel
 
Async and parallel patterns and application design - TechDays2013 NL
Async and parallel patterns and application design - TechDays2013 NLAsync and parallel patterns and application design - TechDays2013 NL
Async and parallel patterns and application design - TechDays2013 NL
 
Process Scheduler and Balancer in Linux Kernel
Process Scheduler and Balancer in Linux KernelProcess Scheduler and Balancer in Linux Kernel
Process Scheduler and Balancer in Linux Kernel
 
IOT Firmware: Best Pratices
IOT Firmware:  Best PraticesIOT Firmware:  Best Pratices
IOT Firmware: Best Pratices
 
Profiling the logwriter and database writer
Profiling the logwriter and database writerProfiling the logwriter and database writer
Profiling the logwriter and database writer
 

Recently uploaded

Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfPower Karaoke
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningVitsRangannavar
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 

Recently uploaded (20)

Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdf
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learning
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 

Much Ado About Blocking: Wait/Wakke in the Linux Kernel

  • 1. Much Ado About Blocking: Wait/Wake in the Linux Kernel LinuxCon China – June 2017. Beijing, China. Davidlohr Bueso <dave@stgolabs.net> SUSE Labs.
  • 2. 2 Agenda 1. Fundamentals of Blocking 2.Flavors of blocking ● Wait-queues ● Simple wait-queues ● Lockless wake-queues ● rcuwait
  • 4. 4 Blocking 101 • As opposed to busy waiting, sleeping is required in scenarios where the wait time can be too long. ‒ Other tasks ought to therefore run instead of wasting cycles. ‒ Overhead of context switch vs wasting cycles.
  • 5. 5 Blocking 101 • As opposed to busy waiting, sleeping is required in scenarios where the wait time can be too long. ‒ Other tasks ought to therefore run instead of wasting cycles. ‒ Overhead of context switch vs wasting cycles. • TASK_{UNINTERRUPTIBLE,INTERRUPTIBLE,KILLABLE} • There are three elements to consider when waiting on an event: ‒ The wakee (wake_up, wake_up_process, etc). ‒ The waker (schedule, schedule_timeout, io, etc). ‒ The condition/event
  • 7. 7 sleep_on() CPU0 while (!cond) sleep_on() CPU1 cond = true wake_up() • Inherently racy on SMP. ‒ Missed wakeups • Synchronization must be provided. ‒ Locking ‒ Memory barriers
  • 8. 8 SMP-safe Blocking for (;;) { set_current_state(); if (cond) break; schedule(); } __set_current_state(TASK_RUNNING);
  • 9. 9 SMP-safe Blocking for (;;) { set_current_state(); if (cond) break; schedule(); } __set_current_state(TASK_RUNNING); Pairs with barrier in wake_up()
  • 11. 11 Wait-queues • Provide different wrappers, such that users do not suffer from races aforementioned. for (;;) { long __int = prepare_to_wait_event(&wq, &__wait, state); if (condition) break; schedule(); } finish_wait(&wq, &__wait);
  • 12. 12 Wait-queues • Provide different wrappers, such that users do not suffer from races aforementioned. for (;;) { long __int = prepare_to_wait_event(&wq, &__wait, state); if (condition) break; schedule(); } finish_wait(&wq, &__wait);
  • 13. 13 Wait-queues • Provide different wrappers, such that users do not suffer from races aforementioned. for (;;) { long __int = prepare_to_wait_event(&wq, &__wait, state); if (condition) break; schedule(); } finish_wait(&wq, &__wait);
  • 14. 14 Wait-queues • Provide different wrappers, such that users do not suffer from races aforementioned. for (;;) { long __int = prepare_to_wait_event(&wq, &__wait, state); if (condition) break; schedule(); } finish_wait(&wq, &__wait); removal keeps the process from seeing multiple wakeups
  • 15. 15 When Wait-queues Were Simple • Basic linked list of waiting threads. • A wake_up() call on a wait queue would walk the list, putting each thread into the runnable state.
  • 16. 16 Wait-queues • Exclusive Wait ‒ Only the first N tasks are awoken. • Callback mechanism were added for asynchronous I/O facility could step in instead.
  • 17. 17 Wait-queues • Lockless waitqueue_active() checks CPU0 - waker CPU1 - waiter for (;;) { cond = true; prepare_to_wait(&wq, &wait, state); smp_mb(); // smp_mb() from set_current_state() if (waitqueue_active(wq)) if (cond) wake_up(wq); break; schedule(); } finish_wait(&wq, &wait);
  • 18. 18 Wait-queues as Building Blocks • Completions ‒ Allows processes to wait until another task reaches some point or state. ‒ Similar to pthread_barrier(). ‒ Documents the code very well. • Bit waiting.
  • 19. 19 Wait-queues and RT • Waitqueues are a big problem for realtime as they use spinlocks, which in RT are sleepable. ‒ Cannot convert convert to raw spinlock due to callback mechanism. ‒ This means we cannot perform wakeups in IRQ context.
  • 21. 21 Simple wait-queues • Similar to the traditional (bulky) waitqueues, yet guarantees bounded irq and lock hold times. ‒ Taken from PREEMPT_RT. • To accomplish this, it must remove wq complexities: ‒ Mixing INTERRUPTIBLE and UNINTERRUPTIBLE blocking in the same queue. Ends up doing O(n) lookups. ‒ Custom callbacks (unknown code execution). ‒ Specific task wakeups (maintaining list order is tricky).
  • 22. 22 Simple wait-queues • Similar to the traditional (bulky) waitqueues, yet guarantees bounded irq and lock hold times. ‒ Taken from PREEMPT_RT. • To accomplish this, it must remove wq complexities: ‒ Mixing INTERRUPTIBLE and UNINTERRUPTIBLE blocking in the same queue. Ends up doing O(n) lookups. ‒ Custom callbacks (unknown code execution). ‒ Specific task wakeups (maintaining list order is tricky). • Results in a simpler wq, less memory footprint.
  • 23. 23 Simple wait-queues • swait_wake_all(): task context raw_spin_lock_irq(&q->lock); list_splice_init(&q->task_list, &tmp); while (!list_empty(&tmp)) { curr = list_first_entry(&tmp, typeof(*curr), task_list); wake_up_state(curr->task, TASK_NORMAL); list_del_init(&curr->task_list); if (list_empty(&tmp)) break; raw_spin_unlock_irq(&q->lock); raw_spin_lock_irq(&q->lock); } raw_spin_unlock_irq(&q->lock);
  • 24. 24 Simple wait-queues • swait_wake_all(): task context raw_spin_lock_irq(&q->lock); list_splice_init(&q->task_list, &tmp); while (!list_empty(&tmp)) { curr = list_first_entry(&tmp, typeof(*curr), task_list); wake_up_state(curr->task, TASK_NORMAL); list_del_init(&curr->task_list); if (list_empty(&tmp)) break; raw_spin_unlock_irq(&q->lock); raw_spin_lock_irq(&q->lock); } raw_spin_unlock_irq(&q->lock); drop the lock (and IRQ disable) after every wakeup
  • 26. 26 Wake Queues • Internally acknowledge that one or more tasks are to be awoken, then call wake_up_process() after releasing a lock. ‒ wake_q_add()/wake_up_q() • Hold reference to each task in the list across the wakeup thus guaranteeing that the memory is still valid by the time the actual wakeups are performed in wake_up_q().
  • 27. 27 Wake Queues • Maintains caller wakeup order. • Works particularly well for batch wakeups of tasks blocked on a particular event. ‒ Futexes, locking, ipc.
  • 30. 30 rcuwait • task_struct is not properly rcu protected unless dealing with an rcu aware list ‒ find_task_by_*(). • delayed_put_task_struct() (via release_task()) can drop the last reference to a task. ‒ Bogus wakeups, etc. • Provides a way of blocking and waking up a single task in an rcu-safe manner.
  • 31. 31 rcuwait • But what about task_rcu_dereference()? ‒ Task freeing detection ‒ probe_kernel_read() ‒ May return a new task (false positives)
  • 32. 32 rcuwait • If we ensure a waiter is blocked on an event before calling do_exit()/exit_notify(), we can avoid all of task_rcu_dereference overhead an magic. • Currently used in percpu-semaphores.
  • 34. 34 References • Documentation/memory-barriers.txt • Documentation/scheduler/completion.txt • Simple Wait Queues LWN • Return of Simple Waitqueues (LWN) • Source Code: ‒ kernel/sched/{wait,swait}.c ‒ kernel/sched/exit.c (rcuwait) ‒ include/linux/sched/wake_q.h
  • 36. 36