The document summarizes Linux synchronization mechanisms including semaphores and mutexes. It discusses:
1. Semaphores can be used to solve producer-consumer problems and are implemented using a count and wait list.
2. Mutexes enforce serialization on shared memory and have fast, mid, and slow paths for lock and unlock. The mid path uses optimistic spinning and OSQs.
3. Only the lock owner can unlock a mutex, and mutexes transition to the slow path if the owner is preempted, a spinner is preempted, or the owner sleeps.
Note: When you view the the slide deck via web browser, the screenshots may be blurred. You can download and view them offline (Screenshots are clear).
Reverse Mapping (rmap) in Linux KernelAdrian Huang
Note: When you view the the slide deck via web browser, the screenshots may be blurred. You can download and view them offline (Screenshots are clear).
Note: When you view the the slide deck via web browser, the screenshots may be blurred. You can download and view them offline (Screenshots are clear).
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...Adrian Huang
Note: When you view the the slide deck via web browser, the screenshots may be blurred. You can download and view them offline (Screenshots are clear).
Note: When you view the the slide deck via web browser, the screenshots may be blurred. You can download and view them offline (Screenshots are clear).
Page cache mechanism in Linux kernel.
Note: When you view the the slide deck via web browser, the screenshots may be blurred. You can download and view them offline (Screenshots are clear).
Note: When you view the the slide deck via web browser, the screenshots may be blurred. You can download and view them offline (Screenshots are clear).
Reverse Mapping (rmap) in Linux KernelAdrian Huang
Note: When you view the the slide deck via web browser, the screenshots may be blurred. You can download and view them offline (Screenshots are clear).
Note: When you view the the slide deck via web browser, the screenshots may be blurred. You can download and view them offline (Screenshots are clear).
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...Adrian Huang
Note: When you view the the slide deck via web browser, the screenshots may be blurred. You can download and view them offline (Screenshots are clear).
Note: When you view the the slide deck via web browser, the screenshots may be blurred. You can download and view them offline (Screenshots are clear).
Page cache mechanism in Linux kernel.
Note: When you view the the slide deck via web browser, the screenshots may be blurred. You can download and view them offline (Screenshots are clear).
Memory Mapping Implementation (mmap) in Linux KernelAdrian Huang
Note: When you view the the slide deck via web browser, the screenshots may be blurred. You can download and view them offline (Screenshots are clear).
Anatomy of the loadable kernel module (lkm)Adrian Huang
Talk about how Linux kernel invokes your module's init function.
Note: When you view the the slide deck via web browser, the screenshots may be blurred. You can download and view them offline (Screenshots are clear).
Talk by Brendan Gregg for USENIX LISA 2019: Linux Systems Performance. Abstract: "
Systems performance is an effective discipline for performance analysis and tuning, and can help you find performance wins for your applications and the kernel. However, most of us are not performance or kernel engineers, and have limited time to study this topic. This talk summarizes the topic for everyone, touring six important areas of Linux systems performance: observability tools, methodologies, benchmarking, profiling, tracing, and tuning. Included are recipes for Linux performance analysis and tuning (using vmstat, mpstat, iostat, etc), overviews of complex areas including profiling (perf_events) and tracing (Ftrace, bcc/BPF, and bpftrace/BPF), and much advice about what is and isn't important to learn. This talk is aimed at everyone: developers, operations, sysadmins, etc, and in any environment running Linux, bare metal or the cloud."
Process Address Space: The way to create virtual address (page table) of user...Adrian Huang
Process Address Space: The way to create virtual address (page table) of userspace application.
Note: When you view the the slide deck via web browser, the screenshots may be blurred. You can download and view them offline (Screenshots are clear).
Vmlinux: anatomy of bzimage and how x86 64 processor is bootedAdrian Huang
This slide deck describes the Linux booting flow for x86_64 processors.
Note: When you view the the slide deck via web browser, the screenshots may be blurred. You can download and view them offline (Screenshots are clear).
Virtual File System in Linux Kernel
Note: When you view the the slide deck via web browser, the screenshots may be blurred. You can download and view them offline (Screenshots are clear).
Note: When you view the the slide deck via web browser, the screenshots may be blurred. You can download and view them offline (Screenshots are clear).
Linux Kernel Booting Process (2) - For NLKBshimosawa
Describes the bootstrapping part in Linux, and related architectural mechanisms and technologies.
This is the part two of the slides, and the succeeding slides may contain the errata for this slide.
Decompressed vmlinux: linux kernel initialization from page table configurati...Adrian Huang
Talk about how Linux kernel initializes the page table.
Note: When you view the the slide deck via web browser, the screenshots may be blurred. You can download and view them offline (Screenshots are clear).
Note: When you view the the slide deck via web browser, the screenshots may be blurred. You can download and view them offline (Screenshots are clear).
A brief overview of linux scheduler, context switch , priorities and scheduling classes as well as new features. Also provides an overview of preemption models in linux and how to use each model. all the examples are taken from http://www.discoversdk.com
UM2019 Extended BPF: A New Type of SoftwareBrendan Gregg
Keynote for Ubuntu Masters 2019 by Brendan Gregg, Netflix. Video https://www.youtube.com/watch?v=7pmXdG8-7WU&feature=youtu.be . "Extended BPF is a new type of software, and the first fundamental change to how kernels are used in 50 years. This new type of software is already in use by major companies: Netflix has 14 BPF programs running by default on all of its cloud servers, which run Ubuntu Linux. Facebook has 40 BPF programs running by default. Extended BPF is composed of an in-kernel runtime for executing a virtual BPF instruction set through a safety verifier and with JIT compilation. So far it has been used for software defined networking, performance tools, security policies, and device drivers, with more uses planned and more we have yet to think of. It is changing how we use and think about systems. This talk explores the past, present, and future of BPF, with BPF performance tools as a use case."
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all startedAnne Nicolas
Ftrace’s most powerful feature is the function tracer (and function graph tracer which is built from it). But to have this enabled on production systems, it had to have its overhead be negligible when disabled. As the function tracer uses gcc’s profiling mechanism, which adds a call to “mcount” (or more recently fentry, don’t worry if you don’t know what this is, it will all be explained) at the start of almost all functions, it had to do something about the overhead that causes. The solution was to turn those calls into “nops” (an instruction that the CPU simply ignores). But this was no easy feat. It took a lot to come up with a solution (and also turning a few network cards into bricks). This talk will explain the history of how ftrace came about implementing the function tracer, and brought with it the possibility of static branches and soon static calls!
Steven Rostedt
Memory Mapping Implementation (mmap) in Linux KernelAdrian Huang
Note: When you view the the slide deck via web browser, the screenshots may be blurred. You can download and view them offline (Screenshots are clear).
Anatomy of the loadable kernel module (lkm)Adrian Huang
Talk about how Linux kernel invokes your module's init function.
Note: When you view the the slide deck via web browser, the screenshots may be blurred. You can download and view them offline (Screenshots are clear).
Talk by Brendan Gregg for USENIX LISA 2019: Linux Systems Performance. Abstract: "
Systems performance is an effective discipline for performance analysis and tuning, and can help you find performance wins for your applications and the kernel. However, most of us are not performance or kernel engineers, and have limited time to study this topic. This talk summarizes the topic for everyone, touring six important areas of Linux systems performance: observability tools, methodologies, benchmarking, profiling, tracing, and tuning. Included are recipes for Linux performance analysis and tuning (using vmstat, mpstat, iostat, etc), overviews of complex areas including profiling (perf_events) and tracing (Ftrace, bcc/BPF, and bpftrace/BPF), and much advice about what is and isn't important to learn. This talk is aimed at everyone: developers, operations, sysadmins, etc, and in any environment running Linux, bare metal or the cloud."
Process Address Space: The way to create virtual address (page table) of user...Adrian Huang
Process Address Space: The way to create virtual address (page table) of userspace application.
Note: When you view the the slide deck via web browser, the screenshots may be blurred. You can download and view them offline (Screenshots are clear).
Vmlinux: anatomy of bzimage and how x86 64 processor is bootedAdrian Huang
This slide deck describes the Linux booting flow for x86_64 processors.
Note: When you view the the slide deck via web browser, the screenshots may be blurred. You can download and view them offline (Screenshots are clear).
Virtual File System in Linux Kernel
Note: When you view the the slide deck via web browser, the screenshots may be blurred. You can download and view them offline (Screenshots are clear).
Note: When you view the the slide deck via web browser, the screenshots may be blurred. You can download and view them offline (Screenshots are clear).
Linux Kernel Booting Process (2) - For NLKBshimosawa
Describes the bootstrapping part in Linux, and related architectural mechanisms and technologies.
This is the part two of the slides, and the succeeding slides may contain the errata for this slide.
Decompressed vmlinux: linux kernel initialization from page table configurati...Adrian Huang
Talk about how Linux kernel initializes the page table.
Note: When you view the the slide deck via web browser, the screenshots may be blurred. You can download and view them offline (Screenshots are clear).
Note: When you view the the slide deck via web browser, the screenshots may be blurred. You can download and view them offline (Screenshots are clear).
A brief overview of linux scheduler, context switch , priorities and scheduling classes as well as new features. Also provides an overview of preemption models in linux and how to use each model. all the examples are taken from http://www.discoversdk.com
UM2019 Extended BPF: A New Type of SoftwareBrendan Gregg
Keynote for Ubuntu Masters 2019 by Brendan Gregg, Netflix. Video https://www.youtube.com/watch?v=7pmXdG8-7WU&feature=youtu.be . "Extended BPF is a new type of software, and the first fundamental change to how kernels are used in 50 years. This new type of software is already in use by major companies: Netflix has 14 BPF programs running by default on all of its cloud servers, which run Ubuntu Linux. Facebook has 40 BPF programs running by default. Extended BPF is composed of an in-kernel runtime for executing a virtual BPF instruction set through a safety verifier and with JIT compilation. So far it has been used for software defined networking, performance tools, security policies, and device drivers, with more uses planned and more we have yet to think of. It is changing how we use and think about systems. This talk explores the past, present, and future of BPF, with BPF performance tools as a use case."
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all startedAnne Nicolas
Ftrace’s most powerful feature is the function tracer (and function graph tracer which is built from it). But to have this enabled on production systems, it had to have its overhead be negligible when disabled. As the function tracer uses gcc’s profiling mechanism, which adds a call to “mcount” (or more recently fentry, don’t worry if you don’t know what this is, it will all be explained) at the start of almost all functions, it had to do something about the overhead that causes. The solution was to turn those calls into “nops” (an instruction that the CPU simply ignores). But this was no easy feat. It took a lot to come up with a solution (and also turning a few network cards into bricks). This talk will explain the history of how ftrace came about implementing the function tracer, and brought with it the possibility of static branches and soon static calls!
Steven Rostedt
In computer science, synchronization refers to one of two distinct but related concepts: synchronization of processes, and synchronization of data. Process synchronization refers to the idea that multiple processes are to join up or handshake at a certain point, in order to reach an agreement or commit to a certain sequence of action. Data synchronization refers to the idea of keeping multiple copies of a dataset in coherence with one another, or to maintain data integrity. Process synchronization primitives are commonly used to implement data synchronization.
Training Slides: Basics 105: Backup, Recovery and Provisioning Within Tungste...Continuent
Join us for this training session where we discuss tools for backing up MySQL databases within Tungsten Clustering, and how Tungsten Clustering makes it easy to reprovision databases and recover a cluster. This training is for all engineers using Tungsten Clustering and will explain the tools needed to create a sound backup and recovery strategy, and is also valuable for those who wish to review their current MySQL backup and MySQL recovery procedures.
AGENDA
- Methods and tools for taking a MySQL backup from within a Tungsten cluster
- mysqldump
- xtrackbackup
- Snapshots (lvm or other)
- File copy
- tungsten_provision_slave
- Restoring a MySQL backup from within a Tungsten cluster
- Verifying the backup contains the last binary log position, and the importance of having this
- Restoring backups into a cluster
- Provisioning slaves from an existing datasource
Training Slides: 203 - Backup & RecoveryContinuent
Watch this 36min training to learn about planning for backups, what some of the methods and tools are, how to restore backups and more.
TOPICS COVERED
- How to develop a backup plan
- Methods and tools for taking a backup
- Verifying the backup contains the last binary position, and the importance of this
- Restore backups into the cluster
- Provision a replica from an existing datasource
The openSUSE Tumbleweed kernel is lockded-down since v6.4.3 when secure boot is enabled. It means that the behavior of Tumbleweed kernel will align with SLE and openSUSE Leap when secure boot is enabled.
Implementing efficient spinlocks in userspace is not possible yet in Linux,
even after years of different approaches and proposed solutions.The main gap to
achieve it is the lack of ABI providing an easy and low-overhead way to check
if the current lock holder is running or not.
In this session, we are going to present the problem, and to propose a solution
for it using the restartable sequences infrastructure as means to expose the
thread state to userspace in a cheap way, without requiring system calls.
RFC:
https://lore.kernel.org/lkml/20230529191416.53955-1-mathieu.desnoyers@efficios.com/
(c) Linux Plumbers Conference 2023
15-15 Nov
Omni Richmond Hotel, Richmond, VA (US)
https://lpc.events/event/17/page/198-lpc-2023-overview
Slides from tech talk about the art of non-blocking waiting in Java with LockSupport.park/unpark and AbstractQueuedSynchronizer. Presented on JPoint 2016 Conference.
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Globus
Large Language Models (LLMs) are currently the center of attention in the tech world, particularly for their potential to advance research. In this presentation, we'll explore a straightforward and effective method for quickly initiating inference runs on supercomputers using the vLLM tool with Globus Compute, specifically on the Polaris system at ALCF. We'll begin by briefly discussing the popularity and applications of LLMs in various fields. Following this, we will introduce the vLLM tool, and explain how it integrates with Globus Compute to efficiently manage LLM operations on Polaris. Attendees will learn the practical aspects of setting up and remotely triggering LLMs from local machines, focusing on ease of use and efficiency. This talk is ideal for researchers and practitioners looking to leverage the power of LLMs in their work, offering a clear guide to harnessing supercomputing resources for quick and effective LLM inference.
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Anthony Dahanne
Les Buildpacks existent depuis plus de 10 ans ! D’abord, ils étaient utilisés pour détecter et construire une application avant de la déployer sur certains PaaS. Ensuite, nous avons pu créer des images Docker (OCI) avec leur dernière génération, les Cloud Native Buildpacks (CNCF en incubation). Sont-ils une bonne alternative au Dockerfile ? Que sont les buildpacks Paketo ? Quelles communautés les soutiennent et comment ?
Venez le découvrir lors de cette session ignite
Check out the webinar slides to learn more about how XfilesPro transforms Salesforce document management by leveraging its world-class applications. For more details, please connect with sales@xfilespro.com
If you want to watch the on-demand webinar, please click here: https://www.xfilespro.com/webinars/salesforce-document-management-2-0-smarter-faster-better/
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...Juraj Vysvader
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I didn't get rich from it but it did have 63K downloads (powered possible tens of thousands of websites).
Your Digital Assistant.
Making complex approach simple. Straightforward process saves time. No more waiting to connect with people that matter to you. Safety first is not a cliché - Securely protect information in cloud storage to prevent any third party from accessing data.
Would you rather make your visitors feel burdened by making them wait? Or choose VizMan for a stress-free experience? VizMan is an automated visitor management system that works for any industries not limited to factories, societies, government institutes, and warehouses. A new age contactless way of logging information of visitors, employees, packages, and vehicles. VizMan is a digital logbook so it deters unnecessary use of paper or space since there is no requirement of bundles of registers that is left to collect dust in a corner of a room. Visitor’s essential details, helps in scheduling meetings for visitors and employees, and assists in supervising the attendance of the employees. With VizMan, visitors don’t need to wait for hours in long queues. VizMan handles visitors with the value they deserve because we know time is important to you.
Feasible Features
One Subscription, Four Modules – Admin, Employee, Receptionist, and Gatekeeper ensures confidentiality and prevents data from being manipulated
User Friendly – can be easily used on Android, iOS, and Web Interface
Multiple Accessibility – Log in through any device from any place at any time
One app for all industries – a Visitor Management System that works for any organisation.
Stress-free Sign-up
Visitor is registered and checked-in by the Receptionist
Host gets a notification, where they opt to Approve the meeting
Host notifies the Receptionist of the end of the meeting
Visitor is checked-out by the Receptionist
Host enters notes and remarks of the meeting
Customizable Components
Scheduling Meetings – Host can invite visitors for meetings and also approve, reject and reschedule meetings
Single/Bulk invites – Invitations can be sent individually to a visitor or collectively to many visitors
VIP Visitors – Additional security of data for VIP visitors to avoid misuse of information
Courier Management – Keeps a check on deliveries like commodities being delivered in and out of establishments
Alerts & Notifications – Get notified on SMS, email, and application
Parking Management – Manage availability of parking space
Individual log-in – Every user has their own log-in id
Visitor/Meeting Analytics – Evaluate notes and remarks of the meeting stored in the system
Visitor Management System is a secure and user friendly database manager that records, filters, tracks the visitors to your organization.
"Secure Your Premises with VizMan (VMS) – Get It Now"
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?XfilesPro
Worried about document security while sharing them in Salesforce? Fret no more! Here are the top-notch security standards XfilesPro upholds to ensure strong security for your Salesforce documents while sharing with internal or external people.
To learn more, read the blog: https://www.xfilespro.com/how-does-xfilespro-make-document-sharing-secure-and-seamless-in-salesforce/
Quarkus Hidden and Forbidden ExtensionsMax Andersen
Quarkus has a vast extension ecosystem and is known for its subsonic and subatomic feature set. Some of these features are not as well known, and some extensions are less talked about, but that does not make them less interesting - quite the opposite.
Come join this talk to see some tips and tricks for using Quarkus and some of the lesser known features, extensions and development techniques.
Globus Connect Server Deep Dive - GlobusWorld 2024Globus
We explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This content is targeted at system administrators who are familiar with GCS and currently operate—or are planning to operate—broader deployments at their institution.
A Comprehensive Look at Generative AI in Retail App Testing.pdfkalichargn70th171
Traditional software testing methods are being challenged in retail, where customer expectations and technological advancements continually shape the landscape. Enter generative AI—a transformative subset of artificial intelligence technologies poised to revolutionize software testing.
Listen to the keynote address and hear about the latest developments from Rachana Ananthakrishnan and Ian Foster who review the updates to the Globus Platform and Service, and the relevance of Globus to the scientific community as an automation platform to accelerate scientific discovery.
How Recreation Management Software Can Streamline Your Operations.pptxwottaspaceseo
Recreation management software streamlines operations by automating key tasks such as scheduling, registration, and payment processing, reducing manual workload and errors. It provides centralized management of facilities, classes, and events, ensuring efficient resource allocation and facility usage. The software offers user-friendly online portals for easy access to bookings and program information, enhancing customer experience. Real-time reporting and data analytics deliver insights into attendance and preferences, aiding in strategic decision-making. Additionally, effective communication tools keep participants and staff informed with timely updates. Overall, recreation management software enhances efficiency, improves service delivery, and boosts customer satisfaction.
Code reviews are vital for ensuring good code quality. They serve as one of our last lines of defense against bugs and subpar code reaching production.
Yet, they often turn into annoying tasks riddled with frustration, hostility, unclear feedback and lack of standards. How can we improve this crucial process?
In this session we will cover:
- The Art of Effective Code Reviews
- Streamlining the Review Process
- Elevating Reviews with Automated Tools
By the end of this presentation, you'll have the knowledge on how to organize and improve your code review proces
We describe the deployment and use of Globus Compute for remote computation. This content is aimed at researchers who wish to compute on remote resources using a unified programming interface, as well as system administrators who will deploy and operate Globus Compute services on their research computing infrastructure.
Unleash Unlimited Potential with One-Time Purchase
BoxLang is more than just a language; it's a community. By choosing a Visionary License, you're not just investing in your success, you're actively contributing to the ongoing development and support of BoxLang.
Designing for Privacy in Amazon Web ServicesKrzysztofKkol1
Data privacy is one of the most critical issues that businesses face. This presentation shares insights on the principles and best practices for ensuring the resilience and security of your workload.
Drawing on a real-life project from the HR industry, the various challenges will be demonstrated: data protection, self-healing, business continuity, security, and transparency of data processing. This systematized approach allowed to create a secure AWS cloud infrastructure that not only met strict compliance rules but also exceeded the client's expectations.
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Globus
The U.S. Geological Survey (USGS) has made substantial investments in meeting evolving scientific, technical, and policy driven demands on storing, managing, and delivering data. As these demands continue to grow in complexity and scale, the USGS must continue to explore innovative solutions to improve its management, curation, sharing, delivering, and preservation approaches for large-scale research data. Supporting these needs, the USGS has partnered with the University of Chicago-Globus to research and develop advanced repository components and workflows leveraging its current investment in Globus. The primary outcome of this partnership includes the development of a prototype enterprise repository, driven by USGS Data Release requirements, through exploration and implementation of the entire suite of the Globus platform offerings, including Globus Flow, Globus Auth, Globus Transfer, and Globus Search. This presentation will provide insights into this research partnership, introduce the unique requirements and challenges being addressed and provide relevant project progress.
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
semaphore & mutex.pdf
1. * Based on kernel 5.11 (x86_64) – QEMU
* 2-socket CPUs (2 cores/socket)
* 16GB memory
* Kernel parameter: nokaslr norandmaps
* KASAN: disabled
* Userspace: ASLR is disabled
* Legacy BIOS
Linux Synchronization Mechanism: Semaphore &
Mutex
Adrian Huang | Feb, 2023
2. Agenda
• Semaphore
✓producer-consumer problem
✓Implementation in Linux kernel
• Mutex (introduced in v2.6.16)
✓Enforce serialization on shared memory systems
✓Implementation in Linux kernel
✓Mutex lock
➢Fast path, midpath, slow path
✓Mutex unlock
➢Fast path and slow path
➢Mutex ownership (with a lab)
◼ Re-visit this concept: Only the lock owner has the permission to unlock the mutex
✓Q & A
3. Semaphore: producer-consumer problem
task 0
Semaphore
wakeup/signal
#0
#1
#2
#3
Share resource
counter = 4
wait_list
wait/sleep
P, down()
V, up()
task 1
task N
.
.
task task task
• Sleeping lock
• Used in process context *ONLY*
• Cannot hold a spin lock while acquiring a semaphore
• Mainly use in producer-consumer scenario
• The lock holder does not require to unlock the lock. (non-ownership concept)
✓ Something like notification
6. Semaphore Implementation in Linux Kernel
[Only for interruptible and wakekill task] Check if the sleeping task gets a signal
7. Semaphore Implementation in Linux Kernel
1 Protect sem->count data
2
Reschedule: Need to unlock spinlock
down
__down
raw_spin_lock_irqsave
sem->count > 0
sem->count--
raw_spin_unlock_irqrestore
__down_common
Y
N
8. Semaphore Implementation in Linux Kernel
Protect sem->count data
up
__up
raw_spin_lock_irqsave
wait_list empty?
sem->count++
raw_spin_unlock_irqrestore
Y
N
semaphore
lock
count
wait_list semaphore_waiter
list
task
struct raw_spinlock or raw_spinlock_t
raw_lock
up
10. Mutex: Enforce serialization on shared memory systems
task 0
mutex_unlock()
mutex_lock()
task 1
task N
.
.
task task task
mutex
owner
wait_lock
osq
wait_list
Critical Section
task
rlock
struct spinlock or spinlock_t
atomic_t tail;
optimistic_spin_queue
Lock owner
Protect accessing members
of mutex struct
[midpath] spinning: busy-waiting
[slow path] waiting tasks: sleep
(non-busy waiting)
11. Mutex Implementation in Linux
• Mutex implementation paths
✓Fastpath: Uncontended case by using cmpxchg(): CAS (Compare and Swap)
✓Midpath (optimistic spinning) - The priority of the lock owner is the highest one
➢Spin for mutex lock acquisition when the lock owner is running.
➢The lock owner is likely to release the lock soon.
➢Leverage cancelable MCS lock (OSQ - Optimistic Spin Queue: MCS-like lock): v3.15
✓Slowpath: The task is added to the waiting queue and sleeps until woken up by the
unlock path
• Mutex is a hybrid type (spinning & sleeping): Busy-waiting for a few cycles
instead of immediately sleeping
• Ownership: Only the lock owner can release the lock
• kernel/locking/{mutex.c, osq_lock.c}
• Reference: Generic Mutex Subsystem
14. __mutex_trylock
__mutex_trylock_or_owner
mutex_can_spin_on_owner
The lock might be unlocked by another core
mutex_optimistic_spin
osq_lock
__mutex_trylock_or_owner
mutex_spin_on_owner
cpu_relax
osq_unlock
for (;;)
break if getting the lock
There’s an owner: break if the lock
is released or owner goes to sleep
Return true if the following conditions are met
• The spinning task is not preempted: need_resched()
• The lock owner:
✓ Not preempted : checked by vcpu_is_preempted()
✓ Not sleep: checked by owner->on_cpu
• Spinner is spinning on the current lock owner!
mutex_spin_on_owner() returns true → keep looping for
acquiring the lock
• Lock release: one of spinning tasks can get the lock
mutex_spin_on_owner() returns false → break ‘for’ loop
• The spinning task is preempted
• The lock owner is preempted
• The lock owner sleeps
mutex_lock(): midpath
mutex_can_spin_on_owner
mutex_spin_on_owner
15. __mutex_trylock
__mutex_trylock_or_owner
mutex_can_spin_on_owner
The lock might be unlocked by another core
mutex_optimistic_spin
osq_lock
__mutex_trylock_or_owner
mutex_spin_on_owner
cpu_relax
osq_unlock
for (;;)
break if getting the lock
There’s an owner: break if the lock
is released or owner goes to sleep
mutex_lock(): midpath
Second or later osq_lock() is spinned in this function.
First osq_lock() gets osq lock and spins in this loop.
Notify other osq spinners to get an osq lock.
16. owner
owner
mutex_unlock()
mutex_lock()
Critical Section
core 0
mutex_unlock()
mutex_lock()
Critical Section
core 1 core 2 core 3
spinning
mutex_unlock()
mutex_lock()
Critical Section
spinning
mutex_unlock()
mutex_lock()
Critical Section
spinning
midpath: [Case #1: ideal] without preemption or sleep (both lock
owner and spinner)
owner
owner
One of spinning tasks can get the lock after the owner releases the lock:
Spinning tasks do not need to be moved to wait list
17. mutex_unlock()
mutex_lock()
Critical Section
core 0
mutex_unlock()
mutex_lock()
Critical Section
core 1 core 2 core 3
spinning
mutex_unlock()
mutex_lock()
Critical Section
spinning
mutex_unlock()
mutex_lock()
Critical Section
spinning
midpath – [Case #1: ideal] lock release without preemption or
sleep
When to exit the spinning?
1. The lock owner releases the lock
2. The lock owner goes to sleep or is preempted: spinning tasks go to slow path
✓ Check task->on_cpu
✓ Functions: prepare_task(), finish_task()…
3. The spinning task is preempted: the spinning task goes to slow path
✓ need_resched()
owner
owner
owner
owner
18. Three cases for “cannot spin on
mutex owner”
• The lock owner is preempted
• The spinning task is preempted
• The lock owner sleeps
24. Who sets TIF_NEED_RESCHED?
Who sets TIF_NEED_RESCHED? → set_tsk_need_resched()
1. Call path
✓ timer_interrupt → tick_handle_periodic → tick_periodic
→ update_process_times → scheduler_tick → curr-
>sched_class->task_tick → task_tick_fair → entity_tick ->
check_preempt_tick -> resched_curr ->
set_tsk_need_resched
✓ HW interrupt (not timer HW) → wake up a higher priority
task
2. Users:
✓ check_preempt_tick(), check_preempt_wakeup(),
wake_up_process()….and so on.
25. Set TIF_NEED_RESCHED: current task will be rescheduled later
PREEMPT_NEED_RESCHED bit = 0 → Need to reschedule (check comments in this header)
Who sets TIF_NEED_RESCHED?
26. • Set TIF_NEED_RESCHED flag if the delta is greater than
ideal_runtime
✓ The running task will be scheduled out.
Who sets TIF_NEED_RESCHED?
27. Who sets TIF_NEED_RESCHED? → set_tsk_need_resched()
1. Call path
✓ timer_interrupt → tick_handle_periodic → tick_periodic →
update_process_times → scheduler_tick → curr->sched_class-
>task_tick → task_tick_fair → entity_tick -> check_preempt_tick
-> resched_curr -> set_tsk_need_resched
✓ HW interrupt (not timer HW) → wake up a higher priority task
2. Users:
✓ check_preempt_tick(), check_preempt_wakeup(),
wake_up_process()….and so on.
Who sets TIF_NEED_RESCHED?
28. Three cases for “cannot spin on
mutex owner”
• The lock owner is preempted
• The spinning task is preempted
• The lock owner sleeps
29. [Case #4] Locker owner sleeps (reschedule): A test kernel module
The action of sleep is identical to preemption and “wait for IO”: reschedule
Create 4 kernel threads
Source code (github): test-modules/mutex/mutex.c
36. mutex_unlock
__mutex_unlock_fast
mutex_unlock(): Call path
return
mutex
owner
wait_lock
osq
wait_list
task
• task_struct pointers aligns to at least L1_CACHE_BYTES
• 3 LSB bits are used for non-empty waiter list
✓ W (MUTEX_FLAG_WAITERS)
◼ Non-empty waiter list. Issue a wakeup when unlocking
✓ H (MUTEX_FLAG_HANDOFF)
◼ Unlock needs to hand the lock to the top-waiter
◼ Use by ww_mutex because ww_mutex’s waiter list is not FIFO order.
✓ P (MUTEX_FLAG_PICKUP)
◼ Handoff has been done and we're waiting for pickup
◼ Use by ww_mutex because ww_mutex’s waiter list is not FIFO order.
locker->owner = 0
__mutex_unlock_slowpath
Have waiters: One of 3-bit LSB
of lock->owner is not cleared.
spin_lock(&lock->wait_lock)
spin_unlock(&lock->wait_lock)
Get a waiter from lock->wait_list
wake_up_q
wake_up_process
owner
task virtual addr W
P H
0
1
2
63
lock->owner
__mutex_handoff Called if MUTEX_FLAG_HANDOFF is set
atomic_long_cmpxchg_release(
&lock->owner, owner,
__owner_flags(owner))
The woken task will update lock->owner
Set 3-bit LSB of lock->owner → Clear the
original task struct address
* ww_mutex (Wound/Wait Mutex): Deadlock-proof mutex
[Unlock task = lock->owner]
No waiter: 3-bit LSB of lock->owner are cleared
38. mutex_unlock
__mutex_unlock_fast
mutex_unlock(): slow path
return
locker->owner = 0
__mutex_unlock_slowpath
Have waiters: One of 3-bit LSB
of lock->owner is not cleared.
spin_lock(&lock->wait_lock)
spin_unlock(&lock->wait_lock)
Get a waiter from lock->wait_list
wake_up_q
wake_up_process
__mutex_handoff Called if MUTEX_FLAG_HANDOFF is set
owner
task virtual addr 1
0 0
0
1
2
63
atomic_long_cmpxchg_release(
&lock->owner, owner,
__owner_flags(owner)
owner
0 1
0 0
0
1
2
63
The woken task will update lock->owner
mutex
owner
wait_lock
osq
wait_list
task
• task_struct pointers aligns to at least L1_CACHE_BYTES
• 3 LSB bits are used for non-empty waiter list
✓ W (MUTEX_FLAG_WAITERS)
◼ Non-empty waiter list. Issue a wakeup when unlocking
✓ H (MUTEX_FLAG_HANDOFF)
◼ Unlock needs to hand the lock to the top-waiter
◼ Use by ww_mutex because ww_mutex’s waiter list is not FIFO order.
✓ P (MUTEX_FLAG_PICKUP)
◼ Handoff has been done and we're waiting for pickup
◼ Use by ww_mutex because ww_mutex’s waiter list is not FIFO order.
owner
task virtual addr W
P H
0
1
2
63
lock>-owner
* ww_mutex (Wound/Wait Mutex): Deadlock-proof mutex
[Unlock task = lock->owner]
No waiter: 3-bit LSB of lock->owner are cleared
39. mutex_unlock
__mutex_unlock_fast
[Unlock task = lock->owner]
No waiter: 3-bit LSB of lock->owner are cleared
return
locker->owner = 0
__mutex_unlock_slowpath
Have waiters: One of 3-bit LSB
of lock-owner is not cleared.
spin_lock(&lock->wait_lock)
spin_unlock(&lock->wait_lock)
Get a waiter from lock->wait_list
wake_up_q
wake_up_process
__mutex_handoff Called if MUTEX_FLAG_HANDOFF is set
owner
task virtual addr 1
0 0
0
1
2
63
atomic_long_cmpxchg_release(
&lock->owner, owner,
__owner_flags(owner)
owner
0 1
0 0
0
1
2
63
The woken task will update lock->owner
mutex_unlock(): slow path
43. * Bit 0 is still set (MUTEX_FLAG_WAITERS): The upcoming
mutex_unlock() will wake up the waiter instead of spinner.
* Bit 1 (MUTEX_FLAG_HANDOFF) is cleared from
__mutex_trylock->__mutex_trylock_or_owner.
Update lock->owner
When/who clears 3-bit LSB of lock->owner?
44. Woken task: When/who to clear 3-bit LSB of lock-owner?
Clear 3-bit LSB of lock->owner if no waiters
45. mutex_unlock
__mutex_unlock_fast
mutex_unlock(): Mutex ownership
return
locker->owner = 0
__mutex_unlock_slowpath
Have waiters: One of 3-bit LSB
of lock->owner is not cleared.
spin_lock(&lock->wait_lock)
spin_unlock(&lock->wait_lock)
Get a waiter from lock->wait_list
wake_up_q
wake_up_process
__mutex_handoff Called if MUTEX_FLAG_HANDOFF is set
atomic_long_cmpxchg_release(
&lock->owner, owner,
__owner_flags(owner))
The woken task will update lock->owner
Set 3-bit LSB of lock->owner → Clear the
original task struct address
[Unlock task = lock->owner]
No waiter: 3-bit LSB of lock->owner are cleared
• [Fastpath] Check ownership of a mutex
• [Slowpath] Does not check ownership of a mutex
46. mutex_unlock(): [lab] Behavior observation when lock owner !=
unlocker’s task (Mutex ownership)
mutex_unlock()
mutex_lock()
core 0 core 1
kthread_0 kthread_1
Critical Section
sleep 3 seconds mutex_unlock()
sleep 1 second
lock_thread unlock_thread
Source code: test-modules/mutex-unlock-by-another-task/mutex.c
This scenario is created on purpose for
demonstration. It won’t happen in real case.
Note
47. mutex_unlock()
mutex_lock()
core 0 core 1
kthread_0 kthread_1
Critical Section
sleep 3 seconds mutex_unlock()
sleep 1 second
lock_thread unlock_thread
Can another task
unlock the mutex?
mutex_unlock(): [lab] Behavior observation when lock owner !=
unlocker’s task (Mutex ownership)
This scenario is created on purpose for
demonstration. It won’t happen in real case.
Note
50. breakpoint
1
breakpoint
1
lock owner = old
2
lock->owner is set 0 by atomic_long_cmpxchg_release()
3
mutex_unlock(): [lab] Behavior if lock owner != unlocker’s task
mutex_unlock()
mutex_lock()
core 0 core 1
kthread_0 kthread_1
Critical Section
sleep 3 seconds mutex_unlock()
sleep 1 second
lock_thread unlock_thread
We’re here
51. mutex_unlock(): [lab] Behavior observation when lock owner !=
unlocker’s task (Mutex ownership)
mutex_unlock()
mutex_lock()
core 0 core 1
kthread_0 kthread_1
Critical Section
sleep 3 seconds
mutex_unlock()
sleep 1 second
lock_thread unlock_thread
1
2
3
This unlocks the
locked mutex
mutex
owner = 0
wait_lock = 0
osq = 0
wait_list
Breakpoint stops at second mutex_unlock() in kthread_0
52. mutex_unlock(): [lab] Behavior observation when lock owner !=
unlocker’s task (Mutex ownership)
mutex_unlock()
mutex_lock()
core 0 core 1
kthread_0 kthread_1
Critical Section
sleep 3 seconds mutex_unlock()
sleep 1 second
lock_thread unlock_thread
1
2
3
This unlocks the locked mutex
What’s the behavior?
53. breakpoint
1
mutex_unlock(): [lab] Behavior when lock owner != unlocker’s task
mutex_unlock()
mutex_lock()
core 0 core 1
kthread_0 kthread_1
Critical Section
sleep 3 seconds mutex_unlock()
sleep 1 second
lock_thread unlock_thread
breakpoint
1
No lock owner
2
We’re here
54. breakpoint
1
mutex_unlock(): [lab] Behavior if lock owner != unlocker’s task
mutex_unlock()
mutex_lock()
core 0 core 1
kthread_0 kthread_1
Critical Section
sleep 3 seconds mutex_unlock()
sleep 1 second
lock_thread unlock_thread
breakpoint
1
lock owner = old
2
lock->owner is set 0 by atomic_long_cmpxchg_release()
3
4
We’re here
55. mutex_unlock(): [lab] Behavior observation when lock owner !=
unlocker’s task (Mutex ownership)
mutex_unlock()
mutex_lock()
core 0 core 1
kthread_0 kthread_1
Critical Section
sleep 3 seconds mutex_unlock()
sleep 1 second
lock_thread unlock_thread
Takeaways
1. [Fastpath] Linux kernel checks mutex’s ownership
2. [Slowpath] Linux kernel does not check mutex’s ownership when unlocking a mutex
✓ Developers must take care of mutex_lock/mutex_unlock pair
✓ Slowpath prints a warning message if mutex debug option is enabled.
✓ Different from the concept: Only the lock owner has the permission to unlock the mutex
56. mutex_unlock(): [lab] Behavior observation when lock owner !=
unlocker’s task (Mutex ownership)
Why doesn’t slowpath check ownership?
1. Ownership checking is only for developers and not enforced by Linux kernel.
✓ Developers need to take care of it.
Quotes
1. From Generic Mutex Subsystem
✓ Mutex Semantics
• Only one task can hold the mutex at a time.
• Only the owner can unlock the mutex.
• …
✓ These semantics are fully enforced when CONFIG_DEBUG_MUTEXES is enabled
57. Think about…
mutex_unlock()
mutex_lock()
core 0 core 1
Critical Section
sleep 3 seconds
mutex_unlock()
sleep 1 second
mutex_unlock()
mutex_lock()
core 2
Critical Section
1 Will it acquire the mutex lock
successfully?
2
Will this unlock the mutex
lock acquired by core 2?
58. Q&A #1: Semaphore can be used to synchronize with user-space
Consumer
buffer
Producer
up()
down()
[Semaphore] Producer/Consumer Concept
* Screenshot captured from: Chapter 10, Linux Kernel Development, 3rd Edition, Robert Love
• Consumer waits if buffer is empty
• Producer waits if buffer is full
• Only one process can manipulate the buffer at a time
(mutual exclusion)
Principle
59. Q&A #1: Semaphore can be used to synchronize with user-space
Consumer
buffer
Producer
up(), V
down(), P
[Semaphore] Producer/Consumer Concept
Code reference: CS 537 Notes, Section #6: Semaphores and Producer/Consumer Problem
60. Q&A #1: Semaphore can be used to synchronize with user-space
Consumer
buffer
Producer
up(), V
down(), P
[Semaphore] Producer/Consumer Concept
Data structure synchronization
Notify consumer
Wait if buffer is full Wait if buffer is empty
Notify producer
Code reference: CS 537 Notes, Section #6: Semaphores and Producer/Consumer Problem
61. Q&A #1: Semaphore can be used to synchronize with user-space
Consumer
buffer
Producer
up(), V
down(), P
[Semaphore] Producer/Consumer Concept
User Space
Kernel Space
. . .
buffer
Consumer Consumer
Producer
system
call
system
call
Possible Scenario
Note
1. up()/down() invocations are done in kernel.
62. Q&A #2: Mutex isn’t suitable for synchronizations
between kernel and user-space
* Screenshot captured from: Chapter 10, Linux Kernel Development, 3rd Edition, Robert Love
Explanation
1. [Mutex] ownership!
✓ Whoever locked a mutex must unlock it