SlideShare a Scribd company logo
1 of 30
Download to read offline
Optimizing Scheduler for Linux Gaming
Changwoo Min
changwoo@igalia.com
April 17, 2024
Optimizing Scheduler for Linux Gaming
Changwoo Min, April 17, 2024
Talk outline
● Linux gaming is here to stay!
○ Does scheduler matter in gaming experience?
● Performance metrics for optimization
○ How to measure “stuttering” and what metric to optimize for?
● Gaming workload characterization
○ Are there any peculiar properties in gaming that a scheduler can leverage?
● sched_ext: BPF-based extensible scheduler framework
○ Easy to experiment new scheduler without pain
● Latency-criticality Aware Virtual Deadline (LAVD) scheduler
○ Scheduler should consider the latency criticality of a task
2
Optimizing Scheduler for Linux Gaming
Changwoo Min, April 17, 2024
Linux gaming is here to stay
3
Image source: https://store.steampowered.com/steamdeck/
Optimizing Scheduler for Linux Gaming
Changwoo Min, April 17, 2024
Linux gaming is here to stay
SteamDeck runs x86 Windows games!
● AMD APU
○ AMD Van Gogh: x86 8-core CPU + AMD GPU
● SteamOS
○ Arch-Linux based Linux distribution
○ with optimizations for SteamDeck hardware (especially AMDGPU)
● Proton
○ Compatibility layer for Windows games on Linux
○ Includes patched version of Wine
○ Plus a lot more: DXVK, VKD3D, gstreamer, ffmpeg
4
Optimizing Scheduler for Linux Gaming
Changwoo Min, April 17, 2024
My game is stuttering
even after upgrading my PC :-(
5
Image source:
https://www.reddit.com/r/AM
DHelp/comments/172sjtj/gam
es_keep_on_stuttering_i_just_
upgraded_my_pc/
https://steamcommunity.com/
discussions/forum/1/5824898
961055124750/
Optimizing Scheduler for Linux Gaming
Changwoo Min, April 17, 2024
Does scheduler matter
in gaming experience?
Yes, it does.
While most games are GPU-heavy, CPU is still the controller.
6
Scheduler A
Avg FPS: 41.3
Low 1% FPS: 29.5
Scheduler B
Avg FPS: 34.2
Low 1% FPS: 4.6
Time
Frame
Per
Second
Optimizing Scheduler for Linux Gaming
Changwoo Min, April 17, 2024
What metrics to optimize for?
7
● When we say “performance”, what exactly does it mean?
● Performance means many different things
○ Throughput: how much job can be done in a given time
○ Latency: time to process a task
○ Tail latency (99p, 99.9p): latency of the worst 1% or 0.1%
● The optimization strategy could vary depending on the optimization
metrics
○ E.g., throughput vs. latency vs. tail latency optimized design
Optimizing Scheduler for Linux Gaming
Changwoo Min, April 17, 2024
How to quantify “stuttering”?
8
● In games, FPS (frame per second) or frame time are the widely used
metric
○ Average FPS ≈ throughput of FPS
○ Low 1% FPS ≈ 99p latency of FPS
○ Low 0.1% FPS ≈ 99.9p latency of FPS
● Not only average FPS but also low 1% FPS are critical in user
experience. A sudden FPS dip is also known as a stuttering problem.
Optimizing Scheduler for Linux Gaming
Changwoo Min, April 17, 2024
Talk outline
● Linux gaming is here to stay!
○ Does scheduler matter in gaming experience?
● Performance metrics for optimization
○ How to measure “stuttering” and what metric to optimize for?
● Gaming workload characterization
○ Are there any peculiar properties in gaming that a scheduler can leverage?
● sched_ext: BPF-based extensible scheduler framework
○ Easy to experiment new scheduler without pain
● Latency-criticality Aware Virtual Deadline (LAVD) scheduler
○ Scheduler should consider the latency criticality of a task
9
Optimizing Scheduler for Linux Gaming
Changwoo Min, April 17, 2024
Understanding game workloads
● It is critical to understand the characteristics of games to optimize
scheduling policy for games.
● If games have the common characteristics, which are differentiated
from other general program domains (e.g., HPC, AI/ML, database), there
are chance to come up with a game optimized scheduling policy.
● We collect and analyze all scheduling activities while playing games.
10
Optimizing Scheduler for Linux Gaming
Changwoo Min, April 17, 2024
Analyzed a lot of scheduling traces
11
Optimizing Scheduler for Linux Gaming
Changwoo Min, April 17, 2024
Task scheduling and CPU utilization
12
● Around 300 tasks are scheduled while running a game.
○ Around 90% are long-living tasks; only 10% of tasks are terminated.
● Top 30-40 most frequently scheduled tasks take 95% of scheduling.
○ Around half of them are system tasks -- especially, wine,
graphics, and audio servers, taking 30--40% of scheduling.
○ There are 15-20 game-specific tasks, which takes 60-70%
scheduling.
● CPU utilization is moderately high -- around 65-95%, but not
overloaded (i.e., no 100%).
Optimizing Scheduler for Linux Gaming
Changwoo Min, April 17, 2024
Task execution time per schedule
13
● In general, tasks run for very short duration – roughly a few 100s usec
on average to a few msec maximum.
● Task execution time is very stable and is predictable using its average.
Distribution of task’s runtime
average per schedule in a game
tasks
runtime
avg
(msec)
○ Some coordination tasks run very
shortly (e.g., wineserver:260 usec) but
some work tasks (e.g., a task worker:
1.65 msec) run longer than average. < 1 msec
Optimizing Scheduler for Linux Gaming
Changwoo Min, April 17, 2024
What makes scheduling happen?
14
● Preemption (e.g., timer interrupt) takes only 25-30% of scheduling.
● 70-75% of scheduling is initiated by waiting system calls – such as
epoll, pipe_read, futex_wait, etc.
● In this case, a task is first scheduled out waiting for an event then it is
scheduled in when the event is notified.
Call stacks which triggered
scheduling in a game
Optimizing Scheduler for Linux Gaming
Changwoo Min, April 17, 2024
Task waiting time
15
● Wait time from one schedule to another schedule is a task’s behavioral
property. Each task’s wait time is pretty constant.
Wait times over time in a game
Optimizing Scheduler for Linux Gaming
Changwoo Min, April 17, 2024
Tasks are tightly linked in a graph
16
● Task graphs of top 50 percentile of waiter-wakers
Optimizing Scheduler for Linux Gaming
Changwoo Min, April 17, 2024
Summary of game workloads characteristics
17
● Majority of game tasks shows periodic behavior
○ given the fact that a task has a stable execution time and a stable
wait time.
● A sequence of tasks serves for a single job (e.g., from a user input to
display update).
● Therefore, the accumulated scheduling delay in a task chain will affect
end-user’s experience; it would eventually result in a sudden spike of
frame time (low 1% FPS).
Optimizing Scheduler for Linux Gaming
Changwoo Min, April 17, 2024
Talk outline
● Linux gaming is here to stay!
○ Does scheduler matter in gaming experience?
● Performance metrics for optimization
○ How to measure “stuttering” and what metric to optimize for?
● Gaming workload characterization
○ Are there any peculiar properties in gaming that a scheduler can leverage?
● sched_ext: BPF-based extensible scheduler framework
○ Easy to experiment new scheduler without pain
● Latency-criticality Aware Virtual Deadline (LAVD) scheduler
○ Scheduler should consider the latency criticality of a task
18
Optimizing Scheduler for Linux Gaming
Changwoo Min, April 17, 2024
sched_ext: BPF-based
extensible scheduler framework
19
● sched_ext enables scheduling policies to be implemented in a BPF program
● The development procedure is somewhat similar to kernel module
programming but BPF guarantees safety (e.g., no memory bugs)
○ Write a scheduler policy in BPF
○ Compile it
○ Load it onto the system, letting BPF and sched_ext infrastructure do all
of the heavy lifting to enable it
Optimizing Scheduler for Linux Gaming
Changwoo Min, April 17, 2024
sched_ext: BPF-based
extensible scheduler framework
20
● Rapid experimentation is possible
○ No reboot required, yay!
○ BPF cannot crash the host machine!
● GPLv2 only
Optimizing Scheduler for Linux Gaming
Changwoo Min, April 17, 2024
Implementing scheduling policies
with sched_ext
21
● BPF program must implement a set of callbacks
Optimizing Scheduler for Linux Gaming
Changwoo Min, April 17, 2024
Latency-criticality Aware Virtual Deadline
(LAVD) scheduler
22
● A new scheduling algorithm motivated by gaming workloads on Linux
○ Implemented as a BPF scheduler on top of the sched_ext framework
● Proportional share scheduler
○ Respect nice value to pursue the fair use of CPU time
● (Virtual) deadline based algorithm
○ Task’s deadline to represent the urgency of a task scheduling
○ But it is not a real-time scheduling algorithm
● Latency-criticality is the first class factor in making scheduling decisions
○ Infer a latency criticality of a task from task’s behavior in a task graph
Optimizing Scheduler for Linux Gaming
Changwoo Min, April 17, 2024
Overall procedure of LAVD
23
● LAVD is a deadline-based scheduling algorithm, so its overall procedure is
similar to other deadline-based scheduling algorithms.
● Under LAVD, a runnable task has its time slice and virtual deadline,
which are decided by the scheduler.
● The LAVD scheduler picks a task with the earliest virtual deadline and
allows it to execute for the given time slice.
Optimizing Scheduler for Linux Gaming
Changwoo Min, April 17, 2024
Defining latency criticality of a task
24
● We leverage a task's communication and behavioral properties to quantify
its latency criticality.
● Suppose there are three tasks, Task A, B, and C, coordinated by any waiting
calls (e.g., epoll, futex, etc).
○ [Task A] --> [Task B] --> [Task C]
Optimizing Scheduler for Linux Gaming
Changwoo Min, April 17, 2024
Defining latency criticality of a task
25
● [Task A] --> [Task B] --> [Task C]
● We define Task B is more latency-critical in the following cases:
○ as Task B's runtime per schedule is shorter (runtime B)
■ If Task B's runtime per schedule is long, a relatively short scheduling
delay won't affect a lot.
○ as Task B wakes Task C more frequently (wake_freq B)
■ If Task B frequently wakes up Task C, the scheduling delay of Task B
also delays the execution of Task C.
○ as Task B waits for Task A more frequently (wait_freq B)
■ If Task B often waits for Task A, the scheduling delay of Task B
delays the completion of executing the task graph.
Optimizing Scheduler for Linux Gaming
Changwoo Min, April 17, 2024
Task’s (virtual) deadline
26
● The latency criticality of a task is used to determine task's virtual deadline.
● A more latency-critical task will have a tighter (shorter) deadline, so the
scheduler picks such a task more urgently among other runnable tasks.
Optimizing Scheduler for Linux Gaming
Changwoo Min, April 17, 2024
Task’s time slice
27
● The LAVD scheduler tries to schedule all runnable tasks at least once
within a predefined time window, which is called a targeted latency (e.g.,
15 msec).
● The scheduler proportionally divides the targeted latency as per task’s nice
priority (i.e., weight).
Optimizing Scheduler for Linux Gaming
Changwoo Min, April 17, 2024
Enforcing the fair of use CPU time
28
● Assigning a task's time slice per its priority does not guarantee the fair use
of CPU time.
○ That is because a task can be more frequently executed than other
tasks or yield CPU before entirely consuming its assigned time slice.
● Deferring task’s execution
○ LAVD defers picking over-scheduled tasks to reduce the frequency of
task execution.
○ The deferring duration is proportional to task’s over-scheduled time.
Optimizing Scheduler for Linux Gaming
Changwoo Min, April 17, 2024
Early result is promising
29
● In many cases, LAVD provides better or similar performance than EEVDF in
terms of average FPS and Low 1% FPS
LAVD
EEVDF
6.9-rc1
upstream kernel
30
Join us!

More Related Content

Similar to Optimizing Scheduler for Linux Gaming.pdf

Similar to Optimizing Scheduler for Linux Gaming.pdf (20)

Building a Small Datacenter
Building a Small DatacenterBuilding a Small Datacenter
Building a Small Datacenter
 
Os prj ppt
Os prj pptOs prj ppt
Os prj ppt
 
Building a Small DC
Building a Small DCBuilding a Small DC
Building a Small DC
 
Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3
 
L-2 (Computer Performance).ppt
L-2 (Computer Performance).pptL-2 (Computer Performance).ppt
L-2 (Computer Performance).ppt
 
Kernel Recipes 2014 - Performance Does Matter
Kernel Recipes 2014 - Performance Does MatterKernel Recipes 2014 - Performance Does Matter
Kernel Recipes 2014 - Performance Does Matter
 
Pcp
PcpPcp
Pcp
 
HPC on OpenStack
HPC on OpenStackHPC on OpenStack
HPC on OpenStack
 
Stream processing with Apache Flink @ OfferUp
Stream processing with Apache Flink @ OfferUpStream processing with Apache Flink @ OfferUp
Stream processing with Apache Flink @ OfferUp
 
Monitoring and automation
Monitoring and automationMonitoring and automation
Monitoring and automation
 
Tales from the Optimization Trenches - Unite Copenhagen 2019
Tales from the Optimization Trenches - Unite Copenhagen 2019Tales from the Optimization Trenches - Unite Copenhagen 2019
Tales from the Optimization Trenches - Unite Copenhagen 2019
 
chapter 1 -Basic Structure of Computers.pptx
chapter 1 -Basic Structure of Computers.pptxchapter 1 -Basic Structure of Computers.pptx
chapter 1 -Basic Structure of Computers.pptx
 
How to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache SparkHow to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache Spark
 
How to Design for Database High Availability
How to Design for Database High AvailabilityHow to Design for Database High Availability
How to Design for Database High Availability
 
ch5_EN_CPUSched.pdf
ch5_EN_CPUSched.pdfch5_EN_CPUSched.pdf
ch5_EN_CPUSched.pdf
 
Bootstrapping a ML platform at Bluevine [Airflow Summit 2020]
Bootstrapping a ML platform at Bluevine [Airflow Summit 2020]Bootstrapping a ML platform at Bluevine [Airflow Summit 2020]
Bootstrapping a ML platform at Bluevine [Airflow Summit 2020]
 
LAS16-209: Finished and Upcoming Projects in LMG
LAS16-209: Finished and Upcoming Projects in LMGLAS16-209: Finished and Upcoming Projects in LMG
LAS16-209: Finished and Upcoming Projects in LMG
 
How to Plan for Performance and Scale for Multiplayer Games
How to Plan for Performance and Scale for Multiplayer GamesHow to Plan for Performance and Scale for Multiplayer Games
How to Plan for Performance and Scale for Multiplayer Games
 
What’s eating python performance
What’s eating python performanceWhat’s eating python performance
What’s eating python performance
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFix
 

More from Igalia

Building End-user Applications on Embedded Devices with WPE
Building End-user Applications on Embedded Devices with WPEBuilding End-user Applications on Embedded Devices with WPE
Building End-user Applications on Embedded Devices with WPE
Igalia
 
Automated Testing for Web-based Systems on Embedded Devices
Automated Testing for Web-based Systems on Embedded DevicesAutomated Testing for Web-based Systems on Embedded Devices
Automated Testing for Web-based Systems on Embedded Devices
Igalia
 
Running JS via WASM faster with JIT
Running JS via WASM      faster with JITRunning JS via WASM      faster with JIT
Running JS via WASM faster with JIT
Igalia
 
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por IgaliaIntroducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Igalia
 

More from Igalia (20)

A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Building End-user Applications on Embedded Devices with WPE
Building End-user Applications on Embedded Devices with WPEBuilding End-user Applications on Embedded Devices with WPE
Building End-user Applications on Embedded Devices with WPE
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Automated Testing for Web-based Systems on Embedded Devices
Automated Testing for Web-based Systems on Embedded DevicesAutomated Testing for Web-based Systems on Embedded Devices
Automated Testing for Web-based Systems on Embedded Devices
 
Embedding WPE WebKit - from Bring-up to Maintenance
Embedding WPE WebKit - from Bring-up to MaintenanceEmbedding WPE WebKit - from Bring-up to Maintenance
Embedding WPE WebKit - from Bring-up to Maintenance
 
Running JS via WASM faster with JIT
Running JS via WASM      faster with JITRunning JS via WASM      faster with JIT
Running JS via WASM faster with JIT
 
To crash or not to crash: if you do, at least recover fast!
To crash or not to crash: if you do, at least recover fast!To crash or not to crash: if you do, at least recover fast!
To crash or not to crash: if you do, at least recover fast!
 
Implementing a Vulkan Video Encoder From Mesa to GStreamer
Implementing a Vulkan Video Encoder From Mesa to GStreamerImplementing a Vulkan Video Encoder From Mesa to GStreamer
Implementing a Vulkan Video Encoder From Mesa to GStreamer
 
8 Years of Open Drivers, including the State of Vulkan in Mesa
8 Years of Open Drivers, including the State of Vulkan in Mesa8 Years of Open Drivers, including the State of Vulkan in Mesa
8 Years of Open Drivers, including the State of Vulkan in Mesa
 
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por IgaliaIntroducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
 
2023 in Chimera Linux
2023 in Chimera                    Linux2023 in Chimera                    Linux
2023 in Chimera Linux
 
Building a Linux distro with LLVM
Building a Linux distro        with LLVMBuilding a Linux distro        with LLVM
Building a Linux distro with LLVM
 
turnip: Update on Open Source Vulkan Driver for Adreno GPUs
turnip: Update on Open Source Vulkan Driver for Adreno GPUsturnip: Update on Open Source Vulkan Driver for Adreno GPUs
turnip: Update on Open Source Vulkan Driver for Adreno GPUs
 
Graphics stack updates for Raspberry Pi devices
Graphics stack updates for Raspberry Pi devicesGraphics stack updates for Raspberry Pi devices
Graphics stack updates for Raspberry Pi devices
 
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOS
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOSDelegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOS
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOS
 
MessageFormat: The future of i18n on the web
MessageFormat: The future of i18n on the webMessageFormat: The future of i18n on the web
MessageFormat: The future of i18n on the web
 
Replacing the geometry pipeline with mesh shaders
Replacing the geometry pipeline with mesh shadersReplacing the geometry pipeline with mesh shaders
Replacing the geometry pipeline with mesh shaders
 
I'm not an AMD expert, but...
I'm not an AMD expert, but...I'm not an AMD expert, but...
I'm not an AMD expert, but...
 
Status of Vulkan on Raspberry
Status of Vulkan on RaspberryStatus of Vulkan on Raspberry
Status of Vulkan on Raspberry
 
Enable hardware acceleration for GL applications without glamor on Xorg modes...
Enable hardware acceleration for GL applications without glamor on Xorg modes...Enable hardware acceleration for GL applications without glamor on Xorg modes...
Enable hardware acceleration for GL applications without glamor on Xorg modes...
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Recently uploaded (20)

Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Modernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaModernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using Ballerina
 
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and Insight
 
API Governance and Monetization - The evolution of API governance
API Governance and Monetization -  The evolution of API governanceAPI Governance and Monetization -  The evolution of API governance
API Governance and Monetization - The evolution of API governance
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
How to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfHow to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cf
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptx
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 

Optimizing Scheduler for Linux Gaming.pdf

  • 1. Optimizing Scheduler for Linux Gaming Changwoo Min changwoo@igalia.com April 17, 2024
  • 2. Optimizing Scheduler for Linux Gaming Changwoo Min, April 17, 2024 Talk outline ● Linux gaming is here to stay! ○ Does scheduler matter in gaming experience? ● Performance metrics for optimization ○ How to measure “stuttering” and what metric to optimize for? ● Gaming workload characterization ○ Are there any peculiar properties in gaming that a scheduler can leverage? ● sched_ext: BPF-based extensible scheduler framework ○ Easy to experiment new scheduler without pain ● Latency-criticality Aware Virtual Deadline (LAVD) scheduler ○ Scheduler should consider the latency criticality of a task 2
  • 3. Optimizing Scheduler for Linux Gaming Changwoo Min, April 17, 2024 Linux gaming is here to stay 3 Image source: https://store.steampowered.com/steamdeck/
  • 4. Optimizing Scheduler for Linux Gaming Changwoo Min, April 17, 2024 Linux gaming is here to stay SteamDeck runs x86 Windows games! ● AMD APU ○ AMD Van Gogh: x86 8-core CPU + AMD GPU ● SteamOS ○ Arch-Linux based Linux distribution ○ with optimizations for SteamDeck hardware (especially AMDGPU) ● Proton ○ Compatibility layer for Windows games on Linux ○ Includes patched version of Wine ○ Plus a lot more: DXVK, VKD3D, gstreamer, ffmpeg 4
  • 5. Optimizing Scheduler for Linux Gaming Changwoo Min, April 17, 2024 My game is stuttering even after upgrading my PC :-( 5 Image source: https://www.reddit.com/r/AM DHelp/comments/172sjtj/gam es_keep_on_stuttering_i_just_ upgraded_my_pc/ https://steamcommunity.com/ discussions/forum/1/5824898 961055124750/
  • 6. Optimizing Scheduler for Linux Gaming Changwoo Min, April 17, 2024 Does scheduler matter in gaming experience? Yes, it does. While most games are GPU-heavy, CPU is still the controller. 6 Scheduler A Avg FPS: 41.3 Low 1% FPS: 29.5 Scheduler B Avg FPS: 34.2 Low 1% FPS: 4.6 Time Frame Per Second
  • 7. Optimizing Scheduler for Linux Gaming Changwoo Min, April 17, 2024 What metrics to optimize for? 7 ● When we say “performance”, what exactly does it mean? ● Performance means many different things ○ Throughput: how much job can be done in a given time ○ Latency: time to process a task ○ Tail latency (99p, 99.9p): latency of the worst 1% or 0.1% ● The optimization strategy could vary depending on the optimization metrics ○ E.g., throughput vs. latency vs. tail latency optimized design
  • 8. Optimizing Scheduler for Linux Gaming Changwoo Min, April 17, 2024 How to quantify “stuttering”? 8 ● In games, FPS (frame per second) or frame time are the widely used metric ○ Average FPS ≈ throughput of FPS ○ Low 1% FPS ≈ 99p latency of FPS ○ Low 0.1% FPS ≈ 99.9p latency of FPS ● Not only average FPS but also low 1% FPS are critical in user experience. A sudden FPS dip is also known as a stuttering problem.
  • 9. Optimizing Scheduler for Linux Gaming Changwoo Min, April 17, 2024 Talk outline ● Linux gaming is here to stay! ○ Does scheduler matter in gaming experience? ● Performance metrics for optimization ○ How to measure “stuttering” and what metric to optimize for? ● Gaming workload characterization ○ Are there any peculiar properties in gaming that a scheduler can leverage? ● sched_ext: BPF-based extensible scheduler framework ○ Easy to experiment new scheduler without pain ● Latency-criticality Aware Virtual Deadline (LAVD) scheduler ○ Scheduler should consider the latency criticality of a task 9
  • 10. Optimizing Scheduler for Linux Gaming Changwoo Min, April 17, 2024 Understanding game workloads ● It is critical to understand the characteristics of games to optimize scheduling policy for games. ● If games have the common characteristics, which are differentiated from other general program domains (e.g., HPC, AI/ML, database), there are chance to come up with a game optimized scheduling policy. ● We collect and analyze all scheduling activities while playing games. 10
  • 11. Optimizing Scheduler for Linux Gaming Changwoo Min, April 17, 2024 Analyzed a lot of scheduling traces 11
  • 12. Optimizing Scheduler for Linux Gaming Changwoo Min, April 17, 2024 Task scheduling and CPU utilization 12 ● Around 300 tasks are scheduled while running a game. ○ Around 90% are long-living tasks; only 10% of tasks are terminated. ● Top 30-40 most frequently scheduled tasks take 95% of scheduling. ○ Around half of them are system tasks -- especially, wine, graphics, and audio servers, taking 30--40% of scheduling. ○ There are 15-20 game-specific tasks, which takes 60-70% scheduling. ● CPU utilization is moderately high -- around 65-95%, but not overloaded (i.e., no 100%).
  • 13. Optimizing Scheduler for Linux Gaming Changwoo Min, April 17, 2024 Task execution time per schedule 13 ● In general, tasks run for very short duration – roughly a few 100s usec on average to a few msec maximum. ● Task execution time is very stable and is predictable using its average. Distribution of task’s runtime average per schedule in a game tasks runtime avg (msec) ○ Some coordination tasks run very shortly (e.g., wineserver:260 usec) but some work tasks (e.g., a task worker: 1.65 msec) run longer than average. < 1 msec
  • 14. Optimizing Scheduler for Linux Gaming Changwoo Min, April 17, 2024 What makes scheduling happen? 14 ● Preemption (e.g., timer interrupt) takes only 25-30% of scheduling. ● 70-75% of scheduling is initiated by waiting system calls – such as epoll, pipe_read, futex_wait, etc. ● In this case, a task is first scheduled out waiting for an event then it is scheduled in when the event is notified. Call stacks which triggered scheduling in a game
  • 15. Optimizing Scheduler for Linux Gaming Changwoo Min, April 17, 2024 Task waiting time 15 ● Wait time from one schedule to another schedule is a task’s behavioral property. Each task’s wait time is pretty constant. Wait times over time in a game
  • 16. Optimizing Scheduler for Linux Gaming Changwoo Min, April 17, 2024 Tasks are tightly linked in a graph 16 ● Task graphs of top 50 percentile of waiter-wakers
  • 17. Optimizing Scheduler for Linux Gaming Changwoo Min, April 17, 2024 Summary of game workloads characteristics 17 ● Majority of game tasks shows periodic behavior ○ given the fact that a task has a stable execution time and a stable wait time. ● A sequence of tasks serves for a single job (e.g., from a user input to display update). ● Therefore, the accumulated scheduling delay in a task chain will affect end-user’s experience; it would eventually result in a sudden spike of frame time (low 1% FPS).
  • 18. Optimizing Scheduler for Linux Gaming Changwoo Min, April 17, 2024 Talk outline ● Linux gaming is here to stay! ○ Does scheduler matter in gaming experience? ● Performance metrics for optimization ○ How to measure “stuttering” and what metric to optimize for? ● Gaming workload characterization ○ Are there any peculiar properties in gaming that a scheduler can leverage? ● sched_ext: BPF-based extensible scheduler framework ○ Easy to experiment new scheduler without pain ● Latency-criticality Aware Virtual Deadline (LAVD) scheduler ○ Scheduler should consider the latency criticality of a task 18
  • 19. Optimizing Scheduler for Linux Gaming Changwoo Min, April 17, 2024 sched_ext: BPF-based extensible scheduler framework 19 ● sched_ext enables scheduling policies to be implemented in a BPF program ● The development procedure is somewhat similar to kernel module programming but BPF guarantees safety (e.g., no memory bugs) ○ Write a scheduler policy in BPF ○ Compile it ○ Load it onto the system, letting BPF and sched_ext infrastructure do all of the heavy lifting to enable it
  • 20. Optimizing Scheduler for Linux Gaming Changwoo Min, April 17, 2024 sched_ext: BPF-based extensible scheduler framework 20 ● Rapid experimentation is possible ○ No reboot required, yay! ○ BPF cannot crash the host machine! ● GPLv2 only
  • 21. Optimizing Scheduler for Linux Gaming Changwoo Min, April 17, 2024 Implementing scheduling policies with sched_ext 21 ● BPF program must implement a set of callbacks
  • 22. Optimizing Scheduler for Linux Gaming Changwoo Min, April 17, 2024 Latency-criticality Aware Virtual Deadline (LAVD) scheduler 22 ● A new scheduling algorithm motivated by gaming workloads on Linux ○ Implemented as a BPF scheduler on top of the sched_ext framework ● Proportional share scheduler ○ Respect nice value to pursue the fair use of CPU time ● (Virtual) deadline based algorithm ○ Task’s deadline to represent the urgency of a task scheduling ○ But it is not a real-time scheduling algorithm ● Latency-criticality is the first class factor in making scheduling decisions ○ Infer a latency criticality of a task from task’s behavior in a task graph
  • 23. Optimizing Scheduler for Linux Gaming Changwoo Min, April 17, 2024 Overall procedure of LAVD 23 ● LAVD is a deadline-based scheduling algorithm, so its overall procedure is similar to other deadline-based scheduling algorithms. ● Under LAVD, a runnable task has its time slice and virtual deadline, which are decided by the scheduler. ● The LAVD scheduler picks a task with the earliest virtual deadline and allows it to execute for the given time slice.
  • 24. Optimizing Scheduler for Linux Gaming Changwoo Min, April 17, 2024 Defining latency criticality of a task 24 ● We leverage a task's communication and behavioral properties to quantify its latency criticality. ● Suppose there are three tasks, Task A, B, and C, coordinated by any waiting calls (e.g., epoll, futex, etc). ○ [Task A] --> [Task B] --> [Task C]
  • 25. Optimizing Scheduler for Linux Gaming Changwoo Min, April 17, 2024 Defining latency criticality of a task 25 ● [Task A] --> [Task B] --> [Task C] ● We define Task B is more latency-critical in the following cases: ○ as Task B's runtime per schedule is shorter (runtime B) ■ If Task B's runtime per schedule is long, a relatively short scheduling delay won't affect a lot. ○ as Task B wakes Task C more frequently (wake_freq B) ■ If Task B frequently wakes up Task C, the scheduling delay of Task B also delays the execution of Task C. ○ as Task B waits for Task A more frequently (wait_freq B) ■ If Task B often waits for Task A, the scheduling delay of Task B delays the completion of executing the task graph.
  • 26. Optimizing Scheduler for Linux Gaming Changwoo Min, April 17, 2024 Task’s (virtual) deadline 26 ● The latency criticality of a task is used to determine task's virtual deadline. ● A more latency-critical task will have a tighter (shorter) deadline, so the scheduler picks such a task more urgently among other runnable tasks.
  • 27. Optimizing Scheduler for Linux Gaming Changwoo Min, April 17, 2024 Task’s time slice 27 ● The LAVD scheduler tries to schedule all runnable tasks at least once within a predefined time window, which is called a targeted latency (e.g., 15 msec). ● The scheduler proportionally divides the targeted latency as per task’s nice priority (i.e., weight).
  • 28. Optimizing Scheduler for Linux Gaming Changwoo Min, April 17, 2024 Enforcing the fair of use CPU time 28 ● Assigning a task's time slice per its priority does not guarantee the fair use of CPU time. ○ That is because a task can be more frequently executed than other tasks or yield CPU before entirely consuming its assigned time slice. ● Deferring task’s execution ○ LAVD defers picking over-scheduled tasks to reduce the frequency of task execution. ○ The deferring duration is proportional to task’s over-scheduled time.
  • 29. Optimizing Scheduler for Linux Gaming Changwoo Min, April 17, 2024 Early result is promising 29 ● In many cases, LAVD provides better or similar performance than EEVDF in terms of average FPS and Low 1% FPS LAVD EEVDF 6.9-rc1 upstream kernel