SlideShare a Scribd company logo
Deadline Scheduling Accounting,
CPUset and CPUhotplug
Mathieu Poirier
ENGINEERS AND DEVICES
WORKING TOGETHER
In This Presentation
● Describe a classic feature interaction problem between:
○ The deadline (DL) scheduler
○ CPUhotplug
○ CPUset
● Short overview of the initial RFC published before Linux Plumbers
● Open discussion on how best to approach the remaining work
● We are not talking about…
○ How the Linux scheduler works
○ What is deadline scheduling
○ How deadline scheduling works
ENGINEERS AND DEVICES
WORKING TOGETHER
A Very Simple Problem
● Deadline accounting metrics are lost when...
○ A CPU is hotpluged in or out of a system
○ CPUsets are created or destroyed
● Why is this a problem?
○ Losing track of deadline utilisation leads to a collapse of the mathematical model behind DL
○ Under utilisation of the system’s potential
○ Over selling capacity → failure of the system to honor deadline contracts
● Steve Rostedt’s initial report [1].
[1]. https://lkml.org/lkml/2016/2/3/966
ENGINEERS AND DEVICES
WORKING TOGETHER
Before a Hotplug Operation...
root@linaro-developer:/home/linaro# grep dl /proc/sched_debug
dl_rq[0]:
.dl_nr_running : 0
.dl_nr_migratory : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 629145 <------ Result of a 6:10 DL reservation is accounted
Dl_rq[1]: on all runqueues included in the root domain
.dl_nr_running : 0 (60% of 1048576 == 629145)
.dl_nr_migratory : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 629145
dl_rq[2]:
.dl_nr_running : 1
.dl_nr_migratory : 1
.dl_bw->bw : 996147
.dl_bw->total_bw : 629145
dl_rq[3]:
.dl_nr_running : 0
.dl_nr_migratory : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 629145
ENGINEERS AND DEVICES
WORKING TOGETHER
After a Hotplug Operation...
root@linaro-developer:/home/linaro# grep dl /proc/sched_debug
dl_rq[0]:
.dl_nr_running : 0
.dl_nr_migratory : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 0 <------ DL accounting is lost on all the runqueues
Dl_rq[1]: in the root domain
.dl_nr_running : 0
.dl_nr_migratory : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 0
dl_rq[2]:
.dl_nr_running : 1
.dl_nr_migratory : 1
.dl_bw->bw : 996147
.dl_bw->total_bw : 0
● The same can be achieved with various CPUset operations
ENGINEERS AND DEVICES
WORKING TOGETHER
Why Is This Happening?
● When a CPUhotplug/CPUset operation happens:
○ Runqueues are removed from the current root domain and added to the default root domain
○ Current root domains are deleted
○ New root domains are created
○ Runqueues are removed from the default root domain and added to the new root domains
● DL accounting metrics are lost in the first step
● For CPUhotplug the above is split in a two-step (and asynchronous) process
● Tricky to fix because the code is shared between CPUhotplug and CPUset
ENGINEERS AND DEVICES
WORKING TOGETHER
Obvious Solution, Not So Easy To Implement
● Obviously recompute DL bandwidth utilisation when new root domains are
created
But…
● Common code between CPUhotplug and CPUset
○ But the processes calling the common code is different
■ CPUhotplug → Asynchronous
■ CPUSet → Synchronous
● Problems to keep in mind:
○ Tasks are removed from runqueues when they get suspended → they can go anywhere
○ Tasks can span more than one root domain → not supported by DL scheduler
ENGINEERS AND DEVICES
WORKING TOGETHER
The Current Solution
● An RFC that fix the problem using CPUsets has been posted [1]
● It is based on the premise that every task (suspended or not) in the system
belongs to a single CPUset.
● As such to recompute root domains’ DL metrics:
○ Circle all the CPUsets in the system
○ If a DL task is found, get a reference to the root domain it is associated to [2]
○ Add DL task utilisation to the root domain
● General consensus this is the right approach
[1]. https://marc.info/?l=linux-kernel&m=150291845422763&w=2
[2]. By way of the task’s cpu_allowed field and runqueues
ENGINEERS AND DEVICES
WORKING TOGETHER
Pros And Cons Of The Solution
● Advantages:
○ Approach is relatively simple
○ Works for any kind of CPUset topology one can imagine
○ Deals with running, runnable and suspended task in the same way
● Disadvantages:
○ Currently prevents tasks from belonging to more than a single root domain
○ Hard to cover all the scenarios that can lead to the above
ENGINEERS AND DEVICES
WORKING TOGETHER
Task In Multiple Root Domains
How can this happen?
○ By default most new tasks can use all the CPUs in the system
○ New tasks belong to the default CPUset [1]
○ Newly created CPUsets will have their own root domain
○ Existing tasks aren’t impacted and as such are allowed to use the CPUs in the new root
domains, something that breaks the DL scheduler acceptance test
How do we deal with tasks spanning more than a single root domain?
[1].Tasks need to be explicitly assigned to a CPUset in order to belong to that set
ENGINEERS AND DEVICES
WORKING TOGETHER
Possible Solution #1
● Continue with what the current RFC does and prevent DL tasks from spanning
more than one root domain?
○ Somewhat simple
○ Hackish and brittle (in my opinion)
ENGINEERS AND DEVICES
WORKING TOGETHER
Possible Solution #2
● Update the DL scheduler so that it can deal with tasks spanning more than
one root domain?
○ Represent a serious amount of work
○ Jeopardize the integrity of the current DL scheduler
○ Modifies the underlying DL scheduler mathematical model
○ Mathematical model rigorously proven, probably not a good idea to mess with it
ENGINEERS AND DEVICES
WORKING TOGETHER
Possible Solution #3
● Rebuild the scheduling domains before a CPUset/CPUhotplug operation
starts, rather than near the end.
○ Would provide a view of the system after the CPUset/CPUhotplug operation
○ Solid and reliable way to spot tasks ending up in more than one root domain
○ An idea that came to me at 30 thousand feet - could be a pipe dream
ENGINEERS AND DEVICES
WORKING TOGETHER
Your
Thank You
#SFO17
SFO17 keynotes and videos on: connect.linaro.org
For further information: www.linaro.org

More Related Content

More from Linaro

HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018
Linaro
 
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Linaro
 
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Linaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
Linaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
Linaro
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
Linaro
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMU
Linaro
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation
Linaro
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted boot
Linaro
 
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
Linaro
 

More from Linaro (20)

Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta VekariaArm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
 
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua MoraHuawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
 
Bud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qaBud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qa
 
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
 
HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018
 
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
 
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
 
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
 
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteHKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening Keynote
 
HKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopHKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP Workshop
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allHKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMU
 
HKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MHKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8M
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted boot
 
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
 

Recently uploaded

Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
UXDXConf
 

Recently uploaded (20)

A Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System StrategyA Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System Strategy
 
Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024
 
Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024
 
Strategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering TeamsStrategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering Teams
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří Karpíšek
 
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfWhere to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
 
The Metaverse: Are We There Yet?
The  Metaverse:    Are   We  There  Yet?The  Metaverse:    Are   We  There  Yet?
The Metaverse: Are We There Yet?
 
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
 
Agentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdfAgentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
 
Connecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAKConnecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAK
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
 
Powerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaPowerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara Laskowska
 
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
 
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdfSimplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
 
Top 10 Symfony Development Companies 2024
Top 10 Symfony Development Companies 2024Top 10 Symfony Development Companies 2024
Top 10 Symfony Development Companies 2024
 

Deadline Scheduling Accounting, CPUsets and CPUhotplug - SFO17-505

  • 1. Deadline Scheduling Accounting, CPUset and CPUhotplug Mathieu Poirier
  • 2. ENGINEERS AND DEVICES WORKING TOGETHER In This Presentation ● Describe a classic feature interaction problem between: ○ The deadline (DL) scheduler ○ CPUhotplug ○ CPUset ● Short overview of the initial RFC published before Linux Plumbers ● Open discussion on how best to approach the remaining work ● We are not talking about… ○ How the Linux scheduler works ○ What is deadline scheduling ○ How deadline scheduling works
  • 3. ENGINEERS AND DEVICES WORKING TOGETHER A Very Simple Problem ● Deadline accounting metrics are lost when... ○ A CPU is hotpluged in or out of a system ○ CPUsets are created or destroyed ● Why is this a problem? ○ Losing track of deadline utilisation leads to a collapse of the mathematical model behind DL ○ Under utilisation of the system’s potential ○ Over selling capacity → failure of the system to honor deadline contracts ● Steve Rostedt’s initial report [1]. [1]. https://lkml.org/lkml/2016/2/3/966
  • 4. ENGINEERS AND DEVICES WORKING TOGETHER Before a Hotplug Operation... root@linaro-developer:/home/linaro# grep dl /proc/sched_debug dl_rq[0]: .dl_nr_running : 0 .dl_nr_migratory : 0 .dl_bw->bw : 996147 .dl_bw->total_bw : 629145 <------ Result of a 6:10 DL reservation is accounted Dl_rq[1]: on all runqueues included in the root domain .dl_nr_running : 0 (60% of 1048576 == 629145) .dl_nr_migratory : 0 .dl_bw->bw : 996147 .dl_bw->total_bw : 629145 dl_rq[2]: .dl_nr_running : 1 .dl_nr_migratory : 1 .dl_bw->bw : 996147 .dl_bw->total_bw : 629145 dl_rq[3]: .dl_nr_running : 0 .dl_nr_migratory : 0 .dl_bw->bw : 996147 .dl_bw->total_bw : 629145
  • 5. ENGINEERS AND DEVICES WORKING TOGETHER After a Hotplug Operation... root@linaro-developer:/home/linaro# grep dl /proc/sched_debug dl_rq[0]: .dl_nr_running : 0 .dl_nr_migratory : 0 .dl_bw->bw : 996147 .dl_bw->total_bw : 0 <------ DL accounting is lost on all the runqueues Dl_rq[1]: in the root domain .dl_nr_running : 0 .dl_nr_migratory : 0 .dl_bw->bw : 996147 .dl_bw->total_bw : 0 dl_rq[2]: .dl_nr_running : 1 .dl_nr_migratory : 1 .dl_bw->bw : 996147 .dl_bw->total_bw : 0 ● The same can be achieved with various CPUset operations
  • 6. ENGINEERS AND DEVICES WORKING TOGETHER Why Is This Happening? ● When a CPUhotplug/CPUset operation happens: ○ Runqueues are removed from the current root domain and added to the default root domain ○ Current root domains are deleted ○ New root domains are created ○ Runqueues are removed from the default root domain and added to the new root domains ● DL accounting metrics are lost in the first step ● For CPUhotplug the above is split in a two-step (and asynchronous) process ● Tricky to fix because the code is shared between CPUhotplug and CPUset
  • 7. ENGINEERS AND DEVICES WORKING TOGETHER Obvious Solution, Not So Easy To Implement ● Obviously recompute DL bandwidth utilisation when new root domains are created But… ● Common code between CPUhotplug and CPUset ○ But the processes calling the common code is different ■ CPUhotplug → Asynchronous ■ CPUSet → Synchronous ● Problems to keep in mind: ○ Tasks are removed from runqueues when they get suspended → they can go anywhere ○ Tasks can span more than one root domain → not supported by DL scheduler
  • 8. ENGINEERS AND DEVICES WORKING TOGETHER The Current Solution ● An RFC that fix the problem using CPUsets has been posted [1] ● It is based on the premise that every task (suspended or not) in the system belongs to a single CPUset. ● As such to recompute root domains’ DL metrics: ○ Circle all the CPUsets in the system ○ If a DL task is found, get a reference to the root domain it is associated to [2] ○ Add DL task utilisation to the root domain ● General consensus this is the right approach [1]. https://marc.info/?l=linux-kernel&m=150291845422763&w=2 [2]. By way of the task’s cpu_allowed field and runqueues
  • 9. ENGINEERS AND DEVICES WORKING TOGETHER Pros And Cons Of The Solution ● Advantages: ○ Approach is relatively simple ○ Works for any kind of CPUset topology one can imagine ○ Deals with running, runnable and suspended task in the same way ● Disadvantages: ○ Currently prevents tasks from belonging to more than a single root domain ○ Hard to cover all the scenarios that can lead to the above
  • 10. ENGINEERS AND DEVICES WORKING TOGETHER Task In Multiple Root Domains How can this happen? ○ By default most new tasks can use all the CPUs in the system ○ New tasks belong to the default CPUset [1] ○ Newly created CPUsets will have their own root domain ○ Existing tasks aren’t impacted and as such are allowed to use the CPUs in the new root domains, something that breaks the DL scheduler acceptance test How do we deal with tasks spanning more than a single root domain? [1].Tasks need to be explicitly assigned to a CPUset in order to belong to that set
  • 11. ENGINEERS AND DEVICES WORKING TOGETHER Possible Solution #1 ● Continue with what the current RFC does and prevent DL tasks from spanning more than one root domain? ○ Somewhat simple ○ Hackish and brittle (in my opinion)
  • 12. ENGINEERS AND DEVICES WORKING TOGETHER Possible Solution #2 ● Update the DL scheduler so that it can deal with tasks spanning more than one root domain? ○ Represent a serious amount of work ○ Jeopardize the integrity of the current DL scheduler ○ Modifies the underlying DL scheduler mathematical model ○ Mathematical model rigorously proven, probably not a good idea to mess with it
  • 13. ENGINEERS AND DEVICES WORKING TOGETHER Possible Solution #3 ● Rebuild the scheduling domains before a CPUset/CPUhotplug operation starts, rather than near the end. ○ Would provide a view of the system after the CPUset/CPUhotplug operation ○ Solid and reliable way to spot tasks ending up in more than one root domain ○ An idea that came to me at 30 thousand feet - could be a pipe dream
  • 15. Thank You #SFO17 SFO17 keynotes and videos on: connect.linaro.org For further information: www.linaro.org