Wed 5 March, 11:15am, Daniel Lezcano, Mike Turquette
LCA14-306: CPUidle & CPUfreq
integration with scheduler
Introduction
● Power aware discussion
● Patchset « Small task packing »
− Some information shared between cpuidle and the
scheduler
− https://lwn.net/Articles/520857/
● « Line in the sand » by Ingo Molnar
− First integrate cpuidle and cpufreq with the scheduler
− http://lwn.net/Articles/552885/
CPUidle + scheduler : Current design
[Diagram: the scheduler reaches the idle task through switch_to; the idle task calls cpuidle_idle_call, which uses the governor (cpuidle_select) and the CPUidle backend driver (cpuidle_enter).]
Idle time measurement
● From the scheduler :
− The time during which the idle task is running
− Includes the interrupt processing time
● From CPUidle :
− The duration between interrupts
● CPUidle code runs with local interrupts disabled
● T(idle task) = Σ T(CPUidle) + Σ T(irqs)
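The relation above can be illustrated with a small sketch. It is not kernel code: the `idle_sample` structure and the function names are hypothetical, and the durations are assumed to have already been collected in microseconds.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one run of the idle task. */
struct idle_sample {
    const uint64_t *cpuidle_us; /* time spent in each idle state entry */
    size_t nr_cpuidle;
    const uint64_t *irq_us;     /* time spent handling each interrupt */
    size_t nr_irq;
};

static uint64_t sum_us(const uint64_t *v, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* T(idle task) = sum of T(CPUidle) + sum of T(irqs) */
uint64_t idle_task_time_us(const struct idle_sample *s)
{
    return sum_us(s->cpuidle_us, s->nr_cpuidle) +
           sum_us(s->irq_us, s->nr_irq);
}
```

With three sleeps of 400, 250 and 300 us interrupted by two interrupts of 20 and 30 us, the scheduler sees 1000 us of idle task time while CPUidle only accounts for 950 us.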
Idle time measurement unification
● What is the impact of returning to the
scheduler each time an interrupt occurs?
− Scheduler will choose the idle task again if nothing
to do
− Mainloop code simplified
− Idle time measured nearly the same for the
scheduler and cpuidle
− Probably a negative impact on performance, which would need fixing
Load balance
● Taking the decision to balance a task when
going to idle
■ Use of avg_idle
● Does not use how long the cpu will sleep
■ The idle state should be selected beforehand
■ CPUidle should report the state the cpu will be in
● Balance a task to the idlest cpu
■ Does not use the cpu's exit latency
■ CPUidle should report the state the cpu is in
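As an illustration of the second point, here is a minimal sketch of choosing a target cpu once CPUidle reports the state each cpu is in. The `cpu_idle_info` structure and `pick_cheapest_idle_cpu` are hypothetical names, not existing scheduler interfaces, and "cheapest" is assumed to mean lowest exit latency.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical per-cpu view that CPUidle would hand to the scheduler. */
struct cpu_idle_info {
    int is_idle;                  /* non-zero if the cpu is idle */
    unsigned int exit_latency_us; /* exit latency of its current state */
};

/*
 * Return the index of the idle cpu that is cheapest to wake up,
 * or -1 if no cpu is idle.
 */
int pick_cheapest_idle_cpu(const struct cpu_idle_info *cpus, size_t nr_cpus)
{
    int best = -1;
    unsigned int best_latency = UINT_MAX;

    for (size_t i = 0; i < nr_cpus; i++) {
        if (!cpus[i].is_idle)
            continue;
        if (best < 0 || cpus[i].exit_latency_us < best_latency) {
            best = (int)i;
            best_latency = cpus[i].exit_latency_us;
        }
    }
    return best;
}
```

A real balancer would weigh this against load and topology; the sketch only shows where the exit latency would enter the decision.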
CPUidle main function
● Reduce the distance between the scheduler
and the cpuidle framework
− Move the idle task to kernel/sched
− Move the cpuidle_idle function into the idle task code
− Integrate the idle mainloop and cpuidle_idle_call
● Allows access to the scheduler's private
structure definitions
Menu governor split
● The events could be classified in three
categories :
1. Predictable → timers
2. Repetitive → IOs
3. Random → keystrokes, incoming packets
● Category 2 could be integrated into the
scheduler
IO latency tracking
● IOs repeat within a reasonable enough interval to
be treated as predictable
IO latency tracking
● Measurement from the scheduler
− io_schedule
− io_schedule_timeout
● Track the IO latency per task
− Task migration carries the IO history with the task, unlike the
current governor
− Latency constraint for the task
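A possible shape for the per-task bookkeeping is sketched below. The slides do not say how the samples are aggregated, so the 7/8 weighted running average is an assumption, and `task_io_stats` and `io_latency_record` are hypothetical names. The timestamps would be taken around the blocking calls (io_schedule, io_schedule_timeout).

```c
#include <stdint.h>

/* Hypothetical per-task IO history; it would migrate with the task. */
struct task_io_stats {
    uint64_t avg_latency_us; /* running estimate of the IO wait time */
    uint64_t nr_samples;
};

/*
 * Record one blocking IO. The first sample seeds the average, later
 * samples are blended in as new = (7 * old + sample) / 8.
 */
void io_latency_record(struct task_io_stats *st,
                       uint64_t start_us, uint64_t end_us)
{
    uint64_t sample = end_us - start_us;

    if (st->nr_samples == 0)
        st->avg_latency_us = sample;
    else
        st->avg_latency_us = (7 * st->avg_latency_us + sample) / 8;
    st->nr_samples++;
}
```

Because the history lives in the task rather than in a per-cpu governor, the estimate stays valid after the task moves to another cpu.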
Combining information
● Move predictable event framework in the
scheduler
● Information combined between the scheduler
and the menu governor will be more accurate
− Idle balance decision based on the idle state a cpu
is in or about to enter
− Load tracking from task for idle state exit latency
− CPU computation power and topology
− DVFS strategies for exit idle state boost
Scheduler + CPUidle
● The scheduler should have all the information
to tell CPUidle :
− How long it will sleep
− What is the latency constraint
● The CPUidle should use the information
provided by the scheduler :
− Select an idle state
− Use the backend driver idle callback
− No more heuristics
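The selection step described above could look like the following sketch. It assumes the states are ordered from shallowest to deepest and that each one carries an exit latency and a target residency; `idle_state` and `select_idle_state` are hypothetical names, not the existing CPUidle governor API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one idle state. */
struct idle_state {
    unsigned int exit_latency_us;     /* time needed to wake up */
    unsigned int target_residency_us; /* minimum sleep worth entering */
};

/*
 * Pick the deepest state that fits both inputs given by the scheduler:
 * the expected sleep duration and the latency constraint.
 * State 0 is used as the fallback when nothing deeper fits.
 */
size_t select_idle_state(const struct idle_state *states, size_t nr_states,
                         unsigned int expected_sleep_us,
                         unsigned int latency_limit_us)
{
    size_t chosen = 0;

    assert(nr_states > 0);
    for (size_t i = 1; i < nr_states; i++) {
        if (states[i].target_residency_us > expected_sleep_us)
            continue; /* sleep too short to pay for this state */
        if (states[i].exit_latency_us > latency_limit_us)
            continue; /* wake-up would violate the constraint */
        chosen = i;
    }
    return chosen;
}
```

With both inputs coming from the scheduler, the governor reduces to a table lookup and the chosen index is handed to the backend driver's idle callback.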
Status
● A lot of cleanups around the idle mainloop
● CPUidle main function inside the idle mainloop
− Code distance reduced, structures shared between
scheduler and cpuidle
− Communication between sub-systems made easier
Work in progress
● First iteration of IO latency tracking
implemented
− Validation in progress
● Simple governor for CPUIdle
− Select a state
● Idle time unification experimentation
CPUfreq + scheduler
The title is misleading … CPUfreq may completely
disappear in the future.
Goal is to initiate CPU dynamic voltage & frequency
scaling (DVFS) from the Linux scheduler
Nobody knows what this will look like, so please ask
questions and raise suggestions
CPUfreq today
• Polling workqueue
• E.g. ondemand
• Based on idle time / busyness
• No relation to decisions taken by the scheduler
• Task may be run at any time
• No relation to idle task
• In fact, the task will not wake up during idle
Event driven behavior
• Replace polling loop with event driven action
• Scheduler already takes action which affects available
compute capacity
• Load balance
• Migrating tasks to and from CPUs of different compute capacity
• DVFS transitions are a natural fit
Lots of work ahead
• Method to initiate CPU DVFS transitions from the
scheduler
• Identify call sites to initiate those transitions
• Enqueue/dequeue task
• Load balance
• Idle entry/exit
• Aggressively schedule deadline tasks
• Maybe others
• Define interface between the scheduler & the DVFS
thingy
• Currently a power driver in Morten’s RFC
• Remove CPUfreq governor layer from the power driver completely?
Lots of work ahead, part 2
• Experiment with policy
• When and where to evaluate if frequency should be changed
• What metrics are important to the algorithm?
• DVFS versus race-to-idle
• Integrate with power model
• Benchmark performance & power
• Performance regressions
• Does it save power?
• Make it work with non-CPUfreq things like PSCI and
ACPI for changing CPU P-state
Morten’s power aware scheduling RFC
• https://lkml.org/lkml/2013/10/11/547
• Replaces polling loop in CPUfreq governor with
scheduler event-driven action
• CPUfreq machine drivers are re-used initially
• CPUfreq governor becomes a shim layer to the power
driver
Nitty gritty details
• DVFS task is itself scheduled on a workqueue
• Might not be run for some time after the scheduler determines that a
DVFS transition should happen
• Kworker threads are filtered out
• Prevents infinite reentrancy into the scheduler
• CPU capacity is not changed when enqueuing and dequeuing these
tasks
include/linux/sched/power.h
struct power_driver {
    /*
     * Power driver calls may happen from scheduler context with irq
     * disabled and rq locks held. This must be taken into account in
     * the power driver.
     */

    /* cpu already at max capacity? */
    int (*at_max_capacity)(int cpu);
    /* Increase cpu capacity hint */
    int (*go_faster)(int cpu, int hint);
    /* Decrease cpu capacity hint */
    int (*go_slower)(int cpu, int hint);
    /* Best cpu to wake up */
    int (*best_wake_cpu)(void);
    /* Scheduler call-back without rq lock held and with irq enabled */
    void (*late_callback)(int cpu);
};
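To make the interface above more concrete, here is a user-space sketch of how a scheduler event might drive it. Only `struct power_driver` comes from the slide; the toy backend, `capacity_event`, the utilization thresholds (80% and 30%) and the use of the utilization as the `hint` value are all assumptions made for illustration.

```c
#include <stddef.h>

/* Interface as shown on the slide. */
struct power_driver {
    int (*at_max_capacity)(int cpu);
    int (*go_faster)(int cpu, int hint);
    int (*go_slower)(int cpu, int hint);
    int (*best_wake_cpu)(void);
    void (*late_callback)(int cpu);
};

/* Hypothetical toy backend: each cpu has a capacity level 0..MAX_LEVEL. */
#define TOY_NR_CPUS 2
#define MAX_LEVEL 3
static int level[TOY_NR_CPUS];

static int toy_at_max(int cpu) { return level[cpu] == MAX_LEVEL; }

static int toy_faster(int cpu, int hint)
{
    (void)hint; /* the toy backend ignores the hint */
    if (level[cpu] < MAX_LEVEL)
        level[cpu]++;
    return 0;
}

static int toy_slower(int cpu, int hint)
{
    (void)hint;
    if (level[cpu] > 0)
        level[cpu]--;
    return 0;
}

int toy_level(int cpu) { return level[cpu]; }

static const struct power_driver toy_driver = {
    .at_max_capacity = toy_at_max,
    .go_faster = toy_faster,
    .go_slower = toy_slower,
};

/*
 * Hypothetical hook, called at an enqueue/dequeue call site with the
 * cpu utilization in percent. No polling loop is involved.
 */
void capacity_event(const struct power_driver *drv, int cpu, int util_pct)
{
    if (util_pct > 80 && !drv->at_max_capacity(cpu))
        drv->go_faster(cpu, util_pct);
    else if (util_pct < 30)
        drv->go_slower(cpu, util_pct);
}
```

In the kernel these callbacks may run with the rq lock held and interrupts disabled, so a real driver could not sleep in them; the toy backend sidesteps this by only updating an array.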
Incremental changes on top
• https://github.com/mturquette/linux/commits/sched-cpufreq
• Replaced workqueue method with per-CPU kthread
• This allows removal of the kworker filter
• Please commence bikeshedding over the name of this kthread
• Use SCHED_FIFO policy for the task
• Will be run before the normal work (right?)
• These patches were just validated yesterday
• Bugs
• Holes in logic
• Misunderstandings
• Voided warranties
What’s next?
• Gather more opinions on the power driver interface
• Is go_faster/go_slower the right way?
• Spoiler alert: Probably not.
• When else might we want to evaluate CPU frequency?
• Idle entry/exit as mentioned by Daniel
• Cluster-level considerations
• Sched domains
• Not just per-core
• Four Cortex-A9s with a single CPU clock
• Coordinate with the power model work
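The cluster-level point can be illustrated with a short sketch. The slides only raise the question; taking the highest per-cpu request for a shared clock is one plausible policy and is an assumption here, as is the function name `cluster_target_khz`.

```c
#include <stddef.h>

/*
 * When several cpus share one clock (e.g. four cores on a single CPU
 * clock), a per-cpu request cannot be applied directly. This sketch
 * resolves the conflict by taking the highest request in the cluster.
 */
unsigned int cluster_target_khz(const unsigned int *requested_khz,
                                size_t nr_cpus)
{
    unsigned int target = 0;

    for (size_t i = 0; i < nr_cpus; i++)
        if (requested_khz[i] > target)
            target = requested_khz[i];
    return target;
}
```

With requests of 200000, 800000, 400000 and 200000 kHz the shared clock would run at 800000 kHz, which is why a per-core go_slower request may have no visible effect.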
Questions?
More about Linaro Connect: http://connect.linaro.org
More about Linaro: http://www.linaro.org/about/
More about Linaro engineering: http://www.linaro.org/engineering/
Linaro members: www.linaro.org/members

HKG18-120 - Devicetree Schema Documentation and Validation
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted boot
 

LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler

  • 1. Wed 5 March, 11:15am, Daniel Lezcano, Mike Turquette LCA14-306: CPUidle & CPUfreq integration with scheduler
  • 2. Introduction
      ● Power aware discussion
      ● Patchset « Small task packing »
          − Some information shared between cpuidle and the scheduler
          − https://lwn.net/Articles/520857/
      ● « Line in the sand » by Ingo Molnar
          − Integrate cpuidle and cpufreq with the scheduler first
          − http://lwn.net/Articles/552885/
  • 3. CPUidle + scheduler: current design (diagram; boxes: Scheduler, Idle task, CPUidle, Governor, CPUidle backend driver; calls: switch_to, cpuidle_idle_call, cpuidle_select, cpuidle_enter)
  • 4. Idle time measurement
      ● From the scheduler:
          − The duration the idle task is running
          − Includes the interrupt processing time
      ● From CPUidle:
          − The duration between interrupts
      ● CPUidle code runs with local interrupts disabled
      ● T(idle task) = Σ T(CPUidle) + Σ T(irqs)
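The accounting identity on this slide can be sketched in a few lines of C. This is a user-space illustration only: `struct idle_period`, `cpuidle_view_ns` and `sched_view_ns` are hypothetical names, not kernel types or functions, and the split of each idle period into "time in the idle state" and "time in the interrupt handler" is an assumption made to show why the two subsystems measure different totals.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one pass through the idle loop (not a kernel type). */
struct idle_period {
    uint64_t cpuidle_ns; /* time spent in the idle state, between interrupts */
    uint64_t irq_ns;     /* time spent handling the interrupt that ended it */
};

/* Idle time as CPUidle measures it: only the time between interrupts. */
uint64_t cpuidle_view_ns(const struct idle_period *p, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += p[i].cpuidle_ns;
    return total;
}

/*
 * Idle time as the scheduler measures it: the whole time the idle task
 * runs, i.e. T(idle task) = sum of T(CPUidle) + sum of T(irqs).
 */
uint64_t sched_view_ns(const struct idle_period *p, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += p[i].cpuidle_ns + p[i].irq_ns;
    return total;
}
```

The difference between the two totals is exactly the interrupt processing time, which is the gap the unification work tries to close.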
  • 6. Idle time measurement unification
      ● What is the impact of returning to the scheduler each time an interrupt occurs?
          − The scheduler will choose the idle task again if there is nothing to do
          − Mainloop code simplified
          − Idle time measured nearly the same by the scheduler and cpuidle
          − Probably a negative performance impact that would need fixing
  • 7. Load balance
      ● Deciding to balance a task when going idle
          ■ Use of avg_idle
      ● Does not use how long the cpu will sleep
          ■ The idle state should be selected beforehand
          ■ CPUidle should give the state the cpu will be in
      ● Balance a task to the idlest cpu
          ■ Does not use the cpu's exit latency
          ■ CPUidle should give back the state the cpu is in
  • 8. CPUidle main function
      ● Reduce the distance between the scheduler and the cpuidle framework
          − Move the idle task to kernel/sched
          − Move the cpuidle_idle function into the idle task code
          − Integrate the idle mainloop and cpuidle_idle_call
      ● Allows access to the scheduler's private structure definitions
  • 9. Menu governor split
      ● The events could be classified in three categories:
          1. Predictable → timers
          2. Repetitive → IOs
          3. Random → keystroke, incoming packet
      ● Category 2 could be integrated into the scheduler
  • 10. IO latency tracking
      ● IOs repeat within a reasonable interval, so they can be assumed to be predictable enough
  • 11. IO latency tracking
      ● Measurement from the scheduler
          − io_schedule
          − io_schedule_timeout
      ● Track the IO latency per task
          − Task migration moves the IO history with the task, unlike the current governor
          − Latency constraint for the task
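A minimal sketch of per-task IO latency tracking, assuming the wait time is sampled when a task returns from an IO wait (the slide places the measurement in io_schedule and io_schedule_timeout). `struct task_io_stats`, `io_latency_record` and `io_latency_predict` are hypothetical names, and the moving average with a 1/4 weight on the newest sample is an assumption chosen for illustration; the slides do not say which estimator the actual patches use.

```c
#include <stdint.h>

/* Hypothetical per-task IO history; the real patches may track this differently. */
struct task_io_stats {
    uint64_t avg_latency_ns; /* moving average of observed IO wait times */
    uint32_t samples;        /* number of IO waits recorded so far */
};

/* Record one completed IO wait of wait_ns nanoseconds for this task. */
void io_latency_record(struct task_io_stats *s, uint64_t wait_ns)
{
    if (s->samples == 0)
        s->avg_latency_ns = wait_ns;
    else
        /* Assumed estimator: 3/4 old average + 1/4 new sample. */
        s->avg_latency_ns = (3 * s->avg_latency_ns + wait_ns) / 4;
    s->samples++;
}

/* Expected duration of the task's next IO wait; 0 means no history yet. */
uint64_t io_latency_predict(const struct task_io_stats *s)
{
    return s->samples ? s->avg_latency_ns : 0;
}
```

Because the history lives in a per-task structure, it follows the task when the task migrates to another CPU, which is the property the slide contrasts with the current per-CPU governor statistics.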
  • 12. Combine information
      ● Move the predictable event framework into the scheduler
      ● Information combined between the scheduler and the menu governor will be more accurate
          − Idle balance decision based on the idle state a cpu is in or is about to enter
          − Load tracking from task for idle state exit latency
          − CPU computation power and topology
          − DVFS strategies for exit idle state boost
  • 13. Scheduler + CPUidle
      ● The scheduler should have all the information needed to tell CPUidle:
          − How long it will sleep
          − What the latency constraint is
      ● CPUidle should use the information provided by the scheduler:
          − Select an idle state
          − Use the backend driver idle callback
          − No more heuristics
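With those two inputs, state selection can become a plain table lookup. The sketch below is an illustration under stated assumptions, not the governor from the patches: `struct idle_state` and `select_idle_state` are hypothetical, the states are assumed to be ordered from shallowest to deepest, and index 0 is assumed to be always usable as a fallback.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one idle state (user-space illustration). */
struct idle_state {
    uint32_t exit_latency_us;     /* time needed to wake up from the state */
    uint32_t target_residency_us; /* minimum sleep for the state to pay off */
};

/*
 * Pick the deepest state whose target residency fits the expected sleep
 * length and whose exit latency respects the latency constraint.
 * States are assumed ordered from shallowest (index 0) to deepest.
 */
size_t select_idle_state(const struct idle_state *states, size_t n,
                         uint32_t sleep_us, uint32_t latency_req_us)
{
    size_t best = 0; /* assumed always-valid shallowest state */

    for (size_t i = 1; i < n; i++) {
        if (states[i].target_residency_us > sleep_us)
            continue; /* would not sleep long enough to be worth it */
        if (states[i].exit_latency_us > latency_req_us)
            continue; /* would wake up too slowly */
        best = i;
    }
    return best;
}
```

No prediction happens inside this function: both the sleep length and the latency constraint are supplied by the caller, which matches the "no more heuristics" goal on the slide.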
  • 14. Status
      ● A lot of cleanups around the idle mainloop
      ● CPUidle main function inside the idle mainloop
          − Code distance reduced, sharing the structures scheduler/cpuidle
          − Communication between sub-systems made easier
  • 15. Work in progress
      ● First iteration of IO latency tracking implemented
          − Validation in progress
      ● Simple governor for CPUidle
          − Select a state
      ● Idle time unification experimentation
  • 18. CPUfreq + scheduler
      The title is misleading … CPUfreq may completely disappear in the future.
      Goal is to initiate CPU dynamic voltage & frequency scaling (DVFS) from the Linux scheduler
      Nobody knows what this will look like, so please ask questions and raise suggestions
  • 19. CPUfreq today
      ● Polling workqueue
      ● E.g. ondemand
      ● Based on idle time / busyness
      ● No relation to decisions taken by the scheduler
      ● Task may be run at any time
      ● No relation to idle task
      ● In fact, task will not wake up during idle
  • 20. Event driven behavior
      ● Replace polling loop with event driven action
      ● Scheduler already takes actions that affect available compute capacity
      ● Load balance
      ● Migrating tasks to and from CPUs of different compute capacity
      ● DVFS transitions are a natural fit
  • 21. Lots of work ahead
      ● Method to initiate CPU DVFS transitions from the scheduler
      ● Identify call sites to initiate those transitions
      ● Enqueue/dequeue task
      ● Load balance
      ● Idle entry/exit
      ● Aggressively schedule deadline tasks
      ● Maybe others
      ● Define interface between the scheduler & the DVFS thingy
      ● Currently a power driver in Morten’s RFC
      ● Remove CPUfreq governor layer from the power driver completely?
  • 22. Lots of work ahead, part 2
      ● Experiment with policy
      ● When and where to evaluate if frequency should be changed
      ● What metrics are important to the algorithm?
      ● DVFS versus race-to-idle
      ● Integrate with power model
      ● Benchmark performance & power
      ● Performance regressions
      ● Does it save power?
      ● Make it work with non-CPUfreq things like PSCI and ACPI for changing CPU P-state
  • 23. Morten’s power aware scheduling RFC
      ● https://lkml.org/lkml/2013/10/11/547
      ● Replaces polling loop in CPUfreq governor with scheduler event-driven action
      ● CPUfreq machine drivers are re-used initially
      ● CPUfreq governor becomes a shim layer to the power driver
  • 24. Nitty gritty details
      ● DVFS task is itself scheduled on a workqueue
      ● Might not be run for some time after the scheduler determines that a DVFS transition should happen
      ● Kworker threads are filtered out
      ● Prevents infinite reentrancy into the scheduler
      ● CPU capacity is not changed when enqueuing and dequeuing these tasks
  • 25. include/linux/sched/power.h

        struct power_driver {
                /*
                 * Power driver calls may happen from scheduler context with irq
                 * disabled and rq locks held. This must be taken into account in
                 * the power driver.
                 */

                /* cpu already at max capacity? */
                int (*at_max_capacity)(int cpu);
                /* Increase cpu capacity hint */
                int (*go_faster)(int cpu, int hint);
                /* Decrease cpu capacity hint */
                int (*go_slower)(int cpu, int hint);
                /* Best cpu to wake up */
                int (*best_wake_cpu)(void);
                /* Scheduler call-back without rq lock held and with irq enabled */
                void (*late_callback)(int cpu);
        };
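To make the interface more concrete, here is a user-space mock of how a driver and a scheduler-side call site might fit together. Everything except the three callback names is an assumption: the struct is a trimmed copy of the one above, the operating-point table, `mock_*` functions, `on_task_enqueue` and `on_task_dequeue` are hypothetical, the `hint` argument is ignored because the slides do not define its meaning, and a return value of 0 is assumed to mean success.

```c
#define NR_CPUS 2
#define NR_OPPS 3 /* hypothetical number of operating points per CPU */

/* Trimmed user-space copy of the RFC's struct, for illustration only. */
struct power_driver {
    int (*at_max_capacity)(int cpu);
    int (*go_faster)(int cpu, int hint);
    int (*go_slower)(int cpu, int hint);
};

/* Hypothetical driver state: current operating-point index of each CPU. */
static int opp_index[NR_CPUS];

static int mock_at_max_capacity(int cpu)
{
    return opp_index[cpu] == NR_OPPS - 1;
}

static int mock_go_faster(int cpu, int hint)
{
    (void)hint; /* hint semantics are not defined in the slides; ignored */
    if (opp_index[cpu] < NR_OPPS - 1)
        opp_index[cpu]++;
    return 0; /* assumed: 0 means success */
}

static int mock_go_slower(int cpu, int hint)
{
    (void)hint;
    if (opp_index[cpu] > 0)
        opp_index[cpu]--;
    return 0;
}

const struct power_driver mock_driver = {
    .at_max_capacity = mock_at_max_capacity,
    .go_faster = mock_go_faster,
    .go_slower = mock_go_slower,
};

/* Helper for inspecting the mock's state. */
int mock_current_opp(int cpu)
{
    return opp_index[cpu];
}

/* Hypothetical scheduler call site: a task was enqueued on this CPU. */
void on_task_enqueue(const struct power_driver *pd, int cpu)
{
    if (!pd->at_max_capacity(cpu))
        pd->go_faster(cpu, 0);
}

/* Hypothetical scheduler call site: a task was dequeued from this CPU. */
void on_task_dequeue(const struct power_driver *pd, int cpu)
{
    pd->go_slower(cpu, 0);
}
```

The point of the sketch is the direction of control: the frequency change is triggered by a scheduler event rather than by a polling loop. A real driver would also have to honor the locking constraints stated in the struct's comment, which this mock ignores.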
  • 26. Incremental changes on top
      ● https://github.com/mturquette/linux/commits/sched-cpufreq
      ● Replaced workqueue method with per-CPU kthread
      ● This allows removal of the kworker filter
      ● Please commence bikeshedding over the name of this kthread
      ● Use SCHED_FIFO policy for the task
      ● Will be run before the normal work (right?)
      ● These patches were just validated yesterday
      ● Bugs
      ● Holes in logic
      ● Misunderstandings
      ● Voided warranties
  • 27. What’s next?
      ● Gather more opinions on the power driver interface
      ● Is go_faster/go_slower the right way?
      ● Spoiler alert: Probably not.
      ● When else might we want to evaluate CPU frequency?
      ● Idle entry/exit as mentioned by Daniel
      ● Cluster-level considerations
      ● Sched domains
      ● Not just per-core
      ● Four Cortex-A9s with a single CPU clock
      ● Coordinate with the power model work
  • 29. More about Linaro Connect: http://connect.linaro.org
      More about Linaro: http://www.linaro.org/about/
      More about Linaro engineering: http://www.linaro.org/engineering/
      Linaro members: www.linaro.org/members