LCA14-306: CPUidle & CPUfreq integration with scheduler
Resource: LCA14
Name: LCA14-306: CPUidle & CPUfreq integration with scheduler
Date: 05-03-2014
Speaker: Daniel Lezcano, Mike Turquette
Video: https://www.youtube.com/watch?v=Ug4uQEYwl5s

Presentation Transcript

    • LCA14-306: CPUidle & CPUfreq integration with scheduler
      Wed 5 March, 11:15am, Daniel Lezcano, Mike Turquette
    • Introduction
      ● Power aware discussion
      ● Patchset "Small task packing"
        − Some information shared between cpuidle and the scheduler
        − https://lwn.net/Articles/520857/
      ● "Line in the sand" by Ingo Molnar
        − Integrate cpuidle and cpufreq with the scheduler first
        − http://lwn.net/Articles/552885/
    • CPUidle + scheduler: current design (diagram)
      Scheduler -> idle task (switch_to) -> cpuidle_idle_call -> governor (cpuidle_select) -> CPUidle backend driver (cpuidle_enter)
    • Idle time measurement
      ● From the scheduler:
        − The duration the idle task is running
        − Includes the interrupt processing time
      ● From CPUidle:
        − The duration between interrupts
      ● CPUidle code runs with local interrupts disabled
      ● T(idle task) = Σ T(CPUidle) + Σ T(irqs)
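      A minimal accounting sketch of the relation above, using hypothetical names rather than actual kernel code: the scheduler-side idle time covers the cpuidle residencies plus interrupt handling, so the interrupt share can be derived as the difference.

      #include <stdint.h>

      struct idle_times {
              uint64_t sched_idle_ns;  /* time the idle task was scheduled (scheduler view) */
              uint64_t cpuidle_ns;     /* sum of cpuidle residencies (irqs disabled)        */
      };

      /* Interrupt processing time spent while "idle", derived from the difference. */
      static uint64_t idle_irq_time_ns(const struct idle_times *t)
      {
              return t->sched_idle_ns > t->cpuidle_ns ?
                     t->sched_idle_ns - t->cpuidle_ns : 0;
      }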
    • Idle time measurement
    • Idle time measurement unification
      ● What is the impact of returning to the scheduler each time an interrupt occurs?
        − The scheduler will pick the idle task again if there is nothing to do
        − Main loop code is simplified
        − Idle time is measured nearly the same way by the scheduler and cpuidle
        − Probably a negative performance impact to fix
    • Load balance
      ● Taking the decision to balance a task when going to idle
        ■ Uses avg_idle
        ■ Does not use how long the CPU will sleep
        ■ The idle state should be selected beforehand
        ■ CPUidle should report the state the CPU will be in
      ● Balancing a task to the idlest CPU
        ■ Does not use the CPU's exit latency
        ■ CPUidle should report the state the CPU is in
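      An illustrative sketch of the second point, with hypothetical names and an arbitrary weighting, not kernel code: when picking an "idlest" CPU, also weigh the exit latency of the idle state that CPU currently sits in, as reported by cpuidle.

      #include <stdint.h>

      struct cpu_idle_info {
              uint64_t load;             /* some scheduler load metric               */
              uint64_t exit_latency_ns;  /* latency to leave the current idle state  */
      };

      static int pick_target_cpu(const struct cpu_idle_info *cpus, int ncpus)
      {
              int cpu, best = 0;
              uint64_t best_cost = UINT64_MAX;

              for (cpu = 0; cpu < ncpus; cpu++) {
                      /* cost mixes load and wakeup cost; the weighting is arbitrary */
                      uint64_t cost = cpus[cpu].load + cpus[cpu].exit_latency_ns;

                      if (cost < best_cost) {
                              best_cost = cost;
                              best = cpu;
                      }
              }
              return best;
      }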
    • CPUidle main function
      ● Reduce the distance between the scheduler and the cpuidle framework
        − Move the idle task to kernel/sched
        − Move the cpuidle_idle function into the idle task code
        − Integrate the idle main loop and cpuidle_idle_call
      ● Allows access to the scheduler's private structure definitions
    • Menu governor split
      ● Events can be classified into three categories:
        1. Predictable → timers
        2. Repetitive → IOs
        3. Random → keystrokes, incoming packets
      ● Category 2 could be integrated into the scheduler
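      A tiny sketch of the three categories as an enum, purely illustrative and not an existing kernel type:

      enum wakeup_event_class {
              WAKEUP_PREDICTABLE,  /* timers: expiry time is known in advance           */
              WAKEUP_REPETITIVE,   /* IOs: completion latency repeats and can be learned */
              WAKEUP_RANDOM,       /* keystrokes, incoming packets: cannot be guessed    */
      };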
    • IO latency tracking
      ● IOs are repetitive within a reasonable interval, so they can be assumed predictable enough
    • IO latency tracking
      ● Measured from the scheduler
        − io_schedule
        − io_schedule_timeout
      ● IO latency counted per task
        − Task migration moves the IO history along, unlike the current governor
        − Latency constraint for the task
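      A minimal sketch of per-task IO latency tracking as described above; the structure and function names are hypothetical, not the actual RFC patch:

      #include <stdint.h>

      struct task_io_stats {
              uint64_t last_block_ns;   /* timestamp when the task blocked on IO    */
              uint64_t avg_latency_ns;  /* running average of observed IO latencies */
      };

      /* Called when the task blocks in io_schedule()-like paths. */
      static void io_block(struct task_io_stats *s, uint64_t now_ns)
      {
              s->last_block_ns = now_ns;
      }

      /* Called when the IO completes and the task is woken up. */
      static void io_complete(struct task_io_stats *s, uint64_t now_ns)
      {
              uint64_t lat = now_ns - s->last_block_ns;

              /* Simple exponential moving average: the history follows the task,
               * so it survives migration to another CPU, unlike the per-CPU
               * statistics kept by the current menu governor. */
              s->avg_latency_ns = (3 * s->avg_latency_ns + lat) / 4;
      }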
    • Combine information
      ● Move the predictable-event framework into the scheduler
      ● Information combined between the scheduler and the menu governor will be more accurate
        − Idle balance decisions based on the idle state a CPU is in or about to enter
        − Load tracking from tasks for the idle state exit latency
        − CPU compute capacity and topology
        − DVFS strategies to boost on idle state exit
    • Scheduler + CPUidle
      ● The scheduler should have all the information to tell CPUidle:
        − How long the CPU will sleep
        − What the latency constraint is
      ● CPUidle should use the information provided by the scheduler:
        − Select an idle state
        − Use the backend driver idle callback
        − No more heuristics
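      A hypothetical interface sketch of that split, not an existing kernel API: the scheduler passes its own estimates to cpuidle, and cpuidle just picks the deepest state that fits them.

      #include <stdint.h>

      struct sched_idle_hint {
              uint64_t expected_sleep_ns;   /* how long the scheduler expects to sleep */
              uint64_t latency_req_ns;      /* wakeup latency the tasks can tolerate   */
      };

      struct idle_state {
              uint64_t exit_latency_ns;
              uint64_t target_residency_ns;
      };

      /* Pick the deepest state whose exit latency and residency fit the hint. */
      static int cpuidle_select_from_hint(const struct idle_state *states, int n,
                                          const struct sched_idle_hint *hint)
      {
              int i, best = 0;

              for (i = 0; i < n; i++) {
                      if (states[i].exit_latency_ns <= hint->latency_req_ns &&
                          states[i].target_residency_ns <= hint->expected_sleep_ns)
                              best = i;   /* states assumed ordered, shallow to deep */
              }
              return best;
      }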
    • Status
      ● A lot of cleanups around the idle main loop
      ● CPUidle main function inside the idle main loop
        − Code distance reduced, scheduler/cpuidle structures shared
        − Communication between the sub-systems made easier
    • Work in progress
      ● First iteration of IO latency tracking implemented
        − Validation in progress
      ● Simple governor for CPUidle
        − Select a state
      ● Idle time unification experimentation
    • CPUfreq + scheduler
      The title is misleading… CPUfreq may completely disappear in the future.
      The goal is to initiate CPU dynamic voltage & frequency scaling (DVFS) from the Linux scheduler.
      Nobody knows what this will look like, so please ask questions and raise suggestions.
    • CPUfreq today
      • Polling workqueue
      • E.g. ondemand
      • Based on idle time / busyness
      • No relation to decisions taken by the scheduler
      • Task may run at any time
      • No relation to the idle task
      • In fact, the task will not wake up during idle
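      A rough sketch of the ondemand-style sampling idea mentioned above (illustrative only, not the actual cpufreq_ondemand code): a periodic work item measures busyness from idle versus wall time and derives a target frequency from it.

      #include <stdint.h>

      static unsigned int pick_freq(uint64_t idle_ns, uint64_t wall_ns,
                                    unsigned int min_khz, unsigned int max_khz)
      {
              unsigned int load;

              if (!wall_ns)
                      return min_khz;

              /* assumes idle_ns <= wall_ns over the sampling window */
              load = (unsigned int)(100 - (100 * idle_ns) / wall_ns);
              if (load > 95)          /* heavily loaded: jump straight to max */
                      return max_khz;

              /* otherwise scale the frequency roughly with the measured load */
              return min_khz + (unsigned int)(((uint64_t)(max_khz - min_khz) * load) / 100);
      }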
    • Event driven behavior
      • Replace the polling loop with event driven action
      • The scheduler already takes actions which affect the available compute capacity
        • Load balance
        • Migrating tasks to and from CPUs of different compute capacity
      • DVFS transitions are a natural fit
    • Lots of work ahead
      • Method to initiate CPU DVFS transitions from the scheduler
      • Identify call sites to initiate those transitions
        • Enqueue/dequeue task
        • Load balance
        • Idle entry/exit
        • Aggressively scheduled deadline tasks
        • Maybe others
      • Define the interface between the scheduler & the DVFS thingy
        • Currently a power driver in Morten's RFC
      • Remove the CPUfreq governor layer from the power driver completely?
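      A hypothetical sketch of such call sites; sched_freq_hint() and the event names are illustrative only, not proposed kernel interfaces.

      enum sched_freq_event {
              SCHED_FREQ_ENQUEUE,       /* a task was enqueued on this CPU           */
              SCHED_FREQ_DEQUEUE,       /* a task left this CPU                      */
              SCHED_FREQ_LOAD_BALANCE,  /* load balance moved work to/from this CPU  */
              SCHED_FREQ_IDLE_EXIT,     /* the CPU just left idle                    */
      };

      /* Called from the scheduler paths listed above instead of a polling governor. */
      static void sched_freq_hint(int cpu, enum sched_freq_event ev, int busy)
      {
              /* A real implementation would call into the power driver, e.g. the
               * go_faster()/go_slower() hooks from Morten's RFC shown further down. */
              (void)cpu; (void)ev; (void)busy;
      }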
    • Lots of work ahead, part 2
      • Experiment with policy
        • When and where to evaluate whether the frequency should be changed
        • What metrics are important to the algorithm?
        • DVFS versus race-to-idle
      • Integrate with the power model
      • Benchmark performance & power
        • Performance regressions
        • Does it save power?
      • Make it work with non-CPUfreq things like PSCI and ACPI for changing the CPU P-state
    • Morten's power aware scheduling RFC
      • https://lkml.org/lkml/2013/10/11/547
      • Replaces the polling loop in the CPUfreq governor with scheduler event-driven action
      • CPUfreq machine drivers are re-used initially
      • The CPUfreq governor becomes a shim layer to the power driver
    • Nitty gritty details
      • The DVFS task is itself scheduled on a workqueue
        • It might not run for some time after the scheduler determines that a DVFS transition should happen
      • Kworker threads are filtered out
        • Prevents infinite reentrancy into the scheduler
        • CPU capacity is not changed when enqueuing and dequeuing these tasks
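      A small sketch of the kworker-filter idea (illustrative, not the RFC's exact code): skip capacity/DVFS updates for workqueue workers so the DVFS work item cannot recursively trigger more DVFS requests.

      #include <linux/sched.h>

      static bool should_update_capacity(const struct task_struct *p)
      {
              /* workqueue workers (e.g. the DVFS work item) are ignored */
              return !(p->flags & PF_WQ_WORKER);
      }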
    • include/linux/sched/power.h

      struct power_driver {
              /*
               * Power driver calls may happen from scheduler context with irq
               * disabled and rq locks held. This must be taken into account in
               * the power driver.
               */

              /* cpu already at max capacity? */
              int (*at_max_capacity) (int cpu);

              /* Increase cpu capacity hint */
              int (*go_faster) (int cpu, int hint);

              /* Decrease cpu capacity hint */
              int (*go_slower) (int cpu, int hint);

              /* Best cpu to wake up */
              int (*best_wake_cpu) (void);

              /* Scheduler call-back without rq lock held and with irq enabled */
              void (*late_callback) (int cpu);
      };
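      A hypothetical usage sketch of the hooks above, not code from the RFC itself; the helper name and the hint value are made up.

      static void hint_more_capacity(struct power_driver *drv, int cpu)
      {
              /* Per the comment in the struct: this may run with irqs disabled
               * and rq locks held, so go_faster() must not sleep; any deferred
               * work belongs in late_callback(), which runs with irqs enabled
               * and no rq lock. */
              if (!drv->at_max_capacity(cpu))
                      drv->go_faster(cpu, 1 /* hypothetical "one more task" hint */);
      }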
    • Incremental changes on top
      • https://github.com/mturquette/linux/commits/sched-cpufreq
      • Replaced the workqueue method with a per-CPU kthread
        • This allows removal of the kworker filter
        • Please commence bikeshedding over the name of this kthread
      • Use the SCHED_FIFO policy for the task
        • It will be run before the normal work (right?)
      • These patches were just validated yesterday
        • Bugs
        • Holes in logic
        • Misunderstandings
        • Voided warranties
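      A hedged sketch of the per-CPU SCHED_FIFO kthread idea; thread name, priority, and helper names are assumptions, not the actual patches on the branch above.

      #include <linux/err.h>
      #include <linux/kthread.h>
      #include <linux/sched.h>

      static int dvfs_thread_fn(void *data)
      {
              while (!kthread_should_stop()) {
                      set_current_state(TASK_INTERRUPTIBLE);
                      schedule();             /* woken by a scheduler hint */
                      __set_current_state(TASK_RUNNING);
                      /* ... perform the frequency transition for this CPU ... */
              }
              return 0;
      }

      static struct task_struct *start_dvfs_thread(unsigned int cpu)
      {
              struct sched_param param = { .sched_priority = 50 /* arbitrary */ };
              struct task_struct *p;

              p = kthread_create_on_cpu(dvfs_thread_fn, NULL, cpu, "dvfs/%u");
              if (IS_ERR(p))
                      return p;

              /* SCHED_FIFO so the DVFS work runs ahead of normal tasks */
              sched_setscheduler_nocheck(p, SCHED_FIFO, &param);
              wake_up_process(p);
              return p;
      }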
    • What's next?
      • Gather more opinions on the power driver interface
        • Is go_faster/go_slower the right way?
        • Spoiler alert: probably not.
      • When else might we want to evaluate CPU frequency?
        • Idle entry/exit, as mentioned by Daniel
      • Cluster-level considerations
        • Sched domains
        • Not just per-core
        • Four Cortex-A9s with a single CPU clock
      • Coordinate with the power model work
    • Questions?
    • More about Linaro Connect: http://connect.linaro.org
      More about Linaro: http://www.linaro.org/about/
      More about Linaro engineering: http://www.linaro.org/engineering/
      Linaro members: www.linaro.org/members