© 2017 Arm Limited
EAS in Android Common Kernel
Linaro Connect Hong Kong 2018
Chris Redpath
Quentin Perret
Open Source Software
EAS in Android Common Kernel
AOSP Common Kernel Update
EAS Mainline Strategy
EAS Upstreaming
AOSP Common Kernel Update
• EAS r1.3, July 2017
• android-4.4, android-4.9
• Default cpufreq governor switched to schedutil, sched-freq removed
• Backports of upstream schedutil changes
• Upstream backports of relevant scheduler features
[Timeline: EAS r1.3, r1.4 and r1.5 releases across the android-4.4, android-4.9 and android-4.14 branches]
AOSP Common Kernel Update
• EAS r1.4, November 2017
• android-4.4, android-4.9
• Upstream backports of more scheduler and schedutil patches
• Energy diff improvements & fixes
• android-4.14 EAS released including 1.4 & most 1.5 functionality
EAS in android-4.14
A new set of patches implementing EAS rather than forward-porting
• Based upon our latest mainline-focussed integration branch
• Refactored latest android-eas on top to build clean set of patches
• More experimental features placed behind sched_features
• Feature configuration matches android-4.9
• Produced during the linux-4.14 rc phase, ready two weeks after linux-4.14
EAS in android-4.14
android-specific
Upstream-targeted
find_best_target
schedtune
WALT Trace & Debug
Topology Detection
Invariance Support
Use of idle states
Rt-PELT
Schedutil changes
Sync Wakeups
Energy Diff Calculation
NOHZ Signal Updates
Misfit Tasks & Overutilized Flags
Load balance tweaks
AOSP Common Kernel Update
• EAS r1.5, Feb 2018 (eas-dev), merging to android-4.9 soon
• android-4.9 only, most changes already in android-4.14
• Refactored energy diff to make calculation more efficient
• Further refinement of EAS CPU pre-selection (find_best_target)
– Thanks for excellent contributions from Qualcomm, Spreadtrum, Mstar, Linaro
• Aggressive up-migrate of Misfit tasks & WALT updates from CodeAurora
AOSP Common Kernel Update
• EAS r1.6, eas-dev starting April 2018
• Moving to android-4.14
• Adding back Schedtune PE space filtering
• Util_est backport, with PELT decay rate changes
• Use mainline wakeup code for prefer_idle tasks
• Remove ordering dependency in find_best_target
– (better tri-gear support when using find_best_target)
[Timeline: EAS r1.5 and r1.6 releases across the android-4.4, android-4.9 and android-4.14 branches]
AOSP Common Kernel Update
Branches:
• android-4.4, android-4.9 & android-4.14
• Common kernel upstream for device kernels
• Only post against this for bugfixes
• People merge these into device kernels, so we need to be selective about changes
AOSP Common Kernel Update
More branches:
• android-4.9-eas-dev (soon android-4.14-eas-dev)
– This is where in-development patches should be posted
– The Arm power team usually posts patches at RFC stage to stimulate discussion
– Changes are picked or merged back to common
• android-4.4-eas-test (android-4.9-eas-test later)
– Test branch is based on android common for the latest well-supported public device
– Intended to hold backports of EAS patches which merged into the active common branch, but did not get back to the branch we test with
AOSP Common Kernel Update
There have been some consistent themes in EAS development over the last year or so:
• Reducing delta with mainline
• Refactoring to improve maintainability and predictability
• New features where necessary
• Open, collaborative development
AOSP Common Kernel Update
Open Development
• Patches for AOSP are reviewed on AOSP Gerrit
• https://android-review.googlesource.com
• We always try to justify patches with performance & energy numbers – use wltests for this
• Wltests is part of LISA https://www.github.com/arm-software/lisa
• Discussion of other topics and announcements takes place on Linaro’s eas-dev list
• https://lists.linaro.org/mailman/listinfo/eas-dev
EAS Mainline Strategy
• EAS is a large, complex piece of functionality
• EAS being in AOSP helps a lot of users but not all
• Upstream development results in better code
EAS Mainline Strategy
• We make regular bi-weekly integrations of all our upstream-focussed code
• Available on linux-arm.org & announced on eas-dev
• Allows us to more easily see when changes impact us and work to resolve as soon as possible
• We have been identifying suitable code we already have
• Working on getting it into acceptable shape
• Pushing patches when we think they are good enough for a review
• Hoping to upstream quite a lot of EAS this year
EAS Mainline Strategy
• Also working upstream where we can and backporting to Android
• schedutil fixes
• cpu signal updates
• any fix/change applicable and potentially useful elsewhere
• participating in reviews and testing
EAS in AOSP
EAS Code Size by Category, current and target

Category                 Current   Target
Android-specific         3797      824
WALT                     1470      0
Upstreamable Features    2785      2785
Documentation            1153      1153
EAS Size
[Bar chart: Android-specific EAS code size. EAS code size in android-4.14: 100.00%. Android-specific code target (rough estimate if everything goes to plan, mid-2019): 14.76%.]
Bringing EAS in AOSP Closer to Mainline
1. Reach performance/energy parity with WALT
• WALT is great for mobile but not popular upstream
• It’s also 1.5k LoC
• Touches many parts of the scheduler we want to change upstream, which makes backporting harder
Bringing EAS in AOSP Closer to Mainline
1. Reach performance/energy parity with WALT
• Disable WALT by default in android-common when ready
2. Push better support for big.LITTLE into mainline scheduler
• Push out-of-tree wakeup and periodic balance changes upstream
• Push energy diff calculations upstream
3. Expect to continue to carry mobile-specific changes in AOSP
• Schedutil up/down throttle split
• Rt-rq signals
• Performance/Energy task classification
EAS Upstreaming
7 areas identified for upstreaming.

Feature                                         Status
Energy Model                                    On LKML (v1 March 2018, during Connect!)
Frequency and CPU Invariant Engines (FIE/CIE)   Merged in v4.15
Idle CPU PELT update (Remote status update)     Merged in tip/sched/core
Util Est                                        Merged in tip/sched/core (during Connect!)
Util Clamp                                      Almost ready (v1 on LKML April 2018)
Misfit Task                                     On LKML (v2 March 2018)
Dynamic Topology Flag Detection                 In development, many scenarios to cover
EAS Upstreaming
• Util-Est
• Add an aggregator on top of the PELT estimator
– keep track of what “we learned” about a task’s previous activations
– generate a “new” signal on top of PELT
• Build a low-overhead statistic for SEs and CPUs
– Tasks at dequeue time
– Root RQs at task enqueue/dequeue
• Lots of detail at last year’s OSPM Summit and on LKML
• Patches merged into upstream tip/sched/core branch
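The aggregation idea above can be illustrated with a minimal Python sketch. It is not the kernel code: the `UtilEst` class, its field names and the 1/4 weight given to each new sample are assumptions made for illustration. It only shows how a value sampled at dequeue time, combined with a moving average, can retain what was learned about earlier activations after the PELT signal has decayed during sleep.

```python
from dataclasses import dataclass


@dataclass
class UtilEst:
    """Hypothetical per-task estimator layered on top of a PELT-like signal."""
    enqueued: int = 0  # PELT utilization sampled at the last dequeue
    ewma: int = 0      # moving average over previous activations

    def on_dequeue(self, pelt_util: int) -> None:
        # Sample the task's utilization when it stops running.
        self.enqueued = pelt_util
        # Assumed weighting: the new sample counts for 1/4 of the average.
        self.ewma = (3 * self.ewma + pelt_util) // 4

    def estimate(self, pelt_util_now: int) -> int:
        # The estimate never drops below what previous activations showed.
        return max(pelt_util_now, self.enqueued, self.ewma)


task = UtilEst()
for _ in range(3):
    task.on_dequeue(400)  # three activations that each reached util 400

# After a long sleep the decayed PELT value may be small (50 here),
# but the estimate still reflects the earlier activations.
wakeup_estimate = task.estimate(50)
```

A CPU-level statistic could then be kept as the sum of the estimates of the tasks enqueued on that CPU's root runqueue, updated at enqueue and dequeue.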
EAS Upstreaming
• Misfit Tasks
• Promote long-running tasks to the most capable CPUs
• Key to achieving consistent performance in heterogeneous systems
• Tasks which don’t sleep need active migration
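A rough sketch of how a misfit task might be detected, written in Python for readability. The function names and the roughly 20% headroom margin (1280 over 1024 in fixed point) are assumptions for illustration, not taken from the slides; the capacities 462 and 1024 are the maximum LITTLE and big capacities quoted for Hikey960 elsewhere in this deck.

```python
CAPACITY_SCALE = 1024
CAPACITY_MARGIN = 1280  # assumption: require about 20% headroom


def fits_capacity(util: int, capacity: int) -> bool:
    """True if `util` fits on a CPU of `capacity` with the assumed headroom."""
    return capacity * CAPACITY_SCALE > util * CAPACITY_MARGIN


def is_misfit(task_util: int, cpu_capacity: int, max_capacity: int) -> bool:
    """A task is a misfit if it does not fit its CPU and a bigger CPU exists."""
    if cpu_capacity >= max_capacity:
        return False  # already on the most capable CPU type
    return not fits_capacity(task_util, cpu_capacity)


LITTLE_CAP, BIG_CAP = 462, 1024

heavy_on_little = is_misfit(400, LITTLE_CAP, BIG_CAP)
light_on_little = is_misfit(300, LITTLE_CAP, BIG_CAP)
heavy_on_big = is_misfit(1000, BIG_CAP, BIG_CAP)
```

A task that never sleeps never goes through the wakeup path, which is why a flag like this would have to be acted on by active migration during load balancing.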
A Simplified Energy Model for EAS
An Energy Model: why?
• Power/perf. characteristics differ between different SoCs
• Heuristics don’t perform well on many platforms
• The Energy Model enables the design of a platform-agnostic algorithm in the scheduler
• Designed for mainline
Summary
1. Today’s Energy Model
2. Which simplified Energy Model?
3. Mainline implementation
4. Conclusion
The Energy Model in Android / Hikey960
CPU level, busy costs:

        LITTLE                    big
MHz     Cap.    Cost      MHz     Cap.    Cost
533     133     87        903     390     404
999     250     167       1421    615     861
1402    351     265       1805    782     1398
1709    429     388       2112    915     2200
1844    462     502       2362    1024    2848

CPU level, idle costs:

LITTLE  big
5       18
5       18
0       0
0       0

Cluster level, busy costs:

        LITTLE                    big
MHz     Cap.    Cost      MHz     Cap.    Cost
533     133     12        903     390     102
999     250     22        1421    615     124
1402    351     36        1805    782     221
1709    429     67        2112    915     330
1844    462     144       2362    1024    433

Cluster level, idle costs:

LITTLE  big
12      102
12      102
12      102
0       0

[Charts: power against capacity, one at CPU level and one at cluster level]
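To show how such a table can be consumed, here is a small Python sketch using the CPU-level busy costs quoted for Hikey960. The `busy_cost` helper and the rule it applies (run at the lowest capacity state that fits the utilization, then scale the cost by the busy fraction) are simplifying assumptions for illustration, not the out-of-tree implementation.

```python
# CPU-level (capacity, cost) pairs copied from the Hikey960 table.
LITTLE = [(133, 87), (250, 167), (351, 265), (429, 388), (462, 502)]
BIG = [(390, 404), (615, 861), (782, 1398), (915, 2200), (1024, 2848)]


def busy_cost(table, util):
    """Cost of serving `util` at the lowest capacity state that fits it."""
    # Fall back to the highest state if the utilization exceeds every capacity.
    cap, cost = next((s for s in table if s[0] >= util), table[-1])
    # Scale by the fraction of time the CPU is busy at that state.
    return cost * min(util, cap) / cap


little = busy_cost(LITTLE, 200)  # served at cap 250, cost 167
big = busy_cost(BIG, 200)        # served at cap 390, cost 404
```

Under these assumptions, the same utilization of 200 is cheaper on a LITTLE CPU than on a big one, which is the kind of comparison the scheduler needs the table for.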
Device-tree bindings
arch/arm64/boot/dts/hisilicon/hi3660.dtsi:

[...]
cpu0: cpu@0 {
        [...]
        sched-energy-costs = <&CPU_COST_A53 &CL_COST_A53>;
        [...]
};
[...]
cpu1: cpu@1 {
        [...]
        sched-energy-costs = <&CPU_COST_A53 &CL_COST_A53>;
        [...]
};
[...]
cpu4: cpu@100 {
        [...]
        sched-energy-costs = <&CPU_COST_A72 &CL_COST_A72>;
        [...]
};

arch/arm64/boot/dts/hisilicon/hi3660-sched-energy.dtsi:

CPU_COST_A72: core-cost0 {
        /* capacity cost */
        busy-cost-data = <
                390 404
                615 861
                782 1398
                915 2200
                1024 2848 >;
        idle-cost-data = < 18 18 0 0 >;
};
CPU_COST_A53: core-cost1 {
        /* capacity cost */
        busy-cost-data = <
                133 87
                250 164
                351 265
                429 388
                462 502 >;
        idle-cost-data = < 5 5 0 0 >;
};
CLUSTER_COST_A72: cluster-cost0 {
        busy-cost-data = <
        [...]
Sched domains
[Diagram: sched domain hierarchy as seen from CPU 0. The MC domain spans CPUs 0-3 and its parent DIE domain spans CPUs 0-7; each domain lists its groups. The CPU-level LITTLE energy tables (busy and idle costs) are shown alongside the MC level, and the cluster-level LITTLE and big tables alongside the DIE level. The values repeat the Hikey960 tables.]
Need for simplification
• Comprehensive Energy Model, but …
• Complex to measure for new platforms
• Computationally expensive scheduling decisions
• Existing code relies only on out-of-tree bindings
• Inaccurate assumptions for future platforms
Which simplified EM?

                    CPU level                   Cluster level
Name                Active costs  Idle costs    Active costs  Idle costs
FULL                YES           YES           YES           YES
NOIDLE              YES           NO            YES           NO
NOCLUSTER           YES           YES           NO            NO
NOCLUSTER_NOIDLE    YES           NO            NO            NO
NO_EAS              NO            NO            NO            NO

• Tested on Android-4.4: Hikey960, Pixel2, Hikey620
• SchedTune disabled, no cpusets
Jankbench / list_view - Hikey960
[Chart comparing the FULL, NOIDLE, NOCLUSTER_NOIDLE, NOCLUSTER and NO_EAS energy models; successive builds highlight the mean, the 50% mark and the ~99% mark]
The same five energy models are compared in the following Hikey960 charts:
• Jankbench / image_list_view
• Jankbench / low_hitrate_text
• Jankbench / shadow_grid
• Jankbench / edit_text
• Homescreen
• ExoPlayer Video
• ExoPlayer Audio
Results of experiments
• Hikey960: all energy models show comparable energy savings
• Pixel2: same conclusions with smaller savings (up to 13%, screen on)
• Hikey620 (SMP): No significant savings
The simplest EM (NOCLUSTER_NOIDLE) is a reasonable option for modern platforms
Dynamic power model
$P = C \cdot V^2 \cdot f$
where P is power, C is capacitance, V is voltage and f is frequency.
• C: mainline DT binding dynamic-power-coefficient
• V and f: managed by CPUFreq / OPP
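As a worked example of the formula, the Python sketch below evaluates P = C * V^2 * f for two operating points. The function name, the assumed units (coefficient in uW/MHz/V^2, frequency in MHz, voltage in mV, result in mW) and the coefficient and OPP values are illustrative assumptions, not measured Hikey960 data.

```python
def dynamic_power_mw(coeff, freq_mhz, volt_mv):
    """Estimate dynamic power in mW from P = C * V^2 * f.

    Assumed units: `coeff` in uW/MHz/V^2, `freq_mhz` in MHz, `volt_mv` in mV.
    Dividing by 1e9 converts mV^2 to V^2 (1e6) and uW to mW (1e3).
    """
    return coeff * freq_mhz * volt_mv * volt_mv / 1e9


# Hypothetical coefficient and operating points, for illustration only.
COEFF = 100
high_opp = dynamic_power_mw(COEFF, 1000, 1000)  # 1000 MHz at 1.0 V
low_opp = dynamic_power_mw(COEFF, 500, 800)     # 500 MHz at 0.8 V
```

Halving the frequency while also lowering the voltage cuts the estimate by more than half, because the voltage enters the formula squared.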
Energy Model Comparison / Hikey960
[Chart: power (%) against frequency (MHz), comparing measured power with the C * V^2 * f estimate]
Architecture
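[Diagram: the Device Tree supplies C, and PM OPP supplies V and f. The power estimate P = C V^2 f is produced in PM OPP and consumed by both Thermal / IPA and the scheduler Energy Model. SCMI and "Other ?" appear as further possible data sources.]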
[PATCH v3 0/2] thermal, OPP: move the CPU power estimation to the OPP library
-> [PATCH v3 1/2] PM / OPP: introduce an OPP power estimation helper
-> [PATCH v3 2/2] thermal: cpu_cooling: use power models from the OPP library
Implementation
• No hierarchical data, so no need to use the scheduling domains
• Data structures:
– Loaded from PM / OPP at boot time, after CPUFreq is set up
– Energy models are stored in a flat per-CPU array
– Frequency domains are stored in cpumasks
Assumptions
• All CPUs in a freq. domain share capacity states
• All CPUs in a freq. domain have the same micro-architecture
• Possible to relax this if needed, but higher computational cost
• EAS enabled for asymmetric platforms only (SD_ASYM_CPUCAPACITY flag set)
• EAS shines on heterogeneous platforms
• Avoid “conflicts” for purely perf-oriented platforms (servers, …)
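The data layout and assumptions above can be sketched as follows. This Python model is illustrative only: the names `CapState`, `EM`, `FREQ_DOMAINS` and `estimate_energy` are hypothetical, the tables reuse the Hikey960 CPU-level busy costs from earlier slides, and the estimate considers active CPU costs only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapState:
    cap: int    # capacity available at this operating point
    power: int  # active cost at this operating point


LITTLE_EM = [CapState(133, 87), CapState(250, 167), CapState(351, 265),
             CapState(429, 388), CapState(462, 502)]
BIG_EM = [CapState(390, 404), CapState(615, 861), CapState(782, 1398),
          CapState(915, 2200), CapState(1024, 2848)]

# Flat per-CPU array of energy models: CPUs 0-3 LITTLE, CPUs 4-7 big.
EM = [LITTLE_EM] * 4 + [BIG_EM] * 4
# Frequency domains stored as cpumasks (bitmasks over CPU ids).
FREQ_DOMAINS = [0b00001111, 0b11110000]


def cpus_in(mask):
    """List the CPU ids set in a cpumask."""
    return [cpu for cpu in range(len(EM)) if (mask >> cpu) & 1]


def estimate_energy(util):
    """Estimate system energy for a list of per-CPU utilizations."""
    total = 0.0
    for mask in FREQ_DOMAINS:
        cpus = cpus_in(mask)
        table = EM[cpus[0]]  # assumption: the domain shares capacity states
        peak = max(util[cpu] for cpu in cpus)
        # The busiest CPU decides the operating point of the whole domain.
        state = next((s for s in table if s.cap >= peak), table[-1])
        total += sum(state.power * util[cpu] / state.cap for cpu in cpus)
    return total


task_on_little = estimate_energy([100, 0, 0, 0, 0, 0, 0, 0])
task_on_big = estimate_energy([0, 0, 0, 0, 100, 0, 0, 0])
```

Because every CPU in a frequency domain shares capacity states, one table lookup per domain is enough; relaxing that assumption would require a lookup per CPU, which is the higher computational cost the slide mentions.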
Tests against mainline
• Test setup:
• Platform: Hikey960 and Juno r0
• Debian userspace
• Base kernel: tip/sched/core – 4.16-rc2
• Test cases:
• Energy: “X” RTApp tasks, 16ms period, 5% duty cycle, 30 seconds
• Performance: `perf bench sched messaging --pipe --thread --group X --loop 50000`
Tests against mainline / Energy
[Bar charts: energy (%) of tip/sched/core and EAS for 10, 20, 30, 40 and 50 tasks. One panel per platform: Hikey960 (ACME / full SoC + memory) and Juno (HW monitor / b.L CPUs only).]
Tests against mainline / Perf.
[Bar charts: time (s) of tip/sched/core and EAS for 40, 80, 160 and 320 tasks. One panel per platform: Hikey960 and Juno.]
Posted to LKML this week
[RFC PATCH 0/6] Energy Aware Scheduling
[RFC PATCH 1/6] sched/fair: Create util_fits_capacity()
[RFC PATCH 2/6] sched: Introduce energy models of CPUs
[RFC PATCH 3/6] sched: Add over-utilization/tipping point indicator
[RFC PATCH 4/6] sched/fair: Introduce an energy estimation helper …
[RFC PATCH 5/6] sched/fair: Select an energy-efficient CPU on task …
[RFC PATCH 6/6] drivers: base: arch_topology.c: Enable EAS for arm/…
Next steps
• Ideal scenario: simplified EM goes in the next LTS (4.19 ?)
• Test & assessment on android-4.14
• In case of gaps with the full EM, they will be filled in product
Thanks.
Any questions ?
Algorithm complexity
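[Diagram: placement of tasks T1 and T2 on an 8-CPU system, animated over successive builds]
LITTLE CPUs (CPU0 to CPU3), capacity states: cap=100 cost=100; cap=350 cost=400; cap=500 cost=800
BIG CPUs (CPU4 to CPU7), capacity states: cap=400 cost=700; cap=600 cost=1000; cap=800 cost=1800; cap=1000 cost=3000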
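The slides do not spell out the algorithm being animated, so the following Python sketch is only one plausible reading: try T2 on every CPU, estimate the system energy for each trial, and keep the cheapest placement. The capacity states come from the slide; the function names, the utilizations of T1 and T2, and the placement of T1 on CPU0 are hypothetical.

```python
LITTLE = [(100, 100), (350, 400), (500, 800)]
BIG = [(400, 700), (600, 1000), (800, 1800), (1000, 3000)]
TABLES = [LITTLE] * 4 + [BIG] * 4     # per-CPU (cap, cost) tables
DOMAINS = [range(0, 4), range(4, 8)]  # CPUs sharing a frequency


def system_energy(util):
    """Sum the busy cost of every CPU at its domain's operating point."""
    total = 0.0
    for domain in DOMAINS:
        table = TABLES[domain[0]]
        peak = max(util[cpu] for cpu in domain)
        cap, cost = next((s for s in table if s[0] >= peak), table[-1])
        total += sum(cost * util[cpu] / cap for cpu in domain)
    return total


def best_cpu(util, task_util):
    """Brute-force search: return (cheapest CPU, per-CPU evaluations done)."""
    best = None
    evaluations = 0
    for cpu in range(len(util)):
        trial = list(util)
        trial[cpu] += task_util
        evaluations += len(trial)  # each trial visits every CPU once
        energy = system_energy(trial)
        if best is None or energy < best[1]:
            best = (cpu, energy)
    return best[0], evaluations


# Hypothetical scenario: T1 (util 300) runs on CPU0, T2 (util 150) wakes up.
current = [300, 0, 0, 0, 0, 0, 0, 0]
chosen, work = best_cpu(current, 150)
```

With 8 CPUs this naive search performs 8 x 8 per-CPU cost evaluations for a single wakeup, which illustrates why the number of candidates and the size of the model matter for scheduling overhead.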

HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline

  • 1.
    © 2017 ArmLimited EAS in Android Common Kernel Linaro Connect Hong Kong 2018 Chris Redpath Quentin Perret Open Source Software
  • 2.
    © 2017 ArmLimited2 EAS in Android Common Kernel AOSP Common Kernel Update EAS Mainline Strategy EAS Upstreaming
  • 3.
    © 2017 ArmLimited AOSP Common Kernel Update
  • 4.
    © 2017 ArmLimited4 AOSP Common Kernel Update • EAS r1.3, July 2017 • android-4.4, android-4.9 • Default cpufreq governor switched to schedutil, sched-freq removed • Backports of upstream schedutil changes • Upstream backports of relevant scheduler features android-4.4 android-4.9 android-4.14 EAS r1.3 EAS r1.4 EAS r1.5
  • 5.
    © 2017 ArmLimited5 AOSP Common Kernel Update • EAS r1.4, November 2017 • android-4.4, android-4.9 • Upstream backports of more scheduler and schedutil patches • Energy diff improvements & fixes • android-4.14 EAS released including 1.4 & most 1.5 functionality android-4.4 android-4.9 android-4.14 EAS r1.3 EAS r1.4 EAS r1.5
  • 6.
    © 2017 ArmLimited6 EAS in android-4.14 A new set of patches implementing EAS rather than forward-porting • Based upon our latest mainline-focussed integration branch • Refactored latest android-eas on top to build clean set of patches • More Experimental features placed behind sched_features • Feature configuration matches android-4.9 • Produced during linux-4.14 rc phase, ready 2-weeks after linux-4.14
  • 7.
    © 2017 ArmLimited7 EAS in android-4.14 android-specific Upstream-targeted find_best_target schedtune WALT Trace & Debug Topology Detection Invariance Support Use of idle states Rt-PELT Schedutil changes Sync Wakeups Energy Diff Calculation NOHZ Signal Updates Misfit Tasks & Overutilized Flags Load balance tweaks
  • 8.
    © 2017 ArmLimited8 AOSP Common Kernel Update • EAS r1.5, Feb 2018 (eas-dev), merging to android-4.9 soon • android-4.9 only, most changes already in android-4.14 • Refactored energy diff to make calculation more efficient • Further refinement of EAS CPU pre-selection (find_best_target) – Thanks for excellent contributions from Qualcomm, Spreadtrum, Mstar, Linaro • Aggressive up-migrate of Misfit tasks & WALT updates from CodeAurora android-4.4 android-4.9 android-4.14 EAS r1.3 EAS r1.4 EAS r1.5
  • 9.
    © 2017 ArmLimited9 AOSP Common Kernel Update • EAS r1.6, eas-dev starting April 2018 • Moving to android-4.14 • Adding back Schedtune PE space filtering • Util_est backport, with PELT decay rate changes • Use mainline wakeup code for prefer_idle tasks • Remove ordering dependency in find_best_target – ( better tri-gear support when using find_best_target ) android-4.4 android-4.9 android-4.14 EAS r1.5 EAS r1.6
  • 10.
    © 2017 ArmLimited10 AOSP Common Kernel Update Branches: • android-4.4, android-4.9 & android-4.14 • Common kernel upstream for device kernels • Only post against this for bugfixes • People merge these into device kernels, so need to be selective about changes
  • 11.
    © 2017 ArmLimited11 AOSP Common Kernel Update More branches: • android-4.9-eas-dev (soon android-4.14-eas-dev) • This is where in-development patches should be posted • Arm power team usually post patches at RFC stage to stimulate discussion • Changes picked or merged back to common • android-4.4-eas-test (android-4.9-eas-test later) • Test branch is against android common for the latest well-supported public device • Intended to hold backports of EAS patches which merged into the active common branch, but did not get back to the branch we test with
  • 12.
    © 2017 ArmLimited12 AOSP Common Kernel Update There have been some consistent themes in EAS development over the last year or so:
  • 13.
    © 2017 ArmLimited13 AOSP Common Kernel Update There have been some consistent themes in EAS development over the last year or so: • Reducing delta with mainline
  • 14.
    © 2017 ArmLimited14 AOSP Common Kernel Update There have been some consistent themes in EAS development over the last year or so: • Reducing delta with mainline • Refactoring to improve maintainability and predictability
  • 15.
    © 2017 ArmLimited15 AOSP Common Kernel Update There have been some consistent themes in EAS development over the last year or so: • Reducing delta with mainline • Refactoring to improve maintainability and predictability • New features where necessary
  • 16.
    © 2017 ArmLimited16 AOSP Common Kernel Update There have been some consistent themes in EAS development over the last year or so: • Reducing delta with mainline • Refactoring to improve maintainability and predictability • New features where necessary • Open, collaborative development
  • 17.
    © 2017 ArmLimited17 AOSP Common Kernel Update Open Development • Patches for AOSP are reviewed on AOSP Gerrit • https://android-review.googlesource.com • We always try to justify patches with performance & energy numbers – use wltests for this • Wltests is part of LISA https://www.github.com/arm-software/lisa • Discussion of other topics and announcements are on Linaro’s eas-dev list • https://lists.linaro.org/mailman/listinfo/eas-dev
  • 18.
    EAS Mainline Strategy
  • 19.
    EAS Mainline Strategy
    • EAS is a large, complex piece of functionality
    • EAS being in AOSP helps a lot of users but not all
    • Upstream development results in better code
  • 20.
    EAS Mainline Strategy
    • We make regular bi-weekly integrations of all our upstream-focussed code
      • Available on linux-arm.org & announced on eas-dev
      • Allows us to more easily see when changes impact us and work to resolve as soon as possible
    • Have been identifying suitable code we already have
      • Working on getting them into acceptable shape
      • Pushing when we think they are good enough for a review
    • Hoping to upstream quite a lot of EAS this year
  • 21.
    EAS Mainline Strategy
    • Also working upstream where we can and backporting to Android
      • schedutil fixes
      • cpu signal updates
      • any fix/change applicable and potentially useful elsewhere
      • participating in reviews and testing
  • 22.
    EAS in AOSP
    [Pie charts: "EAS Code Size by Category" and "Target EAS Code Size by Category"]
        Category               Current  Target
        Android-specific       3797     824
        WALT                   1470     0
        Upstreamable Features  2785     2785
        Documentation          1153     1153
  • 23.
    EAS Size
    [Bar chart: ANDROID-SPECIFIC EAS CODE SIZE]
    • EAS code size in android-4.14: 100.00%
    • Android-specific code target (rough estimate if everything goes to plan, mid-2019): 14.76%
  • 24.
    Bringing EAS in AOSP Closer to Mainline
    1. Reach performance/energy parity with WALT
       • WALT is great for mobile but not popular upstream
       • It’s also 1.5k LoC
       • Touches many parts of the scheduler we want to change upstream, which makes backporting harder
  • 26.
    Bringing EAS in AOSP Closer to Mainline
    1. Reach performance/energy parity with WALT
       • Disable WALT by default in android-common when ready
    2. Push better support for big.LITTLE into mainline scheduler
       • Push out-of-tree wakeup and periodic balance changes upstream
       • Push energy diff calculations upstream
    3. Expect to continue to carry mobile-specific changes in AOSP
       • Schedutil up/down throttle split
       • Rt-rq signals
       • Performance/Energy task classification
  • 27.
    EAS Upstreaming
  • 28.
    EAS Upstreaming
    7 areas identified for upstreaming.
        Feature: Status
        Energy Model: On LKML (v1 March 2018, during Connect!)
        Frequency and CPU Invariant Engines (FIE/CIE): Merged in v4.15
        Idle CPU PELT update (Remote status update): Merged in tip/sched/core
        Util Est: Merged in tip/sched/core (during Connect!)
        Util Clamp: Almost ready (v1 on LKML April 2018)
        Misfit Task: On LKML (v2 March 2018)
        Dynamic Topology Flag Detection: In development, many scenarios to cover
  • 29.
    EAS Upstreaming
    • Util-Est
      • Add an aggregator on top of the PELT estimator
        – keep track of what “we learned” about task’s previous activations
        – generate a “new” signal on top of PELT
      • Build a low-overhead statistic for SEs and CPUs
        – Tasks at dequeue time
        – Root RQs at task enqueue/dequeue
      • Lots of detail at last year’s OSPM Summit and LKML
      • Patches merged into upstream tip/sched/core branch
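The Util-Est idea on this slide can be illustrated with a small model. This is only a sketch under assumptions that the slide does not state: the aggregator is taken to be an exponentially weighted moving average with a 1/4 weight for the newest sample, and the estimate is taken to be the larger of the average and the last sample. The class and method names (`UtilEst`, `RootRq`, `update_at_dequeue`) are hypothetical and are not kernel identifiers.

```python
from dataclasses import dataclass


@dataclass
class UtilEst:
    """Per-task estimate built on top of the PELT utilization signal."""
    enqueued: int = 0  # PELT utilization sampled at the last dequeue
    ewma: int = 0      # moving average over previous activations

    def update_at_dequeue(self, pelt_util: int) -> None:
        # Assumed weighting: 3/4 history, 1/4 newest sample.
        self.enqueued = pelt_util
        self.ewma = (3 * self.ewma + pelt_util) // 4

    def estimate(self) -> int:
        # Assumed policy: report the larger of the last sample and the average.
        return max(self.enqueued, self.ewma)


class RootRq:
    """CPU-level aggregate, updated when tasks are enqueued or dequeued."""

    def __init__(self) -> None:
        self.util_est = 0

    def enqueue(self, task: UtilEst) -> None:
        self.util_est += task.estimate()

    def dequeue(self, task: UtilEst, pelt_util: int) -> None:
        self.util_est -= task.estimate()   # remove the old contribution
        task.update_at_dequeue(pelt_util)  # then learn from this activation
```

In this model a task that ran at utilization 400 twice and then at 100 still reports an estimate above 100, which is the "what we learned" behaviour the slide refers to.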
  • 30.
    EAS Upstreaming
    • Misfit Tasks
      • Promote long-running tasks to most capable CPUs
      • Key to achieving consistent performance in heterogeneous systems
      • Tasks which don’t sleep need active migration
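A misfit check of the kind described here could look like the sketch below. It assumes a task counts as misfit when its utilization no longer fits its current CPU with some headroom and a more capable CPU exists. The roughly 20% margin and the function names are assumptions made for illustration; the capacities 462 and 1024 are the top LITTLE and big values from the Hikey960 table later in this deck.

```python
CAPACITY_SCALE = 1024
CAPACITY_MARGIN = 1280  # assumed headroom of about 20% (1280 / 1024)


def fits_capacity(util: int, capacity: int) -> bool:
    """True if util fits in capacity while leaving the assumed headroom."""
    return util * CAPACITY_MARGIN < capacity * CAPACITY_SCALE


def is_misfit(task_util: int, cpu_capacity: int, max_capacity: int) -> bool:
    """A task is misfit if it does not fit here and a bigger CPU exists."""
    return cpu_capacity < max_capacity and not fits_capacity(task_util, cpu_capacity)
```

A task that never sleeps will not pass through the wakeup path again, which is why the slide notes that such tasks need active migration once they are flagged.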
  • 31.
    A Simplified Energy Model for EAS
  • 32.
    An Energy Model: why ?
    • Power/perf. characteristics differ between different SoCs
    • Heuristics don’t perform well on many platforms
    • The Energy Model enables the design of a platform-agnostic algorithm in the scheduler
    • Designed for mainline
  • 33.
    Summary
    1. Today’s Energy Model
    2. Which simplified Energy Model ?
    3. Mainline implementation
    4. Conclusion
  • 38.
    The Energy Model in Android / Hikey960
    CPU level, busy costs:
        LITTLE                big
        MHz   Cap.  Cost      MHz   Cap.  Cost
        533   133   87        903   390   404
        999   250   167       1421  615   861
        1402  351   265       1805  782   1398
        1709  429   388       2112  915   2200
        1844  462   502       2362  1024  2848
    CPU level, idle costs: LITTLE 5 5 0 0; big 18 18 0 0
    Cluster level, busy costs:
        LITTLE                big
        MHz   Cap.  Cost      MHz   Cap.  Cost
        533   133   12        903   390   102
        999   250   22        1421  615   124
        1402  351   36        1805  782   221
        1709  429   67        2112  915   330
        1844  462   144       2362  1024  433
    Cluster level, idle costs: LITTLE 12 12 12 0; big 102 102 102 0
    [Charts: Power against Capacity, one at CPU level and one at cluster level]
  • 41.
    Device-tree bindings
    arch/arm64/boot/dts/hisilicon/hi3660.dtsi:
        [...]
        cpu0: cpu@0 {
            [...]
            sched-energy-costs = <&CPU_COST_A53 &CL_COST_A53>;
            [...]
        };
        [...]
        cpu1: cpu@1 {
            [...]
            sched-energy-costs = <&CPU_COST_A53 &CL_COST_A53>;
            [...]
        };
        [...]
        cpu4: cpu@100 {
            [...]
            sched-energy-costs = <&CPU_COST_A72 &CL_COST_A72>;
            [...]
        };
    arch/arm64/boot/dts/hisilicon/hi3660-sched-energy.dtsi:
        CPU_COST_A72: core-cost0 {
            busy-cost-data = <
                390 404
                615 861
                782 1398
                915 2200
                1024 2848
            >;
            idle-cost-data = <
                18 18 0 0
            >;
        };
        CPU_COST_A53: core-cost1 {
            busy-cost-data = <
                133 87
                250 164
                351 265
                429 388
                462 502
            >;
            idle-cost-data = <
                5 5 0 0
            >;
        };
        CLUSTER_COST_A72: cluster-cost0 {
            busy-cost-data = <
            [...]
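As a reading aid for the binding above, the sketch below groups the `busy-cost-data` cells into pairs. It assumes each pair is (capacity, cost), which matches the Cap. and Cost columns of the Hikey960 table; the helper name `parse_cost_pairs` is hypothetical and this is not how the kernel parses the property.

```python
from typing import List, Sequence, Tuple


def parse_cost_pairs(cells: Sequence[int]) -> List[Tuple[int, int]]:
    """Group a flat list of cells into (capacity, cost) pairs."""
    if len(cells) % 2 != 0:
        raise ValueError("busy-cost-data must hold an even number of cells")
    return list(zip(cells[0::2], cells[1::2]))


# Cell values copied from the CPU_COST_A72 node shown on the slide.
A72_BUSY = [390, 404, 615, 861, 782, 1398, 915, 2200, 1024, 2848]
A72_STATES = parse_cost_pairs(A72_BUSY)
```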
  • 48.
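    Sched domains
    [Diagram: sched domains as seen from CPU 0. The MC domain spans CPUs 0-3 with groups 0, 1, 2, 3; its parent, the DIE domain, spans CPUs 0-7 with groups labelled 10 and 11. Energy tables are shown next to the groups:]
    • CPU, LITTLE (Cap. / Cost): 133 / 87, 250 / 167, 351 / 265, 429 / 388, 462 / 502; idle costs: 5 5 0 0
    • Cluster, LITTLE (Cap. / Cost): 133 / 12, 250 / 22, 351 / 36, 429 / 67, 462 / 144; idle costs: 12 12 12 0
    • Cluster, big (Cap. / Cost): 390 / 102, 615 / 124, 782 / 221, 915 / 330, 1024 / 433; idle costs: 102 102 102 0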
  • 53.
    Need for simplification
    • Comprehensive Energy Model, but …
      • Complex to measure for new platforms
      • Computationally expensive scheduling decisions
      • Existing code relies only on out-of-tree bindings
      • Inaccurate assumptions for future platforms
  • 54.
    Summary
    1. Today’s Energy Model
    2. Which simplified Energy Model ?
    3. Mainline implementation
    4. Conclusion
  • 63.
    Which simplified EM ?
                          CPU Level                   Cluster Level
        Name              Active costs  Idle costs    Active costs  Idle costs
        FULL              YES           YES           YES           YES
        NOIDLE            YES           NO            YES           NO
        NOCLUSTER         YES           YES           NO            NO
        NOCLUSTER_NOIDLE  YES           NO            NO            NO
        NO_EAS            NO            NO            NO            NO
    • Tested on Android-4.4: Hikey960, Pixel2, Hikey620
    • SchedTune disabled, no cpusets
  • 64.
    Jankbench / list_view - Hikey960 [Chart comparing FULL, NOIDLE, NOCLUSTER_NOIDLE, NOCLUSTER and NO_EAS; annotations: Mean, 50%, ~99%]
  • 70.
    Jankbench / image_list_view - Hikey960 [Chart comparing FULL, NOIDLE, NOCLUSTER_NOIDLE, NOCLUSTER and NO_EAS]
  • 71.
    Jankbench / low_hitrate_text - Hikey960 [Chart comparing FULL, NOIDLE, NOCLUSTER_NOIDLE, NOCLUSTER and NO_EAS]
  • 72.
    Jankbench / shadow_grid - Hikey960 [Chart comparing FULL, NOIDLE, NOCLUSTER_NOIDLE, NOCLUSTER and NO_EAS]
  • 73.
    Jankbench / edit_text - Hikey960 [Chart comparing FULL, NOIDLE, NOCLUSTER_NOIDLE, NOCLUSTER and NO_EAS]
  • 74.
    Homescreen / Hikey960 [Chart comparing FULL, NOIDLE, NOCLUSTER_NOIDLE, NOCLUSTER and NO_EAS]
  • 75.
    ExoPlayer Video / Hikey960 [Chart comparing FULL, NOIDLE, NOCLUSTER_NOIDLE, NOCLUSTER and NO_EAS]
  • 76.
    ExoPlayer Audio / Hikey960 [Chart comparing FULL, NOIDLE, NOCLUSTER_NOIDLE, NOCLUSTER and NO_EAS]
  • 78.
    Results of experiments
    • Hikey960: all energy models show comparable energy savings
    • Pixel2: same conclusions with smaller savings (up to 13%, screen on)
    • Hikey620 (SMP): No significant savings
    The simplest EM (NOCLUSTER_NOIDLE) is a reasonable option for modern platforms
  • 79.
    Summary
    1. Today’s Energy Model
    2. Which simplified Energy Model ?
    3. Mainline implementation
    4. Conclusion
  • 87.
    Dynamic power model
    $P = C \cdot V^2 \cdot f$
    • P: Power
    • C: Capacitance (mainline DT binding: dynamic-power-coefficient)
    • V: Voltage, f: Frequency (managed by CPUFreq / OPP)
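The formula on this slide can be evaluated per operating point as in the sketch below. The unit convention is an assumption: the coefficient is taken in uW/MHz/V^2, frequency in MHz and voltage in mV, so the result is in uW. The coefficient and the OPP list are made-up illustrative values, not Hikey960 data, and `dynamic_power_uw` is a hypothetical helper name.

```python
from typing import List, Tuple


def dynamic_power_uw(coeff: int, freq_mhz: int, volt_mv: int) -> int:
    """P = C * V^2 * f, with V supplied in mV (hence the 1e6 divisor)."""
    return coeff * volt_mv * volt_mv * freq_mhz // 1_000_000


# Hypothetical (frequency in MHz, voltage in mV) operating points.
EXAMPLE_OPPS: List[Tuple[int, int]] = [(500, 800), (1000, 1000)]
EXAMPLE_COEFF = 100  # hypothetical dynamic-power-coefficient

POWER_TABLE = [(f, dynamic_power_uw(EXAMPLE_COEFF, f, v)) for f, v in EXAMPLE_OPPS]
```

Because voltage enters squared and usually rises with frequency, the power column grows faster than the frequency column, which is the shape the deck compares against measured power.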
  • 88.
    Energy Model Comparison / Hikey960 [Chart: Power (%) against Frequency (MHz), measured power compared with $C \cdot V^2 \cdot f$]
  • 96.
    Architecture
    [Diagram with boxes: Device Tree (C), PM OPP (V, f), the power estimate $P = C V^2 f$, Thermal / IPA, Scheduler Energy Model, SCMI, Other ?]
    [PATCH v3 0/2] thermal, OPP: move the CPU power estimation to the OPP library
    -> [PATCH v3 1/2] PM / OPP: introduce an OPP power estimation helper
    -> [PATCH v3 2/2] thermal: cpu_cooling: use power models from the OPP library
  • 97.
    Implementation
    • No hierarchical data, no need to use the scheduling domains
    • Data structures:
      • Loaded from PM / OPP at boot time, after CPUfreq is set up
      • Energy models are stored in a flat per-cpu array
      • Frequency domains are stored in cpu-masks
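The data layout described on this slide can be pictured with the following sketch: one flat array indexed by CPU, plus a bitmask per frequency domain. It is a model, not the posted patch set. The names are hypothetical, the CPU numbering (0-3 LITTLE, 4-7 big) is an assumption, and the capacity states reuse the Hikey960 CPU-level table from earlier in the deck.

```python
from typing import List, Optional, Tuple

CapState = Tuple[int, int]  # (capacity, power cost)

LITTLE_STATES: List[CapState] = [(133, 87), (250, 167), (351, 265), (429, 388), (462, 502)]
BIG_STATES: List[CapState] = [(390, 404), (615, 861), (782, 1398), (915, 2200), (1024, 2848)]

NR_CPUS = 8
# One (cpumask, capacity states) entry per frequency domain.
FREQ_DOMAINS = [(0b00001111, LITTLE_STATES), (0b11110000, BIG_STATES)]


def build_em(nr_cpus: int, domains) -> List[Optional[List[CapState]]]:
    """Fill a flat per-CPU array; CPUs of one domain share one table."""
    em: List[Optional[List[CapState]]] = [None] * nr_cpus
    for mask, states in domains:
        for cpu in range(nr_cpus):
            if mask & (1 << cpu):
                em[cpu] = states
    return em


def freq_domain_mask(cpu: int, domains) -> int:
    """Return the cpumask of the frequency domain that contains cpu."""
    for mask, _ in domains:
        if mask & (1 << cpu):
            return mask
    raise ValueError(f"CPU {cpu} is not in any frequency domain")
```

A lookup is then a plain array index, with no walk over the scheduling domain hierarchy.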
  • 98.
    Assumptions
    • All CPUs in a freq. domain share capacity states
      • All CPUs in a freq. domain have the same micro-architecture
      • Possible to relax this if needed, but higher computational cost
    • EAS enabled for asymmetric platforms only (SD_ASYM_CPUCAPACITY flag set)
      • EAS shines on heterogeneous platforms
      • Avoid “conflicts” for purely perf-oriented platforms (servers, …)
  • 99.
    Tests against mainline
    • Test setup:
      • Platform: Hikey960 and Juno r0
      • Debian userspace
      • Base kernel: tip/sched/core (4.16-rc2)
    • Test cases:
      • Energy: “X” RTApp tasks, 16ms period, 5% duty cycle, 30 seconds
      • Performance: `perf bench sched messaging --pipe --thread --group X --loop 50000`
  • 100.
    Tests against mainline / Energy [Charts: Energy (%) for 10, 20, 30, 40 and 50 tasks, tip/sched/core compared with EAS, on Hikey960 (ACME / full SoC + memory) and on Juno (HW monitor / b.L CPUs only)]
  • 101.
    Tests against mainline / Perf. [Charts: Time (s) for 40, 80, 160 and 320 tasks, tip/sched/core compared with EAS, on Hikey960 and on Juno]
  • 102.
    Posted to LKML this week
    [RFC PATCH 0/6] Energy Aware Scheduling
    [RFC PATCH 1/6] sched/fair: Create util_fits_capacity()
    [RFC PATCH 2/6] sched: Introduce energy models of CPUs
    [RFC PATCH 3/6] sched: Add over-utilization/tipping point indicator
    [RFC PATCH 4/6] sched/fair: Introduce an energy estimation helper …
    [RFC PATCH 5/6] sched/fair: Select an energy-efficient CPU on task …
    [RFC PATCH 6/6] drivers: base: arch_topology.c: Enable EAS for arm/…
  • 103.
    Summary
    1. Today’s Energy Model
    2. Which simplified Energy Model ?
    3. Mainline implementation
    4. Conclusion
  • 104.
    Next steps
    • Ideal scenario: simplified EM goes in the next LTS (4.19 ?)
    • Test & assessment on android-4.14
    • In case of gaps with the full EM, they will be filled in product
  • 105.
    Thanks. Any questions ?
  • 106.
    © 2017 Arm Limited. The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners. www.arm.com/company/policies/trademarks
  • 107.
    Algorithm complexity
    [Diagram: task T1 on a system with two groups of CPUs and their capacity states]
    • LITTLE CPUs (CPU0 CPU1 CPU2 CPU3): cap=100 cost=100; cap=350 cost=400; cap=500 cost=800
    • BIG CPUs (CPU4 CPU5 CPU6 CPU7): cap=400 cost=700; cap=600 cost=1000; cap=800 cost=1800; cap=1000 cost=3000
  • 108.
    Algorithm complexity [Animation, slides 108 to 125: the same LITTLE / BIG capacity-state diagram with task T1, with a second task T2 added in successive steps]
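To make the cost of such a placement search concrete, the sketch below estimates system energy for each candidate CPU using the cap/cost values from the diagram. The estimation rule is an assumption made for illustration, not the posted implementation: each frequency domain runs at the lowest capacity state that covers its busiest CPU, and the domain's energy is the cost of that state scaled by the summed utilization over its capacity. All function names, the per-CPU utilization values and the CPU numbering are hypothetical.

```python
from typing import List, Sequence, Tuple

LITTLE = [(100, 100), (350, 400), (500, 800)]                # (cap, cost)
BIG = [(400, 700), (600, 1000), (800, 1800), (1000, 3000)]   # (cap, cost)
DOMAINS = [(range(0, 4), LITTLE), (range(4, 8), BIG)]


def domain_energy(utils: Sequence[int], cpus: range,
                  states: List[Tuple[int, int]]) -> int:
    """Energy of one frequency domain under the assumed rule."""
    max_util = max(utils[c] for c in cpus)
    # Lowest state covering the busiest CPU, else the highest state.
    cap, cost = next((s for s in states if s[0] >= max_util), states[-1])
    return cost * sum(utils[c] for c in cpus) // cap


def system_energy(utils: Sequence[int]) -> int:
    return sum(domain_energy(utils, cpus, states) for cpus, states in DOMAINS)


def energy_with_task(utils: Sequence[int], task_util: int, cpu: int) -> int:
    """Estimated system energy if the task is placed on cpu."""
    trial = list(utils)
    trial[cpu] += task_util
    return system_energy(trial)


def best_cpu(utils: Sequence[int], task_util: int) -> int:
    """Try every CPU; each trial re-evaluates every domain."""
    return min(range(len(utils)),
               key=lambda c: energy_with_task(utils, task_util, c))
```

With T1 contributing a utilization of 80 on CPU0 and a T2 of utilization 150, this model prefers a LITTLE CPU. Each candidate requires a pass over all CPUs, so the work grows with candidates times CPUs, which is the kind of cost these slides are concerned with.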