Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline

534 views

Published on

"Session ID: HKG18-501
Session Name: HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
Speaker: Chris Redpath
Track: Mobile, Kernel


★ Session Summary ★
This session will introduce the changes to EAS planned for 4.14 kernel, and how Arm hopes that EAS will develop in future. EAS has already evolved from an Arm/Linaro joint project to involving a much wider community of SoC vendors, Google and interested device manufacturers. We will highlight the product-specific pieces remaining in the Android Common Kernel EAS implementation, and our plans to provide an upstreaming plan for each product feature. In particular, the new 'simplified energy model' is designed to provide mainline-friendliness and comparable performance using a simple DT expression of cpu power/performance.
---------------------------------------------------
★ Resources ★
Event Page: http://connect.linaro.org/resource/hkg18/hkg18-501/
Presentation: http://connect.linaro.org.s3.amazonaws.com/hkg18/presentations/hkg18-501.pdf
Video: http://connect.linaro.org.s3.amazonaws.com/hkg18/videos/hkg18-501.mp4
---------------------------------------------------
★ Event Details ★
Linaro Connect Hong Kong 2018 (HKG18)
19-23 March 2018
Regal Airport Hotel Hong Kong

---------------------------------------------------
Keyword: Mobile, Kernel
'http://www.linaro.org'
'http://connect.linaro.org'
---------------------------------------------------
Follow us on Social Media
https://www.facebook.com/LinaroOrg
https://www.youtube.com/user/linaroorg?sub_confirmation=1
https://www.linkedin.com/company/1026961"

Published in: Technology
  • Be the first to comment

  • Be the first to like this

HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline

  1. 1. © 2017 Arm Limited EAS in Android Common Kernel Linaro Connect Hong Kong 2018 Chris Redpath Quentin Perret Open Source Software
  2. 2. © 2017 Arm Limited2 EAS in Android Common Kernel AOSP Common Kernel Update EAS Mainline Strategy EAS Upstreaming
  3. 3. © 2017 Arm Limited AOSP Common Kernel Update
  4. 4. © 2017 Arm Limited4 AOSP Common Kernel Update • EAS r1.3, July 2017 • android-4.4, android-4.9 • Default cpufreq governor switched to schedutil, sched-freq removed • Backports of upstream schedutil changes • Upstream backports of relevant scheduler features android-4.4 android-4.9 android-4.14 EAS r1.3 EAS r1.4 EAS r1.5
  5. 5. © 2017 Arm Limited5 AOSP Common Kernel Update • EAS r1.4, November 2017 • android-4.4, android-4.9 • Upstream backports of more scheduler and schedutil patches • Energy diff improvements & fixes • android-4.14 EAS released including 1.4 & most 1.5 functionality android-4.4 android-4.9 android-4.14 EAS r1.3 EAS r1.4 EAS r1.5
  6. 6. © 2017 Arm Limited6 EAS in android-4.14 A new set of patches implementing EAS rather than forward-porting • Based upon our latest mainline-focussed integration branch • Refactored latest android-eas on top to build clean set of patches • More Experimental features placed behind sched_features • Feature configuration matches android-4.9 • Produced during linux-4.14 rc phase, ready 2-weeks after linux-4.14
  7. 7. © 2017 Arm Limited7 EAS in android-4.14 android-specific Upstream-targeted find_best_target schedtune WALT Trace & Debug Topology Detection Invariance Support Use of idle states Rt-PELT Schedutil changes Sync Wakeups Energy Diff Calculation NOHZ Signal Updates Misfit Tasks & Overutilized Flags Load balance tweaks
  8. 8. © 2017 Arm Limited8 AOSP Common Kernel Update • EAS r1.5, Feb 2018 (eas-dev), merging to android-4.9 soon • android-4.9 only, most changes already in android-4.14 • Refactored energy diff to make calculation more efficient • Further refinement of EAS CPU pre-selection (find_best_target) – Thanks for excellent contributions from Qualcomm, Spreadtrum, Mstar, Linaro • Aggressive up-migrate of Misfit tasks & WALT updates from CodeAurora android-4.4 android-4.9 android-4.14 EAS r1.3 EAS r1.4 EAS r1.5
  9. 9. © 2017 Arm Limited9 AOSP Common Kernel Update • EAS r1.6, eas-dev starting April 2018 • Moving to android-4.14 • Adding back Schedtune PE space filtering • Util_est backport, with PELT decay rate changes • Use mainline wakeup code for prefer_idle tasks • Remove ordering dependency in find_best_target – ( better tri-gear support when using find_best_target ) android-4.4 android-4.9 android-4.14 EAS r1.5 EAS r1.6
  10. 10. © 2017 Arm Limited10 AOSP Common Kernel Update Branches: • android-4.4, android-4.9 & android-4.14 • Common kernel upstream for device kernels • Only post against this for bugfixes • People merge these into device kernels, so need to be selective about changes
  11. 11. © 2017 Arm Limited11 AOSP Common Kernel Update More branches: • android-4.9-eas-dev (soon android-4.14-eas-dev) • This is where in-development patches should be posted • Arm power team usually post patches at RFC stage to stimulate discussion • Changes picked or merged back to common • android-4.4-eas-test (android-4.9-eas-test later) • Test branch is against android common for the latest well-supported public device • Intended to hold backports of EAS patches which merged into the active common branch, but did not get back to the branch we test with
  12. 12. © 2017 Arm Limited12 AOSP Common Kernel Update There have been some consistent themes in EAS development over the last year or so:
  13. 13. © 2017 Arm Limited13 AOSP Common Kernel Update There have been some consistent themes in EAS development over the last year or so: • Reducing delta with mainline
  14. 14. © 2017 Arm Limited14 AOSP Common Kernel Update There have been some consistent themes in EAS development over the last year or so: • Reducing delta with mainline • Refactoring to improve maintainability and predictability
  15. 15. © 2017 Arm Limited15 AOSP Common Kernel Update There have been some consistent themes in EAS development over the last year or so: • Reducing delta with mainline • Refactoring to improve maintainability and predictability • New features where necessary
  16. 16. © 2017 Arm Limited16 AOSP Common Kernel Update There have been some consistent themes in EAS development over the last year or so: • Reducing delta with mainline • Refactoring to improve maintainability and predictability • New features where necessary • Open, collaborative development
  17. 17. © 2017 Arm Limited17 AOSP Common Kernel Update Open Development • Patches for AOSP are reviewed on AOSP Gerrit • https://android-review.googlesource.com • We always try to justify patches with performance & energy numbers – use wltests for this • Wltests is part of LISA https://www.github.com/arm-software/lisa • Discussion of other topics and announcements are on Linaro’s eas-dev list • https://lists.linaro.org/mailman/listinfo/eas-dev
  18. 18. © 2017 Arm Limited EAS Mainline Strategy
  19. 19. © 2017 Arm Limited19 EAS Mainline Strategy • EAS is a large, complex piece of functionality • EAS being in AOSP helps a lot of users but not all • Upstream development results in better code
  20. 20. © 2017 Arm Limited20 EAS Mainline Strategy • We make regular bi-weekly integrations of all our upstream-focussed code • Available on linux-arm.org & announced on eas-dev • Allows us to more easily see when changes impact us and work to resolve as soon as possible • Have been identifying suitable code we already have • Working on getting them into acceptable shape • Pushing when we think they are good enough for a review • Hoping to upstream quite a lot of EAS this year
  21. 21. © 2017 Arm Limited21 EAS Mainline Strategy • Also working upstream where we can and backporting to Android • schedutil fixes • cpu signal updates • any fix/change applicable and potentially useful elsewhere • participating in reviews and testing
  22. 22. © 2017 Arm Limited22 EAS in AOSP EAS Code Size by Category Target EAS Code Size by Category Android-specific 3797 824 WALT 1470 0 Upstreamable Features 2785 2785 Documentation 1153 1153 1153 1153 2785 2785 1470 0 3797 824
  23. 23. © 2017 Arm Limited23 EAS Size 100.00% 14.76% 0.00% 20.00% 40.00% 60.00% 80.00% 100.00% 120.00% EAS Code size in android-4.14 Android Specific Code Target (rough estimate if everything goes to plan, mid-2019) ANDROID-SPECIFIC EAS CODE SIZE
  24. 24. © 2017 Arm Limited24 Bringing EAS in AOSP Closer to Mainline 1. Reach performance/energy parity with WALT • WALT is great for mobile but not popular upstream • It’s also 1.5k LoC • Touches many parts of the scheduler we want to change upstream, which makes backporting harder
  25. 25. © 2017 Arm Limited25 Bringing EAS in AOSP Closer to Mainline 1. Reach performance/energy parity with WALT • Disable WALT by default in android-common when ready 2. Push better support for big.LITTLE into mainline scheduler • Push out-of-tree wakeup and periodic balance changes upstream • Push energy diff calculations upstream
  26. 26. © 2017 Arm Limited26 Bringing EAS in AOSP Closer to Mainline 1. Reach performance/energy parity with WALT • Disable WALT by default in android-common when ready 2. Push better support for big.LITTLE into mainline scheduler • Push out-of-tree wakeup and periodic balance changes upstream • Push energy diff calculations upstream 3. Expect to continue to carry mobile-specific changes in AOSP • Schedutil up/down throttle split • Rt-rq signals • Performance/Energy task classification
  27. 27. © 2017 Arm Limited EAS Upstreaming
  28. 28. © 2017 Arm Limited28 EAS Upstreaming 7 areas identified for upstreaming. Feature Status Energy Model On LKML (v1 March 2018, during Connect!) Frequency and Cpu Invariant Engines (FIE/CIE) Merged in v4.15 Idle Cpu PELT update (Remote status update) Merged in tip/sched/core Util Est Merged in tip/sched/core (during Connect!) Util Clamp Almost ready (v1 on LKML April 2018) Misfit Task On LKML (v2 March 2018) Dynamic Topology Flag Detection In development, many scenarios to cover
  29. 29. © 2017 Arm Limited29 EAS Upstreaming • Util-Est • Add an aggregator on top of the PELT estimator – keep track of what “we learned” about task’s previous activations – generate a “new” signal on top of PELT • Build a low-overhead statistic for SEs and CPUs – Tasks at dequeue time – Root RQs at task enqueue/dequeue • Lots of detail at last year’s OSPM Summit and lkml • Patches merged into upstream tip/sched/core branch
  30. 30. © 2017 Arm Limited30 EAS Upstreaming • Misfit Tasks • Promote long-running tasks to most capable Cpus • Key to achieving consistent performance in heterogenous systems • Tasks which don’t sleep need active migration
  31. 31. © 2017 Arm Limited A Simplified Energy Model for EAS
  32. 32. © 2017 Arm Limited32 An Energy Model: why ? • Power/perf. characteristics differ between different SoCs • Heuristics don’t perform well on many platforms • The Energy Model enables the design of a platform-agnostic algorithm in the scheduler • Designed for mainline
  33. 33. © 2017 Arm Limited33 Summary 1. Today’s Energy Model 2. Which simplified Energy Model ? 3. Mainline implementation 4. Conclusion
  34. 34. © 2017 Arm Limited34 Summary 1. Today’s Energy Model 2. Which simplified Energy Model ? 3. Mainline implementation 4. Conclusion
  35. 35. © 2017 Arm Limited35 The Energy Model in Android / Hikey960
  36. 36. © 2017 Arm Limited36 The Energy Model in Android / Hikey960 LITTLE big MHz Cap. Cost MHz Cap. Cost 533 133 87 903 390 404 999 250 167 1421 615 861 1402 351 265 1805 782 1398 1709 429 388 2112 915 2200 1844 462 502 2362 1024 2848 0 500 1000 1500 2000 2500 3000 0 200 400 600 800 1000 CPULEVEL Capacity Power
  37. 37. © 2017 Arm Limited37 The Energy Model in Android / Hikey960 LITTLE big MHz Cap. Cost MHz Cap. Cost 533 133 87 903 390 404 999 250 167 1421 615 861 1402 351 265 1805 782 1398 1709 429 388 2112 915 2200 1844 462 502 2362 1024 2848 0 500 1000 1500 2000 2500 3000 0 200 400 600 800 1000 CPULEVEL LITTLE big 5 18 5 18 0 0 0 0 Capacity Power
  38. 38. © 2017 Arm Limited38 The Energy Model in Android / Hikey960 LITTLE big MHz Cap. Cost MHz Cap. Cost 533 133 87 903 390 404 999 250 167 1421 615 861 1402 351 265 1805 782 1398 1709 429 388 2112 915 2200 1844 462 502 2362 1024 2848 0 100 200 300 400 500 0 200 400 600 800 1000 LITTLE big MHz Cap. Cost MHz Cap. Cost 533 133 12 903 390 102 999 250 22 1421 615 124 1402 351 36 1805 782 221 1709 429 67 2112 915 330 1844 462 144 2362 1024 433 0 500 1000 1500 2000 2500 3000 0 200 400 600 800 1000 CPULEVELCLUSTERLEVEL LITTLE big 5 18 5 18 0 0 0 0 LITTLE big 12 102 12 102 12 102 0 0 Capacity Capacity PowerPower
  39. 39. © 2017 Arm Limited39 Device-tree bindings
  40. 40. © 2017 Arm Limited40 Device-tree bindings [...] cpu0: cpu@0 { [...] sched-energy-costs = <&CPU_COST_A53 &CL_COST_A53>; [...] } [...] cpu1: cpu@1 { [...] sched-energy-costs = <&CPU_COST_A53 &CL_COST_A53>; [...] } [...] cpu4: cpu@100 { [...] sched-energy-costs = <&CPU_COST_A72 &CL_COST_A72>; [...] } arch/arm64/boot/dts/hisilicon/hi3660.dtsi
  41. 41. © 2017 Arm Limited41 Device-tree bindings [...] cpu0: cpu@0 { [...] sched-energy-costs = <&CPU_COST_A53 &CL_COST_A53>; [...] } [...] cpu1: cpu@1 { [...] sched-energy-costs = <&CPU_COST_A53 &CL_COST_A53>; [...] } [...] cpu4: cpu@100 { [...] sched-energy-costs = <&CPU_COST_A72 &CL_COST_A72>; [...] } CPU_COST_A72: core-cost0 { busy-cost-data = < 390 404 615 861 782 1398 915 2200 1024 2848 >; idle-cost-data = < 18 18 0 0 >; }; CPU_COST_A53: core-cost1 { busy-cost-data = < 133 87 250 164 351 265 429 388 462 502 >; idle-cost-data = < 5 5 0 0 >; }; CLUSTER_COST_A72: cluster-cost0 { busy-cost-data = < [...] arch/arm64/boot/dts/hisilicon/hi3660.dtsi arch/arm64/boot/dts/hisilicon/hi3660-sched-energy.dtsi
  42. 42. © 2017 Arm Limited42 Device-tree bindings [...] cpu0: cpu@0 { [...] sched-energy-costs = <&CPU_COST_A53 &CL_COST_A53>; [...] } [...] cpu1: cpu@1 { [...] sched-energy-costs = <&CPU_COST_A53 &CL_COST_A53>; [...] } [...] cpu4: cpu@100 { [...] sched-energy-costs = <&CPU_COST_A72 &CL_COST_A72>; [...] } CPU_COST_A72: core-cost0 { busy-cost-data = < 390 404 615 861 782 1398 915 2200 1024 2848 >; idle-cost-data = < 18 18 0 0 >; }; CPU_COST_A53: core-cost1 { busy-cost-data = < 133 87 250 164 351 265 429 388 462 502 >; idle-cost-data = < 5 5 0 0 >; }; CLUSTER_COST_A72: cluster-cost0 { busy-cost-data = < [...] arch/arm64/boot/dts/hisilicon/hi3660.dtsi arch/arm64/boot/dts/hisilicon/hi3660-sched-energy.dtsi
  43. 43. © 2017 Arm Limited43 Sched domains
  44. 44. © 2017 Arm Limited44 Sched domains CPU 0
  45. 45. © 2017 Arm Limited45 Sched domains MC 0 1 2 3 0 1 2 3 groups CPU span CPU 0
  46. 46. © 2017 Arm Limited46 Sched domains MC 0 1 2 3 0 1 2 3 groups CPU span CPU 0 LITTLE Cap. Cost 133 87 250 167 351 265 429 388 462 502 LITTLE 5 5 0 0 CPU
  47. 47. © 2017 Arm Limited47 Sched domains MC DIE 0 1 2 3 10 11 0 1 2 3 4 5 6 7 parent groups groups CPU span CPU 0 LITTLE Cap. Cost 133 87 250 167 351 265 429 388 462 502 LITTLE 5 5 0 0 CPU
  48. 48. © 2017 Arm Limited48 Sched domains MC DIE 0 1 2 3 10 11 0 1 2 3 4 5 6 7 parent groups groups CPU span CPU 0 LITTLE Cap. Cost 133 87 250 167 351 265 429 388 462 502 LITTLE 5 5 0 0 CPU big Cap. Cost 390 102 615 124 782 221 915 330 1024 433 big 102 102 102 0 ClusterCluster LITTLE 12 12 12 0 LITTLE Cap. Cost 133 12 250 22 351 36 429 67 462 144
  49. 49. © 2017 Arm Limited49 Need for simplification • Comprehensive Energy Model, but …
  50. 50. © 2017 Arm Limited50 Need for simplification • Comprehensive Energy Model, but … • Complex to measure for new platforms
  51. 51. © 2017 Arm Limited51 Need for simplification • Comprehensive Energy Model, but … • Complex to measure for new platforms • Computationally expensive scheduling decisions
  52. 52. © 2017 Arm Limited52 Need for simplification • Comprehensive Energy Model, but … • Complex to measure for new platforms • Computationally expensive scheduling decisions • Existing code relies only on out-of-tree bindings
  53. 53. © 2017 Arm Limited53 Need for simplification • Comprehensive Energy Model, but … • Complex to measure for new platforms • Computationally expensive scheduling decisions • Existing code relies only on out-of-tree bindings • Inaccurate assumptions for future platforms
  54. 54. © 2017 Arm Limited54 Summary 1. Today’s Energy Model 2. Which simplified Energy Model ? 3. Mainline implementation 4. Conclusion
  55. 55. © 2017 Arm Limited55 Which simplified EM ?
  56. 56. © 2017 Arm Limited56 Which simplified EM ? Name CPU Level Cluster Level Active costs Idle costs Active costs Idle costs FULL YES YES YES YES NOIDLE YES NO YES NO NOCLUSTER YES YES NO NO NOCLUSTER_NOIDLE YES NO NO NO NO_EAS NO NO NO NO
  57. 57. © 2017 Arm Limited57 Which simplified EM ? Name CPU Level Cluster Level Active costs Idle costs Active costs Idle costs FULL YES YES YES YES NOIDLE YES NO YES NO NOCLUSTER YES YES NO NO NOCLUSTER_NOIDLE YES NO NO NO NO_EAS NO NO NO NO
  58. 58. © 2017 Arm Limited58 Which simplified EM ? Name CPU Level Cluster Level Active costs Idle costs Active costs Idle costs FULL YES YES YES YES NOIDLE YES NO YES NO NOCLUSTER YES YES NO NO NOCLUSTER_NOIDLE YES NO NO NO NO_EAS NO NO NO NO
  59. 59. © 2017 Arm Limited59 Which simplified EM ? Name CPU Level Cluster Level Active costs Idle costs Active costs Idle costs FULL YES YES YES YES NOIDLE YES NO YES NO NOCLUSTER YES YES NO NO NOCLUSTER_NOIDLE YES NO NO NO NO_EAS NO NO NO NO
  60. 60. © 2017 Arm Limited60 Which simplified EM ? Name CPU Level Cluster Level Active costs Idle costs Active costs Idle costs FULL YES YES YES YES NOIDLE YES NO YES NO NOCLUSTER YES YES NO NO NOCLUSTER_NOIDLE YES NO NO NO NO_EAS NO NO NO NO
  61. 61. © 2017 Arm Limited61 Which simplified EM ? Name CPU Level Cluster Level Active costs Idle costs Active costs Idle costs FULL YES YES YES YES NOIDLE YES NO YES NO NOCLUSTER YES YES NO NO NOCLUSTER_NOIDLE YES NO NO NO NO_EAS NO NO NO NO
  62. 62. © 2017 Arm Limited62 Which simplified EM ? Name CPU Level Cluster Level Active costs Idle costs Active costs Idle costs FULL YES YES YES YES NOIDLE YES NO YES NO NOCLUSTER YES YES NO NO NOCLUSTER_NOIDLE YES NO NO NO NO_EAS NO NO NO NO • Tested on Android-4.4: Hikey960, Pixel2, Hikey620
  63. 63. © 2017 Arm Limited63 Which simplified EM ? Name CPU Level Cluster Level Active costs Idle costs Active costs Idle costs FULL YES YES YES YES NOIDLE YES NO YES NO NOCLUSTER YES YES NO NO NOCLUSTER_NOIDLE YES NO NO NO NO_EAS NO NO NO NO • Tested on Android-4.4: Hikey960, Pixel2, Hikey620 • SchedTune disabled, no cpusets
  64. 64. © 2017 Arm Limited64 Jankbench / list_view - Hikey960 FULL NOIDLE NOCLUSTER_NOIDLE NOCLUSTER NO_EAS
  65. 65. © 2017 Arm Limited65 Jankbench / list_view - Hikey960 FULL NOIDLE NOCLUSTER_NOIDLE NOCLUSTER NO_EAS Mean
  66. 66. © 2017 Arm Limited66 Jankbench / list_view - Hikey960 FULL NOIDLE NOCLUSTER_NOIDLE NOCLUSTER NO_EAS
  67. 67. © 2017 Arm Limited67 Jankbench / list_view - Hikey960 FULL NOIDLE NOCLUSTER_NOIDLE NOCLUSTER NO_EAS 50%
  68. 68. © 2017 Arm Limited68 Jankbench / list_view - Hikey960 FULL NOIDLE NOCLUSTER_NOIDLE NOCLUSTER NO_EAS 50% ~99%
  69. 69. © 2017 Arm Limited69 Jankbench / list_view - Hikey960 FULL NOIDLE NOCLUSTER_NOIDLE NOCLUSTER NO_EAS
  70. 70. © 2017 Arm Limited70 Jankbench / image_list_view - Hikey960 FULL NOIDLE NOCLUSTER_NOIDLE NOCLUSTER NO_EAS
  71. 71. © 2017 Arm Limited71 Jankbench / low_hitrate_text - Hikey960 FULL NOIDLE NOCLUSTER_NOIDLE NOCLUSTER NO_EAS
  72. 72. © 2017 Arm Limited72 Jankbench / shadow_grid - Hikey960 FULL NOIDLE NOCLUSTER_NOIDLE NOCLUSTER NO_EAS
  73. 73. © 2017 Arm Limited73 Jankbench / edit_text - Hikey960 FULL NOIDLE NOCLUSTER_NOIDLE NOCLUSTER NO_EAS
  74. 74. © 2017 Arm Limited74 Homescreen / Hikey960 FULL NOIDLE NOCLUSTER_NOIDLE NOCLUSTER NO_EAS
  75. 75. © 2017 Arm Limited75 ExoPlayer Video / Hikey960 FULL NOIDLE NOCLUSTER_NOIDLE NOCLUSTER NO_EAS
  76. 76. © 2017 Arm Limited76 ExoPlayer Audio / Hikey960 FULL NOIDLE NOCLUSTER_NOIDLE NOCLUSTER NO_EAS
  77. 77. © 2017 Arm Limited77 Results of experiments • Hikey960: all energy models show comparable energy savings • Pixel2: same conclusions with smaller savings (up to 13%, screen on) • Hikey620 (SMP): No significant savings
  78. 78. © 2017 Arm Limited78 Results of experiments • Hikey960: all energy models show comparable energy savings • Pixel2: same conclusions with smaller savings (up to 13%, screen on) • Hikey620 (SMP): No significant savings The simplest EM (noidle_nocluster) is a reasonable option for modern platforms
  79. 79. © 2017 Arm Limited79 Summary 1. Today’s Energy Model 2. Which simplified Energy Model ? 3. Mainline implementation 4. Conclusion
  80. 80. © 2017 Arm Limited80 Dynamic power model
  81. 81. © 2017 Arm Limited81 Dynamic power model 𝑃 = 𝐶 ∗ 𝑉 2 ∗ 𝑓 2 2 𝑓
  82. 82. © 2017 Arm Limited82 Dynamic power model 𝑃 = 𝐶 ∗ 𝑉 2 ∗ 𝑓 2 2 𝑓 Power
  83. 83. © 2017 Arm Limited83 Dynamic power model 𝑃 = 𝐶 ∗ 𝑉 2 ∗ 𝑓 2 2 𝑓 Power Capacitance
  84. 84. © 2017 Arm Limited84 Dynamic power model 𝑃 = 𝐶 ∗ 𝑉 2 ∗ 𝑓 2 2 𝑓 Power Capacitance Voltage
  85. 85. © 2017 Arm Limited85 Dynamic power model 𝑃 = 𝐶 ∗ 𝑉 2 ∗ 𝑓 2 2 𝑓 Power Capacitance Voltage Frequency
  86. 86. © 2017 Arm Limited86 Dynamic power model 𝑃 = 𝐶 ∗ 𝑉 2 ∗ 𝑓 2 2 𝑓 Power Capacitance Voltage Frequency Mainline DT binding: dynamic-power-coefficient
  87. 87. © 2017 Arm Limited87 Dynamic power model 𝑃 = 𝐶 ∗ 𝑉 2 ∗ 𝑓 2 2 𝑓 Power Capacitance Voltage Frequency Mainline DT binding: dynamic-power-coefficient Managed by CPUFreq / OPP
  88. 88. © 2017 Arm Limited88 Energy Model Comparison / Hikey960 1 11 21 31 41 51 61 71 81 91 200 700 1200 1700 2200 Measured 𝐶 ∗ 𝑉2 ∗ 𝑓 Frequency (MHz). Power(%)
  89. 89. © 2017 Arm Limited89 Architecture
  90. 90. © 2017 Arm Limited90 Architecture Thermal / IPA 𝑃 = 𝐶𝑉2 𝑓 PM OPP Device Tree 𝑉 𝑓 𝐶
  91. 91. © 2017 Arm Limited91 Architecture Thermal / IPA PM OPP Device Tree 𝑉 𝑓 𝐶
  92. 92. © 2017 Arm Limited92 Architecture Thermal / IPA PM OPP Device Tree 𝑉 𝑓 𝐶 𝑃 𝐶𝑉2 𝑓
  93. 93. © 2017 Arm Limited93 Architecture Thermal / IPA PM OPP Device Tree 𝑉 𝑓 𝐶 𝑃 𝐶𝑉2 𝑓 Scheduler Energy Model
  94. 94. © 2017 Arm Limited94 Architecture Thermal / IPA PM OPP Device Tree 𝑉 𝑓 𝐶 𝑃 𝐶𝑉2 𝑓 Scheduler Energy Model SCMI
  95. 95. © 2017 Arm Limited95 Architecture Thermal / IPA PM OPP Device Tree 𝑉 𝑓 𝐶 𝑃 𝐶𝑉2 𝑓 Scheduler Energy Model SCMI Other ?
  96. 96. © 2017 Arm Limited96 Architecture Thermal / IPA PM OPP Device Tree 𝑉 𝑓 𝐶 𝑃 𝐶𝑉2 𝑓 Scheduler Energy Model SCMI Other ? [PATCH v3 0/2] thermal, OPP: move the CPU power estimation to the OPP library -> [PATCH v3 1/2] PM / OPP: introduce an OPP power estimation helper -> [PATCH v3 2/2] thermal: cpu_cooling: use power models from the OPP library
  97. 97. © 2017 Arm Limited97 Implementation • No hierarchical data, no need to use the scheduling domains • Data structures: • Loaded from PM / OPP at boot time, after CPUfreq is setup • Energy models are stored in a flat per-cpu array • Frequency domains are stored in cpu-masks
  98. 98. © 2017 Arm Limited98 Assumptions • All CPUs in a freq. domain share capacity states • All CPUs in a freq. domain have the same micro-architecture • Possible to relax this if needed, but higher computational cost • EAS enabled for asymmetric platforms only (SD_ASYM_CPUCAPACITY flag set) • EAS shines on heterogeneous platforms • Avoid “conflicts” for purely perf-oriented platforms (servers, …)
  99. 99. © 2017 Arm Limited99 Tests against mainline • Test setup: • Platform: Hikey960 and Juno r0 • Debian userspace • Base kernel: tip/sched/core – 4.16-rc2 • Test cases: • Energy: “X” RTApp tasks, 16ms period, 5% duty cycle, 30 seconds • Performance: `perf bench sched messaging –pipe –thread –group X –loop 50000`
  100. 100. © 2017 Arm Limited100 Tests against mainline / Energy 0 10 20 30 40 50 60 70 80 90 100 10 tasks 20 tasks 30 tasks 40 tasks 50 tasks tip/sched/core EAS 0 10 20 30 40 50 60 70 80 90 100 10 tasks 20 tasks 30 tasks 40 tasks 50 tasks tip/sched/core EAS Hikey960 (ACME / full SoC + memory) Juno (HW monitor / b.L CPUs only) Energy(%) Energy(%)
  101. 101. © 2017 Arm Limited101 Tests against mainline / Perf. 0 10 20 30 40 50 60 70 40 tasks 80 tasks 160 tasks 320 tasks tip/sched/core EAS 0 10 20 30 40 50 60 40 tasks 80 tasks 160 tasks 320 tasks tip/sched/core EAS Hikey960 Juno Time(s) Time(s)
  102. 102. © 2017 Arm Limited102 Posted to LKML this week [RFC PATCH 0/6] Energy Aware Scheduling [RFC PATCH 1/6] sched/fair: Create util_fits_capacity() [RFC PATCH 2/6] sched: Introduce energy models of CPUs [RFC PATCH 3/6] sched: Add over-utilization/tipping point indicator [RFC PATCH 4/6] sched/fair: Introduce an energy estimation helper … [RFC PATCH 5/6] sched/fair: Select an energy-efficient CPU on task … [RFC PATCH 6/6] drivers: base: arch_topology.c: Enable EAS for arm/…
  103. 103. © 2017 Arm Limited103 Summary 1. Today’s Energy Model 2. Which simplified Energy Model ? 3. Mainline implementation 4. Conclusion
  104. 104. © 2017 Arm Limited104 Next steps • Ideal scenario: simplified EM goes in the next LTS (4.19 ?) • Test & assessment on android-4.14 • In case of gaps with the full EM, they will be filled in product
  105. 105. © 2017 Arm Limited105 Thanks. Any questions ?
  106. 106. 106106 © 2017 Arm Limited The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners. www.arm.com/company/policies/trademarks
  107. 107. © 2017 Arm Limited107 Algorithm complexity T1 cap=100 cost=100 cap=350 cost=400 cap=500 cost=800 CPU0 CPU1 CPU2 CPU3 cap=400 cost=700 cap=600 cost=1000 cap=800 cost=1800 CPU4 CPU5 CPU6 CPU7 cap=1000 cost=3000 LITTLE CPUs BIG CPUs
  108. 108. © 2017 Arm Limited108 Algorithm complexity T1 cap=100 cost=100 cap=350 cost=400 cap=500 cost=800 CPU0 CPU1 CPU2 CPU3 cap=400 cost=700 cap=600 cost=1000 cap=800 cost=1800 CPU4 CPU5 CPU6 CPU7 cap=1000 cost=3000 LITTLE CPUs BIG CPUs T2
  109. 109. © 2017 Arm Limited109 Algorithm complexity T1 cap=100 cost=100 cap=350 cost=400 cap=500 cost=800 CPU0 CPU1 CPU2 CPU3 cap=400 cost=700 cap=600 cost=1000 cap=800 cost=1800 CPU4 CPU5 CPU6 CPU7 cap=1000 cost=3000 LITTLE CPUs BIG CPUs T2
  110. 110. © 2017 Arm Limited110 Algorithm complexity T1 cap=100 cost=100 cap=350 cost=400 cap=500 cost=800 CPU0 CPU1 CPU2 CPU3 cap=400 cost=700 cap=600 cost=1000 cap=800 cost=1800 CPU4 CPU5 CPU6 CPU7 cap=1000 cost=3000 LITTLE CPUs BIG CPUs T2 T2
  111. 111. © 2017 Arm Limited111 Algorithm complexity T1 cap=100 cost=100 cap=350 cost=400 cap=500 cost=800 CPU0 CPU1 CPU2 CPU3 cap=400 cost=700 cap=600 cost=1000 cap=800 cost=1800 CPU4 CPU5 CPU6 CPU7 cap=1000 cost=3000 LITTLE CPUs BIG CPUs T2 T2
  112. 112. © 2017 Arm Limited112 Algorithm complexity T1 cap=100 cost=100 cap=350 cost=400 cap=500 cost=800 CPU0 CPU1 CPU2 CPU3 cap=400 cost=700 cap=600 cost=1000 cap=800 cost=1800 CPU4 CPU5 CPU6 CPU7 cap=1000 cost=3000 LITTLE CPUs BIG CPUs T2 T2
  113. 113. © 2017 Arm Limited113 Algorithm complexity T1 cap=100 cost=100 cap=350 cost=400 cap=500 cost=800 CPU0 CPU1 CPU2 CPU3 cap=400 cost=700 cap=600 cost=1000 cap=800 cost=1800 CPU4 CPU5 CPU6 CPU7 cap=1000 cost=3000 LITTLE CPUs BIG CPUs T2 T2
  114. 114. © 2017 Arm Limited114 Algorithm complexity T1 cap=100 cost=100 cap=350 cost=400 cap=500 cost=800 CPU0 CPU1 CPU2 CPU3 cap=400 cost=700 cap=600 cost=1000 cap=800 cost=1800 CPU4 CPU5 CPU6 CPU7 cap=1000 cost=3000 LITTLE CPUs BIG CPUs T2 T2
  115. 115. © 2017 Arm Limited115 Algorithm complexity T1 cap=100 cost=100 cap=350 cost=400 cap=500 cost=800 CPU0 CPU1 CPU2 CPU3 cap=400 cost=700 cap=600 cost=1000 cap=800 cost=1800 CPU4 CPU5 CPU6 CPU7 cap=1000 cost=3000 LITTLE CPUs BIG CPUs T2
  116. 116. © 2017 Arm Limited116 Algorithm complexity T1 cap=100 cost=100 cap=350 cost=400 cap=500 cost=800 CPU0 CPU1 CPU2 CPU3 cap=400 cost=700 cap=600 cost=1000 cap=800 cost=1800 CPU4 CPU5 CPU6 CPU7 cap=1000 cost=3000 LITTLE CPUs BIG CPUs T2 T2
  117. 117. © 2017 Arm Limited117 Algorithm complexity T1 cap=100 cost=100 cap=350 cost=400 cap=500 cost=800 CPU0 CPU1 CPU2 CPU3 cap=400 cost=700 cap=600 cost=1000 cap=800 cost=1800 CPU4 CPU5 CPU6 CPU7 cap=1000 cost=3000 LITTLE CPUs BIG CPUs T2 T2
  118. 118. © 2017 Arm Limited118 Algorithm complexity T1 cap=100 cost=100 cap=350 cost=400 cap=500 cost=800 CPU0 CPU1 CPU2 CPU3 cap=400 cost=700 cap=600 cost=1000 cap=800 cost=1800 CPU4 CPU5 CPU6 CPU7 cap=1000 cost=3000 LITTLE CPUs BIG CPUs T2 T2
  119. 119. © 2017 Arm Limited119 Algorithm complexity T1 cap=100 cost=100 cap=350 cost=400 cap=500 cost=800 CPU0 CPU1 CPU2 CPU3 cap=400 cost=700 cap=600 cost=1000 cap=800 cost=1800 CPU4 CPU5 CPU6 CPU7 cap=1000 cost=3000 LITTLE CPUs BIG CPUs T2 T2
  120. 120. © 2017 Arm Limited120 Algorithm complexity T1 cap=100 cost=100 cap=350 cost=400 cap=500 cost=800 CPU0 CPU1 CPU2 CPU3 cap=400 cost=700 cap=600 cost=1000 cap=800 cost=1800 CPU4 CPU5 CPU6 CPU7 cap=1000 cost=3000 LITTLE CPUs BIG CPUs T2
  121. 121. © 2017 Arm Limited121 Algorithm complexity T1 cap=100 cost=100 cap=350 cost=400 cap=500 cost=800 CPU0 CPU1 CPU2 CPU3 cap=400 cost=700 cap=600 cost=1000 cap=800 cost=1800 CPU4 CPU5 CPU6 CPU7 cap=1000 cost=3000 LITTLE CPUs BIG CPUs T2 T2
  122. 122. © 2017 Arm Limited122 Algorithm complexity T1 cap=100 cost=100 cap=350 cost=400 cap=500 cost=800 CPU0 CPU1 CPU2 CPU3 cap=400 cost=700 cap=600 cost=1000 cap=800 cost=1800 CPU4 CPU5 CPU6 CPU7 cap=1000 cost=3000 LITTLE CPUs BIG CPUs T2 T2
  123. 123. © 2017 Arm Limited123 Algorithm complexity T1 cap=100 cost=100 cap=350 cost=400 cap=500 cost=800 CPU0 CPU1 CPU2 CPU3 cap=400 cost=700 cap=600 cost=1000 cap=800 cost=1800 CPU4 CPU5 CPU6 CPU7 cap=1000 cost=3000 LITTLE CPUs BIG CPUs T2 T2
  124. 124. © 2017 Arm Limited124 Algorithm complexity T1 cap=100 cost=100 cap=350 cost=400 cap=500 cost=800 CPU0 CPU1 CPU2 CPU3 cap=400 cost=700 cap=600 cost=1000 cap=800 cost=1800 CPU4 CPU5 CPU6 CPU7 cap=1000 cost=3000 LITTLE CPUs BIG CPUs T2 T2
  125. 125. © 2017 Arm Limited125 Algorithm complexity T1 cap=100 cost=100 cap=350 cost=400 cap=500 cost=800 CPU0 CPU1 CPU2 CPU3 cap=400 cost=700 cap=600 cost=1000 cap=800 cost=1800 CPU4 CPU5 CPU6 CPU7 cap=1000 cost=3000 LITTLE CPUs BIG CPUs T2 T2

×