Getting the most out of DynamIQ & Enabling support of DynamIQ - SFO17-104

Session ID: SFO17-104
Session Name: Getting the most out of DynamIQ & Enabling support of DynamIQ - SFO17-104
Speaker: Vincent Guittot

Track: Power Management


★ Session Summary ★
DynamIQ technology is the foundation for future ARM Cortex-A processors. This session discusses the software impact of DynamIQ and how to get the most out of it, with particular reference to ARM Trusted Firmware and EAS.

In March 2017, ARM announced DynamIQ, which gives a new level of flexibility in designing heterogeneous systems. This session covers its impact and how to enable support for it in the kernel.
---------------------------------------------------
★ Resources ★
Event Page: http://connect.linaro.org/resource/sfo17/sfo17-104/
---------------------------------------------------

★ Event Details ★
Linaro Connect San Francisco 2017 (SFO17)
25-29 September 2017
Hyatt Regency San Francisco Airport

---------------------------------------------------
http://www.linaro.org
http://connect.linaro.org

Published in: Technology

Getting the most out of DynamIQ & Enabling support of DynamIQ - SFO17-104

  1. Enabling Arm® DynamIQ™ support
     Dan Handley (Arm), Ionela Voinescu (Arm), Vincent Guittot (Linaro)
  2. ENGINEERS AND DEVICES WORKING TOGETHER
     Agenda
     ● DynamIQ introduction
     ● DynamIQ and Arm Trusted Firmware
     ● OS Power Management with DynamIQ
     ● L3 partial power-down support
  3. DynamIQ™ key features
     From https://developer.arm.com/technologies/dynamiq
     1. A new single-cluster design
     2. Intelligent compute capabilities
     3. Interfaces for closely coupled accelerators
     4. Built-in power-saving features
     5. DynamIQ big.LITTLE
     6. Advanced RAS and safety features
  5. DynamIQ Shared Unit (DSU)
     TRM: http://infocenter.arm.com/help/topic/com.arm.doc.100453_0002_00_en
     ● Armv8.2+ Cortex-A CPU support
       ○ e.g. Cortex-A55, Cortex-A75
     ● Up to 2 different CPU types in the same cluster
       ○ Maximum of 8 CPUs
     ● Per-CPU L1+L2 caches and shared L3
     ● Per-CPU DVFS control
     ● Partial L3 cache power down
     ● Hardware assisted power management
       ○ Simplifies power up/down software
  7. DynamIQ Shared Unit (DSU) and Arm TF
     ● DSU enables simpler, faster and more robust software during power up/down
       ○ Simplified micro-architectural programming sequence
       ○ Automatic enabling and disabling of coherency with the interconnect
       ○ Automatic and faster cache flushing at all levels without software intervention
       ○ Reduced power controller communication via P-channel interface
     ● TF enables more performant PSCI operations via HW_ASSISTED_COHERENCY option
       ○ CPU idle, hotplug, secondary CPU boot
       ○ Will still work without HW_ASSISTED_COHERENCY but won’t get the benefits
       ○ Allows more aggressive OSPM tuning
       ○ Warning: Some HW operations will be invisible to SW and may give misleading statistics
  8. CPU idle to power down (Armv8.0 CPUs)
     Power Down (OS calls SMC CPU_SUSPEND)
     ● Validate CPU_SUSPEND arguments
     ● Acquire locks for non-CPU levels
     ● PSCI state coordination
     ● CPU-specific power down handling
       ○ Disable data caches
       ○ Flush data cache(s)
       ○ Disable intra-cluster coherency (!SMP_BIT)
     ● Stack maintenance
     ● Platform suspend operations
     ● Release locks for non-CPU levels
     ● Wait For Interrupt (WFI)
     Power Up (Reset)
     ● Minimal SCTLR initialization
     ● Platform reset handling
     ● CPU-specific reset handling
       ○ Errata handling
       ○ Enable intra-cluster coherency (SMP_BIT)
     ● CPU architectural register initialization
     ● Enable MMU
     ● Acquire locks for non-CPU levels
     ● Platform suspend-finish operations
     ● Stack maintenance
     ● Enable data caches
     ● Restore OS context
     ● PSCI bookkeeping
     ● Release locks for non-CPU levels
     ● ERET to OS
  9. CPU idle to power down (Armv8.2 CPUs)
     Power Down (OS calls SMC CPU_SUSPEND)
     ● Validate CPU_SUSPEND arguments
     ● Acquire locks for non-CPU levels
     ● PSCI state coordination
     ● CPU-specific power down handling
       ○ Request CPU power down (CORE_PWRDN_EN)
     ● Platform suspend operations
     ● Release locks for non-CPU levels
     ● Wait For Interrupt (WFI)
     Power Up (Reset)
     ● Minimal SCTLR initialization
     ● Platform reset handling
     ● CPU-specific reset handling
       ○ Errata handling (none yet)
     ● CPU architectural register initialization
     ● Enable MMU and data caches
     ● Acquire locks for non-CPU levels
     ● Platform suspend-finish operations
     ● Restore OS context
     ● PSCI bookkeeping
     ● Release locks for non-CPU levels
     ● ERET to OS
  10. CPU idle to power down (Armv8.2 CPUs): same sequence as slide 9, annotated with
      ● D$ enabled much earlier
      ● D$ remains enabled throughout
  11. CPU idle to power down (Armv8.2 CPUs): same sequence as slide 9, annotated with
      ● No need for explicit cache flushes or stack maintenance
  12. CPU idle to power down (Armv8.2 CPUs): same sequence as slide 9, annotated with
      ● Much more efficient spin locks instead of bakery locks (using v8.1 CAS instruction)
  13. CPU idle to power down (Armv8.2 CPUs): same sequence as slide 9, annotated with
      ● No need for explicit interconnect programming for masters to enter/exit coherency
      ● (Potentially) reduced power controller communication
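The difference between the Armv8.0 and Armv8.2 power-down paths can be checked mechanically. The sketch below copies the power-down step names from slides 8 and 9 into two Python lists (sub-steps flattened into the main sequence) and reports which steps disappear and which are new. The list and function names are illustrative only and do not correspond to any Arm Trusted Firmware code.

```python
# Power-down steps as listed on slide 8 (Armv8.0) and slide 9 (Armv8.2).
# Sub-bullets are flattened into the sequence; names are copied from the slides.
ARMV8_0_POWER_DOWN = [
    "Validate CPU_SUSPEND arguments",
    "Acquire locks for non-CPU levels",
    "PSCI state coordination",
    "CPU-specific power down handling",
    "Disable data caches",
    "Flush data cache(s)",
    "Disable intra-cluster coherency (!SMP_BIT)",
    "Stack maintenance",
    "Platform suspend operations",
    "Release locks for non-CPU levels",
    "Wait For Interrupt (WFI)",
]

ARMV8_2_POWER_DOWN = [
    "Validate CPU_SUSPEND arguments",
    "Acquire locks for non-CPU levels",
    "PSCI state coordination",
    "CPU-specific power down handling",
    "Request CPU power down (CORE_PWRDN_EN)",
    "Platform suspend operations",
    "Release locks for non-CPU levels",
    "Wait For Interrupt (WFI)",
]


def diff_steps(old, new):
    """Return (removed, added) step lists, preserving the original order."""
    removed = [step for step in old if step not in new]
    added = [step for step in new if step not in old]
    return removed, added


if __name__ == "__main__":
    removed, added = diff_steps(ARMV8_0_POWER_DOWN, ARMV8_2_POWER_DOWN)
    print("Removed on Armv8.2:", removed)
    print("Added on Armv8.2:", added)
```

The removed steps are the software cache and coherency maintenance that the slide annotations describe as no longer needed; the single added step is the power-down request.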
  14. Future TF enhancements
      ● Use per-thread cluster power voting register
        ○ CLUSTERPWRDN_EL1
        ○ Automatic cluster power down or memory retention, if the power controller hardware and firmware support it
      ● Remove cluster level locks
        ○ or at least reduce the time they are held
      ● Analyze performance on DynamIQ hardware platforms
  16. OS Power Management with DynamIQ
      ● Finer grained power capabilities
        ○ Already handled by PM frameworks
      ● Per-core Frequency/Voltage domain
      ● DSU Frequency/Voltage domain
  17. Scheduler domains
      ● Current big.LITTLE system
        ○ Energy model layout matches scheduler domain
      ● Example of 4 big cores + 4 LITTLE cores:
  18. Scheduler domains
      ● DynamIQ changes domain boundaries
        ○ Not necessarily congruent
        ○ Physical / Voltage / Frequency / Architecture
      ● Change the scheduler topology
        ○ And energy model layout
      ● Example of 4 big cores + 4 LITTLE cores:
  19. Phantom domains
      ● Add intermediate domain
        ○ Voltage/Frequency boundary
      ● Example of 4 big cores + 4 LITTLE cores:
        ○ Per core DVFS
  20. Phantom domains
      ● Example of 4 big cores + 4 LITTLE cores:
        ○ One frequency domain for big cores and one for LITTLE cores
        ○ Frequency domain close to current big.LITTLE system
      ● Enable similar scheduler topology
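As a rough illustration of the intermediate ("phantom") domain idea on slides 19 and 20, the sketch below groups the CPUs of one DynamIQ cluster by frequency domain, so that each group could back one intermediate scheduler level. The CPU numbering (0-3 LITTLE, 4-7 big), the integer domain IDs and the function name are assumptions made for this example; this is not the kernel's topology code.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_frequency_domain(freq_domain_of: Dict[int, int]) -> List[List[int]]:
    """Group CPU ids by the (hypothetical) frequency-domain id they belong to.

    Each returned group stands for one intermediate domain placed between the
    per-CPU level and the level spanning the whole DynamIQ cluster.
    """
    groups = defaultdict(list)
    for cpu in sorted(freq_domain_of):
        groups[freq_domain_of[cpu]].append(cpu)
    return [groups[domain] for domain in sorted(groups)]


if __name__ == "__main__":
    # Slide 20: one frequency domain for LITTLE cores, one for big cores.
    shared = {cpu: (0 if cpu < 4 else 1) for cpu in range(8)}
    print(group_by_frequency_domain(shared))

    # Slide 19: per-core DVFS, so every CPU is its own frequency domain.
    per_core = {cpu: cpu for cpu in range(8)}
    print(group_by_frequency_domain(per_core))
```

With the shared layout the grouping reproduces the two-group shape of a current big.LITTLE system; with per-core DVFS every group collapses to a single CPU.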
  21. OSPM next steps
      ● Shared frequency domains
      ● Shared voltage domains
      ● Impact on energy model
      ● Impact on compute capacity
      ● Getting notified of power domain OPP change
      ● Multiple DynamIQ clusters
      Reference: https://developer.arm.com/-/media/developer/developers/open-source/energy-aware-scheduling/DynamIQ_design_specification_v1.0.pdf
  23. L3 partial power-down
      ● Arm DynamIQ Shared Unit (DSU) L3 cache
        ○ Implementation specific number of portions controlled through a power control register
        ○ Counters for cache misses and cache hits to help drive decisions
      ● Support in software
        ○ DevFreq driver
        ○ Control of active portions based on:
          ■ Cache hit/miss rates
          ■ Computed power benefit
          ■ Bias for performance
        ○ Out of tree reference implementation: https://git.linaro.org/landing-teams/working/arm/kernel-release.git/log/?h=dsu_partial_powerdown_support_v1.0
  24. L3 partial power-down: architecture
      [Diagram labels]
      ● Linux Kernel: DevFreq governor, DevFreq device, Timer 10ms ("Update DevFreq"), Target portions ("Set target portions")
      ● DSU register interface: L3 cache hit counter, miss counter, control register
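The governor on this slide samples the hit and miss counters on a 10 ms timer, while the algorithm on the following slides works in MiB/sec. One possible conversion is sketched below. The 64-byte cache line, the 32-bit counter width, the one-count-per-line assumption and the function name are all assumptions of this example and are not taken from the slides or the DSU TRM.

```python
MIB = 1024 * 1024


def bandwidth_mib_per_s(prev_count: int, curr_count: int,
                        line_bytes: int = 64,    # assumed cache line size
                        period_ms: int = 10,     # timer period from the slide
                        counter_bits: int = 32   # assumed counter width
                        ) -> float:
    """Convert two successive counter readings into a bandwidth in MiB/sec.

    Assumes the counter increments once per cache line accessed and wraps
    around at 2**counter_bits.
    """
    if period_ms <= 0:
        raise ValueError("period_ms must be positive")
    # Modular subtraction keeps the delta correct across a counter wrap.
    delta = (curr_count - prev_count) % (1 << counter_bits)
    return delta * line_bytes * 1000 / (MIB * period_ms)


if __name__ == "__main__":
    # 16384 lines of 64 bytes in 10 ms is 1 MiB per 10 ms.
    print(bandwidth_mib_per_s(0, 16384))
```

The same helper would be applied to the hit counter and to the miss counter to obtain the HBW and MBW inputs of the algorithm.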
  25. L3 partial power-down: algorithm
      Upsize: Weigh the additional energy cost of enabling another portion against the potential savings from decreasing the dynamic cost of accessing DRAM.
      ● Condition for upsize: MBW > (1.0 - Tu) * CB
      ● MBW: miss bandwidth (MiB/sec)
      ● CB: cost bandwidth (MiB/sec)
        ○ CB = L / ED
      ● L: static leakage of a single portion (uJ/sec)
      ● ED: dynamic energy of DRAM (uJ/MiB)
      ● Tu: upsizing threshold (fraction 0.00 to 1.00)
        ○ Bias for performance
      ● Compare energy consumption
        ○ Bias for performance
      [Figure labels: L3 cache static, DRAM dynamic energy]
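The upsize condition can be written down directly. The following is a minimal sketch of the formula on this slide, not the reference implementation; the function names and the numbers in the example are invented for illustration.

```python
def cost_bandwidth(leakage_uj_per_s: float, dram_energy_uj_per_mib: float) -> float:
    """CB = L / ED: the bandwidth (MiB/sec) whose DRAM energy equals one portion's leakage."""
    if dram_energy_uj_per_mib <= 0:
        raise ValueError("DRAM energy per MiB must be positive")
    return leakage_uj_per_s / dram_energy_uj_per_mib


def should_upsize(miss_bw: float, leakage_uj_per_s: float,
                  dram_energy_uj_per_mib: float, tu: float) -> bool:
    """Slide condition: MBW > (1.0 - Tu) * CB."""
    if not 0.0 <= tu <= 1.0:
        raise ValueError("Tu must be a fraction between 0.00 and 1.00")
    cb = cost_bandwidth(leakage_uj_per_s, dram_energy_uj_per_mib)
    return miss_bw > (1.0 - tu) * cb


if __name__ == "__main__":
    # Hypothetical values: L = 1000 uJ/sec, ED = 10 uJ/MiB, so CB = 100 MiB/sec.
    # With Tu = 0.25 the upsize threshold is 75 MiB/sec of miss bandwidth.
    print(should_upsize(80.0, 1000.0, 10.0, tu=0.25))
    print(should_upsize(70.0, 1000.0, 10.0, tu=0.25))
```

A larger Tu lowers the threshold, so another portion is enabled at a lower miss bandwidth; this is the sense in which Tu biases the decision towards performance.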
  26. L3 partial power-down: algorithm - 1
      Downsize: From an energy trade-off perspective, keeping a portion powered on requires a hit bandwidth that pays for its leakage. If that requirement is not met, the portion can be powered off.
      ● Condition for downsize: HBW < (N - Td) * CB
      ● HBW: hit bandwidth (MiB/sec)
      ● N: current number of portions enabled
      ● CB: cost bandwidth (MiB/sec)
        ○ CB = L / ED
      ● L: static leakage of a single portion (uJ/sec)
      ● ED: dynamic energy of DRAM (uJ/MiB)
      ● Td: downsize threshold (fraction 0.00 to 1.00)
        ○ Bias for performance
      ● Compare energy consumption
        ○ Bias for performance
      [Figure labels: L3 cache static, DRAM dynamic energy]
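The downsize condition has the same shape. The sketch below restates it in Python under the same caveats: it mirrors the formula on the slide only, and the function name and example values are hypothetical.

```python
def should_downsize(hit_bw: float, n_portions: int, leakage_uj_per_s: float,
                    dram_energy_uj_per_mib: float, td: float) -> bool:
    """Slide condition: HBW < (N - Td) * CB, with CB = L / ED."""
    if n_portions < 1:
        raise ValueError("at least one portion must be enabled to downsize")
    if not 0.0 <= td <= 1.0:
        raise ValueError("Td must be a fraction between 0.00 and 1.00")
    if dram_energy_uj_per_mib <= 0:
        raise ValueError("DRAM energy per MiB must be positive")
    cb = leakage_uj_per_s / dram_energy_uj_per_mib
    return hit_bw < (n_portions - td) * cb


if __name__ == "__main__":
    # Hypothetical values: CB = 1000 / 10 = 100 MiB/sec, N = 4, Td = 0.5,
    # so the enabled portions must sustain at least 350 MiB/sec of hits.
    print(should_downsize(300.0, 4, 1000.0, 10.0, td=0.5))
    print(should_downsize(400.0, 4, 1000.0, 10.0, td=0.5))
```

A larger Td lowers the hit bandwidth needed to keep the current portions on, which again biases the decision towards performance.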
  27. L3 partial power-down: behaviour
      Example:
      ● 2MB L3 cache
      ● Memcpy workload with buffer size of 4MB
  28. L3 partial power-down: behaviour - 1
      Expected behaviour:
      ● CPU intensive workloads should not have an effect on the number of active portions
      ● I/O intensive loads should raise the number of active portions when the cache is well used
  29. L3 partial power-down
      ● Limitations of current reference implementation
        ○ Portion is the smallest single unit of the cache that can be powered up/down
        ○ Only support for a single DynamIQ Shared Unit
        ○ Not suitable for use with the simple on-demand governor
      ● L3 partial power-down in Arm Trusted Firmware?
      Reference: https://developer.arm.com/-/media/developer/developers/open-source/energy-aware-scheduling/DynamIQ_design_specification_v1.0.pdf
  30. Thank You
      #SFO17
      BUD17 keynotes and videos on: connect.linaro.org
      For further information: www.linaro.org
