EUROPE 2012 (LCE12)
big.LITTLE mini-summit
Amit Kucheria, Power Management Tech Lead
LCE 2012, Copenhagen
LCE12: big.LITTLE Mini-Summit

Resource: LCE12
Name: big.LITTLE Mini-Summit
Date: 01-11-2012
Speaker: Amit Kucheria

Published in: Technology
  1. EUROPE 2012 (LCE12)
     big.LITTLE mini-summit
     Amit Kucheria, Power Management Tech Lead
     LCE 2012, Copenhagen
  2. Asymmetric cores (www.linaro.org)
     • Capacity
     • Energy
     • Latencies
     • Operating points
     • Cache types
  3. In-kernel Switcher (IKS)
     • Pros
       • Minimal kernel changes
       • Available now through Linaro
     • Cons
       • Half the cores used
  4. Heterogeneous MP (HMP)
     • Pros
       • All cores can be used
     • Cons
       • Large changes to Linux kernel
       • Production-ready only next year
         • Basic feature-set for partners 1Q 2013
         • Upstreaming - several months
         • Optimisations
  5. Being a catalyst...
     • Solving long-standing problems
       • Better CPU quiesce
       • Better scheduling
     • Useful for SMP (A9, A15)
  6. Mini-summit agenda
     • Plenary – Robin Randhawa
       • Whirlwind tour of experimental results on TC2
     • Session 1 (09:00 – 09:55)
       • Status overview
       • Making Linux work with asymmetric systems
     • Session 2 (10:00 – 10:45)
       • The Bluesky session: What would the ideal power-aware kernel do? (45 mins)
     • Session 3 (11:00 – 11:55)
       • Back to reality: What do we have today and the sequence of steps to get to where we want to be (55 mins)
     • Session 4 (12:00 – 13:00)
       • Workloads and Test Automation (30 mins)
       • General Discussions on further work and Wrap-Up (30 mins)
  7. big.LITTLE on TC2
     Robin Randhawa
  8. ARM's Test Chip 2 (TC#2): An Overview
     • A Versatile Express core tile, publicly available
     • Capabilities
       • 2 x A15 (r2p1) @ up to 1.2 GHz
       • 3 x A7 (r0p1) @ up to 1 GHz
       • CCI/DMC/GIC/ADB (r0p0)
       • DMA (PL330)
       • 2 GB external DDR2 memory @ 400 MHz
       • 64 KB internal SRAM
       • CoreSight debug (including JTAG and ITM trace but no STM)
       • No GPU
     • cpufreq support: independent for each cluster, with limited voltage scaling
     • cpuidle support: cluster power gating
  9-10. IKS: CPU Migration
     • big.LITTLE extends DVFS
     • DVFS algorithm monitors load on each CPU
       • When load is low it can be handled on a LITTLE processor
       • When load is high the context is transferred to a big processor
     • The unused processor can be powered down
     • When all processors in a cluster are inactive the cluster and its L2 cache can be powered down
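The migration rule on the IKS slide can be pictured as a threshold decision per big/LITTLE pair. The following is a minimal Python sketch, not the Linaro implementation: the `CpuPair` class, the two threshold values, and the `cluster_can_power_down` helper are all assumptions made for illustration. The slides give no thresholds, and the gap between the two values is only there to avoid switching back and forth on small load changes.

```python
from dataclasses import dataclass

# Hypothetical thresholds (percent CPU load); the slides do not give values.
UP_THRESHOLD = 80
DOWN_THRESHOLD = 30


@dataclass
class CpuPair:
    """One logical CPU backed by a (big, LITTLE) pair; only one core is powered."""
    active: str = "LITTLE"

    def update(self, load_percent: float) -> str:
        # High load on the LITTLE core: transfer the context to the big core.
        if self.active == "LITTLE" and load_percent >= UP_THRESHOLD:
            self.active = "big"
        # Low load on the big core: fall back to the LITTLE core.
        elif self.active == "big" and load_percent <= DOWN_THRESHOLD:
            self.active = "LITTLE"
        return self.active


def cluster_can_power_down(pairs, cluster: str) -> bool:
    """A cluster (and its L2 cache) may be powered down when none of its cores is active."""
    return all(pair.active != cluster for pair in pairs)
```

With two pairs that both stay on their LITTLE cores, `cluster_can_power_down(pairs, "big")` is true, which corresponds to the audio case where the A15 cluster is never used.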
  11. IKS: Results for Audio on TC2
     • Power compared to executing the use case on A15
     • IKS does not use A15s during Audio run
     • 70% saving
     • TC2: A15 up to 1.2 GHz, A7 up to 1 GHz
     • Better results expected on representative silicon.
  12. IKS: Results for BBench + Audio on TC2
     • Performance is measured from BBench page-loading times
     • Results are normalised to the power and performance of the same use case run on A15 only
     • [Chart: BBench page + Audio]
     • TC2: A15 up to 1.2 GHz, A7 up to 1 GHz
     • Better results expected on representative silicon.
  13. IKS: Hispeed2
  14. IKS: Results: BBench + Audio
     • Power improves with no performance cost
     • [Chart: BBench page + Audio]
     • TC2: A15 up to 1.2 GHz, A7 up to 1 GHz
     • Better results expected on representative silicon.
  15. MP solution – more details
     • Scheduler modifications:
       • Treat big and LITTLE cpus as separate scheduling domains.
       • Use PJT's load-tracking patches to track individual task load.
       • Migrate tasks between the big and the LITTLE domains based on task load.
     • [Figure: big and LITTLE domains, each with its own load balance, linked by load-based task migration; plot of task load against task state (executing, sleep) showing load decay]
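The load-based migration described on the MP slide can be sketched as a decaying per-task load average plus two thresholds. This is an illustrative Python model, not the scheduler patches themselves: the `Task` class and both thresholds are hypothetical, and the decay factor assumes the convention of the per-entity load-tracking work, where a period's contribution halves every 32 periods.

```python
# Decay factor chosen so that a contribution halves every 32 periods
# (assumed convention of the load-tracking patches: y**32 == 0.5).
DECAY = 0.5 ** (1 / 32)

# Hypothetical migration thresholds on a 0..1 load scale; not from the slides.
UP_THRESHOLD = 0.8
DOWN_THRESHOLD = 0.3


class Task:
    def __init__(self, domain: str = "LITTLE"):
        self.domain = domain
        self.load = 0.0  # tracked load, always within [0, 1]

    def tick(self, running: bool) -> None:
        """Advance one period: history decays, the new period counts if the task ran."""
        contribution = 1.0 if running else 0.0
        self.load = self.load * DECAY + (1.0 - DECAY) * contribution

    def pick_domain(self) -> str:
        """Move up when the tracked load is high, down when it has decayed away."""
        if self.domain == "LITTLE" and self.load > UP_THRESHOLD:
            self.domain = "big"
        elif self.domain == "big" and self.load < DOWN_THRESHOLD:
            self.domain = "LITTLE"
        return self.domain
```

A task that runs continuously accumulates load and eventually crosses the up threshold; once it sleeps, the load decays and the task drops back to the LITTLE domain. This matches the executing/sleep/load-decay picture on the slide.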
  16. MP: ARM TC2: Audio
     • Workload: Audio (mp3 playback)
     • Performance/Energy target:
       • A7 energy
     • Status:
       • Audio-related tasks do not use the A15s, but the power consumption is still significantly higher than on A7 alone.
       • MP not as power efficient as IKS yet
     • Todo:
       • Target spurious wake-ups on A15. All the extra power comes from the A15s, which shouldn't be used at all.
     • [Chart: Audio energy for A15, A7, 2CPU IKS and MP; A7 30.79%, MP 39.86%]
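The two energy figures on the audio slide can be turned into the gap that the "Todo" item is aiming to close. The short Python calculation below assumes both percentages are normalised to the A15-only run (the chart axis runs from 0 to 100, but the slide text does not state the baseline explicitly); the variable names are illustrative.

```python
# Energy figures from the slide, assumed to be percentages of the A15-only run.
a7_energy = 30.79
mp_energy = 39.86

# Gap between MP and the A7-only target, in percentage points of the baseline.
extra_points = mp_energy - a7_energy

# The same gap expressed relative to the A7-only energy.
extra_relative = (mp_energy / a7_energy - 1) * 100

print(f"MP uses {extra_points:.2f} points more than A7 alone "
      f"({extra_relative:.1f}% above the A7 target)")
```

Under that assumption MP spends roughly 29% more energy than the A7-only target, all of it attributed on the slide to the A15 cluster.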
  17. MP: Audio workload analysis
     • Where is the extra energy spent with MP?
       • Need to look at why the A15s consume power when they are not needed
     • We see unwarranted wake-ups on A15
       • No user threads running on A15
       • Tend to favour CPU0
     • Examples:
       • tick_sched_timer (99.7% on CPU0)
       • Hrtimers
       • Workqueue
     • [Chart: Audio energy breakdown by cluster (A15 cluster, A7 cluster) for A7 and MP]
  18. MP – Top Issues
     • Spurious wakeups
       • A15s are woken up by scheduler ticks (mainly)
         • Workqueues
         • Timers
         • RCU
         • Scheduler ticks
     • cpu wakeup prioritisation
       • Pick the cheapest target cpu
     • Balancing
       • Scale invariance
       • Load accumulation rate
       • Spread load to A7s when A15s are overloaded
       • Pack vs. spread
     • Cluster aware cpufreq governors
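"Pick the cheapest target cpu" can be read as ranking candidate CPUs by how expensive it is to run a woken task on them. The Python sketch below is one possible interpretation, not a description of any existing kernel code: the `Cpu` record, the `wakeup_cost` function, and all cost weights are hypothetical, and real weights would have to come from measurement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cpu:
    name: str
    cluster: str        # "A7" or "A15"
    idle: bool          # True if the CPU is currently in an idle state
    cluster_off: bool   # True if the whole cluster is power-gated


def wakeup_cost(cpu: Cpu) -> int:
    """Hypothetical relative cost of placing a woken task on this CPU."""
    cost = 0
    if cpu.idle:
        cost += 1       # the core must leave its idle state
    if cpu.cluster_off:
        cost += 10      # the cluster and its L2 cache must be powered up
    if cpu.cluster == "A15":
        cost += 5       # running on a big core costs more energy
    return cost


def cheapest_target(cpus):
    """Return the candidate with the lowest cost (first one wins on ties)."""
    return min(cpus, key=wakeup_cost)
```

With this weighting an A7 that is already awake beats an idle A7, which in turn beats any A15, so timers and workqueues would stop waking a power-gated A15 cluster for work a LITTLE core can handle.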
