LCE12: Handling big.LITTLE Core and Cluster Shutdown on ARM
1. Nicolas Pitre
Dave Martin
Linaro Connect Q4.12
October 2012
Handling big.LITTLE Core and
Cluster Shutdowns on ARM
3. big.LITTLE Activities
● Current big.LITTLE projects:
  ● big.LITTLE switcher
  ● big.LITTLE “full MP”
● Goal: optimize performance and save power on big.LITTLE SoCs
4. Power Saving
● Save power by:
  ● turning off individual CPUs;
  ● shutting down a whole cluster.
● Opportunistic cluster shutdown is key.
● Much more complex than it may seem at first glance.
6. CPU Life-Cycle
[Figure: CPU life-cycle diagram: down → coming up → up → going down → down]
● up: powered on, running normally
● going down: shutdown in progress
● down: powered off
● coming up: powered on, setup in progress
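The four CPU states above form a single cycle. As a minimal sketch, the legal transitions can be encoded in C; the enum and function names here are illustrative and not taken from the actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative names: the slide only defines the four states. */
enum cpu_state { CPU_DOWN, CPU_COMING_UP, CPU_UP, CPU_GOING_DOWN };

/* Each state has exactly one successor in the cycle
 * down -> coming up -> up -> going down -> down. */
static enum cpu_state cpu_next_state(enum cpu_state s)
{
	switch (s) {
	case CPU_DOWN:       return CPU_COMING_UP;  /* powered on, setup starts */
	case CPU_COMING_UP:  return CPU_UP;         /* setup done, running */
	case CPU_UP:         return CPU_GOING_DOWN; /* shutdown started */
	case CPU_GOING_DOWN: return CPU_DOWN;       /* power removed */
	}
	assert(!"invalid CPU state");
	return CPU_DOWN;
}

/* True only for a transition that follows the cycle. */
static bool cpu_transition_ok(enum cpu_state from, enum cpu_state to)
{
	return cpu_next_state(from) == to;
}
```

Skipping a state (for example going straight from down to up) is rejected, which is the property the cluster-level bookkeeping later relies on.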
7. Cluster Shutdown
● Each CPU shutting down must:
  1) disable allocation into L1,
  2) flush dirty L1 content,
  3) disable CPU-level coherency,
  4) power itself down.
● When all CPUs are shut down, we can shut down the cluster:
  ● The Last Man must perform steps 1-3, and then:
  5) flush the cluster-level (L2) cache,
  6) disable CCI snooping for the cluster,
  7) power the cluster down.
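A user-space sketch of the ordering on this slide follows. The step log, `step_is()` and `cpu_shutdown()` are hypothetical stand-ins for the real cache, CCI and power-controller operations. The hard part, deciding safely which CPU is the last man, is deliberately left out and passed in as a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical step log standing in for real hardware operations. */
#define MAX_STEPS 8
static const char *steps[MAX_STEPS];
static int nsteps;

static void step(const char *what)
{
	assert(nsteps < MAX_STEPS);
	steps[nsteps++] = what;
}

static bool step_is(int i, const char *what)
{
	return i < nsteps && strcmp(steps[i], what) == 0;
}

/* Shutdown path for one CPU.  last_man is assumed to be decided
 * elsewhere, race-free; that is exactly what the helpers solve. */
static void cpu_shutdown(bool last_man)
{
	nsteps = 0;
	step("disable L1 allocation");   /* 1 */
	step("flush dirty L1");          /* 2 */
	step("disable CPU coherency");   /* 3 */
	if (!last_man) {
		step("power down CPU");  /* 4 */
		return;
	}
	step("flush L2");                /* 5 */
	step("disable CCI snooping");    /* 6 */
	step("power down cluster");      /* 7: also removes this CPU's power */
}
```

For example, after `cpu_shutdown(true)` the log holds six entries, with `"flush L2"` immediately after the three per-CPU steps.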
8. Last Man Challenges
● Last Man has to perform a sequence of actions without interference from other CPUs.
● Problems:
  ● Other CPUs can be at various stages of shutdown.
  ● CPUs might wake up at any time.
  ● Flushing L2 can take quite some time.
  ● LDREX and STREX only work with cached memory.
● Concurrency is a hard problem.
9. ...and yet more challenges
● Concurrency:
  ● Which CPU is the Last Man?
  ● How does the Last Man know the other CPUs are really down?
  ● How can races with one or more incoming CPUs be avoided?
  ● How does an incoming CPU know whether the cluster needs to be set up?
● Races are everywhere!
  ● The Last Man can't flush L2 until all the other CPUs are done flushing their L1 caches.
  ● Incoming CPUs might power up at any time.
  ● Incoming CPUs can’t proceed safely if CCI snooping is disabled.
  ● Memory might be cached on some CPUs and uncached on others...
10. Cluster Life-Cycle (simplified)
● Similar to the CPU life-cycle, but...
● Need to manage cluster caches etc. safely.
● Cluster power-down may be preempted.
● Need to avoid races when tracking cluster state.
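[Figure: simplified cluster life-cycle diagram with the same four states as the CPU life-cycle: down, coming up, up, going down]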
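11. Actual cluster life-cycle
[Figure: cluster state diagram. Each state pairs the cluster state with whether a CPU is coming up: down, not coming up; down, coming up; up, coming up; up, not coming up; going down, not coming up; going down, coming up. A transition marked “(preempt)” shows an incoming CPU preempting a cluster shutdown. The legend distinguishes actions taken by the last man during cluster shutdown from actions taken by the first man during cluster wake-up.]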
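One way to read the diagram is as a cluster state (down, up, going down) paired with an “inbound” flag that says whether some CPU is coming up. The single-threaded sketch below assumes that encoding; all names are illustrative. It shows the preempt path, where a last man that sees an incoming CPU aborts the shutdown. A real implementation must also order these accesses carefully across CPUs, which this sketch does not model.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative encoding: cluster state plus an inbound flag. */
enum cluster_state { CLUSTER_DOWN, CLUSTER_UP, CLUSTER_GOING_DOWN };

struct cluster {
	enum cluster_state state;
	bool inbound_coming_up;
};

/* Last man: announce the shutdown, then back off if a CPU is
 * already on its way in ("going down, coming up" -> "up, coming up"). */
static bool last_man_try_shutdown(struct cluster *c)
{
	assert(c->state == CLUSTER_UP);
	c->state = CLUSTER_GOING_DOWN;
	if (c->inbound_coming_up) {
		c->state = CLUSTER_UP;   /* preempted: abort shutdown */
		return false;
	}
	/* ...flush L2, disable CCI snooping, power off... */
	c->state = CLUSTER_DOWN;
	return true;
}

/* First man: flag that we are coming up, then set the cluster up
 * only if it is actually down. */
static void first_man_wake(struct cluster *c)
{
	c->inbound_coming_up = true;
	/* A real CPU would wait here while state == CLUSTER_GOING_DOWN. */
	if (c->state == CLUSTER_DOWN) {
		/* ...invalidate L2 if needed, enable CCI snooping... */
		c->state = CLUSTER_UP;
	}
	c->inbound_coming_up = false;
}
```

The key design point is that each side publishes its intent before checking the other's: the last man sets “going down” before reading the inbound flag, and the first man sets the inbound flag before reading the cluster state.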
12. Platform Code Helper Functions
● void __bL_cpu_going_down(unsigned int cpu, unsigned int cluster)
  Signal that the CPU is shutting down.
● bool __bL_outbound_enter_critical(unsigned int this_cpu, unsigned int cluster)
  Safely begin cluster shutdown, ensuring all other CPUs are down (last man only).
● void __bL_outbound_leave_critical(unsigned int cluster, int state)
  End cluster shutdown (last man only).
● void __bL_cpu_down(unsigned int cpu, unsigned int cluster)
  Signal that the CPU has finished shutting down.
● Example code for the ARM Fast Models is in arch/arm/mach-vexpress/dcscb.c.
● The equivalent operations for CPU and cluster start-up are handled by common code in arch/arm/common/bL_head.S.
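As a sketch, this is the call order a platform power-down method might follow. Only the four helper signatures come from the slide. The stub bodies, the `CLUSTER_DOWN`/`CLUSTER_UP` values passed as `state`, the `cpus_up` and `calls` counters and `platform_power_down()` are assumptions made so the example runs in user space. The Fast Models code in dcscb.c remains the real reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed state values and simulation counters. */
enum { CLUSTER_DOWN = 0, CLUSTER_UP = 1 };
static int cpus_up = 2;              /* CPUs still running in the cluster */
static int cluster_state = CLUSTER_UP;
static int calls;                    /* helper calls, for testing */

/* Stubs with the slide's signatures; bodies are simulations. */
static void __bL_cpu_going_down(unsigned int cpu, unsigned int cluster)
{
	(void)cpu; (void)cluster;
	calls++;
}

static bool __bL_outbound_enter_critical(unsigned int this_cpu,
					 unsigned int cluster)
{
	(void)this_cpu; (void)cluster;
	calls++;
	return cpus_up == 1;         /* only the caller is still up */
}

static void __bL_outbound_leave_critical(unsigned int cluster, int state)
{
	(void)cluster;
	calls++;
	cluster_state = state;
}

static void __bL_cpu_down(unsigned int cpu, unsigned int cluster)
{
	(void)cpu; (void)cluster;
	calls++;
	cpus_up--;
}

/* Hypothetical platform power-down method showing the call order. */
static void platform_power_down(unsigned int cpu, unsigned int cluster,
				bool last_man)
{
	__bL_cpu_going_down(cpu, cluster);
	if (last_man && __bL_outbound_enter_critical(cpu, cluster)) {
		/* disable L1 allocation, flush L1 and L2,
		 * exit coherency, disable CCI snooping */
		__bL_outbound_leave_critical(cluster, CLUSTER_DOWN);
	} else {
		/* disable L1 allocation, flush L1, exit coherency */
	}
	__bL_cpu_down(cpu, cluster);
	/* a real CPU would now execute WFI and lose power */
}
```

If `__bL_outbound_enter_critical()` returns false, the sketch falls back to a CPU-only shutdown, so a would-be last man that loses a race with an incoming CPU leaves the cluster up.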
13. Managing Cluster Start-Up
● When powering up, the “First Man” must:
  ● invalidate cluster-level (L2) cache (if needed),
  ● enable CCI snooping for the cluster,
  ● resume execution of the kernel.
● Other CPUs must:
  ● wait until the first man has set up the cluster,
  ● resume execution of the kernel.
The kernel deals with local CPU setup.
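A minimal model of the two start-up roles, assuming the first man has already been chosen (see the next slide). `cluster_ready`, `l2_needs_invalidate`, `cci_snoop_enabled`, `first_man_setup()` and `cluster_entry()` are hypothetical. C11 acquire/release atomics stand in for the ordering the real start-up path needs; that path runs before the caches are usable (slide 8 notes LDREX/STREX need cached memory), so it cannot rely on ordinary atomics in the same way.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical cluster-wide state. */
static atomic_bool cluster_ready;
static bool l2_needs_invalidate = true;  /* e.g. cluster was powered off */
static bool cci_snoop_enabled;

static void first_man_setup(void)
{
	if (l2_needs_invalidate) {
		/* ...invalidate the cluster-level (L2) cache... */
		l2_needs_invalidate = false;
	}
	cci_snoop_enabled = true;        /* ...enable CCI snooping... */
	/* Publish only after setup, so waiters see a ready cluster. */
	atomic_store_explicit(&cluster_ready, true, memory_order_release);
}

/* Every CPU runs this after power-on. */
static void cluster_entry(bool i_am_first_man)
{
	if (i_am_first_man)
		first_man_setup();
	else
		while (!atomic_load_explicit(&cluster_ready,
					     memory_order_acquire))
			;                /* wait for the first man */
	assert(cci_snoop_enabled);
	/* ...resume execution of the kernel; it does local CPU setup... */
}
```

The release store pairs with the acquire load, so any CPU that sees `cluster_ready` also sees the completed L2 and CCI setup.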
14. Choosing the First Man
● Lightweight mutual exclusion using “vlocks”
● A CPU “votes” for itself by storing its ID to a common location:
  STR cpu_id, [ballot_box]
● Memory atomicity ensures a single winner.
● The winner sets up the cluster.
[Figure: first-man election flowchart. Steps: power-on; election started?; submit vote; election in progress; election finished?; did I win?; then either set up cluster (winner) or wait for winner to set up cluster; finally boot or resume OS.]
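A sketch of the vlock election, modeled on the vlock algorithm as later documented in the Linux kernel. `NR_CPUS`, `NO_VOTE`, the array layout and the use of C11 atomics are assumptions of this sketch. Only plain loads and stores are used, never read-modify-write, matching the constraint that exclusives are unavailable while caches are off. The `last_vote != NO_VOTE` check plays the role of the “election started?” box, and the wait loop the “election finished?” box.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 4
#define NO_VOTE (-1)

/* Plain (seq_cst) loads and stores only: no read-modify-write. */
static atomic_int currently_voting[NR_CPUS];
static atomic_int last_vote = NO_VOTE;    /* the ballot box */

static bool vlock_trylock(int this_cpu)
{
	assert(this_cpu >= 0 && this_cpu < NR_CPUS);
	atomic_store(&currently_voting[this_cpu], 1);  /* want to vote */
	if (atomic_load(&last_vote) != NO_VOTE) {
		/* election already started: someone else volunteered */
		atomic_store(&currently_voting[this_cpu], 0);
		return false;
	}
	atomic_store(&last_vote, this_cpu);  /* STR cpu_id, [ballot_box] */
	atomic_store(&currently_voting[this_cpu], 0);

	/* Wait until no CPU can still overwrite the ballot box. */
	for (int i = 0; i < NR_CPUS; i++)
		while (atomic_load(&currently_voting[i]) != 0)
			;

	/* Each store is single-copy atomic: exactly one vote survives. */
	return atomic_load(&last_vote) == this_cpu;
}

/* Called by the winner once the cluster is set up. */
static void vlock_unlock(void)
{
	atomic_store(&last_vote, NO_VOTE);
}
```

Several CPUs may pass the “election started?” check and store their IDs, but every one of them waits until all voters have finished, so they all read the same final value and exactly one of them sees its own ID.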
15. Kernel API
A convenient interface is provided to hide hardware specifics from the kernel.
● Make a given CPU in a given cluster runnable:
  bL_cpu_power_up(int cpu, int cluster)
● Power the calling CPU down:
  bL_cpu_power_down(void)
● For housekeeping by a CPU that has just powered up:
  bL_cpu_powered_up(void)
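A hedged usage sketch of this interface from a secondary-boot or hotplug caller. The three function names come from the slide. The `int` return type of `bL_cpu_power_up()`, the `-1` error value, the per-CPU table, `self_cpu`/`self_cluster` and `boot_secondary()` are all assumptions made so the sketch runs in user space.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CLUSTERS 2
#define CPUS_PER_CLUSTER 4

/* Hypothetical per-CPU bookkeeping. */
enum { CPU_OFF, CPU_PENDING, CPU_RUNNING };
static int cpu_state[NR_CLUSTERS][CPUS_PER_CLUSTER];
static int self_cpu, self_cluster;   /* stands in for reading MPIDR */

static int bL_cpu_power_up(int cpu, int cluster)
{
	if (cpu < 0 || cpu >= CPUS_PER_CLUSTER ||
	    cluster < 0 || cluster >= NR_CLUSTERS)
		return -1;               /* hypothetical error value */
	if (cpu_state[cluster][cpu] == CPU_OFF)
		cpu_state[cluster][cpu] = CPU_PENDING; /* real code: power controller */
	return 0;
}

static void bL_cpu_powered_up(void)
{
	/* housekeeping once the CPU runs kernel code again */
	cpu_state[self_cluster][self_cpu] = CPU_RUNNING;
}

static void bL_cpu_power_down(void)
{
	/* a real implementation would flush caches, exit coherency and
	 * is presumably not expected to return */
	cpu_state[self_cluster][self_cpu] = CPU_OFF;
}

/* Hypothetical secondary-boot caller. */
static int boot_secondary(int cpu, int cluster)
{
	return bL_cpu_power_up(cpu, cluster);
}
```

The caller never touches clusters, CCI or caches directly; whether a power-up also has to bring up a whole cluster is hidden behind `bL_cpu_power_up()`.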
16. Targeted Users
● the in-kernel switcher module (IKS)
● the cpuidle driver
● CPU hotplug
● secondary CPU booting.
17. Code Availability
● http://git.linaro.org/gitweb?p=people/nico/linux.git;a=shortlog;h=refs/heads/bL_cluster_pm
● Example implementation for the ARM Fast Model.
● Still validating on ARM TC2 hardware.
● Should be headed upstream soon...