2. Motivation
Interactive wearables, such as smartwatches, are a newcomer to the
spectrum of mobile computers.
They integrate computing even more tightly into our daily lives.
Demand for smartwatches has increased substantially.
3. Usage Patterns & Device Hardware
Users interact with wearable devices frequently throughout the day.
Each interaction is short (< 10 s) and is dedicated to a simple task.
Due to the limited content that can be displayed on one screen,
users spend a short time on one screen before switching to the
next.
Tiny battery capacity (200–400 mAh)
Slower CPU – fewer cores
Simpler CPU – scaled down, but often architecturally identical to
handheld CPUs
4. Android Wear OS
One of the most popular OSes for interactive wearables.
Wearable OS with the most public information.
Supports third-party applications and features a redesigned system
UI, including cards for notifications, a context stream, and voice
input.
Apps – renovated UI – follow Android's conventional programming
paradigm – written in Java – compiled ahead-of-time – executed atop
the managed Android Runtime.
Major OS components –
System Server – Key daemon hosting the core OS services
Surface Flinger – Daemon controlling UI animation
Clockwork – OS shell that implements the system UI
5. Benchmark Scenarios
The benchmark suite consists of 15 benchmarks in the following four
categories:
1. Wakeup – In response to internal or external events, the device transitions
out of suspended mode and presents brief information. Because wakeups happen
frequently throughout the day, energy efficiency is the most important metric.
2. Single input – An awake wearable device responds to a single input
from the user. Because the user is waiting, the device needs to achieve
low UI latency.
3. Continuous interaction – The user interacts with the device
continuously. The resulting UI animation requires the device to
produce a steady stream of graphics frames.
4. Sensing – To serve wearable apps, sensor data is sampled
and processed periodically to collect context information.
7. Experimental Setup
All benchmarks are run on two state-of-the-art Android Wear
devices:
LG G Watch R
Samsung Gear Live
Qualcomm's APQ8026 system-on-chip
Android Wear 5.0 "Lollipop"
8. Power Management
Smartwatch batteries have tiny contacts that are incompatible with
commodity power monitors.
A compatible interface circuit is carved out from a smartphone
battery.
The interface serves as an adapter between the smartwatch and an
external power monitor.
Figure: the battery interface carved out from a Nexus 5.
Figure: the interface (flipped) connected to the LG G Watch R.
9. Toolset
The following tools are used to examine system behaviors at different
levels and granularities:
Systrace – for capturing global system events such as scheduling, I/O
activities, and IPC
Android Runtime’s built-in function tracer – for recording function
call history in individual processes
Linux perf – for sampling CPU performance counters.
10. Tackling profiling overhead
Event tracing is the major source of profiling overhead.
Tracing function invocations in particular can incur overwhelming
memory overhead.
Two ways are used to tackle it:
In quantifying global system behaviors, the paper relies only on
system events and collects function traces from extra runs.
In quantifying function-level activities, a constant overhead of
4 µs is deducted from each traced function invocation.
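A minimal sketch of the constant-overhead deduction follows. Only the 4 µs constant comes from the slide. The `TracedCall` call-tree record is hypothetical, and so is the assumption that a parent's measured (inclusive) time also absorbs the 4 µs overhead of every traced callee beneath it.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Constant per-invocation tracing overhead, as stated on the slide.
TRACE_OVERHEAD_US = 4.0


@dataclass
class TracedCall:
    """One traced function invocation (hypothetical record format)."""
    name: str
    measured_us: float  # inclusive time as recorded, tracing overhead included
    children: list[TracedCall] = field(default_factory=list)


def traced_invocations(call: TracedCall) -> int:
    """Count the traced invocations in this call's subtree, itself included."""
    return 1 + sum(traced_invocations(c) for c in call.children)


def corrected_inclusive_us(call: TracedCall) -> float:
    """Inclusive time minus 4 us for every traced invocation in the subtree."""
    corrected = call.measured_us - TRACE_OVERHEAD_US * traced_invocations(call)
    return max(corrected, 0.0)  # never report negative time


def corrected_exclusive_us(call: TracedCall) -> float:
    """Own-code time: corrected inclusive time minus the callees' corrected time."""
    return corrected_inclusive_us(call) - sum(
        corrected_inclusive_us(c) for c in call.children
    )


# Usage with made-up numbers: a 50 us call containing two 10 us calls.
root = TracedCall("outer", 50.0, [TracedCall("inner_a", 10.0), TracedCall("inner_b", 10.0)])
print(corrected_inclusive_us(root))  # 50 - 3 * 4 = 38.0
print(corrected_exclusive_us(root))  # 38 - (6 + 6) = 26.0
```

Under this model, the exclusive time of a function ends up reduced by exactly one 4 µs deduction, which matches the slide's description of a per-invocation constant.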
11. CPU Usage
CPU usage is collected at two granularities:
Task-level breakdown. An analyzer is built to identify the tasks.
Function-level breakdown. To further locate the performance
hot spots in System Server, the following two metrics are employed:
Exclusive – CPU cycles spent in the function's own code
Inclusive – CPU cycles spent in the function's own code as well as
in all subroutines it calls
Both metrics include the time spent in both user and kernel space,
and exclude the time when a task is off the CPU because it has been
scheduled out.
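The two function-level metrics can be illustrated with a short sketch that computes them from one task's enter/exit trace. The event tuple format is a hypothetical assumption. The sketch also assumes that the cycle counter advances only while the task is on the CPU, which mirrors the slide's note that off-CPU time is excluded.

```python
from collections import defaultdict


def cpu_breakdown(events):
    """Compute per-function exclusive and inclusive cycles for one task.

    events: (kind, func, cycle) tuples with kind "enter" or "exit", properly
    nested, where `cycle` counts only on-CPU cycles (hypothetical input format).
    """
    exclusive = defaultdict(int)
    inclusive = defaultdict(int)
    stack = []  # open frames: [func, enter_cycle, cycles_spent_in_callees]
    for kind, func, cycle in events:
        if kind == "enter":
            stack.append([func, cycle, 0])
            continue
        name, start, callee_cycles = stack.pop()
        if name != func:
            raise ValueError(f"unbalanced trace: exit {func} while in {name}")
        total = cycle - start
        exclusive[name] += total - callee_cycles
        # Credit inclusive time only to the outermost active frame of a
        # function, so recursive calls are not counted twice.
        if all(frame[0] != name for frame in stack):
            inclusive[name] += total
        if stack:
            stack[-1][2] += total  # the caller's callee time grows
    return dict(exclusive), dict(inclusive)


trace = [
    ("enter", "A", 0), ("enter", "B", 10), ("exit", "B", 40),
    ("enter", "C", 50), ("exit", "C", 70), ("exit", "A", 100),
]
exc, inc = cpu_breakdown(trace)
print(exc)  # {'B': 30, 'C': 20, 'A': 50}
print(inc)  # {'B': 30, 'C': 20, 'A': 100}
```

Here A's exclusive count (50) leaves out the 50 cycles spent in B and C, while its inclusive count (100) keeps them.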
12. Idle Time Analysis
The number and duration of the observed idle episodes are unusual.
Some idle episodes match system events known to cause idleness, e.g.,
I/O and power management.
Others are often rooted in OS services stalling while serving apps' requests.
IdleChecker, an analyzer that helps map anomalous idle episodes
to the responsible code regions, is based on a simple rationale:
the function calls and IPC transactions spanning an anomalous idle
episode are suspicious.
IdleChecker runs the following steps for each idle episode:
1. It identifies suspicious app tasks that are blocked throughout the entire
idle episode but run after the episode.
2. For each suspicious task, it identifies two suspicious CPU time quanta:
the one right before the idle episode and the one right after it.
3. It examines the suspicious quanta, looking for IPC transactions that span
the idle episode.
4. It identifies the function invocations that either coincide with those IPC
transactions or span the idle episode.
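The four steps above can be sketched roughly as follows. This is not the paper's implementation. The `Span` type and the input shapes are hypothetical. "Blocked throughout the episode" is approximated as "no CPU quantum overlaps the episode", and "coincide with the IPC" is approximated as "overlaps a spanning IPC transaction".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: float
    end: float

    def covers(self, other: "Span") -> bool:
        return self.start <= other.start and self.end >= other.end

    def overlaps(self, other: "Span") -> bool:
        return self.start < other.end and other.start < self.end

    def contains(self, t: float) -> bool:
        return self.start <= t <= self.end


def idle_checker(idle, quanta, ipcs, calls, app_tasks):
    """Map one idle episode to suspicious function invocations.

    idle:      Span of the idle episode
    quanta:    {task: [Span, ...]} CPU time quanta of each task
    ipcs:      [(task, Span), ...] IPC transactions issued by each task
    calls:     [(task, func_name, Span), ...] traced function invocations
    app_tasks: iterable of app task names
    Returns {task: [func_name, ...]}.
    """
    report = {}
    for task in app_tasks:
        qs = quanta.get(task, [])
        # Step 1: the task is blocked throughout the episode but runs after it.
        if any(q.overlaps(idle) for q in qs):
            continue
        before = [q for q in qs if q.end <= idle.start]
        after = [q for q in qs if q.start >= idle.end]
        if not before or not after:
            continue
        # Step 2: the quantum right before and the quantum right after.
        q_before = max(before, key=lambda q: q.end)
        q_after = min(after, key=lambda q: q.start)
        # Step 3: IPC transactions issued in q_before that span the episode
        # and complete in q_after.
        spanning = [s for t, s in ipcs
                    if t == task and s.covers(idle)
                    and q_before.contains(s.start) and q_after.contains(s.end)]
        # Step 4: invocations that coincide with such an IPC or span the episode.
        suspects = [name for t, name, s in calls
                    if t == task and (s.covers(idle)
                                      or any(s.overlaps(ipc) for ipc in spanning))]
        if suspects:
            report[task] = suspects
    return report


# Made-up timeline: "app" blocks on an IPC across the idle episode [100, 150].
result = idle_checker(
    idle=Span(100, 150),
    quanta={"app": [Span(80, 100), Span(150, 170)], "other": [Span(0, 10)]},
    ipcs=[("app", Span(95, 155))],
    calls=[("app", "onClick", Span(85, 160)), ("app", "draw", Span(160, 168))],
    app_tasks={"app", "other"},
)
print(result)  # {'app': ['onClick']}
```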
13. Thread-level Parallelism
A metric widely used to gauge how many cores an interactive system needs.
TLP is the average number of busy CPU cores during non-idle time.
$$\mathrm{TLP} = \frac{\sum_{i=1}^{n} i \, c_i}{1 - c_0}$$
$c_0$ – fraction of time when no threads are running
$c_i$ – fraction of time when exactly $i$ threads are running simultaneously
$n$ – number of cores available
For the measurements, all four cores are forced online.
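A small sketch of the TLP formula follows. It assumes the input is a list of hypothetical per-thread run intervals, for example extracted from a scheduler trace. A sweep over start/end events yields the time with exactly $i$ threads running. Normalizing that time gives $c_i$, which is then plugged into the formula above.

```python
def concurrency_histogram(busy, n_cores, t_start, t_end):
    """Time during which exactly i threads run, for i = 0..n_cores.

    busy: (start, end) intervals, one per scheduled thread run (hypothetical
    input, e.g. extracted from a scheduler trace).
    """
    events = []
    for s, e in busy:
        s, e = max(s, t_start), min(e, t_end)  # clip to the measurement window
        if s < e:
            events += [(s, 1), (e, -1)]
    events.sort()  # at equal timestamps, ends (-1) sort before starts (+1)
    hist = [0.0] * (n_cores + 1)
    running, prev = 0, t_start
    for t, delta in events:
        hist[min(running, n_cores)] += t - prev
        running += delta
        prev = t
    hist[min(running, n_cores)] += t_end - prev
    return hist


def tlp(hist):
    """TLP = sum_{i>=1} i * c_i / (1 - c_0), with c_i the fraction of time in hist[i]."""
    total = sum(hist)
    c = [t / total for t in hist]
    if c[0] >= 1.0:
        return 0.0  # the window was entirely idle
    return sum(i * ci for i, ci in enumerate(c) if i >= 1) / (1.0 - c[0])


hist = concurrency_histogram([(0, 4), (2, 6)], n_cores=4, t_start=0, t_end=10)
print(hist)       # [4.0, 4.0, 2.0, 0.0, 0.0]
print(tlp(hist))  # about 1.333
```

Dividing by $1 - c_0$ is what restricts the average to non-idle time. In this example, 4 of the 10 time units are idle, so they do not pull the result below 1.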
14. Microarchitectural behaviors
Microarchitecture design for wearables remains a mystery.
Using Linux perf, the paper samples the performance
counters of the Cortex-A7 CPUs on the test devices.
It observes branch prediction, cache, and TLB behavior in all benchmarks.
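Raw counter totals become comparable across benchmarks only after normalization. The sketch below uses misses per kilo-instruction (MPKI) and a branch misprediction rate. Both are our choice of metric and are not taken from the slide. The event names are Linux perf's generic event names. Whether each one is available on a given CPU should be checked with `perf list` on the device. The counter values are made up.

```python
# Generic perf event names; availability on a specific CPU is an assumption
# to verify with `perf list` on the target device.
MISS_EVENTS = ("L1-icache-load-misses", "iTLB-load-misses", "branch-misses")


def derived_metrics(counts):
    """Turn raw counter totals into MPKI values and a branch misprediction rate."""
    kilo_instructions = counts["instructions"] / 1000.0
    metrics = {f"{ev} MPKI": counts[ev] / kilo_instructions
               for ev in MISS_EVENTS if ev in counts}
    if counts.get("branches") and "branch-misses" in counts:
        metrics["branch misprediction rate"] = counts["branch-misses"] / counts["branches"]
    return metrics


# Made-up counter totals, for illustration only.
sample = {"instructions": 2_000_000, "L1-icache-load-misses": 60_000,
          "iTLB-load-misses": 4_000, "branch-misses": 30_000, "branches": 300_000}
print(derived_metrics(sample))
```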
16. Where do CPU cycles go?
Intensive OS execution often dominates the global CPU usage.
Many OS services are costly, likely because the software is unnecessarily
complicated.
The CPU time distribution of hot functions is highly skewed.
Manipulating basic data structures consumes substantial CPU
cycles.
Legacy OS functions may become serious performance
bottlenecks
OS execution bottlenecks: setLight(), Layout(), computeOom(), getSimpleName()
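One way to quantify how skewed the per-function CPU time distribution is: compute the share of cycles held by the hottest functions, or count how many functions are needed to cover a given fraction of cycles. The sketch below does both. The profile uses placeholder names and made-up numbers, not measurements from the paper.

```python
def top_k_share(cycles_by_func, k):
    """Fraction of all sampled cycles spent in the k hottest functions."""
    total = sum(cycles_by_func.values())
    hottest = sorted(cycles_by_func.values(), reverse=True)[:k]
    return sum(hottest) / total if total else 0.0


def functions_to_cover(cycles_by_func, fraction):
    """Smallest number of hottest functions that together reach `fraction` of cycles."""
    total = sum(cycles_by_func.values())
    covered = 0
    for n, cycles in enumerate(sorted(cycles_by_func.values(), reverse=True), start=1):
        covered += cycles
        if covered >= fraction * total:
            return n
    return len(cycles_by_func)


# Made-up profile, for illustration only.
profile = {"f1": 70, "f2": 10, "f3": 10, "f4": 5, "f5": 5}
print(top_k_share(profile, 1))            # 0.7
print(functions_to_cover(profile, 0.85))  # 3
```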
17. Idle Episodes
Plentiful and of widely varying lengths
Causes:
Improper OS designs
Interference from the voice UI
Legacy support for device suspending
Performance overprovisioning during continuous interaction
Design implications:
Hunting OS inefficiencies
Filling idle time with useful work
Reducing CPU and GPU clock rates, which shrinks idle episodes
Predictive execution
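The idea behind reducing clock rates can be sketched as a worked example for a single clock domain. The slide covers both CPU and GPU, so this is a simplification. The sketch assumes a hypothetical frequency table, a hypothetical per-frame workload and budget, and execution time that scales linearly with cycles (memory stalls are ignored). The lowest frequency that still meets the frame budget leaves the least idle time.

```python
def pick_frequency(work_cycles, deadline_s, freqs_hz):
    """Lowest available frequency that still finishes the work by the deadline."""
    for f in sorted(freqs_hz):
        if work_cycles / f <= deadline_s:
            return f
    return max(freqs_hz)  # even the top frequency misses; run as fast as possible


def idle_time_s(work_cycles, deadline_s, f):
    """Idle time left before the deadline when running at frequency f."""
    return max(0.0, deadline_s - work_cycles / f)


# Hypothetical frequency table and per-frame workload, for illustration only.
FREQS_HZ = [300e6, 600e6, 900e6, 1200e6]
work, deadline = 6_000_000, 0.016  # cycles per frame; frame budget in seconds

f = pick_frequency(work, deadline, FREQS_HZ)
print(f)                                           # 600000000.0
print(idle_time_s(work, deadline, max(FREQS_HZ)))  # ~0.011 s idle at the top frequency
print(idle_time_s(work, deadline, f))              # ~0.006 s idle at the chosen frequency
```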
18. Thread-level Parallelism
Short interactions exhibit substantial TLP, which is on par with
desktop workloads.
While apps are mostly single-threaded, OS daemons contribute to
TLP significantly.
A wearable device needs at least two cores.
19. Microarchitectural behaviors
A significant mismatch exists between the OS and CPU
microarchitecture, particularly in L1 icache, iTLB, and branch
predictor.
The mismatch is largely due to OS code complexity and will not be
eliminated by enhancing the wearable CPU alone.
The OS should be trimmed down to match the simplicity of its apps.
20. Related Work
Gao et al. find that smartphone workloads show limited TLP,
concluding that they need no more than two cores.
ProfileDroid contributes an approach for characterizing smartphone
apps at multiple layers.
Min et al. study the battery usage of smartwatches.
WearDrive creates synthetic benchmarks to shed light on
wearable storage.
RisQ and TypingRing target gesture recognition.
iShadow tracks gaze in real time.
Ha et al. build a wearable system for cognitive assistance.
Cornelius et al. focus on user identification.
21. Recap
In-depth analysis of one of the most popular wearable OSes,
Android Wear.
Examination of four key aspects (CPU usage, idle episodes, TLP, and
microarchitectural behaviors) across 15 benchmarks.
Discovery of serious OS inefficiencies and system bottlenecks that
were widespread but unknown before.
The results clearly point out the system bottlenecks for immediate
optimization and have strong implications on future wearable
system software and hardware design.