Using tracing to tune and optimize EAS
Leo Yan & Daniel Thompson
Linaro Support and Solutions Engineering
ENGINEERS AND DEVICES WORKING TOGETHER
Agenda
● Background
○ Review of typical workflow for GTS tuning
○ Introduce a workflow for EAS tuning
○ Quick introduction of the tools that support the new
workflow
● Worked examples
○ Development platform for the worked examples
○ Task ping-pong issue
○ Small task staying on big core
● Further reading
Typical workflow for optimizing GTS
This simple workflow (adjust tunables → benchmark → repeat) is easy to
understand but has problems in practice:
● Tunables are complex and interact with each other, making it hard to
decide which tunable to adjust.
● Tuning for multiple use-cases is difficult.
● Tuning is SoC specific; optimizations will not necessarily apply to
other SoCs.
GTS tunables
GTS: up_threshold, down_threshold, packing_enable, load_avg_period_ms,
frequency_invariant_load_scale
CPUFreq interactive governor: hispeed_freq, go_hispeed_load,
target_loads, timer_rate, min_sample_time, above_hispeed_delay
Typical workflow for optimizing EAS systems
Workflow loop: trace a use-case → benchmark → examine traces → improve
decisions.
The workflow is knowledge intensive. Decisions can be improved by
improving the power model or by finding new opportunities in the
scheduler (a.k.a. debugging).
Optimizations are more portable:
● They can be shared for review
● They are likely to benefit your new SoC
Trace points for EAS
The kernel has a set of stock trace points for diving into EAS
debugging.
The trace points are added by patches marked “DEBUG”; they have not
been posted to LKML and are currently only found in product-focused
patchsets.
Enable the kernel config: CONFIG_FTRACE
Trace points for EAS - cont.
PELT signals: sched_contrib_scale_f, sched_load_avg_task,
sched_load_avg_cpu
Scheduler default events: sched_switch, sched_migrate_task,
sched_wakeup, sched_wakeup_new
SchedFreq: cpufreq_sched_throttled, cpufreq_sched_request_opp,
cpufreq_sched_update_capacity
EAS core: sched_energy_diff, sched_overutilized
SchedTune: sched_tune_config, sched_boost_cpu, sched_tune_tasks_update,
sched_tune_boostgroup_update, sched_boost_task, sched_tune_filter
The scheduler default events are tracepoints in the mainline kernel;
the other groups are tracepoints from the EAS extension. LISA can be
easily extended to support these trace points.
E.g. enable trace points:
trace-cmd start -e sched_energy_diff -e sched_wakeup
Summary
Feature | EAS | GTS
Decision-making strategy | Power modeling | Heuristic thresholds
Frequency selection | sched-freq or sched-util, integrated with the scheduler | Governor’s cascaded parameters
Scenario-based tuning | SchedTune (cgroup) | None
Energy-aware scheduling (EAS) has very few tunables and thus requires a
significantly different approach to tuning and optimization when compared to
global task scheduling (GTS).
LISA - interactive analysis and testing
● “Distro” of python libraries for interactive analysis and automatic testing
● Library support includes
○ Target control and manipulation (set cpufreq mode, run this workload, initiate trace)
○ Gather power measurement data and calculate energy
○ Analyze and graph trace results
○ Test assertions about the trace results (e.g. big CPU does not run more than 20ms)
● Interactive analysis using ipython and jupyter
○ Provides a notebook framework similar to Maple, Mathematica or Sage
○ Notebooks mix together documentation with executable code fragments
○ Notebooks record the output of an interactive session
○ All permanent file storage is on the host
○ Trace files and graphs can be reexamined in the future without starting the target
● Automatic testing
○ Notebooks containing assertion based tests that can be converted to normal python
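The assertion-based testing idea can be reduced to a sketch: boil a trace down to (timestamp, cpu, task) samples and assert a property over them. Everything below (the function name, the sample format) is a hypothetical illustration rather than LISA's real API; the big-CPU numbering matches the HiKey platform.json shown later.

```python
# Hypothetical sketch of an assertion-based trace test (not the real LISA API).
# A trace is reduced to (timestamp_s, cpu, task) switch-in samples and we
# assert a property about it, e.g. "the big CPUs run task1 for < 20 ms".

BIG_CPUS = {4, 5, 6, 7}  # big cluster on HiKey, per the platform.json later

def big_cpu_residency_s(samples, task):
    """Sum the time `task` spends scheduled on a big CPU.

    `samples` is a time-sorted list of (timestamp_s, cpu, task) switch-in
    events; each interval lasts until the next sample's timestamp.
    """
    total = 0.0
    for (t0, cpu, who), (t1, _, _) in zip(samples, samples[1:]):
        if who == task and cpu in BIG_CPUS:
            total += t1 - t0
    return total

trace = [
    (0.000, 0, "task1"),   # runs on LITTLE cpu0
    (0.010, 4, "task1"),   # migrates to big cpu4 for 5 ms
    (0.015, 1, "task1"),   # back to LITTLE cpu1
    (0.030, 1, "idle"),
]
assert big_cpu_residency_s(trace, "task1") < 0.020  # big CPUs ran < 20 ms
```

A real LISA test wraps the same shape of check around trace data collected from the target.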
General workflow for LISA
http://events.linuxfoundation.org/sites/events/files/slides/ELC16_LISA_20160326.pdf
LISA interactive test mode
Screenshot callouts:
● http://127.0.0.1:8888 with the ipython notebook file open
● Menu & control buttons
● Markdown (headers)
● Execute box with python code
● Result box: experiment results are recorded and shown again the next
time the file is reopened
kernelshark
Task scheduling
Filters for events, tasks, and CPUs
Details for events
Development platform for the worked examples
● All examples use artificial workloads to provoke a specific behaviour
○ It turned out to be quite difficult to deliberately provoke undesired behavior!
● Examples are reproducible on 96Boards HiKey
○ Octo-A53 multi-cluster (2x4) SMP device with five OPPs per cluster
■ Not big.LITTLE, and not using a fast/slow silicon process
○ We are able to fake a fast/slow system by using asymmetric power
modeling parameters and artificially reducing the running/runnable delta
time for “fast” CPUs so the metrics indicate that they have higher performance
● Most plots shown in these slides are copied from a LISA notebook
○ Notebooks and trace files have been shared for use after training
Testing environment
● CPU capacity info
○ The LITTLE core’s highest capacity is 447@850MHz
○ The big core’s highest capacity is 1024@1.1GHz
○ This case is running with correct power model parameters
● Test case
○ 16 small tasks are running with 15% utilization of little CPU (util ~= 67)
○ A single large task is running with 40% utilization of little CPU (util ~= 180)
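As a back-of-envelope check on the quoted util values (a simplified model ignoring PELT decay ripple and invariance scaling), a periodic task's steady-state utilization is roughly its duty cycle times the capacity it is scaled against:

```python
LITTLE_CAPACITY = 447  # highest LITTLE capacity at 850 MHz, from this deck

def expected_util(duty_cycle, capacity=LITTLE_CAPACITY):
    """Steady-state utilization of a periodic task: roughly duty_cycle
    times the capacity scale it runs against (decay ripple ignored)."""
    return duty_cycle * capacity

print(round(expected_util(0.15)))  # small tasks: ~67
print(round(expected_util(0.40)))  # large task: ~179 (the slides say ~180)
```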
General analysis steps
Step 1: Run workload and generate trace data: LISA::TestEnv() (connect
with target, using the platform.json platform description file) →
LISA::rtapp() (generate workload).
Step 2: Analyze trace data: LISA::Trace() → LISA::Filters() (filter out
tasks) → LISA::TraceAnalysis() (analyze tasks and events).
Connect with target board
Specify target board info for
connection
Calibration for CPUs
Tools copied to target board
Enable ftrace events
Create connection
Generate and execute workload
Define workload
Capture Ftrace data
Execute workload
Capture energy data
Graph showing task placement in LISA
Specify events to be extracted
Specify time interval
Display task placement graph
First make a quick graph showing task placement...
There are too many tasks; we need a
method to quickly filter out
statistics for each task.
… and decide how to tackle step 2 analysis
Step 1: Run workload and generate trace data: LISA::TestEnv() (connect
with target, using the platform.json platform description file) →
LISA::rtapp() (generate workload).
Step 2: Analyze trace data: LISA::Trace() → LISA::Filters() (filter out
tasks) → LISA::TraceAnalysis() (analyze tasks and events).
Analyze trace data for events
events_to_parse = [
    "sched_switch",
    "sched_wakeup",
    "sched_wakeup_new",
    "sched_contrib_scale_f",
    "sched_load_avg_cpu",
    "sched_load_avg_task",
    "sched_tune_config",
    "sched_tune_tasks_update",
    "sched_tune_boostgroup_update",
    "sched_tune_filter",
    "sched_boost_cpu",
    "sched_boost_task",
    "sched_energy_diff",
    "cpu_frequency",
    "cpu_capacity",
]
platform.json
{
"clusters": {
"big": [
4,
5,
6,
7
],
"little": [
0,
1,
2,
3
]
},
"cpus_count": 8,
"freqs": {
"big": [
208000,
432000,
729000,
960000,
1200000
],
"little": [
208000,
432000,
729000,
960000,
1200000
]
},
[...]
}
trace.dat format: “SYSTRACE” or “Ftrace”
Selecting only tasks of interest (big tasks)
top_big_tasks
{'mmcqd/0': 733,
'Task01' : 2441,
'task010': 2442,
'task011': 2443,
'task012': 2444,
'task015': 2447,
'task016': 2448,
'task017': 2449,
'task019': 2451,
'task020': 2452,
'task021': 2453,
'task022': 2454,
'task023': 2455,
'task024': 2456,
'task025': 2457}
Plot big tasks with TraceAnalysis
TraceAnalysis graph of task residency on CPUs
At the beginning the task is placed on a big core;
then it ping-pongs between big
cores and LITTLE cores
TraceAnalysis graph of task PELT signals
Big core’s highest capacity
LITTLE core’s highest capacity
Big core’s tipping point
LITTLE core’s tipping point
util_avg = PELT(running time)
load_avg
= PELT(running time + runnable time) * weight
= PELT(running time + runnable time) (if NICE = 0)
The difference
between load_avg
and util_avg is the
task’s runnable time
on the rq (for NICE=0)
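A minimal model of the PELT signals above, assuming ~1 ms periods, the kernel's 32-period half-life, and plain floats instead of the kernel's fixed-point arithmetic:

```python
HALF_LIFE_PERIODS = 32               # a contribution halves every 32 periods
Y = 0.5 ** (1.0 / HALF_LIFE_PERIODS)
SCALE = 1024                         # signals are normalized to [0, 1024]

def pelt(duty_cycles):
    """Geometric moving average of per-period duty cycles (1.0 = ran the
    whole period), normalized to SCALE. util_avg feeds in running time
    only; load_avg would feed in running + runnable time."""
    acc = 0.0
    for d in duty_cycles:
        acc = acc * Y + d * (1.0 - Y)
    return acc * SCALE

# A task that always runs converges toward the full scale ...
print(round(pelt([1.0] * 1000)))   # ~1024
# ... and a 40% duty-cycle task toward ~0.4 * 1024
print(round(pelt([0.4] * 1000)))   # ~410
```

This is why the load_avg/util_avg gap in the plot tracks queueing delay: runnable-but-not-running periods add to load_avg but not util_avg.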
static int select_task_rq_fair(struct task_struct *p, int
prev_cpu, int sd_flag, int wake_flags)
{
[...]
if (!sd) {
if (energy_aware() && !cpu_rq(cpu)->rd->overutilized)
new_cpu = energy_aware_wake_cpu(p, prev_cpu);
else if (sd_flag & SD_BALANCE_WAKE) /* XXX always ? */
new_cpu = select_idle_sibling(p, new_cpu);
} else while (sd) {
[...]
}
}
System crosses the tipping point and becomes “over-utilized”
static void
enqueue_task_fair(struct rq *rq, struct task_struct
*p, int flags)
{
[...]
if (!se) {
add_nr_running(rq, 1);
if (!task_new && !rq->rd->overutilized &&
cpu_overutilized(rq->cpu))
rq->rd->overutilized = true;
[...]
}
}
Over tipping point
EAS path
SMP load balance
static struct sched_group *find_busiest_group(struct lb_env
*env)
{
if (energy_aware() && !env->dst_rq->rd->overutilized)
goto out_balanced;
[...]
}
EAS path
SMP load balance
Write a function to analyze the tipping point
If the LISA toolkit does not include the
plotting function you need, you can
write a plot function yourself
Plot for tipping point
System is over tipping point, migrate task
from CPU3 (little core) to CPU4 (big core)
System is under tipping point, migrate task
from CPU4 (big core) to CPU3 (little core)
Detailed trace log for migration to big core
nohz_idle_balance() performs the task migration
Migrate big task from CPU3 to CPU4
Issue 1: migrating the big task back to a LITTLE core
Migrate task to LITTLE core
Migrate big task from CPU4 to CPU3
Issue 2: migrating small tasks to the big core
Migrate small tasks to big core
CPU is overutilized again
Tipping point criteria
Over tipping point: ANY CPU: cpu_util(cpu) > cpu_capacity(cpu) * 80%
E.g. a LITTLE core (cpu_capacity(cpu0) = 447) at 90% util, or a big core
(cpu_capacity(cpu4) = 1024) at 90% util, is over its 80%-of-capacity
threshold.
Under tipping point: ALL CPUs: cpu_util(cpu) < cpu_capacity(cpu) * 80%
E.g. a LITTLE core at 70% util and a big core at 70% util are both
under their 80%-of-capacity thresholds.
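The criteria can be written as two small predicates (a sketch using the slides' 80% figure; the kernel expresses the margin in fixed point rather than a floating-point multiply):

```python
MARGIN = 0.80  # the 80%-of-capacity threshold used in these slides

def cpu_overutilized(util, capacity):
    # A single CPU is over-utilized once util exceeds 80% of its capacity
    return util > capacity * MARGIN

def system_over_tipping_point(cpus):
    """cpus is a list of (util, capacity) pairs; ANY over-utilized CPU
    tips the whole system over the tipping point."""
    return any(cpu_overutilized(u, c) for u, c in cpus)

little, big = 447, 1024
# A LITTLE core at 90% util is over its 80% threshold ...
print(system_over_tipping_point([(0.9 * little, little), (0.3 * big, big)]))  # True
# ... while 70% util everywhere stays under the tipping point.
print(system_over_tipping_point([(0.7 * little, little), (0.7 * big, big)]))  # False
```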
Phenomenon for ping-pong issue
Over tipping point: Task1 migrates from the LITTLE cluster to the big
cluster. Under tipping point: Task1 migrates from the big cluster back
to the LITTLE cluster, and the cycle repeats.
Fixes for ping-pong issue
Over tipping point: Task1 still migrates to the big cluster, but small
tasks are filtered out to avoid migrating them to the big core. Under
tipping point: avoid migrating the big task back to the LITTLE cluster.
Fallback to LITTLE cluster after it is idle
Under the tipping point, migrate the big task back to the LITTLE
cluster once the LITTLE cluster is idle.
Filter out small tasks for (tick, idle) load balance
static
int can_migrate_task(struct task_struct *p, struct lb_env *env)
{
[...]
if (energy_aware() &&
(capacity_orig_of(env->dst_cpu) > capacity_orig_of(env->src_cpu))) {
if (task_util(p) * 4 < capacity_orig_of(env->src_cpu))
return 0;
}
[...]
}
Filter out small tasks: task utilization < ¼ of the LITTLE CPU capacity.
These tasks will NOT be migrated to the big core (return 0).
Result: only big tasks have a chance to migrate to the big core.
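The ¼-capacity rule from can_migrate_task() in python form, with the util values from the earlier test case (a sketch to make the threshold concrete):

```python
def is_small_task(task_util, src_capacity):
    # "small" = utilization below a quarter of the source CPU's capacity,
    # i.e. task_util * 4 < capacity, mirroring the kernel check above
    return task_util * 4 < src_capacity

LITTLE_CAPACITY = 447
print(is_small_task(67, LITTLE_CAPACITY))    # True: the 15% tasks stay put
print(is_small_task(180, LITTLE_CAPACITY))   # False: the 40% task may migrate
```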
Avoid migrating big task to LITTLE cluster
static bool need_spread_task(int cpu)
{
struct sched_domain *sd;
int spread = 0, i;
if (cpu_rq(cpu)->rd->overutilized)
return 1;
sd = rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd);
if (!sd)
return 0;
for_each_cpu(i, sched_domain_span(sd)) {
if (cpu_rq(i)->cfs.h_nr_running >= 1 &&
cpu_halfutilized(i)) {
spread = 1;
break;
}
}
return spread;
}
static int select_task_rq_fair(struct task_struct *p, int prev_cpu,
int sd_flag, int wake_flags)
{
[...]
if (!sd) {
if (energy_aware() &&
(!need_spread_task(cpu) || need_filter_task(p)))
new_cpu = energy_aware_wake_cpu(p, prev_cpu);
else if (sd_flag & SD_BALANCE_WAKE) /* XXX always ? */
new_cpu = select_idle_sibling(p, new_cpu);
} else while (sd) {
[...]
}
}
Check whether the cluster is busy as well as
checking the system tipping point:
● Spread tasks within the cluster
if the cluster is busy
● Fall back to migrating the big task when
the cluster is idle
static bool need_filter_task(struct task_struct *p)
{
int cpu = task_cpu(p);
int origin_max_cap = capacity_orig_of(cpu);
int target_max_cap = cpu_rq(cpu)->rd->max_cpu_capacity.val;
struct sched_domain *sd;
struct sched_group *sg;
sd = rcu_dereference(per_cpu(sd_ea, cpu));
sg = sd->groups;
do {
int first_cpu = group_first_cpu(sg);
if (capacity_orig_of(first_cpu) < target_max_cap &&
task_util(p) * 4 < capacity_orig_of(first_cpu))
target_max_cap = capacity_orig_of(first_cpu);
} while (sg = sg->next, sg != sd->groups);
if (target_max_cap < origin_max_cap)
return 1;
return 0;
}
Filter out small tasks for wake-up balance
This function has two purposes:
● Select small tasks
(task utilization < ¼ of the
LITTLE CPU capacity)
and keep them on the
energy-aware path
● Prevent the energy-aware
path for big tasks on
the big core from
doing harm to little
tasks
Results after applying patches
The big task always
runs on CPU6 and the
small tasks run on
LITTLE cores!
Testing environment
● Test setup
○ The LITTLE core’s highest capacity is 447@850MHz
○ The big core’s highest capacity is 1024@1.1GHz
○ Single small task is running with 9% utilization of big CPU (util ~= 95)
● Phenomenon
○ The single small task runs on the big CPU for a long time, even though its
utilization is well below the tipping point
Global view of task placement
The small task runs on a big core for
about 3s; during this period the
system is not busy
Analyze task utilization
Filter only related tasks
Analyze task’s utilization signal
PELT Signals for task utilization
The task utilization is normalized to ~95 on the
big core. This does not exceed the
LITTLE core’s tipping point of 447 * 80% ≈ 358,
so the LITTLE core can meet the task’s
capacity requirement and the scheduler should
place this task on a LITTLE core.
Use kernelshark to check wake up path
On the energy-aware path we would expect to see
“sched_boost_task”, but in this case the event is missing,
implying the scheduler performed normal load balancing
because the “overutilized” flag is set. The balancer therefore
selects an idle CPU in the lowest schedule domain; if the
previous CPU is idle the task sticks to it so it
can benefit from a “hot cache”.
The “tipping point” flag has been set for a long time
static inline void update_sg_lb_stats(struct lb_env *env,
struct sched_group *group, int load_idx,
int local_group, struct sg_lb_stats *sgs,
bool *overload, bool *overutilized)
{
unsigned long load;
int i, nr_running;
memset(sgs, 0, sizeof(*sgs));
for_each_cpu_and(i, sched_group_cpus(group), env->cpus) {
[...]
if (cpu_overutilized(i)) {
*overutilized = true;
if (!sgs->group_misfit_task && rq->misfit_task)
sgs->group_misfit_task = capacity_of(i);
}
[...]
}
}
*overutilized is initialized
as ‘false’ before we
commence the update,
so if any CPU is
over-utilized, then this is
enough to keep us over
the tipping-point.
So we need to analyze the
load of every CPU.
Plot for CPU utilization and idle state
CPU utilization does not update during idle
CPU utilization is only updated when the CPU
wakes up after a long time
Fix: ignore the overutilized state for idle CPUs
static inline void update_sg_lb_stats(struct lb_env *env,
struct sched_group *group, int load_idx,
int local_group, struct sg_lb_stats *sgs,
bool *overload, bool *overutilized)
{
unsigned long load;
int i, nr_running;
memset(sgs, 0, sizeof(*sgs));
for_each_cpu_and(i, sched_group_cpus(group), env->cpus) {
[...]
if (cpu_overutilized(i) && !idle_cpu(i)) {
*overutilized = true;
if (!sgs->group_misfit_task && rq->misfit_task)
sgs->group_misfit_task = capacity_of(i);
}
[...]
}
}
Code flow is altered so
we only consider the
overutilized state for
non-idle CPUs
After applying the patch to fix this...
Related materials
● Notebooks and related materials for both worked examples
○ https://fileserver.linaro.org/owncloud/index.php/s/5gpVpzN0FdxMmGl
○ ipython notebooks for workload generation and analysis
○ Trace data before and after fixing together with platform.json
● Patches are under discussion on eas-dev mailing list
○ sched/fair: support to spread task in lowest schedule domain
○ sched/fair: avoid small task to migrate to higher capacity CPU
○ sched/fair: filter task for energy aware path
○ sched/fair: consider over utilized only for CPU is not idle
Next steps
● You can debug the scheduler
○ Try to focus on decision making, not hacks
○ New decisions should be as generic as possible (ideally based on normalized units)
○ Sharing resulting patches for review is highly recommended
■ Perhaps the fix can be improved, or someone else has already expressed
it differently
● Understanding tracepoint patches and the tooling from ARM
○ Basic python coding experience is needed to utilize LISA libraries
● Understanding SchedTune
○ SchedTune boosts task utilization levels to bias CPU selection, and CPU
utilization levels to bias OPP selection decisions
○ Evaluate energy-performance trade-off
○ Without tools, it’s hard to define and debug the SchedTune boost margin on a
specific platform
Thank You
#LAS16
For further information: www.linaro.org or support@linaro.org
LAS16 keynotes and videos on: connect.linaro.org
Presentation_Parallel GRASP algorithm for job shop schedulingPresentation_Parallel GRASP algorithm for job shop scheduling
Presentation_Parallel GRASP algorithm for job shop scheduling
 
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Be...
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark  - Demi Be...S3, Cassandra or Outer Space? Dumping Time Series Data using Spark  - Demi Be...
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Be...
 
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric EnvironmentsScheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
 

More from Linaro

Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloDeep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Linaro
 
HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018
Linaro
 
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Linaro
 
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Linaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
Linaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
Linaro
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
Linaro
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMU
Linaro
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation
Linaro
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted boot
Linaro
 

More from Linaro (20)

Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloDeep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
 
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta VekariaArm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
 
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua MoraHuawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
 
Bud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qaBud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qa
 
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
 
HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018
 
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
 
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
 
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
 
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteHKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening Keynote
 
HKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopHKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP Workshop
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allHKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMU
 
HKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MHKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8M
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted boot
 

Recently uploaded

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 

LAS16-TR04: Using tracing to tune and optimize EAS (English)

  • 1. Using tracing to tune and optimize EAS Leo Yan & Daniel Thompson Linaro Support and Solutions Engineering
  • 2. ENGINEERS AND DEVICES WORKING TOGETHER Agenda ● Background ○ Review of typical workflow for GTS tuning ○ Introduce a workflow for EAS tuning ○ Quick introduction of the tools that support the new workflow ● Worked examples ○ Development platform for the worked examples ○ Task ping-pong issue ○ Small task staying on big core ● Further reading
  • 3. ENGINEERS AND DEVICES WORKING TOGETHER Typical workflow for optimizing GTS This simple workflow is easy to understand but has problems in practice. Tunables are complex and interact with each other (making it hard to decide which tunable to adjust). Tuning for multiple use-cases is difficult. Tuning is SoC specific; optimizations will not necessarily apply to other SoCs. Adjust tunables Benchmark
  • 4. ENGINEERS AND DEVICES WORKING TOGETHER GTS tunables GTS up_threshold down_threshold packing_enable load_avg_period_ms frequency_invariant_load_scale CPUFreq Interactive Governor hispeed_freq go_hispeed_load target_loads timer_rate min_sample_time above_hispeed_delay
  • 5. ENGINEERS AND DEVICES WORKING TOGETHER Agenda ● Background ○ Review of typical workflow for GTS tuning ○ Introduce a workflow for EAS tuning ○ Quick introduction of the tools that support the new workflow ● Worked examples ○ Development platform for the worked examples ○ Task ping-pong issue ○ Small task staying on big core ● Further reading
  • 6. ENGINEERS AND DEVICES WORKING TOGETHER Typical workflow for optimizing EAS systems Trace a use-case Benchmark Examine traces Improve decisions Workflow is knowledge intensive. Decisions can be improved by improving the power model or by finding new opportunities in the scheduler (a.k.a. debugging). Optimizations are more portable. ● Can be shared for review ● Likely to benefit your new SoC
  • 7. ENGINEERS AND DEVICES WORKING TOGETHER Trace points for EAS The kernel has a set of stock trace points for diving into debugging. Additional trace points are added by patches marked “DEBUG”; these are not posted to LKML and are currently only found in product-focused patchsets. Enable the kernel config option: CONFIG_FTRACE
  • 8. ENGINEERS AND DEVICES WORKING TOGETHER Trace points for EAS - cont. sched_contrib_scale_f sched_load_avg_task sched_load_avg_cpu PELT signals EAS core SchedTune sched_switch sched_migrate_task sched_wakeup sched_wakeup_new Scheduler default events SchedFreq LISA can be easily extended to support these trace points cpufreq_sched_throttled cpufreq_sched_request_opp cpufreq_sched_update_capacity sched_energy_diff sched_overutilized sched_tune_config sched_boost_cpu sched_tune_tasks_update sched_tune_boostgroup_update sched_boost_task sched_tune_filter Tracepoints in mainline kernel Tracepoints for EAS extension E.g. enable trace points: trace-cmd start -e sched_energy_diff -e sched_wakeup
  • 9. ENGINEERS AND DEVICES WORKING TOGETHER Summary
Feature                  | EAS                                                 | GTS
Decision-making strategy | Power modeling                                      | Heuristic thresholds
Frequency selection      | sched-freq or schedutil, integrated with scheduler  | Governor’s cascaded parameters
Scenario-based tuning    | SchedTune (cgroup)                                  | None
Energy aware scheduling (EAS) has very few tunables and thus requires a significantly different approach to tuning and optimization when compared to global task scheduling (GTS).
  • 10. ENGINEERS AND DEVICES WORKING TOGETHER Agenda ● Background ○ Review of typical workflow for GTS tuning ○ Introduce a workflow for EAS tuning ○ Quick introduction of the tools that support the new workflow ● Worked examples ○ Development platform for the worked examples ○ Task ping-pong issue ○ Small task staying on big core ● Further reading
  • 11. ENGINEERS AND DEVICES WORKING TOGETHER LISA - interactive analysis and testing ● “Distro” of python libraries for interactive analysis and automatic testing ● Library support includes ○ Target control and manipulation (set cpufreq mode, run this workload, initiate trace) ○ Gather power measurement data and calculate energy ○ Analyze and graph trace results ○ Test assertions about the trace results (e.g. big CPU does not run more than 20ms) ● Interactive analysis using ipython and jupyter ○ Provides a notebook framework similar to Maple, Mathematica or Sage ○ Notebooks mix together documentation with executable code fragments ○ Notebooks record the output of an interactive session ○ All permanent file storage is on the host ○ Trace files and graphs can be reexamined in the future without starting the target ● Automatic testing ○ Notebooks containing assertion based tests that can be converted to normal python
  • 12. ENGINEERS AND DEVICES WORKING TOGETHER General workflow for LISA http://events.linuxfoundation.org/sites/events/files/slides/ELC16_LISA_20160326.pdf
  • 13. ENGINEERS AND DEVICES WORKING TOGETHER LISA interactive test mode http://127.0.0.1:8888 with ipython file Menu & control buttons Markdown (headers) Execute box with python programming Result box; the results of experiments are recorded and shown the next time this file is reopened
  • 14. ENGINEERS AND DEVICES WORKING TOGETHER kernelshark Task scheduling Filters for events, tasks, and CPUs Details for events
  • 15. ENGINEERS AND DEVICES WORKING TOGETHER Agenda ● Background ○ Review of typical workflow for GTS tuning ○ Introduce a workflow for EAS tuning ○ Quick introduction of the tools that support the new workflow ● Worked examples ○ Development platform for the worked examples ○ Task ping-pong issue ○ Small task staying on big core ● Further reading
  • 16. ENGINEERS AND DEVICES WORKING TOGETHER Development platform for the worked examples ● All examples use artificial workloads to provoke a specific behaviour ○ It turned out to be quite difficult to deliberately provoke undesired behaviour! ● Examples are reproducible on 96Boards HiKey ○ Octo-A53 multi-cluster (2x4) SMP device with five OPPs per cluster ■ Not big.LITTLE, and not using a fast/slow silicon process ○ We are able to fake a fast/slow system by using asymmetric power modeling parameters and artificially reducing the running/runnable delta time for the “fast” CPUs so the metrics indicate that they have higher performance ● Most plots shown in these slides are copied from a LISA notebook ○ Notebooks and trace files have been shared for use after training
  • 17. ENGINEERS AND DEVICES WORKING TOGETHER Agenda ● Background ○ Review of typical workflow for GTS tuning ○ Introduce a workflow for EAS tuning ○ Quick introduction of the tools that support the new workflow ● Worked examples ○ Development platform for the worked examples ○ Task ping-pong issue ○ Small task staying on big core ● Further reading
  • 18. ENGINEERS AND DEVICES WORKING TOGETHER Testing environment ● CPU capacity info ○ The little core’s highest capacity is 447@850MHz ○ The big core’s highest capacity is 1024@1.1GHz ○ This case is running with correct power model parameters ● Test case ○ 16 small tasks are running with 15% utilization of little CPU (util ~= 67) ○ A single large task is running with 40% utilization of little CPU (util ~= 180)
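The utilization figures on this slide follow from simple arithmetic: a periodic task's steady-state PELT utilization is roughly its duty cycle times the capacity of the CPU it runs on. A minimal Python sketch of that arithmetic (the real PELT signal is a frequency-invariant geometric average, so this is only an approximation):

```python
# Approximate steady-state utilization for the test tasks above,
# assuming util ~= duty_cycle * capacity of the CPU the task runs on.

LITTLE_CAPACITY = 447   # little core's highest capacity @ 850MHz
BIG_CAPACITY = 1024     # big core's highest capacity @ 1.1GHz

def approx_util(duty_cycle, capacity):
    """Rough steady-state PELT utilization of a periodic task."""
    return round(duty_cycle * capacity)

small_task_util = approx_util(0.15, LITTLE_CAPACITY)  # 16 such tasks
big_task_util = approx_util(0.40, LITTLE_CAPACITY)    # the single large task

print(small_task_util, big_task_util)
```

This reproduces the slide's util ≈ 67 for the small tasks and util ≈ 180 for the large task.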
  • 19. ENGINEERS AND DEVICES WORKING TOGETHER General analysis steps LISA:: Trace() LISA:: Filters() Generate workload Filter out tasks LISA:: TraceAnalysis() Analyze tasks LISA:: TestEnv() Connect with target Analyze events LISA:: rtapp() Step 1: Run workload and generate trace data Step 2: Analyze trace data platform.json Platform description file
  • 20. ENGINEERS AND DEVICES WORKING TOGETHER Connect with target board Specify target board info for connection Calibration for CPUs Tools copied to target board Enable ftrace events Create connection
  • 21. ENGINEERS AND DEVICES WORKING TOGETHER Generate and execute workload Define workload Capture Ftrace data Execute workload Capture energy data
  • 22. ENGINEERS AND DEVICES WORKING TOGETHER Graph showing task placement in LISA Specify events to be extracted Specify time interval Display task placement graph
  • 23. ENGINEERS AND DEVICES WORKING TOGETHER First make a quick graph showing task placement... Too many tasks, need method to quickly filter out statistics for every task.
  • 24. ENGINEERS AND DEVICES WORKING TOGETHER … and decide how to tackle step 2 analysis LISA:: Trace() LISA:: Filters() Generate workload Filter out tasks LISA:: TraceAnalysis() Analyze tasks LISA:: TestEnv() Connect with target Analyze events LISA:: rtapp() Step 1: Run workload and generate trace data Step 2: Analyze trace data platform.json Platform description file
  • 25. ENGINEERS AND DEVICES WORKING TOGETHER Analyze trace data for events
events_to_parse = [
    "sched_switch", "sched_wakeup", "sched_wakeup_new",
    "sched_contrib_scale_f", "sched_load_avg_cpu", "sched_load_avg_task",
    "sched_tune_config", "sched_tune_tasks_update",
    "sched_tune_boostgroup_update", "sched_tune_filter",
    "sched_boost_cpu", "sched_boost_task", "sched_energy_diff",
    "cpu_frequency", "cpu_capacity"
]
platform.json:
{
    "clusters": { "big": [4, 5, 6, 7], "little": [0, 1, 2, 3] },
    "cpus_count": 8,
    "freqs": {
        "big": [208000, 432000, 729000, 960000, 1200000],
        "little": [208000, 432000, 729000, 960000, 1200000]
    },
    [...]
}
trace.dat format: “SYSTRACE” or “Ftrace”
  • 26. ENGINEERS AND DEVICES WORKING TOGETHER Selecting only task of interest (big tasks) top_big_tasks {'mmcqd/0': 733, 'Task01' : 2441, 'task010': 2442, 'task011': 2443, 'task012': 2444, 'task015': 2447, 'task016': 2448, 'task017': 2449, 'task019': 2451, 'task020': 2452, 'task021': 2453, 'task022': 2454, 'task023': 2455, 'task024': 2456, 'task025': 2457}
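LISA's filtering step boils down to selecting tasks whose utilization exceeds a threshold and mapping their names to PIDs, as in the top_big_tasks dict on this slide. A self-contained sketch of the idea in plain Python (the sample data and the 100-util threshold are illustrative; LISA derives them from the parsed trace):

```python
# Toy version of a "top big tasks" filter: keep tasks whose peak
# utilization in the trace exceeds a threshold, returning a
# name -> PID mapping like the top_big_tasks dict on this slide.

def top_big_tasks(task_samples, min_util=100, count=15):
    """task_samples: {name: (pid, peak_util)}; returns {name: pid}."""
    big = {name: pid
           for name, (pid, util) in task_samples.items()
           if util > min_util}
    # keep at most `count` entries, in name order
    return dict(sorted(big.items())[:count])

samples = {
    'Task01':  (2441, 180),  # the single large task
    'task010': (2442, 67),   # one of the 16 small tasks
    'mmcqd/0': (733, 120),   # IO worker that also ran big
}
result = top_big_tasks(samples)
print(result)
```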
  • 27. ENGINEERS AND DEVICES WORKING TOGETHER Plot big tasks with TraceAnalysis
  • 28. ENGINEERS AND DEVICES WORKING TOGETHER TraceAnalysis graph of task residency on CPUs At beginning task is placed on big core1 Then it ping-pongs between big cores and LITTLE cores 2
  • 29. ENGINEERS AND DEVICES WORKING TOGETHER TraceAnalysis graph of task PELT signals Big core’s highest capacity LITTLE core’s highest capacity Big core’s tipping point LITTLE core’s tipping point util_avg = PELT(running time) load_avg = PELT(running time + runnable time) * weight = PELT(running time + runnable time) (if NICE = 0) The difference between load_avg and util_avg is task’s runnable time on rq (for NICE=0)
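The util_avg and load_avg signals on this slide come from PELT's geometric decay. A simplified sketch of the accumulator, assuming 1 ms windows and the kernel's half-life of 32 ms (the real implementation uses 1024 µs segments and fixed-point arithmetic):

```python
# Minimal PELT-style accumulator: each 1ms window decays the old sum
# by y (chosen so y**32 == 0.5, i.e. a contribution halves every
# 32ms) and adds a full contribution if the task ran in that window.

Y = 0.5 ** (1 / 32.0)   # per-window decay factor
SCALE = 1024            # contribution of a fully-busy window

def pelt_util(windows):
    """windows: iterable of booleans, True if the task ran that 1ms."""
    util = 0.0
    for running in windows:
        util = util * Y + (SCALE * (1 - Y) if running else 0.0)
    return util

# A task alternating 1ms running / 1ms sleeping converges near 50%
# of SCALE, matching the intuition that util tracks running time.
util = pelt_util([True, False] * 1000)
print(round(util))
```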
  • 30. ENGINEERS AND DEVICES WORKING TOGETHER
static int select_task_rq_fair(struct task_struct *p, int prev_cpu,
			       int sd_flag, int wake_flags)
{
	[...]
	if (!sd) {
		if (energy_aware() && !cpu_rq(cpu)->rd->overutilized)	/* EAS path */
			new_cpu = energy_aware_wake_cpu(p, prev_cpu);
		else if (sd_flag & SD_BALANCE_WAKE) /* XXX always ? */	/* SMP load balance */
			new_cpu = select_idle_sibling(p, new_cpu);
	} else while (sd) {
		[...]
	}
}

System crosses the tipping point and becomes “over-utilized”:
static void enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
{
	[...]
	if (!se) {
		add_nr_running(rq, 1);
		if (!task_new && !rq->rd->overutilized &&
		    cpu_overutilized(rq->cpu))
			rq->rd->overutilized = true;	/* over tipping point */
		[...]
	}
}

static struct sched_group *find_busiest_group(struct lb_env *env)
{
	if (energy_aware() && !env->dst_rq->rd->overutilized)	/* EAS path */
		goto out_balanced;			/* otherwise: SMP load balance */
	[...]
}
  • 31. ENGINEERS AND DEVICES WORKING TOGETHER Write a function to analyze the tipping point If the LISA toolkit does not include the plotting function you need, you can write a plot function yourself
  • 32. ENGINEERS AND DEVICES WORKING TOGETHER Plot for tipping point System is over tipping point, migrate task from CPU3 (little core) to CPU4 (big core) System is under tipping point, migrate task from CPU4 (big core) to CPU3 (little core)
  • 33. ENGINEERS AND DEVICES WORKING TOGETHER Detailed trace log for migration to big core nohz_idle_balance() for tasks migration Migrate big task from CPU3 to CPU4
  • 34. ENGINEERS AND DEVICES WORKING TOGETHER Issue 1: migrating the big task back to a LITTLE core Migrate task to LITTLE core Migrate big task from CPU4 to CPU3
  • 35. ENGINEERS AND DEVICES WORKING TOGETHER Issue 2: migrating small tasks to a big core Migrate small tasks to big core CPU is overutilized again
  • 36. ENGINEERS AND DEVICES WORKING TOGETHER Tipping point criteria
Over tipping point: any CPU has cpu_util(cpu) > cpu_capacity(cpu) * 80%
    e.g. LITTLE core (cpu_capacity(cpu0) = 447) or big core (cpu_capacity(cpu4) = 1024) at util = 90% of capacity
Under tipping point: all CPUs have cpu_util(cpu) < cpu_capacity(cpu) * 80%
    e.g. LITTLE core (cpu_capacity(cpu0) = 447) or big core (cpu_capacity(cpu4) = 1024) at util = 70% of capacity
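The 80% criterion can be expressed the way EAS kernels of this era write it, with a fixed-point capacity margin of 1280/1024 (≈1.25, i.e. util must stay under capacity/1.25 = 80% of capacity). A sketch using the capacities from this slide:

```python
# cpu_overutilized() sketch: a CPU trips the tipping point when its
# utilization exceeds 80% of its original capacity. The kernel writes
# the comparison as capacity * 1024 < util * 1280 to avoid floating
# point; both forms are equivalent.

CAPACITY_MARGIN = 1280  # ~1.25 in 10-bit fixed point

def cpu_overutilized(util, capacity_orig):
    return capacity_orig * 1024 < util * CAPACITY_MARGIN

LITTLE, BIG = 447, 1024
print(cpu_overutilized(int(0.9 * LITTLE), LITTLE))  # 90% of capacity: over
print(cpu_overutilized(int(0.7 * BIG), BIG))        # 70% of capacity: under
```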
  • 37. ENGINEERS AND DEVICES WORKING TOGETHER Phenomenon for ping-pong issue LITTLE cluster Over tipping point Under tipping point big cluster LITTLE cluster big cluster Task1 Task1 Task1 Task1 Migration
  • 38. ENGINEERS AND DEVICES WORKING TOGETHER Fixes for ping-pong issue LITTLE cluster Over tipping point Under tipping point big cluster LITTLE cluster big cluster Task1 Task1 Task1 Migration Filter out small tasks to avoid migrating them to big cores Avoid migrating big task back to LITTLE cluster
  • 39. ENGINEERS AND DEVICES WORKING TOGETHER Fall back to LITTLE cluster after it is idle LITTLE cluster Over tipping point Under tipping point big cluster LITTLE cluster big cluster Task1 Task1 Task1 Migration Migrate big task back to LITTLE cluster if it’s idle Task1
  • 40. ENGINEERS AND DEVICES WORKING TOGETHER Filter out small tasks for (tick, idle) load balance
static int can_migrate_task(struct task_struct *p, struct lb_env *env)
{
	[...]
	if (energy_aware() &&
	    (capacity_orig_of(env->dst_cpu) > capacity_orig_of(env->src_cpu))) {
		if (task_util(p) * 4 < capacity_orig_of(env->src_cpu))
			return 0;
	}
	[...]
}
Filter out small tasks: task utilization < ¼ of the source (LITTLE) CPU’s capacity. These tasks will NOT be migrated to a big core after return 0. Result: only big tasks have a chance to migrate to a big core.
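The patch's ¼-capacity filter reduces to a single comparison. A Python sketch with this platform's capacities (the function name is mine; the logic mirrors the can_migrate_task() hunk above):

```python
# Mirror of the small-task filter above: when migrating towards a
# higher-capacity CPU, block the move if the task's utilization is
# less than a quarter of the source CPU's original capacity.

def can_migrate_up(task_util, src_capacity, dst_capacity):
    if dst_capacity > src_capacity:       # little -> big migration
        if task_util * 4 < src_capacity:  # small task: filter it out
            return False
    return True

LITTLE, BIG = 447, 1024
print(can_migrate_up(67, LITTLE, BIG))    # small task (util 67): blocked
print(can_migrate_up(180, LITTLE, BIG))   # big task (util 180): allowed
```

With the worked example's numbers, the 16 small tasks (util ≈ 67, and 67 × 4 < 447) stay on the LITTLE cluster, while the large task (180 × 4 > 447) may still move up.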
  • 41. ENGINEERS AND DEVICES WORKING TOGETHER Avoid migrating big task to LITTLE cluster
static bool need_spread_task(int cpu)
{
	struct sched_domain *sd;
	int spread = 0, i;

	if (cpu_rq(cpu)->rd->overutilized)
		return 1;

	sd = rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd);
	if (!sd)
		return 0;

	for_each_cpu(i, sched_domain_span(sd)) {
		if (cpu_rq(i)->cfs.h_nr_running >= 1 && cpu_halfutilized(i)) {
			spread = 1;
			break;
		}
	}

	return spread;
}

static int select_task_rq_fair(struct task_struct *p, int prev_cpu,
			       int sd_flag, int wake_flags)
{
	[...]
	if (!sd) {
		if (energy_aware() &&
		    (!need_spread_task(cpu) || need_filter_task(p)))
			new_cpu = energy_aware_wake_cpu(p, prev_cpu);
		else if (sd_flag & SD_BALANCE_WAKE) /* XXX always ? */
			new_cpu = select_idle_sibling(p, new_cpu);
	} else while (sd) {
		[...]
	}
}

Check whether the cluster is busy as well as checking the system tipping point:
● Easier to spread tasks within the cluster if the cluster is busy
● Fall back to migrating the big task when the cluster is idle
  • 42. ENGINEERS AND DEVICES WORKING TOGETHER Filter out small tasks for wake up balance
static bool need_filter_task(struct task_struct *p)
{
	int cpu = task_cpu(p);
	int origin_max_cap = capacity_orig_of(cpu);
	int target_max_cap = cpu_rq(cpu)->rd->max_cpu_capacity.val;
	struct sched_domain *sd;
	struct sched_group *sg;

	sd = rcu_dereference(per_cpu(sd_ea, cpu));
	sg = sd->groups;

	do {
		int first_cpu = group_first_cpu(sg);

		if (capacity_orig_of(first_cpu) < target_max_cap &&
		    task_util(p) * 4 < capacity_orig_of(first_cpu))
			target_max_cap = capacity_orig_of(first_cpu);
	} while (sg = sg->next, sg != sd->groups);

	if (target_max_cap < origin_max_cap)
		return 1;

	return 0;
}

Two purposes of this function:
● Select small tasks (task utilization < ¼ of LITTLE CPU capacity) and keep them on the energy aware path
● Prevent the energy aware path for big tasks on the big core from doing harm to little tasks
  • 43. ENGINEERS AND DEVICES WORKING TOGETHER Results after applying patches The big task always runs on CPU6 and the small tasks run on the LITTLE cores!
  • 44. ENGINEERS AND DEVICES WORKING TOGETHER Agenda ● Background ○ Review of typical workflow for GTS tuning ○ Introduce a workflow for EAS tuning ○ Quick introduction of the tools that support the new workflow ● Worked examples ○ Development platform for the worked examples ○ Task ping-pong issue ○ Small task staying on big core ● Further reading
  • 45. ENGINEERS AND DEVICES WORKING TOGETHER Testing environment ● Testing environment ○ The LITTLE core’s highest capacity is 447@850MHz ○ The big core’s highest capacity is 1024@1.1GHz ○ A single small task is running at 9% utilization of the big CPU (util ≈ 95) ● Phenomenon ○ The single small task runs on the big CPU for a long time, even though its utilization is well below the tipping point
  • 46. ENGINEERS AND DEVICES WORKING TOGETHER Global View of Task Placement The small task runs on the big core for about 3s; during this period the system is not busy
  • 47. ENGINEERS AND DEVICES WORKING TOGETHER Analyze task utilization Filter only the related tasks Analyze the task’s utilization signal
  • 48. ENGINEERS AND DEVICES WORKING TOGETHER PELT signals for task utilization The task utilization is normalized to ~95 on the big core. This does not exceed the LITTLE core’s tipping point of 447 * 80% = 358, so the LITTLE core can meet the task’s capacity requirement and the scheduler should place this task on a LITTLE core.
  • 49. ENGINEERS AND DEVICES WORKING TOGETHER Use kernelshark to check the wake up path In the energy aware path we would expect to see “sched_boost_task”, but in this case the event is missing, implying the scheduler performed normal load balancing because the “overutilized” flag is set. Thus the balancer runs to select an idle CPU in the lowest scheduling domain. If the previous CPU is idle the task sticks to it so it can benefit from a “hot cache”.
  • 50. ENGINEERS AND DEVICES WORKING TOGETHER The “tipping point” has been set for a long time

static inline void update_sg_lb_stats(struct lb_env *env,
                        struct sched_group *group, int load_idx,
                        int local_group, struct sg_lb_stats *sgs,
                        bool *overload, bool *overutilized)
{
        unsigned long load;
        int i, nr_running;

        memset(sgs, 0, sizeof(*sgs));

        for_each_cpu_and(i, sched_group_cpus(group), env->cpus) {
                [...]
                if (cpu_overutilized(i)) {
                        *overutilized = true;
                        if (!sgs->group_misfit_task && rq->misfit_task)
                                sgs->group_misfit_task = capacity_of(i);
                }
                [...]
        }
}

*overutilized is initialized to ‘false’ before we commence the update, so if any single CPU is over-utilized, that is enough to keep us over the tipping point. So we need to analyze the load of every CPU.
  • 51. ENGINEERS AND DEVICES WORKING TOGETHER Plot for CPU utilization and idle state
  • 52. ENGINEERS AND DEVICES WORKING TOGETHER CPU utilization does not update during idle CPU utilization is only updated when the CPU wakes up after a long idle period
  • 53. ENGINEERS AND DEVICES WORKING TOGETHER Fix method: ignore overutilized state for idle CPUs

static inline void update_sg_lb_stats(struct lb_env *env,
                        struct sched_group *group, int load_idx,
                        int local_group, struct sg_lb_stats *sgs,
                        bool *overload, bool *overutilized)
{
        unsigned long load;
        int i, nr_running;

        memset(sgs, 0, sizeof(*sgs));

        for_each_cpu_and(i, sched_group_cpus(group), env->cpus) {
                [...]
                if (cpu_overutilized(i) && !idle_cpu(i)) {
                        *overutilized = true;
                        if (!sgs->group_misfit_task && rq->misfit_task)
                                sgs->group_misfit_task = capacity_of(i);
                }
                [...]
        }
}

The code flow is altered so that we only consider the overutilized state of non-idle CPUs
  • 54. ENGINEERS AND DEVICES WORKING TOGETHER After applying the patch to fix this...
  • 55. ENGINEERS AND DEVICES WORKING TOGETHER Agenda ● Background ○ Review of typical workflow for GTS tuning ○ Introduce a workflow for EAS tuning ○ Quick introduction of the tools that support the new workflow ● Worked examples ○ Development platform for the worked examples ○ Task ping-pong issue ○ Small task staying on big core ● Further reading
  • 56. ENGINEERS AND DEVICES WORKING TOGETHER Related materials ● Notebooks and related materials for both worked examples ○ https://fileserver.linaro.org/owncloud/index.php/s/5gpVpzN0FdxMmGl ○ ipython notebooks for workload generation and analysis ○ Trace data before and after fixing together with platform.json ● Patches are under discussion on eas-dev mailing list ○ sched/fair: support to spread task in lowest schedule domain ○ sched/fair: avoid small task to migrate to higher capacity CPU ○ sched/fair: filter task for energy aware path ○ sched/fair: consider over utilized only for CPU is not idle
  • 57. ENGINEERS AND DEVICES WORKING TOGETHER Next steps ● You can debug the scheduler ○ Try to focus on decision making, not hacks ○ New decisions should be as generic as possible (ideally based on normalized units) ○ Sharing the resulting patches for review is highly recommended ■ Perhaps the fix can be improved, or has already been expressed differently by someone else ● Understand the tracepoint patches and the tooling from ARM ○ Basic Python coding experience is needed to utilize the LISA libraries ● Understand SchedTune ○ SchedTune boosts task utilization levels (biasing CPU selection) and CPU utilization levels (biasing OPP selection) ○ Evaluate the energy-performance trade-off ○ Without tools, it’s hard to define and debug the SchedTune boost margin on a specific platform
  • 58. Thank You #LAS16 For further information: www.linaro.org or support@linaro.org LAS16 keynotes and videos on: connect.linaro.org