SlideShare a Scribd company logo
1 of 35
Download to read offline
Ineffective and effective way to find out
latency bottlenecks by Ftrace
16 Feb, 2012
Yoshitake Kobayashi
Advanced Software Technology Group
Corporate Software Engineering Center
TOSHIBA CORPORATION
Copyright 2012, Toshiba Corporation.
Agenda
About Ftrace
Problem definition
Some actual examples to fix latency issue
Conclusion
2Yoshitake Kobayashi - Embedded Linux Conference 2012 -
About Ftrace
Ftrace is a lightweight internal tracer
Event tracer
Function tracer
Latency tracer
Stack tracer
The trace log stay in the kernel ring buffer
3Yoshitake Kobayashi - Embedded Linux Conference 2012 -
The trace log stay in the kernel ring buffer
Documentation available in kernel source tree
Documentation/trace/ftrace.txt
Documentation/trace/ftrace-design.txt
About Ftrace
Ftrace is a lightweight internal tracer
Event tracer
Function tracer
Latency tracer
Stack tracer
The trace log stay in the kernel ring buffer
4Yoshitake Kobayashi - Embedded Linux Conference 2012 -
The trace log stay in the kernel ring buffer
Documentation available in kernel source tree
Documentation/trace/ftrace.txt
Documentation/trace/ftrace-design.txt
Definition of latency
Latency is a measure of time delay experienced in a
system, the precise definition of which depends on the
system and the time being measured. Latencies may have
different meaning in different contexts.
* http://en.wikipedia.org/wiki/Latency_(engineering)
In this presentation
5Yoshitake Kobayashi - Embedded Linux Conference 2012 -
time
expected time
to wake up
actual time
to wake up
latency
Problem definition
Evaluate the scheduling latency by Cyclictest*…. but
0
10000
20000
30000
40000
50000
60000
70000
80000
90000
100000
0 200 400 600 800 1000
latency(μ秒)
回数
Latency (us)
Numberofsamples
23ms
Why?
6Yoshitake Kobayashi - Embedded Linux Conference 2012 -
0
100000
200000
300000
400000
500000
600000
700000
800000
900000
1000000
0 100 200 300 400 500 600 700 800 900 1000
latency(μ秒)
回数
latency (us)
Numberofsamples
Why?
*Cyclictest (RTwiki): https://rt.wiki.kernel.org/articles/c/y/c/Cyclictest.html
Problem definition
Evaluate the scheduling latency by Cyclictest*…. but
0
10000
20000
30000
40000
50000
60000
70000
80000
90000
100000
0 200 400 600 800 1000
latency(μ秒)
回数
Latency (us)
Numberofsamples
23ms
Why?
7Yoshitake Kobayashi - Embedded Linux Conference 2012 -
Need to identify the cause of above problems
0
100000
200000
300000
400000
500000
600000
700000
800000
900000
1000000
0 100 200 300 400 500 600 700 800 900 1000
latency(μ秒)
回数
latency (us)
Numberofsamples
Why?
*Cyclictest (RTwiki): https://rt.wiki.kernel.org/articles/c/y/c/Cyclictest.html
Problem definition
Evaluate the scheduling latency by Cyclictest*…. but
0
10000
20000
30000
40000
50000
60000
70000
80000
90000
100000
0 200 400 600 800 1000
latency(μ秒)
回数
Latency (us)
Numberofsamples
23ms
Why?
Case 1
Case 2
8Yoshitake Kobayashi - Embedded Linux Conference 2012 -
Need to identify the cause of above issues
0
100000
200000
300000
400000
500000
600000
700000
800000
900000
1000000
0 100 200 300 400 500 600 700 800 900 1000
latency(μ秒)
回数
latency (us)
Numberofsamples
Why?
Case 2
*Cyclictest (RTwiki): https://rt.wiki.kernel.org/articles/c/y/c/Cyclictest.html
First mistake
Incorrect use of trace-cmd
# trace-cmd start –p function_graph ; target_prog ; trace-cmd stop
The target_prog start with higher priority and stop if it missed
9Yoshitake Kobayashi - Embedded Linux Conference 2012 -
deadline
First mistake
Incorrect use of trace-cmd
# iostress ‘50%’ &
# trace-cmd start –p function_graph ; target_prog ; trace-cmd stop
The target_prog start with higher priority and stop if it missed
SCHED_OTHER SCHED_FIFO SCHED_OTHER
10Yoshitake Kobayashi - Embedded Linux Conference 2012 -
deadline
Ftrace record the logs in the kernel ring buffer
The older data just overwritten by new data
Evaluate with too much workload is not a good idea
Need to care about scheduling priority
Run a shell with higher priority
Stop logging in the target_prog
An example for stop tracing (Cyclictest)
rt-tests/src/cyclictest/cyclictest.c
Cyclictest has following option
“-b USEC” : send break trace command when latency > USEC
if (!stopped && tracelimit && (diff > tracelimit)) {
stopped++;
tracing(0);
shutdown++;
11Yoshitake Kobayashi - Embedded Linux Conference 2012 -
shutdown++;
pthread_mutex_lock(&break_thread_id_lock);
if (break_thread_id == 0)
break_thread_id = stat->tid;
break_thread_value = diff;
pthread_mutex_unlock(&break_thread_id_lock);
}
Case1: Evaluation environment
CPU: Intel Pentium 4(2.66GHz)
Memory: 512MB
Kernel: Linux 2.6.31.12-rt21
echo -1 > /proc/sys/kernel/sched_rt_runtime_us
12Yoshitake Kobayashi - Embedded Linux Conference 2012 -
Case1: Summary of evaluation result
Evaluated without other CPU work load
Cycle: 300us
0
10000
20000
30000
40000
50000
60000
70000
80000
90000
100000
0 200 400 600 800 1000
latency(μ秒)
回数
0
10000
20000
30000
40000
50000
60000
70000
80000
90000
100000
0 200 400 600 800 1000
latency(μ秒)
回数
0
10000
20000
30000
40000
50000
60000
70000
80000
90000
100000
0 200 400 600 800 1000
latency(μ秒)
回数
Cycle: 500us Cycle: 1000us
Latency (us)
Numberofsamples
Latency (us)
Numberofsamples
Latency (us)
Numberofsamples 13Yoshitake Kobayashi - Embedded Linux Conference 2012 -
Cycle
[us]
Min.
Ave. Max Number of DL
misses
300 18.118 19.162 25.629 0
500 18.171 19.269 32.615 0
1000 17.935 19.361 27207.563 1
Case1: Check the trace log (First attempt)
Stop the Cyclictest if the evaluated latency is higher than
the given maximum
Backward search for sys_clock_gettime() function
Check the log for sys_rt_sigtimedwait() function
5586.207717 | 0) |5586.207717 | 0) |5586.207717 | 0) |5586.207717 | 0) | activate_task() {activate_task() {activate_task() {activate_task() {
5586.207718 | 0) | enqueue_task() {5586.207718 | 0) | enqueue_task() {5586.207718 | 0) | enqueue_task() {5586.207718 | 0) | enqueue_task() {
5586.207718 | 0) | enqueue_task_rt() {5586.207718 | 0) | enqueue_task_rt() {5586.207718 | 0) | enqueue_task_rt() {5586.207718 | 0) | enqueue_task_rt() {
5586.207718 | 0) | cpupri_set() {5586.207718 | 0) | cpupri_set() {5586.207718 | 0) | cpupri_set() {5586.207718 | 0) | cpupri_set() {
14Yoshitake Kobayashi - Embedded Linux Conference 2012 -
5586.207719 | 0) 0.407 us | _atomic_spin_lock_irqsave();5586.207719 | 0) 0.407 us | _atomic_spin_lock_irqsave();5586.207719 | 0) 0.407 us | _atomic_spin_lock_irqsave();5586.207719 | 0) 0.407 us | _atomic_spin_lock_irqsave();
5586.207720 | 0) 0.448 us | _atomic_spin_unlock_irqrestore();5586.207720 | 0) 0.448 us | _atomic_spin_unlock_irqrestore();5586.207720 | 0) 0.448 us | _atomic_spin_unlock_irqrestore();5586.207720 | 0) 0.448 us | _atomic_spin_unlock_irqrestore();
5586.207721 | 0) 0.430 us | _atomic_spin_lock_irqsave();5586.207721 | 0) 0.430 us | _atomic_spin_lock_irqsave();5586.207721 | 0) 0.430 us | _atomic_spin_lock_irqsave();5586.207721 | 0) 0.430 us | _atomic_spin_lock_irqsave();
5586.207722 | 0) 0.444 us | _atomic_spin_unlock_irqrestore();5586.207722 | 0) 0.444 us | _atomic_spin_unlock_irqrestore();5586.207722 | 0) 0.444 us | _atomic_spin_unlock_irqrestore();5586.207722 | 0) 0.444 us | _atomic_spin_unlock_irqrestore();
5586.207723 | 0) 4.391 us | }5586.207723 | 0) 4.391 us | }5586.207723 | 0) 4.391 us | }5586.207723 | 0) 4.391 us | }
5586.207723 | 0) 0.394 us | update_rt_migration();5586.207723 | 0) 0.394 us | update_rt_migration();5586.207723 | 0) 0.394 us | update_rt_migration();5586.207723 | 0) 0.394 us | update_rt_migration();
5586.207724 | 0) 6.158 us | }5586.207724 | 0) 6.158 us | }5586.207724 | 0) 6.158 us | }5586.207724 | 0) 6.158 us | }
5586.2077255586.2077255586.2077255586.207725 | 0) 7.040 us | }| 0) 7.040 us | }| 0) 7.040 us | }| 0) 7.040 us | }
5586.2358355586.2358355586.2358355586.235835 | 0) ! 28117.53 us || 0) ! 28117.53 us || 0) ! 28117.53 us || 0) ! 28117.53 us | }}}}
5586.235836 | 0) | check_preempt_curr_rt() {5586.235836 | 0) | check_preempt_curr_rt() {5586.235836 | 0) | check_preempt_curr_rt() {5586.235836 | 0) | check_preempt_curr_rt() {
5586.235836 | 0) 0.415 us | resched_task();5586.235836 | 0) 0.415 us | resched_task();5586.235836 | 0) 0.415 us | resched_task();5586.235836 | 0) 0.415 us | resched_task();
5586.235837 | 0) 1.397 us | }5586.235837 | 0) 1.397 us | }5586.235837 | 0) 1.397 us | }5586.235837 | 0) 1.397 us | }
Here!
Case1: Check the kernel source (First attempt)
The activate_task() defined in kernel/sched.c
static voidstatic voidstatic voidstatic void inc_nr_runninginc_nr_runninginc_nr_runninginc_nr_running(struct rq *rq)(struct rq *rq)(struct rq *rq)(struct rq *rq)
{{{{
rqrqrqrq---->nr_running++;>nr_running++;>nr_running++;>nr_running++;
}}}}
・・・・
・・・・
・・・・
static voidstatic voidstatic voidstatic void
activate_taskactivate_taskactivate_taskactivate_task(struct rq *rq, struct task_struct *p, int(struct rq *rq, struct task_struct *p, int(struct rq *rq, struct task_struct *p, int(struct rq *rq, struct task_struct *p, int
Only an increment instruction
15Yoshitake Kobayashi - Embedded Linux Conference 2012 -
activate_taskactivate_taskactivate_taskactivate_task(struct rq *rq, struct task_struct *p, int(struct rq *rq, struct task_struct *p, int(struct rq *rq, struct task_struct *p, int(struct rq *rq, struct task_struct *p, int
wakeup, bool head)wakeup, bool head)wakeup, bool head)wakeup, bool head)
{{{{
if (task_contributes_to_load(p))if (task_contributes_to_load(p))if (task_contributes_to_load(p))if (task_contributes_to_load(p))
rqrqrqrq---->nr_uninterruptible>nr_uninterruptible>nr_uninterruptible>nr_uninterruptible--------;;;;
enqueue_task(rq, p, wakeup, head);enqueue_task(rq, p, wakeup, head);enqueue_task(rq, p, wakeup, head);enqueue_task(rq, p, wakeup, head);
inc_nr_runninginc_nr_runninginc_nr_runninginc_nr_running(rq);(rq);(rq);(rq);
}}}}
Here!
Case1: Check the trace log (Second attempt)
Result of the function graph tracer
184.976213 | 0) 0.404 us | pre_schedule_rt();184.976213 | 0) 0.404 us | pre_schedule_rt();184.976213 | 0) 0.404 us | pre_schedule_rt();184.976213 | 0) 0.404 us | pre_schedule_rt();
184.976214 | 0) | put_prev_task_rt() {184.976214 | 0) | put_prev_task_rt() {184.976214 | 0) | put_prev_task_rt() {184.976214 | 0) | put_prev_task_rt() {
184.976214184.976214184.976214184.976214 | 0) || 0) || 0) || 0) | update_curr_rt() {update_curr_rt() {update_curr_rt() {update_curr_rt() {
185.004323185.004323185.004323185.004323 | 0) 0.537 us | sched_avg_update();| 0) 0.537 us | sched_avg_update();| 0) 0.537 us | sched_avg_update();| 0) 0.537 us | sched_avg_update();
185.004324 | 0) ! 28109.76 us |185.004324 | 0) ! 28109.76 us |185.004324 | 0) ! 28109.76 us |185.004324 | 0) ! 28109.76 us | }}}}
185.004324 | 0) ! 28110.62 us | }185.004324 | 0) ! 28110.62 us | }185.004324 | 0) ! 28110.62 us | }185.004324 | 0) ! 28110.62 us | }
185.004325 | 0) 0.496 us | pick_next_task_rt();185.004325 | 0) 0.496 us | pick_next_task_rt();185.004325 | 0) 0.496 us | pick_next_task_rt();185.004325 | 0) 0.496 us | pick_next_task_rt();
185.004326 | 0) 0.442 us | perf_counter_task_sched_out();185.004326 | 0) 0.442 us | perf_counter_task_sched_out();185.004326 | 0) 0.442 us | perf_counter_task_sched_out();185.004326 | 0) 0.442 us | perf_counter_task_sched_out();
185.004327 | 0) 0.434 us | native_load_sp0();185.004327 | 0) 0.434 us | native_load_sp0();185.004327 | 0) 0.434 us | native_load_sp0();185.004327 | 0) 0.434 us | native_load_sp0();
185.004328 | 0) 0.465 us | native_load_tls();185.004328 | 0) 0.465 us | native_load_tls();185.004328 | 0) 0.465 us | native_load_tls();185.004328 | 0) 0.465 us | native_load_tls();
Here!
16Yoshitake Kobayashi - Embedded Linux Conference 2012 -
185.004328 | 0) 0.465 us | native_load_tls();185.004328 | 0) 0.465 us | native_load_tls();185.004328 | 0) 0.465 us | native_load_tls();185.004328 | 0) 0.465 us | native_load_tls();
Case1: Check the kernel source (Second attempt)
Update_curr_rt() defined in kernel/sched.c
static voidstatic voidstatic voidstatic void update_curr_rtupdate_curr_rtupdate_curr_rtupdate_curr_rt(struct rq *rq)(struct rq *rq)(struct rq *rq)(struct rq *rq)
{{{{
struct task_struct *curr = rqstruct task_struct *curr = rqstruct task_struct *curr = rqstruct task_struct *curr = rq---->curr;>curr;>curr;>curr;
struct sched_rt_entity *rt_se = &currstruct sched_rt_entity *rt_se = &currstruct sched_rt_entity *rt_se = &currstruct sched_rt_entity *rt_se = &curr---->rt;>rt;>rt;>rt;
struct rt_rq *rt_rq = rt_rq_of_se(rt_se);struct rt_rq *rt_rq = rt_rq_of_se(rt_se);struct rt_rq *rt_rq = rt_rq_of_se(rt_se);struct rt_rq *rt_rq = rt_rq_of_se(rt_se);
u64 delta_exec;u64 delta_exec;u64 delta_exec;u64 delta_exec;
if (!task_has_rt_policy(curr))if (!task_has_rt_policy(curr))if (!task_has_rt_policy(curr))if (!task_has_rt_policy(curr))
return;return;return;return;
delta_exec = rqdelta_exec = rqdelta_exec = rqdelta_exec = rq---->clock>clock>clock>clock ---- currcurrcurrcurr---->se.exec_start;>se.exec_start;>se.exec_start;>se.exec_start;
if (unlikely((s64)delta_exec < 0))if (unlikely((s64)delta_exec < 0))if (unlikely((s64)delta_exec < 0))if (unlikely((s64)delta_exec < 0))
17Yoshitake Kobayashi - Embedded Linux Conference 2012 -
if (unlikely((s64)delta_exec < 0))if (unlikely((s64)delta_exec < 0))if (unlikely((s64)delta_exec < 0))if (unlikely((s64)delta_exec < 0))
delta_exec = 0;delta_exec = 0;delta_exec = 0;delta_exec = 0;
schedstat_set(currschedstat_set(currschedstat_set(currschedstat_set(curr---->se.exec_max, max(curr>se.exec_max, max(curr>se.exec_max, max(curr>se.exec_max, max(curr---->se.exec_max, delta_exec));>se.exec_max, delta_exec));>se.exec_max, delta_exec));>se.exec_max, delta_exec));
currcurrcurrcurr---->se.sum_exec_runtime += delta_exec;>se.sum_exec_runtime += delta_exec;>se.sum_exec_runtime += delta_exec;>se.sum_exec_runtime += delta_exec;
account_group_exec_runtime(curr, delta_exec);account_group_exec_runtime(curr, delta_exec);account_group_exec_runtime(curr, delta_exec);account_group_exec_runtime(curr, delta_exec);
currcurrcurrcurr---->se.exec_start = rq>se.exec_start = rq>se.exec_start = rq>se.exec_start = rq---->clock;>clock;>clock;>clock;
cpuacct_charge(curr, delta_exec);cpuacct_charge(curr, delta_exec);cpuacct_charge(curr, delta_exec);cpuacct_charge(curr, delta_exec);
sched_rt_avg_update(rq, delta_exec);sched_rt_avg_update(rq, delta_exec);sched_rt_avg_update(rq, delta_exec);sched_rt_avg_update(rq, delta_exec);
Maybe here!
But different
result.
Case1: Check the trace log (Third attempt)
641.941738 | 0) | schedule() {641.941738 | 0) | schedule() {641.941738 | 0) | schedule() {641.941738 | 0) | schedule() {
641.941738 | 0) |641.941738 | 0) |641.941738 | 0) |641.941738 | 0) | __schedule() {__schedule() {__schedule() {__schedule() {
641.941739 | 0) 0.421 us | rcu_qsctr_inc();641.941739 | 0) 0.421 us | rcu_qsctr_inc();641.941739 | 0) 0.421 us | rcu_qsctr_inc();641.941739 | 0) 0.421 us | rcu_qsctr_inc();
・・・・・・・・・・・・
641.941751 | 0) | pre_schedule_rt() {641.941751 | 0) | pre_schedule_rt() {641.941751 | 0) | pre_schedule_rt() {641.941751 | 0) | pre_schedule_rt() {
641.941752 | 0) 0.418 us | pull_rt_task();641.941752 | 0) 0.418 us | pull_rt_task();641.941752 | 0) 0.418 us | pull_rt_task();641.941752 | 0) 0.418 us | pull_rt_task();
641.941752 | 0) 1.281 us | }641.941752 | 0) 1.281 us | }641.941752 | 0) 1.281 us | }641.941752 | 0) 1.281 us | }
641.941753 | 0) | put_prev_task_rt() {641.941753 | 0) | put_prev_task_rt() {641.941753 | 0) | put_prev_task_rt() {641.941753 | 0) | put_prev_task_rt() {
641.941753 | 0) | update_curr_rt() {641.941753 | 0) | update_curr_rt() {641.941753 | 0) | update_curr_rt() {641.941753 | 0) | update_curr_rt() {
641.941754 | 0) 0.426 us | sched_avg_update();641.941754 | 0) 0.426 us | sched_avg_update();641.941754 | 0) 0.426 us | sched_avg_update();641.941754 | 0) 0.426 us | sched_avg_update();
641.941755 | 0) 1.310 us | }641.941755 | 0) 1.310 us | }641.941755 | 0) 1.310 us | }641.941755 | 0) 1.310 us | }
641.941755 | 0) 2.178 us | }641.941755 | 0) 2.178 us | }641.941755 | 0) 2.178 us | }641.941755 | 0) 2.178 us | }
641.941756641.941756641.941756641.941756 | 0) 0.423 us | pick_next_task_fair();| 0) 0.423 us | pick_next_task_fair();| 0) 0.423 us | pick_next_task_fair();| 0) 0.423 us | pick_next_task_fair();
18Yoshitake Kobayashi - Embedded Linux Conference 2012 -
641.941756641.941756641.941756641.941756 | 0) 0.423 us | pick_next_task_fair();| 0) 0.423 us | pick_next_task_fair();| 0) 0.423 us | pick_next_task_fair();| 0) 0.423 us | pick_next_task_fair();
641.969863641.969863641.969863641.969863 | 0) 0.839 us | pick_next_task_rt();| 0) 0.839 us | pick_next_task_rt();| 0) 0.839 us | pick_next_task_rt();| 0) 0.839 us | pick_next_task_rt();
641.969864 | 0) 0.422 us | pick_next_task_fair();641.969864 | 0) 0.422 us | pick_next_task_fair();641.969864 | 0) 0.422 us | pick_next_task_fair();641.969864 | 0) 0.422 us | pick_next_task_fair();
641.969865 | 0) 0.421 us | pick_next_task_idle();641.969865 | 0) 0.421 us | pick_next_task_idle();641.969865 | 0) 0.421 us | pick_next_task_idle();641.969865 | 0) 0.421 us | pick_next_task_idle();
641.969866 | 0) 0.461 us | perf_counter_task_sched_out();641.969866 | 0) 0.461 us | perf_counter_task_sched_out();641.969866 | 0) 0.461 us | perf_counter_task_sched_out();641.969866 | 0) 0.461 us | perf_counter_task_sched_out();
Here!
Case1: Check the kernel source (Third attempt)
Pick_next_task() also defined in kernel/sched.c
static inline struct task_struct *static inline struct task_struct *static inline struct task_struct *static inline struct task_struct *
pick_next_taskpick_next_taskpick_next_taskpick_next_task(struct rq *rq)(struct rq *rq)(struct rq *rq)(struct rq *rq)
{{{{
・・・・・・・・・・・・
if (likely(rqif (likely(rqif (likely(rqif (likely(rq---->nr_running == rq>nr_running == rq>nr_running == rq>nr_running == rq---->cfs.nr_running)) {>cfs.nr_running)) {>cfs.nr_running)) {>cfs.nr_running)) {
p = fair_sched_class.pick_next_task(rq);p = fair_sched_class.pick_next_task(rq);p = fair_sched_class.pick_next_task(rq);p = fair_sched_class.pick_next_task(rq);
if (likely(p))if (likely(p))if (likely(p))if (likely(p))
return p;return p;return p;return p;
}}}}
class = sched_class_highest;class = sched_class_highest;class = sched_class_highest;class = sched_class_highest;
19Yoshitake Kobayashi - Embedded Linux Conference 2012 -
class = sched_class_highest;class = sched_class_highest;class = sched_class_highest;class = sched_class_highest;
for ( ; ; ) {for ( ; ; ) {for ( ; ; ) {for ( ; ; ) {
p = classp = classp = classp = class---->pick_next_task(rq);>pick_next_task(rq);>pick_next_task(rq);>pick_next_task(rq);
if (p)if (p)if (p)if (p)
return p;return p;return p;return p;
・・・・・・・・・・・・
class = classclass = classclass = classclass = class---->next;>next;>next;>next;
}}}}
}}}}
asmlinkage void __schedasmlinkage void __schedasmlinkage void __schedasmlinkage void __sched __schedule__schedule__schedule__schedule(void)(void)(void)(void)
{{{{
・・・・・・・・・・・・
put_prev_task(rq, prev);put_prev_task(rq, prev);put_prev_task(rq, prev);put_prev_task(rq, prev);
next =next =next =next = pick_next_taskpick_next_taskpick_next_taskpick_next_task(rq);(rq);(rq);(rq);
Different
result again!
Case1: Summary of the evaluation results
Latency occurs in different points each time and difficult to
identify the cause
Only occurs on specific hardware
Maybe SMI (System Management Interrupt)
20Yoshitake Kobayashi - Embedded Linux Conference 2012 -
Learn a lesson from Case1
One shot trace log is not enough
21Yoshitake Kobayashi - Embedded Linux Conference 2012 -
Case2: Evaluation environment
CPU: ARM Coretex-A8
Memory: 512MB
Kernel: Linux 2.6.31.12-rt21
A function graph tracer patch applied
http://elinux.org/Ftrace_Function_Graph_ARM
22Yoshitake Kobayashi - Embedded Linux Conference 2012 -
Case2: Summary of evaluation result (First attempt)
Evaluated without other CPU workload
Cycle: 300us Cycle: 500us Cycle: 1000us
0
100000
200000
300000
400000
500000
600000
700000
800000
900000
1000000
0 100 200 300 400 500 600 700 800 900 1000
latency(μ秒)
回数
0
100000
200000
300000
400000
500000
600000
700000
800000
900000
1000000
0 100 200 300 400 500 600 700 800 900 1000
latency(μ秒)
回数
0
100000
200000
300000
400000
500000
600000
700000
800000
900000
1000000
0 100 200 300 400 500 600 700 800 900 1000
latency(μ秒)
回数
Latency (us)
Numberofsamples
Latency (us)
Numberofsamples
Latency (us)
Numberofsamples 23Yoshitake Kobayashi - Embedded Linux Conference 2012 -
Cycle [µs] Min. Ave. Max # of DL misses
300 11 19.025 921 3018
500 11 19.458 684 5026
1000 11 23.011 717 0
Case2: Summary of evaluation result (First attempt)
Evaluated with approx 60% CPU workload
Cycle: 300us Cycle: 500us Cycle: 1000us
0
100000
200000
300000
400000
500000
600000
700000
800000
900000
1000000
0 100 200 300 400 500 600 700 800 900 1000
latency(μ秒)
回数
0
100000
200000
300000
400000
500000
600000
700000
800000
900000
1000000
0 100 200 300 400 500 600 700 800 900 1000
latency(μ秒)
回数
0
100000
200000
300000
400000
500000
600000
700000
800000
900000
1000000
0 100 200 300 400 500 600 700 800 900 1000
latency(μ秒)
回数
Latency (us)
Numberofsamples
Latency (us)
Numberofsamples
Latency (us)
Numberofsamples 24Yoshitake Kobayashi - Embedded Linux Conference 2012 -
Cycle [µs] Min. Ave. Max # of DL misses
300 13 34.165 980 3018
500 13 33.234 760 5025
1000 12 35.014 714 0
504.294647 | 0) | sys_timer_getoverrun() {504.294647 | 0) | sys_timer_getoverrun() {504.294647 | 0) | sys_timer_getoverrun() {504.294647 | 0) | sys_timer_getoverrun() {
504.294649 | 0) 2.000 us | lock_timer();504.294649 | 0) 2.000 us | lock_timer();504.294649 | 0) 2.000 us | lock_timer();504.294649 | 0) 2.000 us | lock_timer();
504.294654 | 0) 6.625 us | }504.294654 | 0) 6.625 us | }504.294654 | 0) 6.625 us | }504.294654 | 0) 6.625 us | }
504.294656 | 0) |504.294656 | 0) |504.294656 | 0) |504.294656 | 0) | sys_rt_sigtimedwait() {sys_rt_sigtimedwait() {sys_rt_sigtimedwait() {sys_rt_sigtimedwait() {
504.294658 | 0) | dequeue_signal() {504.294658 | 0) | dequeue_signal() {504.294658 | 0) | dequeue_signal() {504.294658 | 0) | dequeue_signal() {
504.294660 | 0) | __dequeue_signal() {504.294660 | 0) | __dequeue_signal() {504.294660 | 0) | __dequeue_signal() {504.294660 | 0) | __dequeue_signal() {
504.294662 | 0) 1.625 us | next_signal();504.294662 | 0) 1.625 us | next_signal();504.294662 | 0) 1.625 us | next_signal();504.294662 | 0) 1.625 us | next_signal();
・・・・
・・・・
・・・・
・・・・
・・・・
Case2: Check the trace log (First attempt)
?
25Yoshitake Kobayashi - Embedded Linux Conference 2012 -
・・・・
・・・・
・・・・
・・・・
・・・・
504.297163 | 0) | recalc_sigpending() {504.297163 | 0) | recalc_sigpending() {504.297163 | 0) | recalc_sigpending() {504.297163 | 0) | recalc_sigpending() {
504.297165 | 0) | recalc_sigpending_tsk() {504.297165 | 0) | recalc_sigpending_tsk() {504.297165 | 0) | recalc_sigpending_tsk() {504.297165 | 0) | recalc_sigpending_tsk() {
504.297168 | 0) 5.250 us | }504.297168 | 0) 5.250 us | }504.297168 | 0) 5.250 us | }504.297168 | 0) 5.250 us | }
504.297170 | 0) ! 2513.625 us |504.297170 | 0) ! 2513.625 us |504.297170 | 0) ! 2513.625 us |504.297170 | 0) ! 2513.625 us | }}}}
504.297175 | 0) |504.297175 | 0) |504.297175 | 0) |504.297175 | 0) | sys_clock_gettime()sys_clock_gettime()sys_clock_gettime()sys_clock_gettime() {{{{
504.297177 | 0) | posix_ktime_get_ts() {504.297177 | 0) | posix_ktime_get_ts() {504.297177 | 0) | posix_ktime_get_ts() {504.297177 | 0) | posix_ktime_get_ts() {
504.297178 | 0) | ktime_get_ts() {504.297178 | 0) | ktime_get_ts() {504.297178 | 0) | ktime_get_ts() {504.297178 | 0) | ktime_get_ts() {
504.297181 | 0) | asm_do_IRQ() {504.297181 | 0) | asm_do_IRQ() {504.297181 | 0) | asm_do_IRQ() {504.297181 | 0) | asm_do_IRQ() {
Here!
?
504.295229 | 0) | __do_softirq() {504.295229 | 0) | __do_softirq() {504.295229 | 0) | __do_softirq() {504.295229 | 0) | __do_softirq() {
504.295232 | 0) | run_timer_softirq() {504.295232 | 0) | run_timer_softirq() {504.295232 | 0) | run_timer_softirq() {504.295232 | 0) | run_timer_softirq() {
504.295234 | 0) 1.750 us | hrtimer_run_pending();504.295234 | 0) 1.750 us | hrtimer_run_pending();504.295234 | 0) 1.750 us | hrtimer_run_pending();504.295234 | 0) 1.750 us | hrtimer_run_pending();
504.295239 | 0) 1.750 us | preempt_schedule();504.295239 | 0) 1.750 us | preempt_schedule();504.295239 | 0) 1.750 us | preempt_schedule();504.295239 | 0) 1.750 us | preempt_schedule();
504.295243 | 0) | ehci_watchdog() {504.295243 | 0) | ehci_watchdog() {504.295243 | 0) | ehci_watchdog() {504.295243 | 0) | ehci_watchdog() {
504.295245 | 0) | ehci_work() {504.295245 | 0) | ehci_work() {504.295245 | 0) | ehci_work() {504.295245 | 0) | ehci_work() {
504.295250 | 0) 1.875 us | free_cached_itd_list();504.295250 | 0) 1.875 us | free_cached_itd_list();504.295250 | 0) 1.875 us | free_cached_itd_list();504.295250 | 0) 1.875 us | free_cached_itd_list();
504.295256 | 0) 4.625 us | qh_completions();504.295256 | 0) 4.625 us | qh_completions();504.295256 | 0) 4.625 us | qh_completions();504.295256 | 0) 4.625 us | qh_completions();
504.295265 | 0) 3.375 us | qh_completions();504.295265 | 0) 3.375 us | qh_completions();504.295265 | 0) 3.375 us | qh_completions();504.295265 | 0) 3.375 us | qh_completions();
504.295272 | 0) 3.250 us | qh_completions();504.295272 | 0) 3.250 us | qh_completions();504.295272 | 0) 3.250 us | qh_completions();504.295272 | 0) 3.250 us | qh_completions();
504.295279 | 0) 3.250 us | qh_completions();504.295279 | 0) 3.250 us | qh_completions();504.295279 | 0) 3.250 us | qh_completions();504.295279 | 0) 3.250 us | qh_completions();
504.295286 | 0) 3.625 us | qh_completions();504.295286 | 0) 3.625 us | qh_completions();504.295286 | 0) 3.625 us | qh_completions();504.295286 | 0) 3.625 us | qh_completions();
Case2: Identify the latency (First attempt)
Point!
26Yoshitake Kobayashi - Embedded Linux Conference 2012 -
504.295286 | 0) 3.625 us | qh_completions();504.295286 | 0) 3.625 us | qh_completions();504.295286 | 0) 3.625 us | qh_completions();504.295286 | 0) 3.625 us | qh_completions();
504.295292 | 0) 3.375 us | qh_completions();504.295292 | 0) 3.375 us | qh_completions();504.295292 | 0) 3.375 us | qh_completions();504.295292 | 0) 3.375 us | qh_completions();
504.295299 | 0) 3.375 us | qh_completions();504.295299 | 0) 3.375 us | qh_completions();504.295299 | 0) 3.375 us | qh_completions();504.295299 | 0) 3.375 us | qh_completions();
504.295305 | 0) 3.375 us | qh_completions();504.295305 | 0) 3.375 us | qh_completions();504.295305 | 0) 3.375 us | qh_completions();504.295305 | 0) 3.375 us | qh_completions();
504.295312 | 0) 3.250 us | qh_completions();504.295312 | 0) 3.250 us | qh_completions();504.295312 | 0) 3.250 us | qh_completions();504.295312 | 0) 3.250 us | qh_completions();
504.295319 | 0) 3.250 us | qh_completions();504.295319 | 0) 3.250 us | qh_completions();504.295319 | 0) 3.250 us | qh_completions();504.295319 | 0) 3.250 us | qh_completions();
504.295325 | 0) 3.375 us | qh_completions()504.295325 | 0) 3.375 us | qh_completions()504.295325 | 0) 3.375 us | qh_completions()504.295325 | 0) 3.375 us | qh_completions();;;;
This function is used
to process and free
completed qtds for a qh.
(drivers/usb/host/ehci-q.c)
Case2: Cause and possible solution
Cause
Too many qh_completions() function call
Default polling threshold is 100ms
Possible solution
Change the polling threshold to 10ms
27Yoshitake Kobayashi - Embedded Linux Conference 2012 -
diffdiffdiffdiff --------git a/drivers/usb/host/ehcigit a/drivers/usb/host/ehcigit a/drivers/usb/host/ehcigit a/drivers/usb/host/ehci----hcd.c b/drivers/usb/host/ehcihcd.c b/drivers/usb/host/ehcihcd.c b/drivers/usb/host/ehcihcd.c b/drivers/usb/host/ehci----hcd.chcd.chcd.chcd.c
index 0c9b7d2..db2efd2 100644index 0c9b7d2..db2efd2 100644index 0c9b7d2..db2efd2 100644index 0c9b7d2..db2efd2 100644
------------ a/drivers/usb/host/ehcia/drivers/usb/host/ehcia/drivers/usb/host/ehcia/drivers/usb/host/ehci----hcd.chcd.chcd.chcd.c
+++ b/drivers/usb/host/ehci+++ b/drivers/usb/host/ehci+++ b/drivers/usb/host/ehci+++ b/drivers/usb/host/ehci----hcd.chcd.chcd.chcd.c
@@@@@@@@ ----83,7 +83,7 @@ static const char hcd_name [] = "ehci_hcd";83,7 +83,7 @@ static const char hcd_name [] = "ehci_hcd";83,7 +83,7 @@ static const char hcd_name [] = "ehci_hcd";83,7 +83,7 @@ static const char hcd_name [] = "ehci_hcd";
#define EHCI_TUNE_FLS 2 /* (small) 256 frame schedule */#define EHCI_TUNE_FLS 2 /* (small) 256 frame schedule */#define EHCI_TUNE_FLS 2 /* (small) 256 frame schedule */#define EHCI_TUNE_FLS 2 /* (small) 256 frame schedule */
#define EHCI_IAA_MSECS 10 /* arbitrary */#define EHCI_IAA_MSECS 10 /* arbitrary */#define EHCI_IAA_MSECS 10 /* arbitrary */#define EHCI_IAA_MSECS 10 /* arbitrary */
----#define EHCI_IO_JIFFIES (HZ/10) /* io watchdog > irq_thresh */#define EHCI_IO_JIFFIES (HZ/10) /* io watchdog > irq_thresh */#define EHCI_IO_JIFFIES (HZ/10) /* io watchdog > irq_thresh */#define EHCI_IO_JIFFIES (HZ/10) /* io watchdog > irq_thresh */
+#define EHCI_IO_JIFFIES (HZ/100) /* io watchdog > irq_thresh */+#define EHCI_IO_JIFFIES (HZ/100) /* io watchdog > irq_thresh */+#define EHCI_IO_JIFFIES (HZ/100) /* io watchdog > irq_thresh */+#define EHCI_IO_JIFFIES (HZ/100) /* io watchdog > irq_thresh */
#define EHCI_ASYNC_JIFFIES (HZ/20) /* async idle timeout */#define EHCI_ASYNC_JIFFIES (HZ/20) /* async idle timeout */#define EHCI_ASYNC_JIFFIES (HZ/20) /* async idle timeout */#define EHCI_ASYNC_JIFFIES (HZ/20) /* async idle timeout */
#define EHCI_SHRINK_FRAMES 5 /* async qh unlink delay */#define EHCI_SHRINK_FRAMES 5 /* async qh unlink delay */#define EHCI_SHRINK_FRAMES 5 /* async qh unlink delay */#define EHCI_SHRINK_FRAMES 5 /* async qh unlink delay */
Case2: Summary of evaluation result (Second attempt)
Evaluated without other CPU workload
0
100000
200000
300000
400000
500000
600000
700000
800000
900000
1000000
0 100 200 300 400 500 600 700 800 900 1000
latency(μ秒)
回数
0
100000
200000
300000
400000
500000
600000
700000
800000
900000
1000000
0 100 200 300 400 500 600 700 800 900 1000
latency(μ秒)
回数
0
100000
200000
300000
400000
500000
600000
700000
800000
900000
1000000
0 100 200 300 400 500 600 700 800 900 1000
latency(μ秒)
回数
Cycle: 300us Cycle: 500us Cycle: 1000us
Latency (us)
Numberofsamples
Latency (us)
Numberofsamples
Latency (us)
Numberofsamples 28Yoshitake Kobayashi - Embedded Linux Conference 2012 -
Cycle Min Ave. Max # of DL miss
300 11 17.590 209 0
500 11 17.583 117 0
1000 11 17.610 154 0
Case2: Summary of evaluation result (Second attempt)
Evaluated with approx 60% CPU workload
0
100000
200000
300000
400000
500000
600000
700000
800000
900000
1000000
0 100 200 300 400 500 600 700 800 900 1000
latency(μ秒)
回数
0
100000
200000
300000
400000
500000
600000
700000
800000
900000
1000000
0 100 200 300 400 500 600 700 800 900 1000
latency(μ秒)
回数
0
100000
200000
300000
400000
500000
600000
700000
800000
900000
1000000
0 100 200 300 400 500 600 700 800 900 1000
latency(μ秒)
回数
Cycle: 300us Cycle: 500us Cycle: 1000us
Latency (us)
Numberofsamples
Latency (us)
Numberofsamples
Latency (us)
Numberofsamples 29Yoshitake Kobayashi - Embedded Linux Conference 2012 -
Cycle Min Ave Max # of DL miss
300 13 33.365 259 0
500 13 29.695 228 0
1000 12 33.222 199 0
Maximum latency reduced to 1/4
Learn a lesson from Case2
Need to check what kind of functions are called
Need to count the number of callees
30Yoshitake Kobayashi - Embedded Linux Conference 2012 -
How to stabilize latency less than 50µs
Required spec is the following:
Single process
No network, No USB devices, No graphic device, No storage device
latency < 50µs
Evaluation result (cyclictest: cycle=250µs)
4500000
5000000
31Yoshitake Kobayashi - Embedded Linux Conference 2012 -
0
500000
1000000
1500000
2000000
2500000
3000000
3500000
4000000
4500000
0 10 20 30 40 50 60 70 80 90 100
回数
latency(μ秒)Latency (us)
Numberofsamples
Why?
Second mistake
2000
4000
6000
8000
10000
12000
14000
回数Numberofsamples cycle = 250µs
32Yoshitake Kobayashi - Embedded Linux Conference 2012 -
Deadline miss ratio: 100%
Function graph tracing requires additional overhead
0
2000
0 500 1000 1500 2000
latency(μ秒)Latency (us)
Evaluate the latency with trace ON
2000
4000
6000
8000
10000
12000回数Numberofsamples cycle = 1500µs
33Yoshitake Kobayashi - Embedded Linux Conference 2012 -
This result seems to have same characteristics
Trace with the following setting
Stop if the latency is more than 790µs
Stop if the latency is between 575 and 585
0
2000
0 200 400 600 800 1000
latency(μ秒)Latency (us)
Conclusion
Evaluating with too much workload is not a good idea
Required log data is lost
One shot trace log is not enough
Need to check performance characteristics between trace on and off
Need to check if latency occurs at similar points
Require some creative thinking to identify the cause
34Yoshitake Kobayashi - Embedded Linux Conference 2012 -
Require some creative thinking to identify the cause
Check if the same function takes different execution time
Statistics approach may be applied
Check all functions included in specific area to identify each
function’s overhead
Eliminate callee’s overhead to calculate pure execution time for each
function’s algorithm
35Yoshitake Kobayashi - Embedded Linux Conference 2012 - 2008 / 7 / 24TOSHIBA Confidential

More Related Content

What's hot

Demystifying Android's Bluetooth Low Energy at MCE^3 Conf
Demystifying Android's Bluetooth Low Energy at MCE^3 ConfDemystifying Android's Bluetooth Low Energy at MCE^3 Conf
Demystifying Android's Bluetooth Low Energy at MCE^3 ConfPawel Urban
 
MCE^3 - Dariusz Seweryn, Paweł Urban - Demystifying Android's Bluetooth Low ...
MCE^3 - Dariusz Seweryn, Paweł Urban -  Demystifying Android's Bluetooth Low ...MCE^3 - Dariusz Seweryn, Paweł Urban -  Demystifying Android's Bluetooth Low ...
MCE^3 - Dariusz Seweryn, Paweł Urban - Demystifying Android's Bluetooth Low ...PROIDEA
 
Perth APAC Groundbreakers tour - SQL Techniques
Perth APAC Groundbreakers tour - SQL TechniquesPerth APAC Groundbreakers tour - SQL Techniques
Perth APAC Groundbreakers tour - SQL TechniquesConnor McDonald
 
Dsi 11g convert_to RAC
Dsi 11g convert_to RACDsi 11g convert_to RAC
Dsi 11g convert_to RACAnil Kumar
 
Errors detected in C++Builder
Errors detected in C++BuilderErrors detected in C++Builder
Errors detected in C++BuilderPVS-Studio
 
Wellington APAC Groundbreakers tour - Upgrading to the 12c Optimizer
Wellington APAC Groundbreakers tour - Upgrading to the 12c OptimizerWellington APAC Groundbreakers tour - Upgrading to the 12c Optimizer
Wellington APAC Groundbreakers tour - Upgrading to the 12c OptimizerConnor McDonald
 

What's hot (7)

Demystifying Android's Bluetooth Low Energy at MCE^3 Conf
Demystifying Android's Bluetooth Low Energy at MCE^3 ConfDemystifying Android's Bluetooth Low Energy at MCE^3 Conf
Demystifying Android's Bluetooth Low Energy at MCE^3 Conf
 
MCE^3 - Dariusz Seweryn, Paweł Urban - Demystifying Android's Bluetooth Low ...
MCE^3 - Dariusz Seweryn, Paweł Urban -  Demystifying Android's Bluetooth Low ...MCE^3 - Dariusz Seweryn, Paweł Urban -  Demystifying Android's Bluetooth Low ...
MCE^3 - Dariusz Seweryn, Paweł Urban - Demystifying Android's Bluetooth Low ...
 
Perth APAC Groundbreakers tour - SQL Techniques
Perth APAC Groundbreakers tour - SQL TechniquesPerth APAC Groundbreakers tour - SQL Techniques
Perth APAC Groundbreakers tour - SQL Techniques
 
Dsi 11g convert_to RAC
Dsi 11g convert_to RACDsi 11g convert_to RAC
Dsi 11g convert_to RAC
 
Ieee 1149.1-2013-tutorial-ijtag
Ieee 1149.1-2013-tutorial-ijtagIeee 1149.1-2013-tutorial-ijtag
Ieee 1149.1-2013-tutorial-ijtag
 
Errors detected in C++Builder
Errors detected in C++BuilderErrors detected in C++Builder
Errors detected in C++Builder
 
Wellington APAC Groundbreakers tour - Upgrading to the 12c Optimizer
Wellington APAC Groundbreakers tour - Upgrading to the 12c OptimizerWellington APAC Groundbreakers tour - Upgrading to the 12c Optimizer
Wellington APAC Groundbreakers tour - Upgrading to the 12c Optimizer
 

Viewers also liked

HKG15-305: Real Time processing comparing the RT patch vs Core isolation
HKG15-305: Real Time processing comparing the RT patch vs Core isolationHKG15-305: Real Time processing comparing the RT patch vs Core isolation
HKG15-305: Real Time processing comparing the RT patch vs Core isolationLinaro
 
Overview of Linux real-time challenges
Overview of Linux real-time challengesOverview of Linux real-time challenges
Overview of Linux real-time challengesDaniel Stenberg
 
An Essential Relationship between Real-time and Resource Partitioning
An Essential Relationship between Real-time and Resource PartitioningAn Essential Relationship between Real-time and Resource Partitioning
An Essential Relationship between Real-time and Resource PartitioningYoshitake Kobayashi
 
Soon To Be 60, Megastar Chiranjeevi Is Making A Comeback With His 150th Film,...
Soon To Be 60, Megastar Chiranjeevi Is Making A Comeback With His 150th Film,...Soon To Be 60, Megastar Chiranjeevi Is Making A Comeback With His 150th Film,...
Soon To Be 60, Megastar Chiranjeevi Is Making A Comeback With His 150th Film,...Vaikundarajan S
 
Technical Documentation_Embedded_Acoustic_DSP_Projects
Technical Documentation_Embedded_Acoustic_DSP_ProjectsTechnical Documentation_Embedded_Acoustic_DSP_Projects
Technical Documentation_Embedded_Acoustic_DSP_ProjectsEmmanuel Chidinma
 
Improvement of Scheduling Granularity for Deadline Scheduler
Improvement of Scheduling Granularity for Deadline Scheduler Improvement of Scheduling Granularity for Deadline Scheduler
Improvement of Scheduling Granularity for Deadline Scheduler Yoshitake Kobayashi
 
Poky meets Debian: Understanding how to make an embedded Linux by using an ex...
Poky meets Debian: Understanding how to make an embedded Linux by using an ex...Poky meets Debian: Understanding how to make an embedded Linux by using an ex...
Poky meets Debian: Understanding how to make an embedded Linux by using an ex...Yoshitake Kobayashi
 
Keshe - Nano and Gans Health Apps 2of4 25pp
Keshe - Nano and Gans Health Apps 2of4 25ppKeshe - Nano and Gans Health Apps 2of4 25pp
Keshe - Nano and Gans Health Apps 2of4 25ppExopolitics Hungary
 
INTEGRACION DE DATOS DE IMÁGENES DE SATELITE Y GEOQUÍMICOS PARA DEFINIR ZONAS...
INTEGRACION DE DATOS DE IMÁGENES DE SATELITE Y GEOQUÍMICOS PARA DEFINIR ZONAS...INTEGRACION DE DATOS DE IMÁGENES DE SATELITE Y GEOQUÍMICOS PARA DEFINIR ZONAS...
INTEGRACION DE DATOS DE IMÁGENES DE SATELITE Y GEOQUÍMICOS PARA DEFINIR ZONAS...Sonia GUiza-González
 
Lifi technology documentation
Lifi technology documentationLifi technology documentation
Lifi technology documentationSowjanya Jajaila
 
Sniffer for detecting lost mobile ppt
Sniffer for detecting lost mobile pptSniffer for detecting lost mobile ppt
Sniffer for detecting lost mobile pptasmita tarar
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsBrendan Gregg
 
Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016Brendan Gregg
 
Ubiquitous documentation of learning: a framework of the introduction of mobi...
Ubiquitous documentation of learning: a framework of the introduction of mobi...Ubiquitous documentation of learning: a framework of the introduction of mobi...
Ubiquitous documentation of learning: a framework of the introduction of mobi...Gemma Tur
 
google glass,latest technology,btech seminar topic
google glass,latest technology,btech seminar topicgoogle glass,latest technology,btech seminar topic
google glass,latest technology,btech seminar topicShubham Gupta
 

Viewers also liked (20)

HKG15-305: Real Time processing comparing the RT patch vs Core isolation
HKG15-305: Real Time processing comparing the RT patch vs Core isolationHKG15-305: Real Time processing comparing the RT patch vs Core isolation
HKG15-305: Real Time processing comparing the RT patch vs Core isolation
 
Overview of Linux real-time challenges
Overview of Linux real-time challengesOverview of Linux real-time challenges
Overview of Linux real-time challenges
 
Mastering Real-time Linux
Mastering Real-time LinuxMastering Real-time Linux
Mastering Real-time Linux
 
An Essential Relationship between Real-time and Resource Partitioning
An Essential Relationship between Real-time and Resource PartitioningAn Essential Relationship between Real-time and Resource Partitioning
An Essential Relationship between Real-time and Resource Partitioning
 
Soon To Be 60, Megastar Chiranjeevi Is Making A Comeback With His 150th Film,...
Soon To Be 60, Megastar Chiranjeevi Is Making A Comeback With His 150th Film,...Soon To Be 60, Megastar Chiranjeevi Is Making A Comeback With His 150th Film,...
Soon To Be 60, Megastar Chiranjeevi Is Making A Comeback With His 150th Film,...
 
Technical Documentation_Embedded_Acoustic_DSP_Projects
Technical Documentation_Embedded_Acoustic_DSP_ProjectsTechnical Documentation_Embedded_Acoustic_DSP_Projects
Technical Documentation_Embedded_Acoustic_DSP_Projects
 
Improvement of Scheduling Granularity for Deadline Scheduler
Improvement of Scheduling Granularity for Deadline Scheduler Improvement of Scheduling Granularity for Deadline Scheduler
Improvement of Scheduling Granularity for Deadline Scheduler
 
Cell Phone Controlled Robotic Vehicle
Cell Phone Controlled Robotic VehicleCell Phone Controlled Robotic Vehicle
Cell Phone Controlled Robotic Vehicle
 
Poky meets Debian: Understanding how to make an embedded Linux by using an ex...
Poky meets Debian: Understanding how to make an embedded Linux by using an ex...Poky meets Debian: Understanding how to make an embedded Linux by using an ex...
Poky meets Debian: Understanding how to make an embedded Linux by using an ex...
 
Keshe - Nano and Gans Health Apps 2of4 25pp
Keshe - Nano and Gans Health Apps 2of4 25ppKeshe - Nano and Gans Health Apps 2of4 25pp
Keshe - Nano and Gans Health Apps 2of4 25pp
 
INTEGRACION DE DATOS DE IMÁGENES DE SATELITE Y GEOQUÍMICOS PARA DEFINIR ZONAS...
INTEGRACION DE DATOS DE IMÁGENES DE SATELITE Y GEOQUÍMICOS PARA DEFINIR ZONAS...INTEGRACION DE DATOS DE IMÁGENES DE SATELITE Y GEOQUÍMICOS PARA DEFINIR ZONAS...
INTEGRACION DE DATOS DE IMÁGENES DE SATELITE Y GEOQUÍMICOS PARA DEFINIR ZONAS...
 
Lifi technology documentation
Lifi technology documentationLifi technology documentation
Lifi technology documentation
 
Making Linux do Hard Real-time
Making Linux do Hard Real-timeMaking Linux do Hard Real-time
Making Linux do Hard Real-time
 
Sniffer for detecting lost mobile ppt
Sniffer for detecting lost mobile pptSniffer for detecting lost mobile ppt
Sniffer for detecting lost mobile ppt
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old Secrets
 
Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016
 
Comunicacion satelite
Comunicacion sateliteComunicacion satelite
Comunicacion satelite
 
Ubiquitous documentation of learning: a framework of the introduction of mobi...
Ubiquitous documentation of learning: a framework of the introduction of mobi...Ubiquitous documentation of learning: a framework of the introduction of mobi...
Ubiquitous documentation of learning: a framework of the introduction of mobi...
 
google glass,latest technology,btech seminar topic
google glass,latest technology,btech seminar topicgoogle glass,latest technology,btech seminar topic
google glass,latest technology,btech seminar topic
 
Nano technology
Nano technologyNano technology
Nano technology
 

Similar to Ineffective and Effective Ways To Find Out Latency Bottlenecks With Ftrace

44CON London 2015 - Jtagsploitation: 5 wires, 5 ways to root
44CON London 2015 - Jtagsploitation: 5 wires, 5 ways to root44CON London 2015 - Jtagsploitation: 5 wires, 5 ways to root
44CON London 2015 - Jtagsploitation: 5 wires, 5 ways to root44CON
 
OSMC 2015: Linux Performance Profiling and Monitoring by Werner Fischer
OSMC 2015: Linux Performance Profiling and Monitoring by Werner FischerOSMC 2015: Linux Performance Profiling and Monitoring by Werner Fischer
OSMC 2015: Linux Performance Profiling and Monitoring by Werner FischerNETWAYS
 
OSMC 2015 | Linux Performance Profiling and Monitoring by Werner Fischer
OSMC 2015 | Linux Performance Profiling and Monitoring by Werner FischerOSMC 2015 | Linux Performance Profiling and Monitoring by Werner Fischer
OSMC 2015 | Linux Performance Profiling and Monitoring by Werner FischerNETWAYS
 
Oracle Basics and Architecture
Oracle Basics and ArchitectureOracle Basics and Architecture
Oracle Basics and ArchitectureSidney Chen
 
了解IO协议栈
了解IO协议栈了解IO协议栈
了解IO协议栈Feng Yu
 
Io 120404052713-phpapp01
Io 120404052713-phpapp01Io 120404052713-phpapp01
Io 120404052713-phpapp01coxchen
 
OSDC 2017 - Werner Fischer - Linux performance profiling and monitoring
OSDC 2017 - Werner Fischer - Linux performance profiling and monitoringOSDC 2017 - Werner Fischer - Linux performance profiling and monitoring
OSDC 2017 - Werner Fischer - Linux performance profiling and monitoringNETWAYS
 
Jboss World 2011 Infinispan
Jboss World 2011 InfinispanJboss World 2011 Infinispan
Jboss World 2011 Infinispancbo_
 
MCSoC'13 Keynote Talk "Taming Big Data Streams"
MCSoC'13 Keynote Talk "Taming Big Data Streams"MCSoC'13 Keynote Talk "Taming Big Data Streams"
MCSoC'13 Keynote Talk "Taming Big Data Streams"Hideyuki Kawashima
 
LSFMM 2019 BPF Observability
LSFMM 2019 BPF ObservabilityLSFMM 2019 BPF Observability
LSFMM 2019 BPF ObservabilityBrendan Gregg
 
Observability of InfluxDB IOx: Tracing, Metrics and System Tables
Observability of InfluxDB IOx: Tracing, Metrics and System TablesObservability of InfluxDB IOx: Tracing, Metrics and System Tables
Observability of InfluxDB IOx: Tracing, Metrics and System TablesInfluxData
 
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all started
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all startedKernel Recipes 2019 - ftrace: Where modifying a running kernel all started
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all startedAnne Nicolas
 
Interruption Timer Périodique
Interruption Timer PériodiqueInterruption Timer Périodique
Interruption Timer PériodiqueAnne Nicolas
 
Improving the performance of Odoo deployments
Improving the performance of Odoo deploymentsImproving the performance of Odoo deployments
Improving the performance of Odoo deploymentsOdoo
 
Docker Logging and analysing with Elastic Stack - Jakub Hajek
Docker Logging and analysing with Elastic Stack - Jakub Hajek Docker Logging and analysing with Elastic Stack - Jakub Hajek
Docker Logging and analysing with Elastic Stack - Jakub Hajek PROIDEA
 
Docker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic StackDocker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic StackJakub Hajek
 
Linux Performance Profiling and Monitoring
Linux Performance Profiling and MonitoringLinux Performance Profiling and Monitoring
Linux Performance Profiling and MonitoringGeorg Schönberger
 
OSDC 2015: Georg Schönberger | Linux Performance Profiling and Monitoring
OSDC 2015: Georg Schönberger | Linux Performance Profiling and MonitoringOSDC 2015: Georg Schönberger | Linux Performance Profiling and Monitoring
OSDC 2015: Georg Schönberger | Linux Performance Profiling and MonitoringNETWAYS
 
Introducing Scylla Manager: Cluster Management and Task Automation
Introducing Scylla Manager: Cluster Management and Task AutomationIntroducing Scylla Manager: Cluster Management and Task Automation
Introducing Scylla Manager: Cluster Management and Task AutomationScyllaDB
 

Similar to Ineffective and Effective Ways To Find Out Latency Bottlenecks With Ftrace (20)

44CON London 2015 - Jtagsploitation: 5 wires, 5 ways to root
44CON London 2015 - Jtagsploitation: 5 wires, 5 ways to root44CON London 2015 - Jtagsploitation: 5 wires, 5 ways to root
44CON London 2015 - Jtagsploitation: 5 wires, 5 ways to root
 
OSMC 2015: Linux Performance Profiling and Monitoring by Werner Fischer
OSMC 2015: Linux Performance Profiling and Monitoring by Werner FischerOSMC 2015: Linux Performance Profiling and Monitoring by Werner Fischer
OSMC 2015: Linux Performance Profiling and Monitoring by Werner Fischer
 
OSMC 2015 | Linux Performance Profiling and Monitoring by Werner Fischer
OSMC 2015 | Linux Performance Profiling and Monitoring by Werner FischerOSMC 2015 | Linux Performance Profiling and Monitoring by Werner Fischer
OSMC 2015 | Linux Performance Profiling and Monitoring by Werner Fischer
 
Oracle Basics and Architecture
Oracle Basics and ArchitectureOracle Basics and Architecture
Oracle Basics and Architecture
 
了解IO协议栈
了解IO协议栈了解IO协议栈
了解IO协议栈
 
Io 120404052713-phpapp01
Io 120404052713-phpapp01Io 120404052713-phpapp01
Io 120404052713-phpapp01
 
OSDC 2017 - Werner Fischer - Linux performance profiling and monitoring
OSDC 2017 - Werner Fischer - Linux performance profiling and monitoringOSDC 2017 - Werner Fischer - Linux performance profiling and monitoring
OSDC 2017 - Werner Fischer - Linux performance profiling and monitoring
 
Jboss World 2011 Infinispan
Jboss World 2011 InfinispanJboss World 2011 Infinispan
Jboss World 2011 Infinispan
 
MCSoC'13 Keynote Talk "Taming Big Data Streams"
MCSoC'13 Keynote Talk "Taming Big Data Streams"MCSoC'13 Keynote Talk "Taming Big Data Streams"
MCSoC'13 Keynote Talk "Taming Big Data Streams"
 
LSFMM 2019 BPF Observability
LSFMM 2019 BPF ObservabilityLSFMM 2019 BPF Observability
LSFMM 2019 BPF Observability
 
Observability of InfluxDB IOx: Tracing, Metrics and System Tables
Observability of InfluxDB IOx: Tracing, Metrics and System TablesObservability of InfluxDB IOx: Tracing, Metrics and System Tables
Observability of InfluxDB IOx: Tracing, Metrics and System Tables
 
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all started
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all startedKernel Recipes 2019 - ftrace: Where modifying a running kernel all started
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all started
 
Interruption Timer Périodique
Interruption Timer PériodiqueInterruption Timer Périodique
Interruption Timer Périodique
 
Improving the performance of Odoo deployments
Improving the performance of Odoo deploymentsImproving the performance of Odoo deployments
Improving the performance of Odoo deployments
 
sun solaris
sun solarissun solaris
sun solaris
 
Docker Logging and analysing with Elastic Stack - Jakub Hajek
Docker Logging and analysing with Elastic Stack - Jakub Hajek Docker Logging and analysing with Elastic Stack - Jakub Hajek
Docker Logging and analysing with Elastic Stack - Jakub Hajek
 
Docker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic StackDocker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic Stack
 
Linux Performance Profiling and Monitoring
Linux Performance Profiling and MonitoringLinux Performance Profiling and Monitoring
Linux Performance Profiling and Monitoring
 
OSDC 2015: Georg Schönberger | Linux Performance Profiling and Monitoring
OSDC 2015: Georg Schönberger | Linux Performance Profiling and MonitoringOSDC 2015: Georg Schönberger | Linux Performance Profiling and Monitoring
OSDC 2015: Georg Schönberger | Linux Performance Profiling and Monitoring
 
Introducing Scylla Manager: Cluster Management and Task Automation
Introducing Scylla Manager: Cluster Management and Task AutomationIntroducing Scylla Manager: Cluster Management and Task Automation
Introducing Scylla Manager: Cluster Management and Task Automation
 

More from Yoshitake Kobayashi

InnerSource Learning Path (Japanese)
InnerSource Learning Path (Japanese)InnerSource Learning Path (Japanese)
InnerSource Learning Path (Japanese)Yoshitake Kobayashi
 
Civil Infrastructure Platform: Industrial Grade SLTS Kernel and Base-layer De...
Civil Infrastructure Platform: Industrial Grade SLTS Kernel and Base-layer De...Civil Infrastructure Platform: Industrial Grade SLTS Kernel and Base-layer De...
Civil Infrastructure Platform: Industrial Grade SLTS Kernel and Base-layer De...Yoshitake Kobayashi
 
SLTS kernel and base-layer development in the Civil Infrastructure Platform
SLTS kernel and base-layer development in the Civil Infrastructure PlatformSLTS kernel and base-layer development in the Civil Infrastructure Platform
SLTS kernel and base-layer development in the Civil Infrastructure PlatformYoshitake Kobayashi
 
Time is ready for the Civil Infrastructure Platform
Time is ready for the Civil Infrastructure PlatformTime is ready for the Civil Infrastructure Platform
Time is ready for the Civil Infrastructure PlatformYoshitake Kobayashi
 
Introducing the Civil Infrastructure Platform Project
Introducing the Civil Infrastructure Platform ProjectIntroducing the Civil Infrastructure Platform Project
Introducing the Civil Infrastructure Platform ProjectYoshitake Kobayashi
 
Introducing the Civil Infrastructure Platform
Introducing the Civil Infrastructure PlatformIntroducing the Civil Infrastructure Platform
Introducing the Civil Infrastructure PlatformYoshitake Kobayashi
 
The Latest Status of CE Workgroup Shared Embedded Linux Distribution Project
 The Latest Status of CE Workgroup Shared Embedded Linux Distribution Project The Latest Status of CE Workgroup Shared Embedded Linux Distribution Project
The Latest Status of CE Workgroup Shared Embedded Linux Distribution ProjectYoshitake Kobayashi
 
Applying Linux to the Civil Infrastructure
Applying Linux to the Civil InfrastructureApplying Linux to the Civil Infrastructure
Applying Linux to the Civil InfrastructureYoshitake Kobayashi
 
Using Embedded Linux for Infrastructure Systems
Using Embedded Linux for Infrastructure SystemsUsing Embedded Linux for Infrastructure Systems
Using Embedded Linux for Infrastructure SystemsYoshitake Kobayashi
 
Using Real-Time Patch with LTSI Kernel
Using Real-Time Patch with LTSI KernelUsing Real-Time Patch with LTSI Kernel
Using Real-Time Patch with LTSI KernelYoshitake Kobayashi
 
Deadline Miss Detection with SCHED_DEADLINE
Deadline Miss Detection with SCHED_DEADLINEDeadline Miss Detection with SCHED_DEADLINE
Deadline Miss Detection with SCHED_DEADLINEYoshitake Kobayashi
 
Moving Forward: Overcoming Compatibility Issues BoFs
Moving Forward: Overcoming Compatibility Issues BoFs Moving Forward: Overcoming Compatibility Issues BoFs
Moving Forward: Overcoming Compatibility Issues BoFs Yoshitake Kobayashi
 
Evaluation of Data Reliability on Linux File Systems
Evaluation of Data Reliability on Linux File SystemsEvaluation of Data Reliability on Linux File Systems
Evaluation of Data Reliability on Linux File SystemsYoshitake Kobayashi
 

More from Yoshitake Kobayashi (13)

InnerSource Learning Path (Japanese)
InnerSource Learning Path (Japanese)InnerSource Learning Path (Japanese)
InnerSource Learning Path (Japanese)
 
Civil Infrastructure Platform: Industrial Grade SLTS Kernel and Base-layer De...
Civil Infrastructure Platform: Industrial Grade SLTS Kernel and Base-layer De...Civil Infrastructure Platform: Industrial Grade SLTS Kernel and Base-layer De...
Civil Infrastructure Platform: Industrial Grade SLTS Kernel and Base-layer De...
 
SLTS kernel and base-layer development in the Civil Infrastructure Platform
SLTS kernel and base-layer development in the Civil Infrastructure PlatformSLTS kernel and base-layer development in the Civil Infrastructure Platform
SLTS kernel and base-layer development in the Civil Infrastructure Platform
 
Time is ready for the Civil Infrastructure Platform
Time is ready for the Civil Infrastructure PlatformTime is ready for the Civil Infrastructure Platform
Time is ready for the Civil Infrastructure Platform
 
Introducing the Civil Infrastructure Platform Project
Introducing the Civil Infrastructure Platform ProjectIntroducing the Civil Infrastructure Platform Project
Introducing the Civil Infrastructure Platform Project
 
Introducing the Civil Infrastructure Platform
Introducing the Civil Infrastructure PlatformIntroducing the Civil Infrastructure Platform
Introducing the Civil Infrastructure Platform
 
The Latest Status of CE Workgroup Shared Embedded Linux Distribution Project
 The Latest Status of CE Workgroup Shared Embedded Linux Distribution Project The Latest Status of CE Workgroup Shared Embedded Linux Distribution Project
The Latest Status of CE Workgroup Shared Embedded Linux Distribution Project
 
Applying Linux to the Civil Infrastructure
Applying Linux to the Civil InfrastructureApplying Linux to the Civil Infrastructure
Applying Linux to the Civil Infrastructure
 
Using Embedded Linux for Infrastructure Systems
Using Embedded Linux for Infrastructure SystemsUsing Embedded Linux for Infrastructure Systems
Using Embedded Linux for Infrastructure Systems
 
Using Real-Time Patch with LTSI Kernel
Using Real-Time Patch with LTSI KernelUsing Real-Time Patch with LTSI Kernel
Using Real-Time Patch with LTSI Kernel
 
Deadline Miss Detection with SCHED_DEADLINE
Deadline Miss Detection with SCHED_DEADLINEDeadline Miss Detection with SCHED_DEADLINE
Deadline Miss Detection with SCHED_DEADLINE
 
Moving Forward: Overcoming Compatibility Issues BoFs
Moving Forward: Overcoming Compatibility Issues BoFs Moving Forward: Overcoming Compatibility Issues BoFs
Moving Forward: Overcoming Compatibility Issues BoFs
 
Evaluation of Data Reliability on Linux File Systems
Evaluation of Data Reliability on Linux File SystemsEvaluation of Data Reliability on Linux File Systems
Evaluation of Data Reliability on Linux File Systems
 

Recently uploaded

Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 

Recently uploaded (20)

Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 

Ineffective and Effective Ways To Find Out Latency Bottlenecks With Ftrace

  • 1. Ineffective and effective way to find out latency bottlenecks by Ftrace 16 Feb, 2012 Yoshitake Kobayashi Advanced Software Technology Group Corporate Software Engineering Center TOSHIBA CORPORATION Copyright 2012, Toshiba Corporation.
  • 2. Agenda About Ftrace Problem definition Some actual examples to fix latency issue Conclusion 2Yoshitake Kobayashi - Embedded Linux Conference 2012 -
  • 3. About Ftrace Ftrace is a lightweight internal tracer Event tracer Function tracer Latency tracer Stack tracer The trace log stay in the kernel ring buffer 3Yoshitake Kobayashi - Embedded Linux Conference 2012 - The trace log stay in the kernel ring buffer Documentation available in kernel source tree Documentation/trace/ftrace.txt Documentation/trace/ftrace-design.txt
  • 4. About Ftrace Ftrace is a lightweight internal tracer Event tracer Function tracer Latency tracer Stack tracer The trace log stay in the kernel ring buffer 4Yoshitake Kobayashi - Embedded Linux Conference 2012 - The trace log stay in the kernel ring buffer Documentation available in kernel source tree Documentation/trace/ftrace.txt Documentation/trace/ftrace-design.txt
  • 5. Definition of latency Latency is a measure of time delay experienced in a system, the precise definition of which depends on the system and the time being measured. Latencies may have different meaning in different contexts. * http://en.wikipedia.org/wiki/Latency_(engineering) In this presentation 5Yoshitake Kobayashi - Embedded Linux Conference 2012 - time expected time to wake up actual time to wake up latency
  • 6. Problem definition Evaluate the scheduling latency by Cyclictest*…. but 0 10000 20000 30000 40000 50000 60000 70000 80000 90000 100000 0 200 400 600 800 1000 latency(μ秒) 回数 Latency (us) Numberofsamples 23ms Why? 6Yoshitake Kobayashi - Embedded Linux Conference 2012 - 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000 0 100 200 300 400 500 600 700 800 900 1000 latency(μ秒) 回数 latency (us) Numberofsamples Why? *Cyclictest (RTwiki): https://rt.wiki.kernel.org/articles/c/y/c/Cyclictest.html
  • 7. Problem definition Evaluate the scheduling latency by Cyclictest*…. but 0 10000 20000 30000 40000 50000 60000 70000 80000 90000 100000 0 200 400 600 800 1000 latency(μ秒) 回数 Latency (us) Numberofsamples 23ms Why? 7Yoshitake Kobayashi - Embedded Linux Conference 2012 - Need to identify the cause of above problems 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000 0 100 200 300 400 500 600 700 800 900 1000 latency(μ秒) 回数 latency (us) Numberofsamples Why? *Cyclictest (RTwiki): https://rt.wiki.kernel.org/articles/c/y/c/Cyclictest.html
  • 8. Problem definition Evaluate the scheduling latency by Cyclictest*…. but 0 10000 20000 30000 40000 50000 60000 70000 80000 90000 100000 0 200 400 600 800 1000 latency(μ秒) 回数 Latency (us) Numberofsamples 23ms Why? Case 1 Case 2 8Yoshitake Kobayashi - Embedded Linux Conference 2012 - Need to identify the cause of above issues 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000 0 100 200 300 400 500 600 700 800 900 1000 latency(μ秒) 回数 latency (us) Numberofsamples Why? Case 2 *Cyclictest (RTwiki): https://rt.wiki.kernel.org/articles/c/y/c/Cyclictest.html
  • 9. First mistake Incorrect use of trace-cmd # trace-cmd start –p function_graph ; target_prog ; trace-cmd stop The target_prog start with higher priority and stop if it missed 9Yoshitake Kobayashi - Embedded Linux Conference 2012 - deadline
  • 10. First mistake Incorrect use of trace-cmd # iostress ‘50%’ & # trace-cmd start –p function_graph ; target_prog ; trace-cmd stop The target_prog start with higher priority and stop if it missed SCHED_OTHER SCHED_FIFO SCHED_OTHER 10Yoshitake Kobayashi - Embedded Linux Conference 2012 - deadline Ftrace record the logs in the kernel ring buffer The older data just overwritten by new data Evaluate with too much workload is not a good idea Need to care about scheduling priority Run a shell with higher priority Stop logging in the target_prog
  • 11. An example for stop tracing (Cyclictest) rt-tests/src/cyclictest/cyclictest.c Cyclictest has following option “-b USEC” : send break trace command when latency > USEC if (!stopped && tracelimit && (diff > tracelimit)) { stopped++; tracing(0); shutdown++; 11Yoshitake Kobayashi - Embedded Linux Conference 2012 - shutdown++; pthread_mutex_lock(&break_thread_id_lock); if (break_thread_id == 0) break_thread_id = stat->tid; break_thread_value = diff; pthread_mutex_unlock(&break_thread_id_lock); }
  • 12. Case1: Evaluation environment CPU: Intel Pentium 4(2.66GHz) Memory: 512MB Kernel: Linux 2.6.31.12-rt21 echo -1 > /proc/sys/kernel/sched_rt_runtime_us 12Yoshitake Kobayashi - Embedded Linux Conference 2012 -
  • 13. Case1: Summary of evaluation result Evaluated without other CPU work load Cycle: 300us 0 10000 20000 30000 40000 50000 60000 70000 80000 90000 100000 0 200 400 600 800 1000 latency(μ秒) 回数 0 10000 20000 30000 40000 50000 60000 70000 80000 90000 100000 0 200 400 600 800 1000 latency(μ秒) 回数 0 10000 20000 30000 40000 50000 60000 70000 80000 90000 100000 0 200 400 600 800 1000 latency(μ秒) 回数 Cycle: 500us Cycle: 1000us Latency (us) Numberofsamples Latency (us) Numberofsamples Latency (us) Numberofsamples 13Yoshitake Kobayashi - Embedded Linux Conference 2012 - Cycle [us] Min. Ave. Max Number of DL misses 300 18.118 19.162 25.629 0 500 18.171 19.269 32.615 0 1000 17.935 19.361 27207.563 1
  • 14. Case1: Check the trace log (First attempt) Stop the Cyclictest if the evaluated latency is higher than the given maximum Backward search for sys_clock_gettime() function Check the log for sys_rt_sigtimedwait() function 5586.207717 | 0) |5586.207717 | 0) |5586.207717 | 0) |5586.207717 | 0) | activate_task() {activate_task() {activate_task() {activate_task() { 5586.207718 | 0) | enqueue_task() {5586.207718 | 0) | enqueue_task() {5586.207718 | 0) | enqueue_task() {5586.207718 | 0) | enqueue_task() { 5586.207718 | 0) | enqueue_task_rt() {5586.207718 | 0) | enqueue_task_rt() {5586.207718 | 0) | enqueue_task_rt() {5586.207718 | 0) | enqueue_task_rt() { 5586.207718 | 0) | cpupri_set() {5586.207718 | 0) | cpupri_set() {5586.207718 | 0) | cpupri_set() {5586.207718 | 0) | cpupri_set() { 14Yoshitake Kobayashi - Embedded Linux Conference 2012 - 5586.207719 | 0) 0.407 us | _atomic_spin_lock_irqsave();5586.207719 | 0) 0.407 us | _atomic_spin_lock_irqsave();5586.207719 | 0) 0.407 us | _atomic_spin_lock_irqsave();5586.207719 | 0) 0.407 us | _atomic_spin_lock_irqsave(); 5586.207720 | 0) 0.448 us | _atomic_spin_unlock_irqrestore();5586.207720 | 0) 0.448 us | _atomic_spin_unlock_irqrestore();5586.207720 | 0) 0.448 us | _atomic_spin_unlock_irqrestore();5586.207720 | 0) 0.448 us | _atomic_spin_unlock_irqrestore(); 5586.207721 | 0) 0.430 us | _atomic_spin_lock_irqsave();5586.207721 | 0) 0.430 us | _atomic_spin_lock_irqsave();5586.207721 | 0) 0.430 us | _atomic_spin_lock_irqsave();5586.207721 | 0) 0.430 us | _atomic_spin_lock_irqsave(); 5586.207722 | 0) 0.444 us | _atomic_spin_unlock_irqrestore();5586.207722 | 0) 0.444 us | _atomic_spin_unlock_irqrestore();5586.207722 | 0) 0.444 us | _atomic_spin_unlock_irqrestore();5586.207722 | 0) 0.444 us | _atomic_spin_unlock_irqrestore(); 5586.207723 | 0) 4.391 us | }5586.207723 | 0) 4.391 us | }5586.207723 | 0) 4.391 us | }5586.207723 | 0) 4.391 us | } 5586.207723 | 0) 0.394 us | update_rt_migration();5586.207723 | 0) 0.394 us | update_rt_migration();5586.207723 | 0) 0.394 us | update_rt_migration();5586.207723 | 0) 0.394 us | update_rt_migration(); 5586.207724 | 0) 6.158 us | }5586.207724 | 0) 6.158 us | }5586.207724 | 0) 6.158 us | }5586.207724 | 0) 6.158 us | } 5586.2077255586.2077255586.2077255586.207725 | 0) 7.040 us | }| 0) 7.040 us | }| 0) 7.040 us | }| 0) 7.040 us | } 5586.2358355586.2358355586.2358355586.235835 | 0) ! 28117.53 us || 0) ! 28117.53 us || 0) ! 28117.53 us || 0) ! 28117.53 us | }}}} 5586.235836 | 0) | check_preempt_curr_rt() {5586.235836 | 0) | check_preempt_curr_rt() {5586.235836 | 0) | check_preempt_curr_rt() {5586.235836 | 0) | check_preempt_curr_rt() { 5586.235836 | 0) 0.415 us | resched_task();5586.235836 | 0) 0.415 us | resched_task();5586.235836 | 0) 0.415 us | resched_task();5586.235836 | 0) 0.415 us | resched_task(); 5586.235837 | 0) 1.397 us | }5586.235837 | 0) 1.397 us | }5586.235837 | 0) 1.397 us | }5586.235837 | 0) 1.397 us | } Here!
  • 15. Case1: Check the kernel source (First attempt) The activate_task() defined in kernel/sched.c static voidstatic voidstatic voidstatic void inc_nr_runninginc_nr_runninginc_nr_runninginc_nr_running(struct rq *rq)(struct rq *rq)(struct rq *rq)(struct rq *rq) {{{{ rqrqrqrq---->nr_running++;>nr_running++;>nr_running++;>nr_running++; }}}} ・・・・ ・・・・ ・・・・ static voidstatic voidstatic voidstatic void activate_taskactivate_taskactivate_taskactivate_task(struct rq *rq, struct task_struct *p, int(struct rq *rq, struct task_struct *p, int(struct rq *rq, struct task_struct *p, int(struct rq *rq, struct task_struct *p, int Only an increment instruction 15Yoshitake Kobayashi - Embedded Linux Conference 2012 - activate_taskactivate_taskactivate_taskactivate_task(struct rq *rq, struct task_struct *p, int(struct rq *rq, struct task_struct *p, int(struct rq *rq, struct task_struct *p, int(struct rq *rq, struct task_struct *p, int wakeup, bool head)wakeup, bool head)wakeup, bool head)wakeup, bool head) {{{{ if (task_contributes_to_load(p))if (task_contributes_to_load(p))if (task_contributes_to_load(p))if (task_contributes_to_load(p)) rqrqrqrq---->nr_uninterruptible>nr_uninterruptible>nr_uninterruptible>nr_uninterruptible--------;;;; enqueue_task(rq, p, wakeup, head);enqueue_task(rq, p, wakeup, head);enqueue_task(rq, p, wakeup, head);enqueue_task(rq, p, wakeup, head); inc_nr_runninginc_nr_runninginc_nr_runninginc_nr_running(rq);(rq);(rq);(rq); }}}} Here!
  • 16. Case1: Check the trace log (Second attempt) Result of the function graph tracer 184.976213 | 0) 0.404 us | pre_schedule_rt();184.976213 | 0) 0.404 us | pre_schedule_rt();184.976213 | 0) 0.404 us | pre_schedule_rt();184.976213 | 0) 0.404 us | pre_schedule_rt(); 184.976214 | 0) | put_prev_task_rt() {184.976214 | 0) | put_prev_task_rt() {184.976214 | 0) | put_prev_task_rt() {184.976214 | 0) | put_prev_task_rt() { 184.976214184.976214184.976214184.976214 | 0) || 0) || 0) || 0) | update_curr_rt() {update_curr_rt() {update_curr_rt() {update_curr_rt() { 185.004323185.004323185.004323185.004323 | 0) 0.537 us | sched_avg_update();| 0) 0.537 us | sched_avg_update();| 0) 0.537 us | sched_avg_update();| 0) 0.537 us | sched_avg_update(); 185.004324 | 0) ! 28109.76 us |185.004324 | 0) ! 28109.76 us |185.004324 | 0) ! 28109.76 us |185.004324 | 0) ! 28109.76 us | }}}} 185.004324 | 0) ! 28110.62 us | }185.004324 | 0) ! 28110.62 us | }185.004324 | 0) ! 28110.62 us | }185.004324 | 0) ! 28110.62 us | } 185.004325 | 0) 0.496 us | pick_next_task_rt();185.004325 | 0) 0.496 us | pick_next_task_rt();185.004325 | 0) 0.496 us | pick_next_task_rt();185.004325 | 0) 0.496 us | pick_next_task_rt(); 185.004326 | 0) 0.442 us | perf_counter_task_sched_out();185.004326 | 0) 0.442 us | perf_counter_task_sched_out();185.004326 | 0) 0.442 us | perf_counter_task_sched_out();185.004326 | 0) 0.442 us | perf_counter_task_sched_out(); 185.004327 | 0) 0.434 us | native_load_sp0();185.004327 | 0) 0.434 us | native_load_sp0();185.004327 | 0) 0.434 us | native_load_sp0();185.004327 | 0) 0.434 us | native_load_sp0(); 185.004328 | 0) 0.465 us | native_load_tls();185.004328 | 0) 0.465 us | native_load_tls();185.004328 | 0) 0.465 us | native_load_tls();185.004328 | 0) 0.465 us | native_load_tls(); Here! 16Yoshitake Kobayashi - Embedded Linux Conference 2012 - 185.004328 | 0) 0.465 us | native_load_tls();185.004328 | 0) 0.465 us | native_load_tls();185.004328 | 0) 0.465 us | native_load_tls();185.004328 | 0) 0.465 us | native_load_tls();
  • 17. Case1: Check the kernel source (Second attempt) Update_curr_rt() defined in kernel/sched.c static voidstatic voidstatic voidstatic void update_curr_rtupdate_curr_rtupdate_curr_rtupdate_curr_rt(struct rq *rq)(struct rq *rq)(struct rq *rq)(struct rq *rq) {{{{ struct task_struct *curr = rqstruct task_struct *curr = rqstruct task_struct *curr = rqstruct task_struct *curr = rq---->curr;>curr;>curr;>curr; struct sched_rt_entity *rt_se = &currstruct sched_rt_entity *rt_se = &currstruct sched_rt_entity *rt_se = &currstruct sched_rt_entity *rt_se = &curr---->rt;>rt;>rt;>rt; struct rt_rq *rt_rq = rt_rq_of_se(rt_se);struct rt_rq *rt_rq = rt_rq_of_se(rt_se);struct rt_rq *rt_rq = rt_rq_of_se(rt_se);struct rt_rq *rt_rq = rt_rq_of_se(rt_se); u64 delta_exec;u64 delta_exec;u64 delta_exec;u64 delta_exec; if (!task_has_rt_policy(curr))if (!task_has_rt_policy(curr))if (!task_has_rt_policy(curr))if (!task_has_rt_policy(curr)) return;return;return;return; delta_exec = rqdelta_exec = rqdelta_exec = rqdelta_exec = rq---->clock>clock>clock>clock ---- currcurrcurrcurr---->se.exec_start;>se.exec_start;>se.exec_start;>se.exec_start; if (unlikely((s64)delta_exec < 0))if (unlikely((s64)delta_exec < 0))if (unlikely((s64)delta_exec < 0))if (unlikely((s64)delta_exec < 0)) 17Yoshitake Kobayashi - Embedded Linux Conference 2012 - if (unlikely((s64)delta_exec < 0))if (unlikely((s64)delta_exec < 0))if (unlikely((s64)delta_exec < 0))if (unlikely((s64)delta_exec < 0)) delta_exec = 0;delta_exec = 0;delta_exec = 0;delta_exec = 0; schedstat_set(currschedstat_set(currschedstat_set(currschedstat_set(curr---->se.exec_max, max(curr>se.exec_max, max(curr>se.exec_max, max(curr>se.exec_max, max(curr---->se.exec_max, delta_exec));>se.exec_max, delta_exec));>se.exec_max, delta_exec));>se.exec_max, delta_exec)); currcurrcurrcurr---->se.sum_exec_runtime += delta_exec;>se.sum_exec_runtime += delta_exec;>se.sum_exec_runtime += delta_exec;>se.sum_exec_runtime += delta_exec; account_group_exec_runtime(curr, delta_exec);account_group_exec_runtime(curr, delta_exec);account_group_exec_runtime(curr, delta_exec);account_group_exec_runtime(curr, delta_exec); currcurrcurrcurr---->se.exec_start = rq>se.exec_start = rq>se.exec_start = rq>se.exec_start = rq---->clock;>clock;>clock;>clock; cpuacct_charge(curr, delta_exec);cpuacct_charge(curr, delta_exec);cpuacct_charge(curr, delta_exec);cpuacct_charge(curr, delta_exec); sched_rt_avg_update(rq, delta_exec);sched_rt_avg_update(rq, delta_exec);sched_rt_avg_update(rq, delta_exec);sched_rt_avg_update(rq, delta_exec); Maybe here! But different result.
  • 18. Case1: Check the trace log (Third attempt) 641.941738 | 0) | schedule() {641.941738 | 0) | schedule() {641.941738 | 0) | schedule() {641.941738 | 0) | schedule() { 641.941738 | 0) |641.941738 | 0) |641.941738 | 0) |641.941738 | 0) | __schedule() {__schedule() {__schedule() {__schedule() { 641.941739 | 0) 0.421 us | rcu_qsctr_inc();641.941739 | 0) 0.421 us | rcu_qsctr_inc();641.941739 | 0) 0.421 us | rcu_qsctr_inc();641.941739 | 0) 0.421 us | rcu_qsctr_inc(); ・・・・・・・・・・・・ 641.941751 | 0) | pre_schedule_rt() {641.941751 | 0) | pre_schedule_rt() {641.941751 | 0) | pre_schedule_rt() {641.941751 | 0) | pre_schedule_rt() { 641.941752 | 0) 0.418 us | pull_rt_task();641.941752 | 0) 0.418 us | pull_rt_task();641.941752 | 0) 0.418 us | pull_rt_task();641.941752 | 0) 0.418 us | pull_rt_task(); 641.941752 | 0) 1.281 us | }641.941752 | 0) 1.281 us | }641.941752 | 0) 1.281 us | }641.941752 | 0) 1.281 us | } 641.941753 | 0) | put_prev_task_rt() {641.941753 | 0) | put_prev_task_rt() {641.941753 | 0) | put_prev_task_rt() {641.941753 | 0) | put_prev_task_rt() { 641.941753 | 0) | update_curr_rt() {641.941753 | 0) | update_curr_rt() {641.941753 | 0) | update_curr_rt() {641.941753 | 0) | update_curr_rt() { 641.941754 | 0) 0.426 us | sched_avg_update();641.941754 | 0) 0.426 us | sched_avg_update();641.941754 | 0) 0.426 us | sched_avg_update();641.941754 | 0) 0.426 us | sched_avg_update(); 641.941755 | 0) 1.310 us | }641.941755 | 0) 1.310 us | }641.941755 | 0) 1.310 us | }641.941755 | 0) 1.310 us | } 641.941755 | 0) 2.178 us | }641.941755 | 0) 2.178 us | }641.941755 | 0) 2.178 us | }641.941755 | 0) 2.178 us | } 641.941756641.941756641.941756641.941756 | 0) 0.423 us | pick_next_task_fair();| 0) 0.423 us | pick_next_task_fair();| 0) 0.423 us | pick_next_task_fair();| 0) 0.423 us | pick_next_task_fair(); 18Yoshitake Kobayashi - Embedded Linux Conference 2012 - 641.941756641.941756641.941756641.941756 | 0) 0.423 us | pick_next_task_fair();| 0) 0.423 us | pick_next_task_fair();| 0) 0.423 us | pick_next_task_fair();| 0) 0.423 us | pick_next_task_fair(); 641.969863641.969863641.969863641.969863 | 0) 0.839 us | pick_next_task_rt();| 0) 0.839 us | pick_next_task_rt();| 0) 0.839 us | pick_next_task_rt();| 0) 0.839 us | pick_next_task_rt(); 641.969864 | 0) 0.422 us | pick_next_task_fair();641.969864 | 0) 0.422 us | pick_next_task_fair();641.969864 | 0) 0.422 us | pick_next_task_fair();641.969864 | 0) 0.422 us | pick_next_task_fair(); 641.969865 | 0) 0.421 us | pick_next_task_idle();641.969865 | 0) 0.421 us | pick_next_task_idle();641.969865 | 0) 0.421 us | pick_next_task_idle();641.969865 | 0) 0.421 us | pick_next_task_idle(); 641.969866 | 0) 0.461 us | perf_counter_task_sched_out();641.969866 | 0) 0.461 us | perf_counter_task_sched_out();641.969866 | 0) 0.461 us | perf_counter_task_sched_out();641.969866 | 0) 0.461 us | perf_counter_task_sched_out(); Here!
  • 19. Case1: Check the kernel source (Third attempt) Pick_next_task() also defined in kernel/sched.c static inline struct task_struct *static inline struct task_struct *static inline struct task_struct *static inline struct task_struct * pick_next_taskpick_next_taskpick_next_taskpick_next_task(struct rq *rq)(struct rq *rq)(struct rq *rq)(struct rq *rq) {{{{ ・・・・・・・・・・・・ if (likely(rqif (likely(rqif (likely(rqif (likely(rq---->nr_running == rq>nr_running == rq>nr_running == rq>nr_running == rq---->cfs.nr_running)) {>cfs.nr_running)) {>cfs.nr_running)) {>cfs.nr_running)) { p = fair_sched_class.pick_next_task(rq);p = fair_sched_class.pick_next_task(rq);p = fair_sched_class.pick_next_task(rq);p = fair_sched_class.pick_next_task(rq); if (likely(p))if (likely(p))if (likely(p))if (likely(p)) return p;return p;return p;return p; }}}} class = sched_class_highest;class = sched_class_highest;class = sched_class_highest;class = sched_class_highest; 19Yoshitake Kobayashi - Embedded Linux Conference 2012 - class = sched_class_highest;class = sched_class_highest;class = sched_class_highest;class = sched_class_highest; for ( ; ; ) {for ( ; ; ) {for ( ; ; ) {for ( ; ; ) { p = classp = classp = classp = class---->pick_next_task(rq);>pick_next_task(rq);>pick_next_task(rq);>pick_next_task(rq); if (p)if (p)if (p)if (p) return p;return p;return p;return p; ・・・・・・・・・・・・ class = classclass = classclass = classclass = class---->next;>next;>next;>next; }}}} }}}} asmlinkage void __schedasmlinkage void __schedasmlinkage void __schedasmlinkage void __sched __schedule__schedule__schedule__schedule(void)(void)(void)(void) {{{{ ・・・・・・・・・・・・ put_prev_task(rq, prev);put_prev_task(rq, prev);put_prev_task(rq, prev);put_prev_task(rq, prev); next =next =next =next = pick_next_taskpick_next_taskpick_next_taskpick_next_task(rq);(rq);(rq);(rq); Different result again!
  • 20. Case1: Summary of the evaluation results Latency occurs in different points each time and difficult to identify the cause Only occurs on specific hardware Maybe SMI (System Management Interrupt) 20Yoshitake Kobayashi - Embedded Linux Conference 2012 -
  • 21. Learn a lesson from Case1 One shot trace log is not enough 21Yoshitake Kobayashi - Embedded Linux Conference 2012 -
  • 22. Case2: Evaluation environment CPU: ARM Coretex-A8 Memory: 512MB Kernel: Linux 2.6.31.12-rt21 A function graph tracer patch applied http://elinux.org/Ftrace_Function_Graph_ARM 22Yoshitake Kobayashi - Embedded Linux Conference 2012 -
  • 23. Case2: Summary of evaluation result (First attempt) Evaluated without other CPU workload Cycle: 300us Cycle: 500us Cycle: 1000us 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000 0 100 200 300 400 500 600 700 800 900 1000 latency(μ秒) 回数 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000 0 100 200 300 400 500 600 700 800 900 1000 latency(μ秒) 回数 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000 0 100 200 300 400 500 600 700 800 900 1000 latency(μ秒) 回数 Latency (us) Numberofsamples Latency (us) Numberofsamples Latency (us) Numberofsamples 23Yoshitake Kobayashi - Embedded Linux Conference 2012 - Cycle [µs] Min. Ave. Max # of DL misses 300 11 19.025 921 3018 500 11 19.458 684 5026 1000 11 23.011 717 0
  • 24. Case2: Summary of evaluation result (First attempt) Evaluated with approx 60% CPU workload Cycle: 300us Cycle: 500us Cycle: 1000us 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000 0 100 200 300 400 500 600 700 800 900 1000 latency(μ秒) 回数 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000 0 100 200 300 400 500 600 700 800 900 1000 latency(μ秒) 回数 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000 0 100 200 300 400 500 600 700 800 900 1000 latency(μ秒) 回数 Latency (us) Numberofsamples Latency (us) Numberofsamples Latency (us) Numberofsamples 24Yoshitake Kobayashi - Embedded Linux Conference 2012 - Cycle [µs] Min. Ave. Max # of DL misses 300 13 34.165 980 3018 500 13 33.234 760 5025 1000 12 35.014 714 0
  • 25. 504.294647 | 0) | sys_timer_getoverrun() {504.294647 | 0) | sys_timer_getoverrun() {504.294647 | 0) | sys_timer_getoverrun() {504.294647 | 0) | sys_timer_getoverrun() { 504.294649 | 0) 2.000 us | lock_timer();504.294649 | 0) 2.000 us | lock_timer();504.294649 | 0) 2.000 us | lock_timer();504.294649 | 0) 2.000 us | lock_timer(); 504.294654 | 0) 6.625 us | }504.294654 | 0) 6.625 us | }504.294654 | 0) 6.625 us | }504.294654 | 0) 6.625 us | } 504.294656 | 0) |504.294656 | 0) |504.294656 | 0) |504.294656 | 0) | sys_rt_sigtimedwait() {sys_rt_sigtimedwait() {sys_rt_sigtimedwait() {sys_rt_sigtimedwait() { 504.294658 | 0) | dequeue_signal() {504.294658 | 0) | dequeue_signal() {504.294658 | 0) | dequeue_signal() {504.294658 | 0) | dequeue_signal() { 504.294660 | 0) | __dequeue_signal() {504.294660 | 0) | __dequeue_signal() {504.294660 | 0) | __dequeue_signal() {504.294660 | 0) | __dequeue_signal() { 504.294662 | 0) 1.625 us | next_signal();504.294662 | 0) 1.625 us | next_signal();504.294662 | 0) 1.625 us | next_signal();504.294662 | 0) 1.625 us | next_signal(); ・・・・ ・・・・ ・・・・ ・・・・ ・・・・ Case2: Check the trace log (First attempt) ? 25Yoshitake Kobayashi - Embedded Linux Conference 2012 - ・・・・ ・・・・ ・・・・ ・・・・ ・・・・ 504.297163 | 0) | recalc_sigpending() {504.297163 | 0) | recalc_sigpending() {504.297163 | 0) | recalc_sigpending() {504.297163 | 0) | recalc_sigpending() { 504.297165 | 0) | recalc_sigpending_tsk() {504.297165 | 0) | recalc_sigpending_tsk() {504.297165 | 0) | recalc_sigpending_tsk() {504.297165 | 0) | recalc_sigpending_tsk() { 504.297168 | 0) 5.250 us | }504.297168 | 0) 5.250 us | }504.297168 | 0) 5.250 us | }504.297168 | 0) 5.250 us | } 504.297170 | 0) ! 2513.625 us |504.297170 | 0) ! 2513.625 us |504.297170 | 0) ! 2513.625 us |504.297170 | 0) ! 2513.625 us | }}}} 504.297175 | 0) |504.297175 | 0) |504.297175 | 0) |504.297175 | 0) | sys_clock_gettime()sys_clock_gettime()sys_clock_gettime()sys_clock_gettime() {{{{ 504.297177 | 0) | posix_ktime_get_ts() {504.297177 | 0) | posix_ktime_get_ts() {504.297177 | 0) | posix_ktime_get_ts() {504.297177 | 0) | posix_ktime_get_ts() { 504.297178 | 0) | ktime_get_ts() {504.297178 | 0) | ktime_get_ts() {504.297178 | 0) | ktime_get_ts() {504.297178 | 0) | ktime_get_ts() { 504.297181 | 0) | asm_do_IRQ() {504.297181 | 0) | asm_do_IRQ() {504.297181 | 0) | asm_do_IRQ() {504.297181 | 0) | asm_do_IRQ() { Here! ?
  • 26. 504.295229 | 0) | __do_softirq() {504.295229 | 0) | __do_softirq() {504.295229 | 0) | __do_softirq() {504.295229 | 0) | __do_softirq() { 504.295232 | 0) | run_timer_softirq() {504.295232 | 0) | run_timer_softirq() {504.295232 | 0) | run_timer_softirq() {504.295232 | 0) | run_timer_softirq() { 504.295234 | 0) 1.750 us | hrtimer_run_pending();504.295234 | 0) 1.750 us | hrtimer_run_pending();504.295234 | 0) 1.750 us | hrtimer_run_pending();504.295234 | 0) 1.750 us | hrtimer_run_pending(); 504.295239 | 0) 1.750 us | preempt_schedule();504.295239 | 0) 1.750 us | preempt_schedule();504.295239 | 0) 1.750 us | preempt_schedule();504.295239 | 0) 1.750 us | preempt_schedule(); 504.295243 | 0) | ehci_watchdog() {504.295243 | 0) | ehci_watchdog() {504.295243 | 0) | ehci_watchdog() {504.295243 | 0) | ehci_watchdog() { 504.295245 | 0) | ehci_work() {504.295245 | 0) | ehci_work() {504.295245 | 0) | ehci_work() {504.295245 | 0) | ehci_work() { 504.295250 | 0) 1.875 us | free_cached_itd_list();504.295250 | 0) 1.875 us | free_cached_itd_list();504.295250 | 0) 1.875 us | free_cached_itd_list();504.295250 | 0) 1.875 us | free_cached_itd_list(); 504.295256 | 0) 4.625 us | qh_completions();504.295256 | 0) 4.625 us | qh_completions();504.295256 | 0) 4.625 us | qh_completions();504.295256 | 0) 4.625 us | qh_completions(); 504.295265 | 0) 3.375 us | qh_completions();504.295265 | 0) 3.375 us | qh_completions();504.295265 | 0) 3.375 us | qh_completions();504.295265 | 0) 3.375 us | qh_completions(); 504.295272 | 0) 3.250 us | qh_completions();504.295272 | 0) 3.250 us | qh_completions();504.295272 | 0) 3.250 us | qh_completions();504.295272 | 0) 3.250 us | qh_completions(); 504.295279 | 0) 3.250 us | qh_completions();504.295279 | 0) 3.250 us | qh_completions();504.295279 | 0) 3.250 us | qh_completions();504.295279 | 0) 3.250 us | qh_completions(); 504.295286 | 0) 3.625 us | qh_completions();504.295286 | 0) 3.625 us | qh_completions();504.295286 | 0) 3.625 us | qh_completions();504.295286 | 0) 3.625 us | qh_completions(); Case2: Identify the latency (First attempt) Point! 26Yoshitake Kobayashi - Embedded Linux Conference 2012 - 504.295286 | 0) 3.625 us | qh_completions();504.295286 | 0) 3.625 us | qh_completions();504.295286 | 0) 3.625 us | qh_completions();504.295286 | 0) 3.625 us | qh_completions(); 504.295292 | 0) 3.375 us | qh_completions();504.295292 | 0) 3.375 us | qh_completions();504.295292 | 0) 3.375 us | qh_completions();504.295292 | 0) 3.375 us | qh_completions(); 504.295299 | 0) 3.375 us | qh_completions();504.295299 | 0) 3.375 us | qh_completions();504.295299 | 0) 3.375 us | qh_completions();504.295299 | 0) 3.375 us | qh_completions(); 504.295305 | 0) 3.375 us | qh_completions();504.295305 | 0) 3.375 us | qh_completions();504.295305 | 0) 3.375 us | qh_completions();504.295305 | 0) 3.375 us | qh_completions(); 504.295312 | 0) 3.250 us | qh_completions();504.295312 | 0) 3.250 us | qh_completions();504.295312 | 0) 3.250 us | qh_completions();504.295312 | 0) 3.250 us | qh_completions(); 504.295319 | 0) 3.250 us | qh_completions();504.295319 | 0) 3.250 us | qh_completions();504.295319 | 0) 3.250 us | qh_completions();504.295319 | 0) 3.250 us | qh_completions(); 504.295325 | 0) 3.375 us | qh_completions()504.295325 | 0) 3.375 us | qh_completions()504.295325 | 0) 3.375 us | qh_completions()504.295325 | 0) 3.375 us | qh_completions();;;; This function is used to process and free completed qtds for a qh. (drivers/usb/host/ehci-q.c)
  • 27. Case2: Cause and possible solution Cause Too many qh_completions() function call Default polling threshold is 100ms Possible solution Change the polling threshold to 10ms 27Yoshitake Kobayashi - Embedded Linux Conference 2012 - diffdiffdiffdiff --------git a/drivers/usb/host/ehcigit a/drivers/usb/host/ehcigit a/drivers/usb/host/ehcigit a/drivers/usb/host/ehci----hcd.c b/drivers/usb/host/ehcihcd.c b/drivers/usb/host/ehcihcd.c b/drivers/usb/host/ehcihcd.c b/drivers/usb/host/ehci----hcd.chcd.chcd.chcd.c index 0c9b7d2..db2efd2 100644index 0c9b7d2..db2efd2 100644index 0c9b7d2..db2efd2 100644index 0c9b7d2..db2efd2 100644 ------------ a/drivers/usb/host/ehcia/drivers/usb/host/ehcia/drivers/usb/host/ehcia/drivers/usb/host/ehci----hcd.chcd.chcd.chcd.c +++ b/drivers/usb/host/ehci+++ b/drivers/usb/host/ehci+++ b/drivers/usb/host/ehci+++ b/drivers/usb/host/ehci----hcd.chcd.chcd.chcd.c @@@@@@@@ ----83,7 +83,7 @@ static const char hcd_name [] = "ehci_hcd";83,7 +83,7 @@ static const char hcd_name [] = "ehci_hcd";83,7 +83,7 @@ static const char hcd_name [] = "ehci_hcd";83,7 +83,7 @@ static const char hcd_name [] = "ehci_hcd"; #define EHCI_TUNE_FLS 2 /* (small) 256 frame schedule */#define EHCI_TUNE_FLS 2 /* (small) 256 frame schedule */#define EHCI_TUNE_FLS 2 /* (small) 256 frame schedule */#define EHCI_TUNE_FLS 2 /* (small) 256 frame schedule */ #define EHCI_IAA_MSECS 10 /* arbitrary */#define EHCI_IAA_MSECS 10 /* arbitrary */#define EHCI_IAA_MSECS 10 /* arbitrary */#define EHCI_IAA_MSECS 10 /* arbitrary */ ----#define EHCI_IO_JIFFIES (HZ/10) /* io watchdog > irq_thresh */#define EHCI_IO_JIFFIES (HZ/10) /* io watchdog > irq_thresh */#define EHCI_IO_JIFFIES (HZ/10) /* io watchdog > irq_thresh */#define EHCI_IO_JIFFIES (HZ/10) /* io watchdog > irq_thresh */ +#define EHCI_IO_JIFFIES (HZ/100) /* io watchdog > irq_thresh */+#define EHCI_IO_JIFFIES (HZ/100) /* io watchdog > irq_thresh */+#define EHCI_IO_JIFFIES (HZ/100) /* io watchdog > irq_thresh */+#define EHCI_IO_JIFFIES (HZ/100) /* io watchdog > irq_thresh */ #define EHCI_ASYNC_JIFFIES (HZ/20) /* async idle timeout */#define EHCI_ASYNC_JIFFIES (HZ/20) /* async idle timeout */#define EHCI_ASYNC_JIFFIES (HZ/20) /* async idle timeout */#define EHCI_ASYNC_JIFFIES (HZ/20) /* async idle timeout */ #define EHCI_SHRINK_FRAMES 5 /* async qh unlink delay */#define EHCI_SHRINK_FRAMES 5 /* async qh unlink delay */#define EHCI_SHRINK_FRAMES 5 /* async qh unlink delay */#define EHCI_SHRINK_FRAMES 5 /* async qh unlink delay */
  • 28. Case2: Summary of evaluation result (Second attempt) Evaluated without other CPU workload 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000 0 100 200 300 400 500 600 700 800 900 1000 latency(μ秒) 回数 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000 0 100 200 300 400 500 600 700 800 900 1000 latency(μ秒) 回数 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000 0 100 200 300 400 500 600 700 800 900 1000 latency(μ秒) 回数 Cycle: 300us Cycle: 500us Cycle: 1000us Latency (us) Numberofsamples Latency (us) Numberofsamples Latency (us) Numberofsamples 28Yoshitake Kobayashi - Embedded Linux Conference 2012 - Cycle Min Ave. Max # of DL miss 300 11 17.590 209 0 500 11 17.583 117 0 1000 11 17.610 154 0
  • 29. Case2: Summary of evaluation result (Second attempt) Evaluated with approx 60% CPU workload 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000 0 100 200 300 400 500 600 700 800 900 1000 latency(μ秒) 回数 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000 0 100 200 300 400 500 600 700 800 900 1000 latency(μ秒) 回数 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1000000 0 100 200 300 400 500 600 700 800 900 1000 latency(μ秒) 回数 Cycle: 300us Cycle: 500us Cycle: 1000us Latency (us) Numberofsamples Latency (us) Numberofsamples Latency (us) Numberofsamples 29Yoshitake Kobayashi - Embedded Linux Conference 2012 - Cycle Min Ave Max # of DL miss 300 13 33.365 259 0 500 13 29.695 228 0 1000 12 33.222 199 0 Maximum latency reduced to 1/4
  • 30. Learn a lesson from Case2 Need to check what kind of functions are called Need to count the number of callees 30Yoshitake Kobayashi - Embedded Linux Conference 2012 -
  • 31. How to stabilize latency less than 50µs Required spec is the following: Single process No network, No USB devices, No graphic device, No storage device latency < 50µs Evaluation result (cyclictest: cycle=250µs) 4500000 5000000 31Yoshitake Kobayashi - Embedded Linux Conference 2012 - 0 500000 1000000 1500000 2000000 2500000 3000000 3500000 4000000 4500000 0 10 20 30 40 50 60 70 80 90 100 回数 latency(μ秒)Latency (us) Numberofsamples Why?
  • 32. Second mistake 2000 4000 6000 8000 10000 12000 14000 回数Numberofsamples cycle = 250µs 32Yoshitake Kobayashi - Embedded Linux Conference 2012 - Deadline miss ratio: 100% Function graph tracing requires additional overhead 0 2000 0 500 1000 1500 2000 latency(μ秒)Latency (us)
  • 33. Evaluate the latency with trace ON 2000 4000 6000 8000 10000 12000回数Numberofsamples cycle = 1500µs 33Yoshitake Kobayashi - Embedded Linux Conference 2012 - This result seems to have same characteristics Trace with the following setting Stop if the latency is more than 790µs Stop if the latency is between 575 and 585 0 2000 0 200 400 600 800 1000 latency(μ秒)Latency (us)
  • 34. Conclusion Evaluating with too much workload is not a good idea Required log data is lost One shot trace log is not enough Need to check performance characteristics between trace on and off Need to check if latency occurs at similar points Require some creative thinking to identify the cause 34Yoshitake Kobayashi - Embedded Linux Conference 2012 - Require some creative thinking to identify the cause Check if the same function takes different execution time Statistics approach may be applied Check all functions included in specific area to identify each function’s overhead Eliminate callee’s overhead to calculate pure execution time for each function’s algorithm
  • 35. 35Yoshitake Kobayashi - Embedded Linux Conference 2012 - 2008 / 7 / 24TOSHIBA Confidential