Tales of Linux Micro-benchmarks
Matt Fleming
@fleming_matt
Agenda
- Background: What is a micro-benchmark?
- Why all the hate?
- Case Studies
- When are they useful?
@fleming_mattTales of Linux micro-benchmarks
Micro-what?
Benchmark: A program to measure the
performance of a system, usually for comparison.
Models a real-life workload.
Micro-benchmark: A program to measure a (small
but important!) portion of a system.
Artificial/synthetic.
Why all the hate?
You need some understanding of OS and
runtime/toolchain.
Many do not actually test what the author
intended.
But this is simply a bug or user error; it doesn’t
invalidate the concept.
Why all the hate?
C: simple loop compiled with -O0 and -O2
for (i = 0; i < 1000000000; i++)
val = val * 2;
movl $0x0,-0xc(%rbp)
jmp 2
1:
shlq -0x8(%rbp)
addl $0x1,-0xc(%rbp)
2:
cmpl $999999999,-0xc(%rbp)
jle 1
Time: 2.238s (-O0)   Time: 0.001s (-O2)
Case study 1 - Siege
$ perf top
  5.62%  [kernel]  [k] task_cputime
  3.33%  [kernel]  [k] osq_lock
  2.58%  [kernel]  [k] thread_group_cputime

- task_cputime
   - 97.35% thread_group_cputime
        thread_group_cputime_adjusted
        do_sys_times
        sys_times
        entry_SYSCALL_64_fastpath
Case study 2 - lmbench
Measures fork() + exit()
Actually measures fork() + page fault + exit()
[Figure: faulting address, fault_around_bytes, #PF]
Case study 3 - hackbench
Message-passing micro-benchmark
Processes or threads
Pipes or sockets
70%
Case study 4 - pipetest
When are they useful?
After profiling your workload and identifying
bottlenecks
When they’re super simple
When they’ve been tested
Questions?