IntelON 2021 Processor Benchmarking

Processor Benchmarking
Brendan Gregg
Senior Performance Engineer
IntelON, Oct 2021

Case Study (2021)
New processor
Popular CPU benchmark: 2.6x faster than Intel
What would you do?

Active Benchmarking
Low-level analysis while it is still running
Not just statistical analysis of the results

Flame Graphs
Showed CPU time was in a single function
Flame Graphs are now in Intel VTune!

Instruction-Level Profiling...

linux$ perf top -e cycles:ppp -p 18641
Samples: 274K of event 'cycles:ppp', 4000 Hz, Event count (approx.): 61489970617
│ for(l = 2; l <= t; l++)
0.02 │20290: comisd %xmm2,%xmm1
0.05 │20294: ↑ jb 20270 <cpu_execute_event+0x30>
│ if (c % l == 0)
0.15 │20296: test $0x1,%bl
0.15 │20299: ↑ je 20270 <cpu_execute_event+0x30>
│ for(l = 2; l <= t; l++)
│2029b: mov $0x2,%ecx
│202a0: ↓ jmp 202c4 <cpu_execute_event+0x84>
│202a2: nopw 0x0(%rax,%rax,1)
3.57 │202a8: pxor %xmm0,%xmm0
0.21 │202ac: cvtsi2sd %rcx,%xmm0
0.26 │202b1: comisd %xmm0,%xmm1
3.51 │202b5: ↑ jb 20270 <cpu_execute_event+0x30>
│ if (c % l == 0)
0.09 │202b7: mov %rbx,%rax
0.02 │202ba: xor %edx,%edx
85.00 │202bc: div %rcx
0.12 │202bf: test %rdx,%rdx

85% of cycles in the div instruction

Instruction-level Analysis
● Determined it’s really a div benchmark
● Other processor has a faster div
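
The annotated source lines in the perf output (`for(l = 2; l <= t; l++)` and `if (c % l == 0)`) look like a trial-division prime test. Below is a minimal Python sketch of a loop of that shape, counting how many modulo operations it performs. Everything beyond those two source lines is an assumption: the function name, the starting value of 3, and the integer square-root bound are illustrative, not taken from the benchmark's code.

```python
import math


def count_primes(limit):
    """Count primes in [3, limit] by trial division.

    Returns (primes, modulo_ops). Hypothetical reconstruction: only the
    inner `for l` loop and the `c % l == 0` test appear in the slides.
    """
    primes = 0
    modulo_ops = 0
    for c in range(3, limit + 1):
        t = math.isqrt(c)  # upper bound for trial divisors
        is_prime = True
        for l in range(2, t + 1):
            modulo_ops += 1  # in compiled code, this is the div instruction
            if c % l == 0:
                is_prime = False
                break
        if is_prime:
            primes += 1
    return primes, modulo_ops


if __name__ == "__main__":
    primes, ops = count_primes(10_000)
    print(f"primes={primes} modulo_ops={ops}")
```

Nearly all of the work in the inner loop is the modulo, so a loop like this mostly measures integer division speed, which would be consistent with the 85% figure above.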

Netflix Cloud
● <1% div cycles
● Therefore, the performance win should be <1% (not 2.6x!)
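
The reasoning above can be checked with Amdahl's law. The sketch below is a rough estimate under stated assumptions: it treats the 2.6x benchmark result as the speedup of the div-bound work, and uses 1% as an upper bound for the fraction of cycles spent in div. The function name is illustrative.

```python
def overall_speedup(fraction, component_speedup):
    """Amdahl's law: speedup of the whole workload when only `fraction`
    of its run time becomes `component_speedup` times faster."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if component_speedup <= 0:
        raise ValueError("component_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / component_speedup)


if __name__ == "__main__":
    # Benchmark-like case: assume all time is in the faster operation.
    print(f"benchmark: {overall_speedup(1.0, 2.6):.3f}x")
    # Production-like case: assume at most 1% of cycles are in div.
    s = overall_speedup(0.01, 2.6)
    print(f"production: {s:.4f}x ({(1 - 1 / s) * 100:.2f}% less time)")
```

With these assumed inputs the production case comes out at roughly 1.006x, and no div speedup, however large, can remove more than the 1% of time spent in div.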

Challenges
● This benchmark is widely used
● Cycle analysis is nearly impossible in the cloud
○ Under hypervisors: Limited PMCs; no PEBS
● Accurate benchmarking needs senior engineers

My Benchmarking Checklist
1. Why not double?
2. Was it tuned?
3. Did it break limits?
4. Did it error?
5. Does it reproduce?
6. Does it matter?
7. Did it even happen?
https://www.brendangregg.com/blog/2018-06-30/benchmarking-checklist.html

An Exciting New Era of
Processor Innovation
Vertical stacking, new capabilities
More processors & competition

But also a Challenging New Era of
Processor Benchmarking
Increased demand
Hard to debug in the cloud
Popular benchmarks can be wrong

Good benchmarking
drives innovation

Thank you.
Brendan Gregg
@brendangregg
