SlideShare a Scribd company logo
Let’s talk about microbenchmarking
Andrey Akinshin, JetBrains
DotNext 2016 Helsinki
1/29
BenchmarkDotNet
2/29
Today’s environment
• BenchmarkDotNet v0.10.1
• Haswell Core i7-4702MQ CPU 2.20GHz, Windows 10
• .NET Framework 4.6.2 + clrjit/compatjit-v4.6.1586.0
• Source code: https://git.io/v1RRX
3/29
Today’s environment
• BenchmarkDotNet v0.10.1
• Haswell Core i7-4702MQ CPU 2.20GHz, Windows 10
• .NET Framework 4.6.2 + clrjit/compatjit-v4.6.1586.0
• Source code: https://git.io/v1RRX
Other environments:
C# compiler old csc / Roslyn
CLR CLR2 / CLR4 / CoreCLR / Mono
OS Windows / Linux / MacOS / FreeBSD
JIT LegacyJIT-x86 / LegacyJIT-x64 / RyuJIT-x64
GC MS (different CLRs) / Mono (Boehm/Sgen)
Toolchain JIT / NGen / .NET Native
Hardware ∞ different configurations
. . . . . .
And don’t forget about multiple versions
3/29
Count of iterations
A bad benchmark
// Resolution (Stopwatch) = 466 ns
// Latency (Stopwatch) = 18 ns
var sw = Stopwatch.StartNew();
Foo(); // 100 ns
sw.Stop();
WriteLine(sw.ElapsedMilliseconds);
4/29
Count of iterations
A bad benchmark
// Resolution (Stopwatch) = 466 ns
// Latency (Stopwatch) = 18 ns
var sw = Stopwatch.StartNew();
Foo(); // 100 ns
sw.Stop();
WriteLine(sw.ElapsedMilliseconds);
A better benchmark
var sw = Stopwatch.StartNew();
for (int i = 0; i < N; i++) // (N * 100 + eps) ns
Foo(); // 100 ns
sw.Stop();
var total = sw.ElapsedTicks / Stopwatch.Frequency;
WriteLine(total / N);
4/29
Several launches
Run 01 : 529.8674 ns/op
Run 02 : 532.7541 ns/op
Run 03 : 558.7448 ns/op
Run 04 : 555.6647 ns/op
Run 05 : 539.6401 ns/op
Run 06 : 539.3494 ns/op
Run 07 : 564.3222 ns/op
Run 08 : 551.9544 ns/op
Run 09 : 550.1608 ns/op
Run 10 : 533.0634 ns/op
5/29
Several launches
6/29
A simple case
Central limit theorem to the rescue!
7/29
Complicated cases
8/29
Latencies
Event Latency
1 CPU cycle 0.3 ns
Level 1 cache access 0.9 ns
Level 2 cache access 2.8 ns
Level 3 cache access 12.9 ns
Main memory access 120 ns
Solid-state disk I/O 50-150 µs
Rotational disk I/O 1-10 ms
Internet: SF to NYC 40 ms
Internet: SF to UK 81 ms
Internet: SF to Australia 183 ms
OS virtualization reboot 4 sec
Hardware virtualization reboot 40 sec
Physical system reboot 5 min
© Systems Performance: Enterprise and the Cloud
9/29
Sum of elements
const int N = 1024;
int[,] a = new int[N, N];
[Benchmark]
public double SumIJ()
{
var sum = 0;
for (int i = 0; i < N; i++)
for (int j = 0; j < N; j++)
sum += a[i, j];
return sum;
}
[Benchmark]
public double SumJI()
{
var sum = 0;
for (int j = 0; j < N; j++)
for (int i = 0; i < N; i++)
sum += a[i, j];
return sum;
}
10/29
Sum of elements
const int N = 1024;
int[,] a = new int[N, N];
[Benchmark]
public double SumIJ()
{
var sum = 0;
for (int i = 0; i < N; i++)
for (int j = 0; j < N; j++)
sum += a[i, j];
return sum;
}
[Benchmark]
public double SumJI()
{
var sum = 0;
for (int j = 0; j < N; j++)
for (int i = 0; i < N; i++)
sum += a[i, j];
return sum;
}
CPU cache effect:
SumIJ SumJI
LegacyJIT-x86 ≈1.3ms ≈4.0ms
10/29
Cache-sensitive benchmarks
Let’s run a benchmark several times:
int[] x = new int[128 * 1024 * 1024];
for (int iter = 0; iter < 5; iter++)
{
var sw = Stopwatch.StartNew();
for (int i = 0; i < x.Length; i += 16)
x[i]++;
sw.Stop();
WriteLine(sw.ElapsedMilliseconds);
}
11/29
Cache-sensitive benchmarks
Let’s run a benchmark several times:
int[] x = new int[128 * 1024 * 1024];
for (int iter = 0; iter < 5; iter++)
{
var sw = Stopwatch.StartNew();
for (int i = 0; i < x.Length; i += 16)
x[i]++;
sw.Stop();
WriteLine(sw.ElapsedMilliseconds);
}
176 // not warmed
81 // still not warmed
62 // the steady state
62 // the steady state
62 // the steady state
Warmup is not only about .NET
11/29
Branch prediction
const int N = 32767;
int[] sorted, unsorted; // random numbers [0..255]
private static int Sum(int[] data)
{
int sum = 0;
for (int i = 0; i < N; i++)
if (data[i] >= 128)
sum += data[i];
return sum;
}
[Benchmark]
public int Sorted()
{
return Sum(sorted);
}
[Benchmark]
public int Unsorted()
{
return Sum(unsorted);
}
12/29
Branch prediction
const int N = 32767;
int[] sorted, unsorted; // random numbers [0..255]
private static int Sum(int[] data)
{
int sum = 0;
for (int i = 0; i < N; i++)
if (data[i] >= 128)
sum += data[i];
return sum;
}
[Benchmark]
public int Sorted()
{
return Sum(sorted);
}
[Benchmark]
public int Unsorted()
{
return Sum(unsorted);
}
Sorted Unsorted
LegacyJIT-x86 ≈20µs ≈139µs
12/29
Isolation
A bad benchmark
var sw1 = Stopwatch.StartNew();
Foo();
sw1.Stop();
var sw2 = Stopwatch.StartNew();
Bar();
sw2.Stop();
13/29
Isolation
A bad benchmark
var sw1 = Stopwatch.StartNew();
Foo();
sw1.Stop();
var sw2 = Stopwatch.StartNew();
Bar();
sw2.Stop();
In general case, you should run each benchmark in
his own process. Remember about:
• Interface method dispatch
• Garbage collector and autotuning
• Conditional jitting
13/29
Interface method dispatch
private interface IInc {
double Inc(double x);
}
private class Foo : IInc {
double Inc(double x) => x + 1;
}
private class Bar : IInc {
double Inc(double x) => x + 1;
}
private double Run(IInc inc) {
double sum = 0;
for (int i = 0; i < 1001; i++)
sum += inc.Inc(0);
return sum;
}
// Which method is faster?
[Benchmark]
public double FooFoo() {
var foo1 = new Foo();
var foo2 = new Foo();
return Run(foo1) + Run(foo2);
}
[Benchmark]
public double FooBar() {
var foo = new Foo();
var bar = new Bar();
return Run(foo) + Run(bar);
}
14/29
Interface method dispatch
private interface IInc {
double Inc(double x);
}
private class Foo : IInc {
double Inc(double x) => x + 1;
}
private class Bar : IInc {
double Inc(double x) => x + 1;
}
private double Run(IInc inc) {
double sum = 0;
for (int i = 0; i < 1001; i++)
sum += inc.Inc(0);
return sum;
}
// Which method is faster?
[Benchmark]
public double FooFoo() {
var foo1 = new Foo();
var foo2 = new Foo();
return Run(foo1) + Run(foo2);
}
[Benchmark]
public double FooBar() {
var foo = new Foo();
var bar = new Bar();
return Run(foo) + Run(bar);
}
FooFoo FooBar
LegacyJIT-x64 ≈5.4µs ≈7.1µs
14/29
Tricky inlining
[Benchmark]
int Calc() => WithoutStarg(0x11) + WithStarg(0x12);
int WithoutStarg(int value) => value;
int WithStarg(int value) {
if (value < 0)
value = -value;
return value;
}
15/29
Tricky inlining
[Benchmark]
int Calc() => WithoutStarg(0x11) + WithStarg(0x12);
int WithoutStarg(int value) => value;
int WithStarg(int value) {
if (value < 0)
value = -value;
return value;
}
LegacyJIT-x86 LegacyJIT-x64 RyuJIT-x64
≈1.7ns 0 ≈1.7ns
15/29
Tricky inlining
[Benchmark]
int Calc() => WithoutStarg(0x11) + WithStarg(0x12);
int WithoutStarg(int value) => value;
int WithStarg(int value) {
if (value < 0)
value = -value;
return value;
}
LegacyJIT-x86 LegacyJIT-x64 RyuJIT-x64
≈1.7ns 0 ≈1.7ns
; LegacyJIT-x64 : Inlining succeeded
mov ecx,23h
ret
15/29
Tricky inlining
[Benchmark]
int Calc() => WithoutStarg(0x11) + WithStarg(0x12);
int WithoutStarg(int value) => value;
int WithStarg(int value) {
if (value < 0)
value = -value;
return value;
}
LegacyJIT-x86 LegacyJIT-x64 RyuJIT-x64
≈1.7ns 0 ≈1.7ns
; LegacyJIT-x64 : Inlining succeeded
mov ecx,23h
ret
// RyuJIT-x64 : Inlining failed
// Inline expansion aborted due to opcode
// [06] OP_starg.s in method
// Program:WithStarg(int):int:this
15/29
SIMD
struct MyVector // Copy-pasted from System.Numerics.Vector4
{
public float X, Y, Z, W;
public MyVector(float x, float y, float z, float w)
{
X = x; Y = y; Z = z; W = w;
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static MyVector operator *(MyVector left, MyVector right)
{
return new MyVector(left.X * right.X, left.Y * right.Y,
left.Z * right.Z, left.W * right.W);
}
}
Vector4 vector1, vector2, vector3;
MyVector myVector1, myVector2, myVector3;
[Benchmark] void MyMul() => myVector3 = myVector1 * myVector2;
[Benchmark] void BclMul() => vector3 = vector1 * vector2;
16/29
SIMD
struct MyVector // Copy-pasted from System.Numerics.Vector4
{
public float X, Y, Z, W;
public MyVector(float x, float y, float z, float w)
{
X = x; Y = y; Z = z; W = w;
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static MyVector operator *(MyVector left, MyVector right)
{
return new MyVector(left.X * right.X, left.Y * right.Y,
left.Z * right.Z, left.W * right.W);
}
}
Vector4 vector1, vector2, vector3;
MyVector myVector1, myVector2, myVector3;
[Benchmark] void MyMul() => myVector3 = myVector1 * myVector2;
[Benchmark] void BclMul() => vector3 = vector1 * vector2;
LegacyJIT-x64 RyuJIT-x64
MyMul ≈12.9ns ≈2.5ns
BclMul ≈12.9ns ≈0.2ns
16/29
How so?
LegacyJIT-x64 RyuJIT-x64
MyMul ≈12.9ns ≈2.5ns
BclMul ≈12.9ns ≈0.2ns
; LegacyJIT-x64
; MyMul, BclMul: Na¨ıve SSE
; ...
movss xmm3,dword ptr [rsp+40h]
mulss xmm3,dword ptr [rsp+30h]
movss xmm2,dword ptr [rsp+44h]
mulss xmm2,dword ptr [rsp+34h]
movss xmm1,dword ptr [rsp+48h]
mulss xmm1,dword ptr [rsp+38h]
movss xmm0,dword ptr [rsp+4Ch]
mulss xmm0,dword ptr [rsp+3Ch]
xor eax,eax
mov qword ptr [rsp],rax
mov qword ptr [rsp+8],rax
lea rax,[rsp]
movss dword ptr [rax],xmm3
movss dword ptr [rax+4],xmm2
; ...
; RyuJIT-x64
; MyMul: Na¨ıve AVX
; ...
vmulss xmm0,xmm0,xmm4
vmulss xmm1,xmm1,xmm5
vmulss xmm2,xmm2,xmm6
vmulss xmm3,xmm3,xmm7
; ...
; BclMul: Smart AVX intrinsic
vmovupd xmm0,xmmword ptr [rcx+8]
vmovupd xmm1,xmmword ptr [rcx+18h]
vmulps xmm0,xmm0,xmm1
vmovupd xmmword ptr [rcx+28h],xmm0
17/29
Let’s calculate some square roots
[Benchmark]
double Sqrt13() =>
Math.Sqrt(1) + Math.Sqrt(2) + Math.Sqrt(3) + /* ... */
+ Math.Sqrt(13);
VS
[Benchmark]
double Sqrt14() =>
Math.Sqrt(1) + Math.Sqrt(2) + Math.Sqrt(3) + /* ... */
+ Math.Sqrt(13) + Math.Sqrt(14);
18/29
Let’s calculate some square roots
[Benchmark]
double Sqrt13() =>
Math.Sqrt(1) + Math.Sqrt(2) + Math.Sqrt(3) + /* ... */
+ Math.Sqrt(13);
VS
[Benchmark]
double Sqrt14() =>
Math.Sqrt(1) + Math.Sqrt(2) + Math.Sqrt(3) + /* ... */
+ Math.Sqrt(13) + Math.Sqrt(14);
RyuJIT-x64∗
Sqrt13 ≈91ns
Sqrt14 0 ns
∗Can be changed in future versions, see github.com/dotnet/coreclr/issues/987
18/29
How so?
RyuJIT-x64, Sqrt13
vsqrtsd xmm0,xmm0,mmword ptr [7FF94F9E4D28h]
vsqrtsd xmm1,xmm0,mmword ptr [7FF94F9E4D30h]
vaddsd xmm0,xmm0,xmm1
vsqrtsd xmm1,xmm0,mmword ptr [7FF94F9E4D38h]
vaddsd xmm0,xmm0,xmm1
vsqrtsd xmm1,xmm0,mmword ptr [7FF94F9E4D40h]
vaddsd xmm0,xmm0,xmm1
; A lot of vqrtsd and vaddsd instructions
; ...
vsqrtsd xmm1,xmm0,mmword ptr [7FF94F9E4D88h]
vaddsd xmm0,xmm0,xmm1
ret
RyuJIT-x64, Sqrt14
vmovsd xmm0,qword ptr [7FF94F9C4C80h] ; Const
ret
19/29
How so?
Big expression tree
* stmtExpr void (top level) (IL 0x000... ???)
| /--* mathFN double sqrt
| | --* dconst double 13.000000000000000
| /--* + double
| | | /--* mathFN double sqrt
| | | | --* dconst double 12.000000000000000
| | --* + double
| | | /--* mathFN double sqrt
| | | | --* dconst double 11.000000000000000
| | --* + double
| | | /--* mathFN double sqrt
| | | | --* dconst double 10.000000000000000
| | --* + double
| | | /--* mathFN double sqrt
| | | | --* dconst double 9.0000000000000000
| | --* + double
| | | /--* mathFN double sqrt
| | | | --* dconst double 8.0000000000000000
| | --* + double
| | | /--* mathFN double sqrt
| | | | --* dconst double 7.0000000000000000
| | --* + double
| | | /--* mathFN double sqrt
| | | | --* dconst double 6.0000000000000000
| | --* + double
| | | /--* mathFN double sqrt
| | | | --* dconst double 5.0000000000000000
// ...
20/29
How so?
Constant folding in action
N001 [000001] dconst 1.0000000000000000 => $c0 {DblCns[1.000000]}
N002 [000002] mathFN => $c0 {DblCns[1.000000]}
N003 [000003] dconst 2.0000000000000000 => $c1 {DblCns[2.000000]}
N004 [000004] mathFN => $c2 {DblCns[1.414214]}
N005 [000005] + => $c3 {DblCns[2.414214]}
N006 [000006] dconst 3.0000000000000000 => $c4 {DblCns[3.000000]}
N007 [000007] mathFN => $c5 {DblCns[1.732051]}
N008 [000008] + => $c6 {DblCns[4.146264]}
N009 [000009] dconst 4.0000000000000000 => $c7 {DblCns[4.000000]}
N010 [000010] mathFN => $c1 {DblCns[2.000000]}
N011 [000011] + => $c8 {DblCns[6.146264]}
N012 [000012] dconst 5.0000000000000000 => $c9 {DblCns[5.000000]}
N013 [000013] mathFN => $ca {DblCns[2.236068]}
N014 [000014] + => $cb {DblCns[8.382332]}
N015 [000015] dconst 6.0000000000000000 => $cc {DblCns[6.000000]}
N016 [000016] mathFN => $cd {DblCns[2.449490]}
N017 [000017] + => $ce {DblCns[10.831822]}
N018 [000018] dconst 7.0000000000000000 => $cf {DblCns[7.000000]}
N019 [000019] mathFN => $d0 {DblCns[2.645751]}
N020 [000020] + => $d1 {DblCns[13.477573]}
...
21/29
Concurrency
22/29
True sharing
23/29
False sharing
24/29
False sharing in action
// It's an extremely na¨ıve benchmark
// Don't try this at home
int[] x = new int[1024];
void Inc(int p) {
for (int i = 0; i < 10000001; i++)
x[p]++;
}
void Run(int step) {
var sw = Stopwatch.StartNew();
Task.WaitAll(
Task.Factory.StartNew(() => Inc(0 * step)),
Task.Factory.StartNew(() => Inc(1 * step)),
Task.Factory.StartNew(() => Inc(2 * step)),
Task.Factory.StartNew(() => Inc(3 * step)));
WriteLine(sw.ElapsedMilliseconds);
}
25/29
False sharing in action
// It's an extremely na¨ıve benchmark
// Don't try this at home
int[] x = new int[1024];
void Inc(int p) {
for (int i = 0; i < 10000001; i++)
x[p]++;
}
void Run(int step) {
var sw = Stopwatch.StartNew();
Task.WaitAll(
Task.Factory.StartNew(() => Inc(0 * step)),
Task.Factory.StartNew(() => Inc(1 * step)),
Task.Factory.StartNew(() => Inc(2 * step)),
Task.Factory.StartNew(() => Inc(3 * step)));
WriteLine(sw.ElapsedMilliseconds);
}
Run(1) Run(256)
≈400ms ≈150ms
25/29
Conclusion: Benchmarking is hard
Anon et al., “A Measure of Transaction Processing Power”
There are lies, damn lies and then there are performance
measures.
26/29
Some good books
27/29
Questions?
Andrey Akinshin
http://aakinshin.net
https://github.com/AndreyAkinshin
https://twitter.com/andrey_akinshin
andrey.akinshin@gmail.com
28/29

More Related Content

What's hot

How to add an optimization for C# to RyuJIT
How to add an optimization for C# to RyuJITHow to add an optimization for C# to RyuJIT
How to add an optimization for C# to RyuJIT
Egor Bogatov
 
20140531 serebryany lecture02_find_scary_cpp_bugs
20140531 serebryany lecture02_find_scary_cpp_bugs20140531 serebryany lecture02_find_scary_cpp_bugs
20140531 serebryany lecture02_find_scary_cpp_bugsComputer Science Club
 
20140531 serebryany lecture01_fantastic_cpp_bugs
20140531 serebryany lecture01_fantastic_cpp_bugs20140531 serebryany lecture01_fantastic_cpp_bugs
20140531 serebryany lecture01_fantastic_cpp_bugsComputer Science Club
 
Advance C++notes
Advance C++notesAdvance C++notes
Advance C++notes
Rajiv Gupta
 
Abstracting Vector Architectures in Library Generators: Case Study Convolutio...
Abstracting Vector Architectures in Library Generators: Case Study Convolutio...Abstracting Vector Architectures in Library Generators: Case Study Convolutio...
Abstracting Vector Architectures in Library Generators: Case Study Convolutio...
ETH Zurich
 
zkStudyClub: PLONKUP & Reinforced Concrete [Luke Pearson, Joshua Fitzgerald, ...
zkStudyClub: PLONKUP & Reinforced Concrete [Luke Pearson, Joshua Fitzgerald, ...zkStudyClub: PLONKUP & Reinforced Concrete [Luke Pearson, Joshua Fitzgerald, ...
zkStudyClub: PLONKUP & Reinforced Concrete [Luke Pearson, Joshua Fitzgerald, ...
Alex Pruden
 
ZK Study Club: Sumcheck Arguments and Their Applications
ZK Study Club: Sumcheck Arguments and Their ApplicationsZK Study Club: Sumcheck Arguments and Their Applications
ZK Study Club: Sumcheck Arguments and Their Applications
Alex Pruden
 
To Swift 2...and Beyond!
To Swift 2...and Beyond!To Swift 2...and Beyond!
To Swift 2...and Beyond!
Scott Gardner
 
Computing on Encrypted Data
Computing on Encrypted DataComputing on Encrypted Data
Computing on Encrypted Data
New York Technology Council
 
zkStudy Club: Subquadratic SNARGs in the Random Oracle Model
zkStudy Club: Subquadratic SNARGs in the Random Oracle ModelzkStudy Club: Subquadratic SNARGs in the Random Oracle Model
zkStudy Club: Subquadratic SNARGs in the Random Oracle Model
Alex Pruden
 
Microsoft Word Hw#1
Microsoft Word   Hw#1Microsoft Word   Hw#1
Microsoft Word Hw#1
kkkseld
 
Some examples of the 64-bit code errors
Some examples of the 64-bit code errorsSome examples of the 64-bit code errors
Some examples of the 64-bit code errors
PVS-Studio
 
Advance features of C++
Advance features of C++Advance features of C++
Advance features of C++vidyamittal
 
PVS-Studio team experience: checking various open source projects, or mistake...
PVS-Studio team experience: checking various open source projects, or mistake...PVS-Studio team experience: checking various open source projects, or mistake...
PVS-Studio team experience: checking various open source projects, or mistake...
Andrey Karpov
 
C++ TUTORIAL 6
C++ TUTORIAL 6C++ TUTORIAL 6
C++ TUTORIAL 6
Farhan Ab Rahman
 
Computer Programming- Lecture 4
Computer Programming- Lecture 4Computer Programming- Lecture 4
Computer Programming- Lecture 4
Dr. Md. Shohel Sayeed
 
How To Crack RSA Netrek Binary Verification System
How To Crack RSA Netrek Binary Verification SystemHow To Crack RSA Netrek Binary Verification System
How To Crack RSA Netrek Binary Verification SystemJay Corrales
 
ExperiencesSharingOnEmbeddedSystemDevelopment_20160321
ExperiencesSharingOnEmbeddedSystemDevelopment_20160321ExperiencesSharingOnEmbeddedSystemDevelopment_20160321
ExperiencesSharingOnEmbeddedSystemDevelopment_20160321Teddy Hsiung
 
Конверсия управляемых языков в неуправляемые
Конверсия управляемых языков в неуправляемыеКонверсия управляемых языков в неуправляемые
Конверсия управляемых языков в неуправляемые
Platonov Sergey
 
Homomorphic Encryption
Homomorphic EncryptionHomomorphic Encryption
Homomorphic Encryption
Victor Pereira
 

What's hot (20)

How to add an optimization for C# to RyuJIT
How to add an optimization for C# to RyuJITHow to add an optimization for C# to RyuJIT
How to add an optimization for C# to RyuJIT
 
20140531 serebryany lecture02_find_scary_cpp_bugs
20140531 serebryany lecture02_find_scary_cpp_bugs20140531 serebryany lecture02_find_scary_cpp_bugs
20140531 serebryany lecture02_find_scary_cpp_bugs
 
20140531 serebryany lecture01_fantastic_cpp_bugs
20140531 serebryany lecture01_fantastic_cpp_bugs20140531 serebryany lecture01_fantastic_cpp_bugs
20140531 serebryany lecture01_fantastic_cpp_bugs
 
Advance C++notes
Advance C++notesAdvance C++notes
Advance C++notes
 
Abstracting Vector Architectures in Library Generators: Case Study Convolutio...
Abstracting Vector Architectures in Library Generators: Case Study Convolutio...Abstracting Vector Architectures in Library Generators: Case Study Convolutio...
Abstracting Vector Architectures in Library Generators: Case Study Convolutio...
 
zkStudyClub: PLONKUP & Reinforced Concrete [Luke Pearson, Joshua Fitzgerald, ...
zkStudyClub: PLONKUP & Reinforced Concrete [Luke Pearson, Joshua Fitzgerald, ...zkStudyClub: PLONKUP & Reinforced Concrete [Luke Pearson, Joshua Fitzgerald, ...
zkStudyClub: PLONKUP & Reinforced Concrete [Luke Pearson, Joshua Fitzgerald, ...
 
ZK Study Club: Sumcheck Arguments and Their Applications
ZK Study Club: Sumcheck Arguments and Their ApplicationsZK Study Club: Sumcheck Arguments and Their Applications
ZK Study Club: Sumcheck Arguments and Their Applications
 
To Swift 2...and Beyond!
To Swift 2...and Beyond!To Swift 2...and Beyond!
To Swift 2...and Beyond!
 
Computing on Encrypted Data
Computing on Encrypted DataComputing on Encrypted Data
Computing on Encrypted Data
 
zkStudy Club: Subquadratic SNARGs in the Random Oracle Model
zkStudy Club: Subquadratic SNARGs in the Random Oracle ModelzkStudy Club: Subquadratic SNARGs in the Random Oracle Model
zkStudy Club: Subquadratic SNARGs in the Random Oracle Model
 
Microsoft Word Hw#1
Microsoft Word   Hw#1Microsoft Word   Hw#1
Microsoft Word Hw#1
 
Some examples of the 64-bit code errors
Some examples of the 64-bit code errorsSome examples of the 64-bit code errors
Some examples of the 64-bit code errors
 
Advance features of C++
Advance features of C++Advance features of C++
Advance features of C++
 
PVS-Studio team experience: checking various open source projects, or mistake...
PVS-Studio team experience: checking various open source projects, or mistake...PVS-Studio team experience: checking various open source projects, or mistake...
PVS-Studio team experience: checking various open source projects, or mistake...
 
C++ TUTORIAL 6
C++ TUTORIAL 6C++ TUTORIAL 6
C++ TUTORIAL 6
 
Computer Programming- Lecture 4
Computer Programming- Lecture 4Computer Programming- Lecture 4
Computer Programming- Lecture 4
 
How To Crack RSA Netrek Binary Verification System
How To Crack RSA Netrek Binary Verification SystemHow To Crack RSA Netrek Binary Verification System
How To Crack RSA Netrek Binary Verification System
 
ExperiencesSharingOnEmbeddedSystemDevelopment_20160321
ExperiencesSharingOnEmbeddedSystemDevelopment_20160321ExperiencesSharingOnEmbeddedSystemDevelopment_20160321
ExperiencesSharingOnEmbeddedSystemDevelopment_20160321
 
Конверсия управляемых языков в неуправляемые
Конверсия управляемых языков в неуправляемыеКонверсия управляемых языков в неуправляемые
Конверсия управляемых языков в неуправляемые
 
Homomorphic Encryption
Homomorphic EncryptionHomomorphic Encryption
Homomorphic Encryption
 

Similar to Let’s talk about microbenchmarking

Tema3_Introduction_to_CUDA_C.pdf
Tema3_Introduction_to_CUDA_C.pdfTema3_Introduction_to_CUDA_C.pdf
Tema3_Introduction_to_CUDA_C.pdf
pepe464163
 
PVS-Studio 5.00, a solution for developers of modern resource-intensive appl...
PVS-Studio 5.00, a solution for developers of modern resource-intensive appl...PVS-Studio 5.00, a solution for developers of modern resource-intensive appl...
PVS-Studio 5.00, a solution for developers of modern resource-intensive appl...
Andrey Karpov
 
I dont know what is wrong with this roulette program I cant seem.pdf
I dont know what is wrong with this roulette program I cant seem.pdfI dont know what is wrong with this roulette program I cant seem.pdf
I dont know what is wrong with this roulette program I cant seem.pdf
archanaemporium
 
Whats new in_csharp4
Whats new in_csharp4Whats new in_csharp4
Whats new in_csharp4Abed Bukhari
 
JVM code reading -- C2
JVM code reading -- C2JVM code reading -- C2
JVM code reading -- C2
ytoshima
 
PVS-Studio, a solution for resource intensive applications development
PVS-Studio, a solution for resource intensive applications developmentPVS-Studio, a solution for resource intensive applications development
PVS-Studio, a solution for resource intensive applications development
OOO "Program Verification Systems"
 
#include stdafx.h using namespace std; #include stdlib.h.docx
#include stdafx.h using namespace std; #include stdlib.h.docx#include stdafx.h using namespace std; #include stdlib.h.docx
#include stdafx.h using namespace std; #include stdlib.h.docx
ajoy21
 
.net progrmming part2
.net progrmming part2.net progrmming part2
.net progrmming part2
Dr.M.Karthika parthasarathy
 
public class TrequeT extends AbstractListT { .pdf
  public class TrequeT extends AbstractListT {  .pdf  public class TrequeT extends AbstractListT {  .pdf
public class TrequeT extends AbstractListT { .pdf
info30292
 
ぐだ生 Java入門第三回(文字コードの話)(Keynote版)
ぐだ生 Java入門第三回(文字コードの話)(Keynote版)ぐだ生 Java入門第三回(文字コードの話)(Keynote版)
ぐだ生 Java入門第三回(文字コードの話)(Keynote版)Makoto Yamazaki
 
Covering a function using a Dynamic Symbolic Execution approach
Covering a function using a Dynamic Symbolic Execution approach Covering a function using a Dynamic Symbolic Execution approach
Covering a function using a Dynamic Symbolic Execution approach
Jonathan Salwan
 
LSFMM 2019 BPF Observability
LSFMM 2019 BPF ObservabilityLSFMM 2019 BPF Observability
LSFMM 2019 BPF Observability
Brendan Gregg
 
DATASTRUCTURES PPTS PREPARED BY M V BRAHMANANDA REDDY
DATASTRUCTURES PPTS PREPARED BY M V BRAHMANANDA REDDYDATASTRUCTURES PPTS PREPARED BY M V BRAHMANANDA REDDY
DATASTRUCTURES PPTS PREPARED BY M V BRAHMANANDA REDDYMalikireddy Bramhananda Reddy
 
COSCUP2023 RSA256 Verilator.pdf
COSCUP2023 RSA256 Verilator.pdfCOSCUP2023 RSA256 Verilator.pdf
COSCUP2023 RSA256 Verilator.pdf
Yodalee
 
PRACTICAL COMPUTING
PRACTICAL COMPUTINGPRACTICAL COMPUTING
PRACTICAL COMPUTING
Ramachendran Logarajah
 
54602399 c-examples-51-to-108-programe-ee01083101
54602399 c-examples-51-to-108-programe-ee0108310154602399 c-examples-51-to-108-programe-ee01083101
54602399 c-examples-51-to-108-programe-ee01083101
premrings
 
Score (smart contract for icon)
Score (smart contract for icon) Score (smart contract for icon)
Score (smart contract for icon)
Doyun Hwang
 

Similar to Let’s talk about microbenchmarking (20)

Tema3_Introduction_to_CUDA_C.pdf
Tema3_Introduction_to_CUDA_C.pdfTema3_Introduction_to_CUDA_C.pdf
Tema3_Introduction_to_CUDA_C.pdf
 
PVS-Studio 5.00, a solution for developers of modern resource-intensive appl...
PVS-Studio 5.00, a solution for developers of modern resource-intensive appl...PVS-Studio 5.00, a solution for developers of modern resource-intensive appl...
PVS-Studio 5.00, a solution for developers of modern resource-intensive appl...
 
I dont know what is wrong with this roulette program I cant seem.pdf
I dont know what is wrong with this roulette program I cant seem.pdfI dont know what is wrong with this roulette program I cant seem.pdf
I dont know what is wrong with this roulette program I cant seem.pdf
 
Whats new in_csharp4
Whats new in_csharp4Whats new in_csharp4
Whats new in_csharp4
 
JVM code reading -- C2
JVM code reading -- C2JVM code reading -- C2
JVM code reading -- C2
 
PVS-Studio, a solution for resource intensive applications development
PVS-Studio, a solution for resource intensive applications developmentPVS-Studio, a solution for resource intensive applications development
PVS-Studio, a solution for resource intensive applications development
 
#include stdafx.h using namespace std; #include stdlib.h.docx
#include stdafx.h using namespace std; #include stdlib.h.docx#include stdafx.h using namespace std; #include stdlib.h.docx
#include stdafx.h using namespace std; #include stdlib.h.docx
 
.net progrmming part2
.net progrmming part2.net progrmming part2
.net progrmming part2
 
public class TrequeT extends AbstractListT { .pdf
  public class TrequeT extends AbstractListT {  .pdf  public class TrequeT extends AbstractListT {  .pdf
public class TrequeT extends AbstractListT { .pdf
 
ぐだ生 Java入門第三回(文字コードの話)(Keynote版)
ぐだ生 Java入門第三回(文字コードの話)(Keynote版)ぐだ生 Java入門第三回(文字コードの話)(Keynote版)
ぐだ生 Java入門第三回(文字コードの話)(Keynote版)
 
Insertion Sort Code
Insertion Sort CodeInsertion Sort Code
Insertion Sort Code
 
Ac2
Ac2Ac2
Ac2
 
Covering a function using a Dynamic Symbolic Execution approach
Covering a function using a Dynamic Symbolic Execution approach Covering a function using a Dynamic Symbolic Execution approach
Covering a function using a Dynamic Symbolic Execution approach
 
LSFMM 2019 BPF Observability
LSFMM 2019 BPF ObservabilityLSFMM 2019 BPF Observability
LSFMM 2019 BPF Observability
 
DATASTRUCTURES PPTS PREPARED BY M V BRAHMANANDA REDDY
DATASTRUCTURES PPTS PREPARED BY M V BRAHMANANDA REDDYDATASTRUCTURES PPTS PREPARED BY M V BRAHMANANDA REDDY
DATASTRUCTURES PPTS PREPARED BY M V BRAHMANANDA REDDY
 
Exploiting vectorization with ISPC
Exploiting vectorization with ISPCExploiting vectorization with ISPC
Exploiting vectorization with ISPC
 
COSCUP2023 RSA256 Verilator.pdf
COSCUP2023 RSA256 Verilator.pdfCOSCUP2023 RSA256 Verilator.pdf
COSCUP2023 RSA256 Verilator.pdf
 
PRACTICAL COMPUTING
PRACTICAL COMPUTINGPRACTICAL COMPUTING
PRACTICAL COMPUTING
 
54602399 c-examples-51-to-108-programe-ee01083101
54602399 c-examples-51-to-108-programe-ee0108310154602399 c-examples-51-to-108-programe-ee01083101
54602399 c-examples-51-to-108-programe-ee01083101
 
Score (smart contract for icon)
Score (smart contract for icon) Score (smart contract for icon)
Score (smart contract for icon)
 

More from Andrey Akinshin

Поговорим про performance-тестирование
Поговорим про performance-тестированиеПоговорим про performance-тестирование
Поговорим про performance-тестирование
Andrey Akinshin
 
Сложности performance-тестирования
Сложности performance-тестированияСложности performance-тестирования
Сложности performance-тестирования
Andrey Akinshin
 
Сложности микробенчмаркинга
Сложности микробенчмаркингаСложности микробенчмаркинга
Сложности микробенчмаркинга
Andrey Akinshin
 
Поговорим про память
Поговорим про памятьПоговорим про память
Поговорим про память
Andrey Akinshin
 
Кроссплатформенный .NET и как там дела с Mono и CoreCLR
Кроссплатформенный .NET и как там дела с Mono и CoreCLRКроссплатформенный .NET и как там дела с Mono и CoreCLR
Кроссплатформенный .NET и как там дела с Mono и CoreCLR
Andrey Akinshin
 
Теория и практика .NET-бенчмаркинга (25.01.2017, Москва)
 Теория и практика .NET-бенчмаркинга (25.01.2017, Москва) Теория и практика .NET-бенчмаркинга (25.01.2017, Москва)
Теория и практика .NET-бенчмаркинга (25.01.2017, Москва)
Andrey Akinshin
 
Продолжаем говорить про арифметику
Продолжаем говорить про арифметикуПродолжаем говорить про арифметику
Продолжаем говорить про арифметику
Andrey Akinshin
 
Теория и практика .NET-бенчмаркинга (02.11.2016, Екатеринбург)
Теория и практика .NET-бенчмаркинга (02.11.2016, Екатеринбург)Теория и практика .NET-бенчмаркинга (02.11.2016, Екатеринбург)
Теория и практика .NET-бенчмаркинга (02.11.2016, Екатеринбург)
Andrey Akinshin
 
Поговорим про арифметику
Поговорим про арифметикуПоговорим про арифметику
Поговорим про арифметику
Andrey Akinshin
 
Подружили CLR и JVM в Project Rider
Подружили CLR и JVM в Project RiderПодружили CLR и JVM в Project Rider
Подружили CLR и JVM в Project Rider
Andrey Akinshin
 
Что нам готовит грядущий C#7?
Что нам готовит грядущий C#7?Что нам готовит грядущий C#7?
Что нам готовит грядущий C#7?
Andrey Akinshin
 
Продолжаем говорить о микрооптимизациях .NET-приложений
Продолжаем говорить о микрооптимизациях .NET-приложенийПродолжаем говорить о микрооптимизациях .NET-приложений
Продолжаем говорить о микрооптимизациях .NET-приложений
Andrey Akinshin
 
Распространённые ошибки оценки производительности .NET-приложений
Распространённые ошибки оценки производительности .NET-приложенийРаспространённые ошибки оценки производительности .NET-приложений
Распространённые ошибки оценки производительности .NET-приложений
Andrey Akinshin
 
Поговорим о микрооптимизациях .NET-приложений
Поговорим о микрооптимизациях .NET-приложенийПоговорим о микрооптимизациях .NET-приложений
Поговорим о микрооптимизациях .NET-приложений
Andrey Akinshin
 
Практические приёмы оптимизации .NET-приложений
Практические приёмы оптимизации .NET-приложенийПрактические приёмы оптимизации .NET-приложений
Практические приёмы оптимизации .NET-приложений
Andrey Akinshin
 
Поговорим о различных версиях .NET
Поговорим о различных версиях .NETПоговорим о различных версиях .NET
Поговорим о различных версиях .NET
Andrey Akinshin
 
Низкоуровневые оптимизации .NET-приложений
Низкоуровневые оптимизации .NET-приложенийНизкоуровневые оптимизации .NET-приложений
Низкоуровневые оптимизации .NET-приложений
Andrey Akinshin
 
Основы работы с Git
Основы работы с GitОсновы работы с Git
Основы работы с Git
Andrey Akinshin
 
Сборка мусора в .NET
Сборка мусора в .NETСборка мусора в .NET
Сборка мусора в .NET
Andrey Akinshin
 
Об особенностях использования значимых типов в .NET
Об особенностях использования значимых типов в .NETОб особенностях использования значимых типов в .NET
Об особенностях использования значимых типов в .NET
Andrey Akinshin
 

More from Andrey Akinshin (20)

Поговорим про performance-тестирование
Поговорим про performance-тестированиеПоговорим про performance-тестирование
Поговорим про performance-тестирование
 
Сложности performance-тестирования
Сложности performance-тестированияСложности performance-тестирования
Сложности performance-тестирования
 
Сложности микробенчмаркинга
Сложности микробенчмаркингаСложности микробенчмаркинга
Сложности микробенчмаркинга
 
Поговорим про память
Поговорим про памятьПоговорим про память
Поговорим про память
 
Кроссплатформенный .NET и как там дела с Mono и CoreCLR
Кроссплатформенный .NET и как там дела с Mono и CoreCLRКроссплатформенный .NET и как там дела с Mono и CoreCLR
Кроссплатформенный .NET и как там дела с Mono и CoreCLR
 
Теория и практика .NET-бенчмаркинга (25.01.2017, Москва)
 Теория и практика .NET-бенчмаркинга (25.01.2017, Москва) Теория и практика .NET-бенчмаркинга (25.01.2017, Москва)
Теория и практика .NET-бенчмаркинга (25.01.2017, Москва)
 
Продолжаем говорить про арифметику
Продолжаем говорить про арифметикуПродолжаем говорить про арифметику
Продолжаем говорить про арифметику
 
Теория и практика .NET-бенчмаркинга (02.11.2016, Екатеринбург)
Теория и практика .NET-бенчмаркинга (02.11.2016, Екатеринбург)Теория и практика .NET-бенчмаркинга (02.11.2016, Екатеринбург)
Теория и практика .NET-бенчмаркинга (02.11.2016, Екатеринбург)
 
Поговорим про арифметику
Поговорим про арифметикуПоговорим про арифметику
Поговорим про арифметику
 
Подружили CLR и JVM в Project Rider
Подружили CLR и JVM в Project RiderПодружили CLR и JVM в Project Rider
Подружили CLR и JVM в Project Rider
 
Что нам готовит грядущий C#7?
Что нам готовит грядущий C#7?Что нам готовит грядущий C#7?
Что нам готовит грядущий C#7?
 
Продолжаем говорить о микрооптимизациях .NET-приложений
Продолжаем говорить о микрооптимизациях .NET-приложенийПродолжаем говорить о микрооптимизациях .NET-приложений
Продолжаем говорить о микрооптимизациях .NET-приложений
 
Распространённые ошибки оценки производительности .NET-приложений
Распространённые ошибки оценки производительности .NET-приложенийРаспространённые ошибки оценки производительности .NET-приложений
Распространённые ошибки оценки производительности .NET-приложений
 
Поговорим о микрооптимизациях .NET-приложений
Поговорим о микрооптимизациях .NET-приложенийПоговорим о микрооптимизациях .NET-приложений
Поговорим о микрооптимизациях .NET-приложений
 
Практические приёмы оптимизации .NET-приложений
Практические приёмы оптимизации .NET-приложенийПрактические приёмы оптимизации .NET-приложений
Практические приёмы оптимизации .NET-приложений
 
Поговорим о различных версиях .NET
Поговорим о различных версиях .NETПоговорим о различных версиях .NET
Поговорим о различных версиях .NET
 
Низкоуровневые оптимизации .NET-приложений
Низкоуровневые оптимизации .NET-приложенийНизкоуровневые оптимизации .NET-приложений
Низкоуровневые оптимизации .NET-приложений
 
Основы работы с Git
Основы работы с GitОсновы работы с Git
Основы работы с Git
 
Сборка мусора в .NET
Сборка мусора в .NETСборка мусора в .NET
Сборка мусора в .NET
 
Об особенностях использования значимых типов в .NET
Об особенностях использования значимых типов в .NETОб особенностях использования значимых типов в .NET
Об особенностях использования значимых типов в .NET
 

Recently uploaded

PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 

Recently uploaded (20)

PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 

Let’s talk about microbenchmarking

  • 1. Let’s talk about microbenchmarking Andrey Akinshin, JetBrains DotNext 2016 Helsinki 1/29
  • 3. Today’s environment • BenchmarkDotNet v0.10.1 • Haswell Core i7-4702MQ CPU 2.20GHz, Windows 10 • .NET Framework 4.6.2 + clrjit/compatjit-v4.6.1586.0 • Source code: https://git.io/v1RRX 3/29
  • 4. Today’s environment • BenchmarkDotNet v0.10.1 • Haswell Core i7-4702MQ CPU 2.20GHz, Windows 10 • .NET Framework 4.6.2 + clrjit/compatjit-v4.6.1586.0 • Source code: https://git.io/v1RRX Other environments: C# compiler old csc / Roslyn CLR CLR2 / CLR4 / CoreCLR / Mono OS Windows / Linux / MacOS / FreeBSD JIT LegacyJIT-x86 / LegacyJIT-x64 / RyuJIT-x64 GC MS (different CLRs) / Mono (Boehm/Sgen) Toolchain JIT / NGen / .NET Native Hardware ∞ different configurations . . . . . . And don’t forget about multiple versions 3/29
  • 5. Count of iterations A bad benchmark // Resolution (Stopwatch) = 466 ns // Latency (Stopwatch) = 18 ns var sw = Stopwatch.StartNew(); Foo(); // 100 ns sw.Stop(); WriteLine(sw.ElapsedMilliseconds); 4/29
  • 6. Count of iterations A bad benchmark // Resolution (Stopwatch) = 466 ns // Latency (Stopwatch) = 18 ns var sw = Stopwatch.StartNew(); Foo(); // 100 ns sw.Stop(); WriteLine(sw.ElapsedMilliseconds); A better benchmark var sw = Stopwatch.StartNew(); for (int i = 0; i < N; i++) // (N * 100 + eps) ns Foo(); // 100 ns sw.Stop(); var total = sw.ElapsedTicks / Stopwatch.Frequency; WriteLine(total / N); 4/29
  • 7. Several launches Run 01 : 529.8674 ns/op Run 02 : 532.7541 ns/op Run 03 : 558.7448 ns/op Run 04 : 555.6647 ns/op Run 05 : 539.6401 ns/op Run 06 : 539.3494 ns/op Run 07 : 564.3222 ns/op Run 08 : 551.9544 ns/op Run 09 : 550.1608 ns/op Run 10 : 533.0634 ns/op 5/29
  • 9. A simple case Central limit theorem to the rescue! 7/29
  • 11. Latencies Event Latency 1 CPU cycle 0.3 ns Level 1 cache access 0.9 ns Level 2 cache access 2.8 ns Level 3 cache access 12.9 ns Main memory access 120 ns Solid-state disk I/O 50-150 µs Rotational disk I/O 1-10 ms Internet: SF to NYC 40 ms Internet: SF to UK 81 ms Internet: SF to Australia 183 ms OS virtualization reboot 4 sec Hardware virtualization reboot 40 sec Physical system reboot 5 min © Systems Performance: Enterprise and the Cloud 9/29
  • 12. Sum of elements const int N = 1024; int[,] a = new int[N, N]; [Benchmark] public double SumIJ() { var sum = 0; for (int i = 0; i < N; i++) for (int j = 0; j < N; j++) sum += a[i, j]; return sum; } [Benchmark] public double SumJI() { var sum = 0; for (int j = 0; j < N; j++) for (int i = 0; i < N; i++) sum += a[i, j]; return sum; } 10/29
  • 13. Sum of elements const int N = 1024; int[,] a = new int[N, N]; [Benchmark] public double SumIJ() { var sum = 0; for (int i = 0; i < N; i++) for (int j = 0; j < N; j++) sum += a[i, j]; return sum; } [Benchmark] public double SumJI() { var sum = 0; for (int j = 0; j < N; j++) for (int i = 0; i < N; i++) sum += a[i, j]; return sum; } CPU cache effect: SumIJ SumJI LegacyJIT-x86 ≈1.3ms ≈4.0ms 10/29
  • 14. Cache-sensitive benchmarks Let’s run a benchmark several times: int[] x = new int[128 * 1024 * 1024]; for (int iter = 0; iter < 5; iter++) { var sw = Stopwatch.StartNew(); for (int i = 0; i < x.Length; i += 16) x[i]++; sw.Stop(); WriteLine(sw.ElapsedMilliseconds); } 11/29
  • 15. Cache-sensitive benchmarks Let’s run a benchmark several times: int[] x = new int[128 * 1024 * 1024]; for (int iter = 0; iter < 5; iter++) { var sw = Stopwatch.StartNew(); for (int i = 0; i < x.Length; i += 16) x[i]++; sw.Stop(); WriteLine(sw.ElapsedMilliseconds); } 176 // not warmed 81 // still not warmed 62 // the steady state 62 // the steady state 62 // the steady state Warmup is not only about .NET 11/29
  • 16. Branch prediction const int N = 32767; int[] sorted, unsorted; // random numbers [0..255] private static int Sum(int[] data) { int sum = 0; for (int i = 0; i < N; i++) if (data[i] >= 128) sum += data[i]; return sum; } [Benchmark] public int Sorted() { return Sum(sorted); } [Benchmark] public int Unsorted() { return Sum(unsorted); } 12/29
  • 17. Branch prediction const int N = 32767; int[] sorted, unsorted; // random numbers [0..255] private static int Sum(int[] data) { int sum = 0; for (int i = 0; i < N; i++) if (data[i] >= 128) sum += data[i]; return sum; } [Benchmark] public int Sorted() { return Sum(sorted); } [Benchmark] public int Unsorted() { return Sum(unsorted); } Sorted Unsorted LegacyJIT-x86 ≈20µs ≈139µs 12/29
  • 18. Isolation A bad benchmark var sw1 = Stopwatch.StartNew(); Foo(); sw1.Stop(); var sw2 = Stopwatch.StartNew(); Bar(); sw2.Stop(); 13/29
  • 19. Isolation A bad benchmark var sw1 = Stopwatch.StartNew(); Foo(); sw1.Stop(); var sw2 = Stopwatch.StartNew(); Bar(); sw2.Stop(); In general case, you should run each benchmark in his own process. Remember about: • Interface method dispatch • Garbage collector and autotuning • Conditional jitting 13/29
  • 20. Interface method dispatch private interface IInc { double Inc(double x); } private class Foo : IInc { double Inc(double x) => x + 1; } private class Bar : IInc { double Inc(double x) => x + 1; } private double Run(IInc inc) { double sum = 0; for (int i = 0; i < 1001; i++) sum += inc.Inc(0); return sum; } // Which method is faster? [Benchmark] public double FooFoo() { var foo1 = new Foo(); var foo2 = new Foo(); return Run(foo1) + Run(foo2); } [Benchmark] public double FooBar() { var foo = new Foo(); var bar = new Bar(); return Run(foo) + Run(bar); } 14/29
  • 21. Interface method dispatch private interface IInc { double Inc(double x); } private class Foo : IInc { double Inc(double x) => x + 1; } private class Bar : IInc { double Inc(double x) => x + 1; } private double Run(IInc inc) { double sum = 0; for (int i = 0; i < 1001; i++) sum += inc.Inc(0); return sum; } // Which method is faster? [Benchmark] public double FooFoo() { var foo1 = new Foo(); var foo2 = new Foo(); return Run(foo1) + Run(foo2); } [Benchmark] public double FooBar() { var foo = new Foo(); var bar = new Bar(); return Run(foo) + Run(bar); } FooFoo FooBar LegacyJIT-x64 ≈5.4µs ≈7.1µs 14/29
  • 22. Tricky inlining [Benchmark] int Calc() => WithoutStarg(0x11) + WithStarg(0x12); int WithoutStarg(int value) => value; int WithStarg(int value) { if (value < 0) value = -value; return value; } 15/29
  • 23. Tricky inlining [Benchmark] int Calc() => WithoutStarg(0x11) + WithStarg(0x12); int WithoutStarg(int value) => value; int WithStarg(int value) { if (value < 0) value = -value; return value; } LegacyJIT-x86 LegacyJIT-x64 RyuJIT-x64 ≈1.7ns 0 ≈1.7ns 15/29
  • 24. Tricky inlining [Benchmark] int Calc() => WithoutStarg(0x11) + WithStarg(0x12); int WithoutStarg(int value) => value; int WithStarg(int value) { if (value < 0) value = -value; return value; } LegacyJIT-x86 LegacyJIT-x64 RyuJIT-x64 ≈1.7ns 0 ≈1.7ns ; LegacyJIT-x64 : Inlining succeeded mov ecx,23h ret 15/29
  • 25. Tricky inlining [Benchmark] int Calc() => WithoutStarg(0x11) + WithStarg(0x12); int WithoutStarg(int value) => value; int WithStarg(int value) { if (value < 0) value = -value; return value; } LegacyJIT-x86 LegacyJIT-x64 RyuJIT-x64 ≈1.7ns 0 ≈1.7ns ; LegacyJIT-x64 : Inlining succeeded mov ecx,23h ret // RyuJIT-x64 : Inlining failed // Inline expansion aborted due to opcode // [06] OP_starg.s in method // Program:WithStarg(int):int:this 15/29
  • 26. SIMD struct MyVector // Copy-pasted from System.Numerics.Vector4 { public float X, Y, Z, W; public MyVector(float x, float y, float z, float w) { X = x; Y = y; Z = z; W = w; } [MethodImpl(MethodImplOptions.AggressiveInlining)] public static MyVector operator *(MyVector left, MyVector right) { return new MyVector(left.X * right.X, left.Y * right.Y, left.Z * right.Z, left.W * right.W); } } Vector4 vector1, vector2, vector3; MyVector myVector1, myVector2, myVector3; [Benchmark] void MyMul() => myVector3 = myVector1 * myVector2; [Benchmark] void BclMul() => vector3 = vector1 * vector2; 16/29
  • 27. SIMD struct MyVector // Copy-pasted from System.Numerics.Vector4 { public float X, Y, Z, W; public MyVector(float x, float y, float z, float w) { X = x; Y = y; Z = z; W = w; } [MethodImpl(MethodImplOptions.AggressiveInlining)] public static MyVector operator *(MyVector left, MyVector right) { return new MyVector(left.X * right.X, left.Y * right.Y, left.Z * right.Z, left.W * right.W); } } Vector4 vector1, vector2, vector3; MyVector myVector1, myVector2, myVector3; [Benchmark] void MyMul() => myVector3 = myVector1 * myVector2; [Benchmark] void BclMul() => vector3 = vector1 * vector2; LegacyJIT-x64 RyuJIT-x64 MyMul ≈12.9ns ≈2.5ns BclMul ≈12.9ns ≈0.2ns 16/29
  • 28. How so? LegacyJIT-x64 RyuJIT-x64 MyMul ≈12.9ns ≈2.5ns BclMul ≈12.9ns ≈0.2ns ; LegacyJIT-x64 ; MyMul, BclMul: Na¨ıve SSE ; ... movss xmm3,dword ptr [rsp+40h] mulss xmm3,dword ptr [rsp+30h] movss xmm2,dword ptr [rsp+44h] mulss xmm2,dword ptr [rsp+34h] movss xmm1,dword ptr [rsp+48h] mulss xmm1,dword ptr [rsp+38h] movss xmm0,dword ptr [rsp+4Ch] mulss xmm0,dword ptr [rsp+3Ch] xor eax,eax mov qword ptr [rsp],rax mov qword ptr [rsp+8],rax lea rax,[rsp] movss dword ptr [rax],xmm3 movss dword ptr [rax+4],xmm2 ; ... ; RyuJIT-x64 ; MyMul: Na¨ıve AVX ; ... vmulss xmm0,xmm0,xmm4 vmulss xmm1,xmm1,xmm5 vmulss xmm2,xmm2,xmm6 vmulss xmm3,xmm3,xmm7 ; ... ; BclMul: Smart AVX intrinsic vmovupd xmm0,xmmword ptr [rcx+8] vmovupd xmm1,xmmword ptr [rcx+18h] vmulps xmm0,xmm0,xmm1 vmovupd xmmword ptr [rcx+28h],xmm0 17/29
  • 29. Let’s calculate some square roots [Benchmark] double Sqrt13() => Math.Sqrt(1) + Math.Sqrt(2) + Math.Sqrt(3) + /* ... */ + Math.Sqrt(13); VS [Benchmark] double Sqrt14() => Math.Sqrt(1) + Math.Sqrt(2) + Math.Sqrt(3) + /* ... */ + Math.Sqrt(13) + Math.Sqrt(14); 18/29
  • 30. Let’s calculate some square roots [Benchmark] double Sqrt13() => Math.Sqrt(1) + Math.Sqrt(2) + Math.Sqrt(3) + /* ... */ + Math.Sqrt(13); VS [Benchmark] double Sqrt14() => Math.Sqrt(1) + Math.Sqrt(2) + Math.Sqrt(3) + /* ... */ + Math.Sqrt(13) + Math.Sqrt(14); RyuJIT-x64∗ Sqrt13 ≈91ns Sqrt14 0 ns ∗Can be changed in future versions, see github.com/dotnet/coreclr/issues/987 18/29
  • 31. How so? RyuJIT-x64, Sqrt13 vsqrtsd xmm0,xmm0,mmword ptr [7FF94F9E4D28h] vsqrtsd xmm1,xmm0,mmword ptr [7FF94F9E4D30h] vaddsd xmm0,xmm0,xmm1 vsqrtsd xmm1,xmm0,mmword ptr [7FF94F9E4D38h] vaddsd xmm0,xmm0,xmm1 vsqrtsd xmm1,xmm0,mmword ptr [7FF94F9E4D40h] vaddsd xmm0,xmm0,xmm1 ; A lot of vqrtsd and vaddsd instructions ; ... vsqrtsd xmm1,xmm0,mmword ptr [7FF94F9E4D88h] vaddsd xmm0,xmm0,xmm1 ret RyuJIT-x64, Sqrt14 vmovsd xmm0,qword ptr [7FF94F9C4C80h] ; Const ret 19/29
  • 32. How so? Big expression tree * stmtExpr void (top level) (IL 0x000... ???) | /--* mathFN double sqrt | | --* dconst double 13.000000000000000 | /--* + double | | | /--* mathFN double sqrt | | | | --* dconst double 12.000000000000000 | | --* + double | | | /--* mathFN double sqrt | | | | --* dconst double 11.000000000000000 | | --* + double | | | /--* mathFN double sqrt | | | | --* dconst double 10.000000000000000 | | --* + double | | | /--* mathFN double sqrt | | | | --* dconst double 9.0000000000000000 | | --* + double | | | /--* mathFN double sqrt | | | | --* dconst double 8.0000000000000000 | | --* + double | | | /--* mathFN double sqrt | | | | --* dconst double 7.0000000000000000 | | --* + double | | | /--* mathFN double sqrt | | | | --* dconst double 6.0000000000000000 | | --* + double | | | /--* mathFN double sqrt | | | | --* dconst double 5.0000000000000000 // ... 20/29
  • 33. How so? Constant folding in action N001 [000001] dconst 1.0000000000000000 => $c0 {DblCns[1.000000]} N002 [000002] mathFN => $c0 {DblCns[1.000000]} N003 [000003] dconst 2.0000000000000000 => $c1 {DblCns[2.000000]} N004 [000004] mathFN => $c2 {DblCns[1.414214]} N005 [000005] + => $c3 {DblCns[2.414214]} N006 [000006] dconst 3.0000000000000000 => $c4 {DblCns[3.000000]} N007 [000007] mathFN => $c5 {DblCns[1.732051]} N008 [000008] + => $c6 {DblCns[4.146264]} N009 [000009] dconst 4.0000000000000000 => $c7 {DblCns[4.000000]} N010 [000010] mathFN => $c1 {DblCns[2.000000]} N011 [000011] + => $c8 {DblCns[6.146264]} N012 [000012] dconst 5.0000000000000000 => $c9 {DblCns[5.000000]} N013 [000013] mathFN => $ca {DblCns[2.236068]} N014 [000014] + => $cb {DblCns[8.382332]} N015 [000015] dconst 6.0000000000000000 => $cc {DblCns[6.000000]} N016 [000016] mathFN => $cd {DblCns[2.449490]} N017 [000017] + => $ce {DblCns[10.831822]} N018 [000018] dconst 7.0000000000000000 => $cf {DblCns[7.000000]} N019 [000019] mathFN => $d0 {DblCns[2.645751]} N020 [000020] + => $d1 {DblCns[13.477573]} ... 21/29
  • 37. False sharing in action // It's an extremely na¨ıve benchmark // Don't try this at home int[] x = new int[1024]; void Inc(int p) { for (int i = 0; i < 10000001; i++) x[p]++; } void Run(int step) { var sw = Stopwatch.StartNew(); Task.WaitAll( Task.Factory.StartNew(() => Inc(0 * step)), Task.Factory.StartNew(() => Inc(1 * step)), Task.Factory.StartNew(() => Inc(2 * step)), Task.Factory.StartNew(() => Inc(3 * step))); WriteLine(sw.ElapsedMilliseconds); } 25/29
  • 38. False sharing in action // It's an extremely na¨ıve benchmark // Don't try this at home int[] x = new int[1024]; void Inc(int p) { for (int i = 0; i < 10000001; i++) x[p]++; } void Run(int step) { var sw = Stopwatch.StartNew(); Task.WaitAll( Task.Factory.StartNew(() => Inc(0 * step)), Task.Factory.StartNew(() => Inc(1 * step)), Task.Factory.StartNew(() => Inc(2 * step)), Task.Factory.StartNew(() => Inc(3 * step))); WriteLine(sw.ElapsedMilliseconds); } Run(1) Run(256) ≈400ms ≈150ms 25/29
  • 39. Conclusion: Benchmarking is hard Anon et al., “A Measure of Transaction Processing Power” There are lies, damn lies and then there are performance measures. 26/29