WHERE THE WILD THINGS ARE
Benchmarking and
Micro-Optimisations
Matt Warren
@matthewwarren
http://mattwarren.org/
Premature Optimization
“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.”
- Donald Knuth
Profiling Tools
• ANTS Performance Profiler - Redgate
• dotTrace & dotMemory - JetBrains
• PerfView - Microsoft (free)
• Visual Studio Profiling Tools (Ultimate, Premium or Professional)
• MiniProfiler - Stack Overflow (free)
BenchmarkDotNet
Why do you need a benchmarking library?
using System;
using System.Diagnostics;

static void Profile(int iterations, Action action)
{
    action(); // warm up (forces JIT compilation)
    GC.Collect(); // clean up
    var watch = new Stopwatch();
    watch.Start();
    for (int i = 0; i < iterations; i++)
    {
        action();
    }
    watch.Stop();
    Console.WriteLine("Time Elapsed {0} ms", watch.ElapsedMilliseconds);
}
Benchmarking small code samples in C#, can this implementation be improved?
http://stackoverflow.com/q/1047218/4500
using System;
using System.Diagnostics;

static class Profiler<T>
{
    // Storing each result stops the JIT from eliminating the call as dead code
    private static T Result;

    public static void Profile(int iterations, Func<T> func)
    {
        func(); // warm up
        GC.Collect(); // clean up
        var watch = new Stopwatch();
        watch.Start();
        for (int i = 0; i < iterations; i++)
        {
            Result = func();
        }
        watch.Stop();
        Console.WriteLine("Time Elapsed {0} ms", watch.ElapsedMilliseconds);
    }
}
Benchmarking small code samples in C#, can this implementation be improved?
http://stackoverflow.com/q/1047218/4500
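Even with the result stored in a field, a single timed loop reports one number with no sense of its spread. A slightly sturdier hand-rolled harness might time several batches and report the median per-call cost. The sketch below is illustrative only (the `MiniBench` name, the batch count and the use of a median are assumptions, not BenchmarkDotNet's algorithm); it still ignores overhead subtraction, JIT tiering and process isolation, which is the argument for a dedicated library.

```csharp
using System;
using System.Diagnostics;

public static class MiniBench
{
    // One static field per T: writing each result here is an observable
    // side effect, so the JIT cannot drop the call as dead code.
    private static class Sink<T>
    {
        public static T Value;
    }

    // Times `batches` batches of `iterations` calls each and returns the
    // median cost of a single call, in nanoseconds.
    public static double MedianNsPerOp<T>(Func<T> func, int iterations, int batches = 15)
    {
        if (iterations <= 0) throw new ArgumentOutOfRangeException(nameof(iterations));
        if (batches <= 0) throw new ArgumentOutOfRangeException(nameof(batches));

        Sink<T>.Value = func(); // warm up so JIT compilation is not timed

        var nsPerOp = new double[batches];
        for (int b = 0; b < batches; b++)
        {
            // Start every batch from a clean heap
            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();

            long start = Stopwatch.GetTimestamp();
            for (int i = 0; i < iterations; i++)
            {
                Sink<T>.Value = func();
            }
            long elapsedTicks = Stopwatch.GetTimestamp() - start;

            // Stopwatch ticks -> nanoseconds, divided by calls per batch
            nsPerOp[b] = elapsedTicks * 1e9 / Stopwatch.Frequency / iterations;
        }
        return Median(nsPerOp);
    }

    public static double Median(double[] values)
    {
        var sorted = (double[])values.Clone();
        Array.Sort(sorted);
        int mid = sorted.Length / 2;
        return sorted.Length % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}
```

Every number this produces still includes the loop and delegate-invocation overhead, which at nanosecond scale can dominate the measurement.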
BenchmarkDotNet project
Andrey Akinshin (the ‘Boss’)
@andrey_akinshin
http://aakinshin.net/en/blog/
Matt Warren (me)
Adam Sitnik (.NET Core guru)
@SitnikAdam
http://adamsitnik.com/
.NET Foundation
Goals of BenchmarkDotNet
Benchmarking library that is:
•Accurate
•Easy-to-use
•Helpful
Stopwatch under the hood http://aakinshin.net/en/blog/dotnet/stopwatch/
LegacyJIT-x86 and first method call http://aakinshin.net/en/blog/dotnet/legacyjitx86-and-first-method-call/
Goals of BenchmarkDotNet
Proper docs!
benchmarkdotnet.org/
What BenchmarkDotNet doesn’t do
• Multi-threaded benchmarks
• Integration with CI builds
• Unit test runner integration
• Anything else?
http://github.com/dotnet/BenchmarkDotNet/issues/
“Other Benchmarking tools are available”
• NBench
• https://github.com/petabridge/NBench
• Microsoft Xunit performance
• http://github.com/Microsoft/xunit-performance/
• Lambda Micro Benchmarking (“Clash of the Lambdas”)
• https://github.com/biboudis/LambdaMicrobenchmarking
• Etimo.Benchmarks
• http://etimo.se/blog/etimo-benchmarks-lightweight-net-benchmark-tool/
• MeasureIt
• https://blogs.msdn.microsoft.com/vancem/2009/02/06/measureit-update-tool-for-doing-microbenchmarks-for-net/
How it works
An invocation of the target method is an operation.
A batch of operations is an iteration.
Iteration types:
• Pilot: the best operation count (operations per iteration) is chosen.
• IdleWarmup, IdleTarget: BenchmarkDotNet's own overhead is evaluated.
• MainWarmup: warmup of the main method.
• MainTarget: main measurements.
• Result = MainTarget - AverageOverhead
http://benchmarkdotnet.org/HowItWorks.htm
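The Pilot and overhead-correction steps can be sketched as plain arithmetic. This is a simplified illustration with hypothetical names (`HowItWorksSketch`, `ChooseOperationCount`, `CorrectedNsPerOp`); BenchmarkDotNet's real pilot stage and statistics are more involved. The pilot doubles the operation count until one iteration lasts at least a target duration, so timer resolution becomes negligible; then the average idle (overhead) iteration is subtracted from every MainTarget iteration before dividing by the operation count.

```csharp
using System;
using System.Linq;

public static class HowItWorksSketch
{
    // Pilot: keep doubling the operation count until a single iteration
    // takes at least minIterationNs. runIterationNs(count) is assumed to
    // run `count` operations and return the elapsed time in nanoseconds.
    public static long ChooseOperationCount(Func<long, double> runIterationNs, double minIterationNs)
    {
        long count = 1;
        while (runIterationNs(count) < minIterationNs)
        {
            if (count > long.MaxValue / 2)
                throw new InvalidOperationException("Iteration never reached the target duration.");
            count *= 2;
        }
        return count;
    }

    // Result = MainTarget - AverageOverhead, applied to every main iteration
    // (both measured with the same operation count), then scaled to one operation.
    public static double[] CorrectedNsPerOp(double[] mainTargetNs, double[] idleTargetNs, long operationCount)
    {
        double averageOverhead = idleTargetNs.Average();
        return mainTargetNs
            .Select(t => (t - averageOverhead) / operationCount)
            .ToArray();
    }
}
```

With a fake iteration costing 10 ns per operation and a 1,000 ns target, the pilot settles on 128 operations, because 64 × 10 = 640 ns is too short and 128 × 10 = 1,280 ns is long enough.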
What happens under the covers?
Image credit Albert Rodríguez @UncleFirefox
DEMO
‘Hello World’ Benchmark
Scale of benchmarks
• millisecond - ms
    • One thousandth of a second: a single web-app request
• microsecond - us or µs
    • One millionth of a second: several in-memory operations
• nanosecond - ns
    • One billionth of a second: individual operations
Who ‘times’ the timers?
[Benchmark]
public long StopwatchLatency()
{
    return Stopwatch.GetTimestamp();
}

[Benchmark]
public long StopwatchGranularity()
{
    // Loop until Stopwatch.GetTimestamp()
    // gives us a different value
    long lastTimestamp = Stopwatch.GetTimestamp();
    while (Stopwatch.GetTimestamp() == lastTimestamp)
    {
    }
    return lastTimestamp;
}

[Benchmark]
public long DateTimeLatency()
{
    return DateTime.Now.Ticks;
}

[Benchmark]
public long DateTimeGranularity()
{
    // Loop until DateTime.Now
    // gives us a different value
    long lastTimestamp = DateTime.Now.Ticks;
    while (DateTime.Now.Ticks == lastTimestamp)
    {
    }
    return lastTimestamp;
}
BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
[Host] : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
Job-FIDMNL : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
Method | Mean | StdDev | Allocated |
--------------------- |---------------- |------------ |---------- |
StopwatchLatency | 12.9960 ns | 0.1609 ns | 0 B |
StopwatchGranularity | 374.3049 ns | 2.4388 ns | 0 B |
DateTimeLatency | 682.2320 ns | 8.9341 ns | 32 B |
DateTimeGranularity | 996,025.6492 ns | 413.9175 ns | 47.34 kB |
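The Resolution value in the environment header follows directly from the Stopwatch Frequency: one tick lasts 1,000,000,000 / Frequency nanoseconds, and 1,000,000,000 / 2,630,673 ≈ 380.13 ns, close to the ~374 ns measured for StopwatchGranularity. A small sketch of that conversion (the `TimerMath` class and its method names are illustrative):

```csharp
using System;
using System.Diagnostics;

public static class TimerMath
{
    // Duration of one Stopwatch tick in nanoseconds, for a frequency in ticks per second.
    public static double ResolutionNs(long frequencyHz) => 1e9 / frequencyHz;

    // Convert a difference between two Stopwatch.GetTimestamp() values into nanoseconds.
    public static double TicksToNs(long ticks, long frequencyHz) => ticks * 1e9 / frequencyHz;

    // Resolution of the Stopwatch on the machine running this code.
    public static double CurrentResolutionNs() => ResolutionNs(Stopwatch.Frequency);
}
```

On many modern machines `Stopwatch.Frequency` is much higher than the 2.6 MHz shown above, so the same calculation gives a far finer resolution.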
Loop-the-Loop
“Avoid foreach loop on everything except raw arrays?”
[Benchmark(Baseline = true)]
public int ForLoopArray()
{
    var counter = 0;
    for (int i = 0; i < anArray.Length; i++)
        counter += anArray[i];
    return counter;
}

[Benchmark]
public int ForEachArray()
{
    var counter = 0;
    foreach (var i in anArray)
        counter += i;
    return counter;
}

[Benchmark]
public int ForLoopList()
{
    var counter = 0;
    for (int i = 0; i < aList.Count; i++)
        counter += aList[i];
    return counter;
}

[Benchmark]
public int ForEachList()
{
    var counter = 0;
    foreach (var i in aList)
        counter += i;
    return counter;
}
BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
[Host] : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
DefaultJob : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
Method | Mean | StdDev | Scaled | Scaled-StdDev |
--------------- |-------------- |------------ |------- |-------------- |
ForLoopArray | 383.8279 ns | 2.9472 ns | 1.00 | 0.00 |
ForEachArray | 392.5611 ns | 4.1286 ns | 1.02 | 0.01 |
ForLoopList | 2,315.9658 ns | 12.1001 ns | 6.03 | 0.05 |
ForEachList | 2,663.5771 ns | 21.9822 ns | 6.94 | 0.08 |
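One plausible contributor to the extra cost of ForEachList: for a `List<T>`, `foreach` binds to the public struct `List<T>.Enumerator`, and every `MoveNext()` re-checks the list's version (to detect modification during enumeration) as well as the bounds. The method below is a hand-written approximation of what the compiler generates for ForEachList, not decompiled output; `aList` is passed as a parameter here rather than read from a field.

```csharp
using System.Collections.Generic;

public static class LoweredLoops
{
    // Approximately what `foreach (var i in aList) counter += i;` expands to.
    public static int ForEachListLowered(List<int> aList)
    {
        var counter = 0;
        // GetEnumerator() returns the struct List<int>.Enumerator: no heap allocation.
        List<int>.Enumerator e = aList.GetEnumerator();
        try
        {
            // Each MoveNext() checks the list's version and current size.
            while (e.MoveNext())
            {
                counter += e.Current;
            }
        }
        finally
        {
            e.Dispose();
        }
        return counter;
    }
}
```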
Loop-the-Loop – ‘for loop’ - Arrays
Loop-the-Loop – ‘for loop’ - Lists
Abstractions - IDictionary v Dictionary
Dictionary<string, string> dictionary =
    new Dictionary<string, string>();
IDictionary<string, string> iDictionary =
    (IDictionary<string, string>)dictionary;

[Benchmark]
public Dictionary<string, string> DictionaryEnumeration()
{
    foreach (var item in dictionary) { ; }
    return dictionary;
}

[Benchmark]
public IDictionary<string, string> IDictionaryEnumeration()
{
    foreach (var item in iDictionary) { ; }
    return iDictionary;
}
BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
[Host] : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
DefaultJob : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
Method | Mean | StdErr | StdDev | Gen 0 | Allocated |
----------------------- |----------- |---------- |---------- |------- |---------- |
DictionaryEnumeration | 24.0353 ns | 0.2403 ns | 0.9307 ns | - | 0 B |
IDictionaryEnumeration | 41.6301 ns | 0.4479 ns | 2.1944 ns | 0.0086 | 32 B |
// * Diagnostic Output - MemoryDiagnoser *
Note: the Gen 0/1/2 Measurements are per 1k Operations
Abstractions - IDictionary v Dictionary
Dictionary<string, string> dictionary =
    new Dictionary<string, string>();
IDictionary<string, string> iDictionary =
    (IDictionary<string, string>)dictionary;

// struct - so doesn't allocate
Dictionary<string, string>.Enumerator structEnumerator =
    dictionary.GetEnumerator();

// interface - the struct is boxed: allocates 56 B (64-bit) and 32 B (32-bit)
IEnumerator<KeyValuePair<string, string>> interfaceEnumerator =
    iDictionary.GetEnumerator();
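The 32 B allocation is the struct enumerator being boxed: through the interface, `GetEnumerator()` must return an `IEnumerator<...>` reference, so the struct is copied onto the heap. One way to see this (the `EnumeratorBoxing` class and its methods are illustrative) is that the object behind the interface still reports the struct type, and a value type, from `GetType()`. Pass a non-empty dictionary; behaviour for empty collections may differ between runtime versions.

```csharp
using System;
using System.Collections.Generic;

public static class EnumeratorBoxing
{
    // The concrete Dictionary API hands back the struct enumerator directly.
    public static bool ConcreteEnumeratorIsStruct(Dictionary<string, string> dictionary)
    {
        Dictionary<string, string>.Enumerator e = dictionary.GetEnumerator();
        return e.GetType().IsValueType;
    }

    // Through the interface, the same struct comes back boxed: the reference
    // points at a heap object whose runtime type is still the struct.
    public static Type InterfaceEnumeratorRuntimeType(IDictionary<string, string> iDictionary)
    {
        IEnumerator<KeyValuePair<string, string>> e = iDictionary.GetEnumerator();
        return e.GetType();
    }
}
```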
Low-level increments
[LegacyJitX86Job, LegacyJitX64Job, RyuJitX64Job]
public class Program
{
    private double a, b, c, d;

    [Benchmark(OperationsPerInvoke = 4)]
    public void MethodA()
    {
        a++; b++; c++; d++;
    }

    [Benchmark(OperationsPerInvoke = 4)]
    public void MethodB()
    {
        a++; a++; a++; a++;
    }
}
BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
[Host] : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
LegacyJitX64 : Clr 4.0.30319.42000, 64bit LegacyJIT/clrjit-v4.6.1590.0;compatjit-v4.6.1590.0
LegacyJitX86 : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
RyuJitX64 : Clr 4.0.30319.42000, 64bit RyuJIT-v4.6.1590.0
Runtime=Clr Allocated=0 B
Method | Job | Jit | Platform | Mean | StdErr | StdDev |
----------- |------------- |---------- |--------- |---------- |---------- |---------- |
Parallel | LegacyJitX64 | LegacyJit | X64 | 0.3420 ns | 0.0015 ns | 0.0057 ns |
Sequential | LegacyJitX64 | LegacyJit | X64 | 2.2038 ns | 0.0014 ns | 0.0051 ns |
Parallel | LegacyJitX86 | LegacyJit | X86 | 0.3276 ns | 0.0005 ns | 0.0020 ns |
Sequential | LegacyJitX86 | LegacyJit | X86 | 2.5229 ns | 0.0048 ns | 0.0187 ns |
Parallel | RyuJitX64 | RyuJit | X64 | 0.3686 ns | 0.0037 ns | 0.0144 ns |
Sequential | RyuJitX64 | RyuJit | X64 | 0.8959 ns | 0.0023 ns | 0.0090 ns |
MethodA() = Parallel, MethodB() = Sequential
http://en.wikipedia.org/wiki/Instruction-level_parallelism
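MethodB's four increments all hit the same field, so each `a++` has to wait for the previous result; MethodA's increments touch four different fields, are independent, and can be overlapped by the CPU. The same idea applies to ordinary loops: splitting one accumulator into several independent ones shortens the dependency chain. The sketch below (names are illustrative) shows the transformation only; whether it is faster on a given JIT and CPU is exactly what should be benchmarked rather than assumed. Note that reordering floating-point additions can change the rounding of the result in general.

```csharp
public static class DependencyChains
{
    // One accumulator: each addition must wait for the previous one.
    public static double SumSingle(double[] values)
    {
        double sum = 0;
        for (int i = 0; i < values.Length; i++)
        {
            sum += values[i];
        }
        return sum;
    }

    // Four independent accumulators: the four additions in each step
    // do not depend on each other, so they can execute in parallel.
    public static double SumFourWay(double[] values)
    {
        double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        int i = 0;
        for (; i <= values.Length - 4; i += 4)
        {
            s0 += values[i];
            s1 += values[i + 1];
            s2 += values[i + 2];
            s3 += values[i + 3];
        }
        // Handle the 0-3 leftover elements.
        for (; i < values.Length; i++)
        {
            s0 += values[i];
        }
        return (s0 + s1) + (s2 + s3);
    }
}
```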
Search - Linear v Binary
private static int LinearSearch(
    Data[] set, int key)
{
    for (int i = 0; i < set.Length; i++)
    {
        var c = set[i].Key - key;
        if (c == 0)
        {
            return i;
        }
        if (c > 0)
        {
            return ~i;
        }
    }
    return ~set.Length;
}

private static int BinarySearch(
    Data[] set, int key)
{
    int i = 0;
    int up = set.Length - 1;
    while (i <= up)
    {
        int mid = (up - i) / 2 + i;
        int c = set[mid].Key - key;
        if (c == 0)
        {
            return mid;
        }
        if (c < 0)
            i = mid + 1;
        else
            up = mid - 1;
    }
    return ~i;
}
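The search code assumes sorted arrays of a `Data` type with an integer `Key`. A self-contained version under that assumption (the `Data` struct and the `BuildSet` helper are hypothetical, since the original definitions are not shown) swaps `set[i].Key - key` for `CompareTo`, because the subtraction overflows when keys are near `int.MinValue` or `int.MaxValue`. Both methods follow the `Array.BinarySearch` convention: the index if found, otherwise the bitwise complement of the insertion point.

```csharp
using System;

// Hypothetical element type: the original Data definition is not shown.
public struct Data
{
    public int Key;
    public Data(int key) { Key = key; }
}

public static class Search
{
    public static int LinearSearch(Data[] set, int key)
    {
        for (int i = 0; i < set.Length; i++)
        {
            int c = set[i].Key.CompareTo(key);
            if (c == 0) return i;   // found
            if (c > 0) return ~i;   // passed the spot where key would be
        }
        return ~set.Length;         // key is larger than every element
    }

    public static int BinarySearch(Data[] set, int key)
    {
        int lo = 0;
        int hi = set.Length - 1;
        while (lo <= hi)
        {
            int mid = lo + (hi - lo) / 2;   // avoids (lo + hi) overflow
            int c = set[mid].Key.CompareTo(key);
            if (c == 0) return mid;
            if (c < 0) lo = mid + 1;
            else hi = mid - 1;
        }
        return ~lo;                         // complemented insertion point
    }

    // Hypothetical helper: sorted keys 0, 2, 4, ... so odd keys are misses.
    public static Data[] BuildSet(int size)
    {
        var set = new Data[size];
        for (int i = 0; i < size; i++) set[i] = new Data(i * 2);
        return set;
    }
}
```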
Search - Linear v Binary
private readonly Data[][] dataSet;
private Data[] currentSet;
private int currentMid;
private int currentMax;

[Params(1, 2, 3, 4, 5, 7, 10, 12, 15)]
public int Size
{
    set
    {
        currentSet = dataSet[value];
        currentMax = value - 1;
        currentMid = value / 2;
    }
}
Linear Search v Binary Search
readonly fields
public struct Int256
{
    private readonly long bits0, bits1,
        bits2, bits3;

    public Int256(long bits0, long bits1,
        long bits2, long bits3)
    {
        this.bits0 = bits0; this.bits1 = bits1;
        this.bits2 = bits2; this.bits3 = bits3;
    }

    public long Bits0 { get { return bits0; } }
    public long Bits1 { get { return bits1; } }
    public long Bits2 { get { return bits2; } }
    public long Bits3 { get { return bits3; } }
}

[LegacyJitX86Job, LegacyJitX64Job, RyuJitX64Job]
public class Program
{
    private readonly Int256 readOnlyField =
        new Int256(1L, 5L, 10L, 100L);
    private Int256 field =
        new Int256(1L, 5L, 10L, 100L);

    [Benchmark]
    public long GetValue()
    {
        return field.Bits0 + field.Bits1 +
            field.Bits2 + field.Bits3;
    }

    [Benchmark]
    public long GetReadOnlyValue()
    {
        return readOnlyField.Bits0 +
            readOnlyField.Bits1 +
            readOnlyField.Bits2 +
            readOnlyField.Bits3;
    }
}
BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
[Host] : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
LegacyJitX64 : Clr 4.0.30319.42000, 64bit LegacyJIT/clrjit-v4.6.1590.0;compatjit-v4.6.1590.0
LegacyJitX86 : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
RyuJitX64 : Clr 4.0.30319.42000, 64bit RyuJIT-v4.6.1590.0
Runtime=Clr Allocated=0 B
Method | Job | Jit | Platform | Mean | StdErr | StdDev |
----------------- |------------- |---------- |--------- |---------- |---------- |---------- |
GetValue | LegacyJitX64 | LegacyJit | X64 | 0.7893 ns | 0.0078 ns | 0.0291 ns |
GetReadOnlyValue | LegacyJitX64 | LegacyJit | X64 | 9.5362 ns | 0.0251 ns | 0.0971 ns |
GetValue | LegacyJitX86 | LegacyJit | X86 | 1.4625 ns | 0.0506 ns | 0.1959 ns |
GetReadOnlyValue | LegacyJitX86 | LegacyJit | X86 | 1.9743 ns | 0.0641 ns | 0.2481 ns |
GetValue | RyuJitX64 | RyuJit | X64 | 0.3852 ns | 0.0183 ns | 0.0710 ns |
GetReadOnlyValue | RyuJitX64 | RyuJit | X64 | 9.6406 ns | 0.0803 ns | 0.3109 ns |
https://codeblog.jonskeet.uk/2014/07/16/micro-optimization-the-surprising-inefficiency-of-readonly-fields/
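The usual explanation, covered in Jon Skeet's post above, is defensive copies: `readOnlyField.Bits0` calls a property getter on a readonly struct field, and because the compiler cannot prove the getter leaves the struct unchanged, it copies the whole 32-byte value first, on every access. A deliberately mutable struct makes that hidden copy visible (the `Counter`, `CopyDemo` and `ReadOnlyInt256` names are illustrative). Since C# 7.2, declaring the type as a `readonly struct` tells the compiler no member can mutate it, so the copy is not needed.

```csharp
// A deliberately mutable struct so the defensive copy is observable.
public struct Counter
{
    public int Value;
    public void Increment() { Value++; }
}

public sealed class CopyDemo
{
    private readonly Counter readOnlyCounter = new Counter();
    private Counter mutableCounter = new Counter();

    // Increment() runs on a hidden copy of the readonly field, so the field stays 0.
    public int IncrementReadOnly()
    {
        readOnlyCounter.Increment();
        return readOnlyCounter.Value;
    }

    // Increment() runs on the field itself.
    public int IncrementMutable()
    {
        mutableCounter.Increment();
        return mutableCounter.Value;
    }
}

// C# 7.2+: a readonly struct guarantees no member mutates it, so reading
// its properties through a readonly field requires no defensive copy.
public readonly struct ReadOnlyInt256
{
    public ReadOnlyInt256(long bits0, long bits1, long bits2, long bits3)
    {
        Bits0 = bits0; Bits1 = bits1; Bits2 = bits2; Bits3 = bits3;
    }

    public long Bits0 { get; }
    public long Bits1 { get; }
    public long Bits2 { get; }
    public long Bits3 { get; }
}
```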
MOAR Benchmarks!!
Analysing Optimisations in the Wire Serialiser
• http://mattwarren.org/2016/08/23/Analysing-Optimisations-in-the-Wire-Serialiser/
Optimising LINQ
• http://mattwarren.org/2016/09/29/Optimising-LINQ/
Why is reflection slow?
• http://mattwarren.org/2016/12/14/Why-is-Reflection-slow/
Why Exceptions should be Exceptional
• http://mattwarren.org/2016/12/20/Why-Exceptions-should-be-Exceptional/
Resources
QUESTIONS?

Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 

Recently uploaded (20)

Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 

Where the wild things are - Benchmarking and Micro-Optimisations

  • 3. Premature Optimization: “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.” - Donald Knuth
  • 4. Profiling Tools
    • ANTS Performance Profiler - Redgate
    • dotTrace & dotMemory - JetBrains
    • PerfView - Microsoft (free)
    • Visual Studio Profiling Tools (Ultimate, Premium or Professional)
    • MiniProfiler - Stack Overflow (free)
  • 9. Why do you need a benchmarking library?
  • 10.
    using System;
    using System.Diagnostics;

    static void Profile(int iterations, Action action)
    {
        action(); // warm up
        GC.Collect(); // clean up

        var watch = new Stopwatch();
        watch.Start();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        watch.Stop();
        Console.WriteLine("Time Elapsed {0} ms", watch.ElapsedMilliseconds);
    }

    Benchmarking small code samples in C#, can this implementation be improved?
    http://stackoverflow.com/q/1047218/4500
  • 11.
    // Storing each result stops the JIT from treating func() as dead code.
    // A static field of type T needs a generic holder class to compile.
    private static class Sink<T> { public static T Result; }

    static void Profile<T>(int iterations, Func<T> func)
    {
        func(); // warm up
        GC.Collect(); // clean up

        var watch = new Stopwatch();
        watch.Start();
        for (int i = 0; i < iterations; i++)
        {
            Sink<T>.Result = func();
        }
        watch.Stop();
        Console.WriteLine("Time Elapsed {0} ms", watch.ElapsedMilliseconds);
    }

    Benchmarking small code samples in C#, can this implementation be improved?
    http://stackoverflow.com/q/1047218/4500
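For comparison, the same harness can be tightened a little without a library: read raw `Stopwatch.GetTimestamp()` ticks instead of the truncated `ElapsedMilliseconds`, report nanoseconds per operation, and keep every result in a sink. This is a minimal sketch under those assumptions, not BenchmarkDotNet's implementation; `HarnessSketch` and `NanosecondsPerOperation` are hypothetical names. It still lacks what a benchmarking library adds: there is only a single warm-up call, no overhead subtraction and no statistics, and on runtimes with tiered JIT compilation the timed loop may still be running less-optimised code.

```csharp
using System;
using System.Diagnostics;

public static class HarnessSketch
{
    // Generic holder: storing each result stops the JIT treating func() as dead code.
    private static class Sink<T>
    {
        public static T Value;
    }

    // Returns the mean time per call of func, in nanoseconds (a sketch, not a rigorous benchmark).
    public static double NanosecondsPerOperation<T>(int iterations, Func<T> func)
    {
        if (iterations <= 0)
            throw new ArgumentOutOfRangeException(nameof(iterations));

        Sink<T>.Value = func(); // warm up: forces at least the first JIT compilation

        GC.Collect();                  // clean up garbage left by earlier work...
        GC.WaitForPendingFinalizers(); // ...let pending finalizers run...
        GC.Collect();                  // ...and collect whatever they released

        long start = Stopwatch.GetTimestamp();
        for (int i = 0; i < iterations; i++)
        {
            Sink<T>.Value = func();
        }
        long elapsedTicks = Stopwatch.GetTimestamp() - start;

        // Stopwatch ticks last 1/Stopwatch.Frequency seconds (they are not 100 ns TimeSpan ticks).
        return elapsedTicks * 1e9 / Stopwatch.Frequency / iterations;
    }
}
```

For example, `HarnessSketch.NanosecondsPerOperation(1_000_000, () => DateTime.UtcNow.Ticks)` returns the average cost of one `DateTime.UtcNow` read on the current machine; expect the number to vary from run to run.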
  • 12. BenchmarkDotNet project
    • Andrey Akinshin (the ‘Boss’), @andrey_akinshin, http://aakinshin.net/en/blog/
    • Matt Warren (me)
    • Adam Sitnik (.NET Core guru), @SitnikAdam, http://adamsitnik.com/
  • 14. Goals of BenchmarkDotNet
    Benchmarking library that is:
    • Accurate
    • Easy-to-use
    • Helpful
  • 15. Goals of BenchmarkDotNet
    Benchmarking library that is:
    • Accurate
    • Easy-to-use
    • Helpful
    Stopwatch under the hood: http://aakinshin.net/en/blog/dotnet/stopwatch/
    LegacyJIT-x86 and first method call: http://aakinshin.net/en/blog/dotnet/legacyjitx86-and-first-method-call/
  • 17. What BenchmarkDotNet doesn’t do
    • Multi-threaded benchmarks
    • Integrate with CI builds
    • Unit test runner integration
    • Anything else? http://github.com/dotnet/BenchmarkDotNet/issues/
  • 18. “Other Benchmarking tools are available”
    • NBench
      https://github.com/petabridge/NBench
    • Microsoft Xunit performance
      http://github.com/Microsoft/xunit-performance/
    • Lambda Micro Benchmarking (“Clash of the Lambdas”)
      https://github.com/biboudis/LambdaMicrobenchmarking
    • Etimo.Benchmarks
      http://etimo.se/blog/etimo-benchmarks-lightweight-net-benchmark-tool/
    • MeasureIt
      https://blogs.msdn.microsoft.com/vancem/2009/02/06/measureit-update-tool-for-doing-microbenchmarks-for-net/
  • 19. How it works
    An invocation of the target method is an operation. A batch of operations is an iteration.
    Iteration types:
    • Pilot: the best operation count will be chosen.
    • IdleWarmup, IdleTarget: BenchmarkDotNet overhead will be evaluated.
    • MainWarmup: warmup of the main method.
    • MainTarget: main measurements.
    • Result = MainTarget - AverageOverhead
    http://benchmarkdotnet.org/HowItWorks.htm
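The stages above can be sketched in a few dozen lines of plain C#. This is a heavily simplified illustration, not BenchmarkDotNet's code: the names `HowItWorksSketch`, `ChooseOperationCount`, `MeasureIteration`, `NanosecondsPerOperation` and `Run` are hypothetical, the pilot stage here simply doubles the operation count until one iteration takes at least a minimum time, and the overhead is assumed to be whatever an iteration of an empty delegate costs.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class HowItWorksSketch
{
    // Pilot (simplified): double the operation count until one iteration lasts at least minIterationTime.
    public static long ChooseOperationCount(Func<long, TimeSpan> measureIteration, TimeSpan minIterationTime)
    {
        long ops = 1;
        while (ops < long.MaxValue / 2 && measureIteration(ops) < minIterationTime)
            ops *= 2;
        return ops;
    }

    // One iteration = 'ops' invocations (operations) of the action.
    public static TimeSpan MeasureIteration(Action action, long ops)
    {
        var sw = Stopwatch.StartNew();
        for (long i = 0; i < ops; i++)
            action();
        sw.Stop();
        return sw.Elapsed;
    }

    // Result = MainTarget - AverageOverhead, per operation, in nanoseconds (1 TimeSpan tick = 100 ns).
    public static double NanosecondsPerOperation(TimeSpan[] mainIterations, TimeSpan[] idleIterations, long ops)
    {
        double mainNs = mainIterations.Average(t => t.Ticks) * 100.0 / ops;
        double overheadNs = idleIterations.Average(t => t.Ticks) * 100.0 / ops;
        return mainNs - overheadNs;
    }

    // Runs the stages in order: Pilot, IdleWarmup, IdleTarget, MainWarmup, MainTarget.
    public static double Run(Action target, int iterationCount, TimeSpan minIterationTime)
    {
        if (iterationCount <= 0)
            throw new ArgumentOutOfRangeException(nameof(iterationCount));

        Action idle = () => { };
        long ops = ChooseOperationCount(n => MeasureIteration(target, n), minIterationTime); // Pilot

        var idleTimes = new TimeSpan[iterationCount];
        var mainTimes = new TimeSpan[iterationCount];
        for (int i = 0; i < iterationCount; i++) MeasureIteration(idle, ops);                  // IdleWarmup
        for (int i = 0; i < iterationCount; i++) idleTimes[i] = MeasureIteration(idle, ops);   // IdleTarget
        for (int i = 0; i < iterationCount; i++) MeasureIteration(target, ops);                // MainWarmup
        for (int i = 0; i < iterationCount; i++) mainTimes[i] = MeasureIteration(target, ops); // MainTarget

        return NanosecondsPerOperation(mainTimes, idleTimes, ops);
    }
}
```

Calling `HowItWorksSketch.Run(() => DateTime.UtcNow.Ticks.GetHashCode(), 10, TimeSpan.FromMilliseconds(100))` walks through the same sequence of stages listed on the slide. Unlike the real library, it reports only a plain mean, with no statistics, and it does not run the benchmark in a separate process.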
  • 20. What happens under the covers? Image credit Albert Rodríguez @UncleFirefox
  • 22. Scale of benchmarks
    • millisecond (ms): one thousandth of a second, e.g. a single web app request
    • microsecond (us or µs): one millionth of a second, e.g. several in-memory operations
    • nanosecond (ns): one billionth of a second, e.g. single operations
  • 23. Who ‘times’ the timers?
    [Benchmark]
    public long StopwatchLatency()
    {
        return Stopwatch.GetTimestamp();
    }

    [Benchmark]
    public long StopwatchGranularity()
    {
        // Loop until Stopwatch.GetTimestamp()
        // gives us a different value
        long lastTimestamp = Stopwatch.GetTimestamp();
        while (Stopwatch.GetTimestamp() == lastTimestamp) { }
        return lastTimestamp;
    }

    [Benchmark]
    public long DateTimeLatency()
    {
        return DateTime.Now.Ticks;
    }

    [Benchmark]
    public long DateTimeGranularity()
    {
        // Loop until DateTime.Now
        // gives us a different value
        long lastTimestamp = DateTime.Now.Ticks;
        while (DateTime.Now.Ticks == lastTimestamp) { }
        return lastTimestamp;
    }
  • 24. Who ‘times’ the timers?
    BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
    Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
    Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
      [Host]     : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
      Job-FIDMNL : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0

    Method               |            Mean |      StdDev | Allocated
    -------------------- | --------------- | ----------- | ---------
    StopwatchLatency     |           ?? ns |       ?? ns |      ?? B
    StopwatchGranularity |           ?? ns |       ?? ns |      ?? B
    DateTimeLatency      |           ?? ns |       ?? ns |      ?? B
    DateTimeGranularity  |           ?? ns |       ?? ns |      ?? B
  • 25. Who ‘times’ the timers?
    BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
    Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
    Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
      [Host]     : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
      Job-FIDMNL : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0

    Method               |            Mean |      StdDev | Allocated
    -------------------- | --------------- | ----------- | ---------
    StopwatchLatency     |      12.9960 ns |   0.1609 ns |       0 B
    StopwatchGranularity |     374.3049 ns |   2.4388 ns |       0 B
    DateTimeLatency      |     682.2320 ns |   8.9341 ns |      32 B
    DateTimeGranularity  | 996,025.6492 ns | 413.9175 ns |  47.34 kB
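The Frequency and Resolution values in the summary header are two views of the same number: the Stopwatch advances `Frequency` ticks per second, so one tick lasts 1/Frequency seconds, i.e. 1,000,000,000 / 2,630,673 ≈ 380.13 ns on this machine, which is roughly in line with the measured StopwatchGranularity of about 374 ns. A small helper for the conversion, as a sketch; `TimerMath` and its methods are illustrative names, not a BenchmarkDotNet API:

```csharp
using System;
using System.Diagnostics;

public static class TimerMath
{
    // Length of one Stopwatch tick, in nanoseconds, for a timer running at frequencyHz.
    public static double ResolutionNs(long frequencyHz) => 1e9 / frequencyHz;

    // Converts a difference between two Stopwatch.GetTimestamp() values into nanoseconds.
    public static double TicksToNanoseconds(long ticks, long frequencyHz) => ticks * 1e9 / frequencyHz;

    // Resolution of the Stopwatch on the current machine.
    public static double CurrentResolutionNs() => ResolutionNs(Stopwatch.Frequency);
}
```

`TimerMath.ResolutionNs(2630673)` gives about 380.1309 ns, matching the header above; anything measured in a single timer read that is shorter than this is below what the timer can resolve, which is why nanosecond-scale benchmarks have to time many operations per iteration.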
  • 26. Loop-the-Loop: “Avoid foreach loop on everything except raw arrays?”
    // anArray and aList are fields initialised elsewhere (their setup is not shown on the slide)
    [Benchmark(Baseline = true)]
    public int ForLoopArray()
    {
        var counter = 0;
        for (int i = 0; i < anArray.Length; i++)
            counter += anArray[i];
        return counter;
    }

    [Benchmark]
    public int ForEachArray()
    {
        var counter = 0;
        foreach (var i in anArray)
            counter += i;
        return counter;
    }

    [Benchmark]
    public int ForLoopList()
    {
        var counter = 0;
        for (int i = 0; i < aList.Count; i++)
            counter += aList[i];
        return counter;
    }

    [Benchmark]
    public int ForEachList()
    {
        var counter = 0;
        foreach (var i in aList)
            counter += i;
        return counter;
    }
  • 27. Loop-the-Loop: “Avoid foreach loop on everything except raw arrays?”
    BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
    Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
    Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
      [Host]     : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
      DefaultJob : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0

    Method       |          Mean |     StdDev | Scaled | Scaled-StdDev
    ------------ | ------------- | ---------- | ------ | -------------
    ForLoopArray |         ?? ns |            |     ?? |
    ForEachArray |         ?? ns |            |     ?? |
    ForLoopList  |         ?? ns |            |     ?? |
    ForEachList  |         ?? ns |            |     ?? |
  • 28. Loop-the-Loop: “Avoid foreach loop on everything except raw arrays?”
    BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
    Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
    Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
      [Host]     : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
      DefaultJob : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0

    Method       |          Mean |     StdDev | Scaled | Scaled-StdDev
    ------------ | ------------- | ---------- | ------ | -------------
    ForLoopArray |   383.8279 ns |  2.9472 ns |   1.00 |          0.00
    ForEachArray |   392.5611 ns |  4.1286 ns |   1.02 |          0.01
    ForLoopList  | 2,315.9658 ns | 12.1001 ns |   6.03 |          0.05
    ForEachList  | 2,663.5771 ns | 21.9822 ns |   6.94 |          0.08
  • 29. Loop-the-Loop – ‘for loop’ - Arrays
  • 30. Loop-the-Loop – ‘for loop’ - Lists
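One way to see where the list loops spend their extra time: foreach over a `List<T>` goes through the struct `List<T>.Enumerator`, whose `MoveNext()` checks on every step that the list has not been modified, whereas the C# compiler turns foreach over an array into an ordinary indexed loop (consistent with ForEachArray scaling at only 1.02). The sketch below hand-writes roughly what the compiler generates for the list case; `ForeachLoweringSketch` and its methods are hypothetical names, and the second method simply shows the version check firing.

```csharp
using System;
using System.Collections.Generic;

public static class ForeachLoweringSketch
{
    // Approximately what `foreach (var i in aList) counter += i;` is lowered to.
    public static int SumLikeForeach(List<int> aList)
    {
        var counter = 0;
        List<int>.Enumerator e = aList.GetEnumerator(); // a struct: no heap allocation
        try
        {
            while (e.MoveNext()) // each call also compares the list's version number
            {
                counter += e.Current;
            }
        }
        finally
        {
            e.Dispose();
        }
        return counter;
    }

    // The version check in action: changing the list mid-loop makes the next MoveNext() throw.
    public static bool ModifyingDuringForeachThrows(List<int> aList)
    {
        try
        {
            foreach (var i in aList)
            {
                aList.Add(i);
            }
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }
}
```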
  • 31. Abstractions - IDictionary v Dictionary
    Dictionary<string, string> dictionary = new Dictionary<string, string>();
    IDictionary<string, string> iDictionary = (IDictionary<string, string>)dictionary;

    [Benchmark]
    public Dictionary<string, string> DictionaryEnumeration()
    {
        foreach (var item in dictionary) { ; }
        return dictionary;
    }

    [Benchmark]
    public IDictionary<string, string> IDictionaryEnumeration()
    {
        foreach (var item in iDictionary) { ; }
        return iDictionary;
    }
  • 32. Abstractions - IDictionary v Dictionary
    BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
    Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
    Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
      [Host]     : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
      DefaultJob : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0

    Method                 |       Mean |    StdErr |    StdDev |  Gen 0 | Allocated
    ---------------------- | ---------- | --------- | --------- | ------ | ---------
    DictionaryEnumeration  |      ?? ns |     ?? ns |     ?? ns |     ?? |      ?? B
    IDictionaryEnumeration |      ?? ns |     ?? ns |     ?? ns |     ?? |      ?? B

    // * Diagnostic Output - MemoryDiagnoser *
    Note: the Gen 0/1/2 Measurements are per 1k Operations
  • 33. Abstractions - IDictionary v Dictionary
    BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
    Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
    Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
      [Host]     : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
      DefaultJob : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0

    Method                 |       Mean |    StdErr |    StdDev |  Gen 0 | Allocated
    ---------------------- | ---------- | --------- | --------- | ------ | ---------
    DictionaryEnumeration  | 24.0353 ns | 0.2403 ns | 0.9307 ns |      - |       0 B
    IDictionaryEnumeration | 41.6301 ns | 0.4479 ns | 2.1944 ns | 0.0086 |      32 B

    // * Diagnostic Output - MemoryDiagnoser *
    Note: the Gen 0/1/2 Measurements are per 1k Operations
  • 34. Abstractions - IDictionary v Dictionary
    Dictionary<string, string> dictionary = new Dictionary<string, string>();
    IDictionary<string, string> iDictionary = (IDictionary<string, string>)dictionary;

    // struct - so doesn't allocate
    Dictionary<string, string>.Enumerator structEnumerator = dictionary.GetEnumerator();

    // interface - allocates 56 B (64-bit) and 32 B (32-bit)
    IEnumerator<KeyValuePair<string, string>> interfaceEnumerator = iDictionary.GetEnumerator();
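The difference comes down to which `GetEnumerator()` the compiler binds to. Through the concrete type, foreach uses the public struct `Dictionary<TKey, TValue>.Enumerator` directly; through `IDictionary<TKey, TValue>` it gets an `IEnumerator<KeyValuePair<TKey, TValue>>`, and because the underlying enumerator is a struct, returning it as an interface boxes it onto the heap (the 32 B in the Allocated column above). A hypothetical check, `EnumeratorBoxingSketch`, that makes the boxing visible without timing anything, assuming a non-empty dictionary:

```csharp
using System;
using System.Collections.Generic;

public static class EnumeratorBoxingSketch
{
    // True if the concrete Dictionary enumerator is a value type (struct).
    public static bool ConcreteEnumeratorIsStruct() =>
        typeof(Dictionary<string, string>.Enumerator).IsValueType;

    // True if enumerating through the interface hands back that same struct type, boxed.
    public static bool InterfaceEnumeratorIsBoxedStruct(IDictionary<string, string> iDictionary)
    {
        IEnumerator<KeyValuePair<string, string>> e = iDictionary.GetEnumerator();
        return e is Dictionary<string, string>.Enumerator;
    }

    // Each interface call produces a new boxed copy, i.e. a separate heap object.
    public static bool EachInterfaceCallReturnsNewObject(IDictionary<string, string> iDictionary)
    {
        object first = iDictionary.GetEnumerator();
        object second = iDictionary.GetEnumerator();
        return !ReferenceEquals(first, second);
    }
}
```

Every `MoveNext()` and `Current` call on the boxed enumerator also goes through interface dispatch, which is part of the time difference as well as the allocation.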
  • 35. Low-level increments
    [LegacyJitX86Job, LegacyJitX64Job, RyuJitX64Job]
    public class Program
    {
        private double a, b, c, d;

        [Benchmark(OperationsPerInvoke = 4)]
        public void MethodA()
        {
            a++; b++; c++; d++;
        }

        [Benchmark(OperationsPerInvoke = 4)]
        public void MethodB()
        {
            a++; a++; a++; a++;
        }
    }
  • 36. Low-level increments
    BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
    Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
    Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
      [Host]       : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
      LegacyJitX64 : Clr 4.0.30319.42000, 64bit LegacyJIT/clrjit-v4.6.1590.0;compatjit-v4.6.1590.0
      LegacyJitX86 : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
      RyuJitX64    : Clr 4.0.30319.42000, 64bit RyuJIT-v4.6.1590.0

    Runtime=Clr  Allocated=0 B

    Method     | Job          | Jit       | Platform |      Mean |    StdErr |    StdDev
    ---------- | ------------ | --------- | -------- | --------- | --------- | ---------
    Parallel   | LegacyJitX64 | LegacyJit | X64      |     ?? ns |     ?? ns |     ?? ns
    Sequential | LegacyJitX64 | LegacyJit | X64      |     ?? ns |     ?? ns |     ?? ns
    Parallel   | LegacyJitX86 | LegacyJit | X86      |     ?? ns |     ?? ns |     ?? ns
    Sequential | LegacyJitX86 | LegacyJit | X86      |     ?? ns |     ?? ns |     ?? ns
    Parallel   | RyuJitX64    | RyuJit    | X64      |     ?? ns |     ?? ns |     ?? ns
    Sequential | RyuJitX64    | RyuJit    | X64      |     ?? ns |     ?? ns |     ?? ns

    MethodA() = Parallel, MethodB() = Sequential
  • 37. Low-level increments
    BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
    Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
    Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
      [Host]       : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
      LegacyJitX64 : Clr 4.0.30319.42000, 64bit LegacyJIT/clrjit-v4.6.1590.0;compatjit-v4.6.1590.0
      LegacyJitX86 : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
      RyuJitX64    : Clr 4.0.30319.42000, 64bit RyuJIT-v4.6.1590.0

    Runtime=Clr  Allocated=0 B

    Method     | Job          | Jit       | Platform |      Mean |    StdErr |    StdDev
    ---------- | ------------ | --------- | -------- | --------- | --------- | ---------
    Parallel   | LegacyJitX64 | LegacyJit | X64      | 0.3420 ns | 0.0015 ns | 0.0057 ns
    Sequential | LegacyJitX64 | LegacyJit | X64      | 2.2038 ns | 0.0014 ns | 0.0051 ns
    Parallel   | LegacyJitX86 | LegacyJit | X86      | 0.3276 ns | 0.0005 ns | 0.0020 ns
    Sequential | LegacyJitX86 | LegacyJit | X86      | 2.5229 ns | 0.0048 ns | 0.0187 ns
    Parallel   | RyuJitX64    | RyuJit    | X64      | 0.3686 ns | 0.0037 ns | 0.0144 ns
    Sequential | RyuJitX64    | RyuJit    | X64      | 0.8959 ns | 0.0023 ns | 0.0090 ns

    MethodA() = Parallel, MethodB() = Sequential
    http://en.wikipedia.org/wiki/Instruction-level_parallelism
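MethodA's four increments touch independent fields, so the CPU can overlap them; MethodB's four increments of `a` form a single dependency chain in which each add must wait for the previous result. The same idea is often applied deliberately when summing, by splitting a loop across several independent accumulators. A minimal sketch, assuming a `long[]` input and the hypothetical names `IlpSketch`, `SumSingleChain` and `SumFourChains`; both methods return the same total, and any speed difference depends on the CPU and JIT, so it should be measured rather than assumed:

```csharp
using System;

public static class IlpSketch
{
    // One accumulator: every addition depends on the result of the previous one.
    public static long SumSingleChain(long[] values)
    {
        long sum = 0;
        for (int i = 0; i < values.Length; i++)
            sum += values[i];
        return sum;
    }

    // Four independent accumulators: additions in different chains can overlap.
    public static long SumFourChains(long[] values)
    {
        long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        int i = 0;
        for (; i <= values.Length - 4; i += 4)
        {
            s0 += values[i];
            s1 += values[i + 1];
            s2 += values[i + 2];
            s3 += values[i + 3];
        }
        for (; i < values.Length; i++) // leftover elements when Length is not a multiple of 4
            s0 += values[i];
        return s0 + s1 + s2 + s3;
    }
}
```

The sketch uses `long` because integer addition gives the same total in any order; with `double`, as in the slide's fields, reordering the additions can change the rounded result.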
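  • 38. Search - Linear v Binary
    // Data has a numeric Key member (its definition is not shown on the slide); the set is sorted by Key.
    // Both methods return the index of key if found, otherwise the bitwise complement (~)
    // of the index where it would be inserted, the same convention as Array.BinarySearch.
    private static int LinearSearch(Data[] set, int key)
    {
        for (int i = 0; i < set.Length; i++)
        {
            var c = set[i].Key - key;
            if (c == 0)
            {
                return i;
            }
            if (c > 0)
            {
                return ~i;
            }
        }
        return ~set.Length;
    }

    private static int BinarySearch(Data[] set, int key)
    {
        int i = 0;
        int up = set.Length - 1;
        while (i <= up)
        {
            int mid = (up - i) / 2 + i;
            int c = set[mid].Key - key;
            if (c == 0)
            {
                return mid;
            }
            if (c < 0) i = mid + 1;
            else up = mid - 1;
        }
        return ~i;
    }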
  • 39. Search - Linear v Binary
    private readonly Data[][] dataSet;
    private Data[] currentSet;
    private int currentMid;
    private int currentMax;

    [Params(1, 2, 3, 4, 5, 7, 10, 12, 15)]
    public int Size
    {
        set
        {
            currentSet = dataSet[value];
            currentMax = value - 1;
            currentMid = value / 2;
        }
    }
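To try the two search methods outside BenchmarkDotNet they need a concrete `Data` type. The sketch below assumes a hypothetical `struct Data { public int Key; }` and sorted, distinct keys, keeps the slide's two methods with the same logic, and adds a helper (`SearchSketch.AgreeOnAllKeys`, also hypothetical) that checks both agree with `Array.BinarySearch`, which is worth doing before timing anything:

```csharp
using System;
using System.Linq;

// Hypothetical element type: the slide only shows that Data has a Key.
public struct Data
{
    public int Key;
    public Data(int key) { Key = key; }
}

public static class SearchSketch
{
    // Note: Key - key can overflow for extreme values; fine for the small keys used here.
    public static int LinearSearch(Data[] set, int key)
    {
        for (int i = 0; i < set.Length; i++)
        {
            var c = set[i].Key - key;
            if (c == 0) return i;  // found
            if (c > 0) return ~i;  // passed the spot where key would be: ~insertion point
        }
        return ~set.Length;
    }

    public static int BinarySearch(Data[] set, int key)
    {
        int i = 0;
        int up = set.Length - 1;
        while (i <= up)
        {
            int mid = (up - i) / 2 + i; // avoids overflow of (i + up)
            int c = set[mid].Key - key;
            if (c == 0) return mid;
            if (c < 0) i = mid + 1;
            else up = mid - 1;
        }
        return ~i;
    }

    // Checks both searches against Array.BinarySearch for every key in [minKey, maxKey].
    public static bool AgreeOnAllKeys(int[] sortedDistinctKeys, int minKey, int maxKey)
    {
        Data[] set = sortedDistinctKeys.Select(k => new Data(k)).ToArray();
        for (int key = minKey; key <= maxKey; key++)
        {
            int expected = Array.BinarySearch(sortedDistinctKeys, key);
            if (LinearSearch(set, key) != expected || BinarySearch(set, key) != expected)
                return false;
        }
        return true;
    }
}
```

Running `AgreeOnAllKeys` for each of the `[Params]` sizes (1 to 15 elements) is a cheap sanity check that the benchmarks compare two correct implementations.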
  • 42. readonly fields
    public struct Int256
    {
        private readonly long bits0, bits1, bits2, bits3;

        public Int256(long bits0, long bits1, long bits2, long bits3)
        {
            this.bits0 = bits0;
            this.bits1 = bits1;
            this.bits2 = bits2;
            this.bits3 = bits3;
        }

        public long Bits0 { get { return bits0; } }
        public long Bits1 { get { return bits1; } }
        public long Bits2 { get { return bits2; } }
        public long Bits3 { get { return bits3; } }
    }

    [LegacyJitX86Job, LegacyJitX64Job, RyuJitX64Job]
    public class Program
    {
        // The same Int256 value, stored once as a readonly field and once as a normal field
        private readonly Int256 readOnlyField = new Int256(1L, 5L, 10L, 100L);
        private Int256 field = new Int256(1L, 5L, 10L, 100L);

        [Benchmark]
        public long GetValue()
        {
            return field.Bits0 + field.Bits1 + field.Bits2 + field.Bits3;
        }

        [Benchmark]
        public long GetReadOnlyValue()
        {
            return readOnlyField.Bits0 + readOnlyField.Bits1 + readOnlyField.Bits2 + readOnlyField.Bits3;
        }
    }
  • 43. readonly fields
    BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
    Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
    Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
      [Host]       : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
      LegacyJitX64 : Clr 4.0.30319.42000, 64bit LegacyJIT/clrjit-v4.6.1590.0;compatjit-v4.6.1590.0
      LegacyJitX86 : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
      RyuJitX64    : Clr 4.0.30319.42000, 64bit RyuJIT-v4.6.1590.0

    Runtime=Clr  Allocated=0 B

    Method           | Job          | Jit       | Platform |      Mean |    StdErr |    StdDev
    ---------------- | ------------ | --------- | -------- | --------- | --------- | ---------
    GetValue         | LegacyJitX64 | LegacyJit | X64      |     ?? ns |     ?? ns |     ?? ns
    GetReadOnlyValue | LegacyJitX64 | LegacyJit | X64      |     ?? ns |     ?? ns |     ?? ns
    GetValue         | LegacyJitX86 | LegacyJit | X86      |     ?? ns |     ?? ns |     ?? ns
    GetReadOnlyValue | LegacyJitX86 | LegacyJit | X86      |     ?? ns |     ?? ns |     ?? ns
    GetValue         | RyuJitX64    | RyuJit    | X64      |     ?? ns |     ?? ns |     ?? ns
    GetReadOnlyValue | RyuJitX64    | RyuJit    | X64      |     ?? ns |     ?? ns |     ?? ns
  • 44. readonly fields
    BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
    Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
    Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
      [Host]       : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
      LegacyJitX64 : Clr 4.0.30319.42000, 64bit LegacyJIT/clrjit-v4.6.1590.0;compatjit-v4.6.1590.0
      LegacyJitX86 : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
      RyuJitX64    : Clr 4.0.30319.42000, 64bit RyuJIT-v4.6.1590.0

    Runtime=Clr  Allocated=0 B

    Method           | Job          | Jit       | Platform |      Mean |    StdErr |    StdDev
    ---------------- | ------------ | --------- | -------- | --------- | --------- | ---------
    GetValue         | LegacyJitX64 | LegacyJit | X64      | 0.7893 ns | 0.0078 ns | 0.0291 ns
    GetReadOnlyValue | LegacyJitX64 | LegacyJit | X64      | 9.5362 ns | 0.0251 ns | 0.0971 ns
    GetValue         | LegacyJitX86 | LegacyJit | X86      | 1.4625 ns | 0.0506 ns | 0.1959 ns
    GetReadOnlyValue | LegacyJitX86 | LegacyJit | X86      | 1.9743 ns | 0.0641 ns | 0.2481 ns
    GetValue         | RyuJitX64    | RyuJit    | X64      | 0.3852 ns | 0.0183 ns | 0.0710 ns
    GetReadOnlyValue | RyuJitX64    | RyuJit    | X64      | 9.6406 ns | 0.0803 ns | 0.3109 ns

    https://codeblog.jonskeet.uk/2014/07/16/micro-optimization-the-surprising-inefficiency-of-readonly-fields/
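The usual explanation for the readonly rows is defensive copying: Int256 is not declared as a readonly struct, so before calling a property getter on a readonly field the C# compiler copies the whole 32-byte value to a temporary, in case the getter mutates it (see Jon Skeet's post linked above). The copy is invisible with pure getters, but a mutating method makes it observable. A minimal sketch with hypothetical `Counter` and `Holder` types:

```csharp
using System;

public struct Counter
{
    public int Value;

    // A mutating method: it changes whichever copy it is called on.
    public void Increment() => Value++;
}

public class Holder
{
    public readonly Counter ReadOnlyCounter; // member calls go through a defensive copy
    public Counter MutableCounter;           // member calls operate on the field itself

    public void IncrementBoth()
    {
        ReadOnlyCounter.Increment(); // increments a temporary copy; the field stays at 0
        MutableCounter.Increment();  // increments the field
    }
}
```

From C# 7.2, a struct can be declared `readonly struct`, which guarantees its members cannot mutate it and lets the compiler skip these defensive copies; applying that to Int256 is one way to avoid the cost measured above.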
  • 45. MOAR Benchmarks!!
    • Analysing Optimisations in the Wire Serialiser
      http://mattwarren.org/2016/08/23/Analysing-Optimisations-in-the-Wire-Serialiser/
    • Optimising LINQ
      http://mattwarren.org/2016/09/29/Optimising-LINQ/
    • Why is reflection slow?
      http://mattwarren.org/2016/12/14/Why-is-Reflection-slow/
    • Why Exceptions should be Exceptional
      http://mattwarren.org/2016/12/20/Why-Exceptions-should-be-Exceptional/

Editor's Notes

  1. This is what we aim for. This is why we wanted to build BenchmarkDotNet.
  2. The aim is to make a framework that can accurately measure millisecond, microsecond and nanosecond benchmarks. In reality the main use cases are probably millisecond and microsecond benchmarks, so these must work above all else! (Even lower down are picoseconds: 1…1000 ps, one trillionth of a second, the scale of CPU pipelining.)
  3. Avoid foreach loop on everything except raw arrays