Where the wild things are - Benchmarking and Micro-Optimisations

Matt Warren
@matthewwarren
http://mattwarren.org/
Premature Optimization
“We should forget about small efficiencies, say
about 97% of the time: premature
optimization is the root of all evil. Yet we
should not pass up our opportunities in that
critical 3%.”
- Donald Knuth
Profiling Tools
• ANTS Performance Profiler - Redgate
• dotTrace & dotMemory - JetBrains
• PerfView - Microsoft (free)
• Visual Studio Profiling Tools (Ultimate, Premium or Professional)
• MiniProfiler - Stack Overflow (free)
BenchmarkDotNet
Why do you need a
benchmarking library?
static void Profile(int iterations, Action action)
{
    action();     // warm up
    GC.Collect(); // clean up
    var watch = new Stopwatch();
    watch.Start();
    for (int i = 0; i < iterations; i++)
    {
        action();
    }
    watch.Stop();
    Console.WriteLine("Time Elapsed {0} ms", watch.ElapsedMilliseconds);
}
Benchmarking small code samples in C#, can this implementation be improved?
http://stackoverflow.com/q/1047218/4500
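// A static field of type T needs a generic containing type to compile
static class Profiler<T>
{
    // Storing each result in a field keeps it observable,
    // so the measured work cannot be treated as dead code
    private static T Result;

    public static void Profile(int iterations, Func<T> func)
    {
        func();       // warm up
        GC.Collect(); // clean up
        var watch = new Stopwatch();
        watch.Start();
        for (int i = 0; i < iterations; i++)
        {
            Result = func();
        }
        watch.Stop();
        Console.WriteLine("Time Elapsed {0} ms", watch.ElapsedMilliseconds);
    }
}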
Benchmarking small code samples in C#, can this implementation be improved?
http://stackoverflow.com/q/1047218/4500
BenchmarkDotNet project
Andrey Akinshin (the ‘Boss’)
@andrey_akinshin
http://aakinshin.net/en/blog/
Matt Warren (me)
Adam Sitnik (.NET Core guru)
@SitnikAdam
http://adamsitnik.com/
.NET
Foundation
Goals of BenchmarkDotNet
Benchmarking library that is:
• Accurate
• Easy-to-use
• Helpful
Stopwatch under the hood http://aakinshin.net/en/blog/dotnet/stopwatch/
LegacyJIT-x86 and first method call http://aakinshin.net/en/blog/dotnet/legacyjitx86-and-first-method-call/
Goals of BenchmarkDotNet
Proper docs!
benchmarkdotnet.org/
What BenchmarkDotNet doesn’t do
• Multi-threaded benchmarks
• Integrate with CI builds
• Unit test runner integration
• Anything else?
http://github.com/dotnet/BenchmarkDotNet/issues/
“Other Benchmarking tools are available”
• NBench
• https://github.com/petabridge/NBench
• Microsoft Xunit performance
• http://github.com/Microsoft/xunit-performance/
• Lambda Micro Benchmarking (“Clash of the Lambdas”)
• https://github.com/biboudis/LambdaMicrobenchmarking
• Etimo.Benchmarks
• http://etimo.se/blog/etimo-benchmarks-lightweight-net-benchmark-tool/
• MeasureIt
• https://blogs.msdn.microsoft.com/vancem/2009/02/06/measureit-update-tool-for-doing-microbenchmarks-for-net/
How it works
An invocation of the target method is an operation.
A bunch of operations is an iteration.
Iteration types:
• Pilot: The best operation count will be chosen.
• IdleWarmup, IdleTarget: BenchmarkDotNet overhead will be evaluated.
• MainWarmup: Warmup of the main method.
• MainTarget: Main measurements.
• Result = MainTarget – AverageOverhead
http://benchmarkdotnet.org/HowItWorks.htm
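The subtraction in the last bullet can be sketched in a few lines. This is only an illustration of the arithmetic, not BenchmarkDotNet's actual engine: the OverheadSketch class, the ResultPerOperation helper, the division by the operation count and the sample timings are all assumptions made up for this example.

```csharp
using System;
using System.Linq;

public static class OverheadSketch
{
    // Hypothetical helper: average the idle (overhead) iterations and the
    // main iterations, subtract, then divide by the operations per iteration.
    public static double ResultPerOperation(
        double[] idleTargetNs, double[] mainTargetNs, long operationsPerIteration)
    {
        double averageOverhead = idleTargetNs.Average();
        double averageMain = mainTargetNs.Average();
        return (averageMain - averageOverhead) / operationsPerIteration;
    }

    public static void Main()
    {
        // Invented timings: total nanoseconds for each iteration
        double[] idle = { 1000, 1200, 1100 };
        double[] main = { 14100, 14000, 14200 };

        // (14100 - 1100) / 1000 operations = 13 ns per operation
        Console.WriteLine(ResultPerOperation(idle, main, 1000));
    }
}
```

The point of the idle iterations is that the loop and call overhead is measured in the same way as the real work, so it can be removed from the final figure.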
What happens under the covers?
Image credit Albert Rodríguez @UncleFirefox
DEMO
‘Hello World’ Benchmark
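The demo itself is not part of this transcript. As a stand-in, here is a minimal sketch of what a first benchmark could look like, assuming the BenchmarkDotNet NuGet package is referenced and the project is built in Release mode. The HelloWorldBenchmarks class and its two methods are hypothetical names chosen for this example; [Benchmark], Baseline and BenchmarkRunner.Run<T>() come from the library.

```csharp
using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class HelloWorldBenchmarks
{
    // Read from a field so the compiler cannot fold the strings at compile time
    private readonly string name = "World";

    [Benchmark(Baseline = true)]
    public string Concat()
    {
        return "Hello " + name;
    }

    [Benchmark]
    public string Format()
    {
        return string.Format("Hello {0}", name);
    }
}

public class Program
{
    public static void Main(string[] args)
    {
        // Runs every [Benchmark] method in the class and prints a summary table
        BenchmarkRunner.Run<HelloWorldBenchmarks>();
    }
}
```

Both methods return their result, so the work being measured cannot be discarded as unused.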
Scale of benchmarks
• millisecond - ms
  • One thousandth of one second, single webapp request
• microsecond - us or µs
  • One millionth of one second, several in-memory operations
• nanosecond - ns
  • One billionth of one second, single operations
Who ‘times’ the timers?
[Benchmark]
public long StopwatchLatency()
{
    return Stopwatch.GetTimestamp();
}

[Benchmark]
public long StopwatchGranularity()
{
    // Loop until Stopwatch.GetTimestamp()
    // gives us a different value
    long lastTimestamp = Stopwatch.GetTimestamp();
    while (Stopwatch.GetTimestamp() == lastTimestamp)
    {
    }
    return lastTimestamp;
}

[Benchmark]
public long DateTimeLatency()
{
    return DateTime.Now.Ticks;
}

[Benchmark]
public long DateTimeGranularity()
{
    // Loop until DateTime.Now
    // gives us a different value
    long lastTimestamp = DateTime.Now.Ticks;
    while (DateTime.Now.Ticks == lastTimestamp)
    {
    }
    return lastTimestamp;
}
BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
[Host] : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
Job-FIDMNL : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
Method               |            Mean |      StdDev | Allocated |
-------------------- |---------------- |------------ |---------- |
StopwatchLatency     |      12.9960 ns |   0.1609 ns |       0 B |
StopwatchGranularity |     374.3049 ns |   2.4388 ns |       0 B |
DateTimeLatency      |     682.2320 ns |   8.9341 ns |      32 B |
DateTimeGranularity  | 996,025.6492 ns | 413.9175 ns |  47.34 kB |
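The Resolution value in the header follows directly from the reported Frequency: one timer tick lasts 1e9 / Frequency nanoseconds. The sketch below reproduces that arithmetic; TimerMath and ResolutionNs are illustrative names, and Stopwatch.Frequency will report a different value on other machines.

```csharp
using System;
using System.Diagnostics;
using System.Globalization;

public static class TimerMath
{
    // Length of one timer tick in nanoseconds for a given tick frequency
    public static double ResolutionNs(long frequencyHz)
    {
        return 1e9 / frequencyHz;
    }

    public static void Main()
    {
        // Frequency taken from the benchmark header above
        double fromSlide = ResolutionNs(2630673);
        Console.WriteLine(fromSlide.ToString("F4", CultureInfo.InvariantCulture)); // 380.1309

        // The same calculation for the machine running this code
        Console.WriteLine(ResolutionNs(Stopwatch.Frequency));
    }
}
```

This also helps when reading the table: the measured StopwatchGranularity of about 374 ns is roughly one tick of about 380 ns.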
Loop-the-Loop
“Avoid foreach loop on everything except raw arrays?”
[Benchmark(Baseline = true)]
public int ForLoopArray()
{
    var counter = 0;
    for (int i = 0; i < anArray.Length; i++)
        counter += anArray[i];
    return counter;
}

[Benchmark]
public int ForEachArray()
{
    var counter = 0;
    foreach (var i in anArray)
        counter += i;
    return counter;
}

[Benchmark]
public int ForLoopList()
{
    var counter = 0;
    for (int i = 0; i < aList.Count; i++)
        counter += aList[i];
    return counter;
}

[Benchmark]
public int ForEachList()
{
    var counter = 0;
    foreach (var i in aList)
        counter += i;
    return counter;
}
BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
[Host] : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
DefaultJob : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
Method       |          Mean |     StdDev | Scaled | Scaled-StdDev |
------------ |-------------- |----------- |------- |-------------- |
ForLoopArray |   383.8279 ns |  2.9472 ns |   1.00 |          0.00 |
ForEachArray |   392.5611 ns |  4.1286 ns |   1.02 |          0.01 |
ForLoopList  | 2,315.9658 ns | 12.1001 ns |   6.03 |          0.05 |
ForEachList  | 2,663.5771 ns | 21.9822 ns |   6.94 |          0.08 |
Loop-the-Loop – ‘for loop’ - Arrays
Loop-the-Loop – ‘for loop’ - Lists
Abstractions - IDictionary v Dictionary
Dictionary<string, string> dictionary =
    new Dictionary<string, string>();
IDictionary<string, string> iDictionary =
    (IDictionary<string, string>)dictionary;

[Benchmark]
public Dictionary<string, string> DictionaryEnumeration()
{
    foreach (var item in dictionary) { ; }
    return dictionary;
}

[Benchmark]
public IDictionary<string, string> IDictionaryEnumeration()
{
    foreach (var item in iDictionary) { ; }
    return iDictionary;
}
BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
[Host] : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
DefaultJob : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
Method                 |       Mean |    StdErr |    StdDev |  Gen 0 | Allocated |
---------------------- |----------- |---------- |---------- |------- |---------- |
DictionaryEnumeration  | 24.0353 ns | 0.2403 ns | 0.9307 ns |      - |       0 B |
IDictionaryEnumeration | 41.6301 ns | 0.4479 ns | 2.1944 ns | 0.0086 |      32 B |
// * Diagnostic Output - MemoryDiagnoser *
Note: the Gen 0/1/2 Measurements are per 1k Operations
Abstractions - IDictionary v Dictionary
Dictionary<string, string> dictionary =
    new Dictionary<string, string>();
IDictionary<string, string> iDictionary =
    (IDictionary<string, string>)dictionary;

// struct - so doesn't allocate
Dictionary<string, string>.Enumerator structEnumerator =
    dictionary.GetEnumerator();

// interface - the struct is boxed, so this allocates 56 B (64-bit) or 32 B (32-bit)
IEnumerator<KeyValuePair<string, string>> interfaceEnumerator =
    iDictionary.GetEnumerator();
Low-level increments
[LegacyJitX86Job, LegacyJitX64Job, RyuJitX64Job]
public class Program
{
    private double a, b, c, d;

    [Benchmark(OperationsPerInvoke = 4)]
    public void MethodA()
    {
        a++; b++; c++; d++;
    }

    [Benchmark(OperationsPerInvoke = 4)]
    public void MethodB()
    {
        a++; a++; a++; a++;
    }
}
BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
[Host] : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
LegacyJitX64 : Clr 4.0.30319.42000, 64bit LegacyJIT/clrjit-v4.6.1590.0;compatjit-v4.6.1590.0
LegacyJitX86 : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
RyuJitX64 : Clr 4.0.30319.42000, 64bit RyuJIT-v4.6.1590.0
Runtime=Clr Allocated=0 B
Method     | Job          | Jit       | Platform |      Mean |    StdErr |    StdDev |
---------- |------------- |---------- |--------- |---------- |---------- |---------- |
Parallel   | LegacyJitX64 | LegacyJit | X64      | 0.3420 ns | 0.0015 ns | 0.0057 ns |
Sequential | LegacyJitX64 | LegacyJit | X64      | 2.2038 ns | 0.0014 ns | 0.0051 ns |
Parallel   | LegacyJitX86 | LegacyJit | X86      | 0.3276 ns | 0.0005 ns | 0.0020 ns |
Sequential | LegacyJitX86 | LegacyJit | X86      | 2.5229 ns | 0.0048 ns | 0.0187 ns |
Parallel   | RyuJitX64    | RyuJit    | X64      | 0.3686 ns | 0.0037 ns | 0.0144 ns |
Sequential | RyuJitX64    | RyuJit    | X64      | 0.8959 ns | 0.0023 ns | 0.0090 ns |
MethodA() = Parallel, MethodB() = Sequential
http://en.wikipedia.org/wiki/Instruction-level_parallelism
Search - Linear v Binary
private static int LinearSearch(
    Data[] set, int key)
{
    for (int i = 0; i < set.Length; i++)
    {
        var c = set[i].Key - key;
        if (c == 0)
        {
            return i;
        }
        if (c > 0)
        {
            return ~i;
        }
    }
    return ~set.Length;
}

private static int BinarySearch(
    Data[] set, int key)
{
    int i = 0;
    int up = set.Length - 1;
    while (i <= up)
    {
        int mid = (up - i) / 2 + i;
        int c = set[mid].Key - key;
        if (c == 0)
        {
            return mid;
        }
        if (c < 0)
            i = mid + 1;
        else
            up = mid - 1;
    }
    return ~i;
}
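Both methods report a miss as the bitwise complement of the index where the key would be inserted, which is the same convention Array.BinarySearch uses. The sketch below shows how a caller might decode such a result. It uses a plain int[] and the framework's Array.BinarySearch rather than the Data type from the slides, and SearchResultDemo and Describe are hypothetical names.

```csharp
using System;

public static class SearchResultDemo
{
    // A non-negative result is the index of the match; a negative result
    // is the bitwise complement (~) of the insertion index.
    public static string Describe(int result)
    {
        return result >= 0
            ? "found at index " + result
            : "not found, insert at index " + ~result;
    }

    public static void Main()
    {
        int[] sorted = { 1, 3, 5, 7 };

        Console.WriteLine(Describe(Array.BinarySearch(sorted, 5))); // found at index 2
        Console.WriteLine(Describe(Array.BinarySearch(sorted, 4))); // not found, insert at index 2
    }
}
```

Returning ~i instead of -1 costs nothing extra and tells the caller where the key belongs, which is useful when the next step is an insert.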
Search - Linear v Binary
private readonly Data[][] dataSet;
private Data[] currentSet;
private int currentMid;
private int currentMax;

[Params(1, 2, 3, 4, 5, 7, 10, 12, 15)]
public int Size
{
    set
    {
        currentSet = dataSet[value];
        currentMax = value - 1;
        currentMid = value / 2;
    }
}
Linear Search v Binary Search
readonly fields
public struct Int256
{
    private readonly long bits0, bits1,
                          bits2, bits3;

    public Int256(long bits0, long bits1,
                  long bits2, long bits3)
    {
        this.bits0 = bits0; this.bits1 = bits1;
        this.bits2 = bits2; this.bits3 = bits3;
    }

    public long Bits0 { get { return bits0; } }
    public long Bits1 { get { return bits1; } }
    public long Bits2 { get { return bits2; } }
    public long Bits3 { get { return bits3; } }
}

[LegacyJitX86Job, LegacyJitX64Job, RyuJitX64Job]
public class Program
{
    // The two fields differ only in the readonly modifier
    private readonly Int256 readOnlyField =
        new Int256(1L, 5L, 10L, 100L);
    private Int256 field =
        new Int256(1L, 5L, 10L, 100L);

    [Benchmark]
    public long GetValue()
    {
        return field.Bits0 + field.Bits1 +
               field.Bits2 + field.Bits3;
    }

    [Benchmark]
    public long GetReadOnlyValue()
    {
        return readOnlyField.Bits0 +
               readOnlyField.Bits1 +
               readOnlyField.Bits2 +
               readOnlyField.Bits3;
    }
}
BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
[Host] : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
LegacyJitX64 : Clr 4.0.30319.42000, 64bit LegacyJIT/clrjit-v4.6.1590.0;compatjit-v4.6.1590.0
LegacyJitX86 : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
RyuJitX64 : Clr 4.0.30319.42000, 64bit RyuJIT-v4.6.1590.0
Runtime=Clr Allocated=0 B
Method           | Job          | Jit       | Platform |      Mean |    StdErr |    StdDev |
---------------- |------------- |---------- |--------- |---------- |---------- |---------- |
GetValue         | LegacyJitX64 | LegacyJit | X64      | 0.7893 ns | 0.0078 ns | 0.0291 ns |
GetReadOnlyValue | LegacyJitX64 | LegacyJit | X64      | 9.5362 ns | 0.0251 ns | 0.0971 ns |
GetValue         | LegacyJitX86 | LegacyJit | X86      | 1.4625 ns | 0.0506 ns | 0.1959 ns |
GetReadOnlyValue | LegacyJitX86 | LegacyJit | X86      | 1.9743 ns | 0.0641 ns | 0.2481 ns |
GetValue         | RyuJitX64    | RyuJit    | X64      | 0.3852 ns | 0.0183 ns | 0.0710 ns |
GetReadOnlyValue | RyuJitX64    | RyuJit    | X64      | 9.6406 ns | 0.0803 ns | 0.3109 ns |
https://codeblog.jonskeet.uk/2014/07/16/micro-optimization-the-surprising-inefficiency-of-readonly-fields/
MOAR Benchmarks!!
Analysing Optimisations in the Wire Serialiser
• http://mattwarren.org/2016/08/23/Analysing-Optimisations-in-the-Wire-Serialiser/
Optimising LINQ
• http://mattwarren.org/2016/09/29/Optimising-LINQ/
Why is reflection slow?
• http://mattwarren.org/2016/12/14/Why-is-Reflection-slow/
Why Exceptions should be Exceptional
• http://mattwarren.org/2016/12/20/Why-Exceptions-should-be-Exceptional/
Resources
QUESTIONS?
1 of 47

Recommended

Performance and how to measure it - ProgSCon London 2016 by
Performance and how to measure it - ProgSCon London 2016Performance and how to measure it - ProgSCon London 2016
Performance and how to measure it - ProgSCon London 2016Matt Warren
769 views41 slides
Performance is a Feature! at DDD 11 by
Performance is a Feature! at DDD 11Performance is a Feature! at DDD 11
Performance is a Feature! at DDD 11Matt Warren
1.7K views49 slides
Performance is a feature! - London .NET User Group by
Performance is a feature! - London .NET User GroupPerformance is a feature! - London .NET User Group
Performance is a feature! - London .NET User GroupMatt Warren
100.3K views45 slides
Performance is a Feature! by
Performance is a Feature!Performance is a Feature!
Performance is a Feature!PostSharp Technologies
835 views53 slides
From 'dotnet run' to 'hello world' by
From 'dotnet run' to 'hello world'From 'dotnet run' to 'hello world'
From 'dotnet run' to 'hello world'Matt Warren
86.4K views40 slides
How Many Slaves (Ukoug) by
How Many Slaves (Ukoug)How Many Slaves (Ukoug)
How Many Slaves (Ukoug)Doug Burns
802 views37 slides

More Related Content

What's hot

DTrace - Miracle Scotland Database Forum by
DTrace - Miracle Scotland Database ForumDTrace - Miracle Scotland Database Forum
DTrace - Miracle Scotland Database ForumDoug Burns
6.2K views39 slides
Profiling your Applications using the Linux Perf Tools by
Profiling your Applications using the Linux Perf ToolsProfiling your Applications using the Linux Perf Tools
Profiling your Applications using the Linux Perf ToolsemBO_Conference
9.7K views37 slides
Profiling & Testing with Spark by
Profiling & Testing with SparkProfiling & Testing with Spark
Profiling & Testing with SparkRoger Rafanell Mas
3K views51 slides
pstack, truss etc to understand deeper issues in Oracle database by
pstack, truss etc to understand deeper issues in Oracle databasepstack, truss etc to understand deeper issues in Oracle database
pstack, truss etc to understand deeper issues in Oracle databaseRiyaj Shamsudeen
1.4K views34 slides
Verification of Concurrent and Distributed Systems by
Verification of Concurrent and Distributed SystemsVerification of Concurrent and Distributed Systems
Verification of Concurrent and Distributed SystemsMykola Novik
633 views52 slides
Demystifying cost based optimization by
Demystifying cost based optimizationDemystifying cost based optimization
Demystifying cost based optimizationRiyaj Shamsudeen
550 views31 slides

What's hot(20)

DTrace - Miracle Scotland Database Forum by Doug Burns
DTrace - Miracle Scotland Database ForumDTrace - Miracle Scotland Database Forum
DTrace - Miracle Scotland Database Forum
Doug Burns6.2K views
Profiling your Applications using the Linux Perf Tools by emBO_Conference
Profiling your Applications using the Linux Perf ToolsProfiling your Applications using the Linux Perf Tools
Profiling your Applications using the Linux Perf Tools
emBO_Conference9.7K views
pstack, truss etc to understand deeper issues in Oracle database by Riyaj Shamsudeen
pstack, truss etc to understand deeper issues in Oracle databasepstack, truss etc to understand deeper issues in Oracle database
pstack, truss etc to understand deeper issues in Oracle database
Riyaj Shamsudeen1.4K views
Verification of Concurrent and Distributed Systems by Mykola Novik
Verification of Concurrent and Distributed SystemsVerification of Concurrent and Distributed Systems
Verification of Concurrent and Distributed Systems
Mykola Novik633 views
Demystifying cost based optimization by Riyaj Shamsudeen
Demystifying cost based optimizationDemystifying cost based optimization
Demystifying cost based optimization
Riyaj Shamsudeen550 views
NetConf 2018 BPF Observability by Brendan Gregg
NetConf 2018 BPF ObservabilityNetConf 2018 BPF Observability
NetConf 2018 BPF Observability
Brendan Gregg2.7K views
A deep dive about VIP,HAIP, and SCAN by Riyaj Shamsudeen
A deep dive about VIP,HAIP, and SCAN A deep dive about VIP,HAIP, and SCAN
A deep dive about VIP,HAIP, and SCAN
Riyaj Shamsudeen758 views
Profiling Ruby by Ian Pointer
Profiling RubyProfiling Ruby
Profiling Ruby
Ian Pointer2.3K views
Down to Stack Traces, up from Heap Dumps by Andrei Pangin
Down to Stack Traces, up from Heap DumpsDown to Stack Traces, up from Heap Dumps
Down to Stack Traces, up from Heap Dumps
Andrei Pangin1.7K views
.NET Fest 2019. Николай Балакин. Микрооптимизации в мире .NET by NETFest
.NET Fest 2019. Николай Балакин. Микрооптимизации в мире .NET.NET Fest 2019. Николай Балакин. Микрооптимизации в мире .NET
.NET Fest 2019. Николай Балакин. Микрооптимизации в мире .NET
NETFest705 views
Do snow.rwn by ARUN DN
Do snow.rwnDo snow.rwn
Do snow.rwn
ARUN DN57 views
Optimizing Parallel Reduction in CUDA : NOTES by Subhajit Sahu
Optimizing Parallel Reduction in CUDA : NOTESOptimizing Parallel Reduction in CUDA : NOTES
Optimizing Parallel Reduction in CUDA : NOTES
Subhajit Sahu117 views
The Ring programming language version 1.7 book - Part 12 of 196 by Mahmoud Samir Fayed
The Ring programming language version 1.7 book - Part 12 of 196The Ring programming language version 1.7 book - Part 12 of 196
The Ring programming language version 1.7 book - Part 12 of 196
Performance tuning a quick intoduction by Riyaj Shamsudeen
Performance tuning   a quick intoductionPerformance tuning   a quick intoduction
Performance tuning a quick intoduction
Riyaj Shamsudeen2.2K views
The Ring programming language version 1.6 book - Part 11 of 189 by Mahmoud Samir Fayed
The Ring programming language version 1.6 book - Part 11 of 189The Ring programming language version 1.6 book - Part 11 of 189
The Ring programming language version 1.6 book - Part 11 of 189
Device-specific Clang Tooling for Embedded Systems by emBO_Conference
Device-specific Clang Tooling for Embedded SystemsDevice-specific Clang Tooling for Embedded Systems
Device-specific Clang Tooling for Embedded Systems
emBO_Conference5.1K views

Similar to Where the wild things are - Benchmarking and Micro-Optimisations

Adam Sitnik "State of the .NET Performance" by
Adam Sitnik "State of the .NET Performance"Adam Sitnik "State of the .NET Performance"
Adam Sitnik "State of the .NET Performance"Yulia Tsisyk
1.6K views43 slides
State of the .Net Performance by
State of the .Net PerformanceState of the .Net Performance
State of the .Net PerformanceCUSTIS
198 views43 slides
RxJava applied [JavaDay Kyiv 2016] by
RxJava applied [JavaDay Kyiv 2016]RxJava applied [JavaDay Kyiv 2016]
RxJava applied [JavaDay Kyiv 2016]Igor Lozynskyi
514 views81 slides
Presto anatomy by
Presto anatomyPresto anatomy
Presto anatomyDongmin Yu
5K views50 slides
Solr @ Etsy - Apache Lucene Eurocon by
Solr @ Etsy - Apache Lucene EuroconSolr @ Etsy - Apache Lucene Eurocon
Solr @ Etsy - Apache Lucene EuroconGiovanni Fernandez-Kincade
1.6K views56 slides
Deep dumpster diving 2010 by
Deep dumpster diving 2010Deep dumpster diving 2010
Deep dumpster diving 2010RonnBlack
423 views45 slides

Similar to Where the wild things are - Benchmarking and Micro-Optimisations(20)

Adam Sitnik "State of the .NET Performance" by Yulia Tsisyk
Adam Sitnik "State of the .NET Performance"Adam Sitnik "State of the .NET Performance"
Adam Sitnik "State of the .NET Performance"
Yulia Tsisyk1.6K views
State of the .Net Performance by CUSTIS
State of the .Net PerformanceState of the .Net Performance
State of the .Net Performance
CUSTIS198 views
RxJava applied [JavaDay Kyiv 2016] by Igor Lozynskyi
RxJava applied [JavaDay Kyiv 2016]RxJava applied [JavaDay Kyiv 2016]
RxJava applied [JavaDay Kyiv 2016]
Igor Lozynskyi514 views
Deep dumpster diving 2010 by RonnBlack
Deep dumpster diving 2010Deep dumpster diving 2010
Deep dumpster diving 2010
RonnBlack423 views
Profiling ruby by nasirj
Profiling rubyProfiling ruby
Profiling ruby
nasirj961 views
[245] presto 내부구조 파헤치기 by NAVER D2
[245] presto 내부구조 파헤치기[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기
NAVER D210.1K views
The Road To Reactive with RxJava JEEConf 2016 by Frank Lyaruu
The Road To Reactive with RxJava JEEConf 2016The Road To Reactive with RxJava JEEConf 2016
The Road To Reactive with RxJava JEEConf 2016
Frank Lyaruu749 views
Mysql handle socket by Philip Zhong
Mysql handle socketMysql handle socket
Mysql handle socket
Philip Zhong2.2K views
LSFMM 2019 BPF Observability by Brendan Gregg
LSFMM 2019 BPF ObservabilityLSFMM 2019 BPF Observability
LSFMM 2019 BPF Observability
Brendan Gregg8.3K views
Nodejs性能分析优化和分布式设计探讨 by flyinweb
Nodejs性能分析优化和分布式设计探讨Nodejs性能分析优化和分布式设计探讨
Nodejs性能分析优化和分布式设计探讨
flyinweb3K views
5 must have patterns for your microservice - techorama by Ali Kheyrollahi
5 must have patterns for your microservice - techorama5 must have patterns for your microservice - techorama
5 must have patterns for your microservice - techorama
Ali Kheyrollahi498 views
Benchmarking and PHPBench by dantleech
Benchmarking and PHPBenchBenchmarking and PHPBench
Benchmarking and PHPBench
dantleech1K views
QA Fest 2019. Антон Молдован. Load testing which you always wanted by QAFest
QA Fest 2019. Антон Молдован. Load testing which you always wantedQA Fest 2019. Антон Молдован. Load testing which you always wanted
QA Fest 2019. Антон Молдован. Load testing which you always wanted
QAFest324 views
Monitoring Complex Systems: Keeping Your Head on Straight in a Hard World by Brian Troutwine
Monitoring Complex Systems: Keeping Your Head on Straight in a Hard WorldMonitoring Complex Systems: Keeping Your Head on Straight in a Hard World
Monitoring Complex Systems: Keeping Your Head on Straight in a Hard World
Brian Troutwine2K views
Application Monitoring using Open Source: VictoriaMetrics - ClickHouse by VictoriaMetrics
Application Monitoring using Open Source: VictoriaMetrics - ClickHouseApplication Monitoring using Open Source: VictoriaMetrics - ClickHouse
Application Monitoring using Open Source: VictoriaMetrics - ClickHouse
VictoriaMetrics180 views

Recently uploaded

How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ... by
How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ...How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ...
How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ...ShapeBlue
97 views28 slides
Network Source of Truth and Infrastructure as Code revisited by
Network Source of Truth and Infrastructure as Code revisitedNetwork Source of Truth and Infrastructure as Code revisited
Network Source of Truth and Infrastructure as Code revisitedNetwork Automation Forum
49 views45 slides
Developments to CloudStack’s SDN ecosystem: Integration with VMWare NSX 4 - P... by
Developments to CloudStack’s SDN ecosystem: Integration with VMWare NSX 4 - P...Developments to CloudStack’s SDN ecosystem: Integration with VMWare NSX 4 - P...
Developments to CloudStack’s SDN ecosystem: Integration with VMWare NSX 4 - P...ShapeBlue
120 views62 slides
Microsoft Power Platform.pptx by
Microsoft Power Platform.pptxMicrosoft Power Platform.pptx
Microsoft Power Platform.pptxUni Systems S.M.S.A.
74 views38 slides
"Surviving highload with Node.js", Andrii Shumada by
"Surviving highload with Node.js", Andrii Shumada "Surviving highload with Node.js", Andrii Shumada
"Surviving highload with Node.js", Andrii Shumada Fwdays
49 views29 slides
CloudStack Object Storage - An Introduction - Vladimir Petrov - ShapeBlue by
CloudStack Object Storage - An Introduction - Vladimir Petrov - ShapeBlueCloudStack Object Storage - An Introduction - Vladimir Petrov - ShapeBlue
CloudStack Object Storage - An Introduction - Vladimir Petrov - ShapeBlueShapeBlue
63 views15 slides

Recently uploaded(20)

How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ... by ShapeBlue
How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ...How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ...
How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ...
ShapeBlue97 views
Developments to CloudStack’s SDN ecosystem: Integration with VMWare NSX 4 - P... by ShapeBlue
Developments to CloudStack’s SDN ecosystem: Integration with VMWare NSX 4 - P...Developments to CloudStack’s SDN ecosystem: Integration with VMWare NSX 4 - P...
Developments to CloudStack’s SDN ecosystem: Integration with VMWare NSX 4 - P...
ShapeBlue120 views
"Surviving highload with Node.js", Andrii Shumada by Fwdays
"Surviving highload with Node.js", Andrii Shumada "Surviving highload with Node.js", Andrii Shumada
"Surviving highload with Node.js", Andrii Shumada
Fwdays49 views
CloudStack Object Storage - An Introduction - Vladimir Petrov - ShapeBlue by ShapeBlue
CloudStack Object Storage - An Introduction - Vladimir Petrov - ShapeBlueCloudStack Object Storage - An Introduction - Vladimir Petrov - ShapeBlue
CloudStack Object Storage - An Introduction - Vladimir Petrov - ShapeBlue

Where the wild things are - Benchmarking and Micro-Optimisations

  • 3. Premature Optimization “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.” - Donald Knuth
  • 4. Profiling Tools
       • ANTS Performance Profiler - Redgate
       • dotTrace & dotMemory - JetBrains
       • PerfView - Microsoft (free)
       • Visual Studio Profiling Tools (Ultimate, Premium or Professional)
       • MiniProfiler - Stack Overflow (free)
  • 9. Why do you need a benchmarking library?
  • 10. Benchmarking small code samples in C#, can this implementation be improved? http://stackoverflow.com/q/1047218/4500
       static void Profile(int iterations, Action action)
       {
           action();     // warm up
           GC.Collect(); // clean up
           var watch = new Stopwatch();
           watch.Start();
           for (int i = 0; i < iterations; i++)
           {
               action();
           }
           watch.Stop();
           Console.WriteLine("Time Elapsed {0} ms", watch.ElapsedMilliseconds);
       }
  • 11. Benchmarking small code samples in C#, can this implementation be improved? http://stackoverflow.com/q/1047218/4500
       private static T Result;

       static void Profile<T>(int iterations, Func<T> func)
       {
           func();       // warm up
           GC.Collect(); // clean up
           var watch = new Stopwatch();
           watch.Start();
           for (int i = 0; i < iterations; i++)
           {
               Result = func();
           }
           watch.Stop();
           Console.WriteLine("Time Elapsed {0} ms", watch.ElapsedMilliseconds);
       }
  • 12. BenchmarkDotNet project
       • Andrey Akinshin (the ‘Boss’) @andrey_akinshin http://aakinshin.net/en/blog/
       • Matt Warren (me)
       • Adam Sitnik (.NET Core guru) @SitnikAdam http://adamsitnik.com/
  • 14. Goals of BenchmarkDotNet
       Benchmarking library that is:
       • Accurate
       • Easy-to-use
       • Helpful
  • 15. Goals of BenchmarkDotNet
       Benchmarking library that is:
       • Accurate
       • Easy-to-use
       • Helpful
       Stopwatch under the hood http://aakinshin.net/en/blog/dotnet/stopwatch/
       LegacyJIT-x86 and first method call http://aakinshin.net/en/blog/dotnet/legacyjitx86-and-first-method-call/
  • 17. What BenchmarkDotNet doesn’t do
       • Multi-threaded benchmarks
       • Integrate with CI builds
       • Unit test runner integration
       • Anything else? http://github.com/dotnet/BenchmarkDotNet/issues/
  • 18. “Other Benchmarking tools are available”
       • NBench
         https://github.com/petabridge/NBench
       • Microsoft Xunit performance
         http://github.com/Microsoft/xunit-performance/
       • Lambda Micro Benchmarking (“Clash of the Lambdas”)
         https://github.com/biboudis/LambdaMicrobenchmarking
       • Etimo.Benchmarks
         http://etimo.se/blog/etimo-benchmarks-lightweight-net-benchmark-tool/
       • MeasureIt
         https://blogs.msdn.microsoft.com/vancem/2009/02/06/measureit-update-tool-for-doing-microbenchmarks-for-net/
  • 19. How it works
       An invocation of the target method is an operation. A bunch of operations is an iteration.
       Iteration types:
       • Pilot: The best operation count will be chosen.
       • IdleWarmup, IdleTarget: BenchmarkDotNet overhead will be evaluated.
       • MainWarmup: Warmup of the main method.
       • MainTarget: Main measurements.
       • Result = MainTarget – AverageOverhead
       http://benchmarkdotnet.org/HowItWorks.htm
  • 20. What happens under the covers? Image credit Albert Rodríguez @UncleFirefox
  • 22. Scale of benchmarks
       • millisecond - ms: one thousandth of one second, single webapp request
       • microsecond - us or µs: one millionth of one second, several in-memory operations
       • nanosecond - ns: one billionth of one second, single operations
  • 23. Who ‘times’ the timers?
       [Benchmark]
       public long StopwatchLatency()
       {
           return Stopwatch.GetTimestamp();
       }

       [Benchmark]
       public long StopwatchGranularity()
       {
           // Loop until Stopwatch.GetTimestamp()
           // gives us a different value
           long lastTimestamp = Stopwatch.GetTimestamp();
           while (Stopwatch.GetTimestamp() == lastTimestamp) { }
           return lastTimestamp;
       }

       [Benchmark]
       public long DateTimeLatency()
       {
           return DateTime.Now.Ticks;
       }

       [Benchmark]
       public long DateTimeGranularity()
       {
           // Loop until DateTime.Now
           // gives us a different value
           long lastTimestamp = DateTime.Now.Ticks;
           while (DateTime.Now.Ticks == lastTimestamp) { }
           return lastTimestamp;
       }
  • 24. Who ‘times’ the timers?
       BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
       Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
       Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
         [Host]     : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
         Job-FIDMNL : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0

       Method                |            Mean |      StdDev | Allocated |
       --------------------- |---------------- |------------ |---------- |
       StopwatchLatency      |           ?? ns |       ?? ns |      ?? B |
       StopwatchGranularity  |           ?? ns |       ?? ns |      ?? B |
       DateTimeLatency       |           ?? ns |       ?? ns |      ?? B |
       DateTimeGranularity   |           ?? ns |       ?? ns |      ?? B |
  • 25. Who ‘times’ the timers?
       BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
       Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
       Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
         [Host]     : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
         Job-FIDMNL : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0

       Method                |            Mean |      StdDev | Allocated |
       --------------------- |---------------- |------------ |---------- |
       StopwatchLatency      |      12.9960 ns |   0.1609 ns |       0 B |
       StopwatchGranularity  |     374.3049 ns |   2.4388 ns |       0 B |
       DateTimeLatency       |     682.2320 ns |   8.9341 ns |      32 B |
       DateTimeGranularity   | 996,025.6492 ns | 413.9175 ns |  47.34 kB |
  • 26. Loop-the-Loop ”Avoid foreach loop on everything except raw arrays?”
       [Benchmark(Baseline = true)]
       public int ForLoopArray()
       {
           var counter = 0;
           for (int i = 0; i < anArray.Length; i++)
               counter += anArray[i];
           return counter;
       }

       [Benchmark]
       public int ForEachArray()
       {
           var counter = 0;
           foreach (var i in anArray)
               counter += i;
           return counter;
       }

       [Benchmark]
       public int ForLoopList()
       {
           var counter = 0;
           for (int i = 0; i < aList.Count; i++)
               counter += aList[i];
           return counter;
       }

       [Benchmark]
       public int ForEachList()
       {
           var counter = 0;
           foreach (var i in aList)
               counter += i;
           return counter;
       }
  • 27. Loop-the-Loop ”Avoid foreach loop on everything except raw arrays?”
       BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
       Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
       Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
         [Host]     : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
         DefaultJob : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0

       Method          |          Mean |      StdDev | Scaled | Scaled-StdDev |
       --------------- |-------------- |------------ |------- |-------------- |
       ForLoopArray    |         ?? ns |             |     ?? |               |
       ForEachArray    |         ?? ns |             |     ?? |               |
       ForLoopList     |         ?? ns |             |     ?? |               |
       ForEachList     |         ?? ns |             |     ?? |               |
  • 28. Loop-the-Loop ”Avoid foreach loop on everything except raw arrays?”
       BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
       Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
       Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
         [Host]     : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
         DefaultJob : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0

       Method          |          Mean |      StdDev | Scaled | Scaled-StdDev |
       --------------- |-------------- |------------ |------- |-------------- |
       ForLoopArray    |   383.8279 ns |   2.9472 ns |   1.00 |          0.00 |
       ForEachArray    |   392.5611 ns |   4.1286 ns |   1.02 |          0.01 |
       ForLoopList     | 2,315.9658 ns |  12.1001 ns |   6.03 |          0.05 |
       ForEachList     | 2,663.5771 ns |  21.9822 ns |   6.94 |          0.08 |
  • 29. Loop-the-Loop – ‘for loop’ - Arrays
  • 30. Loop-the-Loop – ‘for loop’ - Lists
  • 31. Abstractions - IDictionary v Dictionary
       Dictionary<string, string> dictionary = new Dictionary<string, string>();
       IDictionary<string, string> iDictionary = (IDictionary<string, string>)dictionary;

       [Benchmark]
       public Dictionary<string, string> DictionaryEnumeration()
       {
           foreach (var item in dictionary) { ; }
           return dictionary;
       }

       [Benchmark]
       public IDictionary<string, string> IDictionaryEnumeration()
       {
           foreach (var item in iDictionary) { ; }
           return iDictionary;
       }
  • 32. Abstractions - IDictionary v Dictionary
       BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
       Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
       Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
         [Host]     : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
         DefaultJob : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0

       Method                  |       Mean |    StdErr |    StdDev |  Gen 0 | Allocated |
       ----------------------- |----------- |---------- |---------- |------- |---------- |
       DictionaryEnumeration   |      ?? ns |     ?? ns |     ?? ns |     ?? |      ?? B |
       IDictionaryEnumeration  |      ?? ns |     ?? ns |     ?? ns |     ?? |      ?? B |

       // * Diagnostic Output - MemoryDiagnoser *
       Note: the Gen 0/1/2 Measurements are per 1k Operations
  • 33. Abstractions - IDictionary v Dictionary
       BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
       Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
       Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
         [Host]     : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
         DefaultJob : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0

       Method                  |       Mean |    StdErr |    StdDev |  Gen 0 | Allocated |
       ----------------------- |----------- |---------- |---------- |------- |---------- |
       DictionaryEnumeration   | 24.0353 ns | 0.2403 ns | 0.9307 ns |      - |       0 B |
       IDictionaryEnumeration  | 41.6301 ns | 0.4479 ns | 2.1944 ns | 0.0086 |      32 B |

       // * Diagnostic Output - MemoryDiagnoser *
       Note: the Gen 0/1/2 Measurements are per 1k Operations
  • 34. Abstractions - IDictionary v Dictionary
       Dictionary<string, string> dictionary = new Dictionary<string, string>();
       IDictionary<string, string> iDictionary = (IDictionary<string, string>)dictionary;

       // struct – so doesn't allocate
       Dictionary<string, string>.Enumerator enumerator = dictionary.GetEnumerator();

       // interface - allocates 56 B (64-bit) and 32 B (32-bit)
       IEnumerator<KeyValuePair<string, string>> enumerator = iDictionary.GetEnumerator();
  • 35. Low-level increments
       [LegacyJitX86Job, LegacyJitX64Job, RyuJitX64Job]
       public class Program
       {
           private double a, b, c, d;

           [Benchmark(OperationsPerInvoke = 4)]
           public void MethodA()
           {
               a++; b++; c++; d++;
           }

           [Benchmark(OperationsPerInvoke = 4)]
           public void MethodB()
           {
               a++; a++; a++; a++;
           }
       }
  • 36. Low-level increments
       BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
       Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
       Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
         [Host]       : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
         LegacyJitX64 : Clr 4.0.30319.42000, 64bit LegacyJIT/clrjit-v4.6.1590.0;compatjit-v4.6.1590.0
         LegacyJitX86 : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
         RyuJitX64    : Clr 4.0.30319.42000, 64bit RyuJIT-v4.6.1590.0
       Runtime=Clr Allocated=0 B

       Method      |          Job |       Jit | Platform |      Mean |    StdErr |    StdDev |
       ----------- |------------- |---------- |--------- |---------- |---------- |---------- |
       Parallel    | LegacyJitX64 | LegacyJit |      X64 |     ?? ns |     ?? ns |     ?? ns |
       Sequential  | LegacyJitX64 | LegacyJit |      X64 |     ?? ns |     ?? ns |     ?? ns |
       Parallel    | LegacyJitX86 | LegacyJit |      X86 |     ?? ns |     ?? ns |     ?? ns |
       Sequential  | LegacyJitX86 | LegacyJit |      X86 |     ?? ns |     ?? ns |     ?? ns |
       Parallel    |    RyuJitX64 |    RyuJit |      X64 |     ?? ns |     ?? ns |     ?? ns |
       Sequential  |    RyuJitX64 |    RyuJit |      X64 |     ?? ns |     ?? ns |     ?? ns |

       MethodA = Parallel, MethodB = Sequential
  • 37. Low-level increments
       BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
       Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
       Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
         [Host]       : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
         LegacyJitX64 : Clr 4.0.30319.42000, 64bit LegacyJIT/clrjit-v4.6.1590.0;compatjit-v4.6.1590.0
         LegacyJitX86 : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
         RyuJitX64    : Clr 4.0.30319.42000, 64bit RyuJIT-v4.6.1590.0
       Runtime=Clr Allocated=0 B

       Method      |          Job |       Jit | Platform |      Mean |    StdErr |    StdDev |
       ----------- |------------- |---------- |--------- |---------- |---------- |---------- |
       Parallel    | LegacyJitX64 | LegacyJit |      X64 | 0.3420 ns | 0.0015 ns | 0.0057 ns |
       Sequential  | LegacyJitX64 | LegacyJit |      X64 | 2.2038 ns | 0.0014 ns | 0.0051 ns |
       Parallel    | LegacyJitX86 | LegacyJit |      X86 | 0.3276 ns | 0.0005 ns | 0.0020 ns |
       Sequential  | LegacyJitX86 | LegacyJit |      X86 | 2.5229 ns | 0.0048 ns | 0.0187 ns |
       Parallel    |    RyuJitX64 |    RyuJit |      X64 | 0.3686 ns | 0.0037 ns | 0.0144 ns |
       Sequential  |    RyuJitX64 |    RyuJit |      X64 | 0.8959 ns | 0.0023 ns | 0.0090 ns |

       MethodA = Parallel, MethodB = Sequential
       http://en.wikipedia.org/wiki/Instruction-level_parallelism
  • 38. Search - Linear v Binary
       private static int LinearSearch(Data[] set, int key)
       {
           for (int i = 0; i < set.Length; i++)
           {
               var c = set[i].Key - key;
               if (c == 0) { return i; }
               if (c > 0) { return ~i; }
           }
           return ~set.Length;
       }

       private static int BinarySearch(Data[] set, int key)
       {
           int i = 0;
           int up = set.Length - 1;
           while (i <= up)
           {
               int mid = (up - i) / 2 + i;
               int c = set[mid].Key - key;
               if (c == 0) { return mid; }
               if (c < 0) i = mid + 1;
               else up = mid - 1;
           }
           return ~i;
       }
  • 39. Search - Linear v Binary
       private readonly Data[][] dataSet;
       private Data[] currentSet;
       private int currentMid;
       private int currentMax;

       [Params(1, 2, 3, 4, 5, 7, 10, 12, 15)]
       public int Size
       {
           set
           {
               currentSet = dataSet[value];
               currentMax = value - 1;
               currentMid = value / 2;
           }
       }
  • 42. readonly fields
       public struct Int256
       {
           private readonly long bits0, bits1, bits2, bits3;

           public Int256(long bits0, long bits1, long bits2, long bits3)
           {
               this.bits0 = bits0; this.bits1 = bits1;
               this.bits2 = bits2; this.bits3 = bits3;
           }

           public long Bits0 { get { return bits0; } }
           public long Bits1 { get { return bits1; } }
           public long Bits2 { get { return bits2; } }
           public long Bits3 { get { return bits3; } }
       }

       [LegacyJitX86Job, LegacyJitX64Job, RyuJitX64Job]
       public class Program
       {
           // The two fields differ only in the readonly modifier
           private readonly Int256 readOnlyField = new Int256(1L, 5L, 10L, 100L);
           private Int256 field = new Int256(1L, 5L, 10L, 100L);

           [Benchmark]
           public long GetValue()
           {
               return field.Bits0 + field.Bits1 + field.Bits2 + field.Bits3;
           }

           [Benchmark]
           public long GetReadOnlyValue()
           {
               return readOnlyField.Bits0 + readOnlyField.Bits1 + readOnlyField.Bits2 + readOnlyField.Bits3;
           }
       }
  • 43. readonly fields
       BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
       Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
       Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
         [Host]       : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
         LegacyJitX64 : Clr 4.0.30319.42000, 64bit LegacyJIT/clrjit-v4.6.1590.0;compatjit-v4.6.1590.0
         LegacyJitX86 : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
         RyuJitX64    : Clr 4.0.30319.42000, 64bit RyuJIT-v4.6.1590.0
       Runtime=Clr Allocated=0 B

       Method            |          Job |       Jit | Platform |      Mean |    StdErr |    StdDev |
       ----------------- |------------- |---------- |--------- |---------- |---------- |---------- |
       GetValue          | LegacyJitX64 | LegacyJit |      X64 |     ?? ns |     ?? ns |     ?? ns |
       GetReadOnlyValue  | LegacyJitX64 | LegacyJit |      X64 |     ?? ns |     ?? ns |     ?? ns |
       GetValue          | LegacyJitX86 | LegacyJit |      X86 |     ?? ns |     ?? ns |     ?? ns |
       GetReadOnlyValue  | LegacyJitX86 | LegacyJit |      X86 |     ?? ns |     ?? ns |     ?? ns |
       GetValue          |    RyuJitX64 |    RyuJit |      X64 |     ?? ns |     ?? ns |     ?? ns |
       GetReadOnlyValue  |    RyuJitX64 |    RyuJit |      X64 |     ?? ns |     ?? ns |     ?? ns |
  • 44. readonly fields
       BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
       Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
       Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
         [Host]       : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
         LegacyJitX64 : Clr 4.0.30319.42000, 64bit LegacyJIT/clrjit-v4.6.1590.0;compatjit-v4.6.1590.0
         LegacyJitX86 : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
         RyuJitX64    : Clr 4.0.30319.42000, 64bit RyuJIT-v4.6.1590.0
       Runtime=Clr Allocated=0 B

       Method            |          Job |       Jit | Platform |      Mean |    StdErr |    StdDev |
       ----------------- |------------- |---------- |--------- |---------- |---------- |---------- |
       GetValue          | LegacyJitX64 | LegacyJit |      X64 | 0.7893 ns | 0.0078 ns | 0.0291 ns |
       GetReadOnlyValue  | LegacyJitX64 | LegacyJit |      X64 | 9.5362 ns | 0.0251 ns | 0.0971 ns |
       GetValue          | LegacyJitX86 | LegacyJit |      X86 | 1.4625 ns | 0.0506 ns | 0.1959 ns |
       GetReadOnlyValue  | LegacyJitX86 | LegacyJit |      X86 | 1.9743 ns | 0.0641 ns | 0.2481 ns |
       GetValue          |    RyuJitX64 |    RyuJit |      X64 | 0.3852 ns | 0.0183 ns | 0.0710 ns |
       GetReadOnlyValue  |    RyuJitX64 |    RyuJit |      X64 | 9.6406 ns | 0.0803 ns | 0.3109 ns |

       https://codeblog.jonskeet.uk/2014/07/16/micro-optimization-the-surprising-inefficiency-of-readonly-fields/
  • 45. MOAR Benchmarks!!
       • Analysing Optimisations in the Wire Serialiser
         http://mattwarren.org/2016/08/23/Analysing-Optimisations-in-the-Wire-Serialiser/
       • Optimising LINQ
         http://mattwarren.org/2016/09/29/Optimising-LINQ/
       • Why is reflection slow?
         http://mattwarren.org/2016/12/14/Why-is-Reflection-slow/
       • Why Exceptions should be Exceptional
         http://mattwarren.org/2016/12/20/Why-Exceptions-should-be-Exceptional/
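
The rule on slide 19, Result = MainTarget - AverageOverhead, can be sketched in a few lines of C#. This is only an illustration of the idea, not BenchmarkDotNet's actual code: the class name `OverheadSketch` and both method names are made up for this example, and the real library does far more (pilot runs, warmup, statistics).

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Hypothetical helper for illustration; not part of BenchmarkDotNet.
public static class OverheadSketch
{
    // One iteration = 'operations' invocations of 'action'.
    // Returns the average time per operation in nanoseconds.
    public static double MeasureIteration(Action action, int operations)
    {
        long start = Stopwatch.GetTimestamp();
        for (int i = 0; i < operations; i++)
        {
            action();
        }
        long end = Stopwatch.GetTimestamp();

        // Convert Stopwatch ticks to nanoseconds using the timer frequency.
        double nanoseconds = (end - start) * 1e9 / Stopwatch.Frequency;
        return nanoseconds / operations;
    }

    // Result = MainTarget - AverageOverhead, applied to every main measurement.
    // 'idleTarget' holds measurements of an empty method (the harness overhead).
    public static double[] SubtractOverhead(double[] mainTarget, double[] idleTarget)
    {
        double averageOverhead = idleTarget.Average();
        return mainTarget.Select(m => m - averageOverhead).ToArray();
    }
}
```

In use, one would call `MeasureIteration` several times with an empty delegate to fill `idleTarget`, several times with the real method to fill `mainTarget`, and then pass both arrays to `SubtractOverhead`.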

Editor's Notes

  1. This is what we aim for. This is why we wanted to build BenchmarkDotNet.
  2. Aim is to make a framework that can accurately measure milli, micro and nano benchmarks. But in reality the main use-cases are probably milli/micro benchmarks, so these must work above all else!! (Even lower down!!!) picoseconds: 1…1000 ps, one trillionth of one second, pipelining.
  3. Avoid foreach loop on everything except raw arrays
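
How far down the scale in note 2 a single measurement can go depends on the timer. The environment header on the result slides reports Frequency=2630673 Hz and Resolution=380.1309 ns, and the second figure is just the length of one tick at that frequency. A minimal sketch of the arithmetic, assuming a hypothetical helper class `TimerResolution` (not a BenchmarkDotNet type):

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for illustration; not part of BenchmarkDotNet.
public static class TimerResolution
{
    // Length of one timer tick in nanoseconds for a timer that
    // advances 'frequencyHz' times per second.
    public static double NanosecondsPerTick(long frequencyHz)
    {
        return 1e9 / frequencyHz;
    }

    // The same calculation for the Stopwatch on the current machine.
    public static double CurrentStopwatchResolution()
    {
        return NanosecondsPerTick(Stopwatch.Frequency);
    }
}
```

`TimerResolution.NanosecondsPerTick(2630673)` comes out at roughly 380.13 ns, in line with the header, and close to the 374.3049 ns that the StopwatchGranularity benchmark measured on slide 25. Anything faster than one tick can only be timed by running many operations per iteration and dividing.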