2. www.perfectial.com
Agenda
Profiling vs Benchmarking vs Micro-benchmarking
Benchmarking best practices
BenchmarkDotNet
Simplest benchmark
How does it work
Statistics
Configuration: Columns, Jobs, Diagnosers
Profiling vs Benchmarking vs Micro-benchmarking
A benchmark measures the time taken by a whole operation.
Use benchmarks to compare different versions of software, or the same software on
different hardware.
Result of benchmarking: a single metric, a score.
Profiling vs Benchmarking vs Micro-benchmarking
Microbenchmark - a specific form of benchmark, designed to measure the
performance of a very small and specific piece of code.
Result of microbenchmarking: a single metric, a score.
When to Benchmark
“Premature optimization is the root of all evil” – Donald Knuth
Before benchmarking:
• Think over architecture
• Choose fast algorithms
• Choose optimal data structures
• Optimize memory usage
• Optimize networking
• Optimize I/O
• Implement caching
Benchmarking best practices
Warmup
Cleanup
Enable compiler optimizations (Release mode)
Run benchmarks without attached debugger
Keep the compiler from optimizing away the code under test (dead code elimination, hoisting)
Run benchmark several times
Try different environments
Make sure your computer is in a fully idle state
//“Hoisting” is a compiler optimization that moves loop-invariant code out of loops
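The warmup, repeated-run, and "keep the result alive" practices above can be sketched in a few lines. This is a language-neutral illustration written in Python, not BenchmarkDotNet code; `run_benchmark`, `warmup_runs`, `measured_runs`, and `sink` are hypothetical names chosen for this sketch.

```python
import time

def run_benchmark(func, warmup_runs=3, measured_runs=5):
    """Time func several times after a warmup phase.

    The return value of func is kept in `sink` and handed back to the
    caller, mirroring the advice to consume benchmark results so that
    an optimizing compiler cannot discard the work as dead code.
    """
    sink = None
    # Warmup: run the code without recording anything.
    for _ in range(warmup_runs):
        sink = func()
    # Measured runs: record one timing per run, in nanoseconds.
    timings_ns = []
    for _ in range(measured_runs):
        start = time.perf_counter_ns()
        sink = func()
        timings_ns.append(time.perf_counter_ns() - start)
    return timings_ns, sink

timings, result = run_benchmark(lambda: sum(range(10_000)))
```

Several timings are returned instead of one so that the spread between runs stays visible.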
Benchmark time units
Millisecond - ms
One thousandth of one second (1/1000 of a second)
Microsecond - us or µs
One millionth of one second (1/1,000,000 of a second)
Nanosecond - ns
One billionth of one second (1/1,000,000,000 of a second)
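As a small worked example of these units, the sketch below converts a raw nanosecond value into the largest unit that keeps the number at or above 1. It is an illustration in Python; `format_duration` is a hypothetical helper, not part of BenchmarkDotNet.

```python
NS_PER_US = 1_000          # nanoseconds in one microsecond
NS_PER_MS = 1_000_000      # nanoseconds in one millisecond
NS_PER_S = 1_000_000_000   # nanoseconds in one second

def format_duration(ns: float) -> str:
    """Format a duration given in nanoseconds using s, ms, us, or ns."""
    if ns >= NS_PER_S:
        return f"{ns / NS_PER_S:.3f} s"
    if ns >= NS_PER_MS:
        return f"{ns / NS_PER_MS:.3f} ms"
    if ns >= NS_PER_US:
        return f"{ns / NS_PER_US:.3f} us"
    return f"{ns:.3f} ns"
```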
How does BenchmarkRunner work?
1. BenchmarkRunner generates an isolated project for each benchmark
method/job/params combination and builds it in Release mode.
2. It takes each method/job/params combination and tries to measure its performance by
launching the benchmark process several times.
3. After all of the measurements, BenchmarkDotNet creates:
• An instance of the Summary class that contains all information about the benchmark runs.
• A set of files that contain the summary in human-readable and machine-readable
formats.
• A set of plots.
How does a benchmark work?
An invocation of the target method is an operation.
A bunch of operations is an iteration.
Types of iterations:
Pilot: The best operation count will be chosen
IdleWarmup, IdleTarget: BenchmarkDotNet overhead will be evaluated
MainWarmup: Warmup of the main method
MainTarget: Main measurements
Result = MainTarget - AverageOverhead
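The stages above can be sketched as follows. This is a simplified Python illustration under stated assumptions, not BenchmarkDotNet's actual algorithm: the pilot stage is modeled as doubling the operation count until one iteration is long enough, and `choose_operation_count`, `per_operation_result`, and `fake_timer` are hypothetical names with made-up timings.

```python
from statistics import mean
from typing import Callable, List

def choose_operation_count(time_iteration: Callable[[int], float],
                           min_iteration_ns: float) -> int:
    """Pilot (simplified): double the operation count until a single
    iteration lasts at least min_iteration_ns."""
    ops = 1
    while time_iteration(ops) < min_iteration_ns:
        ops *= 2
    return ops

def per_operation_result(main_target_ns: List[float],
                         idle_target_ns: List[float],
                         ops: int) -> List[float]:
    """Result = MainTarget - AverageOverhead, divided by the number of
    operations in an iteration to get a per-operation time."""
    average_overhead = mean(idle_target_ns)
    return [(t - average_overhead) / ops for t in main_target_ns]

def fake_timer(ops: int) -> float:
    """Hypothetical workload: 200 ns fixed cost plus 50 ns per operation."""
    return 200.0 + 50.0 * ops

ops = choose_operation_count(fake_timer, min_iteration_ns=1_000.0)
results = per_operation_result(
    main_target_ns=[1000.0, 1010.0, 990.0],  # made-up MainTarget iterations
    idle_target_ns=[200.0, 204.0, 196.0],    # made-up IdleTarget iterations
    ops=ops,
)
```

With these made-up numbers the pilot settles on 16 operations per iteration, and the corrected per-operation times cluster around the 50 ns cost of the hypothetical workload.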
Statistics
Mean - Arithmetic mean of all measurements
StandardError - Standard error of all measurements
StandardDeviation - Standard deviation of all measurements
Error - Half of 99.9% confidence interval
OperationsPerSecond
Min
Q1 - Quartile 1 (25th percentile)
Median (Q2) - Value separating the higher half of all measurements (50th percentile)
Q3 - Quartile 3 (75th percentile)
Max
P0, P25, P50, P67, P80, P85, P90, P95, P100 - Percentiles
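To make these columns concrete, here is a minimal Python sketch that computes several of them from a list of measurements. It is an illustration only: the function name `summarize` is hypothetical, the quartiles use Python's "inclusive" interpolation, and the Error value uses a normal-distribution critical value as an assumption; BenchmarkDotNet may use a different one (for example from Student's t-distribution), so exact numbers can differ.

```python
from math import sqrt
from statistics import NormalDist, mean, quantiles, stdev

def summarize(measurements_ns):
    """Compute summary statistics for per-operation times in nanoseconds."""
    n = len(measurements_ns)
    m = mean(measurements_ns)
    sd = stdev(measurements_ns)   # sample standard deviation
    se = sd / sqrt(n)             # standard error of the mean
    # ASSUMPTION: normal approximation for the 99.9% confidence interval.
    z = NormalDist().inv_cdf(1 - (1 - 0.999) / 2)
    q1, q2, q3 = quantiles(measurements_ns, n=4, method="inclusive")
    return {
        "Mean": m,
        "StandardError": se,
        "StandardDeviation": sd,
        "Error": z * se,                      # half of the 99.9% interval
        "OperationsPerSecond": 1e9 / m,       # measurements are in ns
        "Min": min(measurements_ns),
        "Q1": q1,
        "Median": q2,
        "Q3": q3,
        "Max": max(measurements_ns),
    }

stats = summarize([100.0, 102.0, 98.0, 101.0, 99.0])
```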
Configuration
Validators - validate benchmarks before they are executed and produce errors; if any critical
error exists, execution is aborted
Columns - columns of the summary table
Jobs - describes how to run your benchmark
Diagnosers – gather useful information
Exporters - export results of your benchmark in different formats. Default: csv, html, markdown
Loggers - log results
Analysers - analyze summary of benchmark and produce warnings
Jobs Configuration: Environment
Platform: x86 / x64
Runtime: Full .NET Framework (4.6+) / .NET Core (1.1+) / Mono
Languages: C# / F# / Visual Basic
Jit: LegacyJit (Clr only) / RyuJit (Clr and Core only) / Llvm (Mono only)
Affinity: Process.ProcessorAffinity
GcMode:
• Server: Server mode / Workstation mode
• Concurrent: Concurrent mode / NonConcurrent mode
• CpuGroups: specifies whether garbage collection supports multiple CPU groups
• Force: force full garbage collection after each benchmark invocation
Jobs Configuration: Run
RunStrategy: Throughput / ColdStart / Monitoring
LaunchCount*: how many times to launch the process with the target benchmark
WarmupCount*: how many warmup iterations to perform
TargetCount*: how many target iterations to perform
IterationTime*: desired time of a single iteration
InvocationCount: number of invocations of the target method in a single iteration
* - it is better not to specify these characteristics; BenchmarkDotNet has a smart algorithm
that chooses these values automatically
Jobs Configuration: Accuracy
MaxRelativeError - maximum acceptable value of Error / Mean
MaxAbsoluteError - maximum acceptable Error, given as an absolute TimeInterval
MinIterationTime - minimum time of a single iteration (specifies only the lower limit)
MinInvokeCount - minimum number of target method invocations (default: 4)
EvaluateOverhead - evaluate and subtract overhead from result measurements (default: true)
RemoveOutliers - remove outliers from result measurements (default: true)
AnalyzeLaunchVariance - perform several launches and detect whether there is variance
between launches
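Two of these settings lend themselves to a short worked example: the MaxRelativeError check (Error / Mean against a threshold) and outlier removal. The Python sketch below is an illustration only. It assumes outliers are detected with Tukey's fences (1.5 times the interquartile range), which may differ from what BenchmarkDotNet does internally; `remove_outliers` and `is_accurate_enough` are hypothetical names.

```python
from statistics import quantiles

def remove_outliers(values):
    """Drop values outside Tukey's fences (ASSUMPTION: 1.5 * IQR rule)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if lower <= v <= upper]

def is_accurate_enough(error, mean_value, max_relative_error):
    """True when Error / Mean does not exceed the allowed relative error."""
    return error / mean_value <= max_relative_error

# Made-up measurements in nanoseconds; 250.0 is an obvious outlier.
cleaned = remove_outliers([98.0, 99.0, 100.0, 101.0, 102.0, 250.0])
```

A measurement loop could keep adding iterations until `is_accurate_enough` returns true for the current Error and Mean.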
Diagnosers:
A diagnoser can attach to your benchmark and get some useful info.
MemoryDiagnoser - GC and memory allocation statistics; cross-platform and built in
HardwareCounters - Hardware Counter Diagnoser
InliningDiagnoser - JIT Inlining Events
DisassemblyDiagnoser - allows you to disassemble the benchmarked code to asm, IL, C#/F#
Memory Diagnoser
Allocated - contains the size of allocated managed memory. Stackalloc/native heap
allocations are not included.
Gen X - the number of Gen X collections per 1 000 operations. If the value equals 1, the GC
collects memory in generation X once per one thousand benchmark invocations.
BenchmarkDotNet uses heuristics when running benchmarks, so the
number of invocations can differ between runs.
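The Gen X scaling is plain arithmetic, sketched here in Python with a hypothetical helper name and made-up counts:

```python
def collections_per_1000_ops(gc_collections: int, operations: int) -> float:
    """Scale a raw GC collection count to 'per 1 000 operations'."""
    return gc_collections * 1000 / operations

# 5 collections over 5 000 invocations is one collection per 1 000 operations.
gen0 = collections_per_1000_ops(5, 5_000)
```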
//Every reference type instance has two extra fields: an object header and a method table
pointer, each of which is 4 bytes on x86 or 8 bytes on x64.
//The layout of classes is determined by the JIT compiler depending on the
architecture.
US Representative Office
+1 857 30 23 414
75 Arlington St. Suite 500 Boston, MA 02116, USA
UA Software Development Office
+380 32 270 00 92
1A Kamianetska str., Lviv 79034, Ukraine
Thanks for your attention!