Performance is a Feature!
Matt Warren
ca.com/apm
www.mattwarren.org
@matthewwarren
Front-end
Database & Caching
.NET CLR
Mechanical
Sympathy
Why does performance matter?
What do we need to measure?
How can we fix the issues?
Why?
Save money
Save power
Bad perf == broken
Lost customers
Half a second delay caused
a 20% drop in traffic
(Google)
Why?
“The most amazing achievement of
the computer software industry is its
continuing cancellation of the steady
and staggering gains made by the
computer hardware industry.”
- Henry Petroski
Why?
“We should forget about small efficiencies,
say about 97% of the time: premature
optimization is the root of all evil. Yet we
should not pass up our opportunities in
that critical 3%.”
- Donald Knuth
Never give up your
performance accidentally
Rico Mariani,
Performance Architect @
Microsoft
What?
Averages
are bad
"most people have
more than the average
number of legs"
- Hans Rosling
https://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen
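The point about averages can be made concrete with response times. The following is a minimal sketch, assuming a made-up sample in which most requests are served from a cache and one hits the database. The sample values, the `LatencyStats` class and the `Percentile` helper (nearest-rank method) are illustrative and not from the talk.

```csharp
using System;
using System.Linq;

public static class LatencyStats
{
    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    public static double Percentile(double[] samples, double p)
    {
        if (samples.Length == 0) throw new ArgumentException("No samples.");
        double[] sorted = samples.OrderBy(x => x).ToArray();
        int rank = (int)Math.Ceiling(p / 100.0 * sorted.Length);
        return sorted[Math.Max(rank, 1) - 1];
    }

    public static void Main()
    {
        // Hypothetical response times in ms: 9 cached hits, 1 database hit.
        double[] timesMs = { 10, 10, 10, 10, 10, 10, 10, 10, 10, 510 };

        Console.WriteLine("Mean:   " + timesMs.Average());        // 60
        Console.WriteLine("Median: " + Percentile(timesMs, 50));  // 10
        Console.WriteLine("99th:   " + Percentile(timesMs, 99));  // 510
    }
}
```

No request in this sample actually took 60 ms: nine were much faster than the mean and one was far slower, which is why percentiles or a histogram describe the data better than an average.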
https://blogs.msdn.microsoft.com/bharry/2016/03/28/introducing-application-analytics/
Application Insights Analytics
When?
In production
You won't see ANY perf issues
during unit tests
You won't see ALL perf issues
in Development
How?
Measure, measure, measure
1. Identify bottlenecks
2. Verify the optimisation works
How?
“The simple act of putting a render time in the upper right hand corner of every
page we serve forced us to fix all our performance regressions and omissions.”
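The idea in the quote can be sketched with a `Stopwatch` wrapped around whatever produces the page. This is an assumption-laden illustration rather than how any particular site does it: `RenderTimer` and the `render` delegate are hypothetical, and the time is emitted as an HTML comment here, whereas a real site would display it visibly.

```csharp
using System;
using System.Diagnostics;

public static class RenderTimer
{
    // Runs the supplied page renderer and appends the elapsed time to its output.
    // 'render' stands in for whatever produces the page body (hypothetical).
    public static string RenderWithTiming(Func<string> render)
    {
        Stopwatch timer = Stopwatch.StartNew();
        string body = render();
        timer.Stop();
        string elapsedMs = timer.Elapsed.TotalMilliseconds.ToString("F1");
        return body + "\n<!-- rendered in " + elapsedMs + " ms -->";
    }

    public static void Main()
    {
        Console.WriteLine(RenderWithTiming(() => "<h1>Hello</h1>"));
    }
}
```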
How?
https://github.com/opserver/Opserver
How?
Micro-benchmarks
How?
Profiling -> Micro-benchmarks
http://www.hanselman.com/blog/BenchmarkingNETCode.aspx
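using System;
using System.Reflection;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class Program
{
    static Uri @object = new Uri("http://google.com/search");

    // Assumed values: the slide does not show how these two are defined
    static string propertyName = "Host";
    static BindingFlags bindingFlags = BindingFlags.Instance | BindingFlags.Public;

    [Benchmark(Baseline = true)]
    public string RegularPropertyCall()
    {
        return @object.Host;
    }

    [Benchmark]
    public object Reflection()
    {
        Type @class = @object.GetType();
        PropertyInfo property =
            @class.GetProperty(propertyName, bindingFlags);
        return property.GetValue(@object);
    }

    static void Main(string[] args)
    {
        var summary = BenchmarkRunner.Run<Program>();
    }
}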
Compared to one second
• Millisecond – ms
–thousandth (0.001 or 1/1000)
• Microsecond - μs
–millionth (0.000001 or 1/1,000,000)
• Nanosecond - ns
–billionth (0.000000001 or 1/1,000,000,000)
BenchmarkDotNet
BenchmarkDotNet=v0.9.4.0
OS=Microsoft Windows NT 6.1.7601 Service Pack 1
Processor=Intel(R) Core(TM) i7-4800MQ CPU @ 2.70GHz, ProcessorCount=8
HostCLR=MS.NET 4.0.30319.42000, Arch=32-bit RELEASE
JitModules=clrjit-v4.6.100.0
Type=Program Mode=Throughput
Method | Median | StdDev | Scaled |
--------------------- |------------ |----------- |------- |
RegularPropertyCall | 13.4053 ns | 1.5826 ns | 1.00 |
Reflection | 232.7240 ns | 32.0018 ns | 17.36 |
[Params(1, 2, 3, 4, 5, 10, 100, 1000)]
public int Loops;

[Benchmark]
public string StringConcat()
{
    string result = string.Empty;
    for (int i = 0; i < Loops; ++i)
        result = string.Concat(result, i.ToString());
    return result;
}

[Benchmark]
public string StringBuilder()
{
    StringBuilder sb = new StringBuilder(string.Empty);
    for (int i = 0; i < Loops; ++i)
        sb.Append(i.ToString());
    return sb.ToString();
}
https://github.com/dotnet/roslyn/issues/5388
How?
Garbage Collection (GC)
Allocations are cheap, but cleaning up isn’t
Difficult to measure the impact of GC
https://samsaffron.com/archive/2011/10/28/in-managed-code-we-trust-our-recent-battles-with-the-net-garbage-collector
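One rough way to get a first impression of GC activity is to count collections around a piece of work using `GC.CollectionCount`. The sketch below is only an illustration: `GcProbe` is a hypothetical helper, the count is process-wide (so collections triggered by other threads are included), and it says nothing about pause times, for which a tool such as PerfView is better suited.

```csharp
using System;

public static class GcProbe
{
    // Runs 'work' and returns how many gen 0 collections happened meanwhile.
    public static int CountGen0Collections(Action work)
    {
        int before = GC.CollectionCount(0);
        work();
        return GC.CollectionCount(0) - before;
    }

    public static void Main()
    {
        int collections = CountGen0Collections(() =>
        {
            // Allocate many short-lived strings to put pressure on gen 0.
            for (int i = 0; i < 1000000; i++)
            {
                string s = i.ToString();
                GC.KeepAlive(s);
            }
        });
        Console.WriteLine("Gen 0 collections during work: " + collections);
    }
}
```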
Stack Overflow Performance Lessons
Use static classes
Don’t be afraid to write your own tools
Dapper, Jil, MiniProfiler,
Intimately know your platform - CLR
Roslyn Performance Lessons 1
public class Logger
{
    public static void WriteLine(string s) { /*...*/ }
}

public class BoxingExample
{
    public void Log(int id, int size)
    {
        // id and size are boxed to object when passed to string.Format
        var s = string.Format("{0}:{1}", id, size);
        Logger.WriteLine(s);
    }
}
Essential Truths Everyone Should Know about Performance in a Large Managed Codebase
http://channel9.msdn.com/Events/TechEd/NorthAmerica/2013/DEV-B333
Roslyn Performance Lessons 1
public class Logger
{
    public static void WriteLine(string s) { /*...*/ }
}

public class BoxingExample
{
    public void Log(int id, int size)
    {
        var s = string.Format("{0}:{1}",
            id.ToString(), size.ToString());
        Logger.WriteLine(s);
    }
}
https://github.com/dotnet/roslyn/pull/415
AVOID BOXING
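To make the cost of boxing visible, the sketch below shows that converting an `int` to `object` creates a new heap object each time, and that calling `ToString()` first hands `string.Format` references it can use without boxing. `BoxingDemo` and its method names are hypothetical and exist only for illustration.

```csharp
using System;

public static class BoxingDemo
{
    // Converting an int to object copies it into a new heap object (a "box"),
    // so two conversions of the same value yield two different references.
    public static bool BoxesAreDistinct(int value)
    {
        object first = value;
        object second = value;
        return !ReferenceEquals(first, second);
    }

    // Strings are already reference types, so no boxing happens here.
    public static string FormatWithoutBoxing(int id, int size)
    {
        return string.Format("{0}:{1}", id.ToString(), size.ToString());
    }

    public static void Main()
    {
        Console.WriteLine(BoxesAreDistinct(42));          // True
        Console.WriteLine(FormatWithoutBoxing(7, 1024));  // 7:1024
    }
}
```

The `ToString()` calls still allocate strings, but those strings are needed for the formatted output anyway; the boxes were pure overhead.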
Roslyn Performance Lessons 2
class Symbol {
    public string Name { get; private set; }
    /*...*/
}

class Compiler {
    private List<Symbol> symbols;

    public Symbol FindMatchingSymbol(string name)
    {
        return symbols.FirstOrDefault(s => s.Name == name);
    }
}
Roslyn Performance Lessons 2
class Symbol {
    public string Name { get; private set; }
    /*...*/
}

class Compiler {
    private List<Symbol> symbols;

    public Symbol FindMatchingSymbol(string name)
    {
        foreach (Symbol s in symbols)
        {
            if (s.Name == name)
                return s;
        }
        return null;
    }
}
DON’T USE LINQ
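One way to see where the LINQ version's allocations can come from is to write out by hand roughly what the compiler generates for the lambda. The sketch below is an approximation, assuming the .NET Framework-era LINQ implementation; `NameClosure` and `LinqLowering` are hypothetical names, and newer runtimes may optimise some of these allocations away.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Symbol
{
    public string Name { get; set; }
}

// Roughly what the compiler generates for the lambda s => s.Name == name:
// a heap-allocated closure object holding the captured 'name'.
public sealed class NameClosure
{
    public string name;
    public bool Matches(Symbol s) { return s.Name == name; }
}

public static class LinqLowering
{
    public static Symbol FindMatchingSymbol(List<Symbol> symbols, string name)
    {
        var closure = new NameClosure { name = name };   // allocation: closure
        Func<Symbol, bool> predicate = closure.Matches;  // allocation: delegate
        // FirstOrDefault takes IEnumerable<Symbol>; enumerating through the
        // interface may box List<T>'s struct enumerator (assumption: depends
        // on the runtime version).
        return symbols.FirstOrDefault(predicate);
    }

    public static void Main()
    {
        var symbols = new List<Symbol>
        {
            new Symbol { Name = "Foo" },
            new Symbol { Name = "Bar" }
        };
        Console.WriteLine(FindMatchingSymbol(symbols, "Bar").Name);  // Bar
    }
}
```

The hand-written `foreach` loop needs none of these objects, which is why it matters on a hot path even though each individual allocation is small.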
BenchmarkDotNet
BenchmarkDotNet=v0.9.4.0
OS=Microsoft Windows NT 6.1.7601 Service Pack 1
Processor=Intel(R) Core(TM) i7-4800MQ CPU @ 2.70GHz, ProcessorCount=8
Frequency=2630654 ticks, Resolution=380.1336 ns, Timer=TSC
HostCLR=MS.NET 4.0.30319.42000, Arch=32-bit RELEASE
JitModules=clrjit-v4.6.100.0
Type=Program Mode=Throughput Runtime=Clr
Method | Median | StdDev | Gen 0 | Bytes Allocated/Op |
---------- |----------- |---------- |------- |------------------- |
Iterative | 39.0957 ns | 0.2150 ns | - | 0.00 |
LINQ | 53.2441 ns | 0.5385 ns | 701.50 | 23.21 |
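The Frequency and Resolution values in the output above are related by simple arithmetic: one tick lasts 1,000,000,000 / Frequency nanoseconds. The sketch below reproduces that calculation; `TimerResolution` is a hypothetical helper, and the second line printed depends on the machine running it.

```csharp
using System;
using System.Diagnostics;
using System.Globalization;

public static class TimerResolution
{
    // Length of one timer tick in nanoseconds, given ticks per second.
    public static double NanosecondsPerTick(long frequency)
    {
        return 1000000000.0 / frequency;
    }

    public static void Main()
    {
        // Frequency reported in the benchmark output above.
        double reported = NanosecondsPerTick(2630654);
        Console.WriteLine(reported.ToString("F4", CultureInfo.InvariantCulture));  // 380.1336

        // The same calculation for the machine running this code.
        double local = NanosecondsPerTick(Stopwatch.Frequency);
        Console.WriteLine(local.ToString("F4", CultureInfo.InvariantCulture));
    }
}
```

Both measured methods take roughly 40 to 50 ns, well below the 380 ns resolution of a single tick, so one call cannot be timed on its own; the measurement has to cover many invocations and be divided down.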
Roslyn Performance Lessons 3
public class Example
{
    // Constructs a name like "Foo<T1, T2, T3>"
    public string GenerateFullTypeName(string name, int arity)
    {
        StringBuilder sb = new StringBuilder();
        sb.Append(name);
        if (arity != 0)
        {
            sb.Append("<");
            for (int i = 1; i < arity; i++)
            {
                sb.Append('T'); sb.Append(i.ToString()); sb.Append(", ");
            }
            sb.Append('T'); sb.Append(arity.ToString()); sb.Append('>');
        }
        return sb.ToString();
    }
}
Roslyn Performance Lessons 3
public class Example
{
    // Constructs a name like "Foo<T1, T2, T3>"
    public string GenerateFullTypeName(string name, int arity)
    {
        // Reuse a cached builder instead of allocating a new one per call
        StringBuilder sb = AcquireBuilder();
        sb.Append(name);
        if (arity != 0)
        {
            sb.Append("<");
            for (int i = 1; i < arity; i++)
            {
                sb.Append('T'); sb.Append(i.ToString()); sb.Append(", ");
            }
            sb.Append('T'); sb.Append(arity.ToString()); sb.Append('>');
        }
        return GetStringAndReleaseBuilder(sb);
    }
}
OBJECT POOLING
Roslyn Performance Lessons 3
[ThreadStatic]
private static StringBuilder cachedStringBuilder;

private static StringBuilder AcquireBuilder()
{
    StringBuilder result = cachedStringBuilder;
    if (result == null)
    {
        return new StringBuilder();
    }
    result.Clear();
    cachedStringBuilder = null;
    return result;
}

private static string GetStringAndReleaseBuilder(StringBuilder sb)
{
    string result = sb.ToString();
    cachedStringBuilder = sb;
    return result;
}
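The effect of the pool can be checked with a short usage sketch. The code below repeats the two helpers from the slide inside a hypothetical `BuilderPool` class (made public here only so they can be called from outside) and shows that the second acquire on the same thread returns the same, cleared instance.

```csharp
using System;
using System.Text;

public static class BuilderPool
{
    // One cached builder per thread, so no locking is needed.
    [ThreadStatic]
    private static StringBuilder cachedStringBuilder;

    public static StringBuilder AcquireBuilder()
    {
        StringBuilder result = cachedStringBuilder;
        if (result == null)
        {
            return new StringBuilder();
        }
        result.Clear();
        cachedStringBuilder = null;
        return result;
    }

    public static string GetStringAndReleaseBuilder(StringBuilder sb)
    {
        string result = sb.ToString();
        cachedStringBuilder = sb;
        return result;
    }

    public static void Main()
    {
        StringBuilder first = AcquireBuilder();
        first.Append("Foo<T1>");
        Console.WriteLine(GetStringAndReleaseBuilder(first));  // Foo<T1>

        StringBuilder second = AcquireBuilder();
        Console.WriteLine(ReferenceEquals(first, second));     // True
        Console.WriteLine(second.Length);                      // 0
    }
}
```

Setting the cached field to null on acquire means a nested acquire on the same thread simply allocates a fresh builder rather than sharing one that is still in use.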
Questions?
www.oz-code.com
@matthewwarren www.mattwarren.org
jetbrains.com/dotTrace
jetbrains.com/dotMemory

Performance is a Feature! at DDD 11

Editor's Notes

  • #2 Who has:
      - any perf requirements
      - perf requirements with numbers!
      - any perf tests
      - perf tests that are run continuously
  • #5 Front-end
      - YSlow, Google PageSpeed, CDN & caching
      - "High Performance Web Sites" by Steve Souders
    Database & caching
      - Learn to use SQL Profiler
      - Redis or similar
      - MiniProfiler
    .NET (server-side) <- This is what we are looking at
    Mechanical Sympathy
      - Anything by Martin Thompson
      - Disruptor and Disruptor.NET
      - CPU caches (L1, L2, etc.)
      - Memory access patterns
  • #8 - Save money when running in the cloud (Zeeshan anecdote)
        - Scale up rather than just scale out
      - Save power on mobile devices (bad perf is also more obvious on a constrained device)
      - To users, bad performance looks like your website isn't working!
        - PerfBytes podcast, "News Of The Damned", a.k.a. "which UK ticketing site has crashed this week"!
      - Bad performance might be losing you customers before you even get them!
      - Even internal L.O.B. apps
        - What could Dave in accounting do with an extra 50 minutes per week (10 min per day)?
        - Maybe the really slow accounting app is the reason he quits and goes to work for your main competitor!
  • #9 Henry Petroski (born February 6, 1942) is an American engineer specializing in failure analysis. A professor of both civil engineering and history at Duke University, he is also a prolific author; his books include "To Engineer Is Human: The Role of Failure in Successful Design".
  • #10 To know the critical 3%, we have to measure. Except Donald Knuth, who never writes slow code, and if he did, he would know which bit was slow!
  • #12 Thank him for making Visual Studio faster. He helped fix it after adding WPF made it SLOW!
  • #13 Should be roughly 10-15 mins in by now, if not hurry up!!!!
  • #14 The normal distribution applies to things like height and weight, but it DOESN'T apply to everything!
  • #15 The average is just under 2, i.e. 1.995 or something like that, but > 99% of people in the UK have 2 legs (more than the average).
  • #17 This is a histogram: a real-world example of web page response times. Why are there two groups of histogram bars?
      - fast = cached data
      - slow = hitting the database
  • #20 Unit tests are meant to be fast, and they only test one thing. In dev you don't always have a full set of data, you don't test for long periods of time, and the setup is smaller. Michele Bustamante's talk about logging: you don't just need to measure things, you need to log the data AND be able to get at it!
  • #21 You'll probably guess wrong! Consider adding performance unit tests. Noda Time does this and can graph performance over time to see if it has regressed.
  • #22 MiniProfiler: turn this on in Development and, if possible, in Production. Glimpse is an alternative.
  • #23 Runs on .NET. Puts everything in one place: web server & database. Summary metrics up front. Can drill down into detailed metrics, including executed SQL, page load times, etc.
  • #25 Make sure you are really measuring what you think you are measuring!!
  • #29 NBench, xUnit Performance
  • #33 https://github.com/dotnet/roslyn/issues/5388 "Implement string concatenation in loops via manipulating a StringBuilder instead of emitting String.Concat()": this WON'T be implemented by the compiler.
  • #36 Both Stack Overflow and Roslyn were affected by this! In the .NET Framework 4.5 there is background server garbage collection (before .NET 4.5, background GC was Workstation only). So until .NET 4.5, Server GC was STOP-THE-WORLD.
  • #37 Process Explorer From Sysinternals
  • #38 PerfView is a stand-alone utility to help you debug CPU and memory problems. It is light-weight and non-intrusive, so it can be used on production apps with minimal impact. It uses ETW (Event Tracing for Windows), which is designed to be very fast!
  • #39 They were able to graph these results and equate them to Garbage Collector pauses! They had good logging and measurements in place.
  • #42 They measured and found that all of these were on the HOT PATH
  • #43 https://github.com/dotnet/roslyn/pull/415 "Avoid unnecessary boxing with String.Concat". Able to implement this optimization for types which are immutable, pure, and not affected by other code. Notably:
      - bool
      - char (this was one of the motivating types for this optimization)
      - IntPtr
      - UIntPtr
    Other types are excluded due to side-effects of calling ToString() implementations that rely on the current culture (i.e. the culture can be changed midway through and you'll see different behaviour).
  • #50 Repeat questions back to the audience!!!!!