.NET Core Networking stack and Performance
DotNext in Moscow, RU (2017/11/12)
Karel Zikmund (@ziki_cz)
Agenda
• Networking stack architecture evolution
• .NET Framework, UWP and .NET Core
• Networking stack in .NET Core
• Direction and plans
• Status & perf results
• General BCL performance
Networking – Architecture Evolution
.NET Framework (Windows):
• HttpWebRequest + ServicePoint, plus HttpClient + HttpClientHandler (added in 4.5)
• Layered on Sockets, SslStream, Dns, …
Networking – Architecture Evolution
.NET Framework (Windows):
• HttpWebRequest + ServicePoint, plus HttpClient + HttpClientHandler (added in 4.5)
• Layered on Sockets, SslStream, Dns, …
UWP (Windows):
• HttpClient + HttpClientHandler on WinRT APIs (win9net.dll)
.NET Core 2.0 (Windows / Linux / Mac):
• HttpClient via WinHttpHandler (WinHttp.dll) on Windows, CurlHandler (libcurl + OpenSSL) on Linux / Mac
• HttpWebRequest + ServicePoint (compatibility)
• Sockets, SslStream, … on all platforms
Networking – Architecture Evolution
.NET Framework (Windows):
• HttpWebRequest + ServicePoint, plus HttpClient + HttpClientHandler (added in 4.5)
• Layered on Sockets, SslStream, Dns, …
.NET Core 2.0 (Windows / Linux / Mac):
• HttpClient via WinHttpHandler (WinHttp.dll) on Windows, CurlHandler (libcurl + OpenSSL) on Linux / Mac
• HttpWebRequest + ServicePoint (compatibility)
• Sockets, SslStream, … on all platforms
.NET Core Future (Windows / Linux / Mac):
• HttpClient + ManagedHandler and HttpWebRequest + ServicePoint, layered directly on Sockets, SslStream, Dns, … (OpenSSL on Linux / Mac)
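Across all of these architecture stages the public HttpClient surface stays the same; only the handler underneath differs per platform. A minimal sketch illustrating that callers are insulated from the handler choice (no network access needed):

```csharp
using System;
using System.Net.Http;

class HandlerDemo
{
    static void Main()
    {
        // The platform picks the underlying handler (WinHttpHandler,
        // CurlHandler, or the managed one); callers see the same API.
        using (var client = new HttpClient())
        {
            // Default request timeout is 100 seconds on every platform.
            Console.WriteLine(client.Timeout.TotalSeconds);
        }
    }
}
```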
Networking – Technical Roadmap
https://github.com/dotnet/designs/issues/9
1. Foundation – rock solid
• Sockets, SSL, DNS
2. Web stack (client) – high perf & consistency
• HttpClient, ClientWebSocket
3. Emerging technologies
• HTTP/2, RIO, QUIC
4. Maintenance components
• (Http/Ftp/File)WebRequest + ServicePoint, Mail, HttpListener
Networking – Focus on Perf
Scenarios / workloads:
• Micro-benchmarks, benchmarks, real-world scenarios (feedback)
Metrics:
• RPS (Requests per second) & throughput – e.g. streaming video/music
• Latency – e.g. real-time trading
• Connection density – e.g. messaging apps, IoT/devices
Important properties:
• Percentiles (95% / 99%)
• Scale up
• Resource utilization (90%-95% ideal)
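The percentile metrics above (p95/p99) can be computed from raw latency samples with the nearest-rank method; a small sketch with made-up latency data for illustration:

```csharp
using System;
using System.Linq;

class Percentiles
{
    // Nearest-rank percentile: sort the samples, take the value at
    // rank ceil(pct/100 * N), computed here in integer arithmetic.
    static double Percentile(double[] sorted, int pct)
    {
        int rank = (sorted.Length * pct + 99) / 100;
        return sorted[rank - 1];
    }

    static void Main()
    {
        // Hypothetical latencies: 1..100 ms.
        double[] latenciesMs =
            Enumerable.Range(1, 100).Select(i => (double)i).ToArray();
        Array.Sort(latenciesMs);

        // Report the tail, not just the average.
        Console.WriteLine(
            $"p95={Percentile(latenciesMs, 95)} p99={Percentile(latenciesMs, 99)}");
    }
}
```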
Networking – Perf test environment
• Repeatability – isolated environment (reduce noise)
• 2 machines:
• 4-core
• 16 GB RAM
• 2x NIC: 1x 1 Gbps + 1x 10 Gbps
• 8 servers:
• 12-core
• 64 GB RAM
• 2x NIC: 1x 1 Gbps + 1x 40 Gbps
[Diagram: machines A and B connected directly over 10 Gbps, each also with a 1 Gbps link to the external network; servers S.1 … S.8 connected over 40 Gbps, each also with a 1 Gbps link to the external network]
Networking – Sockets perf results
• Micro-benchmark only (disclaimer: Netty/Go impl may be inefficient)
• Linux, 2 CPUs

1,000x RPS | 1 B  | 16 B | 256 B | 4 KB
.NET Core  | 370  | 369  | 384   | 198
Netty      | 527  | 540  | 454   | 124
Go         | 517  | 531  | 485   | 210

GB/s       | 256 B | 4 KB | 64 KB | 1 MB
.NET Core  | 0.09  | 0.77 | 1.09  | 1.10
Netty      | 0.11  | 0.48 | 0.66  | 0.67
Go         | 0.12  | 0.82 | 1.10  | 1.11
Networking – Sockets perf results
• Micro-benchmark only (disclaimer: Netty/Go impl may be inefficient)
• Linux, 2 CPUs

GB/s       | 256 B | 4 KB | 64 KB | 1 MB
.NET Core  | 0.09  | 0.77 | 1.09  | 1.10
Netty      | 0.11  | 0.48 | 0.66  | 0.67
Go         | 0.12  | 0.82 | 1.10  | 1.11

SSL – GB/s | 256 B | 4 KB | 64 KB | 1 MB
.NET Core  | 0.04  | 0.31 | 0.71  | 0.87
Netty      | 0.03  | 0.12 | 0.15  | 0.15
Go         | 0.06  | 0.56 | 0.98  | 1.12
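RPS numbers like these typically come from ping-pong style socket micro-benchmarks. A much simplified single-connection loopback version of such a benchmark (payload size and iteration count are arbitrary; a real run uses separate machines, many connections, and pinned CPUs; needs C# 7.1+ for async Main):

```csharp
using System;
using System.Diagnostics;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

class EchoBench
{
    const int Iterations = 10_000;
    const int PayloadSize = 256; // one of the measured sizes

    // Read exactly buf.Length bytes from the stream.
    static async Task FillAsync(NetworkStream s, byte[] buf)
    {
        int read = 0;
        while (read < buf.Length)
        {
            int n = await s.ReadAsync(buf, read, buf.Length - read);
            if (n == 0) throw new Exception("connection closed");
            read += n;
        }
    }

    static async Task Main()
    {
        var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        int port = ((IPEndPoint)listener.LocalEndpoint).Port;

        // Server: echo every message back.
        Task server = Task.Run(async () =>
        {
            using (TcpClient conn = await listener.AcceptTcpClientAsync())
            {
                NetworkStream s = conn.GetStream();
                var buf = new byte[PayloadSize];
                for (int i = 0; i < Iterations; i++)
                {
                    await FillAsync(s, buf);
                    await s.WriteAsync(buf, 0, buf.Length);
                }
            }
        });

        // Client: send, wait for the echo, repeat; report round-trips/sec.
        using (var client = new TcpClient())
        {
            await client.ConnectAsync(IPAddress.Loopback, port);
            NetworkStream s = client.GetStream();
            var payload = new byte[PayloadSize];
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < Iterations; i++)
            {
                await s.WriteAsync(payload, 0, payload.Length);
                await FillAsync(s, payload);
            }
            sw.Stop();
            Console.WriteLine($"RPS: {Iterations / sw.Elapsed.TotalSeconds:F0}");
        }
        await server;
        listener.Stop();
    }
}
```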
Networking – Sockets perf on Server
• Kestrel server uses libuv today -> prototyping a Sockets-based transport
• Early prototype (with hacks):
• 7% improvement + more potential
• Recent prototype (very preliminary data):
• 15% worse on Linux
• 20% worse on Windows
• With workarounds in Sockets: parity with libuv perf
• Investigation in progress
Networking – ManagedHandler perf
• ManagedHandler
• Very early development stage
• Bugs
• Missing large features – authentication, proxy, HTTP/2
• Early measurements (simple http micro-benchmark):
• Windows: Parity with Go
• Linux: 15% gap (pending investigation)
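For context: ManagedHandler later shipped (in .NET Core 2.1) as SocketsHttpHandler, the fully managed handler built on Sockets/SslStream. A sketch of configuring it explicitly; the property values here are arbitrary examples, not recommendations:

```csharp
using System;
using System.Net.Http;

class ManagedHandlerDemo
{
    static void Main()
    {
        // One managed implementation on every platform, replacing the
        // per-OS WinHttpHandler/CurlHandler split.
        var handler = new SocketsHttpHandler
        {
            PooledConnectionLifetime = TimeSpan.FromMinutes(2),
            MaxConnectionsPerServer = 100
        };
        using (var client = new HttpClient(handler))
        {
            Console.WriteLine(handler.MaxConnectionsPerServer);
        }
    }
}
```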
Networking – SSL perf
• Historical reports on some .NET Framework scenarios: 2x slower
• A Linux .NET Core 2.0 app was reported to be 4x slower
• libcurl+HttpClient pattern limitation
• With workaround: 14% overhead of SSL
• TechEmpower benchmark
• http vs. https larger diff than Rust/Go/Netty
• Sockets micro-benchmarks – 23% gap
• Rewrite attempt by the community (@drawaes)
• Next steps: Measure & analyze micro-benchmarks & end-to-end scenarios
Networking – Industry benchmarks
• TechEmpower benchmark
• More end-to-end, with DB, etc.
• Useful for overall platform performance comparison
• Round 15 (preliminary data)
• ASP.NET Core entry at #5 (jump from #14 in Round 14)
Importance of Performance
Platform performance shows how fast your app could be
… but it is not everything:
• Productivity
• Tooling
• Developer availability (in-house/to hire)
• Documentation
• Community
• etc.
Application Performance Tips
• Plan for performance during design
• Understand scenario, set goals
• Prototype and measure early
• Optimize what’s important – measure
• Understand the big picture
• Avoid micro-optimizations
• Don’t guess root cause – measure
• Minimize repro – it’s worth it!
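"Don't guess – measure" in its simplest form: warm up first, run enough iterations, and time with Stopwatch (a dedicated harness such as BenchmarkDotNet is better for anything serious). A minimal sketch:

```csharp
using System;
using System.Diagnostics;
using System.Linq;

class Measure
{
    static int[] data = Enumerable.Range(0, 1000).ToArray();

    static long Sum()
    {
        long s = 0;
        foreach (int x in data) s += x;
        return s;
    }

    static void Main()
    {
        Sum(); // warm-up, so JIT compilation doesn't skew the timing

        var sw = Stopwatch.StartNew();
        long result = 0;
        for (int i = 0; i < 10_000; i++) result = Sum();
        sw.Stop();

        Console.WriteLine(result); // sum of 0..999 = 499500
        Console.WriteLine($"elapsed: {sw.ElapsedMilliseconds} ms");
    }
}
```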
BCL Performance
• Fine-tuned over 15 years
• Opportunities are often trade-offs (memory vs. speed, etc.)
• Problem: Identify scenarios which matter
• OSS helps
• More eyes on code
• Motivated contributors
• More reports
• Perf improvements in .NET Core (.NET blog by Stephen Toub)
• Collections, LINQ, Compression, Crypto, Math, Serialization, Networking
• Span<T> sprinkled in BCL
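Span<T> overloads (added progressively from .NET Core 2.1) let BCL parsing and slicing work in place, without intermediate string allocations. A small sketch:

```csharp
using System;

class SpanDemo
{
    static void Main()
    {
        // Slicing a span allocates no new strings; the span overload
        // of int.Parse reads the characters in place.
        ReadOnlySpan<char> date = "2017/11/12".AsSpan();
        int year = int.Parse(date.Slice(0, 4));
        int month = int.Parse(date.Slice(5, 2));
        Console.WriteLine(year + "-" + month);
    }
}
```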
BCL Performance – What not to take?
• Specialized collections
• BCL is designed for usability and decent perf for 95% of customers
• Code complexity (maintainability) vs. perf wins
• APIs for specialized operations (e.g. to save duplicate lookup)
• Creates complexity
• May leak implementation into API surface
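For context, "saving a duplicate lookup" refers to patterns like Dictionary<K,V>.TryGetValue, which already does in one hash lookup what ContainsKey plus the indexer do in two; proposed specialized APIs extend this idea to rarer operations:

```csharp
using System;
using System.Collections.Generic;

class LookupDemo
{
    static void Main()
    {
        var counts = new Dictionary<string, int> { ["requests"] = 42 };

        // Two hash lookups: one for ContainsKey, one for the indexer.
        int slow = counts.ContainsKey("requests") ? counts["requests"] : 0;

        // One hash lookup: the specialized API returns the value directly.
        counts.TryGetValue("requests", out int fast);

        Console.WriteLine(slow == fast);
    }
}
```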
Wrap Up
• Proactive investments into .NET Networking stack
• Consistency across platforms
• Great performance for all workloads
• Ongoing scenario/feedback-based improvements in BCL perf
• Performance in general is:
• Important
• But not the only important thing
• Tricky to get right in the right place


Editor's Notes

• #4 Client stack: HttpWebRequest since .NET 1.0 – exceptions on errors (404), headers as strings (parsing, error prone). 4.5 added HttpClient, driven by WCF – pipeline architecture (extensibility), no HTTP/2 support. Original plan was to later re-wire HttpClient directly on the fundamentals – it turns out that is huge compat work and will likely never happen (compat is king on .NET Framework). APIs missing from the picture: fundamentals (NetworkingInfo, Uri); FtpWebRequest, FileWebRequest; Mail on Sockets & SslStream – now obsoleted (MailKit recommended); WebSockets on HttpWebRequest & Sockets & websockets.dll (Win8+); HttpListener (server) – now obsoleted by Kestrel.
• #5 UWP: WinRT APIs designed by the Windows Networking team at the same time as 4.5 – almost a 1:1 mapping; win9net.dll is the client library, with HTTP/2 support. .NET Core: HTTP/2 support in the server library; HttpWebRequest kept for compatibility in .NET Core 2.0 (.NET Standard 2.0), but “obsoleted”. Also “obsoleted” in 2.0: *WebRequest & Mail & HttpListener. libcurl may use NSS, not just OpenSSL – different behaviors on different OS’s or even Linux distros, i.e. inconsistencies in behavior. WebSocket = ManagedWebSocket on Win7 & Linux/Mac (with a mini ManagedHandler on Sockets). Note: an attempt to ship WinHttpHandler also for .NET Framework as System.Net.Http.dll (HTTP/2 on Desktop) – the ambition to replace the inbox version failed due to behavioral differences.
  • #7 Key values: Consistency and Perf wins Mono has OS-specific handlers (Phone OS specific capabilities around connection transition between data and Wi-Fi)
• #8 Foundation – performance (& reliability); usable for both server and client scenarios. Web stack – consistency & performance (& reliability); important for middleware scenarios (not just one server). Emerging technologies – new protocols, capabilities, and performance-motivated work: RIO = Registered I/O (Win8+), QUIC = Quick UDP Internet Connections (TCP/TLS replacement), latency improvements (esp. for streaming). Maintenance components – minimal investments, mostly for reliability and the most necessary standards support (preserve customers’ investments).
• #10 Repeatability: non-networking micro-benchmarks measure time (wallclock) and memory; the classic CLR perf lab disables many OS features. A cloud (Azure) attempt was made, but you want full control.
  • #11 Column is payload size (1B -> 1MB, each column is 16x bigger than previous one) Note: RPS as metric in second table better tells the story of scaling based on payload size
  • #12 Go has assembly-written crypto Note: .NET Core 1.1 was at 0.47 GB/s at 1MB – 2x improvement in 2.0 and 2.1 (=future)
  • #13 Sockets value: Consistency & less external dependency Early prototype – Hacks around response buffering in Kestrel – flushing tuned for libuv Recent prototype (1 week old) – Workarounds point to potential perf improvements
  • #15 Known fact: SslStream class could use some love Note: ManagedHandler https: 7% slower than CurlHandler (without Ssl)
• #16 A feature across language (C# compiler), runtime (incl. JIT), and BCL. The value is not perf on its own – unsafe code can be faster (up to 25% on micro-benchmarks), but that is not universal: native memory, pinning memory. An example of “not everything is black and white” – clearly better perf, but often trade-offs.
  • #17 See black line – regressions from Span<T> and recovering back perf
• #19 Warning: not everyone is truly building a hyper-scale service. Even if you think you are, don’t forget that most scaling apps are rewritten every 2-3 years – you don’t have to be perfect on day 1, evolve. Story: trading SW chose .NET for productivity (over a C++ proposal); perf later got more and more serious – down to sub-ms GCs, reusing memory, eventually rewriting key components in native code. Note: .NET can be faster than C++ in certain workloads (e.g. allocations in Gen0 are super efficient; real app startup of a demo app in both C++ and C#).
• #20 Optimize what’s important: the 90-10 rule (95-5) – what actually contributes to app performance. Story: arguing over data structures (List vs. array) when the data is <100 items and the service has 3 hops between servers. Root-cause guessing mistakes: conclusions based on 1 dump (or 2 in the better case); blaming the platform: lots of memory used by the app => it must be a GC bug (if there is no memory pressure on the OS, GC will use it); GC happens too often => it must be a GC bug (but maybe you just allocate too much and GC is doing what it’s told). Also in non-performance: JIT_Throw on the callstack => JIT bug (we renamed it to IL_Throw – it just throws the exception); crash during GC => GC bug (corruption is likely from interop / native/unsafe code). BTW: TTD is a life saver.
• #21 Thousands of APIs, hard to pick the right ones. Approach: using telemetry from MS/partners, partner teams’ reports, working closely with customers and community. OSS: we often ask about the scenario, to understand the big picture. Examples (quite often 30% and more): SortedSet<T>.ctor – O(n^2) -> O(n) … 600x on 400K items; List<T>.Add – already fast, but used everywhere; ConcurrentBag/ConcurrentQueue.
  • #21 Thousands of APIs, hard to pick the right ones Problem: Using telemetry from MS/partners, partner teams reports, working closely with customers and community OSS: We often ask about the scenario, to understand the big picture Examples: (quite often 30% and more) SortedSet<T>.ctor – O(n^2) -> O(n) … 600x on 400K items List<T>.Add – already fast, but used everywhere ConcurrentBag/ConcurrentQueue