Introduction to .NET Performance Measurement

A two-hour talk on performance measurement tools for .NET applications, including performance counters, the Visual Studio profiler, and ETW with PerfView.

Published in: Technology

  • 1. © Copyright SELA Software & Education Labs Ltd. 14-18 Baruch Hirsch St., Bnei Brak 51202, Israel
  • 2.
      • Execution time
          – CPU time, wall-clock time, kernel time vs. user time
      • I/O requests
          – Number of disk operations, number of bytes transferred across the wire, number of files accessed
      • Database access
          – Sessions opened, transactions committed, grouping of execution time by SQL statement
      • OS and hardware
          – System calls, page faults, cache misses, TLB misses
  • 3.
      • A set of numeric data exposed by Windows or by individual applications that can be sampled programmatically
          – Organized hierarchically into Categories, Instances, and Counters
      • Accessed using System.Diagnostics:
          – PerformanceCounter, PerformanceCounterCategory
          – Can expose your own counters as well
      • Read with the built-in Performance Monitor MMC snap-in (perfmon.exe)
  • 4.
      • Supports managed and unmanaged code
          – Part of Visual Studio 2010/2012 Premium/Ultimate
          – Can be run stand-alone from the command line
      • Operation modes:
          – Sampling
              • CPU-bound apps, very low overhead
              • Full program stacks (including all system DLLs)
              • Tier interactions
          – Instrumentation
              • I/O-bound apps, CPU-bound apps, higher overhead
              • More detailed timing data, limited stacks (just my code)
          – Allocations
              • Details on who allocated and what
              • Managed code only
          – Concurrency
  • 5.
      • Periodically interrupt the application
          – Timer (default = 10,000,000 clock cycles)
          – Page faults
          – System calls
          – CPU performance counters (cache misses, branch mispredictions, etc.)
      • Walk the application’s stack
          – Record the frames, no symbol resolution yet
          – Very fast, small intrusion, little effect on profiled app
  • 6.
      • Exclusive samples
          – Function was on the top of the stack
          – Function is doing a lot of individual work
      • Inclusive samples
          – Function was anywhere on the stack (the top frame counts as well)
          – Function causes a lot of work to be done
      [Diagram: call stack Main → Foo → Bar, with Bar on top. One sample adds +1 Inc and +1 Exc to Bar, +1 Inc to Foo, and +1 Inc to Main.]
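The counting rule on this slide can be sketched in a few lines of Python. This is an illustration only, not the Visual Studio profiler's implementation: the function `count_samples` and the sample stacks are hypothetical, and the sketch follows the slide's diagram, in which the frame on top of the stack receives both an inclusive and an exclusive sample.

```python
from collections import Counter


def count_samples(stacks):
    """Tally inclusive and exclusive sample counts from captured call stacks.

    Each stack is a list of function names ordered from the outermost
    caller to the frame that was on top of the stack when sampled.
    """
    inclusive = Counter()
    exclusive = Counter()
    for stack in stacks:
        if not stack:
            continue
        # Only the top frame was actually executing: one exclusive sample.
        exclusive[stack[-1]] += 1
        # Every function present on the stack gets one inclusive sample.
        # set() ensures a recursive function is counted once per sample.
        for name in set(stack):
            inclusive[name] += 1
    return inclusive, exclusive


# Hypothetical samples, using the function names from the slide's diagram.
samples = [
    ["Main", "Foo", "Bar"],
    ["Main", "Foo", "Bar"],
    ["Main", "Foo"],
    ["Main"],
]

inclusive, exclusive = count_samples(samples)
for name in ("Main", "Foo", "Bar"):
    print(f"{name}: inclusive={inclusive[name]}, exclusive={exclusive[name]}")
```

With these four samples, Main has a high inclusive count but a low exclusive count (it mostly causes work), while Bar's inclusive and exclusive counts are equal (it does its own work).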
  • 7.
      • Samples ≠ Time
          – Blocked functions don’t get samples
          – There may be statistical errors (an “evasive” function that never shows up during a sample)
      • Very long runs are not necessary
          – Long runs = more noise = less clarity
      • Make sure you have debugging symbols
          – Use the Microsoft symbol server,
  • 8.
      • The profiler instruments the binary before it’s launched
          – Emits markers that record function execution times and counts
          – In other profilers, can work at the line level as well, but that is very expensive

        void foo()
        {
            FUNC_ENTER(foo);
            // do some work
            CALL_ENTER(ExtCall);
            // call another function
            ExtCall();
            CALL_EXIT(ExtCall);
            // do some more work
            FUNC_EXIT(foo);
        }
  • 9.
      • More detailed performance data
          – Number of calls
          – Actual time (probe overhead is subtracted)
      • Elapsed time
          – Raw time spent in the function (wall-clock time)
      • Application time
          – The interval between two probes is marked when a kernel transition occurs within it
          – That interval is excluded from Application time
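The entry and exit probes described above can be imitated with a Python decorator. This is a minimal sketch under stated assumptions, not how the Visual Studio profiler rewrites binaries: the names `instrumented`, `call_counts`, `elapsed_seconds`, `foo`, and `ext_call` are hypothetical, the sketch records only call counts and elapsed (wall-clock) time, and it does not subtract probe overhead or discount kernel transitions.

```python
import time
from collections import defaultdict
from functools import wraps

# Per-function totals gathered by the probes (hypothetical names).
call_counts = defaultdict(int)
elapsed_seconds = defaultdict(float)


def instrumented(func):
    """Wrap func with an entry probe and an exit probe."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()  # entry probe
        try:
            return func(*args, **kwargs)
        finally:
            # Exit probe: runs even if func raises.
            elapsed_seconds[func.__name__] += time.perf_counter() - start
            call_counts[func.__name__] += 1
    return wrapper


@instrumented
def ext_call():
    time.sleep(0.01)  # stands in for blocking work


@instrumented
def foo():
    ext_call()
    ext_call()


foo()
for name in ("foo", "ext_call"):
    print(f"{name}: calls={call_counts[name]}, "
          f"elapsed={elapsed_seconds[name]:.4f}s")
```

Unlike sampling, the sleeping time inside `ext_call` is recorded here, which is why instrumentation suits code that blocks or waits.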
  • 10.
      • Memory allocations incur a significant cost
          – The allocations are cheap, but the GC isn’t!
          – You won’t always see the cost at the source, because the allocating function runs quickly
      • Profiling an application for excessive allocations may be more important than CPU time
          – Another aspect is diagnosing memory leaks or sources of excess memory consumption
  • 11. • Identifies the locations making the most allocations, and lists the types and allocation counts
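As a rough analogue of this kind of report, Python's standard `tracemalloc` module can list the source lines responsible for the most allocated memory. This is not the Visual Studio allocation profiler and it does not report .NET types; the function `build_strings` and the workload size are hypothetical, chosen only to produce visible allocations.

```python
import tracemalloc


def build_strings(n):
    # Deliberately allocation-heavy: creates n separate string objects.
    return [str(i) * 10 for i in range(n)]


tracemalloc.start()
data = build_strings(50_000)
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# Group live allocations by source line; largest total size comes first.
top = snapshot.statistics("lineno")[:3]
for stat in top:
    frame = stat.traceback[0]
    print(f"{frame.filename}:{frame.lineno} "
          f"blocks={stat.count} bytes={stat.size}")
```

The first entry should point at the list comprehension in `build_strings`, which mirrors the slide's point: the report names the allocating location even though that code itself runs quickly.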
  • 12.
      • Analyze the application’s concurrency characteristics
          – CPU utilization: are all CPU cores active?
          – Thread migration between cores
          – Thread blocking patterns: why are threads blocked/unblocked, preempted, executing?
          – Resource contention: which threads are competing for the same resources?
      • In-depth analysis is very difficult: lots of information in a very short time
  • 13. • Common Patterns for Poorly-Behaved Multithreaded Applications
  • 14.
      • Sampling
          – To get a quick result, an idea of where to focus
          – To analyze sources of cache misses, page faults, and other environmental factors
          – To profile a running process (e.g. Web server) that can’t be restarted easily
      • Instrumentation
          – To get more accurate results, function call counts
          – To get wall-clock time information including block and wait times
      • Concurrency
          – To get a general idea of CPU utilization and thread migration
          – To understand why threads are blocked and unblocked
  • 15.
      • xperf.exe: Command-line tool for ETW capturing and processing
      • xperfview.exe: Visual trace analysis tool
      • xbootmgr.exe: On/off transition state capture tool
      • PerfView.exe: ETW capture tool for managed apps
      • Works on Windows Vista SP1 and above
  • 16.
      • Turn tracing on: xperf -on <PROVIDER>
      • Perform activities
      • Capture a log: xperf -d <LOG_FILE_NAME>
      • Analyze it: xperf <LOG_FILE_NAME>
  • 17.
      • Performance Counters
      • Visual Studio Profiler
      • Event Tracing for Windows