SKILLWISE - ENHANCING DOTNET APP
Enhancing performance of .NET applications
Content
• Implementing value types correctly
• Applying pre-compilation
• Using unsafe code and pointers
• Choosing a collection
• Making your code as parallel as necessary
IMPLEMENTING VALUE TYPES
CORRECTLY
Two Categories of Types
• Reference types
– Offer a set of managed services: locks, inheritance, and more
• Value types
– Do not offer these services
• Additional superficial differences
– Parameter passing
– Equality
Object Layout
• Heap objects (reference types) have two header fields
• Stack objects (value types) don't have headers
• Why two kinds of types and object layouts?
Using Value Types
• Use value types when performance is critical
– Creating a large number of objects
– Creating a large collection of objects
Basic Value Type
• The basic value type implementation is inadequate
Origins of Equals
• List<T>.Contains calls Equals
• Declared by System.Object and overridden by System.ValueType
Boxing
• Equals' parameter must be boxed
Avoiding Boxing and Reflection
• Override Equals
• Overload Equals
• Implement IEquatable<T>
Final Tuning
• Add equality operators
• Add GetHashCode
GetHashCode
• Used by Dictionary, HashSet, and other collections
• Declared by System.Object, overridden by System.ValueType
• Must be consistent with Equals:
A.Equals(B) ⇒ A.GetHashCode() == B.GetHashCode()
• Use value types in high-performance scenarios
– Tight loops, large collections
• Implement value types correctly
– Equals, IEquatable<T>, GetHashCode
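As a rough illustration of these guidelines, here is a minimal sketch of a hypothetical Point2D value type that overrides Equals, implements IEquatable<T>, adds the equality operators, and supplies a consistent GetHashCode, so that List<T>.Contains and Dictionary<K,V> avoid boxing and reflection:

using System;

public struct Point2D : IEquatable<Point2D>
{
    public readonly int X;
    public readonly int Y;

    public Point2D(int x, int y) { X = x; Y = y; }

    // Strongly-typed Equals used by IEquatable<T>-aware collections (no boxing)
    public bool Equals(Point2D other)
    {
        return X == other.X && Y == other.Y;
    }

    // Override of System.Object.Equals for callers that only have an object
    public override bool Equals(object obj)
    {
        return obj is Point2D && Equals((Point2D)obj);
    }

    // Must agree with Equals: equal values produce equal hash codes
    public override int GetHashCode()
    {
        return (X * 397) ^ Y;
    }

    public static bool operator ==(Point2D a, Point2D b) { return a.Equals(b); }
    public static bool operator !=(Point2D a, Point2D b) { return !a.Equals(b); }
}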
Applying Precompilation
• Improving startup time
• Precompilation
– NGen
– Serialization assemblies
– Regular expressions
• Other ways of improving startup time
– Multi-core background JIT
– MPGO
Startup Costs
• Cold startup
– Disk I/O
• Warm startup
– JIT compilation
– Signature validation
– DLL rebasing
– Initialization
Improving Startup Time with NGen
• NGen precompiles .NET assemblies to native code
> ngen install MyApp.exe
– Includes dependencies
– Precompiled assemblies are stored in C:\Windows\Assembly\NativeImages_*
– Falls back to the original assembly if the native image is stale
• Automatic NGen in Windows 8 and CLR 4.5
Multi-Core Background JIT
• Usually, methods are compiled to native code when first invoked
• Multi-core background JIT in CLR 4.5
– Opt in using the System.Runtime.ProfileOptimization class
using System.Runtime;
ProfileOptimization.SetProfileRoot(folderName);
ProfileOptimization.StartProfile(profileName);
• Relies on profile information generated at runtime
– Can use multiple profiles
RyuJIT
• A rewrite of the JIT compiler
– Faster compilation (throughput)
– Better code (quality)
Managed Profile-Guided Optimization (MPGO)
• Introduced in .NET 4.5
– Improves precompiled assemblies' disk layout
– Places hot code and data closer together on disk
• Relies on profile information collected at runtime
Improving Cold Startup
• I/O costs are the #1 thing to improve
• ILMerge (Microsoft Research)
• Executable packers
• Placing strong-named assemblies in the GAC
• Windows SuperFetch
Precompiling Serialization Assemblies
• Serialization often creates dynamic methods on first use
• These methods can be precompiled
– SGen.exe creates precompiled serialization assemblies for XmlSerializer
– protobuf-net has a precompilation tool
Precompiling Regexes
• By default, the Regex class interprets the regular expression when you match it
• Regex can generate IL code instead of using interpretation:
• Even better, you can precompile regular expressions to an assembly:
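A brief sketch of both options; the date pattern and the RegexLib/MyApp.CompiledRegexes names are illustrative:

using System.Reflection;
using System.Text.RegularExpressions;

// Option 1: generate IL for the pattern instead of interpreting it on every match
Regex dateRegex = new Regex(@"\d{4}-\d{2}-\d{2}", RegexOptions.Compiled);
bool isDate = dateRegex.IsMatch("2015-06-01");

// Option 2: precompile the pattern into an on-disk assembly (RegexLib.dll) ahead of time
var info = new RegexCompilationInfo(
    @"\d{4}-\d{2}-\d{2}",     // pattern
    RegexOptions.None,        // options
    "DateRegex",              // generated class name
    "MyApp.CompiledRegexes",  // generated namespace
    true);                    // public
Regex.CompileToAssembly(new[] { info }, new AssemblyName("RegexLib"));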
USING UNSAFE CODE AND
POINTERS
Pointers? In C#?
• Raw pointers are part of the C# syntax
• Interoperability with Win32 and other DLLs
• Performance in specific scenarios
Pointers and Pinning
• We want to go from byte[] to byte*
• When getting a pointer to a heap object, what if the GC moves it?
• Pinning is required
byte[] source = ...;
fixed (byte* p = source)
{
    ...
}
• Directly manipulate memory
*p = (byte)12;
int x = *(int*)p;
• Requires an unsafe block and the "Allow unsafe code" compiler option
Copying Memory Using Pointers
• Mimicking Array.Copy or Buffer.BlockCopy
• Better to copy more than one byte per iteration
fixed (byte* p = src)
fixed (byte* q = dst)
{
    long* pSrc = (long*)p;
    long* pDst = (long*)q;
    for (int i = 0; i < dst.Length / 8; ++i)
    {
        *pDst = *pSrc;
        ++pDst; ++pSrc;
    }
}
• Might be interesting to unroll the loop
Reading Structures
• Read structures from a potentially infinite stream
struct TcpHeader
{
    public uint SrcIP, DstIP;
    public ushort SrcPort, DstPort;
}
• Do it fast – several GB/s, >100M structures/second
– We will look at multiple approaches and measure them
The Pointer-Free Approach
TcpHeader Read(byte[] data, int offset)
{
    MemoryStream ms = new MemoryStream(data, offset, data.Length - offset);
    BinaryReader br = new BinaryReader(ms);
    TcpHeader result = new TcpHeader();
    result.SrcIP = br.ReadUInt32();
    result.DstIP = br.ReadUInt32();
    result.SrcPort = br.ReadUInt16();
    result.DstPort = br.ReadUInt16();
    return result;
}
Marshal.PtrToStructure
• System.Runtime.InteropServices.Marshal is designed for interoperability scenarios
• Marshal.PtrToStructure seems useful
object PtrToStructure(IntPtr ptr, Type structureType)
• GCHandle can pin an object in memory and give us a pointer to it
GCHandle handle = GCHandle.Alloc(obj, GCHandleType.Pinned);
try
{
    IntPtr address = handle.AddrOfPinnedObject();
}
finally
{
    handle.Free();
}
Using Pointers
• Pointers can help by casting
fixed (byte* p = &data[offset])
{
    TcpHeader* pHeader = (TcpHeader*)p;
    return *pHeader;
}
• Very simple, doesn't require helper routines
A Generic Approach
• Unfortunately, T* doesn't work – T must be blittable
unsafe T Read<T>(byte[] data, int offset)
{
    fixed (byte* p = &data[offset])
    {
        return *(T*)p;
    }
}
• We can generate a method for each T and call it when necessary (a Reflection.Emit sketch follows below)
– Reflection.Emit
– CSharpCodeProvider
– Roslyn
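A minimal sketch of the Reflection.Emit route, assuming a hypothetical StructReader<T> helper that emits the equivalent of "return *(T*)address" once per T and caches the delegate:

using System;
using System.Reflection.Emit;

static class StructReader<T> where T : struct
{
    public delegate T Reader(IntPtr address);

    // Emitted once per T, then reused for every read
    public static readonly Reader Read = CreateReader();

    static Reader CreateReader()
    {
        var method = new DynamicMethod(
            "Read_" + typeof(T).Name, typeof(T), new[] { typeof(IntPtr) });
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);           // push the address argument
        il.Emit(OpCodes.Ldobj, typeof(T));  // load the struct stored at that address
        il.Emit(OpCodes.Ret);
        return (Reader)method.CreateDelegate(typeof(Reader));
    }
}

The caller still pins the byte[] (with fixed or GCHandle) and passes the address of data[offset] to StructReader<TcpHeader>.Read.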
CHOOSING A COLLECTION
Collection Considerations
• There are many built-in collection classes
– There are even more in third-party libraries like C5
• Fundamental operations: insert, delete, find
• Evaluation criteria: time complexity of the fundamental operations and memory overhead per element
Example: LinkedList<T>
• Doubly linked list, lots of memory overhead per node
• Insertion and deletion are very fast – O(1)
• Lookup is slow – O(n)
Arrays
• Flat, sequential, statically sized
• Very fast access to elements
• No per-element overhead
• Foundation for many other collection classes
List<T>
• Dynamic (resizable) array
– Doubles its size with each expansion
– For 100,000,000 insertions: ⌈log2 100,000,000⌉ = 27 expansions
• Insertions not at the end are very expensive
– Good for append-only data
• No specialized lookup facility
• Still no per-element overhead
LinkedList<T>
• Doubly-linked list
• Very flexible collection for insertions/deletions
• Still requires linear time – O(n) – for lookup
• Very big space overhead per element
Trees
• SortedDictionary<K,V> and SortedSet<T> are implemented with a balanced binary search tree
– Efficient lookup by key
– Sorted by key
• All fundamental operations take O(log(n)) time
– For example, log2(100,000,000) is less than 27
– Great for storing dynamic data that is queried often
• Big space overhead per element (several additional fields)
Associative Collections
• Dictionary<K,V> and HashSet<T> use hashing to arrange the elements
• Insertion, deletion, and lookup work in constant time – O(1)
– GetHashCode must be well-distributed for this to happen
• Medium memory overhead
– Combination of arrays and linked lists
– Smaller than trees in most cases
Comparison of Built-In Collections
Scenarios
• Word frequency in a large body of text
– Dictionary<string,uint>
• Queue of orders in a restaurant
– LinkedList<Order>
• Buffer of continuous log messages
– List<LogMessage>
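For the first scenario, a minimal sketch of counting word frequencies with Dictionary<string,uint>; the whitespace-only splitting is simplified and illustrative:

using System;
using System.Collections.Generic;

static Dictionary<string, uint> CountWords(string text)
{
    var frequencies = new Dictionary<string, uint>();
    char[] separators = { ' ', '\t', '\r', '\n' };
    foreach (string word in text.Split(separators, StringSplitOptions.RemoveEmptyEntries))
    {
        uint count;
        frequencies.TryGetValue(word, out count);   // count stays 0 for a new word
        frequencies[word] = count + 1;
    }
    return frequencies;
}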
Why Custom Collections?
Tries
• A text editor needs to store a dictionary of words
– "run", "dolphin", "regard" but also "running", "dolphins", "regardless"
– Offers spell checking and automatic word completion
• HashSet
– Super-fast spell checking
– Not sorted, so automatic completion by prefix is O(n)
• SortedSet
– Still fast spell checking
– Sorted, but access to predecessor/successor is not exposed
• Enter: Trie
Trie Internals
• Very compact
– Shared prefixes are only stored once
• Finding all words with a prefix is "by design"
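A minimal trie sketch (illustrative only; a full text-editor dictionary would also enumerate the completions below a prefix node):

using System.Collections.Generic;

class TrieNode
{
    public readonly Dictionary<char, TrieNode> Children = new Dictionary<char, TrieNode>();
    public bool IsWord;
}

class Trie
{
    readonly TrieNode _root = new TrieNode();

    public void Add(string word)
    {
        TrieNode node = _root;
        foreach (char c in word)
        {
            TrieNode child;
            if (!node.Children.TryGetValue(c, out child))
                node.Children[c] = child = new TrieNode();
            node = child;
        }
        node.IsWord = true;
    }

    // Spell checking: exact word lookup
    public bool Contains(string word)
    {
        TrieNode node = Find(word);
        return node != null && node.IsWord;
    }

    // Word completion: walk to the prefix node; every word below it is a completion
    TrieNode Find(string prefix)
    {
        TrieNode node = _root;
        foreach (char c in prefix)
            if (!node.Children.TryGetValue(c, out node))
                return null;
        return node;
    }
}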
Union-Find
• Tracking which nodes are in each connected component of a graph
– Connected component = set of nodes that are connected
• Need to support fast insertion of new edges
• Basic operations required:
– Find the connected component to which a node belongs
– Unify two connected components into one
• Using a list of nodes per component makes merging expensive
• Enter: disjoint set forest
Disjoint Set Forest
• Each node has a reference to its parent
– The node without a parent is the representative of the set
• Union and find:
– The representative identifies the connected component
– Merging means updating representatives
• Problem: find could be O(n); fixed by:
– Attaching the smaller tree to the larger one when merging
– Flattening the hierarchy while running find (path compression)
• O(α(n)) running time – α(n) is less than 5 for all practical values of n
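A minimal sketch of a disjoint set forest applying those two fixes (union by size and path compression); nodes are numbered 0..n-1:

class DisjointSetForest
{
    readonly int[] _parent;
    readonly int[] _size;

    public DisjointSetForest(int n)
    {
        _parent = new int[n];
        _size = new int[n];
        for (int i = 0; i < n; ++i) { _parent[i] = i; _size[i] = 1; }
    }

    // Find the representative, flattening the path as we go (path compression)
    public int Find(int x)
    {
        while (_parent[x] != x)
        {
            _parent[x] = _parent[_parent[x]];
            x = _parent[x];
        }
        return x;
    }

    // Merge two components, attaching the smaller tree to the larger one
    public void Union(int a, int b)
    {
        int ra = Find(a), rb = Find(b);
        if (ra == rb) return;
        if (_size[ra] < _size[rb]) { int t = ra; ra = rb; rb = t; }
        _parent[rb] = ra;
        _size[ra] += _size[rb];
    }
}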
GARBAGE COLLECTION INTERNALS
Garbage Collection
• Garbage collection means we don't have to manually free memory
• Garbage collection isn't free and has performance trade-offs
– Questionable on real-time systems, mobile devices, etc.
• The CLR garbage collector (GC) is an almost-concurrent, parallel, compacting, mark-and-sweep, generational, tracing GC
Mark and Sweep
• Mark: identify all live objects
• Sweep: reclaim dead objects
• Compact: shift live objects together
• Objects that can still be used must be kept alive
Roots
• Starting points for the garbage collector
• Static variables
• Local variables
– More tricky than they appear
• Finalization queue, f-reachable queue, GC handles, etc.
• Roots can cause memory leaks
Workstation GC
• There are multiple garbage collection flavors
• Workstation GC is "kind of" suitable for client apps
– The default for almost all .NET applications
• GC runs on a single thread
• Concurrent workstation GC
– Special GC thread
– Runs concurrently with application threads, only short suspensions
• Non-concurrent workstation GC
– One of the app threads does the GC
– All threads are suspended during GC
• Workstation GC doesn't use all CPU cores
Server GC
• One GC thread per logical processor, all working at once
• Separate heap area for each logical processor
• Until CLR 4.5, server GC was non-concurrent
• In CLR 4.5, server GC becomes concurrent
– Now a reasonable default for many high-memory apps
Switching GC Flavors
• Configure the preferred flavor in app.config
– Ignored if invalid (e.g. concurrent GC on CLR 2.0)
• Can't switch flavors at runtime
– But can query the current flavor using the GCSettings class (see the sketch below)
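A brief sketch of querying the current flavor at runtime with GCSettings:

using System;
using System.Runtime;

// Reports whether server GC is enabled and the current latency mode
Console.WriteLine("Server GC: {0}", GCSettings.IsServerGC);
Console.WriteLine("Latency mode: {0}", GCSettings.LatencyMode);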
Generational Garbage Collection
• A full GC is expensive and inefficient
• Divide the heap into regions and perform small collections often
– Modern server apps can't live with frequent full GCs
– Frequently-touched regions should have many dead objects
• New objects die fast, old objects stay alive
– Typical behavior for many applications, although exceptions exist
.NET Generations
• Three heap regions (generations)
• Gen 0 and gen 1 are typically quite small
• A high allocation rate leads to many fast gen 0 collections
• Survivors from gen 0 are promoted to gen 1, and so on
• Make sure your temporary objects die young and avoid frequent promotions to generation 2
The Large Object Heap
• Large objects are stored in a separate heap region (LOH)
• Large means larger than 85,000 bytes, or an array of more than 1,000 doubles
• The GC doesn't compact the LOH
– This may cause fragmentation
• The LOH is considered part of generation 2
– Temporary large objects are a common GC performance problem
Explicit LOH Compaction
• LOH fragmentation leads to a waste of memory
• .NET 4.5.1 introduces LOH compaction
– You can test for LOH fragmentation using the !dumpheap -stat SOS command
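A brief sketch of requesting a one-time LOH compaction on .NET 4.5.1 or later:

using System;
using System.Runtime;

// Ask for the LOH to be compacted during the next blocking full collection;
// the setting automatically resets to Default afterwards
GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect();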
Foreground and Background GC
• In concurrent GC, application threads continue to run during full GC
• What happens if an application thread allocates during GC?
– In CLR 2.0, the application thread waits for the full GC to complete
• In CLR 4.0, the application thread launches a foreground GC
• In server concurrent GC, there are special foreground GC threads
• Background/foreground GC is only available as part of concurrent GC
Resource Cleanup
• The GC only takes care of memory, not all reclaimable resources
– Sockets, file handles, database transactions, etc.
– When a database transaction dies, it has to abort the transaction and close the network connection
• C++ has destructors: deterministic cleanup
• The .NET GC doesn't release objects deterministically
Finalization
• The CLR runs a finalizer after the object becomes unreachable
• Let's design the finalization mechanism:
– Finalization queue for potentially "finalizable" objects
– Identifying candidates for finalization
– Selecting a thread for finalization: the finalizer thread
– F-reachable queue for finalization candidates
– Objects removed from the f-reachable queue can be GC'd
• This is pretty much how CLR finalization works!
Performance Problems with Finalization
• Finalization extends object lifetime
• The f-reachable queue might fill up faster than the finalizer thread can drain it
– Can be addressed by deterministic finalization (Dispose)
• It's possible for a finalizer to run while an instance method hasn't returned yet
The Dispose Pattern
• Stay away from finalization and use deterministic cleanup
– No performance problems
– You're responsible for resource management
• The Dispose pattern (a sketch follows below)
• Can combine Dispose with finalization
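A minimal sketch of the standard Dispose pattern combined with a finalizer as a safety net; the unmanaged buffer handle here is illustrative:

using System;
using System.Runtime.InteropServices;

class NativeResourceHolder : IDisposable
{
    IntPtr _handle;   // illustrative unmanaged resource
    bool _disposed;

    public NativeResourceHolder()
    {
        _handle = Marshal.AllocHGlobal(256);
    }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);   // cleanup was deterministic, skip finalization
    }

    ~NativeResourceHolder()
    {
        Dispose(false);              // safety net: only unmanaged cleanup here
    }

    protected virtual void Dispose(bool disposing)
    {
        if (_disposed) return;
        if (disposing)
        {
            // release other managed (IDisposable) resources here
        }
        if (_handle != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(_handle);   // release the unmanaged resource
            _handle = IntPtr.Zero;
        }
        _disposed = true;
    }
}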
Resurrection and Object Pooling
• Bring an object back to life from the finalizer
• Can be used to implement an object pool
– A cache of objects, like DB connections, that are expensive to initialize
MAKE YOUR CODE AS PARALLEL AS
NECESSARY
Kinds of Parallelism
• Parallelism – running multiple threads in parallel
• Concurrency – doing multiple things at once
• Asynchrony – doing work without blocking the caller's thread
Kinds of Workloads
• CPU bound
• I/O bound
• Mixed
Data Parallelism
• Parallelize an operation over a collection of items
• The TPL takes care of thread management
Parallel Loops
• Parallel.For
• Parallel.ForEach
• Customization
– Breaking early
– Limiting parallelism
– Aggregation
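A brief sketch of a customized parallel loop that limits the degree of parallelism and breaks early; the input array and the per-item work are illustrative:

using System;
using System.Threading.Tasks;

int[] items = { 3, 1, 4, -1, 5, 9 };   // illustrative input

var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
Parallel.ForEach(items, options, (item, loopState) =>
{
    if (item < 0)
    {
        loopState.Break();   // stop scheduling iterations beyond this one
        return;
    }
    Console.WriteLine(item); // illustrative per-item work
});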
I/O-Bound Workloads and Asynchronous I/O
• Data parallelism is suited for CPU-bound workloads
– CPUs aren't good at sitting and waiting for I/O
• Asynchronous I/O operations
– Asynchronous file read
– Asynchronous HTTP POST
• Multiple outstanding I/O operations per thread
async and await
• C# 5.0 language support for asynchronous operations
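A brief sketch of an async method, assuming a hypothetical m_http HttpClient field matching the later slides; the await releases the caller's thread while the request is outstanding:

using System.Net.Http;
using System.Threading.Tasks;

class Downloader
{
    // Hypothetical shared HttpClient, matching the m_http field used in the later examples
    readonly HttpClient m_http = new HttpClient();

    public async Task<int> DownloadLengthAsync(string url)
    {
        // No thread is blocked here while the response is in flight
        string body = await m_http.GetStringAsync(url);
        return body.Length;
    }
}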
Awaiting Tasks and IAsyncOperation
• await support
– The TPL Task class
– The IAsyncOperation Windows Runtime interface
// In System.Net.Http.HttpClient
public Task<string> GetStringAsync(string requestUri);
// In Windows.Web.Http.HttpClient
public IAsyncOperationWithProgress<string, HttpProgress> GetStringAsync(Uri uri);
Parallelizing I/O Requests
• Start a few outstanding I/O operations and then...
– Wait-All: process results when all operations are done
– Wait-Any: process each operation's results when available
Task.WhenAll
Task<string>[] tasks = new Task<string>[] {
    m_http.GetStringAsync(url1),
    m_http.GetStringAsync(url2),
    m_http.GetStringAsync(url3)
};
Task<string[]> all = Task.WhenAll(tasks);
string[] results = await all;
// Process the results
Task.WhenAny
List<Task<string>> tasks = new List<Task<string>> {
    m_http.GetStringAsync(url1),
    m_http.GetStringAsync(url2),
    m_http.GetStringAsync(url3)
};
while (tasks.Count > 0)
{
    Task<Task<string>> any = Task.WhenAny(tasks);
    Task<string> completed = await any;
    // Process the result in completed.Result
    tasks.Remove(completed);
}
Synchronization and Amdahl's Law
• When using parallelism, shared resources require synchronization
• Amdahl's Law
– If the fraction P of the application requires synchronization (i.e., runs serially), the maximum possible speedup is 1 / P
– E.g., for P = 0.5 (50%), the maximum speedup is 2x
• Scalability is critical as the number of CPUs increases
Concurrent Data Structures
• Thread-safe data structures in the TPL: ConcurrentQueue<T>, ConcurrentStack<T>, ConcurrentDictionary<K,V>, ConcurrentBag<T>
• Use them instead of a lock around the standard collections
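For example, a brief sketch of the earlier word-frequency scenario made thread-safe with ConcurrentDictionary instead of a lock around Dictionary; the input array is illustrative:

using System.Collections.Concurrent;
using System.Threading.Tasks;

string[] words = { "run", "dolphin", "run", "regard" };   // illustrative input

var frequencies = new ConcurrentDictionary<string, int>();
Parallel.ForEach(words, word =>
{
    // AddOrUpdate is atomic per key, so no explicit lock is needed
    frequencies.AddOrUpdate(word, 1, (key, count) => count + 1);
});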
Aggregation
• Collect intermediate results into thread-local structures
Parallel.For(
    from,
    to,
    () => produce thread-local state,
    (i, _, local) => do work and return new local state,
    local => combine local states into global state
);
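A concrete sketch of that shape, summing an illustrative data array with thread-local partial sums:

using System.Threading;
using System.Threading.Tasks;

int[] data = { 1, 2, 3, 4, 5 };   // illustrative input
long total = 0;

Parallel.For(
    0, data.Length,
    () => 0L,                                     // produce thread-local state
    (i, loop, local) => local + data[i],          // do work and return new local state
    local => Interlocked.Add(ref total, local));  // combine local states into global state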
Lock-Free Operations
• Atomic hardware primitives from the Interlocked class
– Interlocked.Increment, Interlocked.Decrement, Interlocked.Add, etc.
• Especially useful: Interlocked.CompareExchange
// Performs "shared *= x" atomically
static void AtomicMultiply(ref int shared, int x)
{
    int old, result;
    do
    {
        old = shared;
        result = old * x;
    }
    while (old != Interlocked.CompareExchange(
        ref shared, result, old));
}