In most cases it's very hard to predict how many resources your .NET application will need. But if you spot abnormal CPU or RAM usage, how do you answer the question "Can my application use less?"
Let's look at samples from real projects where optimal resource usage became one of the values for the product owner, and see how much lower resource consumption can get.
The workshop is relevant for .NET developers who are interested in optimizing .NET applications and for QA engineers involved in performance testing of .NET applications. It will also be interesting to everyone who has "suspected" their .NET application of non-optimal resource usage but for some reason never started an investigation.
1. Workshop "Can my .NET application use less CPU / RAM?", Yevhen Tatarynov
2. Yevhen Tatarynov
Software developer with 15 years of experience in commercial software and database development (.NET / MS SQL / Delphi)
PhD in math, specializing in the theoretical foundations of computer science and cybernetics
I was involved in projects performing complex mathematical calculations and processing large amounts of data. Currently I am a senior software developer on the infrastructure team at Covent IT.
Points of professional interest:
application performance optimization and analysis
writing C# code similar in performance to C++
advanced debugging
5. WinForms .NET Application
Reads *.bin and *.txt data files
"Processes bits": extracts and uses full data, packs it into a new format
Writes results to text and binary files
Uses .NET Framework 4.0
Runs on Windows 10 x64
It works correctly
6. Environment
Windows 10 x64 19042.804
CPU – Intel(R) Core(TM) i3-2330M CPU @ 2.20GHz
SSD – Kingston SKC400S37256G (Read 550MB/s / Write 540 MB/s)
7. Non-Functional Requirements
We need to process files > 1 GB (the size of processed files can increase significantly)
The application will run on personal laptops
Processing should be fast enough to obtain a result in an appropriate amount of time
9. What Is the Challenge?
Goals: reduce execution time; don't increase total memory allocation
Values: processing finishes in an appropriate time; don't break the application output
10. Choose Metrics
Execution time – main metric
Total memory allocation – try not to increase
Memory load peak – low priority
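For orientation, the two main metrics can be captured even without a profiler. A minimal sketch, not from the deck (the workload method is a hypothetical stand-in for the real file processing; a profiler such as dotTrace gives total allocations precisely, while Stopwatch plus GC.GetTotalMemory is only a first approximation):

```csharp
using System;
using System.Diagnostics;

class MetricsSketch
{
    // Hypothetical stand-in for the real file-processing workload.
    public static long RunWorkload()
    {
        var tmp = new byte[1024 * 1024];
        for (int i = 0; i < tmp.Length; i++) tmp[i] = (byte)(i & 0xFF);
        return tmp[tmp.Length - 1];
    }

    static void Main()
    {
        long heapBefore = GC.GetTotalMemory(forceFullCollection: true);
        var sw = Stopwatch.StartNew();
        RunWorkload();
        sw.Stop();

        Console.WriteLine($"Execution time: {sw.Elapsed}");  // main metric
        Console.WriteLine($"Managed heap delta: {GC.GetTotalMemory(false) - heapBefore} bytes");
    }
}
```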
16. Sequential Runs Impact
     | Execution time (HDD)            | Execution time (SSD)
1st  | 2,806,777 ms (46 m 47 s 777 ms) | 2,742,178 ms (45 m 42 s 178 ms)
2nd  | 2,744,085 ms (45 m 44 s 085 ms) | 2,601,642 ms (43 m 21 s 642 ms)
Diff | 62,692 ms (01 m 03 s 692 ms)    | 140,536 ms (02 m 20 s 536 ms)
%    | 2.23 %                          | 5.12 %
17. Experiment Results Breakage Factors
Drive read / write speed, cache, type (HDD, SSD)
CPU base frequency, cache, burn time, turbo boost, power supply schema
Anti-malware software
Scheduled process, system updates
System file cache, file fragmentation
Total system load (CPU, RAM, Drive)
18. How to Reduce Breakage Factors
Disable anti-malware software, system schedules, and system updates
Set the high-performance power plan if you are using a laptop
Restart the OS to end redundant processes and clear caches
Wait until it's fully loaded
Run the application on an SSD
Run the application twice on the same input data
In the analysis, we will use the results from the 2nd execution
21. Potential Improvements
.ToArray() is slow
Concat uses foreach and extra memory to iterate the input params
Each time we produce a new byte array - redundant memory traffic.
var b = new byte[0];
for (int i = 0; i < N; i++)
{
    byte[] a = new byte[GetLen(i)];
    /* fill a with values */
    b = b.Concat(a).ToArray();
}
return b;
22. Linq.Concat
static IEnumerable<TSource> Concat<TSource>(this IEnumerable<TSource> first,
    IEnumerable<TSource> second) {
    if (first == null) throw Error.ArgumentNull("first");
    if (second == null) throw Error.ArgumentNull("second");
    return ConcatIterator<TSource>(first, second);
}
static IEnumerable<TSource> ConcatIterator<TSource>(IEnumerable<TSource> first,
    IEnumerable<TSource> second) {
    foreach (TSource element in first) yield return element;
    foreach (TSource element in second) yield return element;
}
23. .ToArray()
static TSource[] ToArray<TSource>(this IEnumerable<TSource> source) {
    if (source == null) throw Error.ArgumentNull("source");
    return new Buffer<TSource>(source).ToArray();
}
internal TElement[] ToArray() {
    if (count == 0) return new TElement[0];
    if (items.Length == count) return items;
    TElement[] result = new TElement[count];
    Array.Copy(items, 0, result, 0, count);
    return result;
}
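To make the cost concrete: each `Concat(...).ToArray()` re-enumerates and re-copies everything accumulated so far, so appending N chunks of length L materializes O(N²·L) bytes in total. A small sketch, not from the deck (N and L are arbitrary illustration values):

```csharp
using System;
using System.Linq;

class ConcatCost
{
    public static long BytesMaterialized(int n, int chunk)
    {
        var b = new byte[0];
        long copied = 0;
        for (int i = 0; i < n; i++)
        {
            var a = new byte[chunk];
            b = b.Concat(a).ToArray(); // copies every previously appended byte again
            copied += b.Length;        // bytes written by this iteration's ToArray
        }
        return copied;
    }

    static void Main()
    {
        // 1,000 appends of 10 bytes: only 10,000 useful bytes,
        // but about 5 million bytes copied in total.
        Console.WriteLine(BytesMaterialized(1000, 10)); // 5005000
    }
}
```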
24. Solution
var b = new byte[maxN]; var bN = 0;
for (int i = 0; i < N; i++)
{
    var aN = GetLen(i);
    byte[] a = new byte[aN];
    /* fill a with values */
    Buffer.BlockCopy(a, 0, b, bN, aN);
    bN += aN;
}
return b;
Use one array to store the array concatenation.
To copy each array into the result, use Buffer.BlockCopy - faster than Array.Copy.
25. Buffer.BlockCopy
public static void BlockCopy(Array src, int srcOffset, Array dst, int dstOffset, int count);
Copies a specified number of bytes from a source array starting at a particular offset to a destination array starting at a particular offset.
● src - Array. The source buffer.
● srcOffset - Int32. The zero-based byte offset into src.
● dst - Array. The destination buffer.
● dstOffset - Int32. The zero-based byte offset into dst.
● count - Int32. The number of bytes to copy.
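One caveat worth a sketch: count is in bytes, not elements, which matters for anything other than byte[] (the arrays below are illustration values, not from the project):

```csharp
using System;

class BlockCopyBytes
{
    public static ushort[] CopyAll(ushort[] src)
    {
        var dst = new ushort[src.Length];
        // count is in BYTES: each ushort occupies 2 bytes.
        Buffer.BlockCopy(src, 0, dst, 0, src.Length * sizeof(ushort));
        return dst;
    }

    static void Main()
    {
        var dst = CopyAll(new ushort[] { 1, 2, 3, 4 });
        Console.WriteLine(string.Join(",", dst)); // 1,2,3,4
    }
}
```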
26. Comparison
var b = new byte[0];
for (int i = 0; i < N; i++)
{
    byte[] a = new byte[GetLen(i)];
    /* fill a with values */
    b = b.Concat(a).ToArray();
}
return b;
var b = new byte[maxN]; var bN = 0;
for (int i = 0; i < N; i++)
{
    var aN = GetLen(i);
    byte[] a = new byte[aN];
    /* fill a with values */
    Buffer.BlockCopy(a, 0, b, bN, aN);
    bN += aN;
}
return b;
27. #1 Performance Summary
     | Execution time    | Memory (MB)
Old  | 43 m 21 s 642 ms  | 374,011
New  | 4 m 56 s 414 ms   | 120,333
Diff | -38 m 25 s 228 ms | -253,678
%    | 88.61 %           | 68.83 %
     | x 9.25            | x 3.11
28. #2 Is GetBits an Issue?
/* Returns the last N (N > 0) bits of X in a byte array */
byte[] GetBits(uint x, byte n)
29. Potential Improvements
byte[] GetBits(uint x, byte n)
{
    var a = new byte[n];
    var D = (uint)Math.Pow(2, n - 1);
    for (var i = 0; i < n; i++)
    {
        a[i] = (byte)(x / D);
        x -= (uint)(a[i] * D);
        D /= 2;
    }
    return a;
}
Math.Pow uses floating-point operations to calculate its result
Converts the int param to double
Converts the double result to uint
Uses * and / operations
30. Optimizing Software in C++ by Agner Fog
Execution time (clock cycles)
Operation          | Min | Max
floating point add |  3  |  6
floating point mul |  4  |  8
floating point div | 14  | 45
int add            |  1  |  1
int sub            |  1  |  1
int shift          |  1  |  1
int mul            |  3  |  4
int div            | 40  | 80
Conversion of signed integers to floating point is fast only when the SSE2 instruction set is enabled.
Conversion of unsigned integers to floating point is fast only when the AVX512 instruction set is enabled.
A conversion from floating point to integer without SSE2 typically takes 40 clock cycles.
31. Solution
byte[] GetBits(uint x, byte n)
{
    var a = new byte[n];
    var i = n - 1;
    while (x != 0)
    {
        a[i] = (byte)(x & 1);
        x = x >> 1;
        i--;
    }
    return a;
}
Use the binary representation of uint numbers
Don't use Math.Pow
Use only integer operands to avoid conversions
Use the bitwise operations >> and &
32. Comparison
byte[] GetBits(uint x, byte n)
{
    var a = new byte[n];
    var D = (uint)Math.Pow(2, n - 1);
    for (var i = 0; i < n; i++)
    {
        a[i] = (byte)(x / D);
        x -= (uint)(a[i] * D);
        D /= 2;
    }
    return a;
}
byte[] GetBits(uint x, byte n)
{
    var a = new byte[n];
    var i = n - 1;
    while (x != 0)
    {
        a[i] = (byte)(x & 1);
        x = x >> 1;
        i--;
    }
    return a;
}
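A quick equivalence sketch for both versions, not from the deck (note the old version needs an explicit (uint) cast to compile, since byte * uint promotes to long):

```csharp
using System;

class GetBitsCheck
{
    // Original division-based version.
    public static byte[] GetBitsOld(uint x, byte n)
    {
        var a = new byte[n];
        var D = (uint)Math.Pow(2, n - 1);
        for (var i = 0; i < n; i++)
        {
            a[i] = (byte)(x / D);
            x -= (uint)(a[i] * D);
            D /= 2;
        }
        return a;
    }

    // Bitwise version from the solution slide.
    public static byte[] GetBitsNew(uint x, byte n)
    {
        var a = new byte[n];
        var i = n - 1;
        while (x != 0)
        {
            a[i] = (byte)(x & 1);
            x >>= 1;
            i--;
        }
        return a;
    }

    static void Main()
    {
        foreach (uint x in new uint[] { 0, 1, 5, 46, 255 })
        {
            var o = string.Join("", GetBitsOld(x, 8));
            var n = string.Join("", GetBitsNew(x, 8));
            if (o != n) throw new Exception($"mismatch for {x}: {o} vs {n}");
        }
        Console.WriteLine("ok");
    }
}
```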
33. #2 Performance Summary
     | Execution time   | Memory (MB)
Old  | 4 m 56 s 414 ms  | 120,333
New  | 3 m 53 s 422 ms  | 120,333
Diff | -1 m 02 s 992 ms | 0
%    | 21.25 %          | 0.00 %
     | x 1.27           | x 1.00
34. #3 Is AnyPathHasIllegalCharacters an Issue?
GetFileName indirectly calls AnyPathHasIllegalCharacters.
GetFileName is used in functionality that sorts file names (in the format Name_ddddd.txt) located in a specific folder. Files have to be sorted by the ddddd part of the name. The number of files is > 13,000 (Name_1.txt .. Name_13000.txt).
35. Potential Improvements
Bubble sort, O(N²)
for (var i = 0; i < FileNameArray.Length - 1; i++)
    for (var j = i + 1; j < FileNameArray.Length; j++)
    {
        var S1 = Path.GetFileName(FileNameArray[i]);
        var i_N = int.Parse(S1.Substring(4, S1.LastIndexOf('.') - 4));
        S1 = Path.GetFileName(FileNameArray[j]);
        var j_N = int.Parse(S1.Substring(4, S1.LastIndexOf('.') - 4));
        if (i_N > j_N)
        {
            S1 = FileNameArray[i];
            FileNameArray[i] = FileNameArray[j];
            FileNameArray[j] = S1;
        }
    }
Redundant GetFileName calls
Redundant Parse calls
Redundant Substring calls
Redundant LastIndexOf calls
36. Solution
Array.Sort(FileNameArray, new NumericComparer());
public class NumericComparer : IComparer<string>
{
    /* */
    public int Compare(string x, string y)
    {
        var result = x.Length.CompareTo(y.Length);
        if (result == 0)
            return x.CompareTo(y);
        return result;
    }
}
Quicksort, O(N log N)
Don't call GetFileName
Don't call Substring
Don't call Parse
Don't call LastIndexOf
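The comparer works because all names share the same "Name_" prefix and ".txt" suffix, so a longer name always means a bigger number; for equal lengths an ordinary string comparison orders the digits. A sketch with hypothetical file names (not from the project):

```csharp
using System;

class NumericSortDemo
{
    // Mirrors the slide's comparer: length first, then string comparison.
    // Valid only while every name shares the same prefix and extension.
    public static int Compare(string x, string y)
    {
        var result = x.Length.CompareTo(y.Length);
        return result == 0 ? x.CompareTo(y) : result;
    }

    static void Main()
    {
        var files = new[] { "Name_10.txt", "Name_2.txt", "Name_13000.txt", "Name_1.txt" };
        Array.Sort(files, Compare);
        Console.WriteLine(string.Join(" ", files));
        // Name_1.txt Name_2.txt Name_10.txt Name_13000.txt
    }
}
```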
37. Comparison
for (var i = 0; i < FileNameArray.Length - 1; i++)
    for (var j = i + 1; j < FileNameArray.Length; j++)
    {
        var S1 = Path.GetFileName(FileNameArray[i]);
        var i_N = int.Parse(S1.Substring(4, S1.LastIndexOf('.') - 4));
        S1 = Path.GetFileName(FileNameArray[j]);
        var j_N = int.Parse(S1.Substring(4, S1.LastIndexOf('.') - 4));
        if (i_N > j_N)
        {
            S1 = FileNameArray[i];
            FileNameArray[i] = FileNameArray[j];
            FileNameArray[j] = S1;
        }
    }
Array.Sort(FileNameArray, new NumericComparer());
public class NumericComparer : IComparer<string>
{
    /* */
    public int Compare(string x, string y)
    {
        var result = x.Length.CompareTo(y.Length);
        if (result == 0)
            return x.CompareTo(y);
        return result;
    }
}
38. #3 Performance Summary
     | Execution time   | Memory (MB)
Old  | 3 m 53 s 422 ms  | 120,333
New  | 2 m 11 s 091 ms  | 103,775
Diff | -1 m 42 s 331 ms | -16,558
%    | 43.84 %          | 13.76 %
     | x 1.78           | x 1.16
39. #4 Is FileStream.get_Length an Issue?
In both cases the binary file is read, and the number of binary chains is calculated from the read data.
40. Potential Improvements
using(var br = new BinaryReader(…))
{
while (br.BaseStream.Position <= br.BaseStream.Length - 4)
{
counter++;
br.ReadUInt32();
br.ReadUInt32();
var n = br.ReadUInt32();
for (int i = 0; i < n; i++) br.ReadUInt32();
}
}
Redundant Length calls
Redundant subtraction
Redundant ReadUInt32 calls
41. Solution
using(var br = new BinaryReader(…))
{
var length = br.BaseStream.Length - 4;
while (br.BaseStream.Position <= length)
{
counter++;
br.ReadUInt64();
var n = br.ReadUInt32();
for (int i = 0; i < n; i++) br.ReadUInt32();
}
}
Store Length in a local variable
Call ReadUInt64 instead of ReadUInt32
42. Comparison
using(var br = new BinaryReader(…))
{
while (br.BaseStream.Position <= br.BaseStream.Length - 4)
{
counter++;
br.ReadUInt32();
br.ReadUInt32();
var n = br.ReadUInt32();
for (int i = 0; i < n; i++) br.ReadUInt32();
}
}
using(var br = new BinaryReader(…))
{
var length = br.BaseStream.Length - 4;
while (br.BaseStream.Position <= length)
{
counter++;
br.ReadUInt64();
var n = br.ReadUInt32();
for (int i = 0; i < n; i++) br.ReadUInt32();
}
}
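The improved loop can be exercised against an in-memory stream to confirm that the hoisted length and the paired-UInt32-as-UInt64 read walk the same records. The record layout below is a guess at the format described earlier (two header UInt32s, a count n, then n UInt32 values):

```csharp
using System;
using System.IO;

class ReadLoopDemo
{
    public static int CountRecords()
    {
        // Build a tiny in-memory "file": 2 records of (uint, uint, n, then n x uint).
        var ms = new MemoryStream();
        var bw = new BinaryWriter(ms);
        for (int r = 0; r < 2; r++)
        {
            bw.Write(1u); bw.Write(2u);            // two header values, read back as one UInt64
            bw.Write(3u);                          // n = 3
            for (uint i = 0; i < 3; i++) bw.Write(i);
        }
        ms.Position = 0;

        var counter = 0;
        using (var br = new BinaryReader(ms))
        {
            var length = br.BaseStream.Length - 4; // hoisted: one Length call, one subtraction
            while (br.BaseStream.Position <= length)
            {
                counter++;
                br.ReadUInt64();                   // one read instead of two ReadUInt32 calls
                var n = br.ReadUInt32();
                for (int i = 0; i < n; i++) br.ReadUInt32();
            }
        }
        return counter;
    }

    static void Main() => Console.WriteLine(CountRecords()); // 2
}
```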
43. #4 Performance Summary
     | Execution time   | Memory (MB)
Old  | 2 m 11 s 091 ms  | 103,775
New  | 1 m 39 s 305 ms  | 103,775
Diff | -31 s 786 ms     | 0
%    | 24.25 %          | 0.00 %
     | x 1.32           | x 1.00
44. #5 Is Math.Log an Issue?
/* Calculate the number of bits needed to store number X. For 0 it should return 1; for 00101110b it should return 6. */
byte NumberOfBits(uint X)
45. Potential Improvements
byte NumberOfBits(uint X)
{
    if (X > 0)
        return (byte)Math.Ceiling(Math.Log(X + 1, 2));
    return 1;
}
Math.Log uses floating-point operations to calculate the logarithm
Converts the uint param to double
Converts the double result to byte
46. Solution
byte NumberOfBits(uint X)
{
    var x = X;
    byte counter = 0;
    do {
        counter++;
        x = x >> 1;
    }
    while (x != 0);
    return counter;
}
Just count the bits in X with bitwise operations to avoid floating-point arithmetic
Avoid type conversions
Avoid using Math.Log
47. Comparison
byte NumberOfBits(uint X)
{
    if (X > 0)
        return (byte)Math.Ceiling(Math.Log(X + 1, 2));
    return 1;
}
byte NumberOfBits(uint X)
{
    var x = X;
    byte counter = 0;
    do {
        counter++;
        x = x >> 1;
    }
    while (x != 0);
    return counter;
}
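A sanity sketch comparing the two versions over a few values (not from the deck). As an aside, the floating-point version is fragile near exact powers of two, where Math.Log can land a hair above an integer; the bitwise version has no such edge cases:

```csharp
using System;

class NumberOfBitsCheck
{
    // Floating-point version from the slide.
    public static byte OldBits(uint X) =>
        X > 0 ? (byte)Math.Ceiling(Math.Log(X + 1, 2)) : (byte)1;

    // Bitwise version from the solution slide.
    public static byte NewBits(uint X)
    {
        var x = X;
        byte counter = 0;
        do { counter++; x >>= 1; } while (x != 0);
        return counter;
    }

    static void Main()
    {
        foreach (uint x in new uint[] { 0, 1, 2, 5, 46, 100, 256, 1000 })
            if (OldBits(x) != NewBits(x))
                throw new Exception($"mismatch at {x}");
        Console.WriteLine("ok");
    }
}
```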
48. #5 Performance Summary
     | Execution time   | Memory (MB)
Old  | 1 m 39 s 305 ms  | 103,775
New  | 1 m 10 s 167 ms  | 103,775
Diff | -29 s 138 ms     | 0
%    | 29.34 %          | 0.00 %
     | x 1.42           | x 1.00
53. Performance Summary
     | Execution time    | Memory (MB)
Old  | 43 m 21 s 642 ms  | 374,011
New  | 1 m 10 s 167 ms   | 103,775
Diff | -42 m 11 s 575 ms | -270,236
%    | 97.30 %           | 72.25 %
     | x 37.08           | x 3.60
55. #6 PackBits Is a Heavy Function. Can It Be Faster?
/* Calculate statistics, then make a byte array of bits from ushort array blocks and pack it into a new format. */
byte[] PackBits(ushort[] blocks,
                byte numberOfBits,
                bool CalcStatistics)
56. Potential Improvement #1
byte[] PackBits(ushort[] blocks, byte numberOfbits, bool calcStatistics)
var stat = 0; var packedBits = new byte[maxN];
foreach (var block in blocks)
{
    byte bits = NumberOfBits(block);
    if (bits <= numberOfbits) {
        /*branch1*/ }
    else {
        /*branch2*/
    }
}
return b;
/*branch1*/
/*GetBits returns a byte array with numberOfbits bits for block*/
if (calcStatistics) stat += numberOfbits - 1;
Buffer.BlockCopy(GetBits(block, numberOfbits), 0, packedBits, n, numberOfbits);
n += numberOfbits;
index += numberOfbits;
/*branch2*/
if (calcStatistics) stat += 5 * numberOfbits;
n += tmp;
index += tmp;
Buffer.BlockCopy(GetBits(block, bits), 0, packedBits, n, bits);
n += bits;
index += bits;
57. Unused Calculation Results
/*branch 1*/
Buffer.BlockCopy(GetBits(block, numberOfbits), 0, b, n, numberOfbits);
n = n + numberOfbits;
index += numberOfbits;
/*branch 2*/
n = n + tmp;
index += tmp;
Buffer.BlockCopy(GetBits(block, bits), 0, b, n, bits);
n = n + bits;
index += bits;
The result of the index calculation is not used - just remove it.
58. Potential Improvement #2
byte[] PackBits(ushort[] blocks, byte numberOfbits, bool calcStatistics)
var stat = 0; var packedBits = new byte[maxN];
foreach (var block in blocks)
{
    byte bits = NumberOfBits(block);
    if (bits <= numberOfbits) {
        /*branch1*/ }
    else {
        /*branch2*/
    }
}
return b;
/*branch1*/
/*GetBits returns a byte array with numberOfbits bits for block*/
if (calcStatistics) stat += numberOfbits - 1;
Buffer.BlockCopy(GetBits(block, numberOfbits), 0, packedBits, n, numberOfbits);
n += numberOfbits;
/*branch2*/
if (calcStatistics) stat += 5 * numberOfbits;
n += tmp;
Buffer.BlockCopy(GetBits(block, bits), 0, packedBits, n, bits);
n += bits;
59. Avoid Redundant Copy
Write bits inside GetBits into an existing array instead of producing a new one:
void GetBits(uint X, byte N, byte[] dest, int offset)
{
    var i = offset + N - 1;
    while (X != 0)
    {
        dest[i] = (byte)(X & 1);
        X = X >> 1;
        i--;
    }
}
Avoid using Buffer.BlockCopy
60. Potential Improvement #3
byte[] PackBits(ushort[] blocks, byte numberOfbits, bool calcStatistics)
var stat = 0; var packedBits = new byte[maxN];
foreach (var block in blocks)
{
    byte bits = NumberOfBits(block);
    if (bits <= numberOfbits) {
        /*branch1*/ }
    else {
        /*branch2*/
    }
}
return b;
/*branch1*/
/*GetBits now fills numberOfbits bits for block directly into the destination*/
if (calcStatistics) stat += numberOfbits - 1;
GetBits(block, numberOfbits, packedBits, n);
n += numberOfbits;
/*branch2*/
if (calcStatistics) stat += 5 * numberOfbits;
n += tmp;
GetBits(block, bits, packedBits, n);
n += bits;
61. Foreach vs. For to Iterate an Array
The JIT compiler is smart enough to turn a foreach over an array into the same code as a for statement at the asm level. But we prefer the simple for.
C.M(UInt16[]) /*foreach statement*/
L0000: mov rax, rdx
L0002: xor eax, eax
L0005: mov ecx, [rdx+0x8]
L0007: test ecx, ecx
L0009: jle L0018
L000c: movsxd r8, eax
L0012: movzx r8d, word [rdx+r8*2+0x10]
L0014: inc eax
L0016: cmp ecx, eax
L0018: jg L000c
L0020: ret
C.M(UInt16[]) /*for statement*/
L0000: xor eax, eax
L0002: mov ecx, [rdx+0x8]
L0005: test ecx, ecx
L0007: jle L0018
L0009: movsxd r8, eax
L000c: movzx r8d, word [rdx+r8*2+0x10]
L0012: inc eax
L0014: cmp ecx, eax
L0016: jg L0009
L0018: ret
62. Potential Improvement #4
byte[] PackBits(ushort[] blocks, byte numberOfbits, bool calcStatistics)
var stat = 0; var packedBits = new byte[maxN];
for (int i = 0; i < blocks.Length; i++)
{
    var block = blocks[i];
    byte bits = NumberOfBits(block);
    if (bits <= numberOfbits) {
        /*branch1*/ }
    else {
        /*branch2*/
    }
}
return b;
/*branch1*/
if (calcStatistics) stat += numberOfbits - 1;
GetBits(block, numberOfbits, packedBits, n);
n += numberOfbits;
/*branch2*/
if (calcStatistics) stat += 5 * numberOfbits;
n += tmp;
GetBits(block, bits, packedBits, n);
n += bits;
63. maxN Is Known
byte[] PackBits(ushort[] blocks, byte numberOfbits, bool calcStatistics, int maxN);
int PackedBitsLength(ushort[] blocks, byte numberOfbits, bool calcStatistics);
The upper bound maxN is calculated before PackBits is called, so we can pass it to the function as a parameter.
Often PackBits is called just to get the resulting array length (the n variable inside PackBits), so we can split it into two functions.
64. Potential Improvement #5
byte[] PackBits(ushort[] blocks, byte numberOfbits, bool calcStatistics, int maxN)
var stat = 0; var packedBits = new byte[maxN];
foreach (var block in blocks)
{
    byte bits = NumberOfBits(block);
    if (bits <= numberOfbits) {
        /*branch1*/ }
    else {
        /*branch2*/
    }
}
return b;
/*branch1*/
if (calcStatistics) stat += numberOfbits - 1;
GetBits(block, numberOfbits, packedBits, n);
n += numberOfbits;
/*branch2*/
if (calcStatistics) stat += 5 * numberOfbits;
n += tmp;
GetBits(block, bits, packedBits, n);
n += bits;
65. We Don't Need an If Statement
var stat = 0; var packedBits = new byte[maxN];
foreach (var block in blocks)
{
    byte bits = NumberOfBits(block);
    if (bits <= numberOfbits) { /*branch 1*/
        if (calcStatistics) stat += numberOfbits - 1;
    } else { /*branch 2*/
        if (calcStatistics) stat += 5 * numberOfbits;
    }
}
return b;
Statistics are calculated only when we need to produce an array, so we can do the calculation without the condition and avoid stressing the branch predictor. In PackedBitsLength, we can remove these rows.
66. PackBits
byte[] PackBits(ushort[] blocks, byte numberOfbits, bool calcStatistics, int maxN)
var stat = 0; var packedBits = new byte[maxN];
for (int i = 0; i < blocks.Length; i++)
{
    var block = blocks[i];
    byte bits = NumberOfBits(block);
    if (bits <= numberOfbits) {
        /*branch1*/ }
    else {
        /*branch2*/
    }
}
return b;
/*branch1*/
if (calcStatistics) stat += numberOfbits - 1;
GetBits(block, numberOfbits, packedBits, n);
n += numberOfbits;
/*branch2*/
if (calcStatistics) stat += 5 * numberOfbits;
n += tmp;
GetBits(block, bits, packedBits, n);
n += bits;
67. #6 Performance Summary
     | Execution time   | Memory (MB)
Old  | 1 m 10 s 167 ms  | 103,775
New  | 50 s 193 ms      | 47,239
Diff | -19 s 974 ms     | -56,536
%    | 28.47 %          | 54.48 %
     | x 1.40           | x 2.20
68. #7 PackBits2 Is a Heavy Function. Can It Be Faster?
/* Calculate statistics, then make a byte array of bits from a uint list of blocks and pack it into a new format. */
byte[] PackBits2(List<uint> blocks,
                 byte numberOfBits,
                 bool CalcStatistics)
69. #7 Performance Summary
     | Execution time   | Memory (MB)
Old  | 50 s 193 ms      | 47,239
New  | 36 s 631 ms      | 10,510
Diff | -13 s 562 ms     | -36,729
%    | 27.02 %          | 77.75 %
     | x 1.37           | x 4.49
70. #8 WriteBits - Can It Be Faster?
/* Write the given number of bits from a uint to the binary file and store the bits which do not fit into 8 bits in a static array */
void WriteBits(BinaryWriter bw, uint x, byte numberOfBits)
71. Potential Improvement #1
void WriteBits(BinaryWriter bw, uint x, byte numberOfBits)
byte[] bits = GetBits(x, numberOfBits);
byte[] tmpB = new byte[buffB.Length + bits.Length];
Buffer.BlockCopy(buffB, 0, tmpB, 0, buffB.Length);
Buffer.BlockCopy(bits, 0, tmpB, buffB.Length, bits.Length);
buffB = tmpB; // static byte array to store not-yet-written bits
for (int i = 0; i < buffB.Length / 8; i++)
{
    int j = i * 8;
    bw.Write((byte)(buffB[j]*128 + buffB[j+1]*64 + buffB[j+2]*32 + buffB[j+3]*16
                  + buffB[j+4]*8 + buffB[j+5]*4 + buffB[j+6]*2 + buffB[j+7]));
}
int L = (buffB.Length / 8) * 8;
for (int i = L; i < buffB.Length; i++) buffB[i - L] = buffB[i];
Array.Resize(ref buffB, buffB.Length - L);
72. Buffer.BlockCopy Can Be Inefficient
The numberOfBits parameter can't be more than 32 because we operate on Int32 and UInt32 numbers, and the bit tail can't be more than 8 bits. So we never need to store more than 40 bits (in a 40-byte array). For such small arrays, copying with Buffer.BlockCopy is not so efficient - replace it with a simple element-by-element copy in a loop.
73. Potential Improvement #2
void WriteBits(BinaryWriter bw, uint x, byte numberOfBits)
byte[] bits = GetBits(x, numberOfBits);
byte[] tmpB = new byte[buffB.Length + bits.Length];
for (int i = 0; i < buffB.Length; i++) tmpB[i] = buffB[i];
for (int i = 0; i < bits.Length; i++) tmpB[i + buffB.Length] = bits[i];
buffB = tmpB; // static byte array to store not-yet-written bits
for (int i = 0; i < buffB.Length / 8; i++)
{
    int j = i * 8;
    bw.Write((byte)(buffB[j]*128 + buffB[j+1]*64 + buffB[j+2]*32 + buffB[j+3]*16
                  + buffB[j+4]*8 + buffB[j+5]*4 + buffB[j+6]*2 + buffB[j+7]));
}
int L = (buffB.Length / 8) * 8;
for (int i = L; i < buffB.Length; i++) buffB[i - L] = buffB[i];
Array.Resize(ref buffB, buffB.Length - L);
75. Potential Improvement #3
void WriteBits(BinaryWriter bw, uint x, byte numberOfBits)
byte[] bits = GetBits(x, numberOfBits);
byte[] tmpB = new byte[buffB.Length + bits.Length];
for (int i = 0; i < buffB.Length; i++) tmpB[i] = buffB[i];
for (int i = 0; i < bits.Length; i++) tmpB[i + buffB.Length] = bits[i];
buffB = tmpB; // static byte array to store not-yet-written bits
for (int i = 0; i < buffB.Length >> 3; i++)
{
    int j = i << 3;
    bw.Write((byte)((buffB[j]<<7) + (buffB[j+1]<<6) + (buffB[j+2]<<5) + (buffB[j+3]<<4)
                  + (buffB[j+4]<<3) + (buffB[j+5]<<2) + (buffB[j+6]<<1) + buffB[j+7]));
}
int L = (buffB.Length >> 3) << 3;
for (int i = L; i < buffB.Length; i++) buffB[i - L] = buffB[i];
Array.Resize(ref buffB, buffB.Length - L);
76. Buffer Array to Store the Bits Tail
We don't need to create a new array each time; we can reuse the existing buffer. Each time we store the current bits-tail length (BitsBuffLength), so we can also avoid Array.Resize().
77. WriteBits
void WriteBits(BinaryWriter bw, uint x, byte numberOfBits)
byte[] bits = GetBits(x, numberOfBits);
byte[] tmpB = new byte[BitsBuffLength + bits.Length];
for (int i = 0; i < BitsBuffLength; i++) tmpB[i] = buffB[i];
for (int i = 0; i < bits.Length; i++) tmpB[i + BitsBuffLength] = bits[i];
buffB = tmpB; // static byte array to store not-yet-written bits
for (int i = 0; i < buffB.Length >> 3; i++)
{
    int j = i << 3;
    bw.Write((byte)((buffB[j]<<7) + (buffB[j+1]<<6) + (buffB[j+2]<<5) + (buffB[j+3]<<4)
                  + (buffB[j+4]<<3) + (buffB[j+5]<<2) + (buffB[j+6]<<1) + buffB[j+7]));
}
int L = (buffB.Length >> 3) << 3;
for (int i = L; i < buffB.Length; i++) { buffB[i - L] = buffB[i]; }
BitsBuffLength = tmpB.Length - L;
78. #8 Performance Summary
     | Execution time   | Memory (MB)
Old  | 36 s 631 ms      | 10,510
New  | 32 s 551 ms      | 9,754
Diff | -4 s 080 ms      | -756
%    | 11.14 %          | 7.19 %
     | x 1.13           | x 1.08
79. #9 ScaleGrad - Can It Be Faster?
/* Return the index of number x on an ordered scale */
int ScaleGrad(int x)
80. Potential Improvements
Avoid comparing int and double values
Scale is a sorted array, so we can use binary search; it's more efficient and less dependent on the input data
static double[] Scale;
…
/* 600+ lines of code */
…
int ScaleGrad(int x)
{
    int i;
    for (i = 0; i < Scale.Length && Scale[i] <= x; i++) ;
    return i - 1;
}
81. Comparison
static double[] Scale;
/* 600+ lines of code */
int ScaleGrad(int x)
{
    int i;
    for (i = 0; i < Scale.Length && Scale[i] <= x; i++) ;
    return i - 1;
}
static int[] Scale;
/* 600+ lines of code */
int ScaleGrad(int x)
{
    var left = 1; var right = Scale.Length - 1;
    do {
        var mid = left + ((right - left) >> 1); // (left + right) / 2 without overflow
        if (x < Scale[mid]) right = mid - 1;
        else left = mid + 1;
    } while (right >= left);
    return right;
}
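A sketch checking that the linear scan and a binary search agree (the Scale values are hypothetical). Here the binary search returns right, which ends on the index of the last scale value <= x, matching what the linear scan computes:

```csharp
using System;

class ScaleGradCheck
{
    static readonly int[] Scale = { 0, 10, 20, 30, 40 };  // hypothetical scale values

    // Linear scan: index of the last scale value <= x.
    public static int Linear(int x)
    {
        int i;
        for (i = 0; i < Scale.Length && Scale[i] <= x; i++) ;
        return i - 1;
    }

    // Binary search over the same scale.
    public static int Binary(int x)
    {
        var left = 1; var right = Scale.Length - 1;
        do
        {
            var mid = left + ((right - left) >> 1);
            if (x < Scale[mid]) right = mid - 1;
            else left = mid + 1;
        } while (right >= left);
        return right;  // last index with Scale[right] <= x
    }

    static void Main()
    {
        for (int x = 0; x <= 45; x++)
            if (Linear(x) != Binary(x))
                throw new Exception($"mismatch at {x}");
        Console.WriteLine("ok");
    }
}
```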
82. #9 Performance Summary
     | Execution time   | Memory (MB)
Old  | 32 s 551 ms      | 9,754
New  | 28 s 844 ms      | 9,754
Diff | -3 s 707 ms      | 0
%    | 11.39 %          | 0.00 %
     | x 1.13           | x 1.00
83. #10 ReadData - Can It Be Faster?
ReadInt32( ): used in the main processing function, so we shouldn't touch it right now.
ReadUInt32( ): in both cases the binary file is read, and the number of binary chains is calculated from the read data.
84. Potential Improvements
Don't allocate and collect unused data in a list
Read UInt64 values instead of UInt32. It cuts the loop length in half, but we have to handle an odd loop length
using (var br = new BinaryReader(…))
var list = new List<int>();
var length = br.BaseStream.Length - 4;
while (br.BaseStream.Position <= length) {
    br.ReadUInt64();
    var n = br.ReadUInt32();
    list.Add((int)n);
    for (int i = 0; i < n; i++)
        br.ReadUInt32();
}
85. Comparison
using (var br = new BinaryReader(…))
var list = new List<int>();
var length = br.BaseStream.Length - 4;
while (br.BaseStream.Position <= length) {
    br.ReadUInt64();
    var n = br.ReadUInt32();
    list.Add((int)n);
    for (int i = 0; i < n; i++) br.ReadUInt32();
}
using (var br = new BinaryReader(…))
var length = br.BaseStream.Length - 4;
while (br.BaseStream.Position <= length) {
    br.ReadUInt64();
    var n = br.ReadUInt32();
    for (int i = 0; i < n >> 1; i++)
        br.ReadUInt64();
    if ((n & 1) == 1) br.ReadUInt32();
}
86. #10 Performance Summary
     | Execution time   | Memory (MB)
Old  | 28 s 844 ms      | 9,754
New  | 24 s 048 ms      | 9,690
Diff | -4 s 796 ms      | -64
%    | 16.63 %          | 0.66 %
     | x 1.20           | x 1.01
87. #11 List Usage - Can It Be Faster?
There is a lot of generic list usage (creating new list instances, adding elements, iterating).
It can be a convention or a common approach, but list usage is costly - even when you just get an element by index:
var list = new List<int>();
var item = list[i];
88. Potential Improvement #1
List<Point> Foo(List<Point> Points)
{
var R = new List<Point>();
foreach (var P in Points)
R.Add(new Point(P.X, ProcessPoint(P.Y)));
return R;
}
89. Favor Arrays Over Lists
Use arrays instead of lists if possible. It allows simple array indexing as opposed to the Add or [ ] list methods.
Use the capacity in the list constructor if it is known; it allows you to add elements without an internal array resize.
public T this[int index] {
    get {
        if ((uint)index >= (uint)_size)
            ThrowHelper.ThrowArgumentOutOfRangeException();
        Contract.EndContractBlock();
        return _items[index];
    }
    set {
        if ((uint)index >= (uint)_size)
            ThrowHelper.ThrowArgumentOutOfRangeException();
        Contract.EndContractBlock();
        _items[index] = value;
        _version++;
    }
}
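The capacity hint can be sketched like this (sizes are illustrative, not from the project):

```csharp
using System;
using System.Collections.Generic;

class CapacityHint
{
    public static (int hinted, int grown) FillBoth(int n)
    {
        var hinted = new List<int>(n);   // backing array allocated once up front
        var grown = new List<int>();     // starts empty; capacity doubles 4, 8, 16, ...
        for (int i = 0; i < n; i++) { hinted.Add(i); grown.Add(i); }
        return (hinted.Capacity, grown.Capacity);
    }

    static void Main()
    {
        var (hinted, grown) = FillBoth(1000);
        Console.WriteLine(hinted);  // 1000: no internal Array.Resize happened
        Console.WriteLine(grown);   // 1024: reached through ~9 reallocations and copies
    }
}
```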
90. Potential Improvement #2
List<Point> Foo(List<Point> Points)
{
var R = new List<Point>();
foreach (var P in Points)
R.Add(new Point(P.X, ProcessPoint(P.Y)));
return R;
}
91. Favor Arrays Over Lists
Use a simple for instead of foreach to avoid:
- the virtual GetEnumerator() call, which produces boxing when the list is accessed through IEnumerable<T>
- instance method calls to get_Current() and the somewhat complex MoveNext()
…
callvirt instance valuetype GetEnumerator()
…
// loop start
…
call instance !0 valuetype get_Current()
…
call instance bool valuetype MoveNext()
…
// end loop
92. Comparison
List<Point> Foo(List<Point> Points)
{
    var R = new List<Point>();
    foreach (var P in Points)
        R.Add(new Point(P.X, ProcessPoint(P.Y)));
    return R;
}
Point[] Foo(List<Point> Points)
{
    var n = Points.Count;
    var R = new Point[n];
    for (var i = 0; i < n; i++)
        R[i] = new Point(Points[i].X, ProcessPoint(Points[i].Y));
    return R;
}
93. #11 Performance Summary
     | Execution time   | Memory (MB)
Old  | 24 s 048 ms      | 9,690
New  | 18 s 333 ms      | 7,842
Diff | -5 s 715 ms      | -1,848
%    | 23.76 %          | 19.07 %
     | x 1.31           | x 1.24
94. #12 WriteBits - Can It Be Faster?
WriteBits is near the top again, and it has 64.56 % own time, so let’s try to optimize it.
95. Potential Improvement #3
void WriteBits(BinaryWriter bw, uint x, byte numberOfbits)
byte[] bits = GetBits(x, numberOfbits);
byte[] tmpB = new byte[buffB.Length + bits.Length];
for (int i = 0; i < BitsBuffLength; i++) tmpB[i] = buffB[i];
for (int i = 0; i < bits.Length; i++) tmpB[i + BitsBuffLength] = bits[i];
buffB = tmpB; // static byte array to store not-yet-written bits
for (int i = 0; i < buffB.Length >> 3; i++) {
    int j = i << 3;
    bw.Write((byte)((buffB[j]<<7) + (buffB[j+1]<<6) + (buffB[j+2]<<5) + (buffB[j+3]<<4)
                  + (buffB[j+4]<<3) + (buffB[j+5]<<2) + (buffB[j+6]<<1) + buffB[j+7]));
}
int L = (buffB.Length >> 3) << 3;
for (int i = L; i < buffB.Length; i++) { buffB[i - L] = buffB[i]; }
BitsBuffLength = tmpB.Length - L;
96. Don't Forget to Use New Features
We forgot that GetBits can now fill an array at an offset. Using it here avoids redundant array copying.
byte[] bits = GetBits(x, numberOfBits);
byte[] tmpB = new byte[buffB.Length + bits.Length];
for (int i = 0; i < BitsBuffLength; i++)
    tmpB[i] = buffB[i];
for (int i = 0; i < bits.Length; i++)
    tmpB[i + BitsBuffLength] = bits[i];
byte[] tmpB = new byte[BitsBuffLength + numberOfBits];
GetBits(x, numberOfBits, tmpB, BitsBuffLength);
for (int i = 0; i < BitsBuffLength; i++)
    tmpB[i] = buffB[i];
97. Potential Improvement #2
void WriteBits(BinaryWriter bw, uint x, byte numberOfbits)
byte[] bits = GetBits(x, numberOfbits);
byte[] tmpB = new byte[buffB.Length + bits.Length];
for (int i = 0; i < BitsBuffLength; i++) tmpB[i] = buffB[i];
for (int i = 0; i < bits.Length; i++) tmpB[i + BitsBuffLength] = bits[i];
buffB = tmpB; // static byte array to store not-yet-written bits
for (int i = 0; i < buffB.Length >> 3; i++) {
    int j = i << 3;
    bw.Write((byte)((buffB[j]<<7) + (buffB[j+1]<<6) + (buffB[j+2]<<5) + (buffB[j+3]<<4)
                  + (buffB[j+4]<<3) + (buffB[j+5]<<2) + (buffB[j+6]<<1) + buffB[j+7]));
}
int L = (buffB.Length >> 3) << 3;
for (int i = L; i < buffB.Length; i++) { buffB[i - L] = buffB[i]; }
BitsBuffLength = tmpB.Length - L;
98. Even Small Operations Can Have Significant Impacts
On each iteration we calculate i << 3 as the offset into buffB; instead, we can advance the counter in steps of 8 and index the 8 needed elements directly.
It's not critical, but we also change the + operation to |.
99. Comparison
for (int i = 0; i < buffB.Length >> 3; i++) {
    int j = i << 3;
    bw.Write(
        (byte)((buffB[j]<<7) + (buffB[j+1]<<6)
             + (buffB[j+2]<<5) + (buffB[j+3]<<4)
             + (buffB[j+4]<<3) + (buffB[j+5]<<2)
             + (buffB[j+6]<<1) + buffB[j+7]));
}
for (int i = 0; i < (buffB.Length >> 3) << 3; i += 8) {
    bw.Write(
        (byte)(buffB[i]<<7 | buffB[i+1]<<6
             | buffB[i+2]<<5 | buffB[i+3]<<4
             | buffB[i+4]<<3 | buffB[i+5]<<2
             | buffB[i+6]<<1 | buffB[i+7]));
}
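A tiny check of the packing expression, using the bit pattern from the GetBits example earlier. Note that in C# << binds tighter than |, so the | form needs no parentheses, while mixing << with + would, since + binds tighter than <<:

```csharp
using System;

class PackByte
{
    // Packs 8 bit values (each 0 or 1) into one byte, MSB first.
    public static byte Pack(byte[] b) =>
        (byte)(b[0]<<7 | b[1]<<6 | b[2]<<5 | b[3]<<4 | b[4]<<3 | b[5]<<2 | b[6]<<1 | b[7]);

    static void Main()
    {
        byte[] bits = { 0, 0, 1, 0, 1, 1, 1, 0 }; // 00101110b
        Console.WriteLine(Pack(bits)); // 46
    }
}
```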
100. WriteBits
void WriteBits(BinaryWriter bw, uint x, byte numberOfBits)
byte[] tmpB = new byte[BitsBuffLength + numberOfBits];
GetBits(x, numberOfBits, tmpB, BitsBuffLength);
for (int i = 0; i < BitsBuffLength; i++) tmpB[i] = buffB[i];
buffB = tmpB; // static byte array to store not-yet-written bits
int L = (buffB.Length >> 3) << 3;
for (int i = 0; i < L; i += 8)
    bw.Write((byte)(buffB[i]<<7 | buffB[i+1]<<6 | buffB[i+2]<<5 | buffB[i+3]<<4
                  | buffB[i+4]<<3 | buffB[i+5]<<2 | buffB[i+6]<<1 | buffB[i+7]));
for (int i = L; i < buffB.Length; i++) { buffB[i - L] = buffB[i]; }
BitsBuffLength = tmpB.Length - L;
101. #12 Performance Summary
     | Execution time   | Memory (MB)
Old  | 18 s 333 ms      | 7,842
New  | 16 s 367 ms      | 5,525
Diff | -1 s 966 ms      | -2,317
%    | 10.72 %          | 29.55 %
     | x 1.12           | x 1.42
103. #13 Performance Summary
     | Execution time   | Memory (MB)
Old  | 16 s 367 ms      | 5,525
New  | 15 s 036 ms      | 4,424
Diff | -1 s 331 ms      | -1,101
%    | 8.13 %           | 19.93 %
     | x 1.09           | x 1.25
107. #6-#13 Performance Summary
     | Execution time   | Memory (MB)
Old  | 1 m 10 s 167 ms  | 103,775
New  | 15 s 036 ms      | 4,424
Diff | -55 s 131 ms     | -99,341
%    | 78.57 %          | 95.74 %
     | x 4.67           | x 23.46
108. Performance Summary
     | Execution time    | Memory (MB) | Peak (MB)
Old  | 43 m 21 s 642 ms  | 374,011     | 93.26
New  | 15 s 036 ms       | 4,424       | 33.67
Diff | -43 m 06 s 602 ms | -369,577    | -59.59
%    | 99.42 %           | 98.82 %     | 63.89 %
     | x 173.02          | x 84.54     | x 2.77
113. LINKS
Use dotTrace Command-Line Profiler
Hashtable and dictionary collection types
.NET Performance Optimization & Profiling with JetBrains dotTrace
Why GC runs when using a struct as a generic dictionary key
Matt Ellis. Writing Allocation Free Code in C#
Konrad Kokosa. High-performance code design patterns in C#
Maarten Balliauw. Let's refresh our memory! Memory management in .NET
Sasha Goldshtein. Pro .NET Performance: Optimize Your C# Applications
Ben Watson. Writing High-Performance .NET Code, 2nd Edition
Writing Faster Managed Code: Know What Things Cost
114. LINKS
Maarten Balliauw
Sasha Goldshtein
Yevhen Tatarynov GitHub
Linq.Concat
Linq.Concat Implementation
Buffer.BlockCopy
Generic List implementation
Optimizing software in C++
Denis Reznik video
Array.Sort