Yevhen Tatarynov
Software developer with 15 years of experience in commercial
software and database development (.NET / MS SQL / Delphi)
PhD in math, specializing in the theoretical foundations of
computer science and cybernetics
I was involved in projects performing complex mathematical calculations and processing large
amounts of data. Currently I am a senior software developer in the infrastructure team at Covent IT.
Points of professional interest:
application performance optimization and analysis
writing C# code similar in performance to C++
advanced debugging
Agenda
What is the challenge?
Measurements
Is it an issue?
Intermediate results
Can it be faster?
Summary
Q&A
The Challenge
WinForms .NET application
Reads *.bin and *.txt data files
"Processes bits": extracts the useful data and packs it into a new format
Writes results to text and binary files
Uses .NET Framework 4.0
Runs on Windows 10 x64
It works correctly
Environment
Windows 10 x64 19042.804
CPU – Intel(R) Core(TM) i3-2330M CPU @ 2.20GHz
SSD – Kingston SKC400S37256G (Read 550MB/s / Write 540 MB/s)
Non-Functional Requirements
We need to process files > 1 GB (the size of processed files can increase significantly)
The application will run on personal laptops
Data processing should be fast enough to obtain a result in an acceptable amount of time
Disappointing Forecast
File size | Execution time | Peak memory load | Total memory allocation
~(363+236) MB | ~11.5 min | ~89.28 MB | ~132,584 MB
~(415+178) MB | ~45.5 min | ~93.26 MB | ~374,011 MB
> (1 + 0.25|0.5) GB | ? | ? | ?
What Is the Challenge?
Goals | Values
Reduce execution time | Processing finished in an appropriate time
Don't increase total memory allocation | Don't break the application output
Choose Metrics
Execution time – main metric
Total memory allocation – try not to increase
Memory load peak – low priority
Measurements
Measurement Tools
DateTime.Now
Stopwatch
BenchmarkDotNet
PerfView
Visual Studio Performance Profiler
R# dotTrace
RedGate
Performance Monitor
dotMemory Snapshot
dotTrace Timeline: Sample & Snapshot Execution Time
dotTrace Timeline: Snapshot Execution Time
Sequential Runs Impact
Run | Execution time HDD | Execution time SSD
1st | 2,806,777 ms (46 m 46 s 777 ms) | 2,742,178 ms (45 m 42 s 178 ms)
2nd | 2,744,085 ms (45 m 44 s 085 ms) | 2,601,642 ms (43 m 21 s 642 ms)
Diff | 62,692 ms (01 m 02 s 692 ms) | 140,536 ms (02 m 20 s 536 ms)
% | 2.23 % | 5.12 %
Experiment Results Breakage Factors
Drive read/write speed, cache, and type (HDD, SSD)
CPU base frequency, cache, burn time, Turbo Boost, power plan
Anti-malware software
Scheduled processes, system updates
System file cache, file fragmentation
Total system load (CPU, RAM, drive)
Reducing the Impact of Breakage Factors
Disable anti-malware software, scheduled tasks, and system updates
Set the high-performance power plan if you are using a laptop
Restart the OS to end redundant processes and clear caches
Wait until it is fully loaded
Run the application on an SSD drive
Run the application twice on the same input data
In the analysis, we use the results from the 2nd run
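A minimal C# sketch of the "run twice, keep the 2nd run" rule using System.Diagnostics.Stopwatch, written as a C# 9+ top-level program for brevity. `ProcessFile` is a hypothetical stand-in workload, not the application's real processing code.

```csharp
using System;
using System.Diagnostics;

// Hypothetical stand-in for the real processing step: fills and sums a large array.
long ProcessFile()
{
    var data = new int[10_000_000];
    for (int i = 0; i < data.Length; i++) data[i] = i & 0xFF;
    long sum = 0;
    foreach (var v in data) sum += v;
    return sum;
}

// Runs the workload twice on the same input and returns both timings.
// The first run also pays for JIT compilation and cold caches,
// so the second measurement is the one used in the analysis.
(TimeSpan first, TimeSpan second) MeasureTwice(Func<long> workload)
{
    var sw = Stopwatch.StartNew();
    workload();
    var first = sw.Elapsed;

    sw.Restart();
    workload();
    var second = sw.Elapsed;
    return (first, second);
}

var (firstRun, secondRun) = MeasureTwice(ProcessFile);
Console.WriteLine($"1st run: {firstRun.TotalMilliseconds:F0} ms");
Console.WriteLine($"2nd run: {secondRun.TotalMilliseconds:F0} ms (used in the analysis)");
```

For the real application the same idea applies at process level: start it twice on the same input and compare the second timing only.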
Is It an Issue?
#1 Is Linq an Issue?
Potential Improvements
.ToArray() is used: slow
Concat uses foreach and extra memory to iterate over its input parameters
Each time, we produce a new byte array: redundant memory traffic.
var b = new byte[0]; // start empty: Concat + ToArray allocate a new array on every iteration
for (int i = 0; i < N; i++)
{
byte[] a = new byte[GetLen(i)];
/* fill a with values */
b = b.Concat(a).ToArray();
}
return b;
Linq.Concat
static IEnumerable<TSource> Concat<TSource>(this IEnumerable<TSource> first,
IEnumerable<TSource> second) {
if (first == null) throw Error.ArgumentNull("first");
if (second == null) throw Error.ArgumentNull("second");
return ConcatIterator<TSource>(first, second);
}
static IEnumerable<TSource> ConcatIterator<TSource>(IEnumerable<TSource> first,
IEnumerable<TSource> second) {
foreach (TSource element in first) yield return element;
foreach (TSource element in second) yield return element;
}
.ToArray()
static TSource[] ToArray<TSource>(this IEnumerable<TSource> source) {
if (source == null) throw Error.ArgumentNull("source");
return new Buffer<TSource>(source).ToArray();
}
internal TElement[] ToArray() {
if (count == 0) return new TElement[0];
if (items.Length == count) return items;
TElement[] result = new TElement[count];
Array.Copy(items, 0, result, 0, count);
return result;
}
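Solution
var b = new byte[maxN]; var bN = 0;   // maxN: known upper bound of the total length
for (int i = 0; i < N; i++)
{
    var aN = GetLen(i);
    byte[] a = new byte[aN];
    /* fill a with values */
    Buffer.BlockCopy(a, 0, b, bN, aN); // append a at offset bN
    bN += aN;
}
return b;                              // the first bN bytes are valid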
Use one array to store the
concatenated result.
To copy each array into the result,
use Buffer.BlockCopy, which copies raw
bytes and is typically at least as fast
as Array.Copy for primitive arrays.
Buffer.BlockCopy
public static void BlockCopy(Array src, int srcOffset, Array dst, int dstOffset, int count);
Copies a specified number of bytes from a source array starting at a particular offset to a destination array starting at a particular offset.
● src (Array): the source buffer.
● srcOffset (Int32): the zero-based byte offset into src.
● dst (Array): the destination buffer.
● dstOffset (Int32): the zero-based byte offset into dst.
● count (Int32): the number of bytes to copy.
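One detail of the signature above is easy to miss: srcOffset, dstOffset and count are byte counts, not element indexes. For byte[] the two coincide, but for other primitive arrays they do not. A small sketch (C# 9+ top-level program, illustrative data) copies the same three ints with Buffer.BlockCopy and with Array.Copy:

```csharp
using System;

int[] src = { 1, 2, 3, 4, 5 };
int[] viaBlockCopy = new int[5];
int[] viaArrayCopy = new int[5];

// Buffer.BlockCopy: offsets and count are in BYTES.
// Copy elements 1..3 (three ints = 12 bytes) to the start of the destination.
Buffer.BlockCopy(src, 1 * sizeof(int), viaBlockCopy, 0, 3 * sizeof(int));

// Array.Copy: index and length are in ELEMENTS.
Array.Copy(src, 1, viaArrayCopy, 0, 3);

Console.WriteLine(string.Join(",", viaBlockCopy)); // 2,3,4,0,0
Console.WriteLine(string.Join(",", viaArrayCopy)); // 2,3,4,0,0
```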
Comparison
Before:
var b = new byte[0];
for (int i = 0; i < N; i++)
{
    byte[] a = new byte[GetLen(i)];
    /* fill a with values */
    b = b.Concat(a).ToArray();
}
return b;
After:
var b = new byte[maxN]; var bN = 0;
for (int i = 0; i < N; i++)
{
    var aN = GetLen(i);
    byte[] a = new byte[aN];
    /* fill a with values */
    Buffer.BlockCopy(a, 0, b, bN, aN);
    bN += aN;
}
return b;
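A self-contained sketch of the single-buffer concatenation next to the Linq version, checking that both produce the same bytes (C# 9+ top-level program). `GetLen` and the fill pattern are hypothetical, since the slides do not show them, and here maxN is computed as the exact sum of the chunk lengths; in the real code it is only an upper bound, so only the first bN bytes of the result are meaningful.

```csharp
using System;
using System.Linq;

const int N = 100;

// Hypothetical chunk length; the real GetLen is not shown in the slides.
int GetLen(int i) => i + 1;

// Fills a chunk with an arbitrary, deterministic pattern.
byte[] MakeChunk(int i)
{
    var a = new byte[GetLen(i)];
    for (int k = 0; k < a.Length; k++) a[k] = (byte)(i + k);
    return a;
}

// Original approach: Concat + ToArray allocates a new result array on every step.
byte[] ConcatWithLinq()
{
    var b = new byte[0];
    for (int i = 0; i < N; i++) b = b.Concat(MakeChunk(i)).ToArray();
    return b;
}

// Single-buffer approach: one allocation for the result, raw byte copies.
byte[] ConcatWithBlockCopy()
{
    int maxN = 0;
    for (int i = 0; i < N; i++) maxN += GetLen(i);   // exact total in this sketch

    var b = new byte[maxN];
    var bN = 0;                                      // bytes written so far
    for (int i = 0; i < N; i++)
    {
        var a = MakeChunk(i);
        Buffer.BlockCopy(a, 0, b, bN, a.Length);
        bN += a.Length;
    }
    return b;
}

var slow = ConcatWithLinq();
var fast = ConcatWithBlockCopy();
Console.WriteLine($"{slow.Length} bytes, equal: {slow.SequenceEqual(fast)}"); // 5050 bytes, equal: True
```

The Linq version copies everything accumulated so far on every iteration, so its total work grows quadratically with N; the single buffer copies each byte once.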
#1 Performance Summary
 | Execution time | Memory (MB)
1st | 43 m 21 s 642 ms | 374,011
2nd | 4 m 56 s 414 ms | 120,333
Diff | -38 m 25 s 228 ms | -253,678
% | 88.61 % | 67.83 %
1st/2nd | x 8.78 | x 3.11
#2 Is GetBits an Issue?
/*
Returns the last N (N > 0) bits of X
as a byte array
*/
byte[] GetBits(uint x, byte n)
Potential Improvements
byte[] GetBits(uint x, byte n)
{
    var a = new byte[n];
    var D = (uint)Math.Pow(2, n - 1); // weight of the highest bit, computed in floating point
    for (var i = 0; i < n; i++)
    {
        a[i] = (byte)(x / D);         // 1 if the current bit is set
        x -= a[i] * D;
        D /= 2;
    }
    return a;
}
Math.Pow uses floating-point
operations to calculate the result
Converts the int parameter to double
Converts the double result to uint
Uses * and / operations
Optimizing Software in C++ by Agner Fog.
Execution time (clock cycles)
Operation | Min | Max
int add | 1 | 1
int sub | 1 | 1
int shift | 1 | 1
int mul | 3 | 4
int div | 40 | 80
floating point add | 3 | 6
floating point mul | 4 | 8
floating point div | 14 | 45
Conversion of signed integers to floating point is fast only when the SSE2 instruction set is enabled.
Conversion of unsigned integers to floating point is faster only when the AVX512 instruction set is enabled.
A conversion from floating point to integer without SSE2 typically takes 40 clock cycles.
Solution
byte[] GetBits(uint x, byte n)
{
    var a = new byte[n];
    var i = n - 1;
    while (x != 0 && i >= 0)  // stop after n bits
    {
        a[i] = (byte)(x & 1); // lowest bit goes to the end of the array
        x = x >> 1;
        i--;
    }
    return a;
}
Use the binary representation of integer
numbers
Don't use Math.Pow
Use only integer operands to avoid
conversions
Use the bitwise operations >> and &
Comparison
Before:
byte[] GetBits(uint x, byte n)
{
    var a = new byte[n];
    var D = (uint)Math.Pow(2, n - 1);
    for (var i = 0; i < n; i++)
    {
        a[i] = (byte)(x / D);
        x -= a[i] * D;
        D /= 2;
    }
    return a;
}
After:
byte[] GetBits(uint x, byte n)
{
    var a = new byte[n];
    var i = n - 1;
    while (x != 0 && i >= 0)
    {
        a[i] = (byte)(x & 1);
        x = x >> 1;
        i--;
    }
    return a;
}
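To gain confidence that the bitwise GetBits can replace the Math.Pow version, a small harness can compare both on every value that fits into n bits. This is a sketch (C# 9+ top-level program); the `i >= 0` guard in the loop condition is an addition that keeps values wider than n bits from writing past the start of the array.

```csharp
using System;
using System.Linq;

// Original version: divides by powers of two computed with Math.Pow.
byte[] GetBitsPow(uint x, byte n)
{
    var a = new byte[n];
    var d = (uint)Math.Pow(2, n - 1);
    for (var i = 0; i < n; i++)
    {
        a[i] = (byte)(x / d);
        x -= a[i] * d;
        d /= 2;
    }
    return a;
}

// Bitwise version: peel off the lowest bit and shift right.
// a[0] holds the most significant of the n bits.
byte[] GetBitsShift(uint x, byte n)
{
    var a = new byte[n];
    var i = n - 1;
    while (x != 0 && i >= 0)   // guard: stop after n bits
    {
        a[i] = (byte)(x & 1);
        x >>= 1;
        i--;
    }
    return a;
}

// Both versions must agree whenever x fits into n bits.
bool allEqual = true;
for (byte n = 1; n <= 12; n++)
    for (uint x = 0; x < (1u << n); x++)
        allEqual &= GetBitsPow(x, n).SequenceEqual(GetBitsShift(x, n));

Console.WriteLine(string.Join("", GetBitsShift(6, 4)));  // 0110
Console.WriteLine($"versions agree: {allEqual}");
```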
#2 Performance Summary
 | Execution time | Memory (MB)
Old | 4 m 56 s 414 ms | 120,333
New | 3 m 53 s 422 ms | 120,333
Diff | -1 m 02 s 992 ms | 0
% | 21.25 % | 0.00 %
Old/New | x 1.27 | x 1.00
#3 Is AnyPathHasIllegalCharacters an Issue?
GetFileName indirectly calls
AnyPathHasIllegalCharacters.
GetFileName is used in the functionality
that sorts file names (in the format
Name_ddddd.txt) located in a
specific folder. The files have to be
sorted by the numeric ddddd part of the name.
The number of files is > 13,000
(Name_1.txt .. Name_13000.txt).
Potential Improvements
Naive exchange sort, O(N²)
for (var i = 0; i < FileNameArray.Length - 1; i++)
    for (var j = i + 1; j < FileNameArray.Length; j++)
    {
        var S1 = Path.GetFileName(FileNameArray[i]);
        var i_N = int.Parse(S1.Substring(4, S1.LastIndexOf('.') - 4));
        S1 = Path.GetFileName(FileNameArray[j]);
        var j_N = int.Parse(S1.Substring(4, S1.LastIndexOf('.') - 4));
        if (i_N > j_N)
        {
            S1 = FileNameArray[i];
            FileNameArray[i] = FileNameArray[j];
            FileNameArray[j] = S1;
        }
    }
Redundant GetFileName calls
Redundant Parse calls
Redundant Substring calls
Redundant LastIndexOf calls
Solution
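Array.Sort(FileNameArray, new NumericComparer());
public class NumericComparer : IComparer<string>
{
    /* All names share the same folder and prefix and have no leading zeros,
       so a shorter name has a smaller number */
    public int Compare(string x, string y)
    {
        var result = x.Length.CompareTo(y.Length);
        if (result == 0)
            return string.CompareOrdinal(x, y); // digits need no culture rules
        return result;
    }
}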
Quicksort-based Array.Sort, O(N log N) on average
No GetFileName calls
No Substring calls
No Parse calls
No LastIndexOf calls
Comparison
Before:
for (var i = 0; i < FileNameArray.Length - 1; i++)
    for (var j = i + 1; j < FileNameArray.Length; j++)
    {
        var S1 = Path.GetFileName(FileNameArray[i]);
        var i_N = int.Parse(S1.Substring(4, S1.LastIndexOf('.') - 4));
        S1 = Path.GetFileName(FileNameArray[j]);
        var j_N = int.Parse(S1.Substring(4, S1.LastIndexOf('.') - 4));
        if (i_N > j_N)
        {
            S1 = FileNameArray[i];
            FileNameArray[i] = FileNameArray[j];
            FileNameArray[j] = S1;
        }
    }
After:
Array.Sort(FileNameArray, new NumericComparer());
public class NumericComparer : IComparer<string>
{
    public int Compare(string x, string y)
    {
        var result = x.Length.CompareTo(y.Length);
        if (result == 0)
            return string.CompareOrdinal(x, y);
        return result;
    }
}
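A runnable sketch of the length-then-ordinal ordering (C# 9+ top-level program). It passes a local function to the Array.Sort(T[], Comparison<T>) overload instead of declaring the NumericComparer class, which is equivalent here. Like the slide, it assumes that all paths share the same folder and prefix and that numbers have no leading zeros; the sample paths are made up.

```csharp
using System;

// Orders "Name_ddddd.txt" paths by their numeric part without parsing.
// Assumes every path has the same folder and prefix and no leading zeros,
// so a shorter string has fewer digits and equal lengths compare digit by digit.
int CompareByNumber(string x, string y)
{
    var result = x.Length.CompareTo(y.Length);
    return result != 0 ? result : string.CompareOrdinal(x, y);
}

var files = new[]
{
    @"C:\data\Name_10.txt",
    @"C:\data\Name_2.txt",
    @"C:\data\Name_13000.txt",
    @"C:\data\Name_1.txt",
    @"C:\data\Name_100.txt",
};

// Array.Sort(T[], Comparison<T>) accepts the local function directly.
Array.Sort(files, CompareByNumber);
Console.WriteLine(string.Join(Environment.NewLine, files));
```

No string is parsed or cut during the sort, so each comparison costs one length comparison and, at most, one ordinal scan.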
#3 Performance Summary
 | Execution time | Memory (MB)
Old | 3 m 53 s 422 ms | 120,333
New | 2 m 11 s 091 ms | 103,775
Diff | -1 m 42 s 331 ms | -16,558
% | 43.84 % | 13.76 %
Old/New | x 1.78 | x 1.16
#4 Is FileStream.get_Length an Issue?
In both cases, the binary file is read
and the number of binary chains is
calculated from the data read.
Potential Improvements
using(var br = new BinaryReader(…))
{
while (br.BaseStream.Position <= br.BaseStream.Length - 4)
{
counter++;
br.ReadUInt32();
br.ReadUInt32();
var n = br.ReadUInt32();
for (int i = 0; i < n; i++) br.ReadUInt32();
}
}
Redundant Length calls
Redundant subtraction
Redundant ReadUInt32
call
Solution
using(var br = new BinaryReader(…))
{
var length = br.BaseStream.Length - 4;
while (br.BaseStream.Position <= length)
{
counter++;
br.ReadUInt64();
var n = br.ReadUInt32();
for (int i = 0; i < n; i++) br.ReadUInt32();
}
}
Store Length in a local
variable
Call ReadUInt64 instead of two
ReadUInt32 calls
Comparison
using(var br = new BinaryReader(…))
{
while (br.BaseStream.Position <= br.BaseStream.Length - 4)
{
counter++;
br.ReadUInt32();
br.ReadUInt32();
var n = br.ReadUInt32();
for (int i = 0; i < n; i++) br.ReadUInt32();
}
}
using(var br = new BinaryReader(…))
{
var length = br.BaseStream.Length - 4;
while (br.BaseStream.Position <= length)
{
counter++;
br.ReadUInt64();
var n = br.ReadUInt32();
for (int i = 0; i < n; i++) br.ReadUInt32();
}
}
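A runnable sketch of the cached-Length loop against an in-memory stream (C# 9+ top-level program). The record layout (two 32-bit header values, a 32-bit count n, then n 32-bit values) is inferred from the reads on the slide, and the sample chains are made up; `MakeSample` and `CountChains` are hypothetical helper names.

```csharp
using System;
using System.IO;

// Builds a sample stream of chains: [uint a][uint b][uint n][n x uint].
// The layout is inferred from the reads on the slide; values are made up.
MemoryStream MakeSample(params uint[] chainLengths)
{
    var ms = new MemoryStream();
    var bw = new BinaryWriter(ms);
    foreach (var n in chainLengths)
    {
        bw.Write(1u); bw.Write(2u);              // two header values
        bw.Write(n);                             // number of payload values
        for (uint i = 0; i < n; i++) bw.Write(i);
    }
    bw.Flush();
    ms.Position = 0;
    return ms;
}

int CountChains(Stream stream)
{
    var counter = 0;
    using (var br = new BinaryReader(stream))
    {
        var length = br.BaseStream.Length - 4;   // read Length once, outside the loop
        while (br.BaseStream.Position <= length)
        {
            counter++;
            br.ReadUInt64();                     // both header values in one call
            var n = br.ReadUInt32();
            for (int i = 0; i < n; i++) br.ReadUInt32();
        }
    }
    return counter;
}

Console.WriteLine(CountChains(MakeSample(3, 0, 5)));  // 3
```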
#4 Performance Summary
 | Execution time | Memory (MB)
Old | 2 m 11 s 091 ms | 103,775
New | 1 m 39 s 305 ms | 103,775
Diff | -31 s 786 ms | 0
% | 24.25 % | 0.00 %
Old/New | x 1.32 | x 1.00
#5 Is Math.Log an Issue?
/*
Calculates the number of bits needed to
store number X. For 0 it should
return 1, for
00101110b it should return 6.
*/
byte NumberOfBits(uint X)
Potential Improvements
byte NumberOfBits(uint X)
{
if (X > 0)
return (byte)Math.Ceiling(Math.Log(X+1,2));
return 1;
}
Math.Log uses floating-point
operations to calculate the logarithm
Converts the double result to byte
Converts the uint parameter to double
Solution
byte NumberOfBits(uint X)
{
    var x = X;        // stay unsigned: >> on a negative int never reaches 0
    byte counter = 0;
    do {
        counter++;
        x = x >> 1;
    }
    while (x != 0);
    return counter;
}
Just count the bits of X with
bitwise operations to avoid
floating-point arithmetic
Avoid type conversions
Avoid using Math.Log
Comparison
byte NumberOfBits(uint X)
{
if (X > 0)
return (byte)Math.Ceiling(Math.Log(X+1,2));
return 1;
}
byte NumberOfBits(uint X)
{
var x = X; // stay unsigned: >> on a negative int never reaches 0
byte counter = 0;
do {
counter++;
x = x >> 1;
}
while (x != 0);
return counter;
}
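A quick check of the counting loop against the contract in the comment (0 gives 1, 00101110b gives 6), written as a C# 9+ top-level program. The sketch keeps the value as uint rather than casting it to int: for X ≥ 2^31 the cast gives a negative number, and >> on a negative int shifts in sign bits, so the loop would never reach 0.

```csharp
using System;

// Counts how many bits are needed to store x; 0 still needs one bit.
byte NumberOfBits(uint x)
{
    byte counter = 0;
    do
    {
        counter++;
        x >>= 1;      // logical shift for uint, so x eventually becomes 0
    } while (x != 0);
    return counter;
}

Console.WriteLine(NumberOfBits(0));              // 1
Console.WriteLine(NumberOfBits(0b00101110));     // 6
Console.WriteLine(NumberOfBits(uint.MaxValue));  // 32
```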
#5 Performance Summary
 | Execution time | Memory (MB)
Old | 1 m 39 s 305 ms | 103,775
New | 1 m 10 s 167 ms | 103,775
Diff | -29 s 138 ms | 0
% | 29.34 % | 0.00 %
Old/New | x 1.42 | x 1.00
Summary
dotMemory Report
dotMemory Snapshot
Timeline Report Comparison
Performance Summary
 | Execution time | Memory (MB)
Old | 43 m 21 s 642 ms | 374,011
New | 1 m 10 s 167 ms | 103,775
Diff | -42 m 11 s 475 ms | -270,236
% | 97.30 % | 72.25 %
Old/New | x 37.08 | x 3.60
Can It Be Faster?
#6 PackBits Is a Heavy Function.
Can It Be Faster?
/*
Calculates statistics from a ushort array of
blocks and a byte array of bits, and packs them
into a new format.
*/
byte[] PackBits(ushort[] blocks,
byte numberOfBits,
bool calcStatistics)
Potential Improvement #1
byte[] PackBits(ushort[] blocks, byte numberOfBits, bool calcStatistics)
{
    var stat = 0; var packedBits = new byte[maxN];
    foreach (var block in blocks)
    {
        byte bits = NumberOfBits(block);
        if (bits <= numberOfBits) { /*branch1*/ }
        else { /*branch2*/ }
    }
    return packedBits;
}
/*branch1*/
/* GetBits returns a new byte array with numberOfBits bits of block */
if (calcStatistics) stat += numberOfBits - 1;
Buffer.BlockCopy(GetBits(block, numberOfBits), 0, packedBits, n, numberOfBits);
n += numberOfBits;
index += numberOfBits;
/*branch2*/
if (calcStatistics) stat += 5 * numberOfBits;
n += tmp;
index += tmp;
Buffer.BlockCopy(GetBits(block, bits), 0, packedBits, n, bits);
n += bits;
index += bits;
Unused Calculation Results
/*branch 1*/
Buffer.BlockCopy(GetBits(block, numberOfBits), 0, packedBits, n, numberOfBits);
n += numberOfBits;
index += numberOfBits;
/*branch 2*/
n += tmp;
index += tmp;
Buffer.BlockCopy(GetBits(block, bits), 0, packedBits, n, bits);
n += bits;
index += bits;
The result of the index calculation is never used, so we can simply remove it.
Potential Improvement #2
byte[] PackBits(ushort[] blocks, byte numberOfBits, bool calcStatistics)
{
    var stat = 0; var packedBits = new byte[maxN];
    foreach (var block in blocks)
    {
        byte bits = NumberOfBits(block);
        if (bits <= numberOfBits) { /*branch1*/ }
        else { /*branch2*/ }
    }
    return packedBits;
}
/*branch1*/
/* GetBits returns a new byte array with numberOfBits bits of block */
if (calcStatistics) stat += numberOfBits - 1;
Buffer.BlockCopy(GetBits(block, numberOfBits), 0, packedBits, n, numberOfBits);
n += numberOfBits;
index += numberOfBits;
/*branch2*/
if (calcStatistics) stat += 5 * numberOfBits;
n += tmp;
index += tmp;
Buffer.BlockCopy(GetBits(block, bits), 0, packedBits, n, bits);
n += bits;
index += bits;
Avoid Redundant Copy
Write the bits inside GetBits directly into the existing
array instead of producing a new one
void GetBits(uint x, byte n, byte[] dest, int offset)
{
    var i = offset + n - 1;          // position of the lowest bit
    while (x != 0 && i >= offset)    // stay inside the n-bit window
    {
        dest[i] = (byte)(x & 1);
        x = x >> 1;
        i--;
    }
}
Avoid using Buffer.BlockCopy
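A runnable sketch of the in-place variant (C# 9+ top-level program), using the parameter order GetBits(x, n, dest, offset). The `i >= offset` guard limits writes to the n-bit window. Note that positions for leading zero bits are not written, so the destination must start zeroed, which a freshly allocated array is.

```csharp
using System;

// Writes the lowest n bits of x into dest[offset .. offset + n - 1],
// most significant bit first, without allocating a temporary array.
// Leading zero bits are not written, so dest must already be zeroed there.
void GetBits(uint x, byte n, byte[] dest, int offset)
{
    var i = offset + n - 1;
    while (x != 0 && i >= offset)
    {
        dest[i] = (byte)(x & 1);
        x >>= 1;
        i--;
    }
}

// Pack 5 (101b) in 3 bits followed by 2 (0010b) in 4 bits into one buffer.
var packed = new byte[7];
var pos = 0;
GetBits(5, 3, packed, pos); pos += 3;
GetBits(2, 4, packed, pos); pos += 4;
Console.WriteLine(string.Join("", packed));   // 1010010
```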
Potential Improvement #3
byte[] PackBits(ushort[] blocks, byte numberOfBits, bool calcStatistics)
{
    var stat = 0; var packedBits = new byte[maxN];
    foreach (var block in blocks)
    {
        byte bits = NumberOfBits(block);
        if (bits <= numberOfBits) { /*branch1*/ }
        else { /*branch2*/ }
    }
    return packedBits;
}
/*branch1*/
/* GetBits writes numberOfBits bits of block directly into packedBits at offset n */
if (calcStatistics) stat += numberOfBits - 1;
GetBits(block, numberOfBits, packedBits, n);
n += numberOfBits;
index += numberOfBits;
/*branch2*/
if (calcStatistics) stat += 5 * numberOfBits;
n += tmp;
index += tmp;
GetBits(block, bits, packedBits, n);
n += bits;
index += bits;
Foreach vs. For to Iterate an Array
The JIT compiler seems smart enough to turn foreach over an array into practically the same
assembly as a for statement. But we prefer the simple for.
C.M(UInt16[]) /*foreach statement*/
L0000: mov rax, rdx
L0002: xor eax, eax
L0005: mov ecx, [rdx+0x8]
L0007: test ecx, ecx
L0009: jle L0018
L000c: movsxd r8, eax
L0012: movzx r8d, word [rdx+r8*2+0x10]
L0014: inc eax
L0016: cmp ecx, eax
L0018: jg L000c
L0020: ret
C.M(UInt16[]) /*for statement*/
L0000: xor eax, eax
L0002: mov ecx, [rdx+0x8]
L0005: test ecx, ecx
L0007: jle L0018
L0009: movsxd r8, eax
L000c: movzx r8d, word [rdx+r8*2+0x10]
L0012: inc eax
L0014: cmp ecx, eax
L0016: jg L0009
L0018: ret
Potential Improvement #4
byte[] PackBits(ushort[] blocks, byte numberOfBits, bool calcStatistics)
{
    var stat = 0; var packedBits = new byte[maxN];
    for (int i = 0; i < blocks.Length; i++)
    {
        var block = blocks[i];
        byte bits = NumberOfBits(block);
        if (bits <= numberOfBits) { /*branch1*/ }
        else { /*branch2*/ }
    }
    return packedBits;
}
/*branch1*/
/* GetBits writes numberOfBits bits of block directly into packedBits at offset n */
if (calcStatistics) stat += numberOfBits - 1;
GetBits(block, numberOfBits, packedBits, n);
n += numberOfBits;
index += numberOfBits;
/*branch2*/
if (calcStatistics) stat += 5 * numberOfBits;
n += tmp;
index += tmp;
GetBits(block, bits, packedBits, n);
n += bits;
index += bits;
maxN Is Known
byte[] PackBits(ushort[] blocks, byte numberOfBits, bool calcStatistics, int maxN);
int PackedBitsLength(ushort[] blocks, byte numberOfBits, bool calcStatistics);
The upper bound maxN is calculated before PackBits is called, so we can pass it to the function as a parameter.
PackBits is often called just to get the resulting array length (the n variable in PackBits), so we can split it
into two functions.
Potential Improvement #5
byte[] PackBits(ushort[] blocks, byte numberOfBits, bool calcStatistics, int maxN)
{
    var stat = 0; var packedBits = new byte[maxN];
    for (int i = 0; i < blocks.Length; i++)
    {
        var block = blocks[i];
        byte bits = NumberOfBits(block);
        if (bits <= numberOfBits) { /*branch1*/ }
        else { /*branch2*/ }
    }
    return packedBits;
}
/*branch1*/
/* GetBits writes numberOfBits bits of block directly into packedBits at offset n */
if (calcStatistics) stat += numberOfBits - 1;
GetBits(block, numberOfBits, packedBits, n);
n += numberOfBits;
index += numberOfBits;
/*branch2*/
if (calcStatistics) stat += 5 * numberOfBits;
n += tmp;
index += tmp;
GetBits(block, bits, packedBits, n);
n += bits;
index += bits;
We Don't Need an If Statement
var stat = 0; var packedBits = new byte[maxN];
foreach (var block in blocks)
{
    byte bits = NumberOfBits(block);
    if (bits <= numberOfBits) { /*branch 1*/
        if (calcStatistics) stat += numberOfBits - 1;
    } else { /*branch 2*/
        if (calcStatistics) stat += 5 * numberOfBits;
    }
}
return packedBits;
Statistics are needed only
when we produce the packed
array.
So in PackBits we can calculate them
without the calcStatistics condition
and avoid a branch that the branch
predictor would have to handle.
In PackedBitsLength, we can
remove these rows entirely.
PackBits
byte[] PackBits(ushort[] blocks, byte numberOfBits, bool calcStatistics, int maxN)
{
    var stat = 0; var packedBits = new byte[maxN];
    for (int i = 0; i < blocks.Length; i++)
    {
        var block = blocks[i];
        byte bits = NumberOfBits(block);
        if (bits <= numberOfBits) { /*branch1*/ }
        else { /*branch2*/ }
    }
    return packedBits;
}
/*branch1*/
/* GetBits writes numberOfBits bits of block directly into packedBits at offset n */
if (calcStatistics) stat += numberOfBits - 1;
GetBits(block, numberOfBits, packedBits, n);
n += numberOfBits;
index += numberOfBits;
/*branch2*/
if (calcStatistics) stat += 5 * numberOfBits;
n += tmp;
index += tmp;
GetBits(block, bits, packedBits, n);
n += bits;
index += bits;
#6 Performance Summary
 | Execution time | Memory (MB)
Old | 1 m 10 s 167 ms | 103,775
New | 50 s 193 ms | 47,239
Diff | -19 s 974 ms | -56,536
% | 28.47 % | 54.48 %
Old/New | x 1.40 | x 2.20
#7 PackBits2 Is a Heavy Function.
Can It Be Faster?
/*
Calculates statistics from a uint list of
blocks and a byte array of bits, and packs them
into a new format.
*/
byte[] PackBits2(List<uint> blocks,
byte numberOfBits,
bool calcStatistics)
#7 Performance Summary
 | Execution time | Memory (MB)
Old | 50 s 193 ms | 47,239
New | 36 s 631 ms | 10,510
Diff | -13 s 562 ms | -36,729
% | 27.02 % | 77.75 %
Old/New | x 1.37 | x 4.49
#8 WriteBits - Can It Be Faster?
/* Writes the given number of bits of a uint
to the binary file and stores the bits that
do not fit into whole bytes (8 bits)
in a static array */
void WriteBits(BinaryWriter bw, uint x,
byte numberOfBits)
Potential Improvement #1
void WriteBits(BinaryWriter bw, uint x, byte numberOfBits)
{
    byte[] bits = GetBits(x, numberOfBits);
    byte[] tmpB = new byte[buffB.Length + bits.Length];
    Buffer.BlockCopy(buffB, 0, tmpB, 0, buffB.Length);
    Buffer.BlockCopy(bits, 0, tmpB, buffB.Length, bits.Length);
    buffB = tmpB; // static byte array that stores bits not yet written
    for (int i = 0; i < buffB.Length / 8; i++)
    {
        int j = i * 8;
        bw.Write((byte)(buffB[j]*128 + buffB[j+1]*64 + buffB[j+2]*32 + buffB[j+3]*16
                      + buffB[j+4]*8 + buffB[j+5]*4 + buffB[j+6]*2 + buffB[j+7]));
    }
    int L = (buffB.Length / 8) * 8;
    for (int i = L; i < buffB.Length; i++) buffB[i - L] = buffB[i];
    Array.Resize(ref buffB, buffB.Length - L);
}
Buffer.BlockCopy Can Be Inefficient
The numberOfBits parameter can't be more than 32 because we operate on Int32 and UInt32 numbers,
and the bit tail left after writing whole bytes is shorter than 8 bits. So we never need to store more than 40 bits (one byte per bit, a 40-byte array).
Thus, copying such a small array with Buffer.BlockCopy is not efficient. Replace it with a simple element-by-element copy
in a loop.
Potential Improvement #2
void WriteBits(BinaryWriter bw, uint x, byte numberOfBits)
{
    byte[] bits = GetBits(x, numberOfBits);
    byte[] tmpB = new byte[buffB.Length + bits.Length];
    for (int i = 0; i < buffB.Length; i++) tmpB[i] = buffB[i];
    for (int i = 0; i < bits.Length; i++) tmpB[i + buffB.Length] = bits[i];
    buffB = tmpB; // static byte array that stores bits not yet written
    for (int i = 0; i < buffB.Length / 8; i++)
    {
        int j = i * 8;
        bw.Write((byte)(buffB[j]*128 + buffB[j+1]*64 + buffB[j+2]*32 + buffB[j+3]*16
                      + buffB[j+4]*8 + buffB[j+5]*4 + buffB[j+6]*2 + buffB[j+7]));
    }
    int L = (buffB.Length / 8) * 8;
    for (int i = L; i < buffB.Length; i++) buffB[i - L] = buffB[i];
    Array.Resize(ref buffB, buffB.Length - L);
}
Optimizing Software in C++ By Agner Fog.
Potential Improvement #3
void WriteBits(BinaryWriter bw, uint x, byte numberOfBits)
{
    byte[] bits = GetBits(x, numberOfBits);
    byte[] tmpB = new byte[buffB.Length + bits.Length];
    for (int i = 0; i < buffB.Length; i++) tmpB[i] = buffB[i];
    for (int i = 0; i < bits.Length; i++) tmpB[i + buffB.Length] = bits[i];
    buffB = tmpB; // static byte array that stores bits not yet written
    for (int i = 0; i < buffB.Length >> 3; i++)
    {
        int j = i << 3;
        // parentheses are required: + binds tighter than <<
        bw.Write((byte)((buffB[j] << 7) + (buffB[j+1] << 6) + (buffB[j+2] << 5) + (buffB[j+3] << 4)
                      + (buffB[j+4] << 3) + (buffB[j+5] << 2) + (buffB[j+6] << 1) + buffB[j+7]));
    }
    int L = (buffB.Length >> 3) << 3;
    for (int i = L; i < buffB.Length; i++) buffB[i - L] = buffB[i];
    Array.Resize(ref buffB, buffB.Length - L);
}
Buffer Array to Store Bits Tail
We don't need to shrink the array with Array.Resize() each time: the tail is always shorter than 8 bits, so we can keep the existing buffer.
Each time, we store the current bit-tail length in a separate variable (BitsBuffLength), which lets us avoid Array.Resize().
WriteBits
void WriteBits(BinaryWriter bw, uint x, byte numberOfBits)
{
    byte[] bits = GetBits(x, numberOfBits);
    byte[] tmpB = new byte[BitsBuffLength + bits.Length];
    for (int i = 0; i < BitsBuffLength; i++) tmpB[i] = buffB[i];
    for (int i = 0; i < bits.Length; i++) tmpB[i + BitsBuffLength] = bits[i];
    buffB = tmpB; // static byte array that stores bits not yet written
    for (int i = 0; i < buffB.Length >> 3; i++)
    {
        int j = i << 3;
        bw.Write((byte)((buffB[j] << 7) + (buffB[j+1] << 6) + (buffB[j+2] << 5) + (buffB[j+3] << 4)
                      + (buffB[j+4] << 3) + (buffB[j+5] << 2) + (buffB[j+6] << 1) + buffB[j+7]));
    }
    int L = (buffB.Length >> 3) << 3;
    for (int i = L; i < buffB.Length; i++) buffB[i - L] = buffB[i];
    BitsBuffLength = tmpB.Length - L; // valid tail bits now at the start of buffB
}
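The WriteBits steps (a reusable tail buffer, a tracked tail length, shifts instead of multiplications, whole bytes written most-significant bit first) can be combined into a small bit writer. This is a sketch under assumptions, not the author's code: it is a C# 9+ top-level program, it appends bits straight into a fixed 40-slot buffer instead of calling GetBits, it keeps one bit per byte as the slides do, and the `Flush` helper that pads the last partial byte with zero bits is hypothetical.

```csharp
using System;
using System.IO;

var stream = new MemoryStream();
var bw = new BinaryWriter(stream);

// Pending bits, one bit per byte as in the slides: at most 7 left over
// from the previous call plus at most 32 new ones, so 40 slots suffice.
var buffB = new byte[40];
var bitsBuffLength = 0;

void WriteBits(uint x, byte numberOfBits)
{
    // Append the lowest numberOfBits bits of x, most significant first.
    for (int k = numberOfBits - 1; k >= 0; k--)
        buffB[bitsBuffLength++] = (byte)((x >> k) & 1);

    // Emit every complete group of 8 bits as one byte.
    int whole = (bitsBuffLength >> 3) << 3;
    for (int i = 0; i < whole; i += 8)
        bw.Write((byte)(buffB[i] << 7 | buffB[i + 1] << 6 | buffB[i + 2] << 5 | buffB[i + 3] << 4
                      | buffB[i + 4] << 3 | buffB[i + 5] << 2 | buffB[i + 6] << 1 | buffB[i + 7]));

    // Move the remaining tail (fewer than 8 bits) to the front: no new array, no Array.Resize.
    for (int i = whole; i < bitsBuffLength; i++) buffB[i - whole] = buffB[i];
    bitsBuffLength -= whole;
}

// Hypothetical helper: pads the last partial byte with zero bits.
void Flush()
{
    if (bitsBuffLength > 0) WriteBits(0, (byte)(8 - bitsBuffLength));
    bw.Flush();
}

WriteBits(0b101, 3);
WriteBits(0b11110, 5);   // completes the first byte: 10111110
WriteBits(0xABC, 12);    // 1010 1011 | 1100 stays in the tail
Flush();
Console.WriteLine(BitConverter.ToString(stream.ToArray()));  // BE-AB-C0
```

Because the buffer never grows beyond 40 slots, the only per-call allocations of the slide versions (bits and tmpB) disappear entirely in this variant.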
#8 Performance Summary
 | Execution time | Memory (MB)
Old | 36 s 631 ms | 10,510
New | 32 s 551 ms | 9,754
Diff | -4 s 080 ms | -756
% | 11.14 % | 7.19 %
Old/New | x 1.13 | x 1.08
#9 ScaleGrad - Can It Be Faster?
/*
Returns the index of number x
in an ordered scale
*/
int ScaleGrad(int x)
Potential Improvements
Avoid comparing int and double
values
Scale is a sorted array, so we
can use binary search; it is more
efficient and less dependent
on the input data
static double[] Scale;
…
/* 600+ lines of code */
…
int ScaleGrad(int x)
{
    int i = 0;
    while (i < Scale.Length && Scale[i] <= x) i++; // x is converted to double on every comparison
    return i - 1;
}
Comparison
Before:
static double[] Scale;
/* 600+ lines of code */
int ScaleGrad(int x)
{
    int i = 0;
    while (i < Scale.Length && Scale[i] <= x) i++;
    return i - 1;
}
After:
static int[] Scale;
/* 600+ lines of code */
int ScaleGrad(int x)
{
    var left = 0; var right = Scale.Length - 1;
    while (left <= right)
    {
        var mid = left + ((right - left) >> 1); // (left + right) / 2 without overflow
        if (x < Scale[mid]) right = mid - 1;
        else left = mid + 1;
    }
    return right; // last index with Scale[index] <= x, or -1, as in the linear version
}
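A runnable check (C# 9+ top-level program) that the binary search returns the same index as the linear scan for every x: the index of the last scale value that is ≤ x, or -1 when x is below the first value. The scale values are made up. Returning `right` after the loop, rather than the last probed `mid`, is what makes the two versions agree.

```csharp
using System;

int[] scale = { 0, 10, 20, 30, 50, 80, 130 };   // sorted, made-up values

// Linear scan: index of the last element <= x, or -1.
int ScaleGradLinear(int x)
{
    int i = 0;
    while (i < scale.Length && scale[i] <= x) i++;
    return i - 1;
}

// Binary search for the same index in O(log n) comparisons.
int ScaleGradBinary(int x)
{
    int left = 0, right = scale.Length - 1;
    while (left <= right)
    {
        int mid = left + ((right - left) >> 1);   // avoids overflow of left + right
        if (x < scale[mid]) right = mid - 1;      // everything from mid on is > x
        else left = mid + 1;                      // everything up to mid is <= x
    }
    return right;                                 // last index with scale[index] <= x
}

var mismatches = 0;
for (int x = -5; x <= 140; x++)
    if (ScaleGradLinear(x) != ScaleGradBinary(x)) mismatches++;
Console.WriteLine($"mismatches: {mismatches}");   // mismatches: 0
```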
#9 Performance Summary
 | Execution time | Memory (MB)
Old | 32 s 551 ms | 9,754
New | 28 s 844 ms | 9,754
Diff | -3 s 707 ms | 0
% | 11.39 % | 0.00 %
Old/New | x 1.13 | x 1.00
#10 ReadData - Can It Be Faster?
ReadInt32( )
Used in the main processing
function, so we shouldn't touch
it right now.
ReadUInt32( )
In both cases, the binary file
is read and the number of binary
chains is calculated from the
data read.
Potential Improvements
Don't allocate and collect
unused data in a list
Read UInt64 values instead of
UInt32. This cuts the loop
length in half, but we have to
check whether the number of
values is odd
using(var br = new BinaryReader(…))
{
    var list = new List<uint>();
    var length = br.BaseStream.Length - 4;
    while (br.BaseStream.Position <= length) {
        br.ReadUInt64();
        var n = br.ReadUInt32();
        list.Add(n);
        for (int i = 0; i < n; i++)
            br.ReadUInt32();
    }
}
Comparison
Before:
using(var br = new BinaryReader(…))
{
    var list = new List<uint>();
    var length = br.BaseStream.Length - 4;
    while (br.BaseStream.Position <= length) {
        br.ReadUInt64();
        var n = br.ReadUInt32();
        list.Add(n);
        for (int i = 0; i < n; i++) br.ReadUInt32();
    }
}
After:
using(var br = new BinaryReader(…))
{
    var length = br.BaseStream.Length - 4;
    while (br.BaseStream.Position <= length) {
        br.ReadUInt64();
        var n = br.ReadUInt32();
        for (int i = 0; i < n >> 1; i++)
            br.ReadUInt64();                  // two values per call
        if ((n & 1) == 1) br.ReadUInt32();   // odd n: one value left
    }
}
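A sketch (C# 9+ top-level program) that runs the one-value-at-a-time loop and the paired-read loop over the same data, including odd and even n, to check that both walk the stream identically. The layout (8-byte header, 32-bit count n, n 32-bit values) is the one implied by the reads above; the chain lengths and the helper names are made up.

```csharp
using System;
using System.IO;

// Sample data: [ulong header][uint n][n x uint] per chain; values are made up.
byte[] MakeSample(params uint[] chainLengths)
{
    var ms = new MemoryStream();
    var bw = new BinaryWriter(ms);
    foreach (var n in chainLengths)
    {
        bw.Write(0UL);                              // 8-byte header
        bw.Write(n);
        for (uint i = 0; i < n; i++) bw.Write(i);
    }
    bw.Flush();
    return ms.ToArray();
}

// Skips the payload of every chain and returns how many chains were found.
int CountChains(byte[] data, bool pairedReads)
{
    var chains = 0;
    using (var br = new BinaryReader(new MemoryStream(data)))
    {
        var length = br.BaseStream.Length - 4;
        while (br.BaseStream.Position <= length)
        {
            chains++;
            br.ReadUInt64();
            var n = br.ReadUInt32();
            if (pairedReads)
            {
                for (int i = 0; i < n >> 1; i++) br.ReadUInt64();   // two values per call
                if ((n & 1) == 1) br.ReadUInt32();                  // odd n: one value left
            }
            else
            {
                for (int i = 0; i < n; i++) br.ReadUInt32();
            }
        }
    }
    return chains;
}

var sample = MakeSample(0, 1, 2, 7, 10);
Console.WriteLine(CountChains(sample, pairedReads: false));  // 5
Console.WriteLine(CountChains(sample, pairedReads: true));   // 5
```

If the odd-n check were missing, the paired loop would drift out of alignment after the chain with n = 7 and misread every following header.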
#10 Performance Summary
 | Execution time | Memory (MB)
Old | 28 s 844 ms | 9,754
New | 24 s 048 ms | 9,690
Diff | -4 s 796 ms | -64
% | 16.63 % | 0.66 %
Old/New | x 1.20 | x 1.01
#11 List Usage - Can It Be Faster?
There is a lot of generic List usage
(creating new list instances, adding
elements, iterating).
It may be a convention or a common
approach, but it is costly,
even when you just get an element by
index.
var list = new List<int>();
var item = list[i];
Potential Improvement #1
List<Point> Foo(List<Point> Points)
{
var R = new List<Point>();
foreach (var P in Points)
R.Add(new Point(P.X, ProcessPoint(P.Y)));
return R;
}
Favor Arrays Over Lists
Use arrays instead of lists if
possible. Plain array indexing
avoids the Add method and the [ ]
indexer of the list, which adds its
own range check (see the indexer below)
Pass the capacity to the list
constructor if it is known; it
allows you to add elements
without resizing the internal array
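A small illustration of the capacity advice (C# 9+ top-level program): when the final count is known, passing it to the List<T> constructor means Add never has to grow (reallocate and copy) the internal array. The sketch detects growth through List<T>.Capacity, which reports the size of the internal array; the element count is arbitrary.

```csharp
using System;
using System.Collections.Generic;

const int count = 10_000;

// Counts how many times the internal array is reallocated while adding.
int CountGrowths(List<int> list)
{
    var growths = 0;
    var capacity = list.Capacity;
    for (int i = 0; i < count; i++)
    {
        list.Add(i);
        if (list.Capacity != capacity) { growths++; capacity = list.Capacity; }
    }
    return growths;
}

var withoutCapacity = CountGrowths(new List<int>());
var withCapacity = CountGrowths(new List<int>(count));
Console.WriteLine($"no capacity: {withoutCapacity} reallocations");
Console.WriteLine($"capacity {count}: {withCapacity} reallocations");   // 0
```

Each reallocation also copies all elements added so far, so the savings grow with the list size.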
public T this[int index] {
get {
if ((uint) index >= (uint)_size)
ThrowHelper.ThrowArgumentOutOfRangeException();
Contract.EndContractBlock();
return _items[index];
}
set {
if ((uint) index >= (uint)_size)
ThrowHelper.ThrowArgumentOutOfRangeException();
Contract.EndContractBlock();
_items[index] = value;
version++;
}
}
Potential Improvement #2
List<Point> Foo(List<Point> Points)
{
var R = new List<Point>();
foreach (var P in Points)
R.Add(new Point(P.X, ProcessPoint(P.Y)));
return R;
}
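Favor Arrays Over Lists
Use a simple for instead of foreach to avoid:
- the GetEnumerator() call (emitted as callvirt;
List<T> returns a struct enumerator, so there is
no boxing while the variable is typed as List<T>)
- the instance calls to get_Current( )
and the somewhat complex
MoveNext( ), which also checks the list version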
…
callvirt instance valuetype GetEnumerator()
…
// loop start
…
call instance !0 valuetype get_Current()
…
call instance bool valuetype MoveNext()
…
// end loop
Comparison
List<Point> Foo(List<Point> Points)
{
var R = new List<Point>();
foreach (var P in Points)
R.Add(new Point(P.X,
ProcessPoint(P.Y)));
return R;
}
Point[] Foo(List<Point> Points)
{
var n = Points.Count;
var R = new Point[n];
for(var i = 0; i < n; i++)
R[i] = new Point(Points[i].X,
ProcessPoint(Points[i].Y));
return R;
}
#11 Performance Summary
 | Execution time | Memory (MB)
Old | 24 s 048 ms | 9,690
New | 18 s 333 ms | 7,842
Diff | -5 s 715 ms | -1,848
% | 23.76 % | 19.07 %
Old/New | x 1.31 | x 1.24
#12 WriteBits - Can It Be Faster?
WriteBits is near the top again, with 64.56 % own time, so let's try to optimize it.
Potential Improvement #1
void WriteBits(BinaryWriter bw, uint x, byte numberOfBits)
{
    byte[] bits = GetBits(x, numberOfBits);
    byte[] tmpB = new byte[BitsBuffLength + bits.Length];
    for (int i = 0; i < BitsBuffLength; i++) tmpB[i] = buffB[i];
    for (int i = 0; i < bits.Length; i++) tmpB[i + BitsBuffLength] = bits[i];
    buffB = tmpB; // static byte array that stores bits not yet written
    for (int i = 0; i < buffB.Length >> 3; i++) {
        int j = i << 3;
        bw.Write((byte)((buffB[j] << 7) + (buffB[j+1] << 6) + (buffB[j+2] << 5) + (buffB[j+3] << 4)
                      + (buffB[j+4] << 3) + (buffB[j+5] << 2) + (buffB[j+6] << 1) + buffB[j+7]));
    }
    int L = (buffB.Length >> 3) << 3;
    for (int i = L; i < buffB.Length; i++) { buffB[i - L] = buffB[i]; }
    BitsBuffLength = tmpB.Length - L;
}
Don't Forget to Use New Features
We forgot that GetBits can fill an existing array at an offset.
Using it here prevents redundant array copying.
Before:
byte[] bits = GetBits(x, numberOfBits);
byte[] tmpB = new byte[BitsBuffLength + bits.Length];
for (int i = 0; i < BitsBuffLength; i++)
    tmpB[i] = buffB[i];
for (int i = 0; i < bits.Length; i++)
    tmpB[i + BitsBuffLength] = bits[i];
After:
byte[] tmpB = new byte[BitsBuffLength + numberOfBits];
GetBits(x, numberOfBits, tmpB, BitsBuffLength);
for (int i = 0; i < BitsBuffLength; i++)
    tmpB[i] = buffB[i];
Potential Improvement #2
void WriteBits(BinaryWriter bw, uint x, byte numberOfBits)
{
    byte[] bits = GetBits(x, numberOfBits);
    byte[] tmpB = new byte[BitsBuffLength + bits.Length];
    for (int i = 0; i < BitsBuffLength; i++) tmpB[i] = buffB[i];
    for (int i = 0; i < bits.Length; i++) tmpB[i + BitsBuffLength] = bits[i];
    buffB = tmpB; // static byte array that stores bits not yet written
    for (int i = 0; i < buffB.Length >> 3; i++) {
        int j = i << 3;
        bw.Write((byte)((buffB[j] << 7) + (buffB[j+1] << 6) + (buffB[j+2] << 5) + (buffB[j+3] << 4)
                      + (buffB[j+4] << 3) + (buffB[j+5] << 2) + (buffB[j+6] << 1) + buffB[j+7]));
    }
    int L = (buffB.Length >> 3) << 3;
    for (int i = L; i < buffB.Length; i++) { buffB[i - L] = buffB[i]; }
    BitsBuffLength = tmpB.Length - L;
}
Even Small Operations Can Have
Significant Impacts
On each iteration we calculate i << 3 as the
offset into buffB. Instead, we can step the
counter by 8 and use it directly as the offset
of the 8 elements we need
It's not critical, but we also change the +
operation to |, which needs no parentheses because << binds tighter than |
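Switching from + to | also matters for correctness, not only style: in C#, + binds tighter than <<, so `a<<7+b<<6` is parsed as `a << (7+b) << 6`, while << binds tighter than |, so the | version needs no parentheses. A small check (C# 9+ top-level program) with made-up bit values:

```csharp
using System;

byte[] b = { 1, 0, 1, 1, 1, 1, 1, 0 };   // bits of 0xBE, most significant first

// Parsed as b[0] << (7 + b[1]) << (6 + b[2]) << ... : not the intended value.
int withPlus = b[0]<<7+b[1]<<6+b[2]<<5+b[3]<<4+b[4]<<3+b[5]<<2+b[6]<<1+b[7];

// Parenthesized +, or plain | (lower precedence than <<), both build the byte correctly.
int withParens = (b[0] << 7) + (b[1] << 6) + (b[2] << 5) + (b[3] << 4)
               + (b[4] << 3) + (b[5] << 2) + (b[6] << 1) + b[7];
int withOr = b[0] << 7 | b[1] << 6 | b[2] << 5 | b[3] << 4
           | b[4] << 3 | b[5] << 2 | b[6] << 1 | b[7];

Console.WriteLine($"parens: {withParens:X}, or: {withOr:X}, unparenthesized +: {withPlus:X}");
```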
Comparison
Before:
for (int i = 0; i < buffB.Length >> 3; i++) {
    int j = i << 3;
    bw.Write(
        (byte)((buffB[j] << 7) + (buffB[j+1] << 6)
             + (buffB[j+2] << 5) + (buffB[j+3] << 4)
             + (buffB[j+4] << 3) + (buffB[j+5] << 2)
             + (buffB[j+6] << 1) + buffB[j+7]));
}
After:
for (int i = 0; i < (buffB.Length >> 3) << 3; i += 8) {
    bw.Write(
        (byte)(buffB[i] << 7 | buffB[i+1] << 6
             | buffB[i+2] << 5 | buffB[i+3] << 4
             | buffB[i+4] << 3 | buffB[i+5] << 2
             | buffB[i+6] << 1 | buffB[i+7]));
}
WriteBits
void WriteBits(BinaryWriter bw, uint x, byte numberOfBits)
{
    byte[] tmpB = new byte[BitsBuffLength + numberOfBits];
    GetBits(x, numberOfBits, tmpB, BitsBuffLength);              // new bits go after the tail
    for (int i = 0; i < BitsBuffLength; i++) tmpB[i] = buffB[i]; // old tail goes in front
    buffB = tmpB; // static byte array that stores bits not yet written
    int L = (buffB.Length >> 3) << 3;                             // bits that form whole bytes
    for (int i = 0; i < L; i += 8)
        bw.Write((byte)(buffB[i] << 7 | buffB[i+1] << 6 | buffB[i+2] << 5 | buffB[i+3] << 4
                      | buffB[i+4] << 3 | buffB[i+5] << 2 | buffB[i+6] << 1 | buffB[i+7]));
    for (int i = L; i < buffB.Length; i++) { buffB[i - L] = buffB[i]; } // move the tail to the front
    BitsBuffLength = tmpB.Length - L;
}
#12 Performance Summary
 | Execution time | Memory (MB)
Old | 18 s 333 ms | 7,842
New | 16 s 367 ms | 5,525
Diff | -1 s 966 ms | -2,317
% | 10.72 % | 29.55 %
Old/New | x 1.12 | x 1.42
#13 Redundant Calls
Before:
uint statistics1 …;
uint statistics2 …;
…
uint x = …;
…
/* byte NumberOfBits(uint x) */
statistics1 += NumberOfBits(x);
statistics2 += 28 - NumberOfBits(x);   // NumberOfBits is called twice for the same x
After:
uint statistics1 …;
uint statistics2 …;
…
uint x = …;
…
/* byte NumberOfBits(uint x) */
uint xn = NumberOfBits(x);             // call once, reuse the result
statistics1 += xn;
statistics2 += 28 - xn;
#13 Performance Summary
 | Execution time | Memory (MB)
Old | 16 s 367 ms | 5,525
New | 15 s 036 ms | 4,424
Diff | -1 s 331 ms | -1,101
% | 8.13 % | 19.93 %
Old/New | x 1.09 | x 1.25
Summary
dotMemory Report
Timeline Report Comparison
#6-#13 Performance Summary
 | Execution time | Memory (MB)
Old | 1 m 10 s 167 ms | 103,775
New | 15 s 036 ms | 4,424
Diff | -55 s 131 ms | -99,351
% | 78.57 % | 95.74 %
Old/New | x 4.67 | x 23.46
Performance Summary
 | Execution time | Memory (MB) | Peak (MB)
Old | 43 m 21 s 642 ms | 374,011 | 93.26
New | 15 s 036 ms | 4,424 | 33.67
Diff | -43 m 06 s 606 ms | -369,587 | -59.59
% | 99.42 % | 98.82 % | 63.89 %
Old/New | x 173.02 | x 84.54 | x 2.77
Summary
Thank you!
What was the application doing
for 43 minutes?
Q&A
LINKS
Use dotTrace Command-Line Profiler
Hashtable and dictionary collection types
.NET Performance Optimization & Profiling with JetBrains dotTrace
Why GC run when using a struct as a generic dictionary
Matt Ellis. Writing Allocation Free Code in C#
Konrad Kokosa. High-performance code design patterns in C#
Maarten Balliauw. Let's refresh our memory! Memory management in .NET
Sasha Goldshtein. Pro .NET Performance: Optimize Your C# Applications
Ben Watson. Writing High-Performance .NET Code, 2nd Edition
Writing Faster Managed Code: Know What Things Cost
Maarten Balliauw
Sasha Goldshtein
Yevhen Tatarynov GitHub
Linq.Concat
Linq.Concat Implementation
Buffer.BlockCopy
Generic List implementation
Optimizing software in C++
Denis Reznik video
Array.Sort

Workshop "Can my .NET application use less CPU / RAM?", Yevhen Tatarynov

  • 14. dotTrace TimeLine Sample & Snapshot Execution Time
  • 15. dotTrace TimeLine Snapshot Execution Time
  • 16. Sequential Runs Impact Execution time HDD Execution time SSD 1st 2nd Diff % 2,806,777 ms (46 m 47 s 777 ms) 2,742,178 ms (45 m 42 s 178 ms) 2,744,085 ms (45 m 44 s 085 ms) 2,601,642 ms (43 m 21 s 642 ms) 62,692 ms (01 m 03 s 692 ms) 140,536 ms (02 m 20 s 536 ms) 2,23 % 5,12 %
  • 17. Experiment Results Breakage Factors Drive read / write speed, cache, type (HDD, SSD) CPU base frequency, cache, burn time, turbo boost, power supply schema Anti-malware software Scheduled process, system updates System file cache, file fragmentation Total system load (CPU, RAM, Drive)
  • 18. Experiment Results Breakage Factors Disable anti-malware software, system schedules, system updates Set high-performance power supply schema if you are using a laptop Restart OS to end redundant processes and clear caches Wait until it's fully loaded Run application on SSD drive types Run application twice on the same input data In the analysis, we will use the results from the 2nd execution
  • 19. Is It an Issue?
  • 20. #1 Is Linq an Issue?
  • 21. Potential Improvements Used .ToArray() - slow Concat use foreach and extra memory to iterate input params Each time we produce new byte array. Redundant memory traffic. var b = new byte[]; for (int i = 0; i < N; i++) { byte[] a = new byte[GetLen(i)]; /* fill a with values */ b = b.Concat(a).ToArray(); } return b;
  • 22. Linq.Concat static IEnumerable<TSource> Concat<TSource>(this IEnumerable<TSource> first, IEnumerable<TSource> second) { if (first == null) throw Error.ArgumentNull("first"); if (second == null) throw Error.ArgumentNull("second"); return ConcatIterator<TSource>(first, second); } static IEnumerable<TSource> ConcatIterator<TSource>(IEnumerable<TSource> first, IEnumerable<TSource> second){ foreach (TSource element in first) yield return element; foreach (TSource element in second) yield return element;
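  • 23. .ToArray(): the Concat iterator is not an ICollection<T>, so Buffer<T> grows its internal array by doubling while it enumerates, and then usually copies the items once more into an exact-size result array.
```csharp
static TSource[] ToArray<TSource>(this IEnumerable<TSource> source)
{
    if (source == null) throw Error.ArgumentNull("source");
    return new Buffer<TSource>(source).ToArray();
}

// Buffer<TElement>.ToArray()
internal TElement[] ToArray()
{
    if (count == 0) return new TElement[0];
    if (items.Length == count) return items;
    TElement[] result = new TElement[count];
    Array.Copy(items, 0, result, 0, count);
    return result;
}
```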
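  • 24. Solution: use one array to store the concatenation. To copy each array into the result, use Buffer.BlockCopy, which is faster than Array.Copy.
```csharp
var b = new byte[maxN]; // maxN: upper bound of the total length
var bN = 0;             // number of bytes written so far
for (int i = 0; i < N; i++)
{
    var aN = GetLen(i);
    byte[] a = new byte[aN];
    /* fill a with values */
    Buffer.BlockCopy(a, 0, b, bN, aN);
    bN += aN;
}
return b; // only the first bN bytes are meaningful
```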
  • 25. Buffer.BlockCopy
```csharp
public static void BlockCopy(Array src, int srcOffset, Array dst, int dstOffset, int count);
```
Copies a specified number of bytes from a source array starting at a particular offset to a destination array starting at a particular offset.
    - src (Array): the source buffer.
    - srcOffset (Int32): the zero-based byte offset into src.
    - dst (Array): the destination buffer.
    - dstOffset (Int32): the zero-based byte offset into dst.
    - count (Int32): the number of bytes to copy.
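Because the offsets and count of Buffer.BlockCopy are byte counts, they equal element indices only for byte[]. For other primitive arrays, they have to be scaled by the element size, as this small sketch shows (CopyInts is a hypothetical helper). Buffer.BlockCopy also accepts only arrays of primitive types.

```csharp
using System;

static class BlockCopyDemo
{
    // Copies `count` ints from src[srcIndex..] to dst[dstIndex..].
    // BlockCopy works in bytes, so every index and the count are multiplied by 4.
    public static void CopyInts(int[] src, int srcIndex, int[] dst, int dstIndex, int count)
    {
        const int size = sizeof(int); // 4 bytes per element
        Buffer.BlockCopy(src, srcIndex * size, dst, dstIndex * size, count * size);
    }
}
```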
  • 26. Comparison
Before:
```csharp
var b = new byte[0];
for (int i = 0; i < N; i++)
{
    byte[] a = new byte[GetLen(i)];
    /* fill a with values */
    b = b.Concat(a).ToArray();
}
return b;
```
After:
```csharp
var b = new byte[maxN];
var bN = 0;
for (int i = 0; i < N; i++)
{
    var aN = GetLen(i);
    byte[] a = new byte[aN];
    /* fill a with values */
    Buffer.BlockCopy(a, 0, b, bN, aN);
    bN += aN;
}
return b;
```
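A self-contained sketch of the two concatenation strategies compared above, useful for checking that they produce the same bytes. ConcatDemo and its methods are hypothetical names. Unlike slide 24, the sketch sizes the result exactly from the parts instead of using an upper bound maxN, so no unused bytes are left at the end.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ConcatDemo
{
    // Slow pattern: each Concat(...).ToArray() copies everything accumulated so far again.
    public static byte[] ConcatWithLinq(IList<byte[]> parts)
    {
        var b = new byte[0];
        for (int i = 0; i < parts.Count; i++)
            b = b.Concat(parts[i]).ToArray();
        return b;
    }

    // Faster pattern: one result array, each part copied exactly once.
    public static byte[] ConcatWithBlockCopy(IList<byte[]> parts)
    {
        int total = 0;
        for (int i = 0; i < parts.Count; i++) total += parts[i].Length;

        var b = new byte[total];
        int bN = 0; // bytes written so far
        for (int i = 0; i < parts.Count; i++)
        {
            var a = parts[i];
            Buffer.BlockCopy(a, 0, b, bN, a.Length); // for byte[], byte offsets equal indices
            bN += a.Length;
        }
        return b;
    }
}
```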
  • 27. #1 Performance Summary

| Metric | Old | New | Reduction | Reduction % | Old / New |
|---|---|---|---|---|---|
| Execution time | 43 m 21 s 642 ms | 4 m 56 s 414 ms | 38 m 25 s 228 ms | 88.61 % | × 8.78 |
| Memory (MB) | 374,011 | 120,333 | 253,678 | 67.83 % | × 3.11 |
  • 28. #2 Is GetBits an Issue?
```csharp
/* Returns the last N (N > 0) bits of X in a byte array */
byte[] GetBits(uint x, byte n)
```
  • 29. Potential Improvements
```csharp
byte[] GetBits(uint x, byte n)
{
    var a = new byte[n];
    var D = (uint)Math.Pow(2, n - 1);
    for (var i = 0; i < n; i++)
    {
        a[i] = (byte)(x / D);
        x -= a[i] * D;
        D /= 2;
    }
    return a;
}
```
    - Math.Pow uses floating-point operations to calculate the result.
    - The int parameter is converted to double, and the double result is converted back to uint.
    - It uses the * and / operations.
  • 30. Optimizing Software in C++ by Agner Fog: typical execution time in clock cycles

| Operation | Min | Max |
|---|---|---|
| int add | 1 | 1 |
| int sub | 1 | 1 |
| int shift | 1 | 1 |
| int mul | 3 | 4 |
| int div | 40 | 80 |
| floating point add | 3 | 6 |
| floating point mul | 4 | 8 |
| floating point div | 14 | 45 |

Conversion of signed integers to floating point is fast only when the SSE2 instruction set is enabled. Conversion of unsigned integers to floating point is fast only when the AVX512 instruction set is enabled. A conversion from floating point to integer without SSE2 typically takes 40 clock cycles.
  • 31. Solution
```csharp
byte[] GetBits(uint x, byte n)
{
    var a = new byte[n];
    var i = n - 1;
    while (x != 0)
    {
        a[i] = (byte)(x & 1); // lowest remaining bit goes to the rightmost free position
        x = x >> 1;
        i--;
    }
    return a;
}
```
    - Use the binary representation of integers.
    - Don't use Math.Pow.
    - Use only integer operands to avoid conversions.
    - Use the bitwise operations >> and &.
  • 32. Comparison
Before:
```csharp
byte[] GetBits(uint x, byte n)
{
    var a = new byte[n];
    var D = (uint)Math.Pow(2, n - 1);
    for (var i = 0; i < n; i++)
    {
        a[i] = (byte)(x / D);
        x -= a[i] * D;
        D /= 2;
    }
    return a;
}
```
After:
```csharp
byte[] GetBits(uint x, byte n)
{
    var a = new byte[n];
    var i = n - 1;
    while (x != 0)
    {
        a[i] = (byte)(x & 1);
        x = x >> 1;
        i--;
    }
    return a;
}
```
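The two GetBits versions can be checked against each other with a small sketch (the class name BitsDemo is hypothetical). Both return the bits most significant first. The bitwise version assumes x fits into n bits, which holds for the callers shown later; otherwise the index i would drop below zero.

```csharp
using System;

static class BitsDemo
{
    // Floating-point version: Math.Pow plus integer division.
    public static byte[] GetBitsPow(uint x, byte n)
    {
        var a = new byte[n];
        var d = (uint)Math.Pow(2, n - 1);
        for (var i = 0; i < n; i++)
        {
            a[i] = (byte)(x / d);
            x -= a[i] * d;
            d /= 2;
        }
        return a;
    }

    // Bitwise version: shift and mask only. Assumes x < 2^n.
    public static byte[] GetBitsShift(uint x, byte n)
    {
        var a = new byte[n];
        var i = n - 1;
        while (x != 0)
        {
            a[i] = (byte)(x & 1);
            x >>= 1;
            i--;
        }
        return a;
    }
}
```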
  • 33. #2 Performance Summary

| Metric | Old | New | Reduction | Reduction % | Old / New |
|---|---|---|---|---|---|
| Execution time | 4 m 56 s 414 ms | 3 m 53 s 422 ms | 1 m 02 s 992 ms | 21.25 % | × 1.27 |
| Memory (MB) | 120,333 | 120,333 | 0 | 0.00 % | × 1.00 |
  • 34. #3 Is AnyPathHasIllegalCharacters an Issue? GetFileName indirectly calls AnyPathHasIllegalCharacters. GetFileName is used to sort the names of the files (format Name_ddddd.txt) located in a specific folder. The files have to be sorted by the numeric ddddd part of the name. There are more than 13,000 files (Name_1.txt … Name_13000.txt).
  • 35. Potential Improvements: an O(N²) exchange sort (a bubble-sort variant) that repeats the same work for every comparison.
```csharp
for (var i = 0; i < FileNameArray.Length - 1; i++)
    for (var j = i + 1; j < FileNameArray.Length; j++)
    {
        var S1 = Path.GetFileName(FileNameArray[i]);
        var i_N = int.Parse(S1.Substring(5, S1.LastIndexOf('.') - 5)); // skip "Name_"
        S1 = Path.GetFileName(FileNameArray[j]);
        var j_N = int.Parse(S1.Substring(5, S1.LastIndexOf('.') - 5));
        if (i_N > j_N)
        {
            S1 = FileNameArray[i];
            FileNameArray[i] = FileNameArray[j];
            FileNameArray[j] = S1;
        }
    }
```
    - Redundant GetFileName calls
    - Redundant Parse calls
    - Redundant Substring calls
    - Redundant LastIndexOf calls
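  • 36. Solution: Array.Sort (quicksort, O(N log N) on average) with a comparer that doesn't call GetFileName, Substring, Parse or LastIndexOf. All paths share the folder, the Name_ prefix and the .txt extension, and the numbers have no leading zeros, so a longer path means a larger number; paths of equal length differ only in their digits, so a plain string comparison orders them numerically.
```csharp
Array.Sort(FileNameArray, new NumericComparer());

public class NumericComparer : IComparer<string>
{
    public int Compare(string x, string y)
    {
        var result = x.Length.CompareTo(y.Length);
        if (result == 0) return x.CompareTo(y);
        return result;
    }
}
```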
  • 37. Comparison
Before:
```csharp
for (var i = 0; i < FileNameArray.Length - 1; i++)
    for (var j = i + 1; j < FileNameArray.Length; j++)
    {
        var S1 = Path.GetFileName(FileNameArray[i]);
        var i_N = int.Parse(S1.Substring(5, S1.LastIndexOf('.') - 5));
        S1 = Path.GetFileName(FileNameArray[j]);
        var j_N = int.Parse(S1.Substring(5, S1.LastIndexOf('.') - 5));
        if (i_N > j_N)
        {
            S1 = FileNameArray[i];
            FileNameArray[i] = FileNameArray[j];
            FileNameArray[j] = S1;
        }
    }
```
After:
```csharp
Array.Sort(FileNameArray, new NumericComparer());

public class NumericComparer : IComparer<string>
{
    public int Compare(string x, string y)
    {
        var result = x.Length.CompareTo(y.Length);
        if (result == 0) return x.CompareTo(y);
        return result;
    }
}
```
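A compilable variant of the comparer, as a sketch. It implements the generic IComparer<string>, because the non-generic IComparer would require Compare(object, object). It also swaps the culture-sensitive CompareTo for string.CompareOrdinal; this is an assumption that is safe here because equal-length names differ only in ASCII digits. The sample paths are made up.

```csharp
using System.Collections.Generic;

// Orders paths such as @"C:\data\Name_9.txt" before @"C:\data\Name_10.txt".
// Valid only if all paths share the folder, prefix and extension,
// and the numbers have no leading zeros.
sealed class NumericComparer : IComparer<string>
{
    public int Compare(string x, string y)
    {
        int result = x.Length.CompareTo(y.Length);          // more digits => larger number
        return result != 0 ? result : string.CompareOrdinal(x, y); // same length => digit by digit
    }
}
```

Usage: `Array.Sort(fileNames, new NumericComparer());` sorts in place with no parsing and no per-comparison allocations.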
  • 38. #3 Performance Summary

| Metric | Old | New | Reduction | Reduction % | Old / New |
|---|---|---|---|---|---|
| Execution time | 3 m 53 s 422 ms | 2 m 11 s 091 ms | 1 m 42 s 331 ms | 43.84 % | × 1.78 |
| Memory (MB) | 120,333 | 103,775 | 16,558 | 13.76 % | × 1.16 |
  • 39. #4 Is FileStream.get_Length an Issue? In both cases, the binary file is read, and the number of binary chains is calculated from the data read.
  • 40. Potential Improvements
```csharp
using (var br = new BinaryReader(…))
{
    while (br.BaseStream.Position <= br.BaseStream.Length - 4)
    {
        counter++;
        br.ReadUInt32();
        br.ReadUInt32();
        var n = br.ReadUInt32();
        for (int i = 0; i < n; i++) br.ReadUInt32();
    }
}
```
    - Redundant Length calls
    - Redundant subtraction
    - Redundant ReadUInt32 call
  • 41. Solution: store Length in a local variable, and call ReadUInt64 once instead of ReadUInt32 twice.
```csharp
using (var br = new BinaryReader(…))
{
    var length = br.BaseStream.Length - 4;
    while (br.BaseStream.Position <= length)
    {
        counter++;
        br.ReadUInt64();
        var n = br.ReadUInt32();
        for (int i = 0; i < n; i++) br.ReadUInt32();
    }
}
```
  • 42. Comparison
Before:
```csharp
using (var br = new BinaryReader(…))
{
    while (br.BaseStream.Position <= br.BaseStream.Length - 4)
    {
        counter++;
        br.ReadUInt32();
        br.ReadUInt32();
        var n = br.ReadUInt32();
        for (int i = 0; i < n; i++) br.ReadUInt32();
    }
}
```
After:
```csharp
using (var br = new BinaryReader(…))
{
    var length = br.BaseStream.Length - 4;
    while (br.BaseStream.Position <= length)
    {
        counter++;
        br.ReadUInt64();
        var n = br.ReadUInt32();
        for (int i = 0; i < n; i++) br.ReadUInt32();
    }
}
```
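The counting loop from slide 41 as a runnable sketch over any Stream, so it can be tried on an in-memory file. The record layout (8 header bytes, a UInt32 count n, then n UInt32 values) is inferred from the loop itself, and ChainCounter is a hypothetical name.

```csharp
using System.IO;

static class ChainCounter
{
    // Assumed record layout: two UInt32 header values, a UInt32 count n, then n UInt32 values.
    public static int CountChains(Stream input)
    {
        int counter = 0;
        using (var br = new BinaryReader(input))
        {
            long length = br.BaseStream.Length - 4; // read Length once, outside the loop
            while (br.BaseStream.Position <= length)
            {
                counter++;
                br.ReadUInt64();               // both header values in one call
                uint n = br.ReadUInt32();
                for (uint i = 0; i < n; i++)
                    br.ReadUInt32();           // skip the chain's values
            }
        }
        return counter;
    }
}
```

Because it accepts any Stream, the same method works on a FileStream; the BinaryReader disposes the stream when the using block ends.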
  • 43. #4 Performance Summary

| Metric | Old | New | Reduction | Reduction % | Old / New |
|---|---|---|---|---|---|
| Execution time | 2 m 11 s 091 ms | 1 m 39 s 305 ms | 31 s 786 ms | 24.25 % | × 1.32 |
| Memory (MB) | 103,775 | 103,775 | 0 | 0.00 % | × 1.00 |
  • 44. #5 Is Math.Log an Issue?
```csharp
/* Calculate the number of bits needed to store number X.
   For 0 it should return 1; for 00101110b it should return 6. */
byte NumberOfBits(uint X)
```
  • 45. Potential Improvements
```csharp
byte NumberOfBits(uint X)
{
    if (X > 0) return (byte)Math.Ceiling(Math.Log(X + 1, 2));
    return 1;
}
```
    - Math.Log uses floating-point operations to calculate the logarithm.
    - The double result is converted to byte.
    - The uint parameter is converted to double.
  • 46. Solution: just count the bits of X with a bitwise shift to avoid floating-point arithmetic; avoid type conversion; avoid Math.Log.
```csharp
byte NumberOfBits(uint X)
{
    var x = X; // stay unsigned: (int)X is negative for X >= 2^31, and >> would never reach 0
    byte counter = 0;
    do
    {
        counter++;
        x = x >> 1;
    } while (x != 0);
    return counter;
}
```
  • 47. Comparison
Before:
```csharp
byte NumberOfBits(uint X)
{
    if (X > 0) return (byte)Math.Ceiling(Math.Log(X + 1, 2));
    return 1;
}
```
After:
```csharp
byte NumberOfBits(uint X)
{
    var x = X;
    byte counter = 0;
    do
    {
        counter++;
        x = x >> 1;
    } while (x != 0);
    return counter;
}
```
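A runnable version of the integer-only NumberOfBits with a few checks taken from its specification (1 for 0, 6 for 00101110b). The class name BitLength is hypothetical. The parameter stays uint so that the logical right shift always reaches zero, even for values with the top bit set.

```csharp
static class BitLength
{
    // Number of bits needed to store x; returns 1 for x == 0, as specified.
    public static byte NumberOfBits(uint x)
    {
        byte counter = 0;
        do
        {
            counter++;
            x >>= 1; // logical shift on uint: always reaches 0 within 32 steps
        } while (x != 0);
        return counter;
    }
}
```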
  • 48. #5 Performance Summary

| Metric | Old | New | Reduction | Reduction % | Old / New |
|---|---|---|---|---|---|
| Execution time | 1 m 39 s 305 ms | 1 m 10 s 167 ms | 29 s 138 ms | 29.34 % | × 1.42 |
| Memory (MB) | 103,775 | 103,775 | 0 | 0.00 % | × 1.00 |
  • 53. Performance Summary

| Metric | Old | New | Reduction | Reduction % | Old / New |
|---|---|---|---|---|---|
| Execution time | 43 m 21 s 642 ms | 1 m 10 s 167 ms | 42 m 11 s 475 ms | 97.30 % | × 37.08 |
| Memory (MB) | 374,011 | 103,775 | 270,236 | 72.25 % | × 3.60 |
  • 54. Can It Be Faster?
  • 55. #6 PackBits Is a Heavy Function. Can It Be Faster?
```csharp
/* Calculate statistics from the ushort array blocks and the byte array of bits,
   and pack them into a new format. */
byte[] PackBits(ushort[] blocks, byte numberOfBits, bool CalcStatistics)
```
  • 56. Potential Improvement #1
```csharp
byte[] PackBits(ushort[] blocks, byte numberOfbits, bool calcStatistics)
{
    var stat = 0;
    var packedBits = new byte[maxN]; // maxN and tmp come from code not shown on the slide
    int n = 0, index = 0;
    foreach (var block in blocks)
    {
        byte bits = NumberOfBits(block);
        if (bits <= numberOfbits)
        {   /* branch 1: GetBits returns a byte array with numberOfbits bits of block */
            if (calcStatistics) stat += numberOfbits - 1;
            Buffer.BlockCopy(GetBits(block, numberOfbits), 0, packedBits, n, numberOfbits);
            n += numberOfbits;
            index += numberOfbits;
        }
        else
        {   /* branch 2 */
            if (calcStatistics) stat += 5 * numberOfbits;
            n += tmp;
            index += tmp;
            Buffer.BlockCopy(GetBits(block, bits), 0, packedBits, n, bits);
            n += bits;
            index += bits;
        }
    }
    return packedBits;
}
```
  • 57. Unused Calculation Results: the result of the index calculation is never used, so just remove it.
```csharp
/* branch 1 */
Buffer.BlockCopy(GetBits(block, numberOfbits), 0, packedBits, n, numberOfbits);
n += numberOfbits;
index += numberOfbits;   // unused

/* branch 2 */
n += tmp;
index += tmp;            // unused
Buffer.BlockCopy(GetBits(block, bits), 0, packedBits, n, bits);
n += bits;
index += bits;           // unused
```
  • 58. Potential Improvement #2: the PackBits code is the same as in Potential Improvement #1 (slide 56). The next target is the Buffer.BlockCopy(GetBits(...), ...) call in both branches: GetBits allocates a temporary array for every block only for it to be copied into packedBits.
  • 59. Avoid Redundant Copy: write the bits inside GetBits directly into the existing array instead of producing a new one, and avoid Buffer.BlockCopy.
```csharp
byte[] GetBits(uint X, byte N, byte[] dest, int offset)
{
    var i = offset + N - 1;
    while (X != 0)
    {
        dest[i] = (byte)(X & 1);
        X = X >> 1;
        i--;
    }
    return dest;
}
```
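A small sketch of the in-place GetBits, showing that two calls can fill neighbouring ranges of one buffer. BitsInPlace is a hypothetical class name. It assumes x fits into n bits and that the target range of dest is still zero, because the loop stops as soon as the remaining high bits are zero and never writes the leading zeros.

```csharp
static class BitsInPlace
{
    // Writes the n low bits of x, most significant first, into dest[offset .. offset + n - 1].
    // Leading zero bits are not written, so that range must already be zero
    // (true for a freshly allocated array).
    public static byte[] GetBits(uint x, byte n, byte[] dest, int offset)
    {
        var i = offset + n - 1;
        while (x != 0)
        {
            dest[i] = (byte)(x & 1);
            x >>= 1;
            i--;
        }
        return dest;
    }
}
```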
  • 60. Potential Improvement #3
```csharp
byte[] PackBits(ushort[] blocks, byte numberOfbits, bool calcStatistics)
{
    var stat = 0;
    var packedBits = new byte[maxN];
    int n = 0, index = 0;
    foreach (var block in blocks)
    {
        byte bits = NumberOfBits(block);
        if (bits <= numberOfbits)
        {   /* branch 1 */
            if (calcStatistics) stat += numberOfbits - 1;
            GetBits(block, numberOfbits, packedBits, n); // writes directly into packedBits
            n += numberOfbits;
            index += numberOfbits;
        }
        else
        {   /* branch 2 */
            if (calcStatistics) stat += 5 * numberOfbits;
            n += tmp;
            index += tmp;
            GetBits(block, bits, packedBits, n);
            n += bits;
            index += bits;
        }
    }
    return packedBits;
}
```
  • 61. Foreach vs. For to Iterate Array: it seems that the JIT compiler is smart enough to turn foreach over an array into a for statement at the assembly level. But we prefer the simple for.
```asm
; C.M(UInt16[]) - foreach statement
L0000: mov rax, rdx
L0002: xor eax, eax
L0005: mov ecx, [rdx+0x8]
L0007: test ecx, ecx
L0009: jle L0018
L000c: movsxd r8, eax
L0012: movzx r8d, word [rdx+r8*2+0x10]
L0014: inc eax
L0016: cmp ecx, eax
L0018: jg L000c
L0020: ret
```
```asm
; C.M(UInt16[]) - for statement
L0000: xor eax, eax
L0002: mov ecx, [rdx+0x8]
L0005: test ecx, ecx
L0007: jle L0018
L0009: movsxd r8, eax
L000c: movzx r8d, word [rdx+r8*2+0x10]
L0012: inc eax
L0014: cmp ecx, eax
L0016: jg L0009
L0018: ret
```
  • 62. Potential Improvement #4
```csharp
byte[] PackBits(ushort[] blocks, byte numberOfbits, bool calcStatistics)
{
    var stat = 0;
    var packedBits = new byte[maxN];
    int n = 0, index = 0;
    for (int i = 0; i < blocks.Length; i++)
    {
        var block = blocks[i];
        byte bits = NumberOfBits(block);
        if (bits <= numberOfbits)
        {   /* branch 1 */
            if (calcStatistics) stat += numberOfbits - 1;
            GetBits(block, numberOfbits, packedBits, n);
            n += numberOfbits;
            index += numberOfbits;
        }
        else
        {   /* branch 2 */
            if (calcStatistics) stat += 5 * numberOfbits;
            n += tmp;
            index += tmp;
            GetBits(block, bits, packedBits, n);
            n += bits;
            index += bits;
        }
    }
    return packedBits;
}
```
  • 63. maxN Is Known
```csharp
byte[] PackBits(ushort[] blocks, byte numberOfbits, bool calcStatistics, int maxN);
int PackedBitsLength(ushort[] blocks, byte numberOfbits, bool calcStatistics);
```
    - The upper bound maxN is calculated before PackBits is called, so we can pass it in as a parameter.
    - PackBits is often called just to get the length of the result array (the n variable in PackBits), so we can split it into two functions.
  • 64. Potential Improvement #5
```csharp
byte[] PackBits(ushort[] blocks, byte numberOfbits, bool calcStatistics, int maxN)
{
    var stat = 0;
    var packedBits = new byte[maxN];
    int n = 0, index = 0;
    foreach (var block in blocks)
    {
        byte bits = NumberOfBits(block);
        if (bits <= numberOfbits)
        {   /* branch 1 */
            if (calcStatistics) stat += numberOfbits - 1;
            GetBits(block, numberOfbits, packedBits, n);
            n += numberOfbits;
            index += numberOfbits;
        }
        else
        {   /* branch 2 */
            if (calcStatistics) stat += 5 * numberOfbits;
            n += tmp;
            index += tmp;
            GetBits(block, bits, packedBits, n);
            n += bits;
            index += bits;
        }
    }
    return packedBits;
}
```
  • 65. We Don't Need an If Statement: statistics are needed only when we produce an array, so PackBits can calculate them without the condition and avoid a branch that the CPU's branch predictor would have to predict. In PackedBitsLength, these rows can be removed.
```csharp
var stat = 0;
var packedBits = new byte[maxN];
foreach (var block in blocks)
{
    byte bits = NumberOfBits(block);
    if (bits <= numberOfbits)
    {   /* branch 1 */
        if (calcStatistics) stat += numberOfbits - 1;
    }
    else
    {   /* branch 2 */
        if (calcStatistics) stat += 5 * numberOfbits;
    }
}
return packedBits;
```
  • 66. PackBits
```csharp
byte[] PackBits(ushort[] blocks, byte numberOfbits, bool calcStatistics, int maxN)
{
    var stat = 0;
    var packedBits = new byte[maxN];
    int n = 0, index = 0;
    for (int i = 0; i < blocks.Length; i++)
    {
        var block = blocks[i];
        byte bits = NumberOfBits(block);
        if (bits <= numberOfbits)
        {   /* branch 1 */
            if (calcStatistics) stat += numberOfbits - 1;
            GetBits(block, numberOfbits, packedBits, n);
            n += numberOfbits;
            index += numberOfbits;
        }
        else
        {   /* branch 2 */
            if (calcStatistics) stat += 5 * numberOfbits;
            n += tmp;
            index += tmp;
            GetBits(block, bits, packedBits, n);
            n += bits;
            index += bits;
        }
    }
    return packedBits;
}
```
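The steps described on slides 57 to 65 can be combined into one sketch. The index bookkeeping is gone, GetBits writes straight into the result, the loop is a plain for, maxN comes from a separate length-only pass, and statistics are accumulated without the calcStatistics flag. This is an illustration under assumptions, not the author's final code. The slides never show what tmp is, so it is modelled here by a hypothetical constant EscapeLength of zero bits written before an over-long block, and the statistics are returned through a hypothetical out parameter.

```csharp
static class PackDemo
{
    // Hypothetical stand-in for the slides' tmp: zero bits written before an
    // over-long block in branch 2. The real value is not shown in the slides.
    const int EscapeLength = 5;

    static byte NumberOfBits(uint x)
    {
        byte counter = 0;
        do { counter++; x >>= 1; } while (x != 0);
        return counter;
    }

    // In-place GetBits: writes the n low bits of x, most significant first, at offset.
    static void GetBits(uint x, byte n, byte[] dest, int offset)
    {
        for (var i = offset + n - 1; x != 0; i--, x >>= 1)
            dest[i] = (byte)(x & 1);
    }

    // Length-only pass: no result array, no statistics.
    public static int PackedBitsLength(ushort[] blocks, byte numberOfbits)
    {
        int n = 0;
        for (int i = 0; i < blocks.Length; i++)
        {
            byte bits = NumberOfBits(blocks[i]);
            n += bits <= numberOfbits ? numberOfbits : EscapeLength + bits;
        }
        return n;
    }

    // Packing pass: maxN comes from the caller; statistics are always accumulated.
    public static byte[] PackBits(ushort[] blocks, byte numberOfbits, int maxN, out int stat)
    {
        stat = 0;
        var packedBits = new byte[maxN];
        int n = 0;
        for (int i = 0; i < blocks.Length; i++)
        {
            uint block = blocks[i];
            byte bits = NumberOfBits(block);
            if (bits <= numberOfbits)
            {
                stat += numberOfbits - 1;
                GetBits(block, numberOfbits, packedBits, n);
                n += numberOfbits;
            }
            else
            {
                stat += 5 * numberOfbits;
                n += EscapeLength; // escape bits stay 0 in the freshly allocated array
                GetBits(block, bits, packedBits, n);
                n += bits;
            }
        }
        return packedBits;
    }
}
```

Usage: `int len = PackDemo.PackedBitsLength(blocks, k);` answers the "length only" calls without allocating, and `PackDemo.PackBits(blocks, k, len, out stat)` is used only when the packed array is actually needed.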
  • 67. #6 Performance Summary

| Metric | Old | New | Reduction | Reduction % | Old / New |
|---|---|---|---|---|---|
| Execution time | 1 m 10 s 167 ms | 50 s 193 ms | 19 s 974 ms | 28.47 % | × 1.40 |
| Memory (MB) | 103,775 | 47,239 | 56,536 | 54.48 % | × 2.20 |
  • 68. #7 Heavy Function PackBits2. Can It Be Faster?
```csharp
/* Calculate statistics from the uint list blocks and the byte array of bits,
   and pack them into a new format. */
byte[] PackBits2(List<uint> blocks, byte numberOfBits, bool CalcStatistics)
```
  • 69. #7 Performance Summary

| Metric | Old | New | Reduction | Reduction % | Old / New |
|---|---|---|---|---|---|
| Execution time | 50 s 193 ms | 36 s 631 ms | 13 s 562 ms | 27.02 % | × 1.37 |
| Memory (MB) | 47,239 | 10,510 | 36,729 | 77.75 % | × 4.49 |
  • 70. #8 WriteBits - Can It Be Faster?
```csharp
/* Write the given number of bits of a UInt32 to the binary file, and keep the bits
   that do not fill a whole byte in a static array */
void WriteBits(BinaryWriter bw, uint x, byte numberOfBits)
```
  • 71. Potential Improvement #1
```csharp
void WriteBits(BinaryWriter bw, uint x, byte numberOfBits)
{
    byte[] bits = GetBits(x, numberOfBits);
    byte[] tmpB = new byte[buffB.Length + bits.Length];
    Buffer.BlockCopy(buffB, 0, tmpB, 0, buffB.Length);
    Buffer.BlockCopy(bits, 0, tmpB, buffB.Length, bits.Length);
    buffB = tmpB; // static byte array that keeps the bits not written yet
    for (int i = 0; i < buffB.Length / 8; i++)
    {
        int j = i * 8;
        bw.Write((byte)(buffB[j] * 128 + buffB[j + 1] * 64 + buffB[j + 2] * 32 + buffB[j + 3] * 16
                      + buffB[j + 4] * 8 + buffB[j + 5] * 4 + buffB[j + 6] * 2 + buffB[j + 7]));
    }
    int L = (buffB.Length / 8) * 8;
    for (int i = L; i < buffB.Length; i++) buffB[i - L] = buffB[i];
    Array.Resize(ref buffB, buffB.Length - L);
}
```
  • 72. Buffer.BlockCopy Can Be Inefficient: the numberOfBits parameter can't be more than 32 because we operate on Int32 and UInt32 numbers, and the bit tail left after writing whole bytes is always shorter than 8 bits. So we never need to store more than 40 bits (a 40-byte array, one byte per bit). For such short arrays, copying with Buffer.BlockCopy is not so efficient; replace it with a simple element-by-element copy in a loop.
  • 73. Potential Improvement #2
```csharp
void WriteBits(BinaryWriter bw, uint x, byte numberOfBits)
{
    byte[] bits = GetBits(x, numberOfBits);
    byte[] tmpB = new byte[buffB.Length + bits.Length];
    for (int i = 0; i < buffB.Length; i++) tmpB[i] = buffB[i];
    for (int i = 0; i < bits.Length; i++) tmpB[i + buffB.Length] = bits[i];
    buffB = tmpB; // static byte array that keeps the bits not written yet
    for (int i = 0; i < buffB.Length / 8; i++)
    {
        int j = i * 8;
        bw.Write((byte)(buffB[j] * 128 + buffB[j + 1] * 64 + buffB[j + 2] * 32 + buffB[j + 3] * 16
                      + buffB[j + 4] * 8 + buffB[j + 5] * 4 + buffB[j + 6] * 2 + buffB[j + 7]));
    }
    int L = (buffB.Length / 8) * 8;
    for (int i = L; i < buffB.Length; i++) buffB[i - L] = buffB[i];
    Array.Resize(ref buffB, buffB.Length - L);
}
```
  • 74. Optimizing Software in C++ By Agner Fog.
  • 75. Potential Improvement #3
```csharp
void WriteBits(BinaryWriter bw, uint x, byte numberOfBits)
{
    byte[] bits = GetBits(x, numberOfBits);
    byte[] tmpB = new byte[buffB.Length + bits.Length];
    for (int i = 0; i < buffB.Length; i++) tmpB[i] = buffB[i];
    for (int i = 0; i < bits.Length; i++) tmpB[i + buffB.Length] = bits[i];
    buffB = tmpB; // static byte array that keeps the bits not written yet
    for (int i = 0; i < buffB.Length >> 3; i++)
    {
        int j = i << 3;
        // + binds more tightly than << in C#, so each shift needs its own parentheses
        bw.Write((byte)((buffB[j] << 7) + (buffB[j + 1] << 6) + (buffB[j + 2] << 5) + (buffB[j + 3] << 4)
                      + (buffB[j + 4] << 3) + (buffB[j + 5] << 2) + (buffB[j + 6] << 1) + buffB[j + 7]));
    }
    int L = (buffB.Length >> 3) << 3;
    for (int i = L; i < buffB.Length; i++) buffB[i - L] = buffB[i];
    Array.Resize(ref buffB, buffB.Length - L);
}
```
  • 76. Buffer Array to Store the Bits Tail: we don't need to create a new array each time; instead, we reuse the existing buffer (the tail is always shorter than 8 bits) and store the current length of its bits tail, so we can avoid calling Array.Resize().
  • 77. WriteBits
```csharp
void WriteBits(BinaryWriter bw, uint x, byte numberOfBits)
{
    byte[] bits = GetBits(x, numberOfBits);
    byte[] tmpB = new byte[BitsBuffLength + bits.Length];
    for (int i = 0; i < BitsBuffLength; i++) tmpB[i] = buffB[i];
    for (int i = 0; i < bits.Length; i++) tmpB[i + BitsBuffLength] = bits[i];
    buffB = tmpB; // static byte array that keeps the bits not written yet
    for (int i = 0; i < buffB.Length >> 3; i++)
    {
        int j = i << 3;
        bw.Write((byte)((buffB[j] << 7) + (buffB[j + 1] << 6) + (buffB[j + 2] << 5) + (buffB[j + 3] << 4)
                      + (buffB[j + 4] << 3) + (buffB[j + 5] << 2) + (buffB[j + 6] << 1) + buffB[j + 7]));
    }
    int L = (buffB.Length >> 3) << 3;
    for (int i = L; i < buffB.Length; i++) buffB[i - L] = buffB[i];
    BitsBuffLength = tmpB.Length - L; // tail length kept for the next call
}
```
  • 78. #8 Performance Summary

| Metric | Old | New | Reduction | Reduction % | Old / New |
|---|---|---|---|---|---|
| Execution time | 36 s 631 ms | 32 s 551 ms | 4 s 080 ms | 11.14 % | × 1.13 |
| Memory (MB) | 10,510 | 9,754 | 756 | 7.19 % | × 1.08 |
  • 79. #9 ScaleGrad - Can It Be Faster?
```csharp
/* Return the index of number x in the ordered scale */
int ScaleGrad(int x)
```
  • 80. Potential Improvements
    - Avoid comparing int and double values.
    - Scale is a sorted array, so we can use binary search; it's more efficient and less dependent on the input data.
```csharp
static double[] Scale;
/* 600+ lines of code */
int ScaleGrad(int x)
{
    int i = 0;
    for (; i < Scale.Length && Scale[i] <= x; i++) { }
    return i - 1;
}
```
  • 81. Comparison
Before:
```csharp
static double[] Scale;
/* 600+ lines of code */
int ScaleGrad(int x)
{
    int i = 0;
    for (; i < Scale.Length && Scale[i] <= x; i++) { }
    return i - 1;
}
```
After:
```csharp
static int[] Scale;
/* 600+ lines of code */
int ScaleGrad(int x)
{
    var left = 0;
    var right = Scale.Length - 1;
    while (left <= right)
    {
        var mid = left + ((right - left) >> 1); // (left + right) / 2 without overflow
        if (x < Scale[mid]) right = mid - 1;
        else left = mid + 1;
    }
    return right; // last index with Scale[right] <= x, or -1, as in the linear version
}
```
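For an int[] scale, the BCL already provides a binary search. This sketch (class and method names are hypothetical) checks Array.BinarySearch against the linear scan. It assumes the scale is sorted ascending without duplicates, because with duplicates BinarySearch may return any matching index rather than the last one.

```csharp
using System;

static class ScaleSearch
{
    // Reference: the linear scan, with the scale passed as a parameter.
    public static int ScaleGradLinear(int[] scale, int x)
    {
        int i = 0;
        while (i < scale.Length && scale[i] <= x) i++;
        return i - 1; // index of the last element <= x, or -1
    }

    // Same result via Array.BinarySearch, assuming a sorted scale without duplicates.
    public static int ScaleGradBinarySearch(int[] scale, int x)
    {
        int idx = Array.BinarySearch(scale, x);
        // Not found: ~idx is the index of the first element greater than x.
        return idx >= 0 ? idx : ~idx - 1;
    }
}
```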
  • 82. #9 Performance Summary

| Metric | Old | New | Reduction | Reduction % | Old / New |
|---|---|---|---|---|---|
| Execution time | 32 s 551 ms | 28 s 844 ms | 3 s 707 ms | 11.39 % | × 1.13 |
| Memory (MB) | 9,754 | 9,754 | 0 | 0.00 % | × 1.00 |
  • 83. #10 ReadData - Can It Be Faster?
    - ReadInt32() is used in the main processing function, so we shouldn't touch it right now.
    - ReadUInt32(): in both cases, the binary file is read, and the number of binary chains is calculated from the data read.
  • 84. Potential Improvements
    - Don't allocate and fill a list with data that is never used.
    - Read UInt64 values instead of UInt32. This halves the number of loop iterations, but we have to handle an odd count.
```csharp
using (var br = new BinaryReader(…))
{
    var list = new List<uint>();
    var length = br.BaseStream.Length - 4;
    while (br.BaseStream.Position <= length)
    {
        br.ReadUInt64();
        var n = br.ReadUInt32();
        list.Add(n);
        for (int i = 0; i < n; i++) br.ReadUInt32();
    }
}
```
  • 85. Comparison
Before:
```csharp
using (var br = new BinaryReader(…))
{
    var list = new List<uint>();
    var length = br.BaseStream.Length - 4;
    while (br.BaseStream.Position <= length)
    {
        br.ReadUInt64();
        var n = br.ReadUInt32();
        list.Add(n);
        for (int i = 0; i < n; i++) br.ReadUInt32();
    }
}
```
After:
```csharp
using (var br = new BinaryReader(…))
{
    var length = br.BaseStream.Length - 4;
    while (br.BaseStream.Position <= length)
    {
        br.ReadUInt64();
        var n = br.ReadUInt32();
        for (int i = 0; i < n >> 1; i++) br.ReadUInt64(); // two UInt32 values per read
        if ((n & 1) == 1) br.ReadUInt32();                 // odd count: one value left
    }
}
```
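A sketch of the payload-skipping loop on its own. It checks that reading UInt64 pairs plus an optional trailing UInt32 leaves the stream at the same position as n single UInt32 reads, for both even and odd n. PayloadSkipper is a hypothetical name.

```csharp
using System.IO;

static class PayloadSkipper
{
    // Before: skips n UInt32 values one by one.
    public static void SkipByUInt32(BinaryReader br, uint n)
    {
        for (uint i = 0; i < n; i++) br.ReadUInt32();
    }

    // After: skips the same 4 * n bytes with n / 2 UInt64 reads,
    // plus one UInt32 read when n is odd.
    public static void SkipByUInt64(BinaryReader br, uint n)
    {
        for (uint i = 0; i < n >> 1; i++) br.ReadUInt64();
        if ((n & 1) == 1) br.ReadUInt32();
    }
}
```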
  • 86. #10 Performance Summary

| Metric | Old | New | Reduction | Reduction % | Old / New |
|---|---|---|---|---|---|
| Execution time | 28 s 844 ms | 24 s 048 ms | 4 s 796 ms | 16.63 % | × 1.20 |
| Memory (MB) | 9,754 | 9,690 | 64 | 0.66 % | × 1.01 |
  • 87. #11 List Usage - Can It Be Faster? The code uses generic lists a lot (creating new list instances, adding elements, iterating). It may be a convention or a common approach, but it is costly: even getting an element by index goes through the List<T> indexer, which checks the index on every access.
```csharp
var list = new List<int>();
var item = list[i];
```
  • 88. Potential Improvement #1
```csharp
List<Point> Foo(List<Point> Points)
{
    var R = new List<Point>();
    foreach (var P in Points)
        R.Add(new Point(P.X, ProcessPoint(P.Y)));
    return R;
}
```
  • 89. Favor Arrays Over Lists
    - Use arrays instead of lists where possible: an array element is accessed directly, while a list goes through its Add method and the [ ] indexer (List<T> reference source below).
    - If the number of elements is known, pass it as the capacity to the list constructor, so elements can be added without resizing the internal array.
```csharp
public T this[int index]
{
    get
    {
        if ((uint)index >= (uint)_size) ThrowHelper.ThrowArgumentOutOfRangeException();
        Contract.EndContractBlock();
        return _items[index];
    }
    set
    {
        if ((uint)index >= (uint)_size) ThrowHelper.ThrowArgumentOutOfRangeException();
        Contract.EndContractBlock();
        _items[index] = value;
        _version++;
    }
}
```
  • 90. Potential Improvement #2: the same Foo as in Potential Improvement #1 (slide 88); this step looks at its foreach loop.
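  • 91. Favor Arrays Over Lists: use a simple for loop instead of foreach to avoid the enumerator: a GetEnumerator() call and, on every iteration, calls to get_Current() and the somewhat complex MoveNext(). List<T>.GetEnumerator() returns a struct enumerator, so foreach over a variable typed as List<T> does not box; boxing happens only when the list is enumerated through an interface such as IEnumerable<T>.
```
…
callvirt instance valuetype GetEnumerator()
…
// loop start
…
call instance !0 valuetype get_Current()
…
call instance bool valuetype MoveNext()
…
// end loop
```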
  • 92. Comparison
Before:
```csharp
List<Point> Foo(List<Point> Points)
{
    var R = new List<Point>();
    foreach (var P in Points)
        R.Add(new Point(P.X, ProcessPoint(P.Y)));
    return R;
}
```
After:
```csharp
Point[] Foo(List<Point> Points)
{
    var n = Points.Count;
    var R = new Point[n];
    for (var i = 0; i < n; i++)
        R[i] = new Point(Points[i].X, ProcessPoint(Points[i].Y));
    return R;
}
```
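A runnable sketch of the list and array versions of Foo, plus a middle ground that keeps the List<T> return type but presets its capacity, as suggested on slide 89. Point here is a minimal hypothetical struct and ProcessPoint a placeholder that doubles Y; neither is the original code.

```csharp
using System.Collections.Generic;

struct Point // hypothetical stand-in for the slides' Point type
{
    public readonly int X, Y;
    public Point(int x, int y) { X = x; Y = y; }
}

static class PointMapper
{
    // Hypothetical placeholder for the slides' ProcessPoint.
    static int ProcessPoint(int y) { return y * 2; }

    // Before: list without capacity, foreach, Add.
    public static List<Point> FooList(List<Point> points)
    {
        var r = new List<Point>();
        foreach (var p in points) r.Add(new Point(p.X, ProcessPoint(p.Y)));
        return r;
    }

    // Middle ground when a List<T> must be returned: preset the capacity, so Add never resizes.
    public static List<Point> FooListWithCapacity(List<Point> points)
    {
        var r = new List<Point>(points.Count);
        for (int i = 0; i < points.Count; i++)
        {
            var p = points[i];
            r.Add(new Point(p.X, ProcessPoint(p.Y)));
        }
        return r;
    }

    // After: exact-size array and a plain for loop.
    public static Point[] FooArray(List<Point> points)
    {
        var n = points.Count;
        var r = new Point[n];
        for (var i = 0; i < n; i++)
        {
            var p = points[i]; // one indexer call instead of two
            r[i] = new Point(p.X, ProcessPoint(p.Y));
        }
        return r;
    }
}
```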
  • 93. #11 Performance Summary

| Metric | Old | New | Reduction | Reduction % | Old / New |
|---|---|---|---|---|---|
| Execution time | 24 s 048 ms | 18 s 333 ms | 5 s 715 ms | 23.76 % | × 1.31 |
| Memory (MB) | 9,690 | 7,842 | 1,848 | 19.07 % | × 1.24 |
  • 94. #12 WriteBits - Can It Be Faster? WriteBits is near the top again, and it has 64.56 % own time, so let’s try to optimize it.
  • 95. Potential Improvement #3
```csharp
void WriteBits(BinaryWriter bw, uint x, byte numberOfbits)
{
    byte[] bits = GetBits(x, numberOfbits);
    byte[] tmpB = new byte[BitsBuffLength + bits.Length];
    for (int i = 0; i < BitsBuffLength; i++) tmpB[i] = buffB[i];
    for (int i = 0; i < bits.Length; i++) tmpB[i + BitsBuffLength] = bits[i];
    buffB = tmpB; // static byte array that keeps the bits not written yet
    for (int i = 0; i < buffB.Length >> 3; i++)
    {
        int j = i << 3;
        bw.Write((byte)((buffB[j] << 7) + (buffB[j + 1] << 6) + (buffB[j + 2] << 5) + (buffB[j + 3] << 4)
                      + (buffB[j + 4] << 3) + (buffB[j + 5] << 2) + (buffB[j + 6] << 1) + buffB[j + 7]));
    }
    int L = (buffB.Length >> 3) << 3;
    for (int i = L; i < buffB.Length; i++) buffB[i - L] = buffB[i];
    BitsBuffLength = tmpB.Length - L;
}
```
  • 96. Don't Forget to Use New Features: we forgot that GetBits can fill an existing array at an offset. Using it here prevents redundant array copying.
Before:
```csharp
byte[] bits = GetBits(x, numberOfbits);
byte[] tmpB = new byte[BitsBuffLength + bits.Length];
for (int i = 0; i < BitsBuffLength; i++) tmpB[i] = buffB[i];
for (int i = 0; i < bits.Length; i++) tmpB[i + BitsBuffLength] = bits[i];
```
After:
```csharp
byte[] tmpB = new byte[BitsBuffLength + numberOfBits];
GetBits(x, numberOfBits, tmpB, BitsBuffLength);
for (int i = 0; i < BitsBuffLength; i++) tmpB[i] = buffB[i];
```
  • 97. Potential Improvement #2: the WriteBits code is the same as in Potential Improvement #3 (slide 95); this step targets the loop that assembles whole bytes.
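  • 98. Even Small Operations Can Have Significant Impacts: on each iteration, we calculate i << 3 as the offset into buffB. Instead, we can step the loop counter by 8 and use it directly as the offset of the next 8 elements. We also change the + operation to |. This is not critical for speed, but it matters when no parentheses are used: in C#, + binds more tightly than <<, so a + chain of shifts needs parentheses around each shift, while | binds more loosely than <<, so buffB[i] << 7 | buffB[i + 1] << 6 | … is evaluated as intended.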
  • 99. Comparison
Before:
```csharp
for (int i = 0; i < buffB.Length >> 3; i++)
{
    int j = i << 3;
    bw.Write((byte)((buffB[j] << 7) + (buffB[j + 1] << 6)
                  + (buffB[j + 2] << 5) + (buffB[j + 3] << 4)
                  + (buffB[j + 4] << 3) + (buffB[j + 5] << 2)
                  + (buffB[j + 6] << 1) + buffB[j + 7]));
}
```
After:
```csharp
for (int i = 0; i < (buffB.Length >> 3) << 3; i += 8)
{
    bw.Write((byte)(buffB[i] << 7 | buffB[i + 1] << 6
                  | buffB[i + 2] << 5 | buffB[i + 3] << 4
                  | buffB[i + 4] << 3 | buffB[i + 5] << 2
                  | buffB[i + 6] << 1 | buffB[i + 7]));
}
```
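Operator precedence is what makes the switch from + to | safe. In C#, additive operators bind more tightly than shifts, and shifts bind more tightly than |. This sketch (hypothetical names) packs eight one-bit bytes, most significant first, in three ways and shows that only a + chain of shifts written without parentheses gives a wrong byte.

```csharp
static class ByteAssembly
{
    // | version: << binds more tightly than |, so no parentheses are needed.
    public static byte PackWithOr(byte[] b, int i)
    {
        return (byte)(b[i] << 7 | b[i + 1] << 6 | b[i + 2] << 5 | b[i + 3] << 4
                    | b[i + 4] << 3 | b[i + 5] << 2 | b[i + 6] << 1 | b[i + 7]);
    }

    // + version with each shift parenthesized: same result, because the shifted bits never overlap.
    public static byte PackWithPlus(byte[] b, int i)
    {
        return (byte)((b[i] << 7) + (b[i + 1] << 6) + (b[i + 2] << 5) + (b[i + 3] << 4)
                    + (b[i + 4] << 3) + (b[i + 5] << 2) + (b[i + 6] << 1) + b[i + 7]);
    }

    // + version WITHOUT parentheses: parsed as b[i] << (7 + b[i+1]) << (6 + b[i+2]) << ..., which is wrong.
    public static byte PackWithPlusUnparenthesized(byte[] b, int i)
    {
        return (byte)(b[i] << 7 + b[i + 1] << 6 + b[i + 2] << 5 + b[i + 3] << 4
                    + b[i + 4] << 3 + b[i + 5] << 2 + b[i + 6] << 1 + b[i + 7]);
    }
}
```

For the bits 10000001, the first two methods return 129, while the unparenthesized one shifts 1 left by 29 in total and returns 0 after the cast to byte.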
  • 100. WriteBits
```csharp
void WriteBits(BinaryWriter bw, uint x, byte numberOfBits)
{
    byte[] tmpB = new byte[BitsBuffLength + numberOfBits];
    GetBits(x, numberOfBits, tmpB, BitsBuffLength);              // new bits go after the tail
    for (int i = 0; i < BitsBuffLength; i++) tmpB[i] = buffB[i]; // copy the tail in front
    buffB = tmpB; // static byte array that keeps the bits not written yet
    int L = (buffB.Length >> 3) << 3;                             // bits that form whole bytes
    for (int i = 0; i < L; i += 8)
        bw.Write((byte)(buffB[i] << 7 | buffB[i + 1] << 6 | buffB[i + 2] << 5 | buffB[i + 3] << 4
                      | buffB[i + 4] << 3 | buffB[i + 5] << 2 | buffB[i + 6] << 1 | buffB[i + 7]));
    for (int i = L; i < buffB.Length; i++) buffB[i - L] = buffB[i];
    BitsBuffLength = tmpB.Length - L;
}
```
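The final WriteBits as a self-contained sketch. It is wrapped in a hypothetical BitTailWriter class with instance fields in place of the static buffB and BitsBuffLength, so it can be tested against a MemoryStream. Bits are stored one per byte, as in the slides, and x is assumed to fit into numberOfBits bits.

```csharp
using System.IO;

sealed class BitTailWriter
{
    byte[] buffB = new byte[0]; // pending bits, one bit per element
    int bitsBuffLength;         // number of valid pending bits (always < 8 between calls)
    readonly BinaryWriter bw;

    public BitTailWriter(BinaryWriter bw) { this.bw = bw; }

    public void WriteBits(uint x, byte numberOfBits)
    {
        var tmpB = new byte[bitsBuffLength + numberOfBits];
        GetBits(x, numberOfBits, tmpB, bitsBuffLength);              // new bits after the tail
        for (int i = 0; i < bitsBuffLength; i++) tmpB[i] = buffB[i]; // tail in front
        buffB = tmpB;

        int L = (buffB.Length >> 3) << 3;                            // bits forming whole bytes
        for (int i = 0; i < L; i += 8)
            bw.Write((byte)(buffB[i] << 7 | buffB[i + 1] << 6 | buffB[i + 2] << 5 | buffB[i + 3] << 4
                          | buffB[i + 4] << 3 | buffB[i + 5] << 2 | buffB[i + 6] << 1 | buffB[i + 7]));
        for (int i = L; i < buffB.Length; i++) buffB[i - L] = buffB[i]; // move the tail to the front
        bitsBuffLength = buffB.Length - L;
    }

    // In-place GetBits: writes the n low bits of x, most significant first, at offset.
    static void GetBits(uint x, byte n, byte[] dest, int offset)
    {
        for (var i = offset + n - 1; x != 0; i--, x >>= 1)
            dest[i] = (byte)(x & 1);
    }
}
```

Like the slides, the sketch never flushes the last partial byte. A real writer needs to pad and write the remaining bitsBuffLength bits when the output ends; the slides do not show how the original handles that.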
  • 101. #12 Performance Summary

| Metric | Old | New | Reduction | Reduction % | Old / New |
|---|---|---|---|---|---|
| Execution time | 18 s 333 ms | 16 s 367 ms | 1 s 966 ms | 10.72 % | × 1.12 |
| Memory (MB) | 7,842 | 5,525 | 2,317 | 29.55 % | × 1.42 |
  • 102. #13 Redundant Calls: NumberOfBits(x) is called twice with the same argument; call it once and reuse the result.
Before:
```csharp
uint statistics1 …;
uint statistics2 …;
…
uint x = …;
…
/* byte NumberOfBits(uint x) */
statistics1 += NumberOfBits(x);
statistics2 += 28 - NumberOfBits(x);
```
After:
```csharp
uint statistics1 …;
uint statistics2 …;
…
uint x = …;
…
/* byte NumberOfBits(uint x) */
uint xn = NumberOfBits(x);
statistics1 += xn;
statistics2 += 28 - xn;
```
  • 103. #13 Performance Summary

| Metric | Old | New | Reduction | Reduction % | Old / New |
|---|---|---|---|---|---|
| Execution time | 16 s 367 ms | 15 s 036 ms | 1 s 331 ms | 8.13 % | × 1.09 |
| Memory (MB) | 5,525 | 4,424 | 1,101 | 19.93 % | × 1.25 |
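  • 107. Performance Summary (#6 to #13)

| Metric | Old | New | Reduction | Reduction % | Old / New |
|---|---|---|---|---|---|
| Execution time | 1 m 10 s 167 ms | 15 s 036 ms | 55 s 131 ms | 78.57 % | × 4.67 |
| Memory (MB) | 103,775 | 4,424 | 99,351 | 95.74 % | × 23.46 |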
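  • 108. Performance Summary

| Metric | Old | New | Reduction | Reduction % | Old / New |
|---|---|---|---|---|---|
| Execution time | 43 m 21 s 642 ms | 15 s 036 ms | 43 m 06 s 606 ms | 99.42 % | × 173.02 |
| Memory (MB) | 374,011 | 4,424 | 369,587 | 98.82 % | × 84.54 |
| Peak (MB) | 93.26 | 33.67 | 59.59 | 63.89 % | × 2.77 |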
  • 111. What was the application doing for 43 minutes?
  • 112. Q&A
  • 113. Links
    - Use dotTrace Command-Line Profiler
    - Hashtable and Dictionary Collection Types
    - .NET Performance Optimization & Profiling with JetBrains dotTrace
    - Why GC run when using a struct as a generic dictionary
    - Matt Ellis. Writing Allocation Free Code in C#
    - Konrad Kokosa. High-performance code design patterns in C#
    - Maarten Balliauw. Let's refresh our memory! Memory management in .NET
    - Sasha Goldshtein. Pro .NET Performance: Optimize Your C# Applications
    - Ben Watson. Writing High-Performance .NET Code, 2nd Edition
    - Writing Faster Managed Code: Know What Things Cost
  • 114. Links
    - Maarten Balliauw
    - Sasha Goldshtein
    - Yevhen Tatarynov GitHub
    - Linq.Concat
    - Linq.Concat implementation
    - Buffer.BlockCopy
    - Generic List implementation
    - Optimizing software in C++
    - Denis Reznik video
    - Array.Sort