Mårten Rånge
WCOM AB

@marten_range
Concurrency
Examples for .NET
Responsive
Performance
Scalable algorithms
Three pillars of Concurrency
 Scalability (CPU)
  Parallel.For

 Responsiveness
  Task/Future
  async/await

 Consistency
  lock/synchronized
  Interlocked.*
  Mutex/Event/Semaphore
  Monitor
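The consistency primitives listed above differ mainly in weight. As a hedged sketch (the SharedCounter class and its method names are mine, not from the deck), here is a shared counter kept consistent first with lock, then with the lighter Interlocked:

```csharp
using System.Threading;
using System.Threading.Tasks;

// Sketch: two ways to keep a shared counter consistent under Parallel.For.
// SharedCounter is an illustrative name, not from the deck.
static class SharedCounter
{
    static int _count;
    static readonly object _gate = new object();

    public static int CountWithLock(int n)
    {
        _count = 0;
        Parallel.For(0, n, _ =>
        {
            lock (_gate) { _count += 1; }            // mutual exclusion
        });
        return _count;
    }

    public static int CountWithInterlocked(int n)
    {
        _count = 0;
        Parallel.For(0, n, _ =>
        {
            Interlocked.Increment(ref _count);        // lock-free atomic add
        });
        return _count;
    }
}
```

Without either primitive the increments race and the final count comes up short; with either one the result is exact, but Interlocked avoids taking a lock per element.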
Scalability
Which is fastest?
var ints = new int[InnerLoop];
var random = new Random();
for (var inner = 0; inner < InnerLoop; ++inner)
{
  ints[inner] = random.Next();
}
// -----------------------------------------------
var ints = new int[InnerLoop];
var random = new Random();
Parallel.For(
  0,
  InnerLoop,
  i => ints[i] = random.Next()
);
SHARED STATE – Race condition
var ints = new int[InnerLoop];
var random = new Random();
for (var inner = 0; inner < InnerLoop; ++inner)
{
  ints[inner] = random.Next();
}
// -----------------------------------------------
var ints = new int[InnerLoop];
var random = new Random();
Parallel.For(
  0,
  InnerLoop,
  i => ints[i] = random.Next()   // all threads mutate the same Random
);
SHARED STATE – Poor performance
var ints = new int[InnerLoop];
var random = new Random();
for (var inner = 0; inner < InnerLoop; ++inner)
{
  ints[inner] = random.Next();
}
// -----------------------------------------------
var ints = new int[InnerLoop];
var random = new Random();
Parallel.For(
  0,
  InnerLoop,
  i => ints[i] = random.Next()   // the shared Random's cache line ping-pongs
);
Then and now

Metric          VAX-11/750 ('80)   Today                 Improvement
MHz             6                  3300                  550x
Memory MB       2                  16384                 8192x
Memory MB/s     13                 R ~10000 / W ~2500    770x / 190x
Memory nsec     225                70                    3x
Memory cycles   1.4                210                   -150x
299,792,458 m/s
Speed of light is too slow
 0.09 m/cycle
 99% – latency mitigation
 1% – computation
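The arithmetic behind the 0.09 m figure: at the 3300 MHz from the table, light covers roughly 299,792,458 / 3.3×10⁹ ≈ 0.09 m per clock cycle. A minimal sketch of the computation (the LightSpeed helper is illustrative, not from the deck):

```csharp
// Back-of-envelope: how far light travels in one clock cycle.
// LightSpeed is an illustrative helper name.
static class LightSpeed
{
    public const double SpeedOfLight = 299_792_458.0;  // m/s

    // Distance (m) light covers in one cycle at the given clock rate.
    public static double MetersPerCycle(double clockHz)
        => SpeedOfLight / clockHz;
}
```

At 3.3 GHz this is about nine centimetres: a signal cannot even cross the board in one cycle, which is why latency, not bandwidth, dominates the design.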
2 Core CPU
[Diagram: CPU1 and CPU2, each with a private L1 and L2 cache, a shared L3 cache, and RAM below]
2 Core CPU – L1 Cache
[Diagram, built up over several slides: CPU1 executes new Random() and new int[InnerLoop]; both cores then read and write the single shared Random object, so its cache line bounces between the two L1 caches on every update]
4 Core CPU – L1 Cache
[Diagram: CPU1–CPU4, each with its own L1 cache; the shared Random object from new Random() and the new int[InnerLoop] array are contended by all four cores]
2x4 Core CPU
[Diagram: two packages of four cores each (CPU1–CPU8); every core has private L1 and L2 caches, each package shares an L3 cache, with RAM below]
Solution 1 – Locks
var ints = new int[InnerLoop];
var random = new Random();
Parallel.For(
  0,
  InnerLoop,
  i => { lock (ints) { ints[i] = random.Next(); } }
);
Solution 2 – No sharing
var ints = new int[InnerLoop];
Parallel.For(
  0,
  InnerLoop,
  () => new Random(),
  (i, pls, random) => { ints[i] = random.Next(); return random; },
  random => {}
);
Parallel.For adds overhead
[Diagram: Parallel.For partitions the range as a tree – Level0 splits into Level1, Level1 into Level2 – until each leaf handles a single element ints[0]…ints[7]; every element costs a delegate invocation]
Solution 3 – Less overhead
var ints = new int[InnerLoop];
Parallel.For(
  0,
  InnerLoop / Modulus,
  () => new Random(),
  (i, pls, random) =>
  {
    var begin = i * Modulus;
    var end   = begin + Modulus;
    for (var iter = begin; iter < end; ++iter)
    {
      ints[iter] = random.Next();
    }
    return random;
  },
  random => {}
);
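A hedged alternative to the hand-rolled Modulus chunking: the framework's own range partitioner picks chunk sizes automatically, keeping the one-delegate-per-chunk, one-Random-per-thread shape of Solution 3. The ChunkedFill wrapper below is my naming, not the deck's:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Sketch: let Partitioner.Create choose the chunk size instead of a
// hand-rolled Modulus. ChunkedFill is an illustrative wrapper name.
static class ChunkedFill
{
    public static int[] Fill(int innerLoop)
    {
        var ints = new int[innerLoop];
        Parallel.ForEach(
            Partitioner.Create(0, innerLoop),   // yields (from, to) ranges
            () => new Random(),                  // one Random per thread
            (range, pls, random) =>
            {
                for (var i = range.Item1; i < range.Item2; ++i)
                {
                    ints[i] = random.Next();
                }
                return random;
            },
            random => {});
        return ints;
    }
}
```

Same idea, less code to get wrong at the chunk boundaries.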
Solution 4 – Independent runs
var tasks = Enumerable.Range (0, 8).Select (
  i => Task.Factory.StartNew (
    () =>
    {
      var ints = new int[InnerLoop];
      var random = new Random ();
      while (counter.CountDown ())
      {
        for (var inner = 0; inner < InnerLoop; ++inner)
        {
          ints[inner] = random.Next();
        }
      }
    },
    TaskCreationOptions.LongRunning))
  .ToArray ();
Task.WaitAll (tasks);
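The `counter` used above is not defined on the slide; a minimal sketch of such a count-down gate, assuming it hands out one run per call until the runs are exhausted (the RunCounter name and shape are my assumption, built on Interlocked.Decrement so the tasks need no lock):

```csharp
using System.Threading;

// Sketch of the undefined `counter` from the slide: each CountDown call
// claims one remaining run; it returns false once all runs are taken.
// RunCounter is an assumed name, not from the deck.
sealed class RunCounter
{
    int _remaining;

    public RunCounter(int runs) { _remaining = runs; }

    // Atomically claim a run; safe to call from many tasks at once.
    public bool CountDown() => Interlocked.Decrement(ref _remaining) >= 0;
}
```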
Parallel.For
 Only for CPU-bound problems

Sharing is bad
 Kills performance
 Race conditions
 Deadlocks

Cache locality
 RAM is a misnomer
 Class design
 Avoid GC

Natural concurrency
 Avoid Parallel.For

Act like an engineer
 Measure before and after
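"Measure before and after" can be as simple as wrapping the candidate code in a Stopwatch; a minimal sketch (the Measure helper is illustrative, not from the deck):

```csharp
using System;
using System.Diagnostics;

// Sketch: time a candidate optimisation before and after changing it.
// Measure is an illustrative helper name.
static class Measure
{
    public static double MeasureMs(Action action)
    {
        var sw = Stopwatch.StartNew();  // high-resolution timer
        action();
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }
}
```

Run both variants under the same conditions several times and compare the numbers; never trust intuition about cache behaviour.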
One more thing…
http://tinyurl.com/wcom-cpuscalability
Mårten Rånge
WCOM AB

@marten_range

Concurrency scalability