Practical Solutions for Multicore Programming

Dr. Guy Korland
Process 1                Process 2
a = acc.get()
a = a + 100
                         b = acc.get()
                         b = b + 50
                         acc.set(b)
acc.set(a)
... Lost Update! ...
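The interleaving above can be replayed deterministically in Java. This is a minimal sketch: `Account` is a hypothetical class exposing only the slide's separate `get`/`set` calls, and both "processes" are simulated in program order on one thread, so the lost update appears on every run instead of only under an unlucky schedule.

```java
public class LostUpdateReplay {
    // Hypothetical account exposing only the separate get/set calls used on the slide.
    static final class Account {
        private long balance;
        long get() { return balance; }
        void set(long value) { balance = value; }
    }

    // Replays the slide's schedule step by step and returns the final balance.
    static long replay() {
        Account acc = new Account();
        long a = acc.get();   // Process 1 reads 0
        a = a + 100;          // Process 1 computes 100 in a local variable
        long b = acc.get();   // Process 2 also reads 0
        b = b + 50;           // Process 2 computes 50
        acc.set(b);           // Process 2 stores 50
        acc.set(a);           // Process 1 stores 100, overwriting Process 2's update
        return acc.get();
    }

    public static void main(String[] args) {
        System.out.println("final balance: " + replay() + " (150 if no update were lost)");
    }
}
```

The program prints a final balance of 100: the `+ 50` is silently discarded because Process 1 wrote back a value computed from a stale read.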
Process 1                Process 2
lock(A)                  lock(B)
…
lock(B)                  lock(A)
... Deadlock! ...
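A standard way to rule out the classic form of this deadlock, where Process 1 locks A then B while Process 2 locks B then A, is to acquire locks in one global order. The sketch below assumes each hypothetical `Resource` carries a unique `id` and always locks the lower id first, so both processes effectively request A before B and no wait cycle can form.

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockOrdering {
    // Hypothetical resource: a value guarded by its own lock; `id` defines the global lock order.
    static final class Resource {
        final int id;
        final ReentrantLock lock = new ReentrantLock();
        long value;

        Resource(int id, long value) {
            this.id = id;
            this.value = value;
        }
    }

    // Moves `amount` between two resources, always locking the lower id first.
    static void transfer(Resource from, Resource to, long amount) {
        Resource first = from.id < to.id ? from : to;
        Resource second = (first == from) ? to : from;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                from.value -= amount;
                to.value += amount;
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }

    // Process 1 moves A->B while Process 2 moves B->A: the pattern that deadlocks without ordering.
    static long[] runOpposingTransfers(int iterations) {
        Resource a = new Resource(1, 1_000), b = new Resource(2, 1_000);
        Thread p1 = new Thread(() -> { for (int i = 0; i < iterations; i++) transfer(a, b, 1); });
        Thread p2 = new Thread(() -> { for (int i = 0; i < iterations; i++) transfer(b, a, 1); });
        p1.start();
        p2.start();
        try {
            p1.join();
            p2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] {a.value, b.value};
    }

    public static void main(String[] args) {
        long[] result = runOpposingTransfers(100_000);
        System.out.println("A=" + result[0] + " B=" + result[1]); // A=1000 B=1000
    }
}
```

Both threads perform the same number of opposite transfers, so the run terminates with both balances back at 1000.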
Process 1                Process 2
atomic {                 atomic {
  a = acc.get()            b = acc.get()
  a = a + 100              b = b + 50
  acc.set(a)               acc.set(b)
}                        }
... WIIIII! ...
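Java has no `atomic { }` block, but its intended meaning can be pinned down with a hypothetical `atomic(Runnable)` helper backed by one global lock. This single-global-lock reading is the simplest correct implementation and is shown only to illustrate the semantics: it serializes every atomic block, which is exactly the scalability cost that hardware (TSX) and software transactional memory try to avoid.

```java
import java.util.concurrent.locks.ReentrantLock;

public class GlobalLockAtomic {
    // One lock shared by every atomic block (hypothetical stand-in for an atomic keyword).
    private static final ReentrantLock GLOBAL = new ReentrantLock();

    static void atomic(Runnable body) {
        GLOBAL.lock();
        try {
            body.run();
        } finally {
            GLOBAL.unlock();
        }
    }

    // Hypothetical account with the slide's get/set API.
    static final class Account {
        private long balance;
        long get() { return balance; }
        void set(long value) { balance = value; }
    }

    // Runs the slide's two processes concurrently, `rounds` times each.
    static long run(int rounds) {
        Account acc = new Account();
        Thread p1 = new Thread(() -> {
            for (int i = 0; i < rounds; i++) {
                atomic(() -> { long a = acc.get(); a = a + 100; acc.set(a); });
            }
        });
        Thread p2 = new Thread(() -> {
            for (int i = 0; i < rounds; i++) {
                atomic(() -> { long b = acc.get(); b = b + 50; acc.set(b); });
            }
        });
        p1.start();
        p2.start();
        try {
            p1.join();
            p2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return acc.get();
    }

    public static void main(String[] args) {
        System.out.println(run(10_000)); // 1500000: no update is lost
    }
}
```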
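Intel TSX
if (_xbegin() == _XBEGIN_STARTED) {      // now running as a hardware transaction
    if (!fallback_mutex.is_acquired()) {
        sums[mygroup] += data[i];
    } else {
        _xabort(1);                      // lock is held: abort and continue in the fallback path
    }
    _xend();                             // commit the hardware transaction
} else {                                 // transaction aborted or could not start
    fallback_mutex.acquire();
    sums[mygroup] += data[i];
    fallback_mutex.release();
}
● Limited to simple instructions.
● Requires fall-back.
● Relies on cache coherency.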
We still need
Software Transactional Memory
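DSTM2

Maurice Herlihy et al, A flexible framework … [OOPSLA06]

@atomic public interface INode {
    int getValue();
    void setValue(int value);
}

// Thread is DSTM2's own class here, not java.lang.Thread
Factory<INode> factory = Thread.makeFactory(INode.class);
final INode node = factory.create();
result = Thread.doIt(new Callable<Boolean>() {
    public Boolean call() {
        node.setValue(value);   // setValue returns void, so report success separately
        return true;
    }
});

● Limited to Objects.
● Very intrusive.
● Doesn't support libraries.
● Bad performance (fork).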
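JVSTM

João Cachopo and António Rito-Silva, Versioned boxes as the basis for memory transactions [SCOOL05]

public class Account {
    private VBox<Long> balance = new VBox<Long>(0L);   // start at 0 so get() never returns null

    public @Atomic void withdraw(long amount) {
        balance.put(balance.get() - amount);
    }
}

● Less intrusive.
● Need to “Announce” shared fields.
● Doesn't support libraries.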
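The versioned-box idea can be sketched as a box holding a chain of committed values, each tagged with the version of the transaction that wrote it. This is a hypothetical simplification, not JVSTM's `VBox` API: `VersionedBox`, `readAt` and `commit` are illustrative names, commits are assumed to be serialized by the caller, and old versions are never pruned. It shows the key property: a reader that started at an older snapshot keeps seeing a consistent old value after a newer commit, so read-only transactions need not abort.

```java
import java.util.concurrent.atomic.AtomicLong;

public class VersionedBoxSketch {
    // One committed value plus the version that wrote it, linked to the previous body.
    static final class Body<T> {
        final long version;
        final T value;
        final Body<T> previous;

        Body(long version, T value, Body<T> previous) {
            this.version = version;
            this.value = value;
            this.previous = previous;
        }
    }

    // Hypothetical multi-version box (illustrative only).
    static final class VersionedBox<T> {
        private volatile Body<T> head;

        VersionedBox(T initial) {
            head = new Body<>(0, initial, null);
        }

        // Newest value whose version is <= the reader's snapshot.
        T readAt(long snapshot) {
            Body<T> b = head;
            while (b.version > snapshot) {
                b = b.previous;
            }
            return b.value;
        }

        // Publishes a new version; assumes the caller serializes commits.
        void commit(T value, long version) {
            head = new Body<>(version, value, head);
        }
    }

    // Returns {value seen by an old snapshot, value seen by a new snapshot}.
    static long[] demo() {
        AtomicLong clock = new AtomicLong();              // global version clock
        VersionedBox<Long> balance = new VersionedBox<>(100L);

        long oldSnapshot = clock.get();                   // reader starts at version 0
        balance.commit(150L, clock.incrementAndGet());    // writer commits version 1
        long newSnapshot = clock.get();                   // a later reader starts at version 1

        return new long[] {balance.readAt(oldSnapshot), balance.readAt(newSnapshot)};
    }

    public static void main(String[] args) {
        long[] seen = demo();
        System.out.println("old snapshot sees " + seen[0] + ", new snapshot sees " + seen[1]);
    }
}
```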
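Atom-Java

B. Hindman and D. Grossman. Atomicity via source-to-source translation. [MSPC06]

public void update(double value) {
    Atomic {
        commission += value;
    }
}

● Add a reserved word.
● Need pre-compilation.
● Doesn't support libraries.
● Even less intrusive.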
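Deuce STM - API
G. Korland, N. Shavit and P. Felber, “Noninvasive Java Concurrency with Deuce STM”, [MultiProg '10]

public class Bank {
    private double commission = 10;

    @Atomic
    public void transaction(Account ac1, Account ac2, double amount) {
        ac1.balance -= (amount + commission);
        ac2.balance += amount;
    }
}

● No reserved words.
● Annotation based.
● No need for pre-compilation.
● Supports external libraries.
● Extendable – research tool.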
Deuce STM - Overview
Benchmarks

(Sun UltraSPARC T2 Plus – 2 x Quad x 8 HT)
Benchmarks

(Azul – Vega2 – 2 x 48)
Benchmark - the dark side
[Chart: y-axis from 0 to 1.2, x-axis from 1 to 10]
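Overhead
● Contention – Retries, Aborts, Contention Manager …
● STM Algorithm – Data structures, optimistic, pessimistic…
● Semantic – Consistency model, Privatization…
● Instrumented Memory access – Linear overhead on every read/write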
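Where the per-access cost of an STM comes from can be made concrete with a deliberately small word-based STM sketch. It is not the algorithm Deuce uses; `TVar`, `Tx` and `atomic` are hypothetical names. Every read goes through a barrier that checks the write log and records a version in the read log, and every write is buffered in the write log. Commit takes one global lock, validates the read log, and publishes the writes. The map operations on every access are the linear instrumentation overhead, and a failed validation forces a retry, which is the contention cost. Reads are validated only at commit time, so a doomed attempt may briefly see inconsistent values before it retries.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

public class TinyStm {
    // A transactional word: an even version means stable, odd means a commit is writing it.
    static final class TVar {
        volatile long value;
        volatile long version;
        TVar(long value) { this.value = value; }
    }

    private static final ReentrantLock COMMIT_LOCK = new ReentrantLock();

    // Logs for one attempt; every instrumented access pays for a map lookup or insert.
    static final class Tx {
        private final Map<TVar, Long> readSet = new HashMap<>();  // TVar -> version observed
        private final Map<TVar, Long> writeSet = new HashMap<>(); // TVar -> buffered value

        long read(TVar v) {                                  // read barrier
            Long buffered = writeSet.get(v);
            if (buffered != null) return buffered;           // read-after-write in this transaction
            while (true) {
                long before = v.version;
                if ((before & 1) != 0) { Thread.onSpinWait(); continue; } // commit in progress
                long value = v.value;
                if (v.version == before) {                   // no commit slipped in between
                    readSet.putIfAbsent(v, before);
                    return value;
                }
            }
        }

        void write(TVar v, long value) {                     // write barrier: redo log only
            writeSet.put(v, value);
        }

        boolean validate() {                                 // called with COMMIT_LOCK held
            for (Map.Entry<TVar, Long> e : readSet.entrySet()) {
                if (e.getKey().version != e.getValue()) return false;
            }
            return true;
        }

        void publish() {                                     // called with COMMIT_LOCK held
            for (Map.Entry<TVar, Long> e : writeSet.entrySet()) {
                TVar v = e.getKey();
                v.version = v.version + 1;                   // odd: readers of v wait
                v.value = e.getValue();
                v.version = v.version + 1;                   // even again, with a new version number
            }
        }
    }

    // Runs body optimistically; retries from scratch if any location it read was changed.
    static <R> R atomic(Function<Tx, R> body) {
        while (true) {
            Tx tx = new Tx();
            R result = body.apply(tx);
            COMMIT_LOCK.lock();
            try {
                if (tx.validate()) {
                    tx.publish();
                    return result;
                }
            } finally {
                COMMIT_LOCK.unlock();
            }
        }
    }

    // Two threads move units in opposite directions; returns the final {x, y}.
    static long[] demo(int rounds) {
        TVar x = new TVar(1_000), y = new TVar(1_000);
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < rounds; i++)
                atomic(tx -> { tx.write(x, tx.read(x) - 1); tx.write(y, tx.read(y) + 1); return null; });
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < rounds; i++)
                atomic(tx -> { tx.write(y, tx.read(y) - 1); tx.write(x, tx.read(x) + 1); return null; });
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] {x.value, y.value};
    }

    public static void main(String[] args) {
        long[] r = demo(10_000);
        System.out.println("x=" + r[0] + " y=" + r[1]); // x=1000 y=1000
    }
}
```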
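Static analysis Optimizations
1. Avoiding instrumentation of accesses to immutable and transaction-local memory.
2. Avoiding lock acquisition and releases for thread-local memory.
3. Avoiding readset population in read-only transactions.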
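Novel Static analysis Optimizations

Y. Afek, G. Korland, and A. Zilberstein,
“Lowering STM Overhead with Static Analysis”, LCPC'10
1. Reduce amount of instrumented memory reads using load elimination.
2. Reduce amount of instrumented memory writes using scalar promotion.
3. Avoid writeset lookups for memory not yet written to.
4. Avoid writeset record keeping for memory that will not be read.
5. Reduce false conflicts by Transaction re-scoping.
...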
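Load elimination and scalar promotion, two of the optimizations this work proposes, can be illustrated by counting barrier calls. In this sketch `CountingTx` is a hypothetical stand-in for an STM's read and write barriers that only counts how often they run; `naive` is the code as written, and `promoted` is what a compiler could emit instead, which is valid only if it can prove that nothing else in the transaction reads or writes `sum` inside the loop.

```java
public class ScalarPromotionSketch {
    // Hypothetical instrumented memory: barriers that count how often they execute.
    static final class CountingTx {
        int reads;
        int writes;

        long read(long[] cell) { reads++; return cell[0]; }
        void write(long[] cell, long value) { writes++; cell[0] = value; }
    }

    // As written: one read barrier and one write barrier per element.
    static void naive(CountingTx tx, long[] sum, int[] data) {
        for (int d : data) {
            tx.write(sum, tx.read(sum) + d);
        }
    }

    // After load elimination and scalar promotion: one read and one write in total.
    static void promoted(CountingTx tx, long[] sum, int[] data) {
        long local = tx.read(sum);   // single instrumented load
        for (int d : data) {
            local += d;              // uninstrumented local arithmetic
        }
        tx.write(sum, local);        // single instrumented store
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5};
        CountingTx a = new CountingTx(), b = new CountingTx();
        long[] s1 = {0}, s2 = {0};
        naive(a, s1, data);
        promoted(b, s2, data);
        System.out.println("naive:    sum=" + s1[0] + " reads=" + a.reads + " writes=" + a.writes);
        System.out.println("promoted: sum=" + s2[0] + " reads=" + b.reads + " writes=" + b.writes);
    }
}
```

Both versions compute a sum of 15, but the naive loop runs 5 read and 5 write barriers while the promoted one runs 1 of each; the saving grows linearly with the loop length.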
Benchmarks – K-Means
We still need
Fine-Grained Concurrent Data Structures
e.g. Pool
Producers P1, P2, …, Pn call Put(x), Put(y), Put(z) on a shared pool; consumers C1, C2, …, Cn call Get( ).
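Java - pools
1. SynchronousQueue/Stack - pairing up function without buffering. Producers and consumers wait for one another.
2. LinkedBlockingQueue - Producers put their value and leave, Consumers wait for a value to become available.
3. ConcurrentLinkedQueue - Producers put their value and leave, Consumers return null if the pool is empty.
● Limited scalability.
● Pool doesn't need LIFO/FIFO.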
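The pool semantics of three JDK classes, `SynchronousQueue`, `LinkedBlockingQueue` and `ConcurrentLinkedQueue`, can be checked directly. The helper method names below are hypothetical; `offer`, `put`, `take` and `poll` are the standard `java.util.concurrent` queue methods.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;

public class JavaPools {
    // SynchronousQueue: no buffer, so offer() fails when no consumer is waiting...
    static boolean synchronousOfferAlone() {
        return new SynchronousQueue<Integer>().offer(1);
    }

    // ...and a hand-off completes only when a producer and a consumer meet.
    static int synchronousHandOff() {
        SynchronousQueue<Integer> q = new SynchronousQueue<>();
        Thread producer = new Thread(() -> {
            try {
                q.put(42);                 // blocks until the consumer below arrives
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        try {
            int v = q.take();
            producer.join();
            return v;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // LinkedBlockingQueue: the producer leaves its value behind in the buffer.
    static Integer blockingPollAfterPut() {
        LinkedBlockingQueue<Integer> q = new LinkedBlockingQueue<>();
        q.offer(7);
        return q.poll();
    }

    // On an empty LinkedBlockingQueue a consumer waits (here only up to a timeout, then null).
    static Integer blockingPollEmpty() {
        try {
            return new LinkedBlockingQueue<Integer>().poll(20, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // ConcurrentLinkedQueue: non-blocking; a consumer gets null if the pool is empty.
    static Integer nonBlockingPollEmpty() {
        return new ConcurrentLinkedQueue<Integer>().poll();
    }

    public static void main(String[] args) {
        System.out.println(synchronousOfferAlone());  // false
        System.out.println(synchronousHandOff());     // 42
        System.out.println(blockingPollAfterPut());   // 7
        System.out.println(blockingPollEmpty());      // null
        System.out.println(nonBlockingPollEmpty());   // null
    }
}
```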
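ED-Tree
Scalable Producer-Consumer Pools Based on Elimination-Diffraction Trees
(Y. Afek, G. Korland, M. Natanzon, N. Shavit)
Merges:
● LinkedBlockingQueue
● Elimination-Tree Structure (Shavit and Touitou)
● Diffracting-Tree Structure (Shavit and Zemach)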
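The tree half of the design can be sketched as a binary tree of toggle bits whose leaves are `LinkedBlockingQueue`s. In this simplified, hypothetical `ToggleTreePool`, each internal node keeps one toggle for `put` and one for `get`; every operation atomically flips the toggle at each node it visits and goes left or right accordingly, which spreads puts, and separately gets, evenly across the leaves so no single queue becomes a hot spot. The per-node elimination and diffraction step of the actual ED-Tree, where operations can pair off before reaching a leaf, is omitted.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class ToggleTreePool<T> {
    private final int depth;
    private final AtomicBoolean[] putToggles;  // one toggle per internal node, heap-ordered
    private final AtomicBoolean[] getToggles;
    private final LinkedBlockingQueue<T>[] leaves;

    @SuppressWarnings("unchecked")
    public ToggleTreePool(int depth) {
        this.depth = depth;
        int internal = (1 << depth) - 1;
        putToggles = new AtomicBoolean[internal];
        getToggles = new AtomicBoolean[internal];
        for (int i = 0; i < internal; i++) {
            putToggles[i] = new AtomicBoolean();
            getToggles[i] = new AtomicBoolean();
        }
        leaves = (LinkedBlockingQueue<T>[]) new LinkedBlockingQueue[1 << depth];
        for (int i = 0; i < leaves.length; i++) {
            leaves[i] = new LinkedBlockingQueue<>();
        }
    }

    // Atomically flips a toggle and returns its previous value.
    private static boolean flip(AtomicBoolean toggle) {
        while (true) {
            boolean old = toggle.get();
            if (toggle.compareAndSet(old, !old)) return old;
        }
    }

    // Walks root to leaf: left on a false toggle, right on a true one.
    private int route(AtomicBoolean[] toggles) {
        int node = 0;
        for (int level = 0; level < depth; level++) {
            node = flip(toggles[node]) ? 2 * node + 2 : 2 * node + 1;
        }
        return node - toggles.length;          // leaf index
    }

    public void put(T item) {
        leaves[route(putToggles)].add(item);
    }

    // Blocks until a producer has delivered an item to the chosen leaf.
    public T get() {
        try {
            return leaves[route(getToggles)].take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ToggleTreePool<Integer> pool = new ToggleTreePool<>(2);   // 4 leaves
        for (int i = 1; i <= 8; i++) pool.put(i);
        for (int i = 1; i <= 8; i++) System.out.print(pool.get() + " ");
        System.out.println();
    }
}
```

Run sequentially, the gets follow the same toggle pattern as the puts, so the demo prints `1 2 3 4 5 6 7 8`; under concurrency the pool gives no such ordering, which is acceptable because a pool does not need FIFO or LIFO.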
Performance
What about other cases?
Do we really need Linearizability?
Can we make it more formal?
The solution:
Relax the Linearizability Requirements
Y. Afek, G. Korland, and A. Yanovsky,
“Quasi-Linearizability: relaxed consistency for improved concurrency”, OPODIS'10
e.g. Task Queue
[Diagram: task producers enqueue tasks at the tail; task consumers dequeue tasks from the head.]
K-Quasi Task Queue
[Diagram: the same task queue, with a window of k tasks at the head from which several consumers dequeue concurrently.]
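A sequential model of the k-quasi relaxation, under the assumption that the queue is split into consecutive segments of k enqueues and a dequeue may return any remaining item of the oldest non-empty segment. `KQuasiQueue` is a hypothetical name, and the seeded random choice stands in for the several consumers in the picture; this is not a concurrent implementation. It only demonstrates the ordering guarantee: an item's dequeue position differs from its enqueue position by at most k - 1, and k = 1 degenerates to an ordinary FIFO queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Random;

public class KQuasiQueue<T> {
    // A segment accepts exactly k enqueues over its lifetime.
    private static final class Segment<E> {
        final List<E> items = new ArrayList<>();
        int enqueued;
    }

    private final int k;
    private final Random random;
    private final Deque<Segment<T>> segments = new ArrayDeque<>();

    public KQuasiQueue(int k, long seed) {
        if (k < 1) throw new IllegalArgumentException("k must be at least 1");
        this.k = k;
        this.random = new Random(seed);
    }

    public void enqueue(T item) {
        Segment<T> tail = segments.peekLast();
        if (tail == null || tail.enqueued == k) {
            tail = new Segment<>();
            segments.addLast(tail);
        }
        tail.items.add(item);
        tail.enqueued++;
    }

    // Removes a random item of the oldest segment that still holds items; null if empty.
    public T dequeue() {
        while (!segments.isEmpty()) {
            Segment<T> head = segments.peekFirst();
            List<T> items = head.items;
            if (!items.isEmpty()) {
                int i = random.nextInt(items.size());
                T item = items.get(i);
                items.set(i, items.get(items.size() - 1)); // swap-remove: order inside a segment is free
                items.remove(items.size() - 1);
                return item;
            }
            if (head.enqueued < k) return null;  // head is the unfinished tail segment: queue is empty
            segments.pollFirst();                // fully used and drained: retire it
        }
        return null;
    }

    public static void main(String[] args) {
        KQuasiQueue<Integer> q = new KQuasiQueue<>(3, 42L);
        for (int i = 0; i < 9; i++) q.enqueue(i);
        for (int i = 0; i < 9; i++) System.out.print(q.dequeue() + " ");
        System.out.println();
    }
}
```

Because every earlier segment is full and fully drained before any item of a later segment can leave, the items of segment s always occupy dequeue positions s*k through s*k + k - 1.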
Quasi Linearizable Definition
H' = 1 2 3 4 5 6
H  = 4 1 2 3 5 6
Distance 3
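The distance in this example (legal order H' = 1 2 3 4 5 6, observed order H = 4 1 2 3 5 6, distance 3) can be reproduced with a simplified measure: the largest number of positions any single operation must move to turn H into H'. This hypothetical helper covers only the arithmetic of the example and assumes every operation label appears exactly once in both histories; the formal definition in the OPODIS'10 paper is more general.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class QuasiDistance {
    // Largest shift of any single operation between the observed order and the legal order.
    // Assumes every operation label appears exactly once in each history.
    static int distance(List<Integer> legal, List<Integer> observed) {
        if (legal.size() != observed.size()) {
            throw new IllegalArgumentException("histories must contain the same operations");
        }
        Map<Integer, Integer> legalPosition = new HashMap<>();
        for (int i = 0; i < legal.size(); i++) {
            legalPosition.put(legal.get(i), i);
        }
        int max = 0;
        for (int i = 0; i < observed.size(); i++) {
            Integer j = legalPosition.get(observed.get(i));
            if (j == null) {
                throw new IllegalArgumentException("unknown operation " + observed.get(i));
            }
            max = Math.max(max, Math.abs(i - j));
        }
        return max;
    }

    public static void main(String[] args) {
        List<Integer> hPrime = List.of(1, 2, 3, 4, 5, 6); // legal sequential history H'
        List<Integer> h = List.of(4, 1, 2, 3, 5, 6);      // observed history H
        System.out.println(distance(hPrime, h));          // 3: operation 4 moves three places
    }
}
```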
More motivation...
● Statistical Counter
● ID generator
● Web Cache
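An ID generator is a natural fit for the relaxation because it only needs unique IDs, not IDs handed out in one global increasing order. A minimal sketch, assuming a hypothetical `BlockIdGenerator`: each thread reserves a block of IDs with a single `getAndAdd` on a shared counter and then hands them out locally, so the shared counter is touched once per block instead of once per ID, at the price of IDs from different threads no longer appearing in global order. For the statistical-counter case, the JDK's `java.util.concurrent.atomic.LongAdder` is documented in a similar spirit: its `sum()` is not an atomic snapshot while updates are in progress.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class BlockIdGenerator {
    private final AtomicLong nextBlockStart = new AtomicLong();
    private final int blockSize;
    // Per-thread {next ID, end of block (exclusive)}; starts exhausted so the first call reserves a block.
    private final ThreadLocal<long[]> range = ThreadLocal.withInitial(() -> new long[] {0, 0});

    public BlockIdGenerator(int blockSize) {
        if (blockSize < 1) throw new IllegalArgumentException("blockSize must be at least 1");
        this.blockSize = blockSize;
    }

    public long nextId() {
        long[] r = range.get();
        if (r[0] == r[1]) {                                   // block used up: reserve the next one
            long start = nextBlockStart.getAndAdd(blockSize); // the only shared-memory update
            r[0] = start;
            r[1] = start + blockSize;
        }
        return r[0]++;
    }

    // Draws `perThread` IDs on each of `threads` threads and returns how many distinct IDs were seen.
    static int distinctIds(int threads, int perThread, int blockSize) {
        BlockIdGenerator gen = new BlockIdGenerator(blockSize);
        Set<Long> ids = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) ids.add(gen.nextId());
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return ids.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctIds(4, 1_000, 16)); // 4000: every ID is unique
    }
}
```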
From Crafting a High-Performance Ready-to-Go STM
to non-Linearizable Data Structures

Speaker note: Circuit routing is the process of automatically producing an interconnection between electronic components. Lee's routing algorithm is attractive for parallelization since circuits consist of thousands of routes, each of which can potentially be routed concurrently.