Tongping Liu, Charlie Curtsinger, Emery Berger
DTHREADS: Efficient Deterministic Multithreading
Insanity: Doing the same thing over and over again and expecting different results.
2
In the Beginning…
3
There was the Core.
4
And it was Good.
5
It gave us our Daily Speed.
6
Until the Apocalypse.
7
And the Speed was no Moore.
8
And then came a False Prophet…
9
10
Want speed?
11
I BRING YOU THE GIFT OF PARALLELISM!
12
color = ; row = 0; // globals
void nextStripe(){
for (c = 0; c < Width; c++)
drawBox (c,row,color);
color = (color == )?  : ;
row++;
}
for (n = 0; n < 9; n++)
pthread_create(t[n], nextStripe);
for (n = 0; n < 9; n++)
pthread_join(t[n]);
JUST USE THREADS…
13
14
15
16
17
18
pthreads
race conditions
atomicity violations
deadlock
order violations
19
Salvation?
20
21
pthreads
race conditions
atomicity violations
deadlock
order violations
DTHREADS
deterministic
race conditions
atomicity violations
deadlock
order violations
22
DTHREADS Enables…
Race-free Executions
Replay Debugging w/o Logging
Replicated State Machines
23
DTHREADS: Efficient Determinism
Usually faster than the state of the art
[Figure: bar chart of runtime relative to pthreads (y-axis, 0 to 6) for CoreDet, dthreads, and pthreads; bar labels 8.4 and 7.8]
24
DTHREADS: Efficient Determinism
Generally as fast or faster than pthreads
[Figure: bar chart of runtime relative to pthreads (y-axis, 0 to 6) for CoreDet, dthreads, and pthreads; bar labels 8.4 and 7.8]
25
% g++ myprog.cpp -ldthread   (instead of -lpthread)
DTHREADS: Easy to Use
26
Isolation
threads: shared address space
processes: disjoint address spaces
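To make the contrast concrete, here is a minimal sketch (not DTHREADS code) built on the POSIX fork call. A child process writes to a global, and the parent checks that its own copy is untouched. The names shared_counter and parent_view_after_child_write are invented for this illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int shared_counter = 0;   /* a global that threads would all share */

/* Fork a child that overwrites the global, wait for it to exit, and
 * return the value the parent still sees afterwards (-1 on error). */
int parent_view_after_child_write(void) {
    shared_counter = 0;
    pid_t pid = fork();
    if (pid < 0)
        return -1;               /* fork failed */
    if (pid == 0) {
        shared_counter = 42;     /* changes only the child's private copy */
        _exit(0);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;               /* could not reap the child */
    return shared_counter;       /* the parent's copy is unchanged */
}
```

Calling parent_view_after_child_write() returns 0 when fork succeeds: the child's write never reaches the parent. With threads, the same write would be visible immediately.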
27
Performance: Processes vs. Threads
[Figure: normalized execution time (y-axis, 0.0 to 1.4) against thread execution time in ms (x-axis, 1 to 1024 in powers of two), for threads and processes]
28
29
30
“Shared Memory”
31
Snapshot pages
before modifications
“Shared Memory”
32
Write back diffs
“Shared Memory”
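The snapshot and write-back steps can be sketched in a few lines of C. This is a simplified illustration, not the DTHREADS implementation: it compares byte by byte, assumes a 4096-byte page, and uses invented names (SKETCH_PAGE_SIZE, snapshot_page, commit_diffs, demo_merge).

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096   /* assumed page size for this sketch */

/* Keep a pristine copy ("twin") of a page before it is first modified. */
void snapshot_page(unsigned char *twin, const unsigned char *working) {
    memcpy(twin, working, SKETCH_PAGE_SIZE);
}

/* Write back only the bytes that differ from the twin.
 * Returns the number of bytes committed to the shared page. */
size_t commit_diffs(unsigned char *shared, const unsigned char *working,
                    const unsigned char *twin) {
    size_t changed = 0;
    for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++) {
        if (working[i] != twin[i]) {
            shared[i] = working[i];
            changed++;
        }
    }
    return changed;
}

/* Two isolated workers start from the same page and write different bytes.
 * Returns 1 if both updates survive the two commits. */
int demo_merge(void) {
    unsigned char shared[SKETCH_PAGE_SIZE];
    unsigned char work1[SKETCH_PAGE_SIZE], twin1[SKETCH_PAGE_SIZE];
    unsigned char work2[SKETCH_PAGE_SIZE], twin2[SKETCH_PAGE_SIZE];

    memset(shared, 0, sizeof shared);
    memcpy(work1, shared, sizeof shared);
    snapshot_page(twin1, work1);
    memcpy(work2, shared, sizeof shared);
    snapshot_page(twin2, work2);

    work1[10] = 7;                       /* worker 1's private write */
    work2[20] = 9;                       /* worker 2's private write */

    commit_diffs(shared, work1, twin1);
    commit_diffs(shared, work2, twin2);  /* leaves byte 10 alone: unchanged vs. twin2 */
    return shared[10] == 7 && shared[20] == 9;
}
```

Because each commit touches only the bytes its worker actually changed, the second commit does not overwrite the first worker's update with stale data.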
33
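Update in Deterministic Time & Order
[Diagram: “Thread” 1, “Thread” 2, and “Thread” 3 alternate between parallel and serial phases; the serial phases are marked by synchronization calls such as mutex_lock, cond_wait, and pthread_create]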
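The ordering idea can be illustrated with a small single-process simulation. The slides do not say how the commit order is chosen, so this sketch simply assumes worker-index order. All names here (NUM_WORKERS, struct pending, commit_in_id_order, run_round) are hypothetical.

```c
#include <stddef.h>

#define NUM_WORKERS 3

/* One pending update per worker: the value it wants to store in a shared cell. */
struct pending {
    int has_update;
    int value;
};

/* Serial phase: apply updates in worker-index order, no matter in which
 * order the workers reached the synchronization point. */
int commit_in_id_order(int shared, const struct pending updates[NUM_WORKERS]) {
    for (size_t id = 0; id < NUM_WORKERS; id++) {
        if (updates[id].has_update)
            shared = updates[id].value;
    }
    return shared;
}

/* Simulate one round: workers finish their parallel phase in arrival_order,
 * each wanting to write a distinct value to the same shared cell. */
int run_round(const int arrival_order[NUM_WORKERS]) {
    struct pending updates[NUM_WORKERS] = {{0, 0}};
    for (size_t i = 0; i < NUM_WORKERS; i++) {
        int id = arrival_order[i];
        updates[id].has_update = 1;
        updates[id].value = 100 + id;
    }
    return commit_in_id_order(0, updates);
}
```

Whatever arrival order is passed to run_round, the highest-indexed worker commits last, so the shared cell always ends up holding 102. The conflicting writes still happen, but their outcome no longer depends on timing.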
34
DTHREADS performance analysis
[Figure: bar chart of runtime relative to pthreads (y-axis, 0 to 4) for dthreads and pthreads]
35
The Culprit: False Sharing
[Diagram: Thread 1 on Core 1 and Thread 2 on Core 2, both backed by main memory; a write on one core sends "Invalidate" to the other]
36
The Culprit: False Sharing
[Diagram: Thread 1 on Core 1 and Thread 2 on Core 2, both backed by main memory, exchanging "Invalidate" messages; annotated "20x"]
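For readers unfamiliar with false sharing, the sketch below shows the data layout that triggers it, along with the conventional manual workaround in ordinary threaded C code, which is padding. This is not how DTHREADS addresses the problem. The 64-byte cache line is an assumption, and the struct and function names are invented for this example.

```c
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64   /* assumed cache-line size; varies by CPU */

/* Both counters will usually sit on one cache line, so writes to a and b
 * from different cores invalidate each other's cached copy. */
struct packed_counters {
    long a;
    long b;
};

/* Each counter is aligned to its own cache line, so the writes no longer
 * interfere, at the cost of extra memory. */
struct padded_counters {
    alignas(CACHE_LINE) long a;
    alignas(CACHE_LINE) long b;
};

/* Distance in bytes between the two fields of each layout. */
size_t packed_distance(void) {
    return offsetof(struct packed_counters, b) - offsetof(struct packed_counters, a);
}

size_t padded_distance(void) {
    return offsetof(struct padded_counters, b) - offsetof(struct padded_counters, a);
}
```

packed_distance() is smaller than a cache line, while padded_distance() is at least CACHE_LINE bytes, so two threads updating a and b no longer contend for the same line.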
37
DTHREADS: Eliminates False Sharing!
[Diagram: Process 1 on Core 1 and Process 2 on Core 2, each separate from the global state]
38
DTHREADS: Detailed Analysis
[Figure: bar chart of runtime relative to pthreads (y-axis, 0 to 6) for three configurations: ordering only, isolation only, and dthreads]
39
40
41
DTHREADS: Scalable Determinism
[Figure: bar chart of the speedup of 8 cores over 2 cores (y-axis, 0 to 4) for CoreDet, dthreads, and pthreads]
42
43
44
DTHREADS
% g++ myprog.cpp -ldthread
45


Editor's Notes

  • #2 In the beginning, there was the Core. And it was good.
  • #3 Casts out the demons of nondeterminism
  • #4 Highlight when same speed or faster.
  • #5 Highlight when same speed or faster.
  • #6 Obviously this doesn’t preserve shared memory semantics, so we need to commit changes made by one thread so they become visible to others.
  • #7 ADD ANIMATIONS: threads initially on one core then migrating, vs. processes spewed across cores
  • #8 ADD ANIMATIONS: threads initially on one core then migrating, vs. processes spewed across cores
  • #9 ADD ANIMATIONS: threads initially on one core then migrating, vs. processes spewed across cores
  • #11 It’s not *always* as fast or faster than pthreads. Show the slow cases first, then highlight the faster parts.
  • #36 The cache coherence protocol turns false sharing into an unpleasant performance problem.
  • #39 Panel 1 = what it does, panel 2 = how, panel 3 = efficient, panel 4 = easy to use