1
What is a Data Race?
• Two concurrent accesses to a shared location, at least one of them for writing.
• Indicative of a bug
Thread 1      Thread 2
X++           T=Y
Z=2           T=X
2
How Can Data Races be Prevented?
• Explicit synchronization between threads:
  – Locks
  – Critical Sections
  – Barriers
  – Mutexes
  – Semaphores
  – Monitors
  – Events
  – Etc.
Thread 1      Thread 2
Lock(m)
X++
Unlock(m)     Lock(m)
              T=X
              Unlock(m)
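The locking pattern in the diagram can be sketched in C++ as follows. This is a minimal illustration, assuming `std::mutex` stands in for the slide's `Lock(m)`/`Unlock(m)`; the function names `increment`, `read_x` and `run_two_threads` are hypothetical and not part of the deck.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

int X = 0;       // shared location from the slide
std::mutex m;    // plays the role of the slide's lock m

// Writer body: X++ inside the critical section.
void increment(int times) {
    for (int k = 0; k < times; ++k) {
        std::lock_guard<std::mutex> guard(m);  // Lock(m) ... Unlock(m)
        ++X;
    }
}

// Reader body: T = X inside the same critical section.
int read_x() {
    std::lock_guard<std::mutex> guard(m);
    return X;
}

// Hypothetical driver: two concurrent writers, then a read after both joined.
int run_two_threads(int times) {
    X = 0;
    std::thread t1(increment, times);
    std::thread t2(increment, times);
    t1.join();
    t2.join();
    return read_x();
}
```

Because every access to `X` happens while `m` is held, no increment is lost: `run_two_threads(n)` returns `2 * n`. Without the lock, the two `++X` sequences could interleave and drop updates.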
3
Is This Sufficient?
• Yes!
• No!
  – Programmer dependent
    Correctness: the programmer may forget to synchronize
    • Need tools to detect data races
  – Expensive
    Efficiency: to achieve correctness, the programmer may over-synchronize
    • Need tools to remove excessive synchronization
4
Where is Waldo?
    #define N 100
    Type* g_stack = new Type[N];
    int g_counter = 0;
    Lock g_lock;
    void push( Type& obj ) { lock(g_lock); ... unlock(g_lock); }
    void pop( Type& obj )  { lock(g_lock); ... unlock(g_lock); }
    void popAll( ) {
        lock(g_lock);
        delete[] g_stack;
        g_stack = new Type[N];
        g_counter = 0;
        unlock(g_lock);
    }
    int find( Type& obj, int number ) {
        int i;  // declared outside the loop so it is still in scope below
        lock(g_lock);
        for (i = 0; i < number; i++)
            if (obj == g_stack[i]) break;  // Found!!!
        if (i == number) i = -1;           // Not found... Return -1 to caller
        unlock(g_lock);
        return i;
    }
    int find( Type& obj ) {
        return find( obj, g_counter );
    }
5
Can You Find the Race?
    #define N 100
    Type* g_stack = new Type[N];
    int g_counter = 0;
    Lock g_lock;
    void push( Type& obj ) { lock(g_lock); ... unlock(g_lock); }
    void pop( Type& obj )  { lock(g_lock); ... unlock(g_lock); }
    void popAll( ) {
        lock(g_lock);
        delete[] g_stack;
        g_stack = new Type[N];
        g_counter = 0;                  // write (g_lock held)
        unlock(g_lock);
    }
    int find( Type& obj, int number ) {
        int i;
        lock(g_lock);
        for (i = 0; i < number; i++)
            if (obj == g_stack[i]) break;  // Found!!!
        if (i == number) i = -1;           // Not found... Return -1 to caller
        unlock(g_lock);
        return i;
    }
    int find( Type& obj ) {
        return find( obj, g_counter );  // read (g_lock not yet held)
    }
A similar problem was found in java.util.Vector.
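The race is that the one-argument `find` reads `g_counter` before `g_lock` is taken, while `popAll` writes it under the lock. One possible repair, sketched below, is to read the count only while the lock is held. This is an illustrative rewrite, not the deck's code: `LockedStack` is a hypothetical class, `int` replaces `Type`, and `std::mutex` with `std::vector` replace the slide's `Lock` and raw array.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the slide's global stack; int replaces Type.
class LockedStack {
public:
    void push(int obj) {
        std::lock_guard<std::mutex> guard(lock_);
        items_.push_back(obj);
    }

    void popAll() {
        std::lock_guard<std::mutex> guard(lock_);
        items_.clear();  // the write to the element count
    }

    // Returns the index of obj, or -1 if it is not present.
    int find(int obj) {
        std::lock_guard<std::mutex> guard(lock_);
        // The count is read while the lock is held, so it cannot be
        // stale by the time the loop runs.
        int count = static_cast<int>(items_.size());
        for (int i = 0; i < count; ++i)
            if (items_[i] == obj) return i;
        return -1;
    }

private:
    std::mutex lock_;
    std::vector<int> items_;
};
```

The design choice is simply to make the read of the count and the scan one critical section, so a concurrent `popAll` can run either entirely before or entirely after a `find`.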
6
Detecting Data Races?
• NP-hard [Netzer&Miller 1990]
  – Input size = # instructions performed
  – Even for 3 threads only
  – Even with no loops/recursion
• Execution orders/scheduling: (#threads)^(thread_length)
• # inputs
• Detection code's side effects
• Weak memory, instruction reordering, atomicity
7
Motivation
Run-time framework goals
• Collect a complete trace of a program's user-mode execution
• Keep the tracing overhead low in both space and time
• Re-simulate the traced execution deterministically from the collected trace, with full fidelity down to the instruction level
  – Full fidelity: user mode only, no tracing of the kernel, only user-mode I/O callbacks
Advantages
• Complete program trace that can be analyzed from multiple perspectives (replay analyzers: debuggers, locality, etc.)
• Trace can be collected on one machine and replayed on other machines (or analyzed live by streaming)
Challenges: Trace Size and Performance
8
Original Record-Replay Approaches
• InstantReplay '87
  – Record order of memory accesses
  – Overhead may affect program behavior
• RecPlay '00
  – Record only synchronizations
  – Not deterministic if the program has data races
• Netzer '93
  – Record optimal trace
  – Too expensive to keep track of all memory locations
• Bacon & Goldstein '91
  – Record memory bus transactions with hardware
  – High logging bandwidth
9
Motivation
• Increasing use of, and development for, multi-core processors
• Multi-threaded (MT) program behavior is non-deterministic
• To effectively debug software, developers must be able to replay executions that exhibit concurrency bugs
  – Shared memory updates happen in a different order
10
Related Concepts
• Runtime interpretation/translation of binary instructions
  – Requires no static instrumentation or special symbol information
  – Handles dynamically generated code and self-modifying code
  – Recording/logging: ~100-200x
• More recent logging
  – Proposed hardware support (for MT domain)
    • FDR (Flight Data Recorder)
    • BugNet (cache bits set on first load)
    • RTR (Regulated Transitive Reduction)
    • DeLorean (ISCA 2008, chunks of instructions)
    • Strata (time layer across all the logs for the running threads)
  – iDNA (Diagnostic infrastructure using NirvanA, Microsoft)
11
Deterministic Replay
Re-execute the exact same sequence of instructions as recorded in a previous run
• Single-threaded programs
  – Record load values needed for reproducing the behavior of a run (Load Log)
  – Registers updated by system calls and signal handlers (Reg Log)
  – Output of special instructions: RDTSC, CPUID (Reg Log)
  – System calls (virtualization: cloning arguments, updates)
  – Checkpointing (log summary ~10 million)
• Multi-threaded programs
  – Log interleaving among threads (shared memory update ordering: SMO Log)
12
PinSEL – System Effect Log (SEL)
Logging program load values needed for deterministic replay:
– First access to a memory location
– Values modified by the system (system effect) and read by the program
– Machine- and time-sensitive instructions (cpuid, rdtsc)
[Figure: program execution timeline with each load marked "Logged" or "Not Logged". Before the system call: Load A (A = 111), Load C (C = 9), Load D (D = 10), Store A (A ← 111), Store B (B ← 55). The system call modifies locations B (B → 0) and C (C → 99). After the system call: Load B (B = 0), Load C (C = 99), Load D (D = 10).]
• Trace size is ~4-5 bytes per instruction
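The logging rule above (log a load only on first access or when the system changed the value behind the program's back) can be sketched with a shadow copy of memory. This is a simplified illustration, not PinSEL's actual implementation: `LoadLogger`, `on_load` and `on_store` are hypothetical names, and an `unordered_map` is assumed as the shadow memory.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical logger: shadow_ holds the value a replayer would predict
// for each address from the program's own earlier loads and stores.
class LoadLogger {
public:
    struct Entry {
        std::uint64_t addr;
        int value;
    };

    // Called on every load; logs only when the prediction would be wrong.
    void on_load(std::uint64_t addr, int value) {
        auto it = shadow_.find(addr);
        if (it == shadow_.end() || it->second != value) {
            log_.push_back({addr, value});  // first access or system effect
            shadow_[addr] = value;
        }
    }

    // Called on every store; replay reproduces stores, so nothing is logged.
    void on_store(std::uint64_t addr, int value) { shadow_[addr] = value; }

    const std::vector<Entry>& log() const { return log_; }

private:
    std::unordered_map<std::uint64_t, int> shadow_;
    std::vector<Entry> log_;
};
```

Feeding it the figure's sequence (three first loads, two stores, then loads of B, C and D after a system call that changed B and C without going through `on_store`) produces five log entries: the three first accesses plus the post-call loads of B and C. The final load of D matches the shadow copy and is not logged.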
13
reads
• Observation: Hardware caches eliminate most off-chip reads
• Optimize logging:
  – Logger and replayer simulate identical cache memories
  – A simple cache (the memory copy structure) decides which values to log. There are no tags or valid bits to check; if the values mismatch, they are logged.
• Average trace size is <1 bit per instruction
    i = 1;
    for (j = 0; j < 10; j++)
    {
        i = i + j;
    }
    k = i; // value read is 46
    System_call();
    k = i; // value read is 0 (not predicted)
• The only read that is not predicted, and is therefore logged, follows the system call
14
Example Overhead
• PinSEL and PinPLAY
  – Initial work (2006) with single-threaded programs: SPEC2000 ref runs show 130x slowdown for PinSEL and ~80x for PinPLAY (without in-lining)
  – Working with a subset of SPLASH2 benchmarks: 230x slowdown for PinSEL
• Now: Geo-mean SPEC2006
  – Pin 1.4x
  – Logger 83.6x
  – Replayer 1.4x
15
Example: Microsoft iDNA Trace Writer Performance
Application   Simulated      Trace File   Trace File     Native      Execution    Execution
              Instructions   Size         Bits /         Execution   Time While   Overhead
              (millions)                  Instruction    Time        Tracing
Gzip          24,097         245 MB       0.09           11.7s       187s         15.98
Excel         1,781          99 MB        0.47           18.2s       105s         5.76
PowerPoint    7,392          528 MB       0.60           43.6s       247s         5.66
IE            116            5 MB         0.50           0.499s      6.94s        13.90
Vulcan        2,408          152 MB       0.53           2.74s       46.6s        17.01
Satsolver     9,431          1300 MB      1.16           9.78s       127s         12.98
• Memchecker and valgrind are in the 30-40x range on CPU 2006
• iDNA ~11x (does not log shared-memory dependences explicitly)
• Uses a sequential number for every lock-prefixed memory operation: offline data race analysis
16
Logging Shared Memory Ordering
(Cristiano's PinSEL/PLAY Overview)
• Emulation of Directory Based Cache Coherence
  – Identifies RAW, WAR, WAW dependences
  – Indexed by hashing effective address
  – Each entry represents an address range
[Figure: memory references from the program execution (Store A, Load B) are hashed to select one of the Dir Entry slots in the Directory.]
17
Directory Entries
• Every DirEntry maintains:
  – Thread id of the last_writer
  – A timestamp is the number of memory references the thread has executed
  – Vector of timestamps of the last access by each thread to that entry
• On Loads: update the timestamp for the thread in the entry
• On Stores: update the timestamp and the last_writer fields
[Figure: program execution of Thread T1 (1: Store A, 2: Load A, 3: Store F) and Thread T2 (1: Load F, 2: Store A, 3: Load F), next to a Directory with DirEntry [A:D] and DirEntry [E:H]. Each entry shows its last writer id and a vector with the last-access timestamps of T1 and T2.]
18
Detecting Dependences
• RAW dependency between threads T and T' is established if:
  – T executes a load that maps to the directory entry A
  – T' is the last_writer for the same entry
• WAW dependency between T and T' is established if:
  – T executes a store that maps to the directory entry A
  – T' is the last_writer for the same entry
• WAR dependency between T and T' is established if:
  – T executes a store that maps to the directory entry A
  – T' has accessed the same entry in the past and T is not the last_writer
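The three rules can be sketched as one update function on a directory entry. This is a hedged illustration, not the PinSEL/PinPLAY code: `DirEntry`, `Dependence` and `access` are hypothetical names, threads are assumed to be numbered from 0, and one interpretation is assumed for WAR: a thread that is already reported as the last writer (WAW) is not reported a second time as WAR.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical directory entry covering one address range.
struct DirEntry {
    int last_writer = -1;           // -1: no store seen yet
    std::vector<long> last_access;  // per-thread timestamp, 0 = never accessed
    explicit DirEntry(int threads) : last_access(threads, 0) {}
};

struct Dependence {
    std::string kind;  // "RAW", "WAW" or "WAR"
    int from_thread;   // T' in the rules above
    long from_time;    // timestamp of the last access by T' to the entry
};

// Applies one memory reference by thread t at timestamp ts and returns the
// cross-thread dependences it establishes.
std::vector<Dependence> access(DirEntry& e, int t, long ts, bool is_store) {
    std::vector<Dependence> deps;
    // RAW (load) or WAW (store) against a different last writer.
    if (e.last_writer != -1 && e.last_writer != t) {
        deps.push_back({is_store ? "WAW" : "RAW", e.last_writer,
                        e.last_access[e.last_writer]});
    }
    if (is_store) {
        // WAR against every other thread that touched the entry
        // (assumption: the last writer is already covered by WAW).
        for (int other = 0; other < static_cast<int>(e.last_access.size()); ++other) {
            if (other != t && other != e.last_writer && e.last_access[other] > 0)
                deps.push_back({"WAR", other, e.last_access[other]});
        }
        e.last_writer = t;
    }
    e.last_access[t] = ts;
    return deps;
}
```

Each returned dependence corresponds to one SMO log record of the form "thread t cannot execute reference ts until thread `from_thread` has executed reference `from_time`".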
19
Example
Program execution
Thread T1         Thread T2
1: Store A        1: Load F
2: Load A         2: Store A
3: Store F        3: Load F
Directory: DirEntry [A:D] and DirEntry [E:H], each with its last_writer id and the last access to the DirEntry by T1 and by T2.
SMO logs:
T2 2  T1 1   (WAW) Thread T2 cannot execute memory reference 2 until T1 executes its memory reference 1
T1 2  T2 2   (RAW) Thread T1 cannot execute memory reference 2 until T2 executes its memory reference 2
T1 3  T2 3   (WAR)
20
Ordering Memory Accesses
(Reducing log size)
• Preserving order will reproduce execution
  – a→b: "a happens-before b"
  – Ordering is transitive: a→b, b→c means a→c
• Two instructions must be ordered if:
  – they both access the same memory, and
  – at least one of them is a write
21
Constraints: Enforcing Order
• To guarantee a→d, any one of these suffices:
  – a→d
  – b→d
  – a→c
  – b→c
• Suppose we need b→c
  – b→c is necessary
  – a→d is redundant
[Figure: P1 executes a, then b; P2 executes c, then d. Label: overconstrained.]
22
Problem Formulation
• Reproduce exact same conflicts: no more, no less
[Figure: the same two-thread execution (Thread I, Thread J) shown during Recording and during Replay, connected by the Log. Instructions shown: ld A, st B, st C, sub, ld B, add, st C, ld B, st A, st C, ld D, st D. Conflicts are drawn in red, dependences in black.]
23
Log All Conflicts
• Detect conflicts → Write log
• Assign IC (logical timestamps)
• But too many conflicts
Dependence Log (16 bytes per dependence)
  Log J: 2→3, 1→4, 3→5, 4→6
  Log I: 2→3
  Log Size: 5*16 = 80 bytes (10 integers)
[Figure: Thread I and Thread J during Replay, instructions numbered 1 to 6 in each thread, with an arrow for every logged conflict.]
24
Netzer's Transitive Reduction
TR Reduced Log
  Log J: 2→3, 3→5, 4→6
  Log I: 2→3
  Log Size: 64 bytes (8 integers)
[Figure: the same execution of Thread I and Thread J during Replay, instructions numbered 1 to 6. The edge 1→4 is marked "TR reduced": it is implied by 2→3 together with program order.]
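The effect of the reduction on this example can be reproduced with a small filter. This is a simplified sketch and not Netzer's full algorithm: it only drops an edge when an earlier logged edge between the same pair of threads already implies it, whereas the full technique uses vector clocks to catch implications that pass through other threads. The names `Edge` and `reduce` are hypothetical, and instruction counts (IC) are assumed to start at 1.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

struct Edge {
    int src_thread;  // thread of the earlier instruction
    long src_ic;     // its instruction count
    int dst_thread;  // thread of the later instruction that must wait
    long dst_ic;     // its instruction count
};

// Keeps only edges not implied by an earlier kept edge plus program order.
// Edges must arrive ordered by dst_ic within each destination thread, which
// is the order in which a recorder would detect them.
std::vector<Edge> reduce(const std::vector<Edge>& detected) {
    // covered[{dst, src}] = highest src_ic already ordered before the
    // destination thread's current position.
    std::map<std::pair<int, int>, long> covered;
    std::vector<Edge> kept;
    for (const Edge& e : detected) {
        long& seen = covered[{e.dst_thread, e.src_thread}];
        if (e.src_ic <= seen) continue;  // implied by an earlier edge: drop
        seen = e.src_ic;
        kept.push_back(e);
    }
    return kept;
}
```

With Thread I as 0 and Thread J as 1, the five conflicts of the full log reduce to four edges (8 integers): the edge 1→4 is dropped because 2→3 was logged first, and instruction 1 precedes instruction 2 in Thread I.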
25
RTR (Regulated Transitive Reduction):
Stricter Dependences to Aid Vectorization
New Reduced Log
  Log J: 2→3, 4→5
  Log I: 2→3
  Log Size: 48 bytes (6 integers)
[Figure: the same execution of Thread I and Thread J during Replay, instructions numbered 1 to 6. The stricter edge 4→5 is introduced, and the edges 3→5 and 4→6 that it implies are marked "reduced".]
• 4% overhead for RTR+FDR (simulated on GEMS)
• 0.2 MB/core/second logging (Apache)

More Related Content

What's hot

Replication in Distributed Database
Replication in Distributed DatabaseReplication in Distributed Database
Replication in Distributed Database
Abhilasha Lahigude
 
Servlets
ServletsServlets
Servlets
ZainabNoorGul
 
4.3 MySQL + PHP
4.3 MySQL + PHP4.3 MySQL + PHP
4.3 MySQL + PHP
Jalpesh Vasa
 
Database replication
Database replicationDatabase replication
Database replication
Arslan111
 
Servlets
ServletsServlets
Semaphore
Semaphore Semaphore
Semaphore
LakshmiSamivel
 
Replication Techniques for Distributed Database Design
Replication Techniques for Distributed Database DesignReplication Techniques for Distributed Database Design
Replication Techniques for Distributed Database Design
Meghaj Mallick
 
Javafx tutorial
Javafx tutorialJavafx tutorial
Javafx tutorial
sloumaallagui1
 
Goroutines and Channels in practice
Goroutines and Channels in practiceGoroutines and Channels in practice
Goroutines and Channels in practice
Guilherme Garnier
 
DNS & DNSSEC
DNS & DNSSECDNS & DNSSEC
DNS & DNSSEC
APNIC
 
Concurrency With Go
Concurrency With GoConcurrency With Go
Concurrency With Go
John-Alan Simmons
 
Chapter 6.pptx
Chapter 6.pptxChapter 6.pptx
Chapter 6.pptx
RanjanaShevkar
 
Understanding Web Cache
Understanding Web CacheUnderstanding Web Cache
Understanding Web Cache
ProdigyView
 
JavaOne 2013: Memory Efficient Java
JavaOne 2013: Memory Efficient JavaJavaOne 2013: Memory Efficient Java
JavaOne 2013: Memory Efficient Java
Chris Bailey
 
Introduction to Garbage Collection
Introduction to Garbage CollectionIntroduction to Garbage Collection
Introduction to Garbage Collection
Artur Mkrtchyan
 
Chapter 12 transactions and concurrency control
Chapter 12 transactions and concurrency controlChapter 12 transactions and concurrency control
Chapter 12 transactions and concurrency controlAbDul ThaYyal
 
Distributed Database Management System
Distributed Database Management SystemDistributed Database Management System
Distributed Database Management System
AAKANKSHA JAIN
 
Process Scheduling
Process SchedulingProcess Scheduling

What's hot (20)

Replication in Distributed Database
Replication in Distributed DatabaseReplication in Distributed Database
Replication in Distributed Database
 
Servlets
ServletsServlets
Servlets
 
4.3 MySQL + PHP
4.3 MySQL + PHP4.3 MySQL + PHP
4.3 MySQL + PHP
 
Database replication
Database replicationDatabase replication
Database replication
 
Servlets
ServletsServlets
Servlets
 
Semaphore
Semaphore Semaphore
Semaphore
 
Replication Techniques for Distributed Database Design
Replication Techniques for Distributed Database DesignReplication Techniques for Distributed Database Design
Replication Techniques for Distributed Database Design
 
Javafx tutorial
Javafx tutorialJavafx tutorial
Javafx tutorial
 
Goroutines and Channels in practice
Goroutines and Channels in practiceGoroutines and Channels in practice
Goroutines and Channels in practice
 
Database System Architectures
Database System ArchitecturesDatabase System Architectures
Database System Architectures
 
DNS & DNSSEC
DNS & DNSSECDNS & DNSSEC
DNS & DNSSEC
 
Concurrency With Go
Concurrency With GoConcurrency With Go
Concurrency With Go
 
Chapter 6.pptx
Chapter 6.pptxChapter 6.pptx
Chapter 6.pptx
 
Understanding Web Cache
Understanding Web CacheUnderstanding Web Cache
Understanding Web Cache
 
Optional in Java 8
Optional in Java 8Optional in Java 8
Optional in Java 8
 
JavaOne 2013: Memory Efficient Java
JavaOne 2013: Memory Efficient JavaJavaOne 2013: Memory Efficient Java
JavaOne 2013: Memory Efficient Java
 
Introduction to Garbage Collection
Introduction to Garbage CollectionIntroduction to Garbage Collection
Introduction to Garbage Collection
 
Chapter 12 transactions and concurrency control
Chapter 12 transactions and concurrency controlChapter 12 transactions and concurrency control
Chapter 12 transactions and concurrency control
 
Distributed Database Management System
Distributed Database Management SystemDistributed Database Management System
Distributed Database Management System
 
Process Scheduling
Process SchedulingProcess Scheduling
Process Scheduling
 

Viewers also liked

Text classification & sentiment analysis
Text classification & sentiment analysisText classification & sentiment analysis
Text classification & sentiment analysis
M. Atif Qureshi
 
Visual Object Category Recognition
Visual Object Category RecognitionVisual Object Category Recognition
Visual Object Category Recognition
Ashish Gupta
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
lucenerevolution
 
Text classification
Text classificationText classification
Text classification
James Wong
 
Text categorization
Text categorizationText categorization
Text categorization
KU Leuven
 
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
Jimmy Lai
 
Object recognition
Object recognitionObject recognition
Object recognition
saniacorreya
 

Viewers also liked (8)

Text classification & sentiment analysis
Text classification & sentiment analysisText classification & sentiment analysis
Text classification & sentiment analysis
 
Visual Object Category Recognition
Visual Object Category RecognitionVisual Object Category Recognition
Visual Object Category Recognition
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
 
Text classification
Text classificationText classification
Text classification
 
Object recognition
Object recognitionObject recognition
Object recognition
 
Text categorization
Text categorizationText categorization
Text categorization
 
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
 
Object recognition
Object recognitionObject recognition
Object recognition
 

Similar to Data race

Parallel computation
Parallel computationParallel computation
Parallel computation
Jayanti Prasad Ph.D.
 
Dpdk applications
Dpdk applicationsDpdk applications
Dpdk applications
Vipin Varghese
 
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...
David Walker
 
parallel-computation.pdf
parallel-computation.pdfparallel-computation.pdf
parallel-computation.pdf
Jayanti Prasad Ph.D.
 
Swug July 2010 - windows debugging by sainath
Swug July 2010 - windows debugging by sainathSwug July 2010 - windows debugging by sainath
Swug July 2010 - windows debugging by sainath
Dennis Chung
 
Hs java open_party
Hs java open_partyHs java open_party
Hs java open_party
Open Party
 
Programar para GPUs
Programar para GPUsProgramar para GPUs
Programar para GPUs
Alcides Fonseca
 
Apache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmapApache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmap
Kostas Tzoumas
 
Strata Singapore: Gearpump Real time DAG-Processing with Akka at Scale
Strata Singapore: GearpumpReal time DAG-Processing with Akka at ScaleStrata Singapore: GearpumpReal time DAG-Processing with Akka at Scale
Strata Singapore: Gearpump Real time DAG-Processing with Akka at Scale
Sean Zhong
 
Servers and Processes: Behavior and Analysis
Servers and Processes: Behavior and AnalysisServers and Processes: Behavior and Analysis
Servers and Processes: Behavior and Analysisdreamwidth
 
Troubleshooting .net core on linux
Troubleshooting .net core on linuxTroubleshooting .net core on linux
Troubleshooting .net core on linux
Pavel Klimiankou
 
Linux Internals - Part II
Linux Internals - Part IILinux Internals - Part II
Linux Internals - Part II
Emertxe Information Technologies Pvt Ltd
 
Track c-High speed transaction-based hw-sw coverification -eve
Track c-High speed transaction-based hw-sw coverification -eveTrack c-High speed transaction-based hw-sw coverification -eve
Track c-High speed transaction-based hw-sw coverification -evechiportal
 
Java Memory Model
Java Memory ModelJava Memory Model
Java Memory Model
Łukasz Koniecki
 
Memory model
Memory modelMemory model
Memory model
MingdongLiao
 
Threads and multi threading
Threads and multi threadingThreads and multi threading
Threads and multi threading
Antonio Cesarano
 
Software architecture for data applications
Software architecture for data applicationsSoftware architecture for data applications
Software architecture for data applications
Ding Li
 
Meltdown & spectre
Meltdown & spectreMeltdown & spectre
Meltdown & spectre
Sergio Shevchenko
 
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
EUDAT
 

Similar to Data race (20)

Handout3o
Handout3oHandout3o
Handout3o
 
Parallel computation
Parallel computationParallel computation
Parallel computation
 
Dpdk applications
Dpdk applicationsDpdk applications
Dpdk applications
 
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...
 
parallel-computation.pdf
parallel-computation.pdfparallel-computation.pdf
parallel-computation.pdf
 
Swug July 2010 - windows debugging by sainath
Swug July 2010 - windows debugging by sainathSwug July 2010 - windows debugging by sainath
Swug July 2010 - windows debugging by sainath
 
Hs java open_party
Hs java open_partyHs java open_party
Hs java open_party
 
Programar para GPUs
Programar para GPUsProgramar para GPUs
Programar para GPUs
 
Apache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmapApache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmap
 
Strata Singapore: Gearpump Real time DAG-Processing with Akka at Scale
Strata Singapore: GearpumpReal time DAG-Processing with Akka at ScaleStrata Singapore: GearpumpReal time DAG-Processing with Akka at Scale
Strata Singapore: Gearpump Real time DAG-Processing with Akka at Scale
 
Servers and Processes: Behavior and Analysis
Servers and Processes: Behavior and AnalysisServers and Processes: Behavior and Analysis
Servers and Processes: Behavior and Analysis
 
Troubleshooting .net core on linux
Troubleshooting .net core on linuxTroubleshooting .net core on linux
Troubleshooting .net core on linux
 
Linux Internals - Part II
Linux Internals - Part IILinux Internals - Part II
Linux Internals - Part II
 
Track c-High speed transaction-based hw-sw coverification -eve
Track c-High speed transaction-based hw-sw coverification -eveTrack c-High speed transaction-based hw-sw coverification -eve
Track c-High speed transaction-based hw-sw coverification -eve
 
Java Memory Model
Java Memory ModelJava Memory Model
Java Memory Model
 
Memory model
Memory modelMemory model
Memory model
 
Threads and multi threading
Threads and multi threadingThreads and multi threading
Threads and multi threading
 
Software architecture for data applications
Software architecture for data applicationsSoftware architecture for data applications
Software architecture for data applications
 
Meltdown & spectre
Meltdown & spectreMeltdown & spectre
Meltdown & spectre
 
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
 

More from James Wong

Multi threaded rtos
Multi threaded rtosMulti threaded rtos
Multi threaded rtos
James Wong
 
Business analytics and data mining
Business analytics and data miningBusiness analytics and data mining
Business analytics and data mining
James Wong
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discovery
James Wong
 
Cache recap
Cache recapCache recap
Cache recap
James Wong
 
Big picture of data mining
Big picture of data miningBig picture of data mining
Big picture of data mining
James Wong
 
How analysis services caching works
How analysis services caching worksHow analysis services caching works
How analysis services caching works
James Wong
 
Optimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessorsOptimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessors
James Wong
 
Directory based cache coherence
Directory based cache coherenceDirectory based cache coherence
Directory based cache coherence
James Wong
 
Abstract data types
Abstract data typesAbstract data types
Abstract data types
James Wong
 
Abstraction file
Abstraction fileAbstraction file
Abstraction file
James Wong
 
Hardware managed cache
Hardware managed cacheHardware managed cache
Hardware managed cache
James Wong
 
Object model
Object modelObject model
Object model
James Wong
 
Abstract class
Abstract classAbstract class
Abstract class
James Wong
 
Object oriented analysis
Object oriented analysisObject oriented analysis
Object oriented analysis
James Wong
 
Concurrency with java
Concurrency with javaConcurrency with java
Concurrency with java
James Wong
 
Data structures and algorithms
Data structures and algorithmsData structures and algorithms
Data structures and algorithms
James Wong
 
Cobol, lisp, and python
Cobol, lisp, and pythonCobol, lisp, and python
Cobol, lisp, and python
James Wong
 
Inheritance
InheritanceInheritance
Inheritance
James Wong
 

More from James Wong (20)

Multi threaded rtos
Multi threaded rtosMulti threaded rtos
Multi threaded rtos
 
Recursion
RecursionRecursion
Recursion
 
Business analytics and data mining
Business analytics and data miningBusiness analytics and data mining
Business analytics and data mining
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discovery
 
Cache recap
Cache recapCache recap
Cache recap
 
Big picture of data mining
Big picture of data miningBig picture of data mining
Big picture of data mining
 
How analysis services caching works
How analysis services caching worksHow analysis services caching works
How analysis services caching works
 
Optimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessorsOptimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessors
 
Directory based cache coherence
Directory based cache coherenceDirectory based cache coherence
Directory based cache coherence
 
Abstract data types
Abstract data typesAbstract data types
Abstract data types
 
Abstraction file
Abstraction fileAbstraction file
Abstraction file
 
Hardware managed cache
Hardware managed cacheHardware managed cache
Hardware managed cache
 
Object model
Object modelObject model
Object model
 
Abstract class
Abstract classAbstract class
Abstract class
 
Object oriented analysis
Object oriented analysisObject oriented analysis
Object oriented analysis
 
Concurrency with java
Concurrency with javaConcurrency with java
Concurrency with java
 
Data structures and algorithms
Data structures and algorithmsData structures and algorithms
Data structures and algorithms
 
Cobol, lisp, and python
Cobol, lisp, and pythonCobol, lisp, and python
Cobol, lisp, and python
 
Inheritance
InheritanceInheritance
Inheritance
 
Api crash
Api crashApi crash
Api crash
 

Recently uploaded

Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 


Data race

  • 1. What is a Data Race?
       - Two concurrent accesses to a shared location, at least one of them for writing.
       - Indicative of a bug.
       Example:
           Thread 1    Thread 2
           X++         T=Y
           Z=2         T=X
  • 2. How Can Data Races be Prevented?
       - Explicit synchronization between threads: locks, critical sections, barriers, mutexes, semaphores, monitors, events, etc.
       Example:
           Thread 1      Thread 2
           Lock(m)
           X++
           Unlock(m)
                         Lock(m)
                         T=X
                         Unlock(m)
  • 3. Is This Sufficient?
       - Yes!
       - No!
       - Programmer dependent
           - Correctness: the programmer may forget to synchronize.
           - Need tools to detect data races.
       - Expensive
           - Efficiency: to achieve correctness, the programmer may over-synchronize.
           - Need tools to remove excessive synchronization.
  • 4. Where is Waldo?
       #define N 100
       Type* g_stack = new Type[N];
       int g_counter = 0;
       Lock g_lock;

       void push( Type& obj ) { lock(g_lock); ... unlock(g_lock); }
       void pop( Type& obj )  { lock(g_lock); ... unlock(g_lock); }
       void popAll( ) {
           lock(g_lock);
           delete[] g_stack;
           g_stack = new Type[N];
           g_counter = 0;
           unlock(g_lock);
       }
       int find( Type& obj, int number ) {
           int i;  // declared before the loop so it is still in scope after it
           lock(g_lock);
           for (i = 0; i < number; i++)
               if (obj == g_stack[i]) break;   // Found!!!
           if (i == number) i = -1;            // Not found… Return -1 to caller
           unlock(g_lock);
           return i;
       }
       int find( Type& obj ) {
           return find( obj, g_counter );
       }
  • 5. Can You Find the Race?
       #define N 100
       Type* g_stack = new Type[N];
       int g_counter = 0;
       Lock g_lock;

       void push( Type& obj ) { lock(g_lock); ... unlock(g_lock); }
       void pop( Type& obj )  { lock(g_lock); ... unlock(g_lock); }
       void popAll( ) {
           lock(g_lock);
           delete[] g_stack;
           g_stack = new Type[N];
           g_counter = 0;                      // write (g_lock held)
           unlock(g_lock);
       }
       int find( Type& obj, int number ) {
           int i;  // declared before the loop so it is still in scope after it
           lock(g_lock);
           for (i = 0; i < number; i++)
               if (obj == g_stack[i]) break;   // Found!!!
           if (i == number) i = -1;            // Not found… Return -1 to caller
           unlock(g_lock);
           return i;
       }
       int find( Type& obj ) {
           return find( obj, g_counter );      // read (g_lock not held)
       }
       A similar problem was found in java.util.Vector.
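One way to close the race on slide 5 is to read the element count only while the lock is held. The sketch below is an illustration, not the presenter's fix: it assumes `std::mutex` in place of the slide's `Lock`, `int` elements in place of `Type`, and a hypothetical `SharedStack` class with a private helper `findLocked`.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical re-creation of the slide's stack. SharedStack and findLocked
// are names invented for this sketch; int stands in for the slide's Type.
class SharedStack {
public:
    void push(int value) {
        std::lock_guard<std::mutex> guard(lock_);
        items_.push_back(value);
    }

    void popAll() {
        std::lock_guard<std::mutex> guard(lock_);
        items_.clear();
    }

    // Returns the index of value, or -1 if it is absent.
    // The element count is read while the lock is held, so it cannot
    // change between the read and the scan.
    int find(int value) {
        std::lock_guard<std::mutex> guard(lock_);
        return findLocked(value, static_cast<int>(items_.size()));
    }

private:
    // Caller must hold lock_.
    int findLocked(int value, int number) const {
        for (int i = 0; i < number; i++)
            if (items_[i] == value) return i;
        return -1;
    }

    std::mutex lock_;
    std::vector<int> items_;
};
```

The design choice is to split the search into a public method that takes the lock and a private helper that assumes it is already held, so no caller can pass in a count that was read without synchronization.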
  • 6. Detecting Data Races?
       - NP-hard [Netzer & Miller 1990]
       - Input size = number of instructions performed
       - Even for only 3 threads
       - Even with no loops or recursion
       - Execution orders/scheduling: (#threads)^(thread_length)
       - Number of inputs
       - Detection code's side effects
       - Weak memory, instruction reordering, atomicity
  • 7. Motivation
       Run-time framework goals
       - Collect a complete trace of a program's user-mode execution.
       - Keep the tracing overhead low in both space and time.
       - Re-simulate the traced execution deterministically from the collected trace, with full fidelity down to the instruction level.
       - Full fidelity: user mode only, no tracing of the kernel, only user-mode I/O callbacks.
       Advantages
       - A complete program trace can be analyzed from multiple perspectives (replay analyzers: debuggers, locality, etc.).
       - A trace can be collected on one machine and replayed on other machines (or analyzed live by streaming).
       Challenges: trace size and performance
  • 8. Original Record-Replay Approaches
       - InstantReplay ’87
           - Records the order of memory accesses
           - Overhead may affect program behavior
       - RecPlay ’00
           - Records only synchronizations
           - Not deterministic if the program has data races
       - Netzer ’93
           - Records an optimal trace
           - Too expensive to keep track of all memory locations
       - Bacon & Goldstein ’91
           - Records memory bus transactions with hardware
           - High logging bandwidth
  • 9. Motivation
       Increasing use of and development for multi-core processors
       - Multi-threaded (MT) program behavior is non-deterministic.
       - To debug software effectively, developers must be able to replay executions that exhibit concurrency bugs.
       - Shared memory updates happen in a different order from run to run.
  • 10. Related Concepts
       - Runtime interpretation/translation of binary instructions
           - Requires no static instrumentation or special symbol information
           - Handles dynamically generated code and self-modifying code
           - Recording/logging: ~100-200x
       - More recent logging
           - Proposed hardware support (for the MT domain)
           - FDR (Flight Data Recorder)
           - BugNet (cache bits set on first load)
           - RTR (Regulated Transitive Reduction)
           - DeLorean (ISCA 2008; chunks of instructions)
           - Strata (time layer across all the logs for the running threads)
           - iDNA (Diagnostic infrastructure using NirvanA; Microsoft)
  • 11. Deterministic Replay
       Re-execute the exact same sequence of instructions as recorded in a previous run.
       - Single-threaded programs
           - Record the load values needed to reproduce the behavior of a run (Load Log)
           - Registers updated by system calls and signal handlers (Reg Log)
           - Output of special instructions: RDTSC, CPUID (Reg Log)
           - System calls (virtualization: cloning arguments, updates)
           - Checkpointing (log summary ~10 million)
       - Multi-threaded programs
           - Log the interleaving among threads (shared memory update ordering: SMO Log)
  • 12. PinSEL – System Effect Log (SEL)
       Logging the program load values needed for deterministic replay:
       – First access to a memory location
       – Values modified by the system (system effect) and read by the program
       – Machine- and time-sensitive instructions (cpuid, rdtsc)
       Figure (program execution, each load marked "Logged" or "Not Logged"): Store A (A ← 111); Store B (B ← 55); Load C (C = 9); Load D (D = 10); system call, which modifies locations B -> 0 and C -> 99; Load A (A = 111); Load B (B = 0); Load C (C = 99); Load D (D = 10).
       • Trace size is ~4-5 bytes per instruction
  • 13. reads
       - Observation: hardware caches eliminate most off-chip reads.
       - Optimize logging:
           - Logger and replayer simulate identical cache memories.
           - A simple cache (the memory copy structure) decides which values to log. There are no tags or valid bits to check. If the values mismatch, they are logged.
       - Average trace size is <1 bit per instruction.
       Example:
           i = 1;
           for (j = 0; j < 10; j++) {
               i = i + j;
           }
           k = i;            // value read is 46
           System_call();
           k = i;            // value read is 0 (not predicted)
       - The only read that is not predicted, and is therefore logged, is the one that follows the system call.
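The logging optimization on slide 13 can be sketched as a logger and a replayer that each keep an identical, tagless value cache. This is a minimal illustration under assumptions, not the PinSEL implementation: the names `ValueCache`, `Logger`, `Replayer`, and `LogEntry`, the 1024-slot size, and the use of a load index to key log entries are all invented here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// One logged load: which load it was (counted per run) and the value it read.
struct LogEntry {
    std::uint64_t loadIndex;
    std::int64_t value;
};

// Tagless, direct-mapped copy of memory values (no tags, no valid bits).
// The slot count is an arbitrary choice for this sketch.
class ValueCache {
public:
    std::int64_t& at(std::uintptr_t addr) {
        return slots_[(addr / sizeof(std::int64_t)) % kSlots];
    }
private:
    static constexpr std::size_t kSlots = 1024;
    std::array<std::int64_t, kSlots> slots_{};
};

class Logger {
public:
    // Program stores update the cache, so later loads are predictable.
    void store(std::uintptr_t addr, std::int64_t value) { cache_.at(addr) = value; }

    // A load is logged only when the cached value mispredicts the real one.
    void load(std::uintptr_t addr, std::int64_t actual) {
        std::int64_t& predicted = cache_.at(addr);
        if (predicted != actual) {
            log_.push_back({loads_, actual});
            predicted = actual;
        }
        ++loads_;
    }

    const std::vector<LogEntry>& log() const { return log_; }

private:
    ValueCache cache_;
    std::vector<LogEntry> log_;
    std::uint64_t loads_ = 0;
};

class Replayer {
public:
    explicit Replayer(std::vector<LogEntry> log) : log_(std::move(log)) {}

    void store(std::uintptr_t addr, std::int64_t value) { cache_.at(addr) = value; }

    // Takes the value from the log if this load was logged, else from the cache.
    std::int64_t load(std::uintptr_t addr) {
        std::int64_t& slot = cache_.at(addr);
        if (next_ < log_.size() && log_[next_].loadIndex == loads_) {
            slot = log_[next_].value;
            ++next_;
        }
        ++loads_;
        return slot;
    }

private:
    ValueCache cache_;
    std::vector<LogEntry> log_;
    std::size_t next_ = 0;
    std::uint64_t loads_ = 0;
};
```

Because both sides apply the same stores and the same logged corrections to the same cache layout, an aliased slot either mispredicts (and is logged) or happens to hold the right value, so the replayer always returns what the original load saw.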
  • 14. Example Overhead
       - PinSEL and PinPLAY
       - Initial work (2006) with single-threaded programs:
           - SPEC2000 ref runs: 130x slowdown for PinSEL and ~80x for PinPLAY (w/o in-lining)
           - Working with a subset of SPLASH2 benchmarks: 230x slowdown for PinSEL
       - Now: geo-mean SPEC2006
           - Pin 1.4x
           - Logger 83.6x
           - Replayer 1.4x
  • 15. Example: Microsoft iDNA Trace Writer Performance

       Application  | Simulated Instructions (millions) | Trace File Size | Trace File Bits / Instruction | Native Execution Time | Execution Time While Tracing | Execution Overhead
       Gzip         | 24,097                            | 245 MB          | 0.09                          | 11.7s                 | 187s                         | 15.98
       Excel        | 1,781                             | 99 MB           | 0.47                          | 18.2s                 | 105s                         | 5.76
       Power Point  | 7,392                             | 528 MB          | 0.60                          | 43.6s                 | 247s                         | 5.66
       IE           | 116                               | 5 MB            | 0.50                          | 0.499s                | 6.94s                        | 13.90
       Vulcan       | 2,408                             | 152 MB          | 0.53                          | 2.74s                 | 46.6s                        | 17.01
       Satsolver    | 9,431                             | 1300 MB         | 1.16                          | 9.78s                 | 127s                         | 12.98

       • Memchecker and valgrind are in the 30-40x range on CPU 2006
       • iDNA ~11x (does not log shared-memory dependences explicitly)
       • Uses a sequential number for every lock-prefixed memory operation: offline data race analysis
  • 16. Logging Shared Memory Ordering (Cristiano’s PinSEL/PLAY Overview)
       - Emulation of directory-based cache coherence
       - Identifies RAW, WAR, WAW dependences
       - Indexed by hashing the effective address
       - Each entry represents an address range
       Figure: memory references in the program execution (Store A, Load B) are hashed to entries (DirEntry) of the directory.
  • 17. Directory Entries
       - Every DirEntry maintains:
           - Thread id of the last_writer
           - A timestamp, which is the number of memory references the thread has executed
           - A vector of timestamps of the last access by each thread to that entry
       - On loads: update the timestamp for the thread in the entry
       - On stores: update the timestamp and the last_writer fields
       Figure: program execution of thread T1 (1: Store A, 2: Load A, 3: Store F) and thread T2 (1: Load F, 2: Store A, 3: Load F), with directory entries DirEntry [A:D] and DirEntry [E:H], each holding a last-writer id and a vector of per-thread timestamps (T1, T2).
  • 18. Detecting Dependences
       - A RAW dependency between threads T and T’ is established if:
           - T executes a load that maps to the directory entry A
           - T’ is the last_writer for the same entry
       - A WAW dependency between T and T’ is established if:
           - T executes a store that maps to the directory entry A
           - T’ is the last_writer for the same entry
       - A WAR dependency between T and T’ is established if:
           - T executes a store that maps to the directory entry A
           - T’ has accessed the same entry in the past and T is not the last_writer
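The directory bookkeeping from slides 17 and 18 can be sketched as follows. This is an illustration under stated assumptions, not the PinSEL/PinPLAY code: `Directory`, `Dependence`, and `Kind` are invented names, thread ids are 0-based, each entry covers a range of 4 addresses, and the WAR rule is read as "another thread that is not the last writer has accessed the entry" (a last writer is reported as WAW instead). Redundant dependences are not pruned here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Kind { RAW, WAW, WAR };

struct Dependence {
    Kind kind;
    int thread;           // thread that must wait
    std::uint64_t ref;    // its memory-reference number (timestamp)
    int onThread;         // thread it depends on
    std::uint64_t onRef;  // that thread's last access to the same entry
};

class Directory {
public:
    Directory(std::size_t threads, std::size_t entries)
        : refs_(threads, std::uint64_t{0}),
          entries_(entries, Entry{-1, std::vector<std::uint64_t>(threads, std::uint64_t{0})}) {}

    // Records one memory reference and returns the dependences it creates.
    std::vector<Dependence> access(int thread, std::uintptr_t addr, bool isStore) {
        const std::uint64_t ts = ++refs_[thread];  // number of refs executed so far
        Entry& e = entries_[(addr / kRange) % entries_.size()];
        std::vector<Dependence> deps;

        // RAW (load) or WAW (store) on the last writer, if it is another thread.
        if (e.lastWriter != -1 && e.lastWriter != thread) {
            deps.push_back({isStore ? Kind::WAW : Kind::RAW, thread, ts,
                            e.lastWriter, e.lastAccess[e.lastWriter]});
        }
        if (isStore) {
            // WAR on every other thread that touched the entry without being
            // its last writer (assumption: see the lead-in above).
            for (int other = 0; other < static_cast<int>(e.lastAccess.size()); ++other) {
                if (other != thread && other != e.lastWriter && e.lastAccess[other] != 0)
                    deps.push_back({Kind::WAR, thread, ts, other, e.lastAccess[other]});
            }
            e.lastWriter = thread;
        }
        e.lastAccess[thread] = ts;
        return deps;
    }

private:
    static constexpr std::uintptr_t kRange = 4;  // addresses per entry (assumed)
    struct Entry {
        int lastWriter;                         // -1 means no writer yet
        std::vector<std::uint64_t> lastAccess;  // per-thread timestamp, 0 = never
    };
    std::vector<std::uint64_t> refs_;
    std::vector<Entry> entries_;
};
```

Feeding it the six references from the slides' two-thread example (with A and F placed in different entries) yields one WAW, one RAW, and one WAR dependence, which is what the SMO logs on the example slide describe.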
  • 19. Example
       Program execution:
           Thread T1       Thread T2
           1: Store A      1: Load F
           2: Load A       2: Store A
           3: Store F      3: Load F
       Directory: DirEntry [A:D] and DirEntry [E:H], each with a last-writer id and the timestamp of each thread's last access to the entry.
       Dependences found:
       - WAW: T2 2 depends on T1 1
       - RAW: T1 2 depends on T2 2
       - WAR: T1 3 depends on T2 3
       SMO logs:
       - Thread T1 cannot execute memory reference 2 until T2 executes its memory reference 2.
       - Thread T2 cannot execute memory reference 2 until T1 executes its memory reference 1.
  • 20. Ordering Memory Accesses (Reducing log size)
       - Preserving order will reproduce the execution.
           - a→b: "a happens-before b"
           - Ordering is transitive: a→b and b→c mean a→c
       - Two instructions must be ordered if:
           - they both access the same memory, and
           - one of them is a write
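  • 21. Constraints: Enforcing Order
       Figure: P1 executes a then b; P2 executes c then d.
       - To guarantee a→d, any one of these constraints suffices:
           - a→d
           - b→d
           - a→c
           - b→c
       - Suppose we need b→c:
           - b→c is necessary
           - a→d is redundant (recording it as well is overconstrained)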
  • 22. Problem Formulation
       - Reproduce the exact same conflicts: no more, no less.
       Figure: threads I and J (instructions ld A, st B, st C, sub, ld B, add, st C, ld B, st A, st C, ld D, st D) shown during recording and during replay, with the log in between. Conflicts are drawn in red, dependences in black.
  • 23. Log All Conflicts
       - Detect conflicts
       - Write log
       - Assign IC (logical timestamps)
       - But too many conflicts
       Figure: threads I and J, instructions numbered 1 to 6 in each thread, replayed from the dependence log (16 bytes per dependence).
       Log J: 2→3, 1→4, 3→5, 4→6
       Log I: 2→3
       Log size: 5*16 = 80 bytes (10 integers)
  • 24. Netzer’s Transitive Reduction
       Figure: threads I and J, instructions numbered 1 to 6 in each thread, replayed from the TR-reduced log.
       Log J: 2→3, 3→5, 4→6
       Log I: 2→3
       Log size: 64 bytes (8 integers)
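The reduction from the five-entry log on slide 23 to the four-entry log on slide 24 can be reproduced with per-thread vector clocks: a dependence is skipped when the destination thread already knows, through program order and earlier logged dependences, that the source instruction has happened. The sketch below is a simplified illustration, not Netzer's algorithm as published. `TransitiveReducer` and `Dep` are invented names, and it assumes dependences are presented in an order where each thread's incoming dependences arrive with non-decreasing instruction counts.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Dependence: instruction srcIc of thread src must precede dstIc of thread dst.
struct Dep {
    int src;
    unsigned srcIc;
    int dst;
    unsigned dstIc;
};

class TransitiveReducer {
    using Clock = std::vector<unsigned>;
    struct Snapshot {
        unsigned ic;  // instruction count at which the clock changed
        Clock clock;  // what the thread knew from that point on
    };

public:
    explicit TransitiveReducer(std::size_t threads)
        : threads_(threads), history_(threads) {}

    // Returns true if the dependence must be logged, false if it is already
    // implied by program order plus previously logged dependences.
    bool observe(const Dep& d) {
        Clock dstClock = clockAt(d.dst, d.dstIc);
        if (dstClock[d.src] >= d.srcIc) return false;  // transitively implied
        const Clock srcClock = clockAt(d.src, d.srcIc);
        for (std::size_t t = 0; t < threads_; ++t)
            dstClock[t] = std::max(dstClock[t], srcClock[t]);
        history_[d.dst].push_back({d.dstIc, dstClock});
        return true;
    }

private:
    // Clock of `thread` at instruction `ic`: the latest snapshot taken at or
    // before ic, with the thread's own component advanced to ic.
    Clock clockAt(int thread, unsigned ic) const {
        Clock c(threads_, 0u);
        for (const Snapshot& s : history_[thread])
            if (s.ic <= ic) c = s.clock;
        c[thread] = ic;
        return c;
    }

    std::size_t threads_;
    std::vector<std::vector<Snapshot>> history_;
};
```

With thread I as 0 and thread J as 1, the slide's dependence 1→4 into J is dropped because J already waited for I's instruction 2 at its own instruction 3, while the other four dependences are kept.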
  • 25. RTR (Regulated Transitive Reduction): Stricter Dependences to Aid Vectorization
       Figure: threads I and J, instructions numbered 1 to 6 in each thread, replayed from the new reduced log; one dependence is replaced by a stricter one, which makes another one redundant (reduced).
       Log J: 2→3, 4→5
       Log I: 2→3
       Log size: 48 bytes (6 integers)
       - 4% overhead for RTR+FDR (simulated on GEMS)
       - 0.2 MB/core/second logging (Apache)

Editor's Notes

  1. Talking about previous solutions, let’s have a short survey. Most previous record-replay solutions are in software. For example, InstantReplay and Netzer both record the execution in software, but both suffer from high performance overhead due to high data bandwidth or high computation overhead. Bacon and Goldstein proposed a hardware recorder, but it has high logging bandwidth and requires a central memory bus. More recently, RecPlay took a unique approach, reducing the performance overhead by recording only synchronizations. Unfortunately, it does not work for programs that contain data races, so we have to record more than just synchronizations.
  2. Talk about the big picture first: we need to log and recreate dependencies, then we need to reduce the log size. Define dependencies using words. Mention that in-order replay is assumed. Go slow, define everything. Reproduce the same conflicts; as we will see, a naïve way to do that is to record all conflicts.
  3. It is sufficient to log all, but is it necessary?
  4. Mention Bart’s PhD Netzer