SlideShare a Scribd company logo
1 of 29
Download to read offline
Nondeterminism is Unavoidable,
but Data Races are Pure Evil
Hans-J. Boehm
HP Labs
5 November 2012 1
Low-level nondeterminism is pervasive
• E.g.
– Atomically incrementing a global counter is
nondeterministic.
• Which is unobservable if only read after threads are joined.
– Memory allocation returns nondeterministic addresses.
• Which usually doesn’t matter.
• But sometimes does.
– Same for any other resource allocation.
• Harder to debug, but helps test coverage.
• These are races of a sort,
– but not data races
5 November 2012
5 November 20123
Data Races
• Two memory accesses conflict if they
– access the same memory location, e.g. variable
– at least one access is a store
• A program has a data race if two data accesses
– conflict, and
– can occur simultaneously.
• simultaneous = different threads and unordered by synchronization
• A program data-race-free (on a particular input) if no
sequentially consistent execution results in a data race.
November 2010 CACM Technical
Perspective
• “Data races are evil with no exceptions.”
- Sarita Adve, technical perspective for Goldilocks
and FastTrack papers on data race detection.
4 5 November 2012
Data Races are Evil, because:
• They are errors or poorly defined in all
mainstream multithreaded programming
languages.
– Treated like subscript errors in C11, C++11, pthreads,
Ada83, OpenMP, ...
• May unpredictably break code in practice.
– Now or when you recompile, upgrade your OS, or …
• Are easily avoidable in C11 & C++11.
• Don’t improve scalability significantly.
– Even if the code still works.
5 November 2012
C++ 2011 / C1x
• [C++ 2011 1.10p21] The execution of a program
contains a data race if it contains two
conflicting actions in different threads, at least
one of which is not atomic, and neither
happens before the other. Any such data race
results in undefined behavior.
6
Data races are errors!
(and you get atomic operations to avoid them)
Ada 83
• [ANSI-STD-1815A-1983, 9.11] For the actions performed by a program that
uses shared variables, the following assumptions can always be made:
– If between two synchronization points in a task, this task reads a shared
variable whose type is a scalar or access type, then the variable is not
updated by any other task at any time between these two points.
– If between two synchronization points in a task, this task updates a shared
variable whose task type is a scalar or access type, then the variable is
neither read nor updated by any other task at any time between these two
points.
• The execution of the program is erroneous if any of these assumptions is
violated.
7
Data races are errors!
Posix Threads Specification
• [IEEE 1003.1-2008, Base Definitions 4.11; dates back to
1995] Applications shall ensure that access to any
memory location by more than one thread of control
(threads or processes) is restricted such that no thread
of control can read or modify a memory location while
another thread of control may be modifying it.
8
Data races are errors!
Java
9
Data races are not errors.
But we don’t really know what they really mean.
(e.g. [Sevcik, Aspinall ECOOP 2008])
Absence of data races guarantees
sane semantics
5 November 2012
But does it really matter in
practice?
11
Code breakage example
12
x = 42;
done = true;
while (!done) {}
assert(x == 42);
Thread 1: Thread 2:
bool done; int x;
Code breakage example (2)
13
x = 42;
done = true;
tmp = done;
while (!tmp) {}
assert(x == 42);
Thread 1: Thread 2:
while (!done) {}
assert(x == 42);
Compiler
bool done; int x;
Code breakage example (3)
14
done = true;
x = 42;
while (!done) {}
assert(x == 42);
Thread 1: Thread 2:
x = 42;
done = true;
Compiler, ARM, POWER
bool done; int x;
Code breakage example (4)
15
x = 42;
done = true;
while (!done) {}
assert(x == 42);
Thread 1: Thread 2:
ARM, POWER
Load of x precedes
final load of done
bool done; int x;
Another code breakage example
16
{
bool tmp = x; // racy load
if (tmp)
launch_missile = true;
…
if (tmp && no_incoming)
launch_missile = false;
}
Another code breakage example
17
{
bool tmp = x; // racy load
if (tmp)
launch_missile = true;
…
if (tmp && no_incoming)
launch_missile = false;
}
Other “misoptimizations”
• Reading half a pointer.
• Reading uninitialized method table pointer.
• See HotPar ‘11 paper.
– Even redundant writes can fail.
5 November 2012
But if it works, does it help
performance?
• C11 and C++11 have two main ways to avoid
data races:
1. Locks
2. atomic objects
• Indivisible operations.
• Treated as synchronization; don’t create data races.
• Warns compiler of the race.
5 November 2012
Code breakage example fixed (1)
20
x = 42;
done = true;
while (!done) {}
assert(x == 42);
Thread 1: Thread 2:
atomic<bool> done; int x;
Code breakage example fixed (2)
21
x = 42;
done.store(true,
memory_order_release);
while (!done.load(
memory_order_acquire) {}
assert(x == 42);
Thread 1: Thread 2:
atomic<bool> done; int x;
Complex, and dangerous
but much simpler and safer than data races
So how much does synchronization
cost?
• Architecture dependent.
• On X86:
– Atomic stores with any ordering constraint except
default memory_order_seq_cst impose only
mild compiler constraints.
• If data races work, they can probably generate the same
code.
– Atomic loads impose only mild compiler constraints.
22
So how much does “non-free”
synchronization cost?
• Locking read-only critical sections can be
expensive.
– Adds cache contention where there was none.
– And contention is the real scalability issue.
– But if races “work”, then atomics work better.
– And on X86 atomic reads are “free”.
– And there are cheaper ways to handle read-only “critical
sections” (seqlocks, RCU). See my MSPC ‘12 paper.
• Here we focus on critical sections that write.
23
Simple benchmark
• Sieve of Eratosthenes example from PLDI 05 paper,
modified to write unconditionally:
for (my_prime = start; my_prime < SQRT; ++my_prime)
if (!get(my_prime)) {
for (multiple = my_prime;
multiple < MAX; multiple += my_prime)
set(multiple);
}
• Ran on 8 processors x 4 core x86 AMD machine.
• C++11 atomic-equivalents hand-generated.
• Lock individual accesses.
24
Time for 10⁸ primes, 1 and 2 cores
5 November 2012 25
0
5
10
15
20
25
30
1 core 2 cores
mutex
seq_cst atomic
plain store
Secs
Time for 10⁸ primes, more cores
5 November 2012 26
Why?
• Conjecture: Memory-bandwidth limited.
• memory_order_seq_cst adds mfence
instruction to store.
– Approximate effect: Flush store buffer.
• Local; no added memory traffic.
• Locking adds fences + some memory traffic.
– Locks are much smaller than bit array. (hashed)
– Tuned to minimize memory bandwidth and
contention.
– Also mostly local, parallelizable overhead.
5 November 2012
5 November 2012
Conclusion
Data
Races
Questions?
5 November 2012 29

More Related Content

What's hot

Working With Concurrency In Java 8
Working With Concurrency In Java 8Working With Concurrency In Java 8
Working With Concurrency In Java 8Heartin Jacob
 
Distributed Systems Theory for Mere Mortals - GeeCON Krakow May 2017
Distributed Systems Theory for Mere Mortals -  GeeCON Krakow May 2017Distributed Systems Theory for Mere Mortals -  GeeCON Krakow May 2017
Distributed Systems Theory for Mere Mortals - GeeCON Krakow May 2017Ensar Basri Kahveci
 
Reader/writer problem
Reader/writer problemReader/writer problem
Reader/writer problemRinkuMonani
 
Libckpt transparent checkpointing under unix
Libckpt transparent checkpointing under unixLibckpt transparent checkpointing under unix
Libckpt transparent checkpointing under unixZongYing Lyu
 
Parallel Programming: Beyond the Critical Section
Parallel Programming: Beyond the Critical SectionParallel Programming: Beyond the Critical Section
Parallel Programming: Beyond the Critical SectionTony Albrecht
 
Parallel Programming Primer
Parallel Programming PrimerParallel Programming Primer
Parallel Programming PrimerSri Prasanna
 
Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012
Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012
Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012TEST Huddle
 
Deep learning architectures
Deep learning architecturesDeep learning architectures
Deep learning architecturesJoe li
 
Implementation of Election Algorithm of Distributed Systems in Client-Server ...
Implementation of Election Algorithm of Distributed Systems in Client-Server ...Implementation of Election Algorithm of Distributed Systems in Client-Server ...
Implementation of Election Algorithm of Distributed Systems in Client-Server ...Mushfekur Rahman
 
Concurrency with java
Concurrency with javaConcurrency with java
Concurrency with javaHoang Nguyen
 

What's hot (16)

Multi-Threading
Multi-ThreadingMulti-Threading
Multi-Threading
 
Working With Concurrency In Java 8
Working With Concurrency In Java 8Working With Concurrency In Java 8
Working With Concurrency In Java 8
 
Freckle
FreckleFreckle
Freckle
 
Distributed Systems Theory for Mere Mortals - GeeCON Krakow May 2017
Distributed Systems Theory for Mere Mortals -  GeeCON Krakow May 2017Distributed Systems Theory for Mere Mortals -  GeeCON Krakow May 2017
Distributed Systems Theory for Mere Mortals - GeeCON Krakow May 2017
 
Concurrency
ConcurrencyConcurrency
Concurrency
 
Reader/writer problem
Reader/writer problemReader/writer problem
Reader/writer problem
 
Concurrency
ConcurrencyConcurrency
Concurrency
 
Libckpt transparent checkpointing under unix
Libckpt transparent checkpointing under unixLibckpt transparent checkpointing under unix
Libckpt transparent checkpointing under unix
 
Java under the hood
Java under the hoodJava under the hood
Java under the hood
 
Parallel Programming: Beyond the Critical Section
Parallel Programming: Beyond the Critical SectionParallel Programming: Beyond the Critical Section
Parallel Programming: Beyond the Critical Section
 
Parallel Programming Primer
Parallel Programming PrimerParallel Programming Primer
Parallel Programming Primer
 
Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012
Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012
Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012
 
Lock free programming- pro tips
Lock free programming- pro tipsLock free programming- pro tips
Lock free programming- pro tips
 
Deep learning architectures
Deep learning architecturesDeep learning architectures
Deep learning architectures
 
Implementation of Election Algorithm of Distributed Systems in Client-Server ...
Implementation of Election Algorithm of Distributed Systems in Client-Server ...Implementation of Election Algorithm of Distributed Systems in Client-Server ...
Implementation of Election Algorithm of Distributed Systems in Client-Server ...
 
Concurrency with java
Concurrency with javaConcurrency with java
Concurrency with java
 

Viewers also liked

Does Better Throughput Require Worse Latency?
Does Better Throughput Require Worse Latency?Does Better Throughput Require Worse Latency?
Does Better Throughput Require Worse Latency?racesworkshop
 
Parallel Sorting on a Spatial Computer
Parallel Sorting on a Spatial ComputerParallel Sorting on a Spatial Computer
Parallel Sorting on a Spatial Computerracesworkshop
 
Dancing with Uncertainty
Dancing with UncertaintyDancing with Uncertainty
Dancing with Uncertaintyracesworkshop
 
Programming with Relaxed Synchronization
Programming with Relaxed SynchronizationProgramming with Relaxed Synchronization
Programming with Relaxed Synchronizationracesworkshop
 
Welcome and Lightning Intros
Welcome and Lightning IntrosWelcome and Lightning Intros
Welcome and Lightning Introsracesworkshop
 
A Case for Relativistic Programming
A Case for Relativistic ProgrammingA Case for Relativistic Programming
A Case for Relativistic Programmingracesworkshop
 
Edge Chasing Delayed Consistency: Pushing the Limits of Weak Memory Models
Edge Chasing Delayed Consistency: Pushing the Limits of Weak Memory ModelsEdge Chasing Delayed Consistency: Pushing the Limits of Weak Memory Models
Edge Chasing Delayed Consistency: Pushing the Limits of Weak Memory Modelsracesworkshop
 
(Relative) Safety Properties for Relaxed Approximate Programs
(Relative) Safety Properties for Relaxed Approximate Programs(Relative) Safety Properties for Relaxed Approximate Programs
(Relative) Safety Properties for Relaxed Approximate Programsracesworkshop
 
Beyond Expert-Only Parallel Programming
Beyond Expert-Only Parallel ProgrammingBeyond Expert-Only Parallel Programming
Beyond Expert-Only Parallel Programmingracesworkshop
 
Keynote, LambdaConf 2015 - Ipecac for the Ouroboros
Keynote, LambdaConf 2015 - Ipecac for the OuroborosKeynote, LambdaConf 2015 - Ipecac for the Ouroboros
Keynote, LambdaConf 2015 - Ipecac for the OuroborosPaul Phillips
 

Viewers also liked (10)

Does Better Throughput Require Worse Latency?
Does Better Throughput Require Worse Latency?Does Better Throughput Require Worse Latency?
Does Better Throughput Require Worse Latency?
 
Parallel Sorting on a Spatial Computer
Parallel Sorting on a Spatial ComputerParallel Sorting on a Spatial Computer
Parallel Sorting on a Spatial Computer
 
Dancing with Uncertainty
Dancing with UncertaintyDancing with Uncertainty
Dancing with Uncertainty
 
Programming with Relaxed Synchronization
Programming with Relaxed SynchronizationProgramming with Relaxed Synchronization
Programming with Relaxed Synchronization
 
Welcome and Lightning Intros
Welcome and Lightning IntrosWelcome and Lightning Intros
Welcome and Lightning Intros
 
A Case for Relativistic Programming
A Case for Relativistic ProgrammingA Case for Relativistic Programming
A Case for Relativistic Programming
 
Edge Chasing Delayed Consistency: Pushing the Limits of Weak Memory Models
Edge Chasing Delayed Consistency: Pushing the Limits of Weak Memory ModelsEdge Chasing Delayed Consistency: Pushing the Limits of Weak Memory Models
Edge Chasing Delayed Consistency: Pushing the Limits of Weak Memory Models
 
(Relative) Safety Properties for Relaxed Approximate Programs
(Relative) Safety Properties for Relaxed Approximate Programs(Relative) Safety Properties for Relaxed Approximate Programs
(Relative) Safety Properties for Relaxed Approximate Programs
 
Beyond Expert-Only Parallel Programming
Beyond Expert-Only Parallel ProgrammingBeyond Expert-Only Parallel Programming
Beyond Expert-Only Parallel Programming
 
Keynote, LambdaConf 2015 - Ipecac for the Ouroboros
Keynote, LambdaConf 2015 - Ipecac for the OuroborosKeynote, LambdaConf 2015 - Ipecac for the Ouroboros
Keynote, LambdaConf 2015 - Ipecac for the Ouroboros
 

Similar to Nondeterminism is unavoidable, but data races are pure evil

Programming Language Memory Models: What do Shared Variables Mean?
Programming Language Memory Models: What do Shared Variables Mean?Programming Language Memory Models: What do Shared Variables Mean?
Programming Language Memory Models: What do Shared Variables Mean?greenwop
 
Introduction to Parallel Computing
Introduction to Parallel ComputingIntroduction to Parallel Computing
Introduction to Parallel ComputingAkhila Prabhakaran
 
Evolution of JDK Tools for Multithreaded Programming
Evolution of JDK Tools for Multithreaded ProgrammingEvolution of JDK Tools for Multithreaded Programming
Evolution of JDK Tools for Multithreaded ProgrammingGlobalLogic Ukraine
 
Database Consistency Models
Database Consistency ModelsDatabase Consistency Models
Database Consistency ModelsSimon Ouellette
 
Migration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming ModelsMigration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming ModelsZvi Avraham
 
Concurrency Programming in Java - 01 - Introduction to Concurrency Programming
Concurrency Programming in Java - 01 - Introduction to Concurrency ProgrammingConcurrency Programming in Java - 01 - Introduction to Concurrency Programming
Concurrency Programming in Java - 01 - Introduction to Concurrency ProgrammingSachintha Gunasena
 
Parallel Computing - Lec 5
Parallel Computing - Lec 5Parallel Computing - Lec 5
Parallel Computing - Lec 5Shah Zaib
 
Distributed systems and scalability rules
Distributed systems and scalability rulesDistributed systems and scalability rules
Distributed systems and scalability rulesOleg Tsal-Tsalko
 
Parallel algorithms
Parallel algorithmsParallel algorithms
Parallel algorithmsDanish Javed
 
Distributed Algorithms
Distributed AlgorithmsDistributed Algorithms
Distributed Algorithms913245857
 
Aleksandr_Butenko_Mobile_Development
Aleksandr_Butenko_Mobile_DevelopmentAleksandr_Butenko_Mobile_Development
Aleksandr_Butenko_Mobile_DevelopmentCiklum
 
Replication in the wild ankara cloud meetup - feb 2017
Replication in the wild   ankara cloud meetup - feb 2017Replication in the wild   ankara cloud meetup - feb 2017
Replication in the wild ankara cloud meetup - feb 2017AnkaraCloud
 
Replication in the wild ankara cloud meetup - feb 2017
Replication in the wild   ankara cloud meetup - feb 2017Replication in the wild   ankara cloud meetup - feb 2017
Replication in the wild ankara cloud meetup - feb 2017Onur Dayıbaşı
 
10 Multicore 07
10 Multicore 0710 Multicore 07
10 Multicore 07timcrack
 
Data Engineering for Data Scientists
Data Engineering for Data Scientists Data Engineering for Data Scientists
Data Engineering for Data Scientists jlacefie
 
Design Patterns for Distributed Non-Relational Databases
Design Patterns for Distributed Non-Relational DatabasesDesign Patterns for Distributed Non-Relational Databases
Design Patterns for Distributed Non-Relational Databasesguestdfd1ec
 
Ch-7-Part-2-Distributed-System.pptx
Ch-7-Part-2-Distributed-System.pptxCh-7-Part-2-Distributed-System.pptx
Ch-7-Part-2-Distributed-System.pptxKabindra Koirala
 

Similar to Nondeterminism is unavoidable, but data races are pure evil (20)

Programming Language Memory Models: What do Shared Variables Mean?
Programming Language Memory Models: What do Shared Variables Mean?Programming Language Memory Models: What do Shared Variables Mean?
Programming Language Memory Models: What do Shared Variables Mean?
 
Introduction to Parallel Computing
Introduction to Parallel ComputingIntroduction to Parallel Computing
Introduction to Parallel Computing
 
Evolution of JDK Tools for Multithreaded Programming
Evolution of JDK Tools for Multithreaded ProgrammingEvolution of JDK Tools for Multithreaded Programming
Evolution of JDK Tools for Multithreaded Programming
 
Database Consistency Models
Database Consistency ModelsDatabase Consistency Models
Database Consistency Models
 
Migration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming ModelsMigration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming Models
 
Concurrency Programming in Java - 01 - Introduction to Concurrency Programming
Concurrency Programming in Java - 01 - Introduction to Concurrency ProgrammingConcurrency Programming in Java - 01 - Introduction to Concurrency Programming
Concurrency Programming in Java - 01 - Introduction to Concurrency Programming
 
Parallel Computing - Lec 5
Parallel Computing - Lec 5Parallel Computing - Lec 5
Parallel Computing - Lec 5
 
Memory model
Memory modelMemory model
Memory model
 
Distributed systems and scalability rules
Distributed systems and scalability rulesDistributed systems and scalability rules
Distributed systems and scalability rules
 
Parallel algorithms
Parallel algorithmsParallel algorithms
Parallel algorithms
 
Distributed Algorithms
Distributed AlgorithmsDistributed Algorithms
Distributed Algorithms
 
Aleksandr_Butenko_Mobile_Development
Aleksandr_Butenko_Mobile_DevelopmentAleksandr_Butenko_Mobile_Development
Aleksandr_Butenko_Mobile_Development
 
Replication in the wild ankara cloud meetup - feb 2017
Replication in the wild   ankara cloud meetup - feb 2017Replication in the wild   ankara cloud meetup - feb 2017
Replication in the wild ankara cloud meetup - feb 2017
 
Replication in the wild ankara cloud meetup - feb 2017
Replication in the wild   ankara cloud meetup - feb 2017Replication in the wild   ankara cloud meetup - feb 2017
Replication in the wild ankara cloud meetup - feb 2017
 
10 Multicore 07
10 Multicore 0710 Multicore 07
10 Multicore 07
 
Data Engineering for Data Scientists
Data Engineering for Data Scientists Data Engineering for Data Scientists
Data Engineering for Data Scientists
 
Lecture1
Lecture1Lecture1
Lecture1
 
Design Patterns for Distributed Non-Relational Databases
Design Patterns for Distributed Non-Relational DatabasesDesign Patterns for Distributed Non-Relational Databases
Design Patterns for Distributed Non-Relational Databases
 
Memory model
Memory modelMemory model
Memory model
 
Ch-7-Part-2-Distributed-System.pptx
Ch-7-Part-2-Distributed-System.pptxCh-7-Part-2-Distributed-System.pptx
Ch-7-Part-2-Distributed-System.pptx
 

Recently uploaded

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 

Recently uploaded (20)

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 

Nondeterminism is unavoidable, but data races are pure evil

  • 1. Nondeterminism is Unavoidable, but Data Races are Pure Evil Hans-J. Boehm HP Labs 5 November 2012 1
  • 2. Low-level nondeterminism is pervasive • E.g. – Atomically incrementing a global counter is nondeterministic. • Which is unobservable if only read after threads are joined. – Memory allocation returns nondeterministic addresses. • Which usually doesn’t matter. • But sometimes does. – Same for any other resource allocation. • Harder to debug, but helps test coverage. • These are races of a sort, – but not data races 5 November 2012
  • 3. 5 November 20123 Data Races • Two memory accesses conflict if they – access the same memory location, e.g. variable – at least one access is a store • A program has a data race if two data accesses – conflict, and – can occur simultaneously. • simultaneous = different threads and unordered by synchronization • A program data-race-free (on a particular input) if no sequentially consistent execution results in a data race.
  • 4. November 2010 CACM Technical Perspective • “Data races are evil with no exceptions.” - Sarita Adve, technical perspective for Goldilocks and FastTrack papers on data race detection. 4 5 November 2012
  • 5. Data Races are Evil, because: • They are errors or poorly defined in all mainstream multithreaded programming languages. – Treated like subscript errors in C11, C++11, pthreads, Ada83, OpenMP, ... • May unpredictably break code in practice. – Now or when you recompile, upgrade your OS, or … • Are easily avoidable in C11 & C++11. • Don’t improve scalability significantly. – Even if the code still works. 5 November 2012
  • 6. C++ 2011 / C1x • [C++ 2011 1.10p21] The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior. 6 Data races are errors! (and you get atomic operations to avoid them)
  • 7. Ada 83 • [ANSI-STD-1815A-1983, 9.11] For the actions performed by a program that uses shared variables, the following assumptions can always be made: – If between two synchronization points in a task, this task reads a shared variable whose type is a scalar or access type, then the variable is not updated by any other task at any time between these two points. – If between two synchronization points in a task, this task updates a shared variable whose task type is a scalar or access type, then the variable is neither read nor updated by any other task at any time between these two points. • The execution of the program is erroneous if any of these assumptions is violated. 7 Data races are errors!
  • 8. Posix Threads Specification • [IEEE 1003.1-2008, Base Definitions 4.11; dates back to 1995] Applications shall ensure that access to any memory location by more than one thread of control (threads or processes) is restricted such that no thread of control can read or modify a memory location while another thread of control may be modifying it. 8 Data races are errors!
  • 9. Java 9 Data races are not errors. But we don’t really know what they really mean. (e.g. [Sevcik, Aspinall ECOOP 2008])
  • 10. Absence of data races guarantees sane semantics 5 November 2012
  • 11. But does it really matter in practice? 11
  • 12. Code breakage example 12 x = 42; done = true; while (!done) {} assert(x == 42); Thread 1: Thread 2: bool done; int x;
  • 13. Code breakage example (2) 13 x = 42; done = true; tmp = done; while (!tmp) {} assert(x == 42); Thread 1: Thread 2: while (!done) {} assert(x == 42); Compiler bool done; int x;
  • 14. Code breakage example (3) 14 done = true; x = 42; while (!done) {} assert(x == 42); Thread 1: Thread 2: x = 42; done = true; Compiler, ARM, POWER bool done; int x;
  • 15. Code breakage example (4) 15 x = 42; done = true; while (!done) {} assert(x == 42); Thread 1: Thread 2: ARM, POWER Load of x precedes final load of done bool done; int x;
  • 16. Another code breakage example 16 { bool tmp = x; // racy load if (tmp) launch_missile = true; … if (tmp && no_incoming) launch_missile = false; }
  • 17. Another code breakage example 17 { bool tmp = x; // racy load if (tmp) launch_missile = true; … if (tmp && no_incoming) launch_missile = false; }
  • 18. Other “misoptimizations” • Reading half a pointer. • Reading uninitialized method table pointer. • See HotPar ‘11 paper. – Even redundant writes can fail. 5 November 2012
  • 19. But if it works, does it help performance? • C11 and C++11 have two main ways to avoid data races: 1. Locks 2. atomic objects • Indivisible operations. • Treated as synchronization; don’t create data races. • Warns compiler of the race. 5 November 2012
  • 20. Code breakage example fixed (1) 20 x = 42; done = true; while (!done) {} assert(x == 42); Thread 1: Thread 2: atomic<bool> done; int x;
  • 21. Code breakage example fixed (2) 21 x = 42; done.store(true, memory_order_release); while (!done.load( memory_order_acquire) {} assert(x == 42); Thread 1: Thread 2: atomic<bool> done; int x; Complex, and dangerous but much simpler and safer than data races
  • 22. So how much does synchronization cost? • Architecture dependent. • On X86: – Atomic stores with any ordering constraint except default memory_order_seq_cst impose only mild compiler constraints. • If data races work, they can probably generate the same code. – Atomic loads impose only mild compiler constraints. 22
  • 23. So how much does “non-free” synchronization cost? • Locking read-only critical sections can be expensive. – Adds cache contention where there was none. – And contention is the real scalability issue. – But if races “work”, then atomics work better. – And on X86 atomic reads are “free”. – And there are cheaper ways to handle read-only “critical sections” (seqlocks, RCU). See my MSPC ‘12 paper. • Here we focus on critical sections that write. 23
  • 24. Simple benchmark • Sieve of Eratosthenes example from PLDI 05 paper, modified to write unconditionally: for (my_prime = start; my_prime < SQRT; ++my_prime) if (!get(my_prime)) { for (multiple = my_prime; multiple < MAX; multiple += my_prime) set(multiple); } • Ran on 8 processors x 4 core x86 AMD machine. • C++11 atomic-equivalents hand-generated. • Lock individual accesses. 24
  • 25. Time for 10⁸ primes, 1 and 2 cores 5 November 2012 25 0 5 10 15 20 25 30 1 core 2 cores mutex seq_cst atomic plain store Secs
  • 26. Time for 10⁸ primes, more cores 5 November 2012 26
  • 27. Why? • Conjecture: Memory-bandwidth limited. • memory_order_seq_cst adds mfence instruction to store. – Approximate effect: Flush store buffer. • Local; no added memory traffic. • Locking adds fences + some memory traffic. – Locks are much smaller than bit array. (hashed) – Tuned to minimize memory bandwidth and contention. – Also mostly local, parallelizable overhead. 5 November 2012