SlideShare a Scribd company logo
1 of 57
Programming Language Memory
             Models:
  What do Shared Variables Mean?

            Hans-J. Boehm
              HP Labs


7/28/2011                          1
The problem
• Shared memory parallel programs are built on shared
  variables visible to multiple threads of control.
• But there is a lot of confusion about what those variables
  mean:
     –   Are concurrent accesses allowed?
     –   What is a concurrent access?
     –   When do updates become visible to other threads?
     –   Can an update be partially visible?
• Many recent efforts with serious technical issues:
     – Java, OpenMP 3.0, UPC, Go …



7/28/2011                                                   2
Credits and Disclaimers:
• Much of this work was done by others or jointly. I‟m
  relying particularly on:
     – Basic approach: Sarita Adve, Mark Hill, Ada 83 …
     – JSR 133: Also Jeremy Manson, Bill Pugh, Doug Lea
     – C++11: Lawrence Crowl, Clark Nelson, Paul McKenney, Herb
       Sutter, …
     – Improved hardware models: Peter Sewell‟s group, many Intel,
       AMD, ARM, IBM participants …
     – Conflict exception work: Ceze, Lucia, Qadeer, Strauss
     – Recent Java Memory Model work: Sevcik, Aspinall, Cenciarelli
     – Race-free optimization: Pramod Joisha, Laura Effinger-Dean,
       Dhruva Chakrabarti
• But some of it is still controversial.
     – This reflects my personal views.
7/28/2011                                                             3
Outline
• Emerging consensus:
     – Interleaving semantics (Sequential Consistency)
     – But only for data-race-free programs
• Brief discussion of consequences
     – Software requirements
     – Hardware requirements
• Major remaining problem:
     – Java can‟t outlaw races.
     – We don‟t know how to give meaning to data races.
     – Some speculative solutions.

7/28/2011                                                 4
Naive threads programming model
     (Sequential Consistency)
• Threads behave as though their memory
  accesses were simply interleaved. (Sequential
  consistency)

        Thread 1              Thread 2
        x = 1;                y = 2;
        z = 3;

     – might be executed as
       x = 1; y = 2; z = 3;
7/28/2011                                         5
Locks restrict interleavings
          Thread 1                 Thread 2
          lock(l);                 lock(l);
          r1 = x;                  r2 = x;
          x = r1+1;                x = r2+1;
          unlock(l);               unlock(l);

     – can only be executed as
          lock(l); r1 = x; x = r1+1; unlock(l); lock(l);
          r2 = x; x = r2+1; unlock(l);
     or
        lock(l); r2 = x; x = r2+1; unlock(l); lock(l);
        r1 = x; x = r1+1; unlock(l);
     since second lock(l) must follow first unlock(l)

7/28/2011                                                  6
Atomic sections / transactional memory are
       just like a single global lock.




7/28/2011                                           7
But this doesn‟t quite work …
• Limits reordering and other hardware/compiler
  transformations
     – “Dekker‟s” example (everything initially zero) should
       allow r1 = r2 = 0:
            Thread 1                Thread 2
            x = 1;                  y = 1;
            r1 = y;                 r2 = x;
     – Compilers like to move up loads.
     – Hardware likes to buffer stores.


7/28/2011                                                      8
Sensitive to memory access
                     granularity
            Thread 1             Thread 2
            x = 300;             x = 100;

• If memory is accessed a byte at a time, this may be
  executed as:
      x_high = 0;
      x_high = 1; // x = 256
      x_low = 44; // x = 300;
      x_low = 100; // x = 356;


7/28/2011                                               9
And this is at too low a level …
• And taking advantage of sequential consistency
  involves reasoning about memory access
  interleaving:
     – Much too hard.
     – Want to reason about larger “atomic” code regions
             • which can‟t be visibly interleaved.




09/08/2010                                                 10
Real threads programming model
                 (1)
 • Two memory accesses conflict if they
      – access the same scalar object*, e.g. variable.
      – at least one access is a store.
      – E.g. x = 1; and r2 = x; conflict
 • Two ordinary memory accesses participate in a data
   race if they
      – conflict, and
      – can occur simultaneously
             • i.e. appear as adjacent operations (diff. threads) in interleaving.
 • A program is data-race-free (on a particular input) if no
   sequentially consistent execution results in a data race.

              * or contiguous sequence of bit-fields
09/08/2010                                                                           11
Real threads programming model
                 (2)
• Sequential consistency only for data-race-
  free programs!
     – Avoid anything else.
• Data races are prevented by
     – locks (or atomic sections) to restrict
       interleaving
     – declaring synchronization variables
             • (in two slides …)


09/08/2010                                      12
Alternate data-race-free definition:
          happens-before
• Memory access a happens-before b if
   – a precedes b in program order.                 Thread 1:
   – a and b are synchronization operations,
     and b observes the results of a, thus          lock(m);
     enforcing ordering between them.               t = x + 1;
      • e.g. a unlocks m, b subsequently locks m.
   – Or there is a c such that a happens-before x = t;
     c and c happens-before b.                  unlock(m)
• Two ordinary conflicting memory
  operations a and b participate in a                           Thread 2:
  data race if neither                                          lock(m);
   – a happens-before b,                                        t = x + 1;
   – nor b happens-before a.
• Set of data-race-free programs usually the                    x = t;
  same.                                                         unlock(m)
Synchronization variables
•   Java: volatile, java.util.concurrent.atomic.
•   C++11: atomic<int>, atomic_int
•   C1x: _Atomic(int), _Atomic int, atomic_int
•   Guarantee indivisibility of operations.
•   “Don‟t count” in determining whether there is a data race:
     – Programs with “races” on synchronization variables are still
       sequentially consistent.
     – Though there may be “escapes” (Java, C++11).
• Dekker‟s algorithm “just works” with synchronization
  variables.



7/28/2011                                                             14
As expressive as races
Double-checked locking:      Double-checked locking:
Wrong!                       Correct (C++11):
bool x_init;                 atomic<bool> x_init;

if (!x_init) {               if (!x_init) {
   l.lock();                    l.lock();
   if (!x_init) {               if (!x_init) {
            initialize x;           initialize x;
            x_init = true;          x_init = true;
     }                           }
     l.unlock();                 l.unlock();
}                            }
use x;                       use x;


7/28/2011                                              15
Data Races Revisited
• Are defined in terms of sequentially
  consistent executions.
• If x and y are initially zero, this does not
  have a data race:
            Thread 1       Thread 2
            if (x)         if (y)
              y = 1;         x = 1;




7/28/2011                                        16
SC for DRF programming model
         advantages over SC
• Supports important hardware & compiler optimizations.
• DRF restriction  Synchronization-free code sections
  appear to execute atomically, i.e. without visible
  interleaving.
     – If one didn‟t:

 Thread 1 (not atomic):   Thread 2(observer):
      a = 1;
                           if (a == 1 && b == 0) {

      b = 1;
                               …
                           }


09/08/2010                                                17
SC for DRF implementation model
               (1)
• Very restricted reordering of                 lock();
  memory operations around                       synch-free
  synchronization operations:                      code
  – Compiler either understands these, or
                                                  region
    treats them as opaque, potentially
    updating any location.                      unlock();
  – Synchronization operations include
                                                 synch-free
    instructions to limit or prevent
    hardware reordering (“memory                   code
    fences”).                                     region
     • e.g. lock acquisition, release, atomic
       store, might contain memory fences.
SC for DRF implementation model
              (2)
• Code may be reordered                  lock();
  between synchronization
  operations.                             synch-free

  – Another thread can only tell if it      code
    accesses the same data between         region
    reordered operations.
  – Such an access would be a data       unlock();
    race.
                                          synch-free
• If data races are disallowed              code
  (e.g. Posix, Ada, C++11, not
  Java), compiler may assume               region
  that variables don‟t change
  asynchronously.
Data
                                                            races
               Some variants                                 are
                                                            errors

C++11 draft     SC for DRF,                                 X
                with explicit easily avoidable exceptions
C1x draft
Java            SC for DRF, with some exceptions
                More details later.

Ada83           SC for DRF (?)                              X
Pthreads        SC for DRF (??)                             X
OpenMP          SC for DRF (??, except atomics)             X
Fortran 2008    SC for DRF (??, except atomics)             X
C#, .Net        SC for DRF when restricted to
                locks + Interlocked (???)
7/28/2011                                                       20
The exceptions in C++11:
atomic<bool> x_init;

if (!x_init.load(memory_order_acquire) {
   l.lock();
   if (!x_init.load(memory_order_relaxed) {
         initialize x;
         x_init.store(true, memory_order_release);
    }
    l.unlock();
}
use x;

We‟ll ignore this for the rest of the talk …
Data Races as Errors
• In many languages, data races are errors
     – In the sense of out-of-bounds array stores in C.
     – Anything can happen:
         • E.g. program can crash, local variables can appear
           to change, etc.
     – There are no benign races in e.g. C++11 or Ada 83.
     – Compilers may, and do, assume there are no races.
       If that assumption is false, all bets are off.
     – See e.g. Boehm, How to miscompile programs with
       “benign” data races, HotPar „11.
      .
7/28/2011                                                  22
How a data race may cause a
                    crash
unsigned x;                       • Assume switch
                                    statement compiled as
If (x < 3) {                        branch table.
                                  • May assume x is in
     … // async x change
                                    range.
     switch(x) {
                                  • Asynchronous change to
         case 0: …                  x causes wild branch.
         case 1: …                   – Not just wrong value.
         case 2: …
     }
}
23                         28 July 2011
Note: We can usually ignore
              wait/notify
• In most languages wait can return without
  notify.
     – “Spurious wakeup”
• Except for termination issues:
     – wait() ≡ unlock(); lock()
     – notify() ≡ nop
• Applies to existence of data races, partial
  correctness, flow analysis.
7/28/2011                                       24
Another note on data race
                     definition
• We define it in terms of scalar accesses, but
• Container libraries should ensure that
            Container accesses don‟t race 
            No races on memory locations
• This means
     – Accesses to hidden shared state (caches,
       allocation) must be locked by implementation.
     – User must lock for container-level races.
• This is often the correct library thread-safety
  condition.
7/28/2011                                              25
Outline
• Emerging consensus:
     – Interleaving semantics (Sequential Consistency)
     – But only for data-race-free programs
• Brief discussion of consequences
     – Software requirements
     – Hardware requirements
• Major remaining problem:
     – Java can‟t outlaw races.
     – We don‟t know how to give meaning to data races.
     – Some speculative solutions.

7/28/2011                                                 26
Compilers must not introduce data
             races
• Single thread compilers currently may add
  data races: (PLDI 05)
     struct {char a; char b} x;
                                  tmp = x;
     x.a = „z‟;                   tmp.a = „z‟;
                                  x = tmp;

     – x.a = 1 in parallel with x.b = 1 may fail to
       update x.b.
• … and much more interesting examples.
• Still broken in gcc with bit-fields.
7/28/2011                                         27
Some restrictions are a bit more
               annoying:
• Compiler may not introduce “speculative” stores:
     int count;   // global, possibly shared
     …
     for (p = q; p != 0; p = p -> next)
        if (p -> data > 0) ++count;




     int count;   // global, possibly shared
     …
     reg = count;
     for (p = q; p != 0; p = p -> next)
        if (p -> data > 0) ++reg;
     count = reg; // may spuriously assign to count

28                       28 July 2011
Introducing data races:
            Potentially infinite loops
                        ?
while (…) { *p = 1; }       x = 1;
x = 1;                      while (…) { *p = 1; }


• x not otherwise accessed by loop.
• Prohibited in Java.
• Allowed by giving undefined behavior to such
  infinite loops in C++0x. (Controversial.)
• Common text book analyses treat as equivalent?

7/28/2011                                           29
Note on program analysis for
             optimization
• Lots of research papers on analysis of
  sequentially consistent parallel programs.
• AFAIK: Real compilers don’t do that!
     – For good reason.
     – Use sequential optimization in sync-free regions.
• SC analysis is
     – Generally sound, but pessimistic for C++11 (minus
       SC escapes)
     – Very much so with incomplete information
     – Unsound for Java (but …)
7/28/2011                                                  30
Some positive optimization
            consequences
• Rules are finally clear!
     – Optimization does not need to preserve full sequential
       consistency!
• In languages that outlaw data races
     – Compilers already use that.
     – It can be leveraged more aggressively:
     – Assuming no other synchronization, x not motified by this thread,
       x is loop invariant in
         while (…) {
              … x …
              l.lock(); … l.unlock();
         }

7/28/2011                                                             31
Language spec challenge 1
• Some common language features almost
  unavoidably introduce data races.
• Most significant example:
     – Detached/daemon threads, combined with
     – Destructors / finalizers for static variables
     – Detached threads call into libraries that may
       access static variables
            • Even while they‟re being cleaned up.


7/28/2011                                              32
Language spec challenge 2:
    • Some really awful code:
                    Thread 2:       Don’t try this at home!!
Thread 1:

    x = 42;            while (m.trylock()==SUCCESS)
?   m.lock();            m.unlock();
                       assert (x == 42);
                                       •   Disclaimer: Example requires
    •   Can the assertion fail?            tweaking to be pthreads-
                                           compliant.
    •   Many implementations: Yes
    •   Traditional specs: No. C++11: Yes
    •   Trylock() can effectively fail spuriously!
    09/08/2010                                                       33
Outline
• Emerging consensus:
     – Interleaving semantics (Sequential Consistency)
     – But only for data-race-free programs
• Brief discussion of consequences
     – Software requirements
     – Hardware requirements
• Major remaining problem:
     – Java can‟t outlaw races.
     – We don‟t know how to give meaning to data races.
     – Some speculative solutions.

7/28/2011                                                 34
Byte store instructions
• x.c = „a‟; may not visibly read and
  rewrite adjacent fields.
• Byte stores must be implemented with
     – Byte store instruction, or
     – Atomic read-modify-write.
            • Typically expensive on multiprocessors.
            • Often cheaply implementable on uniprocessors.



7/28/2011                                                     35
Sequential consistency must be
              enforceable
• Programs using only synchronization variables
  must be sequentially consistent.
• Compiler literature contains many papers on
  enforcing sequential consistency by adding
  fences. But:
     – Not really possible on Itanium.
     – Wasn‟t possible on X86 until the re-revision of the
       spec last year.
     – Took months of discussions with PowerPC architects
       to conclude it‟s (barely, sort of) possible there.
• The core issue is “write atomicity”:
7/28/2011                                                36
Can fences enforce SC?
Unclear that hardware fences can ensure sequential
consistency. “IRIW” example:
x, y initially zero. Fences between every instruction pair!
 Thread 1:     Thread 2:              Thread 3:     Thread 4:
 x = 1;        r1 = x; (1)            y = 1;        r3 = y; (1)
               fence;                               fence;
               r2 = y; (0)                          r4 = x; (0)

                x set first!                       y set first!

 Fully fenced, not sequentially consistent. Does hardware allow it?

7/28/2011                                                             37
Why does it matter?
• Nobody cares about IRIW!?
• It‟s a pain to enforce on at least PowerPC.
• Many people (Sarita Adve, Doug Lea,
  Vijay Saraswat) spent about a year trying
  to relax SC requirement here.
• (Personal opinion) The results were
  incomprehensible, and broke more
  important code.
• No viable alternatives!
7/28/2011                                   38
Acceptable hardware memory
             models
• More challenging requirements:
     1.     Precise memory model specification
     2.     Byte stores
     3.     Cheap mechanism to enforce write atomicity
     4.     Dirt cheap mechanism to enforce data
            dependency ordering(?) (Java final fields)
• Other than that, all standard approaches
  appear workable, but …

7/28/2011                                            39
Replace fences completely?
        Synchronization variables on X86
• atomic store:     ~1 cycle   dozens of cycles
     – store (mov); mfence;
• atomic load:      ~1 cycle
     – load (mov)
• Store implicitly ensures that prior memory
  operations become visible before store.
• Load implicitly ensures that subsequent memory
  operations become visible later.
• Sole reason for mfence: Order atomic store
  followed by atomic load.
7/28/2011                                         40
Fence enforces all kinds of
 additional, unobservable orderings
• s is a synchronization variable:
       x = 1;
       s = 2; // includes fence
       r1 = y;
• Prevents reordering of x = 1 and r1 = y;
   – final load delayed until assignment to a visible.
• But this ordering is invisible to non-racing threads
     – …and expensive to enforce?
• We need a tiny fraction of mfence functionality.


7/28/2011                                                41
Outline
• Emerging consensus:
     – Interleaving semantics (Sequential Consistency)
     – But only for data-race-free programs
• Brief discussion of consequences
     – Software requirements
     – Hardware requirements
• Major remaining problem:
     – Java can‟t outlaw races.
     – We don‟t know how to give meaning to data races.
     – Some speculative solutions.

7/28/2011                                                 42
Data Races in Java
• C++11 leaves data race semantics
  undefined.
     – “catch fire” semantics
• Java supports sand-boxed code.
• Don‟t know how to prevent data-races in
  sand-boxed, malicious code.
• Java must provide some guarantees in the
  presence of data races.
7/28/2011                                43
Interesting data race outcome?
            x, y initially null,
            Loads may or may not see racing stores?

Thread 1:                            Thread 2:
  r1 = x;                              r2 = y;
  y = r1;                              x = r2;



            Outcome: x = y = r1 = r2 =
            “<your bank password here>”


7/28/2011                                             44
The Java Solution
Quotation from 17.4.8, Java Language Specification, 3rd
edition, omitted, to avoid possible copyright questions. It
addresses the issue by explicitly outlawing the causality
cycles required to get the dubious result from the last slide.

The important point is that this is a rather complex
mathematical specification.




                                               …
7/28/2011                                                    45
Complicated, but nice properties?
• Manson, Pugh, Adve: The Java Memory
  Model, POPL 05


 Quotation from section 9.1.2 of above paper omitted, to avoid
 possible copyright questions. This asserts (Theorem 1) that
 non-conflicting operations may be reordered by a compiler.




7/28/2011                                                        46
Much nicer than prior attempts, but:
• Aspinall, Sevcik, “Java Memory Model Examples: Good,
  Bad, and Ugly”, VAMP 2007 (also ECOOP 2008 paper)



Quotation from above paper omitted, to avoid possible
copyright questions. This ends in the statement:
“This falsifies Theorem 1 of [paper from previous slide].”



Note: The underlying observation is due to Pietro Cenciarelli.



7/28/2011                                                        47
Why is this hard?
• Want
     – Constrained race semantics for essential
       security properties.
     – Unconstrained race semantics to support
       compiler and hardware optimizations.
     – Simplicity.
• No known good resolution.


7/28/2011                                         48
Outline
• Emerging consensus:
     – Interleaving semantics (Sequential Consistency)
     – But only for data-race-free programs
• Brief discussion of consequences
     – Software requirements
     – Hardware requirements
• Major remaining problem:
     – Java can‟t outlaw races.
     – We don‟t know how to give meaning to data races.
     – Some speculative solutions.

7/28/2011                                                 49
A Different Approach
• Outlaw data races.
• Require violations to be detectable!
     – Even in malicious sand-boxed code.


• Possible approaches:
     – Statically prevent data races.
            • Tried repeatedly, ongoing work …
     – Dynamically detect the relevant data races.
7/28/2011                                            50
Dynamic Race Detection
• Need to guarantee one of:
     – Program is data-race free and provides SC execution (done),
     – Program contains a data race and raises an exception, or
     – Program exhibits simple semantics anyway, e.g.
            • Sequentially consistent
            • Synchronization-free regions are atomic
• This is significantly cheaper than fully accurate data-race
  detection.
     – Track byte-level R/W information
     – Mostly in cache
     – As opposed to epoch number + thread id per byte



7/28/2011                                                            51
For more information:
• Boehm, “Threads Basics”, HPL TR 2009-259.
• Boehm, Adve, “Foundations of the C++ Concurrency Memory
  Model”, PLDI 08.
• Sevcık and Aspinall, “On Validity of Program Transformations in the
  Java Memory Model”, ECOOP 08.
• Sewell et al, “x86-TSO: A Rigorous and Usable Programmer‟s
  Model for x86”, CACM, July 2010.
• S. V. Adve, Boehm, “Memory Models: A Case for Rethinking Parallel
  Languages and Hardware”, CACM, August 2010.
• Lucia, Strauss, Ceze, Qadeer, Boehm, "Conflict Exceptions:
  Providing Simple Parallel Language Semantics with Precise
  Hardware Exceptions, ISCA 2010. Also Marino et al, PLDI 10.
• Effinger-Dean, Boehm, Chakrabarti, Joisha, “Extended Sequential
  Reasoning for Data-Race-Free Programs”, MSPC 11.



7/28/2011                                                          52
Questions?




7/28/2011                53
Backup slides




7/28/2011                   54
Introducing Races:
            Register Promotion 2
                       r = g;
                       for(...) {
[g is global]            if(mt) {
for(...) {                 g = r; lock(); r = g;
                         }
    if(mt) lock();
                         use/update r instead of g;
    use/update g;        if(mt) {
    if(mt) unlock();       g = r; unlock(); r = g;
                         }
}
                       }
                       g = r;


7/28/2011                                       55
Trylock:
            Critical section reordering?
• Reordering of memory operations with respect to critical
  sections:

    Expected (& Java):   Naïve pthreads:   Optimized pthreads



       lock()                lock()             lock()




     unlock()              unlock()          unlock()



7/28/2011                                                       56
Some open source pthread lock
   implementations (2006):
       lock()                  lock()            lock()           lock()




     unlock()               unlock()          unlock()           unlock()




  [technically incorrect]   [Correct, slow]      [Correct]        [Incorrect]
            NPTL                NPTL               NPTL           FreeBSD
   {Alpha, PowerPC}         Itanium (&X86)    { Itanium, X86 }     Itanium
      {mutex, spin}             mutex              spin              spin
7/28/2011                                                                       57

More Related Content

What's hot

Seq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) modelSeq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) model佳蓉 倪
 
RNN, LSTM and Seq-2-Seq Models
RNN, LSTM and Seq-2-Seq ModelsRNN, LSTM and Seq-2-Seq Models
RNN, LSTM and Seq-2-Seq ModelsEmory NLP
 
Performance and predictability (1)
Performance and predictability (1)Performance and predictability (1)
Performance and predictability (1)RichardWarburton
 
論文輪読資料「Gated Feedback Recurrent Neural Networks」
論文輪読資料「Gated Feedback Recurrent Neural Networks」論文輪読資料「Gated Feedback Recurrent Neural Networks」
論文輪読資料「Gated Feedback Recurrent Neural Networks」kurotaki_weblab
 
Sequence learning and modern RNNs
Sequence learning and modern RNNsSequence learning and modern RNNs
Sequence learning and modern RNNsGrigory Sapunov
 
RNN & LSTM: Neural Network for Sequential Data
RNN & LSTM: Neural Network for Sequential DataRNN & LSTM: Neural Network for Sequential Data
RNN & LSTM: Neural Network for Sequential DataYao-Chieh Hu
 
Codefreeze eng
Codefreeze engCodefreeze eng
Codefreeze engDevexperts
 
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...David Walker
 
Sstic 2015 detailed_version_triton_concolic_execution_frame_work_f_saudel_jsa...
Sstic 2015 detailed_version_triton_concolic_execution_frame_work_f_saudel_jsa...Sstic 2015 detailed_version_triton_concolic_execution_frame_work_f_saudel_jsa...
Sstic 2015 detailed_version_triton_concolic_execution_frame_work_f_saudel_jsa...Jonathan Salwan
 
C++ Data-flow Parallelism sounds great! But how practical is it? Let’s see ho...
C++ Data-flow Parallelism sounds great! But how practical is it? Let’s see ho...C++ Data-flow Parallelism sounds great! But how practical is it? Let’s see ho...
C++ Data-flow Parallelism sounds great! But how practical is it? Let’s see ho...Jason Hearne-McGuiness
 
Distributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflowDistributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflowEmanuel Di Nardo
 
A Domain-Specific Embedded Language for Programming Parallel Architectures.
A Domain-Specific Embedded Language for Programming Parallel Architectures.A Domain-Specific Embedded Language for Programming Parallel Architectures.
A Domain-Specific Embedded Language for Programming Parallel Architectures.Jason Hearne-McGuiness
 

What's hot (19)

LSTM
LSTMLSTM
LSTM
 
Seq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) modelSeq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) model
 
RNN, LSTM and Seq-2-Seq Models
RNN, LSTM and Seq-2-Seq ModelsRNN, LSTM and Seq-2-Seq Models
RNN, LSTM and Seq-2-Seq Models
 
nn network
nn networknn network
nn network
 
Rnn & Lstm
Rnn & LstmRnn & Lstm
Rnn & Lstm
 
Performance and predictability (1)
Performance and predictability (1)Performance and predictability (1)
Performance and predictability (1)
 
論文輪読資料「Gated Feedback Recurrent Neural Networks」
論文輪読資料「Gated Feedback Recurrent Neural Networks」論文輪読資料「Gated Feedback Recurrent Neural Networks」
論文輪読資料「Gated Feedback Recurrent Neural Networks」
 
Sequence learning and modern RNNs
Sequence learning and modern RNNsSequence learning and modern RNNs
Sequence learning and modern RNNs
 
Collections forceawakens
Collections forceawakensCollections forceawakens
Collections forceawakens
 
ASE02.ppt
ASE02.pptASE02.ppt
ASE02.ppt
 
LSTM Basics
LSTM BasicsLSTM Basics
LSTM Basics
 
RNN & LSTM: Neural Network for Sequential Data
RNN & LSTM: Neural Network for Sequential DataRNN & LSTM: Neural Network for Sequential Data
RNN & LSTM: Neural Network for Sequential Data
 
Codefreeze eng
Codefreeze engCodefreeze eng
Codefreeze eng
 
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...
 
Sstic 2015 detailed_version_triton_concolic_execution_frame_work_f_saudel_jsa...
Sstic 2015 detailed_version_triton_concolic_execution_frame_work_f_saudel_jsa...Sstic 2015 detailed_version_triton_concolic_execution_frame_work_f_saudel_jsa...
Sstic 2015 detailed_version_triton_concolic_execution_frame_work_f_saudel_jsa...
 
C++ Data-flow Parallelism sounds great! But how practical is it? Let’s see ho...
C++ Data-flow Parallelism sounds great! But how practical is it? Let’s see ho...C++ Data-flow Parallelism sounds great! But how practical is it? Let’s see ho...
C++ Data-flow Parallelism sounds great! But how practical is it? Let’s see ho...
 
Deep Learning in theano
Deep Learning in theanoDeep Learning in theano
Deep Learning in theano
 
Distributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflowDistributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflow
 
A Domain-Specific Embedded Language for Programming Parallel Architectures.
A Domain-Specific Embedded Language for Programming Parallel Architectures.A Domain-Specific Embedded Language for Programming Parallel Architectures.
A Domain-Specific Embedded Language for Programming Parallel Architectures.
 

Similar to Programming Language Memory Models: What do Shared Variables Mean

Nondeterminism is unavoidable, but data races are pure evil
Nondeterminism is unavoidable, but data races are pure evilNondeterminism is unavoidable, but data races are pure evil
Nondeterminism is unavoidable, but data races are pure evilracesworkshop
 
Google: Cluster computing and MapReduce: Introduction to Distributed System D...
Google: Cluster computing and MapReduce: Introduction to Distributed System D...Google: Cluster computing and MapReduce: Introduction to Distributed System D...
Google: Cluster computing and MapReduce: Introduction to Distributed System D...tugrulh
 
Introduction & Parellelization on large scale clusters
Introduction & Parellelization on large scale clustersIntroduction & Parellelization on large scale clusters
Introduction & Parellelization on large scale clustersSri Prasanna
 
Константин Серебряный, Google, - Как мы охотимся на гонки (data races) или «н...
Константин Серебряный, Google, - Как мы охотимся на гонки (data races) или «н...Константин Серебряный, Google, - Как мы охотимся на гонки (data races) или «н...
Константин Серебряный, Google, - Как мы охотимся на гонки (data races) или «н...Media Gorod
 
Simon Peyton Jones: Managing parallelism
Simon Peyton Jones: Managing parallelismSimon Peyton Jones: Managing parallelism
Simon Peyton Jones: Managing parallelismSkills Matter
 
Peyton jones-2011-parallel haskell-the_future
Peyton jones-2011-parallel haskell-the_futurePeyton jones-2011-parallel haskell-the_future
Peyton jones-2011-parallel haskell-the_futureTakayuki Muranushi
 
Distributed computing presentation
Distributed computing presentationDistributed computing presentation
Distributed computing presentationDelhi/NCR HUG
 
what every web and app developer should know about multithreading
what every web and app developer should know about multithreadingwhat every web and app developer should know about multithreading
what every web and app developer should know about multithreadingIlya Haykinson
 
Константин Серебряный "Быстрый динамичекский анализ программ на примере поиск...
Константин Серебряный "Быстрый динамичекский анализ программ на примере поиск...Константин Серебряный "Быстрый динамичекский анализ программ на примере поиск...
Константин Серебряный "Быстрый динамичекский анализ программ на примере поиск...Yandex
 
Fast dynamic analysis, Kostya Serebryany
Fast dynamic analysis, Kostya SerebryanyFast dynamic analysis, Kostya Serebryany
Fast dynamic analysis, Kostya Serebryanyyaevents
 
AOS Lab 4: If you liked it, then you should have put a “lock” on it
AOS Lab 4: If you liked it, then you should have put a “lock” on itAOS Lab 4: If you liked it, then you should have put a “lock” on it
AOS Lab 4: If you liked it, then you should have put a “lock” on itZubair Nabi
 
Multi-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case StudyMulti-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case Studyelliando dias
 
Modern Java Concurrency
Modern Java ConcurrencyModern Java Concurrency
Modern Java ConcurrencyBen Evans
 
C11/C++11 Memory model. What is it, and why?
C11/C++11 Memory model. What is it, and why?C11/C++11 Memory model. What is it, and why?
C11/C++11 Memory model. What is it, and why?Mikael Rosbacke
 

Similar to Programming Language Memory Models: What do Shared Variables Mean (20)

Nondeterminism is unavoidable, but data races are pure evil
Nondeterminism is unavoidable, but data races are pure evilNondeterminism is unavoidable, but data races are pure evil
Nondeterminism is unavoidable, but data races are pure evil
 
Memory model
Memory modelMemory model
Memory model
 
Google: Cluster computing and MapReduce: Introduction to Distributed System D...
Google: Cluster computing and MapReduce: Introduction to Distributed System D...Google: Cluster computing and MapReduce: Introduction to Distributed System D...
Google: Cluster computing and MapReduce: Introduction to Distributed System D...
 
Introduction & Parellelization on large scale clusters
Introduction & Parellelization on large scale clustersIntroduction & Parellelization on large scale clusters
Introduction & Parellelization on large scale clusters
 
Константин Серебряный, Google, - Как мы охотимся на гонки (data races) или «н...
Константин Серебряный, Google, - Как мы охотимся на гонки (data races) или «н...Константин Серебряный, Google, - Как мы охотимся на гонки (data races) или «н...
Константин Серебряный, Google, - Как мы охотимся на гонки (data races) или «н...
 
Hoard_2022AIM1001.pptx.pdf
Hoard_2022AIM1001.pptx.pdfHoard_2022AIM1001.pptx.pdf
Hoard_2022AIM1001.pptx.pdf
 
Simon Peyton Jones: Managing parallelism
Simon Peyton Jones: Managing parallelismSimon Peyton Jones: Managing parallelism
Simon Peyton Jones: Managing parallelism
 
Peyton jones-2011-parallel haskell-the_future
Peyton jones-2011-parallel haskell-the_futurePeyton jones-2011-parallel haskell-the_future
Peyton jones-2011-parallel haskell-the_future
 
Memory model
Memory modelMemory model
Memory model
 
Distributed computing presentation
Distributed computing presentationDistributed computing presentation
Distributed computing presentation
 
what every web and app developer should know about multithreading
what every web and app developer should know about multithreadingwhat every web and app developer should know about multithreading
what every web and app developer should know about multithreading
 
Константин Серебряный "Быстрый динамичекский анализ программ на примере поиск...
Константин Серебряный "Быстрый динамичекский анализ программ на примере поиск...Константин Серебряный "Быстрый динамичекский анализ программ на примере поиск...
Константин Серебряный "Быстрый динамичекский анализ программ на примере поиск...
 
Fast dynamic analysis, Kostya Serebryany
Fast dynamic analysis, Kostya SerebryanyFast dynamic analysis, Kostya Serebryany
Fast dynamic analysis, Kostya Serebryany
 
Surge2012
Surge2012Surge2012
Surge2012
 
AOS Lab 4: If you liked it, then you should have put a “lock” on it
AOS Lab 4: If you liked it, then you should have put a “lock” on itAOS Lab 4: If you liked it, then you should have put a “lock” on it
AOS Lab 4: If you liked it, then you should have put a “lock” on it
 
Ndp Slides
Ndp SlidesNdp Slides
Ndp Slides
 
Multi-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case StudyMulti-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case Study
 
Modern Java Concurrency
Modern Java ConcurrencyModern Java Concurrency
Modern Java Concurrency
 
C11/C++11 Memory model. What is it, and why?
C11/C++11 Memory model. What is it, and why?C11/C++11 Memory model. What is it, and why?
C11/C++11 Memory model. What is it, and why?
 
Failure Of DEP And ASLR
Failure Of DEP And ASLRFailure Of DEP And ASLR
Failure Of DEP And ASLR
 

More from greenwop

Performance Analysis of Idle Programs
Performance Analysis of Idle ProgramsPerformance Analysis of Idle Programs
Performance Analysis of Idle Programsgreenwop
 
Unifying Remote Data, Remote Procedure, and Service Clients
Unifying Remote Data, Remote Procedure, and Service ClientsUnifying Remote Data, Remote Procedure, and Service Clients
Unifying Remote Data, Remote Procedure, and Service Clientsgreenwop
 
Expressiveness, Simplicity and Users
Expressiveness, Simplicity and UsersExpressiveness, Simplicity and Users
Expressiveness, Simplicity and Usersgreenwop
 
Category theory, Monads, and Duality in the world of (BIG) Data
Category theory, Monads, and Duality in the world of (BIG) DataCategory theory, Monads, and Duality in the world of (BIG) Data
Category theory, Monads, and Duality in the world of (BIG) Datagreenwop
 
A Featherweight Approach to FOOL
A Featherweight Approach to FOOLA Featherweight Approach to FOOL
A Featherweight Approach to FOOLgreenwop
 
The Rise of Dynamic Languages
The Rise of Dynamic LanguagesThe Rise of Dynamic Languages
The Rise of Dynamic Languagesgreenwop
 
Turning a Tower of Babel into a Beautiful Racket
Turning a Tower of Babel into a Beautiful RacketTurning a Tower of Babel into a Beautiful Racket
Turning a Tower of Babel into a Beautiful Racketgreenwop
 
Normal Considered Harmful
Normal Considered HarmfulNormal Considered Harmful
Normal Considered Harmfulgreenwop
 
High Performance JavaScript
High Performance JavaScriptHigh Performance JavaScript
High Performance JavaScriptgreenwop
 

More from greenwop (9)

Performance Analysis of Idle Programs
Performance Analysis of Idle ProgramsPerformance Analysis of Idle Programs
Performance Analysis of Idle Programs
 
Unifying Remote Data, Remote Procedure, and Service Clients
Unifying Remote Data, Remote Procedure, and Service ClientsUnifying Remote Data, Remote Procedure, and Service Clients
Unifying Remote Data, Remote Procedure, and Service Clients
 
Expressiveness, Simplicity and Users
Expressiveness, Simplicity and UsersExpressiveness, Simplicity and Users
Expressiveness, Simplicity and Users
 
Category theory, Monads, and Duality in the world of (BIG) Data
Category theory, Monads, and Duality in the world of (BIG) DataCategory theory, Monads, and Duality in the world of (BIG) Data
Category theory, Monads, and Duality in the world of (BIG) Data
 
A Featherweight Approach to FOOL
A Featherweight Approach to FOOLA Featherweight Approach to FOOL
A Featherweight Approach to FOOL
 
The Rise of Dynamic Languages
The Rise of Dynamic LanguagesThe Rise of Dynamic Languages
The Rise of Dynamic Languages
 
Turning a Tower of Babel into a Beautiful Racket
Turning a Tower of Babel into a Beautiful RacketTurning a Tower of Babel into a Beautiful Racket
Turning a Tower of Babel into a Beautiful Racket
 
Normal Considered Harmful
Normal Considered HarmfulNormal Considered Harmful
Normal Considered Harmful
 
High Performance JavaScript
High Performance JavaScriptHigh Performance JavaScript
High Performance JavaScript
 

Recently uploaded

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 

Recently uploaded (20)

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 

Programming Language Memory Models: What do Shared Variables Mean

  • 1. Programming Language Memory Models: What do Shared Variables Mean? Hans-J. Boehm HP Labs 7/28/2011 1
  • 2. The problem • Shared memory parallel programs are built on shared variables visible to multiple threads of control. • But there is a lot of confusion about what those variables mean: – Are concurrent accesses allowed? – What is a concurrent access? – When do updates become visible to other threads? – Can an update be partially visible? • Many recent efforts with serious technical issues: – Java, OpenMP 3.0, UPC, Go … 7/28/2011 2
  • 3. Credits and Disclaimers: • Much of this work was done by others or jointly. I‟m relying particularly on: – Basic approach: Sarita Adve, Mark Hill, Ada 83 … – JSR 133: Also Jeremy Manson, Bill Pugh, Doug Lea – C++11: Lawrence Crowl, Clark Nelson, Paul McKenney, Herb Sutter, … – Improved hardware models: Peter Sewell‟s group, many Intel, AMD, ARM, IBM participants … – Conflict exception work: Ceze, Lucia, Qadeer, Strauss – Recent Java Memory Model work: Sevcik, Aspinall, Cenciarelli – Race-free optimization: Pramod Joisha, Laura Effinger-Dean, Dhruva Chakrabarti • But some of it is still controversial. – This reflects my personal views. 7/28/2011 3
  • 4. Outline • Emerging consensus: – Interleaving semantics (Sequential Consistency) – But only for data-race-free programs • Brief discussion of consequences – Software requirements – Hardware requirements • Major remaining problem: – Java can‟t outlaw races. – We don‟t know how to give meaning to data races. – Some speculative solutions. 7/28/2011 4
  • 5. Naive threads programming model (Sequential Consistency) • Threads behave as though their memory accesses were simply interleaved. (Sequential consistency) Thread 1 Thread 2 x = 1; y = 2; z = 3; – might be executed as x = 1; y = 2; z = 3; 7/28/2011 5
  • 6. Locks restrict interleavings Thread 1 Thread 2 lock(l); lock(l); r1 = x; r2 = x; x = r1+1; x = r2+1; unlock(l); unlock(l); – can only be executed as lock(l); r1 = x; x = r1+1; unlock(l); lock(l); r2 = x; x = r2+1; unlock(l); or lock(l); r2 = x; x = r2+1; unlock(l); lock(l); r1 = x; x = r1+1; unlock(l); since second lock(l) must follow first unlock(l) 7/28/2011 6
  • 7. Atomic sections / transactional memory are just like a single global lock. 7/28/2011 7
  • 8. But this doesn‟t quite work … • Limits reordering and other hardware/compiler transformations – “Dekker‟s” example (everything initially zero) should allow r1 = r2 = 0: Thread 1 Thread 2 x = 1; y = 1; r1 = y; r2 = x; – Compilers like to move up loads. – Hardware likes to buffer stores. 7/28/2011 8
  • 9. Sensitive to memory access granularity Thread 1 Thread 2 x = 300; x = 100; • If memory is accessed a byte at a time, this may be executed as: x_high = 0; x_high = 1; // x = 256 x_low = 44; // x = 300; x_low = 100; // x = 356; 7/28/2011 9
  • 10. And this is at too low a level … • And taking advantage of sequential consistency involves reasoning about memory access interleaving: – Much too hard. – Want to reason about larger “atomic” code regions • which can‟t be visibly interleaved. 09/08/2010 10
  • 11. Real threads programming model (1) • Two memory accesses conflict if they – access the same scalar object*, e.g. variable. – at least one access is a store. – E.g. x = 1; and r2 = x; conflict • Two ordinary memory accesses participate in a data race if they – conflict, and – can occur simultaneously • i.e. appear as adjacent operations (diff. threads) in interleaving. • A program is data-race-free (on a particular input) if no sequentially consistent execution results in a data race. * or contiguous sequence of bit-fields 09/08/2010 11
  • 12. Real threads programming model (2) • Sequential consistency only for data-race- free programs! – Avoid anything else. • Data races are prevented by – locks (or atomic sections) to restrict interleaving – declaring synchronization variables • (in two slides …) 09/08/2010 12
  • 13. Alternate data-race-free definition: happens-before • Memory access a happens-before b if – a precedes b in program order. Thread 1: – a and b are synchronization operations, and b observes the results of a, thus lock(m); enforcing ordering between them. t = x + 1; • e.g. a unlocks m, b subsequently locks m. – Or there is a c such that a happens-before x = t; c and c happens-before b. unlock(m) • Two ordinary conflicting memory operations a and b participate in a Thread 2: data race if neither lock(m); – a happens-before b, t = x + 1; – nor b happens-before a. • Set of data-race-free programs usually the x = t; same. unlock(m)
  • 14. Synchronization variables • Java: volatile, java.util.concurrent.atomic. • C++11: atomic<int>, atomic_int • C1x: _Atomic(int), _Atomic int, atomic_int • Guarantee indivisibility of operations. • “Don‟t count” in determining whether there is a data race: – Programs with “races” on synchronization variables are still sequentially consistent. – Though there may be “escapes” (Java, C++11). • Dekker‟s algorithm “just works” with synchronization variables. 7/28/2011 14
  • 15. As expressive as races Double-checked locking: Double-checked locking: Wrong! Correct (C++11): bool x_init; atomic<bool> x_init; if (!x_init) { if (!x_init) { l.lock(); l.lock(); if (!x_init) { if (!x_init) { initialize x; initialize x; x_init = true; x_init = true; } } l.unlock(); l.unlock(); } } use x; use x; 7/28/2011 15
  • 16. Data Races Revisited • Are defined in terms of sequentially consistent executions. • If x and y are initially zero, this does not have a data race: Thread 1 Thread 2 if (x) if (y) y = 1; x = 1; 7/28/2011 16
  • 17. SC for DRF programming model advantages over SC • Supports important hardware & compiler optimizations. • DRF restriction  Synchronization-free code sections appear to execute atomically, i.e. without visible interleaving. – If one didn‟t: Thread 1 (not atomic): Thread 2(observer): a = 1; if (a == 1 && b == 0) { b = 1; … } 09/08/2010 17
  • 18. SC for DRF implementation model (1) • Very restricted reordering of lock(); memory operations around synch-free synchronization operations: code – Compiler either understands these, or region treats them as opaque, potentially updating any location. unlock(); – Synchronization operations include synch-free instructions to limit or prevent hardware reordering (“memory code fences”). region • e.g. lock acquisition, release, atomic store, might contain memory fences.
  • 19. SC for DRF implementation model (2) • Code may be reordered lock(); between synchronization operations. synch-free – Another thread can only tell if it code accesses the same data between region reordered operations. – Such an access would be a data unlock(); race. synch-free • If data races are disallowed code (e.g. Posix, Ada, C++11, not Java), compiler may assume region that variables don‟t change asynchronously.
  • 20. Data races Some variants are errors C++11 draft SC for DRF, X with explicit easily avoidable exceptions C1x draft Java SC for DRF, with some exceptions More details later. Ada83 SC for DRF (?) X Pthreads SC for DRF (??) X OpenMP SC for DRF (??, except atomics) X Fortran 2008 SC for DRF (??, except atomics) X C#, .Net SC for DRF when restricted to locks + Interlocked (???) 7/28/2011 20
  • 21. The exceptions in C++11: atomic<bool> x_init; if (!x_init.load(memory_order_acquire) { l.lock(); if (!x_init.load(memory_order_relaxed) { initialize x; x_init.store(true, memory_order_release); } l.unlock(); } use x; We‟ll ignore this for the rest of the talk …
  • 22. Data Races as Errors • In many languages, data races are errors – In the sense of out-of-bounds array stores in C. – Anything can happen: • E.g. program can crash, local variables can appear to change, etc. – There are no benign races in e.g. C++11 or Ada 83. – Compilers may, and do, assume there are no races. If that assumption is false, all bets are off. – See e.g. Boehm, How to miscompile programs with “benign” data races, HotPar „11. . 7/28/2011 22
  • 23. How a data race may cause a crash unsigned x; • Assume switch statement compiled as If (x < 3) { branch table. • May assume x is in … // async x change range. switch(x) { • Asynchronous change to case 0: … x causes wild branch. case 1: … – Not just wrong value. case 2: … } } 23 28 July 2011
  • 24. Note: We can usually ignore wait/notify • In most languages wait can return without notify. – “Spurious wakeup” • Except for termination issues: – wait() ≡ unlock(); lock() – notify() ≡ nop • Applies to existence of data races, partial correctness, flow analysis. 7/28/2011 24
  • 25. Another note on data race definition • We define it in terms of scalar accesses, but • Container libraries should ensure that Container accesses don‟t race  No races on memory locations • This means – Accesses to hidden shared state (caches, allocation) must be locked by implementation. – User must lock for container-level races. • This is often the correct library thread-safety condition. 7/28/2011 25
  • 26. Outline • Emerging consensus: – Interleaving semantics (Sequential Consistency) – But only for data-race-free programs • Brief discussion of consequences – Software requirements – Hardware requirements • Major remaining problem: – Java can‟t outlaw races. – We don‟t know how to give meaning to data races. – Some speculative solutions. 7/28/2011 26
  • 27. Compilers must not introduce data races • Single thread compilers currently may add data races: (PLDI 05) struct {char a; char b} x; tmp = x; x.a = „z‟; tmp.a = „z‟; x = tmp; – x.a = 1 in parallel with x.b = 1 may fail to update x.b. • … and much more interesting examples. • Still broken in gcc with bit-fields. 7/28/2011 27
  • 28. Some restrictions are a bit more annoying: • Compiler may not introduce “speculative” stores: int count; // global, possibly shared … for (p = q; p != 0; p = p -> next) if (p -> data > 0) ++count; int count; // global, possibly shared … reg = count; for (p = q; p != 0; p = p -> next) if (p -> data > 0) ++reg; count = reg; // may spuriously assign to count 28 28 July 2011
  • 29. Introducing data races: Potentially infinite loops ? while (…) { *p = 1; } x = 1; x = 1; while (…) { *p = 1; } • x not otherwise accessed by loop. • Prohibited in Java. • Allowed by giving undefined behavior to such infinite loops in C++0x. (Controversial.) • Common text book analyses treat as equivalent? 7/28/2011 29
  • 30. Note on program analysis for optimization • Lots of research papers on analysis of sequentially consistent parallel programs. • AFAIK: Real compilers don’t do that! – For good reason. – Use sequential optimization in sync-free regions. • SC analysis is – Generally sound, but pessimistic for C++11 (minus SC escapes) – Very much so with incomplete information – Unsound for Java (but …) 7/28/2011 30
  • 31. Some positive optimization consequences • Rules are finally clear! – Optimization does not need to preserve full sequential consistency! • In languages that outlaw data races – Compilers already use that. – It can be leveraged more aggressively: – Assuming no other synchronization, x not motified by this thread, x is loop invariant in while (…) { … x … l.lock(); … l.unlock(); } 7/28/2011 31
  • 32. Language spec challenge 1 • Some common language features almost unavoidably introduce data races. • Most significant example: – Detached/daemon threads, combined with – Destructors / finalizers for static variables – Detached threads call into libraries that may access static variables • Even while they‟re being cleaned up. 7/28/2011 32
  • 33. Language spec challenge 2: • Some really awful code: Thread 2: Don’t try this at home!! Thread 1: x = 42; while (m.trylock()==SUCCESS) ? m.lock(); m.unlock(); assert (x == 42); • Disclaimer: Example requires • Can the assertion fail? tweaking to be pthreads- compliant. • Many implementations: Yes • Traditional specs: No. C++11: Yes • Trylock() can effectively fail spuriously! 09/08/2010 33
  • 34. Outline • Emerging consensus: – Interleaving semantics (Sequential Consistency) – But only for data-race-free programs • Brief discussion of consequences – Software requirements – Hardware requirements • Major remaining problem: – Java can‟t outlaw races. – We don‟t know how to give meaning to data races. – Some speculative solutions. 7/28/2011 34
  • 35. Byte store instructions • x.c = „a‟; may not visibly read and rewrite adjacent fields. • Byte stores must be implemented with – Byte store instruction, or – Atomic read-modify-write. • Typically expensive on multiprocessors. • Often cheaply implementable on uniprocessors. 7/28/2011 35
  • 36. Sequential consistency must be enforceable • Programs using only synchronization variables must be sequentially consistent. • Compiler literature contains many papers on enforcing sequential consistency by adding fences. But: – Not really possible on Itanium. – Wasn‟t possible on X86 until the re-revision of the spec last year. – Took months of discussions with PowerPC architects to conclude it‟s (barely, sort of) possible there. • The core issue is “write atomicity”: 7/28/2011 36
  • 37. Can fences enforce SC? Unclear that hardware fences can ensure sequential consistency. “IRIW” example: x, y initially zero. Fences between every instruction pair! Thread 1: Thread 2: Thread 3: Thread 4: x = 1; r1 = x; (1) y = 1; r3 = y; (1) fence; fence; r2 = y; (0) r4 = x; (0) x set first! y set first! Fully fenced, not sequentially consistent. Does hardware allow it? 7/28/2011 37
  • 38. Why does it matter? • Nobody cares about IRIW!? • It‟s a pain to enforce on at least PowerPC. • Many people (Sarita Adve, Doug Lea, Vijay Saraswat) spent about a year trying to relax SC requirement here. • (Personal opinion) The results were incomprehensible, and broke more important code. • No viable alternatives! 7/28/2011 38
  • 39. Acceptable hardware memory models • More challenging requirements: 1. Precise memory model specification 2. Byte stores 3. Cheap mechanism to enforce write atomicity 4. Dirt cheap mechanism to enforce data dependency ordering(?) (Java final fields) • Other than that, all standard approaches appear workable, but … 7/28/2011 39
  • 40. Replace fences completely? Synchronization variables on X86 • atomic store: ~1 cycle dozens of cycles – store (mov); mfence; • atomic load: ~1 cycle – load (mov) • Store implicitly ensures that prior memory operations become visible before store. • Load implicitly ensures that subsequent memory operations become visible later. • Sole reason for mfence: Order atomic store followed by atomic load. 7/28/2011 40
  • 41. Fence enforces all kinds of additional, unobservable orderings • s is a synchronization variable: x = 1; s = 2; // includes fence r1 = y; • Prevents reordering of x = 1 and r1 = y; – final load delayed until assignment to a visible. • But this ordering is invisible to non-racing threads – …and expensive to enforce? • We need a tiny fraction of mfence functionality. 7/28/2011 41
  • 42. Outline • Emerging consensus: – Interleaving semantics (Sequential Consistency) – But only for data-race-free programs • Brief discussion of consequences – Software requirements – Hardware requirements • Major remaining problem: – Java can‟t outlaw races. – We don‟t know how to give meaning to data races. – Some speculative solutions. 7/28/2011 42
  • 43. Data Races in Java • C++11 leaves data race semantics undefined. – “catch fire” semantics • Java supports sand-boxed code. • Don‟t know how to prevent data-races in sand-boxed, malicious code. • Java must provide some guarantees in the presence of data races. 7/28/2011 43
  • 44. Interesting data race outcome? x, y initially null, Loads may or may not see racing stores? Thread 1: Thread 2: r1 = x; r2 = y; y = r1; x = r2; Outcome: x = y = r1 = r2 = “<your bank password here>” 7/28/2011 44
  • 45. The Java Solution Quotation from 17.4.8, Java Language Specification, 3rd edition, omitted, to avoid possible copyright questions. It addresses the issue by explicitly outlawing the causality cycles required to get the dubious result from the last slide. The important point is that this is a rather complex mathematical specification. … 7/28/2011 45
  • 46. Complicated, but nice properties? • Manson, Pugh, Adve: The Java Memory Model, POPL 05 Quotation from section 9.1.2 of above paper omitted, to avoid possible copyright questions. This asserts (Theorem 1) that non-conflicting operations may be reordered by a compiler. 7/28/2011 46
  • 47. Much nicer than prior attempts, but: • Aspinall, Sevcik, “Java Memory Model Examples: Good, Bad, and Ugly”, VAMP 2007 (also ECOOP 2008 paper) Quotation from above paper omitted, to avoid possible copyright questions. This ends in the statement: “This falsifies Theorem 1 of [paper from previous slide].” Note: The underlying observation is due to Pietro Cenciarelli. 7/28/2011 47
  • 48. Why is this hard? • Want – Constrained race semantics for essential security properties. – Unconstrained race semantics to support compiler and hardware optimizations. – Simplicity. • No known good resolution. 7/28/2011 48
  • 49. Outline • Emerging consensus: – Interleaving semantics (Sequential Consistency) – But only for data-race-free programs • Brief discussion of consequences – Software requirements – Hardware requirements • Major remaining problem: – Java can‟t outlaw races. – We don‟t know how to give meaning to data races. – Some speculative solutions. 7/28/2011 49
  • 50. A Different Approach • Outlaw data races. • Require violations to be detectable! – Even in malicious sand-boxed code. • Possible approaches: – Statically prevent data races. • Tried repeatedly, ongoing work … – Dynamically detect the relevant data races. 7/28/2011 50
  • 51. Dynamic Race Detection • Need to guarantee one of: – Program is data-race free and provides SC execution (done), – Program contains a data race and raises an exception, or – Program exhibits simple semantics anyway, e.g. • Sequentially consistent • Synchronization-free regions are atomic • This is significantly cheaper than fully accurate data-race detection. – Track byte-level R/W information – Mostly in cache – As opposed to epoch number + thread id per byte 7/28/2011 51
  • 52. For more information: • Boehm, “Threads Basics”, HPL TR 2009-259. • Boehm, Adve, “Foundations of the C++ Concurrency Memory Model”, PLDI 08. • Sevcık and Aspinall, “On Validity of Program Transformations in the Java Memory Model”, ECOOP 08. • Sewell et al, “x86-TSO: A Rigorous and Usable Programmer‟s Model for x86”, CACM, July 2010. • S. V. Adve, Boehm, “Memory Models: A Case for Rethinking Parallel Languages and Hardware”, CACM, August 2010. • Lucia, Strauss, Ceze, Qadeer, Boehm, "Conflict Exceptions: Providing Simple Parallel Language Semantics with Precise Hardware Exceptions, ISCA 2010. Also Marino et al, PLDI 10. • Effinger-Dean, Boehm, Chakrabarti, Joisha, “Extended Sequential Reasoning for Data-Race-Free Programs”, MSPC 11. 7/28/2011 52
  • 55. Introducing Races: Register Promotion 2 r = g; for(...) { [g is global] if(mt) { for(...) { g = r; lock(); r = g; } if(mt) lock(); use/update r instead of g; use/update g; if(mt) { if(mt) unlock(); g = r; unlock(); r = g; } } } g = r; 7/28/2011 55
  • 56. Trylock: Critical section reordering? • Reordering of memory operations with respect to critical sections: Expected (& Java): Naïve pthreads: Optimized pthreads lock() lock() lock() unlock() unlock() unlock() 7/28/2011 56
  • 57. Some open source pthread lock implementations (2006): lock() lock() lock() lock() unlock() unlock() unlock() unlock() [technically incorrect] [Correct, slow] [Correct] [Incorrect] NPTL NPTL NPTL FreeBSD {Alpha, PowerPC} Itanium (&X86) { Itanium, X86 } Itanium {mutex, spin} mutex spin spin 7/28/2011 57