Memory models
    Memory Models: Presentation Transcript

    • Memory Models Dr. C.V. Suresh Babu
    • Memory Consistency Models Parallelism for the masses! Shared-memory most common Memory model = Legal values for reads
    • 20 Years of Memory Models • Memory model is at the heart of concurrency semantics – 20 year journey from confusion to convergence at last! – Hard lessons learned – Implications for future • Current way to specify concurrency semantics is too hard – Fundamentally broken • Must rethink parallel languages and hardware • Implications for broader CS disciplines
    • What is a Memory Model? • Memory model defines what values a read can return • Initially A = B = C = Flag = 0 • Thread 1: A = 26; B = 90; Flag = 1 • Thread 2: while (Flag != 1) {;} r1 = B; r2 = A • Can r1 return 90 or 0? Can r2 return 26 or 0?
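The slide's example can be sketched as runnable Java (class and method names here are illustrative, not from the talk). Declaring Flag volatile makes the program data-race-free, so once Thread 2 sees Flag == 1 the Java memory model guarantees it also sees A == 26; with a plain (non-volatile) Flag, 0 would also be a legal value for r2, and the spin loop might never terminate.

```java
public class MessagePassing {
    static int a = 0;
    static volatile int flag = 0;

    // Runs the slide's two-thread example once; returns the value Thread 2 reads from A.
    static int run() {
        int[] r2 = new int[1];
        Thread t1 = new Thread(() -> {
            a = 26;    // ordinary write of the data
            flag = 1;  // volatile write: publishes everything written before it
        });
        Thread t2 = new Thread(() -> {
            while (flag != 1) { }  // volatile read: spin until the publication is visible
            r2[0] = a;             // must observe 26 under the JMM
        });
        t1.start(); t2.start();
        try { t1.join(); t2.join(); } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return r2[0];
    }
}
```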
    • Memory Model is Key to Concurrency Semantics • Interface between program and transformers of program – Defines what values a read can return (C++ program → Compiler → Assembly → Dynamic optimizer → Hardware) • Weakest system component exposed to the programmer – Language level model has implications for hardware • Interface must last beyond trends
    • Desirable Properties of a Memory Model • 3 Ps – Programmability – Performance – Portability • Challenge: hard to satisfy all 3 Ps – Late 1980’s - 90’s: Largely driven by hardware • Lots of models, little consensus – 2000 onwards: Largely driven by languages/compilers • Consensus model for Java, C++ (C, others ongoing) • Had to deal with mismatches in hardware models Path to convergence has lessons for future
    • Programmability – SC [Lamport79] • Programmability: Sequential consistency (SC) most intuitive – Operations of a single thread in program order – All operations in a total order or atomic • But Performance? – Recent (complex) hardware techniques boost performance with SC – But compiler transformations still inhibited • But Portability? – Almost all h/w, compilers violate SC today ⇒SC not practical, but…
    • Next Best Thing – SC Almost Always • Parallel programming too hard even with SC – Programmers (want to) write well structured code – Explicit synchronization, no data races • Thread 1: Lock(L); Read Data1; Write Data2; …; Unlock(L) • Thread 2: Lock(L); Read Data2; Write Data1; …; Unlock(L) – SC for such programs much easier: can reorder data accesses ⇒ Data-race-free model [AdveHill90] – SC for data-race-free programs – No guarantees for programs with data races
    • Definition of a Data Race • Distinguish between data and non-data (synchronization) accesses • Only need to define for SC executions ⇒ total order • Two memory accesses form a race if – From different threads, to same location, at least one is a write – Occur one after another (consecutively in the total order) • Example execution – Thread 1: Write A, 26; Write B, 90; Write Flag, 1 – Thread 2: Read Flag, 0; Read Flag, 1; Read B, 90; Read A, 26 • A race with a data access is a data race • Data-race-free program = No data race in any SC execution
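A minimal runnable sketch of the distinction (names are illustrative): guarding every access to a shared counter with one lock means no two conflicting accesses ever occur back-to-back unordered, so the program is data-race-free, SC reasoning applies, and the count is always exact. Deleting the synchronized block would turn each count++ (a read followed by a write) into a data race with the other thread, and updates could be lost.

```java
public class CounterDemo {
    static int count = 0;
    static final Object lock = new Object();

    // Two threads each increment the shared counter perThread times,
    // always under the same lock: data-race-free, result always exact.
    static int runDataRaceFree(int perThread) {
        count = 0;
        Runnable body = () -> {
            for (int i = 0; i < perThread; i++) {
                synchronized (lock) { count++; }  // lock orders all conflicting accesses
            }
        };
        Thread t1 = new Thread(body), t2 = new Thread(body);
        t1.start(); t2.start();
        try { t1.join(); t2.join(); } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return count;
    }
}
```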
    • Data-Race-Free Model • Data-race-free model = SC for data-race-free programs – Does not preclude races for wait-free constructs, etc. • Requires races be explicitly identified as synchronization • E.g., use volatile variables in Java, atomics in C++ • Dekker's algorithm: initially Flag1 = Flag2 = 0 (volatile Flag1, Flag2) – Thread 1: Flag1 = 1; if (Flag2 == 0) // critical section – Thread 2: Flag2 = 1; if (Flag1 == 0) // critical section • SC prohibits both loads returning 0
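The Dekker-style idiom above can be sketched as a Java litmus test (class name is illustrative). Because volatile accesses are sequentially consistent in Java, no execution may observe both flags as 0, which is exactly the guarantee the slide's critical-section check relies on:

```java
public class StoreBuffer {
    static volatile int flag1 = 0, flag2 = 0;

    // One round of the Dekker-style litmus test; returns {r1, r2}.
    static int[] round() {
        flag1 = 0; flag2 = 0;
        int[] r = new int[2];
        Thread t1 = new Thread(() -> { flag1 = 1; r[0] = flag2; });
        Thread t2 = new Thread(() -> { flag2 = 1; r[1] = flag1; });
        t1.start(); t2.start();
        try { t1.join(); t2.join(); } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return r;  // volatiles are SC in Java: r[0] == 0 && r[1] == 0 is impossible
    }
}
```

With plain (non-volatile) fields, hardware store buffering or compiler reordering could legally yield the 0/0 outcome, which is why the slide requires the flags to be identified as synchronization.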
    • Data-Race-Free Approach • Programmer’s model: SC for data-race-free programs • Programmability – Simplicity of SC, for data-race-free programs • Performance – Specifies minimal constraints (for SC-centric view) • Portability – Language must provide way to identify races – Hardware must provide way to preserve ordering on races – Compiler must translate correctly
    • 1990's in Practice (The Memory Models Mess) • Hardware – Implementation/performance-centric view – Various ordering guarantees + fences to impose other orders (slide diagram: loads and stores reordered unless separated by a fence) – Many ambiguities – due to complexity, by design(?), … – Different vendors had different models – most non-SC • Alpha, Sun, x86, Itanium, IBM, AMD, HP, Cray, … • High-level languages – Most shared-memory programming with Pthreads, OpenMP • Incomplete, ambiguous model specs • Memory model property of language, not library [Boehm05] – Java – commercially successful language with threads • Chapter 17 of Java language spec on memory model • But hard to interpret, badly broken
    • 2000 – 2004: Java Memory Model • ~2000: Bill Pugh publicized fatal flaws in Java model • Lobbied Sun to form expert group to revise Java model • Open process via mailing list – Diverse participants – Took 5 years of intense, spirited debates – Many competing models – Final consensus model approved in 2005 for Java 5.0 [MansonPughAdve POPL 2005]
    • Java Memory Model Highlights • Quick agreement that SC for data-race-free was required • Missing piece: Semantics for programs with data races – Java cannot have undefined semantics for ANY program – Must ensure safety/security guarantees – Limit damage from data races in untrusted code • Goal: Satisfy security/safety, w/ maximum system flexibility – Problem: “safety/security, limited damage” w/ threads very vague
    • Java Memory Model Highlights • Initially X = Y = 0 – Thread 1: r1 = X; Y = r1 – Thread 2: r2 = Y; X = r2 • Is r1 = r2 = 42 allowed? Data races produce causality loop! • Definition of a causality loop was surprisingly hard • Common compiler optimizations seem to violate “causality”
    • Java Memory Model Highlights • Final model based on consensus, but complex – Programmers can (must) use “SC for data-race-free” – But system designers must deal with complexity – Correctness tools, racy programs, debuggers, …?? – Recent discovery of bugs [SevcikAspinall08]
    • 2005 - :C++, Microsoft Prism, Multicore • ~ 2005: Hans Boehm initiated C++ concurrency model – Prior status: no threads in C++, most concurrency w/ Pthreads • Microsoft concurrently started its own internal effort • C++ easier than Java because it is unsafe – Data-race-free is plausible model • BUT multicore ⇒ New h/w optimizations, more scrutiny – Mismatched h/w, programming views became painfully obvious – Debate that SC for data-race-free inefficient w/ hardware models
    • C++ Challenges 2006: Pressure to change Java/C++ to remove SC baseline To accommodate some hardware vendors • But what is alternative? – Must allow some hardware optimizations – But must be teachable to undergrads • Showed such an alternative (probably) does not exist
    • C++ Compromise • Default C++ model is data-race-free • AMD, Intel, … on board • But – Some systems need expensive fence for SC – Some programmers really want more flexibility • C++ specifies low-level atomics only for experts • Complicates spec, but only for experts • We are not advertising this part – [BoehmAdve PLDI 2008]
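Java's rough analog of C++'s expert-only low-level atomics is the VarHandle API's weaker access modes (Java 9+); reading opaque mode as an approximation of C++ memory_order_relaxed is a common interpretation, not something stated in this talk. A minimal sketch of the expert-level escape hatch:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class RelaxedCounter {
    static int value;
    static final VarHandle VALUE;
    static {
        try {
            VALUE = MethodHandles.lookup()
                    .findStaticVarHandle(RelaxedCounter.class, "value", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Opaque accesses are atomic per-variable but carry none of the SC
    // ordering of volatile, so they can be cheaper on some hardware,
    // at the cost of being much harder to reason about.
    static void bump() { VALUE.setOpaque((int) VALUE.getOpaque() + 1); }
}
```

This mirrors the slide's compromise: the default (volatile / SC atomics) stays simple, while the weaker modes exist only for experts who can justify them.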
    • Summary of Current Status • Convergence to “SC for data-race-free” as baseline • For programs with data races – Minimal but complex semantics for safe languages – No semantics for unsafe languages
    • Lessons Learned • Specifying semantics for programs with data races is HARD – But “no semantics for data races” also has problems • Not an option for safe languages • Debugging, correctness checking tools • Hardware-software mismatch for some code – “Simple” optimizations have unintended consequences ⇒ State-of-the-art is fundamentally broken • Banish wild shared-memory! • We need – Higher-level disciplined models that enforce discipline – Hardware co-designed with high-level models
    • Research Agenda for Languages • Disciplined shared-memory models – Simple – Enforceable – Expressive – Performance Key: What discipline? How to enforce it?
    • Data-Race-Free • A near-term discipline: Data-race-free • Enforcement – Ideally, language prohibits by design – e.g., ownership types [Boyapati+02] – Else, runtime catches as exception e.g., Goldilocks [Elmas+07] • But work still needed for expressivity and/or performance • But data-race-free still not sufficiently high level
    • Deterministic-by-Default Parallel Programming • Even data-race-free parallel programs are too hard – Multiple interleavings due to unordered synchronization (or races) – Makes reasoning and testing hard • But many algorithms are deterministic – Fixed input gives fixed output – Standard model for sequential programs – Also holds for many transformative parallel programs • Parallelism not part of problem specification, only for performance Why write such an algorithm in non-deterministic style, then struggle to understand and control its behavior?
    • Deterministic-by-Default Model • Parallel programs should be deterministic-by-default – Sequential semantics (easier than SC!) • If non-determinism is needed – should be explicitly requested, encapsulated – should not interfere with guarantees for the rest of the program • Enforcement: – Ideally, language prohibits by design – Else, runtime catches violations as exceptions
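As an illustration of determinism-by-default in plain Java (not DPJ), a parallel reduction over an associative operation returns the same value for a fixed input regardless of thread count or scheduling; the parallelism is purely for performance, exactly as the slide argues:

```java
import java.util.stream.IntStream;

public class DeterministicSum {
    // Parallel sum of squares 1^2 + ... + n^2: the result is fixed by the
    // input alone, no matter how the runtime partitions and schedules work.
    static long sumOfSquares(int n) {
        return IntStream.rangeClosed(1, n)
                        .parallel()
                        .mapToLong(i -> (long) i * i)
                        .sum();
    }
}
```

The same computation written with explicit threads and a shared accumulator would admit many interleavings and require lock-based reasoning, which is the non-deterministic style the slide asks why anyone would choose.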
    • State-of-the-art • Many deterministic languages today – Functional, pure data parallel, some domain-specific, … – Much recent work on runtime, library-based approaches – E.g., Allen09, Divietti09, Olszewski09, … • Our work: Language approach for modern O-O methods – Deterministic Parallel Java (DPJ) [V. Adve et al.]
    • Deterministic Parallel Java (DPJ) • Object-oriented type and effect system – Aliasing information: partition the heap into “regions” – Effect specifications: regions read or written by each method – Language guarantees determinism through type checking – Side benefit: regions, effects are valuable documentation • Implemented as extension to base Java type system – Initial evaluation for expressivity, performance [Bocchino+09] – Semi-automatic tool for region annotations [Vakilian+09] – Recent work on encapsulating frameworks and unchecked code – Ongoing work on integrating non-determinism
    • Implications for Hardware • Current hardware not matched even to current model • Near term: ISA changes, speculation • Long term: Co-design hardware with new software models – Use disciplined software to make more efficient hardware – Use hardware to support disciplined software
    • Illinois DeNovo Project • How to design hardware from the ground up to – Exploit disciplined parallelism • for better performance, power, … – Support disciplined parallelism • for better dependability • Working with DPJ to exploit region, effect information – Software-assisted coherence, communication, scheduling – New hardware/software interface – Opportune time as we determine how to scale multicore
    • Conclusions • Current way to specify concurrency semantics fundamentally broken – Best we can do is SC for data-race-free • But cannot hide from programs with data races – Mismatched hardware-software • Simple optimizations give unintended consequences • Need – High-level disciplined models that enforce discipline – Hardware co-designed with high-level models • E.g., DPJ, DeNovo – Implications for many CS communities